Modelling Customer Satisfaction Using Bayesian Networks


Jabar Fatah¹, Detlef Nauck¹, Mirko Boettcher¹

¹ The authors are with BT Group, Chief Technology Office, Research and Venturing, Intelligent Systems Research Centre, Adastral Park, Orion Building pp1/12, Ipswich IP5 3RE, UK (http://www.btplc.com/, e-mail [firstname.lastname]@bt.com)

Abstract: Any service-centric business aims to maximize its customer satisfaction. One way of achieving this is to learn customers' views about the business through a survey, analyse the data, and take appropriate action. In this paper we apply Bayesian networks to survey data to find relationships between different questions. Our aim is to run 'what-if' scenarios in order to analyse the impact of process changes on customer satisfaction. This not only allows a user to easily set business targets but also to proactively manage customer relations. We have also implemented a software tool aimed at business users that hides the complexity of the analysis and provides an intuitive graphical user interface.

Keywords: Bayesian Networks, Data Analysis.

1 Introduction

Businesses want to understand how their brand image is perceived, whether customers are satisfied with certain products or with the company in general, and whether customers are happy to recommend the products and services they use. For this reason most businesses run some form of customer survey. The majority of tools for customer analytics currently available to business users are typically limited to computing summary statistics, simple visualization and reporting of data. More complex tools that could offer possible explanations for observations, discover knowledge, or allow making predictions are mostly research prototypes aimed at an academic audience or at users who are highly trained in analytics. However, it is crucial for businesses to understand what potentially drives satisfaction or loyalty and how such drivers can be influenced. To our knowledge, this functionality is not provided by existing tools.

In this paper we present a tool based on Bayesian networks that we developed to let business users perform advanced analysis on customer data. The tool can be used to perform sensitivity analysis, what-if analysis and impact analysis, all of which are aimed at prediction and simulation of future customer actions or perceptions. Moreover, because our tool targets business users with little experience in analytics, it hides most of the inherent complexity of Bayesian networks and thus allows easy modelling of customer behaviour and the exploration of future scenarios.

The outline of the paper is as follows: our choice of Bayesian networks to analyse the survey data is explained in Section 2, where we also give an overview of Bayesian networks together with a description of the K2 algorithm. Our tool iCSat is explained in Section 3, while Section 4 is devoted to iCSat applications and a comparison of our approach with other methods. Finally, we draw our conclusions in Section 5.

2 Bayesian Networks

2.1 An Overview of Bayesian Networks

iCSat uses Bayesian networks [6], [16] to model dependencies between all variables available in a data set. A Bayesian network is a convenient way of representing a high-dimensional probability space by exploiting conditional independence between variables. Bayesian networks form a directed acyclic graph (DAG). The nodes in the graph represent random variables, each of which has a set of possible values associated with it. In our application we treat every survey question as a random variable and the set of possible answers as its domain of values. The directed edges between nodes encode conditional dependencies between variables; in a sense they indicate the direction in which information flows (Figure 1).

Figure 1: Two nodes joined by a directed link.

Here, node X is called a parent of Y while node Y is called a child of X. X is regarded as the cause and Y as the effect. A node without a parent is called a root node; a node without children is called a leaf node. Formally, a Bayesian network $B$ can be written as a tuple [1], [16], [22]:

$B = (G, \theta)$. The first component, $G$, is a DAG representing the conditional independencies that hold in the domain to be modelled by $B$. Associated with each node $X_i$ is a conditional probability distribution $\theta_i$. The second component, $\theta$, describes the set of these distributions. A Bayesian network $B$ can be looked at as a representation of a joint probability distribution $P(X_1, \dots, X_n)$ over the states of its variables $Z = \{X_1, \dots, X_n\}$ [16]. The Bayesian network encodes this joint probability distribution as the product of conditional probabilities

$$P(X_1, \dots, X_n) = \prod_{i=1}^{n} P(X_i \mid \pi(X_i)) = \prod_{i=1}^{n} \theta_i,$$

where $\pi(X_i)$ is the parent set of $X_i$. The advantage of using Bayesian networks is that, given an instantiation of a subset of the random variables, the conditional probabilities of the remaining ones can be calculated very efficiently using the $\theta_i$. In contrast, the use of the full joint probability distribution would lead to a time complexity which is exponential in the number of random variables.
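As a minimal illustration of this factorization, the following Python sketch evaluates the joint probability of a complete assignment from its conditional probability tables. The three-question network and all numbers are invented for illustration; iCSat itself delegates such computations to the Netica Java library.

# Evaluate P(X1,...,Xn) = prod_i P(Xi | parents(Xi)) for a tiny invented network.
parents = {
    "Q3": [],            # root node
    "Q4": ["Q3"],
    "Q5": ["Q3", "Q4"],  # e.g. overall satisfaction
}
# cpt[var][parent_values][value] = conditional probability (placeholder numbers)
cpt = {
    "Q3": {(): {"low": 0.3, "high": 0.7}},
    "Q4": {("low",): {"low": 0.6, "high": 0.4},
           ("high",): {"low": 0.2, "high": 0.8}},
    "Q5": {("low", "low"): {"low": 0.8, "high": 0.2},
           ("low", "high"): {"low": 0.5, "high": 0.5},
           ("high", "low"): {"low": 0.4, "high": 0.6},
           ("high", "high"): {"low": 0.1, "high": 0.9}},
}

def joint_probability(assignment):
    """P(assignment) as the product of the local conditional probabilities."""
    p = 1.0
    for var, value in assignment.items():
        parent_values = tuple(assignment[q] for q in parents[var])
        p *= cpt[var][parent_values][value]
    return p

print(joint_probability({"Q3": "high", "Q4": "high", "Q5": "high"}))  # 0.7*0.8*0.9 = 0.504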

2.2 Learning Bayesian Network Structure from Data

Different methods to construct a Bayesian network structure from data have been proposed [3], [4], [5], [7], [8], [9], [14], [15], [19], [20], [21]. We use the most widely used approach, the K2 algorithm, published in [3], which utilises the following theorem.

Let $Z$ be a set of $n$ discrete random variables, where a variable $X_i$ in $Z$ has $r_i$ possible value assignments $(v_{i1}, \dots, v_{ir_i})$. Let $D$ be a database of $m$ cases, where each case contains a value assignment for each variable in $Z$. Let $B$ denote a Bayesian network containing only the variables in $Z$. Each variable $X_i$ in $B$ has a set of parents $\pi_i = \pi(X_i)$. Let $\phi_{ij}$ denote the $j$-th unique instantiation of $\pi_i$ relative to $D$, and suppose there are $q_i$ such unique instantiations of $\pi_i$. Define $\alpha_{ijk}$ to be the number of cases in $D$ in which variable $X_i$ is instantiated as $v_{ik}$ and $\pi_i$ is instantiated as $\phi_{ij}$. Let

$$N_{ij} = \sum_{k=1}^{r_i} \alpha_{ijk}.$$

Suppose that the following conditions hold: the process that generated the database can be modelled accurately as a belief network containing only the variables in $Z$, which are discrete; cases occur independently, given a network model; and the second-order probability distributions over conditional probabilities are marginally independent and uniform. Then the most probable network structure $B_S$ given the database $D$ is the one that maximizes

$$P(B_S, D) = P(B_S) \prod_{i=1}^{n} f(i, \pi_i),$$

where

$$f(i, \pi_i) = \prod_{j=1}^{q_i} \frac{(r_i - 1)!}{(N_{ij} + r_i - 1)!} \prod_{k=1}^{r_i} \alpha_{ijk}! \qquad (1)$$
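Equation (1) can be computed directly from counts. The sketch below is our own illustration (not code from the paper) and evaluates the score of a single node in log space with math.lgamma to avoid overflowing factorials; the data and variable names are made up.

import math
from collections import Counter, defaultdict

def log_f(data, i, parent_set, domains):
    """log f(i, pi_i) from Equation (1), computed from counts in `data`.

    data: list of dicts mapping variable name -> value (one dict per case)
    i: name of the child variable X_i
    parent_set: list of parent variable names pi_i
    domains: dict mapping variable name -> list of possible values
    """
    r_i = len(domains[i])
    # alpha[phi_ij][v_ik]: cases with parents instantiated as phi_ij and X_i = v_ik
    alpha = defaultdict(Counter)
    for case in data:
        phi = tuple(case[p] for p in parent_set)
        alpha[phi][case[i]] += 1

    log_score = 0.0
    for phi, counts in alpha.items():  # only instantiations occurring in D matter
        n_ij = sum(counts.values())
        # log[(r_i - 1)! / (N_ij + r_i - 1)!]
        log_score += math.lgamma(r_i) - math.lgamma(n_ij + r_i)
        # log prod_k alpha_ijk!
        log_score += sum(math.lgamma(c + 1) for c in counts.values())
    return log_score

# Tiny invented example: does Q3 look like a good parent of Q5?
data = [{"Q3": a, "Q5": b} for a, b in
        [(1, 1), (1, 1), (1, 2), (2, 2), (2, 2), (2, 1), (1, 1), (2, 2)]]
domains = {"Q3": [1, 2], "Q5": [1, 2]}
print(log_f(data, "Q5", [], domains))      # score with no parents
print(log_f(data, "Q5", ["Q3"], domains))  # score with Q3 as parent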

The K2 algorithm searches heuristically for the structure that maximizes $P(B_S, D)$. Below is the algorithm, taken from [17]:

procedure K2;
{Input: A set of n nodes, an ordering on the nodes, an upper bound u on the
 number of parents a node may have, and a database D containing m cases.}
{Output: For each node, print its parents.}
for i := 1 to n do
    π_i := ∅;
    P_old := f(i, π_i);   {This function is computed using Equation (1).}
    OKToProceed := true;
    while OKToProceed and |π_i| < u do
        let z be the node in Pred(X_i) − π_i that maximizes f(i, π_i ∪ {z});
        P_new := f(i, π_i ∪ {z});
        if P_new > P_old then
            P_old := P_new;
            π_i := π_i ∪ {z};
        else
            OKToProceed := false;
    end {while};
    set π_i as the parent set of X_i;
end {for};
end {K2};

The order of the nodes is computed by the algorithm of Singh and Valtorta [18].
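For readers who prefer runnable code over pseudocode, the following Python sketch (ours, for illustration only) implements the same greedy loop using a log-space version of Equation (1). It assumes a node ordering is supplied; the ordering algorithm of Singh and Valtorta [18] and the Netica-based implementation used by iCSat are not reproduced here.

import math
from collections import Counter, defaultdict

def log_f(data, i, parent_set, domains):
    # log f(i, pi_i) from Equation (1); see the previous sketch for details.
    r_i = len(domains[i])
    alpha = defaultdict(Counter)
    for case in data:
        alpha[tuple(case[p] for p in parent_set)][case[i]] += 1
    return sum(math.lgamma(r_i) - math.lgamma(sum(c.values()) + r_i)
               + sum(math.lgamma(n + 1) for n in c.values())
               for c in alpha.values())

def k2(data, ordering, domains, u):
    # Greedy K2 search: for every node, add the predecessor that most improves
    # f(i, pi_i) until no candidate improves the score or u parents are reached.
    parents = {}
    for idx, x_i in enumerate(ordering):
        pi = []
        p_old = log_f(data, x_i, pi, domains)
        ok_to_proceed = True
        while ok_to_proceed and len(pi) < u:
            candidates = [z for z in ordering[:idx] if z not in pi]
            if not candidates:
                break
            z_best = max(candidates,
                         key=lambda z: log_f(data, x_i, pi + [z], domains))
            p_new = log_f(data, x_i, pi + [z_best], domains)
            if p_new > p_old:
                p_old, pi = p_new, pi + [z_best]
            else:
                ok_to_proceed = False
        parents[x_i] = pi
    return parents

# Invented toy data set with three questions encoded as 1/2.
data = [{"Q3": a, "Q4": b, "Q5": c} for a, b, c in
        [(1, 1, 1), (1, 2, 1), (1, 1, 1), (1, 2, 2),
         (2, 1, 2), (2, 2, 2), (2, 1, 2), (2, 2, 1)]]
domains = {"Q3": [1, 2], "Q4": [1, 2], "Q5": [1, 2]}
print(k2(data, ["Q3", "Q4", "Q5"], domains, u=2))  # learned parent set per node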

3 Dynamic Modelling with iCSat

Given respondent data, the problem is to model customer satisfaction, to find its drivers and to analyse what-if scenarios. In a what-if scenario we want to change the values or distributions of some attributes and see the influence on the remaining variables. In order to assist a business user in applying Bayesian networks to customer satisfaction modelling we developed the software tool iCSat. The tool, which we describe in the following, provides the user with an easy to use and intuitive graphical user interface. It therefore hides most of the complexity of both the problem and Bayesian networks and thus opens up the range of possible users to non-experts in data analysis.

iCSat uses the variant of the K2 algorithm by Singh and Valtorta [18] to automatically create a Bayesian network structure from survey data. To represent the network and to learn the conditional probability distributions from data it uses the Java library of the commercial software Netica [13]. The main purpose of iCSat is to allow business users with no background in analytics to upload survey data, let the tool create a model automatically and then run different what-if scenarios.

After loading data, building and then running a Bayesian network, the user interface looks as shown in Figure 2. The interface completely hides the fact that the analysis is based on Bayesian networks. The user can very easily enter assumptions, both as values and as complete distributions, into any variable and observe the effects on all other variables. Each question is represented by a histogram. A histogram on the left can be chosen by clicking it and modifying the bar heights, i.e. the answers' distribution. Pressing the update button then updates the entire network and the prediction is displayed on the right hand side.

There are two bars, black and blue (or grey), in the prediction histograms. The black bars represent the original data while the blue (grey) bars are the predicted values. The user can immediately see the impact the input has on the network.
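Under the hood, entering an assumption amounts to conditioning the joint distribution on the evidence and reading off the updated marginals. The sketch below is our illustration (iCSat performs this step through Netica) and does it by brute-force enumeration for the same kind of tiny, invented network as in the earlier sketch; real surveys require the efficient inference mentioned in Section 2.

from itertools import product

# Invented network: structure, domains and probabilities are placeholders.
parents = {"Q3": [], "Q4": ["Q3"], "Q5": ["Q3", "Q4"]}
domains = {"Q3": ["low", "high"], "Q4": ["low", "high"], "Q5": ["low", "high"]}
cpt = {
    "Q3": {(): {"low": 0.3, "high": 0.7}},
    "Q4": {("low",): {"low": 0.6, "high": 0.4},
           ("high",): {"low": 0.2, "high": 0.8}},
    "Q5": {("low", "low"): {"low": 0.8, "high": 0.2},
           ("low", "high"): {"low": 0.5, "high": 0.5},
           ("high", "low"): {"low": 0.4, "high": 0.6},
           ("high", "high"): {"low": 0.1, "high": 0.9}},
}

def joint(assignment):
    p = 1.0
    for var, value in assignment.items():
        p *= cpt[var][tuple(assignment[q] for q in parents[var])][value]
    return p

def posterior(target, evidence):
    # P(target | evidence) by summing the joint over all consistent assignments.
    variables = list(domains)
    scores = dict.fromkeys(domains[target], 0.0)
    for values in product(*(domains[v] for v in variables)):
        assignment = dict(zip(variables, values))
        if any(assignment[e] != val for e, val in evidence.items()):
            continue
        scores[assignment[target]] += joint(assignment)
    total = sum(scores.values())
    return {v: s / total for v, s in scores.items()}

# What-if: assume respondents answer QUESTION3 with the most positive value.
print(posterior("Q5", {"Q3": "high"}))  # updated distribution of QUESTION5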

Figure 2: Running a model.

The tool offers a sensitivity analysis which displays the strength of the dependencies between variables. For example, if we carry out a sensitivity analysis for question QUESTION5 we obtain the table in Figure 3. It turns out that questions QUESTION3 (9.72%) and QUESTION4 (2.37%) are closely linked with the chosen question. The other questions, not shown in the figure, are not linked with QUESTION5 as they have 0% sensitivity.

The sensitivity measures how much the beliefs of the target node will change if a finding is entered at another node. If evidence is entered at QUESTION3 then there will be a 9.72% reduction of entropy for the target node QUESTION5. Sensitivity analysis does not indicate the directionality of this dependency. For example, if we increase the probability of responses in the positive attributes of QUESTION3, will this lead to an increase or a decrease in the number of responses in the positive attributes of QUESTION5? Impact analysis provides the user with this information. This analysis allows the user to see the impact of introducing evidence at some nodes on a target node. Suppose that the company changes a product or introduces a service which will cause a shift in the distribution of QUESTION3 towards more positive values. By setting the value of QUESTION3 to the most positive response value we obtain the situation shown in Figure 4. The prediction of QUESTION5 is shown on the right side. If we look at the corresponding table (Figure 5) we see an increase in the first two, very important, categories. This also shows us the direction of the dependency between this question and QUESTION3.

The user can also run the tool in target setting mode, where they can set any number of variables to potentially achievable distributions and let iCSat predict the distributions of the remaining questions. This mode is especially important for business users. Businesses typically set themselves stretching targets in key performance indicators like, for example, customer satisfaction. Using iCSat the user can set the distribution of the satisfaction variable to a target distribution that appears to them both achievable and challenging. After updating the network, the distributions of the other variables indicate what has to change in the business in order to achieve the target distribution in customer satisfaction.

Figure 3: Sensitivity analysis.
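The entropy-reduction percentages shown in Figure 3 can, in principle, be reproduced from raw response counts as the mutual information between two questions relative to the entropy of the target. The following sketch is our illustration with invented answers; Netica computes these figures internally.

import math
from collections import Counter

def entropy(counts):
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c)

def entropy_reduction(pairs):
    # Mutual information I(target; other) = H(target) - H(target | other),
    # estimated from (other_answer, target_answer) pairs.
    n = len(pairs)
    h_target = entropy(Counter(t for _, t in pairs))
    h_conditional = 0.0
    for other_value, n_o in Counter(o for o, _ in pairs).items():
        cond = Counter(t for o, t in pairs if o == other_value)
        h_conditional += (n_o / n) * entropy(cond)
    return h_target - h_conditional

# Invented (QUESTION3, QUESTION5) answer pairs, one per respondent.
pairs = [(1, 1), (1, 1), (1, 2), (2, 2), (2, 3), (2, 3), (3, 3), (3, 3)]
mi = entropy_reduction(pairs)
h5 = entropy(Counter(t for _, t in pairs))
print(f"entropy reduction: {mi:.3f} bits ({100 * mi / h5:.1f}% of H(QUESTION5))")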

The target setting mode uses uncertain evidence [2], which is realized as virtual evidence (likelihoods) in the Netica library.
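The effect of a likelihood (virtual evidence) finding can be illustrated on a single parent-child link: the likelihood vector reweights the node's prior, and the reweighted distribution is then propagated. The sketch below is a deliberately simplified, single-link illustration with invented numbers; in iCSat the reweighting and propagation across the whole network are handled by Netica.

# Virtual (soft) evidence on a node SAT, propagated to one child Q3.
# Prior and CPT values are invented placeholders.
prior_sat = {"dissatisfied": 0.3, "neutral": 0.4, "satisfied": 0.3}
cpt_q3_given_sat = {           # P(Q3 | SAT)
    "dissatisfied": {"negative": 0.7, "positive": 0.3},
    "neutral":      {"negative": 0.4, "positive": 0.6},
    "satisfied":    {"negative": 0.1, "positive": 0.9},
}

# Likelihood vector expressing the business target "mostly satisfied".
likelihood = {"dissatisfied": 0.1, "neutral": 0.3, "satisfied": 1.0}

# Reweight the prior by the likelihood and renormalize (virtual evidence).
unnorm = {s: prior_sat[s] * likelihood[s] for s in prior_sat}
z = sum(unnorm.values())
posterior_sat = {s: p / z for s, p in unnorm.items()}

# Propagate: P(Q3) = sum_s P(Q3 | SAT = s) * P'(SAT = s)
posterior_q3 = {
    q: sum(cpt_q3_given_sat[s][q] * posterior_sat[s] for s in posterior_sat)
    for q in ("negative", "positive")
}
print(posterior_sat)
print(posterior_q3)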

Figure 4: Impact analysis.

Figure 5: The predicted distribution of QUESTION5.

4 Applications and Comparison to Other Approaches

The tool we developed is generic and can easily be adapted to any domain as long as the provided data is categorical. Different business users within BT have used iCSat as a data analysis tool [11]. It has been specifically used in:

• Customer satisfaction analysis: business users identified drivers of customer satisfaction and their impact by feeding survey data into the tool.

• Early warning system: in repair processes, the tool identified circumstances which can lead to complaints by customers. This is valuable information to act upon proactively.

• Target setting: planning business targets and determining achievable distributions for target variables.

• Field force performance: using data from field force operations, business users looked at the impact of certain regional factors on job performance.

In the following we provide a brief comparison between our approach and others. Although the main benefit of using Bayesian networks for analysing customer feedback data is that any variable can be used as a target variable, and the model is typically not used to classify data, we still need an idea of how well the model performs. We therefore looked at the classification scenario for a selected target variable (level of satisfaction) and compared the results to a number of typical classification approaches that can handle the type of data we have.

As a sample data set we used 950 records from a customer survey. The data was split randomly into training and test sets of equal size. The data contained 9 variables that were all encoded by integers representing ordered categories. For reasons of confidentiality we cannot reveal the exact meaning of the variables or the distributions of the individual attributes. We used our automatic data analysis platform SPIDA [12] to generate a neural network, a neuro-fuzzy classifier (NEFCLASS [10]), a decision tree and a support vector machine. iCSat was used to learn a Bayesian network.

The confusion matrix of the Bayesian network can be seen in Table 1. The model predicts the satisfaction level with an accuracy of 51.2% on the test set. Satisfaction is measured on 7 levels plus an 8th level for a missing response. That means we are looking at an eight-class problem, for which a random classifier would achieve 12.5% accuracy. The classification accuracies of the other approaches are shown in Table 2 and, as we can see, they are all slightly higher at around 54%. The best classifier is a neural network with 54.81%. However, the neural network only predicts levels 2 and 3. The same is true for the support vector machine and NEFCLASS.

Table 1. Confusion matrix of the Bayesian network (test set with 475 patterns; the accuracy is 51.5%).

Actual \ Prediction    1    2    3    4    5    6    7    8
        1             11   42    7    0    0    0    0    0
        2              9  157   39    0    0    0    0    0
        3              1   83   70    4    2    0    0    0
        4              0    5   13    2    1    0    0    0
        5              0    1    9    3    3    0    0    0
        6              0    0    1    0    0    0    0    0
        7              0    0    3    1    3    0    4    0
        8              0    0    1    0    0    0    0    0

Table 2. Test results of a neural network, a neuro-fuzzy classifier, a decision tree and a support vector machine on predicting satisfaction levels based on 8 indicators.

Model            Accuracy
Neural Network   54.81%
NEFCLASS         53.83%
Decision Tree    54.03%
SVM              53.44%
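As a simple illustration of how figures like those in Tables 1 and 2 are obtained, the following sketch tabulates predicted against actual satisfaction levels and computes the accuracy; the labels and predictions in it are invented and do not correspond to the survey data.

from collections import Counter

# Invented actual and predicted satisfaction levels (1-7 plus 8 = missing).
actual    = [2, 2, 3, 1, 2, 3, 5, 2, 3, 4]
predicted = [2, 3, 3, 2, 2, 2, 3, 2, 3, 4]

levels = sorted(set(actual) | set(predicted))
confusion = Counter(zip(actual, predicted))
accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)

print(f"accuracy: {accuracy:.1%}")
print("actual/pred " + " ".join(f"{level:>3}" for level in levels))
for a in levels:
    row = " ".join(f"{confusion[(a, p)]:>3}" for p in levels)
    print(f"{a:>11} {row}")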

NEFCLASS is a fully automatic neuro-fuzzy classifier that learns fuzzy rules and fuzzy sets from data. For this data set it detected only two rules and used only one input variable, V1:

If V1 is small then Level 2
If V1 is large then Level 3

The fuzzy sets for small and large are shown in Figure 7. All three of these approaches (the neural network, the support vector machine and NEFCLASS) interpret the inputs numerically, which is of course not completely adequate, as the inputs merely come from an ordinal scale.

Figure 7. The neuro-fuzzy classifier NEFCLASS used only one variable partitioned by two fuzzy sets. However, it produced only two rules for the two majority classes (variable names are obscured).

The decision tree (C4.5) has a confusion matrix that resembles the confusion matrix of the Bayesian network and manages to make some correct classifications for most levels. We built two versions of the tree. The first version (54.03% correct) interpreted the inputs as numbers and could therefore use a variable repeatedly in a branch because it used binary splits to partition the inputs. The created tree was rather large with 20 levels and 137 nodes. The second tree interpreted the inputs as symbols and therefore used a variable only once in a branch. This tree was smaller with only 8 levels but still 100 nodes. However, the accuracy suffered and the tree achieved only 50.5% accuracy on the test set.

The Bayesian network learned by iCSat is shown in Figure 8 as Netica would display it. When we let iCSat run a sensitivity analysis it reveals that the variable V1 has the strongest influence on the satisfaction level. V1 was also the only variable used by the neuro-fuzzy classifier and was the root node in both decision trees.

Figure 8. The Bayesian network learned by iCSat, displayed in Netica (variable names have been replaced).

The graph shows a direct connection between the satisfaction node SAT and the node V1. The arc is directed from SAT to V1. This illustrates the problem an inexperienced user would have when trying to interpret the network structure: there is no guarantee that a node the user wants to use as a target is actually a child node of its perceived drivers. We have to keep in mind that the data we are looking at when we analyze customer feedback is a representation of perceptions. We would expect large interdependencies between different perceptions of different aspects of satisfaction, and it would probably not be suitable to assume that one type of perception drives the other but not vice versa. We merely want to use the Bayesian network to represent the probability distribution of the data and to conveniently choose any variable as a target without the need for remodeling. The network structure is therefore not of use to the typical user who is actually working with iCSat.

5 Conclusions

We developed a tool to model customer satisfaction and to examine what-if scenarios based on Bayesian networks. iCSat allows business users to identify drivers of customer satisfaction, or of any other target, through a very easy to use and intuitive interface that hides the complexity of the underlying Bayesian network. The tool enables users to select target questions, give them achievable distributions and predict the resulting distributions of the other questions. At the same time iCSat is fully automatic and creates Bayesian networks from data. The user neither needs to set any parameters nor to design a network. The contributions of our approach to customer analytics are:

• Exploring dynamic what-if scenarios. This provides invaluable insight into how to achieve specific target distributions by changing business drivers.

• In addition to finding the dependencies between different questions in a questionnaire, sensitivity analysis allows us to group dependent questions together. This can help improve the design of questionnaires.

• Our approach is generic and applicable to any categorical data set.

• The comparison to other approaches shows that the model learned by iCSat is no worse in the restricted application of classifying data but offers a much broader and more flexible range of applications.

References

[1] Castillo E, Gutierrez J M and Hadi A S, Expert Systems and Probabilistic Network Models, Springer, Berlin, 1997.
[2] Chan H and Darwiche A, 'On the Revision of Probabilistic Beliefs using Uncertain Evidence', Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI), pp. 99-105, 2003.
[3] Cooper G F and Herskovits E H, 'A Bayesian method for the induction of probabilistic networks from data', Machine Learning, 9, 309-347, 1992.
[4] Fung R M and Crawford S L, 'Constructor: A system for the induction of probabilistic models', Proceedings of AAAI, Boston, MA, MIT Press, 762-769, 1990.
[5] Glymour C, Scheines R, Spirtes P and Kelly K, Discovering Causal Structure, Academic Press, San Diego, CA, 1987.
[6] Heckerman D and Wellman M P, 'Bayesian Networks', Communications of the ACM, vol. 38, no. 3, March 1995.
[7] Herskovits E H, Computer-Based Probabilistic Network Construction, PhD thesis, Medical Information Sciences, Stanford University, Stanford, CA, 1991.
[8] Lam W and Bacchus F, 'Using causal information and local measures to learn Bayesian networks', Proceedings of the 9th Conference on Uncertainty in Artificial Intelligence, Washington D.C., 243-250, 1993.
[9] Lauritzen S L, Thiesson B and Spiegelhalter D, 'Diagnostic systems created by model selection methods - a case study', Preliminary Papers of the 4th International Workshop on Artificial Intelligence and Statistics, Ft. Lauderdale, FL, 93-105, January 3-6, 1993.
[10] Nauck D, Klawonn F and Kruse R, Foundations of Neuro-Fuzzy Systems, Wiley, Chichester, 1997.
[11] Nauck D, Ruta D, Spott M and Azvine B, 'Being Proactive: Customer Analytics for Predicting Customer Actions', BT Technology Journal, Springer, 2006. Accepted for publication.
[12] Nauck D, Spott M and Azvine B, 'SPIDA - A Novel Data Analysis Tool', BT Technology Journal, vol. 21, no. 4, pp. 104-112, Kluwer, Dordrecht, October 2003.
[13] Norsys Software Corp., Netica User Manual V1.05, 1997. http://www.norsys.com/dl/NeticaMan_Win.pdf
[14] Pearl J and Verma T, 'A theory of inferred causation', Proceedings of the 2nd International Conference on Principles of Knowledge Representation and Reasoning, Morgan Kaufmann, 441-452, 1991.
[15] Pearl J and Wermuth N, 'When can association graphs admit a causal interpretation? (first report)', Preliminary Papers of the 4th International Workshop on Artificial Intelligence and Statistics, Ft. Lauderdale, FL, 141-150, January 3-6, 1993.
[16] Pearl J, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann, San Mateo, CA, 1988.
[17] Ruiz C, Illustration of the K2 Algorithm for Learning Bayes Net Structures, Computer Science Department, WPI, http://www.cs.wpi.edu/~ruiz.
[18] Singh M and Valtorta M, 'Construction of Bayesian Network Structures from Data: a Brief Survey and an Efficient Algorithm', International Journal of Approximate Reasoning, 12, 111-131, Elsevier Science, New York, NY, 1995.
[19] Spirtes P and Glymour C, 'An algorithm for fast recovery of sparse causal graphs', Social Science Computer Review, 9(1), 62-72, 1991.
[20] Spirtes P, Glymour C and Scheines R, 'Causality from probability', Pitman, London, 181-199, 1990.
[21] Verma T and Pearl J, 'An algorithm for deciding if a set of observed independencies has a causal explanation', Proceedings of the 8th Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann, 323-330, 1992.
[22] Wittig F and Jameson A, 'Exploiting Qualitative Knowledge in the Learning of Conditional Probabilities of Bayesian Networks', Proceedings of UAI 2000, 644-652, 2000.

[15] Pearl J, and Wermuth N, When can association graphs admit a casual interpretation? (first report), Preliminary Papers of the 4th International Workshop on Artificial Intelligence and Statistics, Ft. Lauderdale, FL, 141-150, January 3-6 1993. [16] Pearl J. Probabilistic reasoning in intelligent systems: Networks of plausible inference. San Mateo, CA: Morgan Kaufmann, 1988. [17] Ruiz C, Illustration of the K2 Algorithm for Learning Bayes Net Structures. Computer Science Department, WPI, http://www.cs.wpi.edu/~ruiz. [18] Singh M and Valtorta M, Construction of Bayesian Network Structures from Data: a Brief Survey and an Efficient Algorithm. Int. J. Approximate Reasoning 12:111-131, Elsevier Science, New York, NY, 1995. [19] Spirtes P, and Glymour C, An algorithm for fast recovery of sparse causal graphs, Social Science Computing Review, 9(1), 62-72, 1991. [20] Spirtes P, Glymour C, and Scheines R, Causality from Probability, Pitman, London, 1990, 181-199. [21] Verma T, and Pearl J, An algorithm for deciding if a set of observed independencies has a causal explanation, Proceedings of the 8th Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann, 323-330, 1992. [22] Wittig F, Anthony Jameson: Exploiting Qualitative Knowledge in the Learning of Conditional Probabilities of Bayesian Networks. UAI 2000: 644-652