Available online at www.sciencedirect.com

ScienceDirect Procedia Economics and Finance 17 (2014) 3 – 9

Innovation and Society 2013 Conference, IES 2013

Bayesian Network Applications to Customer Surveys and InfoQ

Federica Cugnata (a), Ron Kenett (b), Silvia Salini (c,*)

(a) Department of Economics and Statistics 'Cognetti De Martiis', University of Turin, Italy
(b) KPA Ltd., Raanana, Israel; Center for Risk Engineering, NYU-Poly, NY, USA
(c) Department of Economics, Management and Quantitative Methods, University of Milan, Italy

Abstract
Modelling relationships between variables has been a major challenge for statisticians in a wide range of application areas. In conducting customer satisfaction surveys, one main objective is to identify the drivers of overall satisfaction (or dissatisfaction) in order to initiate proactive actions for containing problems and/or improving customer satisfaction. Bayesian Networks (BN) combine graphical analysis with Bayesian analysis to represent relations linking measured and target variables. Such graphical maps are used for diagnostic and predictive analytics. This paper is about the use of BN in the analysis of customer survey data. We propose an approach to sensitivity analysis for identifying the drivers of overall satisfaction. We also address the problem of selection of robust networks. Moreover, we show how such an analysis generates high information quality (InfoQ) and can be effectively combined with an integrated analysis considering various models.

© 2014 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/). Selection and peer-review under responsibility of the Organizing Committee of IES 2013.

Keywords: Customer Satisfaction; Importance-performance analysis

* Corresponding author. Tel.: +97297408442; fax: +97297408443. E-mail address: [email protected]

2212-5671 © 2014 The Authors. Published by Elsevier B.V. doi:10.1016/S2212-5671(14)00871-5

1. Introduction
Customer satisfaction studies deal with the satisfaction of customers, consumers and users with a product or service. The topic was initially developed in marketing theory and applications. The Business Dictionary defines customer satisfaction as “The degree of satisfaction provided by the goods or services of a company as measured by the number of repeat customers.” (Kenett and Salini, Chapter 1, 2012). With such a definition, customer satisfaction seems to be an objective and easily measured quantity. However, unlike variables such as type of product purchased or geographical location, customer satisfaction is not necessarily observed directly. Typically, in a social science context, analysis of such measures is performed indirectly by employing proxy variables. Unobserved variables are referred to as latent variables, whilst proxy variables are known as observed variables. In many cases, the latent variables are very complex and the choice of suitable proxy variables is not immediately obvious. For example, in order to assess customer satisfaction with an airline service, it is necessary to identify attributes that characterize this type of service. A general framework for assessing airlines includes attributes such as: on board service, timeliness, responsiveness of personnel, airplane seats and other tangible service characteristics. In general, some attributes are objective, related to the service’s technical-specific characteristics, and others are subjective, dealing with behaviors, feelings and psychological benefits.

Statistical analysis is a science that relies on a transformation of the content domain space into an analytic space with dimensions that lend themselves to quantitative analysis. Self-administered surveys use structured questioning to map out perceptions and satisfaction levels, using observations from a population frame, into data that can be statistically analyzed. The importance that users or customers attach to various services and products is a vital part of customer satisfaction surveys, as much as the measure of the quality and the satisfaction. In some cases, the level of importance is asked explicitly in the survey questionnaire; in other cases, it is derived using statistical models. The survey process consists of four main stages: 1) Planning, 2) Collecting, 3) Analyzing and 4) Presenting.
Modern surveys are conducted using a wide variety of techniques including phone interviews, self-reported paper questionnaires, email questionnaires, internet-based surveys, SMS-based surveys, face to face interviews, videoconferencing etc. Eventually, the survey provides information to decision makers. In this paper we focus on one specific aspect of the information derived from satisfaction surveys, namely the link between the level of satisfaction and the importance attributed by customers to survey items. The analysis we propose is designed to increase the quality of information when the survey goals are to identify decision drivers affecting customer decisions. The next section briefly summarizes the various dimensions of information quality and the concept of InfoQ developed by Kenett and Shmueli (2013). The following section presents an analysis and sensitivity assessment of satisfaction drivers using Bayesian Network models and the final section concludes with a summary and a description of further areas of research.

2. Information Quality of Customer Surveys
Information quality (InfoQ) is defined as the potential of a dataset to achieve a specific (scientific or practical) goal using a given empirical analysis method (Kenett and Shmueli, 2013). In assessing InfoQ one first needs to describe a specific research study with four components: i) a specific analysis goal (g), ii) the available dataset (X), iii) the method or model that was used (f) and iv) a utility measure (U). As a generic concept, we define $\mathrm{InfoQ}(f, X, g) = U(f(X \mid g))$, i.e. the utility derived from an application of a model to a certain data set, given the research goals. In our case the data set X, the goals, g, and the utility U are assumed identical. This definition describes what is done by a specific analysis. In order to assess how it is carried out, InfoQ is deconstructed into 8 dimensions. These are: (1) Data resolution, (2) Data structure, (3) Data integration, (4) Temporal relevance, (5) Generalizability, (6) Chronology of data and goal, (7) Construct and action operationalization and (8) Communication. The goal, g, we are considering here is the use of customer surveys to determine drivers of customer satisfaction. The analysis we conduct, f, relies on Bayesian Networks as explained in the next section. We then demonstrate our analysis using the ABC data set that is available from the web site of Kenett and Salini, 2012.

3. Bayesian Networks Applied to Customer Surveys
An early attempt to apply Bayesian Networks to the analysis of customer surveys was presented in Kenett and Salini (2009) and Salini and Kenett (2009). A survey with n questions produces responses that can be considered as random variables, $X_1, \dots, X_n$. Some of these variables, q of them, are responses to questions on overall satisfaction, recommendation or repurchasing intention, and are considered target variables. Responses to the other questions, $X_1, \dots, X_k$, $k = n-q$, can be analyzed under the hypothesis that they are positively dependent on the target variables. The combinations $(X_i, X_j)$, with $X_i \in \{X_1, \dots, X_{n-q}\}$ and $X_j \in \{X_{n-q+1}, \dots, X_n\}$, are either positively dependent or independent, for each pair of variables $(X_i, X_j)$, $i \le n-q$, $n-q < j \le n$.

Bayesian Networks (BN) implement a graphical model structure known as a directed acyclic graph (DAG) that is popular in Statistics, Machine Learning and Artificial Intelligence. BN are both mathematically rigorous and intuitively understandable. They enable an effective representation and computation of the joint probability distribution (JPD) over a set of random variables (Pearl, 2000). The structure of a DAG is defined by two sets: the set of nodes and the set of directed edges (arrows). The nodes represent random variables and are drawn as circles labeled by the variable names. The edges represent direct dependencies among the variables and are represented by arrows between nodes. In particular, an edge from node $X_i$ to node $X_j$ represents a statistical dependence between the corresponding variables. Thus, the arrow indicates that the value taken by variable $X_j$ depends on the value taken by variable $X_i$. Node $X_i$ is then referred to as a 'parent' of $X_j$ and, similarly, $X_j$ is referred to as the 'child' of $X_i$. An extension of these genealogical terms is often used to define the set of 'descendants' of a node, i.e., the set of nodes that can be reached from the node on a directed path, and the set of its 'ancestors', i.e., the set of nodes from which the node can be reached on a directed path. The DAG guarantees that there is no node that can be its own ancestor or its own descendant. Such a condition is of vital importance to the factorization of the joint probability of a collection of nodes.
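As an illustration of the factorization mentioned above, the joint probability of a DAG can be written as the product, over all nodes, of the probability of each node given its parents. The following minimal Python sketch assumes a hypothetical three-node chain (Support, Satisfaction, Recommend) with made-up probabilities; none of the names or numbers come from the survey data. It builds the joint distribution from local tables, counts the free parameters, and answers a diagnostic query by brute-force enumeration rather than by the propagation algorithms used in BN software.

```python
from itertools import product

# Hypothetical toy network: Support -> Satisfaction -> Recommend.
# Names and probabilities are illustrative assumptions, not survey estimates.
parents = {"Support": [], "Satisfaction": ["Support"], "Recommend": ["Satisfaction"]}
states = {
    "Support": ["low", "high"],
    "Satisfaction": ["low", "high"],
    "Recommend": ["no", "yes"],
}

# Local conditional probability tables: cpt[node][parent_values][value].
cpt = {
    "Support": {(): {"low": 0.3, "high": 0.7}},
    "Satisfaction": {
        ("low",): {"low": 0.8, "high": 0.2},
        ("high",): {"low": 0.1, "high": 0.9},
    },
    "Recommend": {
        ("low",): {"no": 0.9, "yes": 0.1},
        ("high",): {"no": 0.2, "yes": 0.8},
    },
}


def joint(assignment):
    """Joint probability as the product of the local conditional probabilities."""
    p = 1.0
    for node, node_parents in parents.items():
        key = tuple(assignment[q] for q in node_parents)
        p *= cpt[node][key][assignment[node]]
    return p


def posterior(target, evidence):
    """P(target | evidence), computed by enumerating the full joint distribution."""
    nodes = list(parents)
    scores = {value: 0.0 for value in states[target]}
    for values in product(*(states[n] for n in nodes)):
        assignment = dict(zip(nodes, values))
        if all(assignment[k] == v for k, v in evidence.items()):
            scores[assignment[target]] += joint(assignment)
    total = sum(scores.values())
    return {value: s / total for value, s in scores.items()}


def free_parameters():
    """Number of free parameters needed by the factorized model."""
    count = 0
    for node, node_parents in parents.items():
        rows = 1
        for q in node_parents:
            rows *= len(states[q])
        count += rows * (len(states[node]) - 1)
    return count


nodes = list(parents)
total_joint = sum(
    joint(dict(zip(nodes, values))) for values in product(*(states[n] for n in nodes))
)
# Diagnostic query: reasoning against the direction of the arrows.
diagnostic = posterior("Support", {"Recommend": "no"})
```

In this toy chain the factorized model needs 5 free parameters, against 7 for an unrestricted joint distribution over three binary variables; the saving grows quickly with the number of nodes. The query conditions on a child node and updates belief about a root node, which is the sense in which information can flow in either direction along the arcs.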
Although the arrows represent direct causal connections between the variables, the reasoning process can operate on a BN by propagating information in any direction. A BN reflects a simple conditional independence statement, namely that each variable, given the state of its parents, is independent of its non-descendants in the graph. This property is used to reduce, sometimes significantly, the number of parameters that are required to characterize the JPD of the variables. This reduction provides an efficient way to compute the posterior probabilities given the evidence present in the data (Lauritzen and Spiegelhalter, 1988; Pearl, 2000).

In addition to the DAG structure, which is often considered as the "qualitative" part of the model, one needs to specify the "quantitative" parameters of the model. These parameters are described by applying the Markov property, where the conditional probability distribution at each node depends only on its parents. For discrete random variables, this conditional probability is often represented by a table, listing the local probability that a child node takes on each of the feasible values, for each combination of values of its parents. The joint distribution of a collection of variables can be determined uniquely by these local conditional probability tables. For a general introduction to Bayesian Networks with applications see Kenett, 2012.

From the point of view of InfoQ dimensions, Kenett and Salini, 2011 show with an example that 1) Bayesian Networks are particularly effective in integrating qualitative and quantitative data (Data Integration), 2) the diagnostic and predictive capabilities of Bayesian Networks provide generalizability to population subsets.
The causality relationship provides further generalizability to other contexts such as organizational processes or specific job functions (Generalizability), 3) the use of a model with conditioning capabilities provides an effective tool to set up improvement goals and diagnose pockets of dissatisfaction (Operationalization) and 4) the visual display of a Bayesian Network makes it particularly appealing to decision makers who feel uneasy with mathematical models (Communication).

4. A supervised approach to Bayesian Networks
In the framework of Bayesian network analysis it is relevant to identify the drivers of overall satisfaction (Kenett and Salini, Chapter 1, 2012; Cugnata and Salini, 2013). If a response variable (overall satisfaction) is selected, the BN can be used in a supervised approach instead of the classical unsupervised approach. We consider two approaches to this task. The first one is based on the selection of robust networks. Depending on the learning algorithm, some arcs in the network are present and others are not. Following Kenett and Salini 2012 (see Chapter 11, Table 11.4) and Perucca and Salini 2014, we count how many times an arc between a possible driver and the target variable is present, out of the total number of networks examined. The important drivers show the arc in most networks, i.e. the connection does not depend on the learning algorithm. The second approach is based on sensitivity analysis (Cornalba, Kenett and Giudici 2007). In this approach we use an experimental design strategy (Kenett and Steinberg, 2006) to generate a full factorial experiment based on driver combinations. We then analyze the target variable thus generated (overall satisfaction) with respect to the observed one, in order to study the effect that each driver combination has on its variability. To provide these two important extensions of BN in customer survey applications, we are implementing R functions and plots that can be generalized to other datasets and fields in which BN are used with a supervised approach.

We present here an example based on the ABC data set. Many studies of these data, with various goals including the identification of the drivers of overall satisfaction, have already been carried out in other works (Kenett and Salini, 2011; Kenett and Salini, 2012; Cugnata and Salini, 2013), so we can consider it as a benchmark for comparisons. ABC (a fictitious but realistic company) is a typical global supplier of integrated software, hardware, and service solutions to media and telecommunications service providers. The dataset includes overall variables such as satisfaction, recommendation and repurchase. Moreover, there are questions grouped by different dimensions: Equipment, Sales Support, Technical Support, Supplies and Orders, Administrative Support, and Terms, Conditions and Prices. For each dimension, there is also an evaluation of overall satisfaction. For each variable the satisfaction level is based on a five-point scale (from 1 = very low satisfaction to 5 = very high satisfaction). We use the overall Satisfaction and the six overall dimensions, not the single items. Figure 1 shows the distribution of overall satisfaction with ABC.

Fig. 1. Distribution of the target variable.
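The first approach of this section, counting how often an arc between a driver and the target appears across networks learned with different algorithms, can be sketched as follows. This is only an illustration under stated assumptions: the arc sets are invented, the algorithm labels are placeholders, arc direction is ignored when deciding whether a driver is linked to the target, and the 75% cut-off for calling a driver robust is an arbitrary choice that does not come from the paper.

```python
from collections import Counter

TARGET = "Satisfaction"

# Hypothetical arc sets, one per learned network. In practice each set would be
# the output of a different structure-learning algorithm run on the same data.
learned_networks = {
    "algorithm_A": {
        ("TechSup", "Satisfaction"),
        ("Equipment", "Satisfaction"),
        ("SalesSup", "TechSup"),
    },
    "algorithm_B": {("TechSup", "Satisfaction"), ("Satisfaction", "Equipment")},
    "algorithm_C": {("TechSup", "Satisfaction"), ("TCPrices", "Satisfaction")},
    "algorithm_D": {("Satisfaction", "TechSup"), ("Equipment", "Satisfaction")},
}


def arc_frequency(networks, target):
    """Share of networks in which each driver is linked to the target.

    Direction is ignored (an assumption of this sketch): an arc counts
    whether it points into or out of the target node.
    """
    counts = Counter()
    for arcs in networks.values():
        linked = {a if b == target else b for a, b in arcs if target in (a, b)}
        counts.update(linked)
    n_networks = len(networks)
    return {driver: c / n_networks for driver, c in counts.items()}


freq = arc_frequency(learned_networks, TARGET)

# Illustrative threshold: a driver is called robust if linked in at least 75% of networks.
robust = sorted(driver for driver, share in freq.items() if share >= 0.75)
```

With these invented arc sets, TechSup is linked to the target in every network and Equipment in three out of four, so both pass the illustrative threshold, while TCPrices appears only once. A driver whose link survives most algorithms is one whose connection does not depend on the learning method.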

For simplicity we have learned a very simple network in which each dimension is connected to the overall satisfaction. There are no arcs between the dimensions.


[Figure 2: network diagram with nodes Equipment, SalesSup, TechSup, Suppliers, AdmSup and TCPrices, each connected to Satisfaction (levels 1 to 5).]

Fig. 2. Constrained Bayesian Network of the ABC data.

We create a full factorial plan starting from the original variables, excluding the target one. For each row we generate 1000 simulated values of the target variable using the BN estimated on the observed dataset. First of all, we plot the distribution of the simulated target variable for each level of each variable.
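The simulation step described above can be sketched in a few lines of Python. The sketch makes several assumptions that are not in the paper: only two drivers are used instead of six to keep the plan small, the function `satisfaction_cpt` is a made-up stand-in for the conditional probability table that would be estimated from the observed data, and the seed is fixed only for reproducibility.

```python
import itertools
import random
from collections import defaultdict

LEVELS = [1, 2, 3, 4, 5]
DRIVERS = ["TechSup", "Equipment"]  # reduced to two drivers to keep the sketch small


def satisfaction_cpt(row):
    """Hypothetical P(Satisfaction | drivers).

    Probability mass is centred on the rounded mean of the driver levels.
    This is a placeholder for the table estimated on the observed data.
    """
    centre = round(sum(row.values()) / len(row))
    weights = [1.0 / (1 + abs(level - centre)) ** 2 for level in LEVELS]
    total = sum(weights)
    return [w / total for w in weights]


def simulate(n_draws=1000, seed=0):
    """Draw n_draws target values for every row of the full factorial plan."""
    rng = random.Random(seed)
    plan = [
        dict(zip(DRIVERS, combo))
        for combo in itertools.product(LEVELS, repeat=len(DRIVERS))
    ]
    sims = []
    for row in plan:
        draws = rng.choices(LEVELS, weights=satisfaction_cpt(row), k=n_draws)
        sims.append((row, draws))
    return sims


def mean_by_level(sims):
    """Mean of the simulated target for each level of each driver."""
    pooled = defaultdict(list)
    for row, draws in sims:
        for driver, level in row.items():
            pooled[(driver, level)].extend(draws)
    return {key: sum(values) / len(values) for key, values in pooled.items()}


sims = simulate()
means = mean_by_level(sims)
```

With six drivers on a five-point scale the full factorial plan has 5^6 = 15625 rows, so the same loop would produce 15625 simulated samples of size 1000. The dictionary `means` holds the quantity of the kind plotted per panel and per level in a main-effects display.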

[Figure 3: six panels, one per dimension (Equipment, SalesSup, TechnicalSup, Suppliers, AdministrativeSup, TermsCondPrices); horizontal axis: level of the dimension (1 to 5); vertical axis: Mean, with tick marks from 2.995 to 3.015.]

Fig. 3. Mean of the simulated target variable (overall satisfaction) for each level of each dimension.


[Figure 4: six panels, one per dimension (Equipment, SalesSup, TechnicalSup, Suppliers, AdministrativeSup, TermsCondPrices); horizontal axis: level of the dimension (1 to 5); vertical axis: Mean of KS, with tick marks from 0.256 to 0.264.]

Fig. 4. Mean of the values of the Kolmogorov-Smirnov statistic.
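The text does not spell out how the Kolmogorov-Smirnov (KS) statistic is computed. One plausible reading, assumed here, is a two-sample KS statistic between the simulated target values for a row of the plan and the observed target values: the largest absolute gap between the two empirical cumulative distributions over the five satisfaction levels. The samples below are invented for illustration.

```python
from collections import Counter

LEVELS = [1, 2, 3, 4, 5]


def ecdf(sample, levels=LEVELS):
    """Empirical cumulative distribution of a discrete sample at each level."""
    counts = Counter(sample)
    n = len(sample)
    cumulative = []
    running = 0
    for level in levels:
        running += counts.get(level, 0)
        cumulative.append(running / n)
    return cumulative


def ks_statistic(simulated, observed):
    """Two-sample KS statistic: largest absolute gap between the empirical CDFs."""
    return max(abs(a - b) for a, b in zip(ecdf(simulated), ecdf(observed)))


# Hypothetical samples, for illustration only.
observed = [1, 2, 3, 3, 4, 4, 4, 5, 5, 5]
simulated = [1, 1, 2, 2, 3, 3, 3, 4, 4, 5]
d = ks_statistic(simulated, observed)
```

Averaging this statistic over all rows of the plan that share a given level of a given dimension would give one point per level and per panel, in the spirit of Fig. 4. A level whose simulated distribution departs more from the observed one yields a larger mean KS value.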

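[Figure 5: network diagram with nodes Equipment, SalesSup, TechSup, Suppliers, AdmSup, TCPrices and Satisfaction, with the marginal distribution of Satisfaction over levels 1 to 5.]

Fig. 5. Bayesian network and marginal probability of Satisfaction if the Technical Support worsens (100% at 1 = very low satisfaction).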


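[Figure 6: network diagram with nodes Equipment, SalesSup, TechSup, Suppliers, AdmSup, TCPrices and Satisfaction, with the marginal distribution of Satisfaction over levels 1 to 5.]

Fig. 6. Bayesian network and marginal probability of Satisfaction if the Technical Support improves (100% at 5 = very high satisfaction).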
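The what-if scenarios of Figs. 5 and 6 amount to fixing one driver at a given level and recomputing the marginal distribution of Satisfaction. In a network where the drivers have no arcs between them, this is a weighted sum of the rows of the Satisfaction table, with weights given by the marginal probabilities of the remaining drivers. The sketch below assumes two drivers only, invented driver marginals, and the same made-up `satisfaction_cpt` placeholder idea as a stand-in for the estimated table; none of these numbers come from the ABC data.

```python
import itertools

LEVELS = [1, 2, 3, 4, 5]

# Hypothetical marginal distributions of the driver nodes (root nodes, no arcs between them).
driver_marginals = {
    "TechSup": [0.10, 0.15, 0.30, 0.30, 0.15],
    "Equipment": [0.05, 0.15, 0.30, 0.35, 0.15],
}


def satisfaction_cpt(row):
    """Hypothetical P(Satisfaction | drivers), centred on the rounded driver mean."""
    centre = round(sum(row.values()) / len(row))
    weights = [1.0 / (1 + abs(level - centre)) ** 2 for level in LEVELS]
    total = sum(weights)
    return [w / total for w in weights]


def satisfaction_marginal(evidence=None):
    """Marginal distribution of Satisfaction, optionally with drivers fixed by evidence."""
    evidence = evidence or {}
    drivers = list(driver_marginals)
    result = [0.0] * len(LEVELS)
    for combo in itertools.product(LEVELS, repeat=len(drivers)):
        row = dict(zip(drivers, combo))
        weight = 1.0
        for driver, level in row.items():
            if driver in evidence:
                # A fixed driver puts all its probability on the evidence level.
                weight *= 1.0 if level == evidence[driver] else 0.0
            else:
                weight *= driver_marginals[driver][level - 1]
        if weight == 0.0:
            continue
        for i, p in enumerate(satisfaction_cpt(row)):
            result[i] += weight * p
    return result


def expected_level(distribution):
    """Mean satisfaction level under a distribution over LEVELS."""
    return sum(level * p for level, p in zip(LEVELS, distribution))


baseline = satisfaction_marginal()
worse = satisfaction_marginal({"TechSup": 1})   # scenario in the spirit of Fig. 5
better = satisfaction_marginal({"TechSup": 5})  # scenario in the spirit of Fig. 6
```

Comparing `expected_level(worse)`, `expected_level(baseline)` and `expected_level(better)` shows how the target shifts when Technical Support is set to its lowest or highest level. The size of that shift is one way to read the importance of a driver from the network.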

Acknowledgements
The first and third authors are grateful for the financial support of the project MIUR PRIN MISURA – Multivariate models for risk assessment.

References
Cornalba C, Kenett RS and Giudici P. (2007) Sensitivity analysis of Bayesian networks with stochastic emulators. Paper presented to Joint ENBIS-DEINDE 2007 Conference on ‘Computer Experiments versus Physical Experiments’, University of Turin, 11 April.
Cugnata F and Salini S. “Model-Based Approach for Importance-Performance Analysis”, Quality and Quantity. 10.1007/s11135-013-9940-3, October 2013.
Kenett RS and Salini S. “New Frontiers: Bayesian networks give insight into survey-data analysis”, Quality Progress, pp. 31-36, August 2009.
Salini S and Kenett RS. “Bayesian Networks of Customer Satisfaction Survey Data”, Journal of Applied Statistics, Vol. 36, No. 11, pp. 1177-1189, 2009.
Kenett RS. “Introduction to the Special Issue on Non Standard Analysis of Customer Satisfaction Survey Data”, Quality Technology and Quantitative Management, Vol. 7, No. 1, p. i, 2010.
Kenett RS and Salini S. “Modern Analysis of Customer Surveys: comparison of models and integrated analysis”, Applied Stochastic Models in Business and Industry (with discussion), 27, pp. 465–475, 2011.
Kenett RS and Salini S. Modern Analysis of Customer Surveys: with Applications using R, John Wiley and Sons, 2012. http://eu.wiley.com/WileyCDA/WileyTitle/productCd-0470971282.html
Kenett RS. “Applications of Bayesian Networks”, 2012. http://ssrn.com/abstract=2172713
Kenett RS and Steinberg D. “New Frontiers in Design of Experiments”, Quality Progress, pp. 61-65, 2006.
Kenett RS and Shmueli G. “On Information Quality”, Journal of the Royal Statistical Society, Series A (with discussion), 176(4), 2013. http://ssrn.com/abstract=1464444
Lauritzen SL and Spiegelhalter DJ. “Local computations with probabilities on graphical structures and their application to expert systems”, Journal of the Royal Statistical Society, Series B (Methodological), 50, 2, pp. 157-224, 1988.
Pearl J. Causality: Models, Reasoning, and Inference, Cambridge University Press, UK, 2000.
Perucca G and Salini S. “Travellers’ Satisfaction with Railway Transport: A Bayesian Network Approach”, Quality Technology & Quantitative Management, in press, 2014.
