Quantitative Modeling of Customer Perception from Service Data using Evolutionary Optimization

Sunith Bandaru and Kalyanmoy Deb
Kanpur Genetic Algorithms Laboratory
Indian Institute of Technology Kanpur
Kanpur 208016, U.P., India
{sunithb,deb}@iitk.ac.in

Vineet Khare and Rahul Chougule
India Science Lab, General Motors Global R&D
GM Technical Centre India Pvt Ltd
Bangalore 560066, India
{vineet.khare,rahul.chougule}@gm.com

ABSTRACT

This paper proposes a novel method for using the service (field failure) data of consumer vehicles to estimate customer perception. To achieve this, relevant variables are extracted from the vehicle service data and provided as input to the proposed algorithm, which then produces an optimized mathematical model for predicting the Customer Satisfaction Index (CSI). The methodology is then extended to allow comparison of the CSIs of two or more vehicle models, thus providing a measure of the market's perceived quality of one vehicle model relative to another. Validation against the Consumer Reports data shows that customer experiences, and their consequent responses in surveys, are indeed a reflection of the numbers the service data provides. However, it is argued that the proposed model is more generic than the Consumer Reports ratings because (1) it does not rely on consumer surveys and (2) it can be used to assess satisfaction at the level of the individual consumer.

Categories and Subject Descriptors
I.6 [Simulation and Modeling]: Applications

General Terms
Performance

Keywords
Customer satisfaction index (CSI), mathematical model, evolutionary optimization

1. INTRODUCTION

Customer Relationship Management (CRM) involves practical understanding of customer satisfaction and its management. It has been a prominent aspect of business for the past decade. Two different conceptualizations, namely transaction-specific and cumulative, have been proposed in the literature for assessing customer satisfaction [4, 2]. In the transaction-specific approach, customer satisfaction is viewed as a post-purchase experience for a specific purchase transaction [18]. On the other hand, cumulative customer satisfaction is based on the overall purchase and consumption experience with a product or service [14].

In the context of global competition, there has been an increasing awareness of viewing the enterprise as a number of business processes and of improving the processes which are directly linked with customer satisfaction. Quality and customer satisfaction have been recognized as playing a crucial role in the success and survival of a product. Although several conceptual models have been developed to define quality and customer satisfaction, limited scientific literature [1, 3, 5] has been reported on quantifying customer satisfaction. In this regard, a quantitative approach is needed to evaluate quality and to measure the extent to which customer expectations are met.

Within automotive OEMs, quality (or warranty) data analysis is focused on the number of failures in the field (for example, Incidences per Thousand Vehicles or IPTV [19], and Problems per Hundred Vehicles or PPH), and limited emphasis is placed on the assessment of individual customers' (or consumers') perception and satisfaction. In addition, published reports on the assessment of customer satisfaction (such as the American Customer Satisfaction Index, J.D. Power and Consumer Reports) are used by OEMs for insights into vehicle quality. The American Customer Satisfaction Indices are measured on a 0 to 100 scale by several questions that assess customer satisfaction [1]. Another commonly used customer satisfaction index in the automotive domain is provided by J.D. Power and Associates. It periodically reports satisfaction indices related to initial quality, sales satisfaction, dealer maintenance and service satisfaction. Methods used to determine these indices are based on surveys. For example, the Initial Quality Study (IQS) gives information on new vehicle quality after 90 days of ownership. Owners are surveyed regarding problems with their new vehicles [3]. Consumer Reports (CR) surveys assess customer satisfaction based on three aspects: performance, safety and reliability.

For the work presented in this paper, we assess customer satisfaction based on quality and service of the product. In this context, CR reliability ratings are the most relevant assessment [5]. They are obtained from a set of questions related to failures observed by customers in various vehicle subsystems (such as engine and transmission). These survey-based estimates rely on a small sample (usually 200 to 400 per model) and may not always represent the real-world facts. In contrast to survey-based methods, the current work relies

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. GECCO’11, July 12–16, 2011, Dublin, Ireland. Copyright 2011 ACM 978-1-4503-0557-0/11/07 ...$10.00.


on recorded field data to estimate the satisfaction level of each individual customer and aggregates it over a particular vehicle model.

The rest of the paper is organized as follows. Sec. 4 describes the framework for the proposed satisfaction model built on features extracted (Sec. 3) from vehicle sales and service data (Sec. 2). Sec. 5 presents the satisfaction model for a given vehicle model and Sec. 6 presents its extension to multiple vehicle models. Sec. 6 also contains a discussion of the results obtained and their comparison with Consumer Reports' reliability ratings. Sec. 7 concludes with future work directions.

2. VEHICLE SALES AND SERVICE DATA

A recent study by J.D. Power and Associates lists the following as critical elements of customer satisfaction [11]:

1. Quality and reliability (24%) with respect to problems experienced with the vehicle across different vehicle sub-systems.

2. Vehicle appeal (37%) which includes design, comfort, features, etc.

3. Ownership costs (22%) which includes fuel consumption, insurance and costs of post-warranty service or repair.

4. Dealer service satisfaction (17%) with respect to service quality, service time, etc.

Thus, satisfaction related to quality, reliability and service contributes close to 50% towards the overall satisfaction (including some contribution from the ownership costs). Here, we primarily focus on assessing quality, reliability and service satisfaction. Table 1 shows typical fields found in the sales and service data of a vehicle that can be used to assess this satisfaction.

Table 1: Data Fields in the Sales and Service Data of a Vehicle and the Notation used in this Work

Vehicle Sales Data (one-time entry)  | Vehicle Service Data (for i-th visit)
Field               Notation         | Field               Notation
Identification no.  VIN              | Identification no.  VIN
Sale date           d_0              | Repair date         d_i
Mileage at sale     m_0              | Mileage at repair   m_i
Build region        BR               | Repair cost         c_i
Sale region         SR               | Repair time         t_i
                                     | Repair code         r_i

By extracting certain characteristic variables $\mathbf{x}$ from the combined sales and service data, the CSI for a particular vehicle can be approximated as

\[ \mathrm{CSI}_{\mathrm{vehicle}} = f(\mathbf{x}) \tag{1} \]

For each vehicle model there are bound to be customers who would rate their vehicle either very low or very high (due to subjectivity) in terms of quality. This in turn means that $\mathrm{CSI}_{\mathrm{vehicle}}$ will be distributed over its entire range as shown in Figure 1. Simple averaging over this distributed set might lead to a misleading representative. From the manufacturer's point of view, a CSI model which can aggregate the views of all the customers in a deterministic way is much more beneficial for identifying vehicle models with high perceived quality and, more importantly, for highlighting the problem areas in vehicle models which rate low on the customers' perception of quality.

The work presented in this paper is based on the assumption that, given a vehicle model, most of the customers should have a similar overall view (satisfaction) of the vehicle model. This should result in a $\mathrm{CSI}_{\mathrm{vehicle}}$ distribution with low variance (as shown in Figure 2). A function f which models the CSIs based on this assumption would also make more sense for averaging over all customers. Our aim is to find such a function. By aggregating the $\mathrm{CSI}_{\mathrm{vehicle}}$ values for all the vehicles corresponding to a model, we want to find a representative CSI value for that vehicle model. Further, we modify our problem formulation slightly to find CSI functions that can quantitatively rank different vehicle models.

Figure 1: Frequency distribution of CSI from surveys (x-axis: Customer Satisfaction Index (CSI); y-axis: No. of customers; annotations: Surveys, Subjectivity, Varied customer responses). Averaging is not recommended here.

Figure 2: Expected frequency distribution of CSI from the proposed method which makes averaging possible (x-axis: Customer Satisfaction Index (CSI); y-axis: No. of customers; annotations: Service data, Optimum CSI model, Aggregation effect).

3. FEATURE EXTRACTION

Various fields presented in Table 1 are considered significant for CSI model-building. We use them here to define certain features that we call the characteristic variables of the data. The CSI function is made country (or region) specific by omitting the corresponding data fields. It is to be noted that the function f in Equation (1) does not have a pre-specified mathematical form. In fact, as discussed in Section 1, CSI has traditionally been obtained qualitatively through customer surveys, questionnaires, evaluation forms, etc. To the authors' knowledge, this work is the first attempt towards quantifying customer satisfaction. To have a computationally tractable method for modeling the CSI, we logically establish the dependency that each of the following chosen variables has with respect to the CSI.

1. $x_1$: number of visits made by the customer. The VIN or Vehicle Identification Number is unique for each vehicle. By counting the number of times a particular VIN occurs in the service data in a given time-frame, the number of visits made by the customer owning that vehicle can be determined. Two repairs performed on the same day with the vehicle having the same odometer reading are considered a single visit. Since more visits mean lower satisfaction, the dependency can be modeled as $\mathrm{CSI}_{\mathrm{vehicle}} \propto 1/x_1$.

2. $x_2$: sum of all repair times in hours. Each time a vehicle comes in for service or repair, the corresponding repair times are fed into the service database. The total waiting time for a customer over all visits can thus be given by $x_2 = \sum_{i=1}^{x_1} t_i$. Logically, it is easy to infer that $\mathrm{CSI}_{\mathrm{vehicle}} \propto 1/x_2$.

3. $x_3$: sum of all service/repair costs. These costs include the labor, part and miscellaneous costs. The total expenditure on a vehicle for the given period can be obtained as $x_3 = \sum_{i=1}^{x_1} c_i$. It follows that $\mathrm{CSI}_{\mathrm{vehicle}} \propto 1/x_3$.

4. $x_4$: average time interval between visits. The time to first visit is the calendar difference between the earliest instance in the service data and the vehicle sale date. Thereafter, the time interval between subsequent visits can be obtained from the service data alone. The cumulative time intervals are averaged over the number of visits. Mathematically,

\[ x_4 = \frac{1}{x_1}\left\{ d_1 - d_0 + \sum_{i=2}^{x_1} \left(d_i - d_{i-1}\right) \right\} \]

A larger value of $x_4$ means longer problem-free vehicle use and hence higher customer satisfaction. Therefore, $\mathrm{CSI}_{\mathrm{vehicle}} \propto x_4$.

5. $x_5$: average miles run between visits. Like the time intervals, the miles run by the vehicle without problems can be obtained from the odometer readings in the sales and service data:

\[ x_5 = \frac{1}{x_1}\left\{ m_1 - m_0 + \sum_{i=2}^{x_1} \left(m_i - m_{i-1}\right) \right\} \]

It is easy to conclude that $\mathrm{CSI}_{\mathrm{vehicle}} \propto x_5$.

6. $x_6$: sum of problem severity ratings. Each vehicle visit is associated with a repair code $r_i$ which defines the type of service or repair performed. All repair codes are assigned a severity rating between 1 (minor problem; e.g. oil change) and 5 (major problem; e.g. engine replacement) by subject matter experts. Since severity rating has a negative impact on the CSI, we have $\mathrm{CSI}_{\mathrm{vehicle}} \propto 1/x_6$.

4. CSI MODEL FRAMEWORK

The characteristic variables extracted above vary in different ranges. Table 2 shows some statistics for three vehicle models that we use in this work. The variables were extracted from the sales and service data spanning six months. It is apparent that the direct use of these variables for constructing f will lead to a biased CSI model, since numerically the variables will have varying effect on the CSI. To circumvent this, all variables are normalized linearly to the range [0, 1] using

\[ x_i^{\mathrm{nr},(j)} = \frac{x_i^{(j)} - \min_j x_i^{(j)}}{\max_j x_i^{(j)} - \min_j x_i^{(j)}} \quad \forall\, i. \tag{2} \]

Table 2: Some Statistics of Extracted Variables

Stat.   x1      x2      x3         x4     x5       x6
Model 1 (7310 customers)
Min.    1       0       0          1      1        1
Max.    10      19.8    7662.73    178    89547    42
Mean    1.284   0.936   177.0      60.4   2876.1   3.3
Model 2 (5836 customers)
Min.    1       0       0          0.5    1        1
Max.    7       24.9    7478.17    178    83821    26
Mean    1.215   1.069   192.99     61.6   3284.6   2.9
Model 3 (876 customers)
Min.    1       0       2.04       1      1        1
Max.    7       22.6    11026.71   168    999959   25
Mean    1.475   1.424   351.32     47.2   3287.4   3.9

It can be shown that such a linear normalization does not affect the frequency distribution. Keeping in mind the logical dependencies established earlier, we now introduce the following notation for simplicity:

\[ X_i = \frac{1}{1 + x_i^{\mathrm{nr}}} \quad \text{for } i = 1, 2, 3, 6 \tag{3} \]

\[ X_i = 1 + x_i^{\mathrm{nr}} \quad \text{for } i = 4, 5 \tag{4} \]

The addition of the constant (one here) in (3) ensures that the CSI does not approach infinity for vehicles with minimum corresponding $x_i$(s). But this effectively causes the shifted variables $(1 + x_i^{\mathrm{nr}})$ to be mapped to [1, 2]. Hence the same constant is also added in (4). Equation (1) can now be re-written as

\[ \mathrm{CSI}_{\mathrm{vehicle}} = f(X_1, X_2, X_3, X_4, X_5, X_6) \tag{5} \]

As noted earlier, no study exists which defines a functional form for f. Our goal therefore is to find an f which

(a) can produce a CSI distribution with low variance (as shown in Figure 2) for a given vehicle model.

(b) is flexible enough to differentiate between two or more vehicle models which have clearly different perceived quality in the market.

The following two forms were initially considered for the CSI model:

1. Purely multiplicative form

\[ \mathrm{CSI}_{\mathrm{vehicle}} = X_1^{\alpha_1} X_2^{\alpha_2} X_3^{\alpha_3} X_4^{\alpha_4} X_5^{\alpha_5} X_6^{\alpha_6} \]

2. Purely additive form

\[ \mathrm{CSI}_{\mathrm{vehicle}} = X_1^{\alpha_1} + X_2^{\alpha_2} + X_3^{\alpha_3} + X_4^{\alpha_4} + X_5^{\alpha_5} + X_6^{\alpha_6} \]
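As a rough illustration of Sections 3 and 4 up to this point, the following Python sketch computes the six characteristic variables for one vehicle, applies the normalization of Equation (2) and the transformation of Equations (3) and (4), and evaluates the purely multiplicative form. The names `Visit`, `extract_features`, `normalize`, `transform` and `csi_multiplicative` are hypothetical, time intervals are assumed to be measured in days, and repairs that count as a single visit are assumed to have been merged beforehand; none of this is taken from the authors' implementation.

```python
from dataclasses import dataclass
from datetime import date
from math import prod


@dataclass
class Visit:
    repair_date: date  # d_i
    mileage: float     # m_i
    cost: float        # c_i
    hours: float       # t_i
    severity: int      # severity rating (1..5) of repair code r_i


def extract_features(sale_date, sale_mileage, visits):
    """Return [x1, ..., x6] for one vehicle; `visits` must be non-empty."""
    visits = sorted(visits, key=lambda v: (v.repair_date, v.mileage))
    x1 = len(visits)
    x2 = sum(v.hours for v in visits)
    x3 = sum(v.cost for v in visits)
    # d1 - d0 + sum(d_i - d_{i-1}) telescopes to (last repair date - sale date).
    x4 = (visits[-1].repair_date - sale_date).days / x1
    x5 = (visits[-1].mileage - sale_mileage) / x1
    x6 = sum(v.severity for v in visits)
    return [x1, x2, x3, x4, x5, x6]


def normalize(columns):
    """Equation (2): min-max scale each variable over all vehicles j."""
    scaled = []
    for col in columns:  # one list per variable x_i, one entry per vehicle
        lo, span = min(col), max(col) - min(col)
        # Assumption: a variable that is constant over all vehicles maps to 0.
        scaled.append([(v - lo) / span if span else 0.0 for v in col])
    return scaled


def transform(x_nr):
    """Equations (3) and (4) for one vehicle's six normalized variables."""
    inverse = {0, 1, 2, 5}  # zero-based positions of x1, x2, x3 and x6
    return [1 / (1 + v) if i in inverse else 1 + v for i, v in enumerate(x_nr)]


def csi_multiplicative(X, alpha):
    """Purely multiplicative form with one exponent per variable."""
    return prod(Xi ** a for Xi, a in zip(X, alpha))


features = extract_features(
    date(2010, 1, 1), 10.0,
    [Visit(date(2010, 1, 31), 1010.0, 50.0, 1.5, 1),
     Visit(date(2010, 3, 2), 3010.0, 200.0, 3.0, 4)],
)
print(features)
```

In a full pipeline, `normalize` would be applied across all vehicles of a model before `transform` is applied to each vehicle, so that the minima and maxima in Equation (2) are taken over the whole customer set.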

The α’s are model parameters obtained through the algorithm discussed later. While both the above forms satisfactorily achieved goal (a), they lacked the flexibility that


goal (b) requires. Results with these forms (omitted here for conciseness) prompted the development of a more complex adaptive CSI model described next.

The model proposed here is adaptive in the sense that the mathematical form is not assumed a priori but is adaptively built by our algorithm. It is composed of six terms. Each of these terms resembles the purely multiplicative form discussed above. That is to say,

\[ T_l = \prod_{i=1}^{6} X_i^{\alpha_{il}\beta_{il}} \quad \forall\, l \in \{1, \dots, 6\} \tag{6} \]

where the $\beta_{il}$'s are Boolean decision variables which effectively indicate the presence (1) or absence (0) of the i-th transformed variable in the l-th term. This ability of the model to choose specific $X_i$'s for each term gives it a finer level of flexibility. Whenever a term evaluates to unity (i.e. when $\beta_{il} = 0\ \forall\, i$), it is simply ignored in further computations. To induce a higher level of adaptation, adjacent terms are allowed to be operated upon by either a multiplication or an addition. The inverse operations (division and subtraction) are not considered since the positive or negative correlation of $\mathrm{CSI}_{\mathrm{vehicle}}$ to each characteristic variable $x_i$ is already incorporated into the transformed variables $X_i$. For the same reason the powers $\alpha_{il}$ are considered to be non-negative later. The use of just two operators also enables us to have a Boolean encoding to represent them. With six terms, we have five additional Boolean decision variables. Note that the adaptive model is now capable of taking either the purely multiplicative or the purely additive form.

The presence of 41 (= 36 + 5) Boolean variables also makes the model encodable through a binary representation. Figure 3 shows an example. Every seventh bit is the variable $\gamma_l$ which defines the arithmetic operation between the terms $T_l$ and $T_{l+1}$; 0 for multiplication and 1 for addition. The usual precedence order is followed for the operators (multiplication and then addition) to evaluate the CSI. It is easy to see that through the above binary encoding for the adaptive CSI model, we are trying to mimic the tree structure of a genetic programming module with $T = \{\alpha_{il}\ \forall\, i, l = 1, 2, \dots, 6\}$ as the terminal set and $F = \{\times, +\}$ as the functional set [15]. We reiterate here that though this proposed representation does not lead to the most generic form for the CSI function, it was found during simulations to be flexible enough to address goal (b).

Figure 3: Binary representation for the proposed adaptive form of the CSI model.

Bits of T1: β11 β21 β31 β41 β51 β61 = 1 0 1 1 0 0, followed by γ1 = 0
Bits of T2: β12 β22 β32 β42 β52 β62 = 1 1 0 0 0 0, followed by γ2 = 1
Bits of T3: β13 β23 β33 β43 β53 β63 = 0 0 0 0 1 1, followed by γ3 = 1
(remaining bits not shown)

The string evaluates to

\[ \mathrm{CSI}_{\mathrm{vehicle}} = X_1^{\alpha_{11}} X_3^{\alpha_{31}} X_4^{\alpha_{41}} \times X_1^{\alpha_{12}} X_2^{\alpha_{22}} + X_5^{\alpha_{53}} X_6^{\alpha_{63}} + \cdots = X_1^{\alpha_{11}+\alpha_{12}} X_2^{\alpha_{22}} X_3^{\alpha_{31}} X_4^{\alpha_{41}} + X_5^{\alpha_{53}} X_6^{\alpha_{63}} + \cdots \]

5. CUSTOMER LEVEL CSI MODEL

We now utilize the adaptive form of the CSI model in our problem formulation to arrive at a CSI function which, when evaluated for different customers, produces a distribution (or histogram) with low variance (Figure 2). The binary representation of the model and the calculation of $\mathrm{CSI}_{\mathrm{vehicle}}$ for numerous sets of extracted variables, where each set represents a customer, prompt us to use a genetic algorithm (GA).

Once the CSI is evaluated for all customers, the values are normalized to have a standardized scale, similar to what customer surveys provide. A narrower CSI distribution means better agreement among the customers, thus making averaging more sensible. It is intuitive that this can be accomplished by minimizing the variance (or standard deviation) of these normalized CSI values. However, there are two pitfalls here which take the algorithm towards trivial solutions.

Firstly, noting that the $\alpha_{il}$'s are non-negative, a simple minimization of the variance causes them to approach the lower bound of zero. This in turn leads to a CSI model which is insensitive to the input vector X and outputs similar $\mathrm{CSI}_{\mathrm{vehicle}}$ values for all customers. The problem becomes apparent when these values are normalized linearly to the [0, 1] scale. The corresponding CSI distribution, though having a low variance, is severely right-skewed as will be shown later in Figure 4. This immediately motivates us to consider the absolute skewness of the CSI distribution as a second objective. The idea is thus to find a CSI model which has a good trade-off with respect to both variance and skewness. The decision-making issue of selecting one model from the many trade-off solutions is addressed later in this section.

The second pitfall is due to the Boolean variables involved. When the GA comes across a model with $\beta_{il} = 0\ \forall\, i, l$, it sees the solution as the global minimum with respect to the variance. However, such a solution is also trivial due to its insensitivity to the input X. Consideration of the skewness objective does not solve this problem. Hence, we introduce the constraints $\sum_l \beta_{il} \ge 1\ \forall\, i$, which basically impose that each extracted variable $X_i$ is used in at least one of the terms. This also makes practical sense since the six extracted variables are the only information that can be derived from the sales and service data and we would want the resulting CSI model to use all of them.

We now put forth the bi-objective optimization problem formulation for finding the customer level CSI model from the sales and service data of c customers of a vehicle model. In the following, $\mathrm{CSI}^{\mathrm{nr},(j)}_{\mathrm{vehicle}}$ represents the CSI value (normalized between zero and one) for vehicle j. Note the arbitrary upper bound on the $\alpha_{il}$'s. Any other value may also be used. The lower bound of zero was justified earlier in this section.

\[
\begin{aligned}
\text{Minimize} \quad & \sigma \\
\text{Minimize} \quad & |g| \\
\text{Subject to} \quad & \sum_{l} \beta_{il} \ge 1 \quad \forall\, i \in \{1, \dots, 6\} \\
\text{36 real variables:} \quad & 0 \le \alpha_{il} \le 1 \quad \forall\, i, l \in \{1, \dots, 6\} \\
\text{36 Boolean variables:} \quad & \beta_{il} \in \{0, 1\} \quad \forall\, i, l \in \{1, \dots, 6\} \\
\text{5 Boolean variables:} \quad & \gamma_{l} \in \{0, 1\} \quad \forall\, l \in \{1, \dots, 5\}
\end{aligned}
\tag{7}
\]

where,

\[
\sigma = \sqrt{\frac{1}{c}\sum_{j=1}^{c}\left(\mathrm{CSI}^{\mathrm{nr},(j)}_{\mathrm{vehicle}} - \mu\right)^{2}}, \qquad
\mu = \frac{1}{c}\sum_{j=1}^{c}\mathrm{CSI}^{\mathrm{nr},(j)}_{\mathrm{vehicle}}, \qquad
g = \frac{\frac{1}{c}\sum_{j=1}^{c}\left(\mathrm{CSI}^{\mathrm{nr},(j)}_{\mathrm{vehicle}} - \mu\right)^{3}}{\sigma^{3}}
\]

Table 4: Objective Function Values of the Three Vehicle Models at their Knee Points. The terms ($T_i$'s), and the way they are combined, need not be the same in the three CSI models.

Vehicle model   Std. dev. σ   Skewness |g|   CSI function
Model 1         0.03591       0.05078        CSI_Model 1 = T1 + T2 × T3 × T4 + T5 + T6
Model 2         0.03786       0.00517        CSI_Model 2 = T1 + T2 + T3 × T4 × T5 + T6
Model 3         0.04099       0.00316        CSI_Model 3 = T1 + T2 + T3 + T4 + T5 + T6
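The encoding of Figure 3 and the two objectives of formulation (7) can be sketched in Python as below. This is only an illustration under stated assumptions: `adaptive_csi` and `objectives` are hypothetical names, `alpha` and `beta` are assumed to be 6x6 nested lists indexed `[i][l]`, and a term whose β bits are all zero is assumed to be skipped without affecting its neighbours (the paper only says such terms are "ignored").

```python
from statistics import fmean, pstdev


def adaptive_csi(X, alpha, beta, gamma):
    """Evaluate the adaptive CSI form of Equation (6) for one vehicle.

    X     : six transformed variables X_1..X_6
    alpha : real exponents, alpha[i][l]
    beta  : 0/1 inclusion flags, beta[i][l]
    gamma : five operator bits between adjacent terms (0 = multiply, 1 = add)
    """
    total, product, used = 0.0, 1.0, False
    for l in range(6):
        active = [i for i in range(6) if beta[i][l]]
        if active:  # assumption: terms with no active variable are skipped
            term = 1.0
            for i in active:
                term *= X[i] ** alpha[i][l]
            product *= term
            used = True
        # An addition bit (or the end of the string) closes the current
        # product group, which gives multiplication precedence over addition.
        if l == 5 or gamma[l] == 1:
            if used:
                total += product
            product, used = 1.0, False
    return total


def objectives(csi_values):
    """Return (sigma, |g|) of formulation (7) for raw CSI values of c customers."""
    lo, span = min(csi_values), max(csi_values) - min(csi_values)
    if span == 0:
        # Assumption: an input-insensitive model is reported as infinitely skewed.
        return 0.0, float("inf")
    nr = [(v - lo) / span for v in csi_values]  # normalized to [0, 1]
    mu = fmean(nr)
    sigma = pstdev(nr, mu)                      # population form, divides by c
    g = fmean((v - mu) ** 3 for v in nr) / sigma ** 3
    return sigma, abs(g)


# The bit string of Figure 3 with all exponents set to 1 for readability.
beta = [[0] * 6 for _ in range(6)]
for i, l in [(0, 0), (2, 0), (3, 0), (0, 1), (1, 1), (4, 2), (5, 2)]:
    beta[i][l] = 1
alpha = [[1.0] * 6 for _ in range(6)]
gamma = [0, 1, 1, 1, 1]
example = adaptive_csi([2, 3, 5, 7, 11, 13], alpha, beta, gamma)
print(example)
```

With unit exponents the example reduces to (X1 X3 X4)(X1 X2) + X5 X6, which matches the expression given in the caption of Figure 3.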

For our study, we have considered three vehicle models available in the market which, according to a recent Consumer Reports study [6], have reliability ratings as shown in Table 3. Models with unfavorable reliability ratings are intentionally chosen so that problem areas in these vehicle models can be subsequently analyzed. However, this analysis is beyond the scope of this paper and is not presented here.

Table 3: Consumer Reports Reliability Ratings [6].

Model     Reliability Rating
Model 1   Average
Model 2   Average
Model 3   Worse Than Average

Figure 4 shows the non-dominated solutions obtained by solving (7) for each vehicle model individually using NSGA-II [9]. A randomly initialized population of size 2000 is evolved in each case over 500 generations. The SBX crossover and polynomial mutation are used with probabilities of 0.9 and 0.05 respectively for all real variables. The distribution indices for these operators are taken as $\eta_c = 10$ and $\eta_m = 50$. For the binary string, the two-point crossover and the bitwise mutation are used with probabilities of 0.9 and 0.15 respectively. The selection operation is based upon tournaments played between pairs of population members. It is interesting to note that the numbers of non-dominated points in the final populations for Models 1, 2 and 3 are 290, 296 and 152 respectively.

Each point on these fronts represents a different CSI model. The sharp kink in the fronts indicates the presence of a knee. The knee point of a two-dimensional Pareto-optimal front is the solution which gives maximum trade-off with respect to both objectives. Due to this characteristic it is also often called the preferred solution. Identification and significance of knee points and knee regions has been discussed in [10]. We use the normal-boundary intersection approach [7] to identify a single knee point for our purpose. Figure 4 also shows the $\mathrm{CSI}^{\mathrm{nr}}_{\mathrm{vehicle}}$ distributions for three solutions of Model 1: the extreme solutions and the knee point. It is clear in our case why the latter should be the preferred solution. Table 4 shows the function values at the knee point for all three vehicle models and the structure of the knee-point CSI models. The corresponding $\alpha_{il}$ and $\beta_{il}$ values (36 + 36 in number) are omitted here for brevity.

6. MULTIPLE VEHICLE CSI MODEL

The customer level CSI model focuses on obtaining a narrow and low-skewness $\mathrm{CSI}^{\mathrm{nr}}_{\mathrm{vehicle}}$ distribution, thus aggregating the views of all the customers and making averaging possible. However, as seen in Table 4, the obtained CSI function f differs from one vehicle model to another and hence does not provide a common basis for comparing or ranking them. The multiple vehicle CSI model is obtained by modifying (7) to reflect this. The objective here is to differentiate between two or more vehicle models as distinctly as possible while ensuring that we still achieve (for each vehicle model considered) what the customer level CSI model does. By maximizing the absolute difference between the means of the $\mathrm{CSI}^{\mathrm{nr}}_{\mathrm{vehicle}}$ values of two vehicle models, the two-vehicle CSI model can be obtained. Extending this for multiple vehicle models simply means maximizing the sum of these absolute differences over all pairs of vehicle models, since the CSI values are already normalized.

The addition of the above objective makes the problem complex. However, since we are only interested in the knee point with respect to the σ and g objectives, a focused search in the near-knee region can be carried out by converting these objectives into constraints. We use the results of the customer level CSI model to impose upper bounds on the two objectives. The region of interest is shown in Figure 4. Using the region boundaries as constraints, the single objective problem for multiple vehicle models is given as,

\[
\begin{aligned}
\text{Maximize} \quad & \sum_{\{m, n \,|\, m \ne n\}} |\mu_m - \mu_n| \\
\text{Subject to} \quad & \sigma_m \le 0.05 \quad \forall\, m \\
& |g_m| \le 1.0 \quad \forall\, m \\
& \sum_{l} \beta_{il} \ge 1 \quad \forall\, i \in \{1, \dots, 6\} \\
\text{36 real variables:} \quad & 0 \le \alpha_{il} \le 1 \quad \forall\, i, l \in \{1, \dots, 6\} \\
\text{36 Boolean variables:} \quad & \beta_{il} \in \{0, 1\} \quad \forall\, i, l \in \{1, \dots, 6\} \\
\text{5 Boolean variables:} \quad & \gamma_{l} \in \{0, 1\} \quad \forall\, l \in \{1, \dots, 5\}
\end{aligned}
\tag{8}
\]

where $\sigma_m$, $\mu_m$ and $g_m$ are respectively the standard deviation, mean and skewness of the normalized CSI values of the m-th vehicle model (having $c_m$ customers) calculated using the same formulae as in (7). The only difference in notation here is that nr represents CSI normalization (between zero and one) over all vehicle models. A GA-based optimization algorithm is used which takes the sales and service data of all the vehicle models as input.

Readers might draw parallels between this approach and discriminant analysis methods [13, 17] used in statistics to classify samples based on their features. However, there are issues like the strict assumption of normality, the requirement of a training set and the non-invertibility of covariance matrices which prevent us from using these methods for our purpose. Also, unlike the proposed approach, discriminant methods are not concerned with distribution characteristics, such as skewness, of the classified samples.
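A minimal Python sketch of how the objective and the σ and |g| constraints of formulation (8) might be evaluated for one candidate CSI function is given below. The function names and the dictionary layout are assumptions made for illustration, the pairwise sum is taken over unordered pairs of vehicle models, and every model is assumed to have non-constant CSI values.

```python
from itertools import combinations
from statistics import fmean, pstdev


def pairwise_gap(means):
    """Sum of |mu_m - mu_n| over all unordered pairs of vehicle models."""
    return sum(abs(a - b) for a, b in combinations(means, 2))


def multi_vehicle_objective(csi_by_model, sigma_max=0.05, g_max=1.0):
    """Evaluate formulation (8) for raw CSI values grouped by vehicle model.

    csi_by_model: dict mapping a model name to the list of raw CSI values of
    its customers, all produced by the same candidate CSI function.
    Returns (objective, feasible, stats) with stats[name] = (mu, sigma, |g|).
    """
    # Normalization is carried out over the customers of all models together.
    pooled = [v for values in csi_by_model.values() for v in values]
    lo, span = min(pooled), max(pooled) - min(pooled)
    stats = {}
    for name, values in csi_by_model.items():
        nr = [(v - lo) / span for v in values]
        mu = fmean(nr)
        sigma = pstdev(nr, mu)
        g = fmean((v - mu) ** 3 for v in nr) / sigma ** 3
        stats[name] = (mu, sigma, abs(g))
    objective = pairwise_gap([s[0] for s in stats.values()])
    feasible = all(s <= sigma_max and g <= g_max for _, s, g in stats.values())
    return objective, feasible, stats


toy = {"A": [0.0, 0.1, 0.2], "B": [0.8, 0.9, 1.0]}
objective, feasible, _ = multi_vehicle_objective(toy)
print(objective, feasible)
```

The toy data are deliberately too spread out to satisfy the bound on σ, so the candidate is reported as infeasible even though its objective value is large. A GA would then have to handle the violation, for example through the constraint handling scheme it uses.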

Figure 4: Trade-off fronts for Models 1, 2 and 3 obtained using the customer level CSI model highlighting the knee region (x-axis: standard deviation σ; y-axis: absolute skewness |g|; the region of interest is marked). Insets show the $\mathrm{CSI}^{\mathrm{nr}}_{\mathrm{vehicle}}$ distributions (No. of customers against normalized CSI) for three solutions of Model 1: one extreme with σ = 0.02542, g = 9.99433; the knee-point solution with σ = 0.03591, g = 0.05078; and the other extreme with σ = 0.10872, g = 0.00002. Notice the change in the CSI distribution of Model 1 from one extreme solution to another.
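For a two-objective front such as those in Figure 4, one simple way to locate a knee is to pick the point farthest from the straight line joining the two extreme solutions after scaling both objectives to [0, 1]. The Python sketch below follows that idea, which is in the spirit of the normal-boundary intersection approach cited in the text but is a simplification and not necessarily the authors' exact procedure. The function name `knee_point` is hypothetical, and the front is assumed to contain at least two distinct non-dominated points.

```python
def knee_point(front):
    """Return the knee of a two-objective minimization front.

    front: list of (sigma, abs_g) tuples that are mutually non-dominated.
    """
    s_lo, s_hi = min(p[0] for p in front), max(p[0] for p in front)
    g_lo, g_hi = min(p[1] for p in front), max(p[1] for p in front)
    # Scale both objectives to [0, 1] so that neither dominates the distance.
    scaled = [((s - s_lo) / (s_hi - s_lo), (g - g_lo) / (g_hi - g_lo))
              for s, g in front]
    # On a non-dominated front the extremes scale to (0, 1) and (1, 0), so the
    # line joining them is u + v = 1. The distance of a point lying below that
    # line is proportional to 1 - u - v.
    best = max(range(len(front)),
               key=lambda k: 1.0 - scaled[k][0] - scaled[k][1])
    return front[best]


# The three Model 1 solutions annotated in Figure 4.
model_1 = [(0.02542, 9.99433), (0.03591, 0.05078), (0.10872, 0.00002)]
knee = knee_point(model_1)
print(knee)
```

Applied to the three annotated Model 1 solutions, the sketch selects the solution with σ = 0.03591, which is the one labelled as the knee point in Figure 4.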

6.1 Results and Discussion

The developed multiple vehicle CSI model is now applied to pairwise combinations of the three vehicle models in Table 3. A straightforward GA which can handle both real and binary variables is used to solve (8). While the population size and number of generations are chosen to be 500 and 10000 respectively, other GA parameters and operators are the same as those specified for NSGA-II in Sec. 5. Constraints are handled using the penalty-parameterless approach [8].

Models 1 and 2, both having an overall 'Average' Consumer Reports rating, produce CSI distributions that almost overlap as shown in Figure 5, where normal distribution curves are fitted to the CSI values to clearly illustrate the location of the means. Note that despite maximizing the difference between the means of the two distributions, an objective value of only 0.00853 was achieved. This implies that the models are indeed very similar in terms of satisfaction. However, when Models 1 and 3 are considered, the 'Worse Than Average' rating of Model 3 is clearly reflected as shown in Figure 6. Similar is the case when Models 2 and 3 are compared, also shown in Figure 6. Using the obtained CSI distributions, the relative ranking of the three models with respect to the mean CSI can be established as,

\[ \mathrm{CSI}_{\mathrm{Model\ 1}} \approx \mathrm{CSI}_{\mathrm{Model\ 2}} > \mathrm{CSI}_{\mathrm{Model\ 3}}. \tag{9} \]

Figure 5: CSI distributions for Models 1 and 2 with $|\mu_1 - \mu_2| = 0.00853$ (x-axis: $\mathrm{CSI}^{\mathrm{nr}}_{\mathrm{vehicle}}$; y-axis: No. of customers).

When the service data of all three models is used, the obtained optimal CSI distributions are shown in Figure 7. It confirms the ranking deduced above. The optimum CSI models obtained in all the cases are shown in Table 5 along with the mean, standard deviation and skewness of the corresponding CSI distributions.

An important observation from these results is that the proposed method may be underestimating the difference between an 'Average' and a 'Worse Than Average' vehicle model. We offer the following plausible explanations for this behavior:

1. First, the adaptive model that we started with may not be generic enough to capture subtle variations in the extracted variables. A better generic form can be achieved through genetic programming (GP) or a neural network, in which case the CSI function itself would be a black box. Recall that our argument for selecting a reduced function set was to be able to implement the CSI model-building in a genetic algorithm based approach. A GP on the other hand is generally computationally expensive.

2. Secondly, we note the fact that the automotive industry is a very competitive one. It has also been one of the fastest growing consumer sectors since the industrial revolution. Lately with the shift of focus towards customers, most major automotive companies are frequently upgrading their technology so as to make better vehicles and maintain their market share. The result is a quality crunch meaning that no manufacturer makes vehicles that are ‘too bad’ or ‘too good’. The proposed multiple vehicle CSI model could be reflecting this fact. Our reason to believe in this theory is that the objective values obtained from both the plots in Figure 6 are nearly the same. 3. The third reason concerns not the proposed method but the validation data itself. According to the Con-

40

70

35

Model 1 Model 3

40 30 20

Table 5: Distribution Characteristics of the obtained Multiple Vehicle CSI Models. The terms (Ti s), and the way they are combined, need not be the same in the vehicle combinations considered. Vehicle combinations Mean Std. dev. Skewness & CSI function μm σm |gm | Model 1 0.31946 0.04999 0.81335 Model 2 0.32799 0.04776 0.68732 CSIM odel 12 T1 + T2 + T3 + T4 + T5 + T6 Model 1 0.37412 0.03605 0.92016 Model 3 0.35608 0.04998 0.95858 CSIM odel 13 T1 × T2 + T3 + T4 + T5 × T6 Model 2 0.37592 0.03683 0.97813 Model 3 0.35739 0.04997 0.92109 CSIM odel 23 T1 + T2 + T3 + T4 + T5 + T6 Model 1 0.36898 0.03529 0.98196 Model 2 0.36988 0.03660 0.99339 Model 3 0.35143 0.04998 0.99303 CSIM odel 123 T1 + T2 + T3 + T4 + T5 + T6

Model 2 Model 3

30

50

No. of customers

No. of customers

60

25 20 15 10

10

5

0 0

0.2

0.4 nr 0.6 CSIvehicle

0.8

1

0 0

0.2

0.4 nr 0.6 CSIvehicle

0.8

1

Figure 6: CSI distributions for models 1 and 3 with |μ1 − μ3 | = 0.01804 (left) and for models 2 and 3 with |μ2 − μ3 | = 0.01853 (right). 70 60

Model 1 Model 2

No. of customers

50

Model 3

The adjusted degrees of freedom for the variance estimate in the denominator is given by, «2 „ 2 σm σ2 + n cm cn ν= (12) 4 σn4 σm + c2m (cm − 1) c2n (cn − 1)

40 30 20 10 0.25 0 0

0.2

0.4 nr CSI

0.6

0.3 0.8

0.35

0.4

0.45

Note that μ and σ here are the sample mean and variance respectively contrary to the conventional notation in statistics. As is customary in such tests, we use α = 5% significance level. Table 6 shows the values of t, ν and the corresponding 95% confidence interval for the pairwise combinations in Table 5.

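[Figure 7 plot: histogram of No. of customers (vertical axis) against CSI^nr_vehicle (horizontal axis, 0 to 1) for Model 1, Model 2 and Model 3, with a magnified inset.]

Figure 7: CSI distributions obtained using multiple vehicle CSI model for all three vehicle models with |μ1 − μ2| + |μ1 − μ3| + |μ2 − μ3| = 0.03690.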
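The separation value quoted in the caption of Figure 7 can be checked directly from the means listed for the three-model combination in Table 5. The snippet below is only an arithmetic check; the helper name `total_separation` is hypothetical and is not taken from the paper.

```python
from itertools import combinations

# Means of the three models under the common CSI function (Table 5, last block).
means = {1: 0.36898, 2: 0.36988, 3: 0.35143}


def total_separation(means):
    """Sum of absolute pairwise differences between the model means."""
    return sum(abs(means[a] - means[b]) for a, b in combinations(sorted(means), 2))


separation = total_separation(means)
print(f"{separation:.5f}")  # prints 0.03690
```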

Table 6: Welch's t-test statistics.

Vehicle model pairs     t          ν
m = 1, n = 2            −9.962     12730.21
m = 1, n = 3            10.358     986.91
m = 2, n = 3            10.594     1022.33

sumer Reports website, the reliability ratings are calculated from the responses of only about 200 to 400 customers per model collected over 12 months. Sampling errors can cause the satisfaction levels to be biased. In contrast, our method utilizes the service data of all customers that visited an authorized service station in the said period of six months. The procedure, however can also be directly applied to longer periods.

Since none of the confidence intervals encloses the hypothesized mean difference of zero, the null hypothesis H0 can be rejected in all three cases with 95% confidence. The alternative hypothesis, which states that the difference between the means is statistically significant, is hence accepted. Further, with the same level of confidence, it can be stated that M1 < M2, M1 > M3 and M2 > M3. The Welch test has thus enabled us to extend Equation (9) to the whole population (of customers of the three vehicle models) as,

The proximity of the distributions shown in Figures 5 and 6 leads us to the question: ‘Are the differences between the means in all the three cases statistically significant for ranking the vehicle models on their basis?’. To answer this, we consider the null hypothesis

$$H_0 : M_m - M_n = 0, \tag{10}$$

where M_i is the population mean for vehicle model i. [16] suggests the use of Welch's t-test for testing this hypothesis. This test is basically an extension of the Student's two sample t-test for independent or unpaired (in statistical hypothesis testing terminology) samples with unequal sizes and variances. The modified t statistic is given by,

$$t = \frac{\mu_m - \mu_n}{\sqrt{\dfrac{\sigma_m^2}{c_m} + \dfrac{\sigma_n^2}{c_n}}}. \tag{11}$$
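As an illustration, the Welch statistic, the adjusted degrees of freedom and a confidence interval for the mean difference can be computed from summary statistics alone. The following is a minimal sketch and not the authors' code. It assumes that c_m and c_n denote the numbers of customers in the two samples. The sample sizes in the usage lines are hypothetical, since they are not reported in this section. The critical value comes from a normal approximation, a simplification that is reasonable only when ν is large.

```python
import math
from statistics import NormalDist


def welch_test(mean_m, sd_m, c_m, mean_n, sd_n, c_n, alpha=0.05):
    """Return Welch's t, the adjusted degrees of freedom and an approximate CI."""
    term_m = sd_m ** 2 / c_m  # sigma_m^2 / c_m
    term_n = sd_n ** 2 / c_n  # sigma_n^2 / c_n
    std_err = math.sqrt(term_m + term_n)
    diff = mean_m - mean_n
    t = diff / std_err
    # Adjusted degrees of freedom; note sigma^4 / (c^2 (c - 1)) = term^2 / (c - 1).
    nu = (term_m + term_n) ** 2 / (term_m ** 2 / (c_m - 1) + term_n ** 2 / (c_n - 1))
    # Assumption: normal approximation to the t critical value (large nu).
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return t, nu, (diff - z * std_err, diff + z * std_err)


# Means and standard deviations of Models 1 and 3 from Table 5;
# the sample sizes 5000 and 900 are hypothetical placeholders.
t, nu, ci = welch_test(0.37412, 0.03605, 5000, 0.35608, 0.04998, 900)
print(t, nu, ci)
```

With the actual sample sizes and the exact t quantile, the same function would be expected to reproduce the corresponding row of Table 6.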

Table 6 (continued): Welch's t-test statistics, 95% confidence interval for (M_m − M_n).

m = 1, n = 2            (−0.0102, −0.0068)
m = 1, n = 3            (0.0146, 0.0214)
m = 2, n = 3            (0.0151, 0.0220)

$$\mathrm{CSI}_{\text{Model 2}} > \mathrm{CSI}_{\text{Model 1}} > \mathrm{CSI}_{\text{Model 3}}. \tag{13}$$

Through our procedure we have not just validated the results of the multiple vehicle CSI model against Consumer Reports, but also established that Model 2 is slightly better than Model 1. The step-rating of Consumer Reports (consisting of the ratings ‘Worse’, ‘Worse than Average’, ‘Average’, ‘Better than Average’ and ‘Better’) and the limited sample sizes prevented this subtle difference from emerging.
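The step from the confidence intervals of Table 6 to the ordering in Equation (13) can be sketched in a few lines. The interval values below are copied from Table 6. The helper `rank_models` and its win-counting rule are hypothetical and serve only to make the reasoning explicit: an interval lying entirely above zero favours model m, and one lying entirely below zero favours model n.

```python
# 95% confidence intervals for (M_m - M_n), copied from Table 6.
intervals = {
    (1, 2): (-0.0102, -0.0068),
    (1, 3): (0.0146, 0.0214),
    (2, 3): (0.0151, 0.0220),
}


def rank_models(intervals):
    """Order models best-first by pairwise wins; fail if any interval encloses zero."""
    wins = {}
    for (m, n), (lo, hi) in intervals.items():
        wins.setdefault(m, 0)
        wins.setdefault(n, 0)
        if lo <= 0.0 <= hi:
            raise ValueError(f"models {m} and {n} are not significantly different")
        # Interval above zero: M_m > M_n. Interval below zero: M_n > M_m.
        wins[m if lo > 0.0 else n] += 1
    return sorted(wins, key=wins.get, reverse=True)


ranking = rank_models(intervals)
print(ranking)  # prints [2, 1, 3]
```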



6.2 A Note on Computation Effort

sions with Dr. Prakash G. Bharati, Dr. Pulak Bandyopadhyay and Dr. Pattada A. Kallappa have been very helpful.

The large number of customers associated with each vehicle model makes evaluating the CSI function computationally intensive. A typical serial GA code with 1000 population members takes more than 24 hours to find the optimum CSI distribution on a machine with 3 GB of memory and four processor cores, each clocked at 2.66 GHz. The processor burden is higher in the case of multiple vehicle CSI modeling. To exploit the inherently parallelizable nature of genetic algorithms, the proposed methodology was implemented on an NVIDIA Tesla C1060 Graphics Processing Unit, which contains 30 streaming multiprocessors, using the CUDA architecture [12]. Since the CSI evaluation for each population member is independent of the others, the computations are assigned to as many threads as there are population members, all operating simultaneously on the device. The obtained values are copied to the host machine, where the mean, skewness and variance are evaluated. This parallelization scheme, implemented at the objective function evaluation stage, yielded speed-ups of about 45x in the case of customer level CSI models.
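The decomposition described above, one independent CSI evaluation per population member followed by summary statistics on the host, can be sketched on a CPU as follows. This is not the CUDA implementation used in the paper. The CSI function shown (a weighted sum of per-customer terms) and the skewness definition (third standardized moment) are assumptions made only for illustration, and every function name is hypothetical. Python threads stand in for GPU threads here and are not expected to deliver a comparable speed-up.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import fmean, pvariance


def csi_values(member, customers):
    """Hypothetical CSI function: weighted sum of each customer's terms."""
    return [sum(w * t for w, t in zip(member, terms)) for terms in customers]


def moments(values):
    """Mean, variance and skewness (third standardized moment) of the values."""
    mu = fmean(values)
    var = pvariance(values, mu)
    skew = fmean([(v - mu) ** 3 for v in values]) / var ** 1.5 if var > 0 else 0.0
    return mu, var, skew


def evaluate_population(population, customers, workers=4):
    # One independent task per population member, mirroring one thread each.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_member = list(pool.map(lambda m: csi_values(m, customers), population))
    # Summary statistics are computed afterwards, as on the host in the text.
    return [moments(values) for values in per_member]


# Toy data: three customers with two terms each, two candidate weight vectors.
customers = [(0.2, 0.4), (0.6, 0.8), (0.4, 0.6)]
population = [(0.5, 0.5), (1.0, 0.0)]
results = evaluate_population(population, customers)
```

Because no member's evaluation reads another member's result, the tasks can be dispatched in any order, which is the property that makes the objective evaluation stage a natural target for parallelization.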

9. REFERENCES
[1] The American Customer Satisfaction Index. ACSI, 2010. www.theacsi.org.
[2] T. Andreassen. Antecedents to satisfaction with service recovery. European Journal of Marketing, 34(1/2):156–175, 2000.
[3] JDPower.com. J.D. Power and Associates, 2010. www.jdpower.com.
[4] W. Boulding, A. Kalra, R. Staelin, and V. Zeithaml. A dynamic process model of service quality: From expectations to behavioral intentions. Journal of Marketing Research, 30(1):7–27, 1993.
[5] ConsumerReports.org. Consumers Union of U.S., Inc., 2010. www.consumerreports.org.
[6] ConsumerReports. ConsumersReports.org, April 2010.
[7] I. Das. On characterizing the “knee” of the Pareto curve based on normal-boundary intersection. Structural and Multidisciplinary Optimization, 18(2):107–115, 1999.
[8] K. Deb. An efficient constraint handling method for genetic algorithms. Computer Methods in Applied Mechanics and Engineering, 186(2–4):311–338, 2000.
[9] K. Deb, S. Agarwal, A. Pratap, and T. Meyarivan. A fast and elitist multi-objective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6(2):182–197, 2002.
[10] K. Deb and S. Gupta. Understanding knee points in bicriteria problems and their implications as preferred solution principles. Engineering Optimization, 2011.
[11] J.D. Power/What Car? 2009 UK Vehicle Ownership Satisfaction Study, 2009. www.testdriven.co.uk.
[12] NVIDIA CUDA: Compute Unified Device Architecture Programming Guide, 2007.
[13] R. Fisher. The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7:179–188, 1936.
[14] C. Fornell, M. Johnson, E. Anderson, J. Cha, and B. Bryant. The American customer satisfaction index: Nature, purpose and findings. The Journal of Marketing, 60(4):7–18, 1996.
[15] J. Koza. Genetic programming: On the programming of computers by means of natural selection. MIT Press, 1992.
[16] R. Lomax. An introduction to statistical concepts for education and behavioral sciences. Lawrence Erlbaum, 2001.
[17] G. McLachlan and J. Wiley. Discriminant analysis and statistical pattern recognition. Wiley, 1992.
[18] R. Oliver. A conceptual model of service quality and service satisfaction: Compatible goals, different concepts. Advances in Services Marketing and Management, 2:65–85, 1993.
[19] J. Robinson and S. Chukova. Estimating mean cumulative functions from truncated automotive warranty data. In Communications of the Fourth International Conference on Mathematical Methods in Reliability, Methodology and Practice, pages CD–ROM (4 pages), Santa Fe, New Mexico, USA, 2004.

7. CONCLUSIONS

In this paper, we presented two CSI model-building approaches that quantify customer opinions at the vehicle and model level. The first approach involved a bi-objective optimization formulation for aggregating the perceptions of different customers about their vehicles through the service data that is constantly recorded by most automotive manufacturers. The obtained trade-off fronts displayed knee behavior, which was used to identify a region of interest for the multiple vehicle CSI model. Results from the latter enabled us to compare different vehicle models with respect to quality and reliability and thus rank them according to their averaged CSI values. The relative ranking was found to agree with the survey based ratings of Consumer Reports.

The current study has opened up a few research directions along similar lines. Some marketing research literature points to a time-varying customer satisfaction index. The present method can be a good starting point for mathematical CSI models that consider the time of vehicle usage. It is a common phenomenon that customer expectations go down with the age of the product. A model depicting this change can be useful to companies for devising future market strategies. As pointed out earlier, studies into the flexibility of the proposed adaptive approach can help classify vehicle models more distinctly. Similarly, identifying vehicles with low CSI, determining the most common problems faced by their owners, and improving the quality of the associated components can lead to an effective improvement in the overall CSI of the vehicle model.

Customer satisfaction measurement has received much attention from marketing researchers in recent times. The subjectivity associated with surveys, though widely accepted, has to be controlled in order to make reliable predictions. This paper has explored a viable direction for achieving this.

8. ACKNOWLEDGMENTS

The first author wishes to thank Rupesh Tulshyan for help with parallelization and CUDA implementation of the algorithm. The financial support and vehicle related data provided by India Science Laboratory, General Motors R&D, in pursuing this research is highly appreciated. Initial discus-
