An Examination of the Quality and Context-Specific Applicability of Commonly Used Customer Satisfaction Measures
Jochen Wirtz
National University of Singapore
Meng Chung Lee
Infocomm Development Authority, Singapore
The authors catalogued commonly used satisfaction measures and then examined their reliability and explained variance. Furthermore, they explored the notion that affective measures may be better in capturing satisfaction with predominantly hedonic products, whereas cognitive measures may be better for products with mostly utilitarian benefits. To do this, the authors examined all measures in a hedonic and utilitarian service context. The authors found that a six-item 7-point semantic differential scale performed best across both service contexts, followed by a four-item 7-point semantic differential scale. The third best measure was a single-item 11-point percentage scale. Furthermore, they rejected the notion that affective measures should be used for hedonic services and cognitive measures for utilitarian services. Rather, measures that were shown to be of good quality (i.e., had high satisfaction loadings, good reliability, and low error variances) were equally applicable for measuring satisfaction for both types of services. In addition, all satisfaction measures captured both affective and cognitive aspects of satisfaction, independent of their scale anchors.
Keywords: customer satisfaction; measurement; comparison; hedonic and utilitarian scales
INTRODUCTION AND RESEARCH BACKGROUND
In today's increasingly competitive business world, satisfying or even delighting one's customers has become an essential ingredient for success (Oliver, Rust, and Varki 1997). Customer satisfaction is an antecedent of repurchase intent, word of mouth, customer loyalty, and ultimately the long-term profitability of a firm (e.g., Bearden and Teel 1983; Fornell 1992; LaBarbera and Mazursky 1983; Oliver 1980, 1997; Oliver and Swan 1989). Firms therefore need to manage customer satisfaction effectively, and doing so requires accurate measurement of satisfaction (Wirtz 2001; Wirtz and Bateson 1995). Although myriad satisfaction measures have been developed, to date there has been no empirical research on differentiating the use of these measures. Measures that have been shown to have high validity and reliability in one or several contexts have been implicitly assumed to be equally applicable in other contexts. One of the key objectives of this study was therefore to directly compare the satisfaction loadings, reliabilities, and error variances of commonly used satisfaction measures across different service contexts.
Furthermore, satisfaction has often been measured using scales anchored in emotive words such as pleased, delighted, and happy. These measures are generally regarded as affective in content because of the nature of their scale anchors (Hausknecht 1990). However, does this imply that these measures actually capture more affective than cognitive postpurchase evaluation compared with measures that have more cognitive scale anchors? This notion has been suggested in the past (Hausknecht 1990) and is examined empirically in the present study.
Finally, it seems reasonable to suggest that the selection of satisfaction measures should depend on at least three factors: (a) the purpose of measurement, (b) product characteristics, and (c) respondent characteristics (Lee and Wirtz 1997). Regarding the first factor, Stauss and Hentschel (1992) demonstrated empirically that the critical incident technique is superior to attribute-specific satisfaction measures in identifying potential areas for service improvement; that is, which type of measure is superior depends on the research objective. Our article investigates the second issue: whether the selection of appropriate satisfaction measures should be a function of product characteristics. For example, affective measures may be better at capturing satisfaction with predominantly hedonic products, whereas cognitive measures may be better for products with mostly utilitarian benefits.
To reiterate, this study has three objectives. First, it directly compares the quality of commonly used satisfaction measures. Second, it explores the affective and cognitive content of those commonly used measures. Finally, it tests whether affective scales are better at measuring satisfaction with hedonic services, and cognitive scales better at measuring satisfaction with utilitarian services.
The authors gratefully acknowledge the excellent research assistance provided by Handy Amin and Patricia Chew. This study was supported by a research grant provided by the NUS Business School, National University of Singapore. Journal of Service Research, Volume 5, No. 4, May 2003 345-355 DOI: 10.1177/1094670503251135 © 2003 Sage Publications
JOURNAL OF SERVICE RESEARCH / May 2003
LITERATURE
Customer Satisfaction
Consumers evaluate their consumption experiences by comparing the perceived product performance with their expectations. This disconfirmation-of-expectations model is the most widely used satisfaction model across virtually all product categories (e.g., Churchill and Surprenant 1982; Oliver 1997; Patterson 2000; Swan and Trawick 1981; Tse and Wilton 1988; Westbrook 1980). It performs well in competitive markets with reasonably knowledgeable customers who are able to match their needs and wants with what they expect from the chosen product (Wirtz and Mattila 2001). Although the disconfirmation-of-expectations model is based mostly on cognitive evaluations of product performance, the role of affect during consumption (e.g., Mano and Oliver 1993; Oliver 1997; Oliver, Rust, and Varki 1997; Westbrook 1987; Wirtz and Bateson 1999; Yu and Dean 2001) and of affective expectations before consumption (termed target arousal by Wirtz, Mattila, and Tan 2000) has increasingly attracted attention. Next, we discuss the influence of hedonic and utilitarian product benefits on customer satisfaction measurement.
Product Benefits
Consumption experience refers to the subjective consciousness of consumers as they interact with goods and services (Oliver and Westbrook 1993). It includes consciously experienced cognitive phenomena such as thoughts, beliefs, and goals, as well as the perception of sensory, emotive, imaginal, and aesthetic responses to the ownership and usage of products (Hirschman and Holbrook 1982). As research on consumption experiences grows, evidence suggests that consumers purchase goods and services for a combination of two types of benefits: hedonic and utilitarian.
Hedonic benefits. Customers obtain the hedonic benefits of a product through its intrinsically pleasing properties, such as the pleasure provided by listening to a musical recording. Hedonic benefits are associated with the sensory and experiential attributes of the product (Batra and Ahtola 1990). Consumers therefore usually purchase products with mostly hedonic benefits for the sake of pleasure and enjoyment. Mano and Oliver (1993) showed that the evaluation of hedonic benefits is mostly affective. Similarly, Dube-Rioux (1990) proposed that affect becomes a key driver of satisfaction when the experiential aspect of consumption is important.
Given that consumers buy certain products primarily for hedonic benefits, and that meeting affective expectations during consumption of such products is likely to be the key driver of satisfaction, it seems intuitively appealing that satisfaction measures high in affective content should capture satisfaction with such products better than measures low in affective content. This article examines that notion. To do so, we also establish the affective and cognitive content of commonly used satisfaction measures.
Wirtz, Lee / CUSTOMER SATISFACTION MEASURES
Utilitarian benefits. Utilitarian benefits of a product are associated with its more instrumental and functional attributes (Batra and Ahtola 1990), for example, the ability of an air conditioner to cool a room. When consumers evaluate their satisfaction with the utilitarian benefits of a product, they primarily evaluate whether the product fulfills their instrumental or functional expectations. Mano and Oliver (1993) showed empirically that the evaluation of the utilitarian benefits of a product is predominantly cognitive. It therefore seems intuitive that, for products with mostly utilitarian benefits, satisfaction measures high in cognitive content capture satisfaction better than measures low in cognitive content. This notion is also examined here.
METHOD
A survey with a self-administered questionnaire was used, allowing "real-life" satisfaction to be tested in different research contexts. A pilot test with 60 participants across 10 different products was conducted using Batra and Ahtola's (1990) hedonic and utilitarian four-item 7-point scales. Two products were identified that provide a high degree of hedonic or utilitarian benefits: ice cream restaurant services for hedonic benefits and automatic teller machine (ATM) services for utilitarian benefits. Both products were selected as research contexts for this study. In the study proper, the degree of hedonic and utilitarian benefits of the two products was measured after the satisfaction scales to avoid potential demand effects. As intended, the ATM service had a significantly higher degree of utilitarian benefits (M = 5.47) than the ice cream restaurant (M = 4.16, p < .001). Similarly, the ice cream restaurant was seen as offering a significantly higher degree of hedonic benefits (M = 5.18) than the ATM service (M = 4.26, p < .001).
A convenience sample of 260 students from a local university was surveyed. Three questionnaires were incomplete and therefore dropped, leaving a final sample of 257. To disguise the objective of this study and to reduce potential demand effects, we introduced the study to participants as a study on consumer satisfaction with ATM services and ice cream restaurants. Participants were asked to evaluate their most recent experience with both products. The order of the products and the order of the satisfaction measures were randomized to minimize potential order effects and to allow them to be tested. In total, 12 versions of the questionnaire with randomized orders of the two products and the satisfaction measures were used.
To examine the issues highlighted in this study, we needed to include measures whose anchors carry different degrees of affect and cognition. Therefore, scales with predominantly affective anchors (e.g., the delighted-terrible scale) and scales with mostly cognitive anchors (e.g., the percentage scale) had to be included. After a review of the satisfaction literature, nine satisfaction measures were chosen (see Table 1), all of which have been commonly used in applied and/or academic research. Six of the nine were single-item overall satisfaction measures, and three were multi-item overall satisfaction measures. Other measures used in this study included Mehrabian and Russell's (1974) pleasure scale, two disconfirmation measures, and two postpurchase behavior scales (see Table 2 for details).
FINDINGS
We first explore the affective and cognitive content of the measures tested. We then examine how the measures perform in the hedonic and in the utilitarian service context. The quality of the satisfaction measures is assessed across both contexts, and we also explore whether the service context (i.e., hedonic vs. utilitarian services) should drive the type of satisfaction measure used.
Investigation Into the Content of Satisfaction Measures
A two-step method was used to determine the affective and cognitive content of all nine satisfaction measures. First, a factor analysis was conducted on the six items of the pleasure scale together with the two one-item disconfirmation measures. Two factors were expected: as highlighted in the literature review, the pleasure scale should capture the affective dimension of consumption and the disconfirmation measures the cognitive dimension. Second, a correlation analysis was conducted between the factor scores and the nine satisfaction measures. This analysis helped to determine the content of the satisfaction measures, that is, whether they capture the cognitive aspects of satisfaction, the affective aspects, or both. A similar approach was used by Westbrook (1983), who examined the correspondence of consumers' emotional product usage experiences to consumer satisfaction measures, and by Williams (1988), who examined the shared variance between job satisfaction measures and two factors representing affect and cognition. The two samples (i.e., the ice cream restaurant and the ATM service) were combined for this analysis.
A factor analysis was first conducted on the six indicators of pleasure and the two indicators of disconfirmation using principal component analysis with varimax rotation. The analysis produced only two factors with eigenvalues above 1, together explaining 60.6% of the variance (37.7% for the first factor and 22.9% for the second). Table 3 shows that the six pleasure indicators loaded highly on the first factor and the disconfirmation measures loaded highly on the second factor. These two factors were therefore labeled affect and cognition. The sixth indicator of the pleasure scale (see Table 2) loaded substantially on both factors, though somewhat more on the affective factor (.60 vs. .44). This may be due to its scale anchors, which refer to the respondent's level of satisfaction (unsatisfied to satisfied). To obtain a cleaner measure of pleasure, we dropped this item from further analysis.
The second part of the analysis examined the correlations of the two factors with each of the satisfaction measures. Measures highly correlated with the affect factor are affective in content, and measures highly correlated with the cognitive factor are cognitive in content. Table 4 shows the correlations of the nine satisfaction measures with the affective and cognitive factors. The affectively anchored happy measure had the highest correlation with the affective factor and the lowest correlation with the cognitive factor. Furthermore, the cognitively anchored percentage, very satisfied–very dissatisfied, Likert-satisfied, and Likert-type (12 items) scales captured more of the cognitive than of the affective factor. However, contrary to expectations, the affect-anchored pleased and delighted-terrible measures had roughly the same amount of cognitive and affective content. Of the multi-item scales, which combine affective- and cognitive-anchored items, both semantic differential (4 items) and semantic differential (6 items) had high correlations with both the cognitive and the affective factor. The Likert-type (12 items) scale correlated highly with the cognitive factor but relatively weakly with the affective factor. Overall, it was somewhat surprising to find that all measures were relatively highly correlated with both the cognitive and the affective factor. The commonly held notion that certain satisfaction measures capture predominantly affect while others capture predominantly cognition therefore had to be rejected.
TABLE 1
Satisfaction Measures Examined in This Study
Measure | Operationalization | Source
Very satisfied–very dissatisfied | One-item 7-point bipolar scale: very satisfied to very dissatisfied | Oliver and Bearden (1983)
Percentage | One-item 11-point percentage degree scale: not at all satisfied to completely satisfied | Oliver and Bearden (1983)
Likert-satisfied | One-item 7-point Likert-type scale: I am satisfied with this experience | Westbrook and Oliver (1981)
Pleased | One-item 7-point bipolar scale: pleased to displeased | Crosby and Stephens (1987)
Happy | One-item 7-point bipolar scale: extremely happy to extremely unhappy | Crosby and Stephens (1987)
Delighted-terrible | One-item 7-point bipolar scale: delighted to terrible | Westbrook (1980)
Semantic differential (4 items) | Four-item 7-point bipolar scale (alpha = 0.92): satisfied to dissatisfied; favorable to unfavorable; pleasant to unpleasant; I like it very much to I didn't like it at all | Eroglu and Machleit (1990)
Semantic differential (6 items) | Six-item 7-point bipolar scale (alpha = 0.93): pleased me to displeased me; contented with to disgusted with; very satisfied with to very dissatisfied with; did a good job for me to did a poor job for me; wise choice to poor choice; happy with to unhappy with | Oliver and Swan (1989)
Likert-type (12 items) | Twelve-item 7-point Likert-type scale (alpha = 0.91): This is one of the best experiences I had in ___; This experience of ___ is exactly what I need; This experience of ___ hasn't worked out as well as I thought it would; I am satisfied with the experience; I have mixed feelings with the experience; My choice was a wise one; If I could do it over again, I would not have ___; I truly enjoyed ___; I feel bad about my decision to ___; I am not happy that I decided to ___; ___ has been a good experience; I'm sure it was the right thing for me to ___ | Westbrook and Oliver (1981)
TABLE 2
Operationalization of Variables
Variable | Symbol | Operationalization | Source
Pleasure | Pleas1–Pleas6 | Six-item semantic differential scale (alpha = .80): Pleas1 bored to relaxed; Pleas2 hopeful to despairing; Pleas3 happy to unhappy; Pleas4 melancholic to contented; Pleas5 pleased to annoyed; Pleas6 unsatisfied to satisfied | Mehrabian and Russell (1974)
Disconfirmation | Disc1, Disc2 | Two-item 7-point multichotomous scale (r = .80): Disc1 worse than expected to better than expected; Disc2 "Too low: It was better than you expected" to "Too high: It was worse than you expected" | Oliver (1980); Churchill and Surprenant (1982)
Postpurchase behavior | Repeat1, Repeat2 | Two-item 7-point bipolar scale (r = .93): likely to unlikely to repurchase; possible to impossible to repurchase | Dabholkar (1995)
Postpurchase behavior | Recommend1, Recommend2 | Two-item 7-point bipolar scale (r = .94): likely to unlikely to recommend; possible to impossible to recommend | Dabholkar (1995)
TABLE 3
Affective and Cognitive Factor Loadings for Pleasure and Disconfirmation Measures
Construct | Indicator | Factor 1: Affect | Factor 2: Cognition
Pleasure | Pleas1 | .70* | –.06
Pleasure | Pleas2 | .70* | –.03
Pleasure | Pleas3 | .82* | –.05
Pleasure | Pleas4 | .63* | .05
Pleasure | Pleas5 | .79* | .17
Pleasure | Pleas6 | .60* | .44
Disconfirmation | Disc1 | .01 | .90*
Disconfirmation | Disc2 | –.01 | .89*
Variance explained (%) | | 37.7 | 22.9
NOTE: Asterisks mark the measures with high affective and cognitive loadings, respectively.
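The two-step content analysis described above (a varimax-rotated principal component analysis, followed by correlating the factor scores with each satisfaction measure) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' analysis: the data are simulated (two independent latent factors, six pleasure-like items, two disconfirmation-like items, and one hypothetical satisfaction item), the `varimax` and `pca_varimax` helpers are hypothetical names for the standard SVD-based varimax algorithm rather than any particular package's routine, and component scores are obtained by least squares.

```python
import numpy as np


def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation of a p x k loading matrix (SVD-based algorithm)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient of the varimax criterion; the SVD gives the nearest orthogonal update
        target = rotated ** 3 - (gamma / p) * rotated @ np.diag(np.sum(rotated ** 2, axis=0))
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        previous, criterion = criterion, np.sum(s)
        if previous != 0 and criterion / previous < 1 + tol:
            break  # criterion has stopped improving
    return loadings @ rotation


def pca_varimax(items):
    """Step 1: principal components, keep eigenvalues > 1, rotate with varimax."""
    z = (items - items.mean(axis=0)) / items.std(axis=0, ddof=1)
    corr = np.corrcoef(z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    rotated = varimax(loadings)
    # Least-squares component scores for the rotated solution
    scores = z @ rotated @ np.linalg.inv(rotated.T @ rotated)
    explained = eigvals[keep] / corr.shape[0]  # unrotated share of total variance
    return rotated, scores, explained


# Hypothetical simulated survey: two independent latent factors.
rng = np.random.default_rng(2003)
n = 500
affect = rng.standard_normal(n)
cognition = rng.standard_normal(n)
pleasure = 0.75 * affect[:, None] + 0.66 * rng.standard_normal((n, 6))
disconfirmation = 0.9 * cognition[:, None] + 0.44 * rng.standard_normal((n, 2))
items = np.hstack([pleasure, disconfirmation])

rotated, scores, explained = pca_varimax(items)
# Label as "affect" the component on which the pleasure items load most heavily.
affect_col = int(np.argmax(np.abs(rotated[:6]).mean(axis=0)))
cognition_col = 1 - affect_col

# Step 2: correlate the factor scores with a hypothetical satisfaction item.
satisfaction = 0.5 * affect + 0.5 * cognition + 0.5 * rng.standard_normal(n)
r_affect = abs(np.corrcoef(scores[:, affect_col], satisfaction)[0, 1])
r_cognition = abs(np.corrcoef(scores[:, cognition_col], satisfaction)[0, 1])
print(rotated.round(2), explained.round(3))
print(round(r_affect, 2), round(r_cognition, 2))
```

With two well-separated item blocks, the eigenvalue-greater-than-one rule retains two components, the pleasure items load on one and the disconfirmation items on the other, and the simulated satisfaction item correlates moderately with both sets of scores. Absolute values are used because the sign of a principal component is arbitrary.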
TABLE 4
Correlation of Satisfaction Measures With the Affective and Cognitive Factor
Measure | Rank by Correlation With Affective Factor | Correlation With Affective Factor | Correlation With Cognitive Factor | Rank by Correlation With Cognitive Factor
Happy | 1 | .54 | .42 | 9
Semantic differential (4 items) | 2 | .52 | .52 | 4
Semantic differential (6 items) | 3 | .46 | .57 | 1
Pleased | 4 | .46 | .48 | 8
Delighted-terrible | 5 | .44 | .50 | 6
Very satisfied–very dissatisfied | 6 | .40 | .51 | 5
Likert-type (12 items) | 7 | .37 | .53 | 3
Percentage | 8 | .36 | .54 | 2
Likert-satisfied | 9 | .35 | .49 | 7
NOTE: All scales are single-item scales unless indicated otherwise.
Model Development
To assess the quality of the satisfaction measures included in this study, structural equation modeling (SEM) was used. A separate model was constructed in AMOS for each product, and the two conceptual models (ice cream restaurant services and ATM services) were estimated using maximum likelihood. We modeled disconfirmation as an antecedent of satisfaction, which in turn drove postpurchase behavior (repeat purchase intent and word of mouth). In the ATM model, two indicators of the postpurchase behavior construct (Recommend1 and Recommend2) had high error variances (> 2) and low item reliabilities (< .11). In retrospect, this seems intuitively logical, as customers would rarely recommend a particular ATM to their friends. These two measures were therefore dropped from the ATM model, and only the repeat usage scales were retained.
Confirmatory factor analysis was used to examine the reliability and validity of the measures. Item reliability was used to examine the measurement error in each indicator, whereas scale reliability was used to examine the commonality of all items measuring the same construct. In both models, the reliabilities of all items were above .50, and the variance extracted exceeded 60% for all constructs. This implies that all latent constructs in both models were adequately captured by their measurement items (Fornell and Larcker 1981). Discriminant validity was examined using Fornell and Larcker's (1981) test of average variance extracted. For all construct pairs in both measurement models (i.e., disconfirmation, satisfaction, and postpurchase behavior), the average variance extracted for each of the two constructs exceeded the square of the structural link between them. The constructs therefore had discriminant validity.
Figures 1 and 2 show the structural models for the ice cream restaurant and the ATM service, respectively. The fit indices for the ice cream restaurant model were as follows: goodness-of-fit index (GFI) = 0.95, adjusted goodness-of-fit index (AGFI) = 0.93, χ² = 99.98 (df = 82, p = .09), root mean square error of approximation (RMSEA) = 0.03, root mean square residual (RMR) = 0.04, normed fit index (NFI) = 0.98, relative fit index (RFI) = 0.97, and comparative fit index (CFI) = 1.00. The ATM model fit indices were GFI = 0.96, AGFI = 0.94, χ² = 63.14 (df = 54, p = .19), RMSEA = 0.03, RMR = 0.03, NFI = 0.97, RFI = 0.96, and CFI = 1.00. These results indicate that both models fit the data very well.
Examination of Satisfaction Measures in the Hedonic Context
The standardized estimates for the paths and the associated t values for the ice cream restaurant model are provided in Figure 1. All parameters were statistically significant at p < .05. Disconfirmation had a significant positive effect on satisfaction (standardized effect of 0.61, t = 8.07), and satisfaction had a positive effect on postpurchase behavior (standardized effect of 0.79, t = 15.79). An examination of the factor loadings and error variances of the disconfirmation and postpurchase behavior measures (Table 5) revealed that most of these measures were good, with high factor loadings and medium to low error variances.
The satisfaction measures are examined in Table 6. The error variances of the multi-item satisfaction measures (semantic differential [6 items], semantic differential [4 items], and Likert-type [12 items]) were lower than those of the single-item measures. All three multi-item measures also had factor loadings and item reliabilities at least as high as those of any single-item measure. This indicates that multi-item measures are better at capturing satisfaction than single-item measures, a view voiced by many researchers (e.g., Bearden and Teel 1983; Churchill and Surprenant 1982; Yi 1990). Three single-item measures (percentage, pleased, and delighted-terrible) emerged as good measures compared with the other single-item measures. They had the lowest error variances among the single-item measures, as well as high factor loadings and item reliabilities, with percentage scoring higher than pleased and delighted-terrible on both. The Likert-satisfied measure performed worst among all the measures: it had the lowest factor loading and item reliability and the highest error variance.
Table 6 also compares the affective content of each satisfaction measure with its factor loading. This allowed us to examine the notion that, for products with mostly hedonic benefits, satisfaction measures high in affective content capture more of satisfaction than measures low in affective content. The happy and semantic differential (4 items) measures had the highest affective content (Table 6), whereas the percentage, Likert-satisfied, and Likert-type (12 items) scales had low affective content.
However, contrary to expectations, the cognition-heavy Likert-type (12 items) and percentage scales had higher factor loadings (0.90 each) than the happy measure (0.85), even though the happy measure had higher affective content than both scales. Finally, the semantic differential (6 items) measure had the highest factor loading (0.95) of all satisfaction measures but only moderate affective content compared with some other measures (such as happy and semantic differential [4 items]). These results reject the notion that, for products with mostly hedonic benefits, satisfaction measures high in affective content capture more of satisfaction than measures low in affective content.
Examination of Satisfaction Measures in the Utilitarian Context
For the ATM service model, the standardized estimates for the model paths and the associated t values are provided in Figure 2. All parameters were statistically significant at the .05 level. Disconfirmation had a significant positive effect on satisfaction (standardized effect of 0.72, t = 9.76), and satisfaction had a positive effect on postpurchase behavior (standardized effect of 0.45, t = 6.91). An examination of the factor loadings and error variances of the disconfirmation and postpurchase behavior measures (Table 7) revealed that all measures were acceptable, with high factor loadings and medium to low error variances.
Table 8 shows that the error variances of the multi-item satisfaction measures were no higher than those of any single-item measure. The two semantic differential scales also had higher factor loadings and item reliabilities than all single-item measures, whereas the Likert-type (12 items) scale loaded lower than all but one of them. This partly replicated the findings of the ice cream restaurant study. As in the ice cream restaurant study, semantic differential (6 items) emerged as the best satisfaction measure, with the highest item reliability and factor loading and the lowest error variance. Also as in the ice cream restaurant study, percentage emerged as the best single-item scale.
Table 8 compares the cognitive content of each satisfaction measure with its factor loading. This served to examine the notion that, for products with mostly utilitarian benefits, satisfaction measures high in cognitive content should capture more of satisfaction than measures low in cognitive content. All satisfaction measures were either moderately or highly correlated with the cognitive factor (see Table 8). Semantic differential (6 items) had the highest correlation with the cognitive factor (0.57), followed by percentage (0.54). Most satisfaction measures with high correlations with the cognitive factor also loaded highly on satisfaction; these include the semantic differential (6 items), percentage, semantic differential (4 items), and very satisfied–very dissatisfied scales. Most satisfaction measures with low correlations with the cognitive factor also had low loadings on the satisfaction construct; these include pleased, happy, and delighted-terrible.
However, these findings do not conclusively show that, for utilitarian products, satisfaction measures high in cognitive content capture more of satisfaction than measures low in cognitive content, for two reasons. First, the Likert-type (12 items) scale had the second lowest loading (0.75) on satisfaction, although its correlation with the cognitive factor was the third highest among all satisfaction measures. Second, a comparison of the ice cream and ATM structural models revealed that two measures had consistently low error variances (< 0.30), high item reliabilities (≥ 0.69), and high factor loadings (≥ 0.83). Not surprisingly, both were multi-item measures: the semantic differential (6 items) and semantic differential (4 items) scales. Both correlated highly with the cognitive factor, and semantic differential (4 items) correlated more highly with the affective factor than semantic differential (6 items) did. Among the single-item measures, percentage emerged as the best measure in both studies, with error variances below 0.40 and the highest single-item factor loadings (tied with Likert-satisfied in the ATM study). Again, these findings reject the notion that different measures perform better for measuring satisfaction with different types of product benefits.
FIGURE 1 Model 1: Ice Cream Restaurant Service
[Structural model: Disconfirmation-of-Expectations (Disc1, Disc2) → Satisfaction (nine satisfaction measures) → Post-Purchase Behavior (Repeat1, Repeat2, Recommend1, Recommend2); standardized paths 0.61* (8.07) and 0.79* (15.79). Indicator loadings are reported in Tables 5 and 6.]
NOTE: Figures in parentheses are the t values. Curved arrows indicate covariance. *p < .05.
FIGURE 2 Model 2: Automatic Teller Machine (ATM) Service
[Structural model: Disconfirmation-of-Expectations (Disc1, Disc2) → Satisfaction (nine satisfaction measures) → Post-Purchase Behavior (Repeat1, Repeat2); standardized paths 0.72* (9.76) and 0.45* (6.91). Indicator loadings are reported in Tables 7 and 8.]
NOTE: Figures in parentheses are the t values. Curved arrows indicate covariance. *p < .05.
TABLE 5
Ice Cream Restaurant Study: Assessment of Disconfirmation and Postpurchase Behavior Measures
Construct | Indicator | Factor Loading | Error Variance
Disconfirmation | Disc1 | .94 | .11
Disconfirmation | Disc2 | .69 | .50
Postpurchase behavior | Repeat1 | .87 | .53
Postpurchase behavior | Repeat2 | .89 | .35
Postpurchase behavior | Recommend1 | .96 | .21
Postpurchase behavior | Recommend2 | .95 | .24
TABLE 6
Ice Cream Restaurant Study: Assessment of Satisfaction Measures
Measure | Satisfaction Loading | Item Reliability | Error Variance | Correlation With Affective Factor
Semantic differential (6 items) | .95 | .91 | .09 | .46
Semantic differential (4 items) | .94 | .88 | .16 | .52
Likert-type (12 items) | .90 | .81 | .19 | .37
Percentage | .90 | .81 | .21 | .36
Pleased | .89 | .79 | .30 | .46
Delighted-terrible | .88 | .77 | .25 | .44
Very satisfied–very dissatisfied | .87 | .76 | .40 | .40
Happy | .85 | .72 | .34 | .54
Likert-satisfied | .81 | .65 | .51 | .35
NOTE: All scales are single-item scales unless indicated otherwise.
TABLE 7
Automatic Teller Machine Service Study: Assessment of Disconfirmation and Postpurchase Behavior Measures
Construct | Indicator | Factor Loading | Error Variance
Disconfirmation | Disc1 | .83 | .19
Disconfirmation | Disc2 | .80 | .22
Postpurchase behavior | Repeat1 | .92 | .28
Postpurchase behavior | Repeat2 | .94 | .16
TABLE 8
Automatic Teller Machine Service Study: Assessment of Satisfaction Measures
Measure | Satisfaction Loading | Item Reliability | Error Variance | Correlation With Cognitive Factor
Semantic differential (6 items) | .87 | .76 | .19 | .57
Semantic differential (4 items) | .83 | .69 | .29 | .52
Percentage | .82 | .67 | .32 | .54
Likert-satisfied | .82 | .67 | .43 | .49
Very satisfied–very dissatisfied | .80 | .64 | .41 | .51
Happy | .76 | .58 | .41 | .42
Pleased | .76 | .57 | .47 | .48
Likert-type (12 items) | .75 | .57 | .28 | .53
Delighted-terrible | .73 | .54 | .29 | .50
NOTE: All scales are single-item scales unless indicated otherwise.
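The measurement checks described under Model Development (item reliability, average variance extracted, and Fornell and Larcker's discriminant validity test) and the reported RMSEA values can be reproduced approximately from the published summary statistics. The sketch below is hypothetical in several respects: it treats the loadings in Tables 5 and 6 as standardized, approximates each error variance as 1 - loading^2 (rather than using the error variances reported in the tables), covers only the ice cream restaurant model, and assumes N = 257 with the common RMSEA point-estimate formula. All function names are illustrative.

```python
import math

# Loadings for the ice cream restaurant model (Tables 5 and 6),
# treated as standardized: an assumption of this sketch.
LOADINGS = {
    "disconfirmation": [0.94, 0.69],
    "satisfaction": [0.95, 0.94, 0.90, 0.90, 0.89, 0.88, 0.87, 0.85, 0.81],
    "postpurchase": [0.87, 0.89, 0.96, 0.95],
}
# Standardized structural paths from Figure 1.
PATHS = {
    ("disconfirmation", "satisfaction"): 0.61,
    ("satisfaction", "postpurchase"): 0.79,
}


def item_reliability(loading):
    """Squared standardized loading: item variance explained by its construct."""
    return loading ** 2


def average_variance_extracted(loadings):
    """Mean squared standardized loading (Fornell and Larcker 1981)."""
    return sum(item_reliability(l) for l in loadings) / len(loadings)


def composite_reliability(loadings):
    """(sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances),
    with each error variance approximated as 1 - loading^2."""
    total = sum(loadings)
    error = sum(1.0 - item_reliability(l) for l in loadings)
    return total ** 2 / (total ** 2 + error)


def fornell_larcker_ok(a, b, link):
    """Discriminant validity: both constructs' AVE must exceed the squared link."""
    return min(average_variance_extracted(LOADINGS[a]),
               average_variance_extracted(LOADINGS[b])) > link ** 2


def rmsea(chi_square, df, n):
    """Point estimate sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


ave = {name: average_variance_extracted(l) for name, l in LOADINGS.items()}
cr = {name: composite_reliability(l) for name, l in LOADINGS.items()}
discriminant = {pair: fornell_larcker_ok(*pair, link) for pair, link in PATHS.items()}
fit = {"ice cream": rmsea(99.98, 82, 257), "ATM": rmsea(63.14, 54, 257)}

for name in LOADINGS:
    print(f"{name}: AVE = {ave[name]:.3f}, CR = {cr[name]:.3f}")
print(discriminant)
print({k: round(v, 3) for k, v in fit.items()})
```

Squaring the happy measure's loading of 0.85 gives 0.7225, in line with the item reliability of .72 in Table 6. All three AVE values exceed 0.60, both structural links pass the Fornell and Larcker comparison, and both RMSEA estimates round to the reported 0.03; using N instead of N - 1 in the denominator does not change the two-decimal result.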
CONCLUSIONS, DISCUSSION, AND IMPLICATIONS
Two multi-item semantic differential scales performed best across both service contexts. A semantic differential (6 items) scale (e.g., Oliver and Swan 1989) was the best performing measure: it loaded most highly on satisfaction, had the highest item reliability, and had by far the lowest error variance in both studies. A semantic differential (4 items) scale (e.g., Eroglu and Machleit 1990) performed second best, again consistently across both contexts. The third best scale was the single-item percentage measure (e.g., Westbrook 1980), which ranked third in the ATM context and fourth in the ice cream restaurant context. It may be that the finely grained rating on its 11-point scale, with extreme anchors (not at all satisfied to completely satisfied), captures satisfaction well and discriminates between degrees of satisfaction nearly as well as the top two multi-item 7-point scales. These multi-item scales, in turn, seem to achieve finely grained measurement by tapping into satisfaction from different angles. All other measures tested either did not perform consistently across both service contexts or performed significantly worse in both. For example, the third multi-item measure tested, Westbrook and Oliver’s (1981) Likert-type (12 items) scale, performed well for the ice cream restaurant but not for the ATM service. This scale may perform better when respondents have a certain level of involvement (assuming that respondent involvement differed significantly between the ice cream restaurant and the ATM service), so that they can give meaningful responses to its items on choice evaluation, decision making, and timing of purchase, and to items on the experience itself, such as “This is one of the best experiences I had with such services” (see Table 1). Conversely, the Likert-satisfied measure performed relatively well for the ATM but worst for the ice cream restaurant service. Although still at acceptable levels, the least well performing measures overall were the single-item delighted-terrible, happy, and Likert-satisfied scales.

Furthermore, the applicability of satisfaction measures seems to be independent of whether the product benefits were mostly hedonic or utilitarian. In particular, the top three measures performed well for both products, having among the highest satisfaction loadings and lowest error variances of all measures tested. The content of satisfaction measures (whether high or low in affective or cognitive content) did not seem to influence their satisfaction loadings across different product benefits. Therefore, the notion that product benefits and measure content should be matched was rejected. Perhaps, as Crooker and Near (1998) suggested for measures of subjective well-being, cognitive and affective measures are not distinct from each other. It therefore seems more important for satisfaction measures to have good psychometric properties (i.e., good reliability, convergent and discriminant validity, and low measurement error) than to have either affective or cognitive scale anchors.
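One common way to quantify the reliability and discriminant validity mentioned above is the composite reliability, average variance extracted (AVE), and AVE-versus-squared-correlation criterion of Fornell and Larcker (1981), which appears in the references. This section does not say whether the authors applied these exact statistics, so the sketch below is illustrative only: it assumes standardized loadings with uncorrelated measurement errors, and all function names and example values are hypothetical.

```python
def _check(loadings: list[float]) -> None:
    if not loadings:
        raise ValueError("need at least one loading")


def composite_reliability(loadings: list[float]) -> float:
    """Fornell-Larcker composite reliability from standardized loadings."""
    _check(loadings)
    total = sum(loadings)
    error = sum(1.0 - lam ** 2 for lam in loadings)  # assumes uncorrelated errors
    return total ** 2 / (total ** 2 + error)


def average_variance_extracted(loadings: list[float]) -> float:
    """Share of indicator variance captured by the construct (AVE)."""
    _check(loadings)
    explained = sum(lam ** 2 for lam in loadings)
    error = sum(1.0 - lam ** 2 for lam in loadings)
    return explained / (explained + error)


def discriminant_ok(ave_a: float, ave_b: float, corr_ab: float) -> bool:
    """Fornell-Larcker criterion: each construct's AVE exceeds the shared variance."""
    shared = corr_ab ** 2
    return ave_a > shared and ave_b > shared


if __name__ == "__main__":
    satisfaction = [0.88, 0.85, 0.80]  # hypothetical standardized loadings
    print(f"CR  = {composite_reliability(satisfaction):.3f}")
    print(f"AVE = {average_variance_extracted(satisfaction):.3f}")
    # Hypothetical check of two factors, e.g. affective vs. cognitive:
    print(discriminant_ok(0.64, 0.60, corr_ab=0.70))
```

Low measurement error shows up here as a high AVE, and discriminant validity as an AVE that exceeds the variance two constructs share.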
In sum, although customer satisfaction seems to have a dual cognitive and affective basis (Oliver, Rust, and Varki 1997), it appears to be a one-dimensional construct (Westbrook and Oliver 1981) whose measurement scales perform equally well in capturing utilitarian and/or hedonic consumption experiences and related attributes or dimensions. Our findings suggest that satisfaction is a truly one-dimensional response to consumption experiences. Finally, all nine measures tested in this study captured significant levels of both the affective and the cognitive content of consumption satisfaction, independent of their scale anchors. This finding is surprising because past conceptual work had suggested that satisfaction measures with cognitive or affective anchors capture the cognitive or affective aspects of satisfaction, respectively. The results of the present study are, however, consistent with work in the job satisfaction literature (Brief and Robertson 1989; Organ and Near 1985; Williams 1988), which also showed that job satisfaction scales load on both affect and cognition independent of their scale anchors.

JOURNAL OF SERVICE RESEARCH / May 2003

Taken together, the satisfaction evaluation itself seems able to capture hedonic and utilitarian consumption experiences equally well. However, stronger emotional responses to consumption, such as customer delight, would probably be better captured by separate constructs and their respective measurement scales. For example, Oliver, Rust, and Varki (1997) showed empirically that satisfaction and consumption delight were two different constructs.

In conclusion, we identified a six-item 7-point semantic differential scale that consistently performed best across both studies, followed by a four-item 7-point semantic differential scale that consistently performed second best and a single-item 11-point percentage scale that ranked third and fourth in the two studies. Depending on the trade-off between questionnaire length and measurement quality, these scales appear to be good options for measuring customer satisfaction in academic and applied research alike. All other measures tested performed consistently worse than our top three, and/or their performance varied significantly across the two service contexts in our study. These results suggest that more careful pretesting would be prudent if these other measures are used.
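The trade-off between questionnaire length and measurement quality can be reasoned about with the Spearman-Brown prophecy formula from classical test theory, which predicts the reliability of a scale lengthened by a factor k from the reliability ρ of the original: ρ_k = kρ / (1 + (k - 1)ρ). The study does not use this formula, and the formula assumes the added items are parallel to the existing ones, which the semantic differential items here need not be; treat the sketch, its function names, and the example reliability of 0.60 as hypothetical.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability after lengthening a scale by `length_factor`.

    Classical test theory; assumes the added items are parallel to the
    existing ones. length_factor = new number of items / old number of items.
    """
    if not 0.0 < reliability < 1.0:
        raise ValueError("reliability must lie strictly between 0 and 1")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    k = length_factor
    return k * reliability / (1.0 + (k - 1.0) * reliability)


def length_factor_needed(reliability: float, target: float) -> float:
    """Inverse Spearman-Brown: lengthening factor needed to reach `target`."""
    if not (0.0 < reliability < 1.0 and 0.0 < target < 1.0):
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    return target * (1.0 - reliability) / (reliability * (1.0 - target))


if __name__ == "__main__":
    single_item = 0.60  # hypothetical reliability of one item
    for k in (1, 4, 6):
        print(f"{k} parallel item(s): predicted reliability {spearman_brown(single_item, k):.2f}")
    print(f"factor needed for 0.90: {length_factor_needed(single_item, 0.90):.1f}")
```

Each extra item buys less reliability than the previous one, which is one way to see why a short, finely grained single-item scale can be a sensible compromise when questionnaire space is tight.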
LIMITATIONS AND FURTHER RESEARCH
Although this study contributes to our knowledge of customer satisfaction measurement, several limitations and directions for future research deserve mention. First, more extreme hedonic and utilitarian service contexts could be tested. Although the two contexts selected for the present study were the two that differed most, among the 10 pretested services, in their degree of hedonic and utilitarian benefits, future work could replicate our findings in even more extreme service settings (e.g., a recreational theme park or a symphony concert as mostly hedonic services, as used by Oliver, Rust, and Varki 1997). Applicability across contexts could also be explored along dimensions other than the hedonic-utilitarian dimension tested here. One candidate is product involvement, which has been shown to affect the satisfaction evaluation process (e.g., Mattila 1998). Second, the present study focused on scale anchors and commonly used satisfaction scales. It did not consider other measurement issues such as potential response biases, respondent interpretation and understanding, discriminating power, ease of administration, ease of use by respondents, and the credibility and usefulness of results. These issues have been discussed for single-item customer satisfaction measures with varying numbers of scale points (cf. Devlin, Dong, and Brown 1993) but not specifically for the satisfaction measures tested in our study. Future work could explore how common satisfaction measures perform on those dimensions and perhaps identify boundary conditions. For example, an 11-point scale may perform well in a printed questionnaire, but too many categories may confuse respondents and lead to response biases in a telephone interview (cf. Devlin, Dong, and Brown 1993). Also, in our study, the 11-point percentage (one item) scale was almost as good as the two best performing multi-item scales. Future work could examine whether the quality of other single-item satisfaction scales could be significantly improved by increasing the number of scale points, something that is less likely to further improve the already more finely grained multi-item scales.
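The suggestion that more scale points may improve single-item measures can be illustrated with a toy simulation. It assumes a normally distributed latent satisfaction score, additive Gaussian response noise (standard deviation 0.5), and equal-width response categories over the range -3 to 3; none of these assumptions comes from the study, and the functions `to_category` and `simulate` are hypothetical. Every run reuses the same seed, so each scale length is applied to identical simulated respondents and differences reflect only the number of scale points.

```python
import math
import random


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation computed directly (no third-party dependencies)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def to_category(value: float, points: int, lo: float = -3.0, hi: float = 3.0) -> int:
    """Map a continuous response onto equal-width categories 0 .. points-1."""
    clipped = min(max(value, lo), hi)
    index = int((clipped - lo) / (hi - lo) * points)
    return min(index, points - 1)  # keeps value == hi in the top category


def simulate(points: int, n: int = 20_000, noise_sd: float = 0.5, seed: int = 1) -> float:
    """Correlation between simulated latent satisfaction and a k-point rating."""
    rng = random.Random(seed)  # same seed -> same simulated respondents for every k
    latent, ratings = [], []
    for _ in range(n):
        true_score = rng.gauss(0.0, 1.0)                  # assumed latent satisfaction
        response = true_score + rng.gauss(0.0, noise_sd)  # assumed response noise
        latent.append(true_score)
        ratings.append(float(to_category(response, points)))
    return pearson(latent, ratings)


if __name__ == "__main__":
    for k in (3, 5, 7, 11):
        print(f"{k:>2}-point scale: r = {simulate(k):.3f}")
```

Under these assumptions the correlation with the latent score should rise with the number of categories and then level off, in line with the argument that extra scale points help a coarse single-item scale more than an already finely grained multi-item scale. The simulation says nothing about the response biases or administration-mode effects (such as telephone interviews) raised above.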
REFERENCES
Andrews, Frank M. and Stephen B. Withey (1976), Social Indicators of Well-Being. New York: Plenum.
Batra, Rajeev and Olli T. Ahtola (1990), “Measuring the Hedonic and Utilitarian Sources of Consumer Attitudes,” Marketing Letters, 2 (2), 159-70.
Bearden, William O. and Jesse E. Teel (1983), “Selected Determinants of Consumer Satisfaction and Complaint Reports,” Journal of Marketing Research, 20 (February), 21-8.
Brief, Arthur P. and Loriann Robertson (1989), “Job Attitude Organisation,” Journal of Applied Social Psychology, 19 (9), 717-27.
Churchill, Gilbert A. and Carol Surprenant (1982), “An Investigation into the Determinants of Customer Satisfaction,” Journal of Marketing Research, 19 (November), 491-504.
Crooker, Karen J. and Janet P. Near (1998), “Happiness and Satisfaction: Measures of Affect and Cognition?” Social Indicators Research, 44 (2), 195-224.
Crosby, Lawrence A. and Nancy Stephens (1987), “Effects of Relationship Marketing on Satisfaction, Retention, and Prices in the Life Insurance Industry,” Journal of Marketing Research, 24 (November), 404-11.
Dabholkar, Pratibha A. (1995), “The Convergence of Satisfaction and Service Quality Evaluations with Increasing Customer Patronage,” Journal of Consumer Satisfaction, Dissatisfaction and Complaining Behavior, 8 (1), 61-71.
Devlin, Susan J., H. K. Dong, and Marbue Brown (1993), “Selecting a Scale for Measuring Quality,” Marketing Research, 5 (1), 12-7.
Dube-Rioux, Laurette (1990), “The Power of Affective Reports in Predicting Satisfaction Judgements,” Advances in Consumer Research, 17, 571-76.
Eroglu, Sergin A. and Karen A. Machleit (1990), “An Empirical Study of Retail Crowding: Antecedents and Consequences,” Journal of Retailing, 66 (Summer), 201-21.
Fornell, Claes (1992), “A National Customer Satisfaction Barometer,” Journal of Marketing, 56 (January), 6-21.
Fornell, Claes and D. F. Larcker (1981), “Evaluating Structural Equation Models with Unobservable Variables and Measurement Error,” Journal of Marketing Research, 18 (February), 39-50.
Hausknecht, Douglas (1990), “Measurement Scales in Consumer Satisfaction/Dissatisfaction,” Journal of Consumer Satisfaction, Dissatisfaction and Complaining Behavior, 3, 1-11.
Hirschman, Elizabeth C. and Morris B. Holbrook (1982), “Hedonic Consumption: Emerging Concepts, Methods and Propositions,” Journal of Marketing, 46 (September), 92-101.
LaBarbera, Priscilla A. and David Mazursky (1983), “A Longitudinal Assessment of Consumer Satisfaction/Dissatisfaction: The Dynamic Aspect of the Cognitive Process,” Journal of Marketing Research, 20 (November), 393-404.
Lee, Meng Chung and Jochen Wirtz (1997), “Choosing Appropriate Customer Satisfaction Measures: First Steps towards a Normative Framework,” Proceedings of the Eighth Biennial World Marketing Congress 1997, Kuala Lumpur, Malaysia: Academy of Marketing Science, 8, 244-6.
Mano, Haim and Richard L. Oliver (1993), “Assessing the Dimensionality and Structure of the Consumption Experience: Evaluation, Feeling and Satisfaction,” Journal of Consumer Research, 20 (December), 451-66.
Mattila, Anna (1998), “An Examination of Consumers’ Use of Heuristic Cues in Making Satisfaction Judgments,” Psychology & Marketing, 15 (August), 477-501.
Mehrabian, Albert and James Russell (1974), An Approach to Environmental Psychology. Cambridge, MA: MIT Press.
Oliver, Richard L. (1980), “A Cognitive Model of the Antecedents and Consequences of Satisfaction Decisions,” Journal of Marketing Research, 17 (November), 460-9.
Oliver, Richard L. (1997), Satisfaction: A Behavioral Perspective on the Consumer. New York: McGraw-Hill.
Oliver, Richard L. and William O. Bearden (1983), “The Role of Involvement in Satisfaction Processes,” Advances in Consumer Research, 11, 250-5.
Oliver, Richard L., Roland T. Rust, and Sajeev Varki (1997), “Customer Delight: Foundations, Findings, and Managerial Insight,” Journal of Retailing, 73 (3), 311-36.
Oliver, Richard L. and John E. Swan (1989), “Consumer Perceptions of Interpersonal Equity and Satisfaction in Transactions: A Field Survey Approach,” Journal of Marketing, 53 (April), 21-35.
Oliver, Richard L. and Robert A. Westbrook (1993), “Profiles of Consumer Emotions and Satisfaction in Ownership and Usage,” Journal of Consumer Satisfaction, Dissatisfaction and Complaining Behavior, 6, 12-27.
Organ, Dennis W. and Janet P. Near (1985), “Cognition vs Affect in Measures of Job Satisfaction,” International Journal of Psychology, 20, 241-53.
Patterson, Paul G. (2000), “A Contingency Approach to Modeling Satisfaction with Management Consulting Services,” Journal of Service Research, 3 (2), 138-53.
Stauss, Bernd and Bert Hentschel (1992), “Attribute-Based versus Incident-Based Measurement of Service Quality: Results of an Empirical Study within the German Car Service Industry,” in Quality Management in Services, Paul Kunst and Jos Lemmink, eds. Assen/Maastricht, the Netherlands: Van Gorcum, 59-78.
Swan, John E. and Irving F. Trawick (1981), “Disconfirmation of Expectations and Satisfaction with a Retail Service,” Journal of Retailing, 57 (Fall), 49-67.
Tse, David K. and Peter C. Wilton (1988), “Models of Consumer Satisfaction Formation: An Extension,” Journal of Marketing Research, 25 (May), 204-12.
Westbrook, Robert A. (1980), “A Rating Scale for Measuring Product/Service Satisfaction,” Journal of Marketing, 44 (Fall), 68-72.
Westbrook, Robert A. (1983), “Consumer Satisfaction and the Phenomenology of Emotions during Automobile Ownership Experiences,” in International Fare in Consumer Satisfaction and Complaining Behavior, Ralph L. Day and Keith H. Hunt, eds. Bloomington: Indiana University Press.
Westbrook, Robert A. (1987), “Product/Consumption-Based Affective Responses and Postpurchase Processes,” Journal of Marketing Research, 24 (August), 258-70.
Westbrook, Robert A. and Richard L. Oliver (1981), “Developing Better Measures of Consumer Satisfaction: Some Preliminary Results,” Advances in Consumer Research, 9, 94-9.
Williams, L. J. (1988), “Affective and Non-Affective Components of Job Satisfaction Organizational Commitments as Determinants of Organisational Citizenship and Behaviors,” Ph.D. diss., Indiana University.
Wirtz, Jochen (2001), “Improving the Measurement of Customer Satisfaction: A Test of Three Methods to Reduce Halo,” Managing Service Quality, 11 (2), 99-111.
Wirtz, Jochen and John E. G. Bateson (1995), “An Experimental Investigation of Halo Effects in Satisfaction Measures of Service Attributes,” International Journal of Service Industry Management, 6 (3), 84-102.
Wirtz, Jochen and John E. G. Bateson (1999), “Consumer Satisfaction with Services: Integrating the Environmental Perspective in Services Marketing into the Traditional Disconfirmation Paradigm,” Journal of Business Research, 44 (1), 55-66.
Wirtz, Jochen and Anna Mattila (2001), “Exploring the Role of Alternative Perceived Performance Measures and Needs-Congruency in the Consumer Satisfaction Process,” Journal of Consumer Psychology, 11 (3), 181-92.
Wirtz, Jochen, Anna Mattila, and Rachel L. P. Tan (2000), “The Moderating Role of Target-Arousal State on the Impact of Affect on Satisfaction: An Examination in the Context of Service Experiences,” Journal of Retailing, 76 (3), 347-65.
Yi, Youjae (1990), “A Critical Review of Consumer Satisfaction,” in Review of Marketing, Valerie A. Zeithaml, ed. Chicago: American Marketing Association.
Yu, Yi-Ting and Alison Dean (2001), “The Contribution of Emotional Satisfaction to Consumer Loyalty,” International Journal of Service Industry Management, 12 (3), 234-50.
Jochen Wirtz is an associate professor of marketing and director of the APEX-MBA (Asia-Pacific Executive MBA) program at the NUS Business School, National University of Singapore. His current research focuses on customer satisfaction and customer feedback systems. His recent work has been published in the Journal of Retailing, Journal of Consumer Psychology, Journal of Business Research, Psychology and Marketing, Journal of Services Marketing, International Journal of Service Industry Management, and Cornell HRA Quarterly. He is also the author of Services Marketing in Asia: Managing People, Technology and Strategy (with Christopher Lovelock and Hean Tat Keh). In recognition of his excellence in instruction, he has received several teaching awards, including the business school’s Outstanding Educator Award of the Year and the Award for Excellence in Instruction from the MBA Alumni. Outside academia, he has been an active management consultant, working with a number of consulting firms in Asia and Europe, including Accenture, Arthur D. Little, and KPMG.

Meng Chung Lee is a manager with the Research and Statistics Unit of the Infocomm Development Authority in Singapore. Her current work focuses on the analysis and projection of Infocomm industry statistics. She was an M.Sc. student in marketing with the NUS Business School, National University of Singapore, while this study was conducted.