Food Sci. Biotechnol. 24(4): 1355-1362 (2015) DOI 10.1007/s10068-015-0174-0

RESEARCH ARTICLE

Measurement of Consumers’ Sensory Discrimination and Preference: Efficiency of Preference-difference Test Utilizing the 3-Point Preference Test Precedes the Same-different Test

In-Ah Kim, Ji-Young Yoon, and Hye-Seong Lee

Received January 27, 2015; revised April 6, 2015; accepted April 26, 2015; published online August 31, 2015 © KoSFoST and Springer 2015

Abstract In fast moving consumer goods industries, effective and robust sensory discrimination and preference methods are needed to achieve various business objectives, such as product reformulation. A new preference-difference test was designed in this study as a more effective and reliable method. In this method, a 3-point paired-preference test was performed before a same-different test. The performance of the new method was compared with that of the difference-preference test and the conventional paired-preference test. Each of 256 female consumers participated in all three test methods in a counterbalanced order to discriminate two types of strawberry-flavored carbonated drinks. The McNemar test and Thurstonian modeling/signal detection theory (SDT) were used to compare the effectiveness of the same-different tests. SDT analysis and a significance test using the concept of ‘identicality norm’ were applied to compare the robustness of the preference tests. The results of these analyses indicated that the preference-difference test not only provided better discrimination but also showed more robust preference results.

Keywords: equivalence, sensory discrimination, sensory preference, quality control, reformulation

In-Ah Kim, Ji-Young Yoon, Hye-Seong Lee
Department of Food Science and Engineering, Ewha Womans University, Seoul 120-750, Korea
Tel: +82-2-3277-6687; Fax: +82-2-3277-6687
Email: [email protected]

Introduction In fast moving consumer goods (FMCG) industries such as the food industry, overall sensory difference tests and preference tests using consumers have been used to investigate the sensory and preferential difference (or equivalence) between similar products for various business objectives such as reformulation, benchmarking, cost reduction, and advertising claim substantiation (1-3). Therefore, in order to make successful business decisions in the FMCG industry, we need test methods that are highly effective and accurate in determining the overall sensory difference and preference of samples. Previous studies have reported that better discrimination results were obtained when the overall difference tests were conducted by evoking consumers’ feelings and affective states of mind (4-6). When consumers eat or drink food products in everyday life, they do not analyze them but enjoy them in a holistic and unitary way (7). It has also been reported that consumers’ natural perceptual process for food is generally affective rather than analytical (6,8-10). Based on this notion, Chae et al. (11) devised an affective familiarization procedure that is performed before the same-different tests. They found that the sensitivity of the same-different test increased after introducing this affective/synthetic perceptual strategy. Taking advantage of this affective-based sensory perception process, Kim et al. (2,3) developed a new methodology, the preferred-reference duo-trio test, which uses a constant-reference mode duo-trio method preceded by the paired-preference test. Nevertheless, in consumer tests, it is desirable to minimize the number of tastings in order to reduce the fatigue and sensory adaptation that complex and susceptible sensory tests can induce. Therefore, compared to the duo-trio tests, in which three samples are used in a single test, the same-different tests, in which only two samples are used in a single test, are preferable. In this study, we investigated further modifications of the overall difference test method using a new format of the preference-difference test, which


is based on the long version of the same-different test method and the 3-point paired-preference test. The goal of this study was to develop an effective and robust preference-difference test method. This novel test method might provide sensory difference information induced by the consumers’ natural affective perception. In addition, it might also provide preference information relevant to consumer behavior. In this new method, the paired-preference test with a ‘no preference’ option was performed before the same-different test. The preference test with the ‘no preference’ option was named the 3-point paired-preference test, while the forced paired-preference test was referred to as the 2-point paired-preference test. The performance of the newly designed preference-difference test method was compared with that of the difference-preference test and the conventional 2-point paired-preference test method. In a previous study, Chapman et al. (12) introduced the difference-preference test, in which the 2-point paired-preference test is conducted after the same-different test. They argued that such branching questions reduced the number of ‘false preference’ responses and thus allowed a more accurate preference analysis. In this study, however, we hypothesized that the preference-difference test was more efficient than the difference-preference test for the following reasons: 1) in the preference-difference test, consumers’ naturally affective perceptual process is involved in sensory discrimination; 2) in the difference-preference test, ‘no preference’ is recorded if subjects respond ‘same’ (S) after tasting two samples in the same-different test, an inference that is hard to accept without performing an actual preference test. In this study, by conducting the 3-point paired-preference test in a different order, we compared these sensory discrimination tests and experimentally tested this hypothesis.
The method’s efficiency was compared in terms of test effectiveness and robustness. The discrimination sensitivity of the method was used as a criterion for determining method effectiveness, while the effects of different sequences of sample presentations were used to determine method robustness.

Materials and Methods Subjects Two hundred and fifty-six females were recruited from Ewha Womans University in Seoul, Korea. These subjects were untrained, naïve consumers. They were undergraduate and graduate students (age: over 20 years old) who consumed strawberry-flavored carbonated drinks, were interested in participating in food evaluation, and had no condition disqualifying them from tasting food. They were instructed not to eat food or drink anything except water, and they were neither allowed to use mouthwash nor were they permitted

Kim et al.

to use strongly fragrant cosmetics or perfumes for at least one hour prior to performing these tests.

Samples and tasting condition Two different types of strawberry-flavored carbonated drinks (Nongshim Co., Ltd., Seoul, Korea) were used in this study. Sample A, which was the reference sample, contained artificial strawberry flavor, while the reformulated version of the reference sample (sample B) contained strawberry juice. Owing to the different ingredients, these two samples had subtle differences in their sensory characteristics. Prior to the experiment, the samples were refrigerated (5°C) in their cans for one or two days. To maintain the same tasting temperature, a thermos (1 L; Zhejiang Gangzida Industry & Trade Co., Ltd., Zhejiang, China) was used. A new can (355 mL) of the sample was opened and poured into the thermos; five minutes prior to the experiment, the sample was poured into ten lidded cups (20 mL per cup), which were then tasted by each subject. In order to control the carbonation of the sample, the remaining content of the can was discarded. Each 20 mL aliquot of the sample was served in a 60 mL disposable plastic cup (C & Tech Co., Yeoju, Korea). These cups were presented on a white plastic tray at a consistent temperature (8±0.5°C). Two cups of the samples were used in a single test. Each sample was coded with a three-digit random number. The presentation sequence of samples was counterbalanced. Each subject performed the tests in an isolated sensory booth under white fluorescent light (light intensity: 650-720 lux). Subjects were required to drink the entire portion in each test and were not allowed to taste the samples again. They rinsed their mouths with purified water before starting each session. Each subject took approximately 20 to 30 minutes to finish an experimental session of ten tests.
Experimental design The three sensory test methods (the 2-point paired-preference test, the difference-preference test, and the preference-difference test) were compared using a related samples design. The preference-difference test was the one in which a 3-point paired-preference test was performed before a same-different test. Conversely, the difference-preference test was the one in which a same-different test was conducted before a 3-point paired-preference test. In this one-day experiment, the different test methods were performed in separate experimental sessions, so all subjects participated in three experimental sessions. The experimental design is summarized in Table 1. In order to counterbalance the order of the test methods, each subject was randomly assigned to one of four groups, so that each group had sixty-four subjects. The 2-point


Table 1. Overview of the experimental design

Subjects | 1st session | 2nd session | 3rd session
Group 1-1 | 2-point paired-preference test | difference-preference test | preference-difference test
Group 1-2 | 2-point paired-preference test | preference-difference test | difference-preference test
Group 2-1 | difference-preference test | preference-difference test | 2-point paired-preference test
Group 2-2 | preference-difference test | difference-preference test | 2-point paired-preference test

paired-preference test was performed either in the first experimental session (Group 1) or in the last session (Group 2). In both Groups 1 and 2, half of the subjects first performed the difference-preference test method and were considered the control group. The other half first performed the preference-difference test method and were considered the treatment group. Within each group, the two sub-groups thus performed the same-different test and the 3-point paired-preference test in a different order in two separate experimental sessions.

The 2-point paired-preference test Each subject performed two tests using the two different sample pairs (AB and BA) in a random order. In each test, subjects were asked to taste the samples from left to right and then to select the one they preferred.

The difference-preference test (the same-different test before the 3-point paired-preference test) Each subject performed four tests, each consisting of two tasks. First, they performed a same-different test, and then they performed a 3-point paired-preference test allowing a ‘no preference’ option. The four tests covered the sample sequences AA, BB, AB, and BA, in a random order. In the same-different test, subjects were required to taste the samples from left to right and then to answer whether the two samples were ‘same’ (S) or ‘different’ (D). Subjects who answered ‘different’ (D) were given a separate response sheet on which they indicated which of the two samples they preferred. If subjects answered ‘same’ (S), their preference response was recorded as ‘no preference’.

The preference-difference test (the same-different test after the 3-point paired-preference test) Each subject performed four tests, each consisting of two tasks. First, a 3-point paired-preference test allowing a ‘no preference’ option was conducted, and then a same-different test was conducted.
Subjects were required to taste the samples from left to right. Then, they were asked to indicate on a response sheet which of the two samples they preferred. Thereafter, they were also asked to indicate on a separate response sheet whether the two samples were ‘same’ (S) or ‘different’ (D).
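The counterbalanced allocation described under Experimental design can be sketched as follows. This is a minimal illustration only: the function name, the fixed seed, and the use of consecutive integers as subject identifiers are assumptions and are not taken from the study; only the four session orders and the group size of 64 come from the text and Table 1.

```python
import random

# Session orders per group, as listed in Table 1.
SESSION_ORDERS = {
    "Group 1-1": ("2-point paired-preference", "difference-preference", "preference-difference"),
    "Group 1-2": ("2-point paired-preference", "preference-difference", "difference-preference"),
    "Group 2-1": ("difference-preference", "preference-difference", "2-point paired-preference"),
    "Group 2-2": ("preference-difference", "difference-preference", "2-point paired-preference"),
}


def assign_groups(subject_ids, seed=0):
    """Randomly split subjects into equally sized groups (hypothetical helper)."""
    ids = list(subject_ids)
    n_groups = len(SESSION_ORDERS)
    if len(ids) % n_groups != 0:
        raise ValueError("number of subjects must be divisible by the number of groups")
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    rng.shuffle(ids)
    size = len(ids) // n_groups
    allocation = {}
    for i, group in enumerate(SESSION_ORDERS):
        for subject in ids[i * size:(i + 1) * size]:
            allocation[subject] = group
    return allocation


allocation = assign_groups(range(1, 257))  # 256 subjects, 64 per group
```

Each subject's three sessions can then be read from `SESSION_ORDERS[allocation[subject]]`.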

Data analysis The responses obtained from the same-different tests conducted in this study were analyzed based on the long version of the same-different test (13,14). In the long version of the same-different test, when two samples ‘A’ and ‘B’ are to be discriminated, a same pair (AA or BB) as well as a different pair (AB or BA) is presented. Thus, because the preference-difference and difference-preference test formats included both the preference question and the ‘same or different’ question, it was possible to collect the preference responses as well as the ‘same’ (S) or ‘different’ (D) responses for both the same and the different pairs of samples. For the analysis of the responses obtained in the same-different tests, two different data analysis methods were used and compared: the McNemar test (15) and the d' analysis under Thurstonian modeling/signal detection theory (SDT) (16,17). In the McNemar test, the response frequencies from each pair of samples were analyzed by a χ2-test to determine the significance of the sensory difference between samples. In Thurstonian modeling/SDT (16,17), the d' estimate indicates the degree of sensory difference between samples and is not contaminated by response bias (17). Based on the data pooled from all the subjects, the d' estimates were computed for each sample sequence in each method. These d' estimates were then tested to determine whether they were significantly different from zero, so that they could be compared with the results of the McNemar test. To analyze the responses obtained from the preference tests, the d' estimates were computed for all three methods based on data pooled across all subjects, for the different pairs of samples.
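As a rough illustration of the McNemar step, the sketch below computes the statistic from the two discordant counts (‘same’ to the same pair with ‘different’ to the different pair, and the reverse), using the middle two frequency columns of Table 2 as input. The function name is hypothetical. The source does not say whether a continuity correction was applied; the corrected form is used here only because, rounded to two decimals, it appears to reproduce the χ2 values printed in Table 2, so treat that as an inference. How the reported p-values were obtained is likewise not stated, so the sketch stops at the statistic.

```python
def mcnemar_chi2(b: int, c: int, correction: bool = True) -> float:
    """McNemar chi-square statistic (1 df) from the two discordant counts b and c."""
    if b + c == 0:
        raise ValueError("no discordant responses")
    diff = abs(b - c)
    if correction:
        # Continuity correction: shrink the absolute difference by 1 (assumed, see lead-in).
        diff = max(diff - 1, 0)
    return diff ** 2 / (b + c)


# Discordant counts per sample sequence, in the row order of Table 2.
difference_preference = [(73, 66), (69, 56), (72, 61), (73, 56)]
preference_difference = [(72, 54), (73, 62), (76, 45), (82, 58)]

dp_chi2 = [round(mcnemar_chi2(b, c), 2) for b, c in difference_preference]
pd_chi2 = [round(mcnemar_chi2(b, c), 2) for b, c in preference_difference]
```

Only the discordant counts enter the statistic; subjects who gave the same response to both pairs carry no information about the difference between the pairs.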
Based on Thurstonian modeling/signal detection theory (SDT) (16,17), it was possible to compare the test performance of the three versions of the paired-preference test (the 2-point paired-preference test, the 3-point paired-preference test from the difference-preference test, and the 3-point paired-preference test from the preference-difference test). All computations of d' estimates and their variances were performed using the R package sensR (18). The 2-AFC Thurstonian model was applied to analyze the results of the 2-point paired-preference test, whereas the results of the 3-point paired-preference test were analyzed using the Thurstonian 2-AC model (19-21). The significance test and the two-sided 95% confidence interval which was determined


using the likelihood root statistic were performed using the R packages ‘sensR’ (18) and ‘ordinal’ (22). Both packages are available for the free statistical software R (23). In addition, a one-way χ2-test was used to determine whether there was a significant difference between the response frequencies for the same pair and the different pair (24).
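For the 2-point paired-preference test, the 2-AFC Thurstonian model links the proportion choosing one sample to d' through d' = √2·Φ⁻¹(p). A minimal sketch in Python follows (the study itself used the R package sensR; this is not that package's code). The function name and the counts in the usage line are hypothetical, and the standard error shown is a simple delta-method approximation, not the likelihood root statistic used in the study.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution


def dprime_2afc(n_prefer_b: int, n_total: int) -> tuple[float, float]:
    """Return (d', approximate SE) under the 2-AFC model; positive d' favours sample B."""
    p = n_prefer_b / n_total
    if not 0.0 < p < 1.0:
        raise ValueError("proportion must lie strictly between 0 and 1")
    z = _STD_NORMAL.inv_cdf(p)
    d_prime = sqrt(2) * z
    # Delta method: SE(d') = sqrt(2) * SE(p) / phi(z), with SE(p) the binomial standard error.
    se = sqrt(2) * sqrt(p * (1.0 - p) / n_total) / _STD_NORMAL.pdf(z)
    return d_prime, se


# Hypothetical counts for illustration only (not data from the study).
d_hat, se_hat = dprime_2afc(272, 512)
```

An even 50:50 split gives d' = 0, which is why a forced-choice result near zero cannot by itself separate "no preference" from two equally sized consumer segments.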

Results and Discussion In this study, we analyzed the results of the same-different tests and the preference tests. For each test method, we investigated all the different sample sequences to determine which test yielded higher discrimination and which test had the most robust preference performance. In both the same-different tests and the preference tests, one important method parameter that can affect test performance is the sample presentation sequence (1,25). Following the definition of test robustness used by Boutrolle et al. (26) and Kim et al. (27), we examined the robustness of each consumer test method in terms of the stability of the results obtained across the different test sequences used within the same method.

Comparison of discrimination performance To determine whether a sensory difference exists between the samples, we conducted significance tests of the kind conventionally used in food sensory science. In this study, the significance tests were applied using two different approaches: the McNemar test (Table 2) and the d' analysis (Table 3). In the context of Thurstonian modeling/signal detection theory (SDT) (16,17), the test method that produces a higher d' estimate is considered more effective because it reflects better discrimination performance

between samples. Based on the 95% confidence interval, the d' estimates can also be tested against zero to determine statistical significance. Thus, when comparing sensory difference test methods, the same-different test method whose results indicate significant differences in a greater number of cases can be considered the more effective measurement method. By the same reasoning, when comparing the two data analysis methods applied to the sensory difference tests, the analysis that indicates significant differences in a greater number of cases can be considered the more effective data analysis method. The McNemar test compares two kinds of discordant responses: those of subjects who responded ‘same’ (S) to the control pair (same pair) and ‘different’ (D) to the test pair (different pair), and those of subjects who responded ‘different’ (D) to the control pair and ‘same’ (S) to the test pair. In Table 2, these response frequencies are shown in the middle two columns (shaded area). Based on the p-values presented in Table 2, we evaluated the results of the two test methods using the McNemar test. In the difference-preference test, none of the sample sequences showed a significant difference. In the preference-difference test, however, two sample sequences (p=0.01 and p=0.04) showed a significant difference. This result indicates that the discriminative sensitivity of the preference-difference test was superior to that of the difference-preference test. As shown in Table 3, the d' estimates of the same-different test were computed from the ‘same’ (S) responses to both the control pair (same pair) and the test pair (different pair). Based on the 95% confidence intervals in Table 3, the d' estimates also confirm that the preference-difference test is more discriminating than the difference-preference test.
However, unlike the results of McNemar test, the result of d' analysis indicates that the

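Table 2. Results of the McNemar test applied to the same-different tests using response frequencies obtained from 256 consumer subjects

Response frequencies1) are given by the response to the same pair followed by the response to the different pair.

Method | Stimuli sequence | S, S | S, D | D, S | D, D | χ2 | p-value2)
Difference-preference test | | 39 | 73 | 66 | 78 | 0.26 | 0.55
Difference-preference test | | 43 | 69 | 56 | 88 | 1.15 | 0.25
Difference-preference test | | 44 | 72 | 61 | 79 | 0.75 | 0.34
Difference-preference test | | 43 | 73 | 56 | 84 | 1.98 | 0.14
Difference-preference test | Pooled | 169 | 287 | 239 | 329 | 4.20 | 0.04*
Preference-difference test | | 29 | 72 | 54 | 101 | 2.29 | 0.11
Preference-difference test | | 28 | 73 | 62 | 93 | 0.74 | 0.34
Preference-difference test | | 38 | 76 | 45 | 97 | 7.44 | 0.01*
Preference-difference test | | 32 | 82 | 58 | 84 | 3.78 | 0.04*
Preference-difference test | Pooled | 127 | 303 | 219 | 375 | 13.19 | 0.00*

1) S (or D) indicates the response of ‘same’ (or ‘different’) to the sample pair.
2) Values marked * indicate a significant difference (α=0.05).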



Table 3. Results of d' estimates and their 95% CI computed for the two types of same-different tests

Method | Stimuli sequence | ‘same’ (S) responses, same pairs (256) | ‘same’ (S) responses, different pairs (256) | d' (SE) | 95% CI of d'1)
Difference-preference test | | 112 | 105 | 0.54 (0.56) | -0.57 to 1.64
Difference-preference test | | 112 | 99 | 0.74 (0.47) | -0.18 to 1.67
Difference-preference test | | 116 | 105 | 0.67 (0.50) | -0.31 to 1.66
Difference-preference test | | 116 | 99 | 0.84 (0.40) | 0.06 to 1.63*
Difference-preference test | Pooled | 228 | 204 | 0.71 (0.33) | 0.05 to 1.36*
Preference-difference test | | 101 | 83 | 0.98 (0.36) | 0.27 to 1.68*
Preference-difference test | | 101 | 90 | 0.74 (0.50) | -0.24 to 1.72
Preference-difference test | | 114 | 83 | 1.22 (0.23) | 0.75 to 1.68*
Preference-difference test | | 114 | 90 | 1.03 (0.30) | 0.43 to 1.62*
Preference-difference test | Pooled | 215 | 173 | 1.00 (0.20) | 0.61 to 1.40*

1) The two-sided 95% confidence interval for the d' estimate was computed based on the likelihood root statistic; values marked * indicate a difference from zero.

difference-preference test shows a significant difference for one sample sequence (d'=0.84; 95% CI 0.06 to 1.63). The proportions of hits and false alarms used to quantify the d' estimates were based on Thurstonian modeling/signal detection theory (SDT) (16,17). The d' estimate for this sequence is significantly different from zero because the response frequency of ‘same’ (S) for the same pair (hit) was high, while the response frequency of ‘same’ (S) for the different pair (false alarm) was low. Moreover, in the preference-difference test, one further sample sequence (d'=0.98; 95% CI 0.27 to 1.68) was found to be significantly different, in addition to the two sequences identified by the McNemar test. Therefore, it can be concluded that the d' analysis is more sensitive than the McNemar test in analyzing the same-different test results. Both data analysis methods confirmed that the sequence effects of same-different tests, previously reported for discriminating milk samples (11) and tomato juice samples containing varying levels of salt (1), also apply to discriminating carbonated drinks. In this study, such sequence effects were attributed to an increased number of ‘same’ (S) responses to the same pair of sample B (BB). This implies that the sensory characteristics of sample B might be better than those of sample A, as seen in the responses of consumers. Thus, in future sensory discrimination studies, test methods that use sample B as a constant reference or reminder might be recommended as the more efficient test methods.

Comparison of preference results In the 2-point paired-preference tests, the forced response option (prefer sample A vs. sample B) has conventionally been used for presentations of a different pair (AB or BA). Yet, this forced response option has the problem of overestimating preferences due to potential guessing (28). Moreover, if a

‘no preference’ option is not provided and the preferences are split 50:50 under the forced response option, it is impossible to distinguish whether consumer segments exist in preference (i.e., half prefer sample A and the other half prefer sample B) or whether no preference occurred because there is no difference between the two samples (19,29). Therefore, the effectiveness of the 3-point paired-preference test has recently been actively reported (20,30,31). The 3-point paired-preference test includes the ‘no preference’ option, and it is also the format that is legally approved for supporting advertising claims (24). Yet, further research is required to validate the effectiveness and robustness of the 3-point paired-preference test. To determine whether test performance differs among the three types of preference tests used in this study, the results of the three preference tests for the different pairs are compared in Table 4. We compared the preference test results only for the AB and BA sample pairs, because the 2-point paired-preference test was used only for different sample pairs. The d' estimates of the three preference tests were not significantly different from zero. For the 3-point paired-preference tests, the τ-criterion is also displayed in Table 4. The τ-criterion is the subjects’ decisional parameter, indicating the size of the preference difference used to judge whether or not the samples differ in terms of preference. When the preference test was performed as a secondary question in the difference-preference test, the subjects used a larger criterion, indicating that they were more reluctant to give a ‘preference’ response. On the other hand, when the preference test was performed as a primary question in the preference-difference test, the subjects were comparatively less reluctant to give a ‘preference’ response.
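The relation between the three response counts and the two 2-AC parameters can be sketched with closed-form estimates, assuming the usual 2-AC parameterisation in which ‘prefer A’ is given when the perceived difference falls below -τ and ‘prefer B’ when it exceeds +τ. The function name is hypothetical, and standard errors and likelihood-based intervals (which the study obtained from R packages) are not reproduced here. The counts in the usage lines are the different-pair frequencies from Table 5.

```python
from math import sqrt
from statistics import NormalDist

_N = NormalDist()  # standard normal distribution


def two_ac_estimates(n_prefer_a: int, n_no_pref: int, n_prefer_b: int) -> tuple[float, float]:
    """Return (d', tau) for a 3-point paired-preference test; positive d' favours sample B."""
    n = n_prefer_a + n_no_pref + n_prefer_b
    p_a = n_prefer_a / n
    p_a_or_none = (n_prefer_a + n_no_pref) / n
    if not 0.0 < p_a < p_a_or_none < 1.0:
        raise ValueError("all three response categories must be observed")
    # Assumed model: P(prefer A) = Phi((-tau - d') / sqrt(2)),
    #                P(prefer A or no preference) = Phi((tau - d') / sqrt(2)).
    z1 = _N.inv_cdf(p_a)
    z2 = _N.inv_cdf(p_a_or_none)
    tau = (z2 - z1) / sqrt(2)
    d_prime = -(z1 + z2) / sqrt(2)
    return d_prime, tau


# Different-pair counts (prefer A, no preference, prefer B) from Table 5.
d_pd, tau_pd = two_ac_estimates(182, 166, 164)  # preference-difference test
d_dp, tau_dp = two_ac_estimates(134, 240, 138)  # difference-preference test
```

For the preference-difference counts this gives roughly d' = -0.07 and τ = 0.59, and for the difference-preference counts roughly τ = 0.89, in line with Table 4. The larger τ reflects the larger share of ‘no preference’ responses when the preference question comes second.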
This difference could be explained by the fact that in the procedure of the difference-preference test, subjects who gave the ‘same’


Table 4. Results of d' estimates, τ-criterion, and their 95% CI for the three types of paired-preference test performed on different pairs

Method | d' (SE)1) | 95% CI of d'2) | τ-criterion (SE) | 95% CI of τ-criterion2)
3-point paired-preference test (difference-preference test) | 0.00 (0.07) | -0.13 to 0.14 | 0.89 (0.05) | 0.79 to 0.99
3-point paired-preference test (preference-difference test) | -0.07 (0.07) | -0.21 to 0.07 | 0.59 (0.04) | 0.51 to 0.67
2-point paired-preference test | 0.11 (0.08) | 0.00 to 0.27 | - | -

1) Positive values indicate preference for sample B, while negative values indicate preference for sample A.
2) Two-sided 95% confidence intervals for the d' estimate and the τ-criterion were computed using the likelihood root statistic.

Table 5. Results of preference response in terms of percentage (and frequency) for same pairs and different pairs, and the p-value of the χ2-test applied in the two types of preference test

Method | Same pairs: Prefer A | Same pairs: No preference | Same pairs: Prefer B | Different pairs: Prefer A | Different pairs: No preference | Different pairs: Prefer B
Difference-preference test | 25.6% (131) | 49.6% (254) | 24.8% (127) | 26.2% (134) | 46.9% (240) | 27.0% (138)
Preference-difference test | 34.8% (178) | 33.4% (171) | 31.8% (163) | 35.5% (182) | 32.4% (166) | 32.0% (164)
p-value