Latent class logits and discrete choice experiments: Implications for welfare measures Adán L. Martínez-Cruz1
Current research practices include the estimation of latent class logits on data collected with discrete choice experiments. This practice rests on a mismatch in the characterization of heterogeneity in preferences: while discrete choice experiments usually assume homogeneity, latent class logits seek discrete heterogeneity. This paper uses Monte Carlo simulations to study whether this mismatch impacts the reliability of welfare estimates. The experimental design in this paper varies i) the amount of discrete heterogeneity, and ii) the amount of available information, through either the number of pseudo-respondents or the number of choice sets. Resulting estimates are unbiased but exhibit relatively large dispersion in every simulated scenario. Due to this large dispersion, the null hypothesis that a welfare measure is zero cannot be rejected. This false conclusion is reached even under scenarios in which the simulated amount of information is larger than the amount usually available in empirical applications. Since the simulated scenarios closely resemble features of empirical applications, the findings from this paper imply that an analyst planning the estimation of a latent class logit on discrete choice data will need either (i) to collect more information than is usually gathered in empirical applications; or (ii) to gather information about the source and/or magnitude of discrete heterogeneity, and include this information in the sample size calculation; or (iii) to design discrete choice experiments seeking efficient WTP estimates; or (iv) a strategy combining the previous three options.
NOTICE: This is the author’s version of a work that has been accepted for publication in the Revue d’économie politique (special issue on non-market valuation). Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. A definitive version is forthcoming. Citation: Martínez-Cruz, A. L. (2015), Latent class logits and discrete choice experiments: Implications for welfare measures, Revue d’économie politique (special issue on no-market valuation), forthcoming.
1. Introduction Current practices in economics and other research fields include the estimation of latent class logits (LCL) on data collected with discrete choice experiments (DCE) (e.g. Broch and Vedel [2012]; Garrod et al. [2012]; Kikulwe et al. [2011]). 2 As pointed out by Louviere et al. [2011], this practice relies on a
1 Senior researcher at CEPE, ETH-Zurich,
[email protected]. The author thanks Ted McConnell and Anna Alberini for their comments and suggestions on previous versions of this paper, as well as participants in the first workshop on non-market valuation (IUT-University of Nantes) and one anonymous reviewer. 2 A detailed description of latent class logits and discrete choice experiments is beyond the scope of this paper. Further details about discrete choice experiments can be found in Carson and Louviere [2011], and Kuhfeld [2006] [2010]. Hoyos [2010] provides a description of the most common DCE in environmental valuation applications. Further details about latent class logits can be found in Train [2003] [2008]. McCutcheon [1987], and McLachlan and Peel [2000] provide introductory explanations of latent class analysis.
mismatch in the characterization of heterogeneity in preferences: a LCL seeks discrete heterogeneity, while DCE are usually generated with designs that assume homogeneity. 3 This paper uses Monte Carlo (MC) simulations to study whether the reliability of welfare estimates is impacted by this mismatch. Three willingness to pay (WTP) measures are studied: (i) WTP for a marginal change in an attribute; (ii) WTP for a non-marginal change in an alternative's attribute; and (iii) WTP to avoid the loss of an alternative. Reliability is evaluated in terms of unbiasedness, efficiency, and accuracy. Welfare estimates are calculated with parameters obtained from a LCL estimated on simulated choices. Choices are simulated according to scenarios that vary (i) the amount of discrete heterogeneity, and (ii) the amount of available information, through either the number of pseudo-respondents or the number of choice sets. The DCE design, levels of discrete heterogeneity, number of pseudo-respondents, and number of choice sets are chosen to resemble features of applications in the environmental and resource economics field (see section two for details on these features). In this way, results are intended to be empirically relevant. Four findings are highlighted. First, WTP estimates are unbiased regardless of the amount of discrete heterogeneity and the source and amount of information. Second, accuracy and efficiency increase with the amount of information, regardless of the source of information. Third, the most efficient and most accurate estimates are obtained when the largest amount of information is simulated, regardless of the amount of heterogeneity. Fourth, the increase in accuracy and efficiency does not prevent confidence intervals from falsely including the value zero. As a direct consequence of the last result, the null hypothesis that a WTP measure is zero cannot be rejected.
This finding implies that an analyst planning to estimate a LCL on discrete choice data needs either (i) to collect more information than usually gathered in empirical applications; or (ii) to gather information about the source and/or magnitude of discrete heterogeneity, and include this information in the sample size calculation; or (iii) to design discrete choice experiments seeking efficient WTP estimates (e.g. Scarpa and Rose [2008]); or (iv) a strategy combining the previous three options. Further discussion of these implications is presented in section five.
Few studies have researched the consequences for WTP estimates of violating the assumptions underlying the design of DCE (Carson and Louviere [2011]). Exceptions include Carlsson and Martinsson [2003], Lusk and Norwood [2005], and Ferrini and Scarpa [2007]. Only Ferrini and Scarpa [2007] have studied the consequences of violating the assumption of homogeneity in preferences, focusing on continuous unobserved heterogeneity. They study the reliability of marginal WTP estimates, comparing four design strategies – from the most rudimentary fractional-factorial strategy to the state-of-the-art Bayesian strategy. Their most revealing result is that Bayesian strategies using poor a priori information perform poorly in comparison to the fractional-factorial strategy. In contrast to Ferrini and Scarpa [2007], this paper (i) studies discrete unobserved heterogeneity; (ii) analyzes impacts on non-marginal WTP measures; and (iii) focuses on only one DCE design strategy.
2. Current practices in environmental economics The goal in this section is to establish i) the most frequent levels of discrete heterogeneity (i.e. number of classes) in empirical applications gathering data with DCE; and ii) the most frequent features of DCE in applications estimating a LCL. Both elements are used to generate simulation scenarios from which empirically relevant implications can be extracted (see section three to learn about the Monte Carlo scenarios). Thus this section describes current practices in empirical applications that estimate a LCL on data collected with DCE. This review focuses on applications in the field of environmental and resource economics.
3 Although strategies incorporating the possibility of continuous unobserved heterogeneity have been proposed (e.g. Bliemer and Rose [2010]; Sándor and Wedel [2002] [2005]; Yu et al. [2009] [2011]), these innovations have not been extended to the case in which unobserved heterogeneity is conceptualized as discrete, and they remain infrequent in environmental and resource economics.
The first column of table 1 lists the authors and year of the reviewed publications. Some papers are listed twice because they implement two DCE. The second column lists the number of alternatives included in the choice sets and whether a status quo (sq) alternative has been added. Around 74% (14) include three alternatives, one of which is a status quo alternative. All applications but Boxall and Adamowicz [2002] present unlabeled alternatives. The third column lists the number of choice sets presented to respondents. Around 53% of the applications use either six or eight choice sets; the minimum number of choice sets is four, and the maximum is 16. The fourth column of table 1 lists the methods used to i) combine attributes, and ii) generate the choice sets. With respect to the combination of attributes, 63% (12) use an orthogonal fractional-factorial design (OFFD); 21% (four) use a non-orthogonal fractional-factorial design (NFFD); one application uses a full-factorial design (FFD); and two applications do not specify the type of fractional-factorial design used (u-FFD). Around 74% (14) use a design that identifies only main effects; three applications use a design that identifies main and two-way effects; and two applications do not specify the identified effects. With respect to the generation of choice sets, 53% (10) do not report how the choice sets are created; five (26%) randomly pair alternatives; and four (21%) use shifted pairing. The fifth column of table 1 presents the strategy used to block choice sets. Eight applications (42%) do not report a blocking strategy; three (16%) do not use blocking; six (32%) use random blocking; and two applications (10%) use orthogonal blocking. The sixth column of table 1 presents the number of attributes and the levels of each attribute in the DCE.
Nine applications (47%) manipulate five attributes; four (21%) manipulate six attributes; two applications manipulate four attributes; another two manipulate seven attributes; one application manipulates eight attributes; and one more manipulates nine attributes. The most recurrent numbers of levels are six and three. Manipulation of the price attribute is of special interest for economists. Column seven presents the number of levels of the price attribute. Seven applications (37%) manipulate three levels; four (21%) manipulate four levels; four applications (21%) manipulate six levels; and five, seven and 11 levels are manipulated in one application each. Garrod et al. [2012] do not include a price attribute in their DCE. Columns eight and nine of table 1 present, respectively, the size of the samples analyzed and the number of fitted classes. With a minimum of 86 and a maximum of 1,273, the average sample size is 449. The median is 311. There are three modes, repeated twice each: 240, 253, and 300. With respect to the number of fitted classes, nine applications (47%) fit three classes; five (26%) fit two classes; four (21%) fit four classes; and one application fits five classes. As a summary of current practices, most studies use orthogonal fractional-factorial designs that identify main effects only. Around half of the applications do not report how choice sets have been generated; around a quarter use random pairing; and another quarter use shifted pairing. Around 40% of the applications do not report how choice sets were blocked, and 30% use random blocking. Around 68% of the applications manipulate either five or six attributes. Around 80% of the applications vary the price attribute across three, four or six levels. A large majority of studies use unlabeled alternatives and add a status quo alternative to the designed alternatives. Half of the applications gather information from 311 or fewer respondents.
Around half of the applications present respondents with either six or eight choice sets. With respect to the number of fitted classes, almost half of the applications have fitted three classes; a quarter have fitted two classes; and around 20% have fitted four classes. This description of current practices closely resembles the one presented by Ferrini and Scarpa [2007]. Focusing on a set of applications published before 2006, Ferrini and Scarpa [2007] find that the majority of applications use main-effects orthogonal fractional-factorial designs, add a status quo alternative to unlabeled alternatives, manipulate five or six attributes, and present respondents with four, six or eight choice tasks, and that half of the applications interview 350 or fewer respondents. 4
3. Simulation strategy Monte Carlo (MC) simulations are designed to check whether the mismatch in the characterization of heterogeneity matters for the reliability of WTP estimation. The MC simulation in this paper consists of six steps. Step one generates alternatives according to a main-effects orthogonal fractional-factorial design. Five attributes are manipulated: three attributes have two levels, one attribute has three levels, and the price attribute has four levels. Choice sets are created through shifted pairing and orthogonally blocked. Choice sets include two generic alternatives and a status quo alternative. These DCE features resemble those frequently used in environmental and resource economics applications. Step two generates true indirect utilities according to two heterogeneity scenarios assuming, respectively, two and three classes of pseudo-individuals. A large majority of applications in environmental and agricultural economics fit two or three classes to empirical data. The number of classes is varied to test whether the reliability of WTP estimates depends on the amount of unobserved discrete heterogeneity. For each heterogeneity scenario, the number of pseudo-respondents is varied across two levels: (i) 300, which is close to the median sample size in environmental and resource economics applications; and (ii) 1,000, which is larger than the sample size analyzed by 95% of the environmental and resource economics applications. The number of pseudo-respondents is varied to test whether reliability increases when a larger amount of information is used. Comparisons across variation in choice sets and pseudo-respondents provide insights into whether differences in the reliability of welfare measures originate in differences between sources of information. Thus, for each heterogeneity/pseudo-respondent pair, the number of choice sets is varied across three levels: three, six and twelve.
This range of values includes numbers below and above six and eight – the most frequent numbers of choice sets in environmental and resource economics applications. Step three simulates three different changes in attributes, and calculates the corresponding true WTP measures. WTP measures are calculated from observed utilities, which result from adding a type I extreme value error term to the true indirect utilities simulated in step two. True preference parameters – the parameters used in step two to generate true indirect utilities – are used when calculating the true WTP measures. Step four adds a type I extreme value error term to each pseudo-respondent’s true utilities, and estimates a LCL on the corresponding simulated choices. The LCL fits the corresponding number of classes to each data-generating process; i.e. the empirical model fits the correct number of classes. With the estimated preference parameters at hand, step five estimates the three WTP measures for each pseudo-respondent. Steps four and five are repeated 1,000 times, and the resulting WTP estimates are stored. Step six compares average estimated WTP against average true WTP. True and estimated WTP are averaged over pseudo-respondents. Thus there is one true average WTP and 1,000 average estimated WTP values for each heterogeneity scenario. This holds for each of the three WTP measures under study.
4 Ferrini and Scarpa [2007] also notice that many applications fail to provide a complete description of the strategy generating the DCE. The review in this paper suggests this practice remains common in the field.
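As an illustration of step one, the shifted-pairing construction of choice sets can be sketched as follows. This is a minimal sketch, not the design code used in the paper: the full factorial stands in for the orthogonal fraction, blocking is omitted, and the all-zero status quo coding is an assumption made here for concreteness.

```python
import itertools

# Attribute levels as in the simulated DCE: three 2-level attributes,
# one 3-level attribute, and a 4-level price attribute
levels = [2, 2, 2, 3, 4]

# Candidate profiles: the full factorial stands in for the orthogonal
# fraction selected in the paper (keeps the sketch simple)
profiles = list(itertools.product(*(range(k) for k in levels)))

def shifted_pair(profile, levels):
    """Shifted pairing: the second alternative shifts each attribute
    by one level, modulo its number of levels."""
    return tuple((x + 1) % k for x, k in zip(profile, levels))

# Each choice set holds one generic alternative, its shifted pair, and
# a status quo alternative (coded here as all-zero levels)
status_quo = tuple(0 for _ in levels)
choice_sets = [(p, shifted_pair(p, levels), status_quo) for p in profiles]

print(len(choice_sets))  # 96 candidate sets (2*2*2*3*4)
```

Because shifting is modular, the paired alternative never repeats a level of the first alternative on any attribute, which is what makes shifted designs attractive for main-effects estimation.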
3.1. Heterogeneity scenarios True preference parameters for both heterogeneity scenarios are listed in table 2. True preferences in the scenario with two classes closely resemble the parameters reported by Birol et al. [2006], who carry out a non-market valuation of a wetland’s attributes and subsequently present a cost-benefit analysis of different management scenarios. As in our DCE, Birol et al. [2006] manipulate five attributes, expressed in terms of variations in the conditions of a wetland with respect to current levels. The attributes’ labels are biodiversity, open water surface area, research and education, number of farmers re-trained in environmentally-friendly activities, and a one-time payment. In this MC simulation, the first three attributes are characterized in terms of two levels – high and low; the number of farmers re-trained takes three possible values – 30, 75, and 150; and the payment varies across four values – 3, 10, 40, and 80 (in American dollars). True preference parameters in the scenario with three classes closely resemble the parameters reported by Ruto et al. [2008], who carry out a non-market valuation of livestock breeds and then discuss the implications for the preservation of indigenous cattle in Kenya. As in our DCE, Ruto et al. [2008] manipulate five attributes, expressed in terms of variations in features describing the quality of the cattle. The attributes’ labels are gender, body appearance, breed, weight, and price. In this MC simulation, the first three attributes are characterized in terms of two levels – male and female, poor and good, indigenous and exotic, respectively; weight takes three possible values – 0.9, 1.2, and 1.6 (in hundreds of kilograms); and the price varies across four values – 1, 1.2, 1.5, and 1.8 (in thousands of Kenyan shillings).
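The choice-generation stage (steps two and four) can be sketched for the two-class scenario as follows. The class shares, preference parameters, and attribute values below are hypothetical placeholders, not the values reported in table 2.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical class-specific parameters (NOT the table 2 values):
# columns = [biodiversity, water, research, re-training, price]
betas = np.array([[1.2, 0.8, 0.5, 0.010, -0.05],   # class 1
                  [0.3, 1.5, 0.9, 0.004, -0.02]])  # class 2
shares = np.array([0.6, 0.4])                      # class-membership shares

n_resp, n_sets, n_alts, n_attr = 300, 6, 3, 5
X = rng.random((n_resp, n_sets, n_alts, n_attr))   # stand-in attribute levels

# Step two: assign each pseudo-respondent to a latent class
classes = rng.choice(len(shares), size=n_resp, p=shares)

# True indirect utilities, plus type I extreme value (Gumbel) errors (step four)
V = np.einsum('rsja,ra->rsj', X, betas[classes])
U = V + rng.gumbel(size=V.shape)

# Simulated choices: the utility-maximizing alternative in each choice set
choices = U.argmax(axis=-1)
print(choices.shape)  # (300, 6): one choice per respondent per choice set
```

In the paper's actual experiment, the LCL estimated on these simulated choices fits the correct number of classes, and the whole estimation is repeated 1,000 times per scenario.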
3.2. True and estimated WTP measures Three willingness to pay (WTP) measures are of interest in this paper: (i) WTP for a marginal change in an attribute – re-training in the scenario with two classes, and weight in the scenario with three classes; (ii) WTP for a 25% improvement in an attribute of alternative 1 – re-training in the scenario with two classes, and weight in the scenario with three classes; and (iii) WTP to avoid the loss of alternative 2. Calculation of WTP estimates depends on the perspective from which the calculation is approached. In this paper, true and estimated WTP measures are calculated from an empirical researcher's perspective. From this perspective, the true indirect utility individual i receives from alternative j (U_ij) is not fully known. Thus indirect utilities have two components: observed utility (V_ij) and unobserved utility (ξ_ij); i.e. U_ij = V_ij + ξ_ij. An empirical researcher can at best calculate expected welfare measures, taking expectations over the unobserved component of the utilities, i.e.

E_ξ(WTP_i) = (1/(−β̂_p)) [E_ξ(U_i^max,a) − E_ξ(U_i^max,b)] = (1/(−β̂_p)) [ln Σ_{j=1}^{J} exp(V_ij^a) − ln Σ_{j=1}^{J} exp(V_ij^b)]   (1)
where β̂_p is the estimated preference parameter for the price attribute, and the superscripts a and b stand for after and before a change in an alternative has occurred. The term in brackets is the difference between expected maximum utility after and before the change. Expression (1) is used to calculate expected WTP measures when the researcher assumes the unobserved component of utility captures only pure randomness. However, the unobserved component of utility may capture both pure randomness and unobserved heterogeneity that produces correlation between observed and unobserved utilities. That is,

U_ij = V_ij + η_ij,   η_ij = ζ_ij + ξ_ij   (2)

where η_ij stands for the unobserved component of utility; ξ_ij stands for the purely random unobserved variation; and ζ_ij stands for the unobserved component that generates correlation between observed and unobserved utilities. Following an error components interpretation, ζ_ij can be conceptualized as a deviation from average preferences. Thus an empirical researcher calculates the expected WTP over the distribution of this deviation:

E_ζ E_ξ(WTP_i) = ∫ (1/(−β̂_p)) [ln Σ_{j=1}^{J} exp(V_ij^a) − ln Σ_{j=1}^{J} exp(V_ij^b)] f(ζ) dζ   (3)
where f(ζ) is the distribution the researcher assumes for the deviations from average preferences. These deviations may be assumed continuously distributed, to model continuous unobserved heterogeneity, or discretely distributed, to model discrete unobserved heterogeneity. In this paper, true WTP measures are calculated following expression (1); i.e. true WTP measures from an empirical researcher's perspective are conceptualized as an expectation over pure randomness, with the preference parameters taking the true values listed in table 2. Estimated WTP measures are calculated according to expression (3). This expression does not reduce to a closed-form solution, so its computation is carried out by simulation. The simulation involves two steps. First, a draw is taken from f(ζ). Second, the term inside the integral in expression (3) is evaluated at the drawn value. Both steps are sequentially repeated S times. The simulated expected WTP equals the average of the S computed values (see Train [2003] for further details).
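The logsum-based welfare calculation in expressions (1) and (3) can be sketched as follows. For a LCL, where f(ζ) is discrete, the integral in (3) reduces to a probability-weighted sum of class-specific WTP values. The utilities, price coefficients, and class shares below are hypothetical numbers chosen only to exercise the formulas.

```python
import numpy as np

def logsum(V):
    """ln sum_j exp(V_j): expected maximum utility under type I EV errors."""
    m = V.max()  # subtract the max for numerical stability
    return m + np.log(np.exp(V - m).sum())

def expected_wtp(V_before, V_after, beta_price):
    """Expression (1): expected WTP for a change in the alternatives."""
    return (logsum(V_after) - logsum(V_before)) / (-beta_price)

def expected_wtp_lcl(V_before, V_after, beta_price, shares):
    """Expression (3) with discrete f(zeta): class-share-weighted WTP."""
    wtps = [expected_wtp(vb, va, bp)
            for vb, va, bp in zip(V_before, V_after, beta_price)]
    return float(np.dot(shares, wtps))

# Hypothetical two-class example: improving alternative 1 raises its
# observed utility in each class (three alternatives per choice set)
Vb = [np.array([1.0, 0.5, 0.0]), np.array([0.2, 0.8, 0.0])]  # before, per class
Va = [np.array([1.4, 0.5, 0.0]), np.array([0.3, 0.8, 0.0])]  # after, per class
wtp = expected_wtp_lcl(Vb, Va, beta_price=[-0.05, -0.02], shares=[0.6, 0.4])
print(wtp)  # positive: the improvement raises expected maximum utility
```

The WTP to avoid the loss of alternative 2 can be computed with the same functions by passing an "after" utility vector that drops the second alternative.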
4. Results Reliability of WTP estimates is evaluated in terms of unbiasedness, efficiency and accuracy. An estimate is categorized as unbiased if its 95% confidence interval includes the true WTP. The most efficient estimate is the one with the smallest 95% confidence interval; the efficiency comparison is restricted to unbiased estimates. Accuracy refers to the magnitude of the relative difference between the estimates and the true values, measured as the average of the absolute value of the relative errors. Before discussing the results in terms of unbiasedness, efficiency and accuracy, figure 1 is presented to graphically illustrate the main findings of this study. Figure 1 shows a snapshot of the empirical density of the WTP for a 25% increase in re-training (T) of alternative 1, by choice set scenario, for the scenario with two classes and 300 pseudo-respondents. 5 The vertical straight line locates the true WTP for this non-marginal change. Three features of figure 1 are highlighted. First, WTP estimates are distributed around the true WTP, regardless of the number of choice sets presented to the pseudo-respondents. Second, a larger number of choice sets yields smaller confidence intervals. Third, the three distributions include the value zero. These features are interpreted as evidence that (i) WTP estimates for a non-marginal change in T are unbiased regardless of the number of choice sets; (ii) an increase in the number of choice sets increases the efficiency of the WTP estimates; and (iii) the large dispersion of the estimates implies that the null hypothesis that a WTP measure is zero cannot be rejected. Similar stories can be inferred from the figures illustrating results from the rest of the simulation scenarios. Table 3 summarizes the performance of WTP estimates in terms of unbiasedness and efficiency. "Yes" indicates that the 95% confidence interval includes the corresponding true WTP; in this case, the WTP estimate is considered unbiased.
As shown in table 3, WTP estimates are unbiased in every case, regardless of the number of pseudo-respondents, the number of choice sets, and the number of classes. This finding holds for the three WTP measures under study. The WTP estimate with the smallest 95% confidence interval is considered the most efficient estimate. The most efficient estimate is highlighted in table 3 with an asterisk (*). According to table 3, the most efficient WTP estimate is the one obtained under the 12 choice sets scenario. This finding holds for both the two- and three-class scenarios, and for both the 300 and 1,000 pseudo-respondent scenarios. It also holds for the three WTP measures under study.
5 Similar figures for the rest of the simulation exercises are available upon request.
Table 4 summarizes results in terms of accuracy, which is measured as the average of the absolute value of the relative errors (AARE); i.e.

AARE = M^{-1} Σ_{m=1}^{M} |(WTP̂_m − WTP) / WTP|

where WTP̂_m is the WTP estimate from replication m, WTP is the true value, and M is the number of Monte Carlo replications.
A smaller AARE implies greater accuracy. Four findings from table 4 are highlighted here. First, holding the number of pseudo-respondents constant, the AARE improves with the number of choice sets, regardless of the welfare measure under study and the heterogeneity scenario. Second, holding the number of choice sets constant, the AARE improves with the number of pseudo-respondents. Third, different sources of information do not generate differences in the improvement of accuracy. For instance, take the scenario with two classes and WTPA: when passing from 900 observations (300 pseudo-respondents and three choice sets) to 3,600 observations (300 pseudo-respondents and 12 choice sets), accuracy increases in percentage terms just a little more than when passing from 900 observations to 3,000 observations (1,000 pseudo-respondents and three choice sets). Fourth, the most accurate WTP estimate is obtained under the 1,000 pseudo-respondents, 12 choice sets scenario. These findings are intuitive: increasing the available information --- either by increasing the number of choice sets or by increasing the number of pseudo-respondents --- increases the accuracy of WTP estimates. The best scenario in terms of accuracy is the one in which a large number of respondents face a large number of choice sets. As figure 1 suggests, the dispersion of welfare estimates may mean that, despite unbiasedness, the null hypothesis of a WTP measure being zero cannot be rejected. Table 5 summarizes the results in terms of whether the null hypothesis that a WTP measure is zero can correctly be rejected at 95% confidence. Three findings are highlighted. First, the 1,000 pseudo-respondents, 12 choice sets scenario is the only scenario under which the null hypothesis is correctly rejected at 95% confidence. This result holds across the two heterogeneity scenarios, and across the three WTP measures.
Second, in most cases, the 300 pseudo-respondents, three choice sets scenario cannot reject the null hypothesis. Third, the six choice sets scenario is able to correctly reject the null hypothesis only for the scenario with three classes. Findings in table 5 imply that, although the scenario with the largest number of observations yields the correct statistical conclusion for both heterogeneity scenarios, no monotonic pattern is observed in terms of whether an increase in choice sets or an increase in pseudo-respondents decreases the dispersion of WTP estimates to the point of correctly rejecting the null hypothesis that WTP is zero at 95% confidence.
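The reliability criteria used in tables 3-5 – coverage of the true WTP, width of the 95% interval, the AARE, and whether the interval excludes zero – can be sketched as follows. The Monte Carlo draws below are hypothetical normal draws standing in for the 1,000 stored average WTP estimates.

```python
import numpy as np

def aare(estimates, true_wtp):
    """Average absolute relative error over M Monte Carlo replications."""
    estimates = np.asarray(estimates, dtype=float)
    return float(np.mean(np.abs((estimates - true_wtp) / true_wtp)))

def interval_checks(estimates, true_wtp):
    """Empirical 95% interval: width (efficiency), coverage of the true
    WTP (table 3 criterion), and rejection of zero (table 5 criterion)."""
    lo, hi = np.percentile(estimates, [2.5, 97.5])
    return {'width': hi - lo,
            'unbiased': bool(lo <= true_wtp <= hi),
            'rejects_zero': bool(not (lo <= 0.0 <= hi))}

# Hypothetical replications: unbiased around a true WTP of 4.0, but with
# dispersion large enough that the interval includes zero
rng = np.random.default_rng(1)
draws = rng.normal(loc=4.0, scale=3.0, size=1000)
print(aare(draws, 4.0))
print(interval_checks(draws, 4.0))
```

With these stand-in draws the estimator is "unbiased" by the coverage criterion yet fails to reject zero, which is precisely the pattern the paper reports for the smaller-information scenarios.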
5. Conclusions and discussion Current research practices include the estimation of LCL on data gathered with DCE. This practice rests on a mismatch in the characterization of heterogeneity in preferences: LCL seek discrete heterogeneity, but DCE are frequently generated with fractional-factorial designs that assume homogeneous preferences. This paper carries out a series of Monte Carlo simulations to check whether WTP estimates are reliable despite this mismatch. The findings in this paper are straightforward. First, WTP estimates are unbiased regardless of the amount of discrete heterogeneity – two or three classes of pseudo-respondents – and the amount and source of information – pseudo-respondents or choice sets. Second, accuracy and efficiency increase with the amount of information, regardless of its source. Third, regardless of the amount of discrete heterogeneity, the most efficient and most accurate estimates are obtained under the scenario with the largest simulated information – 1,000 pseudo-respondents and 12 choice sets. Fourth, the increase in accuracy and efficiency does not prevent that, under some scenarios, the null hypothesis that a WTP measure is zero cannot be rejected. The first three findings are good news. According to the first result, WTP estimates are unbiased despite the described mismatch in assumptions about heterogeneity in preferences. The second and third findings are intuitive because increasing the number of choice sets and pseudo-respondents is
simply increasing the information available to the statistical model. Given that estimates are unbiased, increasing information positively impacts efficiency, and the increase in efficiency translates into improvements in accuracy. Despite the good news, there is a nuance that turns out to have implications for the empirical literature. This nuance refers to the finding that the null hypothesis that WTP is zero falsely cannot be rejected at 95% confidence under some scenarios. This finding holds under scenarios that resemble conditions frequently found in the empirical literature: for instance, the scenario with two classes, 300 pseudo-individuals, and either three, six or 12 choice sets cannot reject the null hypothesis for a non-marginal change. These scenarios are relevant for the empirical literature because the median sample size in published papers is around 300, and the most frequent numbers of choice sets are six and eight. Thus, according to the findings from our simulations, empirical applications may face the risk of yielding unbiased WTP estimates that falsely cannot be distinguished from zero. A direct implication of these findings is that empirical researchers may need to gather more information than is usually gathered in environmental and resource economics. Take into account that heterogeneity in empirical applications arises from a number of different sources, which implies that the dispersion of WTP measures in empirical applications may be larger than the dispersion observed in the experiments presented in this paper. Thus the risk of falsely not rejecting the zero null hypothesis is likely larger in empirical applications than implied by the findings in this paper, making the suggestion of gathering more information particularly relevant.
When facing the decision to increase the available information, empirical researchers may decide (i) to collect more information, either through a larger number of respondents or by presenting each respondent with a larger number of choice sets; or (ii) to gather information about the source and/or magnitude of discrete heterogeneity, and include this information in the sample size calculation; or (iii) to design discrete choice experiments seeking efficient WTP estimates; or (iv) a strategy combining the previous three options. An empirical researcher deciding between increasing the number of respondents and increasing the number of choice sets may consider that, according to the findings in this paper, no monotonic improvement in efficiency is observed from increasing either of the two. Not surprisingly, the findings in this paper suggest that analyzing the largest amount of simulated information yields dispersion small enough to correctly reject the null hypothesis that WTP is zero. Consider, however, the amount of information simulated in this paper: 1,000 pseudo-respondents and 12 choice sets. Obtaining this amount of information is not an easy task in practice. On the one hand, financial considerations are behind the decision to present each respondent with several choice sets. On the other hand, respondents may not be willing to answer many choice sets. In deciding what amount of information should be gathered, empirical researchers may want to consider recent evidence suggesting that respondents can answer up to 16 or 17 choice sets without showing symptoms of tiredness (see Bech et al. [2011]; Hess et al. [2012]). These results may be context-dependent, so researchers still need to pay attention to designing discrete choice experiments that minimize the mental burden on the respondent. Information about the source and/or magnitude of discrete heterogeneity can be useful when calculating the sample size.
In the ideal case, a researcher may know the number of classes — or strata, in sampling terminology — that can be considered in a stratified sampling strategy. In this way, the variance of estimated parameters and the sample size can be reduced, with the corresponding impact on the dispersion of WTP estimates. However, LCL are usually implemented in contexts for which the source and magnitude of discrete heterogeneity are not obvious. More realistically, because the use of pilot surveys and/or semi-structured interviews is common in discrete choice experiment applications, researchers may obtain some information about the sources and magnitude of heterogeneity. This information is useful in proposing a priori distributions for preference parameters. In this context, empirical researchers may also consider strategies that incorporate a priori information about preferences in the design of discrete choice experiments. Some of these strategies do not rely on the assumption of homogeneity in preferences (see Bliemer and Rose [2010]; Ferrini and Scarpa [2007]; Scarpa and Rose [2008]; Yu et al. [2009], [2011]).
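One simple way to fold pilot information into the sample size calculation is a standard power computation: given a pilot-based guess of the dispersion of the WTP estimator, choose a sample large enough to reject a zero WTP with the desired power. A minimal sketch, with purely hypothetical numbers; the function name and inputs are illustrative, not from the paper:

```python
from math import ceil
from statistics import NormalDist

def n_for_power(wtp, sigma, alpha=0.05, power=0.80):
    """Respondents needed to reject H0: WTP = 0 at level alpha with the given
    power, where sigma is a pilot-based guess of the per-respondent standard
    deviation of the WTP estimate."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return ceil((z * sigma / wtp) ** 2)

# e.g., a WTP of 5 with per-respondent dispersion of 50 (signal-to-noise of
# 1:10, plausible when heterogeneity inflates dispersion) requires ~785
# respondents -- well above the median sample size of ~300 noted earlier
print(n_for_power(5.0, 50.0))
```

The point of the sketch is that the required sample size grows with the square of the dispersion-to-WTP ratio, which is why unanticipated heterogeneity can make conventional sample sizes inadequate.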
The work by Scarpa and Rose [2008] is of particular relevance in this context. They propose a criterion that minimizes the variance of functions of parameter estimates — and WTP is precisely such a function. This criterion, however, has not been generalized to the case in which parameters are discretely heterogeneous. While the Monte Carlo simulations in this paper resemble current practices in the field of environmental and resource economics, the conclusions and implications are relevant to any field where LCL are used to infer heterogeneity from data gathered with a DCE.
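For concreteness, the LCL probability that underlies simulations of this kind — a finite mixture of conditional logits over a panel of choice sets (see Train [2003]; McLachlan and Peel [2000]) — can be sketched as follows. The array shapes, names, and two-class default are illustrative, not the paper's code.

```python
import numpy as np

def lcl_loglik(params, X, choices, n_classes=2):
    """Latent class logit log-likelihood for panel choice data.
    X: (N, T, J, K) attributes for N respondents, T choice sets, J alternatives,
    K attributes; choices: (N, T) indices of chosen alternatives.
    params: K coefficients per class, then (n_classes - 1) membership logits."""
    N, T, J, K = X.shape
    betas = params[: n_classes * K].reshape(n_classes, K)
    # class shares via softmax, class 0 as reference
    raw = np.concatenate(([0.0], params[n_classes * K:]))
    shares = np.exp(raw) / np.exp(raw).sum()
    ll = 0.0
    for n in range(N):
        prob_n = 0.0
        for c in range(n_classes):
            v = X[n] @ betas[c]                          # (T, J) utilities
            p = np.exp(v - v.max(axis=1, keepdims=True))
            p /= p.sum(axis=1, keepdims=True)            # (T, J) choice probs
            seq = np.prod(p[np.arange(T), choices[n]])   # prob of observed sequence
            prob_n += shares[c] * seq                    # mix over classes
        ll += np.log(prob_n)
    return ll

# demo on random data; with all coefficients zero, every choice probability is
# 1/J, so the log-likelihood equals N * T * log(1/J)
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4, 2, 3))
choices = rng.integers(0, 2, size=(50, 4))
print(lcl_loglik(np.zeros(2 * 3 + 1), X, choices))
```

Maximizing this function (e.g., with a quasi-Newton routine or the EM algorithm of Train [2008]) yields the class-specific coefficients and class shares from which welfare measures are computed.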
References
BECH M., T. KJAER and J. LAURIDSEN [2011], "Does the number of choice sets matter? Results from a web survey applying a discrete choice experiment". Health Economics, 20, 273-286.
BEHARRY-BORG N. and R. SCARPA [2010], "Valuing quality changes in Caribbean coastal waters for heterogeneous beach visitors". Ecological Economics, 69, 1124-1139.
BIROL E., K. KAROUSAKIS and P. KOUNDOURI [2006], "Using a choice experiment to account for preference heterogeneity in wetland attributes: The case of Cheimaditida wetland in Greece". Ecological Economics, 60, 145-156.
BIROL E., E. R. VILLALBA and M. SMALE [2009], "Farmer preferences for milpa diversity and genetically modified maize in Mexico". Environment and Development Economics, 14(4), 521-540.
BLIEMER M. C. and J. M. ROSE [2010], "Construction of experimental design for mixed logit models allowing for correlations across choice observations". Transportation Research Part B, 44, 720-734.
BOXALL P. C. and W. L. ADAMOWICZ [2002], "Understanding heterogeneous preferences in random utility models: A latent class approach". Environmental and Resource Economics, 23, 421-446.
BROCH S. W. and S. E. VEDEL [2012], "Using choice experiments to investigate the policy relevance of heterogeneity in farmer agri-environmental contract preferences". Environmental and Resource Economics, 51, 561-581.
BROUWER R., J. MARTIN-ORTEGA and J. BERBEL [2010], "Spatial preference heterogeneity: A choice experiment". Land Economics, 86(3), 552-568.
CARLSSON F. and P. MARTINSSON [2003], "Design techniques for stated preferences methods in health economics". Health Economics, 12, 281-294.
CARSON R. T. and J. J. LOUVIERE [2011], "A common nomenclature for stated preference elicitation approaches". Environmental and Resource Economics, 49, 539-559.
CHUNG C., B. C. BRIGGEMAN and S. HAN [2012], "Willingness-to-pay for beef quality attributes: A latent segmentation analysis of Korean grocery shoppers". Journal of Agricultural and Applied Economics, 44(4), 447-459.
COLOMBO S., N. HANLEY and J. J. LOUVIERE [2009], "Modeling preference heterogeneity in stated choice data: An analysis for public goods generated by agriculture". Agricultural Economics, 40, 307-322.
FERRINI S. and R. SCARPA [2007], "Designs with a priori information for nonmarket valuation with choice experiments: A Monte Carlo study". Journal of Environmental Economics and Management, 53, 342-363.
GARROD G., E. RUTO, K. WILLIS and N. POWE [2012], "Heterogeneity of preferences for the benefits of Environmental Stewardship: A latent-class approach". Ecological Economics, 76, 104-111.
HESS S., D. A. HENSHER and A. DALY [2012], "Not bored yet – revisiting respondent fatigue in stated choice experiments". Transportation Research Part A, 46, 626-644.
HOYOS D. [2010], "The state of the art of environmental valuation with discrete choice experiments". Ecological Economics, 69, 1595-1603.
KIKULWE E. M., E. BIROL, J. WESSELER and J. FALCK-ZEPEDA [2011], "A latent class approach to investigating demand for genetically modified banana in Uganda". Agricultural Economics, 42, 547-560.
KOSENIUS A. K. [2010], "Heterogeneous preferences for water quality attributes: The case of eutrophication in the Gulf of Finland, the Baltic Sea". Ecological Economics, 69, 528-538.
KUHFELD W. F. [2006], "Construction of efficient designs for discrete choice experiments", in R. Grover and M. Vriens (eds.), The Handbook of Marketing Research, 312-329, Sage Publications.
KUHFELD W. F. [2010], Marketing research methods in SAS: Experimental design. SAS Institute, available at http://support.sas.com/techsup/technote/ts722.pdf.
LOUVIERE J. J., D. PIHLENS and R. CARSON [2011], "Design of discrete choice experiments: A discussion of issues that matter in future applied research". Journal of Choice Modelling, 4(1), 1-8.
LUSK J. L. and F. B. NORWOOD [2005], "Effect of experimental design on choice-based conjoint valuation estimates". American Journal of Agricultural Economics, 87, 771-785.
MCCUTCHEON A. L. [1987], Latent class analysis. Sage Publications.
MCLACHLAN G. and D. PEEL [2000], Finite mixture models. John Wiley and Sons.
MILON J. W. and D. SCROGIN [2006], "Latent preferences and valuation of wetland ecosystems". Ecological Economics, 56, 162-175.
OUMA E., A. ABDULAI and A. DRUCKER [2007], "Measuring heterogeneous preferences for cattle traits among cattle-keeping households in East Africa". American Journal of Agricultural Economics, 89(4), 1005-1019.
RUTO E., G. GARROD and R. SCARPA [2008], "Valuing animal genetic resources: A choice modeling application to indigenous cattle in Kenya". Agricultural Economics, 38, 89-98.
SÁNDOR Z. and M. WEDEL [2002], "Profile construction in experimental choice designs for mixed logit models". Marketing Science, 21(4), 455-475.
SÁNDOR Z. and M. WEDEL [2005], "Heterogeneous conjoint choice designs". Journal of Marketing Research, 42(2), 210-218.
SCARPA R., A. G. DRUCKER, S. ANDERSON, N. FERRAES-EHUAN, V. GOMEZ, C. R. RISOPATRON and O. RUBIO-LEONEL [2003], "Valuing genetic resources in peasant economies: The case of hairless creole pigs in Yucatan". Ecological Economics, 45, 427-443.
SCARPA R. and J. M. ROSE [2008], "Design efficiency for non-market valuation with choice modeling: How to measure it, what to report and why". Australian Journal of Agricultural and Resource Economics, 52, 253-282.
TRAIN K. [2003], Discrete choice methods with simulation. Cambridge University Press.
TRAIN K. [2008], "EM algorithms for nonparametric estimation of mixing distributions". Journal of Choice Modelling, 1(1), 40-69.
VAN PUTTEN I. E., S. M. JENNINGS, J. J. LOUVIERE and L. B. BURGESS [2011], "Tasmanian landowner preferences for conservation incentive programs: A latent class approach". Journal of Environmental Management, 92, 2647-2656.
YU J., P. GOOS and M. VANDEBROEK [2009], "Efficient conjoint choice designs in the presence of respondent heterogeneity". Marketing Science, 28(1), 122-135.
YU J., P. GOOS and M. VANDEBROEK [2011], "Individually adapted sequential Bayesian conjoint-choice designs in the presence of consumer heterogeneity". International Journal of Research in Marketing, 28, 378-388.
Table 1. Design of discrete choice experiments in empirical applications estimating a latent class logit

Paper                            Alternatives^a  Choice  Method to combine attributes^b /  Method to block  Attributes /             Levels of price  Sample  Number of
                                                 sets    generate choice sets              choice sets      levels                   attribute        size    fitted classes
Boxall and Adamowicz [2002]      5 + sq          8       OFFD / unspecified                unspecified      5 / 4^4 × 2              4                620     4
Scarpa et al. [2003]             2 + sq          6       OFFD / random                     random           5 / 2^3 × 3 × 4          4                300     2
Birol et al. [2006]              2 + sq          8       OFFD / unspecified                random           5 / 2^3 × 4^2            4                407     2
Milon and Scrogin [2006]         2               7       OFFD / unspecified                unspecified      6 / 3^6                  3                240     3
Milon and Scrogin [2006]         2               7       OFFD / unspecified                unspecified      6 / 3^6                  3                240     3
Ouma et al. [2007]               2 + sq          12      NFFD / unspecified                no blocking      8 / 2^5 × 3^3            3                253     3
Ouma et al. [2007]               2 + sq          11      NFFD / unspecified                no blocking      7 / 2^4 × 3^3            3                253     3
Ruto et al. [2008]               2 + sq          8       OFFD / random                     random           5 / 2^3 × 3^2            3                311     3
Birol et al. [2009]              2 + sq          6       OFFD / random                     random           5 / 3^2 × 2^2 × 5        5                420     3
Colombo et al. [2009]            2 + sq          6       OFFD / shifted                    unspecified      6 / 3^5 × 6              6                300     3
Beharry-Borg and Scarpa [2010]   2 + sq          8       OFFD / shifted                    orthogonal       6 / 3^6                  3                86      2
Beharry-Borg and Scarpa [2010]   2 + sq          9       OFFD / shifted                    orthogonal       9 / 3^9                  3                198     2
Brouwer et al. [2010]            2 + sq          4       OFFD / unspecified                unspecified      5 / 2^2 × 3^2 × 6        6                619     4
Kosenius [2010]                  2 + sq          6       NFFD^c / unspecified              unspecified      5 / 3^4 × 7              7                726     5
Kikulwe et al. [2011]            2 + sq          16      u-FFD / unspecified               no blocking      4 / 3^2 × 2 × 6          6                421     2
van Putten et al. [2011]         2 + sq          8       u-FFD^d / shifted                 unspecified      5 / 2^4 × 4              4                132     3
Broch and Vedel [2012]           2 + sq          6       NFFD^d / unspecified              unspecified      4 / 3^3 × 6              6                853     4
Chung et al. [2012]              3 + sq          10      FFD / random                      random           7 / 2^3 × 3^2 × 5 × 11   11               873     3
Garrod et al. [2012]             2               4       OFFD^c / random                   random           5 / 2^5                  NA               1,273   4

^a sq: status quo alternative. ^b OFFD: orthogonal fractional-factorial design; NFFD: non-orthogonal fractional-factorial design; FFD: full-factorial design; u-FFD: unspecified fractional-factorial design. ^c Identified effects are not specified. ^d Main effects and two-way effects are identified; otherwise, only main effects are identified.
Table 2. Preferences in heterogeneity scenarios under study

True values of preference parameters

Two classes^a                    Class 1    Class 2    Class 3
  Relative size                  0.700      0.300      NA
  Status quo                     2.400      0.270      NA
  Biodiversity                   -1.200     0.000      NA
  Open water surface area        0.160      0.300      NA
  Research and education         0.140      0.000      NA
  Re-training                    0.003      0.003      NA
  Payment                        -0.015     -0.045     NA

Three classes^b                  Class 1    Class 2    Class 3
  Relative size                  0.260      0.240      0.500
  Gender                         -0.150     -0.300     1.410
  Body appearance                11.000     2.100      -1.410
  Breed                          -0.480     -0.210     -0.150
  Weight                         2.000      5.500      3.500
  Price                          7.000      5.000      3.000

Further details about the attributes can be found in ^a Birol et al. [2006] and ^b Ruto et al. [2008].
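As a reminder of how preference parameters of the kind shown in Table 2 map into welfare measures, marginal WTP in each class is the ratio of an attribute coefficient to the negative of the payment coefficient, and a population-level figure weights the class-specific ratios by the relative class sizes. The sketch below uses placeholder values, not the entries of Table 2:

```python
# Hypothetical two-class LCL parameters (illustrative values only)
shares = [0.7, 0.3]             # relative class sizes
beta_attr = [0.30, 0.10]        # coefficient on the attribute of interest
beta_price = [-0.015, -0.045]   # payment coefficients

# class-specific marginal WTP: beta_attr / (-beta_price)
mwtp_by_class = [b / -p for b, p in zip(beta_attr, beta_price)]

# share-weighted population marginal WTP
mwtp_pop = sum(s * w for s, w in zip(shares, mwtp_by_class))
print(mwtp_by_class, mwtp_pop)
```

Because each class-specific WTP is a ratio of two estimated coefficients, its sampling distribution can be heavy-tailed, which contributes to the large dispersion of the welfare measures reported below.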
Figure 1. Snapshot of the empirical density of WTP for a 25% improvement in re-training of alternative 1, by number of choice sets (scenario with two classes, 300 pseudo-respondents)

Table 3. 95% confidence interval includes the true WTP measure^a,b
                 300 pseudo-respondents        1,000 pseudo-respondents
                 Three    Six     Twelve       Three    Six     Twelve
Two classes
  WTPA           Yes      Yes     Yes*         Yes      Yes     Yes*
  WTPL           Yes      Yes     Yes*         Yes      Yes     Yes*
  MWTP           Yes      Yes     Yes*         Yes      Yes     Yes*
Three classes
  WTPA           Yes      Yes     Yes*         Yes      Yes     Yes*
  WTPL           Yes      Yes     Yes*         Yes      Yes     Yes*
  MWTP           Yes      Yes     Yes*         Yes      Yes     Yes*

^a Yes: true WTP measure is included in the 95% confidence interval. ^b WTPA: WTP for 25% increase in re-training of alternative 1 (two classes), or WTP for 25% increase in weight of alternative 1 (three classes); WTPL: WTP to avoid loss of alternative 2 (both heterogeneity scenarios); MWTP: WTP for a marginal change in re-training (two classes), or weight (three classes). * Smallest 95% confidence interval.
Table 4. Average absolute value of relative errors (AARE)^a,b

                 300 pseudo-respondents        1,000 pseudo-respondents
                 Three    Six     Twelve       Three    Six     Twelve
Two classes
  WTPA           1.163    0.691   0.458        0.592    0.366   0.256
  WTPL           0.760    0.468   0.333        0.367    0.256   0.184
  MWTP           0.032    0.019   0.013        0.016    0.010   0.007
Three classes
  WTPA           0.063    0.053   0.031        0.076    0.028   0.017
  WTPL           0.067    0.025   0.016        0.027    0.013   0.008
  MWTP           1.394    0.312   0.177        0.499    0.163   0.089

^a AARE = (1/M) Σ_{m=1}^{M} |(WTP̂_m − WTP) / WTP|, where WTP̂_m is the estimate from the m-th repetition and M is the number of Monte Carlo repetitions; i.e., 1,000. ^b WTPA: WTP for 25% increase in re-training of alternative 1 (two classes), or WTP for 25% increase in weight of alternative 1 (three classes); WTPL: WTP to avoid loss of alternative 2 (both heterogeneity scenarios); MWTP: WTP for a marginal change in re-training (two classes), or weight (three classes).
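The AARE defined in note (a) is straightforward to compute from the vector of Monte Carlo estimates; a minimal sketch, with illustrative names and inputs:

```python
import numpy as np

def aare(wtp_hats, true_wtp):
    """Average absolute relative error over M Monte Carlo WTP estimates."""
    wtp_hats = np.asarray(wtp_hats, dtype=float)
    return float(np.mean(np.abs((wtp_hats - true_wtp) / true_wtp)))

# toy example: estimates of 4, 6, and 5 around a true WTP of 5
print(aare([4.0, 6.0, 5.0], 5.0))
```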
Table 5. Null hypothesis that the WTP measure is zero can correctly be rejected at 95% confidence^a,b

                 300 pseudo-respondents        1,000 pseudo-respondents
                 Three    Six     Twelve       Three    Six     Twelve
Two classes
  WTPA           No       No      No           No       No      Yes
  WTPL           Yes      Yes     Yes          Yes      Yes     Yes
  MWTP           No       No      Yes          No       Yes     Yes
Three classes
  WTPA           No       Yes     Yes          No       Yes     Yes
  WTPL           No       Yes     Yes          No       Yes     Yes
  MWTP           No       Yes     Yes          No       Yes     Yes

^a No: null hypothesis that the WTP measure is zero cannot be rejected at 95% confidence; Yes: null hypothesis that the WTP measure is zero can be rejected at 95% confidence. ^b WTPA: WTP for 25% increase in re-training of alternative 1 (two classes), or WTP for 25% increase in weight of alternative 1 (three classes); WTPL: WTP to avoid loss of alternative 2 (both heterogeneity scenarios); MWTP: WTP for a marginal change in re-training (two classes), or weight (three classes).