ESTIMATION OF MIXTURE MODELS USING CROSS-VALIDATION OPTIMIZATION: IMPLICATIONS FOR CROP YIELD DISTRIBUTION MODELING

JOSHUA D. WOODARD AND BRUCE J. SHERRICK

A critical issue in identifying an appropriate characterization of crop yield distributions is that the best-fitting distribution in an in-sample framework is not necessarily the best choice out-of-sample. This study provides a methodology for estimating flexible and efficient mixture models using cross-validation that alleviates many of these associated model selection issues. The method is illustrated in an application to the rating of group risk insurance products. Results indicate that nonparametric models often fit best in-sample but are inefficient and consistently overstate true rates, and vice versa for parametric models. The proposed model provides unbiased rates and also has desirable efficiency properties.

Key words: crop insurance, cross-validation, insurance rating, mixture models, out-of-sample likelihood, probability forecasting, yield distributions.

Historically, much attention in the agricultural economics literature has focused on fitting crop yield distributions, identifying the “best” alternative, and designing and investigating desirable frameworks within which to do so (Antle 2010; Atwood, Shaik, and Watts 2003; Day 1965; Gallagher 1986; Hennessy 2009; Just and Weninger 1999; Ker 1996; Ker and Coble 2003; Ker and Goodwin 2000; Lanoue et al. 2010; Nelson 1990; Nelson and Preckel 1989; Norwood, Roberts, and Lusk 2004; Ramirez 1997; Ramirez, Misra, and Field 2003; Ramirez, McDonald, and Carpio 2010; Ramirez, Misra, and Nelson 2003; Sherrick et al. 2004; Turvey and Zhao 1999; Wang and Zhang 2002). Several subareas have evolved related to this problem, including assessment of “in-sample” goodness-of-fit measures across candidate distributions (Sherrick et al. 2004), development and modification of parametric and semiparametric distributional models (Goodwin and Ker 1998; Goodwin, Roberts, and Coble 2000; Ker and Coble 2003; Ker and Goodwin 2000; Racine and Ker 2006; Ramirez 1997), assessment of implications for insurance rates (Ker and Coble 2003; Sherrick et al. 2004), and assessment of “out-of-sample” forecasting performance (Lanoue et al. 2010; Norwood, Roberts, and Lusk 2004).1

Joshua D. Woodard is an assistant professor in the Department of Agricultural Economics at Texas A&M University. Bruce Sherrick is a professor in the Department of Agricultural and Consumer Economics at the University of Illinois at Urbana-Champaign. The authors would like to thank Ximing Wu, the editor, and two anonymous reviewers for helpful comments and suggestions. All errors are our own.

Yet, disagreement remains about what the “right” distribution choice is for any given yield dataset, what criteria should be used for evaluating goodness-of-fit, and even what estimation framework should be employed. As noted by Norwood, Roberts, and Lusk (2004; hereafter NRL) and more recently by Lanoue et al. (2010), the best-fitting distribution in an in-sample framework does not necessarily lead to the most efficient or least biased choice for a particular application.2

1 The studies listed are only a sampling of some of the important research related to this question; this list is in no way intended to be exhaustive. We are not aware of any review papers on this topic that span yield distribution fitting, evaluation, and applications to crop insurance, though versions exist on each individual theme.
2 We use the term “bias” to refer to the degree to which the expectation of the candidate’s cumulative distribution function (CDF) diverges from the true underlying CDF; “efficiency” refers generally to the degree of variability in the difference between the candidate and true CDF in repeated sampling for a relevant metric (e.g., integrated root mean square error). Alternatively, where the intent is clear, in the case of insurance rates, bias refers to the extent to which the expectation of the insurance rate generated from the candidate distribution model differs from the true rate, while efficiency refers to the degree of variability of the insurance rate generated from a candidate distribution around the true rate in repeated sampling for a relevant metric (e.g., root mean square error).

Amer. J. Agr. Econ. 93(4): 968–982; doi: 10.1093/ajae/aar034. Received July 2010; accepted April 2011; published online July 12, 2011. © The Author (2011). Published by Oxford University Press on behalf of the Agricultural and Applied Economics Association. All rights reserved. For permissions, please e-mail: [email protected]

At the core of this phenomenon is the concern that sampling variability may render highly parameterized models prone to overfitting. With this problem in mind, NRL use a framework for evaluating candidate distributions taking into account the inherent sampling variability in observed data by constructing “out-of-sample” log-likelihood (OSLL) functions, in which OSLL realizations are constructed by successively estimating the yield distribution model while holding out observation(s), and then evaluating the predicted density value at the out-of-sample observation(s). This approach is more generally known as cross-validation (CV) when used to rank candidate distributions. This study extends previous research on out-of-sample yield probability forecasting by employing CV techniques to estimate an optimal mixing of the candidate distribution models themselves. The central concept is deceptively simple: if a valid and desirable singular choice among models can be made according to this criterion (i.e., implicitly optimizing the choice among models by assigning a weight of 0 to all but one model, which receives a weight of 1), then it is likewise valid and likely more desirable to assign component weights on [0, 1] to each model (i.e., mixing). Application of this logic regarding optimization via CV is not new, as related ideas have been applied to problems in statistics and econometrics, including bandwidth selection in kernel density estimation (Cameron and Trivedi 2005), machine learning and artificial intelligence (Corduneanu and Bishop 2001; Moore and Lee 2004), switching regression and model transition smoothing (Goldfeld and Quandt 1976), and price forecasting and training of artificial neural networks (Shahwan and Odening 2007), but to the authors’ knowledge, a CV approach has not been applied to the problem of mixing probability distribution models of different classes/types or different estimators.3 The model that results from this procedure is a mixed distribution composed of multiple underlying distributions or distribution estimators. It results in a form that can express a wider degree of flexibility than any single underlying model alone, and should also be preferred out-of-sample. In some sense the resulting model is a semiparametric estimator, although it is more general in that it

3 While not examples of CV, approaches to Bayesian model averaging exist that mix across different distribution types as well.


can encapsulate virtually any other parametric, semiparametric, or nonparametric estimator without modification to the basic approach. Here, the terminology “out-of-sample preferred” is used to refer to the model that maximizes the OSLL using a “leave-one-out” CV measure, as employed by NRL. The OSLL criterion has the desirable asymptotic property that it maximizes the Kullback–Leibler information criterion, and it has also been found to have desirable small-sample properties in these applications (Norwood, Ferrier, and Lusk 2001; Norwood, Lusk, and Roberts 2002; NRL). The resulting mixture model also has the advantage that the model selection/mixing process is easily automated, so an analyst would not need to manually pick each distribution or estimator type for different samples. The present study makes several contributions to the crop yield distribution modeling literature. First, it provides a straightforward framework for addressing the overfitting problem that has been documented to complicate in-sample distribution estimation, allowing for parameterizations of highly flexible models that remain desirable out-of-sample. The results indicate that this methodology generally provides an effective approach to addressing the overfitting phenomenon while still allowing for great flexibility in the underlying distribution shape. Second, the classic approach to out-of-sample distribution analysis has routinely focused either on measures of yield prediction error or on the likelihood of the predicted density (NRL); this study also shows how the OSLL approach in NRL can be generalized to accommodate a wide class of relevant penalty or loss functions, which often should differ based on the context of application. Third, the study highlights the dangers associated with unrestricted application of highly parameterized probability models. Specifically, via in-sample and out-of-sample comparisons, the results indicate a clear relationship between the flexibility of the candidate distribution (in terms of the number of parameters, or by virtue of the distribution being non- or semiparametric) and its propensity to overfit to sampling error in actual data. Monte Carlo results of an applied insurance rating analysis of a popular crop insurance product provide convincing evidence that the method is effective in overcoming many of the problems that confound traditional modeling frameworks. Specifically, the method is shown to generate rates that are substantially less
biased and more accurate than the existing parametric and nonparametric alternatives evaluated.

Probability Models for In-Sample and Out-of-Sample Analysis

Consider the problem of evaluating alternate distributions such as Weibull, beta, mixture-of-normals, kernel density, etc., each fit to a data sample, $Y = \{y_1, y_2, \ldots, y_N\}$. Suppose that there are $K$ candidate distributional models, $M(\eta) = \{M_1(\eta), M_2(\eta), \ldots, M_K(\eta)\}$, available, each with a specific pdf, $f_k(x, \theta_k)$, and an estimator for the optimal “in-sample” model parameters, $\theta_k^*(\eta)$, given some observed data $\eta$ and an observation, $x$, at which the pdf is evaluated, so that $M_k(\eta) = \{f_k(x, \theta_k);\ \theta_k^*(\eta)\}$, where $\eta$ represents the data used to estimate the optimal parameters, $\theta_k^*$. In this notation, the traditional in-sample distribution fitting approach is to set $\eta = Y$, resulting in model $k$ as $M_k(Y) = \{f_k(x, \theta_k);\ \theta_k^*(Y)\}$. The fitted pdf of model $k$ given all in-sample data, $Y$, evaluated at $x$ is then expressed as $f_k(x, \theta_k^*(Y))$. The in-sample likelihood for model $k$ is then $L_k^{In} = \prod_{i=1}^{N} f_{k,i}(y_i, \theta_k^*(Y))$. The best likelihood measure implicitly “selects” the single candidate that optimizes this form of congruence. In the case of modeling out-of-sample yields, one observation is withheld for out-of-sample evaluation; the remaining data are denoted $y_{-i} = \{y_j \in Y : j \neq i\}$.4 This is sometimes referred to as “leave-one-out” CV. The “out-of-sample” fitted pdf for model $k$ evaluated at the hold-out observation is denoted $f_{k,i}(y_i, \theta_k^*(y_{-i}))$. The OSLL for model $k$ is then calculated as $L_k^{Out} = \prod_{i=1}^{N} f_{k,i}(y_i, \theta_k^*(y_{-i}))$. For ease of exposition, let $\theta_{k,i}^* = \theta_k^*(y_{-i})$, so that $L_k^{Out} = \prod_{i=1}^{N} f_{k,i}(y_i, \theta_{k,i}^*)$. NRL argue that a desirable objective/criterion is to select the model $k^*$ such that $L_{k^*}^{Out} > L_k^{Out}\ \forall k$. This idea is applied in the following to the evaluation and optimization of mixture models that are composed of two or more of the $K$ component models.
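To make the leave-one-out construction concrete, the following is a minimal Python sketch of the OSLL for one candidate model (an illustration only; the authors’ computations were done in MATLAB). The two-parameter Weibull shown as the example candidate is fit by MLE with SciPy, with the location fixed at zero.

```python
import numpy as np
from scipy import stats

def osll(y, fit, pdf):
    """Leave-one-out OSLL: refit the model on y_{-i}, then evaluate the
    log of the fitted density at the held-out observation y_i."""
    total = 0.0
    for i in range(len(y)):
        y_minus_i = np.delete(y, i)         # hold out observation i
        theta = fit(y_minus_i)              # in-sample estimator on the rest
        total += np.log(pdf(y[i], theta))   # out-of-sample density value
    return total

# Example candidate: two-parameter Weibull, MLE with location fixed at zero.
fit_weibull = lambda d: stats.weibull_min.fit(d, floc=0)
pdf_weibull = lambda x, th: stats.weibull_min.pdf(x, *th)
```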

4 Other schemes are also possible, including leave-more-out, v-fold, and Monte Carlo/resampling techniques. The focus herein is on the leave-one-out approach, but the logic can be extended with little modification to the others as well, and could serve as an interesting area of future research.

Optimization of a Mixed Probability Model Using Out-of-Sample Likelihood Functions

The optimal mixing of two or more of the $K$ available component models, given sample data $Y$, is carried out by optimizing the leave-one-out OSLL CV criterion. While other CV criteria exist in addition to the leave-one-out approach, the framework developed here can be applied as a straightforward extension to other, more complex cases, such as v-fold and “leave-more-out” CV (Baumann 2003; Zhang 1993). Let $W = \{w_1, w_2, \ldots, w_K : \sum_{k \in K} w_k = 1;\ w_k \geq 0\ \forall k\}$ be the $K$ component weights and $f_{MIX}(x \mid W, M, \eta) = \sum_{k \in K} w_k \, f_k(x, \theta_k^*(\eta))$ be the mixture distribution associated with weights $W$, the set of candidate models $M$, and data $\eta$. The objective is to find the optimal mixing weights $W^*$ of the underlying component models for $f_{MIX}$. The out-of-sample likelihood function of $f_{MIX}$ is then $L_{MIX}^{Out}(W, M, Y) = \prod_{i=1}^{N} \left[\sum_{k \in K} w_k \, f_{k,i}(y_i, \theta_{k,i}^*)\right]$. In order for the $K$ component models to truly represent out-of-sample measures, the parameters must also themselves be functions of only $y_{-i}$ and not $y_i$. These conditions suggest a well-defined objective that can be optimized to find the optimal mixing of models in $f_{MIX}$. That is, the optimal mixing weights $W^*$ that maximize the out-of-sample likelihood function are:

(1) $W^*(M, Y) = \arg\max_W \left\{L_{MIX}^{Out}(W, M, Y)\right\}.$

Optimization of equation (1) is straightforward but could be computationally intensive for large or even moderately scaled problems.5 Note that the optimization of weights at the out-of-sample stage does not pertain to, nor affect, the estimation of the model parameters of the underlying candidate distributions, as doing so would simply reduce the problem back to the simple “in-sample” estimates; rather, it optimizes the weight given to each model itself.6
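As an illustration of equation (1), the weight optimization is a constrained problem over the simplex once the leave-one-out density values are precomputed. The Python sketch below (a stand-in for the authors’ MATLAB routine) takes a matrix F with F[i, k] = f_{k,i}(y_i, θ*_{k,i}); consistent with the point above, no component parameters are re-estimated at this stage.

```python
import numpy as np
from scipy.optimize import minimize

def optimal_weights(F):
    """Maximize the out-of-sample likelihood of the mixture, i.e. minimize
    -sum_i log(sum_k w_k * F[i, k]), subject to w >= 0 and sum(w) = 1."""
    n_obs, k = F.shape
    neg_osll = lambda w: -np.sum(np.log(F @ w + 1e-300))  # guard against log(0)
    simplex = {"type": "eq", "fun": lambda w: w.sum() - 1.0}
    res = minimize(neg_osll, np.full(k, 1.0 / k), method="SLSQP",
                   bounds=[(0.0, 1.0)] * k, constraints=[simplex])
    return res.x
```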

5 The necessary computations to conduct the out-of-sample optimization for the 961 counties (37 observations each) in this study take about 10 minutes to run in MATLAB on a medium high-end desktop computer. For large datasets, resampling and Monte Carlo techniques could be employed to reduce the number of needed validations and evaluations of the log-likelihood function. This is left as an area of future research.
6 In general, if the underlying true distribution is a mixture, this method will provide an inconsistent estimate of the weight, except in special cases where the distributions being mixed are parameterized such that data generated from one distribution would result in consistent parameter estimates of the second, and vice versa. Other methods, such as expectation-maximization, would provide consistent estimates in general, but the resulting model of such approaches would also presumably provide less efficient out-of-sample model estimates than the method proposed.

At the last stage, the full sample of data $Y$ can then be utilized along with the optimal weights $W^*$ to arrive at the final mixture model pdf as

(2) $f_{MIX}^*(x \mid W^*, M, Y) = \sum_{k \in K} w_k^* \, f_k(x, \theta_k^*(Y)).$

That is, in equation (2), the full data set is employed to estimate the component model parameters, $\theta_k^*(Y)$, for the final mixture pdf after the optimal weights have been estimated. In addition to the desirable flexibility and efficiency properties of the estimator, it also has the advantage that it is relatively simple and straightforward to estimate. Further, the only output that is needed from the model in order to implement it in practice is simply an optimized set of weights and the set of models. This is in contrast to more complicated alternatives, which require complex programs and evaluations of potentially large raw datasets at every level.
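A minimal sketch of equation (2): with the weights held fixed, the final model is just a weighted sum of component pdfs whose parameters are refit on the full sample. The component_pdfs callables are assumed to be built from full-sample fits such as the one sketched earlier.

```python
def mixture_pdf(x, weights, component_pdfs):
    """Equation (2): final mixture pdf with full-sample parameters theta_k*(Y).
    component_pdfs[k] is the k-th fitted pdf as a callable of x."""
    return sum(w * pdf(x) for w, pdf in zip(weights, component_pdfs))
```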

Cross-Validation Optimization of a Mixed Probability Model Using Other Criteria

Oftentimes the purpose of estimating yield distributions is for a specific application in which certain measures of economic importance are of greater relevance than the distribution itself. Importantly, optimizing with respect to the OSLL criterion can be extended to accommodate other possible objectives or loss functions. For example, if an insurer were to estimate yield distributions for rating applications, then the predictive error of the resulting insurance rate generated from the predicted density over only the insured region is perhaps of greater interest than the distribution itself. In this case, one could define a loss function as the sum of squared differences between the out-of-sample loss-cost and the rate estimated using the in-sample distribution, holding out observation $y_i$, as $C_k = \sum_i (r_{-i,k} - r_i)^2$, where the estimated rate, $r_{-i,k}$, for model $k$ with hold-out yield $y_i$ is

$r_{-i,k} = (1/G) \int_0^G \max(0, G - x) \, f_{k,i}(x, \theta_k^*(y_{-i})) \, dx$

where $G$ is a guaranteed yield (or trigger yield) for the yield insurance policy, and $r_i$ is the associated out-of-sample loss-cost for hold-out yield $y_i$, calculated as $r_i = \max(0, G - y_i)/G$. Note that in this case the loss-cost observed for period $i$, $r_i$, does not depend on the model $k$ under consideration. One could similarly define an underlying mixture model for the probability distribution used to develop the rate, and thus obtain the rate for the mixture model consisting of $K$ component models as

$r_{-i,MIX} = (1/G) \int_0^G \max(0, G - x) \sum_{k \in K} w_k \, f_{k,i}(x, \theta_k^*(y_{-i})) \, dx$


so that $C_{MIX} = \sum_i (r_{-i,MIX} - r_i)^2$. Last, analogous to equation (1), the optimal set of weights $W^*$ can be obtained as the solution to the loss function minimization

(3) $W^* = \arg\min_W (C_{MIX}).$
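To illustrate, the rate-based criterion can reuse the same simplex-constrained optimization as equation (1): because the integral defining the rate is linear in the weights, the component rates $r_{-i,k}$ can be precomputed once and the mixture rate obtained as their weighted sum. A minimal Python sketch under these assumptions (quadrature via SciPy; an illustration, not the authors’ implementation):

```python
import numpy as np
from scipy.integrate import quad

def loo_rate(pdf_i, G):
    """r_{-i,k}: expected shortfall below the trigger yield G, divided by G,
    under the leave-one-out fitted density pdf_i (the integrand is only
    evaluated on [0, G], where G - x >= 0)."""
    val, _ = quad(lambda x: (G - x) * pdf_i(x), 0.0, G)
    return val / G

def rate_loss(w, R, r_obs):
    """C_MIX of equation (3). R[i, k] = r_{-i,k} for component k;
    r_obs[i] = max(0, G - y_i) / G. Linearity gives r_{-i,MIX} = (R @ w)[i]."""
    return np.sum((R @ w - r_obs) ** 2)
```

The resulting objective can be passed to the same SLSQP simplex optimization sketched for the OSLL criterion.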

The final mixture model pdf under this criterion is again expressed as $f_{MIX}^*(x \mid W^*, M, Y)$, as in equation (2), the difference being that a different objective is used to choose $W^*$ in equation (3).

Data

To illustrate the CV optimized mixture modeling approach, county-level corn yields from the USDA National Agricultural Statistics Service for 1972–2008 were employed, consisting of major Midwest production states, plus Maryland. Counties were included only if they contained data for all thirty-seven years, resulting in 961 counties. A feature of corn yields is that they have increased through time due to technological gains. To account for technological change, the data are first detrended against time, as commonly required (see, e.g., Coble et al. 2008; Sherrick et al. 2004). This study uses a linear trend form at
the county level and assumes homoscedasticity (unconditional) of yields around trend. A robust iteratively reweighted least squares Huber M-estimator was employed to estimate the trend (Fabozzi, Kolm, and Pachamanova 2007; Rachev et al. 2006). The use of robust estimators has gained some popularity in the empirical literature (Finger 2010; Ramirez, Misra, and Nelson 2003), further motivating its use. The homoscedasticity assumption has been supported for Midwest corn yields in previous research (Woodard, Sherrick, and Schnitkey 2008; Yu and Babcock 2009).7 Implicitly, this approach also assumes that yields are distributed independently across time. This assumption is typically warranted on the basis that year-to-year weather does not tend to exhibit autocorrelation in this region (Jewson and Brix 2005; Woodard and Garcia 2008).
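A sketch of the detrending step using the robust linear model in statsmodels with a Huber norm (iteratively reweighted least squares). Re-centering the residuals on the trend value fitted for the final sample year is an assumed convention here; the text does not state the base year used.

```python
import numpy as np
import statsmodels.api as sm

def detrend_yields(years, yields):
    """Huber M-estimator linear trend; residuals are treated as
    homoscedastic (unconditional) around the trend."""
    years = np.asarray(years, dtype=float)
    yields = np.asarray(yields, dtype=float)
    X = sm.add_constant(years)
    rlm = sm.RLM(yields, X, M=sm.robust.norms.HuberT()).fit()
    resid = yields - rlm.predict(X)
    base = rlm.predict([[1.0, years.max()]])[0]  # assumed base year: last year
    return base + resid
```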

Empirical Approach

The empirical analysis is applied to several distributions commonly used in crop insurance applications. The performance of the set of candidates is evaluated with the log-likelihood criterion in both in-sample and out-of-sample frameworks, using optimal CV mixing models estimated for both the in-sample and out-of-sample cases.8 The in-sample optimal mixture models are estimated similarly to the out-of-sample optimal mixtures, except that the in-sample log-likelihood is used as the basis of optimization, as opposed to the out-of-sample log-likelihood in equation (1). That is, the component models are first estimated using all in-sample data; the optimal in-sample

7 The homoscedasticity assumption is not central to evaluating the relative performance of the proposed method. Indeed, the method could be successfully applied to a simple form of heteroscedasticity simply by transforming the residuals first. The objective function itself could even be extended to allow for joint estimation of trend, heteroscedasticity form, and distribution shape, although this is beyond the scope of the study. Of course, no one can ever know the “true” nature of heteroscedasticity present in these Midwest counties in “reality”; however, here we simply attempt to define a reasonable data-generating process so as to assess the model in the context of a relevant application.
8 In practice an analyst would use the optimal out-of-sample mixing weight in conjunction with the parameters for the component models estimated from the entire dataset to construct the final model. In the Results section, we evaluate the out-of-sample likelihood values for the individual models and the optimal mixture model using the same set of parameters for each hold-out observation in order to make comparisons on the same basis. Thus, the decrease in the OSLL for the out-of-sample mixture model versus the component models is due only to the mixing parameter, and not to the estimation of the component model parameters.

mixing weight is then solved for using the in-sample likelihoods in equation (1). In order to maintain tractability when estimating mixture model combinations, the analysis limits the number of component models to combinations of two. Each county is assessed individually to allow for evaluation of spatial patterns in the results.9 In practice, various levels of aggregation could be implemented within the current framework in a straightforward manner (e.g., estimating optimal mixture combinations for a state or region instead of county by county). The following distributions were evaluated due to their prevalence in related work: two-parameter Weibull (Weibull; Lanoue et al. 2010; Sherrick et al. 2004), four-parameter beta (Beta; Nelson and Preckel 1989; Sherrick et al. 2004), normal (Normal; Atwood, Shaik, and Watts 2003; Just and Weninger 1999), a semiparametric two-component mixture of normals (TCMN; Goodwin, Roberts, and Coble 2000; Ker 1996), and a kernel density estimator (Kdensity; Deng, Barnett, and Vedenov 2007). All parametric distributions were fit using maximum likelihood estimation (MLE). TCMN was fit using the expectation-maximization (EM) algorithm. A Gaussian (normal) kernel was used for Kdensity with the standard Silverman plug-in estimate for bandwidth. For the Beta distribution fitting, initial estimates of the scale and shape parameters were obtained by first scaling the data on the unit interval and then estimating the scale and shape parameters using a standard beta fitting routine; initial estimates of the lower and upper limits were set to zero and 120% of the maximum observation, and final parameters were solved for using constrained MLE. Several other approaches were explored for fitting the beta, but overall the use of other approaches (including fixing upper and lower limits in various combinations, different starting points, etc.) had only trivial implications for the conclusions, and all performed similarly out-of-sample. Several other distributions were also investigated and considered, including other kernel density estimators (different kernels and methods of estimating bandwidth), a three-parameter Weibull, and a three-component mixture of normals, among others. The main results of the study are not sensitive to the distributions selected for evaluation.
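For illustration, the candidate set might be fit as in the sketch below, with SciPy and scikit-learn standing in for the authors’ implementation. This is a simplified version: in particular, it fixes the beta’s lower and upper limits at zero and 120% of the sample maximum rather than solving the constrained MLE described above.

```python
import numpy as np
from scipy import stats
from sklearn.mixture import GaussianMixture

def fit_candidates(y):
    """Fit the five candidate models to a detrended county yield sample y."""
    y = np.asarray(y, dtype=float)
    return {
        "Weibull": stats.weibull_min.fit(y, floc=0),               # 2-param MLE
        "Beta": stats.beta.fit(y, floc=0, fscale=1.2 * y.max()),   # limits fixed
        "Normal": stats.norm.fit(y),
        "TCMN": GaussianMixture(n_components=2).fit(y.reshape(-1, 1)),  # EM
        "Kdensity": stats.gaussian_kde(y, bw_method="silverman"),  # Gaussian kernel
    }
```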

9 If the county data are spatially correlated, efficiency could possibly be improved by incorporating the spatial correlation in the estimation. While this is an interesting possibility, it is beyond the scope of the current study, is not central to demonstrating the basic method developed here, and is left as an area of future study.

Table 1. Average Log-Likelihood (single distributions)

Distribution   Out-of-Sample   In-Sample
Weibull        −159.6782       −157.2170
Beta           −174.2531       −155.6764
Normal         −162.3535       −159.3889
TCMN           −161.3321       −150.7567
Kdensity       −161.8254       −155.6130

Table 2. Frequency Model Ranked Best (single distributions)

Distribution   Out-of-Sample   In-Sample
Weibull        0.6296          0.0031
Beta           0.0031          0.1686
Normal         0.0947          0.0000
TCMN           0.1332          0.7149
Kdensity       0.1394          0.1134

Results

The single distribution models are first assessed to document differences in performance implied by in-sample versus out-of-sample criteria. Results for optimal out-of-sample mixture models (OptMixOut) are then presented, followed by a comparison to in-sample mixture models.

Out-of-Sample versus In-Sample Results for Single Distribution Models

Table 1 presents average log-likelihood values for each of the distributions evaluated. The first column of results presents OSLLs, while the second presents in-sample log-likelihoods (ISLLs). Table 2 presents the frequency with which each distribution was ranked best for its county. Based on the ISLL criterion, TCMN is the best-fitting distribution, with an ISLL value of −150.7567, followed by Kdensity, Beta, Weibull, and Normal. The rankings in table 2 present similar findings, with TCMN ranking best over 71% of the time, followed by Beta (16.86%) and Kdensity (11.34%); Weibull (0.31%) and Normal (0%) ranked very low in-sample in terms of frequency of best fit. The out-of-sample results present a drastically different picture. In terms of the average OSLL, Weibull is by far the best-fitting distribution (−159.67). Interestingly, Beta was the worst performing (−174.25), followed by Normal (−162.33), Kdensity (−161.82), and TCMN (−161.33). The frequency of best rank was also drastically different, with Weibull again performing much better (ranked best 62.96% of the time) than the other candidate models. The results show that the semiparametric TCMN estimator has the potential to outperform other, more highly parameterized forms (the four-parameter Beta) but is also
very prone to overfitting and poor out-of-sample performance relative to more parsimonious forms. The exception was Normal, which likely underperformed because, while parsimonious, it is not able to accommodate the negative skewness typical of crop yields in this region. On the other hand, Weibull was both parsimonious and close enough in shape to the underlying data to perform well out-of-sample, even though it might not have been selected on the basis of in-sample fitting alone. The results comparing Weibull versus Beta are most revealing. While Beta performed quite well in-sample due to its shape characteristics, its out-of-sample performance was very poor relative to Weibull. The difference between the ISLL and OSLL for Weibull (a difference of −2.46) was also much smaller than for its more flexible counterparts (−18.58 for Beta, −10.58 for TCMN, and −6.21 for Kdensity), again highlighting the impact of the overfitting phenomenon. Figures 1 and 2 present maps indicating the best-fitting distribution in each county (in-sample and out-of-sample, respectively). Referring to the in-sample case (figure 1), the best-fitting distribution is dominated by TCMN and Kdensity, although there is one region in the west-central zone that is well characterized in-sample by Beta. The out-of-sample case (figure 2) shows that the best-fitting distribution was most often Weibull. Figure 2 also displays spatial patterns in terms of the best ranking distribution, showing that the most desirable distribution varies by region.

Figure 1. Best ranking distribution, in-sample (ISLL criterion)

Figure 2. Best ranking distribution, out-of-sample (OSLL criterion)

These results highlight the fact that caution should be exercised with both nonparametric and parameterized forms when one is selecting distributions or evaluating them in strictly in-sample contexts. Although NRL investigated a different semiparametric estimator, we found, unlike NRL, that this semiparametric estimator had the same potential to suffer from overfitting. Overall, the tradeoff between degree of parameterization (either in terms of the number of parameters or by virtue of the distribution being
semi- or nonparametric) and degree of overfitting comes through quite clearly in the results. Lastly, and along those lines, the results regarding the superior performance of Weibull are consistent with the findings of Sherrick et al. (2004) and Lanoue et al. (2010), who found Weibull to be a reasonable characterization of yields for this crop/region in crop insurance applications.

Optimal Out-of-Sample Mixture Model and Weight Estimation Results

Table 3 presents average OSLLs for the estimated optimal mixture model (OptMixOut) in the top panel, while table 4 presents the frequencies with which particular mixture model combinations ranked best. Table 5 (top panel) presents the average optimal weight of each distribution in OptMixOut for each combination, where the weights presented are the weight of the model in each row, and the corresponding column gives the companion distribution. Note that OptMixOut usually has positive weights on more than one model (e.g., the optimal weight on the Weibull distribution in
the Weibull–TCMN mixture was 76.42%; other combinations are evident in table 5). As shown in table 4, the mixture models containing Weibull often ranked best, with Weibull–Normal ranking best most frequently (17.79% of the time) relative to all other models, followed by the Weibull mixtures containing Kdensity (15.09%), Beta (13.74%), and TCMN (11.86%). It is worth noting that the trivial mixture models (i.e., where all weight is on one component distribution) were in some cases superior to any other possible mixture model. The values in panel 1 of table 4 do not sum to 100%, indicating that in some cases the highest ranking model consisted simply of one of the single models. Overall, the highest ranked model (in terms of OSLL) was a mixture model in 72.32% of cases, and 27.68% of the time the optimal model consisted of a single distribution. The high prevalence of this occurrence is likely due to the fact that Weibull happens to be a highly adequate single-distribution representation of these particular data. To explore the degree of heterogeneity in weights, in terms both of spread over the [0, 1] interval and of variation over space, figure 4 presents a map of the optimal weight on the Weibull distribution when Weibull was a component in the optimal model (i.e., had the highest OSLL for the county).
Table 3. Average Log-Likelihood (mixture distributions)

Component Dist. 1\2   Beta        Normal      TCMN        Kdensity
Out-of-Sample
Weibull               −159.4211   −159.3987   −159.2044   −159.1836
Beta                  –           −160.9439   −160.7255   −160.8224
Normal                –           –           −160.5285   −160.7433
TCMN                  –           –           –           −160.4761
In-Sample
Weibull               −155.5666   −157.1262   −144.0992   −155.4424
Beta                  –           −155.6405   −143.6675   −154.5260
Normal                –           –           −144.5979   −155.5800
TCMN                  –           –           –           −143.9135

Table 4. Frequency Model Ranked Best (mixture distributions)

Component Dist. 1\2   Beta     Normal   TCMN     Kdensity
Out-of-Sample
Weibull               0.1374   0.1779   0.1186   0.1509
Beta                  –        0.0073   0.0281   0.0229
Normal                –        –        0.0156   0.0343
TCMN                  –        –        –        0.0302
In-Sample
Weibull               0.0312   0.0000   0.0531   0.0073
Beta                  –        0.0083   0.0801   0.0447
Normal                –        –        0.0042   0.0000
TCMN                  –        –        –        0.0385

Note: Excludes cases in which the optimal mixture model was composed of only one distribution; thus totals sum to the fraction improved by mixed fitting.

Table 5. Average Optimal Weight in Optimal Mixture Combination

Component Dist. 1\2   Weibull   Beta     Normal   TCMN     Kdensity
Out-of-Sample
Weibull               –         88.92%   85.56%   76.42%   77.41%
Beta                  11.08%    –        35.91%   21.28%   29.63%
Normal                14.44%    64.09%   –        46.21%   46.33%
TCMN                  23.58%    78.72%   53.79%   –        54.92%
Kdensity              22.59%    70.37%   53.67%   45.08%   –
In-Sample
Weibull               –         25.03%   89.82%   26.61%   27.03%
Beta                  74.97%    –        95.04%   28.24%   50.01%
Normal                10.18%    4.96%    –        24.51%   7.50%
TCMN                  73.39%    71.76%   75.49%   –        72.71%
Kdensity              72.97%    49.99%   92.50%   27.29%   –

Note: Weights are for the distribution in each row. The distribution named in the column is the companion distribution in the mixture model.

As is evident, the optimal weights vary greatly across counties and also demonstrate some spatial correlation by region. In many cases the weight is near zero (0.000–0.120) or one (0.881–1.000), but in most
cases it is somewhere in between. Referring again to table 3, OptMixOut also tended to substantially and consistently outperform its component distribution counterparts. This result is most evident for the highly flexible distributions. For example, the OSLL for Beta increases from −174.25 to −159.42 when mixed with Weibull, to −160.72 when mixed with TCMN,
to −160.94 when mixed with Normal, and to −160.82 when mixed with Kdensity. Similar results are found for TCMN, Kdensity, and Normal. The Weibull mixtures also consistently outperform the single-model Weibull, albeit modestly. This result is not surprising, since the mixtures are more flexible and were optimized according to the OSLL criterion, and thus must perform at least as well.

Figure 3. Weight of Weibull in optimal mixture model, in-sample (OptMixIn)

Figure 4. Weight of Weibull in optimal CV mixture model, out-of-sample (OptMixOut)

Out-of-Sample versus In-Sample Estimation of Mixture Model Weights

The next issue to address is whether it is even necessary to employ optimized CV mixture models in an out-of-sample optimization framework in the first place. A priori, we would expect the in-sample optimization of mixing weights across models to overweight the more flexible models, in much the same way that highly parameterized distributions tend to overfit in-sample. To shed light on this issue, the lower panels in tables 3, 4, and 5 present results for the optimal mixture models using the ISLL as the basis for optimizing weights (OptMixIn). The results are in stark contrast to the out-of-sample procedure.

For example, the average ISLL in panel 2 of table 3 is greatest for models with TCMN and Kdensity as components, whereas in panel 1 (out-of-sample) the Weibull models tend to perform best. Table 4 presents similar findings, indicating that, based on in-sample criteria, there appears to be little to gain from mixture models, as they improve fit only about 26.74% of the time. Table 5 also shows that in-sample estimation leads to highly skewed estimates of the optimal weights. For example, referring to the Weibull–Kdensity mixture model, out-of-sample estimation gives a weight of 77.41% on average to Weibull and 22.59% to Kdensity, whereas in-sample estimation greatly overweights Kdensity relative to Weibull (72.97% versus 27.03%). Contrasting figure 3 (in-sample optimal Weibull weight) with figure 4 (out-of-sample optimal Weibull weight), it is apparent again that Weibull is greatly underweighted in the optimal in-sample mixture model relative to the out-of-sample optimized model. In fact, in most cases the Weibull weight is essentially zero in the in-sample mixture case, but Weibull was weighted much
more equally with other models when optimized using out-of-sample criteria. This result not only illustrates the fundamental overfitting problem endemic to more highly parameterized and semi- and nonparametric methods, but also shows profound differences in estimated optimal mixing weights in an in-sample versus out-of-sample framework.10 Overall, the common theme in contrasting in- and out-of-sample results is that the more highly parameterized distributions tend to fit well in-sample but perform dramatically worse out-of-sample. This overfitting phenomenon has also been illustrated in a similar context by Lanoue et al. (2010). The results hold for the other distributions investigated as well (not reported), including the other kernel density estimators, three- and four-component mixtures, and the other procedures investigated for fitting the beta distribution. Here, the Weibull distribution tended to perform well because its fundamental shape is able to accommodate the data used in this study (as regards skewness/kurtosis combinations) while remaining parsimonious. Normal, while parsimonious, tends not to perform as well due to its restrictive shape properties; that is, it is symmetric and has fixed kurtosis. As expected, the in-sample mixing estimates tend to overweight the more flexible models relative to the optimal out-of-sample weights, and the fact that the in-sample and out-of-sample estimated weights vary dramatically indicates that efficient estimation of such mixing weights should be approached only in an out-of-sample framework. The method for determining optimal mixtures of candidate models for the out-of-sample case also appears to perform quite well, substantially increasing the OSLL in many cases; in most cases the resulting model was composed of nontrivial mixtures. In some cases the optimal weight was simply equal to one, indicating that it is not always necessary to mix models if the best-fitting single component model sufficiently describes the shape of the data, but the losses in estimation from applying the more general approach are slight. In most cases, though, where more flexibility is useful to describe the underlying data, the weights are split between models in an optimal manner,

10 Second-order out-of-sample optimization of weights was also explored (not reported) by dropping two observations (one for estimating the weights, and the other for validating the weights themselves); while some variability resulted, the differences between first-order and second-order out-of-sample optimization are minimal.


and there appear to be meaningful spatial patterns in these features as well.

Monte Carlo Insurance Rating Analysis

It is reasonable to investigate whether the differences across alternative distributional representations are economically meaningful in practical applications. If the resulting mixture model distribution simply performs similarly to the best single distribution model, then the gains from implementing the method may not justify the extra effort. Additionally, it should be pointed out that, in some sense, just as in-sample estimators can be prone to overfitting in-sample, out-of-sample estimators (such as that used to estimate the OptMixOut mixing weight) could be prone to overfitting to out-of-sample sampling error in any particular sample. In order to assess these impacts in the context of a relevant application, this section conducts a straightforward Monte Carlo analysis of the impact that adopting the proposed mixture modeling framework would have on rate estimation for an insurer that uses distributional rating. The rating application adopted is a county-based insurance product, and the process we adopt is similar to the approach currently used by the Risk Management Agency (RMA) of the USDA for estimating group risk protection (GRP) insurance rates. GRP is an insurance product offered under the Federal Crop Insurance Program that bases indemnities on county crop yields.11 As explained by Coble et al. (2008), the RMA establishes rates for GRP by first detrending National Agricultural Statistics Service county yields and then using the residuals to construct a base rate. To examine the impacts of alternative distributional representations, comparable rates were generated for each county using each of the individual distributions (Weibull, Normal, Beta, TCMN, and Kdensity), as well as the optimal mixture distributions estimated from the in-sample and out-of-sample approaches (OptMixIn and OptMixOut). Recall, OptMixIn is estimated similarly to OptMixOut except that instead of maximizing the mixing weight according to the

11 For further details and policy context, including more complete explanations of this program and its rating, see Deng, Barnett, and Vedenov (2007), Ker and Coble (2003), or Skees, Black, and Barnett (1997), among others.

OSLL criterion, the ISLL is simply optimized directly. The Monte Carlo analysis assumes that the “true” distribution used for sampling is defined by the distribution that results from fitting a kernel density to the observed sample data on a county-by-county basis. This is similar to the approach taken in earlier work, whereby “pilot” distributions are assumed to take the form of some nonparametric density based on the actual data observed (see, e.g., Ker and Coble 2003; Ker and Goodwin 2000). Samples of size N = 37 were used to most closely reflect the level of data constraints under which the current program operates (i.e., the same length as our sample period). At each iteration of the simulation a sample of size N was drawn from the (assumed) “true” distribution; the sample was then used to estimate parameters for each of the candidate models.12 Five hundred iterations of the simulation were conducted. The assumed “true” rate (True) is that which results from the assumed kernel density, and is provided for comparison. The standard GRP indemnity function is used to generate expected indemnities for each county. The rate is then calculated as the expected indemnity divided by the liability, where the liability equals the average detrended yield multiplied by the coverage level. The expected indemnity is calculated by integrating the GRP indemnity function over the appropriate distribution. Numerical integration (quadrature) is used to arrive at the final rate.13 See Deng, Barnett, and Vedenov (2007) for further details on the GRP calculations and indemnity function. The four coverage levels of 65%, 75%, 85%, and 90% are evaluated and tabulated.

Several statistics are reported to evaluate the performance of the methods. First, in order to assess aggregate rate bias, the mean rates (Mean Rate) are generated from the simulations under each method as the average rate over all simulations and all counties. In order to gauge the volatility in rates caused by sampling effects that would be expected within a county, the root mean square errors (Rate RMSE) between the rates generated by each of the individual distributions and the True rate (for each county) are calculated. Note that the analysis was conducted county by county, so the results reflect aggregate county results. Thus, the aggregate mean rate tells us only the degree to which rates under each candidate method are biased for the whole system (i.e., for all counties); it could be the case that any particular method is unbiased in only certain counties and not others. Thus we also report a measure of rate accuracy across counties (Rate Accuracy), calculated as the root mean square error across counties between the mean rate (averaged over all iterations for each county) and the True rate for the respective counties. Rate Accuracy thus reflects the volatility in expected rate bias across counties under each method; this interpretation is different from that of the Rate RMSE, which reflects the expected intra-county rate volatility due to sampling effects.

12 Note, in the case of OptMixOut, that the optimal mixing weight is estimated with the OSLL CV optimization technique using the available N sampled observations that would be observable to the analyst. Thus, OSLL evaluations are obtained by successively estimating the parameters of the component models with N − 1 of the sample observations and evaluating the OSLL value for each of the hold-out observations. The optimal out-of-sample weight is then solved for as described previously. The final parameters of the component models are then estimated using all N observations and used in conjunction with the estimated weight to estimate rates.
13 Simulating from the mixture distributions, OptMixOut and OptMixIn, is straightforward provided that one can sample from the component distributions. While there are several ways to sample from mixed distributions, the simplest involves a two-step process: first, a uniform variate is drawn to determine which component distribution to draw from (based on the relative weights); second, a random variate is drawn from the corresponding component distribution, which becomes the draw.

Monte Carlo Insurance Rating Analysis Results

Results of the rating analysis are presented in tables 6 (Mean Rate), 7 (Rate RMSE), and 8 (Rate Accuracy). The Monte Carlo rate results lend strong support to the usefulness of our proposed out-of-sample mixture model (OptMixOut). Specifically, the parametric models (while modestly less volatile) systematically underestimated rates by 10%–25%, while the nonparametric (Kdensity) and semiparametric (TCMN) models and OptMixIn consistently overestimated rates by 15%–40%.14 On the other hand, OptMixOut was virtually unbiased. Furthermore, it was roughly as efficient (by RMSE) as Kdensity and consistently more efficient than TCMN and OptMixIn. OptMixOut also outperformed all models on the basis of Rate Accuracy across counties. The True rates are equal to 3.8407%, 2.9555%, 1.7327%, and 0.9909% for the 90%, 85%, 75%, and 65% coverage levels, respectively.

14 A reviewer pointed out that the use of higher-order kernels may decrease the bias in the rates generated under the kernel density. Given that, to our knowledge, higher-order kernels have not been applied in the agricultural insurance literature, further investigation of such models may provide a worthwhile avenue for future research.
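A sketch of the rate construction and of the composition sampler described in footnote 13. The one-to-one protection level (liability proportional to the trigger, so the rate reduces to (1/G)·E[max(0, G − y)]) is a simplifying assumption of this sketch, and pdf, expected_yield, and samplers are hypothetical inputs.

```python
import numpy as np
from scipy.integrate import quad

def grp_rate(pdf, expected_yield, coverage):
    """Rate = expected indemnity / liability for a GRP-style county contract,
    with trigger G = coverage * expected_yield."""
    G = coverage * expected_yield
    expected_shortfall, _ = quad(lambda x: (G - x) * pdf(x), 0.0, G)
    return expected_shortfall / G

def sample_mixture(rng, weights, samplers, size):
    """Composition sampling (footnote 13): draw a component index by weight,
    then draw one variate from that component.
    Example: rng = np.random.default_rng(0); samplers = [dist.rvs, ...]."""
    picks = rng.choice(len(weights), size=size, p=np.asarray(weights))
    return np.array([samplers[k]() for k in picks])
```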

Table 6. Monte Carlo Rate Analysis, Mean Rate

                       Coverage Level
Model        90%       85%       75%       65%
True Rate    3.8407%   2.9555%   1.7327%   0.9909%
Weibull      3.7011%   2.7912%   1.5853%   0.9067%
Beta         3.6622%   2.6936%   1.4595%   0.8122%
Normal       3.5400%   2.5487%   1.3382%   0.7464%
TCMN         4.4914%   3.4569%   2.0182%   1.1775%
Kdensity     4.4780%   3.5400%   2.2112%   1.3818%
OptMixOut    3.8740%   2.9503%   1.7161%   1.0059%
OptMixIn     4.3307%   3.3444%   1.9784%   1.1704%

Table 7. Monte Carlo Rate Analysis, Rate Root Mean Square Error

                      Coverage Level
Model        90%      85%      75%      65%
Weibull      0.0168   0.0158   0.0140   0.0126
Beta         0.0178   0.0165   0.0144   0.0128
Normal       0.0162   0.0146   0.0123   0.0109
TCMN         0.0580   0.0516   0.0402   0.0313
Kdensity     0.0220   0.0210   0.0190   0.0175
OptMixOut    0.0254   0.0231   0.0191   0.0160
OptMixIn     0.0470   0.0420   0.0331   0.0262

Table 8. Monte Carlo Rate Analysis, Inter-County Rate Accuracy

                      Coverage Level
Model        90%      85%      75%      65%
Weibull      0.0084   0.0090   0.0095   0.0094
Beta         0.0077   0.0080   0.0085   0.0085
Normal       0.0054   0.0061   0.0072   0.0079
TCMN         0.0140   0.0119   0.0088   0.0073
Kdensity     0.0136   0.0133   0.0127   0.0123
OptMixOut    0.0068   0.0065   0.0065   0.0068
OptMixIn     0.0114   0.0100   0.0080   0.0073

In comparing the fitted rates, several important patterns are apparent. First, Kdensity and TCMN systematically overestimate the baseline (True) rate, as does OptMixIn. For example, the Kdensity 90% (65%) rate was estimated to be 4.4780% (1.3818%) on average, nearly 16% (40%) higher than the baseline rate. On the other hand, the parametric distributions consistently underestimate the rates, although by smaller magnitudes. While some sampling variability is always expected under any rating approach, the fact that the result is consistent across such a large region, and that the rating comparisons are a one-to-one matching against a reasonable data-generating process, indicates that the bias is
systematic and troubling.15 Interestingly, the average OptMixOut rate tends to replicate the True rate quite well, and is virtually unbiased. It also performed best in terms of Rate Accuracy, indicating that this performance was not only apparent at an aggregate level but was also consistent across counties. Overall, the rate simulation results indicate the potential for significant efficiency gains and bias reduction from applying the proposed method. This is likely due to the fact that the

15 A similar analysis was conducted employing the empirical distribution as the baseline. Both jackknife and bootstrap resampling techniques were applied under this assumption, and in all cases the results were qualitatively similar to those reported.

out-of-sample mixture approach has the potential to accommodate a much wider class of underlying data distributions, which also eliminates the need to evaluate and select individual distributions for various regions. The superior performance of OptMixOut relative to all other approaches in terms of accuracy and bias reveals it to be a potentially useful addition to the analyst’s toolbox. Furthermore, the argument often employed for the use of non- and semiparametric alternatives is that they allow for needed flexibility that standard parametric methods cannot provide. The results suggest that if this is a motivating factor, the proposed method should be preferred on all grounds to existing semi- and nonparametric candidates.

Conclusions

Selection of the “best” distribution choice and the “best” criterion or method for guiding such selection have long been debated in the agricultural economics literature. This study addresses primarily the latter question by proposing a framework for optimal model mixing in a CV/out-of-sample framework. In doing so, it also provides a promising avenue for dealing with the former question; that is, it is simply not necessary to constrain model selection to choices of a single distribution. The fact that the optimization of mixing weights is approached in an out-of-sample framework allows for the definition and design of specifications that generate results that are both relatively efficient and unbiased. The approach is also relatively simple and easy to implement in practice. Furthermore, the results indicate that the proposed model produces significant gains in terms of rate bias, efficiency, and accuracy relative to many existing nonparametric and semiparametric alternatives. It was also superior to parametric models on the basis of bias and rate accuracy, while giving up only a little in terms of rate efficiency. The results also illustrate in an informative way the tradeoffs typically present between the degree of parameterization and model (in)efficiency in these applications, and suggest that caution should be exercised in working with highly parameterized, nonparametric, and semiparametric forms in in-sample frameworks. Furthermore, the results caution against making blanket assessments with respect to choosing a “best” particular individual component distribution form or estimator
(be it parametric or nonparametric). Indeed, the proposed method presents a promising avenue for avoiding such a dichotomous distinction altogether. The impact of in-sample versus out-of-sample distribution model selection criteria is also profound as applied to the selection of component distributions and the estimation of weights. Another important finding is that the semiparametric and nonparametric methods tend to perform very poorly out-of-sample relative to other alternatives in this application (due in part to their extra flexibility), and the Monte Carlo rating analysis also found that they would be prone to generate inflated crop insurance rates, at least for the Midwest corn application investigated here. This result is consistent with the findings of Lanoue et al. (2010), who, using known theoretical distributions as starting points, find similar relationships between a distribution’s degree of parameterization and its proneness to overfit in this manner. Last, the results corroborate those of Sherrick et al. (2004) and Lanoue et al. (2010) regarding the performance of the Weibull distribution for this region/crop. Future research could focus on applying the proposed approach to other regions, crops, and applications. It should also explore ways in which this logic can be applied to optimal copula choice as well as to general multivariate distribution estimation. The set of possible mixture distributions evaluated here is highly limited in that only five component distributions/estimators were reported (and only in combinations of two). In practice the possible set of distributions could be much larger. Thus, future studies comparing more distributions and combinations could be interesting. The rate analysis here was based on the CV mixture model optimized using the OSLL. As outlined earlier, other objectives could also be employed (such as the root mean square error between estimated rates and observed loss-costs). Thus, future research could explore the implications of employing other out-of-sample optimization criteria, or other validation techniques, such as v-fold, leave-more-out, or Monte Carlo variants.

References

Antle, J. M. 2010. Do Economic Variables Follow Scale or Location–Scale Distributions? American Journal of Agricultural Economics 92(1): 196–204.

Atwood, J., S. Shaik, and M. Watts. 2003. Are Crop Yields Normally Distributed? A Reexamination. American Journal of Agricultural Economics 85(4): 888–901.
Baumann, K. 2003. Cross-Validation as the Objective Function for Variable-Selection Techniques. Trends in Analytical Chemistry 22: 395–406.
Cameron, A. C., and P. K. Trivedi. 2005. Microeconometrics: Methods and Applications. Cambridge: Cambridge University Press.
Coble, K. H., A. Harri, J. Anderson, A. Ker, and B. Goodwin. 2008. USDA Risk Management Agency Review of County Yield Trending Procedures and Related Topics, February 18.
Corduneanu, A., and C. M. Bishop. 2001. Variational Bayesian Model Selection for Mixture Distributions. In Artificial Intelligence and Statistics, ed. T. Jaakkola and T. Richardson, 27–34.
Day, R. H. 1965. Probability Distributions of Field Crop Yields. Journal of Farm Economics 47(3): 713–741.
Deng, X., B. J. Barnett, and D. V. Vedenov. 2007. Is There a Viable Market for Area-Based Crop Insurance? American Journal of Agricultural Economics 89: 508–519.
Fabozzi, F., P. Kolm, and D. Pachamanova. 2007. Robust Portfolio Optimization and Management. The Frank J. Fabozzi Series. Hoboken, NJ: John Wiley and Sons.
Finger, R. 2010. Revisiting the Evaluation of Robust Regression Techniques for Crop Yield Data Detrending. American Journal of Agricultural Economics 1–7.
Gallagher, P. 1986. U.S. Corn Yield Capacity and Probability: Estimation and Forecasting with Nonsymmetric Disturbances. North Central Journal of Agricultural Economics 8(1): 109–122.
Goldfeld, S. M., and R. E. Quandt. 1976. Techniques for Estimating Switching Regressions. In Studies in Nonlinear Estimation, ed. S. M. Goldfeld and R. E. Quandt. Cambridge, MA: Ballinger Publishing.
Goodwin, B. K., and A. Ker. 1998. Nonparametric Estimation of Crop Yield Distributions: Implications for Rating Group-Risk Crop Insurance Contracts. American Journal of Agricultural Economics 80(1): 139–153.
Goodwin, B. K., M. C. Roberts, and K. H. Coble. 2000. Measurement of Price Risk in Revenue Insurance: Implications of Distributional Assumptions. Journal of Agricultural and Resource Economics 25(1): 195–214.
Hennessy, D. A. 2009. Crop Yield Skewness Under Law of the Minimum Technology. American Journal of Agricultural Economics 91(1): 197–208.
Jewson, S., and A. Brix. 2005. Weather Derivative Valuation: The Meteorological, Statistical, Financial and Mathematical Foundations. Cambridge: Cambridge University Press.
Just, R. E., and Q. Weninger. 1999. Are Crop Yields Normally Distributed? American Journal of Agricultural Economics 81(2): 287–304.
Ker, A. P. 1996. Using SNP Maximum Likelihood Techniques to Recover Conditional Densities: A New Approach to Recovering Premium Rates. Working paper, Department of Agricultural and Resource Economics, University of Arizona.
Ker, A. P., and K. Coble. 2003. Modeling Conditional Yield Densities. American Journal of Agricultural Economics 85(2): 291–304.
Ker, A. P., and B. K. Goodwin. 2000. Nonparametric Estimation of Crop Insurance Rates Revisited. American Journal of Agricultural Economics 82(2): 463–478.
Lanoue, C., B. J. Sherrick, J. D. Woodard, and N. D. Paulson. 2010. Evaluating Yield Models for Crop Insurance Rating. Paper presented at the joint annual meeting of the Agricultural and Applied Economics Association, the Canadian Agricultural Economics Society, and the Western Agricultural Economics Association, Denver, Colorado, July 25–27.
Moore, A., and M. S. Lee. 2004. Efficient Algorithms for Minimizing Cross Validation Error. Proceedings of the 11th International Conference on Machine Learning, 190–198.
Nelson, C. H. 1990. The Influence of Distributional Assumptions on the Calculation of Crop Insurance Premia. North Central Journal of Agricultural Economics 12(1): 71–78.
Nelson, C. H., and P. Preckel. 1989. The Conditional Beta Distribution as a Stochastic Production Function. American Journal of Agricultural Economics 71(2): 370–378.
Norwood, B., P. Ferrier, and J. Lusk. 2001. Model Selection Using Likelihood Functions and Out-of-Sample Performance. Proceedings of the NCR-134 Conference on Applied Commodity Price
Analysis, Forecasting, and Market Risk Management.
Norwood, B., J. Lusk, and M. C. Roberts. 2002. A Comparison of Crop Yield Distribution Selection Criteria. Paper presented at the 2002 Southern Agricultural Economics Association meetings, Orlando, Florida, February 2–6.
Norwood, B., M. Roberts, and J. Lusk. 2004. Ranking Crop Yield Models Using Out-of-Sample Likelihood Functions. American Journal of Agricultural Economics 86(4): 1021–1043.
Rachev, S., S. Mittnik, F. Fabozzi, S. Focardi, and T. Jašić. 2006. Financial Econometrics: From Basics to Advanced Modeling Techniques. The Frank J. Fabozzi Series. Hoboken, NJ: John Wiley and Sons.
Racine, J., and A. Ker. 2006. Rating Crop Insurance Policies with Efficient Nonparametric Estimators that Admit Mixed Data Types. Journal of Agricultural and Resource Economics 31: 27–39.
Ramirez, O. A. 1997. Estimation and Use of a Multivariate Parametric Model for Simulating Heteroskedastic, Correlated, Nonnormal Random Variables: The Case of Corn Belt Corn, Soybean, and Wheat Yields. American Journal of Agricultural Economics 79(1): 191–205.
Ramirez, O. A., T. U. McDonald, and C. E. Carpio. 2010. A Flexible Parametric Family for the Modeling and Simulation of Yield Distributions. Journal of Agricultural and Applied Economics 42: 303–319.
Ramirez, O. A., S. Misra, and J. Field. 2003. Crop-Yield Distributions Revisited. American Journal of Agricultural Economics 85(1): 108–120.
Ramirez, O. A., S. K. Misra, and J. Nelson. 2003. Efficient Estimation of Agricultural Time Series Models with Nonnormal Dependent Variables. American
Journal of Agricultural Economics 85(4): 1029–1040.
Shahwan, T., and M. Odening. 2007. Forecasting Agricultural Commodity Prices Using Hybrid Neural Networks. In Computational Intelligence in Economics and Finance, vol. 2, ed. S.-H. Chen, P. Wang, and T.-W. Kuo.
Sherrick, B., F. Zanini, G. Schnitkey, and S. Irwin. 2004. Crop Insurance Valuation under Alternative Yield Distributions. American Journal of Agricultural Economics 86(2): 406–419.
Skees, J. R., J. R. Black, and B. J. Barnett. 1997. Designing and Rating an Area Yield Crop Insurance Contract. American Journal of Agricultural Economics 79(2): 430–438.
Turvey, C. G., and J. Zhao. 1999. Parametric and Non-Parametric Crop Yield Distributions and Their Effects on All-Risk Crop Insurance Premiums. Working paper, University of Guelph, Ontario.
Wang, H., and H. Zhang. 2002. Model-Based Clustering for Cross-Sectional Time Series Data. Journal of Agricultural, Biological, and Environmental Statistics 7(1): 107–127.
Woodard, J. D., and P. Garcia. 2008. Weather Derivatives, Spatial Aggregation, and Systemic Insurance Risk: Implications for Reinsurance Hedging. Journal of Agricultural and Resource Economics 33: 34–51.
Woodard, J. D., B. J. Sherrick, and G. D. Schnitkey. 2008. Crop Insurance Ratemaking under Trending Liabilities. Working paper, University of Illinois at Urbana-Champaign.
Yu, T., and B. A. Babcock. 2009. Are U.S. Corn and Soybeans Becoming More Drought Tolerant? Working Paper 09-WP 500, Iowa State University CARD.
Zhang, P. 1993. Model Selection via Multifold Cross Validation. Annals of Statistics 21: 299–313.
