Errors in Coefficient or Expected Value? Effects of ...

J. Fish. Soc. Taiwan, 2012, 39(1): 23-34


Errors in Coefficient or Expected Value? Effects of Different Methods on Simulated Values for a Linear Model and the Nonlinear Von Bertalanffy Growth Model

Yu-Jia Lin1, Nan-Jay Su1, Brian M. Jessop2, Wei-Chuan Chiang3 and Chi-Lu Sun1*

(Received, November 18, 2011; Accepted, January 26, 2012)

ABSTRACT

Monte Carlo simulation is widely applied to incorporate uncertainty into fisheries assessment models. However, even when modelling the same kind of uncertainty, practices often differ among studies, which may lead to erroneous results. We demonstrate how simulation results differ between two ways of incorporating uncertainty: adding random errors to either the model coefficients or the expected values. Using life history parameter data from sailfish (Istiophorus platypterus), natural mortality from Pauly’s empirical equation and lengths-at-age from the von Bertalanffy growth model (VBGM) were simulated using different methods. Different simulation methods did not affect the averages of simulated values from Pauly’s empirical equation and had only slight effects on the simulated lengths-at-age from the VBGM. For both the linear Pauly’s equation and the nonlinear VBGM, the variances of the simulated values from the errors-in-coefficients methods were underestimated, being approximately 1 to 7% or 40 to 95% of those from the errors-in-expected-values methods, depending on whether the correlation among coefficients was included or not. Therefore, adding random errors with either an additive or multiplicative error structure to the expected values is preferred over the errors-in-coefficients methods for fully representing uncertainty in the data.

Key words: Monte Carlo simulation, Uncertainty, Model coefficient, Additive or multiplicative error.

1 Institute of Oceanography, National Taiwan University, Taipei City, Taiwan 10617
2 Department of Fisheries and Oceans, Bedford Institute of Oceanography, PO Box 1006, Dartmouth, Nova Scotia B2Y 4A2, Canada
3 Eastern Marine Biology Research Center of Fisheries Research Institute, Taitung County, Taiwan 96143
* Corresponding author. E-mail: [email protected]

INTRODUCTION

Uncertainties are ubiquitous when modelling fisheries population dynamics, such as estimation errors or natural variability in model parameters, or unknown states of the underlying functional or structural relationships (Charles, 1998; Harwood and Stokes, 2003). Monte Carlo simulation is commonly applied to examine and incorporate uncertainties into assessment models (Smith et al., 1992). It has been used to examine the effects of parameter uncertainties on biological reference points from per-recruit models for several fisheries (Chen and Wilson, 2002; Lin et al., 2010) and to evaluate different management strategies under various scenarios (Rochet and Rice, 2009). However, simulation practices often differ among studies even when modelling the same source of uncertainty, which may lead to inconsistent results and thus erroneous conclusions.

Pauly’s empirical equation (Pauly, 1980) is a linear function (in logarithmic scale) widely applied to estimate the natural mortality rate (M) of fish stocks under data-poor situations (Vetter, 1988; Quinn and Deriso, 1999). However, the variance and confidence intervals of the estimated M are not available from the original equation, which is one of the drawbacks of estimating M using empirical equations (Vetter, 1988). To overcome this, Cubillos et al. (1999) and Quiroz et al. (2010) derived the variance of the estimated M by adding normally distributed random errors with mean zero and corresponding variances to the model coefficients (see the “coefficient-error” scenario in Quiroz et al., 2010). Based on linear regression theory, Hall et al. (2004) alternatively constructed a prediction distribution by adding an error term to the estimated M from Pauly’s empirical equation. However, it is unclear how the simulated values differ between these two methods, especially because the correlations among model coefficients were not considered by Cubillos et al. (1999) and Quiroz et al. (2010).

In addition to Pauly’s linear equation, differences in simulation practices were also found for nonlinear models. When incorporating uncertainty in growth into yield-per-recruit models, Chang et al. (2001) modelled random variation in the growth coefficient K, while Lin et al. (2010) used multiplicative errors on expected lengths-at-age. However, how different ways of handling uncertainties in growth affect the simulated lengths-at-age is unknown.

The objective of this study was to examine the differences in simulated values between different simulation methods. Natural mortality (in logarithmic scale) from Pauly’s empirical equation and the lengths-at-age from the von Bertalanffy growth model estimated from sailfish (Istiophorus platypterus) data were used as linear and nonlinear examples.

MATERIALS AND METHODS

Linear and nonlinear examples

Pauly’s linear empirical equation is $\log(M) = \beta_1 + \beta_2 \log(L_\infty) + \beta_3 \log(K) + \beta_4 \log(T)$ (Pauly, 1980) and the nonlinear von Bertalanffy growth model is $L_t = L_\infty [1 - e^{-K(t - t_0)}]$. Here $\log(M)$ is the logarithmic (base 10) transformation of the natural mortality (M), $\beta_1$ to $\beta_4$ are the coefficients in Pauly’s equation, $L_t$ is the length at age t, $L_\infty$, K, and $t_0$ are the VBGM coefficients, and T is the mean water temperature (Quinn and Deriso, 1999). Pauly’s empirical equation was refitted using the original data, excluding the first observation in Table 1 of Pauly (1980) because it is possibly an outlier (Jensen, 2001). Data from female sailfish (I. platypterus) in the waters off eastern Taiwan (T = 26°C, Chiang et al., 2004) were used to fit the VBGM with a multiplicative error structure.

Calculation of expected value and simulation by different methods

The expected logarithm of M, $\log(M)$, and the expected lengths-at-age ($L_t$) were calculated by $\log(M) = \beta_1 + \beta_2 \log(L_\infty) + \beta_3 \log(K) + \beta_4 \log(T)$ and $L_t = L_\infty [1 - e^{-K(t - t_0)}]\, e^{0.5\,MSE_{GR,M}}$, where $MSE_{GR,M}$ is the mean sum of squares error of the VBGM with a multiplicative error structure. The term $e^{0.5\,MSE_{GR,M}}$ is the correction factor that reduces the bias due to inverse transformation in the multiplicative model (Miller, 1984; Hayes et al., 1995).

The method used by Cubillos et al. (1999) is referred to as errors in coefficients without correlations (E-Coef-NCor), and the method simulating errors in model coefficients with the correlations among the coefficients is termed E-Coef-Cor. The method of Hall et al. (2004) is called additive errors on the expected values (EA-Exp). A multiplicative error structure (called EM-Exp) can also be used to model lengths-at-age (Quinn and Deriso, 1999) and was therefore considered in the comparison of lengths-at-age. The formulas for incorporating random error into the models are shown in Table 1. In each of the methods [three for log(M) and four for lengths-at-age], random errors were generated from assumed distributions with expectation zero and variances or covariance matrices calculated from the data, and were then entered into the simulation according to the formulas in Table 1. The values of the model coefficients, variances, and covariance matrices are listed in Table 2.
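As a minimal sketch of the two expected-value calculations described above, the following Python snippet evaluates Pauly’s equation and the bias-corrected VBGM. The coefficient values are the rounded ones listed in Table 2; the multiplicative MSE is read here as 0.05², and the function names and the age range 1 to 10 are illustrative assumptions rather than part of the original analysis (which was run in R).

```python
import math

# Rounded values as listed in Table 2 of this paper.
BETA = (-0.057, -0.278, 0.648, 0.500)   # Pauly's equation, beta1..beta4
L_INF, K, T0 = 230.3, 0.171, -2.56      # VBGM coefficients, female sailfish
TEMP = 26.0                             # mean water temperature (deg C)
MSE_GR_M = 0.05 ** 2                    # assumed reading of the multiplicative MSE


def expected_log_m(l_inf, k, temp, beta=BETA):
    """Expected log10(M) from Pauly's empirical equation."""
    b1, b2, b3, b4 = beta
    return b1 + b2 * math.log10(l_inf) + b3 * math.log10(k) + b4 * math.log10(temp)


def expected_length(t, l_inf=L_INF, k=K, t0=T0, mse=MSE_GR_M):
    """Expected length at age t, including the exp(0.5 * MSE) bias correction."""
    return l_inf * (1.0 - math.exp(-k * (t - t0))) * math.exp(0.5 * mse)


log_m = expected_log_m(L_INF, K, TEMP)
lengths = [expected_length(age) for age in range(1, 11)]  # illustrative ages
print(round(log_m, 3), [round(x, 1) for x in lengths[:3]])
```

Because the tabulated coefficients are rounded, the value of `log_m` computed this way may differ in the third decimal place from the expected value reported in the Results.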


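Table 1. Formulae for the calculation of expected natural mortality (M) and lengths-at-age ($L_t$) and for four simulation methods to incorporate parameter uncertainties: (a) E-Coef-Cor: errors in coefficients with correlations among coefficients, (b) E-Coef-NCor: errors in coefficients without correlation, (c) EA-Exp: additive errors and (d) EM-Exp: multiplicative errors on expected values. For the definition of symbols refer to Materials and Methods. $MSE_{\log(M),A}$ is the additive mean sum of squares error from refitting Pauly’s equation with the original data. $MSE_{GR,A}$ and $MSE_{GR,M}$ are, respectively, the mean sum of squares error for the von Bertalanffy growth model with additive and multiplicative error structure.

| Approach | Parameter | Formula | Errors |
|---|---|---|---|
| Expected | M | $\log(M) = \beta_1 + \beta_2 \log(L_\infty) + \beta_3 \log(K) + \beta_4 \log(T)$ | |
| Expected | $L_t$ | $L_t = L_\infty [1 - e^{-K(t - t_0)}]\, e^{0.5\,MSE_{GR,M}}$ | |
| E-Coef-Cor | M | $\log(M_{s1}) = (\beta_1 + \varepsilon_{\beta_1,1}) + (\beta_2 + \varepsilon_{\beta_2,1}) \log(L_\infty) + (\beta_3 + \varepsilon_{\beta_3,1}) \log(K) + (\beta_4 + \varepsilon_{\beta_4,1}) \log(T)$ | $\varepsilon_{\beta_1,1}, \varepsilon_{\beta_2,1}, \varepsilon_{\beta_3,1}, \varepsilon_{\beta_4,1} \sim N_4(0, \Sigma_{M,1})$ |
| E-Coef-Cor | $L_t$ | $L_{t,s1} = (L_\infty + \varepsilon_{L_\infty,1}) [1 - e^{-(K + \varepsilon_{K,1})(t - t_0 - \varepsilon_{t_0,1})}]\, e^{0.5\,MSE_{GR,M}}$ | $\varepsilon_{L_\infty,1}, \varepsilon_{K,1}, \varepsilon_{t_0,1} \sim N_3(0, \Sigma_{GR,1})$ |
| E-Coef-NCor | M | $\log(M_{s2}) = (\beta_1 + \varepsilon_{\beta_1,2}) + (\beta_2 + \varepsilon_{\beta_2,2}) \log(L_\infty) + (\beta_3 + \varepsilon_{\beta_3,2}) \log(K) + (\beta_4 + \varepsilon_{\beta_4,2}) \log(T)$ | $\varepsilon_{\beta_1,2}, \varepsilon_{\beta_2,2}, \varepsilon_{\beta_3,2}, \varepsilon_{\beta_4,2} \sim N_4(0, \Sigma_{M,2})$ |
| E-Coef-NCor | $L_t$ | $L_{t,s2} = (L_\infty + \varepsilon_{L_\infty,2}) [1 - e^{-(K + \varepsilon_{K,2})(t - t_0 - \varepsilon_{t_0,2})}]\, e^{0.5\,MSE_{GR,M}}$ | $\varepsilon_{L_\infty,2}, \varepsilon_{K,2}, \varepsilon_{t_0,2} \sim N_3(0, \Sigma_{GR,2})$ |
| EA-Exp | M | $\log(M)_{S3} = \log(M) + \varepsilon_{A,M}$ | $\varepsilon_{A,M} \sim N(0, MSE_{\log(M),A})$ |
| EA-Exp | $L_t$ | $L_{t,S3} = L_t + \varepsilon_{A,GR}$ | $\varepsilon_{A,GR} \sim N(0, MSE_{GR,A})$ |
| EM-Exp | $L_t$ | $L_{t,S4} = L_t\, e^{\varepsilon_{M,GR}}$ | $\varepsilon_{M,GR} \sim N(0, MSE_{GR,M})$ |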
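The length-at-age formulas in Table 1 that need no correlated draws (E-Coef-NCor, EA-Exp and EM-Exp) can be sketched as follows. This is an illustrative Python sketch, not the authors’ R code: the coefficient values and standard deviations are the rounded ones read from Table 2, while the function names, the seed, the number of draws and the choice of age 5 are assumptions made for the example.

```python
import math
import random
import statistics

# Rounded values read from Table 2; SDs are square roots of the diagonal
# elements of the coefficient covariance matrix and of the two MSE terms.
L_INF, K, T0 = 230.3, 0.171, -2.56
SD_L_INF, SD_K, SD_T0 = 3.5, 0.008, 0.104
SD_ADD, SD_MULT = 8.2, 0.05
CORRECTION = math.exp(0.5 * SD_MULT ** 2)


def vbgm(t, l_inf, k, t0):
    """Von Bertalanffy length at age t."""
    return l_inf * (1.0 - math.exp(-k * (t - t0)))


def simulate_length(t, method, rng):
    """Draw one simulated length at age t under the named method."""
    expected = vbgm(t, L_INF, K, T0) * CORRECTION
    if method == "E-Coef-NCor":
        # Independent normal errors on each coefficient (no correlation).
        return CORRECTION * vbgm(
            t,
            L_INF + rng.gauss(0.0, SD_L_INF),
            K + rng.gauss(0.0, SD_K),
            T0 + rng.gauss(0.0, SD_T0),
        )
    if method == "EA-Exp":
        return expected + rng.gauss(0.0, SD_ADD)
    if method == "EM-Exp":
        return expected * math.exp(rng.gauss(0.0, SD_MULT))
    raise ValueError(f"unknown method: {method}")


rng = random.Random(2012)  # arbitrary seed, for repeatability only
sd_at_age_5 = {
    m: statistics.stdev([simulate_length(5.0, m, rng) for _ in range(20000)])
    for m in ("E-Coef-NCor", "EA-Exp", "EM-Exp")
}
print({m: round(s, 2) for m, s in sd_at_age_5.items()})
```

Under these rounded inputs the spread from the coefficient-error method should come out narrower than that from the two expected-value methods, which is the qualitative pattern the paper reports; the exact figures depend on the seed and on rounding.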

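Table 2. Model coefficients, corresponding covariance matrices, and mean sum of squares error under additive or multiplicative error structure in Pauly’s empirical equation ($\beta_1, \beta_2, \beta_3, \beta_4$, $\Sigma_{M,1}$, $\Sigma_{M,2}$ and $MSE_{\log(M),A}$) and the VBGM ($L_\infty$, K, $t_0$, $\Sigma_{GR,1}$, $\Sigma_{GR,2}$, $MSE_{GR,A}$ and $MSE_{GR,M}$) for sailfish I. platypterus

Pauly’s empirical equation

$\beta_1, \beta_2, \beta_3, \beta_4$: -0.057, -0.278, 0.648, 0.500

$$\Sigma_{M,1} = \begin{bmatrix} 0.117^2 & 3.85 \times 10^{-5} & 4.32 \times 10^{-3} & 6.18 \times 10^{-3} \\ 3.85 \times 10^{-5} & 0.068^2 & 3.34 \times 10^{-3} & -1.69 \times 10^{-3} \\ 4.32 \times 10^{-3} & 3.34 \times 10^{-3} & 0.071^2 & -2.57 \times 10^{-3} \\ 6.18 \times 10^{-3} & -1.69 \times 10^{-3} & -2.57 \times 10^{-3} & 0.082^2 \end{bmatrix}$$

$$\Sigma_{M,2} = \begin{bmatrix} 0.117^2 & 0 & 0 & 0 \\ 0 & 0.068^2 & 0 & 0 \\ 0 & 0 & 0.071^2 & 0 \\ 0 & 0 & 0 & 0.082^2 \end{bmatrix}$$

$MSE_{\log(M),A}$: $0.246^2$

VBGM

$L_\infty$, K, $t_0$: 230.3, 0.171, -2.56

$$\Sigma_{GR,1} = \begin{bmatrix} 3.5^2 & -0.03 & -0.33 \\ -0.03 & 0.008^2 & 7.7 \times 10^{-4} \\ -0.33 & 7.7 \times 10^{-4} & 0.104^2 \end{bmatrix}$$

$$\Sigma_{GR,2} = \begin{bmatrix} 3.5^2 & -0.03 & -0.33 \\ -0.03 & 0.008^2 & 7.7 \times 10^{-4} \\ -0.33 & 7.7 \times 10^{-4} & 0.104^2 \end{bmatrix}$$

$MSE_{GR,A}$: $8.2^2$

$MSE_{GR,M}$: $0.05^2$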
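To illustrate how the diagonal covariance matrix for Pauly’s equation translates into the spread of simulated log(M), the sketch below evaluates the variance of a sum of independent coefficient errors for the sailfish covariates. It uses only the rounded diagonal standard deviations and coefficients listed in Table 2; the variable names are illustrative assumptions, and the closed-form value is an approximation to, not a replacement for, the simulated SD reported in the Results.

```python
import math

# Rounded values read from Table 2.
SD_BETA = (0.117, 0.068, 0.071, 0.082)   # SDs of beta1..beta4 (diagonal terms)
L_INF, K, TEMP = 230.3, 0.171, 26.0      # sailfish covariates
SD_EA_EXP = 0.246                        # sqrt of the additive MSE of log(M)

# Covariate vector for Pauly's equation: intercept plus three log10 terms.
x_s = (1.0, math.log10(L_INF), math.log10(K), math.log10(TEMP))

# With independent coefficient errors the variance is a sum of squared terms.
var_ncor = sum((x * sd) ** 2 for x, sd in zip(x_s, SD_BETA))
sd_ncor = math.sqrt(var_ncor)
relative_sd = 100.0 * sd_ncor / SD_EA_EXP

print(round(sd_ncor, 3), round(relative_sd, 1))
```

Because the tabulated values are rounded and the paper’s figures come from random draws, this analytic value should land close to, but need not equal, the SD reported for the E-Coef-NCor method.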


Data from I. platypterus and the 175 fish stocks from Pauly’s original data [Table 1 in Pauly (1980), with the first observation excluded] were used to simulate log(M) values using the E-Coef-Cor, E-Coef-NCor, and EA-Exp methods. Differences in the mean and variance of the simulated log(M) values among methods were examined. To compare lengths-at-age among methods, the relative error (RE) between the simulated lengths-at-age from the jth method ($L_{t,j}$) and the expected value ($L_t$) was used to quantify the difference between simulated and expected lengths-at-age: $RE_{L,j}\,(\%) = 100 \times (L_{t,j} - L_t)/L_t$. All computations were completed in R (version 2.12.2, R Core Development Team, 2011).
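The relative error defined above is a one-line calculation. A small Python sketch, with a hypothetical function name and made-up input values chosen only to show the sign convention, is:

```python
def relative_error(simulated, expected):
    """Relative error (%) of a simulated length against the expected length."""
    return 100.0 * (simulated - expected) / expected


# Hypothetical lengths, used only to show that overestimates are positive
# and underestimates are negative.
example_high = relative_error(110.0, 100.0)
example_low = relative_error(90.0, 100.0)
print(example_high, example_low)
```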

RESULTS

Linear model example: Simulation of log(M) from Pauly’s equation

The mean values of the simulated log(M) from the three methods agreed to three decimal places with the expected value (-0.502). However, the errors-in-coefficients methods resulted in smaller SDs of log(M) than the errors-in-expected-value method. The SD of log(M) was 0.047 for the E-Coef-Cor method and 0.239 for the E-Coef-NCor method, which were 19.2% and 97% of that from the EA-Exp method (0.246). For the 175 fish stocks from Pauly’s original data, the means of log(M) were nearly identical among the three methods, with differences within ± 0.004. On the other hand, the SD of log(M) from the E-Coef-Cor method was the smallest, around 1 to 6% of that from the EA-Exp method. Those from the E-Coef-NCor method were also smaller than 0.246, being 40 to 95% of that from the EA-Exp method, except for Acipenser transmontanus from the Gulf of California, with a larger SD of 101% (Fig. 1).

Fig. 1. Relative SD of log(M) (base 10) compared with that from the EA-Exp method (%) for the (a) E-Coef-Cor and (b) E-Coef-NCor methods using data on 175 fish stocks from Pauly (1980).

Nonlinear example: Simulation of lengths-at-age from VBGM

The means of simulated lengths-at-age of sailfish from the four methods were similar in trend and were generally close (within ± 3%) to the expected values (Fig. 2). The degree of variability, however, differed among simulation methods. The 95% quantile range of simulated lengths-at-age from the E-Coef-Cor method was the narrowest (Fig. 2a), with less than 7% of data points included in this range. The 95% quantile range from the E-Coef-NCor method (Fig. 2b) was the second narrowest, with around 50% of data points included. The ranges from the EA-Exp and EM-Exp methods were considerably wider and covered more than 90% of the data points (Fig. 2c and d).

Most relative errors (RE) for the simulated mean lengths-at-age were within the range of ± 30% (Fig. 3). The RE from the E-Coef-Cor method was the smallest, with some extremely low values, and the variability was least for the middle ages (Fig. 3a). For the E-Coef-NCor, EA-Exp, and EM-Exp methods, the values of RE were larger, within ± 15, 30, and 20%, respectively. The RE values gradually decreased with age for the E-Coef-NCor and EA-Exp methods (Fig. 3b, c), and remained stable for all ages for the EM-Exp method (Fig. 3d).
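The coverage figures quoted above (the share of observed points falling inside a 95% quantile range of simulated values) can be computed along the following lines. This is a hedged Python sketch: the observed and simulated values are entirely hypothetical, generated only to contrast a narrow and a wide simulation spread, and the function names are illustrative.

```python
import random
import statistics


def quantile_ranges(simulated_by_age):
    """2.5% and 97.5% quantiles of the simulated values for each age."""
    ranges = {}
    for age, values in simulated_by_age.items():
        cuts = statistics.quantiles(values, n=40)  # 39 cut points, 2.5% apart
        ranges[age] = (cuts[0], cuts[-1])
    return ranges


def coverage_percent(observed, simulated_by_age):
    """Percentage of observed (age, length) points inside the 95% range."""
    ranges = quantile_ranges(simulated_by_age)
    inside = sum(1 for age, length in observed
                 if ranges[age][0] <= length <= ranges[age][1])
    return 100.0 * inside / len(observed)


# Hypothetical data: observations scatter with SD 8 around a mean length.
rng = random.Random(3)
ages = (1, 2, 3, 4)
mean_len = {age: 100.0 + 10.0 * age for age in ages}
observed = [(age, rng.gauss(mean_len[age], 8.0)) for age in ages for _ in range(100)]
narrow = {age: [rng.gauss(mean_len[age], 1.0) for _ in range(2000)] for age in ages}
wide = {age: [rng.gauss(mean_len[age], 8.0) for _ in range(2000)] for age in ages}

cov_narrow = coverage_percent(observed, narrow)
cov_wide = coverage_percent(observed, wide)
print(cov_narrow, cov_wide)
```

A simulation whose spread matches the scatter of the data should cover roughly 95% of the points, whereas one that represents only part of the variability covers far fewer.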

Fig. 2. Observed (cross) and expected lengths-at-age (black broken line) of sailfish (Istiophorus platypterus), the averages of simulated lengths-at-age (grey solid line), and the corresponding 95% confidence intervals of simulated lengths-at-age (grey broken line) from the (a) E-Coef-Cor, (b) E-Coef-NCor, (c) EA-Exp and (d) EM-Exp methods.

Fig. 3. Box-plots for the relative errors (%) of the simulated lengths-at-age for sailfish (Istiophorus platypterus) from the (a) E-Coef-Cor, (b) E-Coef-NCor, (c) EA-Exp and (d) EM-Exp methods.

DISCUSSION

Differences in variation of simulated values in the linear models

Different simulation practices for incorporating data uncertainty into a model resulted in different variability for both linear and nonlinear models. For a linear model of the form $Y = X\beta + \varepsilon$, where $\varepsilon$ is the error term with mean zero and variance $\sigma^2$ (Supplementary Material S0), with given X and Y and a new set of covariates $X_s$, it can be shown that the conditional expectations for the E-Coef-Cor, E-Coef-NCor and EA-Exp methods are all equal to $\hat{Y}_s = X_s\hat{\beta}$ (Supplementary Material S2). Meanwhile, the conditional variance of the simulated values is $\sigma^2 [X_s (X^T X)^{-1} X_s^T]$ for the E-Coef-Cor method and $\sigma^2 X_s N X_s^T$ for the E-Coef-NCor method, where N is the matrix with diagonal elements identical to those of $(X^T X)^{-1}$ and off-diagonal elements of zero, and that for the EA-Exp method is $\sigma^2$ (Supplementary Material S3). Therefore, the three methods incorporated different degrees of variability into the simulation: the E-Coef-Cor method incorporated only the variance of the mean response, whereas the EA-Exp method incorporated the variance of Y given X (Supplementary Material S3; Ricker, 1973; Seber and Lee, 2003; Weisburg, 2005).

Under special conditions, e.g., a mean of X > 0, only one covariate ($x_s$) that does not greatly differ from the mean of X, and a sample size that is not trivially small, it can be shown that the conditional variance of a simulated value from the E-Coef-NCor method, $\sigma^2 X_s N X_s^T$, is smaller than $\sigma^2$ but still larger than $\sigma^2 [X_s (X^T X)^{-1} X_s^T]$ (Supplementary Material S5). For multiple covariates, this order seemed unchanged given that only one out of 175 fish stocks showed a relative SD larger than 100%. Accordingly, the variation of log(M) using the methodology applied by Cubillos et al. (1999) and Quiroz et al. (2010) (i.e., E-Coef-NCor) was very likely underestimated relative to that in Hall et al. (2004) (i.e., EA-Exp). The errors-in-coefficients method evidently underestimates the variances of simulated values in the linear model, with the magnitude of underestimation likely dependent on specific combinations of life-history parameters (Fig. 1).
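The ordering of the three conditional variances described above can be checked by simulation for a simple one-covariate regression. The Python sketch below is illustrative only: the covariate values, residual variance, simulation point, seed and variable names are all hypothetical, and the correlated coefficient errors are drawn with a hand-written 2 × 2 Cholesky factor rather than any particular library routine.

```python
import math
import random
import statistics

# Hypothetical design: intercept plus one covariate taking values 1..20.
x = [float(i) for i in range(1, 21)]
sigma2 = 1.0      # hypothetical residual variance
x_s = 10.0        # covariate value at which values are simulated

n = len(x)
sum_x, sum_x2 = sum(x), sum(v * v for v in x)
det = n * sum_x2 - sum_x ** 2
c11, c12, c22 = sum_x2 / det, -sum_x / det, n / det   # elements of (X^T X)^-1

# Cholesky factor of sigma2 * (X^T X)^-1, used for the correlated errors.
l11 = math.sqrt(sigma2 * c11)
l21 = sigma2 * c12 / l11
l22 = math.sqrt(sigma2 * c22 - l21 ** 2)

rng = random.Random(1)
deviations = {"E-Coef-Cor": [], "E-Coef-NCor": [], "EA-Exp": []}
for _ in range(20000):
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    e1, e2 = l11 * z1, l21 * z1 + l22 * z2            # correlated pair
    deviations["E-Coef-Cor"].append(e1 + e2 * x_s)
    u1 = rng.gauss(0.0, math.sqrt(sigma2 * c11))      # independent pair
    u2 = rng.gauss(0.0, math.sqrt(sigma2 * c22))
    deviations["E-Coef-NCor"].append(u1 + u2 * x_s)
    deviations["EA-Exp"].append(rng.gauss(0.0, math.sqrt(sigma2)))

# Deviations are taken from the common expected response, so their variances
# are the conditional variances of the simulated values.
sim_var = {m: statistics.variance(d) for m, d in deviations.items()}
print({m: round(v, 3) for m, v in sim_var.items()})
```

For this design the simulated variances should increase from E-Coef-Cor through E-Coef-NCor to EA-Exp; a simulation point far from the covariate mean would not necessarily preserve that order.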

Differences in variation of simulated values in the nonlinear VBGM


For the nonlinear VBGM, applying different simulation methods also resulted in different means, variances, and distributions of simulated lengths-at-age for the same data and coefficient values. The sailfish example showed that averages of simulated lengths-at-age were affected only marginally and differed by less than 10% among methods. The smaller variance of simulated values from the E-Coef-Cor and E-Coef-NCor methods implies possible underestimation of the variation in the data, as occurred for the linear models. Differences between the EA-Exp and EM-Exp methods reflected the different assumptions about the error structure of the VBGM (additive or multiplicative) and/or the choice of the magnitude of variability to model (a fixed value or a fixed proportion), which can be determined by examination of residual plots. For example, if the residuals become steadily more scattered as the covariates increase, then a multiplicative error structure might be better than an additive structure (Quinn and Deriso, 1999).

Consequence of ignoring correlation among coefficients

Ignoring the correlations among coefficients did not affect the means of simulated values in the linear model and had minor effects on the means of simulated lengths-at-age in the nonlinear VBGM. However, it increased the variance of simulated values by approximately 10 times and changed the shape of their distribution. In Monte Carlo simulation, neglecting correlation among parameters leads to biased results, depending on the sign (positive or negative) and scale of the correlation among parameters (Smith et al., 1992). In a hierarchical Bayesian growth model, ignoring these correlations also biases estimation of the mean and variance of VBGM coefficients (Helser and Lai, 2004). In addition, the variance of the simulated values from the E-Coef-Cor method in a linear model is the variance of the mean response for this linear model (Ricker, 1973), but the variance of simulated values from the E-Coef-NCor method is difficult to interpret for both the linear and nonlinear models. Therefore, the correlations among parameters should be correctly specified in the simulation to avoid additional biases (Smith et al., 1992).

In conclusion, the simulated means of natural mortality in logarithmic scale did not differ among the three simulation methods (E-Coef-Cor, E-Coef-NCor and EA-Exp) in the linear model, and the means of simulated lengths-at-age $L_t$ differed by around ± 3% among the four methods (E-Coef-Cor, E-Coef-NCor, EA-Exp and EM-Exp) in the nonlinear von Bertalanffy growth model. However, simulated values from the E-Coef-Cor and E-Coef-NCor methods had smaller variances, around 1% to 20% and 40% to 95% of that from the EA-Exp method, respectively. When simulating lengths-at-age from the nonlinear VBGM, their 95% confidence intervals included only a small share (around 7% and 50%) of the observed data points. The uncertainty in the data was fully incorporated into the simulation by the errors-in-expected-value methods, which permitted explicit explanation of the variation of simulated values. The smaller variance of simulated values from the errors-in-coefficients methods indicated that only a part of the uncertainty in the data was represented. Therefore, simulation methods incorporating either additive or multiplicative error in the expected values may be recommended for both linear and nonlinear models, while the data uncertainty might be under-represented in the errors-in-coefficients methods.
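The effect of keeping or dropping correlations among VBGM coefficients, as discussed above, can be sketched with a small Cholesky-based sampler. In this Python sketch the coefficient values and standard deviations are the rounded ones from Table 2, but the correlation matrix is hypothetical (chosen only so that it is positive definite), and the function names, seed, age and number of draws are illustrative assumptions.

```python
import math
import random
import statistics

L_INF, K, T0 = 230.3, 0.171, -2.56     # rounded values from Table 2
SDS = (3.5, 0.008, 0.104)              # SDs of L_inf, K, t0 from Table 2
# Hypothetical correlation matrix, chosen only to be positive definite.
CORR = ((1.0, -0.9, -0.8),
        (-0.9, 1.0, 0.9),
        (-0.8, 0.9, 1.0))


def cholesky(a):
    """Lower-triangular factor low such that low * low^T equals a."""
    size = len(a)
    low = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def vbgm(t, l_inf, k, t0):
    """Von Bertalanffy length at age t."""
    return l_inf * (1.0 - math.exp(-k * (t - t0)))


def simulated_sd(t, cov, rng, n_sim=20000):
    """SD of lengths at age t when coefficient errors have covariance cov."""
    low = cholesky(cov)
    draws = []
    for _ in range(n_sim):
        z = [rng.gauss(0.0, 1.0) for _ in range(3)]
        e = [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(3)]
        draws.append(vbgm(t, L_INF + e[0], K + e[1], T0 + e[2]))
    return statistics.stdev(draws)


cov_cor = [[CORR[i][j] * SDS[i] * SDS[j] for j in range(3)] for i in range(3)]
cov_ncor = [[cov_cor[i][j] if i == j else 0.0 for j in range(3)] for i in range(3)]

rng = random.Random(7)  # arbitrary seed
sd_cor = simulated_sd(5.0, cov_cor, rng)
sd_ncor = simulated_sd(5.0, cov_ncor, rng)
print(round(sd_cor, 2), round(sd_ncor, 2))
```

With correlations of these assumed signs, zeroing the off-diagonal terms widens the spread of simulated lengths, which is the direction of the effect described above; other correlation patterns could narrow it instead.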

ACKNOWLEDGEMENT

This work was financially sponsored by Aims for Top University Project (Project ID: 2234). I would like to thank Dr. André Punt, University of Washington, and KS Chen, National Taiwan University, for providing valuable suggestions on an early version of this manuscript.

REFERENCES

Chang, S. K., C. C. Hsu and H. C. Lui (2001). Management implication on Indian Ocean albacore from simple yield analysis incorporating parameter uncertainty. Fish. Res., 51: 1-10.
Charles, A. T. (1998). Living with uncertainty in fisheries: Analytical methods, management priorities and the Canadian ground fishery experience. Fish. Res., 37: 37-50.
Chen, Y. and C. Wilson (2002). A simulation study to evaluate impacts of uncertainty on the assessment of American lobster fishery in the Gulf of Maine. Can. J. Fish. Aquat. Sci., 59: 1394-1403.
Chiang, W. C., C. L. Sun, S. Z. Yeh and W. C. Su (2004). Age and growth of sailfish (Istiophorus platypterus) in waters off eastern Taiwan. Fish. Bull., 102: 251-263.
Cubillos, L. A., R. Alarcón and A. Brante (1999). Empirical estimates of natural mortality for the Chilean hake (Merluccius gayi): Evaluation of precision. Fish. Res., 42: 147-153.
Hall, H. G., S. A. Hesp and I. C. Potter (2004). A Bayesian approach for overcoming inconsistencies in mortality estimates using, as an example, data for Acanthopagrus latus. Can. J. Fish. Aquat. Sci., 61: 1202-1211.
Harwood, J. and K. Stokes (2003). Coping with uncertainty in ecological advice: Lessons from fisheries. Trends. Ecol. Evol., 18: 617-622.
Hayes, D. B., J. K. T. Brodziak and J. B. O’Gorman (1995). Efficiency and bias of estimators and sampling designs for determining length-weight relationships of fish. Can. J. Fish. Aquat. Sci., 52: 84-92.
Helser, T. E. and H. L. Lai (2004). A Bayesian hierarchical meta-analysis of fish growth: With an example for North American largemouth bass, Micropterus salmoides. Ecol. Model., 178: 339-416.
Jensen, A. L. (2001). Comparison of theoretical derivations, simple linear regressions, multiple linear regression and principal components for analysis of fish mortality, growth and environmental temperature data. Environmetrics, 12: 591-598.
Lin, Y. J., Y. J. Chang, C. L. Sun and W. N. Tzeng (2010). Evaluation of the Japanese eel fishery in the lower reaches of the Kao-Ping River, southwestern Taiwan using a per-recruit analysis. Fish. Res., 106: 329-336.
Miller, D. M. (1984). Reducing transformation bias in curve fitting. Am. Stat., 38: 124-126.
Pauly, D. (1980). On the interrelationships between natural mortality, growth parameters and mean environmental temperatures in 175 fish stocks. ICES J. Mar. Sci., 39: 175-192.
Quinn, J. T. II and R. B. Deriso (1999). Quantitative Fish Dynamics. Oxford University Press, New York, U.S.A.
Quiroz, J. C., R. Wiffa and B. Caneco (2010). Incorporating uncertainty into estimation of natural mortality for two species of Rajidae fished in Chile. Fish. Res., 102: 297-304.
R Core Development Team. (2011). R: A language and environment for statistical computing, reference index version 2.12.2. R Foundation for statistical computing, Vienna, Austria. http://www.R-project.org
Ricker, W. E. (1973). Linear regression in fisheries research. J. Fish. Res. Board. Can., 30: 409-434.
Rochet, M. J. and J. C. Rice (2009). Simulation-based management strategy evaluation: ignorance disguised as mathematics? ICES J. Mar. Sci., 66: 754-762.
Seber, G. A. F. and A. J. Lee (2003). Linear Regression Analysis. 2nd edition. John Wiley and Sons, Hoboken, NJ, U.S.A.
Smith, A. E., P. B. Ryan and J. S. Evans (1992). The effect of neglecting correlations when propagating uncertainty and estimating the population distribution of risk. Risk Anal., 12: 467-474.
Vetter, E. F. (1988). Estimation of natural mortality in fish stocks: A review. Fish. Bull., 86: 25-43.
Weisburg, S. (2005). Applied Linear Regression. 3rd edition. John Wiley and Sons, Hoboken, NJ, U.S.A.

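Supplementary Material

S0. Definition of linear model

Given the response variable Y and p explanatory variables $X_1, X_2, \ldots, X_p$ ($X_1$ can be 1 to represent the intercept), the linear model in this study is $Y = \beta_1 x_1 + \cdots + \beta_p x_p + \varepsilon$, in which the $\beta$’s are the model coefficients and $\varepsilon$ is an additive error with mean zero and variance $\sigma^2$. Given a data set of sample size n, $[y_i, x_{ij}]$ for i = 1 to n and j = 1 to p, denote

$$Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}_{n \times 1},\quad X = \begin{bmatrix} x_{11} & \cdots & x_{1p} \\ x_{21} & \cdots & x_{2p} \\ \vdots & \ddots & \vdots \\ x_{n1} & \cdots & x_{np} \end{bmatrix},\quad \beta = \begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_p \end{bmatrix}_{p \times 1},\quad \text{and}\quad \varepsilon = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}_{n \times 1}.$$

The linear model can be expressed in matrix form as $Y = X\beta + \varepsilon$. The least-squares estimate of the model coefficients is $\hat{\beta} = (X^T X)^{-1} X^T Y$, with a conditional covariance matrix of $\sigma^2 (X^T X)^{-1}$. $\sigma^2$ can be estimated by the mean sum of squares error (MSE), i.e. $\hat{\sigma}^2 = MSE = \sum_{i=1}^{n} (y_i - X_i\hat{\beta})^2 / (n - p)$ (Seber and Lee, 2003; Weisburg, 2005).

S1. Three simulation methods

The simulated values for the E-Coef-Cor ($Y_S^{S1}$) and E-Coef-NCor ($Y_S^{S2}$) methods, given the explanatory variables used in simulation, $X_s = (x_{s1}, \ldots, x_{sp})$, are computed as

$$Y_S^{S1} = (\hat{\beta}_1 + \varepsilon_{1,1}) x_{s1} + \cdots + (\hat{\beta}_p + \varepsilon_{p,1}) x_{sp}$$

and

$$Y_S^{S2} = (\hat{\beta}_1 + \varepsilon_{1,2}) x_{s1} + \cdots + (\hat{\beta}_p + \varepsilon_{p,2}) x_{sp},$$

where $\varepsilon_{1,1}, \ldots, \varepsilon_{p,1}$ and $\varepsilon_{1,2}, \ldots, \varepsilon_{p,2}$ are randomly generated from assumed multivariate distributions with a mean vector of zero and covariance matrices $\Sigma_1$ and $\Sigma_2$, respectively. $\Sigma_1$ is the covariance matrix of the model coefficients [i.e., $MSE\,(X^T X)^{-1}$] and $\Sigma_2$ is the matrix with diagonal elements identical to those in $\Sigma_1$ and off-diagonal elements of zero. The simulated values from the EA-Exp method ($Y_S^{S3}$) are computed by adding a random error ($\varepsilon_s$) with mean zero and variance $\sigma^2$ to the expected response ($\hat{Y}_S = \hat{\beta}_1 x_{s1} + \cdots + \hat{\beta}_p x_{sp}$), namely $Y_S^{S3} = \hat{Y}_S + \varepsilon_s$. For brevity, we drop the conditioning parts of the conditional expectations, variances, and distributions (i.e. E[Y|X] is written as E[Y]).

S2. Conditional expectation of simulated values among methods

The conditional expectations of $Y_S^{S1}$, $Y_S^{S2}$ and $Y_S^{S3}$ given data X, Y and $X_s$ are:

$$\begin{aligned} E[Y_S^{S1}] &= E[(\hat{\beta}_1 + \varepsilon_{1,1}) x_{s1} + \cdots + (\hat{\beta}_p + \varepsilon_{p,1}) x_{sp}] \\ &= \hat{\beta}_1 x_{s1} + \cdots + \hat{\beta}_p x_{sp} + E[\varepsilon_{1,1} x_{s1} + \cdots + \varepsilon_{p,1} x_{sp}] \\ &= \hat{Y}_S + x_{s1} E[\varepsilon_{1,1}] + \cdots + x_{sp} E[\varepsilon_{p,1}] = \hat{Y}_S \end{aligned}$$

$$\begin{aligned} E[Y_S^{S2}] &= E[(\hat{\beta}_1 + \varepsilon_{1,2}) x_{s1} + \cdots + (\hat{\beta}_p + \varepsilon_{p,2}) x_{sp}] \\ &= \hat{\beta}_1 x_{s1} + \cdots + \hat{\beta}_p x_{sp} + E[\varepsilon_{1,2} x_{s1} + \cdots + \varepsilon_{p,2} x_{sp}] \\ &= \hat{Y}_S + x_{s1} E[\varepsilon_{1,2}] + \cdots + x_{sp} E[\varepsilon_{p,2}] = \hat{Y}_S \end{aligned}$$

$$E[Y_S^{S3}] = E[\hat{\beta}_1 x_{s1} + \cdots + \hat{\beta}_p x_{sp} + \varepsilon_s] = \hat{Y}_S + E[\varepsilon_s] = \hat{Y}_S$$

Therefore, the conditional expectations of $Y_S^{S1}$, $Y_S^{S2}$ and $Y_S^{S3}$ are all equal to $\hat{Y}_S$ in linear models.

S3. Conditional variance for simulated values from different methods

We denote

$$(X^T X)^{-1} = \begin{bmatrix} c_{11} & c_{12} & \cdots & c_{1p} \\ c_{21} & c_{22} & \cdots & c_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ c_{p1} & c_{p2} & \cdots & c_{pp} \end{bmatrix} \quad\text{and}\quad N = \begin{bmatrix} c_{11} & 0 & \cdots & 0 \\ 0 & c_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & c_{pp} \end{bmatrix},$$

then

$$X_s (X^T X)^{-1} X_s^T = x_{s1}^2 c_{11} + x_{s2}^2 c_{22} + \cdots + x_{sp}^2 c_{pp} + x_{s1} x_{s2} (c_{12} + c_{21}) + x_{s1} x_{s3} (c_{13} + c_{31}) + \cdots + x_{s(p-1)} x_{sp} (c_{(p-1)p} + c_{p(p-1)}).$$

The conditional variance of $Y_S^{S1}$ is:

$$\begin{aligned} Var[Y_S^{S1}] &= Var[(\hat{\beta}_1 + \varepsilon_{1,1}) x_{s1} + \cdots + (\hat{\beta}_p + \varepsilon_{p,1}) x_{sp}] \\ &= Var[\varepsilon_{1,1} x_{s1} + \cdots + \varepsilon_{p,1} x_{sp}] \\ &= x_{s1}^2 Var[\varepsilon_{1,1}] + \cdots + x_{sp}^2 Var[\varepsilon_{p,1}] + \sum_{i \neq j}^{p} x_{si} x_{sj} Cov[\varepsilon_{i,1}, \varepsilon_{j,1}] \\ &= \hat{\sigma}^2 [x_{s1}^2 c_{11} + \cdots + x_{sp}^2 c_{pp} + x_{s1} x_{s2} (c_{12} + c_{21}) + \cdots + x_{s(p-1)} x_{sp} (c_{(p-1)p} + c_{p(p-1)})] \\ &= \hat{\sigma}^2 [X_s (X^T X)^{-1} X_s^T] \end{aligned}$$

In addition, the conditional variance for $\hat{Y}_s = X_s \hat{\beta}$ is

$$Var(X_s \hat{\beta}) = X_s Var(\hat{\beta}) X_s^T = \hat{\sigma}^2 X_s (X^T X)^{-1} X_s^T = Var[Y_S^{S1}].$$

The conditional variance of $Y_S^{S2}$ is:

$$\begin{aligned} Var[Y_S^{S2}] &= Var[(\hat{\beta}_1 + \varepsilon_{1,2}) x_{s1} + \cdots + (\hat{\beta}_p + \varepsilon_{p,2}) x_{sp}] \\ &= Var[\varepsilon_{1,2} x_{s1} + \cdots + \varepsilon_{p,2} x_{sp}] \\ &= x_{s1}^2 Var[\varepsilon_{1,2}] + \cdots + x_{sp}^2 Var[\varepsilon_{p,2}] + \sum_{i \neq j}^{p} x_{si} x_{sj} Cov[\varepsilon_{i,2}, \varepsilon_{j,2}] \end{aligned}$$

Because $Cov[\varepsilon_{i,2}, \varepsilon_{j,2}]$ is zero for any $i \neq j$ in $Y_S^{S2}$,

$$Var[Y_S^{S2}] = \hat{\sigma}^2 [x_{s1}^2 c_{11} + \cdots + x_{sp}^2 c_{pp}] = \hat{\sigma}^2 X_s N X_s^T.$$

The conditional variance of $Y_S^{S3}$ is:

$$Var[Y_S^{S3}] = Var[\hat{\beta}_1 x_{s1} + \cdots + \hat{\beta}_p x_{sp} + \varepsilon_s] = Var[\varepsilon_s] = \hat{\sigma}^2$$

Therefore, the conditional variance of $Y_S^{S1}$ would be different from those of $Y_S^{S2}$ and $Y_S^{S3}$ unless $(X^T X)^{-1} = N$ and $X_s (X^T X)^{-1} X_s^T = 1$.

S4. Conditional distribution of simulated values

We further assumed that $\varepsilon_{1,1}, \ldots, \varepsilon_{p,1}$ and $\varepsilon_{1,2}, \ldots, \varepsilon_{p,2}$ are generated from p-variable multivariate normal distributions with mean vectors of zeros and covariance matrices $\Sigma_1$ and $\Sigma_2$. Then any linear combination of $\varepsilon_{1,1}, \ldots, \varepsilon_{p,1}$ or of $\varepsilon_{1,2}, \ldots, \varepsilon_{p,2}$ is also normally distributed (Balarishnan and Nezorov, 2003). $\varepsilon_s$ is assumed to be randomly generated from a univariate normal distribution with a mean of zero and variance of $\sigma^2$. Because $Y_S^{S1} = \hat{Y}_S + \varepsilon_{1,1} x_{s1} + \cdots + \varepsilon_{p,1} x_{sp}$ and $Y_S^{S2} = \hat{Y}_S + \varepsilon_{1,2} x_{s1} + \cdots + \varepsilon_{p,2} x_{sp}$ are linear combinations of $\varepsilon_{1,1}, \ldots, \varepsilon_{p,1}$ and $\varepsilon_{1,2}, \ldots, \varepsilon_{p,2}$, and $Y_S^{S3} = \hat{Y}_S + \varepsilon_s$ simply adds $\varepsilon_s$ to the expected response, the conditional distributions of $Y_S^{S1}$, $Y_S^{S2}$ and $Y_S^{S3}$ are also normal with a conditional expectation of $\hat{Y}_S$ and variances of $\sigma^2 [X_s (X^T X)^{-1} X_s^T]$, $\sigma^2 X_s N X_s^T$ and $\sigma^2$, respectively.

S5. Simple linear model example

The differences in the conditional expectation and variance among methods can be discussed directly in a simple linear model (i.e. $Y = \beta_1 + \beta_2 x$ and $X = [1, x_i]$, i = 1 to n). Then

$$X^T X = \begin{pmatrix} n & \sum_{i=1}^{n} x_i \\ \sum_{i=1}^{n} x_i & \sum_{i=1}^{n} x_i^2 \end{pmatrix}$$

with the inverse of

$$(X^T X)^{-1} = \frac{1}{n \sum_{i=1}^{n} x_i^2 - (\sum_{i=1}^{n} x_i)^2} \begin{pmatrix} \sum_{i=1}^{n} x_i^2 & -\sum_{i=1}^{n} x_i \\ -\sum_{i=1}^{n} x_i & n \end{pmatrix} = \frac{1}{n S_x^2} \begin{pmatrix} \sum_{i=1}^{n} x_i^2 & -\sum_{i=1}^{n} x_i \\ -\sum_{i=1}^{n} x_i & n \end{pmatrix},$$

where $S_x^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2$. The conditional variances of $\hat{\beta}_1$ and $\hat{\beta}_2$, and the covariance of $\hat{\beta}_1$ and $\hat{\beta}_2$, are

$$\frac{\sigma^2}{n S_x^2} \sum_{i=1}^{n} x_i^2,\quad \frac{\sigma^2}{S_x^2},\quad \text{and}\quad -\frac{\sigma^2}{n S_x^2} \sum_{i=1}^{n} x_i,$$

respectively.

Values are simulated according to the three methods with the explanatory variable at $x_s$. $\sigma^2$ is estimated by $\hat{\sigma}^2$ and $\sigma_s^2$ is set at $\hat{\sigma}^2$. The conditional variances of $Y_S^{S1}$, $Y_S^{S2}$ and $Y_S^{S3}$ are:

$$\begin{aligned} Var[Y_S^{S1}] &= \frac{\hat{\sigma}^2}{n S_x^2} \Big[\sum_{i=1}^{n} x_i^2 + n x_s^2 - 2 x_s \sum_{i=1}^{n} x_i\Big] \\ &= \frac{\hat{\sigma}^2}{n S_x^2} \Big[\sum_{i=1}^{n} x_i^2 - n \bar{x}^2 + n (x_s^2 - 2 x_s \bar{x} + \bar{x}^2)\Big] \\ &= \hat{\sigma}^2 \Big[\frac{1}{n} + \frac{(x_s - \bar{x})^2}{S_x^2}\Big] \end{aligned}$$

$$\begin{aligned} Var[Y_S^{S2}] &= \frac{\hat{\sigma}^2}{n S_x^2} \Big[\sum_{i=1}^{n} x_i^2 + n x_s^2\Big] \\ &= \frac{\hat{\sigma}^2}{n S_x^2} \Big[\sum_{i=1}^{n} x_i^2 - n \bar{x}^2 + n (x_s^2 - 2 x_s \bar{x} + \bar{x}^2) + 2 n x_s \bar{x}\Big] \\ &= \hat{\sigma}^2 \Big[\frac{1}{n} + \frac{(x_s - \bar{x})^2 + 2 x_s \bar{x}}{S_x^2}\Big] \end{aligned}$$

and $Var[Y_S^{S3}] = Var[\varepsilon_s] = \hat{\sigma}^2$.

Given that $\bar{x} > 0$, n is not trivially small and $x_s$ does not depart from $\bar{x}$ greatly, so that

$$1 > \Big[\frac{1}{n} + \frac{(x_s - \bar{x})^2 + 2 x_s \bar{x}}{S_x^2}\Big],$$

then $Var[Y_S^{S3}] > Var[Y_S^{S2}] > Var[Y_S^{S1}]$, indicating that the simulated values from the E-Coef-Cor and E-Coef-NCor methods would have underestimated variation relative to those from the EA-Exp method.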
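The closed-form variances in S5 can be checked numerically against the quadratic forms from S3. The following Python sketch does this for a hypothetical covariate vector (values 1 to 20) and a unit residual variance; the function names and the two simulation points are illustrative assumptions, chosen to show one case where the stated condition holds and one where it does not.

```python
def s5_variances(x, x_s, sigma2):
    """Closed-form conditional variances (S1, S2, S3) for a simple linear model."""
    n = len(x)
    x_bar = sum(x) / n
    s_x2 = sum((v - x_bar) ** 2 for v in x)
    var_s1 = sigma2 * (1.0 / n + (x_s - x_bar) ** 2 / s_x2)
    var_s2 = sigma2 * (1.0 / n + ((x_s - x_bar) ** 2 + 2.0 * x_s * x_bar) / s_x2)
    var_s3 = sigma2
    return var_s1, var_s2, var_s3


def quadratic_forms(x, x_s, sigma2):
    """Same variances for S1 and S2 from the elements of (X^T X)^-1."""
    n = len(x)
    sum_x, sum_x2 = sum(x), sum(v * v for v in x)
    det = n * sum_x2 - sum_x ** 2
    c11, c12, c22 = sum_x2 / det, -sum_x / det, n / det
    full = sigma2 * (c11 + 2.0 * x_s * c12 + x_s ** 2 * c22)   # Xs (X'X)^-1 Xs'
    diag = sigma2 * (c11 + x_s ** 2 * c22)                      # Xs N Xs'
    return full, diag


x = [float(i) for i in range(1, 21)]     # hypothetical covariate values
near = s5_variances(x, 10.0, 1.0)        # x_s close to the mean of x
far = s5_variances(x, 40.0, 1.0)         # x_s far from the mean of x
check = quadratic_forms(x, 10.0, 1.0)
print([round(v, 4) for v in near], [round(v, 4) for v in far])
```

For the point near the covariate mean the three variances follow the order stated in S5, while for the distant point the E-Coef-NCor variance exceeds the residual variance, which illustrates why the ordering is stated only under the listed conditions.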

REFERENCES

Balarishnan, N. and B. Nezorov (2003). A Primer on Statistical Distributions. John Wiley and Sons, Hoboken, NJ, U.S.A.
Seber, G. A. F. and A. J. Lee (2003). Linear Regression Analysis. 2nd edition. John Wiley and Sons, Hoboken, NJ, U.S.A.
Weisburg, S. (2005). Applied Linear Regression. 3rd edition. John Wiley and Sons, Hoboken, NJ, U.S.A.

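Effects of Simulation Methods on Simulated Values for a Linear Model and the Nonlinear Von Bertalanffy Growth Model

Yu-Jia Lin1, Nan-Jay Su1, Brian M. Jessop2, Wei-Chuan Chiang3 and Chi-Lu Sun1*

(Received, November 18, 2011; Accepted, January 26, 2012)

Monte Carlo simulation is a useful tool for incorporating uncertainty into fisheries assessment models. In practice, however, even when the same type of uncertainty is considered, the methods used often differ among studies, which may introduce additional bias. This study examined four methods: (1) additive errors on model coefficients with correlations among coefficients considered, (2) additive errors on model coefficients with correlations among coefficients ignored, (3) additive errors on the model expected values, and (4) multiplicative errors on the model expected values. Using data from sailfish (Istiophorus platypterus), Pauly’s empirical equation and the von Bertalanffy growth equation served as the linear and nonlinear examples. In both the linear model and the von Bertalanffy growth model, the different simulation methods had very little effect on the conditional expectation. However, the variances of the simulated values from methods 1 and 2 were only 1 to 7% or 40 to 95% of that from method 3; simulating random errors and adding them to the model coefficients does not fully represent the uncertainty in the data. Therefore, when simulating data with Monte Carlo methods, adding additive or multiplicative errors to the model expected values is recommended.

Key words: Monte Carlo simulation, Uncertainty, Model coefficient, Additive and multiplicative error.

1 Institute of Oceanography, National Taiwan University, Taiwan, R.O.C.
2 Department of Fisheries and Oceans, Bedford Institute of Oceanography, PO Box 1006, Dartmouth, Nova Scotia B2Y 4A2, Canada
3 Eastern Marine Biology Research Center, Fisheries Research Institute, Taiwan, R.O.C.
* Corresponding author. E-mail: [email protected]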
