Marine Pollution Bulletin 58 (2009) 1916–1921
Note
A Bayesian hierarchical modeling approach for analyzing observational data from marine ecological studies
Song S. Qian a,*, J. Kevin Craig b, Melissa M. Baustian c, Nancy N. Rabalais d
a Nicholas School of the Environment, Duke University, Durham, NC 27708, USA
b Florida State University Coastal and Marine Laboratory, Florida State University, 3618 Highway 98, St. Teresa, FL 32358-2702, USA
c Department of Oceanography and Coastal Sciences, 1231 Energy, Coast and Environment Building, Louisiana State University, Baton Rouge, LA 70803, USA
d Louisiana Universities Marine Consortium, 8124 Highway 56, Chauvin, LA 70344, USA
Article info
Keywords: ANOVA; Bayesian statistics; Gulf of Mexico; Hierarchical model; Hypoxia
Abstract
We introduce the Bayesian hierarchical modeling approach for analyzing observational data from marine ecological studies using a data set intended for inference on the effects of bottom-water hypoxia on macrobenthic communities in the northern Gulf of Mexico off the coast of Louisiana, USA. We illustrate (1) the process of developing a model, (2) the use of the hierarchical model results for statistical inference through innovative graphical presentation, and (3) a comparison to the conventional linear modeling approach (ANOVA). Our results indicate that the Bayesian hierarchical approach is better able to detect a "treatment" effect than classical ANOVA while avoiding several arbitrary assumptions necessary for linear models, and is also more easily interpreted when presented graphically. These results suggest that the hierarchical modeling approach is a better alternative than conventional linear models and should be considered for the analysis of observational field data from marine systems. © 2009 Elsevier Ltd. All rights reserved.
* Corresponding author. Tel.: +1 919 613 8105; fax: +1 919 681 5740. E-mail addresses: [email protected] (S.S. Qian), [email protected] (J.K. Craig), [email protected] (M.M. Baustian), [email protected] (N.N. Rabalais).
0025-326X/$ - see front matter © 2009 Elsevier Ltd. All rights reserved. doi:10.1016/j.marpolbul.2009.09.029
1. Introduction
Statistical methods for analyzing experimental data originally developed by Fisher (1925) are widely used in ecology (e.g., Gotelli and Ellison, 2004). Analysis of variance (ANOVA), in particular, is the most common means of analyzing experimental data in order to make causal inferences about the effects of an independent variable on a response variable. Although originally developed to analyze conventional experiments characterized by replication of treatments and formal controls, ANOVA is often applied to observational field data where samples or sampling stations are considered replicated experimental units and categorized into discrete groups presumed to differ with respect to some independent, or "treatment," variable. The results of ANOVA can be ambiguous when the normality and independence assumptions of the response data are not met, when the experimental design is nested or unbalanced, when there are missing values, or when background variability is high, resulting in low statistical power. Such conditions are commonly encountered in observational studies of marine systems.
Gelman and coworkers explored the use of Bayesian hierarchical (or multilevel) models as an alternative analytical approach to traditional linear models such as ANOVA (Gelman, 2005; Gelman
and Tuerlinckx, 2000; Gelman and Hill, 2007). Qian and Shen (2007) re-analyzed a number of classic ecological field studies using Bayesian hierarchical models and concluded that the hierarchical approach is more informative and easier to interpret than traditional linear models, can provide a more accurate description of the true relationship between the response and potential predictor variables, and can discern a smaller treatment effect than that discernible with traditional linear models. These improvements are attributable largely to the alternative computational framework, which is described in greater detail in the above references, and have led to the suggestion that the Bayesian hierarchical approach should replace traditional ANOVA as the method of choice for analyzing ecological data.
The hierarchical approach emphasizes the estimation of treatment effect size using variance components rather than significance tests (i.e., F-tests) of whether an effect exists based on p-values. Such significance tests have been widely criticized in a number of sub-fields of ecology (e.g., Anderson et al., 2000; Johnson, 1999) and in statistics (Krantz, 1999). Hypothesis testing following traditional ANOVA is typically done using any of a number of multiple comparison procedures that emphasize the type I and type II error rates. The type I error (rejecting the null when it is true) is the focus of classical statistics, while the type II error (failing to reject the null when the alternative is true) is often the concern of a scientist because we rarely conduct a costly experiment believing the null hypothesis of no effect to be true. What is more relevant to a scientist is the error of sign, that is, concluding that an effect is positive when, in fact,
a negative effect is true. More generally, an error of sign includes the case when we predict an effect that is larger than a certain magnitude when it is actually smaller. The type S (for sign) error rate arises naturally from the Bayesian hierarchical modeling computations and has been shown to be more robust than the traditional multiple comparison procedures typically used for hypothesis testing (Gelman and Tuerlinckx, 2000).
Here, we expand on the work of Qian and Shen (2007) by applying Bayesian hierarchical models to a marine field study that presents several of the analytical challenges noted above. The study used field-collected benthic cores in multiple regions of the Louisiana continental shelf that differed in bottom dissolved oxygen conditions to infer the effects of bottom-water hypoxia (dissolved oxygen ≤ 2.0 mg L⁻¹) on benthic macroinfaunal communities (Baustian et al., in press). This was an observational study because the treatment (hypoxia) was not randomly applied to the selected benthic communities; rather, communities within the hypoxic area, inshore of it, and offshore of it were hypothesized to differ in macroinfaunal abundance and diversity. Using this example, we compare the results from conventional ANOVA followed by multiple comparisons with those from the Bayesian hierarchical model, or multilevel ANOVA. In the remainder of the paper we present a brief description of the data and the statistical methods in Section 2 and the results of the Bayesian approach in Section 3, followed by a brief discussion and direction to additional literature.
2. Methods
2.1. Data
Benthic infaunal communities often exhibit high spatial and temporal variability due to the myriad factors that can impact infaunal abundance and species composition (Barry and Dayton, 1991; Morrisey et al., 1992).
Low bottom dissolved oxygen typically causes declines in abundance and diversity, but the magnitude of these effects varies with the duration of exposure to low oxygen, species-specific tolerances, and other factors (Baustian and Rabalais, 2009; Diaz and Rosenberg, 1995; Rabalais et al., 2001). Benthic infaunal abundance and species richness were sampled over an approximately 5000 km² area off the central coast of Louisiana during August 2–12, 2003 using a box core (Baustian et al., in press). Three replicate measurements were collected at each of fifteen stations in three general areas of the shelf: (1) within the hypoxic zone (five stations), (2) inshore of the hypoxic zone (four stations) and (3) offshore of the hypoxic zone (six stations). Further, one sub-core was taken from the sediment collected in each box core. The objective of the study was to test the hypothesis that hypoxia causes changes within the benthic infaunal community, measured in terms of total abundance and species richness.
This type of data is typical of that obtained from observational field studies in marine systems in that: (1) variability in the response variable (abundance, richness) was expected to be high and (2) sample sizes were relatively low due to the difficulty and high costs of obtaining samples from open continental shelf environments. Both (1) and (2) lead to low statistical power to detect a treatment effect. Because the hypoxia "treatment" was not randomly applied to the sampled benthic communities as in a controlled experiment, there is also a danger that an effect attributed to hypoxia was actually caused by some correlated but unmeasured factors.
2.2. Statistical methods
A major difference between the Bayesian and classical statistics approaches is whether or not to treat an unknown parameter of
interest as a random variable. In classical statistics, a model parameter (for example, the effect of hypoxia on benthic macroinvertebrate species richness) is assumed to be a constant, and randomness enters through the sampling process. In contrast, the Bayesian approach treats unknown parameters as random variables. This is not only a philosophical difference that distinguishes the Bayesian and frequentist approaches, but also leads to very different methods for statistical inference. For a frequentist, statistical inference about a parameter is made through the sampling distribution of the estimator, whereas a Bayesian bases inference on the posterior probability distribution of the parameter.
Although the two approaches are philosophically distinct in how they treat an unknown parameter, in practice the main difference between them is the level of computation involved. Conventional wisdom suggests that the Bayesian approach has a much higher computational requirement because application of Bayes' theorem requires potentially high dimensional integration. But when the computational procedures are examined more closely, the classical approach actually entails more difficult computations. The widespread use of classical statistics was largely due to the efforts of R.A. Fisher and other leading statistical figures in the 1950s who standardized a number of methods and systematically tabulated the most commonly used computations. These efforts were followed by fast computational algorithms implemented in most commonly used software packages. In contrast, although the general principle of Bayesian computation (application of Bayes' theorem) is applicable to all cases, there is no simple and generalized algorithm for a class of problems such as the generalized linear models (GLM).
The convenience of a set of generalized computational methods available in classical statistics comes, however, at the expense of rather limited flexibility. In this section, we discuss this point using ANOVA and hypothesis testing as examples.
2.2.1. Hierarchical (multilevel) modeling for ANOVA
A one-way ANOVA problem is typically represented by a simple equation:
\[ y_{ij} = \beta_0 + \beta_i + \epsilon_{ij} \tag{1} \]
where $y_{ij}$ is the $j$th observation receiving the $i$th treatment, $\beta_i$ is the effect of treatment $i$, and $\sum_i \beta_i = 0$. The total variance in $y_{ij}$ is partitioned into between group variance and within group variance. The classical ANOVA uses the following estimators for the $\beta$'s:
\[ \hat{\beta}_0 = \frac{1}{N} \sum_{i=1}^{k} \sum_{j=1}^{n_i} y_{ij} \]
and
\[ \hat{\beta}_i = \bar{y}_i - \hat{\beta}_0 = \frac{1}{n_i} \sum_{j=1}^{n_i} y_{ij} - \hat{\beta}_0 \]
where $N$ is the total sample size and $n_i$ is the sample size in treatment $i$. The variance partition is estimated using the sum-of-squares calculation. For the one-way ANOVA problem in Eq. (1), the total variance in the response, $SST = \sum_i \sum_j (y_{ij} - \hat{\beta}_0)^2$, is partitioned into within group variation (or residual sum of squares), $SSE = \sum_i \left[ \sum_j (y_{ij} - \bar{y}_i)^2 \right]$, and the variation among groups, $SSG = \sum_i n_i (\bar{y}_i - \hat{\beta}_0)^2$, and $SSG + SSE = SST$. The mean sums of squares, $MSG = SSG/(k - 1)$ and $MSE = SSE/(N - k)$, are the estimates of the between and within group variances, respectively. The same sum-of-squares computation applies to multi-way ANOVA, but a unique partitioning requires that the design is balanced.
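As an illustration of the sum-of-squares partition above, the following Python sketch computes the classical estimates and the quantities SST, SSE, SSG, MSG and MSE for a small unbalanced data set. The function name `one_way_anova` and the numbers are hypothetical and are not taken from the benthic data; NumPy is assumed to be available.

```python
import numpy as np

def one_way_anova(groups):
    """Classical one-way ANOVA estimates for a list of samples (one per treatment)."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)                                    # number of treatments
    n_total = sum(g.size for g in groups)              # N, total sample size
    beta0_hat = np.concatenate(groups).mean()          # grand mean
    beta_hat = [g.mean() - beta0_hat for g in groups]  # treatment effects
    sst = sum(((g - beta0_hat) ** 2).sum() for g in groups)           # total
    sse = sum(((g - g.mean()) ** 2).sum() for g in groups)            # within groups
    ssg = sum(g.size * (g.mean() - beta0_hat) ** 2 for g in groups)   # among groups
    return {
        "beta0_hat": beta0_hat, "beta_hat": beta_hat,
        "SST": sst, "SSE": sse, "SSG": ssg,
        "MSG": ssg / (k - 1), "MSE": sse / (n_total - k),
    }

# Hypothetical, unbalanced example data (three treatments).
example_groups = [
    [2.9, 3.1, 3.0, 3.3],
    [2.4, 2.6, 2.7, 2.5, 2.3],
    [3.2, 3.0, 3.4],
]
anova = one_way_anova(example_groups)
f_statistic = anova["MSG"] / anova["MSE"]  # ratio used by the classical F-test
```

The identity SSG + SSE = SST holds here even though the groups differ in size; it is the multi-way case that needs a balanced design for a unique partition.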
When using the Bayesian hierarchical model, the coefficients $\beta_i$ are modeled as a sample from a normal distribution with mean 0 and variance $\sigma_b^2$:
\[ \beta_i \sim N(0, \sigma_b^2) \tag{2} \]
The model error term $\epsilon_{ij}$ is also modeled as from a normal distribution:
\[ \epsilon_{ij} \sim N(0, \sigma^2) \tag{3} \]
Naturally, the within group variance is $\sigma^2$ and the between group variance is $\sigma_b^2$. The connection between the two approaches is illustrated by the analytical solution of the estimated mean of $\beta_i$, which is $\left[ \sigma_b^2 / (\sigma_b^2 + \sigma^2) \right] (\bar{y}_i - \beta_0)$ (and the variance is $1/(1/\sigma_b^2 + 1/\sigma^2)$). The Bayesian hierarchical mean approaches the classical estimator when $\sigma_b^2 \to \infty$. The Bayesian approach is in fact a generalization of the classical approach. The Bayesian hierarchically estimated (group) means are closer to the overall mean of 0 than those estimated using the classical method. This phenomenon is known as Bayesian shrinkage. The level of shrinkage (the difference between the two estimates) is determined by the relative size of $\sigma^2$ and $\sigma_b^2$. The shrinkage is large when the within group variance is large in comparison to the between group variance (we are less certain about the treatment effect directly calculated from data under such conditions, hence a large shrinkage to discount the estimated effects), and vice versa. The classical estimator has no shrinkage. As a result, the Bayesian estimated treatment effect sizes are smaller in magnitude than the corresponding classical estimates.
2.2.2. Type S based hypothesis testing and multiple comparisons
The F-test, or ratio of between group to within group variance, as a basis for assessing statistical significance is unique to classical ANOVA. Because the F-test only determines whether differences exist among groups but provides no insight on particular differences, multiple comparisons are almost always conducted once a significant F-statistic is obtained. Because the number of comparisons among groups can be large, a concern about such procedures is the control of the overall type I error probability. The typical strategy in classical statistics is to reduce the significance level for each comparison or to make the confidence intervals of the differences wider in order to maintain a particular overall error probability (e.g., $\alpha = 0.05$).
For example, when testing the difference between two treatment levels $\beta_j$ and $\beta_k$ with known within group variance $\sigma^2$, the classical two-sided test can be conducted by computing the confidence interval of the estimated difference, $\bar{y}_j - \bar{y}_k \pm 1.96\sqrt{2}\,\sigma$. This formulation is based on the fact that the variance of the estimated difference $\bar{y}_j - \bar{y}_k$ is $\sigma^2 + \sigma^2 = 2\sigma^2$ (and the standard deviation of the difference is $\sqrt{2}\,\sigma$). If the confidence interval does not include 0, we reject the null hypothesis $H_0: \beta_j - \beta_k = 0$. The value 1.96 is obtained using a test-wide significance level of 0.05. The null hypothesis is rejected when $|\bar{y}_j - \bar{y}_k| > 1.96\sqrt{2}\,\sigma$. When performing multiple comparisons, the value 1.96 is scaled up using various correction methods. These correction methods are often overly conservative, resulting in a much smaller comparison-wide type I error probability (hence the tests are of lower statistical power).
Under the Bayesian hierarchical modeling approach, it has been shown that the two-sided test is a special case of a one-sided test that minimizes the type S (for sign) error (Gelman and Tuerlinckx, 2000). That is, we test $H_0: \beta_j < \beta_k$ versus $H_a: \beta_j > \beta_k$, and the type S error is the error of concluding $H_a$ while $H_0$ is true. Under the hierarchical model, the posterior distribution of the difference $\beta_j - \beta_k$ is a normal distribution with mean $\left[ \sigma_b^2 / (\sigma_b^2 + \sigma^2) \right] (\bar{y}_j - \bar{y}_k)$ and variance $2/(1/\sigma_b^2 + 1/\sigma^2)$. The posterior 95% credible interval of the difference is
\[ \left[ \frac{\sigma_b^2}{\sigma_b^2 + \sigma^2} \right] (\bar{y}_j - \bar{y}_k) \pm 1.96 \sqrt{\frac{2}{1/\sigma_b^2 + 1/\sigma^2}} \]
To test the classical two-sided hypothesis using the Bayesian results, we reject the null if
\[ |\bar{y}_j - \bar{y}_k| > 1.96 \sqrt{2}\,\sigma \sqrt{1 + \sigma^2/\sigma_b^2} \]
Again, the classical testing procedure is a special case of the Bayesian approach (when $\sigma_b \to \infty$). Gelman and Tuerlinckx (2000) derived rules for testing the one-sided hypothesis to minimize the type S error: if the posterior 95% credible interval of $\beta_j - \beta_k$ does not include 0, we conclude that $\beta_j > \beta_k$ if the interval is positive, and $\beta_j < \beta_k$ if the interval is negative. If the interval includes 0, we are unable to determine whether $\beta_k$ is larger or smaller than $\beta_j$.
2.2.3. A Bayesian hierarchical alternative to ANOVA
The classical ANOVA can be entirely replaced by the Bayesian hierarchical model represented in Eqs. (1)–(3). The Bayesian approach estimates the posterior distribution of $\beta_i$, which can be used to estimate the posterior distributions of any paired differences between groups. A plot of the 90% credible intervals of these differences will clearly show which groups are different. This approach includes the multiple comparisons following classical ANOVA as a special case. The benefit of using the hierarchical approach is that the confusing multiple comparison correction necessary to maintain the overall error rate is handled by the hierarchical structure of the multiple treatment levels. This hierarchical structure provides partial pooling of information among different treatment levels and the estimated treatment effects are shrunk towards the overall mean.
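A minimal numerical sketch of the shrinkage and of the interval-based sign rule, for the known-variance case treated in Section 2.2.2, is given below in Python. The function names and the values of the group means, $\sigma$ and $\sigma_b$ are hypothetical and chosen only for illustration; in practice the variances are unknown and are estimated together with the other parameters.

```python
import math

def posterior_difference(ybar_j, ybar_k, sigma, sigma_b, z=1.96):
    """Posterior mean and credible interval of beta_j - beta_k (known variances)."""
    shrink = sigma_b ** 2 / (sigma_b ** 2 + sigma ** 2)   # shrinkage factor
    mean = shrink * (ybar_j - ybar_k)
    sd = math.sqrt(2.0 / (1.0 / sigma_b ** 2 + 1.0 / sigma ** 2))
    return mean, (mean - z * sd, mean + z * sd)

def sign_claim(interval):
    """Claim a sign only when the credible interval excludes 0."""
    lower, upper = interval
    if lower > 0:
        return "beta_j > beta_k"
    if upper < 0:
        return "beta_j < beta_k"
    return "undetermined"

def classical_threshold(sigma, z=1.96):
    """|ybar_j - ybar_k| needed to reject in the classical two-sided test."""
    return z * math.sqrt(2.0) * sigma

def bayesian_threshold(sigma, sigma_b, z=1.96):
    """Same threshold derived from the Bayesian credible interval."""
    return z * math.sqrt(2.0) * sigma * math.sqrt(1.0 + sigma ** 2 / sigma_b ** 2)

# Hypothetical values: within group sd 0.5, between group sd 1.0.
mean_a, interval_a = posterior_difference(1.6, 0.0, sigma=0.5, sigma_b=1.0)
mean_b, interval_b = posterior_difference(1.5, 0.0, sigma=0.5, sigma_b=1.0)
claim_a = sign_claim(interval_a)
claim_b = sign_claim(interval_b)
```

With these numbers an observed difference of 1.6 lies just above the Bayesian threshold and 1.5 just below it, and letting `sigma_b` grow very large reproduces the classical threshold.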
This shrinkage is based on the strength of information in each group and is a much improved alternative to the "corrections" used in classical multiple comparison. Computation of the Bayesian hierarchical model is easily implemented using Markov chain Monte Carlo simulation (Gilks et al., 1996; Qian et al., 2003; Qian and Shen, 2007).
2.2.4. The model
The three areas where our data were collected represent three levels of the treatment: hypoxia, inshore of hypoxia, and offshore of hypoxia. A typical approach to analyzing these data would be nested ANOVA, with the three areas as the treatment levels, the replicate measurements at each station representing the nested factor within each area, and the multiple sub-cores representing the error variance. In this case, a complicating problem is that not only do bottom dissolved oxygen concentrations differ among the areas, but other characteristics (e.g., water depth, distance to shore) also differ. Depth is a potential factor affecting both benthic infauna abundance and species richness. However, we do not know how best to describe the depth relationship. For this reason, we propose a hierarchical model:
\[ \begin{aligned} y_{ijk} &\sim N(\beta_{ij}, \sigma^2) \\ \beta_{ij} &\sim N(\beta_i^{area}, \sigma^2_{within}) \\ \beta_i^{area} &\sim N(\beta^h, \sigma^2_{between}) \end{aligned} \tag{4} \]
where $y_{ijk}$ is the $k$th response variable value (log abundance or species richness) in station $j$ at area $i$. The station-specific mean $\beta_{ij}$ is used to model depth and/or other confounding factors. For each area $i$, the $\beta_{ij}$'s are modeled as from a common (area-specific) distribution. The area means $\beta_i^{area}$ are further modeled as from a hyper-distribution. This model is a generalization of a conventional nested ANOVA model. The Bayesian hierarchical modeling approach estimates the joint distribution of all model parameters, $\beta_{ij}, \beta_i^{area}, \beta^h, \sigma^2, \sigma^2_{within}, \sigma^2_{between}$, from which we can calculate the marginal distributions of individual parameters. The comparison of
interest is the difference in response among the three areas. Given that the posterior joint distribution is represented by MCMC samples, we can easily calculate the posterior distributions of the differences for all three pairs of areas (inshore, hypoxia, offshore).
3. Results and discussion
Posterior distributions of model parameters are presented graphically, showing the estimated mean and the 50% and 90% credible intervals (Fig. 1). Stations within the hypoxic zone had lower abundance and species richness than those either inshore or offshore of the hypoxic zone (Fig. 1). When the hierarchical approach is used to estimate the posterior distributions of each of the paired differences between the three areas, analogous to multiple comparisons following a conventional ANOVA, the difference in species richness between the hypoxic and offshore areas and the difference in abundance between the hypoxic and inshore areas are statistically significant (in the sense that the probability of making a type S error is below 0.05), while the other paired comparisons are not. This result is easily visualized by the 90% credible interval of the difference (Fig. 1, bottom row). Species richness is significantly lower in the hypoxic area than offshore, but differences in richness between the hypoxic and inshore areas as well as between the inshore and offshore areas are not significant (Fig. 1). These results cannot be accounted for by depth or other potentially confounding factors because the hierarchical model presented here had considerably more support (lower deviance information criterion, or DIC; Spiegelhalter et al., 2002) than two alternative models that explicitly incorporated an effect of depth on species abundance and richness (Tables 1 and 2).
The hierarchical model also shows a large within station variation both in abundance and species richness. Furthermore, the estimated residual standard deviations vary by area (Fig. 1, third row). Although the constant variance assumption of the conventional ANOVA is not necessarily violated in this problem, the separately estimated residual standard deviation for each treatment group relieves us of this assumption, which is difficult to meet or test in practice.
The results presented in Fig. 1 include all the information needed for statistical inference, both in terms of estimating effect sizes and hypothesis testing of differences among groups. For estimation, the posterior distributions summarized in these figures
[Fig. 1 image not reproduced. Left column: species richness; right column: abundance. Rows from top to bottom: station means by area (offshore, hypoxia, inshore; axes: station mean log richness, station mean log abundance); area means (axes: mean log richness, mean log abundance); residual standard deviation by area; pairwise differences between areas (I−O, H−O, H−I).]
Fig. 1. Results for species richness are shown in the left column and results for abundance are in the right column. The two Bayesian hierarchical models show that (1) there is large variation among stations within each area (hypoxic, inshore, offshore) for both species richness and total abundance (top row), (2) the hypoxic area has the lowest mean species richness and lowest mean total abundance (second row), (3) the residual standard deviations are different among the three areas (third row), and (4) the hypoxic area has a significantly lower species richness than the offshore area and a significantly lower total abundance than the inshore area (bottom row).
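The pairwise comparisons in the bottom row of Fig. 1 are computed from MCMC samples of the area means. A sketch of that calculation in Python follows. The posterior draws here are synthetic stand-ins generated with NumPy, not output from the fitted model, and the function name `compare_areas` and all numerical values are hypothetical.

```python
import numpy as np

def compare_areas(draws, level=0.90):
    """Summarize posterior differences for every pair of areas.

    draws: dict mapping area name to a 1-D array of posterior draws of the
    area mean; arrays must come from the same MCMC iterations.
    """
    names = list(draws)
    lo_q = (1.0 - level) / 2.0
    hi_q = 1.0 - lo_q
    out = {}
    for a in range(len(names)):
        for b in range(a + 1, len(names)):
            diff = np.asarray(draws[names[a]]) - np.asarray(draws[names[b]])
            lower, upper = np.quantile(diff, [lo_q, hi_q])
            if lower > 0:
                claim = "positive"
            elif upper < 0:
                claim = "negative"
            else:
                claim = "undetermined"
            out[(names[a], names[b])] = {
                "mean": float(diff.mean()),
                "interval": (float(lower), float(upper)),
                "prob_positive": float((diff > 0).mean()),
                "sign_claim": claim,
            }
    return out

# Synthetic draws standing in for MCMC output (illustrative values only).
rng = np.random.default_rng(0)
fake_draws = {
    "hypoxic": rng.normal(2.6, 0.10, 4000),
    "inshore": rng.normal(2.8, 0.15, 4000),
    "offshore": rng.normal(3.1, 0.10, 4000),
}
comparisons = compare_areas(fake_draws)
```

Differencing draw by draw, rather than differencing summary statistics, preserves any posterior correlation between the area means.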
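Table 1
Alternative models.
Model                  Equations
Common slope           $y_{ijk} \sim N(\beta_{ij}, \sigma_y^2)$; $\beta_{ij} \sim N(\beta_{1i} + \beta_2\,\mathrm{depth}_{ij}, \sigma^2_{within})$; $\beta_{1i} \sim N(\mu_{\beta_1}, \sigma_1^2)$
Area-specific slopes   $y_{ijk} \sim N(\beta_{ij}, \sigma_y^2)$; $\beta_{ij} \sim N(\beta_{1i} + \beta_{2i}\,\mathrm{depth}_{ij}, \sigma^2_{within})$; $\beta_{1i} \sim N(\mu_{\beta_1}, \sigma_1^2)$; $\beta_{2i} \sim N(\mu_{\beta_2}, \sigma_2^2)$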
Table 2
Comparing three competing models using the deviance information criterion (DIC).
             Common slope   Area-specific slopes   Eq. (4)
Richness     79             71                     63
Abundance    49             42                     5
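For readers unfamiliar with the DIC values in Table 2, the following Python sketch shows the general definition from Spiegelhalter et al. (2002): the posterior mean deviance plus the effective number of parameters $p_D$. It uses a single-mean normal model with synthetic data and approximate posterior draws, so every name and number is hypothetical, and it does not reproduce the table.

```python
import numpy as np

def normal_deviance(y, mu, sigma):
    """Deviance, -2 * log-likelihood, of y under N(mu, sigma^2)."""
    y = np.asarray(y, dtype=float)
    return float(np.sum(np.log(2.0 * np.pi * sigma ** 2) + ((y - mu) / sigma) ** 2))

def dic(y, mu_draws, sigma_draws):
    """DIC = Dbar + pD, with pD = Dbar - D(posterior means)."""
    deviances = np.array(
        [normal_deviance(y, m, s) for m, s in zip(mu_draws, sigma_draws)]
    )
    d_bar = float(deviances.mean())                    # posterior mean deviance
    d_hat = normal_deviance(y, np.mean(mu_draws), np.mean(sigma_draws))
    p_d = d_bar - d_hat                                # effective number of parameters
    return {"Dbar": d_bar, "pD": p_d, "DIC": d_bar + p_d}

# Synthetic example: 45 observations, known sigma, flat prior on the mean,
# so the posterior of the mean is N(ybar, sigma^2 / n).
rng = np.random.default_rng(42)
sigma_true = 0.3
y_obs = rng.normal(3.0, sigma_true, 45)
mu_samples = rng.normal(y_obs.mean(), sigma_true / np.sqrt(y_obs.size), 5000)
sigma_samples = np.full(5000, sigma_true)
dic_result = dic(y_obs, mu_samples, sigma_samples)
```

In this one-parameter example $p_D$ comes out close to 1, which is a convenient sanity check. Among models fitted to the same data, the one with the lowest DIC is preferred.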
show both the estimated mean and the 50% and 90% credible intervals. When comparing effects for three treatment levels, the multiple comparison plots provide results that are optimal in terms of minimizing the type S error. The probability of a type S error is more informative than the conventional type I error because we are typically more interested in the nature of the difference among groups (e.g., whether total abundance in the hypoxic zone is less than the abundance in the inshore area) than whether or not there is a difference. Gelman and Tuerlinckx (2000) also argued that the type S error concept is more relevant in terms of hypothesis testing. By adopting the type S error we are focusing on the results we are interested in, rather than making inference using the null hypothesis as a conduit. Furthermore, when the null hypothesis of no difference is of interest, the traditional hypothesis testing method is ineffective in determining the type II error rate, a concept that is potentially important but can never be adequately quantified and is often misinterpreted using the classical hypothesis testing procedure (Qian, 2009, pp. 83–87). The type S error based test eliminates the type II error.
Our hierarchical modeling approach represents a generalization of the conventional linear modeling approach for multilevel data. The generalization not only provides more informative results, but also avoids some arbitrary decisions necessary in classical statistics. For example, multiple comparisons in classical statistics require adjustments to the test-wide significance level in order to maintain an overall error rate. In our hierarchical model, this adjustment is automatically determined based on the relative size of the between and within group variances.
The main difference between the Bayesian multilevel approach and the classical ANOVA is the common prior distribution on the area-specific means, $N(\beta^h, \sigma^2_{between})$, used in the Bayesian method. The classical approach does not have the common prior (equivalent to setting the between group variance to infinity). The use of the common prior in the Bayesian approach results in the natural "shrinkage towards the mean." The shrinkage effect was originally studied by Stein (1956, 1960, 1962), and used by others (e.g., Lindley, 1962; Lindley and Smith, 1972; Sclove, 1968, 1971; Zellner and Vandaele, 1974; Berger, 1985; Leonard and Hsu, 2001; Judge et al., 1985; Mittelhammer et al., 2000). The general consensus of these studies is that shrinkage techniques yield improved estimates and predictions of not only individual outcomes but also of total outcomes. Specifically, shrinkage results in two desirable features. First, it reduces the influence of extreme data points in any given group. Because the level of shrinkage is determined by the relative sizes of within and between group variation, a group with larger within group variance will have an estimated mean that is shrunk more towards the overall mean than that of a group with a smaller within group variance. This increased shrinkage reflects our uncertainty regarding the estimated mean with large variance. Second, because the estimated mean (posterior distribution) has been shrunk, no further corrections are necessary when conducting multiple comparisons (Gelman and Tuerlinckx, 2000).
The differences between the classical and Bayesian approaches have meaningful consequences for how we interpret the effects of hypoxia on macrobenthic community structure. Based on traditional linear models we detected a significant effect of hypoxia on species richness but the effect on abundance was not statistically significant (Baustian et al., in press). Using the Bayesian hierarchical model we detected the same effect on species richness but also an effect on abundance. The Bayesian approach should be recognized as a better alternative to the traditional ANOVA because of its improved statistical power to detect a treatment effect and the smaller number of arbitrary assumptions.
Acknowledgement
This work was partly supported by the US Geological Survey through a cooperative agreement between USGS and Duke University (08HQAG0121) to S.S.Q., by the National Oceanic and Atmospheric Administration Center for Sponsored Coastal Ocean Research Grant No. NA03NOS4780037 (Publication No. NGOMEX 126) to N.N.R. and No. NA03NOS4780040 to J.K.C. The views expressed herein are those of the authors and do not necessarily reflect the view of the US Government and its agencies. We thank the associate editor and one anonymous reviewer whose constructive comments and suggestions on the initial submittal led to a much improved paper.
References
Anderson, D.R., Burnham, K.P., Thompson, W.L., 2000. Null hypothesis testing: problems, prevalence, and an alternative. Journal of Wildlife Management 64, 912–923.
Barry, J.P., Dayton, P.K., 1991. Physical heterogeneity and the organization of marine communities. In: Kolasa, J., Pickett, S.T.A. (Eds.), Ecological Heterogeneity. Springer-Verlag, New York, pp. 270–320.
Baustian, M.M., Craig, J.K., Rabalais, N.N., in press. Effects of summer hypoxia on macrobenthos and Atlantic croaker foraging selectivity in the northern Gulf of Mexico. Journal of Experimental Marine Biology and Ecology.
Baustian, M.M., Rabalais, N.N., 2009. Seasonal composition of benthic macroinfauna exposed to hypoxia in the northern Gulf of Mexico. Estuaries and Coasts 32, 975–983.
Berger, J.O., 1985. Statistical Decision Theory and Bayesian Analysis, second ed. Springer-Verlag, New York.
Diaz, R.J., Rosenberg, R., 1995. Marine benthic hypoxia: a review of its ecological effects and the behavioural responses of benthic macrofauna. Annual Review of Oceanography and Marine Biology 33, 245–303.
Fisher, R.A., 1925. Statistical Methods for Research Workers, first ed. Oliver and Boyd, Edinburgh. 14th edition reprinted in 1970.
Gelman, A., 2005. Analysis of variance – why it is more important than ever (with discussions). The Annals of Statistics 33 (1), 1–53.
Gelman, A., Hill, J., 2007. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press, New York.
Gelman, A., Tuerlinckx, F., 2000. Type S error rates for classical and Bayesian single and multiple comparison procedures. Computational Statistics 15, 373–390.
Gilks, W.R., Richardson, S., Spiegelhalter, D.J. (Eds.), 1996. Markov Chain Monte Carlo in Practice. Chapman and Hall, New York.
Gotelli, N.J., Ellison, A.M., 2004. A Primer of Ecological Statistics. Sinauer Associates Inc. Publishers, Sunderland, MA, USA.
Johnson, D.H., 1999. The insignificance of statistical significance testing. Journal of Wildlife Management 63 (3), 763–772.
Judge, G.G., Griffiths, W.E., Hill, R.C., Lutkepohl, H., Lee, T., 1985. The Theory and Practice of Econometrics, second ed. Wiley, New York.
Krantz, D.H., 1999. The null hypothesis testing controversy in psychology. Journal of the American Statistical Association 94, 1372–1381.
Leonard, T., Hsu, J.S.J., 2001. Bayesian Methods. Cambridge University Press, Cambridge.
Lindley, D.V., 1962. Discussion on Professor Stein's paper. Journal of the Royal Statistical Society, Series B 24, 285–287.
Lindley, D.V., Smith, A.F.M., 1972. Bayes estimates for the linear model. Journal of the Royal Statistical Society, Series B 34, 1–41.
Mittelhammer, R.C., Judge, G.G., Miller, D.J., 2000. Econometric Foundations. Cambridge University Press, Cambridge.
Morrisey, D.J., Howitt, L., Underwood, A.J., 1992. Spatial variation in soft-sediment benthos. Marine Ecology Progress Series 81, 197–204.
Qian, S.S., 2009. Environmental and Ecological Statistics with R. Chapman and Hall/CRC Press, Boca Raton, Florida.
Qian, S.S., Shen, Z., 2007. Ecological applications of multilevel analysis of variance. Ecology 88 (10), 2489–2495.
Qian, S.S., Stow, C.A., Borsuk, M., 2003. On Monte Carlo methods for Bayesian inference. Ecological Modelling 159, 269–277.
Rabalais, N.N., Smith, L.E., Harper, D.E., Justic, D., 2001. Effects of seasonal hypoxia on continental shelf benthos. In: Rabalais, N.N., Turner, R.E. (Eds.), Coastal Hypoxia: Consequences for Living Resources and Ecosystems, Coastal and Estuarine Studies, vol. 58. American Geophysical Union, Washington, DC, pp. 211–240.
Sclove, S.L., 1968. Improved estimators for coefficients in linear regression. Journal of the American Statistical Association 63, 596–606.
Sclove, S.L., 1971. Improved estimation of parameters in multivariate regression. Sankhya, Series A 33, 61–66.
Spiegelhalter, D.J., Best, N.G., Carlin, B.P., Van der Linde, A., 2002. Bayesian measures of model complexity and fit (with discussion). Journal of the Royal Statistical Society, Series B 64 (4), 583–616.
Stein, C., 1956. Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, vol. 1. University of California Press, Berkeley, CA, pp. 197–206.
Stein, C., 1960. Multiple regression. In: Olkin, I. (Ed.), Contributions to Probability and Statistics in Honor of Harold Hotelling. Stanford University Press, Stanford.
Stein, C., 1962. Confidence sets for the mean of a multivariate normal distribution. Journal of the Royal Statistical Society, Series B 24, 265–296.
Zellner, A., Vandaele, W.A., 1974. Bayes–Stein estimators for k-means, regression and simultaneous equation models. In: Fienberg, S.E., Zellner, A. (Eds.), Studies in Bayesian Econometrics and Statistics in Honor of Leonard J. Savage. North-Holland, Amsterdam, pp. 627–653.