Mechanistic vs. Statistical Variability Decomposition (unpublished supplementary material by Maria Moreno de Castro)
We were asked to clarify the quality of our method. This supplementary material gives a review-style summary that compares uncertainty propagation with a mechanistic model to the statistical methods typically used to analyze variability in marine sciences, such as analysis of variance (ANOVA) and principal component analysis (PCA). Essentially, both ANOVA and PCA try to identify the major contributors to variability by decomposing the variance as follows:

Var_POC = a_1*Var_DIN + a_2*Var_temp + a_3*Var_size + a_4*Var_losses + a_5*Var_CO2uptake + ... + a_i*Var_factor_i + ... + a_N*Var_factor_N.

The higher the value of a coefficient a_i, the more the variations in POC correlate with, or are explained by, the variations in factor_i. With this kind of description, ANOVA and PCA assume that the decomposition of the total variance is linear and that the partial variances are independent. For instance, given N = 3, neither ANOVA nor PCA accounts for a nonlinear variability decomposition such as

Var_POC = a_1*Var_DIN * a_2*Var_temp / (a_3*Var_size),

nor for a decomposition with non-independent factor variabilities such as

Var_POC = a_1*Var_DIN + a_2*Var_temp(Var_size).
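The linearity assumption can be illustrated with a minimal sketch. It is not part of the original analysis: the two response functions, their coefficients, and the input distributions are hypothetical and chosen only for illustration. The sketch draws two independent factors, computes the variance of a toy POC response, and checks how much of it is recovered by summing the variances obtained when only one factor varies at a time. For an additive response the sum recovers essentially all of the variance; for a product of two independent factors the exact variance contains an extra Var_DIN*Var_temp interaction term that an additive decomposition cannot attribute to either factor.

```python
import random
import statistics

def additive_poc(din, temp):
    # Hypothetical linear response: factor effects simply add up.
    return 2.0 * din + 3.0 * temp

def multiplicative_poc(din, temp):
    # Hypothetical nonlinear response: the two factors interact.
    return din * temp

def unexplained_fraction(model, draws, means):
    """Share of the output variance that is not recovered by summing
    the variances obtained when only one factor varies at a time."""
    mean_din, mean_temp = means
    total = statistics.pvariance([model(din, temp) for din, temp in draws])
    only_din = statistics.pvariance([model(din, mean_temp) for din, _ in draws])
    only_temp = statistics.pvariance([model(mean_din, temp) for _, temp in draws])
    return (total - only_din - only_temp) / total

rng = random.Random(0)  # fixed seed so the sketch is reproducible
means = (1.0, 1.0)      # assumed factor means (arbitrary units)
draws = [(rng.gauss(means[0], 0.5), rng.gauss(means[1], 0.5))
         for _ in range(20000)]

frac_additive = unexplained_fraction(additive_poc, draws, means)
frac_multiplicative = unexplained_fraction(multiplicative_poc, draws, means)

print(f"additive model, unexplained share:       {frac_additive:+.3f}")
print(f"multiplicative model, unexplained share: {frac_multiplicative:+.3f}")
```

In this toy setting the unexplained share of the additive model stays close to zero (only sampling noise), whereas the multiplicative model leaves a clearly positive remainder.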
After this clarification, I list the reasons why uncertainty propagation with a mechanistic model is better:
(1) Var_factor_i was not measured for all i. Only DIN and temperature are available per mesocosm for the purpose of variability analyses, since other data, such as abundances, are partial and extrapolated from incomplete measurements. As there are time series only for DIN and temperature, only Var_DIN and Var_temp could enter the Var_POC decomposition described above. The only way to overcome this limitation is to simulate Var_factor_i.
(2) We look for individual contributors to Var_POC, so we do not impose linearity on the decomposition. ANOVA and PCA look for correlations between Var_POC and Var_factor_i with all the factors simultaneously, and try to estimate how much of the variability of POC is explained by the variability in factor_i relative to, and scaled with, the variability in the other factors, such that all together add up to Var_POC.
(3) Our method considers how the variances change in time, Var_factor_i(t); i.e., we account for the influence of Var_factor_i on Var_POC at any time and not only on the temporal average. The most ANOVA can do is find a correlation between Var_POC and time (repeated measures ANOVA), but it cannot find at what time a given Var_factor_i is most relevant, as our method does.
(4) Using statistical tools (e.g., ANOVA, t-test, linear mixed-effects models, ...) to calculate the thresholds below which the expected treatment effect is not masked requires the value of that expected effect; however, due to the low statistical power of the experiments (suboptimal number of replicates), this value can only be estimated with mechanistic models.
(5) A mechanistic description differs from post-processing data analysis in that it accounts for the underlying processes and interactions that cause the observed effects, rather than analyzing the effects themselves.

Our mechanistic method is as bad as (or as good as) ANOVA and PCA because:
(1) We do impose independence of the components, as in ANOVA and PCA. However, because our method resolves how the variances change in time, we can infer that, for instance, Var_DIN does not depend on Var_remin, since their effects do not overlap in time. In other words, when two distributions are introduced in the system simultaneously, one simulating uncertainties in DIN and another simulating uncertainties in 'remin', the results for Var_POC are basically unaffected, since the former triggers POC variability during the bloom and the latter during the post-bloom. Only dependences between contributors that trigger POC variability at the same time could be relevant, and only to third order.
(2) The fit of the reference run to the sample data per treatment level suffers from the same limitation (low statistical power) as the commonly used statistical tools that rely on summary values such as the mean of the sample units, like ANOVA and PCA. Our results regarding the reference run (i.e., CO2 enhances biomass production) are therefore as representative as the experimental data.
(3) The calculation of the thresholds suffers from the same limitation (low statistical power) as the commonly used statistical tools that rely on summary values such as the sample variance, like ANOVA and PCA. Our results regarding the thresholds (i.e., when we compare Var_POC_observations with Var_POC_model) are therefore as representative as the experimental data.
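The time-resolved argument above (a contributor acting during the bloom versus one acting during the post-bloom) can be sketched with a toy ensemble experiment. This is only an illustration under stated assumptions, not the model used in our study: the three-compartment DIN-phytoplankton-detritus equations, all rate constants, the input distributions, and the names simulate_poc, variance_in_time and peak_day are hypothetical. One ensemble perturbs the initial DIN, another perturbs the remineralization rate, and for each the ensemble variance of POC is followed in time.

```python
import random
import statistics

DT = 0.05                 # time step in days (assumed)
DAYS = 30                 # length of the simulated experiment (assumed)
STEPS = int(DAYS / DT)

def simulate_poc(din0, remin):
    """Euler integration of a toy DIN-phytoplankton-detritus model.
    Returns POC (phytoplankton + detritus) at every time step."""
    mu, k, mort, sink = 1.0, 1.0, 0.1, 0.1  # illustrative rates (per day)
    n, p, d = din0, 0.1, 0.0
    poc = []
    for _ in range(STEPS):
        uptake = mu * n / (k + n) * p        # nutrient-limited growth
        dn = -uptake + remin * d             # DIN: uptake loss, remineralization gain
        dp = uptake - mort * p               # phytoplankton: growth minus losses
        dd = mort * p - (remin + sink) * d   # detritus: remineralized or sinking out
        n = max(n + DT * dn, 0.0)
        p += DT * dp
        d += DT * dd
        poc.append(p + d)
    return poc

def variance_in_time(runs):
    """Ensemble variance of POC at each time step."""
    return [statistics.pvariance(values) for values in zip(*runs)]

def peak_day(variances):
    """Day at which the ensemble variance is largest."""
    return (variances.index(max(variances)) + 1) * DT

rng = random.Random(1)    # fixed seed so the sketch is reproducible
members = 100

# Ensemble 1: uncertainty in initial DIN only.
din_runs = [simulate_poc(max(rng.gauss(10.0, 1.0), 0.1), 0.05)
            for _ in range(members)]
# Ensemble 2: uncertainty in the remineralization rate only.
remin_runs = [simulate_poc(10.0, max(rng.gauss(0.05, 0.01), 0.0))
              for _ in range(members)]

var_poc_din = variance_in_time(din_runs)
var_poc_remin = variance_in_time(remin_runs)
peak_day_din = peak_day(var_poc_din)
peak_day_remin = peak_day(var_poc_remin)

print(f"Var_POC from DIN uncertainty peaks around day {peak_day_din:.1f}")
print(f"Var_POC from remin uncertainty peaks around day {peak_day_remin:.1f}")
```

In this toy setting the variance induced by DIN uncertainty builds up with the bloom, whereas the variance induced by remineralization uncertainty only becomes noticeable once detritus has accumulated. A time-averaged decomposition would not resolve this difference in timing.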
Finally, our mechanistic method is worse than ANOVA and PCA because:
(1) It is not a universal post-processing black box that can be applied with little supervision through data analysis packages (popular software includes, for instance, R and STATISTICA).
(2) Quite the opposite, our method demands a mechanistic understanding of mesocosm dynamics and their variability. Little research has been done in this direction, so ours is pioneering work open to improvement.

These comparisons are summarized in the following table:

Method performance with respect to statistical data analyses
Performance   Aspect                                                         Item
Better        availability of variances Var_factor_i                        (1)
Better        limitation by assuming linearity of variance decomposition    (2)
Better        temporal description of variability                           (3)
Better        thresholds estimation                                         (4)
Better        mechanistic description                                       (5)
Neutral       limitation by assuming independence of variances              (1)
Neutral       residuals generation                                          (2)
Neutral       thresholds estimation                                         (3)
Worse         degree of supervision in the application                      (1)
Worse         presence in mesocosm scientific community                     (2)