RESEARCH ARTICLE

Improving Estimations of Spatial Distribution of Soil Respiration Using the Bayesian Maximum Entropy Algorithm and Soil Temperature as Auxiliary Data

Junguo Hu1,2, Jian Zhou1, Guomo Zhou1*, Yiqi Luo3, Xiaojun Xu2, Pingheng Li2, Junyi Liang3

1 Information Engineering College of Zhejiang A & F University, Linan, PR China, 2 Zhejiang Provincial Key Laboratory of Forestry Intelligent Monitoring and Information Technology Research, Linan, PR China, 3 Department of Microbiology and Plant Biology, University of Oklahoma, Norman, Oklahoma, United States of America

* [email protected]

OPEN ACCESS

Citation: Hu J, Zhou J, Zhou G, Luo Y, Xu X, Li P, et al. (2016) Improving Estimations of Spatial Distribution of Soil Respiration Using the Bayesian Maximum Entropy Algorithm and Soil Temperature as Auxiliary Data. PLoS ONE 11(1): e0146589. doi:10.1371/journal.pone.0146589

Editor: Kevin Scott Brown, University of Connecticut, UNITED STATES

Received: July 27, 2015. Accepted: December 18, 2015. Published: January 25, 2016

Copyright: This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.

Data Availability Statement: The authors have uploaded the data in FigShare, https://dx.doi.org/10.6084/m9.figshare.2058498.v1.

Funding: This work was supported by the NSF China Programs (Grant No. 31300539 and 31500527), http://www.nsfc.gov.cn/; it was also supported by the Public Welfare Technology Application Research Program of Zhejiang province (Grant No. 2015C31004), http://www.zjkjt.gov.cn/.

Abstract Soil respiration inherently shows strong spatial variability. It is difficult to obtain an accurate characterization of soil respiration with an insufficient number of monitoring points. However, it is expensive and cumbersome to deploy many sensors. To solve this problem, we proposed employing the Bayesian Maximum Entropy (BME) algorithm, using soil temperature as auxiliary information, to study the spatial distribution of soil respiration. The BME algorithm used the soft data (auxiliary information) effectively to improve the estimation accuracy of the spatiotemporal distribution of soil respiration. Based on the functional relationship between soil temperature and soil respiration, the BME algorithm satisfactorily integrated soil temperature data into said spatial distribution. As a means of comparison, we also applied the Ordinary Kriging (OK) and Co-Kriging (Co-OK) methods. The results indicated that the root mean squared errors (RMSEs) and absolute values of bias for both Day 1 and Day 2 were the lowest for the BME method, thus demonstrating its higher estimation accuracy. Further, we compared the performance of the BME algorithm coupled with auxiliary information, namely soil temperature data, and the OK method without auxiliary information in the same study area for 9, 21, and 37 sampled points. The results showed that the RMSEs for the BME algorithm (0.972 and 1.193) were less than those for the OK method (1.146 and 1.539) when the number of sampled points was 9 and 37, respectively. This indicates that the former method using auxiliary information could reduce the required number of sampling points for studying spatial distribution of soil respiration. Thus, the BME algorithm, coupled with soil temperature data, can not only improve the accuracy of soil respiration spatial interpolation but can also reduce the number of sampling points.

Competing Interests: The authors have declared that no competing interests exist.

PLOS ONE | DOI:10.1371/journal.pone.0146589 January 25, 2016


Spatial Distribution of Soil Respiration Using BME

Introduction
Soil respiration represents one of the most important fluxes in the terrestrial carbon (C) cycle [1–3]. Therefore, accurately estimating soil carbon dioxide (CO2) efflux is of great importance for understanding the terrestrial C cycle and the mechanisms of climate change and its effects. Influenced by numerous natural factors, soil CO2 efflux tends to show intense spatial heterogeneity [4, 5]. In practice, studies on soil CO2 efflux generally adopt scattered point sampling because of the measurement limits associated with soil respiration [5, 6]. However, due to the extreme spatial and temporal variability of soil respiration, denser spatial data points are crucial for spatial interpolation [7]. There are various spatial interpolation methods, most of which have already been applied in many fields [8]. Generally, spatial interpolation methods can be classified into non-geostatistical methods, geostatistical methods (e.g., Kriging), and mixed methods. As an unbiased estimation method, Kriging is the most mature and popular method in the field of environmental science [9–11]. Due to expanding application scopes and varying application requirements in different fields, mixed interpolation methods have developed over time, both in theory and application [12]. In addition to a number of commonly used methods, some machine-learning methods have also been adopted for spatial interpolation, with fairly good results. Popular machine-learning methods include the neural network method and the random forest method [13, 14]. Li et al. [15] explored the utilization of many machine-learning methods in spatial interpolation. Hu et al. [16] employed a neural network interpolation algorithm to study illumination distributions. Furthermore, many other algorithms are already being used to study interpolation in the field of environmental science.
For instance, researchers have expanded data use to spatial and temporal aspects by adopting Bayesian prior information [17, 18]. Most of the current interpolation methods have already been applied in the study of the spatial distribution of soil respiration. Teixeira et al. [19] compared the Kriging and sequential Gaussian fitting methods for soil respiration interpolation and found that the latter achieved better results. Stoyan et al. [20] studied the spatial variation of soil respiration using the Kriging method. Jordan et al. [21] also used the Kriging method to study the small-scale spatial heterogeneity of soil respiration in a growing forest. However, soil respiration entails a complex interrelationship of physical, biological, and chemical reactions, and thus it is hard to fully analyze its spatial heterogeneity by merely interpolating data from a few sampled points. Consequently, it is crucial to improve the interpolation accuracy of soil respiration and to compensate for the above-mentioned deficiency by including additional, easily accessible information during sampling, such as impact factors that serve as auxiliary information on soil respiration. Teixeira et al. [6] compared the interpolation results obtained from the Ordinary Kriging (OK) and Co-Kriging (Co-OK) methods, using soil bulk density as the second feature, and showed that the inclusion of this feature greatly improved the interpolation. Huang et al. [22] studied the influence of vegetation and soil properties on the spatial estimation of soil respiration with the aid of remote sensing techniques. Jurasinski et al. [23] took root microbes as auxiliary information and investigated the spatial distribution of soil respiration by adopting the Co-OK method. The Bayesian Maximum Entropy (BME) algorithm is a combination of Bayesian statistical theory and Shannon's information theory.
It handles spatiotemporal variables and can integrate empirical knowledge and soft data (auxiliary information); consequently, it can aid in the collation of environmental information in the field of geostatistics. Compared to the kriging algorithm, the BME algorithm is more theoretical and systematic. Gao et al. [24] coupled the temperature data obtained by remote sensing as soft data with the BME algorithm to study the regional spatial distribution of soil moisture. They


compared the results of the BME algorithm to those of kriging and showed that the former could improve the regional interpolation. Akita et al. [25] developed a moving-window BME method to improve the estimation accuracy of regional air pollutant distribution. Studies in the field of soil respiration have indicated that soil surface temperature significantly affects soil respiration, following an exponential relationship [26–28]. Soil temperature can be obtained more easily than soil respiration by using advanced technologies (such as wireless sensor networks). Thus, in this paper, we take advantage of the ability of the BME algorithm to use soft data and employ soil temperature as the soft data. In doing so, we test the following three hypotheses: (1) the BME method is more accurate at estimating the spatial distribution of soil CO2 efflux than the OK and Co-OK methods; (2) data on soil temperature, used as auxiliary information, provide improved estimates of the spatial distribution of soil CO2 efflux at a small scale; and (3) this auxiliary information can help reduce the number of sampling points needed to study the spatial distribution of soil CO2 efflux.

Materials and Methods
Study area
The experimental site was located in the city of Lin’an in the northwest of Jincheng County, Zhejiang Province, China (119°43’15.24”–119°43’26.97”E and 30°15’21.60”–30°15’33.27”N). The entire study area is open grassland bounded by a lake in the east; sparse woods in the south, west, and northwest; and open grassland in the northeast. The average altitude is 50 m, and the highest altitude is 170 m. The area has an average annual frost-free period of 237 days and receives an average annual rainfall of 1613.9 mm over a total of about 158 days. The average annual temperature is 16.4°C, with 1847.3 h of annual sunshine. The area is warm and humid, with a subtropical monsoon climate, sufficient illumination, abundant rainfall, and four distinct seasons. The area belongs to Zhejiang A & F University, and we were free to conduct research there. Other researchers can also easily obtain permission to do research in the area. Warning signs were set up during the testing process to ensure that there was no danger. The research did not cause irreversible damage to the soil, and there are no protected species in the study area.

Data sources
The experimental area covers 35 m × 35 m and was divided into grids of 5 m × 5 m. Seven Lr100GE-6400 instruments [29], developed by GreenOrbs Laboratory, were used to measure the CO2 efflux within each grid. The Lr100GE-6400 is an open-box CO2 efflux measuring instrument. One day before measuring the CO2 efflux, a PVC soil collar was pressed into the soil to a depth of about 5 cm at the center of the surface of each soil core. To keep the readings as simultaneous as possible, we took 1 minute to warm up the 7 instruments and 3 minutes to conduct the measurements, with all measurements being completed within the 90 minutes between 13:00 and 14:30 on September 30 (Day 1) and October 7 (Day 2), 2014. From all the observed data, we chose the stable readings as the experimental data. On Day 1, we conducted measurements from east to west (horizontal direction), while on Day 2, we did so from south to north (vertical direction). The summary statistics of the CO2 efflux data are provided in Table 1. Higher-density temperature data (35 × 35 points) were measured manually with 30 calibrated TP-101 soil thermometers inserted into the soil at a depth of about 5–10 cm; readings were logged once the thermometers reached equilibrium, after 1 to 2 min. All rounds of sampling were completed in 90 min. The summary statistics of the soil temperature data appear in Table 2.


Table 1. Summary of Soil Respiration Data.

| ID | Date | No. | Max (μmg/m2s) | Min (μmg/m2s) | Range (μmg/m2s) | Mean (μmg/m2s) | SD (μmg/m2s) | Cv (%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Day 1 | September 30, 2014 | 49 | 4.939 | 2.242 | 2.698 | 3.476 | 0.602 | 17.332 |
| Day 2 | October 7, 2014 | 49 | 6.649 | 3.635 | 3.019 | 5.810 | 0.512 | 8.805 |

doi:10.1371/journal.pone.0146589.t001

Data preprocessing
The application of soft data (soil temperature) is important for the BME algorithm because it integrates uncertain information into the estimation. Suitable, high-quality soft data can improve the performance of the algorithm. Soft data can incorporate expert knowledge, experimental conclusions, and so on, and are most commonly of the probability type [30, 31]. Probability soft data can be approximately normally distributed or Student t-distributed to express the measurement error or physical interpretation [32, 33]. Interval soft data denote physical meanings with upper and lower bounds. After reviewing other similar studies, we assumed that soil respiration and soil temperature share a functional relationship of the Arrhenius type [26, 27], as shown in Eq 1. We used the measured data to calculate the fitting parameters in Eq 1. The soft data are given by the fitted values together with prediction intervals based on the Student’s t-distribution, and the formula used to calculate the prediction interval is given by Eq 2 [34]:

$$\hat{R}_s = a e^{bT} \tag{1}$$

$$R_{s,\mathrm{interval}} = \hat{R}_{si} \pm t_{n-2,0.025}\, S_{TP} \sqrt{1 + \frac{1}{n} + \frac{(T_i - \bar{T})^2}{S_{TT}}} \tag{2}$$

Here, $\hat{R}_s$ is the estimated soil CO2 efflux at soil temperature $T$, with fitting parameters $a$ and $b$. $R_{s,\mathrm{interval}}$ is the prediction interval corresponding to each estimated soil CO2 efflux $\hat{R}_{si}$ at soil temperature $T_i$. $t_{n-2,0.025}$ is the critical value of the Student’s t-distribution with (n − 2) degrees of freedom at a confidence level of 95%. $S_{TP}$ is the standard deviation (SD) of the soil CO2 efflux estimation error. $\bar{T}$ is the average soil temperature. $S_{TT}$ is the sum of the squared deviations of soil temperature from its mean. The numerical values of the parameters in Eqs 1 and 2 are shown in Table 3. Fig 1 shows the relationship between soil CO2 efflux and soil temperature, the probability distribution, and the prediction interval of the soft data.
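The construction of the soft data in Eqs 1 and 2 can be illustrated with a minimal Python sketch. It is not the authors' code: the fit is assumed to be an ordinary least-squares fit on ln(R), S_TP is assumed to be the residual standard deviation with n − 2 degrees of freedom, and the function names, the synthetic data, and the critical value `t_crit` (taken from a t-table for 3 degrees of freedom) are all hypothetical.

```python
import math

def fit_exponential(temps, effluxes):
    """Fit R = a * exp(b * T) by least squares on ln(R) (an assumed fitting route)."""
    n = len(temps)
    logs = [math.log(r) for r in effluxes]
    t_bar = sum(temps) / n
    l_bar = sum(logs) / n
    s_tt = sum((t - t_bar) ** 2 for t in temps)
    s_tl = sum((t - t_bar) * (l - l_bar) for t, l in zip(temps, logs))
    b = s_tl / s_tt
    a = math.exp(l_bar - b * t_bar)
    return a, b

def prediction_interval(temps, effluxes, a, b, t_i, t_crit):
    """Eq 2: R_hat +/- t_crit * S_TP * sqrt(1 + 1/n + (T_i - T_bar)^2 / S_TT)."""
    n = len(temps)
    t_bar = sum(temps) / n
    s_tt = sum((t - t_bar) ** 2 for t in temps)
    residuals = [r - a * math.exp(b * t) for t, r in zip(temps, effluxes)]
    # Assumption: S_TP is the residual SD with n - 2 degrees of freedom.
    s_tp = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    r_hat = a * math.exp(b * t_i)
    half_width = t_crit * s_tp * math.sqrt(1 + 1 / n + (t_i - t_bar) ** 2 / s_tt)
    return r_hat - half_width, r_hat, r_hat + half_width

# Synthetic illustration only (not the paper's measurements).
temps = [25.2, 25.5, 25.8, 26.1, 26.4]
noise = [1.02, 0.98, 1.01, 0.99, 1.00]
effluxes = [0.1 * math.exp(0.13 * t) * f for t, f in zip(temps, noise)]

a, b = fit_exponential(temps, effluxes)
# t_crit = 3.182 is the two-sided 95% critical value for n - 2 = 3 degrees of freedom.
low, r_hat, high = prediction_interval(temps, effluxes, a, b, t_i=26.0, t_crit=3.182)
```

Each high-density temperature reading would be passed through `prediction_interval` in the same way to obtain one soft datum, that is, a central estimate with an interval around it.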

Table 2. Summary of Soil Temperature Data.

| ID | Date | No. | Max (°C) | Min (°C) | Range (°C) | Mean (°C) | SD (°C) | Cv (%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Day 1 | September 30, 2014 | 1225 | 26.58 | 25.16 | 1.42 | 25.89 | 0.36 | 1.371 |
| Day 2 | October 7, 2014 | 1225 | 27.70 | 25.98 | 1.72 | 27.18 | 0.4 | 1.468 |

doi:10.1371/journal.pone.0146589.t002

Table 3. Parameters of the Arrhenius Type Formula and Summary Statistics of the Soft Data.

| ID | CR | a | b | STP | STT | T̄ | tn–2,0.025 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Day 1 | 0.6058 | 0.0037 | 0.3497 | 0.65 | 6.38 | 25.89 | 1.761 |
| Day 2 | 0.5701 | 0.1331 | 0.1370 | 0.74 | 8.28 | 27.18 | 1.761 |

Note: The parameters in this table were calculated considering the 49-point measuring scheme, and they would need to be recalculated if a different sample size was used.
doi:10.1371/journal.pone.0146589.t003

Comparison between methods
This paper compared the results of the BME method and two Kriging methods (OK and Co-OK). The spatial estimates from the three methods were evaluated by several validation methods.

Kriging. Kriging is an unbiased linear estimation method used to characterize a physical attribute's spatial variation and generate attribute estimates at un-sampled locations. OK, the simplest and most widely used kind of Kriging, calculates the weights (relative contributions) of attribute samples surrounding each estimation point by means of the geostatistical variogram, and the unknown attribute values are then estimated as the linear combination of the weighted samples, subject to the condition that the sum of the weights is equal to 1, see Eq 3:

$$Z(V_0) = \sum_{i=1}^{n} \lambda_i Z(V_i), \quad \text{with} \quad \sum_{i=1}^{n} \lambda_i = 1 \tag{3}$$

In Eq 3, Z(V0) represents the estimated value at point V0, Z(Vi) represents the measured value at the ith of the n points around V0, and λi denotes the corresponding weight coefficient. Co-OK follows the same principle as OK. However, Co-OK considers more than one variable: in addition to the spatial relationship of the main variable itself, it considers the relationships between the main variable and all other variables to enable better predictions. Adding information about relevant variables while estimating the main variable can improve the estimates. Because we consider the spatial variability of soil C flux over time, adding the closely related variable of soil temperature can compensate for the insufficiency of CO2 efflux sampling and improve the accuracy of the estimation, as shown in Eq 4:

$$Z_1(V_0) = \sum_{i=1}^{n_1} \lambda_{1i} Z_1(V_{1i}) + \sum_{j=1}^{n_2} \lambda_{2j} Z_2(V_{2j}), \quad \text{with} \quad \sum_{i=1}^{n_1} \lambda_{1i} = 1 \quad \text{and} \quad \sum_{j=1}^{n_2} \lambda_{2j} = 1 \tag{4}$$
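As an illustration of the ordinary kriging estimator in Eq 3, the following Python sketch solves the standard OK system for the weights λi, with a Lagrange multiplier enforcing the sum-to-one constraint. It is a minimal sketch under stated assumptions, not the GS+ implementation used in this study: the semivariogram `gamma` is a placeholder exponential model with hypothetical parameters, and the sample coordinates and values are made up.

```python
import math

def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def distance(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def ok_weights(points, target, gamma):
    """Weights of Eq 3 from the OK system written in terms of semivariances."""
    n = len(points)
    matrix = [[gamma(distance(points[i], points[j])) for j in range(n)] + [1.0]
              for i in range(n)]
    matrix.append([1.0] * n + [0.0])  # constraint row: weights sum to 1
    rhs = [gamma(distance(p, target)) for p in points] + [1.0]
    solution = solve_linear(matrix, rhs)
    return solution[:n]  # the last entry is the Lagrange multiplier

def ok_estimate(points, values, target, gamma):
    weights = ok_weights(points, target, gamma)
    return sum(w * z for w, z in zip(weights, values))

def gamma(h):
    # Placeholder semivariogram (hypothetical): exponential model, unit sill.
    return 1.0 - math.exp(-h / 10.0)

points = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0), (5.0, 5.0)]  # made-up 5 m grid corners
values = [3.1, 3.6, 3.3, 4.0]                               # made-up efflux values
center_weights = ok_weights(points, (2.5, 2.5), gamma)
center_estimate = ok_estimate(points, values, (2.5, 2.5), gamma)
```

With the estimation point at the center of the square, symmetry gives four equal weights, so the estimate reduces to the arithmetic mean of the four samples. Co-OK (Eq 4) extends the same linear system with cross-variogram terms for the auxiliary variable, which this sketch does not attempt.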

Fig 1. Temperature–soil CO2 Efflux Scatter Plots (Tem–Rs), Fitting Relationship, and Probability Distribution. Plots (a) and (b) denote the aforementioned relationships for Day 1 and Day 2, respectively. The red line (reg in the legend) shows the relationship between temperature and soil CO2 efflux. The green lines (Plup and Pldown in the legend) indicate the prediction intervals at a confidence level of 95%. The blue line (pd in the legend) refers to the probability distribution of the soft data (estimated CO2 efflux corresponding to soil temperature). doi:10.1371/journal.pone.0146589.g001


In Eq 4, Z1(V0) is the estimate of the main variable Z1 at point V0, λ1i is the weight of the main variable Z1, and λ2j is the weight of the auxiliary variable Z2 (the second characteristic). The Kriging method studies the spatial relationships from point to point, and spatial variability is usually expressed with an experimental variogram. Variograms generally include auto-variograms and cross-variograms, and they are used to determine the spatial autocorrelation of the variable’s properties, as shown in Eq 5 [35]:

$$\hat{\gamma}(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} \left[ Z(X_i) - Z(X_i + h) \right]^2 \tag{5}$$

where $\hat{\gamma}(h)$ is the experimental semivariance at a separation distance h, Z(Xi) is the property value of the variable at the i-th point, and N(h) is the number of pairs of points separated by the distance h. In this paper, we added soil temperature as the auxiliary information. Cross-variograms can show the relationship between two variables, as seen in Eq 6 [36]:

$$\hat{\gamma}_{ZY}(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} \left[ Z(X_i) - Z(X_i + h) \right] \left[ Y(X_i) - Y(X_i + h) \right] \tag{6}$$

where $\hat{\gamma}_{ZY}(h)$ is the experimental cross-variogram at separation distance h, Z(Xi) is the main property value at the i-th point, Y(Xi) is the secondary property value at the i-th point, and N(h) is the number of pairs of points separated by distance h. Based on the coefficient of determination (R2) and squared residuals, we chose the Gaussian and spherical models as the optimal variogram models, as depicted by Eq 7 and Eq 8, respectively:

$$\gamma(h) = C_0 + C \left( 1 - \exp\left( -\frac{3h^2}{a^2} \right) \right) \tag{7}$$

$$\gamma(h) = C_0 + C \left[ \frac{3}{2} \left( \frac{h}{a} \right) - \frac{1}{2} \left( \frac{h}{a} \right)^3 \right] \tag{8}$$
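The experimental semivariogram of Eq 5 and the two model forms of Eqs 7 and 8 can be sketched in a few lines of Python. This is an illustrative sketch only: grouping point pairs into distance bins of width `lag_width` is an assumed convention (the study itself relied on GS+), the transect data are synthetic, and capping the spherical model at the sill beyond the range is the usual convention rather than something stated in Eq 8.

```python
import math

def experimental_semivariogram(coords, values, lag_width, n_lags):
    """Eq 5 with pairs grouped into bins [k * lag_width, (k + 1) * lag_width)."""
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            h = math.hypot(coords[i][0] - coords[j][0], coords[i][1] - coords[j][1])
            k = int(h // lag_width)
            if k < n_lags:
                sums[k] += (values[i] - values[j]) ** 2
                counts[k] += 1
    # None marks bins that contain no point pairs.
    return [s / (2 * c) if c else None for s, c in zip(sums, counts)]

def gaussian_model(h, nugget, partial_sill, a):
    """Eq 7: C0 + C * (1 - exp(-3 h^2 / a^2))."""
    return nugget + partial_sill * (1.0 - math.exp(-3.0 * h ** 2 / a ** 2))

def spherical_model(h, nugget, partial_sill, a):
    """Eq 8 for h <= a; held at the sill C0 + C beyond the range a."""
    if h >= a:
        return nugget + partial_sill
    ratio = h / a
    return nugget + partial_sill * (1.5 * ratio - 0.5 * ratio ** 3)

# Synthetic transect: four points 1 m apart with a linear trend in the values.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
values = [0.0, 1.0, 2.0, 3.0]
semivariances = experimental_semivariogram(coords, values, lag_width=1.0, n_lags=4)
```

In practice the model parameters (nugget, partial sill, range) would be fitted to the binned semivariances, for example by least squares.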

In Eqs 7 and 8, γ(h) is the semivariance, C0 indicates the nugget, C represents the structural variability, C0 + C represents the sill variance, and a denotes the correlation range in geostatistics.

Bayesian Maximum Entropy. BME is a spatiotemporal analysis and mapping method that combines information theory with Bayesian statistics [30]. Compared with classical geostatistics (kriging), BME can take into account general prior knowledge and site-specific knowledge, including soft (uncertain) data with known error and uncertainty in addition to hard (exact) data, to improve the accuracy of spatiotemporal analysis. The meaning of soft data is very flexible; it may denote sampled data, historical data, rough measurement data, expert knowledge, and/or model fitting data. Because the BME method uses comprehensive information and expresses uncertainty probabilistically to the extent possible, it can present more realistic results in the spatiotemporal analysis of natural attributes. The BME process can be divided into three stages: prior, meta-prior, and posterior (Fig 2). The prior stage mainly uses general knowledge G to calculate the prior joint probability density function fG(χmap) using the Shannon information measure. χmap is composed of the hard data χhard, the soft data χsoft, and the unknown value xk at the estimation point (the point to be estimated). The expected information is given by Eq 9, where gα(χmap) contains the general statistical information about χmap, such as the mean and covariance. The Shannon information measure is used to maximize the entropy under the relevant constraints of gα(χmap). Eq 10 shows the function of the Lagrange multipliers method (LMM) for maximizing the expected information by introducing the Lagrange multiplier μα and the expectation of gα(χmap).

$$\mathrm{Info}[\chi_{map}] = -\int d\chi_{map}\, f_G(\chi_{map}) \log f_G(\chi_{map}) \tag{9}$$

$$M[f_G(\chi_{map})] = -\int d\chi_{map}\, f_G(\chi_{map}) \log f_G(\chi_{map}) - \sum_{\alpha}^{N} \mu_\alpha \left\{ \int g_\alpha(\chi_{map})\, f_G(\chi_{map})\, d\chi_{map} - E[g_\alpha(\chi_{map})] \right\} \tag{10}$$

Fig 2. Flowchart of the BME Process. doi:10.1371/journal.pone.0146589.g002

At the meta-prior stage, specific knowledge is added to the calculation, including the measured hard data χhard and various forms of soft data χsoft. In this paper, the soft data fS(χsoft) are of the probability type. At the posterior stage, using Bayes’ theorem, the posterior probability density function fK(xk|χdata) of the estimation points is calculated, taking into account the conditions of the specific knowledge, as shown in Eqs 11 and 12:

$$f_K(x_k \mid \chi_{data}) = A^{-1} \int d\chi_{soft}\, f_S(\chi_{soft})\, f_G(\chi_{map}) \tag{11}$$

$$A = \int d\chi_{soft}\, f_S(\chi_{soft})\, f_G(\chi_{data}) \tag{12}$$
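To make Eqs 11 and 12 concrete, the sketch below evaluates the posterior density numerically for the smallest possible case: one soft datum and one estimation point, with no hard data. It is a toy illustration, not the BMElib implementation. The prior fG is assumed to be bivariate Gaussian (the maximum-entropy density when only means and covariances are constrained), the soft pdf fS is assumed to be Gaussian, the integrals are replaced by sums on a regular grid, and every numerical value is hypothetical.

```python
import math

def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def bivariate_normal_pdf(xs, xk, mean_s, mean_k, sd_s, sd_k, rho):
    zs = (xs - mean_s) / sd_s
    zk = (xk - mean_k) / sd_k
    q = (zs * zs - 2 * rho * zs * zk + zk * zk) / (1 - rho * rho)
    norm = 2 * math.pi * sd_s * sd_k * math.sqrt(1 - rho * rho)
    return math.exp(-0.5 * q) / norm

def bme_posterior(xk_grid, xs_grid, soft_pdf, prior_pdf):
    """Posterior density of x_k on xk_grid, following Eqs 11 and 12."""
    dxs = xs_grid[1] - xs_grid[0]
    dxk = xk_grid[1] - xk_grid[0]
    # Numerator of Eq 11: integrate f_S(x_soft) * f_G(x_soft, x_k) over x_soft.
    unnormalized = [
        sum(soft_pdf(xs) * prior_pdf(xs, xk) for xs in xs_grid) * dxs
        for xk in xk_grid
    ]
    # Eq 12: with no hard data, A is the integral of the numerator over x_k.
    a_const = sum(unnormalized) * dxk
    return [u / a_const for u in unnormalized]

# Hypothetical setup: efflux values between 0 and 7 on a 0.025-wide grid.
grid = [i * 0.025 for i in range(281)]

def soft_pdf(xs):
    # Soft datum: efflux inferred from temperature, Gaussian around 4.0.
    return normal_pdf(xs, 4.0, 0.3)

def prior_pdf(xs, xk):
    # General knowledge: common mean 3.5, SD 0.6, spatial correlation 0.8.
    return bivariate_normal_pdf(xs, xk, 3.5, 3.5, 0.6, 0.6, 0.8)

posterior = bme_posterior(grid, grid, soft_pdf, prior_pdf)
dx = grid[1] - grid[0]
posterior_mean = sum(x * p for x, p in zip(grid, posterior)) * dx
```

For this all-Gaussian toy case the posterior mean can also be worked out in closed form (3.82), which gives a convenient check on the numerical integration. With a Student t or interval-type soft pdf, only `soft_pdf` would change.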

After computing the posterior probability density function fK(xk|χdata), we can obtain the attribute values at the estimation points by way of the posterior expectation or the maximum posterior probability, as shown in Eqs 13 and 14, respectively:

$$\hat{x}_k = \int x_k\, f_K(x_k \mid \chi_{data})\, dx_k \tag{13}$$

$$\hat{x}_k = \arg\max_{x_k} f_K(x_k \mid \chi_{data}) \tag{14}$$

where $\hat{x}_k$ denotes the estimated value of xk. We choose Eq 9 as the final estimation method in this paper.

Methods of validation. In order to evaluate the performance of the three methods (OK, Co-OK, and BME), about 10% of the collected data (five points in total, selected to avoid adjacent sampling points and points on the boundary) were employed as data for cross-validation. Three statistical indicators, namely the root mean squared error (RMSE), the correlation coefficient (CR), and the average deviation (mean bias), were used to quantify the accuracy of the estimation results, as shown in Eqs 15, 16 and 17:

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{k=1}^{n} (\hat{x}_k - x_k)^2}{n}} \tag{15}$$

$$\mathrm{CR} = \frac{\sum_{k=1}^{n} (\hat{x}_k - \bar{\hat{x}})(x_k - \bar{x})}{\sqrt{\sum_{k=1}^{n} (\hat{x}_k - \bar{\hat{x}})^2}\, \sqrt{\sum_{k=1}^{n} (x_k - \bar{x})^2}} \tag{16}$$

$$\mathrm{Bias} = \frac{\sum_{k=1}^{n} (\hat{x}_k - x_k)}{n} \tag{17}$$
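The three validation indicators in Eqs 15 to 17 translate directly into code. The sketch below assumes the sign convention "estimate minus observation" for the bias; the function names and the sample numbers are hypothetical and are not results from this study.

```python
import math

def rmse(estimates, observations):
    """Eq 15: root mean squared error."""
    n = len(observations)
    return math.sqrt(sum((e - o) ** 2 for e, o in zip(estimates, observations)) / n)

def correlation(estimates, observations):
    """Eq 16: Pearson correlation between estimates and observations."""
    n = len(observations)
    e_bar = sum(estimates) / n
    o_bar = sum(observations) / n
    cov = sum((e - e_bar) * (o - o_bar) for e, o in zip(estimates, observations))
    var_e = sum((e - e_bar) ** 2 for e in estimates)
    var_o = sum((o - o_bar) ** 2 for o in observations)
    return cov / (math.sqrt(var_e) * math.sqrt(var_o))

def mean_bias(estimates, observations):
    """Eq 17: average of (estimate - observation)."""
    n = len(observations)
    return sum(e - o for e, o in zip(estimates, observations)) / n

# Made-up values for five held-out validation points.
observed = [3.2, 3.9, 2.8, 4.4, 3.5]
estimated = [3.0, 4.1, 3.1, 4.0, 3.6]
scores = (rmse(estimated, observed),
          correlation(estimated, observed),
          mean_bias(estimated, observed))
```

A negative bias under this convention means that the interpolation underestimates the measured efflux on average.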

We applied GS+ 9.0 spatial analysis software to calculate the semivariance and to run the OK and Co-OK methods [37]. The BMElib library (BMEGUI3.0 software) was used for the BME method [31], and Matlab 8.4 was employed for basic processing and mapping.

Results
Comparison of results of the OK, Co-OK, and BME methods
The mean values of soil respiration in the study area were 3.476 and 5.81 μmg/m2s on Day 1 and Day 2, with variation ranges of 2.698 and 3.019 μmg/m2s, respectively, and the coefficients of variation were 17.332% and 8.805%, respectively, which indicates that the spatial variability of soil respiration in the study area needs to be considered. Table 4 shows the parameters of the variogram models for the auto-variograms of soil respiration and the cross-variograms of soil respiration and temperature in the study area during the observation periods.

Table 4. Models and parameters of the auto-variograms and cross-variograms fitted to the soil respiration and temperature.

| ID | Variable | Model | C0 | \|C0+C\| | C0/\|C0+C\| | A (m) | R2 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Day 1 | FCO2 | Gaussian | 0.035 | 0.436 | 0.080 | 18.05 | 0.706 |
| Day 1 | FCO2×Ts | Spherical | 0.181 | 0.434 | 0.417 | 29.88 | 1.761 |
| Day 2 | FCO2 | Gaussian | 0.031 | 0.442 | 0.070 | 32.11 | 0.915 |
| Day 2 | FCO2×Ts | Spherical | 0.232 | 0.542 | 0.428 | 30.59 | 0.887 |

Note: FCO2 is the CO2 flux and Ts is the soil temperature.
doi:10.1371/journal.pone.0146589.t004

The range values of soil respiration differed between the two observation days, whereas the ranges of the cross-variograms of soil respiration and temperature were similar (Fig 3), and soil respiration presented strong spatial dependence, with the C0/(C0 + C) ratio