Improving Estimation of Specific Surface Area by Artificial Neural Network Ensembles Using Fractal and Particle Size Distribution Curve Parameters as Predictors
Hossein Bayat, Sabit Ersahin & Estela N. Hepper

Environmental Modeling & Assessment ISSN 1420-2026 Volume 18 Number 5 Environ Model Assess (2013) 18:605-614 DOI 10.1007/s10666-013-9366-2

Received: 7 April 2012 / Accepted: 11 March 2013 / Published online: 26 March 2013 © Springer Science+Business Media Dordrecht 2013

Abstract Specific surface area (SSA) is one of the principal soil properties used in modeling soil processes. In this study, artificial neural network (ANN) ensembles were evaluated for predicting SSA. The complete soil particle-size distribution was estimated from sand, silt, and clay fractions using the model of Skaggs et al., and then the particle-size distribution curve parameters (PSDCPs) and fractal parameters were calculated. The PSDCPs were used to predict 20 particle-size classes for each soil sample's particle-size distribution. Fractal parameters were calculated by the model of Bird et al. In addition, total soil-specific surface area (TSS) was calculated from these 20 size classes. Pedotransfer functions were developed for SSA and TSS using ANN ensembles from 63 SSA measurements taken from the literature. Fractal parameters, PSDCPs, and some other soil properties were used to predict SSA and TSS. Introducing fractal parameters and PSDCPs improved the SSA estimations by 12.5 and 11.1 %, respectively. The improvements were even greater for the TSS estimations (27.7 and 27.0 %, respectively). Fractal parameters used as estimators described 44 and 92.8 % of the variation in SSA and TSS, respectively, while PSDCPs explained 42 and 6.6 % of the variation in SSA and TSS, respectively. The results suggest that fractal parameters and PSDCPs can be successfully used as predictors in ANN ensembles to predict SSA and TSS.

Project supported by the Bu Ali Sina University, Hamadan, Iran.
H. Bayat (*), Department of Soil Science, Faculty of Agriculture, Bu Ali Sina University, Hamadan, Iran; e-mail: [email protected]
S. Ersahin, Department of Forest Engineering, Faculty of Forestry, Çankırı Karatekin University, 18100 Çankırı, Turkey
E. N. Hepper, Facultad de Agronomía, UNLPam, cc 300, 6300 Santa Rosa, Argentina

Keywords Artificial neural networks · Ensemble · Fractal parameters · Particle-size distribution · Specific surface area · Prediction

1 Introduction
Specific surface area (SSA) is one of the principal soil properties for agricultural, industrial, and environmental applications that are related to the physical and chemical properties of a porous medium [1]. Previous studies have indicated that SSA is closely related to, and has a determining influence on, many soil properties such as grain-size distribution (i.e., the clay-size fraction), mineralogical composition [2], consistency limits [3], swelling and shrinkage characteristics [4], compressibility characteristics [5], cation exchange capacity, clay content [6], frost heave [7], water retention characteristics [8], water movement, soil aggregation [9], activity [2], sorption and desorption characteristics [10], and angle of the internal friction of soils [11]. Moreover, processes such as contaminant accumulation, nutrient dynamics, and chemical transport are greatly influenced by SSA [12].

Unlike soil properties such as organic matter, pH, and particle-size distribution (PSD), SSA is not measured routinely [13]; therefore, surface area data are unavailable in most databases. In addition, direct measurement techniques for SSA are time consuming, labor intensive, expensive, and technically difficult, and they require skilled personnel [14, 15]. Moreover, methods to measure SSA are diverse and their results are generally not comparable [16], because of errors arising from either the inherent limitations of the instrument employed or the basic assumptions of the mathematical models used to compute SSA. Hence, a simple, economical, and quick method that predicts reliable values of soil SSA is needed.

In the past 15 years, many pedotransfer functions (PTFs) have been developed using artificial neural networks (ANNs), and they have performed at least as well as other techniques and overcome the problem of introducing statistical uncertainties into PTFs [17, 18]. Although efforts have been made to develop PTFs by regression methods, to our knowledge, no research has evaluated the performance of ANNs, especially ANN ensembles, in estimating SSA from readily available soil data. This motivated us to use the ensemble method, which combines a number of individual ANNs. SSA can be predicted by developing PTFs that relate easily measured soil survey variables to SSA. Nevertheless, finding new input parameters that improve the estimation of SSA without requiring additional time or cost remains a challenge.

Fractal theory has been widely used to characterize soil properties [19]. Some authors [20] suggested that the fractal dimension of the PSD can be useful for quantifying the relationships between soil texture and related soil properties and processes. Ersahin et al. [21] used the fractal dimension to predict SSA; however, no one has investigated the efficiency of fractal parameters (FPs) and conventional PSD curve parameters (PSDCPs) in predicting SSA by ANNs. Therefore, it would be practical and economical to use the ANN ensemble procedure together with calculated FPs and PSDCPs to predict soil SSA. More importantly, these models should improve the accuracy of predicted SSA data and aid in populating databases, which will benefit all users of soil survey data for agricultural, industrial, and environmental applications.
The objectives of this study were: (1) to calculate the FPs and PSDCPs using limited soil texture data and to evaluate their relationship with measured soil SSA and calculated total soil-specific surface area (TSS), (2) to develop the PTFs by ANN ensembles to evaluate the utility of using calculated FPs and PSDCPs to improve SSA and TSS predictions, and (3) to compare the efficiency of FPs and PSDCPs in the prediction of soil SSA and TSS.

2.1 Datasets
We developed PTFs to predict measured soil SSA and calculated TSS using 17, 22, and 24 SSA data taken from Aringhieri et al. [1], Ersahin et al. [21], and Hepper et al. [15], respectively. Basic soil properties (sand, silt, clay, and organic matter contents) were used along with cation exchange capacity and SSA to develop the PTFs. Ersahin et al. [21] measured the SSA of the soils by analyzing the retention of ethylene glycol monoethyl ether, a polar molecule that forms only one layer of molecules on the particle surfaces [22]. Hepper et al. [15] measured the SSA with ethylene glycol monoethyl ether after sieving the samples ( […] 0 (2)

where P(r) is the mass fraction of soil particles with radii less than r, r0 is the lower bound on radii for which the model applies, and c and u are model parameters (hereafter, c and u are referred to as the PSDCPs) that can be calculated using the following equations:

$$ u = -v^{1-\beta}\, w^{\beta}, \qquad c = \alpha \ln\frac{v}{w} \qquad (3) $$

$$ v = \ln\frac{\dfrac{1}{P(r_1)} - 1}{\dfrac{1}{P(r_0)} - 1}, \qquad \alpha = \frac{1}{\ln\dfrac{r_1 - r_0}{r_2 - r_0}} \qquad (4) $$

$$ w = \ln\frac{\dfrac{1}{P(r_2)} - 1}{\dfrac{1}{P(r_0)} - 1}, \qquad \beta = \alpha \ln\frac{r_1 - r_0}{r_0} \qquad (5) $$

$$ 1 > P(r_2) > P(r_1) > P(r_0) > 0, \qquad r_2 > r_1 > r_0 > 0 $$
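The parameter equations (Eqs. 3–5) can be sketched in a few lines of Python. The model equation itself did not survive in the text above, so the logistic-type form used in `skaggs_cumulative`, P(r) = 1 / (1 + (1/P(r0) − 1) exp(−u R^c)) with R = (r − r0)/r0, is an assumption chosen because it is consistent with Eqs. 3–5. The function names and the example mass fractions are hypothetical, and the default radii are the values quoted in the text.

```python
import math


def skaggs_parameters(p0, p1, p2, r0=1.0, r1=25.0, r2=999.0):
    """Return (c, u) from three cumulative mass fractions, following Eqs. 3-5."""
    if not (1.0 > p2 > p1 > p0 > 0.0 and r2 > r1 > r0 > 0.0):
        raise ValueError("need 1 > P(r2) > P(r1) > P(r0) > 0 and r2 > r1 > r0 > 0")
    base = 1.0 / p0 - 1.0
    v = math.log((1.0 / p1 - 1.0) / base)   # negative, because P(r1) > P(r0)
    w = math.log((1.0 / p2 - 1.0) / base)   # negative, because P(r2) > P(r0)
    alpha = 1.0 / math.log((r1 - r0) / (r2 - r0))
    c = alpha * math.log(v / w)
    # u = -v^(1-beta) * w^beta equals -v / ((r1 - r0) / r0)^c algebraically;
    # this form avoids fractional powers of the negative numbers v and w.
    u = -v / ((r1 - r0) / r0) ** c
    return c, u


def skaggs_cumulative(r, p0, c, u, r0=1.0):
    """Assumed model form: cumulative mass fraction of particles with radius < r."""
    scaled = (r - r0) / r0
    return 1.0 / (1.0 + (1.0 / p0 - 1.0) * math.exp(-u * scaled ** c))


# Hypothetical sample: 25 % clay, 40 % silt, P(r2) slightly below 1.
p0, p1, p2 = 0.25, 0.65, 0.99
c, u = skaggs_parameters(p0, p1, p2)
radii = [1.0, 2.0, 5.0, 10.0, 25.0, 50.0, 100.0, 250.0, 500.0, 999.0]
curve = [skaggs_cumulative(r, p0, c, u) for r in radii]
```

By construction the curve passes through the three input fractions at r0, r1, and r2, which gives a quick consistency check on any implementation.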

To implement the method described in the above section, we must select values for r0, r1, and r2. We used r0 = 1 μm, r1 = 25 μm, and r2 = 999 μm. According to the USDA particle-size classification system, these radii specify that P(r0 = 1 μm) is the clay mass fraction, P(r1 = 25 μm) is the clay+silt fraction, and P(r2 = 999 μm) is the clay+silt+sand fraction. These were the only data available in the datasets used in this study. The radius r2 = 999 μm used in the Skaggs et al. [25] model is not exactly the 1,000 μm specified by the USDA as the upper bound of the clay+silt+sand mass fraction. However, because 999 μm is very close to the upper bound of the sand fraction (1,000 μm), we believe that this assumption causes little error in the fractal scaling. The Pore–Solid Fractal model of Bird et al. [26] has been applied to PSD data:

$$ M_s(d \le d_i) = c_B\, d_i^{\,3-D} \qquad (6) $$

where Ms(d ≤ di) is the cumulative mass of particles below an upper limit di, D is the fractal dimension of the PSD, and cB is a composite scaling constant. Hereafter, D and cB will be referred to as fractal parameters (FPs).

2.3 Calculation of Soil's TSS
TSS can be predicted by calculations based on the sizes, shapes, and relative quantity of different types of soil particles [27]. We used predicted (extended) PSD classes to calculate TSS of soil mineral fractions. Sand and silt particles are assumed to be spherical and clay particles are assumed to be platy. For a sphere of radius r, the ratio of surface to mass is

$$ SS_i = \frac{3}{\rho_s r_i} \qquad (7) $$

For a platy particle of thickness x, the ratio of surface to mass is

$$ SS_j \approx \frac{2}{\rho_s x_j} \qquad (8) $$

Here, SSi is the SSA of the ith class of spherical particles and SSj is the SSA of the jth class of platy particles, ρs is the particle density, ri is the mean radius of the ith class of spherical particles, and xj is the mean thickness of the jth class of platy particles. The approximate TSS can be calculated by the summation equation:

$$ TSS\ (\mathrm{m^2\,g^{-1}}) = \sum_{i=1}^{n} c_i\, SS_i + \sum_{j=1}^{m} c_j\, SS_j \qquad (9) $$

Here, ci and cj are the mass fractions of particles of average radius ri and average thickness xj, respectively. The variables n and m are the numbers of classes of spherical and platy particles, respectively. Measured and calculated values of specific surface area (SSA and TSS) were regressed against each of the FPs (D and cB) to determine the relationship between them. To assess the nature of the relationship between SSA and/or TSS and the FPs, different types of regression equations were considered.
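Eqs. 6–9 can be illustrated with a short Python sketch. Two things here are assumptions rather than details given in the text: the fractal parameters are obtained by ordinary least squares on the log-transformed form of Eq. 6, log Ms = log cB + (3 − D) log d, and the particle density defaults to 2.65 g cm⁻³. With ρs in g cm⁻³ and lengths in μm, Eqs. 7 and 8 yield m² g⁻¹ directly. All function names are hypothetical.

```python
import math


def fit_fractal_parameters(sizes, cumulative_mass):
    """Estimate (D, cB) of Eq. 6 by least squares on log-log data (assumed method)."""
    xs = [math.log(d) for d in sizes]
    ys = [math.log(m) for m in cumulative_mass]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # slope equals 3 - D
    intercept = mean_y - slope * mean_x    # intercept equals log cB
    return 3.0 - slope, math.exp(intercept)


def total_specific_surface(spheres, plates, rho_s=2.65):
    """Eq. 9. spheres: (mass fraction, mean radius in um) pairs;
    plates: (mass fraction, mean thickness in um) pairs; rho_s in g cm^-3."""
    tss = sum(c * 3.0 / (rho_s * r) for c, r in spheres)    # Eq. 7 per class
    tss += sum(c * 2.0 / (rho_s * x) for c, x in plates)    # Eq. 8 per class
    return tss


# Synthetic check: data generated from D = 2.8 and cB = 0.1 are recovered.
sizes = [2.0, 10.0, 50.0, 250.0, 1000.0]
masses = [0.1 * d ** (3.0 - 2.8) for d in sizes]
d_fit, cb_fit = fit_fractal_parameters(sizes, masses)
```

In practice the size classes passed to `total_specific_surface` would be the 20 extended PSD classes mentioned in the text; the class boundaries are not listed there, so none are assumed here.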

2.4 ANN Ensembles
The 63 data taken from Aringhieri et al. [1], Ersahin et al. [21], and Hepper et al. [15] were partitioned randomly into three subsets: a training set of 37 data, a cross-validation set of ten data, and a testing set of 16 data. The PTFs were developed using ANN ensembles. We selected input data randomly in all ANN models. For every PTF, 80 models were developed using two types of ANNs, feed-forward multilayer perceptrons and generalized regression neural networks, in order to predict the soil SSA and TSS. The performances of the two types of ANNs were evaluated; each type was run with one hidden layer and 3 to 12 hidden neurons. In developing individual models, we followed the procedure of Minasny and McBratney [28], who developed the “neuropath” software to create PTFs. Therefore, to develop each of the 80 models for prediction of the soil SSA and TSS, a combination of ANN and the bootstrap method [29] was applied. The input data were resampled randomly 50 times to obtain 50 bootstrap datasets of the same size as the training dataset. For each bootstrap dataset, a network was trained and the soil SSA and TSS were predicted. The mean of the 50 predictions was taken as the final estimate of an individual model [30]. Several transfer functions, including tanh, exponential, logistic, identity, and sine, were tested in the hidden and output layers to achieve the greatest accuracy and reliability. According to Baker and Ellison [31], the root mean square error (RMSE) tends to a steady state when the number of ANN members in the PTF is greater than 5 or 6. We evaluated the effect of the number of ANN ensemble members on the RMSE of the ensemble models and, to be conservative, selected the 20 most successful ANN models from the 80 developed to create an ANN ensemble model.
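The bootstrap step described above can be sketched as follows. To keep the example self-contained, a one-variable least-squares line stands in for the neural network; this substitution is an assumption made purely for illustration and is not the learner used in the study. The function names and the toy data are hypothetical, while the 50 resamples of training-set size follow the text.

```python
import random
import statistics


def fit_line(xs, ys):
    """Stand-in learner (NOT an ANN): least-squares fit of y = a + b * x."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        return mean_y, 0.0   # degenerate resample: fall back to the mean
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y - b * mean_x, b


def bootstrap_predict(train_x, train_y, new_x, n_boot=50, seed=0):
    """Mean prediction over n_boot models, each trained on one bootstrap resample."""
    rng = random.Random(seed)
    n = len(train_x)
    predictions = []
    for _ in range(n_boot):
        # Draw n indices with replacement: a resample as large as the training set.
        idx = [rng.randrange(n) for _ in range(n)]
        a, b = fit_line([train_x[i] for i in idx], [train_y[i] for i in idx])
        predictions.append(a + b * new_x)
    return statistics.fmean(predictions)


# Toy data lying exactly on y = 2 + 3x, so every resampled fit agrees.
toy_x = [float(k) for k in range(10)]
toy_y = [2.0 + 3.0 * x for x in toy_x]
estimate = bootstrap_predict(toy_x, toy_y, new_x=5.0)
```

Replacing `fit_line` with a network trainer would give one individual model of the kind described; repeating that for different architectures would give the pool from which ensemble members are selected.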
We combined the predictions of the individual ANN members by simple averaging (all members in the ensemble are assigned an equal weight) and by weighted averaging, with weights based on the testing error. In weighted averaging, ANNs with the smallest errors are given more weight. The weighted average X is:

$$ X = \sum_{i=1}^{N} \left[ \frac{\sum_{j=1}^{N} E_j - E_i}{(N - 1) \sum_{j=1}^{N} E_j} \right] P_i \qquad (10) $$

where P1, P2, …, Pi, …, PN (N is the number of ensemble members) are independent, unbiased estimates of SSA or TSS, and Ei is the sum of squared errors at optimization of the ith ANN.
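A minimal sketch of the two combination rules, using the weights of Eq. 10, may help. The function names and the numbers are hypothetical. The weights sum to one, and a member with a smaller error Ei receives a larger weight.

```python
def ensemble_weights(errors):
    """Weights of Eq. 10: (sum(E) - E_i) / ((N - 1) * sum(E))."""
    n = len(errors)
    if n < 2:
        raise ValueError("Eq. 10 needs at least two ensemble members")
    total = sum(errors)
    return [(total - e) / ((n - 1) * total) for e in errors]


def simple_average(predictions):
    """Equal-weight combination of member predictions."""
    return sum(predictions) / len(predictions)


def weighted_average(predictions, errors):
    """Error-weighted combination of member predictions (Eq. 10)."""
    weights = ensemble_weights(errors)
    return sum(w * p for w, p in zip(weights, predictions))


# Hypothetical member predictions and their sums of squared errors.
member_predictions = [10.0, 20.0, 30.0]
member_errors = [1.0, 2.0, 3.0]
weights = ensemble_weights(member_errors)
combined = weighted_average(member_predictions, member_errors)
```

With these numbers the weights are 5/12, 4/12, and 3/12, so the weighted average is pulled toward the prediction of the most accurate member relative to the simple average of 20.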

2.5 Development of Pedotransfer Functions
Since D, SSA, and TSS had non-normal distributions, 30D–2, log SSA, and log TSS were used to normalize them, respectively. All variables were standardized to have zero mean and unit variance. To predict SSA, six PTFs were developed. PTF1 was based on the basic soil properties (i.e., sand, silt, clay, and organic matter contents). PTF2 used the FPs (D and cB) as additional inputs. To build PTF3, the PSDCPs (i.e., c and u in Eq. 3) were introduced to the model along with the basic soil properties. To develop PTF4, only FPs were used as inputs, and to develop PTF5, only PSDCPs were used as inputs. We used the soil cation exchange capacity as an input along with the basic soil properties to develop PTF6. The same procedure was followed to develop another six PTFs to predict TSS. The performances of all the PTFs were evaluated and compared with each other.
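The six input sets and the standardization step can be written down compactly. In this sketch the variable labels, the sample values, the base-10 logarithm, and the use of the sample standard deviation are all assumptions, since the text specifies none of them; the transform applied to D is left out because its exact form is unclear in the text.

```python
import math
import statistics

BASIC = ["sand", "silt", "clay", "organic_matter"]

# Input sets for the six PTFs as described in the text (labels are hypothetical).
PTF_INPUTS = {
    "PTF1": BASIC,
    "PTF2": BASIC + ["D", "cB"],
    "PTF3": BASIC + ["c", "u"],
    "PTF4": ["D", "cB"],
    "PTF5": ["c", "u"],
    "PTF6": BASIC + ["cec"],
}


def standardize(values):
    """Rescale to zero mean and unit variance (sample standard deviation assumed)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Hypothetical SSA values in m2 g-1, log-transformed and then standardized.
example_ssa = [18.0, 60.0, 130.0, 250.0, 524.0]
z_log_ssa = standardize([math.log10(v) for v in example_ssa])
```

The same mean and standard deviation computed on the training set would normally be reused for the cross-validation and testing sets, although the text does not say how this was handled.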

2.6 Sensitivity Analyses
According to Donigian and Rao [32], sensitivity analysis is the “degree to which the model result is affected by changes in a selected model input.” The sensitivity coefficient of an input variable can be calculated by making small changes in that input variable while keeping all the other inputs constant and then dividing the change in the output variable by the change in the input variable [33]. The basic idea is that the inputs of the network are shifted slightly and the corresponding change in the output is reported either as a percentage or as a raw difference [34]. The response of the output to a change of one standard deviation in an input is a good measure in sensitivity analysis. Consequently, the sensitivity coefficient of the output variables (SSA and/or TSS) to a given input variable was approximated by varying the specified input variable within the range of mean ± standard deviation while keeping all the other input variables constant, and then dividing the resulting standard deviation of the output variable by the standard deviation of that specified input variable [34]. The sensitivity analyses were done to determine the relative importance of the input variables, and all the changes in the outputs were reported as percentages.

2.7 Evaluation Criteria
Three criteria, namely the Akaike information criterion (AIC) [35], RMSE, and relative improvement (RI), were used to evaluate the reliability of the PTFs. We conducted the Morgan–Granger–Newbold (MGN) test (Eq. 11) [36] to determine whether the differences in the corresponding indices for the various PTFs (1–6) are significant and whether they can be regarded as an improvement from one PTF to the next. The equation applied is the following:

$$ MGN = \frac{\rho_{sd}}{\sqrt{\dfrac{1 - \rho_{sd}^{2}}{N - 1}}} \qquad (11) $$

where N is the number of samples; ρsd is the correlation coefficient between s_t = e_{1,t} + e_{2,t} and d_t = e_{1,t} − e_{2,t}; e_{1,t} and e_{2,t} are the forecast errors of the first and second competing models, respectively; and (N − 1) is the number of degrees of freedom. If the forecasts are equally accurate, then ρsd will be zero and, consequently, the MGN value will be zero. The greater the difference in accuracy between the forecasts of the two competing models, the greater the MGN value.
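The sensitivity coefficient and the MGN statistic (Eq. 11) can both be sketched with the standard library. Two choices below are assumptions, not details from the text: the inputs not under study are held at their means, and the "standard deviation of the input" is taken over the evenly spaced perturbed values. The function names, the toy model, and the error series are hypothetical.

```python
import math
import statistics


def sensitivity_coefficient(model, means, sds, index, steps=21):
    """SD of the model output divided by SD of one input varied over mean +/- SD."""
    low = means[index] - sds[index]
    high = means[index] + sds[index]
    grid = [low + (high - low) * k / (steps - 1) for k in range(steps)]
    outputs = []
    for value in grid:
        inputs = list(means)   # other inputs held at their means (assumption)
        inputs[index] = value
        outputs.append(model(inputs))
    return statistics.stdev(outputs) / statistics.stdev(grid)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def mgn_statistic(errors_1, errors_2):
    """Eq. 11; undefined when the correlation is exactly +1 or -1."""
    s = [a + b for a, b in zip(errors_1, errors_2)]
    d = [a - b for a, b in zip(errors_1, errors_2)]
    rho = pearson(s, d)
    n = len(errors_1)
    return rho / math.sqrt((1.0 - rho ** 2) / (n - 1))


def toy_model(x):
    """Hypothetical linear model used only to exercise the functions."""
    return 2.0 * x[0] + 5.0 * x[1]


coefficient = sensitivity_coefficient(toy_model, [1.0, 2.0], [0.5, 0.1], index=0)
mgn_equal = mgn_statistic([1.0, -1.0, 1.0, -1.0], [1.0, 1.0, -1.0, -1.0])
mgn_unequal = mgn_statistic([2.0, -2.0, 2.0, -2.0], [1.0, 1.0, -1.0, -1.0])
```

For a linear model the sensitivity coefficient reduces to the absolute slope of the varied input, and two error series with equal variance give an MGN value of zero, in line with the description above.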

3 Results and Discussion
Soil samples used in this study varied widely owing to differences in land use, parent material, and climate (Table 1). The distribution of soil textures in the USDA textural triangle is shown in Fig. 1.

Table 1 Descriptive statistics of soils used for training and testing

| Dataset | Statistic | Sand (%) | Silt (%) | Clay (%) | CEC (cmolc kg⁻¹) | Organic matter (%) | c | u | cB | D | SSA (m² g⁻¹) | TSS (m² g⁻¹) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Train | Mean | 32.7 | 39.1 | 28.3 | 21.1 | 2.1 | 0.42 | 0.65 | 13.70 | 2.80 | 130 | 0.37 |
| Train | SD | 22.4 | 15.0 | 18.9 | 15.1 | 2.3 | 0.08 | 0.61 | 8.90 | 0.12 | 101 | 0.18 |
| Train | Min | 0.2 | 15.0 | 2.3 | 4.1 | 0.1 | 0.17 | 0.11 | 0.78 | 2.42 | 18 | 0.06 |
| Train | Max | 80.8 | 84.5 | 73.0 | 78.5 | 15.3 | 0.64 | 4.32 | 31.72 | 2.96 | 524 | 0.83 |
| Test | Mean | 37.5 | 38.0 | 24.4 | 22.2 | 4.9 | 0.43 | 0.56 | 11.54 | 2.76 | 120 | 0.32 |
| Test | SD | 23.8 | 15.1 | 16.8 | 10.9 | 8.9 | 0.08 | 0.29 | 7.81 | 0.15 | 74 | 0.16 |
| Test | Min | 9.0 | 9.7 | 2.4 | 6.2 | 0.1 | 0.33 | 0.14 | 0.58 | 2.39 | 25 | 0.05 |
| Test | Max | 86.3 | 57.3 | 54.0 | 43.4 | 37.6 | 0.62 | 1.06 | 24.39 | 2.92 | 325 | 0.56 |

CEC cation exchange capacity, c and u coefficients of PSD curve model, D fractal dimension, cB constant of fractal model, SSA measured specific surface area, TSS calculated specific surface area

SSA ranged from 18 to 524 m² g⁻¹. The TSS values ranged from 0.05 to 0.83 m² g⁻¹, with means of 0.37 and 0.32 m² g⁻¹ for the training and testing data, respectively. The values of TSS are substantially lower than those of SSA (Table 1). This shows that SSA was underpredicted when it was calculated from the PSD alone using the Skaggs et al. model. This result was expected to some extent, since several factors that influence SSA, such as particle shapes [27], mineralogical composition [2], and soil organic matter [37], were not included in our calculations of TSS. The relatively strong relationship (R² = 0.705) between TSS and SSA (Fig. 2) is promising. Since SSA is an operationally defined concept, dependent on the measurement technique and sample preparation [38], and its measurement is difficult, costly, and time consuming, it may be easy, useful, and economical to calculate TSS from sand, silt, and clay contents by employing the Skaggs et al. model, and then to use this calculated value with a proper equation (i.e., an exponential equation) to predict SSA. In addition, TSS can be used as secondary information to predict other soil properties that are difficult to measure, such as the soil water retention curve and the cation exchange capacity [39].
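The idea of predicting SSA from TSS with an exponential equation can be illustrated as follows. The text does not report the fitted equation, so the form SSA = a · exp(b · TSS), the log-linear least-squares fitting method, and the synthetic numbers below (generated from arbitrary values a = 40 and b = 3) are all assumptions for illustration only and are not results from the study.

```python
import math


def fit_exponential(tss_values, ssa_values):
    """Fit SSA = a * exp(b * TSS) by least squares on ln(SSA) (assumed method)."""
    ys = [math.log(v) for v in ssa_values]
    n = len(tss_values)
    mean_x = sum(tss_values) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in tss_values)
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(tss_values, ys)) / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


def predict_ssa(tss, a, b):
    """Evaluate the fitted exponential relationship for one TSS value."""
    return a * math.exp(b * tss)


# Synthetic, noise-free data; NOT measurements from the study.
synthetic_tss = [0.1, 0.3, 0.5, 0.7]
synthetic_ssa = [40.0 * math.exp(3.0 * t) for t in synthetic_tss]
a_fit, b_fit = fit_exponential(synthetic_tss, synthetic_ssa)
```

Fitting on the logarithm weights relative rather than absolute errors, which is one reasonable choice when SSA spans more than an order of magnitude, as it does in Table 1.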

Fig. 2 Relationship between measured soil-specific surface area (SSA) and calculated soil-specific surface area (TSS)

Fig. 1 Textural distribution of studied soils on the USDA soil textural triangle

3.1 Correlation Analysis
The Pearson correlation analysis was performed to evaluate the relations between SSA and/or TSS and input variables (Table 2). The results suggested that there were strong correlations between FPs and SSA (r=0.66 (p
