
©2018. This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/

Using rule-based regression models to predict and interpret soil properties from X-ray powder diffraction data

https://doi.org/10.1016/j.geoderma.2018.04.005

Benjamin M. Butler a,∗, Sharon M. O’Rourke b, Stephen Hillier a,c

a The James Hutton Institute, Craigiebuckler, Aberdeen AB15 8QH, UK
b UCD School of Biosystems and Food Engineering, University College Dublin, Dublin 4, Ireland
c Department of Soil and Environment, Swedish University of Agricultural Sciences (SLU), SE-75007, Uppsala, Sweden

Abstract

Data mining is often used to derive calibrations for soil property prediction from diffuse reflectance spectroscopy, facilitating inference of organic and mineral contributions to given properties. In contrast to spectroscopy, X-ray powder diffraction (XRPD) offers a more direct probe into the complexities of soil mineralogy. Here a national scale XRPD dataset of Scottish soils is used in combination with the rule-based regression algorithm ‘Cubist’ for prediction of eight soil properties (total carbon and nitrogen, cation exchange capacity, pH, aqua regia extractable potassium, and the sand, silt and clay size fractions), and interpretation of soil property–mineralogy relationships. Precision sample preparation methods prior to XRPD analysis eliminated effects of preferred orientation, creating reproducible data appropriate for data mining. For direct comparison, Cubist was also applied to an equivalent dataset of near infrared spectroscopy (NIRS) measurements. In terms of predictive performance, XRPD surpassed NIRS for prediction of six of the eight soil properties investigated. Notably, diffuse scattering from X-ray amorphous organic matter facilitated relatively accurate predictions of total carbon and nitrogen from XRPD. Aqua regia extractable potassium was predicted with substantial accuracy and confirmed to reflect the phyllosilicate potassium. The particle size fractions were predicted with moderate to substantial agreement using combinations of quartz, phyllosilicate and feldspar variables. This approach introduces the value of XRPD datasets in enhancing the understanding of soil mineralogy–property relationships whilst contributing to soil mineralogy’s advance into the digital soil typing paradigm.

Keywords: Soil mineralogy, X-ray powder diffraction, Soil properties, Cubist

∗ Corresponding author. Email address: [email protected] (Benjamin M. Butler)

Preprint submitted to Geoderma, June 4, 2018


1. Introduction

There is a growing base of literature detailing the application of data mining algorithms to the prediction of soil properties from spectroscopic data (Minasny and McBratney, 2008, 2016; Nocita et al., 2015; Pérez-Fernández and Robertson, 2016; Reeves and Smith, 2009; Viscarra Rossel and Webster, 2012; Viscarra Rossel et al., 2016; Viscarra Rossel and Behrens, 2010). This chemometric analysis of soil spectral data is accepted as a rapid and cost-effective form of analysis where multiple soil properties can be derived from a single measurement. The approach facilitates attainment of greater spatial and temporal resolution (Sanchez et al., 2009), and also allows identification of specific soil constituents that contribute to each soil property (Viscarra Rossel et al., 2009).

To date, visible-near infrared spectroscopy (vis-NIRS) is the most widely applied analytical technique for soil property prediction (Soriano-Disla et al., 2014; Viscarra Rossel et al., 2016). The prediction of soil properties from vis-NIRS is based on the intensity of absorbance bands that relate to functional groups in the organic and mineral components of a soil sample (Viscarra Rossel and Behrens, 2010).

In terms of mineralogy, most soils contain a mixture of primary minerals derived from the soil’s parent materials, and secondary minerals such as clay minerals, the occurrence of which is often controlled by weathering (Newman, 1984; Dixon and Schulze, 2002). As a whole, therefore, soil mineralogy displays substantial heterogeneity on a variety of scales. Additionally, the suite of minerals present in a given soil is variously distributed amongst the sand, silt and clay size fractions, whilst also varying widely in chemical composition, crystal structure, surface area and solubility (Dixon and Schulze, 2002).
Soil minerals are thus intimately related both directly and indirectly to many of the physical, chemical and biological properties of a soil (Newman, 1984; Andrist-Rangel et al., 2006). Given the complexity of soil, however, it has been notoriously challenging to systematise soil property–mineralogy relationships (Newman, 1984). Though vis-NIR and MIR spectra may contain considerable information on soil mineralogy (Viscarra Rossel et al., 2009; Sila et al., 2016), X-ray powder diffraction (XRPD) provides a more direct mineralogical probe because diffraction data are fundamentally related to the crystal structure and crystal chemistry of minerals in soils (Schulze, 1989). Rather than analysing a sample at a range of wavelengths, XRPD most commonly uses a monochromatic X-ray beam. The resulting signal (a series of peaks) as a function of diffraction angle (°2θ) is related to atomic spacings in the

ordered crystalline lattice by the ‘Bragg’ equation:

$$n\lambda = 2d\sin\theta \tag{1}$$

where n is an integer, λ is the monochromatic X-ray wavelength (angstroms, Å), d is the atomic spacing (d-spacing, Å) between planes of atoms, and θ is the angle between the incident rays and the plane of atoms. As such, XRPD produces diffraction patterns, or ‘diffractograms’, rather than ‘spectra’. Chemical information about minerals is also encoded in the relative intensities of the various diffractogram peaks relating to a given mineral, and variation in mineral chemical composition may also alter d-spacings (and hence peak positions). Phase identification from these peaks can be readily achieved using comprehensive databases such as the Powder Diffraction File (ICDD, 2016). Further to the discrete ‘Bragg’ peaks derived from crystalline materials like minerals, the presence of amorphous phases within a sample results in diffuse scattering of X-rays across a wide 2θ range. Amorphous soil constituents (e.g. organic matter and volcanic glass) therefore typically produce broad maxima that are often considered as ‘background’ and consequently ignored or removed during analysis. XRPD is particularly useful for analysing complex mineral mixtures, such as soil (Ulery and Drees, 2008), and with appropriate sample preparation (Hillier, 1999) can be used to accurately quantify soil mineral concentrations (Chipera and Bish, 2002; Omotoso et al., 2006). Mineral identification and quantification, however, are time consuming and often challenging undertakings, and the limited number of samples measured by XRPD in many soil science studies reflects these difficulties [e.g. Andrist-Rangel et al. (2010, 2013); Nagra et al. (2017); Jones and McBratney (2016); Kramer et al. (2017)]. Aside from these challenges, advances in sample preparation [e.g. spray drying, Hillier (1999)] now facilitate reproducible high-throughput XRPD analysis where hundreds or thousands of soil samples can be analysed for a single project (Towett et al., 2015; Barr et al., 2009).
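As a worked illustration of equation 1, the short Python sketch below converts a peak position (°2θ) to a d-spacing and back. The wavelength of 1.5406 Å (Cu Kα1) is an assumption made for this example, since the text specifies only copper Kα radiation; the function names are likewise illustrative.

```python
import math

# Assumed wavelength (Cu K-alpha1, in angstroms); the text states only "copper K-alpha".
CU_KALPHA1_ANGSTROM = 1.5406


def d_spacing(two_theta_deg, wavelength=CU_KALPHA1_ANGSTROM, n=1):
    """Return the d-spacing (angstroms) for a reflection at 2-theta (degrees)."""
    theta = math.radians(two_theta_deg / 2.0)  # Bragg angle is half of 2-theta
    return n * wavelength / (2.0 * math.sin(theta))


def two_theta(d, wavelength=CU_KALPHA1_ANGSTROM, n=1):
    """Inverse of d_spacing: return 2-theta (degrees) for a given d-spacing."""
    return 2.0 * math.degrees(math.asin(n * wavelength / (2.0 * d)))


if __name__ == "__main__":
    # Example peak positions in degrees 2-theta
    for tt in (8.774, 19.717, 50.041):
        print(f"2theta = {tt:7.3f}  ->  d = {d_spacing(tt):7.3f} angstroms")
```

Under this assumed wavelength, low-angle reflections map to large d-spacings (about 10 Å near 8.8 °2θ), which is why basal phyllosilicate reflections appear at the low-angle end of a diffractogram.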
As such, a national scale dataset of Scottish soils has recently been analysed by XRPD with the aim of advancing the understanding of soil property–mineralogy relationships. At the same time, such datasets may contribute to aligning soil XRPD with the data-driven, ‘digital’, approaches widely applied in soil spectroscopy (Nocita et al., 2015). Since it is currently impractical to apply accurate quantitative mineral analysis to high-throughput soil XRPD datasets, alternative techniques to relate mineralogy to properties must be applied. Here this alternative takes the form of data mining, which to our knowledge has not been previously applied to soil XRPD data. Specifically, this investigation aims to illustrate how a national scale XRPD dataset in combination with data mining can be used to predict and interpret soil (< 2 mm) properties, whilst also deriving descriptions for the way these properties are defined by, and linked to, soil mineral composition.

1.1. Hypotheses

A national dataset of Scottish soil properties and their corresponding XRPD measurements were used to investigate the following hypotheses:

i) Data mining of XRPD data can be used to predict mineral soil properties.
ii) Information derived from the models of predicted soil properties can be used for interpretation of soil property–mineralogy relationships.

2. Materials and methods

2.1. Soil dataset and sample selection

The National Soil Inventory of Scotland (NSIS) is an objective dataset of Scottish soils consisting of samples collected from two sampling campaigns. The first collection represents samples obtained between 1978 and 1988 from 721 sites defined by a 10 km grid across Scotland (hereafter NSIS 1). The second collection was obtained between 2007 and 2009 (hereafter NSIS 2), where a quarter of the original locations (184 sites) were re-sampled (Chapman et al., 2013; Pérez-Fernández and Robertson, 2016). During sampling for both NSIS 1 and NSIS 2, soils were taken from each of the main soil horizons at all sites, resulting in a combined archive of 3,936 soil samples. From the NSIS soil archive, all NSIS 2 samples (n = 700) were selected for analysis by XRPD (section 2.3), whilst NSIS 1 samples from corresponding profiles were also retrieved and analysed if sufficient sample was available (n = 546). The same selection was previously applied for NIRS analysis [see Pérez-Fernández and Robertson (2016) and section 2.3], thus permitting direct comparison of XRPD to a more established chemometric technique. For the purpose of this investigation the dataset of 1,246 samples was refined further by selection of mineral horizons only (as recorded in the field). The mineral horizons were selected for two reasons: first, soil properties were found to display bimodal distributions in terms of the presence of mineral and organic ‘clusters’ (Figure 1), which can create misleading performance parameters of predictive models; second, information derived from XRPD in this context is dominated by the mineral components of each soil sample, as opposed to organic and amorphous phases. In addition to removing organic horizons, calcareous soils (an uncommon soil type in the Scottish context) were identified using the Powder Diffraction File database (ICDD, 2016) in Bruker EVA software and were also removed from the dataset (n = 9). 
Overall, the selection criteria resulted in a dataset comprising 854 mineral soil samples measured by XRPD and NIRS. This encompassed 7 major soil groups (Fluvisols, Cambisols, Gleysols, Histosols, Podzols, Leptosols and Regosols; Table 1) and 32 primary rock types in the parent material (defined in the field, Table 2). The site locations of the selected samples, and the number of soil samples at each site (n = 184), are displayed in Figure 2.

[Figure 1. x-axis: Total Carbon (%); y-axis: Density.]

Figure 1: Histogram displaying the density distribution of total carbon for the entire dataset of Scottish soils analysed by XRPD and NIRS (n = 1,246), as measured by mass spectrometry (see section 2.2). The bimodal distribution is caused by the inclusion of organic soil horizons in the dataset. The Histosols (a common soil type in Scotland) are represented by the cluster of samples with total carbon exceeding approximately 40 %.

Table 1: Major soil groups represented in the dataset of 854 soil samples investigated.

Major soil group                                         n
Alluvial soils (Fluvisols)                              38
Brown earths (Cambisols)                               190
Gleys (Gleysols)                                       238
Peats (Histosols)                                       31
Podzols                                                329
Rankers and immature soils (Leptosols and Regosols)     28

Table 2: Primary rock types of the 854 mineral soil samples investigated. A total of 45 samples did not have their primary rock type recorded.

Primary rock type                            n
Igneous
  Granite                                   61
  Diorite                                    3
  Undifferentiated intermediate igneous      4
  Andesite                                  22
  Undifferentiated basic igneous             4
  Olivine dolerite                           6
  Basalt                                    35
Metamorphic
  Epidiorite                                 4
  Metamorphic quartzite                      6
  Granulite                                  7
  Slate                                      6
  Black slate                                6
  Phyllite                                  10
  Hornblende-schist                          1
  Mica-schist                               17
  Quartz-schist                             24
  Quartz-mica-schist                        51
  Schistose grit                             1
  Argillaceous schist                        3
  Undifferentiated schist                  195
  Undifferentiated gneiss                   22
Sedimentary
  Flint                                      3
  Sedimentary quartzite                     18
  Flagstone                                 14
  Undifferentiated shale                    14
  Feldspathic sandstone                      3
  Siliceous sandstone                        3
  Arkose sandstone                          20
  Ferruginous sandstone                      6
  Undifferentiated sandstone               109
  Greywacke                                127
  Conglomerate                               4

2.2. Soil properties

For each soil sample in the NSIS dataset, a wide variety of properties were measured that might be related to the XRPD and NIRS measurements using data mining. This study focuses upon eight of these properties: total carbon (C_T), total nitrogen (N_T), pH in water (pH_H2O), cation exchange capacity (CEC), aqua-regia extractable potassium (K_aqr), and particle size distribution (sand, silt and clay). Together these span organic, textural, and chemical properties of soil. Most of these properties are also relevant to the GlobalSoilMap project (Sanchez et al., 2009).

C_T and N_T were measured by mass spectrometry using the method described in Chapman et al. (2013). pH_H2O was determined with a combination electrode on the supernatant of a 3:1 mixture of distilled water (volume, cm³) and soil (weight, g). To determine CEC, exchangeable base cations were displaced from the soil exchange sites using a neutral solution of ammonium acetate (1 M) and analysed by inductively coupled plasma optical emission spectroscopy (ICP-OES). The CEC was then estimated as the potential cation exchange at pH 7.0, calculated as the sum of exchangeable Na⁺, Mg²⁺, K⁺ and Ca²⁺ concentrations (in cmol_c kg⁻¹) (Andrist-Rangel et al., 2010). K_aqr was determined by digesting 0.5 g of ground sample using the procedure of McGrath and Cunliffe (1985) as modified by McGrath (1987) (3:1 of 50 % HCl:concentrated HNO3 by vol.), with the digest being made up into 100 ml of 12.5 % HNO3 prior to analysis by ICP-OES.

Particle size distributions were determined by subsampling the soils into 125 ml polyethylene bottles, to which ∼40 ml of de-ionised water and 2–3 ml of a sodium hexametaphosphate and sodium carbonate solution were added. Samples were then placed on an end-over-end shaker for 16 hours prior to final dispersal in an ultrasonic bath. Aliquots of the sample were then analysed for particle size determination using laser diffraction (Mastersizer 2000 with Hydro G dispersal unit, Malvern Instruments) until the sample was exhausted. All particle size fractions relate to the International Soil Science Society definitions (sand = 20–2000 µm, silt = 2–20 µm, clay < 2 µm).
Summary statistics of the 8 soil properties are displayed in Table 3. Five of the soil properties display substantial skewness [i.e. skew > 1, Table 3; Meyer et al. (2017)]; however, the modelling approach applied herein (‘Cubist’, see section 2.4.2) is known to handle this type of data (Minasny and McBratney, 2008; O’Rourke et al., 2016b), and therefore no data transformations of the soil properties were applied.

[Figure 2. Map of sampling sites. x-axis: Longitude; y-axis: Latitude; point colour: number of soil samples at each site (1 to 13); scale bar: 0 to 100 km.]

Figure 2: The soils represented in the dataset of 854 mineral soil horizons across Scotland. Each point represents a sampling site, with the colour denoting the number of soil samples at each site that were analysed by XRPD and NIRS. The majority of sites were sampled and analysed for both NSIS 1 and NSIS 2.

Table 3: Summary statistics of soil properties from the investigated dataset. n represents the number of samples that had associated measurements for each soil property. Total number of samples in the dataset studied (excluding organic and calcareous soils) is 854 (see section 2.1).

Soil property          n     Mean     Med.     St. dev.  Min.     Max.      Skew.    Coeff. var.
Total Carbon (%)       851   2.818    1.603    3.691     0.020    41.549    3.584    130.983
Total Nitrogen (%)     846   0.182    0.097    0.235     0.005    1.889     2.796    129.605
pH_H2O                 854   5.151    5.080    0.723     3.250    8.130     0.285    14.046
CEC (cmol_c kg⁻¹)      854   3.584    1.403    5.253     0.047    63.861    3.721    146.565
K_aqr (mg kg⁻¹)        845   2432     1900     1999      0.000    13853     2.36     82.182
Sand (%)               820   71.615   73.512   15.443    16.102   100.000   −0.596   21.563
Silt (%)               820   23.742   22.313   12.737    0.000    68.160    0.580    53.647
Clay (%)               820   4.643    3.967    3.400     0.000    20.106    1.244    73.226

2.3. X-ray powder diffraction and near infrared spectroscopy

Samples were prepared for XRPD by McCrone milling 3 g of < 2 mm sieved and air-dried (30 °C) soil for 12 minutes in ethanol and spray drying the resulting slurry, as described by Hillier (1999). Diffractograms were measured by scanning from 3 to 70 °2θ on a Panalytical Xpert Pro instrument, using nickel-filtered copper Kα radiation, counting for 100 s per 0.0167° step using a position sensitive Xcelerator detector. All samples contained quartz, therefore the quartz peaks in each diffractogram were used as an internal standard to correct peak positions (2θ) for the effects of sample height displacement and sample transparency, which are common experimental aberrations. The 2θ corrections were approximated by maximising the correlation between each sample and a standard quartz pattern [PDF 05-0490, ICDD (2016)], before converting all data to a harmonised 2θ scale. The XRPD data were then subjected to pre-treatment by binning each diffractogram into bins of 5 measurement intervals and taking the mean of each bin. Binning acted to smooth the XRPD data and facilitated faster computation of subsequent Cubist models, but crucially retained sufficient mineralogical detail. A square-root transform of the XRPD data was tested for each calibration model subsequently derived (section 2.4), results from which are only presented here if improved performance parameters were observed (section 2.4.3; Table 5).

A detailed account of the protocol used to collect NIR spectra is provided in Pérez-Fernández and Robertson (2016). Briefly, air-dried (30 °C) and sieved (< 2 mm) samples were analysed at 2 nm intervals in the range of 1100–2500 nm using a FOSS NIRSystems 5000 spectrophotometer with a transport module sampling attachment and a quarter-cup sample holder. The measured reflectances, R, were converted to apparent absorbance as log10(1/R) and a standard normal variate baseline correction was applied (Barnes et al., 1989). A first derivative transform of the NIRS data [as used in Pérez-Fernández and Robertson (2016)] was tested for each calibration model subsequently derived, results from which are only presented here if improved performance parameters were observed (Table 5).
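The pre-treatment steps described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in the study (which was run in R): the function names are hypothetical, dropping an incomplete trailing bin is an assumption, and the standard normal variate is assumed here to use the sample standard deviation (n − 1 denominator).

```python
import math


def bin_means(intensities, width=5):
    """Average consecutive groups of `width` measurement intervals.

    Assumption: an incomplete trailing bin is dropped.
    """
    n_bins = len(intensities) // width
    return [sum(intensities[i * width:(i + 1) * width]) / width
            for i in range(n_bins)]


def sqrt_transform(intensities):
    """Square-root transform of diffractogram counts."""
    return [math.sqrt(v) for v in intensities]


def absorbance(reflectance):
    """Apparent absorbance, log10(1/R), for each reflectance value R."""
    return [math.log10(1.0 / r) for r in reflectance]


def snv(spectrum):
    """Standard normal variate: centre and scale a single spectrum.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    mean = sum(spectrum) / len(spectrum)
    sd = math.sqrt(sum((v - mean) ** 2 for v in spectrum) / (len(spectrum) - 1))
    return [(v - mean) / sd for v in spectrum]


if __name__ == "__main__":
    counts = [100, 110, 120, 400, 900, 450, 130, 115, 105, 100]  # made-up counts
    print(sqrt_transform(bin_means(counts)))
    print(snv(absorbance([0.50, 0.45, 0.40, 0.48])))  # made-up reflectances
```

Binning by the mean trades angular resolution for smoother data and fewer predictor variables, which is the compromise the text describes.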

2.4. Soil property prediction

All methods of data analysis and soil property prediction described hereafter were performed in the R software environment (R Core Team, 2017).

2.4.1. Selection of training and evaluation data

For each soil property the selected samples were split into two sets: one comprising approximately 75 % of the data used to build and train subsequent calibration models (hereafter the training set), and the remaining data used to evaluate the performance of each model (hereafter the evaluation set). The training and evaluation sets were selected by applying the Kennard-Stone [KS; Kennard and Stone (1969)] algorithm to a matrix of XRPD data that had been grouped and averaged by site (Figure 2). Grouping the data in this way reduced the potential for pseudoreplication, both within soil profiles and between NSIS 1/NSIS 2 site replicates. The sites (n = 184) selected by the KS algorithm for the training and evaluation sets defined the soil horizons (n = 854) that constituted each set. For each soil property, the subsequent calibration models (section 2.4.2) with XRPD or NIRS data were based on the same KS algorithm output, thereby allowing direct comparison of model performance.
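The site-level selection described above can be sketched as follows. This is a plain Python illustration of the general Kennard-Stone procedure (seed with the two most distant rows, then repeatedly add the row farthest from the already selected set), not the R code used in the study. The Euclidean distance metric, the function names, and the choice to treat the selected sites as the training set are assumptions of this example.

```python
import math


def site_means(site_ids, rows):
    """Group sample rows by site and average each variable within a site."""
    groups = {}
    for site, row in zip(site_ids, rows):
        groups.setdefault(site, []).append(row)
    sites = sorted(groups)
    means = [[sum(col) / len(col) for col in zip(*groups[s])] for s in sites]
    return sites, means


def kennard_stone(rows, n_select):
    """Return the indices of `n_select` rows chosen by Kennard-Stone."""
    n = len(rows)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and len(rows)")
    dist = [[math.dist(a, b) for b in rows] for a in rows]
    # Seed with the two mutually most distant rows.
    i0, j0 = max(((i, j) for i in range(n) for j in range(i + 1, n)),
                 key=lambda p: dist[p[0]][p[1]])
    selected = [i0, j0]
    remaining = set(range(n)) - {i0, j0}
    # Distance from each remaining row to its nearest selected row.
    nearest = {k: min(dist[k][i0], dist[k][j0]) for k in remaining}
    while len(selected) < n_select:
        # Add the row that is farthest from everything selected so far.
        nxt = max(sorted(remaining), key=lambda k: nearest[k])
        selected.append(nxt)
        remaining.remove(nxt)
        for k in remaining:
            nearest[k] = min(nearest[k], dist[k][nxt])
    return selected


if __name__ == "__main__":
    # Made-up two-variable "diffractograms" for five samples at four sites.
    ids = ["A", "A", "B", "C", "D"]
    data = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0], [10.0, 10.0], [5.0, 5.0]]
    sites, means = site_means(ids, data)
    chosen = kennard_stone(means, n_select=3)  # roughly 75 % of 4 sites
    training_sites = {sites[i] for i in chosen}
    training_rows = [i for i, s in enumerate(ids) if s in training_sites]
    print(training_sites, training_rows)
```

Selecting whole sites and then mapping back to horizons keeps every sample from a given site in the same set, which is how grouping limits pseudoreplication.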

2.4.2. Cubist

To predict soil properties from XRPD and NIRS measurements, the ‘Cubist’ machine-learning algorithm was applied. Cubist is an extension of Quinlan’s M5 model tree (Quinlan, 1992), and is featured in the CRAN repository under the same name for use in the R statistical software environment (Kuhn and Quinlan, 2017). Cubist defines a series of conditions based on predictor variables (i.e. the XRPD or NIR measurement intervals) that partition the data. At each partition there is a multivariate linear model used to predict the output (i.e. the soil property). Cubist has been widely applied in the prediction of soil properties from spectral measurements (Minasny and McBratney, 2008; Viscarra Rossel and Webster, 2012; O’Rourke et al., 2016a; Viscarra Rossel et al., 2016), particularly as it respects the upper and lower limits of data that can otherwise result in negative prediction values (Minasny and McBratney, 2008). Calibration models from Cubist were generated for each soil property using the XRPD and NIRS datasets individually.

Cubist has functionality for a boosting-like scheme called ‘committees’ (C), where iterative Cubist models are created sequentially until a pre-defined limit (C_max) is reached (Figure 3). After each iteration, the subsequent model is created using an adjusted version of the training set response (i.e. the soil property). If the model over-predicted a value (assessed here using 10-fold cross validation), the response is adjusted downward for the next model. The reverse applies if the model under-predicted a value (Kuhn and Quinlan, 2017). Once C_max is attained, the final output is the mean prediction of all iterations (Figure 3). In this study, 5 values of C_max were tested (2, 4, 6, 8 and 10) for each soil property and each analytical technique. The value of C_max that produced the smallest root-mean-square error (RMSE) from cross-validation was then used to develop each subsequent Cubist ensemble to be applied to the evaluation set.
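To make the structure of such a model concrete, the Python sketch below represents a rule as a set of conditions plus a linear model, and a committee as the mean of several rule sets. It illustrates the idea only and is not the Cubist package or its API: the class and function names, thresholds and coefficients are all hypothetical, and averaging the predictions of every rule that covers a sample is an assumption of this example.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    conditions: list   # tuples of (variable index, "<=" or ">", threshold)
    intercept: float
    coefs: dict        # variable index -> regression coefficient

    def covers(self, x):
        """True if sample x satisfies every condition of this rule."""
        return all((x[i] <= t) if op == "<=" else (x[i] > t)
                   for i, op, t in self.conditions)

    def predict(self, x):
        """Multivariate linear model attached to this rule."""
        return self.intercept + sum(c * x[i] for i, c in self.coefs.items())


def predict_model(rules, x):
    """Average the linear predictions of all rules covering x (assumption)."""
    preds = [r.predict(x) for r in rules if r.covers(x)]
    if not preds:
        raise ValueError("no rule covers this sample")
    return sum(preds) / len(preds)


def predict_committee(models, x):
    """Final output: mean prediction over all committee members."""
    return sum(predict_model(m, x) for m in models) / len(models)


if __name__ == "__main__":
    # Hypothetical two-member committee over two predictor variables.
    member_1 = [
        Rule([(0, "<=", 10.0)], intercept=1.0, coefs={1: 0.5}),
        Rule([(0, ">", 10.0)], intercept=4.0, coefs={0: -0.1}),
    ]
    member_2 = [Rule([], intercept=2.0, coefs={1: 0.5})]  # one unconditional rule
    print(predict_committee([member_1, member_2], [5.0, 4.0]))
```

Because each rule records which variables define its conditions and which enter its linear model, the same structure supports the later extraction of rule and regression variables for interpretation.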

Figure 3: Flowchart depicting the steps applied during development of Cubist models with varying numbers of committees.

2.4.3. Performance parameters

As stated above, only RMSE was used during cross-validation to define the optimum Cubist ensemble. For an overall assessment of model performance for each soil property, the evaluation set was applied to the Cubist models, and a range of performance parameters derived. To compute these parameters, model predictions were compared to the independent laboratory measurements (section 2.2). The coefficient of determination (R²) and concordance correlation coefficient [ρ_c; Lin (1989)] were used to assess the covariation and correspondence between the model predictions and laboratory measurements. The ρ_c is a combined measure of precision and bias that ranges from −1 to 1, with a value of 1 representing perfect agreement. Here values of ρ_c > 0.90 are considered to represent excellent agreement, values between 0.80 and 0.90 substantial agreement, values between 0.70 and 0.80 moderate agreement, and values < 0.70 poor agreement. The root-mean-square error (RMSE) was used to quantify model inaccuracy. Model bias was assessed as the mean error. Lastly, the ratio of performance to inter-quartile range [RPIQ; Bellon-Maurel et al. (2010)] was used as a measure of the relative goodness of the calibration models, calculated as the interquartile range (IQR)/RMSE of prediction. A higher RPIQ represents better predictive performance of a model. The RPIQ is considered to be a particularly appropriate performance parameter for soil calibration models given the tendency for datasets to display skewed properties (Bellon-Maurel et al., 2010). Since no single performance parameter can effectively define model performance (Reeves and Smith, 2009; Viscarra Rossel and Webster, 2012), a holistic approach using all model performance parameters was used.
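The performance parameters above can be computed as in the following Python sketch. Several choices here are assumptions rather than details given in the text: R² is taken as the squared Pearson correlation, the mean error is signed as predicted minus measured, the IQR in the RPIQ is taken from the measured values using inclusive quartiles, and values falling exactly on a class boundary are assigned to the lower class.

```python
import math
import statistics


def rmse(obs, pred):
    """Root-mean-square error of prediction."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))


def bias(obs, pred):
    """Mean error (assumed sign convention: predicted minus measured)."""
    return sum(p - o for o, p in zip(obs, pred)) / len(obs)


def _moments(obs, pred):
    mo, mp = statistics.fmean(obs), statistics.fmean(pred)
    var_o = statistics.pvariance(obs, mo)
    var_p = statistics.pvariance(pred, mp)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred)) / len(obs)
    return mo, mp, var_o, var_p, cov


def r_squared(obs, pred):
    """Squared Pearson correlation (one common definition of R2)."""
    _, _, var_o, var_p, cov = _moments(obs, pred)
    return cov ** 2 / (var_o * var_p)


def concordance(obs, pred):
    """Lin's concordance correlation coefficient."""
    mo, mp, var_o, var_p, cov = _moments(obs, pred)
    return 2.0 * cov / (var_o + var_p + (mo - mp) ** 2)


def rpiq(obs, pred):
    """Interquartile range of the measured values divided by RMSE."""
    q1, _, q3 = statistics.quantiles(obs, n=4, method="inclusive")
    return (q3 - q1) / rmse(obs, pred)


def agreement_class(rho_c):
    """Map a concordance value onto the agreement classes used in the text."""
    if rho_c > 0.90:
        return "excellent"
    if rho_c > 0.80:
        return "substantial"
    if rho_c > 0.70:
        return "moderate"
    return "poor"


if __name__ == "__main__":
    measured = [1.0, 2.0, 3.0, 4.0, 5.0]       # made-up values
    predicted = [2.0, 3.0, 4.0, 5.0, 6.0]      # constant offset of +1
    print(r_squared(measured, predicted), concordance(measured, predicted))
```

The example data show why both R² and ρ_c are reported: a constant offset leaves R² at 1 while ρ_c falls, because ρ_c penalises departure from the 1:1 line.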

2.4.4. Cubist model feature selection

Cubist efficiently selects features (i.e. XRPD or NIRS variables) from a dataset when deriving calibration models (Minasny and McBratney, 2008). Some variables are selected to define the rules within the model trees that partition the data, while others are used in the multivariate linear regressions associated with each partition. The importance of each variable in the Cubist model can be considered proportional to the prevalence of its use within rules and regressions (Kuhn and Quinlan, 2017). The more a variable is used, the more likely it is to relate (positively or negatively) to the property being predicted. Therefore, by extracting and interpreting the variables selected by Cubist, it is possible to infer which parts of a diffractogram or spectrum are important in predicting a given soil property.

To identify soil mineralogy–property relationships, the rule and regression variables used by each Cubist model were extracted and plotted alongside a mean diffractogram (calculated as the mean intensity of each XRPD 2θ interval from all samples in a given training set). These plots are hereafter referred to as ‘feature plots’. The feature plots facilitated qualitative identification of key diffractogram features relating to each soil property, allowing identification of specific soil constituents associated with that feature based on its position (2θ, or derived d-spacing; equation 1) in the diffractogram. Given the wealth of literature relating NIR spectra to soil properties using data mining (Viscarra Rossel and Behrens, 2010; Pérez-Fernández and Robertson, 2016) and our objective to identify and interpret soil property–mineralogy relationships, feature plots are only described herein for the XRPD Cubist models.
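The notion of variable importance as prevalence of use can be illustrated with a short Python sketch that counts how often each variable appears across a model's linear regressions (or, equally, its rule conditions) and expresses the count as a percentage. The function name and the example variable labels are hypothetical; in the study the variables were extracted from the fitted Cubist models in R.

```python
from collections import Counter


def usage_percent(variable_sets):
    """Percentage of rules (or regressions) in which each variable is used.

    `variable_sets` holds one collection of variable labels per rule or
    per linear model; the result is ordered from most to least used.
    """
    total = len(variable_sets)
    counts = Counter(v for used in variable_sets for v in set(used))
    return {v: 100.0 * c / total for v, c in counts.most_common()}


if __name__ == "__main__":
    # Hypothetical regressions, each listing the 2-theta variables it uses.
    regressions = [
        {"19.885", "25.231"},
        {"19.885"},
        {"19.885", "41.186"},
        {"25.231"},
    ]
    for variable, pct in usage_percent(regressions).items():
        print(f"{variable} deg 2theta: used in {pct:.0f} % of regressions")
```

Ranking variables this way and converting their 2θ positions to d-spacings gives the kind of summary from which a feature plot can be annotated with candidate soil constituents.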

3. Results and discussion

The eight soil properties under study are assessed here based on organic, chemical or textural grouping. Organic properties refer to C_T and N_T. Chemical properties refer to pH_H2O, CEC, and K_aqr. Textural properties refer to sand, silt and clay size fractions. Each soil grouping is discussed separately. The XRPD predictive models for each soil property are assessed in relation to:

1. The use of feature plots in combination with the Powder Diffraction File 4+ database (ICDD, 2016) for interpretation of diffractogram variables that show strong relation to each property (see hypothesis ii, section 1.1).
2. Performance parameters of the XRPD models for the prediction of each soil property, with comparison to the corresponding NIRS predictive models (see hypothesis i, section 1.1).

Table 4: Details of the 10 most common regression variables used for the Cubist predictions of each soil property. Usage represents the percentage of multivariate equations that the variable is used in, d is the d-spacing of the Bragg reflection (Å), and I.D. is the soil constituent related to the selected variable. Given that the XRPD data were binned prior to analysis, each 2θ interval represents a band of 0.084 °2θ. Org. = organic matter. D. Mic. = dioctahedral mica. Plag. = plagioclase. Kaol. = kaolinite. K Fel. = potassium feldspar. Phy. (02,11) = general non-basal phyllosilicate reflection. Chlor. = chlorite. Exp. = expandable clay.

Total carbon (%)
Rank  Usage (%)  2θ      d       I.D.
1     77         18.381  4.823   Org.
2     69         16.543  5.354   Org.
3     65         50.041  1.821   Quartz
4     65         18.130  4.889   Org.
5     65         26.484  3.363   Quartz
6     62         16.877  5.249   Org.
7     59         17.713  5.003   D. Mic.
8     50         68.502  1.369   Quartz
9     47         64.242  1.449   Quartz/D. Mic.
10    47         21.388  4.151   Kaol./Org.

Total nitrogen (%)
Rank  Usage (%)  2θ      d       I.D.
1     88         16.877  5.249   Org.
2     82         18.381  4.823   Org.
3     67         17.629  5.027   D. Mic.
4     64         17.378  5.099   Org.
5     64         16.794  5.275   Org.
6     61         18.464  4.801   Org./Illite
7     56         18.548  4.780   Org.
8     52         21.639  4.104   D. Mic.
9     47         15.875  5.578   Org./Plag.
10    46         18.130  4.889   Org.

pH_H2O
Rank  Usage (%)  2θ      d       I.D.
1     81         27.904  3.195   Plag./D. Mic.
2     70         24.897  3.573   Kaol.
3     62         27.820  3.204   Plag./D. Mic.
4     61         23.560  3.773   Plag./K Fel.
5     56         12.366  7.152   Kaol.
6     47         24.312  3.658   Plag.
7     41         12.450  7.104   Kaol.
8     40         36.759  2.443   D. Mic./Plag.
9     38         19.801  4.480   Phy. (02,11)
10    38         23.644  3.760   Plag./K Fel.

CEC (cmol_c kg⁻¹)
Rank  Usage (%)  2θ      d       I.D.
1     55         15.374  5.759   Org.
2     53         6.018   14.675  Chlor./Exp.
3     43         20.135  4.407   D. Mic./Illite
4     43         18.715  4.738   Org./Chlor.
5     43         6.101   14.474  Chlor./Exp.
6     39         19.467  4.556   Phy. (02,11)
7     37         21.221  4.183   Kaol.
8     35         12.450  7.104   Kaol./Chlor.
9     34         24.897  3.573   Kaol.
10    32         24.813  3.573   D. Mic.

K_aqr (mg kg⁻¹)
Rank  Usage (%)  2θ      d       I.D.
1     73         19.717  4.499   Phy. (02,11)
2     71         8.774   10.070  D. Mic./Illite
3     68         50.041  1.821   Quartz
4     67         25.899  3.437   D. Mic./K Fel.
5     65         19.885  4.461   Phy. (02,11)
6     59         8.941   9.882   D. Mic.
7     45         34.086  2.628   Plag./D. Mic.
8     44         8.524   10.365  Illite
9     42         8.691   10.166  Illite/D. Mic.
10    42         28.283  3.095   Plag.

Sand (%)
Rank  Usage (%)  2θ      d       I.D.
1     93         19.885  4.461   Phy. (02,11)
2     66         41.186  2.190   D. Mic.
3     57         25.231  3.527   K Fel.
4     51         25.314  3.515   Plag.
5     50         20.052  4.425   D. Mic./Illite
6     50         34.921  2.567   D. Mic.
7     42         19.717  4.499   Phy. (02,11)
8     38         20.386  4.353   Kaol.
9     38         53.884  1.700   D. Mic.
10    38         17.044  5.198   Org.

Silt (%)
Rank  Usage (%)  2θ      d       I.D.
1     68         20.135  4.407   D. Mic./Illite
2     52         48.036  1.893   Plag.
3     48         19.968  4.443   Phy. (02,11)
4     48         19.885  4.461   Phy. (02,11)
5     46         21.388  4.151   Kaol.
6     40         25.231  3.527   K Fel.
7     38         9.192   9.613   Illite
8     38         17.044  5.198   Org.
9     35         25.314  3.515   D. Mic./Plag.
10    30         7.020   12.582  Exp.

Clay (%)
Rank  Usage (%)  2θ      d       I.D.
1     80         20.052  4.425   Phy. (02,11)
2     45         21.388  4.151   Kaol.
3     43         20.553  4.318   D. Mic./Quartz
4     40         20.135  4.407   D. Mic./Illite
5     39         20.386  4.353   Kaol.
6     34         5.934   14.882  Chlor./Exp.
7     34         19.968  4.353   Phy. (02,11)
8     33         9.192   9.613   Illite
9     31         20.219  4.388   D. Mic./Illite
10    30         19.717  4.499   Phy. (02,11)

Figure 4: Feature plots from the Cubist XRPD calibration models for C_T, N_T, pH_H2O and CEC. Red = mean diffractogram. Black = variable use in decisions. Grey = variable use in multiple linear regression. The mean diffractogram has been square root transformed to emphasise minor peaks.

3.1. Organic properties

The diffractogram features selected by Cubist to predict C_T and N_T display similar characteristics (Figure 4a-b). In both cases the predominant features relate to a region of the diffractogram associated with the maxima of the diffuse scattering from ‘X-ray amorphous’ organic matter (Figure 5; Table 4). This scattering is expressed in the selected features as a ‘hump’ centred between approximately 15 and 20 °2θ. In comparison to the use of variables associated with the organic matter, few features relating to the mineralogy are used by the Cubist models when predicting C_T or N_T. Of the little mineralogical information used, the selected peaks mainly relate to quartz. Where quartz features are used in the multivariate regressions, their contributions to the C_T and N_T prediction are consistently negative (i.e. increased quartz = decreased organic matter = decreased C_T and N_T), whilst features relating to the organic ‘hump’ are consistently positive (i.e. increased diffuse scattering = increased organic matter = increased C_T and N_T). Overall, the predominant use by Cubist of features related to organic matter concentration to predict C_T and N_T can be considered valid, particularly since calcareous soils were excluded from this dataset (such that C_T should equate closely with total organic carbon).

Figure 5: A typical organic soil diffractogram (red) plotted alongside that of a typical mineral soil (black), shown as counts against °2θ. The presence of organic matter results in a distinct increase in diffuse scattering, especially between 15 and 25 °2θ. The organic diffractogram has been offset from the mineral diffractogram on the y-axis to aid interpretation.

The performance parameters of the Cubist XRPD models (Table 5) indicate that CT and NT can be adequately predicted from the diffractograms. When applied to the evaluation set (n = 203 for CT and 199 for NT; Table 5), both organic properties were predicted with excellent agreement (ρc = 0.962 and 0.924 for CT and NT, respectively) and relatively good accuracy (RMSE = 0.932 % and 0.081 % for CT and NT, respectively). That both properties are predicted well likely reflects the constrained nature of C:N ratios in (Scottish) soil organic matter, even over large spatial scales (Cleveland and Liptzin, 2007; Kirkby et al., 2011). The higher RPIQ for CT (3.150) than for NT (1.887) indicates that, based on the RMSE and the inter-quartile range of predicted values, Cubist is able to utilise features from the diffractograms (Figure 4a-b) to create a more robust model for CT prediction than for NT. This may reflect inevitable variation in C:N despite it being largely constrained, and that the XRPD features used by Cubist relate primarily to the amorphous nature of soil organic matter rather than its elemental composition. When compared to the corresponding Cubist models produced using the NIRS dataset, the XRPD models produce better predictive performance for CT and NT with regard to all computed performance parameters (Table 5). The XRPD models are therefore considered to be better predictors of CT and NT, which is somewhat unexpected given that NIR carries more information relating to organic matter and associated functional groups than XRPD does. This highlights the value of information encoded within the diffuse scattering of X-ray amorphous phases in XRPD data, which could easily be overlooked, or removed altogether, if typical background subtractions were applied. In this context, it should be noted that Scottish soils are relatively high in organic matter, and despite this study excluding many organic soil horizons from analysis, the refined dataset had a mean CT of 2.8 ± 3.7 % and a maximum of 41.6 % (Table 3). Thus, CT in soils from elsewhere, with lower organic carbon contents, may prove to be better predicted with NIR than with XRPD.
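The performance parameters discussed above can be computed directly from paired observed and predicted values. The following is a minimal sketch, not the authors' code. It assumes that R2 is the squared Pearson correlation, that bias is the mean of predicted minus observed, that ρc follows Lin (1989) with 1/n moments, and that RPIQ is the inter-quartile range of the observed values divided by the RMSE (Bellon-Maurel et al., 2010). The function name `performance_parameters` is hypothetical.

```python
import numpy as np


def performance_parameters(observed, predicted):
    """Return R2, rho_c, RMSE, bias and RPIQ for paired values.

    Assumed conventions (not confirmed by the manuscript):
    - R2 is the squared Pearson correlation;
    - bias is mean(predicted - observed);
    - RPIQ uses the inter-quartile range of the observed values.
    """
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)

    residuals = pred - obs
    rmse = float(np.sqrt(np.mean(residuals ** 2)))
    bias = float(np.mean(residuals))

    # Squared Pearson correlation between observed and predicted values.
    r2 = float(np.corrcoef(obs, pred)[0, 1] ** 2)

    # Lin's concordance correlation coefficient with 1/n moments.
    covariance = np.mean((obs - obs.mean()) * (pred - pred.mean()))
    rho_c = float(
        2.0 * covariance
        / (obs.var() + pred.var() + (obs.mean() - pred.mean()) ** 2)
    )

    # Ratio of performance to inter-quartile range.
    q1, q3 = np.percentile(obs, [25, 75])
    rpiq = float((q3 - q1) / rmse)

    return {"R2": r2, "rho_c": rho_c, "RMSE": rmse, "bias": bias, "RPIQ": rpiq}
```

As an illustration with made-up numbers, predictions that track the observations 1 to 5 perfectly but sit one unit too high give R2 = 1 while ρc = 0.8, which shows why ρc is a stricter measure of agreement than R2.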


3.2. Chemical properties


To predict the chemical properties (pHH2O, CEC and Kaqr), Cubist selected markedly different diffractogram features compared to those used for CT and NT prediction (section 3.1). Most notably, the selected features for each of the chemical properties are dominated by those related to mineralogy, as opposed to organic matter (Figure 4c-d and Figure 6a). Specifically, for pHH2O prediction, the features selected for the decisions relate to a range of minerals including feldspars (plagioclase and potassium feldspars) and dioctahedral mica, together with features related to the presence of clay minerals and the phyllosilicate mineral group (Figure 4c). Quartz is notably absent from the decision features for pHH2O prediction. The most frequently used variables in the multivariate regressions to predict pHH2O relate to plagioclase, potassium feldspar, dioctahedral mica and phyllosilicates (Table 4). In terms of each mineral's exact contribution to the multivariate regressions, there is a lack of consistency as to whether specific features contribute positively or negatively to pHH2O prediction. The same


Calibration and evaluation sets
Soil property        ncal   neval   Meaneval   SDeval
Total Carbon (%)     648    203     3.055      3.442
Total Nitrogen (%)   647    199     0.197      0.216
pHH2O                650    204     5.136      0.655
CEC (cmolc kg−1)     650    204     3.296      3.701
Kaqr (mg kg−1)       647    198     2367       1893
Sand (%)             585    235     75.768     14.138
Silt (%)             585    235     20.487     11.668
Clay (%)             585    235     3.745      3.104

XRPD
Soil property        Transform   R2      ρc      RMSE     Bias     RPIQ
Total Carbon (%)     X           0.928   0.962   0.932    0.118    3.150
Total Nitrogen (%)   X           0.859   0.924   0.081    −0.008   1.887
pHH2O                X           0.414   0.605   0.517    −0.111   1.333
CEC (cmolc kg−1)     X           0.453   0.610   2.742    −0.207   1.302
Kaqr (mg kg−1)       X           0.780   0.855   910      −48      1.507
Sand (%)             -           0.696   0.820   8.408    −0.207   2.257
Silt (%)             -           0.657   0.783   7.548    0.043    1.885
Clay (%)             -           0.586   0.749   2.141    0.319    2.134

NIRS
Soil property        R2      ρc      RMSE     Bias     RPIQ
Total Carbon (%)     0.723   0.814   1.851    −0.195   1.943
Total Nitrogen (%)   0.711   0.817   0.117    −0.013   1.508
pHH2O                0.669   0.791   0.386    −0.088   1.901
CEC (cmolc kg−1)     0.650   0.765   2.216    −0.230   1.781
Kaqr (mg kg−1)       0.213   0.427   1720     −6.478   0.914
Sand (%)             0.502   0.684   10.804   0.341    1.319
Silt (%)             0.421   0.600   9.818    −0.726   1.112
Clay (%)             0.452   0.661   2.504    0.268    1.215
NIRS Transform column: seven ‘X’ entries (row alignment unclear in the source).

Table 5: Performance parameters of the Cubist models derived for soil property prediction from the XRPD and NIRS datasets. ncal and neval are the number of samples used in the calibration and evaluation sets, respectively. An ‘X’ in the Transform field denotes cases where a square root or first derivative transform was applied to the XRPD or NIRS data, respectively. The Meaneval and SDeval are the mean and standard deviation of the soil property laboratory measurements for each evaluation set, respectively. All performance parameters (R2, ρc, RMSE, bias and RPIQ) relate to the evaluation set.


Figure 6: Feature plots from the Cubist XRPD calibration models for Kaqr, and the sand, silt and clay size fractions. Red = mean diffractogram. Black = variable use in decisions. Grey = variable use in multiple linear regression. The mean diffractogram has been square root transformed to emphasise minor peaks.


applies in the partitions defined in the Cubist model, which do not show any notable grouping based on the mineralogy. In combination with the performance of pHH2O prediction (see below and Table 5), this may be interpreted as an indication that the pHH2O of these Scottish soils is not well characterised by a model built primarily on mineralogical information, making it difficult to identify the precise contributions from specific minerals.

The Cubist model for CEC prediction utilises plagioclase, kaolinite and general phyllosilicate features to partition the data (Figure 4d). With respect to regression variables, there is concentrated use of features centred very generally at around 5.9 °2θ that are associated with expandable clays (i.e. vermiculite, smectite, or mixed-layer clays containing expandable layers), which typically exhibit high CEC values (Dixon and Schulze, 2002; Figure 4d; Table 4). Although the CEC of a soil is affected by the precise identity of the clay minerals and their contrasting CEC properties (Parfitt et al., 1995; Ma and Eggleton, 1999), the Cubist partitions based on phyllosilicate/clay mineral features display less differentiated effects on predictions of CEC. Generally, when partitions are made based on higher peak intensities of clay mineral features (including kaolinite, expandable clay and the (02,11) general non-basal phyllosilicate reflection), the partitioned samples display relatively high CEC values. The reverse applies when partitions are made based on low intensities of clay mineral features. This generalisation of the clay minerals' CEC properties by Cubist can be attributed to the difficulty of precisely identifying clay minerals from bulk soil XRPD analysis. This difficulty stems from clay minerals having similar structures and therefore broadly similar random powder diffraction patterns that are difficult to distinguish from one another, especially when mixed with other minerals (Moore and Reynolds, 1989).
Aside from the phyllosilicates, quartz is also selected by Cubist as a decision feature (Figure 4d, 60.15 °2θ), and is used to define groups of soil samples that display relatively low CEC values. Lastly, there is a notable presence of variables associated with diffusely scattered organic matter, which bears similarity to those observed in the CT and NT feature plots (Figure 4a-b; section 3.1). This presumably reflects the significant role of organic matter in defining the CEC of a soil (Parfitt et al., 1995).

For Kaqr prediction, the selected features are again largely dominated by phyllosilicate contributions (Figure 6a). Although feldspar minerals are the major sink of total soil K (Andrist-Rangel et al., 2006), the lack of feldspar features in the prediction of Kaqr reflects their resistance to aqua regia digests (Andrist-Rangel et al., 2006). The consistent use of variables in the 8.7–8.9 °2θ range (Table 4; Figure 6a) as regression parameters relates to mica and/or illitic minerals. Less specifically, the (02,11) non-basal diffraction band centred at 19.7 °2θ is used in 77 % of the multivariate regressions, and encompasses all dioctahedral phyllosilicates within the soil sample. Further dominant features relate to multiple phyllosilicate reflections (the (20,13) phyllosilicate band) between approximately 34 and 35 °2θ and trioctahedral phyllosilicate reflections from 28.9 to 29.3 °2θ (Figure 6a; Table 4). The principal use of phyllosilicate features to predict Kaqr is consistent with observations that this property is sensitive to the potassium associated with clay minerals (Andrist-Rangel et al., 2006, 2010); Cubist is therefore indirectly, and appropriately, relating phyllosilicate identity and concentration (encoded within the diffractograms) to Kaqr.

Performance parameters for the prediction of chemical soil properties from XRPD data (Table 5) show that pHH2O can only be predicted with poor agreement when applied to the evaluation set (ρc = 0.605). Although the variables selected by Cubist for CEC prediction align with the predominant soil features that define this soil property, it too can only be predicted with poor agreement (ρc = 0.610). When the poor agreements of the pHH2O and CEC predictions are combined with their relatively high RMSE values (0.517 and 2.742, respectively), it is evident that XRPD does not predict these properties well enough to advocate any wider predictive use in this case. This indicates that the combination of unprocessed bulk soil XRPD data with Cubist is inadequate for accurate pH or CEC prediction. Accurate prediction of these properties by Cubist would likely need to account for the phyllosilicate mineralogy more precisely, and also for the nature and composition of the organic matter, neither of which can be attained by bulk soil XRPD alone. In contrast, Kaqr can be predicted with substantial agreement (ρc = 0.855).
The predictive performance of Cubist for Kaqr indicates that this property is primarily influenced by mineralogical features that are well resolved by bulk soil XRPD measurements, and confirms phyllosilicate concentration and composition as the primary driver of Kaqr in soil (Andrist-Rangel et al., 2006, 2013). When the same chemical properties were predicted by Cubist using the NIRS dataset, the NIRS models produced better predictive performance for pHH2O and CEC, while the XRPD model was the better predictor of Kaqr (Table 5). The NIRS Cubist models predicted pHH2O with moderate agreement when applied to the evaluation set (ρc = 0.791), displaying a lower RMSE than the XRPD predictions (0.386 compared with 0.517). This performance suggests that the more detailed organic information encoded within NIRS relates more suitably to pHH2O and CEC than the mineralogically dominated XRPD data in these soils. With respect to Kaqr prediction, the NIRS Cubist models performed particularly poorly when compared to the XRPD models, yielding ρc = 0.427 (ρc from XRPD = 0.855). This again highlights how Kaqr is influenced primarily by phyllosilicate concentration and composition, which is not well characterised by NIRS.
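The 2θ positions quoted in this section can be related to the d-spacings listed in Table 4 through Bragg's law, d = λ / (2 sin θ). The sketch below shows the conversion under the assumption of Cu Kα radiation (λ ≈ 1.5418 Å); the wavelength is not stated in this section, so it is an assumption here, and the function names are hypothetical.

```python
import math

# Assumed X-ray wavelength in angstroms (Cu K-alpha); not stated in this section.
CU_KALPHA_ANGSTROM = 1.5418


def two_theta_to_d(two_theta_deg, wavelength=CU_KALPHA_ANGSTROM):
    """Convert a diffraction angle (degrees 2-theta) to a d-spacing (angstroms)."""
    theta = math.radians(two_theta_deg / 2.0)
    return wavelength / (2.0 * math.sin(theta))


def d_to_two_theta(d_spacing, wavelength=CU_KALPHA_ANGSTROM):
    """Convert a d-spacing (angstroms) to a diffraction angle (degrees 2-theta)."""
    return 2.0 * math.degrees(math.asin(wavelength / (2.0 * d_spacing)))


if __name__ == "__main__":
    # Positions mentioned in the text: mica/illite, (02,11) band, quartz.
    for angle in (8.8, 19.7, 60.15):
        print(f"{angle:6.2f} deg 2-theta -> d = {two_theta_to_d(angle):.2f} angstrom")
```

Under this assumption, the 8.7–8.9 °2θ variables correspond to a spacing of about 10 Å and the (02,11) band at 19.7 °2θ to about 4.5 Å.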


3.3. Textural properties


In predicting the sand, silt and clay size fractions, Cubist consistently selects the (02,11) phyllosilicate/clay mineral non-basal diffraction band (19.7–19.9 °2θ) in decision and regression variables (Figure 6b-d; Table 4). Cubist also utilises quartz (e.g. 60.1 °2θ) and feldspar (e.g. 27.8 °2θ) features to partition the data. It is generally found that the Cubist partitions based on an intense (02,11) phyllosilicate band are associated with lower predictions of sand and silt fractions, whilst the reverse applies to the predictions associated with intense quartz and feldspar features. This is in agreement with the understanding that soils with high sand fractions generally have a quartz-dominated mineralogy (FitzPatrick, 1980). Furthermore, it would be relatively unusual to observe feldspar minerals in the clay size fraction (Odom et al., 1976), although soils with strong glacial influence (rock flour) can be exceptions. Kaolinite regression variables are noticeable in the sand (e.g. 20.4 °2θ), silt and clay (e.g. 21.2–21.7 °2θ) predictions (Figure 6b-d; Table 4). For these Scottish soils, it is likely that high kaolinite contents are inherited from kaolinitic parent material, and such soils will tend to have high clay content (and thus lower sand/silt content). In summary, to predict textural soil properties, Cubist attempts to balance information relating to clay minerals, primary phyllosilicates, feldspars, and quartz, and to some degree this reflects how the different mineral groups naturally partition into different particle size fractions.

Performance parameters for the prediction of textural soil properties from XRPD data show the sand fraction to be predicted with substantial agreement by Cubist when applied to the evaluation set (ρc = 0.807; Table 5), while the silt and clay fractions were predicted with moderate agreement (ρc = 0.783 and 0.749, respectively).
The observation of soil particle size fractions being predicted with moderate to substantial agreement from XRPD data reflects how the break-up (i.e. size) and breakdown (i.e. chemical/physical weathering) of inorganic soil constituents is somewhat governed by their mineralogy (Dixon and Schulze, 2002). When predicting the same textural properties using the equivalent NIRS dataset, the sand, silt and clay Cubist models displayed inferior performance compared to their XRPD equivalents (Table 5). Ultimately, this highlights a relationship between soil mineralogy and soil texture that cannot be described by NIRS alone. At the same time, it reflects the value of national scale XRPD data in soil science and justifies its use in combination with data mining.
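The interplay between decision variables (which partition the samples) and regression variables (which enter the linear model fitted within each partition) can be illustrated with a toy example. The sketch below is a deliberately simplified stand-in, not Cubist: it uses a single, user-supplied split on one variable and ordinary least squares within each partition, whereas Cubist learns its conditions from the data. All names and the synthetic data are hypothetical and carry no relation to the NSIS measurements.

```python
import numpy as np


def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ...]."""
    design = np.column_stack([np.ones(len(X)), X])
    coefficients, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefficients


def predict_linear(coefficients, X):
    """Apply coefficients from fit_linear to new rows."""
    return np.column_stack([np.ones(len(X)), X]) @ coefficients


def fit_one_split_rules(X, y, feature, threshold):
    """Fit two linear models, one on each side of a fixed threshold."""
    low = X[:, feature] <= threshold
    return {
        "feature": feature,
        "threshold": threshold,
        "low": fit_linear(X[low], y[low]),
        "high": fit_linear(X[~low], y[~low]),
    }


def predict_rules(model, X):
    """Route each sample to its partition and apply that partition's model."""
    low = X[:, model["feature"]] <= model["threshold"]
    predictions = np.empty(len(X))
    predictions[low] = predict_linear(model["low"], X[low])
    predictions[~low] = predict_linear(model["high"], X[~low])
    return predictions


# Synthetic demonstration: column 0 acts as the decision variable and
# column 1 as the regression variable, with a different slope per partition.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 2))
y = np.where(X[:, 0] <= 0.5, 1.0 + 2.0 * X[:, 1], 5.0 - 3.0 * X[:, 1])

model = fit_one_split_rules(X, y, feature=0, threshold=0.5)
max_error = float(np.max(np.abs(predict_rules(model, X) - y)))
```

Because the synthetic response is exactly piecewise linear, the two fitted models recover the generating coefficients, and the sign of a regression variable's contribution can differ between partitions, which is the kind of behaviour interpreted from the feature plots above.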


4. Conclusions


Using a precision method of sample preparation involving micronisation and spray drying, this study has shown that XRPD can be used in combination with data mining to predict and interpret a range of organic, chemical and textural soil properties from a national scale dataset of mineral soils. Although crystalline minerals dominate the information from XRPD, the diffuse scattering (background) associated with X-ray amorphous soil organic matter can yield accurate prediction of CT and NT. More complex properties such as pH and CEC are not well predicted when data mining is applied to bulk soil XRPD data in this case. Nevertheless, the models select specific features of diffraction patterns that can be interpreted in terms of the mineralogical constituents that are influencing these properties. Notably, Kaqr reflects the concentration of phyllosilicate K and, for the Scottish soils presented here, is well predicted by the XRPD-based Cubist models. When compared to an equivalent NIRS dataset, the XRPD data yielded superior predictive models for six of the eight soil properties investigated. This justifies the application of data mining to soil XRPD data, and contributes to the alignment of soil XRPD with the recent advances in soil spectroscopy, where newly developed high-throughput, data-driven approaches are widely applied.


Acknowledgements


The authors would like to thank the many staff that assisted in the sampling and analysis of NSIS 1 and NSIS 2. In particular, we are grateful to Carrie Donald and Rachael Hill for NIR data collection, Helen Pendlowski for XRPD data collection, and Caroline Thomson for both NIR and XRPD data collection. This work was supported by a Macaulay Development Trust Fellowship. The support of the Scottish Government’s Rural and Environment Science and Analytical Services Division (RESAS) is also gratefully acknowledged.


References


Andrist-Rangel, Y., Hillier, S., Öborn, I., Lilly, A., Towers, W., Edwards, A. C., Paterson, E., 2010. Assessing potassium reserves in northern temperate grassland soils: A perspective based on quantitative mineralogical analysis and aqua-regia extractable potassium. Geoderma 158 (3-4), 303–314.
Andrist-Rangel, Y., Simonsson, M., Andersson, S., Öborn, I., Hillier, S., 2006. Mineralogical budgeting of potassium in soil: A basis for understanding standard measures of reserve potassium. Journal of Plant Nutrition and Soil Science 169 (5), 605–615.
Andrist-Rangel, Y., Simonsson, M., Öborn, I., Hillier, S., 2013. Acid-extractable potassium in agricultural soils: Source minerals assessed by differential and quantitative X-ray diffraction. Journal of Plant Nutrition and Soil Science 176 (3), 407–419.
Barnes, R. J., Dhanoa, M. S., Lister, S. J., 1989. Standard normal variate transformation and de-trending of near-infrared diffuse reflectance spectra. Applied Spectroscopy 43 (5), 772–777.
Barr, G., Dong, W., Gilmore, C. J., 2009. PolySNAP3: A computer program for analysing and visualizing high-throughput data from diffraction and spectroscopic sources. Journal of Applied Crystallography 42 (5), 965–974.
Bellon-Maurel, V., Fernandez-Ahumada, E., Palagos, B., Roger, J. M., McBratney, A., 2010. Critical review of chemometric indicators commonly used for assessing the quality of the prediction of soil attributes by NIR spectroscopy. Trends in Analytical Chemistry 29 (9), 1073–1081.
Chapman, S. J., Bell, J. S., Campbell, C. D., Hudson, G., Lilly, A., Nolan, A. J., Robertson, A. H. J., Potts, J. M., Towers, W., 2013. Comparison of soil carbon stocks in Scottish soils between 1978 and 2009. European Journal of Soil Science 64 (4), 455–465.
Chipera, S. J., Bish, D. L., 2002. FULLPAT: A full-pattern quantitative analysis program for X-ray powder diffraction using measured and calculated patterns. Journal of Applied Crystallography 35 (6), 744–749.
Cleveland, C. C., Liptzin, D., 2007. C:N:P stoichiometry in soil: Is there a "Redfield ratio" for the microbial biomass? Biogeochemistry 85 (3), 235–252.
Dixon, J. B., Schulze, D. G., 2002. Soil mineralogy with environmental applications. SSSA Book Series 7. Soil Science Society of America, Madison, WI.
FitzPatrick, E. A., 1980. Soils. Their formation, classification and distribution. Longman.
Hillier, S., 1999. Use of an air brush to spray dry samples for X-ray powder diffraction. Clay Minerals 34 (1), 127–135.
ICDD, 2016. PDF-4+ 2016 (Database). International Center for Diffraction Data, Newtown Square, PA, USA.
Jones, E. J., McBratney, A. B., 2016. In situ analysis of soil mineral composition through conjoint use of Visible, Near-Infrared and X-ray Fluorescence Spectroscopy. In: Digital Soil Morphometrics. Springer, pp. 51–62.


Kennard, R. W., Stone, L. A., 1969. Computer aided design of experiments. Technometrics 11, 137–148.
Kirkby, C. A., Kirkegaard, J. A., Richardson, A. E., Wade, L. J., Blanchard, C., Batten, G., 2011. Stable soil organic matter: A comparison of C:N:P:S ratios in Australian and other world soils. Geoderma 163 (3-4), 197–208.
Kramer, M. G., Lajtha, K., Aufdenkampe, A., 2017. Depth trends of soil organic matter C:N and 15N natural abundance controlled by association with minerals. Biogeochemistry, 1–12.
Kuhn, M., Quinlan, R., 2017. Cubist: Rule- and Instance-based regression modelling. R package version 0.2.1.
Lin, L. I.-K., 1989. A concordance correlation coefficient to evaluate reproducibility. Biometrics 45 (1), 255–268.
Ma, C. H. I., Eggleton, R. A., 1999. Cation Exchange Capacity of Kaolinite. Clays and Clay Minerals 47 (2), 174–180.
McGrath, S., 1987. Computerized quality control, statistics and regional mapping of the concentrations of trace and major elements in the soil of England and Wales. Soil Use and Management 3 (1), 31–38.
McGrath, S. P., Cunliffe, C. H., 1985. A simplified method for the extraction of the metals Fe, Zn, Cu, Ni, Cd, Pb, Cr, Co and Mn from soils and sewage sludges. Journal of the Science of Food and Agriculture 36, 794–798.
Meyer, D., Dimitriadou, E., Hornik, K., Weingessel, A., Leisch, F., 2017. e1071: Misc functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien, https://cran.r-project.org/package=e1071.
Minasny, B., McBratney, A. B., 2008. Regression rules as a tool for predicting soil properties from infrared reflectance spectroscopy. Chemometrics and Intelligent Laboratory Systems 94 (1), 72–79.
Minasny, B., McBratney, A. B., 2016. Digital soil mapping: A brief history and some lessons. Geoderma 264, 301–311.
Moore, D., Reynolds, R., 1989. X-ray diffraction and the identification and analysis of clay minerals. Oxford University Press.
Nagra, G., Burkett, D., Huang, J., Ward, C., Triantafilis, J., 2017. Field level digital mapping of soil mineralogy using proximal and remote-sensed data. Soil Use and Management 33 (3), 425–436.


Newman, A. C. D., 1984. The significance of clays in agriculture and soils. Phil. Trans. R. Soc. Lond. A 311, 375–389.
Nocita, M., Stevens, A., van Wesemael, B., Aitkenhead, M., Bachmann, M., Barthes, B., Dor, E. B., Brown, D. J., Clairotte, M., Csorba, A., Dardenne, P., Dematte, J. A. M., Genot, V., Guerrero, C., Knadel, M., Montanarella, L., Noon, C., Ramirez-Lopez, L., Robertson, J., Sakai, H., Soriano-Disla, J. M., Shepherd, K. D., Stenberg, B., Towett, E. K., Vargas, R., Wetterlind, J., 2015. Soil Spectroscopy: An Alternative to Wet Chemistry for Soil Monitoring. Advances in Agronomy 132, 139–159.
Odom, I. E., Doe, T. W., Dott, R. H., 1976. Nature of feldspar-grain size relations in some quartz-rich sandstones. Journal of Sedimentary Research 46 (4), 862–870.
Omotoso, O., McCarty, D. K., Hillier, S., Kleeberg, R., 2006. Some successful approaches to quantitative mineral analysis as revealed by the 3rd Reynolds Cup contest. Clays and Clay Minerals 54 (6), 748–760.
O’Rourke, S. M., Minasny, B., Holden, N., McBratney, A. B., 2016a. Synergistic Use of Vis-NIR, MIR, and XRF Spectroscopy for the Determination of Soil Geochemistry. Soil Sci. Soc. Am. J.
O’Rourke, S. M., Stockmann, U., Holden, N. M., McBratney, A. B., Minasny, B., 2016b. An assessment of model averaging to improve predictive power of portable vis-NIR and XRF for the determination of agronomic soil properties. Geoderma 279, 31–44.
Parfitt, R. L., Giltrap, D. J., Whitton, J. S., 1995. Contribution of organic matter and clay minerals to the cation exchange capacity of soils. Communications in Soil Science and Plant Analysis 26, 1343–1355.
Pérez-Fernández, E., Robertson, A. J., 2016. Global and local calibrations to predict chemical and physical properties of a national spatial dataset of Scottish soils from their near infrared spectra. Journal of Near Infrared Spectroscopy 24 (3), 305.
Quinlan, J. R., 1992. Learning with continuous classes. Machine Learning 92, 343–348.
R Core Team, 2017. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
Reeves, J. B., Smith, D. B., 2009. The potential of mid- and near-infrared diffuse reflectance spectroscopy for determining major- and trace-element concentrations in soils from a geochemical survey of North America. Applied Geochemistry 24 (8), 1472–1481.
Sanchez, P. A., Ahamed, S., Carré, F., Hartemink, A. E., Hempel, J., Huising, J., Lagacherie, P., McBratney, A. B., McKenzie, N. J., de Lourdes Mendonça-Santos, M., Minasny, B., Montanarella, L., Okoth, P., Palm, C. A., Sachs, J. D., Shepherd, K. D., Vågen, T.-G., Vanlauwe, B., Walsh, M. G., Winowiecki, L. A., Zhang, G.-L., 2009. Digital Soil Map of the World. Science 325, 6–7.
Schulze, D. G., 1989. An introduction to soil mineralogy. In: Amonette, J. E., Bleam, W. F., Schulze, D. G., Dixon, J. B. (Eds.), Soil mineralogy with environmental applications. Soil Science Society of America, Madison, WI, pp. 1–35.
Sila, A. M., Shepherd, K. D., Pokhariyal, G. P., 2016. Evaluating the utility of mid-infrared spectral subspaces for predicting soil properties. Chemometrics and Intelligent Laboratory Systems 153, 92–105.
Soriano-Disla, J. M., Janik, L. J., Viscarra Rossel, R. A., Macdonald, L. M., McLaughlin, M. J., 2014. The Performance of Visible, Near-, and Mid-Infrared Reflectance Spectroscopy for Prediction of Soil Physical, Chemical, and Biological Properties. Applied Spectroscopy Reviews 49 (2), 139–186.
Towett, E. K., Shepherd, K. D., Tondoh, J. E., Winowiecki, L. A., Lulseged, T., Nyambura, M., Sila, A., Vågen, T.-G., Cadisch, G., 2015. Total elemental composition of soils in Sub-Saharan Africa and relationship with soil forming factors. Geoderma Regional 5, 157–168.
Ulery, A. L., Drees, R., 2008. Methods of soil analysis: mineralogical methods. ASA-CSSA-SSSA.
Viscarra Rossel, R. A., Behrens, T., 2010. Using data mining to model and interpret soil diffuse reflectance spectra. Geoderma 158 (1-2), 46–54.
Viscarra Rossel, R. A., Behrens, T., Ben-Dor, E., Brown, D. J., Demattê, J. A. M., Shepherd, K. D., Shi, Z., Stenberg, B., Stevens, A., Adamchuk, V., et al., 2016. A global spectral library to characterize the world’s soil. Earth-Science Reviews 155 (January), 198–230.
Viscarra Rossel, R. A., Cattle, S. R., Ortega, A., Fouad, Y., 2009. In situ measurements of soil colour, mineral composition and clay content by vis-NIR spectroscopy. Geoderma 150 (3-4), 253–266.


Viscarra Rossel, R. A., Webster, R., 2012. Predicting soil properties from the Australian soil visible-near infrared spectroscopic database. European Journal of Soil Science 63 (6), 848–860.
