Partial least squares modeling of Hyperion image spectra for mapping agricultural soil properties

Tingting Zhang (a, b)*, Lin Li (a) and Baojuan Zheng (a)†

(a) Department of Earth Sciences, Indiana University-Purdue University, Indianapolis, 723 West Michigan St., Indianapolis, IN, USA 46202; (b) College of Earth Sciences, Jilin University, 2199 Jianshe St., Changchun, Jilin, China 130061

ABSTRACT
This paper investigated the capacity of Hyperion images coupled with partial least squares (PLS) analysis for mapping agricultural soil properties. Soil samples were collected from Cicero Creek Watershed in central Indiana and analyzed for soil moisture content (MC), soil organic matter (SOM), total carbon (C), total phosphorus (P), total nitrogen (N) and clay content. Two Hyperion scenes covering the watershed were acquired, calibrated and georeferenced, and image spectra were extracted from them. PLS modeling was conducted in two phases: in phase 1, all samples were used and outliers were identified and removed; in phase 2, the outlier-removed dataset was split into calibration and validation subsets. The results of both phases indicate that PLS modeling of Hyperion spectra is effective for predicting MC, SOM, total C and total N, but yields low correlations for total P and clay content. The low correlation for total P is attributed to the low correlation between SOM and total P. The poorest result, for clay content, is attributed to the low signal-to-noise ratio of Hyperion images in the short-wave infrared region. Further work is needed to improve the estimates of total P and clay content.
Keywords: Soil properties, Hyperion, PLS

1. INTRODUCTION
Accurate information on soil properties has important implications for environmental monitoring, modeling and precision agriculture. For example, soil organic matter (SOM), the largest carbon stock of the terrestrial biosphere, is related to the size and capacity of the soil microbial population and controls soil structural stability. Effective management of the soil C reservoir can help mitigate greenhouse gas emissions [1]. Soil moisture is related to energy exchange between the soil and the atmosphere and favors N₂O and CH₄ production; both soil moisture and clay content are important for plant growth and soil quality. Soil nutrient conditions determine fertilizer application rates, which are directly related to N₂O emission and to the capacity of the soil to consume atmospheric CH₄. Remote sensing of multiple soil properties (total C, SOM, total N, total P, clay, carbonate and moisture content) represents a major advance in technology for mapping the spatial distribution of SOM and soil C and for incorporating these spatial data into decision-making aimed at mitigating global warming. A number of laboratory investigations have characterized soil constituents using multivariate statistical analyses of laboratory and in situ reflectance spectra [2-4]. Among these methods, multiple linear regression (MLR), principal component regression (PCR) and partial least squares regression (PLS) are commonly used to quantify hyperspectral soil data. MLR uses a linear equation to relate a response variable (e.g., a chemical concentration) to two or more explanatory variables (e.g., reflectance at individual wavelengths). The number of wavelengths usable in MLR is limited, because using more spectral bands than samples leads to rank deficiency. Both PCR and PLS are full-spectrum methods.
PCR is simply principal component analysis (PCA) of the spectra followed by regression against chemical composition, whereas PLS is a rotated PCA applied to both the spectra and the chemical compositions that then finds the best relationship between them. Recently, several new statistical tools have been used for soil mapping, such as artificial neural networks (ANN) and boosted regression trees (BRT). In [5], BRT and PLS models were built with soils collected worldwide to determine soil organic carbon, inorganic carbon, clay, cation exchange capacity (CEC) and Fe. Compared with the many studies using laboratory and in situ reflectance spectra, fewer attempts have been made to map soil properties with airborne or satellite hyperspectral imagery. Such research is necessary because relationships between soil properties and spectra are not necessarily transferable to airborne or satellite hyperspectral images, owing to uncertainties in image radiometric and geometric corrections and to the effects of image spatial resolution and imaging conditions [6]. In [7], MLR was applied to estimate soil organic matter from field multispectral data, and in [8] MLR was applied to Digital Airborne Imaging Spectrometer (DAIS-7915) data. Soil organic carbon (SOC) was mapped using CASI-2 data [9] and estimated from AHS-160 data [10]. Hyperspectral Mapper (HyMap) data were used to map organic C and total N [11] and soil salinity [12]. In [13], SOC was predicted from Hyperion spectra through PLS modeling. Apart from [13], all of these studies used airborne or field-measured data. In this study, satellite Hyperion hyperspectral imagery coupled with PLS regression is proposed and assessed for estimating soil constituents, with the aim of examining the effectiveness of this procedure.

*[email protected]; phone 1 317 274 8383; fax 1 317 274 7966
†Current address: Geography Department, Virginia Tech, 115 Major Williams Hall, Blacksburg, VA, USA 24061

Remote Sensing and Modeling of Ecosystems for Sustainability VI, edited by Wei Gao, Thomas J. Jackson, Proc. of SPIE Vol. 7454, 74540P · © 2009 SPIE · CCC code: 0277-786X/09/$18 · doi: 10.1117/12.824635

Proc. of SPIE Vol. 7454 74540P-1

2. MATERIALS AND METHODS
2.1 Study area and soil samples
The study area is located in Cicero Creek Watershed in central Indiana (Fig. 1), where row-crop agriculture is the dominant land use. According to the State Soil Geographic (STATSGO) database, the main soil associations in the watershed are Crosby-Treaty-Miami, Miami-Crosby-Treaty, Patton-Del Rey-Crosby and Drummer-Toronto-Wingate. Fertilizers and pesticides have been the main sources of pollutants discharged into Morse Reservoir, causing yearly algal blooms in this drinking-water reservoir.

Fig. 1. Sampling locations in Cicero Creek Watershed

Soil samples were collected on the day of, or a few days after, satellite image acquisition. A total of 33 surface (0-2 cm) agricultural soil samples were collected, each from an area of about 20 × 20 cm. To obtain representative samples, about five samples were collected for each soil association. The geographic coordinates of each sampling site were recorded with 1 m accuracy using a global positioning system (GPS) unit. Samples were kept fresh in Ziploc bags (17 cm × 20 cm) and stored over ice in coolers before being transported to the laboratory, where they were stored in a refrigerator (-4 °C) until analysis. Each sample was analyzed for SOM, moisture, total C, total N, total P and clay content: SOM was determined with the loss-on-ignition (LOI) method; moisture by the gravimetric method; total N and total C by dry combustion; total P by strong acid digestion followed by the molybdate blue technique, with detection on a Shimadzu scanning spectrophotometer at 880 nm; and clay content through freeze drying, centrifugation and particle size analysis. Clay particle size analysis was performed with a Malvern Mastersizer 2000 laser particle size analyzer. The range and mean of each soil property are given in Table 1.

Table 1. Maximum, minimum and mean values of soil constituents

| Soil property | Maximum | Minimum | Mean |
|---|---|---|---|
| Moisture content (%) | 29.76 | 4.45 | 14.68 |
| SOM (%) | 13.29 | 2.55 | 5.05 |
| Total P (mg kg⁻¹ soil) | 1066.11 | 297.95 | 647.35 |
| Total C (%) | 6.37 | 0.77 | 1.91 |
| Total N (%) | 0.52 | 0.095 | 0.20 |
| Clay content (%) | 36.78 | 16.27 | 26.64 |

2.2 Hyperion image acquisition and processing
The Hyperion imaging spectrometer is on board the EO-1 satellite, which was launched on November 21, 2000. Hyperion images have a total of 242 channels at 10 nm spectral intervals over the 356-2577 nm region and are acquired at 30 m spatial resolution with a signal-to-noise ratio (SNR) of approximately 50:1. A Hyperion scene is 7.7 km wide (cross-track) and 42 km or 185 km long (along-track). The EO-1 satellite does not acquire data continuously; its sensors are activated only to collect specific scenes upon request. Two Hyperion scenes were acquired, on April 24 and May 7, 2007, respectively. There was no green vegetation cover in the fields when the images were acquired. The images were acquired at around 10:00 am local time (16:00 GMT). Fig. 1 shows the coverage of Cicero Creek Watershed by the two Hyperion images, each of which has 7288 × 1955 pixels. The Hyperion images were delivered by the United States Geological Survey (USGS) as radiance and were radiometrically and geometrically corrected so that reflectance spectra could be extracted from them and related to each soil property. The radiometric calibration consisted of the following steps: 1) a pixel shift was applied to samples 129–256 in the SWIR region to co-register this portion of the data with the VNIR observations; 2) the VNIR bands were multiplied by a scale factor of 1.08 and the SWIR bands by a scale factor of 1.18; and 3) all band wavelengths were increased by 2 nm. These steps followed the recommendations of the Goddard Space Flight Center (GSFC) [14] and were implemented with the Band Math and Edit ENVI Header functions of ENVI 4.5. The final radiometric step, converting the images from radiance to reflectance, was performed with ACORN, a commercial atmospheric correction software package [15].
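Calibration steps 2 and 3 amount to a per-band gain and a wavelength offset. The NumPy sketch below illustrates them under stated assumptions: the radiance cube is stored band-first as (bands, lines, samples), and the first 70 bands are VNIR and the remaining bands SWIR (Hyperion's nominal 70/172 split, assumed here rather than taken from the text). `N_VNIR` and `apply_gsfc_scaling` are hypothetical names. The SWIR pixel shift (step 1) is omitted because its direction and size are not given above. The authors performed these steps in ENVI, not Python.

```python
import numpy as np

# Assumed band layout: bands 1-70 are VNIR, bands 71-242 are SWIR.
N_VNIR = 70
VNIR_SCALE = 1.08           # step 2: scale factor for VNIR bands
SWIR_SCALE = 1.18           # step 2: scale factor for SWIR bands
WAVELENGTH_OFFSET_NM = 2.0  # step 3: added to every band centre wavelength


def apply_gsfc_scaling(radiance, wavelengths_nm, n_vnir=N_VNIR):
    """Scale VNIR/SWIR radiance and shift band centre wavelengths.

    radiance: array shaped (bands, lines, samples).
    wavelengths_nm: 1-D array with one centre wavelength per band.
    Returns new arrays; the inputs are left unchanged.
    """
    cube = np.array(radiance, dtype=np.float64)  # np.array copies by default
    if cube.shape[0] != len(wavelengths_nm):
        raise ValueError("one wavelength per band is required")
    cube[:n_vnir] *= VNIR_SCALE
    cube[n_vnir:] *= SWIR_SCALE
    shifted = np.asarray(wavelengths_nm, dtype=np.float64) + WAVELENGTH_OFFSET_NM
    return cube, shifted


# Usage with a tiny synthetic cube: 242 bands, 2 x 3 pixels, all radiance = 1.
cube, wl = apply_gsfc_scaling(np.ones((242, 2, 3)), np.linspace(356, 2577, 242))
```

Keeping the scale factors as named constants makes it easy to check them against the GSFC recommendation [14] before running the step on real scenes.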
For the geometric correction, the Hyperion images were rectified against 2006 aerial photographs of Hamilton and Tipton Counties, Indiana. The high-resolution (2 m) aerial photographs were degraded to 30 m spatial resolution, and pairs of ground control points (GCPs) were manually selected from the reference aerial photographs and the Hyperion images. A bilinear warping method was applied to project each Hyperion image into the Universal Transverse Mercator (UTM) Zone 16 coordinate system, NAD 1983 datum. Registration accuracy was assessed with the ENVI dynamic overlay function. Because the two scenes were acquired on different days, differences in atmospheric conditions and sensor behavior could make the same ground object appear spectrally different in the two scenes. A normalization correction was therefore applied to the scene acquired on May 7, 2007, using the April 24 image as the master image. After normalization, the two images were mosaicked, the soil sampling sites were projected onto the mosaic using the coordinates recorded by the GPS unit, and the Hyperion image spectra at these sites (Fig. 2) were extracted to build spectral-chemical compositional models.
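The text does not state which normalization correction was used. One common choice, assumed here purely for illustration, is relative radiometric normalization by per-band linear regression: pixels in the overlap of the two scenes are used to fit a gain and offset that map the May 7 (slave) values onto the April 24 (master) values, and the fitted line is then applied to the whole slave scene. `fit_band_normalization` and `apply_band_normalization` are hypothetical names.

```python
import numpy as np


def fit_band_normalization(slave_overlap, master_overlap):
    """Fit a per-band line so that gain * slave + offset approximates master.

    Assumed method (per-band least squares), not necessarily the one used in the study.
    Both inputs are (bands, n_pixels) arrays holding the same overlap pixels.
    Returns (gains, offsets), each of shape (bands,).
    """
    slave = np.asarray(slave_overlap, dtype=np.float64)
    master = np.asarray(master_overlap, dtype=np.float64)
    gains = np.empty(slave.shape[0])
    offsets = np.empty(slave.shape[0])
    for b in range(slave.shape[0]):
        # np.polyfit with deg=1 returns [slope, intercept] of master vs. slave.
        gains[b], offsets[b] = np.polyfit(slave[b], master[b], deg=1)
    return gains, offsets


def apply_band_normalization(slave_cube, gains, offsets):
    """Apply per-band gain and offset to a (bands, lines, samples) cube."""
    cube = np.asarray(slave_cube, dtype=np.float64)
    return cube * gains[:, None, None] + offsets[:, None, None]


# Usage: synthetic overlap in which the master is exactly 2 * slave + 5 in every band.
rng = np.random.default_rng(0)
slave_px = rng.uniform(0.0, 100.0, size=(3, 50))
gains, offsets = fit_band_normalization(slave_px, 2.0 * slave_px + 5.0)
normalized = apply_band_normalization(slave_px.reshape(3, 5, 10), gains, offsets)
```

In practice the overlap pixels would be restricted to targets expected to be stable between the two dates; this sketch uses all supplied pixels.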

Fig. 2. Hyperion image spectra smoothed for clarity (reflectance in % × 100 vs. wavelength in nm)

2.3 Partial Least Squares (PLS) Modeling
Partial least squares (PLS) modeling was used to build relationships between soil property parameters and Hyperion reflectance spectra. PLS is a standard multivariate regression method developed by Herman Wold [16, 17]; it uses a few eigenvectors of the explanatory variables whose scores not only explain the variance of the explanatory variables but are also highly correlated with the response variables. A simplified PLS model consists of two outer relations (equations 1 and 2), which describe the eigenstructure decompositions of the matrix of explanatory variables (i.e., spectral bands) and of the matrix of response variables (e.g., SOM content), and an inner relation (equation 3) that links the score matrices resulting from these two decompositions [18]:

$$\mathbf{X} = \mathbf{T}\mathbf{P}' + \mathbf{E} \tag{1}$$

$$\mathbf{Y} = \mathbf{U}\mathbf{Q}' + \mathbf{F} \tag{2}$$

$$\mathbf{U} = \mathbf{T}\mathbf{B} \tag{3}$$

The first outer relation is derived by applying principal component analysis (PCA) to X, yielding the score matrix T, the loading matrix P and an error matrix E. Similarly, the second outer relation is derived by decomposing Y into the score matrix U, the loading matrix Q and the error term F. The inner relation is a multiple linear regression between the score matrices U and T, in which B is a regression coefficient matrix determined by least-squares minimization. The prime denotes the matrix transpose. Y is then computed as:

$$\mathbf{Y} = \mathbf{T}\mathbf{B}\mathbf{Q}' + \mathbf{F} \tag{4}$$
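Because each soil property was modeled separately, equations 1-4 reduce to the single-response case (PLS1), in which the product of B and Q' collapses to one coefficient per latent variable. A minimal NumPy sketch of one common way to fit PLS1, the NIPALS algorithm, is shown below. It is an illustration under that assumption, not the software used in this study, and `pls1_fit` and `pls1_predict` are hypothetical names. Both X and y are mean-centred, matching the pretreatment used here.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model (PLS1) with NIPALS on mean-centred data.

    X: (m samples, k bands) spectra; y: (m,) property values.
    Returns (x_mean, y_mean, coef) so that y_hat = (X - x_mean) @ coef + y_mean.
    """
    X = np.asarray(X, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E = X - x_mean   # X residual; starts as centred X (E in eq. 1)
    f = y - y_mean   # y residual; starts as centred y (F in eq. 2)
    W, P, c = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)       # weight vector of unit length
        t = E @ w                    # X scores: one column of T
        tt = t @ t
        p = E.T @ t / tt             # X loadings: one column of P
        c_a = f @ t / tt             # plays the role of B Q' for this component
        E = E - np.outer(t, p)       # deflate X
        f = f - c_a * t              # deflate y
        W.append(w)
        P.append(p)
        c.append(c_a)
    W, P, c = np.array(W).T, np.array(P).T, np.array(c)
    coef = W @ np.linalg.solve(P.T @ W, c)  # regression vector for centred X
    return x_mean, y_mean, coef


def pls1_predict(model, X):
    """Predict property values for new spectra with a fitted PLS1 model."""
    x_mean, y_mean, coef = model
    return (np.asarray(X, dtype=np.float64) - x_mean) @ coef + y_mean


# Usage on synthetic data: 20 "spectra" with 5 bands and a noise-free linear response.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(20, 5))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 4.0
model = pls1_fit(X_demo, y_demo, n_components=5)
y_hat = pls1_predict(model, X_demo)
```

With as many latent variables as linearly independent bands, PLS1 reproduces ordinary least squares, so the noise-free demo is fitted exactly. With fewer latent variables the fit uses only the leading score directions, which is why the number of latent variables must be chosen carefully.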

The goal of PLS modeling is to minimize the norm of F while maximizing, through the inner relation, the covariance between X and Y. Selecting the optimal number of latent variables is essential for building a robust PLS model; here it was determined with the leave-one-out cross-validation method. Given a set of m samples, m − 1 samples are used to develop a calibration model, and the concentration of the left-out sample is predicted with that model. This process is repeated until each sample has been left out once. The predicted error sum of squares (PRESS) is then calculated as:

$$\mathrm{PRESS} = \sum_{i=1}^{m} \left(\hat{y}_{(i)} - y_i\right)^2 \tag{5}$$

where $\hat{y}_{(i)}$ and $y_i$ are the estimated and actual concentrations of the i-th left-out sample, respectively. The root mean square error of cross-validation (RMSECV) for each PLS model with a given number of latent variables is expressed as:

$$\mathrm{RMSECV}_j = \sqrt{\frac{\mathrm{PRESS}_j}{m}} \tag{6}$$

where j is the number of latent variables. In general, the number of latent variables that yields the minimum RMSECV is considered optimal.

For each soil property, PLS modeling was carried out in two phases: the first was used to detect outliers for quality control, and the second was run on calibration and validation datasets from which the outliers had been removed. Two statistical indices, leverage and studentized residual, were used to screen for outliers. Leverage measures the influence of a given sample on a PLS model and is defined as the variance of the sample's vector of PLS factor scores, weighted by the covariance matrix of the factor score matrix T of the calibration dataset [19]. The studentized residual indicates the lack of fit for a sample's property content and is assumed to follow a standard normal distribution (zero mean, unit variance). On a scatter plot of leverage vs. studentized residual, a sample was identified as an outlier if the absolute value of its studentized residual was at least 3 or its leverage was at least (3 + 3N)/m [19], where N is the optimal number of PLS factors. The identified outliers were discarded in the second-phase PLS analysis.

For the second-phase PLS analysis, each outlier-removed dataset was divided into two subsets, one for calibration and the other for validation. The samples were first sorted in ascending or descending order, validation samples were then selected at random, and the remaining samples were used for calibration. Previous studies have shown that data pretreatment improves PLS performance [20, 21]. Although a wide range of pretreatment methods is available, including mean centering, autoscaling, derivatives, smoothing, multiplicative scatter correction and orthogonal signal correction, only mean centering was applied to the dataset.
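The leave-one-out procedure of equations 5 and 6, together with the minimum-RMSECV rule for choosing the number of latent variables, can be sketched in NumPy as follows. The sketch assumes a compact NIPALS PLS1 model on mean-centred data as a stand-in for the PLS software actually used; `pls1_fit_predict`, `loo_rmsecv` and `select_n_components` are hypothetical names, and the demo data are synthetic.

```python
import numpy as np


def pls1_fit_predict(X_train, y_train, X_test, n_components):
    """Compact NIPALS PLS1 (assumed stand-in): fit on training data, predict X_test."""
    x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
    E, f = X_train - x_mean, y_train - y_mean   # mean centering
    W, P, c = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)
        t = E @ w
        tt = t @ t
        p = E.T @ t / tt
        c_a = f @ t / tt
        E, f = E - np.outer(t, p), f - c_a * t  # deflate X and y
        W.append(w)
        P.append(p)
        c.append(c_a)
    W, P = np.array(W).T, np.array(P).T
    coef = W @ np.linalg.solve(P.T @ W, np.array(c))
    return (X_test - x_mean) @ coef + y_mean


def loo_rmsecv(X, y, n_components):
    """Leave-one-out PRESS (eq. 5) turned into RMSECV (eq. 6) for one model size."""
    m = len(y)
    press = 0.0
    for i in range(m):
        keep = np.arange(m) != i                  # the m - 1 calibration samples
        y_hat = pls1_fit_predict(X[keep], y[keep], X[i:i + 1], n_components)[0]
        press += (y_hat - y[i]) ** 2
    return float(np.sqrt(press / m))


def select_n_components(X, y, max_components):
    """Return (number of latent variables with minimum RMSECV, RMSECV for 1..max)."""
    X = np.asarray(X, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    rmsecv = [loo_rmsecv(X, y, j) for j in range(1, max_components + 1)]
    return int(np.argmin(rmsecv)) + 1, rmsecv


# Usage: 27 synthetic "spectra" with 30 bands driven by two hidden factors plus noise.
rng = np.random.default_rng(2)
scores = rng.normal(size=(27, 2))
loadings = rng.normal(size=(2, 30))
X_demo = scores @ loadings + 0.05 * rng.normal(size=(27, 30))
y_demo = scores @ np.array([1.5, -0.8]) + 0.05 * rng.normal(size=27)
best_j, rmsecv = select_n_components(X_demo, y_demo, max_components=6)
```

Because every left-out prediction comes from a model that never saw that sample, the RMSECV curve typically stops improving once the extra latent variables start fitting noise, which is what the minimum-RMSECV rule exploits.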
Mean centering subtracts the mean of each band from the Hyperion image spectra; the mean of each soil property was likewise subtracted from its measured values. Mean centering was chosen for its simplicity and because it adds no complexity to the interpretation of PLS results.
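The leverage and studentized-residual screening used in phase 1 can be sketched as follows, given the score matrix T and the calibration residuals of a fitted PLS model. The formulas are assumptions, since [19] is not reproduced here: leverage is taken as 1/m plus $t_i'(\mathbf{T}'\mathbf{T})^{-1}t_i$, the studentized residual as $e_i / (s\sqrt{1-h_i})$ with $s^2 = \sum e^2/(m-N-1)$, and the leverage limit as $(3+3N)/m$. `screen_outliers` is a hypothetical name.

```python
import numpy as np


def screen_outliers(T, residuals, resid_limit=3.0):
    """Flag outliers from PLS scores and calibration residuals (assumed formulas).

    T: (m, N) score matrix of the calibration model (N latent variables).
    residuals: (m,) measured minus fitted property values.
    Returns (leverage, studentized residuals, boolean outlier mask).
    """
    T = np.asarray(T, dtype=np.float64)
    e = np.asarray(residuals, dtype=np.float64)
    m, n_lv = T.shape
    # Leverage: 1/m for the mean centering plus t_i' (T'T)^-1 t_i for each sample.
    leverage = 1.0 / m + np.einsum("ij,jk,ik->i", T, np.linalg.inv(T.T @ T), T)
    # Residual standard deviation with m - N - 1 degrees of freedom.
    s = np.sqrt(e @ e / (m - n_lv - 1))
    student = e / (s * np.sqrt(1.0 - leverage))
    leverage_limit = (3.0 + 3.0 * n_lv) / m
    outliers = (np.abs(student) >= resid_limit) | (leverage >= leverage_limit)
    return leverage, student, outliers


# Usage: 28 samples, 4 latent variables, one sample with a large residual.
rng = np.random.default_rng(3)
T_demo = rng.normal(size=(28, 4))
T_demo -= T_demo.mean(axis=0)        # scores of mean-centred data have zero mean
resid_demo = 0.1 * rng.normal(size=28)
resid_demo[5] = 2.0                  # a badly fitted sample
lev, stud, flags = screen_outliers(T_demo, resid_demo)
```

The leverages sum to N + 1, so (3 + 3N)/m is three times the average leverage; this is one way to read the threshold above.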

3. RESULTS
3.1 PLS modeling: phase 1
Because two soil samples fell outside the Hyperion image coverage and the spectra of three other samples were of poor quality, 28 of the 33 samples were used to identify outliers in the PLS models for the individual soil constituents: moisture content, SOM, total P, total C, total N and clay content. Using the leverage and studentized-residual criteria, one outlier was identified in each of the models for moisture, SOM, total C and total N; these outliers were excluded from the second-phase analysis. Fig. 3 shows the plot of leverage vs. studentized residual for SOM, in which the sample with a residual magnitude greater than 3 appears as an outlier. For total P, the two samples with the largest studentized residuals were discarded because of their extremely low P content, although neither was identified as an outlier by the leverage and studentized-residual criteria.

Fig. 3. Studentized residual for SOM (studentized residual vs. leverage)

After the outliers were removed from each dataset, PLS was applied to model each soil property; the results are shown in Table 2. According to the coefficient of determination (R²), PLS gave the best estimates for SOM (R² = 0.89) and total C (R² = 0.86), followed by moisture (R² = 0.79), total N (R² = 0.70) and total P (R² = 0.69). PLS gave the poorest estimate for clay content (R² = 0.49), although the slope for clay (0.9865) is close to 1. Estimated and measured soil property values are compared in Fig. 4.

Table 2. Calibration results using all usable samples

| Soil property | No. of samples | No. of LVs | Calibration RMSEC | Calibration R² |
|---|---|---|---|---|
| Moisture content (%) | 27 | 4 | 2.34 | 0.79 |
| SOM (%) | 27 | 6 | 0.65 | 0.89 |
| Total P (mg kg⁻¹ soil) | 26 | 3 | 93.43 | 0.69 |
| Total C (%) | 27 | 6 | 0.37 | 0.86 |
| Total N (%) | 27 | 4 | 0.039 | 0.70 |
| Clay content (%) | 28 | 4 | 3.16 | 0.49 |

Fig. 4. PLS results (predicted vs. measured) for all samples after removing outliers: a) moisture content (y = 0.9763x, R² = 0.79); b) SOM (y = 0.9859x, R² = 0.89); c) total P (y = 0.9822x, R² = 0.69); d) total C (y = 0.9717x, R² = 0.86); e) total N (y = 0.9666x, R² = 0.70); f) clay content (y = 0.9865x, R² = 0.49).

3.2 PLS modeling: phase 2
Eight samples were randomly selected for validation from each outlier-removed dataset, and the remaining samples (18-20) were used for calibration to estimate each soil property in this phase of PLS modeling. The PLS results are summarized in Table 3. In terms of the coefficient of determination, PLS calibration was consistently good for all soil properties, with R² ranging from 0.80 to 0.89, except for total P (R² = 0.71). Although R² values were lower than for calibration, PLS validation performed reasonably well for all soil properties, with R² ranging from 0.55 to 0.67, except for clay content, which gave a poor result (R² = -0.50). Note that both PLS calibration and validation produced slopes close to 1 for all soil properties, as shown in Figs. 5-10.

Table 3. Results from PLS modeling for split subsets

| Soil property | No. of LVs | Calibration R² | RMSEC | Validation R² | RMSEP |
|---|---|---|---|---|---|
| Moisture content (%) | 5 | 0.89 | 1.76 | 0.60 | 3.12 |
| SOM (%) | 5 | 0.87 | 0.74 | 0.67 | 1.27 |
| Total P (mg kg⁻¹ soil) | 3 | 0.71 | 87.66 | 0.55 | 104.35 |
| Total C (%) | 4 | 0.81 | 0.46 | 0.63 | 0.49 |
| Total N (%) | 4 | 0.80 | 0.037 | 0.64 | 0.052 |
| Clay content (%) | 4 | 0.87 | 1.61 | -0.50 | 4.78 |
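The tables report R², RMSEC and RMSEP, and the figures report fitted lines of the form y = bx. A minimal sketch of these statistics follows, under three assumptions: R² is computed as 1 − SS_res/SS_tot between measured and predicted values (a squared correlation coefficient cannot be negative, so the validation R² of -0.50 for clay points to this definition); the RMSE divides by the number of samples (some texts use degrees of freedom instead); and the plotted slope comes from a least-squares line forced through the origin. The function names are hypothetical.

```python
import numpy as np


def r2_score(measured, predicted):
    """Coefficient of determination 1 - SS_res / SS_tot; negative if worse than the mean."""
    y = np.asarray(measured, dtype=np.float64)
    y_hat = np.asarray(predicted, dtype=np.float64)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def rmse(measured, predicted):
    """Root mean square error (RMSEC on calibration, RMSEP on validation samples)."""
    y = np.asarray(measured, dtype=np.float64)
    y_hat = np.asarray(predicted, dtype=np.float64)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def slope_through_origin(measured, predicted):
    """Least-squares slope b of predicted = b * measured (line forced through 0)."""
    x = np.asarray(measured, dtype=np.float64)
    y_hat = np.asarray(predicted, dtype=np.float64)
    return float(x @ y_hat / (x @ x))


# Usage: four hypothetical clay values (%) with poor predictions.
measured = [20.0, 25.0, 30.0, 35.0]
predicted = [30.0, 25.0, 20.0, 35.0]
r2 = r2_score(measured, predicted)                 # about -0.6
err = rmse(measured, predicted)                    # about 7.07
b = slope_through_origin(measured, predicted)      # about 0.968
```

The toy example has a slope close to 1 yet a negative R², the same pattern as the clay validation result: a slope near 1 on a line through the origin does not by itself indicate accurate predictions.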

Fig. 5. Measured moisture content vs. moisture content predicted by PLS modeling with Hyperion spectra: a) results for calibration (y = 0.9876x, R² = 0.89), and b) results for validation (y = 0.8858x, R² = 0.60).

Fig. 6. Measured SOM vs. SOM predicted by PLS modeling with Hyperion spectra: a) results for calibration (y = 0.9818x, R² = 0.87), and b) results for validation (y = 0.9932x, R² = 0.67).

Fig. 7. Measured total P vs. total P predicted by PLS modeling with Hyperion spectra: a) results for calibration (y = 0.9852x, R² = 0.71), and b) results for validation (y = 0.9917x, R² = 0.55).

Fig. 8. Measured total C vs. total C predicted by PLS modeling with Hyperion spectra: a) results for calibration (y = 0.9572x, R² = 0.81), and b) results for validation (y = 0.9081x, R² = 0.63).

Fig. 9. Measured total N vs. total N predicted by PLS modeling with Hyperion spectra: a) results for calibration (y = 0.9695x, R² = 0.80), and b) results for validation (y = 0.9351x, R² = 0.64).

Fig. 10. Measured clay content vs. clay content predicted by PLS modeling with Hyperion spectra: a) results for calibration (y = 0.9967x, R² = 0.87), and b) results for validation (y = 1.0007x, R² = -0.50).

4. DISCUSSION
The results of PLS modeling in both phases indicate that SOM was modeled with the highest accuracy: R² = 0.89 in phase 1, and R² = 0.87 for calibration and R² = 0.67 for validation in phase 2. These results for SOM are expected and consistent with a number of previous studies showing that SOM can be reliably estimated using imaging spectroscopy [22]. The high performance of PLS for SOM is attributed to its use of full-spectrum information and to the dominant effect of SOM on the VNIR-SWIR spectral response. Although many previous studies have successfully estimated SOM from laboratory or in situ reflectance spectra [10, 23], this study shows that Hyperion images can be used to generate detailed SOM maps at large scales.


SOM consists of organic carbon, organic nitrogen and organic phosphorus. In our dataset, total C and total N are highly correlated with SOM, with correlation coefficients of 0.93 and 0.95, respectively, indicating that the mineral fraction contributes little to total C and N in the analyzed soils. This is reasonable given that the soil samples were collected from farm fields. These high correlations also explain why total C and N can be estimated with high accuracy: R² reaches 0.86 for total C and 0.70 for total N in phase 1, and 0.63 for total C and 0.64 for total N in the phase 2 validation. However, the estimates for total P are less accurate than those for total C and N, possibly because of the low correlation between total P and SOM in the analyzed samples: the correlation coefficient between total P and SOM is only 0.30, much lower than those for total C and N. In addition, two samples with the lowest total P content (
