INT. J. REMOTE SENSING, 10 MAY, 2004, VOL. 25, NO. 9, 1723–1732
An artificial neural network model for estimating crop yields using remotely sensed information

D. JIANG*, X. YANG, N. CLINTON and N. WANG

Resource and Environment Data Center, Institute of Geographic Science and Natural Resources Research, Chinese Academy of Sciences, Beijing P.O. Box 9717, 100101, People's Republic of China
Center for the Assessment and Monitoring of Forest and Environmental Resources, University of California, Berkeley, California, USA

(Received 23 April 2001; in final form 29 April 2003)

Abstract. Crop yield forecasting is an important task for researchers in remote sensing. Traditional statistical modelling (especially regression models) has difficulty representing the nonlinear, multi-factor functions of the cropland ecosystem. This paper describes the successful application of an artificial neural network, trained with the back-propagation algorithm, in developing a model for crop yield forecasting. The model has been adapted and calibrated using ground survey and statistical data, and it has proven to be stable and highly accurate.
1. Introduction

The establishment of models for estimating crop yield is a useful and challenging task in remote sensing research. Scientists in the USA, Europe, Asia and elsewhere have done much research on this subject, and different types of yield estimation models have appeared since the 1980s. However, because agricultural ecosystems are complex and nonlinear, models based on traditional statistical algorithms have had problems (Kimes et al. 1998, Jiang 2000).

There are several types of yield estimation models that use remotely sensed data. They can be classified as either theoretical or experiential models (Wang 1996). Theoretical models use differential equations to emulate the primary principles of agricultural ecosystems by simulating the response relationship between yield and environmental factors. Unfortunately, the physical relationships in agricultural ecosystems are very complex and, in most cases, not understood. Consequently, researchers are often forced to make simplifying assumptions in order to solve the equations, and satisfactory results are rarely achieved.

A more practical approach is an experiential model. There are two types of experiential models: linear and nonlinear. Most physical and biological processes are nonlinear, so simple linear models often perform poorly (Kimes and Gastellu-Etchegorry 1999). In some recent studies, researchers have recognized that nonlinear models are more realistic and potentially more accurate. However, many of them are overly simplistic because it is difficult to know the appropriate nonlinear forms of cropland ecosystem functions (Hayes and Decher 1996, Rasmussen 1998).

Artificial neural networks (ANN) have proved to be a more powerful and self-adaptive method of yield estimation than traditional linear and simple nonlinear analyses (Simpson 1994, Baret et al. 1995, Jiang 2000). This method applies a nonlinear response function iteratively within a network structure in order to learn the complex functional relationship between input and output training data. Once trained, an ANN model retains the learned functional relationship and can be used for further calculation. For these reasons, the ANN concept has been widely used to develop models, especially for strongly nonlinear, complicated systems (e.g. Louis and Yan 1998).

The objective of this study was to apply an ANN as an efficient and powerful tool for establishing a yield forecasting model for winter wheat in He Nan province in north China in 1999. Most of the parameters were derived from National Oceanic and Atmospheric Administration (NOAA) Advanced Very High Resolution Radiometer (AVHRR) images, although historical statistical yield data were also used. Yields of winter wheat were calculated at the county level.

*Corresponding author; e-mail: [email protected]
International Journal of Remote Sensing ISSN 0143-1161 print/ISSN 1366-5901 online © 2004 Taylor & Francis Ltd http://www.tandf.co.uk/journals DOI: 10.1080/0143116031000150068
2. Study region

Winter wheat is the primary crop in north China. The study region is He Nan province, which has the largest winter wheat planting area in China (about 7.1 million hectares in 1999). The province, with a total area of 167 000 km², is located in the centre of China (figure 1). Yields of winter wheat in this area are mainly controlled by solar radiation, water stress conditions, temperature and soil conditions. Yields per unit area generally range from 2000 kg ha⁻¹ to 7500 kg ha⁻¹.
Figure 1. Location of the study region: He Nan province.
3. Data acquisition

Winter wheat yields are affected by many factors, such as sunlight supply, temperature, water stress and soil conditions. These factors should be measured using appropriate indices. In this research, we selected five indices to represent these factors: the Normalized Difference Vegetation Index (NDVI), absorbed photosynthetically active radiation (APAR), surface temperature (Ts), a water stress index, and the average crop yield over the last 10 years. The first four indices can be retrieved from NOAA AVHRR data.

Daily 1B format NOAA AVHRR images were prepared by the China National Satellite Meteorological Center (CNSMC) for the whole growing season (from October 1998 to the end of June 1999). Geometric correction and radiance validation were done by CNSMC. In order to minimize cloud contamination and atmospheric attenuation, the daily images were composited by decade (10-day period). After this processing, a decade-composited, equal latitude and longitude NOAA AVHRR dataset with 1 km × 1 km resolution was obtained for further calculation. Average crop yield per unit area, which represented soil conditions in some areas, was derived from government statistical data.

3.1. NDVI

NDVI has proved to be one of the most efficient indices of growing conditions for crops. It can be derived from the first channel (red) and the second channel (near-infrared) of NOAA AVHRR data using the following formula (Tucker 1979):

$$\mathrm{NDVI} = \frac{CH_2 - CH_1}{CH_2 + CH_1} \tag{1}$$
where CH1 and CH2 are the values of the first and second channels of NOAA AVHRR.

3.2. APAR

Solar radiation is the most important source of energy for crops. APAR represents the portion of radiation absorbed by crop leaves for photosynthesis. It has been shown to have a close relationship with net primary production (NPP) and crop yield (Prince and Goward 1996). It is calculated by the following formula (Wiegand et al. 1991, Asrar et al. 1992):

$$\mathrm{APAR} = \mathrm{PAR} \cdot \mathrm{FAPAR} \tag{2}$$

In equation (2), PAR is the daily photosynthetically active radiation and FAPAR is the fraction of photosynthetically active radiation that is absorbed, which can be estimated approximately using $\mathrm{FAPAR} = 1.89762\,(\mathrm{NDVI} + 0.001)$ (Zhang 1998). PAR can be calculated using $\mathrm{PAR} = \eta \cdot Q_{total}\,(a + b \cdot n/N)$, where Qtotal is the daily total radiation, η is the percentage of PAR in the total radiation, n is the actual sunlight hours, N is the theoretical sunlight hours, and a and b are regional coefficients (Sellers et al. 1992). Zhang (1996) provided approximate monthly values of a and b in north China (table 1).

Table 1. Suggested values of regional coefficients a and b (Zhang 1996).

Month  Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
a      0.06  0.02  0.19  0.09  0.03  0.07  0.11  0.04  0.12  0.09  0.12  0.08
b      0.70  0.76  0.47  0.62  0.73  0.65  0.54  0.68  0.57  0.60  0.56  0.64
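The chain from channel values to APAR described in §3.1 and §3.2 can be sketched as follows. This is a minimal illustration rather than the authors' processing code: the function names, the example channel values, the sunshine hours and the PAR fraction of 0.5 are assumptions made for the example. The FAPAR coefficient is used as printed in the text, and the monthly coefficients are transcribed from table 1.

```python
# Monthly regional coefficients (a, b), transcribed from table 1 (Zhang 1996).
REGIONAL_COEFFS = {
    1: (0.06, 0.70), 2: (0.02, 0.76), 3: (0.19, 0.47), 4: (0.09, 0.62),
    5: (0.03, 0.73), 6: (0.07, 0.65), 7: (0.11, 0.54), 8: (0.04, 0.68),
    9: (0.12, 0.57), 10: (0.09, 0.60), 11: (0.12, 0.56), 12: (0.08, 0.64),
}


def ndvi(ch1, ch2):
    """Equation (1): NDVI from AVHRR channel 1 (red) and channel 2 (near-infrared)."""
    return (ch2 - ch1) / (ch2 + ch1)


def fapar(ndvi_value):
    """Approximate FAPAR from NDVI, with the coefficients as printed in the text."""
    return 1.89762 * (ndvi_value + 0.001)


def par(q_total, sun_hours, max_sun_hours, month, par_fraction):
    """PAR from daily total radiation, the sunshine ratio n/N and table 1."""
    a, b = REGIONAL_COEFFS[month]
    return par_fraction * q_total * (a + b * sun_hours / max_sun_hours)


def apar(par_value, fapar_value):
    """Equation (2): APAR = PAR * FAPAR."""
    return par_value * fapar_value


# Hypothetical April pixel; none of these input values come from the paper.
v = ndvi(ch1=0.10, ch2=0.30)
daily_par = par(q_total=20.0, sun_hours=8.0, max_sun_hours=12.0,
                month=4, par_fraction=0.5)
daily_apar = apar(daily_par, fapar(v))
print(round(v, 3), round(daily_par, 3), round(daily_apar, 3))
```

Keeping the table 1 coefficients in a dictionary keyed by month number makes it easy to apply the month-specific values to each composite period of the growing season.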
3.3. Canopy surface temperature

Surface temperature of the crop canopy is an integrated index; it represents not only the sunlight radiated onto leaves, but also the intensity of evaporation from the canopy. The more intense the evaporation (strong metabolism and plenty of available water), the lower the temperature will be (Nemani et al. 1993). For this reason it was selected as an important factor for the crop yield model. Canopy surface temperature can be determined from the two thermal channels of NOAA AVHRR using the 'split-window' algorithm, which was first presented by Price (1984) and has since been applied and adjusted in several areas:

$$T_s = \left[T_4 + 3.33\,(T_4 - T_5)\right]\frac{3.5 + \varepsilon_4}{4.5} + 0.75\,T_5\,(\varepsilon_4 - \varepsilon_5) \tag{3}$$

where Ts is the surface temperature; T4 and T5 are the brightness temperatures from channel 4 and channel 5, respectively, of NOAA AVHRR; and ε4 and ε5 are the surface emissivities in channel 4 and channel 5, respectively. We used an empirical formula, derived from equation (3) and specific to the situation in north China (Xu 1999): Ts = 3.214 T4 − 2.190 T5 − 2.889.

3.4. Water stress index

There are many methods for detecting soil/crop water conditions by remote sensing: the microwave method, the thermal infrared method, the evapotranspiration method, etc. Some researchers suggested that NDVI/Ts has a close relationship with crop water content (Goward et al. 1985). In arid or semi-arid areas (such as north China), healthy crops have high NDVI and strong evaporation, which results in a lower canopy surface temperature and a high value of NDVI/Ts (Becker and Li 1990). NDVI/Ts has been used for water stress monitoring in recent years with good results (Lambin and Ehrlich 1996). This study adopted NDVI/Ts (water index) as one of the important indicators of crop yield.

3.5. Average crop yield per unit area

Although the crop yield per unit area of each county varied from year to year, it was found that multi-annual averages of crop yields reflect regional crop growing conditions, especially soil quality and fertility (Wang 1996). The ten-year (1989–1998) average crop yield per unit area was calculated using statistical data reported by the government. Using this multi-year average made the yield estimation more stable.

The planting date of winter wheat in He Nan province is usually in the middle of October. The crop emerges after planting and then grows very slowly during winter. When the next spring comes, it turns green (called 'tillering') and begins to grow faster. In order to make full use of the parameters, integral NDVI, Ts, water index and APAR were calculated from tillering (at the end of February) to the harvest (in the middle of June) of winter wheat. In addition, the actual yields, which were used as ground data, were recorded at the county level, so all integral parameters were averaged county by county. In what follows, the names of parameters refer to the county-averaged integral values.
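The empirical surface-temperature formula of §3.3, the NDVI/Ts index of §3.4 and the integration and county averaging described above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the text does not state the temperature units (kelvin is assumed for the example values), and treating the seasonal "integral" as a plain sum over composite periods is an assumption of this sketch, not something the paper specifies.

```python
def surface_temperature(t4, t5):
    """Empirical formula quoted in the text for north China (Xu 1999)."""
    return 3.214 * t4 - 2.190 * t5 - 2.889


def water_index(ndvi_value, ts):
    """Water stress indicator NDVI/Ts from section 3.4."""
    return ndvi_value / ts


def seasonal_integral(values):
    """Sum of per-composite values (a plain sum is assumed here)."""
    return sum(values)


def county_mean(pixel_values):
    """Average of the pixel-level integrals that fall inside one county."""
    return sum(pixel_values) / len(pixel_values)


# Hypothetical brightness temperatures (kelvin assumed) and NDVI for one pixel.
ts = surface_temperature(t4=295.0, t5=293.5)
wi = water_index(0.5, ts)
print(round(ts, 3), wi)
```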
4. Methodology

An ANN is a system of artificial neurons, units whose function is similar to that of human neurons. The neurons are connected to each other by weights. A more complete presentation of this concept was given by Zhang (1994). The basic structure of an ANN has three layers: the input layer, the hidden layer and the output layer, which generally have different numbers of neurons. The response function of each neuron is a sigmoid function given by:

$$f(x) = \frac{1}{1 + e^{-x}}$$
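The paper does not spell this out, but back-propagation training relies on the derivative of the response function. For the sigmoid function, with $f(x) = 1/(1 + e^{-x})$, a short worked derivation is:

```latex
\begin{aligned}
f(x)  &= \left(1 + e^{-x}\right)^{-1}, \\
f'(x) &= \frac{e^{-x}}{\left(1 + e^{-x}\right)^{2}}
       = \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}}
       = f(x)\,\bigl(1 - f(x)\bigr).
\end{aligned}
```

Because the derivative depends only on the neuron output $f(x)$, the error terms used to adjust the weights can be computed from values already available after the forward pass.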
4.1. Model design

The structure of the ANN used here is shown in figure 2. The input layer has five neurons, each representing one of the yield factors described in §3.1–3.5. The hidden layer has eight neurons and the output layer has one neuron. The ith neuron of the input layer is connected to the jth neuron of the hidden layer by weight Wij, and the weight between the jth neuron of the hidden layer and the tth neuron of the output layer is Vjt (in this case t = 1). The weights are used to simulate and recognize the response relationship between crop yield and the corresponding environmental factors.

Figure 2. The structure of a back-propagation ANN model used for crop yield estimation.

4.2. Training

An ANN is a universal function approximator that adapts directly to any nonlinear function defined by a representative set of training data (Kimes and Gastellu-Etchegorry 1999). Once trained, the ANN recognizes the functional relationship between parameters (NDVI, APAR, etc.) and targets (actual crop yields), and it retains this relationship for future calculations.

In this study, the whole province was divided into eight sub-regions, and two to three sample counties were selected in each sub-region for field collection of yield data. In total, 30 counties were selected; 20 of them were used for model training and 10 for validation. Figure 3 shows the spatial distribution of these sample counties.

Figure 3. Spatial distribution of the sample counties in He Nan province.

The ANN model was trained for several thousand iterations until the average relative error was smaller than a predefined threshold. The training was done automatically by the ANN programs. Sample data and training results are given in table 2. Note that all of the input/output data were normalized before training, so the values 0 and 1 in table 2 represent the minimum and maximum values. The output results (yields) were converted back into actual values at the end.

Table 2. Sample data and training result of the ANN yield estimation model of winter wheat in He Nan province in 1999.

Code  County name  NDVI    Ts      Water   APAR    Average  Actual yield  Model results  Relative
                                   index           yield    (kg ha⁻¹)     (kg ha⁻¹)      errors (%)
1     Xin Zheng    0.5787  0.5498  0.5832  0.3863  0.3748   4437.2        4337.9          2.24
2     Wei Shi      0.5178  0.4059  0.5062  0.5234  0.3897   4395.3        4585.7         −4.33
3     Lan Kao      0.4497  0.2794  0.4580  0.5240  0.3517   4537.5        4616.1         −1.73
4     Meng Jing    0.3559  0.5978  0.3810  0.3148  0.2730   4034.1        3794.9          5.93
5     Shong Xian   0.1062  0.4462  0.1431  0.3434  0.1316   3710.0        3556.4          4.14
6     Ru Yang      0.1717  0.5031  0.1829  0.3426  0.1404   3737.7        3566.9          4.57
7     Bao Feng     0.4287  0.6155  0.3865  0.2906  0.2238   3888.5        3761.3          3.27
8     Tang Yin     0.7427  0.2306  0.8349  0.8193  0.6846   5140.2        5448.0         −5.99
9     Nei Huang    0.4767  0.3570  0.5199  0.5084  0.5849   4925.9        4957.4         −0.64
10    Xin Xiang    0.5265  0.0661  0.6657  0.7699  1.0000   6018.8        5654.3          6.06
11    Feng Qiu     0.7342  0.0996  0.8006  0.8987  0.7189   5819.6        5544.3          4.73
12    Chang Heng   0.8501  0.0000  0.9766  1.0000  0.5709   5588.3        5566.5          0.39
13    Nan Le       0.8832  0.2838  0.7785  0.8119  0.7160   5285.4        5463.3         −3.37
14    Xu Chang     0.8443  0.4291  0.8129  0.9476  0.4907   5202.5        5055.5          2.83
15    Yan Ling     1.0000  0.2374  1.0000  0.3669  0.7467   5471.4        5573.7         −1.87
16    Que Shan     0.4134  0.7363  0.3618  0.0652  0.2953   3744.6        3775.2         −0.96
17    Tong Bai     0.0464  1.0000  0.0124  0.2066  0.1275   3018.0        3105.5         −2.90
18    Xin Yang     0.0000  0.7918  0.0000  0.0550  0.1915   3131.1        3187.2         −1.79
19    Gu Shi       0.3593  0.7178  0.3274  0.2066  0.0579   2898.6        3372.2         −9.97
20    Luo Shan     0.0677  0.6827  0.0812  0.0000  0.0000   2771.7        3138.2         −8.35

The final form of the ANN model is:

$$Y = \frac{1}{1 + \exp\left[-\left(\sum_{j=1}^{q} V_j H_j - \gamma\right)\right]} \tag{4}$$

In equation (4), Y is the winter wheat yield per unit area output by the neural network model, q is the number of hidden nodes (q = 8 here), Vj is the weight between the jth hidden node and the output node (in this case, there is only one output node), γ is the threshold of the output node, and Hj is the output value of the jth hidden node:

$$H_j = \frac{1}{1 + \exp\left[-\left(\sum_{i=1}^{n} W_{ij} a_i - \theta_j\right)\right]} \tag{5}$$

where Wij is the weight between the ith input node and the jth hidden node, ai is the value of the ith input factor, n is the number of input nodes (n = 5 here), and θj is the threshold of the jth hidden node. Table 3 shows the values of these parameters obtained from model training.
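The normalization step, the forward calculation of equations (4) and (5), and a back-propagation update can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the authors' program: the function names, the squared-error loss, the learning rate, the random initial weights, the stopping threshold and the use of a single training sample are all choices made for illustration. The sample inputs are the normalized values of county 1 in table 2, and the target is its actual yield scaled between the smallest and largest training yields in that table.

```python
import math
import random


def sigmoid(x):
    """Response function f(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def min_max_normalize(values):
    """Scale values to [0, 1], so that 0 is the minimum and 1 the maximum."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def forward(a, W, theta, V, gamma):
    """Equations (5) and (4): inputs a -> hidden outputs H -> output Y.

    W[i][j] links input i to hidden node j, V[j] links hidden node j to the
    output node; theta[j] and gamma are the hidden and output thresholds.
    """
    q = len(theta)
    H = [sigmoid(sum(W[i][j] * a[i] for i in range(len(a))) - theta[j])
         for j in range(q)]
    Y = sigmoid(sum(V[j] * H[j] for j in range(q)) - gamma)
    return H, Y


def train_step(a, target, W, theta, V, gamma, rate=0.5):
    """One gradient-descent update on 0.5 * (Y - target)**2 for one sample.

    W, theta and V are updated in place; the new gamma is returned.
    """
    H, Y = forward(a, W, theta, V, gamma)
    delta_out = (Y - target) * Y * (1.0 - Y)
    delta_hid = [delta_out * V[j] * H[j] * (1.0 - H[j]) for j in range(len(H))]
    for j in range(len(H)):
        V[j] -= rate * delta_out * H[j]
        theta[j] += rate * delta_hid[j]   # thresholds enter with a minus sign
        for i in range(len(a)):
            W[i][j] -= rate * delta_hid[j] * a[i]
    return gamma + rate * delta_out


random.seed(0)
n, q = 5, 8                                   # network size from section 4.1
W = [[random.uniform(-0.5, 0.5) for _ in range(q)] for _ in range(n)]
theta = [0.0] * q
V = [random.uniform(-0.5, 0.5) for _ in range(q)]
gamma = 0.0

sample = [0.5787, 0.5498, 0.5832, 0.3863, 0.3748]          # county 1, table 2
target = min_max_normalize([2771.7, 4437.2, 6018.8])[1]    # its scaled yield

threshold, max_epochs = 0.001, 5000           # hypothetical stopping rule
epochs = 0
while (abs(forward(sample, W, theta, V, gamma)[1] - target) / target > threshold
       and epochs < max_epochs):
    gamma = train_step(sample, target, W, theta, V, gamma)
    epochs += 1
print(epochs, forward(sample, W, theta, V, gamma)[1], target)
```

In practice the loop would run over all training counties and stop on their average relative error, as described in §4.2; a single sample is used here only to keep the sketch short.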
4.3. Model validation

Data from the other 10 sample counties (as mentioned in §4.2) were used to validate the ANN model. A multiple linear regression model (MR model) was also fitted so that results from the two types of model could be compared. The results appear in table 4. The MR model is:

$$\mathrm{Yield} = 0.11\,\mathrm{NDVI} - 0.28\,T_s + 0.04\,\mathrm{water} + 0.18\,\mathrm{APAR} + 0.57\,\text{average yield} + 0.23$$

Table 4 shows that the ANN model performed better than the MR model: the average absolute relative error of the ANN model is 3.5%, compared with 11.5% for the MR model. All absolute relative errors of the ANN model are smaller than 10%, and 80% of them are smaller than 5%, whereas the maximum relative error of the MR model is 33.5%. Overall, the ANN model is more stable and accurate than the MR model.
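For comparison with the network, the MR model above is a single weighted sum and can be evaluated directly. The sketch below uses the printed coefficients; the function name is hypothetical, and applying the model to normalized inputs (here those of county 1 in table 2) is an assumption suggested by the size of the coefficients, not something the text states.

```python
def mr_yield(ndvi, ts, water, apar, average_yield):
    """Linear MR model with the coefficients printed in section 4.3."""
    return (0.11 * ndvi - 0.28 * ts + 0.04 * water
            + 0.18 * apar + 0.57 * average_yield + 0.23)


# Normalized inputs of county 1 (Xin Zheng) from table 2; using normalized
# values with this model is an assumption of the sketch.
estimate = mr_yield(0.5787, 0.5498, 0.5832, 0.3863, 0.3748)
print(round(estimate, 4))
```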
Table 3. Parameters of the neural network yield estimation model of winter wheat in He Nan province in 1999.

Parameter              j=1     j=2     j=3     j=4     j=5     j=6     j=7     j=8
W1j                    −0.759   0.164  −0.713   0.288   0.810   0.296  −0.238   0.852
W2j                     2.066   1.924   1.096   1.932   2.138   1.758   0.120   0.830
W3j                    −0.354   0.022  −0.509  −0.761   2.262   0.645  −0.516  −0.596
W4j                    −0.359   0.288   0.544   0.329   1.260   1.065  −0.452   0.960
W5j                    −1.507  −1.898  −2.686  −2.062  −1.288  −1.446  −1.184  −2.421
θj (hidden threshold)   0.777   0.993   0.121   1.032   0.677   0.834   1.790   0.539
Vj                     −0.157   2.480  −0.623   1.014   1.645   1.253   0.004   0.193
γ (output threshold)    2.835
Table 4. Comparison of results of ANN model and multi-regression model.

County name  NDVI    Ts      Water  APAR      Average    Actual     Result   Result   Relative     Relative
                     (°C)    index  (MJ m⁻²)  yield      yield      of ANN   of MR    error (ANN)  error (MR)
                                              (kg ha⁻¹)  (kg ha⁻¹)
Qi Xian      3329.1  3466.8  100.5  3664.2    3950.6     4807.1     5194.7   4742.0   −8.06         1.35
Luan Chuan   2121.7  3606.2   78.5  3460.8    2634.6     4215.9     4269.9   3466.8   −1.28        17.8
Lu Shan      1635.3  3974.1   57.6  2555.5    2483.0     3593.4     3414.9   2389.9    4.97        33.5
Jia Xian     2401.2  3852.8   83.5  3018.0    4087.2     4686.5     4482.6   3815.7    4.35        18.6
Jun Xian     2918.2  3495.9  117.3  3823.4    5202.1     5473.1     5570.0   5415.3   −1.77         1.06
Yan Jin      2426.0  3438.8   94.6  3272.9    4494.9     5220.3     5343.0   4804.8   −2.34         7.96
Qing Feng    2897.4  3551.5  106.9  3834.8    5498.1     5417.9     5557.9   5481.6   −2.59        −1.2
Chang Ge     3237.2  3642.4  130.2  4210.7    4855.7     5948.1     5612.9   5187.0    5.64        12.8
Fu Gou       2781.8  3549.3   97.7  3530.0    4764.6     5132.1     5296.7   4934.0   −3.21         3.86
Nan Zhao     1394.4  3772.0   46.2  2749.7    2698.0     3420.8     3394.8   2837.2    0.76        17.1
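The summary statistics quoted in §4.3 can be recomputed from the yields in table 4. The sketch below is illustrative: the variable and function names are hypothetical, and the sign convention (actual minus estimate, divided by actual) is inferred from the tabulated values rather than stated in the text.

```python
# (county, actual yield, ANN result, MR result) in kg/ha, from table 4.
VALIDATION = [
    ("Qi Xian", 4807.1, 5194.7, 4742.0),
    ("Luan Chuan", 4215.9, 4269.9, 3466.8),
    ("Lu Shan", 3593.4, 3414.9, 2389.9),
    ("Jia Xian", 4686.5, 4482.6, 3815.7),
    ("Jun Xian", 5473.1, 5570.0, 5415.3),
    ("Yan Jin", 5220.3, 5343.0, 4804.8),
    ("Qing Feng", 5417.9, 5557.9, 5481.6),
    ("Chang Ge", 5948.1, 5612.9, 5187.0),
    ("Fu Gou", 5132.1, 5296.7, 4934.0),
    ("Nan Zhao", 3420.8, 3394.8, 2837.2),
]


def relative_error(actual, estimate):
    """Signed relative error in percent: (actual - estimate) / actual * 100."""
    return (actual - estimate) / actual * 100.0


def mean_abs(errors):
    """Mean of the absolute values of a list of errors."""
    return sum(abs(e) for e in errors) / len(errors)


ann_errors = [relative_error(actual, ann) for _, actual, ann, _ in VALIDATION]
mr_errors = [relative_error(actual, mr) for _, actual, _, mr in VALIDATION]

mean_ann = mean_abs(ann_errors)
mean_mr = mean_abs(mr_errors)
share_under_5 = sum(abs(e) < 5.0 for e in ann_errors) / len(ann_errors)
max_mr = max(abs(e) for e in mr_errors)
print(f"{mean_ann:.1f} {mean_mr:.1f} {share_under_5:.0%} {max_mr:.1f}")
```

With the table 4 values, this reproduces the figures given in §4.3: mean absolute relative errors of 3.5% (ANN) and 11.5% (MR), 80% of ANN errors below 5%, and a maximum MR error of 33.5%.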
Figure 4. Yields per unit area of winter wheat estimated by the ANN model, He Nan province, north China, 1999.
5. Model application

After calibration and validation of the ANN model, the parameters of every county in He Nan province in 1999 were put into the model and winter wheat yields per unit area were calculated. The output results are shown in figure 4.

6. Conclusion

This study demonstrated that it is possible to develop a yield estimation model using ANN techniques. The ANN performed much better than the traditional linear, multivariate regression approach. This conclusion is consistent with the work of other researchers (e.g. Simpson 1994). The stability and accuracy of the ANN yield estimation model result from several important characteristics of the model: (1) the capabilities of the ANN itself, such as self-learning, compatibility and flexibility; (2) the integrated use of remotely sensed data together with historical statistical information, with the parameters retrieved from satellite images integrated over the main growing season of the crop; and (3) the precise division of the study area based on agricultural knowledge and the careful selection of sample data.

The model has only been applied in He Nan province in 1999. Further studies may focus on testing and calibrating this model in larger areas and over longer temporal scales.

Acknowledgments

This work was one of the state's key projects (KZCX2-308-4) supported by the Chinese Academy of Sciences. We are grateful to Yang Mugeng, of the China University of Mining and Technology, for his help with the neural network programs. We also thank Mr Rosema, of the EARS Company in the Netherlands, for useful discussions about crop growth monitoring.
References

ASRAR, G., MYNENI, R. B., and CHOUDHURY, B. J., 1992, Spatial heterogeneity in vegetation canopies and remote sensing of APAR: a modeling study. Remote Sensing of Environment, 41, 85–103.
BARET, F., CLEVERA, J. G., and STEVEN, M. D., 1995, The robustness of canopy gap fraction estimates from red and near-infrared reflectance: a comparison of approaches. Remote Sensing of Environment, 54, 141–151.
BECKER, F., and LI, Z. L., 1990, Temperature independent spectral indices in thermal infrared bands. Remote Sensing of Environment, 32, 17–33.
GOWARD, S. N., CRUICKSHANKS, G. D., and HOPE, A. S., 1985, Observed relation between thermal emission and reflected spectral radiance of a complex vegetated landscape. Remote Sensing of Environment, 18, 137–146.
HAYES, M. J., and DECHER, W. L., 1996, Using NOAA AVHRR data to estimate maize production in the United States Corn Belt. International Journal of Remote Sensing, 17, 3189–3200.
JIANG, D., 2000, Study on crop yield forecasting model using remote sensed information supported by artificial neural network. Doctoral dissertation, Institute of Geographic Science and Natural Resources Research, Chinese Academy of Sciences.
KIMES, D. S., and GASTELLU-ETCHEGORRY, J., 1999, Recovery of vegetation characteristics using neural networks. Proceedings of International Symposium on Digital Earth, 29 November 1999 (China: Science Press), pp. 322–329.
KIMES, D. S., NELSON, R. F., MANRY, M. T., and FUNG, A. K., 1998, Attributes of neural networks for extracting continuous vegetation variables from optical and radar measurements. International Journal of Remote Sensing, 19, 2639–2663.
LAMBIN, E. F., and EHRLICH, D., 1996, The surface temperature–vegetation index space for land cover and land-cover change analysis. International Journal of Remote Sensing, 17, 463–487.
LOUIS, E. K., and YAN, X.-H., 1998, A neural network model for estimating sea surface chlorophyll and sediments from Thematic Mapper imagery. Remote Sensing of Environment, 66, 153–165.
NEMANI, R. R., PIERCE, L. L., RUNNING, S. W., and GOWARD, S., 1993, Developing satellite derived estimates of surface moisture status. Journal of Applied Meteorology, 32, 548–557.
PRICE, J. C., 1984, Land surface measurements from the split window channels of NOAA-7 AVHRR. Journal of Geophysical Research, 89, 7231–7237.
PRINCE, S. D., and GOWARD, S. N., 1996, Evaluation of the NOAA/NASA Pathfinder AVHRR land data set for global primary production modeling. International Journal of Remote Sensing, 17, 217–221.
RASMUSSEN, M. S., 1998, Developing simple, operational, consistent NDVI-vegetation models by applying environment and climatic information. Part II: Crop yield assessment. International Journal of Remote Sensing, 19, 119–139.
SIMPSON, G., 1994, Crop yield prediction using a CMAC neural network. Proceedings of the Society of Photo-Optical Instrumentation Engineers, 2315, 160–171.
TUCKER, C. J., 1979, Red and photographic infrared linear combination for monitoring vegetation. Remote Sensing of Environment, 8, 127–150.
WANG, N., 1996, Wheat Growth Monitoring and Yield Estimation by Remote Sensing in China (Beijing: Science and Technology Press).
WIEGAND, C. L., RICHARDSON, A. J., ESCOBAR, D. E., and GERBERMANN, A. H., 1991, Vegetation indices in crop assessments. Remote Sensing of Environment, 35, 105–119.
XU, X., 1999, Retrieval of land surface parameters from remote sensed data. Doctoral dissertation, Institute of Remote Sensing Application, Chinese Academy of Sciences.
ZHANG, J., 1998, Study on remote sensing–photosynthesis model for crop yield prediction. Doctoral dissertation, Institute of Remote Sensing Application, Chinese Academy of Sciences.
ZHANG, L., 1994, Artificial Neural Network and its Applications (Shanghai: Fu Dan University Press).
ZHANG, R. H., 1996, Experimental Remote Sensing Model and Site Work (Beijing: Science and Technology Press).