This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination. IEEE GEOSCIENCE AND REMOTE SENSING LETTERS
Deep Recurrent Neural Networks for Winter Vegetation Quality Mapping via Multitemporal SAR Sentinel-1

Dinh Ho Tong Minh, Dino Ienco, Raffaele Gaetano, Nathalie Lalande, Emile Ndikumana, Faycal Osman, and Pierre Maurel
Abstract— Mapping winter vegetation quality is a challenging problem in remote sensing, due to cloud coverage in winter periods, which leads to a more intensive use of radar rather than optical images. The aim of this letter is to provide a better understanding of the capabilities of Sentinel-1 radar images for winter vegetation quality mapping through the use of deep learning techniques. The analysis is carried out on multitemporal Sentinel-1 data over an area around Charentes-Maritimes, France. This data set was processed to produce an intensity radar data stack from October 2016 to February 2017. Two deep recurrent neural network (RNN)-based classifiers were employed. Our work reveals that the proposed RNN models clearly outperform classical machine learning approaches (support vector machine and random forest).

Index Terms— Gated recurrent unit (GRU), long short-term memory (LSTM), multitemporal, recurrent neural network (RNN), Sentinel-1, synthetic aperture radar (SAR), vegetation quality.
I. INTRODUCTION
AN EXCESS of nitrates in drinking water is harmful to human health. They mainly derive from the excessive use of agricultural fertilizers leached from bare soils during rainfall. Since 2011, the European Nitrates Directive has imposed the use of winter vegetation coverage to reduce pollution diffusion in the perimeter of drinking water collections [1]. This coverage absorbs part of the nitrate surpluses, thereby limiting the leaching of the soil. When spring and summer crops are harvested, the soil is released from vegetation cover in autumn
Manuscript received July 10, 2017; revised September 21, 2017 and December 7, 2017; accepted January 9, 2018. This work was supported in part by the European Space Agency, Centre National d'Etudes Spatiales/Terre, Ocean, Surfaces Continentales, Atmosphere and in part by the National Research Agency within the framework of the program "Investissements d'Avenir" for the GEOSUD under Project ANR-10-EQPX-20. (Corresponding author: Dinh Ho Tong Minh.)
D. Ho Tong Minh, E. Ndikumana, F. Osman, and P. Maurel are with the UMR-TETIS Laboratory, IRSTEA, University of Montpellier, 34090 Montpellier, France (e-mail: [email protected]; emile.[email protected]; [email protected]; [email protected]).
D. Ienco is with the UMR-TETIS Laboratory, IRSTEA, University of Montpellier, 34090 Montpellier, France, and also with the LIRMM Laboratory, 34090 Montpellier, France (e-mail: [email protected]).
R. Gaetano is with the UMR-TETIS Laboratory, CIRAD, University of Montpellier, 34090 Montpellier, France (e-mail: [email protected]).
N. Lalande is with Envylis Cie, 34750 Villeneuve-les-Maguelone, France (e-mail: [email protected]).
Color versions of one or more of the figures in this letter are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/LGRS.2018.2794581
and early winter, and farmers must plant intermediate crops (cereals, grasslands, fallow land, etc.), receiving aid in return [2]. The mapping of winter cover in vulnerable areas helps the services of the State to control the declarations of the farmers, and it also helps local actors to set up innovative action plans to reduce diffuse pollution. Remote sensing satellite imagery is a valuable aid in understanding the level of vegetation cover in winter [3]. Moreover, the high cloud cover during the winter season and the dynamics of vegetation justify the use of radar image time series rather than optical ones. The recently launched Sentinel-1 satellite operates day and night, performing C-band synthetic aperture radar (SAR) imaging and thus enabling image acquisition regardless of the weather [4]. Sentinel-1 data are systematically acquired at a global scale in terrain observation with progressive scan (TOPS) mode with a revisit period of 12 days. This offers a unique source of information to map vegetation quality cover in the winter season. In the remote sensing literature on operational land cover mapping, most works are based on standard machine learning approaches [i.e., support vector machine (SVM) and random forest (RF)] [5]. These methods generally ignore the temporal dependencies available in time series data. Conversely, recurrent neural networks (RNNs) [6], a family of deep learning methods, offer models that explicitly manage temporal dependencies among data [e.g., long short-term memory (LSTM) [7]], which makes them suitable for mining multitemporal SAR Sentinel-1 data. In this letter, we propose to use two deep RNN approaches that explicitly consider the temporal correlation of SAR data in order to discriminate among quality classes of winter vegetation coverage exhibiting complex temporal behaviors.
We experimentally show that the use of RNNs to deal with a time series of SAR Sentinel-1 data offers two main advantages: 1) RNN models can be adopted to work on pixel-based time series and 2) they yield a consistent improvement in winter vegetation quality classification when compared with the results generated by classical machine learning approaches (SVM and RF).

II. STUDY AREA

A. Charentes-Maritimes Site

The study area (horizontal: 40 km, vertical: 60 km) is located in the department of Charentes-Maritimes, West
1545-598X © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Fig. 1. Study site (horizontal: 40 km, vertical: 60 km) is located in the Department of Charentes-Maritimes, West France. The background is the average VV-polarization intensity. Colored polygons represent field locations.

Fig. 2. Temporal profiles of the five different classes with respect to the (Top) VV and (Bottom) VH polarizations.

France, which is a zone very vulnerable to diffuse pollution. This area includes two drinking water supply basins: La Rochelle and Saintes. Both basins are regulated by the Re-Sources action program (2016-2020), which aims to improve water quality while maintaining efficient agricultural production in both environmental and economic terms. It is based on a close partnership between the state services, the water agency, the region, and local actors (municipalities, professional agricultural organizations, and farmers).

B. Ground Data

A field survey was carried out on 194 plots in December 2016 by the local actors with the help of an environmental engineering company. The following information was collected on each plot: land cover, an estimate of the quality of the vegetative development (five classes: bare soil, very low, low, average, and high), the date of observation, and a photograph. Fig. 1 shows the study area and the position of the ground samples.

III. SAR DATA AND PROCESSING

A. SAR Data

The Sentinel-1 single look complex data set includes 13 acquisitions in TOPS mode from October 7, 2016 to February 28, 2017, with a temporal baseline of 12 days. This is dual-polarization (VV + VH) data, resulting in 26 images. Fig. 2 summarizes the temporal profiles of the five winter vegetation quality classes per polarization. Each time series is made up of 13 points (one for each acquisition).

B. SAR Processing
First, a master image was chosen and all images were coregistered, taking into account the TOPS mode of the master scene [8]. Five-look (five range looks) backscatter images were generated and radiometrically calibrated for range spreading loss, antenna gain, normalized reference area, and the calibration constant, using parameters included in the Sentinel-1 SAR header. We then improved the time series SAR Sentinel-1 data set by applying the temporal filtering developed in [9] to reduce noise while retaining, as much as possible, the fine structures present in the images. In order to create a unified data set, all the image data must be orthorectified into map coordinates. This was accomplished by creating a simulated SAR image from a 30-m Shuttle Radar Topography Mission digital elevation model and using the simulated SAR image to coregister the two image sets. The pixel size of the orthorectified image data is 10 m. After geocoding, all backscatter images were converted to the logarithmic dB scale and normalized to values between 0 and 255 (8 bits).¹ Finally, for each pixel, a 2 × 13 matrix (13 acquisitions and two polarizations, VV and VH) was generated as input for the classifiers.

¹The SAR Sentinel-1 data were processed using the IRSTEA TomoSAR platform, which offers SAR, interferometry, and tomography processing [10].

IV. RECURRENT NEURAL NETWORK

RNNs are well-established machine learning techniques that have demonstrated their quality in different domains such as speech recognition, signal processing, and natural language processing [11], [12]. Unlike standard feedforward networks [e.g., convolutional neural networks (CNNs)], RNNs explicitly manage temporal data dependencies, since the output of the neuron at time t − 1 is used, together with the next input, to feed the neuron itself at time t. A sketch of a typical RNN neuron is depicted in Fig. 3.
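As a concrete (and purely hypothetical) illustration of how the per-pixel 2 × 13 input described in Section III-B could be prepared before being fed to such a network, the sketch below converts linear VV/VH backscatter intensities to dB and rescales them to 8 bits. The helper name and the dB stretch bounds are assumptions for the example, not values given in the letter.

```python
import numpy as np

def to_classifier_input(vv, vh, eps=1e-10):
    """Turn linear-intensity VV/VH time series (13 dates each) into a
    2 x 13 8-bit matrix, mirroring the preprocessing described in the text.

    Hypothetical helper: the (-25, 0) dB stretch bounds are assumed for
    illustration only.
    """
    stack = np.stack([vv, vh])             # shape (2, 13), linear intensity
    db = 10.0 * np.log10(stack + eps)      # logarithmic dB scale
    lo, hi = -25.0, 0.0                    # assumed dB stretch bounds
    scaled = np.clip((db - lo) / (hi - lo), 0.0, 1.0)
    return np.round(scaled * 255).astype(np.uint8)  # normalized to 0..255

vv = np.full(13, 0.05)  # toy backscatter values for one pixel
vh = np.full(13, 0.01)
x = to_classifier_input(vv, vh)
print(x.shape)  # (2, 13)
```

Each such matrix is then read column by column (one 2-D feature vector per acquisition date) by the recurrent classifiers described next.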
Fig. 3. (Left) RNN unit. (Right) Unfolded structure.

Among the different RNN models, LSTM [7] and the gated recurrent unit (GRU) [13] are the two best-known RNN units. We briefly describe both units below; for each of them, we supply and discuss the equations that describe its inner behavior. The symbol ⊙ indicates elementwise multiplication, while σ and tanh denote the sigmoid and hyperbolic tangent functions, respectively. The input of an RNN unit is a sequence of variables (x_1, ..., x_N), where a generic element x_t is a feature vector and t refers to the corresponding timestamp.

A. Long Short-Term Memory

The main purpose of the LSTM model was to learn long-term dependencies [7], since previous RNN models failed at this task due to the problem of vanishing and exploding gradients. Equations (1)-(6) formally describe the LSTM neuron. The LSTM unit is composed of two cell states, the memory c_t and the hidden state h_t, and three gates: the input (i_t), forget (f_t), and output (o_t) gates, which are employed to control the flow of information. All three gates combine the current input x_t with the hidden state h_{t−1} coming from the previous timestamp. The gates serve two important functions: 1) they regulate how much information is forgotten or remembered during the process and 2) they deal with the problem of vanishing/exploding gradients. The gates are implemented by a sigmoid, which returns values between 0 and 1. The LSTM unit also uses a temporary cell state y_t that rescales the current input; it is implemented by a hyperbolic tangent, which returns values between −1 and 1. Both the sigmoid and the hyperbolic tangent are applied elementwise. The gate i_t regulates how much of the current information needs to be maintained (i_t ⊙ y_t), while f_t indicates how much of the previous memory needs to be retained at the current step (f_t ⊙ c_{t−1}). Finally, o_t shapes the new hidden state h_t, deciding how much information of the current memory will be output to the next step. The W matrices and bias coefficients b are the parameters learned during the training of the model. Both the memory c_t and the hidden state h_t are forwarded to the next time step

i_t = σ(W_ix x_t + W_ih h_{t−1} + b_i)    (1)
f_t = σ(W_fx x_t + W_fh h_{t−1} + b_f)    (2)
y_t = tanh(W_yx x_t + W_yh h_{t−1} + b_y)    (3)
c_t = i_t ⊙ y_t + f_t ⊙ c_{t−1}    (4)
o_t = σ(W_ox x_t + W_oh h_{t−1} + b_o)    (5)
h_t = o_t ⊙ tanh(c_t).    (6)

B. Gated Recurrent Unit

Cho et al. [13] built a new RNN unit designed to make the computation and implementation of the LSTM model much easier. Equations (7)-(9) formally describe the GRU neuron. This unit follows the general philosophy of the LSTM model, implementing gates and cell states, but, conversely to the LSTM model, the GRU unit has only two gates, update (z_t) and reset (r_t), and one cell state, the hidden state (h_t). The two gates combine the current input (x_t) with the information coming from the previous timestamp (h_{t−1}). The update gate effectively controls the tradeoff between how much information from the previous hidden state carries over to the current hidden state and how much information of the current timestamp needs to be kept. It acts similarly to the memory cell in the LSTM unit, supporting the RNN in remembering long-term information. The reset gate, on the other hand, monitors how much information of the previous timestamps needs to be integrated with the current information. As each hidden unit has separate reset and update gates, the units can capture dependencies over different timescales: units more prone to capturing short-term dependencies tend to have a frequently activated reset gate, while those that capture longer term dependencies have update gates that remain mostly active [13]

z_t = σ(W_zx x_t + W_zh h_{t−1} + b_z)    (7)
r_t = σ(W_rx x_t + W_rh h_{t−1} + b_r)    (8)
h_t = z_t ⊙ h_{t−1} + (1 − z_t) ⊙ tanh(W_hx x_t + W_hr (r_t ⊙ h_{t−1}) + b_h).    (9)
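To make the updates above concrete, the following NumPy sketch implements a single LSTM step [(1)-(6)] and a single GRU step [(7)-(9)] with randomly initialized toy weights. It is illustrative only, not the trained models of this letter; stacking W_gx and W_gh into one matrix per gate applied to the concatenation [x_t; h_{t−1}] is algebraically equivalent to the separate products in the equations.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

d_in, d_hid = 2, 4  # toy sizes: 2 polarizations in, 4 hidden units

def lstm_step(x_t, h_prev, c_prev, W, b):
    # W holds one (d_hid, d_in + d_hid) matrix per gate; b one bias per gate.
    z = np.concatenate([x_t, h_prev])
    i = sigmoid(W['i'] @ z + b['i'])   # input gate, (1)
    f = sigmoid(W['f'] @ z + b['f'])   # forget gate, (2)
    y = np.tanh(W['y'] @ z + b['y'])   # temporary cell state, (3)
    c = i * y + f * c_prev             # new memory, (4)
    o = sigmoid(W['o'] @ z + b['o'])   # output gate, (5)
    h = o * np.tanh(c)                 # new hidden state, (6)
    return h, c

def gru_step(x_t, h_prev, W, b):
    z_in = np.concatenate([x_t, h_prev])
    z = sigmoid(W['z'] @ z_in + b['z'])  # update gate, (7)
    r = sigmoid(W['r'] @ z_in + b['r'])  # reset gate, (8)
    cand = np.tanh(W['h'] @ np.concatenate([x_t, r * h_prev]) + b['h'])
    return z * h_prev + (1 - z) * cand   # new hidden state, (9)

W = {k: rng.normal(size=(d_hid, d_in + d_hid)) for k in 'ifyozrh'}
b = {k: np.zeros(d_hid) for k in 'ifyozrh'}

h = c = np.zeros(d_hid)
for x_t in rng.normal(size=(13, d_in)):  # 13 Sentinel-1 dates
    h, c = lstm_step(x_t, h, c, W, b)
print(h.shape)  # (4,)
```

Note that the output of both units is bounded: the LSTM hidden state is a sigmoid-gated tanh, and the GRU hidden state is a convex combination of a bounded previous state and a tanh candidate.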
C. RNN-Based Time Series Classification

To perform the classification task, for each RNN unit we built a deep architecture that stacks five units together. The use of multiple units, similar to what is commonly done for CNNs by combining several convolutional layers [6], allows the extraction of high-level nonlinear temporal dependencies available in the remote sensing time series. The RNN model learns a new representation of the input sequences, but it does not make any prediction by itself. To this end, a SoftMax layer [14] is stacked on top of the last recurrent unit to perform the final multiclass prediction. The SoftMax layer has as many neurons as there are classes to predict. We choose the SoftMax instead of the sigmoid function because, due to its per-layer normalization, the output of the SoftMax layer can be seen as a probability distribution over the classes that sums to 1, whereas with a sigmoid, the neurons can output any value between 0 and 1 independently. This choice is further justified in the proposed application (multiclass prediction) by the constraint that each sample belongs exclusively to a single class. From an architectural point of view,
TABLE I
DISTRIBUTION OF THE NUMBER OF PIXELS PER CLASS

TABLE II
FIVEFOLD CROSS-VALIDATION ON THE TIME SERIES SAR SENTINEL-1 DATA
the connection between the RNN model and the SoftMax layer is realized by fully connecting the last hidden state vector of the recurrent architecture with the SoftMax neurons. This scheme is instantiated for both LSTM and GRU units, thus yielding two different classifiers: an LSTM-based and a GRU-based classification scheme.

V. EXPERIMENTAL RESULTS

A. Experimental Settings

We compare the LSTM- and GRU-based time series models with standard machine learning approaches, RF and SVM, usually employed in the remote sensing field [15]. For the RF model, we set the number of trees to 400 and allow a maximum tree depth of 25. For the SVM model, we use a radial basis function kernel with default gamma and a complexity parameter equal to 10^6. For RF, we used the Python implementation supplied by the Scikit-learn library [16], while for SVM we used the LibSVM implementation [17]. For the RNN-based classifiers, we set the number of hidden dimensions to 512. An initial learning rate of 5 × 10^−4 and a decay of 5 × 10^−5 are employed. We implemented the models via the Keras Python library [18] with Theano as the back end. To train the models, we used the RMSprop strategy, a variant of stochastic gradient descent [19]. The loss function being optimized is the categorical cross-entropy, the standard loss function for multiclass classification tasks [20]. The models are trained for 350 epochs with a batch size of 64. To validate the different methods, we perform fivefold cross-validation on the data set, as shown in Table I. The whole data set (72 589 pixels) is randomly divided into five partitions (folds) of equal size. Then, four folds are used to train the model (around 58 071 pixels) and the fifth fold (around 14 518 pixels) is used for the test phase. This is repeated five times in order to consider each fold as a possible test set.
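The stacked architecture described above (five recurrent units of 512 hidden dimensions, a fully connected SoftMax output over the five classes, fed with 13 dates of two polarizations) can be sketched as a pure NumPy forward pass with random weights. This is a structural illustration only; the actual classifiers were trained with Keras/Theano, and the 0.1 weight scale below is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

D_IN, D_HID, N_LAYERS, N_CLASSES = 2, 512, 5, 5

def gru_layer(seq, Wz, Wr, Wh):
    """Run one GRU layer over a (T, d_in) sequence; return all hidden states."""
    h = np.zeros(Wz.shape[0])
    out = []
    for x_t in seq:
        z_in = np.concatenate([x_t, h])
        z = sigmoid(Wz @ z_in)                                  # update gate
        r = sigmoid(Wr @ z_in)                                  # reset gate
        cand = np.tanh(Wh @ np.concatenate([x_t, r * h]))       # candidate state
        h = z * h + (1 - z) * cand
        out.append(h)
    return np.array(out)

def forward(seq):
    """Five stacked GRU layers; SoftMax on the last hidden state."""
    for layer in range(N_LAYERS):
        d_in = D_IN if layer == 0 else D_HID
        shape = (D_HID, d_in + D_HID)
        Wz, Wr, Wh = (0.1 * rng.normal(size=shape) for _ in range(3))
        seq = gru_layer(seq, Wz, Wr, Wh)
    W_out = 0.1 * rng.normal(size=(N_CLASSES, D_HID))
    logits = W_out @ seq[-1]        # fully connected on the last hidden state
    e = np.exp(logits - logits.max())
    return e / e.sum()              # SoftMax: a probability distribution

probs = forward(rng.normal(size=(13, D_IN)))  # 13 dates, 2 polarizations
print(probs.shape)  # (5,)
```

The SoftMax output sums to 1 over the five vegetation quality classes, which is exactly the property the text invokes to justify choosing it over a sigmoid output layer.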
Finally, the score measures are calculated by concatenating the per-fold predicted values and comparing them with the true classes. The training step takes, on average, 248 min to learn each RNN model on the training set and less than 3 min to classify the pixel time series in the test set. In order to assess classification performance, we use not only the global accuracy and kappa measures but also the average and per-class F-measure.

B. Results and Discussion

Table II summarizes the results of the different classification approaches on the SAR Sentinel-1 time series data
Fig. 4. Per-class F-measure of the different approaches.
considering the winter vegetation quality mapping task. At first glance, we observe a significant performance gain of the RNN-based classification approaches over the classical machine learning methods (RF and SVM), and the gain involves all the evaluation metrics: on average, both LSTM and GRU improve the classification performance by more than 7 points of accuracy/F-measure. Between the two RNN models, the GRU-based method obtains slightly better results than the LSTM-based one. Fig. 4 illustrates a per-class F-measure comparison, giving a more precise picture of the behavior of the different methods. The performance gain offered by the RNN-based methods involves all five classes, resulting in equally good results on all of them. Conversely, RF and SVM show different behaviors for different classes: both classifiers obtain their best performance on the High quality class (4) and their lowest on the Low quality class (2). This behavior can be explained by the temporal profiles of both VV and VH presented in Fig. 2. The High class (depicted with green lines in Fig. 2) has a clear and distinct profile, which facilitates its detection without having to consider its temporal correlation. On the other hand, the Low class (depicted with black lines in Fig. 2) intersects the temporal profiles of all the other classes multiple times. This is probably why the RF and SVM approaches, which ignore the temporal correlation of the data, are not able to correctly detect this class. Conversely, the RNN-based approaches discriminate well among all the classes, since they extract and summarize the signal portions that support the discrimination among the different winter quality classes.
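For reference, the evaluation measures used above (global accuracy, kappa, and per-class F-measure) can all be derived from a confusion matrix. The following self-contained sketch computes them on toy labels (illustrative data, not the letter's results).

```python
import numpy as np

def scores(y_true, y_pred, n_classes):
    """Global accuracy, Cohen's kappa, and per-class F-measure from
    integer label arrays (toy illustration)."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1                       # rows: true class, cols: predicted
    n = cm.sum()
    acc = np.trace(cm) / n                  # global accuracy
    # kappa: observed agreement corrected for chance agreement p_e
    p_e = (cm.sum(axis=1) * cm.sum(axis=0)).sum() / n**2
    kappa = (acc - p_e) / (1 - p_e)
    # per-class F-measure: harmonic mean of precision and recall
    tp = np.diag(cm).astype(float)
    col, row = cm.sum(axis=0), cm.sum(axis=1)
    prec = np.divide(tp, col, out=np.zeros_like(tp), where=col > 0)
    rec = np.divide(tp, row, out=np.zeros_like(tp), where=row > 0)
    f1 = np.divide(2 * prec * rec, prec + rec,
                   out=np.zeros_like(tp), where=(prec + rec) > 0)
    return acc, kappa, f1

y_true = np.array([0, 0, 1, 1, 2, 2, 3, 4])
y_pred = np.array([0, 1, 1, 1, 2, 0, 3, 4])
acc, kappa, f1 = scores(y_true, y_pred, 5)
print(round(acc, 3))  # 0.75
```

In the letter's setting, y_true and y_pred would be the concatenated per-fold ground truth and predictions over the five vegetation quality classes.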
Fig. 5. Confusion matrices on the SAR Sentinel-1 time series data of the different approaches. (a) RF. (b) SVM. (c) LSTM. (d) GRU.

Fig. 5 shows the confusion matrices for each method. We observe a high misclassification rate between the Low (2) and Bare soil (5) classes, and this holds for all the classifiers. However, for the RNN-based approaches this misclassification error is not as high, whereas for both RF and SVM it is significant. Another problem concerns the discrimination between the Low and Very Low classes; here again, the standard machine learning approaches are under strain and responsible for a considerable misclassification rate. The joint optimization of nonlinear input transformations along with the classification, common to all deep learning approaches, provides here a valuable strategy to discriminate among the different winter vegetation quality classes. Furthermore, as expected, the ability of RNNs to deal with the temporal correlations characterizing the SAR Sentinel-1 data results in a performance gain on all classes, with particular emphasis on those classes that exhibit similar temporal behaviors for longer periods. All these results indicate that the RNN models (both LSTM-based and GRU-based) are well suited to detect and exploit temporal dependencies, as opposed to common classification approaches that do not explicitly leverage temporal correlations.

VI. CONCLUSION

In this letter, we show how RNN models such as LSTM and GRU can be used to deal with SAR Sentinel-1 time series for the mapping of winter vegetation quality. RNNs clearly outperform classical machine learning approaches commonly used in remote sensing. The experiments highlight the appropriateness of a specific class of deep learning models (RNNs), which explicitly consider the temporal correlation of the data in order to discriminate between quality classes of winter vegetation coverage, typically characterized by similar but complex temporal behaviors.

ACKNOWLEDGMENT

The authors would like to thank the Syndicat des Eaux de la Charente Maritime and the water authority of the City of La Rochelle, who carried out the field surveys, and the Re-Sources program.

REFERENCES
[1] G. Kallis and D. Butler, "The EU water framework directive: Measures and implications," Water Policy, vol. 3, no. 2, pp. 125–142, 2001.
[2] C. Buckley and P. Carney, "The potential to reduce the risk of diffuse pollution from agriculture while improving economic performance at farm level," Environ. Sci. Policy, vol. 25, no. 1, pp. 118–126, 2013.
[3] W. D. Hively, M. Lang, G. W. McCarty, J. Keppler, A. Sadeghi, and L. L. McConnell, "Using satellite remote sensing to estimate winter cover crop nutrient uptake efficiency," J. Soil Water Conservation, vol. 64, no. 5, pp. 303–313, 2009.
[4] R. Torres et al., "GMES Sentinel-1 mission," Remote Sens. Environ., vol. 120, pp. 9–24, May 2012.
[5] J. Inglada, A. Vincent, M. Arias, and C. Marais-Sicre, "Improved early crop type identification by joint use of high temporal resolution SAR and optical image time series," Remote Sens., vol. 8, no. 5, p. 362, 2016.
[6] Y. Bengio, A. Courville, and P. Vincent, "Representation learning: A review and new perspectives," IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 8, pp. 1798–1828, Aug. 2013.
[7] S. Hochreiter and J. Schmidhuber, "LSTM can solve hard long time lag problems," in Proc. NIPS, 1996, pp. 473–479.
[8] P. Prats-Iraola, R. Scheiber, L. Marotti, S. Wollstadt, and A. Reigber, "TOPS interferometry with TerraSAR-X," IEEE Trans. Geosci. Remote Sens., vol. 50, no. 8, pp. 3179–3188, Aug. 2012.
[9] S. Quegan and J. J. Yu, "Filtering of multichannel SAR images," IEEE Trans. Geosci. Remote Sens., vol. 39, no. 11, pp. 2373–2379, Nov. 2001.
[10] D. H. T. Minh, Y.-N. Ngo, N. Baghdadi, and P. Maurel, "TomoSAR platform: The new Irstea service as demand for SAR, interferometry, polarimetry and tomography," in Proc. ESA Living Planet Symp., 2016, pp. 1–8.
[11] K. Soma, R. Mori, R. Sato, N. Furumai, and S. Nara, "Simultaneous multichannel signal transfers via chaos in a recurrent neural network," Neural Comput., vol. 27, no. 5, pp. 1083–1101, 2015.
[12] T. Linzen, E. Dupoux, and Y. Goldberg, "Assessing the ability of LSTMs to learn syntax-sensitive dependencies," Trans. Assoc. Comput. Linguistics, vol. 4, pp. 521–535, 2016.
[13] K. Cho et al., "Learning phrase representations using RNN encoder-decoder for statistical machine translation," in Proc. EMNLP, 2014, pp. 1724–1734.
[14] A. Graves, A.-R. Mohamed, and G. E. Hinton, "Speech recognition with deep recurrent neural networks," in Proc. ICASSP, May 2013, pp. 6645–6649.
[15] R. Flamary, M. Fauvel, M. D. Mura, and S. Valero, "Analysis of multitemporal classification techniques for forecasting image time series," IEEE Geosci. Remote Sens. Lett., vol. 12, no. 5, pp. 953–957, May 2015.
[16] F. Pedregosa et al., "Scikit-learn: Machine learning in Python," J. Mach. Learn. Res., vol. 12, pp. 2825–2830, Oct. 2011.
[17] C.-C. Chang and C.-J. Lin, "LIBSVM: A library for support vector machines," ACM Trans. Intell. Syst. Technol., vol. 2, no. 3, pp. 27:1–27:27, 2011.
[18] F. Chollet. (2015). Keras. [Online]. Available: https://github.com/fchollet/keras
[19] Y. N. Dauphin, H. de Vries, J. Chung, and Y. Bengio, "RMSProp and equilibrated adaptive learning rates for nonconvex optimization," unpublished paper, 2015. [Online]. Available: https://arxiv.org/abs/1502.04390v1
[20] L. Zhang, L. Zhang, and B. Du, "Deep learning for remote sensing data: A technical tutorial on the state of the art," IEEE Geosci. Remote Sens. Mag., vol. 4, no. 2, pp. 22–40, Jun. 2016.