int. j. remote sensing, 1999, vol. 20, no. 9, 1841-1851
The use of Neural Networks for the estimation of oceanic constituents based on the MERIS instrument

DANIEL BUCKTON and EON O'MONGAIN†
Physics Department, University College Dublin, Belfield, Dublin 4, Ireland. Tel: 353-1-7062233. Fax: 353-1-2837275.

SEÁN DANAHER
FIES, Leeds Metropolitan University, Leeds LS1 3HE, England, UK

(Received 27 March 1997; in final form 28 November 1997)

Abstract. Artificial Neural Networks (NNs) are used to estimate oceanic constituents from simulated data for the Medium Resolution Imaging Spectrometer (MERIS) instrument system in Case II water applications. The simulation includes the effects of oceanic substances such as algal chlorophyll, non-chlorophyllous suspended matter and dissolved organic matter (DOM). We show that NNs can estimate oceanic constituents from simulated data that include the effects of realistic noise and variability models. NNs achieve higher retrieval accuracy than more traditional techniques such as band-ratio algorithms. They also allow the inclusion of information that is usually superfluous or unused, such as geometric parameters and atmospheric visibility.
1. Introduction

A number of methods have been proposed for the estimation of oceanic constituents from both real and simulated remotely sensed data (Gordon et al. 1983, Parslow 1991, Danaher and O'Mongain 1992, Danaher et al. 1992, O'Mongain et al. 1993). Recent attempts to maximize inversion capability have concentrated on Neural Networks (NNs). These allow the inversion to include parameters such as Sun angle and viewing geometry (Benediktsson et al. 1993, Buckton et al. 1995). The simulations developed by University College Dublin and Leeds Metropolitan University calculate the expected signal at satellite level. This includes the effects of atmospheric interaction and instrument noise (O'Mongain et al. 1993, Buckton et al. 1995). Any realistic simulation requires noise and variability models if it is to indicate the true ability to retrieve the concentrations of oceanic constituents.

† e-mail: [email protected]

International Journal of Remote Sensing ISSN 0143-1161 print/ISSN 1366-5901 online © 1999 Taylor & Francis Ltd
http://www.tandf.co.uk/JNLS/res.htm
http://www.taylorandfrancis.com/JNLS/res.htm
2. The simulation models

The simulation used to generate the Medium Resolution Imaging Spectrometer (MERIS) signal combines models from various sources. The oceanic model is adapted from the work of Morel (1988), while the atmospheric model is that of Deschamps et al. (1982). The instrument model incorporates the expected gain and noise levels of the MERIS sensor. The structure of the simulation is shown in figure 1. The ranges of constituent concentration, geometric parameters and atmospheric optical depth are given in table 1.
Figure 1. General structure of the simulation and estimation of the oceanic constituents utilizing oceanic, atmospheric and satellite models.

Table 1. The range of parameters (defined in the text) for each of the simulations. For each particular scene the values are selected randomly from within the range. $C_{yel}$ is expressed as absorption at 440 nm.

Parameter              Min     Max     Units
$C_{chl}$              0       30      µg l⁻¹
$C_{sed}$              0       1       mg l⁻¹
$C_{yel}$              0       1       m⁻¹ (as $a(440)$)
$\theta_S$             23      53      deg.
$\theta_v$             0       47      deg.
$\phi$                 0       180     deg.
$\tau_p(\lambda_0)$    0.10    0.15    -
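Table 1 states only that each scene's values are drawn randomly from within these ranges. The minimal Python sketch below shows that step. The independent uniform distribution, the dictionary keys and the seeding are illustrative assumptions, not details taken from the authors' simulation.

```python
import random

# Parameter ranges (min, max) as listed in Table 1.
# Units as in Table 1: C_chl in µg/l, C_sed in mg/l, C_yel as absorption a(440) in 1/m,
# angles in degrees, tau_p dimensionless.
TABLE1_RANGES = {
    "C_chl": (0.0, 30.0),
    "C_sed": (0.0, 1.0),
    "C_yel": (0.0, 1.0),
    "theta_S": (23.0, 53.0),
    "theta_v": (0.0, 47.0),
    "phi": (0.0, 180.0),
    "tau_p": (0.10, 0.15),
}


def sample_scene(rng):
    """Draw one scene, each parameter independently and uniformly within its range.
    Uniform, independent sampling is an assumption; the text says only 'randomly'."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in TABLE1_RANGES.items()}


def sample_scenes(n, seed=0):
    """Draw n scenes reproducibly from a seeded generator."""
    rng = random.Random(seed)
    return [sample_scene(rng) for _ in range(n)]


if __name__ == "__main__":
    for scene in sample_scenes(3):
        print({name: round(value, 3) for name, value in scene.items()})
```

Tables 2 and 3 use 300 scenes per set. Independent training, second and test sets could be drawn with different seeds.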
2.1. The oceanic model

The oceanic model takes as input the constituent concentrations, i.e. the concentrations of chlorophyll, dissolved organic matter (DOM) and non-chlorophyllous suspended matter, which we shall call sediments. The model's output is the subsurface irradiance reflectance. Calculating the subsurface reflectance requires knowledge of the absorption and backscatter coefficients, which are related to the reflectance by (Gordon et al. 1975, Morel 1988)

$$R(\lambda, z) = 0.331\,\frac{b_b(\lambda, z)}{a(\lambda, z)} \qquad (1)$$
The absorption $a$ and backscatter $b_b$ in equation (1) are calculated by simple addition of their specific components. The totals are therefore

$$a = a_w + a_c + a_s + a_y, \qquad b_b = b_w + b_c + b_s + b_y \qquad (2)$$
where the subscripts w, c, s and y denote water, chlorophyll-a-indexed algae, sediments and DOM, respectively. Morel's model of oceanic reflectance for Case I waters is used to calculate the absorption and backscatter coefficients of the algae. It is then adapted to Case II waters by including the effects of sediments and DOM. The pure-water values are taken from Smith and Baker (1981). The specific absorption and backscatter of the sediments are taken from measurements by Doerffer (1992). The absorption and scatter of sediments depend strongly on the sediment type chosen, and hence are specific to a particular body of water. The absorption coefficient of DOM can be modelled by

$$a_y(\lambda) = a(\lambda_0)\,\exp[-0.014(\lambda - \lambda_0)] \qquad (3)$$
where $\lambda_0$ is taken to be 440 nm, the reference wavelength for the DOM concentration. The backscatter of DOM is assumed to be negligible, so $b_y$ can be dropped from equation (2). The atmospheric model requires as input the above-surface reflectance, with the surface treated as a Lambertian reflector. This reflectance is obtained by multiplying the subsurface reflectance by a constant factor of one half. This simple treatment of the air-water interface requires further work to improve its realism. The subsurface reflectances for increasing concentrations of each constituent individually are shown in figure 2.

2.2. The atmospheric model

The atmospheric component is simulated with the model developed by Deschamps et al. (1982). This model is empirical and simulates only the Rayleigh and aerosol components of the atmosphere. It takes as input the above-surface reflectance, with the surface again approximated as a Lambertian reflector, together with the geometric parameters. These are the satellite viewing angle $\theta_v$, the Sun angle $\theta_S$ and the relative azimuth angle $\phi$ between the Sun and the satellite at the pixel under observation. The output of the model is the top-of-atmosphere (TOA) reflectance, which can be converted to the radiance at the satellite input via the solar spectrum. The optical thickness of the aerosol component, $\tau_p(\lambda_0 = 1\,\mu\mathrm{m})$, was made an unknown parameter within
the simulation, with a range of 0.1 to 0.15. A value of $\tau_p(\lambda_0) = 0.132$ corresponds to a horizontal visibility of 23 km.

Figure 2. Subsurface reflectance as a function of chlorophyll (0 to 30 µg l⁻¹), sediment (0 to 1 mg l⁻¹) and yellow substance (0 to 1 m⁻¹) concentrations. For chlorophyll the reflectance in the blue decreases as the concentration increases. For sediment the reflectance increases with concentration. For yellow substance the reflectance in the blue decreases as the concentration increases.

2.3. The instrument model

The instrument model calculates the output signal returned to Earth from an input radiance, in W m⁻² sr⁻¹ band⁻¹, at the satellite. The radiance is converted into observed photoelectron counts $O$, which allows instrument and photon noise to be calculated from the European Space Agency (ESA) model of the instrument. The instrument and photon noise are modelled by drawing a random number from a normal distribution with $\sigma = \sqrt{O + N^2}$. Here $N$ denotes the instrument noise figures supplied by ESA at the time of the simulations. This noise model does not differ in any substantial respect from the current model, and includes dark, digitization and instrument noise levels. The normally distributed noise is added to $O$ to form the simulated observation set. The signal-to-noise ratio (SNR) for a typical observation is shown in figure 3.

Figure 3. The signal-to-noise ratio (SNR) for the MERIS instrument for a typical dataset, including photonic and instrument noise. Instrument and noise models used were those available at the time of the simulation and may differ from current models.

3. Artificial Neural Networks

The general structure of a NN can be found in many modern textbooks (e.g. Carling 1992). In this study we consider only the feed-forward multilayer perceptron architecture. In this architecture the network consists of a number of layers of neurons, and the output of each neuron is connected to the input of every neuron in the next layer. The input $p$ and the output $a$ of a single neuron are related by a weight $w$, a bias $b$ and a transfer function $f$, as represented by figure 4 and the equation

$$a = f(wp + b) \qquad (4)$$
The transfer function of a neuron can range from binary through linear to more complex functions. Log-sigmoid transfer functions are used here, except in the output layer, which is linear. The output of a single log-sigmoid neuron is then

$$a = \frac{1}{1 + \exp[-(wp + b)]} \qquad (5)$$
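The neuron of equations (4) and (5) can be stacked into the layered network of equation (6) and trained in the way the following paragraphs describe. The sketch below does this in Python with NumPy, as a minimal illustration rather than the authors' implementation. The paper used the MATLAB Neural Network Toolbox with Levenberg-Marquardt training. This sketch substitutes plain batch steepest descent on the sum-squared error, combined with the stopping rule on a second data set described in §3.1.1. The function names, the initialization scale and the learning rate are all hypothetical choices.

```python
import numpy as np


def logsig(n):
    """Log-sigmoid transfer function of equation (5): 1 / (1 + exp(-n))."""
    return 1.0 / (1.0 + np.exp(-n))


def init_layers(sizes, rng):
    """Random initial weights and zero biases for consecutive layer sizes, e.g.
    [13, 13, 5, 3]: 13 inputs, 13 and 5 log-sigmoid neurons, 3 linear outputs."""
    return [(rng.normal(0.0, 0.5, (n_out, n_in)), np.zeros((n_out, 1)))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]


def forward(P, layers):
    """Activations of every layer for inputs P (one input vector per column).
    Hidden layers are log-sigmoid; the final layer is linear."""
    acts = [P]
    for i, (W, b) in enumerate(layers):
        n = W @ acts[-1] + b                      # equation (4) for a whole layer
        acts.append(n if i == len(layers) - 1 else logsig(n))
    return acts


def sse(P, T, layers):
    """Sum-squared error between the network output and the targets T."""
    return float(np.sum((forward(P, layers)[-1] - T) ** 2))


def train(P, T, P_val, T_val, layers, lr=0.2, max_epochs=5000, patience=50):
    """Batch steepest descent on the squared error (averaged over samples).
    Stops once the error on the second set (P_val, T_val) has not improved
    for `patience` epochs, and returns the parameters with the lowest such error."""
    best_err = sse(P_val, T_val, layers)
    best = [(W.copy(), b.copy()) for W, b in layers]
    stale = 0
    for _ in range(max_epochs):
        acts = forward(P, layers)
        delta = 2.0 * (acts[-1] - T) / P.shape[1]  # dE/dn at the linear output layer
        grads = []
        for i in range(len(layers) - 1, -1, -1):
            W, _b = layers[i]
            grads.append((delta @ acts[i].T, delta.sum(axis=1, keepdims=True)))
            if i > 0:
                a = acts[i]                        # log-sigmoid output feeding layer i
                delta = (W.T @ delta) * a * (1.0 - a)
        grads.reverse()
        layers = [(W - lr * gW, b - lr * gb)
                  for (W, b), (gW, gb) in zip(layers, grads)]
        err = sse(P_val, T_val, layers)
        if err < best_err:
            best_err, best, stale = err, [(W.copy(), b.copy()) for W, b in layers], 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best
```

For a network like that of table 2, `init_layers([k + 3, 13, 5, 3], rng)` would give the $k + 3$ inputs of equation (11), then 13 and 5 log-sigmoid neurons, then three linear outputs. Here the scenes are stored one per column.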
Figure 4. A single neuron consisting of an input value $p$, a weight $w$, a bias $b$ and a transfer function $f$.

Figure 5. A two-layer network consisting of three inputs, a four-neuron sigmoid layer and a linear output layer.

The output of each neuron either feeds into the next layer or, if the neuron is in the last layer, forms one of the outputs of the network. The overall structure is represented in figure 5. The input values to the complete network are specified in the input vector $P$, and the final output vector $I$ is determined by combining the input with the weights and biases. The network may be represented by some function $F$ such that

$$I = F(P, w_1, \ldots, w_l, b_1, \ldots, b_l) \qquad (6)$$
where $w_i$ and $b_i$ represent vectors containing the weights and biases of the $i$th layer, and $l$ is the total number of layers in the network. In this context the purpose of NNs is to perform function approximation. Suppose, for example, that a function $G$ produces the observation $O$ from some physical parameters $x$ and $y$,

$$O = G(x, y) \qquad (7)$$
Then, given suitable derived values of the weights and biases, a NN can approximate either the function $G$ or its inverse $G^{-1}$. Here the simulation performs the function of $G$, with $x$ being the constituents, $y$ the viewing geometries and $O$ the observations. We want the network to perform, in effect, the inversion of this, so that the constituent concentrations can be robustly estimated from the observational data and the viewing information. For the network to approximate a function, appropriate values for the weights and biases must be determined. This is done in the training stage. Training uses a set of inputs $P$ and a known corresponding calibrating or training set $T$, which span the input range over which the user wishes the network to operate. The training algorithm is given an error-minimizing function, random initial guesses of the weights and biases, and the set of inputs $P$ with the corresponding set of training parameters $T$. It first calculates the output vector or matrix $I$ from $P$. This is denoted $I_0$, where the subscript 0 denotes the zeroth epoch, or stage, of the error minimization. The algorithm then calculates the sum-squared error between $I_0$ and $T$. It modifies the weights and biases according to the error-minimization method, which is often something like steepest descent on the error surface. This is repeated until the error between $T$ and $I$ reaches a satisfactory level. Once training has been completed, the network can be used to estimate the corresponding input values from a new set of observables. Implementing a neural network first requires decisions about the number of layers in the network, the number of neurons in each layer, the transfer function of the neurons in each layer, and the
method of presenting the data to the network. The mechanisms for training and implementation are well documented (Mathworks Inc. 1994). The principal training algorithm used here is the Levenberg-Marquardt algorithm. It requires more memory than other techniques but converges quickly, although, like other gradient-based methods, it is guaranteed only to reach a local minimum of the error. Note also that the final implementation of a NN and the training technique are quite independent processes. The training process tends to be numerically intensive, whereas the final implementation requires comparatively little computation.

3.1. Implementation of a NN

A NN is straightforward to implement using the Neural Network Toolbox for the MATLAB software system, produced by The MathWorks, which provides easy access to many powerful NN features. Here we have a set of observations $O$ that can be attributed to a set of parameters such as the constituent concentrations and the geometric parameters. We wish a NN to approximate the relationship between the two. For a satellite image the input information consists of the observed data, contained in $O^j$, and the geometric parameters $\theta_S$, $\theta_v$ and $\phi$, where $O^j$ is the vector of observations at each wavelength for pixel $j$. The oceanic constituent concentrations and the atmospheric composition are the unknown, or output, parameters, which are known only for the training data set. We form a combined input matrix $P$ from the observation matrix $O$ and the geometric parameters, so that a row of $P$ can be expressed as

$$P^j = [\,O^j_{\lambda_1} \;\cdots\; O^j_{\lambda_n} \;\; \theta^j_S \;\; \theta^j_v \;\; \phi^j\,] \qquad (8)$$
where $\lambda_1, \ldots, \lambda_n$ correspond to the $n$ spectral bands of the instrument. The network's weights and biases are then trained to calculate the corresponding constituent concentrations from $P$. The training matrix $T$ is formed from the constituent concentrations, and a row of it can be written

$$T^j = [\,C^j_c \;\; C^j_y \;\; C^j_s\,] \qquad (9)$$

where $C^j_c$, $C^j_y$ and $C^j_s$ represent the concentrations of chlorophyll, DOM and sediments, respectively, for the $j$th pixel. Extra parameters, such as the optical depth, may also be required (this is not dealt with here). Under simulation, even when noise sources are present, the condition number of the $O$ matrix is large. This, combined with the benefit of a dimensionally reduced input matrix, makes it desirable to pre-process the input matrix. The pre-processing is done with Singular Value Decomposition (SVD).

3.1.1. SVD as a pre-processing technique

SVD is used to reduce the input data before presentation to the NN. The observation matrix $O$ may be decomposed by SVD such that

$$O = W \Lambda V^{T} \qquad (10)$$

where $W$ contains the unit column eigenvectors of $OO^{T}$, with $O^{T}$ signifying the transpose of $O$. $V$ contains the unit eigenvectors of $O^{T}O$ (the rows of $V^{T}$). $\Lambda$ is a diagonal matrix containing the square roots of the eigenvalues of $O^{T}O$, sorted in descending order. Depending on how much of the full rank of $O$ is
spanned by the signal space associated with the constituents, we select the first $k$ columns of $W$ as input to the network, denoted by a prime as $W'$. This reduces the size of the input matrix to the network. Typically $k$ is between eight and ten, depending on the relative levels of signal and noise. The geometric parameters are usually appended (column-wise) to this to generate the network input

$$P = [\,W' \;\; \theta_S \;\; \theta_v \;\; \phi\,] \qquad (11)$$
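The pre-processing of equations (10) and (11), and the projection of new data that equation (12) then defines, can be sketched with NumPy's SVD. This is an illustration under stated assumptions, not the authors' MATLAB code. The layout with one pixel per row and one band per column follows equation (8). The function names are hypothetical.

```python
import numpy as np


def svd_reduce(O, k):
    """Equation (10), O = W Lambda V^T, truncated to the first k components.

    O holds one pixel per row and one spectral band per column. Returns W'
    (first k columns of W), V' (first k columns of V) and the first k
    singular values (the diagonal of Lambda')."""
    W, s, Vt = np.linalg.svd(O, full_matrices=False)  # s is sorted in descending order
    return W[:, :k], Vt[:k, :].T, s[:k]


def network_input(W_k, theta_s, theta_v, phi):
    """Equation (11): append the per-pixel geometry as three extra columns."""
    return np.column_stack([W_k, theta_s, theta_v, phi])


def project_new(O_new, V_k, s_k, theta_s, theta_v, phi):
    """Equation (12): reduced input for new data, O~ V' Lambda'^-1, plus geometry."""
    return network_input(O_new @ V_k / s_k, theta_s, theta_v, phi)
```

When `project_new` is applied to the training observations themselves, it reproduces $W'$ exactly, because the columns of $V$ are orthonormal. This is the property that equation (12) relies on. With 15 bands and $k = 10$, each row of $P$ has 13 entries.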
The NN is trained on this input with the associated training matrix $T$, which contains the oceanic constituent concentrations as specified in equation (9). Once the weights and biases have been established by training, the constituent concentrations for subsequent datasets can be estimated by presenting the network with new data. A new input matrix $\tilde{P}$ can be generated from the new observation matrix $\tilde{O}$ by making use of the orthonormality of the columns of $V$, hence

$$\tilde{P} = [\,\tilde{O} V' \Lambda'^{-1} \;\; \tilde{\theta}_S \;\; \tilde{\theta}_v \;\; \tilde{\phi}\,] \qquad (12)$$

where again $V'$ and $\Lambda'$ are reduced matrices containing only the first $k$ columns and diagonal elements, respectively. The tilde signifies parameters belonging to a set independent of the training set.

The training phase is deemed complete when the error on a second training set decreases negligibly or starts to increase. Continued training is unlikely to improve the inversion accuracy. For an over-parameterized network it is likely to cause overtraining, a situation in which the network fits specific data points instead of generalizing over the whole scene. The inversion accuracy is then tested on a third simulated set to provide a truly independent estimate of the accuracy. The weights and biases are calculated by the training algorithm, but the number of layers and the number of neurons in each layer are chosen by hand. With too simple an architecture (too few neurons and/or layers), the sum-squared error will not reduce sufficiently even on the training data. With too complex an architecture, the network will yield a very low error on the first training set, but its performance will degrade on the second dataset (overtraining).

3.2. Measures of accuracy

Accuracy can be assessed in different ways depending on the circumstances. The two measures used here are correlation and RMS error in logarithmic form. Let $C_i$ be the $i$th simulated/measured value of a parameter and $C_i^{est}$ the corresponding estimate. The correlation is calculated as

$$\xi = \frac{\sum_i (C_i^{est} - \bar{C}^{est})(C_i - \bar{C})}{\sqrt{\sum_i (C_i^{est} - \bar{C}^{est})^2}\;\sqrt{\sum_i (C_i - \bar{C})^2}} \qquad (13)$$

and the RMS error by

$$\varepsilon = 20 \log_{10} \frac{\sqrt{\sum_i (C_i - C_i^{est})^2}}{\sqrt{\sum_i C_i^2}} \qquad (14)$$
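Equations (13) and (14) translate directly into NumPy. The minimal sketch below also converts a decibel figure back into the relative error it implies, $100 \times 10^{\varepsilon/20}$ per cent, which is how the percentages quoted for tables 2 and 3 relate to their dB values. The function names are illustrative.

```python
import numpy as np


def correlation(c_true, c_est):
    """Equation (13): correlation between simulated and estimated concentrations."""
    dt = c_true - c_true.mean()
    de = c_est - c_est.mean()
    return float(np.sum(de * dt) / (np.sqrt(np.sum(de ** 2)) * np.sqrt(np.sum(dt ** 2))))


def rms_error_db(c_true, c_est):
    """Equation (14): normalized RMS difference expressed in decibels."""
    ratio = np.sqrt(np.sum((c_true - c_est) ** 2)) / np.sqrt(np.sum(c_true ** 2))
    return float(20.0 * np.log10(ratio))


def db_to_percent(e_db):
    """Relative error, in per cent, implied by an RMS error of e_db decibels."""
    return 100.0 * 10.0 ** (e_db / 20.0)
```

Applied to the dB figures in tables 2 and 3, `db_to_percent` reproduces the listed percentages to within rounding. An estimate such as $C^{est} = 2C + 5$ scores a correlation of exactly one, while `rms_error_db` exposes its offset and scale errors. This illustrates why correlation alone is an incomplete measure.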
The correlation measure can be difficult to visualize in this context. It suffices to say that a correlation of one corresponds to a perfect linear
relationship between the estimated and true values, and a correlation of zero indicates no linear relationship between them. Note that the correlation measure ignores any offset (bias) and scaling errors between the two sets, so it is not a complete measure of accuracy. The RMS error calculates the normalized difference between the two sets and expresses it in decibels. Assuming a normal distribution of error between the estimate and the simulated value, an RMS error of -10 dB corresponds to an error of about 30%. Similarly, -20 dB corresponds to about 10% and -30 dB to about 3%.

4. Results

The model inputs, in the form of simulated concentrations and geometry, are given in table 1. The general procedure for selecting a suitably sized NN is to start with an undersized network, which is trained until little improvement per epoch is attained. The size of the network is then gradually increased until no significant improvement with size is attained, or until overtraining is observed. The number of floating point operations (flops) for the inversion depends on the size of the network and increases significantly with the number of nodes in each layer.

4.1. Inversion of noise-free simulated data

The noise-free case indicates the maximum likely inversion accuracy. This accuracy is, of course, affected by the method used to reduce the data before they are input to the network. SVD is linear, so it will lose information because of the nonlinearity of the function $G$ if fewer vectors are input to the network than there are bands. For the noise-free case it was therefore necessary to set $k = 15$, which allowed the maximum level of information to be input to the NN. The retrieval accuracy shown in table 2 demonstrates good performance, with a typical error of less than 10% for the three constituents.

4.2. Inversion of simulated data including noise effects

A considerable number of noise sources and variations will be present in any satellite observation system, and these can significantly affect any retrieval attempt. The sources of noise and variation likely to affect retrieval performance here are instrument and photonic noise, and atmospheric and geometric variation.
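The instrument and photon noise model of §2.3 can be sketched as follows. Gaussian noise with $\sigma = \sqrt{O + N^2}$ is added to the photoelectron counts $O$. Photon (shot) noise has a variance equal to the count itself, which is why $O$ appears under the square root alongside the instrument variance $N^2$. This is a minimal NumPy illustration. The function name and the example noise figures are hypothetical, not the ESA values used in the study.

```python
import numpy as np


def add_photon_and_instrument_noise(O, N, rng):
    """Add zero-mean Gaussian noise with sigma = sqrt(O + N**2) to photoelectron counts.

    O: photoelectron counts, shape (pixels, bands).
    N: per-band instrument noise figure in electrons, shape (bands,) (hypothetical values).
    Returns the noisy observations and the per-element SNR, O / sigma."""
    sigma = np.sqrt(O + N ** 2)          # photon variance O plus instrument variance N**2
    noisy = O + rng.normal(0.0, sigma)   # one normal draw per pixel and band
    return noisy, O / sigma
```

For example, a band receiving $10^4$ photoelectrons with negligible instrument noise has an SNR of 100. The instrument term $N$ lowers this further, most strongly in bands with low signal.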
Table 2. NN inversion accuracy for a three-layer network with 13 log-sigmoid neurons in the first layer, five log-sigmoid neurons in the second layer and a three-neuron linear final layer. The simulated data were noise-free, with 300 scenes.

Constituent    Correlation    RMS error (dB)    Error (%)
Chl-a          0.994          -30.1             3
Sed.           0.996          -26.5             5
DOM            0.997          -28.1             4

Table 3. NN inversion accuracy for a three-layer network with 13 log-sigmoid neurons in the first layer, five log-sigmoid neurons in the second layer and a three-neuron linear final layer. The simulated data contained photonic and instrument noise with aerosol optical thickness variability. 300 scenes were simulated for both the training and implementation sets.

Constituent    Correlation    RMS error (dB)    Error (%)
Chl-a          0.85           -11.2             27
Sed.           0.96           -17.4             13
DOM            0.89           -12.1             25

Figure 6. Simulated chlorophyll concentration versus estimated concentration for 300 data points. The left plot corresponds to simulated data without noise, with a retrieval accuracy of 3% and a correlation of 0.994. The right plot shows retrieval in the presence of photonic and instrument noise, with a retrieval accuracy of 27% and a correlation of 0.85. Both plots include unknown atmospheric optical depth.

The inclusion of instrument noise in the calculated TOA signal has a significant
effect on the inversion accuracy, as shown in table 3. A scatter plot of the simulated versus the retrieved chlorophyll concentration is shown in figure 6. For the results presented here $k = 10$. The resulting slight reduction in retrieval accuracy is offset by the reduced computational requirements for inversion and training. These results show that NNs used to invert oceanic constituents in Case II waters are likely to retrieve the chlorophyll-a and sediment concentrations to an accuracy of between 10% and 30% over the range specified.

5. Conclusion

We have shown that, in the absence of noise, simulated satellite observations can be inverted to calculate oceanic constituent concentrations using a reasonable number of pixels with ground truth (approximately 300 data points). However, the performance of any retrieval technique will be adversely affected by
the presence of noise sources and variability, as outlined in §4.2. Inversion accuracy is reduced once the simulation includes instrument and photonic noise, atmospheric variation (in a limited way) and variation in the geometric parameters, but it remains acceptable for Case II waters. For the network used here, such an inversion (including pre-processing) requires approximately 1130 flops per pixel. This may be reduced by the use of look-up tables. Further work is required to examine the effects of additional noise, variabilities and instabilities as described. It is also desirable to include atmospheric correction schemes, whether NN-based (O'Mongain et al. 1993) or using traditional techniques, so that the inversion mechanism can be trained on estimated above-water reflectances.

References

Benediktsson, J., Swain, P., and Ersoy, K., 1993, Conjugate-gradient neural networks in classification of multisource and very high dimensional remote sensing data. International Journal of Remote Sensing, 14, 2883-2903.
Buckton, D., Danaher, S., and O'Mongain, E., 1995, Simulation of the MERIS instrument and constituent estimation. Proceedings SPIE Global Process Monitoring and Remote Sensing of the Ocean and Sea Ice, 2586, 2-13.
Carling, A., 1992, Introducing Neural Networks (Sigma Press, UK). ISBN: 1-85058-174-6.
Danaher, S., and O'Mongain, E., 1992, Singular value decomposition in multispectral radiometry. International Journal of Remote Sensing, 13, 1771-1777.
Danaher, S., O'Mongain, E., and Walsh, J., 1992, A new cross-correlation algorithm and the detection of rhodamine-B dye in sea water. International Journal of Remote Sensing, 13, 1743-1755.
Demuth, H., and Beale, M., 1994, The Neural Network Toolbox: User Guide, The Mathworks Inc., Natick, MA, USA.
Deschamps, P. Y., Herman, M., Tanré, D., Rouquet, M. C., and Durpaire, J. P., 1982, Effets atmosphériques et évaluation du signal pour des instruments optiques de télédétection. ESA Journal, 6, 233-246.
Doerffer, R., 1992, Imaging spectroscopy for detection of chlorophyll and suspended matter, GKSS, ISSN 0344-9629; in F. Toselli and J. Bodechtel (eds.), Imaging Spectroscopy: Fundamentals and Prospective Applications, 215-257.
Gordon, H., Brown, O. B., and Jacobs, M. M., 1975, Computed relationships between inherent and apparent optical properties of a flat homogeneous ocean. Applied Optics, 14(2), 417-427.
Gordon, H., Clark, D. K., Brown, O. B., Evans, R. H., and Broenkow, W. W., 1983, Phytoplankton pigment concentrations in the middle Atlantic bight: comparison of ship determinations and CZCS estimates. Applied Optics, 22(1), 20-36.
Morel, A., 1988, Optical modeling of the upper ocean in relation to its biogenous matter content (Case I waters). Journal of Geophysical Research, 93, 10749-10768.
O'Mongain, E., Danaher, S., Buckton, D., and Bezy, J. L., 1993, Definition of the calibration requirements for an imaging spectrometer system. Proceedings SPIE Recent Advances in Sensors, Radiometric Calibration, and Processing of Remotely Sensed Data, 1938, 88-99.
Parslow, J., 1991, An efficient algorithm for estimating chlorophyll from CZCS data. International Journal of Remote Sensing, 12, 2065-2072.
Smith, R., and Baker, K., 1981, Optical properties of the clearest natural waters (200-800 nm). Applied Optics, 20, 177-184.