WATER RESOURCES RESEARCH, VOL. 41, W03022, doi:10.1029/2004WR003630, 2005
Self-organizing maps with multiple input-output option for modeling the Richards equation and its inverse solution

N. Schütze and G. H. Schmitz
Institute of Hydrology and Meteorology, Dresden University of Technology, Dresden, Germany
U. Petersohn
Institute of Artificial Intelligence, Dresden University of Technology, Dresden, Germany

Received 8 September 2004; revised 18 October 2004; accepted 10 November 2004; published 26 March 2005.
[1] Inverse solutions of the Richards equation, either for evaluating soil hydraulic
parameters from experimental data or for optimizing irrigation parameters, require considerable numerical effort. We present an alternative methodology based on self-organizing maps (SOM), which we further developed in order to include multiple input-output (MIO) relationships. The resulting SOM-MIO network approximates the Richards equation and its inverse solution with outstanding accuracy, and both tasks can be performed by the same network. No additional training is required for solving the different tasks, which represents a significant advantage over conventional networks. An application of the SOM-MIO simulating a laboratory irrigation experiment in a Monte Carlo-based framework shows a much improved computational efficiency compared to the numerical simulation model used. The high consistency of the results predicted by the artificial neural network and by the numerical model demonstrates the excellent suitability of the SOM-MIO for dealing with such kinds of stochastic simulation or for solving inverse problems.

Citation: Schütze, N., G. H. Schmitz, and U. Petersohn (2005), Self-organizing maps with multiple input-output option for modeling the Richards equation and its inverse solution, Water Resour. Res., 41, W03022, doi:10.1029/2004WR003630.
1. Introduction

[2] The numerical solution of the Richards equation continues to be the most popular method for modeling transient soil moisture transport, mainly due to the various limitations of analytical and semianalytical infiltration models [Philip, 1957; Haverkamp, 1983; Schmitz and Liedl, 1998]. User-friendly commercial (e.g., Hydrus-2D [Šimůnek et al., 1996]) and public domain simulation programs (e.g., VS2DT [Healy and Ronan, 2000]) are routinely applied, proving themselves to be versatile and flexible tools. However, all these numerical models require the support of iterative algorithms. Together with the ambiguity of the parameters of the van Genuchten soil model [Felgenhauer et al., 1999], this may jeopardize the stability of the computation, especially in the presence of strong pressure gradients or flow near saturation [Vogel et al., 2000]. Matters become even worse if the inverse solution of the Richards equation is required [Pan and Wu, 1998], which, moreover, incorporates the risk that the optimization may end up in a local minimum and hence provide an incorrect result. This contribution investigates a specific type of artificial neural network (ANN) to serve as an alternative strategy when dealing routinely with a multitude of repeated flow simulations, e.g., Monte Carlo-based studies and/or inverse solutions of the Richards equation.
[3] Applications of ANN in water resources, which mostly employ the Multilayer Perceptron (MLP) architecture, are well documented in several review papers [Atiya and Shaheen, 1999; Maier and Dandy, 2000; American Society of Civil Engineers (ASCE), 2000a, 2000b; Dawson and Wilby, 2001]. A number of applications focus on the development of robust tools in rainfall-runoff modeling [Shamseldin, 1997; Sezin and Johnson, 1999] as well as stream flow forecasting [Shamseldin and O'Connor, 2001; Jayawardena and Fernando, 2001; Atiya and Shaheen, 1999]. All these ANN-based models perform well in comparison with conventional methods [Hsu et al., 1995; Dawson and Wilby, 1999]. However, they yield poor results if the training data do not cover all essential scenarios [Kumar and Minocha, 2001; Minns and Hall, 1996]. In a first attempt to overcome this limitation, Smith and Eli [1995] used model-generated data rather than field data. They portrayed an entire hydrograph by a Fourier series and obtained accurate predictions for multiple storm events. Schmitz and Schütze [2002] proposed another solution to the dilemma of insufficient training data by employing physically based flow models in order to train a specific type of ANN, namely self-organizing maps (SOM) [Kohonen, 2001]. They achieved promising results when evaluating a reduced inverse solution of the Richards equation.

[4] Self-organizing maps were originally designed for objective classification, pattern recognition, and image compression, i.e., they lack the ability to generate output data. Cai et al. [1994] successfully employed a standard SOM for
classifying flow regimes in horizontal air-water flow. Other typical applications of SOMs include the compression of remote sensing data [Kothari and Islam, 1999] and data clustering [Bowden et al., 2002]. In contrast to MLP, the SOM network architecture provides an insight into the underlying process and avoids problems regarding the identification and training of MLP [Hsu et al., 2002]. Most probably, the lack of an output function in the SOM architecture has until now restricted its popularity in water resources. Efforts are being made to overcome this limitation. For predicting classes of hydraulic conductivity, Rizzo and Dougherty [1994] generate discrete output values using the counterpropagation network (CPN) [Hecht-Nielsen, 1987]. The CPN consists of a standard SOM and a so-called Grossberg layer.

[5] Recently, further combinations of SOM with various other techniques have been designed to approximate single continuous functions. Ranayake et al. [2002] combined self-organizing maps with a Multilayer Perceptron for a two-step estimation of the hydraulic conductivity and the dispersion coefficient. While the SOM was used to identify the subrange of the parameters, the MLP provided the final estimates. Hsu et al. [1997] modified a CPN in order to estimate precipitation via a Grossberg layer. Later, Hsu et al. [2002] successfully applied their self-organizing linear output (SOLO) mapping network to hydrologic rainfall-runoff forecasting. The SOLO network architecture combines a SOM and a linear mapping network of the same dimension. Schütze and Schmitz [2003] implemented another type of hybrid ANN, the local linear map (LLM). They integrated the mapping functions directly in a single self-organizing map, thus ensuring high reliability and accuracy in approximating the numerical solution of the Richards equation. All the approaches discussed above enable a trained hybrid SOM to be used for solving single input-output problems, i.e., for reidentifying an output vector from a given input vector.

[6] This contribution aims to proceed a step further by developing a self-organizing map architecture with a multiple input-output option (SOM-MIO) which, for example, allows simulating soil water transport as well as solving different inverse problems within a single SOM-MIO. Moreover, we introduce a training procedure which ensures that the generated data fully portray the modeling domain. Thus we analyze the aspects of generating optimal training sets by physically based models, following the suggestions of ASCE [2000b].
2. Methods

[7] Neural networks are composed of simple elements operating in parallel, inspired by biological nervous systems. As in nature, the network function is largely determined by the connections between the elements. Commonly, the training of an ANN is based on a comparison of its output y′ and a known target y. Such network architectures use a supervised learning procedure with a multitude of corresponding input-output pairs. Most current neural network applications apply the backpropagation algorithm to layered feedforward networks (i.e., MLP) for supervised learning. A specific form of this learning principle is also used by radial basis function (RBF) networks.

[8] Another technique for learning a particular function is unsupervised training. Network architectures like the self-organizing maps use algorithms which fit an "elastic net" of nodes to a signal space (represented by a great number of sample vectors (x, y), i.e., input plus output vectors) in order to approximate its density function. To realize this, SOMs combine unsupervised with competitive training methods.

2.1. Self-Organizing Maps

[9] A self-organizing map can be adapted to almost arbitrary domains of definition. It allows an interpretation of its internal network structure and is able to approximate the graph of any continuous function [Kohonen, 2001].

[10] The SOM network used in this investigation consists of l neurons organized on a regular grid. An (n + m)-dimensional weight vector m_i = (m_1, ..., m_n, m_{n+1}, ..., m_{n+m}) is assigned to each neuron, where n = dim(x) and m = dim(y) denote the dimensions of the sample input and the sample output, respectively. Thus, contrary to MLP and RBF networks, the input signal of the SOM, x_SOM, always consists of both input and output vectors (x, y), which are specified in detail in section 3. The neurons are connected to adjacent neurons by a neighborhood relationship, which defines the topology, or structure, of the SOM. In order to characterize the basic features, we use a two-dimensional structure of the self-organizing map and a hexagonal grid for the neighborhood relationship N_i. Generally, the SOM is trained iteratively. Each iteration k involves an unsupervised training step using a new sample vector x_SOM. The weight vectors m_i are modified according to the following training procedure.

[11] 1. Begin training the SOM.
[12] 2. Initialize the SOM: choose random values for the initial weight vectors m_i.
[13] 3. Begin iteration k.
[14] 4. Iteration step 1: best matching unit (winner) search. At each iteration k, one single sample vector x_SOM(k) is randomly chosen from the input data set, and its distance ε_i to the weight vectors of the SOM is calculated by

$$\epsilon_i = \left\| x_{SOM}(k) - m_i \right\| = \sum_{j=1}^{n+m} \left( x_{SOM}^{j}(k) - m_i^{j} \right)^2. \tag{1}$$

The neuron whose weight vector m_i is closest to the input vector x_SOM(k) is the "winner," i.e., the best matching unit (BMU) c, represented by the weight vector m_c(k):

$$\left\| x_{SOM}(k) - m_c(k) \right\| = \min_i \left\{ \left\| x_{SOM}(k) - m_i \right\| \right\}, \quad i = 1, 2, \ldots, l. \tag{2}$$

[15] 5. End iteration step 1.
[16] 6. Iteration step 2: weight vector update. After finding the best matching unit c, the weight vectors of the SOM are updated. Thus the BMU c moves closer to the input vector in the sample space. Figure 1 shows how the reference vector m_c(k) of the BMU and its neighbors move toward the sample vector x_SOM(k); Figures 1a and 1b correspond to the situation before and after updating, respectively. The rule for updating the weight vector of unit i is given by

$$m_i(k+1) = m_i(k) + \alpha_s(k) \, h_{ci}(k) \left[ x_{SOM}(k) - m_i(k) \right], \tag{3}$$
where k denotes the iteration step of the training procedure, α_s(k) is the learning rate at step k, and h_ci(k) is the so-called neighborhood function, which is valid for the actual BMU c. h_ci(k) is a nonincreasing function of k and of the distance d_ci of unit i from the best matching unit c. The Gaussian function is widely used to describe this relationship:

$$h_{ci}(k) = e^{-d_{ci}^{2} / 2\sigma^{2}(k)}. \tag{4}$$

Variable σ is the neighborhood radius at iteration k, and d_ci = ||r_c - r_i|| is the distance between map units c and i on the map grid. The neighborhood radius σ corresponds to the neighborhood relationship N_i.

Figure 1. Updating the best matching unit and its neighbors toward the sample vector x_SOM (marked x).

[17] 7. End iteration step 2.
[18] 8. End iteration k.
[19] 9. End training of the SOM.
[20] Iteration steps 1 and 2 are repeated with the n_d sample vectors as often as necessary to achieve convergence. Convergence implies that h_ci(k) → 0 for k → ∞ and thus depends on the function of the neighborhood radius σ(k). A common choice is an exponential decay described by Ritter et al. [1992]:

$$\sigma(k) = \sigma(0) \, e^{-k / k_{max}}. \tag{5}$$

The learning rate α_s should also vary with the increasing number of training steps, as indicated in equation (3). Kohonen [2001] recommended commencing with an initial value α_s(0) close to 1 and then decreasing it gradually with an increasing number of training steps k, e.g.,

$$\alpha_s(k) = \alpha_s(0) \, e^{-k / k_{max}}. \tag{6}$$

The cooperation between neighboring neurons, a unique feature of the SOM algorithm, ensures a rapid convergence and a high accuracy in approximating functional relationships. Even though the exponential decays described in equations (5) and (6) for the neighborhood radius σ and the learning rate α_s are purely heuristic solutions, they are adequate for a robust formation of the self-organizing map [Kohonen, 2001].
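To make the training procedure concrete, the following minimal Python/NumPy sketch implements steps 1-9, using the BMU search of equations (1) and (2), the update rule of equation (3), and the Gaussian neighborhood and exponential decay schedules of equations (4)-(6). It is our own illustration, not the authors' code; all names (train_som, grid_pos, etc.) are hypothetical, and a generic grid-coordinate array stands in for the paper's hexagonal topology.

```python
import numpy as np

def train_som(samples, grid_pos, k_max, sigma0=3.0, alpha0=0.9, seed=0):
    """Minimal SOM training sketch following equations (1)-(6).

    samples  : (n_d, n+m) array of sample vectors x_SOM = (x, y)
    grid_pos : (l, 2) array of map-grid coordinates r_i of the l neurons
    """
    rng = np.random.default_rng(seed)
    l, dim = grid_pos.shape[0], samples.shape[1]
    m = rng.random((l, dim))                     # step 2: random initial weights m_i

    for k in range(k_max):                       # step 3: begin iteration k
        x = samples[rng.integers(len(samples))]  # step 4: draw one sample x_SOM(k)
        eps = np.sum((x - m) ** 2, axis=1)       # distances to all weights, equation (1)
        c = np.argmin(eps)                       # best matching unit c, equation (2)

        sigma = sigma0 * np.exp(-k / k_max)      # neighborhood radius decay, equation (5)
        alpha = alpha0 * np.exp(-k / k_max)      # learning rate decay, equation (6)
        d_ci = np.linalg.norm(grid_pos - grid_pos[c], axis=1)  # grid distances d_ci
        h = np.exp(-d_ci ** 2 / (2.0 * sigma ** 2))            # Gaussian neighborhood, equation (4)

        m += alpha * h[:, None] * (x - m)        # step 6: weight update, equation (3)
    return m
```

For the hexagonal neighborhood relationship N_i used in the paper, grid_pos would simply hold the hexagonal lattice coordinates of the neurons.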
2.2. New SOM-MIO

[21] Originally, the SOM is intended as a tool for solving classification problems, e.g., feature extraction or recognition of images and acoustic patterns. When using the SOM for these tasks, the most important step before application is the interpretation of its final internal structure, i.e., the labeling of the network units appertaining to a certain class. The result of an operation of a classic SOM then represents discrete information, e.g., a certain phoneme in speech recognition or a character in optical character recognition (OCR), represented by the BMU.

[22] In order to offer a wide range of applications in water resources, we now expand the SOM principle by introducing a new interpolation method for applying a trained SOM, which generates multiple continuous output information. This leads to the new SOM-MIO architecture which, in accordance with the underlying problem, arranges the data vectors after training into two predefined parts during application. Rearranging the original data vectors allows for switching between the different mapping functions provided by the SOM-MIO. For example, consider a sample vector x_SOM with three components (x1, x2, x3). Three options for operating the SOM-MIO now exist: (1) (x = (x1, x2), y = x3), (2) (x = (x2, x3), y = x1), and (3) (x = (x1, x3), y = x2), where y denotes the required output which is not available during application. Two matrices D_x and D_y, with
$$D = \mathrm{diag}\{d_i\}, \qquad d_i = \begin{cases} 0, & i = 1, \ldots, n \\ 1, & i = n+1, \ldots, n+m, \end{cases} \tag{7}$$
must be defined a priori according to the chosen mapping function, in order to select the input or output components of the reference and data vectors. For example, operating the SOM-MIO with option 1 requires
$$D_x = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \qquad D_y = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}. \tag{8}$$
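As an illustration of how the selection matrices of equations (7) and (8) can be generated for any of the mapping options, consider the following sketch. The helper selection_matrices and its signature are our own hypothetical construction, not part of the original implementation.

```python
import numpy as np

def selection_matrices(dim, input_idx):
    """Build the diagonal selection matrices D_x and D_y of equations (7)-(8).

    dim       : n + m, total length of the data vectors x_SOM
    input_idx : indices of the components treated as input x
    """
    d = np.zeros(dim)
    d[list(input_idx)] = 1.0
    Dx = np.diag(d)          # passes input components, zeros the rest
    Dy = np.diag(1.0 - d)    # passes output components, zeros the rest
    return Dx, Dy

# Option 1 of the three-component example: x = (x1, x2), y = x3
Dx1, Dy1 = selection_matrices(3, input_idx=[0, 1])
# Option 2 reuses the same trained SOM-MIO for a different task: x = (x2, x3), y = x1
Dx2, Dy2 = selection_matrices(3, input_idx=[1, 2])
```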
By choosing D_x and D_y, the same SOM-MIO can be used for different mapping tasks, i.e., either for the approximation of a numerical model or for solving the corresponding inverse problem. This is a unique feature of the SOM-MIO among all other ANN architectures.

2.3. Interpolation Using the Delaunay Triangulation

[23] The new architecture of the SOM-MIO allows a calculation of the required output value y using an adapted interpolation method. The strategy for determining smooth, continuous output information uses the Delaunay triangulation T_D, which is regularly applied to the generation of finite element meshes as well as to digital terrain models (DTM). The interpolation method with triangulation (ITRI) provides multidimensional mapping using the same uniquely trained SOM. Our implementation of this method is based on a Delaunay triangulation using the Quickhull algorithm suggested by Barber et al. [1996]. For a given set of points, the Delaunay triangulation maximizes the minimum angle over all possible triangulations, a property which is desirable for interpolation methods.

[24] For application, it is necessary to perform a unique transformation of the hexagonal SOM topology into a set of n_T triangles {t_j}_{n_T}:
$$\{ t_j \}_{n_T} = T_D\left( \{ m_i D_x \}_l \right). \tag{9}$$
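A minimal sketch of how equation (9) and the subsequent interpolation could be realized in Python is shown below; SciPy's Delaunay routine is likewise built on the Qhull implementation of the Quickhull algorithm [Barber et al., 1996]. The helper itri_predict is our own hypothetical illustration (assuming at least two input dimensions), not the authors' implementation; it uses piecewise linear (barycentric) interpolation over the triangles, and queries outside the triangulated region return NaN.

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator
from scipy.spatial import Delaunay

def itri_predict(m, input_idx, x_query):
    """Sketch of interpolation with triangulation (ITRI) based on equation (9).

    m         : (l, n+m) array of trained SOM-MIO weight vectors m_i
    input_idx : indices of the input components (the nonzero part of m_i D_x)
    x_query   : (n,) input vector x for which the output y is sought
    """
    out_idx = [j for j in range(m.shape[1]) if j not in input_idx]
    pts = m[:, input_idx]            # projected weight vectors m_i D_x
    tri = Delaunay(pts)              # triangulation T_D via Quickhull, equation (9)
    interp = LinearNDInterpolator(tri, m[:, out_idx])  # linear interpolation on {t_j}
    return interp(np.atleast_2d(x_query))[0]           # interpolated output y
```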
simulation task as well as the inverse solution with the same network.

[27] The operation of an ANN as a mapping function requires a set of data (x, y), which represents the input x ∈ X, X