AR18
Recurrent and Constructive-Algorithm Networks For Sand Behaviour Modelling by
Miguel P. Romo*, Silvia R. García, Manuel J. Mendoza and Victor Taboada-Urtuzuástegui
*Institute of Engineering, National University of Mexico, Apdo. Postal 70-472, Coyoacán 04510, México, D.F. Tel. 52 5622 3500 - 52 5616 0626. Fax 52 5616 0784. Email: [email protected]
ABSTRACT
This article presents and discusses various aspects of modelling the behaviour of a coarse granular material using Recurrent Neural Networks (RNNs) and Constructive Algorithms (CAs). A series of undrained triaxial tests following a compression stress path was performed to develop the data base for neural network training and testing, in which the relative density (Dr) and the confining effective stress (σ3) were varied. The range of Dr and σ3 values was selected to cover both dilatant and compressive sand behaviour. Modelling of sand behaviour is done using Cascade and Jordan's network architectures. Several input functions, learning rules and transfer functions are utilised to evaluate their effects on the accuracy achieved by both algorithms during the training and predicting stages, as well as on the time required to perform these tasks. It is also shown that for the case of cascade networks, when the full-size network having two outputs (pore water pressure and deviatoric stress) is divided into two networks with only one output each, the accuracy of predictions improves appreciably.
The results, in terms of pore water pressure-stress-strain relationships, included in this paper point out the great potential RNNs and CAs have to become another class of computational tools to solve complex problems in material modelling. Thus it is conceivable that ANNs, properly trained on a comprehensive data set, could be designed to model the behaviour of soil materials under a variety of initial conditions and stress path trajectories.
KEY WORDS: Recurrent neural networks, constructive algorithms, coarse sand modelling
INTRODUCTION
The development of mathematical models describing the relationship between stresses and strains has been the purpose of many investigations aimed at evaluating stress-strain states in large earthen structures such as dam embankments, and in other geotechnical problems. These models incorporate mathematical rules and expressions that capture most varied and complex behaviours. A typical material model development includes: (i) a material is tested and its behaviour observed, (ii) a mathematical model is hypothesised to explain the observed behaviour, and material parameters are determined, (iii) the mathematical model is used to predict as-yet untested stress paths, which are checked against the results of existing or new experiments, and (iv) the mathematical model is then generalised to account for the behaviours observed.
This process, if a constitutive model is to be developed, usually leads to mathematical expressions that include a large number of parameters, the role or contribution of each of which is poorly understood. This can lead to violations of the principles of thermodynamics and of the requirements of symmetry and frame indifference [1]. Recent studies [2] have shown that constitutive relations for granular materials cannot be adequately formulated in the principal stress space unless the effect of non-coaxiality is accounted for. All of this creates difficulties for traditional techniques, pointing out the need for alternatives potentially capable of resolving these interpretative problems. It may be feasible to develop a comprehensive data base that could be used to apply knowledge-based procedures and to develop modelling tools which, in principle, would predict the response of soil materials subjected to a large variety of conditions. In this paper, Artificial Neural Networks (ANNs) are employed as a behaviour-modelling technique for coarse sand under undrained triaxial compression. The time-varying nature of material behaviour under monotonic loading indicates that networks of the RNN and/or CA type should be used. These kinds of networks allow connections between any pair of processing units while keeping the input and output units inherent to multilayer networks.
The application of neural networks to geotechnical problems provides a fundamentally different approach to the representation of material behaviour relationships. The field of neural networks, whose origins date back to the early 1940s, has experienced a considerable resurgence. ANNs present a different approach from phenomenological modelling and traditional statistical methods: they need no hypothesis about the physics of the phenomenon being modelled. In essence, they map a space of input patterns to a space of output patterns [3].
In view of the many potential advantages that knowledge-based procedures may have over classical methods, several research projects are underway to explore the application of neural networks to a wide variety of geotechnical problems. A comprehensive list of references is given elsewhere [4].
Ghaboussi et al. [5] presented results of their research using neural networks as a computational tool for capturing complex material behaviour (stress-strain paths of concrete). Using back-propagation neural networks, they obtained predictions better than those achieved with mathematical material models. Ellis and co-workers [6] implemented feed forward back-propagation networks for modelling the stress-strain relationship of sand with varying grain-size distribution and stress history. They concluded that knowledge-based procedures perform this task adequately.
The results reported in the above investigations indicate that back-propagation networks reproduce material behaviour well. However, this algorithm presents some limitations when applied to modelling the constitutive behaviour of geomaterials: it does not simulate the stress-strain behaviour as a constitutive model (stress dependence is not simulated); its architecture may be suited only to the case learned; and the convergence or learning time may grow exponentially with the size of the network (number of nodes and/or connections).
In an attempt to improve the modelling of material constitutive behaviour, Ghaboussi and Sidarta [7] proposed a nested adaptive neural network and applied it to model the undrained and drained behaviour of sand in triaxial tests. Similarly, Ellis et al. [8], Penumadu and Chameau [9], and Penumadu and Zhao [10] used a type of network called feed-back neural network to represent the behaviour of sand, clay type soil and gravel. The results reported in all of the above studies show a significant improvement over those obtained using back-propagation algorithms.
To pursue the investigations in this field further, the authors performed a series of undrained triaxial compression tests in which the relative density (Dr) and the confining effective stress (σ3; the prime denoting effective stress is omitted in this paper) were varied, to develop the data base required for training and testing, and hence for designing an appropriate neural network. This investigation particularly explores the use of the RNN proposed by Jordan and Bishop [11] and of the Cascade network [12]. Various learning methods and input and activation functions were used in this study to test their capabilities; however, only those that gave better results are reported herein. Through comparisons between the neural predictions and unseen data, it is shown that procedures developed from knowledge based on experimental information are a viable alternative to analytically based modelling.
ARTIFICIAL NEURAL NETWORKS BACKGROUND
Neural networks consist of massively connected single neurones. They are computational models that process the information in a parallel distributed fashion. An ANN is usually defined as a network composed of a large number of single processing units (neurones) that are massively interconnected, operate in parallel and learn from experience (training examples).
Feed forward and recurrent neural networks, as well as constructive algorithms, are major classes of network models. Feed forward networks, such as the popular multilayer perceptron [13], are commonly used as representative models trained with a learning algorithm on a set of sampled input-output data. Feed forward networks consist of one input layer, one or more hidden layers and one output layer. The input units are merely distribution cells, which provide all of the measured variables to all of the processing neurones in the second (hidden) layer. Each of these neurones is activated and its output transmitted to other processing units. The input and output of a network computation are represented by the activation levels of designated input and output units, respectively. The connections between these units, of which there are many, vary in their efficiency of transmitting this activation signal. What the network computes depends strongly on how the units are interconnected and on the strengths of the connections between them.
Recurrent neural networks are transient in nature [14]. They are basically feed forward networks with feed-back capabilities, achieved by connecting the neurones' outputs to their inputs. The essence of closing the feed-back loop is to enable control of the ith output through the jth (j=1,2,…,n) outputs. This is especially meaningful if the present output controls the output at the following instant, as is the case in the response of soil samples under monotonically increasing loading, where the outputs of the present step influence the outputs of the next step.
Fahlman and Lebiere [12] have shown that, among the constructive algorithms, the cascade-correlation algorithm (CCA) is well suited to solving non-linear classification problems that are extremely hard to solve with back-propagation algorithms. CCA begins with a minimal network, which is then enlarged by adding hidden units, one at a time, until the optimum solution is reached. This feature makes the CCA appealing for modelling stress-strain curves because it eliminates the need to guess the size, depth (number of hidden layers) and connectivity pattern of the network in advance.
Recurrent Networks
Hopfield [14] developed a discrete network based on the neural model proposed by Amari [15]. Hopfield's model, depicted in figure 1, assumes that, at time zero, an input vector X=(x1,x2,…,xn) is presented to the network's single layer of processing units, one component value per processing unit. This initial input vector forms the initial output vector Y0=(y01, y02,…,y0n). Each initial output is fed back to a branching node, where it fans out to each of the processing units except its own, as shown in figure 1. These feed-back fan-out lines lead into the processing units (indicated as the jth processing unit in figure 1) and are excited or inhibited by the weights {Wij}, where the sub-index i designates the source of the input and the sub-index j the destination neurone associated with that weight. Since the processing units are assumed not to feed back to themselves, Wjj=0 for each j. On the (r+1)th feed-back loop, the output (yr1, yr2,…,yrn) is processed by summing the weighted feed-back values wij yri, adding a random value Ij (see figure 1), and subtracting a threshold value θj. The result is then activated by a bipolar (threshold) function defined by f(s)=1 if s≥0, else f(s)=-1.
Figure 1. The Hopfield network architecture
The Hopfield model, with random inputs Ij, fires (activates) the jth processing unit in a random serial fashion to update the single jth output yj(r+1) following equation (1):

yj(r+1) = f( Σi=1..n wij yi(r) + Ij − θj )        (1)
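The asynchronous update of equation (1) can be sketched in code; the weights, inputs and firing order below are illustrative, not taken from the paper:

```python
import numpy as np

def hopfield_step(y, W, I, theta, j):
    """Asynchronously update unit j per equation (1):
    y_j <- f(sum_i w_ij * y_i + I_j - theta_j), with bipolar threshold f."""
    s = W[:, j] @ y + I[j] - theta[j]
    y = y.copy()
    y[j] = 1.0 if s >= 0.0 else -1.0
    return y

# Illustrative 3-unit network: symmetric weights, zero diagonal (W_jj = 0)
W = np.array([[0.0, 1.0, -0.5],
              [1.0, 0.0, 0.3],
              [-0.5, 0.3, 0.0]])
I = np.zeros(3)        # the random external inputs, set to zero here
theta = np.zeros(3)    # threshold values
y = np.array([1.0, -1.0, 1.0])

# Fire units serially (fixed order here, for reproducibility) until stable
for j in (0, 1, 2, 0, 1, 2):
    y = hopfield_step(y, W, I, theta, j)
# y settles at [-1, -1, 1]
```

Note that with Wjj=0 and symmetric weights the serial updates reach a stable state, which is the associative-memory property exploited by Hopfield networks.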
The Hopfield model has been extended by Kosko [16] to a bi-directional associative memory network (also known as BAM), which is a hetero-associative memory. This network can be used as a two-way (forwards and backwards) associative memory and it can recognise more classes than the Hopfield network. However, the number of classes cannot exceed the minimum of the number of neurones in any of the two layers of BAM network. Also the association has to be one-to-one.
Jordan and Bishop [11] used a back-propagation network where the outputs are added to the input vector in a closed loop to generate temporal sequences. A schematic architecture of Jordan's network is given in figure 2. Notice that the
vectors Y(1,i+1),…,Y(n,i+1) are cycled back to the input after each loading step. In this fashion, the network is capable of storing information throughout time, which makes it particularly suitable for forecasting applications.
Figure 2. Schematic representation of Jordan's network (external inputs x1,…,xn and recurrent inputs y1,i,…,yn,i feed a hidden layer that produces the outputs y1,i+1,…,yn,i+1, which are cycled back to the input)
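A single forward step of Jordan's architecture can be sketched as follows; the layer sizes correspond to a {3,5,2} design, but the weights are random placeholders rather than trained values:

```python
import numpy as np

def bipolar_sigmoid(s):
    # BS activation: squashes the net input into (-1, 1)
    return 2.0 / (1.0 + np.exp(-s)) - 1.0

def jordan_step(x, y_prev, W_h, b_h, W_o, b_o):
    """One forward pass of Jordan's network: the previous outputs y_prev
    are appended to the external inputs x before propagation."""
    z = np.concatenate([x, y_prev])          # recurrent input vector
    h = bipolar_sigmoid(W_h @ z + b_h)       # hidden layer (dot-product net input)
    return bipolar_sigmoid(W_o @ h + b_o)    # outputs cycled back next step

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 3, 5, 2                 # a {3,5,2} design
W_h = 0.1 * rng.normal(size=(n_hid, n_in + n_out))
b_h = np.zeros(n_hid)
W_o = 0.1 * rng.normal(size=(n_out, n_hid))
b_o = np.zeros(n_out)

y = np.zeros(n_out)                          # activations zeroed initially
for step in range(3):                        # three successive input vectors
    x = np.array([0.5, 0.2, 0.1 * step])
    y = jordan_step(x, y, W_h, b_h, W_o, b_o)
```

Because the outputs re-enter the input vector, the network state after each step carries information about the whole preceding sequence.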
Jordan's network architecture can be trained using back-propagation or any other algorithm that can be used on a feed forward architecture.
In this paper, however, only Quick Propagation (QP) and Conjugate Gradient (CG) are considered, because of their advantages over other algorithms in finding the absolute minimum of the error function. The input function employed was the Dot Product (DP), which was kept constant for all topologies reported herein, and the transfer (activation) function found to yield the best results was the Bipolar Sigmoid (BS).
QP is a supervised learning algorithm that provides several useful heuristic procedures for minimising the time required to find a good set of weights. These heuristics automatically regulate the step size and detect conditions that accelerate learning. QP evaluates the trend of the weight updates over time so that the step size can be optimised [17].
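A minimal sketch of the quick-propagation step-size heuristic is given below; the learning rate and maximum growth factor are illustrative defaults, and QP's other safeguards (such as weight decay) are omitted:

```python
def quickprop_update(grad, prev_grad, prev_step, lr=0.1, mu=1.75):
    """One QP weight change. A parabola is fitted through the current and
    previous gradients and the step jumps to its minimum; mu caps the
    growth of the step relative to the previous one."""
    if prev_step == 0.0:
        return -lr * grad                 # first step: plain gradient descent
    denom = prev_grad - grad
    if denom == 0.0:
        return mu * prev_step             # flat parabola: keep moving, capped
    step = prev_step * grad / denom       # minimum of the fitted parabola
    if abs(step) > mu * abs(prev_step):   # "maximum growth factor" safeguard
        step = mu * prev_step if step * prev_step > 0.0 else -mu * prev_step
    return step

# Minimising E(w) = (w - 3)^2, whose gradient is 2(w - 3)
w, prev_grad, prev_step = 0.0, 0.0, 0.0
for _ in range(10):
    grad = 2.0 * (w - 3.0)
    step = quickprop_update(grad, prev_grad, prev_step)
    w += step
    prev_grad, prev_step = grad, step
# w converges to the minimum at w = 3
```

On a quadratic error surface the fitted parabola is exact, which is why the iterates above land on the minimum in a few steps.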
CG is a classical numerical technique for minimising arbitrary differentiable functions. Applied to neural networks, it becomes an excellent learning technique that can be used for batch-mode training of feed forward networks. For a detailed presentation see Johansson et al. [18].
DP is an input function defined by a weighted sum of the inputs plus a bias value. Intuitively, this weighs each input according to its relative influence in increasing the net input to the node. Weights and inputs may take on negative values. For inputs of roughly the same magnitude, the absolute magnitude of the weights corresponds to the relative importance of the inputs.
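As a sketch, the DP input function amounts to:

```python
import numpy as np

def dot_product_input(x, w, b):
    """DP input function: the weighted sum of the inputs plus a bias."""
    return float(np.dot(w, x) + b)

# With inputs of equal magnitude, |w_i| reflects the importance of input i
x = np.array([1.0, 1.0, 1.0])
w = np.array([0.8, -0.1, 0.3])         # negative weights are allowed
net = dot_product_input(x, w, b=0.5)   # 0.8 - 0.1 + 0.3 + 0.5 ≈ 1.5
```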
Constructive Algorithms
There are several constructive algorithms, such as the tower and pyramid algorithms [19, 20], the cascade-correlation algorithm [12], the tiling algorithm [21] and the upstart algorithm [22]. This study employs the cascade-correlation algorithm to model sand behaviour. In cascade architectures the hidden units are added to the network one at a time and do not change after they have been adjoined. As each processing unit is added, the magnitude of the correlation between the new unit's output and the residual error signal is maximised [12]. The residual error is the net difference between the network output and the target (desired) output.
The Cascade-Correlation Algorithm (CCA) starts with n input branching nodes and j output neurones. There is no middle (hidden) layer at first, as shown in figure 3. The weights at the output neurones are trained over all classes in single-layer perceptron mode [13]. Then a hidden neurone is added to the net and trained using any well-known algorithm for single-layer networks; here, the QP algorithm was utilised to train the output weights. The input weights are frozen and all the output weights are trained again. The process is repeated until the residual error is acceptably small. Once this is achieved, a new hidden unit is created, beginning with a candidate unit that receives trainable input connections from all of the network's external inputs and from all pre-existing hidden units. Then, the correlation between the activation of the candidate unit and the residual error of the network is maximised by training all the links leading to the candidate unit. Learning is stopped when the correlation no longer improves. Finally, the candidate unit with maximum correlation is selected, its incoming weights are frozen, and it is adjoined to the net.
The candidate unit is transformed into a hidden unit by generating links between it and all the output units. Since the weights leading to the new hidden unit are frozen, a new permanent feature detector is obtained. The algorithm is repeated until the overall error of the network falls below a chosen threshold value. The process described above is shown in figure 3 for the case of two added hidden units. There, the vertical lines adjoin all incoming activations; frozen connections are indicated with open boxes and solid boxes represent the connections that are trained repeatedly. It should be noted that training is done in stages, the early ones being very quick because of the small size of the hidden layer. This type of network has great potential for non-linear problems. However, as the number of hidden neurones in the single hidden layer increases, the operational mode becomes progressively slower; thus, it is recommended to keep the depth of the network as small as possible. Constructive algorithms should, in principle, fit the training data better as new neurones are added. Generalisation improves, but only up to the point where the data start being over-fitted. The generalisation problem may be dealt with in a fashion analogous to increasing the number of cells in the distributed method [23], applying techniques such as cross-validation [24] to determine how many cells to add. Such pruning (or destructive) algorithms can be very important for fitting data. Currently there is much research activity in constructive algorithms [25], and further development of these procedures is to be expected.
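The quantity maximised when training a candidate unit [12] is the magnitude of the covariance between the candidate's output and the residual error, summed over the output units; it can be sketched as follows (the residuals and candidate outputs below are synthetic illustrations):

```python
import numpy as np

def candidate_score(v, residuals):
    """Cascade-correlation objective: S = sum_o | sum_p (v_p - v_mean) *
    (e_po - e_mean_o) |, with v the candidate unit's output over the
    training patterns and residuals[p, o] the error at output unit o."""
    v_c = v - v.mean()
    e_c = residuals - residuals.mean(axis=0)
    return float(np.abs(v_c @ e_c).sum())

# Synthetic illustration: a candidate that tracks the residual error
# scores higher than one that is roughly uncorrelated with it
residuals = np.array([[1.0, -1.0], [2.0, -2.0], [3.0, -3.0], [4.0, -4.0]])
good = np.array([1.0, 2.0, 3.0, 4.0])
bad = np.array([1.0, -1.0, 1.0, -1.0])
```

Maximising this score over the candidate's incoming weights selects the unit that best "explains" whatever error the current network still makes, which is why each adjoined unit acts as a permanent feature detector.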
Figure 3. Neural network trained with cascade-correlation after two hidden units have been added; trained connections are shown as solid boxes and frozen connections as open boxes (modified from [12])
Experimental testing program
García [26] examined the behaviour of a sub-rounded, uniformly graded coarse sand with maximum particle size dmax = 2.5 mm, subjected to monotonic and cyclic undrained loading in a triaxial cell. This experimental investigation focused mainly on studying the effect that cyclic loading could have on the phase transformation line [27] defined for the sand. The results reported in [26] showed that, within the range of relative densities and effective stresses considered, the influence was negligible as long as the applied cyclic shear stress did not exceed the undrained shear strength of the coarse sand. In this paper, however, only the information pertaining to the stress-pore water pressure-strain behaviour of the sand under monotonic loading was used to develop the ANNs shown herein.
Table 1. Triaxial tests and sand samples general conditions

Test #   Dr     σ3 (kg/cm²)
p-1      0.92   0.4
p-2      0.89   1.6
p-3      0.51   1.0
p-4      0.66   2.5
e-1      0.45   1.6
e-2      0.50   2.0
e-3      0.48   1.0
e-4      0.50   3.0
e-5      0.92   1.0
e-6      0.60   0.8
e-7      0.66   0.8
e-8      0.76   1.1
e-9      0.83   1.0
e-10     0.86   2.5
e-11     0.71   3.0
e-12     0.74   3.0
e-13     0.90   2.0
e-14     0.94   3.0

p-i (i=1,…,4): patterns used for testing the prediction capabilities of the ANNs
e-j (j=1,…,14): patterns used for training
All samples were prepared by air-pluviating the sand, followed by gentle vibration. The samples were then saturated, isotropically consolidated, and loaded to failure (in undrained conditions) following a strain-controlled compression stress path. The axial load was monotonically increased while the radial stress was kept constant. The values of σ3 and Dr used in the 18 tests are given in Table 1.
To minimise scale effects, the ratio of sample diameter to maximum particle size was always larger than eight, the value recommended by Leslie [28]. Sample diameters were 35 mm. Of the 18 tests, 14 were used for training the artificial networks and the remaining four for testing them. The stress-strain and pore water pressure-strain data sets employed during training are plotted in figure 4. It can be seen that the data base covered a broad spectrum of behaviours, ranging from dilatant to compressive. The data used in the testing mode are included in figure 8. Comparing the curves in that figure with those of figure 4, it is observed that, aside from test p-1, the patterns used to evaluate the predicting capabilities of the ANNs fall within the range of data used during the training stages. Thus, if the RNN and CAA networks matched the p-i results, it could be asserted that they are able to generalise.
The influence of membrane penetration during sample consolidation, and of membrane compliance caused by the pore water pressure developed upon shearing the specimen in undrained conditions, were considered following Pierce's [29] procedure. Detailed discussions of the effects of membrane penetration and membrane compliance on the test results used in this investigation are given elsewhere [26].
Figure 4. Data base used for networks training: deviator stress σd (in kg/cm²) and normalised pore pressure u/σ3 versus axial deformation ε (%) for tests e-1 to e-14
RNN Modelling
Stress-pore water pressure-strain behaviour may be viewed as a transient problem whose present state depends on the past stress trajectory and the initial stress state. Therefore, the network for this transient problem must be able to modify the input vector as the phenomenon progresses and thus account for any trajectory changes. This implies that, when modelling the response of a sand specimen to a new load increment, the input vector for that loading stage has to include the sample response (i.e., stress, pore pressure and strain) to the previous load increment.
Accordingly, a regression-type network such as that sketched in figure 5 is developed in this section. The input parameters are the relative density (Dr), the confining effective stress (σ3) and the percent axial deformation (εi). The axial deformation was applied in increments of 0.1%. The recurrent input data are the deviatoric stress (σd,i) and the normalised pore water pressure (u/σ3,i). Notice that the input normalised pore pressures, u/σ3, and deviatoric stresses, σd, are updated (brought to the present load stage) after each loading increment. In contrast, feed forward networks connect the outputs only to the inputs of neurones in subsequent layers. This renders a rather stiff architecture that prevents the network from modelling soil behaviour properly, which is particularly noticeable during recall (testing). Recurrent networks are able to store information through time, which makes them particularly suitable for forecasting applications. This implies temporal sequencing of the activations of the network nodes. At the initial conditions, all the activations of the nodes are zeroed. For the first load increment, each node is activated by the input corresponding to the initial conditions, so the input vector to a node consists of the external input for the first load increment along with the current output (generated for the initial conditions) of each node. The current output of a node depends upon the previous external input; consequently, the activations may depend upon previous input vectors as well as the current input vector.
In this paper, a subset of the RNNs that includes a total of three layers (one hidden) is used. This architecture was initially proposed by Jordan and Bishop [11] and is implemented by adding nodes to the input layer of a feed forward network. The number of nodes added is equal to the number of nodes on the output layer. This architecture can be trained using back propagation or any other algorithm.
Figure 5. Schematic representation of a RNN for the modelling of sand behaviour (inputs: Dr, σ3, εi and the recurrent σd,i and u/σ3,i; outputs: σd,i+1 and u/σ3,i+1)
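The recall loop of the network in figure 5 can be sketched as follows; the weights here are random placeholders standing in for a trained network, so the curve produced is illustrative only:

```python
import numpy as np

def bipolar_sigmoid(s):
    return 2.0 / (1.0 + np.exp(-s)) - 1.0

def predict_curve(Dr, sigma3, net, strain_max=8.0, d_eps=0.1):
    """Recall a full curve: the previous predictions of sigma_d and
    u/sigma3 re-enter the 5-component input vector at each 0.1%
    strain increment."""
    W_h, b_h, W_o, b_o = net
    sd, u = 0.0, 0.0                         # recurrent activations zeroed
    curve = []
    for k in range(round(strain_max / d_eps) + 1):
        eps = k * d_eps
        x = np.array([Dr, sigma3, eps, sd, u])
        h = bipolar_sigmoid(W_h @ x + b_h)   # hidden layer
        sd, u = bipolar_sigmoid(W_o @ h + b_o)
        curve.append((eps, float(sd), float(u)))
    return curve

# Random placeholder weights for a {3,20,2} design
# (3 external inputs plus the 2 recurrent ones)
rng = np.random.default_rng(0)
net = (rng.normal(size=(20, 5)) * 0.1, np.zeros(20),
       rng.normal(size=(2, 20)) * 0.1, np.zeros(2))
curve = predict_curve(Dr=0.92, sigma3=0.4, net=net)  # conditions of test p-1
```

A trained network would have its weights fixed by QP or CG; the loop structure (zeroed initial activations, outputs fed back each increment) is what distinguishes recall in a recurrent network from recall in a feed forward one.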
To evaluate the learning and, most importantly, the predicting capabilities of this paradigm, five networks were designed and tested, using the data included in Table 1. The main features of these five networks are summarised in Table 2. The nomenclature used to specify network architecture is {i,h,o}, where i is the number of input variables, h is the number of neurones in the hidden layer and o is the number of output variables.
Figure 6 shows the correlation achieved by each network, and figure 7 the effect of the network topology on the mean-squared error and the processing time. For the networks using QP as the learning rule, the number of processing units in the hidden layer that yielded the best results is 20 (4.net); among the networks using CG, the one with 15 processing units in the hidden layer (3c.net) gave the most accurate values. The correlations in figure 6 indicate that the best design is 4.net with QP: its correlation is nearly one, and its capability for generalisation is also better than that of all other topologies, as indicated by the lower mean-squared error depicted in figure 7. All the results plotted in figures 6 and 7 correspond to the testing (recall) mode; thus, they represent the accuracy with which the networks reproduce unseen data. The computation times indicated in figure 7 correspond to run times on a PC with a Pentium III processor at 800 MHz.
Table 2. General conditions of the RNN designs

Architecture / Design name   Network type      Input function   Learning rule   Activation function
3,5,2  / 1.net               Jordan Recurrent  DP               QP              BS
3,10,2 / 2.net               Jordan Recurrent  DP               QP              BS
3,15,2 / 3.net               Jordan Recurrent  DP               QP              BS
3,20,2 / 4.net               Jordan Recurrent  DP               QP              BS
3,25,2 / 5.net               Jordan Recurrent  DP               QP              BS
3,5,2  / 1c.net              Jordan Recurrent  DP               CG              BS
3,10,2 / 2c.net              Jordan Recurrent  DP               CG              BS
3,15,2 / 3c.net              Jordan Recurrent  DP               CG              BS
3,20,2 / 4c.net              Jordan Recurrent  DP               CG              BS
3,25,2 / 5c.net              Jordan Recurrent  DP               CG              BS
Figure 6. Correlation achieved by each network (Conjugate Gradient and Quick Propagation, architectures 1.net to 5.net)

Figure 7. Effect of network architecture on process time and mean-squared error (run times ranged from 0.15 to 19 hrs)
Figure 8. Comparison between experimental data and predicted results with the proposed RNN
Finally, the accuracy with which the 4.net (QP) predicts the experimental results can be appreciated in figure 8. It can be seen that the network predictions practically fall on top of the experimental data, for both dilatant and compressive behaviours. These results add support to the growing belief that feed-back artificial neural networks can be employed to model the behaviour of materials.
CAA Modelling
This algorithm solves the slow-pace learning problem of back-propagation and prevents the design of over-sized networks by automatically adding neurones, one at a time, to the hidden layer until the relative error between the computed output and the target value is acceptable. It is important to notice that this process does not necessarily produce the optimum network design, because the convergence criterion may lead to either of the following two problems. If the magnitude of the convergence criterion is very small, the network-growing process may yield an over-sized architecture and thus a not-so-smooth error function; over-sized architectures are usually inadequate for generalisation (prediction). Conversely, for relatively high-magnitude convergence criteria, the construction-training process may be suspended prematurely, leading to an architecture unfit to reproduce unseen results; predictions by such a network are usually poor.
Since there is no straightforward procedure for defining the magnitude of the convergence criterion that will lead to the optimum topology, a trial-and-error process has to be adopted. After several trials, we found that a convergence criterion of 0.001 yielded the network architecture best fit to predict the stress-pore water pressure-strain behaviour of the coarse sand. Network predicting capabilities can be further improved by decomposing the modelling problem into sub-tasks. Thus, for the case considered in this paper, it is likely that if the full-size network with two outputs (see figure 9) is divided into two independent networks having only one output each (see figure 10), the accuracy of predictions could be enhanced.
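The decomposition into sub-tasks can be sketched as follows; a plain gradient-descent trainer and synthetic data stand in for the cascade-correlation algorithm and the triaxial data base, so only the structure of the comparison is shown:

```python
import numpy as np

def train_net(X, Y, n_hidden=8, iters=2000, lr=0.05, seed=0):
    """Stand-in trainer: gradient descent on a one-hidden-layer net
    (tanh hidden units, linear outputs). The paper's networks are instead
    grown by cascade-correlation; only the decomposition idea is shown."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(size=(X.shape[1], n_hidden)) * 0.5
    W2 = rng.normal(size=(n_hidden, Y.shape[1])) * 0.5
    for _ in range(iters):
        H = np.tanh(X @ W1)
        err = H @ W2 - Y                  # residual error at the outputs
        W2 -= lr * H.T @ err / len(X)
        W1 -= lr * X.T @ ((err @ W2.T) * (1.0 - H**2)) / len(X)
    return W1, W2

# Synthetic stand-ins for (Dr, sigma3, eps) inputs
# and (sigma_d, u/sigma3) targets
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(200, 3))
Y = np.column_stack([np.tanh(X @ np.array([1.0, -0.5, 2.0])),
                     np.tanh(X @ np.array([-0.3, 1.2, 0.4]))])

joint = train_net(X, Y)             # full-size net: both outputs at once
net_sd = train_net(X, Y[:, :1])     # independent net for sigma_d only
net_u = train_net(X, Y[:, 1:])      # independent net for u/sigma3 only
```

Each single-output network only has to shape its hidden units for one target function, which is the intuition behind the accuracy gains reported below for the decomposed cascade networks.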
Accordingly, two network design alternatives were studied. First, a network having σd and u/σ3 as outputs was considered. The general characteristics of the optimum architecture (1.net) are shown in Table 3, and a schematic plot of this architecture is shown in figure 9.
Figure 9. Schematic representation of the full-size cascade network architecture (inputs: Dr, σ3, εi; 52 hidden nodes; outputs: σd and u/σ3)
Figure 10. Schematic representation of the independent cascade network architectures (inputs: Dr, σ3, εi; 37 hidden nodes for the σd network and 47 for the u/σ3 network)
Table 3. General characteristics of optimum Cascade networks

Architecture / Design name   Network type   Input function   Learning rule   Activation function
3,52,2 / 1.net               Cascade        DP               CG              BS
3,37,1 (σd) / 2.net          Cascade        DP               CG              BS
3,47,1 (u/σ3) / 3.net        Cascade        DP               CG              BS
The input parameters were Dr, σ3 and εi. The optimum network topology included one hidden layer to which 52 neurones were added (one at a time) until the mean-squared error condition (0.001) was satisfied; this was accomplished in 55,000 iterations. The CG learning paradigm was utilised. All neurones in the hidden layer were activated with the BS function, while the output neurones were activated with a linear function (i.e., the incoming information is transmitted without any modification). This combination of activation functions added efficiency to the iterative convergence procedure.
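The activation pairing described above, bipolar sigmoid on the hidden units and a linear (identity) function on the outputs, can be sketched as follows; the weights are illustrative, chosen only to show that the linear output is not confined to the BS range:

```python
import numpy as np

def bipolar_sigmoid(s):
    return 2.0 / (1.0 + np.exp(-s)) - 1.0   # BS: values in (-1, 1)

def cascade_forward(x, W_h, b_h, W_o, b_o):
    """Hidden units use BS; output units are linear (identity), so the
    network can emit targets outside (-1, 1), e.g. deviator stresses."""
    h = bipolar_sigmoid(W_h @ x + b_h)
    return W_o @ h + b_o                    # no squashing at the output

# Illustrative weights: the linear output easily exceeds the BS range
x = np.array([0.9, 1.0, 4.0])               # e.g. Dr, sigma3, eps
W_h = np.ones((2, 3)); b_h = np.zeros(2)
W_o = np.array([[8.0, 8.0]]); b_o = np.zeros(1)
y = cascade_forward(x, W_h, b_h, W_o, b_o)  # output well above 1.0
```

Keeping the output linear also makes the error surface of the output weights quadratic, which helps the iterative training converge.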
The stress-strain and pore-water pressure curves obtained with this CAA network topology are compared with the experimental data in figure 11. The knowledge-based modelling predictions are in good agreement with the unseen sample test results. However, when compared with the RNN forecasting (figure 8), the CAA results are less accurate.
Figure 11. Full-size cascade network results versus experimental data
Thus, it would seem, at least for the data base used and the characteristics of this CAA network, that RNNs reproduce more closely the behaviour of the sand samples.
We then proceeded to develop two independent networks with only one output each. The general characteristics of the topologies (2.net and 3.net) are shown in Table 3, and they are schematically plotted in figure 10. The convergence criterion used was also 0.001. The architecture of the network modelling the stress-strain behaviour (2.net) included 37 neurones in the hidden layer and required 19,750 iterations to achieve the approximations shown in figure 12. The network designed to model the normalised pore water pressure versus strain (3.net) needed 47 hidden neurones and 46,200 iterations to reach the accuracy indicated in figure 13. These results confirm that simpler independent networks improve predicting capabilities.
Figure 12. Cascade-correlation results for the architecture with only σd as output

Figure 13. Cascade-correlation results for the architecture with only u/σ3 as output
CONCLUSIONS
This paper reports the findings of studies to model the behaviour of a coarse sand with two artificial neural algorithms: the recurrent neural network proposed by Jordan and Bishop [11] and the cascade-correlation algorithm developed by Fahlman and Lebiere [12]. The results included in this article, although of limited extent, show that these two algorithms can model with proficiency the stress-pore water pressure-strain behaviour of granular materials.
Throughout the study, it was found that to develop artificial networks with generalisation capabilities, the information given as input patterns should cover a wide spectrum of cases of the task to be performed (i.e., stress-strain behaviour). Furthermore, the function that maps the inputs into the outputs should be relatively smooth. This may be accomplished, as shown in this paper, by decomposing the problem into sub-tasks solved by uncorrelated subnetworks.
The feedback process performed by recurrent networks utilises the information of a previous state to compute the present state. It transforms low-quality training sets (i.e., sets in which some parameters of the input data have a predominant influence on the training) into subsets where the effects of all training patterns are reasonably equalised.
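A single time step of such a feedback loop can be sketched as follows. The weight shapes, the random toy weights and the context-decay factor alpha are illustrative assumptions; only the defining feature of the Jordan architecture (the output, not the hidden state, is fed back through context units) is taken from the source.

```python
import numpy as np

def jordan_step(x_t, context, W_in, W_ctx, W_out, alpha=0.5):
    """One step of a Jordan-type recurrent network.

    The context units carry a decayed copy of the previous OUTPUT back to
    the hidden layer, so the present prediction (e.g. the stress at the
    current strain increment) depends on the state already reached.
    """
    h = np.tanh(W_in @ x_t + W_ctx @ context)  # hidden activations
    y = W_out @ h                              # linear output unit(s)
    context = alpha * context + y              # feed the output back
    return y, context

# run a random-weight network along a toy loading path of ten increments
rng = np.random.default_rng(0)
n_in, n_hid, n_out = 3, 8, 2                   # e.g. two outputs: stress, pwp
W_in = rng.normal(scale=0.5, size=(n_hid, n_in))
W_ctx = rng.normal(scale=0.5, size=(n_hid, n_out))
W_out = rng.normal(scale=0.5, size=(n_out, n_hid))
context = np.zeros(n_out)
outputs = []
for step in range(10):
    x_t = rng.uniform(-1.0, 1.0, size=n_in)
    y, context = jordan_step(x_t, context, W_in, W_ctx, W_out)
    outputs.append(y)
```

Because every prediction is conditioned on the accumulated context, patterns whose inputs would otherwise dominate the training are moderated by the history of earlier states.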
The designer of cascade-correlation networks (or any type of growing network) has to be aware that the growing process does not necessarily guarantee the optimum architecture, because the number of neurones added depends on the magnitude of the convergence criterion: the smaller the criterion, the more neurones are added. The studies carried out showed that over-sized networks perform well during training but may have trouble generalising (predicting). Thus, a balance between the magnitude of the convergence criterion and the number of added neurones must be found by trial and error. The results also show that network predictions can be enhanced by decomposing the problem into sub-goals handled by independent sub-networks.
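One systematic way to expose over-growth is to track the error on held-out data as neurones are added: the training error falls monotonically, while the validation error eventually stalls or rises. The held-out split is our addition here (the paper fixes the size by trial and error against the training criterion), and the sketch simplifies the growth step to random frozen hidden weights with a least-squares linear output.

```python
import numpy as np

def grow_with_validation(X_tr, y_tr, X_va, y_va, max_hidden=60, seed=0):
    """Grow units one at a time; return the size minimising validation MSE."""
    rng = np.random.default_rng(seed)
    F_tr = np.hstack([X_tr, np.ones((len(X_tr), 1))])
    F_va = np.hstack([X_va, np.ones((len(X_va), 1))])
    best_mse, best_n = np.inf, 0
    for n in range(1, max_hidden + 1):
        w = rng.normal(size=F_tr.shape[1])             # new cascaded unit
        F_tr = np.hstack([F_tr, np.tanh(F_tr @ w)[:, None]])
        F_va = np.hstack([F_va, np.tanh(F_va @ w)[:, None]])
        w_out, *_ = np.linalg.lstsq(F_tr, y_tr, rcond=None)
        mse_va = float(np.mean((F_va @ w_out - y_va) ** 2))
        if mse_va < best_mse:                          # keep best-seen size
            best_mse, best_n = mse_va, n
    return best_n, best_mse

# invented toy data; first 160 samples train, the rest validate
rng = np.random.default_rng(3)
X = rng.uniform(-1.0, 1.0, size=(240, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] * X[:, 2]
best_n, best_mse = grow_with_validation(X[:160], y[:160], X[160:], y[160:])
```

The neurone count at the validation minimum gives a principled starting point for the size-versus-criterion balance that otherwise has to be found by trial and error.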
ACKNOWLEDGEMENTS
The authors would like to express their appreciation to CONACyT (National Council for Science and Technology of Mexico) for its support through grant 33032-U. They also wish to acknowledge the encouragement and guidance provided by Dr. Jesús Figueroa Nazuno of the National Polytechnic Institute. Likewise, they are thankful to Arturo Paz and Roberto Soto for their skilled work in editing this document.
REFERENCES
[1] Wu, X., 'Neural network-based material modeling', Ph.D. thesis, University of Illinois at Urbana-Champaign, 202 pp (1991)
[2] Gutiérrez, M. and Ishihara, K., 'Non-coaxiality and energy dissipation in granular materials', Soils and Foundations, 40, No. 2, pp 49-60 (2000)
[3] Garret, J.H., Guanaratman, D.J. and Irazic, N., 'Introduction', in Artificial Neural Networks for Civil Engineers: Fundamentals and Applications, edited by Kartam, N., Flood, I. and Garret, J.H. Jr., ASCE, pp 1-19 (1997)
[4] Romo, M.P., 'Earthquake geotechnical engineering and artificial neural networks', 4th Arthur Casagrande Lecture, Proceedings of the XI Pan-American Conference on Soil Mechanics and Geotechnical Engineering, Vol. 5, Foz do Iguassu, Brazil (1999)
[5] Ghaboussi, J., Garret, J.H. and Wu, X., 'Knowledge-based modeling of material behavior with neural networks', Journal of Engineering Mechanics, ASCE, 117, No. 1, pp 133-153 (1991)
[6] Ellis, G.W., Yao, C. and Zhao, R., 'Neural network modeling of the mechanical behavior of sand', Proceedings of the 9th Conference on Engineering Mechanics, ASCE, New York, N.Y., pp 421-424 (1992)
[7] Ghaboussi, J. and Sidarta, D.E., 'New nested adaptive neural networks (NANN) for constitutive modeling', Computers and Geotechnics, 22, No. 1, pp 29-71 (1998)
[8] Ellis, G.W., Yao, C., Zhao, R. and Penumadu, D., 'Stress-strain modeling of sands using artificial neural networks', Journal of Geotechnical Engineering, ASCE, 121, No. 5, pp 429-435 (1995)
[9] Penumadu, D. and Chameau, J-L., 'Geomaterial modelling using artificial neural networks', in Artificial Neural Networks for Civil Engineers: Fundamentals and Applications, edited by Kartam, N., Flood, I. and Garret, J.H. Jr., ASCE, Ch. 8, pp 160-184 (1997)
[10] Penumadu, D. and Zhao, R., 'Triaxial compression behaviour of sand and gravel using artificial neural networks (ANN)', Computers and Geotechnics, 24, No. 3, pp 207-230 (1999)
[11] Jordan, M.I. and Bishop, C., 'Neural networks', MIT Artificial Intelligence Lab Memo 1562, March (1996)
[12] Fahlman, S.E. and Lebiere, C., 'The cascade-correlation learning architecture', CMU-CS-90-100, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA (1991)
[13] Rumelhart, D.E., Hinton, G.E. and Williams, R.J., 'Learning internal representations by error propagation', in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1, edited by Rumelhart and McClelland, MIT Press, Cambridge, pp 318-362 (1986)
[14] Hopfield, J.J., 'Neural networks and physical systems with emergent collective computational abilities', Proceedings of the National Academy of Sciences, 79, pp 2554-2558 (1982)
[15] Amari, S., 'Neural theory of association and concept formation', Biological Cybernetics, 26, pp 175-185 (1977), cited in C.C. Looney, Pattern Recognition Using Neural Networks, Oxford University Press (1997)
[16] Kosko, B., 'Bi-directional associative memories', IEEE Transactions on Systems, Man and Cybernetics, 18, No. 1, pp 49-60 (1988)
[17] Fahlman, S.E., 'An empirical study of learning speed in back-propagation networks', CMU Technical Report CMU-CS-88-162 (1988)
[18] Johansson, E.M., Dowla, F.U. and Goodman, D.M., 'Backpropagation learning for multi-layer feed-forward neural networks using the conjugate gradient method', Preprint UCRL-JC-104850, Lawrence Livermore National Laboratory, Sept. 26 (1990)
[19] Gallant, S.I., 'Three constructive algorithms for network learning', Proceedings, Eighth Annual Conference of the Cognitive Science Society, Amherst, MA, August 15-17, pp 652-660 (1986)
[20] Gallant, S.I., 'Perceptron-based learning algorithms', IEEE Transactions on Neural Networks, 1, No. 2, June, pp 179-192 (1990)
[21] Mézard, M. and Nadal, J.P., 'Learning in feedforward layered networks: The tiling algorithm', J. Phys. A: Math. and Gen., 22, No. 22, pp 2191-2203 (1989)
[22] Frean, M., 'The upstart algorithm: A method for constructing and training feedforward neural networks', Neural Computation, 2, pp 198-209 (1990)
[23] Yoshida, K., Hayashi, Y. and Imura, A.A., 'Neural network expert system for diagnosing hepatobiliary disorders', MEDINFO '89: Proceedings of the Sixth Conference on Medical Informatics, Beijing, October 16-20, pp 116-120 (1989), cited in S.I. Gallant, Neural Network Learning and Expert Systems, The MIT Press, third printing
[24] Breiman, L., Friedman, J., Olshen, R. and Stone, C., Classification and Regression Trees, Wadsworth International Group, Belmont, CA (1986), cited in S.I. Gallant, Neural Network Learning and Expert Systems, The MIT Press, third printing (1995)
[25] Gallant, S.I., Neural Network Learning and Expert Systems, The MIT Press, Cambridge, Massachusetts, third printing (1995)
[26] García, S.R., 'Undrained cyclic and static behavior of a coarse sand', Master of Engineering thesis, División de Estudios de Posgrado, Facultad de Ingeniería, UNAM (in Spanish), 165 pp (1999)
[27] Ishihara, K., 'Liquefaction and flow failure during earthquakes', Géotechnique, 43, No. 2, pp 351-415 (1993)
[28] Leslie, D.D., 'Large scale triaxial tests on gravelly soils', Proceedings of the 2nd Pan-American Conference on Soil Mechanics and Foundation Engineering, Vol. 1, pp 181-202 (1963)
[29] Pierce, W.G., 'Constitutive relations for saturated sand under undrained cyclic loading', Ph.D. thesis, Civil Engineering Department, Rensselaer Polytechnic Institute, Troy, New York (1985)