Approximating a Function and its Derivatives Using MSE-Optimal Linear Combinations of Trained Feedforward Neural Networks

Sherif Hashem ([email protected]) and Bruce Schmeiser ([email protected])
School of Industrial Engineering, Purdue University, 1287 Grissom Hall, West Lafayette, IN 47907-1287

In Proceedings of the 1993 World Congress on Neural Networks, Vol. 1, pp. 617-620, Lawrence Erlbaum Associates, New Jersey.

Abstract

In this paper, we show that using MSE-optimal linear combinations of a set of trained feedforward networks may significantly improve the accuracy of approximating a function and its first and second order derivatives. Our results are compared to the accuracies achieved by the single best network and by the simple averaging of the outputs of the trained networks.

1 Introduction

Feedforward neural networks (FNNs) are widely used for function approximation. They are universal approximators, capable of approximating an unknown mapping and its derivatives arbitrarily well (Hornik et al. 1990). Approximating the derivatives, that is, the derivatives of the output with respect to the inputs, is of significant importance in many applications. In process optimization, for example, the first and second order derivatives obtained from a neural network trained on the process response may be used to approximate the gradient vector and the Hessian matrix of the process response, allowing the use of optimization techniques such as Newton's method to optimize the process response (Gill et al. 1981, pp. 105-107).

We assume that the activation functions of the trained FNNs are differentiable, although they may be of different forms. Under this assumption, the formulas for computing the first and second order derivatives from the trained networks are given by Hashem (1992); a sketch of the simple one-hidden-layer case appears at the end of this section.

In Section 2, a short overview of the MSE-optimal linear combination (MSE-OLC) of trained neural networks is given. In Section 3, an example is presented to illustrate the improvement in the accuracy of approximating a given function and its derivatives. Some concluding remarks are given in Section 4.
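To make this concrete, the following sketch (ours, not taken from the cited references) shows how the first and second order derivatives with respect to the input can be read off a trained single-input network with one hidden layer of logistic-sigmoid units and a sigmoid output unit, using nothing more than the chain rule. The weight names w1, b1, w2, b2 are our own notation; the general formulas for arbitrary differentiable activations and architectures are those given by Hashem (1992).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nn_and_derivatives(x, w1, b1, w2, b2):
    """Output, first and second derivative w.r.t. the scalar input x of
    y = g(b2 + sum_k w2[k] * g(b1[k] + w1[k] * x)), g = logistic sigmoid.
    Uses g' = g(1 - g) and g'' = g(1 - g)(1 - 2g)."""
    zk = b1 + w1 * x              # hidden-unit pre-activations
    hk = sigmoid(zk)              # hidden-unit outputs
    gk1 = hk * (1.0 - hk)         # g'(zk)
    gk2 = gk1 * (1.0 - 2.0 * hk)  # g''(zk)

    z = b2 + np.dot(w2, hk)       # output-unit pre-activation
    y = sigmoid(z)
    g1 = y * (1.0 - y)            # g'(z)
    g2 = g1 * (1.0 - 2.0 * y)     # g''(z)

    s1 = np.dot(w2 * w1, gk1)     # dz/dx
    s2 = np.dot(w2 * w1**2, gk2)  # d^2 z / dx^2

    dy_dx = g1 * s1
    d2y_dx2 = g2 * s1**2 + g1 * s2
    return y, dy_dx, d2y_dx2
```

For networks with several inputs, the same chain-rule bookkeeping yields the gradient vector and Hessian matrix needed by Newton's method.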

2 Linear combinations of trained neural networks

In general, constructing a good neural network model for a complex function is not an easy task. The training process may involve trying multiple networks, possibly with different structures and different values of the training parameters. Typically, the best network, based on some optimality criterion, is selected while the rest of the (partially) trained networks are discarded.

Hashem and Schmeiser (1992) discussed combining a set of trained neural networks to improve the accuracy of the resulting model. The combination is constructed by forming a weighted sum of the corresponding outputs of the trained networks, with the weights computed according to some optimality criterion. One frequently adopted criterion, in both the neural networks and the statistics communities, is minimizing the mean squared error (MSE). Hashem and Schmeiser (1992) demonstrated that the MSE-OLC, which minimizes the MSE over a given data set, may be superior to both the single best network and the simple average of the networks; that result was based on comparing the resulting MSEs for the function values only.
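As a rough sketch of how such weights can be computed (assuming the unconstrained MSE-OLC with no constant term, the variant used in the example of Section 3), the empirical MSE over the training data is minimized by an ordinary least-squares fit in which the trained networks' outputs act as regressors; the function names below are illustrative, not from the cited report.

```python
import numpy as np

def mse_olc_weights(Y, t):
    """Unconstrained MSE-optimal combination weights (no constant term).

    Y : (n, K) array; column j holds the outputs of trained network j
        at the n training inputs.
    t : (n,) array of target function values.

    Minimizes sum_i (t_i - Y[i] @ alpha)^2, i.e. least squares through the origin."""
    alpha, *_ = np.linalg.lstsq(Y, t, rcond=None)
    return alpha

def combine(Y_new, alpha):
    """Combined prediction: weighted sum of the individual networks' outputs."""
    return Y_new @ alpha
```

Because the weights do not depend on the input, the derivatives of the combined model are simply the same weighted sums of the individual networks' derivatives.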

3 Example

We now present an example to illustrate the usefulness of constructing an MSE-OLC of trained FNNs for improving the accuracy of approximating a function and its derivatives. A comparison with the accuracy obtained using the single best FNN, as well as the simple average of the FNNs, is presented. For the MSE-OLC, the unconstrained optimal weights are computed using the formulas given by Hashem and Schmeiser (1992). The best FNN is the one with the smallest MSE among the six trained networks, while the simple average of the FNNs is formed by averaging the corresponding outputs of the trained networks for a given input.

Consider the problem of approximating the function

f(x) = 0.02 (12 + 3x - 3.5x^2 + 7.2x^3)(1 + cos 4πx)(1 + 0.8 sin 3πx)

over the interval [0, 1], a problem reported by Namatame and Kimata (1989). We used a slightly modified version of the backpropagation algorithm, or Generalized Delta Rule (Rumelhart et al. 1986), to train three 2-hidden-layer FNNs with 5 hidden units in each hidden layer (NN1, NN2, and NN3) and three 1-hidden-layer FNNs with 10 hidden units (NN4, NN5, and NN6). The networks were initialized with independent random weights. Each network had one input unit and one output unit. The activation function for the hidden units as well as the output units was the logistic sigmoid g(x) = 1/(1 + e^(-x)). A set of 200 randomly generated points was used for training all the networks. Except for the structural differences and the different initial weights, the six networks were trained in the same manner.

The unconstrained optimal weights were estimated from the data in the training set. The estimated optimal-weight vector was (0.126023, -0.194889, 0.638781, 0.779431, -0.660341, 0.311545)^T. To evaluate the approximation accuracy, 10 sets of 10,000 independent randomly generated input points were generated, and the true values of the function and its derivatives at these points were used to compute the corresponding MSEs. The MSE-OLC yielded an MSE of 0.000017 for the function approximation, a reduction of 87.3% relative to the MSE produced by NN4, the best FNN for approximating f(x). Moreover, for the first and second order derivatives, the MSEs produced by the MSE-OLC were 0.10 and 133.3, which are 69.7% and 64.5% less than the corresponding MSEs produced by NN3, the best FNN for approximating the first and second order derivatives, respectively. The MSEs produced by the MSE-OLC for approximating f(x) and its first and second order derivatives are 95.6%, 85.8%, and 74.3% less than the MSEs produced by the simple average of the FNNs, respectively. Figures 1, 2, and 3 show the approximations obtained from NN3, NN4, the simple average of the FNNs, and the MSE-OLC, plotted against f(x) and its first and second order derivatives, respectively. The MSE-OLC appears to perform better than any of the other approximators.
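For readers who want to check numbers of this kind, the sketch below shows one way to obtain the exact derivatives of f and to estimate the MSE of an approximator on freshly generated test inputs; the symbolic differentiation is merely a convenience, and model_olc and model_nn4 are hypothetical placeholders for trained approximators, not part of the original study.

```python
import numpy as np
import sympy as sp

# The target function of the example and its exact first and second derivatives.
x = sp.symbols('x')
f_expr = 0.02 * (12 + 3*x - 3.5*x**2 + 7.2*x**3) \
         * (1 + sp.cos(4*sp.pi*x)) * (1 + 0.8*sp.sin(3*sp.pi*x))
f, fp, fpp = (sp.lambdify(x, e, 'numpy')
              for e in (f_expr, sp.diff(f_expr, x), sp.diff(f_expr, x, 2)))

def mse(approx, truth, n_points=10_000, seed=0):
    """Mean squared error of `approx` against `truth` on random inputs in [0, 1]."""
    xs = np.random.default_rng(seed).uniform(0.0, 1.0, n_points)
    return float(np.mean((approx(xs) - truth(xs)) ** 2))

# Example comparison (placeholders for trained approximators):
# print(mse(model_olc, f), mse(model_nn4, f))
```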


[Figure 1: plots omitted. Panel (a): f(x), UOLC_NN, NN4; panel (b): f(x), Av-NNs, NN3; x-axis: X over [0, 1].]

Figure 1: The function f(x), and the approximations obtained from NN3, NN4, the MSE-OLC, and the average of the six FNNs.

[Figure 2: plots omitted. Panel (a): fp(x), UOLC_NN, NN3; panel (b): fp(x), Av-NNs, NN4; x-axis: X over [0, 1].]

Figure 2: The true first-order derivative fp(x), and the approximations obtained from NN3, NN4, the MSE-OLC, and the average of the six FNNs.

[Figure 3: plots omitted. Panel (a): fpp(x), UOLC_NN, NN3; panel (b): fpp(x), Av-NNs, NN4; x-axis: X over [0, 1].]

Figure 3: The true second-order derivative fpp(x), and the approximations obtained from NN3, NN4, the MSE-OLC, and the average of the six FNNs.

4 Concluding remarks

Constructing an MSE-OLC of the trained networks may yield more accurate approximations to an unknown function and its derivatives than using the best trained network or averaging the outputs of the networks. The computational effort required to compute the optimal weights is modest, and no information about the values of the derivatives of the function at the training points is needed. The strength of the MSE-OLC lies in integrating the knowledge acquired by all the trained FNNs to improve the approximation accuracy.

5 Acknowledgements

This research was supported by PRF Research Award 6901627 from Purdue University and by National Science Foundation Grant DMS-8717799. The software used for training and testing the neural networks was partially developed at the Sensor and System Development Center, Honeywell Inc., Minneapolis, MN.

References

Gill, P. E., W. Murray, & M. H. Wright (1981). Practical Optimization. Academic Press.

Hashem, S. (1992). Sensitivity Analysis for Feedforward Artificial Neural Networks with Differentiable Activation Functions. Proceedings of the 1992 International Joint Conference on Neural Networks, Baltimore, I:419-424.

Hashem, S., & B. Schmeiser (1992). Improving Model Accuracy Using Optimal Linear Combinations of Trained Neural Networks. Technical Report SMS92-16, School of Industrial Engineering, Purdue University.

Hornik, K., M. Stinchcombe, & H. White (1990). Universal Approximation of an Unknown Mapping and Its Derivatives Using Multilayer Feedforward Networks. Neural Networks, 3, 551-560.

Namatame, A., & Y. Kimata (1989). Improving the Generalising Capabilities of a Back-Propagation Network. The International Journal of Neural Networks Research & Applications, 1(2), 86-94.

Rumelhart, D. E., G. E. Hinton, & R. J. Williams (1986). Learning Internal Representations by Error Propagation. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition, D. Rumelhart and J. McClelland (eds.), MIT Press, Cambridge, MA, Vol. 1, Ch. 8, 318-328.
