
REGULARIZATION TOOLS FOR TRAINING LARGE FEED-FORWARD NEURAL NETWORKS USING AUTOMATIC DIFFERENTIATION

JERRY ERIKSSON, MÅRTEN GULLIKSSON, PER LINDSTRÖM and PER-ÅKE WEDIN

Department of Computing Science, Umeå University, S-901 87 Umeå, Sweden
E-mail: [email protected], [email protected], [email protected], [email protected]

We describe regularization tools for training large-scale artificial feed-forward neural networks. We propose algorithms that explicitly use a sequence of Tikhonov regularized nonlinear least squares problems. For large-scale problems, new special-purpose automatic differentiation methods are used within a conjugate gradient method to compute a truncated Gauss-Newton search direction. The algorithms developed utilize the structure of the problem in different ways and perform much better than a Polak-Ribière-based method. All algorithms are tested using the benchmark problems and guidelines by Lutz Prechelt in the Proben1 package. All software is written in Matlab and gathered in a toolbox.

KEY WORDS: Neural network training, Tikhonov regularization, Automatic differentiation, Large-scale problems.

* Financial support has been received from the Swedish National Board of Industrial and Technical Development under grant NUTEK 8421-94-4603.

1 INTRODUCTION

The training phase of supervised feed-forward neural networks leads to very difficult unconstrained nonlinear least squares problems. The difficulties are due to the fact that the Jacobian matrix is rank deficient almost everywhere. By regularizing the original problem we get a less ill-conditioned problem with a solution limited in norm. For large problems, new special-purpose automatic differentiation algorithms for computing the Jacobian times a vector are used in a conjugate gradient method. In this paper, we propose optimization methods explicitly applied to the nonlinear regularized problem for large-scale problems. To be specific, we formulate and solve nonlinear Tikhonov regularized problems. In [12] it was shown theoretically and practically that this approach is superior to standard regularizing optimization techniques, such as Levenberg-Marquardt (LM) (trust region) methods [16] and truncated QR-methods such as subspace minimization [6, 7, 14].

In feed-forward neural network computations, the difference between the output vector, A, and the desired target vector, T, is called the error vector, E = T − A; see Section 1.1. However, we use the naming conventions from the field of numerical nonlinear optimization and write f instead of E. Then the nonlinear least squares problem is written as

\[
\min_{x \in \mathbb{R}^n} F(x) = \frac{1}{2} \sum_{i=1}^{m} f_i^2(x) \qquad (1)
\]
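To make the connection between (1), Tikhonov regularization and the truncated Gauss-Newton direction concrete, consider a regularized problem of the standard form

\[
\min_{x \in \mathbb{R}^n} \; \frac{1}{2}\|f(x)\|_2^2 + \frac{\mu}{2}\|x\|_2^2, \qquad \mu > 0.
\]

The following is a minimal sketch, not the authors' toolbox code, of how a truncated Gauss-Newton direction for this assumed form can be computed with the conjugate gradient method using only the products J*v and J'*w, such as those supplied by automatic differentiation. The names jvp, jtvp, mu, maxit and tol are illustrative, and the exact regularized formulation used in the paper (for example, a possible center point in the penalty term) may differ.

% Sketch: truncated Gauss-Newton direction for the Tikhonov-regularized
% problem  min_x 0.5*||f(x)||^2 + 0.5*mu*||x||^2  (an assumed standard form).
% The direction p approximately solves
%     (J'*J + mu*I) p = -(J'*f + mu*x)
% by the conjugate gradient method, touching J only through the products
% J*v and J'*w (e.g. supplied by automatic differentiation).
%
%   jvp(v)  - assumed handle returning J*v
%   jtvp(w) - assumed handle returning J'*w
function p = truncated_gn_direction(jvp, jtvp, f, x, mu, maxit, tol)
  g = jtvp(f) + mu * x;          % gradient of the regularized objective
  p = zeros(size(x));
  r = -g;                        % CG residual for the damped normal equations
  d = r;
  rho = r' * r;
  for k = 1:maxit
    Ad = jtvp(jvp(d)) + mu * d;  % (J'*J + mu*I)*d without forming J explicitly
    alpha = rho / (d' * Ad);
    p = p + alpha * d;
    r = r - alpha * Ad;
    rho_new = r' * r;
    if sqrt(rho_new) <= tol * norm(g)
      break                      % truncate: an inexact direction is accepted
    end
    d = r + (rho_new / rho) * d;
    rho = rho_new;
  end
end

Truncating the inner conjugate gradient iteration early is what makes such a scheme attractive for large networks: each inner iteration costs only two Jacobian products, and the regularization parameter mu keeps the system well conditioned even where J is rank deficient.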
