
REGULARIZATION TOOLS FOR TRAINING FEED-FORWARD NEURAL NETWORKS
PART II: Large-scale problems

JERRY ERIKSSON, MÅRTEN GULLIKSSON, PER LINDSTRÖM and PER-ÅKE WEDIN

Department of Computing Science, Umeå University, S-901 87 Umeå, Sweden
E-mail: [email protected], [email protected], [email protected], [email protected]

* Financial support has been received from the Swedish National Board for Industrial and Technical Development under grant NUTEK 8421-94-4603.

We describe regularization tools for training large-scale artificial feed-forward neural networks. In a companion paper (in this issue) we give the basic ideas and some theoretical results regarding the Gauss-Newton method compared to other methods, such as the Levenberg-Marquardt method, applied to small and medium size problems. We propose algorithms that explicitly use a sequence of Tikhonov regularized nonlinear least squares problems. For small- and medium-size problems the Gauss-Newton method is applied to the regularized problem. For large-scale problems, methods using new special purpose automatic differentiation are used in a conjugate gradient method for computing a truncated Gauss-Newton search direction. The algorithms developed exploit the structure of the problem in different ways and perform much better than the Polak-Ribière based method. All algorithms are tested using the benchmark problems and guidelines by Lutz Prechelt in the Proben1 package. All software is written in Matlab and gathered in a toolbox.

KEY WORDS: Neural network training, Tikhonov regularization, Automatic differentiation, Large-scale problems.
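
As a concrete illustration of the kind of problem the toolbox addresses, the following Matlab sketch evaluates a Tikhonov regularized training objective for a one-hidden-layer network. This is not the toolbox's own code: the weight layout, the function name reg_objective, the tanh/linear architecture, and the regularization parameter mu are assumptions made only for this example.

    % Hypothetical sketch: Tikhonov regularized training objective
    %   F_mu(x) = 0.5*sum(f_i(x)^2) + 0.5*mu*||x||^2,  with residual f = A(x) - T
    % (the sign of the residual does not affect the sum of squares).
    function [Fmu, f] = reg_objective(x, P, T, nin, nhid, nout, mu)
      % Unpack the weight vector x into the two weight matrices
      % (bias terms are omitted to keep the sketch short).
      W1 = reshape(x(1:nhid*nin), nhid, nin);        % input  -> hidden weights
      W2 = reshape(x(nhid*nin+1:end), nout, nhid);   % hidden -> output weights
      A  = W2 * tanh(W1 * P);                        % network output for all patterns P
      f  = A(:) - T(:);                              % error (residual) vector
      Fmu = 0.5*(f'*f) + 0.5*mu*(x'*x);              % regularized objective value
    end

Training in this setting amounts to minimizing F_mu for a sequence of values of the regularization parameter mu, as described in the abstract above.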

1 INTRODUCTION

The training phase of supervised feed-forward neural networks leads to very difficult unconstrained nonlinear least squares problems. The difficulty is due to the fact that the Jacobian matrix is rank deficient almost everywhere. By regularizing the original problem we get a less ill-conditioned problem with a solution limited in norm. For large problems, new special purpose automatic differentiation algorithms for computing the Jacobian times a vector are used in a conjugate gradient method. In this paper, we propose optimization methods explicitly applied to the regularized nonlinear problem for large-scale problems. To be specific, we formulate
and solve nonlinear Tikhonov regularized problems. In [7] (this issue) it was shown theoretically and practically that this approach is superior to standard optimization regularization techniques, such as the Levenberg-Marquardt (LM) or trust region methods [15], or the truncated QR methods used in the subspace minimization approaches described in [5, 6, 13]. We will use the same notation as in [7], which, for simplicity, is partly repeated below. In feed-forward neural network computations, the difference between the output vector A and the desired target vector T is named the error vector (E = T - A), see Section 1.1 in [7]. However, we use the naming conventions from the field of numerical nonlinear optimization and write f instead of E. Then the nonlinear least squares problem is written as

    \min_{x \in \mathbb{R}^n} F(x) = \frac{1}{2} \sum_{i=1}^{m} f_i^2(x),
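
To make the large-scale strategy mentioned above concrete, here is a minimal Matlab sketch of a truncated Gauss-Newton direction for the Tikhonov regularized problem, computed with the conjugate gradient method so that only Jacobian-vector products J*v and J'*w are needed. The handles jvp and jtvp (which the paper obtains via special purpose automatic differentiation), the stopping rule, and the iteration limit are placeholders for illustration, not the toolbox's actual interface.

    % Hypothetical sketch: truncated Gauss-Newton direction p from
    %   (J'*J + mu*I) p = -(J'*f + mu*x)
    % using conjugate gradients with matrix-vector products only.
    function p = truncated_gn_step(x, f, jvp, jtvp, mu, maxit, tol)
      % jvp(x,v) ~ J(x)*v and jtvp(x,w) ~ J(x)'*w, e.g. via automatic differentiation.
      Hv = @(v) jtvp(x, jvp(x, v)) + mu*v;   % regularized Gauss-Newton Hessian times v
      g  = jtvp(x, f) + mu*x;                % gradient of the regularized objective
      p  = zeros(size(x));
      r  = -g;                               % residual of the linear system
      d  = r;
      rho = r'*r;
      for k = 1:maxit
          q     = Hv(d);
          alpha = rho / (d'*q);
          p     = p + alpha*d;
          r     = r - alpha*q;
          rho_new = r'*r;
          if sqrt(rho_new) <= tol*norm(g)    % truncate: accept an inexact direction
              break
          end
          d   = r + (rho_new/rho)*d;
          rho = rho_new;
      end
    end

Because the Gauss-Newton matrix J'*J + mu*I is never formed explicitly, the memory cost of such a step stays linear in the number of weights, which is the point of combining conjugate gradients with Jacobian-vector products for large networks.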
