PONS2train: a tool for testing the MLP architecture and local training methods for runoff forecast
Petr Máca, Jirka Pavlásek, Pavel Pech
Department of Water Resources and Environmental Modeling, Faculty of Environmental Sciences, Czech University of Life Sciences Prague
[email protected] Introduction
The purpose of this poster is to introduce PONS2train, a software tool developed for runoff prediction via the multilayer perceptron (MLP). The application implements 12 different MLP transfer functions, enables the comparison of 9 local training algorithms, and evaluates MLP performance with 17 selected model evaluation metrics.
The PONS2train Software Implementation

The PONS2train software is written in the C++ programming language. Its implementation consists of 4 classes. The NEURAL NET and NEURON classes implement the MLP, the CRITERIA class estimates the model evaluation metrics and evaluates model performance on the testing and validation datasets, and the DATA PATTERN class prepares the calibration, testing and validation datasets. The software application uses the LAPACK, BLAS and ARMADILLO C++ linear algebra libraries.
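The poster gives the four class names but not their interfaces. The following is a minimal structural sketch of how the classes could fit together; all member and method names beyond the four class names are assumptions, not the actual PONS2train API.

    #include <armadillo>
    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Prepares the calibration, testing and validation datasets.
    struct DataPattern {
        arma::mat calib_in, calib_out;
        arma::mat test_in, test_out;
        arma::mat valid_in, valid_out;
    };

    // A single unit: a weighted input sum passed through an activation function.
    class Neuron {
    public:
        arma::vec weights;
        double bias = 0.0;
        double (*activation)(double) = [](double a) { return std::tanh(a); };

        double output(const arma::vec& x) const {
            return activation(arma::dot(weights, x) + bias);
        }
    };

    // A one-hidden-layer MLP with a linear output neuron.
    class NeuralNet {
    public:
        std::vector<Neuron> hidden;
        arma::vec output_weights;

        double forward(const arma::vec& x) const {
            arma::vec h(hidden.size());
            for (std::size_t i = 0; i < hidden.size(); ++i)
                h(i) = hidden[i].output(x);
            return arma::dot(output_weights, h);
        }
    };

    // Computes model evaluation metrics on the testing and validation data.
    struct Criteria {
        static double mse(const arma::vec& obs, const arma::vec& sim) {
            return arma::mean(arma::square(obs - sim));
        }
    };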
The PONS2train Features

1. The Data Pattern Class Methods.
• loading the data from data files
• data transformation
• preparing the training, testing and validation data patterns
• saving the results of network training, testing and validation

2. The Neuron Class Methods.
• estimation of the neuron output signal
• estimation of the activation and of the derivatives of the error function
• selection of the activation function
• neuron saturation control

A. The 12 Activation Functions. PONS2train enables building an MLP with different activation functions - see Tab 1. The AcF is constant for the neurons in one layer.

Tab 1. The built-in activation functions - AcF.

Function name        Transfer function
Logistic Sigmoid     y(a) = 1 / (1 + exp(−a))
Hyperbolic Tangent   y(a) = tanh(a)
Linear function      y(a) = a
Gaussian function    y(a) = exp(−a^2)
Inverse abs          y(a) = a / (1 + |a|)
LogLog               y(a) = exp(−exp(−a))
ClogLog              y(a) = 1 − exp(−exp(a))
ClogLogm             y(a) = 1 − 2 exp(−0.7 exp(a))
RootSig              y(a) = a / (1 + sqrt(1 + a^2))
LogSig               y(a) = 1 / (1 + exp(−a))^2
Sech                 y(a) = 2 / (exp(a) + exp(−a))
Wave                 y(a) = (1 − a^2) exp(−a^2)
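Several Tab 1 entries transcribe directly into code. A short sketch of four of them follows; the function names are illustrative, not PONS2train identifiers.

    #include <cmath>

    // Four of the Tab 1 activation functions, transcribed literally.
    double gaussian(double a) { return std::exp(-a * a); }
    double inv_abs(double a)  { return a / (1.0 + std::fabs(a)); }
    double cloglogm(double a) { return 1.0 - 2.0 * std::exp(-0.7 * std::exp(a)); }
    double wave(double a)     { return (1.0 - a * a) * std::exp(-a * a); }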
3. The Neural Net Class Methods.
• network initialization
• network training, testing and validation
• early stopping
• neuron saturation control
• multi-run training
• multi-run simulation
• network weight analysis
• OLS benchmark model of the given problem

4. The Criteria Class Methods.
• estimation of the 17 model evaluation metrics
• saving the model evaluation results

5. The MLP Architecture Examples.
• the simple MLP - the simple MLP can have multiple outputs
• the hybrid MLP - different activation functions in the hidden layer

Fig 1. ANN architectures built with PONS2train.
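The poster does not enumerate the 17 metrics, but the persistency index (PI) used throughout the results is a standard one. A sketch of how the CRITERIA class might compute it against the naive persistence forecast Q(t) = Q(t − 1) is given below; the function name is an assumption.

    #include <armadillo>

    // Persistency index: 1 minus the ratio of the model's squared errors to
    // those of the persistence forecast Q(t) = Q(t - 1); PI > 0 means the
    // MLP outperforms persistence.
    double persistency_index(const arma::vec& obs, const arma::vec& sim) {
        const arma::uword n = obs.n_elem;
        arma::vec err_model = obs.tail(n - 1) - sim.tail(n - 1);
        arma::vec err_naive = obs.tail(n - 1) - obs.head(n - 1);
        return 1.0 - arma::dot(err_model, err_model)
                   / arma::dot(err_naive, err_naive);
    }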
B. The 9 Local Search Training Algorithms (TA)

PONS2train implements the following first-order local optimization algorithms: standard on-line and batch back-propagation with a learning rate combined with momentum, their variants with a regularization term, Rprop, and standard batch back-propagation with variable momentum and learning rate. The second-order local training algorithms are the Levenberg–Marquardt algorithm, with and without regularization, and four variants of scaled conjugate gradients.

The list of online training methods:
• standard back-propagation with learning rate and momentum term
• standard back-propagation with regularization term

The list of batch training methods:
• Back-propagation with learning rate and momentum term (BP)
• Back-propagation with regularization term (BP regul)
• Batch back-propagation with variable self-adapting learning rate
• Rprop and Rprop−
• Levenberg–Marquardt (LM)
• Levenberg–Marquardt with regularization
• Scaled conjugate gradient - Fletcher (FLET)
• Scaled conjugate gradient - Hestenes (HEST)
• Scaled conjugate gradient - Polak (POL)
• Scaled conjugate gradient - Perry (PER)

Currently only local first-order and second-order gradient methods are implemented; a sketch of one of them follows this list.
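As one example of the listed algorithms, below is a simplified sketch of the Rprop− weight update (sign-based step adaptation, no weight backtracking); the step-scaling constants are common literature defaults, since the poster does not state PONS2train's settings.

    #include <algorithm>
    #include <armadillo>

    // One Rprop- epoch update: adapt each step size from the sign of the
    // gradient product, then move each weight against its gradient sign.
    void rprop_minus_step(arma::vec& w, const arma::vec& grad,
                          arma::vec& grad_prev, arma::vec& step) {
        const double eta_plus = 1.2, eta_minus = 0.5;
        const double step_max = 50.0, step_min = 1e-6;
        for (arma::uword i = 0; i < w.n_elem; ++i) {
            const double s = grad_prev(i) * grad(i);
            if (s > 0.0)      step(i) = std::min(step(i) * eta_plus, step_max);
            else if (s < 0.0) step(i) = std::max(step(i) * eta_minus, step_min);
            if (grad(i) > 0.0)      w(i) -= step(i);
            else if (grad(i) < 0.0) w(i) += step(i);
            grad_prev(i) = grad(i);
        }
    }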
C. The Data Transformation

Two simple methods for data transformation are implemented:
• the linear transformation
• the non-linear transformation

Dtrans = 1 − exp(−γ Dorig)
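The non-linear transformation has a closed-form inverse for back-transforming the forecasts. A sketch, with γ as a user-chosen parameter (the poster gives no value):

    #include <armadillo>

    // Non-linear transformation from section C and its inverse
    // (the inverse requires all transformed values to stay below 1).
    arma::vec transform(const arma::vec& d, double gamma) {
        return 1.0 - arma::exp(-gamma * d);
    }
    arma::vec inverse_transform(const arma::vec& t, double gamma) {
        return -arma::log(1.0 - t) / gamma;
    }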
D. The Additional Functionalities

The other important PONS2train features are the multi-run training and simulation, the weight saturation control, the early stopping of training, and the MLP weight analysis.

The MLP weight initialization:
• random sampling from the uniform distribution on an open interval
• Nguyen-Widrow's method

The over-fitting control:
• early stopping of training based upon the selected model evaluation metrics
• neuron saturation control

The OLS benchmark model. The multi-run training and simulation. The 17 model evaluation metrics.
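A sketch of the Nguyen-Widrow initialization in its common textbook form (PONS2train's exact variant is not specified on the poster):

    #include <armadillo>
    #include <cmath>

    // Nguyen-Widrow initialization for a hidden layer with weight matrix W
    // (n_hidden x n_in) and bias vector b.
    void nguyen_widrow_init(arma::mat& W, arma::vec& b) {
        const arma::uword n_hidden = W.n_rows, n_in = W.n_cols;
        W.randu();                 // uniform on (0, 1)
        W = 2.0 * W - 1.0;         // rescale to (-1, 1)
        const double beta =
            0.7 * std::pow((double)n_hidden, 1.0 / (double)n_in);
        for (arma::uword j = 0; j < n_hidden; ++j)
            W.row(j) *= beta / arma::norm(W.row(j), 2);
        b = arma::linspace(-beta, beta, n_hidden);  // spread the biases
    }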
The Case Study

The runoff forecast case study focuses on the PONS2train implementation and shows different aspects of the MLP training: the MLP architecture estimation, the neural network weight analysis, and the uncertainty of model training. The runoff is forecast for a small micro-catchment, using a dataset with hourly resolution.

• MLP with one hidden layer, the output layer activation function - linear
• the non-linear data transformation
• inputs Q(t − 1), P(t − 1), P(t − 2), P(t − 3), the output Q(t)
• the same settings for all training algorithms
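A sketch of how the case-study input patterns could be assembled from the hourly runoff Q and precipitation P series (the function name is an assumption):

    #include <armadillo>

    // Builds patterns with inputs Q(t-1), P(t-1), P(t-2), P(t-3) and
    // output Q(t), as used in the case study.
    void build_patterns(const arma::vec& Q, const arma::vec& P,
                        arma::mat& X, arma::vec& y) {
        const arma::uword n = Q.n_elem;
        X.set_size(n - 3, 4);
        y.set_size(n - 3);
        for (arma::uword t = 3; t < n; ++t) {
            X(t - 3, 0) = Q(t - 1);
            X(t - 3, 1) = P(t - 1);
            X(t - 3, 2) = P(t - 2);
            X(t - 3, 3) = P(t - 3);
            y(t - 3) = Q(t);
        }
    }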
The Results

I. The persistency index of the 7 TA on 11 different AcF.

Fig 2. Each box-plot shows the results with PI > 0, selected from 11 AcF x 150 simulations; left - calibration, right - validation.

II. The ensemble of the validation runoff forecasts.

Fig 3. The Levenberg–Marquardt TA, validation results: Inverse abs (top left), Gaussian AcF (top right), LogSig (bottom right) and Wave AcF (bottom left).

III. The AcF correlation - the persistency index of the 150-run ensemble, the Scaled conjugate gradient Perry (PER).

Fig 4. Green - all 150 results, blue - models filtered by PI > 0.
The Final Remarks
• PONS2train is freely available at http://kvhem.cz or upon request from the authors.
• The PONS2train C++ implementation enables extending the MLP architecture to the hybrid MLP case.