Proceedings of the 2nd IMT-GT Regional Conference on Mathematics, Statistics and Applications, Universiti Sains Malaysia, Penang, June 13-15, 2006

Nonlinear Approximations using Multi-layered Perceptrons and Polynomial Regressions

Ong Hong Choon 1, Lim Chee Kang 2 and Yong Yeow Wui 3
1,2,3 Universiti Sains Malaysia, 11800 USM, Pulau Pinang, Malaysia
1 [email protected]  2 [email protected]  3 [email protected]

Abstract. Many ideas in statistics can be expressed in neural network notation, including regression models from simple linear regression to projection pursuit regression, nonparametric regression, generalized additive models and others. In this study, we simulate a multi-layered perceptron with a single hidden layer using the error backpropagation algorithm and compare the results with polynomial regression using a similar number of parameters, over five different nonlinear functions which are all scaled so that the standard deviation is one for a large regular grid with 2500 points on $[0,1]^2$. The empirical results obtained show that the polynomial regression models perform better than the multi-layered perceptrons except for complicated interaction functions.

2000 Mathematics Subject Classification: 62J02, Secondary 68T05
Key words and phrases: Nonlinear approximations, multi-layered perceptrons, polynomial regression.

1. Introduction

There is considerable overlap between neural networks and statistics. Artificial neural networks are capable of processing vast amounts of data and making predictions that are sometimes surprisingly accurate. [9] provided a good elementary discussion of a variety of classification methods, including statistical and neural network methods, and many useful articles on neural networks and statistics are found in [10]. Many neural network capabilities are similar or identical to those of popular statistical methods such as generalized linear models, polynomial regression, nonparametric regression, discriminant analysis, projection pursuit regression, principal components, and cluster analysis, especially where the emphasis is on prediction of complicated phenomena rather than on explanation. Hence, we compare the accuracy of neural network models and polynomial regression models. Neural networks work through computer programs and forecast values without requiring knowledge of the underlying statistics. Neural networks are rich and

flexible nonlinear systems that can capture complex interactions among the input variables in a system. Neural networks also handle a large number of input variables well when many experiments are available. In the case of multiple linear regression, however, a large number of input variables leads to a polynomial with many parameters and tedious computation (see [1]). In other words, polynomial regression models are often limited by the order of the equation: expanding the model to higher-degree polynomials makes the equation more complicated, and when the degree of the polynomial exceeds six, the additional terms become redundant (see [6]). Neural networks are able to perform when data are noisy or incomplete and have the ability to generalize from the input data, owing to their flexible nonlinear structure (see [2]). In addition, the system can process information correctly provided the information resembles the original training data. By contrast, a polynomial regression model requires many steps to generate a highly accurate output; these steps are messy and complicated, so the probability of an error caused by negligence increases, which may affect the accuracy of the result. This ability to generalize is important and useful in practical applications because, in the real world, data are often noisy and incomplete. Since neural networks adapt to new situations, there is no need for an a priori mathematical model of the input-output transformation; they learn correct responses by observing real-world examples. They are also best applied to problems whose solution requires knowledge that is difficult to specify (see [8]). This makes neural networks ideal for applications that are data-rich but theory-poor.

The multi-layer perceptron (MLP) is both nonparametric and stochastic, and has been identified by statisticians as a powerful nonlinear regression model. Multi-layer perceptron models are nonlinear models that can be used to approximate almost any function with a high degree of accuracy (see [10]). An MLP contains a hidden layer of neurons that use nonlinear activation functions, such as the logistic or hyperbolic tangent function. With such a function, an MLP fits a simple nonlinear regression curve, and this simple MLP acts very much like a polynomial regression (see [4]).

2. Methodology

In this study, a comparison is made between the accuracy of a commonly used neural network model, namely the multi-layered perceptron (MLP), and a nonlinear regression model, the polynomial regression.
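For reference, the single-hidden-layer MLP compared in this study can be viewed as a nonlinear regression function of the form sketched below. The notation (hidden-to-output weights $w_k$, input-to-hidden weights $v_{kj}$, activation $\phi$) is ours rather than the paper's, and bias terms are omitted so that a network with 8 hidden neurons has the $(2 \times 8) + (8 \times 1) = 24$ weights counted later in this section:
$$\hat{y} = \sum_{k=1}^{8} w_k \, \phi\!\left(v_{k1} x_1 + v_{k2} x_2\right), \qquad \phi(u) = \frac{1}{1+e^{-u}} \ \text{(logistic)} \quad \text{or} \quad \phi(u) = \tanh(u).$$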

The performance of the MLP using the error backpropagation algorithm and of the polynomial fit is evaluated on five nonlinear functions $g^{(j)} : [0,1]^2 \to \mathbb{R}$. These functions are scaled so that the standard deviation is 1 (for a large regular grid with 2500 points on $[0,1]^2$) and translated to make the range nonnegative. The five nonlinear functions used in testing the performance of the MLP and the polynomial fit are the same as those in [5], which facilitates performance comparisons across the different functions. The abscissa values $\{(x_{\ell 1}, x_{\ell 2})\}$ were generated as uniform random variables on $[0,1]$, independent of each other. We generated 225 points $\{(x_{\ell 1}, x_{\ell 2})\}$ of abscissa values and used this same set for the experiments with all five functions, thus eliminating an unnecessary variability component in the simulation. In other words,
$$y_\ell^{(j)} = g^{(j)}(x_{\ell 1}, x_{\ell 2}), \qquad \text{for } \ell = 1, 2, \ldots, 225 \text{ and } j = 1, \ldots, 5.$$
The functions are as follows:

1. Simple Interaction Function:
$$g^{(1)}(x_1, x_2) = 10.391\left((x_1 - 0.4)(x_2 - 0.6) + 0.36\right).$$

2. Radial Function:
$$g^{(2)}(x_1, x_2) = 24.234\left(r^2(0.75 - r^2)\right), \qquad r^2 = (x_1 - 0.5)^2 + (x_2 - 0.5)^2.$$

3. Harmonic Function:
$$g^{(3)}(x_1, x_2) = 42.659\left((2 + x_1)/20 + \operatorname{Re}(z^5)\right), \qquad z = x_1 + i x_2 - 0.5(1 + i),$$
or equivalently, with $\tilde{x}_1 = x_1 - 0.5$ and $\tilde{x}_2 = x_2 - 0.5$,
$$g^{(3)}(x_1, x_2) = 42.659\left(0.1 + \tilde{x}_1\left(0.05 + \tilde{x}_1^4 - 10\,\tilde{x}_1^2 \tilde{x}_2^2 + 5\,\tilde{x}_2^4\right)\right).$$

4. Additive Function:
$$g^{(4)}(x_1, x_2) = 1.33561\left(1.5(1 - x_1) + e^{2x_1 - 1}\sin\!\left(3\pi(x_1 - 0.6)^2\right) + e^{3(x_2 - 0.5)}\sin\!\left(4\pi(x_2 - 0.9)^2\right)\right).$$

5. Complicated Interaction Function:
$$g^{(5)}(x_1, x_2) = 1.9\left(1.35 + e^{x_1}\sin\!\left(13(x_1 - 0.6)^2\right) \cdot e^{-x_2}\sin(7 x_2)\right).$$

The MLP with a single hidden layer of 8 neurons results in $(2 \times 8) + (8 \times 1) = 24$ weights. Results are also tested for 16 neurons ($2 \times 16 + 16 \times 1 = 48$ weights), and for 54 and 60 neurons, to check whether the number of neurons is the determining factor for accuracy. The MLP program uses 2 input variables (the independent variables X1 and X2) and one output variable (Y), as shown in Figure 1.
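As an illustration only (the original study used its own MLP program and MINITAB), the five test functions and the simulated training data described above could be generated with a short Python/NumPy sketch such as the following; the function and variable names, the random seed, and the 50 × 50 layout of the 2500-point grid are our own choices:

```python
import numpy as np

# The five test functions from [5]; the scaling constants quoted in the paper
# are intended to give (approximately) unit standard deviation on the grid.
def g1(x1, x2):  # simple interaction
    return 10.391 * ((x1 - 0.4) * (x2 - 0.6) + 0.36)

def g2(x1, x2):  # radial
    r2 = (x1 - 0.5) ** 2 + (x2 - 0.5) ** 2
    return 24.234 * (r2 * (0.75 - r2))

def g3(x1, x2):  # harmonic (complex form)
    z = x1 + 1j * x2 - 0.5 * (1 + 1j)
    return 42.659 * ((2 + x1) / 20 + np.real(z ** 5))

def g4(x1, x2):  # additive
    return 1.33561 * (1.5 * (1 - x1)
                      + np.exp(2 * x1 - 1) * np.sin(3 * np.pi * (x1 - 0.6) ** 2)
                      + np.exp(3 * (x2 - 0.5)) * np.sin(4 * np.pi * (x2 - 0.9) ** 2))

def g5(x1, x2):  # complicated interaction
    return 1.9 * (1.35 + np.exp(x1) * np.sin(13 * (x1 - 0.6) ** 2)
                  * np.exp(-x2) * np.sin(7 * x2))

functions = [g1, g2, g3, g4, g5]

# Regular 50 x 50 grid (2500 points) on [0,1]^2, used to check the scaling.
grid = np.linspace(0, 1, 50)
G1, G2 = np.meshgrid(grid, grid)
for j, g in enumerate(functions, start=1):
    print(f"g{j}: standard deviation on grid = {g(G1, G2).std():.3f}")

# 225 abscissa points drawn as independent Uniform(0,1) pairs, shared by all
# five functions as described in the paper.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(225, 2))
Y = {j: g(X[:, 0], X[:, 1]) for j, g in enumerate(functions, start=1)}
```

A minimal 2-8-1 network trained by error backpropagation (plain gradient descent on the squared error) could then look like the sketch below. This is not the authors' program: the tanh activation, learning rate, epoch count and initialization are arbitrary choices, and bias terms are omitted so that the weight count matches the $(2 \times 8) + (8 \times 1) = 24$ quoted above.

```python
def train_mlp(X, y, hidden=8, lr=0.01, epochs=5000, seed=1):
    rng = np.random.default_rng(seed)
    V = rng.normal(scale=0.5, size=(2, hidden))   # input -> hidden weights
    w = rng.normal(scale=0.5, size=hidden)        # hidden -> output weights
    n = len(y)
    for _ in range(epochs):
        H = np.tanh(X @ V)          # hidden activations, shape (n, hidden)
        y_hat = H @ w               # linear output unit
        err = y_hat - y             # prediction error
        grad_w = H.T @ err / n                                   # dL/dw
        grad_V = X.T @ ((err[:, None] * w) * (1 - H ** 2)) / n   # dL/dV
        w -= lr * grad_w
        V -= lr * grad_V
    return V, w

# Example: fit the simple interaction function using the data generated above.
V, w = train_mlp(X, Y[1])
y_pred = np.tanh(X @ V) @ w
```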

[Figure 1 diagram: an input layer with the independent variables X1 and X2, a hidden layer of 8 neurons, and an output layer giving the predicted value Ŷ, which is compared with the true value Y; parameters (2 x 8) + (8 x 1).]

Figure 1: Multi-layered perceptron as a nonlinear regression

MINITAB version 13 is used to estimate the parameters of the polynomial regression for approximating the five nonlinear functions. We estimate the parameters based on the input data (X1 and X2) and the output data (Y). The polynomial fit is done using an approximately similar number of parameters (weights). Three types of polynomial fits are tried for the five nonlinear functions in [5]. The fitted models are:

(i) Without interaction except for the $x_1 x_2$ term (24 parameters):
$$\hat{Y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1^2 + \beta_4 x_2^2 + \beta_5 x_1^3 + \beta_6 x_2^3 + \beta_7 x_1^4 + \beta_8 x_2^4 + \beta_9 x_1^5 + \beta_{10} x_2^5 + \beta_{11} x_1^6 + \beta_{12} x_2^6 + \beta_{13} x_1^7 + \beta_{14} x_2^7 + \beta_{15} x_1^8 + \beta_{16} x_2^8 + \beta_{17} x_1^9 + \beta_{18} x_2^9 + \beta_{19} x_1^{10} + \beta_{20} x_2^{10} + \beta_{21} x_1^{11} + \beta_{22} x_2^{11} + \beta_{23} x_1 x_2$$

(ii) Without interaction (25 parameters):
$$\hat{Y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1^2 + \beta_4 x_2^2 + \beta_5 x_1^3 + \beta_6 x_2^3 + \beta_7 x_1^4 + \beta_8 x_2^4 + \beta_9 x_1^5 + \beta_{10} x_2^5 + \beta_{11} x_1^6 + \beta_{12} x_2^6 + \beta_{13} x_1^7 + \beta_{14} x_2^7 + \beta_{15} x_1^8 + \beta_{16} x_2^8 + \beta_{17} x_1^9 + \beta_{18} x_2^9 + \beta_{19} x_1^{10} + \beta_{20} x_2^{10} + \beta_{21} x_1^{11} + \beta_{22} x_2^{11} + \beta_{23} x_1^{12} + \beta_{24} x_2^{12}$$

(iii) With interaction (28 parameters):
$$\hat{Y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2 + \beta_5 x_2^2 + \beta_6 x_1^3 + \beta_7 x_2^3 + \beta_8 x_1^2 x_2 + \beta_9 x_1 x_2^2 + \beta_{10} x_1^4 + \beta_{11} x_2^4 + \beta_{12} x_1 x_2^3 + \beta_{13} x_1^2 x_2^2 + \beta_{14} x_1^3 x_2 + \beta_{15} x_1^5 + \beta_{16} x_2^5 + \beta_{17} x_1 x_2^4 + \beta_{18} x_1^2 x_2^3 + \beta_{19} x_1^3 x_2^2 + \beta_{20} x_1^4 x_2 + \beta_{21} x_1^6 + \beta_{22} x_2^6 + \beta_{23} x_1 x_2^5 + \beta_{24} x_1^2 x_2^4 + \beta_{25} x_1^3 x_2^3 + \beta_{26} x_1^4 x_2^2 + \beta_{27} x_1^5 x_2$$
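The paper fits these models with MINITAB's regression routine. As an equivalent illustration only, model (iii) can be fitted by ordinary least squares in Python; the helper names below are our own, and the 28 columns are exactly the monomials $x_1^p x_2^q$ with $p + q \le 6$:

```python
import numpy as np

def design_matrix_with_interaction(x1, x2, max_degree=6):
    """All 28 monomials x1**p * x2**q with p + q <= max_degree (here 6),
    i.e. the terms of model (iii); the column order differs from the
    paper's listing, which does not affect the least-squares fit."""
    cols = []
    for d in range(max_degree + 1):
        for p in range(d, -1, -1):
            cols.append((x1 ** p) * (x2 ** (d - p)))
    return np.column_stack(cols)

def fit_polynomial(X, y):
    """X is the 225 x 2 matrix of abscissa values, y the simulated responses
    (see the data-generation sketch above); returns the coefficient vector."""
    A = design_matrix_with_interaction(X[:, 0], X[:, 1])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

# Example: beta = fit_polynomial(X, Y[1])  # simple interaction function
```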

Ŷ is the predicted outcome value for the polynomial model with regression coefficients (parameters) $\beta_i$ for each term and intercept $\beta_0$. The hypothesis test for each coefficient is of the form
H0: the coefficient equals 0
HA: the coefficient does not equal 0.
If the P-value of the hypothesis test falls below 0.05, we reject the null hypothesis at the 5% significance level and can assert with 95% confidence that the true coefficient is not zero. If the P-value is more than 0.05, we do not reject the null hypothesis at the 5% significance level; such a term could be dropped from the model because it contributes only a small influence to the equation. However, we retain it for the basic comparison between the MLP and polynomial regression in this study. R-squared (R-Sq), the coefficient of determination, measures the percentage of the variability in Y that is explained by the model; R-Sq = a% indicates that a% of the total variation is explained by the model. The adjusted R-squared (R-Sq(adj)) adjusts this statistic for the number of independent variables in the model. From the analysis of variance, a coefficient contributes significantly to the model if its P-value is less than α = 0.05 (5% significance level) (see [7]). A summary of the test results is shown in Table 1.

As in [5], the fraction of variance unexplained (FVU) on the test set is used to compare the accuracy of the simulations for the five nonlinear functions. The FVU is the proportion of the variation among the observed values $y_1, y_2, \ldots, y_n$ that remains unexplained by the fitted regression. When the FVU is close to 0, the variation of the observed values of $y$ around the fitted regression function is much smaller than their variation around $\bar{y}$ (see [3]). It is defined as
$$\mathrm{FVU} = \frac{\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2},$$
where $y_i$, $\hat{y}_i$ and $\bar{y}$ represent the true value, the predicted value, and the mean of $y$, respectively.
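The FVU criterion translates directly into code; the following small helper (a hypothetical function name, not from the paper) computes it from the observed and predicted values:

```python
import numpy as np

def fvu(y_true, y_pred):
    """Fraction of variance unexplained: residual sum of squares divided by
    the total sum of squares about the mean of the observed values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return ss_res / ss_tot
```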

3. Result

Table 1: Results Summary for Polynomial Regression from MINITAB (for each function and fitted model, the table reports R-Sq, R-Sq(adj) and the P-value of the coefficient of each term)

Without interaction (24 parameters) except the x1x2 term: Constant, x1, x2, x1x2
Without interaction (25 parameters): Constant
g2
