Nonlinear Process Modeling Using Multiple Neural Network (MNN) Combination Based on Modified Dempster-Shafer (DS) Approach

Mat Noor, R. A., Ahmad, Z. and Baharuddin, I.

School of Chemical Engineering, Engineering Campus USM, 14300 Nibong Tebal, Penang, Malaysia
[email protected]

Section of Chemical Engineering Technology, Universiti Kuala Lumpur-Malaysian Institute of Chemical and Bioengineering Technology, Lot 1988, Bandar Vendor Taboh Naning, 78000 Alor Gajah, Malacca, Malaysia
[email protected]

Abstract—In this work, a modified Dempster-Shafer (DS) method is employed to combine multiple neural networks (MNN). The modified DS-MNN combination was applied to a nonlinear process. The 'best' single network is difficult to identify, especially in nonlinear process modeling; therefore, multiple neural networks were applied in this work. Furthermore, the MNN was combined using a nonlinear combination method, the DS method, to further improve the MNN model. A conical water tank was used as the nonlinear system. Based on the results, the modified DS-MNN implementation on the nonlinear conic water tank system was convincing and showed the reliability of MNN as a modeling tool.

Keywords—neural networks; multiple neural networks; Dempster-Shafer method; nonlinear process modeling

I. INTRODUCTION

Single neural networks have been used increasingly to build nonlinear models from sample data in industrial processes. Even though neural networks have a significant capability for representing nonlinear functions, inconsistent accuracy remains a problem: a neural network model may not perform well when applied to new, unseen data. Robustness is therefore one of the main criteria to consider when judging the performance of neural network models. Furthermore, advanced process control and supervision of industrial processes require accurate process models, which has prompted investigations into the robustness of neural network models. The main problems in nonlinear modeling based on neural networks are thus robustness and prediction accuracy. Many researchers have concentrated on achieving better performance through optimal or near-optimal network structures and training parameters. The multiple-model method can enhance robustness and generalization by combining several models, and it is now widely accepted that multiple models are an effective way to obtain better modeling performance [1]. In this method, several sub-models, which have different characteristics but the same objectives, are constructed according to subsets of the training sample data, and the outputs of the sub-models are then integrated to improve the overall estimation performance.


There are several advantages of multiple neural networks over a single neural network that motivated their use in this research. As stated before, the problems with a single neural network are robustness and prediction accuracy; multiple neural networks offer a solution for better performance [1]. Multiple neural networks generalize better and at the same time increase reliability through redundancy. Since multiple neural networks are structured from several sub-models, which have different characteristics but the same objectives, each network becomes a specialist at solving a particular portion of the problem. Furthermore, relevant data from any sub-model can be useful for the entire network even when those data give poor representations. This structure of several sub-models gives multiple neural networks a better approximation capability than a single neural network.

II. MULTIPLE NEURAL NETWORKS AND MODIFIED DEMPSTER-SHAFER METHOD

A single neural network tends to learn slowly. The network processes the inputs and compares the resulting outputs against the desired outputs; errors are then propagated back through the system, causing it to adjust the weights that control the network. This process occurs over and over as the weights are continually refined. The set of data that enables the training is called the training set, and during training the same data are processed many times as the connection weights are refined. A single neural network also does not give an explicit knowledge representation in the form of rules or some other easily interpretable form; its knowledge is implicit, hidden in the network structure and in the optimized weights between the nodes. To overcome these disadvantages of the single neural network, multiple neural networks have been introduced as a solution.

The Dempster-Shafer theory is a mathematical theory of evidence [2] based


on belief functions and plausible reasoning, and is used to combine separate pieces of information (evidence) to calculate the probability of an event. The theory was developed by Arthur P. Dempster (1968) and Glenn Shafer (1976). The Dempster-Shafer theory, also known as the theory of belief functions, is a generalization of the Bayesian theory of subjective probability. Whereas the Bayesian theory requires probabilities for each question of interest, belief functions allow us to base degrees of belief for one question on probabilities for a related question. These degrees of belief may or may not have the mathematical properties of probabilities; how much they differ from probabilities depends on how closely the two questions are related. The Dempster-Shafer theory is based on two ideas: 1) the idea of obtaining degrees of belief for one question from subjective probabilities for a related question, and 2) Dempster's rule for combining such degrees of belief when they are based on independent items of evidence.

Morelli and DeSimone Jr. [3] applied the Dempster-Shafer theory of evidence to the spatial correlation problem. Usually the correlation problem is solved with a Bayesian approach, by evaluating the likelihood function for each possible assignment and choosing the maximum-likelihood assignment as the correct one. A simulation comparison was made between the decisions arrived at using the Dempster-Shafer theory and those found using the Bayesian approach. The decisions made by the two theories were identical; since the Dempster-Shafer theory is more computationally intensive than the Bayesian approach, the authors felt that people would be more comfortable with the Bayesian approach to the spatial correlation problem. In order to utilize or extract the shape information of objects in an image, a method for representing shape is needed; this is the role of shape recognition. Hu et al. [4] developed a neural network shape recognition system based on Dempster-Shafer theory. It is composed of three parts: a pre-processing part, a feature extraction part and a recognition part. The recognition part fully utilizes the advantages of the Dempster-Shafer theory in uncertainty reasoning. One category of shape recognition is face recognition: a face recognition system is a computer application for automatically identifying or verifying a person from a digital image or a video frame. In 1994, Ip and Ng [5] proposed a novel approach to face recognition based on an application of the Dempster-Shafer theory. The technique makes use of a set of visual evidence derived from two projected views, frontal and profile, of the unknown person. The visual evidence and the associated hypotheses are subsequently combined using Dempster's rule to output a ranked list of possible candidates.
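To make Dempster's combination rule concrete, the following minimal Python sketch (ours, not from the paper) combines two basic mass assignments over a small frame of discernment; the hypothesis labels and mass values are hypothetical.

from itertools import product

def dempster_combine(m1, m2):
    """Combine two basic mass assignments with Dempster's rule."""
    combined, conflict = {}, 0.0
    for (a, w1), (b, w2) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + w1 * w2
        else:
            conflict += w1 * w2               # mass lost to contradiction
    # Normalise the surviving mass by 1 - K, where K is the conflict
    return {h: v / (1.0 - conflict) for h, v in combined.items()}

# Two hypothetical, independent pieces of evidence about hypotheses A and B
m1 = {frozenset({"A"}): 0.7, frozenset({"A", "B"}): 0.3}
m2 = {frozenset({"A"}): 0.6, frozenset({"B"}): 0.2, frozenset({"A", "B"}): 0.2}
print(dempster_combine(m1, m2))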

III. MATERIALS AND METHODS

In this case study, 20 networks with a fixed, identical structure were developed from bootstrap re-samples of the original training and testing data. The rationale for choosing 20 networks for all combinations is based on the work of Zhang [6], which

shows that the sum of squared errors (SSE) becomes essentially constant after combining about 15 networks; combining 20 neural networks is therefore reasonable, whereas with too few networks the optimum reduction of the SSE might not be reached. In re-sampling the training and testing data with bootstrap re-sampling techniques, the data were first transformed into a form corresponding to discrete time functions by taking into account the time lags in the model inputs and outputs, so that re-sampling the transformed data does not affect the input-output mapping of the models. The individual networks were then trained by the Levenberg-Marquardt optimisation algorithm with regularisation and early stopping. All weights and biases were randomly initialised in the range -0.1 to 0.1. The individual networks are single-hidden-layer feedforward neural networks; hidden neurons use the logarithmic sigmoid activation function, whereas output layer neurons use the linear activation function. Instead of selecting a single neural network model, a combination of several neural network models is implemented to improve the accuracy and robustness of the predictions: the final model prediction is a weighted combination of the individual neural network outputs. To cope with different magnitudes in the input and output data, all data were scaled to zero mean and unit standard deviation. The data for neural network model building were divided into: 1) training data (for network training), 2) testing data (for cross-validation-based network structure selection and early stopping), and 3) unseen validation data (for evaluation of the final selected model). The network structures, i.e. the numbers of hidden neurons, were determined through cross-validation: single-hidden-layer neural networks with different numbers of hidden neurons were trained on the training data and tested on the testing data, and the network with the lowest SSE on the testing data was considered to have the best topology. In assessing the developed models, the SSE on the unseen validation data is used as the performance criterion. In this case study, one-step-ahead prediction of the process is applied, where the process output at time (t-1), y(t-1), is used as a model input to predict the process output at time t, y(t), as follows:

$\hat{y}(t) = f[y(t-1), y(t-2), \ldots, y(t-n), u(t-1), u(t-2), \ldots, u(t-m)]$  (1)

where u(t-1) is the process input at time (t-1), $\hat{y}(t)$ is the predicted process output at time t, and m and n are the time lags in the process input and output respectively. A minimal code sketch of this data preparation and ensemble training is given below.
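The sketch below illustrates the lagged-regressor construction of eq. (1) and the bootstrap ensemble described above. It assumes scikit-learn's MLPRegressor as a stand-in for the paper's Levenberg-Marquardt-trained single-hidden-layer networks (logistic sigmoid hidden units, linear output); the function names, lag orders and hidden-layer size are illustrative, not the paper's.

import numpy as np
from sklearn.neural_network import MLPRegressor

def make_lagged(y, u, n_lags=2, m_lags=2):
    """Build the regressors of eq. (1) from output y and input u."""
    start = max(n_lags, m_lags)
    cols = [y[start - i:len(y) - i] for i in range(1, n_lags + 1)]
    cols += [u[start - j:len(u) - j] for j in range(1, m_lags + 1)]
    return np.column_stack(cols), y[start:]

def train_ensemble(X, t, n_nets=20, hidden=8, seed=0):
    """Train n_nets networks on bootstrap re-samples of (X, t)."""
    rng = np.random.default_rng(seed)
    nets = []
    for _ in range(n_nets):
        idx = rng.integers(0, len(X), size=len(X))  # bootstrap re-sample
        net = MLPRegressor(hidden_layer_sizes=(hidden,),
                           activation="logistic", max_iter=2000)
        nets.append(net.fit(X[idx], t[idx]))
    return nets

def ensemble_predict(nets, X, weights=None):
    """Weighted combination of the individual network outputs."""
    preds = np.column_stack([net.predict(X) for net in nets])
    if weights is None:                       # default: equal weights
        weights = np.full(len(nets), 1.0 / len(nets))
    return preds @ weights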

A. Case Study: Conic Water Tank

Fig. 1 shows the schematic diagram of a conic water tank. There is an inlet stream to the tank and an outlet stream from the tank. Manipulating the inlet water flow rate regulates the water tank level [7].

Figure 1. Conic water tank

Let V, Qi and Qo be the volume of water in the tank and the inlet and outlet water flow rates respectively; the material balance on the water tank can be written as

$\frac{dV}{dt} = Q_i - Q_o$  (2)

The outlet water flow rate, $Q_o$, is related to the tank level, h, by the following equation:

$Q_o = k\sqrt{h}$  (3)

where k is a constant for a fixed valve opening. The volume of water in the tank is related to the tank level by the following equation:

$V = \pi h \left[ r^2 + \frac{hr}{\tan\theta} + \frac{h^2}{3(\tan\theta)^2} \right]$  (4)

where r is the tank bottom radius and θ is the angle between the tank boundary and the horizontal plane. Combining (2) to (4), the following dynamic model for the tank level is obtained:

$\frac{dh}{dt} = \frac{Q_i - k\sqrt{h}}{\pi \left[ r^2 + \frac{2rh}{\tan\theta} + \frac{h^2}{(\tan\theta)^2} \right]}$  (5)

Based on the above model, a simulation programme was developed to simulate the process. The parameters used in the simulation are r = 10 cm, k = 34.77 cm^2.5/s and θ = 60°, with a sampling time of 10 seconds. The model indicates that the relationship between the inlet water flow rate and the water level in the tank is quite nonlinear. The outlet valve characteristic determines that the static gain increases with tank level, and because the tank is of a conical shape, the time constant of the process also increases with the tank level. Thus, both the static and dynamic characteristics of the process vary with the operating condition. All the network building data were generated from the simulation programme, and normally distributed noise with zero mean and a standard deviation of 0.7 cm was added to the simulated tank level. Networks were trained on the training data set and tested on the testing data set. The dynamic model for tank level prediction is of the form:

$\hat{y}(t) = f[y(t-1), u(t-1)]$  (6)

where $\hat{y}$ represents the tank level prediction, y the tank level, u the inlet flow rate, f a nonlinear function represented by the neural networks, and t the discrete time.
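A minimal sketch of the data generation just described, integrating eq. (5) with a simple Euler scheme and adding the stated measurement noise; the inlet-flow excitation signal and initial level are hypothetical, not the paper's.

import numpy as np

R, K, THETA = 10.0, 34.77, np.radians(60.0)  # r [cm], k [cm^2.5/s], theta
DT, SUBSTEPS = 0.1, 100                      # Euler step [s]; 100 x 0.1 s = 10 s sampling

def tank_step(h, qi):
    """Advance the level h by one 10 s sampling interval for inlet flow qi."""
    for _ in range(SUBSTEPS):
        area = np.pi * (R**2 + 2.0*R*h/np.tan(THETA) + h**2/np.tan(THETA)**2)
        h = max(h + DT * (qi - K*np.sqrt(max(h, 0.0))) / area, 0.0)  # eq. (5)
    return h

rng = np.random.default_rng(0)
h, flows, levels = 5.0, [], []
for _ in range(200):
    qi = rng.uniform(50.0, 150.0)            # hypothetical excitation signal
    h = tank_step(h, qi)
    flows.append(qi)
    levels.append(h + rng.normal(0.0, 0.7))  # zero-mean noise, sigma = 0.7 cm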

IV. RESULTS AND DISCUSSION

The nearest-neighbour method introduced by Skellam [8] is used in this part to calculate the corresponding error for every single prediction in the training and testing data. This is done by comparing the distribution of the distances from each data point to its nearest neighbour in a given data set with that of a randomly distributed data set. Fig. 2 shows the predicted error distribution for the 20 sets of training and testing data. The error distributions have an essentially identical pattern across all 20 data sets, although there are some fluctuations in some samples. The normal Dempster-Shafer combination theory gives the following rule to combine two pieces of data, X1 and X2:

$(X_1 \oplus X_2)(A) = \frac{\sum X_1 X_2}{1 - \sum X_1 X_2}$  (7)

Figure 2. Predicted Errors Using Nearest Neighbour Method


This combining rule can be generalized by iteration if we treat X1 and X2 not as single data points from training and testing, but as the predicted errors calculated in the previous section. To translate a single datum into a weight using its predicted error, we slightly modified (7) to give a higher weight for a smaller error, so that the next data prediction is based on the smaller errors that occurred in previous data sets. The weights for the individual data in each group, derived from (7), are then normalized by dividing each by their sum to yield the final weight values:

$W = 1 - \prod_{i=1,2,\ldots,n} \left[ \frac{X_i}{1 - X_n} \right]$  (8)
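As a sketch of this weight calculation, note that the W values in Table II are consistent with assigning each datum a weight of one minus its percentage error and then normalising by the sum; this reading of eq. (8) is our interpretation, and the following Python fragment reproduces the table's values to within rounding.

errors = {"A": 0.056, "B": 0.542, "C": 0.200, "D": 0.202}  # % errors, Table II

weights = {k: 1.0 - e for k, e in errors.items()}   # smaller error -> larger weight
total = sum(weights.values())
normalised = {k: w / total for k, w in weights.items()}

for k in errors:
    print(f"{k}: W = {weights[k]:.4f}, Wn = {normalised[k]:.4f}")
# Prints Wn of 0.3147, 0.1527, 0.2667, 0.2660, close to the table's
# 0.3148, 0.1524, 0.2666, 0.2659 (differences come from rounding of W)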

From Table II, it can be seen that the modified Dempster-Shafer theory gives the biggest weight to the smallest percentage error relative to the data in each set. The analysis is carried out to compare the performance of the DS combination method against the validation data from the neural networks. Fig. 3 shows the performance of the MNN with modified Dempster-Shafer combination compared with the validation data. The Dempster-Shafer combination of neural networks has significantly improved the performance of the network: the sum of squared errors (SSE) for the prediction data using the DS combination method (3.3969) is smaller than the SSE for the single output data (3.8123). The correlation coefficient R, sometimes also called the cross-correlation coefficient, is a quantity that gives the quality of a least-squares fit to the original data. In this case, we plot the prediction data from the modified DS theory against the actual values from the validation data. Computing R in MATLAB shows that the R value for the DS method is larger than that for the single output data. It is concluded that our result using the modified DS theory has a stronger positive linear correlation and increases overall performance, since 0.9884 is closer to 1. Fig. 4 shows the residuals calculated for the 199-sample data set generated by the modified DS theory. Each residual is calculated by subtracting the corresponding validation value from the prediction of our approach, giving 199 residuals for the overall system. The mean residual for our approach is 0.004, which is small compared with the actual values in the validation data. From these results, the modified DS method has shown convincing performance. A short sketch of this evaluation is given below.

Figure 3. Modified DS Theory Output and Validation Data for 199 Samples

Figure 4. Residuals of Predicted Data from the Modified DS Method
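For completeness, a minimal sketch of the evaluation metrics used above (SSE, correlation coefficient R, and residuals); the array names are hypothetical.

import numpy as np

def evaluate(y_pred, y_valid):
    """SSE, correlation coefficient R, and residuals against validation data."""
    residuals = y_valid - y_pred                     # one residual per sample
    sse = float(np.sum(residuals**2))                # sum of squared errors
    r = float(np.corrcoef(y_pred, y_valid)[0, 1])    # correlation coefficient
    return sse, r, residuals

# e.g. for the 199-sample validation set (names hypothetical):
# sse, r, res = evaluate(ds_prediction, validation_levels)
# print(sse, r, res.mean())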

TABLE I. SSE AND R FOR SINGLE OUTPUT AND MNN

                      Sum-Square Error (SSE)   Correlation Coefficient (R)
Single Output Data            3.8123                    0.9873
Modified DS Theory            3.3969                    0.9884

TABLE II. PERCENTAGE ERROR AND WEIGHT VALUES FROM THE MODIFIED DS THEORY

Data    % error    Weight, W    Wn
A       0.056      0.944        0.3148
B       0.542      0.457        0.1524
C       0.200      0.7997       0.2666
D       0.202      0.7977       0.2659

V. CONCLUSION

As mentioned earlier, the objectives of this research were to study the combination of multiple neural networks using a new approach, the Dempster-Shafer method, and to compare the performance of multiple neural networks with that of a single neural network; each part of the work had particular objectives toward this aim. The 20 sets of data from our case study were combined using multiple neural networks, with further prediction using the Dempster-Shafer method.



The percentage error for each datum in each set was then calculated using the nearest-neighbour method; these errors became the parameters in the weight calculation using the modified Dempster-Shafer theory. From our results, the combination of multiple neural networks using the modified Dempster-Shafer theory gives better performance than the single output data. The improvement in prediction performance is indicated by the significant decrease in the sum of squared errors (SSE), the increase in the R value, and the small residuals compared with the single output data.

ACKNOWLEDGEMENTS

The authors would like to thank Universiti Sains Malaysia (USM) for their support through the Fellowship Grant, the Ministry of Science, Technology and Innovation for support through Grant No. 6013378, and Universiti Kuala Lumpur (UniKL) for their support.

REFERENCES

[1] Z. Ahmad, R. A. Mat Noor and J. Zhang, "Multiple neural networks modeling techniques in process control: a review", Asia Pac. J. Chem. Eng., Vol. 4, 2009, pp. 403-419.

[2] K. Sentz, "Combination of Evidence in Dempster-Shafer Theory", Sandia National Laboratories, April 2002.
[3] M. Morelli and A. J. DeSimone Jr., "Application of Dempster-Shafer Theory of Evidence to the Correlation Problem", ISIF, 2002.
[4] L. Hu, J. Gao, A. Wang and Y. Hu, "A Neural Network Shape Recognition System Based on D-S Theory", Proc. IEEE Intelligent Transportation Systems, Vol. 1, 2003, pp. 524-528.
[5] H. H. S. Ip and J. M. C. Ng, "Human Face Recognition Using Dempster-Shafer Theory", Proc. IEEE Int. Conf. Image Processing, Vol. 2, 1994, pp. 292-295.
[6] J. Zhang, "Developing Robust Non-linear Models Through Bootstrap Aggregated Neural Networks", Neurocomputing, Vol. 25, 1999, pp. 93-113.
[7] J. Zhang, "Developing Robust Neural Network Models by Using Both Dynamic and Static Process Operating Data", Ind. Eng. Chem. Res., Vol. 40, 2001, pp. 234-241.
[8] J. G. Skellam, "Studies in Statistical Ecology. I. Spatial Pattern", Biometrika, Vol. 39, 1952, pp. 346-362.

