SIMULATION OF PATTERN RECOGNITION SYSTEM VIA MODULAR NEURAL NETWORKS

VOLNÁ Eva, (CZ), KOTYRBA Martin, (CZ), KOCIAN Václav, (CZ), JANOŠEK Michal, (CZ)

Abstract. The purpose of the present paper is to suggest an approach to the utilization of mathematical models of a classification system for pattern recognition. Pattern recognition has a long history but has recently become much more widespread as the automated capture of signals and images has become cheaper. Many applications of neural networks concern classification and thus fall within the field of pattern recognition. This article describes a classification system for pattern recognition based on artificial neural networks with a modular architecture. A three-layer feedforward network model trained with the backpropagation algorithm is used for all experiments. Our experimental recognition objects were digits and their type fonts. We also outline further development of this topic in the conclusion.

Key words. Simulation, pattern recognition, modular neural networks, neuro-classifier.

Mathematics Subject Classification: Primary 82C32, 68T10; Secondary 68T05.

1 Classification via Neural Networks

Classification is one of the most active research and application areas of neural networks. The advantage of neural networks lies in the following theoretical aspects. First, neural networks are data-driven, self-adaptive methods: they can adjust themselves to the data without any explicit specification of a functional or distributional form for the underlying model. Second, they are universal function approximators: neural networks can approximate any function with arbitrary accuracy [10]. Since any classification procedure seeks a functional relationship between group membership and the attributes of the object, accurate identification of this underlying function is doubtlessly important. Third, neural networks are nonlinear models, which makes them flexible in modeling real-world complex relationships. Finally, neural networks are able to estimate the posterior probability, which provides the basis for establishing classification rules and performing statistical analysis [15].

1295

On the other hand, the effectiveness of neural network classification has been tested empirically. Neural networks have been successfully applied to a variety of real-world classification tasks in industry, business and science. Applications include bankruptcy prediction [7; 13], handwriting recognition [8], speech recognition [16], product inspection [3], fault detection [6], medical diagnosis [9], bond rating [11], etc. A number of performance comparisons between neural and conventional classifiers have been made in many studies, e.g. [1; 2; 4]. In addition, several computational experimental evaluations of neural networks for classification problems have been conducted under a variety of conditions, e.g. [12; 18]. This article describes a classification system for handwritten digit recognition based on artificial neural networks. We propose a study of a pattern recognition system using neural network technologies and outline further development of this topic.

2 Artificial neural networks

Figure 1. A simple artificial neuron [14].

An Artificial Neural Network (ANN) is a massively parallel system inspired by the human neural system. Its units, neurons (one is shown in Figure 1), are interconnected by connections called synapses. A neuron obtains input signals (x_1, ..., x_n) and the relevant connection weights (w_1, ..., w_n); optionally a value called bias b is added in order to shift the sum relative to the origin. The weighted sum of inputs is computed and the bias added, so that we obtain a value called the stimulus or inner potential z of the neuron. It is then transformed by an activation function f into the output value y, which may be propagated to other neurons as their input or be considered an output of the network. The i-th unit's state y_i is computed as shown in equations (1), where b_i is its bias and the activation function is a sigmoid. The purpose of the activation function is to perform a threshold operation on the potential of the neuron.

z_i = \sum_{j=1}^{n} w_{ij} x_j + b_i,   y_i = \frac{1}{1 + e^{-z_i}}   (1)

This article describes a classification system for pattern recognition based on an artificial neural network with a modular architecture. Multilayer feedforward neural networks belong to the most common ones in practical use. Their architecture contains units organized into three different types of layers. A subset of input units has no input connections from other units; their states are fixed by the problem. Another subset of units is designated as output units; their states are considered the result of the computation. Units that are neither input nor output are known as hidden units. The learning algorithm of such a neural network is called backpropagation and belongs to a group called "gradient descent methods". When looking at Figure 2, it is obvious that the initial position on the weight landscape greatly influences both the length and the path taken when seeking the global minimum.

Figure 2. An intuitive approach to the gradient descent method, looking for the global minimum [14]. The starting point is a), the final one is b).

Now, a more formal definition of the backpropagation algorithm (for a three-layer network) is presented [14].

1. The input vector is presented to the network.
2. The feedforward pass is performed, so that each neuron computes its output y_i following formula (1) over the neurons in the previous layer.
3. The error on the output layer is computed for each neuron using the desired output o_i of the same neuron, see formula (2).

err_i^o = y_i (1 - y_i)(o_i - y_i)   (2)

4. The error is propagated back to the hidden layer over all the hidden neurons h_i and the weights between each of them and all r neurons in the output layer, see formula (3).

err_i^h = h_i (1 - h_i) \sum_{j=1}^{r} err_j^o w_{ij}^o   (3)

5. Having the values err_j^o and err_i^h computed, the weights from the hidden to the output layer and from the input to the hidden layer can be adjusted following formulas (4), where α is the learning coefficient and x_i is the i-th neuron in the input layer.

w_{ij}^o(t+1) = w_{ij}^o(t) + α err_j^o h_i,   w_{ij}^h(t+1) = w_{ij}^h(t) + α err_j^h x_i   (4)

6. All the preceding steps are repeated until the total error of the network (5) over all training pairs falls under a certain level, where m is the number of units in the output layer.

E = \frac{1}{2} \sum_{i=1}^{m} (y_i - o_i)^2   (5)

The formulas in steps three and four are products of the derivation of the error function at each node. A detailed explanation of this derivation as well as of the complete algorithm can be found in [5].
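The six steps above can be sketched in pure Python. This is a toy illustration on XOR with hypothetical function names, not the authors' code; biases are updated alongside the weights, and the momentum term used later in the experiments is omitted:

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W_h, b_h, W_o, b_o):
    # Step 2: feedforward pass, applying formula (1) at every neuron.
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W_h, b_h)]
    y = [sigmoid(sum(w * hi for w, hi in zip(row, h)) + b)
         for row, b in zip(W_o, b_o)]
    return h, y

def train(samples, n_hidden=4, alpha=0.5, epochs=2000, seed=1):
    random.seed(seed)
    n_in, n_out = len(samples[0][0]), len(samples[0][1])
    W_h = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    W_o = [[random.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_out)]
    b_h, b_o = [0.0] * n_hidden, [0.0] * n_out
    for _ in range(epochs):
        for x, o in samples:                      # step 1: present the input
            h, y = forward(x, W_h, b_h, W_o, b_o)
            # Step 3: output-layer error, formula (2).
            err_o = [yi * (1 - yi) * (oi - yi) for yi, oi in zip(y, o)]
            # Step 4: hidden-layer error, formula (3).
            err_h = [hi * (1 - hi) * sum(err_o[j] * W_o[j][i] for j in range(n_out))
                     for i, hi in enumerate(h)]
            # Step 5: weight updates, formulas (4); biases get the same rule.
            for j in range(n_out):
                for i in range(n_hidden):
                    W_o[j][i] += alpha * err_o[j] * h[i]
                b_o[j] += alpha * err_o[j]
            for j in range(n_hidden):
                for i in range(n_in):
                    W_h[j][i] += alpha * err_h[j] * x[i]
                b_h[j] += alpha * err_h[j]
    return W_h, b_h, W_o, b_o

def total_error(samples, params):
    # Step 6 stopping criterion: total error (5) summed over all training pairs.
    W_h, b_h, W_o, b_o = params
    return 0.5 * sum((yi - oi) ** 2
                     for x, o in samples
                     for yi, oi in zip(forward(x, W_h, b_h, W_o, b_o)[1], o))

# Toy run on XOR: training should drive the total error (5) down.
xor = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [0])]
before = total_error(xor, train(xor, epochs=0))
after = total_error(xor, train(xor, epochs=2000))
print(before, after)
```

In a real run the loop in step 6 would compare `total_error` against a threshold after each epoch rather than running a fixed number of epochs.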

3 Modular neural networks

The modular network architecture has advantages in terms of learning speed [17]. Several characteristics of modular architectures suggest that they should learn faster than networks with complete sets of connections between adjacent layers. One such characteristic is that modular architectures can take advantage of function decomposition. If there is a natural way to decompose a complex function into a set of simpler functions, then a modular architecture should be able to learn the set of simpler functions faster than a monolithic multilayer network. In addition to their ability to take advantage of function decomposition, modular architectures can be designed to reduce the presence of conflicting training information that tends to retard learning. Such conflicts occur when the backpropagation algorithm is applied to a monolithic network containing a hidden unit that projects to two or more output units. Moreover, modular architectures generalize better because they perform local generalization, in the sense that each module learns patterns only from a limited region of the input space. Modular architectures also tend to develop representations that are more easily interpreted than the representations developed by single networks. As a result of learning, the hidden units used in separate networks for separate tasks contribute to the solutions of these tasks in more understandable ways than the hidden units of a single network applied to both tasks. In our experiments, we use modular neural networks that are derived from classical feedforward networks with some connections missing, which decomposes the network into modules. These modules usually share a single input layer, by which the entire input vector is presented to the "inner" modules, each of which is represented by its own units in the hidden and output layers.

4 Modular Neuroclassifier

In this article, a three-layer feedforward network model is used for all experiments and trained with the backpropagation algorithm. Our experimental recognition objects were digits and their type fonts. Digits were acquired as binary images. A digitized character image consists of pixels, usually black on a white background. The three-layer net was trained using 50 digits; that is, one sample of each class 0 through 9 was presented, followed by another set of 0–9, and so on. The images were uniformly divided into 9 x 7 pixel grids. The whole training set is shown in Figure 3. In our simulations the neural network performs a task that is decomposable into simpler tasks. The input layer is fully interconnected with the hidden layer. In the split model the pattern of connectivity between the hidden and output layers is restricted by partitioning the hidden nodes into two subsets (i.e. modules). Nodes in each of the subsets are connected only to the output nodes that are associated with one of the modules. The split model thus functions as two distinct systems, overlapping only at the input level. We have used a modular neural network architecture that represents both a shape and its font. The network architecture is the following: 63 – 40 – 15. Shape recognition is represented by one subsystem with architecture 63 – 20 – 10, while font recognition is represented by another subsystem with architecture 63 – 20 – 5. The input layer (containing 63 units) was fully interconnected with the hidden layer containing 40 units, which were divided into two modules (20 units associated with the first module and 20 units with the second module). Each unit in the hidden layer was interconnected only with the output units associated with its module (10 units associated with the first module and 5 units with the second module).
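The split connectivity between the hidden and output layers can be sketched as a 0/1 mask. This is a hypothetical illustration (the paper gives no code), using the 63 – 40 – 15 architecture described above:

```python
# A sketch of the split model: 63 inputs, 40 hidden units in two modules
# of 20, and 15 outputs (10 digit classes + 5 font classes).
def split_mask(n_hidden=40, n_out=15, hidden_split=20, out_split=10):
    """mask[j][i] == 1 iff hidden unit i is connected to output unit j."""
    mask = [[0] * n_hidden for _ in range(n_out)]
    for j in range(n_out):
        for i in range(n_hidden):
            # A connection exists only when both units belong to the same module.
            if (i < hidden_split) == (j < out_split):
                mask[j][i] = 1
    return mask

mask = split_mask()
# Digit outputs (rows 0-9) see only hidden units 0-19;
# font outputs (rows 10-14) see only hidden units 20-39.
assert sum(mask[0][:20]) == 20 and sum(mask[0][20:]) == 0
assert sum(mask[12][:20]) == 0 and sum(mask[12][20:]) == 20
```

During training, such a mask would simply zero out the forbidden weights and their updates, so the two modules share only the input layer, as described above.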

Figure 3. The training set.

Figure 4. Error function history (average values). The horizontal axis shows training cycles (0–500), the vertical axis the error (0–10).

The training algorithm was backpropagation with the following parameters: learning rate 0.4 and momentum 0.1. The history of the error function during the whole calculation is shown in Figure 4. An average value of the error function is shown, because the backpropagation adaptation was run 10 times, each over 500 cycles on average. After the training phase, another set of 50 digits was used to test the recognition rate. The recognition rate using neural network technology is acceptable, averaging about 90% for training and 88% for testing, i.e. 91% for testing digit recognition and 85% for testing font recognition. For details, see Table 1. Other numerical simulations give very similar results.

character:                    0    1    2    3    4    5    6    7    8    9
training:                    89%  91%  89%  88%  92%  89%  91%  92%  90%  89%
testing digit recognition:   91%  91%  92%  91%  92%  90%  91%  92%  90%  90%
testing font recognition:    86%  89%  76%  80%  89%  86%  87%  88%  86%  83%

Table 1. Detail of the recognition rate using neural network technology for training and testing.
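The per-class rates in Table 1 are consistent with the averages reported in the text; a quick check:

```python
# Per-class recognition rates copied from Table 1.
training      = [89, 91, 89, 88, 92, 89, 91, 92, 90, 89]
testing_digit = [91, 91, 92, 91, 92, 90, 91, 92, 90, 90]
testing_font  = [86, 89, 76, 80, 89, 86, 87, 88, 86, 83]

def avg(v):
    return sum(v) / len(v)

print(avg(training), avg(testing_digit), avg(testing_font))  # → 90.0 91.0 85.0
```

The mean of the two testing rates, (91 + 85) / 2 = 88, also matches the overall testing figure of 88%.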

5 Conclusions and future work

Handwritten digit recognition is an important task in automated document analysis. Different methods, including neural networks, statistical analysis, and structural or syntactic approaches, have been used to solve these problems. Applications have been developed to read postal addresses, bank checks, tax forms, and census forms, including reading aids for the visually impaired, among others. The handwritten digit recognition problem is a suitable task for exploring new approaches in 2-D pattern recognition classifiers because it is a complex problem but restricted to only ten classes. The use of artificial neural systems for various recognition tasks is well founded in the literature [3; 6; 7; 8; 9; 11; 13; 16], etc. The advantage of using neural nets is not that the solution they provide is especially elegant or even fast; it is that the system 'learns' its own algorithm for the classification task, and does so on actual sample data. In general, the performance of the neural network seems quite good. The errors made by the network are in a way understandable: the misclassified digits are often misshaped and carry features normally found in other digits. In this article we provided handwritten digit recognition for digits 0 to 9 and their fonts using a pattern recognition system with neural network technology. This article also demonstrated that the effectiveness of such a system is very good. Our future work should address signature verification, which is an important research topic in the area of biometric verification. In banks and public offices, signature verification remains one of the most acceptable means of verifying cheque/document legitimacy. Since manual verification of a large number of checks or documents is tedious and easily influenced by physical and psychological factors, automatic processing by computers is advocated.

Acknowledgement

The paper was supported by the University of Ostrava grant SGS/PřF/2011. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the sponsors.


References

[1] ABU-NIMEH, S., NAPPA, D., WANG, X., NAIR, S. A comparison of machine learning techniques for phishing detection. In Proceedings of the eCrime Researchers Summit, 2007.
[2] AL-ASSAF, Y., EL KADI, H. Fatigue life prediction of composite materials using polynomial classifiers and recurrent neural networks. Composite Structures, vol. 77, no. 4, pp. 561–569, 2007.
[3] CASASENT, D., CHEN, X. New training strategies for RBF neural networks for X-ray agricultural product inspection. Pattern Recognition, vol. 36, no. 2, pp. 535–547, 2003.
[4] CORREA, M., BIELZA, C., PAMIES-TEIXEIRA, J. Comparison of Bayesian networks and artificial neural networks for quality detection in a machining process. Expert Systems with Applications, vol. 36, no. 3, pp. 7270–7279, 2009.
[5] FAUSETT, L. V. Fundamentals of Neural Networks. Prentice-Hall, Englewood Cliffs, New Jersey, 1994.
[6] FEKIH, A., XU, H., CHOWDHURY, F. N. Neural networks based system identification techniques for model based fault detection of nonlinear systems. International Journal of Innovative Computing, Information and Control, vol. 3, no. 5, pp. 1073–1085, 2007.
[7] GHIASSI, M., SAIDANE, H., ZIMBRA, D. K. A dynamic artificial neural network model for forecasting time series events. International Journal of Forecasting, vol. 21, pp. 341–362, 2005.
[8] GRAVES, A., FERNANDEZ, S., LIWICKI, M., BUNKE, H., SCHMIDHUBER, J. Unconstrained online handwriting recognition with recurrent neural networks. In Platt, J., Koller, D., Singer, Y., Roweis, S. (eds.) Advances in Neural Information Processing Systems 20, 2008.
[9] HABAS, P. A., ZURADA, J. M., ELMAGHRABY, A. S., TOURASSI, G. D. Particle swarm optimization of neural network CAD systems with clinically relevant objectives. In Proceedings of Medical Imaging 2007: Computer-Aided Diagnosis, 2007.
[10] HORNIK, K. Approximation capabilities of multilayer feedforward networks. Neural Networks, vol. 4, pp. 251–257, 1991.
[11] HUANG, Z., CHEN, H., HSU, C.-J., CHEN, W.-H., WU, S. Credit rating analysis with support vector machines and neural networks: A market comparative study. Decision Support Systems, vol. 37, pp. 543–558, 2004.
[12] OU, G., MURPHEY, Y. L., FELDKAMP, L. Multiclass pattern classification using neural networks. In Proceedings of the International Conference on Pattern Recognition (ICPR 2004), vol. IV, pp. 585–588, 2004.
[13] PENDHARKAR, P. C. A threshold-varying artificial neural network approach for classification and its application to bankruptcy prediction problem. Computers and Operations Research, vol. 32, no. 10, pp. 2561–2582, 2005.
[14] RAMIK, D. Modular neural networks. Bachelor thesis, University of Ostrava, 2006.
[15] RICHARD, M. D., LIPPMANN, R. Neural network classifiers estimate Bayesian a posteriori probabilities. Neural Computation, vol. 3, pp. 461–483, 1991.
[16] SKOWRONSKI, M. D., HARRIS, J. G. Automatic speech recognition using a predictive echo state network classifier. Neural Networks, vol. 20, pp. 414–423, 2007.
[17] VOLNA, E. Emergence of modularity in evolved neural networks. Neural Network World, vol. 13, no. 6, pp. 617–628, 2003.
[18] ZHANG, G. P. Neural networks for classification: A survey. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 30, no. 4, pp. 451–462, 2000.


Current address

Eva Volná, doc. RNDr. PaedDr. PhD.
University of Ostrava, Faculty of Science
Department of Informatics and Computers
30th dubna st. 22, 70103 Ostrava, Czech Republic
email: [email protected]

Martin Kotyrba, Mgr.
University of Ostrava, Faculty of Science
Department of Informatics and Computers
30th dubna st. 22, 70103 Ostrava, Czech Republic
email: [email protected]

Václav Kocian, Mgr.
University of Ostrava, Faculty of Science
Department of Informatics and Computers
30th dubna st. 22, 70103 Ostrava, Czech Republic
email: [email protected]

Michal Janošek, Mgr.
University of Ostrava, Faculty of Science
Department of Informatics and Computers
30th dubna st. 22, 70103 Ostrava, Czech Republic
email: [email protected]
