Investigating neural network efficiency and structure by weight investigation

Dr. Martin Lefley, Tom Kinsella
School of DEC, Bournemouth University, Poole, Dorset BH12 5BB
Phone: +44 1202 595572, Fax: +44 1202 595314
Email: [email protected]

ABSTRACT: This research investigates the analysis and efficiency of neural networks, using a technique for network link pruning. The technique is tested with inefficient architectures for the XOR problem and then for a network from a real world, complex, image recognition task. By removing each link and examining the effect upon the error level, a fuzzy set is developed with membership indicating link saliency. As well as improving efficiency, the technique is useful for investigating solution architecture. It is hypothesised that similar insights may be gained for any problem solved by a similar architecture. This paper begins with the background, research and possible applications. Experimental design, implementation, methodology and results are given. The conclusion considers implications and suggests further research. Results indicate that this technique can significantly improve the efficiency of a neural network for a real application: both memory requirements and execution speeds improve by nearly 30 times. Further development is hoped to deliver improvements in efficiency and depth of investigation.

KEYWORDS: Image processing; Neural networks; Pruning; Skeletonising; Face recognition

1. BACKGROUND RESEARCH

There are few known practical design steps for the architecture of a neural net modelling a complex problem space. Huang and Huang (1991) consider theoretical methods to assess bounds on the number of hidden neurons; however, the solution is itself too theoretical for practical application. Others suggest the use of principal components to determine the required number of hidden neurons, but this is only a heuristic. Another approach is to use a fully connected net with what is thought to be a sufficiently large size: if a net over-generalises then it may be increased in size; otherwise, if it is not generalising, it can be reduced in size.

A number of researchers have already explored and reported on pruning, mostly to improve network efficiency, though there are many variants in approach, evaluation and application domain. Reed (1993) mathematically analysed a number of pruning techniques such as sensitivity analysis, second-derivative behaviour and genetic algorithms; he notes that sensitivity analysis misses redundant group behaviour. Giles & Omlin (1994) show pruning can improve the efficiency of a recurrent net trained to recognise grammars in text strings. Jasic & Poh (1995) consider the removal of correlated nodes from oversized networks, showing good results for simulated and real data applications. Gorodkin et al. (1997) used numerical methods to validate the theoretically derived salience measure of Le Cun et al.'s (1989) "optimal brain damage" method, based on weight variance.

1.1 RESEARCH INTO THE EXPERIMENT

Two examples are used for the research described here: the XOR problem, and locating facial images in an image database. The XOR problem can be viewed as one of the simplest examples of a non-linear problem for its size (note also that a fine grained n x n bit parity non-linear problem would be too detailed for human vision). Such simplicity makes it suitable for the illustration of concepts and for testing the approach before using it with a real application.

ESIT 2000, 14-15 September 2000, Aachen, Germany


Facial recognition is a large, complex data analysis problem that is known to be a major task of the human brain, one it carries out with remarkable competence. One of the first applications to demonstrate the power of neural image interpretation was that of Golomb et al. (1991), who demonstrated SEXNET, a system to identify the sex of a human subject. The problem of face recognition as an example of a complex vision task has been considered by many researchers. Manjuith (1992) uses a feature based approach and Brunelli & Poggio (1992) compare features and templates. Kirby et al. (1990) use the Karhunen-Loeve transform and Turk & Pentland (1991) use eigenfaces; Payne et al. (1992) and Micheli-Tzanakou et al. (1995) use backpropagation neural networks. Lawrence et al. (1995) get very good results from a hybrid approach. Kopecz (1995) describes a semi-automatic access control system which uses a computerised face recogniser.

In order to test the ability of the system to generalise, a different task was used here, based on research into facial location, an important pre-task for a face recognition system. The requirement is for the neural network to identify whether an image contains a general face, having been trained on one set of faces and tested on a different set. Generally such a system could locate any image feature, enabling feature search of an image database, an important application in information retrieval. Based on these design decisions, the following architectures were tested:

XOR with one extraneous input node, with value unchanging
XOR with one extraneous input node, with value uncorrelated with the output
XOR with 5 hidden nodes
Image general face/not face decision system

1.2 BIOLOGICAL RELEVANCE

Various experiments such as those of Blakemore & Cooper (1970) show that development of the brain depends on the environment. Hubel and Wiesel (1962) reported important information found by investigating the receptive fields in the cat's visual cortex. From this they deduced models of binocular interaction and functional architecture. The research described here attempts to adapt the architecture of a neural network to the data it is trained with.

2. METHOD

To begin with, the network is trained with a set of examples until a given minimum error threshold is reached. Then each weight is systematically removed from the neural network, all of the training examples are presented to the network and forward propagated, and the error level is calculated. In all the examples here the error function was taken as the square of the maximum error.

Membership of the fuzzy set of weight importance = error function with weight removed

For each link
    Record link weight
    Remove link
    For each example
        Forward propagate
        Calculate error
    Next example
    If error < Threshold Then remove link Else restore weight
Next link
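As an illustration only (not the authors' PowerBasic implementation), the pruning loop above might be sketched in Python. The stand-in "network" here is a single linear threshold unit whose weights play the role of links; all function names are hypothetical:

```python
# Minimal sketch of the pruning loop, assuming a toy network whose
# "links" are the weights of one linear threshold unit.

def forward(weights, x):
    """Stand-in for forward propagation through the trained net."""
    s = sum(w * xi for w, xi in zip(weights, x))
    return 1.0 if s > 0.5 else 0.0

def error_fn(weights, examples):
    """The paper's error function: the square of the maximum error."""
    worst = max(abs(forward(weights, x) - target) for x, target in examples)
    return worst ** 2

def prune_links(weights, examples, threshold=0.01):
    """Remove each link in turn; keep the removal only if error stays low."""
    weights = list(weights)
    for i in range(len(weights)):
        saved = weights[i]      # record link weight
        weights[i] = 0.0        # remove link
        if error_fn(weights, examples) >= threshold:
            weights[i] = saved  # restore weight
    return weights

# The second input is uncorrelated with the target, so its link is pruned
# while the first, salient, link is restored after trial removal.
examples = [((1, 0), 1.0), ((0, 0), 0.0), ((1, 1), 1.0), ((0, 1), 0.0)]
pruned = prune_links([1.0, 0.2], examples)
```

The error value recorded for each trial removal is what gives the fuzzy membership of link saliency described above.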

2.1 COLLECTING THE IMAGE DATABASE

An opportunity sample in Bournemouth University provided a useful image sample database. There is a wide range of adult age groups for about half the sample. The other half represents a small cross section, that of male and female subjects between the ages of 17 and 21. This narrow cross section enables tests of face image verification to be made between similar facial types, based on age group. Each subject was briefed and assured that their image would not be used for anything other than experimental purposes. They were then asked to move towards and stand in front of a standard monochrome CCTV camera. Thus the faces were positioned by the subjects, without any verbal or visual feedback. In collecting data for the image database, a budget camera was used to minimise the potential target system price. For maximum flexibility, realistic quality images were used, including noise such as a potted plant in the background and people walking past. This makes a more realistic but more difficult system than others which use single located faces, such as those used by Giles (1995) or Javidi (1995). Experiments with different training algorithms led to the selection of a standard back propagation network (McClelland & Rumelhart, 1986). Termination was set when none of the outputs were more than 0.01 from their correct output. Scans were made on a 25 x 36 pixel region; this was a large box for the neural network, of similar average dimensional ratio to that of the average face in the sample. A net was trained and tested using images surrounding the face and other regions. This gave some excellent results for the test data and gave a very high output for all of the training data that came from an enclosed face. For more information see Lefley (1997).

2.2 IMPLEMENTATION

All software was written in PowerBasic, a compiled Basic, with a Visual Basic front end. Whilst faster implementations are possible, the high development speed and reasonable execution speed suggested this combination. Native code or more powerful code compilers could increase the speed, but general experiments have shown that this is unlikely to be greater than a doubling of performance. Code optimisation may increase this further, depending on the requirements for flexibility.

2.3 ACKNOWLEDGEMENTS

PowerBasic is a registered trademark of PowerBasic Inc. Visual Basic is a registered trademark of the Microsoft Corporation.

3. RESULTS

Two sets of results are presented here: one for a problem with an absolutely known correct answer, and then one for a problem which is not well known or thoroughly explored. For the well known problem, XOR was used. This is a very difficult problem for its small size and is well known as the classic linearly inseparable problem, used by Minsky to describe an important limitation of the Perceptron neural architecture. The unknown problem is that of generic face recognition: identifying that image data is of some human face rather than any other real, possible image data. All of the processing was undertaken with a modestly powerful computer, an IBM 266MHz PC with 128 Mbyte memory. All timings are approximate. Each net was trained using the standard back propagation rule and then the pruning was applied. Various strategies were applied, the most successful being to prune the net harshly (allowing reasonably large changes in error), retrain, then prune again and train again. With this strategy, no further reductions could be made using subsequent pruning. No strategy was found to be universally better than the others, and the reasonably large dimensionality of the problem and solution spaces means that it is difficult to conduct exhaustive trials.
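The harsh-prune-then-retrain strategy amounts to alternating training and pruning until a pruning pass removes nothing further. A hedged sketch, where `train_fn` and `prune_fn` are hypothetical stand-ins for back propagation training and the weight-removal pass of Section 2:

```python
# Sketch of the prune/retrain cycle; the link list shrinks each pass
# until pruning can make no further reductions.

def prune_retrain(links, train_fn, prune_fn):
    while True:
        links = train_fn(links)
        pruned = prune_fn(links)
        if len(pruned) == len(links):  # no further reductions possible
            return pruned
        links = pruned

# Toy stand-ins: training is a no-op; pruning drops near-zero weights.
train_fn = lambda ws: ws
prune_fn = lambda ws: [w for w in ws if abs(w) > 0.1]
result = prune_retrain([0.5, 0.01, -0.8, 0.05], train_fn, prune_fn)
```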


3.1 XOR NETS The nets were trained and pruned to have a maximum error of 0.01 for any of the four logical examples.

Figure 1: The XOR net with extraneous input node. Thin lines represent all links and nodes; thick lines and nodes represent those left after pruning. Links from the extraneous input node, whose data is uncorrelated with the output data, are removed. This node may now also be removed. The bias links marked * are also removed as part of the efficiency savings. The net is reduced from 11 links to 7, 63% of its original size. The net with extra hidden nodes was also efficiently reduced, not to two hidden nodes but to three. This may be interpreted as "if either is 1 but both are not 1". The net was reduced from 21 to 8 links, a reduction of 62%. With this architecture, the fully connected net with 2 hidden nodes would have 9 links, so this solution is more efficient than the standard smaller model from which it was extended.

Figure 2: The XOR net with extraneous hidden nodes. Thin lines represent all links; thick lines represent those left after pruning. Links from the extraneous input node, whose data is uncorrelated with the output data, are removed. The final training cycle for this net took some 5 million further epochs to train to the required error rate, indicating a flat error surface, though a further experiment showed this to have occurred by chance.


3.2 FACE NETWORK

The 192 examples of face and non-face data were reduced to a set of 192 images, each 30x30 pixels. The network architecture was chosen to have 900 input nodes plus one bias node, 20 hidden nodes and 1 output node. The learning data set comprised 12 face and 180 non-face examples. Training took approximately 120 hours on a Pentium 200MHz computer; the first pruning reduced the net to 10% of its size and also took about 120 hours to complete. After two train-prune cycles the net was reduced from 18062 to 635 links, a reduction to 3.5% of its original size, or about 30 times. Thus this part of the net is stored in 30 times less memory and runs 30 times faster, at 500 decisions per second on a 500MHz Pentium computer. Surprisingly, examination of the network architecture showed that the net had been reduced to a single hidden node.
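The reduction figures follow directly from the link counts quoted above: 635 remaining links out of 18062 is about 3.5% of the original, a factor of roughly 28, i.e. "nearly 30 times":

```python
# Reduction figures derived from the link counts given in the text.
links_before, links_after = 18062, 635

percent_of_original = 100 * links_after / links_before  # about 3.5%
reduction_factor = links_before / links_after           # about 28
```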

                           Examples   Total correct      Percentage error
                                      Full     Pruned    Full     Pruned
Face                             65     43         54      33         17
Not face                        983    957        919       3          7
Total                          1048   1000        973       5          7

Table 1. Results after pruning the face recognition network

                           Total correct        Percentage correct
                           face   non-face      face   non-face
Maximising type 1 errors     54        917        83         93
Maximising type 2 errors     43        959        66         92
Total examples               65        983

Table 2. Results of generalising the outputs for the face recognition network before and after pruning

Despite reducing the network from 20 hidden nodes to 1, the performance is not significantly impaired. This is, of course, a result of the sensitivity analysis technique used. Interestingly, the percentage of type 1 errors was almost halved, but the percentage of type 2 errors doubled. These errors can be traded against each other to suit the application requirements by adjusting the classification threshold. Given that the net efficiency has increased by a factor of almost 20, the change in error is very small. The insight into the nature of the problem, and the way the net has chosen to deal with it, is very useful. The solution to a real, complex image processing task has been reduced to a single, linear, vector sum and threshold decision. Such information is very useful in determining how to proceed in developing a more powerful net architecture, and efficient increases in performance may hypothetically be made without reaching the size (and inefficiency) of the original network.
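Since a single hidden node survives pruning, the net's face/non-face decision is equivalent to one weighted sum of pixel values compared against a threshold. A minimal sketch, with illustrative weights rather than the trained values:

```python
def face_decision(pixels, weights, bias, threshold=0.5):
    """A single linear vector sum and threshold decision."""
    activation = sum(w * p for w, p in zip(weights, pixels)) + bias
    return activation > threshold

# Raising or lowering the threshold trades type 1 against type 2 errors,
# as described in the discussion of Table 2.
verdict = face_decision([1.0, 1.0], [0.4, 0.3], 0.0)
```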

4. CONCLUSIONS

Significant efficiencies may be gained in back propagation networks of architectures similar to those used here, and very useful information on the problem structure and the way the network has learnt can be gained. Such information is complex to illustrate graphically, as for any net based system, but the structure of the necessary image task may be gleaned from the remaining weights. The structure revealed may be used to make more informed decisions about how to proceed with a practical artificial neural network project.

4.1 CRITICISMS

Learning and pruning times are long, but there is potential to speed them up using dedicated hardware. Further reductions using massive parallelism could be made, but such a system would be expensive. Long term batch processing can be used for these stages using a relatively cheap computer, and the architectures and algorithms can be realised using a parallel hardware, multi-computer and/or multiprocessor approach. This method may interpret the data in a different way than might be ideal, or than a human expert might develop. The system detailed here has only been shown to work for a single camera, indoors, with a full face, with vague positioning. This would make it useful, for example, for checking passport photographs, credit transactions or controlling site entry. It has the potential to be extended and tested for various other applications.

4.2 WAY FORWARD

Better visualisation tools are required, and more experiments could be performed on data with known defects or correlations. A summary tool for node connections may provide a more useful analysis tool, though this was not found to be necessary here. The search surface is assumed to be flat; techniques may be developed to consider more efficient strategies.

5. REFERENCES

Blakemore, C. & Cooper, G. F., 1970, "Development of the brain depends on the visual environment", Nature, 228, 477-478.
Brunelli, R. & Poggio, T., 1992, "Face recognition: Features vs. templates", Technical report TR 9110-04, Istituto per la Ricerca Scientifica e Tecnologica.
Giles, C. L. & Omlin, C. W., 1994, "Pruning recurrent neural networks for improved generalization performance", IEEE Transactions on Neural Networks, Vol. 5, No. 5, p848-851.
Golomb, B. A., Lawrence, D. T., 1991, "SEXNET: A neural network identifies sex from human faces", in D. S. Touretzky & R. Lippman (Eds.), Advances in Neural Information Processing Systems 3, Morgan Kaufmann, San Mateo.
Gorodkin, J., Hansen, L. K., Lautrup, B. & Solla, S. A., 1997, "Universal distribution of saliences for pruning in layered neural networks", International Journal of Neural Networks, Vol. 8, No. 5&6.
Huang, S., Huang, Y., 1991, "Bounds on the number of hidden neurons", IEEE Transactions on Neural Networks, Vol. 2, Part 1, p47-55.
Hubel, D. H., Wiesel, T. N., 1962, "Receptive fields, binocular interaction and functional architecture in the cat's visual cortex", Journal of Physiology, 166, 106-154.
Jasic, T. & Poh, H. L., 1995, "Analysis of pruning in back propagation networks for artificial and real world mapping problems", Proceedings of the International Workshop on ANNs: From Natural to Artificial Intelligence, Spain.
Javidi, B., Li, J., Tang, Q., 1995, "Optical implementation of neural networks for face recognition by the use of non-linear joint transform operators", Applied Optics, Vol. 34, No. 20.
Kirby, M., Sirovich, L., 1990, "Application of the Karhunen-Loeve procedure for the characterization of human faces", IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(1), 103-108.
Kopecz, J., Konen, W., Schulze-Kruger, E., 1995, "Access control with face recognition", Zentrum fur Neuroinformatik, Bochum, Germany.
Lawrence, S., Giles, C. L., Tsoi, A. C., Back, A. D., 1995, "Face recognition: A hybrid neural network approach", Technical report UMIACS-TR-96-16, Institute for Advanced Computer Studies, Maryland, USA.
Le Cun, Y., Denker, J. S., Solla, S. A., 1989, "Optimal brain damage", Advances in Neural Information Processing Systems 2, Ed. Touretzky, p598-605, Denver.
Lefley, M., 1997, "Neural facial feature recognition", Proceedings of Neuro-Fuzzy Conference, Paderborn, Germany.
Manjuith, B. S., Chellappa, R., von der Malsburg, C., 1992, "A feature based approach to face recognition", Proceedings of Computer Vision and Pattern Recognition Conference, Maryland, USA.
Marr, D., 1982, "Vision", San Francisco, W. H. Freeman & Co.
Micheli-Tzanakou, Uyeda, Ray, R., Sharma, A., Ramanujan, R., Dong, J., 1995, "Comparison of neural network algorithms for face recognition", Simulation, Vol. 65, Part 1, p37-51.
Payne, T. L., Solheim, I., Castain, R., 1992, "Investigating facial recognition systems using backpropagation neural networks", Proceedings of 3rd Workshop on Neural Networks, Alabama.
Reed, R., 1993, "Pruning algorithms - a survey", IEEE Transactions on Neural Networks, Vol. 4, No. 5, p740-747.
Turk, M., Pentland, A., 1991, "Face recognition using eigenfaces", IEEE Proceedings of Computer Vision and Pattern Recognition, p586-591, Hawaii.
