Using Genetic Engineering to Find Modular Structures for Architectures of Artificial Neural Networks

Christoph M. Friedrich
University of Witten/Herdecke, Institute for Technology Development and Systems Analysis
Alfred-Herrhausen Str. 50, 58448 Witten, Germany
E-mail: [email protected]; URL: http://www.tussy.uni-wh.de/~chris

Appeared in: Proceedings of the 3rd International Conference on Artificial Neural Networks and Genetic Algorithms '97 (ICANNGA '97), pages 375-379, G. D. Smith (ed.), Norwich, UK, April 1997, Springer.
Abstract
Starting with an Evolutionary Algorithm to optimize the architecture of an Artificial Neural Network (ANN), it will be shown that it is possible, with the help of a graph-database and Genetic Engineering, to find modular structures for these networks. A new graph-rewriting method is used to construct families of architectures from these modular structures. Simulation results for two problems are given. This technique can be useful as an alternative to Automatically Defined Functions for computationally intensive structure optimization problems where modularity is needed.
1 Introduction
One of the major problems in using ANNs is the design of their architecture. The architecture of an ANN greatly influences its performance. If the architecture is too small, the net is not able to learn the desired input/output mapping; if it is too large, the net generalizes poorly on unseen data. Besides constructive and pruning techniques, Evolutionary Algorithms have been suggested by many researchers to find good architectures for ANNs. Much work has been done in this area; for a survey of recent work, the paper of Branke [2] is recommended. Most of the difficulties in this field arise from choosing the right representation to encode a network graph for the Evolutionary Algorithm. A second problem is the scalability of the chosen encoding technique; the main interest of this work is therefore modularity, in order to obtain a scalable method for this problem.
2 Cellular Encoding

A method to optimize the architecture and weights of boolean neural networks, where the weights are restricted to the values -1 and 1, was suggested by F. Gruau [3], who named it Cellular Encoding. In this encoding technique, the information needed to develop an architecture is obtained by interpreting a grammar-tree. The nodes of this tree encode graph rewriting operations. The
development of a neural network starts with a graph consisting of one node (cell) that has ingoing connections from the input area and outgoing connections to the output area. Every cell has a reading head pointing to one node of the grammar-tree (the cellular code). The nodes of the cellular code contain symbols defining the graph rewriting operations. The technique is comparable to a Turing machine: instead of writing to a tape, in Cellular Encoding the cells are changed. The operation #par:, for example, symbolizes a parallel cell division; both cells inherit the input and output connections from the mother cell. The reading heads of the newly created cells are moved to the left and right subtrees of the grammar-tree. For biological plausibility, the development of the cells should proceed in parallel, which is achieved by using a FIFO queue. The development of a network ends when all cells have evaluated the #end: operation, located as a terminal at the leaves of the grammar-tree. Since the cellular code is given as a grammar-tree, it is possible to use Genetic Programming to optimize it. Genetic Programming is a Genetic Algorithm that uses trees instead of bit strings as representation. The recombination operator, crossover, is realized by exchanging subtrees of two parent individuals; the mutation operator introduces variation into the genome and is realized by the inclusion of randomly initialized trees. Gruau has proved several properties of his encoding technique, among them completeness, compactness, closure, modularity and scalability. He obtained these properties by using a recursion operator that makes it possible to repeat parts of the cellular code.
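The following minimal sketch (in Python) illustrates this development process: every cell keeps a reading head into the grammar-tree, cells are processed through a FIFO queue so that development proceeds in parallel, and #par: duplicates a cell while #end: finalizes it. The data structures and names are chosen here for illustration only; this is not Gruau's or the author's implementation, and only two of the operators are shown.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class GrammarNode:              # one node of the cellular code (grammar-tree)
    symbol: str                 # e.g. "#par:", "#end:"
    children: list = field(default_factory=list)

@dataclass
class Cell:                     # a developing neuron
    inputs: list                # nodes this cell receives connections from
    outputs: list               # nodes this cell sends connections to
    head: GrammarNode           # reading head into the grammar-tree

def develop(code: GrammarNode, n_inputs: int, n_outputs: int):
    """Develop a network from a cellular code. Only #par: (parallel
    division) and #end: are handled; the remaining operators (#seq:,
    #seqcopy:, #usplit:, #lsplit:) would rewrite connections analogously."""
    start = Cell(inputs=[f"in{i}" for i in range(n_inputs)],
                 outputs=[f"out{o}" for o in range(n_outputs)],
                 head=code)
    queue = deque([start])      # FIFO queue: cells develop "in parallel"
    finished = []
    while queue:
        cell = queue.popleft()
        op = cell.head.symbol
        if op == "#end:":       # terminal: the cell becomes a fixed neuron
            finished.append(cell)
        elif op == "#par:":     # parallel division: both cells inherit all connections
            left  = Cell(list(cell.inputs), list(cell.outputs), cell.head.children[0])
            right = Cell(list(cell.inputs), list(cell.outputs), cell.head.children[1])
            queue.extend([left, right])
        else:
            raise NotImplementedError(f"operator {op} not shown in this sketch")
    return finished

# Example: #par: with two #end: leaves yields two parallel hidden cells.
code = GrammarNode("#par:", [GrammarNode("#end:"), GrammarNode("#end:")])
print(len(develop(code, n_inputs=3, n_outputs=2)))   # -> 2
```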
3 Modified Cellular Encoding
Figure 1: The rewriting operations used for the modified Cellular Encoding

For the optimization of the architecture of ANNs with real-valued weights, the Cellular Encoding method was modified. Some new development operators were designed and tested. The operator set {#par:, #seq:, #seqcopy:, #usplit:, #lsplit:, #end:} was used to develop the architectures. Figure 1 shows the effects of rewriting the hidden node of a network with these development operators. One required property of the created networks is correctness: a created graph is a correct architecture for a feedforward neural network if it contains only feedforward connections and no isolated nodes, i.e. every hidden node lies on a path from an input node to an output node. All operators were designed to preserve this property, so only correct network graphs can be created. In contrast to Gruau's work, the operator #seqcopy: makes it possible to create networks with shortcut connections.

The second modification to the evolutionary development cycle was the use of a graph-database. This database contains the cellular code, the network graphs and fitness parameters such as the number of epochs needed for learning, the classification errors on the learning, validation and test sets, and other parameters of the networks. Figure 2 shows the working structure of the evolutionary cycle. Before the fitness calculation of a developed network, it is checked whether the fitness of this network has already been calculated five times. If not, the fitness is calculated again and saved in the database. The resulting fitness of a network is the mean of all fitness evaluations of this network. The number five was chosen as a compromise between the possibility of statistical evaluation and the necessary computing time; it would be better to use the fitness distribution to determine this number. Averaging over several evaluations minimizes the fitness distortion resulting from the random initialization of the weights and increases the validity and robustness of the obtained results. Other authors report only solutions that result from a single fitness evaluation; in this work, results are given only for architectures tested five times. Another advantage of the database is that computationally expensive fitness evaluations can be avoided if the network has already been tested five times.

Figure 2: Modified Cellular Encoding using a graph-database
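As a rough sketch of this caching scheme (the at-most-five-evaluations rule and the averaging follow the description above; the database interface and all names are invented for illustration), the fitness lookup could work as follows:

```python
import random

MAX_EVALS = 5                     # compromise between statistics and computing time
db = {}                           # graph-identical network key -> list of fitness values

def evaluate_network(graph_key, train_and_test):
    """Return the mean fitness of a network; a new (randomly initialized)
    training run is performed only while fewer than MAX_EVALS evaluations
    of this network are stored in the database."""
    evals = db.setdefault(graph_key, [])
    if len(evals) < MAX_EVALS:
        evals.append(train_and_test(graph_key, seed=random.randrange(2**31)))
    return sum(evals) / len(evals)

# toy usage with a stand-in for the network simulator
fake_train = lambda graph, seed: random.Random(seed).uniform(0.2, 0.3)
print(evaluate_network("3-1-2-net", fake_train))
```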
3.1 Genetic Engineering

Optimizing the architecture of an ANN using Evolutionary Algorithms leads to a well-performing architecture for a given problem, but it gives no insight into how good architectures are built. It would be much more interesting to find building principles for good architectures and to use them to create modular architectures. In his work on Genetic Programming, Koza [4] suggested Automatically Defined Functions as a possible solution to this problem. Unfortunately, for the task of architecture optimization the fitness evaluation is very computationally intensive, so it is not possible to use large population sizes and many generations. Especially for Genetic Programming, where the representation of a problem is a grammar-tree,
Altenberg [1] suggested a method to find good modular structures that are subtrees of the representation. He called this technique Genetic Engineering. It relies on the assumption that some parts of the genetic program have a higher impact on the overall fitness than others. Attaching a fitness value to sub-programs is a problem, because only the fitness of the complete phenotype can be measured. In this work, the fitness of modular structures of the genetic programs was determined by analyzing the cellular codes in the graph-database that was built during the optimization process described above. This makes it possible to use information from many optimization runs that started with different populations. The fitness of a subtree is taken to be the frequency of this subtree in the database. The modular structures found in this way are called modules and are written as the preorder traversal of the subtree.
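A minimal sketch of this frequency analysis is given below, assuming the database stores each evaluated cellular code as a nested tuple; the helper names are invented for illustration. Every subtree of every stored grammar-tree is counted, and the most frequent subtrees are reported as modules in preorder notation.

```python
from collections import Counter

def subtrees(tree):
    """Yield every subtree of a grammar-tree given as (symbol, child, child, ...)."""
    yield tree
    for child in tree[1:]:
        yield from subtrees(child)

def preorder(tree):
    """Module notation: preorder traversal without the #end: terminals,
    e.g. '#usplit:#par:#lsplit:'."""
    return tree[0] + "".join(preorder(c) for c in tree[1:] if c[0] != "#end:")

def mine_modules(cellular_codes, top=5):
    """Count how often each subtree occurs over all cellular codes stored in
    the graph-database; this frequency serves as the module 'fitness'."""
    counts = Counter()
    for code in cellular_codes:
        for sub in subtrees(code):
            if len(sub) > 1:                  # skip bare terminals
                counts[preorder(sub)] += 1
    return counts.most_common(top)

# toy database containing two stored cellular codes
codes = [
    ("#usplit:", ("#par:", ("#end:",), ("#end:",)), ("#lsplit:", ("#end:",), ("#end:",))),
    ("#seq:", ("#usplit:", ("#par:", ("#end:",), ("#end:",)), ("#end:",)), ("#end:",)),
]
print(mine_modules(codes))
```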
3.2 Graph-Rewriting

Having found good modular structures for neural network architectures, it is interesting to see how architectures built from these modules perform. One possibility is to include these subtrees as encapsulated functions in the next evolutionary optimization run. In contrast to this, in this work a special graph-rewriting method was used. The (i, k)-rewriting of a network graph with a module starts with a fully connected n-i-m network, where n and m are determined by the problem. Then all hidden nodes are rewritten k times with the rewriting operations defined by the module. For k = 0 this results in standard architectures with one hidden layer. With this method it is possible to create families of architectures consisting of well-performing modules. Figure 3 shows an example of a network created by (1, 1)-rewriting of a 3-1-2 network with the module #usplit:#par:#lsplit:. The hidden node is first rewritten with the #usplit: operation, and the resulting nodes are then rewritten with the operations #par: and #lsplit:.
Figure 3: Resulting network of (1, 1)-rewriting of a 3-1-2 network with the module #usplit:#par:#lsplit:
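The control structure of (i, k)-rewriting can be sketched as follows. The graph representation, the helper names and the toy operator are assumptions made here for illustration; the actual semantics of the rewriting operators are defined graphically in Figure 1 and are not reproduced, so only #par: and #end: are implemented in the sketch.

```python
def fully_connected(n, i, m):
    """A standard n-i-m feedforward net as {node: set of successor nodes}."""
    ins  = [f"in{a}"  for a in range(n)]
    hid  = [f"h{a}"   for a in range(i)]
    outs = [f"out{a}" for a in range(m)]
    g = {u: set(hid) for u in ins}
    g.update({u: set(outs) for u in hid})
    g.update({u: set() for u in outs})
    return g

def hidden_nodes(g):
    return [u for u in g if u.startswith("h")]

def apply_module(g, node, module, apply_operator):
    """Apply one module (an operator tree as nested tuples) to one hidden node."""
    op, *children = module
    new_nodes = apply_operator(g, node, op)        # semantics as in Figure 1
    for child, new in zip(children, new_nodes):
        apply_module(g, new, child, apply_operator)

def rewrite(n, i, m, module, k, apply_operator):
    """(i, k)-rewriting: start from a fully connected n-i-m network and
    rewrite every hidden node k times with the module."""
    g = fully_connected(n, i, m)
    for _ in range(k):
        for node in list(hidden_nodes(g)):
            apply_module(g, node, module, apply_operator)
    return g

_new_id = [0]
def toy_operator(g, node, op):
    """Only #par: and #end: are implemented; #seq:, #seqcopy:, #usplit:
    and #lsplit: would rewrite the connections according to Figure 1."""
    if op == "#end:":
        return []                                  # terminal: node left unchanged
    if op != "#par:":
        raise NotImplementedError(op)
    _new_id[0] += 1
    new = f"h_new{_new_id[0]}"
    g[new] = set(g[node])                          # inherit all output connections
    for u in list(g):                              # inherit all input connections
        if node in g[u]:
            g[u].add(new)
    return [node, new]

g = rewrite(3, 1, 2, ("#par:", ("#end:",), ("#end:",)), k=1, apply_operator=toy_operator)
print(sorted(hidden_nodes(g)))                     # -> ['h0', 'h_new1']
```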
4 Experimental Results

The data for the first experiment are taken from the benchmark suite of Prechelt [7]. The task of the diabetes problem is to classify from diagnostic data (e.g. blood pressure, result of a glucose tolerance test) whether a female Pima Indian was diabetes positive or not. The patterns were divided into three datasets of sizes (576/96/96). There are 8 input parameters and 2 output classes; the classification method is WTA (winner takes all). The problem is difficult because some data is not available for all patterns and is therefore set to zero. Prechelt and Michie [6] report networks with a classification error of 24.8 % on the test set. The activation function of the network was tanh(x). The learning set was presented up to 500 times with RPROP as the learning algorithm, but cross-validation usually stopped the run earlier. The fitness function for this problem was a linear combination of the classification error on the test set (factor 1) and on the learning set (factor 0.3). Several optimization runs were made for this problem. About 100,000 architectures were tested; this task needed about 16 days of UltraSparc computing time. The best architecture found by the evolutionary algorithm is a network with 33 nodes (23 hidden nodes) and 148 weights. This sparsely connected network shows a mean classification error over five runs of 23.125 % on the test set and 15 % on the learning set. The application of Genetic Engineering to the data in the graph-database found the module #par:#usplit:#usplit: with the highest frequency. The elements of the family of architectures that can be constructed by (i, 1)-rewriting perform very well on this problem. The architecture created by (5, 1)-rewriting, a net with 26 nodes (16 hidden nodes) and 160 weights, shows the same classification error of 23.125 % on the test set, but the classification error on the learning set was about 5 % better for the evolutionarily optimized architecture.

The second problem tested was the approximation of a mexican-hat function with an ANN. Mandischer [5] used it to show the effectiveness of his technique for optimizing the architecture of an ANN. The problem has 2 inputs, the x- and y-coordinates of the function, and one output, the z-coordinate. The network was trained with 841 patterns describing the function in the range [-2.1, 2.1]. The fitness function was a linear combination of the classification error on the learning set (factor 1), the remaining sum of squared errors (factor 10) and the number of epochs used (factor 1) to obtain a mean squared error of 0.01. The learning set was presented up to 1000 times, but occasionally
fewer epochs were needed. All other parameters were set as in the diabetes problem. 100 generations with a population size of 50 were tested, which results in 5000 fitness evaluations. These evaluations needed approximately 12 hours of SparcStation 10 computing time. The best network found had 35 nodes (32 hidden) and 214 weights. This network needed on average 58 epochs over five runs to approximate the mexican-hat function. All standard architectures with up to 200 hidden nodes needed approximately 150 epochs. Mandischer found in his work a network with 100 nodes and 590 weights that needed 64 epochs in one test. The works are not directly comparable, because Mandischer used standard backpropagation whereas in this work RPROP was used. Nevertheless, it can be concluded that the evolutionary optimization process finds better-performing architectures.
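For reference, the two fitness functions described above amount to the following linear combinations (a sketch only; the symbols are introduced here for illustration and do not appear in the original text):

\[
f_{\text{diabetes}} = E_{\text{test}} + 0.3\,E_{\text{train}},
\qquad
f_{\text{mexican-hat}} = E_{\text{train}} + 10\,\mathit{SSE} + N_{\text{epochs}},
\]

where \(E\) denotes a classification error, \(\mathit{SSE}\) the remaining sum of squared errors and \(N_{\text{epochs}}\) the number of epochs needed to reach a mean squared error of 0.01, lower values being better.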
Figure 4: Comparison of the number of epochs needed, over the number of network nodes, by standard architectures and by architectures built by graph-rewriting for the mexican-hat problem

The application of Genetic Engineering found the module #seqcopy:#seqcopy:#usplit:. The family of architectures built by (i, 1)-rewriting with this module shows a better fitness than standard architectures on this problem. Figure 4 compares the two; it should be noted that the number of weights of a standard architecture and of an architecture built by the rewriting operations is the same if they have the same number of nodes.

5 Conclusion

It was shown that it is possible to find well-performing architectures for non-boolean ANNs with a modified Cellular Encoding method. The functionality of the method was demonstrated on two problems. The results were compared with results available from the literature. All evolved architectures show better performance for the optimized criteria than comparable standard architectures. With the help of a graph-database it was furthermore possible to find modular structures for the architectures of ANNs. Architectures created from these modular structures with a new graph-rewriting method show better performance than standard architectures. This technique can be used as an alternative to Automatically Defined Functions for problems that restrict the number of possible fitness evaluations. Using this method, it is possible to create a library of good modular structures for problem-specific modular network architectures. Further investigations of these modular structures and graph-rewritings may give new insights into the working principles of ANNs and their learning algorithms.

References

[1] L. Altenberg. The evolution of evolvability in genetic programming. In K. E. Kinnear, editor, Advances in Genetic Programming. MIT Press, 1994. http://pueo.mhpcc.edu/~altenber/PAPERS/Papers2.html.
[2] J. Branke. Evolutionary algorithms for neural network design and training. In Proceedings of the 1st Nordic Workshop on Genetic Algorithms and its Applications, 1995. ftp://ftp.aifb.uni-karlsruhe.de/pub/jbr/Vaasa.ps.

[3] F. Gruau. Cellular encoding of genetic neural networks. Technical Report 92-21, Ecole Normale Superieure de Lyon, Institut IMAG, 1992. ftp://lip.ens-lyon.fr/pub/Rapports/RR/RR92/RR92-21.ps.Z.

[4] J. R. Koza. Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, 1992.

[5] M. Mandischer. Representation and evolution of neural networks. In R. F. Albrecht, C. R. Reeves, and N. C. Steele, editors, Artificial Neural Nets and Genetic Algorithms: Proceedings of the International Conference at Innsbruck, Austria, pages 643-649, Wien and New York, 1993. Springer.

[6] D. Michie, D. J. Spiegelhalter, and C. C. Taylor. Machine Learning, Neural and Statistical Classification. Ellis Horwood Ltd., 1994.
[7] L. Prechelt. PROBEN1 - A set of benchmarks and benchmarking rules for neural network training algorithms. Technical Report 21/94, Fakultät für Informatik, Universität Karlsruhe, D-76128 Karlsruhe, Germany, September 1994. ftp://ftp.ira.uka.de/pub/papers/techreports/1994/1994-21.ps.Z.