Evolving Deterministic Finite Automata Using Cellular Encoding
Scott Brave
Computer Science Dept.
Stanford University
Stanford, California 94305
ABSTRACT
This paper presents a method for the evolution of deterministic finite automata that combines genetic programming and cellular encoding. Programs are evolved that specify actions for the incremental growth of a deterministic finite automaton from an initial single-state zygote. The results show that, given a test bed of positive and negative samples, the proposed method is successful at inducing automata that recognize several different languages.
1. Introduction
The automatic creation of finite automata has long been a goal of the evolutionary computation community. Fogel et al. [1966] were the first to propose the generation of deterministic finite automata (DFAs) by means of an evolutionary process, and the possibility of inferring languages from examples was initially established by Gold [1967]. Since then, much work has been done on the induction of DFAs for language recognition. Tomita [1982] showed that hill-climbing in the space of nine-state automata was both successful and superior to enumerative search at inducing a diverse set of seven regular languages. This group of seven languages, shown in table 1, has become a benchmark for work in language induction and will be the main set of languages investigated in this paper. The automata produced by Tomita's method, however, rarely generalized to out-of-sample sentences from the target language.

Pollack [1991] was able to produce DFAs with better out-of-sample performance using a recurrent high-order back-propagation network, but had difficulties with convergence on Tomita's languages TL2 and TL6. Other work has applied neural networks to the induction of languages higher in the Chomsky hierarchy (e.g. context-free and Turing-recognizable languages) with varying success ([Giles et al. 1990]; [Giles et al. 1992]; [Waltrous and Kuhn 1992]; [Williams and Zipser 1988]). The majority of these methods, however, require some part of the automaton structure to be specified by the user in advance. In particular, the user almost always has to specify the number of states. Genetic programming (GP) [Koza 1992] provides the flexibility to evolve the structure and function of automata in their entirety. Dunay et al. [1994] describe a genetic programming encoding that successfully produces generalizing automata for several languages, including all of the Tomita regular languages.

This paper presents a method for applying cellular encoding, described by Gruau [1994], to the evolution of DFAs. The genetic programming tree is interpreted as a code for the growth of an automaton from an initial single-state zygote. Mimicking the process of cell division in nature, execution of a GP tree causes the progressive division and development of cells in the embryo to form the fully grown automaton. The encoding ensures that the GP trees are always valid, specify automata that contain no orphan or sink states, and can be crossed over normally. The encoding also allows back pointers (cycles) to be generated easily. The results show that the method is successful in inducing several target languages, including all of the Tomita regular languages except TL6.

Table 1 Tomita's benchmarks.
Language  Description                                         In the language  NOT in the language
TL1       b*                                                  bbbbb            bbaba
TL2       (ba)*                                               babababa         abbaaab
TL3       no odd a strings after odd b strings                aaabaabb         abbbaaaaa
TL4       no aaa substrings                                   bbabbaa          abbaaaba
TL5       pairwise, an even sum of ab's and ba's              aabbaaaba        aabbaaab
TL6       number of b's - number of a's = 3n (multiple of 3)  abbaabbbb        bbabaaa
TL7       a*b*a*b*                                            aababbb          abbaabba
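To make the benchmark descriptions concrete, several of them can be written as membership predicates and checked against the example sentences in Table 1. This is an illustrative sketch only, not part of the described method: it assumes the a/b alphabet of the table, reads "after" in TL3 as "anywhere later in the string", and leaves TL5 out. The names `runs`, `tl3`, `TOMITA`, and `EXAMPLES` are hypothetical.

```python
import re

def runs(word):
    """Split a word into maximal single-symbol runs, e.g. 'aabbb' -> ['aa', 'bbb']."""
    return [m.group(0) for m in re.finditer(r"a+|b+", word)]

def tl3(word):
    """No odd-length run of a's anywhere after an odd-length run of b's (assumed reading)."""
    seen_odd_b = False
    for run in runs(word):
        odd = len(run) % 2 == 1
        if run[0] == "b" and odd:
            seen_odd_b = True
        elif run[0] == "a" and odd and seen_odd_b:
            return False
    return True

TOMITA = {
    "TL1": lambda w: re.fullmatch(r"b*", w) is not None,
    "TL2": lambda w: re.fullmatch(r"(ba)*", w) is not None,
    "TL3": tl3,
    "TL4": lambda w: "aaa" not in w,
    "TL6": lambda w: (w.count("b") - w.count("a")) % 3 == 0,
    "TL7": lambda w: re.fullmatch(r"a*b*a*b*", w) is not None,
}

# (in the language, NOT in the language) pairs copied from Table 1.
EXAMPLES = {
    "TL1": ("bbbbb", "bbaba"),
    "TL2": ("babababa", "abbaaab"),
    "TL3": ("aaabaabb", "abbbaaaaa"),
    "TL4": ("bbabbaa", "abbaaaba"),
    "TL6": ("abbaabbbb", "bbabaaa"),
    "TL7": ("aababbb", "abbaabba"),
}

for name, (member, non_member) in EXAMPLES.items():
    ok = TOMITA[name](member) and not TOMITA[name](non_member)
    print(name, "matches Table 1" if ok else "MISMATCH")
```

Python's `%` always returns a non-negative result for a positive modulus, so the TL6 predicate also handles strings with more a's than b's.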
Genetic Programming 1996: Proceedings of the First Annual Conference
2. Cellular Encoding of Deterministic Finite Automata
Gruau [1994] presents a genetic programming technique, called cellular encoding, that allows for the concurrent evolution of the architecture, weights, and thresholds of a neural network. Each tree in the population represents a cellular code that specifies the development of a neural network from a single neuron (the ancestor cell). Developmental operations include neuron (cell) division, link modification, and threshold specification. The method has been shown to be successful at producing neural networks that solve a variety of problems. This paper applies cellular encoding to the evolution of deterministic finite automata.
2.1. Function Set
The functions that make up the GP tree are of two types: those that cause cell (state) division, and those that modify properties of existing states. Deterministic finite automata were chosen instead of non-deterministic finite automata to make fitness evaluation as fast as possible. This decision, however, constrains the choice of functions somewhat, since each operation on the growing embryonic automaton must preserve determinism.

The most intuitive way to implement the cell division described above on a serial machine would be to maintain a pointer to the currently active cell (state) [Gruau 1994]. Functions in the GP tree would then cause the active cell to divide, modify the active cell, or change which cell is active. It turns out, however, that when dealing with deterministic finite automata it is easier to think of a certain arc, rather than a cell, as being active at a given time. So, during execution of the GP tree there is always a single active arc. Functions in the tree either cause the cell at the head of the active arc to divide, modify where the active arc points, or change which arc is active. The zygote (ancestor) DFA, which classifies all inputs identically, is shown in figure 1.

Figure 1 Single-state zygote DFA (its a and b arcs both loop back to the single state).

There are two functions that cause new states to be created in the automaton. PARALLEL and PARALLEL-REJ are similar to the PARALLEL function in Gruau's work. PARALLEL is a two-argument function that creates a new accepting node, to which the active arc will now point; PARALLEL-REJ creates a new rejecting node. The output arcs of the newly created node are copied from the old node to which the active arc pointed. The function then makes output arc a of the new node active and evaluates its first child branch. Next, output arc b is made active and the second child branch is evaluated. Figure 2 demonstrates the behavior of the PARALLEL function on a section of an automaton.
If the output arc b of node 1, seen in figure 2a, is active, calling PARALLEL will create the new node labeled N as shown in figure 2b (bold type indicates the active arc and double circles signify accepting states).
Figure 2 Effect of PARALLEL function. (a) Before calling PARALLEL, and (b) after calling PARALLEL.

Notice that if node 2 and node N are both accepting, or both rejecting, the behavior of the automaton will not be immediately affected. The possibility for specialization has been created, however, since subsequent operations can modify the output arcs of node N.

The second class of functions allows modification of the active arc but creates no new nodes. RETRACT(arg1) simply makes the active arc point to the node it leaves, creating a self-loop, and then evaluates its argument. WRITE-NODE(arg1) pushes the number of the node at the tail of the active arc onto a stack that is maintained during growth. TO-NODE(arg1) makes the active arc point to the node whose number is on top of the stack and then pops the stack. If the stack is empty, TO-NODE leaves the active arc unchanged. Both WRITE-NODE and TO-NODE evaluate their argument after performing their operations. DONE is the only terminal, and simply indicates that modification of the active arc is finished.
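The state-creating and arc-modifying functions can each be sketched in a few lines. This is a minimal sketch, not the paper's implementation: it assumes the DFA is stored as a dictionary from (state, symbol) arcs to target states, the active arc is a (state, symbol) pair, and the "tail" of an arc is the state it leaves. All function names are hypothetical, and evaluating the child branches after each operation is left to an interpreter. The usage lines reproduce the observation above that a division whose new state has the same label as the old head leaves the accepted language unchanged.

```python
from itertools import product

SYMBOLS = ("a", "b")

def parallel(trans, accepting, active, accept_new):
    """PARALLEL (accept_new=True) or PARALLEL-REJ (accept_new=False).

    Divides the state at the head of the active arc and returns the new state's number.
    """
    old = trans[active]                         # head of the active arc
    new = max(state for state, _ in trans) + 1  # next unused state number
    for sym in SYMBOLS:                         # inherit the old head's output arcs
        trans[(new, sym)] = trans[(old, sym)]
    if accept_new:
        accepting.add(new)
    trans[active] = new                         # active arc now points to the new state
    return new

def retract(trans, active):
    """RETRACT: point the active arc at the state it leaves (a self-loop)."""
    trans[active] = active[0]

def write_node(stack, active):
    """WRITE-NODE: push the state at the tail (source) of the active arc."""
    stack.append(active[0])

def to_node(trans, stack, active):
    """TO-NODE: point the active arc at the state on top of the stack, then pop.

    An empty stack leaves the arc unchanged.
    """
    if stack:
        trans[active] = stack.pop()

def language(trans, accepting, max_len=6, start=1):
    """Set of all words up to max_len accepted by the DFA (for comparing behaviors)."""
    accepted = set()
    for n in range(max_len + 1):
        for letters in product(SYMBOLS, repeat=n):
            state = start
            for sym in letters:
                state = trans[(state, sym)]
            if state in accepting:
                accepted.add("".join(letters))
    return accepted

# Zygote (assumed accepting here): one state whose a and b arcs loop back to it.
trans, accepting, stack = {(1, "a"): 1, (1, "b"): 1}, {1}, []
before = language(trans, accepting)
parallel(trans, accepting, (1, "b"), accept_new=True)  # same label as the old head
print(language(trans, accepting) == before)            # True: behavior unchanged

write_node(stack, (2, "a"))      # pushes 2
retract(trans, (1, "b"))         # arc b of state 1 becomes a self-loop again
to_node(trans, stack, (2, "b"))  # arc b of state 2 now points to state 2; stack empty
print(trans)
```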
2.2. Example Encoding
To gain a better understanding of the encoding, we will step through the growth of a simple automaton that recognizes the language (a|b)*bab(a|b)* (all strings that contain the substring bab). The example GP tree used for this demonstration is shown in figure 3.

1. PARALLEL-REJ (1)
   2. DONE
   3. WRITE-NODE
      4. PARALLEL-REJ (2)
         5. PARALLEL-REJ (3)
            6. TO-NODE
               7. DONE
            8. PARALLEL (4)
               9. RETRACT
                  10. DONE
               11. RETRACT
                  12. DONE
         13. RETRACT
            14. DONE

Figure 3 Example GP tree that encodes an automaton to recognize (a|b)*bab(a|b)*. Children are indented beneath their parent; the first child of a two-argument function is its left subtree.
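For experimenting with the encoding, the tree in Figure 3 can be written as nested tuples, using a hypothetical representation in which each tuple holds a function name followed by its child subtrees. A preorder walk then reproduces the node numbering used in the figure.

```python
# Hypothetical representation: (function name, child subtree, ...).
FIG3_TREE = (
    "PARALLEL-REJ",
    ("DONE",),
    ("WRITE-NODE",
        ("PARALLEL-REJ",
            ("PARALLEL-REJ",
                ("TO-NODE", ("DONE",)),
                ("PARALLEL",
                    ("RETRACT", ("DONE",)),
                    ("RETRACT", ("DONE",)))),
            ("RETRACT", ("DONE",)))),
)

def preorder(node):
    """Yield function names in evaluation (preorder) order."""
    op, *children = node
    yield op
    for child in children:
        yield from preorder(child)

for number, op in enumerate(preorder(FIG3_TREE), start=1):
    print(number, op)
```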
Figure 4 Example growth of an automaton (initial steps).
To avoid confusion in this discussion, we will refer to nodes in the genetic programming tree as "nodes" and nodes in the finite automaton as "states". Nodes in the tree are numbered at the left to indicate the order of evaluation (preorder), and numbers in parentheses denote the creation of a new state in the automaton. The program begins with a call to PARALLEL-REJ at node 1. The first call to PARALLEL or PARALLEL-REJ in the GP tree is used to specify whether the zygote is an accepting or a rejecting state (all functions before such a call, which are necessarily single-arity, are ignored). Therefore, execution of node 1 causes the initial zygote (start state) to be a rejecting state, as shown in figure 4a. This state is the first created and thus receives the label "1", as indicated in parentheses in figure 3. Output arc a is now activated, as indicated in figure 4b, and the left subtree of node 1 is evaluated. The function DONE at tree node 2 is executed and no modification to the active arc occurs. Control now moves to the right subtree, and arc b is activated (figure 4c). Node 3, WRITE-NODE, causes a "1" to be pushed onto the stack, and node 4, PARALLEL-REJ, creates a new rejecting state, as seen in figure 4d. Program execution now continues by activating arc a of state 2 and evaluating the left subtree of node 4.

The function call at node 5 causes another cell (state) division, as shown in figure 5a; state 3 is created and the previous output arcs of state 1 are copied (remember that we copy the output arcs from the state to which the active arc previously pointed). The active arc now becomes output arc a of state 3, and nodes 6 and 7 are executed. This causes the "1" to be popped off the stack and the active arc to be made to point to state 1. In this case, however, the arc was already pointing to state 1, so the TO-NODE function was redundant. Output arc b of state 3 is now activated, and a final cell division at node 8 causes a new accepting state (state 4) to be created, as shown in figure 5b. The two subtrees of node 8 cause both output arcs of state 4 to retract (figure 5c). Output arc b of state 2 now becomes active as the right subtree of node 4 is evaluated, and node 13, RETRACT, turns this arc into a self-loop. Figure 5d shows the final "mature" automaton that correctly recognizes the target language, (a|b)*bab(a|b)*.

Note that, although not illustrated in the above example, the arguments to the arc-moving functions, TO-NODE and RETRACT, can be arbitrarily complex. The decision to make these two functions single-arity instead of zero-arity (terminals) allows arc movements to affect subsequent cell divisions by changing which state the output arcs are copied from.
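The whole walkthrough can be reproduced with a small interpreter. This is a sketch under stated assumptions, not the author's implementation (the experiments used DGPC, written in C): trees are nested tuples, the "tail" of an arc is the state it leaves, and a tree with no PARALLEL or PARALLEL-REJ leaves the zygote rejecting. The class name `Embryo` is hypothetical. Running the Figure 3 tree should yield the four-state automaton of Figure 5d, and the final line compares it with a direct substring test on every string of length up to 8.

```python
from itertools import product

SYMBOLS = ("a", "b")

class Embryo:
    """A growing DFA over {a, b}; state 1 (the zygote) is the start state."""

    def __init__(self):
        self.trans = {(1, "a"): 1, (1, "b"): 1}  # zygote: self-loops on a and b
        self.accepting = set()
        self.n_states = 1
        self.stack = []
        self.active = (1, "a")
        self.started = False  # set by the first PARALLEL / PARALLEL-REJ

    def run(self, node):
        op, *kids = node
        if op in ("PARALLEL", "PARALLEL-REJ"):
            if not self.started:
                # The first division only labels the zygote.
                self.started = True
                cell = 1
            else:
                old = self.trans[self.active]
                self.n_states += 1
                cell = self.n_states
                for sym in SYMBOLS:  # copy the old head's output arcs
                    self.trans[(cell, sym)] = self.trans[(old, sym)]
                self.trans[self.active] = cell
            if op == "PARALLEL":
                self.accepting.add(cell)
            for sym, kid in zip(SYMBOLS, kids):  # arc a with child 1, then arc b with child 2
                self.active = (cell, sym)
                self.run(kid)
        elif op in ("RETRACT", "WRITE-NODE", "TO-NODE"):
            if self.started:  # single-arity calls before the first division are ignored
                src = self.active[0]
                if op == "RETRACT":
                    self.trans[self.active] = src
                elif op == "WRITE-NODE":
                    self.stack.append(src)
                elif self.stack:  # TO-NODE; an empty stack leaves the arc alone
                    self.trans[self.active] = self.stack.pop()
            self.run(kids[0])
        elif op != "DONE":
            raise ValueError(f"unknown function {op!r}")

    def accepts(self, word):
        state = 1
        for sym in word:
            state = self.trans[(state, sym)]
        return state in self.accepting

FIG3_TREE = (
    "PARALLEL-REJ",
    ("DONE",),
    ("WRITE-NODE",
        ("PARALLEL-REJ",
            ("PARALLEL-REJ",
                ("TO-NODE", ("DONE",)),
                ("PARALLEL",
                    ("RETRACT", ("DONE",)),
                    ("RETRACT", ("DONE",)))),
            ("RETRACT", ("DONE",)))),
)

embryo = Embryo()
embryo.run(FIG3_TREE)
words = ("".join(p) for n in range(9) for p in product("ab", repeat=n))
print(all(embryo.accepts(w) == ("bab" in w) for w in words))  # True
```

Because every division copies both output arcs of the state the active arc pointed to, each new state starts with an arc on every symbol, so the grown transition table is always complete and deterministic.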
Figure 5 Example growth of an automaton (final steps).
Table 2 Tableau for the DFA induction problem.
Objective: Find a program that encodes a DFA that correctly classifies all strings in the sample set (fitness cases), given the target language.
Function set for the main branch: AND(arg1,arg2), OR(arg1,arg2), NOT(arg1), DFA1, DFA2, DFA3
Function set for the three DFA branches (DFA1, DFA2, DFA3): PARALLEL(arg1,arg2), PARALLEL-REJ(arg1,arg2), RETRACT(arg1), WRITE-NODE(arg1), TO-NODE(arg1), DONE
Fitness cases: 1,000 example sentences (500 positive and 500 negative)
Fitness: The number of incorrectly classified sentences.
Parameters: Population size (M): 10,000. Maximum number of generations (G): 100. Maximum number of nodes per individual: main branch 50, each DFA branch 300.
Success predicate: A fitness of 0 (all sentences in the sample set correctly classified).
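The per-branch size limits in Table 2 can be enforced with a simple node count. A sketch, assuming trees are nested tuples (a hypothetical representation) and that every function and terminal counts as one node; the names are illustrative only.

```python
MAX_MAIN_NODES = 50   # Table 2: main branch
MAX_DFA_NODES = 300   # Table 2: each DFA branch

def count_nodes(tree):
    """Count functions and terminals in a tree written as nested tuples."""
    _, *children = tree
    return 1 + sum(count_nodes(child) for child in children)

def within_limits(main_branch, dfa_branches):
    """True if the main branch and every DFA branch respect the Table 2 limits."""
    return count_nodes(main_branch) <= MAX_MAIN_NODES and all(
        count_nodes(branch) <= MAX_DFA_NODES for branch in dfa_branches
    )

# The main branch (AND DFA1 (NOT DFA2)) from Section 3 has four nodes.
main = ("AND", ("DFA1",), ("NOT", ("DFA2",)))
dfa_branches = [("PARALLEL", ("DONE",), ("DONE",))] * 3
print(count_nodes(main), within_limits(main, dfa_branches))  # 4 True
```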
3. Structure of the Genetic Program
The genetic program for the language induction problem has four branches: one main branch and three automaton branches. Each automaton branch encodes one DFA in the manner described above. The main branch has access to these automata in addition to the boolean functions AND, OR, and NOT, and thus serves as a way of combining the information returned by the three created automata. The main motivation behind this approach was early difficulty in solving the dual parity problem (an even number of a's and an even number of b's). Although encodings quickly emerged that recognized even a's only or even b's only, these encodings could not easily be combined by crossover to create a solution. To combat this problem, the boolean functions were added in the hope that the greater flexibility would increase building-block potential. Since a single automaton equivalent to any AND/OR/NOT combination of the three sub-automata can be constructed straightforwardly (via the standard product construction), the GP tree can still be considered an encoding of a single DFA (recall that regular languages are closed under intersection, union, and complementation).

For each sentence to be classified, each of the three DFA branches is evaluated and the resulting automaton is simulated with the sentence as input. If a DFA accepts the sentence, all occurrences of that DFA in the main branch are replaced by TRUE; if it rejects the sentence, all occurrences are replaced by FALSE. Once these substitutions are made, the main branch is evaluated. If the main branch returns TRUE, the genetic program has classified the sentence as a member of the language; if it returns FALSE, the sentence has been classified as not in the language. For example, one possible solution to the dual parity problem would be to build one automaton that recognizes an even number of a's (DFA1) and one that recognizes an odd number of b's (DFA2), and then combine them as (AND DFA1 (NOT DFA2)).
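The classification procedure can be sketched as follows. The sketch assumes the main branch is a nested tuple over AND, OR, NOT and the DFA terminals, and, for brevity, stands in for each grown automaton with a Python predicate; in the described method each branch is an evolved DFA simulated on the sentence. The names `classify` and `dfas` are hypothetical. The example is the dual parity combination (AND DFA1 (NOT DFA2)) given above.

```python
def classify(main_branch, dfas, sentence):
    """Run every DFA on the sentence, substitute TRUE/FALSE, then evaluate the main branch."""
    truth = {name: dfa(sentence) for name, dfa in dfas.items()}

    def evaluate(node):
        op, *kids = node
        if op == "AND":
            return evaluate(kids[0]) and evaluate(kids[1])
        if op == "OR":
            return evaluate(kids[0]) or evaluate(kids[1])
        if op == "NOT":
            return not evaluate(kids[0])
        return truth[op]  # a DFA terminal, already replaced by its result

    return evaluate(main_branch)

dfas = {
    "DFA1": lambda s: s.count("a") % 2 == 0,  # even number of a's
    "DFA2": lambda s: s.count("b") % 2 == 1,  # odd number of b's
    "DFA3": lambda s: False,                  # unused branch
}
main = ("AND", ("DFA1",), ("NOT", ("DFA2",)))
print(classify(main, dfas, "abab"), classify(main, dfas, "aab"))  # True False
```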
4. Results and Discussion
The described method was tested on several languages using 500 positive and 500 negative cases with a population size of 10,000. A tableau summarizing the parameters is given as table 2. All runs were executed using Dave's Genetic Programming Code in C (DGPC) on a Pentium. The crossover rate was set to 0.8 for internal functions and 0.1 for leaves, the maximum new tree depth was 6, and no mutation was used. Program size was limited to 50 nodes for the main branch and 300 nodes for each DFA branch. Fitness was measured as the number of incorrectly classified strings.

The first languages tested were the seven benchmarks described in Tomita [1982] (shown in table 1). Of these, all except TL6 (number of b's minus number of a's is a multiple of 3) were solved in fewer than 100 generations, producing automata that classified out-of-sample strings (500 positive and 500 negative) with 100% accuracy. Furthermore, inspection of the evolved automata showed that all solutions were fully general. Figure 6 shows one such evolved solution to TL3 and figure 7 shows a solution to TL7 (both of these solutions made use of only one ADF, i.e. one of the three DFA branches). TL6 was run 10 times without success. The method was also successful in producing correct and general automata for the dual parity problem (an even number of a's and an even number of b's) and for the language (a|b)*aaaa(a|b)*bbbb(a|b)*.
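Fitness as defined here (the number of misclassified sentences) can be sketched as below. The paper does not say how the 500 positive and 500 negative sentences were drawn, so the rejection sampler (uniform random length up to a hypothetical `max_len`, uniform symbols) is an assumption, as are all names.

```python
import random

def sample_set(in_language, n_pos, n_neg, max_len, rng):
    """Rejection-sample labelled sentences until both quotas are filled."""
    positives, negatives = [], []
    while len(positives) < n_pos or len(negatives) < n_neg:
        length = rng.randint(0, max_len)
        word = "".join(rng.choice("ab") for _ in range(length))
        if in_language(word):
            if len(positives) < n_pos:
                positives.append((word, True))
        elif len(negatives) < n_neg:
            negatives.append((word, False))
    return positives + negatives

def fitness(classifier, cases):
    """Number of incorrectly classified sentences; 0 means every case is correct."""
    return sum(classifier(word) != label for word, label in cases)

def dual_parity(word):
    return word.count("a") % 2 == 0 and word.count("b") % 2 == 0

def even_as(word):
    return word.count("a") % 2 == 0

cases = sample_set(dual_parity, 500, 500, 12, random.Random(0))
print(fitness(dual_parity, cases))  # 0: the target language is a perfect classifier
print(fitness(even_as, cases))      # counts negatives with even a's but odd b's
```

Note that a classifier recognizing even a's only, one of the partial solutions mentioned in Section 3, is penalized only on sentences that separate it from the target.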
Figure 6 Example evolved solution to TL3.

Figure 7 Example evolved solution to TL7.
Figure 8 Example solution to the dual parity problem.

An interesting evolved solution to the dual parity problem can be seen in figure 8. The main branch (not shown) simplified to an AND of DFA1 and DFA2. DFA1 recognizes even-length strings and DFA2 recognizes strings with an even number of a's; since the number of b's equals the length minus the number of a's, any string accepted by both must also contain an even number of b's.

The reason why no solution was found for TL6 is still under investigation. Initial observations of the unsuccessful runs, however, suggest that the programs were attempting to enumerate the example sentences. This explanation seems likely given that 1) any increase in fitness tended to come in small increments, 2) the program trees grew more quickly than in runs on other languages, 3) improvements in fitness ceased before the maximum number of generations was reached, and 4) the function set allows for "easy" enumeration with PARALLEL and PARALLEL-REJ. Such enumerating programs plateau when limits on program size prevent further enumeration. The reason that the TL6 evolution degenerated into enumeration, while runs on other languages did not, could be a lack of useful building blocks in the described encoding, or even the absence of any encoding for the necessary automata. Obviously, every encoding has a corresponding automaton; however, it is not clear that every automaton has an encoding, or even that every class of equivalent automata has at least one encoding. The stack structure used by the WRITE-NODE and TO-NODE functions makes automata like the one in figure 9 impossible to create (though there may exist an encoding for an equivalent automaton). The difficulty occurs with the output arcs of state 4. Because arc a is always active before arc b, the creation of this automaton would require state 1 to be above state 2 on the stack. This is not possible with the described encoding, since state 2 is created after state 1. The possibility also exists, however, that TL6 is simply a more difficult problem, requiring a larger population size or more than 10 runs to find a solution.
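Returning to the evolved dual parity solution of Figure 8, its correctness can be checked mechanically with the standard product construction, which is also why any AND/OR/NOT combination of the DFA branches is itself a single DFA. The two-state tables for DFA1 (even length) and DFA2 (even number of a's) below are written by hand for illustration, not read off the figure, and the function names are hypothetical.

```python
from itertools import product

# A DFA is (start, transitions, accepting) with transitions[(state, symbol)] -> state.
EVEN_LENGTH = (0, {(0, "a"): 1, (0, "b"): 1, (1, "a"): 0, (1, "b"): 0}, {0})
EVEN_AS = (0, {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1}, {0})

def product_dfa(d1, d2, combine):
    """Product construction: states are pairs; combine(bool, bool) decides acceptance."""
    (s1, t1, f1), (s2, t2, f2) = d1, d2
    states1 = {q for q, _ in t1}
    states2 = {q for q, _ in t2}
    trans = {
        ((p, q), sym): (t1[(p, sym)], t2[(q, sym)])
        for p in states1 for q in states2 for sym in "ab"
    }
    accepting = {(p, q) for p in states1 for q in states2 if combine(p in f1, q in f2)}
    return (s1, s2), trans, accepting

def run(dfa, word):
    state, trans, accepting = dfa
    for sym in word:
        state = trans[(state, sym)]
    return state in accepting

both = product_dfa(EVEN_LENGTH, EVEN_AS, lambda x, y: x and y)
words = ["".join(p) for n in range(11) for p in product("ab", repeat=n)]
print(all(run(both, w) == (w.count("a") % 2 == 0 and w.count("b") % 2 == 0) for w in words))  # True
```

Passing `lambda x, y: x or y` or `lambda x, y: not x` instead gives OR and NOT, so the three boolean functions of the main branch never take the program outside the regular languages.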
5. Conclusion
This paper has presented a method for applying cellular encoding to the evolution of deterministic finite automata. The described encoding has the advantages that it 1) maps only to valid automata, 2) employs normal crossover, and 3) provides mechanisms for the easy creation of back pointers. These advantages, along with the successful evolution (induction) of 8 out of the 9 languages tested, indicate that the cellular encoding of DFAs is a method worth further investigation.
6. Future Work
Future work will focus on improving the current function set, in the hope of creating an encoding that can successfully solve all seven of the Tomita regular languages. Memory structures other than the single stack (e.g. two stacks or indexed memory) will be investigated, along with the utility of adding new state-creating functions. In addition, a theoretical analysis will be sought, in an attempt to find an encoding that can provably produce any DFA.
Acknowledgments
Discussions with John Koza, Frederic Gruau, David Andre, and Simon Handley have been very helpful. Many thanks to David Andre for the use of Dave's Genetic Programming Code in C (DGPC).
Figure 9 Example automaton that is impossible to create with the described encoding.

Bibliography
Dunay, B. D., Petry, F. E., and Buckles, W. P. (1994). Regular language induction with genetic programming. Proceedings of the First IEEE Conference on Evolutionary Computation. IEEE Press. Volume I. Pages 396-400.
Fogel, L. J., Owens, A. J., and Walsh, M. J. (1966). Artificial Intelligence Through Simulated Evolution. New York, NY: John Wiley and Sons.
Giles, C. L., Sun, G. Z., Chen, H. H., Lee, Y. C., and Chen, D. (1990). Higher order recurrent networks and grammatical inference. Advances in Neural Information Processing Systems. D. S. Touretzky, ed. San Mateo, CA: Morgan Kaufmann. Volume 2. Pages 380-387.
Giles, C. L., Miller, C. B., Chen, D., Chen, H. H., Sun, G., and Lee, Y. C. (1992). Learning and extracting finite automata with second-order recurrent neural networks. Neural Computation. Volume 4. Pages 393-405.
Gold, E. M. (1967). Language identification in the limit. Information and Control. Volume 10. Pages 447-474.
Gruau, F. (1994). Genetic micro programming of neural networks. Advances in Genetic Programming. K. E. Kinnear Jr., ed. Cambridge, MA: The MIT Press. Pages 495-518.
Koza, J. R. (1992). Genetic Programming: On the Programming of Computers by Means of Natural Selection. Cambridge, MA: The MIT Press.
Pollack, J. B. (1991). Language induction by phase transition in dynamical recognizers. Advances in Neural Information Processing Systems. R. P. Lippmann, J. E. Moody, and D. S. Touretzky, eds. San Mateo, CA: Morgan Kaufmann. Volume 3. Pages 619-626.
Tomita, M. (1982). Dynamic construction of finite-state automata from examples using hill climbing. Proceedings of the Fourth Annual Cognitive Science Conference. Ann Arbor, MI. Pages 105-108.
Waltrous, R. L. and Kuhn, G. M. (1992). Induction of finite-state languages using second-order recurrent networks. Neural Computation. Volume 4. Pages 404-414.
Williams, R. J. and Zipser, D. (1988). A learning algorithm for continually running fully recurrent neural networks. Institute for Cognitive Science Report 8805. La Jolla, CA: University of California at San Diego.