GPK: A Java Based Genetic Programming Kernel

Danijel Žnuderl¹, Matej Črepinšek², Marjan Mernik², Viljem Žumer²

¹ Hermes Softlab, d.d. Ljubljana, Slovenia, [email protected]
² University of Maribor, Faculty of Electrical Engineering and Computer Science, Slovenia, {matej.crepinsek, marjan.mernik, viljem.zumer}@uni-mb.si
Abstract. In the paper a Java based genetic programming kernel (GPK) is described. Its main features are ease of use, portability, and robustness, which were achieved by relying heavily on the Java reflection mechanism. This characteristic distinguishes our GPK from other similar frameworks. GPK was successfully used in our research work as well as in the teaching process.

Keywords. evolutionary computations, genetic programming, kernel, Java, educational tool, optimization.

1 Introduction

An evolutionary algorithm (EA) [9, 19, 14] is a search and optimization technique based on the principles and mechanisms of natural selection and survival of the fittest. Genetic algorithms (GA), evolution strategies (ES), evolutionary programming (EP), and genetic programming (GP) are all special instances of an evolutionary algorithm. These algorithms differ mainly in what constitutes the population, in the order and kind of genetic operators, and in the selection techniques (stochastic/deterministic). In this research we are particularly interested in genetic programming [2, 15], a descendant of genetic algorithms, where the population consists of computer programs. Each member of the population, a chromosome, represents a possible solution in the search space of all possible programs. Since the search space (i.e. all possible programs written in a chosen programming language) is too large, it is restricted by the terminal set T and the user-defined function set F. The set T contains variables and constants, and the set F contains functions that are a priori believed to be useful for the problem domain. For example, in the symbolic regression problem [15], from the sets T = {x, 1} and F = {+, −, ∗} the following program can be constructed: (+ (∗ x x) 1), which represents the function x^2 + 1. The program is presented as a tree. It can be easily seen that the leaves are constructed from the set T while the other nodes are constructed from the set F. The selection of the sets T and F is one of the most important tasks in genetic programming, since the functions, variables and constants should be powerful enough to represent a solution to the problem.

In the paper a Java based genetic programming kernel (GPK) is described [22]. The GPK is a genetic programming framework entirely coded in Java. It was carefully designed and implemented following the principles of object-oriented programming (OOP), using design patterns [12] and Java features like reflection and interfaces [8, 7].
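As a minimal sketch of the tree representation, the symbolic regression program (+ (∗ x x) 1) can be built from T = {x, 1} and F = {+, ∗} and evaluated as follows. The types ExprNode, Var, Const, Add and Mul are hypothetical names chosen for illustration; they are not GPK classes.

```java
// Hypothetical node types for illustration only; they are not part of GPK.
interface ExprNode {
    double eval(double x);     // value of the subtree for a given x
    boolean isTerminal();      // true for leaves, i.e. members of T
}

final class Var implements ExprNode {                 // terminal x
    public double eval(double x) { return x; }
    public boolean isTerminal() { return true; }
    @Override public String toString() { return "x"; }
}

final class Const implements ExprNode {               // terminal constant, e.g. 1
    private final double value;
    Const(double value) { this.value = value; }
    public double eval(double x) { return value; }
    public boolean isTerminal() { return true; }
    @Override public String toString() { return String.valueOf(value); }
}

final class Add implements ExprNode {                 // function + from F
    private final ExprNode left, right;
    Add(ExprNode left, ExprNode right) { this.left = left; this.right = right; }
    public double eval(double x) { return left.eval(x) + right.eval(x); }
    public boolean isTerminal() { return false; }
    @Override public String toString() { return "(+ " + left + " " + right + ")"; }
}

final class Mul implements ExprNode {                 // function * from F
    private final ExprNode left, right;
    Mul(ExprNode left, ExprNode right) { this.left = left; this.right = right; }
    public double eval(double x) { return left.eval(x) * right.eval(x); }
    public boolean isTerminal() { return false; }
    @Override public String toString() { return "(* " + left + " " + right + ")"; }
}

final class SymbolicRegressionDemo {
    // Builds the tree for (+ (* x x) 1): leaves come from T, inner nodes from F.
    static ExprNode example() {
        return new Add(new Mul(new Var(), new Var()), new Const(1));
    }

    public static void main(String[] args) {
        ExprNode program = example();
        System.out.println(program + " at x = 3 evaluates to " + program.eval(3.0));
    }
}
```

Evaluation is a plain recursive walk over the tree, which is why the choice of T and F directly determines which functions the search can express.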
2 Motivation
The GPK was designed with the objective of providing a GP framework that is easy to use, portable, robust, elegant and suitable for research in GP, as well as for teaching and learning.

Ease of Use. Implementing even a simple GP example in a programming language like Ada95, C, C++ or Java can take a lot of time (designing, implementing, debugging). With the framework in place, the programmer works only on the specific problem, while genetic operators (selection, crossover, mutation), program representation, etc. are already incorporated in the framework. The GPK provides several mechanisms that make programming user friendly, including implemented interfaces, GP examples and a user interface. It was designed to be highly flexible, with nearly all classes (and all of their settings) dynamically defined at runtime by a user-provided scenario file.

Portability. Java is a language developed by Sun Microsystems (through its subsidiary JavaSoft) [7] for the creation of cross-platform portable applications. Java programs run on a client-side virtual machine (VM), which offers a “sandbox” security model: it does not allow operations likely to breach the security and integrity of the client machine. Java has formalized data types, math operations and input/output libraries, which guarantees that GPK behavior will be identical regardless of the platform. For scientific experiments, such guaranteed reliability is a highly desirable attribute, and one that existing C, C++, or Lisp-based evolutionary computation systems do not provide.

Elegant. The principles of object-oriented programming, design patterns and strict programming rules were applied carefully to make the Java code easy to read, understand, and modify.

The GPK as an educational tool. Interest in the fields of evolutionary computation (EC) and GP is growing rapidly, and with it the need for dedicated educational tools. To be an efficient educational tool, the framework is designed to provide good visualization (user interface, problem description), many simple examples, ease of use, and well-documented open source code. With currently available tools like the GPK and the EGALT [16], students can reduce the mechanical programming aspect of learning and concentrate on the principles alone.
3 Related Work
In the last decade many evolutionary computation software systems have been developed, but only a few are flexible and easy enough to implement different flavors of algorithms. They use different programming languages, different programming techniques and different architectures. The proposed GPK differs from other similar frameworks in its use of the Java reflection mechanism.

GPC++ - Genetic Programming in C++. The first well-known C++ framework [4] for tree-based GP was developed by Adam Fraser and Thomas Weinbrenner. The design of the GPK framework was based on GPC++ and other known EC tools. GPK and GPC++ share some ideas and implementation aspects, but C++ has no built-in support for interfaces and reflection. Therefore, their approach seems to be less elegant (method evaluate in class MyGene).

EO - Evolutionary Computation Framework. EO [3] is a template-based, ANSI-C++ compliant evolutionary computation library. It contains classes for most of the known evolutionary computations. When the needed class is not available, its component-based design allows us to subclass an existing abstract or concrete class. The EO framework has the ambition to be built into the operating system. It may be distributed freely, but commercial use requires contacting the authors.

ECJ - A Java-based Evolutionary Computation and Genetic Programming Research System. ECJ [1] is a research EC system written in Java. It was designed to be highly flexible, with nearly all classes (and all of their settings) dynamically determined at runtime by a user-provided parameter file. All structures in the system are arranged to be easily modified. Even so, the system was designed with an eye toward efficiency; ECJ may make you reconsider notions about Java and slowness. The system is very extensive, which makes it less readable than we would wish. In comparison with other systems, we believe that it is the most feature-rich GP system publicly available.

Open BEAGLE. Open BEAGLE [13] is an evolutionary computation framework entirely coded in C++. It provides a software environment for any kind of EC. The Open BEAGLE architecture follows strong principles of object-oriented programming, where abstractions are represented by loosely coupled objects and where it is common and easy to reuse code. Open BEAGLE is designed to provide an EC environment that is generic, user friendly, portable, efficient, robust and elegant. The source code of Open BEAGLE [6] is free, available under the GNU [5] Lesser General Public License (LGPL).
4 GPK Architecture
The GPK has a typical multi-layer architecture (Fig. 1). Every part of the tool can be easily replaced or modified. The most important parts are its interfaces, which represent the basic design of the architecture (Fig. 2). The interfaces are the core of the kernel framework and represent its first layer. They simplify building, upgrading, changing, selecting or adding different GP operators and problems. For example, a user who needs their own selection method only implements the necessary interface (ISelection) and includes it in the scenario of the problem description (Fig. 7). Most of the basic GP operations are already implemented in the GPK. Trees consist of nodes, which can be terminals or functions (non-terminals), created from a Factory class (the factory pattern) [10], where they are registered by the GP problem. We can say that this is the second layer of the GPK, where we find some implemented interfaces (classes) and partially implemented abstract classes. The next layer is designed for implementing a specific GP problem. Here we must implement specific methods, like calculating fitness, the problem environment and, if necessary, method types (typically double), functions (plus, minus, if-food-ahead, etc.) and terminals (constants, left, move). The last layer serves for running and evaluating a specific problem. It includes the problem description file (XML), running classes, and a universal user problem interface, which together allow different scenarios and problems to be set up and run.
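To make the plug-in idea concrete, here is a minimal sketch of a user-written tournament selection. The text only names the ISelection interface, so everything about its shape is an assumption: the interface ISelectionSketch, its select method and the Individual class below are hypothetical stand-ins, and "higher fitness is better" is assumed as well.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical stand-in for an individual; GPK's real program type is not shown in the text.
final class Individual {
    final String name;
    final double fitness;      // assumption: higher is better

    Individual(String name, double fitness) {
        this.name = name;
        this.fitness = fitness;
    }
}

// Assumed shape of a selection interface; GPK's actual ISelection may differ.
interface ISelectionSketch {
    Individual select(List<Individual> population, Random rng);
}

// Tournament selection: draw k individuals at random (with replacement), keep the fittest.
final class TournamentSelection implements ISelectionSketch {
    private final int tournamentSize;

    TournamentSelection(int tournamentSize) {
        if (tournamentSize < 1) {
            throw new IllegalArgumentException("tournament size must be >= 1");
        }
        this.tournamentSize = tournamentSize;
    }

    @Override
    public Individual select(List<Individual> population, Random rng) {
        Individual best = population.get(rng.nextInt(population.size()));
        for (int i = 1; i < tournamentSize; i++) {
            Individual candidate = population.get(rng.nextInt(population.size()));
            if (candidate.fitness > best.fitness) {
                best = candidate;
            }
        }
        return best;
    }
}

final class SelectionDemo {
    public static void main(String[] args) {
        List<Individual> population = Arrays.asList(
                new Individual("a", 12), new Individual("b", 89), new Individual("c", 40));
        ISelectionSketch selection = new TournamentSelection(2);
        Individual parent = selection.select(population, new Random());
        System.out.println("selected " + parent.name + " with fitness " + parent.fitness);
    }
}
```

Because the rest of the kernel would depend only on the interface, swapping in a different selection scheme means registering another implementing class, not touching the evolution loop.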
Figure 1: GPK framework
Problem description. It has been acknowledged that the control parameters of an evolutionary algorithm and its different genetic operators can have a significant impact on its performance [17, 18]. The designer of an evolutionary algorithm therefore has to decide which operators and control parameter settings are likely to produce the best results. Designers usually set the control parameters of the algorithm from the command line or from a parameter description file. When the algorithm has many different settings, description files are more suitable. The format of the parameter description is another problem we come across when building such a tool. We chose a description in Extensible Markup Language (XML) [21]. Controlling the EC algorithm is important, but in real research we usually need to customize or add some functionality (Fig. 3). By allowing different customized classes to be registered, and even parts of the GP kernel to be switched, we offer maximum flexibility and let researchers create different scenarios of the same problem.
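As a rough illustration of reading such a description, the sketch below parses a small XML document with the standard JAXP DOM API. The scenario format itself is shown only in the figures, so the element and attribute names used here (scenario, parameter, register, name, value, role, class) and the parameter values are invented for this example and should not be read as GPK's schema.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class ScenarioSketch {
    // Invented example document; GPK's real scenario schema is not given in the text.
    static final String XML =
            "<scenario>"
          + "<parameter name=\"populationSize\" value=\"500\"/>"
          + "<parameter name=\"maxTreeDepth\" value=\"17\"/>"
          + "<register role=\"fitness\" class=\"AntFitness\"/>"
          + "<register role=\"environment\" class=\"AntEnvironment\"/>"
          + "</scenario>";

    // Collects keyAttr -> valueAttr for every element with the given tag name.
    static Map<String, String> read(String xml, String tag, String keyAttr, String valueAttr)
            throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList nodes = doc.getElementsByTagName(tag);
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element element = (Element) nodes.item(i);
            result.put(element.getAttribute(keyAttr), element.getAttribute(valueAttr));
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("parameters: " + read(XML, "parameter", "name", "value"));
        System.out.println("registered classes: " + read(XML, "register", "role", "class"));
    }
}
```

Keeping control parameters and registered class names in one file is what lets several scenarios of the same problem differ only in their description, not in compiled code.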
To allow the described functionality we used the Java reflection feature. Reflection enables Java code to discover information about the fields, methods and constructors of loaded classes, and to use reflected fields, methods, and constructors to operate on their underlying counterparts on objects, within security restrictions. This allows us to load or switch the necessary classes at runtime. On the other hand, this kind of programming carries a runtime cost and reduces the efficiency of the algorithm. Registered classes (Fig. 4) can be divided into two groups. The first group contains the specific problem classes, and the second the GPK kernel classes. For every specific problem we must implement at least two interfaces. The first one is “IFitness”, in which we describe the problem (program) evaluation criteria. The second is “IEnvironment”, which defines the problem state and its terminals and functions. We can also register predefined terminals and functions. In most cases, instead of implementing all interfaces, we can simply inherit from a predefined class that implements the basic functionality.
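A minimal sketch of this mechanism, using only standard reflection calls (Class.forName, Class.asSubclass, Constructor.newInstance): a class name, as it might appear in a scenario file, is resolved and instantiated at runtime and then used through an interface. The interface FitnessSketch and the class ConstantFitness are hypothetical and are not GPK classes; the real “IFitness” signature is not assumed here.

```java
// Hypothetical interface and implementation, for illustration only.
interface FitnessSketch {
    double fitness();
}

class ConstantFitness implements FitnessSketch {
    public ConstantFitness() { }

    @Override
    public double fitness() {
        return 89.0;
    }
}

final class ReflectionLoader {
    // Loads the named class, checks that it implements the expected type and
    // creates an instance through its no-argument constructor.
    static <T> T instantiate(String className, Class<T> expectedType)
            throws ReflectiveOperationException {
        Class<?> raw = Class.forName(className);
        // asSubclass throws ClassCastException if the class has the wrong type.
        Class<? extends T> typed = raw.asSubclass(expectedType);
        return typed.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        // In a real setup the name would come from the problem description file.
        String configuredName = ConstantFitness.class.getName();
        FitnessSketch fitness = instantiate(configuredName, FitnessSketch.class);
        System.out.println(fitness.getClass().getName() + " -> " + fitness.fitness());
    }
}
```

One way to limit the runtime cost mentioned above is to use reflection only once, when the scenario is loaded, and to call the resulting objects through their interfaces afterwards.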
Figure 4: Problem description 2

GPK Generating and Breeding Trees. At the beginning of the evolution process, the initial individuals are generated at random. The GPK implements three tree-generation algorithms (grow, full, ramped half-and-half) with Strongly Typed Genetic Programming (STGP) [20], which adds type constraints to the return values and child arguments of nodes. The tree-building algorithms choose between terminals and functions based on the return type required at the current node of the tree. This principle is also used in the breeding process.
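The following sketch shows one common way to implement type-constrained grow, full and ramped half-and-half generation. It is not taken from the GPK sources: the classes Primitive, TypedTree and TreeGenerator and the small demo primitive set are hypothetical, and the fallback to terminals when no function of the required type exists is a design choice made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical primitive: a name, a return type and the types of its arguments.
final class Primitive {
    final String name;
    final Class<?> returnType;
    final Class<?>[] argTypes;                       // empty for terminals

    Primitive(String name, Class<?> returnType, Class<?>... argTypes) {
        this.name = name;
        this.returnType = returnType;
        this.argTypes = argTypes;
    }

    boolean isTerminal() { return argTypes.length == 0; }
}

final class TypedTree {
    final Primitive primitive;
    final List<TypedTree> children = new ArrayList<>();

    TypedTree(Primitive primitive) { this.primitive = primitive; }

    int depth() {                                    // a single leaf has depth 0
        int deepestChild = -1;
        for (TypedTree child : children) deepestChild = Math.max(deepestChild, child.depth());
        return deepestChild + 1;
    }

    boolean isWellTyped() {                          // every child returns what its parent expects
        if (children.size() != primitive.argTypes.length) return false;
        for (int i = 0; i < children.size(); i++) {
            TypedTree child = children.get(i);
            if (child.primitive.returnType != primitive.argTypes[i] || !child.isWellTyped()) return false;
        }
        return true;
    }
}

final class TreeGenerator {
    private final List<Primitive> primitives;
    private final Random rng;

    TreeGenerator(List<Primitive> primitives, Random rng) {
        this.primitives = primitives;
        this.rng = rng;
    }

    // full == true builds a full tree, full == false a grow tree.
    TypedTree generate(Class<?> type, int maxDepth, boolean full) {
        List<Primitive> terminals = new ArrayList<>();
        List<Primitive> functions = new ArrayList<>();
        for (Primitive p : primitives) {
            if (p.returnType != type) continue;      // type constraint on the return value
            (p.isTerminal() ? terminals : functions).add(p);
        }
        List<Primitive> candidates = new ArrayList<>();
        if (maxDepth > 0) candidates.addAll(functions);
        // Terminals are allowed at the depth limit, always in grow mode,
        // and as a fallback when no function returns the required type.
        if (maxDepth == 0 || !full || functions.isEmpty()) candidates.addAll(terminals);
        if (candidates.isEmpty()) throw new IllegalStateException("no usable primitive returns " + type);
        TypedTree node = new TypedTree(candidates.get(rng.nextInt(candidates.size())));
        for (Class<?> argType : node.primitive.argTypes) {
            node.children.add(generate(argType, maxDepth - 1, full));
        }
        return node;
    }

    // Ramped half-and-half: cycle through the depth range, alternating full and grow per cycle.
    List<TypedTree> rampedHalfAndHalf(Class<?> type, int size, int minDepth, int maxDepth) {
        int range = maxDepth - minDepth + 1;
        List<TypedTree> population = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            population.add(generate(type, minDepth + i % range, (i / range) % 2 == 0));
        }
        return population;
    }

    // Small demo primitive set with two types (double and boolean).
    static List<Primitive> demoPrimitives() {
        return Arrays.asList(
                new Primitive("x", double.class),
                new Primitive("1", double.class),
                new Primitive("true", boolean.class),
                new Primitive("+", double.class, double.class, double.class),
                new Primitive("<", boolean.class, double.class, double.class),
                new Primitive("if", double.class, boolean.class, double.class, double.class));
    }

    public static void main(String[] args) {
        TreeGenerator generator = new TreeGenerator(demoPrimitives(), new Random());
        TypedTree tree = generator.generate(double.class, 3, true);
        System.out.println("full tree depth: " + tree.depth() + ", well typed: " + tree.isWellTyped());
    }
}
```

The same candidate filtering by return type can be reused when choosing crossover points or mutation subtrees, which is how the typing principle carries over to breeding.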
Figure 3: Problem description 1

5 GPK Example
To demonstrate the framework we use the well-known artificial ant problem. The problem was selected because of its specifics and simplicity. It has been studied intensively by Koza [15], who showed that its multiple ridges and local optima make it a difficult problem. The goal is to find a computer program that steers an ant over a trail of food pieces, eating as much food as possible. There are several well-known trails used for this problem; we used the one known as the Santa Fe trail, which lies on a 32 × 32 toroidal grid and contains 89 pieces of food. The success criterion for an artificial ant program is then to steer the ant to eat these 89 pieces of food within 600 steps.

Figure 2: GPK Interfaces

The space of allowable ant programs can be described with the following functions and terminals:
• move,
• left,
• right,
• if-food-ahead(X,Y),
• prog2(X,Y) and
• prog3(X,Y,Z)
(where X, Y, Z are terminals or functions).

They are implemented in the class AntEnvironment, which implements the interface “IEnvironment”. For the implementation of functions and terminals we used the interface “ITreeNode”, which is a basic GP tree node (Fig. 5).

    public ITreeNode if_food_ahead(ITreeNode g1, ITreeNode g2) {
        boolean isFoodAhead = false;
        // Start from the ant's position and move one cell in its direction,
        // wrapping around the edges of the toroidal grid.
        int tmpPosX = m_posX;
        int tmpPosY = m_posY;
        switch (m_direction) {
            case LEFT:
                tmpPosX--;
                if (tmpPosX < 0) tmpPosX = m_maxX - 1;
                isFoodAhead = isFoodAhead(tmpPosX, tmpPosY);
                break;
            case RIGHT:
                tmpPosX++;
                if (tmpPosX > m_maxX - 1) tmpPosX = 0;
                isFoodAhead = isFoodAhead(tmpPosX, tmpPosY);
                break;
            case UP:
                tmpPosY--;
                if (tmpPosY < 0) tmpPosY = m_maxY - 1;
                isFoodAhead = isFoodAhead(tmpPosX, tmpPosY);
                break;
            case DOWN:
                tmpPosY++;
                if (tmpPosY > m_maxY - 1) tmpPosY = 0;
                isFoodAhead = isFoodAhead(tmpPosX, tmpPosY);
                break;
        }
        // Evaluate the first branch if food is ahead, otherwise the second.
        if (isFoodAhead) g1.evaluate(); else g2.evaluate();
        return null;
    }

Figure 5: Environment function implementation

For fitness we must calculate the success of the generated program on a toroidal grid. The grid is represented by a static boolean array of dimension 32 × 32, in which positions with food are marked with 1. The representation is implemented in the AntFitness class (“IFitness” interface). The main function for setting the fitness value is calculateFitness (Fig. 6).

    // in class AntFitness
    public void calculateFitness(IProgram program) {
        m_nodeCount = ((ITree) program).getNodeCount();
        // MAXX, MAXY are dimension constants (32)
        // MAP is description of food position
        AntEnvironment env = new AntEnvironment(
                copyFood(MAP, MAXX, MAXY), MAXX, MAXY);
        // NUM_OF_TIME_STEPS is 600 steps
        // FULL_FOOD_COUNT is 89 for Santa Fe trail
        // Re-run the program until the step budget is used up or all food is eaten.
        while (env.m_steps < NUM_OF_TIME_STEPS
                && env.m_foundFood < FULL_FOOD_COUNT) {
            program.evaluate(env);
        }
        m_foundFood = env.m_foundFood;
        m_steps = env.m_steps;
    }

Figure 6: Fitness function implementation

For genetic operators we used the predefined crossover, mutation and selection operators. The scenario of our problem with all specific descriptions is described in XML (Fig. 7).
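The wrap-around handling in if_food_ahead (Fig. 5) can also be expressed with modular arithmetic. The sketch below is not GPK code: Direction, ToroidalGrid and their members are hypothetical names, the grid is assumed to be indexed as food[y][x], and Math.floorMod is used so that stepping off one edge re-enters the grid on the opposite one.

```java
// Hypothetical helper types; they are not part of GPK.
enum Direction {
    UP(0, -1), RIGHT(1, 0), DOWN(0, 1), LEFT(-1, 0);   // clockwise order, y grows downwards

    final int dx;
    final int dy;

    Direction(int dx, int dy) {
        this.dx = dx;
        this.dy = dy;
    }

    Direction turnLeft()  { return values()[Math.floorMod(ordinal() - 1, 4)]; }
    Direction turnRight() { return values()[Math.floorMod(ordinal() + 1, 4)]; }
}

final class ToroidalGrid {
    private final boolean[][] food;      // food[y][x] is true where a food piece lies
    private final int width;
    private final int height;

    ToroidalGrid(boolean[][] food) {
        this.food = food;
        this.height = food.length;
        this.width = food[0].length;
    }

    // Coordinates of the cell directly ahead, wrapped onto the torus.
    int aheadX(int x, Direction d) { return Math.floorMod(x + d.dx, width); }
    int aheadY(int y, Direction d) { return Math.floorMod(y + d.dy, height); }

    boolean isFoodAhead(int x, int y, Direction d) {
        return food[aheadY(y, d)][aheadX(x, d)];
    }
}

final class ToroidalDemo {
    public static void main(String[] args) {
        boolean[][] food = new boolean[32][32];
        food[0][31] = true;                               // one piece in the top right corner
        ToroidalGrid grid = new ToroidalGrid(food);
        // An ant in the top left corner facing LEFT looks at the top right corner.
        System.out.println(grid.isFoodAhead(0, 0, Direction.LEFT));
    }
}
```

Storing the step offsets in the direction itself removes the four nearly identical switch branches, at the price of one modulo operation per lookup.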
6 Future Work
Figure 7: Ant scenario file

The GPK currently implements the following GP problems: the described Ant problem, Cart centering problem, Symbolic regression, Symbolic integration, Symbolic differentiation, Differential equations, Integral equations, General functional equations, Inverse problem, Trigonometric identities and Sequence induction. They are all tested on simple case studies. In the future, more complex and real problems will be implemented with the existing framework. To solve more “difficult” problems, we will need to increase the variety of predefined genetic operators and introduce code growth limitation techniques. Currently implemented are max tree depth, initial tree depth and initial population generation (grow, full, and ramped half-and-half). Planned additions include a mechanism for picking the right breeding operations [17], a more dynamic determination of when to give up searching for a better solution, and different constant optimization techniques [11]. Most of the planned work can be implemented without changing the framework core. Speed and memory consumption are two further issues we plan to address. It takes significant effort to be efficient in Java. Adding a distributed model to the core of the framework could significantly speed up the researcher’s work. We are also currently working on a web page with all information about our framework. It will be an open source project where interested people will be able to freely use and continue the development of the framework.
Finally, we plan to improve the user interface to allow better visualization, more evolution statistics, ease of use, etc. For studying purposes we will add an applet based version.
7 Conclusion
A new architecture for the GP framework is proposed. The system, written completely in Java, is easy to use and extend. Built on object-oriented foundations, using design patterns and Java features like interfaces, reflection, exception handling, etc., it helps us with studying, researching and teaching GP. The paper describes the design, the functionality and the usage of the framework. If we compare the GPK with other related works, we can see that our design is modern and has its own advantages. On the other hand, there exist much more robust, complex and well-known frameworks like ECJ or Open BEAGLE, which were described earlier in this paper. The motivation in developing the GPK was the desire to research and the need to demonstrate and to teach the advantages of GP.
References
[1] A Java-based Evolutionary Computation and Genetic Programming Research System, available at http://www.cs.umd.edu/projects/plus/ec/.
[2] A source of information about the field of genetic programming, available at http://www.genetic-programming.org.
[3] EO Evolutionary Computation Framework, available at http://geneura.ugr.es/~jmerelo/eo.html.
[4] Genetic Programming in C++, available at http://www.cs.ucl.ac.uk/research/genprog/.
[5] GNU General Public License, available at http://www.gnu.org/licenses/gpl.html.
[6] Open BEAGLE, a versatile EC framework, available at http://www.gel.ulaval.ca/~beagle/.
[7] The Source for Java Technology, available at http://java.sun.com/.
[8] Ken Arnold, James Gosling, and David Holmes. The Java Programming Language. Sun Press, 2000.
[9] T. Back, D. Fogel, and Z. Michalewicz. Handbook of Evolutionary Computation. IOP Publishing Ltd and Oxford University Press, 1997.
[10] James W. Cooper. The Design Patterns Java Companion. Addison-Wesley, 1998.
[11] Matthew Evett and Thomas Fernandez. Numeric mutation improves the discovery of numeric constants in genetic programming. In John R. Koza, Wolfgang Banzhaf, Kumar Chellapilla, Kalyanmoy Deb, Marco Dorigo, David B. Fogel, Max H. Garzon, David E. Goldberg, Hitoshi Iba, and Rick Riolo, editors, Genetic Programming 1998: Proceedings of the Third Annual Conference, pages 66–71, University of Wisconsin, Madison, Wisconsin, USA, 22-25 July 1998. Morgan Kaufmann.
[12] E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, 1995.
[13] Christian Gagné and Marc Parizeau. Open BEAGLE: A new versatile C++ framework for evolutionary computation. Genetic and Evolutionary Computation Conference (GECCO), 2002.
[14] J. Holland. Adaptation in Natural and Artificial Systems. University of Michigan Press, 1975.
[15] John R. Koza. Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge, MA, USA, 1992.
[16] Ying-Hong Liao and Chuen-Tsai Sun. An educational genetic algorithms learning tool. IEEE Transactions on Education, 44(2), page 210, 2001.
[17] Sean Luke. Issues in Scaling Genetic Programming: Breeding Strategies, Tree Generation, and Code Bloat. PhD thesis, University of Maryland, 2000.
[18] M. Mernik, M. Črepinšek, and V. Žumer. A meta-evolutionary approach in searching of the best combination of crossover for the TSP. Proceedings of Neural Networks NN’2000, pages 32–35, 2000.
[19] Z. Michalewicz. Genetic Algorithms + Data Structures = Evolution Programs. Springer Verlag, 1996.
[20] D.J. Montana. Strongly typed genetic programming. Evolutionary Computation 3(2), pages 199–230, 1995.
[21] Erik T. Ray. Learning XML. O’Reilly, 2001.
[22] Danijel Žnuderl, Matej Črepinšek, Marjan Mernik, and Viljem Žumer. GPK: A Java based genetic programming kernel. Technical report, Faculty of Electrical Engineering and Computer Science, University of Maribor, 2002. In Slovene.