Automatic Modeling of Complex Functions with Clonal Selection-based Gene Expression Programming

Zhaohui Gan, Department of Information Science & Engineering, Wuhan University of Science & Technology, Wuhan, 430081, China
Zhenkun Yang, Department of Information Science & Engineering, Wuhan University of Science & Technology, Wuhan, 430081, China
Gaobin Li, Department of Information Science & Engineering, Wuhan University of Science & Technology, Wuhan, 430081, China
Min Jiang, Department of Computer Science, Wuhan University of Science & Technology, Wuhan, 430081, China
Abstract
Gene Expression Programming (GEP) is a powerful evolutionary algorithm derived from Genetic Algorithm and Genetic Programming for system modeling and knowledge discovery. However, when dealing with complex problems, GEP converges quite slowly and is prone to premature convergence. This paper proposes Clonal Selection-based Gene Expression Programming (CS-GEP), which combines the advantages of the Clonal Selection Algorithm (CSA) and GEP and overcomes some drawbacks of GEP. CS-GEP is applied to function modeling experiments; the results show that CS-GEP converges faster and achieves higher modeling precision than GEP.

1. Introduction
A frequently encountered problem in analyzing real-world systems is finding a mathematical model that expresses the relationships in the resulting data with minimal error. Such a modeling task is difficult and time consuming. Conventional methods require human experts with a thorough command of the domain-specific knowledge, so human experts play a very important role in developing a system model. Recent work in automatic programming has demonstrated that evolutionary computation, such as Genetic Programming (GP) and Gene Expression Programming (GEP), can discover useful models for complex systems. GP was first introduced by Cramer and further developed by Koza [1]; it is a powerful regression tool and has been applied to complex system modeling for several years [2-5]. However, GP converges comparatively slowly on complex problems because of its nonlinear representation.

Gene Expression Programming, a new evolutionary technique used for data analysis, was invented by Candida Ferreira in 1999 [6]. Like Genetic Algorithm (GA) and GP, it uses a population of individuals, and selects and evolves them according to their fitness using genetic operators. The main difference among GA, GP and GEP is the form of individual encoding: individuals in GA are linear strings of fixed length, GP individuals are trees of different shapes and sizes, and GEP individuals are also trees of different shapes and sizes but are encoded as linear strings of fixed length using Karva notation [7]. GEP thus combines the advantages of both GA and GP while overcoming some of their shortcomings. In particular, thanks to its simple encoding method, GEP produces no invalid individuals. However, it has some drawbacks. Because GEP is derived from GA and GP, it is prone to premature convergence. In GEP, individuals are selected according to their fitness by roulette-wheel sampling with elitism; although the best individual is preserved, other good individuals may be lost. These disadvantages impair the efficiency and effectiveness of GEP.

The Clonal Selection Algorithm (CSA), a member of the Artificial Immune System family, has been successfully applied to several challenging domains, such as multimodal optimization, data mining and pattern recognition. CSA
Third International Conference on Natural Computation (ICNC 2007) 0-7695-2875-9/07 $25.00 © 2007
can enhance the diversity of the population and converge faster; it is capable of preserving local solutions and eventually finding the optimal solution. In this paper, we propose Clonal Selection-based Gene Expression Programming (CS-GEP), which combines the simple representation of GEP with the advantages of the Clonal Selection Algorithm. Unlike typical GEP, CS-GEP adopts some immune operators, which overcome the aforementioned drawbacks of GEP. This paper is organized as follows. Section 2 gives an overview of the related work. Section 3 explains the CS-GEP algorithm in detail. The experiments and the discussion are covered in Section 4. Conclusions and further work are given in Section 5.
2. Related Work and Motivations

2.1. An Overview of GEP
Gene Expression Programming is a new evolutionary algorithm for the automatic creation of computer programs. Similar to GP, GEP uses a randomly generated population and applies genetic operators to this population until the algorithm finds an individual that satisfies some termination criterion. Fig. 1 illustrates the flowchart of the typical GEP algorithm.

Fig. 1. The flowchart of typical GEP (flowchart labels: Start; Create the Initial Population; Evaluate fitness of Chromosomes; Terminate Criterion Satisfied?; Yes: End; No: Select new Population; Mutation, Transposition, Recombination; Chromosomes for Next Generation)

2.1.1. GEP Genes and Chromosomes. In GEP, each gene is composed of a fixed-length string of symbols divided into a head and a tail. The head contains symbols that represent both functions and terminals, whereas the tail contains only terminals. For each problem, the length of the head h is chosen; the length of the tail t is a function of h and of n, the number of arguments of the function with the most arguments, and is given by:

$$t = h \times (n - 1) + 1 \tag{1}$$

For example, from the function set F = {E, Q, +, -, *, /} and the terminal set T = {x, y}, an algebraic expression can be:

$$\sqrt{e^{xy} + y + \sqrt{x}} \tag{2}$$

Supposing h = 10 and n = 2, so that t = 11, the gene is shown in Fig. 2 (the head occupies positions 0-9 and the tail positions 10-20):

012345678901234567890
Q+E+*yQxyxyxxxyyxyxyx

Fig. 2. The gene of equation 2

where Q represents the square-root function and E represents the exponential function. The gene can be represented by the expression tree shown in Fig. 3 (a). GEP chromosomes are usually composed of several genes of equal length. The interaction between the genes is specified by the linking function. An example of a four-gene chromosome linked by addition is shown in Fig. 3 (b).

Fig. 3. (a) The expression tree of equation 2 and (b) an example of a four-gene chromosome linked by addition
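To illustrate equation 1 and how a gene such as the one for equation 2 is read, the following minimal Python sketch computes the tail length and decodes a gene into an expression tree by filling the tree level by level, left to right. The names `ARITY`, `tail_length`, `decode`, `to_infix` and `evaluate` are hypothetical and chosen only for this illustration; the paper does not give an implementation.

```python
import math

# Arity of each function symbol in F = {E, Q, +, -, *, /}; terminals have arity 0.
ARITY = {"E": 1, "Q": 1, "+": 2, "-": 2, "*": 2, "/": 2}

GENE = "Q+E+*yQxyxyxxxyyxyxyx"  # the gene of equation 2 (h = 10, t = 11)


def tail_length(h, n):
    """Equation 1: t = h * (n - 1) + 1."""
    return h * (n - 1) + 1


def decode(gene):
    """Build the expression tree breadth-first; each node is [symbol, children]."""
    root = [gene[0], []]
    queue = [root]
    pos = 1
    while queue:
        node = queue.pop(0)
        # A function consumes as many of the next symbols as it has arguments.
        for _ in range(ARITY.get(node[0], 0)):
            child = [gene[pos], []]
            pos += 1
            node[1].append(child)
            queue.append(child)
    return root


def to_infix(node):
    """Render the tree as a fully parenthesized infix string."""
    sym, kids = node
    args = [to_infix(k) for k in kids]
    if sym == "E":
        return "exp(" + args[0] + ")"
    if sym == "Q":
        return "sqrt(" + args[0] + ")"
    if sym in ARITY:
        return "(" + args[0] + " " + sym + " " + args[1] + ")"
    return sym


def evaluate(node, env):
    """Evaluate the tree for the terminal values given in env, e.g. {"x": 1, "y": 2}."""
    sym, kids = node
    args = [evaluate(k, env) for k in kids]
    if sym == "E":
        return math.exp(args[0])
    if sym == "Q":
        return math.sqrt(args[0])
    if sym == "+":
        return args[0] + args[1]
    if sym == "-":
        return args[0] - args[1]
    if sym == "*":
        return args[0] * args[1]
    if sym == "/":
        return args[0] / args[1]
    return env[sym]


tree = decode(GENE)
print(tail_length(10, 2), to_infix(tree))
```

Only the first ten symbols of this gene are used by the tree; the remaining tail symbols are unexpressed here, but they guarantee that any head still decodes to a complete tree.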
2.1.2. Main Operators of GEP. GEP has several genetic operators, including selection, mutation, transposition and recombination. Individuals selected for the next generation are modified by these operators.
Selection. In GEP, individuals are selected according to their fitness by roulette-wheel sampling with elitism. The best individual is cloned directly into the next generation.
Mutation. In GEP, mutation can occur at any position of any gene in the chromosome. Any symbol in the head of a gene can be changed into another symbol (function or terminal); however, terminals in the tail of a gene can only be changed into other terminals.
Transposition and insertion sequence elements. Insertion sequence (IS) transposition, root IS (RIS) transposition and gene transposition are used in GEP. These operators copy a sequence of elements and insert it at another location.
Recombination. One-point recombination, two-point recombination and gene recombination are used in GEP. In all types of recombination, two chromosomes are randomly chosen to swap information between them.
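Three of the operators described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it treats a chromosome as a single gene stored as a string, the function and terminal sets are taken from the example of equation 2, fitness values are assumed to be non-negative, and the names `select`, `mutate`, `one_point_recombination` and the `rate` parameter are hypothetical.

```python
import random

FUNCTIONS = ["E", "Q", "+", "-", "*", "/"]
TERMINALS = ["x", "y"]


def select(population, fitness, rng):
    """Roulette-wheel sampling with elitism: the best individual is always kept."""
    best = max(range(len(population)), key=lambda i: fitness[i])
    # Each remaining slot is filled with probability proportional to fitness.
    sampled = rng.choices(population, weights=fitness, k=len(population) - 1)
    return [population[best]] + sampled


def mutate(gene, head_len, rate, rng):
    """Point mutation that keeps the tail free of function symbols."""
    symbols = list(gene)
    for i, old in enumerate(symbols):
        if rng.random() < rate:
            # Head positions may take any symbol; tail positions only terminals.
            pool = FUNCTIONS + TERMINALS if i < head_len else TERMINALS
            symbols[i] = rng.choice([s for s in pool if s != old])
    return "".join(symbols)


def one_point_recombination(a, b, rng):
    """Swap everything after a random crossover point between two parents."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]


rng = random.Random(0)
parent = "Q+E+*yQxyxyxxxyyxyxyx"
mutant = mutate(parent, head_len=10, rate=0.1, rng=rng)
child_a, child_b = one_point_recombination(parent, mutant, rng)
print(mutant, child_a, child_b)
```

Because both parents have the same head and tail lengths, swapping material at the same position never moves a function symbol into a tail, so the offspring remain valid genes.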
2.2. Clonal Selection Algorithm
The Artificial Immune System (AIS) is a new kind of evolutionary algorithm inspired by the natural immune system to solve real-world problems. The Clonal Selection Principle (CSP) establishes the idea that only those cells that recognize the antigens are selected to proliferate. When an antibody recognizes an antigen with a certain affinity, it is selected to proliferate asexually. During asexual reproduction, the B-cell clones undergo a hypermutation process which, together with a high selection pressure, improves their affinities. At the end of this process, the B-cells with higher antigenic affinities are selected to become memory cells with long life spans. Based on the CSP, the Clonal Selection Algorithm (CLONALG) was proposed by de Castro and Von Zuben to solve complex problems [8].
2.3. Motivation from GEP and CSA
GEP retains the benefits of GA and GP, and its chromosome is simple and efficient thanks to the linear encoding method. However, GEP also suffers from premature convergence. CSA enhances the diversity of the population and converges fast. These advantages motivated us to apply some operators of CSA to GEP in order to overcome the disadvantages of GEP.
3. Clonal Selection-based Gene Expression Programming
The major contributions of CS-GEP are the adoption of the linear representation of GEP and the application of some immune operators of CSA. Following the idea of CSA, only the best individuals in the current population are cloned and modified. After modification, only the best individuals are selected for the next generation. In order to maintain population diversity, some randomly generated individuals replace some of the individuals with lower fitness in the population. Together, these operators are intended to enhance the population diversity and to find the global optimal solution quickly, which is why CS-GEP can overcome the premature convergence of GEP. The details of the algorithm are given below.
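The clone, modify, select and replace cycle described above can be sketched as follows. This is only an illustrative sketch under several assumptions that the text does not fix: the parameters `n_best`, `n_clones` and `n_random` are hypothetical, parents compete with their modified clones for survival, and a toy bit-string problem (fitness equals the number of 1s) stands in for GEP chromosomes and their operators.

```python
import random


def clonal_generation(population, fitness, mutate, random_individual,
                      n_best, n_clones, n_random, rng):
    """One generation: clone and modify the best, keep the best, inject newcomers."""
    ranked = sorted(population, key=fitness, reverse=True)
    # Clone each of the n_best individuals n_clones times and modify every clone.
    clones = [mutate(ind, rng) for ind in ranked[:n_best] for _ in range(n_clones)]
    # Keep only the fittest among parents and modified clones. Retaining the
    # parents is an assumption; it ensures the current best is never lost.
    survivors = sorted(ranked + clones, key=fitness, reverse=True)
    survivors = survivors[:len(population) - n_random]
    # Randomly generated individuals take the place of the low-fitness remainder.
    newcomers = [random_individual(rng) for _ in range(n_random)]
    return survivors + newcomers


# Toy stand-ins for the GEP-specific parts (assumptions for illustration only).
def toy_fitness(ind):
    return ind.count("1")


def toy_mutate(ind, rng):
    i = rng.randrange(len(ind))
    flipped = "1" if ind[i] == "0" else "0"
    return ind[:i] + flipped + ind[i + 1:]


def toy_random(rng):
    return "".join(rng.choice("01") for _ in range(8))


rng = random.Random(0)
pop = [toy_random(rng) for _ in range(10)]
for _ in range(20):
    pop = clonal_generation(pop, toy_fitness, toy_mutate, toy_random,
                            n_best=3, n_clones=4, n_random=2, rng=rng)
print(max(toy_fitness(ind) for ind in pop))
```

In a CS-GEP setting, `mutate` would be replaced by the GEP modification operators and `fitness` by the modeling error of the decoded expression; the population size stays constant from one generation to the next.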
3.1. The framework of our algorithm
Our algorithm is based on GEP and incorporates the Clonal Selection Algorithm into classical GEP; we call it Clonal Selection-based Gene Expression Programming (CS-GEP). The antibody representation in CS-GEP is the same as the gene representation in GEP, which is encoded as a string of symbols,
compared with the binary string used in CSA. The CS-GEP algorithm is described as follows:
1) Initialization: an initial population P (N individuals) is generated.
2) Evaluate the fitness of the individuals.
3) Select the n (n