Grammar Guided Genetic Programming for Flexible Neural Trees Optimization

Peng Wu and Yuehui Chen

School of Information Science and Engineering, Jinan University, Jinan 250022, P.R. China
{ise wup,yhchen}@ujn.edu.cn

Abstract. In our previous studies, Genetic Programming (GP), Probabilistic Incremental Program Evolution (PIPE) and Ant Programming (AP) have been used for the optimal design of Flexible Neural Trees (FNT). In this paper, Grammar Guided Genetic Programming (GGGP) is employed to optimize the architecture of the FNT model. Based on predefined instruction sets, a flexible neural tree model can be created and evolved. This framework allows input variable selection, over-layer connections and different activation functions for the various nodes involved. The free parameters embedded in the neural tree are optimized by a particle swarm optimization algorithm. Empirical results on stock index prediction problems indicate that the proposed method outperforms neural network and genetic programming forecasting models.

1 Introduction

There has been growing interest in evolving the architecture and parameters of higher-order Sigma-Pi neural networks based on a sparse neural tree encoding [1]. Recently, several approaches for evolving the neural tree model based on tree-structure-based evolutionary algorithms and random search algorithms have been proposed in [11][12][14]. Antonisse [15] was the first to use grammars to constrain the generation of chromosomes in his proposed system, called the grammar-based GA. Subsequently, several grammar-based GP systems were proposed. Stefanski [16] proposed the use of abstract syntax trees to set a declarative bias for GP. Robston [6] demonstrated how a formal grammar might be used to specify constraints for GP in the context of engineering design. Mizoguchi and Hemmi [7] suggested the use of production rules to generate hardware language descriptions during the evolutionary process. Three typical grammar guided GP systems are Whigham's CFG-GP system [8], Schultz's grammar-based expert systems, and Wong's LOGENPRO system [10][5]. Grammar Guided Genetic Programming (GGGP) [3][4] is a typical tree-structure-based genetic programming system. GGGP uses a grammar to constrain the search space, and every individual GP tree in GGGP must respect the grammar. This overcomes the closure problem in GP and provides a more formalized mechanism for typing (strongly-typed genetic programming).


Fig. 1. A flexible neuron operator (left), and a typical representation of the FNT with function instruction set F = {+2, +3, +4, +5, +6} and terminal instruction set T = {x1, x2, x3} (right)

Actually, the grammar model can do more than just constrain the search space. In Whigham's work [9], in addition to the normal GGGP search, the grammar is slightly modified during the search; the updated grammar represents the knowledge accumulated in the course of the search. In this paper, GGGP is employed for the first time to optimize the Flexible Neural Tree (FNT). Based on pre-defined instruction/operator sets, a flexible neural tree model can be created and evolved. The FNT allows input variable selection, over-layer connections and different activation functions for different nodes. In our previous work, the hierarchical structure of the FNT was evolved using PIPE with specific instructions [11][12]. In this research, the hierarchical structure is evolved using GGGP. The fine tuning of the parameters encoded in the structure is accomplished using Particle Swarm Optimization (PSO) [20]. The novelty of this paper lies in the use of GGGP for flexible neural tree optimization and for selecting the important inputs in the modeling of stock indices. The rest of the paper is organized as follows. A brief introduction to Grammar Guided Genetic Programming is given in Section 2, together with a hybrid learning algorithm for evolving the FNT. Simulation results for stock index prediction are given in Section 3. Finally, conclusions are drawn in Section 4.

2 The Flexible Neural Tree Model

The function instruction set F and terminal instruction set T used for generating an FNT model are described as $S = F \cup T = \{+_2, +_3, \ldots, +_N\} \cup \{x_1, \ldots, x_n\}$, where $+_i$ ($i = 2, 3, \ldots, N$) denote non-leaf nodes' instructions taking i arguments, and $x_1, x_2, \ldots, x_n$ are leaf nodes' instructions taking no arguments. The output of a non-leaf node is calculated as a flexible neuron model (see Fig. 1). From this point of view, the instruction $+_i$ is also called a flexible neuron operator with i inputs. In the creation process of a neural tree, if a non-terminal instruction $+_i$ ($i = 2, 3, 4, \ldots, N$) is selected, i real values are randomly generated and used to represent the connection strengths between the node $+_i$ and its children. In addition, two adjustable parameters $a_i$ and $b_i$ are randomly created as flexible activation function parameters. For developing the forecasting model, the flexible activation function $f(a_i, b_i, x) = e^{-\left(\frac{x - a_i}{b_i}\right)^2}$ is used. The total excitation of $+_n$ is $net_n = \sum_{j=1}^{n} w_j \cdot x_j$, where $x_j$ ($j = 1, 2, \ldots, n$) are the inputs to node $+_n$. The output of node $+_n$ is then calculated by $out_n = f(a_n, b_n, net_n) = e^{-\left(\frac{net_n - a_n}{b_n}\right)^2}$. The overall output of the flexible neural tree can be computed recursively, from left to right, by a depth-first method.

2.1 Tree Structure Optimization by GGGP

Grammar Guided Genetic Programming (GGGP) is one of the important extensions of GP [2]. The main purpose of GGGP is to overcome the closure problem [2], i.e., the generation and preservation of valid programs in a GP system. Grammars are used to guide the generation of programs in GP, so that a declarative bias can be set on the space of programs. In this research, a Context-free Grammar (CFG) [9] was chosen for FNT optimization. A CFG consists of four sets, G = {N, T, P, Σ}, where N is a set of non-terminal symbols, T is a set of terminal symbols, P is a set of production rules and Σ is the start symbol, with N ∩ T = ∅ and Σ ∈ N. The production rules have the format x → y, where x ∈ N and y ∈ N ∪ T. The production rules specify how the non-terminal symbols are rewritten into one of their derivations until the expression contains terminal symbols only. As an example (Fig. 2), a CFG for generating simple one-variable arithmetic expressions can be described as follows:

s → exp
exp → exp op exp
exp → pre exp
exp → var
pre → sin | cos
op → + | −
var → x

Although the components of GGGP are the same as those of GP, there are still some distinct differences between the two. In GGGP a tree-based program is generated according to the context-free grammar. In crossover, two internal nodes labeled with the same non-terminal symbol of the grammar are chosen at random, and the two sub-derivation trees underneath them are exchanged. In mutation, a new randomly generated sub-derivation tree rooted at the same non-terminal symbol replaces the sub-derivation tree of the selected node. Apart from these operators, the general evolutionary process in GGGP is the same as in GP. For a detailed description of the GGGP algorithm, please refer to [3] and [4].
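As a rough illustration (not the paper's code), the example grammar can be encoded as a production table and expanded into random derivation trees, which is essentially how GGGP initializes its population; the function names, dictionary layout and depth-limiting strategy below are assumptions.

```python
import random

# The example CFG above, written as a production table:
# each non-terminal maps to a list of possible right-hand sides.
GRAMMAR = {
    "s":   [["exp"]],
    "exp": [["exp", "op", "exp"], ["pre", "exp"], ["var"]],
    "pre": [["sin"], ["cos"]],
    "op":  [["+"], ["-"]],
    "var": [["x"]],
}

def derive(symbol, depth=0, max_depth=5):
    """Randomly expand a symbol into a derivation tree (nested lists)."""
    if symbol not in GRAMMAR:                 # terminal symbol: leave as-is
        return symbol
    rules = GRAMMAR[symbol]
    if depth >= max_depth:                    # near the limit, pick the shortest rule
        rules = [min(rules, key=len)]
    rhs = random.choice(rules)
    return [symbol] + [derive(s, depth + 1, max_depth) for s in rhs]

def expression(tree):
    """Read the terminal string (the generated expression) off a derivation tree."""
    if isinstance(tree, str):
        return [tree]
    return [tok for child in tree[1:] for tok in expression(child)]

print(" ".join(expression(derive("s"))))      # e.g. something like: sin x + x
```

GGGP's crossover and mutation then operate directly on such derivation trees, always at nodes labeled with the same non-terminal, so every offspring remains a valid sentence of the grammar.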

2.2 Parameter Optimization with PSO

The Particle Swarm Optimization (PSO) algorithm conducts its search using a population of particles, which correspond to individuals in an evolutionary algorithm (EA).


Fig. 2. Derivation tree of the expression sin(x) + cos(x) − x

A population of particles is randomly generated initially. Each particle represents a potential solution and has a position represented by a position vector xi. A swarm of particles moves through the problem space, with the moving velocity of each particle represented by a velocity vector vi. At each time step, a function fi representing a quality measure is calculated using xi as input. Each particle keeps track of its own best position, which is associated with the best fitness it has achieved so far, in a vector pi. Furthermore, the best position obtained so far among all the particles in the population is tracked as pg. In addition to this global version, another version of PSO keeps track of the best position among all the topological neighbors of a particle. At each time step t, using the individual best position pi(t) and the global best position pg(t), a new velocity for particle i is computed by

vi(t + 1) = vi(t) + c1 φ1 (pi(t) − xi(t)) + c2 φ2 (pg(t) − xi(t))    (1)

where c1 and c2 are positive constants and φ1 and φ2 are uniformly distributed random numbers in [0, 1]. The velocity vi is limited to the range ±vmax; if the velocity violates this limit, it is set to the corresponding limit. Changing the velocity in this way enables particle i to search around its individual best position pi and the global best position pg. Based on the updated velocities, each particle changes its position according to

xi(t + 1) = xi(t) + vi(t + 1).    (2)
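A minimal NumPy sketch of update rules (1) and (2) follows; the array shapes, default parameter values and the function name pso_step are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

def pso_step(x, v, p_best, g_best, c1=2.0, c2=2.0, v_max=1.0, rng=np.random):
    """One PSO iteration over a swarm.

    x, v   : (n_particles, n_dims) current positions and velocities
    p_best : (n_particles, n_dims) each particle's best position so far
    g_best : (n_dims,)             best position found by the whole swarm
    """
    phi1 = rng.rand(*x.shape)          # uniform random numbers in [0, 1]
    phi2 = rng.rand(*x.shape)
    # Eq. (1): velocity update toward personal and global best positions
    v = v + c1 * phi1 * (p_best - x) + c2 * phi2 * (g_best - x)
    v = np.clip(v, -v_max, v_max)      # limit velocity to +/- v_max
    # Eq. (2): position update
    x = x + v
    return x, v
```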

For a detailed description of the PSO algorithm, please refer to [20].

2.3 The General Learning Algorithm

The general learning algorithm for the GGGP-FNT model can be described as follows (a sketch of the loop is given after the list):

1) Initialization. Set the initial values of the parameters used in the GGGP and PSO algorithms. The initial population (flexible neural trees and the corresponding parameters) is generated randomly.


2) Structure optimization with the GGGP algorithm, in which the fitness function is calculated by the root mean square error (RMSE).
3) If a better structure is found, then go to step 4); otherwise go to step 2).
4) Parameter optimization with the PSO algorithm. In this stage, the structure of the FNT is fixed as the best tree taken from the end of the GGGP run, and the fitness function is again calculated by RMSE.
5) If the maximum number of PSO iterations is reached, or no better parameter vector is found for a significantly long time (100 steps), then go to step 6); otherwise go to step 4).
6) If a satisfactory solution is found, then stop; otherwise go to step 2).
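The alternating loop of steps 1)-6) could be organized as in the following Python sketch; the helper callables (structure_search, parameter_search), the rmse signature and the stopping thresholds are assumptions introduced here for illustration.

```python
def rmse(predict, data):
    """Root mean square error, used as the fitness function in steps 2) and 4)."""
    errors = [(predict(x) - y) ** 2 for x, y in data]
    return (sum(errors) / len(errors)) ** 0.5

def learn_fnt(structure_search, parameter_search, train_data,
              target_rmse=0.02, max_rounds=50, patience=100):
    """Alternate GGGP structure search and PSO parameter tuning.

    structure_search(fitness)                 -> (tree, fit)   # steps 2)-3)
    parameter_search(tree, fitness, patience) -> (tree, fit)   # steps 4)-5)
    """
    fitness = lambda tree: rmse(tree.predict, train_data)
    best_tree, best_fit = None, float("inf")
    for _ in range(max_rounds):
        tree, fit = structure_search(fitness)                   # evolve structures (GGGP)
        if best_tree is None or fit < best_fit:
            tree, fit = parameter_search(tree, fitness, patience)  # tune parameters (PSO)
            best_tree, best_fit = tree, fit
        if best_fit <= target_rmse:                              # step 6): satisfactory?
            break
    return best_tree, best_fit
```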

3 Experimental Studies

3.1 Stock Index Modeling

Prediction of stock indices is generally believed to be a very difficult task: the index behaves like a random walk process and is time varying. The obvious complexity of the problem underlines the importance of intelligent prediction paradigms [17]. In this experiment, we analyze the seemingly chaotic behaviour of two well-known stock indices, namely the Nasdaq-100 index of Nasdaq [18] and the S&P CNX NIFTY stock index [19]. The Nasdaq-100 index reflects Nasdaq's largest companies across major industry groups, including computer hardware and software, telecommunications, retail/wholesale trade and biotechnology. The Nasdaq-100 index is a modified capitalization-weighted index, designed to limit domination of the index by a few large stocks while generally retaining the capitalization ranking of companies. Through an investment in Nasdaq-100 index tracking stock, investors can participate in the collective performance of many of the Nasdaq stocks that are often in the news or have become household names. Similarly, S&P CNX NIFTY is a well-diversified 50-stock index accounting for 25 sectors of the economy. It is used for a variety of purposes such as benchmarking fund portfolios, index-based derivatives and index funds. The CNX indices are computed using the market-capitalization-weighted method, wherein the level of the index reflects the total market value of all the stocks in the index relative to a particular base period. The method also takes into account constituent changes in the index and, importantly, corporate actions such as stock splits, rights issues, and so on, without affecting the index value.

3.2 Experimental Setup and Results

In this experiment, we considered 7 years of stock data for the Nasdaq-100 index and 4 years for the NIFTY index. Our research investigates the performance of GGGP-FNT, GGGP and ANN for modeling the Nasdaq-100 and NIFTY stock market indices [13]. We used the same training and test data sets to evaluate the different models. The assessment of the prediction performance of the different paradigms was done by quantifying the predictions obtained on an independent data set.


Fig. 3. Forecasting performances of the three models (desired values vs. GGGP-FNT, GGGP and ANN outputs) for the Nasdaq-100 index (left) and the NIFTY index (right)

The Root Mean Squared Error (RMSE) is used as the performance evaluation index. The settings for GGGP-FNT are population size 100, crossover rate 0.9, mutation rate 0.1 and maximum depth 5. An FNT model was constructed using the training data and then the model was applied to the test data set. The instruction sets used to create an optimal FNT forecaster are S = {+2, +3, x1, x2, x3} and S = {+2, +3, x1, x2, x3, x4, x5} for the Nasdaq-100 and NIFTY stock indices, respectively, where the xi denote the input variables of the forecasting model. The grammars used for modeling the Nasdaq-100 index and the NIFTY index are as follows. For the Nasdaq-100 index:

s → exp
exp → exp op exp
exp → thr exp exp exp
exp → var
op → +2
thr → +3
var → x1 | x2 | x3

For the NIFTY index:

s → exp
exp → exp op exp
exp → thr exp exp exp
exp → var
op → +2
thr → +3
var → x1 | x2 | x3 | x4 | x5
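As a rough illustration of how these FNT grammars might be written down for the derivation routine sketched in Section 2.1, the Nasdaq-100 grammar could be encoded as a production table like the one below; the dictionary layout is an assumption, not the paper's encoding.

```python
# Hypothetical encoding of the Nasdaq-100 FNT grammar from the text:
# +2 and +3 are flexible neuron operators, x1..x3 the candidate inputs.
FNT_GRAMMAR_NASDAQ = {
    "s":   [["exp"]],
    "exp": [["exp", "op", "exp"],         # binary combination via a +2 node
            ["thr", "exp", "exp", "exp"], # ternary combination via a +3 node
            ["var"]],                     # a single input variable
    "op":  [["+2"]],
    "thr": [["+3"]],
    "var": [["x1"], ["x2"], ["x3"]],
}

# The NIFTY grammar differs only in the terminal variables:
FNT_GRAMMAR_NIFTY = dict(FNT_GRAMMAR_NASDAQ,
                         var=[["x1"], ["x2"], ["x3"], ["x4"], ["x5"]])
```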

For comparison purposes, a plain GGGP model was also implemented to forecast the stock indices. The settings for GGGP are population size 100, crossover rate 0.9, mutation rate 0.1 and maximum depth 15. The instruction sets S = {+, −, ∗, sin, cos, exp, x1, x2, x3} and S = {+, −, ∗, sin, cos, exp, x1, x2, x3, x4, x5} are used for modeling the Nasdaq-100 index and the NIFTY index, respectively. Training was terminated after 3000 epochs on each dataset. Two ANN models with architectures {3-10-1} and {5-10-1} trained by PSO were also implemented for modeling the Nasdaq-100 index and the NIFTY index, respectively; their training was likewise terminated after 3000 epochs on each dataset. Tables 1 and 2 summarize the training and test results achieved for the two stock indices using the three different approaches.

Table 1. Comparison of RMSE results for three learning methods (training)

              GGGP-FNT   GGGP      ANN
Nasdaq-100    0.02582    0.02568   0.02573
NIFTY         0.01699    0.01658   0.01729

Table 2. Comparison of RMSE results for three learning methods (testing)

              GGGP-FNT   GGGP      ANN
Nasdaq-100    0.01725    0.01993   0.01789
NIFTY         0.01291    0.01366   0.01426

Figure 3 depicts the test results for the one-day-ahead prediction of the Nasdaq-100 index and the NIFTY index. Comparing GGGP-FNT with GGGP and ANN, we found that GGGP-FNT has better generalization ability and higher accuracy than the GGGP and ANN forecasting models.

4 Conclusions

In this paper, GGGP and PSO based learning algorithms are employed for the optimal design of FNT models. Simulation results on stock index forecasting problems show the feasibility and effectiveness of the proposed method. For the GGGP algorithm itself, the vital topic is the context-free grammar (CFG) model; the grammar and its self-tuning should be further discussed in our future work. It should also be noted that other grammar models can be used to guide GP and to design the FNT, and a further investigation of this is therefore worthwhile.

Acknowledgment. This research was partially supported by the Natural Science Foundation of China under contract number 60573065 and the Key Subject Research Foundation of Shandong Province.

References

1. Zhang, B.T., et al.: Evolutionary induction of sparse neural trees. Evolutionary Computation 5 (1997) 213-236
2. Koza, J.R.: Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, MA (1992)
3. Nguyen, X.H.: A Flexible Representation for Genetic Programming: Lessons from Natural Language Processing, Chapter 2. PhD Thesis, University of New South Wales, Australia (2004) 7-29


4. Shan, Y., McKay, R., Baxter, R., Abbass, H., Essam, D. and Nguyen, H.: Grammar Model Based Program Evolution. In: Proceedings of the Congress on Evolutionary Computation, Portland, USA, IEEE (2004)
5. Wong, M.L. and Leung, K.S.: An Adaptive Inductive Logic Programming System using Genetic Programming. Proceedings of the Fourth Annual Conference on Evolutionary Programming, MIT Press (1995) 737-752
6. Ross, B.J.: The Evolution of Stochastic Regular Motifs for Protein Sequences. New Generation Computing 20(2) (2002) 187-213
7. Mizoguchi, J., et al.: Production Genetic Algorithms for Automated Hardware Design through Evolutionary Process. Proceedings of the First IEEE Conference on Evolutionary Computation, IEEE Press (1994) 85-90
8. Whigham, P.A.: Grammatically-Based Genetic Programming. Proceedings of the Workshop on Genetic Programming: From Theory to Real-World Applications, Morgan Kaufmann (1995) 33-41
9. Whigham, P.A.: Inductive Bias and Genetic Programming. Proc. of the First International Conference on Genetic Algorithms in Engineering Systems: Innovations and Applications, UK: IEE (1995) 461-466
10. Wong, M.L. and Leung, K.S.: An Adaptive Inductive Logic Programming System using Genetic Programming. Proceedings of the Fourth Annual Conference on Evolutionary Programming, MIT Press (1995) 737-752
11. Chen, Y., Yang, Y. and Dong, J.: Nonlinear System Modeling via Optimal Design of Neural Trees. Int. J. of Neural Systems 14(2) (2004) 125-137
12. Chen, Y., Abraham, A., Yang, J. and Yang, B.: Hybrid Methods for Stock Index Modeling. Fuzzy Systems and Knowledge Discovery: Second International Conference (FSKD 2005), China, LNCS 3614 (2005) 1067-1070
13. Chen, Y., Yang, B., Dong, J. and Abraham, A.: Time-Series Forecasting using Flexible Neural Tree Model. Information Sciences 174(3-4) (2005) 219-235
14. Chen, Y., Yang, B., Dong, J.: Evolving Flexible Neural Networks using Ant Programming and PSO Algorithm. International Symposium on Neural Networks (ISNN'04), LNCS 3173 (2004) 211-216
15. Antonisse, H.J.: A Grammar-Based Genetic Algorithm. In: G.J.E. Rawlins (ed.), Foundations of Genetic Algorithms, Morgan Kaufmann (1991)
16. Stefanski, P.A.: Genetic Programming Using Abstract Syntax Trees. Notes from the Genetic Programming Workshop (ICGA 93) (1993)
17. Abraham, A., Nath, B. and Mahanti, P.K.: Hybrid Intelligent Systems for Stock Market Analysis. Computational Science, Springer-Verlag Germany, Vassil N. Alexandrov et al. (eds.) (2001) 337-345
18. Nasdaq Stock Market: http://www.nasdaq.com
19. National Stock Exchange of India Limited: http://www.nse-india.com
20. Kennedy, J. and Eberhart, R.C.: Particle Swarm Optimization. Proc. IEEE Int'l Conf. on Neural Networks, Vol. IV (1995) 1942-1948