New Generation Computing, 29 (2011) 309-327, Ohmsha, Ltd. and Springer

Selection of Heterogeneous Fuzzy Model Ensembles Using Self-adaptive Genetic Algorithms

Magdalena SMĘTEK and Bogdan TRAWIŃSKI
Wrocław University of Technology, Wybrzeże Wyspiańskiego 27, Wrocław, POLAND

{magdalena.smetek, bogdan.trawinski}@pwr.wroc.pl

Received 01 July 2010
Revised manuscript received 31 March 2011

Abstract

The problem of model selection to compose a heterogeneous bagging ensemble was addressed in the paper. To solve the problem, three self-adapting genetic algorithms were proposed, in which the control parameters of mutation, crossover, and selection were adjusted during the execution. The algorithms were applied to create heterogeneous ensembles comprising regression fuzzy models to aid in real estate appraisals. The results of the experiments revealed that the self-adaptive algorithms converged faster than the classic genetic algorithms. The heterogeneous ensembles created by the self-adapting methods showed very good predictive accuracy when compared with the homogeneous ensembles obtained in earlier research.

Keywords: Self-Adaptive GA, Ensemble Selection, Heterogeneous Ensembles, Fuzzy Models.

§1

Introduction

Ensemble learning and fusion have attracted the attention of many researchers, mainly because multi-model classifiers or regressors very often reveal better predictive performance than their single base counterparts. Ensemble learning systems combine the outputs of machine learning algorithms, called weak learners, in order to obtain smaller prediction errors (in regression) or lower error rates (in classification). The individual estimators must provide different patterns of generalization, so diversity is promoted in the training process; otherwise, the ensemble would be composed of identical predictors and would provide no better accuracy than a single one. It has been proven that an ensemble performs better when each individual machine learning system is accurate and makes errors on different examples.9, 36)


Ensembles of machine learning models are computationally intensive, so much effort has been put into improving their efficiency while preserving their predictive performance. One way is to use fast and effective learning algorithms; e.g., decision trees may be trained faster than neural networks, and those in turn faster than genetic fuzzy systems. Extensive research devoted to reducing the size of training sets has resulted in various techniques such as instance selection,26) m-out-of-n subsampling with and without replacement,10, 12) feature selection,60) and random subspaces.33)

Ensemble selection belongs to the methods aimed at reducing the ensemble size and improving its efficiency and predictive performance. An ensemble may be composed of models with both high and low predictive accuracy, and the latter may negatively affect the overall performance of the ensemble. An effective ensemble may be obtained by pruning these models while maintaining a high diversity among the remaining members.47) A great number of classifier/regressor selection methods have been proposed; their reviews and taxonomy can be found in 14, 59). It is argued that much of the power of these methods comes from the diversity of the component classifiers/regressors.11, 24, 41, 53) It is also emphasized that an important aspect of the selection of an optimal system is the strategy to overproduce the components and then choose an optimal subset.19, 23, 48) Because ensemble selection belongs to the NP-hard combinatorial optimization problems56) and cannot be solved within polynomial computation time, heuristic algorithms have to be used in order to find near-optimal solutions in a reasonable time. The most popular among them are genetic algorithms.16, 34, 38) In the majority of studies, a single criterion is used for the selection of ensembles; however, in 19, 44), it was empirically proven that combining optimization criteria to rank candidate ensembles may lead to better results.

So far, we have investigated several methods to construct regression models to assist with real estate appraisal: evolutionary fuzzy systems, neural networks, decision trees, and statistical algorithms using the MATLAB, KEEL, RapidMiner, and WEKA data mining systems.27, 39, 42) We have also studied bagging ensemble models created with these computational intelligence techniques.29, 40, 43) In 8, 37), we empirically examined m-out-of-n bagging with and without replacement using weak learners such as decision trees and neural networks as well as genetic fuzzy systems and genetic neural networks. In 28), we presented our study on the application of some heuristic optimization algorithms to construct heterogeneous bagging ensembles based on the results of fuzzy algorithms applied to real-world cadastral data.

In this paper, we tackled the problem of the selection of regression models to compose a heterogeneous bagging ensemble. The search space was determined by a set of regression models generated by a number of different fuzzy algorithms over different bootstrap replicates. To solve the problem, we proposed three self-adapting genetic algorithms with the control parameters of mutation, crossover, and selection adjusted during the execution.


Moreover, two constraints were imposed on the solution to ensure the heterogeneity, namely the maximal number of models created by a given algorithm and the maximal number of models built over a given bag. Our solution uses some ideas developed by Maruo et al.45)

The paper is organized as follows. In Section 2, we present various approaches to parameter control in genetic algorithms. In Section 3, we outline our heterogeneous ensemble selection problem. In Section 4, we propose three self-adapting genetic algorithms with varying mutation, crossover, and selection, devised to select models and compose heterogeneous ensembles. In Section 5, we describe the results of the experiments conducted with the self-adapting genetic algorithms. Section 6 provides some concluding remarks.

§2

Parameter Control in Genetic Algorithms

The execution of genetic algorithms may be time consuming, especially when they are employed to create and optimize models with hybrid machine learning techniques such as genetic fuzzy systems and genetic neural networks. Therefore, methods of speeding up the convergence of Genetic Algorithms (GA) and Evolutionary Algorithms (EA) have been developed by many researchers for over two decades. The techniques of adapting the values of various parameters to optimize processes in evolutionary computation have been extensively studied, and the issue of adjusting a GA/EA to the problem while solving it still seems to be a promising area of research. The probability of mutation and crossover, the size of the selection tournament, and the population size belong to the most commonly set parameters of GA/EA.

A few taxonomies of the forms of parameter setting in EC have been proposed.5, 25, 54) Angeline5) distinguishes three different adaptation levels of GA/EA parameters: population-level, where parameters that are global to the population are adjusted; individual-level, where changes affect each member of the population separately; and component-level, where each component of each member may be modified individually. The classification worked out by Smith and Fogarty54) is based on three division criteria: what is being adapted, the scope of the adaptation, and the basis for change. The latter is further split into two categories: the evidence upon which the change is carried out and the rule or algorithm that executes the change. Eiben, Hinterding, and Michalewicz25) devised a general taxonomy distinguishing two major forms of parameter value setting, i.e. parameter tuning and parameter control, which is depicted in Fig. 1. The first consists in determining good values for the parameters before running the GA/EA and leaving these values unchanged during the run; however, this approach stands in contradiction to the dynamic nature of GA/EA. The second form is the alternative and consists in dynamically adjusting the parameter values during the execution. Parameter control can be further categorized into three classes: deterministic, adaptive, and self-adaptive parameter control. Deterministic parameter control is applied when the values of the EC parameters are modified according to some deterministic rules, without using any feedback from the optimization process.


In turn, adaptive parameter control is employed when some form of feedback from the process is used to determine the trend or strength of the change to the GA parameter. Self-adaptive parameter control takes place when the parameters to be adapted are encoded into the chromosomes and undergo mutation and recombination. A minimal sketch contrasting the three classes is given below.
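Purely as an illustration of this taxonomy, the following Python sketch contrasts the three classes of parameter control for a single parameter, the mutation rate. It is a minimal sketch under our own assumptions; the function names, schedule, and constants are not taken from any system discussed here.

    import random

    def deterministic_rate(generation, max_generations):
        # Deterministic control: a fixed schedule, no feedback from the search.
        return 0.3 * (1.0 - generation / max_generations)

    def adaptive_rate(current_rate, improved):
        # Adaptive control: feedback (did the best fitness improve?) drives
        # the direction and strength of the change.
        if improved:
            return max(0.01, current_rate * 0.9)
        return min(0.3, current_rate * 1.1)

    # Self-adaptive control: the rate travels inside the chromosome and
    # evolves together with the solution it controls.
    chromosome = {"solution": [random.randint(0, 1) for _ in range(90)],
                  "mutation_rate": random.uniform(0.0, 0.3)}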

Fig. 1  General Taxonomy of Parameter Setting in Evolutionary Computation25)

A series of parameter control methods have been proposed in the literature.6, 32, 46) Several mechanisms of mutation and crossover adaptation and self-adaptation have been developed and experimentally tested.7, 20, 21, 30, 55, 62) Very often, benchmark functions are employed in experiments to validate the effectiveness and convergence of novel techniques and to compare them with other methods.22, 52, 57, 63)

§3

Heterogeneous Ensemble Selection Problem

The problem addressed in this paper is to create heterogeneous ensembles comprising fuzzy models aiding real estate appraisals, with predictive accuracy as close as possible to the optimal value. In 40), a comparative analysis of homogeneous bagging ensembles built using 16 fuzzy algorithms implemented in the data mining system KEEL3, 4) was carried out. The algorithms are listed in Table 1; their details and references to the source articles can be found on the KEEL web site: www.keel.es.

Table 1  Fuzzy Algorithms Used in Study

Alg.   KEEL name                              Description
COR    Regr-COR_GA                            Genetic fuzzy rule learning, COR algorithm inducing cooperation among rules15)
FRS    Regr-FRSBM                             Fuzzy and random sets based modeling51)
FGP    Regr-Fuzzy-GAP                         Fuzzy rule learning, grammar-based GP algorithm50)
PFC    Regr-Fuzzy-P_FCS1                      Pittsburgh fuzzy classifier system #113)
FSA    Regr-Fuzzy-SAP                         Fuzzy rule learning, grammar GP based operators and simulated annealing based algorithm49)
SEF    Regr-Fuzzy-SEFC                        Symbiotic evolution based fuzzy controller design method35)
THR    Regr-Thrift                            Genetic fuzzy rule learning, Thrift algorithm58)
IRL    Regr-Fuzzy-MOGUL-IRL                   Iterative rule learning of descriptive Mamdani rules17)
IHC    Regr-Fuzzy-MOGUL-IRLHC                 Iterative rule learning of Mamdani rules - high constrained approach18)
ISC    Regr-Fuzzy-MOGUL-IRLSC                 Iterative rule learning of Mamdani rules - small constrained approach17)
ITS    Regr-Fuzzy-MOGUL-TSK                   Local evolutionary learning of TSK fuzzy rule based system1)
W-M    Regr-Fuzzy-WM                          Fuzzy rule learning, Wang-Mendel algorithm61)
WAG    Regr-Fuzzy-WM&Post-A-G-Tuning-FRBSs    Wang-Mendel algorithm tuned using approximative genetic tuning of FRBSs31, 61)
WGG    Regr-Fuzzy-WM&Post-G-G-Tuning-FRBSs    Wang-Mendel algorithm tuned using global genetic tuning of the fuzzy partition of linguistic FRBSs17, 61)
WGS    Regr-Fuzzy-WM&Post-G-S-Weight-RRBS     Wang-Mendel algorithm tuned using genetic selection of rules and rule weight tuning2, 61)
WGT    Regr-Fuzzy-WM&Post-G-T-Weights-FRBSs   Wang-Mendel algorithm tuned using genetic tuning of FRBS weights2, 61)

The real-world data used to generate and learn the appraisal models came from the cadastral system and the registry of real estate transactions, and referred to residential premises sold in one of the big Polish cities at market prices within the two years 2001 and 2002. They constituted an original dataset of 1098 instances of sales/purchase transactions. Four attributes were pointed out as price drivers: the usable area of the premises, the floor on which the premises were located, the year of building construction, and the number of storeys in the building; in turn, the price of the premises was the output variable.

Fig. 2  Schema of Bagging Ensemble Model Development

The schema of the experiments conducted is depicted in Fig. 2. On the basis of the original dataset, 30 bootstrap replicates (bags) of cardinality equal to that of the original dataset were created by random draws with replacement (a short sketch of this step is given below). The bags were then used to generate models employing each of the 16 above-mentioned algorithms. During the pre-processing phase, the data were normalized using the min-max approach. All models were generated using 10-fold cross validation (10cv), and the mean square error (MSE) was applied as the accuracy measure. As a result, a matrix of 480 entries containing the results of machine learning over 30 bags using the 16 above-mentioned algorithms was obtained. In the matrix, the columns represent the individual bags, the rows represent the fuzzy algorithms, and the entries contain the MSE values provided by the respective models. The comparison of the predictive accuracy of the homogeneous ensemble models created by the individual algorithms over 30 bags with the MSE of the base models is presented in Fig. 3.
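The bag-generation and normalization steps described above can be summarized by the following Python sketch; the names are ours, and data stands for the 1098-instance dataset as a numeric array.

    import numpy as np

    rng = np.random.default_rng()

    def make_bags(data, n_bags=30):
        # Bootstrap replicates: random draws with replacement, each of the
        # same cardinality as the original dataset.
        n = len(data)
        return [data[rng.integers(0, n, size=n)] for _ in range(n_bags)]

    def min_max_normalize(data):
        # Min-max normalization applied during the pre-processing phase.
        lo, hi = data.min(axis=0), data.max(axis=0)
        return (data - lo) / (hi - lo)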

Fig. 3  Performance of Homogeneous Ensembles Over 30 Bags Compared with Base Models

Our problem can be generally formulated in the following way: how to select X models to compose a heterogeneous ensemble with the optimal accuracy, expressed in terms of the average MSE of its component models? The search space is determined by A different algorithms used to generate machine learning models over B bootstrap replicates (bags). It can be illustrated in the form of an A×B matrix, where the rows a1, a2, ..., aA denote the models built by the individual algorithms, and the columns b1, b2, ..., bB stand for the models built over the individual bags (Table 2). The (i, j)th entry of the matrix comprises the MSE value provided by the respective model. Two constraints can be put on the solution to ensure the heterogeneity. First, the maximal number of models created by a given algorithm can be set, i.e. the maximal number of elements taken from one row. Second, the maximal number of models created over a given bag can be set, i.e. the maximal number of elements drawn from one column. A minimal sketch of this selection criterion is given after Table 2.

Table 2  Matrix containing MSEs provided by models built by different algorithms over different bags

       b1       b2       b3       ...   bB
a1     MSE11    MSE12    MSE13    ...   MSE1B
a2     MSE21    MSE22    MSE23    ...   MSE2B
a3     MSE31    MSE32    MSE33    ...   MSE3B
...    ...      ...      ...      ...   ...
aA     MSEA1    MSEA2    MSEA3    ...   MSEAB
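The sketch below restates this selection criterion in Python. It is a minimal illustration under our own naming; the penalty mechanism anticipates Section 4, and the penalty constant is an assumption, not a value from the paper.

    import numpy as np

    PENALTY = 1e6  # assumed: large enough that a penalized candidate never survives selection

    def ensemble_fitness(mse, selected, max_per_alg, max_per_bag):
        # mse[i, j] is the MSE of the model built by algorithm i over bag j;
        # selected is a list of (algorithm_row, bag_column) pairs.
        rows = [i for i, _ in selected]
        cols = [j for _, j in selected]
        if (len(set(selected)) < len(selected)                           # repeated picks
                or max(rows.count(i) for i in set(rows)) > max_per_alg   # constraint 1
                or max(cols.count(j) for j in set(cols)) > max_per_bag): # constraint 2
            return PENALTY
        return float(np.mean([mse[i, j] for i, j in selected]))

    mse = np.random.uniform(0.001, 0.003, size=(16, 30))  # stand-in for the Table 2 matrix
    print(ensemble_fitness(mse, [(0, 5), (3, 5), (7, 12)], max_per_alg=30, max_per_bag=10))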

§4

Self-adapting Methods Applied to GA

In order to solve the ensemble selection problem formulated in the previous section, we proposed three self-adapting GAs with varying mutation, crossover, and selection. To implement the self-adaptive methods, the chromosomes comprised, besides the solution, the mutation and crossover rates, the tournament size, the number of genes to mutate, and the number of crossover points. By the mutation and crossover rates, we understand the percentages of chromosomes subjected to mutation and crossover, respectively. For comparison, we used the classic GA with unchanged parameters of selection, crossover, and mutation. As the fitness function, the average value of the MSEs provided by the models selected into the solution was applied.


The constraints were checked and enforced at the evaluation phase by adding a penalty value to the value of the fitness function of those chromosomes which did not meet the requirements. The increased value of the fitness function was so high that such a chromosome could not pass the selection phase. In the same way, chromosomes with repeated selections were eliminated.

Below, the implementation of the proposed self-adapting techniques is presented using the example of the matrix of 480 entries containing the predictive accuracies of the models built over 30 bags with the 16 fuzzy algorithms described in the previous section. In all variants of the GAs, chromosomes of constant length were used; thus they could represent only solutions comprising a predetermined number of models. In order to implement the proposed self-adapting methods, binary coding was employed. In our case, to identify one element, the number of its row and the number of its column were needed: 4 bits (genes) were required to encode the row number (0-15) and 5 bits (genes) were needed to represent the column number (0-29), so 9 genes were necessary to encode one element of a solution. The variants of the genetic algorithms were denoted as GA, SAMC, SAMCT, and SAM2C2T, and the patterns of their chromosomes are illustrated in Fig. 4.

Fig. 4  Self-adaptive Parameters Encoded in Chromosomes of Individual Algorithms

GA - Classic genetic algorithm. In the standard genetic algorithm, each chromosome comprised 9*X genes. The unvarying parameters of the standard GA were as follows: the mutation rate was equal to 0.15 and the crossover rate to 0.80; the population size was set to 100 with an elite of one; a seven-chromosome tournament method was applied in the selection process; one third of randomly selected genes in a chromosome were subject to mutation; and a two-point crossover was employed with randomly determined recombination breakpoints. All the parameters remained unchanged during the run. The encoding of chromosomes and the mutation and crossover operations are depicted in Figs. 5, 6, and 7, respectively.

Fig. 5  Solution Encoding in the Standard GA
Fig. 6  Mutation Operation in the Standard GA
Fig. 7  Crossover Operation in the Standard GA

SAMC - Genetic algorithm with self-adapting mutation and crossover. The self-adaptive parameters encoded in SAMC chromosomes are shown in Fig. 8. In the SAMC algorithm, 12 more genes were added to each chromosome in the population, to encode the mutation rate with 5 genes and the crossover rate with 7 genes. The mutation rate could take values from the range 0 to 0.3, and the crossover rate from the range 0.5 to 1.0. Selection was the same as in GA.

Fig. 8  Self-adaptive Parameters Encoded in SAMC Chromosomes

Mutation. The self-adaptive mutation applied in SAMC is illustrated in Fig. 9. It differs from the standard GA mutation, whose rate remains constant during the run. Each chromosome in the population can be subject to the mutation. A special N×M matrix with real values selected randomly from the range 0 to 0.3 is created, where N is the population size and M stands for the number of genes in a chromosome. Each matrix entry corresponds to one gene of one chromosome. The self-adaptation of the mutation proceeds as follows. For each gene in each chromosome of the population:
• extract the value of the mutation rate from the chromosome,
• if the corresponding matrix entry is lower than the mutation rate taken from the chromosome, then the gene mutates by flipping its value,
• the N×M matrix remains unchanged during the run.

Fig. 9  Self-adaptive Mutation in SAMC

Crossover. The self-adaptive crossover, which is depicted in Fig. 10, also differs from the traditional GA crossover. A special N×1 matrix (i.e. a vector) with real values selected randomly from the range 0.5 to 1.0 is created, where N is the population size. Each entry of the vector corresponds to one chromosome. The self-adaptation of the crossover proceeds in the following way; a sketch of both SAMC operators follows Fig. 10. For each chromosome of the population:
• extract the value of the crossover rate from the chromosome,
• if the value of the vector entry is lower than the crossover rate taken from the chromosome, then the chromosome is selected for crossover,
• the chromosomes so selected are then randomly paired and undergo the two-point crossover with randomly determined recombination breakpoints,
• the vector remains unchanged during the run.

Fig. 10  Self-adaptive Crossover in SAMC
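The following Python sketch restates the SAMC operators described above. The placement of the rate genes at the end of the chromosome and the helper names are our own assumptions, since the exact gene layout is given only graphically in Fig. 8.

    import numpy as np

    N, M = 100, 9 * 10 + 12            # population size; 10 solution elements + 12 rate genes
    rng = np.random.default_rng()
    population = rng.integers(0, 2, size=(N, M))

    # Threshold values are drawn once and remain unchanged during the run.
    mutation_thresholds = rng.uniform(0.0, 0.3, size=(N, M))
    crossover_thresholds = rng.uniform(0.5, 1.0, size=N)

    def decode_rate(bits, low, high):
        # Interpret a bit segment as a rate within [low, high].
        value = int("".join(map(str, bits)), 2)
        return low + (high - low) * value / (2 ** len(bits) - 1)

    def samc_mutate(population):
        for n in range(N):
            rate = decode_rate(population[n, -12:-7], 0.0, 0.3)   # assumed 5-gene segment
            flip = mutation_thresholds[n] < rate                  # per-gene comparison
            population[n, flip] ^= 1                              # flip the selected genes

    def samc_crossover_pool(population):
        pool = []
        for n in range(N):
            rate = decode_rate(population[n, -7:], 0.5, 1.0)      # assumed 7-gene segment
            if crossover_thresholds[n] < rate:
                pool.append(n)  # pool members are later paired for two-point crossover
        return pool

Note that the rate genes themselves undergo mutation and crossover, which is precisely what makes the control parameters subject to evolution.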

SAMCT - Genetic algorithm with self-adapting mutation, crossover, and tournament size. This algorithm was similar to SAMC: mutation and crossover were the same as in SAMC (see Figs. 9 and 10); only the selection was accomplished in a different way. Three further genes were added to the chromosome to encode the tournament size, which could be set to an integer from the range 1 to 7 (Fig. 11). Before each selection, the average tournament size of the whole population was calculated, rounded to an integer, and then used as the final tournament size in the selection operation; a sketch of this rule follows Fig. 11.

Fig. 11  Self-adaptive Parameters Encoded in SAMCT Chromosomes
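A minimal sketch of this tournament-size rule, assuming a hypothetical helper decode_tournament that maps the 3-gene segment of a chromosome to an integer in the range 1 to 7:

    def population_tournament_size(population, decode_tournament):
        # Average the tournament sizes encoded in all chromosomes and round
        # the result to an integer; this value is used by the next selection.
        sizes = [decode_tournament(chromosome) for chromosome in population]
        return max(1, round(sum(sizes) / len(sizes)))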

SAM2C2T - Genetic algorithm with self-adapting mutation, crossover, tournament size, number of genes to mutate, and number of crossover points. The self-adaptive parameters encoded in SAM2C2T chromosomes are presented in Fig. 12. In this self-adapting algorithm, mutation and crossover are carried out differently, whereas selection proceeds analogously to SAMCT. Ten more genes were added to each chromosome in comparison with SAMCT: seven of them are used to encode the number of genes which undergo mutation, and three are utilized to encode the number of crossover points, which can lie in the range 1 to 9.

Fig. 12  Self-adaptive Parameters Encoded in SAM2C2T Chromosomes

Mutation. It differs from the mutations of GA, SAMC, and SAMCT. Each chromosome in the population can be subject to the mutation. A special N×1 matrix (i.e. a vector) with real values selected randomly from the range 0 to 0.3 is created, where N is the population size (Fig. 13). Each entry of the vector corresponds to one chromosome. The self-adaptation of the mutation proceeds as follows. For each chromosome of the population:
• extract the value of the mutation rate from the chromosome,
• if the value of the vector entry is lower than the mutation rate taken from the chromosome, then the chromosome mutates by flipping the values of randomly selected genes, the number of mutated genes being extracted from the appropriate segment of the chromosome,
• the vector remains unchanged during the run.

Fig. 13  Self-adaptive Mutation in SAM2C2T

Crossover. The self-adaptive crossover is a k-point one, so it differs from the crossover of GA, SAMC, and SAMCT. A special N×1 matrix (i.e. a vector) with real values selected randomly from the range 0.5 to 1.0 is created, where N is the population size (Fig. 14). Each entry of the vector corresponds to one chromosome. The self-adaptation of the crossover proceeds in the following way; a sketch is given after this list. For each chromosome of the population:
• extract the value of the crossover rate from the chromosome,
• if the value of the vector entry is lower than the crossover rate taken from the chromosome, then the chromosome is selected for crossover,
• the chromosomes so selected are then randomly paired and undergo the k-point crossover with randomly determined recombination breakpoints,
• k, the number of crossover points for each pair of chromosomes, is taken from the chromosome with the lower fitness value,
• the vector remains unchanged during the run.
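A sketch of the k-point crossover described above; fitness and the decoding of k from its chromosome segment are assumed helpers, and k is read from the pair member with the lower fitness value, as stated in the text.

    import random

    def k_point_crossover(parent_a, parent_b, fitness, decode_k):
        # k (1..9) is read from the pair member with the lower fitness value.
        donor = parent_a if fitness(parent_a) < fitness(parent_b) else parent_b
        k = decode_k(donor)
        points = sorted(random.sample(range(1, len(parent_a)), k))
        child_a, child_b = list(parent_a), list(parent_b)
        swap = False
        previous = 0
        for point in points + [len(parent_a)]:
            if swap:  # exchange every second segment between the breakpoints
                child_a[previous:point], child_b[previous:point] = \
                    child_b[previous:point], child_a[previous:point]
            swap = not swap
            previous = point
        return child_a, child_b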

Fig. 14  Self-adaptive Crossover in SAM2C2T

§5

Results of Experiments

There were two main goals of our experiments. Firstly, the classic GA was compared with the three self-adapting GAs with respect to convergence; secondly, heterogeneous ensembles comprising fuzzy models to assist with real estate appraisals were created with the four above-mentioned algorithms, and their predictive accuracy was analysed and discussed. In all experiments, all four algorithms, i.e. GA, SAMC, SAMCT, and SAM2C2T, were executed independently 10 times, and the final values of MSE were calculated as an average over the 10 runs.

At the beginning, we investigated the performance of ensembles obtained with different settings of the second constraint described in the previous section, i.e. the maximal number of models created over a given bag. In Fig. 15, the results for ensembles comprising 30 models after 100 generations are shown. Next, for the maximal number of models created over a given bag set to 10, we tested the performance of ensembles composed of different numbers of component models. The results after 100 generations are depicted in Fig. 16. The outcomes are diverse; therefore, in the future, we plan to construct genetic algorithms with variable-length chromosomes to learn also the size of an optimal ensemble.

Fig. 15  Performance of ensembles comprising 30 models obtained with different settings of the constraint (maximal number of models per bag), after 100 generations


Fig. 16  Performance of ensembles composed of different numbers of models obtained with the constraint of maximal number of models per bag equal to 10, after 100 generations

In order to investigate the convergence of the individual algorithms, three series of experiments were carried out for different numbers of generations: up to 100, 1000, and 10,000, respectively. Heterogeneous ensembles composed of 10 different models were created. The constraint on the maximal number of models created over a given bag was set to 10, and the maximal number of models created by a given algorithm was left at its maximal value of 30. The results are presented in Figs. 17, 18, and 19. It is clearly seen that the self-adaptive algorithms converge faster than the classic GA with constant values of control parameters. The MSE of the heterogeneous ensembles dropped from about 0.0018 after 10 generations to below 0.0015 after 10,000 generations. The results produced by the individual algorithms for increasing numbers of generations are shown in Table 3.

Fig. 17  Convergence Comparison for up to 100 Generations

Fig. 18  Convergence Comparison for up to 1000 Generations

Fig. 19  Convergence Comparison for up to 10,000 Generations

The MSE achieved by the GA, SAMC, SAMCT, and SAM2C2T algorithms after 10,000 generations, equal to 0.001560, 0.001467, 0.001493, and 0.001468 respectively, seems very good when compared with the best-performing homogeneous ensembles composed of 10 bags.40) The MSE of the homogeneous multi-models created with ITS was equal to 0.00145, with WGG 0.00173, with FRS 0.00189, and with SEF 0.00192.

Table 3  Average MSE values of solutions produced by algorithms for increasing number of generations

No. of gen.   GA         SAMC       SAMCT      SAM2C2T
10            0.001613   0.001708   0.001803   0.001766
100           0.001609   0.001479   0.001518   0.001469
1000          0.001609   0.001469   0.001508   0.001468
10000         0.001560   0.001467   0.001493   0.001468

Due to the fact that the convergence rate of the self-adaptive algorithms was highest within the first 100 generations, we also examined the number of generations and the processing time needed to obtain MSE values below 0.0015. In these tests, GA was omitted because it could not reach so low an MSE value. The average values obtained over 10 independent runs are given in Table 4. Taking the median into account, all three algorithms reached the threshold MSE value after no more than 25 generations, and SAMCT revealed the lowest standard deviation. Processing times were relatively low, i.e. from 2.0 to 3.7 seconds on average; however, in the case of a search space greater than in our experiment, the convergence rate could play an important role.

Table 4  Number of generations and time needed to achieve the threshold MSE value of 0.0015 by the self-adapting algorithms over 10 runs (avg - average, med - median, std - standard deviation)

             No. of generations       Time [s]
Algorithm    avg    med    std        avg      med      std
SAMC         39     22     39         3.480    3.566    1.860
SAMCT        27     25     9          1.978    1.981    0.610
SAM2C2T      44     22     70         3.675    1.602    5.919

Selected heterogeneous solutions produced by the respective classic and self-adaptive genetic algorithms after 10,000 generations are presented in Table 5.

Table 5  Selected Solutions Produced by Individual Algorithms

       GA             SAMC           SAMCT          SAM2C2T
No.    Bag  Alg       Bag  Alg       Bag  Alg       Bag  Alg
1      11   WGG       30   ITS       6    IRL       11   IRL
2      30   IHC       6    IRL       27   ITS       27   WGG
3      27   SEF       6    FRS       11   IRL       11   ISC
4      6    WAG       14   ITS       16   IRL       6    IRL
5      22   IHC       27   ITS       14   ITS       27   ITS
6      11   IRL       11   IHC       27   WGG       11   ITS
7      16   IHC       6    WGG       6    WGG       14   WGG
8      27   IHC       11   ITS       22   IRL       11   WGG
9      28   IHC       30   IHC       11   WGG       11   FRS
10     6    SEF       22   WGG       22   WGG       16   ITS
MSE    0.001668       0.001439       0.001391       0.001398

§6

Conclusions and Future Work

The problem of model selection to compose a heterogeneous bagging ensemble was addressed in the paper. To solve the above problem, we proposed three self-adapting GAs with varying mutation, crossover, and selection. The search space was determined by a number of different algorithms used to generate fuzzy regression models over a number of bootstrap replicates. Two constraints were put on the solution to ensure the heterogeneity: the maximal number of models created by a given algorithm and the maximal number of models created over a given bag.

Three variants of self-adapting genetic algorithms, SAMC, SAMCT, and SAM2C2T, were proposed, with different control parameters adjusted during the execution. The chromosomes of these GAs comprised, besides the solution, the mutation and crossover rates, the tournament size, the number of genes to mutate, and the number of crossover points, thereby making these parameters subject to evolution. For comparison, the classic GA with unchanged parameters of selection, crossover, and mutation was employed. The algorithms were applied to create heterogeneous ensembles comprising fuzzy models aiding real estate appraisals, with predictive accuracy as close as possible to the optimal value. In the experiments, the algorithms derived the solution from a matrix which had 16 rows and 30 columns, reflecting thereby the 16 methods used to create fuzzy models over 30 bags.

Three series of experiments were conducted for different numbers of generations: up to 100, 1000, and 10,000. All four algorithms, i.e. GA, SAMC, SAMCT, and SAM2C2T, were executed independently 10 times, and the final values of MSE were calculated as an average over the 10 runs. The results revealed that the self-adaptive algorithms converged faster than the classic GA with constant values of control parameters and produced compound models with better predictive accuracy. The heterogeneous ensembles created by the self-adapting methods were very good in respect of MSE when compared with the homogeneous ensembles obtained in our earlier research.

However, the stability of such ensembles remains an open problem; therefore, further investigations are planned with the use of new sets of real-world data and with consideration of the impact of time on the prices of land and premises.


Moreover, several other heuristics, such as ant colony optimization, particle swarm optimization, harmony search, and artificial immune systems, will be implemented and tested from the point of view of their usefulness for creating heterogeneous ensembles. Further research is also planned to extend the self-adaptive parameters to include the age of chromosomes, the population size, and others. We will also construct self-adapting genetic algorithms with variable-length chromosomes to learn also the size of an optimal solution. More criteria for assessing the algorithms will be taken into account. The application of the self-adaptive techniques to benchmark functions will also be considered.

Acknowledgements
This paper was partially supported by the National Science Centre (Polish: Narodowe Centrum Nauki) under grant no. N N516 483840.

References

1) Alcalá, R., Alcalá-Fdez, J., Casillas, J., Cordón, O., Herrera, F., "Local identification of prototypes for genetic learning of accurate TSK fuzzy rule-based systems," International Journal of Intelligent Systems, 22, 9, pp. 909-941, 2007.
2) Alcalá, R., Cordón, O., Herrera, F., "Combining Rule Weight Learning and Rule Selection to Obtain Simpler and More Accurate Linguistic Fuzzy Models," in Modeling with Words (Lawry, J. ed.), LNCS 2873, pp. 44-63, Springer, Heidelberg, 2003.
3) Alcalá-Fdez, J., Fernandez, A., Luengo, J., Derrac, J., García, S., Sánchez, L., Herrera, F., "KEEL Data-Mining Software Tool: Data Set Repository, Integration of Algorithms and Experimental Analysis Framework," Journal of Multiple-Valued Logic and Soft Computing, in press, 2011.
4) Alcalá-Fdez, J., Sánchez, L., García, S., del Jesus, M. J., Ventura, S., Garrell, J. M., Otero, J., Romero, C., Bacardit, J., Rivas, V. M., Fernández, J. C., Herrera, F., "KEEL: A Software Tool to Assess Evolutionary Algorithms to Data Mining Problems," Soft Computing, 13, 3, pp. 307-318, 2009.
5) Angeline, P. J., "Adaptive and self-adaptive evolutionary computations," in Computational Intelligence: A Dynamic Systems Perspective (Palaniswami, M. and Attikiouzel, Y. eds.), pp. 152-163, IEEE Press, New York, 1995.
6) Bäck, T., Schwefel, H.-P., "An Overview of Evolutionary Algorithms for Parameter Optimization," Evolutionary Computation, 1, 1, pp. 1-23, 1993.
7) Bäck, T., "Self-adaptation in genetic algorithms," in Toward a Practice of Autonomous Systems (Varela, F. J. and Bourgine, P. eds.), Proc. of the First European Conference on Artificial Life, pp. 263-271, MIT Press, 1992.
8) Bańczyk, K., Kempa, O., Lasota, T., Trawiński, B., "Empirical Comparison of Bagging Ensembles Created Using Weak Learners for a Regression Problem," in ACIIDS 2011, LNAI 6592 (Nguyen, N. T., Kim, C.-G. and Janiak, A. eds.), pp. 312-322, Springer, Heidelberg, 2011.
9) Banfield, R. E., Hall, L. O., Bowyer, K. W., Kegelmeyer, W. P., "A comparison of decision tree ensemble creation techniques," IEEE Transactions on Pattern Analysis and Machine Intelligence, 29, 1, pp. 173-180, 2007.

10) Biau, G., Cérou, F., Guyader, A., "On the Rate of Convergence of the Bagged Nearest Neighbor Estimate," Journal of Machine Learning Research, 11, pp. 687-712, 2010.
11) Brown, G., Wyatt, J., Tiňo, P., "Managing Diversity in Regression Ensembles," Journal of Machine Learning Research, 6, pp. 1621-1650, 2005.
12) Bühlmann, P., Yu, B., "Analyzing bagging," Annals of Statistics, 30, pp. 927-961, 2002.
13) Carse, B., Fogarty, T. C., Munro, A., "Evolving fuzzy rule based controllers using genetic algorithms," Fuzzy Sets and Systems, 80, 3, pp. 273-293, 1996.
14) Caruana, R., Niculescu-Mizil, A., Crew, G., Ksikes, A., "Ensemble selection from libraries of models," in Proc. of the 21st International Conference on Machine Learning, pp. 137-144, ACM Press, 2004.
15) Casillas, J., Cordón, O., Herrera, F., "COR: A Methodology to Improve ad hoc Data-Driven Linguistic Rule Learning Methods by Inducing Cooperation Among Rules," IEEE Trans. on System, Man and Cybernetics, Part B: Cybernetics, 32, 4, pp. 526-537, 2002.
16) Chandra, A., Yao, X., "Evolving hybrid ensembles of learning machines for better generalization," Neurocomputing, 69, pp. 686-700, 2006.
17) Cordón, O., Herrera, F., "A three-stage evolutionary process for learning descriptive and approximate fuzzy logic controller knowledge bases from examples," International Journal of Approximate Reasoning, 17, 4, pp. 369-407, 1997.
18) Cordón, O., Herrera, F., "Hybridizing genetic algorithms with sharing scheme and evolution strategies for designing approximate fuzzy rule-based systems," Fuzzy Sets and Systems, 118, 2, pp. 235-255, 2001.
19) Cordón, O., Quirin, A., "Comparing Two Genetic Overproduce-and-choose Strategies for Fuzzy Rule-based Multiclassification Systems Generated by Bagging and Mutual Information-based Feature Selection," International Journal of Hybrid Intelligent Systems, 7, 1, pp. 45-64, 2010.
20) De Jong, K., "An analysis of the behavior of a class of genetic adaptive systems," Ph.D. thesis, University of Michigan, 1975.
21) Deb, K., Beyer, H.-G., "Self-adaptive genetic algorithms with simulated binary crossover," Evolutionary Computation, 9, 2, pp. 197-221, 2001.
22) Digalakis, J. G., Margaritis, K. G., "An Experimental Study of Benchmarking Functions for Genetic Algorithms," Int. J. Computer Math., 79, 4, pp. 403-416, 2002.
23) Dos Santos, E. M., Sabourin, R., Maupin, P., "A dynamic overproduce-and-choose strategy for the selection of classifier ensembles," Pattern Recognition, 41, pp. 2993-3009, 2008.
24) Dutta, H., "Measuring Diversity in Regression Ensembles," in Proc. of the 4th Indian International Conference on Artificial Intelligence, IICAI 2009 (Prasad, B., Lingras, P., Ram, A. eds.), Tumkur, Karnataka, India, pp. 2220-2236, 2009.
25) Eiben, A. E., Hinterding, R., Michalewicz, Z., "Parameter control in evolutionary algorithms," IEEE Transactions on Evolutionary Computation, 3, 2, pp. 124-141, 1999.
26) García-Pedrajas, N., "Constructing Ensembles of Classifiers by Means of Weighted Instance Selection," IEEE Transactions on Neural Networks, 20, 2, pp. 258-277, 2009.

27) Graczyk, M., Lasota, T., Trawiński, B., "Comparative Analysis of Premises Valuation Models Using KEEL, RapidMiner, and WEKA," in ICCCI 2009, LNAI 5796 (Nguyen, N. T. et al. eds.), pp. 800-812, Springer, Heidelberg, 2009.
28) Graczyk, M., Lasota, T., Telec, Z., Trawiński, B., "A Multi-agent System to Assist with Property Valuation Using Heterogeneous Ensembles of Fuzzy Models," in KES-AMSTA 2010, LNAI 6070 (Jedrzejowicz, P. et al. eds.), pp. 420-429, Springer, Heidelberg, 2010.
29) Graczyk, M., Lasota, T., Trawiński, B., Trawiński, K., "Comparison of Bagging, Boosting and Stacking Ensembles Applied to Real Estate Appraisal," in ACIIDS 2010, LNAI 5991 (Nguyen, N. T. et al. eds.), pp. 340-350, Springer, Heidelberg, 2010.
30) Hansen, N., Ostermeier, A., "Completely derandomized self-adaptation in evolution strategies," Evolutionary Computation, 9, 2, pp. 159-195, 2001.
31) Herrera, F., Lozano, M., Verdegay, J. L., "Tuning Fuzzy Logic Controllers by Genetic Algorithms," International Journal of Approximate Reasoning, 12, pp. 299-315, 1995.
32) Hinterding, R., Michalewicz, Z., Eiben, A. E., "Adaptation in Evolutionary Computation: A Survey," in Proc. of the Fourth International Conference on Evolutionary Computation (ICEC 97), pp. 65-69, IEEE Press, New York, 1997.
33) Ho, T., "The random subspace method for constructing decision forests," IEEE Transactions on Pattern Analysis and Machine Intelligence, 20, 8, pp. 832-844, 1998.
34) Jackowski, K., Woźniak, M., "Method of classifier selection using the genetic approach," Expert Systems, 27, 2, pp. 114-128, 2010.
35) Juang, C.-F., Lin, J.-Y., Lin, C.-T., "Genetic reinforcement learning through symbiotic evolution for fuzzy controller design," IEEE Trans. on Systems, Man, and Cybernetics, Part B: Cybernetics, 30, 2, pp. 290-302, 2000.
36) Kégl, B., "Robust regression by boosting the median," in Proc. of the 16th Conference on Computational Learning Theory, pp. 258-272, 2003.
37) Kempa, O., Lasota, T., Telec, Z., Trawiński, B., "Investigation of Bagging Ensembles of Genetic Neural Networks and Fuzzy Systems for Real Estate Appraisal," in ACIIDS 2011, LNAI 6592 (Nguyen, N. T., Kim, C.-G. and Janiak, A. eds.), pp. 323-332, Springer, Heidelberg, 2011.
38) Kim, Y.-W., Oh, I.-S., "Classifier ensemble selection using hybrid genetic algorithms," Pattern Recognition Letters, 29, pp. 796-802, 2008.
39) Król, D., Lasota, T., Trawiński, B., Trawiński, K., "Investigation of evolutionary optimization methods of TSK fuzzy model for real estate appraisal," International Journal of Hybrid Intelligent Systems, 5, 3, pp. 111-128, 2008.
40) Krzystanek, M., Lasota, T., Telec, Z., Trawiński, B., "Analysis of Bagging Ensembles of Fuzzy Models for Premises Valuation," in ACIIDS 2010, Part II, LNCS (LNAI) 5991 (Nguyen, N. T., Le, M. T. and Świątek, J. eds.), pp. 330-339, Springer, Heidelberg, 2010.
41) Kuncheva, L. I., Whitaker, C. J., "Measures of Diversity in Classifier Ensembles and Their Relationship with the Ensemble Accuracy," Machine Learning, 51, pp. 181-207, 2003.
42) Lasota, T., Mazurkiewicz, J., Trawiński, B., Trawiński, K., "Comparison of Data Driven Models for the Validation of Residential Premises using KEEL," International Journal of Hybrid Intelligent Systems, 7, 1, pp. 3-16, 2010.

43) Lasota, T., Telec, Z., Trawiński, B., Trawiński, K., "Exploration of Bagging Ensembles Comprising Genetic Fuzzy Models to Assist with Real Estate Appraisals," in IDEAL 2009, LNCS 5788 (Yin, H., Corchado, E. eds.), pp. 554-561, Springer, Heidelberg, 2009.
44) Lofstrom, T., Johansson, U., Bostrom, H., "Ensemble member selection using multi-objective optimization," in IEEE Symposium on Computational Intelligence and Data Mining, CIDM'09, pp. 245-251, 2009.
45) Maruo, M. H., Lopes, H. S., Delgado, M. R., "Self-Adapting Evolutionary Parameters: Encoding Aspects for Combinatorial Optimization Problems," in EvoCOP 2005, LNCS 3448 (Raidl, G. R., Gottlieb, J. eds.), pp. 154-165, Springer, Heidelberg, 2005.
46) Meyer-Nieberg, S., Beyer, H.-G., "Self-Adaptation in Evolutionary Algorithms," in SCI 54 (Lobo, F. G., Lima, C. F., Michalewicz, Z. eds.), pp. 47-75, Springer, Heidelberg, 2007.
47) Partalas, I., Tsoumakas, G., Vlahavas, I., "Pruning an Ensemble of Classifiers via Reinforcement Learning," Neurocomputing, 72, 7-9, pp. 1900-1909, 2009.
48) Partridge, D., Yates, W. B., "Engineering multiversion neural-net systems," Neural Computation, 8, 4, pp. 869-893, 1996.
49) Sánchez, L., Couso, I., "Combining GP operators with SA search to evolve fuzzy rule based classifiers," Information Sciences, 136, pp. 175-192, 2001.
50) Sánchez, L., Couso, I., "Fuzzy random variables-based modeling with GA-P Algorithms," in Information, Uncertainty and Fusion (Yager, R., Bouchon-Meunier, B., Zadeh, L. eds.), pp. 245-256, Kluwer, 2000.
51) Sánchez, L., "A random sets-based method for identifying fuzzy models," Fuzzy Sets and Systems, 98, 3, pp. 343-354, 1998.
52) Schaffer, J. D., Morishima, A., "An adaptive crossover distribution mechanism for genetic algorithms," in Proc. of the Second Int. Conference on Genetic Algorithms, pp. 36-40, L. Erlbaum Associates Inc., New York, 1987.
53) Scherbart, A., Nattkemper, T. W., "The Diversity of Regression Ensembles Combining Bagging and Random Subspace Method," in Advances in Neuro-Information Processing, ICONIP 2008 (Köppen, M., Kasabov, N., Coghill, G. eds.), pp. 911-918, Springer, Heidelberg, 2009.
54) Smith, J. E., Fogarty, T. C., "Operator and parameter adaptation in genetic algorithms," Soft Computing, 1, 2, pp. 81-87, 1997.
55) Srinivas, M., Patnaik, L. M., "Adaptive Probabilities of Crossover and Mutation in Genetic Algorithms," IEEE Trans. on Systems, Man, and Cybernetics, 24, 4, pp. 656-667, 1994.
56) Tamon, C., Xiang, J., "On the boosting pruning problem," in Proc. of the 11th European Conference on Machine Learning, pp. 404-412, 2000.
57) Tang, K., Li, X., Suganthan, P. N., Yang, Z., Weise, T., "Benchmark Functions for the CEC'2010 Special Session and Competition on Large Scale Global Optimization," Technical Report, Nature Inspired Computation and Applications Laboratory, USTC, China, 2009. http://nical.ustc.edu.cn/cec10ss.php
58) Thrift, P., "Fuzzy logic synthesis with genetic algorithms," in Proc. of the Fourth Int. Conference on Genetic Algorithms (ICGA'91), San Diego, pp. 509-513, 1991.

59) Tsoumakas, G., Partalas, I., Vlahavas, I., "A Taxonomy and Short Review of Ensemble Selection," in ECAI 2008, Workshop on Supervised and Unsupervised Ensemble Methods and Their Applications, Patras, Greece, 2008.
60) Tsymbal, A., Pechenizkiy, M., Cunningham, P., "Diversity in search strategies for ensemble feature selection," Information Fusion, 6, 1, pp. 83-98, 2005.
61) Wang, L. X., Mendel, J. M., "Generating Fuzzy Rules by Learning from Examples," IEEE Trans. on Systems, Man and Cybernetics, 22, 6, pp. 1414-1427, 1992.
62) Yao, X., Liu, Y., Lin, G., "Evolutionary Programming Made Faster," IEEE Trans. Evol. Comput., 3, 2, pp. 82-102, 1999.
63) Yao, X., Liu, Y., "Fast evolution strategies," Contr. Cybern., 26, 3, pp. 467-496, 1997.

Magdalena Smętek: She is a 2nd year Ph.D. student at the Faculty of Computer Science and Management, the Wroclaw University of Technology, Poland. She received her M.S. from the Wroclaw University of Technology in 2009. Her research interests include computational intelligence, knowledge management, and data mining.

Bogdan Trawiński, Ph.D.: He is an assistant professor at the Faculty of Computer Science and Management, the Wroclaw University of Technology, Poland. He received his M.S. (1980) and Ph.D. (1986) from the Wroclaw University of Technology. His main research interests are computational intelligence, knowledge management, data mining, and information systems. He is a member of IEEE.