A multiple population XCS: Evolving condition-action rules based on feature space partitions

Mani Abedini and Michael Kirley

Mani Abedini is a PhD student in the Department of Computer Science and Software Engineering, The University of Melbourne, Australia (email: [email protected]). Michael Kirley is with the Department of Computer Science and Software Engineering, The University of Melbourne, Australia (email: [email protected]).
Abstract— XCS is an accuracy-based machine learning technique, which combines reinforcement learning and evolutionary algorithms to evolve a set of classifiers (or rules) for pattern classification tasks. In this paper, we investigate the effects of alternative feature space partitioning techniques in a multiple population island-based parallel XCS. Here, each of the isolated populations evolves rules based on a subset of the features. The behavior of the multiple population model is carefully analyzed and compared with the original XCS using the Boolean logic multiplexer problem as a test case. Simulation results show that our multiple population XCS produced better performance and better generalization than the single population XCS model, especially as the problem increased in size. A caveat, however, is that the effectiveness of the model was dependent upon the feature space partitioning strategy used.
I. INTRODUCTION

Learning classifier systems are an evolutionary computation based technique used to assign a class label to input data vectors, based on a model built from a previous set of examples. These evolutionary-based approaches can be divided into two different models. Michigan-style learning classifier systems [8] evolve a set of rules, called classifiers, that partition the search space and predict the outcome of actions. In contrast, in the Pittsburgh model each individual is a variable-length rule set, which is then subjected to the usual evolutionary operators [8], [13]. Each model has its own advantages and disadvantages: the Michigan model tends to converge more rapidly, but can fail to learn good solutions for complex problems; the Pittsburgh model can generally solve more difficult problems (in some cases), but at a relatively high computational cost.

XCS is a well-known Michigan-style learning classifier system, which evolves problem solutions represented by a population of classifiers [24], [25], [14]. Each classifier consists of a condition-action-prediction rule, with a fitness value proportional to the accuracy of the prediction of the reward. Evolutionary operators are used to discover better rules that may improve the current population of classifiers. There have been many interesting and informative investigations into the performance of various XCS configurations focused on generalization, specificity and fitness pressure aligned with accuracy. For example, Butz and co-workers [4] have shown that by introducing techniques that can efficiently detect building blocks in the condition part of
the classifier, it may be possible to improve the performance of XCS. Specific evolutionary operators designed to help avoid the over-generalization phenomenon inherent in XCS have also been demonstrated to be useful [15]. The XCS framework is flexible and can be readily adapted by adjusting the condition structures, learning operators and prediction mechanisms. However, a key challenge when using XCS (and indeed many other learning classifier systems) revolves around scalability [23] – how can high levels of performance be obtained when confronted with problems which increase exponentially in size and complexity?

In our previous work [1], we introduced a multiple population coevolutionary XCS, which we refer to as CoXCS (see Section III for details). The motivating factor behind the introduction of CoXCS was the need to develop a flexible and robust system to tackle large-scale, complex classification problems. In CoXCS, isolated sub-populations evolve a set of classifiers based on a partitioning of the feature space in the data. Modifications to the base XCS framework were introduced, including an algorithm to create the match set and a specialized crossover operator. The performance of the model was evaluated using a suite of classification problems, including large scale benchmark gene expression datasets. The experimental results indicated that the accuracy of the proposed model was significantly better than that of other well-known classifiers when the ratio of data features to samples was extremely large. The design and implementation of CoXCS was inspired to some extent by the cooperative coevolution model proposed by Potter and De Jong [19], [18], and a more recent study by Yang and co-workers [26] focussed on the problem of decomposition.

In this paper, we extend this work by investigating the effects of using alternative feature space partitioning techniques in CoXCS: a dynamic random strategy, a fixed strategy with migration, and a dynamic random strategy with migration. Here, we restrict our investigation to large scale instances of the Boolean logic multiplexer problem. Results from the comprehensive simulation experiments conducted suggest that the multiple population XCS with feature space partitioning and migration episodes performs significantly better than the standard XCS configuration.

The remainder of this paper is organized as follows: In Section II we present background material related to XCS. This is followed by a brief review of multi-population implementations of XCS and related decomposition models. In Section III we outline the basic functionality of CoXCS, before describing the feature space partitioning techniques to be investigated. The experiments and results follow in
Section IV. We discuss the results and implications of this work in Section V. We conclude the paper by summarizing the contributions and identifying possible future directions.

II. BACKGROUND: XCS

A. Overview

XCS is a learning classifier system which has demonstrated reliable performance in data mining applications. We provide a brief overview of XCS functionality in this subsection. Space constraints preclude us from providing a detailed discussion of XCS; further details can be found in Wilson's original paper [24] and related papers (e.g., [16], [25], [5]).

XCS maintains a population of classifiers (see Fig. 1). Each classifier consists of a condition-action-prediction rule, which maps input features to the output signal (or class). A ternary representation of the form {0, 1, #} (where # is "don't care") for the condition and {0, 1} for the action can be used. In addition, real encoding can also be used to accurately describe the environment states [25]. At each time step, the classifier system receives a problem instance – input in the form of a vector of features – which requires a decision, that is, an action to be performed next. A match set [M] is created consisting of rules (classifiers) that can be "triggered" by the given data instance. A covering operator is used to create new matching classifiers when [M] is empty. A prediction array is calculated for [M] that contains an estimate of the corresponding reward for each of the possible actions. Based on the values in the prediction array, an action a (the output signal) is selected. In response to a, the reinforcement mechanism is invoked and the prediction p, prediction error ε, accuracy k, and fitness F of the classifier are updated. The corresponding numerical reward is distributed to the rules accountable for it so as to improve the estimates of the action values.

A key component of XCS is the evolutionary computation module. During the evolutionary process, fitness-proportionate selection is used to guide the selection of parents (classifiers in the population), which generate new offspring via crossover and mutation. A bounded population size is typically used. Consequently, a form of niching is used to determine whether the offspring is added to the population and/or which of the old members of the population are deleted to make room for the new classifier (offspring). The deletion of classifiers is biased towards those with larger action set sizes and lower fitness.
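To make this cycle concrete, the following Python sketch outlines a single explore step for ternary conditions. It is a minimal illustration only – the models in this paper were implemented in C++, the parameter names and update rules follow the algorithmic description in [6], and several components (subsumption, the GA invocation, richer covering) are deliberately omitted:

import random

# Indicative parameter values following [6]: learning rate beta and
# accuracy parameters alpha, epsilon_0 and nu.
BETA, ALPHA, EPS0, NU = 0.2, 0.1, 10.0, 5.0

class Classifier:
    def __init__(self, condition, action):
        self.condition = condition   # ternary string over {'0', '1', '#'}
        self.action = action         # 0 or 1
        self.p = 10.0                # payoff prediction
        self.eps = 0.0               # prediction error
        self.F = 0.01                # fitness

    def matches(self, state):
        return all(c in ('#', s) for c, s in zip(self.condition, state))

def explore_step(population, state, reward_fn):
    # 1. Build the match set [M]; invoke covering if it is empty.
    M = [cl for cl in population if cl.matches(state)]
    if not M:
        cond = ''.join(s if random.random() < 0.67 else '#' for s in state)
        new_cl = Classifier(cond, random.randint(0, 1))
        population.append(new_cl)
        M = [new_cl]
    # 2. Fitness-weighted prediction array over the advocated actions.
    PA = {}
    for a in set(cl.action for cl in M):
        advocates = [cl for cl in M if cl.action == a]
        PA[a] = (sum(cl.p * cl.F for cl in advocates)
                 / sum(cl.F for cl in advocates))
    # 3. Choose an action (randomly in explore mode; exploit mode would
    #    instead pick the action with the highest prediction) and form
    #    the action set [A].
    action = random.choice(list(PA))
    A = [cl for cl in M if cl.action == action]
    reward = reward_fn(state, action)
    # 4. Widrow-Hoff updates of error, prediction, accuracy and fitness.
    for cl in A:
        cl.eps += BETA * (abs(reward - cl.p) - cl.eps)
        cl.p += BETA * (reward - cl.p)
    k = {cl: 1.0 if cl.eps < EPS0 else ALPHA * (cl.eps / EPS0) ** -NU
         for cl in A}
    total = sum(k.values())
    for cl in A:
        cl.F += BETA * (k[cl] / total - cl.F)
    return action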
B. Multi-population based XCS

Parallel and distributed implementations of XCS incorporating multiple (and possibly independent) classifier populations have not been widely reported in the literature. To the best of our knowledge, the approaches briefly described below represent the key contributions.

Bull and co-workers [3] present a detailed investigation of an ensemble machine approach utilizing multiple populations of accuracy-based learning classifier systems. A key component of their multiple population model was a rule migration mechanism, which allowed selected rules to move from one population to another at given time intervals. This particular approach is typically used in island-based parallel evolutionary models. Bull et al. have shown that this approach can lead to improved learning speed in comparison to equivalent single systems, using benchmark multiplexer problems of varying complexity. Significantly, they have shown that by exploiting the underlying niche-based generalization mechanism of accuracy-based systems, further performance improvements are possible.

Dam and co-workers [7] proposed an XCS-based client/server distributed data mining system. Each client had its own XCS, which evolved a set of classifiers using a local data repository. The server then combined the models with its own XCS and attempted to find a set of classifiers to help explain patterns incorrectly classified locally by the clients. The performance of the model was evaluated using benchmark problems, focusing on network load and communication costs. The results suggested that the distributed XCS model was competitive as a distributed data mining system, particularly when the epoch size increased.

In a similar study, a multi-population parallel XCS for the classification of electroencephalographic signals was introduced by Skinner et al. [22]. The specific focus of that study was to investigate the effectiveness of migration strategies between sub-populations mapped to ring topologies. They reported that the parameter settings of the multi-population model had a significant effect on the resulting classifier accuracy.

C. Decomposition and coevolutionary models

An alternative approach for solving a classification task is to incorporate a decomposition strategy into the model. For example, Gershoff et al. [9] attempted to improve global XCS performance via a hierarchical partitioning scheme. An agent in the model was assigned to each partition, which contained a collection of homogeneous XCS classifiers. The predicted output signal (class) was then estimated using a voting mechanism. This output signal, with a confidence score, was then passed up the hierarchy to a controlling agent. This agent then decided the final output of the system based on the combined output from each of the sub-populations it was responsible for. Gershoff et al. report improved performance in the limited domain tested.

Richter [20] introduced an extended XCS model, where a series of lower level problems were solved. These results were then combined into a global result for the given problem. Improved performance was noted in the limited range of test problems used in the study. In such an approach, different sub-problem formulations will have a significant impact on the performance of the distributed system.

A recent model employing a cooperative coevolutionary classifier system was introduced by Zhu and Guan [28].
Fig. 1. XCS model overview. The condition segment of the classifier consists of a vector of features, each encoded using real or binary values. The output signal (prediction class) is a binary value in this case. The classifier’s fitness value is proportional to the accuracy of the prediction of the reward. See text for further explanation.
In this fine-grained approach, individuals in isolated sub-populations encoded if–then rules for each feature in the data set. As such, the decomposition was taken to the extreme. Individuals were used to classify the partially masked training data corresponding to the feature in focus. However, this particular approach required a two-step process – a concurrent global and local evolutionary process – in order to generate satisfactory accuracy levels. For data sets with a large number of features (attributes), such fine-grained modeling is computationally expensive.

In a recent unrelated study, Jiao and co-workers [12] introduced an organizational coevolutionary algorithm for classification, based on a bottom-up search mechanism using genetic algorithms. The algorithm was based on simulating the interactions between organizations – or sets of examples – in order to extract rules from the examples. They report that the multiple population model was able to evolve accurate solutions across a range of problems.

III. MODEL

A. CoXCS

In many learning classifier systems, including XCS, the intrinsic functionality of coevolution guides the rule discovery process. The structure of an individual rule is made up of a condition, an action, and one or more parameter values (e.g., a prediction value and/or a fitness value). A form of parallel processing is used to identify appropriate building blocks when processing information and to resolve conflicts. Consequently, there is a complex balancing act between the competitive evolutionary pressures of generalization and specificity, coupled with the fitness pressure that drives accuracy, and the cooperative interactions between the evolved classifiers.
Clearly there are differences between the coevolutionary models described above and standard XCS-flavoured models. However, the rationale on which we have based the design and development of CoXCS is that by employing an explicit divide and conquer approach with alternative approaches to feature space partitioning (that is, problem decomposition), it will be possible to evolve robust classifiers.

CoXCS is fundamentally a coevolutionary parallel learning classifier based on feature space partitioning. Fig. 2 provides a high-level schematic overview of the system, and Algorithm 1 provides a description of CoXCS. CoXCS employs multiple isolated sub-populations, each executing its own XCS using a subset of features. The covering mechanism operates explicitly on a subset of features, determined by a partitioning protocol. An important design component of the model addressed in this study is the use of both fixed and dynamically sized partitions. The model makes explicit use of the don't care (#) value when concatenating individuals from each sub-population. This technique provides the necessary selection pressure to guide the evolving populations towards maximally general, non-overlapping, accurate classifiers.

Within the CoXCS model, the features contained in the dataset are partitioned into a set of n sub-populations. Each sub-population executes its own XCS model. The condition segment of each classifier in a given sub-population is initialized using the associated subset of the features (of size λ, which may vary under different model configurations) from the data set being processed. Given that each sub-population is focussed on a subset of the features in the condition segment for the whole problem, we treat all remaining features from the other sub-populations as don't cares (#).
Fig. 2. High level overview of the CoXCS model. Each isolated sub-population evolves solutions based on a partitioning of the feature space (of size λ) using a separate XCS. Randomly selected classifiers migrate between sub-populations. In this version of the model, the partitioning is fixed in a linear fashion. However, alternative partitioning protocols are possible (see Section III-B).
The covering mechanism in each XCS sub-population works in the standard manner using a subset of the features. Individual classifier evaluation within a sub-population is a relatively straightforward task: we simply append # at the appropriate ends of the feature vector to build a complete solution. Importantly, the sub-populations evolve separately, but with a common objective. Each isolated sub-population accumulates and specializes its expertise across a subset of the input space. Bounded sub-population sizes are used as per the standard XCS model. When a new classifier is added to a sub-population and the size limit has been reached, a randomly selected classifier (based on a niching technique) is deleted from the sub-population.

CoXCS also utilizes migration episodes; as such, the model has some similarities to the ensemble machine model described in [3]. After a fixed number of iterations of the sub-population XCS, randomly selected classifiers migrate to a different sub-population based on a random migration topology. It is important to note that the evolutionary operators (mutation and crossover) do not destroy the inherent building blocks within the immigrant classifiers; inside each sub-population, evolutionary operators are restricted to only the features associated with that partition. Typically, in cooperative coevolutionary models, migration between sub-populations is not used. However, migration provides a mechanism to exchange building blocks between different partitions and can be thought of as an advanced crossover-style operator.
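As a sketch of this evaluation step (our own illustration; the function name and partition bookkeeping are assumptions rather than part of the CoXCS implementation), a sub-population classifier can be expanded to a full-length condition by filling every feature position outside its partition with the don't care symbol:

def expand_condition(local_condition, partition, total_features):
    # Build a full-length ternary condition from a sub-population
    # classifier: `partition` lists the indices (into the original
    # feature vector) owned by this sub-population; every remaining
    # position is treated as don't care (#).
    full = ['#'] * total_features
    for local_pos, feature_index in enumerate(partition):
        full[feature_index] = local_condition[local_pos]
    return ''.join(full)

# A sub-population owning features {0, 3, 4} of a 6-feature problem,
# with local condition '1#0', yields the full condition '1###0#'.
print(expand_condition('1#0', [0, 3, 4], 6))   # -> '1###0#'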
B. Feature space partitioning

There are many different ways to partition the feature space. Here, we argue that by focussing specifically on the inherent coevolutionary properties of XCS, it will be possible to evolve classifiers capable of tackling large scale problems effectively, based on a framework similar to the cooperative coevolution framework. Before describing the proposed techniques, key decomposition approaches from the evolutionary optimization community are reviewed, illustrating our rationale.

In Potter and De Jong's optimization model, a high dimensional search space was divided into n low dimensional search spaces or sub-problems. A separate population was used to evolve solutions for each of the sub-problems (which evolved independently). However, for evaluation purposes, representative individuals from each of the sub-populations were combined to form a complete solution, and a fitness value was calculated accordingly. In other work, Shi et al. introduced a cooperative differential evolution model for high dimensional optimization [21]. When decomposing the problem, interdependencies between features were ignored and no restrictions were placed on sub-problem composition. More recently, Yang et al. [26] have suggested that random and dynamic feature allocation – that is, problem decomposition – will improve the performance of a cooperative coevolution algorithm for a given optimization problem.

In this study, two feature-space partitioning strategies (fixed and dynamic random) are examined in combination with a specific migration protocol, giving three configurations (a code sketch of these protocols follows Algorithm 1 below):

• Dynamic Random Partitioning (DRP). A group of features of size s_i from the condition segment is randomly assigned to each of the n sub-populations (without replacement). It is important to note that the actual feature positions in the original data set do not have to be maintained when allocating features to a partition. Periodically, the allocation of features to a given partition is reorganized.

• Fixed Random Partitioning with Migration (FRPM). A group of features of fixed size s_i from the condition segment is randomly assigned to each of the n sub-populations. In this approach, there is no periodic reorganization of features. However, migration episodes, which occur periodically after a given number of learning epochs, provide a mechanism to exchange building blocks. It is important to note that as a result of migration, classifiers assigned to a partition are of variable length (this may be a result of crossover between individuals originating from different partitions).

• Dynamic Random Partitioning with Migration (DRPM). A group of features of size s_i from the condition segment is randomly assigned to each of the n sub-populations. As in DRP, the actual feature positions in the original data set do not have to be maintained, and the allocation of features to a given partition is periodically reorganized. Migration episodes, as described above for FRPM, are also used.

Algorithm 1 CoXCS high-level overview
Require: n: the number of partitions; v: vector of features from the condition segment
 1: partition[] = PARTITION-FEATURES(n, v)
 2: for e = 0 to maxEpochs do
 3:   for i = 1 to n do
 4:     XCS[i].CREATE(partition[i])
 5:     XCS[i].RUN(partition[i])
 6:   end for
 7:   CoXCS.EVALUATE(v)
 8:   if CoXCS.DYNAMIC then
 9:     partition[] = PARTITION-FEATURES(n, v)
10:   end if
11:   if CoXCS.MIGRATION then
12:     MIGRATION(n)
13:   end if
14: end for
15: CoXCS.RESULTS()
16: function PARTITION-FEATURES(n, v) returns partition[]
17:   for j = 1 to |v| do
18:     index = RANDOM(n)
19:     partition[index].ADD(v[j])
20:   end for
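The sketch below summarizes these protocols in Python under the same assumptions (illustrative names of our own; the 10% rate in the example follows Section IV-B): random feature-to-partition allocation as in PARTITION-FEATURES, the periodic reallocation used by DRP and DRPM, and a migration episode over a random topology as used by FRPM and DRPM.

import random

def partition_features(n, features):
    # Randomly assign every feature index to one of n partitions
    # (without replacement), as in PARTITION-FEATURES of Algorithm 1.
    # Note that partition sizes are not forced to be equal.
    partitions = [[] for _ in range(n)]
    for f in features:
        partitions[random.randrange(n)].append(f)
    return partitions

def migrate(subpops, rate=0.10):
    # One migration episode over a random topology: a fraction `rate`
    # of randomly chosen classifiers leaves each sub-population for
    # another randomly chosen sub-population (n >= 2 assumed).
    n = len(subpops)
    for i in range(n):
        if not subpops[i]:
            continue
        k = max(1, int(rate * len(subpops[i])))
        for cl in random.sample(subpops[i], k):
            subpops[i].remove(cl)
            dest = random.choice([j for j in range(n) if j != i])
            subpops[dest].append(cl)

# FRPM fixes the allocation once; DRP and DRPM periodically reassign:
features = list(range(20))                     # e.g., the 20-multiplexer
partitions = partition_features(2, features)   # initial allocation
# ... run each sub-population's XCS for a number of epochs, then ...
partitions = partition_features(2, features)   # dynamic reallocation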
IV. EXPERIMENTS
A series of experiments was carried out to evaluate the relative performance of the alternative feature space partitioning and migration protocols proposed in the CoXCS model. The underlying hypothesis tested was that the use of an explicit partitioning scheme would lead to improved performance when compared to the standard XCS.

A. Test problems

The scalable test problem used in this study was the benchmark Boolean logic multiplexer problem. This problem has often been used in the learning classifier literature to illustrate adaptation and decision-making ability. The problem is interesting because the function to be learned is complex yet does allow generalizations to be made, while the classifier input representation is straightforward.
Fig. 3. An example of the 6-multiplexer test. The address bits point to the third data bit, so the output of the multiplexer is equal to the value of the third data bit, which is 1.
The L-multiplexer has an input vector of length L = k + 2^k with k > 0, in which the first k bits are the address bits and the last 2^k bits are the data bits (see Fig. 3). Multiplexers 3, 6, 11 and 20 have been widely used in the learning classifier and machine learning literatures. In this study, we report results on three large scale instances: multiplexer 20, 37 and 70. The larger problems have been selected to illustrate performance differences between the alternative CoXCS implementations and XCS – where high accuracy levels are still achievable in a reasonable learning time frame. It should be noted that the goal of the present study is to compare the relative merits of the alternative partitioning strategies. In our previous study [1], we have already shown that fixed partitioning is a useful technique in large-scale classification tasks.
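For reference, the target concept itself is simple to state in code; the following minimal Python sketch (our own, with k inferred from the input length) computes the multiplexer output:

def multiplexer(bits):
    # k-address-bit Boolean multiplexer: the first k bits address one
    # of the 2**k data bits, whose value is the output. len(bits) must
    # equal k + 2**k for some k > 0 (i.e., 3, 6, 11, 20, 37, 70, ...).
    k = 1
    while k + 2 ** k < len(bits):
        k += 1
    assert k + 2 ** k == len(bits), "invalid multiplexer input length"
    address = int(''.join(str(b) for b in bits[:k]), 2)
    return bits[k + address]

# The Fig. 3 example: for the 6-multiplexer, address bits '10' select
# data bit 2 (zero-indexed), i.e. the third data bit, whose value is 1.
print(multiplexer([1, 0, 0, 0, 1, 1]))   # -> 1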
B. Parameters

Three different CoXCS configurations were used in this study: DRP, FRPM, and DRPM. For comparison purposes, results for the standard XCS are also reported. We have attempted to keep to the default XCS parameter settings where appropriate. That is, the parameters for XCS – both the base case and the XCS model running in each isolated sub-population of CoXCS – were configured using the default XCS settings recommended in [6]. However, determining the most appropriate population size for both the standard XCS model and the isolated populations in CoXCS required some tuning based on previously published population sizes. Consequently, the population size was dependent on the given multiplexer problem: population sizes of 2000, 3000 and 4000 were used for multiplexer 20, 37 and 70 respectively. The partitioning of the condition segment feature space was also problem dependent: the number of partitions n was set to 2, 2 and 6 for multiplexer 20, 37 and 70 respectively. In all scenarios, when migration was used, the migration rate was set to 10%.

All simulation experiments were run on a cluster server with 20 nodes; each node has a Xeon Pro 3.0 GHz processor with 4 GB of memory and 80 GB of hard disk capacity. The operating system installed on the cluster server was Red Hat Enterprise Linux 5. The XCS models were written in C++.

C. Results

XCS has two modes, explore and exploit, which are employed iteratively. The exploit mode is used to test the accuracy of the learning process, while the explore mode corresponds to the learning phase. We report time series results of accuracy and population coverage across simulation trials in steps of 200 exploit steps. The plots show values averaged across 30 different trials. Error bars have been omitted from the plots for clarity.

Figures 4(a), 5(a), and 6(a) plot results from the trials running the standard XCS on the multiplexer 20, 37 and 70 problems respectively. As expected, the time required to find 100% accurate solutions increases with problem size. For example, multiplexer-20 can be solved in 3 × 10^4 iterations and multiplexer-37 takes 2 × 10^5 iterations. However, multiplexer-70 could not be solved accurately in 12 × 10^6 iterations. The plots also show that the number of unique classifiers in the population reduces (the learning process focuses on more informative classifiers) as the accuracy level improves.

Figure 4(b) demonstrates that all CoXCS configurations (DRP, FRPM and DRPM) are comparable to, and in some cases outperform, the standard XCS. Considering the fact that the CoXCS versions were run with two partitions (sub-populations), they require only 2 × 1.4 × 10^4 iterations in total, which is significantly less than the time required by the standard XCS. Significantly, the number of unique classifiers in the population reduces over time.

Figure 5(b) also demonstrates that all CoXCS versions have execution times comparable to the standard XCS. The number of iterations taken to find 100% accuracy was 2 × (1 × 10^4) for CoXCS and 2 × 10^5 for XCS.

In Figure 6(b) the performance differences between the standard XCS and each of the CoXCS partitioning schemes are clearly visible. There is a significant performance improvement when feature space partitioning (DRP, FRPM and DRPM) is used. This is a complex, non-linear classification problem – CoXCS can solve it in a reasonable time frame. The results also suggest that migration episodes play an important role in guiding the trajectory of the evolving population.

V. DISCUSSION

In order to solve a classification problem, it is necessary to identify a compact set of rules that classifies all problem instances with maximal accuracy. In the case of learning classifiers such as XCS, each classifier identifies what the system knows about the problem solution. As such, XCS is an adaptive system that learns from an internal structural evolution guided by reinforcement learning. In this study, we have taken an abstract view of the problem of classification, and sought inspiration from research into large-scale optimization problems. In CoXCS, an explicit divide-and-conquer strategy encourages the evolution of specialized classifiers and allows us to maximize the advantages of the embedded reinforcement learning mechanism in XCS.

In our previous work, we have shown that CoXCS is comparable with, and in many cases outperforms, the standard XCS, especially when confronted with problems characterized by an extremely large ratio of data features to
samples. However, in that study, only a fixed partitioning strategy was examined. Here, we have extended this work by analyzing the performance of alternative feature space partitioning techniques using three large instances of the multiplexer problem.

The dynamic partitioning and migration techniques used in CoXCS can be thought of as "high level" evolutionary operators. Here, the combination of isolated populations and feature-space partitioning provides a "restricted mating" rule exchange protocol. The operators provide the means to both divide and put together groups of individual classifiers in a cooperative problem-solving framework. CoXCS is able to utilize the advantages of the fixed-length encoding inherent in Michigan style models. This approach provides a flexible means of balancing local search against the exploration of new regions of the search space.

The feature space partitioning protocols introduced have not specifically taken into account the role of interacting features within the condition segment of a rule. This is one limitation of the model that requires further investigation. However, the motivation for this particular approach was based on work reported in [26]. We have adopted an approach that allows each of the isolated sub-populations to work on different sections of the feature space. This is achieved either by a dynamically changing allocation of features to partitions or via a combination of migration episodes followed by crossover operators within the evolutionary modules. A consequence of this technique is that, across the action sets of the isolated XCS populations, non-overlapping classifiers may well be present without the existence of any overlapping ones; this in turn may impact the effectiveness of the recombination used, and throws up an additional challenge.

The experimental results presented above illustrate the effects of the evolutionary operators and the fitness function. CoXCS not only evolves rules with high predictive accuracy, but also tends to produce maximally general solutions that can describe particular niches, as a consequence of the concatenation of individuals from each of the isolated populations. We can conclude that, based on the multiplexer problems examined, coevolving multiple populations improves the performance of XCS. In particular, the use of a dynamic random partitioning strategy appears to promote improved performance – a result consistent with findings from large-scale optimization reported in [26]. In addition, there is supporting evidence suggesting that intermittent migration can in fact speed up the convergence rate of the coevolutionary XCS – a result consistent with the findings reported in [3] – especially when there is an increasing number of isolated sub-populations.

Obviously, when confronted with large-scale classification tasks, a parallel and distributed infrastructure can be used to reduce computational time. In this study, we have shown that from a design perspective the use of isolated populations (even when deployed/implemented in a serial manner on a single-core machine) can lead to significant performance improvements.
Fig. 4. The performance of the standard XCS (a) and CoXCS (DRP, FRPM, DRPM) (b) on the multiplexer-20 test (y-axis: accuracy and population size/2000; x-axis: exploit trials).

Fig. 5. The performance of the standard XCS (a) and CoXCS (DRP, FRPM, DRPM) (b) on the multiplexer-37 test (y-axis: accuracy and population size/3000; x-axis: exploit trials).

Fig. 6. The performance of the standard XCS (a) and CoXCS (DRP, FRPM, DRPM) (b) on the multiplexer-70 test (y-axis: accuracy and population size/4000; x-axis: exploit trials).

Thus, CoXCS with alternative feature space partitioning protocols provides a
robust framework for tackling complex classification tasks.

VI. CONCLUSIONS

In this paper, we have provided a detailed investigation of a new multiple population based XCS for the Boolean multiplexer problem. The motivation behind this work was to design a robust, scalable approach for a given classification task based on a partitioning of the condition segment feature space. Previous work from the large-scale optimization community provided the framework for the development of our model. We have examined two alternative feature-to-partition allocation techniques, combined with migration episodes. Detailed simulation experiments show that the CoXCS models are able to generate accurate solutions faster than XCS. The results suggest that the partitioning strategy plays an important role in guiding the trajectory of the evolving populations.

In future work, there is scope to examine the effectiveness of a distributed deployment and alternative migration policies using a range of different classification problems. It would also be interesting to develop a theoretical explanation of the findings.
REFERENCES

[1] M. Abedini and M. Kirley. CoXCS: A coevolutionary learning classifier based on feature space partitioning. In Australasian Conference on Artificial Intelligence, volume 5866 of Lecture Notes in Artificial Intelligence, pages 360–369. Springer-Verlag, Berlin, 2008.
[2] L. Bull and T. Kovacs, editors. Foundations of Learning Classifier Systems, volume 183 of Studies in Fuzziness and Soft Computing. Springer, 2005.
[3] L. Bull, M. Studley, A. Bagnall, and I. Whittley. Learning classifier system ensembles with rule-sharing. IEEE Transactions on Evolutionary Computation, 11(4):496–502, 2007.
[4] M. Butz, M. Pelikan, X. Llorà, and D. E. Goldberg. Automated global structure extraction for effective local building block processing in XCS. Evolutionary Computation, 14(3):345–380, 2006.
[5] M. V. Butz, T. Kovacs, P. L. Lanzi, and S. W. Wilson. Toward a theory of generalization and learning in XCS. IEEE Transactions on Evolutionary Computation, 8(1):28–46, 2004.
[6] M. V. Butz and S. W. Wilson. An algorithmic description of XCS. In P. L. Lanzi, W. Stolzmann, and S. W. Wilson, editors, Advances in Learning Classifier Systems, volume 1996 of Lecture Notes in Computer Science, pages 267–274. Springer, Berlin, 2001.
[7] H. H. Dam, H. A. Abbass, and C. Lokan. DXCS: an XCS system for distributed data mining. In Proceedings of the 2005 Conference on Genetic and Evolutionary Computation (GECCO-05), pages 1883–1890. ACM Press, 2005.
[8] K. A. De Jong, W. M. Spears, and D. F. Gordon. Using genetic algorithms for concept learning. Machine Learning, 13(2):161–188, 1993.
[9] M. Gershoff and S. Schulenburg. Collective behavior based hierarchical XCS. In Proceedings of the 2007 Genetic and Evolutionary Computation Conference (GECCO-07), pages 2695–2700. ACM Press, 2007.
[10] D. E. Goldberg. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, Reading, Mass., 1989.
[11] J. H. Holland, L. B. Booker, M. Colombetti, M. Dorigo, D. E. Goldberg, S. Forrest, R. L. Riolo, R. E. Smith, P. L. Lanzi, W. Stolzmann, and S. W. Wilson. What is a learning classifier system? In P. L. Lanzi, W. Stolzmann, and S. W. Wilson, editors, Learning Classifier Systems: From Foundations to Applications, volume 1813 of LNAI, pages 3–32. Springer-Verlag, Berlin, 2000.
[12] L. Jiao, J. Liu, and W. Zhong. An organizational coevolutionary algorithm for classification. IEEE Transactions on Evolutionary Computation, 10(1):67–80, 2006.
[13] X. Llorà and J. M. Garrell. Co-evolving different knowledge representations with fine-grained parallel learning classifier systems. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2002), 2002.
[14] T. Kovacs. Two views of classifier systems. In P. L. Lanzi, W. Stolzmann, and S. W. Wilson, editors, Advances in Learning Classifier Systems, volume 2321 of LNAI, pages 74–87. Springer-Verlag, Berlin, 2002.
[15] P. L. Lanzi. A study of the generalization capabilities of XCS. In T. Bäck, editor, Proceedings of the 7th International Conference on Genetic Algorithms, pages 418–425. Morgan Kaufmann, 1997.
[16] P. L. Lanzi, W. Stolzmann, and S. W. Wilson, editors. Learning Classifier Systems: From Foundations to Applications, volume 1813 of Lecture Notes in Artificial Intelligence. Springer-Verlag, Berlin, 1st edition, 2000.
[17] A. Pietramala, V. L. Policicchio, P. Rullo, and I. Sidhu. A genetic algorithm for text classification rule induction. In ECML PKDD '08: Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases – Part II, pages 188–203. Springer-Verlag, Berlin, Heidelberg, 2008.
[18] M. A. Potter and K. A. De Jong. A cooperative coevolutionary approach to function optimization. Lecture Notes in Computer Science, 866, 1994.
[19] M. A. Potter and K. A. De Jong. Cooperative coevolution: An architecture for evolving coadapted subcomponents. Evolutionary Computation, 8(1):1–29, 2000.
[20] U. Richter, H. Prothmann, and H. Schmeck. Improving XCS performance by distribution. In Simulated Evolution and Learning, volume 5361 of Lecture Notes in Computer Science, pages 111–120. Springer-Verlag, Berlin, 2008.
[21] Y. Shi, H. Teng, and Z. Li. Cooperative co-evolutionary differential evolution for function optimization. In Advances in Natural Computation, volume 3611 of Lecture Notes in Computer Science, pages 1080–1088. Springer, Berlin / Heidelberg, 2005.
[22] B. Skinner, H. Nguyen, and D. Liu. Distributed classifier migration in XCS for classification of electroencephalographic signals. In 2007 IEEE Congress on Evolutionary Computation, pages 2829–2836. IEEE Press, 2007.
[23] P. Stalph, M. Butz, D. E. Goldberg, and X. Llorà. On the scalability of XCS(F). In GECCO '09: Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation, pages 1315–1322, 2009.
[24] S. W. Wilson. Classifier fitness based on accuracy. Evolutionary Computation, 3(2):149–175, 1995. http://prediction-dynamics.com/.
[25] S. W. Wilson. Get real! XCS with continuous-valued inputs. In P. L. Lanzi, W. Stolzmann, and S. W. Wilson, editors, Learning Classifier Systems: From Foundations to Applications, volume 1813 of Lecture Notes in Computer Science, pages 209–222. Springer, 1999.
[26] Z. Yang, K. Tang, and X. Yao. Large scale evolutionary optimization using cooperative coevolution. Information Sciences, 178(15):2985–2999, August 2008.
[27] Y. Zhang and J. C. Rajapakse. Machine Learning in Bioinformatics. Wiley Series in Bioinformatics. Wiley, 1st edition, 2008.
[28] F. Zhu and S. Guan. Cooperative co-evolution of GA-based classifiers based on input decomposition. Engineering Applications of Artificial Intelligence, 21:1360–1369, 2008.