Investigation of Rotation Forest Ensemble Method Using Genetic Fuzzy Systems for a Regression Problem

Tadeusz Lasota1, Zbigniew Telec2, Bogdan Trawiński2, Grzegorz Trawiński3

1 Wrocław University of Environmental and Life Sciences, Dept. of Spatial Management, ul. Norwida 25/27, 50-375 Wrocław, Poland
2 Wrocław University of Technology, Institute of Informatics, Wybrzeże Wyspiańskiego 27, 50-370 Wrocław, Poland
3 Wrocław University of Technology, Faculty of Electronics, Wybrzeże S. Wyspiańskiego 27, 50-370 Wrocław, Poland
[email protected], [email protected], {zbigniew.telec, bogdan.trawinski}@pwr.wroc.pl
Abstract. The rotation forest ensemble method using a genetic fuzzy rule-based system as a base learning algorithm was developed in the Matlab environment. The method was applied to the real-world regression problem of predicting the prices of residential premises based on historical data of sales/purchase transactions. Computationally intensive experiments were conducted to compare the accuracy of the ensembles generated by our proposed method with bagging, repeated holdout, and repeated cross-validation models. The statistical analysis of the results was performed using the nonparametric Friedman and Wilcoxon tests.

Keywords: rotation forest, genetic fuzzy systems, bagging, repeated holdout, cross-validation, ensemble models, property valuation
1 Introduction

Ensemble machine learning models have attracted the attention of many researchers due to their ability to reduce bias and/or variance when compared to their single model counterparts. Ensemble learning methods combine the output of machine learning systems, called "weak learners" in the literature due to their performance, in order to obtain smaller prediction errors (in regression) or lower error rates (in classification). The individual estimators must provide different patterns of generalization, so diversity plays a crucial role in the training process. The most popular methods include bagging [1], boosting [19], and stacking [20]. In this paper we focus on the bagging family of techniques.
Bagging, which stands for bootstrap aggregating, was devised by Breiman [1] and is one of the most intuitive and simplest ensemble algorithms providing good performance. Diversity of learners is obtained by using bootstrapped replicas of the training data: different training data subsets are randomly drawn with replacement from the original training set. The training subsets obtained in this way, also called bags, are then used to train different classification and regression models. Theoretical analyses and
experimental results have demonstrated the benefits of bagging, especially in terms of stability improvement and variance reduction of learners, for both classification and regression problems [4], [7].
Another approach to ensemble learning, called random subspaces and also known as attribute bagging [3], seeks learner diversity through feature space subsampling. All component models are built with the same training data, but each takes into account a randomly chosen subset of features, which brings diversity to the ensemble. For the most part, the feature count is fixed at the same level for all committee components. The method aims to increase the generalization accuracy of decision tree-based classifiers without loss of accuracy on training data. Ho showed that random subspaces can outperform bagging and in some cases even boosting [9]. While other methods are hampered by the curse of dimensionality, the random subspace technique can actually benefit from it.
Both bagging and random subspaces were devised to increase classifier or regressor accuracy, but each treats the problem from a different point of view. Bagging provides diversity by operating on training set instances, whereas random subspaces seek diversity through feature space subsampling. Breiman [2] developed a method called random forest which merges these two approaches: random forest uses bootstrap selection to supply each individual learner with training data and limits the feature space by random selection. Some recent studies have focused on hybrid approaches combining random forests with other learning algorithms [8], [12].
Rodríguez et al. [18] proposed in 2006 a new classifier ensemble method, called rotation forest, which applies Principal Component Analysis (PCA) to rotate the original feature axes in order to obtain different training sets for learning base classifiers. Their approach attempts to achieve greater diversity and accuracy of the components within the ensemble simultaneously. Diversity is obtained by applying PCA as a form of feature manipulation for each individual classifier, and accuracy is acquired by keeping all principal components (i.e. all features) and employing all instances to train each classifier. They conducted a validation experiment comparing rotation forest with bagging, AdaBoost, and random forests, and they explored the effect of design choices and parameter values on the performance of rotation forest ensembles in [15]. Zhang and Zhang successfully combined rotation forest with other ensemble classification techniques, namely with bagging [21] and with AdaBoost [22], and tested the combinations on benchmark classification datasets. Kotsiantis [12] built a hybrid ensemble combining bagging, boosting, rotation forest, and random subspaces. Jędrzejowicz and Jędrzejowicz [10] explored rotation forest based classifier ensembles created using expression trees induced by gene expression programming. Some research has also been done into the application of rotation forest to regression problems: Zhang et al. [23] compared the method with bagging, random forest, AdaBoost.R2, and a single regression tree, and Kotsiantis and Pintelas [13] proposed a combined technique of local rotation forest of decision stumps.
In the majority of the aforementioned experiments on rotation forest techniques, various kinds of decision trees were used as base learning algorithms, because they offer relatively high computational efficiency and provide diverse component models of an ensemble.
Since in rotation forest each tree is trained on a dataset in a rotated feature space, and a tree builds classification regions using hyperplanes parallel to the feature axes, even a small rotation of the axes may lead to a very different tree [15].
In this paper we explore the rotation forest technique using a genetic fuzzy system as a base learning method, applied to the real-world regression problem of predicting the prices of residential premises based on historical data of sales/purchase transactions obtained from a cadastral system. To the best of our knowledge, no results of ensemble models created using rotation forest with genetic fuzzy systems have been published yet, probably due to their considerable computational load. The research was conducted with our newly developed experimental system in the Matlab environment, designed to test multiple models using different resampling methods. So far, we have used this system to investigate genetic fuzzy systems and genetic fuzzy networks applied to construct regression ensemble models that assist with real estate appraisal [11], [16], [17].
2 Rotation Forest with Genetic Fuzzy System

Our study consisted in the application of the rotation forest (RF) method using a genetic fuzzy system (GFS) as a base learning algorithm to a real-world regression problem of predicting the prices of residential premises based on historical data of sales/purchase transactions obtained from a cadastral system. In terms of accuracy, we compared the models produced by RF with the ones generated by other resampling techniques, such as repeated bootstrap, i.e. bagging (BA), repeated holdout (HO), and repeated cross-validation (CV).
In our Matlab experimental system we developed a data driven fuzzy system of Mamdani type, in which triangular and trapezoidal membership functions for each input and output variable were determined automatically by the symmetric division of the individual attribute domains. The evolutionary optimization process utilized a genetic algorithm of Pittsburgh type and combined learning the rule base with tuning the membership functions using real-coded chromosomes. Similar designs are described in [5], [6], [14].
The rotation forest method was implemented in the Matlab environment based on the ideas described by Rodríguez et al. [18] and Zhang et al. [23]. The pseudo code of our rotation forest ensemble method employing the genetic fuzzy system as a base learning algorithm is presented in Fig. 1.
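To make the symmetric division of attribute domains concrete, below is a minimal sketch in Python (the original system was implemented in Matlab). The function names are our own, the sketch uses triangular membership functions only, and the exact partitioning scheme of our system may differ in detail.

import numpy as np

def triangular(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    left = (x - a) / (b - a) if b > a else 1.0
    right = (c - x) / (c - b) if c > b else 1.0
    return max(min(left, right), 0.0)

def symmetric_partition(lo, hi, n_mf):
    """Divide the domain [lo, hi] into n_mf overlapping triangular MFs
    with evenly spaced peaks; the outer triangles extend past the bounds."""
    peaks = np.linspace(lo, hi, n_mf)
    step = (hi - lo) / (n_mf - 1)
    return [(p - step, p, p + step) for p in peaks]

# Example: three MFs over a normalized input attribute (cf. Table 2).
mfs = symmetric_partition(0.0, 1.0, 3)
print([triangular(0.4, *abc) for abc in mfs])  # approx. [0.2, 0.8, 0.0]

At any point of the domain the membership degrees of the overlapping triangles sum to one, which is the usual property of such symmetric partitions.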
3 Plan of Experiments

The following four approaches to creating ensemble models using genetic fuzzy systems as a base learning algorithm were employed in our experiments: rotation forest (RF), bagging (BA), repeated holdout (HO), and repeated cross-validation (CV). In each case an ensemble comprised 50 component models. They were tested with the following parameters:
RF: M=2, 3, and 4 – rotation forest with three different numbers of input attributes in the individual feature subsets, as described in Section 2.
RF: B90, B70, B50 – rotation forest with three different sizes of the bootstrap samples drawn from Xij, equal to 90%, 70%, and 50%, respectively, as described in Section 2.
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
Given:
• GFS: genetic fuzzy system used as a base learning algorithm
• L: number of fuzzy models (fuzzy inference systems – FIS) that make up an ensemble
• n: number of input attributes in the base training data set
• N: number of instances in the base training data set
• X: N × n matrix of input attribute values for individual instances
• Y: N × 1 vector of output attribute values
• T = (X, Y): base training dataset as the concatenation of X and Y
• F: feature set; Fij – j-th attribute subset used for training the i-th FISi
• M: number of input attributes in an individual feature subset
• Xij: data corresponding to Fij selected from matrix X
• X′ij: bootstrap sample drawn from Xij
• Dij: M × M matrix storing the coefficients of the principal components computed by PCA
• Ri: block diagonal matrix built of the matrices Dij
• R^a_i: rotation matrix used to obtain the training set for the i-th FIS

Training Phase
For i = 1, 2, …, L
• Calculate the rotation matrix R^a_i for the i-th FIS:
  1. Randomly split F into K subsets Fij (j = 1, …, K) of M attributes each (the last subset may contain fewer than M attributes)
  2. For j = 1, 2, …, K
     a. Select the columns of X that correspond to the attributes of Fij to compose a new matrix Xij
     b. Draw a bootstrap sample X′ij from Xij, with a sample size smaller than that of Xij, e.g. of size B equal to 50%, 70%, or 90% of Xij
     c. Apply PCA to X′ij to obtain a matrix Dij whose p-th column comprises the coefficients of the p-th principal component
     EndFor
  3. Build a block diagonal matrix Ri of the matrices Dij (j = 1, 2, …, K)
  4. Construct the resulting rotation matrix R^a_i by rearranging the rows of Ri to match the order of attributes in F
• Compute (X R^a_i, Y) and use it as the input (training dataset) of GFS to obtain the i-th FISi
EndFor

Predicting Phase
• For any instance $x_t$ from the test dataset, let $\mathrm{FIS}_i(x_t R_i^a)$ be the value predicted by the i-th fuzzy model; then the predicted output value $y_t^p$ is computed as

$y_t^p = \frac{1}{L} \sum_{i=1}^{L} \mathrm{FIS}_i\!\left(x_t R_i^a\right)$
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
Fig. 1. Pseudo code of the rotation forest ensemble method employing the genetic fuzzy system
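The core of Fig. 1 is the construction of the rotation matrix R^a_i. A minimal sketch of this step in Python with NumPy follows (the original implementation was in Matlab); the function name, the use of SVD to obtain the principal component coefficients, and the data sizes are our illustrative assumptions.

import numpy as np

def build_rotation_matrix(X, M, B=0.9, rng=None):
    """One iteration of the rotation forest training phase: split the
    feature set into subsets of size M, run PCA on a bootstrap sample of
    each subset, and assemble the rotation matrix with its blocks placed
    to match the original attribute order."""
    rng = rng or np.random.default_rng()
    N, n = X.shape
    features = rng.permutation(n)                  # random split of F
    subsets = [features[j:j + M] for j in range(0, n, M)]

    R = np.zeros((n, n))
    for subset in subsets:
        Xij = X[:, subset]
        # Bootstrap sample of size B*N drawn with replacement from Xij
        rows = rng.choice(N, size=int(B * N), replace=True)
        Xb = Xij[rows] - Xij[rows].mean(axis=0)
        # PCA via SVD: the columns of Vt.T hold the component coefficients
        _, _, Vt = np.linalg.svd(Xb, full_matrices=False)
        Dij = Vt.T
        # Placing the block at the subset's row/column positions combines
        # the block-diagonal construction and the row rearrangement steps.
        R[np.ix_(subset, subset)] = Dij
    return R

# Usage: rotate the training inputs before fitting the i-th base learner.
X = np.random.rand(200, 5)          # 200 instances, 5 attributes
Ra = build_rotation_matrix(X, M=2, B=0.9)
X_rotated = X @ Ra                  # training input for the GFS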
BA: BA100, BA80 – m-out-of-n bagging with replacement, with different sizes of samples; the numbers in the codes indicate what percentage of the training set was drawn.
HO: H100, H80 – repeated holdout, i.e. m-out-of-n bagging without replacement, with different sizes of samples; the numbers in the codes indicate what percentage of the training set was drawn.
CV: 5x10cv, 10x5cv – repeated cross-validation, in which 10-fold and 5-fold cross-validation splits were repeated 5 and 10 times, respectively, yielding 50 models in each case.
The real-world data used in the experiments was drawn from an unrefined dataset containing over 50 000 records of residential premises transactions concluded in a big Polish city with a population of 640 000 over the eleven years from 1998 to 2008. In this period most transactions were made at non-market prices, when the council was selling flats to their current tenants on preferential terms. First, the transactional records referring to residential premises sold at market prices were selected. Then the dataset was confined to sales transaction data of apartments built before 1997 for which the land was leased on terms of perpetual usufruct. The final dataset comprised 5303 records. The following five attributes were pointed out as price drivers by professional appraisers: usable area of a flat (Area), age of a building (Age), number of storeys in the building (Storeys), number of rooms in the flat including the kitchen (Rooms), and the distance of the building from the city centre (Centre); the price of premises (Price) was the output variable.
Because the prices of premises change substantially over time, the whole 11-year dataset could not be used directly to create data-driven models. In order to obtain comparable prices, it was split into subsets covering individual years, and the prices were then updated according to the trends of value changes over the 11 years. Starting from the beginning of 1998, the prices were updated to the last day of each subsequent year, with the trends modelled by polynomials of degree three. The chart illustrating the change trend of average transactional prices per square metre is given in Fig. 2. We may assume that the one-year datasets differed from each other and could therefore constitute different observation points for comparing the accuracy of ensemble models in our study. The sizes of the one-year datasets are given in Table 1.
Fig. 2. Change trend of average transactional prices per square metre

Table 1. Number of instances in one-year datasets

Year       1998  1999  2000  2001  2002  2003  2004  2005  2006  2007  2008
Instances   269   477   329   463   530   653   546   580   677   575   204
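As an illustration of the price-updating step described above, the sketch below (in Python; the monthly averages and the ratio-based scaling are our illustrative assumptions, since the exact procedure is not reproduced here) fits a degree-three polynomial to the average price per square metre over time and uses it to bring a price forward to a reference date.

import numpy as np

# Hypothetical monthly averages of transactional price per square metre;
# t is time in years since the beginning of 1998.
t = np.arange(0, 11, 1 / 12)
avg_price = 1500 + 80 * t + 10 * t**2 + np.random.normal(0, 50, t.size)

# Model the market trend with a polynomial of degree three (as in the paper).
trend = np.polynomial.Polynomial.fit(t, avg_price, deg=3)

def update_price(price, t_sale, t_ref):
    """Scale a transaction price from its sale date to a reference date
    using the ratio of the fitted trend values."""
    return price * trend(t_ref) / trend(t_sale)

# Example: update a price from mid-2001 to the last day of 2001.
print(update_price(120_000, t_sale=3.5, t_ref=4.0))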
The root mean square error (RMSE) was used as the performance function, and arithmetic averages were employed as the aggregation functions of the ensembles. Each input and output attribute in each individual dataset was normalized using the min-max approach. The parameters of the architecture of the fuzzy systems as well as of the genetic algorithms are listed in Table 2.

Table 2. Parameters of GFS used in experiments

Fuzzy system
Type of fuzzy system: Mamdani
No. of input variables: 5
Type of membership functions (mf): triangular
No. of input mf: 3
No. of output mf: 5
No. of rules: 15
AND operator: prod
Implication operator: prod
Aggregation operator: probor
Defuzzification method: centroid

Genetic Algorithm
Chromosome: rule base and mf, real-coded
Population size: 50
Fitness function: MSE
Selection function: tournament
Tournament size: 4
Elite count: 2
Crossover fraction: 0.8
Crossover function: two point
Mutation function: custom
No. of generations: 100
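For reference, the min-max normalization, the RMSE performance function, and the averaging aggregation mentioned before Table 2 amount to the following (a minimal Python sketch; the 50 × 40 prediction matrix is a placeholder):

import numpy as np

def min_max_normalize(col):
    """Scale an attribute column to [0, 1] (the min-max approach)."""
    return (col - col.min()) / (col.max() - col.min())

def rmse(y_true, y_pred):
    """Root mean square error, the performance function of the experiments."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# Ensemble prediction: the arithmetic average of the component model outputs.
component_preds = np.random.rand(50, 40)   # 50 models x 40 test instances
ensemble_pred = component_preds.mean(axis=0)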
In order to ensure the same experimental conditions for all ensemble learning methods, before applying a given method each one-year dataset was split into training and test sets in the proportion of 80% to 20% of the instances using a stratified approach. Namely, each dataset was divided into several clusters with the k-means algorithm, where the number of clusters was determined experimentally according to the best value of Dunn's index. Then 80% of the instances from each cluster were randomly assigned to the training set and the remaining 20% went to the test set. After that, each learning method was applied to the training set and 50 models were generated. The performance of each model was determined using the test set, and finally the average value of the accuracy measure over all component models constituted the resulting indicator of the performance of a given method on the one-year dataset. A schema illustrating the flow of the experiments is shown in Fig. 3. In turn, having averaged values of RMSE for all one-year datasets, we were able to conduct non-parametric Friedman and Wilcoxon statistical tests to compare the performance of individual methods with different parameters as well as of all considered methods.
Fig. 3. Outline of the experiment with training and test sets created using the stratified approach
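A sketch of the stratified split from Fig. 3 could look as follows (Python with scikit-learn for k-means; the fixed number of clusters stands in for the Dunn's-index selection described above, which is omitted for brevity):

import numpy as np
from sklearn.cluster import KMeans

def stratified_split(X, y, n_clusters=5, train_frac=0.8, seed=0):
    """Cluster the instances with k-means and draw 80% of each cluster
    into the training set, the remaining 20% into the test set."""
    rng = np.random.default_rng(seed)
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(X)
    train_idx, test_idx = [], []
    for c in range(n_clusters):
        members = np.flatnonzero(labels == c)
        rng.shuffle(members)
        cut = int(train_frac * len(members))
        train_idx.extend(members[:cut])
        test_idx.extend(members[cut:])
    return (X[train_idx], y[train_idx]), (X[test_idx], y[test_idx])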
4 Results of Experiments

The performance of RF for different values of the parameters M and B in terms of RMSE is illustrated graphically in Figures 4 and 5, respectively. In turn, the results provided by the HO, BA, and CV models created using genetic fuzzy systems (GFS) are shown in Figures 6, 7, and 8, respectively. The comparison of the best selected variants of the individual methods is given in Fig. 9. The statistical analysis of the results was carried out with the non-parametric Friedman and Wilcoxon tests performed with respect to the RMSE values of all models built over the 11 one-year datasets. The tests were made using the Statistica package. The average ranks of the individual models provided by the Friedman tests are shown in Table 3, where a lower rank value indicates a better model, and asterisks indicate statistical significance at the level α=0.05.
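The same analysis can be reproduced outside Statistica, e.g. with SciPy (a sketch under the assumption that rmse is an 11 × k matrix holding the RMSE of each of k compared methods on each one-year dataset):

import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon, rankdata

# rmse[d, m]: RMSE of method m on one-year dataset d (11 datasets here).
rmse = np.random.rand(11, 4)   # placeholder for RF, BA, HO, CV results

# Friedman test over the methods (one sample per method, paired by dataset).
stat, p = friedmanchisquare(*rmse.T)
print(f"Friedman: chi2 = {stat:.3f}, p = {p:.4f}")

# Average ranks, as reported in Table 3 (lower is better).
print(rankdata(rmse, axis=1).mean(axis=0))

# Pairwise Wilcoxon signed-rank test, e.g. method 0 vs method 1.
print(wilcoxon(rmse[:, 0], rmse[:, 1]))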
Fig. 4. Performance of Rotation Forest ensembles with different values of parameter M and B70
Fig. 5. Performance of Rotation Forest ensembles with different values of B and M=2
Fig. 6. Performance comparison of Repeated Holdout ensembles for H100 and H80
Fig. 7. Performance comparison of Bagging ensembles for BA100 and BA80
Fig. 8. Performance comparison of Cross-validation ensembles for CV10x5 and CV5x10
Fig. 9. Performance comparison of the best of RF, BA, HO, and CV ensembles

Table 3. Average rank positions of models determined during Friedman test (* significant)

Methods              1st              2nd              3rd              4th
RF for M={2,3,4}     RF(M=2) (1.55)   RF(M=4) (2.18)   RF(M=3) (2.27)
RF for B={50,70,90}  RF(B90) (1.64)   RF(B70) (2.09)   RF(B50) (2.27)
*H100, H80           H100 (1.18)      H80 (1.82)
BA100, BA80          BA80 (1.45)      BA100 (1.55)
*CV10x5, CV5x10      CV10x5 (1.09)    CV5x10 (1.91)
*RF, BA, HO, CV      CV10x5 (1.09)    RF(B90) (2.82)   BA80 (3.00)      H100 (3.09)
With our specific datasets and experimental setup, the best results were produced by RF with M=2 and B90; however, the differences in accuracy among the RF variants with different parameters were statistically insignificant. For the final comparison, the ensembles created using RF with M=2 and B90 were chosen, because the parameter M=2 provides the highest diversity of component models and B90 ensured the best accuracy when compared with HO and BA. As for the individual methods, H100 and CV10x5 revealed significantly better performance than H80 and CV5x10, respectively. No significant differences were observed between BA100 and BA80.
The final comparison comprised the best selected ensembles produced by the individual methods, namely RF(M=2, B90), BA80, H100, and CV10x5. The Friedman test indicated significant differences in performance among the methods. According to the Wilcoxon test, CV significantly outperformed all other techniques. RF, BA, and HO performed equivalently, but among them RF gained the best rank value. When each one-year dataset was considered separately, RF revealed better performance than BA and HO over 5 one-year datasets, and for the 2005 dataset it outperformed all other methods.
5 Conclusions and Future Work

The rotation forest ensemble method using a genetic fuzzy rule-based system as a base learning algorithm was developed in the Matlab environment. The method was applied to the real-world regression problem of predicting the prices of residential premises based on historical data of sales/purchase transactions. Computationally intensive experiments were conducted to compare the accuracy of the ensembles generated by our proposed method with bagging, repeated holdout, and repeated cross-validation models. The statistical analysis of the results was performed using the nonparametric Friedman and Wilcoxon tests.
The overall results of our investigation are as follows. The rotation forest ensembles created using genetic fuzzy systems revealed prediction accuracy no worse than that of the bagging and repeated holdout models, while the cross-validation approach outperformed the other techniques. The proposed method thus seems to be useful for our real-world regression problem. Further investigations into rotation forest combined with genetic fuzzy systems are planned using benchmark regression datasets preprocessed with instance and feature selection algorithms. The resistance of the method to noisy data will also be examined.

Acknowledgments. This paper was partially supported by the Polish National Science Centre under grant no. N N516 483840.
References

1. Breiman, L.: Bagging Predictors. Machine Learning 24(2), 123–140 (1996)
2. Breiman, L.: Random Forests. Machine Learning 45(1), 5–32 (2001)
3. Bryll, R.: Attribute bagging: improving accuracy of classifier ensembles by using random feature subsets. Pattern Recognition 36(6), 1291–1302 (2003)
4. Bühlmann, P., Yu, B.: Analyzing bagging. Annals of Statistics 30, 927–961 (2002)
5. Cordón, O., Gomide, F., Herrera, F., Hoffmann, F., Magdalena, L.: Ten years of genetic fuzzy systems: current framework and new trends. Fuzzy Sets and Systems 141, 5–31 (2004)
6. Cordón, O., Herrera, F.: A Two-Stage Evolutionary Process for Designing TSK Fuzzy Rule-Based Systems. IEEE Transactions on Systems, Man, and Cybernetics – Part B 29(6), 703–715 (1999)
7. Fumera, G., Roli, F., Serrau, A.: A theoretical analysis of bagging as a linear combination of classifiers. IEEE Transactions on Pattern Analysis and Machine Intelligence 30(7), 1293–1299 (2008)
8. Gashler, M., Giraud-Carrier, C., Martinez, T.: Decision Tree Ensemble: Small Heterogeneous Is Better Than Large Homogeneous. In: Seventh International Conference on Machine Learning and Applications (ICMLA 2008), pp. 900–905 (2008)
9. Ho, T.K.: The Random Subspace Method for Constructing Decision Forests. IEEE Transactions on Pattern Analysis and Machine Intelligence 20(8), 832–844 (1998)
10. Jędrzejowicz, J., Jędrzejowicz, P.: Rotation Forest with GEP-Induced Expression Trees. In: O'Shea, J., et al. (eds.) KES-AMSTA 2011. LNAI, vol. 6682, pp. 495–503. Springer, Heidelberg (2011)
11. Kempa, O., Lasota, T., Telec, Z., Trawiński, B.: Investigation of bagging ensembles of genetic neural networks and fuzzy systems for real estate appraisal. In: Nguyen, N.T., et al. (eds.) ACIIDS 2011. LNAI, vol. 6592, pp. 323–332. Springer, Heidelberg (2011)
12. Kotsiantis, S.: Combining bagging, boosting, rotation forest and random subspace methods. Artificial Intelligence Review 35(3), 223–240 (2011)
13. Kotsiantis, S.B., Pintelas, P.E.: Local Rotation Forest of Decision Stumps for Regression Problems. In: 2nd IEEE International Conference on Computer Science and Information Technology (ICCSIT 2009), pp. 170–174 (2009)
14. Król, D., Lasota, T., Trawiński, B., Trawiński, K.: Investigation of Evolutionary Optimization Methods of TSK Fuzzy Model for Real Estate Appraisal. International Journal of Hybrid Intelligent Systems 5(3), 111–128 (2008)
15. Kuncheva, L.I., Rodríguez, J.J.: An Experimental Study on Rotation Forest Ensembles. In: Haindl, M., et al. (eds.) MCS 2007. LNCS, vol. 4472, pp. 459–468. Springer, Heidelberg (2007)
16. Lasota, T., Telec, Z., Trawiński, G., Trawiński, B.: Empirical Comparison of Resampling Methods Using Genetic Fuzzy Systems for a Regression Problem. In: Yin, H., et al. (eds.) IDEAL 2011. LNCS, vol. 6936, pp. 17–24. Springer, Heidelberg (2011)
17. Lasota, T., Telec, Z., Trawiński, G., Trawiński, B.: Empirical Comparison of Resampling Methods Using Genetic Neural Networks for a Regression Problem. In: Corchado, E., et al. (eds.) HAIS 2011. LNAI, vol. 6679, pp. 213–220. Springer, Heidelberg (2011)
18. Rodríguez, J.J., Kuncheva, L.I., Alonso, C.J.: Rotation forest: A new classifier ensemble method. IEEE Transactions on Pattern Analysis and Machine Intelligence 28(10), 1619–1630 (2006)
19. Schapire, R.E.: The strength of weak learnability. Machine Learning 5(2), 197–227 (1990)
20. Wolpert, D.H.: Stacked Generalization. Neural Networks 5(2), 241–259 (1992)
21. Zhang, C.-X., Zhang, J.-S.: A variant of Rotation Forest for constructing ensemble classifiers. Pattern Analysis & Applications 13(1), 59–77 (2010)
22. Zhang, C.-X., Zhang, J.-S.: RotBoost: A technique for combining Rotation Forest and AdaBoost. Pattern Recognition Letters 29(10), 1524–1536 (2008)
23. Zhang, C.-X., Zhang, J.-S., Wang, G.-W.: An empirical study of using Rotation Forest to improve regressors. Applied Mathematics and Computation 195(2), 618–629 (2008)