
Feature Subset Selection by Population-Based Incremental Learning. A case study in the survival of cirrhotic patients treated with TIPS

I. Inza (1), M. Merino (2), P. Larrañaga (1), J. Quiroga (3), B. Sierra (1), M. Girala (3)

(1) Dept. of Computer Science and Artificial Intelligence. University of the Basque Country. Donostia - San Sebastian. Spain

(2) Basque Health Service - Osakidetza. Donostia - San Sebastian. Spain

(3) University Clinic of Navarra. Pamplona - Iruña. Spain

Abstract

The transjugular intrahepatic portosystemic shunt (TIPS) is an interventional treatment for cirrhotic patients with portal hypertension. In the light of our medical staff's experience, the consequences of the TIPS are not homogeneous for all patients, and a subgroup of them dies within the first six months after TIPS placement. At present, there is no risk indicator to identify this group before treatment. An investigation into predicting the survival of cirrhotic patients treated with TIPS was carried out using a clinical database with 107 cases and 77 attributes. The Naive-Bayes rule and the ID3 decision tree classifier were used with the whole set of attributes to predict the survival of cirrhotic patients for the first six months after the placement of the TIPS. Owing to the large number of attributes, and with the aim of obtaining more accurate and understandable models, FSS-PBIL (Feature Subset Selection by Population-Based Incremental Learning), a new randomized, population-based feature subset selection algorithm based on the EDA (Estimation of Distribution Algorithm) paradigm, was applied. FSS-PBIL significantly improved the predictive accuracy of Naive-Bayes and ID3 and considerably reduced the number of attributes in their classification models, compared to the models obtained when the whole set of attributes was used. Owing to this reduction in the number of features, the interpretability and comprehensibility of the classification models increased, raising our medical staff's confidence in the models and their acceptance of them.

Keywords: Feature Subset Selection, Population-Based Incremental Learning, Transjugular Intrahepatic Portosystemic Shunt, Indications, Survival.


1 Introduction

Portal hypertension is a major complication of chronic liver disease. By definition, it is a pathological increase in the portal venous pressure which results in the formation of porto-systemic collaterals that divert blood from the liver to the systemic circulation. It is caused both by an obstruction to outflow in the portal system and by an increased mesenteric flow. In the western world, cirrhosis of the liver accounts for approximately 90% of all patients. Of the sequelae of portal hypertension (i.e. varices, encephalopathy, hypersplenism, ascites), bleeding from gastro-oesophageal varices is a significant cause of early mortality (approximately 30-50% at the first bleed) [1] [2]. Many efforts have been made over the past decades in the treatment of portal hypertension. This has resulted in an increasing number of randomized trials and publications but, unfortunately, the therapeutic decision is not easy [3].

The Transjugular Intrahepatic Portosystemic Shunt (TIPS) is an interventional treatment resulting in decompression of the portal system by creation of a side-to-side portosystemic anastomosis. Since its introduction over 10 years ago [4] [5], and despite the large number of published studies, many questions remain unanswered. As far as survival is concerned, the results of the randomized trials should be interpreted with caution, as the studies were not designed to detect differences in survival [6]. Our medical staff has found that a subgroup of patients dies in the first six months after TIPS placement. At present, there is no risk indicator to identify this group before treatment. In this situation, testing Machine Learning techniques for the indication or contraindication of TIPS is an interesting line of research.

At first, we tested the Naive-Bayes rule and the ID3 decision tree classifier in the prediction of the survival of cirrhotic patients for the first 6 months after the placement of the TIPS. At this point, in our medical staff's opinion, the obtained predictive accuracies were good enough. However, the database used had few patients and a large set of measured attributes. It is well known that the accuracy of Machine Learning techniques is not monotonic with respect to the number of attributes [7]. In this sense, given the entire set of attributes, we want to select the attribute subset with the best predictive accuracy. This problem is known in the Machine Learning community as the Feature Subset Selection (FSS) problem. A new randomized FSS algorithm, called FSS-PBIL (Feature Subset Selection by Population-Based

Incremental Learning), inspired by the EDA paradigm [8] [9] (Estimation of Distribution Algorithm), has been applied. With the application of FSS-PBIL, a considerable reduction in the number of attributes of the Naive-Bayes and ID3 predictive models was achieved, as well as a significant improvement in their accuracies with respect to the models built with the whole set of attributes.

The work is organized as follows: the next section presents the study database. Section 3 presents the Naive-Bayes rule, the ID3 algorithm and their results using the whole set of attributes. Section 4 presents the new approach for feature subset selection, the FSS-PBIL algorithm, its results with the Naive-Bayes rule and the ID3 algorithm, and the corresponding considerations. The last section includes a brief summary of the study and some ideas for future work.

2 Patients. Study database

This study includes 127 patients with liver cirrhosis who underwent TIPS from May 1991 to September 1998 in the University Clinic of Navarra, Spain. The diagnosis of cirrhosis was based on liver histology in all cases. The indications for TIPS placement were: prophylaxis of rebleeding (68 patients), refractory ascites (28 patients), prophylaxis of bleeding (11 patients), acute bleeding refractory to endoscopic and medical therapy (10 patients), portal vein thrombosis (9 patients) and Budd-Chiari syndrome (1 patient). The statistical analysis includes 107 patients, because 20 underwent liver transplantation in the first six months after TIPS placement. The data was documented using the SPSS software [10]. The study was approved by the local Ethics Committee, and informed oral consent was obtained from all patients.

For each patient, 77 attributes were measured before TIPS placement (see Table 1). The problem has two different categories, reflecting whether or not the patient died in the first 6 months after the placement of the TIPS. In the first 6 months after the placement of the TIPS, 33 patients died and 74 survived for a longer period, thus reflecting that the utility and consequences of the TIPS were not homogeneous for all patients.

Table 1: Attributes of the study database.

History finding attributes:
  Age, Gender, Height, Weight, Etiology of cirrhosis, Indication of TIPS, Bleeding origin, Number of bleedings, Prophylactic therapy with propranolol, Previous sclerotherapy, Restriction of proteins, Number of hepatic encephalopathies, Type of hepatic encephalopathy, Ascites intensity, Number of paracenteses, Volume of paracenteses, Dose of furosemide, Dose of spironolactone, Spontaneous bacterial peritonitis, Kidney failure, Organic nephropathy, Diabetes mellitus.

Laboratory finding attributes:
  Hemoglobin, Hematocrit, White blood cell count, Serum sodium, Urine sodium, Serum potassium, Urine potassium, Plasma osmolarity, Urine osmolarity, Urea, Plasma creatinine, Urine creatinine, Creatinine clearance, Fractional sodium excretion, Diuresis, GOT, GPT, GGT, Alkaline phosphatase, Serum total bilirubin (mg/dl), Serum conjugated bilirubin (mg/dl), Serum albumin (g/dl), Platelets, Prothrombin time (%), Partial thrombin time, PRA, Proteins, FNG, Aldosterone, ADH, Dopamine, Norepinephrine, Epinephrine, Gamma-globulin, CHILD score, PUGH score.

Doppler sonography:
  Portal size, Portal flow velocity, Portal flow right, Portal flow left, Spleen length (cm).

Endoscopy:
  Size of esophageal varices, Gastric varices, Portal gastropathy, Acute hemorrhage.

Hemodynamic parameters:
  Arterial pressure (mm Hg), Heart rate (beats/min), Cardiac output (l/min), Free hepatic venous pressure, Wedged hepatic venous pressure, Hepatic venous pressure gradient (HVPG), Central venous pressure, Portal pressure, Portosystemic venous pressure gradient.

Angiography:
  Portal thrombosis.

3 Application of two Machine Learning algorithms to solve the problem

Two well-known Machine Learning algorithms, with completely different approaches to learning, were applied using the whole set of attributes to predict the survival of cirrhotic patients for the first 6 months after the placement of the TIPS. Both algorithms were selected owing to their simplicity and their long tradition in medical diagnosis studies.

The Naive-Bayes rule [11] uses the Bayes theorem to predict the category of each case, assuming that

attributes are independent given the category. To classify a new patient characterized by d attributes $X = (X_1, X_2, \ldots, X_d)$ in our two-category problem $C = \{c_1, c_2\}$, where $c_1$ implies that the patient survives more than 6 months and $c_2$ implies that the patient does not survive more than 6 months, the Naive-Bayes classifier applies the following rule:

$$c_{N-B} = \arg\max_{c_j \in C} \; p(c_j) \prod_{i=1}^{d} p(x_i \mid c_j)$$

where $c_{N-B}$ denotes the category value output by the Naive-Bayes classifier (in our problem, $c_1$ or $c_2$). Despite its simplicity, the Naive-Bayes rule has obtained better results than more complex algorithms in many medical domains. Many researchers think that the success of the Naive-Bayes rule rests on the idea that doctors, in order to make a diagnosis, collect the attributes in the same way as the Naive-Bayes rule uses them to classify: that is to say, independently of each other given the category.

ID3 [12] represents a classification function by a decision tree. The tree is constructed in a top-down

way, dividing the training set and beginning with the selection of the best attribute at the root of the tree. The selection of the best attribute is based on an information-theoretic approach. A descendant of the root node is then created for each possible value of the selected attribute, and the training cases are sorted to the appropriate descendant node. The entire process is then repeated recursively, using the training cases associated with each descendant node to select the best attribute to test at that point in the tree. The process stops at a node of the tree when all cases at that point belong to the same category or the best split of the node does not surpass a fixed chi-square significance threshold.

The costs of medical tests were not considered in the construction of the classification models, and accuracy maximization was the only goal of our research. Owing to the low number of cases, the leave-one-out [13] procedure was used to estimate the accuracy of each method. In the leave-one-out technique, the learning algorithm is run k times, where k is the number of instances in the database. Each time, k-1 instances are used for training and the remaining instance is used for testing, so that each instance is used exactly once for testing. The leave-one-out estimate of accuracy is the overall number of correct classifications divided by k, the number of instances in the dataset. The experiments were run on a SUN-SPARC computer using the MLC++ [14] Machine Learning library of programs for the presented classifiers.
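As a concrete illustration of the classification rule and the leave-one-out procedure just described, the following sketch estimates the leave-one-out accuracy of a Naive-Bayes classifier in Python with scikit-learn. It is only a sketch under stated assumptions: the paper used the MLC++ implementations, the data below is a random placeholder for the 107 x 77 database, and GaussianNB is used as a stand-in for the discrete Naive-Bayes rule of [11].

```python
# Hedged sketch: leave-one-out accuracy estimation for a Naive-Bayes classifier.
# The paper used MLC++ on the real TIPS database; the data here is a random placeholder.
import numpy as np
from sklearn.naive_bayes import GaussianNB              # stand-in for the Naive-Bayes rule [11]
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(107, 77))          # 107 patients x 77 pre-TIPS attributes (placeholder values)
y = rng.integers(0, 2, size=107)        # 1 = survived the first 6 months, 0 = did not (placeholder)

# Leave-one-out: k = 107 train/test splits, each patient held out exactly once.
scores = cross_val_score(GaussianNB(), X, y, cv=LeaveOneOut())
print(f"leave-one-out accuracy estimate: {scores.mean():.4f}")
```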

When the entire set of attributes was presented to the algorithms, the accuracy estimates for the two induced classifiers were the following: Naive-Bayes: 72.90% ± 4.32%; ID3: 60.75% ± 4.74%. In the final tree, only 11 of the initial 77 attributes appear. Tables 2 and 3 show the misclassification matrices [15] obtained by both classifiers: the rows represent the true classes and the columns the predicted ones. Sensitivity and specificity measures can also be extracted from the tables.

Table 2: Misclassification matrix of probabilities for the Naive-Bayes rule without applying FSS.

                              Predicted class
  True class          don't survive    survive
  don't survive            0.12          0.19
  survive                  0.08          0.61

Table 3: Misclassification matrix of probabilities for the ID3 classifier without FSS application.

                              Predicted class
  True class          don't survive    survive
  don't survive            0.21          0.09
  survive                  0.07          0.62
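To illustrate how sensitivity and specificity follow from such a matrix, the short sketch below computes them from the Table 2 probabilities; treating `don't survive' as the positive class is our assumption for the example, not a choice stated in the paper.

```python
# Hedged sketch: sensitivity and specificity from the Table 2 misclassification matrix,
# taking "don't survive" as the positive class (an assumption made for illustration).
tp, fn = 0.12, 0.19   # true "don't survive" predicted as "don't survive" / as "survive"
fp, tn = 0.08, 0.61   # true "survive" predicted as "don't survive" / as "survive"

sensitivity = tp / (tp + fn)   # proportion of non-survivors correctly identified
specificity = tn / (tn + fp)   # proportion of survivors correctly identified
print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
# prints: sensitivity = 0.39, specificity = 0.88
```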

4 Feature Subset Selection by Population-Based Incremental Learning

At this stage of the study, the obtained accuracy results, especially for the Naive-Bayes classifier, were good enough in our medical staff's opinion. The graphical representation of the ID3 classification tree also proved satisfactory for our medical experts. However, we saw room for improvement using the Feature Subset Selection (FSS) approach for both algorithms. First in this section, we describe the FSS approach and review some FSS methods. Then FSS-PBIL, a new FSS algorithm based on the EDA approach, is presented, together with its results on the study database.


4.1 A review of some FSS methods

The basic problem of Machine Learning is concerned with the induction of a model that classifies a given object into one of several known classes. In order to induce the classification model, each object is described by a pattern of d features. Here, the Machine Learning community has formulated the following question: are all of these d descriptive features useful for learning the `classification rule'? In trying to answer this question, we come to the Feature Subset Selection (FSS) approach, which can be formulated as follows: given a set of candidate features, select the best subset for a classification problem.

Irrelevant or redundant features, depending on the specific characteristics of the learning algorithm, may degrade the predictive accuracy of the final model. The objective of FSS is the detection of those irrelevant or redundant features that hurt the performance of the classification algorithm. In addition, thanks to the deletion of features, the obtained classifier will probably be less complex and more understandable by humans, and a reduction in the cost of data acquisition can be achieved (a critical issue in medicine, where the costs of some medical tests are high). FSS can be viewed as a search problem [16], with each state in the search space specifying a subset of the possible features of the task. Exhaustive evaluation of the possible feature subsets is usually infeasible in practice because of the large amount of computational effort required. Thus, any feature selection method must determine four basic issues that define the nature of the search process:

1. The starting point in the space. It determines the direction of the search. One might start with no features and successively add them, or one might start with all features and successively remove them. One might also select an initial state somewhere in the middle of the search space.

2. The organization of the search. It determines the strategy of the search. Roughly speaking, search strategies can be complete or heuristic (see [17] for a review of FSS algorithms). The basis of complete search is the systematic examination of every possible feature subset. Three classic complete search implementations are depth-first, breadth-first and branch and bound search [18]. On the other hand, among heuristic algorithms there are deterministic heuristic algorithms and non-deterministic ones. Classic deterministic heuristic FSS algorithms are sequential forward and backward selection (SFS and SBS [19]), floating selection methods (SFFS and SFBS [20]) and best-first search [7]. They are deterministic in the sense that all runs always obtain the same

solution. Non-deterministic heuristic search arises from the motivation to avoid getting stuck in local maxima. Randomness is used to escape from local maxima, and this implies that one should not expect the same solution from different runs. Two classic implementations of non-deterministic search engines are Genetic Algorithms [21] and Simulated Annealing [22].

3. Evaluation strategy of feature subsets. The evaluation function identifies the promising areas of the search space, and the objective of the FSS algorithm is its maximization. The search algorithm uses the value returned by the evaluation function to help guide the search. Some evaluation functions carry out this objective by looking only at the characteristics of the data, capturing the relevance of each feature or set of features for defining the target concept: this type of evaluation function is grouped under the filter strategy. However, John, Kohavi and Pfleger [23] reported that when the goal of FSS is the maximization of accuracy, the selected features should depend not only on the features and the target concept to be learned, but also on the learning algorithm. Thus, they proposed the wrapper concept: the FSS algorithm conducts a search for a good subset using the induction algorithm itself as part of the evaluation function, the same algorithm that will be used to induce the final classification model. Once the classification algorithm is fixed, the idea is to train it with the feature subset found by the search algorithm, estimating its accuracy and assigning this value as the evaluation function value of the feature subset. In this way, the representational biases of the induction algorithm used to construct the final classifier are included in the FSS process (a minimal sketch of such a wrapper evaluation function is given after this list). Figure 1 graphically presents both the filter and the wrapper approaches.

4. Criterion for halting the search. An intuitive approach for stopping the search is the non-improvement of the evaluation function value of alternative subsets. Another classic criterion is to fix a number of possible solutions to be visited during the search.
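The following sketch makes the wrapper evaluation of point 3 concrete: a candidate subset is encoded as a 0/1 mask over the d features, and its value is the leave-one-out accuracy of the chosen induction algorithm trained only on the selected features. The function name, the use of scikit-learn and GaussianNB as the induction algorithm are illustrative assumptions; the paper evaluated subsets with the MLC++ classifiers.

```python
# Hedged sketch of a wrapper evaluation function (names and library are illustrative).
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.naive_bayes import GaussianNB

def wrapper_evaluate(mask, X, y, make_classifier=GaussianNB):
    """Leave-one-out accuracy of the induction algorithm restricted to the selected features.

    mask: length-d array of 0s and 1s, where 1 means the feature is kept.
    """
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():                 # an empty feature subset cannot be evaluated
        return 0.0
    scores = cross_val_score(make_classifier(), X[:, mask], y, cv=LeaveOneOut())
    return scores.mean()
```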

4.2 FSS-PBIL, a new randomized search engine

Because a higher-accuracy, more compact and more understandable classification model would be attractive for our medical staff, we decided to apply the FSS approach to our problem with the aim of improving the results obtained by Naive-Bayes and ID3 without FSS. We use a new randomized search algorithm, called FSS-PBIL, based on the evolution of populations, for the FSS process. The roots of FSS-PBIL are in the EDA paradigm [8] [9], which is inspired by Genetic Algorithms.

Figure 1: The wrapper and filter approaches, based on the incorporation or absence of the learning algorithm in the feature subset selection process. (Wrapper approach: all features -> feature subset search -> feature subset evaluation by the induction algorithm -> optimal feature subset. Filter approach: all features -> feature subset search -> feature subset evaluation based on the intrinsic characteristics of the data -> optimal feature subset.)

Genetic Algorithms (GAs) [24] are one of the best-known techniques for solving optimization problems. A GA is a population-based search method: first a set of individuals (a population) is generated, then promising individuals are selected, and finally new individuals, which will form the new population, are generated using crossover and mutation operators. An interesting adaptation of this scheme is the Estimation of Distribution Algorithm (EDA) [8] [9]. In an EDA there are no crossover or mutation operators: the new population is sampled from a probability distribution which is estimated from the selected individuals. Figure 2 shows the basic structure of the EDA approach.

The EDA algorithm can be used to solve the FSS problem by representing each individual of the EDA search as a possible feature subset solution. A common notation can be used to represent an individual (or feature subset): for a full d-feature problem, there are d bits in each individual, each bit indicating whether a feature is present (1) or absent (0). The main problem of the EDA lies in how the joint d-dimensional probability distribution $p_l(x)$ is estimated. Obviously, the computation of $2^d$ probabilities (for a domain with d binary variables) is impractical. Bayesian networks could be an attractive paradigm for factorizing the probability distribution of the best individuals [25]. However, owing to the large number of attributes in our database, a huge number of individuals would be needed to induce a reliable Bayesian network. We have used PBIL [26] [27] [28] [29], a simple probabilistic model which assumes that all attributes of the database are independent of each other, i.e., that $p_l(x) = \prod_{i=1}^{d} p_l(x_i)$.

EDA
  D_0 <- Generate N individuals (the initial population) randomly.
  Repeat for l = 1, 2, ... until a stop criterion is met:
    D^s_{l-1} <- Select S <= N individuals from D_{l-1} according to a selection method.
    p_l(x) = p(x | D^s_{l-1}) <- Estimate the joint probability distribution of an individual being among the selected individuals.
    D_l <- Sample N individuals (the new population) from p_l(x).

Figure 2: Main structure of the EDA approach.

In PBIL each bit is examined independently, and the probability distribution used to sample each bit of an individual of the new population is learned in the following way:

$$p_l(x_i) = (1 - \alpha)\, p_{l-1}(x_i \mid D_{l-1}) + \alpha\, p_{l-1}(x_i \mid D^s_{l-1})$$

where:

- $p_{l-1}(x_i \mid D_{l-1})$ is the probability distribution of bit i in the old population;
- $p_{l-1}(x_i \mid D^s_{l-1})$ is the probability distribution of bit i among the selected individuals;
- $\alpha$ is a user parameter which controls the evolution.

Instead of discarding all the individuals, we keep the best individual of the previous generation and sample N-1 new individuals. In this way we can guarantee that the best individual in the final population is the best individual the algorithm has found. An elitist approach has been used to form the successive populations. Instead of directly discarding the N-1 individuals of the previous generation and replacing them with the N-1 newly sampled ones (step 7 in Fig. 3), the 2N-2 individuals are pooled together and the best N-1 of them are taken (step 8 in Fig. 3). Those best N-1 individuals form the new population (step 1 in Fig. 3), together with the best individual of the previous generation. In this way, the populations converge faster towards the better individuals found; this also implies a risk of losing diversity in the population.
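A minimal sketch of this update and of the sampling of a new generation is given below. The function and variable names are illustrative, and the selection of the N/2 best individuals follows the description of Figure 3; this is not the authors' implementation.

```python
# Hedged sketch of one PBIL generation: select the best individuals, update the
# per-bit probabilities with the rule above, and sample N-1 new feature masks.
import numpy as np

def pbil_generation(population, fitness, alpha, rng):
    """population: (N, d) 0/1 matrix of feature masks; fitness: length-N evaluation values."""
    N, d = population.shape
    best_first = np.argsort(fitness)[::-1]
    selected = population[best_first[: N // 2]]         # D^s_{l-1}: the N/2 best individuals

    # p_l(x_i) = (1 - alpha) * p_{l-1}(x_i | D_{l-1}) + alpha * p_{l-1}(x_i | D^s_{l-1})
    p_new = (1 - alpha) * population.mean(axis=0) + alpha * selected.mean(axis=0)

    # Each bit of each new individual is sampled independently from p_new.
    sampled = (rng.random((N - 1, d)) < p_new).astype(int)
    return p_new, sampled
```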

Figure 3: FSS-PBIL algorithm. (1) Current population of N individuals (d-bit feature masks X_1 ... X_d) with their evaluation function values; (2) selection of the N/2 best individuals; (3)-(4) new probabilities are learned from the selected individuals; (5) for all i = 1, ..., d compute p_l(x_i) = (1 - alpha) p_{l-1}(x_i | D_{l-1}) + alpha p_{l-1}(x_i | D^s_{l-1}); (6)-(7) sample N-1 individuals from the learned univariate probabilities and calculate their evaluation function values; (8) pool the individuals of the previous generation (1) and the newly sampled ones (7), and take the best N-1 of them to form the next population (1), together with the best individual found during the whole search.

Figure 3 summarizes the FSS-PBIL algorithm. FSS-PBIL is a randomized, population-based FSS algorithm. The absence of the crossover and mutation operators (implicit in Genetic Algorithms) for evolving the population is one of its most attractive properties. The parameters of the FSS-PBIL algorithm were set up as follows:

- The population size, N, was fixed to 1,000.
- $\alpha$ was fixed to 0.5.
- The `wrapper' approach was used to evaluate each individual found in the search, estimating by the leave-one-out method the accuracy of the specific learning algorithm (Naive-Bayes or ID3) using the found feature subset.

- The criterion for halting the search was the following: FSS-PBIL stops when, in a newly sampled population of solutions, no individual is found whose evaluation function value improves on that of the best individual of the previous generation. The best solution of the previous population is then returned as the result of the search (a compact sketch of the resulting search loop is given below).
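Under these settings, the overall search loop might look like the sketch below. It reuses the hypothetical wrapper_evaluate and pbil_generation helpers sketched earlier, so it illustrates the described procedure (elitist replacement and the halting criterion) rather than reproducing the authors' code; with N = 1,000 and leave-one-out wrapper evaluation, the evaluation step dominates the running time.

```python
# Hedged sketch of the FSS-PBIL search loop with the settings listed above
# (N = 1000, alpha = 0.5, wrapper leave-one-out evaluation, stop when no newly
# sampled individual improves on the best individual of the previous generation).
# Assumes the wrapper_evaluate and pbil_generation sketches defined earlier.
import numpy as np

def fss_pbil(X, y, N=1000, alpha=0.5, seed=0):
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    population = rng.integers(0, 2, size=(N, d))                       # random initial masks
    fitness = np.array([wrapper_evaluate(ind, X, y) for ind in population])

    while True:
        best_idx = int(np.argmax(fitness))
        best_ind, best_fit = population[best_idx].copy(), fitness[best_idx]

        _, sampled = pbil_generation(population, fitness, alpha, rng)  # N-1 new individuals
        sampled_fit = np.array([wrapper_evaluate(ind, X, y) for ind in sampled])

        if sampled_fit.max() <= best_fit:        # halting criterion: no improvement found
            return best_ind, best_fit

        # Elitist replacement: pool the 2N-2 non-elite old and new individuals,
        # keep the best N-1 of them plus the best individual found so far.
        keep_old = np.ones(N, dtype=bool)
        keep_old[best_idx] = False
        pool = np.vstack([population[keep_old], sampled])
        pool_fit = np.concatenate([fitness[keep_old], sampled_fit])
        order = np.argsort(pool_fit)[::-1][: N - 1]
        population = np.vstack([pool[order], best_ind[None, :]])
        fitness = np.concatenate([pool_fit[order], [best_fit]])
```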



4.3 FSS-PBIL results

Owing to its randomized nature, FSS-PBIL was run 10 times for each algorithm. The average results of the 10 runs of FSS-PBIL (estimated accuracy and number of selected attributes) appear in Table 4, compared with the results of the no-FSS approach presented in the third section. Tables 5 and 6 also show the misclassification matrices for the best run of the FSS-PBIL algorithm for Naive-Bayes and ID3. When a cross-validated paired t test [30] was applied, the accuracy differences between the Naive-Bayes rule with the whole set of attributes and each run of the Naive-Bayes rule with FSS-PBIL were always significant, with a p-value smaller than 0.05. The same assertion can be made for the ID3 classifier with and without feature selection. On the other hand, looking at the misclassification matrices, in our medical staff's opinion the sensitivity and specificity produced by the feature selection were more satisfactory than without this selection.

Table 4: Comparative results of FSS-PBIL and the no-FSS approach.

                          FSS-PBIL                             no-FSS
                accuracy            attributes        accuracy            attributes
  Naive-Bayes   86.53% ± 0.78%      11.1 ± 0.87       72.90% ± 4.32%      77
  ID3           85.97% ± 1.96%      7.5 ± 0.52        60.75% ± 4.74%      11

Table 5: Misclassification matrix of probabilities for the best run of the Naive-Bayes rule wrapped by the FSS-PBIL feature selection method. Its estimated accuracy was 86.91% ± 3.28% and it used 11 attributes.

                              Predicted class
  True class          don't survive    survive
  don't survive            0.21          0.10
  survive                  0.03          0.66

Table 6: Misclassification matrix of probabilities for the best run of the ID3 classifier wrapped by the FSS-PBIL feature selection method. Its estimated accuracy was 87.84% ± 3.17% and it used 7 attributes.

                              Predicted class
  True class          don't survive    survive
  don't survive            0.25          0.06
  survive                  0.07          0.63

With the application of FSS-PBIL, a noteworthy reduction in the number of attributes used by Naive-Bayes and ID3 was achieved. The Naive-Bayes rule, which used the whole set of 77 attributes when no FSS method was applied, reduced the number of attributes needed by about 85% when FSS-PBIL was applied. On the other hand, ID3, which used only 11 attributes when no FSS method was applied, reduced this number by about 30% with the application of FSS-PBIL. In this way, more compact models were achieved, which could be more easily understood by our medical staff. This complexity reduction also increased our medical staff's confidence in, and acceptance of, those models. Although cost reduction was not the goal of our work, this feature selection also produced an obvious reduction in the cost of data acquisition for inducing the classification models.

5 Summary and future work

A medical problem, the prediction of the survival of cirrhotic patients treated with TIPS, has been approached from a Machine Learning perspective, with the aim of obtaining a classification rule for the indication or contraindication of TIPS in cirrhotic patients. At first, two well-known classifiers, the Naive-Bayes rule and the ID3 classifier, were applied without any feature subset selection method. With the application of FSS-PBIL, a new randomized feature subset selection algorithm, the accuracies of both classifiers were significantly improved. Coupled with this accuracy improvement, more compact models with fewer attributes were induced, which could be more easily understood and accepted by our medical staff.

In the future, we plan to use a database with nearly 300 attributes, which also collects patient measurements one month after the placement of the TIPS, to deal with the problem of survival in cirrhotic patients treated with TIPS. For this work we plan to compare FSS-PBIL with approaches that use other probability distribution factorization models, different from PBIL, to factorize the distribution of the selected individuals in the EDA approach.

Acknowledgments

This work was supported by the PI 96/12 grant from the Gobierno Vasco - Departamento de Educacion, Universidades e Investigacion.

References

[1] Saunders JB, Walters JRF, Davies P, Paton A. A 20-year prospective study of cirrhosis. Br J Med 1981; 282: 263-6.
[2] Bornman PC, Krige JEJ, Terblanche J. Management of oesophageal varices. Lancet 1994; 343: 1079-84.
[3] D'Amico G, Pagliaro L, Bosch J. The treatment of portal hypertension: a meta-analytic review. Hepatology 1995; 22: 332-54.
[4] Rossle M, Richter GM, Noldge G. Performance of an intrahepatic portocaval shunt (PCS) using a catheter technique - a case report. Hepatology 1988; A8: 1348.
[5] Rossle M, Richter GM, Noldge G, et al. New operative treatment for variceal haemorrhage. Lancet 1989; 2: 153.
[6] Rossle M, Siegerstetter V, Huber M, Ochs A. The first decade of the transjugular intrahepatic portosystemic shunt (TIPS): state of the art. Liver 1998; 18: 73-89.
[7] Kohavi R, John G. Wrappers for feature subset selection. Artif Intell 1997; 97: 273-324.
[8] Larrañaga P, Etxeberria R, Lozano JA, et al. A review of the cooperation between evolutionary computation and probabilistic graphical models. In: Proceedings of the II Symposium on Artificial Intelligence CIMAF99. Special Session on Distributions and Evolutionary Optimization. 1999: 314-24.
[9] Muhlenbein H, Mahnig T, Ochoa A. Schemata, distributions and graphical models in evolutionary optimization. In press.
[10] SPSS Inc. SPSS-X User's Guide (3rd ed). 1988.
[11] Cestnik B. Estimating probabilities: a crucial task in Machine Learning. In: Proceedings ECAI-90. 1990: 147-9.
[12] Quinlan JR. Induction of decision trees. Mach Learn 1986; 1: 81-106.
[13] Kohavi R. A study of cross-validation and bootstrap for accuracy estimation and model selection. In: Proceedings IJCAI-95. 1995: 1137-43.
[14] Kohavi R, Sommerfield D, Dougherty J. Data mining using MLC++, a Machine Learning library in C++. Int J Artif Intell Tools 1997; 6: 537-66.
[15] Bradley AP. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pat Rec 1997; 30: 1145-59.
[16] Langley P. Selection of relevant features in machine learning. In: Proceedings of the AAAI Fall Symposium on Relevance. 1994: 140-4.
[17] Liu H, Motoda H. Feature Selection for Knowledge Discovery and Data Mining. Kluwer Academic Publishers, 1998.
[18] Narendra P, Fukunaga K. A branch and bound algorithm for feature subset selection. IEEE Trans on Comp 1977; 26: 917-22.
[19] Kittler J. Feature set search algorithms. In: Chen CH, ed. Pattern Recognition and Signal Processing. Sijthoff and Noordhoff: Alphen aan den Rijn, 1978: 41-60.
[20] Pudil P, Novovicova J, Kittler J. Floating search methods in feature selection. Pat Rec Let 1994; 15: 1119-25.
[21] Siedelecky W, Skalansky J. On automatic feature selection. Int J Pat Rec and Artif Intell 1988; 2: 197-220.
[22] Doak J. An Evaluation of Feature Selection Methods and their Application to Computer Security. Technical Report CSE-92-18. University of California at Davis, 1992.
[23] John G, Kohavi R, Pfleger K. Irrelevant features and the subset selection problem. In: Machine Learning: Proceedings of the Eleventh International Conference. 1994: 121-9.
[24] Holland JH. Adaptation in Natural and Artificial Systems. University of Michigan Press, 1975.
[25] Etxeberria R, Larrañaga P. Global optimization with Bayesian networks. In: Proceedings of the II Symposium on Artificial Intelligence CIMAF99. Special Session on Distributions and Evolutionary Optimization. 1999: 332-9.
[26] Baluja S. Population-Based Incremental Learning: a Method for Integrating Genetic Search Based Function Optimization and Competitive Learning. Technical Report CMU-CS-94-163. Pittsburgh: Carnegie Mellon University, 1994.
[27] Gonzalez C, Lozano JA, Larrañaga P. Algoritmo PBIL: un analisis teorico preliminar. In press.
[28] Hohfeld M, Rudolph G. Towards a theory of population-based incremental learning. In: Proceedings of the 4th IEEE Conference on Evolutionary Computation. 1997: 1-5.
[29] Monmarche N, Ramat E, Dromel G, Slimane M, Venturini G. On the Similarities between AS, BSC and PBIL: toward the Birth of a new Meta-Heuristic. Internal Report 215-E3i. Laboratoire d'Informatique du Littoral, 1999.
[30] Dietterich TG. Approximate statistical tests for comparing supervised learning algorithms. Neural Computation 1998; 10: 1895-924.
