MEDINFO 2001 V. Patel et al. (Eds) Amsterdam: IOS Press © 2001 IMIA. All rights reserved

Feature Subset Selection for Decision Tree Induction in the Context of Otoneurological Data: A Preliminary Study

Kati Viikki (a), Erna Kentala (b), Martti Juhola (a), Ilmari Pyykkö (c)

(a) Department of Computer and Information Sciences, 33014 University of Tampere, Finland
(b) Department of Otorhinolaryngology, 00029 Helsinki University Central Hospital, Finland
(c) Department of Otorhinolaryngology, Karolinska Institute, 17176 Stockholm, Sweden

Introduction

The definition of a representation for the learning data is essential when machine learning methods are applied. Finding a good subset of attributes [1] may be a tedious task because of the wealth of available attributes. In this study, an attribute grouping method based on measures of association and graph theoretic techniques [2] was used to guide attribute selection for the construction of decision trees for six otoneurological diseases.

Materials and Methods

The data comprise 815 patient cases from the six largest diagnostic groups in the database of the otoneurological expert system ONE [3]: Ménière’s disease (313), benign positional vertigo (146), vestibular schwannoma (130), vestibular neuritis (120), traumatic vertigo (65) and sudden deafness (41). The 104 attributes used in the study concerned symptoms, medical history and clinical findings.

The attribute grouping method [2] was used to find groups of related attributes in the data. A relation between two attributes was defined to exist if Cramér’s V [4] reached a certain cut-off value. Information from the attribute groups was used to select the attributes to be input to the See5 decision tree program [5].

Results and Discussion

Cut-off values of 0.9, 0.7, 0.5 and 0.3 [4] were used to generate four collections of attribute groups. The numbers of attribute groups in the collections were 92, 83, 55 and 15, respectively. The cut-off value of 0.5 (moderate or stronger relations) produced the most interesting collection, in which the attribute group concerning hearing loss and tinnitus formed, in a manner of speaking, the clinical picture of Ménière’s disease excluding the vertigo symptoms.

The decision tree constructed using all the attributes had 37 attributes and 57 nodes. Its accuracy was 81%. The decision trees generated using information from attribute groups had from 28 to 40 attributes and from 52 to 60 nodes. Their accuracy ranged from 75 to 79%.

The attribute grouping method revealed interesting associations between attributes and gave insight into the data. Combined with attribute selection for decision tree induction, it generated sensible and accurate decision trees, which were, overall, simpler than the decision tree constructed from the whole attribute set.

References

[1] Kohavi R, and John GH. Wrappers for feature subset selection. Artificial Intelligence 1997: 97: 273-324.
[2] Viikki K. A Variable Grouping Method Based on Graph Theoretic Techniques. University of Tampere, Department of Computer and Information Sciences, Report A2001-1, 2001.
[3] Kentala E, Pyykkö I, Auramo Y, and Juhola M. Otoneurological expert system. Annals of Otology, Rhinology & Laryngology 1996: 105: 654-658.
[4] Pett MA. Nonparametric Statistics for Health Care Research: Statistics for Small Samples and Unusual Distributions. Thousand Oaks, California: SAGE Publications, 1997.
[5] Quinlan JR. See5, version 1.07a, http://www.rulequest.com/, 1998.

Address for correspondence

Kati Viikki, Department of Computer and Information Sciences, FIN-33014 University of Tampere, Finland. E-mail: [email protected]
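
Illustrative sketch of the attribute grouping step

The grouping step described under Materials and Methods (a relation exists between two attributes if Cramér’s V reaches a cut-off value) can be illustrated with a small Python sketch. This is not the authors' implementation. The abstract does not say how groups are derived from the relation graph, so the sketch assumes that a group is a connected component of that graph. The function names and the toy attributes are hypothetical. Cramér’s V is computed in its usual form, the square root of chi-square divided by n times (min(rows, columns) - 1).

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cramers_v(xs, ys):
    """Cramér's V for two equally long sequences of categorical values."""
    n = len(xs)
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    observed = Counter(zip(xs, ys))
    k = min(len(row_totals), len(col_totals))
    if n == 0 or k < 2:
        return 0.0  # treat a constant attribute as unrelated to everything
    chi2 = 0.0
    for r, r_count in row_totals.items():
        for c, c_count in col_totals.items():
            expected = r_count * c_count / n
            chi2 += (observed[(r, c)] - expected) ** 2 / expected
    return sqrt(chi2 / (n * (k - 1)))


def attribute_groups(data, cutoff):
    """Group attributes linked by a Cramér's V that reaches the cut-off.

    data maps an attribute name to its list of values (one per case).
    ASSUMPTION: a group is a connected component of the relation graph;
    the source does not state which graph theoretic technique is used.
    """
    parent = {name: name for name in data}

    def find(a):
        # Union-find lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for a, b in combinations(data, 2):
        if cramers_v(data[a], data[b]) >= cutoff:
            parent[find(a)] = find(b)

    groups = {}
    for name in data:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical toy data, not taken from the ONE database.
toy = {
    "hearing_loss": ["yes", "yes", "no", "no", "yes", "no"],
    "tinnitus": ["yes", "yes", "no", "no", "yes", "no"],
    "head_injury": ["no", "yes", "no", "yes", "yes", "no"],
}

for cutoff in (0.5, 0.3):
    print(cutoff, attribute_groups(toy, cutoff))
```

In this toy data, lowering the cut-off merges attributes into fewer groups, which is the same direction of change as reported above (92 groups at a cut-off of 0.9 down to 15 at 0.3). Under the stated assumption, each resulting group could then be used to pick attributes for a decision tree learner.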
