Rule-Extraction from Support Vector Machines

Joachim Diederich
School of Information Technology and Electrical Engineering
The University of Queensland, Brisbane Q 4072, Australia
[email protected]
Department of Computer Science
American University of Sharjah, Sharjah, U.A.E.

Abstract

Over recent years, a number of studies on rule-extraction from support vector machines (SVMs) have been published [1-5]. The research strategy in these projects is similar: to explore and develop algorithms for rule-extraction from SVMs based on the perception (or "view") of the underlying SVM that is either explicitly or implicitly assumed by the rule-extraction technique. In the context of rule-extraction from artificial neural networks (ANNs) [6, 7], the notion of "translucency" describes the degree to which the internal representation of the ANN is accessible to the rule-extraction technique. More broadly, a taxonomy for rule-extraction from neural networks has been introduced [6, 7] which includes five evaluation criteria: translucency, rule quality, expressive power, portability and algorithmic complexity. These evaluation criteria are now commonly applied to rule-extraction from SVMs. The central thesis here is that these evaluation criteria cannot be applied directly to rule-extraction from SVMs, in particular those trained on very high-dimensional data, and that SVMs that generate structured output [8, 9] offer new opportunities for rule-extraction and, hence, for the explanation of learning results. The following briefly describes two of the five evaluation criteria for rule-extraction from neural networks, which are then discussed in the context of rule-extraction from SVMs. A new classification schema for rule-extraction from SVMs is presented and an approach is outlined which (1) uses SVMs only, including those generating structured output, and (2) works well for high-dimensional data.

Rule-extraction from ANNs

At one end of the translucency spectrum are those rule-extraction techniques that view the underlying ANN at the maximum level of granularity, i.e. as a set of hidden and output units ("decompositional" techniques [10]). The basic strategy of such decompositional techniques is to extract rules at the level of each individual hidden and output unit. In contrast, the strategy of "pedagogical" or learning-based methods is to view the trained ANN at the minimum possible level of granularity, i.e. as a single entity or, alternatively, as a "black box". The focus is then on finding rules that map the ANN inputs (i.e. the attribute/value pairs from the problem domain) directly to outputs (e.g. membership of, or exclusion from, some target class [7]).

A rule set is considered accurate if it can correctly classify a set of previously unseen examples, and it displays a high level of fidelity if it mimics the behaviour of the neural network from which it was extracted by capturing all of the information represented in the ANN. An extracted rule set is consistent if, across different training sessions, the ANN generates rule sets which produce the same classifications of unseen examples. Finally, the comprehensibility of a rule set is determined by measuring the size of the rule set (in terms of the number of rules) and the number of antecedents per rule [7].

Translucency and rule quality applied to rule-extraction from SVMs

Most current studies on rule-extraction from SVMs focus on decompositional extraction; however, learning-based approaches are also available [4]. The idea is simple: combine SVM outputs with the inputs from a data set and use any machine learning technique that produces rules or decision trees (a minimal sketch of this idea follows below). Hence, pedagogical rule-extraction from SVMs is trivial, in particular if the data set is low-dimensional. To date, there is no rule-extraction-from-SVMs technique for high-dimensional data, i.e. the core application domain of SVMs.
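The following sketch illustrates the learning-based (pedagogical) idea just described: the training data are relabelled with a trained SVM's own predictions, and a shallow decision tree is fitted to mimic the SVM, from which propositional rules can be read off. It uses scikit-learn on a low-dimensional benchmark; the data set, kernel and tree depth are illustrative assumptions, not the setup of the published techniques [4, 5].

    # Pedagogical rule extraction: distil a trained SVM into a decision
    # tree by relabelling the data with the SVM's outputs (illustrative).
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier, export_text

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # 1. Train the SVM on the original labels.
    svm = SVC(kernel="rbf", gamma="scale").fit(X_train, y_train)

    # 2. Relabel the inputs with the SVM's predictions ("black box" view).
    svm_labels = svm.predict(X_train)

    # 3. Fit any rule/tree learner to the SVM's input-output behaviour.
    tree = DecisionTreeClassifier(max_depth=3).fit(X_train, svm_labels)

    # 4. Read off the rules and measure fidelity: how often the tree
    #    reproduces the SVM's classifications on unseen examples.
    print(export_text(tree))
    fidelity = (tree.predict(X_test) == svm.predict(X_test)).mean()
    print(f"fidelity to the SVM: {fidelity:.3f}")

The fidelity score corresponds to the rule-quality criterion above: it measures agreement with the underlying model rather than with the true labels.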
Rule-extraction from support vector machines therefore requires evaluation criteria that emphasize the data. The following dimensions are proposed: (1) translucency, (2) dimensionality of the data, (3) expressiveness of the extracted rules, (4) portability, (5) rule quality and (6) algorithmic complexity of the extraction.

The following technique utilizes a multi-class SVM [9] and scores high on several of the criteria mentioned above. Assume two SVMs, labelled CSVM (classification SVM) and ESVM (explanation SVM). CSVM is trained on a binary decision problem with high-dimensional data, e.g. text classification; ESVM is trained on a multiple-classification task. ESVM takes as input the output of CSVM plus the original input pattern, and its target categories represent user-selected features from the CSVM input pattern plus ranges over the values of these attributes. In a text classification task, for instance, ESVM's output classes represent content words and the frequency of their occurrence. The CSVM and ESVM outputs can then be combined to form conjunctive rules: the ESVM outputs form the set of antecedents and the CSVM output is the consequent. The entire rule set is then refined: duplicates as well as redundant rules and antecedents are eliminated (a sketch of this pipeline is given below).

This method is simple and purely learning-based, works for high-dimensional data and generates propositional rules. The technique is portable; however, fidelity is not guaranteed and large rule sets are possible. This is offset by the fact that the method is very efficient (rule generation is based on the CSVM/ESVM outputs only) and that individual input patterns can be explained.
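The sketch below is a minimal, hedged rendering of this CSVM/ESVM pipeline on a toy text task. It assumes that ESVM can be realised as one-vs-rest linear SVMs, one per user-selected antecedent (a content word together with a minimum frequency); the corpus, the antecedent list and the explain helper are hypothetical illustrations, not the exact published setup.

    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.svm import LinearSVC

    # Toy corpus; 1 = spam (purely illustrative data).
    docs = ["cheap meds buy now", "meeting agenda attached",
            "buy cheap now", "project meeting notes"]
    y = np.array([1, 0, 1, 0])

    vec = CountVectorizer()
    X = vec.fit_transform(docs).toarray()

    # CSVM: trained on the binary decision problem.
    csvm = LinearSVC().fit(X, y)

    # User-selected antecedents: (content word, minimum frequency).
    antecedents = [("cheap", 1), ("buy", 1), ("meeting", 1)]
    vocab = vec.vocabulary_

    # Multi-label targets for ESVM: does a document satisfy each antecedent?
    T = np.array([[int(X[i, vocab[w]] >= f) for (w, f) in antecedents]
                  for i in range(X.shape[0])])

    # ESVM input: the original pattern plus the CSVM output.
    X_e = np.hstack([X, csvm.predict(X).reshape(-1, 1)])
    esvm = OneVsRestClassifier(LinearSVC()).fit(X_e, T)

    def explain(doc):
        # Conjunctive rule: ESVM outputs are the antecedents,
        # the CSVM output is the consequent.
        x = vec.transform([doc]).toarray()
        c = csvm.predict(x)[0]
        active = esvm.predict(np.hstack([x, [[c]]]))[0]
        conds = [f"freq('{w}') >= {f}"
                 for (w, f), a in zip(antecedents, active) if a]
        return " AND ".join(conds or ["TRUE"]) + f" -> class {c}"

    print(explain("buy cheap meds now"))

The refinement step is not shown: applying explain across the data set yields the raw rule set, from which duplicates and redundant rules or antecedents would then be eliminated.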
The objective of future research is to use SVMs that generate structured outputs directly for the generation of rule sets. A structured model is a scoring scheme over a set of combinatorial structures plus a method for finding the highest-ranking structure [8]. Hence, it should be possible to learn, directly, rule sets that offer the best explanation of an SVM learning result.
References

[1] Núñez, H., Angulo, C., Catala, A. "Rule-extraction from support vector machines," In Proceedings of the European Symposium on Artificial Neural Networks, 2002, pp. 107-112.

[2] Zhang, Y., Su, H., Jia, T., Chu, J. "Rule extraction from trained support vector machines," In Advances in Knowledge Discovery and Data Mining: Proceedings of the 9th Pacific-Asia Conference (PAKDD 2005), Springer, 2005, pp. 61-70.

[3] Fung, G., Sandilya, S., Rao, R. "Rule extraction from linear support vector machines," In Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2005.

[4] Barakat, N., Diederich, J. "Eclectic rule-extraction from support vector machines," International Journal of Computational Intelligence, 2(1), pp. 59-62, 2005.

[5] Diederich, J., Barakat, N. "Hybrid rule-extraction from support vector machines," In Proceedings of the IEEE Conference on Cybernetics and Intelligent Systems, Singapore, 2004, pp. 1270-1275.

[6] Andrews, R., Diederich, J., Tickle, A.B. "A survey and critique of techniques for extracting rules from trained artificial neural networks," Knowledge-Based Systems, 8, pp. 373-389, 1995.

[7] Tickle, A., Andrews, R., Golea, M., Diederich, J. "The truth will come to light: directions and challenges in extracting the knowledge embedded within trained artificial neural networks," IEEE Transactions on Neural Networks, 9(6), pp. 1057-1068, 1998.

[8] Taskar, B., Chatalbashev, V., Koller, D., Guestrin, C. "Learning structured prediction models: a large margin approach," In Proceedings of the 22nd International Conference on Machine Learning (ICML 2005), 2005.

[9] Tsochantaridis, I., Hofmann, T., Joachims, T., Altun, Y. "Support vector machine learning for interdependent and structured output spaces," In Proceedings of the 21st International Conference on Machine Learning (ICML 2004), 2004.

[10] Craven, M., Shavlik, J. "Using sampling and queries to extract rules from trained neural networks," In Proceedings of the 11th International Conference on Machine Learning, 1994, pp. 37-45.

Topic: data mining
Preference: oral/poster