Generating Understandable and Accurate Fuzzy Rule-based Systems in a Java Environment
Jose M. Alonso¹, L. Magdalena¹

Abstract. Looking for a good interpretability-accuracy trade-off is one of the most challenging tasks in fuzzy modelling. Indeed, interpretability has been acknowledged as a distinguishing capability of linguistic fuzzy systems since the seminal ideas of Zadeh and Mamdani. However, obtaining interpretable fuzzy systems is not straightforward. It is a matter of careful design that must cover several abstraction levels, from the design of each individual linguistic term (and its related fuzzy set) to the analysis of the cooperation among several rules, which depends on the fuzzy inference mechanism. This work gives an overview of existing tools for fuzzy system modelling. Moreover, it introduces GUAJE, an open-source, free-software Java environment for building understandable and accurate fuzzy rule-based systems by combining several pre-existing tools.

Key words: Interpretability, Fuzzy modeling, Free open source software

¹ European Centre for Soft Computing, C/ Gonzalo Gutiérrez Quirós, s/n, 33600 Mieres, Asturias, Spain.

European Centre for Soft Computing Internal Report (January 2011). Final version published in Fuzzy Logic and Applications, Springer-Verlag, Volume LNAI 6857, September 2011, Pages 212-219. Available online at: http://dx.doi.org/10.1007/978-3-642-23713-3_27

1. Introduction

The term Soft Computing (SC) is usually defined by its essential properties, as a family of techniques, as a complement of hard computing, and/or as a tool for coping with imprecision and uncertainty [22]. One of the main features of SC techniques is their cooperative nature. Each individual technique (Fuzzy Logic, Neuro-computing, Probabilistic Reasoning, Evolutionary Computation, etc.), and even each individual algorithm, has its own advantages and drawbacks. Several pre-existing techniques can work cooperatively, yielding hybrid systems that take advantage of the strengths of each of them, in order to solve many complex real-world problems for which other techniques are not well suited.

Since this work deals with modelling understandable and accurate systems, it focuses mainly on the SC techniques most suitable for dealing with the so-called humanistic systems, defined by Zadeh [30] as "those systems whose behaviour is strongly influenced by human judgment, perception or emotions". We concentrate on Fuzzy Logic (FL) [28] because its semantic expressivity, close to natural language, is well known for linguistic concept modelling. The use of linguistic variables and rules [29] favours the interpretability of fuzzy systems. Unfortunately, FL alone is not enough for building interpretable systems, i.e., fuzzy systems are not interpretable per se. Thus, the whole modelling process must be carried out carefully, paying special attention to interpretability from beginning to end and imposing several constraints [25]. In addition, interpretability requirements depend strongly not only on each specific problem but also on the background (experience, preferences, knowledge, etc.) of the end-user who will interact with the designed system. Looking for a good interpretability-accuracy trade-off is therefore one of the most complex tasks in system modelling, and it demands the aid of powerful software tools.

The objective of this paper is twofold. First, it gives an overview of existing software tools for fuzzy system modelling. Second, it introduces GUAJE, a Java environment for building understandable and accurate fuzzy rule-based systems. GUAJE is an enhanced version of a previous open source software tool (KBCT) and it integrates several algorithms provided by different open source software tools. It is worth noting that GUAJE is not merely an aggregation of different tools: it can automatically write a configuration file that selects the algorithms to be used during the modelling process, regardless of the specific tool that provides them.

The structure of this paper is as follows. The next section reviews the available software that implements SC techniques for system modelling. Section 3 presents GUAJE. Section 4 enumerates some applications. Finally, Section 5 draws some conclusions and points out future work.

2. An overview on software for system modelling

Most software for SC system modelling is available on the Web in the form of libraries and/or small tools, which often come from academics and small research groups. In order to gain wider visibility and cooperation with other researchers, these tools are usually downloadable as free open source software, at least for research and education purposes.
As a result, there is a huge amount of free software available, which makes it easy to create new small prototypes for many applications without starting from scratch. However, the main drawback of such developments is their maintenance cost. Keeping the source code flexible and well documented is a mandatory requirement for making the cooperation of several researchers in a common development feasible. In addition, coordinating and controlling versions is a really difficult task when several researchers, sometimes located in different parts of the world, work on the software development only in their free time. Of course, it is possible to resort instead to professional commercial tools such as the Matlab toolboxes, including the well-known Fuzzy Logic Toolbox, which provides ANFIS (Adaptive Neuro-Fuzzy Inference System). Nevertheless, we advocate the use of open source software because it can quickly incorporate new developments made by the active research community working in emerging fields. Briefly, some of the best-known free-software SC packages and tools are the following. In the field of evolutionary computation, JCLEC (Java Class Library for Evolutionary Computation) and JMetal (Metaheuristic Algorithms in Java) provide two nice frameworks for evolutionary and multi-objective optimization. JavaNNS (Java version of the Stuttgart Neural Network Simulator) is probably the best free suite for neural networks. Regarding fuzzy modelling, Xfuzzy (a development environment for fuzzy-inference-based systems), FisPro (Fuzzy Inference System Professional) and KBCT (Knowledge Base Configuration Tool) are three useful tools. In addition, among neuro-fuzzy algorithms we can point out NEFCLASS (Neuro-Fuzzy Classification), among others. There are also some interesting and successful attempts at going beyond specialized tools.
For instance, FrIDA [12] is free and open source software in the form of a Java-based graphical user interface (GUI) that joins several individual tools for data analysis and visualization; in this case, all the small programs were developed by the same researchers over the years. KEEL (Knowledge Extraction based on Evolutionary Learning) [1] is another, more ambitious, software tool created as part of a research project with several goals. To start with, it includes a huge repository of hundreds of evolutionary learning algorithms developed by several authors (belonging to different research groups) as part of their own research work. Furthermore, new algorithms can be easily added. In addition, KEEL offers a user-friendly Java GUI for designing experiments in which different algorithms can be fairly compared on exactly the same datasets, supported by a complete statistical analysis. Another well-known tool that puts together several machine-learning algorithms under the same interface is Weka (Data Mining Software in Java) [27]. It is also developed following the open source philosophy and it has many related projects and contributors. To manage them, it follows the Linux model of releases. It focuses on the automatic extraction of knowledge from data but, unfortunately, it does not take care of the interpretability of the generated models and it does not include any algorithms for fuzzy modelling. Lastly, KNIME is a user-friendly and comprehensive open-source data integration, processing, analysis, and exploration platform for both industry and academia.

Tool URLs: JCLEC: http://jclec.sourceforge.net/; JMetal: http://jmetal.sourceforge.net/; JavaNNS: http://www.ra.cs.uni-tuebingen.de/SNNS/; Xfuzzy: https://forja.rediris.es/projects/xfuzzy/; FisPro: http://www.inra.fr/internet/Departements/MIA/M/fispro/; KBCT: http://www.mat.upm.es/projects/advocate/kbct.htm; NEFCLASS: http://fuzzy.cs.uni-magdeburg.de/nefclass/

3. Description of the GUAJE environment

GUAJE stands for Generating Understandable and Accurate fuzzy rule-based systems in a Java Environment. GUAJE implements the Highly Interpretable Linguistic Knowledge (HILK) fuzzy modelling methodology [3, 6]. Its main building blocks are sketched in Fig. 1. The core of GUAJE is the latest downloadable version of KBCT [5] (version 3.0), which has been upgraded with new functionalities. KBCT is open source software for knowledge extraction and representation that combines expert knowledge and induced knowledge (knowledge automatically extracted from experimental data). The whole modelling process consists of the following steps.
First of all, the available experimental data must be pre-processed and translated into the format handled by GUAJE. Secondly, a feature selection process is needed when dealing with complex problems involving many inputs. Thirdly, the partition design stage is based on the definition of linguistic variables characterized by strong fuzzy partitions. Expert partitions and partitions automatically generated from experimental data are compared, and the best partitions according to data distribution and expert knowledge are selected for each input variable. Then, two sets of linguistic rules (expert and induced rules) describe the system behaviour by combining the previously generated linguistic variables. They are rules of the form If Premise Then Conclusion, where both premises and conclusions are expressed by linguistic propositions. Next, both sets of rules are integrated into a single one after checking integrity, consistency, and so on. Then, the resulting knowledge base can be improved regarding interpretability (minimization and simplification) and/or accuracy (optimization). Finally, after validating the final fuzzy system (knowledge base + inference mechanism), it is possible to generate native code in order to run the designed system in a stand-alone application. Following the cooperative spirit of SC applications, GUAJE promotes not only the combination of several SC techniques but also the combination of several available system modelling tools with the aim of building interpretable fuzzy systems:
• FisPro. An open source tool for creating fuzzy inference systems (FIS) to be used for reasoning purposes, especially for simulating a physical or biological system [19]. It includes many algorithms (most of them implemented as C programs) for generating fuzzy partitions and rules directly from experimental data. In addition, it offers data and FIS visualization methods through a user-friendly Java-based interface.
GUAJE makes use of the following algorithms provided by FisPro: K-means [20], Hierarchical Fuzzy Partitioning (HFP) [18], Wang and Mendel (WM) [26], Fast Prototyping Algorithm (FPA) [19], and Fuzzy Decision Trees (FDT) [21].
Tool URLs: Weka: http://www.cs.waikato.ac.nz/ml/weka/; KNIME: http://knime.org/; GUAJE: http://www.softcomputing.es/guaje


Figure 1: Scheme of the proposed GUAJE environment. [Diagram: starting from DATA and Expert Knowledge, the stages Data Preprocessing (visualization, analysis, resampling, etc.), Feature Selection (C4.5-based), Partition Design (linguistic variables with strong fuzzy partitions: partition learning with HFP and k-means, partition evaluation and selection, domain ontology), Rule Base Definition (linguistic rules: induced rules with FDT, FPA and WM; expert rules), Rule Base Verification (rule integration, consistency analysis, fingrams analysis), Knowledge Base Improvement (logical minimization, linguistic simplification, partition optimization), Knowledge Base Validation (FIS visualization, FIS inference, FIS simulation; quality evaluation of accuracy and interpretability) and Automatic Code Generation (program synthesis in C, C++, Java, VHDL). The stages are annotated with the tools involved: WEKA, FisPro, KEEL, KBCT, ORE, ESPRESSO, Xfuzzy and Matlab.]

• ORE (Ontology Rule Editor). A Java-based, open source, platform-independent application for defining, managing and testing inference rules on a model represented by a specific ontology [23]. GUAJE calls libraries provided by ORE to visualize domain ontologies, with the aim of making the process of expert knowledge extraction easier [8].
• Espresso. Free software for logical minimization that implements the algorithm developed by R. Brayton [13]. GUAJE calls Espresso from the module in charge of running the interpretability assessment approach based on semantic co-intension proposed in [24].
• Graphviz. A collection of free software for viewing and manipulating abstract graphs [16]. It is used by the GUAJE module responsible for a novel interpretability analysis at the fuzzy inference level (fingrams analysis) [2].
Tool URLs: ORE: http://sourceforge.net/projects/ore/; Graphviz: http://www.graphviz.org/


• JMetal. Free software comprising a set of metaheuristic algorithms implemented in Java by Durillo et al. [15]. We have combined GUAJE with JMetal in order to embed HILK into a multi-objective evolutionary framework [4, 14].
• Weka. Open source tool for data mining. It includes implementations of many classical algorithms, for example J48, which corresponds to the well-known C4.5 algorithm. GUAJE offers a feature selection procedure based on this algorithm.

Fuzzy knowledge bases generated with GUAJE can be exported to the formats recognized by FisPro, Xfuzzy and the Matlab Fuzzy Logic Toolbox, so those tools can be used in the final modelling stages. However, the inverse translation is not allowed, i.e., knowledge bases modified with FisPro, Matlab, or Xfuzzy cannot be imported and opened again with GUAJE. This restriction preserves the interpretability of the final model, because FisPro, Matlab, or Xfuzzy may violate the interpretability constraints imposed and satisfied by GUAJE. We recommend Xfuzzy for generating code for stand-alone applications, whereas Matlab may be useful for system simulations. Finally, KEEL offers many learning algorithms that may be incorporated into GUAJE in the near future; in any case, they can also be used for the initial data pre-processing stage.

4. Applications

KBCT, the ancestor of GUAJE, has been successfully used to design interpretable fuzzy systems for many applications: detecting the inattentiveness level of a driver [11]; intelligent diagnosis in robotics [7]; classifying glucose measurements in a telemedicine system [17]; localization of autonomous robots in indoor environments [9]; and human activity recognition [10]. Thanks to its new functionalities (feature selection, visual analysis and simplification by fingrams, interpretability indexes, etc.), GUAJE is expected to improve on the results reported by KBCT in previous applications.
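Section 3 describes partition design with strong fuzzy partitions and the induction of If-Then rules from data with algorithms such as Wang and Mendel (WM) [26]. The Java fragment below is a minimal sketch of these two ideas, not the FisPro or GUAJE implementation. It assumes uniformly spaced triangular fuzzy sets, the product as conjunction operator, a simplified classification variant of WM in which each example proposes one rule whose conclusion is its class, and winner-rule inference; all class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch with hypothetical names; not GUAJE or FisPro code. */
final class WangMendelSketch {

    /** IF x1 is premise[0] AND x2 is premise[1] ... THEN conclusion. */
    static final class Rule {
        final List<Integer> premise;  // label index chosen for each input variable
        final String conclusion;      // class label
        final double degree;          // strength of the example that produced the rule

        Rule(List<Integer> premise, String conclusion, double degree) {
            this.premise = premise;
            this.conclusion = conclusion;
            this.degree = degree;
        }
    }

    /**
     * Degrees of x in n >= 2 uniformly spaced triangular sets on [min, max] (min < max).
     * At most two adjacent sets are active and their degrees sum to 1 (strong partition).
     * Values outside [min, max] are saturated to the nearest bound.
     */
    static double[] memberships(double x, double min, double max, int n) {
        double[] mu = new double[n];
        double step = (max - min) / (n - 1);                // distance between set centres
        double pos = (Math.max(min, Math.min(max, x)) - min) / step;
        int left = (int) Math.min(Math.floor(pos), n - 2);  // left one of the two active sets
        double t = pos - left;                              // relative position in [0, 1]
        mu[left] = 1.0 - t;
        mu[left + 1] = t;
        return mu;
    }

    /** Index of the label with the highest degree (ties go to the lower index). */
    static int bestLabel(double[] mu) {
        int best = 0;
        for (int k = 1; k < mu.length; k++) if (mu[k] > mu[best]) best = k;
        return best;
    }

    /** Simplified WM: one rule per example; on equal premises the strongest example wins. */
    static Map<List<Integer>, Rule> induce(double[][] data, String[] classes,
                                           double[] min, double[] max, int n) {
        Map<List<Integer>, Rule> rules = new HashMap<>();
        for (int i = 0; i < data.length; i++) {
            List<Integer> premise = new ArrayList<>();
            double degree = 1.0;
            for (int j = 0; j < data[i].length; j++) {
                double[] mu = memberships(data[i][j], min[j], max[j], n);
                int best = bestLabel(mu);
                premise.add(best);
                degree *= mu[best];                         // product as conjunction
            }
            Rule old = rules.get(premise);
            if (old == null || degree > old.degree) {
                rules.put(premise, new Rule(premise, classes[i], degree));
            }
        }
        return rules;
    }

    /** Winner-rule inference: class of the rule with the highest firing degree, or null. */
    static String classify(double[] x, Map<List<Integer>, Rule> rules,
                           double[] min, double[] max, int n) {
        String winner = null;
        double bestFiring = 0.0;
        for (Rule r : rules.values()) {
            double firing = 1.0;
            for (int j = 0; j < x.length; j++) {
                firing *= memberships(x[j], min[j], max[j], n)[r.premise.get(j)];
            }
            if (firing > bestFiring) { bestFiring = firing; winner = r.conclusion; }
        }
        return winner;
    }

    public static void main(String[] args) {
        double[][] data = {{0.1, 0.9}, {0.2, 0.8}, {0.9, 0.1}};
        String[] classes = {"A", "A", "B"};
        double[] min = {0.0, 0.0}, max = {1.0, 1.0};
        String[] labels = {"low", "medium", "high"};        // 3 linguistic terms per input
        Map<List<Integer>, Rule> rules = induce(data, classes, min, max, 3);
        for (Rule r : rules.values()) {
            System.out.printf("IF x1 is %s AND x2 is %s THEN class is %s (degree %.2f)%n",
                    labels[r.premise.get(0)], labels[r.premise.get(1)], r.conclusion, r.degree);
        }
        System.out.println("(0.15, 0.85) -> " + classify(new double[]{0.15, 0.85}, rules, min, max, 3));
    }
}
```

For any input value at most two adjacent labels are active and their degrees add up to 1, which is the property that defines a strong fuzzy partition. In the sample data the first two examples propose the same premise (x1 is low AND x2 is high), so only the rule backed by the stronger example is kept, which is how WM resolves such conflicts.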
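Section 3 also states that expert and induced rules are merged into a single rule base after checking integrity and consistency. As a hedged sketch of the simplest case only, and not KBCT's actual consistency analysis, the code below flags rules that share exactly the same premise but have different conclusions, and then merges both sets giving priority to expert rules, a policy assumed here for illustration. Rules with overlapping but non-identical premises, which a real consistency analysis must also handle, are ignored; all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch with hypothetical names; not GUAJE or KBCT code. */
final class RuleIntegrationSketch {

    static final class Rule {
        final Map<String, String> premise;   // input variable -> linguistic label
        final String conclusion;
        final String origin;                 // "expert" or "induced"

        Rule(Map<String, String> premise, String conclusion, String origin) {
            this.premise = premise;
            this.conclusion = conclusion;
            this.origin = origin;
        }

        @Override public String toString() {
            return "IF " + premise + " THEN " + conclusion + " [" + origin + "]";
        }
    }

    /** Reports each rule whose premise was already used by an earlier rule with another conclusion. */
    static List<String> conflicts(List<Rule> rules) {
        Map<Map<String, String>, Rule> firstByPremise = new HashMap<>();
        List<String> report = new ArrayList<>();
        for (Rule r : rules) {
            Rule first = firstByPremise.putIfAbsent(r.premise, r);
            if (first != null && !first.conclusion.equals(r.conclusion)) {
                report.add(first + " conflicts with " + r);
            }
        }
        return report;
    }

    /** Merges both sets; on identical premises the expert rule is kept (assumed policy). */
    static List<Rule> integrate(List<Rule> expert, List<Rule> induced) {
        Map<Map<String, String>, Rule> merged = new LinkedHashMap<>();
        for (Rule r : expert) merged.putIfAbsent(r.premise, r);
        for (Rule r : induced) merged.putIfAbsent(r.premise, r);  // also drops exact duplicates
        return new ArrayList<>(merged.values());
    }

    public static void main(String[] args) {
        Map<String, String> low = new HashMap<>();
        low.put("temperature", "low");
        Map<String, String> high = new HashMap<>();
        high.put("temperature", "high");
        List<Rule> expert = Arrays.asList(new Rule(low, "heating on", "expert"));
        List<Rule> induced = Arrays.asList(new Rule(low, "heating off", "induced"),
                                           new Rule(high, "heating off", "induced"));
        List<Rule> all = new ArrayList<>(expert);
        all.addAll(induced);
        conflicts(all).forEach(System.out::println);        // inspect before merging
        integrate(expert, induced).forEach(System.out::println);
    }
}
```

Reporting conflicts before merging keeps the designer in the loop: the merge step silently discards the conflicting induced rule, so the report is the only place where the disagreement between expert and data becomes visible.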
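GUAJE has been combined with JMetal to embed HILK into a multi-objective evolutionary framework [4, 14], where the interpretability-accuracy trade-off is represented by a set of non-dominated solutions. The sketch below uses plain Java rather than the jMetal API and assumes two objectives to be minimised: classification error and number of rules, the latter as a simple stand-in for interpretability. Candidate names and values are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch (plain Java, not the jMetal API) with hypothetical names. */
final class ParetoSketch {

    static final class Candidate {
        final String name;
        final double error;     // e.g. misclassification rate on validation data
        final int numRules;     // smaller rule bases are assumed easier to read

        Candidate(String name, double error, int numRules) {
            this.name = name;
            this.error = error;
            this.numRules = numRules;
        }
    }

    /** a dominates b if it is no worse in both objectives and strictly better in at least one. */
    static boolean dominates(Candidate a, Candidate b) {
        boolean noWorse = a.error <= b.error && a.numRules <= b.numRules;
        boolean strictlyBetter = a.error < b.error || a.numRules < b.numRules;
        return noWorse && strictlyBetter;
    }

    /** Keeps the non-dominated candidates (O(n^2) comparisons, fine for small populations). */
    static List<Candidate> paretoFront(List<Candidate> candidates) {
        List<Candidate> front = new ArrayList<>();
        for (Candidate c : candidates) {
            boolean dominated = false;
            for (Candidate other : candidates) {
                if (other != c && dominates(other, c)) { dominated = true; break; }
            }
            if (!dominated) front.add(c);
        }
        return front;
    }

    public static void main(String[] args) {
        List<Candidate> candidates = Arrays.asList(
                new Candidate("KB1", 0.10, 20), new Candidate("KB2", 0.15, 8),
                new Candidate("KB3", 0.20, 10), new Candidate("KB4", 0.30, 5));
        for (Candidate c : paretoFront(candidates)) {
            System.out.println(c.name + ": error=" + c.error + ", rules=" + c.numRules);
        }
    }
}
```

With the sample values, KB3 is dominated by KB2 (higher error and more rules), so the front contains KB1, KB2 and KB4; choosing among them is left to the designer's interpretability requirements.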
GUAJE is also expected to yield good results even in complex, large-scale problems where KBCT suffered from scalability problems.

5. Conclusions and Future Work

This paper has presented a new system modelling suite mainly focused on designing fuzzy rule-based systems (FRBSs) with a good interpretability-accuracy trade-off by combining several pre-existing tools. This approach saves a lot of time because it reuses many algorithms that are already freely available on the Web as part of other tools distributed as open source software. New algorithms can be added in the future in order to complement the existing ones or to add new functionalities. GUAJE is freely available (under the GPL license) as open source software at: http://www.softcomputing.es/guaje

Acknowledgment

GUAJE is an enhanced version of KBCT, whose first version was developed as part of the European research project ADVOCATE II supported by the European Commission (IST-2001-34508). The initial development started from FisPro, which explains why both tools are so closely linked. The present work is partly supported by the European Centre for Soft Computing (ECSC) and the Spanish Ministry of Science and Innovation under project TIN2008-06890-C02-01.


References
[1] J. Alcalá-Fdez, L. Sánchez, S. García, M. J. del Jesus, S. Ventura, J. M. Garrell, J. Otero, C. Romero, J. Bacardit, V. M. Rivas, J. C. Fernández, and F. Herrera. KEEL: A software tool to assess evolutionary algorithms for data mining problems. Soft Computing, 13(3):307–318, 2009.
[2] J. M. Alonso, O. Cordón, A. Quirin, and L. Magdalena. Analyzing interpretability of fuzzy rule-based systems by means of fuzzy inference-grams. In World Congress on Soft Computing, 2011.
[3] J. M. Alonso and L. Magdalena. HILK++: an interpretability-guided fuzzy modeling methodology for learning readable and comprehensible fuzzy rule-based classifiers. Soft Computing, DOI 10.1007/s00500-010-0628-5, 2010.
[4] J. M. Alonso, L. Magdalena, and O. Cordón. Embedding HILK in a three-objective evolutionary algorithm with the aim of modeling highly interpretable fuzzy rule-based classifiers. In IV International Workshop on Genetic and Evolutionary Fuzzy Systems (GEFS), pages 15–20, 2010.
[5] J. M. Alonso, L. Magdalena, and S. Guillaume. KBCT: A knowledge extraction and representation tool for fuzzy logic based systems. In IEEE International Conference on Fuzzy Systems, pages 989–994, 2004.
[6] J. M. Alonso, L. Magdalena, and S. Guillaume. HILK: A new methodology for designing highly interpretable linguistic knowledge bases using the fuzzy logic formalism. International Journal of Intelligent Systems, 23(7):761–794, 2008.
[7] J. M. Alonso, L. Magdalena, S. Guillaume, M. A. Sotelo, L. M. Bergasa, M. Ocaña, and R. Flores. Knowledge-based intelligent diagnosis of ground robot collision with non detectable obstacles. Journal of Robotic & Intelligent Systems, 48:539–566, 2007.
[8] J. M. Alonso, A. Muñoz, J. A. Botía, L. Magdalena, and A. F. Gómez-Skarmeta. Uso de ontologías para facilitar las tareas de extracción y representación de conocimiento en el diseño de sistemas basados en reglas borrosas. In XIV Spanish ESTYLF conference on fuzzy logic and technologies, pages 233–240, 2008.
[9] J. M. Alonso, M. Ocaña, M. A. Sotelo, L. M. Bergasa, and L. Magdalena. WiFi localization system using fuzzy rule-based classification. In Computer Aided Systems Theory, EUROCAST, LNCS 5717, pages 383–390, 2009.
[10] A. Alvarez, J. M. Alonso, G. Trivino, N. Hernandez, F. Herranz, A. Llamazares, and M. Ocaña. Human activity recognition applying computational intelligence techniques for fusing information related to wifi positioning and body posture. In IEEE World Congress on Computational Intelligence, pages 295–304, 2010.
[11] L. M. Bergasa, J. Nuevo, M. A. Sotelo, R. Barea, and M. E. López. Real-time system for monitoring driver vigilance. IEEE Transactions on Intelligent Transportation Systems, 7(1):63–77, 2006.
[12] C. Borgelt and G. González-Rodríguez. FrIDA - a free intelligent data analysis toolbox. In IEEE International Conference on Fuzzy Systems, pages 1892–1896, 2007.
[13] R. K. Brayton, G. D. Hachtel, C. McMullen, and A. Sangiovanni-Vincentelli. Logic Minimization Algorithms for VLSI Synthesis. Kluwer Academic Publishers Group, 1984.
[14] R. Cannone, J. M. Alonso, and L. Magdalena. Multi-objective design of highly interpretable fuzzy rule-based classifiers with semantic cointension. In V International Workshop on Genetic and Evolutionary Fuzzy Systems (GEFS), 2011.
[15] J. J. Durillo, A. J. Nebro, and E. Alba. The jMetal framework for multi-objective optimization: Design and architecture. In IEEE World Congress on Computational Intelligence, pages 4318–4325, 2010.
[16] E. R. Gansner and S. C. North. An open graph visualization system and its applications to software engineering. Software - Practice and Experience, 30(11):1203–1233, 1999.
[17] G. Garcia-Saez, J. M. Alonso, J. Molero, M. Rigla, I. Martinez-Sarriegui, A. de Leiva, E. J. Gomez, and M. E. Hernando. Mealtime blood glucose classifier based on fuzzy logic for the DIABTel telemedicine system. In 12th Conference on Artificial Intelligence in Medicine (AIME), pages 295–304, 2009.
[18] S. Guillaume and B. Charnomordic. Generating an interpretable family of fuzzy partitions. IEEE Transactions on Fuzzy Systems, 12(3):324–335, 2004.
[19] S. Guillaume and B. Charnomordic. Learning interpretable fuzzy inference systems with FisPro. Information Sciences, Special Issue on Interpretable Fuzzy Systems, In press, 2011.
[20] J. A. Hartigan and M. A. Wong. A k-means clustering algorithm. Applied Statistics, 28:100–108, 1979.
[21] H. Ichihashi, T. Shirai, K. Nagasaka, and T. Miyoshi. Neuro-fuzzy ID3: A method of inducing fuzzy decision trees with linear programming for maximizing entropy and an algebraic method for incremental learning. Fuzzy Sets and Systems, 81:157–167, 1996.
[22] L. Magdalena. What is soft computing? Revisiting possible answers. In 8th International FLINS Conference, pages 3–10, 2008.
[23] A. Muñoz, A. Vera, J. A. Botía, and A. F. Gómez-Skarmeta. Defining basic behaviours in ambient intelligence environments by means of rule-based programming with visual tools. In 1st Workshop of Artificial Intelligence Techniques for Ambient Intelligence, ECAI, 2006.
[24] C. Mencar, C. Castiello, R. Cannone, and A. M. Fanelli. Interpretability assessment of fuzzy knowledge bases: a cointension based approach. International Journal of Approximate Reasoning, 52(4):501–518, 2011.
[25] C. Mencar and A. M. Fanelli. Interpretability constraints for fuzzy information granulation. Information Sciences, 178:4585–4618, 2008.
[26] L.-X. Wang and J. M. Mendel. Generating fuzzy rules by learning from examples. IEEE Transactions on Systems, Man and Cybernetics, 22(6):1414–1427, 1992.
[27] I. H. Witten and E. Frank. Data Mining: Practical machine learning tools and techniques. 2nd Edition, Morgan Kaufmann, San Francisco, 2005.
[28] L. A. Zadeh. Fuzzy sets. Information and Control, 8:338–353, 1965.
[29] L. A. Zadeh. Outline of a new approach to the analysis of complex systems and decision processes. IEEE Trans. on SMC, 3:28–44, 1973.
[30] L. A. Zadeh. The concept of a linguistic variable and its application to approximate reasoning. Parts I, II, and III. Information Sciences, 8, 8, 9:199–249, 301–357, 43–80, 1975.
