ROSE - Software Implementation of the Rough Set Theory

Bartłomiej Prędki, Roman Słowiński, Jerzy Stefanowski, Robert Susmaga, Szymon Wilk
Institute of Computing Science, Poznań University of Technology, Piotrowo 3A, 60-965 Poznań, Poland

Abstract. This paper briefly describes the ROSE software package. It is an interactive, modular system designed for analysis and knowledge discovery based on rough set theory, running under 32-bit operating systems on PCs. It implements classical rough set theory as well as its extension based on the variable precision model. It includes generation of decision rules for classification systems and knowledge discovery.

1 Introduction

ROSE (Rough Set Data Explorer) is a modular software system implementing basic elements of rough set theory and rule discovery techniques. It was created at the Laboratory of Intelligent Decision Support Systems of the Institute of Computing Science in Poznań, based on fourteen years of experience in rough set based knowledge discovery and decision analysis. All computations are based on the rough set fundamentals introduced by Z. Pawlak [6]. One of the implemented extensions applies the variable precision rough set model defined by W. Ziarko [14], which is particularly useful in the analysis of data sets with large boundary regions.

The ROSE system is a successor of the RoughDAS and RoughClass systems [3][5][10]. RoughDAS is historically one of the first successful implementations of rough set theory and has been used in many real-life applications. Because of the limitations of RoughDAS, especially its inability to make full use of currently available computers, there was a need to design and implement new software.

ROSE started as several independent modules that were later put together in one system. Our first motivation was to create a computational engine working on more powerful computers (e.g. UNIX workstations), allowing faster analysis of large data sets. We then turned to creating a user-friendly interface, for which Microsoft Windows was chosen as the basic platform. As a result of this modular design, the modules can be redesigned and recompiled separately with little visible effect from the user's point of view. The only component that is strictly platform dependent is the graphical user interface (GUI). This makes the system easy to adapt to future operating systems and platforms.

L. Polkowski and A. Skowron (Eds.): RSCTC'98, LNAI 1424, pp. 605–608, 1998. © Springer-Verlag Berlin Heidelberg 1998

2 ROSE system

ROSE is an interactive software system running under 32-bit GUI operating systems (Windows 95/NT 4.0) on PC compatible machines. The core modules were written in the C++ programming language (ANSI standard), while the interface modules were developed using Borland C++ (with Object Windows libraries) and Borland Delphi. The system consists of a graphical user interface (GUI) and a set of separate computational modules. The modules are platform independent and can be recompiled for different targets, including UNIX machines. The GUI acts as an overlay on all computational modules, so it is quite easy to add new modules to the ROSE system. This is an important characteristic, as it makes the system easier to expand in the future.

ROSE is designed to be an easy-to-use, point-and-click, menu-driven, user-friendly tool for data exploration and analysis. It is meant for experts as well as for occasional users who want to analyse data. The system communicates with users through dialog windows, and all results are presented within the GUI environment. Data can be edited using a spreadsheet-like interface.

3 Input/output data

ROSE accepts input data in the form of a table, called an information table, in which rows correspond to objects (cases, observations, etc.) and columns correspond to attributes (features, characteristics). The attributes are divided into disjoint sets of condition attributes (e.g. results of particular tests or experiments) and decision attributes (expressing the partition of objects into classes, i.e. their classification). The data is stored in a plain text file according to a defined syntax (Information System File, ISF). ROSE can also import data stored by its predecessor RoughDAS and export to several other formats (including files accepted by the systems LERS and C4.5).

The ISF specification allows for long attribute names (up to 30 alphanumeric characters) and string values of attributes (such as 'high', 'low') alongside real and integer values. Because an ISF file is plain text, it can be transferred between different operating systems without modification, and it is easy to edit the file and to verify the correctness of the data it contains. The file format has an open structure: it is divided into sections, and new sections that are so far undefined can be added for future use. The user can decide to ignore some of the attributes simply by changing their qualification. Besides being visualized in the GUI, all results are also written to plain text files, so they are readable outside the system and easily converted to other file formats.
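The structure of an information table can be illustrated with a small Python sketch. This is not the ISF syntax (which is not reproduced in this paper) and not ROSE code; the attribute names, the values, and the helper `elementary_sets` are hypothetical and chosen only to show condition and decision attributes and the grouping of indiscernible objects.

```python
from collections import defaultdict

# Hypothetical information table: one dict per object (row).
# Attribute names and values are invented for illustration only.
TABLE = [
    {"temperature": "high", "headache": "yes", "flu": "yes"},
    {"temperature": "high", "headache": "yes", "flu": "no"},
    {"temperature": "low",  "headache": "no",  "flu": "no"},
    {"temperature": "low",  "headache": "yes", "flu": "no"},
    {"temperature": "high", "headache": "no",  "flu": "yes"},
]
CONDITION = ["temperature", "headache"]  # condition attributes
DECISION = "flu"                         # decision attribute


def elementary_sets(table, attributes):
    """Group object indices that have identical values on `attributes`."""
    groups = defaultdict(list)
    for index, row in enumerate(table):
        key = tuple(row[a] for a in attributes)
        groups[key].append(index)
    return list(groups.values())


# Ignoring an attribute amounts to leaving it out of the active list.
ignored = {"headache"}
active = [a for a in CONDITION if a not in ignored]

print(elementary_sets(TABLE, CONDITION))  # [[0, 1], [2], [3], [4]]
print(elementary_sets(TABLE, active))     # [[0, 1, 4], [2, 3]]
```

Objects 0 and 1 share all condition values but differ on the decision, which is the kind of inconsistency that rough approximations are meant to handle.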

4 Features

Features currently offered by the computational modules include:
– data validation and preprocessing,
– automatic discretization of continuous-valued attributes according to the Fayyad & Irani method [1], as well as user-driven discretization,
– qualitative estimation of the ability of the condition attributes to approximate the objects' classification, using either the standard rough set model or the variable precision model extension,
– finding the core of attributes as well as looking for reducts in the information table (either all reducts or a population of reducts of predetermined size) using several methods, such as the algorithm by S. Romanski [7] and a modified algorithm by A. Skowron [8],
– examining the relative significance of a given attribute for the classification of objects by observing the changes in the quality of classification,
– reducing superfluous attributes and selecting the most significant attributes for the classification of objects; several techniques are available that support the choice of subsets of attributes ensuring a satisfactory quality of classification (e.g. the technique of adding the most discriminatory attributes to the core),
– inducing decision rules using either the LEM2 algorithm [2] or the Explore algorithm [4][13],
– postprocessing of induced rules, e.g. pruning, and looking for interesting rules according to user-defined queries [4],
– applying the decision rules to classify new objects using different techniques of rule matching, in particular an original approach based on a valued closeness relation [3][9],
– evaluation of sets of decision rules using k-fold cross validation techniques.

Thanks to the open architecture of the system, it will be quite easy to add new modules.
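Several of the notions listed above (approximations, quality of classification, the variable precision extension, and the core) can be made concrete with a minimal Python sketch. It is not ROSE code and does not reproduce the algorithms of [7] or [8]; the data, the function names, and the `beta` parameter are assumptions for illustration. The variable precision rule used here (a block enters the lower approximation when at least a fraction `beta` of its objects belong to the class) is one common formulation of the model.

```python
from collections import defaultdict

# Hypothetical toy table: (condition values, decision class) per object.
# Columns: 0 = temperature, 1 = headache, 2 = cough (all invented).
OBJECTS = [
    (("high", "yes", "no"), "flu"),
    (("high", "yes", "no"), "flu"),
    (("high", "yes", "no"), "healthy"),
    (("low", "no", "no"), "healthy"),
    (("low", "yes", "no"), "healthy"),
    (("high", "no", "no"), "flu"),
]


def elementary_sets(objects, attrs):
    """Blocks of objects that are indiscernible on the attribute indices."""
    groups = defaultdict(set)
    for i, (values, _) in enumerate(objects):
        groups[tuple(values[a] for a in attrs)].add(i)
    return list(groups.values())


def approximations(objects, attrs, target, beta=1.0):
    """Return (lower, upper) approximation of one decision class.

    beta=1.0 is the classical model; 0.5 < beta < 1 relaxes inclusion.
    """
    concept = {i for i, (_, d) in enumerate(objects) if d == target}
    lower, upper = set(), set()
    for block in elementary_sets(objects, attrs):
        share = len(block & concept) / len(block)
        if share >= beta:
            lower |= block
        if share > 1.0 - beta:
            upper |= block
    return lower, upper


def quality(objects, attrs, beta=1.0):
    """Fraction of objects lying in the lower approximation of some class."""
    classes = {d for _, d in objects}
    covered = sum(len(approximations(objects, attrs, c, beta)[0])
                  for c in classes)
    return covered / len(objects)


def core(objects, attrs):
    """Attributes whose removal lowers the classical quality."""
    full = quality(objects, attrs)
    return [a for a in attrs
            if quality(objects, [b for b in attrs if b != a]) < full]


ALL = [0, 1, 2]
print(approximations(OBJECTS, ALL, "flu"))  # ({5}, {0, 1, 2, 5})
print(quality(OBJECTS, ALL))                # 0.5
print(quality(OBJECTS, ALL, beta=0.6))      # 1.0
print(core(OBJECTS, ALL))                   # [0, 1]
```

In this toy table half of the objects fall into the boundary region under the classical model, while a threshold of 0.6 assigns the mixed block to its majority class, which illustrates why the variable precision model is useful for data sets with large boundary regions. The constant attribute 2 is superfluous, so it does not belong to the core.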

5 Final Remarks

In the near future we plan to add new capabilities to our system, such as: incremental reduct generation, incremental rule generation, working with incomplete information tables, working with similarity relations for rough approximations, working with dominance relations for rough approximation of multicriteria classification problems, and working with dominance relations and pairwise comparison tables for rough approximation of multicriteria choice and ranking problems. These functionalities are based on recent research results of the team members.

The ROSE system and its predecessor RoughDAS have been applied to many real-life data sets. References to these applications are given, e.g., in [5]. The main fields of application include medicine, pharmacy, technical diagnostics, finance and management science, image and signal analysis, geology, and software project evaluation.


References

1. U.M. Fayyad, K.B. Irani. On the Handling of Continuous-Valued Attributes in Decision Tree Generation. Machine Learning, Vol. 8, 1992, 87–102.
2. J.W. Grzymala-Busse. LERS - a system for learning from examples based on rough sets. In R. Slowinski (ed.), Intelligent Decision Support. Kluwer Academic Publishers, 1992, 3–18.
3. R. Mienko, R. Slowinski, J. Stefanowski. Rule Classifier Based on Valued Closeness Relation: ROUGHCLASS version 2.0. ICS Research Report RA-95/002, Poznan University of Technology, Poznan, April 1995.
4. R. Mienko, J. Stefanowski, D. Vanderpooten. Discovery-Oriented Induction of Decision Rules. Cahier du Lamsade no. 141, Paris, Université de Paris Dauphine, septembre 1996.
5. R. Mienko, R. Slowinski, J. Stefanowski, R. Susmaga. RoughFamily - software implementation of rough set based data analysis and rule discovery techniques. In S. Tsumoto (ed.), Proceedings of the Fourth International Workshop on Rough Sets, Fuzzy Sets and Machine Discovery, Tokyo, Nov. 6-8, 1996, 437–440.
6. Z. Pawlak. Rough Sets. Theoretical Aspects of Reasoning About Data. Kluwer Academic Publishers, Dordrecht, 1991.
7. S. Romanski. Operation on families of sets for exhaustive search, given a monotonic function. In W. Beeri, C. Schmidt, N. Doyle (eds.), Proceedings of the 3rd Int. Conference on Data and Knowledge Bases, Jerusalem, 1988, 310–322.
8. A. Skowron, C. Rauszer. The discernibility matrices and functions in information systems. In R. Slowinski (ed.), Intelligent Decision Support. Handbook of Applications and Advances of the Rough Sets Theory. Kluwer Academic Publishers, 1992, 331–362.
9. R. Slowinski. Rough sets learning of preferential attitude in multi-criteria decision making. In J. Komorowski, Z.W. Ras (eds.), Proc. of Int. Symp. on Methodologies for Intelligent Systems, Springer Verlag, LNAI 689, 1993, 642–651.
10. R. Slowinski, J. Stefanowski. 'RoughDAS' and 'RoughClass' software implementations of the rough set approach. In R. Slowinski (ed.), Intelligent Decision Support. Handbook of Applications and Advances of the Rough Sets Theory. Kluwer Academic Publishers, 1992, 445–456.
11. R. Slowinski, J. Stefanowski. Rough classification with valued closeness relation. In E. Diday et al. (eds.), New Approaches in Classification and Data Analysis, Springer Verlag, 1993, 482–489.
12. R. Slowinski, J. Stefanowski. Rough set reasoning about uncertain data. Fundamenta Informaticae, 27 (2-3), 1996, 229–244.
13. J. Stefanowski. On rough set based approaches to induction of decision rules. In A. Skowron, L. Polkowski (eds.), Rough Set in Knowledge Discovery, 1998.
14. W. Ziarko. Analysis of Uncertain Information in the Framework of Variable Precision Rough Sets. Foundations of Computing and Decision Sciences, Vol. 18 (1993), No. 3-4, 381–396.
