Rough Sets in Decision Making

ROMAN SŁOWIŃSKI 1,2, SALVATORE GRECO 3, BENEDETTO MATARAZZO 3
1 Institute of Computing Science, Poznan University of Technology, Poznan, Poland
2 Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
3 Faculty of Economics, University of Catania, Catania, Italy
Article Outline

Glossary
Definition of the Subject
Introduction
Classical Rough Set Approach to Classification Problems of Taxonomy Type
Dominance-Based Rough Set Approach to Ordinal Classification Problems
DRSA on a Pairwise Comparison Table for Multiple Criteria Choice and Ranking Problems
DRSA for Decision Under Uncertainty
Multiple Criteria Decision Analysis Using Association Rules
Interactive Multiobjective Optimization Using DRSA (IMO-DRSA)
Conclusions
Future Directions
Bibliography
Glossary

Multiple attribute (or multiple criteria) decision support aims at giving the decision maker (DM) a recommendation concerning a set of objects A (also called alternatives, actions, acts, solutions, options, candidates, . . . ) evaluated from multiple points of view called attributes (also called features, variables, criteria, objectives, . . . ). The main categories of multiple attribute (or multiple criteria) decision problems are:
– classification, when the decision aims at assigning each object to one of predefined classes,
– choice, when the decision aims at selecting the best object,
– ranking, when the decision aims at ordering objects from the best to the worst.
Two kinds of classification problems are distinguished:
– taxonomy, when the value sets of attributes and the predefined classes are not preference ordered,
– ordinal classification (also known as multiple criteria sorting), when the value sets of attributes and the predefined classes are preference ordered.
Two kinds of choice problems are distinguished:
– discrete choice, when the set of objects is finite and reasonably small to be listed,
– multiple objective optimization, when the set of objects is infinite and defined by constraints of a mathematical program.
If value sets of attributes are preference ordered, they are called criteria or objectives; otherwise they keep the name of attributes.
Criterion is a real-valued function $g_i$ defined on A, reflecting the worth of objects from a particular point of view, such that in order to compare any two objects $a, b \in A$ from this point of view it is sufficient to compare the two values $g_i(a)$ and $g_i(b)$.
Dominance Object a is non-dominated in set A (Pareto-optimal) if and only if there is no other object b in A such that b is not worse than a on all considered criteria, and strictly better on at least one criterion.
Preference model is a representation of the value system of the decision maker on the set of objects with vector evaluations.
Decision under uncertainty takes into account consequences of decisions that distribute over multiple states of nature with given probability. The preference order, characteristic for data describing multiple attribute decision problems, concerns also decision under uncertainty, where the objects correspond to acts, attributes are outcomes (gain or loss) to be obtained with a given probability, and the problem consists in ordinal classification, choice, or ranking of the acts.
Rough set in universe U is an approximation of a set based on available information about objects of U. The rough approximation is composed of two ordinary sets called lower and upper approximation. The lower approximation is a maximal subset of objects which, according to the available information, certainly belong to the approximated set, and the upper approximation is a minimal subset of objects which, according to the available information, possibly belong to the approximated set. The difference between upper and lower approximation is called the boundary.
Decision rule is a logical statement of the type "if . . . , then . . . ", where the premise (condition part)
specifies values assumed by one or more condition attributes and the conclusion (decision part) specifies an overall judgment.

Definition of the Subject

Scientific analysis of decision problems aims at giving the decision maker (DM) a recommendation concerning a set of objects (also called alternatives, solutions, acts, actions, options, candidates, . . . ) evaluated from multiple points of view considered relevant for the problem at hand and called attributes (also called features, variables, criteria, objectives, . . . ). For example, a decision can concern:
1) diagnosis of pathologies for a set of patients, where patients are objects of the decision, and symptoms and results of medical tests are the attributes,
2) assignment of enterprises to classes of risk, where enterprises are objects of the decision, and financial ratio indices and other economic indicators, such as the market structure, the technology used by the enterprise and the quality of management, are the attributes,
3) selection of a car to be bought from among a given set of cars, where cars are objects of the decision, and maximum speed, acceleration, price, fuel consumption, comfort, color and so on, are the attributes,
4) ordering of students applying for a scholarship, where students are objects of the decision, and scores in different subjects are the attributes.
The following three main categories of decision problems are typically distinguished [44]:
– classification, when the decision aims at assigning each object to one of predefined classes,
– choice, when the decision aims at selecting the best object,
– ranking, when the decision aims at ordering objects from the best to the worst.
Looking at the above examples, one can say that 1) and 2) are classification problems, 3) is a choice problem and 4) is a ranking problem. The above categorization can be refined by distinguishing two kinds of classification problems: taxonomy, when the value sets of attributes and the predefined classes are not preference ordered, and ordinal classification (also known as multiple criteria sorting), when the value sets of attributes and the predefined classes are preference ordered [12]. In the above examples, 1) is a taxonomy problem and 2) is an ordinal classification problem. If value sets
of attributes are preference ordered, they are called criteria; otherwise they keep the name of attributes. For example, in a decision regarding the selection of a car, its price is a criterion because, obviously, a low price is better than a high price. Instead, the color of a car is not a criterion but simply an attribute, because red is not intrinsically better than green. One can imagine, however, that the color of a car could also become a criterion if, for example, a DM considered red better than green.

Introduction

Scientific support of decisions makes use of a more or less explicit model of the decision problem. The model relates the decision to the characteristics of the objects expressed by the considered attributes. Building such a model requires information about conditions and parameters of the aggregation of multi-attribute characteristics of objects. The nature of this information depends on the adopted methodology: prices and interest rates for cost-benefit analysis; cost coefficients in objectives and technological coefficients in constraints for mathematical programming; a training set of decision examples for neural networks and machine learning; substitution rates for a value function of multi-attribute utility theory; pairwise comparisons of objects in terms of intensity of preference for the analytic hierarchy process; attribute weights and several thresholds for ELECTRE methods; and so on (see the state-of-the-art survey [4]). This information has to be provided by the DM, possibly assisted by an analyst. Very often this information is not easily definable. For example, this is the case of the price of many immaterial goods and of the interest rates in cost-benefit analysis, or the case of the coefficients of objectives and constraints in mathematical programming models. Even if the required information is easily definable, like a training set of decision examples for neural networks, it is often processed in a way which is not clear to the DM, such that (s)he cannot see the exact relations between the provided information and the final recommendation. Consequently, the decision model is very often perceived by the DM as a black box whose result has to be accepted because the analyst's authority guarantees that the result is "right". In this context, the aspiration of the DM to find good reasons to make decisions is frustrated, and this raises the need for a more transparent methodology in which the relation between the original information and the final recommendation is clearly shown. Such a transparent methodology has been called a glass box [32]. Its typical representative uses a learning set of decision examples as the input preference information provided by the DM, and
it expresses the decision model in terms of a set of "if . . . , then . . . " decision rules induced from the input information. On one side, the decision rules are explicitly related to the original information and, on the other side, they give understandable justifications for the decisions to be made. For example, in the case of a medical diagnosis problem, the decision rule approach requires as input information a set of examples of previous diagnoses, from which some diagnostic rules are induced, such as "if there is symptom α and the test result is β, then there is pathology γ". Each such rule is directly related to examples of diagnoses in the input information where there is symptom α, test result β and pathology γ. Moreover, the DM can easily verify that in the input information there is no example of a diagnosis where there is symptom α and test result β but no pathology γ. The rules induced from the input information provided in terms of exemplary decisions represent a decision model which is transparent for the DM and enables him or her to understand the reasons for past decisions. The acceptance of the rules by the DM justifies, in turn, their use for future decisions.

The induction of rules from examples is a typical approach of artificial intelligence. This explains our interest in rough set theory [38,39], which has proved to be a useful tool for the analysis of vague descriptions of decision situations [41,48]. The rough set analysis aims at explaining the values of some decision attributes, playing the role of "dependent variables", by means of the values of condition attributes, playing the role of "independent variables". For example, in the above diagnostic context, data about the presence of a pathology are given by decision attributes, while data about symptoms and tests are given by condition attributes. An important advantage of the rough set approach is that it can deal with partly inconsistent examples, i.e. cases where the presence of different pathologies is associated with the presence of the same symptoms and test results. Moreover, it provides useful information about the role of particular attributes and their subsets, and prepares the ground for representation of knowledge hidden in the data by means of "if . . . , then . . . " decision rules.

The classical rough set approach proposed by Pawlak [38,39] cannot, however, deal with preference order in the value sets of condition and decision attributes. For this reason, the classical rough set approach can deal with only one of the four decision problems listed above: classification of taxonomy type. To deal with ordinal classification, choice and ranking, it is necessary to generalize the classical rough set approach so as to take into
account preference orders and monotonic relationships between condition and decision attributes. This generalization, called Dominance-based Rough Set Approach (DRSA), has been proposed by Greco, Matarazzo and Słowiński [12,14,15,18,21,49].

Classical Rough Set Approach to Classification Problems of Taxonomy Type

Information Table and Indiscernibility Relation

Information regarding classification examples is supplied in the form of an information table, whose separate rows refer to distinct objects, and whose columns refer to the different attributes considered. This means that each cell of this table indicates an evaluation (quantitative or qualitative) of the object placed in the corresponding row by means of the attribute in the corresponding column. Formally, an information table is the 4-tuple $S = \langle U, Q, V, v \rangle$, where U is a finite set of objects, called universe, $Q = \{q_1, \ldots, q_n\}$ is a finite set of attributes, $V_q$ is the value set of attribute q, $V = \bigcup_{q \in Q} V_q$, and $v : U \times Q \to V$ is a total function such that $v(x, q) \in V_q$ for each $q \in Q$, $x \in U$, called information function. Therefore, each object x of U is described by a vector (string) $Des_Q(x) = [v(x, q_1), \ldots, v(x, q_n)]$, called description of x in terms of the evaluations on the attributes from Q; it represents the available information about x. Obviously, $x \in U$ can be described in terms of any non-empty subset $P \subseteq Q$.

To every (non-empty) subset of attributes $P \subseteq Q$ there is associated an indiscernibility relation on U, denoted by $I_P$:
$$I_P = \{(x, y) \in U \times U : v(x, q) = v(y, q) \ \forall q \in P\}.$$
If $(x, y) \in I_P$, it is said that the objects x and y are P-indiscernible. Clearly, the indiscernibility relation thus defined is an equivalence relation (reflexive, symmetric and transitive). The family of all the equivalence classes of the relation $I_P$ is denoted by $U/I_P$, and the equivalence class containing object $x \in U$ by $I_P(x)$. The equivalence classes of the relation $I_P$ are called P-elementary sets.

Approximations

Let S be an information table, X a non-empty subset of U and $\emptyset \neq P \subseteq Q$. The P-lower approximation and the P-upper approximation of X in S are defined, respectively, as:
$$\underline{P}(X) = \{x \in U : I_P(x) \subseteq X\},$$
$$\overline{P}(X) = \{x \in U : I_P(x) \cap X \neq \emptyset\}.$$
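To make these definitions operational, here is a minimal Python sketch (ours, not part of the original article; the dictionary-based encoding of the information table and all function names are assumptions made for illustration). It is applied to the traffic-sign example later in this section.

```python
from itertools import groupby

def elementary_sets(table, P):
    """P-elementary sets: the equivalence classes of the indiscernibility relation I_P.

    table: dict mapping each object to a dict {attribute: value}
    P:     list of attribute names (a non-empty subset of Q)
    """
    key = lambda x: tuple(table[x][q] for q in P)
    return [set(group) for _, group in groupby(sorted(table, key=key), key=key)]

def lower_approx(table, P, X):
    """P-lower approximation: union of the P-elementary sets included in X."""
    return set().union(*[E for E in elementary_sets(table, P) if E <= X])

def upper_approx(table, P, X):
    """P-upper approximation: union of the P-elementary sets intersecting X."""
    return set().union(*[E for E in elementary_sets(table, P) if E & X])

def boundary(table, P, X):
    """P-boundary Bn_P(X): the doubtful region of X."""
    return upper_approx(table, P, X) - lower_approx(table, P, X)
```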
The elements of $\underline{P}(X)$ are all and only those objects $x \in U$ which belong to the equivalence classes generated by the indiscernibility relation $I_P$ that are contained in X; the elements of $\overline{P}(X)$ are all and only those objects $x \in U$ which belong to the equivalence classes generated by $I_P$ that contain at least one object belonging to X. In other words, $\underline{P}(X)$ is the largest union of the P-elementary sets included in X, while $\overline{P}(X)$ is the smallest union of the P-elementary sets containing X. The P-boundary of X in S, denoted by $Bn_P(X)$, is defined by
$$Bn_P(X) = \overline{P}(X) - \underline{P}(X).$$
The following inclusion relation holds:
$$\underline{P}(X) \subseteq X \subseteq \overline{P}(X).$$
Thus, in view of the information provided by P, if an object x belongs to $\underline{P}(X)$, then it certainly belongs to set X, while if x belongs to $\overline{P}(X)$, then it possibly belongs to set X. $Bn_P(X)$ constitutes the "doubtful region" of X: nothing can be said with certainty about the membership of its elements in set X using the subset of attributes P only. Moreover, the following complementarity relation is satisfied:
$$\underline{P}(X) = U - \overline{P}(U - X).$$
If the P-boundary of set X is empty, $Bn_P(X) = \emptyset$, then X is an ordinary (exact) set with respect to P, that is, it may be expressed as the union of a certain number of P-elementary sets; otherwise, if $Bn_P(X) \neq \emptyset$, set X is an approximate (rough) set with respect to P and may be characterized by means of the approximations $\underline{P}(X)$ and $\overline{P}(X)$. The family of all sets $X \subseteq U$ having the same P-lower and P-upper approximations is called the rough set.

The quality of the approximation of set X by means of the attributes from P is defined as
$$\gamma_P(X) = \frac{|\underline{P}(X)|}{|X|},$$
such that $0 \leq \gamma_P(X) \leq 1$. The quality $\gamma_P(X)$ represents the relative frequency of the objects correctly classified using the attributes from P.

The definition of approximations of a set $X \subseteq U$ can be extended to a classification, i.e. a partition $\mathbf{Y} = \{Y_1, \ldots, Y_m\}$ of U. The P-lower and P-upper approximations of $\mathbf{Y}$ in S are defined by the sets $\underline{P}(\mathbf{Y}) = \{\underline{P}(Y_1), \ldots, \underline{P}(Y_m)\}$ and $\overline{P}(\mathbf{Y}) = \{\overline{P}(Y_1), \ldots, \overline{P}(Y_m)\}$, respectively. The coefficient
$$\gamma_P(\mathbf{Y}) = \frac{\sum_{i=1}^{m} |\underline{P}(Y_i)|}{|U|}$$
is called the quality of the approximation of classification $\mathbf{Y}$ by the set of attributes P, or in short, the quality of classification. It expresses the ratio of all P-correctly classified objects to all objects in the system.

Dependence and Reduction of Attributes

An issue of great practical importance is the reduction of "superfluous" attributes in an information table. Superfluous attributes can be eliminated without deteriorating the information contained in the original table. Let $P \subseteq Q$ and $p \in P$. It is said that attribute p is superfluous in P with respect to classification $\mathbf{Y}$ if $\underline{P}(\mathbf{Y}) = \underline{(P - \{p\})}(\mathbf{Y})$; otherwise, p is indispensable in P. The subset of Q containing all the indispensable attributes is known as the core. Given classification $\mathbf{Y}$, any minimal (with respect to inclusion) subset $P \subseteq Q$ such that $\underline{P}(\mathbf{Y}) = \underline{Q}(\mathbf{Y})$ is called a reduct. It specifies a minimal subset P of Q which keeps the quality of classification at the same level as the whole set of attributes, i.e. $\gamma_P(\mathbf{Y}) = \gamma_Q(\mathbf{Y})$. In other words, the attributes that do not belong to a reduct are superfluous with respect to the classification $\mathbf{Y}$ of objects from U. More than one reduct may exist in an information table, and their intersection gives the core.

Decision Table and Decision Rules

In the information table describing examples of classification, the attributes of set Q are divided into condition attributes (set $C \neq \emptyset$) and decision attributes (set $D \neq \emptyset$), with $C \cup D = Q$ and $C \cap D = \emptyset$. Such an information table is called a decision table. The decision attributes induce a partition of U deduced from the indiscernibility relation $I_D$ in a way that is independent of the condition attributes. D-elementary sets are called decision classes, denoted by $Cl_t$, $t = 1, \ldots, m$. The partition of U into decision classes is called the classification $\mathbf{Cl} = \{Cl_1, \ldots, Cl_m\}$. There is a tendency to reduce the set C while keeping all important relationships between C and D, in order to make decisions on the basis of a smaller amount of information. When the set of condition attributes is replaced by one of its reducts, the quality of approximation of the classification induced by the decision attributes does not deteriorate.

Since the aim is to underline the functional dependencies between condition and decision attributes, a decision table may also be seen as a set of decision rules.
These are logical statements of the type "if . . . , then . . . ", where the premise (condition part) specifies values assumed by one or more condition attributes (a description of C-elementary sets) and the conclusion (decision part) specifies an assignment to one or more decision classes. Therefore, the syntax of a rule is the following:

"if $v(x, q_1) = r_{q_1}$ and $v(x, q_2) = r_{q_2}$ and . . . $v(x, q_p) = r_{q_p}$, then x belongs to decision class $Cl_{j_1}$ or $Cl_{j_2}$ or . . . $Cl_{j_k}$",

where $\{q_1, q_2, \ldots, q_p\} \subseteq C$, $(r_{q_1}, r_{q_2}, \ldots, r_{q_p}) \in V_{q_1} \times V_{q_2} \times \cdots \times V_{q_p}$, and $Cl_{j_1}, Cl_{j_2}, \ldots, Cl_{j_k}$ are some decision classes of the considered classification $\mathbf{Cl}$. If the consequence is univocal, i.e. $k = 1$, then the rule is univocal; otherwise it is approximate.

An object $x \in U$ supports decision rule r if its description matches both the condition part and the decision part of the rule. Decision rule r covers object x if x matches the condition part of the rule. Each decision rule is characterized by its strength, defined as the number of objects supporting the rule. In the case of approximate rules, the strength is calculated for each possible decision class separately. If a univocal rule is supported only by objects from the lower approximation of the corresponding decision class, then the rule is called certain or deterministic. If, however, a univocal rule is supported only by objects from the upper approximation of the corresponding decision class, then the rule is called possible or probabilistic. Approximate rules are supported, in turn, only by objects from the boundaries of the corresponding decision classes.

Procedures for the generation of decision rules from a decision table use an inductive learning principle. The objects are considered as examples of classification. In order to induce a decision rule with a univocal and certain conclusion about the assignment of an object to decision class X, the examples belonging to the C-lower approximation of X are called positive and all the others negative. Analogously, in the case of a possible rule, the examples belonging to the C-upper approximation of X are positive and all the others negative. Possible rules are characterized by a coefficient, called confidence, telling to what extent the rule is consistent, i.e. what is the ratio of the number of positive examples supporting the rule to the number of examples belonging to set X according to the decision attributes. Finally, in the case of an approximate rule, the examples belonging to the C-boundary of X are positive and all the others negative. A decision rule is called minimal if removing any attribute from the condition part gives a rule covering also negative objects. The existing induction algorithms use one of the following strategies [55]:
(a) generation of a minimal representation, i.e. a minimal set of rules covering all objects from a decision table,
(b) generation of an exhaustive representation, i.e. all rules for a given decision table,
(c) generation of a characteristic representation, i.e. a set of rules covering relatively many objects, though not necessarily all objects from a decision table.
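To illustrate strategy (b) restricted to certain rules, the following brute-force sketch (ours, not from the article; it is exponential in the number of attributes and thus suited only to toy tables) enumerates, for each positive example, the shortest condition parts that cover no negative example; by construction such rules are minimal. It reuses lower_approx from the sketch above.

```python
from itertools import combinations

def minimal_certain_rules(table, C, lower):
    """All minimal certain rules for one decision class.

    table: dict object -> {attribute: value}
    C:     list of condition attributes
    lower: C-lower approximation of the class (the positive examples)
    """
    rules = set()
    for x in lower:
        for k in range(1, len(C) + 1):
            found = []
            for attrs in combinations(C, k):
                cond = frozenset((q, table[x][q]) for q in attrs)
                covered = {y for y in table
                           if all(table[y][q] == v for q, v in cond)}
                if covered <= lower:        # certain: no negative example covered
                    found.append(cond)
            if found:   # the shortest certain condition parts are minimal rules
                rules.update(found)
                break
    return rules
```

Applied, for instance, to the lower approximation {b, c} of class I in Table 2 below, this returns the condition parts "PC = white", "S = circle and SC = red" and "PC = blue and SC = red", i.e. the class-I rules appearing in the two alternative rule sets shown later in the example.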
Explanation of the Classical Rough Set Approach by an Example

Suppose that one wants to describe the classification of basic traffic signs to a novice. There are three main classes of traffic signs corresponding to: Warning (W), Interdiction (I), Order (O). These classes may be distinguished by such attributes as the shape (S) and the principal color (PC) of the sign. Finally, one can consider a few examples of traffic signs, like those shown in Table 1. These are:
a) Sharp right turn,
b) Speed limit of 50 km/h,
c) No parking,
d) Go ahead.
The rough set approach is used here to build a model of classification of traffic signs into classes W, I, O on the basis of attributes S and PC. This is a typical problem of taxonomy. One can remark that the sets of signs indiscernible by "Class" are:

W = {a}_Class,  I = {b, c}_Class,  O = {d}_Class,
Rough Sets in Decision Making, Table 1
Examples of traffic signs described by S and PC

Traffic sign | Shape (S) | Primary color (PC) | Class
a)           | triangle  | yellow             | W
b)           | circle    | white              | I
c)           | circle    | blue               | I
d)           | circle    | blue               | O
and the sets of signs indiscernible by S and PC are as follows:

{a}_{S,PC},  {b}_{S,PC},  {c, d}_{S,PC}.
The above elementary sets are generated, on the one hand, by the decision attribute "Class" and, on the other hand, by the condition attributes S and PC. The elementary sets of signs indiscernible by "Class" are denoted by {·}_Class and those by S and PC are denoted by {·}_{S,PC}. Notice that W = {a}_Class is characterized precisely by {a}_{S,PC}. In order to characterize I = {b, c}_Class and O = {d}_Class, one needs {b}_{S,PC} and {c, d}_{S,PC}; however, only {b}_{S,PC} is included in I = {b, c}_Class, while {c, d}_{S,PC} has a non-empty intersection with both I = {b, c}_Class and O = {d}_Class. It follows from this characterization that, using condition attributes S and PC, one can characterize class W precisely, while classes I and O can only be characterized approximately:
– class W includes sign a certainly, and possibly no other sign than a,
– class I includes sign b certainly, and possibly signs b, c and d,
– class O includes no sign certainly, and possibly signs c and d.
The terms "certainly" and "possibly" refer to the absence or presence of ambiguity between the description of signs by S and PC on the one side, and by "Class" on the other side. In other words, using the description of signs by S and PC, one can say that all signs from elementary sets {·}_{S,PC} included in elementary sets {·}_Class belong certainly to the corresponding class, while all signs from elementary sets {·}_{S,PC} having a non-empty intersection with elementary sets {·}_Class belong to the corresponding class only possibly. The two sets of certain and possible signs are, respectively, the lower and upper approximation of the corresponding class by attributes S and PC:

lower_approx_{S,PC}(W) = {a},  upper_approx_{S,PC}(W) = {a},
lower_approx_{S,PC}(I) = {b},  upper_approx_{S,PC}(I) = {b, c, d},
lower_approx_{S,PC}(O) = ∅,   upper_approx_{S,PC}(O) = {c, d}.

The quality of approximation of the classification by attributes S and PC is equal to the number of all the signs in the lower approximations divided by the number of all the signs in the table, i.e. 1/2.
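Applying the sketch given earlier to the data of Table 1 reproduces these approximations and the quality 1/2 (the dictionary encoding of the table is, again, our own):

```python
# Table 1 encoded as a dictionary: description of the four signs by S and PC.
table1 = {
    "a": {"S": "triangle", "PC": "yellow", "Class": "W"},
    "b": {"S": "circle",   "PC": "white",  "Class": "I"},
    "c": {"S": "circle",   "PC": "blue",   "Class": "I"},
    "d": {"S": "circle",   "PC": "blue",   "Class": "O"},
}
P = ["S", "PC"]
classes = {cl: {x for x in table1 if table1[x]["Class"] == cl}
           for cl in ("W", "I", "O")}

for cl, X in classes.items():
    print(cl, lower_approx(table1, P, X), upper_approx(table1, P, X))
# W {'a'} {'a'}
# I {'b'} {'b', 'c', 'd'}   (set contents shown up to ordering)
# O set() {'c', 'd'}

quality = sum(len(lower_approx(table1, P, X)) for X in classes.values()) / len(table1)
print(quality)  # 0.5 -- the quality of approximation by S and PC
```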
Rough Sets in Decision Making, Table 2
Examples of traffic signs described by S, PC and SC

Traffic sign | Shape (S) | Primary color (PC) | Secondary color (SC) | Class
a)           | triangle  | yellow             | red                  | W
b)           | circle    | white              | red                  | I
c)           | circle    | blue               | red                  | I
d)           | circle    | blue               | white                | O
One way to increase the quality of the approximation is to add a new attribute so as to decrease the ambiguity. Let us introduce the secondary color (SC) as a new condition attribute. The new situation is shown in Table 2. As one can see, the sets of signs indiscernible by S, PC and SC, i.e. the elementary sets {·}_{S,PC,SC}, are now:

{a}_{S,PC,SC},  {b}_{S,PC,SC},  {c}_{S,PC,SC},  {d}_{S,PC,SC}.
It is worth noting that the elementary sets are finer than before, and this enables the ambiguity to be eliminated. Consequently, the quality of approximation of the classification by attributes S, PC and SC is now equal to 1. A natural question occurring here is to ask if, indeed, all three attributes are necessary to characterize precisely the classes W, I and O. When attribute S or attribute PC is eliminated from the description of the signs, the elementary sets {·}_{PC,SC} or {·}_{S,SC} are defined, respectively, as follows:

{a}_{PC,SC},  {b}_{PC,SC},  {c}_{PC,SC},  {d}_{PC,SC};
{a}_{S,SC},  {b, c}_{S,SC},  {d}_{S,SC}.
Using any one of the above elementary sets, it is possible to characterize (approximate) classes W, I and O with the same quality (equal to 1) as when using the elementary sets {·}_{S,PC,SC} (i.e. those generated by the complete set of three condition attributes). Thus, the answer to the above question is that the three condition attributes are not all necessary to characterize precisely the classes W, I and O. It is, in fact, sufficient to use either PC and SC, or S and SC. The subsets of condition attributes {PC, SC} and {S, SC} are called reducts of {S, PC, SC} because they have
this property. Note that the identification of reducts enables us to reduce the attributes describing the signs in the table to only those which are relevant. Other useful information can be generated from the identification of reducts by taking their intersection. This is called the core. In our example, the core contains attribute SC. This tells us that SC is an indispensable attribute, i.e. it cannot be eliminated from the description of the signs without decreasing the quality of the approximation. Note that the other attributes from the reducts (i.e. S and PC) are exchangeable. If there happened to be some other attributes which were not included in any reduct, then they would be superfluous, i.e. they would not be useful at all in the characterization of the classes W, I and O.

If, however, column S or PC is eliminated from Table 2, then the resulting table is not a minimal representation of knowledge about the classification of the four traffic signs. Note that, in order to characterize class W in Table 2, it is sufficient to use the condition "S = triangle". Moreover, class I is characterized by two conditions ("S = circle" and "SC = red") and class O is characterized by the condition "SC = white". Thus, the minimal representation of this information system requires only four conditions (rather than the eight conditions that are present in Table 2 with either column S or PC eliminated). This representation corresponds to the following set of decision rules, which may be seen as a classification model discovered in the data set contained in Table 2 (in the braces there are symbols of signs covered by the corresponding rule):

rule #1: if S = triangle, then Class = W   {a}
rule #2: if S = circle and SC = red, then Class = I   {b, c}
rule #3: if SC = white, then Class = O   {d}

This is not the only representation, because an alternative set of rules is:

rule #1′: if PC = yellow, then Class = W   {a}
rule #2′: if PC = white, then Class = I   {b}
rule #3′: if PC = blue and SC = red, then Class = I   {c}
rule #4′: if SC = white, then Class = O   {d}
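The reducts and the core identified above can be verified by brute force over attribute subsets, as in this sketch (ours; it reuses lower_approx and the table1 and classes objects from the previous snippets, and builds Table 2 by appending the SC column):

```python
from itertools import combinations

def quality_of_classification(table, P, classes):
    """gamma_P(Y): ratio of P-correctly classified objects to all objects."""
    return sum(len(lower_approx(table, P, X)) for X in classes.values()) / len(table)

def reducts(table, C, classes):
    """Minimal subsets of C that preserve the quality of classification."""
    full = quality_of_classification(table, C, classes)
    candidates = [set(p) for k in range(1, len(C) + 1)
                  for p in combinations(C, k)
                  if quality_of_classification(table, list(p), classes) == full]
    return [P for P in candidates if not any(R < P for R in candidates)]

# Table 2 = Table 1 plus the secondary color (SC) of each sign.
table2 = {x: dict(row, SC=sc)
          for (x, row), sc in zip(table1.items(), ["red", "red", "red", "white"])}
print(reducts(table2, ["S", "PC", "SC"], classes))
# two reducts, {'PC', 'SC'} and {'S', 'SC'}; the core is their intersection {'SC'}
```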
It is interesting to come back to Table 1 and to ask what decision rules represent this information system. As the description of the four signs by S and PC is not sufficient to characterize exactly all the classes, it is not surprising that not all the rules will have an unambiguous decision.
Indeed, the following decision rules can be induced:

rule #1″: if S = triangle, then Class = W   {a}
rule #2″: if PC = white, then Class = I   {b}
rule #3″: if PC = blue, then Class = I or O   {c, d}
Note that these rules can be induced from the lower approximations of classes W and I, and from the set called the boundary of both I and O. Indeed, for exact rule #1″, the supporting example is in lower_approx_{S,PC}(W) = {a}; for exact rule #2″ it is in lower_approx_{S,PC}(I) = {b}; and the supporting examples for approximate rule #3″ are in the boundary of classes I and O, defined as:

boundary_{S,PC}(I) = upper_approx_{S,PC}(I) - lower_approx_{S,PC}(I) = {c, d},
boundary_{S,PC}(O) = upper_approx_{S,PC}(O) - lower_approx_{S,PC}(O) = {c, d}.

As a result of the approximate characterization of classes W, I and O by S and PC, an approximate representation in terms of decision rules is obtained. Since the quality of the approximation is 1/2, exact rules (#1″ and #2″) cover one half of the examples, and the other half is covered by the approximate rule (#3″). Recall that the quality of approximation by S and SC, or by PC and SC, was equal to 1, so there all examples were covered by exact rules (#1 to #3, or #1′ to #4′, respectively).

One can see, from this simple example, that the rough set analysis of data included in an information system provides some useful information. In particular, the following results are obtained:
– A characterization of decision classes in terms of chosen condition attributes through lower and upper approximation.
– A measure of the quality of approximation, which indicates how good the chosen set of attributes is for approximation of the classification.
– The reducts of condition attributes including all relevant attributes. At the same time, superfluous and exchangeable attributes are also identified.
– The core composed of indispensable attributes.
– A set of decision rules induced from the lower and upper approximations of the decision classes. This constitutes a classification model for a given information system.
Dominance-Based Rough Set Approach to Ordinal Classification Problems

Dominance-Based Rough Set Approach (DRSA)

The Dominance-based Rough Set Approach (DRSA) has been proposed by the authors to handle background knowledge about ordinal evaluations of objects from a universe, and about monotonic relationships between these evaluations, e.g. "the larger the mass and the smaller the distance, the larger the gravity" or "the greater the debt of a firm, the greater its risk of failure". Such knowledge is typical for data describing various phenomena. It is also characteristic for data concerning multiple criteria decision or decision under uncertainty, where the order of value sets of condition and decision attributes corresponds to increasing or decreasing preference. In the case of multiple criteria decision, the condition attributes are called criteria.

Let us consider a decision table including a finite universe of objects (solutions, alternatives, actions) U evaluated on a finite set of criteria $F = \{f_1, \ldots, f_n\}$, and on a single decision attribute d. The set of the indices of criteria is denoted by $I = \{1, \ldots, n\}$. Without loss of generality, $f_i : U \to \mathbb{R}$ for each $i = 1, \ldots, n$, and, for all objects $x, y \in U$, $f_i(x) \geq f_i(y)$ means that "x is at least as good as y with respect to criterion $f_i$", which is denoted by $x \succeq_i y$. Therefore, it is supposed that $\succeq_i$ is a complete preorder, i.e. a strongly complete and transitive binary relation, defined on U on the basis of the evaluations $f_i(\cdot)$. Furthermore, decision attribute d makes a partition of U into a finite number of decision classes, $\mathbf{Cl} = \{Cl_1, \ldots, Cl_m\}$, such that each $x \in U$ belongs to one and only one class $Cl_t$, $t = 1, \ldots, m$. It is assumed that the classes are preference ordered, i.e. for all $r, s = 1, \ldots, m$ such that $r > s$, the objects from $Cl_r$ are preferred to the objects from $Cl_s$. More formally, if $\succeq$ is a comprehensive weak preference relation on U, i.e. if for all $x, y \in U$, $x \succeq y$ reads "x is at least as good as y", then it is supposed that
$$[x \in Cl_r, \ y \in Cl_s, \ r > s] \Rightarrow x \succ y,$$
where $x \succ y$ means $x \succeq y$ and not $y \succeq x$. The above assumptions are typical for consideration of an ordinal classification (or multiple criteria sorting) problem. Indeed, the decision table characterized above includes examples of ordinal classification which constitute an input preference information to be analyzed using DRSA.

The sets to be approximated are called the upward union and downward union of decision classes, respectively:
$$Cl_t^{\geq} = \bigcup_{s \geq t} Cl_s, \qquad Cl_t^{\leq} = \bigcup_{s \leq t} Cl_s, \qquad t = 1, \ldots, m.$$
The statement $x \in Cl_t^{\geq}$ reads "x belongs to at least class $Cl_t$", while $x \in Cl_t^{\leq}$ reads "x belongs to at most class $Cl_t$". Let us remark that $Cl_1^{\geq} = Cl_m^{\leq} = U$, $Cl_m^{\geq} = Cl_m$ and $Cl_1^{\leq} = Cl_1$. Furthermore, for $t = 2, \ldots, m$,
$$Cl_{t-1}^{\leq} = U - Cl_t^{\geq} \quad \text{and} \quad Cl_t^{\geq} = U - Cl_{t-1}^{\leq}.$$
The key idea of DRSA is the representation (approximation) of the upward and downward unions of decision classes by granules of knowledge generated by criteria. These granules are dominance cones in the criteria values space. x dominates y with respect to a set of criteria $P \subseteq I$ (shortly, x P-dominates y), denoted by $x D_P y$, if for every criterion $i \in P$, $f_i(x) \geq f_i(y)$. The relation of P-dominance is reflexive and transitive, i.e. it is a partial preorder. Given a set of criteria $P \subseteq I$ and $x \in U$, the granules of knowledge used for approximation in DRSA are:
– a set of objects dominating x, called the P-dominating set, $D_P^+(x) = \{y \in U : y D_P x\}$,
– a set of objects dominated by x, called the P-dominated set, $D_P^-(x) = \{y \in U : x D_P y\}$.
Let us recall that the dominance principle requires that an object x dominating object y on all considered criteria (i.e. x having evaluations at least as good as y on all considered criteria) should also dominate y on the decision (i.e. x should be assigned to at least as good a decision class as y). Objects satisfying the dominance principle are called consistent, and those violating the dominance principle are called inconsistent.

The P-lower approximation of $Cl_t^{\geq}$, denoted by $\underline{P}(Cl_t^{\geq})$, and the P-upper approximation of $Cl_t^{\geq}$, denoted by $\overline{P}(Cl_t^{\geq})$, are defined as follows ($t = 2, \ldots, m$):
$$\underline{P}(Cl_t^{\geq}) = \{x \in U : D_P^+(x) \subseteq Cl_t^{\geq}\},$$
$$\overline{P}(Cl_t^{\geq}) = \{x \in U : D_P^-(x) \cap Cl_t^{\geq} \neq \emptyset\}.$$
Analogously, one can define the P-lower approximation and the P-upper approximation of $Cl_t^{\leq}$ as follows ($t = 1, \ldots, m-1$):
$$\underline{P}(Cl_t^{\leq}) = \{x \in U : D_P^-(x) \subseteq Cl_t^{\leq}\},$$
$$\overline{P}(Cl_t^{\leq}) = \{x \in U : D_P^+(x) \cap Cl_t^{\leq} \neq \emptyset\}.$$
The P-lower and P-upper approximations so defined satisfy the following inclusion properties, for all $P \subseteq I$:
$$\underline{P}(Cl_t^{\geq}) \subseteq Cl_t^{\geq} \subseteq \overline{P}(Cl_t^{\geq}), \quad t = 2, \ldots, m,$$
$$\underline{P}(Cl_t^{\leq}) \subseteq Cl_t^{\leq} \subseteq \overline{P}(Cl_t^{\leq}), \quad t = 1, \ldots, m-1.$$
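Continuing the Python sketches from the classical case, here is a minimal rendering of the dominance cones and the approximations of upward unions (ours; `table` is assumed to map each object to a dict of numeric criterion values, and `cls` to map each object to its class index in 1..m):

```python
def dominating(table, P, x):
    """P-dominating set D_P^+(x): objects y with f_i(y) >= f_i(x) for every i in P."""
    return {y for y in table if all(table[y][i] >= table[x][i] for i in P)}

def dominated(table, P, x):
    """P-dominated set D_P^-(x): objects y with f_i(x) >= f_i(y) for every i in P."""
    return {y for y in table if all(table[x][i] >= table[y][i] for i in P)}

def lower_upward(table, P, cls, t):
    """P-lower approximation of the upward union Cl_t^>= (classes t, t+1, ..., m)."""
    union = {x for x in table if cls[x] >= t}
    return {x for x in table if dominating(table, P, x) <= union}

def upper_upward(table, P, cls, t):
    """P-upper approximation of Cl_t^>=."""
    union = {x for x in table if cls[x] >= t}
    return {x for x in table if dominated(table, P, x) & union}
```

The approximations of the downward unions $Cl_t^{\leq}$ are obtained symmetrically, by exchanging the roles of the dominating and the dominated sets.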
The P-lower and P-upper approximations of $Cl_t^{\geq}$ and $Cl_t^{\leq}$ have an important complementarity property, according to which
$$\underline{P}(Cl_t^{\geq}) = U - \overline{P}(Cl_{t-1}^{\leq}) \quad \text{and} \quad \overline{P}(Cl_t^{\geq}) = U - \underline{P}(Cl_{t-1}^{\leq}), \quad t = 2, \ldots, m,$$
$$\underline{P}(Cl_t^{\leq}) = U - \overline{P}(Cl_{t+1}^{\geq}) \quad \text{and} \quad \overline{P}(Cl_t^{\leq}) = U - \underline{P}(Cl_{t+1}^{\geq}), \quad t = 1, \ldots, m-1.$$
The P-boundaries of $Cl_t^{\geq}$ and $Cl_t^{\leq}$, denoted by $Bn_P(Cl_t^{\geq})$ and $Bn_P(Cl_t^{\leq})$, respectively, are defined as follows:
$$Bn_P(Cl_t^{\geq}) = \overline{P}(Cl_t^{\geq}) - \underline{P}(Cl_t^{\geq}), \quad t = 2, \ldots, m,$$
$$Bn_P(Cl_t^{\leq}) = \overline{P}(Cl_t^{\leq}) - \underline{P}(Cl_t^{\leq}), \quad t = 1, \ldots, m-1.$$
Due to the above complementarity property, $Bn_P(Cl_t^{\geq}) = Bn_P(Cl_{t-1}^{\leq})$, for $t = 2, \ldots, m$.

For every $P \subseteq I$, the quality of approximation of the ordinal classification $\mathbf{Cl}$ by a set of criteria P is defined as the ratio of the number of objects P-consistent with the dominance principle to the number of all the objects in U. Since the P-consistent objects are those which do not belong to any P-boundary $Bn_P(Cl_t^{\geq})$, $t = 2, \ldots, m$, or $Bn_P(Cl_t^{\leq})$, $t = 1, \ldots, m-1$, the quality of approximation of the ordinal classification $\mathbf{Cl}$ by a set of criteria P can be written as
$$\gamma_P(\mathbf{Cl}) = \frac{\left|U - \bigcup_{t=2,\ldots,m} Bn_P(Cl_t^{\geq})\right|}{|U|} = \frac{\left|U - \bigcup_{t=1,\ldots,m-1} Bn_P(Cl_t^{\leq})\right|}{|U|}.$$
$\gamma_P(\mathbf{Cl})$ can be seen as a degree of consistency of the objects from U, when P is the set of criteria and $\mathbf{Cl}$ is the considered ordinal classification.

Each minimal (with respect to inclusion) subset $P \subseteq I$ such that $\gamma_P(\mathbf{Cl}) = \gamma_I(\mathbf{Cl})$ is called a reduct of $\mathbf{Cl}$, and is denoted by $RED_{\mathbf{Cl}}$. Let us remark that for a given set U one can have more than one reduct. The intersection of all reducts is called the core, and is denoted by $CORE_{\mathbf{Cl}}$. Criteria in $CORE_{\mathbf{Cl}}$ cannot be removed from consideration without deteriorating the quality of approximation. This means that, among the considered criteria, there are three categories:
– indispensable criteria, included in the core,
– exchangeable criteria, included in some reducts but not in the core,
– redundant criteria, neither indispensable nor exchangeable, and thus not included in any reduct.
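Under the same assumed encoding as before, the boundaries and the consistency measure $\gamma_P(\mathbf{Cl})$ can be sketched as follows (reusing the upward-union functions above):

```python
def boundary_upward(table, P, cls, t):
    """Bn_P(Cl_t^>=): upper minus lower approximation of the upward union."""
    return upper_upward(table, P, cls, t) - lower_upward(table, P, cls, t)

def quality_drsa(table, P, cls):
    """gamma_P(Cl): fraction of objects lying in no boundary Bn_P(Cl_t^>=), t = 2..m."""
    m = max(cls.values())
    inconsistent = set()
    for t in range(2, m + 1):
        inconsistent |= boundary_upward(table, P, cls, t)
    return (len(table) - len(inconsistent)) / len(table)
```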
The dominance-based rough approximations of upward and downward unions of decision classes can serve to induce a generalized description of objects in terms of “if . . . , then . . . ” decision rules. For a given upward or downward union of classes, Cl t or Cls , the decision rules induced under
a hypothesis that objects belonging to P(Cl t ) or P Cls are positive examples, and all the others are negative, suggest a certain assignment to “class Cl t or better”, or to “class Cls or worse”, respectively. On the other hand, the decision rules under induced
a hypothesis that objects belonging to P Cl t or P Cl s are positive examples, and all the others are negative, suggest a possible assignment to “class Cl t or better”, or to “class Cls or worse”, respectively. Finally, the decision rules induced under a hypothesis that objects belonging to the intersection P(Cls ) \ P(Cl t ) are positive examples, and all the others are negative, suggest an approximate assignment to some classes between Cls and Cl t (s < t). In the case of preference ordered description of objects, set U is composed of examples of ordinal classification. Then, it is meaningful to consider the following five types of decision rules: 1) certain D -decision rules, providing lower profile descriptions for objects belonging to P(Cl t ): if f i 1 (x) r i 1 and . . . and f i p (x) r i p , then x 2 Cl t ; fi1 ; : : : ; i p g I; t D 2; : : : ; m; r i 1 ; : : : ; r i p 2