[In]: S.H. Zanakis, G. Doukidis and C. Zopounidis (eds.), Decision Making: Recent Developments and Worldwide Applications. Kluwer Academic Publishers, Dordrecht/Boston/London, 2000, 295-316

DEALING WITH MISSING DATA IN ROUGH SET ANALYSIS OF MULTI-ATTRIBUTE AND MULTI-CRITERIA DECISION PROBLEMS

S. Greco (1), B. Matarazzo (1), R. Slowinski (2)

(1) Faculty of Economics, University of Catania, 95129 Catania, Italy

(2) Institute of Computing Science, Poznan University of Technology, 60965 Poznan, Poland

Abstract: Rough sets methodology is a useful tool for analysis of decision problems concerning a set of objects described in a data table by a set of condition attributes and by a set of decision attributes. In practical applications, however, the data table is often not complete because some data are missing. To deal with this case, we propose an extension of the rough set methodology to the analysis of incomplete data tables. The adaptation concerns both the classical rough set approach based on the use of indiscernibility relations and the new rough set approach based on the use of dominance relations. While the first approach deals with the multi-attribute classification problem, the second approach deals with the multi-criteria sorting problem. In the latter, condition attributes have preference-ordered scales, and thus are called criteria, and the classes defined by the decision attributes are also preference-ordered. The adapted relations of indiscernibility or dominance between a pair of objects are considered as directional statements where a subject is compared to a referent object. We require that the referent object has no missing data. The two adapted rough set approaches boil down to the original approaches when there are no missing data. The rules induced from the newly defined rough approximations are either exact or approximate, depending on whether they are supported by consistent objects or not, and they are robust in the sense that each rule is supported by at least one object with no missing data on the condition attributes or criteria represented in the rule.

Keywords: Rough sets methodology, Missing data, Decision analysis, Multi-attribute classification, Multi-criteria sorting, Decision rules.

1. Introduction

The rough sets philosophy introduced by Pawlak (1982) is based on the assumption that with every object of the universe there is associated a certain amount of information (data, knowledge), expressed by means of some attributes used for object description. It has proved to be an excellent tool for analysis of decision problems (Pawlak and Slowinski, 1994) where the set of attributes is divided into disjoint sets of condition attributes and decision attributes describing objects in a data table. The key idea of rough sets is approximation of knowledge expressed by decision attributes using knowledge expressed by condition attributes. The rough set approach answers several questions related to the approximation: (a) is the information contained in the data table consistent? (b) what are the non-redundant subsets of condition attributes ensuring the same quality of approximation as the whole set of condition attributes? (c) what are the condition attributes which cannot be eliminated from the approximation without decreasing the quality of approximation? (d) what minimal "if ..., then ..." decision rules can be induced from the approximations?

Some important characteristics of the rough set approach make it a particularly interesting tool in a number of problems and concrete applications (Pawlak, 1991). With respect to the input information, it is possible to deal with both quantitative and qualitative data, and inconsistencies need not be removed prior to the analysis. The rough set approach deals with inconsistencies by separation of certain and doubtful knowledge extracted from the data table. With reference to the output information, it is possible to acquire a posteriori information regarding the relevance of particular attributes or subsets of attributes to the quality of approximation considered in the problem at hand. Moreover, the final result in the form of "if ..., then ..." decision rules, using the most relevant attributes, is easy to interpret.

The original rough set approach is not able, however, to discover and process inconsistencies coming from consideration of criteria, i.e. condition attributes with preference-ordered scales (domains), like product quality, market share, debt ratio. Regular condition attributes, e.g. symptoms, colors, textural features, traditionally considered in the rough set methodology, are different from criteria because their scales are not preference-ordered (see, e.g., Pawlak, 1991; Pawlak and Slowinski, 1994). Consider, for example, two firms, A and B, evaluated for assessment of bankruptcy risk by a set of criteria including the "debt ratio" (total debt/total assets). If firm A has a low value while firm B has a high value of the debt ratio, and evaluations of these firms on other attributes are equal, then, from the bankruptcy risk point of view, firm A dominates firm B. Suppose, however, that firm A has been assigned by a decision maker to a class of higher risk than firm B. This is obviously inconsistent with the dominance principle. Within the classical rough set approach, the two firms will be considered as just discernible and no inconsistency will be stated.

For this reason, Greco, Matarazzo and Slowinski (1998a, 1998b, 1999a) have proposed a new rough set approach that is able to deal with inconsistencies typical of consideration of criteria and preference-ordered decision classes. This innovation is mainly based on substitution of the indiscernibility relation by a dominance relation in the rough approximation of decision classes. An important consequence of this fact is the possibility of inferring from exemplary decisions the preference model in terms of decision rules being logical statements of the type "if ..., then ...". The separation of certain and doubtful knowledge about the decision maker's preferences is done by distinction of different kinds of decision rules, depending on whether they are induced from lower approximations of decision classes or from the boundaries of these classes composed of inconsistent examples that do not observe the dominance principle. Such a preference model is more general than the classical functional or relational model in multi-criteria decision making and it is more understandable for the users because of its natural syntax.

Both the classical rough set approach based on the use of indiscernibility relations and the new rough set approach based on the use of dominance relations suffer, however, from another deficiency: they require the data table to be complete, i.e. without missing values on condition attributes or criteria describing the objects. To deal with the case of missing values in the data table, we propose an adaptation of the rough set methodology. The adaptation concerns both the classical rough set approach and the dominance-based rough set approach. While the first approach deals with multi-attribute classification, the second approach deals with multi-criteria sorting. Multi-attribute classification concerns an assignment of a set of objects to a set of pre-defined classes. The objects are described by a set of (regular) attributes and the classes are not necessarily ordered. Multi-criteria sorting concerns a set of objects evaluated by criteria, i.e. attributes with preference-ordered scales. In this problem, the classes are also preference-ordered.

The adapted relations of indiscernibility or dominance between a pair of objects are considered as directional statements where a subject is compared to a referent object. We require that the referent object has no missing values. The two adapted rough set approaches maintain all good characteristics of their original approaches. They also boil down to the original approaches when there are no missing values. The rules induced from the rough approximations defined according to the adapted relations satisfy some suitable properties: they are either exact or approximate, depending on whether they are supported by consistent objects or not, and they are robust in the sense that each rule is supported by at least one object with no missing value on the condition attributes or criteria represented in the rule.

The paper extends the short version presented by the authors in (Greco, Matarazzo and Slowinski, 1999c). It is organized in the following way. In section 2, we present the extended rough sets methodology handling the missing values. It is composed of four paragraphs: the first two are devoted to adaptation of the classical rough set approach based on the use of indiscernibility relations; the other two undertake the adaptation of the new rough set approach based on the use of dominance relations. In order to illustrate the concepts introduced in section 2, we present an example in section 3. Section 4 groups conclusions.

2. Rough approximations defined on data tables with missing values

For algorithmic reasons, the data set about objects is represented in the form of a data table. The rows of the table are labeled by objects, whereas columns are labeled by attributes and entries of the table are attribute-values, called descriptors.

Formally, by a data table we understand the 4-tuple $S=\langle U, Q, V, f\rangle$, where U is a finite set of objects, Q is a finite set of attributes, $V=\bigcup_{q\in Q} V_q$ and $V_q$ is a domain of the attribute q,
and f: U×Q→V is a total function such that f(x,q)∈Vq∪{∗} for every q∈Q, x∈U, called an information function (Pawlak, 1991). The symbol “∗” indicates that the value of an attribute for a given object is unknown (missing). If set Q is divided into set C of condition attributes and set D of decision attributes, then such a data table is called decision table. If the domain (scale) of a condition attribute is ordered according to a decreasing or increasing preference, then it is a criterion. For condition attribute q∈C being a criterion, Sq is an outranking relation (Roy, 1985) on U such that xSqy means “x is at least as good as y with respect to criterion q”. We suppose that Sq is a total preorder, i.e. a strongly complete and transitive binary relation, defined on U on the basis of evaluations f(⋅,q). The domains of “regular” condition attributes are not ordered. We assume that the set D of decision attributes is a singleton {d}. Decision attribute d makes a partition of U into a finite number of classes Cl={Clt, t∈T}, T={1,...,n}, such that each x∈U belongs to one and only one Clt∈Cl. The domain of d can be preference-ordered or not. In the former case, we suppose that the classes are ordered such that the higher is the class number the better is the class, i.e. for all r,s∈T, such that r>s, the objects from Clr are preferred (strictly or weakly) to the objects from Cls. More formally, if S is a comprehensive outranking relation on U, i.e. if for all x,y∈U, xSy means “x is at least as good as y”, we suppose: [x∈Clr, y∈Cls, r>s] ⇒ [xSy and not ySx]. These assumptions are typical for consideration of a multi-criteria sorting problem. In the case of non-ordered classes and regular condition attributes, the corresponding problem is that of multi-attribute classification; it is considered by the classical rough set approach. 
In the following paragraphs of this section we consider separately multi-attribute classification and multi-criteria sorting with respect to the problem of missing values. The first idea of dealing with missing values in the rough set approach to the multi-attribute classification problem was given in (Greco, Matarazzo, Slowinski and Zanakis 1999b).

2.1. Multi-attribute classification problem with missing values

For any two objects x,y∈U, we consider a directional comparison of y to x; object y is called the subject and object x the referent. We say that subject y is indiscernible with referent x with respect to condition attributes P⊆C (denotation $y I_P x$) if for every q∈P the following two conditions are met: (i) f(x,q)≠∗, and (ii) f(x,q)=f(y,q) or f(y,q)=∗. The above means that the referent object considered for indiscernibility with respect to P should have no missing values on attributes from set P.
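The directional comparison just defined can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the dictionary-based table layout, the use of `None` for the symbol ∗, the toy data and the function name `indiscernible` are all introduced here for illustration.

```python
MISSING = None  # assumption: None plays the role of the symbol "*"


def indiscernible(table, y, x, P):
    """Return True if subject y is indiscernible with referent x on P (y I_P x)."""
    for q in P:
        fx, fy = table[x][q], table[y][q]
        if fx is MISSING:
            # The referent must have no missing value on any attribute of P.
            return False
        if fy is not MISSING and fy != fx:
            # A known value of the subject must equal the referent's value.
            return False
    return True


# Hypothetical toy table: object -> {attribute: value}
table = {
    "x1": {"a": 1, "b": 0},
    "x2": {"a": 1, "b": MISSING},
    "x3": {"a": 2, "b": 0},
}
P = ["a", "b"]

print(indiscernible(table, "x2", "x1", P))  # subject x2, referent x1
print(indiscernible(table, "x1", "x2", P))  # subject x1, referent x2
```

In this toy table, x2 is indiscernible with referent x1, while the reverse comparison fails because x2 has a missing value on b and therefore cannot act as a referent.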

The binary relation $I_P$ is not necessarily reflexive because for some x∈U there may exist q∈P for which f(x,q)=∗ and, therefore, we cannot state $x I_P x$. Moreover, $I_P$ is also not necessarily symmetric because the statement $y I_P x$ cannot be inverted if there exists q∈P for which f(y,q)=∗. However, $I_P$ is transitive because for each x,y,z∈U, the statements $x I_P y$ and $y I_P z$ imply $x I_P z$. This is justified by the observation that object z can substitute object y in the statement $x I_P y$ because $y I_P z$ and both y and z, as referent objects, have no missing values.

For each P⊆C let us define a set of objects having no missing values with respect to P: $U_P=\{x\in U: f(x,q)\neq\ast \text{ for each } q\in P\}$. It is easy to see that the restriction of $I_P$ to $U_P$ (in other words, the binary relation $I_P\cap U_P\times U_P$ defined on $U_P$) is reflexive, symmetric and transitive, i.e. it is an equivalence binary relation.

For each x∈U and for each P⊆Q let $I_P(x)=\{y\in U: y I_P x\}$ denote the class of objects indiscernible with x. Given X⊆U and P⊆Q, we define the lower approximation of X with respect to P as

$$\underline{I}_P(X)=\{x\in U_P: I_P(x)\subseteq X\}. \qquad (1)$$

Analogously, we define the upper approximation of X with respect to P as

$$\overline{I}_P(X)=\{x\in U_P: I_P(x)\cap X\neq\emptyset\}. \qquad (2)$$

Let us observe that if $x\notin U_P$ then $I_P(x)=\emptyset$ and, therefore, we can also write $\overline{I}_P(X)=\{x\in U: I_P(x)\cap X\neq\emptyset\}$.
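Definitions (1) and (2) can be tried out on a small example. The sketch below is only an illustration under assumptions made here: missing values are encoded as `None`, the table is a plain dictionary, and the data and helper names (`granule`, `complete_objects`, `lower`, `upper`) are hypothetical.

```python
MISSING = None  # assumption: None stands for the symbol "*"


def indiscernible(table, y, x, P):
    """y I_P x: referent x is complete on P; y agrees with x or is missing."""
    return all(
        table[x][q] is not MISSING and table[y][q] in (MISSING, table[x][q])
        for q in P
    )


def granule(table, x, P):
    """I_P(x): all subjects y indiscernible with referent x."""
    return {y for y in table if indiscernible(table, y, x, P)}


def complete_objects(table, P):
    """U_P: objects with no missing value on the attributes of P."""
    return {x for x in table if all(table[x][q] is not MISSING for q in P)}


def lower(table, X, P):
    """Lower approximation, definition (1)."""
    return {x for x in complete_objects(table, P) if granule(table, x, P) <= X}


def upper(table, X, P):
    """Upper approximation, definition (2)."""
    return {x for x in complete_objects(table, P) if granule(table, x, P) & X}


# Hypothetical toy data
table = {
    "x1": {"a": 1, "b": 0},
    "x2": {"a": 1, "b": MISSING},
    "x3": {"a": 2, "b": 0},
    "x4": {"a": 1, "b": 0},
    "x5": {"a": 2, "b": 0},
}
X = {"x1", "x2", "x4", "x5"}
P = ["a", "b"]

low, upp = lower(table, X, P), upper(table, X, P)
print(sorted(low), sorted(upp), sorted(upp - low))
```

With these data, x3 and x5 share the same description but only x5 belongs to X, so both fall into the doubtful region, while x2 (missing on b) is excluded from both approximations because it cannot serve as a referent.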

Let $X_P=X\cap U_P$.

Theorem 1. (Rough inclusion) For each X⊆U and for each P⊆C: $\underline{I}_P(X)\subseteq X_P\subseteq\overline{I}_P(X)$.

Theorem 2. (Complementarity) For each X⊆U and for each P⊆C: $\underline{I}_P(X)=U_P-\overline{I}_P(U-X)$.

The P-boundary of X in S, denoted by $Bn_P(X)$, is $Bn_P(X)=\overline{I}_P(X)-\underline{I}_P(X)$. $Bn_P(X)$ constitutes the "doubtful region" of X: according to knowledge expressed by P nothing can be said with certainty about membership of its elements in the set X.

The following concept will also be useful (Skowron and Grzymala-Busse, 1994). Given a partition Cl={$Cl_t$, t∈T}, T={1,...,n}, of U, the P-boundary with respect to k>1 classes $\{Cl_{t_1},\dots,Cl_{t_k}\}\subseteq\{Cl_1,\dots,Cl_n\}$ is defined as

$$Bd_P(\{Cl_{t_1},\dots,Cl_{t_k}\})=\Big(\bigcap_{t=t_1,\dots,t_k} Bn_P(Cl_t)\Big)\cap\Big(\bigcap_{t\neq t_1,\dots,t_k}\big(U-Bn_P(Cl_t)\big)\Big).$$

The objects from $Bd_P(\{Cl_{t_1},\dots,Cl_{t_k}\})$ can be assigned to one of the classes $Cl_{t_1},\dots,Cl_{t_k}$ but P does not provide enough information to know exactly to which class.

Let us observe that a very useful property of lower approximation within classical rough sets theory is that if an object x∈U belongs to the lower approximation with respect to P⊆C, then x belongs also to the lower approximation with respect to R⊆C when P⊆R (this is a kind of monotonicity property). However, definition (1) does not satisfy this property of lower approximation, because it is possible that f(x,q)≠∗ for all q∈P but f(x,q)=∗ for some q∈R-P. This is quite problematic with respect to definition of some key concepts of the rough sets theory, like quality of approximation, reduct and core. Therefore, another definition of lower approximation should be considered to restore the concepts of quality of approximation, reduct and core in the case of missing values. Given X⊆U and P⊆Q, this definition is the following:

$$\underline{I}^*_P(X)=\bigcup_{R\subseteq P}\underline{I}_R(X). \qquad (3)$$

$\underline{I}^*_P(X)$ will be called the cumulative P-lower approximation of X because it includes all the objects belonging to all R-lower approximations of X, where R⊆P.

Let $U^*_P=\{x\in U: f(x,q)\neq\ast \text{ for at least one } q\in P\}$. It can be shown that another type of indiscernibility relation, denoted by $I^*_P$, permits a direct definition of the cumulative P-lower approximation in a classic way. For each x,y∈U and for each P⊆Q, $y I^*_P x$ means that f(x,q)=f(y,q) or f(x,q)=∗ and/or f(y,q)=∗, for every q∈P. Let $I^*_P(x)=\{y\in U: y I^*_P x\}$ for each x∈U and for each P⊆Q. $I^*_P$ is reflexive and symmetric but not transitive (Kryszkiewicz, 1998). Let us observe that the restriction of $I^*_P$ to $U^*_P$ is reflexive, symmetric and transitive if $U^*_P=U_P$.

Theorem 3. (Definition (3) expressed in terms of $I^*_P$) $\underline{I}^*_P(X)=\{x\in U^*_P: I^*_P(x)\subseteq X\}$.

Using $I^*_P$ we can give a definition of the P-upper approximation of X, complementary to $\underline{I}^*_P(X)$:

$$\overline{I}^*_P(X)=\{x\in U^*_P: I^*_P(x)\cap X\neq\emptyset\}. \qquad (4)$$

For each X⊆U, let $X^*_P=X\cap U^*_P$. Let us remark that $x\in U^*_P$ if and only if there exists R≠∅ such that R⊆P and $x\in U_R$.

Theorem 4. (Rough inclusion) For each X⊆U and for each P⊆C: $\underline{I}^*_P(X)\subseteq X^*_P\subseteq\overline{I}^*_P(X)$.
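The equivalence between definition (3) and its direct form in Theorem 3 can be checked numerically on a toy table. The sketch below is an illustration only: `None` encodes ∗, the data are hypothetical, and the union in (3) is taken over non-empty subsets R of P, which is an assumption suggested by the remark on $U^*_P$ above.

```python
from itertools import combinations

MISSING = None  # assumption: None stands for the symbol "*"


def lower(table, X, R):
    """Definition (1) for attribute set R (the referent must be complete on R)."""
    def ind(y, x):
        return all(
            table[x][q] is not MISSING and table[y][q] in (MISSING, table[x][q])
            for q in R
        )
    U_R = {x for x in table if all(table[x][q] is not MISSING for q in R)}
    return {x for x in U_R if {y for y in table if ind(y, x)} <= X}


def cumulative_lower(table, X, P):
    """Definition (3): union of R-lower approximations over non-empty R within P."""
    result = set()
    for k in range(1, len(P) + 1):
        for R in combinations(P, k):
            result |= lower(table, X, R)
    return result


def tolerant(table, y, x, P):
    """y I*_P x: on every q in P the values agree or at least one is missing."""
    return all(
        table[x][q] is MISSING or table[y][q] is MISSING or table[x][q] == table[y][q]
        for q in P
    )


def cumulative_lower_direct(table, X, P):
    """Direct form of the cumulative lower approximation (Theorem 3)."""
    U_star = {x for x in table if any(table[x][q] is not MISSING for q in P)}
    return {x for x in U_star if {y for y in table if tolerant(table, y, x, P)} <= X}


# Hypothetical toy data
table = {
    "x1": {"a": 1, "b": 0},
    "x2": {"a": 1, "b": MISSING},
    "x3": {"a": 2, "b": 0},
    "x4": {"a": 1, "b": 0},
    "x5": {"a": 2, "b": 0},
}
X = {"x1", "x2", "x4", "x5"}
P = ["a", "b"]

print(sorted(cumulative_lower(table, X, P)))
print(sorted(cumulative_lower_direct(table, X, P)))
```

Enumerating all subsets is exponential in the size of P, so the direct form is the practical one; the subset enumeration is kept here only to mirror definition (3).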

Theorem 5. (Complementarity) For each X⊆U and for each P⊆C: $\underline{I}^*_P(X)=U^*_P-\overline{I}^*_P(U-X)$.

The P-boundary of X approximated with $I^*_P$ is equal to $Bn^*_P(X)=\overline{I}^*_P(X)-\underline{I}^*_P(X)$.

Given a partition Cl={$Cl_t$, t∈T}, T={1,...,n}, of U, the P-boundary with respect to k>1 classes $\{Cl_{t_1},\dots,Cl_{t_k}\}\subseteq\{Cl_1,\dots,Cl_n\}$ is defined as

$$Bd^*_P(\{Cl_{t_1},\dots,Cl_{t_k}\})=\Big(\bigcap_{t=t_1,\dots,t_k} Bn^*_P(Cl_t)\Big)\cap\Big(\bigcap_{t\neq t_1,\dots,t_k}\big(U-Bn^*_P(Cl_t)\big)\Big).$$

The objects from $Bd^*_P(\{Cl_{t_1},\dots,Cl_{t_k}\})$ can be assigned to one of the classes $Cl_{t_1},\dots,Cl_{t_k}$; however, P and all its subsets do not provide enough information to know exactly to which class.

Theorem 6. (Monotonicity of the accuracy of approximation) For each X⊆U and for each P,R⊆C, such that P⊆R, the following inclusion holds:

i) $\underline{I}^*_P(X)\subseteq\underline{I}^*_R(X)$.

Furthermore, if $U^*_P=U^*_R$, the following inclusion is also true:

ii) $\overline{I}^*_P(X)\supseteq\overline{I}^*_R(X)$.

Due to Theorem 6, when augmenting a set of attributes P, we get a lower approximation of X that is at least of the same cardinality. Thus, we can restore for the case of missing values the following key concepts of the rough sets theory: accuracy and quality of approximation, reduct and core.

The accuracy of the approximation of X⊆U by the attributes from P is the ratio:

$$\alpha_P(X)=\frac{\mathrm{card}\big(\underline{I}^*_P(X)\big)}{\mathrm{card}\big(\overline{I}^*_P(X)\big)}.$$

The quality of the approximation of X⊆U by the attributes from P is the ratio:

$$\gamma_P(X)=\frac{\mathrm{card}\big(\underline{I}^*_P(X)\big)}{\mathrm{card}(X)}.$$

According to Theorem 6, the accuracy and the quality are monotonic (non-decreasing) with respect to inclusion of new attributes to set P.
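The two ratios can be computed directly from the cumulative approximations. The following sketch is illustrative only: it assumes `None` for ∗, a dictionary-based toy table, hypothetical helper names, and a non-empty X whose upper approximation is non-empty (otherwise the ratios are undefined).

```python
from fractions import Fraction

MISSING = None  # assumption: None stands for the symbol "*"


def tolerant_class(table, x, P):
    """I*_P(x): objects whose values agree with x on P wherever both are known."""
    return {
        y for y in table
        if all(
            table[x][q] is MISSING or table[y][q] is MISSING or table[x][q] == table[y][q]
            for q in P
        )
    }


def approximations(table, X, P):
    """Cumulative lower approximation and the upper approximation (4)."""
    U_star = {x for x in table if any(table[x][q] is not MISSING for q in P)}
    low = {x for x in U_star if tolerant_class(table, x, P) <= X}
    upp = {x for x in U_star if tolerant_class(table, x, P) & X}
    return low, upp


def accuracy(table, X, P):
    low, upp = approximations(table, X, P)
    return Fraction(len(low), len(upp))  # assumes a non-empty upper approximation


def quality(table, X, P):
    low, _ = approximations(table, X, P)
    return Fraction(len(low), len(X))  # assumes a non-empty X


# Hypothetical toy data
table = {
    "x1": {"a": 1, "b": 0},
    "x2": {"a": 1, "b": MISSING},
    "x3": {"a": 2, "b": 0},
    "x4": {"a": 1, "b": 0},
    "x5": {"a": 2, "b": 0},
}
X = {"x1", "x2", "x4", "x5"}

print(accuracy(table, X, ["a", "b"]), quality(table, X, ["a", "b"]))
print(quality(table, X, ["b"]))
```

In this example attribute b alone does not separate any object from the rest, so the quality is zero for P={b} and rises once a is added, in line with the monotonicity stated above.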

The quality $\gamma_P(X)$ represents the relative frequency of the objects correctly classified using the attributes from P. Moreover, $0\leq\alpha_P(X)\leq\gamma_P(X)\leq 1$, $\gamma_P(X)=0$ iff $\alpha_P(X)=0$ and $\gamma_P(X)=1$ iff $\alpha_P(X)=1$.

The approximations of a subset X⊆U can be extended to a classification, i.e. a partition Cl={$Cl_t$, t∈T}, T={1,...,n}, of U. By P-lower and P-upper approximation of Cl in S we mean sets $\underline{I}^*_P(Cl)=\{\underline{I}^*_P(Cl_1),\dots,\underline{I}^*_P(Cl_n)\}$ and $\overline{I}^*_P(Cl)=\{\overline{I}^*_P(Cl_1),\dots,\overline{I}^*_P(Cl_n)\}$, respectively. The coefficient

$$\gamma_P(Cl)=\frac{\sum_{t=1}^{n}\mathrm{card}\big(\underline{I}^*_P(Cl_t)\big)}{\mathrm{card}(U)}$$

is called quality of the approximation of classification Cl by set of attributes P, or in short, quality of classification. It expresses the ratio of all P-correctly classified objects to all objects in the system. Each minimal subset P⊆C such that $\gamma_P(Cl)=\gamma_C(Cl)$ is called a reduct of S and denoted by $RED_{Cl}(C)$. Let us remark that a decision table can have more than one reduct. The intersection of all reducts is called the core and denoted by $CORE_{Cl}(C)$.

2.2. Decision rules for the multi-attribute classification problem with missing values

Using the rough approximations (1), (2) and (3), (4), it is possible to induce a generalized description of the information contained in the decision table in terms of decision rules. These are logical statements (implications) of the type "if ..., then ...", where the antecedent (condition part) is a conjunction of elementary conditions concerning particular condition attributes and the consequence (decision part) is a disjunction of possible assignments to particular classes of a partition of U induced by decision attributes. Given a partition Cl={$Cl_t$, t∈T}, T={1,...,n}, of U, the syntax of a rule is the following:

"if $f(x,q_1)=r_{q_1}$ and $f(x,q_2)=r_{q_2}$ and ... $f(x,q_p)=r_{q_p}$, then x is assigned to $Cl_{t_1}$ or $Cl_{t_2}$ or ... $Cl_{t_k}$",

where $\{q_1,q_2,\dots,q_p\}\subseteq C$, $(r_{q_1},r_{q_2},\dots,r_{q_p})\in V_{q_1}\times V_{q_2}\times\dots\times V_{q_p}$ and $\{Cl_{t_1},Cl_{t_2},\dots,Cl_{t_k}\}\subseteq\{Cl_1,Cl_2,\dots,Cl_n\}$. If the consequence is univocal, i.e. k=1, then the rule is exact, otherwise it is approximate or uncertain.

Let us observe that for any $Cl_t\in\{Cl_1,Cl_2,\dots,Cl_n\}$ and P⊆Q, the definition (1) of P-lower approximation of $Cl_t$ can be rewritten as

$$\underline{I}_P(Cl_t)=\{x\in U_P: \text{for each } y\in U,\ \text{if } y I_P x,\ \text{then } y\in Cl_t\}. \qquad (1')$$

Thus the objects belonging to the lower approximation $\underline{I}_P(Cl_t)$ can be considered as prototypes for induction of exact decision rules. Therefore, the statement "if $f(x,q_1)=r_{q_1}$ and $f(x,q_2)=r_{q_2}$ and ... $f(x,q_p)=r_{q_p}$, then x is assigned to $Cl_t$" is accepted as an exact decision rule iff there exists at least one $y\in\underline{I}_P(Cl_t)$, $P=\{q_1,\dots,q_p\}$, such that $f(y,q_1)=r_{q_1}$ and $f(y,q_2)=r_{q_2}$ and ... $f(y,q_p)=r_{q_p}$.
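The acceptance condition for exact rules suggests a simple way of listing candidate rules: take every prototype in the P-lower approximation of a class and read off its values on P. The sketch below follows that idea under assumptions introduced here (`None` for ∗, a dictionary table, a hypothetical `classes` mapping and function names); it does not attempt to minimize the rules.

```python
MISSING = None  # assumption: None stands for the symbol "*"


def lower(table, X, P):
    """P-lower approximation of X according to definition (1)."""
    def ind(y, x):
        return all(
            table[x][q] is not MISSING and table[y][q] in (MISSING, table[x][q])
            for q in P
        )
    U_P = {x for x in table if all(table[x][q] is not MISSING for q in P)}
    return {x for x in U_P if {y for y in table if ind(y, x)} <= X}


def exact_rules(table, classes, P):
    """Candidate exact rules: one per distinct prototype description on P."""
    rules = set()
    for t, members in classes.items():
        for y in lower(table, members, P):
            conditions = tuple((q, table[y][q]) for q in P)
            rules.add((conditions, t))
    return rules


# Hypothetical toy data: class label -> set of objects
table = {
    "x1": {"a": 1, "b": 0},
    "x2": {"a": 1, "b": MISSING},
    "x3": {"a": 2, "b": 0},
    "x4": {"a": 1, "b": 0},
    "x5": {"a": 2, "b": 0},
}
classes = {1: {"x1", "x2", "x4"}, 2: {"x3", "x5"}}

for conditions, t in sorted(exact_rules(table, classes, ["a"])):
    print("if", " and ".join(f"f(x,{q})={r}" for q, r in conditions), "then Cl", t)
```

Because prototypes are drawn from $U_P$, every rule produced this way is backed by at least one object with no missing value on the attributes it mentions.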

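Given $\{Cl_{t_1},\dots,Cl_{t_k}\}\subseteq\{Cl_1,Cl_2,\dots,Cl_n\}$ we can write

$$Bd_P(\{Cl_{t_1},\dots,Cl_{t_k}\})=\{x\in U_P: \text{for each } y\in U,\ \text{if } y I_P x,\ \text{then } y\in Cl_{t_1} \text{ or } \dots \text{ or } Cl_{t_k}\}. \qquad (2')$$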

Thus, the objects belonging to the boundary $Bd_P(\{Cl_{t_1},\dots,Cl_{t_k}\})$ can be considered as a basis for induction of approximate decision rules.

An exact decision rule "if $f(x,q_1)=r_{q_1}$ and $f(x,q_2)=r_{q_2}$ and ... $f(x,q_p)=r_{q_p}$, then x is assigned to $Cl_t$" is minimal iff there is no other rule "if $f(x,h_1)=r_{h_1}$ and $f(x,h_2)=r_{h_2}$ and ... $f(x,h_k)=r_{h_k}$, then x is assigned to $Cl_t$" such that $\{h_1,h_2,\dots,h_k\}\subseteq\{q_1,\dots,q_p\}$ and both rules require the same value $r_s$ for every $s\in\{h_1,h_2,\dots,h_k\}$.

An approximate decision rule "if $f(x,q_1)=r_{q_1}$ and $f(x,q_2)=r_{q_2}$ and ... $f(x,q_p)=r_{q_p}$, then x is assigned to $Cl_{t_1}$ or $Cl_{t_2}$ or ... $Cl_{t_k}$" is minimal iff there is no other rule "if $f(x,h_1)=r_{h_1}$ and $f(x,h_2)=r_{h_2}$ and ... $f(x,h_k)=r_{h_k}$, then x is assigned to $Cl_{t_1}$ or $Cl_{t_2}$ or ... $Cl_{t_k}$" such that $\{h_1,h_2,\dots,h_k\}\subseteq\{q_1,\dots,q_p\}$ and both rules require the same value $r_s$ for every $s\in\{h_1,h_2,\dots,h_k\}$.

Since each decision rule is an implication, a minimal decision rule represents such an implication that there is no other implication with an antecedent of at least the same weakness and a consequent of at least the same strength.

We say that y∈U supports the exact decision rule "if $f(x,q_1)=r_{q_1}$ and $f(x,q_2)=r_{q_2}$ and ... $f(x,q_p)=r_{q_p}$, then x is assigned to $Cl_t$" if [$f(y,q_1)=r_{q_1}$ or $f(y,q_1)=\ast$] and [$f(y,q_2)=r_{q_2}$ or $f(y,q_2)=\ast$] ... and [$f(y,q_p)=r_{q_p}$ or $f(y,q_p)=\ast$] and $y\in Cl_t$. Similarly, we say that y∈U supports the approximate decision rule "if $f(x,q_1)=r_{q_1}$ and $f(x,q_2)=r_{q_2}$ and ... $f(x,q_p)=r_{q_p}$, then x is assigned to $Cl_{t_1}$ or $Cl_{t_2}$ or ... $Cl_{t_k}$" if [$f(y,q_1)=r_{q_1}$ or $f(y,q_1)=\ast$] and [$f(y,q_2)=r_{q_2}$ or $f(y,q_2)=\ast$] ... and [$f(y,q_p)=r_{q_p}$ or $f(y,q_p)=\ast$] and $y\in Bd^*_C(\{Cl_{t_1},\dots,Cl_{t_k}\})$.

A set of decision rules is complete if it fulfils the following conditions:

- each $x\in\underline{I}^*_C(Cl_t)$ supports at least one exact decision rule of the type "if $f(x,q_1)=r_{q_1}$ and $f(x,q_2)=r_{q_2}$ and ... $f(x,q_p)=r_{q_p}$, then x is assigned to $Cl_t$", for each $Cl_t\in Cl$,

- each $x\in Bd^*_C(\{Cl_{t_1},\dots,Cl_{t_k}\})$ supports at least one approximate decision rule of the type "if $f(x,q_1)=r_{q_1}$ and $f(x,q_2)=r_{q_2}$ and ... $f(x,q_p)=r_{q_p}$, then x is assigned to $Cl_{t_1}$ or $Cl_{t_2}$ or ... $Cl_{t_k}$", for each $\{Cl_{t_1},Cl_{t_2},\dots,Cl_{t_k}\}\subseteq\{Cl_1,Cl_2,\dots,Cl_n\}$.
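The notion of support for an exact rule can be illustrated with a short check. The sketch covers exact rules only and relies on assumptions made here: `None` encodes ∗, and the table, the `classes` mapping and the function name `supports` are hypothetical.

```python
MISSING = None  # assumption: None stands for the symbol "*"


def supports(table, classes, y, conditions, t):
    """True if y meets every elementary condition (or is missing there) and y is in Cl_t."""
    matches = all(
        table[y][q] is MISSING or table[y][q] == r
        for q, r in conditions
    )
    return matches and y in classes[t]


# Hypothetical toy data
table = {
    "x1": {"a": 1, "b": 0},
    "x2": {"a": 1, "b": MISSING},
    "x3": {"a": 2, "b": 0},
    "x4": {"a": 1, "b": 0},
    "x5": {"a": 2, "b": 0},
}
classes = {1: {"x1", "x2", "x4"}, 2: {"x3", "x5"}}

# Rule: if f(x,a)=1 and f(x,b)=0, then x is assigned to Cl_1
rule_conditions = [("a", 1), ("b", 0)]
supporters = {y for y in table if supports(table, classes, y, rule_conditions, 1)}
print(sorted(supporters))
```

Object x2 supports the rule although its value on b is unknown, whereas only fully described objects such as x1 and x4 can act as prototypes from which the rule is induced.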

We call minimal each set of minimal decision rules that is complete and non-redundant, i.e. exclusion of any rule from this set makes it non-complete.

2.3. Multi-criteria sorting problem with missing values

As was mentioned before, the notion of attribute differs from that of criterion because the domain (scale) of a criterion has to be ordered according to a decreasing or increasing preference, while the domain of the attribute does not have to be ordered. Formally, for each q∈C being a criterion there exists an outranking relation (Roy, 1985) $S_q$ on the set of actions U such that $x S_q y$ means "x is at least as good as y with respect to criterion q". We suppose that $S_q$ is a total preorder, i.e. a strongly complete and transitive binary relation defined on U on the basis of evaluations f(⋅,q). If domain $V_q$ of criterion q is quantitative and for each x,y∈U, f(x,q)≥f(y,q) implies $x S_q y$, then $V_q$ is a scale of preference of criterion q. If, however, for criterion q, $V_q$ is not quantitative and/or f(x,q)≥f(y,q) does not imply $x S_q y$, then in order to define a scale of preference of criterion q one can choose a function $g_q$:U→R such that for each x,y∈U, $x S_q y$ if and only if $g_q(x)\geq g_q(y)$ (see, e.g., Roubens and Vincke, 1985); to this aim it is enough to order the objects of U from the worst to the best on criterion q and to assign to $g_q(x)$ consecutive numbers corresponding to the rank of x in this order, i.e. for z being the worst, $g_q(z)=1$, for w being the second worst, $g_q(w)=2$, and so on. Then, the set of values of function $g_q(\cdot)$ becomes a scale of preference of criterion q and the domain $V_q$ is recoded such that $f(x,q)=g_q(x)$ for every x∈U.

Also in this case, we consider a directional comparison of subject y to referent x, for any two objects x,y∈U. We say that subject y dominates referent x with respect to criteria P⊆C (denotation $y D^+_P x$) if for every criterion q∈P the following two conditions are met: (i) f(x,q)≠∗, and (ii) f(y,q)≥f(x,q) or f(y,q)=∗. The above means that the referent object considered for dominance $D^+_P$ should have no missing values on criteria from set P.

We say that subject y is dominated by referent x with respect to criteria P⊆C (denotation $x D^-_P y$) if for every criterion q∈P the following two conditions are met: (i) f(x,q)≠∗, and (ii) f(x,q)≥f(y,q) or f(y,q)=∗. The above means that the referent object considered for dominance $D^-_P$ should have no missing values on criteria from set P.

The binary relations $D^+_P$ and $D^-_P$ are not necessarily reflexive because for some x∈U there may exist q∈P for which f(x,q)=∗ and, therefore, we can state neither $x D^+_P x$ nor $x D^-_P x$. However, $D^+_P$ and $D^-_P$ are transitive because for each x,y,z∈U, the statements 1) $x D^+_P y$ and $y D^+_P z$ imply $x D^+_P z$, and 2) $x D^-_P y$ and $y D^-_P z$ imply $x D^-_P z$. Implication 1) is justified by the observation that object z can substitute object y in the statement $x D^+_P y$ because $y D^+_P z$ and both y and z, as referent objects, have no missing values. As to implication 2), object x can substitute object y in the statement $y D^-_P z$ because $x D^-_P y$ and both x and y, as referent objects, have no missing values.
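The two directional dominance relations can be sketched as follows. This is an illustration under assumptions introduced here: `None` encodes ∗, every criterion is coded numerically so that larger values are preferred, and the table and the function names `dominates` and `is_dominated` are hypothetical.

```python
MISSING = None  # assumption: None stands for the symbol "*"


def dominates(table, y, x, P):
    """y D+_P x: subject y dominates referent x (x must be complete on P)."""
    return all(
        table[x][q] is not MISSING
        and (table[y][q] is MISSING or table[y][q] >= table[x][q])
        for q in P
    )


def is_dominated(table, y, x, P):
    """x D-_P y: subject y is dominated by referent x (x must be complete on P)."""
    return all(
        table[x][q] is not MISSING
        and (table[y][q] is MISSING or table[x][q] >= table[y][q])
        for q in P
    )


# Hypothetical toy data with two gain-type criteria
table = {
    "A": {"q1": 3, "q2": 2},
    "B": {"q1": 2, "q2": MISSING},
    "C": {"q1": 1, "q2": 2},
}
P = ["q1", "q2"]

print(dominates(table, "B", "C", P))     # subject B, referent C
print(dominates(table, "C", "B", P))     # B has a missing value, so it cannot be referent
print(is_dominated(table, "B", "A", P))  # subject B, referent A
```

Object B can be compared as a subject in both directions, but never as a referent, since its evaluation on q2 is unknown.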

For each P⊆C we restore the definition of set $U_P$ from paragraph 2.1. It is easy to see that the restrictions of $D^+_P$ and $D^-_P$ to $U_P$ (in other words, the binary relations $D^+_P\cap U_P\times U_P$ and $D^-_P\cap U_P\times U_P$ defined on $U_P$) are reflexive and transitive, i.e. they are partial preorders.

The sets to be approximated are called upward union and downward union of preference-ordered classes, respectively:

$$Cl_t^{\geq}=\bigcup_{s\geq t}Cl_s,\qquad Cl_t^{\leq}=\bigcup_{s\leq t}Cl_s,\qquad t=1,\dots,n.$$

The statement $x\in Cl_t^{\geq}$ means "x belongs at least to class $Cl_t$", while $x\in Cl_t^{\leq}$ means "x belongs at most to class $Cl_t$". Let us remark that $Cl_1^{\geq}=Cl_n^{\leq}=U$, $Cl_n^{\geq}=Cl_n$ and $Cl_1^{\leq}=Cl_1$. Furthermore, for t=2,...,n, we have: $Cl_{t-1}^{\leq}=U-Cl_t^{\geq}$ and $Cl_t^{\geq}=U-Cl_{t-1}^{\leq}$.

Given P⊆C and x∈U, the "granules of knowledge" used for approximation are:

- a set of objects dominating x, called P-dominating set, $D^+_P(x)=\{y\in U: y D^+_P x\}$,

- a set of objects dominated by x, called P-dominated set, $D^-_P(x)=\{y\in U: x D^-_P y\}$.

For any P⊆C we say that x∈U belongs to $Cl_t^{\geq}$ without any ambiguity if $x\in Cl_t^{\geq}$ and for all the objects y∈U dominating x with respect to P, we have $y\in Cl_t^{\geq}$, i.e. $D^+_P(x)\subseteq Cl_t^{\geq}$. Furthermore, we say that x∈U could belong to $Cl_t^{\geq}$ if there exists at least one object $y\in Cl_t^{\geq}$ dominated by x with respect to P, i.e. $y\in D^-_P(x)$. Thus, with respect to P⊆C, the set of all objects belonging to $Cl_t^{\geq}$ without any ambiguity constitutes the P-lower approximation of $Cl_t^{\geq}$, denoted by $\underline{P}(Cl_t^{\geq})$, and the set of all objects that could belong to $Cl_t^{\geq}$ constitutes the P-upper approximation of $Cl_t^{\geq}$, denoted by $\overline{P}(Cl_t^{\geq})$, for t=1,...,n:

$$\underline{P}(Cl_t^{\geq})=\{x\in U_P: D^+_P(x)\subseteq Cl_t^{\geq}\}, \qquad (5.1)$$

$$\overline{P}(Cl_t^{\geq})=\{x\in U_P: D^-_P(x)\cap Cl_t^{\geq}\neq\emptyset\}. \qquad (5.2)$$

Analogously, one can define P-lower approximation and P-upper approximation of $Cl_t^{\leq}$, for t=1,...,n:

$$\underline{P}(Cl_t^{\leq})=\{x\in U_P: D^-_P(x)\subseteq Cl_t^{\leq}\}, \qquad (6.1)$$

$$\overline{P}(Cl_t^{\leq})=\{x\in U_P: D^+_P(x)\cap Cl_t^{\leq}\neq\emptyset\}. \qquad (6.2)$$
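Definitions (5.1) and (5.2) can be illustrated on a small sorting example; the downward case (6.1), (6.2) is symmetric. The sketch is illustrative only and rests on assumptions introduced here: `None` encodes ∗, criteria are numeric with larger values preferred, and the table, the class assignment `cls` and the function names are hypothetical.

```python
MISSING = None  # assumption: None stands for the symbol "*"


def dominates(table, y, x, P):
    """y D+_P x with referent x."""
    return all(
        table[x][q] is not MISSING
        and (table[y][q] is MISSING or table[y][q] >= table[x][q])
        for q in P
    )


def is_dominated(table, y, x, P):
    """x D-_P y with referent x."""
    return all(
        table[x][q] is not MISSING
        and (table[y][q] is MISSING or table[x][q] >= table[y][q])
        for q in P
    )


def lower_up(table, cls, t, P):
    """(5.1): objects of U_P whose P-dominating set lies inside the upward union."""
    union = {x for x in table if cls[x] >= t}
    U_P = {x for x in table if all(table[x][q] is not MISSING for q in P)}
    return {x for x in U_P
            if {y for y in table if dominates(table, y, x, P)} <= union}


def upper_up(table, cls, t, P):
    """(5.2): objects of U_P whose P-dominated set meets the upward union."""
    union = {x for x in table if cls[x] >= t}
    U_P = {x for x in table if all(table[x][q] is not MISSING for q in P)}
    return {x for x in U_P
            if {y for y in table if is_dominated(table, y, x, P)} & union}


# Hypothetical toy data: x5 is at least as good as x2 on both criteria
# but sits in a worse class, which violates the dominance principle.
table = {
    "x1": {"q1": 3, "q2": 3},
    "x2": {"q1": 2, "q2": 2},
    "x3": {"q1": 2, "q2": MISSING},
    "x4": {"q1": 1, "q2": 1},
    "x5": {"q1": 3, "q2": 2},
}
cls = {"x1": 3, "x2": 2, "x3": 1, "x4": 1, "x5": 1}
P = ["q1", "q2"]

low, upp = lower_up(table, cls, 2, P), upper_up(table, cls, 2, P)
print(sorted(low), sorted(upp), sorted(upp - low))
```

The inconsistent pair x2 and x5 ends up in the doubtful region of the union "at least class 2", while x1 is the only object assigned to it without ambiguity.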

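Let $(Cl_t^{\geq})_P=Cl_t^{\geq}\cap U_P$ and $(Cl_t^{\leq})_P=Cl_t^{\leq}\cap U_P$, t=1,…,n.

Theorem 7. (Rough inclusion) For each $Cl_t^{\geq}$ and $Cl_t^{\leq}$, t=1,…,n, and for each P⊆C: $\underline{P}(Cl_t^{\geq})\subseteq(Cl_t^{\geq})_P\subseteq\overline{P}(Cl_t^{\geq})$, $\underline{P}(Cl_t^{\leq})\subseteq(Cl_t^{\leq})_P\subseteq\overline{P}(Cl_t^{\leq})$.

Theorem 8. (Complementarity) For each $Cl_t^{\geq}$, t=2,…,n, and $Cl_t^{\leq}$, t=1,…,n-1, and for each P⊆C: $\underline{P}(Cl_t^{\geq})=U_P-\overline{P}(Cl_{t-1}^{\leq})$, $\underline{P}(Cl_t^{\leq})=U_P-\overline{P}(Cl_{t+1}^{\geq})$.

The P-boundaries (P-doubtful regions) of $Cl_t^{\geq}$ and $Cl_t^{\leq}$ are defined as: $Bn_P(Cl_t^{\geq})=\overline{P}(Cl_t^{\geq})-\underline{P}(Cl_t^{\geq})$, $Bn_P(Cl_t^{\leq})=\overline{P}(Cl_t^{\leq})-\underline{P}(Cl_t^{\leq})$, for t=1,...,n.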

Due to complementarity of the rough approximations (Greco, Matarazzo and Slowinski, 1998b), the following property holds: $Bn_P(Cl_t^{\geq})=Bn_P(Cl_{t-1}^{\leq})$, for t=2,...,n.

To preserve the monotonicity property of the lower approximation (see paragraph 2.1) it is necessary to use another definition of the approximation for a given $Cl_t^{\geq}$ and $Cl_t^{\leq}$, t=1,…,n, and for each P⊆C:

$$\underline{P}(Cl_t^{\geq})^*=\bigcup_{R\subseteq P}\underline{R}(Cl_t^{\geq}), \qquad (7.1)$$

$$\underline{P}(Cl_t^{\leq})^*=\bigcup_{R\subseteq P}\underline{R}(Cl_t^{\leq}). \qquad (7.2)$$

$\underline{P}(Cl_t^{\geq})^*$ and $\underline{P}(Cl_t^{\leq})^*$ will be called cumulative P-lower approximations of unions $Cl_t^{\geq}$ and $Cl_t^{\leq}$, t=1,…,n, because they include all the objects belonging to all R-lower approximations of $Cl_t^{\geq}$ and $Cl_t^{\leq}$, respectively, where R⊆P.

Let $U^*_P=\{x\in U: f(x,q)\neq\ast \text{ for at least one } q\in P\}$. It can be shown that another type of dominance relation, denoted by $D^*_P$, permits a direct definition of the cumulative P-lower approximations in a classic way. For each x,y∈U and for each P⊆C, $y D^*_P x$ means that f(y,q)≥f(x,q) or f(x,q)=∗ and/or f(y,q)=∗, for every q∈P. Now, given P⊆C and x∈U, the "granules of knowledge" used for approximation are:

- a set of objects dominating x, called P-dominating set, $D_P^{+*}(x)=\{y\in U: y D^*_P x\}$,

- a set of objects dominated by x, called P-dominated set, $D_P^{-*}(x)=\{y\in U: x D^*_P y\}$.

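$D^*_P$ is reflexive but not transitive. Let us observe that the restriction of $D^*_P$ to $U^*_P$ is reflexive and transitive if $U^*_P=U_P$.

Theorem 9. (Definitions (7.1) and (7.2) expressed in terms of $D^*_P$) $\underline{P}(Cl_t^{\geq})^*=\{x\in U^*_P: D_P^{+*}(x)\subseteq Cl_t^{\geq}\}$, $\underline{P}(Cl_t^{\leq})^*=\{x\in U^*_P: D_P^{-*}(x)\subseteq Cl_t^{\leq}\}$.

Using $D^*_P$ we can give a definition of the P-upper approximations of $Cl_t^{\geq}$ and $Cl_t^{\leq}$, complementary to $\underline{P}(Cl_t^{\geq})^*$ and $\underline{P}(Cl_t^{\leq})^*$, respectively:

$$\overline{P}(Cl_t^{\geq})^*=\{x\in U^*_P: D_P^{-*}(x)\cap Cl_t^{\geq}\neq\emptyset\}, \qquad (8.1)$$

$$\overline{P}(Cl_t^{\leq})^*=\{x\in U^*_P: D_P^{+*}(x)\cap Cl_t^{\leq}\neq\emptyset\}. \qquad (8.2)$$

For each $Cl_t^{\geq}\subseteq U$ and $Cl_t^{\leq}\subseteq U$, let $(Cl_t^{\geq})^*=Cl_t^{\geq}\cap U^*_P$ and $(Cl_t^{\leq})^*=Cl_t^{\leq}\cap U^*_P$. Let us remark that $x\in U^*_P$ if and only if there exists R≠∅ such that R⊆P and $x\in U_R$.

Theorem 10. (Rough inclusion) For each $Cl_t^{\geq}$ and $Cl_t^{\leq}$, t=1,…,n, and for each P⊆C: $\underline{P}(Cl_t^{\geq})^*\subseteq(Cl_t^{\geq})^*\subseteq\overline{P}(Cl_t^{\geq})^*$, $\underline{P}(Cl_t^{\leq})^*\subseteq(Cl_t^{\leq})^*\subseteq\overline{P}(Cl_t^{\leq})^*$.

Theorem 11. (Complementarity) For each $Cl_t^{\geq}$, t=2,…,n, and $Cl_t^{\leq}$, t=1,…,n-1, and for each P⊆C: $\underline{P}(Cl_t^{\geq})^*=U^*_P-\overline{P}(Cl_{t-1}^{\leq})^*$, $\underline{P}(Cl_t^{\leq})^*=U^*_P-\overline{P}(Cl_{t+1}^{\geq})^*$.

The P-boundaries of $Cl_t^{\geq}$ and $Cl_t^{\leq}$, t=1,…,n, approximated with $D^*_P$ are equal, respectively, to $Bn^*_P(Cl_t^{\geq})=\overline{P}(Cl_t^{\geq})^*-\underline{P}(Cl_t^{\geq})^*$, $Bn^*_P(Cl_t^{\leq})=\overline{P}(Cl_t^{\leq})^*-\underline{P}(Cl_t^{\leq})^*$.

Theorem 12. (Monotonicity of the accuracy of approximation) For each $Cl_t^{\geq}$ and $Cl_t^{\leq}$, t=1,…,n, and for each P,R⊆C, such that P⊆R, the following inclusions hold: $\underline{P}(Cl_t^{\geq})^*\subseteq\underline{R}(Cl_t^{\geq})^*$, $\underline{P}(Cl_t^{\leq})^*\subseteq\underline{R}(Cl_t^{\leq})^*$.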

Furthermore, if $U^*_P=U^*_R$, the following inclusions are also true: $\overline{P}(Cl_t^{\geq})^*\supseteq\overline{R}(Cl_t^{\geq})^*$, $\overline{P}(Cl_t^{\leq})^*\supseteq\overline{R}(Cl_t^{\leq})^*$.

Due to Theorem 12, when augmenting a set of attributes P, we get lower approximations of $Cl_t^{\geq}$ and $Cl_t^{\leq}$, t=1,…,n, that are at least of the same cardinality. Thus, we can restore for the case of missing values the following key concepts of the rough sets theory: accuracy and quality of approximation, reduct and core.

For every t∈T and for every P⊆C we define the quality of approximation of partition Cl by set of attributes P, or in short, quality of sorting:

$$\gamma_P(Cl)=\frac{\mathrm{card}\Big(U-\bigcup_{t\in T}Bn^*_P(Cl_t^{\leq})\Big)}{\mathrm{card}(U)}=\frac{\mathrm{card}\Big(U-\bigcup_{t\in T}Bn^*_P(Cl_t^{\geq})\Big)}{\mathrm{card}(U)}.$$

The quality expresses the ratio of all P-correctly sorted objects to all objects in the decision table. Each minimal subset P⊆C such that $\gamma_P(Cl)=\gamma_C(Cl)$ is called a reduct of Cl and denoted by $RED_{Cl}(C)$. Let us remark that a decision table can have more than one reduct. The intersection of all reducts is called the core and denoted by $CORE_{Cl}(C)$.

2.4. Decision rules for multi-criteria sorting problem with missing values

Using the rough approximations (5.1), (5.2), (6.1), (6.2) and (7.1), (7.2), (8.1), (8.2), it is possible to induce a generalized description of the information contained in the decision table in terms of "if ..., then ..." decision rules. Given the preference-ordered classes of partition Cl={$Cl_t$, t∈T}, T={1,...,n}, of U, the following three types of decision rules can be considered:

1) $D_{\geq}$-decision rules with the following syntax: if $f(x,q_1)\geq r_{q_1}$ and $f(x,q_2)\geq r_{q_2}$ and ... $f(x,q_p)\geq r_{q_p}$, then $x\in Cl_t^{\geq}$, where $P=\{q_1,\dots,q_p\}\subseteq C$, $(r_{q_1},\dots,r_{q_p})\in V_{q_1}\times V_{q_2}\times\dots\times V_{q_p}$ and t∈T;

2) $D_{\leq}$-decision rules with the following syntax: if $f(x,q_1)\leq r_{q_1}$ and $f(x,q_2)\leq r_{q_2}$ and ... $f(x,q_p)\leq r_{q_p}$, then $x\in Cl_t^{\leq}$, where $P=\{q_1,\dots,q_p\}\subseteq C$, $(r_{q_1},\dots,r_{q_p})\in V_{q_1}\times V_{q_2}\times\dots\times V_{q_p}$ and t∈T;

3) $D_{\geq\leq}$-decision rules with the following syntax: if $f(x,q_1)\geq r_{q_1}$ and $f(x,q_2)\geq r_{q_2}$ and ... $f(x,q_k)\geq r_{q_k}$ and $f(x,q_{k+1})\leq r_{q_{k+1}}$ and ... $f(x,q_p)\leq r_{q_p}$, then $x\in Cl_s\cup Cl_{s+1}\cup\dots\cup Cl_t$, where $O'=\{q_1,\dots,q_k\}\subseteq C$, $O''=\{q_{k+1},\dots,q_p\}\subseteq C$, $P=O'\cup O''$, O' and O'' not necessarily disjoint, $(r_{q_1},\dots,r_{q_p})\in V_{q_1}\times V_{q_2}\times\dots\times V_{q_p}$, s,t∈T such that s