Proceedings of the First International Conference on Machine Learning and Cybernetics, Beijing, 4-5 November 2002

OPTIMAL DECISION RULES BASED ON INCLUSION DEGREE THEORY

JU-SHENG MI(1,2), WEN-XIU ZHANG(1), WEI-ZHI WU(3)

(1) Faculty of Science, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, P. R. China
(2) College of Mathematics and Information Science, Hebei Normal University, Shijiazhuang 050016, P. R. China
(3) College of Information Science, Zhejiang Ocean University, Zhoushan, Zhejiang 316000, P. R. China
E-MAIL: [email protected]

Abstract:
The purpose of this paper is to establish knowledge reductions in inconsistent decision tables. Based on rough set theory, the concepts of upper and lower approximation reductions are introduced, and their relationships are investigated. With the theory of inclusion degree, the maximum distribution reduction and the optimal maximum distribution rules are also presented, which are more useful in making brief decision rules from inconsistent information systems. A new knowledge discovery approach is thus established.

Keywords:
Rough set; Knowledge reduction; Inclusion degree; Inconsistent decision table

1 Introduction

Due to issues such as noise in data, compact representation and prediction capability, most decision information systems are inconsistent. Hence, one has to resort to decision rules with confidence less than one. At present, rough set theory and inclusion degree theory [1-3] offer effective approaches for generating such decision rules. For an information system, not all condition attributes are necessary to depict the decision attribute before decision rules are generated. Therefore, knowledge reduction [4] is necessary to obtain brief decision rules. Many types of knowledge reductions and decision rules have been proposed for inconsistent systems [5-10]. For example, possible reduction [5] and possible rules have been proposed as a means to deal with inconsistency in an inconsistent decision table. Generalized decision reduction [6] and generalized decision rules provide a decision-maker with more flexible selections of decision behavior. It has been shown that the knowledge reduction that preserves the membership distribution is equivalent to the knowledge reduction that preserves the value of the generalized inference measure function. Kryszkiewicz [11] investigated and compared five notions of knowledge reduction in inconsistent systems, all of them based on rough set theory.

In this paper, upper approximation reduction and lower approximation reduction are introduced, and their relationships are investigated. With the theory of inclusion degree [3], we then present a concept of maximum distribution reduction that is more useful in making decision rules from an inconsistent decision table. The optimal maximum distribution decision rules are also examined.

2 Basic notions

An information system (IS) (sometimes called a data table, attribute-value system, or knowledge representation system) is a pair $(U, A)$, where $U$ is a non-empty, finite set called the universe and $A$ is a non-empty, finite set of attributes. For each $a \in A$, $a$ is a function from $U$ to $V_a$, where $V_a$ is called the domain of $a$.

Elements of $U$ are called objects. Each non-empty subset $B \subseteq A$ determines an indiscernibility relation $R_B$ as follows:
$$R_B = \{(x, y) \in U \times U \mid \forall a \in B,\ a(x) = a(y)\}.$$
$R_B$ partitions $U$ into equivalence classes. By $[x]_B$ we denote the equivalence class determined by $x$ with respect to (wrt.) $B$, i.e., $[x]_B = \{y \in U \mid \forall a \in B,\ a(y) = a(x)\}$.

Let $X \subseteq U$, $B \subseteq A$. One can characterize $X$ by a pair of lower and upper approximations:
$$\underline{B}X = \{x \in U \mid [x]_B \subseteq X\}, \qquad \overline{B}X = \{x \in U \mid [x]_B \cap X \neq \emptyset\}.$$

The lower approximation $\underline{B}X$ is the set of objects that belong to $X$ with certainty, while the upper approximation $\overline{B}X$ is the set of objects that possibly belong to $X$. The pair $(\underline{B}X, \overline{B}X)$ is referred to as the Pawlak rough set of $X$ wrt. $B$.
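To make these constructions concrete, here is a minimal Python sketch (ours, not from the paper; the names `indiscernibility_classes` and `lower_upper` are our own) that computes the classes of $R_B$ and the pair of approximations for a small attribute-value table:

```python
def indiscernibility_classes(U, B, value):
    """Partition the universe U into the equivalence classes of R_B.

    `value(x, a)` returns the value of attribute a on object x; two
    objects fall in the same class iff they agree on every a in B.
    """
    classes = {}
    for x in U:
        classes.setdefault(tuple(value(x, a) for a in B), set()).add(x)
    return list(classes.values())

def lower_upper(U, B, value, X):
    """Return the (lower, upper) approximations of X with respect to R_B."""
    lower, upper = set(), set()
    for cls in indiscernibility_classes(U, B, value):
        if cls <= X:      # [x]_B contained in X: certain members of X
            lower |= cls
        if cls & X:       # [x]_B meets X: possible members of X
            upper |= cls
    return lower, upper
```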


A decision table (DT) is an information system $(U, A \cup \{d\})$, where $d \notin A$; $A$ is called the condition attribute set, while $d$ is called the decision attribute. If $R_A \subseteq R_{\{d\}}$, we say that $(U, A \cup \{d\})$ is consistent; otherwise it is inconsistent. One can acquire certain decision rules from consistent systems and uncertain decision rules from inconsistent systems.

The inclusion degree $D$ [3] of two sets is defined as the degree to which one set is contained in another. In general, it should satisfy the following conditions:
(1) $0 \leq D(Y/X) \leq 1$;
(2) $X \subseteq Y$ if and only if $D(Y/X) = 1$;
(3) if $X \subseteq Y \subseteq Z$, then $D(X/Z) \leq D(Y/Z)$.

It is easy to see that
$$D(Y/X) = \frac{|X \cap Y|}{|X|}$$
is an inclusion degree, where $|X|$ denotes the number of elements of the set $X$. In this paper, for the sake of simplicity, we use this degree as our inclusion degree.
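This degree is a one-liner in code. The sketch below (our own, including an explicit convention for the empty set, which the paper leaves undefined) also spot-checks conditions (1)-(3):

```python
def inclusion_degree(Y, X):
    """D(Y/X) = |X ∩ Y| / |X|: the degree to which X is contained in Y."""
    if not X:
        return 1.0  # convention for X = ∅ (our assumption, not the paper's)
    return len(X & Y) / len(X)

X, Y, Z = {1, 2}, {1, 2, 3}, {1, 2, 3, 4}
assert 0.0 <= inclusion_degree(Y, X) <= 1.0              # condition (1)
assert inclusion_degree(Y, X) == 1.0                     # (2): X ⊆ Y gives D(Y/X) = 1
assert inclusion_degree(X, Z) <= inclusion_degree(Y, Z)  # (3): X ⊆ Y gives D(X/Z) ≤ D(Y/Z)
```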

3 Knowledge reductions based on inclusion degree theory

Let $(U, A \cup \{d\})$ be a DT, $B \subseteq A$. We denote $U/R_{\{d\}} = \{D_1, D_2, \ldots, D_r\}$ and write
$$\overline{B}(d) = (\overline{B}D_1, \ldots, \overline{B}D_r), \qquad \underline{B}(d) = (\underline{B}D_1, \ldots, \underline{B}D_r).$$
$\overline{B}(d)$ and $\underline{B}(d)$ represent the upper approximation and the lower approximation of the knowledge described by the attribute set $B$. Then:

Definition 3.1. Let $(U, A \cup \{d\})$ be a DT, $B \subseteq A$.
(1) If $\overline{B}(d) = \overline{A}(d)$, we say that $B$ is an upper approximation consistent set of $(U, A \cup \{d\})$. If $B$ is an upper approximation consistent set, and for any $a \in B$, $\overline{B \setminus \{a\}}(d) \neq \overline{A}(d)$, then $B$ is referred to as an upper approximation reduction of $(U, A \cup \{d\})$.
(2) If $\underline{B}(d) = \underline{A}(d)$, we say that $B$ is a lower approximation consistent set of $(U, A \cup \{d\})$. If $B$ is a lower approximation consistent set, and for any $a \in B$, $\underline{B \setminus \{a\}}(d) \neq \underline{A}(d)$, then $B$ is referred to as a lower approximation reduction of $(U, A \cup \{d\})$.

An upper approximation consistent set preserves the upper approximations of all decision classes, while a lower approximation consistent set preserves the lower approximations of all decision classes. The following theorem illustrates the relationship between the two concepts.

Theorem 3.1. Let $(U, A \cup \{d\})$ be a DT; then an upper approximation consistent set must be a lower approximation consistent set.

Proof. Suppose $B$ is an upper approximation consistent set; then $\overline{B}(D_i) = \overline{A}(D_i)$ for all $i \leq r$. Since $B \subseteq A$, it is clear that $\underline{B}(D_i) \subseteq \underline{A}(D_i)$, so we only need $\underline{B}(D_i) \supseteq \underline{A}(D_i)$ for all $i \leq r$. Indeed,
$$x \in \underline{A}(D_i) \Rightarrow [x]_A \subseteq D_i \Rightarrow [x]_A \cap D_j = \emptyset,\ \forall j \neq i \Rightarrow x \notin \overline{A}(D_j),\ \forall j \neq i,$$
and, since $\overline{B}(D_j) = \overline{A}(D_j)$,
$$\Rightarrow x \notin \overline{B}(D_j),\ \forall j \neq i \Rightarrow [x]_B \cap D_j = \emptyset,\ \forall j \neq i \Rightarrow [x]_B \subseteq D_i \Rightarrow x \in \underline{B}(D_i).$$
Thus we complete the proof.

The following example illustrates that the converse of Theorem 3.1 is not always true.

Example 3.1. Table 1 is an inconsistent DT.

Table 1. (entries illegible in the source scan)

It is easy to prove that $\{a_1\}$ is a lower approximation consistent set but not an upper approximation consistent set.

A membership distribution function $\mu_B$ is defined as follows:
$$\mu_B(x) = (D(D_1/[x]_B), \ldots, D(D_r/[x]_B)), \quad x \in U.$$
It is obvious that $\mu_B(x)$ is a probability distribution on $U/R_{\{d\}}$. For every $x \in U$, we denote
$$\gamma_B(x) = \{D_j \mid D(D_j/[x]_B) = \max_{i \leq r} D(D_i/[x]_B)\}.$$
Then $\gamma_B(x)$ is the set of all maximum decision classes of $x$ with respect to the attribute subset $B$.
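Both $\mu_B$ and $\gamma_B$ depend only on the block $[x]_B$, so they can be computed in one pass over the classes of $R_B$. A sketch (our own helper, reused in the examples below):

```python
def mu_gamma(U, B, value, decision_classes):
    """Return maps x -> mu_B(x) and x -> gamma_B(x).

    mu_B(x) is the tuple (D(D_1/[x]_B), ..., D(D_r/[x]_B)); gamma_B(x)
    is the set of indices i whose class D_i attains the maximal degree.
    """
    blocks = {}
    for x in U:  # group objects into the equivalence classes of R_B
        blocks.setdefault(tuple(value(x, a) for a in B), set()).add(x)
    mu, gamma = {}, {}
    for block in blocks.values():
        dist = tuple(len(block & D) / len(block) for D in decision_classes)
        best = frozenset(i for i, p in enumerate(dist) if p == max(dist))
        for x in block:
            mu[x], gamma[x] = dist, best
    return mu, gamma
```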



Definition 3.2. Let $(U, A \cup \{d\})$ be a DT, $B \subseteq A$.
(1) If $\mu_B(x) = \mu_A(x)$ for all $x \in U$, we say that $B$ is a distribution consistent set of $(U, A \cup \{d\})$. If $B$ is a distribution consistent set, and no proper subset of $B$ is distribution consistent, then $B$ is referred to as a distribution reduction of $(U, A \cup \{d\})$.
(2) If $\gamma_B(x) = \gamma_A(x)$ for all $x \in U$, we say that $B$ is a maximum distribution consistent set of $(U, A \cup \{d\})$. If $B$ is a maximum distribution consistent set, and no proper subset of $B$ is maximum distribution consistent, then $B$ is referred to as a maximum distribution reduction of $(U, A \cup \{d\})$.

A distribution consistent set is a subset of the attribute set that preserves the degree to which every object belongs to each decision class. A maximum distribution consistent set preserves all maximum decision rules, but the confidence of each uncertain decision rule may not be equal to the original one. So we have the following result.

Theorem 3.2. Let $(U, A \cup \{d\})$ be a DT; then a distribution consistent set must be a maximum distribution consistent set.

The following example illustrates that the converse of Theorem 3.2 is not always true.

Example 3.2. Table 2 is an inconsistent DT.

Table 2.
U     a1    a2    d
x1    0     0     0
x2    0     0     0
x3    0     1     0
x4    0     1     0
x5    0     1     1
x6    1     1     1

It is easy to verify that $\{a_1\}$ is a maximum distribution consistent set but not a distribution consistent set.
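Table 2 is small enough to check this claim mechanically. A sketch continuing the `mu_gamma` helper above (variable names are ours):

```python
# Table 2 as a dictionary: object -> (a1, a2, d)
table = {"x1": (0, 0, 0), "x2": (0, 0, 0), "x3": (0, 1, 0),
         "x4": (0, 1, 0), "x5": (0, 1, 1), "x6": (1, 1, 1)}
U = set(table)
value = lambda x, a: table[x][{"a1": 0, "a2": 1}[a]]
D0 = {x for x in U if table[x][2] == 0}   # decision class for d = 0
D1 = U - D0                               # decision class for d = 1

mu_A, gamma_A = mu_gamma(U, ["a1", "a2"], value, [D0, D1])
mu_B, gamma_B = mu_gamma(U, ["a1"], value, [D0, D1])

assert all(gamma_B[x] == gamma_A[x] for x in U)  # {a1} is maximum distribution consistent
assert any(mu_B[x] != mu_A[x] for x in U)        # ... but not distribution consistent:
# mu over {a1} gives (0.8, 0.2) for x1, whereas mu over {a1, a2} gives (1.0, 0.0)
```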

4 Reductions in construction of optimal decision rules

Let $(U, A \cup \{d\})$ be a DT, $B \subseteq A$. In this section, we consider decision rules of the form $t \to s$, where $t = \bigwedge (a, v)$, $a \in B$, $v \in V_a$, and $s = \bigvee (d, w)$, $w \in V_d$. Any attribute pair $(a, v)$, $a \in B$, will be called a $B$-atomic property, and a $B$-atomic property or a conjunction of such properties will be called a $B$-descriptor. Let $t$ be a $B$-descriptor; the attribute set occurring in $t$ will be denoted by $B(t)$. By $\|t\|$ and $\|s\|$ we denote the set of objects having the $B$-descriptor $t$ and the set of objects with the decision feature $s$, respectively. The cardinality of the object set $\|t\|$ will be called the support of $t$ and is denoted by $\sup(t)$. A rule $t \to s$ is certain if $\|t\| \subseteq \|s\|$; otherwise it is uncertain.

Let $t$ and $s$ be two descriptors. If $(a, v) \in s$ implies $(a, v) \in t$, that is, $s$ consists of a subset of the atomic properties occurring in $t$, then we say that $s$ is coarser than $t$, denoted by $s \preceq t$. If $s$ consists of a proper subset of the atomic properties occurring in $t$, then we say that $s$ is properly coarser than $t$, denoted by $s \prec t$.

The support of a rule $t \to s$ is defined as $\|t \to s\| = \|t\| \cap \|s\|$. The confidence of a rule $t \to s$ is denoted by $\mathrm{conf}(t \to s)$ and is defined as the conditional probability that $s$ is satisfied provided $t$ is satisfied, i.e., $\mathrm{conf}(t \to s) = |\|t\| \cap \|s\||/|\|t\||$; see the sketch after these definitions.

The user is often interested in rules allowing classification of objects based on a minimum of facts. Now we describe the notion of optimal maximum distribution rules. For the sake of simplicity, we denote $d(y) = i$ for all $y \in D_i$.

A decision rule $t \to s$ is called a maximum distribution rule if $t = \bigwedge_{a \in B} (a, a(x))$ and $s = \bigvee_{D_i \in \gamma_B(x)} (d, i)$, for some $x \in U$.

A maximum distribution decision rule $t \to s$ is an optimal maximum distribution rule if there is no other maximum distribution decision rule $t' \to s$ such that $t' \prec t$.
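Under these definitions, support and confidence reduce to intersections of object sets. A minimal sketch (our own naming; a descriptor is a dict of attribute-value pairs, and `decision` maps an object to its decision value), continuing the Table 2 code above:

```python
def objects_matching(U, descriptor, value):
    """||t||: the objects satisfying every (attribute, value) pair of t."""
    return {x for x in U if all(value(x, a) == v for a, v in descriptor.items())}

def rule_support_confidence(U, t, d_values, value, decision):
    """Support ||t → s|| and confidence of t → s, where s = ∨(d, w) over w in d_values."""
    t_objs = objects_matching(U, t, value)
    s_objs = {x for x in U if decision(x) in d_values}
    support = t_objs & s_objs                        # ||t → s|| = ||t|| ∩ ||s||
    conf = len(support) / len(t_objs) if t_objs else 0.0
    return support, conf

# On Table 2, the rule (a1, 0) → (d, 0) has ||t → s|| = {x1, x2, x3, x4}, confidence 0.8
support, conf = rule_support_confidence(U, {"a1": 0}, {0}, value, lambda x: table[x][2])
```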

Maximum distribution consistent sets can be used in the construction of maximum distribution rules, while maximum distribution reductions can be used in the construction of optimal maximum distribution rules.

Let $(U, A \cup \{d\})$ be a DT, $B \subseteq A$, $x \in U$. If $\gamma_B(x) = \gamma_A(x)$, we say that $B$ is a maximum distribution consistent set for $x \in U$. If $B$ is a maximum distribution consistent set for $x \in U$, and no proper subset of $B$ is maximum distribution consistent for $x \in U$, then $B$ is referred to as a maximum distribution reduction for $x \in U$.

Let $B$ be a maximum distribution consistent set for some $x \in U$. Then $x$ supports the maximum distribution decision rule

$$r:\ \bigwedge_{a \in B} (a, a(x)) \to \bigvee_{D_i \in \gamma_B(x)} (d, i).$$
If $B$ is a maximum distribution reduction for some $x \in U$, then $B$ is a minimal attribute subset that satisfies the rule $r$. So the rule
$$r:\ \bigwedge_{a \in B} (a, a(x)) \to \bigvee_{D_i \in \gamma_B(x)} (d, i)$$
is an optimal maximum distribution rule.

Example 4.1. Continuing from Example 3.2, the maximum distribution rules are:
$(a_1, 0) \wedge (a_2, 0) \to (d, 0)$, supported by $x_1$ and $x_2$ with confidence 1;
$(a_1, 0) \wedge (a_2, 1) \to (d, 0)$, supported by $x_3$, $x_4$ and $x_5$ with confidence 0.67;
$(a_1, 1) \wedge (a_2, 1) \to (d, 1)$, supported by $x_6$ with confidence 1.
Since $\{a_1\}$ is a maximum distribution reduction for all objects, the optimal maximum distribution rules are:
$(a_1, 0) \to (d, 0)$, supported by $x_1$, $x_2$, $x_3$, $x_4$ and $x_5$ with confidence 0.8;
$(a_1, 1) \to (d, 1)$, supported by $x_6$ with confidence 1.
Because $\{a_2\}$ is also a maximum distribution reduction for $x_1$ and $x_2$, the rule $(a_2, 0) \to (d, 0)$ is still an optimal maximum distribution rule, supported by $x_1$ and $x_2$ with confidence 1.
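The whole of Example 4.1 can be replayed mechanically: every object $x$ yields the rule $\bigwedge_{a \in B}(a, a(x)) \to \bigvee_{D_i \in \gamma_B(x)}(d, i)$, whose confidence is the inclusion degree of the consequent's objects in $\|t\|$. A sketch continuing the Table 2 code above (function name ours):

```python
def max_distribution_rules(U, B, value, decision_classes):
    """Collect the maximum distribution rules t -> ∨(d, i) induced by B."""
    _, gamma = mu_gamma(U, B, value, decision_classes)
    rules = {}
    for x in U:
        t = tuple((a, value(x, a)) for a in B)                         # B-descriptor of x
        block = {y for y in U if all(value(y, a) == v for a, v in t)}  # ||t||
        s = set().union(*(decision_classes[i] for i in gamma[x]))      # ||s||
        rules[t] = (sorted(gamma[x]), sorted(block), len(block & s) / len(block))
    return rules

# Over the maximum distribution reduction {a1}, this reproduces Example 4.1:
for t, (g, block, conf) in sorted(max_distribution_rules(U, ["a1"], value, [D0, D1]).items()):
    print(t, "-> d in", g, "supported by", block, f"confidence {conf:.2f}")
# (('a1', 0),) -> d in [0] supported by ['x1', ..., 'x5'] confidence 0.80
# (('a1', 1),) -> d in [1] supported by ['x6'] confidence 1.00
```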

5 Conclusions

In a decision table, one always expects to represent knowledge using as few attributes as possible without losing any information; hence, knowledge reductions are needed. Many types of attribute reduction have been proposed based on rough set theory, each aimed at different requirements. This paper has introduced upper approximation reduction and lower approximation reduction. Based on inclusion degree theory, a new kind of knowledge reduction named maximum distribution reduction has also been presented, which is more useful in making decision rules. As its application, the optimal maximum distribution rules have been examined. Since incomplete information systems are more complicated than complete information systems, further research on knowledge reduction for different requirements in incomplete information systems is needed.

References

[1] Z. Pawlak, Rough sets, International Journal of Computer and Information Sciences, 11(5), 341-356, 1982.
[2] Z. Pawlak, Rough Sets: Theoretical Aspects of Reasoning about Data, Kluwer Academic Publishers, Dordrecht, 1991.
[3] Wenxiu Zhang and Yi Leung, The Uncertainty Reasoning Principles, Xi'an Jiaotong University Press, Xi'an, 1996 (in Chinese).
[4] E. Marczewski, A general scheme of independence in mathematics, Bull. Acad. Polon. Sci., Ser. Sci. Math. Astronom. Phys., 6, 731-736, 1958.
[5] J. Grzymala-Busse and X. Zuo, Classification strategies using certain and possible rules, in: LNAI 1424, RSCTC'98, Springer, 37-44, 1998.
[6] Marzena Kryszkiewicz, Rough set approach to incomplete information systems, Information Sciences, 112, 39-49, 1998.
[7] Wenxiu Zhang, Weizhi Wu, Jiye Liang and Deyu Li, Theory and Method of Rough Sets, Science Press, Beijing, 2001 (in Chinese).
[8] D. Slezak, Approximate reducts in decision tables, in: Proceedings of IPMU'96, Granada, Spain, Vol. 3, 1159-1164, 1996.
[9] M. Beynon, Reducts within the variable precision rough sets model: A further investigation, European Journal of Operational Research, 134, 592-605, 2001.
[10] H. S. Nguyen and D. Slezak, Approximation reducts and association rules: correspondence and complexity results, in: N. Zhong, A. Skowron, S. Ohsuga, editors, Proceedings of RSFDGrC'99, Yamaguchi, Japan, LNAI 1711, 137-145, 1999.
[11] M. Kryszkiewicz, Comparative study of alternative types of knowledge reduction in inconsistent systems, International Journal of Intelligent Systems, 16, 105-120, 2001.
