Rough Set Theory Approach for Attribute Reduction

Indian Journal of Automation & Artificial Intelligence, Vol. 1, Issue 3, March 2013. ISSN: 2320-4001

Lukshmi R.A.¹*, Geetha P.V.¹, Venkatesan P.²
¹ Department of Mathematics, Meenakshi College for Women, Chennai-24
² Department of Statistics, National Institute for Research in Tuberculosis, ICMR, Chennai-31
* [email protected]

Abstract

Knowledge discovery from databases is of practical importance in many fields, including medicine. Many methods are being developed for knowledge discovery, and with the availability of enormous amounts of data, extracting knowledge from databases has become a challenging task. Among the methods proposed by researchers, Rough Set Theory has proved to be an effective tool for knowledge discovery. In this paper, Rough Set Theory and its basic ideas are reviewed and applied to identify symptoms for diagnosing diabetes. The study also discusses how the approach extends to high-dimensional data in the medical domain.

Keywords: Knowledge Discovery, Rough Set Theory, Discernibility Matrix, Reduct, Rule Extraction.

1. Introduction

Knowledge discovery from databases is the process of identifying relevant information in data [2]. Due to the explosive growth of data collections in various fields, extracting knowledge from databases has become challenging, and different techniques and tools are being developed for it. The problem of imperfect or imprecise knowledge has become a crucial issue in artificial intelligence, and there are many approaches to understanding and manipulating imperfect knowledge [11]. One important approach is Rough Set Theory (RST), proposed by Zdzisław Pawlak [8]. RST is an area of 'uncertainty mathematics' closely related to Fuzzy Set Theory [5]. With the increasing volume of data stored in medical databases, RST has provided efficient and effective predictive tools for medical data mining; its most common applications in medicine are diagnosis and prognosis. This work shows how RST is applied to classification and prediction. Section 2 introduces the main aspects of classical RST, Section 3 applies rough set concepts to feature selection for diabetes classification, and Section 4 presents and discusses the results.

2. Rough Sets

RST is a model of approximate reasoning under uncertainty [8]. The rough set philosophy is based on the assumption that some form of data or knowledge is associated with every object of the universe of discourse. RST is unique in that rough set analysis requires no external parameters and uses only the information present in the input data [7]. RST has been continuously developing, and an increasing number of researchers are interested in its methodology. The fundamental idea of RST is the approximation of a set from below and from above, with the approximations built from a classification of knowledge: the lower approximation is characterized by objects that will definitely form part of the subset of interest, while the upper approximation is characterized by objects that will possibly form part of it. Every subset defined through its upper and lower approximations is known as a rough set [10]. RST is similar to Fuzzy Set Theory (FST); however, uncertainty and indefiniteness in RST are expressed by the boundary region of a set, not by partial membership as in FST. Comparing the definitions of classical, fuzzy and rough sets: a classical set is a primitive notion defined intuitively or axiomatically; a fuzzy set is defined by a fuzzy membership function, which involves advanced mathematical structures, numbers and functions; a rough set is defined by topological operations called approximations.


2.1 Information and Decision Systems

Data for rough set analysis are presented in the form of an attribute-value table [1]. Each row of the table represents an object, and each column represents an attribute characterizing the objects. Values of the attributes are acquired either by measurement or through human expertise. This attribute-value table is known as an information system [15]; an information system extended with a decision attribute is termed a decision system. Let U be the universe and A the set of all attributes. For an object x and a subset B of A, the information set of x is defined as Inf_B(x) = {(a, a(x)) : a ∈ B}.
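To make the notion concrete, here is a minimal sketch (not from the paper; the column-oriented `table` layout and the attribute abbreviations are our own illustration) of a decision system and the information set Inf_B(x) in Python:

```python
# A decision system as a column-oriented table:
# attribute name -> list of values, one entry per object.
from typing import Dict, List

DecisionSystem = Dict[str, List[int]]

# Toy fragment in the coding of Section 3 (2 = present, 1 = absent).
table: DecisionSystem = {
    "TH": [2, 2, 1],  # Thirst (condition attribute)
    "HU": [2, 1, 2],  # Hunger (condition attribute)
    "DB": [2, 1, 1],  # Diabetes (decision attribute)
}

def info_set(table: DecisionSystem, B: List[str], x: int) -> set:
    """Inf_B(x) = {(a, a(x)) : a in B} for the object with index x."""
    return {(a, table[a][x]) for a in B}

print(info_set(table, ["TH", "HU"], 0))  # {('TH', 2), ('HU', 2)}
```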

2.2 Equivalence Relation and Equivalence Classes

A binary relation ~ on a set A is an equivalence relation if and only if it is reflexive, symmetric and transitive; that is, for all a, b and c in A:

• a ~ a (reflexivity)
• if a ~ b, then b ~ a (symmetry)
• if a ~ b and b ~ c, then a ~ c (transitivity)

The equivalence class of a under ~, denoted [a], is defined as [a] = {b ∈ A : a ~ b}.

2.2.1 Indiscernibility Classes

Objects characterised by the same features are said to be indiscernible in view of the information available about them. The B-indiscernibility relation IND_B is defined as

(x, y) ∈ IND_B ⇔ Inf_B(x) = Inf_B(y)    (1)

The indiscernibility relation is an equivalence relation, and the indiscernibility classes with respect to a subset B of attributes are the equivalence classes [x]_B, representing the granules of knowledge given by the information system.
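A short sketch of how the B-indiscernibility classes can be computed (reusing the column-oriented `table` layout introduced above; the function name is ours): objects are grouped by the tuple of their values on B, so two objects land in the same group exactly when Inf_B(x) = Inf_B(y).

```python
from collections import defaultdict

def ind_classes(table, B):
    """Return the partition of object indices into B-indiscernibility
    classes: x and y share a class iff they agree on every attribute in B."""
    groups = defaultdict(list)
    n = len(next(iter(table.values())))  # number of objects
    for x in range(n):
        groups[tuple(table[a][x] for a in B)].append(x)
    return list(groups.values())

# E.g. ind_classes(table, ["TH"]) groups objects by their Thirst value.
```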

2.3 Approximations

Rough Set Theory defines three regions based on the equivalence classes induced by the attribute values: the lower approximation, the upper approximation and the boundary. Given a subset X of U, the lower approximation of X, denoted B̲X, and the upper approximation, denoted B̄X, are defined respectively as

Lower approximation: B̲X = {x ∈ U : [x]_B ⊆ X}    (2)

Upper approximation: B̄X = {x ∈ U : [x]_B ∩ X ≠ ∅}    (3)

The set BN_B(X) = B̄X − B̲X is defined as the boundary region of X.    (4)

The lower approximation is a description of the domain objects which are known with certainty to belong to the subset of interest, whereas the upper approximation is a description of the objects which possibly belong to it. The boundary region consists of objects which can be classified with certainty neither as members of X nor as members of U − X. A set whose boundary is empty is a crisp set, whereas one with a nonempty boundary is said to be rough [12]. A rough set X can also be characterized numerically by the following imprecision coefficient:

α_B(X) = |B̲X| / |B̄X|    (5)


called the accuracy of approximation, where |X| denotes the cardinality of X ≠ ∅. Obviously 0 ≤ α_B(X) ≤ 1. If α_B(X) = 1, then X is crisp with respect to B; otherwise, if α_B(X) < 1, X is rough with respect to B.
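The three regions and the accuracy measure translate directly into code. The following sketch (our own helper names, building on `ind_classes` above) computes B̲X, B̄X and α_B(X) for a set X of object indices:

```python
def approximations(table, B, X):
    """Return (lower, upper) approximations of X with respect to B."""
    X = set(X)
    lower, upper = set(), set()
    for cls in map(set, ind_classes(table, B)):
        if cls <= X:
            lower |= cls   # class wholly contained in X   (Eq. 2)
        if cls & X:
            upper |= cls   # class intersecting X          (Eq. 3)
    return lower, upper

def accuracy(table, B, X):
    """Imprecision coefficient alpha_B(X) = |lower| / |upper| (Eq. 5)."""
    lower, upper = approximations(table, B, X)
    return len(lower) / len(upper) if upper else 1.0

# The boundary region (Eq. 4) is upper - lower; X is crisp iff it is empty.
```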

2.4 Reduct and Core

Reduct and core are two major concepts in rough set theory [6]. A reduct is a minimal set of attributes that preserves the indiscernibility relation [17]. More formally, given an information table with attribute set A, a reduct is a set of attributes R ⊆ A such that

i. IND_R(U) = IND_A(U)    (6)

ii. IND_{R−{a}}(U) ≠ IND_A(U) for all a ∈ R    (7)

The core, denoted CORE(A), is the set of attributes that appear in every reduct of A. While the computation of equivalence classes is easy, finding minimal reducts is NP-hard.
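Because reduct computation is NP-hard, practical systems use heuristics, but for a handful of attributes an exhaustive search is feasible. A brute-force sketch (our own illustration, using `ind_classes` from above):

```python
from itertools import combinations

def partition_of(table, B):
    """The B-indiscernibility partition in a canonical, comparable form."""
    return frozenset(frozenset(c) for c in ind_classes(table, B))

def all_reducts(table, A):
    """Enumerate subsets by size; keep those preserving the full partition
    (Eq. 6) that contain no smaller preserving subset (Eq. 7)."""
    full, reducts = partition_of(table, A), []
    for r in range(1, len(A) + 1):
        for R in combinations(A, r):
            if (partition_of(table, list(R)) == full
                    and not any(set(S) < set(R) for S in reducts)):
                reducts.append(R)
    return reducts

def core(reducts):
    """CORE = the attributes common to every reduct."""
    return set.intersection(*map(set, reducts)) if reducts else set()
```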

2.5 Discernibility Matrices and Functions

Let S = (U, A) be an information system with n objects. The discernibility matrix of S is a symmetric n × n matrix with entries c_ij given by

c_ij = {a ∈ A : a(x_i) ≠ a(x_j)} for i, j = 1, …, n    (8)

Each entry thus consists of the set of attributes upon which objects x_i and x_j differ [9], [13]. With every discernibility matrix a discernibility function can be uniquely associated, defined as follows: the discernibility function of an information system S is a Boolean function f_A of k Boolean variables a₁*, a₂*, …, a_k* (one per attribute) defined as

f_A(a₁*, a₂*, …, a_k*) = ⋀ {⋁ c_ij* : 1 ≤ j < i ≤ n, c_ij ≠ ∅}    (9)

where c_ij* is the disjunction of the variables corresponding to the attributes in c_ij. The set of all prime implicants of f_A determines the set of all reducts of A.

2.6 Dependency of Attributes

Another important issue in data analysis is discovering dependencies between attributes [10], [16]. Intuitively, a set of attributes D depends totally on a set of attributes C, denoted C ⇒ D, if all values of attributes from D are uniquely determined by values of attributes from C; in other words, D depends totally on C if there exists a functional dependency between the values of D and C. Dependency can be defined in the following way. Let D and C be subsets of A. D is said to depend on C in degree k (0 ≤ k ≤ 1), denoted C ⇒ D, if

k = γ(C, D) = card(POS_C(D)) / card(U)    (10)

where

POS_C(D) = ⋃_{X ∈ U/IND(D)} C̲X    (11)

The expression POS_C(D) is called the positive region of the partition U/D with respect to C. The coefficient k expresses the proportion of elements of the universe which can be properly classified into blocks of the partition U/D employing the attributes in C. If k = 1 we say that D depends totally on C, and if k < 1 we say that D depends partially (in degree k) on C.

2.7 Quick Reduct Algorithm

The Quick Reduct algorithm [3], [4] computes a reduct greedily, without exhaustively generating all attribute subsets:

QUICKREDUCT(C, D)
(1) R ← {}
(2) repeat
(3)   T ← R
(4)   for each x ∈ (C − R)
(5)     if γ_{R∪{x}}(D) > γ_T(D)
(6)       T ← R ∪ {x}
(7)   R ← T
(8) until γ_R(D) = γ_C(D)
(9) return R

where γ_R(D) = card(POS_R(D)) / card(U).
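Under the stated definitions, the algorithm can be sketched in a few lines of Python (our own rendering, reusing `ind_classes` from Section 2.2.1; being greedy, it may return a superset of a minimal reduct):

```python
def gamma(table, C, D):
    """Dependency degree gamma(C, D) = |POS_C(D)| / |U| (Eqs. 10-11)."""
    n = len(next(iter(table.values())))
    d_blocks = [set(c) for c in ind_classes(table, D)]
    pos = set()
    for cls in map(set, ind_classes(table, C)):
        if any(cls <= block for block in d_blocks):
            pos |= cls  # class lies wholly inside one decision block
    return len(pos) / n

def quick_reduct(table, C, D):
    """Grow R by the attribute giving the largest gamma gain until
    gamma(R, D) matches gamma(C, D)."""
    R, target = [], gamma(table, C, D)
    while gamma(table, R, D) < target:
        best = max((x for x in C if x not in R),
                   key=lambda x: gamma(table, R + [x], D))
        R.append(best)
    return R
```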

2.8 Decision Rules

Decision rules are often presented as implications and are often called "if …, then …" rules. For a set of condition attributes P = {P₁, P₂, …, Pₙ} and a decision attribute Q, Q ∉ P, these rules have the form

(P_i = a) ∧ (P_j = b) ∧ … ∧ (P_n = z) → (Q = d)    (12)

where a, b, …, z and d are legitimate values from the domains of their respective attributes.

2.9 Rule Extraction

The reduction procedure first eliminates condition attributes that are not important to the decision-making process [14], and then induces a set of minimal rules by analyzing the relation between the attribute values of each example and removing dispensable attribute values. The reduced data obtained from this calculation are used for extracting rules [12].

3. Application to Feature Selection for Diabetes Classification

The study data were collected through a survey of 15 employees (men and women aged between 40 and 45) of Meenakshi College for Women, Chennai, consisting of 8 diabetic and 7 non-diabetic persons. The sample group is homogeneous with respect to the nature of work. Rough set concepts are applied to the data to identify the important symptoms for the diagnosis of diabetes. The presence or absence of five diabetes-related symptoms, namely thirst, hunger, frequent urination, weight loss and tiredness, is taken as the set of conditional attributes, and diabetes is taken as the decision attribute.

3.1 Decision System

The attributes are all nominal, and their values in the decision system are presented in Table 1. The presence of a symptom is coded as 2 and its absence as 1; the coded decision system is presented in Table 2.


Table 1. Nominal Values of Attributes

Type of Attribute        Attribute             Nominal Values
Conditional attributes   Thirst                Yes, No
                         Hunger                Yes, No
                         Frequent Urination    Yes, No
                         Weight Loss           Yes, No
                         Tiredness             Yes, No
Decision attribute       Diabetes              Yes, No

Table 2. Coded Decision System

Volunteers   Thirst   Hunger   Frequent Urination   Weight Loss   Tiredness   Diabetic
x1           2        2        1                    1             2           2
x2           2        2        1                    1             1           2
x3           2        2        2                    1             2           2
x4           2        2        2                    1             1           2
x5           2        1        2                    2             2           2
x6           2        2        2                    2             2           2
x7           2        1        1                    1             1           2
x8           2        2        2                    2             2           2
x9           2        2        1                    1             2           1
x10          2        1        2                    1             2           1
x11          2        2        2                    1             2           1
x12          2        1        1                    1             1           1
x13          1        2        1                    2             2           1
x14          1        1        1                    2             1           1
x15          2        2        1                    1             2           1

3.2 Application of Rough Set Concepts to the Decision System

Let TH, HU, FU, WL, TI and DB denote the attributes Thirst, Hunger, Frequent Urination, Weight Loss, Tiredness and Diabetes respectively. Table 3 presents the equivalence (indiscernibility) classes obtained from the decision system, and Table 4 the discernibility matrix.

Table 3. Equivalence Classes Obtained from the Decision System

Attributes          Equivalence (Indiscernibility) Classes
DB                  {x1,x2,x3,x4,x5,x6,x7,x8}, {x9,x10,x11,x12,x13,x14,x15}
TH                  {x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x15}, {x13,x14}
HU                  {x1,x2,x3,x4,x6,x8,x9,x11,x13,x15}, {x5,x7,x10,x12,x14}
FU                  {x1,x2,x7,x9,x12,x13,x14,x15}, {x3,x4,x5,x6,x8,x10,x11}
WL                  {x1,x2,x3,x4,x7,x9,x10,x11,x12,x15}, {x5,x6,x8,x13,x14}
TI                  {x1,x3,x5,x6,x8,x9,x10,x11,x13,x15}, {x2,x4,x7,x12,x14}
TH,HU               {x1,x2,x3,x4,x6,x8,x9,x11,x15}, {x5,x7,x10,x12}, {x13}, {x14}
TH,FU               {x1,x2,x7,x9,x12,x15}, {x3,x4,x5,x6,x8,x10,x11}, {x13,x14}
TH,WL               {x1,x2,x3,x4,x7,x9,x10,x11,x12,x15}, {x5,x6,x8}, {x13,x14}
TH,TI               {x1,x3,x5,x6,x8,x9,x10,x11,x15}, {x2,x4,x7,x12}, {x13}, {x14}
TH,HU,FU            {x1,x2,x9,x15}, {x3,x4,x6,x8,x11}, {x5,x10}, {x7,x12}, {x13}, {x14}
TH,HU,WL            {x1,x2,x3,x4,x9,x11,x15}, {x5}, {x6,x8}, {x7,x10,x12}, {x13}, {x14}
TH,HU,TI            {x1,x3,x6,x8,x9,x11,x15}, {x2,x4}, {x5,x10}, {x7,x12}, {x13}, {x14}
TH,FU,WL            {x1,x2,x7,x9,x12,x15}, {x3,x4,x10,x11}, {x5,x6,x8}, {x13,x14}
TH,FU,TI            {x1,x9,x15}, {x2,x7,x12}, {x4}, {x3,x5,x6,x8,x10,x11}, {x13}, {x14}
TH,WL,TI            {x1,x3,x9,x10,x11,x15}, {x2,x4,x7,x12}, {x5,x6,x8}, {x13}, {x14}
HU,FU,WL            {x1,x2,x9,x15}, {x3,x4,x11}, {x5}, {x6,x8}, {x7,x12}, {x10}, {x13}, {x14}
HU,FU,TI            {x1,x9,x13,x15}, {x2}, {x3,x6,x8,x11}, {x4}, {x5,x10}, {x7,x12,x14}
FU,WL,TI            {x1,x9,x15}, {x2,x7,x12}, {x3,x10,x11}, {x4}, {x5,x6,x8}, {x13}, {x14}
HU,WL,TI            {x1,x3,x9,x11,x15}, {x2,x4}, {x5}, {x6,x8,x13}, {x7,x12}, {x10}, {x14}
TH,HU,FU,WL         {x1,x2,x9,x15}, {x3,x4,x11}, {x5}, {x6,x8}, {x7,x12}, {x10}, {x13}, {x14}
TH,HU,FU,TI         {x1,x9,x15}, {x2}, {x3,x6,x8,x11}, {x4}, {x5,x10}, {x7,x12}, {x13}, {x14}
TH,FU,WL,TI         {x1,x9,x15}, {x2,x7,x12}, {x3,x10,x11}, {x4}, {x5,x6,x8}, {x13}, {x14}
TH,HU,WL,TI         {x1,x3,x9,x11,x15}, {x2,x4}, {x5}, {x6,x8}, {x7,x12}, {x10}, {x13}, {x14}
HU,FU,WL,TI         {x1,x9,x15}, {x2}, {x3,x11}, {x4}, {x5}, {x6,x8}, {x7,x12}, {x10}, {x13}, {x14}
TH,HU,FU,WL,TI      {x1,x9,x15}, {x2}, {x3,x11}, {x4}, {x5}, {x6,x8}, {x7,x12}, {x10}, {x13}, {x14}

Table 4. Discernibility Matrix Associated with the Information System

Since the matrix is symmetric and an entry is relevant only for pairs of objects with different decision values, only the block of rows x1–x8 (diabetic) against columns x9–x15 (non-diabetic) is shown; all remaining entries are Ø. An entry Ø within the block indicates that the two objects are indiscernible on all condition attributes.

        x9         x10        x11        x12          x13          x14              x15
x1      Ø          HU,FU      FU         HU,TI        TH,WL        TH,HU,WL,TI      Ø
x2      TI         HU,FU,TI   FU,TI      HU           TH,WL,TI     TH,HU,WL         TI
x3      FU         HU         Ø          HU,FU,TI     TH,FU,WL     TH,HU,FU,WL,TI   FU
x4      FU,TI      HU,TI      TI         HU,FU        TH,FU,WL,TI  TH,HU,FU,WL      FU,TI
x5      HU,FU,WL   WL         HU,WL      FU,WL,TI     TH,HU,FU     TH,FU,TI         HU,FU,WL
x6      FU,WL      HU,WL      WL         HU,FU,WL,TI  TH,FU        TH,HU,FU,TI      FU,WL
x7      HU,TI      FU,TI      HU,FU,TI   Ø            TH,HU,WL,TI  TH,WL            HU,TI
x8      FU,WL      HU,WL      WL         HU,FU,WL,TI  TH,FU        TH,HU,FU,TI      FU,WL

If a₁, a₂, a₃, a₄, a₅ are the Boolean variables corresponding respectively to the attributes TH, HU, FU, WL, TI, the discernibility function evaluates to f(a₁,a₂,a₃,a₄,a₅) = a₂ ∧ a₃ ∧ a₄ ∧ a₅, so {HU, FU, WL, TI} forms a reduct of the information system. The application of the Quick Reduct algorithm to the decision system is shown below:

Initially R ← {}.
IND[TH] = {x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x15}, {x13,x14}
IND[DB] = {X1, X2}, where X1 = {x1,x2,x3,x4,x5,x6,x7,x8} and X2 = {x9,x10,x11,x12,x13,x14,x15}
POS_{TH}(DB) = {x13,x14} ⇒ γ({TH}, DB) = 2/15, so R ← {TH}.
POS_{TH,HU,FU,WL,TI}(DB) = {x2,x4,x5,x6,x8,x10,x13,x14} ⇒ γ({TH,HU,FU,WL,TI}, DB) = 8/15.
Since γ({TH}, DB) ≠ γ({TH,HU,FU,WL,TI}, DB), the stopping condition fails and the search continues. Eliminating all consistent instances, U = U − POS_{TH}(DB) = {x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x15}.
IND[TH,HU] = {x1,x2,x3,x4,x6,x8,x9,x11,x15}, {x5,x7,x10,x12}, {x13}, {x14}
POS_{TH,HU}(DB) = {x13,x14} ⇒ γ({TH,HU}, DB) = 2/15, which is no improvement over γ({TH}, DB).
Next, IND[TH,FU] = {x1,x2,x7,x9,x12,x15}, {x3,x4,x5,x6,x8,x10,x11}, {x13,x14}
POS_{TH,FU}(DB) = {x13,x14} ⇒ γ({TH,FU}, DB) = 2/15, again no improvement.


IND[TH,WL] = {x1,x2,x3,x4,x7,x9,x10,x11,x12,x15}, {x5,x6,x8}, {x13,x14}
POS_{TH,WL}(DB) = {x5,x6,x8,x13,x14} ⇒ γ({TH,WL}, DB) = 5/15 = 1/3 > γ({TH}, DB), so R ← {TH,WL}. But γ({TH,WL}, DB) ≠ γ({TH,HU,FU,WL,TI}, DB), so the search continues.
Proceeding in this manner and generating the positive regions for all the indiscernibility classes, it is found that γ({HU,FU,WL,TI}, DB) > γ({HU,FU,WL}, DB) = 6/15, and that γ({HU,FU,WL,TI}, DB) = γ({TH,HU,FU,WL,TI}, DB) = 8/15.

4. Results and Discussion

The equivalence classes corresponding to the decision attribute are {x1,x2,x3,x4,x5,x6,x7,x8} and {x9,x10,x11,x12,x13,x14,x15}. Let X1 = {x1,x2,x3,x4,x5,x6,x7,x8} and X2 = {x9,x10,x11,x12,x13,x14,x15}. Considering the subset B = {TH,HU,FU,WL,TI} of attributes, the lower and upper approximations of X1 with respect to B are:

B̲X1 = {x2, x4, x5, x6, x8}    (13)

B̄X1 = {x1,x2,x3,x4,x5,x6,x7,x8,x9,x11,x12,x15}    (14)

The boundary region of X1 with respect to B is

B̄X1 − B̲X1 = {x1, x3, x7, x9, x11, x12, x15}    (15)

Similarly, the lower and upper approximations of X2 with respect to B are:

B̲X2 = {x10, x13, x14}    (16)

B̄X2 = {x1,x3,x7,x9,x10,x11,x12,x13,x14,x15}    (17)

The boundary region of X2 with respect to B is

B̄X2 − B̲X2 = {x1, x3, x7, x9, x11, x12, x15}    (18)

Since the boundaries of X1 and X2 are nonempty, X1 and X2 are rough sets. Also, B̲X1 ∪ B̲X2 ≠ U, since a unique decision cannot be taken for the persons {x1,x3,x7,x9,x11,x12,x15}.    (19)

From the decision system it is noted that

IND[HU,FU,WL,TI] = IND[TH,HU,FU,WL,TI] = {x1,x9,x15}, {x2}, {x3,x11}, {x4}, {x5}, {x6,x8}, {x7,x12}, {x10}, {x13}, {x14}    (20)

⇒ {HU, FU, WL, TI} is a reduct of the information system.
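As a check, the reduct can also be verified computationally. The sketch below (reusing the illustrative helpers from Section 2; the data is transcribed from Table 2) confirms that dropping Thirst leaves the indiscernibility partition, and hence the dependency degree, unchanged:

```python
data = {  # Table 2, columns TH, HU, FU, WL, TI, DB; objects x1..x15
    "TH": [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 2],
    "HU": [2, 2, 2, 2, 1, 2, 1, 2, 2, 1, 2, 1, 2, 1, 2],
    "FU": [1, 1, 2, 2, 2, 2, 1, 2, 1, 2, 2, 1, 1, 1, 1],
    "WL": [1, 1, 1, 1, 2, 2, 1, 2, 1, 1, 1, 1, 2, 2, 1],
    "TI": [2, 1, 2, 1, 2, 2, 1, 2, 2, 2, 2, 1, 2, 1, 2],
    "DB": [2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1],
}
full = ["TH", "HU", "FU", "WL", "TI"]
reduct = ["HU", "FU", "WL", "TI"]

assert partition_of(data, reduct) == partition_of(data, full)  # Eq. (20)
print(gamma(data, reduct, ["DB"]))  # 0.533... = 8/15, as for all attributes
```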


Since γ({HU,FU,WL,TI}, DB) = γ({TH,HU,FU,WL,TI}, DB), the Quick Reduct algorithm likewise implies that the attribute Thirst is dispensable and that {HU,FU,WL,TI} is a reduct of the information system. The discernibility function of the Boolean variables a₁, a₂, a₃, a₄, a₅ (corresponding to the attributes TH, HU, FU, WL, TI) takes the value f(a₁,a₂,a₃,a₄,a₅) = a₂ ∧ a₃ ∧ a₄ ∧ a₅, which also indicates that {HU,FU,WL,TI} forms a reduct.

Rule extraction from the decision system involves merging rows with identical condition and decision attribute values. The core of every row is then calculated, rows repeated as a result of this calculation are merged again, and a new table with reduct values is formed. The resulting reduced information systems are presented in Tables 5 and 6.

Table 5. Reduction of the Information System, Stage 1

Volunteers   Hunger   Frequent Urination   Weight Loss   Tiredness   Diabetic
x1           2        *                    1             *           2
x2           2        *                    1             *           2
x3           2        2                    *             *           2
x4           2        2                    *             *           2
x5           *        2                    2             2           2
x6, x8       *        2                    2             2           2
x7           *        1                    1             1           2
x9, x15      2        1                    *             2           1
x10          1        2                    1             2           1
x11          *        2                    1             2           1
x12          1        1                    1             *           1
x13          *        1                    *             2           1
x14          1        1                    *             1           1

Table 6. Reduction of the Information System, Stage 2

Volunteers    Hunger   Frequent Urination   Weight Loss   Tiredness   Diabetic
x1, x2        2        *                    1             *           2
x3, x4        2        2                    *             *           2
x5, x6, x8    *        2                    2             2           2
x7            *        1                    1             1           2
x9, x15       2        1                    *             2           1
x10           1        2                    1             2           1
x11           *        2                    1             2           1
x12           1        1                    1             *           1
x13           *        1                    *             2           1
x14           1        1                    *             1           1


The following decision rules are generated from the reduced information in Table 6:

i. (HU=2) ∧ (WL=1) ⇒ (DB=2)
ii. (HU=2) ∧ (FU=2) ⇒ (DB=2)
iii. (FU=2) ∧ (WL=2) ∧ (TI=2) ⇒ (DB=2)
iv. (FU=1) ∧ (WL=1) ∧ (TI=1) ⇒ (DB=2)
v. (HU=2) ∧ (FU=1) ∧ (TI=2) ⇒ (DB=1)
vi. (HU=1) ∧ (FU=2) ∧ (WL=1) ∧ (TI=2) ⇒ (DB=1)
vii. (FU=2) ∧ (WL=1) ∧ (TI=2) ⇒ (DB=1)
viii. (HU=1) ∧ (FU=1) ∧ (WL=1) ⇒ (DB=1)
ix. (FU=1) ∧ (TI=2) ⇒ (DB=1)
x. (HU=1) ∧ (FU=1) ∧ (TI=1) ⇒ (DB=1)

These decision rules imply that persons satisfying any of the antecedents (HU=2)∧(WL=1), (HU=2)∧(FU=2), (FU=2)∧(WL=2)∧(TI=2) or (FU=1)∧(WL=1)∧(TI=1) have the consequence (DB=2), namely diabetic, while persons satisfying any of the antecedents (HU=2)∧(FU=1)∧(TI=2), (HU=1)∧(FU=2)∧(WL=1)∧(TI=2), (FU=2)∧(WL=1)∧(TI=2), (HU=1)∧(FU=1)∧(WL=1), (FU=1)∧(TI=2) or (HU=1)∧(FU=1)∧(TI=1) have the consequence (DB=1), namely non-diabetic.
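To illustrate how such rules would be applied, here is a small sketch (our own encoding of rules i–x; the `classify` helper is illustrative) that fires every matching rule against a new case:

```python
# Rules as (antecedent, decision) pairs; 2 = present/diabetic, 1 = absent/non-diabetic.
rules = [
    ({"HU": 2, "WL": 1}, 2),                    # i
    ({"HU": 2, "FU": 2}, 2),                    # ii
    ({"FU": 2, "WL": 2, "TI": 2}, 2),           # iii
    ({"FU": 1, "WL": 1, "TI": 1}, 2),           # iv
    ({"HU": 2, "FU": 1, "TI": 2}, 1),           # v
    ({"HU": 1, "FU": 2, "WL": 1, "TI": 2}, 1),  # vi
    ({"FU": 2, "WL": 1, "TI": 2}, 1),           # vii
    ({"HU": 1, "FU": 1, "WL": 1}, 1),           # viii
    ({"FU": 1, "TI": 2}, 1),                    # ix
    ({"HU": 1, "FU": 1, "TI": 1}, 1),           # x
]

def classify(case, rules):
    """Decisions of every rule whose antecedent the case satisfies."""
    return [d for cond, d in rules
            if all(case.get(a) == v for a, v in cond.items())]

# Profile of volunteer x10: rules vi and vii fire, agreeing on non-diabetic.
print(classify({"HU": 1, "FU": 2, "WL": 1, "TI": 2}, rules))  # [1, 1]
```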

Conclusion

Rough Set Theory, proposed by Pawlak, is a powerful and practical methodology for handling imprecision and uncertainty, for performing data mining and knowledge discovery, and for classifying unknown data based on already gained knowledge. The diabetes data set was drawn from a survey on existing medical problems. The rough set based analysis showed that the most important symptoms are Hunger, Frequent Urination, Weight Loss and Tiredness; these attributes influence the incidence of diabetes in the data, and the extracted rules are consistent with general clinical knowledge about diabetes. Data mining uncovers inter-data connections and relationships, such as the one between symptoms and diseases in a medical database. In this paper, rough set based reduction algorithms, namely discernibility-based reduction and dependency-based attribute reduction, have been used. Computing reducts is useful for reducing the effects of noise and for identifying the factors that need attention. The presented methodology goes beyond this individual application to diabetes, and all of the techniques can be applied to mining other data sets.


References

1. Düntsch, I., & Gediga, G. (2000). Rough set data analysis: A road to non-invasive knowledge discovery.
2. Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P. (1996). From data mining to knowledge discovery in databases. AI Magazine, 17(3), 37.
3. Jensen, R., & Shen, Q. (2004). Semantics-preserving dimensionality reduction: Rough and fuzzy-rough-based approaches. IEEE Transactions on Knowledge and Data Engineering, 16(12), 1457-1471.
4. Kalyani, P., & Karnan, M. (2011). A new implementation of attribute reduction using quick relative reduct algorithm. International Journal of Internet Computing, 1(1), 99-102.
5. Komorowski, J., Pawlak, Z., Polkowski, L., & Skowron, A. (1999). Rough sets: A tutorial. In Rough Fuzzy Hybridization: A New Trend in Decision-Making (pp. 3-98).
6. Magnani, M. (2003). Technical report on rough set theory for knowledge discovery in databases. Bologna, Italy: University of Bologna.
7. Orlowska, E. (Ed.) (1998). Incomplete Information: Rough Set Analysis (Vol. 13). Physica-Verlag.
8. Pawlak, Z. (2002). Rough set theory and its applications. Journal of Telecommunications and Information Technology, 3(2), 7-10.
9. Polkowski, L. (2002). Rough Sets: Mathematical Foundations (Vol. 15). Springer.
10. Rissino, S., & Lambert-Torres, G. (2009). Rough set theory: Fundamental concepts, principals, data extraction, and applications. In J. Ponce & A. Karahoca (Eds.), Data Mining and Knowledge Discovery in Real Life Applications. InTech.
11. Sakr, A., & Mosa, D. (2010). Dealing medical data with fundamentals of new artificial intelligence. International Journal of Engineering Science and Technology, 2(9), 4406-4417.
12. Shen, Q., & Jensen, R. (2007). Rough sets, their extensions and applications. International Journal of Automation and Computing, 4(3), 217-228.
13. Skowron, A., & Rauszer, C. (1992). The discernibility matrices and functions in information systems. In Intelligent Decision Support (pp. 331-362). Springer Netherlands.
14. Sreevani, Y. V., & Rao, T. V. N. (2010). Identification and evaluation of functional dependency analysis using rough sets for knowledge discovery. International Journal of Advanced Computer Science and Applications.
15. Suraj, Z. (2004). An introduction to rough set theory and its applications. ICENCO, Cairo, Egypt.
16. Xu, Y., Cao, Y., & Yang, S. (2011). Research on care of postoperative patient based on rough sets theory. International Journal of Computer Applications, 31(10).
17. Yang, Y., & Chiam, T. C. (2000). Rule discovery based on rough set theory. In Proceedings of the Third International Conference on Information Fusion (FUSION 2000) (Vol. 1, pp. TUC4-11). IEEE.
