Fundamenta Informaticae 94 (2009) 245–260


DOI 10.3233/FI-2009-129 IOS Press

An Incremental Approach for Inducing Knowledge from Dynamic Information Systems

Dun Liu∗†
School of Economics and Management, Southwest Jiaotong University, Chengdu, 610031, China, [email protected]

Tianrui Li
Laboratory of Intelligent Information Processing, School of Information Science and Technology, Southwest Jiaotong University, Chengdu, 610031, China, [email protected]

Da Ruan
Belgian Nuclear Research Centre (SCK•CEN), Boeretang 200, 2400 Mol, Belgium & Department of Applied Mathematics & Computer Science, Ghent University, 9000 Gent, Belgium, [email protected]; [email protected]

Weili Zou
School of Mathematics, Southwest Jiaotong University, Chengdu, 610031, China, [email protected]

Abstract. Knowledge in an information system evolves with its dynamic environment. A new concept of interesting knowledge based on both accuracy and coverage is defined in this paper for dynamic information systems. An incremental model and approach, as well as its algorithm, for inducing interesting knowledge are proposed when the object set varies over time. A case study validates the feasibility of the proposed method.

Keywords: Rough sets, interesting knowledge, accuracy, coverage, dynamic information systems, data mining

∗ Address for correspondence: School of Economics and Management, Southwest Jiaotong University, Chengdu, 610031, China
† This work is supported by the National Science Foundation of China (Nos. 60873108, 60875034) and the Doctoral Innovation Foundation of Southwest Jiaotong University (No. 200907), Chengdu, China.


D. Liu et al. / An Incremental Approach for Inducing Knowledge from Dynamic Information Systems

1. Introduction

Rule induction approaches based on rough sets have developed rapidly over the last decades, with comprehensive applications in knowledge discovery from databases [1]. Empirical studies on water demand [2], financial prediction [3, 4, 5], monopoly recognition [6] and medical diagnosis [7, 8] have shown that these approaches are very helpful for obtaining interesting knowledge, in the form of decision rules, from the original databases.

The core of rule induction lies in the measures of rule quality or performance, that is: what is interesting knowledge? In earlier studies, Wong and Ziarko proposed two measures, the confidence and resolution factors, for inductive learning [9]. In data mining, interesting knowledge is usually induced as certain patterns, represented in the form of association rules. Han and Kamber stated in [10]: "Rule support and confidence are two measures of rule interestingness and they respectively reflect the usefulness and certainty of discovered rules". "Typically, association rules are considered interesting if they satisfy both a minimum support threshold and a minimum confidence threshold. Such thresholds can be set by users or domain experts." In addition, Tsumoto believed that "accuracy and coverage measure the degree of sufficiency and necessity, respectively", and that "the simplest probabilistic model is that which only uses classification rules which have high accuracy and high coverage" [11, 12]. Inspired by these ideas, we choose accuracy and coverage as the two factors describing interesting knowledge in our study. As accuracy and coverage are two statistical measures for rule induction, we use a classification error parameter β to acquire the accuracy value. This can be viewed as an extension of Ziarko's Variable Precision Rough Set model (VPRS) [13].
We can also obtain the accuracy value from the Bayes formula in Decision-Theoretic Rough Set models (DTRS) by using the loss function proposed by Yao [14, 15, 16]. However, current research and applications of knowledge discovery mainly focus on static information systems, that is, the objects and attributes of the information system remain constant. In reality, data sources are dynamic, and the data volume grows in both the attribute and object dimensions at an unprecedented rate. To maintain the effectiveness of knowledge extracted from dynamic data, it is necessary to develop an incremental strategy for updating knowledge.

Incremental learning approaches based on Rough Set Theory (RST) have received much attention. They mainly focus on two cases: (1) the object set of the information system evolves over time while the attribute set remains constant; (2) the attribute set evolves over time while the object set remains constant. In the first case, Shan and Ziarko [17] presented an incremental methodology, based on the discernibility matrix introduced by Skowron and Rauszer [18], for finding all maximally generalized rules. Bang and Bien proposed an incremental inductive learning algorithm that finds a minimal set of rules for a decision table without recalculating over the whole set of instances when a new instance is added to the universe [19]. Tong and An proposed an algorithm for incrementally learning rules based on the ∂-decision matrix, listing seven cases that can occur when a new sample enters the system [20]. Blaszczynski and Slowinski discussed the incremental induction of decision rules from dominance-based rough approximations to select the most interesting representatives in the final set of rules [21]. Zheng and Wang developed RRIA, a rough set and rule tree based incremental knowledge acquisition algorithm that can learn new knowledge more quickly [22]. Hu et al. constructed a novel incremental attribute reduction algorithm for the case when new objects are added to a decision information system [23]. In the second case, Chan proposed incremental mining algorithms for learning classification rules efficiently when the attribute set of the information system evolves over time [24]. Li et al. presented a method for updating the approximations of a concept in an incomplete information system through characteristic relations when the attribute set varies over time [25].

To the best of our knowledge, previous work on incremental learning has mainly concerned the situation where a single object enters the information system. In most real problems, however, some objects enter the system while others leave it simultaneously. In this paper, we consider multiple objects entering and leaving the system at the same time. We propose a concept of interesting knowledge based on both accuracy and coverage. The cardinalities of the condition classes and decision classes are used to calculate the accuracy and coverage of the generated rules. Moreover, to handle the variation of the object set, we present a novel incremental model and approach, together with its algorithm, for inducing interesting knowledge from dynamic information systems. The feature of the proposed method is that it employs matrices of accuracy and coverage to obtain the interesting knowledge.

The rest of the paper is organized as follows: Section 2 provides basic concepts of rough sets and the definition of interesting knowledge. Section 3 gives the model of incremental knowledge updating from dynamic information systems, together with an approach and algorithm for incrementally learning interesting knowledge. Section 4 shows an example to validate the proposed model. The paper ends with conclusions and further research topics in Section 5.

2. Preliminaries

Basic concepts, notations and results of rough sets as well as their extensions are outlined in this section [1, 10, 12, 13, 26].

Definition 2.1. A complete information system is defined as a tuple S = (U, C ∪ D, V, F), where U is a non-empty finite set of objects, C is the set of condition attributes and D is the set of decision attributes. V = ∪_{a∈A} Va, where Va is the domain of the attribute a and A = C ∪ D. F : U × A → V is an information function such that F(x, a) ∈ Va for every x ∈ U, a ∈ A.

Definition 2.2. Let S = (U, C ∪ D, V, F) be a complete information system. We denote U/C = {X1, X2, · · · , Xm}, where Xi (i = 1, 2, · · · , m) is a condition class, and U/D = {D1, D2, · · · , Dn}, where Dj (j = 1, 2, · · · , n) is a decision class. The support, accuracy and coverage of Xi → Dj, ∀Xi ∈ U/C, ∀Dj ∈ U/D, are defined respectively as follows:

Support of Xi → Dj:  Supp(Dj|Xi) = |Xi ∩ Dj|;
Accuracy of Xi → Dj: Acc(Dj|Xi) = |Xi ∩ Dj| / |Xi|;
Coverage of Xi → Dj: Cov(Dj|Xi) = |Xi ∩ Dj| / |Dj|,

where |Xi| and |Dj| denote the cardinalities of the sets Xi and Dj, respectively. Then we may construct the accuracy matrix and the coverage matrix as follows:

Acc(D|X) =
  | Acc(D1|X1)  Acc(D2|X1)  · · ·  Acc(Dn|X1) |
  | Acc(D1|X2)  Acc(D2|X2)  · · ·  Acc(Dn|X2) |
  |     ...         ...     · · ·      ...    |
  | Acc(D1|Xm)  Acc(D2|Xm)  · · ·  Acc(Dn|Xm) |          (1)

Cov(D|X) =
  | Cov(D1|X1)  Cov(D2|X1)  · · ·  Cov(Dn|X1) |
  | Cov(D1|X2)  Cov(D2|X2)  · · ·  Cov(Dn|X2) |
  |     ...         ...     · · ·      ...    |
  | Cov(D1|Xm)  Cov(D2|Xm)  · · ·  Cov(Dn|Xm) |          (2)

Proposition 2.1. 0 ≤ Acc(Dj|Xi) ≤ 1 and Σ_{j=1}^{n} Acc(Dj|Xi) = 1, ∀Xi ∈ U/C, i = 1, 2, · · · , m.

Proposition 2.2. 0 ≤ Cov(Dj|Xi) ≤ 1 and Σ_{i=1}^{m} Cov(Dj|Xi) = 1, ∀Dj ∈ U/D, j = 1, 2, · · · , n.

The accuracy matrix and coverage matrix help us extract the remarkable rules from the original database. In general, in knowledge discovery we only care about the information with both a high accuracy and a high coverage, as such information indicates a kind of interesting knowledge. Therefore, it is necessary to set two thresholds, on the accuracy and on the coverage, to obtain the interesting knowledge, just like the support threshold in association rule mining.

Definition 2.3. If Acc(Dj|Xi) ≥ α and Cov(Dj|Xi) ≥ γ, ∀Xi (i = 1, 2, · · · , m), ∀Dj (j = 1, 2, · · · , n), we call the rule Xi → Dj interesting knowledge, where α ∈ (0.5, 1) and γ ∈ (0, 1).

By the definition of the relative error ratio E(Dj|Xi) = 1 − |Xi ∩ Dj| / |Xi| (|Xi| > 0) and the classification error β already proposed in VPRS, we get α = 1 − β. Therefore, the interesting knowledge generated by Definition 2.3 takes the tolerance and the interestingness of knowledge into account simultaneously.
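The quantities of Definitions 2.2 and 2.3 are easy to sketch in code. The following Python fragment is a minimal illustration, not the paper's implementation; the data layout (a dict mapping object ids to attribute-value rows) and all function names are our own assumptions, and the four-object system at the end is purely hypothetical.

```python
from collections import defaultdict

def partitions(objects, attrs):
    """Group object ids into equivalence classes by their values on attrs
    (this yields U/C or U/D)."""
    blocks = defaultdict(set)
    for oid, row in objects.items():
        blocks[tuple(row[a] for a in attrs)].add(oid)
    return list(blocks.values())

def acc_cov(cond_classes, dec_classes):
    """Accuracy and coverage matrices of Definition 2.2."""
    acc = [[len(X & D) / len(X) for D in dec_classes] for X in cond_classes]
    cov = [[len(X & D) / len(D) for D in dec_classes] for X in cond_classes]
    return acc, cov

def interesting(acc, cov, alpha, gamma):
    """Index pairs (i, j) of rules X_i -> D_j passing Definition 2.3."""
    return [(i, j) for i, row in enumerate(acc) for j, a in enumerate(row)
            if a >= alpha and cov[i][j] >= gamma]

# A tiny hypothetical system: one condition attribute 'a', one decision 'd'.
objs = {1: {'a': 0, 'd': 0}, 2: {'a': 0, 'd': 0},
        3: {'a': 0, 'd': 1}, 4: {'a': 1, 'd': 1}}
C = partitions(objs, ['a'])   # U/C = [{1, 2, 3}, {4}]
D = partitions(objs, ['d'])   # U/D = [{1, 2}, {3, 4}]
acc, cov = acc_cov(C, D)
```

With α = 0.6 and γ = 0.4, the rules X1 → D1 (accuracy 2/3, coverage 1) and X2 → D2 (accuracy 1, coverage 0.5) pass both thresholds.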

3. An approach for incrementally learning interesting knowledge and its algorithm

In this section, we discuss the change of interesting knowledge in dynamic information systems when the object set evolves over time while the attribute set remains constant. Assume the incremental learning process of interesting knowledge runs from time t to time t + 1. To describe a dynamic information system, we denote a complete information system at time t as S = (U, C ∪ D, V, F), with the condition class set U/C = {X1, X2, · · · , Xm} and the decision class set U/D = {D1, D2, · · · , Dn}, where U is a non-empty finite set of objects at time t. At time t + 1, some objects enter the system while some leave it, so the original information system S changes into S′ = (U′, C′ ∪ D′, V′, F′). Similarly, we denote the accuracy matrix and the coverage matrix at time t as Acc(t)(D|X) and Cov(t)(D|X), and those at time t + 1 as Acc(t+1)(D′|X′) and Cov(t+1)(D′|X′), according to Definition 2.2. By Definition 2.3, the rule Xi → Dj is interesting at time t if Acc(t)(Dj|Xi) ≥ α and Cov(t)(Dj|Xi) ≥ γ, and the rule Xi′ → Dj′ is interesting at time t + 1 if Acc(t+1)(Dj′|Xi′) ≥ α and Cov(t+1)(Dj′|Xi′) ≥ γ. In the following, we first construct a model for incrementally learning interesting knowledge when multiple objects enter and leave the system, then describe its implementation, and finally provide the corresponding algorithm for inducing interesting knowledge.


Figure 1. Objects’ immigration and emigration

3.1. The model for incremental learning interesting knowledge

Suppose N objects enter the system and M objects leave the system at time t + 1. We denote the set of the N incoming objects as N and the set of the M departing objects as M. Furthermore, we assume the N new objects form l new condition classes Xm+1, Xm+2, · · · , Xm+l and r new decision classes Dn+1, Dn+2, · · · , Dn+r. Then we count the number Ni of incoming objects that enter the condition class Xi (i = 1, 2, · · · , m + l), and the number Mi of objects that leave the condition class Xi (i = 1, 2, · · · , m). The detailed process of object immigration and emigration is shown in Figure 1. Thus we have:

Ni = Σ_{j=1}^{n+r} Nij,   N = Σ_{i=1}^{m+l} Ni = Σ_{i=1}^{m+l} Σ_{j=1}^{n+r} Nij;
Mi = Σ_{j=1}^{n} Mij,     M = Σ_{i=1}^{m} Mi = Σ_{i=1}^{m} Σ_{j=1}^{n} Mij,

where Nij denotes the number of incoming objects that enter the condition class Xi and fall into the decision class Dj, and Mij denotes the number of departing objects that leave the condition class Xi and the decision class Dj. Hence, we obtain the relationship of condition classes and decision classes between time t and time t + 1. As mentioned previously, at time t we have the set of condition classes U/C = {X1, X2, · · · , Xm} and the set of decision classes U/D = {D1, D2, · · · , Dn}. At time t + 1, the set of condition classes is written as U′/C′ = {X1′, X2′, · · · , Xm′, · · · , X′m+l} and the set of decision classes as U′/D′ = {D1′, D2′, · · · , Dn′, · · · , D′n+r}. Their cardinalities are:

|Xi′| = |Xi| + Ni − Mi,   i = 1, 2, · · · , m
|Xi′| = Ni,               i = m + 1, m + 2, · · · , m + l

|Dj′| = |Dj| + Σ_{i=1}^{m+l} Nij − Σ_{i=1}^{m} Mij,   j = 1, 2, · · · , n
|Dj′| = Σ_{i=1}^{m+l} Nij,                            j = n + 1, n + 2, · · · , n + r
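The cardinality bookkeeping above can be written directly from the immigration counts N = (Nij) and the emigration counts M = (Mij). The sketch below is our own rendering (the helper name and list-of-lists layout are assumptions); the example numbers are the cardinalities of the illustration in Section 4, where 10 copies of x1 leave X1, 10 copies of x9 enter X5, and 20 new X7 objects with decision D4 arrive.

```python
def updated_cardinalities(X_sizes, D_sizes, N, M):
    """Compute |X'_i| and |D'_j| at time t+1.
    X_sizes, D_sizes: |X_i| and |D_j| at time t (m and n entries).
    N: (m+l) x (n+r) immigration counts N_ij; M: m x n emigration counts M_ij."""
    m, n = len(X_sizes), len(D_sizes)
    ml, nr = len(N), len(N[0])
    X_new = [(X_sizes[i] if i < m else 0) + sum(N[i])
             - (sum(M[i]) if i < m else 0) for i in range(ml)]
    D_new = [(D_sizes[j] if j < n else 0)
             + sum(N[i][j] for i in range(ml))
             - sum(M[i][j] for i in range(m)) for j in range(nr)]
    return X_new, D_new

# Cardinalities from Table 1 of Section 4 (m = 6, n = 4, l = 1, r = 0):
X_sizes = [10, 20, 28, 42, 70, 50]
D_sizes = [28, 55, 104, 33]
N = [[0] * 4 for _ in range(7)]
N[4][2] = 10   # 10 copies of x9 enter X5, all with decision D3
N[6][3] = 20   # 20 new X7 objects, all with decision D4
M = [[0] * 4 for _ in range(6)]
M[0][0] = 10   # 10 copies of x1 leave X1 (decision D1)
X_new, D_new = updated_cardinalities(X_sizes, D_sizes, N, M)
```

This yields |X′| = (0, 20, 28, 42, 80, 50, 20) and |D′| = (18, 55, 114, 53), consistent with the denominators used in the worked example of Section 4.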


Following Figure 1 and the above analysis, we calculate the accuracy and the coverage at time t + 1:

Acc(t+1)(Dj′|Xi′) =
  (|Xi ∩ Dj| + Nij − Mij) / (|Xi| + Ni − Mi),   i = 1, · · · , m;         j = 1, · · · , n
  Nij / (|Xi| + Ni − Mi),                       i = 1, · · · , m;         j = n + 1, · · · , n + r
  Nij / Ni,                                     i = m + 1, · · · , m + l; j = 1, · · · , n + r

Cov(t+1)(Dj′|Xi′) =
  (|Xi ∩ Dj| + Nij − Mij) / (|Dj| + Σ_{i=1}^{m+l} Nij − Σ_{i=1}^{m} Mij),   i = 1, · · · , m;         j = 1, · · · , n
  Nij / (|Dj| + Σ_{i=1}^{m+l} Nij − Σ_{i=1}^{m} Mij),                       i = m + 1, · · · , m + l; j = 1, · · · , n
  Nij / Σ_{i=1}^{m+l} Nij,                                                  i = 1, · · · , m + l;     j = n + 1, · · · , n + r

Note that |Xi′| and |Dj′| can be equal to zero, because we also take into account the M objects that emigrate from the condition and decision classes. If Xi∗ (i∗ ∈ {1, 2, · · · , m}) is eliminated from the system, all the objects related to X′i∗ have left between time t and time t + 1, without any incoming object belonging to X′i∗. According to Definition 2.2, the row of Acc(t+1)(Dj′|X′i∗) in the matrix Acc(t+1)(D′|X′) and the row of Cov(t+1)(Dj′|X′i∗) in the matrix Cov(t+1)(D′|X′) are then all zero. Similarly, if Dj∗ (j∗ ∈ {1, 2, · · · , n}) is eliminated from the system, all the objects related to D′j∗ have emigrated and no new object falls into D′j∗. In this case, the column of Acc(t+1)(D′j∗|Xi′) in the matrix Acc(t+1)(D′|X′) and the column of Cov(t+1)(D′j∗|Xi′) in the matrix Cov(t+1)(D′|X′) are all zero.

In particular, since Acc(t)(Dj|Xi) = |Xi ∩ Dj| / |Xi| and Cov(t)(Dj|Xi) = |Xi ∩ Dj| / |Dj| for i = 1, 2, · · · , m and j = 1, 2, · · · , n at time t, we get:

Acc(t+1)(Dj′|Xi′) = |Xi| / (|Xi| + Ni − Mi) · Acc(t)(Dj|Xi) + (Nij − Mij) / (|Xi| + Ni − Mi)

Cov(t+1)(Dj′|Xi′) = |Dj| / (|Dj| + Σ_{i=1}^{m+l} Nij − Σ_{i=1}^{m} Mij) · Cov(t)(Dj|Xi) + (Nij − Mij) / (|Dj| + Σ_{i=1}^{m+l} Nij − Σ_{i=1}^{m} Mij),

where i = 1, 2, · · · , m and j = 1, 2, · · · , n. These two formulae display the relation between the accuracy and coverage at time t and at time t + 1, and reveal the inner connection of the two pairs of matrices. Based on this analysis, we now focus on the implementation of the model when the object set of the information system varies with time.
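The two recursive formulae can be checked numerically: updating Acc and Cov incrementally must agree with recomputing them from the new cardinalities. The function names below are our own; the numbers are the X5/D3 figures of the example in Section 4 (|X5| = 70, |X5 ∩ D3| = 47, |D3| = 104, with N5 = N53 = 10 and no emigration from X5 or D3).

```python
def acc_update(acc_t, x_size, n_i, m_i, n_ij, m_ij):
    """Acc at t+1 for a surviving pair (i <= m, j <= n) via the incremental formula:
    |X_i|/(|X_i|+N_i-M_i) * Acc^t + (N_ij - M_ij)/(|X_i|+N_i-M_i)."""
    denom = x_size + n_i - m_i
    return (x_size * acc_t + n_ij - m_ij) / denom

def cov_update(cov_t, d_size, in_j, out_j, n_ij, m_ij):
    """Cov at t+1; in_j / out_j are the column sums of N and M for class D_j."""
    denom = d_size + in_j - out_j
    return (d_size * cov_t + n_ij - m_ij) / denom

# Incremental update vs. direct recomputation for X5 -> D3:
acc_new = acc_update(47 / 70, 70, 10, 0, 10, 0)    # equals (47 + 10) / (70 + 10)
cov_new = cov_update(47 / 104, 104, 10, 0, 10, 0)  # equals (47 + 10) / (104 + 10)
```

Both routes give the same values (57/80 = 0.7125 and 57/114 = 0.5), which is the point of the recursion: the matrices at time t + 1 can be obtained from the old matrices and the counts Nij, Mij alone.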

3.2. The implementation of the model

This section introduces the implementation of the model and the calculation process for the cardinalities Ni, Mi, Nij and Mij. Since the immigration (emigration) of an object set can be regarded as the composition of single-object immigrations (emigrations), we only consider the case of a single object entering or leaving the system.

3.2.1. Single object enters the information system

Assume a new object x enters the information system.

Case 1: Forming a new condition class and a new decision class.


In this case, x ∉ Xi (i = 1, 2, · · · , m) and x ∉ Dj (j = 1, 2, · · · , n), i.e., x has a different antecedent and succedent from all the objects in U. The immigration of x is then independent of the former system U. It generates a new condition class X′m+1 and a new decision class D′n+1. At this time, Acc(t+1)(D′n+1|X′m+1) = 1 and Cov(t+1)(D′n+1|X′m+1) = 1. In addition, Acc(t+1)(Dj′|Xi′) = Acc(t)(Dj|Xi) and Cov(t+1)(Dj′|Xi′) = Cov(t)(Dj|Xi) for i = 1, 2, · · · , m; j = 1, 2, · · · , n.

Case 2: Only forming a new condition class.

That is, x ∉ Xi (i = 1, 2, · · · , m) and ∃j∗ ∈ {1, 2, · · · , n} such that x ∈ Dj∗, which means x forms a new condition class X′m+1. The immigration of x increases the cardinality of Dj∗. Therefore,

Acc(t+1)(D′j∗|X′m+1) = 1, Cov(t+1)(D′j∗|X′m+1) = 1/(|Dj∗| + 1) for X′m+1 → D′j∗;
Acc(t+1)(D′j∗|Xu′) = Acc(t)(Dj∗|Xu), Cov(t+1)(D′j∗|Xu′) = |Xu ∩ Dj∗|/(|Dj∗| + 1) for Xu′ → D′j∗ (u ≠ m + 1);
Acc(t+1)(Dk′|X′m+1) = Cov(t+1)(Dk′|X′m+1) = 0 for X′m+1 → Dk′, since X′m+1 ∩ Dk = ∅ (k ≠ j∗);
Acc(t+1)(Dj′|Xi′) = Acc(t)(Dj|Xi) and Cov(t+1)(Dj′|Xi′) = Cov(t)(Dj|Xi) for i ≠ m + 1, j ≠ j∗.

Case 3: Only forming a new decision class.

That is, ∃i∗ ∈ {1, 2, · · · , m} such that x ∈ Xi∗, and x ∉ Dj (j = 1, 2, · · · , n). The immigration of x generates a new inconsistent rule, and x forms a new decision class D′n+1. In this case,

Acc(t+1)(D′n+1|X′i∗) = 1/(|Xi∗| + 1), Cov(t+1)(D′n+1|X′i∗) = 1 for X′i∗ → D′n+1;
Acc(t+1)(Dk′|X′i∗) = |Xi∗ ∩ Dk|/(|Xi∗| + 1), Cov(t+1)(Dk′|X′i∗) = Cov(t)(Dk|Xi∗) for X′i∗ → Dk′ (k ≠ n + 1);
Acc(t+1)(D′n+1|Xu′) = Cov(t+1)(D′n+1|Xu′) = 0 for Xu′ → D′n+1, since Xu′ ∩ D′n+1 = ∅ (u ≠ i∗);
Acc(t+1)(Dj′|Xi′) = Acc(t)(Dj|Xi) and Cov(t+1)(Dj′|Xi′) = Cov(t)(Dj|Xi) for i ≠ i∗, j ≠ n + 1.

Case 4: Neither generating a new condition class nor a new decision class.

Namely, ∃i∗ ∈ {1, 2, · · · , m} such that x ∈ Xi∗ and ∃j∗ ∈ {1, 2, · · · , n} such that x ∈ Dj∗. The immigration of x thus increases the support of the rule Xi∗ → Dj∗. We obtain

Acc(t+1)(D′j∗|X′i∗) = (|Xi∗ ∩ Dj∗| + 1)/(|Xi∗| + 1), Cov(t+1)(D′j∗|X′i∗) = (|Xi∗ ∩ Dj∗| + 1)/(|Dj∗| + 1) for X′i∗ → D′j∗;
Acc(t+1)(Dk′|X′i∗) = |Xi∗ ∩ Dk|/(|Xi∗| + 1), Cov(t+1)(Dk′|X′i∗) = Cov(t)(Dk|Xi∗) for X′i∗ → Dk′ (k ≠ j∗);
Acc(t+1)(D′j∗|Xu′) = Acc(t)(Dj∗|Xu), Cov(t+1)(D′j∗|Xu′) = |Xu ∩ Dj∗|/(|Dj∗| + 1) for Xu′ → D′j∗ (u ≠ i∗);
Acc(t+1)(Dj′|Xi′) = Acc(t)(Dj|Xi) and Cov(t+1)(Dj′|Xi′) = Cov(t)(Dj|Xi) for i ≠ i∗, j ≠ j∗.

In particular, if Xi∗ ⊆ Dj∗, the immigration of x does not change the consistency of Xi∗ → Dj∗; it just increases the support of the rule. If Xi∗ ∩ Dj∗ = ∅, the immigration of x generates an inconsistent decision rule, and we have Acc(t+1)(D′j∗|X′i∗) = (|Xi∗ ∩ Dj∗| + 1)/(|Xi∗| + 1) = 1/(|Xi∗| + 1) and Cov(t+1)(D′j∗|X′i∗) = (|Xi∗ ∩ Dj∗| + 1)/(|Dj∗| + 1) = 1/(|Dj∗| + 1) for X′i∗ → D′j∗.
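The four immigration cases reduce to two membership tests on the incoming object. A minimal dispatcher could look like the following sketch (the function name, argument layout, and the sample class keys are our own assumptions, not part of the paper):

```python
def immigration_case(x_cond, x_dec, cond_keys, dec_keys):
    """Return which case of Subsection 3.2.1 applies when an object with
    condition-attribute tuple x_cond and decision value x_dec arrives.
    cond_keys / dec_keys hold the value tuples of the existing classes."""
    new_cond = x_cond not in cond_keys
    new_dec = x_dec not in dec_keys
    if new_cond and new_dec:
        return 1   # new condition class and new decision class
    if new_cond:
        return 2   # only a new condition class
    if new_dec:
        return 3   # only a new decision class
    return 4       # both classes already exist

# Hypothetical existing classes: two condition-value tuples, two decisions.
cond_keys = {(0, 0), (0, 1)}
dec_keys = {0, 1}
```

For example, an object with condition tuple (1, 1) and decision 2 falls under Case 1, while one with tuple (0, 0) and decision 1 falls under Case 4.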

3.2.2. Single object gets out of the system

Assume an object x̃ ∈ U gets out of the system, i.e., ∃i∗ ∈ {1, 2, · · · , m}, j∗ ∈ {1, 2, · · · , n} such that x̃ ∈ Xi∗ and x̃ ∈ Dj∗. Therefore,

Acc(t+1)(D′j∗|X′i∗) = (|Xi∗ ∩ Dj∗| − 1)/(|Xi∗| − 1), Cov(t+1)(D′j∗|X′i∗) = (|Xi∗ ∩ Dj∗| − 1)/(|Dj∗| − 1) for X′i∗ → D′j∗;
Acc(t+1)(Dk′|X′i∗) = |Xi∗ ∩ Dk|/(|Xi∗| − 1), Cov(t+1)(Dk′|X′i∗) = Cov(t)(Dk|Xi∗) for X′i∗ → Dk′ (k ≠ j∗);
Acc(t+1)(D′j∗|Xu′) = Acc(t)(Dj∗|Xu), Cov(t+1)(D′j∗|Xu′) = |Xu ∩ Dj∗|/(|Dj∗| − 1) for Xu′ → D′j∗ (u ≠ i∗);
Acc(t+1)(Dj′|Xi′) = Acc(t)(Dj|Xi) and Cov(t+1)(Dj′|Xi′) = Cov(t)(Dj|Xi) for i ≠ i∗, j ≠ j∗.

In particular, (1) if Xi∗ ⊆ Dj∗, then Acc(t+1)(D′j∗|X′i∗) = (|Xi∗ ∩ Dj∗| − 1)/(|Xi∗| − 1) = (|Xi∗| − 1)/(|Xi∗| − 1) = 1 = Acc(t)(Dj∗|Xi∗) for X′i∗ → D′j∗; (2) if Xi∗ ∩ Dj∗ = {x̃}, then Xi∗ ∩ Dj∗ − {x̃} = ∅, and thus Acc(t+1)(D′j∗|X′i∗) = Cov(t+1)(D′j∗|X′i∗) = 0 for X′i∗ → D′j∗.
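For the departing-object case, only the pair (i∗, j∗) needs new numerator and denominator counts. The sketch below (function name ours) also covers the two special situations just noted:

```python
def rule_after_removal(inter, x_size, d_size):
    """Acc and Cov of X'_{i*} -> D'_{j*} after one object of that rule leaves:
    |X ∩ D|, |X| and |D| each decrease by one."""
    if x_size <= 1 or d_size <= 1:
        return 0.0, 0.0   # the class itself vanishes; its row/column becomes zero
    return (inter - 1) / (x_size - 1), (inter - 1) / (d_size - 1)
```

For instance, `rule_after_removal(3, 3, 5)` returns (1.0, 0.5): when Xi∗ ⊆ Dj∗ the accuracy stays 1. And `rule_after_removal(1, 2, 2)` returns (0.0, 0.0), the case Xi∗ ∩ Dj∗ = {x̃}.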

3.3. The incremental algorithm for learning interesting knowledge from dynamic information systems

The concrete steps of the incremental approach for updating interesting knowledge are as follows.

Step 1: Calculate the accuracy and coverage at time t according to Definition 2.2: Acc(t)(Dj|Xi) = |Xi ∩ Dj|/|Xi| and Cov(t)(Dj|Xi) = |Xi ∩ Dj|/|Dj|, ∀i ∈ {1, 2, · · · , m}, ∀j ∈ {1, 2, · · · , n}. The accuracy matrix and coverage matrix are then built up: Acc(t)(D|X) = (Acc(t)(Dj|Xi))m×n, Cov(t)(D|X) = (Cov(t)(Dj|Xi))m×n.

Step 2: Calculate the accuracy matrix Acc(t+1)(D′|X′) = (Acc(t+1)(Dj′|Xi′))(m+l)×(n+r) and the coverage matrix Cov(t+1)(D′|X′) = (Cov(t+1)(Dj′|Xi′))(m+l)×(n+r) at time t + 1 by using Algorithm 1, which gives the detailed steps of updating interesting knowledge after the system varies with time.

Step 3: Construct two 2-dimensional attribute value pairs, (Acc(t)(Dj|Xi), Cov(t)(Dj|Xi)) for every Xi → Dj and (Acc(t+1)(Dj′|Xi′), Cov(t+1)(Dj′|Xi′)) for every Xi′ → Dj′. Then output all the interesting knowledge at times t and t + 1, respectively. We can thus trace the change of interesting knowledge over time from the variation between these two tables.

4. An illustration

In this section, we provide an example to show how to use the above approach and algorithm to update interesting knowledge dynamically. In the information system given in Table 1, U = {x1, x2, · · · , x12}, C = {a1, a2, a3}, D = {d}. Then U/C = {X1, X2, X3, X4, X5, X6} and U/D = {D1, D2, D3, D4} = {0, 1, 2, 3}, where X1 = {x1}, X2 = {x2, x3}, X3 = {x4, x5}, X4 = {x6}, X5 = {x7, x8, x9}, X6 = {x10, x11, x12}, D1 = {x1, x2, x7}, D2 = {x3, x4, x8, x10}, D3 = {x6, x9, x11}, D4 = {x5, x12}. Num stands for the cardinality of each object, i.e., how many identical copies of it the system contains. The accuracy matrix and the coverage matrix at time t are then listed, respectively, as follows.


Data: An information system S = (U, C ∪ D, V, F), two thresholds α and γ, the incoming object set N and the outgoing object set M.
Result: Accuracy matrix, coverage matrix, interesting knowledge at times t and t + 1.
for i = 1 to m do
    for j = 1 to n do
        Calculate Acc(t)(Dj|Xi) and Cov(t)(Dj|Xi) at time t.
    end
end
for i = 1 to m do
    for j = 1 to n do
        if Acc(t)(Dj|Xi) ≥ α and Cov(t)(Dj|Xi) ≥ γ then
            Output the interesting knowledge Xi → Dj at time t.
        end
    end
end
l ← 0; r ← 0
for each x ∈ N entering the system at time t + 1 do
    if x belongs to no condition class Xi (i = 1, · · · , m + l) then
        if x belongs to no decision class Dj (j = 1, · · · , n + r) then
            Obtain a new condition class and a new decision class; do Case 1 of Subsection 3.2.1; l++, r++;
        else
            Obtain a new condition class; do Case 2 of Subsection 3.2.1; l++;
        end
    else
        if x belongs to no decision class Dj (j = 1, · · · , n + r) then
            Obtain a new decision class; do Case 3 of Subsection 3.2.1; r++;
        else
            Do Case 4 of Subsection 3.2.1.
        end
    end
end
for each x̃ ∈ M leaving the system at time t + 1 do
    Update the affected entries as in Subsection 3.2.2.
end
Calculate the accuracy matrix Acc(t+1)(D′|X′) and the coverage matrix Cov(t+1)(D′|X′) at time t + 1.
for i = 1 to m + l do
    for j = 1 to n + r do
        if Acc(t+1)(Dj′|Xi′) ≥ α and Cov(t+1)(Dj′|Xi′) ≥ γ then
            Output the interesting knowledge Xi′ → Dj′ at time t + 1.
        end
    end
end

Algorithm 1: The incremental algorithm for updating interesting knowledge


Table 1. A complete information system

U     a1   a2   a3   d    Num        U     a1   a2   a3   d    Num
x1    0    0    0    0    10         x7    1    2    2    0    3
x2    0    1    0    0    15         x8    1    2    2    1    20
x3    0    1    0    1    5          x9    1    2    2    2    47
x4    0    1    1    1    25         x10   2    2    2    1    5
x5    0    1    1    3    3          x11   2    2    2    2    15
x6    1    1    2    2    42         x12   2    2    2    3    30

Acc(t)(D|X) =
  | 1      0      0      0     |
  | 0.75   0.25   0      0     |
  | 0      0.893  0      0.107 |
  | 0      0      1      0     |
  | 0.043  0.286  0.671  0     |
  | 0      0.1    0.3    0.6   |          (3)

Cov(t)(D|X) =
  | 0.357  0      0      0     |
  | 0.536  0.091  0      0     |
  | 0      0.454  0      0.091 |
  | 0      0      0.404  0     |
  | 0.107  0.364  0.452  0     |
  | 0      0.091  0.144  0.909 |          (4)

Then we consider the migration of objects at time t + 1. Suppose that at time t + 1: (1) x1 (10 objects in total) gets out of X1; (2) x9 (10 objects in total) enters X5; (3) a new sample set X7 (where a1 = 2, a2 = 3, a3 = 2, d = 3) immigrates into the system, and the cardinality of X7 is 20. According to the approach and algorithm proposed in Section 3, the incremental process for updating knowledge goes as follows.

(1) x1 (10 objects in total) gets out of X1. From Subsection 3.2.2, we have Acc(t+1)(Dj′|X1′) = Cov(t+1)(Dj′|X1′) = 0, and for Xu′ → D1′ (u ≠ 1), Acc(t+1)(D1′|Xu′) = Acc(t)(D1|Xu) and Cov(t+1)(D1′|Xu′) = |Xu ∩ D1|/(|D1| − 10).

(2) x9 (10 objects in total) enters X5. Due to Case 4 in Subsection 3.2.1, we have:
Acc(t+1)(D3′|X5′) = (|X5 ∩ D3| + 10)/(|X5| + 10), Cov(t+1)(D3′|X5′) = (|X5 ∩ D3| + 10)/(|D3| + 10) for X5′ → D3′;
Acc(t+1)(Dk′|X5′) = |X5 ∩ Dk|/(|X5| + 10), Cov(t+1)(Dk′|X5′) = Cov(t)(Dk|X5) for X5′ → Dk′ (k ≠ 3);
Acc(t+1)(D3′|Xu′) = Acc(t)(D3|Xu), Cov(t+1)(D3′|Xu′) = |Xu ∩ D3|/(|D3| + 10) for Xu′ → D3′ (u ≠ 5).


(3) A new sample set X7 (where a1 = 2, a2 = 3, a3 = 2, d = 3) enters the system. Due to Case 2 in Subsection 3.2.1, we have:
Acc(t+1)(D4′|X7′) = 1, Cov(t+1)(D4′|X7′) = 20/(|D4| + 20) for X7′ → D4′;
Acc(t+1)(Dk′|X7′) = Cov(t+1)(Dk′|X7′) = 0 for X7′ → Dk′ (k ≠ 4);
Acc(t+1)(D4′|Xu′) = Acc(t)(D4|Xu), Cov(t+1)(D4′|Xu′) = |Xu ∩ D4|/(|D4| + 20) for Xu′ → D4′ (u ≠ 7).

Therefore, the accuracy matrix and the coverage matrix at time t + 1 are calculated, respectively, as follows.

Acc(t+1)(D′|X′) =
  | 0      0      0      0     |
  | 0.75   0.25   0      0     |
  | 0      0.893  0      0.107 |
  | 0      0      1      0     |
  | 0.038  0.25   0.712  0     |
  | 0      0.1    0.3    0.6   |
  | 0      0      0      1     |          (5)

Cov(t+1)(D′|X′) =
  | 0      0      0      0     |
  | 0.833  0.091  0      0     |
  | 0      0.454  0      0.057 |
  | 0      0      0.368  0     |
  | 0.167  0.364  0.5    0     |
  | 0      0.091  0.132  0.566 |
  | 0      0      0      0.377 |          (6)
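A few entries of matrices (5) and (6) can be checked by hand from the cardinalities in Table 1 (|X5| = 70, |X5 ∩ D3| = 47, |D1| = 28, |X2 ∩ D1| = 15, |D4| = 33); the variable names below are ours:

```python
# (2): the x9 batch of 10 enters X5, all falling into D3
acc_d3_x5 = (47 + 10) / (70 + 10)    # 0.7125, printed as 0.712 in matrix (5)

# (1): 10 copies of x1 leave D1, so |D1'| = 28 - 10 = 18
cov_d1_x2 = 15 / (28 - 10)           # about 0.833

# (3): the 20 new X7 objects all fall into D4, so |D4'| = 33 + 20 = 53
cov_d4_x7 = 20 / (33 + 20)           # about 0.377
```

Each value matches the corresponding entry of the updated matrices.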

Obviously, different thresholds on the accuracy and coverage lead to different derived interesting knowledge. The incremental learning of interesting knowledge as the objects change can be tracked as follows. First, we construct the attribute value table proposed in Section 3 to describe all the rules. Table 2 gives the accuracy and coverage of every Xi → Dj at time t, according to (3) and (4). Next, if α = 0.6 and γ = 0.4, we find that X2 → D1, X3 → D2, X4 → D3, X5 → D3 and X6 → D4 are interesting knowledge at time t according to Definition 2.3. Similarly, we construct Table 3 to show the accuracy and coverage of every rule at time t + 1, according to (5) and (6). With the same thresholds α = 0.6 and γ = 0.4, we obtain that X2′ → D1′, X3′ → D2′, X5′ → D3′ and X6′ → D4′ are interesting knowledge at time t + 1. Comparing Table 2 with Table 3, we find that X2 → D1, X3 → D2, X5 → D3 and X6 → D4 are still interesting at time t + 1, while X4 → D3 is no longer interesting because of the changes of objects. Figure 2 compares the interesting knowledge at times t and t + 1; the interesting knowledge region is located at the top right corner.

The above discussion only corresponds to fixed thresholds of accuracy and coverage. If the thresholds vary, the interesting knowledge generated also changes. According to Figure 2, the interesting knowledge region contracts when α or γ increases, and expands when α or γ decreases. So the parameters α and γ play a very important role in our discussion. As the accuracy value of interesting knowledge is no less than 0.5, α lies in [0.5, 1]. For simplicity, we let γ range over [0.3, 1] with step 0.1.
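Sweeping the two thresholds over the time-t matrices (3) and (4) reproduces the rule sets of this kind of analysis directly; the helper name in the following sketch is our own:

```python
# Matrices (3) and (4): accuracy and coverage at time t, rows X1..X6, columns D1..D4.
ACC_T = [[1,     0,     0,     0    ],
         [0.75,  0.25,  0,     0    ],
         [0,     0.893, 0,     0.107],
         [0,     0,     1,     0    ],
         [0.043, 0.286, 0.671, 0    ],
         [0,     0.1,   0.3,   0.6  ]]
COV_T = [[0.357, 0,     0,     0    ],
         [0.536, 0.091, 0,     0    ],
         [0,     0.454, 0,     0.091],
         [0,     0,     0.404, 0    ],
         [0.107, 0.364, 0.452, 0    ],
         [0,     0.091, 0.144, 0.909]]

def interesting_rules(acc, cov, alpha, gamma):
    """1-based (i, j) pairs with Acc >= alpha and Cov >= gamma (Definition 2.3)."""
    return [(i + 1, j + 1)
            for i in range(len(acc)) for j in range(len(acc[0]))
            if acc[i][j] >= alpha and cov[i][j] >= gamma]
```

Here `interesting_rules(ACC_T, COV_T, 0.6, 0.4)` yields [(2, 1), (3, 2), (4, 3), (5, 3), (6, 4)], the five rules reported above, and raising γ to 0.5 shrinks the set to [(2, 1), (6, 4)], illustrating how the interesting knowledge region contracts as the thresholds grow.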


Table 2. The 2-dimensional table of knowledge at time t

Xi   Dj   Accuracy  Coverage      Xi   Dj   Accuracy  Coverage
1    1    1         0.357         4    1    0         0
1    2    0         0             4    2    0         0
1    3    0         0             4∗   3∗   1         0.404
1    4    0         0             4    4    0         0
2∗   1∗   0.75      0.536         5    1    0.043     0.107
2    2    0.25      0.091         5    2    0.286     0.364
2    3    0         0             5∗   3∗   0.671     0.452
2    4    0         0             5    4    0         0
3    1    0         0             6    1    0         0
3∗   2∗   0.893     0.454         6    2    0.1       0.091
3    3    0         0             6    3    0.3       0.144
3    4    0.107     0.091         6∗   4∗   0.6       0.909

∗ represents interesting knowledge at time t.

Table 3. The 2-dimensional table of knowledge at time t + 1

Xi′  Dj′  Accuracy  Coverage      Xi′  Dj′  Accuracy  Coverage
1    1    0         0             4    3    1         0.368
1    2    0         0             4    4    0         0
1    3    0         0             5    1    0.038     0.167
1    4    0         0             5    2    0.25      0.364
2∗   1∗   0.75      0.833         5∗   3∗   0.712     0.5
2    2    0.25      0.091         5    4    0         0
2    3    0         0             6    1    0         0
2    4    0         0             6    2    0.1       0.091
3    1    0         0             6    3    0.3       0.132
3∗   2∗   0.893     0.454         6∗   4∗   0.6       0.566
3    3    0         0             7    1    0         0
3    4    0.107     0.057         7    2    0         0
4    1    0         0             7    3    0         0
4    2    0         0             7    4    1         0.377

∗ represents interesting knowledge at time t + 1.


Table 4. The interesting knowledge at time t when α and γ are changing

          α = 0.5            α = 0.6            α = 0.7            α = 0.8      α = 0.9      α = 1.0
γ = 0.3   (1,1),(2,1),(3,2), (1,1),(2,1),(3,2), (1,1),(2,1),       (1,1),(3,2), (1,1),(4,3)  (1,1),(4,3)
          (4,3),(5,3),(6,4)  (4,3),(5,3),(6,4)  (3,2),(4,3)        (4,3)
γ = 0.4   (2,1),(3,2),(4,3), (2,1),(3,2),(4,3), (2,1),(3,2),(4,3)  (3,2),(4,3)  (4,3)        (4,3)
          (5,3),(6,4)        (5,3),(6,4)
γ = 0.5   (2,1),(6,4)        (2,1),(6,4)        (2,1)              –            –            –
γ = 0.6   (6,4)              (6,4)              –                  –            –            –
γ = 0.7   (6,4)              (6,4)              –                  –            –            –
γ = 0.8   (6,4)              (6,4)              –                  –            –            –
γ = 0.9   (6,4)              (6,4)              –                  –            –            –
γ = 1.0   –                  –                  –                  –            –            –

(i,j) represents the interesting knowledge Xi → Dj, i ∈ {1, 2, · · · , 6}, j ∈ {1, 2, 3, 4}.

Table 5. The interesting knowledge at time t + 1 when α and γ are changing

          α = 0.5            α = 0.6            α = 0.7            α = 0.8      α = 0.9      α = 1.0
γ = 0.3   (2,1),(3,2),(4,3), (2,1),(3,2),(4,3), (2,1),(3,2),(4,3), (3,2),(4,3), (4,3),(7,4)  (4,3),(7,4)
          (5,3),(6,4),(7,4)  (5,3),(6,4),(7,4)  (5,3),(7,4)        (7,4)
γ = 0.4   (2,1),(3,2),(5,3), (2,1),(3,2),(5,3), (2,1),(3,2),(5,3)  (3,2)        –            –
          (6,4)              (6,4)
γ = 0.5   (2,1),(5,3),(6,4)  (2,1),(5,3),(6,4)  (2,1),(5,3)        –            –            –
γ = 0.6   (2,1)              (2,1)              (2,1)              –            –            –
γ = 0.7   (2,1)              (2,1)              (2,1)              –            –            –
γ = 0.8   (2,1)              (2,1)              (2,1)              –            –            –
γ = 0.9   –                  –                  –                  –            –            –
γ = 1.0   –                  –                  –                  –            –            –

(i,j) represents the interesting knowledge Xi′ → Dj′, i ∈ {1, 2, · · · , 7}, j ∈ {1, 2, 3, 4}.


Figure 2. A comparison of interesting knowledge at time t and t+1

The variations of interesting knowledge at time t and t + 1 are displayed in Table 4 and Table 5, respectively, with α and γ changing over the specified intervals. From Table 4, Table 5 and Figure 2, we can not only trace the variation of interesting knowledge over time quickly and clearly, but also discern how the interesting knowledge changes under different thresholds α and γ.
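The threshold selection behind Table 4 and Table 5 can be sketched as follows: a rule is kept as interesting when its accuracy reaches α and its coverage reaches γ. The (accuracy, coverage) pairs below are transcribed from the 2-dimensional table at time t + 1 (zero entries omitted); the function name `interesting` is an illustrative assumption.

```python
# (accuracy, coverage) of each candidate rule X_i' -> D_j' at time t+1,
# transcribed from the 2-dimensional table of knowledge (zero entries omitted).
knowledge = {
    (2, 1): (0.75, 0.833), (2, 2): (0.25, 0.091),
    (3, 2): (0.893, 0.454), (3, 4): (0.107, 0.057),
    (4, 3): (1.0, 0.368),
    (5, 1): (0.038, 0.167), (5, 2): (0.25, 0.364), (5, 3): (0.712, 0.5),
    (6, 2): (0.1, 0.091), (6, 3): (0.3, 0.132), (6, 4): (0.6, 0.566),
    (7, 4): (1.0, 0.377),
}

def interesting(knowledge, alpha, gamma):
    """Rules whose accuracy >= alpha and coverage >= gamma."""
    return sorted(k for k, (acc, cov) in knowledge.items()
                  if acc >= alpha and cov >= gamma)

print(interesting(knowledge, 0.7, 0.4))  # [(2, 1), (3, 2), (5, 3)]
```

Evaluating this filter over a grid of (α, γ) values reproduces the cells of Table 5; for example, the call above matches the entry for α = 0.7, γ = 0.4.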

5. Conclusions

Interesting knowledge in dynamic information systems was defined in this paper based on both accuracy and coverage. An incremental approach, together with its algorithm, was proposed for updating interesting knowledge when the object set changes, and a case study validated the rationality and validity of the proposed method. The present work, however, concerns only complete information systems and the equivalence relation. Our future research will focus on developing approaches for dynamically learning interesting knowledge in incomplete information systems, and on exploring whether the proposed approach can be extended to other generalized rough set models.

Acknowledgements

The authors thank Professor Y.Y. Yao for his insightful suggestions. The authors also thank Dong Han, Xiaodong Wang and Zhijie Chen for their assistance in preparing the manuscript.


References

[1] Z. Pawlak: Rough sets. International Journal of Computer and Information Sciences, 11, pp. 341-356 (1982)
[2] A. An, C. Chan, N. Shan, N. Cercone, W. Ziarko: Applying knowledge discovery to predict water-supply consumption. IEEE Expert, 12(4), pp. 72-78 (1997)
[3] A. Dimitras, R. Slowinski: Business failure prediction using rough sets. European Journal of Operational Research, 114, pp. 263-280 (1999)
[4] M. Beynon, M. Peel: Variable precision rough set theory and data discretisation: an application to corporate failure prediction. The International Journal of Management Science, 29, pp. 561-576 (2001)
[5] F. Tay, L. Shen: Economic and financial prediction using rough sets model. European Journal of Operational Research, 141, pp. 641-659 (2002)
[6] M. Beynon, N. Driffield: An illustration of variable precision rough sets model: an analysis of the findings of the UK Monopolies and Mergers Commission. Computers and Operations Research, 32, pp. 1739-1759 (2005)
[7] S. Tsumoto: Mining diagnostic rules from clinical databases using rough sets and medical diagnostic model. Information Sciences, 162, pp. 65-80 (2004)
[8] C. Su, J. Hsu: Precision parameter in the variable precision rough sets model: an application. The International Journal of Management Science, 34, pp. 149-157 (2006)
[9] S. Wong, W. Ziarko, Z. Pawlak: Algorithm for inductive learning. Bulletin of the Polish Academy of Sciences, Technical Sciences, 34, pp. 271-276 (1986)
[10] J. Han, M. Kamber: Data Mining: Concepts and Techniques. Morgan Kaufmann, San Francisco (2006)
[11] S. Tsumoto: Extraction of experts' decision process from clinical databases using rough set model. In: Proceedings of PKDD 1997, pp. 58-67 (1997)
[12] S. Tsumoto: Accuracy and coverage in rough set rule induction. In: J. Alpigini et al. (Eds.): RSCTC 2002, LNAI 2475, pp. 373-380 (2002)
[13] W. Ziarko: Variable precision rough set model. Journal of Computer and System Sciences, 46, pp. 39-59 (1993)
[14] Y. Yao, S.K.M. Wong: A decision theoretic framework for approximating concepts. International Journal of Man-Machine Studies, 37, pp. 793-809 (1992)
[15] Y. Yao: Decision-theoretic rough set models. In: The Second International Conference on Rough Sets and Knowledge Technology, LNAI 4481, pp. 1-12. Springer, Heidelberg (2007)
[16] Y. Yao: Probabilistic rough set approximations. International Journal of Approximate Reasoning, 49, pp. 255-271 (2008)
[17] L. Shan, W. Ziarko: Data-based acquisition and incremental modification of classification rules. Computational Intelligence, 11, pp. 357-370 (1995)
[18] A. Skowron, C. Rauszer: The discernibility matrices and functions in information systems. In: Intelligent Decision Support, Kluwer Academic Publishers, Dordrecht, pp. 331-362 (1992)
[19] W. Bang, Z. Bien: New incremental learning algorithm in the framework of rough set theory. International Journal of Fuzzy Systems, 1, pp. 25-36 (1999)
[20] L. Tong, L. An: Incremental learning of decision rules based on rough set theory. In: Proceedings of the World Congress on Intelligent Control and Automation (WCICA 2002), pp. 420-425 (2002)
[21] J. Blaszczynski, R. Slowinski: Incremental induction of decision rules from dominance-based rough approximations. Electronic Notes in Theoretical Computer Science, 82, pp. 40-51 (2003)
[22] Z. Zheng, G.Y. Wang: RRIA: A rough set and rule tree based incremental knowledge acquisition algorithm. Fundamenta Informaticae, 59(2-3), pp. 299-313 (2004)
[23] F. Hu, G.Y. Wang, H. Huang, Y. Wu: Incremental attribute reduction based on elementary sets. In: RSFDGrC 2005, Part 1, LNAI 3641, pp. 185-193 (2005)
[24] C. Chan: A rough set approach to attribute generalization in data mining. Information Sciences, 107, pp. 177-194 (1998)
[25] T. Li, D. Ruan, G. Wets, J. Song, Y. Xu: A rough sets based characteristic relation approach for dynamic attribute generalization in data mining. Knowledge-Based Systems, 20, pp. 485-494 (2007)
[26] D. Liu, P. Hu, C. Jiang: The methodology of the variable precision rough set increment study based on completely information system. In: The Third International Conference on Rough Sets and Knowledge Technology, LNAI 5009, pp. 276-283. Springer, Heidelberg (2008)