
International Journal of Hybrid Intelligent Systems 8 (2011) 41–57 DOI 10.3233/HIS-2011-0130 IOS Press

Stable rule extraction and decision making in rough non-deterministic information analysis

Hiroshi Sakai a,∗, Hitomi Okuma b, Michinori Nakata c and Dominik Ślęzak d,e

a Mathematical Sciences Section, Department of Basic Sciences, Faculty of Engineering, Kyushu Institute of Technology, Tobata, Kitakyushu 804, Japan
b Faculty of Education and Welfare Science, Oita University, Dannoharu, Oita 870, Japan
c Faculty of Management and Information Science, Josai International University, Gumyo, Togane, Chiba 283, Japan
d Institute of Mathematics, University of Warsaw, Banacha 2, 02-097 Warsaw, Poland
e Infobright Inc., Krzywickiego 34 pok. 219, 02-078 Warsaw, Poland

Abstract. Rough Non-deterministic Information Analysis (RNIA) is a rough set-based data analysis framework for Non-deterministic Information Systems (NISs). The RNIA-related algorithms and software tools developed so far for rule generation provide good characterizations of NISs and can be successfully applied to decision making based on non-deterministic data. In this paper, we extend RNIA by introducing a stability factor, which enables evaluating rules in a more flexible way, and by developing a question-answering functionality, which enables decision makers to analyze data gathered in NISs when there are no pre-extracted rules that address the specified conditions.

Keywords: Rough non-deterministic information analysis, incomplete information, rule generation, Apriori algorithm, stability factor, decision making

1. Introduction

Rough set theory offers a simple and powerful mathematical approach to vagueness and uncertainty. Rough set concepts have been recognized as very useful in data-based knowledge representation [11,21–23,32,40]. Rough set theory usually handles tables with deterministic information, called Deterministic Information Systems (DISs). A number of applications of rough set theory to databases, data mining, machine learning and knowledge discovery have been investigated [3,5,7,8,33,35,36].

Non-deterministic Information Systems (NISs) and Incomplete Information Systems (IISs) have been proposed for handling information incompleteness in DISs, such as null values, unknown values and missing values [6,9,12–16,19,20,37,38]. Non-determinism and information incompleteness are important topics in the areas of data mining and data processing. There is a growing interest in uncertain data in the community of database researchers and practitioners [4,10]. There is also a lot of research on theoretical foundations of information systems with uncertain values [6,14]. Grzymała-Busse [7–9] and Kryszkiewicz [12,13] developed a number of algorithms generating rules from IISs, which are information systems with some attribute values possibly missing (wherein missing values may have various statuses). Stefanowski and Tsoukias [37,38] defined non-symmetric similarity relations and valued tolerance relations, and Nakata [17,18] coped with semantic aspects of incomplete information. Lipski [14,15] introduced the foundations for question-answering systems, and Orłowska [19,20] established rough set analysis for NISs, which are information systems whose attributes can be labeled with subsets of values instead of single values. To some extent, NISs can be treated as a generalization of IISs

∗ Corresponding author. E-mail: [email protected].

1448-5869/11/$27.50 © 2011 – IOS Press. All rights reserved


H. Sakai et al. / Stable rule extraction and decision making in RNIA

(wherein the above-mentioned various statuses of missing values need a specific interpretation). On the other hand, NISs need to be further compared with other models of non-deterministic information, such as, e.g., Lipski's Incomplete Information Databases (IIDs) [15,31].

For a NIS, we usually suppose that there exists a DIS with unknown actual information, denoted as DIS_actual. Surely, it is impossible to know DIS_actual itself without additional information. On the other hand, the following holds:

(Certainty modality) If a formula F holds in each DIS derived from a NIS, F will hold in DIS_actual.
(Possibility modality) If a formula F holds in some DISs derived from a NIS, F may hold in DIS_actual.

We may say that, in the case of possibility (certainty), a formula F is (is not) influenced by the information incompleteness related to a NIS. An important problem is then how to validate the above modalities during optimization of formulas describing information gathered in NISs for decision making purposes. One might surely think about considering all DISs that can be derived from a given NIS. However, such an explicit method would result in exponential complexity. In our research, we focused on analytical methods that are not explicit in the above sense. In particular, we investigated a number of aspects of rough set analysis of NISs that do not need derivation of an exponential number of DISs. Below we list some of them. We call the obtained framework Rough Non-deterministic Information Analysis (RNIA).

(Aspect-1) Definability of sets in NISs [17,18,24,25].
(Aspect-2) Consistency of objects in NISs [25].
(Aspect-3) Data dependencies in NISs [25].
(Aspect-4) Decision rule generation for NISs [10,26,27,31,34].
(Aspect-5) Decision rule generation for numeric data [28].

In this paper, we continue extending the RNIA framework by considering the following new aspects:

(Aspect-6) Decision rule generation based on stability factor analysis [29].
(Aspect-7) Question-answering functionality as a complement to rules [30].

The first of the above new aspects is to define more flexible criteria for evaluating rules during their generation from data. The previously used criteria based on certainty and possibility often turn out to be too rigorous or too weak, respectively, in providing a reasonable amount of truly meaningful rules. The role of the stability factor computed out of a NIS for a decision rule τ is to estimate the percentage of DISs derivable from the NIS for which τ meets the standard criteria applied during rule generation. Thus, one may say that Aspect-6 leads to a framework for the extraction of stable rules, which may not satisfy the specified criteria with certainty but have a high enough chance of satisfying them for decision making purposes. However, it is important to provide a stability factor estimation procedure that does not lead to the exponential complexity related to the number of DISs.

The second of the above new aspects relates to enriching RNIA with a question-answering functionality, which may be helpful to the users of a decision support system in case the conditions they want to specify cannot be matched by any of the previously computed rules. Usually, such a type of functionality is introduced to a given framework of dealing with data prior to more advanced approaches such as automatic rule extraction. On the other hand, user-specified questions and rule generation algorithms share the same foundations, as in both cases the simplest scenario is to evaluate conjunctions of basic attribute-value conditions against the available data. Such evaluation, however, becomes difficult in the case of non-deterministic data provided by means of a NIS, where it is not so obvious how to judge the relevance of each particular object to user-specified conditions. In this paper, we deal with this issue just like in the case of the rule generation algorithms previously developed within RNIA, and we report approximations of answers that the users may find helpful for decision making.

In the remainder of the paper, we recall the foundations of RNIA in Section 2. We describe rule generation and our previous results in Section 3. We study Aspect-6 and Aspect-7 in Sections 4 and 5, respectively. We conclude our research on enhancements of RNIA in Section 6.

2. An overview of RNIA

This section summarizes the framework of RNIA.

2.1. Basic definitions

A Deterministic Information System (DIS) is a quadruplet (OB, AT, {VAL_A | A ∈ AT}, f) [21,32].


Here, OB, AT and VAL_A are finite sets, and we call their elements an object, an attribute and an attribute value, respectively. Furthermore, f is a mapping such that f : OB × AT → ∪_{A∈AT} VAL_A. We usually fix two sets CON ⊆ AT, which we call condition attributes, and DEC ⊆ AT (CON ∩ DEC = ∅), which we call decision attributes. An object x ∈ OB is consistent (with every distinct object y ∈ OB) if f(x,A) = f(y,A) for every A ∈ CON implies f(x,B) = f(y,B) for every B ∈ DEC.

We call a pair [A, ζ_A] (A ∈ AT, ζ_A ∈ VAL_A) a descriptor. For CON and DEC, we usually consider the following implications: ∧_{A∈CON} [A, f(x,A)] ⇒ ∧_{B∈DEC} [B, f(x,B)] (x ∈ OB). If an object x is consistent, we say this implication is consistent. The ratio |{x ∈ OB | x is consistent for CON and DEC}| / |OB| is called the degree of dependency from CON to DEC in a DIS.

In a DIS, we can easily define the set of all equivalence classes over OB. Namely, x and y are equivalent for ATR ⊆ AT if f(x,A) = f(y,A) for every A ∈ ATR. We denote the set of all equivalence classes by eq(ATR), and each equivalence class by [x]_ATR.

A Non-deterministic Information System (NIS) is also a quadruplet (OB, AT, {VAL_A | A ∈ AT}, g) [14,19,20], where g : OB × AT → P(∪_{A∈AT} VAL_A) (the power set of ∪_{A∈AT} VAL_A). Every set g(x,A) is interpreted as follows: there is an actual value in this set, but this value is not known. For a NIS = (OB, AT, {VAL_A | A ∈ AT}, g) and a set ATR ⊆ AT, we call a DIS = (OB, ATR, {VAL_A | A ∈ ATR}, h) satisfying h(x,A) ∈ g(x,A) a derived DIS (for ATR) from the NIS. We call an equivalence class [x]_ATR in a derived DIS a possible equivalence class (pe-class) in the NIS.

2.2. An example

This subsection clarifies the above notions by means of an illustrative example of a NIS.

Example 1. Let us consider the following NIS.


Fig. 1. A NIS and 24 derived DISs.

(1) Pe-classes depending upon the attribute values: For ATR = {Color}, there are four derived DISs, and hence four sets of equivalence classes, i.e., {{1,2,3}} (h(1,Color) = h(2,Color) = h(3,Color) = red), {{1,2},{3}} (h(1,Color) = h(2,Color) = red, h(3,Color) = blue), {{1},{2,3}} (h(1,Color) = green, h(2,Color) = h(3,Color) = red), and {{1},{2},{3}} (h(1,Color) = green, h(2,Color) = red, h(3,Color) = blue). As for the pe-class [1]_{Color}, we have two cases: (A) if h(1,Color) = red, then {1,2} ⊆ [1]_{Color} ⊆ {1,2,3}; (B) if h(1,Color) = green, then [1]_{Color} = {1}.

(2) The definability of a set: Let us consider a set X = {1,3} of objects. In DIS_24, [1]_{Color} = {1} (only object 1 satisfies the descriptor [Color,green]) and [3]_{Color} = {3} hold. Therefore, X = [1]_{Color} ∪ [3]_{Color} is derived. In such a case, we say X is definable in DIS_24. In DIS_24, every subset of {1,2,3} is definable. However, X is not definable in DIS_1, because [1]_{Color} = {1,2,3} and [1]_{Size} = [1]_{Color,Size} = {1,2}.

(3) The consistency of an object: Let us consider CON = {Color} and DEC = {Size}. In DIS_1, objects 1 and 3 are not consistent, and there is no consistent object. Therefore, the degree of dependency in DIS_1 is 0/3 = 0.0. On the other hand, every object is consistent in DIS_24. Therefore, the degree of dependency in DIS_24 is 3/3 = 1.0.

In Example 1, we have shown the relation between a NIS and its derived DISs. As for the definability of
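The four derived DISs and their equivalence-class families in statement (1) can be reproduced with a short script. Below is a minimal sketch in Python (not the authors' C implementation), with the Color value sets read off Fig. 1:

```python
from itertools import product

# Non-deterministic Color values from Fig. 1 (object -> set of possible values)
g_color = {1: {"red", "green"}, 2: {"red"}, 3: {"red", "blue"}}

def equivalence_classes(h):
    """Partition the objects by their (deterministic) attribute value."""
    classes = {}
    for x, v in h.items():
        classes.setdefault(v, set()).add(x)
    return {frozenset(c) for c in classes.values()}

objects = sorted(g_color)
families = set()
for choice in product(*(sorted(g_color[x]) for x in objects)):
    h = dict(zip(objects, choice))          # one derived DIS (for ATR = {Color})
    families.add(frozenset(equivalence_classes(h)))

# Four derived DISs yield the four families of equivalence classes above
expected = {
    frozenset({frozenset({1, 2, 3})}),
    frozenset({frozenset({1, 2}), frozenset({3})}),
    frozenset({frozenset({1}), frozenset({2, 3})}),
    frozenset({frozenset({1}), frozenset({2}), frozenset({3})}),
}
print(families == expected)  # True
```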


a set, it is possible to define two modalities, i.e., X is definable in all derived DISs (certainty) and X is definable in some derived DISs (possibility). As for the degree of dependency, it is possible to define the minimum and the maximum values.

According to the definition of g(x,A) in a NIS, we suppose that there exists a derived DIS_actual with the actual information. In order to handle information based on DIS_actual, the two modalities certainty and possibility are usually defined.

(Certainty) If a formula F holds in every DIS derived from a NIS, F also holds in DIS_actual. In this case, we say F certainly holds (in DIS_actual).
(Possibility) If a formula F holds in some DISs derived from a NIS, there exists a possibility that F holds in DIS_actual. In this case, we say F possibly holds (in DIS_actual).

Following this way of thinking, we have investigated (Aspect-1) to (Aspect-5), and we have extended rough set-based issues to RNIA. As we have already shown, the number of derived DISs increases exponentially, so we did not employ the explicit method; instead, we employed the minimum and the maximum sets for [x]_ATR defined in the next subsection.

2.3. The minimum and the maximum sets of pe-classes

In this subsection, we give the key definition for computing modal concepts.

Definition 1. In a NIS = (OB, AT, {VAL_A | A ∈ AT}, g), we define the following two sets of objects, inf and sup, for each descriptor [A, ζ_A] (A ∈ ATR ⊆ AT, ζ_A ∈ VAL_A).

(1) inf([A, ζ_A]) = {x ∈ OB | g(x,A) = {ζ_A}}.
(2) inf(∧_{A∈ATR} [A, ζ_A]) = ∩_{A∈ATR} inf([A, ζ_A]).
(3) sup([A, ζ_A]) = {x ∈ OB | ζ_A ∈ g(x,A)}.
(4) sup(∧_{A∈ATR} [A, ζ_A]) = ∩_{A∈ATR} sup([A, ζ_A]).

The set sup is semantically equal to a set defined by the similarity relation SIM [9,12,13,37]. In those works, some theorems are presented based on the relation SIM, and our theoretical results are closely related to those theorems. However, the set inf induces new properties, which hold only in NISs.

Now, let us consider the relation between a pe-class [x]_ATR and the two sets inf and sup. In a DIS = (OB, AT, {VAL_A | A ∈ AT}, f), let us suppose f(x,A) = ζ_A (A ∈ ATR). Then, any [x]_ATR is uniquely defined by {y ∈ OB | f(y,A) = ζ_A, A ∈ ATR}. Since we can see a DIS as a NIS in which g(x,A) is a singleton set for every x ∈ OB and every A ∈ AT, we can apply Definition 1 to each DIS. In this case, g(x,A) = {ζ_A} and ζ_A ∈ g(x,A) are equivalent, therefore we have the next equation:

[x]_ATR = inf(∧_{A∈ATR} [A, ζ_A]) = sup(∧_{A∈ATR} [A, ζ_A]).

However, in a NIS, g(x,A) may not be a singleton set. So, let us suppose a case where ζ_A ∈ g(x,A) (A ∈ ATR), and let us consider a pe-class [x]_ATR depending upon the descriptors ∧_{A∈ATR} [A, ζ_A]. Then,

{x} ⊆ inf(∧_{A∈ATR} [A, ζ_A]) ∪ {x} ⊆ [x]_ATR ⊆ sup(∧_{A∈ATR} [A, ζ_A])

holds. Proposition 1 below connects a pe-class [x]_ATR (depending upon the descriptor ∧_{A∈ATR} [A, ζ_A]) with inf and sup.

Proposition 1 [25]. For a NIS, an object x, ATR ⊆ AT and ζ_A ∈ g(x,A) (A ∈ ATR), conditions (1) and (2) below are equivalent.

(1) X is a pe-class [x]_ATR defined by the condition ∧_{A∈ATR} [A, ζ_A].
(2) (inf(∧_{A∈ATR} [A, ζ_A]) ∪ {x}) ⊆ X ⊆ sup(∧_{A∈ATR} [A, ζ_A]).

Proposition 1 generalizes statement (1) in Example 1. Let us consider the pe-class [1]_{Color} in Example 1.

(A) In the case of h(1,Color) = red, we had two pe-classes, {1,2} and {1,2,3}. These two pe-classes are also defined by the sets X satisfying inf([Color,red]) ∪ {1} ⊆ X ⊆ sup([Color,red]). Here, inf([Color,red]) = {2} and sup([Color,red]) = {1,2,3}.
(B) In the case of h(1,Color) = green, we had one pe-class, {1}. This pe-class is also defined by the sets X satisfying inf([Color,green]) ∪ {1} ⊆ X ⊆ sup([Color,green]). Here, inf([Color,green]) = ∅ and sup([Color,green]) = {1}.

Due to Proposition 1, we can handle any pe-class [x]_ATR by using inf and sup.
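The values of inf and sup from Definition 1, and the pe-class bounds of Proposition 1, can be checked directly for the Color attribute of Fig. 1. A Python sketch (`inf` and `sup` are hypothetical helper names introduced here for illustration):

```python
from itertools import combinations

# Non-deterministic Color values from Fig. 1 (object -> set of possible values)
g_color = {1: {"red", "green"}, 2: {"red"}, 3: {"red", "blue"}}

def inf(g, value):
    """Objects whose non-deterministic value set is exactly {value}."""
    return {x for x, vals in g.items() if vals == {value}}

def sup(g, value):
    """Objects whose non-deterministic value set contains value."""
    return {x for x, vals in g.items() if value in vals}

print(inf(g_color, "red"), sup(g_color, "red"))      # {2} {1, 2, 3}
print(inf(g_color, "green"), sup(g_color, "green"))  # set() {1}

# Proposition 1: the pe-classes [1]_{Color} for h(1,Color) = red are exactly
# the sets X with inf([Color,red]) | {1} <= X <= sup([Color,red])
lower = inf(g_color, "red") | {1}
upper = sup(g_color, "red")
extra = upper - lower
pe_classes = [lower | set(m) for r in range(len(extra) + 1)
              for m in combinations(sorted(extra), r)]
print(sorted(map(sorted, pe_classes)))  # [[1, 2], [1, 2, 3]]
```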

Table 1
A table of a DIS with incomplete information

OB  Color  Size
1   *      *
2   red    *
3   *      m

2.4. Non-deterministic information and incomplete information

This subsection clarifies the semantic difference between non-deterministic information and incomplete information. Let us consider Table 1. The symbol ∗ is often employed for indicating incomplete information. Table 1 is generated by replacing the non-deterministic information in Fig. 1 with ∗. There are several interpretations of this ∗ symbol [9,12,17,37]. In the simplest interpretation of incomplete information, the symbol ∗ may stand for any attribute value. Namely, each ∗ may be either red, blue or green, or either s, m or l. Thus there are 81 (= 3^4) possible cases in Table 1. In Table 1, the implication from object 1 may be [Color,blue] ⇒ [Size,s], and this contradicts τ : [Color,blue] ⇒ [Size,m] from object 3. However, in Fig. 1, τ is consistent in all derived DISs where τ occurs from object 3, because only object 3 can satisfy the condition [Color,blue] in Fig. 1. Thus, there is a semantic difference between non-deterministic information and incomplete information.
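This difference can be checked mechanically. A small Python sketch, with the attribute domains {red, blue, green} and {s, m, l} as stated above:

```python
from itertools import product

COLORS, SIZES = ["red", "blue", "green"], ["s", "m", "l"]

# Under the '*' interpretation of Table 1, object 1 may take any (Color, Size) pair.
star_worlds_obj1 = list(product(COLORS, SIZES))
# A world with object 1 = (blue, s) contradicts tau: [Color,blue] => [Size,m].
assert ("blue", "s") in star_worlds_obj1

# Under the NIS of Fig. 1, object 1 has g(1,Color) = {red, green}, so object 1
# can never satisfy [Color,blue], and hence can never contradict tau.
g1_color = {"red", "green"}
print("blue" in g1_color)  # False
```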

Table 2
A Deterministic Information System (an artificial table data set)

OB  Temperature  Headache  Nausea  Flu
1   high         yes       no      yes
2   high         yes       yes     yes
3   normal       no        no      yes
4   high         yes       yes     yes
5   high         yes       yes     no
6   normal       yes       yes     yes
7   normal       no        yes     no
8   normal       yes       yes     yes

3. Decision rule generation in DISs and NISs

This section first surveys rule generation in DISs, then summarizes the framework of rule generation in NISs and some proved results.

3.1. Decision rule generation in DISs and the Apriori algorithm

We usually identify a DIS = (OB, AT, {VAL_A | A ∈ AT}, f) with a standard table. A rule (more correctly, a candidate for a rule) is an appropriate implication of the form τ : Condition part ⇒ Decision part obtained from a DIS. We usually employ the following two criteria, support(τ) and accuracy(τ) [1,2,21]:

support(τ) = NUM(τ)/|OB|,
accuracy(τ) = NUM(τ)/NUM(condition part),

where NUM(τ) is the number of occurrences of τ, and NUM(condition part) is the number of occurrences of the condition part.

Definition 2. We define the following as the definition of rule generation in a DIS: find all implications τ satisfying support(τ) ≥ α and accuracy(τ) ≥ β for the threshold values α and β (0 < α, β ≤ 1).

Figure 2 shows the visual image of Definition 2. Agrawal proposed the Apriori algorithm [1,2] for such rule generation, and the Apriori algorithm is now a representative algorithm for data mining [4].

In Table 2, let us suppose α = 0.3 and β = 0.8. For an implication τ′ : [Nausea,yes] ⇒ [Flu,yes], support(τ′) = 4/8 = 0.5 > 0.3 and accuracy(τ′) = 4/6 < 0.8. Therefore, we do not see τ′ as a rule. For an implication τ′′ : [Headache,yes] ∧ [Nausea,yes] ⇒ [Flu,yes], support(τ′′) = 4/8 = 0.5 > 0.3 and accuracy(τ′′) = 4/5 = 0.8 ≥ 0.8. Therefore, we see τ′′ as a rule.

3.2. Decision rule generation in NISs

Now, let us consider the NIS in Table 3. In reality, Table 2 is a derived DIS from the NIS in Table 3. We first clarify implications from a NIS. In Table 3, we can pick up τ : [Temperature,high] ⇒ [Flu,yes] from objects 1, 2, 3, 4 and 8. In order to discriminate implications coming from different objects, we may use the notation τ^x for τ from an object x, for example, τ^1 (τ from object 1) and τ^8 (τ from object 8).
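The support and accuracy values for Table 2 in Section 3.1 can be verified with a short script. A Python sketch (`support_accuracy` is a hypothetical helper, not part of the authors' tools):

```python
# Table 2 as a list of records (OB 1..8)
table2 = [
    {"Temperature": "high",   "Headache": "yes", "Nausea": "no",  "Flu": "yes"},
    {"Temperature": "high",   "Headache": "yes", "Nausea": "yes", "Flu": "yes"},
    {"Temperature": "normal", "Headache": "no",  "Nausea": "no",  "Flu": "yes"},
    {"Temperature": "high",   "Headache": "yes", "Nausea": "yes", "Flu": "yes"},
    {"Temperature": "high",   "Headache": "yes", "Nausea": "yes", "Flu": "no"},
    {"Temperature": "normal", "Headache": "yes", "Nausea": "yes", "Flu": "yes"},
    {"Temperature": "normal", "Headache": "no",  "Nausea": "yes", "Flu": "no"},
    {"Temperature": "normal", "Headache": "yes", "Nausea": "yes", "Flu": "yes"},
]

def support_accuracy(table, condition, decision):
    """support = NUM(tau)/|OB|; accuracy = NUM(tau)/NUM(condition part)."""
    cond = [r for r in table if all(r[a] == v for a, v in condition.items())]
    both = [r for r in cond if all(r[a] == v for a, v in decision.items())]
    return len(both) / len(table), len(both) / len(cond)

# tau': [Nausea,yes] => [Flu,yes] -> support 0.5, accuracy 4/6 (not a rule)
print(support_accuracy(table2, {"Nausea": "yes"}, {"Flu": "yes"}))

# tau'': [Headache,yes] & [Nausea,yes] => [Flu,yes] -> support 0.5, accuracy 0.8 (a rule)
print(support_accuracy(table2, {"Headache": "yes", "Nausea": "yes"}, {"Flu": "yes"}))
```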

Table 3
A Non-deterministic Information System (an artificial table data set)

OB  Temperature                Headache  Nausea    Flu
1   {high}                     {yes,no}  {no}      {yes}
2   {high,very high}           {yes}     {yes}     {yes}
3   {normal,high,very high}    {no}      {no}      {yes,no}
4   {high}                     {yes}     {yes,no}  {yes,no}
5   {high}                     {yes,no}  {yes}     {no}
6   {normal}                   {yes}     {yes,no}  {yes,no}
7   {normal}                   {no}      {yes}     {no}
8   {normal,high,very high}    {yes}     {yes,no}  {yes}

Fig. 2. A pair (support, accuracy) corresponding to the implication τ is plotted in a coordinate plane.

Furthermore, we consider the set of derived DISs in which τ^x occurs, and let DD(τ^x) denote this set. For the set of attributes {Temperature, Flu}, there are 144 (= 2^4 × 3^2) derived DISs, and |DD(τ^1)| = 144 and |DD(τ^8)| = 48 hold. If |DD(τ^x)| is equal to the number of all derived DISs, we say this τ^x is definite. Otherwise, we say this τ^x is indefinite.

Remark. The same τ may occur from different objects x and y, and there may be a case where τ^x satisfies the condition of rules but τ^y does not. Therefore, we specify the object x in τ^x for each calculation, because DD(τ^x) depends upon x. However, we do not specify the object x for obtained rules τ. Namely, if τ^x for some object x satisfies the condition of rules, we see this τ as a rule.

Definition 3. Let us consider the threshold values α and β (0 < α, β ≤ 1). We define the following as the definition of rule generation in a NIS.

(1) The lower system: Find all implications τ such that there exists an object x for which support(τ^x) ≥ α and accuracy(τ^x) ≥ β hold in each ψ ∈ DD(τ^x).
(2) The upper system: Find all implications τ such that there exists an object x for which support(τ^x) ≥ α and accuracy(τ^x) ≥ β hold in some ψ ∈ DD(τ^x).
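The counts |DD(τ^1)| = 144 and |DD(τ^8)| = 48 can be checked without enumerating the derived DISs one by one, by multiplying the numbers of choices per object. A Python sketch with Table 3's Temperature and Flu value sets:

```python
from math import prod

# Non-deterministic Temperature and Flu values from Table 3 (objects 1..8)
g = {
    1: ({"high"}, {"yes"}),
    2: ({"high", "very high"}, {"yes"}),
    3: ({"normal", "high", "very high"}, {"yes", "no"}),
    4: ({"high"}, {"yes", "no"}),
    5: ({"high"}, {"no"}),
    6: ({"normal"}, {"yes", "no"}),
    7: ({"normal"}, {"no"}),
    8: ({"normal", "high", "very high"}, {"yes"}),
}

# number of derived DISs for ATR = {Temperature, Flu}
n_derived = prod(len(t) * len(f) for t, f in g.values())
print(n_derived)  # 144

def dd_size(x):
    """|DD(tau^x)| for tau: [Temperature,high] => [Flu,yes] occurring at object x:
    object x must take Temperature = high and Flu = yes; the others vary freely."""
    size = 1
    for y, (t, f) in g.items():
        if y == x:
            size *= ("high" in t) * ("yes" in f)  # one way, or impossible
        else:
            size *= len(t) * len(f)
    return size

print(dd_size(1), dd_size(8))  # 144 48
```

Object 1 is definite for this τ (both value sets are singletons), so DD(τ^1) is the whole set of derived DISs; object 8 fixes one of three Temperature choices, so |DD(τ^8)| = 144/3 = 48.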

In a DIS, DD(τ^x) is a singleton set, therefore the lower and the upper systems define the same implications in a DIS. Namely, the above definition is a natural extension of rule generation in DISs. Intuitively, the lower system defines rules with certainty, and the upper system defines rules with possibility. Especially, if τ^x is definite and τ^x satisfies the condition of the lower system, this τ is the most reliable, because DD(τ^x) is equal to the set of all derived DISs and τ^x satisfies the condition on support and accuracy in all derived DISs. If τ^x is indefinite, DD(τ^x) is generally a proper subset of the set of all derived DISs.

Definition 3 seems to be a natural extension of rule generation in DISs. However, this definition depends upon DD(τ^x). Since |DD(τ^x)| increases exponentially, the explicit method, i.e., the sequential check of each ψ ∈ DD(τ^x), seems hard to apply to Definition 3. We employ the two sets inf and sup again to solve this problem.

3.3. The minimum and the maximum criterion values, and the NIS-Apriori algorithm

In Table 3, let us consider τ^1 : [Temperature,high] ⇒ [Flu,yes] again. In this case, support(τ^1) and accuracy(τ^1) take several values according to the derived DISs. We define the following for each implication τ^x.


Table 4
A derived DIS_worst from Table 3, which causes the minimum support 1/8 and the minimum accuracy 1/4 of τ^1 (left), and a derived DIS_best, which causes the maximum support 5/8 and the maximum accuracy 5/6 of τ^1 (right)

DIS_worst                    DIS_best
OB  Temperature  Flu         OB  Temperature  Flu
1   high         yes         1   high         yes
2   very high    yes         2   high         yes
3   high         no          3   high         yes
4   high         no          4   high         yes
5   high         no          5   high         no
6   normal       no          6   normal       yes
7   normal       no          7   normal       no
8   normal       yes         8   high         yes

Fig. 3. A distribution of pairs (support, accuracy) for an implication τ x .

minsupp(τ^x) (the minimum support over DD(τ^x)),
minacc(τ^x) (the minimum accuracy over DD(τ^x)),
maxsupp(τ^x) (the maximum support over DD(τ^x)),
maxacc(τ^x) (the maximum accuracy over DD(τ^x)).

For such criterion values, we have proved the following results. From now on, we may employ [CON, ζ] and [DEC, η] for expressing ∧_{A∈CON} [A, ζ_A] and ∧_{B∈DEC} [B, η_B].

Proposition 2 [27]. In a NIS = (OB, AT, {VAL_A | A ∈ AT}, g), we can calculate the minimum and the maximum of support(τ^x) and accuracy(τ^x) by using inf and sup. In reality, there are four types of implications, and the calculations differ slightly from each other. Furthermore, the calculation does not depend upon |DD(τ^x)| in any case. For example, if τ^x : [CON, ζ] ⇒ [DEC, η] is definite,

minsupp(τ^x) = |inf([CON, ζ]) ∩ inf([DEC, η])| / |OB|,

minacc(τ^x) = |inf([CON, ζ]) ∩ inf([DEC, η])| / (|inf([CON, ζ])| + |OUTACC|),

maxsupp(τ^x) = |sup([CON, ζ]) ∩ sup([DEC, η])| / |OB|,

maxacc(τ^x) = (|inf([CON, ζ]) ∩ sup([DEC, η])| + |INACC|) / (|inf([CON, ζ])| + |INACC|).

Here,

OUTACC = [sup([CON, ζ]) \ inf([CON, ζ])] \ inf([DEC, η]),
INACC = [sup([CON, ζ]) \ inf([CON, ζ])] ∩ sup([DEC, η]).

Proposition 3 [27]. For each implication τ^x, there is a derived DIS_worst where both support(τ^x) and accuracy(τ^x) are the minimum. Furthermore, there is a derived DIS_best where both support(τ^x) and accuracy(τ^x) are the maximum.
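The formulas of Proposition 2 can be checked against a brute-force enumeration for the definite implication τ^1 : [Temperature,high] ⇒ [Flu,yes] of Table 3. A Python sketch (exact fractions are used to avoid rounding):

```python
from itertools import product
from fractions import Fraction as F

# Temperature and Flu from Table 3 (objects 1..8)
temp = {1: {"high"}, 2: {"high", "very high"}, 3: {"normal", "high", "very high"},
        4: {"high"}, 5: {"high"}, 6: {"normal"}, 7: {"normal"},
        8: {"normal", "high", "very high"}}
flu = {1: {"yes"}, 2: {"yes"}, 3: {"yes", "no"}, 4: {"yes", "no"},
       5: {"no"}, 6: {"yes", "no"}, 7: {"no"}, 8: {"yes"}}

inf_c = {x for x in temp if temp[x] == {"high"}}   # inf([Temperature,high])
sup_c = {x for x in temp if "high" in temp[x]}     # sup([Temperature,high])
inf_d = {x for x in flu if flu[x] == {"yes"}}      # inf([Flu,yes])
sup_d = {x for x in flu if "yes" in flu[x]}        # sup([Flu,yes])

outacc = (sup_c - inf_c) - inf_d
inacc = (sup_c - inf_c) & sup_d

minsupp = F(len(inf_c & inf_d), 8)
minacc = F(len(inf_c & inf_d), len(inf_c) + len(outacc))
maxsupp = F(len(sup_c & sup_d), 8)
maxacc = F(len(inf_c & sup_d) + len(inacc), len(inf_c) + len(inacc))
print(minsupp, minacc, maxsupp, maxacc)  # 1/8 1/4 5/8 5/6

# Brute-force cross-check over all 144 derived DISs for {Temperature, Flu}
objs = range(1, 9)
supports, accs = [], []
for ts in product(*(sorted(temp[x]) for x in objs)):
    for fs in product(*(sorted(flu[x]) for x in objs)):
        cond = [i for i, t in enumerate(ts) if t == "high"]
        both = [i for i in cond if fs[i] == "yes"]
        supports.append(F(len(both), 8))
        accs.append(F(len(both), len(cond)))
print(min(supports), min(accs), max(supports), max(accs))  # 1/8 1/4 5/8 5/6
```

The closed-form values agree with the exhaustive minimum and maximum, matching Table 4.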

For the implication τ^1 : [Temperature,high] ⇒ [Flu,yes], Table 4 shows a DIS_worst and a DIS_best. Furthermore, Fig. 3 shows the distribution of the pairs (support(τ^x), accuracy(τ^x)) over the derived DISs. According to Proposition 3 and Fig. 3, we can consider the next Proposition 4 instead of Definition 3.

Proposition 4. Let us consider the threshold values α and β (0 < α, β ≤ 1). We can revise Definition 3 to the following.


(1) The lower system: Find all implications τ such that there exists an object x for which minsupp(τ^x) ≥ α and minacc(τ^x) ≥ β.
(2) The upper system: Find all implications τ such that there exists an object x for which maxsupp(τ^x) ≥ α and maxacc(τ^x) ≥ β.

In Definition 3, we needed to examine support and accuracy in each of the derived DISs in DD(τ^x); however, in Proposition 4, we can obtain the same results by comparing (minsupp, minacc) and (maxsupp, maxacc) with the thresholds α and β, respectively. In this way, we extended rule generation in DISs to rule generation in NISs, and realized a software tool, NIS-Apriori [27,29]. This is an Apriori algorithm adjusted to NISs, and it can handle not only deterministic information but also non-deterministic information. The NIS-Apriori algorithm does not depend upon the number of derived DISs, and its complexity is almost the same as that of the Apriori algorithm. Related discussion on data mining in warehousing and various types of inexact data can be found in [10].

3.4. Real execution

This subsection shows a real execution for the NIS in Table 3. Every program is implemented in C on a Windows PC with a Pentium 4 (3.2 GHz). We first apply Microsoft Excel to make the following data set flu.csv from Table 3. For indefinite attribute values, we employ a list notation, for example, [high,very high].

high,"[yes,no]",no,yes /* table data */
"[high,very high]",yes,yes,yes
"[normal,high,very high]",no,no,"[yes,no]"
high,yes,"[yes,no]","[yes,no]"
high,"[yes,no]",yes,no
normal,yes,"[yes,no]","[yes,no]"
normal,no,yes,no
"[normal,high,very high]",yes,"[yes,no]",yes

In order to reduce the manipulation of string data, we translate these data to numerical data by using trans.exe. Then, we also make an attribute definition file as follows. In rule generation, we adjust the values in this file. Finally, we execute the nis_apriori.exe command.

8                                /* the number of objects */
4                                /* the number of attributes */
Temperature,Headache,Nausea,Flu  /* names of attributes */
3,1,2,3                          /* 3 candidates, condition attributes */
1,4                              /* 1 decision attribute, 4 = Flu */
0.2                              /* support value (α = 0.2) */
1.0                              /* accuracy value (β = 1.0) */

This execution shows that there is a rule in the lower system, namely τ^5 : [Headache,no] ∧ [Nausea,yes] ⇒ [Flu,no], which satisfies support(τ^5) ≥ 0.2 and accuracy(τ^5) ≥ 1.0 in each ψ ∈ DD(τ^5). In this execution, the constraint is minsupp(τ^x) = |{y ∈ OB | y supports τ^x}|/|OB| ≥ 0.2. Thus, |{y ∈ OB | y supports τ^x}| ≥ |OB| × 0.2 must hold. Therefore, at least two objects must support τ^x, and we need to handle a descriptor [A, ζ_A] satisfying either (CASE A) or (CASE B) below:

(CASE A) |inf([A, ζ_A])| ≥ 2,


(CASE B) |inf([A, ζ_A])| = 1 and (sup([A, ζ_A]) \ inf([A, ζ_A])) ≠ ∅.

Definite objects are obtained in (CASE A), and indefinite objects in (CASE B). In this way, the candidate set CAN(1) is generated. Then, for descriptors [A, ζ_A], [Flu, η_Flu] ∈ CAN(1), if the minimum support count of [A, ζ_A] ∧ [Flu, η_Flu] is at least 2, minacc([A, ζ_A] ∧ [Flu, η_Flu]) is calculated according to Proposition 2. If this value is at least 1.0 (= β), the conjunction is a rule. Otherwise, we add this conjunction to CAN(2). The NIS-Apriori algorithm continues this process until CAN(n) = ∅. The following is a case of the upper system.
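The lower-system rule τ^5 reported by this execution can be confirmed by brute force over the derived DISs in which τ^5 occurs. A Python sketch (feasible here only because the table is tiny):

```python
from itertools import product

# Headache, Nausea and Flu from Table 3 (objects 1..8)
head = {1: {"yes", "no"}, 2: {"yes"}, 3: {"no"}, 4: {"yes"},
        5: {"yes", "no"}, 6: {"yes"}, 7: {"no"}, 8: {"yes"}}
naus = {1: {"no"}, 2: {"yes"}, 3: {"no"}, 4: {"yes", "no"},
        5: {"yes"}, 6: {"yes", "no"}, 7: {"yes"}, 8: {"yes", "no"}}
flu = {1: {"yes"}, 2: {"yes"}, 3: {"yes", "no"}, 4: {"yes", "no"},
       5: {"no"}, 6: {"yes", "no"}, 7: {"no"}, 8: {"yes"}}

objs = list(range(1, 9))
supports, accuracies = [], []
for hs in product(*(sorted(head[x]) for x in objs)):
    for ns in product(*(sorted(naus[x]) for x in objs)):
        for fs in product(*(sorted(flu[x]) for x in objs)):
            if hs[4] != "no":      # tau^5 occurs only if h(5,Headache) = no
                continue
            cond = [i for i in range(8) if hs[i] == "no" and ns[i] == "yes"]
            both = [i for i in cond if fs[i] == "no"]
            supports.append(len(both) / 8)
            accuracies.append(len(both) / len(cond))
print(min(supports) >= 0.2, min(accuracies) >= 1.0)  # True True
```

In every ψ ∈ DD(τ^5), the condition part is matched exactly by objects 5 and 7, both of which have Flu = no, so the accuracy is always 1.0 and the support is always at least 2/8.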


This execution shows that there are five rules in the upper system, namely five implications satisfying support(τ^x) ≥ 0.2 and accuracy(τ^x) ≥ 1.0 in some ψ ∈ DD(τ^x). Here, we need to remark on the implementation of NIS-Apriori. This implementation is not yet complete, because some bugs seem to exist. In reality, an implication [Temperature,very high] ⇒ [Flu,yes] supported by objects 2, 3 and 8 is not obtained in the upper system. We are now removing such bugs and refining this program.

We have briefly summarized RNIA and (Aspect-1) to (Aspect-5). In the subsequent sections, we add (Aspect-6) and (Aspect-7) to RNIA to make this framework more powerful.

4. Aspect-6: Rule generation based on the stability factor

This section proposes a new criterion for discriminating rules in the upper system.

4.1. Introducing the stability factor into the upper system

Now, we consider the next two cases related to rules in the upper system.

(CASE 1) If just one ψ ∈ DD(τ^x) satisfies the condition in rule generation, this τ is picked up as a rule in the upper system.
(CASE 2) If most of the ψ ∈ DD(τ^x) satisfy the condition in rule generation, this τ is also picked up as a rule in the upper system.

These two cases show a weakness in the definition of the upper system: it does not distinguish them. In order to distinguish the two cases, we need to add another criterion to the upper system, i.e., the stability factor of τ.

Definition 4. Let us suppose that τ : [CON, ζ] ⇒ [DEC, η] is a rule in the upper system for α and β, and let OBJ(τ) = sup([CON, ζ]) ∩ sup([DEC, η]). If support(τ) ≥ α and accuracy(τ) ≥ β hold in a ψ ∈ DD(τ^x), we say τ is (α, β)-stable in ψ. Let us define the following:

DD_All(τ) = ∪_{x∈OBJ(τ)} DD(τ^x),
ST(τ, α, β) = {ψ ∈ DD_All(τ) | τ is (α, β)-stable in ψ},
STF(τ, α, β) = |ST(τ, α, β)| / |DD_All(τ)|.


DDAll (τ ) means a set of all derived DISs, where τ occurs. In any DIS, if τ x = τ y (x 6= y), we can easily derive support(τ x ) = support(τ y ) and

accuracy(τ^x) = accuracy(τ^y). Namely, we do not have to specify x (in τ) in any ψ ∈ DDAll(τ).

STF(τ, α, β) is the ratio (the number of derived DISs which make τ (α, β)-stable)/|DDAll(τ)|. This ratio depends only upon τ, and does not depend upon τ^x (x ∈ OB).

Let us consider τ: [Nausea, yes] ⇒ [Flu, no] in Table 3. For α = 0.3 and β = 0.7, τ is not picked up as a rule by the lower system, but it is picked up as a rule by the upper system. Here, DDAll(τ) = DD(τ^4) ∪ DD(τ^5) ∪ DD(τ^6) ∪ DD(τ^7) holds due to Definition 4. Since τ^5 is definite, DD(τ^5) is equal to the set of all derived DISs for {Nausea, Flu}. Clearly, DD(τ^4), DD(τ^6) and DD(τ^7) are subsets of DD(τ^5), and DDAll(τ) = DD(τ^5) holds. There are 64 derived DISs. In Fig. 4, the 64 pairs (support, accuracy) are plotted in a coordinate plane, and "•:number" indicates the number of derived DISs. For example, four derived DISs cause the pair (3/8, 0.5) in Fig. 4. As for τ, |ST(τ, 0.3, 0.7)| = 8 + 2 = 10 and STF(τ, 0.3, 0.7) = 10/64 (about 16%) hold. In order to increase the stability factor, we may adjust α and β; for example, |ST(τ, 0.25, 0.5)| = 52 and STF(τ, 0.25, 0.5) = 52/64 (about 81%) hold. According to this stability factor, we can assign a new probability (of reliability) to rules in the upper system. This stability factor is related to the information incompleteness, and we immediately have the next proposition.

Proposition 5.
(1) For any τ, τ is a rule in the lower system for α and β if STF(τ, α, β) = 1.0.
(2) For a definite τ, τ is a rule in the lower system for α and β if and only if STF(τ, α, β) = 1.0.
(3) For any τ, τ is a rule in the upper system for α and β if and only if STF(τ, α, β) > 0.

If τ is not definite, we need to consider DD(τ^x) for the lower system. Since DD(τ^x) ⊂ DDAll(τ), the (α, β)-stability of τ^x for ψ ∈ DDAll(τ) \ DD(τ^x) may not be assured. On the other hand, (3) always holds for any τ.

This stability factor seems useful, but the number of elements in DDAll(τ) increases exponentially, so calculating this factor will generally be hard. Even so, it is possible to calculate it if the next assumption holds.

(Assumption) The sizes of |sup([CON, ζ]) \ inf([CON, ζ])| and |sup([DEC, η]) \ inf([DEC, η])| are both small.

According to this assumption, the number of derived DISs to be considered is restricted to a small value. In reality, the computational complexity for STF(τ, α, β) depends upon 2^|sup([CON,ζ])\inf([CON,ζ])| and 2^|sup([DEC,η])\inf([DEC,η])|. These are much smaller than |DDAll(τ)|.

4.2. A method for calculating the stability factor by pe-classes

In order to calculate STF(τ, α, β), we propose a method using inf and sup in Proposition 1, which connects a pe-class [x]_ATR (depending upon ∧_{A∈ATR}[A, ζ_A]) and the two sets inf and sup. By handling the pe-classes, it is possible to manage all derived DISs.

Definition 5. In a NIS = (OB, AT, {VAL_A | A ∈ AT}, g), we define pe-classes determined by [A, ζ_A] and ∧_{A∈ATR}[A, ζ_A]:
(1) pe_{[A,ζ_A],M} = inf([A, ζ_A]) ∪ M, M ⊆ DIFF_{[A,ζ_A]} = sup([A, ζ_A]) \ inf([A, ζ_A]).
(2) pe_{∧_{A∈ATR}[A,ζ_A],M} = inf(∧_{A∈ATR}[A, ζ_A]) ∪ M, M ⊆ DIFF_{∧_{A∈ATR}[A,ζ_A]} = sup(∧_{A∈ATR}[A, ζ_A]) \ inf(∧_{A∈ATR}[A, ζ_A]).

Both pe_{[A,ζ_A],M} and pe_{∧_{A∈ATR}[A,ζ_A],M} are almost the same as X in Proposition 1. In Definition 5, we do not focus on any object x; we focus on descriptors. This pe_{[A,ζ_A],M} is a pe-class defined by a descriptor [A, ζ_A]. Furthermore, we assign the next attribute values to x ∈ OB:
(Assignment 1) If x ∈ pe_{[A,ζ_A],M}, the value is ζ_A ∈ VAL_A.
(Assignment 2) If x ∈ sup([A, ζ_A]) \ pe_{[A,ζ_A],M}, the value is any value except ζ_A.
(Assignment 3) If x ∉ sup([A, ζ_A]), the value is any value in g(x, A).
In this way, it is possible to connect any pe_{[A,ζ_A],M} (and pe_{∧_{A∈ATR}[A,ζ_A],M}) to the set of derived DISs which cause pe_{[A,ζ_A],M} (and pe_{∧_{A∈ATR}[A,ζ_A],M}). In Fig. 5,

H. Sakai et al. / Stable rule extraction and decision making in RNIA

51

Fig. 4. A distribution of (support, accuracy) for τ: [Nausea, yes] ⇒ [Flu, no] in Table 3.

Fig. 5. A relation of derived DISs and pe_{[Temperature,normal],M}. Since the attribute value normal does not appear in objects 1, 2, 4 nor 5, we may omit them. However, |g(2, Temperature)| = 2 holds, so we consider twice the number of derived DISs determined by the four objects 3, 6, 7 and 8. Every derived DIS is characterized by a pe-class, and the 18 derived DISs are reduced to four kinds of pe-classes defined by [Temperature, normal].

derived DISs causing pe_{[Temperature,normal],M} are shown. In Fig. 5, inf([Temperature, normal]) = {6, 7}, sup([Temperature, normal]) = {3, 6, 7, 8}, and M ⊆ DIFF_{[Temperature,normal]} = {3, 8}. Therefore, there are four kinds of pe-classes defined by the descriptor [Temperature, normal], i.e., {6, 7}, {3, 6, 7}, {6, 7, 8} and {3, 6, 7, 8}. We can enumerate all derived DISs by using these four pe-classes.

In the definition of STF(τ, α, β) = |ST(τ, α, β)|/|DDAll(τ)|, we calculate a ratio. Therefore, we need the numbers of derived DISs in ST(τ, α, β) and DDAll(τ). Here, the number of cases by (Assignment 3) is multiplied into both the denominator and the numerator, so it is enough to consider the cases by (Assignment 1) and (Assignment 2). We have the next proposition.

Proposition 6. pe_{[A,ζ_A],M} causes

NUM_{[A,ζ_A],M} = ∏_{y ∈ sup([A,ζ_A]) \ pe_{[A,ζ_A],M}} (|g(y, A)| − 1)

cases of derived DISs. pe_{∧_{A∈ATR}[A,ζ_A],M} causes

NUM_{∧_{A∈ATR}[A,ζ_A],M} = ∏_{y ∈ sup(∧_{A∈ATR}[A,ζ_A]) \ pe_{∧_{A∈ATR}[A,ζ_A],M}} ((∏_{A∈ATR} |g(y, A)|) − 1)

cases of derived DISs.
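As a quick check of Proposition 6, the following sketch recomputes the counts behind Fig. 5. The value-set sizes |g(3, Temperature)| = |g(8, Temperature)| = 3 are an assumption inferred from the caption's totals (2 × (4 + 2 + 2 + 1) = 18); only inf, sup and |g(2, Temperature)| = 2 are taken from the text.

```python
from itertools import combinations

def powerset(s):
    """All subsets of s, as frozensets."""
    s = list(s)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

def num_pe(g_sizes, sup, pe):
    """NUM_{[A,zeta],M}: the number of value assignments over objects in sup
    that are consistent with the pe-class pe (Proposition 6)."""
    n = 1
    for y in sup - pe:
        n *= g_sizes[y] - 1   # object y must take some value other than zeta
    return n

# Fig. 5 example: descriptor [Temperature, normal]
inf_, sup_ = {6, 7}, {3, 6, 7, 8}
g_sizes = {3: 3, 6: 1, 7: 1, 8: 3}   # |g(y, Temperature)|; sizes 3 are assumed
nums = {M: num_pe(g_sizes, sup_, inf_ | M) for M in powerset(sup_ - inf_)}
print(sorted(nums.values(), reverse=True))   # [4, 2, 2, 1]
# object 2 (normal impossible, |g(2, Temperature)| = 2) doubles every case:
print(2 * sum(nums.values()))                # 18 derived DISs, as in Fig. 5
```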


Now, we advance to the calculation of support and accuracy by pe_{[A,ζ_A],M}. Since we consider DDAll(τ), we do not consider a case in which pe_{[CON,ζ],M} ∩ pe_{[DEC,η],M′} = ∅, because τ does not occur in such a case. For pe_{[CON,ζ],M} and pe_{[DEC,η],M′} satisfying pe_{[CON,ζ],M} ∩ pe_{[DEC,η],M′} ≠ ∅, there exist some derived DISs with τ. By using pe_{[CON,ζ],M} (≠ ∅) and pe_{[DEC,η],M′}, it is possible to calculate support(τ) and accuracy(τ) as below:

support(τ) = |pe_{[CON,ζ],M} ∩ pe_{[DEC,η],M′}|/|OB|,
accuracy(τ) = |pe_{[CON,ζ],M} ∩ pe_{[DEC,η],M′}|/|pe_{[CON,ζ],M}|.

For pe_{[CON,ζ],M} and pe_{[DEC,η],M′}, τ occurs in NUM_{[CON,ζ],M} × NUM_{[DEC,η],M′} cases of derived DISs due to Proposition 6. If support(τ) > α and accuracy(τ) > β, this τ is (α, β)-stable in these cases of derived DISs. Furthermore, we obtain

|DDAll(τ)| = Σ_{M,M′: pe_{[CON,ζ],M} ∩ pe_{[DEC,η],M′} ≠ ∅} NUM_{[CON,ζ],M} × NUM_{[DEC,η],M′}.

Finally, we have the next method to calculate STF(τ, α, β).

Algorithm 1 (An overview of the calculation of the stability factor).
(1) Obtain inf([CON, ζ]), sup([CON, ζ]), DIFF_{[CON,ζ]}, inf([DEC, η]), sup([DEC, η]) and DIFF_{[DEC,η]}.
(2) Assign DENO = 0 and NUME = 0.
(3) Repeat the following for each pair pe_{[CON,ζ],M} (≠ ∅) and pe_{[DEC,η],M′}:
(3-1) Calculate support(τ) (τ: [CON, ζ] ⇒ [DEC, η]).
(3-2) If support(τ) > 0, execute (3-3); otherwise go to (3).
(3-3) DENO = DENO + NUM_{[CON,ζ],M} × NUM_{[DEC,η],M′}. Calculate accuracy(τ). If support(τ) > α and accuracy(τ) > β, NUME = NUME + NUM_{[CON,ζ],M} × NUM_{[DEC,η],M′}.
(4) STF(τ, α, β) is the ratio NUME/DENO.

Now, we simulate the above calculation. Let us consider the rule τ: [Nausea, yes] ⇒ [Flu, no] (α = 0.3 and β = 0.7) in Fig. 4 again. At first, we obtain inf and sup, i.e., inf([Nausea, yes]) = {2, 5, 7}, sup([Nausea, yes]) = {2, 4, 5, 6, 7, 8}, DIFF_{[Nausea,yes]} = {4, 6, 8}, inf([Flu, no]) = {5, 7}, sup([Flu, no]) = {3, 4, 5, 6, 7}, DIFF_{[Flu,no]} = {3, 4, 6}. Then, we fix DENO = 0 and NUME = 0. In this case, τ^5 is definite. Thus, we know DENO is equal to the number of all derived DISs for {Nausea, Flu}, i.e., DENO = 64. Then, the condition on support and accuracy is examined for each combination: pe_{[Nausea,yes],M} = {2, 5, 7} ∪ M (M ⊆ {4, 6, 8}), pe_{[Flu,no],M′} = {5, 7} ∪ M′ (M′ ⊆ {3, 4, 6}). For example, the combination pe_{[Nausea,yes],{4}} = {2, 4, 5, 7} and pe_{[Flu,no],{4}} = {4, 5, 7} satisfies |{2, 4, 5, 7} ∩ {4, 5, 7}|/8 > 0.3 and |{2, 4, 5, 7} ∩ {4, 5, 7}|/|{2, 4, 5, 7}| > 0.7. Since τ is (0.3, 0.7)-stable for this pair, the following is calculated due to Proposition 6: NUM_{[Nausea,yes],{4}} = (|g(6, Nausea)| − 1) × (|g(8, Nausea)| − 1) = 1, NUM_{[Flu,no],{4}} = (|g(3, Flu)| − 1) × (|g(6, Flu)| − 1) = 1. In this way, we examine the number of derived DISs where τ is (0.3, 0.7)-stable. Finally, we obtain |ST(τ, 0.3, 0.7)| = 10 and STF(τ, 0.3, 0.7) = |ST(τ, 0.3, 0.7)|/|DDAll(τ)| = 10/64 ≈ 16%.

Generally, the number of distinct M in pe_{[CON,ζ],M} is 2^|DIFF_{[CON,ζ]}|. In this example, we handled 8 cases for M and 8 cases for M′, i.e., 64 combinations. This is equal to the number of all derived DISs. Since each attribute takes one of the two values yes or no, NUM_{[Nausea,yes],M} = 1 and NUM_{[Flu,no],M′} = 1 always hold. Therefore, the enumeration by pe-classes may not be effective in this example. However, if the number of attribute values is 3 or more, several different derived DISs are reduced to one pe_{[A,ζ_A],M}. In such a case, the proposed method is more effective. In Fig. 5, 18 derived DISs are reduced to four pe-classes.

We have implemented a simple program for calculating the stability factor. In reality, we obtained the following two implications in the upper system for α = 0.3 and β = 0.7:
τ: [Nausea, yes] ⇒ [Flu, no], STF(τ, 0.3, 0.7) = 10/64 ≈ 16%.
τ′: [Headache, yes] ⇒ [Flu, yes], STF(τ′, 0.3, 0.7) = 16/32 = 50%.
From this result, we may conclude that τ′ is more reliable than τ.

5. Aspect-7: Question-answering functionality for NISs

In the previous section, we referred to the lower system and the upper system in RNIA. Obtained rules characterize the tendencies in data, and they are often applied to decision making. However, if the user's current condition does not match the condition part of any rule, we cannot apply the obtained rules. In such a case, we need a question-answering functionality for obtaining additional information, as illustrated in Fig. 6. This section considers such a question-answering functionality in RNIA.

Fig. 6. If the user's current condition is A2, we cannot apply the stored rules to decision making.

5.1. Question-answering functionality in RNIA

In standard table data, such as Excel-format data, we can easily specify the condition attribute values by using the filtering functionality, and the tuples satisfying the conditions, including the decision attribute values, are returned. Such a question-answering functionality is usually provided in standard software. However, this functionality covers just DISs, and does not cover NISs. In reality, the list notation [high, very high] in Section 3.4 is identified with a single attribute value, and the interpretation "either high or very high" is ignored. In this situation, we propose the following question-answering functionality.

Definition 6. We define the next question-answering functionality.
(Input) A NIS, DEC ⊆ AT and the condition attribute values ∧_{A∈CON}[A, ζ_A].
(Output) Each decision attribute value [DEC, η] with minsupp(τ), minacc(τ), maxsupp(τ) and maxacc(τ) of a definite τ: ∧_{A∈CON}[A, ζ_A] ⇒ [DEC, η].

We specify the conditions ∧_{A∈CON}[A, ζ_A], and we directly obtain the decision attribute values [DEC, η] of a definite τ: ∧_{A∈CON}[A, ζ_A] ⇒ [DEC, η]. In reality, we calculate the four criterion values by inf, sup and the formulas in Proposition 2. The validity of the decision [DEC, η] is characterized by these criterion values of τ. If τ: ∧_{A∈CON}[A, ζ_A] ⇒ [DEC, η] is very characteristic, this τ will be obtained by the NIS-Apriori algorithm. Therefore, if this τ is not recognized as a rule, τ may not be characteristic, and we may not positively use this τ for decision making. In such a case, however, we need the question-answering functionality in Definition 6 in order to obtain additional information for decision making. As we have shown, the calculation of the four values minsupp(τ^x), minacc(τ^x), maxsupp(τ^x) and maxacc(τ^x) does not depend upon |DD(τ^x)|. This question-answering functionality takes the same role as decision making by obtained rules. In reality, we have implemented a simple software tool for this functionality in C#.

5.2. Mammographic data with incomplete information

We applied our software tools to the mammographic data set in the UCI data repository [39]. This data set originally consists of 961 objects, and we picked up the first 150 objects for simplicity. There are 6 attributes:
BI-RADS assessment, Age, Shape, Margin, Density, Severity.
VAL_Age: integer, VAL_Shape = {1, 2, 3, 4}, VAL_Margin = {1, 2, 3, 4, 5}, VAL_Density = {1, 2, 3, 4}, VAL_Severity = {0 (benign), 1 (malignant)}.
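The semantics of Definition 6 can be sketched by brute force over the derived DISs restricted to the condition and decision attributes. The actual tool uses the closed-form inf/sup formulas of Proposition 2 instead of enumeration, and the four-object NIS below is purely hypothetical.

```python
from fractions import Fraction
from itertools import product

def qa_minmax(nis, con_attr, con_val, dec_attr, dec_val):
    """min/max of support and accuracy of tau: [con_attr,con_val] => [dec_attr,dec_val]
    over all derived DISs restricted to {con_attr, dec_attr} (brute force)."""
    objs = sorted(nis)
    choices = [list(product(sorted(nis[x][con_attr]), sorted(nis[x][dec_attr])))
               for x in objs]
    supports, accuracies = [], []
    for assignment in product(*choices):           # one derived DIS per assignment
        match_c = [x for x, (c, d) in zip(objs, assignment) if c == con_val]
        match_cd = [x for x, (c, d) in zip(objs, assignment)
                    if c == con_val and d == dec_val]
        supports.append(Fraction(len(match_cd), len(objs)))
        if match_c:
            accuracies.append(Fraction(len(match_cd), len(match_c)))
    return min(supports), max(supports), min(accuracies), max(accuracies)

# a toy NIS (hypothetical data): 4 objects, condition A, decision D
nis = {1: {'A': {'a'}, 'D': {'d'}},
       2: {'A': {'a', 'b'}, 'D': {'d'}},
       3: {'A': {'a'}, 'D': {'d', 'e'}},
       4: {'A': {'b'}, 'D': {'e'}}}
print(qa_minmax(nis, 'A', 'a', 'D', 'd'))
# (Fraction(1, 4), Fraction(3, 4), Fraction(1, 2), Fraction(1, 1))
```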


There are 76 missing values in the 150 objects, denoted by the ? symbol: no missing values for Age, 8 missing values for Shape, 21 missing values for Margin, 47 missing values for Density, and no missing values for Severity. Due to such missing values, it is necessary to employ a special tool for analyzing this data set, and we apply the framework of RNIA. We first translate this data set into a NIS. Since each VAL is a finite set, we can replace each ? symbol with its VAL. For example, 8 missing values exist in the attribute Shape, so we replaced each ? in this attribute with the set {1, 2, 3, 4}. Since BI-RADS assessment is data from a double-review process by physicians, we omit this attribute. In this way, we obtained a NIS, but the number of derived DISs is about 10^48 (= 5^21 × 4^55). It would be hard to enumerate all derived DISs sequentially; we avoid this problem by applying Proposition 2.
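The translation of ? symbols into value sets, and the derived DIS count, can be sketched as follows. The sample row is hypothetical; the exponents 21 and 8 + 47 = 55 come from the missing-value counts above.

```python
import math

def to_nis_row(row, domains):
    """Replace each '?' by the attribute's full value set VAL_A (a singleton otherwise)."""
    return {a: set(domains[a]) if v == '?' else {v} for a, v in row.items()}

domains = {'Shape': {1, 2, 3, 4}, 'Margin': {1, 2, 3, 4, 5}, 'Density': {1, 2, 3, 4}}
row = {'Shape': 3, 'Margin': '?', 'Density': '?'}   # a hypothetical object
print(to_nis_row(row, domains))   # Margin and Density expand to their full domains

# derived DIS count for the 150-object sample: 21 missing Margin values (|VAL| = 5),
# 8 missing Shape values and 47 missing Density values (|VAL| = 4 each)
count = 5**21 * 4**(8 + 47)
print(round(math.log10(count), 1))   # 47.8, i.e. about 10^48
```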

5.3. Question-answering in practice

Let us show how the proposed question-answering framework performs on the example of the Mammographic.csv data. The decision attribute is Severity (VAL_Severity = {0 (benign), 1 (malignant)}), and the candidate condition attributes are Age, Shape, Margin and Density. In the following, we handle cases in which no rule (α = 0.3 and β = 0.5) can be applied to the user's condition. Let us consider some real cases.

(CASE 1)
Condition:[Shape,1]
Decision 1:[Severity,1]
minsupp=9/150=0.060,minacc=9/48=0.181
maxsupp=13/150=0.087,maxacc=13/48=0.270
Decision 2:[Severity,0]
minsupp=35/150=0.233,minacc=35/48=0.729
maxsupp=39/150=0.260,maxacc=39/48=0.813

There are two decisions, [Severity, 1] and [Severity, 0]. In this case, [Severity, 0] will probably be concluded from the criterion values, because every value in Decision 2 is uniformly better than that in Decision 1. In reality, the implication [Shape, 1] ⇒ [Severity, 0] is a rule in the lower system for α = 0.2 and β = 0.7. If we employed lower values of α and β for generating rules, this question-answering functionality might not be necessary; however, lots of improper rules might be generated in that case.

(CASE 2)
Condition:[Density,3]
Decision 1:[Severity,1]
minsupp=45/150=0.300,minacc=45/115=0.391
maxsupp=55/150=0.367,maxacc=55/88=0.625
Decision 2:[Severity,0]
minsupp=33/150=0.220,minacc=33/88=0.375
maxsupp=70/150=0.467,maxacc=70/115=0.609

In CASE 2, it may be difficult to choose the decision according to the criterion values. In this case, 33 percent of the 150 objects correspond to τ′: [Density, ?] ⇒ [Severity, 1] or τ′′: [Density, ?] ⇒ [Severity, 0]. If we replace ? in τ′ with 3 and ? in τ′′ with 2, [Severity, 1] will be concluded, because both the support and accuracy values of τ′ are much better than those of τ′′. On the other hand, if we replace ? in τ′ with 2 and ? in τ′′ with 3, [Severity, 0] will be concluded. Thus, CASE 2 shows that the conclusion strongly depends upon the information incompleteness.

(CASE 3)
Condition:[Shape,2]∧[Margin,2]
Decision 1:[Severity,1]
minsupp=0/150=0.000,minacc=0/6=0.000
maxsupp=2/150=0.013,maxacc=2/2=1.000
Decision 2:[Severity,0]
minsupp=0/150=0.000,minacc=0/2=0.000
maxsupp=6/150=0.040,maxacc=6/6=1.000

In CASE 3, it may also be difficult to choose the decision according to the criterion values. There exist few objects with the condition [Shape, 2] ∧ [Margin, 2]. In both cases, maxacc = 1.0 holds, but the support values are too low.

Through these three cases, we have shown the aspect of decision making in RNIA. Figure 7 is a real execution for the condition [Margin, 1] over all 961 objects.
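The comparison made in CASE 1 ("every value in Decision 2 is better") amounts to a dominance check over the four criterion values. A sketch, with larger-is-better taken as an assumption for all four values:

```python
from fractions import Fraction as F

CRITERIA = ('minsupp', 'minacc', 'maxsupp', 'maxacc')

def dominates(d1, d2):
    """d1 dominates d2 if d1 is at least as good on every criterion and
    strictly better on at least one (here: larger is better for all four)."""
    ge = all(d1[c] >= d2[c] for c in CRITERIA)
    gt = any(d1[c] > d2[c] for c in CRITERIA)
    return ge and gt

# CASE 1: condition [Shape,1], the two candidate decisions for Severity
sev1 = {'minsupp': F(9, 150), 'minacc': F(9, 48),
        'maxsupp': F(13, 150), 'maxacc': F(13, 48)}
sev0 = {'minsupp': F(35, 150), 'minacc': F(35, 48),
        'maxsupp': F(39, 150), 'maxacc': F(39, 48)}
print(dominates(sev0, sev1))   # True  -> [Severity,0] is preferred
print(dominates(sev1, sev0))   # False
```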

6. Concluding remarks

This paper surveyed the foundations of Rough Non-deterministic Information Analysis (RNIA), including DISs and NISs. RNIA is a good framework for handling information incompleteness. We mainly focused on rule generation and clarified the NIS-Apriori algorithm. This algorithm can generate rules defined over all derived DISs; however, its computational complexity does not depend upon the number of derived DISs. In order to further enhance the framework of RNIA, we considered two additional aspects: extending the


Fig. 7. Real execution for the condition [Margin,1] over all 961 objects. There are about 150 missing values, and the number of derived DISs is more than 10^100. The support and accuracy values of [Severity, 0] are much better than those of [Severity, 1].

Fig. 8. A total chart of decision making in RNIA.

previous approach to RNIA-based decision rule generation by taking into account the rules' stability factor analysis, as well as completing RNIA with the fundamental question-answering functionality that is often applied prior to automatic decision rule generation in decision making processes. The extended RNIA framework is illustrated by Fig. 8.

As a result of our study, we can distinguish the following four modes of decision rules that can be generated from NISs:
(D-LS) A definite implication in the lower system.
(I-LS) An indefinite implication in the lower system.
(D-H-US) A definite implication with a high stability factor in the upper system.
(I-L-US) An indefinite implication with a low stability factor in the upper system.
The D-LS class is the most reliable, and the I-L-US class handles only possibility. We also enumerate the following properties of rules in RNIA:
(1) If we apply NIS-Apriori to a DIS, the lower and the upper system define the same rules.


(2) If we fix β = 1.0 in a DIS, this defines the rules based on Pawlak's consistency [21].
(3) If we fix β = 1.0 in a NIS, this defines the consistency-based certain rules [25,26].
(4) In a DIS, STF(τ, α, β) = 1.0 is equivalent to τ being obtained by the Apriori algorithm for α and β.
(5) In a NIS, STF(τ, α, β) discriminates between rules in the upper system.
(6) For a definite τ in a NIS, τ is a rule of the lower system if and only if STF(τ, α, β) = 1.0.

In the near future, we plan to develop a software environment more completely addressing the new aspects of RNIA presented in Fig. 8. It is also important to note that the proposed methodology can be applied to data sets with other forms of information incompleteness, like incomplete information systems or Lipski's incomplete databases. Finally, the two newly introduced aspects of our RNIA framework can be combined in order to make it even more expressive. In particular, a question-answering methodology based on an appropriate interpretation of stability factors for the users of an RNIA system may result in quite a novel approach to handling non-deterministic data in decision making processes.

Acknowledgment

This paper is a combined extended version of two different conference papers [29,30]. The first author, Hiroshi Sakai, was partially supported by the Grant-in-Aid for Scientific Research (C) (No. 18500214, No. 22500204), Japan Society for the Promotion of Science. The fourth author, Dominik Ślęzak, was partially supported by the grants N N516 368334 and N N516 077837 from the Ministry of Science and Higher Education of the Republic of Poland.

References

[1] R. Agrawal and R. Srikant, Fast Algorithms for Mining Association Rules, in: Proceedings of the 20th Very Large Data Base, 1994, pp. 487–499.
[2] R. Agrawal, H. Mannila, R. Srikant, H. Toivonen and A. Verkamo, Fast Discovery of Association Rules, in: Advances in Knowledge Discovery and Data Mining, AAAI/MIT Press, 1996, pp. 307–328.
[3] J. Bazan, H.S. Nguyen, S.H. Nguyen, P. Synak and J. Wróblewski, Rough Set Algorithms in Classification Problem, Rough Set Methods and Applications, Studies in Fuzziness and Soft Computing, Springer 56 (2000), 49–88.
[4] A. Ceglar and J.F. Roddick, Association mining, ACM Computing Survey 38 (2006).
[5] C. Cornelis, R. Jensen, G. Hurtado Martín and D. Ślęzak, Attribute Selection with Fuzzy Decision Reducts, Information Sciences 180 (2010), 209–224.
[6] S. Demri and E. Orłowska, Incomplete Information: Structure, Inference, Complexity, Monographs in Theoretical Computer Science, Springer, 2002.
[7] J. Grzymała-Busse, A New Version of the Rule Induction System LERS, Fundamenta Informaticae 31 (1997), 27–39.
[8] J. Grzymała-Busse and J. Stefanowski, Three Discretization Methods for Rule Induction, Int'l. Journal of Intelligent Systems 16 (2001), 29–38.
[9] J. Grzymała-Busse, Data with Missing Attribute Values: Generalization of Indiscernibility Relation and Rule Induction, Transactions on Rough Sets 1 (2004), 78–95.
[10] Infobright.org Forums: http://www.infobright.org/Forums/viewthread/288/, http://www.infobright.org/Forums/viewthread/621/.
[11] J. Komorowski, Z. Pawlak, L. Polkowski and A. Skowron, Rough Sets: A Tutorial, in: Rough Fuzzy Hybridization, Springer, 1999, pp. 3–98.
[12] M. Kryszkiewicz, Rough Set Approach to Incomplete Information Systems, Information Sciences 112 (1998), 39–49.
[13] M. Kryszkiewicz, Rules in Incomplete Information Systems, Information Sciences 113 (1999), 271–292.
[14] W. Lipski, On Semantic Issues Connected with Incomplete Information Databases, ACM Transactions on Database Systems 4 (1979), 262–296.
[15] W. Lipski, On Databases with Incomplete Information, Journal of the ACM 28 (1981), 41–70.
[16] A. Nakamura, A Rough Logic based on Incomplete Information and Its Application, Int'l. Journal of Approximate Reasoning 15 (1996), 367–378.
[17] M. Nakata and H. Sakai, Lower and Upper Approximations in Data Tables Containing Possibilistic Information, Transactions on Rough Sets 7 (2007), 170–189.
[18] M. Nakata and H. Sakai, Applying Rough Sets to Information Tables Containing Possibilistic Values, Transactions on Computational Science 2 (2008), 180–204.
[19] E. Orłowska and Z. Pawlak, Representation of Nondeterministic Information, Theoretical Computer Science 29 (1984), 27–39.
[20] E. Orłowska, What You Always Wanted to Know about Rough Sets, Incomplete Information: Rough Set Analysis, Studies in Fuzziness and Soft Computing 13 (1998), 1–20.
[21] Z. Pawlak, Rough Sets: Theoretical Aspects of Reasoning about Data, Kluwer Academic Publishers, 1991.
[22] Z. Pawlak, Some Issues on Rough Sets, Transactions on Rough Sets 1 (2004), 1–58.
[23] W. Pedrycz, A. Skowron and V. Kreinovich, eds, Handbook of Granular Computing, Wiley, 2008.
[24] H. Sakai, Effective Procedures for Handling Possible Equivalence Relations in Non-deterministic Information Systems, Fundamenta Informaticae 48 (2001), 343–362.
[25] H. Sakai and A. Okuma, Basic Algorithms and Tools for Rough Non-deterministic Information Analysis, Transactions on Rough Sets 1 (2004), 209–231.
[26] H. Sakai and M. Nakata, An Application of Discernibility Functions to Generating Minimal Rules in Non-deterministic Information Systems, Journal of Advanced Computational Intelligence and Intelligent Informatics 10 (2006), 695–702.
[27] H. Sakai, R. Ishibashi and M. Nakata, On Rules and Apriori Algorithm in Non-deterministic Information Systems, Transactions on Rough Sets 9 (2008), 328–350.
[28] H. Sakai, K. Koba and M. Nakata, Rough Sets Based Rule Generation from Data with Categorical and Numerical Values, Journal of Advanced Computational Intelligence and Intelligent Informatics 12 (2008), 426–434.
[29] H. Sakai, K. Hayashi, M. Nakata and D. Ślęzak, The Lower System, the Upper System and Rules with Stability Factor in Non-deterministic Information Systems, in: Proceedings of RSFDGrC 2009, Springer, LNAI Vol. 5908, 2009, pp. 313–320.
[30] H. Sakai, K. Hayashi, H. Kimura and M. Nakata, An Aspect of Decision Making in Rough Non-deterministic Information Analysis, in: New Advances in Intelligent Decision Technologies, Springer, Vol. 199, 2009, pp. 527–536.
[31] H. Sakai, M. Nakata and D. Ślęzak, Rule Generation in Lipski's Incomplete Information Databases, in: Proceedings of RSCTC 2010, Springer, LNAI Vol. 6086, 2010, pp. 376–385.
[32] A. Skowron and C. Rauszer, The Discernibility Matrices and Functions in Information Systems, in: Intelligent Decision Support – Handbook of Advances and Applications of the Rough Set Theory, Kluwer Academic Publishers, 1992, pp. 331–362.
[33] D. Ślęzak, J. Wróblewski, V. Eastwood and P. Synak, Brighthouse: an analytic data warehouse for ad-hoc queries, in: Proceedings of VLDB 2008, 2008, pp. 1337–1345.
[34] D. Ślęzak and H. Sakai, Automatic extraction of decision rules from non-deterministic data systems: Theoretical foundations and SQL-based implementation, in: Proceedings of DTA 2009, Springer, CCIS Vol. 64, 2009, pp. 151–162.
[35] D. Ślęzak and V. Eastwood, Data Warehouse Technology by Infobright, in: Proceedings of SIGMOD 2009, 2009, pp. 841–846.
[36] D. Ślęzak, Rough Sets and Functional Dependencies in Data: Foundations of Association Reducts, Transactions on Computational Science 5 (2009), 182–205.
[37] J. Stefanowski and A. Tsoukias, On the Extension of Rough Sets under Incomplete Information, in: Proceedings of RSFDGrC 1999, Springer, LNAI Vol. 1711, 1999, pp. 73–81.
[38] J. Stefanowski and A. Tsoukias, Incomplete Information Tables and Rough Classification, Computational Intelligence 7 (2001), 212–219.
[39] UCI Machine Learning Repository: http://mlearn.ics.uci.edu/MLRepository.html.
[40] W. Ziarko, Variable Precision Rough Set Model, Journal of Computer and System Sciences 46 (1993), 39–59.
