Ontology-based Association Rules Retrieval using Protégé Tools

Bin Shen1, Min Yao1, Zhaohui Wu1, Yangu Zhang2, and Wensheng Yi1
1. College of Computer, Zhejiang University, Hangzhou 310027, China
2. School of Computer Science & Engineering, Wenzhou University, Wenzhou, 325027, China
tsingbin@zju.edu.cn

Abstract
Existing methods of association rules retrieval do not give enough support for retrieving high-level semantic information. To address this problem, we propose a new method of association rules retrieval based on ontology and the semantic web. Ontology-based association rules retrieval deals well with the problems of rule semantics sharing, rule semantics consistency and intelligibility. We discuss the method in detail and implement a prototype system called O-ARR using Protégé tools. In this paper, we first present the system architecture of O-ARR, and then list and discuss its retrieval methods. We analyze several key issues of O-ARR, including the establishment of the rule retrieval ontology, the annotation of ontology instances, query parsing and the user interface. Our method also provides technical support for further use of rule information, such as automatic rule analysis and intelligent reasoning.

1. Introduction
Association rules mining is a basic but powerful method in the data mining domain. When association rules mining is applied to databases [2], a large number of association rules can be produced. Because this number may be very large, it is hard for people to explore the rules. It is therefore necessary to organize and retrieve association rules effectively to support their comprehension and use, and this has increasingly become a major challenge for the post-analysis stage of the data mining process. Several approaches have been proposed to meet this challenge, e.g. rule pruning [5], grouping [5,6], clustering and rule visualization. Among them, one important approach is association rules retrieval [1,2,3,4].

Sixth IEEE International Conference on Data Mining - Workshops (ICDMW'06) 0-7695-2702-7/06 $20.00 © 2006

A series of related works have studied association rules retrieval from various aspects, but research in this area is still in its infancy. Reference [4] uses a rule cache to postpone restrict operations on the transactions from rule generation to rule retrieval, which reduces the retrieval time for association rules. Alex Tuzhilin and Bing Liu [3] provide a powerful rule query language, Rule-QL, for querying multiple rulebases; it has a rigorous theoretical foundation in a rule-based calculus, RC. Reference [2] proposes several rule evaluation operators, including rule grouping, filtering, browsing, and data inspection operators, to handle very large numbers of association rules in the analysis of microarray data, and reports satisfactory results. Tadeusz Morzy and Maciej Zakrzewicz [1] introduce an index structure named Group Bitmap Index to speed up subset search for association rules retrieval in relational databases.

These methods query rulebases using only conditions on low-level information of association rules, such as rule format and rule characteristics. A rule format condition lets users specify that the left-hand-side or right-hand-side of rules should or should not contain certain items. A rule characteristic condition lets users query a rulebase by rule characteristics, e.g. confidence, support or rule length. These query methods, however, give little support for querying high-level information of rules. They therefore cannot overcome the obstacles that semantic obscurity and semantic inconsistency pose to information integration and knowledge sharing during rule retrieval when the rules are stored in distributed heterogeneous rulebases. These obstacles also limit the use of rule information for further intelligent rule analysis and reasoning.
Emerging ontology and semantic web techniques offer a solution to these obstacles by providing a consistent semantic platform for the rule retrieval process. We therefore use ontology and semantic web techniques to improve semantic retrieval of association rules. In this paper, aiming at high-level information retrieval, we propose a new ontology-based rule retrieval method in the setting of retail items. We also build a prototype ontology-based association rules retrieval system (O-ARR) to demonstrate the usefulness of our method for semantic retrieval of association rules.

2. Association rule and its evaluations
Association rules take various forms, such as boolean, generalized, multidimensional and quantitative association rules. In this paper, we adopt generalized association rules to describe and test our ontology-based rule retrieval mechanism. A generalized association rule can be formally described as follows. Let D be a set of transactions and I = {i1, i2, …, im} be a set of items. T is a directed acyclic graph on I, representing a set of taxonomies. A generalized association rule is an implication of the form X ⇒ Y, where X ⊂ I, Y ⊂ I, X ∩ Y = ∅, and no item in Y is an ancestor of any item in X. We call X and Y the left-hand-side and right-hand-side of the association rule. Several measures can be used to evaluate the interestingness of association rules; the most commonly used are the support and confidence of a rule.

3. System architecture of ontology-based rules retrieval
3.1 System architecture
To support semantic retrieval of association rules, we develop a prototype named O-ARR based on ontology and the semantic web. The system architecture of O-ARR is shown in Figure 1. It consists of four main parts: rule format transformation, the ontology knowledge base of rule retrieval, ontology creation and maintenance, and the intelligent user interface. Their characteristics and functions are as follows.
(1) Rule format transformation converts the various forms of association rules stored in distributed heterogeneous rulebases into rules in an ontology language format (RDF/RDFS, OWL and so on), which are stored in the ontology knowledge base. The source rulebase of each transformed rule is recorded as an attribute of the rule.
(2) The ontology knowledge base is the kernel of intelligent rule retrieval. It contains the semantic model of association rules and the semantic model of items. Its main function is to store the rule ontology and related domain knowledge, providing the core support for ontology-based association rules retrieval.
(3) Rule ontology creation and maintenance is carried out by ontologists and is the foundation of successful ontology-based rule retrieval.
(4) The intelligent user interface supplies the intelligent rule retrieval service and helps users focus on the rules that interest them.
O-ARR comprises four levels: the data level, the ontology level, the semantic level and the service level. The data level contains the rule sets that exist in the various distributed heterogeneous rulebases, where data about rules are distributed, discrete and non-semantic. At the ontology level, the rule retrieval ontology is created and maintained; it organizes correlative concepts and their relationships systematically to provide consistent, machine-readable semantics. The semantic level, also referred to as the knowledge level, sits on top of the ontology level and offers domain knowledge for rule retrieval and comprehension. The service level supplies the intelligent rule retrieval service to users.


Figure 1. System architecture of O-ARR
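The rule format transformation step of Section 3.1 can be illustrated with a minimal sketch. It turns one association rule into subject-predicate-object triples and records the source rulebase as an attribute of the rule. The namespace URI, the SourceRule type, the rulebase name Rulebase_A and the item Milk are hypothetical; the paper does not specify how O-ARR represents rules internally, so this is only one plausible shape.

```python
from dataclasses import dataclass

# Hypothetical namespace; the actual URIs used by O-ARR are not given.
NS = "http://example.org/oarr#"


@dataclass(frozen=True)
class SourceRule:
    """One rule as it might be read from a source rulebase (assumed layout)."""
    rule_id: str
    lhs: tuple
    rhs: tuple
    support: float
    confidence: float


def rule_to_triples(rule, source_rulebase):
    """Return (subject, predicate, object) triples describing one rule."""
    subject = NS + rule.rule_id
    triples = [
        (subject, "rdf:type", NS + "Rules"),
        (subject, NS + "Support", rule.support),
        (subject, NS + "Confidence", rule.confidence),
        # The source rulebase is kept as an attribute of the rule.
        (subject, NS + "Data_source", NS + source_rulebase),
    ]
    triples += [(subject, NS + "Left_hand_side", NS + i) for i in rule.lhs]
    triples += [(subject, NS + "Right_hand_side", NS + i) for i in rule.rhs]
    return triples


example = SourceRule("rule_001", ("Bread",), ("Milk",), 0.05, 0.4)
triples = rule_to_triples(example, "Rulebase_A")
for triple in triples:
    print(triple)
```

In a real system these triples would be written with an RDF toolkit or through Protégé rather than held as Python tuples; the tuples only make the mapping from rule fields to ontology attributes visible.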

3.2 Retrieval methods
The O-ARR system provides multilayer rule retrieval methods:
(1) Query on low-level information. A low-level information query performs a precise query on low-level information of rules, using two kinds of conditions. The first is the rule format condition: users can specify that certain items should or should not be contained in the left-hand-side or right-hand-side of rules. The second is the rule characteristic condition: rule characteristics, such as support and confidence, can be specified in the query.
(2) Query on ontology semantics. An ontology semantics query can query on classes, sub-classes or attributes of the rule retrieval ontology, and the matching rules are returned.
(3) Fuzzy semantic retrieval. The fuzzy semantic retrieval method expands the original fuzzy query conditions into extended query conditions that can be used directly in ontology-based rule retrieval.
These retrieval methods are not mutually exclusive; they work together to help the user find rules of interest. All of them are ultimately transformed into a low-level, ontology-based query language that can be executed on the ontology-based rulebase.
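The two kinds of low-level conditions can be sketched as a simple filter over an in-memory rule list. This is an illustration under assumptions, not the O-ARR implementation: the Rule type, the function low_level_query, its parameter names and the sample rules are all hypothetical, and thresholds are treated as inclusive lower bounds for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    lhs: frozenset
    rhs: frozenset
    support: float
    confidence: float


def low_level_query(rules, lhs_has=(), lhs_lacks=(), rhs_has=(), rhs_lacks=(),
                    min_support=0.0, min_confidence=0.0):
    """Filter rules by format conditions (items present or absent on each
    side) and characteristic conditions (support and confidence bounds)."""
    selected = []
    for r in rules:
        format_ok = (set(lhs_has) <= r.lhs and r.lhs.isdisjoint(lhs_lacks)
                     and set(rhs_has) <= r.rhs and r.rhs.isdisjoint(rhs_lacks))
        characteristic_ok = (r.support >= min_support
                             and r.confidence >= min_confidence)
        if format_ok and characteristic_ok:
            selected.append(r)
    return selected


# Hypothetical sample rules.
rules = [
    Rule(frozenset({"Bread"}), frozenset({"Milk"}), 0.05, 0.40),
    Rule(frozenset({"Bread", "Butter"}), frozenset({"Jam"}), 0.02, 0.25),
    Rule(frozenset({"Beer"}), frozenset({"Bread"}), 0.01, 0.10),
]

# Rules with Bread on the left, no Jam on the right, confidence at least 0.3.
hits = low_level_query(rules, lhs_has=["Bread"], rhs_lacks=["Jam"],
                       min_confidence=0.3)
print(hits)
```

Queries of this kind only look at item names and numeric measures; they cannot express a condition such as "any fresh food item", which is what the ontology semantics query adds.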

4. Key issues in ontology-based rules retrieval
To realize ontology-based rules retrieval, we need to build a knowledge base for rule retrieval and an ontology-based rule query language. Here we discuss several key issues that arise when establishing an ontology-based rules retrieval system for the retail trade.

4.1 Rule retrieval ontology
The main purpose of establishing a rule retrieval ontology is to provide consistent and explicit semantics in the process of rule retrieval. We use Protégé 3.0 [7] as our ontology editor, and OWL and RDF as the ontology languages. Correlative ontologies are added manually to the knowledge base. The rule retrieval ontology can be regarded as a quadruple RuleRetrOnto := {SC, Rules, All_Items, CorrelativeOnto}, where SC is the system class ontology, Rules is the rule ontology, All_Items contains all of the item ontologies, and CorrelativeOnto covers other correlative ontologies. The top-level classes of the rule retrieval ontology are shown in Figure 2. Attributes of the Rules, All_Items and CorrelativeOnto ontologies are added according to domain knowledge, as detailed below.
As shown in Figure 3, the Rules ontology has the following attributes:
— Confidence: rule confidence, an important measure of rule interest
— Support: rule support, another important measure of rule interest
— Left_hand_side: left-hand-side of the rule, which contains one or more sub-classes of All_Items
— Right_hand_side: right-hand-side of the rule, which contains one or more sub-classes of All_Items
— Data_source: source rulebase of the rule
— Data_created: source database from which the rule was derived
The All_Items ontology and its sub-classes are established according to the taxonomies of retail, as shown in Figure 4. The bottom classes of the taxonomies are the items on sale in the retail trade, which have attributes such as brand, cost and price, as shown in Figure 5:
— Brand: brand of the item, an instance of the BrandOnto ontology
— Provider: provider of the item, an instance of the ProviderOnto ontology
— Cost: cost of the item
— ID: ID of the item
— Name: name of the item
— Price: price of the item
— Expiring_date: period of validity
— Discount_or_not: whether the item is discounted
— Sales_promotion_or_not: whether the item is part of a sales promotion
— Ancestor_class: super-classes of the item in All_Items
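As a rough illustration of the attribute lists above, the Rules and item classes can be mirrored as plain Python data classes. The attribute names follow the paper; the field types, the default values, the disjointness check (taken from the definition in Section 2) and the sample values such as Rulebase_A and TransactionDB_A are assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Item:
    ID: str
    Name: str
    Brand: Optional[str] = None          # instance of BrandOnto
    Provider: Optional[str] = None       # instance of ProviderOnto
    Cost: Optional[float] = None
    Price: Optional[float] = None
    Expiring_date: Optional[str] = None  # representation is an assumption
    Discount_or_not: bool = False
    Sales_promotion_or_not: bool = False
    Ancestor_class: List[str] = field(default_factory=list)


@dataclass
class Rule:
    Confidence: float
    Support: float
    Left_hand_side: List[str]
    Right_hand_side: List[str]
    Data_source: str    # source rulebase
    Data_created: str   # source transaction database

    def __post_init__(self):
        # Section 2 requires the two sides of a rule to be disjoint.
        if set(self.Left_hand_side) & set(self.Right_hand_side):
            raise ValueError("left and right sides must not share items")


bread = Item(ID="0001", Name="Bread", Ancestor_class=["All_Items"])
rule = Rule(Confidence=0.4, Support=0.05,
            Left_hand_side=["Bread"], Right_hand_side=["Milk"],
            Data_source="Rulebase_A", Data_created="TransactionDB_A")
print(bread)
print(rule)
```

In O-ARR these structures live in the ontology as OWL classes and properties edited in Protégé; the data classes only make the shape of a rule instance and an item instance concrete.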

Figure 2. Top-level classes of rule retrieval ontology

Figure 3. Attributes of rules ontology

Figure 4. Taxonomies of All_Items ontology

The CorrelativeOnto ontology contains the other correlative ontologies:
— BrandOnto: brand ontology
— ProviderOnto: provider ontology
— DataSourceOnto: rulebase ontology
— DataCreateOnto: transaction database ontology

These attributes are selected according to domain knowledge and query needs. For example, if the rule retrieval system should be able to retrieve rules that contain high-profit items, a profit attribute should be added to the item ontology so that the system can satisfy this requirement. Together, these ontologies and attributes provide the background knowledge for rule retrieval.
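To make the OWL and RDF encoding of the item taxonomy more concrete, the following sketch generates an RDF/XML declaration for the concept Livestock_product as a sub-class of Fresh_food. It is not a reproduction of the paper's RDF fragment: the helper owl_class is hypothetical, and carrying the description "livestock product" in an rdfs:comment element is an assumption (an rdfs:label would be equally plausible). In practice Protégé produces this serialization itself.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# Register the usual prefixes so the output uses rdf:, rdfs: and owl:.
for prefix, uri in (("rdf", RDF), ("rdfs", RDFS), ("owl", OWL)):
    ET.register_namespace(prefix, uri)


def owl_class(name, parent, comment):
    """Serialize one OWL class with a single super-class and a comment."""
    root = ET.Element(f"{{{RDF}}}RDF")
    cls = ET.SubElement(root, f"{{{OWL}}}Class", {f"{{{RDF}}}ID": name})
    ET.SubElement(cls, f"{{{RDFS}}}subClassOf",
                  {f"{{{RDF}}}resource": "#" + parent})
    # Assumption: the description is stored as an rdfs:comment.
    ET.SubElement(cls, f"{{{RDFS}}}comment").text = comment
    return ET.tostring(root, encoding="unicode")


xml_text = owl_class("Livestock_product", "Fresh_food", "livestock product")
print(xml_text)
```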

Figure 5. Item attributes

Figure 6. CorrelativeOnto and subclasses

The ontologies of rule retrieval are described in the OWL and RDF ontology languages. For example, a short RDF section describes the concept Livestock_product: it is declared as a sub-class of Fresh_food and carries the description "livestock product".

4.2 Annotation of ontology instance
After the rule retrieval ontologies are established, we need to add enough rule instances and item instances to the rule retrieval knowledge base. Adding an item instance involves the following steps.
Step 1: The annotator chooses an item and creates a blank instance for it.
Step 2: The annotator fills in the blank fields of the instance according to domain knowledge.
As shown in Figure 7, a blank Bread instance is created, and then the corresponding instances are selected for its Brand and Provider attributes. If the existing instances of the brand ontology or provider ontology do not meet the current need, a new instance of the brand ontology or provider ontology is created in a similar way. Rule instances are created similarly. If the item instances needed while creating a rule instance do not exist, the annotator creates new item instances.

Figure 7. Annotation of Bread instance

4.3 Query parse


O-ARR offers multilayer query methods to facilitate retrieval. All kinds of queries in O-ARR are ultimately transformed into the ontology query language, which is executed on the ontology-based rulebase.

(1) Query on low-level information and ontology
Low-level information matching is the basic type of rule retrieval, and ontology information matching extends it. We use low-level information, ontology classes, their sub-classes or super-classes, and ontology attributes to realize effective rule retrieval. We adopt inclusive/restrictive templates [8] and class attributes to define the ontology query language:

RuleOntoQuery := query rule (RuleOntoQueryList)
RuleOntoQueryList := (RuleOntoQueryList) AND (RuleOntoQueryList)
RuleOntoQueryList := (RuleOntoQueryList) OR (RuleOntoQueryList)
RuleOntoQueryList :=
    rule.Left_hand_side {contains / does not contain} ItemOntoQueryList
  | rule.Right_hand_side {contains / does not contain} ItemOntoQueryList
  | rule.Confidence {is / is greater than / is less than} realLiteral
  | rule.Support {is / is greater than / is less than} realLiteral
  | rule.Data_source {contains / does not contain} DataSourceOntoQueryList
  | rule.Data_created {contains / does not contain} DataCreateOntoQueryList
ItemOntoQueryList :=
    ClassLiteral
  | item.sup-class {contains / does not contain} ClassLiteral
  | item.Brand {contains / does not contain} BrandOntoQueryList
  | item.Provider {contains / does not contain} ProviderOntoQueryList
  | item.Cost {is / is greater than / is less than} realLiteral
  | item.ID {is / is not} StringLiteral
  | item.Name {is / is not} StringLiteral
  | item.Price {is / is greater than / is less than} realLiteral
  | item.Expiring_date {is / is greater than / is less than} realLiteral
  | item.Discount_or_not {is / is not} true/false
  | item.Sales_promotion_or_not {is / is not} true/false
BrandOntoQueryList := ClassLiteral | brand.BrandName {is / is not} StringLiteral
ProviderOntoQueryList := ClassLiteral | provider.Name {is / is not} StringLiteral
DataSourceOntoQueryList := ClassLiteral | dataSource.Name {is / is not} StringLiteral
DataCreateOntoQueryList := ClassLiteral | dataCreate.Name {is / is not} StringLiteral

Two examples illustrate the use of this ontology query language.

Example 1. Query the association rules whose left-hand-sides contain items of sub-classes of Fresh_food. The corresponding query is

    query rule (rule.Left_hand_side contains (item.sup-class contains Fresh_food))

Example 2. Query the association rules that satisfy the following conditions: (1) the confidence of the rule is greater than 0.1; (2) the right-hand-side of the rule contains an item whose brand name is Floating_forest. The corresponding query is

    query rule ((rule.Confidence is greater than 0.1) AND (rule.Right_hand_side contains (item.Brand contains Floating_forest)))

(2) Fuzzy semantic retrieval
Fuzzy semantic retrieval enables the use of fuzzy linguistic terms in queries. Such a query cannot be executed directly as ontology query language, so the fuzzy terms must first be transformed into explicit ontology query language. This requires mapping rules that make the fuzzy terms explicit. The following example illustrates fuzzy semantic retrieval.

Example 3. Query the association rules that have high confidence. O-ARR searches the domain knowledge and finds the following mapping rule: “most association rules with high confidence have a confidence equal to or greater than 0.1”. The query requirement is therefore transformed into

    query rule ((rule.Confidence is greater than 0.1) OR (rule.Confidence is 0.1))
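The three examples can be traced with a small evaluator in which each clause of the query language becomes a predicate function and AND/OR combine predicates. This is a sketch under assumptions rather than the O-ARR query engine: the taxonomy, item instances, rules and all function names are hypothetical, and only the clauses used in Examples 1 to 3 are covered.

```python
# Hypothetical taxonomy: class -> parent class.
PARENT = {"Fresh_food": "All_Items", "Livestock_product": "Fresh_food",
          "Pork": "Livestock_product", "Bakery": "All_Items",
          "Bread": "Bakery"}

# Hypothetical item instances with their class and brand.
ITEMS = {"pork_chop": {"cls": "Pork", "Brand": "Floating_forest"},
         "white_bread": {"cls": "Bread", "Brand": "Other_brand"},
         "rye_bread": {"cls": "Bread", "Brand": "Other_brand"}}

# Hypothetical rule instances.
RULES = [
    {"id": "r1", "Left_hand_side": ["pork_chop"],
     "Right_hand_side": ["white_bread"], "Confidence": 0.30},
    {"id": "r2", "Left_hand_side": ["white_bread"],
     "Right_hand_side": ["pork_chop"], "Confidence": 0.25},
    {"id": "r3", "Left_hand_side": ["pork_chop"],
     "Right_hand_side": ["rye_bread"], "Confidence": 0.05},
    {"id": "r4", "Left_hand_side": ["rye_bread"],
     "Right_hand_side": ["pork_chop"], "Confidence": 0.1},
]


def ancestors(cls):
    """Return all super-classes of cls by walking up the taxonomy."""
    found = []
    while cls in PARENT:
        cls = PARENT[cls]
        found.append(cls)
    return found


# Item-level clauses.
def sup_class_contains(cls):
    return lambda item: cls in ancestors(ITEMS[item]["cls"])


def brand_contains(brand):
    return lambda item: ITEMS[item]["Brand"] == brand


# Rule-level clauses.
def side_contains(side, item_pred):
    return lambda rule: any(item_pred(i) for i in rule[side])


def confidence_gt(x):
    return lambda rule: rule["Confidence"] > x


def confidence_is(x):
    return lambda rule: rule["Confidence"] == x


def AND(p, q):
    return lambda rule: p(rule) and q(rule)


def OR(p, q):
    return lambda rule: p(rule) or q(rule)


def query_rule(pred):
    return [r["id"] for r in RULES if pred(r)]


# Mapping rule for fuzzy terms, following Example 3.
FUZZY_MAP = {"high confidence": OR(confidence_gt(0.1), confidence_is(0.1))}

example1 = query_rule(side_contains("Left_hand_side",
                                    sup_class_contains("Fresh_food")))
example2 = query_rule(AND(confidence_gt(0.1),
                          side_contains("Right_hand_side",
                                        brand_contains("Floating_forest"))))
example3 = query_rule(FUZZY_MAP["high confidence"])
print(example1, example2, example3)
```

With this sample data, Example 1 matches r1 and r3 because pork_chop belongs to a class under Fresh_food, which a purely item-name-based query could not express. The fuzzy term is handled by looking up an explicit predicate, mirroring the transformation of "high confidence" into an ordinary query.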

5. Discussion and conclusion
In this paper, we discuss ontology-based association rules retrieval, which improves on current rule retrieval methods in terms of semantic support. We design and develop a preliminary prototype named O-ARR for ontology-based rules retrieval on an ontology-based rulebase. O-ARR shows that ontology and the semantic web can organize the knowledge of rule retrieval well. We discuss key issues in ontology-based rule retrieval, such as the ontology rule retrieval language and the establishment of the rule retrieval ontology. To help users focus on interesting rules, multiple retrieval methods are provided. Our method also provides technical support for further use of rule information, such as automatic rule analysis and intelligent reasoning.

Acknowledgement
We would like to thank Prof. Wenyu Zhang for his constructive suggestions on this paper. Our work was supported by the Specialized Research Fund for the Doctoral Program of Higher Education (SRFDP) (20040335129) and the Key Program of the Natural Science Foundation of Zhejiang Province (No. Z104267).

References
[1] Tadeusz Morzy, Maciej Zakrzewicz, “Group bitmap index: a structure for association rules retrieval”, Proc. ACM SIGKDD'98, 1998, pp. 284-288.
[2] Alexander Tuzhilin, Gediminas Adomavicius, “Handling very large numbers of association rules in the analysis of microarray data”, Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, 2002, pp. 396-404.
[3] Alex Tuzhilin, Bing Liu, “Querying multiple sets of discovered rules”, Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, July 23-26, 2002.
[4] Jochen Hipp, Christoph Mangold, Ulrich Güntzer, Gholamreza Nakhaeizadeh, “Efficient Rule Retrieval and Postponed Restrict Operations for Association Rule Mining”, PAKDD 2002, pp. 52-65.
[5] H Toivonen, M Klemettinen, et al., “Pruning and grouping discovered association rules”, The ECML-95 Workshop on Statistics, Machine Learning, and Knowledge Discovery in Databases, 1995.
[6] Aijun An, Shakil Khan, et al., “Objective and subjective algorithms for grouping association rules”, Third IEEE International Conference on Data Mining, 2003.
[7] http://protege.stanford.edu/
[8] Mika Klemettinen, Heikki Mannila, et al., “Finding Interesting Rules from Large Sets of Discovered Association Rules”, Third International Conference on Information and Knowledge Management (CIKM'94), 1994, pp. 401-407.