The current issue and full text archive of this journal is available at www.emeraldinsight.com/0264-0473.htm

TEL 27,4

Ontology and semantic rules in document dispatching


Yu-Liang Chi
Department of Information Management, Chung Yuan Christian University, Chung Li, Taiwan, Republic of China, and

Hsiao-Chi Chen
Department of Business Administration, Chung Yuan Christian University, Chung Li, Taiwan, Republic of China

Received 9 June 2008
Revised 24 November 2008
Accepted 11 December 2008

Abstract
Purpose – The purpose of this paper is to demonstrate how semantic rules, in conjunction with an ontology, can be applied to infer new facts for dispatching news to the corresponding departments.
Design/methodology/approach – Within a specific task domain, the proposed design comprises finding a glossary from electronic resources, gathering organizational functions as controlled vocabularies, and linking the glossary to the controlled vocabularies. The web ontology language is employed to represent this knowledge as an ontology, and the semantic web rule language is used to infer implicit facts among instances.
Findings – Document dispatching is highly domain dependent. Adopting human perspectives as predefined knowledge is important for understanding document meanings. Knowledge-intensive approaches such as ontologies can model and represent expertise as reusable components. Together, ontologies and rules extend inference capabilities over the semantic relationships between instances.
Practical implications – Empirical lessons reveal that an ontology with semantic rules can be used to model human subjective judgement as a knowledge base. An example based on news dispatching, including the ontology and rules, is provided.
Originality/value – By following the described procedure, an organization can classify and deliver documents to the corresponding departments based on known facts.
Keywords Document delivery, Document management, Classification, Semantics, Taiwan
Paper type Research paper

The Electronic Library Vol. 27 No. 4, 2009 pp. 694-707 © Emerald Group Publishing Limited 0264-0473 DOI 10.1108/02640470910979633

1. Introduction
Document dispatching is a practical application of document classification in which classified documents are delivered to the recipients responsible for them. Automatic document classification involves assigning each document to one or more thematic categories based on its contents. Because large numbers of electronic documents are available on the internet, many libraries use information technology to classify documents for archive management. Electronic resources can be divided into three types: structured data (databases), semi-structured data (markup/metadata), and unstructured data (full text). Markup languages such as extensible markup language (XML) are recommended as data representation methods to facilitate document classification in information retrieval (IR). Markups (or tags) in XML are user definable, self-describing, and machine readable.

The authors would like to thank the National Science Council (NSC) of the Republic of China, Taiwan, for financially supporting this research under Contract No. NSC 96-2416-H-033-002-MY3.

The main objective of XML is to convert nonstructured data into semi-structured data, thus facilitating information sharing across different information systems (El-Sherbini, 2000; Yu and Chen, 2001). Additionally, many systems for processing electronic resources use metadata to describe resource content. Dublin core (DC) is an example of metadata that provides properties to facilitate the discovery of electronic resources. The advantages and applications of DC are widely recognized in the digital library community (Chandrakar, 2005). The drawback is that annotating metadata requires considerable human effort during the creation stage (Cole et al., 2001). On the other hand, advances in knowledge discovery from text (KDT) tools and algorithms have produced tremendous progress in automated document classification (Hotho et al., 2005; Sebastiani, 2002). Text categorization is one such approach to document classification, and can be divided into two dominant approaches:
(1) Unsupervised categorization (or text clustering) defines a metric on the document space and uses it to cluster similar documents into meaningful groups.
(2) Supervised categorization (or text classification) relies on an external mechanism that provides information on the correct classification of documents (Bloehdorn et al., 2005).
Almost all standard text categorization methods, including machine learning, statistics and text mining, focus on processing terms or keywords. Because these methods cannot handle the semantic relationships between words, knowledge discovery occurs only in the data layer. Data models have several limitations, such as limited dependency, constraints and generalization, which make them inadequate for providing in-depth semantics to applications (Hull and King, 1987). In short, text categorization is cheap and helpful for large document sets, but it has poor classification precision and does not effectively discover semantic relationships within document content.
Knowledge-intensive approaches such as ontology can be used to enhance the management of document classification. Ontology is an emerging framework for implementing semantic knowledge that has recently been applied in document processing (Nasir Uddin and Janecek, 2007; Weng et al., 2006). An ontology is typically employed to organize a taxonomy into conceptual structures and to assign metadata to documents that conform to these structures (Chi et al., 2006). However, the ontology approach mainly provides upper or abstract views of a conceptual structure, and cannot easily find new knowledge from the semantic relationships among instances. A missing rule layer can be added above the solid foundation of the Semantic Web layer cake developed by Berners-Lee et al. (2001). The semantic web rule language (SWRL) was recently proposed to increase the power of ontologies (Horrocks et al., 2005). SWRL rules add procedural knowledge that lifts some limitations of ontology inference, particularly in discovering semantic relations among instances. Document dispatching is highly domain dependent and requires human subjective judgement in interpreting document meanings. This paper concentrates on semantic rules and shows how rules, in conjunction with an ontology, can be applied to infer new facts from the properties of document instances. A simple walk-through example is provided to explain the ideas.


2. Ontology and semantic rules
2.1 Ontology technology and OWL
An ontology provides an explicit specification of a conceptualization, expressing shared human perspectives of the real world; a conceptualization is an abstract, simplified view of the world used for representation (Guarino, 1997). Ontology has long been adopted in artificial intelligence and expert systems to express a shared human understanding of information, and recent studies on the development of knowledge-based systems have increasingly used domain ontologies. The advantages of using ontologies include improved design discipline and the sharing and reuse of knowledge. Ontology building is a knowledge-intensive approach and can thus be treated as knowledge engineering, which consists of the successive processes of knowledge acquisition, modeling and representation (Guarino, 1995). Accordingly, the major task of ontology building is to translate goal-oriented or problem-solving activities into a systematic body of knowledge stipulated to solve a problem. Uschold and Grueninger (1996) observed that the knowledge necessary to solve problems is affected by the nature of the problem. To achieve clear definitions of a common understanding, knowledge development must integrate perspectives from task domains, communities, and applications.

This study employs the web ontology language (OWL) as the notation or formalism for representing the knowledge stored in the ontology. OWL is an XML-based language, developed by the World Wide Web Consortium (W3C), for representing knowledge models through ontological principles, and it models knowledge using model-theoretic semantics. OWL is composed of classes, properties and individuals, which roughly correspond to ontology concepts, roles and instances. Individuals represent the objects in the domain of interest. Properties are relations between individuals. Classes are sets of objects with similar properties.
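To make the three building blocks concrete, the following Python sketch models individuals as strings, a class as the set of its member individuals, and a property as a set of (subject, object) pairs. It is only a toy illustration of the set-based reading described above, not an OWL implementation: it performs no reasoning, and `ToyOntology` and its methods are hypothetical names. The individuals `News_287` and `Curriculum_Affairs` are borrowed from the example in Section 6.

```python
from dataclasses import dataclass, field


@dataclass
class ToyOntology:
    """Sets-and-pairs picture of OWL classes, properties and individuals (no reasoning)."""

    # Class name -> set of individuals that are members of the class.
    classes: dict[str, set[str]] = field(default_factory=dict)
    # Property name -> set of (subject, object) pairs relating two individuals.
    properties: dict[str, set[tuple[str, str]]] = field(default_factory=dict)

    def assert_type(self, individual: str, cls: str) -> None:
        # Class membership: the individual becomes an element of the class set.
        self.classes.setdefault(cls, set()).add(individual)

    def assert_property(self, prop: str, subject: str, obj: str) -> None:
        # A property assertion adds one pair to the binary relation.
        self.properties.setdefault(prop, set()).add((subject, obj))

    def members(self, cls: str) -> set[str]:
        return set(self.classes.get(cls, set()))

    def values(self, prop: str, subject: str) -> set[str]:
        return {o for s, o in self.properties.get(prop, set()) if s == subject}


onto = ToyOntology()
onto.assert_type("News_287", "News")
onto.assert_type("Curriculum_Affairs", "Department")
onto.assert_property("Assigned_News", "Curriculum_Affairs", "News_287")
print(onto.members("News"))                                # {'News_287'}
print(onto.values("Assigned_News", "Curriculum_Affairs"))  # {'News_287'}
```

An OWL reasoner adds to this picture the ability to derive memberships and relations that were never asserted, which is what the inference engines in Section 2.3 provide.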
According to the W3C specifications, OWL has three increasingly expressive sub-languages for different levels of usability, namely OWL Lite, OWL DL (description logic), and OWL Full. The most popular practical version is OWL DL, which is based on description logics (DLs). A more detailed description of OWL can be found on the W3C's web site.

2.2 Semantic rules
OWL is designed to represent information about concepts, their instances, and how instances are interrelated. Horrocks and Patel-Schneider (2004) have reported several limitations and issues of OWL in syntax and computation. Additionally, OWL has some expressive limitations, particularly regarding relationships between chains of roles: such chains cannot be expressed directly, and allowing them without restriction makes inference undecidable, because the word problem can be embedded in them. Golbreich (2004) observed that:
• the expressiveness of DLs and of rules is different; and
• each paradigm better fits particular types of knowledge and supports specific reasoning services.
SWRL is an emerging technology developed to address the above difficulties (Ding and Sølvberg, 2007). SWRL is based on a combination of OWL and the Rule Markup Language (RuleML). SWRL extends the set of OWL axioms to include Horn-like rules, enabling SWRL rules to be combined with an OWL knowledge base. A more detailed description of SWRL can also be found on the W3C's web site.
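As an illustration of how a Horn-like rule chains two roles, the sketch below evaluates a rule in the style of the familiar SWRL example hasParent(?x, ?y) ∧ hasBrother(?y, ?z) → hasUncle(?x, ?z). It is a naive nested-loop join written in Python under the assumption that facts are stored as (predicate, subject, object) triples; it is not how SWRL engines or JESS work internally, and the function and individual names are hypothetical.

```python
# Facts are (predicate, subject, object) triples.
Fact = tuple[str, str, str]


def apply_chain_rule(facts: set[Fact], body1: str, body2: str, head: str) -> set[Fact]:
    """Derive head(?x, ?z) from body1(?x, ?y) ∧ body2(?y, ?z); return only new facts."""
    derived: set[Fact] = set()
    for p1, x, y in facts:
        if p1 != body1:
            continue
        for p2, y2, z in facts:
            # The shared variable ?y must bind to the same individual in both atoms.
            if p2 == body2 and y2 == y:
                derived.add((head, x, z))
    return derived - facts


facts: set[Fact] = {
    ("hasParent", "Ann", "Bob"),
    ("hasBrother", "Bob", "Carl"),
    ("hasBrother", "Dan", "Frank"),  # nobody has Dan as a parent, so no match
}
new_facts = apply_chain_rule(facts, "hasParent", "hasBrother", "hasUncle")
print(new_facts)  # {('hasUncle', 'Ann', 'Carl')}
```

The join on the shared variable ?y is exactly the kind of relationship between role chains that the rule layer adds on top of the ontology.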

2.3 Available tools
OWL, DL, and SWRL technologies have all made significant progress recently. Development tools include OWL ontology editors, DL inference engines, and rule engines. Table I lists some development tools:
• OWL ontology editors provide simple user interfaces that help developers create visual conceptual structures. Most editors have checking functions, including consistency checking and classification of knowledge content, to ensure that the ontology is built properly.
• DL engines provide inference functions based on DLs. An OWL ontology editor can use inference engines that conform to the DL Implementation Group (DIG) interface.
• Rule engines execute rules. SWRL can use the Java Expert System Shell (JESS) as a semantic rule engine. JESS is a Java-based rule engine and scripting environment that was initially a Java implementation of the C Language Integrated Production System (CLIPS), a tool for building expert systems. Currently, JESS can execute SWRL rules, allowing OWL, SWRL, and JESS to interact.
• Java-based OWL application programming interfaces (APIs) meet the programmatic needs of OWL application development.


3. A design framework for news dispatching
Current document classification methods generally rely on objective machine processing rather than subjective judgement. Because document classification is highly domain dependent, human viewpoints need to be added to the knowledge base in order to understand document meanings. To set an appropriate scope for document classification, the task domain must first be identified. A task is a set of goal-oriented activities, and the domain is the field within which the task is performed. The task domain of this study is dispatching documents, such as campus news, to the appropriate departments and staff in an organization. Before distributing news to the corresponding targets, the system requires full knowledge of both the meaning of the news and the duties of the organization. To improve context awareness, the machine capabilities are supplemented by approaches based on KDT, which mainly involve identifying the common attributes of similar documents. These attributes can be regarded as a glossary of a specific domain for judging the classification of incoming news. On the other side, the targets of news delivery are either departments or individuals, so the system needs to be aware of the organization and to operate systematically. Finally, the relationships between the news attributes and the organizational structure need to be identified and linked.

Table I. Popular OWL, DL, and SWRL development tools

Tool | Utilization | Download site
Protégé | OWL ontology editor | http://protege.stanford.edu/
Pellet | DL inference engine | http://pellet.owldl.com/
RacerPro | DL inference engine | www.sts.tu-harburg.de/~r.f.moeller/racer/
Jena | Java-based OWL API | http://jena.sourceforge.net/
JESS | Rules engine | http://herzberg.ca.sandia.gov/


Figure 1 shows the design framework of the document dispatching service, which consists of a development stage and a runtime stage. The upper section of Figure 1 (development stage) sketches the preparations for system development. The left side of the development stage prepares news characteristics, including basic attributes such as “publisher” and context attributes such as “person”. Discovering context attributes is a major part of this study; Section 4.1 describes an approach for extracting glossaries. The right side of the development stage identifies the organizational structure, which also includes basic attributes such as “director” and functional attributes such as “admissions.” Section 4.2 discusses a procedure for identifying the organization and its functions. In particular, this study uses ontologies as repositories to preserve the results of each procedure. The middle section of the development stage shows a rule-based procedure for connecting the “news ontology” and the “organizational structure ontology”; rule development is discussed in more detail in Section 5. The lower section of Figure 1 (runtime stage) outlines the news dispatching operations. A software agent first feeds online news into the knowledge-based system. The system then transforms each news item into an ontology instance through a text analysis application. Finally, an inference engine evaluates the logic defined in the rules and derives new knowledge that helps dispatch the news.

Figure 1. A conceptual framework for news dispatching design
[Figure 1 panels: (I) Development stage: extracting important terms from a training set of documents to build a glossary ontology; identifying organization and functions (organization function ontology); linking; news ontology (basic and context attributes) and organizational structure ontology (basic and function attributes) joined by rule-based mapping. (II) Runtime stage: online news is fed to a software agent that analyzes the textual news and transforms it into an ontology instance inserted into the knowledge base; the inference process invokes the JESS rule engine to infer new relationships for news dispatching.]

4. Ontology development
4.1 News ontology development
The news ontology provides an abstract view for describing news instances. This ontology comprises the basic attributes and context attributes of a specific domain.

Basic attributes simply follow existing document description standards such as DC. Context attributes are fairly complex and domain dependent. This study simplifies context attributes into four categories, namely “person,” “event,” “place,” and “object.” Further knowledge discovery from documents is essential for identifying the permitted items of these categories. Because the textual corpus contains many noise words and much irrelevant information, extracting terms from documents involves running several successive procedures. Additionally, this study not only extracts terms but also constructs the hierarchy among them. A method adapted from the work of Chi (2007) is applied to create the glossary. As shown in Figure 2, the method comprises three procedures: recognizing terms, tagging the glossary, and identifying the hierarchy. Brief descriptions of these procedures are presented below.

The first procedure identifies terms by running two parallel modules, statistical and linguistic. Each module generates its own term set, and the two are complementary in both theory and practice:
(1) The statistical module comprises two techniques, term frequency (TF) and association rules. TF measures how frequently a term occurs in a collection of documents. Because TF identifies the characteristic terms of a given text, it can be viewed as an aid to judging keywords; low-frequency terms are therefore unproductive for recognizing glossaries. To filter out high-TF terms that are useless for representing documents, association rules are used to measure the relevance of the TF terms. An association rule is typically accompanied by two numbers. The first is the “support” of the rule, which indicates how frequently the items occur together, as a percentage of all transactions. The second is the “confidence” of the rule, which measures how much one item depends on another. For instance, an association rule may be written as {Campus ⇒ Student; support: 36 percent; confidence: 21 percent}. The number of association rules can be managed by setting threshold values for support and/or confidence; practical experience indicates that the threshold should be set between 5 and 10 percent. Because the items in each association rule are correlated, they can be regarded as pairs of linked terms.
(2) The linguistic module considers semantic relations among terms by calculating their semantic weights and maintaining them in a topic map. Its output is a set of expressions, each comprising two terms and their weight. Every term in the term set can then be linked into a semantic network based on its weight, and the contents of this network can be structured into a second term set.
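The statistical module can be sketched in Python as follows, assuming documents have already been tokenized into lists of lower-case terms and treating each document as one transaction. The toy corpus and thresholds are hypothetical (the study reports thresholds of 5 to 10 percent for its own corpus), and the brute-force pair enumeration only illustrates how support and confidence are computed; it is not the authors' implementation, and the linguistic module is not shown.

```python
from collections import Counter
from itertools import permutations


def term_frequencies(docs: list[list[str]]) -> Counter:
    """TF over the collection: total occurrences of each term in all documents."""
    tf: Counter = Counter()
    for doc in docs:
        tf.update(doc)
    return tf


def association_rules(docs, min_tf, min_support, min_confidence):
    """Return (a, b, support, confidence) for rules a ⇒ b among frequent terms."""
    tf = term_frequencies(docs)
    # Drop low-frequency terms first, as they are unproductive for glossaries.
    candidates = sorted(t for t, count in tf.items() if count >= min_tf)
    # Each document is treated as one transaction: the set of terms it contains.
    transactions = [set(doc) for doc in docs]
    n = len(transactions)
    rules = []
    for a, b in permutations(candidates, 2):
        with_a = sum(1 for t in transactions if a in t)  # >= 1, since a occurs somewhere
        with_both = sum(1 for t in transactions if a in t and b in t)
        support = with_both / n          # share of transactions containing both a and b
        confidence = with_both / with_a  # share of a-transactions that also contain b
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


docs = [
    ["campus", "student", "student", "dormitory"],
    ["campus", "student", "tuition"],
    ["campus", "lecture"],
    ["student", "tuition"],
]
for a, b, s, c in association_rules(docs, min_tf=2, min_support=0.5, min_confidence=0.6):
    print(f"{a} => {b}: support {s:.0%}, confidence {c:.0%}")
```

Because support counts co-occurrence against all transactions while confidence counts it only against transactions containing the antecedent, the confidence of a rule is never lower than its support under these definitions.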

Figure 2. The method for extracting terms from documents
[Figure 2 components: textual document; statistical module; linguistic module; construct analysis; structure analysis; terms recognized; named glossary tagging; identified hierarchies; glossary ontology]

The second procedure performs glossary tagging. In the preceding term recognition stage, the statistical and linguistic modules can be regarded as two individual experts. Glossary tagging combines the two term sets into a glossary using the repertory grid technique (RGT), which involves entering elements, constructs, and a rating scale between them; the two term sets are used as elements and constructs, respectively. RGT is a knowledge acquisition technique for uncovering personal constructs. It keeps the individual expert at the centre of attention and attempts to reduce interviewer bias, which is frequently a key cause of inaccuracy in knowledge acquisition. Consequently, RGT is used to reach a consensus on synonyms and to combine the two term sets into a glossary that reflects a common understanding.

The final procedure detects the hierarchies among the terms of the glossary. Formal concept analysis (FCA) is applied to identify the relations within a formal context. FCA is based on a duality between two types of elements, formal objects and formal attributes, and identifies groupings of them. A set of formal objects and a set of formal attributes, together with the relation between them, form a “formal context.” A pair consisting of a set of objects and a set of attributes is closed when the attributes are exactly those shared by all of the objects and the objects are exactly those having all of the attributes, so that neither set can be enlarged; such a closed pair is termed a formal concept. To analyze hidden relations in the context, an attribute exploration feature is used to analyze its implications. Attribute exploration is a basic function available in FCA tools such as the Galicia software. The FCA analysis generates a line diagram, which visualizes the formal context and its hierarchical structure. This study uses the above extraction method to construct a campus-related glossary ontology. The training set comprises over 3,000 campus news items.
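The closure condition that defines a formal concept can be made concrete with a brute-force Python sketch. It is neither attribute exploration nor the algorithm used by Galicia; it simply closes every subset of attributes in a small, hypothetical context (glossary terms as formal objects, made-up features as formal attributes) and collects the resulting (extent, intent) pairs. It is exponential in the number of attributes and suitable only for toy contexts.

```python
from itertools import combinations


def formal_concepts(context: dict[str, set[str]]) -> set[tuple[frozenset, frozenset]]:
    """Enumerate all formal concepts (extent, intent) of a small formal context."""
    objects = set(context)
    attributes = set().union(*context.values())

    def extent(attrs: set[str]) -> frozenset:
        # Derivation B': the objects that have every attribute in attrs.
        return frozenset(o for o in objects if attrs <= context[o])

    def intent(objs: frozenset) -> frozenset:
        # Derivation A': the attributes shared by every object in objs.
        if not objs:
            return frozenset(attributes)
        return frozenset(set.intersection(*(context[o] for o in objs)))

    concepts = set()
    names = sorted(attributes)
    # Closing every attribute subset B as (B', B'') yields every formal concept.
    for r in range(len(names) + 1):
        for combo in combinations(names, r):
            a = extent(set(combo))
            concepts.add((a, intent(a)))
    return concepts


context = {
    "freshman": {"person", "student", "undergraduate"},
    "senior": {"person", "student", "undergraduate"},
    "master student": {"person", "student", "graduate"},
    "professor": {"person", "faculty"},
}
concepts = formal_concepts(context)
for ext, inn in sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[0]))):
    print(sorted(ext), "->", sorted(inn))
```

Ordering the resulting concepts by inclusion of their extents gives the concept hierarchy that a line diagram draws; in this toy context, for example, the concept for undergraduates sits below the one for all students.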
According to the experimental results, the first procedure obtains two term sets of 374 and 296 terms, respectively. The second procedure combines the two sets into a glossary of 331 terms. The third procedure organizes the glossary into a four-layer hierarchical structure. The first layer contains four categories, namely person, event, place, and object, with 47, 145, 58, and 81 terms, respectively. For example, Table II lists the 47 person-related terms allocated to their corresponding layers.

Table II. A hierarchical structure of person-related terms used in campus

1st layer | 2nd layer | 3rd layer | 4th layer
Person | Student | Undergraduate | Freshman, sophomore, junior, senior
 | | Graduate student | Master student, doctoral student (PhD student), candidate
 | | Status | Exchange student, full-time student, part-time student, foreign student
 | Faculty | Teacher | Professor, lecturer, assistant professor, associate professor, full professor, visiting professor, instructor, professor, coach, advisor, tutor, scholar
 | | Administrator | Principal (president), dean, chair (chairman), director
 | | General staff | Officer, assistant, clerical staff, system staff, guard, janitor, worker, admission officer, chef, supervisor

4.2 Organizational structure ontology development
To create a flexible organizational structure, this study models organization and functions separately. The organizational structure ontology is then constructed by allocating functional items to their corresponding departments. This study proposes a development procedure for identifying organizational functions, comprising:
• identifying the scope and goal;
• analyzing the major affairs, manpower, and capabilities required;
• constructing a model for describing organizational functions; and
• using an ontology to represent this model.
The development procedure can be considered a typical ontology engineering procedure. The first two items correspond to collecting expertise or experience, the third maps to knowledge modeling, and the last implements knowledge representation. The organization configuration, in contrast, is identified by studying the present organization, for example a university organizational chart. Developing the organization configuration mostly involves identifying common attributes rather than practical instances; for example, the attributes include director, member, and duty. The first step in identifying university organizational functions is to observe and evaluate daily events. Several faculty members were invited to analyze university affairs. Each function was summarized as a concept with a concise definition, and the constituents of each function were then identified. Significantly, these constituents correspond to existing terms of the glossary ontology rather than new terms. In other words, organizational functions are described by a set of characteristics that are terms from the person, event, place, and/or object categories. Thus, a function is treated as a controlled vocabulary (CV) for news classification. The final step is to link each function to the department in charge of it, forming the organizational structure ontology. Figure 3 shows the corresponding relationships among glossary, functions and departments. For example, the “Admissions office” is the department in charge of admission, financial aid and campus life.
Admission is a functional concept whose characteristics come from all four categories of the predefined glossary. Some underlined terms listed under “event” are crucial characteristics that are strongly relevant to admission.

Figure 3. The corresponding relationships among glossary, functions, and departments
[Figure 3 panels: glossary ontology with Person (applicant, freshman, graduate student), Event (application, check status, graduate admission, freshman admission, transfer admission; accommodation, research opportunity, professional program), Place (dormitory, campus) and Object (tuition, website, brochure, application form); organization function ontology (Admission, Financial aid, Campus life); organizational structure ontology, department structure (Admissions office, Registration office, International student dept.)]

5. Knowledge inference using semantic rules
After the news ontology and the organizational structure ontology have been established, semantic rules are employed to integrate them into a knowledge base. An ontology consists of structured concepts, properties and instances. A concept and its properties form an abstract view for describing similar instances. An instance inherits all characteristics of the abstract view and fills them with its own contents. Thus, abstract views are the elements used to develop rules that act on instances. In this study, the instances of the organizational structure ontology are the delivery destinations of news. Some instances, including individual departments, reserve particular properties for rule inference. For instance, a property called “Assigned_News” in a department instance serves as a news container that receives news from the rule inference engine. To obtain the contents of these particular properties, rules combine related concepts and properties as criteria written in axioms. A rule is a Horn-like expression written as “Antecedent → Consequent.” Both the antecedent (rule body) and the consequent (rule head) can be conjunctions of one or more atoms, written as “atom1 ∧ atom2 ∧ ... ∧ atomn.” Every atom takes one or more parameters, each a variable written with a leading question mark (e.g. “?x”).

The relevant criteria for problem solving need to be determined before the rules are developed. For example, the following criteria are used for “dispatching news”:
• the news events contain critical terms; or
• the news contains at least one term each from the person, place, and object categories.
The following rules implement the first criterion. “Dispatch_to” and “Assigned_News” are rule consequents that are also properties of the “News” and “Department” concepts, respectively:

News(?x) ∧ What_Events(?x, ?y) ∧ isCV(?y, ?z) ∧ Managed_by(?z, ?a) ∧ Name(?a, ?b) → Dispatch_to(?x, ?b)    (Rule-1)

News(?x) ∧ What_Events(?x, ?y) ∧ isCV(?y, ?z) ∧ Managed_by(?z, ?a) → Assigned_News(?a, ?x)    (Rule-2)
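The shared body of Rule-1 and Rule-2 is a chain of joins from a news item to a department, which the following Python sketch evaluates with nested loops. The fact layout (one dictionary of class memberships, one of property pairs) and the sample facts are hypothetical: the event, function and department chain mirrors the admissions example of Figure 3, while “alumni reunion” is an invented event with no controlled vocabulary, so it triggers nothing. The sketch illustrates what the rules compute, not how the SWRLTab and JESS bridge executes them.

```python
Pairs = set[tuple[str, str]]


def objects_of(props: dict[str, Pairs], prop: str, subject: str) -> set[str]:
    """Values o such that prop(subject, o) is asserted."""
    return {o for s, o in props.get(prop, set()) if s == subject}


def body_matches(classes: dict[str, set[str]], props: dict[str, Pairs]):
    """Yield bindings (?x, ?a) satisfying the body shared by Rule-1 and Rule-2:
    News(?x) ∧ What_Events(?x, ?y) ∧ isCV(?y, ?z) ∧ Managed_by(?z, ?a)."""
    for x in classes.get("News", set()):
        for y in objects_of(props, "What_Events", x):
            for z in objects_of(props, "isCV", y):
                for a in objects_of(props, "Managed_by", z):
                    yield x, a


def fire_rules(classes: dict[str, set[str]], props: dict[str, Pairs]) -> dict[str, Pairs]:
    inferred: dict[str, Pairs] = {"Dispatch_to": set(), "Assigned_News": set()}
    for x, a in body_matches(classes, props):
        # Rule-1 adds Name(?a, ?b) to the body and concludes Dispatch_to(?x, ?b).
        for b in objects_of(props, "Name", a):
            inferred["Dispatch_to"].add((x, b))
        # Rule-2 concludes Assigned_News(?a, ?x).
        inferred["Assigned_News"].add((a, x))
    return inferred


classes = {"News": {"News_1", "News_2"}}
props = {
    "What_Events": {("News_1", "freshman admission"), ("News_2", "alumni reunion")},
    "isCV": {("freshman admission", "Admission")},
    "Managed_by": {("Admission", "Admissions_office")},
    "Name": {("Admissions_office", "Admissions office")},
}
result = fire_rules(classes, props)
print(result["Dispatch_to"])    # {('News_1', 'Admissions office')}
print(result["Assigned_News"])  # {('Admissions_office', 'News_1')}
```

Because neither rule head appears in either rule body, a single pass already reaches the fixed point here; a general rule engine such as JESS instead keeps matching until no new facts appear, using the Rete algorithm to do so efficiently.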

Figure 4 shows two sub-diagrams, marked (a) and (b), which demonstrate Rule-1 and Rule-2, respectively. Each diagram shows the inference process step by step. Rule-1, in the left diagram, starts from a specific news item (?x). The following atoms are then evaluated in sequence: obtaining the news events (?y); finding the controlled vocabularies (?z) corresponding to these events; identifying the departments (?a) in charge of these CVs; and identifying the formal department names (?b). Finally, the department names (?b) are inserted into the “Dispatch_to” property of the news instance (?x). Rule-2 performs inference steps similar to those of Rule-1, but changes the affected object from the news item to the department. Thus, relevant news is inferred into the “Assigned_News” property of a department instance.

6. System implementation and evaluation
6.1 News dispatching implementation
In the runtime stage, a news article enters the system and a transforming procedure is invoked (see the lower section of Figure 1). A software agent performs news context decomposition and characteristics extraction, and each textual news item is transformed into an instance of the “News” concept. Every news instance has basic attributes, including a Uniform Resource Identifier (URI) and a publisher, and context attributes, namely vocabularies distributed in four categories: person, event, place, and object. After the agent feeds the news instances into the knowledge base, the inference engine performs rule-based inference. As discussed in Section 5, the ontology model reserves particular instance properties whose contents are obtained by executing rules. For rule-based inference, rules are edited in SWRL, which is compatible with OWL-based ontologies. Developers write Horn-like rules in rule editors, which translate them into SWRL rules. Inference is supported by rule engines such as JESS. Protégé (an OWL-based ontology development tool) integrates the SWRLTab plug-in (a SWRL rule editor) and JESS for knowledge base development and inference testing, as shown in Figure 5. Several rules have been written, as shown in the rectangle drawn in dashed lines. The JESS interface provides functions for performing rule-based inference: the dashed rectangle at the bottom contains three buttons for translating OWL ontologies and SWRL rules into Jess facts and rules, running the rule engine, and writing the new facts back to the OWL ontology.
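The paper does not specify how the software agent decomposes news text. As one possible, deliberately simple reading, the Python sketch below builds a news instance by matching glossary terms in the text and filing them under the four context categories. The glossary excerpt (taken from the Figure 3 terms), the `NewsInstance` record, `to_news_instance`, and the sample URI and publisher are all hypothetical; a real agent would also need tokenization, stemming and disambiguation.

```python
import re
from dataclasses import dataclass, field

# Hypothetical glossary excerpt, reusing terms from the admissions example.
GLOSSARY: dict[str, set[str]] = {
    "person": {"applicant", "freshman", "graduate student"},
    "event": {"application", "freshman admission", "graduate admission", "transfer admission"},
    "place": {"dormitory", "campus"},
    "object": {"tuition", "website", "brochure", "application form"},
}


@dataclass
class NewsInstance:
    uri: str        # basic attribute
    publisher: str  # basic attribute
    context: dict[str, set[str]] = field(default_factory=dict)  # context attributes


def to_news_instance(uri: str, publisher: str, text: str,
                     glossary: dict[str, set[str]] = GLOSSARY) -> NewsInstance:
    """Fill each context category with the glossary terms found in the text."""
    lowered = text.lower()
    context = {}
    for category, terms in glossary.items():
        # Whole-word match on lower-cased text; no stemming, so "applicants" would not match.
        context[category] = {
            t for t in terms if re.search(r"\b" + re.escape(t) + r"\b", lowered)
        }
    return NewsInstance(uri, publisher, context)


news = to_news_instance(
    "http://example.org/news/1",  # placeholder URI
    "Example Publisher",          # placeholder publisher
    "Each applicant should check the freshman admission results on the website "
    "before moving into the dormitory.",
)
print(news.context)
```

The resulting context sets play the role of the asserted properties (such as What_Events) that the rules in Section 5 start from.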
The OWL-based ontology, the instances and the SWRL-based rules enter the inference procedure when the JESS engine is run, and some instance properties receive the inference results. Figure 6 shows two Protégé “individual editor” interfaces representing two different instances:
(1) The left screenshot presents the news instance “News_287” after Rule-1 has been executed. The contents of the two properties inside the dashed rectangle conform to the organizational functions “curriculum” and “students” and to the department “Office of Curriculum Affairs”.
(2) The right screenshot depicts the department instance “Curriculum_Affairs” after Rule-2 has been executed. The property “Assigned_News” inside the dashed rectangle holds the news inferred to be relevant to curriculum affairs.

Figure 4. Two rule-based examples for elaborating sequential inference execution
[Figure 4 panels: (a) Rule-1, evaluating (1) News(?x), (2) What_Events(?x, ?y), (3) isCV(?y, ?z), (4) Managed_by(?z, ?a) and (5) Name(?a, ?b) to obtain Dispatch_to; (b) Rule-2, evaluating steps (1) to (4) to obtain Assigned_News(?a, ?x)]

Figure 5. A screenshot of editing rules and JESS implementation
[Figure 5 annotations: user-defined SWRL rules; the control panels of the JESS rule engine]

Figure 6. Two screenshots for elaborating the changes to instance contents after inference
[Figure 6 panels: (a) the results of performing Rule-1; (b) the results of performing Rule-2; inferred properties highlighted]

6.2 System evaluation
The performance of the news dispatching system was measured in terms of precision and recall. IR defines precision and recall in terms of a set of retrieved documents and a set of relevant documents. In this study, the returned alignments were measured by viewing them as sets of news items and checking the overlap of the two sets. Precision and recall are computed from this overlap: they are the ratios of the number of true positives, $|R \cap A|$, to the number of retrieved news items, $|A|$, and to the number of relevant news items, $|R|$, respectively. Thus, given a reference alignment $R$, the precision of an alignment $A$ is $P(A, R) = |R \cap A| / |A|$ and its recall is $R(A, R) = |R \cap A| / |R|$. The admissions office, curriculum affairs office and registration office served as experimental objects. Over 1,000 campus news items were accumulated for this evaluation, and $R$ was judged by the staff of the corresponding departments. Table III shows the evaluation results for the three departments. The experimental results reveal that the overall recall and precision reached 88 and 78 percent, respectively.
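As a check on the arithmetic, the following Python snippet recomputes the recall and precision columns of Table III from its counts; the dictionary is simply a transcription of the table, and `precision_recall` is a hypothetical helper implementing the two formulas above. The overall row corresponds to pooling the counts of all departments before dividing (a micro-average) rather than averaging the departmental percentages.

```python
# Counts from Table III: (|R| relevant, |A| retrieved, |R ∩ A| true positives).
TABLE_III = {
    "Admissions office": (138, 152, 124),
    "Curriculum affairs office": (110, 123, 92),
    "Registration office": (223, 253, 198),
}


def precision_recall(relevant: int, retrieved: int, hits: int) -> tuple[float, float]:
    """Return (precision, recall) = (|R ∩ A| / |A|, |R ∩ A| / |R|)."""
    return hits / retrieved, hits / relevant


for dept, counts in TABLE_III.items():
    p, r = precision_recall(*counts)
    print(f"{dept}: recall {r:.0%}, precision {p:.0%}")

# The overall row pools the counts of all departments before dividing.
totals = tuple(sum(col) for col in zip(*TABLE_III.values()))
p_all, r_all = precision_recall(*totals)
print(f"Total {totals}: recall {r_all:.0%}, precision {p_all:.0%}")
```

With these counts, every printed percentage matches the corresponding entry in Table III.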


Table III. Recall and precision of news dispatching system

Department | R (relevant news) | A (retrieved news) | R ∩ A | Recall (percent) | Precision (percent)
Admissions office | 138 | 152 | 124 | 90 | 82
Curriculum affairs office | 110 | 123 | 92 | 84 | 75
Registration office | 223 | 253 | 198 | 89 | 78
Total | 471 | 528 | 414 | 88 | 78

7. Discussion and conclusion
This paper describes the application of ontology and semantic rules to infer knowledge from the characteristics of news. Ontologies have been applied in knowledge-based systems over the past few years, and rule-based mechanisms are regarded as a missing layer and a necessary complement to ontology-based systems. The most significant benefit of rules is their ability to chain characteristics and infer the existence of new facts. This study employs ontologies and semantic rules to dispatch news to the appropriate departments, and a walk-through example is provided to show the advantages of using semantic rules with ontology. In a practical application, a news article is translated into an ontology instance from which properties such as people, events, places and objects can be extracted as asserted properties. SWRL rules are then used to infer new facts from the semantic relations between the properties of instances and the known facts in the ontology. The proposed system, built on OWL ontologies and SWRL rules, has the following advantages:
• Developers can capture and model both the domain and the task as ontologies, giving the information system all the knowledge needed to solve specific problems.
• Information holders follow ontology schemas to edit and update their known facts as ontological knowledge. Complex knowledge and semantic networks are organized by the inference system. Because the knowledge framework and the fact collection are separated, the system is simpler for both developers and users.
• The rules encode solid expertise, that is, commonly agreed descriptions of how things get done. All new facts must follow the asserted rules, so if facts change, the news dispatching connections can be rapidly updated by the inference service.
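The last advantage, that dispatching adapts when facts change while the rules stay fixed, can be illustrated with an even smaller sketch. Here the dispatch rule is a single function and the facts maintained by information holders are two plain mappings. All term, function and department names (including Registration_Office) are hypothetical, and a Python dictionary is a simplification of the many-to-many links an OWL ontology would allow.

```python
# Facts maintained by information holders (hypothetical values).
term_to_function = {"course_selection": "curriculum", "enrollment": "registration"}
function_to_department = {"curriculum": "Curriculum_Affairs",
                          "registration": "Curriculum_Affairs"}


def dispatch(news_terms, term_to_function, function_to_department):
    """Fixed rule: send a news item to every department that performs a
    function linked to one of the item's glossary terms."""
    functions = {term_to_function[t] for t in news_terms if t in term_to_function}
    return {function_to_department[f] for f in functions if f in function_to_department}


def dispatch_all(news, term_to_function, function_to_department):
    """Apply the same rule to every news item."""
    return {item: dispatch(terms, term_to_function, function_to_department)
            for item, terms in news.items()}


news = {"News_A": {"enrollment"}, "News_B": {"course_selection"}}
before = dispatch_all(news, term_to_function, function_to_department)

# A fact changes: the registration function moves to another department.
function_to_department["registration"] = "Registration_Office"
after = dispatch_all(news, term_to_function, function_to_department)

print(before)  # {'News_A': {'Curriculum_Affairs'}, 'News_B': {'Curriculum_Affairs'}}
print(after)   # {'News_A': {'Registration_Office'}, 'News_B': {'Curriculum_Affairs'}}
```

Only one fact was edited and the dispatch rule itself was untouched, yet News_A is now routed to a different department; this is the separation of knowledge framework and fact collection in miniature.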

Although only simple examples are given in this study, more comprehensive rules would allow further facts to be inferred. This ability of semantic rules combined with ontology to infer knowledge from known facts is especially valuable for discovering new facts among instances.

About the authors
Yu-Liang Chi is an Associate Professor of Information Management at Chung Yuan Christian University, Taiwan. He received his PhD in Industrial Management and System Engineering from Arizona State University. His research interests focus on systems integration, information retrieval, and ontological knowledge-based systems. Yu-Liang Chi is the corresponding author and can be contacted at: [email protected]

Hsiao-Chi Chen is an Assistant Professor of Business Administration at Chung Yuan Christian University, Taiwan. Her research interests focus on technology management, organization theory, and information systems.
