Proceedings of the 35th Hawaii International Conference on System Sciences - 2002

XML-hoo!: A Prototype Application for Intelligent Query of XML Documents using Domain-Specific Ontologies Henry M. Kim Schulich School of Business, York University, 4700 Keele St., Toronto, Ontario Canada M3J 1P3 [email protected]

Abstract

Use of XML holds great promise for standardizing data models for realizing benefits such as lowered development costs and time for integrating inter-organizational business processes and intra-organizational knowledge management. Further benefits can be realized by formally defining common semantics in ontologies using the standardized models. Automation of business processes that require sharing knowledge represented in XML-based ontologies can then be supported. In this paper, a proof-of-concept application for using ontologies to support deduction of knowledge implicit in existing XML documents is presented. This system, called XML-hoo!, employs a customized portal user interface to answer queries about Shakespearian plays. Queries are answered by applying inference rules about these plays represented as axioms that comprise a Shakespearian ontology, composed of terminology corresponding to existing XML DTDs. These rules are applied to plays represented in XML that are in the public domain. Hence, answers to queries such as, "Who is Romeo’s father?" can be automatically deduced even though facts required for such answers are not explicitly structured in XML documents. This application demonstrates use of re-usable and sharable ontology representations to further leverage the expected proliferation of XML documents.

1. Introduction

Though many organizations have converted their documents to XML, these may not be fully utilized. Structured data is of limited value unless applications manipulate it in a way intended by the developer, and required by the user; there must be a common understanding about semantics of the structured data. Common understanding can be enforced via use of an off-the-shelf library, e.g. xCBL™ [1], or use of standardized industry-based languages, e.g. Financial products Markup Language (FpML) [2]. Both approaches can be considered top-down because they entail conformance to an external data schema and accompanying semantics. However, XML use for knowledge management systems (KMS) requires a bottom-up approach: Semantics about structured data in existing documents must be organized and applied to support knowledge extraction and discovery. Semantics in and of themselves constitute organizational knowledge, and if they are represented systematically, and consistently with the XML structure of documents in the KMS’ repository, they can be codified as query answering routines.

This benefit can be realized by representing these semantics using ontologies. An ontology is an explicit understanding of shared understanding [3], so it represents common knowledge represented in the KMS. More expressively, it “consists of a representational vocabulary with precise definitions of the meanings of the terms of this vocabulary plus a set of formal axioms that constrain interpretation and well-formed use of these terms” [4]. Ontology axioms can be codified and applied to structured data for automated inference, i.e. query answering. Smith and Coulter [5] discuss the use of ontologies with XML data for e-business applications; Glushko et al. [6] posit limitations of such use, namely the impracticality of detailed formalization because of rapidly changing business needs. Through bottom-up development, however, a practical approach can be taken: Domain-specific ontologies useful for focused tasks can be developed, and domain-independent, generalizable ontology representations can be organized over time [7].

A promising focused task is the use of ontologies to support querying an XML repository. A review of such applications (e.g. GoXML [8], XYZFind [9], and xmlTree [10]) shows there are powerful search engines that index, classify, and store numerous XML documents on the web. However, they focus on speed and breadth of the search, rather than depth. As a result, their observed performance in answering queries is not substantially better than a regular search engine, unless the searcher is familiar with the structure of searched documents; these engines generally do not commit to representing the semantics of XML documents in order to answer the queries.

0-7695-1435-9/02 $17.00 (c) 2002 IEEE

By committing to represent such semantics, is it feasible to develop an “intelligent” query application? Is it feasible to represent these semantics using ontologies? What can be learned for developing a query application for KMS? In this paper, a proof-of-concept application developed to address these questions, XML-hoo!, is presented with discussion of its ontology emphasized. An ontological commitment to represent the semantics and structure of Shakespearian plays [11], represented in XML and available in the public domain, is made. Next, excerpts of the ontology and XML-hoo! application are presented. Then concluding remarks and future work are stated.

2. XML-Hoo! Shakespearian Ontology

This ontology is developed and presented using the methodology shown below [12].

[Figure 1: Overview of the TOVE Ontological Engineering Methodology. Diagram labels: (A) Motivating Scenario: narrative about a company; (B) Competency Questions: the questions that an ontology should be used to answer; specify capability of ontology to support problem-solving tasks; (C) Ontology: Terminology (data model of a domain) and Axioms (formalizations that define and constrain the data model, e.g. ∀A1∀A2∀Y { A1 ∧ A2 ⊃ Y }); (D) Evaluation of Ontology: demonstration of competency; Prolog populated enterprise model.]

2.1. Motivating Scenario

The motivating scenario is a detailed narrative about problems faced or tasks performed by a user for which an ontology-based IS application is constructed. Here is XML-hoo!’s motivating scenario:

• An organization with an IS-based KMS is studying feasibility of migrating the system to an XML-based platform. This study entails prototyping an application for knowledge extraction/query from existing XML documents to surmise development effort and cost/benefit before converting all documents to XML. A focus group is selected to provide requirements for, and ultimately test, the prototype application. Documents existing in public domain, specifically Shakespearian plays, are chosen for the test. The reasons are the following: The focus group has read many plays, so can ably provide requirements; consistent XML element definitions exist, so an application broadly querying all Shakespearian plays can be designed; and extending the application to query about other domains (e.g. plays in general or historical writings) may be possible. The focus group then provides requirements for an application more sophisticated than their current KM system, and general query and search engines. Specifically, they want this application, XML-hoo!, to answer non-trivial queries, the kind that a student studying Shakespeare may ask.

2.2. Competency Questions - Informal

Users’ requirements expressed as queries of an ontology-based application are competency questions. Since terms to pose formal queries in the ontology's language are not yet developed, these questions are inherently informal, asked in English using vocabulary and semantics familiar to users. The following are some competency questions for the Shakespearian play ontology used for XML-hoo!.

CQ-1. Who is Romeo’s father?
CQ-2. Who said “Et tu, brute!”?
CQ-3. Where does ‘King Lear’ take place?
CQ-4. Who are all the characters in ‘Taming of the Shrew’?
CQ-5. Does Romeo utter a sonnet, and if so what does he say?

2.3. Ontology

2.3.1. Terminology. The terminology of the ontology comprises, minimally, all terms required to formally express, but not answer, the competency questions. In turn, the expression that defines a given term is expressed using other ontology terms. Ultimately, a primitive ontology term is not defined, but mapped from a data repository. In presenting the terminology, the data schema of Shakespearian XML documents is presented pictorially as a hierarchical model, then terminologically as the ontology’s primitive terms expressed as predicates. Next, key terms and relationships in the informal competency questions are identified and integrated into pictorial (ER) and terminological (predicate) models.

2.3.2. Primitive Terms - Hierarchical Model.

The Tragedy of Julius Caesar
Dramatis Personae
JULIUS CAESAR
....
FLAVIUS
MARULLUS
tribunes.
....
SCENE Rome: the neighbourhood of Sardis: the neighbourhood of Philippi.
...
ACT I
SCENE I. Rome. A street.
...
FLAVIUS
Hence! home, you idle creatures get you home:
Is this a holiday? what! know you not,
...
JULIUS CAESAR
...

Figure 2: Use of Defined Elements within a Shakespearian XML document

The hierarchical relationships of the markup tags can be diagrammed this way:

Play
    Title (1)
    Play Subtitle (2)
    Scene Description (3)
    FM
        P (14)
    Personae
        Title (4)
        * Persona (5)
        PGroup
            Group Description (6)
            * Persona (7)
    * Act
        Title (8)
        Stage Direction (9)
        * Scene
            Title (10)
            Stage Direction (11)
            * Speech
                Speaker (12)
                * Line (13)

Legend: Z = entity; (#) X = attribute of entity; Y = unique identifier of entity. Cardinality of has relationships is one-to-one, unless explicitly noted as one-to-many with *.

Figure 3: Hierarchical Structure of the Shakespearian Plays in XML

All relationships can be read top-to-bottom as has, i.e. a play has title, or play has act, which has title. Parent entities have children entities. Terminal nodes in the diagram correspond to elements that mark up document content; i.e. they are attributes of an entity. Some terminal nodes are attributes that uniquely identify an entity. This model can then be expressed using predicates, which relate an entity’s unique identifier either to its own attributes or to a child’s unique identifier. In an ontology, predicates that are not formally defined are called primitive terms because they are populated through assertions, not by inference. Therefore, relationships between terminal nodes in the XML data structure correspond to primitive terms of the ontology.

2.3.3. Primitive Terms - Predicate Model. Here is a description of primitive term variables. Variable numbers match those in Fig. 3, and ⇒ denotes has.

(1) P: Name of the play; value of Play⇒Title
(2) S: Subtitle of play; value of Play⇒Play Subtitle
(3) Scd: One scene description for the play; value of Play⇒Scene Description
(4) Pa: A character list, one descriptive name for section wherein all characters are introduced; value of Play⇒Personae⇒Title
(5) Pe1: Character description set of all characters individually introduced; value of Play⇒Personae⇒Persona
(6) Gd: Character description for each grouping of characters; value of Play⇒Personae⇒PGroup⇒Group Description
(7) Pe2: Character description set of all characters introduced within a given group; value of Play⇒Personae⇒PGroup⇒Persona
(8) A: Title of an act within the play; value of Play⇒Act⇒Title
(9) Std1: One stage direction for each act; value of Play⇒Act⇒Stage Direction
(10) Sc: Title of scene within an act; value of Play⇒Act⇒Scene⇒Title
(11) Std2: One stage direction for each scene; value of Play⇒Act⇒Scene⇒Stage Direction
(12) Sp: Speaker of a speech; value of Play⇒Act⇒Scene⇒Speech⇒Speaker
(13) L: A line in a speech; value of Play⇒Act⇒Scene⇒Speech⇒Line
(14) Px: Free text, which will not be represented using the ontology; value of Play⇒FM⇒P

PT-1. play_has_act(P,A)
e.g. play_has_act(‘The Tragedy of Julius Caesar’, ‘ACT I’).
PT-2. play_has_subtitle(P,S)
e.g. play_has_subtitle(‘The Tragedy of Julius Caesar’, ‘JULIUS CAESAR’).
PT-3. play_has_scene_description(P,Scd)
PT-4. play_has_character_list(P,Pa)
e.g. play_has_character_list(‘The Tragedy of Julius Caesar’, ‘Dramatis Personae’).
PT-5. character_list_has_character_description(Pa,Pe1)
e.g. character_list_has_character_description(‘Dramatis Personae’, ‘JULIUS CAESAR’).
PT-6. character_list_has_group(Pa,Gd)
e.g. character_list_has_group(‘Dramatis Personae’, ‘tribunes.’).
PT-7. group_has_character_description(Gd,Pe2)
e.g. group_has_character_description(‘tribunes.’, ‘FLAVIUS’).
PT-8. act_has_stage_direction(A,Std1)
PT-9. act_has_scene(A,Sc)
e.g. act_has_scene(‘ACT I’, ‘SCENE I. Rome. A street.’).
PT-10. scene_has_stage_direction(Sc,Std2)
PT-11. scene_has_speech(Sc,Sp,L1)
L1: First line of a speech. Though not explicitly an element, this attribute in combination with Sp is used to uniquely identify a speech.
e.g. scene_has_speech(‘SCENE I. Rome. A street.’, ‘FLAVIUS’, ‘Hence! home, you idle creatures get you home:’).
PT-12. speech_has_line(Sp,L1,L)
e.g. speech_has_line(‘FLAVIUS’, ‘Hence! home, you idle creatures get you home:’, ‘Is this a holiday? what! know you not,’).

Each primitive term can be populated by a query on XML documents. Take the question, “Given that ‘The Tragedy of Julius Caesar’ is the title of the play, what is the play’s subtitle?”. Using First-Order Logic, the question is expressed using primitive terms as the following axiom to prove: ∃S play_has_subtitle(‘The Tragedy of Julius Caesar’,S). Using the query language XML-QL, the question is expressed as follows, and returns the value, $st=‘JULIUS CAESAR’.

WHERE Tragedy of Julius Caesar $st in "Julius_Caesar.xml"
CONSTRUCT $st

Though not explicitly structured using XML elements, there is an observed format for introducing characters, which applies with few exceptions. For instance, the value of the Persona element always starts with the character’s name, and may be followed by combinations of pseudonym, qualifiers, and statements of relationship with other characters.

[Figure 4 content: BNF grammar for Persona and Group Description values. The Persona and Group Description elements are the only XML elements; all other formats are shown in BNF notation for consistency. Formats in bold italics are primitive formats (those not defined in terms of other formats) that are significant for the ontology; other primitives mark up extraneous information, which need not be represented using the ontology. Worked examples: ‘REYNALDO, servant to Polonius.’ yields REYNALDO, servant, to, Polonius; ‘PARIS, a young nobleman, kinsman to the prince.’ yields nobleman, kinsman, to, prince; ‘friends to Brutus and Cassius.’ yields friends, to, Brutus, Cassius. BNF legend: [..] format within brackets is optional; {..} format within brackets may occur 0 or more times; | or; x ::= y grammar of x defined by format expressed as y.]

Figure 4: Format for Persona and Group Descriptions, expressed in BNF notation
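The two steps described above, populating primitive terms by querying an XML document and splitting a character introduction into its parts, can be illustrated with a minimal Python sketch. It uses Python's standard `xml.etree.ElementTree` and `re` modules rather than XML-QL. The element names (`PLAY`, `TITLE`, `PLAYSUBT`, `PERSONAE`, `PERSONA`, `PGROUP`, `GRPDESCR`, `ACT`) and the helper names `primitive_terms` and `parse_relationship` are assumptions made for illustration, not part of the paper. The regular expression covers only the simplest "NAME, noun to Other." shape and is not the grammar of Figure 4.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical excerpt; the tag names are assumed and only mirror the
# hierarchy of Figure 3 (Play, Title, Personae, Persona, PGroup, Act).
SAMPLE = """
<PLAY>
  <TITLE>The Tragedy of Julius Caesar</TITLE>
  <PERSONAE>
    <TITLE>Dramatis Personae</TITLE>
    <PERSONA>JULIUS CAESAR</PERSONA>
    <PGROUP>
      <PERSONA>FLAVIUS</PERSONA>
      <PERSONA>MARULLUS</PERSONA>
      <GRPDESCR>tribunes.</GRPDESCR>
    </PGROUP>
  </PERSONAE>
  <PLAYSUBT>JULIUS CAESAR</PLAYSUBT>
  <ACT><TITLE>ACT I</TITLE></ACT>
</PLAY>
"""


def primitive_terms(xml_text):
    """Return primitive-term facts (PT-1, PT-2, PT-4 to PT-7) as tuples."""
    root = ET.fromstring(xml_text.strip())
    play = root.findtext("TITLE")
    facts = []
    subtitle = root.findtext("PLAYSUBT")
    if subtitle is not None:
        facts.append(("play_has_subtitle", play, subtitle))
    for act in root.findall("ACT"):
        facts.append(("play_has_act", play, act.findtext("TITLE")))
    personae = root.find("PERSONAE")
    if personae is not None:
        pa = personae.findtext("TITLE")
        facts.append(("play_has_character_list", play, pa))
        # Direct PERSONA children are individually introduced characters.
        for pe in personae.findall("PERSONA"):
            facts.append(("character_list_has_character_description", pa, pe.text))
        for group in personae.findall("PGROUP"):
            gd = group.findtext("GRPDESCR")
            facts.append(("character_list_has_group", pa, gd))
            for pe in group.findall("PERSONA"):
                facts.append(("group_has_character_description", gd, pe.text))
    return facts


# Simplified pattern: NAME, <relation noun> to <referenced character>.
REL = re.compile(r"(?P<name>[^,]+),\s*(?P<rn>\w+)\s+(?P<rp>to)\s+(?P<cr>[^,.]+)\.?")


def parse_relationship(description):
    """Return a PT-18 style fact, or None if the simple pattern does not apply."""
    m = REL.fullmatch(description.strip())
    if m is None:
        return None
    return ("description_has_relationship", description,
            m["rn"], m["rp"], m["cr"].strip())


if __name__ == "__main__":
    for fact in primitive_terms(SAMPLE):
        print(fact)
    print(parse_relationship("REYNALDO, servant to Polonius."))
```

Descriptions with qualifiers or several relationships, such as ‘PARIS, a young nobleman, kinsman to the prince.’, are deliberately rejected by this pattern; handling them would need a fuller parser along the lines of the BNF format.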

Obviously, the implementation to parse values within an element is not trivial and is an issue for future work. Nevertheless, assuming parsing capability, the following are the primitive terms that express relationships between the Persona or Group Description elements and the primitive formats.

Pe: Pe1 (individual character description) or Pe2 (description of individual characters described in a group description); value of the Persona element
Pd: Description for one character
D: Pd or Gd (group description; value of the Group Description element)
C: Just the name of the character
Ps: Pseudonym of the character
Qt: A qualifying title of a character, e.g. ‘King’
Lq: A location that qualifies a character’s title, e.g. ‘Denmark’ in ‘King of Denmark’
Cr: A character who is referenced when describing another character
Rn: The noun describing the nature of the relation between a character and Cr, e.g. father
Rp: Relation preposition that qualifies Rn, e.g. ‘of’ in ‘father-of’

PT-13. character_description_has_primitive_description_set(Pe,Pd)
e.g1. character_description_has_primitive_description_set(‘Senators, Citizens, Guards, Attendants’, ‘Senators’).
e.g2. character_description_has_primitive_description_set(‘CLAUDIUS, king of Denmark’, ‘CLAUDIUS, king of Denmark’).
PT-14. primitive_description_set_has_character(Pd,C)
e.g. primitive_description_set_has_character(‘CLAUDIUS, king of Denmark’, ‘CLAUDIUS’).
PT-15. primitive_description_set_has_pseudonym(Pd,Ps)
e.g. primitive_description_set_has_pseudonym(‘MARCUS ANTONIUS (ANTONY)’, ‘ANTONY’).
PT-16. description_has_qualifying_title(D,Qt)
e.g. description_has_qualifying_title(‘CLAUDIUS, king of Denmark’, ‘king’).
PT-17. description_has_location_qualifier(D,Qt,Lq)
e.g. description_has_location_qualifier(‘CLAUDIUS, king of Denmark’, ‘king’, ‘Denmark’).
PT-18. description_has_relationship(D,Rn,Rp,Cr)
e.g1. description_has_relationship(‘REYNALDO, servant to Polonius.’, ‘servant’, ‘to’, ‘Polonius’).
e.g2. description_has_relationship(‘friends to Brutus and Cassius.’, ‘friends’, ‘to’, ‘Brutus’).

2.3.4. Ontology Data and Predicate Models. From the informal competency questions, the following key words are isolated: who, son, said/utter, where... take place, characters, sonnet, and what... say. Some of these words can be easily defined using the primitive terms as:

Pred-1. character(C)
Pred-2. location(Lo)
Pred-3. speaker_starts_speech_with(Sp,L1)

These in turn are used to define ontology terms such as the following:

Pred-4. play_has_character(P,C)
Pred-5. play_has_location(P,Lo)
Pred-6. has_father(C1,C2)
Pred-7. speaker_says(Sp,L)
Pred-8. speaks_sonnet(Sp,L1,Ls)
A sonnet, its contents being a list Ls, is spoken by speaker Sp and starts with line L1.

[Figure 5 content: entity-relationship diagram. Entities and attributes: Play (play name, subtitle, scene description); Act (act name, stage direction); Scene; Character List (character list name); Group (group description); Character Description (character description name); Primitive Description Set (primitive description set name); Character (character name, pseudonym); Speech (first line); Line (line text); Sonnet (first line); Location (location name). Relationships: has act, has scene, has speech, has line, has character list, has group, has character description, has primitive description set, has character, has qualifier, has relationship, related to, has son, speaker starts speech with, speaks sonnet.]

Figure 5: Revised Data Model (Represented using Entity-Relationships)

2.3.5. Formal Competency Questions. Competency questions can now be posed, and formally expressed in First-Order Logic. Given a set of axioms in an ontology (T_ontology), and a set of instance objects and relations (T_ground), a competency question Q is a First-Order sentence that can be entailed or is consistent with these sets; i.e. T_ontology ∪ T_ground ⊨ Q or T_ontology ∪ T_ground ⊨ ¬Q.

CQ-1. Which character is the father of the Romeo character?
∃C has_father(‘Romeo’,C).
CQ-2. Which speaker says the line, “Et tu, brute!”?
∃C speaker_says(C,‘Et tu, brute!’).
CQ-3. What is the location for the play ‘King Lear’?
∃Lo play_has_location(‘King Lear’,Lo).
CQ-4. Which are all the characters in the play ‘Taming of the Shrew’?
∃L∀C [ C∈L → play_has_character(‘Taming of the Shrew’,C) ].
CQ-5. Does Romeo speak a sonnet, and what are the lines that comprise it?
∃L1∃L speaks_sonnet(‘Romeo’,L1,L).

Each question corresponds to an axiom. To prove it, ontology axioms defining and constraining use of terms comprising the question axiom must exist.

2.3.6. Axioms. To answer CQ-1, the following predicates are formally defined.

Defn-1. character(C)
The first part of a primitive description set for one character is the character’s name.
∀C∃Pd [ primitive_description_set_has_character(Pd,C) ↔ character(C) ].

Defn-2. play_has_character_description(P,Pe)
A character description Pe either is in a list of individual character descriptions, or is contained within a list of group descriptions.
∀Pe∀P∃Pa [ play_has_character_list(P,Pa) ∧ ( character_list_has_character_description(Pa,Pe) ∨ ∃Gd ( character_list_has_group(Pa,Gd) ∧ group_has_character_description(Gd,Pe) ) ) ↔ play_has_character_description(P,Pe) ].

Defn-3. play_has_character(P,C)
A character name C is the first part of a primitive description set describing one character, which is part of a list of character descriptions.
∀P∀C∃Pe∃Pd [ play_has_character_description(P,Pe) ∧ character_description_has_primitive_description_set(Pe,Pd) ∧ primitive_description_set_has_character(Pd,C) ↔ play_has_character(P,C) ].

Defn-4. character_has_pseudonym(C,Ps)
A primitive description set describing one character can have both the character’s name and pseudonym used.
∀C∀Ps∃Pd [ primitive_description_set_has_character(Pd,C) ∧ primitive_description_set_has_pseudonym(Pd,Ps) ↔ character_has_pseudonym(C,Ps) ].

Defn-5. a) related_characters(C1,Rn,Rp,C2)
C1 has a relationship, expressed as relation noun(Rn)+preposition(Rp), with C2, if: C1 is explicitly stated as related to C2 or C2’s pseudonym; C1 is a character introduced individually, or is any of the characters in a group that has a relationship to C2; and C1 and C2 are characters in the same play.
∀C1∀C2∀Rn∀Rp∃D [ ( description_has_relationship(D,Rn,Rp,C2) ∨ ∃Cr ( description_has_relationship(D,Rn,Rp,Cr) ∧ character_has_pseudonym(C2,Cr) ) ) ∧ ( primitive_description_set_has_character(D,C1) ∨ ∃Pe∃Pd ( group_has_character_description(D,Pe) ∧ character_description_has_primitive_description_set(Pe,Pd) ∧ primitive_description_set_has_character(Pd,C1) ) ) ∧ ∃P ( play_has_character(P,C1) ∧ play_has_character(P,C2) ) → related_characters(C1,Rn,Rp,C2) ].

Defn-6. b) related_characters(C1,Rn,Rp,C2)
C1 has a relationship, expressed as relation noun(Rn)+preposition(Rp), with C2 if: C1 is a pseudonym for a character whose relationship with C2 can be inferred; C2 is a pseudonym for a character whose relationship with C1 can be inferred; or C1 and C2 are pseudonyms for characters whose relationship with each other can be inferred.
∀C1∀C2∀Rn∀Rp∃Ca∃Cb [ ( related_characters(C1,Rn,Rp,Cb) ∧ character_has_pseudonym(Cb,C2) ) ∨ ( related_characters(Ca,Rn,Rp,C2) ∧ character_has_pseudonym(Ca,C1) ) ∨ ( related_characters(Ca,Rn,Rp,Cb) ∧ character_has_pseudonym(Ca,C1) ∧ character_has_pseudonym(Cb,C2) ) → related_characters(C1,Rn,Rp,C2) ].

Defn-7. a) may_be_related_characters(C1,Rn,Rp,C2)
C1 may have a relationship, expressed as relation noun(Rn)+preposition(Rp), with C2, if: C1 and C2’s relationship (Rn+Rp) cannot be inferred for sure; C1 is explicitly stated as related to C2’s qualifying title or location qualifier; C2 is a character introduced individually, or is any of the characters in a group; C1 is a character introduced individually, or is any of the characters in a group that has a relationship to C2; and C1 and C2 are characters in the same play.
∀C1∀C2∀Rn∀Rp∃D∃C∃D2 [ ¬related_characters(C1,Rn,Rp,C2) ∧ ∃Cr description_has_relationship(D,Rn,Rp,C) ∧ ( description_has_qualifying_title(D2,C) ∨ description_has_location_qualifier(D2,C) ) ∧ ( primitive_description_set_has_character(D2,C2) ∨ ∃Pe2∃Pd2 ( group_has_character_description(D2,Pe2) ∧ character_description_has_primitive_description_set(Pe2,Pd2) ∧ primitive_description_set_has_character(Pd2,C2) ) ) ∧ ( primitive_description_set_has_character(D,C1) ∨ ∃Pe∃Pd ( group_has_character_description(D,Pe) ∧ character_description_has_primitive_description_set(Pe,Pd) ∧ primitive_description_set_has_character(Pd,C1) ) ) ∧ ∃P ( play_has_character(P,C1) ∧ play_has_character(P,C2) ) → may_be_related_characters(C1,Rn,Rp,C2) ].

Defn-8. b) may_be_related_characters(C1,Rn,Rp,C2)
Defined similarly to Defn-6.

Defn-9. has_son(C1,C2)
∀C1∀C2 [ related_characters(C2,‘son’,‘of’,C1) ∨ related_characters(C2,‘son’,‘to’,C1) → has_son(C1,C2) ].

Defn-10. has_wife(C1,C2)
∀C1∀C2 [ related_characters(C2,‘wife’,‘of’,C1) ∨ related_characters(C2,‘wife’,‘to’,C1) → has_wife(C1,C2) ].

Defn-11. a) has_father(C1,C2)
∀C1∀C2 [ related_characters(C2,‘father’,‘of’,C1) ∨ related_characters(C2,‘father’,‘to’,C1) → has_father(C1,C2) ].

Defn-12. b) has_father(C1,C2)
∀C1∀C2 [ has_son(C2,C1) ∧ ∃C3 has_wife(C2,C3) → has_father(C1,C2) ].

Many such relationship terms can be defined, e.g. has_mother, has_uncle, or additional definitions of has_father. Also, possible familial relationships can be defined using may_be_related_characters. Definitions for answering CQ-2 and CQ-3 are straightforward, so are not presented. The predicate play_has_character has been defined, so CQ-4 can be answered. It is difficult to precisely answer CQ-5 because defining when lines rhyme in a sonnet (consisting of three quatrains and a couplet with a rhyme scheme of abab cdcd efef gg) is complicated. Furthermore, existing Shakespearian XML documents do not represent paragraph separations, so quatrains and couplets cannot be defined without modifying the documents. The following axiom at least disqualifies a speech that is obviously not a sonnet.

Cons-1. If a speech does not have 14 lines, it cannot be a sonnet.
∀Sp∃Ls [ ∃L1∀L ( L∈Ls → speech_has_line(Sp,L1,L) ) ∧ n(Ls)≠14 → ¬speaks_sonnet(Sp,L1,Ls) ].
where n(X) is a function that returns the number of elements in a list X.

In the next section, these axioms are applied to answer competency questions.

2.4. Demonstration of Competency

The Tragedy of Romeo and Juliet
Text placed in the public domain by Moby Lexical Tools, 1992.
SGML markup by Jon Bosak, 1992-1994.
XML version by Jon Bosak, 1996-1997.
This work may be freely copied and distributed worldwide.
Dramatis Personae
ESCALUS, prince of Verona.
PARIS, a young nobleman, kinsman to the prince.
MONTAGUE
CAPULET
heads of two houses at variance with each other.
An old man, cousin to Capulet.
ROMEO, son to Montague.
MERCUTIO, kinsman to the prince, and friend to Romeo.
BENVOLIO, nephew to Montague, and friend to Romeo.
. .
LADY MONTAGUE, wife to Montague.
LADY CAPULET, wife to Capulet.

Figure 6: Excerpt from XML document of ‘Romeo and Juliet’

(i) play_has_character_list(‘The Tragedy of Romeo and Juliet’, ‘Dramatis Personae’).
(ii) character_list_has_character_description(‘Dramatis Personae’, ‘Escalus, prince of Verona.’).
(iii) character_description_has_primitive_description_set(‘Escalus, prince of Verona.’, ‘Escalus, prince of Verona.’).
(iv) primitive_description_set_has_character(‘Escalus, prince of Verona.’, ‘Escalus’).
(v) description_has_qualifying_title(‘Escalus, prince of Verona.’, ‘prince’).
(vi) character_list_has_character_description(‘Dramatis Personae’, ‘ROMEO, son to Montague.’).
(vii) character_description_has_primitive_description_set(‘ROMEO, son to Montague.’, ‘ROMEO, son to Montague.’).
(viii) primitive_description_set_has_character(‘ROMEO, son to Montague.’, ‘ROMEO’).
(ix) description_has_relationship(‘ROMEO, son to Montague.’, ‘son’, ‘to’, ‘Montague’).
(x) character_list_has_character_description(‘Dramatis Personae’, ‘LADY MONTAGUE, wife to Montague.’).
(xi) character_description_has_primitive_description_set(‘LADY MONTAGUE, wife to Montague.’, ‘LADY MONTAGUE, wife to Montague.’).
(xii) primitive_description_set_has_character(‘LADY MONTAGUE, wife to Montague.’, ‘LADY MONTAGUE’).
(xiii) description_has_relationship(‘LADY MONTAGUE, wife to Montague.’, ‘wife’, ‘to’, ‘Montague’).
(xiv) character_list_has_character_description(‘Dramatis Personae’, ‘MERCUTIO, kinsman to the prince, and friend to Romeo.’).
(xv) character_description_has_primitive_description_set(‘MERCUTIO, kinsman to the prince, and friend to Romeo.’, ‘MERCUTIO, kinsman to the prince, and friend to Romeo.’).
(xvi) primitive_description_set_has_character(‘MERCUTIO, kinsman to the prince, and friend to Romeo.’, ‘MERCUTIO’).
(xvii) description_has_relationship(‘MERCUTIO, kinsman to the prince, and friend to Romeo.’, ‘kinsman’, ‘to’, ‘prince’).
(xviii) description_has_relationship(‘MERCUTIO, kinsman to the prince, and friend to Romeo.’, ‘friend’, ‘to’, ‘Romeo’).
(xix) character_list_has_group(‘Dramatis Personae’, ‘heads of two houses at variance with each other.’).
(xx) group_has_character_description(‘heads of two houses at variance with each other.’, ‘MONTAGUE’).
(xxi) character_description_has_primitive_description_set(‘MONTAGUE’, ‘MONTAGUE’).
(xxii) primitive_description_set_has_character(‘MONTAGUE’, ‘MONTAGUE’).

Figure 7: Relevant Primitive Term Instances

So, the following competency question is answered.

CQ-1. Which character is the father of the Romeo character?
∃C has_father(‘Romeo’,C).
returns has_father(‘Romeo’, ‘Montague’).

- applying Defn-2 to (i) & (vi), infer (xxiii) play_has_character_description(‘The Tragedy of Romeo and Juliet’, ‘ROMEO, son to Montague.’)
- applying Defn-2 to (i), (xix) & (xx), infer (xxiv) play_has_character_description(‘The Tragedy of Romeo and Juliet’, ‘MONTAGUE’)
- applying Defn-2 to (i) & (x), infer (xxv) play_has_character_description(‘The Tragedy of Romeo and Juliet’, ‘LADY MONTAGUE, wife to Montague.’)
- applying Defn-3 to (xxiii), (vii) & (viii), infer (xxvi) play_has_character(‘The Tragedy of Romeo and Juliet’, ‘ROMEO’)
- applying Defn-3 to (xxiv), (xxi) & (xxii), infer (xxvii) play_has_character(‘The Tragedy of Romeo and Juliet’, ‘MONTAGUE’)
- applying Defn-3 to (xxv), (xi) & (xii), infer (xxviii) play_has_character(‘The Tragedy of Romeo and Juliet’, ‘LADY MONTAGUE’)
- applying Defn-5 to (viii), (ix), (xxvi) & (xxvii), infer (xxix) related_characters(‘ROMEO’, ‘son’, ‘to’, ‘Montague’)
- applying Defn-5 to (xii), (xiii), (xxvii) & (xxviii), infer (xxx) related_characters(‘LADY MONTAGUE’, ‘wife’, ‘to’, ‘Montague’)
- applying Defn-9 to (xxix), infer (xxxi) has_son(‘Montague’, ‘Romeo’).
- applying Defn-10 to (xxx), infer (xxxii) has_wife(‘Montague’, ‘Lady Montague’).
- applying Defn-12 to (xxxi) & (xxxii), infer (xxxiii) has_father(‘Romeo’, ‘Montague’).

Figure 8: Answering CQ-1

A more interesting question is, “Who are the kinsmen to Escalus?”, since no direct relationship reference to ‘Escalus’ is stated in character introductions. Rather, there are references to ‘prince,’ which describes him. This implicit relationship can be inferred using the axioms:

CQ-6. Who is possibly Escalus’ kinsman?
∃C may_be_related_characters(C,‘kinsman’,‘to’,‘Escalus’).
returns may_be_related_characters(‘Mercutio’,‘kinsman’,‘to’,‘Escalus’).

- applying Defn-2 to (i) & (ii), infer (xxxiv) play_has_character_description(‘The Tragedy of Romeo and Juliet’, ‘Escalus, prince of Verona.’)
- applying Defn-2 to (i) & (xiv), infer (xxxv) play_has_character_description(‘The Tragedy of Romeo and Juliet’, ‘MERCUTIO, kinsman to the prince, and friend to Romeo.’)
- applying Defn-3 to (xxxiv), (iii) & (iv), infer (xxxvi) play_has_character(‘The Tragedy of Romeo and Juliet’, ‘Escalus’)
- applying Defn-3 to (xxxv), (xv) & (xvi), infer (xxxvii) play_has_character(‘The Tragedy of Romeo and Juliet’, ‘MERCUTIO’)
- applying Defn-7 to (xvii), (v), (iv), (xvi), (xxxvi) & (xxxvii), infer (xxxviii) may_be_related_characters(‘Mercutio’, ‘kinsman’, ‘to’, ‘Escalus’).

Figure 9: Answering CQ-6

How XML-hoo! answers competency questions is presented below.

3. XML-hoo!

XML-hoo! is presented to the user via a web browser. The user can perform: 1) guided search through a topic classification tree, 2) keyword search using a traditional search engine, or 3) a menu-driven conceptual search of Shakespearian plays. Since the first two are capabilities provided by existing XML portals, this paper will discuss the third. The user is presented with the following screen in which context-sensitive menus are presented:

Figure 10: XML-hoo! Conceptual Search Interface


0-7695-1435-9/02 $17.00 (c) 2002 IEEE


parametrically represented that Montague is Romeo’s father. Yet, this can be reasoned in XML-hoo!.
• The thorough example presented in this paper provides useful instructions for an ontology-building endeavor to complement an XML repository. In fact, the example is novel insofar as it is an application of a traditional ontological engineering methodology to develop XML-based ontologies.
• The XML-hoo! application serves as a first-pass reference for any endeavor to develop a knowledge management application using XML-based ontologies.

The crux of the XML-hoo! application is the Shakespearian play ontology. Its development can be described as follows:
• An ontological engineering methodology is followed to state the motivating scenario and competency questions.
• The hierarchical structure for a common set of XML documents, namely Shakespearian plays, is translated to develop primitive terms: ontology predicates that are populated by look-ups into an XML document, rather than inferred using formal definitions.
• Additional structure within an element is discerned; e.g. there is a format for character introductions that holds with few exceptions, which applies to the element. This structure is translated to develop more primitive terms.
• Ontology predicates are identified from competency questions and checked for consistency with primitive terms. This is sufficient to express the competency questions formally in the language of the ontology using predicates.
• Axioms that define meanings of predicates or constrain their interpretation are developed. By applying ontology axioms to populated primitive terms, answers to competency questions are inferred.
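As a rough illustration of the development steps listed above, the following Python sketch populates one primitive term from character introductions and then applies a few rules to infer has_father. It is not the prototype's implementation, which uses Prolog and XML-QL. The XML fragment is a simplified, hypothetical excerpt in the style of the XML Shakespeare files [11]; the function names (parse_persona, populate_related_characters, infer_has_father) are invented for this example; and the rule bodies are assumed readings of Defn-9, Defn-10 and Defn-12, whose exact forms are not reproduced in this section.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment in the style of the XML Shakespeare files [11].
PLAY_XML = """<PLAY>
  <TITLE>The Tragedy of Romeo and Juliet</TITLE>
  <PERSONAE>
    <PERSONA>MONTAGUE</PERSONA>
    <PERSONA>LADY MONTAGUE, wife to Montague.</PERSONA>
    <PERSONA>ROMEO, son to Montague.</PERSONA>
  </PERSONAE>
</PLAY>"""


def parse_persona(text):
    """Split 'ROMEO, son to Montague.' into (name, relation, preposition, target)."""
    name, _, description = text.strip().rstrip(".").partition(", ")
    words = description.split()
    if len(words) >= 3 and words[1] in ("to", "of"):
        return name, words[0], words[1], " ".join(words[2:])
    return name, None, None, None  # introduction without a stated relationship


def populate_related_characters(play_xml):
    """Primitive term: filled by look-ups into the XML document, not by inference."""
    root = ET.fromstring(play_xml)
    facts = set()
    for persona in root.iter("PERSONA"):
        name, relation, prep, target = parse_persona(persona.text or "")
        if relation is not None:
            facts.add((name, relation, prep, target))
    return facts


def infer_has_father(related):
    """Apply assumed versions of the has_son, has_wife and has_father rules."""
    has_son = {(t.title(), c.title()) for c, r, p, t in related if (r, p) == ("son", "to")}
    has_wife = {(t.title(), c.title()) for c, r, p, t in related if (r, p) == ("wife", "to")}
    husbands = {husband for husband, _ in has_wife}
    # Assumed reading of Defn-12: a parent of a son who also has a wife is the father.
    return {(son, parent) for parent, son in has_son if parent in husbands}


facts = populate_related_characters(PLAY_XML)
fathers = infer_has_father(facts)
```

Here facts plays the role of the populated primitive terms and fathers the role of the inferred answer to CQ-1. Keeping the look-up step separate from the inference step mirrors the distinction drawn above between primitive terms and formally defined predicates.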

The pre-defined relationships in the menu correspond to predicates defined in the ontology. The diagram below explains how a user query is answered.

[Figure 11 diagram: User Interface, Control Module, XML Query Engine, XML Document Repository, Ontology Query Engine, and Ontology Repository, connected by the numbered flows (1)-(8) described below.]

Figure 11: XML-hoo! Main Systems Architecture, and Query Answer Process

In (1), the Control Module, a Java program, parses the query received from the user interface, and translates and expresses it as a competency question in the ontology implementation language. In (2), the Control Module then passes the question to the Ontology Query Engine, which interacts with the Ontology Repository to prove the competency question axiom (3); the Prolog programming environment is used to implement both components. In (4), a set of primitive term queries that need to be answered is then passed back to the Control Module. In (5), the Control Module then translates each query into the XML query language, XML-QL, and passes them to the XML Query Engine, which repeatedly queries XML documents in the repository (6). In (7), answers to XML queries are returned to the Control Module, which returns a set of populated primitive terms to the Ontology Query Engine, which then proves the competency question, i.e. repeats steps (2)-(4). In (8), the Control Module formats the answer, if it exists, or an error statement, and returns that to the User Interface.
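The eight-step flow can be sketched as a single orchestration function. The Python below is only an illustrative outline under stated assumptions: the actual Control Module is a Java program working with Prolog and XML-QL, whereas here the two engines are replaced by plain functions, and every name (answer_query, translate, needed_terms, lookup, prove, REPOSITORY) is hypothetical.

```python
def answer_query(user_query, translate, needed_terms, lookup, prove):
    """Control-module loop following steps (1)-(8) of the query answer process."""
    question = translate(user_query)            # (1) user query -> competency question
    facts = set()
    for term_query in needed_terms(question):   # (2)-(4) primitive terms the proof needs
        facts.update(lookup(term_query))        # (5)-(7) look-ups against the XML repository
    answer = prove(question, facts)             # (2)-(4) again, with populated terms
    return answer if answer is not None else "No answer could be inferred."  # (8)


# Toy stand-ins for the two engines (assumptions, not the prototype's components).
REPOSITORY = {"related_characters": [("ROMEO", "son", "to", "Montague")]}


def translate(user_query):
    relation, person = user_query  # e.g. a menu selection
    return (relation, person)


def needed_terms(question):
    return ["related_characters"]


def lookup(term_query):
    return REPOSITORY.get(term_query, [])


def prove(question, facts):
    relation, person = question
    if relation != "has_father":
        return None
    for child, rel, prep, parent in facts:
        if rel == "son" and prep == "to" and child.title() == person:
            return f"has_father('{person}', '{parent}')"
    return None


found = answer_query(("has_father", "Romeo"), translate, needed_terms, lookup, prove)
missing = answer_query(("has_father", "Juliet"), translate, needed_terms, lookup, prove)
```

Passing the engines in as parameters is one simple way to reflect the de-coupling of the Control Module from the query engines: either stand-in could be swapped for a different repository or ontology without changing answer_query.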

The rationale for the systems architecture is consistent with the rationale for ontology use: sharability and re-usability. By de-coupling the Control Module, the XML Query Engine and Repository, and the Ontology Query Engine and Repository, the application can scale up to a variety of users querying about different domains and from different repository sources, albeit at the expense of inefficiency for focused users and a centralized repository source.

4. Concluding Remarks and Future Work

The development of this ontology, and the XML-hoo! application based on it, provide the following evidence to support using domain-specific ontologies to represent semantics for XML documents in a knowledge management system:
• The capability of the Shakespearian ontology to support inference of facts not explicitly structured in XML demonstrates that an ontology-based approach to query answering is a natural complementary function for an XML data repository. Familial relationships are not structured in XML. Plus, nowhere is it explicitly nor

This work can be extended in several ways. There is an immediate opportunity to formalize and automate manipulation of format (grammar) within an XML element not further structured. This is inherently difficult, and achieving perfect manipulation (parsing) is unlikely. However, the systematic perspective of an ontologist can be applied to make simplifying assumptions in specifying grammar and restricting the domain of discourse described using that grammar. It is believed that a tractable parsing system can result, and a prototype for XML-hoo! is currently in development.


[6] Glushko, Robert J., Tenenbaum, Jay M., Meltzer, Bart (1999). "An XML Framework for Agent-based E-commerce", Communications of the ACM, Vol. 42, No. 3.
[7] Kim, Henry M. (2000). "Integrating Business Process-Oriented and Data-Driven Approaches for Ontology Development", AAAI Spring Symposium Series 2000 - Bringing Knowledge to Business Processes, Stanford, CA, March 20-22.
[8] goXML.com (2001). "goXML.com: Intelligent Searching", [Online], Available: http://www.goxml.com, April.
[9] XYZFind (2001). "XYZFind - XML Database Repository, Query, and Search for XML", [Online], Available: http://www.xyzfind.com, April.
[10] XMLTree (2001). "xmlTree.com - Directory of Content", [Online], Available: http://www.xmltree.com, April.
[11] Bosak, Jon (1997). XML Shakespeare, [Online], Available: http://metalab.unc.edu/bosak/xml/eg/shaks200.zip.
[12] Kim, Henry M., Fox, Mark S., Grüninger, Michael (1999). "An Ontology for Quality Management: Enabling Quality Problem Identification and Tracing", BT Technology Journal, Kluwer, Netherlands, Vol. 17, No. 4.
[13] Fensel, D., Horrocks, I., Van Harmelen, F., Decker, S., Erdmann, M. and Klein, M. (2000). "OIL in a nutshell", Proceedings of the European Knowledge Acquisition Conference (EKAW-2000).

Beyond formally representing semantics to support automatic inference, ontologies are desirable for use because their representations are sharable and re-usable. So not only can ontologies be used to more richly answer queries about data structured in XML, they are inherently re-usable as more documents are added to a repository. From this prototype, it appears that the Shakespearian play ontology’s definitions are indeed applicable for other types of plays and literature, and can be de-coupled from specific structural definitions. Future work will endeavor to provide stronger evidence of this. Though the definitions of primitive terms may need to be changed as similar but different DTDs are added to a repository, higher-level definitions may be re-used with little modification. As several iterations of the ontological engineering methodology are applied, a structuring of the ontologies for a repository will emerge: General ontologies will be defined in terms of specific ontologies’ representations. Whereas XML is used to structure data, ontologies can then serve to structure knowledge composed from data. This supports the role of ontology-based languages in advancing XML towards Berners-Lee’s Semantic Web [13].

5. Acknowledgements

The author expresses gratitude to the students in his Business Information Systems Analysis course, who assisted in detailing the ontology and user interface.

6. References

[1] Commerce One, Inc. (2000). "Commerce One XML Common Business Library (XCBL™), an Interconnectivity Guide, Version 2.0.1", Commerce One, Inc., Hacienda Business Park, Bldg. #4, 4440 Rosewood Dr., Pleasanton, CA 94588, February.
[2] FpML.org (2001). "FpML.org: Financial products Markup Language", [Online], Available: http://fpml.org, April.
[3] Gruber, Thomas R. (1993). "Towards Principles for the Design of Ontologies Used for Knowledge Sharing", In International Workshop on Formal Ontology, N. Guarino & R. Poli, (Eds.), Padova, Italy.
[4] Campbell, A. E. and Shapiro, S. C. (1995). "Ontological Mediation: An Overview", Proceedings of the IJCAI Workshop on Basic Ontological Issues in Knowledge Sharing, Menlo Park CA: AAAI Press.
[5] Smith, Howard and Poulter, Kevin (1999). "Share the Ontology in XML-based Trading Architectures", Communications of the ACM, Vol. 42, No. 3.
