Semantic Query Formulation and Evaluation for XML Databases

Pongtawat Chippimolchai, Vilas Wuwongse, and Chutiporn Anutariya
Asian Institute of Technology
{ptc, vw, ca}@cs.ait.ac.th

Abstract

A semantic query is intended to help a user obtain or manipulate data in a database without knowing its detailed syntactic structure. Formulating and evaluating semantic queries requires schema and other constraint information as well as general domain knowledge. A new approach to the formulation and evaluation of semantic queries for XML databases is proposed. It employs XML Declarative Description (XDD), an XML-based knowledge representation, to model the databases together with the schemata of their XML elements and to represent domain knowledge, resulting in a uniform representation of all the required information. Equivalent Transformation (ET), a new computation paradigm that transforms a representation into another one while preserving its meaning, is used to optimize and evaluate the queries.

1. Introduction

The XML Schema [7], a schema language for XML endorsed by the World Wide Web Consortium, is emerging as the standard way to describe and model XML documents. XML Schema contains many improvements over its predecessor, the XML Document Type Definition (DTD), including built-in and user-defined data types, subtyping by extension and restriction, integrity constraints (e.g. restrictions on multiplicity and on values), and element and type substitution, among others. Compared to DTD, XML Schema provides much more semantic information, which is very useful for XML document processing, especially for the formulation, optimization and evaluation of XML queries. Unfortunately, current XML query languages and query processing systems make little use of this valuable semantic information.

1.1. Semantic Query

In order to take advantage of the semantic information available in XML Schema as well as general domain knowledge, a new approach to query formulation and evaluation for XML databases is proposed. Employing the XML Declarative Description (XDD) theory [2][5], an XML database is modeled as an XDD description comprising unit clauses, representing the XML elements/documents in the database, and non-unit clauses, describing relationships, constraints and other semantic information derived from the database schema and general domain knowledge. A query, called a semantic query, is also modeled as an XDD description consisting of one or more non-unit clauses. Unlike syntactic queries (e.g. XPath [8] and XQuery [6] queries), which only support the retrieval of explicit data based on syntactic information (i.e. the structure of elements/documents), semantic queries enable the retrieval of both explicit and implicitly derived information based on the syntactic and semantic information contained in the database. A user describes what information is needed without having to know in detail how that information is actually represented. For example, the semantic query "list the names of all employees in the database" is equivalent to the syntactic query "list the names of all faculty members, trainees and employees in the database", provided the database semantics specify that all faculty members and trainees are employees.

1.2. Related Works

The ability to retrieve both explicit and implicitly derived information from a database has long been studied in the area of deductive databases. By applying some form of logic-based theory to formulate a data model, a deductive database system can directly support the retrieval of both explicit and implicitly derived information. One of the most well-studied deductive database languages is Datalog, which employs a subset of first-order logic as its data model; Datalog is explained in [9]. Logic-based theory can also be applied to the modeling of XML document databases, resulting in a deductive XML database. However, since existing logic-based theories were not designed to handle XML data, using them to model XML databases has some disadvantages. In contrast, XDD theory is designed exclusively for XML data and hence avoids most of these disadvantages. A detailed description of XML database modeling with XDD is given in [16], together with a review of other approaches (e.g. Datalog, Description Logic and graph-based approaches) and their comparison with the XDD approach.

Only a few works exploit the semantic information in XML Schema, and the authors are aware of only one that uses this information in query formulation and evaluation. In [1], conceptual queries are proposed to enable the retrieval of explicit information based on conceptual information (e.g. type information), and a language called XPathT is proposed as a type-enabled extension of XPath. Unlike conceptual queries, semantic queries enable the retrieval of both explicit and implicitly derived information based on syntactic and semantic information (including type information). It should be emphasized that XDD theory, although it looks very much like a logic-based theory, is not based on logic. It is based on Declarative Description (DD) theory, which is simpler but has comparable expressive power.

The remainder of the paper is organized as follows. Section 2 summarizes XDD theory; Section 3 develops a model of XML databases and their semantic information (constraints, types and substitution groups) based on XDD theory; Section 4 describes the formulation of semantic queries by means of examples; Section 5 outlines a technique for the efficient evaluation of semantic queries based on the Equivalent Transformation paradigm; and Section 6 draws conclusions.

2. XML Declarative Description

XML Declarative Description (XDD) is a knowledge representation system that employs XML as its syntax and data structure and extends its expressive power by means of Declarative Description theory. An XDD description is a set of ordinary XML elements, XML expressions and XML clauses. An XML expression is an XML element extended with variables so that it represents a class of XML elements. An XML clause represents a conditional relationship between XML elements or classes of XML elements. The semantics (meaning) of an XDD description is defined as the set of ordinary XML elements that are directly represented in, or derivable from, the description.

2.1. XML Elements and XML Expressions

Ordinary XML elements are ground, i.e. variable-free. To enhance their expressive power, XML elements are extended into XML expressions by incorporating variables. Every component of an XML expression (the expression itself, its tag name, attribute names and values, attribute-value pairs, contents, sub-expressions and certain partial structures) can contain variables. XML expressions without variables are called ground XML expressions or XML elements, while those with variables are called non-ground XML expressions. Table 1 lists the variable types in XDD and the values into which each can be instantiated.

Variable Type | Prefix | Instantiated into
N-variable (name variable) | $N: | Element types or attribute names
S-variable (string variable) | $S: | Strings
P-variable (attribute-value-pair variable) | $P: | Sequences of zero or more attribute-value pairs
E-variable (XML-expression variable) | $E: | Sequences of zero or more XML expressions
I-variable (intermediate-expression variable) | $I: | Parts of XML expressions

Table 1 Variable Types

Variable instantiation is defined by basic specializations of the form (v, w), where v is the variable to be specialized and w the value into which it is specialized. A sequence of basic specializations is called a specialization. XML expressions and their specializations are characterized by a mathematical abstraction called an XML specialization system. An XML specialization system ΓX is a quadruple ⟨AX, GX, SX, μX⟩, where:

• AX is the set of all XML expressions,
• GX is the subset of AX comprising all ground expressions in AX,
• SX is the set of all specializations for the expressions in AX, and
• μX is the specialization operator, which determines, for each specialization θ in SX, the change of each expression in AX caused by θ.

Figure 1 shows an example of the specialization of a non-ground expression a in AX into a ground expression g in GX by application of the specialization operator μX and a specialization θ in SX, i.e., g = μX(θ)(a), or simply g = aθ.

Figure 1 Specialization of a non-ground XML expression a into a ground XML element g by μX(θ), with θ = ($N:empType, Faculty) ($S:eid, "9-3168") ($E:elements, ($E:e1, $E:e2)) followed by basic specializations of $E:e1 and $E:e2

An XML constraint has the form q(a1,…,an), where n > 0, q is a constraint predicate and a1,…,an are XML expressions in AX. A constraint q(g1,…,gn) is called a ground XML constraint if g1,…,gn are ground XML expressions in GX. The truth or falsity of a ground XML constraint is predetermined. A specialization θ ∈ SX is applicable to a constraint q(a1,…,an) if it is applicable to a1,…,an, and q(a1θ,…,anθ) = q(a1,…,an)θ is the result of applying θ to q(a1,…,an). Based on the XML specialization system and XML constraints, an XML declarative description on ΓX, or XDD description, is a set of XML clauses, each of which has the form:

H ← B1, B2, …, Bn

where n ≥ 0, H is an XML expression in AX, and each Bi is an XML expression in AX or an XML constraint on ΓX, the order of the Bi being immaterial. H is called the head and (B1, B2, …, Bn) the body of the clause; the clause is a unit clause if n = 0 and a non-unit clause otherwise. When it is clear from the context, a unit clause (H ←) is denoted simply by H, whence every XML document immediately becomes an XDD description comprising only unit clauses.

2.3. Semantics of XDD Descriptions

The semantics, or meaning, of an XDD description P is the set of all XML elements that are described directly by the unit clauses in P or derivable from its non-unit clauses. If C is a clause (H ← B1,…,Bn), its head is denoted by head(C), and the sets of all XML expressions and of all XML constraints in the body of C by object(C) and con(C), respectively. A specialization θ ∈ SX is applicable to C if it is applicable to H, B1,…,Bn, in which case Cθ = (Hθ ← B1θ,…,Bnθ). An XML clause is called a ground XML clause if it comprises only ground XML expressions and ground XML constraints. Let Tcon denote the set of all true ground constraints. The mapping TP on 2^GX associated with P is defined by:

$$T_P(G) = \{\, head(C\theta) \mid C \in P,\ \theta \in S_X,\ C\theta \text{ is a ground clause},\ object(C\theta) \subseteq G,\ con(C\theta) \subseteq T_{con} \,\}$$

The meaning of P, M(P), is:

$$M(P) = \bigcup_{n=1}^{\infty} [T_P]^n(\emptyset)$$
where $\emptyset$ is the empty set, $[T_P]^1(\emptyset) = T_P(\emptyset)$ and $[T_P]^n(\emptyset) = T_P([T_P]^{n-1}(\emptyset))$ for each $n > 1$.

XML Database Component | Formalized in XDD as
XML document comprising a sequence of n XML elements | n ground XML unit clauses, each describing its corresponding XML element in the document
Extensional XML database comprising m documents D1, …, Dm | XDBE = P1 ∪ … ∪ Pm, where for 1 ≤ i ≤ m, Pi is a description representing the document Di by means of unit clauses
Intensional XML database describing relationships among XML elements | XDBI, comprising a set of non-unit XML clauses describing axioms, rules or relationships among XML elements
Set of structural and integrity constraints on an XML database | XDBC, comprising a set of non-unit XML clauses describing constraints
An XML database (extensional database + intensional database + constraints) | An XDD description XDB = XDBE ∪ XDBI ∪ XDBC. The semantics of the database is M(XDB): all the XML elements directly represented in the extensional database XDBE together with all derived ones, inferred from XDBI and satisfying all the restrictions in XDBC

Figure 2 XDD as a data model for XML database
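The fixpoint definition of M(P) in Section 2.3 can be illustrated with a short Python sketch. It assumes, for simplicity, that the clauses of P have already been replaced by a finite set of their ground instances Cθ, that each ground XML element is encoded as a hashable tuple, and that each ground constraint is a zero-argument function returning its predetermined truth value. Because T_P is monotone, iterating it from the empty set until nothing changes yields the union of all [T_P]^n(∅). The sample clauses are hypothetical ground instances in the style of CS1 and Crel1, with an invented name and salary.

```python
def t_p(program, g):
    """One application of T_P to a set g of ground elements (ground clauses only)."""
    return {
        head
        for head, objects, constraints in program
        if set(objects) <= g and all(holds() for holds in constraints)
    }


def meaning(program):
    """M(P): iterate T_P from the empty set until a fixpoint is reached."""
    g = set()
    while True:
        nxt = t_p(program, g)
        if nxt == g:
            return g
        g = nxt


# Hypothetical ground clauses: (head, body expressions, body constraints).
ann_fac = ("Faculty", "Ann", 6000)
ann_emp = ("Employee", "Ann", 6000)
ann_sen = ("SeniorEmployee", "Ann", 6000, 6000 + 6000)
P = [
    (ann_fac, [], []),                             # unit clause (extensional fact)
    (ann_emp, [ann_fac], []),                      # ground instance of a CS1-like clause
    (ann_sen, [ann_emp], [lambda: 6000 > 5000]),   # ground instance of a Crel1-like clause
]
print(meaning(P))
```

Each iteration adds one derivation step: the fact in the first round, the Employee element in the second and the SeniorEmployee element in the third, after which T_P returns the same set.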

3. Data Model for XML Databases

By employing XDD as the data model, an XML (document) database XDB becomes an XDD description with three parts:
• Extensional XML database, XDBE: the collection of XML documents in the database. For an XML database with n documents D1,…,Dn, XDBE = P1 ∪ P2 ∪ … ∪ Pn, where Pi is an XDD description that contains only unit clauses and represents the document Di. Note that Di and Pi are identical.
• Intensional XML database, XDBI: conditional relationships and rules among XML elements in the database, modeled as an XDD description comprising non-unit clauses.
• Constraints, XDBC: structural constraints imposed by a schema, as well as integrity and other constraints from the schema and the application domain. XDBC is modeled as an XDD description comprising non-unit clauses.

Figure 3 An example of an XML database
(a) Extensional database: the unit clauses Celm1, Celm2 and Celm3, whose element contents include "Kay 2500", "Tor 1900" and "John 2000 Data Warehouse".
(b) Intensional database:
Crel1: if an Employee has a salary greater than 5000, then he is a SeniorEmployee with a Bonus twice his Salary (body constraints GreaterThan($S:salary, 5000), Add($S:salary, $S:salary, $S:bonus)).
CT1,…,CT5: direct type-subtype relationships from the schema.
CT6 and CT7: the transitive closure of the type-subtype relationship, i.e., if subname is a subtype of iname, which is a subtype of basename, then subname is a subtype of basename; CT8: its reflexive closure.
CS1 and CS2: Faculty and Trainee are kinds of Employee, as stated by the substitution group in the schema.
(c) Constraints:
Ccon1: a Salary value is valid only if it is in the range [0, 10000] (GreaterThanOrEqual($S:value, 0), LesserThanOrEqual($S:value, 10000)).
Ccon2: the Salary of an Employee is valid only if it is lower than the Salary of his DirectSupervisor (LessThan($S:X_salary, $S:Y_salary)).

According to the semantics of XDD descriptions, the meaning of an XML database XDB = XDBE ∪ XDBI ∪ XDBC is the set of all XML elements directly represented in the database (XDBE) plus the additional elements derivable from it. Figure 2 summarizes XDD's XML database modeling. Please refer to [2] and [16] for more details of the XDD approach to XML database modeling, and to [3] for details of how structural constraints imposed by a DTD can be modeled in XDD.

Example 1: Figure 3 shows an example of an XML database. The unit clauses Celm1,…,Celm3 represent the extensional database; the clauses CT1,…,CT8 and the clauses CS1 and CS2 represent semantic information from the schema of the database, namely type information and substitution group information, respectively. The non-unit clause Crel1 represents domain knowledge, describing a SeniorEmployee as an employee with a salary greater than 5000. Together, CT1,…,CT8, CS1, CS2 and Crel1 form the intensional database. The clauses Ccon1 and Ccon2 represent constraints on Salary values: the former limits a valid salary to the range [0, 10000], and the latter states that the salary of an employee should be less than that of his direct supervisor. □
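What the rule Crel1 and the constraints Ccon1 and Ccon2 compute can be paraphrased in Python. This sketch operates on plain records rather than XDD clauses, and everything it adds is an assumption for illustration: the employee data and names, the eid-based link to the direct supervisor, and the choice to treat Ccon2 as vacuously satisfied for an employee without a supervisor.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Employee:
    name: str
    eid: str
    salary: int
    supervisor_eid: Optional[str] = None  # assumed link to the DirectSupervisor


def senior_employees(staff):
    """Crel1: salary > 5000 implies SeniorEmployee with Bonus = Salary + Salary."""
    return [(e.name, e.salary, e.salary + e.salary) for e in staff if e.salary > 5000]


def salary_valid(e, by_eid):
    """Ccon1: 0 <= salary <= 10000; Ccon2: salary below the supervisor's salary."""
    if not 0 <= e.salary <= 10000:
        return False
    supervisor = by_eid.get(e.supervisor_eid)
    return supervisor is None or e.salary < supervisor.salary


# Hypothetical data; not the contents of Figure 3.
staff = [
    Employee("Ann", "1", 8000),
    Employee("Bob", "2", 6000, supervisor_eid="1"),
    Employee("Cid", "3", 9000, supervisor_eid="2"),
]
by_eid = {e.eid: e for e in staff}
print(senior_employees(staff))
print([e.name for e in staff if not salary_valid(e, by_eid)])
```

The sketch keeps derivation and validation separate for readability; in the XDD model both live in one description, XDB = XDBE ∪ XDBI ∪ XDBC.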

3.1. Constraints

Structural and integrity constraints imposed by a document schema, such as a DTD or an XML Schema, can be modeled in XDD as non-unit clauses. A translator τ is employed to translate a schema written in its own language into an XDD description representing the constraints it imposes. For example, if τDTD is a translator for DTDs, then τDTD(DTDA) yields an XDD description describing the constraints in DTDA. Since the details of the translation are beyond the present scope, only an example is presented. The translator τDTD for DTDs is defined in [3], and the translator τXSD for a simplified XML Schema is defined in [13].

Example 2: A translation of the constraints on the Employee element from the XML Schema in Figure 4 is given in Figure 5. □
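The idea of a translator, mapping schema declarations to checkable constraints, can be sketched as a Python function. This is a hypothetical miniature, not τDTD or τXSD: its input is an invented dictionary format rather than XML Schema syntax, it covers only enumerated attributes, fixed attributes, empty content and numeric ranges (the kinds of restriction Figure 5 lists for WorksFor and Salary), and its output is a set of Python predicates over xml.etree.ElementTree elements instead of XDD clauses.

```python
import xml.etree.ElementTree as ET

# Invented, simplified schema format: element name -> declared restrictions.
SCHEMA = {
    "WorksFor": {"empty": True, "enum_attrs": {"dept": {"IT", "CS"}}},
    "Salary": {"fixed_attrs": {"currency": "USD"}, "range": (0, 10000)},
}


def translate(schema):
    """Map each element declaration to a predicate that checks one element."""

    def make_check(decl):
        def check(elem):
            for attr, allowed in decl.get("enum_attrs", {}).items():
                if elem.get(attr) not in allowed:
                    return False
            for attr, fixed in decl.get("fixed_attrs", {}).items():
                # An absent fixed attribute takes its fixed value by default.
                if elem.get(attr, fixed) != fixed:
                    return False
            if decl.get("empty") and (len(elem) > 0 or (elem.text or "").strip()):
                return False
            if "range" in decl:
                low, high = decl["range"]
                try:
                    value = float(elem.text or "")
                except ValueError:
                    return False
                if not low <= value <= high:
                    return False
            return True

        return check

    return {name: make_check(decl) for name, decl in schema.items()}


checks = translate(SCHEMA)
print(checks["Salary"](ET.fromstring('<Salary currency="USD">2500</Salary>')))
print(checks["WorksFor"](ET.fromstring('<WorksFor dept="HR"/>')))
```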

3.2. Types

The XML Schema (like other recent schema languages for XML) allows the user to define an element type as a schematic unit that can later be bound to an element name. A set of commonly used types is supplied as built-in types, and a mechanism is provided that allows a user to define complex types as well as to create a new type from an existing type by extending or restricting its structure (subtyping). A type in an XML Schema has a unique name and is classified as a built-in, simple or complex type. A subtype can be defined by extension (adding more attributes or elements to) or by restriction (putting constraints on) an existing type, called the base type. In the schema document, a subtype is declared by an <extension> or a <restriction> element, both of which carry a base attribute naming the base type. An element name is bound to a type by an <element> declaration whose name attribute gives the element name and whose type attribute gives the bound type name.

Figure 4 An example XML Schema S

Figure 5 Constraints imposed on the Employee element:
CXSD1: a valid Employee element contains a valid Name, DirectSupervisor, WorksFor and Salary element.
CXSD2: a valid Name element contains string data and no attribute.
CXSD3 and CXSD4: a valid DirectSupervisor element is optional, has no content and has a single eid attribute.
CXSD5: a valid WorksFor element is empty and has one dept attribute whose value is "IT" or "CS" (constraint Member($S:value, {IT, CS})).
CXSD6: a valid Salary element has a currency attribute fixed to "USD" and numeric content in the range [0, 10000] (constraints GreaterThanOrEqual($S:value, 0), LesserThanOrEqual($S:value, 10000)).

Type and subtype information can be useful for document queries. With a query language that supports type information (e.g. XPathT [1]), a user can specify types and type-subtype relationships (the type hierarchy) as part of the query conditions. In XDD, type-subtype relationships are readily represented: each type-subtype pair is represented as a ground unit clause relating n1 and n2, where n1 and n2 are the type names of the subtype and the base type, respectively. The transitive closure of the type-subtype relationship can then be defined from these pairs by two clauses: one stating that every directly declared pair belongs to the closure, and one stating that if subname is a subtype of iname and iname is a subtype of basename, then subname is a subtype of basename.

Example 3: For the XML Schema in Figure 4, the type-subtype relationship can be represented by the clauses CT1,…,CT8 in Figure 3. The first five clauses, CT1,…,CT5, describe the base type of each subtype directly declared in the schema. The clauses CT6 and CT7 describe the transitive closure of the type-subtype relationship. Finally, the clause CT8 represents the reflexive closure, i.e. a type is considered a subtype of itself. □
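Taken together, the direct type-subtype pairs, the two closure clauses and the reflexive clause amount to a least-fixpoint computation, sketched below in Python. The type names and direct pairs are hypothetical (only TFaculty appears in the paper's examples); each pair (sub, base) plays the role of one of the ground unit clauses CT1,…,CT5.

```python
def subtype_closure(direct, types):
    """Reflexive-transitive closure of direct (subtype, basetype) pairs."""
    closure = set(direct)                  # first closure clause: direct pairs
    closure |= {(t, t) for t in types}     # reflexive closure (as in CT8)
    changed = True
    while changed:                         # second closure clause, to a fixpoint
        changed = False
        for sub, mid in direct:
            for mid2, base in list(closure):
                if mid == mid2 and (sub, base) not in closure:
                    closure.add((sub, base))
                    changed = True
    return closure


# Hypothetical type hierarchy.
DIRECT = {
    ("TFaculty", "TEmployee"),
    ("TTrainee", "TEmployee"),
    ("TProfessor", "TFaculty"),
    ("TLecturer", "TFaculty"),
    ("TEmployee", "TPerson"),
}
TYPES = {t for pair in DIRECT for t in pair}
closure = subtype_closure(DIRECT, TYPES)
print(sorted(s for s, b in closure if b == "TFaculty"))
```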

3.3. Substitution Group

Element substitution is a mechanism in XML Schema that allows an element to be replaced by certain other elements. An element that may be substituted is called a head element. Other elements can substitute for the head element if they are declared to be in its substitution group. Elements in a substitution group can be used wherever their head element is accepted. They are considered to be a (special) kind of the head element; hence their type must be the same as, or derived from, the head element's type. A substitution group is represented as a set of non-unit clauses. Each clause, corresponding to one element in the substitution group, specifies that the element can be considered a kind of its head element. Hence, as will be shown later, a query that asks for the head element also yields the substitutable elements.

Example 4: Consider the substitution group of Employee in the XML Schema in Figure 4. The Employee element is the head element, and Faculty is declared to be in Employee's substitution group. Hence, Faculty is a kind of Employee, as described by the clause CS1 in Figure 3. Likewise, the clause CS2 describes Trainee as a kind of Employee. The attribute "xsi:type" is added to make the element conform to the schema. □
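The effect of clauses like CS1 and CS2 can be imitated over an ordinary XML document with Python's xml.etree.ElementTree: every member of Employee's substitution group also yields an Employee element carrying an xsi:type attribute. This is a hypothetical sketch; the mapping from member elements to type names (TFaculty, TTrainee) and the sample document are assumptions, and the derived elements are computed eagerly rather than by XDD inference.

```python
import copy
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"
# Assumed substitution group of the head element Employee: member -> its type.
SUBSTITUTION_GROUP = {"Faculty": "TFaculty", "Trainee": "TTrainee"}


def derive_heads(root, head="Employee", group=SUBSTITUTION_GROUP):
    """Return a head-element copy of every substitution-group member in root."""
    derived = []
    for elem in root:
        if elem.tag in group:
            as_head = copy.deepcopy(elem)
            as_head.tag = head
            as_head.set(f"{{{XSI}}}type", group[elem.tag])  # keep schema conformance
            derived.append(as_head)
    return derived


db = ET.fromstring(
    "<Staff>"
    "<Employee><Name>Bob</Name></Employee>"
    "<Faculty><Name>Ann</Name></Faculty>"
    "<Trainee><Name>Cid</Name></Trainee>"
    "</Staff>"
)
employees = db.findall("Employee") + derive_heads(db)
print([e.findtext("Name") for e in employees])
```

Copying the element keeps the stored Faculty and Trainee elements unchanged, in line with the derived elements being additions to the meaning of the database rather than edits to its documents.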

4. Query Formulation

A semantic query, which asks for information in an XML database, is formulated as an XDD description comprising one or more non-unit clauses called query clauses. Each such clause is divided into three parts: the Constructor describes the structure of the answer, the Pattern specifies the source of information, and the Filter specifies the selection criteria. The Constructor, Pattern and Filter are represented by the head, the body's expressions and the body's constraints of the query clause, respectively. This simple formulation satisfies most functional requirements for an XML query language [10], for example selection and extraction, regular path expressions, nested queries, joins and aggregation [5][10].

Example 5: (Selection and Extraction). A selection is a retrieval of XML elements that satisfy certain conditions. An extraction is a retrieval of partial information (e.g. sub-elements). The source of information to be selected or extracted is described by an expression in the body of a query clause. The selection conditions are specified by constraints; if the selection conditions are combined by "OR", more than one query clause is required. The head of each query clause specifies what is extracted. A query that returns the Name and eid of employees with a salary greater than 3000 is formulated as:

Q1: $S:name ← $S:name $E:otherElements $S:salary , GreaterThan($S:salary, 3000).

□
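The Constructor/Pattern/Filter reading of a query clause can be mirrored in Python with xml.etree.ElementTree. The sketch below evaluates a Q1-like query purely syntactically, over explicitly stored Employee elements only; the eid attribute, the shape of the constructed answer element and the sample data are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def q1(root, threshold=3000):
    """Selection and extraction in the style of Q1 (explicit Employee elements only)."""
    answers = []
    for emp in root.iter("Employee"):                 # Pattern: body expression
        name, salary = emp.findtext("Name"), emp.findtext("Salary")
        if name is None or salary is None:
            continue                                  # pattern does not match
        if float(salary) > threshold:                 # Filter: GreaterThan constraint
            answer = ET.Element("Employee", eid=emp.get("eid", ""))  # Constructor: head
            ET.SubElement(answer, "Name").text = name
            answers.append(answer)
    return answers


db = ET.fromstring(
    "<DB>"
    '<Employee eid="1"><Name>Ann</Name><Salary>4000</Salary></Employee>'
    '<Employee eid="2"><Name>Bob</Name><Salary>2500</Salary></Employee>'
    "</DB>"
)
print([ET.tostring(a, encoding="unicode") for a in q1(db)])
```

A semantic evaluation of the same clause would additionally match Employee elements derivable from the database, which this syntactic sketch does not attempt.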

Example 6: (Join). Joining information from different parts of a document is expressed by a query clause with multiple body expressions, each representing a relevant source of information. The head of the query clause specifies the joined structure, and the constraints specify the join condition. For an equi-join, the same variable can be used to express the equality join condition. The following query lists the name of each employee along with the name of his/her supervisor. Note the use of the variable $S:seid to specify the equi-join condition.

Q2: $S:name $S:sname ← $S:name $E:otherElements1 , $S:sname $E:otherElements2 .

□

A semantic query is evaluated against an XML database, both modeled as XDD descriptions, and returns as its answer the set of elements, directly represented in or derivable from the database, that satisfy the query conditions. Note that while the syntax of a semantic query is similar to that of an ordinary syntactic query (i.e., constructor-pattern-filter syntax), their answer sets differ: the answer to a semantic query may contain derivable information in addition to the explicit information in the answer to a syntactic query.

Example 7: (Syntactic query). Consider the following simple query on employees: "Find all employees with a salary greater than 2000", against an XML database conforming to the example schema in Figure 4. A user who does not know the schema of the database may formulate it (probably by guessing, or by looking at some example data) as the syntactic XPath query:

/Employee[Salary > 2000]

The user who formulates this query evidently does not know that Faculty and Trainee are also kinds of Employee, as stated by the schema's substitution group, and will therefore receive an incomplete answer from which trainees and faculty members are missing. To obtain a complete answer, a user needs to know the current and complete database schema when formulating the query. In this case, the following syntactic query provides the complete answer:

/*[self::Employee or self::Faculty or self::Trainee][Salary > 2000]

Obviously, formulating such a syntactic query is difficult and error-prone (imagine having 20 kinds of employee). If the schema changes (e.g. a new kind of employee is added), the query must be updated, or an incomplete answer may result. □

Example 8: (Semantic query). In contrast to a syntactic query, a user formulating a semantic query does not need complete schema information in order to get the complete answer. The semantic information from the schema and the domain knowledge, modeled as part of the database, is used to compensate for the incompleteness of the query. Since this semantic information is provided by the database administrator as part of the database, the user need not worry about it. For the query about employees and salaries in Example 7, if the same user formulates a semantic query in XDD instead of a syntactic query, the result is:

Q3: $E:otherElements1 $S:salary $E:otherElements2 ← $E:otherElements1 $S:salary $E:otherElements2 , GreaterThan($S:salary, 2000).

Syntactically, this query looks similar to the XPath query of the previous example (it simply selects Employee elements and filters them by Salary). On evaluation, however, it gives the complete answer, containing information from Employee, Faculty and Trainee elements. This is because the semantic information specifying Faculty and Trainee as kinds of Employee is modeled as part of the database (the clauses CS1 and CS2 in the example database in Figure 3). Executing Q3 against the database in Figure 3 yields the answer elements Cans1 (content "Kay 2500") and Cans2 (content "John 2000 Data Warehouse"). □
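The difference between Examples 7 and 8 can be reproduced on a small hypothetical document with Python's xml.etree.ElementTree. Its limited XPath support has no numeric comparisons, so the salary filter is written in Python. The syntactic version looks only at stored Employee elements; the semantic version also admits the elements of an assumed substitution group, standing in for CS1 and CS2, before applying the same filter. Data and names are invented.

```python
import xml.etree.ElementTree as ET

# Assumed members of Employee's substitution group (stand-ins for CS1 and CS2).
KINDS_OF_EMPLOYEE = {"Faculty", "Trainee"}

db = ET.fromstring(
    "<DB>"
    "<Employee><Name>Bob</Name><Salary>2500</Salary></Employee>"
    "<Faculty><Name>Ann</Name><Salary>3000</Salary></Faculty>"
    "<Trainee><Name>Cid</Name><Salary>1500</Salary></Trainee>"
    "</DB>"
)


def salary_above(elems, limit):
    """Filter: names of the elements whose Salary exceeds limit."""
    return [e.findtext("Name") for e in elems if float(e.findtext("Salary")) > limit]


# Syntactic query: only elements literally tagged Employee are considered.
syntactic = salary_above(db.findall("Employee"), 2000)

# Semantic query: members of the substitution group also count as Employee.
derived = [e for e in db if e.tag in KINDS_OF_EMPLOYEE]
semantic = salary_above(db.findall("Employee") + derived, 2000)

print(syntactic, semantic)
```

The query text is the same in both cases; only the set of elements it ranges over differs, which is the point of Example 8.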

Example 9: (Type). When abstract types and type substitution are used, a semantic query can be formulated with type information as one of its conditions. For the example schema, a query asking for elements whose type is TFaculty or one of its subtypes can be formulated as follows:

Q4: $S:subElements ← $E:otherElements1 $E:subElements $E:otherElements2 , .

Normally, the xsi:type attribute appears only when type substitution is employed. In order to use type information in queries on arbitrary elements, every element in the database can be tagged with its type. The type tag is easily added when the document is validated and inserted into the database, and it can be removed from the answer if it is not wanted. □
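A type-based condition in the style of Q4 can be sketched in Python over elements tagged with xsi:type. The type hierarchy, the element names and the data below are hypothetical, and the sketch returns the matching elements rather than reconstructing Q4's head.

```python
import xml.etree.ElementTree as ET

XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"
# Hypothetical hierarchy; in XML Schema each derived type has exactly one base type.
BASE_TYPE = {
    "TProfessor": "TFaculty",
    "TLecturer": "TFaculty",
    "TFaculty": "TEmployee",
    "TTrainee": "TEmployee",
}


def is_subtype(t, base):
    """Reflexive-transitive subtype test: walk up the chain of base types."""
    while t is not None:
        if t == base:
            return True
        t = BASE_TYPE.get(t)
    return False


def q4(root, base="TFaculty"):
    """Elements whose xsi:type is `base` or one of its subtypes (cf. Q4)."""
    return [
        e
        for e in root.iter()
        if e.get(XSI_TYPE) is not None and is_subtype(e.get(XSI_TYPE), base)
    ]


db = ET.fromstring(
    '<DB xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">'
    '<Employee xsi:type="TProfessor"><Name>Ann</Name></Employee>'
    '<Employee xsi:type="TTrainee"><Name>Bob</Name></Employee>'
    '<Employee xsi:type="TFaculty"><Name>Cid</Name></Employee>'
    "</DB>"
)
print([e.findtext("Name") for e in q4(db)])
```

Because XML Schema derivation gives each type a single base type and forbids circular derivation, walking up the chain terminates; the XDD clauses CT6 and CT7 express the same closure declaratively, without relying on that restriction.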

5. Query Evaluation

Semantic queries are evaluated by the Equivalent Transformation (ET) approach [12]. A query, represented by a description Q, is evaluated against a database, represented by a description P, by successively transforming the description P ∪ Q using semantics-preserving transformations (equivalent transformations). The description P ∪ Q is transformed into simpler but equivalent descriptions P ∪ Q1, P ∪ Q2 and so on, until a description P ∪ Qn from which the answers can be obtained directly is produced (Figure 6). The meaning of the original description is maintained throughout the transformations, i.e. M(P ∪ Q) = M(P ∪ Q1) = … = M(P ∪ Qn), because each equivalent transformation is guaranteed to preserve meaning. In general, Qn contains only ground unit clauses, which constitute the required answer.

An equivalent transformation is represented by an Equivalent Transformation Rule (ET rule), which, when applied, transforms a target clause into one or more equivalent clauses. Each ET rule has the form:

Head, {Condition} ⇒ {Execution1}, BodyList1;
⇒ {Execution2}, BodyList2;
⋮
⇒ {Executionn}, BodyListn.

and expresses that if the Head matches an expression in the target clause's body (the target expression) and the Condition is satisfied, then each Execution part is performed and, if it succeeds, a new clause is created in which the target expression is replaced by the expressions in the corresponding BodyList of the rule.

The computation (query evaluation) is done in two steps. First, a set of ET rules is constructed; it includes general rules constructed from general knowledge (such as rules for dealing with aggregation) and domain-based rules constructed from the database description P. Then, these rules are used to transform the query Q into the answer Qn. Note that once the rules are constructed, the database description is no longer needed for the computation.
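The transformation loop can be sketched in Python. This is neither XET nor the ET rule syntax above; it is a hypothetical miniature in which a clause is a pair (bindings, body), facts are flat records, and three kinds of step stand in for ET rules: resolving an element goal against facts, unfolding it by a CS1/CS2-style rule, and solving a ground constraint. Each step replaces one clause by clauses with the same answers, and clauses whose bodies become empty form the answer. Facts and names are invented.

```python
from collections import deque

# Hypothetical facts (the extensional database) as flat records.
FACTS = [
    {"tag": "Employee", "Name": "Bob", "Salary": 2500},
    {"tag": "Faculty", "Name": "Ann", "Salary": 3000},
    {"tag": "Trainee", "Name": "Cid", "Salary": 1500},
]
# Domain-based unfolding rule in the spirit of CS1/CS2: Employee -> Faculty, Trainee.
UNFOLD = {"Employee": ["Faculty", "Trainee"]}


def transform(clause):
    """One equivalent-transformation step on the first goal of a clause body."""
    bindings, body = clause
    goal, rest = body[0], body[1:]
    if goal[0] == "elem":  # ("elem", tag, var)
        _, tag, var = goal
        # Resolve the goal against matching facts: one new clause per fact.
        out = [({**bindings, var: f}, rest) for f in FACTS if f["tag"] == tag]
        # Unfold the goal into the substitutable element types.
        out += [(bindings, (("elem", sub, var),) + rest) for sub in UNFOLD.get(tag, [])]
        return out
    if goal[0] == "gt":  # ("gt", var, field, limit)
        _, var, field, limit = goal
        # Solve a ground constraint: keep the clause if true, drop it if false.
        return [(bindings, rest)] if bindings[var][field] > limit else []
    raise ValueError(f"no rule for goal {goal!r}")


def evaluate(query):
    """Transform until only unit clauses (empty bodies) remain; they are the answer."""
    pending, answers = deque([query]), []
    while pending:
        bindings, body = pending.popleft()
        if body:
            pending.extend(transform((bindings, body)))
        else:
            answers.append(bindings)
    return answers


# Query in the spirit of Q3: employees with a salary greater than 2000.
q = ({}, (("elem", "Employee", "e"), ("gt", "e", "Salary", 2000)))
print([a["e"]["Name"] for a in evaluate(q)])
```

Unlike an ET system, this sketch returns variable bindings instead of constructed answer elements, and the breadth-first, leftmost-goal strategy is a choice made only for the sketch.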
An XML-based declarative programming language called XML Equivalent Transformation (XET) [3] has been developed to implement the equivalent transformation of XDD descriptions. An XET program comprises a set of XET rules, which are ET rules represented in XML format, and a set of XML elements/documents regarded as the program's data or facts. An XET program takes as input a query represented as an XDD description and computes its answer, the set of XML elements derivable from it, by using the XET rules.

Figure 6 Query Evaluation by the Equivalent Transformation approach (the database P and the query Q are transformed by Equivalent Transformation Rules, comprising general rules and domain-based rules on the XML specialization system ΓX, into P ∪ Q1, …, P ∪ Qn, with M(P ∪ Q) = M(P ∪ Q1) = … = M(P ∪ Qn); the meaning of the final description gives the answer)

The structure and syntax of XET rules and facts are illustrated by the example XET program in Figure 7. The complete syntax and a detailed description of XET and its computation can be found in [3] and [14]; only an example of rules and their application is given here.

Example 10: An XET program representing a fragment of the XML database in Figure 3 (only the clauses Celm1…Celm3, CS1 and CS2) is shown in Figure 7. The evaluation of the query in Example 8 is illustrated in Figure 8. □

6. Conclusions

A new type of query, the semantic query, has been proposed as a means of retrieving the semantics of XML data based on semantic as well as syntactic information. Its formulation and evaluation rely on XDD and ET theory. Various types of semantic information have been modeled as part of a database and, as demonstrated, can be employed during the formulation and evaluation of semantic queries. Given the amount of semantic information, domain knowledge and constraints that can be modeled in XDD theory, there is considerable potential for (semantic) query optimization based on XDD and ET theory. Research in the area of query optimization continues.

7. References

[1] B. Ludäscher, I. Altintas, and A. Gupta, Time to Leave the Trees: From Syntactic to Conceptual Querying of XML. Proc. Intl. Workshop on XML Data Management (XMLDM 2002), to appear in Lecture Notes in Computer Science, Springer.
[2] C. Anutariya, V. Wuwongse, and E. Nantajeewarawat, Towards a Foundation for XML Document Databases. Proc. 1st Intl. Conf. Electronic Commerce and Web Technologies (ECWeb 2000), Lecture Notes in Computer Science, no. 1875, Springer-Verlag, 2000, pp. 324-333.
[3] C. Anutariya, V. Wuwongse, and V. Wattanapailin, An Equivalent-Transformation-Based XML Rule Language. Proc. International Workshop on Rule Markup Languages for Business Rules in the Semantic Web, June 2002.
[4] C. Anutariya, V. Wuwongse, K. Akama, and E. Nantajeewarawat, A Foundation for XML Document Databases: DTD Modeling. Technical Report, Computer Science and Information Management Program, Asian Institute of Technology, 2000.
[5] C. Anutariya, XML Declarative Description. Doctoral dissertation, Computer Science and Information Management Program, Asian Institute of Technology, December 2001.
[6] D. Chamberlin, J. Clark, D. Florescu, J. Robie, J. Simeon, and M. Stefanescu, XQuery: A Query Language for XML. World Wide Web Consortium Working Draft, June 2001.
[7] H.S. Thompson, D. Beech, M. Maloney, and N. Mendelsohn, XML Schema. World Wide Web Consortium Recommendation, May 2001.
[8] J. Clark and S. DeRose, XML Path Language (XPath) Version 1.0. World Wide Web Consortium Recommendation, November 1999.
[9] J.D. Ullman, Principles of Database and Knowledge-Base Systems, Volume I. Computer Science Press, 1988.
[10] K. Akama, C. Anutariya, V. Wuwongse, and E. Nantajeewarawat, Query Formulation and Evaluation for XML Databases. Proc. 1st IFIP Workshop on Internet Technologies, Applications, and Societal Impact (WITASI'02), October 2002.
[11] K. Akama, Declarative Semantics of Logic Program on Parameterized Representation Systems. Advances in Software Science and Technology, vol. 5, 1993, pp. 45-63.
[12] K. Akama, T. Shimitsu, and E. Miyamoto, Solving Problems by Equivalent Transformation of Declarative Programs. J. Japanese Society of Artificial Intelligence, vol. 13, no. 6, 1998 (in Japanese).
[13] P. Thongtra, Enhancing the Semantics of XML Schema with Constraints. Master's Thesis, Computer Science and Information Management Program, Asian Institute of Technology, August 2002.
[14] V. Wattanapilin, A Declarative Programming Language with XML. Master's Thesis, Computer Science and Information Management Program, Asian Institute of Technology, 2000.
[15] V. Wuwongse, C. Anutariya, K. Akama, and E. Nantajeewarawat, XML Declarative Description: A Language for the Semantic Web. IEEE Intelligent Systems, May/June 2001, pp. 54-65.
[16] V. Wuwongse, K. Akama, C. Anutariya, and E. Nantajeewarawat, A Data Model for XML Databases. Proc. 2001 Intl. Conf. Web Intelligence (WI-01), LNAI 2198, 2001, pp. 237-246.

Figure 7 An XET program representing a fragment of an example XML database. Its annotations state that: the fact section contains the extensional database (Celm1…Celm3, with contents "Kay 2500", "Tor 1900" and "John 2000 Data Warehouse"); the rule EmployeeSub is an unfolding-based rule constructed from the clauses CS1 and CS2, in which Pvarxxx and Evarxxx denote a P-variable and an E-variable, respectively; one part of the rule contains an XML expression specifying the pattern to be matched, and another contains an XML expression to replace the matched element, or an XET built-in or user-defined operation.

Figure 8 Evaluation of the query in Example 8:
Q: $E:otherElements1 $S:salary $E:otherElements2 ← $E:otherElements1 $S:salary $E:otherElements2 , GreaterThan($S:salary, 2000). (the input query)
Q1_1, Q1_2: clauses of the same form as Q, obtained by applying the XET rule EmployeeSub to Q.
Q1_3: John 2000 Data Warehouse ←. (obtained by applying the XET facts to Q and solving the constraints)
Q2_1: Kay 2500 ←. (obtained by applying the XET facts to Q1_1 and Q1_2 and solving the constraints)
Q2_2: John 2000 Data Warehouse ←. (a repeat of Q1_3)
Together, Q2_1 and Q2_2 form the answer.
