On Using Semantic Transformation Algorithms for XML Safe Update
Dung Xuan Thi Le and Eric Pardede
Department of Computer Science and Computer Engineering, La Trobe University, Melbourne VIC 3083, Australia
{dx1l3@students.,E.Pardede@}latrobe.edu.au
Abstract. XML update support in data repositories is gaining a new level of importance since the use of XML as data format has been widely accepted in many information system applications. In addition to the capabilities of update, research on how the query update is performed is also needed. Safe update maintains the integrity of the updated XML documents, which can be costly. In this paper, we show how to improve the performance of the safe updates using a series of semantic transformation algorithms. The algorithms will pre-process the schema to preserve the documents’ semantics/constraints. The information gathered is used to transform the update requests into semantic updates that can outperform the primitive safe updates.

Keywords: XML Updates, XML Schema, XQuery, XPath.
J. Yang et al. (Eds.): UNISCON 2009, LNBIP 20, pp. 367–378, 2009. © Springer-Verlag Berlin Heidelberg 2009

1 Introduction

XML update was once perceived as an unnecessary operation in XML data management. XML documents were considered static, and if an update was necessary, users simply replaced the whole document with a new one. However, with the increased usage of XML as a data format, attitudes towards XML updates have changed within XML communities. As with any traditional database, XML documents should be updateable, and the issues associated with update operations have become emerging research topics.

Many information system applications also use XML as the underlying data representation for their business. According to a recent study [3], 60% of information is now structured in XML-based representation. In addition, the use of XML encoding for data specification standards such as BPEL (Business Process Execution Language), HL7 (Health Level 7) and AIXM (Aeronautical Information Exchange Model) has flourished.

The volume of research on XML updates has increased over the last few years, ranging from proposals for update languages to studies of the implications of XML updates. Since XML documents can be stored in different repositories, the research on XML updates also varies by repository type. A common understanding across this research is that XML safe update is an expensive operation. However, since XML update has now become a core requirement and not merely an option, XML communities have to perform XML safe updates during their data management. The question is how to find the cheapest way of performing the safe update.

Fig. 1. Problem Definition (diagram relating Non-Safe Update, Primitive Safe Update, Semantic Transformation Algorithms and Semantic Safe Update)

In this paper, we aim to improve XML safe update performance by using semantic transformation algorithms (see Fig. 1). The updates take the form of basic operations such as deletion, replacement and renaming of XML element(s)/attribute(s). The updates are safe updates, which consider the conceptual semantics/constraints of the documents, yet their performance should improve on that of the primitive safe update operations [8]. For the repository, we use an XML-Enabled Database.

Roadmap. Following the introduction, in section 2 we provide an overview of XML update and related works. In section 3, we discuss the framework of our work in comparison to existing work on XML update. Our update algorithms are explained in section 4, and we discuss the implementation setting as well as the case study in section 5. We perform some analysis in section 6 and conclude this paper in section 7.
2 XML Updates

2.1 Overview

In the early days of using XML as a data format, there was no capability for updating the data. To change XML data, one needed to remove the current version and upload a newer version into the repository. This is, of course, not acceptable nowadays, as XML data has become more dynamic in terms of the frequency of changes. A data manipulation language for XML databases was proposed in [13], whose authors extended XQuery to support XML updates. These updates only consider the instances; therefore, no schema checking or constraint preservation is enforced. We refer to these as non-safe updates. Safe updates, on the other hand, perform schema checking or validation before performing the operations. In this paper, we address three basic updates as described below. As a running example, we use the XML schema and the instance in Figure 2. The schema is shown to describe the constraints, while the instance is shown to demonstrate the queries in this paper.
• Insertion involves the addition of node(s) or attribute(s) into the XML data in the database. For example (see Fig. 2), insert a second address of a particular employee.
• Deletion involves the removal of node(s), attribute(s) or their contents from existing data. For example (see Fig. 2), removal of all contract employees who work in Melbourne.
• Replacement involves the change of node(s) and attribute(s). The change can be of the name or the content. For example (see Fig. 2), change the supervisor of a particular employee.
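The three operations listed above can be illustrated on an in-memory document. The following is a minimal sketch that uses Python's standard xml.etree.ElementTree module in place of an XML-Enabled Database; the fragment, the element layout and the new supervisor name "Carol Jones" are assumptions made for illustration, loosely shaped like the running example. No schema or constraint is consulted, so these are non-safe updates.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment shaped like the running example (not the paper's data).
doc = ET.fromstring(
    "<dept><name>Finance</name><staffList>"
    "<perm><name>Amy Smith</name><address>1 Green St</address>"
    "<spv>Bob Scott</spv></perm>"
    "<contract><name>Dan Carr</name><city>Melbourne</city></contract>"
    "</staffList></dept>"
)
staff = doc.find("staffList")
amy = staff.find("perm")

# Insertion: add a second address to one employee.
ET.SubElement(amy, "address").text = "5 Spring Rd"

# Replacement: change the content of the supervisor element.
amy.find("spv").text = "Carol Jones"

# Deletion: remove every contract employee located in Melbourne.
# findall returns a list, so removing while looping is safe.
for c in staff.findall("contract[city='Melbourne']"):
    staff.remove(c)
```

Each operation succeeds regardless of what the schema allows, which is exactly the behaviour that safe updates are meant to prevent.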
In the next section, we describe some existing work on XML updates to show how our work fits in and contributes to the research area of XML update.

Fig. 2. Sample XML: Schema and Instance. (a) XML Schema: a tree of type nodes and leaf nodes annotated with occurrence constraints (n,m), with nodes including company, depart, staffList, perm, name, address, city, pcode, age, status and edate. (b) XML Instance: a company document with departments, staff lists, and perm and contract entries, with sample values such as “Finance”, “Amy Smith”, “1 Green St”, “Melbourne”, “3000”, “Valid”, “Bob Scott”, “01-01-1990” and “Dan Carr”.
2.2 Related Works

XML update mechanisms differ depending on how the XML data is stored. For Native XML Databases (NXD), many products use proprietary languages, such as Lore [4], that allow updating within the server. Other products use special languages such as XUpdate [7], which are designed to be used independently of any particular implementation. Another strategy, followed by most NXDs, is to retrieve the XML document and then update it using an XML API [6, 10].

Another option for XML update in NXD is to embed the update processes into the XML query language and enable the query processor to read the update queries. This is the latest development, and interest in it is growing. The first work in this area [13] embedded simple update operations into XQuery, so it can be used with any NXD that supports this language. Subsequent work has extended this proposal; however, a basic issue remains unresolved: we do not know how the update operations may affect the semantic correctness of the updated XML documents. Recently, W3C released a working draft for update facilities in XQuery [14], which is a first step towards fully integrating update operations in available XML databases.

When we use an XML-Enabled Database (XED) for XML storage, we have the benefits of the full database capability, including full support for update processes. Work has been conducted in some DBMSs to translate XML query languages (and updates too) into DBMS query languages such as SQL, the idea being to carry the expressive power of the XML query languages over into the more widely used SQL [1, 2]. The release of the SQL/XML standard [5] has provided a uniform language to manage XML data in XED. Many XED products now implement update facilities that comply with this standard [12].

Unfortunately, there is no clear discussion on how to carry out safe manipulation by checking the XML constraints. In this work, we propose algorithms to support safe manipulation, a feature which has been missing from existing work. The implementation of our algorithms is based on the SQL/XML data manipulation language.
In the next section, we show the syntaxes for the updates using this standard.

2.3 XML Updates Using SQL/XML

The release of the SQL/XML standard has provided developers and users of XED with a relational platform, a common language for XML data management. Due to our page limitation, we only show a sample of standard statements for data manipulation provided in [5].

For deletion: DELETE [WITH ] FROM [WHERE ]
For insertion: INSERT [WITH ] INTO
For update: UPDATE [WITH ] SET [WHERE ]
Understandably, the implementation of the standard can differ based on the environment. In the product we use for implementing our algorithms, the is the conditional statement written in XPath. The same is applied to the and components. Every time a manipulation request is made using one of the statements above, the database will proceed without checking the impact of the operations on the integrity of the data. This is what we refer to as a non-safe update. In contrast, we want to support safe updates by providing the mechanisms to check the constraints before the manipulation takes place.
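The difference between the two behaviours can be made concrete with a small sketch. It is written against an in-memory document with Python's xml.etree.ElementTree, not against SQL/XML, and the constraint table (an enumeration of allowed values for the status leaf) is an assumption made for illustration; the names unsafe_update, safe_update and CONSTRAINTS are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed constraint table: leaf name -> allowed values (an enumeration).
# The value set for "status" is illustrative, not taken from the paper's schema.
CONSTRAINTS = {"status": {"Valid", "Invalid"}}

def unsafe_update(root, path, value):
    """Non-safe update: overwrite every matching node without any check."""
    for node in root.findall(path):
        node.text = value

def safe_update(root, path, value):
    """Safe update: cancel the request if the new value violates a constraint."""
    leaf = path.rstrip("/").split("/")[-1]
    allowed = CONSTRAINTS.get(leaf)
    if allowed is not None and value not in allowed:
        return False  # update cancelled, document left untouched
    unsafe_update(root, path, value)
    return True

doc = ET.fromstring("<staffList><perm><status>Valid</status></perm></staffList>")
```

With these definitions, safe_update(doc, ".//perm/status", "Unknown") leaves the document unchanged, whereas the same call with "Invalid" is applied.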
3 Safe Updates Framework

3.1 Primitive Safe Update Framework

Authors of previous work [11] have demonstrated how primitive safe updates can support constraint preservation of updated XML documents. However, these incur high costs compared to primitive updates [9]. Figure 3a shows the framework of primitive safe updates. Every time there is an update request, a trigger¹ is employed to check the effect of the update on the constraints stated in the XML Schema. Based on the results of the check, the system either confirms the update or orders its cancellation. With this approach, high costs are incurred in accessing the XML Schema and checking update validity for every single XML update. Therefore, a more efficient way to perform safe update is necessary. One solution is to incorporate the semantic transformation mechanism [8].

3.2 Semantic Safe Update Framework

The semantic safe update framework is shown in Figure 3b. On start-up, the schema pre-processing component is initiated so that all constraints/semantics defined in the given schemas are processed. This is shown as A in Figure 3b. The resulting semantics are housed in the transformer component. The pre-processing component is then no longer needed and is terminated; it restarts only if the schema is changed. The transformer is a real-time system component. It accepts the XML input predicate (in our case, an XPath), verifies it against the constraints and transforms it where possible. If the path predicate is valid, the transformer sends its semantic path predicate to the XED to perform the update. Otherwise, a conflict is detected and the transformer returns an informative message to the user. This is shown as B in Figure 3b. As the framework shows, our methodology requires minimal interaction with the database. The methodology also makes it possible to determine whether or not the query needs to access the database during the transformation phase.

¹ A built-in trigger can be bundled with RDBMS package.
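The one-time pre-processing step (A in Figure 3b) can be sketched as a single walk over the schema that collects the structure and the leaf constraints. The sketch below is an illustration under stated assumptions: the miniature XSD is hypothetical, only inline anonymous types with xs:sequence are handled, and the two result lists are named S and C to follow the notation that Algorithm 1 uses in Section 4. It is not the authors' implementation.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical miniature schema, loosely modelled on the running example.
XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="company">
    <xs:complexType><xs:sequence>
      <xs:element name="perm" maxOccurs="unbounded">
        <xs:complexType><xs:sequence>
          <xs:element name="name" type="xs:string"/>
          <xs:element name="age">
            <xs:simpleType>
              <xs:restriction base="xs:integer">
                <xs:minInclusive value="16"/>
                <xs:maxInclusive value="52"/>
              </xs:restriction>
            </xs:simpleType>
          </xs:element>
        </xs:sequence></xs:complexType>
      </xs:element>
    </xs:sequence></xs:complexType>
  </xs:element>
</xs:schema>"""

def preprocess(xsd_text):
    """Walk nested xs:element declarations once and return (S, C).

    S holds (parent, child) pairs; C holds (leaf, constraint, value) triples.
    """
    S, C = [], []

    def visit(elem, parent):
        name = elem.get("name")
        S.append((parent, name))
        # Facets declared on this element's inline simple type, if any.
        for facet in elem.findall(f"{XS}simpleType/{XS}restriction/*"):
            C.append((name, facet.tag.replace(XS, ""), facet.get("value")))
        # Child declarations inside an inline complex type's sequence.
        for child in elem.findall(f"{XS}complexType/{XS}sequence/{XS}element"):
            visit(child, name)

    for top in ET.fromstring(xsd_text).findall(f"{XS}element"):
        visit(top, "root")
    return S, C

S, C = preprocess(XSD)
```

Because the schema is read only here, a transformer built on S and C does not need to touch the schema again until it changes.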
Fig. 3. Safe Update Frameworks. (a) Primitive Safe Update Framework: user XML updates are processed against the XML Schema and the XML documents in the XED in four steps: 1 Request Update, 2 Constraint Checking, 3 Update Confirmation, 4 Update Decision. (b) Semantic Safe Update Framework: a one-time Pre-Processing Schema component (A) reads the XML Schema, and a real-time Transformer (B) receives the user XML update queries and either reports a semantic conflict or passes the valid query to the XED holding the XML documents.
In the next section, we detail the proposed algorithms and how they rewrite the primitive safe update query into the semantic safe update query.
4 Proposed Algorithms

As shown in Figure 3, two main tasks are conducted in semantic safe update: task A is the pre-processing of the schema and task B is the transformation of the XML queries. Algorithm 1 performs task A. It accepts two input parameters: a valid XML schema and the name of the schema root. The schema is processed and two lists, S and C, are built (Line 1-5 to 1-6). S stores a sequence of elements in the form [parent][child] extracted from the schema. C stores leaf nodes and their constraints in the form [leaf node name][constraint name][V], where V is the set of restricted values assigned to each constraint associated with a leaf node. For example, the leaf node “pcode” is restricted by an “inclusive” constraint to an acceptance range between 2000 and 4000. The constraints we consider are cardinality, inclusive, exclusive, enumeration, length and pattern. All of these constraints are processed by Algorithm 1, which returns the S and C lists.

Algorithm 1: Pre-processing Schema
1-1: Input
1-2:   T = XSD schema
1-3:   R = root name of the schema
1-4: Output
1-5:   S = List of sequence elements defined in XSD Schema
1-6:   C = semantic knowledge of elements obtained from T
1-7: Begin
1-8:   push(“root”, R) into S
1-9:   Let eType = List of types. If an element is a type then it is not a leaf node
1-10:  push(R) into eType. R must be a type as it has at least one child
1-11:  WHILE eType not empty, read next line l
1-12:    IF pop(eType[0]) ∈ l THEN (eType[0] is the 1st element of eType)
1-13:      REPEAT read next line to get fj, child of eType[0]
1-14:        push(ei, fj) into S
1-15:        IF fj has valid semantic
1-16:          push(fj, n, V) into C, where n is the name of the constraint;
1-17:          V is a series of constraint values
1-18:        IF fj is also a type, push(fj) into eType
1-19:      UNTIL encounter end of eType[0]
1-20: Return S, C

Upon completion of schema pre-processing, the Semantic Transformation Algorithm (Algorithm 2) takes control. The goal of this algorithm is to transform the XPath into a semantic XPath. Algorithm 2 starts by deriving a list of unique paths (Line 2-9) from the sequence element list S; a unique path is a full path that traverses from the root of the XML schema tree to a selected element, with the path elements separated by the operator “/”. As we deal with update query type, the XPath is expected
with some restrictions in order to process the update. From these restrictions, we identify the elements, their values or their paths (Line 2-10 to 2-21). We refer to these types of paths as sub-paths. The identified restricted elements are then verified against the elements in the leaf-node list C (Line 2-11 to 2-18). If a restricted element is found in list C, its restricted values are then verified (Line 2-12). For example, the ‘age’ element is a restricted element whose constraint value range is between 16 and 52. If the predicate restricts ‘age’ to values greater than 15, or to values less than 53, every value permitted by the schema satisfies it, so the predicate is redundant and can be removed from Q (Line 2-12). On the other hand, if the predicate restricts ‘age’ to values less than 15, no permitted value can satisfy it, so the transformed XPath is set to ‘conflict’ and no further transformation is needed. An alert is returned to notify the user. The same applies when an element in the predicate does not exist in the leaf-node list C or in the S list.

In the case where the predicate contains only a restricted path, the algorithm also performs the transformation. If the restricted path is not expressed as a full path, that is, as a sequence of elements separated only by the operator ‘/’, then it has to be transformed into a full sub-path of one of the unique paths (Line 2-20). We use the fn_semantic_path_transformation function to complete this path transformation task (Line 2-20). The function comprises two operations, semantic expansion and semantic contraction, which were first introduced in [7]. The former transforms a given XPath into a unique path, while the latter transforms a given XPath into another XPath preceded by the operator “//”, so that a recursive type in the XPath is contracted.

In this paper, fn_semantic_path_transformation has been selectively adopted to transform any path, including the restricted path, into a semantic restricted sub-path or a semantic path.
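The derivation of unique paths from S and the expansion of an abbreviated path can be sketched as follows. This is a simplified illustration and not the fn_semantic_path_transformation of the paper: the function names unique_paths and expand are hypothetical, the sample S list is assumed, and the sketch assumes a non-recursive schema in which each type name has a single set of children. Semantic contraction is not covered.

```python
def unique_paths(S):
    """Derive every root-to-element path from (parent, child) pairs."""
    children = {}
    for parent, child in S:
        children.setdefault(parent, []).append(child)
    paths = []

    def walk(node, prefix):
        for child in children.get(node, []):
            path = f"{prefix}/{child}"
            paths.append(path)
            walk(child, path)

    walk("root", "")
    return paths

def expand(xpath, paths):
    """Rewrite '//a/b' to the single unique path ending in '/a/b'.

    Returns None when no unique path, or more than one, matches;
    the caller can treat that as a conflict.
    """
    if not xpath.startswith("//"):
        return xpath if xpath in paths else None
    suffix = xpath[1:]  # '//a/b' -> '/a/b'
    matches = [p for p in paths if p.endswith(suffix)]
    return matches[0] if len(matches) == 1 else None

# Assumed (parent, child) pairs in the form produced by pre-processing.
S = [("root", "company"), ("company", "dept"), ("dept", "staffList"),
     ("staffList", "perm"), ("perm", "status"), ("perm", "age")]
U = unique_paths(S)
```

Under these assumptions, expand("//perm/status", U) yields the full path from company down to status, while a path that names an element absent from the schema yields None.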
Algorithm 2: Semantic Transformation
2-1: Input
2-2:   S = List of sequence elements defined in XSD Schema
2-3:   C = Semantic knowledge of elements obtained from
2-4:   Q = XPath
2-5: Output
2-6:   ℘ = Semantic XPath or error message
2-7: Begin Repeat
2-8:   Let U be a list of unique paths derived from S, r be the inner focus of predicate [ ],
2-9:   Q be the XPath input by user, P is set to NULL
2-10:  FOR each r in (Q != NULL) DO
2-11:    Let γ be restricted element in r, ϕ be restricted values in r, τ be fragment of sub-path in r
2-12:    IF γ found in C list and all values ϕ ∈ γ existed in r are in the domain range of/equivalent to all values ϕ ∈ γ existed in C THEN
2-13:      remove r from Q
2-14:    ELSE IF γ found in C list and only some value ϕ ∈ γ in the domain range of/equivalent to some values ϕ ∈ γ existed in C THEN
2-15:      retain r in Q
2-16:    ELSE IF γ not found in C list next item or γ found and ϕ ∈ γ not in the domain range of ϕ ∈ γ in r THEN
2-17:      set ℘ = ‘conflict’
2-18:      EXIT FOR
2-19:    IF found τ as a sub-path in U list item AND τ contains only operator ‘/’ THEN remove τ from Q
2-20:    IF found τ as a sub-path in U list item AND τ contains operators other than ‘/’ THEN τ’ = call fn_semantic_path_transform (τ)
2-21:    ELSE IF not found τ as a sub-path in U list item or τ’ is empty THEN set ℘ = ‘conflict’
2-22:  ℘ = Q where Q has been transformed
2-23:  IF ℘ is an UPDATE query THEN
2-24:    WHILE not end of U
2-25:      Match φ to C list next item where φ is the update element
2-26:      IF φ found in C list next item and value of φ matched value of found next item THEN
2-27:        ℘ = call fn_semantic_path_transform (℘)
2-28:        EXIT
2-29:      ELSE φ not found in C list next item or φ found in C list next item and value of φ not matched THEN
2-30:        ℘ = ‘conflict’
2-31:  ELSE IF ℘ is a DELETE query THEN
2-32:    ℘ = call fn_semantic_path_transform (℘)
2-33:  IF ℘ is not NULL AND ℘ is not ‘conflict’ THEN send ℘ to access the database to complete its task
2-34:    ℘ = ‘Update/DELETE Done’
2-35: UNTIL Input is ‘Esc’ Key
Continuing with the algorithm, Line 2-19 allows a predicate if the restricted path is a sub-path of one of the unique paths in the list derived from S. The restricted path causes a semantic conflict where it does not match the schema structure and is not a sub-path of any unique path (Line 2-21). Finally, Line 2-23 to Line 2-32 perform the transformation of the actual update queries, ensuring that the values used for the update are within the restricted values in the schema. This step ensures that conflicting requests are rejected and that the remaining path is simplified before it is sent to the database.
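The three outcomes for a restricted element (remove, retain, conflict; Lines 2-12 to 2-17) can be illustrated for an integer range constraint such as the one on ‘age’. The sketch below is an assumption-laden simplification: the function name classify is hypothetical, only a single comparison against an inclusive integer range is handled, and the permitted values are simply enumerated, which is adequate for small ranges only.

```python
def classify(op, value, lo, hi):
    """Compare the predicate `element op value` with the schema range [lo, hi].

    Returns 'remove' if every permitted value satisfies the predicate,
    'conflict' if none does, and 'retain' otherwise.
    """
    tests = {
        ">": lambda x: x > value,
        ">=": lambda x: x >= value,
        "<": lambda x: x < value,
        "<=": lambda x: x <= value,
        "=": lambda x: x == value,
    }
    holds = tests[op]
    # Evaluate the predicate on every value the schema permits.
    satisfied = [holds(x) for x in range(lo, hi + 1)]
    if all(satisfied):
        return "remove"
    if not any(satisfied):
        return "conflict"
    return "retain"
```

For the range 16 to 52, the predicates age > 15 and age < 53 are classified as removable, age < 15 as a conflict, and age > 30 as a predicate that must be retained and evaluated by the database.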
5 Implementation

We describe the implementation in terms of two stacks: (1) the hardware stack is a machine with an AMD Athlon 64 3200+ at 2300 MHz and 2.0 GB of RAM; and (2) the software stack comprises Windows XP Professional and Java VM 1.5. We select a leading commercial XED and use its database connection driver to connect to our algorithm modules. We use five synthetic datasets (compliant with the schema shown in Fig. 2) of varying sizes: 5, 10, 15, 20 and 25 megabytes. We run various update queries on each dataset. Due to space limitations, we show only three of the queries. For each query, we show the original query and the rewritten query produced by the semantic transformation algorithms.

Query 1: Updating text element with a conditional predicate (valid operation)
Original Query:
UPDATE companyxml5 SET OBJECT_VALUE = UPDATEXML(OBJECT_VALUE, '//perm/status/text()', 'Invalid') WHERE existsNode(OBJECT_VALUE, '//perm[ages