2005 3rd IEEE International Conference on Industrial Informatics (INDIN)
Maintaining Data Consistency in XML-Based Applications
Eric Pardede1, J. Wenny Rahayu2, and David Taniar3
1,2Department of Computer Science & Computer Engineering, La Trobe Univ., Australia, email: {ekpardede, wenny}@cs.latrobe.edu.au
3School of Business Systems, Monash Univ., Australia, email: [email protected]
Abstract-We have witnessed an increasing number of Web-Based Applications that use eXtensible Markup Language (XML) as a data format. This has resulted in a high demand for better XML Data Stores. One issue in managing XML data storage that still needs to be addressed is the impact of update operations on the consistency of linked XML data. It is widely known that in XML-based applications, persistent references also avoid broken links and invalid search engine results. The current solution is to check regularly for broken links and manually rectify any that are found, which requires considerable effort. Our aim in this work is to avoid broken references/links in the first place by checking before updating a document. We propose a new methodology to maintain the consistency of XML Data after update operations, which can be classified into insertion, deletion and replacement. The methodology takes the form of a set of functions that perform a check before an update. The methods are applicable to both schema-based and non-schema-based XML Data. For the implementation, we apply the methods as a set of functions in an XML-Enabled Database and in a Native XML Database.
Index Terms-consistency maintenance, referential integrity, XML Linking, XML Schema, XML Update
I. INTRODUCTION
In the last few years, XML (eXtensible Markup Language) has gained high popularity for data representation and exchange on the web. Many web-based applications that were implemented using HTML are now adopting XML technologies due to their strengths. As a result, more and more XML documents have been generated in recent years. In the near future, most of the stored data in a corporation will likely be in the form of XML documents. This has raised the demand for XML Data Stores [6], data repositories that have the capacity of a full-fledged Database Management System for tree-structured XML Data. Currently, many industries are aware of the necessity to use XML for their data representation. Nevertheless, many are still reluctant to shift from their traditional data stores, such as Relational Databases, to XML Data Stores.
There are many aspects of XML Data Stores that are immature and still need considerable research [1]. One of these is the inefficiency of the updating process [1], [13]. For industrial or business applications, the data stores will need to be maintained frequently, so support for consistency maintenance of the data stores is becoming highly important. This paper proposes a methodology to maintain the consistency of XML Data Stores after they are updated. The method can be implemented in various database storage types. We use an XML-Enabled Database for schema-based XML Documents and a Native XML Database for non-schema-based XML Documents.
Following this introduction, section 2 briefly reviews the related works. In section 3 we show how references can be embedded in various XML Document types. In section 4 we propose the update methodology and the implementation of the proposed method. Finally, we conclude the paper in section 5.
0-7803-9094-6/05/$20.00 ©2005 IEEE
II. RELATED WORKS
An XML document is a collection of information that is structured
in a tree of nodes. Most of the time, we find an XML node containing information that is not explicitly stated in that particular node. The actual information is stored in another resource and the former node just refers to that resource. The different locations of the two sources raise the issue of referential integrity, and maintaining the referential integrity between the sources becomes a crucial task. Since the era of the relational model, we have been familiar with the concept of primary and foreign keys [15]. The excellent support for referential integrity in this data model is the strength that has given RDB an important position in the database community. Since then, referential integrity maintenance has been a basic requirement of any emerging data model. This also applies to the XML data model, which has been described as the new era of data format in the database community. The common referencing mechanism in the XML data model is ID/IDREF or key/keyref [4]. This practice is normally followed if we have a schema along with the XML Document instance [8], [14]. Another mechanism is to use the XML
linking option [9], [15]. In any kind of database, having persistent references is important for integrity purposes. Specifically for XML-Based applications, persistent references also avoid ubiquitous broken links and invalid search engine results [15]. The current solution is to check regularly for broken links and manually rectify any that are found, which requires considerable effort. Our aim in this work is to avoid broken references/links in the first place by checking before updating a document. Fig. 1 illustrates three invalid update operations that can happen if there is no checking mechanism for the referential attributes/elements. This is the main motivation of the presented work.
In XML Schema we can provide two options for the referential mechanism: ID/IDREF and key/keyref. We have chosen to use the latter for several reasons: (i) a key has no restriction on the lexical space [14], (ii) a key cannot be duplicated and thus enforces the unique semantics of a key node, and (iii) key/keyref can be defined in a specific element, thereby helping the implementer to maintain the reference. As a running example, we use the Faculty XML Document (see Fig. 2), shown using a Semantic Network Diagram [3]. In this document we define the key StaffID and the keyref StaffOffice. We use the key to maintain the uniqueness of different subjects in the database. We use the keyref to refer to another element/attribute. We have to define the key and keyref paths in the documents. Specifically for the keyref, we also need to define the referred source.
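The key/keyref semantics described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the document structure, the `Office` and `OfficeNo` elements, the sample values and the function name `check_key_keyref` are assumptions made for illustration, and the two paths stand in for the key and keyref paths that a schema would declare.

```python
import xml.etree.ElementTree as ET

# Hypothetical Faculty instance; only the key and keyref names come from the
# running example, every other element and value is an assumption.
FACULTY_XML = """
<Faculty>
  <Staff><StaffID>S1</StaffID><Name>Ann</Name></Staff>
  <Staff><StaffID>S2</StaffID><Name>Bob</Name></Staff>
  <Office><OfficeNo>B101</OfficeNo><StaffOffice>S1</StaffOffice></Office>
  <Office><OfficeNo>B102</OfficeNo><StaffOffice>S9</StaffOffice></Office>
</Faculty>
"""


def check_key_keyref(root, key_path, keyref_path):
    """Return (duplicate_keys, dangling_refs) for one key/keyref pair."""
    seen, duplicates = set(), set()
    for node in root.findall(key_path):
        if node.text in seen:
            duplicates.add(node.text)  # violates key uniqueness
        seen.add(node.text)
    # A keyref value must match some existing key value.
    dangling = [n.text for n in root.findall(keyref_path) if n.text not in seen]
    return sorted(duplicates), dangling


root = ET.fromstring(FACULTY_XML)
duplicates, dangling = check_key_keyref(
    root, "./Staff/StaffID", "./Office/StaffOffice"
)
print(duplicates, dangling)
```

In this sample the second office refers to a staff identifier that does not exist, so it is reported as a dangling reference, while no key is duplicated.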
Fig. 1. Referential Integrity Problems without Update Checking (panels: a. Insert, b. Delete, c. Replace)
To the best of our knowledge, there is no work that has discussed the maintenance of these various referential mechanisms after an update operation. In our previous work [8] we highlight three current XML update strategies. None of them has preserved referential integrity of the updated XML documents.
Fig. 2. Faculty XML Document (recoverable node labels: CentreName, ResearchMember (reference Staff), ResearchDesc)
III. REFERENTIAL MECHANISMS IN XML DOCUMENT
There are two main categories of referential mechanism in XML Documents. They are differentiated by the existence of an XML Schema as the logical model of the XML Document.
A. Schema-Based Referential Mechanism
The most common way to provide a referential mechanism in an XML Document is through a schema. Among different schemas (XML-Data, DDML, DCD, SOX, RDF, DTD) [7], we select to use XML Schema [14]. It has high precision in defining the data type, as well as an effective ability to define the semantic constraints of a document.
In [8] we have proposed the way to locate the key/keyref depending on the number of participants type and the adhesion of the participating elements. For example, we show a binary relationship between StaffResearch (see Fig. 2). Our proposal
is to create another child element under the root node to store the ternary relationship. All participating elements will become the reference elements under this new element.
... three types (see Fig. 3): (i) a link between two resources in the same collection, (ii) a link between two resources in the same database but from different collections, and (iii) a link between two resources from different databases. Among these link types, it is hard to maintain the persistent ... management systems that contain modules for checking ... desirable [15].
[Schema listing fragment: not shown key/keyref CentreID]
Note that we add some SQL/XML annotations [5] to the schema. These annotations transform the XML Schema components into SQL components. This step is required if we want to store our XML Document in a table. The namespace xdb in this case is the one for the target storage.
B. XML Link
The other option for maintaining referential integrity is through XML Linking Technologies. W3C has been working on two important linking standards, namely XLink [17] and XPointer [18]. XLink is used to describe complex associations between resources identified using Uniform Resource Identifiers (URI). Sometimes an XLink includes an XPointer component for a more detailed reference. XPointer, which is built on the XML Path Language (XPath) [16], is used as the basis for the fragment identifier of any URI reference that locates a resource [4]. The fragment may be a single XML element or a collection of elements. The only limitation is that the resource must be an XML Document. An example of the links is shown as follows. It has an XPointer embedded in a simple XLink.
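As a minimal sketch of how such a link could be processed (a hypothetical illustration, not the authors' listing), the Python fragment below reads a simple XLink, separates the resource URI from the XPointer fragment, and resolves the pointer against the target document. The element names, the file name `staff.xml`, the sample data and both helper functions are assumptions. The resolver handles only absolute child paths with positional predicates, because Python's ElementTree supports just a subset of XPath.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"

# Hypothetical simple XLink whose href carries an xpointer() fragment.
LINK_XML = """
<ResearchMember xmlns:xlink="http://www.w3.org/1999/xlink"
    xlink:type="simple"
    xlink:href="staff.xml#xpointer(/Faculty/Staff[2])">Member</ResearchMember>
"""

# Hypothetical content of the referred resource staff.xml.
TARGET_XML = (
    "<Faculty>"
    "<Staff><StaffID>S1</StaffID></Staff>"
    "<Staff><StaffID>S2</StaffID></Staff>"
    "</Faculty>"
)


def split_simple_link(element):
    """Return (resource_uri, pointer_expression or None) of a simple XLink."""
    if element.get(f"{{{XLINK_NS}}}type") != "simple":
        raise ValueError("not a simple XLink")
    href = element.get(f"{{{XLINK_NS}}}href", "")
    resource, _, fragment = href.partition("#")
    pointer = None
    if fragment.startswith("xpointer(") and fragment.endswith(")"):
        pointer = fragment[len("xpointer("):-1]
    return resource, pointer


def resolve_pointer(pointer, root):
    """Resolve an absolute path such as /Faculty/Staff[2] against root."""
    prefix = "/" + root.tag
    if pointer != prefix and not pointer.startswith(prefix + "/"):
        return []  # the path does not start at this document's root element
    # ElementTree only accepts relative paths, so rewrite /Faculty/... as ./...
    return root.findall("." + pointer[len(prefix):])


link = ET.fromstring(LINK_XML)
resource, pointer = split_simple_link(link)
matches = resolve_pointer(pointer, ET.fromstring(TARGET_XML))
print(resource, pointer, [m.findtext("StaffID") for m in matches])
```

An empty result from `resolve_pointer` would indicate a broken link, which is the condition a check before an update is meant to prevent.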
Fig. 3. XML Link Classification based on the Vertices
The information about the links can be stored in a separate database called a linkbase [15], [17]. Every time the XML document is processed, the linkbase is loaded. In our work, we aim to utilize the linkbase not only to store the links but also as a lookup reference before an update operation. By doing so, we can prevent broken links in our XML documents. An example of a linkbase is shown below.
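The idea of consulting the linkbase before an update can be sketched in Python as follows. This is a hypothetical in-memory model under stated assumptions rather than the authors' linkbase format or functions: the `Link` and `Linkbase` classes, the URIs and the helper names `can_delete` and `can_insert` are all introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    source: str  # URI of the referring resource (assumed format)
    target: str  # URI of the referred resource (assumed format)


class Linkbase:
    """Hypothetical in-memory linkbase used as a lookup before updates."""

    def __init__(self, links):
        self._links = list(links)

    def referrers_of(self, target_uri):
        """Return the sources of all links that point at target_uri."""
        return [link.source for link in self._links if link.target == target_uri]


def can_delete(linkbase, target_uri):
    """Deleting a resource is safe only if no stored link refers to it."""
    return not linkbase.referrers_of(target_uri)


def can_insert(link, existing_resources):
    """A new link is safe only if its target already exists."""
    return link.target in existing_resources


linkbase = Linkbase([Link("centre.xml#C1", "staff.xml#S1")])
resources = {"staff.xml#S1", "staff.xml#S2", "centre.xml#C1"}

print(can_delete(linkbase, "staff.xml#S1"))   # referred to by centre.xml#C1
print(can_delete(linkbase, "staff.xml#S2"))   # no link refers to it
print(can_insert(Link("centre.xml#C2", "staff.xml#S9"), resources))
```

A replacement could be checked in the same spirit by treating it as a deletion of the old target followed by an insertion of the new one.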
(!--.--) our previous work on