The Simplest XML Storage Manager Ever - CiteSeerX

4 downloads 11250 Views 66KB Size Report
19 Skyline Drive,. Hawthorne ... XQuery's data model handles namespaces and multiple character ... Permission to make digital or hard copies of all or part of this work for ..... and crash recovery – although we do not use those features yet, we ...
The Simplest XML Storage Manager Ever Avinash Vyas

Mary Fernandez ´

Jer ´ ome ˆ Simeon ´

Bell Laboratories 600 Mountain Ave. Murray Hill, NJ 07974

AT&T Labs - Research 180 Park Ave. Florham Park, NJ 07932

IBM Watson Research Center 19 Skyline Drive, Hawthorne, NY 10532

[email protected]

[email protected]

[email protected]

ABSTRACT After almost five years of incubation within the W3C, XQuery 1.0 is close to completion. Even before its official release, development of numerous XQuery implementations are underway. However, those implementations have focused only on either completeness or performance. In this paper, we report on our experience building Jungle, a secondary storage manager for Galax. We designed Jungle to be the “simplest XML storage manager ever” that supports the complete XQuery 1.0 language, and we show it can scale to documents up to several hundred megabytes. Interestingly, the process of developing Jungle lead us to revisit several core aspects of currently proposed XML indices. In particular, we propose alternative indices that can improve the performances for the child axis and for serialization. Jungle is fully operational and now being deployed in production applications.

1.

INTRODUCTION

After almost five years of incubation within the W3C, XQuery is emerging as the de facto query language for XML. Its success and impact are due, in part, to XQuery’s unwavering commitment to being a “good XML citizen”. In particular, XQuery supports or is compatible with many related and pre-existing XML standards. XQuery’s data model handles namespaces and multiple character sets, preserves the inherent order in XML documents, and supports all six kinds of XML nodes. XQuery 1.0 includes XPath 2.0 as a proper sub-language, making it compatible with other XML-aware languages that depend on XPath 2.0, such as XSLT 2.0 or XPointer. Lastly, XQuery’s type system is based on XML Schema, making it possible to query and produce typed XML documents. Being a good XML citizen, however, has a cost and building a complete implementation for XQuery is a significant undertaking. Implementors, both in the industry and in academia, are adapting decades of database and compiler technologies to their XQuery implementations and are crafting new techniques for processing XQuery efficiently. Still, there is currently no implementation that is both complete and offers reasonable performances for large scale applications. The objective of this paper is to describe Jungle, an XML storage manager that is integrated with Galax to create a completeyet-efficient XQuery 1.0 implementation. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission of the authors. Informal Proceedings of the First International Workshop on XQuery Implementation, Experience, and Perspectives (XIME-P), June 1718, 2004, Paris, France.

Galax [7] is an open-source implementation of the family of XQuery 1.0 specifications [2]. It is designed to be complete and conformant with respect to the W3C standard. Being one of the very few publicly available XQuery implementations and one of the most complete, Galax has gathered a small user community and is the implementation of choice in many university database classes. More recently, Galax has been used to support constraint checking within Lucent’s UMTS wireless infrastructure which is now on trial at several customers sites. This application requires support for many of the advanced XQuery features notably node identity, recursive user-defined functions, and type conversions. Moreover it has to routinely process documents of several hundred megabytes, which is beyond the scope of all public XQuery implementations we are aware of. Jungle. The goal of the Jungle project is to develop the simplest XML storage manager that can support the following requirements. Completeness. The storage manager must support all of the operations of the XQuery data model [18]. This requirement is sufficient to ensure it can support all of XQuery. Galax implements an abstract interface for the XQuery 1.0 data model, which facilitates the use of a different physical representation of the document. That interface is used to easily adapt Galax on top of Jungle. Scalability. The storage manager must support efficient representation of XML documents, and scale to documents of the order of a few hundred megabytes. The response time for typical UMTS constraints must be of the order of a few seconds per constraint. In Section 4, we will show performance results for query evaluation time on document stored in Jungle. Reliability. Jungle must be reliable enough to be deployed in industrial applications. This requirement was the main driver behind our decision to favor simplicity and the use of wellestablished technology. In particular, Jungle is based on standard B-Tree indices, for which there exist good public implementations1 , rather than to develop novel XML indices. Although it may be possibly more efficient on paper to use new kinds of indices, our experience throughout the Galax’s development has been that the simplest solutions are often best: this results in code which is more robust, easier to maintain, as well as easier to extend in the future. In this paper, we describe the design and implementation of Jungle, and provide preliminary performance results that demonstrate the effectiveness of the proposed approach. Although not officially 1

Jungle is currently developed on top of BerkeleyDB [1].

Table/Index Main Text QName QName-ID Prefix-URI

Key pre id id QName prefix

Content par-pre, kind, id (, post) text QName id URI

j

Figure 1: Example Document

Table 1: Core storage structures

released yet, the jungle engine can be accessed from the Galax CVS development server2 . XML Indexes. Numerous strategies have been proposed for storing and indexing XML documents [14, 4, 5, 10, 6, 9, 12]. Most of those indices [14, 5, 10, 6] do not support all of the XQuery data model, but can act as secondary indices to speed up evaluation for specific fragments of the query. Other approaches, like Natix [12] require the development of native XML indices, and do not fulfill our requirement of being based on standard B-Trees. The approach proposed by Grust in [9] is appealing for several reasons. First, it can be used to support the XQuery data model in its entirety, including node identity, and all the XPath axes. Second, it is based on tables and as a result can easily be implemented on top of B-Trees. In the rest of the paper, we will refer to that approach as Accelerated XPath. Originally, Jungle was meant to be a direct implementation of Accelerated XPath on top of B-Trees. However, our first experiments revealed that two key operations: the child axis and serialization, are prohibitively slow. In Section 2 we describe two alternative designs that are effectively used in Jungle. The first includes two indices, one on a node’s children and one on its attributes. The second includes one index on a node’s first child, one on its next sibling, and one on its attributes. We are still investigating the relative trade-offs between those two alternatives. The former performs somewhat better for the child axis, whereas the latter performs somewhat better for serialization. Query processing. The main focus of this paper is on indexing rather than query processing. XML query processing is currently the subject of numerous research activities [16, 8, 15, 3]. Our experience is that improvements at the indexing level brings performances benefits throughout query evaluation. As such, our work is complementary to the work on query optimization. It is also complementary to the work on XPath indexes [14, 5, 10, 6] that can be used in addition to Jungle as secondary indices. Organization. In Section 2, we briefly recall the Accelerated XPath approach, then present the two alternative approaches that are used in Jungle. In Section 3 we describe Jungle’s implementation. And in Section 4, we present experimental results, notably comparing the performances of Jungle against the original Accelerated XPath approach. Section 5 reviews alternative indexing strategies, and Section 6 discusses future work.

2.

JUNGLE INDICES

We first recall the Accelerated XPath approach before presenting the two alternative children, and sibling indices implemented in Jungle. All three approaches share the same core storage structures, which are necessary and sufficient to implement the XQuery data model. These core structures contain all of the information needed to reconstruct the original XML document. Each approach then has additional auxiliary structures, in order to support efficient access to particular parts of the document’s structure or content, notably for specific XPath axes. 2

ncc.research.bell-labs.com:8081/cgi-bin/cvsweb

The Accelerated XPath approach [9]. Table 1 describes the core structures for all three approaches and are similar to the main table of Accelerated XPath. The main table in our implementation contains one record per node, keyed on the node’s pre-order number, which is obtained during document loading. Each record contains the pre-order number of the node’s parent node, the node kind (element, attribute, etc.), the identifier of its QName in the QName table, and an overloaded field whose meaning depends on the node kind. For a text node, for example, the overloaded field contains the identifier of the text node’s value in the Text table. As an example, Figure 2 contains the core structures for the document fragment given in Figure 1. Because there are many redundant QNames in documents, the QName and Prefix-URI tables serve as compact “string pools”. The QName table maps concise ids to pairs of a namespace prefix and a local name, and the Prefix-URI table maps concise prefixes to long URIs. Note that the Accelerated XPath design proposed in [9] also stores the post-order for each node, as indicated in italics in Table 1. Because it supports the children index directly, Jungle does need post-order to operate. This is one of the main departure from [9], and we will come back to that point later in the paper. In this approach, a node’s pre-order number also serves as its node identifier, which makes comparison of two nodes’ relative order very cheap. Pre 1 2 ID 1 2 3

Par-Pre 1 QName (”ex”,”a”) (”ex”,”b”) (”ex”,”c”) ...

Kind QName-id elem 1 elem 2 ... ID 1 2 3

Value-id -

Text-Value ”d” ”i” ”j”

Prefix URI ”ex” ”http://ximep/example.xsd”

Figure 2: Core tables for example document In the Accelerated XPath approach, all the axes are evaluated by scanning records in the main table. The number of records scanned is determined by applying a two-dimensional “window” to the main table that filters the nodes for a given axis. One dimension of that window consists of a pre-order range and the other of a post-order range. A window reduces the total number of records that must be scanned to compute the axis. For example, the descendant-axis window is:

This window is applied to the main table as follows. To compute the a node v’s descendants, select those records in the main table with pre-order number in the range pre-order of v to post-order of v plus the height of v. Similarly, the records must satisfy the remaining constraints, which are on the node’s post-order value, its parent

pre-order, its kind, and overloaded value. The * wildcard denotes any value is permissible. Limitations. We started by implementing the Accelerated XPath approach as is. However, a direct implementation exhibits two important limitations. The first limitation has to do with the evaluation of the child axis. In fact, the only difference between the window of the child and descendant axis is that the pre-order number of a candidate node’s parent must equal v’s pre-order. To illustrate this point, here is the child axis window:

As a result, evaluating the child axis is as expensive as evaluating the descendant axis, which is not acceptable for most queries. The second limitation has to do with the efficiency of serializing the result of a query. Not only serialization requires accessing a node’s descendants in document order, but it also requires to identify boundaries of children of a node in order to generate closing element event. In Galax, this is implemented as a recursive walk over the child axis. This recursive walk step by step turns out to be significantly inefficient using the window access described above. For a single node, its children are accessed (with cost equivalent to accessing its descendants), these children are pushed on a stack, and the procedure is applied recursively to each child on the top of the stack. For example, serializing node f in Figure 1 requires scanning the following nodes: Node f g h i j k

Children (g,h,k) () (i,j) () () ()

Node-records scanned (g,h,i,j,k) (h,i,j,k) (i,j,k) (j,k) (k) ()

Obviously, this is not optimal, because successive windows scan many nodes’ records multiple times. Our very first implementation of Jungle, which was based solely on the Accelerated XPath approach, exhibited significant overhead during serialization. The children index. The first alternative design that we considered was the addition of two specific auxiliary indices, one on the children of a node and one on its attributes. The children index maps the pre-order number of each element node to the sequence of pre-order numbers of the node’s children. Figure 3 contains a fragment of the children and attribute indices for the document in Figure 1. Key 1 2

Children (2,6) (3) ...

Key 1 3

Attributes () (4) ...

Figure 3: Children and Attribute Indices The obvious advantage of this strategy is that accessing a node’s children and its attributes requires constant time. In this strategy, the descendant axis can be implemented by a generic recursive walk of nodes’ children, which unlike the window technique, does not require accessing the main table nor requires comparisons on main-table fields. Serialization is similar to descendant: a node’s children are pushed on a stack, and the procedure is applied recursively to each of its children, in document order. Using that approach, each node in the main record structure must only be accessed once. As we will see in the experiments in Section 4, the

Key 1 2

First-Child 2 3 ...

Key 2 4

Next-Sibling 6 5 ...

Figure 4: First-child and Next-sibling Indices children index substantially improve evaluation of the child and descendant axes, as well as serialization. The sibling index. Our second alternative design replaces the children index with two indices: one on a node’s first child and one on its next sibling. This strategy corresponds to the decomposition of unranked trees into binary trees. Figure 4 contains a fragment of these indices for the example document. One of the motivation for this design is its close relationship with cursor access over nodes sequences in the XQuery data model. The original Galax evaluation engine implemented item sequences as main-memory lists in the physical data model, and the physical operators only accepted materialized lists of items. Currently, Galax’s physical data model includes both materialized sequences as well as cursors over sequences of tuples. The children index naturally corresponds to materialized sequences, whereas the first-child and next-sibling indices naturally correspond to cursors over sequences. Their inherent benefit is that both the child and descendant axes and serialization never require materialization of intermediate sequences. However, the sibling index is somewhat less compact than the children index. As a result, we are still investigating the trade offs between the two approaches. In general, it appears that the children approach speeds up operations that require access to nodes in a breadth-first fashion (e.g., child::), while the sibling approach speeds up operations that require access to nodes in a depth-first fashion descendant:: or serialization). About loading, post-order, and clustering. In the Accelerated XPath approach, the post-order number, which is required to compute an axis’ window, is stored in the main table. Because it supports the children axis directly, Jungle does not need support for post-order at all. This has a number of useful consequences. First, at loading time, it permits to write records to tables in document order as soon as the pre-order number is known. In case the postorder number is included in the main record, one must either do two read and one write for each record, or buffer the record in memory until after the end-element event is encountered and insert them in reverse document order. Our experiments show that updating the main record with the post-order number substantially slows down document loading, whereas writing the records in reverse document order slowed down the window operations, which can benefit from having records clustered in document order. A second benefit of not having post order is that we do not have to maintain it in the presence of updates.

3. JUNGLE IMPLEMENTATION We now describe our implementation of the Jungle storage manager, as described in the previous section. The Jungle Architecture. Figure 5 gives an overview of the architecture of the Jungle storage manager, and its relationship to the Galax XQuery processing architecture. We implemented the Accelerated XPath and the two alternative indices proposed in Section 2 using BerkeleyDB [1] at the physical level. BerkeleyDB is a storage manager that supports Records, BTrees and Hashtables. Each of those indices store arbitrary key-value pairs. It is a robust and widely deployed system which supports concurrency control and crash recovery – although we do not use those features yet, we

plan to do so in the future. BerkeleyDB is distributed as a C library that you can link into your application. Galax XQuery Engine XQuery Expression

Query Compiler

XQuery Plan

Evaluation Engine

Result (Data Model)

Serialization

XML Result

XQuery Data Model Interface

Physical Indexes BerkeleyDB (C++) XML Documents

Jungle Loader

Jungle’s XQuery Data Model Implementation

BerkeleyDB Caml API

Jungle Storage Manager

Figure 5: The Jungle Architecture

Loading Jungle stores. The lower part of Figure 5 describes Jungle itself. A loading utility is provided to populate the set of appropriate indices, as described in Section 2. In the rest of the paper, we will refer to this set of indices as a Jungle store. This utility takes one or more XML documents, a local directory where to create the indices, and a logical name for the store, and creates BerkeleyDB BTrees and Hashtables. Here is for instance, the command we use to load the XMark document into one new Jungle store. jungle-load -store_dir /tmp/test -store_name xmark xmark.xml The loading is applied directly out of the SAX stream generated from the Galax XML parser, which allows it to scale to large XML file, and operate in heap space bounded by the depth of the document. Querying Jungle stores. Once a jungle store has been populated, the user can apply arbitrary XQuery 1.0 expressions, without limitation in expressiveness. Accessing the store is completely transparent, and can be done using XQuery’s standard fn:doc() function. Jungle uses a special convention in the URI in order to identify the store.. For instance, the following call accesses the jungle store that was populated previously and its evaluation would result in accessing the complete database. doc("jungle:///tmp/test#xmark") Internally, Galax accesses Jungle through an abstract interface for the XQuery data model [18]. Galax’s abstract data model requires that an implementation support at least the parent, children, and attribute axes natively. Based on those axes, Galax provides generic implementations of the other axes, such as the descendant, ancestor, and sibling axes. It also permits underlying engines to provide native implementations of these axes if they like. For example, the generic implementation of descendant is done as a depth-first, recursive traversal of the child axis, while its implementation in the Accelerated XPath case uses the window approach. This design permits Galax to operate on many kinds of data sources

and permits implementations to add native axis support incrementally. It is important to note that the internal data model interface is used at two distinct places in the Galax engine. Firstly, it is used during XQuery evaluation to access information needed to execute the query. This results in an internal data model representation for the result, which typically points to nodes within the original Jungle store. Secondly, serialization is applied to those nodes in order to generate the actual result sent to the user. As we have seen previously, this operation will perform a depth-first traversal of the nodes contained in the result and need to access the data model as well. Jungle data model. The core of Jungle’s implementation is effectively mapping calls to the XQuery data model interface into the appropriate operations on top of the physical indices. The implementation is composed of essentially two layers. Galax being implemented in Objective Caml [13], a first layer is used to provide access to the BerkeleyDB API, which is written in C. That part of the code is independent of the actual indexing strategy. The second layer is used to effectively map each data model call to the corresponding access to the physical structure. This second part of the code requires to interpret the underlying indices as representing the XML documents in the store. We will not give further details here for lack of space. The interested reader can get access to Jungle’s development code on the Galax CVS server which is available on-line3 .

4. EXPERIMENTS We report here on a first set of experiments in query processing using Galax on top of Jungle. The performance results are preliminary as we are still doing fine tuning of the indices and modifying the Galax evaluation engine to benefit best from the availability of the Jungle indices. First, we report on the evaluation time for various kinds of queries on top of Jungle. Second, we compare the performances of the newly proposed children and sibling indices against the Accelerated XPath approach. Finally, we provide additional information about loading time and the size of the actual indices. Experimental setup. For the purpose of those experiments, we implemented the Accelerated XPath approach as well as the two alternative indices described in Section 2, using BerkeleyDB-4.2.1. The data used is either synthetic data generated for the XMark benchmark project, or live data from the Lucent UMTS application. Query processing experiments were performed on a IBM T23 laptop with about 256Mb memory, while indices access comparisons experiments were performed on a 2.6 GHz Intel Pentium 4 processor machine with hyperthreading with 1 GB of main memory. Both configurations where running Linux 2.6.4. In order to reduce inconsistencies, we performed each experiment 5 times, discarding the highest and the lowest number and taking the median of the remaining. Query evaluation time. Table 2 shows query evaluation time for the first 5 XMark queries on top of a Jungle store loaded with the XMark document generated with a factor 1 (116Mb). time (in s)

Q1 4

Q2 9

Q3 22

Q4 21

Q5 6

Table 2: XMark queries 3

ncc.research.bell-labs.com:8081/cgi-bin/cvsweb

Although still fairly slow, this is of the order of magnitude acceptable for our UMTS application, and was performed using a Galax evaluation engine without any optimization specific to the use of Jungle. Note, that by comparison, the original Galax implementation breaks for documents of more than 33Mb. Table 3 shows query evaluation time for the complete set of UMTS constraints (about 410) and compares response time between Galax in main-memory and Galax on top of Jungle. The experiment was performed on a rather small documents of 8Mb, which was the only available to us at the time of writing. One can see a significant improvement (about 30%) in processing time to the advantage of the Jungle approach compared to the main-memory approach. This is very encouraging considering that all of the data fits into memory, and that in the case of Jungle the data has to be fetched from disk. time (in s)

Main Memory 908.65

Jungle 678

Table 3: UMTS constraints Comparison between index approaches. We now report on specific experiments evaluating the performance of the basic data model operations: child::, descendant::, and serialization. Table 4 gives the average access time for the children axis for Accelerated XPath, the children, and sibling axis. In each column, we indicate the average time needed to access nodes randomly. Document Size (Mb) 1 10 20 30

Children Time(ms) 9.22 10.73 10.81 11.04

Next Sibling Time(ms) 12.83 14.77 14.93 15.42

Accel. XPath Time(ms) 52.56 67.34 50.05 50.83

Table 4: Children Access Time Since the children index approach requires a single lookup it is naturally the fastest of the three approaches. The sibling index approach requires multiple lookups, as many lookups as the number of children, making it 40% slower. Finally, the Accelerated XPath approach which requires to perform a scan over the descendants is about 10 times slower than children index approach. Table 5 shows the access time for the descendant axis. Once again, the children index approach is faster by a factor of 3 in comparison to the Accelerated XPath approach and by a factor of 2 in comparison to sibling approach. Although all of the three approaches access the same number of records, the use of the window requires additional comparisons. Document Size (Mb) 1 10 20 30

Children Avg(ms) 22.94 70.53 68.88 83.73

Next Sibling Avg(ms) 45.46 122.79 125.39 146.85

Accel. XPath Avg(ms) 108.34 206.81 208.40 222.30

Table 5: Descendant Access Time Finally, Table 6 shows the performance results for the serialization of the complete document back from the store. We do not indicate the serialization time for the Accelerated XPath approach as it appeared to be prohibitively expensive. As an example, serializing back a 1Mb document took more than 40 minutes! This

demonstrate the need for specific techniques to speed up serialization which is one of the most essential operations in the context of XML applications. Document Size(Mb) 1 10 20 30

Children Avg(ms) 3 30 64 95

Next Sibling Avg(ms) 3 29 64 96

Table 6: Serialization Loading and Footprint. Last, Table 7 gives the loading time (seconds) and footprint (megabytes) for XML documents of different sizes under the three indexing schemes. Loading time is approximately the same for the three indexing schemes. The footprints under the three indexing schemes are comparable to that of typical main memory DOM implementations. For e.g. main memory DOM materialized by Xerces is, in general, five times the size of the XML document. Although the children approach and the Sibling approach store the same kind of information, the use of fixed size data in the latter yields a compact representation and a slightly better loading time. In the Accelerated XPath approach, we need to create fewer index structures. However, the auxiliary binary tree postorder index needs to be continuously rebalanced as the document is loaded. This overhead results in similar loading time as in the children approach and a 12 percent higher footprint. Document Size 1 10 20 30

Children time size 2 4.72 31 42.09 66 86.34 101 145.14

Next Sibling time size 2 4.26 25 40.48 53 84.29 84 127.14

Accel. XPath time size 2 5.22 30 47.21 64 96.71 95 161.12

Table 7: Loading time (sec) and Footprint (MB)

5. RELATED WORK There has been a flurry of recent work on XML indices, and XQuery optimization. Most proposals [5, 14, 10, 4] have focused on path indices which can speed up evaluation of certain kinds of XPath expressions, but do not support the whole XQuery language. Some other approaches attempt to build complete XML indexing infrastructure from scratch [11, 12]. As we explained, we decided to focus on simplicity and rely on standard B-Tree index in order to provide more quickly a complete and reliable system. As far as we know, none of the above projects currently support all of XQuery 1.0. Finally, a few proposals attempt to implement XQuery on top of relational systems [17, 9]. We decided to base our work on [9] for its simplicity and its ability to deal with all of the XQuery data model operations, including document order and all of the XPath axis.

6. CONCLUSION & FUTURE WORK In this paper, we have described a complete implementation for a simple-yet-effective XML storage manager that supports all of XQuery 1.0. We have proposed two new specific indices that extend the approach in [9], and have provided first experimental result that show the effectiveness of those new indices.

We are now working on extending the Jungle store in two direction. The first of those direction is to be able to store and query documents which have been schema validated. The second of those direction is to support updates on top of Jungle. Galax already supports updates on top of its main-memory data model, and we believe it should be fairly easy to extend Jungle to allow updates. Still, support for updates raises a number of interesting question related to node identity and numbering schemes that we are planning to investigate.

7.

REFERENCES

[1] Berkeley DB product. http://www.sleepycat.com/. [2] S. Boag, D. Chamberlin, M. F. Fern´andez, D. Florescu, J. Robie, and J. Sim´eon. XQuery 1.0 An XML Query Language, W3C Working Draft, Nov 2003. http://www.w3.org/TR/xquery. [3] Z. Chen, H. V. Jagadish, L. V. S. Lakshmanan, and S. Paparizos. From tree patterns to generalized tree patterns: On efficient evaluation of XQuery. In Proceedings of International Conference on Very Large Databases (VLDB), pages 237–248, Berlin, Germany, Sept. 2003. [4] J.-K. M. Chin-Wan Chung and K. Shim. APEX : An adaptive path index for XML data. ACM SIGMOD, 15(5):121–132, June 2002. [5] B. Cooper, N. Sample, M. Franklin, and M. Shadmon. A fast index for semistructured data. In VLDB, September 2001. [6] K. Deschler and E. Rundensteiner. MASS: A multi-axis storage structure for large XML documents. In CIKM, 2003. [7] M. Fern´andez, J. Sim´eon, B. Choi, A. Marian, and G. Sur. Implementing XQuery 1.0: The Galax Experience. In Proceedings of International Conference on Very Large Databases (VLDB), pages 1077–1080, Berlin, Germany, Sept. 2003. [8] D. Florescu, C. Hillery, D. Kossmann, P. Lucas, F. Riccardi, T. Westmann, M. J. Carey, A. Sundararajan, and G. Agrawal. The BEA/XQRL streaming XQuery processor. In Proceedings of International Conference on Very Large Databases (VLDB), pages 997–1008, Berlin, Germany, Sept. 2003. [9] T. Grust. Accelerated XPath location steps. ACM SIGMOD, 15(5):795–825, June 2002. [10] W. F. Haixun Wang, Sanghyun Park and P. S. Yu. ViST : A dynamic index method for querying XML data by tree structures. In ACM SIGMOD, pages 110–121, June 2003. [11] H. Jagdish et al. Timber: A native XML database. ACM SIGMOD, page 672, June 2003. [12] C. C. Kanne and G. Moerkotte. Efficient storage of XML data. In Proceedings of IEEE International Conference on Data Engineering (ICDE), 2000. [13] X. Leroy. The Objective Caml system, release 3.07, Documentation and user’s manual. Institut National de Recherche en Informatique et en Automatique, Dec. 2003. [14] Q. Li and B. Moon. Indexing and querying xml data for regular path expressions. In VLDB, September 2001. [15] N. May, S. Helmer, and G. Moerkotte. Three cases for query decorrelation in xquery. In XSym 2003, pages 70–84, Berlin, Germany, Sept. 2003. [16] Y. Papakonstantinou, V. Borkar, M. Orgiyan, K. Stathatos, L. Suta, V. Vassalos, and P. Velikhov. XML queries and algebra in the Enosys integration platform. Data & Knowledge Engineering, 44(3), Mar. 2003.

[17] I. Tatarinov, S. D. Viglas, K. Beyer, J. Shanmugasundaram, E. Shekita, and C. Zhang. Storing and querying ordered XML using a relational database system. ACM SIGMOD, pages 204–215, June 2002. [18] W3C. XQuery 1.0 and XPath 2.0 data model. W3C, 15(5):795–825, June 2002.

Suggest Documents