TMIX: Temporal Model for Indexing XML Documents
Rasha Bin-Thalab
Neamat El-Tazi
Mohamed E. El-Sharkawi
Department of Information System Faculty of computers and Information Cairo University, Egypt
[email protected]
Department of Information System Faculty of computers and Information Cairo University, Egypt
[email protected]
Department of Information System Faculty of computers and Information Cairo University, Egypt
[email protected]
Abstract: Different models have been proposed recently for representing temporal data, tracking historical information, and retrieving temporal query results efficiently. We consider the problem of indexing temporal XML documents. In particular, we propose an indexing scheme that uses a summary structure and a matrix that captures the structural relationships as well as the time intervals inside a temporal XML document. We introduce an algorithm that efficiently processes all types of temporal queries of any depth using the proposed index. We show that the proposed index outperforms state-of-the-art indices in terms of both query processing time and support for different temporal query types.
Keywords: Temporal XML, Indexing, Query Processing, Semi-Structured Data, Summary Schema
I. INTRODUCTION
Temporal data has received considerable interest in different areas, including XML, which has given rise to temporal XML. A temporal XML document records the evolution of data as new elements are inserted and existing elements are edited or deleted. Time stamps are associated with XML elements to keep track of their history. Storing temporal data in XML has opened interesting research areas such as data modeling [1], [2], querying [3], indexing [4], [5], and, recently, keyword search [6].
A temporal XML document can be modeled as a tree. Each edge in the tree carries a time stamp interval denoting the validity period of the incoming node. Figure 1 presents a temporal XML tree representing a fragment of a company database. Employees can move from one department to another during their working life.
Our contributions can be summarized as follows:
• The temporal XML document structure is summarized into a matrix using an edge-based approach. This matrix efficiently detects whether a query is covered without accessing the physical data layer.
• A B+ tree is built on top of the matrix entries to facilitate fast probing of temporal elements.
• A new temporal query processing algorithm is presented that uses the coalesced matrix to improve the evaluation performance of temporal queries.
• A temporal update algorithm is proposed and its effect on the proposed index is analyzed.
• Extensive experiments are conducted to evaluate the performance of our proposed index, TMIX.
The rest of this paper is organized as follows. Section II presents related work on indexing temporal XML. Section III defines preliminaries and background. The proposed indexing technique, TMIX, and its construction are presented in Section IV. Query processing and evaluation algorithms are introduced in Section V. The effect of updates on the index is addressed in Section VI. Experimental evaluation is shown in Section VII, and we conclude in Section VIII.
II. RELATED WORK
Several approaches address temporal XML indexing by extending regular XML indexing methods. TempIndex [5] extends the regular summary structure of XML documents to index the transaction time dimension. Later, TempIndex was extended with a graph model that takes reference edges and temporal consistency into account [7]. However, TempIndex suffers from complexity and large space usage. Gao [4] proposed the IB-tree structure to index the valid time of nodes. Each node is represented by (node_code, Interval, depth). Two main issues arise with this structure. The first is the space overhead, since the elements are stored twice. The second is the additional cost incurred to maintain all lists during updates. Zhang [8] made use of the linear-time suffix tree algorithm [9] to construct a temporal XML tree. Another approach is TFIX [10], which is based on FIX [11]. It converts XML documents into a bisimulation graph and builds an asymmetric matrix for this graph. TFIX takes the root interval into consideration but does not consider updates. In this paper, we provide an XML index that handles the time dimension while avoiding the complexity and the overhead of index update propagation found in existing work.
III. PRELIMINARIES AND BACKGROUND
In this section, we define the temporal XML data model that we use in our work as well as the schema coding number used in our temporal model.
A. Data Model
978-1-4799-0792-2/13/$31.00 ©2013 IEEE
There are two main dimensions of time: valid time and transaction time. We consider only transaction time as the main temporal dimension in this paper. Transaction time is an interval between two points [ts, te], where ts represents the time when the data is inserted and te represents the time when the data is modified or deleted. Each element has two attributes that store the start and end times used as transaction time. We use the word "now" to indicate the current time. Handling the current time was covered in detail by Clifford et al. [12]. We use the temporal XML model defined in [7], which is based on the transaction time model.
Fig. 1: Temporal XML for Company of Employees (tree of Company, Dept, Employees, Emp, name and salary nodes, each labeled with a Dewey code and a validity interval such as [1990, now])
B. Document Order and Numbering Schemes
In temporal XML, there is no unique order among all nodes, since each node has its own validity interval. At each instant of time there may be a different order of nodes, because any node can change. We use a discrete time domain to impose a linear order on intervals. Each interval is an ordered pair of time points [ts, te]. The relation ts < te means that the interval is valid from ts up to te. For any instant of time t there is a total order for the document snapshot D(t). There are cases for any pair of nodes n1 and n2 , n1
where Es is the set of parent-child pairs for all edges in SG, IClass is the set
of intervals in D and Enode is the set of all nodes in D having the same interval. TEC maps each summary edge with an interval to the corresponding nodes representing the edge end. TMIX uses a B-tree index to facilitate query processing by making both structural and temporal information available in key retrieval. A key in the B-tree is composed of the first two attributes in TEC. The architecture of TMIX is presented in Figure 3.
Fig. 3: TMIX Framework
B. Index Construction
The index construction algorithm is presented in Algorithm 1. The algorithm parses the input XML using a SAX parser that passes over the XML document only once, in depth-first order. While the TMIX structures are constructed, a value index is also built to support value predicates by storing values separately. Each value is associated with the id of its parent node. The separate index provides direct access to values in order to prune the search space.
The complexity of the index construction algorithm is O(n × m × β), where n is the number of nodes in the XML document, m is the number of distinct labels, and β is the number of coalesced intervals. The space requirement of TMIX is composed of the matrix size and the B-tree size. The size of the matrix TM is O(k × l), where k is the number of distinct labels and l equals k minus the number of leaf nodes. The B-tree index size is O(k + n), where k here denotes the number of keys and n is the number of nodes in the XML document.

Algorithm 1 Index Construction
Input: Temporal XML Document D
Output: TB, M, TD
// Interval: contains start and end time
// Node: stores an element; label and Interval
// M: stores the temporal matrix TM
// PathStack: a stack that stores a sequence of Node from the root up to a leaf node
// KeySig: an array that holds an id for parent-child nodes and an Interval structure
// EqKey: a dictionary that contains Key signatures and Node values
// TB: a B-tree file
// TD: a dictionary of the labels with their codes
M ← ∅
EqKey ← ∅
PathStack ← ∅
TD ← ∅
while not end of D do
    if start event then
        create Node // Node structure stores label and interval
        Node.label ← element.name
        Node.Interval.start ← element.start
        Node.Interval.end ← element.end
        PathStack.push(Node)
    else if closed event then
        Node ← PathStack.pop
        if PathStack ≠ ∅ then
            NodeP ← PathStack.top
            parent ← MEntry(TD, NodeP.label)
        else
            parent ← 0 // root has no parent
        end if
        child ← MEntry(TD, Node.label)
        if M[parent, child] = ∅ then
            M[parent, child] ← Node.Interval
        else
            AppCovInt(M, parent, child, Node)
        end if
        KeySig.pc ← Hash(parent, child)
        KeySig.Interval ← Node.Interval
        if EqKey contains KeySig then
            append Node to EqKey[KeySig]
        else
            create new KeySig in EqKey
            add Node to EqKey[KeySig]
        end if
        if PathStack is ∅ then
            insert EqKey into TB
        end if
    end if
end while

Procedure AppCovInt(M, p, c, N)
begin
    if N.Interval ⊄ M[p, c] then
        coalesce N.Interval to M[p, c]
    end if
end Procedure

Function MEntry(TD, L)
begin
    search for L in TD
    if not exist then
        add L to TD
    end if
    return index of L in TD
end Function

V. QUERY PROCESSING
This section presents the algorithms used to evaluate queries with our proposed index, TMIX. The main idea is to take advantage of indexing edges (e) together with their time intervals (I). An edge is encoded by hashing the codes of its start and end labels, as shown in Figure 2, and is combined with a time interval to form the key pair ⟨e, I⟩, as shown in Table III. We use TXPath [7], a temporal extension of XPath 2.0, as the query language for temporal queries.
Algorithm 2 illustrates query processing using the TMIX index. The input TXPath query is decomposed into a sequence of expressions. We distinguish between four kinds of expressions: parent-child, ancestor-descendant, time condition, and value predicate. When a parent-child expression "/" is encountered between two consecutive labels, a hash code is computed from the label codes to obtain the first part of the search key. Otherwise, if an ancestor-descendant relationship "//" is encountered, the direct parents of the given child are computed by calling the SearchAD algorithm. When a temporal predicate (asking for the start or end of an interval) is encountered, the given interval is attached to the computed key. If the interval is covered by the corresponding TM entry, the key is used to prune the search and to retrieve a list of pairs that satisfy the condition. If no temporal predicate is present, only the first part of the key is used in the retrieval process. A value predicate is evaluated using the value index, which retrieves a list of ids of the nodes that match the given value. The intermediate lists resulting from each expression are stored in MedResult. At the end, a structural join is performed on the intermediate results, based on their Dewey encoding, to obtain the answer to the query.
Algorithm 3 presents the SearchAD function, which is based on the temporal matrix TM. The algorithm uses a dynamic stack to search for the direct parents of a specific node in the schema. In the worst case, the complexity of SearchAD is O(l²), where l is the number of distinct labels. Query processing complexity is O(m + α × l²), where m is the query length, α is the number of ancestor-descendant relationships in the query, and l is the number of distinct labels. Since m and α are constants, the query processing complexity is O(l²), which depends entirely on the matrix size.
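The interplay between the coalesced matrix, the ⟨e, I⟩ keys, and the ancestor-descendant search can be illustrated with a small sketch. It is not the authors' implementation: an in-memory dictionary stands in for the B-tree file, the time condition is simplified to a single time point, `NOW` is a numeric placeholder for "now", `search_ad` is a simplified variant that keeps searching instead of returning at the first direct hit, and the function names and the six sample nodes are hypothetical.

```python
from collections import defaultdict

NOW = 9999  # numeric stand-in for the open end time "now" (assumption)


def coalesce(intervals, new):
    """Merge `new` into a sorted list of disjoint closed intervals (discrete time)."""
    start, end = new
    kept = []
    for a, b in intervals:
        if b < start - 1 or a > end + 1:   # neither overlapping nor adjacent
            kept.append((a, b))
        else:                               # overlap or adjacency: absorb it
            start, end = min(start, a), max(end, b)
    return sorted(kept + [(start, end)])


def build_index(nodes):
    """nodes: (dewey_id, label, start, end) tuples in document order."""
    labels = {}                  # label dictionary: label -> code
    matrix = defaultdict(list)   # (parent_code, child_code) -> coalesced intervals
    keys = defaultdict(list)     # (edge, interval) -> node ids; stands in for the B-tree
    label_of = {}
    for dewey, label, start, end in nodes:
        child = labels.setdefault(label, len(labels) + 1)
        parent_id = dewey.rpartition(".")[0]
        parent = labels[label_of[parent_id]] if parent_id else 0  # root has no parent
        label_of[dewey] = label
        edge = (parent, child)
        matrix[edge] = coalesce(matrix[edge], (start, end))
        keys[(edge, (start, end))].append(dewey)
    return labels, matrix, keys


def lookup(labels, matrix, keys, parent_label, child_label, t=None):
    """Evaluate parent/child, optionally restricted to nodes valid at time t."""
    edge = (labels.get(parent_label, -1), labels.get(child_label, -1))
    if edge not in matrix:
        return []
    if t is not None and not any(a <= t <= b for a, b in matrix[edge]):
        return []                # rejected by the matrix alone, no key access
    return sorted(d for (e, (a, b)), ids in keys.items()
                  if e == edge and (t is None or a <= t <= b) for d in ids)


def search_ad(matrix, ancestor, child):
    """Codes of direct parents of `child` at or below `ancestor` in the summary."""
    parents, stack, seen = [], [ancestor], set()
    while stack:
        p = stack.pop()
        if p in seen:
            continue
        seen.add(p)
        if (p, child) in matrix:
            parents.append(p)
        stack.extend(c for (q, c) in matrix if q == p)
    return parents


nodes = [("1", "Company", 1990, NOW), ("1.1", "Dept", 1990, NOW),
         ("1.1.1", "Emp", 1991, 1994), ("1.1.2", "Emp", 1996, 2002),
         ("1.2", "Dept", 1995, NOW), ("1.2.1", "Emp", 2000, NOW)]
labels, matrix, keys = build_index(nodes)
print(matrix[(labels["Dept"], labels["Emp"])])
print(lookup(labels, matrix, keys, "Dept", "Emp", 1995))
print(lookup(labels, matrix, keys, "Dept", "Emp", 2001))
print(search_ad(matrix, labels["Company"], labels["Emp"]))
```

In this toy data the Dept-Emp entry has a gap at 1995, so a query for that year is rejected from the matrix without touching the key store, which is the pruning effect the section describes.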
I/O complexity is determined by the number of nodes retrieved from the B-tree, which is O(log_d n), where d is the page size used in the B-tree file and n is the number of elements in the temporal XML document.
There are several types of temporal queries [16], such as projection, snapshot, slicing, join, and aggregate, all of which are supported by TMIX. Consider the following example of an aggregation and join query, which asks for the names of the employees who were in the Sales department when "Mary" joined the department for the first time. Using TXPath, the query is expressed as follows:
    let $m = min(//dept[name="Sales"]/emp[name="Mary"]/start)
    return //dept[name="Sales"]/emp[$m ≥ start and $m ≤ end]/name
The minimum aggregation function is applied to the list of start times retrieved by the let statement. The returned start time (2000, as in Figure 1) is used to evaluate the time condition. The value predicate dept[name="Sales"] retrieves department node 1.2, and the time predicate searches for employees in the Sales department whose lifespan intervals include the year 2000. This retrieves employee node 1.2.2.2. A structural join is performed afterwards to retrieve name node 1.2.2.2.1.

Algorithm 2 Query Processing
Input: Temporal query: query, Dictionary TD, Temporal Matrix: TM, B-tree File: TB, VF
Output: Result: set of retrieved nodes
j ← 0 // counter for intermediate results
Query ← Parse query into expressions
repeat
    Lp ← 0
    Get parent and child encoding from TD
    if parent-child relationship is encountered then
        if TM(parent, child) ≠ ∅ then
            Lp ← parent
        end if
        Query.next
    end if
    if Anc_Desc relationship is encountered then
        Lp ← SearchAD(parent, child)
        if Lp.length == 0 then
            print: Error no relationship is detected
        end if
        Query.next
    end if
    if value_predicate then
        MedResult_j ← retrieve(value from VF file)
        Query.next
    end if
    if time_cond then
        for parent in Lp do
            if TM[parent, child] includes time_condition then
                key1 ← Hash(parent, child)
                time_interval ← extract interval from time_condition
                MedResult_j ← retrieve(key1, time_interval) from TB file
                Query.next
            else
                exit no available intervals
            end if
        end for
    else
        for parent in Lp do
            key1 ← Hash(parent, child)
            MedResult_j ← retrieve(key1)
        end for
        Query.next
    end if
    j ← j + 1
until end of Query
Result ← ⋂_{i=0}^{j} MedResult_i

Algorithm 3 SearchAD
Input: TM, parent, child
Output: new list of parents List_parent
ColumnStack ← ∅ // stack stores column numbers
List_parent ← ∅ // list of detected parents
if TM[parent, child] ≠ 0 then
    Append parent to List_parent
    return List_parent
end if
while true do
    for i = 0 to TM.column.length do
        if TM[parent, i] ≠ 0 then
            ColumnStack.push(i)
        end if
    end for
    if ColumnStack ≠ ∅ then
        parent ← ColumnStack.pop
        while parent > TM.rows.length do
            if ColumnStack ≠ ∅ then
                parent ← ColumnStack.pop
            end if
        end while
        if TM[parent, child] ≠ 0 then
            Append parent to List_parent
        end if
    else
        if List_parent.length > 0 then
            return List_parent
        else
            print "Error: No parent-child relationship exist"
            exit
        end if
    end if
end while

VI. TEMPORAL UPDATES
There are four basic temporal update operations: insertion, deletion, modification, and subtree relocation. Since modification and subtree relocation are sequences of insertions and deletions, we describe only the effects of insertion and deletion on our index and how those effects are propagated.
The insertion of a new node in a temporal XML document requires determining the location of the new node. As in [17], a path expression is used to specify that location. The label of the parent of the new node is also specified, paired with the label of the new node, and both are coded using the label signature (see Table I). Furthermore, two items must be provided with the new node: a node id and an interval. The node id is computed using the dynamic Dewey numbering scheme [13] based on the current node entries. Since we use the transaction time model, the start time of the new node's interval is set to the current time (tc) of the insertion statement and its end time is set to "now". Algorithm 4 presents the insertion algorithm, where the nodes resulting from evaluating the path expression are given as inputs. During insertion, there is no need to reconstruct the index even if the new node changes the structure; only a new parent-child entry is added to the matrix. In addition, no change is made to the temporal matrix unless the new time entry is not covered by an existing interval.

Algorithm 4 Insertion Pseudo Code
Input: matrix: TM, current time: tc, node: n, parent id: p, child id: c, current nodes: CN, index file: TF, VF
Output: updated index structures TM, TF, VF
create interval I = [tc, now]
if ⟨p, c⟩ ∈ TM then
    coalesce I in TM(p, c)
else
    create a new entry in TM
    TM(p, c) ← I
end if
for each current node in CN do
    assign the new node a new Dewey code id [18]
    TF.insert(⟨p, c, I⟩, id)
    VF.insert(value, id)
end for

Physical deletion is not allowed in temporal XML. A node is logically deleted by changing its end time from "now" to the deletion time tc. A path expression is used to determine the nodes to be logically deleted. Algorithm 5 presents the deletion algorithm. The TM matrix is updated by updating only the intervals related to the entries of the deleted nodes. The B-tree index must also be updated based on the node entry key.

Algorithm 5 Deletion Pseudo Code
Input: matrix: TM, current time: tc, node: n, parent id: p, child id: c, current nodes: CN, index file: TF
Output: updated index structures TM, TF
for each node n in CN do
    n.I.end = tc − 1
    update n.I in TM(p, c)
    TF.modify(⟨p, c, I⟩, n.id)
end for

VII. EXPERIMENTAL EVALUATION
In this section we evaluate the performance of TMIX. All experiments were implemented in C# with a Berkeley DB implementation of the B+ tree. We compare TMIX against TSuffix [8] and TFIX [10]. All tests were conducted on a PC with a Pentium dual-core T4400 CPU @ 2.2 GHz and 6 GB of RAM running Windows 7.
A. Storage Utilization
To evaluate the performance of TMIX, we use both synthetic and real data sets. We use the Employees data set, which was generated by Wang [2]. The Employees data set is composed of multiple document versions; the text values of a single element are modified over 17 years. In addition, we use Xbench [19] TCSD (Text-Centric Single Document) and Xmark [20] with scale factor 1. Both Xbench and Xmark were converted to temporal data sets by attaching validity attributes to each element with a normal increase of intervals over 12 years.
Storage utilization for the three indexes is presented graphically in Figures 4(a), 4(b) and 4(c). It is clear that TSuffix has the smallest index size, as it is built on paths only. Although TMIX indexes both structure and values, it is close in size to TSuffix, especially for the Xmark and Xbench data sets. On the other hand, TFIX is the least efficient index in terms of storage utilization, since its size depends on the total number of elements in the XML document.
Fig. 4: Indexes Sizes (Documents size in MB). Panels: (a) Employees Index, (b) Xmark Index, (c) Xbench Index. Vertical axis: index size in gigabytes; series: TMIX, TSuffix, TFIX.
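The conversion of a non-temporal benchmark document into a temporal one, mentioned in the description of the data sets, could be sketched as follows. This is only an illustration under assumptions: the paper does not give the generation procedure, so the nesting rule (each child interval lies inside its parent's interval), the 2001 to 2012 window, the fixed random seed, and the helper name `attach_validity` are all hypothetical. The attribute names `start` and `end` follow the ones used in Algorithm 1 and in the TXPath example.

```python
import random
import xml.etree.ElementTree as ET


def attach_validity(elem, lo, hi, last_year, rng):
    """Stamp `elem` with [lo, hi] and give each child a random sub-interval."""
    elem.set("start", str(lo))
    # Intervals that reach the last year of the window are treated as current.
    elem.set("end", "now" if hi == last_year else str(hi))
    for child in elem:
        c_lo = rng.randint(lo, hi)      # child starts within the parent's interval
        c_hi = rng.randint(c_lo, hi)    # and ends no later than the parent
        attach_validity(child, c_lo, c_hi, last_year, rng)


# Tiny hypothetical fragment in the spirit of an Xmark document.
doc = ET.fromstring(
    "<site><regions><item><shipping>x</shipping></item><item/></regions></site>"
)
attach_validity(doc, 2001, 2012, 2012, random.Random(0))  # a 12-year window
print(ET.tostring(doc, encoding="unicode"))
```

Keeping child intervals inside the parent interval is one simple way to obtain temporally consistent test data; other distributions of interval lengths would fit the same skeleton.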
B. Query Performance
To evaluate our index, we use the temporal queries classified by Bertino [21], which are used to benchmark the efficiency of temporal indexes. The temporal queries for the different data sets are listed in Table IV.

TABLE IV: Query Benchmarks
Query Type            Q#  Data Set   Expression
Temporal Slice Range  Q1  Employees  //employee/firstname[1992-1993]
Temporal Slice Range  Q2  Xmark      //site/regions//item/shipping[2002-2003]
Temporal Slice Range  Q3  Xbench     //authors[2002-2005]
Key only              Q4  Employees  //company//employee[title = "Engineer"]
Key only              Q5  Xmark      //item[location = "Georgia"]
Key only              Q6  Xbench     //item//author[name_of_city ="Dubai"]
Key and Range         Q7  Employees  //employee[empno="10025"][salary[1990-2000]]
Key and Range         Q8  Xmark      //Africa/item[shipping ="internationally"][2008-2012]
Key and Range         Q9  Xbench     //item[subject = "Biographies"][2004-2012]

We first investigate the slice range query type using the three queries Q1, Q2 and Q3. The evaluation time for these queries is presented in Figures 5(a), 5(b) and 5(c), respectively. The figures show that TMIX has the best performance, because both the TSuffix and TFIX indices traverse irrelevant physical nodes to satisfy the time predicate, whereas TMIX traverses only the relevant nodes according to the generated key, which is based on the query time predicate and the query structure.
The second query type asks for historical key values. Since value predicates are not considered in the TSuffix index, we limit our comparison to TFIX for queries Q4, Q5 and Q6. Figures 5(d), 5(e) and 5(f) show the evaluation time for these three queries. TMIX outperforms TFIX on the Xmark data set. TFIX integrates the values and the structure of the XML document, while TMIX creates a separate index for values. As a result, TMIX depends on the nodes retrieved by the value predicates. The number of value nodes in queries Q4 and Q6 is much greater than that in Q5 for Xmark. This explains the different performance of TMIX on the three data sets.
Finally, Figures 5(g), 5(h) and 5(i) show the evaluation time for the third type of temporal queries, time and key value ranges (Q7, Q8 and Q9, respectively). Again, TMIX is compared only with TFIX; the TSuffix index is excluded because it does not handle value predicates. TMIX outperforms TFIX on all data sets. TMIX can answer queries with time predicates efficiently, since keys in TMIX are grouped according to time intervals. TFIX, on the other hand, continues to perform poorly when evaluating time predicates, since nodes are retrieved according to their structure only. All irrelevant retrieved nodes have to be visited and matched against the query time predicate, which considerably increases the query evaluation time of TFIX.

Fig. 5: Key Queries Evaluation Time and Update. Panels: (a) Q1 - Employees, (b) Q2 - XMark, (c) Q3 - XBench, (d) Q4 - Employees, (e) Q5 - XMark, (f) Q6 - XBench, (g) Q7 - Employees, (h) Q8 - XMark, (i) Q9 - XBench, (j) U3: Update Complexities, (k) U4: Deletion Operation, (l) U5: Insertion Operation.

C. Performance of Update Operations
For update operations, we limited our experiments to the Employees data set in order to perform meaningful queries. The
update queries are all listed in Table V. The first three update queries, U1, U2 and U3, vary in complexity. Figure 5(j) shows the response time for the three queries. We can see that U3 outperforms the first two updates by more than a factor of 2. The reason is that the time predicate prunes the search space of updated nodes, which in turn reduces processing time significantly. U1 is slightly more efficient than U2, because U2 needs to join the lists that result from its value predicates.
TMIX is compared against TSuffix in Figures 5(k) and 5(l) using update queries. Processing of the deletion query U4 is presented in Figure 5(k). TSuffix processing time increases as the number of elements increases. TMIX is much faster than TSuffix, since many nodes are pruned from the beginning using the time predicate. The insertion query U5 is presented in Figure 5(l); here TMIX also outperforms TSuffix.

TABLE V: Update Queries
U1
    Update: Modify salary of employees whose fname is Deborah to 5000$
    DDL expression: For $m in employees[name = "Deborah"]/salary Insert $m Values 5000
U2
    Update: Modify salary for employees whose fname is Deborah and title is engineer to 5000$
    DDL expression: For $m in employees[name = "Deborah"][title= "Engineer"]/salary Insert $m Values 5000
U3
    Update: Modify salary for employees whose fname is Deborah and title is engineer and start time in the range [1990-2000]
    DDL expression: For $m in employees[name = "Deborah"][title= "Engineer"]/salary[Ts ≥ 1990 and Ts ≤ 2000] Insert $m Values 5000
U4
    Update: Delete all salaries for all employees whose start time is 1990 or later
    DDL expression: For $m in employees/salary[Ts ≥ 1990] Delete $m
U5
    Update: Insert a new child called job tax for all employees who started in the company within the range [1988-1995]
    DDL expression: For $m in employees[Ts ≥ 1988 && Ts ≤ 1995] Insert $m/job_tax

VIII. CONCLUSION
In this paper we presented TMIX, an indexing technique for temporal XML documents. The index is based on summarizing the document structure and the time intervals of elements in a matrix. A B+ tree is built on the TMIX matrix for efficient retrieval of physical nodes. We compared the performance of TMIX against TSuffix and TFIX. Experiments showed that TMIX has the best performance for all temporal query types. TMIX derives its strength from the ability to answer coverage queries using the index only, without the need to reach the physical data layer. TMIX builds a minimal summary structure for any temporal document, which allows it to answer temporal twig queries of any depth.
REFERENCES
[1] T. Amagasa, M. Yoshikawa, and S. Uemura, “A data model for temporal xml documents,” in Proceedings of the 11th International Conference on Database and Expert Systems Applications, ser. DEXA ’00. London, UK: Springer-Verlag, 2000, pp. 334–344. [Online]. Available: http://dl.acm.org/citation.cfm?id=648313.755552
[2] F. Wang and C. Zaniolo, “An xml-based approach to publishing and querying the history of databases,” World Wide Web, vol. 8, pp. 233–259, September 2005. [Online]. Available: http://dl.acm.org/citation.cfm?id=1101070.1101086
[3] K. Nørvåg, “Algorithms for temporal query operators in xml databases,” in Proceedings of the Workshops XMLDM, MDDE, and YRWS on XML-Based Data Management and Multimedia Engineering-Revised Papers, ser. EDBT ’02. London, UK: Springer-Verlag, 2002, pp. 169–183. [Online]. Available: http://dl.acm.org/citation.cfm?id=646146.678899
[4] G. Dandan, W. Xinjun, and D. Li, “Indexing temporal xml using interval-tree index,” in Proceedings of the 2008 International Conference on Computer Science and Software Engineering - Volume 04, ser. CSSE ’08. Washington, DC, USA: IEEE Computer Society, 2008, pp. 689–691. [Online]. Available: http://dx.doi.org/10.1109/CSSE.2008.1223
[5] A. O. Mendelzon, F. Rizzolo, and A. Vaisman, “Indexing temporal xml documents,” in Proceedings of the Thirtieth international conference on Very large data bases - Volume 30, ser. VLDB ’04. VLDB Endowment, 2004, pp. 216–227. [Online]. Available: http://dl.acm.org/citation.cfm?id=1316689.1316710
[6] E. Manica, C. F. Dorneles, and R. Galante, “Supporting temporal queries on xml keyword search engines,” Journal Information and Data Management, vol. 1, no. 3, pp. 471–486, October 2010.
[7] F. Rizzolo and A. A. Vaisman, “Temporal xml: modeling, indexing, and query processing,” The VLDB Journal, vol. 17, pp. 1179–1212, August 2008. [Online]. Available: http://dx.doi.org/10.1007/s00778-007-0058-x
[8] F. Zhang, X. Wang, and S. Ma, “Temporal xml indexing based on suffix tree,” in Proceedings of the 2009 Seventh ACIS International Conference on Software Engineering Research, Management and Applications, ser. SERA ’09. Washington, DC, USA: IEEE Computer Society, 2009, pp. 140–144. [Online]. Available: http://dx.doi.org/10.1109/SERA.2009.20
[9] E. Ukkonen, “On-line construction of suffix trees,” Algorithmica, vol. 14, no. 3, pp. 249–260, 1995.
[10] T. Zheng, X. Wang, and Y. Zhou, “Indexing temporal xml using fix,” in Proceedings of the International Conference on Web Information Systems and Mining, ser. WISM ’09. Berlin, Heidelberg: Springer-Verlag, 2009, pp. 224–231. [Online]. Available: http://dx.doi.org/10.1007/978-3-642-05250-7_24
[11] N. Zhang, M. T. Özsu, I. F. Ilyas, and A. Aboulnaga, “Fix: feature-based indexing technique for xml documents,” in Proceedings of the 32nd international conference on Very large data bases, ser. VLDB ’06. VLDB Endowment, 2006, pp. 259–270. [Online]. Available: http://dl.acm.org/citation.cfm?id=1182635.1164151
[12] J. Clifford, C. Dyreson, T. Isakowitz, C. S. Jensen, and R. T. Snodgrass, “On the semantics of "now" in databases,” ACM Trans. Database Syst., vol. 22, pp. 171–214, June 1997. [Online]. Available: http://doi.acm.org/10.1145/249978.249980
[13] E. Cohen, H. Kaplan, and T. Milo, “Labeling dynamic xml trees,” in Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, ser. PODS ’02. New York, NY, USA: ACM, 2002, pp. 271–281. [Online]. Available: http://doi.acm.org/10.1145/543613.543648
[14] M. R. Henzinger, T. A. Henzinger, and P. W. Kopke, “Computing simulations on finite and infinite graphs,” in Proceedings of the 36th Annual Symposium on Foundations of Computer Science, ser. FOCS ’95. Washington, DC, USA: IEEE Computer Society, 1995, pp. 453–462. [Online]. Available: http://dl.acm.org/citation.cfm?id=795662.796255
[15] M. H. Böhlen, R. T. Snodgrass, and M. D. Soo, “Coalescing in temporal databases,” in Proceedings of the 22th International Conference on Very Large Data Bases, ser. VLDB ’96. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1996, pp. 180–191. [Online]. Available: http://dl.acm.org/citation.cfm?id=645922.673474
[16] R. T. Snodgrass, The TSQL2 Temporal Query Language. Norwell, MA, USA: Kluwer Academic Publishers, 1995.
[17] I. Tatarinov, Z. G. Ives, A. Y. Halevy, and D. S. Weld, “Updating xml,” in SIGMOD Conference, S. Mehrotra and T. K. Sellis, Eds. ACM, 2001, pp. 413–424. [Online]. Available: http://dblp.uni-trier.de/db/conf/sigmod/sigmod2001.html#TatarinovIHW01
[18] L. Xu, T. W. Ling, H. Wu, and Z. Bao, “Dde: from dewey to a fully dynamic xml labeling scheme,” in Proceedings of the 35th SIGMOD international conference on Management of data, ser. SIGMOD ’09. New York, NY, USA: ACM, 2009, pp. 719–730. [Online]. Available: http://doi.acm.org/10.1145/1559845.1559921
[19] B. B. Yao. (2003, Sep.) Xbench - a family of benchmarks for xml dbmss. [Online]. Available: https://cs.uwaterloo.ca/~tozsu/ddbms/projects/xbench/
[20] A. Schmidt. (2009, Sep.) Xmark-an xml benchmark project. [Online]. Available: http://www.xml-benchmark.org/
[21] E. Bertino, Indexing techniques for advanced database systems, ser. Kluwer international series on advances in database systems. Kluwer Academic Publishers, 1997. [Online]. Available: http://books.google.com.eg/books?id=s9JQAAAAMAAJ