Introduction
Background
OTwig Algorithm
Experimental Analysis
Conclusions
OTwig: An Optimised Twig Pattern Matching Approach for XML Databases Jun Liu Interoperable Systems Group Dublin City University Ireland
January 27, 2010
ISG
SOFSEM 2010
1. Introduction
2. Background: Problem Description, Existing Works, Contribution
3. OTwig Algorithm: Positional Encoding, Properties and Rules, OTwig By Example, Merging Process
4. Experimental Analysis: Queries, Pruning Rate, Evaluation Analysis
5. Conclusions
Introduction
XML has been used increasingly as a data exchange format. Querying XML is inefficient due to its complex tree-based structure. An effective mechanism is required to efficiently determine the structural relationship between nodes in an XML tree.
XML Model
- XML is represented as an ordered and labeled tree structure
- each element within an XML document represents a tree node in the corresponding XML tree
- an edge between nodes represents a parent-child relationship
- nodes with the same label (tag-name) have the same type

Twig Pattern Query
- a key component of XPath and XQuery
- each node of a twig pattern corresponds to a set of nodes in an XML tree
- each edge represents either a parent-child or an ancestor-descendant relationship
XML Model (figure): an example XML tree with nodes a1, a2, a3, b1, b2, b3, b4, c1, c2 and c3, shown alongside a twig pattern query over the node types A, B and C.
What is the difficulty?
- Efficiently finding twig patterns in XML trees is essential to XPath and XQuery evaluation.
- A join algorithm must determine the structural relationship between nodes in a twig pattern query. This is slow!
- A positional encoding scheme is used to facilitate the determination of structural relationships:
  - Pre and Post encoding scheme
  - Start and End encoding scheme
- We need an efficient approach that works on such encoding schemes to find twig pattern matches efficiently.
Existing Works
- Containment Join
  - decompose a twig query into a set of steps
  - apply an algorithm similar to the traditional merge-join algorithm
  - e.g., MPMGJN
- Path Join
  - decompose a twig query into a set of binary paths, then merge each binary path, e.g., StackTree
  - decompose a twig query into a set of root-to-leaf paths, then merge the paths together, e.g., TwigStack
- Twig Join
  - evaluate the twig query as a whole
  - e.g., TwigList, Twig2Stack, TwigStackList, TJFast ...
Contribution
We present a bottom-up twig join algorithm, OTwig, based on the Start and End positional encoding scheme. OTwig:
- extends the TwigList algorithm with further performance gains
- processes nodes as they reside in their index streams, rather than creating an additional working stack
- applies pruning rules to reduce the total number of nodes to be processed and stored in memory
Start and End Encoding Scheme (figure): the example XML tree with every node labelled by its start and end position, drawn across Level 0 to Level 3.
- Level 0: a1 (1, 20)
- Level 1: a2 (2, 5), a3 (6, 15), c2 (16, 19)
- Level 2: b1 (3, 4), b2 (7, 8), b3 (9, 12), c1 (13, 14), c3 (17, 18)
- Level 3: b4 (10, 11)
For a node u in an XML tree T:
- reg(u) is the region of u in T, consisting of u.start and u.end
- dep(u) is the depth (level) of u in T
- edge(u,v) indicates the relationship between u and v: parent-child or ancestor-descendant
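As a rough illustration of how start, end and level values can be assigned, the sketch below numbers the example tree with a single depth-first traversal: a counter is incremented once when a node is entered and once when it is left. The child lists in TREE are an assumption, reconstructed from the numbers shown on the slides, and the names encode, TREE and LABELS are hypothetical.

```python
from typing import Dict, List, Tuple

# Child lists for the example tree. The shape is an assumption,
# reconstructed from the start/end numbers shown on the slides.
TREE: Dict[str, List[str]] = {
    "a1": ["a2", "a3", "c2"],
    "a2": ["b1"],
    "a3": ["b2", "b3", "c1"],
    "b3": ["b4"],
    "c2": ["c3"],
}


def encode(tree: Dict[str, List[str]], root: str) -> Dict[str, Tuple[int, int, int]]:
    """Return {node: (start, end, level)} using one depth-first traversal."""
    labels: Dict[str, Tuple[int, int, int]] = {}
    counter = 0

    def visit(node: str, level: int) -> None:
        nonlocal counter
        counter += 1              # position on entering the node
        start = counter
        for child in tree.get(node, []):
            visit(child, level + 1)
        counter += 1              # position on leaving the node
        labels[node] = (start, counter, level)

    visit(root, 0)
    return labels


LABELS = encode(TREE, "a1")
```

With this numbering, LABELS["a3"] is (6, 15, 1) and LABELS["b4"] is (10, 11, 3), which agrees with the labels used in the example slides.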
Properties
- Region Containment: Given two nodes u and v, the region of u is contained in the region of v iff u.start > v.start and u.end < v.end, denoted by reg(u) ⊂ reg(v).
- Ancestor: Given two nodes u ∈ T and v ∈ T, u is an ancestor of v iff reg(v) ⊂ reg(u).
- Parent: Given two nodes u ∈ T and v ∈ T, u is the parent of v iff reg(v) ⊂ reg(u) and dep(u) + 1 = dep(v).
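The three properties translate directly into constant-time predicates. The following is a minimal sketch; the Node type and the function names are hypothetical, and the sample regions are the ones used for the example tree.

```python
from typing import NamedTuple


class Node(NamedTuple):
    name: str
    start: int
    end: int
    level: int


def contained_in(u: Node, v: Node) -> bool:
    """reg(u) is contained in reg(v)."""
    return u.start > v.start and u.end < v.end


def is_ancestor(u: Node, v: Node) -> bool:
    """u is an ancestor of v iff reg(v) is contained in reg(u)."""
    return contained_in(v, u)


def is_parent(u: Node, v: Node) -> bool:
    """u is the parent of v iff u is an ancestor of v exactly one level up."""
    return contained_in(v, u) and u.level + 1 == v.level


a3 = Node("a3", 6, 15, 1)
b3 = Node("b3", 9, 12, 2)
b4 = Node("b4", 10, 11, 3)
c2 = Node("c2", 16, 19, 1)
```

Here a3 is an ancestor of b4 but not its parent, b3 is the parent of b4, and a3 is unrelated to c2 because their regions do not nest.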
Pruning Rules
Pruning By Ancestor and Descendant Before a node can be added into the result list, we make sure that it has a valid ancestor and all valid descendants.
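The effect of this rule can be illustrated with a brute-force filter. This is not the OTwig procedure itself, which works on the index streams; it is only a sketch of which nodes survive. It assumes a twig query in which A has one B descendant and one C descendant, and the names Region, inside and prune are hypothetical. Regions are (start, end, level) triples taken from the example tree.

```python
from typing import List, Tuple

Region = Tuple[int, int, int]  # (start, end, level)


def inside(u: Region, v: Region) -> bool:
    """True when the region of u is contained in the region of v."""
    return u[0] > v[0] and u[1] < v[1]


def prune(a_nodes: List[Region], b_nodes: List[Region], c_nodes: List[Region]):
    """Keep A nodes that have both a B and a C descendant, then keep only
    B and C nodes that lie below one of the surviving A nodes."""
    kept_a = [a for a in a_nodes
              if any(inside(b, a) for b in b_nodes)
              and any(inside(c, a) for c in c_nodes)]
    kept_b = [b for b in b_nodes if any(inside(b, a) for a in kept_a)]
    kept_c = [c for c in c_nodes if any(inside(c, a) for a in kept_a)]
    return kept_a, kept_b, kept_c


A = [(2, 5, 1), (6, 15, 1), (1, 20, 0)]               # a2, a3, a1
B = [(3, 4, 2), (7, 8, 2), (10, 11, 3), (9, 12, 2)]   # b1, b2, b4, b3
C = [(13, 14, 2), (17, 18, 2), (16, 19, 1)]           # c1, c3, c2
KEPT_A, KEPT_B, KEPT_C = prune(A, B, C)
```

On the example data, a2 is dropped because it has no C descendant, while every B and C node survives because each lies below a1.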
OTwig By Example
XML Document Parsing
- each type of node is stored in a separate file
- nodes are sorted by end value in ascending order
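A small sketch of this parsing step, using in-memory lists in place of the separate files. It assumes that the node type is the upper-cased first letter of the node name (a1 has type A), which is a convention of this example only; NODES, build_streams and STREAMS are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Region = Tuple[int, int, int]  # (start, end, level)

# Encoded nodes of the example tree.
NODES: Dict[str, Region] = {
    "a1": (1, 20, 0), "a2": (2, 5, 1), "a3": (6, 15, 1),
    "b1": (3, 4, 2), "b2": (7, 8, 2), "b3": (9, 12, 2), "b4": (10, 11, 3),
    "c1": (13, 14, 2), "c2": (16, 19, 1), "c3": (17, 18, 2),
}


def build_streams(nodes: Dict[str, Region]) -> Dict[str, List[Tuple[str, Region]]]:
    """Group nodes by type and sort each group by end value, ascending."""
    streams: Dict[str, List[Tuple[str, Region]]] = defaultdict(list)
    for name, region in nodes.items():
        node_type = name[0].upper()  # assumption: type is the first letter
        streams[node_type].append((name, region))
    return {t: sorted(items, key=lambda item: item[1][1])
            for t, items in streams.items()}


STREAMS = build_streams(NODES)
```

Sorting by end value places descendants before their ancestors in each stream, so stream A reads a2, a3, a1 and stream B reads b1, b2, b4, b3.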
Index streams for the example tree (figure), each node labelled [start, end, level] and listed in ascending end order:
A: a2 [2,5,1], a3 [6,15,1], a1 [1,20,0]
B: b1 [3,4,2], b2 [7,8,2], b4 [10,11,3], b3 [9,12,2]
C: c1 [13,14,2], c3 [17,18,2], c2 [16,19,1]
Searching Pattern Matches (animation): the streams A, B and C are consumed in ascending end order and qualifying nodes are appended to the result lists A[ ... ], B[ ... ] and C[ ... ]. The frames show b1 processed first, then a2 (discarded without entering A[ ... ]), followed by b2, b4, b3, c1, a3, c3, c2 and finally a1.
Merging Process
A[ a3 a1 ]  B[ b1 b2 b4 b3 ]  C[ c1 c3 c2 ]
⟨a1, b1, c1⟩  ⟨a1, b1, c3⟩  ⟨a1, b1, c2⟩  ⟨a3, b2, c1⟩
⟨a1, b2, c1⟩  ⟨a1, b2, c3⟩  ⟨a1, b2, c2⟩  ⟨a3, b4, c1⟩
⟨a1, b3, c1⟩  ⟨a1, b3, c3⟩  ⟨a1, b3, c2⟩  ⟨a3, b3, c1⟩
⟨a1, b4, c1⟩  ⟨a1, b4, c3⟩  ⟨a1, b4, c2⟩
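The tuples above can be reproduced with a naive enumeration over the three result lists. This sketch is not the merging procedure used by OTwig or TwigList; it simply checks every combination. It assumes that both query edges are ancestor-descendant, an assumption that is consistent with the fifteen tuples listed; the names inside, matches and MATCHES are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (start, end)

# Result lists after searching, with the regions of the example tree.
A: Dict[str, Span] = {"a3": (6, 15), "a1": (1, 20)}
B: Dict[str, Span] = {"b1": (3, 4), "b2": (7, 8), "b4": (10, 11), "b3": (9, 12)}
C: Dict[str, Span] = {"c1": (13, 14), "c3": (17, 18), "c2": (16, 19)}


def inside(u: Span, v: Span) -> bool:
    """True when the region u is contained in the region v."""
    return u[0] > v[0] and u[1] < v[1]


def matches(a_list: Dict[str, Span], b_list: Dict[str, Span],
            c_list: Dict[str, Span]) -> List[Tuple[str, str, str]]:
    """Enumerate (a, b, c) where b and c are both descendants of a."""
    return [(a, b, c)
            for a, b, c in product(a_list, b_list, c_list)
            if inside(b_list[b], a_list[a]) and inside(c_list[c], a_list[a])]


MATCHES = matches(A, B, C)
```

The enumeration yields twelve tuples rooted at a1 (four B nodes times three C nodes) and three rooted at a3 (b2, b4 or b3 paired with c1).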
Queries

DBLP Dataset
- DB1: //dblp/inproceedings[./title]/author
- DB2: //dblp/inproceedings[./cite][./title]/author
- DB3: //article[./volume][./cite]/journal
- DB4: //article[./mdate][./volume][./cite/label]/journal
- DB5: //inproceedings[./key][./mdate][./author][./year][./url]/title
- DB6: //article[./title][./author][./year][./ee]/key

Protein Sequence Database
- PSD1: //ProteinEntry[./header[./accession]/created_date]/protein/name
- PSD2: //ProteinEntry[./organism/source][./reference[.//year][.//month]//group]//gene
- PSD3: //ProteinEntry[.//gene][.//label]/header/accession
- PSD4: //ProteinEntry[./genetics[./label]/gene][./reference]/protein/name
- PSD5: //ProteinEntry[./reference[./accinfo]//title]/classification
- PSD6: //ProteinEntry[./classification/superfamily][./feature/description]/keywords

XMark Dataset
- XM1: //item[location]/description//keyword
- XM2: //person[.//address/zipcode]/profile/education
- XM3: //item[location][.//mailbox/mail//emph]/description//keyword
- XM4: //item[//location][.//mail//date]//payment
- XM5: //person[./emailaddress][./phone]/profile[.//age]/education
Pruning Rate

Query  Total Nodes  OTwig Pruned  OTwig Rate  TwigList Pruned  TwigList Rate  Matches
DB1    4082019      1272822       31%         1270976          31%            1595488
DB2    4254420      1923963       45%         1322555          31%            290144
DB3    1297362      502686        39%         21               0.000016%      47324
DB4    2376462      1170749       49%         81               0.000034%      13785
DB5    8093639      2853343       35%         8723             0.001%         1595475
DB6    6501573      4029283       62%         81               0.000012%      608053
PSD1   1948174      312506        16%         312506           16%            323043
PSD2   1169008      573758        49%         0                0%             2075
PSD3   1751474      395890        23%         312506           17.8%          709176
PSD4   1900571      864999        46%         396504           20.9%          4074
PSD5   1269229      230061        18%         0                0%             144505
PSD6   967704       271366        28%         40916            4.2%           207373
XM1    900871       394164        44%         180101           20%            136282
XM2    915971       708996        77%         0                0%             15859
XM3    1464611      848493        58%         316087           21.6%          86533
XM4    991628       499471        50%         102950           10.4%          104430
XM5    1012120      732783        72%         0                0%             7966
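The rate column appears to be the number of pruned nodes divided by the total number of nodes, expressed as a percentage. The short sketch below checks that reading against a few of the OTwig figures; the function name pruning_rate is hypothetical.

```python
def pruning_rate(pruned: int, total: int) -> float:
    """Percentage of nodes pruned out of all nodes in the query's streams."""
    return 100.0 * pruned / total


# (pruned, total) pairs for OTwig, taken from the table.
OTWIG = {
    "DB1": (1272822, 4082019),
    "DB6": (4029283, 6501573),
    "PSD1": (312506, 1948174),
    "XM2": (708996, 915971),
}

RATES = {query: round(pruning_rate(p, t)) for query, (p, t) in OTWIG.items()}
```

Rounded to whole percentages, these four queries give 31, 62, 16 and 77, matching the OTwig rate column.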
Evaluation Analysis (figures): bar charts comparing OTwig and TwigList on Processing Time (ms) and Memory Usage (KB), with one pair of charts each for the DBLP queries (DB1 to DB6), the PSD queries (PSD1 to PSD6) and the XMark queries (XM1 to XM5).
Conclusions
- With the need to manage and query large XML datastores comes the additional requirement of improving poor query response times.
- A twig join algorithm is required because traditional join algorithms are inefficient at processing the structural relationships that appear in XML trees.
- We extend the TwigList algorithm to further improve twig pattern matching performance.
- Our future plan is to further reduce the number of nodes to be accessed by applying techniques such as QueryGuide.