
OTwig: An Optimised Twig Pattern Matching Approach for XML Databases

Jun Liu
Interoperable Systems Group
Dublin City University, Ireland

January 27, 2010

ISG

SOFSEM 2010


1. Introduction
2. Background
   - Problem Description
   - Existing Works
   - Contribution
3. OTwig Algorithm
   - Positional Encoding
   - Properties and Rules
   - OTwig By Example
   - Merging Process
4. Experimental Analysis
   - Queries
   - Pruning Rate
   - Evaluation Analysis
5. Conclusions


Introduction

XML has been used increasingly as a data exchange format. Querying XML is inefficient due to its complex tree-based structure. An effective mechanism is required to efficiently determine the structural relationship between nodes in an XML tree.


XML Model

- An XML document is represented as an ordered and labeled tree.
- Each element in the document corresponds to a node in the XML tree.
- An edge between two nodes represents a parent-child relationship.
- Nodes with the same label (tag name) have the same type.

Twig Pattern Query

- A key component of XPath and XQuery.
- Each node of a twig pattern corresponds to a set of nodes in the XML tree.
- Each edge represents either a parent-child or an ancestor-descendant relationship.

XML Model

[Figure: an example XML tree with ten nodes: a1 (the root), a2, a3, b1, b2, b3, b4, c1, c2 and c3. The final overlay adds a twig pattern with query nodes A, B and C.]

What is the difficulty?

- Efficiently finding twig patterns in XML trees is essential to XPath and XQuery evaluation.
- A join algorithm must determine the structural relationship between the nodes in a twig pattern query. This is slow!
- A positional encoding scheme is used to make structural relationships easier to determine:
  - the Pre and Post encoding scheme
  - the Start and End encoding scheme
- We need an efficient approach that works on such encoding schemes to find twig pattern matches.


Containment Join
- decompose a twig query into a set of steps
- apply an algorithm similar to the traditional merge-join algorithm
- e.g., MPMGJN

Path Join
- decompose a twig query into a set of binary paths, then merge the binary paths
  - e.g., StackTree
- decompose a twig query into a set of root-to-leaf paths, then merge the paths together
  - e.g., TwigStack

Twig Join
- evaluate the twig query as a whole
- e.g., TwigList, Twig2Stack, TwigStackList, TJFast ...


Contribution

We present a bottom-up twig join algorithm, OTwig, based on the Start and End positional encoding scheme. OTwig:

- extends the TwigList algorithm with further performance gains
- processes nodes as they reside in their index streams, rather than creating an additional working stack
- applies pruning rules to reduce the total number of nodes to be processed and stored in memory


Start and End Encoding Scheme

[Figure: the example XML tree with each node annotated with its start and end positions (1 to 20) and its level (0 to 3).]

Node   Start   End   Level
a1       1      20     0
a2       2       5     1
b1       3       4     2
a3       6      15     1
b2       7       8     2
b3       9      12     2
b4      10      11     3
c1      13      14     2
c2      16      19     1
c3      17      18     2

For a node u in an XML tree T:

- reg(u) is the region of u in T, given by u.start and u.end
- dep(u) is the depth (level) of u in T
- edge(u,v) indicates the relationship between u and v: parent-child or ancestor-descendant
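The encoding can be illustrated with a small sketch. It assumes that a single counter is incremented once at each opening tag and once at each closing tag, which is consistent with the numbers in the example tree; the document string DOC and the function name encode are hypothetical, introduced only for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document shaped like the example tree; the tag names a, b, c are assumptions.
DOC = "<a><a><b/></a><a><b/><b><b/></b><c/></a><c><c/></c></a>"

def encode(xml_text):
    """Return (tag, start, end, level) tuples in document order."""
    counter = 0
    encoded = []

    def visit(elem, level):
        nonlocal counter
        counter += 1                 # position of the opening tag
        start = counter
        slot = len(encoded)
        encoded.append(None)         # reserve the slot to keep document order
        for child in elem:
            visit(child, level + 1)
        counter += 1                 # position of the closing tag
        encoded[slot] = (elem.tag, start, counter, level)

    visit(ET.fromstring(xml_text), 0)
    return encoded

for tag, start, end, level in encode(DOC):
    print(tag, [start, end, level])
```

Under these assumptions the root receives the region 1 to 20 at level 0, matching the figure.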


Properties

Region Containment
Given two nodes u and v, the region of u is contained in the region of v iff u.start > v.start and u.end < v.end, denoted by reg(u) ⊂ reg(v).

Ancestor
Given two nodes u ∈ T and v ∈ T, u is an ancestor of v iff reg(v) ⊂ reg(u).

Parent
Given two nodes u ∈ T and v ∈ T, u is the parent of v iff reg(v) ⊂ reg(u) and dep(u) + 1 = dep(v).
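The three properties translate directly into predicates. The sketch below is only an illustration; the Node type and the function names are assumptions, and the sample values are the encodings of a1, a3, b3 and b4 from the example tree.

```python
from typing import NamedTuple

class Node(NamedTuple):
    # Hypothetical record for one encoded node.
    name: str
    start: int
    end: int
    level: int

def contained_in(u, v):
    """reg(u) is contained in reg(v)."""
    return u.start > v.start and u.end < v.end

def is_ancestor(u, v):
    """u is an ancestor of v iff reg(v) is contained in reg(u)."""
    return contained_in(v, u)

def is_parent(u, v):
    """u is the parent of v iff u is an ancestor of v and exactly one level above it."""
    return contained_in(v, u) and u.level + 1 == v.level

a1 = Node("a1", 1, 20, 0)
a3 = Node("a3", 6, 15, 1)
b3 = Node("b3", 9, 12, 2)
b4 = Node("b4", 10, 11, 3)

print(is_ancestor(a3, b4), is_parent(a3, b4), is_parent(b3, b4))
```

Each check needs only integer comparisons, so no tree traversal is required to relate two nodes.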


Pruning Rules

Pruning By Ancestor and Descendant
Before a node can be added to the result list, we make sure that it has a valid ancestor and all valid descendants.


OTwig By Example

XML Document Parsing
- the nodes of each type are stored in a separate file
- each file is sorted by end value in ascending order

[Figure: the encoded example tree next to its three index streams, one per node type; each entry is [start, end, level].]

Stream A: a2 [2,5,1], a3 [6,15,1], a1 [1,20,0]
Stream B: b1 [3,4,2], b2 [7,8,2], b4 [10,11,3], b3 [9,12,2]
Stream C: c1 [13,14,2], c3 [17,18,2], c2 [16,19,1]
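The parsing step can be sketched as grouping the encoded nodes by type and sorting each group by end value. This is a minimal illustration that uses in-memory lists in place of the separate files mentioned on the slide; NODES and build_streams are hypothetical names.

```python
from collections import defaultdict

# (name, type, start, end, level) in document order, taken from the example tree.
NODES = [
    ("a1", "A", 1, 20, 0), ("a2", "A", 2, 5, 1), ("b1", "B", 3, 4, 2),
    ("a3", "A", 6, 15, 1), ("b2", "B", 7, 8, 2), ("b3", "B", 9, 12, 2),
    ("b4", "B", 10, 11, 3), ("c1", "C", 13, 14, 2), ("c2", "C", 16, 19, 1),
    ("c3", "C", 17, 18, 2),
]

def build_streams(nodes):
    """Group nodes by type, then sort each stream by end value (ascending)."""
    streams = defaultdict(list)
    for node in nodes:
        streams[node[1]].append(node)
    for stream in streams.values():
        stream.sort(key=lambda node: node[3])
    return dict(streams)

STREAMS = build_streams(NODES)
for node_type, stream in STREAMS.items():
    print(node_type, [node[0] for node in stream])
```

Sorting by end value places every node after all of its descendants, which is what allows the streams to be consumed bottom-up.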

Searching Pattern Matches

Nodes are taken from the index streams in ascending order of end value, and the lists A[ ], B[ ] and C[ ] start empty. The overlays step through the example as follows:

Step  Node           Action                        A[ ]    B[ ]          C[ ]
1     b1 [3,4,2]     appended to B                 -       b1            -
2     a2 [2,5,1]     dropped, not appended to A    -       b1            -
3     b2 [7,8,2]     appended to B                 -       b1 b2         -
4     b4 [10,11,3]   appended to B                 -       b1 b2 b4      -
5     b3 [9,12,2]    appended to B                 -       b1 b2 b4 b3   -
6     c1 [13,14,2]   appended to C                 -       b1 b2 b4 b3   c1
7     a3 [6,15,1]    appended to A                 a3      b1 b2 b4 b3   c1
8     c3 [17,18,2]   appended to C                 a3      b1 b2 b4 b3   c1 c3
9     c2 [16,19,1]   appended to C                 a3      b1 b2 b4 b3   c1 c3 c2
10    a1 [1,20,0]    appended to A                 a3 a1   b1 b2 b4 b3   c1 c3 c2

Node a2 contains b1 but no C node, so it is the only node dropped in this example.
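The walkthrough can be approximated by the sketch below. It is not the authors' implementation: it assumes that the twig pattern has A as the root with B and C as descendants, it applies only the descendant half of the pruning rule, and the names STREAMS, CHILDREN and search are hypothetical.

```python
# (name, start, end) per stream, each sorted by end value as in the example.
STREAMS = {
    "A": [("a2", 2, 5), ("a3", 6, 15), ("a1", 1, 20)],
    "B": [("b1", 3, 4), ("b2", 7, 8), ("b4", 10, 11), ("b3", 9, 12)],
    "C": [("c1", 13, 14), ("c3", 17, 18), ("c2", 16, 19)],
}
# Assumed twig pattern: A is the root, B and C are its descendants.
CHILDREN = {"A": ["B", "C"], "B": [], "C": []}

def search(streams, children):
    """Visit all nodes in ascending end order; keep a node only if every
    child query node already has a kept node inside its region."""
    lists = {query: [] for query in streams}
    pending = sorted(
        ((node, query) for query, stream in streams.items() for node in stream),
        key=lambda pair: pair[0][2],
    )
    for node, query in pending:
        _, start, end = node
        has_all_descendants = all(
            any(start < d[1] and d[2] < end for d in lists[child])
            for child in children[query]
        )
        if has_all_descendants:
            lists[query].append(node)
    return lists

RESULT = search(STREAMS, CHILDREN)
for query, kept in RESULT.items():
    print(query, [node[0] for node in kept])
```

Because descendants always end before their ancestors, every candidate descendant of a node has already been seen when that node is reached.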

A[ a3 a1 ]   B[ b1 b2 b4 b3 ]   C[ c1 c3 c2 ]

Matches ⟨A, B, C⟩:

⟨a1, b1, c1⟩   ⟨a1, b1, c3⟩   ⟨a1, b1, c2⟩
⟨a3, b2, c1⟩   ⟨a1, b2, c1⟩   ⟨a1, b2, c3⟩
⟨a1, b2, c2⟩   ⟨a3, b4, c1⟩   ⟨a1, b3, c1⟩
⟨a1, b3, c3⟩   ⟨a1, b3, c2⟩   ⟨a3, b3, c1⟩
⟨a1, b4, c1⟩   ⟨a1, b4, c3⟩   ⟨a1, b4, c2⟩
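The match set above can be reproduced with a simple enumeration over the three lists. This is a naive sketch for checking the example, assuming ancestor-descendant edges from A to B and from A to C; it is not the merging procedure used by OTwig, and the names inside and merge are hypothetical.

```python
# (name, start, end) for the nodes kept in each list.
A = [("a3", 6, 15), ("a1", 1, 20)]
B = [("b1", 3, 4), ("b2", 7, 8), ("b4", 10, 11), ("b3", 9, 12)]
C = [("c1", 13, 14), ("c3", 17, 18), ("c2", 16, 19)]

def inside(descendant, ancestor):
    """Region containment on (name, start, end) tuples."""
    return ancestor[1] < descendant[1] and descendant[2] < ancestor[2]

def merge(a_list, b_list, c_list):
    """Enumerate every (a, b, c) where b and c both lie inside a."""
    return [
        (a[0], b[0], c[0])
        for a in a_list
        for b in b_list if inside(b, a)
        for c in c_list if inside(c, a)
    ]

MATCHES = merge(A, B, C)
print(len(MATCHES))
```

For the example lists this yields twelve matches rooted at a1 and three rooted at a3, the same fifteen tuples listed on the slide.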


Queries

DBLP Dataset
DB1: //dblp/inproceedings[./title]/author
DB2: //dblp/inproceedings[./cite][./title]/author
DB3: //article[./volume][./cite]/journal
DB4: //article[./mdate][./volume][./cite/label]/journal
DB5: //inproceedings[./key][./mdate][./author][./year][./url]/title
DB6: //article[./title][./author][./year][./ee]/key

Protein Sequence Database
PSD1: //ProteinEntry[./header[./accession]/created_date]/protein/name
PSD2: //ProteinEntry[./organism/source][./reference[.//year][.//month]//group]//gene
PSD3: //ProteinEntry[.//gene][.//label]/header/accession
PSD4: //ProteinEntry[./genetics[./label]/gene][./reference]/protein/name
PSD5: //ProteinEntry[./reference[./accinfo]//title]/classification
PSD6: //ProteinEntry[./classification/superfamily][./feature/description]/keywords

XMark Dataset
XM1: //item[location]/description//keyword
XM2: //person[.//address/zipcode]/profile/education
XM3: //item[location][.//mailbox/mail//emph]/description//keyword
XM4: //item[//location][.//mail//date]//payment
XM5: //person[./emailaddress][./phone]/profile[.//age]/education


Pruning Rate

Query   Total nodes   OTwig pruned   OTwig rate   TwigList pruned   TwigList rate   Matches
DB1     4082019       1272822        31%          1270976           31%             1595488
DB2     4254420       1923963        45%          1322555           31%             290144
DB3     1297362       502686         39%          21                0.0016%         47324
DB4     2376462       1170749        49%          81                0.0034%         13785
DB5     8093639       2853343        35%          8723              0.11%           1595475
DB6     6501573       4029283        62%          81                0.0012%         608053
PSD1    1948174       312506         16%          312506            16%             323043
PSD2    1169008       573758         49%          0                 0%              2075
PSD3    1751474       395890         23%          312506            17.8%           709176
PSD4    1900571       864999         46%          396504            20.9%           4074
PSD5    1269229       230061         18%          0                 0%              144505
PSD6    967704        271366         28%          40916             4.2%            207373
XM1     900871        394164         44%          180101            20%             136282
XM2     915971        708996         77%          0                 0%              15859
XM3     1464611       848493         58%          316087            21.6%           86533
XM4     991628        499471         50%          102950            10.4%           104430
XM5     1012120       732783         72%          0                 0%              7966
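The rate columns are consistent with dividing the number of pruned nodes by the total number of nodes. The short sketch below checks this reading on a few rows; the function name pruned_rate is an assumption introduced here.

```python
def pruned_rate(pruned, total):
    """Pruned nodes as a percentage of all nodes in the query's streams."""
    return 100.0 * pruned / total

# (query, total nodes, OTwig pruned, TwigList pruned) copied from the table.
ROWS = [
    ("DB1", 4082019, 1272822, 1270976),
    ("DB3", 1297362, 502686, 21),
    ("XM2", 915971, 708996, 0),
]

for query, total, otwig, twiglist in ROWS:
    print(query, pruned_rate(otwig, total), pruned_rate(twiglist, total))
```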


[Figure: six charts comparing OTwig and TwigList: processing time (ms) and memory usage (KB) for the DBLP queries (DB1 to DB6), the PSD queries (PSD1 to PSD6) and the XMark queries (XM1 to XM5).]


Conclusions

- The need to manage and query large XML datastores brings the additional requirement of improving poor query response times.
- Twig join algorithms are required because traditional join algorithms are inefficient at processing the structural relationships that appear in XML trees.
- We extend the TwigList algorithm to further improve twig pattern matching performance.
- Our future plan is to further reduce the number of nodes to be accessed by applying techniques such as QueryGuide.
