Mapping XQuery to Algebraic Expressions

Matteo Magnani

Danilo Montesi

Technical Report UBLCS-2004-11 June 17, 2004

Department of Computer Science University of Bologna Mura Anteo Zamboni 7 40127 Bologna (Italy)

The University of Bologna Department of Computer Science Research Technical Reports are available in PDF and gzipped PostScript formats via anonymous FTP from the area ftp.cs.unibo.it:/pub/TR/UBLCS or via WWW at URL http://www.cs.unibo.it/. Plain-text abstracts organized by year are available in the directory ABSTRACTS.


Abstract

High-level database query languages are usually declarative. Typical examples are SQL, XQuery, and QBE. This lets users express even complex queries without strong technical knowledge. However, the underlying system must choose a good sequence of operations to execute the queries. This can be done by translating them into algebraic expressions, and by using algebraic equivalences to find good execution plans. This document shows how to map a significant subset of XQuery to an algebra proposed by Magnani and Montesi. This algebra is parametric on the language used to express predicates. We start by defining this language. Then we present the grammar that produces the subset of XQuery considered in this report. Finally, we show the translation procedure and some examples.

Author affiliations: Matteo Magnani, Department of Computer Science, University of Bologna, Mura Anteo Zamboni 7, 40127 Bologna, Italy. Danilo Montesi, Department of Mathematics and Informatics, University of Camerino, Via Madonna delle Carceri 9, 62032 Camerino MC, Italy.

Figure 1. Examples of TSEs. The input data graph is composed of three data trees, rooted respectively at A, C, and E (a). The figure shows some TSEs (b), together with the corresponding selected trees (c).

1 Introduction

High-level database query languages are usually declarative. Typical examples are SQL, XQuery, and QBE [AHV95, BCF+03]. This lets users express even complex queries without strong technical knowledge. However, the underlying system must choose a good sequence of operations to execute the queries. This can be done by translating them into algebraic expressions, and by using algebraic equivalences to find good execution plans. This document shows how to map a significant subset of XQuery to the algebra described in [MM04]. This algebra is parametric on the language used to express predicates. We start by defining this language. Then we present the grammar that produces the subset of XQuery considered in this report. Finally, we show the translation procedure and some examples.

2 Companion language

Let Σ be L ∪ Var ∪ {∗}, where L is a set of labels and Var is a (disjoint) set of variable names. The building blocks of the companion language used in this section are tree selection expressions (TSEs), with the following syntax:
• x ∈ Σ is a TSE.
• (φ1, φ2) is a TSE if φ1 and φ2 are TSEs.
• ¬φ is a TSE if φ is a TSE.

Examples of TSEs are: $a, (book, paper), (author, editor), and ∗. When applied to a data graph, they select some of its data trees. In particular, the elements of L ∪ Var select the trees whose root node carries that label (e.g., $a selects the trees whose root node is labeled $a), while ∗ selects all the data trees. A negated expression ¬φ selects all the trees not selected by φ, and a list of comma-separated expressions takes the union of the trees selected by them. For instance, (a, ¬b) selects the trees labeled a (first part) and those not labeled b. Here the a part is redundant, as trees labeled a are already selected by ¬b. Some other examples are shown in Fig. 1. We will omit parentheses when unnecessary.

TSEs are used to build the predicates composing our companion language. In particular:

• Selection predicates³:
– A TSE φ is a selection predicate.
– (ψ)/x is a selection predicate if ψ is a selection predicate and x ∈ Σ.
– (ψ)//x is a selection predicate if ψ is a selection predicate and x ∈ Σ.
– (ψ1; ψ2) is a selection predicate if ψ1 and ψ2 are selection predicates.
– φ−/x is a selection predicate if φ is a TSE and x ∈ Σ.

• Embed predicates:
– <x>(φ; ...; φ) is an embed predicate if x ∈ L and each φ is a TSE or a literal.

• Comparison predicates:
– ψ θ ψ is a comparison predicate if each ψ is a selection predicate or a literal and θ is a comparison operator.

³ Selection predicates are so called because they select subsets of data graphs. However, they are used inside algebraic projections, while selections use comparison predicates.

We introduce the semantics of these predicates by example. In particular, we will apply them to the datagraph of Figure 2.

Figure 2. Input datagraph used in the examples. Node labels are in capital letters, while textual contents are in italic. In this example, nodes with textual content have no label.

TSEs can be directly used as selection predicates. A tree is included in the output if and only if it has been selected. Negation is necessary because sometimes we do not know the exact composition of the input data graph. Figure 3 shows some examples.

Figure 3. Selection predicates based on tree selection expressions: (a) E, (b) ¬E, (c) (A, E), (d) ¬(A, E). The predicate in Figure 3(c) selects all the input data trees; ∗ would behave in the same way. The last predicate selects nothing, and in this particular case it is equivalent to ¬∗.
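The behavior of TSEs described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Tree` class, the tuple encoding of negation (`("not", e)`) and of comma-separated lists (`("or", e1, e2, ...)`), and the function names are hypothetical and are not part of the algebra in [MM04].

```python
from dataclasses import dataclass, field


@dataclass
class Tree:
    """A data tree, identified here only by its root label and children."""
    label: str
    children: list = field(default_factory=list)


def selects(tse, tree):
    """Return True if the TSE selects the given data tree.

    Hypothetical encoding: a string is a label, a variable name, or "*";
    ("not", e) is a negated TSE; ("or", e1, e2, ...) is a comma-separated list.
    """
    if isinstance(tse, str):
        return tse == "*" or tse == tree.label
    if tse[0] == "not":
        return not selects(tse[1], tree)
    if tse[0] == "or":
        return any(selects(e, tree) for e in tse[1:])
    raise ValueError(f"unknown TSE: {tse!r}")


def select(tse, graph):
    """Keep the data trees of the graph selected by the TSE, in input order."""
    return [t for t in graph if selects(tse, t)]


def labels(trees):
    """Root labels of a list of trees, for compact printing."""
    return [t.label for t in trees]


# A data graph with three data trees rooted at A, C, and E, as in Figure 1(a).
graph = [Tree("A"), Tree("C"), Tree("E")]
print(labels(select(("or", "C", "E"), graph)))           # ['C', 'E']
print(labels(select(("not", "A"), graph)))               # ['C', 'E']
print(labels(select(("or", "A", ("not", "A")), graph)))  # ['A', 'C', 'E']
```

In this encoding a list combined with the negation of one of its members selects every tree, and the negation of ∗ selects nothing.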

Once a set of data trees has been selected, navigation instructions may be used to manipulate their internal nodes. In particular, (ψ)/x extracts the children of the nodes selected by ψ whose label is x, while (ψ)//x extracts the descendants of the nodes selected by ψ whose label is x. The latter predicate can duplicate some nodes, when a selected tree is a subtree of another selected one (Figure 4(c)). φ−/x extracts the selection of φ, erasing the children of the root nodes that are named x, but preserving their subtrees (Figure 4(b)). Navigation predicates are illustrated in Figure 4. Selection predicates may be concatenated by means of a semicolon. In this case, every subexpression is evaluated and contributes to the output data graph (Figure 5). If an implementation of the model allows multisets instead of sets of data trees, which is a typical choice to improve efficiency, the same tree can be included many times in the output. This happens when it is selected by different semicolon-separated predicates.

Figure 4. Selection predicates based on tree selection expressions augmented by navigation instructions: (a) A/B, (b) A−/B, (c) E//F.

Figure 5. Application of the predicate: A; A/B
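The child and descendant navigation steps, and the semicolon concatenation, can be sketched as follows. The `Tree` class, the function names, and the small example tree (loosely modeled on the E tree of Figure 2, whose exact shape is an assumption here) are hypothetical; the sketch uses lists, so it follows the multiset reading in which a tree can appear more than once.

```python
from dataclasses import dataclass, field


@dataclass
class Tree:
    label: str = None            # None marks an unlabeled text node
    children: list = field(default_factory=list)
    text: str = None             # textual content of a text node


def matches(node, x):
    """Name test: "*" accepts every node, otherwise compare labels."""
    return x == "*" or node.label == x


def child(trees, x):
    """(psi)/x: children labeled x of the nodes selected by psi."""
    return [c for t in trees for c in t.children if matches(c, x)]


def descendant(trees, x):
    """(psi)//x: descendants labeled x of the nodes selected by psi."""
    out = []
    for t in trees:
        for c in t.children:
            if matches(c, x):
                out.append(c)
            out.extend(descendant([c], x))   # keep looking below every child
    return out


def concat(*results):
    """psi1; psi2: every subexpression contributes to the output."""
    return [t for r in results for t in r]


# Assumed example: E has a child F, which contains G and a nested F.
inner_f = Tree("F", [Tree(text="txt4")])
outer_f = Tree("F", [Tree("G", [Tree(text="txt3")]), inner_f])
e = Tree("E", [outer_f])

found = descendant([e], "F")
print(len(found))   # 2
```

The nested F is returned twice: once as an extracted tree of its own and once inside the outer F, which is the duplication effect mentioned for the descendant step.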

An embed predicate <x>(φ; ...; φ) creates a new node with label x, whose children are either the trees selected by the TSEs or unlabeled (text) nodes with the content defined by the literals, in the given order. Figure 6 illustrates three significant uses of this predicate. The semantics of comparison predicates depends on the operator. The left and right hand side subexpressions are evaluated to two datagraphs, which are then compared. In the following examples, the comparison of an element to a literal is intended as a comparison of their string values. Besides defining the companion language for instances, we also need a small extension of the algebra to the practical context of XQuery. The algebra does not specify how to get input values, which is a context- and system-dependent issue. Therefore, we assume the availability of a collection with an empty datagraph, represented with open and closed square brackets ([ ]), and of an IN operator, which takes a collection and a reference to a data source as input and augments the collection with the nodes retrieved from the data source (we will show an example later, in Fig. 8). This is a necessary bridge between a formal language and its practical usage, and it does not compromise the core algebra specification.
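A minimal sketch of the embed predicate follows. It assumes that the trees selected by the TSEs are moved under the new node while unselected trees stay at the top level, which is how the For clause translation later uses embedding; the `Tree` class, the use of Python callables in place of TSEs, and the function name are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Tree:
    label: str = None            # None marks an unlabeled node
    children: list = field(default_factory=list)
    text: str = None             # textual content of a text node


def embed(label, parts, graph):
    """<label>(part; ...; part) applied to one data graph.

    Each part is either a string literal (it becomes a text node) or a
    callable standing in for a TSE (it selects top-level trees).
    Assumption: selected trees are moved under the new node.
    """
    children, moved = [], set()
    for part in parts:
        if isinstance(part, str):
            children.append(Tree(text=part))
        else:
            for i, tree in enumerate(graph):
                if i not in moved and part(tree):
                    children.append(tree)
                    moved.add(i)
    rest = [t for i, t in enumerate(graph) if i not in moved]
    return rest + [Tree(label, children)]


graph = [Tree("A"), Tree("E")]
wrapped = embed("H", [lambda t: True], graph)   # like <H>(*)
leaf = embed("H", [], graph)                    # like <H>()
literal = embed(None, ["ciao"], graph)          # empty label, text content
print([t.label for t in wrapped])               # ['H']
```

The three calls mirror the three cases of Figure 6: embedding everything, creating a leaf, and creating an unlabeled node with textual content.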

3 Grammar of the XQuery Subset

The subset of XQuery that will be mapped to our algebra includes its main functionalities, i.e., path expressions, for-let-where-return expressions, constructors, and arbitrarily nested subqueries. On the other hand, it has many limitations, which in our opinion are not relevant to this discussion and whose treatment would compromise the clarity of the exposition. We briefly list the main ones. Path expressions are very simple, but additional features could be added painlessly to the companion language: for example, we only allow tests on node names, but we can easily define a syntax to express tests on types. We have not included ordering, as our model is not ordered outside data trees. As in the relational model, this choice simplifies formal investigations. Finally, we have not considered generic functions, for two reasons. Built-in functions may be implemented outside the algebraic representation of the language, as happens in the relational context.

Figure 6. Examples of embed predicates: (a) <H>(∗), (b) <H>(), (c) <>('ciao'). The absence of tag name or content produces a node without, respectively, label and children. Textual content may be specified together with the expressions selecting the children. For compatibility with the XML data model, we do not allow consecutive textual nodes in the content specification.

Moreover, in our opinion the possibility of writing arbitrarily recursive functions is not a safe feature for a query algebra, and it should be implemented outside it. The BNF grammar defining the XQuery fragment is the following. Notice the unconstrained nesting of expressions inside for, let, where, return clauses and constructors.

Expr ::= InputExpr | FLWRExpr | Literal | Constructor
InputExpr ::= (InputFunctionCall | VarRef) (PathExpr)?
InputFunctionCall ::= ( "doc(" | "collection(" ) <data source> ")"
VarRef ::= <VarName>
PathExpr ::= ( ChildStep | DescendantStep ) ( ChildStep | DescendantStep )*
ChildStep ::= "/" NameTest
DescendantStep ::= "//" NameTest
NameTest ::= <QName> | Wildcard
Wildcard ::= "*" | <NCName> ":*" | "*:" <NCName>
FLWRExpr ::= (ForClause | LetClause)+ (WhereClause)? "return" Expr
ForClause ::= "for" <VarName> "in" Expr
LetClause ::= "let" <VarName> ":=" Expr
WhereClause ::= "where" Expr <comparison operator> Expr
Constructor ::= "<" <TagQName> ( "/>" | (">" ElementContent* "</" <TagQName> ">"))
ElementContent ::= <ElementContentChar> | Constructor | EnclosedExpr
Literal ::= NumericLiteral | <StringLiteral>
NumericLiteral ::= <IntegerLiteral> | <DecimalLiteral> | <DoubleLiteral>
EnclosedExpr ::= "{" Expr "}"

4 Translation

In this section we show how to translate the above fragment of XQuery to algebraic expressions. Let χ be a generic query in this language. E_χ(.) evaluates a query χ to a sequence, while E_A(.) evaluates an algebraic query A to a folder. In the following we will omit subscripts: E(.) will be used to indicate the evaluation of any expression. We briefly and informally review the semantics of the presented fragment of XQuery. If an input expression χ is not followed by path expressions, E(χ) is the sequence of nodes corresponding to the referenced variable or to the data source specified by an input function. If χ is followed by a list of path expressions, they replace the nodes in the sequence with their children or descendants, filtering them on the value of labels. FLWR expressions create a table with the bindings defined by the for/let clauses, select some of the rows through a where clause, and evaluate the return expression once for each row. The result is the concatenation of all the evaluations. A constructor embeds its nested expressions inside the specified new tag. We will simulate the table of bindings through a folder E(B) containing a row for each combination of values in the table of bindings, as represented in Figure 7. In the following, B will represent the algebraic expression evaluating to E(B), i.e., producing the table. Every translation T_B of a query χ to an algebraic expression will be based on the folder E(B). At the beginning, B = [ ], and it will be changed during the translation process when new variables are bound.

Table with variable bindings (XQuery):

$V1   $V2
A     1
A     2
B     2

Folder E(B): one datagraph per row of the table, each containing a node $V1 and a node $V2 whose children are the bound values: ($V1: A, $V2: 1), ($V1: A, $V2: 2), ($V1: B, $V2: 2).

Figure 7. A table used by XQuery to bind variables, and the corresponding collection. Notice that this is a logical representation: at the physical level it is not necessary to replicate the names of the variables, and this collection can be represented as a relational table.

4.1 Input Expressions

Input expressions may be input function calls or variable references. They may be followed by a list of path expressions. In this section we show how they are translated when encountered by our parser. Given a query χ of the form ("doc(" | "collection(") <data source> ")" and an expression B evaluating to a collection representing bound variables,

T_B(χ) = IN_{<data source>}(B)

The IN operator has been described in Section 2, and its behavior is illustrated in Figure 8. It adds to each datagraph the set of nodes retrieved from <data source>. If an input function call is followed by path expressions, they are mapped to a projection on the nodes not representing bound variables, which also duplicates the bound ones. An example is shown in Fig. 9.

Figure 8. Input function calls evaluate to the sequence of nodes retrieved from the data source, once for every combination of values in the table of variable bindings. In the same way, the IN operator adds the content of the data source, represented as a data graph, to each row of E(B). The figure shows E(B), the data source (a DOC node containing a book node), and E(T(InputFunctionCall)).
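The folder representation of the binding table and the effect of the IN operator can be sketched as follows. The `Tree` class and the helper names are hypothetical, and modeling the collection [ ] as a list holding one empty datagraph is an assumption based on its description in Section 2.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Tree:
    label: str = None            # None marks an unlabeled text node
    children: list = field(default_factory=list)
    text: str = None


def folder_from_table(columns, rows):
    """One datagraph per row: a node per variable, whose child holds the value."""
    return [
        [Tree(var, [Tree(text=str(value))]) for var, value in zip(columns, row)]
        for row in rows
    ]


def IN(source, folder):
    """Append a copy of the data source nodes to every datagraph of the folder."""
    return [graph + copy.deepcopy(source) for graph in folder]


EMPTY = [[]]   # the collection [ ]: a single empty datagraph

# Binding table of Figure 7 and a data source shaped like the one in Figure 8.
bindings = folder_from_table(["$V1", "$V2"], [("A", 1), ("A", 2), ("B", 2)])
data_source = [Tree("DOC", [Tree("book")])]

augmented = IN(data_source, bindings)
print([[t.label for t in graph] for graph in augmented])
```

Applying `IN` to `EMPTY` corresponds to a top-level input function call, where no variable has been bound yet.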

Variable references have a similar behavior: the children of the referenced variable are added to the end of each datagraph. These correspond to the set of nodes previously bound to that variable. Given a query χ representing the name of a variable,

T_B(χ) = π_{bounded; (VarName)/∗}(B)

We use "bounded" to denote the list of variables bound in E(B), so that they are not lost during the projection. Path expressions are included in the predicate, through child and descendant selections. Figure 10 shows an example of a variable reference.

Figure 9. Behavior of a path expression. We want to evaluate the algebraic expression corresponding to doc('DataSource')/book. The first part (InputFunctionCall) evaluates to the collection in the figure, where the content of 'DataSource' has been attached at the end of each data graph in E(B). The path expression further processes the collection by extracting the children of the nodes produced by doc('DataSource') whose label is book. The resulting algebraic expression is: π_{$V1,$V2; ¬($V1,$V2)/book}(IN_{DataSource}(B)).

Figure 10. A reference to variable $V2 is translated to an algebraic expression corresponding to the collection on the right (E(T(VarRef))), where the values bound to $V2 are appended to each datagraph of E(B).

4.2 FLWR Expressions

The first part of a FLWR expression (FLWRExpr) is a sequence of For and Let clauses. A For clause is translated to:

B = ϵ_{<VarName>(¬bounded)}(ς_{bounded}(T_B(Expr)))

We evaluate the nested expression Expr, we split the result (ς) keeping the bound variables, and we embed it (ϵ), except for the bound variables, into VarName (Figure 11). This procedure produces a new binding table that includes the newly bound variable. In a similar way, as shown in Figure 12, a Let clause is translated to:

B = ϵ_{<VarName>(¬bounded)}(T_B(Expr))

Where clauses may erase some rows of the current binding table. If one of the nested expressions of a where clause is a literal, it can be used directly inside the comparison predicate. In general, however, a where clause may contain two nested expressions that cannot be mapped directly to a selection predicate. In this case, our approach is to evaluate them, keep references to their results through temporary variables, and erase these variables after the comparison. This is done in three steps. First, we evaluate the first subexpression and embed its result into a new variable Var1:

B = ϵ_{<Var1>(¬bounded)}(T_B(Expr))

Figure 11. Evaluation of a For clause. The nested expression augments each datagraph of the table of bindings with its resulting sequence of nodes (E(T(Expr))). Then, these nodes are distributed to obtain one new node in each row, and previous bindings are replicated where necessary (SPLIT). Finally, we bind the new variable (EMBED).

Figure 12. Let clauses are evaluated as For clauses, without splitting the result of the nested expression. In this way, only one new variable is created for each existing row in the binding table.
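The difference between the For and Let translations can be sketched as follows. The sketch starts from the already evaluated nested expression, i.e., from datagraphs that contain the bound variables followed by the result nodes. The `Tree` class, all function names, the variable name `$x`, and the sample values (which only loosely follow Figure 11) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Tree:
    label: str = None            # None marks an unlabeled text node
    children: list = field(default_factory=list)
    text: str = None


def partition(graph, bounded):
    """Separate the bound variable nodes from the remaining nodes."""
    keep = [t for t in graph if t.label in bounded]
    rest = [t for t in graph if t.label not in bounded]
    return keep, rest


def split(folder, bounded):
    """One output row per non-bound node; bound variables are replicated."""
    out = []
    for graph in folder:
        keep, rest = partition(graph, bounded)
        out.extend(keep + [node] for node in rest)
    return out


def embed_rest(var, folder, bounded):
    """Embed everything except the bound variables into a new node var."""
    out = []
    for graph in folder:
        keep, rest = partition(graph, bounded)
        out.append(keep + [Tree(var, rest)])
    return out


def for_clause(var, evaluated, bounded):
    return embed_rest(var, split(evaluated, bounded), bounded)


def let_clause(var, evaluated, bounded):
    return embed_rest(var, evaluated, bounded)


def row(v1, v2, *results):
    """A datagraph with $V1, $V2 and the result nodes of the nested expression."""
    bound = [Tree("$V1", [Tree(text=v1)]), Tree("$V2", [Tree(text=v2)])]
    return bound + [Tree(text=r) for r in results]


evaluated = [row("A", "1", "1", "2"), row("A", "2", "2", "3"), row("B", "2", "2")]
bounded = {"$V1", "$V2"}

for_rows = for_clause("$x", evaluated, bounded)
let_rows = let_clause("$x", evaluated, bounded)
print(len(for_rows), len(let_rows))   # 5 3
```

With the For clause every result node gets its own row, while the Let clause keeps one row per original binding and stores the whole result sequence under the new variable.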

This creates a new table of bindings, which prevents Var1 from being manipulated by the evaluation of the second subexpression:

B = ϵ_{<Var2>(¬bounded)}(B)

At this point, we have augmented the original table of bindings with the results of the two expressions to be compared, embedded into Var1 and Var2. The algebraic expression produced so far has the semantics illustrated in Figure 13. Finally, we perform the selection, as shown in Figure 14. Variables Var1 and Var2 are then erased, as they are no longer necessary. The corresponding algebraic expression is:

π_{old bounded}(σ_{Var1/∗ θ Var2/∗}(B))

where "old bounded" indicates the list of variables bound just after entering the where clause, and θ is the comparison operator. In this way the new temporary variables are not retained.

A FLWR expression is translated as follows. First, for, let and where clauses are translated, in the given order. For and let clauses change the number of bound variables. The where clause, if present, may erase some rows. This produces a new folder representing the table of bindings used inside the return clause. At this point, we can evaluate the return expression and delete the last bound variables:

TMP = π_{¬(new bounded)}(T_B(Expr))

Figure 13. Where clause - part 1. Subexpressions to be compared are evaluated and referenced by new variables (Var1 and Var2). The figure shows E(B), the result of adding E(T(Expr1)) under Var1, and the result of adding E(T(Expr2)) under Var2.

Figure 14. Where clause - part 2. Temporary variables Var1 and Var2 are used to select the required datagraphs (SELECTION). In this example, two out of three have been retained. After the selection, temporary variables are erased (DELETION).
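The selection and deletion steps of the where clause can be sketched as follows. The comparison operator is passed as a function, because its semantics depends on the operator; the existential equality used in the example follows XQuery general comparisons and is an assumption, not something the report specifies. The `Tree` class, the function names, and the sample data are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Tree:
    label: str = None            # None marks an unlabeled text node
    children: list = field(default_factory=list)
    text: str = None


def values(graph, var):
    """Var/*: string values of the children of the node labeled var."""
    return [c.text for t in graph if t.label == var for c in t.children]


def where(folder, left, right, theta, old_bounded):
    """Keep the rows where theta holds, then drop the temporary variables."""
    out = []
    for graph in folder:
        if theta(values(graph, left), values(graph, right)):
            out.append([t for t in graph if t.label in old_bounded])
    return out


def exists_equal(lefts, rights):
    """Assumed semantics: true if some left value equals some right value."""
    return any(l == r for l in lefts for r in rights)


def row(v1, var1, var2):
    """A datagraph with one bound variable and two temporary variables."""
    return [
        Tree("$V1", [Tree(text=v1)]),
        Tree("Var1", [Tree(text=v) for v in var1]),
        Tree("Var2", [Tree(text=v) for v in var2]),
    ]


folder = [row("A", ["1"], ["2"]), row("A", ["2", "3"], ["2"]), row("B", ["2"], ["2"])]
kept = where(folder, "Var1", "Var2", exists_equal, {"$V1"})
print(len(kept))   # 2
```

Only the variables listed in `old_bounded` survive, so the temporary nodes never reach the return clause.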

Now, we group on the old bound variables, i.e., those which are still present in E(B) after the deletion of the new ones:

TMP2 = γ_{bounded}(TMP)

In this way no two rows in the result of our expression correspond to a single row in the original table of bindings. Finally, if we have encountered a where clause, it may have erased some datagraphs. Therefore, we restore the missing rows by a left outer join with the original binding table OB, which we obtained just after entering the FLWR clause. A left outer join operator has not been defined in [MM04], but its behavior is intuitive. It adds the result of the expression computed in TMP2 to the end of each row of the original table of bindings (OB), adding nothing if there is no corresponding row in TMP2:

OB ⟕ TMP2

4.3 Constructors

An empty constructor is simply translated to an embedding with empty content. As we have shown in Section 2, this has the effect of creating a new leaf node:

ϵ_{<TagQName>()}(B)

When the constructor has a content, every subexpression is evaluated and bound to a new variable. This temporarily changes E(B):

B = ϵ_{<Var_i>(¬bounded)}(T_B(Expr))


Then, an embedding is used to include them as children of the required node:

TMP = ϵ_{<TagQName>(Var1; ...; Varn)}(B)

Finally, the new variables are erased by means of a projection (Figures 15 and 16 show an example):

π_{(bounded)−/Var1; ...; (bounded)−/Varn}(TMP)

In practice, some subexpressions may represent textual nodes. In this case, instead of wasting resources by creating new variables for textual data, we include them directly inside the embed predicate. Finally, the original table of bindings is restored.

Figure 15. Evaluation of constructors - Part 1. All subexpressions (SubExpr1, ..., SubExprN) are processed, and their results are embedded into temporary variables.

Figure 16. Evaluation of constructors - Part 2. Temporary variables are embedded into a node corresponding to the new tag (TagQName). Nodes used as temporary variables are then erased.
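The two constructor steps (embedding the temporary variables under the new tag, then erasing the temporary nodes while keeping their subtrees) can be sketched as a single function applied to one datagraph. The `Tree` class, the function name, and the tag and variable names in the example are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Tree:
    label: str = None            # None marks an unlabeled text node
    children: list = field(default_factory=list)
    text: str = None


def construct(tag, graph, temp_vars):
    """Build a node labeled tag from the contents of the temporary variables.

    temp_vars lists Var1, ..., Varn in the order in which their contents
    must appear under the new node.
    """
    keep = [t for t in graph if t.label not in temp_vars]
    content = []
    for var in temp_vars:
        for t in graph:
            if t.label == var:
                # Erase the temporary node but preserve its subtrees.
                content.extend(t.children)
    return keep + [Tree(tag, content)]


graph = [
    Tree("$V1", [Tree(text="A")]),
    Tree("Var1", [Tree(text="Hello ")]),
    Tree("Var2", [Tree("b", [Tree(text="World")])]),
]
result = construct("a", graph, ["Var1", "Var2"])
print([t.label for t in result])   # ['$V1', 'a']
```

Doing both steps in one pass reflects the remark that embedding and deletion can be performed together at the physical level.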

As the embedding of subexpressions and the deletion of temporary nodes can always be performed in a single step at the physical level, we will represent them together with the symbol ϵ⁻. For instance, we will write:

ϵ⁻_{<TagQName>(Var1; ...; Varn)}(B)

instead of:

π_{(bounded)−/Var1; ...; (bounded)−/Varn}(ϵ_{<TagQName>(Var1; ...; Varn)}(B))

4.4 Literals (Lit)

A literal is a sort of constructor. It is simply translated to an embedding with empty label. Figure 17 shows an example of its behavior:

ϵ_{<>('content')}(B)

5 Examples

The following queries have been automatically translated by a program written in Java, using JavaCC, which implements the translation steps presented above.

Figure 17. A literal produces an unlabeled node with its content. In the example, T(Literal) adds an unlabeled node with content 'ciao' to each datagraph of E(B).

5.1 Input function call

doc('file.xml')

IN_{file.xml}([ ])

5.2 Input function call with path expression

collection('coll')//author

π_{∗//author}(IN_{coll}([ ]))

5.3 Nested constructors

Hello World

B1 = ϵ_{<$0>(∗)}(ϵ_{<…>('World')}([ ]))
ϵ⁻_{<…>('Hello '; $0)}(B1)

5.4 Literal

'hello'

ϵ_{<>('hello')}([ ])

5.5 For-Return expression

for $b in collection('books') for $a in $b//author return $a

B1 = ϵ_{<$b>(∗)}(ς(IN_{books}([ ])))
B2 = ϵ_{<$a>(¬$b)}(ς_{$b}(π_{$b; $b/∗//author}(B1)))
γ(π_{¬($a,$b)}(π_{$b,$a; $a/∗}(B2)))

5.6 For-Let-Return expression

for $b in collection('books') let $a := $b//author return {$a}

B1 = ϵ_{<$b>(∗)}(ς(IN_{books}([ ])))
B2 = ϵ_{<$a>(¬$b)}(π_{$b; $b/∗//author}(B1))
B3 = ϵ_{<$0>(¬($b,$a))}(π_{$b,$a; $a/∗}(B2))
γ(π_{¬($a,$b)}(ϵ⁻_{<…>($0)}(B3)))

5.7 FLWR expression

for $b in collection('books') let $a := $b//author where $b/title = 'Moby Dick' return {$a}

B1 = ϵ_{<$b>(∗)}(ς(IN_{books}([ ])))
B2 = ϵ_{<$a>(¬$b)}(π_{$b; $b/∗//author}(B1))
B3 = σ_{$b/∗/title = 'Moby Dick'}(B2)
B4 = ϵ_{<$0>(¬($b,$a))}(π_{$b,$a; $a/∗}(B3))
γ(π_{¬($a,$b)}(ϵ⁻_{<…>($0)}(B4)))

5.8 FLWR expression with constructor and nested variable reference

for $b in collection('books') let $a := $b//author where $b/title = 'Moby Dick' return The authors are {$a}

B1 = ϵ_{<$b>(∗)}(ς(IN_{books}([ ])))
B2 = ϵ_{<$a>(¬$b)}(π_{$b; $b/∗//author}(B1))
B3 = σ_{$b/∗/title = 'Moby Dick'}(B2)
B4 = ϵ_{<$0>(¬($b,$a))}(π_{$b,$a; $a/∗}(B3))
B5 = ϵ_{<$1>(¬($b,$a))}(ϵ⁻_{<…>($0)}(B4))
γ(π_{¬($a,$b)}(ϵ⁻_{<…>('The authors are '; $1)}(B5)))

5.9 Nested FLWR expression

for $b in collection('books') return {for $a in $b/author return {$a/name}}

B1 = ϵ_{<$b>(∗)}(ς(IN_{books}([ ])))
B2 = ϵ_{<$a>(¬$b)}(ς_{$b}(π_{$b; $b/∗/author}(B1)))
B3 = ϵ_{<$0>(¬($b,$a))}(π_{$b,$a; $a/∗/name}(B2))
B4 = ϵ_{<$1>(¬$b)}(γ_{$b}(π_{¬$a}(ϵ⁻_{<…>($0)}(B3))))
γ(π_{¬$b}(ϵ⁻_{<…>('↵'; $1; '↵')}(B4)))

5.10 Nesting on Where clause

for $b in collection('books') where $b/title = doc('my_favourite_books')//title return $b/isbn

B1 = ϵ_{<$b>(∗)}(ς(IN_{books}([ ])))
B2 = ϵ_{<$0>(¬$b)}(π_{$b; ¬$b//title}(IN_{my_favourite_books}(B1)))
B3 = π_{$b}(σ_{$b/∗/title = $0/∗}(B2))
γ(π_{¬$b}(π_{$b; $b/∗/isbn}(B3)))

References

[AHV95] Serge Abiteboul, Richard Hull, and Victor Vianu. Foundations of Databases. Addison-Wesley Longman Publishing Co., Inc., 1995.

[BCF+03] Scott Boag, Don Chamberlin, Mary F. Fernández, Daniela Florescu, Jonathan Robie, and Jérôme Siméon. XQuery 1.0: An XML query language (working draft, Nov 12, 2003). Technical report, W3C, 2003. http://www.w3.org/TR/xquery/.

[MM04] Matteo Magnani and Danilo Montesi. A unified approach to structured, semistructured and unstructured data. Technical Report UBLCS-2004-9, University of Bologna, May 2004. ftp://ftp.cs.unibo.it/pub/TR/UBLCS/2004/2004-09.pdf.