XPath. Web Data Management and Distribution. Serge Abiteboul Ioana
Manolescu. Philippe Rigaux. Marie-Christine Rousset Pierre Senellart. Web Data
...
XPath Web Data Management and Distribution
Serge Abiteboul Ioana Manolescu Philippe Rigaux Marie-Christine Rousset Pierre Senellart
Web Data Management and Distribution http://webdam.inria.fr/textbook
May 30, 2013
WebDam (INRIA)
XPath
May 30, 2013
1 / 36
Introduction
Outline 1
Introduction
2
Path Expressions
3
Operators and Functions
4
XPath examples
5
XPath 2.0
6
Reference Information
7
Exercise
WebDam (INRIA)
XPath
May 30, 2013
2 / 36
Introduction
XPath An expression language to be used in another host language (e.g., XSLT, XQuery). Allows the description of paths in an XML tree, and the retrieval of nodes that match these paths. Can also be used for performing some (limited) operations on XML data.
Example 2*3 is an XPath literal expression.
//*[@msg="Hello world"] is an XPath path expression, retrieving all elements with a msg attribute set to “Hello world”. Content of this presentation Mostly XPath 1.0: a W3C recommendation published in 1999, widely used. Also a basic introduction to XPath 2.0, published in 2007. WebDam (INRIA)
XPath
May 30, 2013
3 / 36
Introduction
XPath Data Model XPath expressions operate over XML trees, which consist of the following node types: Document: the root node of the XML document; Element: element nodes; Attribute: attribute nodes, represented as children of an Element node; Text: text nodes, i.e., leaves of the XML tree.
Remark Remark 1 The XPath data model features also ProcessingInstruction and Comment node types. Remark 2 Syntactic features specific to serialized representation (e.g., entities, literal section) are ignored by XPath.
WebDam (INRIA)
XPath
May 30, 2013
4 / 36
Introduction
From serialized representation to XML trees
Text 1 Text 2 Element B Element Element Text 3 D D Text 1 Text 2
Document
Element A
Element C
Element B
Element D
Attr att1
Attr att2
2
3
Text Text 3
(Some attributes not shown.)
WebDam (INRIA)
XPath
May 30, 2013
5 / 36
Introduction
XPath Data Model (cont.) The root node of an XML tree is the (unique) Document node; The root element is the (unique) Element child of the root node; A node has a name, or a value, or both ◮ ◮ ◮
an Element node has a name, but no value; a Text node has a value (a character string), but no name; an Attribute node has both a name and a value.
Attributes are special! Attributes are not considered as first-class nodes in an XML tree. They must be addressed specifically, when needed.
Remark The expression “textual value of an Element N” denotes the concatenation of all the Text node values which are descendant of N, taken in the document order.
WebDam (INRIA)
XPath
May 30, 2013
6 / 36
Path Expressions
Outline 1
Introduction
2
Path Expressions Steps and expressions Axes and node tests Predicates
3
Operators and Functions
4
XPath examples
5
XPath 2.0
6
Reference Information
7
Exercise WebDam (INRIA)
XPath
May 30, 2013
7 / 36
Path Expressions
Steps and expressions
XPath Context A step is evaluated in a specific context [< N1 , N2 , · · · , Nn >, Nc ] which consists of: a context list < N1 , N2 , · · · , Nn > of nodes from the XML tree; a context node Nc belonging to the context list. Information on the context The context length n is a positive integer indicating the size of a contextual list of nodes; it can be known by using the function last(); The context node position c ∈ [1, n] is a positive integer indicating the position of the context node in the context list of nodes; it can be known by using the function position().
WebDam (INRIA)
XPath
May 30, 2013
8 / 36
Path Expressions
Steps and expressions
XPath steps The basic component of XPath expression are steps, of the form: axis::node-test[P1 ][P2 ]. . . [Pn] axis is an axis name indicating what the direction of the step in the XML tree is (child is the default). node-test is a node test, indicating the kind of nodes to select. Pi is a predicate, that is, any XPath expression, evaluated as a boolean, indicating an additional condition. There may be no predicates at all. Interpretation of a step A step is evaluated with respect to a context, and returns a node list.
Example descendant::C[@att1=’1’] is a step which denotes all the Element nodes named C, descendant of the context node, having an Attribute node att1 with value 1 WebDam (INRIA)
XPath
May 30, 2013
9 / 36
Path Expressions
Steps and expressions
Path Expressions A path expression is of the form: [/]step1 /step2/. . . /stepn A path that begins with / is an absolute path expression; A path that does not begin with / is a relative path expression.
Example /A/B is an absolute path expression denoting the Element nodes with name B, children of the root named A
./B/descendant::text() is a relative path expression which denotes all the Text nodes descendant of an Element B, itself child of the context node
/A/B/@att1[. > 2] denotes all the Attribute nodes @att1 (of a B node child of the A root element) whose value is greater than 2 . is a special step, which refers to the context node. Thus, ./toto means the same thing as toto. WebDam (INRIA)
XPath
May 30, 2013
10 / 36
Path Expressions
Steps and expressions
Evaluation of Path Expressions Each step stepi is interpreted with respect to a context; its result is a node list. A step stepi is evaluated with respect to the context of stepi −1 . More precisely: For i = 1 (first step) if the path is absolute, the context is a singleton, the root of the XML tree; else (relative paths) the context is defined by the environment; For i > 1 if N =< N1 , N2 , · · · , Nn > is the result of step stepi −1 , stepi is successively evaluated with respect to the context [N , Nj ], for each j ∈ [1, n]. The result of the path expression is the node set obtained after evaluating the last step.
WebDam (INRIA)
XPath
May 30, 2013
11 / 36
Path Expressions
Steps and expressions
Evaluation of /A/B/@att1 The path expression is absolute: the context consists of the root node of the tree.
The first step, A, is evaluated with respect to this context.
Document
Element A
Element B
Attr att1
Element D
Element B
Element D
1
WebDam (INRIA)
Attr att1
Element D
2
Text -
Text -
Text -
Text 1
Text 2
Text 3
XPath
May 30, 2013
12 / 36
Path Expressions
Steps and expressions
Evaluation of /A/B/@att1 The result is A, the root element.
A is the context for the evaluation of the second step, B.
Document
Element A
Element B
Attr att1
Element B
Element D
Text -
Text -
Text -
Text 1
Text 2
Text 3
1
WebDam (INRIA)
Attr att1
Element D
Element D
2
XPath
May 30, 2013
12 / 36
Path Expressions
Steps and expressions
Evaluation of /A/B/@att1 The result is a node list with two nodes B[1], B[2].
@att1 is first evaluated with the context node B[1].
Document
Element A
Element B
Attr att1
Element B
Element D
Text -
Text -
Text -
Text 1
Text 2
Text 3
1
WebDam (INRIA)
Attr att1
Element D
Element D
2
XPath
May 30, 2013
12 / 36
Path Expressions
Steps and expressions
Evaluation of /A/B/@att1 The result is the attribute node of B[1].
Document
Element A
Element B
Attr att1
Element D
Element B
Element D
1
WebDam (INRIA)
Attr att1
Element D
2
Text -
Text -
Text -
Text 1
Text 2
Text 3
XPath
May 30, 2013
12 / 36
Path Expressions
Steps and expressions
Evaluation of /A/B/@att1
@att1 is also evaluated with the context node B[2].
Document
Element A
Element B
Attr att1
Element D
Element B
Element D
1
WebDam (INRIA)
Attr att1
Element D
2
Text -
Text -
Text -
Text 1
Text 2
Text 3
XPath
May 30, 2013
12 / 36
Path Expressions
Steps and expressions
Evaluation of /A/B/@att1 The result is the attribute node of B[2].
Document
Element A
Element B
Attr att1
Element D
Element B
Element D
WebDam (INRIA)
Attr att1
Element D
2
1
Text -
Text -
Text -
Text 1
Text 2
Text 3
XPath
May 30, 2013
12 / 36
Path Expressions
Steps and expressions
Evaluation of /A/B/@att1 Final result: the node set union of all the results of the last step, @att1.
Document
Element A
Element B
Attr att1
Element D
Element B
Element D
1
WebDam (INRIA)
Attr att1
Element D
2
Text -
Text -
Text -
Text 1
Text 2
Text 3
XPath
May 30, 2013
12 / 36
Path Expressions
Axes and node tests
Axes An axis = a set of nodes determined from the context node, and an ordering of the sequence.
child (default axis). parent Parent node. attribute Attribute nodes. descendant Descendants, excluding the node itself. descendant-or-self Descendants, including the node itself. ancestor Ancestors, excluding the node itself. ancestor-or-self Ancestors, including the node itself. following Following nodes in document order. following-sibling Following siblings in document order. preceding Preceding nodes in document order. preceding-sibling Preceding siblings in document order. self The context node itself. WebDam (INRIA)
XPath
May 30, 2013
13 / 36
Path Expressions
Axes and node tests
Examples of axis interpretation Child axis: denotes the Element or Text children of the context node. Important: An Attribute node has a parent (the element on which it is located), but an attribute node is not one of the children of its parent.
WebDam (INRIA)
Result of child::D (equivalent to D) Document
Element A
Element B
Element D
Element D
Element D
Text -
Text -
Text -
Text 1
Text 2
Text 3
XPath
Element C
Element B
Attr att1
Attr att2
2
3
May 30, 2013
14 / 36
Path Expressions
Axes and node tests
Examples of axis interpretation Parent axis: denotes the parent of the context node. The node test is either an element name, or * which matches all names, node() which matches all node types. Always a Element or Document node, or an empty node-set (if the parent does not match the node test or does not satisfy a predicate). .. is an abbreviation for parent::node(): the parent of the context node, whatever its type. WebDam (INRIA)
Result of parent::node() (may be abbreviated to ..) Document
Element A
Element B
Element D
Element D
Element D
Text -
Text -
Text -
Text 1
Text 2
Text 3
XPath
Element C
Element B
Attr att1
Attr att2
2
3
May 30, 2013
14 / 36
Path Expressions
Axes and node tests
Examples of axis interpretation Attribute axis: denotes the attributes of the context node. The node test is either the attribute name, or * which matches all the names.
Result of attribute::* (equiv. to @*) Document
Element A
Element B
WebDam (INRIA)
Element D
Element D
Element D
Text -
Text -
Text -
Text 1
Text 2
Text 3
XPath
Element C
Element B
Attr att1
Attr att2
2
3
May 30, 2013
14 / 36
Path Expressions
Axes and node tests
Examples of axis interpretation Descendant axis: all the Result of descendant::node() descendant nodes, except Document the Attribute nodes. The node test is either Element A the node name (for Element nodes), or * (any ElElement Element Element ement node) or text() C B B (any Text node) or node() Attr Attr Attr Element Element Element (all nodes). att2 att1 att1 D D D 3 1 2 The context node does not Text Text Text belong to the result: use -
descendant-or-self
Text 1
Text 2
Text 3
instead.
WebDam (INRIA)
XPath
May 30, 2013
14 / 36
Path Expressions
Axes and node tests
Examples of axis interpretation Descendant axis: all the descendant nodes, except the Attribute nodes. The node test is either the node name (for Element nodes), or * (any Element node) or text() (any Text node) or node() (all nodes). The context node does not belong to the result: use
descendant-or-self
Result of descendant::* Document
Element A
Element B
Attr att1
Element D
Element C
Element B
Element D
Element D
1
Text -
Text -
Text -
Text 1
Text 2
Text 3
Attr att1
Attr att2
2
3
instead.
WebDam (INRIA)
XPath
May 30, 2013
14 / 36
Path Expressions
Axes and node tests
Examples of axis interpretation Ancestor axis: all the ancestor nodes. The node test is either the node name (for Element nodes), or node() (any Element node, and the Document root node ). The context node does not belong to the result: use ancestor-or-self instead.
WebDam (INRIA)
Result of ancestor::node() Document
Element A
Element B
Element D
Element D
Element D
Text -
Text -
Text -
Text 1
Text 2
Text 3
XPath
Element C
Element B
Attr att1
Attr att2
2
3
May 30, 2013
14 / 36
Path Expressions
Axes and node tests
Examples of axis interpretation Following axis: all the nodes that follows the context node in the document order. Attribute nodes are not selected. The node test is either the node name, * text() or node(). The axis preceding denotes all the nodes the precede the context node.
WebDam (INRIA)
Result of following::node() Document
Element A
Element B
Element D
Element D
Element D
Text -
Text -
Text -
Text 1
Text 2
Text 3
XPath
Element C
Element B
Attr att1
Attr att2
2
3
May 30, 2013
14 / 36
Path Expressions
Axes and node tests
Examples of axis interpretation Following sibling axis: all the nodes that follows the context node, and share the same parent node. Same node tests as descendant or following. The axis
Result of following-sibling::node()
preceding-sibling
Element D
denotes all the nodes the precede the context node.
WebDam (INRIA)
Document
Element A
Element B
Element D
Element D
Text -
Text -
Text -
Text 1
Text 2
Text 3
XPath
Element C
Element B
Attr att1
Attr att2
2
3
May 30, 2013
14 / 36
Path Expressions
Axes and node tests
Abbreviations (summary) Summary of abbrevations:
somename . .. @someattr a//b //a /
child::somename self::node() parent::node() attribute::someattr a/descendant-or-self::node()/b /descendant-or-self::node()/a /self::node()
Examples @b selects the b attribute of the context node
../* selects all element siblings of the context node, itself included (if it is an element node)
//@someattr selects all someattr attributes wherever their position in the document WebDam (INRIA)
XPath
May 30, 2013
15 / 36
Path Expressions
Axes and node tests
Node Tests (summary) A node test has one of the following forms:
node() any node text() any text node * any element (or any attribute for the attribute axis) ns:* any element or attribute in the namespace bound to the prefix ns ns:toto any element or attribute in the namespace bound to the prefix ns and whose name is toto Examples a/node() selects all nodes which are children of a a node, itself child of the context node
xsl:* selects all elements whose namespace is bound to the prefix xsl and that are children of the context node /* selects the top-level element node WebDam (INRIA)
XPath
May 30, 2013
16 / 36
Path Expressions
Predicates
XPath Predicates Boolean expression, built with tests and the Boolean connectors and and
or (negation is expressed with the not() function); a test is ◮ ◮
either an XPath expression, whose result is converted to a Boolean; a comparison or a call to a Boolean function.
Important: predicate evaluation requires several rules for converting nodes and node sets to the appropriate type.
Example //B[@att1=1]: nodes B having an attribute att1 with value 1;
//B[@att1]: all nodes B having an attributes named att1! ⇒ @att1 is an XPath expression whose result (a node set) is converted to a Boolean.
//B/descendant::text()[position()=1]: the first Text node descendant of each node B. Can be abbreviated to //B/descendant::text()[1]. WebDam (INRIA)
XPath
May 30, 2013
17 / 36
Path Expressions
Predicates
Predicate evaluation A step is of the form axis::node-test[P].
Ex.: /A/B/descendant::text()[1] Document
First
axis::node-test Element A
is evaluated: one obtains an intermediate result I Second, for each node in I, P is evaluated: the step result consists of those nodes in I for which P is true.
WebDam (INRIA)
Element B
Element D
Element D
Element D
Text -
Text -
Text -
Text 1
Text 2
Text 3
XPath
Element C
Element B
Attr att1
Attr att2
2
3
May 30, 2013
18 / 36
Path Expressions
Predicates
Predicate evaluation Result
of
/A/B/descendant::text()[1] Beware: an XPath step is always evaluated with respect to the context of the previous step. Here the result consists of those Text nodes, first descendant (in the document order) of a node B.
WebDam (INRIA)
Document
Element A
Element B
Element D
Element D
Element D
Text -
Text -
Text -
Text 1
Text 2
Text 3
XPath
Element C
Element B
Attr att1
Attr att2
2
3
May 30, 2013
18 / 36
Path Expressions
Predicates
XPath 1.0 Type System Four primitive types: Type Description boolean Boolean values number Floating-point string Ch. strings nodeset Node set
Literals none 12, 12.5 "to", ’ti’ none
Examples true(), not($a=3)
1 div 33 concat(’Hello’,’!’) /a/b[c=1 or @e]/d
The boolean(), number(), string() functions convert types into each other (no conversion to nodesets is defined), but this conversion is done in an implicit way most of the time. Rules for converting to a boolean: A number is true if it is neither 0 nor NaN. A string is true if its length is not 0. A nodeset is true if it is not empty. WebDam (INRIA)
XPath
May 30, 2013
19 / 36
Path Expressions
Predicates
Rules for converting a nodeset to a string: The string value of a nodeset is the string value of its first item in document order. The string value of an element or document node is the concatenation of the character data in all text nodes below. The string value of a text node is its character data. The string value of an attribute node is the attribute value.
Examples (Whitespace-only text nodes removed)
tata
string(/) string(/a/@toto) boolean(/a/b) boolean(/a/e) WebDam (INRIA)
"tata" "3" true() false() XPath
May 30, 2013
20 / 36
Operators and Functions
Outline 1
Introduction
2
Path Expressions
3
Operators and Functions
4
XPath examples
5
XPath 2.0
6
Reference Information
7
Exercise
WebDam (INRIA)
XPath
May 30, 2013
21 / 36
Operators and Functions
Operators The following operators can be used in XPath.
+, -, *, div, mod standard arithmetic operators (Example: 1+2*-3). Warning! div is used instead of the usual /. or, and boolean operators (Example: @a and c=3) =, != equality operators. Can be used for strings, booleans or numbers. Warning! //a!=3 means: there is an a element in the document whose string value is different from 3. relational operators (Example: ($a0)). Warning! Can only be used to compare numbers, not strings. If an XPath expression is embedded in an XML document, < must be escaped as