Critical Review: A formal semantics of patterns in XSLT
Asif Muhammad
Department of Computer Science
COMSATS University Islamabad, Pakistan
Abstract: This journal paper, published in 2000, states that the W3C specification of XSLT is ambiguous and unclear. To address this, it presents a formal semantics of the XSLT pattern language in the form of a denotational semantics. According to the author, the resulting semantics is clear and concise, and it can be used to prove properties of the language, which may help the future development of XSLT.
Keywords: General programming languages, Markup languages, XSLT
I. INTRODUCTION
XSLT is a language for transforming XML documents into other XML documents. A key element of XSLT is its sub-language of patterns, which is used for matching and selection. The pattern language of XSLT has since evolved into XPath, a language of selection paths and expressions that provides core functionality for both XSLT and XPointer. The author claims that the English specification of the pattern language in the 16 December 1998 draft of XSLT is unclear and ambiguous. As a solution, the paper presents a formal semantics of the pattern language. According to the author, this semantics is clear and concise, fitting on one page of formulas. With it, one can state and prove properties of the language precisely, and it can also help to spot and correct problems in the language design.

The paper introduces a formal denotational semantics for a fragment of XSLT. As the author points out, writing down such a semantics quickly exposes inaccuracies and potential ambiguities in the rules. No prior knowledge of denotational semantics is assumed: the paper includes a tutorial on sets, relations, and denotational semantics, together with a small XML data model similar to the XML DOM API. The semantics was developed using standard programming language techniques, and it brings to light issues that can be hard to spot in an English-language description. For example, matching is a central concept in XSLT. Here is how it was described in the specification:
"The result of the MatchExpr is true if, for any node in the document that contains the context of the MatchExpr, the result of evaluating the SelectExpr with that node as context contains the context of the MatchExpr. Otherwise the result is false."

In the specification, the attempt to make this sentence precise has made it almost unreadable. The sentence is also ambiguous, as discussed under "Lessons from the semantics" below. In contrast, the author's formal specification states the same information in a single line, is easier to read than the English, and avoids the ambiguity. As a result, the definition of match patterns in XSLT was simplified and made easier to implement. The formal semantics is written in the style known as denotational semantics and also draws on techniques from functional programming. It was developed and debugged by transliterating it into the functional language Haskell; in related work, Wallace and Runciman have developed Haskell programs for manipulating XML [7]. Finally, the author argues that the same techniques can be extended to give a denotational semantics of the entire XPath language.

II. PROBLEM DESCRIPTION
A. A data model for XML
As a first step, a mathematical data model for XML is presented. It is based mainly on Section 2.6 of the XSL Working Draft (Version 1.0) of 16 December 1998 [10], and it is similar to the DOM. In this model, the basic data type is Node. Each Node is one of six kinds: root, element, attribute, text, comment, or processing instruction. This is indicated with six Boolean functions; for each node, exactly one of them yields true. Further functions relate nodes to other nodes in the document, and these functions obey various laws: for instance, only a root or element can have children, and only an element can have attribute nodes. The return types of these functions are as follows:
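The data model just described can be sketched in Python to make the node kinds and laws concrete. This is a minimal illustration, not the paper's definition: the `Node` class, its fields `name_` and `data`, and predicate names such as `is_element` are hypothetical. The `name` and `value` functions follow the XPath reading discussed in this review (empty names for root, text and comment nodes; root and element values built from their children); restricting `value` to element and text children, as XPath 1.0 string-values do, is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional

# The six node kinds of the data model.
KINDS = ("root", "element", "attribute", "text", "comment", "processing-instruction")


@dataclass(eq=False)  # nodes compare and hash by identity
class Node:
    kind: str
    name_: str = ""  # hypothetical field: element, attribute or PI name
    data: str = ""   # hypothetical field: character data of leaf nodes
    children: List[Node] = field(default_factory=list)
    attributes: List[Node] = field(default_factory=list)
    parent: Optional[Node] = field(default=None, repr=False)

    def __post_init__(self) -> None:
        if self.kind not in KINDS:
            raise ValueError(f"unknown node kind: {self.kind}")
        # Laws of the model: only a root or element can have children,
        # and only an element can have attribute nodes.
        if self.children and self.kind not in ("root", "element"):
            raise ValueError("only root and element nodes have children")
        if self.attributes and self.kind != "element":
            raise ValueError("only element nodes have attributes")
        for sub in self.children + self.attributes:
            sub.parent = self


# Six Boolean functions: exactly one of them is true for every node.
def is_root(x: Node) -> bool: return x.kind == "root"
def is_element(x: Node) -> bool: return x.kind == "element"
def is_attribute(x: Node) -> bool: return x.kind == "attribute"
def is_text(x: Node) -> bool: return x.kind == "text"
def is_comment(x: Node) -> bool: return x.kind == "comment"
def is_processing_instruction(x: Node) -> bool: return x.kind == "processing-instruction"


def root(x: Node) -> Node:
    """The root of the document containing x."""
    while x.parent is not None:
        x = x.parent
    return x


def name(x: Node) -> str:
    # XPath reading: root, text and comment nodes have an empty name
    # (the DOM would use "#document", "#text" and "#comment").
    return "" if x.kind in ("root", "text", "comment") else x.name_


def value(x: Node) -> str:
    # Root and element values concatenate the values of their element and
    # text children, i.e. all descendant text (XPath 1.0 string-value);
    # the DOM leaves these values empty.
    if x.kind in ("root", "element"):
        return "".join(value(c) for c in x.children if c.kind in ("element", "text"))
    return x.data


doc = Node("root", children=[
    Node("element", "library", children=[
        Node("element", "author",
             attributes=[Node("attribute", "id", "a1")],
             children=[Node("text", data="Wadler")]),
        Node("comment", data="draft"),
    ]),
])
library = doc.children[0]
author = library.children[0]
print(name(doc) == "", value(library), root(author.attributes[0]) is doc)
# prints: True Wadler True
```

Setting `parent` in `__post_init__` lets each node be built bottom-up while still giving `root` a path upward; `repr=False` keeps the printed form from recursing through parent links.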
The above model reveals some small ambiguities in XPath, which does not say what the name of a root, text, or comment node should be. In fact, these names should be empty, but this is not entirely obvious, since the DOM uses a different definition. The DOM has a similar table, but there the names of root, text, and comment nodes are '#document', '#text', and '#comment', respectively, rather than empty, while the values of root and element nodes are empty rather than the concatenation of the values of their children.

B. A semantics for patterns
This section presents the semantics of matching and selection in the 16 December 1998 draft of XSLT. The semantics is given in Figure 13 (at the end of this document). A pattern may be used for matching or selection, written as follows:
• M[[p]]x denotes whether the pattern p matches against the node x
• S[[p]]x denotes the set of nodes selected by the pattern p when the context node is x
• Q[[q]]x denotes whether the qualifier q is satisfied when the context node is x
These semantic functions have the following types:
    M : Pattern → Node → Boolean
    S : Pattern → Node → Set(Node)
    Q : Qualifier → Node → Boolean
Q[[p]]x states that a pattern p acting as a qualifier is satisfied in context node x if selecting with pattern p in context node x yields a non-empty set of nodes. M[[p]]x states that a pattern p matches in context node x if there is any node x1 in the document such that selecting with pattern p in context x1 yields a set that contains the original node x. The selection rules are as follows:
• The pattern p1 | p2 with current node x selects the union of the nodes selected by p1 and p2, each with current node x. For example, Product[1] | Product[4].
• The pattern /p with current node x selects the same nodes as the pattern p with current node root(x), the root of the document containing node x. For example, /library.
• The pattern p1/p2 with current node x is the set of all nodes x2 selected as follows: let x1 be any node selected by p1 with current node x, and then let x2 be any node selected by p2 with current node x1. For example, /library/author.
• The seventh line says that the pattern n with current node x selects all children of x that are elements and have the name n.
• The pattern * with current node x selects all children of x that are elements.
• The ninth line says that the pattern @n with current node x selects all attribute nodes of x that have the name n. For example, /catalog/product[@dept="ACC"].
• The pattern . with current node x selects the set containing just x.
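The rules for S, M and Q above can be transcribed into a small executable sketch, in the spirit of the paper's own Haskell transliteration but written in Python here. Patterns are encoded as tuples, a hypothetical encoding (for example ("path", p1, p2) for p1/p2). The form ("qual", p, q) for p[q], which keeps the nodes selected by p that satisfy Q[[q]], is an assumption about how qualifiers attach to patterns; positional qualifiers such as Product[1] are not covered.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import FrozenSet, Iterator, List, Optional


@dataclass(eq=False)  # identity-based equality/hashing, so nodes can live in sets
class Node:
    kind: str  # "root", "element", "attribute", "text", ...
    name: str = ""
    children: List[Node] = field(default_factory=list)
    attributes: List[Node] = field(default_factory=list)
    parent: Optional[Node] = field(default=None, repr=False)

    def __post_init__(self) -> None:
        for sub in self.children + self.attributes:
            sub.parent = self


def root(x: Node) -> Node:
    while x.parent is not None:
        x = x.parent
    return x


def all_nodes(x: Node) -> Iterator[Node]:
    """x, its attribute nodes, and every node below it."""
    yield x
    yield from x.attributes
    for c in x.children:
        yield from all_nodes(c)


def S(p: tuple, x: Node) -> FrozenSet[Node]:
    """S[[p]]x: the nodes selected by pattern p with context node x."""
    tag = p[0]
    if tag == "union":  # p1 | p2
        return S(p[1], x) | S(p[2], x)
    if tag == "abs":  # /p: evaluate p from the root of x's document
        return S(p[1], root(x))
    if tag == "path":  # p1/p2
        return frozenset(x2 for x1 in S(p[1], x) for x2 in S(p[2], x1))
    if tag == "name":  # n: element children named n
        return frozenset(c for c in x.children if c.kind == "element" and c.name == p[1])
    if tag == "star":  # *: all element children
        return frozenset(c for c in x.children if c.kind == "element")
    if tag == "attr":  # @n: attribute nodes named n
        return frozenset(a for a in x.attributes if a.name == p[1])
    if tag == "self":  # .
        return frozenset({x})
    if tag == "qual":  # p[q]: nodes selected by p that satisfy q (assumption)
        return frozenset(x1 for x1 in S(p[1], x) if Q(p[2], x1))
    raise ValueError(f"unknown pattern: {p!r}")


def Q(q: tuple, x: Node) -> bool:
    """Q[[q]]x: the qualifier is satisfied if it selects at least one node."""
    return bool(S(q, x))


def M(p: tuple, x: Node) -> bool:
    """M[[p]]x: some node x1 of the document has x in S[[p]]x1."""
    return any(x in S(p, x1) for x1 in all_nodes(root(x)))


doc = Node("root", children=[
    Node("element", "library", children=[
        Node("element", "author", attributes=[Node("attribute", "id")]),
        Node("element", "book"),
    ]),
])
library = doc.children[0]
author, book = library.children

library_author = ("abs", ("path", ("name", "library"), ("name", "author")))  # /library/author
print(S(library_author, book) == {author})                         # prints: True
print(M(("name", "author"), author), M(("name", "author"), book))  # prints: True False
print(M(("qual", ("star",), ("attr", "id")), author))              # *[@id]; prints: True
```

Each branch of `S` mirrors one bullet above, and `M` is literally "there exists a node x1 in the document with x in S[[p]]x1", which makes the matching definition as short as the paper claims.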
C. Lessons from the semantics
The XSLT specification describes matching as follows: "The result of the MatchExpr is true if, for any node in the document that contains the context of the MatchExpr, the result of evaluating the SelectExpr with that node as context contains the context of the MatchExpr. Otherwise the result is false." This description is ambiguous. Should the phrase "any node in the document that contains the context of the MatchExpr" be read as "any node in (the document that contains the context of the MatchExpr)", meaning any node in the document whatsoever? Or should it be read as "(any node in the document) that contains the context of the MatchExpr", meaning any node in the document that is an ancestor of the context node of the MatchExpr? From the formal specification we can see that the second one is the intended meaning.
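The two readings can be compared concretely. The sketch below implements both for a small fragment of patterns (n, *, ., /p, p1/p2 and p1 | p2) and checks on one example tree that they never disagree; this is a spot check, not a proof. Counting the node itself among its "ancestors" is an assumption, since otherwise the pattern . could never match. The tuple encoding of patterns and all function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import FrozenSet, Iterator, List, Optional


@dataclass(eq=False)  # identity-based equality/hashing, so nodes can live in sets
class Node:
    kind: str  # only "root" and "element" are used in this sketch
    name: str = ""
    children: List[Node] = field(default_factory=list)
    parent: Optional[Node] = field(default=None, repr=False)

    def __post_init__(self) -> None:
        for c in self.children:
            c.parent = self


def root(x: Node) -> Node:
    while x.parent is not None:
        x = x.parent
    return x


def all_nodes(x: Node) -> Iterator[Node]:
    yield x
    for c in x.children:
        yield from all_nodes(c)


def ancestors_or_self(x: Optional[Node]) -> Iterator[Node]:
    while x is not None:
        yield x
        x = x.parent


def S(p: tuple, x: Node) -> FrozenSet[Node]:
    """Selection for a small fragment: n, *, ., /p, p1/p2, p1 | p2."""
    tag = p[0]
    if tag == "name":
        return frozenset(c for c in x.children if c.kind == "element" and c.name == p[1])
    if tag == "star":
        return frozenset(c for c in x.children if c.kind == "element")
    if tag == "self":
        return frozenset({x})
    if tag == "abs":
        return S(p[1], root(x))
    if tag == "path":
        return frozenset(x2 for x1 in S(p[1], x) for x2 in S(p[2], x1))
    if tag == "union":
        return S(p[1], x) | S(p[2], x)
    raise ValueError(f"unknown pattern: {p!r}")


def match_any_node(p: tuple, x: Node) -> bool:
    """Reading 1: start from any node in the document containing x."""
    return any(x in S(p, x1) for x1 in all_nodes(root(x)))


def match_from_ancestors(p: tuple, x: Node) -> bool:
    """Reading 2: start only from x itself or an ancestor of x."""
    return any(x in S(p, x1) for x1 in ancestors_or_self(x))


doc = Node("root", children=[
    Node("element", "a", children=[
        Node("element", "b"),
        Node("element", "c", children=[Node("element", "b")]),
    ]),
])
patterns = [
    ("name", "b"),
    ("path", ("name", "a"), ("name", "b")),
    ("path", ("star",), ("name", "b")),
    ("abs", ("name", "a")),
    ("self",),
    ("union", ("name", "a"), ("name", "c")),
]
disagreements = [
    (p, x.name) for p in patterns for x in all_nodes(doc)
    if match_any_node(p, x) != match_from_ancestors(p, x)
]
print(disagreements)  # prints: []
```

In this fragment every step moves downward except /p, whose result does not depend on the starting node, so any starting node that works is either an ancestor of x, x itself, or can be replaced by the root; that is why the two readings agree here.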
One benefit of a formal semantics is that it makes equivalences between patterns easier to prove. For example, p/., ./p, and p are all equivalent patterns. More interestingly, one can also show that p1/id(p2) is equivalent to id(p1/p2). Once this equivalence was formulated, it was decided that having two forms of expression that do the same thing was confusing, so patterns were further restricted to eliminate the first form in favor of the second.

D. Examples of formal semantics
In the light of the formal semantics, a few examples are listed here. For each, the W3C specification text is given first, followed by the semantics proposed in the paper:
• The context of the right hand side expressions is the context of the left hand side expression. The results of the right hand side expressions are node sets. The result of the left hand side UnionExpr is the union of the results of the right hand side expressions
• The result is the context of the IdentityExpr
• The result is the parent of the context of the ParentExpr. If the context is the root node, then the result is the empty node set
• If * is specified, then the result is the child elements of the context of the ElementExpr. Otherwise, the result is the set of all elements that are the children of the context of ElementExpr and whose name is equal to QName
• If * is specified, the result is the set of attribute nodes of the context of the AttributeExpr. If a QName is specified, the result is the attribute node of the context of the AttributeExpr named QName, or the empty node set if there is no such attribute node
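For comparison with the English text, the selection function S can be written out equation by equation. The equations below are reconstructed from the prose descriptions in this review (Section II-B and the specification excerpts above), not copied from the paper's Figure 13; the helper names children, attributes, parent, isElement, isRoot and nodes are assumptions modelled on the data model of Section II-A.

```latex
\[
\begin{aligned}
S[\![p_1 \mid p_2]\!]x &= S[\![p_1]\!]x \cup S[\![p_2]\!]x \\
S[\![/p]\!]x &= S[\![p]\!](\mathit{root}(x)) \\
S[\![p_1/p_2]\!]x &= \{\, x_2 \mid x_1 \in S[\![p_1]\!]x,\ x_2 \in S[\![p_2]\!]x_1 \,\} \\
S[\![n]\!]x &= \{\, x_1 \in \mathit{children}(x) \mid \mathit{isElement}(x_1),\ \mathit{name}(x_1) = n \,\} \\
S[\![*]\!]x &= \{\, x_1 \in \mathit{children}(x) \mid \mathit{isElement}(x_1) \,\} \\
S[\![@n]\!]x &= \{\, x_1 \in \mathit{attributes}(x) \mid \mathit{name}(x_1) = n \,\} \\
S[\![@*]\!]x &= \mathit{attributes}(x) \\
S[\![.]\!]x &= \{x\} \\
S[\![..]\!]x &= \begin{cases} \emptyset & \text{if } \mathit{isRoot}(x) \\ \{\mathit{parent}(x)\} & \text{otherwise} \end{cases} \\
Q[\![q]\!]x &= \big( S[\![q]\!]x \neq \emptyset \big) \\
M[\![p]\!]x &= \exists\, x_1 \in \mathit{nodes}(\mathit{root}(x)).\ x \in S[\![p]\!]x_1
\end{aligned}
\]
```

Each specification sentence above collapses to one line: the UnionExpr text becomes set union, IdentityExpr becomes the singleton {x}, and the ElementExpr and AttributeExpr texts become set comprehensions over children and attributes.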
III. CRITICAL REVIEW
This paper presents a formal semantics for the pattern language (later XPath) of the 16 December 1998 draft of XSLT. The author claims that the semantics is clear and concise, condensing multiple pages of English into a single page, and that it allows properties of the language to be proved. The semantics is based on standard programming language techniques and is written in the denotational style; according to the paper, it highlights issues that are difficult to find in an English-language description. It was developed and debugged by transliterating it into the functional language Haskell.

In the methodology section, a mathematical data model for XML is proposed based on the XSL specification, and it is very similar to the DOM. Since the model is so close to the DOM, one may ask why the author did not reuse the DOM with minor changes instead of reinventing the wheel. The author criticizes the way the DOM deviates from the XPath model; however, language implementations commonly deviate slightly from their original specifications as well. Another open question is why denotational semantics was chosen: other formal approaches, such as operational or axiomatic semantics, could serve the same purpose, but the paper gives no justification for this choice.

The formal semantics makes it easier to prove equivalences between patterns (e.g., commutative, associative, and distributive laws). As noted above, the equivalence of p1/id(p2) and id(p1/p2) led to eliminating the first form in favor of the second. In this way, only one implementation is needed, which helps to keep implementations small and precise.

The paper also suggests changes to the MatchExpr specification of XPath. It states that an XPath match expression searches among all ancestors of a given node, although this is not mentioned in the specification; as a result, a modified definition is given in denotational semantics. One may ask why the same restriction could not simply be stated in the English specification. Indeed, several of the author's modifications to the original specification could be applied to the English text as well. There is also a contradiction in the paper: in the proposed methodology a condition is imposed in the match expression, but in the next section the same condition is called irrelevant: "For example, the ambiguity noted previously becomes irrelevant: matching from any node in the document becomes equivalent to matching from any ancestor of the context node."
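The equivalences cited in this review (p/., ./p and p, together with laws such as commutativity of | and associativity of /) are easy to state once S is defined. The following sketch tests them exhaustively on one small tree for a handful of patterns. It is an illustration under a hypothetical tuple encoding of patterns, not a proof, and id() is omitted because it would require a model of ID attributes.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import product
from typing import FrozenSet, Iterator, List


@dataclass(eq=False)  # identity-based hashing so nodes can be set members
class Node:
    name: str  # every node here is an element, except the root (named "")
    children: List[Node] = field(default_factory=list)


def all_nodes(x: Node) -> Iterator[Node]:
    yield x
    for c in x.children:
        yield from all_nodes(c)


def S(p: tuple, x: Node) -> FrozenSet[Node]:
    tag = p[0]
    if tag == "name":  # n
        return frozenset(c for c in x.children if c.name == p[1])
    if tag == "star":  # *
        return frozenset(x.children)
    if tag == "self":  # .
        return frozenset({x})
    if tag == "path":  # p1/p2
        return frozenset(x2 for x1 in S(p[1], x) for x2 in S(p[2], x1))
    if tag == "union":  # p1 | p2
        return S(p[1], x) | S(p[2], x)
    raise ValueError(f"unknown pattern: {p!r}")


def equivalent(p: tuple, q: tuple, doc: Node) -> bool:
    """True if p and q select the same nodes from every context node of doc."""
    return all(S(p, x) == S(q, x) for x in all_nodes(doc))


doc = Node("", [Node("a", [Node("b"), Node("c", [Node("b")])]), Node("b")])
dot = ("self",)
base = [("name", "a"), ("name", "b"), ("star",), ("path", ("star",), ("name", "b"))]

ok = all(
    equivalent(("path", p, dot), p, doc) and equivalent(("path", dot, p), p, doc)
    for p in base
) and all(
    equivalent(("union", p, q), ("union", q, p), doc)
    and equivalent(("path", ("path", p, q), r), ("path", p, ("path", q, r)), doc)
    for p, q, r in product(base, repeat=3)
)
print(ok)  # prints: True
```

A test like this only samples one document; the point of the denotational definitions is that the same laws follow for every document directly from the set-comprehension form of S.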
The paper addresses a significant problem: the author's aim was to remove ambiguity from the XPath specification. With the provided model he appears to have achieved this, making a solid contribution to the body of knowledge by opening a new direction of work. The approach could also be extended to other markup languages. The methodology is well grounded in the literature and yields valid, reproducible results, and the conclusions drawn from them are justified. The overall writing style and structure are suitable for an expert audience.

IV. CONCLUSION AND FUTURE WORK
In this paper, we have seen that a formal semantics can be more concise and readable than English. It lets one formulate and prove precise statements about the meaning of programs, and such statements can assist the further development of a language. The proposed techniques can be applied to the entire XPath language.

REFERENCES
[1] Lloyd Allison, A Practical Introduction to Denotational Semantics, Cambridge University Press, 1987.
[2] Richard Bird, Introduction to Functional Programming, 2nd edition, Addison-Wesley, 1998.
[3] Lauren Wood, editor, Document Object Model (DOM), Version 1.0, W3C Recommendation, 1 October 1998. http://www.w3.org/TR/REC-DOMLevel-1/
[4] The Haskell home page, www.haskell.org.
[5] Larry Paulson, ML for the Working Programmer, 2nd edition, Cambridge University Press, 1998.
[6] David A. Schmidt, The Structure of Typed Programming Languages, MIT Press, Cambridge, MA, 367 pages, 1994.
[7] Malcolm Wallace and Colin Runciman, Haskell and XML: Generic Combinators or Type-Based Translation?, 4th International Conference on Functional Programming (ICFP 99), Paris, ACM Press, September 1999.
[8] Tim Bray, Jean Paoli, and C. M. Sperberg-McQueen, editors, Extensible Markup Language (XML) 1.0, W3C Recommendation, 10 February 1998. http://www.w3.org/TR/1998/REC-xml
[9] Steve DeRose, editor, XML Pointer Language (XPointer), W3C Working Draft. http://www.w3.org/TR/WD-xptr
[10] James Clark and Stephen Deach, editors, Extensible Stylesheet Language (XSL), W3C Working Draft, 16 December 1998. http://www.w3.org/TR/1998/WD-xsl-19981216
[11] James Clark, editor, XSL Transformations (XSLT), W3C Working Draft. http://www.w3.org/TR/WD-xslt
[12] James Clark and Steve DeRose, editors, XSL Path Language (XPath), W3C Working Draft, 9 July 1999. http://www.w3.org/1999/07/WD-xpath19990709