Extreme Markup Languages 2001
Montréal, Québec August 14-17, 2001
XTL: An XML Transformation Language and XSLT generator for XTL
Makoto Onizuka
Research Engineer, NTT CyberSpace Laboratories
http://www.cs.washington.edu/homes/makoto/research.htm
Keywords: XTL; Queries; Transformations; XPath; XSLT; Tool/implementation: XTL-to-XSLT translator; Contrasting approaches: XTL, Quilt; Exposition: XTL syntax
Abstract
This paper describes XTL (an XML Transformation Language), a new query and transformation language. XTL takes an approach that is both output driven and schema driven: 1) the output structure of a transformation is specified using an XML schema language (so far we have chosen DTD), and 2) well-formed input XML documents are mapped to the output structure using XPath expressions embedded in the DTD. XTL has a simple syntax, because it is declarative and adds only a few extensions. Users only have to understand the DTD and XPath specifications, plus the few extensions and rules of XTL. XTL is powerful because it has efficient operations for extracting and transforming XML data. This paper also describes an XTL processor that translates XTL expressions into XSLT expressions. This generator is useful for XSLT users who want to transform XML, because XTL is much simpler than XSLT. They can use the generator as a front-end tool for XSLT.
§ 1 What is XTL?
As data exchange (B2B and B2C) draws more attention, not only the standardization of XML subsets such as SMIL, MML, and B-XML but also transformation technology is becoming important for exchanging XML data between different structures. XTL is a transformation language that outputs XML data by querying and transforming a collection of input XML data. The related existing languages are:
1. XML query languages (XQL [XQL FAQ], XML-QL [A query language for XML], Quilt [Quilt: an XML Query Language])
2. XSL Transformations (XSLT [XSL Transformations (XSLT) Version 1.0] [XSL Transformations (XSLT) Version 1.1])
However, these languages have several issues.
1. They are not easy to learn or use because of their syntactic complexity (XSLT and Quilt, for example).
2. Their functionality is incomplete (for example, XQL lacks a restructuring operation, and XML-QL lacks structure-preserving queries).
XTL solves these issues. XTL's main functions are as follows.
1. To specify the output structure of a transformation using an XML schema language. So far, we have chosen DTD as the XML schema language because it is the most popular one. XML Schema and RELAX would also be good candidates for XTL.
2. To specify the mapping rules between well-formed input XML data and the output structure using XPath [XML Path Language (XPath) Version 1.0].
Therefore, XTL is easy to learn for people who know DTD and XPath. In addition, XTL's fundamental operations are based on the relational query model, as follows.
Table 1: XTL's fundamental operations use a relational query model
Operation | Relational query model | XTL expression
Projection | SELECT clause | Projected elements or attributes are specified using DTD
Selection | WHERE clause | XPath's selection is used
Rename tag | FROM clause | Renamed element or attribute is specified using DTD
Set operation (union, difference, intersect) | UNION clause, - clause | Using XTL extension +, -, *, / operations for node-sets specified by XPath. XPath 2.0 [XPath Language Requirement Version 2.0] will support these operations.
Cartesian product (join) | FROM clause (and WHERE clause, table1.column1 = table2.column2) | Using XSLT function document()
Sort | ORDER BY clause | Using XTL extension ORDER BY clause
Eliminate duplicated node | DISTINCT or GROUP BY clause | Using XTL extension GROUP BY clause
XTL includes several important XML query functions, as follows. The table below is summarized from the paper [XML Query Languages: Experiences and Exemplars], with several other important functions added.
Table 2: XTL includes XML query functions
Function | Description | XTL expression
Structure preserving | A query that preserves the structure of input XML data | By specifying DTD and XPath to preserve the structure of input XML data
Changing structure (including flattening) | A query that changes the structure of input XML data | XTL can change the structure of input XML data by using DTD and XPath
Tag variable | Keeping the same tag name as the input XML | Use $variable as a tag in DTD
External function | Invoking a user-defined function | An XPath selection condition can use a user-defined function
Specifying all sub-structures | Extracting all sub-structures of a specified tag | Use ANY in DTD
Recursive query | A query that is executed on a recursive structure | Define a recursive structure in DTD or a recursive selection in XPath
Reference (data models and navigations) | Referring to a referenced tag | Referring to a referenced tag using XPath
§ 2 Examples of XTL Expressions
This section presents several examples that explain the fundamental operations and functions of XTL.
2.1 Projection while preserving structure
The first example projects several tags of XML data while preserving structure. The input XML data is shown below as bib.xml (XML-QL examples).
An Introduction to Database Systems Date Addison-Wesley
Foundations for Object/Relational Databases Date Darwen Addison-Wesley
Data on the Web: from Relations to Semistructured Data & XML Serge Abiteboul Peter Buneman Dan Suciu Morgan-Kaufman
Mary Fernandez Alin Deutsch Dan Suciu Storing Semi-structured Data Using STORED ACM SIGMOD
Norman Ramsey Mary Fernandez The New Jersey Machine-Code Toolkit USENIX
Figure 1: Sample document: bibliography (bib.xml)
Suppose a query (or transformation) that makes a bibliography containing only books and eliminates all articles. The following XTL expression expresses this query.
<!ELEMENT bib AS {bib} (book*)>
<!ELEMENT book (title, author+)>
<!ATTLIST book year CDATA #REQUIRED>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (firstname?, lastname)>
Figure 2: Match “book” element
This XTL expression declares the required elements and attributes in DTD form: bib, book, and all of book's sub-structures. Therefore, it produces a bibliography that includes only books. To make the XTL processor easy to implement, XTL follows the rule that any element that can be a root tag must be declared with an XPath expression (AS {XPath expression}) in its element type declaration. The XPath expression specifies which part of the input XML data is mapped to a tag in the output structure. In this example, the bib tag in the input data is mapped to the bib tag in the output structure. When the XPath expression is omitted for an element in a content model or for an attribute in an attribute list declaration, XTL uses the specified output tag name as the default XPath expression. For example, has the same meaning as . However, it is a burden for users to specify all the sub-structures of book when they do not transform them at all. An XTL query using ANY in the DTD solves this issue.
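The effect of this projection can be sketched in Python. This is only an illustration of the intended result, not the XTL processor: the miniature input document, the publisher element, and the names CONTENT_MODEL, ATTRIBUTES, and project are assumptions made for the example, with the content model copied by hand from the declarations in Figure 2.

```python
import xml.etree.ElementTree as ET

# Assumed miniature of bib.xml; only the tag names come from the paper.
BIB = """<bib>
  <book year="1999">
    <title>Data on the Web</title>
    <author><firstname>Dan</firstname><lastname>Suciu</lastname></author>
    <publisher>Morgan-Kaufman</publisher>
  </book>
  <article><title>Storing Semi-structured Data Using STORED</title></article>
</bib>"""

# Hand-written stand-in for the element and attribute declarations of Figure 2.
CONTENT_MODEL = {
    "bib": ["book"],
    "book": ["title", "author"],
    "author": ["firstname", "lastname"],
}
ATTRIBUTES = {"book": ["year"]}


def project(node):
    """Copy only the declared attributes and children of node, recursively."""
    out = ET.Element(node.tag)
    for name in ATTRIBUTES.get(node.tag, []):
        if name in node.attrib:
            out.set(name, node.attrib[name])
    children = CONTENT_MODEL.get(node.tag)
    if children is None:
        # Not in the model: treated as a (#PCDATA) leaf, so copy its text.
        out.text = node.text
        return out
    for name in children:
        # The default XPath for a child is its own tag name, relative to node.
        for child in node.findall(name):
            out.append(project(child))
    return out


result = project(ET.fromstring(BIB))
print(ET.tostring(result, encoding="unicode"))
```

The article and the publisher element disappear from the output because nothing in the assumed content model declares them.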
2.2 ANY
This example with ANY produces the same query result as the previous XTL expression.
Figure 3: Same result using ANY
ANY plays the same role as specifying all the sub-structures (title, author, firstname, and lastname) of book in this XTL expression. Basically, ANY matches all sub-structures recursively unless another element type declaration matches them. ANY is powerful when only a small part of the input XML data has to change. The next example transforms the author name by concatenating firstname and lastname into a new author_name element, while keeping the rest of the input XML unchanged.
Figure 4: Power of ANY
In addition, this example contains a different use of the AS clause. If the AS clause is specified for #PCDATA or attribute data, it maps a value to an output node. If the AS clause is specified for an element, it maps an input node to an output node.
2.3 Selection
This example extracts tags that satisfy several conditions on elements or attributes. For example, let us make a bibliography that contains only books that were published after 1995 and whose title contains XML. The following XTL expression expresses this query.
<!ELEMENT bib AS {bib} (book*)>
<!ELEMENT book (title AS {title[contains(.,'XML')]},
<!ATTLIST book year AS {@year[.>1995]} CDATA #REQUIRED>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author ANY>
Figure 5: Match attribute values and strings
The XPath expression @year[.>1995] indicates that a book's year must be greater than 1995, and the expression title[contains(.,'XML')] indicates that a book's title must contain the string XML. Therefore, this XTL expression extracts the books whose year is greater than 1995 and whose title contains XML.
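As a rough illustration of what this selection computes, the two predicates can be written as plain Python tests. The sample document and its year values are assumptions (the attribute values of bib.xml are not reproduced here), and select_books is a hypothetical helper. The predicates are evaluated in Python because the standard library's ElementTree supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET

# Assumed sample data; the year values are invented for the illustration.
BIB = """<bib>
  <book year="1994"><title>An Introduction to Database Systems</title></book>
  <book year="1998"><title>Foundations for Object/Relational Databases</title></book>
  <book year="1999"><title>Data on the Web: from Relations to Semistructured Data &amp; XML</title></book>
</bib>"""


def select_books(bib_root):
    """Return the book elements that satisfy both selection conditions."""
    selected = []
    for book in bib_root.findall("book"):
        # Counterpart of @year[.>1995]; a missing year becomes NaN and fails.
        year_ok = float(book.get("year", "nan")) > 1995
        # Counterpart of title[contains(.,'XML')].
        title_ok = "XML" in book.findtext("title", default="")
        if year_ok and title_ok:
            selected.append(book)
    return selected


selected = select_books(ET.fromstring(BIB))
print([book.findtext("title") for book in selected])
```

Both conditions must hold for a book to be kept, which mirrors the way the declarations require both a matching title and a matching year attribute.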
2.4 Rename
Consider a transformation that capitalizes all tag names.
Figure 6: Manipulate tag names
For example, the first line of this XTL expression indicates that the bib tag is transformed to a Bib tag.
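The intended result of such a rename can be sketched as a recursive copy in Python. The input document and the function name capitalize_tags are assumptions for illustration, and the sketch leaves attribute names unchanged because the text only mentions tag names.

```python
import xml.etree.ElementTree as ET

# Assumed compact input, so the serialized output is easy to read.
BIB = '<bib><book year="1999"><title>Data on the Web</title></book></bib>'


def capitalize_tags(node):
    """Copy node, upper-casing the first letter of every element name."""
    new_tag = node.tag[:1].upper() + node.tag[1:]
    out = ET.Element(new_tag, dict(node.attrib))  # attributes are kept as-is
    out.text, out.tail = node.text, node.tail
    for child in node:
        out.append(capitalize_tags(child))
    return out


renamed = capitalize_tags(ET.fromstring(BIB))
print(ET.tostring(renamed, encoding="unicode"))
```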
2.5 Changing structure
The next example changes the structure to build a list of authors.
Figure 7: Group to change structure (1)
This example specifies a collection of author elements as child elements of the bib element, extracted using the XPath expression //author . Moreover, the GROUP BY clause eliminates duplicate author elements. The exclusion key specified with GROUP BY is {.} , which indicates the author element itself. This elimination is based on deep equality of the author structure. Let us look at a more complicated example that outputs a list of authors, each containing a list of that author's titles. This example turns the input structure between title and author upside down.
Figure 8: Group to change structure (2)
As in the previous example, this XTL expression specifies a collection of author elements as child elements of the bib tag. In addition, it specifies a collection of title tags as child elements of the author tag, to make a title list for each author. The XPath expression //title[../author=$author] collects the title elements from the input XML data whose parent tag has an author element equal to the current author (specified by $author ).
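The two grouping examples can be approximated in Python as follows. This is a sketch under stated assumptions: the input document is invented, deep_key is only one possible reading of deep equality, and the comparison ../author=$author is modelled through XPath string-values (the concatenated descendant text). Whether the output author keeps its firstname and lastname children is also an assumption.

```python
import copy
import xml.etree.ElementTree as ET

# Assumed input; compact markup keeps string-values free of stray whitespace.
BIB = (
    "<bib>"
    "<book><title>Data on the Web</title>"
    "<author><firstname>Dan</firstname><lastname>Suciu</lastname></author></book>"
    "<article>"
    "<author><firstname>Mary</firstname><lastname>Fernandez</lastname></author>"
    "<author><firstname>Dan</firstname><lastname>Suciu</lastname></author>"
    "<title>Storing Semi-structured Data Using STORED</title></article>"
    "</bib>"
)


def deep_key(node):
    """Hashable summary of a subtree, used as an assumed deep-equality key."""
    return (
        node.tag,
        tuple(sorted(node.attrib.items())),
        node.text or "",
        tuple(deep_key(child) for child in node),
    )


def string_value(node):
    """XPath string-value of an element: all descendant text, concatenated."""
    return "".join(node.itertext())


def authors_with_titles(root):
    """Build bib/author*/title* from an input that nests authors under titles' parents."""
    out = ET.Element("bib")
    seen = set()
    for author in root.iter("author"):            # counterpart of //author
        key = deep_key(author)
        if key in seen:                           # GROUP BY {.} drops duplicates
            continue
        seen.add(key)
        entry = copy.deepcopy(author)
        name = string_value(author)
        for parent in root.iter():                # //title[../author=$author]
            if any(string_value(a) == name for a in parent.findall("author")):
                for title in parent.findall("title"):
                    entry.append(copy.deepcopy(title))
        out.append(entry)
    return out


grouped = authors_with_titles(ET.fromstring(BIB))
for author in grouped:
    print(author.findtext("lastname"), [t.text for t in author.findall("title")])
```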
2.6 Sort
Sort is an operation that reorders a collection of elements in ascending or descending order using the value of a specified key tag. Let us look at a simple example that collects an author list and sorts it by name in alphabetical order.
Figure 9: Sort (1)
In addition to collecting an author list, which is the same operation as in the first example of changing structure, this example adds an ORDER BY clause to sort the author list. When users omit the ascending or descending keyword, ascending order is used by default. The next example sorts in descending order.
Figure 10: Sort (2)
Let us look at an example that has two sort operations in different places. The next example sorts both the author list and the title list of each author.
Figure 11: Sort (3)
The first ORDER BY clause indicates that the author list is sorted using lastname as the key value. The second ORDER BY clause indicates that the title list of each author is sorted using the title itself (expressed by .) as the key value.
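The nested sort can be illustrated with a short Python sketch. The sample document and the name sorted_author_index are assumptions; authors are grouped by lastname only, which is a simplification, and Python's sorted compares strings by code point, which may differ from the ordering an XSLT processor applies.

```python
import xml.etree.ElementTree as ET

# Assumed sample input with authors shared between publications.
BIB = (
    "<bib>"
    "<book><title>Data on the Web</title><author><lastname>Suciu</lastname></author></book>"
    "<article><author><lastname>Fernandez</lastname></author>"
    "<author><lastname>Suciu</lastname></author>"
    "<title>Storing Semi-structured Data Using STORED</title></article>"
    "<article><author><lastname>Fernandez</lastname></author>"
    "<title>The New Jersey Machine-Code Toolkit</title></article>"
    "</bib>"
)


def sorted_author_index(root, descending=False):
    """Return (lastname, sorted titles) pairs, ordered by lastname."""
    index = {}
    for parent in root.iter():
        titles = [title.text for title in parent.findall("title")]
        for author in parent.findall("author"):
            index.setdefault(author.findtext("lastname"), []).extend(titles)
    # Outer ORDER BY: lastname is the key; ascending unless stated otherwise.
    ordered = sorted(index.items(), key=lambda item: item[0], reverse=descending)
    # Inner ORDER BY: each title list is sorted by the title itself.
    return [(lastname, sorted(titles)) for lastname, titles in ordered]


author_index = sorted_author_index(ET.fromstring(BIB))
for lastname, titles in author_index:
    print(lastname, titles)
```

Passing descending=True reverses only the author order, which corresponds to adding a descending keyword to the first ORDER BY clause alone.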
2.7 Join
The join operation is a core operation in the relational model, because the relation is the unit of data and all information is divided into a collection of relations. In XTL, on the other hand, a join simply combines several XML documents into one. Let us look at an example that combines two XML documents. One input is a book catalogue (BookCatalogue.xml) that does not contain book prices, and the other is a bookstore list (BookCosts.xml) that includes the price of each book at each bookstore.
My Life and Times Paul McCartney July, 1998 94303-12021-43892 McMillin Publishing
Illusions The Adventures of a Reluctant Messiah Richard Bach 1977 0-440-34319-4 Dell Publishing Co.
The First and Last Freedom J. Krishnamurti 1954 0-06-064831-7 Harper & Row
Figure 12: Sample document: book catalogue (BookCatalogue.xml)
My Life and Times $12.95 $10.95
Illusions The Adventures of a Reluctant Messiah $5.95 $6.95
The First and Last Freedom $9.95 $8.95
Figure 13: Sample document: bookstore list with prices (BookCosts.xml)
The XTL expression below produces XML data that combines the two XML documents above.
<!ELEMENT Bib AS {BookCatalogue} (Book*)>
<!ELEMENT Book (Title, Author+, Date, ISBN, Publisher,
<!ELEMENT Title (#PCDATA)>
<!ELEMENT Author (#PCDATA)>
<!ELEMENT Date (#PCDATA)>
Figure 14: Join
The important part of this example is the XPath expression for the Cost tag, specified in the second-to-last line. The document function, which is defined in the XSLT specification, extracts part of another XML document (BookCosts.xml) and combines it with the base BookCatalogue.xml. The XPath expression BookCosts/Book/Cost , which follows the document function, specifies the extraction target that is mapped to the Cost tag. The predicate [../Title=$Title] specifies a selection condition on the Book tag: the Title tag under the Book tag must be equal to $Title . $Title is a sibling tag of the current Cost tag and is defined on the third line of this XTL example.
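The join described here can be mimicked in Python with two in-memory documents. The markup of both samples is an assumption reconstructed from the element names in the text (BookCatalogue, BookCosts, Book, Title, Author, Cost), and join_costs is a hypothetical helper rather than part of XTL.

```python
import copy
import xml.etree.ElementTree as ET

# Assumed shapes of the two inputs; only the tag names come from the paper.
CATALOGUE = (
    "<BookCatalogue>"
    "<Book><Title>My Life and Times</Title><Author>Paul McCartney</Author></Book>"
    "<Book><Title>The First and Last Freedom</Title><Author>J. Krishnamurti</Author></Book>"
    "</BookCatalogue>"
)
COSTS = (
    "<BookCosts>"
    "<Book><Title>My Life and Times</Title><Cost>$12.95</Cost><Cost>$10.95</Cost></Book>"
    "<Book><Title>The First and Last Freedom</Title><Cost>$9.95</Cost><Cost>$8.95</Cost></Book>"
    "</BookCosts>"
)


def join_costs(catalogue_root, costs_root):
    """Append to each catalogue Book the Cost elements of the same-titled Book."""
    out = ET.Element("Bib")
    for book in catalogue_root.findall("Book"):
        merged = copy.deepcopy(book)
        title = book.findtext("Title")            # plays the role of $Title
        for priced in costs_root.findall("Book"):
            # Counterpart of BookCosts/Book/Cost[../Title=$Title].
            if priced.findtext("Title") == title:
                for cost in priced.findall("Cost"):
                    merged.append(copy.deepcopy(cost))
        out.append(merged)
    return out


joined = join_costs(ET.fromstring(CATALOGUE), ET.fromstring(COSTS))
for book in joined:
    print(book.findtext("Title"), [cost.text for cost in book.findall("Cost")])
```

In XTL the second document is reached through the document function inside the embedded XPath, so the join condition lives in the declaration of Cost rather than in a separate loop.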
§ 3 XTL Specification
The Appendix shows the XTL syntax. The features of XTL and its extensions to DTD are as follows.
3.1 Output structure is specified with DTD
The DTD in XTL specifies the output structure of a transformation. Therefore, it is easy for users to understand and specify the transformation result.
3.2 Embedded XPath clause into DTD
An XPath clause is embedded into the DTD syntax for each element or attribute. This clause specifies a mapping rule from input XML to output XML. There are two types of mapping rule: 1) node map and 2) value map.
1. node map: This maps an input node (element or attribute) to an output node. In addition, it defines the current XPath context for child nodes. Basically, an XPath expression is relative to either its parent tag's XPath or the XPath of its enclosing parenthesis. For example, means that the result element is mapped from bib and the current XPath context is set to bib, so the next XPath book corresponds to bib/book . The other example has the same meaning as the previous XTL, written in a different way. The output element bo's XPath corresponds to bib/book/. in this case.
2. value map: This maps a value to an output text node (expressed using PCDATA for an element, or any attribute). For example, means that the value of the count element is set to the result of count(author).
It is possible to omit the XPath clause for an element or attribute. The default rule is that the output XML tag name is used as the XPath expression for a node map, and text() is used as the XPath expression for a value map. For example, has the same meaning as AS {book})> and has the same meaning as .
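The default rule and the relative-context rule can be stated as two small Python functions. Both names, effective_xpath and absolute_path, are hypothetical, and the sketch treats XPath expressions as plain strings instead of parsing them.

```python
def effective_xpath(tag, xpath=None, value_map=False):
    """Return the XPath used for a declaration, applying the default rule.

    An explicit AS clause wins; otherwise a node map defaults to the output
    tag name and a value map defaults to text().
    """
    if xpath is not None:
        return xpath
    return "text()" if value_map else tag


def absolute_path(context, relative):
    """Resolve a relative XPath against the context set by the enclosing node map."""
    return f"{context}/{relative}"


# A book child under a bib context, with the AS clause omitted.
print(absolute_path("bib", effective_xpath("book")))
# An element mapped with the explicit XPath "." under a bib/book context.
print(absolute_path("bib/book", effective_xpath("bo", xpath=".")))
```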
3.3 DTD Extension
To increase the flexibility of specifying the output structure, the DTD syntax is extended as follows.
1. The mixed content model extension makes it possible to handle #PCDATA and elements in the same way. Therefore, XTL can express declarations like, and .
2. The ORDER BY clause sorts a collection of a specific element using a specified key node. For example, means that a collection of author should be sorted using the author itself as the key, compared by deep equality.
3. GROUP BY eliminates duplicate elements using a specified key node. For example, means that a collection of author should not contain the same author twice, using deep equality of the author itself.
4. A tag variable makes it possible to map any input tag while keeping the same tag name. For example, means that $a is a tag variable and the input tag name (paper or article) is mapped to the result tag name.
3.4 Cardinality Constraint
In DTD, it is possible to specify an element cardinality (*, +, ?, none) in a content model. Because the DTD in XTL specifies the output structure of a transformation, XTL defines the cardinality of an element as a constraint on its mapping rule. For example, indicates that the XPath book should return a collection of book elements whose count is zero or more. indicates that the XPath book should return a collection of book elements whose count is one or more. XTL also has to define the meaning of ANY and EMPTY. ANY means that the element can have any content model (structure) without any constraint. EMPTY means that the element must be an empty element. For example, indicates that the output author can have any content model and that all sub-elements of the input author are mapped to the transformation result. On the other hand, indicates that the input author must be an empty element.
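Read as a check on the size of the node-set that an XPath returns, the cardinality constraint can be sketched in Python as below. The function name satisfies_cardinality is hypothetical, and the bounds follow the standard DTD meaning of the indicators (* is zero or more, + is one or more, ? is at most one, no indicator is exactly one), which is assumed here to carry over to XTL.

```python
def satisfies_cardinality(count, indicator=""):
    """Check a node-set size against a DTD cardinality indicator."""
    if indicator == "*":
        return count >= 0          # zero or more: always satisfied
    if indicator == "+":
        return count >= 1          # one or more
    if indicator == "?":
        return count <= 1          # optional: zero or one
    if indicator == "":
        return count == 1          # no indicator: exactly one
    raise ValueError(f"unknown cardinality indicator: {indicator!r}")


# A content model such as (book+) would reject an input with no book at all.
print(satisfies_cardinality(0, "+"))
print(satisfies_cardinality(3, "*"))
```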
3.5 Variable binding
A variable is automatically bound for each element and attribute. Any embedded XPath can refer to these variables, as long as the node assigned to the variable is reachable from the referring node via many-to-one or one-to-one associations in the DTD graph. For example, indicates that $b refers to the input bib element. If XTL has a recursive element definition, then the closest parent is referenced from its children.
§ 4 XSLT generator for XTL
I have implemented an XTL processor that translates XTL expressions into XSLT expressions. This generator is useful for users who want to transform XML, because XTL is much simpler than XSLT. They can use the generator as a front-end tool for XSLT.
This processor does not implement the XTL specification fully, because some parts of XTL are difficult to translate to XSLT owing to model gaps between XTL and XSLT. However, the generator translates most of XTL into XSLT, so users can use it for many XML transformations.
4.1 Translation from XTL to XSLT
This section briefly describes how the XTL processor translates an XTL expression into an XSLT expression. The details follow in the next section. First, let us recall how XSLT works. An XSLT expression is composed of several template declarations. An XSLT processor reads an XML document and checks whether any template matches an input element, attribute, or text node. If a template matches, it is applied and executed. If no template matches, the default action is applied, which outputs the input text.
4.1.1 Template generation
Basically, each element type declaration is translated into one XSLT template declaration. There are two patterns, regardless of the content model:
1. XPath is specified for a declared element like .
2. XPath is not specified for a declared element like .
There are two cases in which users must use the first pattern: 1) the element is a candidate root element, or 2) the element is a candidate descendant of an element declared as ANY. Otherwise, users can use either the first or the second pattern. This is a little confusing, but it reduces the number of template declarations that the XTL processor generates. An element declaration of the first pattern is translated into two template declarations like ... for case 1) and ... for case 2). An element declaration of the second pattern is translated into a template declaration like