Applications of Grammars

Johan Jeuring

Summer 2006
1. Introduction
Grammars are an important tool for describing languages. The course on Languages and Parsing gives many examples of grammars. The applications that have been used to introduce the different kinds of grammar formalisms range from grammars for natural languages to programming languages, to languages used to describe growth in biology. Parsers are used to recognize structure in sentences, and to give meaning to sentences. Languages, grammars, and parsers are the three main concepts introduced in this course.

What rôle do languages, grammars, and parsers play in software nowadays? From most books on grammars and parsing, including the lecture notes for this course, one can easily be led to believe that grammars are particularly useful for specifying the front end of a compiler. And they are. But there are many more situations where grammars play a rôle. This chapter tries to give an overview of situations where languages, grammars, and parsers play a rôle. Since there are so many applications of these concepts it is impossible to be complete. We will try to discuss some of the more well-known applications.

Note that in this chapter we use the term grammar as an alias for a structural description in a software system. This is a more liberal usage of the term than in the other material for this course, where a grammar consists of terminals, nonterminals, productions, and a start symbol. Usually there exist simple mappings from the grammars we give in this chapter to grammars in the formal sense. Figures 1, 2, and 3 show some representative examples of grammars.
    [axioms]  program      ::= declarations statements
    [decs]    declarations ::= declaration ";" declarations
    [nodec]   declarations ::= ε
    [dec]     declaration  ::= id ":" type
    [stats]   statements   ::= statement ";" statements
    [nostat]  statements   ::= ε
    [assign]  statement    ::= id ":=" expression
    [var]     expression   ::= id
    ...

    Figure 1: Example of a BNF grammar
Figure 1 is perhaps the most obvious example of a grammar. It describes a part of the concrete syntax of an imperative programming language. Figure 2 shows the structure of event traces for the execution of C programs as a collection of Haskell data types. The exact details are not important. Figure 3 gives an example of a Document Type Definition (DTD [11]) for recipes. A DTD describes the structure of (is a grammar for) XML documents.

    data ExecProg = ExecProg [Either ExecStmt EvalExpr]
    data ExecStmt = ExecStmt [Either ExecStmt EvalExpr]
    data EvalExpr = EvalCall   FuncCall
                  | EvalAssign Assign
                  | EvalOthers [EvalExpr]
    data FuncCall = Call [EvalExpr] [ExecStmt]
    data Assign   = Assign [EvalExpr] Desti
    data Desti    = ...

    Figure 2: Example of a grammar represented by Haskell data types

The message of this chapter is that grammars are a very important concept in Computer Science, comparable to, for example, classes and objects, and trees. It is easy to underestimate the importance of grammars, or to overlook their presence. Understanding the rôle grammars play in Computer Science helps in getting a deeper understanding of the field, and in seeing relations between ideas and concepts which otherwise remain unconnected.

This chapter is organized as follows. Section 2 gives a high-level description of software, and shows where grammars are used. Section 3 discusses the rôle of grammars in XML. The Extensible Markup Language (XML) is a simple, very flexible text format derived from SGML (ISO 8879). Originally designed to meet the challenges of large-scale electronic publishing, XML also plays an increasingly important rôle in the exchange of a wide variety of data on the Web and elsewhere. Grammars are used in many applications of XML. Section 4 gives a more detailed overview of the use of grammars in software, and describes applications in which grammars play an important rôle.
    <!ELEMENT collection   (description,recipe*)>
    <!ELEMENT description  ANY>
    <!ELEMENT recipe       (title,ingredient*,preparation,comment?,nutrition)>
    <!ELEMENT title        (#PCDATA)>
    <!ELEMENT ingredient   (ingredient*,preparation)?>
    <!ATTLIST ingredient   name          CDATA #REQUIRED
                           amount        CDATA #IMPLIED
                           unit          CDATA #IMPLIED>
    <!ELEMENT preparation  (step*)>
    <!ELEMENT step         (#PCDATA)>
    <!ELEMENT comment      (#PCDATA)>
    <!ELEMENT nutrition    EMPTY>
    <!ATTLIST nutrition    protein       CDATA #REQUIRED
                           carbohydrates CDATA #REQUIRED
                           fat           CDATA #REQUIRED
                           calories      CDATA #REQUIRED
                           alcohol       CDATA #IMPLIED>

    Figure 3: Example of a DTD
2. Software
Grammars are important for specifying the input (given by the user) and the output of software, and for defining functions (methods). We devote a subsection to each of these topics. Of course, the languages in which software is written and in which functions are defined are also specified by means of grammars, but we will not talk about those grammars in this section.
2.1. Specifying input and output
Most software expects some input, and produces output. For example, this holds for all software in Microsoft Office, all search engines such as Google and Yahoo, etc. It is actually harder to find examples of software that does not take input and produce output. Software that does not produce output cannot be inspected by anyone, and is likely to be useless. It is not hard to find examples of software that does not take input: think of screensavers, or a program that generates (the start of the sequence of) all primes. Still, software that takes no input is a small category.

Usually, the input of a piece of software follows some grammatical rules. These may be simple rules, such as the rule that a Dutch zip code consists of four digits, followed by a space, followed by two capital letters, or they may be complicated rules describing the grammar of a natural language. The former rules are used in almost every piece of software that asks a user for address information; the latter rules are used in natural-language translation software (English to Dutch, English to Spanish, etc.). Even games expect input to be of a particular form, but since this form often consists of single keystrokes (arrows for moves, function keys or space for actions like shooting or jumping, for example), structure does not really play a rôle here.

There are different ways to provide input. Most modern software packages communicate with the user via a user interface, in which a user can provide input, and the software can show output. There are many ways in which input can be provided:

➙ via drop-down lists. If a piece of software wishes to present a user with a restricted choice of options and does not want to risk an item being mistyped in a text field, a drop-down list is a good solution. For example, many Internet applications use drop-down lists for expiration dates of credit cards and for selecting a country.

➙ via one or more text fields. For example, if a user has to provide address information, text fields for giving first name, family name, street, city, state, country, zip code, phone number, etc. are given.

➙ via a text area. For example, programming environments and text editors such as Microsoft Word provide an area in which a user can type text. The software tries to recognize the structure, and provides help in structuring the program or text. Most programming environments provide syntax coloring, and support for finding the definition of a method that is used somewhere.

If input is obtained via drop-down lists it is not necessary to specify the structure of the input: only correct input can be selected by the user. Using text fields, it becomes important to specify the structure of the input precisely. For example, zip codes have a particular format, and it is important to correct a user if a zip code in a wrong format is entered. Usually, however, the amount of structure in a single text field is limited. Text areas in programming environments require a lot of knowledge about structure, and grammars for specifying the expected structure may be very rich. Grammars for programming languages usually consist of many pages. Structural formalisms are used in almost any piece of software. For simple cases such as zip codes they play a negligible rôle, but there are many applications where structural formalisms play a larger than superficial rôle. Section 4 gives an overview of such applications.
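The Dutch zip-code rule mentioned above is simple enough to capture in a few lines. Here is a minimal sketch in Haskell; the function name and the example value are invented for this illustration and are not part of the original notes.

    import Data.Char (isAsciiUpper, isDigit)

    -- Four digits, a space, and two capital letters, e.g. "3512 JE".
    isDutchZipCode :: String → Bool
    isDutchZipCode s = case s of
      [d1, d2, d3, d4, ' ', c1, c2] →
        all isDigit [d1, d2, d3, d4] && all isAsciiUpper [c1, c2]
      _ → False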
2.2. Defining functions
How do we define a function that takes input and produces output? There are many ways to define a function. We might specify a function by giving all inputs and the corresponding outputs. This only works for finite input domains. If the input domain is infinite, we may define a function by means of recursion, for example. We discuss some important ways to define functions here. First, assume the domain of a function is finite. For example, the domain might be the days of the week, and the function might return the opening
hours of a museum on those weekdays; or the domain might be the months of the year, and the function returns the special expositions that are exhibited in each month. If the domain of a function is finite, the function can be defined by specifying for each value in the domain the result of the function. The function is essentially a table that relates the values in the domain to their result values.

Many domains are infinite. A simple example of an infinite domain is the natural numbers. It follows that lists or trees of natural numbers are infinite domains as well. Actually, almost all kinds of trees used in software are infinite domains. If the domain of a function is infinite we cannot define it by means of a table, so we have to think of alternative means for defining functions on infinite domains. In the rest of this section we briefly discuss defining functions on finite descriptions of infinite domains, and defining functions as inverses of other functions. There are other ways, such as defining a function as the fixed point of another function, but we will not be complete.

The description of an infinite domain is often finite. For example, we can consider the following Haskell data type for natural numbers

    data Nat = Zero | Succ Nat

as a finite description of an infinite domain. We consider such a finite description of an infinite domain to be a grammar. To define a function on the natural numbers, we define the function on this finite description of an infinite domain. For example, on natural numbers we specify the result of a function for the value Zero, and for the value Succ n, where n may be any natural number. Often, functions on natural numbers (or on other finite descriptions of infinite domains) are defined by means of recursion. For example, here is a, computationally inefficient, definition of the Fibonacci function:

    fib :: Nat → Nat
    fib Zero            = Succ Zero
    fib (Succ Zero)     = Succ Zero
    fib (Succ (Succ n)) = fib (Succ n) + fib n
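The definition of fib uses + on Nat values, which is not predefined for this data type. A minimal sketch of such an addition, written by recursion on the same finite description, might look as follows; the name add is invented here, and the + in fib above can be read as this function.

    add :: Nat → Nat → Nat
    add Zero     n = n
    add (Succ m) n = Succ (add m n)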
Another way to define a function is as the inverse of another function. As an example we will have a look at assignments. Here is an example of a sequence of assignments:
    x := 3;
    if x > 0 then x := 0

We use the following (part of a) simple data type to represent abstract syntax trees for statements:

    data Stats = Assign String Expr
               | Sequence Stats Stats
               | IfThen Expr Stats
               | Skip

We assume Expr is a data type that can be used to represent standard arithmetical and Boolean expressions. Here is how the above example sequence of statements is represented in this abstract syntax:

    s :: Stats
    s = Sequence (Assign "x" (Const 3))
                 (IfThen (GT (Var "x") (Const 0))
                         (Assign "x" (Const 0)))

A value of the abstract syntax for statements can be printed as a program by means of the function print:

    print :: Stats → String
    print (Assign v e)     = v ++ " := " ++ printExpr e
    print (Sequence s1 s2) = print s1 ++ ";\n" ++ print s2
    print (IfThen e s)     = "if " ++ printExpr e ++ " then " ++ print s
The function parse can now be defined as the inverse of the function print. This is quite a common pattern for specifying or defining functions: functions that store values in a database, or that compress values, are other examples that can be specified in this way.
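A minimal sketch of this specification style, assuming a function parse :: String → Stats and an Eq instance for Stats (neither of which is defined in these notes): parse should undo print.

    -- parse is specified as a left inverse of print.
    prop_parseAfterPrint :: Stats → Bool
    prop_parseAfterPrint s = parse (print s) == s

Such a property can, for example, be checked with a property-based testing tool, or used as the specification from which a definition of parse is derived.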
3. XML
Since its release in 1998, the XML [11] standard has become very popular, and its use is widespread. XML is used a lot for exchanging documents that follow a particular structure. For example, here is a recipe that follows the structure described by the DTD for recipes given in Figure 3:

    <recipe>
      <title>Tea with milk and sugar</title>
      <ingredient name="Tea" amount="1" unit="cup">
        <ingredient name="Water" amount="1" unit="cup"></ingredient>
        <ingredient name="Tea leaves" amount="1" unit="gram"></ingredient>
        <preparation>
          <step>Cook the water, add the tea leaves.</step>
          <step>Remove the tea leaves after a couple of minutes.</step>
        </preparation>
      </ingredient>
      <ingredient name="Milk" amount="1" unit="spoon"></ingredient>
      <ingredient name="Sugar" amount="1" unit="spoon"></ingredient>
      <preparation>
        <step>Add the milk and the sugar to the tea. Stir.</step>
      </preparation>
      <nutrition protein="0" carbohydrates="10" fat="0" calories="100"/>
    </recipe>

This document can be viewed as a sentence in the language specified by the recipe DTD. The ability to specify the structure of a document by means of a DTD is one of the most important features of XML. It follows that the XML world contains many examples of the use of grammars in documents and software. This section assumes a basic knowledge of XML.
3.1. DTDs
There exist thousands of DTDs. For almost any domain DTDs have been developed. One of the most well-known DTDs is the XHTML DTD, used for writing web pages. There are DTDs for mathematical markup (MathML), for standards related to mobile phones (the Open Mobile Alliance maintains about a hundred DTDs), for music (MusicML), for graphics (SVG), etc. Each of these DTDs specifies a grammar to which documents of the DTD have to adhere.

The single most important usage of DTDs is validation. A validator checks whether or not an XML document conforms to a DTD. A well-known validator is the W3C validator for checking whether or not a web page contains valid XHTML. Conformance supports interoperability, and the possibility to develop components that use the structure of documents to offer particular functionality. For the recipe example we can think of:

➙ a component that, given a number of recipes, returns a shopping list, and/or the price of the recipes in different shops, . . .

➙ a component that, given a number of recipes, calculates a health score, and/or gives advice about a next recipe, and/or gives advice about wines that go well with the food, . . .

➙ a component that, given a number of ingredients, selects recipes that can be prepared with the ingredients.

Of course, we can only develop these components if we know where to find, for example, the name, amount, and unit of an ingredient in a recipe document.

Validating a document with respect to a DTD is relatively easy. According to the XML standard, the content models of a DTD have to be 1-deterministic, which implies that we can determine which element to expect next without looking ahead in the document. So validation is a simple kind of parsing. The 1-determinism requirement does impose a restriction on DTDs. For example, the following two content models

    ((b,c)|(b,d))
    elem?,elem
both need more than one element of lookahead. Often content models can be rewritten into 1-deterministic form. For the above examples, the following equivalent content models are 1-deterministic:

    (b,(c|d))
    elem,elem?

The 1-deterministic content-model restriction has been imposed to simplify writing parsers and validators for XML documents. As a consequence, all DTD writers have to know about this restriction, and have to know how to avoid running into it.

Besides the possibility to use and/or develop components for DTDs, using DTDs for XML documents has several other advantages:

➙ Most modern XML editors are DTD-aware, and support the construction of valid documents by suggesting which elements can be inserted at which position, which attributes have to be defined, etc.

➙ There exist DTD-aware XML tools which support storing and querying XML documents in databases.

➙ DTD-aware XML compression tools compress XML documents much better than other XML compression tools.
3.2. DTD-aware XML compression tools
As an example of how DTDs can be used to improve a particular application, we discuss DTD-aware XML compression tools. Suppose we have a web server that has millions of recipes in its database, and gets millions of requests for recipes per day. Since XML is a very verbose format, it pays off to store and send the recipes in a compressed format. We could use standard compression programs like zip to compress recipes, but we can also try to use the fact that we know the DTD of recipes. Compressing an XML document first using knowledge about the DTD, and then using standard compression algorithms, results in better compression ratios. For example, using knowledge about the DTD, we can compress the example recipe as follows; a detailed explanation of how this works is given below.
    Tea with milk and sugar>
    3>Tea>11>1cup>
    2>Water>11>1cup>0>
    Tea leaves>11>1gram>0>
    Cook the water, add the tea leaves. Remove the tea leaves
    after a couple of minutes.>
    Milk>11>1spoon>0>
    Sugar>11>1spoon>0>
    Add the milk and the sugar to the tea. Stir.>
    0>10>0>100>0

The original recipe document consists of 642 characters, whereas this document contains 282 characters. The size of the document is reduced by more than 50%.

If a document is a recipe, we know it starts with the tag <recipe> and ends with the tag </recipe>. So we do not have to encode these tags: we can just omit them when storing and sending, and insert them again when decompressing. Inside a recipe, we have a title, followed by a list of ingredients, followed by a preparation, an optional comment, and finally the nutrition information. We can remove the title tags too, and replace the closing tag </title> by a separation symbol, such as for example > (which may not appear in text in XML documents). Next we have a list of ingredients. Again, we can replace the closing ingredient tags by a separation symbol, but we have to start by telling how many ingredients there are, since we have to know when to start reading a preparation instead of an ingredient. Furthermore, we also have to include the attributes when encoding the ingredients. Since the name of an ingredient is a required attribute, we list the ingredient's name first in the encoding, and add a separation symbol. Note that we do not encode the attribute name itself: it can be inferred from the DTD. We do assume a standard order for the attributes, though: the same order as specified in the ATTLIST in the DTD. The other two attributes of ingredient, amount and unit, are optional, and we encode them by first telling whether or not they are present (by means of a 1 or a 0), followed by the value of the attribute, followed by a separation symbol. Since an ingredient may contain subingredients, as for example the ingredient Tea in the recipe above, we
again encode how many ingredients an ingredient contains, followed by a separation symbol, followed by the encoding of the subingredients. We then replace the preparation tags by a closing separation symbol, encode whether or not there is a comment, and replace the closing tag of the comment by a separation symbol. Finally, we remove the nutrition tags and attribute names, separate the required attribute values by a separation symbol, and encode whether or not the alcohol attribute is present, followed by its value. We do not have to insert a separation symbol after the nutrition information anymore, because this is the last item in the recipe document.

The compression algorithm described in this section takes a DTD as argument. Using this DTD, it compresses documents that conform to the DTD. It is an example of a DTD-aware tool.
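To make the encoding scheme concrete, here is a minimal sketch in Haskell of the ingredient part of the encoding. The data type and the function names are invented for this illustration; preparation, comment, and nutrition would be encoded along the same lines.

    -- name, optional amount, optional unit, subingredients
    data Ingredient = Ingredient String (Maybe String) (Maybe String) [Ingredient]

    sep :: String
    sep = ">"                                -- the separation symbol used above

    encOpt :: Maybe String → String          -- 1 followed by the value, or 0
    encOpt Nothing  = "0"
    encOpt (Just v) = "1" ++ v

    encList :: (a → String) → [a] → String   -- number of elements, then the elements
    encList enc xs = show (length xs) ++ sep ++ concatMap enc xs

    encIngredient :: Ingredient → String
    encIngredient (Ingredient n a u subs) =
      n ++ sep ++ encOpt a ++ sep ++ encOpt u ++ sep ++ encList encIngredient subs

For example, encIngredient (Ingredient "Water" (Just "1") (Just "cup") []) yields "Water>11>1cup>0>", matching the compressed document above.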
3.3. XML Schema
DTDs have a number of shortcomings, which led the W3C to develop XML Schema [12]. XML Schema is another language for specifying the structure of XML documents, more powerful than DTDs, but also much more complex. In this section we briefly discuss the reasons why XML Schema was developed, since a number of these reasons are related to the rôle of grammars in software and documents.

➙ The DTD formalism has only one base type. Strings, integers, dates, amounts, etc.: all basic data in an XML document has the type PCDATA. This implies that it is impossible to distinguish these types. It follows that there often are two validation phases for an XML document structured according to a DTD: a first phase validating the XML document against its DTD, and a second phase checking whether or not all the strings in the document satisfy particular properties: dates are proper dates in a particular format, names do not contain numbers, etc. XML Schema has around twenty base types, such as string, boolean, decimal, float, and time. This is a big improvement compared with DTDs. Using an XML Schema it is possible to specify the structure of documents more precisely.

➙ DTDs have a limited set of constructs for describing structure. These constructs are the standard grammatical constructs choice (|) and sequence (,), and the EBNF constructs possibly empty list (*), non-empty list (+), and option (?). XML Schema extends these constructs with a number of ways to specify inheritance, more control over the number of occurrences of elements (for example, between three and seven times), a construct that supports specifying a number of elements that may occur in any order, etc. These extra constructs make the XML Schema language more expressive, but also much more complex.

➙ The syntax of DTDs does not use XML markup. Most XML standards use XML markup to structure a document. DTDs have their own language, to a large extent derived from the (E)BNF notation for grammars. XML Schema has replaced the standard grammatical language by XML markup. Where in a DTD you write ((a,b)|c) for a choice between a sequence of a and b, or a c, in XML Schema you would write something like

    <xs:choice>
      <xs:sequence>
        <xs:element ref="a"/>
        <xs:element ref="b"/>
      </xs:sequence>
      <xs:element ref="c"/>
    </xs:choice>

This is a big step backwards. It is obvious that the above XML Schema fragment is much harder to read than the equivalent DTD. In general, the idea that all structured data should use XML tags, as strongly supported by the W3C, is flawed. It confuses abstract and concrete syntax. People want to read and communicate concrete syntax, not abstract syntax. A standard might specify both, but only specifying abstract syntax is a bad idea.

➙ DTDs do not support namespaces. Not only do DTDs not support namespaces, it is also impossible to validate different parts of a document with respect to different DTDs. XML Schema supports namespaces, and makes it possible to validate parts of a document with respect to different XML Schemas. For example, if the output of an XSLT sheet is an XHTML document, the sheet contains both XSLT elements and XHTML elements. It is now possible to validate the XHTML elements with respect to the XHTML DTD. This is very useful.
4. Grammarware
This section discusses application domains in which grammars play a large rôle. Most of the descriptions and examples have been taken from Klint et al. [6], who give an excellent overview of the field. We refer the reader to this paper for more information about grammarware, and for references to papers describing the main applications mentioned in this section. We start with discussing different grammar formalisms, and notations for such formalisms. We then give a number of grammar use cases.
4.1. Grammar formalisms and notations
There are several formalisms that provide a foundation for grammars:

➙ Context-free grammars (the formalism used in both the textbook on formal languages and automata, and in the lecture notes on grammars and parsing);

➙ Algebraic signatures;

➙ Regular graph grammars.

These formalisms differ in expressive power and convenience. Context-free grammars help in defining the concrete syntax of programming languages. Algebraic signatures (usually consisting of a set of operators or functions, together with their types) are suitable for specifying unambiguous abstract syntax trees. Graph grammars and the underlying schemas cater for graph structures. There exist all kinds of mappings between the different formalisms. For example, we can convert a context-free grammar to an algebraic signature by forgetting all terminals, inventing a function symbol for each production of the context-free grammar, and translating productions to types of function symbols; a small sketch of this mapping follows the list of notations below.

Actual structural descriptions are normally given in some grammar notation, for example:

➙ Backus-Naur Form (BNF [1]), Extended BNF (EBNF [5]).
➙ The Syntax Definition Formalism (SDF [4, 10]).

➙ Abstract Syntax Notation One (ASN.1 [2]).

➙ Algebraic data types as in functional languages [9].

➙ Class dictionaries [7].

➙ UML class diagrams without behavior [3].

➙ XML schema definitions (XSD [12]).

➙ Document type definitions (DTD [11]).

In fact, there are so many grammar notations that we do not aim at a complete enumeration. It is important to realize that grammar notations do not necessarily reveal their grammar affinity via their official name. For instance, a large part of all grammars in this world are “programmed” in the type language of some programming language, e.g., in the common type system for .NET, or as polymorphic algebraic data types in typed functional programming languages. (Recall Figure 2, which employed algebraic data types.)
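As a small sketch of the mapping from context-free grammars to algebraic signatures mentioned above, here is the BNF grammar of Figure 1 rendered as Haskell data types: every production becomes a constructor, terminals are dropped, and nonterminals become types. The constructor and type names, and the String representations of identifiers and types, are invented for this illustration.

    type Id   = String    -- assumed representation of identifiers
    type Type = String    -- assumed representation of types

    data Program      = Program Declarations Statements     -- [axioms]
    data Declarations = Decs Declaration Declarations       -- [decs]
                      | NoDec                                -- [nodec]
    data Declaration  = Dec Id Type                          -- [dec]
    data Statements   = Stats Statement Statements           -- [stats]
                      | NoStat                               -- [nostat]
    data Statement    = AssignStat Id Expression             -- [assign]
    data Expression   = Var Id                               -- [var]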
4.2. Grammar use cases
The grammars in Figures 1, 2, and 3 are pure grammars, i.e., plain structural descriptions. Nevertheless, we can infer hints regarding the intended use cases of those grammars. The BNF in Figure 1 comprises details of concrete syntax as needed for a language parser (or an unparser). This is a typical example of the ‘standard’ application of grammars in the front end of compilers. The algebraic data type in Figure 2 does not involve any concrete syntax or markup, but it nevertheless addresses a specific use case: the description captures the structure of (problem-specific) event traces of C-program execution. Such event grammars facilitate debugging (stepping through the events) and assertion checking (an assertion asserts a property at a particular program point). Note that the algebraic data type for the event traces differs from the (abstract) syntax definition of the C programming language, even though these two grammatical structures are related in a systematic manner. The DTD in Figure 3 favors a markup-based representation as needed for XML processing, tool interoperability, or external storage.
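As an illustration of the assertion-checking use of event grammars, here is a small sketch over the data types of Figure 2: it walks an event trace and checks a predicate at every function-call event. The function and its name are invented for this illustration and are not part of the original notes.

    -- Check an assertion at every function call in an event trace.
    callsSatisfy :: (FuncCall → Bool) → ExecProg → Bool
    callsSatisfy p (ExecProg events) = all (either stmt expr) events
      where
        stmt (ExecStmt es)                 = all (either stmt expr) es
        expr (EvalCall c@(Call args body)) = p c && all expr args && all stmt body
        expr (EvalAssign (Assign args _))  = all expr args
        expr (EvalOthers es)               = all expr es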
The term grammar use case refers to the purpose of a (possibly enriched) structural description. We distinguish abstract vs. concrete use cases. An abstract use case covers the overall purpose of a grammar without reference to operational arguments. For instance, the use cases “syntax definition” or “exchange format” are abstract. A concrete use case commits to an actual category of grammar-dependent software, which employs a grammar in a specific, operational manner. For instance, “parsing” or “serialization” are concrete use cases. Even the most abstract use cases hint at some problem domain. For instance, “syntax definition” hints at programming languages or special-purpose languages, and “exchange format” hints at tool interoperability. All the grammar use cases that we mention in this section are linked to software engineering, including program development. One could favor an even broader view on grammarware. Indeed, Mernik et al. [8] revamp the classic term “grammar-based system” while including use cases that are not just related to software engineering, but also to artificial intelligence, genetic computing, and other fields in computer science.

Here are details for representative examples of abstract grammar use cases:

➙ Intermediate program representations are akin to syntax definitions, except that they are concerned with specific intermediate languages as used in compiler middle ends and back ends as well as static analyzers. Representative examples are the formats PDG (Program Dependence Graph) and SSA (Static Single Assignment). Compared to plain syntax definitions, these formats cater directly for control-flow and data-flow analyses, and allow for efficiency optimizations.

➙ Domain-specific exchange formats cater for interoperation among software components in a given domain. For instance, the ATerm format addresses the domain of generic language technology, and the GXL format addresses the domain of graph-based tools. The former format is a proprietary design, whereas the latter employs XML through a domain-specific XML schema.

➙ Interaction protocols cater for component communication and stream processing in object-oriented or agent-based systems. The protocols describe the actions to be performed by the collaborators in groups of objects or agents. Such protocols regulate sequences of actions, choices (or branching), and iteration (or recursive interactions). For instance, session types arguably describe interaction protocols in a grammar-like style.

There are just too many concrete grammar use cases to list them all. We would even feel uncomfortable fully categorizing them, because this is a research topic on its own. We choose the general problem domain of language processing (including language implementation) to list some concrete grammar use cases. In fact, we list typical language processors or components thereof. These concrete use cases tend to involve various syntaxes, intermediate representations, and other sorts of grammars:

➙ Debuggers.

➙ Program specializers.

➙ Pre-processors and post-processors.

➙ Code generators in back-ends.

➙ Pretty printers.

➙ Documentation generators.
4.3. And more
Besides the applications mentioned in this section, grammars are used in grammar-based formalisms. Examples of such formalisms are attribute grammars, general tree and graph grammars, definite clause grammars, etc. These formalisms have in common that somewhere in their description a grammar appears. Some of these formalisms are used as programming languages, and the rôle of grammars in them is then often comparable to that of algebraic data types. These grammar-based formalisms often emphasize the fact that the input of a piece of software follows grammatical rules.
References

[1] John Backus. The syntax and semantics of the proposed international algebraic language of the Zurich ACM-GAMM Conference. In S. de Picciotto, editor, Proceedings of the International Conference on Information Processing, pages 125–131, 1960.

[2] O. Dubuisson. ASN.1: Communication between Heterogeneous Systems. Morgan Kaufmann Publishers, 2000. Translated from French by Philippe Fouquart.

[3] M. Gogolla and R. Kollmann. Re-documentation of Java with UML class diagrams. In E. Chikofsky, editor, Proceedings of the 7th Reengineering Forum, pages 41–48, 2000.

[4] J. Heering, P. Hendriks, P. Klint, and J. Rekers. The syntax definition formalism SDF: reference manual. SIGPLAN Notices, 24(11):43–75, 1989.

[5] ISO. ISO/IEC 14977:1996(E), Information technology - Syntactic metalanguage - Extended BNF. International Organization for Standardization, 1996.

[6] Paul Klint, Ralf Lämmel, and Chris Verhoef. Toward an engineering discipline for grammarware. ACM Transactions on Software Engineering and Methodology, 14(3):331–380, 2005.

[7] K. J. Lieberherr. Object-oriented programming with class dictionaries. Lisp and Symbolic Computation, 1(2):185–212, 1988.

[8] M. Mernik, M. Črepinšek, T. Kosar, D. Rebernak, and V. Žumer. Grammar-based systems: Definition and examples. Informatica, 28(3):245–255, 2004.

[9] Simon Peyton Jones et al. Haskell 98 Language and Libraries: The Revised Report. Cambridge University Press, 2003. A special issue of the Journal of Functional Programming.

[10] E. Visser. Syntax Definition for Language Prototyping. PhD thesis, University of Amsterdam, 1997.

[11] W3C. Extensible Markup Language (XML) 1.0 (Third Edition). W3C Recommendation, 2004.

[12] W3C. XML Schema. W3C Recommendation, 2004.