Article =
let mkToC (Body -> TOC ) cs -> (transform cs with |
2
Documents with computations
An XML document represents a tree. For example, the following XML data represents an element person having two elements name and pocket as its children. The latter in turn have three children elements tool: doraemon take koputa dokodemo door small light In the conventional way an XML document is written, each branches of the tree is written explicitly and their values are independent from each other. In functional languages, on the other hand, it is not uncommon that parts of a structure depends on some other parts. Allowing some parts of an XML document to be dependent on other parts enhances flexibility. Figure 1 shows an article written in XML, whose table of contents is computed from the body of the article. The article contains three sections. Each section element has a title element as its first child. The second section is given a name DocComp. The third section, merely for the sake of demonstrating, is computed from the second section by converting the title to capital letters. Here capitalise is a function taking a section and
2
returning another section by capitalise its title, while ${DocComp} is a pointer referring to the previous section. Where the table of contents should be is occupied by a place holder mkToC ‘../body‘ , where the path ‘../body‘ refers to the body element in the document, and mkToC is a CDuce function defined in the psdlib element in the end of the document. CDuce [2] is a strongly typed functional language designed with XML manipulation in mind. Currently it is our embedded language of choice. The last element in the document, psdlib, defines the “library” CDuce definitions used by the rest of the document. The type definitions, similar to a DTD declaration, declares the structure of the evaluated document. For example, an article element shall have as its children three elements: a title, a TOC, and a body. The function mkToC takes an element of type Body, and returns an element of type TOC. It selects the title elements in the article body and wraps them into item elements required in the table of contents. The definition of captalise is omitted here. The type attribite in the article element indicates that it, after being fully evaluated, must have type Article. That the article does conform to the type Article, and that the function mkToC does generate an element conforming to the type TOC, is guaranteed by the type checker of CDuce. The expression mkToC ‘../body‘, after evaluation, should yield the following: Introduction Documents with computations DOCUMENTS WITH COMPUTATIONS Note that to evaluate the table of contents, capitalise ${DocComp} needs to be evaluated first. This is taken care of by a dependency analysis phase. In fact, we can even construct cyclic documents. In the following XML definition, the pocket, labelled p, refers to itself as an item in the pocket1 : doraemon take koputa dokodemo door small light ${p} The tree representing the pocket expands infinitely when being printed. For those who familiar with non-strict functional programming, such a style of definition is not an anomaly but a useful programming technique. The following Haskell program, for example, computes the (infinite) list of Fibonacci numbers, where the first two elements are given, and the rest are constructed recursively from the list itself! fibs = 1: 1: zipWith (+) fibs (tail fibs) Certainly, eagerly evaluating the list results in non-termination. Therefore, in a non-strict functional language, the parts in a data structure is evaluated only when necessary, e.g., when being printed. CDuce, being a strict language, does not allow self reference in this naive manner. However, it is one of our future work to deal with cyclic documents.
3
The TreeCalc interface
The user experience of a TreeCalc user can be summarised in Figure 2. Each XML document is associated with a presentation specification, which describes how the XML document is to be displayed (for example, by transforming it to HTML). The user can then evaluate or edit the displayed presentation. Shown in Figure 3(a) is the article example in the previous section displayed in TreeCalc. Currently, the psdexpr tag is displayed as two buttons. Pressing the “Evaluate” button triggers the evaluation of the embedded code, as shown in Figure 3(b). The code itself can be edited after pressing the “Show Code” button (Figure 3(c)). The psdlib tag, on the other hand, is displayed as a “Show Program” button which, when pressed, displays the library code (Figure 3(d)). Both the displayed code and 1 If
the DTD allows so, of course. After all, it is a magic pocket!
3
Figure 2: The display-eval-edit diagram of TreeCalc the article body, but not the computed result, can be edited by the user. The changes made by the user, however, needs to be reflected in the original XML document. The presentation is then refreshed according to the original document. Currently, the derived parts of a document have to be evaluated or re-evaluated by pressing the button manually. In the future we hope that the evaluation can be triggered by a mouse-over event, or that TreeCalc would automatically compute all, and only, the parts that is being displayed in the window in a lazy manner.
4
Examples
To convince the reader that the concept of the PSD is useful, we will give some examples in this section. A TreeCalc document is like a spreadsheet in that some parts of the document is computed from other parts. It is in fact more flexible than existing spread sheets in that one can define user-defined functions at ease. In Section 4.1 we will see a TreeCalc document acting as a spreadsheet. In most XML editors, the table of contents of an article is computed by the displaying phase. To demonstrate the benefits of the PSD, we need an example where the programmable contents are intrinsic parts of the document, rather than the presentation. Such an example is given in Section 4.2.
4.1
Shopping list
Shown in Figure 4 is a shopping list written in XML. After the displaying phase it is presented as the spreadsheet in Figure 5. This shopping list has two items, Cocacola 1 liter and Pepsi 1 liter, each of which has a retail price and a formula to calculate the real price. The formulae refers to the retail element of the same item by the path expression ‘../../retail‘. Since the drinks are on sale, 100 yen is taken away from the Coca Cola and 20 yen from the Pepsi. The total money spent is calculated in the sum element by a CDuce function sumPrice, referring to ‘../../itemlist‘. Figure 5(a) shows the result of evaluating the cells and displaying the code. The code in the psdlib section is displayed in Figure 5(b). The function sumPrice selects the realprice tags using a CDuce primitive transform and computes their sum. The user can change the retail prices, edit the formulae or the library code, or add new items. The changes will be re-evaluated and reflected in the presentation by pressing the ”Evaluate” buttons.
4.2
Exam sheet
Consider an English proliferation test written as an XML document. Each multiple choice question has the following form: (question)
4
Figure 3: The article with dynamic table of contents displayed in TreeCalc
5
Cocacola 1 liter 200 getInt ‘../../retail‘ - 100 Pepsi 1 liter 168 getInt ‘../../retail‘ - 20 sumPrices ‘../../itemslist‘ .... Figure 4: XML representation of a shopping list
(a)
(b) Figure 5: A shopping list
6
choice 1 choice 2 : (score earned for this question) An example exam sheet is displayed in Figure 6(a). The choices are displayed as buttons the student can click. The student can then click the “Evaluate” button to check the scores earned for this question. The content of the score element is actually a piece of code, which can be displayed by clicking on the “Edit Code” button. In Figure 6(b) we see that the score of this question is evaluated by the expression onechoice(2, ‘../../options‘,10) where the path ‘../../options‘ refers to the options element of the same question. The function onechoice, which checks a given option element and returns the earned score, shall be defined in the psdlib part of this exam paper. In this case the function returns 10 if only the second choice is clicked. Otherwise it yields zero. Sometimes, however, the exam sheet designer might wish to allow certain degrees of ambiguity in the answers. The score of this question in Figure 6(c), for example, is computed by another function patternEval, which shall also be defined in psdlib: patternEval([([ff ff ff tt], 10) ([ff tt ff ff],5)], ‘../../options‘) It returns 10 if the students chooses the last option, but also returns 5 if the second best answer is chosen. Each question allows a customised way of evaluation. The exam sheet designer can even allow some bonus score if, for example, all questions in a module are answered correctly, all with a little bit of programming.
5
Implementation
The current implementation of TreeCalc is based on the general-purpose XML editor, Fungus, developed by JustSystem. All the displaying and editing is dealt with by Fungus. The evaluation engine, on the other hand, is implemented as a Fungus plug-in. When the user clicks the “Evaluate” button, the plug-in extracts the parts that needs computation, convert it to CDuce code, and feed the code to a CDuce interpreter. The output of the interpreter is then parsed and plugged back to the XML document. The current implementation, though serves to demonstrate the general idea of the PSD, is rather preliminary. We are currently designing and implementing a more efficient interface between Fungus and the embedded functional language.
6
Related works
Quite a few efforts have been devoted into XML processing, transforming and editing. Space here only allows us to mention those by which our work are influenced the most. Editing XML in its raw form is far from user friendly. An XML editor aims at displaying an XML document in a more comprehensible format for the user to edit. Considering the popularity of XML, it is not surprising that there exist quite a number of XML editors. One that is related to the functional programming community is the Proxima project[10]. Since an XML document itself does not specify how it is to be displayed, it is usually specified by a CSS stylesheet or an XSLT script. The designer of Proxima proposed the separation of a transformation language and a presentation language[11]. The latter describes the layout of a page, while the former specifies how an XML document is transformed to the latter. Our work can be seen as proposing the addition of another layer, a computation language HaXml[13] is a library developed for processing XML documents in Haskell, the now standard language for non-strict functional programming. HaXml allows two modes of XML processing. On the one hand, HaXml provides generic datatypes and combinators to parse and transform XML documents. On
7
Figure 6: An exam sheet for an English test. Each question may have its own customised scoring criteria.
8
the other hand, given a DTD, HaXml produces a corresponding Haskell datatype and parsing/printing functions. One can then manipulate an XML document like ordinary Haskell trees, while the type system of Haskell guarantees validity of documents. Other functional languages specifically designed to process XML documents are also being developed, such as [9, 7, 1]. Usually, an XML document is a supported datatype in the language. Special type systems are adopted to ensure validity of generated document and prevent possible bugs.
7
Conclusion and future work
In the implementation the first prototype of TreeCalc, we attempted to reuse existing tools as much as possible. The first prototype of TreeCalc used Haskell as the computational language. In the second version we switched to CDuce. Its strong type system proved to be a great help to ensure that the generated document does conform to some structural constraints. One of our aims in the next stage is to deal with self-referencing documents. We would therefore need a non-strict language, while CDuce, being a strict language, is not suitable for the task. We might have to design our own lazy or semi-lazy variation of CDuce. Lots of lessons can also be learnt from other XML processing languages, such as XMLambda [9] and XDuce[7]. It is also an interesting challenge to see whether the concept of polytypic functions [8] is compatible with a type system with regular expression types and subtyping, like that of CDuce. The current evaluation-by-pressing-button user interface is rather a result of the particular way our plug-in interacts with Fungus, than a desirable design. We are currently designing a more efficient interface through which Fungus and our evaluation engine can interact. To implement this interface, parts of Fungus needs to be rewritten and plenty of work needs to be done. After an XML document is transformed to a displayable presentation, TreeCalc expects editing actions from the user and needs to reflect the changes in the original document. One approach to do so is to construct the inverse transform from the presentation back to an computation-carrying XML document. Certainly, not all computations are invertible. However, it seems that most practical operations involved in XML manipulation are either invertible or can be made invertible. It will be a interesting future topic to identify this subset of invertible computation. After an editing phase, the XML document needs to be incrementally updated. Existing techniques for incremental updating of attribute grammar ought to be very helpful for our need. In [?], it was described how to convert an XML transforming function to an attribute grammar. We expect to apply similar techniques to TreeCalc.
Acknowledgement This work has been done under collaboration between University of Tokyo and Justsystem Corporation, who received the commission from “Comprehensive Development of e-Society Foundation Software” of the Ministry of Education, Culture, Sports, Science and Technology, Japan.
References [1] V. Benzaken, G. Castagn, and A. Frisch. CDuce: an XML-centric general-purpose language. In Proceedings of the 2003 ACM SIGPLAN International Conference on Functional Programming. ACM Press, 2003. [2] V. Benzaken, G. Castagna, and A. Frisch. Cduce: an xml-centric general-purpose language. In Proceedings of the ACM International Conference on Functional Programming, 2003. [3] R. S. Bird and O. de Moor. Algebra of Programming. International Series in Computer Science. Prentice Hall, 1997. [4] T. Bray, J. Paoli, C. M. Sperberg-Macqueen, and E. Maler. Extensible Markup Language (XML) 1.0 (Second Edition), October 2000. http://www.w3.org/TR/REC-xml. [5] J. Clark. XSL Transformations http://www.w3.org/TR/xslt.
(XSLT)
Version
1.0
,
16
November
1999.
[6] S. de Rose, E. Maler, and D. Orchard. XML Linking Language (XLink) Version 1.0. Technical report, World Wide Web Consortium, 27 June 2001. http://www.w3.org/TR/xlink.
9
[7] H. Hosoya and B. C. Pierce. XDuce: A typed XML processing language (preliminary report). In G. Vossen and D. Suciu, editors, Proceedings of Third International Workshop on the Web and Databases (WebDB2000), number 1997 in Lecture Notes in Computer Science, pages 226–244. Springer-Verlag, May 2000. [8] P. Jansson and J. T. Jeuring. Polytypic compact printing and parsing. In S. D. Swierstra, editor, Proceedings of the 8th European Symposium on Programming (ESOP ’99), number 1576 in Lecture Notes in Computer Science, pages 273–287. Springer-Verlag, 1999. [9] E. Meijer and M. Shields. XMλ: a functional language for constructing and manipulating XML documents. (Draft), 1999. [10] M. M. Schrage and J. T. Jeuring. Proxima: a generic presentation-oriented XML editor. http://www.cs.uu.nl/research/projects/proxima/. [11] M. M. Schrage and J. T. Jeuring. Xprez: a declarative presentation language for XML. http://www.cs.uu.nl/research/projects/proxima/docs/xprez.pdf. [12] M. Takeichi and Z. Hu. Calculation carrying programs: how to code program transformations. In Proceedings of International Symposium on Principle of Software Evolution (ISPSE 2000), pages 250–259, 2000. [13] M. Wallace and C. Runciman. Haskell and XML: generic combinators or type-based translation? . In Proceedings of the 1999 ACM SIGPLAN International Conference on Functional Programming, pages 148–159. ACM Press, September 1999.
10