Apr 14, 2003
Extending XSLT Processor to parse PHP Processing Instructions

Pradeep Tallogu and Daniel Andresen
Department of Computing and Information Sciences
Kansas State University, Manhattan, KS 66506
{pradeep, dan}@cis.ksu.edu

Abstract

There are quite a few XSLT processors designed to work with XML documents whose structure is defined in a stylesheet, but many of them have limited or no support for external calls from an XML document. This paper describes the implementation of an XSLT processor that works with a dynamic XML document. The dynamism comes from PHP calls embedded within the XML document. Enabling an XSLT processor to process instructions in PHP combines the advantages of XML with the programming options of PHP. We show how the processor is integrated with the PHP interpreter and discuss the areas where the system can be extended. We compare the performance of the resulting system with the non-PHP-enabled system and summarize the results.
1. Introduction

An XSLT processor takes an XML document and an XSL document and uses an XML parser to transform the XML source into the resultant output format. For a static XML document, the transformation is straightforward. When the XML document is a dynamic structure, the transformation requires an external application to process the calls embedded in the document. As the document is processed, the external calls are sent to the application, and the results are used to fill in the appropriate locations in the XML structure. We propose to integrate the XSLT processor Xalan, which is an implementation of TrAX (Transformation API for XML), with a PHP interpreter that responds to the external calls.

Active XML generates an automatic response to changes in data: it retrieves the real-time data and places it at the appropriate location inside the XML structure. Without the dynamic properties of the XML document, one would have to use a Java application or a servlet to generate a static XML structure whenever a request is received. With dynamic behavior enabled, the same static content need not be regenerated for every request. This kind of work is usually left to extensions of the processors. Many options exist for selecting an external application that processes the instructions in an XML document. The compatibility of PHP and the widespread use of Xalan prompted us to integrate the two systems.

The rest of this paper is organized as follows. Section 2 describes related work, and Section 3 describes our implementation. Section 4 describes how we evaluated the system and presents the results. Section 5 presents the conclusions and describes future work.
2. Related Work

Dynamic XML using JSPs

There are at least a couple of standard Tag Libraries that can be used for coding dynamic XML:

XJTL - Devsphere XML JSP Tag Library
JSTL - Apache/Sun JSP Standard Tag Library

These libraries contain standardized tags, each of which is associated with a specific action. The Tag Libraries are a special type of XSL document that defines the function of the tags that are used. When an XSLT processor is used to process these documents, the associated function of the tag is taken from the Tag Library and executed.

Here is an example that uses JSPs in a dynamic XML document. Let us use "c", "x" and "o" as prefixes for the actions of JSTL Core, JSTL XML and XJTL Output, respectively. We shall see how a final static XML document of the format ... is generated. Here a1, a2 and a3 are the attributes whose values are obtained from a URL's request parameter.

The dynamic XML can be coded as:

Using XJTL
...

Using JSTL
...

Using JSP (without using Standard Tag Libraries)
...

These are processed to get a static XML document first. After that, the transformation to the required output format is done.

XSP Processor

An XSP Processor recognizes the following standard tags:

The required functionality is embedded in the tags. The XSP processor replaces elements with their appropriate values and passes the document on to the stylesheet processor for formatting.

For example, a dynamic XML code of the form:

Here is an XSP-generated date: Date genText = new Date(); genText

results in a static XML that looks like this:

Here is an XSP-generated date: Friday, May 02, 2003

In a tidier approach, the logic is moved into a Tag Library and the XML document is coded in a more readable format.

Example:

Here is XSP-generated text: .

The logic for mylogicsheet is usually implemented in a stylesheet, mylogicsheet.xsp.xsl.

The Cocoon Processing Model

The Cocoon processing model has a well-defined demarcation of the phases for handling dynamic XML. These are divided into:

Production - the XML content is generated based on request parameters (servlet equivalent).

Processing - the produced XML content is transformed/evaluated.
Formatting - the XML content is finally formatted into the desired output format for client use.

The producers provide the dynamic content to the processors. The processors evaluate the expressions in the XML file and expand them, creating the final XML document. The Cocoon distribution includes a number of processors that implement common needs and situations. These are:

i. The XSLT Processor
ii. The XSP Processor
iii. The DCP Processor (Deprecated)
iv. The SQL Processor (Deprecated)
v. The LDAP Processor

3. Implementation

We have used the XSLT processor Xalan-J and enabled it to process PHP calls. The Transformation API for XML (TrAX) provides a convenient interface for transforming XML documents to XML, HTML and various other streams. Xalan implements these interfaces. Xalan consists of four major modules and various supporting modules. The four major modules are:

org.apache.xalan.processor - the processor module processes the stylesheet and provides the main entry point into Xalan.
org.apache.xalan.templates - the templates module defines the stylesheet structures, including the Stylesheet object, template element instructions, and Attribute Value Templates.
org.apache.xalan.transformer - the transformer module applies the source tree to the Templates and produces a result tree.
org.apache.xpath - the xpath module processes both XPath expressions and XSLT match patterns.

The supporting modules include org.apache.xalan.utils, org.apache.xalan.serialize and org.apache.xalan.stree. The module that we have worked on is org.apache.xalan.serialize. This module consumes SAX ContentHandler events and produces the resultant XML, HTML or text streams.

The process of transformation goes through the following steps. First, the processor is provided with the stylesheet. Then the XML document is fed to the transformer. The transformer converts the source document into templates that can be matched with those defined in the stylesheet. After this point, in a normal non-PHP-enabled run, the processor matches the templates and the contents are formatted as defined in the stylesheet to produce a result stream. In such systems, a processing instruction that is encountered is passed over without any effect on the result stream.

When the PHP interpreter is used, the transformer captures the events due to PHP instructions and passes the arguments to the interpreter. The interpreter in turn receives the arguments and sends back the results after processing. The processor collects these results and fits them into the appropriate location in the result stream.

The following figure shows the transformation process:

Figure 1: Diagram showing the process of transformation from XML to other streams.

Inside the Xalan package, the class SerializerToHtml in the module org.apache.xalan.serialize receives notification of various events such as character data, entity references, the beginning of an element, the end of an element, the start of a document, processing instructions, etc. Upon receiving these events, the module takes the appropriate action so that they are converted into the HTML output. By default, no action is taken when a processing instruction is encountered. In our extension, the processing instruction event is captured and its arguments, namely the target (the PHP interpreter) and the data (the PHP instruction and its arguments), are read. These instructions are then sent to the PHP command-line interpreter. The interpreter takes the instruction and its arguments and returns the interpreted result. Xalan in turn receives the results into a buffered reader and stores them in its accumulator. Finally, the contents of the accumulator are directed to the HTML file.
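The event handling described above can be sketched roughly as follows. This is a minimal illustration and not the actual SerializerToHtml code: the class name PhpPiSketch, the injectable interpreter function, and the helper runPhpCli are hypothetical names introduced here, and the sketch assumes that a `php` executable is on the PATH and that the instruction data can be passed to it with the `-r` option.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

/** Hypothetical sketch of a serializer that forwards PHP processing instructions. */
final class PhpPiSketch {

    /** Stands in for the serializer's output accumulator. */
    private final StringBuilder accumulator = new StringBuilder();

    /** Maps PHP source to its output; injectable so the sketch can run without PHP. */
    private final Function<String, String> interpreter;

    PhpPiSketch(Function<String, String> interpreter) {
        this.interpreter = interpreter;
    }

    /** Same shape as the SAX ContentHandler callback named in the text. */
    public void processingInstruction(String target, String data) {
        if (!"php".equalsIgnoreCase(target)) {
            return; // instructions for other targets are passed over in this sketch
        }
        accumulator.append(interpreter.apply(data));
    }

    /** Ordinary character data goes straight into the accumulator. */
    public void characters(String text) {
        accumulator.append(text);
    }

    public String result() {
        return accumulator.toString();
    }

    /** Assumption: runs the code with the PHP command-line interpreter via "php -r". */
    static String runPhpCli(String code) {
        try {
            Process process = new ProcessBuilder("php", "-r", code)
                    .redirectErrorStream(true)
                    .start();
            StringBuilder out = new StringBuilder();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
                int c;
                while ((c = reader.read()) != -1) {
                    out.append((char) c);
                }
            }
            process.waitFor();
            return out.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PhpPiSketch serializer = new PhpPiSketch(PhpPiSketch::runPhpCli);
        serializer.characters("<p>");
        serializer.processingInstruction("php", "echo 4 + 6;");
        serializer.characters("</p>");
        System.out.println(serializer.result());
    }
}
```

Because every processing instruction starts a fresh interpreter process in this sketch, nothing is shared between instructions, and each one pays the cost of process start-up.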
The implementation of this extra functionality is fairly simple. Most of the coding effort went into handling the data: the arguments to the interpreter and the results from the interpreter. A total of around 100 lines of code was added to the method public void processingInstruction(String target, String data) of class SerializerToHtml.

In effect, an XML file that has Processing Instructions like

converts to Monday 14th of April 2003 09:04:54 PM 14 Apr 2003 in the HTML file.

The advantages of using PHP are manifold. It is lightweight, cross-platform compatible, and supports any 32-bit or better platform, and a survey in ZDnet’s eWeek online publication found that PHP is as much as 3.5 times faster than JSP. PHP supports a wide variety of databases and protocols and generates a wide variety of outputs.

The project is a step toward a system that allows PHP and XML to be interleaved as much as possible. The system that we have developed, as it stands, can process the PHP instructions in an XML document but is not as productive as it could be. There are limitations with this system. Most importantly, it cannot store the state of the system at any particular instant. For example, it cannot hold the value of a variable that was declared at some point in the XML document so that it can be used at a later stage. It treats each instruction independently.

Consider a set of instructions of the form:

With the current system, this set of instructions does not return the value 10, because each instruction is treated independently of the others. Nevertheless, an instruction like

would return 10.

4. Evaluation

The performance cost of the extended functionality is evaluated in terms of the increase in the time taken to parse as the number of instructions increases. The following graph shows the results:

Figure 2: Graph showing the performance of the system with varying number of PHP Processing instructions. Series: simple “echo” instructions; expressions evaluating simple math.

Observations from the graph:

Times taken:
Non-PHP-enabled transformation: 1097 ms

For a simple instruction that echoes a string:
PHP-enabled transformation (with one instruction): 1291 ms
An average of 141 ms is consumed for every extra instruction that is included.

For an instruction that involved simple math:
PHP-enabled transformation (with one instruction): 1569 ms
An average of 223 ms is consumed for every extra instruction that is included.
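As a rough illustration of what these figures imply, the sketch below extrapolates the reported timings to n instructions. It assumes the cost grows strictly linearly, with the measured one-instruction time plus the reported average for each extra instruction. The class name PiCostModel and the method estimateMs are hypothetical, and the extrapolated values are estimates, not measurements from the paper.

```java
/** Hypothetical linear extrapolation of the reported transformation times. */
final class PiCostModel {

    /** Reported time for the non-PHP-enabled transformation, in milliseconds. */
    static final long BASELINE_MS = 1097;

    /**
     * Estimated time for a document with n PHP instructions.
     *
     * @param firstMs    reported time with exactly one instruction
     * @param perExtraMs reported average cost of each additional instruction
     * @param n          number of PHP processing instructions (n >= 0)
     */
    static long estimateMs(long firstMs, long perExtraMs, int n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be non-negative");
        }
        if (n == 0) {
            return BASELINE_MS; // no PHP instructions: plain transformation
        }
        return firstMs + perExtraMs * (n - 1);
    }

    public static void main(String[] args) {
        // Ten "echo" instructions: 1291 + 9 * 141
        System.out.println(estimateMs(1291, 141, 10)); // prints 2560
        // Ten simple-math instructions: 1569 + 9 * 223
        System.out.println(estimateMs(1569, 223, 10)); // prints 3576
    }
}
```

Under this assumption, the first instruction adds 194 ms (echo) or 472 ms (simple math) over the 1097 ms baseline, which is more than the reported average cost of each later instruction.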
A simple PHP instruction,

is used as a processing instruction in the XML file, and the time taken to transform it is plotted. The graph is fairly linear in the number of instructions.

Another instruction that does simple math,

is used to plot the time taken. This expression recorded an increase in the time taken to transform when compared with the previous expression. This can be attributed to the nature of the expression, which requires some calculation to be done before the result is returned. This graph is also found to be linear in the number of expressions.

5. Conclusions and Future Work

The system introduces a new option of using PHP in writing dynamic XML. With the elegant features of PHP, the work related to web-scale data integration can be made a lot easier to handle. However, a lot of work remains to be done before the system can be used to its full potential. Some of the areas in which the system can be improved are:

The state of the system should be stored for later use. For instance, a variable that is declared at some point in the XML document through a PHP instruction should be accessible at a later stage. The set of instructions

should return 1 and 2 respectively.

The ability of processing instructions to refer to values contained in the XML tags can be an interesting extension. For example, an XML document like this should make sense:

The Moonstone Mar 1863 1863, Wilkie Collins

with the value of $title as “XML Style Sheets” and the value of $date as “April 2003” as seen from the XML tags.

References

[1] Transformation API for XML (TrAX). http://xml.apache.org/xalan-j/trax.html
[2] Design of Xalan-J2.0. http://xml.apache.org/xalanj/design/design2_0_0.html
[3] Xalan-J API Docs. http://xml.apache.org/xalan-j/apidocs/
[4] Dynamic XML using JSPs. http://www106.ibm.com/developerworks/java/library/jdynxml.html
[5] Active XML. http://www.doc.ic.ac.uk/~mt99/Project/Background%20reading/AXML-data-centricperspective-on-web-services.pdf
[6] Dynamic XML in Cocoon. http://carnagepro.com/pub/Docs/Cocoon1.8.2/dynamic.html
[7] Dynamic XML output and DOM serialization. http://www.devsphere.com/xml/taglib/output/examples.html