Accessing and Transforming Dynamic Content based on XML: Alternative Techniques and a Practical Implementation (1)

Y. Despotopoulos*, G. Patikis*, J. Soldatos*, L. Polymenakos**, J. Kleindienst***, J. Gergic***

* National Technical University of Athens, Electrical & Computer Eng. Dept., Computer Science Division, 9 HEROON POLYTECHNEIOU STR., GR-15773 ZOGRAFOU, GREECE e-mail: {jsoldat,ydes,gpatikis}@telecom.ntua.gr

** IBM Hellas, Computer Science Division, KIFISIAS AVENUE., GR- HALANDRI, GREECE e-mail: [email protected]

*** IBM Czech Republic, Voice Technologies and Systems e-mail: {jankle,jaroslav_gergic}@cz.ibm.com

Abstract

XML is widely used as a Web language that completely decouples the data model from the various views that may be employed to present the actual information. Using XML in information portals, auctions and e-publishing applications often constitutes a compromise between transformation flexibility, scalability and the required computing power. Thus, most large scale Web based applications featuring database access and dynamic content presentation rely on RDBMS back end systems. In such systems XML is usually employed as a data layer hiding the peculiarities of the particular database system and schema. This paper reviews popular architectures for combining XML and RDBMS systems and illustrates one of these architectural alternatives through a detailed presentation of the practical implementation of a novel back end system. This system is developed in the scope of the CATCH2004 (IST 1999-11103) research initiative, in order to support the data tier of a multimodal, multilingual and multi-device information retrieval system.

(1) Work partially funded by the European Union, through the research project IST-1999-11103 CATCH2004 (Converse in AThens, Cologne and Helsinki), in the IST (Information Society Technologies) research program.

1. Introduction

Information portals, auctions and e-publishing applications enjoy tremendous popularity among Internet users. At the same time e-commerce applications (B2B, B2C) are gradually becoming the most important Web applications from a business perspective and are transforming the Internet into a vehicle for conducting business. The proliferation of such applications has created new requirements for storing, exchanging and manipulating data. Given the obvious limitations and inefficiencies resulting from the use of HTML for publishing and exchanging information over the Web, the W3C [10] has introduced a new standard for structuring, representing and manipulating data, namely the Extensible Markup Language (XML) [17], [22]. XML is widely used as a Web language that completely decouples the data model from the various views that may be employed to present the actual information. Thus, XML presents certain advantages related to supporting data modeling for e-publishing, e-commerce and EDI (Electronic Data Interchange) applications, as well as for applications involving data transfer between databases and application integration.

Due to these strengths XML is often thought of as a panacea for supporting the data tier(s) of modern e-commerce and Web portal applications, which are usually based on multi tiered architectures. This belief is not, however, reflected in most practical applications. In practice, XML is not used stand alone, but in conjunction with conventional relational database management systems [17]. The synergy between XML and Relational Database Management Systems (RDBMS) stems from the fact that the advantages offered by XML should be combined with the proven strengths of relational systems, such as their scalability and maturity, as well as the rich set of tools which they offer for data manipulation. It should be emphasized that XML on its own provides far less scalability, since data access and transformation are based on complex object structures, called DOM (Document Object Model) trees [6], which are usually computationally expensive to manipulate. As a result, most large scale Web based applications which access databases and present dynamic content rely on RDBMS back end systems. In the scope of these systems XML is often employed as a data layer hiding the peculiarities of the particular database system and schema.
This transparency is perfectly in line with the core principles of multi tier architectures, which support most modern Web sites. XML data layers can then be exploited to transform the content based on the target application. This is because XML co-operates well with state of the art technologies for filtering, transforming and presenting dynamic content, such as XSL (Extensible Stylesheet Language) [11], JSP (Java Server Pages) [9], ASP (Active Server Pages) [13] and PHP [15]. These properties render XML appropriate for supporting applications involving multiple modalities and/or multiple terminal types. As a result a host of Web based systems make use of both RDBMS (as data repositories) and XML (as a means for modeling, structuring and manipulating data). It is no accident that most vendors of database systems provide complete sets of XML tools, so as to transform content stored within conventional relational schemas into XML structures (see for example [18]).

The purpose of this paper is to review popular architectures for combining XML and RDBMS systems, and to illustrate one of these architectural alternatives through a detailed presentation of the practical implementation of a novel back end system. This back end was developed in the scope of the Information Society Technologies [19] (IST 1999-11103) CATCH2004 [4] research initiative, in order to support the data tier of a multimodal, multilingual and multi-device information retrieval system. The presented implementation constitutes a successful application of the techniques discussed in the paper.

The structure of the paper is as follows. Section 2, following this introduction, elaborates on alternative techniques for tailoring XML to RDBMS systems, in the scope of three tiered applications. Section 3 concisely presents the objectives of CATCH2004 with respect to the required back end systems.
Thus, section 3 lists the requirements for a back end system that supports a multilingual, multimodal and multi-device system. Section 4 delves into the actual design and implementation of the data tier that takes place in the scope of the project. Finally, section 5 concludes the paper, refers to future work and attempts to predict future directions and trends in the field.

2. Combining XML and Relational Systems

To present two comprehensive paradigms for combining XML and relational systems, we assume a multi tiered system that features a Relational Database Management System as a data repository at the system’s back end (Figure 1). In such a system the application logic in most cases uses a standard database API (such as ODBC or JDBC [7]) to interface with the database tier. As a result, data access is ultimately based on standard SQL queries. Once data is retrieved, XML serves as another data layer lying on the edge between the data tier and the application logic tier. This extra layer is just another representation of the database data, which serves as input to the application logic for applying the appropriate transformations [14]. Having an additional XML data layer eases the task of transforming the content in order to render it appropriate for various client devices, in multiple languages and modalities, and conformant to user preferences.

The core concept is to retrieve data from the RDBMS, convert it to a well defined XML format (according to a Document Type Definition (DTD), as required by other transformation components of the application logic tier), transform it to an appropriate client language and format, and finally deliver it to the terminal residing at the presentation layer [5]. This process can be extremely useful for a number of applications such as WWW sites and portals with personalized content, multilingual systems, multimodal systems, and services that are accessible from different terminals (such as mobile phones, PDAs, PCs etc.).

[Figure: client terminals #1 to #n; presentation logic; business logic; data tier holding RDBMS #1 to #n, accessed through JDBC, ODBC]

Figure 1: Typical Three Tier Architecture

Depending on whether the computational complexity is transferred to the data repository or to the application logic, we propose two alternative strategies for data access and storage in the databases. The first data access and transformation paradigm transfers the complexity of constructing the XML document to the application logic. In particular, this access scheme presupposes that XML objects are mapped to SQL tables, in a way that ‘critical’ XML tags are mapped to table columns. The target XML document is constructed using one or more instances of appropriate application logic components (e.g., Java Beans, COM objects). These components hold information about the user’s request (e.g., user locale and terminal) and query the database for the requested information. Once the appropriate fragments of data are available, a middleware component may assemble the XML document and pass it to other components of the business logic. Given the carefully designed structure of the database, the task of the business logic is then confined to applying the necessary presentation filters.

[Figure: an RDBMS table in the data tier with fields F #1, F #2, F #3, F #4; the table fields (F1, F2, …) correspond nearly to XML tags; application logic components use the table data and a DTD to produce the XML document]

Figure 2: Sophisticated content structuring within the data repositories

An advantage of this approach is that multiple XML documents with different content and structure can be produced from a single source of data, according to the needs of the application. In general, fragmenting the XML document into the database according to the target DTD increases the efficiency of queries for narrowly focused information. There is however a likelihood that complicated queries have to be constructed and executed. Moreover, (re)structuring the database according to the desired XML document might not be easy, especially when dealing with legacy systems. Thus, in such cases a good strategy is to take the database schema into account when designing the DTD that will reside in the application logic.

An alternative solution hinges on storing entire XML documents in the database system. In this way the complexity related to structuring the target XML document is transferred to the database tier. This entails building appropriate database tables whose rows encompass XML documents indexed by the set of querying attributes. Therefore the content management applications of the data repository have to be rather sophisticated with respect to constructing and managing the database content as XML documents. External applications (e.g., an information portal accessing a news database) then benefit from the readily available XML structure.

[Figure: an RDBMS table (XML Data DB) whose rows are indexed by date (e.g. 20/05/2001, 25/05/2001) and contain entire XML documents stored as string data]

Figure 3: Database containing entire XML documents

This approach allows faster data retrieval, as the database schema is much simpler and the SQL queries are uncomplicated. A major concern here, again, is to define global and well-structured DTDs for the XML documents. It should however be emphasized that storing entire XML documents in the RDBMS may not be the preferred approach when interfacing with legacy systems, which feature totally different data structures.
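As an illustration of the second strategy, the following minimal Java sketch keeps whole XML documents as strings keyed by a querying attribute and parses them only on retrieval. It is not taken from any system described here: the class `XmlDocumentStoreSketch`, the date key and the `news`/`item` tags are hypothetical, and an in-memory map stands in for the database table so that the example runs without a JDBC driver.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XmlDocumentStoreSketch {

    // Stand-in (assumption) for a table such as xml_data(doc_date, xml_doc),
    // where xml_doc holds the entire document as string data.
    private final Map<String, String> rowsByDate = new HashMap<>();

    /** Stores a complete XML document under its querying attribute (here: a date). */
    void store(String date, String xmlDocument) {
        rowsByDate.put(date, xmlDocument);
    }

    /** Looks the document up by date and parses it; returns null if no row matches. */
    Document fetch(String date) {
        String xml = rowsByDate.get(date);
        if (xml == null) {
            return null;
        }
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the stored document", e);
        }
    }

    public static void main(String[] args) {
        XmlDocumentStoreSketch store = new XmlDocumentStoreSketch();
        store.store("20/05/2001", "<news><item>First</item><item>Second</item></news>");
        Document doc = store.fetch("20/05/2001");
        System.out.println(doc.getDocumentElement().getNodeName());
        System.out.println(doc.getElementsByTagName("item").getLength());
    }
}
```

The lookup itself is a single key match, which mirrors the simple schema and uncomplicated queries noted above; all structural work is deferred to whoever writes the document into the store.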

3. CATCH2004: a multimodal, multilingual and multi device system

Having introduced two potential strategies for taking advantage of XML at the back end of modern multi tiered systems, we now focus on a real world application, which is being developed within the CATCH2004 project [1]. CATCH2004 stands for Converse in AThens, Cologne and Helsinki and is a research activity co-funded by the European Union in the scope of the Information Society Technologies (IST) programme. The goal of CATCH2004 is to develop a multilingual, conversational system with a novel unifying architecture across devices and services. Such a system will provide pervasive access to multiple applications and sources of information available to citizens from public and private service providers, by supporting multiple client devices and by using multiple input modalities. In particular, the system will allow information access from a variety of terminals such as PCs, kiosks, phones, mobile phones, PDAs and smart wireless devices. Users will have access to multiple databases residing in different locations and offering diverse information. Furthermore, user interaction with the terminals will be based on speech, GUI modalities, as well as on their combinations. Given the desired features and the architecture of the CATCH2004 system, XML comes to the foreground. Specifically, as already outlined, an intermediate XML data layer facilitates the tasks of:

• Using a unified architecture for accessing content residing in both demonstrators, i.e. having demonstrator independent business and presentation logic.
• Transforming and presenting the information to various devices and terminal types.
• Delivering the information based on a variety of modalities, including speech.
• Supporting information personalization by taking into account user preferences.
• Enabling smooth and efficient development of the system in a distributed fashion, since the consortium consists of partners dispersed all over the world.
• Making provisions for the system’s scalability through support of more database systems featuring other database schemas.

These tasks form a crucial subset of the functional requirements of the CATCH2004 back end system.

The CATCH2004 system is based on a multi-tiered architecture, as described in [1], [2]. Multimodality is one of the key properties of the system and requires the development of appropriate content transformation mechanisms [3]. In particular, data for each of the supported modalities is generated “on the fly” by transforming content stored in back-end databases into formats that comply with the requirements of the individual user. These requirements range from the different modalities (e.g., speech, GUI) that client devices are able to support, to the specific languages that the user prefers for the interaction with the system (e.g., English, German, Greek and Finnish).

The content is originally stored in a database that resides at the system’s back-end. There are no restrictions regarding the DBMS or the design of the database schema used for organizing the data. A challenge faced by the CATCH2004 system is to find a transparent way to present data that are bound to a specific database design in a number of different formats, each suiting the needs of a particular client. The demand for transforming the content to different formats implies the existence of an intermediate data format that serves as an abstraction between them. XML is well suited to playing the role of the adaptation layer among the different modalities/localities. The basic idea is aligned with the first of the strategies proposed in section 2. Specifically, content stored in the back end database is retrieved by the business layer and transformed into an XML document that is independent of both back end and client implementations. This also enhances the extensibility of CATCH2004, since it facilitates handling of additional modalities. Figure 4 provides an overview of the system’s architecture with particular emphasis on the content transformation mechanisms employed at various points.
As illustrated in this figure, the transformation between the various formats is in practice a two way process consisting of a sequence of steps. First, the system extracts from the user request the information required to perform the content transformation. As the request travels backwards through the layers, each layer extracts the specific information it needs from the request. When all layers have the necessary data, the transformation process begins. The information extracted from the user request is used in each step to generate, filter, and enhance the content. The output of each of these sub-processes is a new document that is closer to the user requirements than its predecessor. This new document is in turn the input of the next transformation step, until a document completely capturing the user requirements is finally produced. Thus, all modalities are processed along the same path while dynamically generating the respective markup pages.

XML is primarily used for conducting these content transformations, which involve the following sequential steps.

The first step takes place at the access point between the database and the business tiers. This step has a dual purpose: first, to retrieve content that matches the user locality preferences and, second, to provide independence from the details of the specific DBMS implementation. As far as the first objective is concerned the solution is straightforward: the system interprets the user preferences and issues the appropriate SQL commands that access the tables for the specified locality. The second objective calls for a more complicated solution: the database content retrieved by the previous process is not forwarded directly to the business tier, but is instead transformed into XML format. This XML document contains a simple one to one mapping between the document elements and the tables and fields of the database schema. Furthermore, in order to abstract from the specific SQL schema, a second transformation is applied to the output of the previous stage (i.e. the XML document structured according to the SQL query issued). The result is another XML document, organized in a structure that captures the semantics of the CATCH2004 applications and completely independent of both the database schema and the DBMS details. This XML document is forwarded to the business tier for further processing, according to the following step.

The second step in the content transformation takes place in the business layer, namely the Multimodal Portal (MMPortal). Prior to forwarding data to the presentation layer, the content is transformed into a format that is compatible with the modalities of the client device.
As already outlined, the input of this stage is an XML document containing information that matches the semantics of the CATCH2004 applications, whereas the output is a document matching the modality preferences of the client device. Currently the system is able to produce different versions of the output: an HTML version (appropriate for standard WWW browsers), a WAP [23] version for cell phones, and multimodal versions for devices that are able to combine more than a single modality.

The third step is meaningful only for a subset of the supported clients, namely those with multimodal capabilities. In this case user devices consist of more than one module, each of which handles a separate modality. Effectively this means that the system must perform two actions: first, transform the XML document sent by the MMPortal into two separate documents (one for each module) and, second, maintain a separate stream of information to each of the modules and handle the synchronization issues between them.
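The second step, in which one semantic XML document is rendered differently per client modality, might be sketched as follows. This is an illustrative assumption rather than the MMPortal code: the class `ModalityRendererSketch`, the `events`/`event`/`title` tags and both stylesheets are hypothetical, and the sketch uses the standard `javax.xml.transform` API shipped with the JDK.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.EnumMap;
import java.util.Map;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ModalityRendererSketch {

    enum Modality { HTML, WAP }

    private static final String XSL_HEAD =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>";

    // One hypothetical stylesheet per modality; both read the same semantic XML.
    private static final Map<Modality, String> STYLESHEETS = new EnumMap<>(Modality.class);
    static {
        STYLESHEETS.put(Modality.HTML, XSL_HEAD
            + "<xsl:template match='/events'><ul>"
            + "<xsl:for-each select='event'><li><xsl:value-of select='title'/></li></xsl:for-each>"
            + "</ul></xsl:template></xsl:stylesheet>");
        STYLESHEETS.put(Modality.WAP, XSL_HEAD
            + "<xsl:template match='/events'><wml><card id='events'>"
            + "<xsl:for-each select='event'><p><xsl:value-of select='title'/></p></xsl:for-each>"
            + "</card></wml></xsl:template></xsl:stylesheet>");
    }

    /** Applies the stylesheet registered for the requested modality. */
    static String render(String semanticXml, Modality modality) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer(
                    new StreamSource(new StringReader(STYLESHEETS.get(modality))));
            StringWriter out = new StringWriter();
            transformer.transform(
                    new StreamSource(new StringReader(semanticXml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<events><event><title>Jazz Night</title></event></events>";
        System.out.println(render(xml, Modality.HTML));
        System.out.println(render(xml, Modality.WAP));
    }
}
```

Under this arrangement, supporting a further modality amounts to registering one more stylesheet; the semantic XML and the code that produces it stay unchanged.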

[Figure: three tiers (Presentation Tier; Business Tier with the Multimodal Portal; Back End Tier) and the components EIRI Interface, MM DBs, Virtual Proxy, Renderer A and Renderer B. Labels along the transformation path: original content; abstraction from DB schema; abstraction from DBMS architecture; data independent of DBMS architecture; generic event content; definition of modality; modality specific content; definition of renderer; renderer specific content]

Figure 4: Overview of CATCH 2004 Architecture with respect to Content Transformation(s)

Figure 4 presents the general functionality of the system as a multimodal content retrieval engine. However, to serve audio only interactions, an alternate architecture is employed, involving an NLU (Natural Language Understanding) module instead of the MMPortal. In such a case, the sole client device is a telephone (supporting content retrieval through audio only interaction) and the NLU serves as the consumer of the XML document produced after querying the back end database. Therefore, of the three steps mentioned above, only the first one is applicable. The alternate architecture involving the NLU is also described in [2], along with scenarios elaborating on the coexistence of the MMPortal and NLU modules in a unified system.

Based on the concepts described above, the project will develop and test two prototype demonstration systems, one in Helsinki and another in Athens. The Helsinki demonstrator will support information retrieval related to cultural activities taking place in the scope of the ‘Art Goes Kapakka’ event, while the Athens demonstrator will enable multimodal and multilingual access, through multiple devices, to information about the Olympic Games of 2004. The Olympic Games constitute a golden opportunity for testing innovative multimodal services that are made available to visitors, participants and organizers. The city of Helsinki features high penetration rates with respect to the use of smart phones, PDAs and wireless devices, and has therefore been selected as an excellent testbed for deploying novel wireless services.

4. XML based Content Access and Transformation in CATCH2004

The content databases of the CATCH2004 project are built on enterprise Relational Database Management Systems (RDBMS). RDBMS systems guarantee the scalability of the content databases towards billions of information records. Based on SQL, the CATCH2004 system can transparently access data sources that are distributed among different schemas or remote database instances as if they resided in the same database instance. In the current case the available database schemas are accessed through a common interface based on the Java Database Connectivity API (JDBC). This effectively means that the system is not bound to a specific DBMS architecture, which boosts its interoperability and scalability.

The particular database schema that is used for hosting the content depends on the target application(s). Currently, the following applications are envisaged for the final CATCH2004 system demonstration:

(a) A Cultural Events Information Application, which is being implemented as part of both (Athens and Helsinki) demonstrators.
(b) A Sports Events Information Application, which is specific to the Athens demonstrator.

In the sequel we present a short overview of the database schemas for both applications, so as to provide insight into the original format of the data. Such an illustration facilitates the understanding of how the information undergoes the various transformations until it reaches the end user.

Cultural Events Information Schema

The cultural events information retrieval application uses queries that substantially simplify the design of the database schema and enable the adoption of the “star model”. The bulk of the data is common to all entities and is stored in a central table. The rest of the information is scattered among a number of tables that are referenced by the main one.
This simple design covers the needs of the application and offers good performance (which is very important in interactive applications). Each of the events stored in the database is composed of two parts. The first part contains information that is specific to each event, such as the event name and start/end time; these values are stored in a single table. The rest of the event information is selected from lists of values held in the other tables, which that table references.

Sports Events Information Schema

The sports schema is a bit more complicated than the one pertaining to the cultural events. The reason is that there is a greater variety of queries that the user can submit than in the case of cultural events. Here, apart from the events, the user may request information about the athletes, the results or the venues in which each sport takes place. Thus, the schema of the sports events is a more complicated version of the “star model”.

Note that in order to support multiple languages the content has to be stored in multiple languages, and the relevant back end has to support multiple localities. The first option for tackling this requirement is to store all the information in the same set of tables and identify each locality by a special attribute. This effectively means that the queries issued by the user must carry an additional parameter defining the preferred language. The other option, which has been followed in the current implementation, is to add a new copy of all the existing tables for each supported locality. Clearly this solution increases the demand for disk space compared to the former alternative. However, having many tables considerably improves the response time of the system, which is extremely important for an interactive application like CATCH2004. Moreover, language specific copies of the database schema allow access to content in a uniform way. This means that the target query is formatted and expressed in the same way regardless of the language used.

The architecture of the CATCH2004 application relies on an initial content delivery as an XML document, which is independent of content storing mechanisms, database schemas and vendors. The exchange of XML structures was introduced to allow the separation between the databases used in each application and the business logic. In order to achieve independence from database schemas, RDBMS vendors and content storing mechanisms, the content has to be transformed from the original relational structure into a well defined XML structure that can be exploited by the rest of the system components. The transformation into the desired XML structures is triggered by these front-end components and carried out at the system’s back end through an appropriate set of function calls. These content access and transformation methods constitute the Event Information Retrieval Interface (EIRI). Thus, EIRI is the interface that resides between the application logic and the RDBMS systems of the CATCH2004 project (Figure 6). The structure of the EIRI defines ways of accessing and interacting with the particular database implementations so as to get the appropriate XML document. In fact EIRI is implemented as a thin layer at each database side, and as specific function calls in an EIRI Adapter object at the MMPortal side. The current implementations of EIRI can be accessed either as a (Java) library or remotely via the HTTP protocol. The second option is a small add-on, implemented using the Java Servlets technology (Figure 5).

The XML documents resulting from invocations of EIRI implementations contain the requested data, and are structured according to a predefined standard format. The standard format is expressed by using a Document Type Definition (DTD).
The DTD is a standard mechanism for specifying a set of rules, more precisely a grammar, for the structure of XML elements. Using a common DTD, the application logic knows exactly the structure of the incoming data, just as by using EIRI it knows exactly how to acquire the data.

[Figure: NLU and MMPortal; HTTP; EIRI Servlet; EIRI Implementation (getStatusList, getEventList). Steps listed for the EIRI implementation: • Create SQL Statement from Hashmap • Perform Query • Transform Data • Create BEResult]

Figure 5: Interaction of business logic components with the EIRI data access library
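To illustrate how a common DTD lets a consumer rely on the structure of incoming data, the following Java sketch validates a document against a small DTD using the validating parser of the JDK. The DTD shown is a hypothetical stand-in and not the catch-core DTD, whose content is not reproduced in this paper; the class name `DtdValidationSketch` and the `events`/`event`/`title`/`start` tags are likewise assumptions.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class DtdValidationSketch {

    // Hypothetical grammar: an event list in which every event has a title and a start.
    static final String DTD =
        "<!DOCTYPE events ["
      + "<!ELEMENT events (event*)>"
      + "<!ELEMENT event (title, start)>"
      + "<!ELEMENT title (#PCDATA)>"
      + "<!ELEMENT start (#PCDATA)>"
      + "]>";

    /** Returns true if the body is well-formed and valid with respect to DTD. */
    static boolean isValid(String xmlBody) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // enable DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Validity errors are only reported to the handler, so rethrow them.
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) throws SAXException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(DTD + xmlBody)));
            return true;
        } catch (SAXException e) {
            return false; // malformed or not compliant with the DTD
        } catch (Exception e) {
            throw new IllegalStateException("Parser could not be set up or read", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(
            "<events><event><title>Jazz Night</title><start>2001-05-20</start></event></events>"));
        System.out.println(isValid(
            "<events><event><start>2001-05-20</start></event></events>"));
    }
}
```

A consumer such as the application logic could run a check of this kind once at its boundary and then read the expected elements without further defensive code.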

Delving into the details of the EIRI, the implementation layer receives requests for content retrieval from the database in an attribute-value pair format. Attribute-value pairs can be passed to EIRI implementations either through an HTTP GET request or encapsulated in software objects through a method call. The basic attributes that should be available are defined in the EIRI interface, but each specific implementation can extend the attribute set according to its needs. To preserve consistency across all EIRI implementations it is assumed that all the tags available in the predefined DTD structure are possible querying attributes.

One may argue that using a predetermined DTD structure limits the scope of the application features, given that new attributes cannot be directly supported. However, the DTD used is extensible and allows for the addition of new data, which subsequently have to be supported by extending the scope of EIRI towards the new information. It is also noteworthy that the DTD has been designed with a view to covering the whole range of the demonstrators’ requirements. Note that extending the EIRI to support extra querying attributes would normally require extra functionality to be introduced in the main module, thus complicating the enhancement process. Therefore, the project will investigate solutions to this point, either by incorporating an XPath [12] mechanism or by using the XQuery language in the implementations.

Upon reception of a request or method call, the provided attribute-value pairs are parsed for valid and non-empty attributes. The attributes are transformed dynamically into an SQL query to the database. This part is implementation (and database) specific, and the SQL statement is constructed based on the database schema. The necessary database access can be achieved by using one of the following alternatives: JDBC, pure Java implementations, JDBC to ODBC bridges or even platform specific implementations. However, all these approaches require the database vendor drivers for the underlying platform.
In the case of JDBC we assume the Java Virtual Machine as the platform, though the same driver set is needed for every hardware/OS combination. Query results derived in the scope of EIRI function calls are returned in standard Java objects provided by JDBC. From a content transformation viewpoint, the first transformation occurs at that point. In particular, the EIRI implementation transforms the data stored in the Java structures into an XML document with a one to one mapping of columns and rows, preserving the original names of columns and enclosing each row in its own tag. The reason for this transformation is to make the returned data independent of the database and to release the resources associated with the database connection. The content in this phase is kept in an XML document in memory. Having the content in an XML document is the first step towards the full transformation of the content into the desired format.

The following transformation step is to make the document valid (compliant) according to the target Document Type Definition (i.e. the one expected by the MMPortal). The standard mechanism for transforming an XML document into another XML document is XSL Transformations (XSLT), part of the Extensible Stylesheet Language. In the CATCH2004 project we utilise the Xalan-Java library [8] for XSL transformations, developed and maintained by the Apache XML project under the Apache Software License. XSL stylesheets are provided to the Xalan XSLT engine along with the original XML document, and the result is an XML document formatted according to the XSL stylesheet. The XSL stylesheets define how the original XML tags are rearranged and renamed, and which new tags are introduced, so as to produce a new valid XML document compliant with the catch-core DTD. The whole procedure is visualised in Figure 6.

In a nutshell the EIRI is a content transformation module, which is designed and created for the Event Information Retrieval application of the CATCH2004 project.
EIRI hides the database details from the business logic and acts as a mediator between the MMPortal and the content storage mechanisms. Currently, EIRI has been implemented both in Athens and in Helsinki for the cultural events database schemas. As another application is needed for sports events, the EIRI concept is reused to provide sports event information from the databases, by defining another DTD or extending the current one and implementing the functionality needed to construct the appropriate SQL statements dynamically.
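The first transformation described in this section, from JDBC results to an in-memory XML document with a one-to-one mapping of columns and rows, might look roughly as follows. This is a sketch under stated assumptions rather than the project's implementation: the class name `RowsToXml` and the element names `resultset` and `row` are hypothetical, and column labels are assumed to be valid XML element names.

```java
import java.io.StringWriter;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical sketch: one XML element per row, one child element per column. */
public class RowsToXml {

    /** Copies a JDBC ResultSet into plain lists, then builds the XML document. */
    public static Document fromResultSet(ResultSet rs) throws Exception {
        ResultSetMetaData meta = rs.getMetaData();
        List<String> columns = new ArrayList<>();
        for (int i = 1; i <= meta.getColumnCount(); i++) {  // JDBC columns are 1-based
            columns.add(meta.getColumnLabel(i));
        }
        List<List<String>> rows = new ArrayList<>();
        while (rs.next()) {
            List<String> row = new ArrayList<>();
            for (int i = 1; i <= columns.size(); i++) {
                row.add(rs.getString(i));
            }
            rows.add(row);
        }
        return toXml(columns, rows);
    }

    /** Builds the document; column names are preserved as element names. */
    public static Document toXml(List<String> columns, List<List<String>> rows) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("resultset");  // assumed root element name
        doc.appendChild(root);
        for (List<String> row : rows) {
            Element rowEl = doc.createElement("row");   // assumed row separator tag
            root.appendChild(rowEl);
            for (int i = 0; i < columns.size(); i++) {
                // Assumes the column name is a valid XML element name.
                Element cell = doc.createElement(columns.get(i));
                String value = row.get(i);
                cell.setTextContent(value == null ? "" : value);  // SQL NULL becomes empty
                rowEl.appendChild(cell);
            }
        }
        return doc;
    }

    /** Serialises the in-memory document to a string, without an XML declaration. */
    public static String serialize(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in data, so the sketch runs without a database connection.
        List<String> columns = Arrays.asList("event_title", "venue_name");
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("Concert", "Megaron"),
                Arrays.asList("Exhibition", null));
        System.out.println(serialize(toXml(columns, rows)));
    }
}
```

Because `fromResultSet` copies every value before building the document, the caller can close the `ResultSet` and its connection as soon as the method returns, which matches the stated aim of releasing database resources early.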

[Figure: DB -> raw DB data structure (ResultSet) -> Transform ResultSet to XML -> XML doc (one-to-one mapping with DB data) -> XML to XML, using XSLT -> XML doc, valid with catch-core DTD -> to business logic]

Figure 6: The EIRI implementation as a content transformation mechanism
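The second step in Figure 6, the XML-to-XML transformation, can be sketched with the standard JAXP transformation API (`javax.xml.transform`), which Xalan-Java implements. The stylesheet below is purely illustrative: the target element names `events`, `event`, `name` and `place` are hypothetical stand-ins, since the catch-core DTD is not reproduced in this paper, and the input element names are likewise assumed.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical sketch: renames and rearranges tags with an XSLT stylesheet. */
public class XsltStep {

    // Illustrative stylesheet: maps resultset/row/column tags onto assumed
    // target tags. A real stylesheet would follow the catch-core DTD instead.
    static final String STYLESHEET =
          "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
        + "<xsl:template match=\"/resultset\">"
        + "<events><xsl:apply-templates select=\"row\"/></events>"
        + "</xsl:template>"
        + "<xsl:template match=\"row\">"
        + "<event><name><xsl:value-of select=\"event_title\"/></name>"
        + "<place><xsl:value-of select=\"venue_name\"/></place></event>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the given stylesheet to the given XML text and returns the result. */
    public static String transform(String xml, String xsl) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Input in the one-to-one form produced by the first transformation.
        String xml = "<resultset><row><event_title>Concert</event_title>"
                   + "<venue_name>Megaron</venue_name></row></resultset>";
        System.out.println(transform(xml, STYLESHEET));
    }
}
```

Keeping the mapping in a stylesheet rather than in Java code means that supporting a different database schema or a revised DTD would, in principle, require only a new stylesheet.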

5. Conclusions

Using XML technologies in modern multi-tier applications is widely associated with a host of benefits. Nevertheless, systems and software engineers must be able to determine the appropriate role of XML within information portals and e-commerce applications. This paper attempts to provide insight into the appropriate positioning of XML in the suite of technologies that support large-scale, business-critical applications. It is demonstrated that XML can add value to both legacy and new relational database systems by supporting the implementation of an intermediate data layer that usually sits on the boundary between the business logic and the data repository. This layer is structured according to the particular application at hand and can be independent of specific database vendors and relational schemas. Moreover, it is usually independent of the client technologies and markup languages used to display data. Thus, XML provides the means for implementing multilingual, multimodal and multi-device applications with minimum customisation effort for each supported language, modality or device (i.e. by manipulating a single well-structured source of information, which can be flexibly transformed).

The paper suggests two generic methods for using XML in systems whose data tiers are supported by some sort of RDBMS. An implementation of a system that makes use of such strategies is also presented as a practical case. This system was devised, designed and implemented in the scope of the CATCH2004 research initiative. For this reason, the main objectives of the project are outlined, along with the architecture of the CATCH2004 system.

It is our belief that the use of XML as a means for transforming data in all kinds of Internet applications (including the mobile Internet) will proliferate. As XML DOM techniques become more mature, it is expected that data processing will increasingly occur at the XML level, instead of relying on conventional direct access to the databases through SQL.

Acknowledgements

The authors gratefully acknowledge support from the European Commission under the IST Project CATCH2004: “Converse in Athens, Cologne and Helsinki”. The authors also acknowledge valuable help and contributions from their partners IBM, NOKIA, ELISA Communications, City of Cologne (COC), Athens Organizing Olympic Committee (ATHOC), Hellenic Telecommunications Organization (OTE) and Gerhard Mercator University of Duisburg (GMUD).

References

[1] CATCH-2004 "Converse in Athens, Cologne and Helsinki", Annex I "Description of Work", IST Programme Key Action III, Proposal number IST-1999-11103.
[2] CATCH 2004, Deliverable D07: ‘Report on System Architecture and Design’, August 2000.
[3] CATCH 2004, Deliverable D11: ‘Functional Specification of Content Transforming Tools’, January 2001.
[4] CATCH 2004 public WWW Site, accessible at: http://www.catch2004.org
[5] Danny Ayers, Hans Bergsen, Michael Bogovich, Jason Diamond, et al., ‘Professional Java Server Programming’, ISBN 1-861002-77-7.
[6] Document Object Model (DOM), http://www.w3c.org/DOM
[7] http://java.sun.com/products/jdbc/
[8] http://www.alphaworks.ibm.com
[9] http://www.javasoft.com/jsp
[10] http://www.w3c.org
[11] http://www.w3.org/Style/XSL/
[12] http://www.w3.org/TR/xpath.html
[13] John Jam, ‘Microsoft IIS, COM+ and ASP’, PC Magazine, May 3, 2001.
[14] Kevin Williams, Michael Brundage, Patrick Dengler, Jeff Gabriel, Andy Hoskinson, Michael Kay, Thomas Maxwell, Marcelo Ochoa, Johnny Papa, Mohan Venmane, ‘Professional XML Databases’, ISBN 1861003587.
[15] Lary J. Sheltzer, ‘PHP and Zend’, PC Magazine, May 3, 2001.
[16] Michael Della, ‘What’s holding up XML’, PC Magazine, May 1, 2001.
[17] Neil Randal, ‘XML: A second chance for Web markup’, PC Magazine, November 4, 1997.
[18] Oracle XSQL, XSL, http://www.oracle.com/xsql
[19] The IST programme, http://www.cordis.lu/ist
[20] Timothy Dyck, ‘Analysis’, PC Magazine, May 3, 2001.
[21] Voice eXtensible Mark-up Language, http://www.VoiceXMLForum.org/
[22] William Robert Stanek, ‘Structuring Data with XML’, April 10, 2000.
[23] Wireless Mark-up Language Specification, WAP Forum, November 4, 1999, http://www.wapforum.org/