Browser Compatible XLink Processing

Tseng-Chang Yen Department of Applied Mathematics National Chung-Hsing University 250 Kuo-Kuang Rd., Taichung, Taiwan 402 [email protected]

Felix Hsu Department of Applied Mathematics National Chung-Hsing University 250 Kuo-Kuang Rd., Taichung, Taiwan 402 [email protected]

Shang-Juh Kao Department of Computer Science National Chung-Hsing University 250 Kuo-Kuang Rd., Taichung, Taiwan 402 [email protected]

Abstract The success of the Web lies in its capability for linking resources. However, the unidirectional hyperlink structure of today's Web cannot meet the growing needs of the upcoming XML world. The official W3C solution for linking in XML, called XLink (XML Linking Language), includes several advanced linking capabilities, such as third-party links, multiple link sets, and multidirectional links. In this paper, we propose a novel Browser cOmpatible XLink (BOX) application development framework which represents links via XLink and translates those links into HTML hyperlinks. Following the BOX framework, two XLink applications, multiple link set and multidirectional, are illustrated.

1

Introduction

The World Wide Web (WWW) [1] has enjoyed rapid success since its release in 1991, and hyperlinks are its most essential ingredient. According to the Internet Software Consortium, the number of hosts advertised in the Domain Name System (DNS) crossed the 400 million mark sometime in 2005. In September 2005 Netcraft [2] received responses from more than 76 million hosts providing HTTP service. More important than the huge number of hosts (they only provide the infrastructure) is the amount of information that is provided on the WWW. Because of the chaotic growth, the information lacks structure and organization. Web pages range from a few words to long texts, contain more or less advertising multimedia, and are written in different languages and styles. They are provided by publishers with different backgrounds, cultures, interests, motivations, and intentions. Even though we can acquire information easily through search engines such as Google or Yahoo, the returned web pages remain unorganized. We may trace the chaotic order of the WWW to the very simplistic linking model of the traditional Web, which lacks important functionality [3]: an HTML link is a static, directional, single-source, single-destination link that is embedded into the source document. Although the Web has continually grown and evolved, its technical foundations have remained relatively unchanged. Of the basic technologies, URLs and HTTP have remained stable for some time now, and only HTML has changed more frequently.

Much effort is put into finding solutions for dealing with the flood of information. One approach is to improve search engine algorithms, which analyze the information structure of the Web in order to produce better rankings of the links in search results [4, 5]. Another path to higher quality of information is the description of content with metadata. The introduction of XML (eXtensible Markup Language) [6] has heralded a substantial change in the way in which content can be managed. XML, a formal recommendation from the W3C, is similar to HTML in its independence of hardware and software. It uses a set of markup conventions for describing texts and can explicitly express the semantics of a document's contents. With these descriptive markups, XML documents are suitable for electronic processing and data reuse. Unlike HTML, XML is extensible: it does not contain a fixed set of markups. XML documents may have a pre-defined syntax and can be formally validated. As a consequence, XML is gaining popularity in academia, commerce, and various organizations, and more and more markups are being defined for web documents and data exchange on the Web.

By virtue of the openness of XML and the accessibility of the Internet, many users create and explore countless innovative applications on the Internet. For instance, XBMS [7], an Open XML Bibliography Management System, provides a framework for bibliography management in which authors can directly use their favorite editors to compose articles, efficiently invoke the citation retrieval service, and automatically generate the desired document format, such as PDF, TEX, or another XML format. Nevertheless, for lack of a general implementation of the XML linking languages (XLink [8] and XPointer [9]), XML linking applications are extremely rare. At the moment, several XML-based linking-related standards have been proposed. XLL, a broad term for XML hyperlinking (linking and addressing), has three major components: XLink, XPointer, and XPath.

In this paper, we introduce a client-platform-independent mechanism for implementing XLL-based applications which make use of the four key abilities of XLL: bi-directional links, n-ary links, out-of-line linking, and flexible destination specification (i.e., the ability of the source link to specify where in the destination to navigate to; XPointer is an example of this). For the sake of explanation, we also give an XLL application instance which represents the trace of a special topic. Since current web browsers at best have only early support for XML, the out-of-line links are transformed into HTML links to be rendered by web browsers. Other than outbound links, which are similar to the inline links of HTML, XLink also offers inbound links (with a remote starting resource and a local ending resource) and out-of-line links (neither the starting resource nor the ending resource is local). Documents containing collections of inbound and out-of-line links are called linkbases. As out-of-line linking requires searching for the linking elements in a linkbase file, querying becomes a necessary task in performing out-of-line linking. Other operations, such as transformation or concatenation, are also required to complete out-of-line linking.

This paper is organized as follows. General linking features are described in Section 2. In the following sections, we briefly discuss the XLink linkbase and then introduce a framework for developing XLink applications. To illustrate the functionality of linkbases, two XLink applications, multiple link set and multidirectional, are presented.

2

The XML Linking Language

In a very general sense, we can think of a link as a connection or relationship between two or more things. These things could be located anywhere in space and time. They can be explicitly linked together or implicitly linked by applying a set of assumptions. Conceptually, a link connects at least two locations. In linking terminology, these locations are referred to as anchors. The simplest kind of link consists of two anchors. When a link is traversed, the traversal starts at the source anchor and terminates at the target anchor.

The Web conquered gopher for one reason: HTML made it possible to embed hypertext links in documents. However, the linking ability supported by HTML is quite limited. For instance, there are only two linking elements, i.e., A and IMG, and people cannot create their own linking elements. The A element can only be activated on request and then links to one entire document at a time. Similarly, the IMG element is activated at loading time, and only image files can be included. Moreover, once a link is traversed, the trail of where you have been is lost.

Currently, most database management systems already handle data in XML for a variety of applications. One of the critical factors in the overall success of XML is that the "XML world" embodies many already known concepts in a way that copes with a broad spectrum of requirements. A series of XML-related standards have been developed, such as XPath [10] and XSL [11]. As XML is becoming an Internet standard for data representation, it is naturally expected to have more advanced linking ability. At the moment, several XML-based linking-related standards have been proposed. XLL, a broad term for XML hyperlinking (linking and addressing), has three major components: XLink, XPointer, and XPath.

XLink (XML Linking Language) specifies constructs that may be inserted into XML resources to describe links between objects. It uses XML syntax to create structures that can describe the simple unidirectional hyperlinks of today's HTML as well as more sophisticated multi-ended and typed links. XPointer (XML Pointer Language) specifies a language that builds upon the XML Path Language (XPath) to support addressing into the internal structures of XML documents. In particular, it provides for specific reference to elements, character strings, selections, and other parts of XML documents, whether or not they bear an explicit ID attribute, using traversals of a document's structure and choice of parts based on their properties, such as element types, attribute values, character content, and relative position, containment, and order. XPath (XML Path Language) is a language for addressing parts of an XML document, designed to be used by both XSLT and XPointer. In support of this primary purpose, it also provides basic facilities for manipulation of strings, numbers and booleans. XPath uses a compact, non-XML syntax to facilitate the use of XPath within URIs and XML attribute values.

XLink works by providing global attributes that users can apply to mark their elements as linking elements. In order to use linking elements, the XLink namespace, http://www.w3.org/1999/xlink, must be declared; it is conventionally bound to the prefix xlink.
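As an illustration of how the global attributes mark linking elements, the following minimal Python sketch declares the XLink namespace on a small document and classifies its elements. The element names (article, cite, see, para) and the helper xlink_type are hypothetical and chosen only for this example. The sketch assumes the XLink 1.1 rule that an element carrying xlink:href but no xlink:type is treated as a simple link, and it treats any type value outside the six listed types as "not a linking element", which is a simplification.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"  # the XLink namespace URI
XLINK_TYPES = {"simple", "extended", "locator", "arc", "resource", "title"}


def xlink_type(element):
    """Return the XLink type of `element`, or None if it is not a linking element."""
    declared = element.get(f"{{{XLINK_NS}}}type")
    if declared is not None:
        # Simplification of this sketch: other values are not treated as links.
        return declared if declared in XLINK_TYPES else None
    # XLink 1.1: xlink:href without xlink:type behaves like xlink:type="simple".
    if element.get(f"{{{XLINK_NS}}}href") is not None:
        return "simple"
    return None


# Hypothetical document; only the namespace declaration and the xlink:*
# attributes are prescribed by XLink, the element names are made up.
SAMPLE = """
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <cite xlink:type="simple" xlink:href="paper.xml">explicit simple link</cite>
  <see xlink:href="notes.xml">implicit simple link</see>
  <para>no link here</para>
</article>
"""

root = ET.fromstring(SAMPLE)
types = {child.tag: xlink_type(child) for child in root}
print(types)
```

Because the attributes are global, the check never looks at the element name, which is why users are free to define their own linking elements.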

Using the global attributes provided by XLink, one may specify whether a particular element is a linking element, as well as many of its properties (e.g., when to load the linked resources, how to display them once they are loaded, etc.). The global attributes provided by XLink are the following:

Type definition attribute: type
Locator attribute: href
Semantic attributes: role, arcrole, title
Behavior attributes: show, actuate
Traversal attributes: label, from, to

In XLink 1.0, XLink elements are identified by the presence of an xlink:type attribute. In XLink 1.1, XLink elements are identified by the presence of either an xlink:type attribute or an xlink:href attribute. If an element has an xlink:type attribute, then that attribute must have one of the following values: "simple", "extended", "locator", "arc", "resource", or "title", and the element must adhere to the conformance constraints imposed by that XLink element type. On the other hand, if an element has an xlink:href attribute but does not have an xlink:type attribute, then it is treated exactly as if it had an xlink:type attribute with the value "simple". By convention, when an element includes the type attribute with a value V, we refer to it as a V-type element, no matter what its actual name is. Extended-type elements offer full XLink functionality, such as inbound arcs (with a remote starting resource and a local ending resource) and third-party arcs (neither the starting resource nor the ending resource is local), as well as links that have arbitrary numbers of participating resources. Simple-type elements offer shorthand syntax for a common kind of link, an outbound link (with a local starting resource and a remote ending resource) with exactly two participating resources (into which category HTML-style A and IMG links fall). Locator-type elements address the remote resources participating in the link. Arc-type elements provide traversal rules among the link's participating resources. Title-type elements provide human-readable labels for the link. Lastly, resource-type elements supply local resources that participate in the link.

The linking model of XLink has been the topic of a long discussion, and the W3C finally settled for a compromise between HTML's extreme simplicity and very complex linking models (for a detailed analysis of the expressiveness of links, refer to [12]). One of XLink's most important steps beyond HTML's linking model is its support of third-party links. The advantages of extended-type links for a multitude of hypermedia applications have been described by many authors, as in [13] and [14]. To create a link that emanates from a resource to which you do not have write access, or from a resource that offers no way to embed linking constructs, it is necessary to use an inbound or third-party arc. When such arcs are used, the requirements for discovery of the link are greater than for outbound arcs. In XLink 1.1, the documents containing collections of inbound and third-party links are called link databases, or linkbases.

3

XLink Linkbase

In fact, every HTML link is an inline link. In a nutshell, an inline link is a link that is defined within at least a part of the content that it links together. In most cases, the link is defined within the content that acts as the source anchor. Figure 1 is a simple HTML document instance, in which the A linking element is an inline link. The content of the A element is what the user clicks to traverse to the target anchor. The target anchor is defined by adding an href attribute which contains the URL of the target.

Figure 1. A simple HTML document. (The listing is not preserved; only the text "Macro tree transducers" survives.)

Creating an inline link is a straightforward procedure. To create an inline link in an HTML file, we have to modify the HTML that acts as the source of the link. However, there are assumptions and problems inherent in this approach to linking. First, we must have write permission to modify part of the content being linked. If you do not have write access to at least one of the resources, you will be unable to create any link. For example, suppose you want to build a portal web site to provide access to resources on third-party web sites. Perhaps you would like to display links from third-party content, but you are not able to show them simply because you do not have the required write access. There are two options for solving this problem. First, you could gain write access to the third-party web site so that you can edit their resources to create links from their content to your own. Secondly, you could make a separate copy of their data on your web server and add your own links. Clearly, both options are impractical.

Second, it is sometimes desirable to deliver the same content to different types of user, but with different sets of links. For example, a novice may wish to see links to definitions of basic terminology, whereas an advanced user may wish to see links to content that requires a greater level of understanding. With inline linking, the only way to maintain different sets of links is to have multiple copies of the same content, with a different set of links embedded within each copy. There are two options for maintaining them. First, you could modify each copy of your content whenever it changes. This would be tedious and inflexible, and it introduces scope for authoring errors. Secondly, you could maintain an unlinked master copy of your content, make changes in this copy, and then create new linked copies by re-applying the links from the old version of your linked content. This would be even more time consuming and error prone.

Third, HTML provides unidirectional links rather than multidirectional links. A multidirectional link provides a many-to-many relationship between source anchors and target anchors. Sometimes it is necessary to create a link with multiple target anchors. For instance, in HTML the only way to describe the relationship that a referred article is cited by several referring articles is to create a separate link for each referring article. It is easy to foresee that each individual link must be modified once the URI of the referred article changes. This is as time consuming and error prone as maintaining the multiple link sets of the previous problem.

The above-mentioned three problems caused by the inline links of HTML can be solved by the third-party links of XLink. XLink treats links as first-class objects, which means that links may be regarded as separate entities outside of the resources that they link. XLink supports a sophisticated linking model that can support advanced hypermedia presentations [15]. However, how links are generated, maintained, transferred, and presented is not within XLink's scope. In XLink, linkbases are simply XML documents which are collections of third-party links. From XLink's point of view this is sufficient, but it is not sufficient from the perspective of application development. The adoption of any new standard depends on the availability of compatible client software. Considering the Web specifically, adoption of a new standard requires the support of the authors of browser software, and even presuming this is forthcoming, on a public system there is no guarantee of the level of client software in use. In the next section, we propose a Browser cOmpatible XLink (BOX) application development framework which represents links via XLink and translates those links into HTML links.
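To make the idea of a linkbase as a collection of third-party links concrete, the following minimal Python sketch stores the citation example from this section (one referred article cited by several referring articles) as a single extended link and expands its arc into individual traversals. The file names, element names, labels, and the function expand_arcs are hypothetical. The sketch assumes the XLink rule that a missing from or to attribute stands for all labels in the extended link, and it ignores resource-type elements for brevity.

```python
import xml.etree.ElementTree as ET

XLINK = "{http://www.w3.org/1999/xlink}"  # XLink namespace in ElementTree notation

# Hypothetical linkbase: none of the three linked documents is modified.
LINKBASE = """
<linkbase xmlns:xlink="http://www.w3.org/1999/xlink">
  <citations xlink:type="extended">
    <paper xlink:type="locator" xlink:href="referred.xml" xlink:label="cited"/>
    <paper xlink:type="locator" xlink:href="referring-a.xml" xlink:label="citing"/>
    <paper xlink:type="locator" xlink:href="referring-b.xml" xlink:label="citing"/>
    <cites xlink:type="arc" xlink:from="cited" xlink:to="citing"/>
  </citations>
</linkbase>
"""


def expand_arcs(linkbase_root):
    """Return (source_href, target_href) pairs for every arc of every extended link."""
    pairs = []
    for link in linkbase_root.iter():
        if link.get(XLINK + "type") != "extended":
            continue
        # Group the hrefs of the labelled locators by their label.
        by_label = {}
        for child in link:
            label = child.get(XLINK + "label")
            if child.get(XLINK + "type") == "locator" and label is not None:
                by_label.setdefault(label, []).append(child.get(XLINK + "href"))

        def hrefs(label):
            # A missing from/to attribute stands for all labels in this link.
            if label is None:
                return [h for group in by_label.values() for h in group]
            return by_label.get(label, [])

        for child in link:
            if child.get(XLINK + "type") == "arc":
                for source in hrefs(child.get(XLINK + "from")):
                    for target in hrefs(child.get(XLINK + "to")):
                        pairs.append((source, target))
    return pairs


pairs = expand_arcs(ET.fromstring(LINKBASE))
for source, target in pairs:
    print(source, "->", target)
```

If the URI of the referred article changes, only the one locator in the linkbase has to be edited, whereas the HTML approach requires touching every separate link.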

4

The BOX framework

A typical XLink application allows users to freely traverse its links. Three basic jobs must be completed before those links are available. First, the XLink file, which is also an XML document, must be parsed by an XML parser. Secondly, the linking elements in the XLink file must be recognized by an XLink processor. Finally, the linking elements are rendered by a user interface. Along with the growing demand for XML-formatted article publications, tens of free XML parsers have been developed for checking the well-formedness or validity of XML documents. However, due to the versatility of XLink elements, there is at present no general-purpose XLink processor. Besides, if we want to take advantage of XLink, we have a problem, since current Web browsers at best have only immature support for XML. In terms of universal support we only have access to HTML, which lags behind XLink in three key abilities: third-party links, multiple link sets, and multidirectional links. In this paper, we propose an XLink application development methodology, the Browser cOmpatible XLink (BOX) application development framework, which offers authors the flexibility to edit, validate, and automatically generate links using software they are familiar with. BOX integrates the advantageous link representation of XLink with the requisite presentation of HTML. Consequently, the effort of creating an XLink application can be significantly reduced. The workflow of BOX is illustrated in Figure 2. As shown in the figure, creating an XLink application in BOX is carried out through the following sequential phases: editing phase, validation phase, translation phase, and presentation phase. Each phase is further described in the following:

[Figure 2 diagram: XML Editor (emacs), editing phase -> XML Parser (Xerces), validation phase -> XSLT Processor (Xalan), translation phase -> Browser (Mozilla), presentation phase]

Figure 2. The BOX framework for XLink application development.

Editing phase. The first step is to create and edit an XLink document with an XML-aware editor the author is familiar with, such as emacs or xmlspy [16]. The linkbase file, which contains the third-party links, and the XSLT stylesheet, which is responsible for translating XLink links into HTML links, are created in this phase as well. A well-formed XML document does not need an associated DTD (Document Type Definition) to describe its structure, but a valid XML document with its associated DTD reveals more information to the succeeding processors. Here, we assume that the XML documents in the BOX framework are valid.

Validation phase. Parsing is the most basic operation in XML document processing. After editing is completed, the XML-based article, the linkbase, and the XSLT stylesheet are all submitted to an XML parser, and the parser passes the roots of the generated trees to the succeeding XSLT processor, such as Xalan [17], for translation. Almost all XML-related standards are structured in XML format; DTD is an exception. In particular, XSLT stylesheets are themselves XML documents, and XQuery files can be expressed in XML as well. These XSLT stylesheets or XQuery files are usually static for a specific linkbase. For efficiency, we can therefore compile the XSLT stylesheet or XQuery file once into a DOM tree, called the translation tree, to avoid repeated compilation.

Translation phase. In this phase, third-party link elements are recognized from the linkbase and XLink linking elements are translated into HTML links. Many XML-related standards, such as XSLT and XQuery, can carry out this translation. After receiving the roots of the source trees and the translation tree, the XSLT processor performs the transformation and produces the resulting HTML document.

Presentation phase. The result document of the previous phase is already formatted in HTML, and therefore the interface of the XLink application is web-based. An XLink application generated by the BOX framework is machine-independent because web browsers such as Mozilla are available across platforms. All link traversals are performed in this phase.

In the following two subsections, we present two XLink applications built by adopting well-known XML-related software for each phase of the framework. As an open framework, however, BOX can be used to develop applications with different software tools as well. Our platform is an Ubuntu 5.10 system with a 3.2 GHz Pentium CPU and 512 MB of memory. Xerces-2.7.0 [18] and Xalan-1.9.0 [17] from the Apache project are chosen as the XML parser and XSLT processor, respectively.
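The output side of the translation phase can be sketched as follows. BOX performs this step with an XSLT stylesheet run by Xalan; the Python function below is only a stand-in that shows the shape of the result, not the authors' stylesheet. It assumes the arcs of the linkbase have already been resolved into (source, target, title) triples; the function name html_links_for and the sample URLs are hypothetical.

```python
from html import escape


def html_links_for(page, arcs):
    """Render the arcs that start at `page` as an HTML list of <a> elements.

    `arcs` is a list of (source, target, title) triples, assumed to have been
    extracted from the linkbase beforehand.
    """
    items = []
    for source, target, title in arcs:
        if source == page:
            # Escape both the attribute value and the link text for HTML.
            items.append(
                f'<li><a href="{escape(target, quote=True)}">{escape(title)}</a></li>'
            )
    if not items:
        return ""
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


# Hypothetical resolved arcs for illustration.
ARCS = [
    ("paper.html", "a.html", "Cited by A"),
    ("paper.html", "b.html?x=1&y=2", "Cited by B & C"),
    ("other.html", "paper.html", "Cites"),
]

fragment = html_links_for("paper.html", ARCS)
print(fragment)
```

The fragment is ordinary HTML, so any browser can render and traverse the links without XLink support, which is the point of the presentation phase.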

4.1

Multiple Link Set Application

Consider an online publication list with full texts available for download. For authorized users, both the LaTeX file and the PDF file of each article are accessible, but non-authorized users can only access the PDF file. According to the authorization of the client, the web server provides a different publication list web page. If the publication list is tagged with HTML, the web server needs to manage two web pages which are largely identical, with minor differences. Each copy must be updated whenever the publication list is updated. This would be tedious and may lead to inconsistency. In other words, we face the multiple link set problem described in Section 3. By following the BOX framework, we can implement an application that maintains a single publication list in XML format and then generates publication lists in HTML format with different links (see footnote 1).

4.2

Multidirectional Application

Knowledge is power, and a bibliography database offers clues to the evolution of knowledge. The citation relationships among bibliography items help to depict the development of a subject. Suppose we do not have write permission to this database, so a linkbase file is used to depict the citation relationships. In Figure 3, the three locator-type elements describe two source resources and one destination resource, and the arc-type element indicates the rules for traversing among these three resources. Note that this arc-type element is a multidirectional link, and it is simulated by two HTML elements in BOX.

[Figure 3 listing not preserved; surviving annotation: "This arc shows a one-to-many links."]

Figure 3. The linkbase.xml file.

1 The implementation files of Section 4.1 and Section 4.2 can be requested from the authors or obtained from the web: http://mft.amath.nchu.edu.tw/BOX
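The multiple link set application of Section 4.1 can be sketched in a few lines of Python: a single XML publication list is kept as the master copy, and two HTML views with different links are generated from it. This is not the authors' implementation (which uses an XSLT stylesheet and is available at the address in footnote 1); the document structure, the format attribute, the file names, and the function render are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

XLINK = "{http://www.w3.org/1999/xlink}"  # XLink namespace in ElementTree notation

# Hypothetical master copy of the publication list.
PUBLICATIONS = """
<publications xmlns:xlink="http://www.w3.org/1999/xlink">
  <article title="Browser Compatible XLink Processing">
    <file format="pdf" xlink:type="simple" xlink:href="box.pdf"/>
    <file format="latex" xlink:type="simple" xlink:href="box.tex"/>
  </article>
</publications>
"""


def render(root, authorized):
    """Build the HTML publication list for one class of user."""
    # Authorized users see both files, other users only the PDF.
    allowed = {"pdf", "latex"} if authorized else {"pdf"}
    html_list = ET.Element("ul")
    for article in root.iter("article"):
        item = ET.SubElement(html_list, "li")
        item.text = article.get("title") + " "
        for link in article.iter("file"):
            if link.get("format") in allowed:
                anchor = ET.SubElement(item, "a", href=link.get(XLINK + "href"))
                anchor.text = link.get("format").upper()
                anchor.tail = " "
    return ET.tostring(html_list, encoding="unicode", method="html")


root = ET.fromstring(PUBLICATIONS)
authorized_page = render(root, authorized=True)
public_page = render(root, authorized=False)
print(authorized_page)
print(public_page)
```

Both pages are regenerated from the same source, so updating the publication list in one place keeps the two link sets consistent.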

5

Conclusions

In this paper we present the vision of XLink on the World Wide Web. We motivate the vision with scenarios, analyze the concepts behind it, and discuss how the picture can be completed. We believe that linkbases will become very popular with Web users and that they will have to be part of the Web infrastructure in the future. However, the absence of a general XLink processor makes XLink applications hard to implement. BOX is a step in the evolution of XLink applications: it eases implementation by combining the advanced links of XLink with HTML presentation. We have presented how BOX provides a general solution for efficient XLink application development. It exploits the power of XML to provide a framework for composing XML and XLink files in which authors can directly use their favorite editors, efficiently invoke the XSLT translation service, automatically generate the desired HTML format, and render the result machine-independently in web browsers. From the user's perspective, this avoids the need for different collections of software development tools, as well as the need for a dedicated graphical user interface.

Several enhancements can be made in future work on BOX. In the editing phase, checks for the completeness and repetition of third-party links are omitted in our system; however, most common XML-aware editors could perform these checks without a great deal of modification. Given the wide accessibility of the Internet, the source XML file and the linkbase may reside on different machines or even different platforms. Distributed locations require extra communication effort in the validation phase. In other words, when setting up a servlet that uses an XSLT processor (such as Xalan-Java [19]) to respond to requests for automatic HTML generation, the distributed nature of the files should be taken into account in the validation phase. By extending the translation phase with extra XSLT stylesheets that use XSLT as a query language, the linkbase can be accessed and queried for relevant information.

References

[1] T. Berners-Lee, R. Cailliau, A. Luotonen, H.F. Nielsen, and A. Secret. The world-wide web. Communications of the ACM, 37, 1994.

[2] Netcraft. http://www.netcraft.com.

[3] D. Lowe and E. Wilde. Improving web linking using XLink. Proceedings of Open Publish, 2001.

[4] S. Chakrabarti, B. Dom, S. R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Hypersearching the web. Scientific American, 1999.

[5] Google Inc. http://www.google.com.

[6] Tim Bray, Jean Paoli, and C.M. Sperberg-McQueen. Extensible Markup Language (XML) 1.1, W3C Recommendation, World Wide Web Consortium. http://www.w3.org/TR/2004/REC-xml11-20040204, 2004.

[7] Tseng-Chang Yen and Shang-Juh Kao. XBMS - an open XML bibliography management system. Proceedings of The 3rd International Conference on Information Technology and Application, 1:88–94, 2005.

[8] Steven J. DeRose, Eve Maler, and David Orchard. XML Linking Language (XLink) version 1.0. http://www.w3.org/TR/xlink/, 2001.

[9] Steven J. DeRose, Eve Maler, and Ron Daniel. XML Pointer Language (XPointer) version 1.0. http://www.w3.org/TR/xptr/, 2001.

[10] James Clark and Steve DeRose. XML Path Language (XPath) version 1.0. http://www.w3.org/TR/xpath, 1999.

[11] Anders Berglund. Extensible style language. 2005.

[12] Luc Moreau and Wendy Hall. On the expressiveness of links in hypertext systems. The Computer Journal, 41(3):459–473, 1998.

[13] Fabio Vitali. Versioning hypermedia. ACM Computing Surveys, 31(4), 1999.

[14] Janet Verbyl. Unlinking the link. ACM Computing Surveys, 31(4), 1999.

[15] Harald Weinreich, Hartmut Obendorf, and Winfried Lamersdorf. The look of the link - concepts for the user interface of extended hyperlinks. In Proceedings of the 12th ACM Conference on Hypertext and Hypermedia, 2001.

[16] xmlspy. http://www.altova.com/products ide.html.

[17] Xalan-C. http://xml.apache.org/xalan-c/index.html.

[18] Xerces-C. http://xml.apache.org/xerces-c/index.html.

[19] Xalan-Java. http://xml.apache.org/xalan-j/index.html.
