Generating Transformational Annotation for Web ... - CiteSeerX

4 downloads 168873 Views 1MB Size Report
Jul 12, 2004 - tant to provide an advanced tool support for annotations for Web document .... by a computer, perhaps the person's knowledge can somehow be exploited ..... the page C on the 229th day and the page D on the 365th day.
Generating Transformational Annotation for Web Document Adaptation: Tool Support and Empirical Evaluation Masahiro Hori a,∗ , Kouichi Ono b , Mari Abe b,c , Teruo Koyanagi b a

Faculty of Informatics, Kansai University, 2-1-1 Ryozenji-cho, Takatsuki-shi, Osaka 569-1095, Japan b

IBM Tokyo Research Laboratory, 1623-14 Shimotsuruma, Yamato-shi, Kanagawa 242-8502, Japan c

Graduate School of Science and Technology, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama, 223-8522 Japan

Abstract Web annotation is crucial for providing machine-understandable descriptions of Web resources, and has a number of applications such as discovery, qualification, and adaptation of Web documents. While annotations are often embedded into a Web document, annotations can be associated externally by means of addressing expressions represented with the XPath language. However, creation of external annotation solely with a conventional editor is not easy because annotation authoring involves the maintenance and elaboration of addressing expressions as well as annotation contents. In addition, there has been little empirical study of robust pointing by XPath expressions, in spite of the increasing prevalence of the XPath language for use in emerging content adaptation systems. This paper proposes a classification of annotation tool design, taking account of differences in authoring methods and roles of annotation. On the basis of the classification, tools for generating external annotations are briefly explained along with applications of Web document adaptation for small-screen devices and portal site development. Robustness of the addressing expressions is then investigated, and practical implications to the reliable use of external annotation are drawn from empirical evaluation with evolving real-life Web documents. Key words: external annotation, document adaptation, annotation tool, XPath, robust pointing

∗ Corresponding author. Email addresses: [email protected] (Masahiro Hori),

Preprint submitted to Elsevier Science

12 July 2004

1

Introduction

An annotation is a remark attached to a particular portion of a document, and covers a broad range in the literature. Forms of annotations can be characterized by the dimensions: whether formal or informal, and whether tacit or explicit [28]. Annotation that allows structural specification resides at the most formal and explicit extreme. Web annotation is crucial for providing not only human-readable remarks, but also machine-understandable descriptions, and has a number of applications such as discovery, qualification, and adaptation of Web documents [23]. As more and more Web-enabled personal devices are becoming available for connecting to the Internet, the same Web documents need to be rendered differently on different client devices. Adaptation of Web document to delivery context is thus crucial for transparent Web access, which may depend on client capabilities, network connectivity, or user preferences [9]. The long-term goal of our research is to establish technologies of adapting Web documents suitable for delivery context. The document adaptation or customization requires annotation that indicates the ways of modifying the document at hand because Web documents are usually created without considering such adaptation and are not provided with any additional information or hint for the adaptation. Annotations can be embedded into a Web document as inline annotations, which are often created as extra attributes of document elements. Existing HTML browsers ignore unknown attributes added to HTML elements, without being bothered by the proprietary inline annotations. Because of this simplicity, inline annotation has been adopted as a way of associating annotation with HTML documents [27,32,12,15]. An advantage of the inline approach is the ease of annotation maintenance without the bookkeeping task of associating annotations with their target document. The inline approach, however, requires annotation authors or annotators to have document ownership because annotated documents need to be modified whenever inline annotations are created or revised. On the other hand, the external annotation approach [16,17] does not suffer from these issues related to document ownership. In addition, clear distinction between content and annotation is desirable with regard to the design guideline that content should be separated from presentation. Therefore, it is assumed in this study that annotation is maintained separately from a target document, and exploited dynamically at runtime by a document adaptation engine. It is important to note that an external annotation may point to portions of different Web documents, if it makes sense to apply the annotation to [email protected] (Kouichi Ono), [email protected] (Mari Abe), [email protected] (Teruo Koyanagi).

2

documents with the same document fragment. External annotation approach thus provides a promising way of facilitating the sharing and reuse of metadata for Web document repurposing [19]. An external annotation consists of two items: annotation content and an addressing expression. In particular, the addressing expressions are represented with the open standard XPath language [36], which allows pointing to arbitrary nodes in the document object model (DOM) [10]. Besides the XPath language, there is another open standard addressing language called XPointer [37]. The XPointer language, which is an extension of the XPath, allows finergrained pointing to substrings in character data and flexible pointing to multiple DOM-tree fragments. Regardless of the full-featured expressive power of the XPointer language, external annotations in this study are used for the node-level adaptation of Web documents. Therefore, we adopted the XPath language as a scheme of addressing expressions in this paper. When annotations are attached simply as commentary to a target document, browser-based annotation tools are desirable even if the annotation is externally maintained. In the case of annotation for Web document adaptation, an addressing expression indicates a part of the document to be customized, and an annotation content specifies how the indicated portion should be modified. Therefore, creation of such annotation for document customization is not easy solely with a conventional editor for commentary annotations, and it is important to provide an advanced tool support for annotations for Web document adaptation. In addition to the issues related to the tool support, robustness of the addressing expressions is crucial for the use of external annotation. Since Web documents may change over time, it is not always obvious what kinds of addressing expression keep pointing to the same target element regardless of the document changes. It was reported that a key complaint in the use of electronic annotation was the situation in which an annotation cannot point any portion of a target document [5]. This is an issue related to robust positioning, which has been investigated in a couple of empirical studies [31,4]. However, there has been little study of robust pointing by XPath expressions, in spite of the increasing prevalence of the XPath language not only for use with XSLT [38], but also in emerging content adaptation systems [17,34,29,3]. The objective of this study is to propose a classification of annotation tools that generate annotations for Web document adaptation, and to draw implications to the reliable use of external annotation on the basis of empirical evaluation with evolving real-life Web documents. In the next section, we clarify the different roles of annotations for assertion and transformation, and then introduce variations in annotation tool design with distinction of two types of authoring methods: annotation by selection and by example. Section 3 explains 3

a page-clipping annotation language and tools for generating the clipping annotation, along with applications of the annotation to document clipping for small-screen devices and portal site development. In Section 4, we investigate the robustness of addressing expressions taking account of the changes in real-life Web documents. In particular, it was investigated to what extent the XPath expressions generated by the tools continued to point to the same nodes in the documents updated during the observation period of one year and six months. Finally, we discuss the advantages and limitations of the XPath expressions for practical use in external annotation.

2

Variations in Annotation Tool Design

An annotation in general declares properties that qualify a particular portion of a target document. In some cases, however, annotations may indicate structural changes for the annotated portion of a target document. In order to clarify the distinction of these two roles, the former is called assertional annotation, while the latter transformational annotation [18]. Note that this distinction is not exclusive, because every annotation is intrinsically an assertion. It is simple for annotators to select a location to be annotated and provide annotation content. This is an approach what we call annotation by selection, and is adopted by existing annotation tools [1,6,17,21,29,33,14]. In contrast to assertional annotation (such as Dublin Core Metadata [8]), transformational annotation has been used for Web content adaptation, in which structural changes of a target document are needed [17,29,34,38]. In the case of transformational annotation, it is not easy for an annotator to indicate the ways of modifications declaratively as assertional annotation. However, it is straightforward for an annotator to modify a target document toward the desired result of the customization, and transformational annotation can be generated on the basis of an example of document customization. This is a basic idea behind an approach what we call annotation by example, which was originally proposed in our previous work on the automatic generation of XSLT scripts by example [22,30]. Table 1 Variations in annotation tools Authoring methods

R ol es of a nnota tion

By s elec t i o n

By example

A s s er t i o n

(a)

N/A

T r an s f o r mat i o n

(b)

(c)

4

According to the above-mentioned distinctions of annotation roles and annotation authoring methods, Table 1 summarizes variations in annotation tools. Annotators can select a portion of document to be annotated and declare properties on the selected portion as assertional annotation. This type of annotation tool corresponds to the assertional annotation by selection [Table 1(a)], and most of the existing annotation tools fall into this category. Even when annotations are used for structural changes of a target document, it is possible for an annotator to create transformational annotations by selecting portions to be changed and declare instructions of transformations as annotations. This type of annotation tool is classified as the transformational annotation by selection [Table 1(b)]. In order to create transformational annotations, annotation by example would be easier for annotators, because the annotators can work with a concrete example and create a desired result interactively with the example. In particular, the example-based method allows annotators to generate transformational annotations on the basis of annotators’ editing operations conducted to come up with a desired result. This type of annotation tool falls into the category of transformational annotation by example [Table 1(c)]. The example-based method is particularly useful for the transformational annotations, and would not make sense for assertional annotations, because it is not intuitive for annotators to indicate assertional annotations as results of structural changes of a target document. The core part of the tool configuration, which is common to all the three variations, is independent of any particular views and editors, and consists of two document object models [10]: one for a target document and another for an annotation document. It is assumed here that the creation of an annotation document is a primary task of annotators, and the annotators are not allowed to modify a target document. The assertional annotation by selection is the most typical and provide a comprehensive way of annotation. The details of this type of annotation tool is reported in another article [1,20]. According to our interests in Web document adaptation, this paper focuses on the transformational annotation, and deals with the two approaches to generating annotation by selection [Table 1(b)] and by example [Table 1(c)].

2.1 Transformational Annotation by Selection Fig. 1 depicts a configuration for the transformational annotation by selection. This type of annotation tool relies on a target document viewer, because portions of a target document can only be selected without any modification. With this type of annotation tool, first an annotator opens a target document to be customized. The annotator then selects portions of the target document 5

by using a document viewer [Fig. 1 (a)], and indicates how each of the selected portion to be modified (e.g., to be removed and enlarged). The ways of modification, namely, annotation instructions are usually specified by using a pup-up menu or a dialog window, but may be specified implicitly with a default operation mode as given later in the annotation tool for portal site development. Transformational annotation can then be generated [Fig. 1 (b)] on the basis of the selected portions of a target document and annotation instructions.

2.2 Transformational Annotation by Example Transformational annotation languages such as XSLT are more like programming languages, and it is not necessarily easy for an annotator to create transformational annotations merely by using a conventional editor for Web documents. However, if a person knows how to perform a task to be executed by a computer, perhaps the person’s knowledge can somehow be exploited for the creation of a program to perform the task. This is the idea behind the programming by example [24]. Programming by example is a natural approach to generating transformational annotation for page designers or novice programmers, because they need only work with examples of how to transform a document at hand, and are given with generated annotations that can replicate the same transformation. Selection-based annotation generation

Core components

Target D o c u m en t Viewer

T a r g et D o cu m en t

(a)

annotate T r a n s f o r m a ti o n a l A n n o ta ti o n

(b)

Selected n o des

A n n o tati o n G en erato r

Fig. 1. Tool configuration for transformational annotation by selection Example-based annotation generation

Core components

Target D o c u m en t Editor

T arg et D oc u m ent

C u s tom iz ed D oc u m ent

(b)

annotate T rans f orm ational A nnotation

(a)

(c)

A n n o tati o n G en erato r

Operation h is tory

Fig. 2. Tool configuration for transformational annotation by example

6

A configuration of the example-based annotation tool is depicted in Fig. 2. This type of annotation tool relies on a target document editor rather than a viewer in contrast to the annotation by selection (see Fig. 1). With this type of annotation tool, first an annotator opens a target document to be customized (e.g., an HTML page). The annotator then edits the document by using the full capabilities of a WYSIWYG authoring tool [Fig. 2 (a)]. Although the annotator’s editing actions are recorded into an operation history [Fig. 2 (b)], the annotator does not have to care about the recording process behind the scenes. When the editing is finished, the annotation generator creates transformational annotation for the document customization [Fig. 2 (c)], which can be used by a runtime engine (e.g., XSLT processor) to replicate the transformation from the initial document to the customized document.

3

Annotation Tools for Web Page Clipping

Web pages for e-commerce, for example, contain a lot of information such as product descriptions, product images, and numerous links to other areas of the site. Even if the pages are created for the desktop computers, it would be useful to deliver portions of those pages for users to access through a Webenabled phone rather than a desktop browser. In such a case, the images and nested HTML tables prepared for a nicely laid out page are a hindrance rather than help. The sheer amount of information becomes unwieldy in the small display, and potentially expensive depending on the user’s wireless service. In this section, annotation tools for transformational annotation are explained along with applications to page clipping for small-screen devices and portal site development.

(a) O ri gi nal p age U R L -x U R L -x

Transcoding P rox y

H TTP S e rv e r

Target document (U R L -x)

C l ip p ing e ngine A A nnotati nnotati on on document document

(b ) C l i p p ed p age

Fig. 3. Overview of annotation-based Web document adaptation

7

3.1 Page Clipping for Small-Screen Devices

An overview of an annotation-based document adaptation process is depicted in Fig. 3. Upon receipt of a request from a client, a Web document is retrieved from a content server. Taking account of the capabilities of the client specified in the HTTP request header, a transcoding proxy selects one or more transcoding modules. When a selected transcoding module requires an annotation document, an annotation file is also retrieved from a content server, which may or may not be the same server that retrieved the Web document. The transcoding module may simply return the original document, if a client agent has the rendering capabilities compatible with ordinary desktop computers [Fig. 3 (a)]. Alternatively, the original document may be returned with modification, so that the original content can fit into a small-screen device [Fig. 3 (b)]. The decisions about the document adaptation are made taking account of the client capabilities specified in the HTTP request header. An annotation-based page-clipping engine [34] provides a way of realizing Web document adaptation. At content delivery time, the page-clipping engine may modify the original document with reference to page-clipping annotations and client profiles sent over HTTP. The main idea in the page-clipping annotation language is the notion of a clipping state. By using and elements in the annotation descriptions, an annotator can specify the clipping state to indicate whether the content being processed should be preserved or removed. For example, a corporate Web page and its clipped result are shown in Fig. 4. The clipped page [Fig. 4(b)] preserves areas for the company logo image, news headlines, and menus in the header and footer, and all the other portions are removed from the original page. In the clipped page, the layout of an HTML table for the header is changed by merging the first and second rows in the first column and by deleting the second and third column in the first row. In addition, width attributes of the header and footer tables are changed from an absolute pixel value of 760 to a percentage value of 100%, so that the tables can be adjusted with the available horizontal space of small-screen devices. Fig. 5 shows portions of an annotation document that realizes the page clipping illustrated in Fig. 4. The element prescribes a unit of an annotation statement in the annotation language. The target attribute is set to an XPath expression, and identifies the node on which the annotation will be applied, and the take-effect attribute indicates whether the annotation is applied before or after the target node. By specifying the value of target attribute as /html[1]/body[1]/*[1] [Fig. 5(a)], the clipping state is activated after the first element of the document. The element in Fig. 5(a) indicates that all the document elements encountered are preserved, until otherwise instructed by another annotation statement. The clipping state 8

(a) Original page

(b ) C lipped page

Fig. 4. Example of Web document clipping

is changed to ’remove’ just before the first element [Fig. 5(b)], and changed back to ’keep’ after the element [Fig. 5(c)]. As results, the first script element indicated by /html[1]/body[1]/script[1] is removed while preserving elements just before and after the removed element. Since HTML tables can often be complex elements to clip, the annotation language provides special-purpose elements to make table clipping easier. The and elements allow user to clip rows and columns without relying on complicated XPath expressions. The table-clipping elements are used in the description shown in Fig. 5(d). This description sets the clipping state to ’keep’ just before the first table element, and also changes the value of width attribute to 100% by using the element. The name attribute of can be specified with an arbitrary attribute name available in a target document. Elements for the clipping annotation [35] are listed in Table 2. The description element for the table clipping in Fig. 5(d) declares that the first column and the first row are discarded, while all the other columns and rows in the table are preserved. Note that the wildcard character to indicate multiple columns (index="*"). If a wildcard is specified, all rows (or columns) will be affected, except for those specifically indicated by a separate (or ) element. This table-clipping description thus indicates the removal of all rows and columns other than the cell for the header menu that includes menu items for ”Home”, ”Products & Services” and so on. However, this instruction in itself does not preserve the company logo image appears in the upper left corner of Fig. 4(a). Therefore, the description for inserting a cell for the logo image is needed as shown in Fig. 5(e). The CDATA section included in the element substitutes the table cell for the logo. 9

Suggest Documents