Extensible Framework of Authoring Tools for Web Document Annotation

Masahiro Hori, Mari Abe, and Kouichi Ono
IBM Tokyo Research Laboratory
1623-14 Shimotsuruma, Yamato-shi, Kanagawa-ken, 242-8502, Japan
{horim, maria, onono}@jp.ibm.com

Abstract. Web metadata is crucial for providing machine-understandable descriptions of Web resources, and has a number of applications such as discovery, qualification, and adaptation of Web documents. While metadata is often embedded into a target document, metadata can also be associated externally by means of an addressing scheme such as the XPath language. However, creation and modification of external metadata solely with a conventional editor is not easy because metadata authoring involves the maintenance and elaboration of addressing expressions as well as editing individual documents. The objective of this study is to advance extensibility and variations in the configuration of annotation tools, taking account of different authoring methods as well as the different roles of annotations for assertion and transformation. In this paper, we explain a schema for external annotation as a basis for the design of annotation tools. The framework of annotation tools is then introduced, distinguishing tool configurations for annotation by selection and annotation by example. Finally, we present practical applications of external annotations for Web document clipping, and show how annotation tools are used for annotation authoring.

1 Introduction

Web metadata is crucial for providing machine-understandable descriptions of Web resources, and has a number of applications such as discovery, qualification, and adaptation of Web documents [20]. A remark attached to a particular portion of a document is called an annotation, and the term covers a broad territory in the literature. Forms of annotations can be characterized along two dimensions: whether they are formal or informal, and whether they are tacit or explicit [23]. Metadata that follows a structural specification resides at the most formal and explicit extreme. In this paper, annotation and metadata are used interchangeably in this restricted sense. While metadata is often embedded into a target document, metadata can also be associated externally by means of an addressing scheme such as the XPath language [34]. However, creation and modification of external metadata solely with a conventional editor is not easy because metadata authoring involves the maintenance and elaboration of addressing expressions as well as editing individual documents.

Configurations of annotation tools depend on the annotation scenario. Browser-based annotation tools [7, 11, 24, 29] are desirable when annotators do not own the target documents and are therefore not allowed to edit them. On the other hand, an annotation tool based on a WYSIWYG editor [15] is helpful when annotators are responsible not only for the creation of the annotations but also for the editing of the target documents. Despite the variety of emerging annotation tools, a significant limitation of the current tools is their lack of extensibility: existing tools are developed solely for a particular annotation vocabulary, and are provided with predefined views for authoring.
In pursuit of Web content adaptation, we have been working on the development of an annotation-based transcoding system [13, 14, 17], its annotation authoring tools [1, 15, 16, 19, 25], and an empirical evaluation of the robustness of external annotations [2]. In particular, the page-clipping engine and clipping annotation tools were made commercially available as software products of a transcoding proxy [30, 5] and a portal server [8]. The objective of this paper is to propose a framework of authoring tools for external annotation. In particular, the extensibility of tool configuration is investigated on the basis of two authoring methods (annotation by selection and by example) as well as the different roles of annotations for assertion and transformation. In the next section, we explain a schema for external annotation which is specified as an XML Information Set [32]. Section 3 explains the framework of authoring tools for the external annotation, and shows three types of typical tool configurations. Finally, we present practical applications of external annotations for Web document clipping, and show how annotation tools are used for annotation authoring.

2 XML Information Set of Annotation Document

The framework of external annotation prescribes a scheme for representing annotation files and a way of associating original documents with external annotations [14]. The basic ideas behind this framework are twofold. One is that no new elements or attributes should be introduced into the document type definitions of the target documents to be annotated. The other is that annotations need to be created for arbitrary parts of annotated documents.

Annotations can be embedded into a Web document as inline annotations. Inline annotations are often created as extra attributes of document elements. Most existing HTML browsers ignore unknown attributes added to HTML elements, and so are not disturbed by proprietary inline annotations. Because of its simplicity, inline annotation has often been adopted as a way of associating annotations with HTML documents [22, 28, 11, 12]. An advantage of the inline approach is the ease of annotation maintenance without the bookkeeping task of associating annotations with their target document. The inline approach, however, requires annotators to have document ownership because annotated documents need to be modified whenever inline annotations are created or revised.

The external annotation approach, on the other hand, does not suffer from these issues related to document ownership. Moreover, the most important point of the external annotation approach is that it facilitates the sharing and reuse of annotations across Web documents. Since an external annotation points to a portion of a Web document, the annotation can be shared across Web documents that have the same document fragment. Furthermore, the mixing of content and metadata is not desirable according to the design guideline that content should be separated from presentation. External annotation files contain metadata that refers to a part of a document to be annotated. XPath [34] is used to associate annotated portions of a document with annotating descriptions.
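XPath addressing of this kind can be tried out with a few lines of Python. The following is only a minimal sketch: the sample document, the helper name resolve, and the id value are hypothetical, and the target is assumed to be well-formed markup so that the standard-library parser can read it. It resolves a positional path and, alternatively, looks an element up by its id attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical target document (assumed well-formed for this sketch).
TARGET = """<HTML><BODY>
<H2 id="news">News</H2>
<P>first</P><P>second</P><P>third</P>
</BODY></HTML>"""


def resolve(root, path):
    """Resolve an absolute path such as /HTML/BODY/P[3] against root.

    ElementTree evaluates paths relative to an element, so the leading
    step (the root element name) is checked and stripped here.
    """
    steps = path.strip("/").split("/")
    first = steps[0].split("[")[0]
    if first != root.tag:
        return None
    rest = "/".join(steps[1:])
    return root.find("./" + rest) if rest else root


root = ET.fromstring(TARGET)
third_p = resolve(root, "/HTML/BODY/P[3]")   # positional addressing
by_id = root.find(".//*[@id='news']")        # direct addressing by id
```

ElementTree supports only a subset of XPath, which is sufficient for positional child steps and attribute tests like the ones above; a full XPath engine would be needed for more elaborate expressions.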
An external annotation document refers to portions of an annotated document. A reference may point to a single element (e.g., an IMG element), or a range of elements (e.g., an H2 element and the following paragraphs). For example, /HTML/BODY/P[3] points to the third P element of the BODY element of the annotated document. If a target element has an id attribute, the attribute can be used for direct addressing without the need for a long path expression. When annotation files are stored in a repository, an appropriate annotation file for a Web document needs to be selected dynamically from the repository either implicitly by means of a structural analysis of the subject document, or explicitly by means of a reference contained in the subject document or some other association database. The explicit association, for example, can be done by using XLink [33] or the link tag of HTML.

An annotation framework in a broader sense needs to prescribe ways of storing and restoring annotations between the annotation repository and the authoring tool. However, the repository issue is outside the scope of this study because we are pursuing a framework for annotation editors that can be customized for different vocabularies and forms of annotations. One annotation vocabulary would not fit all the requirements of metadata representation. Moreover, it is also impractical to expect to provide a common basis for a generic portion covering all annotation vocabularies. This difficulty comes from the approach of defining document type specifications based on the grammatical aspects of specific vocabularies, because the same contents may be described with different grammars. Therefore, in order to clarify the edit-time constraints, it is important to characterize a class of annotation documents, without regard to rigorous grammatical definitions. Information available from an XML document can be specified as an abstract data set called the XML Information Set (Infoset) [32].
Figure 1 shows an RDF schema [27] of the annotation document infoset.

[Figure 1: RDF schema graph relating infs:InfoItems, infs:Document, infs:Element, infs:Attribute, DescriptionElement, and XPathAttribute through the properties infs:Parent, infs:ownerElement, ownerAnnotationDocument, and ownerDescription. Edge labels: s = rdfs:subClassOf, sp = rdfs:subPropertyOf, d = rdfs:domain, r = rdfs:range. Nodes with bold outlines are instances of rdfs:Class.]

Fig. 1. RDF schema for an annotation document infoset

Features of the annotation document infoset can be described as follows. Note that terms in bold face indicate information items, and the terms in italic are items specific to the annotation information set.

– A description element is an element item.
– An XPath attribute is an attribute item, and may take an XPath expression as its value.
– An XPath attribute is owned by a description element (owner description property).
– The parent of a description element is a document item (owner annotation document property).

The annotation document infoset mentioned above provides constraints on the structure of annotation documents, and is exploited as a set of edit-time constraints to be satisfied by annotation tools. The annotation tool framework presented in the next section incorporates the specifications of the annotation information set into an annotation profile.
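To make the notion of an edit-time constraint more concrete, the infoset features listed above might be checked along the following lines. This is a hedged sketch rather than the tool's actual logic: the function name is hypothetical, the default names description and target are taken from the sample profile discussed in Section 3.1, and "the parent is a document item" is read here as "the description element is an immediate child of the root element".

```python
import xml.etree.ElementTree as ET


def check_annotation_document(xml_text, description_name="description",
                              xpath_attr="target"):
    """Return a list of violated infoset constraints (empty if none)."""
    root = ET.fromstring(xml_text)
    problems = []
    for elem in root.iter(description_name):
        # Constraint: a description element sits directly under the root
        # (an assumption of this sketch, see the lead-in).
        if not any(elem is child for child in root):
            problems.append("description element is not a child of the root")
        # Constraint: every description element owns an XPath attribute.
        if xpath_attr not in elem.attrib:
            problems.append("description element lacks the XPath attribute")
    return problems


good = ('<annot><description target="/HTML/BODY/P[3]">'
        '<author>A. Author</author></description></annot>')
missing = '<annot><description><author>A. Author</author></description></annot>'
```

Because only the element and attribute names are parameters, the same check applies to any vocabulary that fits the infoset, which is the point of characterizing annotation documents without a specific grammar.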

3 Framework of Annotation Tools

An annotation in general declares properties that qualify a particular portion of a target document. Moreover, annotations may indicate structural changes for the annotated portion of a target document. In order to clarify the distinction between these two roles, the former can be called assertional annotations, and the latter transformational annotations [16]. It is important to note that this distinction is not exclusive, because every annotation is intrinsically an assertion. It is straightforward for annotation authors to indicate a location to be annotated and create an assertion as annotation content. This is an approach that we call annotation by selection, and it is adopted by existing annotation tools [1, 7, 11, 15, 18, 24, 29]. On the other hand, for transformational annotations, it is easier for the authors to modify a target document toward the desired results of the customization, rather than to indicate the modifications declaratively as assertional annotations. This is the basic idea behind an approach that we call annotation by example, which was originally proposed for the automatic generation of document transformation rules [19] and also applied to the automatic generation of transformational annotations on the basis of users' editing operations for content customization [16].

Table 1. Variations in annotation tool design

                         Authoring methods
Roles of annotation      By selection      By example
Assertion                (a)               N/A
Transformation           (b)               (c)

According to the distinctions of annotation authoring methods and roles of annotation, Table 1 summarizes variations in annotation tools. It is straightforward for annotation authors to select a portion of a document to be annotated and declare properties on the selected portion as assertional annotation. This type of annotation tool supports assertional annotation by selection [Table 1(a)]. Even when annotations are used for structural changes of a target document, it is possible for authors to create transformational annotations by selecting portions to be changed and declaring instructions of transformations as annotations. This type of annotation tool supports transformational annotation by selection [Table 1(b)]. In order to create transformational annotations, however, annotation authoring by example would be much easier for users, because users can work with a concrete example and create a desired result interactively with the example. In particular, the example-based method allows transformational annotations to be generated automatically on the basis of the operations that users conduct to come up with a desired result. This type of annotation tool follows transformational annotation by example [Table 1(c)]. This example-based method is particularly useful for transformational annotations, but would not make sense for assertional annotations, because it is not intuitive for users to indicate assertional annotations as results of structural changes of a target document to be annotated. Figure 2 shows an overview of the annotation tool framework, which realizes the three configurations of annotation tools explained above. The core part of this framework is independent of any particular views and editors, and prescribes the internal document models [10] and their relations common to all the annotation tool configurations.
It is assumed here that the creation of an annotation document is a primary task of users, and the users are not allowed to modify a target document as well as the annotation document. The important point here is that the constraints imposed on the core part come from the annotation document infoset, and make the tool framework extensible allowing addition of authoring capabilities as needed. In the remainder of this section, the assertional annotation by selection and the transformational annotation by example are explained respectively in Sections 3.1 and 3.2. The transformational annotation by selection is explained in Section 4.2 together with an application to document clipping for portal pages.

[Figure 2: Annotation Tool Framework Core (Target Document, Annotation Document) with three configurations built on it: Assertional Annotation by Selection, Transformational Annotation by Selection, and Transformational Annotation by Example.]

Fig. 2. Overview of annotation tool framework

3.1 Assertional Annotation by Selection

Figure 3 shows a tool configuration for the assertional annotation by selection. In addition to the core components, this tool configuration includes an XPath composer with a target document viewer [Figure 3(a)], and an annotation document editor [Figure 3(b)]. The annotation document editor is provided with an annotation profile, which allows customization of the editor for different annotation vocabularies. Since the external annotation can be imported into a target document and stored as inline annotations, this tool configuration can be provided with an additional component for importing external annotations [Figure 3(c)].

[Figure 3: Core components (Target Document, Annotation Document) extended with (a) creation of XPath attribute: XPath Composer and Target Document Viewer; (b) creation of description element: Annotation Document Editor and Annotation Profile; (c) conversion to inline annotation: Annotation Importer and Inline Annotation.]

Fig. 3. Tool configuration for assertional annotation by selection

Figure 4 shows a screenshot of an annotation tool developed for assertional annotation by selection. The main window is divided into two panes. The left pane is the area for the target document viewer, and contains a target view and a source view in a tabbed pane. The right pane is for the annotation document editor containing a tree view. When the target is an HTML document, users can employ the browser view. The browser view is not embedded in the split pane because it is often helpful for users to compare the selection between the target tree view and the browser view.

The annotation tool shown in Figure 4 can be customized for different annotation vocabularies by selecting one of the annotation profiles. Figure 5 shows the profile selection dialog. When a profile is selected from the pull-down list, the dialog shows the definition of the current profile. An annotation profile, which is depicted in Figure 3(b), includes the location of a DTD file, an annotation file type, the location of a data directory, and the names of the information items shown in Figure 1. In the figure, the "EML Sample Annotation" profile is selected, and the annotation documents are characterized in terms of the root element name annot, the annotation description name description, and the XPath attribute name target. The annotation editor assumes that a target attribute is owned by a description element that is an immediate child of the root element (see Figure 1).

According to the annotation document infoset, an annotation document consists of description elements. Every description element must be given an XPath attribute and annotation contents. The annotation tool in Figure 4 provides an XPath composer. The key idea underlying the XPath composer is to improve the flexibility of creating XPath expressions through a seamless integration of three authoring methods: automatic instantaneous creation, context-constrained creation, and manual creation. Further details of the XPath composer are reported in another article [1].
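As a rough illustration of what automatic creation of an XPath expression for a selected node could involve, the sketch below derives a positional path from a node in a parsed document. It is not the composer described in [1]; the function name absolute_xpath and the sample document are hypothetical, and only the simplest case (one positional step per ancestor) is covered.

```python
import xml.etree.ElementTree as ET


def absolute_xpath(root, selected):
    """Build a positional path such as /HTML[1]/BODY[1]/P[2] for selected."""
    # ElementTree keeps no parent pointers, so map each child to its parent.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    steps = []
    node = selected
    while node is not None:
        parent = parent_of.get(node)
        if parent is None:
            index = 1  # the root element has no siblings
        else:
            # 1-based position among siblings that share the same tag.
            same_tag = [c for c in parent if c.tag == node.tag]
            index = next(i for i, c in enumerate(same_tag, 1) if c is node)
        steps.append(f"{node.tag}[{index}]")
        node = parent
    return "/" + "/".join(reversed(steps))


doc = ET.fromstring("<HTML><BODY><P>a</P><DIV/><P>b</P></BODY></HTML>")
selected = doc.find("./BODY/P[2]")
path = absolute_xpath(doc, selected)
```

A purely positional path is easy to generate but sensitive to changes in the target document, which is one reason a composer would also offer context-constrained and manual creation.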

[Figure 4: Screenshot with callouts: Target Document Viewer, Annotation Document Editor, Popup XPath menu, Annotated Document Node Graph, HTML Browser View.]

Fig. 4. Screen copy of a tool for assertional annotation by selection

Fig. 5. Dialog for selecting an annotation profile

In addition to XPath composition, annotation authoring requires creating the contents of the description item. This part of authoring relies on the DTD given in the corresponding annotation profile (Figure 5). The contents of the description item can be edited in the same manner as with a DTD-based structure editor on a DOM-tree view. When a description node is selected in the annotation tree view, a popup menu can be brought up by clicking the right mouse button as shown in Figure 6. The first popup menu is common to all annotation vocabularies, and provides menu items to add/remove an element, add/remove an attribute, and edit an attribute value. The second-level menus are derived from the DTD information of the current profile. Figure 6 shows a popup menu that appears after choosing the "Add the first child element" item, and the valid elements such as importance, date, and author are given as candidates for insertion as a child of the description node. When the user chooses the "author" item, a dialog appears with an edit field for typing the text content, namely, an author name.

Fig. 6. Popup menu for editing an annotation description

When annotations have been created, the user can save the annotation document in two ways, either as an external annotation file or as an inline annotation file. The serialization as an external annotation file can be done by simply choosing the "Save" menu item from the "File" pull-down menu of the annotation document editor. For example, an excerpt of the above-mentioned annotation descriptions would be stored as the following annotation document:

[Annotation document listing; only its text content survives: author "Mari Abe", date "2002/12/12".]
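The serialization step can be approximated as follows. This is a minimal sketch under stated assumptions: the element names annot, description, author, and date and the attribute name target come from the sample profile described above, whereas the function name and the target path (borrowed from the addressing example in Section 2) are hypothetical and do not reproduce the paper's own listing.

```python
import xml.etree.ElementTree as ET


def build_annotation_document(entries):
    """Serialize (target XPath, [(element name, text), ...]) pairs."""
    root = ET.Element("annot")
    for target, fields in entries:
        # One description element per annotated portion of the target.
        desc = ET.SubElement(root, "description", {"target": target})
        for name, text in fields:
            ET.SubElement(desc, name).text = text
    return ET.tostring(root, encoding="unicode")


# The target path below is illustrative only.
xml_text = build_annotation_document([
    ("/HTML/BODY/P[3]", [("author", "Mari Abe"), ("date", "2002/12/12")]),
])
```

Each description element carries its own addressing expression, so the resulting file can be stored and maintained independently of the target document.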

In addition, the created annotations can be imported into a target document [see Figure 3(c)] by choosing the "Import" menu item from the "File" menu of the annotation document editor. The "Import" menu item is enabled only when an import procedure is given as an implementation of the "AnnotImporter" Java interface. Availability of an import procedure is also specified in an annotation profile, although it is not shown in the profile selection dialog. Therefore, the annotation importing procedures can be defined according to the application-specific requirements. For example, comment sharing in a collaborative environment is being pursued in the Annotea project [18]. Annotations in Annotea are embedded in target documents, and XPointer [35] is used for locating the annotations in the annotated document. If an importing procedure is implemented in accordance with the requirements of Annotea, annotations in the above example could be imported into a part of the HTML document as follows.

In this document fragment, annotations are embedded as attributes of SPAN elements, and reference to inline annotations in Annotea can be done by using XPointer expressions. For example, an XPointer expression xpointer(id("G-324")/SPAN[2]) can point to the SPAN element that is the second child of the element uniquely identified by the id attribute value "G-324". This is a case when annotation serialization is done in the form of an embedded annotation. However, it is important to note that the annotation authoring in itself is done on the basis of the XML infoset of external annotation documents, and the users can receive the full benefit of the flexibility in the annotation tool framework.

3.2 Transformational Annotation by Example

Transformational annotation has been used for Web content adaptation, in which structural changes of a target document are needed [3, 14, 24, 30, 36]. Among those annotation languages, the XSL Transformation Language (XSLT) [36] is well known, and the document type definition of XSLT is actually compliant with the constraints of the annotation document infoset explained in the previous section. In contrast to assertional metadata such as Dublin Core Metadata [9], transformational annotation languages are more like programming languages. Learning an abstract language and writing programs are not easy tasks for most people. However, if a person knows how to perform a task to be executed by a computer, perhaps the person's knowledge can somehow be exploited for the creation of a program to perform the task. This is the idea behind programming by example [21]. Programming by example is a natural approach to creating transformational annotation for page designers or novice programmers, because users need only work with examples of how to transform a document at hand, and are given automatically generated annotations that can replicate the same transformation. Figure 7 illustrates the idea of generating transformational annotation (e.g., page-clipping annotation) as results of WYSIWYG editing of an original page.

[Figure 7: (a) WYSIWYG editing (transformation) turns the Target Document into the Customized Document, with portions marked Keep and Remove; (b) automatic generation produces the Annotation Document.]

Fig. 7. Illustration of annotation generation by example

A configuration of the example-based annotation tool is depicted in Figure 8. With this annotation tool, first a user opens a target document to be customized (e.g., an HTML file). The user then edits the document by using the full capabilities of the WYSIWYG authoring tool. Although a user’s editing actions are recorded into an operation history, the user does not have to care about the recording process behind the scenes. When the editing is finished, the user will have a customized document. At the same time, the annotation generator creates transformational annotation for the customization, which can also be used by a runtime engine to replicate the transformation from the initial target document to the customized document.

[Figure 8: Core components (Target Document, Annotation Document) extended for automatic generation of annotation by example: Target Document Editor, Customized Document, Operation History, and Annotation Generator.]

Fig. 8. Tool configuration for transformational annotation by example

Since users can perform a given editing task in different ways, it is necessary to prescribe a set of edit operations to be recorded in the user interface. The essential mechanism of the annotation generation depends solely on the Document Object Model (DOM) [10], and can be used not only with an arbitrary metadata format when a custom generator is provided, but also with an arbitrary content authoring tool as long as the editor adopts the DOM as an internal document model. On the basis of the idea of annotation by example, we have developed annotation generation tools [16, 25, 19]. Figure 9 shows a screenshot of the prototype annotation tool, which is developed on top of an open tool-integration platform [31]. This annotation tool generates page-clipping annotations [30] as results of users' editing operations with a WYSIWYG HTML editor [Figure 9(a)]. The area of the WYSIWYG editor can be changed by selecting one of the sizes available from the page size button [Figure 9(b)], for example, reducing the area to 1/4 VGA size as in the screenshot. Figure 9(c) shows the generated annotation, which is an XML document in the page-clipping annotation language. The look and feel of this WYSIWYG editor is exactly the same as in the original HTML editor, except for the small window for the generated annotation. A recording button [Figure 9(d)] is used to start and stop recording a user's operations. While the toggle button remains depressed, the user's operations are recorded. When the button returns to its normal state, the recording stops. A page-clipping annotation document is created from the recorded operations by pressing the create annotation button [Figure 9(e)]. Details on the annotation generation procedure are reported in other articles [16, 19]. All the above-mentioned editing operations actually modify the target document by means of DOM manipulation operations, and can be created as transformational annotations.
However, assertional annotations cannot be created on the basis of the DOM manipulation operations, but need to be declared explicitly as assertions. Annotation tools that follow the annotation by selection approach play a complementary role in such situations instead of the example-based annotation generation for the transformational annotations.
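The record-then-generate flow described in this section can be sketched in a few lines. The sketch is only illustrative and is not the generation procedure of [16, 19]: the class and function names, the operation kind "delete", and the element names keep and remove are assumptions, while the description element and the target and take-effect attributes follow the page-clipping annotation language discussed in Section 4.1. Only deletions are handled; each one becomes a pair of statements that switch the clipping state off before the deleted node and back on after it.

```python
import xml.etree.ElementTree as ET


class OperationHistory:
    """Records editing operations only while recording is switched on."""

    def __init__(self):
        self.recording = False
        self.operations = []

    def toggle(self):
        # Corresponds to pressing the start/stop recording button.
        self.recording = not self.recording

    def record(self, kind, xpath):
        if self.recording:
            self.operations.append((kind, xpath))


def generate_annotation(history):
    """Turn recorded deletions into clipping statements (hypothetical)."""
    root = ET.Element("annot")
    for kind, xpath in history.operations:
        if kind != "delete":
            continue  # other kinds of operations are left out of this sketch
        before = ET.SubElement(
            root, "description", {"target": xpath, "take-effect": "before"})
        ET.SubElement(before, "remove")
        after = ET.SubElement(
            root, "description", {"target": xpath, "take-effect": "after"})
        ET.SubElement(after, "keep")
    return root


history = OperationHistory()
history.record("delete", "/HTML[1]/BODY[1]/P[1]")  # ignored: not recording
history.toggle()
history.record("delete", "/HTML[1]/BODY[1]/P[2]")
history.toggle()
annotation = generate_annotation(history)
```

Keeping the history separate from the generator mirrors the configuration in Figure 8, where a custom generator could emit a different metadata format from the same recorded operations.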

4 Applications of External Annotation

Annotations provide additional information about Web contents, so that an adaptation engine can make better decisions on content re-purposing. The role of annotations is to provide explicit semantics that can be understood by a content adaptation engine [17]. Figure 10 depicts an overview of an annotation-based transcoding process. Upon receipt of a request from a client, a Web document is retrieved from a content server. Taking account of the capabilities of the client specified in the HTTP request header, a transcoding proxy selects one or more transcoding modules. When a selected transcoding module requires an annotation document, an annotation file is also retrieved from a content server, which may or may not be the same server from which the Web document was retrieved. The transcoding module may simply return the original document, if a client agent has the rendering capabilities compatible with ordinary desktop computers [Figure 10(a)]. Alternatively, the original document may be returned with modification, so that the original content can fit into a small-screen device [Figure 10(b)]. The decisions about the content adaptation are made taking account of the client capabilities specified in the HTTP request header.

[Figure 9: Screenshot with callouts: (a) WYSIWYG HTML editor, (b) page size button, (c) generated annotations, (d) start/stop recording button, (e) create annotation button.]

Fig. 9. Sample display from the environment for generating transformational annotations

4.1 Annotation-Based Document Clipping

Figure 7 shows a catalog page of notebook computers. The catalog page contains a lot of information, such as details of the product specification, a search field, and numerous links to other areas of the site that might be of interest to the user. However, it may be necessary to deliver portions of this page for users to access through a Web-enabled phone rather than a desktop browser. In such a case, the images and nested HTML tables prepared for a nicely laid out page are a hindrance rather than a help. The sheer amount of information becomes unwieldy in the small display, and potentially expensive depending on the user's wireless service.

Content adaptation as illustrated in Figure 7 can be done, for example, by using an annotation-based page-clipping engine [30]. At content delivery time, the page-clipping engine may modify the original document with reference to page-clipping annotations and client profiles sent over HTTP. The main idea in the page-clipping annotation language is the notion of a clipping state. By using keep and remove elements in the annotation descriptions, users can specify the clipping state to indicate whether the content being processed should be preserved or removed.

[Figure 10: A client request for URL-x passes through the Transcoding Proxy, whose Clipping engine combines the Target document (URL-x) from an HTTP Server with an Annotation document, returning either (a) the original page or (b) the clipped page.]

Fig. 10. Overview of an annotation-based transcoding

As a simple example, an HTML page and its clipped results are shown in Figure 11. In this example, the header and the first paragraph are preserved as shown in Figure 11(a). The table element is modified by deleting the third column and the second row. The cellpadding attribute of the table is increased, so that each table cell can be provided with margin space [Figure 11(b)]. In addition, the whole of the second paragraph is removed as shown in Figure 11(c).

[Figure 11: Original page and clipped page side by side, with regions marked (a) preserve, (b) modify, and (c) remove.]

Fig. 11. Simple example of an HTML page and its clipped result

All the structural changes in HTML documents can be easily done by using a WYSIWYG HTML editor. For example, a table row can be removed by selecting a cell in the row to be deleted and then choosing the menu item "Delete Row" from a popup menu (Figure 9) that appears after clicking the right mouse button. The cellpadding attribute of the table can also be changed from a dialog window for the table attribute settings.

Figure 12 shows an annotation document that realizes the page clipping illustrated in Figure 11. This transformational annotation can actually be generated automatically by using the example-based annotation generation tool (Figure 9). The description element prescribes a unit of an annotation statement in the annotation language. The target attribute is set to an XPath expression that identifies the node to which the annotation will be applied, and the take-effect attribute indicates whether the annotation is applied before or after the target node. By specifying target="/HTML[1]/BODY[1]/*[1]" as in Figure 12(a), the clipping state is activated at the first child element of the BODY element, which in this case is a header element. The keep element in Figure 12(a) indicates that all the document elements encountered are preserved, until otherwise instructed by another annotation statement. The clipping state is changed to 'remove' just before the second P element [Figure 12(c)], and changed back to 'keep' after that P element [Figure 12(d)]. As a result, the second paragraph element indicated by "/HTML[1]/BODY[1]/P[2]" is removed while the elements just before and after it are preserved.

Fig. 12. Example of a page-clipping annotation document
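The clipping-state mechanism can be pictured as a single pass over the document in which annotation statements switch the state before or after their target nodes. The sketch below is an assumption-laden simplification, not the engine of [30]: the function name clip, the tuple form of the statements, and the initial state are hypothetical, paths are given relative to the BODY element, and only the direct children of BODY are considered.

```python
import xml.etree.ElementTree as ET


def clip(body, statements, initial_state="remove"):
    """Return the children of body that survive clipping.

    statements: (path relative to body, "before" or "after", state),
    where state is "keep" or "remove".
    """
    before, after = {}, {}
    for path, take_effect, state in statements:
        node = body.find(path)
        if node is None:
            continue  # a statement whose target is missing is skipped
        (before if take_effect == "before" else after)[node] = state
    state = initial_state
    kept = []
    for child in body:
        state = before.get(child, state)  # switch just before the node
        if state == "keep":
            kept.append(child)
        state = after.get(child, state)   # switch just after the node
    return kept


body = ET.fromstring(
    "<BODY><H1>Title</H1><P>one</P><TABLE/><P>two</P><P>end</P></BODY>")
kept = clip(body, [
    ("./H1", "before", "keep"),
    ("./P[2]", "before", "remove"),
    ("./P[2]", "after", "keep"),
])
```

In this example the second paragraph is dropped while its neighbours are preserved, which corresponds to the pair of statements described for Figure 12(c) and (d).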

Since HTML tables can often be complex elements to clip, the annotation language provides special-purpose elements to make table clipping easier. The and elements allow user to clip rows and columns without relying on complicated XPath expressions. The table-clipping elements are used in the description shown in Figure 12(b). This description sets the clipping state to ’keep’ just before the first table element, and also changes the value of cellpadding attribute to 4 by using the element. The name attribute of can be specified with an arbitrary name of an attribute available for a target document. In addition, the description element [Figure 12(b)] declares that the third column, which is indicated by the index value of the element, is discarded, while the remaining columns are preserved. Note here that the wildcard character to indicate multiple columns (index="*"). If a wildcard is specified, all rows (or columns) will be affected, except for those specifically indicated by a separate (or ) element. So, all rows but the second will be preserved for the target table. Annotation-based document clipping is a useful technique for the adaptaton of existing HTML documents to varieties of emerging small-screen devices, but the advantages are not limited to device adaptation. Another promising application of the document clipping technology is the use in Web portals. Web portals are becoming an increasingly popular technology, since it can provide a single point of comprehensive, integrated access to both Web data and applications. However, each of the Web data or application is for the most cases provided assuming to be presented on a desktop borwser, and would be too spacious to fit into a small area in a portal page. Document clipping is thus useful for Web pages that are aggregated into a portal site. Figure 13 illustrate the process of creating a portal page with an annotation-based clipping portlet. 
Portlets are specialized servlets that plug into and run in portals, and generate dynamic content. As shown in Figure 13, when a portal server receives an HTTP request, the server dispatches the request to each portlet aggregated in the page, and collects the results into a portal page to be returned.

[Figure 13 labels: HTTP request; Portal Server; Clipping portlet; Other portlet; Clipped page; Other page; Portal Page; HTTP response; Content Server A; Page (a); Annotation document; Content Server B; Page (b)]

Fig. 13. Creation of a portal page with annotation-based clipping portlet
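
The dispatch-and-aggregate flow of Figure 13 can be sketched as follows. This is an illustrative model only: portlets are represented as plain Python functions from a request to an HTML fragment, and all names (make_clipping_portlet, render_portal_page, fetch_from_server_a, drop_header, other_portlet) are hypothetical stand-ins rather than the API of any portal server.

```python
def make_clipping_portlet(fetch_page, clip):
    """Build a portlet that fetches a page and applies a clipping function."""
    def portlet(request):
        return clip(fetch_page(request))
    return portlet

def render_portal_page(request, portlets):
    """Dispatch the request to every portlet and aggregate the fragments."""
    fragments = [portlet(request) for portlet in portlets]
    body = "".join("<DIV>" + fragment + "</DIV>" for fragment in fragments)
    return "<HTML><BODY>" + body + "</BODY></HTML>"

# Hypothetical stand-in for a content server that returns a full page.
def fetch_from_server_a(request):
    return "<H1>Header</H1><P>News</P>"

# Hypothetical stand-in for annotation-driven clipping: drop the page header.
def drop_header(page):
    return page.replace("<H1>Header</H1>", "")

# Hypothetical portlet that needs no clipping.
def other_portlet(request):
    return "<P>Weather</P>"

portal_page = render_portal_page(
    {"path": "/portal"},
    [make_clipping_portlet(fetch_from_server_a, drop_header), other_portlet],
)
print(portal_page)
```

The clipping portlet is the only component that knows about annotations; the aggregation step treats every portlet uniformly.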

4.2

Transformational Annotation by Selection

Figure 14 shows a screen of an annotation tool for the clipping portlet on the left, and a portal page that includes the clipped page on the right. This annotation tool allows a user to select the portions of the original page to be removed in the portal page, and the annotation generator automatically creates page-clipping annotations from the selected nodes.

[Figure 14 labels: Annotation Tool; Portal Page on Browser; Annotation by selection; Adaptation by transformational annotation]

Fig. 14. Annotation tool for Web clipping portlet
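
One way an annotation generator might turn selected nodes into addressing expressions is sketched below. The sketch assumes positional paths of the form "/HTML[1]/BODY[1]/P[2]" and uses Python's standard xml.etree.ElementTree module. The function names and the dictionary-based annotation records are hypothetical; they do not reproduce the syntax of the annotation language or the implementation of the tool in Figure 14.

```python
import xml.etree.ElementTree as ET

def positional_xpath(root, node):
    """Return an absolute positional path such as /HTML[1]/BODY[1]/P[2]."""
    # ElementTree elements do not store their parent, so build a lookup table.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    steps = []
    current = node
    while current is not root:
        parent = parent_of[current]
        same_tag = [child for child in parent if child.tag == current.tag]
        # The 1-based position among siblings that share the same tag name.
        index = next(
            i for i, child in enumerate(same_tag, start=1) if child is current
        )
        steps.append(f"{current.tag}[{index}]")
        current = parent
    steps.append(f"{root.tag}[1]")
    return "/" + "/".join(reversed(steps))

def generate_clipping_annotations(root, selected_nodes):
    """Create one hypothetical 'remove' record per selected node."""
    return [
        {"action": "remove", "target": positional_xpath(root, node)}
        for node in selected_nodes
    ]

# Hypothetical page: the user selects the side menu and the second paragraph.
page = ET.fromstring(
    "<HTML><BODY><DIV>menu</DIV><P>a</P><P>b</P></BODY></HTML>"
)
body = page[0]
annotations = generate_clipping_annotations(page, [body[0], body[2]])
for annotation in annotations:
    print(annotation)
```

Because the user only selects nodes, each selection maps to exactly one record, which is what keeps the generation fully automatic in this simplified setting.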

This tool follows the configuration for transformational annotation by selection [Table 1(b)], which is depicted in Figure 15. This type of annotation tool relies on a target document viewer, rather than on an editor as in the case of annotation by example (see Figure 8). Therefore, users can only select the portions to be annotated, and are not allowed to modify or edit the target document to arrive at the desired results of customization. However, tools that follow transformational annotation by selection can automatically generate an annotation document, as long as the users' intention can be expressed solely with simple selection operations. To put it another way, both selection-based and example-based approaches can automatically generate transformational annotations, but the selection-based approach can exploit fewer kinds of annotation constructs in automatic generation than the example-based approach, because the expressiveness of users' selections on a document viewer is far more limited than that of users' full editing capability on a document editor.

It is noteworthy, however, that selection-based annotation generation was actually adopted for a software product, an annotation tool for a portal server, and extensively used in the development of a supplier portal for an automotive company. In this case, the automotive company used the document-clipping portlet with the annotation tool solely for simple clipping operations. The primary reason for the customer's choice was the simplicity of the authoring process, which requires no advanced annotation constructs for document clipping. Since the automotive company needed to aggregate several thousand existing pages into the portal site, it was not practical to create sophisticated clipping annotations page by page, and it was reasonable to provide just a simple clipping capability to remove headers and side menus from the original documents, which were created for browsers on desktop computers.

[Figure 15 labels: Core Components; Automatic Generation of Annotation by Selection; Target Document; Target Document Viewer; Selected nodes; Annotation Generator; Annotation Document]

Fig. 15. Tool configuration for transformational annotation by selection

5

Concluding Remarks

Annotation by example is an innovative approach to helping users with the generation of content adaptation metadata, because it allows users to create a customized result interactively with an example. Since users of the metadata generator do not have to learn the annotation language at all, this approach is particularly suitable for page designers or novice programmers who are not necessarily familiar with annotation languages. It is also useful for skilled designers, because the environment allows users to concentrate on customizing Web pages without paying any attention to the metadata authoring task. It has been pointed out that the success of an environment for programming by example depends far more on the user experience of interacting with the environment than on the induction algorithms used to create the user's programs [6]. The advantage of annotation by example in this regard is that it does not rely on any particular editor: it can be used with a conventional WYSIWYG HTML editor rather than a special editor tailored for annotation generation. Further research will be needed to investigate the applicability of this technique. The main problem is to determine to what extent users are willing to accept an annotation-by-example environment that guesses what they are doing, and that occasionally might make inadequate or inappropriate generalizations.

Web documents do not always remain the same for a long time, and often change because of dynamic content and/or site layout changes. In such situations, addressing expressions in annotations (i.e., XPath expressions) need to be adjusted to the changes in the target documents. This issue is related to robust positioning, which has been investigated through empirical studies [26, 4]. Although little study has been done regarding the robustness of XPath expressions, we have reported preliminary results of an empirical evaluation [2], and are going to investigate this issue further in order to draw practical implications for the usage and authoring process of external annotation.
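
The fragility of positional addressing can be illustrated with a small sketch. It assumes two hypothetical revisions of the same page and uses Python's standard xml.etree.ElementTree module, in which the absolute path "/HTML[1]/BODY[1]/P[2]" is written relative to the root element as "./BODY[1]/P[2]". The example is illustrative only and is not drawn from the empirical evaluation cited above.

```python
import xml.etree.ElementTree as ET

def text_at(page_source, relative_path):
    """Return the text of the element addressed by the path, or None."""
    root = ET.fromstring(page_source)
    node = root.find(relative_path)
    return None if node is None else node.text

# Hypothetical page revisions: the revised page gains a paragraph at the top.
original = "<HTML><BODY><P>intro</P><P>ad banner</P><P>story</P></BODY></HTML>"
revised = (
    "<HTML><BODY><P>notice</P><P>intro</P><P>ad banner</P><P>story</P>"
    "</BODY></HTML>"
)

path = "./BODY[1]/P[2]"  # ElementTree form of /HTML[1]/BODY[1]/P[2]
before = text_at(original, path)
after = text_at(revised, path)
print(before, "|", after)
```

The same expression addresses the banner in the original page but the introduction in the revised page, so an annotation that removes the addressed element would clip the wrong content unless the expression is adjusted.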

References

1. Abe, M. and Hori, M.: A visual approach to authoring XPath expressions. Proceedings of Extreme Markup Languages 2001, pp. 1–14, Montréal, Canada (2001).
2. Abe, M. and Hori, M.: Robust pointing by XPath language: Authoring support and empirical evaluation. Proceedings of the International Symposium on Applications and the Internet (SAINT 2003), Orlando, Florida (accepted).
3. Asakawa, C. and Takagi, H.: Transcoding system for non-visual Web access (2): annotation-based transcoding. Sixteenth International Conference on Technologies and Persons with Disabilities (CSUN2001) (2001).
4. Brush, A. J., Bargeron, D., Gupta, A., and Cadiz, J. J.: Robust annotation positioning in digital documents. Proceedings of the 2001 ACM Conference on Human Factors in Computing Systems (CHI 2001), pp. 285–292, Seattle, Washington (2001).
5. Booth, A., Collins, J., Fridrich, G., Seekamp, C., and Topol, B.: WebSphere Transcoding Publisher: New Features for Document Clipping with Annotation. IBM WebSphere Developer Domain, http://www7b.software.ibm.com/wsdd/library/techarticles/0205 booth/booth.html (2002).
6. Cypher, A.: Introduction: Bringing programming to end users. In Allen Cypher et al. (Eds.): Watch What I Do: Programming by Demonstration, pp. 1–11, The MIT Press, Cambridge, MA (1993).
7. Denoue, L. and Vignollet, L.: An annotation tool for Web browsers and its applications to information retrieval. Proceedings of the 6th Conference on Content-Based Multimedia Information Access (RIAO 2000), Paris, France (2000).
8. DeWitt, S.: Basic Web Clipping Using WebSphere Portal Version 4.1. IBM WebSphere Developer Domain, http://www7b.software.ibm.com/wsdd/library/techarticles/0206 dewitt/dewitt.html (2002).
9. Dublin Core Metadata Element Set, Version 1.1: Reference Description. Dublin Core Metadata Initiative, Recommendation, http://dublincore.org/documents/dces/ (1999).
10. Document Object Model (DOM) Level 1 Specification Version 1.0. W3C Recommendation, http://www.w3.org/TR/REC-DOM-Level-1/ (1998).
11. Erdmann, M., Maedche, A., Schnurr, H.-P., and Staab, S.: From manual to semi-automatic semantic annotation: about ontology-based text annotation tools. Proceedings of the COLING 2000 Workshop on Semantic Annotation and Intelligent Content, Luxembourg (2000).
12. Heflin, J. and Hendler, J.: Semantic interoperability on the Web. Proceedings of Extreme Markup Languages 2000, pp. 111–120 (2000).
13. Hori, M., Mohan, R., Maruyama, H., and Singhal, S.: Annotation of Web Content for Transcoding. W3C Note, http://www.w3.org/TR/annot/ (1999).
14. Hori, M., Kondo, G., Ono, K., Hirose, S., and Singhal, S.: Annotation-based Web content transcoding. Proceedings of the 9th International World Wide Web Conference (WWW9), pp. 197–211, Amsterdam, Netherlands (2000).
15. Hori, M., Ono, K., Kondo, G., and Singhal, S.: Authoring tool for Web content transcoding. Markup Languages: Theory & Practice, 2(1): 81–106 (2000).
16. Hori, M., Ono, K., Koyanagi, T., and Abe, M.: Annotation by transformation for the automatic generation of content customization metadata. In F. Mattern and M. Naghshineh (Eds.): Pervasive Computing, First International Conference, Pervasive 2002, Lecture Notes in Computer Science 2414, pp. 267–281, Zurich, Switzerland (2002).
17. Hori, M.: Semantic annotation for Web content adaptation. In D. Fensel, J. Hendler, H. Lieberman, and W. Wahlster (Eds.): Spinning the Semantic Web, pp. 542–573, MIT Press, Boston, MA (2002).
18. Kahan, J. and Koivunen, M.-R.: Annotea: an open RDF infrastructure for shared Web annotations. Proceedings of the 10th International World Wide Web Conference (WWW10), pp. 623–632, Hong Kong (2001).
19. Koyanagi, T., Ono, K., and Hori, M.: Demonstrational Interface for XSLT Stylesheet Generation. Markup Languages: Theory & Practice, 2(2): 133–152 (2001).
20. Lassila, O.: Web metadata: a matter of semantics. IEEE Internet Computing, 2(4): 30–37 (1998).
21. Lieberman, H. (Ed.): Your Wish is My Command: Programming by example. Morgan Kaufmann Publishers, San Francisco (2001).
22. Mea, V. D., Beltrami, C. A., Roberto, V., and Brunato, D.: HTML generation and semantic markup for telepathology. Proceedings of the 5th International World Wide Web Conference (WWW5), pp. 1085–1094, Paris, France (1996).
23. Marshall, C. C.: Toward an ecology of hypertext annotation. Proceedings of the 9th ACM Conference on Hypertext and Hypermedia, pp. 40–49, Pittsburgh, PA (1998).
24. Nagao, K., Shirai, Y., and Squire, K.: Semantic annotation and transcoding: making Web content more accessible. IEEE Multimedia, 8(2): 69–81 (2001).
25. Ono, K., Koyanagi, T., Abe, M., and Hori, M.: XSLT Stylesheet Generation by Example with WYSIWYG Editing. Proceedings of the International Symposium on Applications and the Internet (SAINT 2002), pp. 150–159 (2002).
26. Phelps, T. A. and Wilensky, R.: Robust intra-document locations. Proceedings of the 9th International World Wide Web Conference (WWW9), pp. 105–118, Amsterdam, Netherlands (2000).
27. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Working Draft, http://www.w3.org/TR/rdf-schema/ (2002).
28. Rousseau, J. F., Macias, A. G., de Lima, J. V., and Duda, A.: User adaptable multimedia presentations for the World Wide Web. Proceedings of the 8th International World Wide Web Conference, pp. 195–212, Toronto, Canada (1999).
29. Sakairi, T. and Takagi, H.: An annotation editor for nonvisual Web access. Proceedings of the 9th International Conference on Human-Computer Interaction (HCI International 2001), pp. 982–985, New Orleans, LA (2001).
30. Spinks, R., Topol, B., Seekamp, C., and Ims, S.: Document clipping with annotation. IBM developerWorks, http://www.ibm.com/developerworks/ibm/library/ibm-clip/ (2001).
31. Eclipse Platform. eclipse.org Consortium, http://www.eclipse.org/ (2002).
32. XML Information Set. W3C Recommendation, http://www.w3c.org/TR/xml-infoset/ (2001).
33. XML Linking Language (XLink) Version 1.0. W3C Recommendation, http://www.w3.org/TR/xlink/ (2001).
34. XML Path Language (XPath) Version 1.0. W3C Recommendation, http://www.w3.org/TR/xpath (1999).
35. XPointer Framework. W3C Proposed Recommendation, http://www.w3.org/TR/xptr-framework/ (2001).
36. XSL Transformations (XSLT) Version 1.0. W3C Recommendation, http://www.w3.org/TR/xslt (1999).