Navigation of HTML Tables, Frames, and XML ... - ACM Digital Library

Navigation of HTML Tables, Frames, and XML Fragments E. Pontelli, D. Gillan, W. Xiong, E. Saad

Dept. Computer Science Dept. Psychology New Mexico State University [email protected]

G. Gupta

A.I. Karshmer

Dept. Computer Science University of Texas at Dallas [email protected]

Dept. Computer Science University of South Florida [email protected]

browser's code, while in the latter, we are constrained to making accessibility recommendations to web page designers. Given these constraints, our project has identified three of the most difficult areas of web accessibility through current browsers: tables, frames and forms. While other groups are working on more general problem areas of web browsing [2,16,17], we have confined our research to these three aspects of the problem because they represent the most difficult cases that elude an elegant, general solution. The general reading of a web page by the visually impaired can be implemented by presenting a totally text-based version of the page. This is, however, not possible in the areas of tables, frames and forms. Text is linear in nature while tables, frames and forms are multidimensional and much of their semantic content is implicit in their structure.

ABSTRACT In this paper, we provide a progress report on the development of technology to support the non-visual navigation of complex HTML and XML structures. Keywords Web Accessibility, Visually Impaired Users.

INTRODUCTION Accessibility to resources in our information-based society is considered an entitlement by the majority of the industrialized world. Through a reasonably simple series of keystrokes and mouse clicks we are able to access vast amounts of information from distant parts of the world. While the value of all the information available is sometimes questionable, it is there and it is accessible. Accessible, that is, if you are not one of the handicapped members of our society. For many in this group, such access is difficult or impossible.

The problem is broad in scope for the visually impaired. Lack of access to these structures has a negative impact not only on decision making in everyday life, but also on access to educational materials which are now being presented in larger measure via the web.

In the United States, landmark legislation has mandated equal access to the information world for all Americans. Through laws such as the 1990 Americans with Disabilities Act (ADA), the 1996 Telecommunications Act and the Section 508 Amendment to the 1973 Rehabilitation Act, the disabled are guaranteed accessibility "where reasonably achievable." These three words belie the problem. In the domains of hardware or architectural modifications, for the most part, the problems are understandable and surmountable. In the domain of software, however, the situation is somewhat different. The areas of system interfaces and web access highlight this problem.

Project Overview The focus of this work is on developing a collection of tools aimed at improving accessibility of a well-defined set of "complex" structures commonly found on the Web. The focus is on non-visual accessibility - e.g., to provide support to visually impaired individuals as well as users accessing the Web with devices with no or limited display capabilities - and the research is currently aimed at analyzing the role of HTML Tables and Frames. These two constructs are widely used in virtually every Web page (indeed the two constructs are often used interchangeably) and they are by nature non-linear. This implies that a linear translation of these components into speech would lead to a substantial loss of semantic content. This is particularly evident in the case of tables. A table is inherently a structure of two or more dimensions, and its spatial layout is an essential component of the semantics encoded in it. In [13] we have presented the design of a system providing capabilities for navigation of HTML tables and other non-linear HTML structures. In this paper we report on our accomplishments since then.

In our current work, we focus on the accessibility of the WWW by blind and visually impaired individuals. We are specifically interested in web-accessibility issues in an educational setting. We address two basic problem areas: the accessibility offered by the popular web browsers and the effect of the design of the actual web pages. In the former, we are not interested in making changes to the web

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Assets 2002, July 8-10, 2002, Edinburgh, Scotland. Copyright 2002 ACM ISBN 1-58113-464-9-02/07 ... $5.00


Thus, this archival study indicated that two types of spatial organization are used to provide organizational information in tables, the two-dimensional row-by-column layout and

Related Work

A number of proposals have recently been made towards the development of tools to improve Web accessibility for the visually impaired. Some of the noteworthy proposals are the following. The WAB effort at ETH [10] represents one of the first efforts in this area where transcoding proxy servers are employed to extend HTML documents with (static) navigation information (in this case, links to facilitate the retrieval of titles and hyperlinks). Substantial effort has been invested in IBM's HomePage Reader (HPR) [2] and the related work by Asakawa et al. [1,11]. This is indeed the only other proposal that explicitly deals with the issue of table navigation. Like HPR, the pwWebSpeak browser [4] also offers speech output through HTML interpretation. The current reported version of pwWebSpeak does not support frames; it represents tables by linear sequences of links (each table location is interpreted as a separate page); tables are sequentially navigated left-right, top-bottom without any special announcement. The BrookesTalk [16] speech-enabled browser provides access to different parts of a Web page using keys; its novelty is in the use of summarization techniques to facilitate non-visual navigation.

spatial grouping.

Table 1: A Table on Pesticides

            Type         Formula    Target       Toxic     Exp.
Vanquish    herbicide    flowable   dandelions   medium    dermal
Zapper      fungicide    powder     mildew       low       ocular
Riddance    insectic.    fogger     cockroach    high      inhal.
BeGone      rodent.      granules   mice         extreme   ingestion
Purge       miticide     dust       ticks        nominal   cutaneous

Following from the results of the archival study, we conducted two series of experiments to examine different means for supplying a table reader with information about the structure of a table via non-spatial means. In both series, we used sighted subjects in an attempt to understand the basic cognitive mechanisms used by sighted readers interacting with tables. In some of the experiments, we attempted to simulate the flow of information that a blind user would experience when she interacted with a table. The goals that underlie this approach are to provide a baseline against which later to compare the cognitive processes of blind users and to understand the information that spatial cues provide a sighted user and possible ways of providing that same information by non-visual means.

In this project our aim is to provide integrated client-side (i.e., at the browser level) and server-side (i.e., at the level of the Web page author or provider) semantic specification for the understanding and navigation of composite HTML structures (forms, frames, tables). Relatively few other proposals have tackled this problem, and most of them rely exclusively on the purely syntactic HTML component to provide navigation support. The need for adaptation of browsers and for direct access to HTML for accessibility support has been raised by various authors [3,9]. This was further underscored by recent survey studies [5,7] which explicitly pointed out the ineffective performance of existing screen readers when coupled with the Web. Gunderson & Mendelson specifically cite the task of locating information in table structures as one of the most challenging tasks for visually impaired users [7].

In one series of experiments, we investigated the use of tables to learn structured information, much like a chemistry student might learn the periodic table. The experiments were conducted using 5 x 5 matrix tables, and subjects were instructed to learn the information (i.e., a word) in each of the 25 cells. For example, in Table 1 each target word is associated with an individual row and column header combination. However, to simulate the flow of information experienced by a blind reader, each cell was presented individually. During the learning phase in each experiment a row header, column header and a unique target word group were presented to the participant together. This type of presentation was called the temporal format. Depending on the condition, the temporal format might be represented by a specific cue designed to provide structural information. Participants did not see the underlying table at any time during the experiment, nor were they told that the information was extracted from a table (except as an experimental condition in one experiment). Following the learning phase, subjects received a test in which they had to select, for each row/column header combination, the correct target word from a list consisting of the correct word, the remaining target words from the row and column and six foils (for a total of 15 possible responses).

COGNITIVE ASPECTS OF TABLE NAVIGATION

We started our work by investigating the cognitive processes used in reading tables. Initially we conducted an archival study in which we examined the structure of tables in selected scientific journals (Science and Nature) and on web pages (from government and commercial sites), as well as the tasks for which the tables were designed. For both media, the structure of the table was a function of the difficulty of the user's projected task: simple tasks (e.g., an overview of the data in a table) were typically associated with simple tables (e.g., a row by column matrix), whereas the more complex tasks (e.g., to understand an interaction between variables) were associated with tables in which the rows or columns contained spatially-defined groups. In addition, the web-based tables made use of graphics embedded within the tables and contained hyperlinks.

The first experiment examined the effect of the temporal vs. spatial presentation of tabular information. Subjects studied and were tested on three tables that differed by the


information contained and the type of cue displayed with the row header, column header, and target word. The temporal-only condition consisted of only the row and column headers with the target word; the spatial-separated condition also showed an empty 5 x 5 table in which the position where the target word would normally be located in a table was highlighted. Likewise, the spatial-integrated condition showed this same table, and the indicator cell contained the target word. During the subsequent test, subjects from the spatial-integrated condition made the fewest errors. So, the data suggest that providing spatial cues results in better learning than the pure temporal presentation of the tabular information.

an example, produced significantly more errors than either the No Instruction or Structure conditions. No significant differences in recall performance were detected between these two conditions. These results show a disadvantage for table users when they are given information in a procedure that simulates the flow of information to blind users using a screen reader. The results also show that providing a table user with additional structural cues was helpful only when the cues were spatial in nature; adding color or auditory cues made learning performance worse.

The second series of studies examined the use of spatial and non-spatial forms for indicating groups within a row-by-column matrix table. Subjects used tables to answer a variety of questions that varied in complexity. Across the experiments, subjects' performance was compared when they used a table organized simply by alphabetic listing (baseline), groups indicated by spatial grouping on the display, groups indicated by color-coding, groups indicated by subject-controlled graying-out of the group not selected by the subject, and groups indicated by exploding-out of the selected group (i.e., rows within the selected group expand in size and move closer to the reader). The results showed that the various forms of cuing grouping information only aided readers answering complex questions. Spatial grouping, color-coding, graying-out, and exploding-out all produced better performance on selected questions than the alphabetical organization, although exploding-out was the least effective. Interestingly, when the spatial grouping was combined with any of the other procedures, the combination fared no better than the cues taken one at a time. These findings suggest that these various forms of grouping cues permit the table reader to shift procedures on complex questions to the procedures that they use in answering simple questions. As a consequence, their performance improves only on those complex questions.

A second experiment examined whether non-spatial cues could also produce better learning than the temporal presentation format. Experiment 2 used the same basic procedures as Experiment 1 in that subjects received training with the temporal format and two conditions in which the temporal format was enhanced with another cue. The two non-spatial enhancements in the learning phase involved the use of color or auditory tones. In the color condition, each column in the table was assigned a distinct hue, with rows represented by amount of saturation. In the tone condition, each column was assigned a discrete degree of auditory balance from left to right, with the center column tone being equally distributed in both ears. Row information was indicated by changes in pitch, with higher pitch indicating higher rows. Contrary to our hypothesis, the results showed that providing color or auditory cues to the structure of the tabular information during learning resulted in worse test performance than the temporal format. As a consequence of the surprising finding that adding structural information led to worse performance in the second experiment, a third experiment was designed to explore the hypothesis that the auditory and color cues disrupted learning because subjects did not recognize that the cues provided structural information, since they had not been informed that the target information could be organized as a table. In the third experiment, subjects were randomly assigned to one of three groups. The No Instruction group received no additional information about the underlying tabular structure. In the Structure condition, subjects were told that the data was organized in a systematic fashion. The Table group was told that the data was organized in a systematic fashion and was also presented with an example table and explicitly shown how the column and row headers were related to the target word.
Additionally, the tone cues were paired with the example table and participants interacted with the table to explore the relationship between the tones and spatial locations of the target words. E.g., if a subject clicked on each row in a single column of the example table, she would hear the change in pitch at a constant balance level as she progressed down the column. Again the results were contrary to our hypothesis. The subjects, who were explicitly informed of the underlying structure and shown

These archival studies and experiments have helped us to understand how important spatial cues are to the construction of tables. This substantiated the importance of designing non-visual navigation technology that provides a direct abstraction of the spatial layout of tables, such as the semantic-based technology described later in this work. The experiments also indicated that casual additions of non-spatial cues may be more harmful than beneficial, confusing the reader. In any case, the lessons learned from these experiments have been valuable in designing the tools described in the rest of the paper. The information gathered in these experiments will also be valuable in evaluating these tools in the future.

SYSTEM STRUCTURE Figure 1 provides an overview of the system that we are developing for making complex HTML structures (e.g., tables) accessible. The system comprises two phases: the first phase involves the retrieval of documents from the WWW, typically directed by a sighted user (e.g., a teacher preparing course material) and the generation of


Semantic Descriptions (SD) - which are then stored in a

represented in HTML comments or in special-purpose attributes, was an inadequate solution to an important problem. With the advent of XML, meta-data representation has become far easier, since a web page can now store arbitrary amounts of additional information in addition to its main content. An XSL style sheet can choose to ignore this extra content when displaying the document. It has been suggested that this extra meta-data content can be used for web page retrieval; it can also clearly be used for web page navigation. It is this latter use that we are proposing as semantic navigation. Since the content of this meta-data is arbitrary, links between meta-data items and between meta-data and the main content of the page can cut across the syntactic structure of the page. These links have to be represented separately from the page itself, possibly in an associated document, just as display information can be separated from the page and placed in a style sheet. We are proposing to use conceptual graph formalisms for the representation of the semantic links, and we have started work in this direction.

local database. The second phase involves the actual non-visual navigation, where requests from a blind user are paired with the available SDs to support interactive and "intelligent" non-visual navigation.

Figure 1: Overall System Structure

Semantic Structure One of the key aspects of the solution we propose in this project is the adoption of knowledge representation mechanisms - specifically Conceptual Graphs [14] - to capture the navigation structure of a complex HTML component. We will refer to the structure used to encode the navigation structure of a complex HTML component as its navigational SD (or simply Semantic Description (SD)). The SD is synthesized in a semi-automatic fashion. The SDs are handed to the speech navigator; the navigation process is thus customized according to the semantic structure of the current document.

The creation of the SD for a non-linear component of a document represents the key problem we propose to solve in this project. The process involves two steps: (i) identification of the concept nodes of the conceptual graph, and (ii) identification of the relation nodes of the conceptual graph. In our context, the concept nodes of the graph represent the semantic entities which are described by the document's component. Nodes are commonly organized according to one or more hierarchies. The lower level of the hierarchy commonly includes syntactic elements directly extracted from the document (e.g., cells of a table, CDATA of an XML document, values within a <coordinate> GML element). The higher levels of the hierarchies provide semantic entities representing general concepts or collections of concepts (e.g., a column is a collection of cells).
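To make the two-level organization concrete, the following is a minimal Python sketch of an SD as a graph with concept nodes and relation nodes. It is an illustration only: the class names (Concept, ConceptualGraph), the relation name "contains", and the sample values taken from Table 1 are assumptions made for this example and do not describe the representation used in our tools.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Concept:
    """A concept node: a type (e.g. "Cell", "Column") plus a referent."""
    type: str
    referent: str


@dataclass
class ConceptualGraph:
    """Concept nodes linked by named relation nodes (hypothetical structure)."""
    concepts: set = field(default_factory=set)
    relations: list = field(default_factory=list)  # (name, source, target)

    def relate(self, name, source, target):
        # Adding a relation also registers both concept nodes.
        self.concepts.update((source, target))
        self.relations.append((name, source, target))

    def members(self, concept, relation="contains"):
        # Lower-level concepts reachable from `concept` through `relation`.
        return [t for (n, s, t) in self.relations
                if n == relation and s == concept]


# Lower level: cells extracted from the document; higher level: a column.
sd = ConceptualGraph()
type_column = Concept("Column", "Type")
for value in ("herbicide", "fungicide", "insectic."):
    sd.relate("contains", type_column, Concept("Cell", value))
```

A navigator could then answer a request such as "read the Type column" by calling sd.members(type_column) instead of scanning the table row by row.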

Conceptual Graphs The knowledge representation scheme that we propose to adopt in this project is based on Conceptual Graphs [14]. Conceptual Graphs and their associated theory, Conceptual Structure Theory, were proposed in the 1970s as a way of drawing logical statements in diagrammatic form rather than in a linear text-based calculus which was and is the norm. The basic ontology is very simple as it is in logic. A conceptual graph can have two kinds of node; a concept node that represents types and objects of that type and a relation node that represents a relationship between these objects. The theory allows for a basic expressiveness that is equivalent to first-order logic as well as mechanisms for defining concepts and for representing type hierarchies of concepts and relations. Researchers have extended the formalism to allow representation and manipulation of more advanced ontologies, especially those involving actions and events, and higher level structures such as viewpoints, and nested contexts.

The edges of the conceptual graphs represent relationships between the conceptual entities identified as nodes of the graphs. A natural class of relationships originates from the presence of a hierarchy between different concepts present in the graph. The conceptual graph representing a document component (e.g., an HTML table) will be created by combining three sources of knowledge: (i) syntactic content of the document (e.g., use of HTML tags and attributes); (ii) direct input from a human annotator (e.g., the teacher, the creator of the document, a third party); (iii) history of how the document's components have been used in the past (through databases of access traces). These are analyzed in more detail in the next subsections.

There are two kinds of structure present in any web page: syntactic and semantic. The former is hierarchical, as reflected in the use of HTML/XML. Before XML was proposed, the only way to navigate a web page was to follow this syntactic structure. Any other navigation technique was forced to rely on search through content, especially if no meta-data was present. Meta-data,

It is important to observe that in our framework the knowledge representation is not static but it is highly dynamic and constantly updated.


Figure 2: Creation of Semantic Descriptions

Figure 3: GUI for Manual Annotation

Explicit Generation of SDs

Syntactic Analysis of Tables The task of developing the SD for a complex HTML structure (CHS) is facilitated by the introduction of a syntactic analyzer. The objective of the syntactic analyzer is to extract as much navigational information as possible from the syntactic structure of the CHS. Once again, we have focused on the analysis of HTML Tables to assess the feasibility of this phase.

It is fairly clear that reliable, completely automatic generation of SDs for every complex HTML structure (CHS) is an impossible task [14] - e.g., the syntax used to express CHSs is inadequate to explicitly express the complete navigation strategies for such document components. In order to make the task of constructing the SD more realistic, we restricted our focus to the context of coursework management - i.e., the various documents are part of the material offered to the students of a course. There are some clear advantages in taking such a perspective: (i) we can assume the presence of an instructor, who will take charge of generating and/or completing the semantic descriptions wherever they are lacking or incorrect; (ii) we can assume the presence of a controlled population of users, whose actions can be, to a certain extent, supervised.

The intuition behind this syntactic analysis is to recognize collections of table cells (i.e., HTML elements) that the syntactic layout of the table suggests should be grouped together. There are a variety of different items in the syntactic layout of the table that may suggest grouping of cells, e.g., (i) the explicit grouping of cells/rows performed through the use of elements such as <COLGROUP> and attributes such as COLSPAN; (ii) the use of indexing elements in the table, such as header tags (e.g., <TH> in combination with the scope attribute, and ); (iii) each cell may identify one or more header elements (e.g., through the headers attribute in combination with the axis attribute); (iv) in addition to the above components (that are all explicit HTML constructs to structure a table), users frequently use additional visual features to suggest grouping of cells; the most typical approach is through the use of different background colors, font colors and sizes. We have developed a syntactic analyzer that explores the use of these features to recognize possible meaningful groupings of cells. These groupings of cells are proposed to the manual annotator (through the GUI described in the previous section); at that stage, the human annotator has the option to accept, modify, or reject such suggested collections. For example, consider the table on the left in Fig. 4; the syntactic analyzer is capable of recognizing the components of the semantic structure illustrated on the right in Fig. 4 (see footnote 1).
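As a rough illustration of hints (iii) and (iv) above, the sketch below proposes candidate groups of cells from the headers attribute and from a shared background color. It is written in Python with the standard html.parser module and is not the analyzer described in this paper; the names CellGrouper and propose_groups, the SAMPLE table, and the choice of the bgcolor attribute as the only visual cue are all assumptions made for the example.

```python
from collections import defaultdict
from html.parser import HTMLParser


class CellGrouper(HTMLParser):
    """Collect table cells together with the attributes that hint at grouping."""

    def __init__(self):
        super().__init__()
        self.row = -1
        self.col = 0
        self.current = None   # cell currently being read, if any
        self.cells = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)   # tag and attribute names arrive lower-cased
        if tag == "tr":
            self.row += 1
            self.col = 0
        elif tag in ("td", "th"):
            self.current = {
                "pos": (self.row, self.col),
                "is_header": tag == "th",
                "id": attrs.get("id"),
                "headers": (attrs.get("headers") or "").split(),
                "bgcolor": attrs.get("bgcolor"),
                "text": "",
            }
            # A COLSPAN cell occupies several column positions.
            self.col += int(attrs.get("colspan") or 1)

    def handle_data(self, data):
        if self.current is not None:
            self.current["text"] += data.strip()

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self.current is not None:
            self.cells.append(self.current)
            self.current = None


def propose_groups(html):
    """Return {(hint, label): [cell texts]} as suggestions for an annotator."""
    parser = CellGrouper()
    parser.feed(html)
    header_text = {c["id"]: c["text"] for c in parser.cells if c["id"]}
    groups = defaultdict(list)
    for cell in parser.cells:
        if cell["is_header"]:
            continue
        for header_id in cell["headers"]:
            label = header_text.get(header_id, header_id)
            groups[("header", label)].append(cell["text"])
        if cell["bgcolor"]:
            groups[("bgcolor", cell["bgcolor"])].append(cell["text"])
    return dict(groups)


# Hypothetical input, loosely modeled on a trip-expenses table.
SAMPLE = """
<table>
  <tr><th id="l">Lodging</th><th id="m">Meals</th></tr>
  <tr><td headers="l">50</td><td headers="m" bgcolor="gray">20</td></tr>
  <tr><td headers="l">60</td><td headers="m" bgcolor="gray">25</td></tr>
</table>
"""
groups = propose_groups(SAMPLE)
```

The output is deliberately only a list of suggestions: in the workflow described above, the human annotator would still accept, modify, or reject each proposed group.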

The first and simplest approach towards the construction of the SD of a complex structure is the manual approach: a human annotator (e.g., an instructor or a TA) makes use of a specialized tool to create the descriptions while assembling the course material. This task is accomplished through the use of a specialized GUI. As shown in Fig. 2, each request generated by the instructor/TA is filtered by a proxy server; each time the incoming document contains a CHS (in our tests we have focused on HTML Tables), the proxy server automatically extracts the CHS and starts the Annotation GUI. As shown in Fig. 3, the GUI presents the user with an abstraction of the Table; the human annotator can select arbitrary groups of cells in the table and assign to them a description--thus generating new abstraction levels in the SD. The tool has been developed in Java and, among other things, provides the following capabilities: (i) specialized parsing routines focused on the extraction of CHSs; (ii) it integrates the syntactic analyzer (described in the next section), thus offering to the annotator an initial SD to work on.

1 The table indicated the cells labeled Lodging and Meals as headers.


network in increasing order according to their weights. A direct link between a pair of nodes is included in the Pathfinder network only if its weight is no greater than the weight of any path with q or fewer links between them (triangle inequality), using the r-metric to compute path weights. In addition, Pathfinder uses the Link Labeling Rule (LLR), which provides a label for each link according to some classification scheme. Links are labeled as primary, secondary, or ternary depending on their role in the network. A primary link is a link that is the only path joining a connected subgraph of the PFNET to a subgraph which has only one node. A secondary link is a link that joins subgraphs in the PFNET or provides an alternate path between nodes already connected by primary links. A link is labeled ternary if it joins nodes within a subgraph for which alternate paths already exist.


Figure 4: Result of Syntactic Analysis

Usage-based Synthesis of SDs: Pathfinder Networks Introduction: In this phase of the project we have employed Pathfinder networks to generate a model for web navigation. The intuition is to maintain a collection of traces, representing the events generated during previous accesses to the same document. Pathfinder networks can be used to process these traces and generate "clusterings" of table cells that can be translated into new levels of abstraction in the SD. Network models have been used in many areas of computer science, including Artificial Intelligence, Operating Systems, and Databases. One of the methods that produce these network models is Pathfinder (PFNET) [6]. The Pathfinder algorithm finds the shortest path between every two entities in a weighted graph and produces a network structure based on estimates of similarities between its entities. In practice, distances can be obtained by estimating the "similarity" between each pair of entities.

Application of Pathfinder Networks In this project we have encoded the Pathfinder network generation algorithm to support the maintenance of SDs. A specialized tool has been developed to offer sighted and blind Web navigators the opportunity to (voluntarily) generate traces of the steps taken in navigating a document. Traces are in turn translated into inputs for the PFNET algorithm; nodes correspond to table cells and/or known abstractions of groups of cells, while initial measures of similarity are derived from the distance between nodes in the traces. We have performed a variety of experiments to validate the effectiveness of the usage-based synthesis of semantic representations via Pathfinder networks. The example below has been obtained by processing one of the tables

The Pathfinder Algorithm: The input to the pathfinder algorithm is represented by a collection of entities (represented as nodes in a graph) and estimates of the "distance" (or "similarity") between certain pairs of nodes. The outcome is the generation of a network where links between nodes are present if the corresponding entities are considered sufficiently similar (and each link bears a weight measuring such similarity). Pathfinder introduces a special metric (called Minkowski r-metric) to compute the distance between each pair of nodes that are not directly linked [6].

available at twister.sbs.ohio-state.edu. This home page provides marine, NCEP, and satellite textual weather data to meteorologists and the general public. The data is organized into tables, each cell in a table being a hyperlink to another home page. Our target is to navigate through these tables to produce a number of navigation traces, and then use such sequences to create a network model (PFNET) from the hyperlinks clicked during the navigation. We have worked on the Marine Data, Alaska table. This table has 15 hyperlinks, which represent the nodes in the PFNET, and we have used 14 navigation sequences (traces) to build the PFNET. We have used r = 2 and q = 14. In addition, LMR has been used because the nature of the navigation sequences leads to a directed PFNET. The distances between nodes are measured logically according to how close one hyperlink is to another in a navigation sequence. The resulting network is shown in Fig. 5.
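The exact distance measure is not spelled out here, so the following Python sketch shows only one plausible reading of "how close one hyperlink is to another in a navigation sequence": the directed distance from a to b is the smallest number of steps separating a from a later occurrence of b in any trace. The function name trace_distances, the sample traces, and the use of the minimum (rather than, say, an average) are assumptions made for illustration.

```python
def trace_distances(traces):
    """Directed distances derived from navigation traces (assumed measure).

    dist[(a, b)] is the smallest gap j - i such that a appears at position i
    and b at a later position j of the same trace. Pairs that never occur in
    that order get no entry, which is what makes the resulting graph directed.
    """
    dist = {}
    for trace in traces:
        for i, a in enumerate(trace):
            for j in range(i + 1, len(trace)):
                b = trace[j]
                if a == b:
                    continue
                gap = j - i
                if gap < dist.get((a, b), float("inf")):
                    dist[(a, b)] = gap
    return dist


# Hypothetical traces over three hyperlinks.
distances = trace_distances([["A", "B", "C"], ["A", "C"]])
```

Under this reading, two hyperlinks that users tend to visit back to back end up at distance 1, and the resulting pairs can serve as the initial weights handed to the PFNET construction.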

The generation of the network is based on a second parameter (the q-parameter) - that specifies the maximum number of links in paths that are guaranteed to satisfy the triangle inequality. Intuitively, a path between two nodes satisfies the triangle inequality if the weight of the link between the extremes of the path is no greater than the size of the path (according to the r-distance). Links are eliminated from PFNET because they violate the triangle inequality of paths that have q or fewer links. In the resulting PFNET, triangle inequality violation could be found in paths that have more than q links. Larger values of q result in fewer triangle inequality violations. There is no triangle inequality violation in the resulting PFNET when q is the number of nodes - 1.
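The roles of r and q can be sketched in a few lines of Python. The sketch assumes the usual form of the Minkowski r-metric, in which a path with link weights w1, ..., wk has weight (w1^r + ... + wk^r)^(1/r), with the maximum link weight as the limit for infinite r, and it keeps a direct link only when no path of at most q links is lighter. The function names and the three-node example are assumptions for illustration; this is not the implementation used in our experiments.

```python
import math


def r_combine(a, b, r):
    """Weight of a path made of two pieces of weight a and b under the r-metric."""
    if math.isinf(a) or math.isinf(b):
        return math.inf
    if math.isinf(r):
        return max(a, b)
    return (a ** r + b ** r) ** (1.0 / r)


def pfnet(weights, r=2.0, q=None):
    """Return the set of links (i, j) kept in PFNET(r, q).

    `weights` is a square matrix of positive link weights, with math.inf
    where no direct link exists. `best[i][j]` holds the minimum r-metric
    weight over paths from i to j using at most k links, for growing k.
    """
    n = len(weights)
    q = n - 1 if q is None else q
    best = [row[:] for row in weights]           # paths with at most 1 link
    for _ in range(q - 1):                       # allow one more link per pass
        nxt = [row[:] for row in best]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                for m in range(n):
                    if m == i or m == j:
                        continue
                    cand = r_combine(weights[i][m], best[m][j], r)
                    if cand < nxt[i][j]:
                        nxt[i][j] = cand
        best = nxt
    # Keep a link only if it does not violate the triangle inequality.
    return {(i, j)
            for i in range(n) for j in range(n)
            if i != j and math.isfinite(weights[i][j])
            and weights[i][j] <= best[i][j] + 1e-12}


INF = math.inf
W = [[INF, 1.0, 1.5],
     [1.0, INF, 1.0],
     [1.5, 1.0, INF]]
kept_r2 = pfnet(W, r=2, q=2)   # two-link path 0-1-2 weighs sqrt(2) < 1.5
kept_r1 = pfnet(W, r=1, q=2)   # the same path weighs 2 > 1.5
```

In this example the link between nodes 0 and 2 is dropped for r = 2 but retained for r = 1, which shows how the choice of r changes what counts as a violation; with q = 1 no indirect paths are examined and every link survives.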

This network model can be used for automatic navigation: after creating the PFNET, we can take a certain node in the PFNET as the root (for example, the node that has the largest number of links to the other nodes in the network). We can then navigate through the nodes (links) level-wise, depth-wise, or both. This form of navigation is better than traditional row- or column-wise navigation because it is based on navigating the most important (frequently visited)

The Link Membership Rule (LMR) is used to determine whether or not links should be added to the Pathfinder network. This is accomplished by ordering all links in the


hyperlinks first, which leads the user to the target information much faster. A final post-processing phase is used to integrate the PFNET obtained into the main SD; groups of nodes clustered by PFNET are translated into new levels of abstraction in the SD. A formal evaluation of the quality of the new levels of abstraction derived from PFNET is in progress.
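The root selection and level-wise traversal described above can be sketched in Python as follows. The sketch assumes the PFNET is given as a collection of directed links, counts both incoming and outgoing links when choosing the root, and breaks ties alphabetically. These choices, the function names, and the sample links are assumptions made for illustration only.

```python
from collections import deque


def choose_root(links):
    """Pick the node with the largest number of links (ties: alphabetical)."""
    degree = {}
    for source, target in links:
        degree[source] = degree.get(source, 0) + 1
        degree[target] = degree.get(target, 0) + 1
    # max() returns the first maximal element, so sorting fixes the tie-break.
    return max(sorted(degree), key=degree.get)


def level_order(links, root):
    """Visit nodes level by level, following directed links from the root."""
    successors = {}
    for source, target in links:
        successors.setdefault(source, []).append(target)
    seen, order, queue = {root}, [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in sorted(successors.get(node, [])):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order


# Hypothetical PFNET links over five hyperlinks.
example_links = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "E")]
example_root = choose_root(example_links)
example_order = level_order(example_links, example_root)
```

A depth-wise traversal would replace the queue with a stack; nodes that cannot be reached from the root along directed links are not visited by this sketch.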

file. The definition of HTML includes facilities for controlling the structure and contents of each frame. E.g., the HTML code can be structured in such a way that clicking on a link in one of the frames results in the target page being displayed in another frame. Frame-based web-pages are yet another source of difficulty for blind users. Most screen readers traverse the frames in a frame-based web-page in an arbitrary order. Quite often, the author of the frame-based page intended the frames to be viewed in some specific order. The information regarding the order in which the frames are to be viewed is implicit in the contents of the frames and has to be inferred through visual inspection. This ordering information is lost to the screen reader, which might therefore read out the frames in the wrong order. E.g., the most common usage of frames is for displaying an index frame and a content frame, where the content of the various items in the index can be seen in the content frame by clicking on them; in this situation a rational screen reader should read the index page first, then the contents page, and then go back to the index page. However, the order in which the frames are traversed is left to chance in current screen readers.

DOMAIN SPECIFIC LANGUAGES In [13] we presented for the first time a Domain Specific Language (DSL) designed to support navigation of complex HTML structures. This solution implies that each step in the navigation of a CHS is reduced to the execution of a sequence of commands in the DSL. There are various advantages in following this approach, e.g., the ability to predefine (parts of) navigation strategies and to provide a clean interface to support different forms of disabilities (e.g., partial blindness). The SD can be explicitly generated through a sequence of DSL commands (thus allowing for its dynamic manipulation). In turn, the DSL commands that generate the SD can be automatically generated by the PFNET algorithm and its post-processing step. The key commands used to describe the semantic structure are: (i) connected(Node, Node, Description), stating that the two nodes are part of two layers and that the description is the conceptual definition of their relation; (ii) content(Node), a fluent describing the content of a given node (e.g., the HTML stored in a table cell). Navigation is performed through standard graph traversal commands: (i) concrete(Concept) and abstract(Concept), which provide abstraction and concretization between conceptual levels; (ii) previous and next, to move within a conceptual layer. Some additional commands [13] are also present to query the current status during navigation.
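To make the commands above concrete, the following Python sketch models an SD as a layered graph and mimics the commands as methods. It is only an illustration under stated assumptions: the actual DSL is specified in a logic programming framework, and the exact semantics of each command are defined in [13]. Here we assume that connected links a more abstract node to a more concrete one, that concrete and abstract name the node to move to, and that previous and next move among nodes sharing the same parent. The class name, the `contents` dictionary, and the sample table are hypothetical.

```python
class SemanticDescription:
    """Toy model of an SD with a cursor marking the current node."""

    def __init__(self, root):
        self.children = {}   # abstract node -> ordered list of concrete nodes
        self.parent = {}     # concrete node -> its abstract node
        self.relation = {}   # (abstract, concrete) -> description
        self.contents = {}   # node -> stored content (e.g., cell HTML)
        self.current = root

    def connected(self, upper, lower, description):
        """Record that `lower` sits one layer below `upper`."""
        self.children.setdefault(upper, []).append(lower)
        self.parent[lower] = upper
        self.relation[(upper, lower)] = description

    def content(self, node):
        """Return the content stored for a node, or None."""
        return self.contents.get(node)

    def concrete(self, concept):
        """Move down to `concept` if it lies directly below the cursor."""
        if concept in self.children.get(self.current, []):
            self.current = concept
        return self.current

    def abstract(self, concept):
        """Move up to `concept` if it lies directly above the cursor."""
        if self.parent.get(self.current) == concept:
            self.current = concept
        return self.current

    def _step(self, offset):
        siblings = self.children.get(self.parent.get(self.current), [self.current])
        index = siblings.index(self.current) + offset
        if 0 <= index < len(siblings):
            self.current = siblings[index]
        return self.current

    def next(self):
        """Move to the following node in the same layer, if any."""
        return self._step(1)

    def previous(self):
        """Move to the preceding node in the same layer, if any."""
        return self._step(-1)


# Hypothetical table with two rows; the first row holds one cell.
sd = SemanticDescription("table")
sd.connected("table", "row1", "first row of the table")
sd.connected("table", "row2", "second row of the table")
sd.connected("row1", "cell11", "first cell of the first row")
sd.contents["cell11"] = "<a href='buoy.html'>Buoy reports</a>"
```

A navigation step then becomes a short command sequence, for instance `sd.concrete("row1")`, `sd.next()`, `sd.previous()`, `sd.concrete("cell11")`, followed by `sd.content(sd.current)` to obtain the text to be spoken.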

To make frame-based pages accessible, an approach similar to the one used for tables can be adopted. The various frames in a frame-based page can be grouped and organized as a conceptual graph, very similar to what we propose for tables. The PFNET algorithm can be used to encode commonly traversed paths and to encode groups of frames clustered together in the SD. A DSL (which is essentially a subset of the one for tables [13]) has also been designed to allow navigation of frame-based pages.

IMPLEMENTATION TECHNOLOGY The system presented in this report is currently under development; the major components have been developed and are being integrated into a single system. System Interfaces: The interfacing of the various system components with the outside world has been accomplished using standard off-the-shelf interfaces. The system interacts with the outside world at three different levels. First of all, the interactions with the external servers are accomplished via transcoding proxy servers, as illustrated in the previous sections. We have currently implemented two different proxy servers: the first handles the annotation process--i.e., it interacts with a standard Web browser (Netscape) and redirects the incoming documents to the annotation GUI and the syntactic analyzer. The transcoding proxy server also enriches the received documents with calls to a tracer applet; this allows the browser to trace user access to the tables' content, and these traces are used by the pathfinder algorithms to improve the SD. This first proxy server has been built as a simple C program interacting with the Java programs that implement the annotation GUI and the syntax analyzer. The second proxy server is employed by the

Figure 5: Network obtained from PFNET

Frame-based Web-Pages So far our discussion has been driven by our goal to facilitate the navigation of tables. A frame is another CHS that is used quite frequently in web-pages. Frames essentially divide a web-page into a number of sections where each section (or frame) displays a distinct HTML


speech navigator to retrieve the documents accessed by the non-visual user and to retrieve the SD associated with the CHS present in such documents. In this second proxy server we have experimented with IBM's WBI Development Kit (Web based Intermediary) along with the Java APIs. WBI is programmed to act as a proxy and to serve as an HTTP request and response processor; a plug-in has been developed to interface the requests and the documents with the SDs maintained in the local repositories.

navigational SD from the information contained in XSLT sheets associated to the XML document. ACKNOWLEDGMENTS

Research is partially supported by NSF grants EIA0130887, CCR9875279, HRD9906130, and EIA9810732. REFERENCES 1. C. Asakawa and T. Itoh. User Interface of a Home Page Reader. Int. Conf. on Assistive Technologies. ACM, 1998.

Speech Synthesis: Speech synthesis is currently produced using IBM ViaVoice SDKs for Text-to-speech tools.

2. C. Asakawa and C. Laws. Home Page Reader: IBM's Talking Web Browser. Technical report, IBM, 1998.

Implementation of DSLs: The development of the DSL relies on the use of advanced logic programming technology. The specification of the DSL has been accomplished through the use of Horn-logic denotations [8] - a logic-programming based software engineering methodology to specify Domain Specific Languages and automatically derive provably correct implementations. The DSL inference engine is also currently being extended to take advantage of the powerful capabilities offered by the underlying logic programming layer; as discussed in a companion paper [12], the DSL can be seen as an instance of an action language; this allows the use of sophisticated planning and reasoning algorithms to (partially) automate the navigation process and to delegate to software agents some of the more tedious tasks.

3. J. Brewer, D. Dardailler, and G. Vanderheiden. Toolkit for Promoting Web Accessibility. Technology and Persons with Disabilities Conference, 1998.
4. J.C. De Witt and M.T. Hakkinen. Surfing the Web with pwWebSpeak. Technology and Persons with Disabilities Conference, 1998.
5. C. Earl and J. Leventhal. A Survey of Windows Screen Reader Users. Journal of Visual Impairment and Blindness, 93(3), 1999.
6. R.H. Fowler and D.W. Dearholt. Pathfinder Networks in Information Retrieval. MCCS-89-147, NMSU, 1989.
7. J. Gunderson and R. Mendelson. Usability of World Wide Web Browsers by Persons with Visual Impairments. RESNA Annual Conference, 1997.

FUTURE WORK AND CONCLUSIONS This paper represents a progress report for the NMSU project on Web accessibility. During the last year we have made substantial progress. We have refined the semantic structure adopted to describe the navigational semantics of complex HTML components, which is now completely described as an instance of the well-studied conceptual structures (and their variants to support multiple viewpoints). A variety of effective techniques have been developed to build and maintain such SDs. We have also established a domain specific language dedicated to the navigation of such SDs; the language has been validated in two separate contexts--i.e., navigation of tables and navigation of frames. The different components have been interfaced and integrated, and the complete system is now becoming operational.

8. G. Gupta and E. Pontelli. Specification, Implementation, and Verification of DSLs. Computational Logic: from Logic Programming into the Future. Springer, 2002.
9. P. Hendrix and M. Birkmire. Adapting Web Browsers for Accessibility. Technology and Persons with Disabilities Conference, 1998.
10. A. Kennel et al. WAB: WWW Access for Blind And Visually Impaired Computer Users. New Tech. in Educ. of Visually Handicapped, 1996.
11. T. Oogane and C. Asakawa. An Interactive Method for Accessing Tables in HTML. Conference on Assistive Technologies, ACM Press, 1998.
12. E. Pontelli and S. Tran. Navigating Tables: Planning, Reasoning, and Agents. TechRep., NMSU, 2001.
13. E. Pontelli et al. A DSL Framework for Non-visual Browsing of Complex HTML Structures. Conference on Assistive Technologies, ACM Press, 2000.

We are currently experimenting with adapting the same technologies considered here for the interactive navigation of XML fragments. XML provides a clean syntactic organization of documents (hierarchical structure), but often the syntactic organization does not reflect the desired navigational behavior (e.g., in the same way as we need to resort to XSLT to provide adequate visual display). We are currently exploring methodologies to develop a

14. J.F. Sowa. Conceptual Structures. Addison Wesley, 1984.
15. G. Vanderheiden et al. WAI Accessibility Guidelines. WD-WAI-PAGEAUTH-19980918, W3C, 1998.
16. M. Zajicek, C. Powell, and C. Reeves. Ergonomic Factors for a Speaking Computer Interface. Contemporary Ergonomics. 1999.
