
Enriching Web Information Scent for Blind Users

Markel Vigo
University of the Basque Country, Informatika Fakultatea, 20018 Donostia, Spain

Barbara Leporini
HIIS Laboratory, ISTI-National Research Council, 56124 Pisa, Italy

Fabio Paternò
HIIS Laboratory, ISTI-National Research Council, 56124 Pisa, Italy


detrimental for people with disabilities. In the case of blind users, information overload and excessive sequencing are the main problems, mostly because screen readers and Braille outputs render Web content in a linear way. Therefore, in order to get an overview of a page, a blind user has to traverse the whole page, which diminishes browsing efficiency and increases disorientation. By enriching hyperlinks (and thus the information scent) with additional information on the accessibility of the corresponding hypertext node, our aim is to provide users with navigational cues that make the user experience less problematic. This extra information consists of the accessibility score for the target page, which can be considered an indicator of how well people will be able to navigate it. We hypothesize that users will be more effective and satisfied in their navigation with this support. In addition, we want to observe user behaviour when relevance and accessibility information are provided together. Supporting our hypothesis, in an experiment carried out with blind and sighted users, Ivory et al. [16] found that when a Web page may not satisfy users’ information needs, extra information features are preferred over relevance.

ABSTRACT

Link annotation with the accessibility level of the target Web page is an adaptive navigation support technique aimed at increasing blind users’ orientation in Web sites. In this work, the accessibility level of a page is measured by exploiting data from evaluation reports produced by two automatic assessment tools. These tools support the evaluation of accessibility and usability guideline-sets. As a result, links are annotated with a score that indicates the conformance of the target Web page to blind user accessibility and usability guidelines. A user test with 16 users was conducted in order to observe the strategies they followed when links were annotated with these scores. With annotated links, the navigation paradigm changed from sequential to browsing randomly through the subset of those links with high scores. Although there was no general agreement on the correspondence between scores and user perception of accessibility, users found annotations helpful when browsing through links related to a given topic.

Categories and Subject Descriptors

H.1.2 [User/Machine Systems]: Human factors. H.5.2. [User Interfaces]: Evaluation. H.5.4. [Hypertext/Hypermedia]: User issues. K.4.2 [Social Issues]: Assistive technologies for persons with disabilities.

2. RELATED WORK

According to Goble et al. [13], visually impaired users need to be explicitly warned of obstacles since their reliance on environmental cues is higher than for sighted users. Similarly, Harper et al. [15] found that detecting and notifying users about barriers beforehand improves users’ orientation at a web site, and Bigham et al. [4] found that blind users are less likely to interact with non-accessible content. Therefore, warning blind users about forthcoming barriers may enhance their user experience. The above-mentioned outcomes lead us to provide mechanisms that diminish user disorientation by augmenting navigation mechanisms. Orientation and navigation are closely related since both refer to the user’s navigational environment. Orientation is the user’s understanding of current movements and the navigation context. Navigation is part of web browsing and consists of moving around in a hypertext document, deciding at each step where to go next [17]. Orientation answers the question “where am I?” while navigation answers “where can I go?”

General Terms

Measurement, Design, Experimentation, Human Factors.

Keywords

Information scent, web accessibility, blind users, adaptive navigation.

1. INTRODUCTION

The Web is growing fast, with an enormous amount of information available and penetrating all facets of our lives. This information abundance generates various orientation problems for Web users. In order to better understand browsing behaviour on the Web, Pirolli and Card [24] formulated the Information Foraging Theory as a way of modelling user decisions when traversing hypertext documents. This theory states that users will follow a given hyperlink when the trade-off between information gain and access cost is favourable. The information scent, the underlying basis of Information Foraging Theory, predicts hyperlink choices based on such trade-offs.

Since blind users strongly rely on navigation cues or landmarks, Takagi et al. [26] suggest several possible solutions for improving usability for blind users: improving the XHTML specification so that WAI-ARIA [9] statements can be adopted, simplifying the navigation interface, automatically suggesting navigation methods and integrating transcoding functions. Other techniques, such as the one proposed by Harper and Patel [14], provide summaries for blind users so that they can ascertain in advance whether a page’s content is suitable for them. Since none of the previous contributions considers the task itself, Mahmud et al. [21] developed a method to capture the context of the selected link in order to guide users directly to their target, thus removing information overload. Furthermore, Bigham et al. [15] developed a method to make task completion less time consuming in interactive web applications by applying end-user development techniques. Regarding link augmentation for able-bodied users, Campbell and Maglio [7] explored how the tension between link relevance and link annotation (in this particular case links were annotated with the connection speed of the page beyond the link) determined user behaviour. They concluded that as link relevance decreases, users tend to rely more on annotations. However, when there was a conflict between relevance and annotations (e.g. the most relevant link to reach a target had a slow connection), users were able to ignore the annotations and relevance prevailed.

The growing unstructured amount of information is especially

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ASSETS’09, October 25–28, 2009, Pittsburgh, Pennsylvania, USA. Copyright 2009 ACM 978-1-60558-558-1/09/10...$10.00.

visually impaired users. Checkpoints aim at describing and providing guidance in order to repair usability barriers that blind and visually impaired users may face while interacting with Web pages. Magenta [19] is a tool that evaluates Web pages against UGB, and its evaluation engine is independent of the representation of guidelines. Taking advantage of this feature, only those guidelines that just apply to blind users can be considered without changing the tool implementation.

3.3 Relationship Between Usability and Accessibility Guideline Sets

There is no total correspondence between WCAG and UGB. Some checkpoints belong exclusively to the WCAG set (e.g. “do not use tables for layout”) while others belong only to UGB (e.g. “provide a consistent pathway to enable layout and terminological consistency”). However, there is certainly an overlap of checkpoints. Depending on the relationship between the checkpoints of both sets, the type of overlap can be categorized as follows:

The purpose of this paper is to ascertain whether link annotation with accessibility scores improves usability in terms of efficiency and satisfaction. Similarly, our goal is to observe the strategy that users follow. To this end, this paper proposes quantitative metrics for blind users as the criterion to enrich the information scent with accessibility scores in Section 3. In order to test our hypotheses a user test was conducted, and the experimental setting is explained in Section 4. Results and discussion in Section 5 lead us to interpret user behaviour and the implications for design. Lastly, some conclusions are drawn in Section 6.



- Same. Both guideline sets identify the same problem and suggest the same techniques to check it.
- Same but differently addressed by the tools. Even if both sets aim at covering a certain guideline, just one tool implements it. For instance, Magenta checks the existence of “generic or ambiguous links” while ACB cannot test it.
- Precondition. Techniques are complementary but they should be applied in a given order, so that some checkpoints are a precondition for others. WCAG tends to provide preconditions for UGB. For instance, while WCAG emphasizes providing a summary for tables, UGB gives guidance on the content of the summary. In these cases UGB addresses the usability of the content, thus extending WCAG. This way, both tools complement each other.
- Contradictory. There is a contradiction between the statements in the guidelines. For instance, while UGB states that frames should not be used, WCAG states how to label them. In this case UGB criteria will prevail.

3. WEB GUIDELINES FOR BLIND USERS

Web accessibility guidelines define the requirements a Web page has to satisfy in order to provide accessible content. The most widely accepted sets of guidelines are the Web Content Accessibility Guidelines, WCAG (1.0 [8] and 2.0 [6]), by the W3C Web Accessibility Initiative. Since our purpose is to provide accessibility scores for enhancing the browsing experience, not only technical web accessibility is considered but also usability aspects for blind users. While accessibility guidelines tend to address mark-up issues that prevent users from accessing the information, usability guidelines for blind users focus on the adequacy of content and navigational mechanisms.

3.1 Technical Web Accessibility Guidelines

WCAG aim at giving guidance on how to build accessible sites for all users. Checkpoints, which are more specific best practices, provide guidance on how to remove barriers that may have an impact on several user groups. However, most checkpoints apply exclusively to a particular user group. Targeting blind users leads us to consider those guidelines that only affect this user group. In this sense, Brajnik [3] proposed a correspondence table between WCAG 1.0 guidelines and disabilities. This allows the identification of subsets of guidelines for specific groups, and so we considered those guidelines that affect only blind users. The subset of WCAG 1.0 that focuses on blind users has therefore been deployed in an automatic evaluation framework in which guidelines are independent of the evaluation engine [28], obtaining the Accessibility Checker for Blind users, ACB.

3.4 Reporting Issues

Generally, checkpoints are stated in natural language, which allows several interpretations of each evaluation rule. Thus, checkpoints tend to be divided into design techniques that tend to be technology dependent (in our case (X)HTML). For instance, ACB deals with the “data tables without summary” checkpoint by dividing it into two techniques: (1) “provide summaries” and (2) “provide abbreviations for headers in tables”. In turn, techniques can be divided into test cases, which are (X)HTML element and attribute dependent statements. The former technique contains 4 test cases: 2 of them check the summary attribute of tables while the others check whether a caption element is within the table tags. Test cases are atomic rules that are evaluated against Web content, and thus the content of accessibility reports is determined by such evaluations. If we adopt EARL [1] terminology so that ambiguity is removed, test cases are equivalent to earl:TestCase statements while techniques and checkpoints correspond to earl:TestRequirement cases. Both are subclasses of earl:TestCriterion, which is the way to refer to such terms generically. Depending on the accessibility issue they produce, the tools presented here classify evaluation techniques as follows:

3.2 Web Usability Guidelines for Blind Users

Usability plays a key role because even if pages meet accessibility standards they can still be difficult to traverse [18]. In this sense, Leporini and Paternò [20] proposed a set of guidelines for the usability of accessible pages. Usability Guidelines for Blind users (UGB) consist of usability criteria grouped in four principles: structure and arrangement, content appropriateness, multimodal output and consistency. Each principle contains several checkpoints that focus on specific usability issues for blind and




- Issues that can be checked fully automatically (earl:automatic) fall into the following types:

When evaluating the conformance of a Web page with respect to the mentioned guideline-sets both tools can work independently except when there is a checkpoint dependency due to preconditioning issues. In such a case, a component to solve these dependencies has been introduced. As can be observed in Figure 1, the (X)HTML resource is retrieved and while ACB evaluates its conformance to technical accessibility, Magenta checks the usability. Each tool produces a report and depending on the type of issue raised, be it exclusively accessibility, exclusively usability or overlapping issue, the Metrics Calculation Module produces a quantitative score based on the metrics defined in the following section. The Dependencies Solver, which is encapsulated within the Metrics Calculation Module, deals with those checkpoints that complement each other or those that are one another’s precondition.

  - errors (ae): not satisfying this type of technique raises accessibility barriers. These techniques produce a pass (earl:passed) if the checkpoint is met and a fail (earl:failed) otherwise.
  - recommendations (ar): techniques implementing these issues can automatically warn or make a recommendation in order to enhance accessibility. Violating this type of technique does not have a strong impact on accessibility but may have one on usability. Sometimes it refers to checkpoints that not all users perceive as an enhancement when they are implemented, such as “provide separation between subsequent links”. At other times, the interaction context strongly determines these kinds of techniques, so their fulfilment need not be strictly enforced. For instance, users of older versions of the Jaws 7 screen reader find problems when the content of the value attribute in buttons is not meaningful.

3.6 Web Accessibility Quantitative Metric for Blind Users

Accessibility metrics that produce quantitative scores enable accurate discrimination among web pages, as opposed to the WCAG conformance levels or success criteria. Quantitative scores are useful in those scenarios where accurate measurement is required, such as Web Engineering, Quality Assurance, accessibility monitoring observatories and Information Retrieval. In recent years, a good deal of research has been dedicated to Web accessibility metrics. Existing metrics provide a general approach for measuring accessibility, as they do not consider specific user groups but rather general purpose guideline-sets. While some are automatically obtained [27], others require human judgement [2]. Although there are some metrics for blind users [12], we believe our approach is more comprehensive since the test typology and reporting particularities of each guideline set are considered in the process. In addition, results are normalized, thus enabling their interpretation in percentage terms.

- Issues that raise warnings (w) can only be partially checked automatically (earl:semiAuto). For a complete evaluation, experts should verify whether an accessibility barrier actually exists. For instance, for “apply appropriate headings”, Magenta raises a warning if there are more than two headings. Afterwards an expert should manually check whether the headings were adequately placed.

- Some automatic issues can raise either errors (ae) or warnings (w). For instance, when checking the appropriateness of summaries in tables, if the summary is not provided or is empty, an error is produced, whereas if it has content and does not belong to a list of forbidden descriptions for table summaries, a warning is produced. Forbidden descriptions, such as “this is a summary” or, in the case of images, “pic12”, can be detected by Magenta. However, the tool cannot guarantee that its lists contain all forbidden words, and moreover they are natural-language dependent.
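The table summary check described above can be illustrated with a minimal Python sketch. It is not Magenta's implementation: the function name, the `Outcome` type and the content of `FORBIDDEN_SUMMARIES` are hypothetical, and treating a forbidden description as an error is an assumption, since the text only states when a warning is raised.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    ERROR = "ae"    # automatic error
    WARNING = "w"   # needs manual verification by an expert


# Hypothetical forbidden-description list; the only entry taken from the
# text is "this is a summary". Real lists are natural-language dependent.
FORBIDDEN_SUMMARIES = {"this is a summary"}


def check_table_summary(summary: Optional[str]) -> Outcome:
    """Classify the summary attribute of a table (illustrative only)."""
    if summary is None or not summary.strip():
        # Summary not provided or empty: an error is produced.
        return Outcome.ERROR
    if summary.strip().lower() in FORBIDDEN_SUMMARIES:
        # Assumption: a forbidden description is reported as an error.
        return Outcome.ERROR
    # The summary has content that is not forbidden: an expert still has
    # to judge its appropriateness, so a warning is produced.
    return Outcome.WARNING


print(check_table_summary(None).value)
print(check_table_summary("Monthly sales per region").value)
```

The warning branch reflects the semi-automatic nature of the check: the tool can rule out obviously bad summaries but cannot confirm that a summary is adequate.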

Metrics are automatically computed exploiting evaluation reports produced by the ACB and Magenta. Based on the specifications of the WCAG 1.0 subset and UGB guidelines, evaluation test cases can produce the following metrics:

3.5 Tool Coverage for Automatic Evaluation

UGB define 17 checkpoints grouped in 4 guidelines. Magenta can semi-automatically evaluate 11 of them by implementing 29 test cases: 22 test cases produce automatic errors while the remaining 7 produce warnings. On the other hand, the subset of WCAG 1.0 for blind users consists of 33 checkpoints, 18 of which can be automatically evaluated to a certain extent. These 18 checkpoints are implemented in 32 techniques, which in turn specify lower level requirements in 101 test cases: 63 can be automatically verified (51 ae and 12 ar) while 38 produce warnings. Compared to UGB, WCAG aims at covering all accessibility barriers, providing numerous techniques to remove them. The difference in the number of test cases (29 vs. 101) between the two guideline-sets arises because UGB focuses on subtle usability issues.
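The checkpoint, technique and test case hierarchy behind these counts can be pictured with a small Python sketch. The class names and the test case descriptions are hypothetical; only the “data tables without summary” checkpoint, its two techniques and the four test cases of the first technique come from the text, and the test cases of the second technique are not enumerated there.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AtomicTestCase:
    """Element or attribute dependent rule (earl:TestCase in EARL terms)."""
    description: str


@dataclass
class Technique:
    """Technology-dependent design technique (earl:TestRequirement)."""
    name: str
    test_cases: List[AtomicTestCase] = field(default_factory=list)


@dataclass
class Checkpoint:
    """Natural-language checkpoint (also an earl:TestRequirement)."""
    name: str
    techniques: List[Technique] = field(default_factory=list)

    def count_test_cases(self) -> int:
        return sum(len(t.test_cases) for t in self.techniques)


# Descriptions are illustrative wordings, not ACB's actual rules.
provide_summaries = Technique(
    "provide summaries",
    [
        AtomicTestCase("summary attribute of table is present"),
        AtomicTestCase("summary attribute of table is not empty"),
        AtomicTestCase("caption element is present"),
        AtomicTestCase("caption element is within the table tags"),
    ],
)
# Test cases of this technique are not listed in the text, so none are given.
provide_abbreviations = Technique("provide abbreviations for headers in tables")

checkpoint = Checkpoint(
    "data tables without summary",
    [provide_summaries, provide_abbreviations],
)
print(checkpoint.count_test_cases())
```

Counting test cases per checkpoint in this way is how totals such as 29 and 101 test cases would be obtained for a whole guideline set.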



- Failure-rate (fr) measures the ratio between actual errors and potential errors (or accessibility opportunities) [25]. For example, the “images lacking an alternative text” test case checks whether each picture has an alternative description. Thus, a page where 10 pictures out of 100 lack a description would obtain fr=0.1, while one where 5 images out of 25 lack it would obtain fr=0.2. The normalized score in terms of conformance is therefore 1-fr.



- Accept/reject: whilst techniques measured by the failure-rate are checked every time a given hypertext tag or attribute appears, some test cases are applied only once. One example is the “number of links” test case in UGB, implemented by Magenta, which produces one error if there are more than 30 links. The metric can be understood as a particular case of failure-rate in which the range from 0 to 1 is covered just by integer values. If one test case fails, the conformance will decrease proportionately to the number of techniques in a guideline. For instance, if “number of links” fails, the overall usability of the “number of links and frames” guideline will decrease by 50%, as there are just two test cases.

Figure 1. How evaluation tools take part in the process
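The failure-rate and accept/reject metrics can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, a test case with no opportunities is assumed to yield fr=0, and the guideline score is assumed to be the unweighted mean of the per-test-case conformance values, which reproduces the 50% decrease mentioned for a guideline with two test cases.

```python
from typing import Sequence


def failure_rate(errors: int, opportunities: int) -> float:
    """Ratio between actual errors and potential errors (fr)."""
    if opportunities == 0:
        # Assumption: with no opportunities there is nothing to penalize.
        return 0.0
    return errors / opportunities


def accept_reject(failed: bool) -> float:
    """Failure-rate restricted to the integer values 0 and 1."""
    return 1.0 if failed else 0.0


def conformance(fr: float) -> float:
    """Normalized conformance score, 1 - fr."""
    return 1.0 - fr


def guideline_conformance(rates: Sequence[float]) -> float:
    """Assumption: unweighted mean of per-test-case conformance."""
    return sum(conformance(fr) for fr in rates) / len(rates)


# Figures taken from the text.
print(failure_rate(10, 100))  # 0.1
print(failure_rate(5, 25))    # 0.2

# "Number of links and frames" guideline with two accept/reject test cases.
link_count = 42                            # hypothetical page
links_fr = accept_reject(link_count > 30)  # fails: more than 30 links
frames_fr = accept_reject(False)           # hypothetical: frames test passes
print(guideline_conformance([links_fr, frames_fr]))  # 0.5
```

Expressing accept/reject as a failure-rate of 0 or 1 lets both kinds of test case feed the same normalized 1-fr score.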

disjunction (d=1), arithmetic mean (d=0.5), to conjunction (d=0) depending on the logical relationship to be applied. When simultaneity in satisfying the requirements is necessary, conjunction and similarity are applied. In this case low scores heavily determine the final results. Contrarily, if the objective is to penalize the main component only if all subcomponents fail, the disjunction is applied. This way, only if most scores are low there will be an impact on the final result. Intermediate values are preferred, as extreme cases do not apply. This intermediate range of values is (0