Accessible Privacy and Security: A Universally Usable Human-Interaction Proof Tool

Graig Sauer, Jonathan Holman, Jonathan Lazar, Harry Hochheiser, and Jinjuan Feng
Towson University, Department of Computer and Information Sciences, Universal Usability Laboratory, USA
[email protected],
[email protected],
[email protected],
[email protected],
[email protected]
PRE-PRINT version, to appear in Universal Access in the Information Society (2009)

Abstract
Despite growing interest in designing usable systems for managing privacy and
security, recent efforts have generally failed to address the needs of users with disabilities. As security and privacy tools often rely upon subtle visual cues or other potentially inaccessible indicators, users with perceptual limitations might find such tools particularly challenging. To understand the needs of an important group of users with disabilities, a focus group was conducted with blind users to determine their perceptions of security-related challenges. Human-Interaction Proof (HIP) tools, commonly known as CAPTCHAs, are used by web pages to defeat robots, and were identified in the focus group as a major concern. Therefore, a usability test was conducted to see how well blind users were able to use audio equivalents of these graphical tools. Finally, an accessible HIP tool was developed which combines audio and matching images, supporting both visual and audio output. Encouraging results from a small usability evaluation of the prototype with 5 sighted users and 5 blind users show that this new form of HIP is preferred by both blind and sighted users to previous forms of text-based HIPs. Future directions for research are also discussed.
Keywords: CAPTCHA, Blind Users, Security, HIP, Universal Usability
1.0 Introduction

There are many studies of usability and security, and there are many studies of usability and users with disabilities, but research at the intersection of these two topics is rare. Recent estimates are that 1.8 million people in the USA, and 37 million individuals worldwide, have no residual vision and are therefore
considered to be totally blind [18,19]. In addition, it is estimated that over 161 million people worldwide have some type of visual impairment (including low vision). As internet-related security and privacy threats skyrocketed during the past decade, people with visual impairments faced an increasingly treacherous online environment. Meanwhile, the adoption of new security and privacy protection mechanisms increased the overhead of internet use and created additional accessibility barriers for people with visual impairments. Despite this daunting reality, very few studies have been reported in this domain, and current knowledge regarding the security concerns of this population, and the accessibility of security mechanisms for them, remains extremely limited. This paper reports a series of related studies as early attempts to fill this gap: 1) a focus group to develop a “top 10 list” of security problems for blind users, 2) a usability study of audio-based HIP products with blind users, 3) development of a new, more accessible, and more secure form of HIP, and lastly 4) evaluation of the new HIP prototype with both blind and sighted users.
2.0 Literature Review

In the world of information security, there is often a perceived usability/security trade-off: the more secure an interface is made, the less usable it becomes, and the more usable it is made, the less secure it is [11, 13]. This becomes even more of an issue when trying to make systems usable for users with disabilities. For instance, D’Arcy and Feng found that users with motor impairments tend to use simple, short passwords due to their difficulty using the keyboard [10]. This is a security concern, as short passwords and passwords based on dictionary words are much easier for an intruder to crack. Blind users rely on screen reader software that converts text on the computer screen into synthesized speech. This presents a problem for web-based mechanisms that use icons, images, or popup windows to indicate security concerns: if these displays are not interpreted by the screen reader software (and they often are not), they will be invisible to blind users. Despite these and other similar challenges, there is not a great base of literature on security and accessibility.
According to the World Wide Web Consortium [15], CAPTCHAs are one of the greatest security-related problems for users with disabilities, especially blind users. CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart, and is based on the Turing test developed by Alan Turing in the 1950s to test a machine’s ability to imitate a human. In this paper, the term Human-Interaction Proof (HIP) is used to refer to the more general class of tools that distinguish between humans and computers. The most common HIPs are visually distorted images of a string of letters and numbers that can be identified by humans, but not by computers. The distortion arguably makes automated recognition via optical character recognition (OCR) software difficult, leaving the text interpretable by humans, but not by software. In 1997, AltaVista introduced a HIP to protect its search engine from automated web site submissions by bots [12]. Since their creation, HIPs have been quite successful at distinguishing between computers and humans [14]. Another use of a HIP is to prevent dictionary attacks, in which a program tries to break into an account by guessing passwords using words from a dictionary. To prevent these attacks, after multiple failed login attempts some sites require the user to solve a HIP to verify that they are a human and not a bot trying to log into the account. However, the CAPTCHA products implemented by Microsoft and Yahoo have been defeated at success rates above 60%, which suggests that they may not currently provide the security they were designed to provide [17]. Current HIP implementations are primarily image-based, and therefore largely inaccessible to screen readers. Audio HIPs, which ask users to identify numbers from spoken audio, pose other challenges. In order to defeat automated speech recognition, audio HIPs use significant amounts of background noise and varied speakers, making interpretation difficult. Furthermore, many sites do not provide audio HIPs at all. The following are three examples of HIP products currently in use on the web:
Figure 1
Figure 2
Figure 3
The first example is Gimpy (Figure 1), a distorted-text CAPTCHA in which users must identify the words in the image [7]. EZ-Gimpy (Figure 2) is a simplified distorted-text variant in which users identify a single word [7]. ReCAPTCHA (Figure 3) asks users to identify the two words in the image; it also offers an audio version, which asks users to type a string of 8 digits [9].
Table 1

Based on the types of content presented, HIP solutions can be grouped into three categories:

Character based: A string of characters is presented to the user. This string can contain either words or random alphanumeric characters.

Image based: Images or pictures are presented to the user. This is normally in the form of an identifiable real-world object, but can also be presented in the form of shapes (BONGO) [3]. For example, an image of a cow might be shown and the user asked to identify it as a cow. Another example would present three circles and a square and ask the user to click on the square.

Sound based: The user is presented with an audio version of a HIP, listens to the audio file, and inputs an answer. Examples of sound-based HIPs include spoken words or numbers [9] and sounds related to an image (HIPUU, see below).

Based on the types of challenges, HIPs can be categorized into two groups: anomaly based and recognition based. Anomaly-based HIPs ask users to determine which object, character, or shape does not belong in a set displayed on the screen. Recognition-based HIPs require users to determine what is being presented to them. Any of these five techniques can be used in conjunction with the others, as shown in Table 1. For example, ReCAPTCHA [9] is character based, recognition based, and sound based. Chellapilla et al. [2] evaluated the impact of the text distortion rate on humans, and found that as distortion rates increase, it becomes extremely difficult for humans to decode a HIP. Yan et al. [20] developed a three-dimensional framework for the usability of CAPTCHAs in which distortion is one of
the three dimensions. They examine the type of distortion that is used and the overall impact that the distortion has on the user’s ability to solve the CAPTCHA. This framework has yet to be empirically evaluated. Recent studies show that HIPs are not as robust and effective as previously believed; instead, they can be quite vulnerable to attacks. For instance, object-recognition techniques have been employed to break commercial text-based HIPs with accuracy as high as 99% [7]. According to Chellapilla et al. [2], software accuracy in decoding HIP images is barely affected by increases in distortion. These findings are troubling, since they indicate that computers have the potential to solve HIPs, and that the existing key barrier against mass automated abuse may be quite vulnerable.
3.0 Requirements Gathering through Focus Group

When this research started, available knowledge about the broader security concerns and priorities of blind users was very limited. To better understand these issues, a focus group was conducted at the National Federation of the Blind in Baltimore, Maryland. The goal of this focus group was to learn more about the security-related problems of blind users. The focus group consisted of several employees of the National Federation of the Blind, all of whom are legally blind and use screen access software and/or Braille displays as their primary means of interacting with computers. The focus group produced a “top 10 list” of perceived security-related concerns for blind users on the web:

1. Web sites with forms using visual HIP tools that do not include audio output are inaccessible to blind users. To use these sites, blind users must either ask a colleague for assistance or phone technical support.

2. Many web sites have secure login sessions that time out if the user does not complete the login within the allotted amount of time. Unfortunately, some users, especially users of assistive technologies, can take longer to fill out login forms. If timeouts are too short, users may be required to start the forms over, once again running the risk of a timeout.

3. Some web sites automatically refresh or reload their content. An example is Yahoo Sports, which reloads a page every 30 seconds to update sports scores. While there may or may not be valid security-related reasons for these refreshes, screen readers may respond by re-reading the entire page. This can lead to confusion, as the user did not request that the page be reloaded and may not understand what is happening or why. Web pages should only reload at the user’s request.

4. Many PDFs are inaccessible, often in ways that imply security issues. When creating an Adobe PDF file, the author can allow changes or “lock” the document against them. If changes are allowed, any user can copy, paste, and modify the document. If the document is locked, the author must specifically mark the text as accessible to assistive technology; if this accessibility option is not selected, the locked PDF may be unreadable by screen reader users. In addition, using the term “lock” to prevent writes conflicts with most users’ perception of “lock” as a security measure.

5. Some antivirus packages are still inaccessible. Focus group participants observed that Norton 2007 is apparently inaccessible, although the 2006 version was perceived to be very good. They also noted that some versions of McAfee antivirus are inaccessible. This is an obvious security concern: blind users want to protect themselves from viruses but cannot do so with some of the most current antivirus versions. This presents another trade-off: should users keep older antivirus software that is accessible, or upgrade to newer software that is more secure but less accessible?

6. When a user loads a web page, software sometimes attempts to install automatically. Such packages are often spyware that should not be installed, yet blind users may not be presented with enough information about what is trying to install to make an informed decision. Installation prompts must give users more information about the code attempting to install, and allow them to respond appropriately.

7. Operating system and application updates can sometimes make software packages inaccessible. This is another security compromise: if users update, they could lose accessibility, but if they do not upgrade, they could compromise security. This concern has led many blind users to disable automatic software updates.

8. SecurID is a handheld device that displays a frequently changing number that must be provided, along with a PIN, in order to authenticate to a VPN (virtual private network). Blind users cannot read the number displayed on SecurID devices, so any system secured by this means is inaccessible to them [16].

9. Key loggers are malicious software packages that record every key the user presses. These logs can later be reviewed to disclose the user’s passwords, credit card numbers, or other private information. Key loggers may pose an even greater threat to blind users because of their heavy reliance on the keyboard (and non-use of pointing devices). Key logging software is typically installed as spyware and is hard to detect on a computer.

10. Spam is annoying and inconvenient for all users, including blind users. Because blind users must listen to the text of their e-mails, discerning the true nature of a message may take more time than a quick visual scan. Spam can also catch users, especially those who are underage, off guard with the content of unsolicited junk messages.

Although some of these items, most notably PDFs and SecurID, may not appear to be directly related to web security, a related concern lies in user perceptions. If blind users believe that these issues may have a negative impact on security, confidence in their ability to make appropriate decisions may be limited. A combination of innovative designs and user education may be needed to properly address these concerns.
4.0 Usability Test of Audio HIPs

Despite the expressed concern about the usability of audio HIPs, the magnitude of the problem was unclear. Therefore, a usability study was designed to learn more about several accessibility and usability concerns with current audio HIPs. The study addressed the following questions:
- Can users comprehend the distorted CAPTCHA audio?
- Can users easily remember the numbers contained in an audio HIP challenge in order to answer the HIP?
- Can users, through their screen reader, find the controls that start and operate the audio HIP?
- Is there any interference between the screen reader (JAWS) output and the audio HIP? Does this interference prevent users from successfully completing the HIP?
- Are there any frustrations that arise from any of the above factors?
4.1 Study Design

A webpage was implemented based on the ReCAPTCHA system developed by Carnegie Mellon University [9]. ReCAPTCHA offers both a visual distorted-text HIP and an audio HIP. For the audio HIP, the user is presented with an audio clip in which eight numbers are spoken by various individuals. Background noise is inserted into the clip to make it harder for bots to break the HIP. The user is asked to fill in a form with those eight numbers and hit a submit button, after which they are presented with either a “Correct” or “Incorrect” reply. This type of audio HIP is the same type implemented by major companies such as Google and Microsoft. The computer that the participants used was pre-configured with the JAWS screen reader, with an audio output rate selected based on prior experience working with JAWS users. Participants were allowed to use external aids of their choice, but were not allowed to adjust the audio output rate; a common rate was chosen to remove a possible confound, eliminating any influence that the rate of audio feedback might have had on success rates. As the audio HIP is presented as a single recording generated by the ReCAPTCHA server, the screen reader settings do not influence the playback rate of the ReCAPTCHA audio. Each participant was asked to solve the audio HIP six times, including one practice trial. Tasks were evaluated on both completion time and correctness. The practice attempt was designed to give the users some comfort with the site, minimizing the possibility of disorientation. At the end of the session, each participant was asked to fill out a short questionnaire about their experience during the study.
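For readers unfamiliar with the mechanics, the following is a minimal sketch of the kind of verification flow just described, written in Python with Flask. It is illustrative only: the route names, the hard-coded digit string, and the clip filename are hypothetical, and this is not ReCAPTCHA’s actual API.

```python
# Minimal sketch of an audio-HIP test page (assumptions: Flask installed;
# "challenge.wav" is a pre-recorded clip whose spoken digits are known
# to the server). This mirrors the flow described above, nothing more.
from flask import Flask, request, session, send_file

app = Flask(__name__)
app.secret_key = "replace-with-a-real-secret"

ANSWER = "40719253"  # hypothetical digits spoken in challenge.wav

@app.route("/audio")
def audio():
    # Remember which digits this session must type, then stream the clip.
    session["answer"] = ANSWER
    return send_file("challenge.wav", mimetype="audio/wav")

@app.post("/verify")
def verify():
    # Compare the typed digits to the expected answer for this session.
    typed = request.form.get("digits", "").strip()
    return "Correct" if typed == session.get("answer") else "Incorrect"
```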
4.2 Study Demographics

Six individuals (3 males, 3 females), ranging in age from 28 to 54, participated in the study. One participant (user #5) had partially impaired vision; the rest were blind with no residual vision. Participants had a range of experience with computer use and with the JAWS screen reader (see Table 2).
Table 2
4.3 Study Results

The participants used a number of different approaches for dealing with audio HIPs; it was not expected that they would use external aids as frequently as they did (see Table 3). Of the six participants, one used a Braille note taker to type numbers as they were spoken, one typed numbers into a Word document, one typed numbers directly into the form as they were spoken, and the other three tried to memorize all of the numbers, typing them in once the audio had completed.
Table 3

Figure 4 shows the number of correctly completed attempts for each participant. Participant #1 got five attempts correct, participant #2 three, participant #3 one, participant #4 four, participant #5 one, and participant #6 zero. The average number of correctly completed attempts was 2.3 (standard deviation 2.0). Figure 5 shows the average completion times for each user’s correctly completed tasks. Participant #1 averaged 75 seconds on correct attempts, participant #2 81.3 seconds, participant #3 81 seconds, participant #4 49.25 seconds, and participant #5 22 seconds; participant #6 had no correct attempts. Across all correctly completed tasks, the average completion time was 65.64 seconds (standard deviation 27.84).
Figure 6 shows the average completion times for each user’s failed attempts. Participant #1 answered every attempt correctly and so has no failed-attempt data; participant #2 averaged 95.2 seconds on failed attempts, participant #3 88.5 seconds, participant #4 50 seconds, participant #5 30.5 seconds, and participant #6 47.2 seconds. Across all incorrectly completed tasks, the average completion time was 59.56 seconds (standard deviation 35.65).
Figure 4
Figure 5
Figure 6

All participants were asked to complete a post-session survey about their experiences using HIPs and their experience using the ReCAPTCHA tool. Some of the more pertinent questions were “Could you easily differentiate between the background noise and the CAPTCHA number?”, “At any point while using the CAPTCHA did you feel frustrated?”, and “How would you suggest we improve these CAPTCHAs?”. All six participants suggested that the background noise be removed from the HIP, but said that they understood why it was necessary. Five of the six participants complained about the clarity of the audio HIP; two of these five admitted to guessing on numbers that they did not understand. Three of the six participants complained about not knowing when the HIP numbers started and stopped. Four of the six suggested that a single individual speak all of the HIP numbers so that they could focus on one voice.
4.4 Implications of the Usability Study

Success rates varied widely: some participants completed all of the tasks correctly, while others completed none. There was also a very wide range of average completion times. Some were able to complete the HIP in an average time as low as 30 seconds, while others took almost 90 seconds. Participants who used some sort of
external aid (Braille note taker or MS Word) were more successful than those who did not. Although no statistically significant conclusions can be drawn, the results do provide some insight into the performance of audio CAPTCHAs as used by blind users. The participants were only able to complete 46% of the tasks correctly, far short of the 90% success rate suggested as appropriate for HIPs [2]. The average time taken to correctly solve an audio HIP, 65.64 seconds, is far greater than the 10 seconds suggested as the target time to complete a HIP [9]. Although blind users’ times are not expected to rival sighted users’ times, the large gap between the two groups does suggest substantial room for improvement. The study participants who were most successful relied upon external aids, perhaps indicating that the audio HIP passes the threshold of technical accessibility but does not achieve the goal of usability. One possible explanation is that the audio HIP imposes a high cognitive load, forcing users to resort to external tools.
5.0 HIPUU: An Accessible HIP

Ideally, HIP design should be informed by the values of universal usability: tools should support a broad range of users of differing backgrounds and abilities. Current HIP systems separate their visual and audio HIPs; the audio HIP is essentially a distinct system with a completely independent development and maintenance path. A universally usable HIP would join visual and audio presentations into a single system in which the audio is directly related to the visual elements presented to the user. This type of HIP would be more accessible for users with visual impairments, and would also be easier to adapt for different languages and cultures. Based on this approach, a HIP named Human-Interaction Proof, Universally Usable (HIPUU) has been developed, which uses corresponding pictures and sounds to present concepts. The pictures make the HIP usable by someone who can see, and the audio makes it usable by someone who cannot. This combination of sound and images should make the HIP usable by a larger part of the user population. The use of sound/image pairs, as opposed to text, may also make the HIP, at least for now, more secure: as there is no known generalizable image-processing and sound-recognition tool, it is
believed that the images and sounds used in HIPUU should be relatively resistant to automated attacks. Because there are currently no generalizable image-processing attacks to defend against, no image distortion was applied to HIPUU; with no distortion, HIPUU would pass the distortion dimension of the three-dimensional framework proposed by Yan et al. [20]. Another benefit of this form of HIP is that it should be relatively easy to internationalize. Since the system uses pictures and sound effects, many of these concepts (although not culturally specific ones) could be used all over the world; adapting the system to another language would only require translating the labels for the sound/image combinations. Upon loading, a web page containing a HIPUU displays a picture and offers a button to load a corresponding sound clip. For example, an image of a train might correspond to the sound of a train chugging along. Four categories of picture/sound combinations were chosen: transportation, animals, weather, and musical instruments. These categories were chosen so that the objects would be familiar to the majority of potential users; examples included a bird, a cat, a drum, and a piano. Any items that had multiple easily identifiable labels were discarded. The web-based application was tested with both JAWS and Window-Eyes to make sure that it was fully accessible with the most common screen readers. After viewing the image or listening to the sound clip, the user is prompted to choose the correct label from a drop-down list containing 35 possible choices, including many decoy items that have no corresponding images or sound clips. Figure 7 shows a screen capture of the user interface of the prototype application.
Figure 7
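As an illustration, the following Python sketch shows the kind of challenge-generation and checking logic this design implies. The catalog entries, decoy labels, and function names are hypothetical placeholders; the actual prototype’s implementation details are not published here.

```python
import random

# Hypothetical catalog mapping a label to its (image, sound) files; the
# real prototype drew on four categories and, per Section 6.3.4, a
# production system would need a much larger set.
CATALOG = {
    "train": ("train.jpg", "train.wav"),
    "cat":   ("cat.jpg",   "cat.wav"),
    "drum":  ("drum.jpg",  "drum.wav"),
    "piano": ("piano.jpg", "piano.wav"),
}
# Decoy labels that appear in the dropdown but have no media files.
DECOYS = ["fox", "violin", "thunder", "bus", "bell", "truck"]

def new_challenge(n_options: int = 35) -> dict:
    """Pick one image/sound pair and build the dropdown list:
    the correct label plus decoys, shuffled."""
    answer = random.choice(list(CATALOG))
    image, sound = CATALOG[answer]
    pool = [label for label in list(CATALOG) + DECOYS if label != answer]
    options = random.sample(pool, min(n_options - 1, len(pool))) + [answer]
    random.shuffle(options)
    return {"image": image, "sound": sound, "answer": answer, "options": options}

def check(challenge: dict, selected: str) -> bool:
    """The challenge passes only if the selected label is the answer."""
    return selected == challenge["answer"]
```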
6.0 Usability Testing of HIPUU

In order to evaluate the usability of HIPUU, a small-scale proof-of-concept user study was conducted involving both blind and sighted users. For this testing, the tool was modified to present objects 1 through 15 sequentially, so that the user testing the tool would not see or hear the same object twice. In a
real-world production system, the item to be presented would instead be chosen randomly.
6.1 Sighted Users

Five sighted users tested the HIPUU prototype. Each user tried the 15 image/sound combinations, identifying the title of each object from the list. After testing, the participants filled out a survey designed to gather their thoughts on the new tool. All of the participants completed the 15 tests within 3-5 minutes (average: 3.5 minutes) with no problems identifying the objects that the pictures and sounds represented; no errors were made by any participant. Although the sighted users had the benefit of using both the picture and the sound to identify each object, the design goal is for either the picture or the sound to be sufficient on its own. The sighted users found it very easy to accept the design idea of using both images and sounds to identify objects (average 1.2 on a scale of 5, with 1 being easiest to accept). Overall, they were highly satisfied with the ease of use of the application and the time spent completing the tasks, and had no difficulty identifying the objects from the images and sounds. Encouragingly, although sighted users can effectively use both the traditional and the new form of HIP, the survey showed a slight preference for the new form (average 3.6 on a scale of 5).
Table 4
6.2 Blind Users

Five blind users were recruited to test the prototype. All five interact with computers or computer-related devices via screen readers in their daily lives. The blind users tested the same version of the HIPUU prototype and responded to the same survey as the sighted users. They took on average 8.8 minutes to complete all 15 questions. Two users completed all 15 objects in a little under 8 minutes, and the longest test was 17 minutes (conducted with a slower screen reader reading speed, at the
user’s preference). As the user who took 17 minutes carefully explored the application and gave a lot of verbal feedback, time cannot be the only metric used to measure performance. Three of the blind participants made one error during the test, and two made two errors. All errors were corrected on the second try, so every test was answered successfully within the first two attempts. According to the survey results shown in Table 4, the blind users also found it very easy to accept the design idea of HIPUU (average 1.4 on a scale of 5, with 1 being easiest to accept). They were highly satisfied with the ease of use of the application and the time spent completing the tasks, and reported only moderate difficulty identifying the object based on the sound provided (average 3.2 out of 5, with 3 being neutral). Regarding possible adoption of the new form of HIP, the blind users greatly preferred the prototype over traditional HIPs (average 4.8 out of 5, with 5 meaning a strong preference for HIPUU) and expressed that they would like to visit sites that included HIPUU (average 1, with 1 being most likely to visit), as they are unable to use the majority of current text-based HIPs. Overall, their survey results showed that they were highly satisfied with the prototype.
6.3 HIPUU Prototype Design Issues

6.3.1 Selection of Sound Effects

The selection of appropriate matching sound clips and images is a crucial factor for user performance. The match between the term, the image, and the sound effect should be clear, unique, and obvious. To reduce ambiguity, HIPUU accepts multiple answers for some image/sound pairs: thunder and lightning, alarm and siren, etc. Even with those precautions, a few sounds proved problematic during testing. For instance, the pig sound effect was not clear enough: the quiet grunting used for the pig did not contain the typical ‘oinking’ associated with pig sound effects. In another test, one participant was looking for the word fox when a wolf howl was played. Sound effects that were well received by the participants included glass breaking, truck, train, siren, and bell.
6.3.2 Presentation of Sound Effects

The sound effects need a minimum duration in order to be clear to the users. This requirement stems in part from the screen reader software, which speaks every key that is pressed through the computer speakers. Thus, if users press the enter or spacebar key to activate the “Play Sound” button, the computer will be saying “Enter” or “Space” while the sound is playing; if the sound is not long enough to keep playing after the screen reader feedback, the user will not be able to hear it clearly. To avoid this problem, it might be necessary to repeat the sound effects. For instance, the cat sound clip was very brief, with only a single meow, and a few users missed the sound the first time they heard it. In contrast, the dog sound has a dog barking three times, which was easier for users to catch. Another suggestion to compensate for the screen reader reading the key presses was to insert a delay before the sound plays. The cost is that a delay would slow down the HIP test and can give the impression of a poor server, a slow connection, or a web site that is currently down. One major security concern with the HIPUU prototype is sound identification via checksum or file signature. This might be addressed by inserting random “non-audible” noise (outside the range of human hearing) into the sound files on the fly as they are served. By inserting this noise randomly, the checksums and file signatures of the files would change every time they are played.
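A rough sketch of this countermeasure is shown below, using only Python’s standard library. It assumes 16-bit mono PCM WAV files sampled at 44.1 kHz and a hypothetical tone near 19 kHz, at the edge of adult hearing; a production version would need to handle other formats and confirm the added tone is truly inaudible to the target audience.

```python
import math
import random
import wave

def perturb_checksum(src_path: str, dst_path: str) -> None:
    """Overlay a faint near-ultrasonic tone (random frequency, phase, and
    amplitude) so the served file's checksum differs on every request.
    Assumes 16-bit mono PCM at 44.1 kHz."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames = bytearray(src.readframes(params.nframes))

    freq = 19_000 + random.uniform(-500, 500)   # near the edge of hearing
    phase = random.uniform(0, 2 * math.pi)
    amp = random.uniform(2.0, 8.0)              # tiny next to the 16-bit range

    for i in range(0, len(frames) - 1, 2):
        sample = int.from_bytes(frames[i:i + 2], "little", signed=True)
        n = i // 2                              # sample index
        sample += int(amp * math.sin(phase + 2 * math.pi * freq * n / params.framerate))
        sample = max(-32768, min(32767, sample))  # clip to 16-bit range
        frames[i:i + 2] = sample.to_bytes(2, "little", signed=True)

    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(bytes(frames))
```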
6.3.3 Interaction Strategies

Different strategies were used by the blind users to complete the HIP tests. Some users went through the majority of the tests very quickly: their main strategy was to recognize the sound, go to the drop-down list, press the key for the first letter of the word, and quickly jump to the answer. Other users heard the sound and then went through the list, considering each item until they found what they believed was the correct answer. Still others went through the entire list on the first few tests, and then jumped by first letter on later tests. HIPUU supported multiple strategies for completing the task, all of them leading to success.
Alternative designs of the HIPUU approach might use free text instead of drop-down lists. A free-text version would require the user to type the word corresponding to the object associated with the sound. This approach raises questions of aliasing: which words should be associated with the sound of a cat? “Cat”, “kitten”, others? Techniques such as synonym lists and/or word stemming might be used to make free-text entry a viable alternative, as sketched below. Future studies will verify whether free-text interaction is a viable and usable approach for HIPUU.
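As one way aliasing might be handled, the snippet below normalizes free-text answers against a small synonym table with very light stemming. The alias table and the stemming rule are illustrative placeholders, not the vocabulary actually used in the prototype.

```python
# Hypothetical alias table; a real deployment would curate one per item.
ALIASES = {
    "cat":   {"cat", "kitten", "kitty"},
    "dog":   {"dog", "puppy"},
    "train": {"train", "locomotive"},
}

def naive_stem(word: str) -> str:
    """Very light stemming: lowercase, trim whitespace, drop a plural 's'."""
    word = word.strip().lower()
    return word[:-1] if word.endswith("s") and len(word) > 3 else word

def accept_answer(expected: str, typed: str) -> bool:
    """Accept a free-text answer if its stem matches any alias of the
    expected label (falling back to the label itself)."""
    valid = {naive_stem(a) for a in ALIASES.get(expected, {expected})}
    return naive_stem(typed) in valid

# Example: accept_answer("cat", "Kittens") returns True.
```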
6.3.4 Considerations for Practical Application

HIPUU could be used to help protect web sites against automated attacks while allowing effective use by individuals with visual impairment. Some enhancements to the HIPUU prototype would help provide the security necessary for practical use. It is very important for any HIP to draw from an effectively inexhaustible database, or at least a large set of options: if the search space is too small, the HIP will be subject to brute-force attacks, which is clearly a concern for the HIPUU prototype. Another desirable feature is randomizing the audio file names on every run, or renaming each file to a random temporary name before it is served. Either approach prevents a bot from cataloging the filenames in order to look up correct responses. Because the system must serve many users simultaneously, each user’s session files must not interfere with other concurrent sessions. Obscuring the file size along with the file name, by padding all of the files to the same size, would also deter attempts at cataloging file sizes. Finally, as with current HIPs, after 2 or 3 incorrect responses the user should be locked out from further attempts.
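The sketch below illustrates two of these mitigations, per-session random file names and a failed-attempt lockout, under the assumption of a simple in-memory session store; the names and threshold are hypothetical.

```python
import os
import secrets
import shutil

MAX_ATTEMPTS = 3                 # hypothetical lockout threshold
_failures: dict[str, int] = {}   # session id -> count of failed attempts

def serve_copy(sound_path: str, session_dir: str) -> str:
    """Copy the challenge sound to a per-session file with an unguessable
    random name, so bots cannot catalog answers by filename."""
    dst = os.path.join(session_dir, secrets.token_hex(16) + ".wav")
    shutil.copyfile(sound_path, dst)
    return dst

def record_attempt(session_id: str, correct: bool) -> bool:
    """Track failures per session; return False once the user is locked out."""
    if correct:
        _failures.pop(session_id, None)
        return True
    _failures[session_id] = _failures.get(session_id, 0) + 1
    return _failures[session_id] < MAX_ATTEMPTS
```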
7.0 Future Work

The HIPUU prototype is being revised to address many of the challenges described above. The expanded prototype increases the number of potential images/sounds in the database to 30. In addition, rather than asking the user to identify only one sound or image, it can ask the user to identify combinations of 3 or 4 sounds. This greatly increases the number of potential combinations, and therefore the security of HIPUU: the user must get all 3 or 4 correct in order to pass the HIP.
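As a rough illustration of the security gain: with 30 image/sound pairs, a single challenge offers only 30 possible answers, whereas an ordered sequence of 3 distinct items allows 30 × 29 × 28 = 24,360 combinations, and a sequence of 4 allows 30 × 29 × 28 × 27 = 657,720. A bot guessing from a 35-entry dropdown at each of 3 steps would succeed with probability (1/35)³, roughly 1 in 42,875, even before any lockout policy is applied.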
The expansion of the image/sound database is necessary to prevent bots and/or humans from compiling the database of available options: the more options in the database, the more secure the prototype becomes. Adding more choices to the database should have no effect on usability, as the drop-down lists presented to the user contain only a subset of options and thus will not grow as image/sound combinations are added. Runtime concatenation of sounds, with intervening silence of random length, will be added to help discourage checksum/signature-based attacks. It is important to empirically test how the targeted users respond to the new HIP application; further studies are planned to test the prototype with both blind and sighted users in a controlled laboratory environment. It is also critical to investigate how users interact with the new HIP application in realistic settings while completing everyday tasks, and future studies are planned to address this need as well. The existing prototype uses a pull-down menu from which users select the correct answer. This creates a potential security vulnerability, in that a robot can automatically try all possible answers until it succeeds. Although this threat can be controlled by blocking access after multiple failed attempts, that is not a robust, ideal solution. It is planned to explore replacing the pull-down menu with free-text entry, which would make the application more robust. The downside is that users may come up with a variety of terms for the same picture and sound combination, making it difficult to determine whether a specific entry is correct. Empirical tests will help determine and evaluate the feasibility of the free-text entry approach.
8.0 Conclusions

This study investigated several important issues regarding security and usability for blind users. The top security concerns of blind users on the web were identified. A user study confirmed that existing audio HIP applications are very difficult for individuals who are blind to use. A prototype of a new form of HIP, based on non-textual images and sound clips, was developed, and a preliminary user study showed that it provided a satisfactory interaction experience for both sighted and blind users. The developed system is particularly helpful for blind users, since it addresses the problems of traditional visual text-based HIPs and provides a secure solution that is significantly more accessible. Further development of this prototype and evaluation of its usability will inform ongoing efforts to develop more accessible solutions for secure interfaces.
Acknowledgments: Our thanks to The National Federation of the Blind for assisting us with recruiting participants. We appreciate the assistance of John D’Arcy on the development of the HIPUU prototype. This paper is an expanded version of the paper presented at CWUAAT and published in “Designing Inclusive Futures.”
References

[1] Schluessler T, Goglin S, Johnson E (2007) Is a Bot at Control? Detecting Input Data Attacks. Proceedings of the 6th ACM SIGCOMM Workshop on Network and System Support for Games
[2] Chellapilla K, Larson K, Simard P, Czerwinski M (2005) Designing Human Friendly Human Interaction Proofs (HIPs). Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
[3] Von Ahn L, Blum M, Langford J (2004) Telling Humans and Computers Apart Automatically. Communications of the ACM, 47(2):57-60
[4] Holman J, Lazar J, Feng J, D’Arcy J (2007) Developing Usable CAPTCHAs for Blind Users. Proceedings of the 9th International ACM SIGACCESS Conference on Computers and Accessibility
[5] Elson J, Douceur J, Saul J (2007) Asirra: A CAPTCHA that Exploits Interest-Aligned Manual Image Categorization. Proceedings of the 14th ACM Conference on Computer and Communications Security
[6] Datta R, Li J, Wang J (2005) IMAGINATION: A Robust Image-based CAPTCHA Generation System. Proceedings of the 13th Annual ACM International Conference on Multimedia
[7] Mori G, Malik J (2003) Recognizing Objects in Adversarial Clutter: Breaking a Visual CAPTCHA. Computer Vision and Pattern Recognition
[8] Schlaikjer A (2007) A Dual-Use Speech CAPTCHA: Aiding Visually Impaired Web Users While Providing Transcriptions and Audio Streams. CMU-LTI-07-014. http://www.lti.cs.cmu.edu/
[9] ReCAPTCHA: Stop Spam, Read Books (2007) Available via http://recaptcha.net/ Accessed 7 May 2008
[10] D’Arcy J, Feng J (2006) Investigating Security-Related Behaviors among Computer Users with Motor Impairments. Poster abstracts of SOUPS 06. Available via http://cups.cs.cmu.edu/soups/2006/posters/darcy-poster_abstract.pdf Accessed 10 March 2007
[11] Johnston J, Eloff J, Labuschagne L (2003) Security and Human Computer Interfaces. Computers & Security, 22(8):675-684
[12] Robinson S (2002) Human or Computer? Take This Test. Available via http://query.nytimes.com/gst/fullpage.html?res=9907E5DF163AF933A25751C1A9649C8B63 Accessed 17 March 2007
[13] Sasse MA, Brostoff S, Weirich D (2001) Transforming the Weakest Link: A Human/Computer Interaction Approach to Usable and Effective Security. BT Technology Journal, 19(3):122-130
[14] Von Ahn L, Blum M, Hopper N, Langford J (2003) CAPTCHA: Using Hard AI Problems for Security. Available via http://www.cs.cmu.edu/~biglou/captcha_crypt.pdf Accessed 3 April 2007
[15] World Wide Web Consortium (W3C) (2007) Inaccessibility of CAPTCHA. Available via http://www.w3.org/TR/turingtest/ Accessed 10 March 2007
[16] RSA Strong Authentication (2007) Available via http://www.rsa.com/go/gpage.aspx?id=44&engine=googamericasearch!795&keyword=(secure+id)&match_type= Accessed 28 June 2008
[17] Yan J (2008) A Low-Cost Attack on a Microsoft CAPTCHA. Available via http://homepages.cs.ncl.ac.uk/jeff.yan/msn_draft.pdf Accessed 2 June 2008
[18] U.S. Census Bureau (2002) Americans with Disabilities: 2002. Available via http://www.census.gov/hhes/www/disability/sipp/disab02/ds02ta.html
[19] World Health Organization (2006) Magnitude and Causes of Visual Impairment. Available via http://www.who.int/mediacentre/factsheets/fs282/en/
[20] Yan J (2008) Usability of CAPTCHAs, or Usability Issues in CAPTCHA Design. Available via http://cups.cs.cmu.edu/soups/2008/proceedings/p44Yan.pdf Accessed 2 June 2008
Figure captions

Figure 1: An example of the Gimpy CAPTCHA
Figure 2: An example of the EZ-Gimpy CAPTCHA
Figure 3: An example of the ReCAPTCHA CAPTCHA
Figure 4: Number of correct attempts for each participant
Figure 5: Average amount of time taken on each correct attempt
Figure 6: Average amount of time taken on each failed attempt
Figure 7: Screen shot of the prototype HIP interface
Table 1. Overview of existing CAPTCHA solutions

| CAPTCHA Name    | CAPTCHA Type                  | Answer Type   |
| Gimpy [7]       | Character, Recognition        | Text Box      |
| EZ-Gimpy [7]    | Character, Recognition        | Text Box      |
| BONGO [3]       | Image, Anomaly, Recognition   | Text Box      |
| PIX [3]         | Image, Anomaly, Recognition   | Text Box      |
| ASIRRA [5]      | Image, Recognition            | Text Box      |
| IMAGINATION [6] | Image, Recognition            | Dropdown List |
| ReCAPTCHA [9]   | Character, Recognition, Sound | Text Box      |
| HIPUU [4]       | Image, Recognition, Sound     | Dropdown List |
Table 2. Participants’ computer experience

| Participant | Years of Computer Use | Hours of Computer Use Per Day | Self-Reported JAWS Experience (scale of 1-10, 1 being the lowest) |
| 1 | 18 | 4   | 7  |
| 2 | 4  | 6-8 | 7  |
| 3 | 15 | 7-8 | 8  |
| 4 | 10 | 7   | 9  |
| 5 | 20 | 10  | 1  |
| 6 | 20 | 8   | 10 |
Table 3. Techniques used by participants to solve the HIP

| Participant | Technique |
| 1       | Used a Braille note taker to document numbers as they were spoken |
| 2, 3, 6 | Memorized numbers as they were spoken, then typed them in at the end |
| 4       | Used a Word document to document the numbers as they were spoken |
| 5       | Participant with residual vision; typed numbers into the submit box as they were spoken |
Table 4. Summary of survey results from both sighted and blind users

| Question | Sighted users | Blind users |
| Overall ease of use (1 = easiest to use) | 1.2 | 1.4 |
| Acceptability of design concept (1 = most acceptable) | 1.2 | 1.4 |
| Satisfaction with time (1 = most satisfied) | 1.6 | 1.2 |
| Difficulty in identifying object (1 = no difficulty) | 1.4 | 3.2 |
| Preference of sound over image (1 = prefer sound) | 3.2 | 1 |
| Preference of text-based HIP over HIPUU prototype (1 = prefer text-based HIP) | 3.6 | 4.8 |
| Likelihood of visiting a site using the HIPUU prototype (1 = very likely) | 1.8 | 1 |