AVD-LV: An Accessible Player for Captioned STEM Videos

Raja S. Kushalnagar, John J. Rivera, Warrance Yu and Daniel S. Steed
National Technical Institute for the Deaf at Rochester Institute of Technology
Rochester, NY 14623-5604

{rskics, jjr7497, wxy1697, dss1638}@rit.edu

ABSTRACT

The Americans with Disabilities Act requires online lecture creators to caption the videos for deaf and hard of hearing students, or for deaf and low vision (DLV) students who request these accommodations. While current captioned lecture video interfaces are usually accessible to deaf students, it is more challenging to provide full accessibility to DLV viewers who have restricted vision, as they cannot see both the lecture and captions simultaneously. We present an enhanced interface for YouTube lectures (Accessible View Device interface for Low Vision) that provides more accessibility for DLV viewers. This interface provides the ability to pause either the video or the captions with a single key-press, so that the viewer can follow simultaneous audio and video information. This interface is available to anyone and can be used with any captioned lecture on YouTube.

Categories and Subject Descriptors

K.4.2 [Computers and Society]: Social Issues – Assistive Technologies for persons with disabilities.

Figure 1: DLV students with a limited field of view have to switch focus between the video and the captions. If important visual and aural information occurs simultaneously, the student will miss the information source that they were not looking at.

Closed captions translate auditory information into visual text for individuals who do not receive auditory information, usually deaf or deaf and low-vision consumers. In the United States, about 15% of the population, i.e., about 50 million people, use closed captions [1]. Most US television content is required to have closed captions per the Twenty-First Century Communications and Video Accessibility Act of 2010. Unfortunately, captioned videos are not always accessible to deaf and low vision (DLV) consumers, who find some captioned videos difficult to follow.

While hearing consumers can watch and listen simultaneously, the transformation of audio to text requires DLV viewers to watch two simultaneous visual streams: the video and the audio text [4]. While reading the text, the DLV viewer will miss what is happening on the video. In general, this is not a problem when there is little visual information to be processed, such as when the lecturer is simply speaking. Problems arise when there are two simultaneous visuals to read or process, such as reading the slides in addition to reading the synchronized transcript, as shown in Figure 1. This is even harder when the video has a lot of text or the content is dense, especially in STEM videos [3].

2. Related Work

The term ‘low vision’ covers a broad range of vision problems that cannot be resolved by eyeglasses or contacts. Most people with low vision lack either a central high resolution focus (fovea), or a wide field of view with low resolution (peripheral vision) [6]. Low-vision aids can improve either central or peripheral vision, but negatively impact the other one. For example, magnifying devices reduce the field of view of the whole presentation, while minifying devices reduce resolution of individual visuals [2,5].

3. AVD-LV

We avoid this tradeoff by not using magnification or minification. Instead, we use a visual gaze management approach. Since most low vision students do not have enough field of view to see both the video view and caption view, we support their ability to focus on one view and pause the other view.

3.1 Browser Interface

We developed a Chrome application that can be downloaded and used to view YouTube videos. Consumers can use the application from any computer, anytime and anywhere, with minimal setup. This approach also allows us to extend the interface to adapt to multiple caption and transcript standards on the web, unlike TV captioning. Currently, most closed caption interfaces for web videos fetch separate caption information and display it via browser plugins (e.g., Flash and QuickTime) or through built-in browser video functionality. This makes it straightforward to fetch the caption file and display it either as a synchronized transcript (many lines of text that represent several seconds of audio) or as captions (1-2 lines representing 0.5 to 1 second of audio). We include the ability to show both because, while transcripts show more information, they can be harder to read since they are further away from the video. Conversely, the captions are easier to read since they are always shown in the same spot and are closer to the video, but the displayed information is shown for a shorter amount of time.
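The difference between the caption view and the transcript view can be illustrated with a minimal Python sketch. It is not the AVD-LV implementation: the `Cue` type, the function names, the 10-second default window, and the first sample cue are all assumptions made for illustration. The sketch only assumes that a caption file has already been parsed into timed text cues.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Cue:
    """One timed piece of caption text (hypothetical type, not from the paper)."""
    start: float  # seconds from the start of the video
    end: float    # seconds; the cue is active while start <= t < end
    text: str


def caption_at(cues: List[Cue], t: float) -> Optional[str]:
    """Caption view: the single cue that is active at time t, if any."""
    for cue in cues:
        if cue.start <= t < cue.end:
            return cue.text
    return None


def transcript_window(cues: List[Cue], t: float, span: float = 10.0) -> List[str]:
    """Transcript view: every cue overlapping the last `span` seconds up to t."""
    return [c.text for c in cues if c.end > t - span and c.start <= t]


# Illustrative cues; the first line is invented for this example.
cues = [
    Cue(0.0, 2.0, "First we melt the butter."),
    Cue(2.0, 4.0, "Then we add some cinnamon"),
    Cue(4.0, 5.0, "and sugar for taste."),
]

print(caption_at(cues, 4.5))
print(transcript_window(cues, 4.5))
```

At 4.5 seconds the caption view holds only the short final cue, while the transcript view still holds all three lines, which is the extra context the transcript trades against its distance from the video.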

Keywords

Deaf; Low vision; Online video accessibility.

1. INTRODUCTION

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage, and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s). Copyright is held by the author/owner(s). ASSETS’14, October 20–22, 2014, Rochester, NY, USA. ACM 978-1-4503-2720-6/14/10. http://dx.doi.org/10.1145/2661334.2661353

Figure 2: There is little new visual information, so the viewer is able to alternate between the video explanation and the captions. The viewer has not clicked on the video to pause it, but should soon press pause, as the lecturer will simultaneously explain the equation while writing it on screen.

3.2 Enhanced Text and Video Interfaces

Since DLV viewers alternate between reading the text and watching the video, we enhanced the interface to support them with the following functions: 1) increasing text readability by inverting colors and increasing font size through button clicks, and 2) simultaneously displaying the transcript and captions, with the ability to freeze one or the other, as shown in Figure 3. In Figure 4, the viewer has magnified the transcript text and is reading it after freezing the video by clicking on it. The click also displays a highlight on the video to help the viewer rapidly resume watching or reading. When they resume watching the video, they can play the video at a faster rate until the video has caught up. Another advantage of reading the transcript is that the viewer is less likely to lose context. For example, the latest line of captions may look like this: "and sugar for taste." Read on its own, this line seems nonsensical. However, when the transcript is bookmarked at the prior line, it reads as follows:

Figure 3: The viewer paused the video in order to read the next few seconds of the transcript. Once the viewer finishes reading the transcript, the viewer resumes the video at a faster rate (e.g., 2x) to catch up with the lecture.

They can bookmark before they switch, which eliminates search time, so the student can rapidly switch between visuals. The student can magnify the text and adjust contrast as needed. The platform is scalable as it extends existing browser interfaces for DLV consumers’ use. The larger project focuses on both deaf and low vision students. The user interface development for low vision students is an ongoing project with low vision students.

5. ACKNOWLEDGMENTS

This work is supported in part by NSF-IIS 1218056, two associated REU supplements, and two University of Washington AccessComputing grants to support an REU internship and a co-op.

6. REFERENCES

Then we add some cinnamon and sugar for taste.

This example shows how the DLV viewer is able to switch views and use bookmarks to reduce the chance of losing context.
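The bookmarked reading and the faster catch-up playback described in this section can be sketched as two small Python functions. This is an illustration under stated assumptions rather than the AVD-LV code: both function names are hypothetical, and the catch-up model assumes that the target the viewer is catching up to advances at a constant `reference_rate` (1.0 if it keeps moving at normal speed, 0.0 if it is a fixed point such as the end of the passage already read).

```python
from typing import Sequence


def context_from_bookmark(lines: Sequence[str], bookmark: int, current: int) -> str:
    """Join transcript lines from the bookmarked line through the current one."""
    if not 0 <= bookmark <= current < len(lines):
        raise ValueError("expected 0 <= bookmark <= current < len(lines)")
    return " ".join(lines[bookmark:current + 1])


def catch_up_seconds(lag: float, rate: float, reference_rate: float = 1.0) -> float:
    """Wall-clock seconds needed to close a gap of `lag` seconds of lecture time.

    Assumption: the video plays at `rate` while the target it is chasing
    advances at `reference_rate`, so the gap shrinks at (rate - reference_rate).
    """
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if rate <= reference_rate:
        raise ValueError("playback rate must exceed the reference rate")
    return lag / (rate - reference_rate)


transcript = ["Then we add some cinnamon", "and sugar for taste."]
print(context_from_bookmark(transcript, bookmark=0, current=1))
print(catch_up_seconds(lag=6.0, rate=2.0))
print(catch_up_seconds(lag=6.0, rate=2.0, reference_rate=0.0))
```

With the bookmark on the prior line, the first call prints the full sentence from the example. Under the stated model, a 6-second gap closes in 6.0 seconds at 2x when the target keeps advancing at normal speed, and in 3.0 seconds when the target is fixed.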

4. Conclusion

The AVD-LV application can support a DLV student’s viewing of complex visuals as they read the aural-to-visual translation. The application allows the viewer to focus on the captions or video when there is minimal simultaneous overlap. When the overlap is significant, viewers are able to alternate between the views by pausing one of the views, usually the transcript.

1. Chao, G. The State of Closed Captioning Services in the United States. 2003.
2. Hayden, D.S., Zhou, L., Astrauskas, M.J., and Black, J.A. Note-taker 2.0. Proceedings of the 12th international ACM SIGACCESS conference on Computers and accessibility - ASSETS ’10, ACM Press (2010), 131–137.
3. Kushalnagar, R.S., Lasecki, W.S., and Bigham, J.P. Captions Versus Transcripts for Online Video Content. 10th International Cross-Disciplinary Conference on Web Accessibility (W4A), ACM Press (2013), 1–4.
4. Kushalnagar, R.S., Lasecki, W.S., and Bigham, J.P. Accessibility Evaluation of Classroom Captions. ACM Transactions on Accessible Computing 5, 3 (2014), 1–24.
5. Peli, E. Vision multiplexing: an engineering approach to vision rehabilitation device development. Optometry & Vision Science 78, 5 (2001), 304–315.
6. Peli, E. Vision multiplexing: an optical engineering concept for low-vision aids. Proceedings of SPIE, SPIE (2007), 66670C.