Session 12c4

Helping Visually Impaired Students in the Study of Mathematics

Arthur I. Karshmer, Enrico Pontelli and Gopal Gupta
Computer Science Department
New Mexico State University
Las Cruces, NM 88003 USA

Abstract - One of the greatest challenges facing the visually impaired student in the science and mathematics disciplines is the reading and writing of complex mathematical equations. Indeed, the study of almost all science- or engineering-based disciplines is normally beyond the reach of the majority of students with serious visual impairment. In research currently underway at New Mexico State University, tools are being built using logic programming to facilitate access to complex information in a variety of formats. On top of the logic-based tools, new interfaces are being designed to give our visually impaired students more convenient access to information. The combined tools will complete the teaching cycle: exams and assignments are prepared by sighted instructors, the material is presented to visually impaired students using interactive tools, and finally the students manipulate and create responses that are readable by the sighted instructor. In this paper, we present an overview of the systems under development, with samples of the interface tools currently being tested.

Introduction

The explosion of technology has had a huge impact on all aspects of modern life. Perhaps one of the most critical is its impact on education. Traditional concepts of how best to present educational materials to students have gone through dramatic changes over the past ten years. Unfortunately, modern computer-based technology has not yet had a salutary effect on the problems associated with teaching mathematics, science or engineering to the visually impaired and blind in our society. Indeed, it seems clear that what has become an empowering technology for most people has become a disabling technology for the visually impaired [8]. In recent years, however, the situation has been changing. Through governmental programs in the U.S., Europe and Asia, a fairly large volume of research is being carried out to support blind and visually impaired computer users and students. Our efforts, which are reported in this paper, have been supported by the U.S. National Science Foundation¹, along with help from the Phillips Petroleum Company and Apple Computer, Inc. The second project reported in this paper, on presenting complex Web objects to visually impaired students, is currently pending support from both the National Science Foundation and Microsoft Corporation.

1. The work on presenting mathematics to the visually impaired is supported by NSF Grant Number 9800209.

The Problem Domains & Solutions

Reading & Writing Mathematics

The study of mathematics, science and engineering requires the ability to read and write complex equations. While having a computer read textual material aloud is a reasonable aid to the blind student, the same cannot be said for reading equations. The complexity of equations is evident in two main areas. First, equations are not one-dimensional in nature, as text is; second, the spatial nature of equations makes meaningful reading of them extremely difficult. Unfortunately, as the vast majority of science and math teachers are sighted and do not know Braille, we must build a system that will allow student and teacher to interact in a meaningful manner. Thus, a comprehensive solution to the problem should:
• make it less difficult for blind students to “read” mathematics
• make it less difficult for blind students to “write” mathematics
• make it less difficult for blind students to “read” and edit the mathematics they have written
• make it less difficult for sighted teachers to read what a blind student has written
• make it less difficult for sighted teachers to write mathematics so that it can be “read” by a blind student

Given that blind students and sighted instructors “read” and “write” mathematics in different languages, the above goal is difficult to achieve unless some innovative approach is used. Blind students are typically trained to “read” and “write” mathematics in Nemeth Code, while most sighted mathematics instructors write mathematics using LaTeX as a markup language; the typeset output of LaTeX makes it easy for sighted instructors to read the mathematics written. Making one group learn the language of the other, while feasible, is not completely fair. We could make the blind student write mathematics using LaTeX as a markup language, but then they would have to learn a new markup language, and reading their own writing would become more difficult because LaTeX was designed for sighted people. Likewise, making sighted instructors learn Nemeth Code would be an undue hardship on them. An obvious solution is to write language translators that translate between Nemeth Code and LaTeX mathematics in both directions. In our work, we develop a formal approach for this language translation. In fact, the same approach can also be used for translating Nemeth Code (or even LaTeX mathematics) to

07803-5643-8/99/$10.00 c 1999IEEE November 10 - 13, 1999 San Juan, Puerto Rico 29th ASEE/IEEE Frontiers in Education Conference 12c4-5


any other language, even non-traditional ones. For example, Nemeth Code can be translated to audio signals using the same formal approach, in order to communicate the structure of a mathematical expression to a blind person.

The model we envisage is as follows: a sighted professor prepares a homework assignment using the LaTeX markup language (which is the case at NMSU and many other universities). The LaTeX homework is translated into Nemeth Code automatically, using a language filter. The student reads the homework in Nemeth Code and writes his/her answer in Nemeth Code as well. Since a majority of professors are not Braille literate, the Nemeth Code answer should be automatically translated back to LaTeX, which can then be “typeset” and read by the professor. At present, students have to present their assignments either in handwritten print or in some computer code agreed upon by student and professor. The print is not accessible to the student, and the computer codes are at best inconvenient for the professor. In addition, it would be helpful if there were some way for blind students to write mathematics so that, while it is being written, it is completely accessible to them for reading and editing.

A simple example of the problem can be seen in the following LaTeX representation of the equation shown in Figure 1.

$\sigma=\sqrt{\frac{\sum\limits_{i=1}^{n}(x_i-\bar{x})^2}{n-1}}$
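The idea that one structured form of an expression can feed several output languages can be illustrated with a minimal Python sketch. It is not the Prolog-based system described in this paper: the tuple-based tree, the two template tables and the spoken wording are all assumptions made for illustration, the tree is built by hand rather than parsed, and no Nemeth Code is produced.

```python
# Illustrative sketch only: the node tags, template tables and spoken
# phrasing below are hypothetical, not taken from the system in the paper.
# A node is either a leaf string or a tuple (tag, child, child, ...).

LATEX = {
    "op": "{1}{0}{2}",                  # ("op", symbol, left, right)
    "sub": "{0}_{1}",
    "pow": "{0}^{1}",
    "bar": r"\bar{{{0}}}",
    "group": "({0})",
    "frac": r"\frac{{{0}}}{{{1}}}",
    "sqrt": r"\sqrt{{{0}}}",
    "sum": r"\sum\limits_{{{0}={1}}}^{{{2}}}{3}",   # (var, lower, upper, body)
}

SPEECH = {
    "op": "{1} {0} {2}",
    "sub": "{0} sub {1}",
    "pow": "{0} to the power {1}",
    "bar": "{0} bar",
    "group": "open paren {0} close paren",
    "frac": "fraction, numerator {0}, denominator {1}, end fraction",
    "sqrt": "square root of {0}, end root",
    "sum": "sum from {0} equals {1} to {2} of {3}",
}

# How leaf symbols are pronounced; leaves not listed are passed through.
SPOKEN_WORDS = {r"\sigma": "sigma", "=": "equals", "-": "minus"}


def emit(node, templates, words=None):
    """Translate a tree into one target language by filling in templates."""
    words = words or {}
    if isinstance(node, str):
        return words.get(node, node)
    tag, *children = node
    parts = [emit(child, templates, words) for child in children]
    return templates[tag].format(*parts)


# The sample equation of Figure 1, written out as a tree by hand.
EQUATION = (
    "op", "=", r"\sigma",
    ("sqrt",
     ("frac",
      ("sum", "i", "1", "n",
       ("pow", ("group", ("op", "-", ("sub", "x", "i"), ("bar", "x"))), "2")),
      ("op", "-", "n", "1"))),
)

latex_text = emit(EQUATION, LATEX)
spoken_text = emit(EQUATION, SPEECH, SPOKEN_WORDS)
print(latex_text)
print(spoken_text)
```

Adding a further target language in this sketch means adding one more template table; the tree itself is untouched, which is the property the translation approach relies on.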

[Figure: typeset form of the equation, σ = √( Σ_{i=1}^{n} (x_i − x̄)² / (n − 1) )]

Figure 1. The visual form of the LaTeX representation

The problem of creating Nemeth Code output from LaTeX-encoded mathematics homework, or of creating LaTeX output from a Nemeth Code assignment, or of providing audio feedback from a Nemeth Code document can all be viewed as problems of language translation. Language translation amounts to the development of a filter that translates one language to another. Consider the translation of sentences of a formal language L1 to another formal language L2. The development of a filter involves first parsing the sentences of L1 according to the syntax rules of L1, generating a parse tree during the parsing phase. These parse trees must then be processed and translated into sentences of L2. Using traditional compilation technology, both steps become quite complex. In the traditional approach, a parser generator such as YACC [1] is used to generate a parser program, which in turn parses the sentences of L1 and generates parse trees. The parser then has to be modified, and attribute grammars [1] used, to accomplish the semantic translation of the parse trees into sentences of L2. Both steps are quite involved. In addition, the types of languages that can be handled are limited: L1 must be a context-free language [1] for YACC to work (for example, LaTeX is a context-free language, but Nemeth Code is not). In contrast, we propose a declarative approach [6] based on logic programming [20] and programming language semantics [19] that makes the task of language translation considerably easier.

Once we have generated the required parse trees, we must have a method of presenting the information to the student in a meaningful way. Our experience with the use of encoded musical tones has convinced us that this method of higher-bandwidth audible data transfer should carry over to the task of describing complex equations [7-16, 4]. Other research has been carried out in this domain [20, 18]. In our current work we present logical “chunks” of the equation to the student and also allow navigation within each chunk. The “chunks” express overall structure as well as detail at each level. These “chunks” are essentially phrases in the parse tree that is easily produced by the Prolog-based equation parser described above. Coupled with synthesized voice, our equation browser will give the overall gestalt and lower-level detail using parallelized sound. The equation space is divided into regions by an equation parser (see Figure 2).

[Figure: the sample equation decomposed into nested regions: the fraction under the radical, its numerator and its denominator (n − 1), the summation with its limits (i = 1, n), and the summed term (x_i − x̄)²]

Figure 2. A Possible Decomposition of the Sample Equation

The user is then presented with a tonal and verbal description of the equation and its general complexity. Through the use of timbre, note and octave, the user is given an overall “glance” at the equation without much of the detail. Next, the equation is automatically decomposed into meaningful “chunks,” and these units are then presented to the user in a tonal and/or verbal format, as seen in Figure 3. From the highest level (left-hand side and right-hand side) the system cycles through the components with built-in delays between components. During any of these delays the user is free to click the mouse once or twice (or not at all). A single click further dissects the current chunk into its component “sub-chunks” and starts the descriptive cycle again.
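The scanning cycle just described can be sketched in a few lines of Python. This is a hedged illustration, not the browser itself: the `Chunk` class, the `scan` function and the `clicked` callback are hypothetical names, the nesting of chunks is our reading of Figure 3, sound output is replaced by appending text to a log, and the sketch makes a single pass over each level instead of cycling with real delays.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Chunk:
    """One region of the equation with its tonal signature (hypothetical type)."""
    name: str
    instrument: str
    note: str
    octave: int
    children: List["Chunk"] = field(default_factory=list)


def scan(chunks: List[Chunk], clicked: Callable[[Chunk], bool], log: List[str]) -> None:
    """Present each chunk in turn; a single click during the pause that
    follows a chunk dissects it and restarts the cycle on its sub-chunks."""
    for chunk in chunks:
        # Stand-in for playing the tone and speaking the chunk name.
        log.append(f"{chunk.name}: {chunk.instrument} {chunk.note}{chunk.octave}")
        if clicked(chunk) and chunk.children:
            scan(chunk.children, clicked, log)
            return  # the cycle continues at the lower level only


# Instrument, note and octave values are those legible in Figure 3;
# the parent/child arrangement is an assumption.
top_level = [
    Chunk("Left Hand Side", "Trumpet", "c", 3),
    Chunk("Right Hand Side", "Trumpet", "d", 3, [
        Chunk("Radical", "Piano", "c", 3, [
            Chunk("Inside Upper", "Harp", "c", 3),
            Chunk("Inside Lower", "Harp", "d", 3),
        ]),
    ]),
]

# A user who clicks after "Right Hand Side" and again after "Radical".
drill_down: List[str] = []
scan(top_level, lambda c: c.name in {"Right Hand Side", "Radical"}, drill_down)

# A user who never clicks hears only the top level.
glance: List[str] = []
scan(top_level, lambda c: False, glance)
```

In this sketch the first user hears five descriptions, ending with the two halves of the fraction, while the second hears only the left-hand and right-hand sides.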

Chunk             Instrument   Note   Octave
Left Hand Side    Trumpet      c      3
Right Hand Side   Trumpet      d      3
Radical           Piano        c      3
Inside Upper      Harp         c      3
Inside Lower      Harp         d      3

[Figure: the remaining panels (Summation from i equal 1 to n, Limit, Summation, Summation Symbol, Terms, Initial Value) all use the harp, with notes c through g in octaves 3 and 4; the diagram also carries the labels Single Click and Double Click.]

Figure 3. An Equation Scan

This basic audio browser technology is used in the two main phases of our mathematics learning system. First, it is an aid to the visually impaired student in the reading and understanding of complex, multi-dimensional equations. This multi-dimensional aspect of equations is difficult, if not impossible, to express in Nemeth Code representations or standard electronic reading of the equations. The second advantage offered by the browser is the interactive help it gives the student during the process of writing complex equations. With it, the student is able to interact constantly with his/her work while it is in progress. When satisfied with the form and content of the equation, the student simply presses a function key to have the equation printed in a format that can be read by the instructor. In addition to the audio browsing, our system displays the equation on the screen in very large type and highlights, in color, the part of the equation currently being browsed. This feature was added for two reasons. First, sighted instructors in a laboratory setting will be able to work directly with students. Second, with the large type and color highlighting, the system will be of value to low-vision students as well.

Web Based Educational Tools

In a very short time the World-Wide Web (W3) has become very important for our society, industry, and educational infrastructure. While the W3 has rapidly become an enabling technology for most people, the visual nature of Web browsers makes the Web inaccessible to the visually impaired. A number of screen readers (e.g., Window-Eyes and JAWS) have been adapted to work with Web browsers. However, they do not make the W3 fully accessible to the visually impaired [2], primarily because they are mere adaptations of pre-existing tools for reading out text on the screen. They do not adequately take into account the structure within a Web page (e.g., tabular information and frame-based pages). Our novel, semantics-based techniques take the structure within a Web page into account and are designed in particular to make frame-based pages, as well as tables found in Web pages, highly accessible to blind individuals; most Web-enabled screen readers fail to convey the spatial structure of frames and tables in a satisfactory way. Our techniques are based on parsing and analyzing the HTML structures that constitute a page (containing tables and frames) to produce semantic tree-based structural representations and meaningful recompositions (see Figure 4). The tools that we are developing will then navigate this structural representation to produce appropriate output, such as sound and speech. Particular attention will be paid to representing tables, since tabular information is the hardest to convey to a blind person: the structural information implicit in a table is lost if the table is read out linearly, as is done by most current screen readers. To allow navigation of tables, we need to solve two interdependent problems.
First, we need a data representation that captures the structure of the table and of its components (headers, footers, etc.). Second, we need a representation of the search strategies to be used for navigating the table. The visual structuring of the table implicitly suggests the directions in which navigation should take place. For example, row and column headers provide indexing mechanisms to access the table, and colors highlight parts of the table to be looked at first. In this project we develop a uniform representation scheme that captures both of these aspects. The structural representation relies on a tree-based hierarchical encoding of tables (a hierarchy tree). Each table is encoded as a tree containing three sub-trees:
1. descriptive table information (e.g., title, number of dimensions),
2. an indexable description of the table (an n-dimensional structure storing the table’s cells), and
3. navigation information: multiple hierarchical views of the table represented as separate trees, each describing a different way to access the table. The different levels in each hierarchy identify progressively more precise areas of the table.
Consider for example the table and hierarchy tree in Figure 5. The Navigation sub-tree suggests that the focus should first be on the average column, then on the entries in the today row, and then on the rest of the table. Multiple views of the table are possible, and each will be represented as a separate Navigation sub-tree.

Figure 4. A Possible Recomposition of a Table

[Figure: a table of numeric readings with an average column and a today row, shown beside its hierarchy tree; the Navigation sub-tree lists Focus 1 with the nodes Average, Today and Rest.]

Figure 5. A Table and its Structural Representation

Although HTML 4.0 has introduced a number of features that allow the creation of highly structured tables, HTML is still too rigid and insufficiently structured. The tree representation devised here is meant to provide a more flexible encoding of the structural and semantic information in the table. The tree representation will be explicitly created and kept associated with the document (as part of it or as an out-of-document link). The structural representation will be encoded as RDF meta-data [3].

The first issue in the design of the structural representation of the table is the identification of the descriptive information. We can distinguish different types of descriptive information. Global and local information provides a textual description of the table and of each cell (e.g., using the summary and title attributes). Indexing information denotes rows/columns of the table as headers (using the THEAD and TH HTML elements), thus providing a description of the dimensions of the table. Generally, it is possible to define any cell as a header and place it anywhere inside the table. The effect of the header element can be controlled using the scope and headers attributes. The large majority of tables (e.g., HTML 3.2 compliant ones) are regular, i.e., they explicitly identify header rows/columns, and they do not use irregular HTML constructions (e.g., scope, headers). The explicit presence of the header information allows one to easily construct the indexing component of the structural representation. In the absence of additional information, these headers will provide indexing/navigation into the table. The positioning of the header rows/columns in the table may also have a semantic meaning and will be used to partition the table into sub-tables. In the absence of additional semantic information on the content of the table, a generic search strategy will be provided to the user. The strategy relies on using the header information as indices into the table, i.e., selecting dimensions and cells by scanning the header rows/columns.

In addition to the information available from the actual HTML, we are building tools to allow an instructor to annotate existing tables in copies of web pages on an interactive basis. In this way, the instructor will be able to augment the semantic data needed by the table navigator/browser. One of our navigation tools works as shown in Figure 6. First, the table header is scanned horizontally, reading each header to the user and pausing as in the equation browser. If the user wants to select a header, he/she simply clicks the mouse after the name is read. If there are sub-headers associated with the selected header, the selection process is repeated. Once the header and sub-headers have been chosen, a scan of the selected column begins and works in a vertical fashion. After reading each column element, the user can click the mouse to select the row.
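The hierarchy tree and the header-driven scan can be made concrete with a small Python sketch. Everything in it is an assumption made for illustration: the dictionary layout, the key names, the functions `visit_order` and `scan_select`, and the table values, which are invented and are not the data of Figure 5. Sub-headers are not modeled, and mouse clicks are replaced by callback functions.

```python
# Hypothetical encoding of a table as three sub-trees; all values are made up.
table = {
    # 1. descriptive information
    "description": {"title": "Sample readings", "dimensions": 2},
    # 2. indexable description: headers plus cells addressed by (row, column)
    "index": {
        "rows": ["Yesterday", "Today"],
        "columns": ["Morning", "Evening", "Average"],
        "cells": {
            ("Yesterday", "Morning"): 12.0, ("Yesterday", "Evening"): 14.0,
            ("Yesterday", "Average"): 13.0,
            ("Today", "Morning"): 16.0, ("Today", "Evening"): 18.0,
            ("Today", "Average"): 17.0,
        },
    },
    # 3. navigation information: one view, listing areas in order of focus
    "navigation": [("column", "Average"), ("row", "Today"), ("rest", None)],
}


def visit_order(tbl):
    """Return cell keys in the order suggested by the navigation view."""
    idx = tbl["index"]
    all_keys = [(r, c) for r in idx["rows"] for c in idx["columns"]]
    seen, order = set(), []
    for kind, name in tbl["navigation"]:
        if kind == "column":
            keys = [k for k in all_keys if k[1] == name]
        elif kind == "row":
            keys = [k for k in all_keys if k[0] == name]
        else:  # "rest": whatever has not been visited yet
            keys = all_keys
        for key in keys:
            if key not in seen:
                seen.add(key)
                order.append(key)
    return order


def scan_select(tbl, wants_column, wants_row):
    """Generic strategy: scan the column headers, then the chosen column."""
    idx = tbl["index"]
    spoken = []
    for col in idx["columns"]:            # horizontal scan of the header
        spoken.append(col)
        if wants_column(col):             # stands in for a mouse click
            for row in idx["rows"]:       # vertical scan of that column
                value = idx["cells"][(row, col)]
                spoken.append(f"{row}: {value}")
                if wants_row(row):
                    return (row, col, value), spoken
            break
    return None, spoken


order = visit_order(table)
selection, spoken = scan_select(
    table, lambda c: c == "Evening", lambda r: r == "Today")
```

Keeping the navigation view as data separate from the cells means that a second view of the same table is just another list, which mirrors the idea of one Navigation sub-tree per view.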
At any point in the scanning process, the user has a variety of functions available, as shown in Table 1. The function keys allow the user to obtain a variety of information not available to the regular user.

The eventual goal of our work is to produce a “plug-in” that will work with commercially available web browsers. The only serious concern is in the world of the PC (and the huge variety of clones available). There is a stunning lack of standardization from manufacturer to manufacturer in their choice of hardware components. Even if, in the best case, we could assume that every machine had some sort of sound card, it is unlikely that many would be supported in a consistent manner. The Macintosh line of computers does have a functionally equivalent digital signal processing (DSP) chip with software support.

The system consists of two subsystems. The first subsystem is in charge of synthesizing structural descriptions, combining them with semantic descriptions, and providing navigation facilities. This component will be integrated into Netscape and Internet Explorer. The second subsystem is a graphical tool that allows one to edit or annotate a page and synthesize the structural and navigation information for tables and frames. Building and processing structural descriptions relies on semantics-based approaches and on modern parsing, interpretation and compilation technology. Our approach for developing the browser is the following: we first develop an analyzer for HTML that captures implicit structural information. The analysis process relies on the use of a logic-based Web framework [17] and logical denotations [5], a powerful framework for program analysis and verification. The output of the analyzer will be interpreted appropriately. The process of interpretation can be seen as an exercise in semantics, since it involves assigning meaning to HTML structures (e.g., audio output instead of a graphical representation). Our semantics-based approach allows us to handle every aspect of accessibility within a single framework.

Figure 6. Browsing a Table Using Voice Output

Table 1: Table Browser Function Keys

Function Key   Function
F1             Start/Stop Scan Columns
F2             Start/Stop Scan Rows
F3             Max (Value (a), Index (b))
F4             Min (Value, Index)
F5             Median (Value, Index)
F6             Mean
F7             Total
F8             Std Deviation
F9             Find
F10            Where Am I (c)
F11            Invoke Help System (d)
F12            Leave Table Browser

a. The selected value will be spoken to the user.
b. The index will be the titles of the column/row of the selected item and will be spoken.
c. The titles of the row/column will be spoken.
d. An interactive verbal help system will be associated with all browsers.
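The summary functions bound to F3 through F8 can be sketched with Python's standard `statistics` module. This is an illustration under stated assumptions, not the browser's implementation: the function name `describe_column`, the result keys and the sample data are hypothetical; the sample (n − 1) form of the standard deviation is assumed because Table 1 does not say which form is used; and the median is taken as the lower middle element so that it always has a row index to report.

```python
import statistics


def describe_column(column_title, values):
    """Summarize one selected column.

    `values` maps each row title to a number. Where Table 1 lists
    (Value, Index), the index is returned as (column title, row title).
    """
    numbers = list(values.values())
    max_row = max(values, key=values.get)
    min_row = min(values, key=values.get)
    # Assumption: median_low picks an actual cell, so an index exists.
    median_value = statistics.median_low(numbers)
    median_row = next(r for r, v in values.items() if v == median_value)
    return {
        "F3 max": (values[max_row], (column_title, max_row)),
        "F4 min": (values[min_row], (column_title, min_row)),
        "F5 median": (median_value, (column_title, median_row)),
        "F6 mean": statistics.mean(numbers),
        "F7 total": sum(numbers),
        # Assumption: sample standard deviation (divides by n - 1).
        "F8 std deviation": statistics.stdev(numbers),
    }


# Made-up column used only to exercise the function.
summary = describe_column("Evening", {"Mon": 2.0, "Tue": 4.0, "Wed": 9.0})
for key, result in summary.items():
    print(key, result)
```

Each result would be spoken to the user in place of being printed; returning the row and column titles alongside the value follows footnotes a and b of Table 1.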

Conclusions

Through the use of advanced programming languages and techniques, coupled with the development of new ways to make the GUI accessible, our project should make several new educational tools available to the blind community. Subjects such as mathematics, which have traditionally been problematic for the visually impaired student, should become more attainable through the use of the tools being developed in our project. Finally, through the development of our web-based tools, the web should become a valuable delivery medium for a wider variety of students. If quality education is a critical issue in our society, it seems clear that it should be available to all. Our research is aimed at expanding the availability of technical education to an audience that is currently under-served.

References

1) Aho, A., Ullman, J.D. and Sethi, R., Principles of Compiler Construction, Addison Wesley, 1986.
2) Brashers, C. & Bargi-Rangin, A. (1998) Personal communication with blind users.
3) Brickley, D., Guha, R.V., & Layman, A. (1998) Resource Description Framework Schema Specification. Technical Report, W3C Consortium.
4) Edwards, A. D. N., “The Design of Auditory Interfaces for Visually Disabled Users,” Human Factors in Computing Systems (Proceedings of CHI 88), ACM SIGCHI, 89-94.
5) Gupta, G. (1999) Horn Logic Denotations and Applications. In Current Trends and Future Directions in Constraint Logic Programming. Springer Verlag.
6) Gupta, G., “Logical Denotations: How to Build a Provably Correct Compiler,” Technical Report, Department of Computer Science, New Mexico State University, 1997.
7) Karshmer, A.I., Gupta, G., Geiger, S., & Weaver, C. (1999) Reading and Writing Mathematics: the MAVIS Project, Behaviour & Information Technology, January, 1999.
8) Karshmer, A.I., “Interdisciplinary Efforts to Facilitate the Production of Tools Support the Disabled and Elderly in the Information Society,” Invited Presentation, Human Computer Interaction International Conference, San Francisco, September, 1997.
9) Karshmer, A.I. and Wood, D., “Integrating the Visually Impaired Computer User Into Information-Based Workplace,” The Finnish National Conference on People in the Workplace, Helsinki, August, 1996.
10) Karshmer, A.I., “Navigating the Graphical User Interface: A Case for Interdisciplinary Research to Support People with Special Needs,” Proceedings of the ICCHP Meeting, Vienna, Austria, July, 1996.
11) Karshmer, A.I., Brawner, P. and Reiswig, G., “An Initial Evaluation of a Sound-Based Hierarchical Menu Navigation System for Visually Handicapped Use of Graphical User Interfaces,” to appear in a Springer-Verlag Series on Human Computer Interfaces, 1994.
12) Karshmer, A.I. and Oliver, R.L., “Special Computer Interfaces for the Visually Handicapped: FOB the Manufacturer,” Proceedings of EWHCI ’93, Moscow, Russia, August, 1993.
13) Karshmer, A.I., Hartley, R.T., Paap, K., Alt, K. & Oliver, R.L., “Using Sound and Sound Spaces to Adapt Graphical Interfaces for Use by the Visually Handicapped,” Proceedings of the 3rd International Conference on Computers and Handicapped Persons, Vienna, July, 1992.
14) Karshmer, A.I., Hartley, R.T. and Paap, K., “SoundStation II: Using Sound & Sound Spaces to Provide High Bandwidth Computer Interfaces to the Visually Handicapped,” SIGCAPH Newsletter, ACM Press, January, 1992.
15) Karshmer, A.I., Davis, R.D. and Myler, H., “The Architecture of An Inexpensive and Portable Talking-Tactile Terminal to Aid the Visually Handicapped,” Computer Standards and Interfaces, Vol. No. 5, 1987, North Holland Publishing, pp. 135-151.
16) Karshmer, A.I., Davis, R.D. and Myler, H., “An Inexpensive Talking Tactile Terminal for the Visually Handicapped,” The Journal of Medical Systems, Vol. 10, No. 3, 1986.
17) Pontelli, E. & Gupta, G. (1997) W-ACE: a constraint-based framework for Internet programming. In Int. Conf. on Tools with Artificial Intelligence. IEEE.
18) Raman, T.V., “Emacspeak: A Speech-enabling Interface,” Dr. Dobb's Journal, 1997.
19) Schmidt, D., “Denotational Semantics: A Methodology for Language Development,” Allyn and Bacon, 1986.
20) Sterling, L. and Shapiro, E., “The Art of Prolog,” MIT Press, Cambridge, 1994.
21) Stevens, R., “Using Sound to Understand Complex Mathematical Equations,” ICCHP, Vienna, 1996.