Exploring Learning and Training Abilities Using Assistive Technologies

Advances in Cognitive Ergonomics, edited by D. Kaber and G. Boy. CRC Press, 2010, pages 754–763. Print ISBN: 978-1-4398-3491-6. eBook ISBN: 978-1-4398-3492-3. DOI: 10.1201/EBK1439834916-c75

CHAPTER 75

Exploring Learning and Training Abilities Using Assistive Technologies

Guy A. Boy, David Lecoutre, Laurent Mouluquet, Anil Raj
Florida Institute for Human and Machine Cognition
40 South Alcaniz Street, Pensacola, Florida 32502, U.S.A.

ABSTRACT

This paper presents the development of a training protocol for learning to read text with the BP-WAVE-II. The various learning tasks tested in the development of this paradigm could be applied to other situations in order to train BP-WAVE-II users to adapt to operations in unknown environments. Other types of assistive technologies increase situation awareness for the blind, such as the white cane and the guide dog. Unlike the BP-WAVE-II, such alternative approaches are costly and require substantial environmental modifications: (1) Braille representations; (2) street traffic lights equipped with speakers (audio CFS) that tell blind people when to cross the street; and (3) pedestrian crosswalks and sidewalk transitions equipped with grooves or bumps that inform the blind about changes in the nature of the ground (tactile CFS).

Keywords: Assistive technologies, impaired people, training protocol, reading.

INTRODUCTION

Individuals such as wounded military service members or civilians who recently became visually impaired require special attention with respect to their lost abilities, such as vision or audition. Until their injury or illness, they possessed well-developed visual processing capabilities that enabled them to recognize previously learned external objects rapidly. The sensory substitution approach can recover this ability to pattern-match information using these previously stored visual memories
by using data from sensors other than their eyes. We could also legitimately think that such users create new sets of patterns and tactile “image” memories. The more general question relates to the brain’s mechanism of re-adaptation to an alternate sensory input. Bach-y-Rita and his colleagues (1998) showed that sensory substitution systems can deliver visual information from a digitized video stream to an array of stimulators in contact with the skin of one or several parts of the body, the tongue in particular. The tongue is very sensitive and enables good electrical contact (Kaczmarek, 2005; Ptito et al., 2005). Bach-y-Rita notes that “We do not see with the eyes; the optical image does not go beyond the retina where it is turned into space-time nerve patterns (of impulses) along the optic nerve fibers. The brain then recreates the images from analysis of the impulse patterns.” (Bach-y-Rita et al., 1998). While assistive technologies have been developed for sensory deficits (Capelle et al., 1998; Lenay et al., 2003), very few data are available on optimal learning and training with these technologies. Indeed, when a natural sensory input such as the eyes is replaced by an artificial one such as the combination of electro-tactile stimulation and human skin, the brain needs to learn how to interpret this new sensory input. Our main objective was to develop a principled approach to training for assistive technologies.

WHAT IS THE PROBLEM?

The multiple channels that carry sensory information to the brain, from the eyes, ears and skin, for instance, are set up in a similar manner and perform similar activities to enable understanding of the local environment. Sensory information sent to the brain is carried by nerve fibers in the form of patterns of impulses, and the impulses route to the different sensory centers of the brain for interpretation. Substituting one sensory channel for another can result in correct re-encoding of the information by the brain. The brain appears to be flexible in interpreting sensory signals: it can be trained to correctly interpret signals sensed through the tactile channel, instead of the visual channel, and process them appropriately (Kaczmarek et al., 1995). It just takes training and accurate, timely feedback from the sensory channel in response to user inputs (Bach-y-Rita & Kercel, 2002).

Using the BP-WAVE-II (BrainPort® Wearable Aid for Vision Enhancement), there are three sensory data-processing layers between the external environment and the brain. First, the camera provides digitized visual information. The second layer is the physical array itself, which provides electrical stimuli to the tongue. The third layer is the tongue itself, equipped with nerve fibers, which transfers the stimuli to the tactile-sensory area of the cerebral cortex, the parietal lobe. The parietal lobe usually receives tactile information, the temporal lobe receives auditory information, the occipital lobe receives visual information and the cerebellum receives balance information. The frontal lobe is
responsible for many higher brain functions. Obviously, the brain needs training before signals normally processed in the occipital lobe can be handled through the parietal lobe, in order to interpret artificially provided tactile representations of visual information.

There are many ways to approach augmented perception and cognition via assistive technologies. We chose to represent human and machine cognition using the cognitive function paradigm (Boy, 1998). A cognitive function is represented by three attributes: a role, a context of validity and a set of useful resources. We consider five types of human cognitive functions related to vision: frame catching, pathway of data, localization, data processing and recognition. Frame catching involves the pupil, cornea, iris and lens, as well as the cone and rod photoreceptor cells of the retina. These components have their own cognitive functions, which may be impaired, leading to total or partial blindness. There are also cognitive functions on the machine side (the assistive technology), i.e., the BP-WAVE-II. In the camera layer, there are several types of cognitive functions, such as the ability to manipulate contrast, light intensity, zoom and so on. In the array layer, two types of cognitive functions can be modeled: image resolution, related to the number of points and the distance between points, and the affordance of the tongue to accept a large number of points. Finally, there are other cognitive functions, related to supervision, that need to be provided to synchronize and support the use of the BP-WAVE-II, e.g., verbal indications or instructions in real time, various kinds of training, and so on. This can be represented by the triplet Human-Supervision-Machine (HSM), where supervision bridges the gap between human and machine that an impaired user needs to fill in order to reconstruct some kind of natural perception.

Consequently, the experimental problem is as follows: would it be possible to better understand the way HSM cognitive functions work together, to ensure better human-machine adaptation while minimizing supervision? In order to specify this problem statement in more concrete detail, we focused on letter size, complexity, color and contrast in order to determine the optimal stimulation on the tongue. In addition, light intensity, external contrasts, familiarity of targeted objects, static objects vs. motion, and complexity of the scene were considered.
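To make this paradigm concrete, here is one possible encoding of the role/context/resources triplet and of the HSM split, sketched in Python. The class, field names and example entries are illustrative assumptions, not constructs taken from the original study.

```python
from dataclasses import dataclass, field

# One possible encoding of the cognitive function triplet (Boy, 1998).
# Field names and example entries below are illustrative assumptions.

@dataclass
class CognitiveFunction:
    role: str                  # what the function does
    context: str               # conditions under which it is valid
    resources: list = field(default_factory=list)  # what it relies on

# Human (CFH), supervision (CFS) and machine (CFM) sides of the HSM triplet.
hsm = {
    "human": [
        CognitiveFunction(
            role="letter recognition",
            context="tactile pattern on the tongue, after training",
            resources=["visual memory of letter shapes", "attention"],
        ),
    ],
    "supervision": [
        CognitiveFunction(
            role="real-time verbal orientation",
            context="novice user loses the area of interest",
            resources=["trainer", "shared view of the camera image"],
        ),
    ],
    "machine": [
        CognitiveFunction(
            role="contrast and zoom manipulation",
            context="camera layer, adequate lighting",
            resources=["camera controls", "digital zoom"],
        ),
    ],
}
```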

METHOD

The first requirement was to develop scenarios appropriate for representation using the BP-WAVE-II, i.e., we had to define the distance between the camera and the object, the size and thickness of items presented to the user, and brightness, glint, and contrast. We simplified the environment to a 22-inch computer screen with white letters on a black background, using a distance of 80 cm between the head-mounted video camera and the display, to promote efficiency and rapid success. Initial interaction revealed that all-capitals Arial regular text at a size of 400 points provides good signal quality on the tongue when using the BP-WAVE-II.

Initial familiarization with the BP-WAVE-II is required to learn to operate the system independently of its use as a vision substitution interface. Users operate the
controls while presented with simple shapes before other, more complex shapes can be presented. In addition, it is easier to feel a moving object, e.g., a line being drawn, than a static one. Once familiar with moving objects, a user can learn simple static shapes such as circles, triangles and stars, which can be presented before differentiating multiple objects. This step is crucial to increase the user’s awareness of spaces between items, i.e., the way users feel and understand how to avoid interference between letters. For example, interference may occur when a user sees half of a letter on the left part of the display and half of the next letter on the right part of the display. This can result in a cognitively demanding, disturbing situation for a novice.

Initially, we envisioned training with 36 alphanumeric characters, but because numbers can be more difficult to perceive than letters, we focused on designing the training paradigm using only capital letters, which were categorized into four groups after several iterative trials: easy (C, I, J, L, O, U, T); medium (D, E, F, H, P, V, Y); hard (A, B, M, N, K, X, Z); very hard (G, Q, R, S, W). These results can be interpreted as follows. First, there are simple lines, such as vertical and horizontal lines, and complex lines, such as diagonal and intersecting lines. Second, complexity comes from the number of lines in a letter. Third, circles are easy to recognize, but when augmented with other shapes they can become extremely difficult to identify. Fourth, and more generally, letter-reading complexity comes from the combination of various shapes. In addition, an analysis of the frequency of each letter in English shows that the easy and medium categories contain the most common letters (a rough check appears at the end of this section). This is encouraging for BP-WAVE-II use.

A software presentation control program was written that presents a letter on the computer monitor. If the participant identifies it correctly, another letter is presented; otherwise he or she is informed of the error and prompted to try again. After a second letter-recognition failure, the system informs the participant which letter was presented. For each trial, the user’s reaction time can be measured, and subjective assessments of the perceived difficulty can be recorded. Users are first presented easy letters; after 80% reading success, they proceed to the next category of letters. In case of failure in a letter category, the whole category is presented again. This process is used for the first two categories of letters, i.e., easy and medium. The series of letters presented during the test includes redundancies in order to prevent participants from predicting the possible next letter, i.e., there are more items to guess than letters to be learned. For example, in easy trials where users learn 7 letters, a typical quiz is the following sequence of 10 items: “C, I, J, I, C, L, O, T, U, T”. So even if the letter “C” has been identified, it can be presented again for recognition. For the hard and very hard categories of letters, the process consists of a series of quizzes, e.g., the three letters DPF were presented on the same screen and the participants asked to find the letter F. A quiz may also ask for a letter absent from the series, e.g., DPF is presented and the user is asked to find the letter E. For the very hard letters, a quiz consisted, for example, of presenting the series SZXSK and asking the user to find the letter S.
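A minimal sketch of this presentation-control logic, in Python: single-letter trials with redundancy, one retry per item, per-trial reaction-time measurement, and the 80% promotion criterion. The console interface, function names and sequence length are our assumptions; the paper does not describe the original program’s implementation.

```python
import random
import time

# Letter groups from the iterative trials described above.
CATEGORIES = {
    "easy":      list("CIJLOUT"),
    "medium":    list("DEFHPVY"),
    "hard":      list("ABMNKXZ"),
    "very hard": list("GQRSW"),
}

def run_single_letter_trials(letters, n_items=10, threshold=0.8):
    """Single-letter trials with redundancy and one retry per item.

    In the real setup the target letter is rendered full screen
    (all-capitals Arial, 400 pt, white on black) for the head-mounted
    camera; here the response loop is simulated on the console.
    Returns True when the 80% promotion criterion is met.
    """
    sequence = [random.choice(letters) for _ in range(n_items)]  # redundant items
    correct = 0
    for target in sequence:
        start = time.monotonic()
        answer = input("Which letter do you feel? ").strip().upper()
        if answer != target:
            print("Wrong, please try again.")
            answer = input("Which letter do you feel? ").strip().upper()
        reaction = time.monotonic() - start  # per-trial reaction time
        if answer == target:
            correct += 1
        else:
            # Second failure: reveal the presented letter.
            print(f"The letter was {target}.")
        print(f"(reaction time: {reaction:.1f} s)")
    return correct / n_items >= threshold

# Promotion logic for the first two categories: a failed category
# is presented again in full.
for name in ("easy", "medium"):
    while not run_single_letter_trials(CATEGORIES[name]):
        print(f"Success below 80%: repeating the {name} category.")
```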
Finally, participants are asked to read a word such as ALBUM, as well as commonly used English words such as AND, THE, OF, TO, IN, IS, WITH and so on. Potential users of this training paradigm must possess sufficient cognitive and communicative capabilities to understand the instructions, control the BP-WAVE-II and respond to the stimuli. Once this training protocol was designed, we elicited human, machine and supervision cognitive functions. Cognitive functions were categorized with respect to the following dimensions: ease of learning; ease of retention; and level of required supervision.
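As a rough check on the letter-frequency claim above, this snippet estimates how much of typical English text the easy and medium groups cover, using approximate letter frequencies from standard English corpora. The exact figures vary by corpus and are our assumption, not data from the paper.

```python
# Approximate relative frequencies (%) of letters in English text.
FREQ = {
    "E": 12.7, "T": 9.1, "A": 8.2, "O": 7.5, "I": 7.0, "N": 6.7,
    "S": 6.3, "H": 6.1, "R": 6.0, "D": 4.3, "L": 4.0, "C": 2.8,
    "U": 2.8, "M": 2.4, "W": 2.4, "F": 2.2, "G": 2.0, "Y": 2.0,
    "P": 1.9, "B": 1.5, "V": 1.0, "K": 0.8, "J": 0.2, "X": 0.2,
    "Q": 0.1, "Z": 0.1,
}

EASY = set("CIJLOUT")
MEDIUM = set("DEFHPVY")

coverage = sum(FREQ[c] for c in EASY | MEDIUM)
print(f"Easy + medium letters cover ~{coverage:.0f}% of English text.")
# -> roughly two thirds, consistent with the claim that the two
#    easiest groups contain the most common letters.
```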

RESULTS

Our research team used a four-trial evaluation procedure to design the protocol, after a familiarization phase (detection of basic movements and shapes before starting to read letters; see Table 1). Each evaluation starts with a training period and ends with a simple test or a quiz. Easy and medium trials are essential for a participant to learn how to recognize a single letter through his or her tongue. Only one item was presented on the screen in order to keep workload as low as possible. Novices with the BP-WAVE-II tend to have difficulties with spatial recognition; they acquire these skills with experience. Hard trials involve reading several letters in order to prepare for reading words. Two trials are presented for identification of a single letter among three letters. Then, three trials are presented for identification of three-letter words. In the very hard trials, two trials are devoted to the identification of a given letter that may or may not be present in a set of five letters, and two trials to the reading of a five-letter word. The participant is awarded one point for each right answer, and the final score is the sum of all right answers. After experiencing all these trials, an individual should be capable of successfully reading most common English words, defined as recognizing a 3-letter word in less than 5 sec. In that case, the user has performed good pattern matching.

Alternate procedures were considered; for example, instead of presenting fixed (already written) letters, a letter could be dynamically drawn. This was motivated by the fact that when a letter is manually traced on someone’s skin, it is usually recognized. This is a question of memory of the letter shape associated with the motor movement. Initial pilot testing using the BP-WAVE-II found that while the letter being drawn was felt, the letter was seldom identified. The use of the camera zoom was also tested, to better understand whether it could improve the identification of letter shapes and increase users’ awareness of zoom capabilities. During letter recognition tasks, zoom-in and zoom-out movements could be appreciated, but not letter shapes. Zoom proved useful for exploration of specific parts of letters; e.g., in order to distinguish the letters O and Q, zoom is useful to identify the small line at the bottom of the circle of the letter Q.

Ease of learning. Learning of letters depends on user familiarity with the system, and on letter complexity (i.e., the easy, medium, hard and very hard categories of letters). Initial pilot testing revealed an average learning time of 45 sec ± 15 sec per letter, which drops to near 20 sec with experience (i.e., at least 5 hours of experience in reading with the BP-WAVE-II). With 45 min of effective training (±
15 min), which is approximately 1 hour and 30 min of overall use, the researchers were able to read a 3-letter word in less than 4 minutes. A first important milestone occurs after approximately 200 min of training time, when a user manages to easily read small words using the BP-WAVE-II before proceeding to longer words. At this point the researchers were able to read a word in less than 15 sec. As a general observation, the more training, the faster the reading rate with the BP-WAVE-II. After 2 to 3 hours of training, interpretation of simple patterns and words presented on the tongue was possible. It must be acknowledged that we still do not know how much training is needed for participants to read fast enough to process an entire sentence at a reasonable pace. A learning plateau occurs when individuals stabilize their performance. We believe that with more experience an expert user will use strategies to read a group of letters instead of one letter at a time. This is a matter of context awareness that should enable readers to recognize words instead of always recomposing them from the recognition of single letters. Recall learning to read at school: in the beginning, it took a long time to read a single word, because you were probably deciphering each letter before recognizing or identifying the word. Normal readers do not even read each letter of a word; they are able to understand and reconstruct a sentence even if some words or letters are missing, misspelled or duplicated. We faced the same problem, and we expect to find the same solutions using the BP-WAVE-II. We believe that such users can learn faster because their brains were already equipped with appropriate cognitive functions that enabled them to read efficiently.

Ease of retention. Retention is the ability to keep facts and figures in memory. We found that not using the BP-WAVE-II for intervals of many days did not affect recognition accuracy or reaction time. As already said, the more training, the faster one can read a word.

Levels of required supervision. Initially, users manifest three kinds of focus problems: targeting the screen, focusing on the letter, and adjusting the appropriate zoom on the letter. This is a general problem of lack of feedback, resolved with time and experience. The required supervision appears to vary among individuals, as some try to “feel borders”, “feel specific parts of a shape”, “change the level of zoom”, and so on. Four types of useful advice can be given to users: orientation, technical settings, feedback, and alternative strategies.

Orientation. Assisting users to appropriately set the zoom and recapture the area of interest (i.e., the center of the screen, or the first letter of a word or series).

Technical settings. Helping users set the intensity of the display and the orientation of the camera.

Feedback. Since there are different ways of recognizing shapes, users should explore all possible feedback methods using the BP-WAVE-II. This includes following the shape, moving the head from left to right and top to bottom, trying to draw the letter mentally, and finding the various curves of each letter. For example, we know that the letter E is made of one vertical line and three horizontal lines. What is important in the pattern matching is that E is the only letter in this class. It is thus necessary for users to create reference points for each letter; these reference points typically anticipate recognition patterns. Supervisory guidance can be provided to those who cannot locate important reference points on their own.

Alternative strategies. There are two methods for reading an item: following the borders of a letter as if it were being written, and placing the global shape of the letter on the tongue. Users use the former at the beginning of training and the latter once they know how to recognize a letter pattern.
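The reference-point idea above can be illustrated with a toy encoding: if each letter is reduced to a few felt features, E is uniquely identified by its class, while other feature sets remain ambiguous and call for the exploration strategies just described. The feature scheme is purely illustrative; the paper only gives the E example.

```python
# Illustrative stroke "reference points" for a few letters, encoded as
# (vertical lines, horizontal lines, diagonal lines, curves). The
# feature choice is our assumption, not a scheme from the paper.
STROKES = {
    "E": (1, 3, 0, 0),   # one vertical, three horizontal lines
    "F": (1, 2, 0, 0),
    "T": (1, 1, 0, 0),
    "L": (1, 1, 0, 0),
    "H": (2, 1, 0, 0),
    "O": (0, 0, 0, 1),
    "Q": (0, 0, 1, 1),   # circle plus the small diagonal tail
}

def candidates(features):
    """Return all letters whose reference points match the felt features."""
    return [letter for letter, f in STROKES.items() if f == features]

print(candidates((1, 3, 0, 0)))  # ['E']       -> unique in its class
print(candidates((1, 1, 0, 0)))  # ['T', 'L']  -> needs further exploration
```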

ANALYSIS

Easy letters are found faster than complex ones. When complexity increases, reading time increases, but it progressively decreases as experience is acquired: with accumulated experience, the very hard group of letters can require less reading time than the hard one did, and so on. Therefore, experience improves users’ ability to learn to read new letters and words. Two issues forced us to modify the initially planned protocol:

• Other BP-WAVE-II training protocols informed users about the letter presented and asked them to feel it (using large three-dimensional physical models). This procedure was found to be unproductive because it was very difficult to verify the user’s acuity when feeling the letter.

• Some users spend a few seconds (about 5 sec) to learn an item, then spend a long time on another item (1 min), skewing total test response time during the quiz. Others spent more time (50 sec) during the learning part; they actually took the time to feel the different patterns. These users were more efficient at recognizing letters afterwards (10 sec).

It can reasonably be inferred that some users may overestimate the BP-WAVE-II’s capacity for providing the information they need to find, associate and integrate novel patterns in order to perform good pattern matching.

The following learning steps were identified. Acquisition, or assimilation in Piaget’s terminology (1985), is the process of learning a new behavior or skill. It can be measured by the time users take to learn the patterns of an item; this time differs from one user to another. We observed that providing appropriate supervision, such as orientation or coaching on alternative strategies as mentioned above, improves skill acquisition.

Execution adaptation and stabilization (maximum efficiency), or accommodation in Piaget’s terminology, is the process of refining and automating a new cognitive skill. This is the capacity to read a word with ease and recognize interferences, e.g., camera artifacts, distractions in the visual field, etc.

Retention is the power of retaining and recalling past experience. At this learning step, it appears that cognitive performance remains unaffected by a lack of stimulation and/or spacing between periods of BP-WAVE-II use. Retention was very good, with only a few exercises needed to return to previous performance levels. Retention is better when both acquisition and adaptation are well done.

Endurance is the ability to use an acquired and adapted skill over a long period of time without affecting results. Users often need a break after more than 45 minutes: physically, the camera gets warm and puts pressure on the user’s head; cognitively, the concentration and mental workload associated with the adaptation task contribute to fatigue.

Transfer is the capacity to transfer skills,
e.g., after being able to read upper-case letters, users are likely to learn how to read lower-case letters faster. Such transfer is not immediate; users must learn how to recognize a few patterns and reuse these patterns by analogy. For example, changing the font style leads to learning a new pattern. This can be a limit of the device, but expert users can easily recognize a letter even if its font has changed. It actually depends on the distance in shape between the two fonts. For example, the difference between upper-case “O” and lower-case “o” is insignificant, whereas the difference between “G” and “g” is very significant.

Application is the ability to use a new cognitive skill in everyday life, outside its learning context. The environment for administration of this protocol should be set up to maximize the performance of the user. Transition to real-world conditions, with variable lighting, contrast, reflections, and so on, would be helped by improvements in the system technology.

DISCUSSION

Human vision obviously provides far better capability than the current BP-WAVE-II system. This is due to several reasons that can be expressed in terms of cognitive functions on the human side (CFH), the supervision side (CFS), and the machine side (CFM). At this point in time, BP-WAVE-II training cannot occur without supervision. The main goal of further studies and developments is to transfer some of the cognitive functions from the supervision to the machine, i.e., the BP-WAVE-II (CFS→CFM). However, supervision should not be ignored, because it increases users’ trust in the BP-WAVE-II; the more trust, the less supervision is needed.

We should add richer feedback (CFS→CFM) to increase users’ situation awareness (CFH), i.e., it would be good if the system could identify the reading-process context and adjust appropriately. With the current system, the only available feedback is the central vision context, not the periphery. Adding peripheral visual cues, using sensory substitution technology presented on another part of the body, could enhance the user’s understanding of the BP-WAVE-II stimuli. For example, when someone is reading, the system should be able to follow the reading process and dynamically move on to the next word and line. The system should be able to interpret the user’s point of regard through gaze and accommodation tracking.

In human vision, the eye is able to focus on near and far objects (CFS→CFM). The BP-WAVE-II supports optical and digital zoom instead, but without strong direct feedback on the zoom ratio/level (the current system relies on tactile feedback through the fingers controlling the zoom level). Since users are not able to read letters below an 18-point font or without good contrast, the BP-WAVE-II cannot be used for reading most texts of everyday life. This problem could be mitigated by increasing the resolution of the tongue array. The system should also provide auto-focus/orientation (CFS→CFM): when the scene is not presented orthogonally to the vision axis, the system should be able to
read a word, identify each letter, and present it to the user. While sighted people use this cognitive skill automatically (CFH), BP-WAVE-II users experience difficulty, probably due to low resolution and lack of orientation feedback (two CFM to be improved); e.g., if the letter E is presented and the item does not appear straight, it is not directly clear whether the letter is rotated or is a different letter with angled lines. Frame catching (CFM) needs adjustments, such as feedback on the gaze position. In addition, the concavity of the lens is not corrected, so a straight line can be perceived as a curve in many cases.

Sighted people are able to promote the saliency of parts of the scene and filter out the surroundings in order to concentrate on the main task of processing purposeful data. Instead, the BP-WAVE-II acquires the whole scene without any semantic filtering; it therefore requires cognitively more complex image processing from the user. Consequently, a new machine cognitive function should be developed to perform appropriate image processing to improve the saliency of the visual point of regard. Various filters (e.g., infrared vision, edge detection, night vision, sonar vision, hollow shape) could be added to the camera, among which BP-WAVE-II users could switch as much as they want. In addition, peripheral vision sensing that provides a three-dimensional mapping (CFM) of the environment around the user, sent to the body (on the tongue or the chest, for example), could improve situation awareness.
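As an illustration of such a machine cognitive function, the sketch below applies edge detection and downsamples a camera frame to a low-resolution grid comparable to a tongue array. The 20×20 grid size and the OpenCV pipeline are our assumptions about a plausible implementation, not a description of the actual BP-WAVE-II processing.

```python
import cv2
import numpy as np

ARRAY_SIZE = (20, 20)  # assumed tongue-array resolution, for illustration only

def frame_to_tactile(frame: np.ndarray, use_edges: bool = True) -> np.ndarray:
    """Convert a camera frame into a low-resolution stimulation pattern.

    Edge detection is one of the switchable filters suggested above:
    it promotes the saliency of contours before the heavy loss of
    resolution imposed by the electrode array.
    """
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    if use_edges:
        gray = cv2.Canny(gray, 50, 150)  # keep contours, drop surfaces
    # Area interpolation averages pixels into each electrode cell.
    small = cv2.resize(gray, ARRAY_SIZE, interpolation=cv2.INTER_AREA)
    # Normalize to stimulation intensities in [0, 1].
    return small.astype(np.float32) / 255.0

# Example: one frame from a head-mounted camera.
cap = cv2.VideoCapture(0)
ok, frame = cap.read()
cap.release()
if ok:
    pattern = frame_to_tactile(frame)
    print(pattern.shape)  # (20, 20) intensities to drive the array
```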

CONCLUSIONS

In this paper we presented the development of a specific protocol and its application to the exploration of learning and training abilities using the BP-WAVE-II assistive technology system. We designed the protocol based on observation and analysis of the various distinctions among human, supervision and machine (HSM) cognitive functions, in order to propose a possible training solution that is likely to improve current BP-WAVE-II learning and use. In contrast with previous work, we did not focus only on how users feel the various ways of presenting simple letters such as T or C. We clustered the letters with respect to their level of difficulty and their frequency in the English language. We then developed the HSM method that users practiced to learn how to read using the BP-WAVE-II.

Appropriate supervision facilitates users’ learning and reading performance; for example, there is a learning asymptote, with success approaching 100% in a reasonable reading time, at about 5 hours of user experience. However, current BP-WAVE-II technology (CFM) is not mature enough to provide good reading quality in everyday life. Users (CFH) are still too dependent on supervision (CFS). This is why sensory substitution should be augmented to remove supervision as much as possible. The VideoTact, as an example of a technology for context augmentation of BrainPort® stimuli, is already under investigation. This work revealed that the tongue array resolution may be suboptimal, but we still do not know how many pixels the tongue is able to manage. We clearly observed that movements are felt through the tongue, but they are not always recognized. Only simple shapes and
simple movements are easily recognized. Therefore, BP-WAVE-II functions should be augmented to facilitate interaction and situation awareness. Human variability is tremendous, and vision-impaired individuals are not likely to adapt in the same way as our blindfolded researchers. An open question is whether we should imitate human cognitive functions in the system, or augment the BP-WAVE-II by extending the machine’s cognitive functions to sense infrared, sonar, radar, etc. We are currently continuing to investigate in these directions. More generally, this research effort enabled the development of a human-centered methodology for the co-adaptation of both user experience and technology. This methodology, applied to the BP-WAVE-II, is based on a multidisciplinary approach, as well as cross-fertilized competences in neuroscience, psychology, computer and electrical engineering, and HCI. From an HCI point of view, it enables user experience design, and consequently the construction of training protocols, through the incremental elicitation of HSM cognitive functions.

REFERENCES

BP-WAVE-II (BrainPort® Wearable Aid for Vision Enhancement). Wicab, Inc., Middleton, WI, USA: http://vision.wicab.com.

Bach-y-Rita, P., Kaczmarek, K.A., Tyler, M.E. & Garcia-Lara, J. (1998). Form perception with a 49-point electrotactile stimulus array on the tongue: A technical note. Journal of Rehabilitation Research and Development, 35(4), 427-430. www.rehab.research.va.gov/jour/98/35/4/bachyrita.htm.

Bach-y-Rita, P. & Kercel, S.W. (2002). Sensori-‘motor’ coupling by observed and imagined movement. Intellectica, 35, 287-297.

Boy, G.A. (1998). Cognitive function analysis for human-centered automation of safety-critical systems. In Proc. CHI’98, ACM Press, 265-272.

Capelle, C., Trullemans, C., Arno, P. & Veraart, C. (1998). A real-time experimental prototype for enhancement of vision rehabilitation using auditory substitution. IEEE Transactions on Biomedical Engineering, 45, 1279-1293.

Kaczmarek, K.A. (1995). Sensory augmentation and substitution. In J.D. Bronzino (Ed.), CRC Handbook of Biomedical Engineering (pp. 2100-2109). Boca Raton, FL: CRC Press.

Kaczmarek, K.A. (2005). Tongue Display Technology. Technical Report, University of Wisconsin, Aug. 18, 2005.

Lenay, C., Gapenne, O., Hanneton, S., Marque, C. & Geouelle, C. (2003). Sensory substitution: Limits and perspectives. In Touching for Knowing: Cognitive Psychology of Haptic Manual Perception, 275-292.

Piaget, J. (1985). The Equilibration of Cognitive Structures. University of Chicago Press.

Ptito, M., Moesgaard, S.M., Gjedde, A. & Kupers, R. (2005). Cross-modal plasticity revealed by electrotactile stimulation of the tongue in the congenitally blind. Brain, 128(3), 606-614.

VideoTact. ForeThought Development, LLC, Blue Mounds, WI, USA: http://www.4thtdev.com.
