Towards Joint Attention Training for Children with ASD – A VR Game Approach and Eye Gaze Exploration

Chao Mei¹, Kennesaw State University
Bushra T. Zahed², The University of Texas at San Antonio
Lee Mason³, The University of Texas at San Antonio
John Quarles⁴, The University of Texas at San Antonio

ABSTRACT

Joint attention is critical to the education and development of a child, and deficits in joint attention are considered by many researchers to be an early predictor of Autism Spectrum Disorder (ASD). Training of joint attention has been a significant topic in ASD intervention education research. We propose a novel joint attention training approach that uses a Customizable Virtual Human (CVH) and a Virtual Reality (VR) game to assist with joint attention training. Previous work has shown that CVHs can potentially help users with ASD increase their performance in hand-eye coordination, motivate them to play longer, and improve their experience in a training game. Based upon these discovered CVH benefits, we hypothesize that CVHs may also be beneficial in training joint attention for users with ASD. To test our hypothesis, we developed a CVH with customizable facial features in an educational game – Imagination Drums – and conducted a user study with adolescents with high functioning ASD to investigate the effects of CVHs. We collected users' eye-gaze data and task performance during the game to evaluate the users' joint attention with CVHs and the effectiveness of CVHs compared with Non-Customizable Virtual Humans (NCVHs). The study results showed that the CVH made the participants gaze less at the area irrelevant to the game's storyline (i.e., the background) but, surprisingly, also provided evidence that participants reacted more slowly to the CVH's joint attention bids than to the NCVH's. Overall, the study reveals insights into how users with ASD interact with CVHs and how these interactions affect joint attention.

Keywords: Customizable virtual human, Autism Spectrum Disorder, 3D interaction.

Index Terms: H.5.2 [Information Interfaces and Presentation]: User Interfaces – Evaluation/methodology

1 INTRODUCTION

Joint attention is defined as the ability to coordinate attention between an object and a person in a social context [1]. For example, when a teacher is showing an object to a student, the student responds by gazing at the object. Joint attention is critical to the education and development of a child, and deficits in it are considered by many researchers to be an early predictor of childhood ASD [1][18][19]. Training and evaluation of joint attention has been an important topic in ASD intervention education research [19]. In the current research, we explore the possible effectiveness of a novel joint attention training approach using a Customizable Virtual Human (CVH) and a Virtual Reality (VR) game to assist with joint attention training. CVHs have been found to potentially help children with ASD increase their performance in a hand-eye coordination training game, motivate them to play longer, and improve user experience [14]. Also, inferred from the work of Downing et al. [7], we know that when a person is paying attention to an object or event, she is likely to pay more attention to features associated with the attended object. Based upon these discovered CVH benefits, we hypothesize that CVHs can readily attract joint attention from children with ASD, which would consequently improve their attention to the knowledge taught by the CVH. To test this hypothesis, we developed a joint attention training game to investigate how effectively CVHs attract the joint attention of users with ASD. Specifically, we developed a CVH with customizable facial features in an educational game – Imagination Drums (Figure 1). In this game, the CVH plays the role of a virtual teacher who interacts with the users and teaches them about the different pieces of a jazz drum set. During this process, users have to perform actions such as making eye contact and following joint attention bids from the CVH teacher to make the game proceed.

Figure 1: A User is Playing Imagination Drums

1. [email protected]
2. [email protected]
3. [email protected]
4. [email protected]


In the study, we collected users' eye-gaze data and task performance during the game to evaluate the users' joint attention with CVHs and the effectiveness of CVHs compared with Non-Customizable Virtual Humans (NCVHs). We expected that 1) CVHs would more quickly attract the users' joint attention, and 2) CVHs would help the users focus more on the training content (e.g., the drums, the teacher's face).


The study results provided evidence that supports the second expectation but, surprisingly, also provided evidence against the first. Overall, the study revealed insights into how users with ASD interact with CVHs and how these interactions affect joint attention.

2 RELATED WORK

Joint attention involves "coordinated attention between interactive social partners with respect to objects or events to share an awareness of the objects or events" [16]. Studies have shown that children with ASD commonly have impaired joint attention abilities. For example, they use incorrect mappings between language and referenced objects [3], and they respond incorrectly to joint attention actions with adults [13]. Joint attention deficits could be an early predictor of autism [19] and are associated with language ability [18]. The evaluation and training of joint attention are therefore crucial to ASD intervention.

Eye gaze is an important indicator of attention, and identifying another person's eye gaze point is an essential interpersonal skill. For example, young children follow caretakers' gaze to distinguish between important and unimportant things, and people can be alerted to danger based on other people's eye gaze direction [2]. During interaction with other people, following eye gaze is a way to share attention with others [2], i.e., to achieve joint attention. Conversely, a person's eye gaze direction can reflect his/her attention towards a task [4]. In our work, we evaluate whether the users follow a virtual human's eye gaze direction by observing the user's own eye gaze point. Through the user's eye gaze information, we can also identify where their attention is when a virtual human makes joint attention requests (e.g., pointing/showing with a hand).

There is some prior work that utilized eye gaze data in VR intervention systems for ASD. For example, Lahiri et al. [12] developed a system that uses the participants' eye tracking data to provide dynamic feedback in a VR intervention system. Their results indicate the similarity of eye gaze patterns for participants with ASD in VR and the real world. Navab et al. describe a VR application that evaluates joint attention in infants with eye tracking data [17] to support early diagnosis of autism. In our system, we use interactive VR games instead of the videos described by Navab et al. [17]. Dynamic eye-gaze data can reflect the participants' gaze patterns in an interactive environment, and it can provide information to potentially help improve the patients' interaction skills in real life.

The content of our VR game is inspired by the work of Whalen et al. [19], who described a way to analyze an ASD subject's joint attention and how to conduct joint attention training. However, all of these processes require a professional therapist to interact with the subject. We have developed an automatic VR system which may effectively supplement clinical joint attention therapy, and our VR system could also be used anywhere therapists are not available (e.g., at home).

3 THE SYSTEM DESCRIPTION

We developed a CVH with customizable facial features and a VR educational game – Imagination Drums – for training joint attention. Here we describe the hardware, software, and interaction that drive the game.

Imagination Drums Setup

Imagination Drums is a VR game set up in an environment as shown in Figure 1. The system is equipped with one LED monitor, two speakers, one Tobii EyeX eye tracker and a Razer Hydra.
A laptop controls the system and renders the graphics and audio. Figure 2 shows the structure of the implementation of this system.


Figure 2: The System Structure of Imagination Drums

The specifications of each piece of the system are listed below:

The LED monitor is a 21'' Dell monitor sitting on the desk. Users can adjust the height and viewing angle of the monitor to the position most comfortable for them.

The two speakers play the stereo sound of the game, which includes the speech of the CVH teacher and the sounds of the virtual drums.

The Tobii EyeX eye tracker is attached to the bottom edge of the monitor. This position is recommended by the eye tracker's manual as optimal for tracking the user's eyes. The eye tracker is calibrated for each user before the game starts. After calibration, the eye tracker provides the user's eye gaze data on the monitor's screen. The eye gaze data is used to determine whether the user makes eye contact and achieves joint attention with the virtual teacher.

The Razer Hydra is a 3D input device that controls two virtual drumsticks in the game. A user may be required to interact with the CVH by hitting the drums, similar to interacting with real drums but without haptic feedback. Therefore, we used a 6DOF controller to let users perform the real-world action of hitting drums naturally.

A Moto 360 Android watch provides haptic feedback to the user when the virtual teacher grasps the user's virtual hands: the watch vibrates on the user's wrist to simulate the interaction. The watch is synchronized with the system through a Bluetooth connection. We did not provide haptic feedback for drum hits because, although the system has low latency (~40 ms), the delay is still too noticeable for realistic haptic feedback of drum hits.

A TOSHIBA Qosmio laptop is connected to the monitor, eye tracker and Razer Hydra through an HDMI cable and USB cables. It is equipped with an NVIDIA GeForce GTX 770M graphics card, an Intel Core i7 processor and 16 GB of RAM.
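The paper does not include the gaze-processing code; the following is a minimal sketch, in Python, of how the per-frame eye-contact check described above might look. The rectangle coordinates and the dwell threshold are assumptions (the paper does not state how gaze samples were debounced), and the gaze sample is assumed to arrive in screen pixels from the Tobii EyeX SDK.

```python
# Hypothetical screen-space rectangle for the teacher's eye region,
# as (x, y, width, height) in pixels; in the real game this would be
# projected from the 3D position of the virtual teacher's eyes.
EYES_ROI = (880, 300, 160, 60)

def in_roi(roi, gx, gy):
    """True if the gaze point (gx, gy) lies inside rectangle roi."""
    x, y, w, h = roi
    return x <= gx <= x + w and y <= gy <= y + h

def update_eye_contact(gaze_xy, dwell, dt, min_dwell=0.5):
    """Called once per rendered frame with the latest gaze sample.
    Accumulates dwell time while the gaze stays inside the eyes ROI and
    reports eye contact once the dwell passes a small threshold, so one
    noisy sample does not count as contact. The 0.5 s threshold is an
    assumption, not a value from the paper."""
    gx, gy = gaze_xy
    dwell = dwell + dt if in_roi(EYES_ROI, gx, gy) else 0.0
    return dwell, dwell >= min_dwell
```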

3.1 Customizable Virtual Human

The primary purpose of the CVH is to enable end users to implement a virtual teacher based on their interests (e.g., a virtual teacher that resembles someone they know). Based on previous research results, a CVH has the potential to make users with ASD more engaged [14], and thus we expected it could facilitate more eye contact and joint attention. In this work, users can customize the CVH through a simple interface (Figure 3) that exposes a set of variables the users can manipulate.

Customizable Variables

Users customize the CVH by modifying ten variables. These variables are partially inspired by previous research on CVHs for children with ASD [14][15] and by Ducheneaut et al. [8]. The details are listed below:

• Hairstyle and color: a user can choose from 23 different predefined hair models (10 for the male CVH, 13 for the female CVH).
• Eye color: a user can choose from 7 different eye colors or define the eye color with an online image, which can be thought of as putting patterned contact lenses on the CVH.
• Skin color: selected from 2 options – light skin and dark skin.
• Gender: selected from 2 options – male and female.
• Name: set by typing any string, which may appear during the game.
• Face decoration: users can search online for images of their interests. The CVH automatically resizes the image and places it on the cheek area of the face texture as a decoration.
• Age: selected from 2 age groups – young teachers and senior teachers.

Figure 3: The User Interface for Customizing the CVH Teacher

3.2 Game Description

After the user finishes customizing a virtual teacher, a welcome screen shows the words "Please wait, (the teacher's name) is preparing for your class." This welcome screen lasts for 5 seconds before the game proceeds to the main interface. Figure 4 is a screenshot of the main game interface. The game takes place in a music stadium. The virtual teacher sits on a stage surrounded by a jazz drum set. At the beginning of the game, the teacher greets the user. During the class, the teacher tells the user the name, effect, and history of each piece in the drum set and asks the user to hit a specific drum after the introduction of that drum. As the virtual teacher speaks to the user, his/her lip and eye movements are synchronized with the speech audio. To keep the same speaking speed across different characters, the voices of the teachers were all generated with iSpeech [10] – a text-to-speech application. The motions of the virtual teacher were all captured from a real person using a Vicon system. During the game, the user sits in front of the screen and interacts with the virtual teacher. The user can control a virtual drumstick with the Razer Hydra to hit the drums. The user's eye gaze data is logged throughout the game.

Figure 4: The Main Interface of Imagination Drums

After the game starts, the teacher explains each piece and tries to get the user's joint attention. Specifically, we simulated the six levels of joint attention in the game:

Level 1 – User's response to a hand on an object. At this level, when the participant is playing with one drum, the teacher places the participant's virtual hand on a different drum. When the teacher moves the participant's virtual hand, the Android watch on the user's wrist vibrates to simulate the haptic sensation.

Level 2 – User's response to an object being tapped. The virtual teacher taps the snare drum in front of her with her right hand.

Level 3 – User's response to the presentation of an object in the game. The virtual teacher shows the user the drumstick in her hand.

Level 4 – User's response to eye contact. The virtual teacher asks the subject to look at her eyes before she shifts her eye gaze to the tom-tom drum.

Level 5 – User follows the action of pointing. The virtual teacher points at the hi-hat cymbal of the drum set.

Level 6 – User follows the eye gaze of another person. The virtual teacher shifts her eye gaze to the tom-tom drum, which can only happen after eye contact between the user and the virtual teacher has been established.

Before the CVH introduces each piece of the drum set, he/she tries to get the user's attention at one of the six levels. If the user gazes at the correct target (e.g., a drum or the teacher's eyes), the response is marked as correct, and the virtual teacher continues to introduce the drum or shifts her eye gaze to another object (i.e., a test of eye gazing and gaze shifting). If the user does not gaze at the correct target within 15 seconds, the teacher makes the request and performs the motion again. If the user still makes no response after the request has been made three times, the virtual teacher proceeds directly to the next step of the game. After the teacher finishes introducing one of the drums, the user is asked to hit and play with that drum. When the participant is engaged in playing with one drum by hitting it with the Razer Hydra controller, the virtual teacher uses another one of the six levels to switch to a different piece of the drum set and start another introduction. The user's correct responses to the virtual teacher's requests are never directly praised by the virtual teacher (e.g., the virtual teacher never says "Good job!") because we did not want the effects of praise to bias the study results. A sketch of this bid-retry logic is given below.
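The paper gives this bid-retry protocol only in prose; the sketch below shows one way it might be structured, assuming hypothetical callables `perform_bid(level)` (plays the teacher's request animation and speech) and `user_gazing_at(target)` (tests the current gaze sample against the target's ROI). In the actual game this logic would run once per rendered frame rather than in a blocking loop.

```python
import time

TIMEOUT_S = 15.0    # seconds the user has to respond to one request
MAX_ATTEMPTS = 3    # requests made before the teacher moves on

def run_joint_attention_bid(level, target, perform_bid, user_gazing_at):
    """Run one joint attention bid and return the two study measures:
    the number of attempts used and the reaction time of the successful
    attempt (None if the user never responded)."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        perform_bid(level)                # teacher taps, points, gazes, etc.
        start = time.monotonic()
        while time.monotonic() - start < TIMEOUT_S:
            if user_gazing_at(target):    # correct response: gaze on target
                return {"attempts": attempt,
                        "reaction_time": time.monotonic() - start}
        # No response within 15 s: repeat the request (up to three times).
    return {"attempts": MAX_ATTEMPTS, "reaction_time": None}
```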


4 STUDY DESIGN

We conducted a within-subjects, counterbalanced user study in which 10 participants with ASD, all of whom scored greater than 71 on an autism severity test (GARS-3), were recruited. All participants interacted with both the CVH teacher and a randomly assigned non-customizable VH (NCVH) teacher, in random order. We used a within-subjects design because the ASD population has wide individual variation. The primary research question of this study is: will users with ASD reach joint attention more easily with the CVH teacher than with the NCVH teacher? We also evaluated the eye gaze data over the whole interaction process to investigate whether there is any difference between the participants' eye gaze when interacting with the CVH versus the NCVH.

4.1 Hypotheses

In previous work, CVHs were found to help users with ASD be more engaged in a VR intervention game and achieve higher training goals in the game. Based on these prior results, we hypothesize that a CVH will be more effective than an NCVH at drawing the attention of subjects with ASD. Specifically, the one-tailed hypotheses are:

H1. The CVH will obtain the participants' joint attention with fewer trials and at a faster speed than an NCVH.

H2. The participants will gaze longer at the Regions of Interest (ROIs) of a CVH than at the ROIs of an NCVH.

4.2 Study Procedure

The study was conducted in an indoor lab. Only the experimenters, the participant, and the participant's guardian were allowed to be present. The total duration of the study was about half an hour per participant, including training, customizing, performing tasks, and an interview. We paid each participant 50 dollars. The experimenter first showed a demo of how to hit the virtual drums to make a sound with the Razer Hydra. The participant was then required to try hitting each piece of the drum set in a random order given by the experimenter. The training was conducted without any virtual teacher so as not to bias the participants. The purpose of the study is to explore the different effects the CVH and NCVH have on participants' joint attention. Each participant went through the same class material with both the CVH and the NCVH; however, which of the two characters appeared first was decided randomly, so that the order did not bias the results. Before each participant interacted with a CVH, they were required to download three images of their interests. These images were later used in the game as part of the CVH teacher's customization.

4.3 Measurements

Number of Times Acquired Joint Attention: how many times the virtual teacher has to try to establish one successful joint attention with the participant.

Reaction Time of Joint Attention: how long it takes the virtual teacher to establish one successful joint attention with the participant.

These two measurements are recorded each time the virtual teacher tries to get joint attention from the subject.

Time Gazing at Regions of Interest (ROIs): The face of the virtual teacher is divided into three regions of interest – forehead, eyes, and face (the upper, middle, and lower parts of the face). The regions are shown in Figure 5. Also, as shown in Figure 6, each piece of the drum set and the drumstick are individual ROIs. There is a background ROI, which is the area of the game interface not occluded by other ROIs. We collected the participant's gaze time with respect to all of these regions dynamically at every rendered frame of the game; a sketch of this per-frame accumulation follows.
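The paper does not include the logging code; the following is a minimal sketch of the per-frame accumulation under the assumptions that each ROI is available as a screen-space rectangle (all coordinates below are hypothetical) and that the background is the fallback when no other ROI contains the gaze point.

```python
# Hypothetical screen-space ROIs as (x, y, width, height) in pixels.
ROIS = {
    "forehead":  (850, 240, 220, 60),
    "eyes":      (880, 300, 160, 60),
    "face":      (860, 360, 200, 120),
    "snare":     (400, 600, 180, 120),
    "bass_drum": (620, 640, 260, 200),
    # ... one entry per drum piece and for the drumstick
}

def contains(roi, gx, gy):
    """True if the gaze point (gx, gy) lies inside rectangle roi."""
    x, y, w, h = roi
    return x <= gx <= x + w and y <= gy <= y + h

def accumulate_gaze(gaze_xy, dwell_s, dt):
    """Called at every rendered frame: add the frame time dt to the
    dwell counter of the ROI containing the gaze point, falling back
    to the background ROI when no named region is hit."""
    gx, gy = gaze_xy
    for name, roi in ROIS.items():
        if contains(roi, gx, gy):
            dwell_s[name] = dwell_s.get(name, 0.0) + dt
            return
    dwell_s["background"] = dwell_s.get("background", 0.0) + dt
```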


Figure 5: Regions of interest (ROI) on VH's face

Figure 6: ROIs on Game Interface

5 RESULTS

We performed paired t-tests to compare the differences in the measurements under the CVH and NCVH conditions, using Bonferroni correction where appropriate; a sketch of this comparison is given at the end of this subsection.

Number of Times Acquired Joint Attention

The Number of Times Acquired Joint Attention is measured each time the virtual teacher requests attention. In the game, the virtual teacher requests joint attention seven times across the different levels (e.g., showing an object, asking for eye gaze). Almost all of the participants reacted to the CVH's and NCVH's joint attention requests without the request being repeated. Only one participant did not respond until the NCVH had made three requests. There were no significant differences found in this measurement.
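For reference, a minimal sketch of the paired comparison with a Bonferroni-adjusted threshold, using scipy; the arrays here are placeholders for illustration only, not the study's data.

```python
from scipy import stats

def paired_comparison(cvh, ncvh, n_tests, alpha=0.05):
    """Paired t-test between matched CVH/NCVH measurements. With
    n_tests comparisons, Bonferroni correction tests each p-value
    against alpha / n_tests. The study's hypotheses were one-tailed,
    so scipy's `alternative` argument could be set accordingly."""
    t, p = stats.ttest_rel(cvh, ncvh)
    return t, p, p < alpha / n_tests  # Bonferroni: compare p to alpha/n

# Placeholder reaction times in seconds (one value per participant):
cvh_rt  = [2.1, 3.4, 1.8, 2.9, 4.0, 2.2, 3.1, 2.7, 3.5, 2.0]
ncvh_rt = [1.7, 2.9, 1.5, 2.4, 3.1, 2.0, 2.6, 2.3, 2.8, 1.9]
print(paired_comparison(cvh_rt, ncvh_rt, n_tests=7))
```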

Table 1: The comparison of the reaction time over CVH and NCVH conditions

Table 2: The comparison of the gazing time of the Background ROI over CVH and NCVH conditions

Reaction Time of Joint Attention

On the measurement of Reaction Time of Joint Attention, however, the results are surprising. We compared each of the reaction times in the CVH condition with its corresponding reaction time in the NCVH condition. When the virtual teacher made the first joint attention bid by showing the drumstick and saying "Please look at this drumstick," the participants spent more time shifting their eye gaze to the drumstick in the CVH condition. Moreover, when the virtual teacher asked for eye gaze, the participants took more time to shift their gaze to the virtual teacher's eyes in the CVH condition. The details of these results are reported in Table 1.

Regions of Interest

We collected the gaze time in each ROI during the entire game process. Part of the data was excluded from the analysis: in parts of the game, the participants are given some time to hit the drums with the Razer Hydra controller, and through our observation we found that many participants were still not familiar with operating the controller. When they were hitting the drums, they spent time looking at the controller and talking with the experimenter. Therefore, to avoid these possible biases, we only considered the gaze time collected while the teacher was talking or performing a motion; a sketch of this filtering is given at the end of this subsection.

We found that the participants spent significantly less time gazing at the background ROI under the CVH condition. The entire game scene is composed of the virtual teacher, the drums, and the background; the background ROI is the visible area of the screen that excludes the teacher and drums. The details of this result are reported in Table 2. Figure 7 shows the box plots of the gaze time on the background ROI in the CVH and NCVH conditions.

Figure 7: The box plots of the gazing time on the Background ROI over CVH and NCVH conditions (boxes are the range of quartiles, whiskers represent the min/max values outside the quartiles).

Inferred from the work of Downing et al. [7], we know that when a person is paying attention to an object or event, she is likely to pay more attention to features associated with the attended object. Here we found that the study participants paid more attention to the CVH and to the drum set introduced by the CVH, which supports Downing's theory and our intention of using a CVH in ASD attention intervention.
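The exclusion rule above is straightforward to express over the logged frames; below is a minimal sketch, assuming each logged gaze sample carries a hypothetical `phase` label recording what the teacher was doing at that moment.

```python
# Each logged sample: (roi_name, frame_dt, phase), where `phase` is a
# hypothetical label such as "teacher_talking", "teacher_motion", or
# "free_drumming".
INCLUDED_PHASES = {"teacher_talking", "teacher_motion"}

def filtered_dwell_times(samples):
    """Sum per-ROI gaze time, keeping only frames recorded while the
    teacher was talking or performing a motion; free-drumming frames
    are dropped because participants often looked at the controller."""
    dwell = {}
    for roi, dt, phase in samples:
        if phase in INCLUDED_PHASES:
            dwell[roi] = dwell.get(roi, 0.0) + dt
    return dwell
```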


Table 3: Overall Gazing Time on each ROI - CVH

Table 4: Overall Gazing Time on each ROI - NCVH

Other Supportive Evidence for the Above Finding

We also found other evidence supporting the finding that participants spent significantly less time gazing at the background ROI under the CVH condition. Within each of the CVH and NCVH conditions, we performed a one-way ANOVA and a Tukey test to compare the eye gaze time among the ROIs. In both the CVH and NCVH conditions, the ANOVA showed a significant difference among the eye gaze times of the ROIs (p < 0.001). The homogeneous-group classifications based on the Tukey test results are shown in Table 3 and Table 4. From these two tables, we can tell that the ranking of gaze time under the CVH condition follows the rule that the further an ROI is from the center of the screen, the less gaze time it receives; the same holds in the NCVH condition. However, the ranking of the most gazed ROIs differs between the two conditions. In the CVH condition, there are two homogeneous groups (Table 3; ROIs in one homogeneous group do not differ significantly from each other): the Face and BassDrum ROIs form the longer-gazed group (group 2 in Table 3), and the Face ROI has significantly longer gaze time than most of the other ROIs in the shorter-gazed group (group 1 in Table 3). The homogeneous-group classification is more complicated in the NCVH condition, where there are three homogeneous groups (Table 4). The Face ROI has significantly longer gaze time than the five least-gazed ROIs. The background ROI, which was not found to have significantly longer gaze time in the CVH condition, has significantly longer gaze time than the two least-gazed ROIs in the NCVH condition. This is consistent with Table 2 – participants spent less time gazing at the background in the CVH condition.

We also ran the Tukey comparisons on each stage of the game (i.e., while the virtual teacher was talking about the bass drum, snare drum, cymbal, etc.). For both the CVH and NCVH conditions, the results show that at each stage, the piece being introduced by the virtual teacher is usually among the longest-gazed homogeneous group of ROIs. In the CVH condition, the background ROI is usually not grouped in the longest-gazed group; however, under the NCVH condition, the background ROI often appears in the longest-gazed group. This trend is consistent with the overall gazing time. A sketch of the ANOVA and Tukey analysis is given below.
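A minimal sketch of this per-condition analysis, using scipy for the one-way ANOVA and statsmodels for the Tukey HSD grouping; the arrays are placeholders, not the study's data.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Placeholder per-participant gaze times (seconds) for three of the
# ROIs in one condition; the real analysis covers all ROIs.
gaze = {
    "Face":       np.array([30.1, 25.4, 28.0, 22.5, 31.2]),
    "BassDrum":   np.array([24.3, 20.1, 26.5, 19.8, 23.0]),
    "Background": np.array([8.2, 10.5, 7.9, 12.1, 9.4]),
}

# One-way ANOVA across ROIs within one condition.
f_stat, p_value = stats.f_oneway(*gaze.values())
print(f"ANOVA: F = {f_stat:.2f}, p = {p_value:.4f}")

# Tukey HSD post-hoc test; its pairwise results are the basis for the
# homogeneous-group classifications reported in Tables 3 and 4.
values = np.concatenate(list(gaze.values()))
labels = np.repeat(list(gaze.keys()), [len(v) for v in gaze.values()])
print(pairwise_tukeyhsd(values, labels, alpha=0.05))
```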

6 DISCUSSION
