Non-Visual Gameplay: Making Board Games Easy and Fun

Tatiana V. Evreinova, Grigori Evreinov and Roope Raisamo
Dept. of Computer Sciences, University of Tampere, Kanslerinrinne 1, FIN-33014 Tampere, Finland
{etv, grse, rr}@cs.uta.fi
Abstract. In this paper we report the results of an evaluation of a game and of interaction techniques which allow playing board games in the total absence of visual feedback. We demonstrate that a camera mouse can be used for blind navigation within a game field. Snapping the position of the virtual pointer to the regions of interest, as well as audio-haptic complementary mapping, significantly reduces the cognitive load and facilitates mental matching and integration of overview sound sequences.

Keywords: board game, tabular data, non-visual game, overview cues, audio-haptic mapping, camera mouse.
1 Background

Tabular data presentation takes many different forms which combine descriptive (textual) and spatial (graphic) elements. The content of a table can be grouped spatially and functionally. Blind people experience significant problems in accessing such composite information through screen readers [4]. Reading a table on a cell-by-cell basis makes mental integration of the table structure and the content of the separate cells difficult. A table structure requires different overview and attentional mechanisms to explore and comprehend the mapping features and the data. To present a composite image such as a table or a board game to the blind, the descriptive and spatial (graphical) elements must be translated into accessible forms while preserving their original meaning: the overview, and the spatial and functional grouping of the content. The overview information has to be easily perceived, requiring minimal cognitive resources, alongside and synchronously with the content of the cells (like highlighting or shading) [3].

Ramloll and co-authors investigated access to 2D numerical tabular data arranged in 26 rows and 10 columns through sonification [8]. Besides navigation and shortcut keys, the authors used overview keys ("+" and "Enter") allowing the user to listen to any column or row as a continuous sound sequence. Sonification of numerical sequences does not deliver the overview information immediately: the listener has to perform a sophisticated mental auditory analysis to discover a tendency in the sound pattern. When the number of cells is small enough, e.g., in the Tic-Tac-Toe board game having 9 cells with 3 states per cell, with a numeric keypad the player can have
[Fig. 1 here. Left: a 3×3 board with cell states X, 0 and empty. Right: possible playback sequences of the cells along the time line:
(a) X_empty_empty_ _ _0_X_X_ _ _0_empty_0_@
(b) X_empty_empty_ _ _X_X_0_ _ _0_empty_0_@
(c) X_empty_empty_X_0_empty_0_0_X_@]
Fig. 1. The spatial arrangement of the tabular data (picture on the left) and the possible playback sequences of the cells (picture on the right) for a small number of parameters: X, 0, empty – states of the cells; “_” – a short time delay (caesura); “@” – the end sound marker.
access to each cell, to each row or column, or even to a playback of the whole sound sequence, which is a reasonable way to listen to the pattern of the gamespace (Fig. 1). The perceptual parsing of acoustic sequences allows the listener to mentally recombine the components and follow the sounds of interest among other sounds [5]. Special caesurae are necessary to split and synchronize the spatial components of rows or columns for successful recombination and integration of the sequence of sounds into a spatial sound image. The playback order of the cells (Fig. 1 a, b, c) and the sound-silence ratio are key factors in matching auditory streams [10]. When the table contains more than 25 cells, it becomes difficult to perform a mental matching across corresponding rows and columns. The exploratory activity of the person can reinforce the perceptual parsing and integration of sound sequences into a spatial sound image.

Stockman and colleagues (2004-2005) performed systematic studies on interactive sonification of spreadsheets and diagrams. The usual PC cursor was used for accurate navigation within the data (spreadsheet), while a sonification cursor was employed for pointing to a region of interest. The sonification cursor provided the user with an overview (approximate) parameter of arbitrary cell ranges and subsets of a spreadsheet. Exact information was delivered through verbal feedback using the screen reader. The authors demonstrated that a system which is tailored to the user requirements and can display on demand overviews of the data from a selected range decreases cognitive load and speeds up the process of examining groups of cells for particular features [9]. However, to navigate with the sonification cursor in Stockman's method the user had to control four additional keys (E-up/D-down, S-left/F-right) with the left hand, while the arrow keys operating the PC cursor were controlled by the right hand.
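The row-wise playback notation of Fig. 1 is straightforward to generate from a board state. The Python sketch below is purely illustrative (the original software was written in Visual Basic; the helper name `overview_sequence` is ours): cells are joined by the short caesura "_", rows are separated by the longer caesura "_ _ _", and "@" is the end sound marker.

```python
def overview_sequence(board):
    """Build a row-wise overview string in the Fig. 1 notation:
    cells joined by "_" (short caesura), rows separated by the
    longer caesura "_ _ _", and "@" as the end sound marker."""
    rows = ["_".join(row) for row in board]
    return "_ _ _".join(rows) + "_@"

# The left board of Fig. 1: X / 0 / empty states of a Tic-Tac-Toe grid.
board = [["X", "empty", "empty"],
         ["0", "X", "X"],
         ["0", "empty", "0"]]
print(overview_sequence(board))
# → X_empty_empty_ _ _0_X_X_ _ _0_empty_0_@
```

Changing the joining order (column-wise, or without row caesurae) yields the alternative sequences (b) and (c) of Fig. 1.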
Four modification keys were used to synchronize the behavior of the two logical cursors (activate/superpose/lock together), and more keys were used to control the parameters of the sound feedback. When many different controls and conditional signals are used to operate an application, the usability of the system drastically decreases. Instead, the application has to rely on intuitive interaction techniques and on a complementary mapping of multimodal feedback signals, freeing up the mental resources of the user (perceptual, cognitive and executive functions).

In order to play a game, blind and visually impaired people mostly have to rely on auditory and haptic cues. As was demonstrated recently for early-blind, late-blind, and blindfolded-sighted individuals, the need to rely more exclusively on haptic inputs stimulates haptic dexterity in the blind (p. 1263, [7]), while sighted people strongly rely on vision to represent peripersonal space. According to Millar [6], cognitive spatial encoding in pointing tasks can be egocentric and can be based on movement memory. The board gamespace can be imagined as a 2D plane orthogonal to the axis of the human gaze. Gazing straight ahead in a neutral head position would mean facing the center of the virtual plane. This metaphor has been successfully used in different applications. The gamespace can be divided into a number of regions. In the absence of visual feedback the player can select a particular region of the gamespace with the keyboard, using shortcut keys and key combinations. Nevertheless, video tracking of the user's head movements can also be used, in the absence of visual feedback, for navigation and target location acquisition within a gamespace.
[Fig. 2 here. Left: the 6 by 6 gamespace, rows 1-6 and columns a-f, with the game character Scrat. Right: the keyboard keys E, R, T / D, F, G / C, V, B select the nine regions of interest together with the Ctrl key; H, J, N, M select a particular cell; Shift + key shifts the region.]
Fig. 2. Gamespace: 6 by 6 cells, 4 states per cell and the game character Scrat (picture on the left); 11 moves to win; and the keys of the regular keyboard used together with the Ctrl key to select the region of interest and a particular cell (picture on the right).
[Fig. 3 here. The region of interest (T) within rows Row 1, Row 2 and Row 3; Shift + head gesture shifts the region; the four cells of the region (keys H, J, M, N) are played back along the time line in the overview direction as the sequence H_J_M_N_@.]
Fig. 3. The audio-haptic complementary mapping for navigation with the camera mouse and sonification of the region of interest within a gamespace. Pressing the F key without a modifier moves the cursor to the central region.
Entering each region with a head gesture can produce an immediate audio signal about the selected location. Furthermore, a technique similar to sticky icons [1, 11] can prevent the cursor from wandering around the region of interest. While it would be possible, as shown in Fig. 2, there is no need to encode each cell of the board game as in chess and to select cells with a sequence of key combinations. Sounds of the rows (Row 1, Row 2, Row 3 in Fig. 3) can augment the vertical position of the cursor, while through the haptic sense of the head position the person can accurately differentiate the left, middle and right gamespace regions within each row. By hitting the spacebar the player can receive specific information about the game objects within a region of interest through the overview sound sequence.
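A minimal sketch of the snapping behavior, assuming the 1024×768 input space described in the next section is divided into 3×3 regions of interest (the function and constant names are hypothetical, not from the original implementation):

```python
INPUT_W, INPUT_H = 1024, 768   # head-tracker input space (see Sec. 2.1)

def snap_to_region(x, y, cols=3, rows=3):
    """Snap a raw pointer position to the centre of one of the
    nine regions of interest; intermediate positions are discarded,
    so the virtual pointer moves discretely between regions."""
    col = min(int(x * cols / INPUT_W), cols - 1)
    row = min(int(y * rows / INPUT_H), rows - 1)
    centre = ((col + 0.5) * INPUT_W / cols, (row + 0.5) * INPUT_H / rows)
    return (row, col), centre

region, centre = snap_to_region(100, 700)   # a gesture toward lower-left
# → region (2, 0), centre (170.66…, 640.0)
```

Because the pointer only ever occupies one of nine centres, entering a region can trigger the location sound exactly once, which is the sticky behavior described above.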
2 Experimental Setup

2.1 Hardware

To explore the possibility of scanning the game field with the camera mouse, the well-known game Eggs (Bug) was adapted for gameplay in the total absence of visual feedback. The game test program was written in Microsoft Visual Basic 6.0. The camera-mouse module was implemented employing Microsoft Video for Windows (VfW) and API functions. A desktop PC with an Intel Pentium 4 CPU (3201.0 MHz, 2 MB cache) was used for testing. During the tests the monitor (with a resolution of 1024 × 768 pixels) was turned off. A Logitech QuickCam Pro 3000 with a frame rate of 50 fps and a SoundBlaster Live! card with an amplified multimedia stereo system were used. A region of 48×48 pixels near the eyebrows (the template) was stored and used as the facial landmark, tracked with an adaptive cross-correlation algorithm that provided a correlation between the template and the samples of 0.995, and never worse than 0.8 [2]. A sigmoid-based transfer function from head displacements to pointer coordinates was adjusted with indexes of speed (1.7) and acceleration (2.3), and a moving average over 5 detected template positions. Such a technique is acceptable for non-visual interaction when supported by appropriate feedback [1]. Head movements were supported by snapping the position of the virtual pointer to the regions of interest, and by audio and perceptual haptic (proprioceptive) cues. Thus, the position of the virtual pointer changed discretely, without intermediate positions. However, the testing software stored the real tracks of the head movements within the input space of 1024×768 pixels, which was zoomed relative to the actual gamespace to allow less accurate gestures. The screen size of the cells in the game application was set at 25×25 mm and the field of interest had a size of 50×50 mm (Fig. 2, left picture).
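The exact form of the transfer function is not given here, so the following sketch shows one plausible sigmoid-shaped mapping built from the reported parameters: the speed (1.7) and acceleration (2.3) indexes and the 5-sample moving average. The `tanh` shape and the `scale` parameter are assumptions.

```python
import math
from collections import deque

SPEED, ACCEL, WINDOW = 1.7, 2.3, 5     # indexes reported in the text

history = deque(maxlen=WINDOW)         # last detected template positions (one axis)

def smoothed(x):
    """Moving average over the last WINDOW detected template positions
    (one axis shown; the same filter would be applied per axis)."""
    history.append(x)
    return sum(history) / len(history)

def transfer(dx, scale=100.0):
    """Sigmoid-shaped mapping of a smoothed head displacement dx
    (template pixels) to a pointer displacement: damps tremor near
    zero and saturates at SPEED * scale for large gestures."""
    return SPEED * scale * math.tanh(ACCEL * dx / scale)
```

The saturation keeps large, imprecise head gestures from overshooting the gamespace, while the moving average removes frame-to-frame tracking jitter before the transfer is applied.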
To easily discriminate two adjacent fields of interest perceptually, through the haptic sense and discrete sound cues, the physical space for head movements was set at 120×100 mm of the screen size, while the virtual grid was inspected with the camera mouse from a distance of 600 mm.

2.2 Game scenario

The game task was to explore the gamespace and to detect the exact positions of eggs, nests, obstacles, empty cells and the game character "Scrat" (controlled by the arrow keys). Scrat can push eggs, which must be put into the nests. The exploration of the gamespace was implemented in two ways: by selecting one of the nine regions of interest with the camera mouse, or by selecting these regions with a combination of the Ctrl key and a predefined key on the keyboard as shown in Fig. 2, and then hitting the spacebar to listen to an overview sound sequence of four cells. To increase the player's opportunities, two additional options were considered: shifting the region of interest with the Shift modifier key, as indicated in Fig. 2 and Fig. 3, or accessing any particular cell by selecting it with an additional key (H, J, N or M while the Ctrl key was already pressed) after the region of interest was marked. However, these extended options were out of the scope of the research presented here.

The task difficulty gradually increased with the game level (Table 1). Level 2 required more exploratory activity than Level 1: movements and listening to the layout of the game objects. Levels 3-6 required more cognitive effort and combinatorial thinking to win; that is, some sequences of moves had to be repeated to push the eggs in the right way. The patterns of gameplay were asymmetric and required more effort to memorize the layout. The gameplay scenario provoked the player to build a cognitive model of the gamespace and to elaborate a perceptual strategy: how often to listen to the situation to remain aware of the changes.

Table 1. The gameplay levels overview. The game character "Scrat" occupied one more cell.
Game Level   Eggs / Nests   Obstacles   Empty cells   Moves to win
    1             4             16          11             11
    2             4             18           9             29
    3             4             12          13             33
    4             4             12          14             64
    5             3             13          16             50
    6             5             15          10             38
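The movement rules are not spelled out beyond "Scrat can push eggs, which must be put into the nests", so the sketch below assumes Sokoban-like mechanics on the 6×6 grid. All names are hypothetical, and an egg pushed into a nest is simply re-marked as "egg" for brevity.

```python
def try_move(grid, scrat, d):
    """Attempt one arrow-key move of Scrat by delta d = (dr, dc) on a
    6x6 grid of "empty"/"egg"/"nest"/"obstacle" cells. Scrat steps onto
    free cells; an adjacent egg is pushed one cell further if that cell
    is free (Sokoban-like assumption). Mutates the grid in place and
    returns the new Scrat position (unchanged if the move is blocked)."""
    r, c = scrat[0] + d[0], scrat[1] + d[1]
    if not (0 <= r < 6 and 0 <= c < 6) or grid[r][c] == "obstacle":
        return scrat                       # blocked: no move
    if grid[r][c] == "egg":
        r2, c2 = r + d[0], c + d[1]
        if not (0 <= r2 < 6 and 0 <= c2 < 6) or grid[r2][c2] not in ("empty", "nest"):
            return scrat                   # egg cannot be pushed
        grid[r2][c2] = "egg"               # egg lands in a nest or empty cell
        grid[r][c] = "empty"
    return (r, c)
```

Under these rules, an egg adjacent to a wall of obstacles can become unpushable, which is what makes the repeated move sequences of Levels 3-6 necessary.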
3 Sonification of the Objects and Listening Test 1

Polyharmonic sounds of MIDI instruments (JWCS MIDI Synthesizer Control) were assigned to the game objects. To facilitate the integration of the separate sounds into a sound image of the gamespace, these sounds also had to resemble the objects they were associated with. Every time the virtual pointer changed the focus of interest, the End sound was played automatically. By hitting the spacebar the player could listen to an overview sound sequence of the field of interest. The dynamic range of the haptic sense accompanying pitch (vertical) head movements is significantly less accurate than in the horizontal direction. Therefore, to facilitate navigation with the camera mouse we added an option to activate the sounds of the rows (a continuous sound of the Percussive Organ, notes 80, 75 and 70 from the top to the bottom row) by pressing and holding the Ctrl key. When the row sound was played continuously, it also functioned as an external memory aid. Alternatively, when the head was in the neutral position, the player had the opportunity to reset the cursor (and the virtual pointer) to the center of the game field with the F key. This option was counted together with the number of spacebar hits (the number of repeats). Before the sound mapping was tested in a gameplay scenario, we carried out listening tests with four subjects to evaluate auditory masking through the recognition
rate of MIDI instruments, pitch and temporal parameters. During the listening test the selected sounds were presented in a randomized order 10 times within a sound sequence of four sounds to simulate an overview sonification of the field of interest as shown in Fig. 3. Each test sound (T) was presented in all possible combinations with other sounds (x). The playback time interval of the sounds was set at 150 or 200 ms, and a 20 ms pause was used between two sequential signals. The End sound was a short wave file of 20 ms with a pitch of 3 kHz. Due to the embedded procedures to process the camera input, the duration of the overview sound sequences had a small variation and did not exceed 750 and 950 ms, respectively. Fig. 4 illustrates the results of testing the Timpani instrument (47), 50th note, in different combinations with other sounds.

[Fig. 4 here. Two panels for Timpani at 150 ms and 200 ms: the number of repeats (top) and the error rate (bottom) for the overview sound sequences Txxx, xTxx, xxTx, xxxT, TTxx, TxTx, xTTx, xTxT, xxTT, TTTx, xTTT, TxTT, TTxT.]
Fig. 4. The results of the listening test averaged over 4 listeners.
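The timing figures above can be checked with a short calculation. The sketch below (the exact gap accounting is an assumption: one 20 ms pause after each of the four sounds, then the 20 ms End sound) stays within the reported 750 ms and 950 ms bounds, and it also enumerates the four-slot test patterns containing at least one test sound T.

```python
from itertools import product

PAUSE_MS, END_MS = 20, 20   # inter-sound pause and End sound length

def sequence_duration(sound_ms, n_sounds=4):
    """Duration of an overview sequence: n sounds, a pause after each,
    then the End sound (assumed gap accounting)."""
    return n_sounds * sound_ms + n_sounds * PAUSE_MS + END_MS

# All four-slot test patterns that contain the test sound T at least once.
patterns = ["".join(p) for p in product("Tx", repeat=4) if "T" in p]

print(sequence_duration(150), sequence_duration(200))
# → 700 900
```

Both nominal durations (700 and 900 ms) leave the small headroom to the reported 750 and 950 ms maxima that the camera-input processing consumes.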
When any of the test sounds was alone and not the first in the sequence, the average number of repeats and the error rate were twofold greater, irrespective of the MIDI instrument. The perceptual performance of the listeners was twofold better when the duration of the sounds was 200 ms. Finally, the following sounds were selected: the 27th note of the Bird Tweet (123) MIDI instrument was assigned to the game character Scrat; the 66th note of the Glockenspiel (9) MIDI instrument was used to sonify Eggs, and its 30th note Empty cells; Nests were sonified with the 55th note of the Hammond Organ (16); the 50th note of the Timpani (47) MIDI instrument was assigned to Obstacles.
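The selected mapping can be summarized as a lookup table of (MIDI program, note) pairs, using the zero-based program numbers given above (the helper `region_overview` is illustrative, not from the original program):

```python
# (MIDI program, note) pairs selected for the game objects.
SOUND_MAP = {
    "scrat":    (123, 27),   # Bird Tweet
    "egg":      (9, 66),     # Glockenspiel
    "empty":    (9, 30),     # Glockenspiel, lower note
    "nest":     (16, 55),    # Hammond Organ
    "obstacle": (47, 50),    # Timpani
}

def region_overview(cells):
    """The (program, note) events for the four cells of a region of
    interest, played back in the Fig. 3 order before the End marker."""
    return [SOUND_MAP[c] for c in cells]

print(region_overview(["egg", "empty", "empty", "nest"]))
# → [(9, 66), (9, 30), (9, 30), (16, 55)]
```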
4 Evaluation of the Game and Listening Test 2

Six non-blind volunteers took part in testing the game. All had normal hearing abilities. Three of the subjects were male and three were female. Their ages ranged from 25 to 38 years. Two of the subjects had some experience with a camera mouse and had taken part in previous research on playing a puzzle game in the absence of visual feedback [1]. No visual feedback was provided during the test.
The participants were asked to keep their eyes closed and were instructed on how to navigate and use the game controls. They were also given warm-up trials (Level 1) to familiarize themselves with the sounds and the input technique. Each trial was limited to 10 minutes to win. A wave file with applause was played back when a win occurred. When a stalemate was detected, the player received feedback through a prerecorded "game is over" wave file and the same level was initiated again. The game could be played on the same level (the F12 key) as many times as needed to complete 8 successful trials. On average, each trial took less than 5 minutes. When ready to play, the player pressed the Z key; the Escape key was used to stop the game. Thus the players could regulate the pace of the test and rest between trials. Every participant played two series of games using the camera-mouse or keyboard-only input technique, presented in a random order. According to the number of levels, each series comprised six one-hour sessions of 10-15 trials per day. Data were collected on 576 games in total: 6 players × 6 levels × 8 trials × 2 techniques. The players took short breaks between trials to discuss problems when needed. The results of the tests are summarized and illustrated in Fig. 5. As can be seen in Table 1, the difficulty of the game is a complex function depending on the composition of the four types of game objects and the number of possible ways to push the eggs into the nests. The game completion time is an integrative parameter which reflects the dynamics of the cognitive and perceptual-motor factors affecting gameplay. Back steps and repeated moves increase the time and the total number of moves. The number of key clicks needed to select one of the nine regions of interest with keyboard input was counted separately, as were the track lengths with camera input. However, none of these parameters can be considered an indicator of specific gameplay skills.
These parameters indicate a highly dynamic process of elaborating the cognitive model of the game and the gameplay strategy. The number of repeats (Fig. 5) in Levels 1, 3 and 6 differed little because, with the acquisition of experience in perceptual integration, the players checked only specific fields of interest near obstacles and nests. The players had better performance (t(23) = 3.774, p