Evaluating Numerical Tabular Data Comprehension Tasks under 'Speech Only Condition' with 'Speech and Pitch Condition'

Rameshsharma Ramloll
Computing Science Department, Glasgow University
MultiVis Internal Report. Status: Work in Progress
[email protected]

ABSTRACT
This report describes an experiment comparing the workload of participants faced with data comprehension tasks under two conditions. In one condition, participants access numerical tabular data through speech only; in the other, they are also given the opportunity to access the numerical data mapped to pitch. We found a significant increase (p < 0.01) in the number of correct answers obtained and a significant decrease (p < 0.01) in the mental demand and frustration of participants when they were given the opportunity to access numerical tabular data as a piano pitch rather than as speech alone.

Keywords
Data visualisation, sound graphs, subjective workload assessment

INTRODUCTION
The purpose of this experiment is to evaluate how participants tackle data comprehension tasks under two auditory conditions, namely (1) speech only (SC) and (2) speech and pitch (S&PC). In particular, we want to investigate whether there are grounds for introducing pitch to make the experience of data browsing more effective. Arguably, the obvious way to represent a table (such as that described in Figure 1) in sound is to use speech feedback to inform users where they are in the table and what is available there. We are investigating whether there is any merit in associating a given numerical value with a representative pitch instead of always having the value read out in speech. In fact, navigating across a row or a column generates a sound graph [1] or tone plot [2] of the kind traditionally associated with line graphs in the visual medium.

[Figure 1: Prototypical auditory tabular data browser]

Apparatus and Stimuli
The application was developed in VC++ using Microsoft's SAPI 4.0 (Speech Application Programming Interface). A MIDI synthesiser generates the 'piano' pitch used to represent a given numerical value in the table. Since we are dealing with MIDI, the numerical data values are truncated so that they lie between 0 and 127, allowing a straightforward mapping of value to pitch. Participants hear the auditory messages through headphones and navigate the table through keyboard input.
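As an illustration of the truncation step just described, the following minimal C++ sketch clamps a raw cell value into the 0 to 127 range expected for a MIDI note number. It is our own illustration rather than the prototype's VC++ source, and the function name is hypothetical.

    // Truncate a raw cell value so that it can be used directly as a MIDI note number.
    // The report states that data values are truncated to lie between 0 and 127.
    int valueToMidiNote(double cellValue)
    {
        if (cellValue < 0.0)   return 0;      // values below the MIDI range map to the lowest note
        if (cellValue > 127.0) return 127;    // values above the range map to the highest note
        return static_cast<int>(cellValue);   // otherwise drop the fractional part
    }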
We next describe the functionality of the various keys and our strategies for assisting navigation of the table.

OVERVIEW OF THE INTERFACE (INPUT)
We primarily use the section of the keyboard typically used for numerical input (Figure 2). The keys used and their functionality are now presented.

[Figure 2: Some input keys used in querying the data]

SPACE BAR = TOGGLING BETWEEN SC AND S&PC MODES
Pressing the space bar toggles between the SC and S&PC modes.

ENTER KEY = OVERVIEW
Pressing Enter once gives a description of the data set in synthetic speech and describes the function of the arrow keys. Example (synthesised speech): "Auditory display shows test results. Vertical arrows select subjects. Horizontal arrows select students."

[Table 1: Typical two-dimensional table that can be browsed using our prototype]

ESCAPE KEY = STOP SPEECH
Pressing Escape once stops any speech output by freeing any queued-up speech messages.

5 KEY (CENTER OF NUMPAD) = READ CURRENT (X, Y) VALUE
Pressing the centre key 5 causes the participant's position in the table, followed by the value of the current cell, to be read out. Example: "Tom, English, thirty seven".

PgUp KEY = JUMP TO BEGINNING OF A GIVEN X
Pressing PgUp brings you to the beginning of the current column. Example: "Jumping to the beginning of French".

END KEY = JUMP TO BEGINNING OF A GIVEN Y
Pressing End brings you to the beginning of the current row. Example: "Jumping to the beginning of Tom".
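To make these bindings concrete, the sketch below maps each query key to the action described above. It is a hypothetical C++ illustration; the enumeration and function names are ours and do not appear in the actual prototype.

    #include <iostream>

    // Hypothetical dispatcher for the query keys described above.
    enum class Key { SpaceBar, Enter, Escape, Centre5, PageUp, End };

    void handleKey(Key key)
    {
        switch (key) {
        case Key::SpaceBar: std::cout << "Toggle between SC and S&PC modes\n"; break;
        case Key::Enter:    std::cout << "Speak an overview of the data set and the arrow-key functions\n"; break;
        case Key::Escape:   std::cout << "Stop speech and free any queued speech messages\n"; break;
        case Key::Centre5:  std::cout << "Read the current position and cell value\n"; break;
        case Key::PageUp:   std::cout << "Jump to the beginning of the current column\n"; break;
        case Key::End:      std::cout << "Jump to the beginning of the current row\n"; break;
        }
    }

    int main()
    {
        handleKey(Key::Centre5);   // e.g. "Tom, English, thirty seven" in the real prototype
        return 0;
    }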
UP AND DOWN ARROW KEYS = EXPLORE Y FOR A GIVEN X (see Table 1)
Pressing the Up and Down arrow keys navigates up and down a column of the table. In the SC mode, the arrow keys cause the labels of rows to be read out. In the S&PC mode, the arrow keys cause the pitch associated with the value of the relevant cell in the current column to be played.

RIGHT AND LEFT ARROW KEYS = EXPLORE X FOR A GIVEN Y
Pressing the Right and Left arrow keys navigates left and right along a row of the table. In the SC mode, the arrow keys cause the labels of columns to be read out. In the S&PC mode, the arrow keys cause the pitch associated with the value of the relevant cell in the current row to be played.

OVERVIEW OF SOUND MAPPING STRATEGY
In the S&PC mode, the pitch is directly proportional to the value of the data, i.e. the higher the value, the higher the pitch. The S&PC mode also involves panning of the sound sources: these are localised along a line joining the left and right ears and positioned according to their position in their respective row or column. Thus, when moving down a column, the first value is associated with a pitch heard in the left ear, subsequent values with pitches localised on a line between the left and right ear, and the last value of the column with a pitch heard in the right ear. The same effect is obtained while navigating across a row: the first value is heard to the left, the last value to the right, and the intermediate values in between.
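The following C++ sketch, written for illustration only and under our own naming, shows one way to compute this mapping: the MIDI note number is the truncated cell value, and the MIDI pan value moves linearly from hard left for the first cell of a row or column to hard right for the last.

    // Sketch of the S&PC sound mapping described above, assuming MIDI conventions:
    // note numbers and pan values both lie in 0..127 (pan 0 = left, 64 = centre, 127 = right).
    struct CellSound {
        int midiNote;  // higher data value -> higher pitch
        int midiPan;   // position within the row or column -> left/right placement
    };

    CellSound mapCell(double cellValue, int indexInLine, int lineLength)
    {
        CellSound sound;

        // Pitch: directly proportional to the (truncated) data value.
        if (cellValue < 0.0)        sound.midiNote = 0;
        else if (cellValue > 127.0) sound.midiNote = 127;
        else                        sound.midiNote = static_cast<int>(cellValue);

        // Pan: first cell hard left, last cell hard right, intermediate cells spaced linearly.
        if (lineLength <= 1)
            sound.midiPan = 64;                                 // a single cell sits in the centre
        else
            sound.midiPan = (indexInLine * 127) / (lineLength - 1);

        return sound;
    }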
In the current design, participants receive an auditory cue that informs them whenever they leave the table while navigating across a row or column. This strategy presumably gives them an idea of the limits of rows and columns. Once a participant is out of the table, pressing the centre '5' key causes a "not in table" message to be read out. When the user wanders away from the contents of the table, pressing the arrow keys produces messages such as "move left", "move right", "move up" and "move down", which guide her back into the table.

PILOT STUDY
Before carrying out the workload assessment, a pilot study was conducted in order to identify early on any obvious problems with the prototype that might hinder the progress of the experiment. We also took this opportunity to carry out a think-aloud procedure [3].

Participant
Subject: Jaimie X. Subject data: visually impaired; left ear hearing 100%, right ear hearing 60%.

Training
The subject found the training time pleasantly short. The commands and modes of interaction were easy to understand and remember. Training time on this occasion did not exceed 5 minutes. The ability to learn the interface quickly and get on with the task was a definite plus.

Positive Comments
1. The sounds are aesthetically pleasing; the subject compared the table browser to a musical instrument.
2. Navigation of the table was fast according to the user, and the feedback speed excellent.
3. Jaimie: 'tones stick better in the mind than spoken data'.
4. Pitch mapping allowed fast identification of trends, maxima and minima.
5. For example, while listening to rape data down the years, Jaimie said 'The police has been doing their job well' without being prompted, illustrating that a downward trend had been recognised immediately.
6. He also finds that identification of values based on pitch, especially in the mid range, 'really works'.
7. In a set task, the subject was able to identify successfully all four minima in the 'Arson curve' in two navigation strokes, one down the column and the other up, the second just as a checking measure.

After the sound spatialisation strategy was explained, the following was observed. The panning strategy was found to be useful for a number of reasons:
1. It allows the user to guess the size of the data set. This is especially useful for large data sets.
2. It also allows fast identification of points of interest and where they occurred.
3. It is more aesthetically pleasing; it surely adds something to the browsing experience.

Negative Criticisms
1. The idea of a no-table zone bordering the table is potentially confusing.
2. A more distinct sound is needed for the end of rows and columns.
3. In the speech mode, the user should be given some control over the degree of verbosity required.
4. In that respect, the use of meta-keys has been suggested to give the user more control over the amount of speech information delivered.
5. For example, in the speech mode, the user should be able to control whether he needs the row or column information all the time.
6. Voice messages associated with a given command should be interrupted as soon as a new command is launched.
7. Low-frequency sounds cannot be distinguished very easily.
8. The sound mapping is simple but needs to be explained if it is to be useful.

Overall impression of the subject
The overall impression was very positive, in the sense that the interface was found to be simple and powerful. The subject suggested that the application would be very useful and hopes to be able to use it to read data from Excel sheets, for example. This call for integration into the subject's current environment illustrates his keen interest in the prototype.

COMPARING WORKLOADS USING NASA TLX
Hypothesis
Workload for a minimal set of data inspection tasks is compared under the 'speech only condition' and the 'speech and pitch condition'. (This is a two-tailed hypothesis; the direction of the effect is not specified a priori.)

Speech only condition (SC)
A table is navigated using the arrow keys. The user is able to get speech feedback about the current position in the table and the value of the relevant cell.

Speech and pitch condition (S&PC)
A table is navigated using the arrow keys. The user is able to get both pitch and speech feedback about the current position in the table and the value of the relevant cell.
Experimental Design
16 subjects are required for this repeated measures design. Participants are subjected to the SC and S&PC conditions as described in Table 2. The order in which the tasks and auditory conditions are presented to the participants ensures that any effects due to inherent task difficulty and practice (i.e. the increase in task-tackling efficiency that comes with familiarity) are minimised. During the first and second sessions, the participants are given data analysis tasks based on a questionnaire. Each session is followed by a NASA TLX test to gather subjective workload information about the task. Our home-grown computerised version of the NASA TLX is used to speed up and facilitate data collection from the participants. We follow the NASA TLX guidelines strictly and calculate the combined workload from the subjective weights obtained after the pairwise comparison of the workload categories. This step allows us to compare the workload values obtained in this experiment with values obtained from other, independent experiments.

Table 2: Experiment schedule. Each of the four groups (A.1, A.2, B.1 and B.2; 4 subjects per group) follows the same schedule: training and explanation (15 mins); Session I: task questions (15 mins), TLX, evaluation (20 mins); Session II: task questions (15 mins), TLX, evaluation (20 mins). Tasks I and II and the SC and S&PC conditions are counterbalanced across groups and sessions.
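As an illustration of the combined-workload calculation mentioned above, the sketch below follows the standard NASA TLX procedure: each category's weight is the number of times it was chosen in the 15 pairwise comparisons, and the combined workload is the weighted mean of the six ratings. This is a generic C++ illustration, not the code of our computerised TLX tool.

    #include <array>

    // Combined NASA TLX workload: weighted mean of the six category ratings,
    // where each weight is the number of wins that category obtained in the
    // 15 pairwise comparisons (the weights therefore sum to 15).
    double combinedTlxWorkload(const std::array<double, 6>& ratings,
                               const std::array<int, 6>& pairwiseWins)
    {
        double weightedSum = 0.0;
        int totalWeight = 0;   // 15 when the pairwise comparisons are complete
        for (int i = 0; i < 6; ++i) {
            weightedSum += ratings[i] * pairwiseWins[i];
            totalWeight += pairwiseWins[i];
        }
        return totalWeight > 0 ? weightedSum / totalWeight : 0.0;
    }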
Experimental Procedure
The data analysis and inspection tasks that participants had to tackle are described below. The questions were designed according to a number of requirements. Firstly, the data analysis should not be too complicated, so that the focus of the task is on the analysis and interpretation of the question rather than on how the tabular data is perceptualised. Secondly, the questions must not favour the peculiarities of our chosen auditory tabular data browser. The two tasks that participants faced in this experiment are described under the headings of Task I and Task II. During the training phase, participants were provided with a different data set, concerning the gross national product of a country over a number of years.

TASK I: Student Performance Analysis
This dataset is about the performance of a number of students in various subjects. Answer the following questions based on the information available in the table.
1. Name the student(s) scoring the highest marks for Biology.
2. The list of students is presented in ascending order of marks for a given subject. Name the subject.
3. Name the subject most likely to have the highest number of passes.
4. Name the student(s) scoring the lowest marks for the Assembly course.
5. In which course was performance particularly poor?
6. Name the student most likely to have the highest total marks.

TASK II: London Crime Statistics
This table is about the type of crime and the number of cases reported in London from 1974 to 2000. Answer the following questions based on the information available in the table.
1. State the year(s) in which the highest number of murder cases was reported.
2. State the year(s) in which the highest number of robbery cases was reported.
3. Which type(s) of crime had a consistently high number (~>50) of cases reported?
4. Which type(s) of crime show(s) a consistently increasing trend?
5. State the year(s) in which the number of hate crime cases was lowest.
6. Which type(s) of crime show(s) a consistently decreasing trend?
Participants
The participants were paid 5 pounds per hour. All are sighted and attempted the tasks with the computer screen switched off. Those who turned up for the experiment included 8 women and 7 men, none of whom had any overtly recognisable auditory impairment. The participants were a mix of computing and information technology postgraduates.

RESULTS
We now compare the scores obtained for the individual categories deemed to contribute to the overall workload of a task according to the NASA Task Load Index. Figures 3 to 11 present the data we collected from the 15 participants who took part in the experiment. Each figure is accompanied by the four parameters obtained from a related t-test applied to our raw data.
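For reference, the related (paired) t statistic reported with each figure can be computed as in the sketch below, given the per-participant scores under the two conditions; the two-tailed p values quoted in the figures come from standard tables or statistics software, not from this code.

    #include <cmath>
    #include <vector>

    // Related-samples (paired) t statistic: t = mean(d) / (sd(d) / sqrt(n)),
    // where d[i] is the difference between the SC and S&PC scores of participant i.
    double pairedT(const std::vector<double>& sc, const std::vector<double>& spc)
    {
        const std::size_t n = sc.size();   // assumes sc.size() == spc.size() and n > 1
        double meanDiff = 0.0;
        for (std::size_t i = 0; i < n; ++i)
            meanDiff += sc[i] - spc[i];
        meanDiff /= static_cast<double>(n);

        double sumSq = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            const double dev = (sc[i] - spc[i]) - meanDiff;
            sumSq += dev * dev;
        }
        const double sd = std::sqrt(sumSq / static_cast<double>(n - 1));
        return meanDiff / (sd / std::sqrt(static_cast<double>(n)));
    }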
[Figure 3: Comparing mental demand. T14 = 3.040, p (two-tail) = 0.009, SC mean = 13.333, S&PC mean = 9.600]

[Figure 4: Comparing physical demand. T14 = 0, p (two-tail) = 1, SC mean = 2.533, S&PC mean = 2.533]

[Figure 5: Comparing temporal demand. T14 = 1.543, p (two-tail) = 0.145, SC mean = 11.333, S&PC mean = 9.600]

[Figure 6: Comparing effort. T14 = 1.681, p (two-tail) = 0.115, SC mean = 27.410, S&PC mean = 21.124]

[Figure 7: Comparing performance. T14 = 1.384, p (two-tail) = 0.188, SC mean = 11.267, S&PC mean = 9.000]

[Figure 8: Comparing frustration. T14 = 3.192, p (two-tail) = 0.007, SC mean = 10.4, S&PC mean = 6.333]
Correct answers obtained

[Figure 9: Comparing correct answers obtained. T14 = -4.012, p (two-tail) = 0.001, SC mean = 3.867, S&PC mean = 5.133]

Number of questions attempted

[Figure 10: Comparing number of questions attempted. T14 = -1.920, p (two-tail) = 0.075, SC mean = 4.667, S&PC mean = 5.467]
Comparing workloads

[Figure 11: Average of TLX scores next to combined workload. T14 = 3.585, p (two-tail) = 0.003]

GENERAL DISCUSSIONS
The results can be summarised as follows.
1. There is a significant decrease in mental demand in the speech and pitch condition compared to the speech only condition.
2. There is a significant decrease in frustration in the speech and pitch condition compared to the speech only condition.
3. There is a significant increase in the number of correct answers obtained in the speech and pitch condition compared to the speech only condition.
4. There is a significant decrease in the combined NASA TLX workload rating in the speech and pitch condition.
5. While the mean TLX scores indicate that pitch lowers effort and temporal demand and improves performance, these effects are not significant.
6. The consistently low physical demand scores indicate that most participants did not regard the task as physically demanding.

Based on these results, we can safely propose that giving participants the opportunity to access numerical information as pitch whenever required improves the effectiveness of the auditory data browsing tool.

Post-experiment discussions with participants reveal that while they found the panning of the data pitches aesthetically pleasing, they did not use it consciously during their tasks. Participants noted that listening to pitches gave them a better overview of the data.

Interviews with the participants also reveal that the significant improvement under the speech and pitch condition, as far as the number of correct answers is concerned, is not matched by their confidence about how successfully they completed a given task. The inability of participants to associate an exact value with a given pitch can perhaps explain this lack of confidence in a given answer. However, there are cases where participants do feel more confident about their answers in the speech and pitch condition: they argue that under this condition they are able to browse a larger data set and are more confident that they have not missed any important parts.

A number of participants found the speech only condition annoying because the auditory messages were long and thus time consuming. In addition, there were a number of situations where they felt the spoken message contained more information than they really needed.

Many participants also suggested that they were often not confident about their mental image of the data set. Some attempted to visualise a table; others simply tackled the exercises without constructing a mental image. A number of participants jotted down notes on paper while attempting the different tasks, although this mostly happened in the speech only condition.

Regarding table navigation, most users found the shortcut keys very useful. Many participants frequently made use of the Home key to go back to some starting point once they felt lost. However, most participants felt that they should not be allowed to leave the table even when they reach one of its edges; the current strategy did not prevent them from getting lost once they overstepped the table's boundary.
FUTURE WORK
There is plenty of room for improving the amount of control participants have over the speech output. This increased speech verbosity control can be achieved by a careful redesign of the input interface. The next prototype will also be redesigned so that participants cannot leave the table by crossing its boundaries; they will still be presented with cues that indicate the limits of rows and columns. Once the lessons learnt from this experiment have been used to inform changes in our next prototype, we plan to replicate the experiment with visually impaired individuals.

ACKNOWLEDGMENTS
We thank all the participants of this experiment.

REFERENCES
1. Mansur, D.L. Graphs in Sound: A Numerical Data Analysis Method for the Blind. Computing Science, University of California, Davis, California, 1975.
2. Bulatov, V. and Gardner, J. Visualisation by People without Vision. In Workshop on Content Visualisation and Intermediate Representations, Montreal, Canada, 1998.
3. Blackman, H.S. Overview: The Use of Think Aloud Verbal Protocols for the Identification of Mental Models. In Proceedings of the Human Factors Society 32nd Annual Meeting, 1988, pp. 872-874.