ENHANCING TEXTFIELD INTERACTION WITH THE USE OF SOUND

Joanna Lumsden, John Williamson & Stephen Brewster
Department of Computing Science
University of Glasgow
17 Lilybank Gardens
Glasgow G12 8RZ
E-mail: {jo,stephen}@dcs.gla.ac.uk

October 2001
TR - 2001 - 99

http://www.dcs.gla.ac.uk/research/audio_toolkit/

1. INTRODUCTION

Despite our multi-sensory capabilities, the human-computer interfaces with which we commonly interact are principally designed to target the visual modality. User interfaces of this nature waste other sensory channels that could be utilised and typically cause users to become visually overloaded (Brewster, 1997, Edwards et al., 1992). Although the addition of (badly designed) sound to graphical user interfaces is often considered irritating (Lumsden and Brewster, 2001), Edwards et al have shown that the introduction of carefully designed sounds can significantly improve user interaction (Edwards et al., 1992, Edwards et al., 1995).

To facilitate the effective construction and integrated use of audio-enhanced[1] user interface components, the Audio Toolkit Project has developed an architecture and library of multimodal user interface widgets based on the Java™ Swing™ libraries (Eckstein et al., 1998). Additionally, in recognition of the fact that developers currently have little or no expertise in the field of audio enhancement of user interfaces, the Audio Toolkit is accompanied by a set of guidelines to assist user interface designers when creating new widgets for inclusion in the Audio Toolkit and to guide them when incorporating the existing toolkit widgets within user interface designs (Brewster et al., 2001, Crease et al., 1999, Crease et al., 2000a, Crease et al., 2000b).

One such widget - the most recent addition to the Audio Toolkit - is the TextField (the MTextField) widget. This, like the majority of the Audio Toolkit widgets, is an audio-enhanced plug-in replacement for its Swing counterpart - the JTextField widget. Unlike the other widgets in the Audio Toolkit, however, its audio feedback has been designed such that it incorporates both non-speech audio and synthesised speech. This report introduces the MTextField widget and provides a high-level outline of its design.
Furthermore, it describes the comprehensive evaluation that was performed in order to assess the audio feedback incorporated within the MTextField widget and discusses the results of this evaluation.

2. THE MTEXTFIELD WIDGET

Despite its relatively straightforward and well-defined rôle, the MTextField widget is remarkably complex in terms of the aspects of its behaviour that are potential candidates for audio-enhancement. Not only does the MTextField widget exhibit sonifiable behaviour similar to that of less complex widgets - for example, mouse over/mouse click on etc. for an MButton widget - but it also presents opportunities for sonification of navigation activities within its text and for sonification of a range of text manipulation activities. This section outlines the current design of the MTextField widget, highlighting how the various forms of its behaviour are accommodated.

Based on the visual appearance and fundamental behaviour of the Swing™ JTextField widget, the MTextField widget is designed such that user interface developers can - using the Audio Toolkit architecture - include additional modalities in the widget's design in order to make use of more than just our visual sense.[1] By separating the implementation of the MTextField widget's presentation from its underlying functionality, the Audio Toolkit makes it possible to alter the presentational aspect of the widget without having to effect changes to its core functionality. One design for the MTextField widget's presentation is included with the Audio Toolkit and it is this design (and its evaluation) that will be discussed throughout the remainder of this report.

[1] And ultimately feedback in other modalities.

Given the natural mapping that exists between the content of an MTextField widget (i.e. its text) and the spoken word, synthesised speech was an obvious output presentation type for use in the design of the widget. By adopting synthesised speech alongside the more familiar non-speech audio (Brewster and Crease, 1997, Brewster, 1998, Brewster and Crease, 1999, Crease and Brewster, 1998, Crease and Brewster, 1999), the potential for audio-enhancement in the design of the MTextField widget feedback was broadened considerably, as is discussed below.

One of the principal interaction activities with respect to a textfield - which often causes frustration, especially where screen resolution and real estate are severely restricted - is the placement and movement of the cursor within the text. This activity is represented audibly within the MTextField widget by mapping cursor position to a scalar value on the western musical scale. Furthermore, this allows selection of text within the widget to correspond to a pair of values on this scale - the start point and end point position of the selection. The MTextField widget utilises a combination of pitch and stereo panning to achieve the mapping of positional information to audio feedback: cursor position is mapped to pitch, which ascends from left to right (as on a piano keyboard), and to stereo panning, moving from hard left at the start to hard right at the end.

To avoid using pitches that are either too high or too low for typical human perception, and to avoid encountering pitch distortion at the extremes of the mapping, the range of pitch selected for use in this mapping spans two octaves - stretched to fit the length of the textfield and starting at middle C. Although this means that for very long pieces of text there may be little differentiation between neighbouring positions, problems that may occur as a result of this should be minimal given that textfields are not normally used to hold text of more than 50 characters. That said, using the pitch stretching technique can result in aliasing, which effectively means that two neighbouring positions may appear to have exactly the same tone despite the fact that they should theoretically be a significant proportion of a semitone apart. To avoid this, the MTextField widget uses microtonal feedback where positions are directly mapped to frequencies; the result is, in effect, n-tone equal temperament where n is the number of positions in the textfield. Since cursor movement will always cause a pitch change in the correct direction, this gives continuous logical feedback.[2]

As mentioned above, the MTextField widget presents many behaviours which are candidates for sonification, each of which must be independently distinguishable from the others. Since Edwards, Brewster and Wright have shown that timbres are easy to recognise and provide an ideal way to distinguish between sounds (Edwards et al., 1995), the MTextField widget uses timbres to provide differentiable feedback for different activities. When interacting with an MTextField widget, users are likely to be generating events (which map to audible feedback) at a very fast rate, and so the audio feedback design for the MTextField widget uses timbres which have short attacks - for example, pizzicato strings.
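The microtonal position-to-pitch and position-to-pan mapping described above can be sketched as follows. This is an illustrative reconstruction from the description, not the Audio Toolkit's actual code: the class and method names are assumed, and middle C is taken as 261.63 Hz. Equal frequency ratios between neighbouring positions give the n-tone-equal-temperament effect, so moving the caret always shifts the pitch in the correct direction.

```java
// Sketch of the positional audio mapping: caret positions are spread
// microtonally across two octaves starting at middle C, and panned from
// hard left (start of field) to hard right (end of field).
final class PositionalMapping {

    static final double MIDDLE_C_HZ = 261.63; // bottom of the two-octave range
    static final double OCTAVES = 2.0;

    // Frequency for caret position i in a field with n caret positions:
    // equal ratio between neighbours, i.e. n-tone equal temperament.
    public static double frequencyHz(int position, int positions) {
        double fraction = (positions > 1) ? (double) position / (positions - 1) : 0.0;
        return MIDDLE_C_HZ * Math.pow(2.0, OCTAVES * fraction);
    }

    // Stereo pan for position i: -1.0 = hard left, +1.0 = hard right.
    public static double pan(int position, int positions) {
        double fraction = (positions > 1) ? (double) position / (positions - 1) : 0.0;
        return -1.0 + 2.0 * fraction;
    }
}
```

Note that the last position sounds exactly two octaves (a factor of four in frequency) above the first, regardless of the field's length.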
Given the complexity of the MTextField widget and its associated audio feedback design, the following discussion of the design - which is centred around the MTextField widget's behaviours - considers non-speech-only sounds and the combined use of speech and non-speech sounds separately.

2.1 Non-Speech Only Feedback

Caret Movement

Caret movement[3] uses the mapping described above; every time the caret position is changed, a note is played with the pitch and pan properties outlined. If the position equals the very start or end point of the textfield, an interval is formed with the note corresponding to the position itself and the note a perfect fifth below if the caret has moved to the start, or a perfect fifth above if the caret has moved to the end. This interval has been included in the feedback design to indicate to users that they have reached a special position within the textfield at which their interaction options have changed - that is, they cannot move any further in their current direction. The timbre used to present this feedback is the harp, which has a short attack and an appropriate pitch range.

[2] It is recognised that, for those users who are particularly accustomed to the 12-tone equal temperament scale, this may be slightly disconcerting to begin with.
[3] That is, movement of the current cursor position relative to the text in the textfield.

Selection

Using the Java™ Swing™ API, text selection can be performed using two mechanisms: (1) clicking and dragging the mouse cursor from the intended start point of the selection and releasing the mouse cursor at the intended end point of the selection; and (2) using the navigation keys in conjunction with the shift key to select text relative to a given caret location within the text. Either way, the result is an identified region of text with special properties; it can be cut, copied, deleted etc. When using mechanism one, at any point after starting to drag the mouse cursor and prior to releasing the mouse, an area of the text is 'highlighted'. Highlighted text itself has no special properties, but when the mouse button is released, it becomes selected text - it is, in effect, tentatively selected text which becomes a confirmed selection when the mouse button is released. To reflect the subtle difference between highlighting and selecting text, the MTextField widget incorporates two different but closely related feedback designs.

To indicate selection (according to either of the mechanisms outlined above), the MTextField widget plays a pair of notes corresponding to the start and end points of the selection, mapped as in the positional design described previously.
The sounds are played in the order in which the end points of the selection are defined to allow for determination of the 'polarity' of the selection - for example, if the two notes are ascending, the selection has been started at the left and extended to the right, and vice versa. An additional sense of the length of a selection is given by inserting a delay between the playing of these two notes; the length of this delay starts at 60ms and increases by 8ms for every additional character included in the selection, capping at 500ms. This maximum limit is enforced since increasing the delay any further would provide little additional information and would potentially risk confusion between the selection feedback and that relating to future events. Selection-oriented feedback is presented using a vibraphone timbre to enable users to distinguish it from the other positional sounds used.

Continuous feedback representing the extents of the affected region is used to reflect the actual process of highlighting text. The sounds used are very similar to those for selection but there is no delay between the notes played and an organ timbre is used. By generating a sound every time the extent of the highlighted region changes, the effect is that of a continuously changing sound as the highlighted region is dragged out. When the mouse is finally released, the standard selection sounds described above are played to indicate the existence and parameters of a new selection.

Deletion

The deletion sound is identical to that for caret movement since it represents a similar action, but uses a pizzicato string timbre to reflect the semantic difference between the two actions and distinguish it from ordinary caret movement.
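The selection-length delay described above reduces to a small calculation; a minimal sketch follows (the class and method names are hypothetical, but the 60ms base, the 8ms per additional character, and the 500ms cap are taken directly from the text):

```java
// Delay between the two selection notes, as a function of selection length:
// 60ms for a one-character selection, +8ms per additional character, capped
// at 500ms so the second note never trails far behind subsequent events.
final class SelectionDelay {
    public static int delayMs(int selectionLength) {
        if (selectionLength <= 1) return 60;
        return Math.min(60 + 8 * (selectionLength - 1), 500);
    }
}
```

Under this formula the cap is reached at a 56-character selection, comfortably beyond the 50-character contents that textfields typically hold.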

2.2 Speech and Non-Speech Feedback

As mentioned previously, the MTextField widget differs from the others in the Audio Toolkit by virtue of the fact that it uses synthesised speech as well as MIDI sound to present feedback to the user. Although this allows much richer information to be communicated, it is at the expense of speed. In recognition of this, the MTextField widget avoids annoying speech build-ups by working on the basis that any event that produces audio feedback silences any speech currently being read, thus speeding up the interaction and allowing the audio feedback to keep pace with user activity.

Silence

A user can request that current speech (that is, speech that is currently being spoken) be silenced via use of the ESC key. This does not disable future synthesised speech feedback but takes into account the fact that a user may wish to instantaneously mute any given speech.

Cut

When a user performs a cut operation, speech is used to indicate the text that has been cut from within the MTextField widget; the speech synthesiser says 'Cut', reads out the text that has been cut, and then provides a spelling of this text. The need to spell out the cut segment of text is enforced by the fact that the speech synthesiser may not always cope sensibly with unusual text segments. Additionally, since a cut also represents a deletion, the text deletion feedback sound (see above) is played.

Copy

Feedback for the copy action is very similar to that for cut. However, the speech synthesiser precedes the announcement of the copied text and its spelling with the word 'Copy' instead of 'Cut'. Additionally, the deletion sound is replaced with the selection change sound, played using a music box timbre rather than the vibraphone timbre. The audio feedback for the copy action is considered especially useful since there is typically no visual feedback for this event in standard textfield widgets.

Paste

Feedback for the paste action is again very similar to that for both cut and copy, excepting that a unique interval - the two notes[4] being A4 and D4 - is played using the music box timbre to distinguish this action from the others.

[4] Standard notation is used to represent musical notes, with octave numbering starting at 0.

Targeting and Hovering

As previously discussed, the visual presentation of standard textfields does not typically make it easy for users to determine the exact caret position when they click the mouse cursor within the textfield; since caret positions lie between characters, it can be very hard to determine on which side of a character the caret will appear, especially with variable width fonts or small screen or low resolution displays. To alleviate this problem, the MTextField widget is enhanced with 'targeting sounds' - that is, a click sound which is played every time the mouse cursor is moved over a new position in which it is possible to place the caret. To avoid annoyance and/or distraction and to maximise the usefulness of this feedback, the targeting sound is only played when the mouse speed drops below approximately 40 pixels/sec as the user slows down to target. Additionally, if the user hovers the mouse cursor over any candidate caret position within the MTextField widget for more than 0.8s, the speech synthesiser reads out: (1) the word over which the mouse cursor is hovering; and (2) the letters of that word between which the mouse cursor is hovering - that is, the precise location at which the caret will appear should the user click the mouse without moving it further. This permits accurate targeting, albeit at the expense of speed.

Typing

As users type characters into the MTextField widget, the characters are read back. There is a short delay of 60ms before this speech starts to ensure that, if a user is typing quickly, the speech is cut off prior to starting and stuttering is thereby avoided. If speech feedback has been disabled, the caret movement sound is used in place of this feedback since character inserts move the caret position.
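The typed-character read-back policy above amounts to a debounce: each keystroke cancels any utterance still waiting out its 60ms delay. A minimal, clock-driven model of that behaviour is sketched below; the class and method names are illustrative (the real widget drives an actual speech synthesiser rather than collecting strings).

```java
import java.util.ArrayList;
import java.util.List;

// Model of the typing read-back policy: a typed character is only spoken if
// no further character arrives within the 60ms deferral window, so rapid
// typing cancels pending utterances before they start (avoiding stutter).
final class SpeechQueueModel {
    static final long TYPING_DELAY_MS = 60;

    private String pending;      // utterance waiting out the 60ms delay
    private long pendingDueAt;   // model time at which it would start speaking
    private final List<String> spoken = new ArrayList<>();

    // A character was typed at model time nowMs: cancel any pending
    // utterance and schedule this one 60ms from now.
    public void onCharTyped(char c, long nowMs) {
        pending = String.valueOf(c);
        pendingDueAt = nowMs + TYPING_DELAY_MS;
    }

    // Advance the model clock: a pending utterance whose delay has elapsed
    // is actually spoken.
    public void tick(long nowMs) {
        if (pending != null && nowMs >= pendingDueAt) {
            spoken.add(pending);
            pending = null;
        }
    }

    public List<String> spoken() { return spoken; }
}
```

With this model, typing 'c' at t=0 and 'a' at t=30 results in only 'a' being spoken, since the second keystroke lands inside the first one's deferral window.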

2.3 Additional On-Demand Feedback

The inclusion of speech feedback within the MTextField widget allows for a range of additional functionality that would otherwise not normally be possible via other feedback techniques. These additional commands are outlined below.

Read Word

This control - which is bound, by default, to the Ctrl+Alt+W key combination - causes the speech synthesiser to read out and then spell the word within which the caret is currently located. For example, if the word is optical and the caret is positioned between the p and the t, the synthesiser will say: 'Current word is optical, cursor is between p and t [pause] spelled as o, p, t, i, c, a, l'. Alternatively, if the user does not interact with the MTextField widget for a period of 2.5s (or more) and there is no selected text, this speech feedback is produced automatically. This 2.5s delay time was selected after a series of informal tests to identify a timing that would avoid annoying users whilst still making the feedback available within an acceptable and useful time period.

Read Selection

Bound by default to the Ctrl+Alt+S key combination, this control causes the speech synthesiser to read out the positional location of the two end points of the selection, the contents of the selection, and the spelling of the contents of the selection. If, for example, a textfield contained 'Operation eight' and the current selection covered 'ation eigh', the synthesiser would say: 'Selection from 5 to 16 contains ation eigh, spelled as a, t, i, o, n, space, e, i, g, h'. Alternatively, if a selection exists and the user does not perform any operations on the textfield for a period of 2.5s or more, this selection feedback will be read out automatically.

Read Field (Contents)

The read contents control - which is bound, by default, to the Ctrl+Alt+R key combination - causes the speech synthesiser to speak back the entire contents of the MTextField widget. For example, if the textfield contained 'Sky pattern', 'Textfield contents are sky pattern' would be read back.

Read Position

This control causes the speech synthesiser to give a verbal indication of the current caret position within the textfield. It reads out an approximate percentage position (to the nearest 10%), the index of the characters between which the caret is positioned (relative to the associated word), and the index of the word itself. So, for example, positional feedback of this nature might be something like: 'Cursor is at around 70%, between characters 3 and 4 of word 6'. This feature, which is bound by default to the Ctrl+Alt+C key combination, does not reveal information about the actual contents of the MTextField widget and so is safe to use in environments where privacy is an issue.

Read Clipboard Contents

Providing functionality that is absent from the JTextField widget supplied within the Java™ Swing™ library, and bound by default to the Ctrl+Alt+B key combination, this control causes the speech synthesiser to read back the current contents of the clipboard buffer. So, if for example the clipboard buffer contains the word 'reason', the speech synthesiser would read out: 'Clipboard contents are reason'.
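The "approximate percentage position (to the nearest 10%)" used by the Read Position control is a simple rounding of the caret's fractional position. The sketch below illustrates only that rounding; the exact character and word indexing scheme of the real widget is not specified here, and the names are assumptions.

```java
// Percentage through the field, rounded to the nearest 10%, as announced by
// a Read-Position-style control ("Cursor is at around 70%...").
final class ReadPosition {
    public static int approximatePercent(int caretPosition, int textLength) {
        if (textLength == 0) return 0;  // empty field: treat as 0%
        double percent = 100.0 * caretPosition / textLength;
        return (int) (Math.round(percent / 10.0) * 10);
    }
}
```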

2.4 Feedback Levels

Speech Levels

Whilst the above feedback design acknowledges the potential interaction improvement that may be achieved as a result of introducing speech into the MTextField widget's output presentation, it is also recognised that this form of feedback is not always appropriate. A user may, for example, find the synthesised speech distracting, or alternatively spoken feedback may reveal confidential information to those who may be listening. To accommodate this, the presentation design for the MTextField widget incorporates a number of speech levels, controlled by a sensor. Using this, either the user or the application of which the MTextField widget is part can control the amount of speech feedback that is provided. The speech levels are:

• Off - Speech is disabled.

• Private - Only speech that will not reveal the actual contents of the MTextField widget or the clipboard buffer will be used.

• Requested - Speech is enabled but is only provided when explicitly requested by the user - for example, using the Read Field control.

• Some Automatic - All the speech at the Requested level is enabled, with the addition of the typing and clipboard read back. The automatic read back of the current word and current selection is disabled.

• Full Speech - All of the speech functionality is available.

The following table summarises the availability of speech-related feedback at each of the above levels.

FEEDBACK                   OFF   PRIVATE   REQUESTED   SOME AUTOMATIC   FULL SPEECH
Auto read word              -       -          -              -             Yes
Auto read selection         -       -          -              -             Yes
Hover                       -       -          -             Yes            Yes
Typing                      -       -          -             Yes            Yes
Cut                         -       -          -             Yes            Yes
Copy                        -       -          -             Yes            Yes
Paste                       -       -          -             Yes            Yes
Read Word                   -       -         Yes            Yes            Yes
Read Selection              -       -         Yes            Yes            Yes
Read Field (Contents)       -       -         Yes            Yes            Yes
Read Clipboard Contents     -       -         Yes            Yes            Yes
Read Position               -      Yes        Yes            Yes            Yes

Table 1 - Speech levels and corresponding feedback
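The speech levels behave cumulatively: each level permits at least what the more restrictive levels permit. A minimal sketch of how such gating could be modelled is shown below; the enum and method names are illustrative, and the exact per-level sets encode one plausible reading of Table 1 rather than the toolkit's actual API.

```java
// Gate speech feedback by the current speech level. Levels are ordered from
// most restrictive (OFF) to least restrictive (FULL_SPEECH).
final class SpeechLevels {
    public enum Level { OFF, PRIVATE, REQUESTED, SOME_AUTOMATIC, FULL_SPEECH }

    public enum Feedback {
        AUTO_READ_WORD, AUTO_READ_SELECTION, HOVER, TYPING, CUT, COPY, PASTE,
        READ_WORD, READ_SELECTION, READ_FIELD, READ_CLIPBOARD, READ_POSITION
    }

    public static boolean isSpoken(Feedback f, Level level) {
        switch (level) {
            case OFF:
                return false;
            case PRIVATE:
                // Read Position reveals no contents, so it is safe to speak.
                return f == Feedback.READ_POSITION;
            case REQUESTED:
                // Only speech the user explicitly asks for.
                return f == Feedback.READ_POSITION
                    || f == Feedback.READ_WORD
                    || f == Feedback.READ_SELECTION
                    || f == Feedback.READ_FIELD
                    || f == Feedback.READ_CLIPBOARD;
            case SOME_AUTOMATIC:
                // Everything except the two automatic read-backs.
                return f != Feedback.AUTO_READ_WORD
                    && f != Feedback.AUTO_READ_SELECTION;
            case FULL_SPEECH:
                return true;
        }
        return false;
    }
}
```

An application embedding the widget could consult such a predicate before handing any utterance to the synthesiser.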

Fidelity

FEEDBACK: Under change; Hover; Insert; Type; Mouse over; Highlight; Select; Delete; Caret Move; Focus; Action event (e.g. enter)
FIDELITY LEVEL: LOW, MEDIUM, HIGH

Table 2 - Fidelity levels with corresponding audio feedback levels

In accordance with the other widgets in the Audio Toolkit, adjusting the fidelity parameter for the audio module in which the output presentation for the MTextField widget is defined changes the amount of feedback the MTextField widget provides. Table 2 maps fidelity levels to the feedback that will be provided at each level. Since requested speech feedback (e.g. Read Selection etc.) is not affected by fidelity settings by virtue of the fact that it must be explicitly requested by the user, it is not listed in Table 2. If it is necessary to prevent this type of feedback from being generated (for example, for reasons of privacy), the speech level controls can be used.

3. EVALUATING THE MTEXTFIELD WIDGET

Once its design and implementation were complete, the MTextField widget was subjected to a comprehensive, controlled evaluation. This section outlines the evaluation process itself, and the following section discusses the results.

3.1 Experimental Design

In essence, the evaluation aimed to compare the standard (non-audio) JTextField widget and the MTextField widget. Since the MTextField widget is an audio-enhancement of the JTextField widget, this basically meant evaluating the MTextField widget with all sounds disabled (hereafter referred to as the no audio condition) against the MTextField widget with sounds enabled. Since the MTextField widget is the first in the Audio Toolkit to incorporate both non-speech sounds and synthesised speech in its feedback design, the evaluation of the audio-enhanced condition was further separated into evaluation of two independent conditions: the non-speech audio condition and the full audio condition (which included both non-speech and speech audio). Hence, the evaluation was a three-condition experiment where the conditions were classified as: no audio, non-speech audio, and full audio.

To reflect the fact that textfield use on small screen and/or mobile devices can be problematic due to restrictions on the size of the widget, and to provide a realistic set of conditions for the evaluation, a Stylistic pen-based laptop with an on/touch-screen keyboard was used to facilitate interaction with a simple application designed to provide a controlled environment for subjects to interact with the MTextField widget. The main component of the application was the MTextField widget, which was set to be 8 characters in length and was centred in the display area. The remainder of the display area was either blank (and non-functional) or contained the on/touch-screen keyboard. The size of the MTextField widget was limited to simulate restricted screen space on devices like PDAs. A screen shot of the application is shown in Figure 1.


Figure 1 - Screen dump of the evaluation application showing the MTextField widget and customised on/touch-screen keyboard

To assess the use of the MTextField widget across the range of available features, three task types were identified: data entry; data editing; and clipboard (only) manipulation. The first two are self-explanatory; the third refers to rearranging pre-scrambled data using only the cut, copy, and paste operations (text entry and deletion were disabled for the duration of this task type). Together, these task types are representative of the kind of textfield manipulation that is typical within user interfaces. For the purpose of the evaluation, and to avoid confusion, task is taken to refer to the actual task types (data entry etc.) and individual instances of each task (type) - for example, entering a word correctly within the textfield - are referred to as entries.

Since there were three different experimental conditions (no audio, non-speech audio, and full audio), there were six permutations of these conditions that had to be evaluated. To minimise interference from learning effects, the experiment was counterbalanced by allocating an equal number of participants to each permutation. This allocation process was sufficiently random to avoid biasing the experiment; the first subject to volunteer for the experiment was randomly allocated to one of the permutations, and the remainder of the subjects were then allocated to each remaining permutation in sequence in the order in which they signed up to take part in the experiment. Allowing three participants per permutation meant that eighteen subjects were required in total. Subjects were acquired by means of advertisements in the Departments of Computing Science and Psychology in the University of Glasgow and were reimbursed for taking part in the experiment.
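The counterbalancing scheme just described can be sketched as follows: the six orderings of the three conditions, with subjects assigned to permutations in rotation from a random starting point (three subjects per permutation yields the eighteen participants). The names are illustrative, not taken from the evaluation software.

```java
import java.util.List;

// Counterbalanced allocation: six orderings of the three audio conditions,
// assigned to subjects round-robin from a randomly chosen first permutation.
final class Counterbalance {
    static final List<List<String>> PERMUTATIONS = List.of(
        List.of("no audio", "non-speech audio", "full audio"),
        List.of("no audio", "full audio", "non-speech audio"),
        List.of("non-speech audio", "no audio", "full audio"),
        List.of("non-speech audio", "full audio", "no audio"),
        List.of("full audio", "no audio", "non-speech audio"),
        List.of("full audio", "non-speech audio", "no audio"));

    // Condition order for the k-th subject to sign up (0-based), given the
    // randomly selected permutation index for the first subject.
    public static List<String> orderForSubject(int subjectIndex, int firstPermutation) {
        return PERMUTATIONS.get((firstPermutation + subjectIndex) % PERMUTATIONS.size());
    }
}
```

With eighteen subjects, each of the six orderings is used by exactly three subjects regardless of which permutation the first volunteer happened to receive.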
To provide realistic but not necessarily familiar data for the experiment, words of five characters or more were selected in a semi-random fashion from a dictionary; the selection process was not entirely random since words were selected according to length and were filtered in order to reduce the chance/effect of subject familiarity.[5] Each data set - which contains the same number of words in total - comprises the same number of words of each length, presents the words in the same sequence in terms of word length and, for the data edit and clipboard manipulation tasks, presents equivalent data edit/clipboard manipulation requirements relative to the lengths of the words (which, as already mentioned, are presented in identical word length order). In essence, therefore, without using the same words in each set - which would have incurred considerable bias from the learning effect - the data sets are as similar as possible. Each data set was attributed to one of the three experimental conditions so that all subjects would use the same data relative to the same condition, and the task order was maintained throughout all experimental conditions - that is, data entry followed by data editing followed by clipboard manipulation. The data sets used can be seen in Appendix A.

[5] Additionally, a very large proportion of longer words are medical terms which, it was felt, were too specialised to be appropriate for this experiment and so were also filtered from the words selected.

In recognition of the fact that subjects were unlikely to be completely familiar with using on/touch-screen keyboards, prior to commencing the experiment itself subjects were given a short time to practise with the on/touch-screen keyboard, which had been designed specifically for the purpose of the evaluation. At the start of each experimental condition, subjects undertook a short (1 minute) tutorial to familiarise themselves with the condition's feedback design for the textfield. For each task within each condition, subjects were given 4½ minutes in which to complete as many of the entries listed on the handouts as possible.[6] For the data entry tasks, the MTextField widget contained nothing prior to each activity; for the data editing and clipboard manipulation tasks, the system pre-entered data into the MTextField widget before each entry and it was this data that the subjects were required to edit/manipulate respectively in order to achieve the words listed on the relevant handout. The system prevented subjects from progressing onto the next entry in the relevant list until the current entry had been completed correctly.[7] This required that subjects were not only fast but also accurate. The tasks were time-restricted not only to place subjects under an element of temporal pressure to ensure they worked as fast as possible, but also to control the length of the experiment. When they had completed all three tasks in each condition, each subject was asked to fill out a NASA TLX workload questionnaire.
As discussed in later sections of this report, the results of these completed questionnaires give an indication of the workload the participants experienced in each of the different conditions and thereby allow the conditions to be compared in this respect.

3.2 Data Capture

Besides the NASA TLX workload data outlined above, the evaluation captured detailed data concerning subjects' low-level interaction with the MTextField widget. Essentially, the evaluation application included listeners that logged timed data to a series of output files - for example, timed activity submission and success, and the number of keystrokes required for each correct submission. Separate logs were maintained for each subject and, relative to each subject, for each condition and task type pairing. The analysis of the results of the evaluation is discussed in the following section.

4. THE RESULTS

The evaluation of the MTextField widget, as described above, had two main aims: to determine the effect of the various levels of feedback on (1) the speed with which users interact with the MTextField and (2) the accuracy with which users interact with the MTextField. Additionally, for each level of audio feedback, it aimed to assess and compare the manner in which users rated their overall perceived workload and, for the full audio condition, to identify the usefulness of the special feedback functions provided.

Given its novelty (and complexity) - both with respect to GUI textfields per se and, with respect to the other widgets in the Audio Toolkit, in terms of its integrated use of both non-speech and synthesised speech feedback - it was hard to predict, and therefore define succinct hypotheses regarding, how the levels of MTextField feedback would affect the speed and accuracy with which users interact with the widget. Similarly, it was difficult to anticipate how the different feedback would affect users' perceived overall workload. It was hoped that the evaluation would show that the audio-enhanced MTextField - both non-speech only and integrated non-speech and synthesised speech - increases the accuracy with which users interact with the widget. It was, however, unclear whether this interaction improvement would be at the expense of speed.[8] Additionally, it was hypothesised that the MTextField would reduce users' overall experience of workload. The remainder of this section discusses the findings of the evaluation with respect to the above investigative issues.

[6] That is, the evaluation application timed the interactive session and automatically prevented further interaction after 4½ minutes.
[7] A correct entry was flagged to the subjects by means of a dialog box which informed them that they had been successful and, upon being cleared by the subjects, allowed them to proceed onto the next entry. Failure to complete an entry correctly was met with no such feedback, to force the subjects to recognise that they had been unsuccessful and to identify where they had made an error and correct it.
[8] It was recognised that, due to technological restrictions, the inclusion of synthesised speech in the MTextField feedback design would inhibit the operational speed of the widget and thereby potentially adversely affect the measures of subjects' interaction speeds.

Effect On Interaction In Terms of Speed & Accuracy

Accuracy

As stated above, it was hypothesised that – potentially at the expense of speed – the audio-enhanced versions of the MTextField widget would improve the level of accuracy at which subjects performed their given tasks. A correctness ratio was used as a measure of accuracy for the purpose of comparing subjects' performance across the different audio conditions. For each task type, this correctness ratio was calculated for each subject using the following equation:

  Correctness Ratio = Total Number of Submission Attempts / Total Number of Correct Submissions

According to the above, a higher ratio is worse than a lower ratio. The results per subject according to audio condition and task type are shown in Figure 2; for the averages across all subjects see Figure 3.
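For concreteness, the calculation can be sketched as follows; this is an illustrative snippet rather than part of the original evaluation software, and the function name and figures are hypothetical:

```python
def correctness_ratio(submission_attempts, correct_submissions):
    """Ratio of total submission attempts to correct submissions.

    A ratio of 1.0 means every attempt was correct; higher values
    indicate more failed attempts per correct entry (i.e. worse accuracy).
    """
    if correct_submissions == 0:
        raise ValueError("at least one correct submission is required")
    return submission_attempts / correct_submissions

# Hypothetical example: a subject needed 5 attempts to achieve 4 correct entries.
print(correctness_ratio(5, 4))  # 1.25
```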

Figure 2 – Correctness ratio per subject for submissions according to task type and audio condition
[Three bar charts – one per audio condition (full audio, no audio, non-speech audio) – each plotting the correctness ratio (0.00–5.00) achieved by each subject for the data entry, data edit, and clipboard tasks.]


Figure 3 – Average correctness ratios for submission according to task type and audio condition
[Bar chart of average correctness ratio (0.00–2.00) for each task type under each audio condition; the listed per-task averages irrespective of audio condition are: data entry x̄ = 1.177, data edit x̄ = 1.415, clipboard x̄ = 1.348.]

When compared using a Two Factor ANOVA, the nature of the task was found to have a statistically significant (p = 0.013) effect on the correctness ratios achieved by the various subjects, irrespective of the audio condition under which the tasks were performed. That is, audio condition was not identified as having a significant effect on correctness ratios either on its own or in combination with task type. Figure 3 shows the average correctness ratios achieved across all subjects with respect to task type and audio condition. The listed figures represent the average correctness ratios achieved for each task type irrespective of audio condition. When these averages for task type (shown to have a significant effect on correctness ratio irrespective of audio condition) are compared, it emerges that subjects perform data entry tasks significantly better (p = 0.012) than data editing tasks. There is, however, no significant difference between: (1) their accuracy when performing data entry and clipboard manipulation tasks (p = 0.099); and (2) their accuracy when performing data editing and clipboard manipulation tasks (p = 0.69).

Unfortunately, on the basis of these results, this evaluation has not identified anything significant in terms of the audio enhancement of the MTextField widget with respect to its ability to improve subject accuracy when performing various different tasks. That said, however, the findings illustrate that users may potentially require greater assistance from feedback when editing the contents of an MTextField widget than when performing other tasks. Although it was not the intention of the evaluation to assess the merits of textfield interaction with respect to task type outwith the influence of audio condition, these findings highlight the relative strengths and weaknesses of GUI textfields per se when used in an interface with restricted space and resolution.
This information, in itself, is useful in the future design of textfield feedback - particularly for the audio sensory channel. Albeit not statistically significant, when the average correctness ratios according to task type and audio condition as shown in Figure 3 are compared, there is some indication that the full audio condition may have the potential - given additional design and evaluation - to improve the accuracy of textfield interaction. On the basis of these results, however, this cannot be claimed with any statistical backing and so must be considered only an informal observation at this point in time. It is, unfortunately, only possible to conclude that the results of this evaluation alone do not confirm the hypothesis that the audio-enhanced MTextField widget improves the accuracy with which users interact with GUI textfields.

Speed

Consider now the effect of audio condition with respect to the speed with which subjects are able to achieve correct entry submission across the various task types. Figure 4 shows, according to task type and audio condition, the average time taken by subjects to achieve the submission of a correct entry and the average time taken by subjects to attempt an entry submission (regardless of whether or not it was correct).


Figure 4 - Average times (in secs), according to task type and audio condition, taken by subjects: (1) to achieve a correct submission; and (2) to attempt a submission
[Bar chart (0–60 secs) showing, for each task type (data entry, data edit, clipboard), the average time to achieve a correct submission and the average time per submission attempt, under each of the three audio conditions.]

Consider first the average times with respect to correct submissions. When compared using a Two Factor ANOVA, both task type on its own and task type in combination with audio condition were found to have a significant effect on the speed at which the subjects achieved correct entry submissions (p < 0.001 and p = 0.019 respectively). For the three task types, irrespective of audio condition, the average times required across all subjects to achieve a correct submission are as follows:

Data entry: x̄ = 13.8 secs; Data editing: x̄ = 21.3 secs; Clipboard manipulation: x̄ = 52.0 secs.

A pairwise comparison of average submission times with respect to task type alone (using the Tukey test) highlights that there is a statistically significant difference between each of the different task types. That is, subjects achieve correct submission for data entry tasks significantly faster than for data editing tasks (p = 0.005) and significantly faster than for clipboard manipulation tasks (p < 0.001). Similarly, subjects achieve correct submission for data editing tasks significantly faster than for clipboard manipulation tasks (p < 0.001). Once again, although it was not the intention of this study to evaluate the strengths and weaknesses of textfields in isolation from audio condition, these results highlight the task-related areas of textfield interaction that would most benefit from redesign and/or consideration should it prove essential to increase the speed with which accurate textfield interaction is achieved in a restricted/low resolution display.

Although audio condition alone was not found to have a significant effect on the average time in which subjects achieved correct submissions, in combination with task type it was found to be significant (see Figure 5). The significant effect of task type on its own has already been discussed above. When the combined effect of task type and audio condition is examined - aside from the task-influenced significant differences between audio type and task - there is one particular set of comparisons that is especially interesting to note (these are highlighted with a heavier border in Figure 5).
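Per-task averages of this kind can be recovered from raw timing logs with a simple grouping. The sketch below uses hypothetical log records (the record format is an assumption, not taken from the evaluation software), chosen so that the resulting means match the figures reported above:

```python
from collections import defaultdict

# Hypothetical (task_type, seconds_to_correct_submission) records.
records = [
    ("data entry", 12.0), ("data entry", 15.6),
    ("data edit", 20.0), ("data edit", 22.6),
    ("clipboard", 50.0), ("clipboard", 54.0),
]

# Group the timings by task type.
by_task = defaultdict(list)
for task, secs in records:
    by_task[task].append(secs)

# Average time to a correct submission per task type.
averages = {task: sum(times) / len(times) for task, times in by_task.items()}
print(averages)
```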


Figure 5 - Statistically significant differences between task type/audio condition combinations (where a check mark indicates statistical significance)
[Matrix of pairwise comparisons between the nine task type/audio condition combinations (full audio, no audio, and non-speech audio, each crossed with data entry, data editing, and clipboard), with check marks in the cells corresponding to statistically significant differences.]

When the clipboard manipulation task is compared across the three audio conditions, users were found to be, on average, significantly faster to achieve a correct submission when working under the full audio condition than when working under the non-speech audio condition (p = 0.007). No such statistical significance was found for the difference in average correct submission times between the no audio condition and the other two audio conditions. From this it can be concluded that, specifically for clipboard manipulation tasks, the full audio feedback for the MTextField widget allows users to achieve significantly faster submission times for correct entries than the non-speech version of the MTextField widget feedback. That is, the inclusion of synthesised speech together with non-speech feedback in a GUI textfield significantly assists users to achieve faster times for correct submissions when performing the most complex of tasks - clipboard manipulation. In this respect, the MTextField widget feedback design can be considered successful. Unfortunately, nothing can be claimed regarding the positioning of the no audio condition relative to the other two conditions since no statistical significance can be attributed to the difference in average submission times between the no audio condition and either other audio condition.

Consider now subjects' average times per submission attempt according to audio condition and task type (see Figure 4) - that is, the average time subjects spent on each entry before attempting a submission. Unlike the average correct submission times discussed above, a Two Factor ANOVA showed that only task type had a significant effect on the average times recorded (p < 0.001) - that is, audio condition was not shown to have a significant influence either on its own or in combination with task type. For each task type, the average times across all subjects irrespective of audio condition are shown below.
Data entry: x̄ = 11.8 secs; Data editing: x̄ = 15.4 secs; Clipboard manipulation: x̄ = 42.3 secs.

When the average times according to task type in isolation are compared using a Tukey test, it is found that on average: (1) subjects attempt a submission in significantly less time when performing data entry tasks than clipboard manipulation tasks (p < 0.001); and (2) subjects attempt a submission in significantly less time when performing data editing tasks than clipboard manipulation tasks (p < 0.001). This finding is unsurprising given the more complex nature of clipboard (only) manipulation and, although it does not return information of particular relevance to the evaluation of the existing MTextField widget, it again lends support to the findings previously discussed concerning the task-related areas which are most in need of support from improved feedback in textfields.

Focussing on the clipboard manipulation task, although there is no evidence to suggest that audio condition in any way influences the time subjects take to attempt a submission, full audio feedback has been clearly shown to reduce the time it takes for subjects to achieve a correct submission9. In combination, these results suggest that, for clipboard manipulation tasks, full audio feedback may in fact not excessively hinder user performance with respect to speed (there is no evidence that it slowed users down significantly in terms of their submission attempts) and may therefore, over a prolonged period of use, indeed improve their performance in terms of accuracy combined with speed.

4.2 Effect on Workload

As previously mentioned, it was hoped that the audio-enhanced MTextField would reduce users' overall experience of workload. However, when tested using a One Factor ANOVA, this hypothesis did not hold (p = 0.661). Essentially, therefore, the audio condition under which the users performed their given tasks was not found to have a significant effect on their perceived workload - see Figure 6.
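To illustrate the kind of test applied here, the One Factor (one-way) ANOVA F statistic can be computed directly from per-condition scores. This is a self-contained sketch with made-up data, not the study's raw workload ratings:

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of sample groups.

    F = (between-group mean square) / (within-group mean square);
    values of F near 1 indicate no evidence of a group effect.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Made-up overall workload scores for the three audio conditions.
full_audio = [12, 13, 12]
no_audio = [12, 11, 13]
non_speech = [11, 12, 12]
print(one_way_anova_f([full_audio, no_audio, non_speech]))
```

The F statistic would normally be compared against the F distribution with (k - 1, n - k) degrees of freedom to obtain a p value.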

                               Full Audio   No Audio   Non-Speech Audio
  Mean Overall Workload Score      12.3       11.9          11.6
  Mean Overall Preference           9.7       14.4          12.1

Figure 6 - Mean overall workload & preference scores across tested audio conditions

Although this means that the audio enhancement of the MTextField cannot be claimed to reduce the overall workload experienced by its users, neither did it significantly increase their average workload ratings, despite increasing the sensory stimuli presented to (and thereby the demands placed on) the users. Although not statistically significant, it is interesting to consider the differences in the average ratings attributed to the individual factors which together constitute the overall workload measure - see Figure 7. In particular, it is interesting to note that both the full audio and non-speech audio conditions scored better in terms of mental demand, physical demand, and effort. This would suggest that, by utilising more than the visual sensory channel, the audio enhancement of the MTextField may at least be successful at improving these factors for the task types evaluated. As expected, the synthesised speech did influence the pressure of time felt by the users. However, as discussed above, it did prove useful in reducing the time taken to achieve correct submission of the complex clipboard manipulation tasks.

9 When compared to the non-speech audio condition.

Figure 7 - Mean ratings per workload factor according to audio condition
[Bar chart (ratings 0–16) of the mean rating for each workload factor - mental demand, physical demand, temporal demand, performance, effort, frustration, and annoyance - under each of the three audio conditions.]
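The overall workload measure discussed above is built from these individual factor ratings. A minimal sketch of the aggregation, assuming an unweighted mean (the study's exact weighting scheme is not restated in this section, and the ratings below are hypothetical):

```python
# Hypothetical factor ratings for one subject.
factor_ratings = {
    "mental demand": 14,
    "physical demand": 9,
    "temporal demand": 13,
    "performance": 12,
    "effort": 13,
    "frustration": 11,
    "annoyance": 12,
}

# Assumption: the overall score is the unweighted mean of the factor ratings.
overall_workload = sum(factor_ratings.values()) / len(factor_ratings)
print(overall_workload)  # 12.0
```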

In terms of users' perception of their own performance, the no audio condition fared best (albeit not significantly). Since the previous section has shown that the audio feedback conditions had no significant impact on users' correctness ratio, the difference in their performance ratings across the audio conditions is likely to be a result of their higher levels of frustration under the audio conditions (again, albeit the difference in frustration ratings is not significant). Finally, it is interesting to note - perhaps unsurprisingly - that the subjects found the full audio feedback to be more annoying than the non-speech only audio feedback.

The audio condition under which subjects performed their allocated tasks had no significant effect on their stated preferences – that is, there was no single audio condition that was significantly more popular than the others amongst the subjects tested. As can be seen from Figure 6, although not statistically significant, the non-speech audio condition was on average given greater preference ratings than the full audio condition – a finding which is consistent with the frustration ratings discussed above.

4.3 Use of Special Function Keys in Full Audio Condition

When working under the full audio condition, subjects were able to make use of a series of special function keys which present additional information on demand to the user (see section 2.3). To inform future development of these and other such functions, their usage was recorded in terms of the number of times subjects demanded each function's feedback in relation to the type of task being performed10. Across all subjects, Figure 8 shows the total number of times each of the special function keys was used in relation to the type of task being performed. When tested using a Two Factor ANOVA, it was found that only the function of the special keys themselves had a significant influence over the number of times the keys were used (p = 0.002); the nature of the task was not shown to be influential in this respect. A pairwise comparison of the totals for each of the special keys illustrated that - as is perhaps obvious from Figure 8 - the Read Field function11 was used significantly more often than any of the other functions.

10 Unlike other analyses previously discussed, audio condition was not a factor in this comparison since the function keys are only available when using the MTextField in its full audio form.
11 For a description of the nature of this and the other functions, see section 2.3.


Figure 8 - Total use of special keys according to task type
[Bar chart (0–30) of the total number of uses, across all subjects, of each special key (Word, Position, Selection, Field, Clipboard) for each task type (data entry, data edit, clipboard).]
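Usage counts of this kind can be gathered by tallying each (special key, task type) event as it occurs. A minimal sketch with hypothetical event data (not the study's actual log):

```python
from collections import Counter

# Hypothetical log of special-key activations as (key, task_type) events.
events = [
    ("Field", "data entry"), ("Field", "data edit"), ("Field", "data edit"),
    ("Word", "data entry"), ("Selection", "clipboard"), ("Field", "clipboard"),
]

# Totals per (key, task type) combination and per key overall.
per_key_and_task = Counter(events)
per_key = Counter(key for key, _ in events)

print(per_key.most_common(1))  # [('Field', 4)]
```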

On the basis of this evaluation, it is unclear what caused the observed difference in special key use. It is, however, possible to speculate as to the reason. The nature of the tasks in the evaluation was such that the MTextField widget normally contained only one word, thus rendering the Read Word and Read Field facilities almost identical in terms of feedback. Therefore, although further investigation would be required for verification, it is possible that it was the structure of the evaluated entries themselves that influenced these results. Although again speculative, it is possible that no subjects used the Read Position facility because the existing audio-enhanced feedback from the MTextField widget was sufficiently successful at confirming subjects' location within the MTextField. Finally, the Read Selection and Read Clipboard functions may have received fewer demands from the subjects because, when performing the clipboard manipulation tasks, subjects worked significantly slower (as discussed previously), to the extent that the automatic speech feedback would have been played without the need for subjects to request it.

4.4 Anomalies in the Evaluation Results

In an attempt to ensure that the effect of learning curve was removed (or at least reduced) from the evaluation, subjects were allocated in equal numbers to each possible permutation12 of the audio conditions being tested and were given equal amounts of tutoring for each condition. Despite this, however, the results from one audio sequence – that is, no audio, then full audio, then non-speech audio – were on several occasions significantly different from the results from each of the other sequences. Specifically, for the subjects in the relevant group, that audio sequence had a significant effect on their overall workload ratings and the average correctness ratio they achieved and, both on its own and in combination with task type, it had a significant effect on the average time these subjects took to achieve a correct submission. On the basis that no other audio condition sequence was significantly different from the other sequences, this can be considered an anomaly which, in the absence of more detailed and specific investigation, cannot be explained at this time13. For this reason, the effect of audio sequence has been omitted from the preceding discussion.

5. CONCLUSIONS AND FURTHER WORK

As already mentioned, the MTextField widget is unique both in terms of textfields per se and with respect to the other audio-enhanced widgets in the Audio Toolkit. The nature of interaction with, and therefore the behaviour of, textfields is arguably more complex than that of the majority of standard GUI widgets (such as those already included within the Audio Toolkit). On this basis, it was unlikely that any initial design for the audio-enhancement of a standard textfield would immediately achieve significant results.

12 Of which there were 6 in total.
13 Further investigation and explanation is beyond the scope of this report.

That said, the initial design for an audio-enhanced textfield presented in this report - the MTextField widget - and its subsequent evaluation have returned some interesting results and identified the issues on which future iterations of the MTextField widget design should focus.

In particular, although the evaluation did not show that the audio-enhanced formats of the MTextField widget dramatically reduced subjects' feeling of overall workload, neither did it show that they increased it. This suggests that there is potential for further work on the existing audio-enhancement design of the MTextField widget to in fact reduce the workload placed upon users. Although the evaluation did not highlight any significant preference amongst the subjects in terms of the level of audio-enhancement of the MTextField widget, it is encouraging to note that the no audio version was not favoured significantly more than the others. Taking this as a starting point, future work may be done to improve the appeal of the audio-enhanced versions of the MTextField widget.

As mentioned above, when audio condition sequence was included as a factor in the analysis, anomalous results were found for one group of subjects - that is, the group who performed their allocated tasks under the audio sequence no audio, full audio, and non-speech audio. Given that there were only three subjects in each audio permutation group, it is unclear whether these findings are as significant as the analysis suggests or are in fact a spurious effect of the particular subjects concerned. It would be very interesting to investigate this issue further in order to ascertain the cause of the findings, and specifically to identify whether there is something scientifically significant about that particular sequence of audio conditions.
Although it was not its intention to investigate in isolation the effect of task type on user performance with textfields, the evaluation of the MTextField widget has shown the significant influence exerted by the nature of the task over performance efficiency. This information is useful not only for the future evolution of the MTextField widget - it can assist in focussing design effort on those elements of task that would potentially benefit most from improved feedback - but also for user interface developers when designing graphical user interfaces, whether audio-enhanced or purely visual.

The evaluation of the MTextField widget clearly demonstrated the advantage of the full audio-enhancement in terms of the efficiency (and in particular the speed) of clipboard manipulation tasks. Thus, in this respect, the MTextField widget, as it currently stands, is successful. The remainder of the information gleaned from the evaluation should allow future work on the MTextField widget design to improve its audio-enhanced support for efficient performance of the remaining two task types.

Focussing on the full audio-enhancement of the MTextField widget, the results of the investigation into the frequency of use of the various special function keys illustrated that one function (Read Field) was used more than any other. Further investigation will be required in order to determine the reason behind subjects' preference for this particular function and to identify any additional functionality that would be beneficial to the audio-enhancement of the MTextField widget. In particular, it would be interesting to determine whether it was the use of an on/touch-screen keyboard that inhibited the use of these special keys - that is, whether, if users found it easier to activate the functions using a standard keyboard, they might make more extensive use of these enhanced facilities.
To conclude, although the results of the evaluation of the current MTextField widget are not as positive as had been hoped, neither are they particularly negative. The no audio version of the widget was not shown to be statistically better in any way than the audio-enhanced versions, which suggests there is potential, and reason, to further the design of the MTextField widget. The results have highlighted a number of interesting and informative issues which should benefit not only further design work on the MTextField widget, but also GUI development using textfields in general. As an initial foray into the combined use of non-speech audio and synthesised speech, the MTextField widget has provided some interesting results which will benefit not only its own future development, but also that of the other widgets in the Audio Toolkit which intend to make combined use of both types of audio feedback.


REFERENCES

Brewster, S. A. (1997), Using Non-Speech Sound To Overcome Information Overload, Displays Special Issue on Multimedia Displays, 17, pp. 179-189.
Brewster, S. A. (1998), The Design of Sonically-Enhanced Widgets, Interacting With Computers, 11(2), pp. 211-235.
Brewster, S. A. and Crease, M. G. (1997), Making Menus Musical, In Proceedings of IFIP Interact'97, Sydney, Australia, Chapman & Hall, pp. 389-396.
Brewster, S. A. and Crease, M. G. (1999), Correcting Menu Usability Problems With Sound, Behaviour and Information Technology, 18(3), pp. 165-177.
Brewster, S. A., Lumsden, J. M., Gray, P. D., Crease, M. G. and Walker, A. (2001), The Audio Toolkit Project, http://www.dcs.gla.ac.uk/research/audio_toolkit/
Crease, M. G. and Brewster, S. A. (1998), Making Progress With Sounds - The Design and Evaluation of an Audio Progress Bar, In Proceedings of the Second International Conference on Auditory Display (ICAD'98), Glasgow, UK, British Computer Society.
Crease, M. G. and Brewster, S. A. (1999), Scope for Progress - Monitoring Background Tasks With Sound, In Proceedings of INTERACT'99, Edinburgh, UK, British Computer Society, pp. 19-20.
Crease, M. G., Brewster, S. A. and Gray, P. D. (2000a), Caring, Sharing Widgets: A Toolkit of Sensitive Widgets, In Proceedings of BCS Human-Computer Interaction (HCI'2000), Sunderland, UK, Springer, pp. 257-270.
Crease, M. G., Gray, P. D. and Brewster, S. A. (1999), Resource Sensitive Multimodal Widgets, In Proceedings of INTERACT'99, Edinburgh, UK, British Computer Society, pp. 21-22.
Crease, M. G., Gray, P. D. and Brewster, S. A. (2000b), A Toolkit of Mechanism and Context Independent Widgets, In Proceedings of the Design, Specification and Verification of Interactive Systems (DSVIS) Workshop at ICSE'2000, Limerick, Ireland, Springer, pp. 127-141.
Eckstein, R., Loy, M. and Wood, D. (1998), Java Swing, O'Reilly, Sebastopol.
Edwards, A. D. N., Brewster, S. A. and Wright, P. C. (1992), A Detailed Investigation Into The Effectiveness of Earcons, In Kramer, G. (Ed.), Proceedings of the First International Conference on Auditory Display, Santa Fe Institute, Santa Fe, Addison-Wesley, pp. 471-498.
Edwards, A. D. N., Brewster, S. A. and Wright, P. C. (1995), Experimentally Derived Guidelines for the Creation of Earcons, In Proceedings of Human Computer Interaction (HCI'95), Huddersfield, UK.
Lumsden, J. and Brewster, S. (2001), A Survey of Audio-Related Knowledge Amongst Software Engineers Developing Human-Computer Interfaces, Technical Report TR-2001-97, Department of Computing Science, University of Glasgow, September 2001.


APPENDIX A: TASK HANDOUTS FOR EVALUATION EXPERIMENT

DATA SET 1 / AUDIO CONDITION :: NO AUDIO

TASK 1 :: DATA ENTRY

You will be given 5 minutes in which to enter as many of the following words as possible. Please enter the words in the order given - you will only be allowed to progress to the next word upon correct completion of the current word.

piece sliver certain phantom relative erroneous lieutenant eccentricity statuesqueness altruistic circumnavigable unsympathetically ambiguous whimsicality nonconformistically accompaniment recommendations cyclical predetermination deterministic suspiciously melancholy administrational necessitate juvenescence ceremoniousness extraordinariness hypnological


TASK 2 :: DATA EDITING

In this task, you will be presented with a series of misspelled/mistyped words or pairs of words. For each word/pair of words in sequence, the following set of words lists the correct spelling. You are required to correct the erroneous words/pairs of words according to the given list. For example, if the textfield was to present "there" and the corresponding word in the printed list was "their", you should edit the contents of the textfield to change "there" into "their". You will be given 5 minutes in which to correct as many of the given words as possible to reflect the following words - you will only be allowed to progress to the next word upon correct completion of the current word.

parallelogrammical multimillionaire genealogically intellectualistically kaleidoscopically anthropomorphically indistinguishability onomatopoeically pathological perniciousness sophistication unnecessarily inimitability disproportionateness unidirectionality

override reminisce chivalrous perspicuous lasciviously orthopedical agriculturalist jurisprudential discontiguousness disappear metallurgy interference condescensiveness aptitudinally corroboratory characteristical disproportionately


TASK 3 :: COPYING, CUTTING, AND PASTING

In this task, you will be presented with a series of words or pairs of words in which the letters have been deliberately jumbled up. For each word/pair of words in sequence, the following set of words lists the correct letter ordering. You are required to correct the erroneous words/pairs of words according to the given list using only the copy, cut, and paste facilities. For example, if the textfield presented "thier" and the word in the list was "their", you should use the cut, copy, and paste facilities to switch the order of the "i" and the "e" in the textfield. You will be given 5 minutes in which to correct as many words as possible - you will only be allowed to progress to the next word upon correct completion of the current word.

opportunistically incomprehensibility transmogrification interreceive isidiiferous psychoacoustic cannibalistically uninhabitability constitutionally paternalistically unaesthetically tempestuousness fossiliferous lexicographically

fluorescein interversion particularity constitutionalisation ineffaceable precipitous patriarchist characteristically multitudinosity extemporaneousness humanistical flirtatiously quinquagenarian apprehensiveness representationalist reconciliability antisocialistically
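Since each jumbled entry contains exactly the letters of its target word, a correct solution to this task can be verified by an anagram check. An illustrative snippet, not part of the evaluation software:

```python
def is_anagram(jumbled, target):
    """True if the two strings contain exactly the same letters."""
    return sorted(jumbled) == sorted(target)

print(is_anagram("thier", "their"))  # True
print(is_anagram("there", "their"))  # False
```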


DATA SET 2 / AUDIO CONDITION :: NON-SPEECH AUDIO

TASK 1 :: DATA ENTRY

You will be given 5 minutes in which to enter as many of the following words as possible. Please enter the words in the order given - you will only be allowed to progress to the next word upon correct completion of the current word.

quiet unique initial vaccine parallel secretary fictitious availability transcendental monotonous humanitarianist misrepresentative reimburse hygienically particularistically concatenation sympathetically marriage environmentalism reprehensible participancy empiricist multidimensional corroborate lieutenantry unprecedentedly psychotherapeutic latitudinous


TASK 2 :: DATA EDITING

In this task, you will be presented with a series of misspelled/mistyped words or pairs of words. For each word/pair of words in sequence, the following set of words lists the correct spelling. You are required to correct the erroneous words/pairs of words according to the given list. For example, if the textfield was to present "there" and the corresponding word in the printed list was "their", you should edit the contents of the textfield to change "there" into "their". You will be given 5 minutes in which to correct as many of the given words as possible to reflect the following words - you will only be allowed to progress to the next word upon correct completion of the current word.

parliamentarianism dissatisfactions archaeological anthropomorphological melancholiousness coinstantaneousness undiscriminatingness hypochondriacism militaristic monotonousness sentimentalist unequivocally kaleidoscopic parallelogrammatical spectrophotometry

priority syllogism reciprocal analyticity insenescible supercilious aristocraticism philosophically nondiscrimination hierarchy phenomenon indelibility disproportionation atrociousness deciduousness commensurability indiscriminatively


TASK 3 :: COPYING, CUTTING, AND PASTING

In this task, you will be presented with a series of words or pairs of words in which the letters have been deliberately jumbled up. For each word/pair of words in sequence, the following set of words lists the correct letter ordering. You are required to correct the erroneous words/pairs of words according to the given list using only the copy, cut, and paste facilities. For example, if the textfield presented "thier" and the corresponding word in the list was "their", you should use the cut, copy, and paste facilities to switch the order of the "i" and the "e" in the textfield. You will be given 5 minutes in which to correct as many words as possible - you will only be allowed to progress to the next word upon correct completion of the current word.

cinematographically onomatopoetically incomprehensiveness interrelationships oscilloscope statistician ecclesiastical anthropologically substitutability conscientiousness miscellaneousness preferentialist synchronousness exhibitionist jurisprudentially

homogeneous multiplicity domiciliation disestablishmentarian asynchronous qualitative rerequisite extraterritorially necessitousness contradistinctions longitudinal hallucination ratificationist discourteousness representationlism militaristically


DATA SET 3 / AUDIO CONDITION :: FULL AUDIO

ADDITIONAL SPEECH-RELATED FUNCTIONALITY

You are about to use a textfield which has the facility to present speech feedback on the basis of your actions - for example, as you type, the textfield can "speak" back the characters that you enter. There are, however, additional facilities related to speech feedback which you can call upon as required. These additional facilities are accessed via the key-press combinations listed below. Please feel free to use them as and when you wish when using the speech-enhanced textfield.

• To read back the current word (at the cursor position) :: Ctrl+Alt+W
• To get a verbal indication of the current cursor position :: Ctrl+Alt+C
• To read back the current selection (i.e. highlighted text) :: Ctrl+Alt+S
• To read back the entire contents of the textfield :: Ctrl+Alt+R
• To read back what is on the clipboard :: Ctrl+Alt+B
• To silence the speech feedback :: ESC
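Since the Audio Toolkit widgets are built on the Java Swing libraries, shortcut combinations of this kind would typically be registered through a text component's InputMap and ActionMap. The sketch below is purely illustrative: the action names and the println handlers are hypothetical stand-ins for the speech engine calls, not the actual MTextField implementation.

```java
import java.awt.event.ActionEvent;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import javax.swing.AbstractAction;
import javax.swing.JComponent;
import javax.swing.JTextField;
import javax.swing.KeyStroke;

public class SpeechShortcuts {

    /** Bind one Ctrl+Alt shortcut on the field to a named action. */
    public static void bind(JTextField field, int keyCode,
                            String actionName, Runnable handler) {
        KeyStroke stroke = KeyStroke.getKeyStroke(
                keyCode, InputEvent.CTRL_DOWN_MASK | InputEvent.ALT_DOWN_MASK);
        // Route the keystroke to the action name, then the name to the handler.
        field.getInputMap(JComponent.WHEN_FOCUSED).put(stroke, actionName);
        field.getActionMap().put(actionName, new AbstractAction() {
            @Override
            public void actionPerformed(ActionEvent e) {
                handler.run();
            }
        });
    }

    public static void main(String[] args) {
        JTextField field = new JTextField();
        // Hypothetical handlers; a real implementation would call the
        // speech synthesiser rather than print.
        bind(field, KeyEvent.VK_W, "speakWord",      () -> System.out.println("speak current word"));
        bind(field, KeyEvent.VK_C, "speakCursor",    () -> System.out.println("speak cursor position"));
        bind(field, KeyEvent.VK_S, "speakSelection", () -> System.out.println("speak selection"));
        bind(field, KeyEvent.VK_R, "speakAll",       () -> System.out.println("speak entire contents"));
        bind(field, KeyEvent.VK_B, "speakClipboard", () -> System.out.println("speak clipboard"));

        // ESC (no modifiers) silences the speech feedback.
        field.getInputMap(JComponent.WHEN_FOCUSED)
             .put(KeyStroke.getKeyStroke(KeyEvent.VK_ESCAPE, 0), "silence");
        field.getActionMap().put("silence", new AbstractAction() {
            @Override
            public void actionPerformed(ActionEvent e) {
                System.out.println("silence speech feedback");
            }
        });
    }
}
```

Keeping the keystroke-to-name and name-to-handler mappings separate is the standard Swing idiom; it lets the shortcut assignments above be remapped without touching the speech handlers themselves.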


DATA SET 3 / AUDIO CONDITION :: FULL AUDIO

TASK 1 :: DATA ENTRY

You will be given 5 minutes in which to enter as many of the following words as possible. Please enter the words in the order given - you will only be allowed to progress to the next word upon correct completion of the current word.

their vacuum odorous receive inactive poisonous solicitude bibliography disciplinarian uniqueness familiarisation circumstantiation unanimous megalomaniac astrometeorological recommendable differentiation occasion developmentalist wheretosoever indefinitive orthogonal transcendentness bureaucracy ridiculously magnanimousness contradistinguish spirituality


TASK 2 :: DATA EDITING

In this task, you will be presented with a series of misspelled/mistyped words or pairs of words. For each word/pair of words in sequence, the following set of words lists the correct spelling. You are required to correct the erroneous words/pairs of words according to the given list. For example, if the textfield presented "there" and the corresponding word in the printed list was "their", you should edit the contents of the textfield to change "there" into "their". You will be given 5 minutes in which to correct as many of the given words as possible - you will only be allowed to progress to the next word upon correct completion of the current word.

parallelogrammatic disillusionments maneuvrability noninterchangeability heterogeneousness irreprehensibleness incomprehensibleness chauvinistically repetitive reconnoissance coincidentally infallibility quadrilateral nondeterministically quintessentially

reassess arbitrary parliament connoisseur inirritative sacrilegious carnivorousness precipitateness undistinguishably heuristic penicillin idealistic spectrophotometric anonymousness dictatorially impracticability archrepresentative


TASK 3 :: COPYING, CUTTING, AND PASTING

In this task, you will be presented with a series of words or pairs of words in which the letters have been deliberately jumbled up. For each word/pair of words in sequence, the following set of words lists the correct letter ordering. You are required to correct the erroneous words/pairs of words according to the given list using only the copy, cut, and paste facilities. For example, if the textfield presented "thier" and the corresponding word in the list was "their", you should use the cut, copy, and paste facilities to switch the order of the "i" and the "e" in the textfield. You will be given 5 minutes in which to correct as many words as possible - you will only be allowed to progress to the next word upon correct completion of the current word.

circumstantiability probabilistically irreconciliableness consequentialities paradisiacal vicissitudes arrhythmically disciplinarianism relativistically disinterestedness logarithmetically metacircularity ritualistically scintillating unintelligibility

kinesthesis remuneration specification establishmentarianism infinitarily vicissitude translucence institutionalising substitutionary phenomenologically solitudinous mediterranean technicological contraindicative disadvantageousness metamathematical


WORKLOAD TESTS

At the end of each 15 minute session, we will ask you to complete some tables designed to find out about your experiences during the tasks. We are examining the "workload" you experienced. The factors that influence your experiences when interacting with the system to complete the tasks may come from the system itself, your feelings about your own performance, how much effort you put in, or the stress and frustration you felt. The workload contributed by these factors may change as you use the system with or without sounds.

The physical parts of workload are easy to measure but the mental ones are harder. Since workload is something that is experienced individually by each person, we need to measure it by asking each person to describe the feelings they experienced. Because workload may be caused by many different factors, we would like you to evaluate several of them individually. This set of 7 scales was developed for you to use in evaluating your experiences in different tasks. Please read the definitions of the scales carefully. If you have a question about any of the scales, please ask us about it. It is extremely important that they be clear to you.

After each 15 minute session, we will ask you to fill in the 7 scales. You will evaluate the session by marking each scale at the point which matches your experience. Each line has a description at each end: please consider each scale individually. Please consider your responses carefully. Your ratings will play an important role in the evaluation being conducted, so your active participation is essential to the success of this experiment and is greatly appreciated.


Audio Condition:

Name:

MENTAL DEMAND
How much mental, visual, and auditory activity was required? (e.g. thinking, deciding, calculating, remembering, looking, searching, listening, scanning, etc.)
Low ____________________________________ High

PHYSICAL DEMAND
How much physical activity was required? (e.g. pushing, pulling, turning, controlling, activating, etc.)
Low ____________________________________ High

TEMPORAL DEMAND (TIME PRESSURE)
How much time pressure did you feel due to the rate or pace at which the task elements occurred? (e.g. slow, leisurely, rapid, frantic)
Low ____________________________________ High

PERFORMANCE LEVEL ACHIEVED
How successful do you think you were in accomplishing the task(s) set by the experimenter? How satisfied were you with your performance? Don't just think of your 'score', but how you felt you performed.
Poor ____________________________________ Good

EFFORT EXPENDED
How hard did you have to work (mentally and physically) to accomplish your level of performance?
Low ____________________________________ High

FRUSTRATION LEVEL
How much frustration did you experience? (e.g. were you relaxed, content, stressed, irritated, discouraged?)
Low ____________________________________ High

ANNOYANCE EXPERIENCED
How annoying did you find the sounds during the tasks?
Low ____________________________________ High


Name:

OVERALL PREFERENCE
Please rate your preference for each of the audio conditions you have experienced today using the following identifiers:
(a) for the full audio condition
(b) for the non-speech audio condition
(c) for the non-audio condition

Low ____________________________________ High
