Radical user interfaces for real-time control

Andy Hunt and Ross Kirk
University of York, U.K.

Abstract

This paper describes recent work which challenges the predominance of the WIMP (Windows-Icons-Menus-Pointers) computer interface for use in real-time situations. The results of the work have implications for the design of user interfaces for real-time control tasks (of which musical performance and experimentation are clear examples). This paper describes the tests, the interfaces, and the results from a variety of test subjects over several weeks. It then draws conclusions about the appropriateness of commonly accepted interfaces for complex and creative tasks.

1. Background

The WIMP interface (Windows-Icons-Menus-Pointers) originated in the 1970s as what was then a radical way of interacting with computers [1]. Computer engineers were comfortable with 'command-line interfaces', which were flexible, if somewhat cryptic and inaccessible to the common user. They intended the acronym 'WIMP' to carry a negative connotation, because they felt that graphical computer interfaces were for users who couldn't cope with 'proper' command-line syntax. The commercial world adopted the WIMP interface because it appeared more 'user-friendly' and thus resulted in more computers being sold. It has become the predominant way of interacting with computers, and its supremacy is not often challenged.

A typical WIMP interface consists of a pointing device (such as a mouse or tracker-ball), and it requires the user to interact constantly with a computer screen via this pointer. Information is displayed within on-screen areas (windows), and most functions are invoked by clicking on icons or navigating through layers of text-based menus.

This paper suggests that while the WIMP interface may be appropriate for certain office-based tasks such as word-processing, it is dangerous to assume that it is always the best interface for all situations. In particular, we highlight issues relating to:

• The 'information dialogue' between the computer and the human user.
• The dominance of language and reading.
• The emphasis on 'search & find' paradigms.
• The unsuitability of WIMP interfaces for real-time control tasks.

2. The Problem

The two main restrictions of a WIMP interface are as follows:

1) It demands that you look at it the whole time. The emphasis is on visual feedback for various pointing and selection tasks.
2) It often requires that you search for the most appropriate command.

Consider trying to use such an interface to drive a car. The driver would have to stare at the screen, and would therefore not be able to watch the road! When an important event occurred (such as a pedestrian unexpectedly stepping out into the road), the driver (assuming he or she had seen the danger) would have to search the menu hierarchy for an appropriate 'Stop' command. It would probably bring up a dialogue box asking "Are you SURE you want to stop?", by which time it would be too late!

Consider also using a WIMP interface in place of a conventional musical instrument to perform a piece of music. The player would have to look away from the screen to locate the written score in order to find out what note to play. S/he would then have to refocus on the screen, co-ordinate the mouse and choose the 'Play' menu, which would probably bring up a submenu of the different notes that are possible. Once the note is triggered there would be little chance of making an expressive performance, apart from perhaps selecting a pre-set 'Expression Trajectory' from another menu list.

One might easily conclude from the above two thought exercises that these are inappropriately difficult tasks to be considering for computer interfaces. However, cars have been successfully driven by humans for more than a century, and musical instruments successfully played for thousands of years. Interactive systems can be controlled by computer interfaces, but the WIMP paradigm is surely inappropriate for those where the control decisions are not predictable in advance.

It is unfortunate, then, that so many control systems which demand real-time human control are WIMP-oriented. Readers may have noticed a proliferation of such interfaces on photocopiers, restaurant and shop cash desks, and cash-point machines – without noticing any improvement in the service. On the contrary, the result is often a frustrated user, baffled by choices and a fixed human-computer dialogue. It is even more worrying when industrial plants replace their traditional hands-on, multi-person control interfaces with more 'up-to-date' single-operator (WIMP-based) computer stations. Martin Moore-Ede's book "The Twenty Four Hour Society" highlights the many perils of making people work longer hours with less sensory-rich computer interfaces: "In this computerised age we have focussed on visual cues because that's what computers do best – a classic case of machine-centred thinking" [2]. He points out how computers in the workplace often give the false impression of being more efficient than pre-computer industrial interfaces, but cites many instances of serious problems occurring. These tend to happen when the human is not fully engaged with the interface and an unusual situation takes place, resulting in the human operator being unable to regain adequate control of the system.

We now describe a study of a simple interactive musical task, in order to compare such interfaces with some alternative designs. The hypothesis is that because people can learn to use complex interfaces such as violins and cars, they should also be able to cope with more complex computing interfaces. The goal of the research is therefore to establish whether more complex forms of interface parameter mapping can yield better results as a user becomes accustomed to them.

3. The Study

The aim of this study was to compare the effectiveness over time of several different interfaces for a real-time musical control task. The task required the human test subject to listen to a musical phrase lasting no more than a few seconds, and then attempt to reproduce it accurately on the interface to hand. In order to limit the number of variables involved, the musical phrase consisted of changes in up to four sonic variables (pitch, loudness, timbre and stereo panning). The complexity of the phrases ranged from those involving a single change in one parameter to those containing simultaneous trajectories of all four parameters.

The University of York's MIDAS [3] system, running on a Silicon Graphics Indy machine, was used to construct the audio algorithms, the user interfaces and the data-monitoring systems. For some of the interfaces, an external piece of hardware was used, linked to the MIDAS system via a MIDI connection [4].

A range of different interfaces was originally envisaged for the tests. The intention was to allow the user to perform a real-time task on those interfaces which are commonplace in computer music, and to compare these with a new style of multi-parametric interface in which there is no longer a one-to-one mapping between a control device and a system parameter. Several interface styles commonly used for computer music were rejected from the outset, as they do not allow any form of real-time interaction. An example is the 'computer text score', the interface traditionally used by computer musicians (in popular programs such as Csound [5]). The sound-generating algorithm is set up via a text file known as an 'orchestra'; the 'score' is another text file containing lists of numbers which represent the values of all the available parameters over time.

The following three interfaces were chosen for the study. They represent a series of stages, from the most commonly accepted through to the most radical.

3.1 Interface 1: 'Mouse-controlled sliders'

This interface consists of four sliders on a computer screen, one for each of the sonic parameters that can change. During the test, the player uses the mouse to move the sliders (see Figure 1). Each slider control can be 'dragged' to produce a trajectory, or 'clicked' to produce a step change.

Figure 1: The Mouse & Sliders Interface

Some initial pre-trial studies of this interface showed that the only way it could be made viable was to 'pre-set' the starting position of each slider control to correspond with the starting values of the sound in question. At least this way the user had some chance of reproducing the sounds; otherwise, too much time was spent trying to set each of the sliders to an appropriate starting position. Note that this interface is not a 'true' WIMP interface, as it has no menus. The user does not have to search for the parameters, but simply moves the mouse to the appropriate sliders. An interface with menus would slow the whole process down so much as to make interactive continuous control impossible.
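As an illustration of this interface style, the following minimal sketch uses Python with Tkinter (an assumption made purely for illustration – the study itself was built in MIDAS, not Tkinter). It shows four on-screen sliders whose starting positions are pre-set, as described above; the parameter ranges and starting values are hypothetical.

```python
# Minimal sketch of Interface 1 ("mouse-controlled sliders").
# Illustration only: the study used the MIDAS system, not Tkinter.
import tkinter as tk

PARAMS = ["pitch", "loudness", "timbre", "panning"]
# Hypothetical starting values matching the target phrase:
START_VALUES = {"pitch": 60, "loudness": 80, "timbre": 30, "panning": 50}

def on_change(name):
    def handler(value):
        # In the real system this would update the sound engine in real time.
        print(f"{name} -> {value}")
    return handler

root = tk.Tk()
root.title("Mouse-controlled sliders")
for name in PARAMS:
    s = tk.Scale(root, label=name, from_=0, to=127,
                 orient=tk.VERTICAL, command=on_change(name))
    s.set(START_VALUES[name])  # pre-set to the sound's starting value
    s.pack(side=tk.LEFT)
root.mainloop()
```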

3.2 Interface 2: 'Physical Sliders'

This interface uses four of the sliders on a Roland SC155 sound module (see Figure 2). This was configured to send out MIDI information and thus control the sound algorithms on the MIDAS computer system.

Figure 2: The Physical Sliders Interface

The user can move each of the sliders independently, and can therefore control all four sound parameters simultaneously, although it is quite difficult to set the sliders to their 'starting' positions. The slider positions are echoed on the screen, but the user does not need to look at the screen in order to use this interface.
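A minimal sketch of this one-to-one arrangement, assuming the Python 'mido' library; the controller numbers shown are hypothetical, since the paper does not state which MIDI controllers the SC155 sliders transmit.

```python
# Sketch of Interface 2: four physical sliders read over MIDI, each with
# a one-to-one mapping to a single sound parameter.
import mido

# Hypothetical controller-number assignments:
CC_TO_PARAM = {16: "pitch", 17: "loudness", 18: "timbre", 19: "panning"}

with mido.open_input() as port:  # opens the default MIDI input port
    for msg in port:
        if msg.type == "control_change" and msg.control in CC_TO_PARAM:
            # One-to-one: this controller moves exactly this parameter.
            print(f"{CC_TO_PARAM[msg.control]} = {msg.value}")  # 0-127
```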

3.3 Interface 3: 'Multi-parametric instrument'

This interface uses the same hardware as interfaces 1 and 2 (the mouse and the physical sliders on a sound module), but it uses them in two radically different ways. Firstly, the system expects the user to expend some physical energy to continuously activate the system. Secondly, there are no direct one-to-one correspondences (mappings) between each physical control and the internal sound parameters. Both of these concepts are only radical in terms of their use in computing interfaces; they are in fact firmly rooted in everyday examples of non-computer-based real-time systems.

Firstly, let us consider the use of energy to operate a system. In many real-time devices (for example a violin, a bicycle, a clarinet or a drum-kit) the human operator has to inject energy into, or 'excite', the system before it will operate, and must continue to supply energy to keep it going. The energy is then steered through the system, or damped (dissipated), in order to achieve the task, such as playing a note or climbing a hill. These two operations (injection/excitation and steering/damping) are often carried out by different limbs (e.g. bowing with one arm and fingering notes with the other; pushing bicycle pedals with the legs and steering with the arms). Even in motorised systems (the car being the most common example) the concept of injecting energy with one limb and steering with another holds true. A motor may generate the energy, but its injection is controlled by the driver.

Secondly, let us consider the correspondences between each input control and the controllable parameters in such systems. For example, consider a violin and ask "where is the volume control?". There is no single control; rather, a combination of inputs such as bow speed, bow pressure, choice of string and even finger position. This is an example of a 'many-to-one' mapping, where several inputs are needed to control one parameter. Again considering the violin, ask "which parameter does the bow control?". It actually influences many aspects of the sound, such as volume, timbre, articulation and (to some extent) pitch. This is therefore an example of a 'one-to-many' mapping. Human operators expect to encounter complex mappings, and yet so often engineers provide nothing but 'one-to-one' correspondences (for example a set of sliders, each controlling a single synthesis parameter).

The multi-parametric interface used in the study is shown in Figure 3. The user finds that the computer screen is blank (in contrast to the two previous interfaces, where the screen shows a representation of four sliders). Sound is only made when the mouse is moved. The sound's volume is proportional to the speed of mouse movement. This ensures that the user's physical energy is needed for any sound to be made, and that the amount of energy has an effect on the quality of the sound. In addition, the volume, pitch, timbre and panning are controlled by combinations of the mouse position and the positions of the two sliders, as shown here:

• Volume = speed of mouse + mouse button pressed + average position of the two sliders.
• Pitch = vertical position of the mouse + speed of movement of slider no. 2.
• Timbre = horizontal position of the mouse + difference between the two slider positions.
• Panning = position of slider no. 1.

This ensures that there are several many-to-one mappings. Simultaneously there are various one-to-many mappings (e.g. slider 1 affects volume, timbre and panning). Two limbs are used, as the player has to use two hands – one on the mouse, one on the sliders. There is no 'obvious' mapping of hand position to sound produced; the user must experiment. During the tests users tend to be somewhat baffled at first, then gradually develop a 'feel' for the interface, before finally beginning to think in terms of gestures.

Figure 3: The Multiparametric Interface
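The following Python sketch illustrates the mapping just described. All ranges, scaling factors and the weighting of the mouse button are assumptions made for illustration; the paper does not give the exact formulae used in MIDAS.

```python
import math

MOUSE_GAIN = 0.05  # assumed scaling of mouse speed into the 0..1 range

def clamp(v, lo=0.0, hi=1.0):
    """Keep a parameter within its working range."""
    return max(lo, min(hi, v))

def multiparametric_map(mouse, sliders, prev):
    """Map raw controls to the four sound parameters (all values 0..1).

    mouse:   {"x", "y", "t", "button"} - current pointer state
    sliders: [slider1, slider2]        - current slider positions
    prev:    {"x", "y", "t", "s2"}     - state at the previous update
    """
    dt = max(mouse["t"] - prev["t"], 1e-6)
    # Mouse speed is the 'energy' the player injects: no movement, no sound.
    mouse_speed = math.hypot(mouse["x"] - prev["x"],
                             mouse["y"] - prev["y"]) / dt
    slider2_speed = abs(sliders[1] - prev["s2"]) / dt

    # Many-to-one: several inputs combine to control each parameter.
    volume = clamp(mouse_speed * MOUSE_GAIN
                   + (0.2 if mouse["button"] else 0.0)  # assumed weighting
                   + (sliders[0] + sliders[1]) / 2)
    pitch  = clamp(mouse["y"] + slider2_speed * MOUSE_GAIN)
    timbre = clamp(mouse["x"] + abs(sliders[0] - sliders[1]))
    # One-to-many: slider 1 also appears in volume and timbre above.
    panning = sliders[0]
    return {"volume": volume, "pitch": pitch,
            "timbre": timbre, "panning": panning}

# Example update, 50 ms after the previous one:
prev  = {"x": 0.50, "y": 0.50, "t": 0.00, "s2": 0.70}
mouse = {"x": 0.55, "y": 0.40, "t": 0.05, "button": True}
print(multiparametric_map(mouse, [0.30, 0.70], prev))
```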

3.4 The Tests

Sixteen subjects were selected from University staff and students with a variety of musical and non-musical backgrounds. They were each asked to attend an individual test session once a week, for three weeks. Each session consisted of a series of 24 graded musical examples which had to be performed on the 'mouse & sliders' interface, then the 'physical sliders', then the 'multi-parametric instrument'. The musical examples all last between 2 and 4 seconds, but they vary in complexity. At the beginning of the test series there is just a single change in one sound parameter (e.g. a fixed note whose volume increases in a step-wise fashion after one second). The complexity of the tests gradually increases until, at the end, all four sound parameters are changing continuously (for example a sort of wolf-whistle effect which pans from left to right, and peaks in harmonic content towards the end).

After the results had been analysed (see section 4) it became clear that a set of longitudinal tests was needed in order to see how people reacted over a longer period of time. Three subjects (from the sixteen) were selected to attend ten sessions. At each of these sessions they attempted nine musical examples on the same three interfaces as before.

4. The Results & Analysis

Every test result was stored on the computer and later given a score by both a computer algorithm and a human marker.

4.1 Computer Marking

A computer marking procedure is arguably completely objective, as it can measure the amount of deviation from the original. A program was written to establish a single 'total error score' for every test, by performing a root-mean-square (r.m.s.) difference comparison for every parameter (pitch, volume, timbre and panning) in each test. A score of 0 represents a perfectly reproduced musical example.
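A sketch of how such a score might be computed. Summing the per-parameter r.m.s. errors is an assumption made here for illustration; the paper does not specify how the four errors are combined.

```python
import math

def rms_error(target, performed):
    """Root-mean-square difference between two equal-length trajectories."""
    n = len(target)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(target, performed)) / n)

def total_error_score(target, performed):
    """Sum the r.m.s. errors over all four parameters; 0 = perfect.

    target/performed: dicts mapping each parameter name to its
    sampled trajectory (a list of values over time).
    """
    return sum(rms_error(target[k], performed[k])
               for k in ("pitch", "volume", "timbre", "panning"))
```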



The automated procedure outlined above cannot take into account several very common human performance traits:

• Transposition – everything about the sound is reproduced correctly in relative terms, but the parameter starts off higher or lower than the original. The computer marks this as a constant error and gives the test a bad score, whereas a human might say "you're mostly there, but you just need to start a bit lower next time".
• Elongation – the overall shape and sound are fine, but the phrase is played back at a slightly slower or faster tempo. Again, the moment-by-moment comparison carried out by the computer gives a poor mark, whilst a human listener might say "well done, but it was just a little bit slow".
• Time Offset – the sound is perfectly reproduced but is started a short time after the given countdown signal. The computer again marks low, whereas a human listener would probably just ignore the late start and give it a good score.

These features of human performance are not rare occurrences; they happen to some extent in every test. Whilst there are further mathematical correlation techniques which could compensate for some of the above characteristics, it was thought better to keep the absolute computer measurement as a reference, but also to use a human marker to give a score to every test.
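The transposition case can be demonstrated numerically: a constant offset preserves the shape of the trajectory exactly, yet contributes its full magnitude to the r.m.s. error at every sample (reusing the hypothetical rms_error from the sketch above).

```python
import math

def rms_error(target, performed):
    n = len(target)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(target, performed)) / n)

# A 'transposed' performance: identical shape, constant offset of 0.3.
target    = [0.0, 0.2, 0.4, 0.6, 0.8]
performed = [x + 0.3 for x in target]
print(rms_error(target, performed))  # -> 0.3: a poor score, although the
                                     #    shape was reproduced perfectly
```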

4.2 Human Marking

The marker's job is to listen to the original sound, then to the performed sound (as many times as necessary), and to give a mark out of 10 for each of three categories:

• Timing Accuracy (how well do the timings of the sonic events match the original?)
• Parameter Accuracy (how near to the original are the parameter values?)
• Trajectory Accuracy (how well do the parameters move compared to the original?)

The final outcome for each test is a percentage score, where 100% means that the sound has been perfectly reproduced. The results of the human marking are also moderated by a second judge, who independently marks certain randomly chosen tests. Human marking of audio material is considered essential for music exams, competitions, college recitals, feedback in music lessons, and so on. These tasks are never left to a computer, although in some instances (such as our experiment) a computer can provide an extra level of information about the recorded data. The results presented in this paper are those of the human marker.
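The paper does not state how the three marks are combined into the final percentage; an equal weighting, as sketched below, is one plausible (assumed) scheme.

```python
def final_percentage(timing, parameter, trajectory):
    """Combine three marks out of 10 into a percentage (equal weighting
    is an assumption; the paper does not give the actual scheme)."""
    return (timing + parameter + trajectory) / 30 * 100

print(final_percentage(8, 7, 9))  # -> 80.0
```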

4.3 Graphical Presentation of Results

The experiments have yielded over 4000 individual tests, and many conclusions can be drawn from the trends shown in the results. We now present the four conclusions we consider most significant.

a) For the more complex tests the new multiparametric interface gives the best results.

Figure 4 shows how the three interfaces fare with different levels of test complexity. The x-axis represents the nine sounds used in the longitudinal tests, from sound 1 (simple) to sound 9 (complex). The y-axis represents the average scores for each test. We can see that for tests 6 to 9 (where many parameters are changing at once) the multiparametric interface has the best results.

Figure 4: The effect of test complexity (chart 'Comparison of Test Averages': percentage score vs. test complexity 1-9, for the Mouse, Sliders and Multi interfaces).

b) For the simpler (one-parameter) tests the multiparametric interface comes out worst – but always with an upward trend.

Figure 4 indicates that the multiparametric interface is indeed the worst overall performer for the simpler tests. However, that graph does not show the progression of scores over time.

Figure 5: Trends in the simplest test (chart 'Average scores over time (Test 1)': percentage score vs. test session 1-10, for the Mouse, Sliders and Multi interfaces).


Figure 5 shows the results for Test 1 (the simplest test) over the ten-session test period. Time is shown along the x-axis, and the average scores are shown, as before, on the y-axis. Although the multiparametric interface has the lowest overall scores, it shows a significant upward trend. Note how the trend for the 'mouse' interface is actually downward over time! Contrast Figure 5 with Figure 6, which shows the same plot over time but for the most complex test. Here, the multiparametric interface is always the clear winner, and still with a strong upward trend. Also note how the trends for both the 'mouse' and 'sliders' interfaces are downward over time! Taken together, these graphs imply that the sliders-based interfaces are clearly best for simple, single-parameter changes. Their superiority is challenged by the 'multiparametric' interface where parameters change one after the other. However, they are truly beaten where several parameters change at once! This is an important result, as it demonstrates that it will probably take a complex interface to cope with a non-trivial control domain. We shall return to this point in the conclusions.

Figure 6: Trends in the most complex test (chart 'Average scores over time (Test 9)': percentage score vs. test session 1-10, for the Mouse, Sliders and Multi interfaces).


c) Different subjects favour different interfaces.

Figures 7 & 8 show the results of two different subjects (from the sixteen) for the moderately complex tests, where just one or two parameters vary. Figure 7 shows someone who (at least in these early stages) seems to favour the mouse interface.

Figure 7: A subject who favoured the mouse (chart 'Subject 12 (Tests 9-16)': percentage score vs. session 1-3, for the Mouse, Sliders and Multi interfaces).

In contrast, Figure 8 shows another subject whose performance on the multiparametric tests outscored that on the mouse and sliders. His overall scores with the multi-parametric interface overtake his scores with the mouse by the third session. Note that his performance on the 'physical sliders' actually goes down over time. In conversation with this subject, it became clear that he was really getting to like the multi-parametric interface: "even though I don't quite know how I do it – it frees my mind to concentrate on the music, while my hands perform the gestures". He also explained why he was having trouble with the physical sliders: "With the sliders I'm constantly having to break the sound down into what to do to control all four fingers, and I can't keep up".

Figure 8: A subject with an improving trend in the multiparametric interface (chart 'Subject 10 (Tests 9-16)': percentage score vs. session 1-3, for the Mouse, Sliders and Multi interfaces).

d) The 'mouse' interface gives the most consistent (but low) scores across all tests and subjects.

Figure 7 shows a subject who is actually rather unusual, because the mouse seems to be his best interface. For most other subjects the mouse gives a moderately good performance with the simplest test in the early stages, but is rapidly overtaken by the other interfaces when the tests get harder or the user becomes more accustomed to the task. This seems to imply that the mouse (in conjunction with on-screen sliders) is an interface that people can relate to and operate quickly, which may be why the WIMP interface is popular. However, as the task domain increases in complexity, and as people spend longer on the system, other interfaces become more suitable.

5. Summary

The tests described here concentrate on a real-time musical task, and have shown up some distinct characteristics in the performance of the different interfaces. Further work is required to establish whether similar results would occur in other real-time task domains (for example, other musical tasks or real-time speech control). Perhaps there are even office tasks whose performance would be improved by a replacement interface. The study has indicated that some users may prefer direct control with visual feedback, while others favour a gestural interface with a complex mapping between inputs and parameters. It is unclear from the present study what would happen over an even longer period of use. There are several questions which a further study could investigate. Would people gravitate to a particular interface style, or would the multiparametric system work best for everyone (given enough learning time)? What is the best sort of multiparametric interface for a particular task? Would it be possible to test people to see which interface style suits them best?

It is clear that time is needed to learn an interface, and that some interfaces may require a longer practice time in order to achieve good results. This is hardly surprising if we consider the amount of time needed to learn to drive a car or play a musical instrument. Yet if a computer interface took several months or years to master, it would typically be rejected as non-viable; in many cases there is an assumption that a user interface should take no more than ten minutes to master. This criterion would rule out every musical instrument and vehicle driving system ever invented. Perhaps, therefore, our preconceptions about computer interfaces are wrong. For some tasks we should not expect an 'easy' interface which takes minimal learning. Many real-time interactive tasks require continuous control over many parameters, with a complex mapping of input controls to internal system parameters, and we may need a substantial amount of time to learn such an interface in order to gain confident real-time control over complex systems.

6. References

[1] J. Preece et al., Human-Computer Interaction, Addison-Wesley, 1994. ISBN 0-201-62769-8. (Parts II and III give a good overview of the human and technological limitations of different input styles and devices.)

[2] M. Moore-Ede, The Twenty Four Hour Society, Piatkus, London, 1994. ISBN 0-201-62611-X. (A warning to designers of modern systems that human operators are not being considered appropriately.)

[3] P. R. Kirk and A. D. Hunt, "MIDAS-MILAN: an Open Distributed Processing System for Audio Signal Processing", Journal of the Audio Engineering Society, Vol. 44, No. 3, March 1996, pp. 119-129. (A technical description of the open system for audio-visual interaction and processing developed by the authors at the University of York, U.K.)

[4] R. Kirk and A. Hunt, Digital Sound Processing for Music & Multimedia, Focal Press, Butterworth-Heinemann, 1999. ISBN 0-240-51506-4. (This textbook provides details of the development and working of music technology systems, including, in Chapter 4, the MIDI interface. A detailed web site is available at http://www.york.ac.uk/inst/mustech/dspmm.htm, including many links to information about MIDI.)

[5] http://www.leeds.ac.uk/music/Man/c_front.html. (This web page provides full on-line documentation, examples and software downloads for the computer music language Csound.)

The authors can be contacted by email: [email protected], [email protected], or at the Electronics Department, University of York, Heslington, York YO10 5DD, U.K.