Do Users Always Know What's Good For Them? Utilising Physiological Responses to Assess Media Quality

Gillian M. Wilson and M. Angela Sasse
Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, UK.
Tel: +44 (0)20 7679 3462
Email: {g.wilson,a.sasse}@cs.ucl.ac.uk
URL: http://www.cs.ucl.ac.uk/Staff/G.Wilson/

Subjective methods are widely used to determine whether audio and video quality in networked multimedia applications is sufficient. Recent findings suggest that, due to contextual factors, users often accept levels of media quality known to be below the threshold required for such tasks. We therefore propose the use of physiological methods to assess the user cost of different levels of media quality. Physiological responses (HR, GSR and BVP) to two levels of video quality (5 vs. 25 frames per second - fps) were measured in a study with 24 users. Results showed a statistically significant effect of frame rate, in the direction that 5fps caused responses indicative of stress. However, only 16% of the users noticed the difference subjectively. We propose a 3-tier assessment method that combines task performance, user satisfaction and user cost to obtain a meaningful indication of the media quality required by users.

Keywords: Evaluation methods, empirical evaluation, subjective assessment, user cost, audio, video, videoconferencing, physiological measurements.

1 Introduction

Multimedia conferencing (MMC) allows users to communicate using audio, video and shared-workspace tools in real time. In recent years, MMC has become more widely used, including in areas such as healthcare and distance education. Computer workstations and high-bandwidth networks can deliver high-quality audio and video, but higher quality is usually more expensive. Since most users - individual or corporate - do not want to pay more than necessary for their communications, it is important to determine the level of media quality that supports effective and comfortable interaction between users collaborating on a specific task. The point at which increased quality has no further benefit to the user should also be considered, since this allows efficient use of bandwidth. Establishing such quality thresholds is essential for network providers and multimedia application developers.

Currently, subjective methods are widely used to assess audio and video quality. However, there are problems associated with the use of these methods, and when they are used as the single means of assessing whether quality is adequate, the results obtained may give a misleading impression of the impact on the user (see section 2). This paper presents a new method for assessing media quality in the context of networked applications: physiological responses to media quality are taken as an objective measure of user cost. Such methods should be part of a traditional HCI evaluation approach that considers task performance, user satisfaction and user cost, to obtain a reliable indication of how quality affects users.

Section 2 presents a critical review of existing methods for assessing media quality, and the rationale for the new assessment method. Section 3 describes an experimental study whose results demonstrate the validity of this method. The implications of the results are discussed in section 4; conclusions and future work follow in section 5.

2 Evaluating Multimedia Quality

In media quality assessment to date, audio and video quality have been treated as uni-dimensional phenomena which can be described using one-dimensional rating scales such as those recommended by the International Telecommunications Union (ITU) (see section 2.1). However, this approach is unsuitable for assessing media quality in a complex environment, such as videoconferencing, since many factors contribute to users' perception (Watson & Sasse, 1998): loudness, intelligibility, naturalness, pleasantness of tone and the listening effort required are known to contribute to audio quality (Kitawaki & Nagabuchi, 1988). With video, variables such as brightness, background stability, speed of image reassembly, outline definition, the 'dirty window' effect and the mosaic/blocking effect all contribute to perceived quality (Gilli Manzanaro et al., 1991).

2.1 Current Assessment Methods

At present, the most widely used assessment methods for audio and video quality are the subjective rating scales recommended by the ITU, the international organisation in which governments and the private sector co-ordinate global telecommunication networks and services, and the leading publisher of telecommunications regulatory and standards information. The assessment scales recommended by the ITU fall into three categories: those for speech transmission over telephone networks, image quality over television systems, and multimedia systems. Generally, a short section of material is played, after which the viewer/listener rates their opinion on a 5-point quality or impairment scale, usually labelled with the terms 'Excellent, Good, Fair, Poor, Bad'. A Mean Opinion Score (MOS) (ITU-R BT.500-8) is computed by averaging these ratings. However, recent research at UCL has illustrated the unsuitability of these scales when applied to networked audio and video (Watson & Sasse, 1998).
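The MOS computation itself is simply the arithmetic mean of the ratings. A minimal sketch, with invented ratings for illustration:

```python
# Minimal sketch of a Mean Opinion Score (MOS) computation: the average
# of viewers' ratings on the ITU 5-point quality scale
# (5 = Excellent, 4 = Good, 3 = Fair, 2 = Poor, 1 = Bad).
# The ratings below are invented for illustration.

def mean_opinion_score(ratings):
    """Average a list of 1-5 ratings into a single MOS."""
    if not ratings:
        raise ValueError("need at least one rating")
    return sum(ratings) / len(ratings)

ratings = [4, 5, 3, 4, 4, 2, 5, 4, 3, 4]  # ten hypothetical viewers
print(f"MOS = {mean_opinion_score(ratings):.1f}")  # MOS = 3.8
```

Note that a single average of this kind discards the distribution of opinions, which is one reason the scales have attracted criticism.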

2.2 Problems with ITU Scales

The MOS scales were originally designed to assess high-quality television pictures and toll-quality audio. They are primarily concerned with establishing whether viewers/listeners can detect a particular degradation in quality, and this assessment is usually carried out without any reference to a task. However, when it comes to determining the media quality required in networked applications, the question asked by network providers, application developers and users is: what level of quality is good enough?

Audio and video delivered over digital networks are subject to unique impairments, such as packet loss, so the quality can vary during a session. However, the test material used with the scales discussed above is too short to allow users to form an opinion on whether the quality is good enough in the context of longer interactive use, such as videoconferencing. To account for the fact that network conditions can be variable, a dynamic measuring scale is now recommended by the ITU for video assessment (ITU-R BT.500-8).

The final problem concerns assessment via the 5-point scale. Firstly, the 5-point scale does not reflect the multi-dimensional nature of audio and video quality and the potential interaction between different types of impairment (see the introduction to section 2). Secondly, the labels on the scale do not represent equal intervals (Jones & McManus, 1986; Teunissen, 1996), so caution must be exercised when interpreting data gathered using such scales (Watson & Sasse, 1998).

Attempts to improve the scales have been made by our research group at UCL. Watson & Sasse (1997) developed an unlabelled scale and showed that users were consistent in their quality ratings of audio segments when using it. A dynamic version of this unlabelled scale, the QUality ASsessment Slider (QUASS; Bouch et al., 1998), allows users to continuously rate the quality of audio and video in a meaningful context. A drawback of this method is that continuous rating interferes with the user's primary task - e.g. participating in a tutorial via videoconferencing.

2.3 Problems with Subjective Assessment

Subjective assessment is cognitively mediated. One example of this process is the finding that users accepted significantly lower media quality when a notion of financial cost was attached to the level of quality (Bouch & Sasse, 1999). The quality that users rated as acceptable in this study has been demonstrated to be insufficient in a number of experimental and field studies. Another example is Wilson & Descamps' (1996) finding that the same video quality receives lower ratings when the task being performed is difficult. This evidence indicates that contextual variables can influence users' subjective assessment of quality.

Knoche et al. (1999) argue that, because it is not possible for users to register what they do not consciously perceive, subjective measurements are fundamentally flawed. As a more effective method, they recommend task performance. As HCI researchers, we agree that task performance is an essential element of usability, yet it cannot be used as the only measure. Subjective assessment methods capture the degree of user satisfaction with quality, which is important but not necessarily a reliable indicator of the impact that quality has on the user. We argue that both task performance and user satisfaction need to be used in conjunction with a measure of user cost, as part of a 3-tier approach. User cost is an explicit - if often disregarded - element of the traditional HCI evaluation framework.

2.4 Objective Measurements of User Cost

There are subjective approaches to determining user cost - rather than user satisfaction - via rating scales, yet like all subjective rating methods they are cognitively mediated. We thus decided to investigate objective methods of assessing the impact of media quality on the user. One way to determine this is to monitor physiological responses that are indicative of stress and discomfort. When a user is presented with insufficient audio and video quality in a task context - e.g. when making a business decision in a videoconference meeting - he/she must expend extra effort on decoding information at the perceptual level. If the user is struggling to decode the information, this should induce a response of discomfort or stress, even if the user is still capable of performing his/her main task (e.g. participating in the business decision). Autonomic physiological responses are not subject to cognitive mediation (see section 2.5), and collecting such measurements need not interfere with task completion.

2.5 Physiological Measurements

Psychophysiology explores the relationship between the mind and body and the interactive influence they have upon each other. The human nervous system is separated into the central nervous system (CNS) and the peripheral nervous system (PNS) (see Figure 1). The PNS comprises the somatic nervous system (SNS) and the autonomic nervous system (ANS). The ANS is divided into sympathetic and parasympathetic divisions.

The sympathetic division mobilises the body's energetic responses. Thus, when faced with a stressful situation, the sympathetic division prepares the body for the 'fight or flight' response by, for example, releasing glucose into the bloodstream for energy and dilating the walls of the blood vessels to speed up blood flow to the limbs. When the stressful situation has passed, the parasympathetic division takes over to return the body to equilibrium.

[Figure 1: Diagram illustrating the nervous system of humans. The nervous system divides into the central nervous system (brain and spinal cord) and the peripheral nervous system (SNS and ANS); the ANS comprises the sympathetic division ('fight or flight' response: HR and GSR increase, BVP decreases) and the parasympathetic division (restores the body's balance).]

We decided to concentrate on the following responses as indicators of stress: Heart Rate (HR), Blood Volume Pulse (BVP) and Galvanic Skin Resistance (GSR). They were adopted as they are non-invasive, i.e. they do not require samples of body fluids to be taken. In addition, they are relatively easy to measure with standard monitoring equipment.

Heart rate is considered to be a valuable indicator of overall activity level, with a high heart rate being associated with an anxious state and vice versa (Frijda, 1986). Seyle (1956) has linked GSR to stress and ANS arousal. GSR is also known to be the fastest and most robust measure of stress (Cacioppo & Louis, 1990), with an increase in the resistance of the skin being associated with stress. BVP is an indicator of blood flow: the BVP waveform exhibits the characteristic periodicity of the heart beating, as each beat of the heart forces blood through the vessels. The overall envelope of the waveform pinches when a person is startled, fearful or anxious, thus a decrease in BVP amplitude is indicative of a person under stress.

HR rises under stress in order to increase blood flow to the working muscles, thus preparing the body for the 'fight or flight' response. The function of a decreased BVP under stress is to divert blood to the working muscles in order to prepare them for action; this means that blood flow is reduced to the extremities, like a finger. The precise reason for GSR increasing under stress is not known. One theory is that it occurs in order to toughen the skin, thus protecting it against mechanical injury (Wilcott, 1967), as it has been observed that skin is difficult to cut under profuse sweating (Edelberg & Wright, 1962). A second theory is that GSR increases to cool the body in preparation for the projected activity of 'fight or flight'.

The equipment used to measure the physiological responses in this experiment and throughout this research is the ProComp, manufactured by Thought Technology Ltd. This equipment is lightweight and the sensors are small. At present the sensors are placed on three fingers; however, we are investigating the possibility of placing sensors on the feet (e.g. Healey et al., 1999) to allow ease of typing. To measure GSR, two electrodes are placed on adjacent fingers and a small voltage is applied; the skin's capacity to conduct the current is measured. Photoplethysmography is used to measure HR and BVP: a sensor attached to a finger applies a light source, and the light reflected by the skin is measured. At each contraction of the heart, blood is forced through the peripheral vessels, which produces an engorgement of the vessel under the light source. Thus, the volume and the rate at which blood is pumped through the body are detected.
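The relationship between the BVP waveform and heart rate can be illustrated with a simple peak-counting sketch: one waveform peak per heart beat, so beats per minute follows from the peak count and the trace duration. This is not the ProComp's actual processing; the sampling rate and synthetic trace below are illustrative assumptions:

```python
# Sketch of deriving heart rate from a photoplethysmography (BVP) trace
# by counting waveform peaks (one peak per heart beat). The sampling
# rate and the synthetic trace are illustrative assumptions, not the
# ProComp's actual data format.
import math

def count_peaks(signal):
    """Count local maxima: samples larger than the previous sample and
    at least as large as the next."""
    return sum(
        1 for i in range(1, len(signal) - 1)
        if signal[i - 1] < signal[i] >= signal[i + 1]
    )

def heart_rate_bpm(signal, sample_rate_hz):
    """Estimate beats per minute from the peak count over the trace."""
    duration_s = len(signal) / sample_rate_hz
    return count_peaks(signal) * 60.0 / duration_s

# Synthetic 10-second BVP-like trace sampled at 32 Hz with a 1.2 Hz beat
fs = 32
trace = [math.sin(2 * math.pi * 1.2 * t / fs) for t in range(10 * fs)]
print(round(heart_rate_bpm(trace, fs)))  # prints 72 (1.2 Hz = 72 bpm)
```

A real BVP trace is noisier than a sine wave, so practical peak detection would add smoothing and a minimum inter-beat interval; the amplitude of the same peaks gives the BVP envelope discussed above.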

2.5.1 Problems with Measuring Physiological Responses

Measuring physiological signals in response to multimedia quality can be problematic. One of the main issues is how to separate stress from other emotions, such as excitement about the situation or task, in an experiment. This is a problem because the physiological patterns accompanying each emotion are not clearly understood (Cacioppo & Louis, 1990); however, recent research at the Massachusetts Institute of Technology (MIT) Media Laboratory (Vyzas & Picard, 1999) has shown that eight emotions can be distinguished with eighty percent accuracy, which is an encouraging result. We use the following methods to address this problem in our experiments, by ensuring that no stress is placed on participants by factors other than the quality:

• In our lab-based trials, we hold the environment as constant and minimally stressful as possible. For example, we make sure that stressful environmental events, like the phone ringing, do not occur: we need to determine the effects the quality has in isolation before we can account for environmental events in the field.

• We measure the baseline responses of participants for fifteen minutes, prior to any experimentation. This allows participants, and the sensors, time to settle down, and gives us a set of baseline physiological readings with which to compare responses in an experiment.

• We discard the first five minutes of physiological responses in experiments to ensure that the transition from baseline measurement to the experiment commencing does not affect results.




• We administer subjective assessments of user cost, i.e. scales of discomfort, to allow people to comment on how they feel during the experiment. Physiological measurements identify problems but do not aid problem resolution: subjective assessment is still needed for this purpose.

Finally, we carefully design the tasks used in our experiments to ensure that they are engaging, yet minimally stressful (see section 3.1 for an example). The tasks used in our experiments are taken from the taxonomy of tasks performed in networked multimedia environments developed by the ETNA project (see section 5.2).
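The baseline protocol above amounts to a small computation: discard the settling-in readings, then express a participant's experimental mean as a deviation from his/her own baseline. A sketch, with invented readings:

```python
# Sketch of the baseline-comparison protocol described above: discard the
# settling-in readings, then express the experimental mean as a z-score
# against the participant's own baseline. All readings are invented for
# illustration; the real study used 15-minute baselines.
from statistics import mean, stdev

def stress_deviation(baseline, experiment, discard=5):
    """Drop the first `discard` experimental readings (settling-in),
    then return the experimental mean as standard deviations above the
    baseline mean."""
    settled = experiment[discard:]
    mu, sigma = mean(baseline), stdev(baseline)
    return (mean(settled) - mu) / sigma

# Hypothetical per-minute GSR readings (microsiemens)
baseline = [8.0, 8.2, 7.9, 8.1, 8.0, 8.3, 7.8, 8.1]
experiment = [9.5, 10.0, 9.8, 8.4, 8.6, 8.5, 8.3, 8.6]
print(f"{stress_deviation(baseline, experiment):+.1f} sd")  # +2.6 sd
```

Normalising each participant against their own baseline in this way sidesteps the large between-subject differences in absolute GSR, HR and BVP levels.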



3 Experimental Study

Investigations have generally found that the addition of video to an audio stream does not improve task performance in the context of MMC. The exceptions are where conflict resolution or negotiations are required (Short et al., 1976), or when communication is particularly difficult - for example, when participants in the conference do not share the same first language (Veinott et al., 1997). Other research claims that the addition of a video channel benefits the process of communication rather than its result, for example by making the conversation between people more fluent (Daly-Jones et al., 1998). It has often been found that users consider the video to be of subjective benefit: people said they preferred the video to be there (e.g. Tang & Isaacs, 1993). Therefore, it can be presumed that video is not a prerequisite for a user to perform a task, yet without the video channel greater effort must be expended (Monk et al., 1996).

Synchronisation between audio and video ('lip synch') is perceived at around 16 frames per second (fps), and full motion video is defined as 25fps - television quality. High frame rates require a lot of bandwidth, which is not available or affordable for many users. Studies looking at the outcome of a task (as opposed to how communication was conducted - e.g. the number of turns taken) found no impairment in task performance from using low video frame rates (e.g. Kies et al., 1996). It must be noted here that the task being performed is a major factor in the requirements for frame rate. For example, a task that uses 'video-as-data', e.g. a neurosurgeon performing an operation where detailed and rapidly changing information is relayed to help critical surgical decisions (Nardi et al., 1997), requires a high frame rate.

Recent research using subjective assessment and measures of task performance found that users do not notice the difference between 12fps and 25fps when involved in an engaging task (Anderson et al., 2000). In addition, there is no significant difference in task performance at these two frame rates. However, the difference between the same two frame rates is noticed when the material consists of short video clips viewed in isolation (O'Malley et al., in press). If users do not subjectively notice such differences in frame rate when engaged in a task, does this imply that the frame rate has no effect on them physiologically? It needs to be determined whether high frame rates are necessary for the user to be satisfied with the quality and to complete their task without significant user cost. If frame rate turns out not to have a significant impact upon the user, then bandwidth could be conserved and resources allocated elsewhere.

3.1 Method

To investigate the subjective and physiological effects of video frame rate, a full experiment investigating the effects of 5fps and 25fps was devised. We created these 'very low' and 'very high' quality conditions because we wanted to determine whether the difference in frame rate would still go unnoticed when it became more extreme than that used by Anderson et al. (2000).

Twenty-four volunteer participants watched two recorded interviews, acted between a university admissions tutor and two school pupils applying to University College London. The tutor and students played themselves in scripted interviews, which had been designed with the help of an admissions tutor to mimic typical interactions in such interviews and to ensure that the content of the interviews was not unduly stressful. The interviews were conducted using IP (Internet Protocol) videoconferencing tools on a high-quality computer screen. Audio quality was good and did not vary during the sessions.

The interviews began at 16fps for five minutes: these results were disregarded in order to account for any change in physiological measurements due to the experiment beginning (see section 2.5.1). The interviews lasted fifteen minutes each. Participants saw two interviews at 5-25-5fps or 25-5-25fps: each frame rate was held for a period of five minutes (the frame rate changed twice in order to counteract any expectancy effect). The task was to make a judgement on the suitability of the candidates. The following hypotheses were posited:

1. There will be different physiological responses to the two frame rates: 5fps will cause more stress.

2. Participants will not subjectively register the frame rate change.

Participants rated the audio/video quality throughout the interviews using the QUASS tool (Bouch et al., 1998). After they had watched both interviews, a questionnaire was administered. This addressed how participants felt during the experiment and asked their opinions on the audio and video quality. Physiological measurements were taken throughout the experiment, and baseline responses were gathered for fifteen minutes before the experiment began.

3.2 Results

The mean GSR, HR and BVP responses at both frame rates are shown in Figures 2, 3 and 4 respectively.

[Figure 2: Mean GSR (microsiemens) for each participant at 5fps and 25fps]

[Figure 3: Mean HR (bpm) for each participant at 5fps and 25fps]

[Figure 4: Mean BVP (%) for each participant at 5fps and 25fps]


A Multivariate Analysis of Variance (MANOVA) was performed on the data with the independent variables frame rate and order of presentation. There was no significant effect of order of presentation on any of the signals: GSR (F(1,22)=0.383, p=0.542); HR (F(1,22)=1.139, p=0.297); BVP (F(1,22)=0.680, p=0.418). There was a significant effect of frame rate on each of the signals: GSR (F(1,22)=9.925, p=0.005); HR (F(1,22)=9.415, p=0.006); BVP (F(1,22)=5.074, p=0.035). Examination of the direction of the means showed that GSR and HR significantly increased at 5fps whereas BVP significantly decreased at 5fps: these results are indicative of stress.
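To illustrate the direction of the within-subjects comparison, a simpler paired analysis per signal conveys the same logic as the MANOVA reported above: does each participant's mean response differ between the two frame rates? The sketch below uses a paired t-test on invented per-participant GSR means, not the study's data:

```python
# Illustrative within-subjects comparison: the study reports a MANOVA;
# this simpler paired t-test per signal conveys the same logic (does
# each participant's mean GSR differ between 5fps and 25fps?). The data
# below are invented for illustration, not the study's measurements.
from math import sqrt
from statistics import mean, stdev

def paired_t(cond_a, cond_b):
    """Paired t statistic over per-participant condition means."""
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

# Hypothetical mean GSR (microsiemens) per participant at each frame rate
gsr_5fps = [10.2, 12.5, 9.8, 14.1, 11.0, 13.2]
gsr_25fps = [9.1, 11.8, 9.5, 12.9, 10.2, 12.0]
t = paired_t(gsr_5fps, gsr_25fps)
print(f"t({len(gsr_5fps) - 1}) = {t:.2f}")  # positive t: higher GSR at 5fps
```

Pairing within participants is what gives the design its power here: absolute GSR levels vary widely between people, but the per-person difference between conditions is comparatively stable.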

4 Discussion of Results

The results from this experiment show that there was a statistically significant effect of frame rate on participants' physiological responses in the direction predicted: 5fps caused responses indicative of stress. Thus, hypothesis 1 is supported. The questionnaire results showed that 84% of participants did not notice the frame rate change subjectively, thus hypothesis 2 is supported. In addition, there was no significant correlation between subjective and physiological results, which indicates that physiological measurements tap into a mechanism that subjective, cognitively mediated, responses do not register.

These results are important as they indicate that, when users are engaged in a task, they do not subjectively notice the difference between two extreme frame rates during or after the task; however, the difference is registered physiologically. Thus, the difference in quality does have a physiological impact upon the user. The direct implication of these results is that at very low frame rates, as used in this experiment, users have to work harder at the same task. Application designers and network providers should consider this information. The findings from this experiment imply that the quality levels acceptable for a task, and those that result in unacceptable user cost, should not be determined by subjective assessment alone, as it may not pick up important but subconscious effects.

5 Discussion

5.1 Conclusions

Three main conclusions can be drawn from this research. Firstly, different levels of media quality cause different physiological responses in users, which can be detected through common physiological measurement techniques. Secondly, subjective assessment and measures of task performance do not pick up all the effects of poor quality in the short term, e.g. in an hour-long experimental study. It is possible that the negative effects of poor quality would emerge in these assessment methods in longer-term studies, yet for laboratory-based experiments physiological responses give a more immediate account of how the quality affects the user.

We therefore argue that the 3-tier approach to multimedia quality assessment, as described in section 2.3, needs to be utilised to determine whether a certain level of media quality is usable. Furthermore, we suggest that the largely neglected question of user cost should be given due attention in usability evaluation of any technology, and that objective measures - such as physiological responses - may be more reliable measures of user cost than subjective methods, which are cognitively mediated.

Critics of this approach may argue that it is not proven that stress responses are a reliable indicator that a factor - e.g. a level of media quality - is actually bad for the user. In our view, it is reasonable to assume that a significant deviation from baseline responses in the direction of stress indicates that the user has to work harder, and that this might manifest itself in a usability problem with prolonged use. At a time when the negative effects of stress in the workplace are being debated, indications that a particular aspect of technology - such as the level of video quality - may be inducing stress deserve further investigation.

5.2 Contributions

Our continuing work in this area aims to produce three substantive contributions. Firstly, the minimum levels of multimedia quality at which users can successfully perform certain tasks, without significant user cost, will be determined. The impact of problems caused by the network, such as audio packet loss, delay and lip synchronisation, will be investigated. However, quality is not uni-dimensional and encompasses more than the variables affected by the network; thus, the effects of other contributing factors must be examined, e.g. volume differences between speakers and image size. This will allow network providers to allocate resources with users' requirements clearly specified, thus improving applications for the end user. These findings will be incorporated into the ETNA Project, which aims to produce a taxonomy of real-time multimedia tasks and applications, and to determine the maximum and minimum audio/video quality thresholds for a number of these tasks. This will greatly assist network providers and application designers, as they will have guidelines on the quality they need to deliver for specific tasks.

Secondly, this research aims to build a utility curve. Utility curves provide a mechanism by which the network state can be related to the end user. Such curves are usually formulated from the results of subjective assessment; however, by using physiological measurements an adaptive application could be built. This would enable the application to receive continuous feedback on the state of the user. In the future, a user 'wearing' a discreet computer, like those being developed at the MIT Media Lab, could have their physiological responses fed into a videoconferencing application. If the computer detected that the user was under stress, it would automatically adjust the variable of the videoconference causing stress, to reduce user cost and increase user satisfaction. If network congestion was occurring, the computer would then refer to the utility curve to deliver the next best quality possible.

Finally, a methodological contribution will be made: guidelines stating the best physiological measurements to indicate a specific impairment in quality will be produced. This will pave the way for much-needed further research in this area.
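The adaptive behaviour envisaged in section 5.2 - stepping along a utility curve when stress is detected - could be sketched as follows. The curve, the quality settings, the stress threshold and all names here are speculative assumptions for illustration, not an implemented system:

```python
# Speculative sketch of the adaptive application described in section
# 5.2: when physiological readings indicate stress, re-select quality
# from a utility curve, constrained by the network state. The curve,
# the threshold and all names are illustrative assumptions.

# Utility curve: candidate (frame_rate_fps, bandwidth_kbps) settings,
# listed best-first
UTILITY_CURVE = [(25, 768), (16, 512), (12, 384), (8, 256), (5, 128)]

def choose_quality(available_kbps):
    """Return the best frame rate the current network state can sustain."""
    for fps, kbps in UTILITY_CURVE:
        if kbps <= available_kbps:
            return fps
    return UTILITY_CURVE[-1][0]  # degrade gracefully to the floor

def adapt(stress_zscore, available_kbps, threshold=2.0):
    """If stress is detected, re-select quality from the utility curve;
    otherwise leave the session unchanged (return None)."""
    if stress_zscore > threshold:
        return choose_quality(available_kbps)
    return None

print(choose_quality(600))  # 16: the best frame rate 600 kbps sustains
print(adapt(2.6, 600))      # stressed user: adapt to 16 fps
print(adapt(0.5, 600))      # user not stressed: None, no change
```

In a deployed system the stress signal would come from continuous baseline-normalised physiological readings, and the bandwidth figure from the transport layer's congestion feedback.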

5.3 Future Applications

We are currently in discussions with British Telecom about the possibility of using physiological measurements as a method of stress detection to evaluate a new interface that has been developed. The MUI (Motivational User Interface) was developed by Bournemouth University and BT's Bournemouth '150' call centre (Millard et al., 1999). It aims to motivate and provide feedback to call centre operators, thus reflecting their positive attitude back to the customer. BT wants to determine whether operators are put under more or less stress when using the MUI, as opposed to the traditional interface. In addition, there is interest in developing an adaptive application, whereby the application would modify itself if the operator became stressed. This example of industrial interest illustrates that the ability to detect discomfort and stress that users do not consciously register has wide-ranging implications for product assessment. It can also be used in areas like teaching, stress control and providing 'emotionally sympathetic' user interfaces.

Acknowledgements Gillian Wilson is funded through an EPSRC CASE studentship award with BT Labs. Many thanks to Dr Janet McDonnell of the Computer Science department at UCL, for her help in creating the interview task.

References

Anderson, A., Smallwood, L., MacDonald, R., Mullin, J., Fleming, A. & O'Malley, C. (2000), "Video Data and Video Links in Mediated Communication: What Do Users Value?", International Journal of Human Computer Studies 52(1), 165-187.

Bouch, A. & Sasse, M. A. (1999), "Network Quality of Service: What do Users Need?", Proceedings of the 4th International Distributed Conference, 22nd-23rd September 1999, Madrid.


Bouch, A., Watson, A. & Sasse, M. A. (1998), "QUASS - A Tool for Measuring the Subjective Quality of Real-time Multimedia Audio and Video", in J. May, J. Siddiqi & J. Wilkinson (eds.), HCI '98 Conference Companion, pp.94-95, 1st-4th September 1998, Sheffield, UK.

Cacioppo, J. T. & Louis, G. T. (1990), "Inferring Psychological Significance from Physiological Signals", American Psychologist 45(1), 16-28.

Daly-Jones, O., Monk, A. & Watts, L. (1998), "Some Advantages of Videoconferencing Over High-Quality Audio Conferencing: Fluency and Awareness of Attentional Focus", International Journal of Human Computer Studies 45, 21-58.

Edelberg, R. & Wright, D. J. (1962), "Two GSR Effector Organs and their Stimulus Specificity", paper read at the Society for Psychophysiological Research, Denver, 1962.

ETNA Project: http://www-mice.cs.ucl.ac.uk/multimedia/projects/etna/

Frijda, N. H. (1986), The Emotions, chapter "Physiology of Emotion", Studies in Emotion and Social Interaction, Cambridge University Press, Cambridge, pp.124-175.

Gilli Manzanaro, J., Janez Escalada, L., Hernandez Lioreda, M. & Szymanski, M. (1991), "Subjective Image Quality Assessment and Prediction in Digital Videocommunications", COST 212 HUFIS Report.

Healey, J., Seger, J. & Picard, R. W. (1999), "Quantifying Driver Stress: Developing a System for Collecting and Processing Biometric Signals in Natural Situations", Proceedings of the Rocky Mountain Bio-Engineering Symposium, 16th-18th April 1999.

ITU-R BT.500-8, "Methodology for the Subjective Assessment of the Quality of Television Pictures": http://www.itu.int/publications/itu-t/iturec.htm

Jones, B. L. & McManus, P. R. (1986), "Graphic Scaling of Qualitative Terms", SMPTE Journal, November 1986, 1166-1171.

Kies, J. K., Williges, R. C. & Rosson, M. B. (1996), "Controlled Laboratory Experimentation and Field Study Evaluation of Videoconferencing for Distance Learning Applications", Virginia Tech HCIL-96-02, available from http://www.hci.ise.vt.edu/~hcil/


Kitawaki, N. & Nagabuchi, H. (1988), "Quality Assessment of Speech Coding and Speech Synthesis Systems", IEEE Communications Magazine, October 1988, pp.36-44.

Knoche, H., De Meer, H. G. & Kirsh, D. (1999), "Utility Curves: Mean Opinion Scores Considered Biased", Proceedings of the 7th International Workshop on Quality of Service, 1st-4th June 1999, University College London, London, UK.

Millard, N., Coe, T., Gardner, M., Gower, A., Hole, L. & Crowle, S. (1999), "The Future of Customer Contact", British Telecom Technology Journal, http://www.bt.co.uk/bttj/vol18no1/today.htm

Monk, A. F., McCarthy, J., Watts, L. & Daly-Jones, O. (1996), "Measures of Process", in M. McLeod & D. Murray (eds.), Evaluations for CSCW, Berlin: Springer-Verlag, pp.125-139.

Nardi, B., Kuchinsky, A., Whittaker, S., Leichner, R. & Schwarz, H. (1997), "Video-as-data: Technical and Social Aspects of a Collaborative Multimedia Application", in K. Finn, A. Sellen & S. Wilbur (eds.), Video-Mediated Communication, Lawrence Erlbaum Associates, pp.487-518.

O'Malley, C., Anderson, A. H., Mullin, J., Fleming, A., Smallwood, L. & MacDonald, R. (in press), "Factors Affecting Perceived Quality of Digitised Video: Tradeoffs between Frame Rate, Resolution and Encoding Format", to appear in Applied Cognitive Psychology.

Seyle, H. (1956), The Stress of Life, McGraw-Hill.

Short, J., Williams, E. & Christie, B. (1976), The Social Psychology of Telecommunications, Chichester: Wiley.

Tang, J. C. & Isaacs, E. A. (1993), "Why Do Users Like Video: Study of Multimedia Supported Collaboration", Computer Supported Cooperative Work 1, 163-196.

Teunissen, K. (1996), "The Validity of CCIR Quality Indicators along a Graphical Scale", SMPTE Journal, March 1996, 144-149.

Thought Technology Ltd.: http://www.thoughttechnology.com/

Veinott, E. S., Olson, J. S., Olson, G. M. & Fu, X. (1997), "Video Matters! When Communication Ability is Stressed, Video Helps", Proceedings of CHI '97, pp.315-316, 22nd-27th March 1997, Atlanta, GA.


Vyzas, E. & Picard, R. W. (1999), "Offline and Online Recognition of Emotion Expression from Physiological Data", Workshop on Emotion-Based Agent Architectures, Third International Conference on Autonomous Agents, 1st May 1999, Seattle, WA.

Watson, A. & Sasse, M. A. (1997), "Multimedia Conferencing via Multicast: Determining the Quality of Service Required by the End User", Proceedings of AVSPN '97 - International Workshop on Audio-visual Services over Packet Networks, pp.189-194, 15th-16th September 1997, Aberdeen, Scotland, UK.

Watson, A. & Sasse, M. A. (1998), "Measuring Perceived Quality of Speech and Video in Multimedia Conferencing Applications", Proceedings of ACM Multimedia '98, ACM New York, pp.55-60, Bristol, UK.

Wilcott, R. C. (1967), "Arousal Sweating and Electrodermal Phenomena", Psychological Bulletin 67, 58-72.

Wilson, F. & Descamps, P. T. (1996), "Should We Accept Anything Less than TV Quality: Visual Communication", paper presented at the International Broadcasting Convention, 12th-16th September 1996, Amsterdam.