Conference Proceedings - DIRC

The 5th Annual DIRC Research Conference NESC - Edinburgh 15th - 17th March 2005

DIRC Research Themes: Diversity, Responsibility, Risk, Structure, Timeliness.

Proceedings of the Fifth DIRC Research Conference Edinburgh, March 2005

Edited by Jo Mackie and Mark Rouncefield (Lancaster University, UK)

Lancaster University Press. ISBN 1-86220-159-5


Table of Contents

Introduction – Interdisciplinary Research Themes................................................................. 1

RISK

Managing the Risks of Electronic Assistive Technology: Two Complementary Methods....... 4 Gordon Baxter, Guy Dewsbury, Andrew Monk, and Ian Sommerville Submitted under the TA EAT

Risks and Dependable Deployment....................................................................................... 8 Jo Mackie and Dave Martin. Submitted under the TA Dependable Deployment

Studies in Behavioural Decision Making............................................................................. 13 Lisbeth Aagaard Submitted as a DIRC funded PhD Contribution

Building the NTRAC: Something Old, Something New...................................................... 15 Kate Ho Submitted as a DIRC funded PhD Contribution

Opacity and Secure Information Flow ................................................................................. 17 Thea Peacock Submitted as a DIRC funded PhD Contribution

Cognitive Assistance for Dementia ..................................................................... 20 Joe Wherton Submitted as a DIRC funded PhD Contribution

DIVERSITY

A Study of Confidence in Safety Judgements...................................................................... 24 E. Alberdi, R. Bloomfield, M. van der Meulen, B. Littlewood and Peter Ayton Submitted under the TA Arguments

TA Weinberg ...................................................................................................................... 25 David Greathead, Budi Arief and Joey Coleman Submitted under the TA Weinberg

Some Difficult Decisions are Easier Without Computer Support......................................... 30 Andrey A. Povyakalo, Eugenio Alberdi and Lorenzo Strigini Submitted under the TA Mammography

A Socio-technical Approach to Voting Systems .................................................................. 31 P Y A Ryan Submitted under the TA Chaum

The Effectiveness of Choice of Programming Language as a Diversity Seeking Decision ... 35 M.J.P. van der Meulen, P.G. Bishop and M. Revilla Submitted under the TA Programming Contest

The Limits of Personas ....................................................................................................... 39 Peter Bagnall, Guy Dewsbury and Ian Sommerville Submitted as a DIRC funded PhD Contribution

Towards A JXTA “Peer to Peer” Infrastructure For Dependable Services ........................... 41 Stephen Hall Submitted as a DIRC funded PhD Contribution

Exploiting Diversity in Peer-to-Peer Systems...................................................................... 44 Daniel Hughes, Geoff Coulson and Ian Warren Submitted as a DIRC funded PhD Contribution


The Effect of Diverse Development Goals upon Computer-Based System Dependability ... 46 Tony Lawrie Submitted as a DIRC funded PhD Contribution

Self Organisation in Large Scale Peer-to-Peer Systems ....................................................... 49 Richard Paul Submitted as a DIRC funded PhD Contribution

TIMELINESS

When-to-Act: Evidence for Action Bias in a Dynamic Belief-Updating Task...................... 52 Michael Hildebrandt and Joachim Meyer Submitted under the TA When To Act

‘Dasein of the Times’: Temporal Features of Dependability................................................ 56 K. Clarke, J. Hughes, D. Martin, M. Rouncefield, A. Voß, R. Procter, R. Slack, M. Hartswood. Submitted under the TA Trust Book

Measurement-Based Worst-Case Execution Time (WCET) Analysis ................................ 64 Adam Betts and Guillem Bernat Submitted as a DIRC funded PhD Contribution

Contract Negotiation, a Tale of Five Themes ...................................................................... 66 Russell Lock Submitted as a DIRC funded PhD Contribution

Human Error Analysis for Collaborative Work and the Timing Theme ............................... 68 Angela Miguel Submitted as a DIRC funded PhD Contribution

RESPONSIBILITY

TA Vulnerability Analysis .................................................................................................. 72 Karen Clarke Submitted under the TA Vulnerability Analysis

Complexities of Multi-organisational Error Management.................................................... 74 John Dobson, Simon Lock, Dave Martin Submitted under the Responsibility Theme

‘That’s How The Bastille Got Stormed’: Issues of Responsibility in User-Designer Relations .... 82 Dave Martin and Mark Rouncefield Submitted under the TA Dependable Deployment

Patterns of Responsibility ................................................................................................... 92 Marian Iszatt-White, Simon Kelly, Dave Martin and Mark Rouncefield Submitted under the TA Making Ethnography Accessible

Some Notes on the Social Organization of Responsibility ................................................. 100 John Hughes, Dave Martin and Mark Rouncefield Submitted under the TA Making Ethnography Accessible

The Influence of Regret on Choice: Theoretical and Applied Perspectives ........................ 110 Chris Wright Submitted as a DIRC funded PhD Contribution

STRUCTURE

Structuring Dependable on-line Services: A Case Study Using Internet Grocery Shopping....... 114 G. Baxter, B. Arief, S. Smith, A. Monk Submitted under the TA Net Neighbours

Cognitive Conflicts in Aviation: Managing Computerised Critical Environments ............. 119 Denis Besnard and Gordon Baxter Submitted under the TA Cognitive Mismatch


Dynamic Coalitions: A Position Paper .............................................................................. 125 The Dynamic Coalitions Coalition Submitted under DIRC

Developing an Ontology for QoS ...................................................................................... 128 Glen Dobson and Russell Lock Submitted under PA9

Capturing Emerging Complex Interactions Safety Analysis in ATM................................. 133 Massimo Felici Submitted under the TA NATS

Specification and Satisfaction of SLAs in Service Oriented Architectures......................... 141 Stuart Anderson, Antonio Grau and Conrad Hughes Submitted under PA9

Structuring Defences in Dependability Arguments............................................................ 151 Mark Sujan, Shamus Smith and Michael Harrison Submitted under the TA New for Old

Analysing User Confusion in Context Aware Mobile Applications ................................... 161 K. Loer and M. D. Harrison Submitted under the TA Constraints

Long Running Composite Services ................................................................................... 172 Jamie Hillman Submitted under PA9

Research Into Architectural Mismatch In Web Services and Its Relation to DIRC Themes..... 176 Carl Gamble Submitted as a DIRC funded PhD Contribution

Exploration Games With UML Software Design............................................................... 178 Jennifer Tenzer Submitted as a DIRC funded PhD Contribution.


DIRC Research Conference March 2005
'Interdisciplinary Research Themes'

The focus for the DIRC Research Conference this year is the 'Interdisciplinary Research Themes': Risk, Diversity, Timeliness, Responsibility and Structure. These 'Research Themes' act as a way of gathering, analysing and recording the lasting knowledge that comes out of DIRC research. One motivation for selecting the themes was that it should be possible (and interesting) to look at them from both a technical (system) and human (user) viewpoint. What we have in these proceedings, therefore, is a selection of interdisciplinary papers that apply ideas from social science research to the technical issues of dependable computing (and vice versa). In these papers DIRC researchers, and DIRC funded PhD students, have reconsidered and refocused their existing and ongoing body of research, addressing how it might advance current thinking on the different research themes.

Risk: Most technical work on risk has concentrated on identifying undesirable events and assessing the severity of their consequences and their probability of occurrence. This technical view of risk is challenged by new computer-based systems that are deeply embedded and widely distributed in our culture, and by the difficulties encountered by decision takers in utilising technical risk information. The DIRC challenge is: "to take a broader view of risk in which technical developments and issues of the decision taking process are developed hand in hand. This broader view has to embrace 'secondary dimensions' of risk such as whether the risk is undertaken voluntarily, whether the risk has catastrophic consequences, to what extent the consequences of the risk are reversible, etc. We believe that these are important bridges between formal, probabilistic assessments and the decision-taking context."

Diversity: Significant effort is being put into understanding design diversity – how it can be measured, assessed and maximised. But it is clear that diversity can be applied in much more general forms, such as the elicitation of requirements using the "diverse" viewpoints of many domain experts, and the use of "independent" argument legs in safety cases. The DIRC challenge is that: "By studying wider applications of diversity – between humans and computers, different procedures and intellectual approaches, between more than one human, and so on – we will develop a better understanding of the contribution of diversity to dependability."

Timeliness: Timeliness poses special concerns for the safe use of systems; there are interesting contrasts between the ways in which a system can be developed to meet guaranteed timing constraints and the expectations that one can put on the human users of a system. The DIRC challenge is that: "DIRC research spans technical aspects of notations or logics for time through to the human perceptions of time. One interesting issue is the need for notations that make it easier to talk about cyclic behaviour."

Responsibility: It is important to show clearly the ways in which the acceptance, recording and discharge of responsibilities are reflected in the technical systems that mediate social relationships, particularly in the presence of social and human failure. The DIRC challenge is that: "DIRC research in this area will both look at ways to express responsibility structures and their use in the design of dependable systems."

Structure: A well-chosen structure helps system designers and evaluators to understand a system by allowing them to “divide and conquer” the system’s complexity, and ensures that any constraints imposed by the structure do not impose unacceptable overheads on the operation of the system. The DIRC challenge is that: "It is understood how to deploy redundancy to protect against failures of physical components; but it is difficult to apply the same techniques in a systematic way to human behaviour, and hence the issue of structuring a complex computer based system as a whole is still a great challenge. Indeed, the IRC will probably have to tackle the issue of languages to describe human behaviour in order to reason about overall system architectures that will guard against human errors."


Risk Theme


Managing the Risks of Electronic Assistive Technology: Two Complementary Methods

Gordon Baxter1, Guy Dewsbury2, Andrew Monk1 and Ian Sommerville2

1 Department of Psychology, University of York, Heslington, York YO10 5DD, +44 1904 434369

2 Computing Department, Infolab 21, South Drive, Lancaster University, Lancaster LA1 4WA, +44 1524 510351

{g.baxter, a.monk}@psych.york.ac.uk

{dewsbury, is}@comp.lancs.ac.uk

ABSTRACT In the UK, Electronic Assistive Technology (EAT) is widely seen as a way of enabling older and disabled people to continue to live independently for longer. EAT systems are therefore critical to the health and well-being of these people, and hence must be dependable. Two methods are described which provide assurances about the dependability of EAT systems. These methods adapt the concept of risk as normally used in the workplace to make it more applicable to domestic settings. In so doing, the methods provide a way of identifying and managing risk in the home.

Keywords Risk; Electronic Assistive Technology; Dependability.

1. INTRODUCTION

The widely documented changes to the populations of the UK and beyond—older people living longer and a smaller working population generating the resources to provide care for them—mean that governments are looking to Electronic Assistive Technology (EAT) to help provide a solution. The intention is that EAT will be used to help elderly and disabled people to live independently, thereby reducing the need for costly institutional care, and will monitor and help in the control of their health and general well-being.

The concept of dependability was originally developed for computer-based systems that were regarded as critical: safety critical, mission critical, business critical and so on. A system that is dependable should do what the user expects it to do, whenever the system is required to deliver a service. In the workplace, the attributes that contribute to the dependability of a system are: availability, reliability, safety, confidentiality, integrity and maintainability. Given that EAT is to contribute to the health and well-being of individuals, any EAT system should therefore be considered a critical system, because the person using it comes to rely on it for their quality of life. The EAT system should enable users to do what they want to do when they want to do it. In other words, the EAT system must be dependable.

In the home, however, the environment tends to be less rigidly controlled than the workplace. The notion of dependability in the home therefore differs from that applied in the workplace. More particularly, the concept of dependability in the home has more facets to it, including attributes such as cost, user repairability, and aesthetics.

In this paper we look at how the dependability of EAT can be assessed. Two different methods are described. The first is based on ideas from human reliability analysis that have been

used in industry. These ideas have been developed into a framework that is more applicable to domestic settings. The second is based on taking the standard model of dependability as defined by Laprie [1] and extending it to take appropriate account of the idiosyncrasies of the home setting. Each of the methods has been developed into a tool that can be used in the field.

2. FROM RISK ANALYSIS TO THE PIT

Living independently is a risky business, particularly for elderly people. Whilst there is now a growing acceptance that technology can help, there has been little work on the systematic assessment of the risks that the technology is intended to mitigate.

A systematic, bottom-up approach has been developed that applies Swain and Guttmann's [2] error modes to the activities of daily living (movement, nutrition, hygiene, etc.) and other events that can occur in the home (fire, flood, etc.) to generate a taxonomy of domestic mishaps. From this, seven generic types of harm (three physical and four psychological/social) were identified. The seriousness of the harms is determined by the four generic consequences (distress, loss of confidence, costly medical treatment and death) that were also identified.

The risks to the client are managed by first identifying the most important ones and suggesting interventions to reduce them. Then the proposed system (with the interventions) is evaluated to make sure that there are no unforeseen side effects of the interventions. Finally, after the new system is in place and the client has become used to it, the system is evaluated again to see how it has affected the client's behaviour and abilities. The process is described at length in [3].

Field studies were carried out to identify the sorts of problems that people have with EAT. The results from these studies were used to instantiate the framework as a Post Installation Technique (PIT) for assessing the impact of EAT on older people living independently. More particularly, the intention was to identify the problems and benefits, as perceived by the clients, that occur after the client has had the technology installed for a few months.

The PIT was initially tested in West Lothian. Although none of the EAT installations had any reported problems, there were lessons learnt about the structure and delivery of the questionnaire that implements the primary part of the PIT. The original intention had been to walk around the home with the client, asking questions in each of the rooms; this was not possible, since each of the participants had some mobility problems. The other main refinement was in the structuring of the questions. There was some repetition in the questions, and it was not always clear which questions related to problems


and which to benefits, which is required for the second part of the PIT, where the client-centred risk analysis is carried out.

The PIT was also given to staff from FOLD Telecare in Northern Ireland, in order to get an expert assessment of it. During interviews, the staff reported that they were very happy with the structure of the PIT and its general aims. The only question that they said they would not use was related to taking medication, because they do not use medication reminders.

Table 1 shows a sample question from the refined version of the PIT. The question starts by asking where some particular activity (or activities) of daily living is carried out, as a way of focusing the client's attention on a particular task in a particular part of the home. The client is then asked each of the questions in turn to determine how the technology has affected the way that they do that activity of daily living in that location. The person who is using the technique puts a tick in the box next to the question if the client answers in the affirmative. Those questions that are marked by a B identify benefits of the equipment; those marked by a P identify problems.

The benefits are listed on a summary sheet, and each problem is allocated its own separate sheet. Each of the problems is processed separately. The clients are asked to identify what they perceive as the possible harm arising from the problem, the chances of that harm occurring, and the potential consequences of that harm. Once this has been done, the client allocates their priority for having the problem fixed, and is given the opportunity to suggest how it may be fixed. Once the problems and benefits have all been processed, the forms are passed on to the appropriate authority for further action. Usually this will be the people who supplied or installed the system.
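The triage logic described above — affirmative answers tagged B go onto a benefits summary sheet, while each affirmative answer tagged P gets its own problem sheet recording the client's perceived harm, its likelihood, its consequence, and a fix priority — can be sketched as a simple data model. This is an illustrative reconstruction, not part of the published PIT; all type and field names are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical model of PIT questionnaire processing: each question is
# tagged "B" (benefit) or "P" (problem); only affirmative answers are kept.

@dataclass
class ProblemSheet:
    question: str
    perceived_harm: str = ""   # filled in with the client
    chance_of_harm: str = ""   # client's own estimate, e.g. "likely"
    consequence: str = ""      # e.g. "distress", "costly medical treatment"
    fix_priority: int = 0      # client's priority for having it fixed
    suggested_fix: str = ""    # client's suggestion, if any

@dataclass
class PitResult:
    benefits: list = field(default_factory=list)   # the summary sheet
    problems: list = field(default_factory=list)   # one sheet per problem

def process_answers(answers):
    """answers: iterable of (tag, question_text, affirmative) tuples."""
    result = PitResult()
    for tag, question, affirmative in answers:
        if not affirmative:
            continue  # only ticked (affirmative) answers are recorded
        if tag == "B":
            result.benefits.append(question)
        elif tag == "P":
            result.problems.append(ProblemSheet(question))
    return result

answers = [
    ("B", "preparing food is easier", True),
    ("P", "there are new things you have to do now", True),
    ("P", "some things are harder", False),
]
r = process_answers(answers)
print(len(r.benefits), len(r.problems))  # prints "1 1"
```

Each `ProblemSheet` would then be completed with the client before the forms are passed to whoever supplied or installed the system.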

2. Where do you prepare food, eat and clear up? ………………………

Since the equipment was installed:
B [ ] (a) do you find preparing food, eating, and clearing up easier, or are you more confident in doing it?
P [ ] (b) do you find preparing food, eating, and clearing up harder, or are you less confident in doing it?
B [ ] (c) are there new things you can do now that you could not do before when preparing food, eating and clearing up?
P [ ] (d) are there things you can no longer do when preparing food, eating, and clearing up?
P [ ] (e) are there new things you have to do now when preparing food, eating and clearing up that you did not need to do before?
P [ ] (f) are there things you do in a different way when preparing food, eating and clearing up?

Table 1 Sample question from the Post Installation Technique.

3. BROADENING DEPENDABILITY FOR THE HOME

The second approach is based on field work with older and disabled people living in the north of England and Scotland. It reinterprets and extends Laprie's model of dependability [1], originally developed for control and protection systems, within a social context. The new framework embraces the original model but also takes explicit account of the user and the system's environment, rather than positioning them outside the system boundary. That is, when a computer-based system is installed in a domestic environment, the main concern is not just whether or not that system is failure-free. Rather, the overall system dependability depends on whether or not it fulfils its intended purpose as far as the system users are concerned. If it does not do so, it will not be used. This situation is equivalent to an unplanned system failure rate of 100%: hardly a dependable system.

The users of EAT may suffer from a range of disabilities which the EAT is supposed to help them overcome, and more generally help them cope with everyday life in their own home. These elderly people depend on the technology to maintain a reasonable quality of life but, all too often, it lets them down. Sometimes it simply fails to operate but, more often, it is not or cannot be used as intended because its design does not take into account the specific needs of the elderly users, the context in which the system will be installed, and the natural human desire to control rather than be controlled by technology.

The applicability of some of the assumptions underlying Laprie's model has been challenged, mainly due to the distinctions between home and organisational environments [4]. In addition to 'traditional' dependability attributes, dependable home systems must also be acceptable to their users, fit in with their daily routines and lifestyle, and support user adaptation as user needs change.
Fieldwork in which cultural probes were used with older and disabled people [5] demonstrated that the dependability of a system needs to be considered from the first stages of the design process. When an initial assessment of a person is made, there is a requirement to suggest whether the person could benefit from technology, and an assumption that the assessor is qualified to judge the appropriate technology. In reality the assessor is unlikely to be able to keep up to date, let alone be able to judge the quality of a system or device.

The framework of social dependability was used to develop a series of checklists designed to assist professionals who work in the field. These checklists extend notions of dependability into the social arena, from both a practical and a theoretical perspective, by adding a common vocabulary which allows people to communicate about the person and the system. The checklists also sensitise users to a philosophy of user-centred design in which people are considered not as labels, such as a medically functional/dysfunctional patient or a "disabled" person, but rather as they wish to be seen or see themselves [6]. The resultant technology should be less likely to be abandoned and more likely to be of real use to the person requiring it.

The checklists (CATCH: A Compendium of Assistive Technology Checklists for the Home) provide three forms that allow the user to base technology decisions and requirements on the real needs of the person who needs the technological intervention. The sixty-page booklet is currently being evaluated and has been greeted with positive feedback so far.


The CATCH method has three stages. The first is a quick reference checklist designed to enable decisions to be made about whether a particular system is required, or whether there are more suitable alternatives. The second provides a way for the user to focus on which issues are likely to require particular consideration in which locations in the home. The third is a more detailed consideration of the issues that will contribute to the dependability of the system that is supplied.

Table 2 shows an example of the sorts of questions that make up this third part of the CATCH. There is one such set of questions for each of the attributes of dependability. As with the PIT, the questions are intended to be used creatively. The key question is intended to identify the main issue that is being addressed by the system related questions and the user related questions. The system related questions generally address technological issues, whilst the user related questions are more focused on issues to do with the use of the system.

Viewpoint Housing association in Scotland and the MHA care group in Penrith and Derby have both been involved in manual walkthroughs of the process of applying the CATCH. The feedback has been used to refine the CATCH checklists.

Acceptability: 02 – Learnability

Key Question
9. Can the user easily learn how to use the system without excessive training?

System Related Questions
In what ways is the system required to be easy to learn to use?
What parts of the system does the user not require information on? Why is this?
Does the system require a long manual to understand it?
Is there a quick start guide to assist the user?
Is the user fully aware of the system's limitations?

User Related Questions
Does the user understand what to do in any common event?

Table 2 Sample question from the CATCH technique.

4. DISCUSSION

The two methods described above are designed to take appropriate account of risk as it manifests itself in the domestic setting. The situation in the home is somewhat different from the workplace, because liability in the home often rests with the resident or home owner. In other words, the final decision about the tolerability of a particular risk in a particular home is often down to the resident or home owner. The classic, often-quoted example is the case of rugs in older people's homes. Whilst it is widely accepted that there is a risk of tripping over them, many older people steadfastly refuse to sanction the removal of their "favourite" rug.

Table 3 provides a simple comparison of the CATCH and PIT methods. In addition to showing the origins of the two methods and the target user populations, the table also identifies training and evaluation issues. The final line of the table relates to what is supposed to be done with the results obtained by using each of the methods.

                          CATCH                                  PIT
Basis                     Generalised dependability              Generalised HRA: harms, ADLs,
                          properties model: fitness              consequences, steps
                          for purpose, etc.
Technique users           General purpose (requires some         Clients, carers, telecare
                          specialist knowledge)                  service staff
Training                  To explain field and procedure         Minimal
Procedure for evaluation  Flexible                               Detailed specification
Procedure for action      Unspecified                            Less well specified

Table 3 Comparison of PIT and CATCH methods

The CATCH is mainly targeted at design issues. As such, it implicitly encourages risk prevention. Rather than carrying out an explicit risk analysis, CATCH focuses on making sure that the system will be dependable. It does this by considering all the attributes of dependability (as the concept applies to domestic systems [4]) as a key to identifying potential problems during design, so that they can be dealt with at that time. In this way, it should flag up the possible sources of harm, so that they can be eliminated, or at least so that the chances of the harm occurring can be minimised by using appropriate barriers and defences. There is still the potential for problems to arise after the CATCH has been used, however: for example, if the implementation does not faithfully realise the design, or if the system is not installed correctly.

The PIT, having been developed from industrial approaches to risk analysis, is more explicit about how to deal with risks in the home. Since the PIT is intended to be applied after the EAT has been installed, it is more aimed at mitigating risk, as the concept applies in the home [3]. It acknowledges the authority of the resident or home owner by adopting a client-centred perspective on risk. The PIT highlights the problems with EAT in a systematic way, but then allows the client to decide how significant each of the problems really is, by considering the possible harms and consequences as they are perceived by the clients. The clients can also make suggestions about how the problems can be mitigated. In this way, the clients make the final decision about which risks can be defined as tolerable.

The CATCH and the PIT can therefore be seen as complementary. The CATCH should help to reduce the problems that are generated by the system.
Any remaining problems (or new problems introduced by implementation or installation) should be picked up by the PIT. Taken together, then, the two methods provide a belt-and-braces approach to coping with risk.

5. FUTURE WORK

The final stage of the evaluation of the methods is in hand. Arrangements are being made to use both of the methods on the same systems. The intention is that CATCH will be used


with the person who specified the equipment that was going to be supplied to a client. The PIT will then be used with these clients to identify any problems and benefits that they may have had with the equipment.

Although the CATCH is more general purpose than the PIT, the results of this final evaluation should show some degree of overlap in the problems that the two methods identify. This is because the CATCH has the capability to identify design level problems, which may manifest themselves as problems for the clients after the equipment is installed.

As part of their ongoing assessments of installations of telecare equipment, FOLD routinely carry out a reassessment six months after the initial installation. This allows time for the system to bed in, and for the client to get used to living with the equipment, which often has some impact on their everyday living. FOLD are planning to use the PIT as part of this post installation assessment.

Viewpoint Housing have expressed an interest in using the CATCH checklists as part of their policy. MHA have found elements of the CATCH checklists useful in informing their design plans for future buildings.

6. SUMMARY

Two methods have been developed for assessing the dependability of EAT. The first of these, the PIT, is based on risk analysis techniques from industry and provides a client-centred, systematic way of assessing the risk posed by problems with EAT. The second, the CATCH, adopts a broad view of dependability and tries to prevent the occurrence of problems by addressing the issues at design time. Taken together, the two approaches are complementary, and offer a comprehensive way of dealing with the potential risks of using EAT in the home.

7. ACKNOWLEDGMENTS

The authors would particularly like to thank all those people who took part in the field studies. The assistance of Pam Mills (Durham), Lynn McAllister (West Lothian), Barbara Taylor (FOLD Telecare, Belfast) and Roy Thompson (FOLD Telecare, Belfast) has been invaluable in developing and refining the PIT. Anna Marshal-Day, Eileen Burns and Aileen Orr assisted in the evaluation of the CATCH, whilst Age Concern Barrow, MHA Penrith, and Dundee Social Work were all involved in the development stages leading up to the CATCH.

8. REFERENCES

[1] Laprie, J.-C. (1995). Dependable computing: Concepts, limits, challenges. Paper presented at the 25th IEEE International Symposium on Fault-Tolerant Computing, Pasadena, CA.

[2] Swain, A. D., & Guttmann, H. E. (1983). A handbook of human reliability analysis with emphasis on nuclear power plant applications (NUREG/CR-1278). Washington, DC: US Nuclear Regulatory Commission.

[3] Monk, A., Hone, K., Lines, L., Dowdall, A., Baxter, G., Blythe, M., & Wright, P. (Submitted for publication). Towards a practical framework for managing the risks of selecting technology to support independent living. Applied Ergonomics.

[4] Dewsbury, G., Sommerville, I., Clarke, K., & Rouncefield, M. (2003). A dependability model for domestic systems. In Proceedings of the 22nd International Conference on Computer Safety, Reliability and Security (SAFECOMP'03). Heidelberg, Germany: Springer-Verlag.

[5] Dewsbury, G., & Sommerville, I. (2004). CATS: Assisting older people obtain appropriate technology support. Paper presented at the HCI and the Older Population workshop at the BCS British HCI Conference 2004, Leeds, UK, 7 September 2004.

[6] Dewsbury, G., Clarke, K., Hemmings, T., Hughes, J., Rouncefield, M., & Sommerville, I. (2004). The anti-social model of disability. Disability and Society, 19(2), 145-158.

Page 7

Risks and Dependable Deployment

Jo Mackie

David Martin

Computing Department Lancaster University Lancaster, UK +44 (0)1524 510352

Computing Department Lancaster University Lancaster, UK +44 (0)1524 510348

j.mackie@comp.lancs.ac.uk

d.b [email protected]

ABSTRACT
This paper reports on some of the work produced on the DIRC Targeted Activity 'Dependable Deployment'. It focuses in particular on the risks that arise during the development and deployment stages of systems design. Risks inevitably plague complex systems design projects, and since few projects can be stopped and begun again, professionals often try to avoid them, or to solve emergent problems by sharing knowledge gained from personal experience - 'war stories' - with other practitioners. We report on our development of a web site that lists war stories - descriptions of risks and the actions subsequently taken - arising from specific healthcare information systems development projects. It is intended as a resource that enables developers in this domain to learn from the problems and experiences of other projects.

Keywords
Risk, Dependability, Deployment, War Stories, Hazards, Healthcare Information Systems, Ethnography

Karen Clarke, Devina Ramduny-Ellis, Simon Lock, Mark Hartswood, Gillian Hardstone

1. INTRODUCTION
When seasoned practitioners are asked how to design better, more dependable computer systems, on time, and without problems in deployment, the answer is often twofold: after the first attempt, throw it away and begin again, and do so with the same project team [e.g. Brooks, 1]. The idea is a straightforward one: in most design projects numerous decisions are made that later turn out to be 'non-optimal', erroneous or mistaken. In the fullness of time developers realize that if they had known then what they know now, they would have done things differently. When considering organizational systems (large scale, complex, having a definite impact on organizational practices and operation), the process of design is also a process of learning about the different parts and practices of the organization, and about the impacts of a design on those parts and practices. This means that, unfortunately, the required and desired knowledge of the organization is often only achieved at the end of a project. When considering the development of complex and/or large scale systems, the possibilities of throwing the initial system away and starting again are often slim. Furthermore, project teams rarely stay together in their entirety over the course of a single project, never mind across the development of two or more projects. In the NHS, if projects are shelved, the subsequent system is likely to be built by another project team, from a different private company, configuring their own customizable-off-the-shelf (COTS) solution. Much of the learning of a previous project - in terms of documentation and expertise - is likely to be lost or out of date, as in this domain the technology, its envisaged role, and the working practices and procedures of the organizations change rapidly. Of course, organizations do learn through their own experiences; however, timeliness can be an issue. The question of how much of what is learnt in hindsight can be put to good use in the future is an open one. However, since there is no 'silver bullet' of a design method or process - there may be better ways of doing things on a particular project, a more suitable COTS system to buy, more expert designers and programmers to employ, etc., but still no sure-fire route to success - previously acquired knowledge and experience will necessarily play an integral part in design and development. Based on this idea we wanted to build a website for developers of healthcare systems to share their expertise and experiences effectively, to help avoid some of the pitfalls of previous projects and to share their knowledge of development problems and possible solutions.

2. EMPLOYING 'STORIES' IN DESIGN

Organizations have attempted to archive knowledge and experience through knowledge management initiatives that seek to capture, codify and store knowledge electronically. However, these formal (or 'hard') programs have often met with only partial success. For example, personnel often begrudge the extra work required to populate 'knowledge' databases, may resist making their specialist knowledge publicly available to their company - so undermining their own expert position - and successfully searching for, retrieving and recontextualizing the 'knowledge' is notoriously problematic (e.g. [5]). Other branches of research, together with the lack of success of formal approaches to knowledge management, have led researchers to focus on the informal, mundane ways in which knowledge, experience and expertise are employed, shared and passed on between personnel within and between different organizations. Numerous studies have now drawn attention to the role of narrative, or stories, in communicating just this type of information. A seminal work in this area is Orr's [6] study of photocopier technicians. He describes how the technicians form a community of practice in which they routinely tell 'war stories' to one another as a means of sharing their experiences. As well as being stories in which, for example, the narrators demonstrate their own 'heroism' or 'ingenuity', it is through these means that knowledge of problems and solutions is most successfully shared and transferred. That this is the case is shown by the fact that the company achieved more efficient and effective work by the technicians not by designing a better problem-and-solution database or better manuals, but by supplying the technicians with mobile phones so they could


talk to each other about the problems as they encountered them. Currently, there is great interest in the role that stories can play in organizational activities - not only for problem solving but also for many other knowledge management activities, and even as a means for inspiration, leadership and strategic thinking. Much has been written, and the idea is now espoused by a number of well-known academics and management gurus [e.g. 3, 4]. The basic idea is that stories are a 'natural' way for humans to communicate ideas, knowledge, experience and so forth; they are necessarily social, they help us bond with other people, and they contain the contextual details that are lost when we abstract and codify information. They are also part of a dialogue, so the teller can be queried for more information, elaboration, re-specification and so forth, and mutual understanding is more easily achieved. The idea is simple: to increase the transfer of knowledge, increase the opportunities to share stories.

The view of Larry Prusak (IBM's head of knowledge management) and his co-author, David Cohen [2], is that, as an organizational strategy, companies should encourage many forms of social interaction between members of staff, as this increases storytelling and all that goes with it, which in turn increases what is termed the social capital of the organization. Social capital - shared values, trust, community, etc. - is said to be crucial to successful organizational functioning.

3. A RESOURCE OF DEPLOYMENT RISKS, HAZARDS AND WAR STORIES

Over the last few years we have conducted a number of ethnographic studies of systems design and development in healthcare settings. During this time we have seen and documented a number of the difficulties these projects have experienced. While not being in a position to take measures to increase storytelling per se in these organizations (and not having the management skills, nor necessarily the desire), we considered that it might be useful to provide a resource through which the experiences of the project teams might be archived as a catalogue of 'risks' or 'hazards' of deployment. Following on from the research cited above on war stories, storytelling and knowledge management, we decided that a resource that detailed the risks in a narrative format, and which could be used interactively by professionals and practitioners themselves, might be a useful approach to take.

We discounted the idea that we should just design a bare 'template' website to which practitioners could add their war stories and their contact details (if they wished). We believed that we would stand more chance of attracting postings and recruiting interest if we pre-populated the site with 'war stories' organized in some form of structure that would allow practitioners to browse for and locate entries useful to them. With so little to differentiate many sites on the Internet, designing to attract a critical mass of users is crucial. Since we had a wealth of ethnographic material from observing and recording project work, and from interviews with personnel, we decided first to mine this for risks, hazards and war stories with which to populate the site. In a sense, at this stage we were producing war stories by proxy, whereas in the future we would like the war stories to come mainly from the practitioners themselves. These narratives about problems could be posted directly to the website by practitioners. As another source, an ethnographer could explicitly elicit war stories via interviews, or collect them from observations of where they spontaneously occur in interactions between personnel in a setting. In the following section, to illustrate how we derived our 'proxy' war stories from ethnographic fieldwork, we discuss some examples of issue and risk handling from a study of healthcare information system design at a hospital Trust in the North of England.

4. 'ISSUES' AND 'RISKS' HANDLING AT NORTH OF ENGLAND TRUST

The Trust in this study is currently in Phase 1 of a comprehensive three-phase £8 million electronic patient records (EPR) project, delivered as a public private partnership (PPP) in which analysts from the Trust work cooperatively with analysts from the system provider (a US based company (USCo)) to configure their product for use in the hospital. Phase 1 was due to 'go live' in February 2004 and consists of the core administrative system and connected reporting system, A & E, theatres, order communications, and pathology systems. The core administrative/reporting system incorporates various clinical applications and is designed to be integrated with existing legacy systems, most notably a series of pathology applications. Phase 2 involves documenting care (medical records) and GP access, and Phase 3 is concerned with clinical pathways and electronic drug prescription. As is clear, this project has experienced serious slippage and should now 'go live' in February 2005. Unfortunately for the project, this makes it an apt case from which to derive war stories on risks and hazards in deployment.

As is common on many systems design projects, a key concern during development and deployment is to identify risks and potential risks to the project as soon as possible, and consequently to have the apparatus in place to deal with these problems. To manage risks the project team operates two 'logs' - an 'issues log' and a 'risks log'. When a member of the team (usually at weekly meetings) raises something as an issue or problem - this can be on a very wide range of topics related in some way to the design - it is added to the issues log. If an issue is deemed serious enough to threaten the timely delivery of the project it is deemed a risk. The logs are managed such that issues can become risks and vice versa, and if items persist, increase in magnitude or decrease in importance they can be moved up or down the logs, as illustrated in the following quote from a project team meeting:

"...its already on the Risk Log, we uhm probably up the risk number at this stage cos its obviously increased in possibility or likelihood"

The decision as to what goes in which log was usually taken cooperatively by the project team, but the project manager usually makes the final decision. The logs also serve as an apparatus for taking problems to a higher level to be dealt with, thus keeping the paired US and Trust analysts out of arguments that might harm their working relationships. These features are illustrated in this quote from the Trust's project manager:

"I have said I wanted the data to be issues at the risk log now because I I said this delay and um the direction so so um not not that I want anyone to get into an argument with them during the conference call but just so you do know I have escalated this one because I am very concerned"
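The two-log apparatus described above - issues that can be promoted to risks, demoted again, or escalated as their likelihood grows - can be sketched as a small data structure. The class and method names below are our own illustration, not part of the Trust's or USCo's actual tooling:

```python
from dataclasses import dataclass


@dataclass
class LogItem:
    name: str
    severity: int = 1  # rough 'risk number'; raised or lowered as the item evolves


class ProjectLogs:
    """Two linked logs: issues may become risks and vice versa."""

    def __init__(self):
        self.issues: dict[str, LogItem] = {}
        self.risks: dict[str, LogItem] = {}

    def raise_issue(self, name: str) -> None:
        """Something raised at a weekly meeting goes on the issues log."""
        self.issues[name] = LogItem(name)

    def promote(self, name: str) -> None:
        """An issue serious enough to threaten timely delivery becomes a risk."""
        self.risks[name] = self.issues.pop(name)

    def demote(self, name: str) -> None:
        """A risk that shrinks in importance moves back to the issues log."""
        self.issues[name] = self.risks.pop(name)

    def escalate(self, name: str) -> None:
        """Up the risk number when likelihood has clearly increased."""
        self.risks[name].severity += 1


# A hypothetical trajectory, echoing the 'data sets' risk discussed below:
logs = ProjectLogs()
logs.raise_issue("data set definitions unclear")
logs.promote("data set definitions unclear")   # now threatens delivery
logs.escalate("data set definitions unclear")  # likelihood has increased
```

In the fieldwork, of course, the movement of items between the logs was a matter of cooperative judgment with the project manager having the final say, not a mechanical rule; the sketch only captures the bookkeeping.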


The quote also references a risk that the Trust team identified - 'the data' - which in this case referred to the fact that the Trust's analysts were unsure of the data sets that they were meant to be collecting for the purposes of the database build. They felt that they should be getting more direction from their US based counterparts. However, this had led to disputes over whose responsibility this was. This can be thought of as a risk related to the technical design of the system - i.e. what a data set should consist of - but it is also clearly a risk related to contracts, roles and unclear or disputed responsibilities.

Risks, as managed explicitly, come in many forms though, as shown in the following quote:

"Put this in as a very big risk, if the word gets out that the new system is responsible for more work we could be in big trouble."

This risk is to do with the notion that the project could be in serious trouble if the Trust staff believe the system will cause them more work.

Irrespective of the explicit handling of 'issues' and 'risks' using the logs, talk of risks in general is high on the agenda in a project like this, where reputations, money, jobs and even the future of the Trust are on the line. For instance, as shown in the example below, a senior clinician voices their concern that the Trust is implementing a system prior to and separate from the current national programme. They are worried that the Trust will separate itself from the NHS, although in this case they are placated by a technician:

Clinician - "Do we risk getting an isolated, different system that is outside the national system?"

Senior Technician - "The system will be held together by HL7 and XML and the minimum data set, so its all compliant, but with different interfaces etc. but there will be different systems in different places."

Another major risk is that if the new system does not enable them to produce the 'right' figures they may be negatively evaluated, and this may threaten their status and funding:

"because the reports we hand into the NHS are crucial to our funding, as a as a Trust and obviously we have to get the reporting right and and eh there's a huge risk um to the Trust because we're going live six weeks before the end of year, and um so hoho all of our end of year reports we have to make sure are right between hahah that six week period, so oobviously again there's just a huge risk to the Trust as a whole"

As the examples above show, the project team has an ongoing concern for risks and potential risks. The flagging up of potential problems and the use of the logs serve several purposes: as a means of identifying concerns before they become serious, of keeping track of multiple issues, of prioritising problems, of providing a record of issues, and of escalating problems to be dealt with at a more senior level. When deriving our 'proxy war stories' from the ethnographic material we focused on risks that did become more serious problems, i.e. that did delay or disrupt the project. For example, one of the risks we identified (referenced in the quote discussed earlier) concerned the problem of deciding what the data sets should be, and who should collect the information and how. Other example risks identified concerned the problems of working with paired analysts in different countries, difficulties in balancing requirements for the purposes of integration, and a lack of understanding of how much work would be required to configure the COTS system to fit the Trust's requirements. In the next section we describe how we designed and built the website, discussing its structure and providing some examples of its content to expand on the previous discussion of risks derived from ethnographic fieldwork.

5. RISKS AND HAZARDS WEBSITE

The narratives that we have used as proxy war stories are taken from situations during the deployment of healthcare systems where a problem has arisen, where something has not adhered to the deployment plan, or where something else has interfered with the smooth deployment of the system. In order to develop a web site [7] that would be a resource for all of the parties involved in the deployment process, we first had to decide upon a design that would be suitable for both designers and project managers. Part of the design process was to look at what information relating to the war stories would be useful. Obviously a description of the war story would be an essential piece of information, but in addition an anecdotal description of what happened at the time to rectify the situation, or of what was learned when looking back, would be of use to those using the database. It is explained on the web site that the anecdote of the solution used at the time the war story occurred is not necessarily a suggested solution, but is simply there as an example of what happened. One issue that we had to consider is confidentiality within the healthcare sector: in order for this to become a growing resource we felt it necessary to allow anonymity (if desired) for the war story authors as well as for the war stories themselves. Here is an example of a war story whose solution is a useful anecdote of what was learned after the event:

"Name: Lack of Code of Connections.

War Story: In Public Private Partnerships (PPPs) to design and deploy EPR systems, the private supplier requires code of connections approval in order to enable access to the NHS network, through which the networks of individual hospitals can be reached. This technical security clearance is necessary for off-site access to the Trusts' networks and systems. In the case at Preston, code of connections approval was overlooked during initial project planning and preparation. With the private supplier being US based, and the build going on simultaneously at two sites, the lack of code of connections during the first six or so months of database build and configuration meant that work was hampered by the inability of paired US and UK analysts to share real-time, up-to-date details of the system. Misunderstandings about the 'current' configuration of the database delayed the project.

Solution: This problem clearly stemmed from a lack of initial understanding of what a PPP for designing an EPR would entail concerning access to the network infrastructure, and of the requirements for off-site to on-site collaboration. Code of connections should be approved early on in the project."

Once the proxy war stories had been identified we explored different ways of grouping and arranging the stories to allow straightforward access while still accommodating the diverse range of war stories handled by the web site. There are


two ways in which the war stories have been grouped: the first is by stage of deployment and the second is by type.

The stages of deployment are: Procurement, Award and Signing of Contract, Data Collection, Database Build and Configuration, Integration, Testing, Transition Management, Domestication, and Evolution and Maintenance. The stage of deployment represents the stage during which the war story (i.e. the problem) occurred. It was also thought necessary to include a category representing the stage of deployment during which it would be useful to be aware of the war story. For example, although the 'lack of code of connections' war story occurs during database build and configuration, it would be pertinent to know that the situation might arise during the procurement stage. Having applied this to the other proxy war stories selected, it became clear that ideally it would be good to know about all of the war stories as early as possible - i.e. during the procurement stage. This too is explained on the web site, and so there is an exhaustive list of all of the war stories in the procurement stage, along with links to the stage during which they actually occur. By grouping the war stories in this way, anyone who is either embarking on a project or at a certain stage of deployment may browse the war stories relating to that particular stage.

The second way in which we grouped the war stories is by keyword 'types'. The list of types consists of: Access, Bespoke or Off the Peg, Communication, Configuration, Incomplete Data Sets, Integration, Local versus Global, Outside Commitments, Participation, Relationships, Schedules, Security, Suppliers, Support and Training. Grouping the war stories in this way allows the user to access them by type; for example, if a manager is considering a bespoke or off-the-peg solution, they could look at all of the war stories that apply to that category. This is an advantage because war stories of the same type may appear in several different stages of deployment and therefore be hard to find.

The web site has been built using MySQL to store the war stories, and the web pages are written in PHP (a server-side scripting language). The advantage of using MySQL and PHP together is that PHP allows simple, direct access to the databases so that they can be populated directly through the web site. All of the individual pages for the stages of deployment and for the types of war story are generated automatically, so there is potential for more types to be added or for the current stages of deployment to be changed (see the further work section). Although the web site is primarily aimed at the designers of such systems, it is hoped that it will be used by all of the stakeholders involved, e.g. project managers, funding committees, designers, and the eventual users of the system themselves. To this end the site has been designed to be simple and generic. The two different ways of accessing the war stories, along with a description of our intentions for the site and instructions for those who are unsure how to use it, all help to make it open to a wider audience. As well as reading the war stories held on the web site, it is hoped and expected that managers (indeed any stakeholders) will also contribute their own war stories. To this end the final section of the web site has been designed to allow users to enter war stories through a submission page. The submission page asks for war stories to be entered in the same format as the existing ones and provides a form, which is split into a number of sections. The first section is for the name of the war

story, and the next two sections present the lists of stages of deployment and keyword types so that the user may indicate where the war story occurred and the types of war story to which it is similar. The final sections of the form are for the entry of the war story itself and for the 'solution' or actions taken at the time [8].
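The two browse paths described above - by stage of deployment and by keyword type - can be sketched as queries over a simple store. The actual site uses PHP pages over a MySQL database; the Python below is only an illustrative equivalent, and the field names and the sample record are our own:

```python
# Stages of deployment, as listed on the site.
STAGES = ["Procurement", "Award and Signing of Contract", "Data Collection",
          "Database Build and Configuration", "Integration", "Testing",
          "Transition Management", "Domestication", "Evolution and Maintenance"]

# A war story records where it occurred plus its keyword types; story and
# solution texts are elided here.
war_stories = [
    {"name": "Lack of Code of Connections",
     "stage": "Database Build and Configuration",  # where it actually occurred
     "types": ["Access", "Security", "Suppliers"],
     "story": "...", "solution": "..."},
]


def by_stage(stage):
    """Browse path 1: stories at a given stage of deployment. Procurement
    lists everything, since ideally all war stories should be known as
    early as possible."""
    assert stage in STAGES
    if stage == "Procurement":
        return list(war_stories)
    return [s for s in war_stories if s["stage"] == stage]


def by_type(keyword):
    """Browse path 2: stories sharing a keyword type, which may cut across
    several stages of deployment."""
    return [s for s in war_stories if keyword in s["types"]]
```

A manager weighing up suppliers could call `by_type("Suppliers")`, while someone at the start of a project would browse `by_stage("Procurement")` and see every story, each linked (on the real site) to the stage where it actually occurred.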

6. CONCLUSIONS AND FUTURE WORK
The website is currently at an early stage of release. At the moment it is being evaluated by several healthcare professionals for its relevance to the projects on which they are currently working. The next stage will be to assess their findings, take on board any suggested improvements, and refine the site. Some additions to the web site are being considered. The first is the capability to add new categories to the 'types' of war stories; this would be an advantage as other types of problem are likely to arise. We are also considering opening up the website for design project work in general. Another direction in which we would like to take the web site is to create an interface for the management of the databases. This would consist of a password-protected section of the web site where an administrator could view and edit all of the war stories without needing any knowledge of PHP or MySQL. This would be an advantage if there were errors in the war stories or if, for example, something which had previously been acceptable now had to be anonymised. The final stage of testing would be to allow the web site to be used during a new project deployment, and at the same time to allow other healthcare professionals to add their own war stories to the collection.

Another issue concerns the war stories themselves. To a certain degree our initial population of the database with proxy war stories was not ideal, as we were translating information we had gathered on problems into 'war stories'. In the future we would like either to gather specifically elicited war stories from the practitioners themselves through interviews, or to have them produce the stories themselves. However, there are also some more interesting issues to research regarding war stories, particularly naturally occurring (rather than elicited) stories, such as: (1) how do war stories function in action; (2) how do they relate to other project work activities; and (3) what are the different types and formats of war stories? For example, it would be interesting to see whether there are different formats for different types of war story (e.g. stories of success or failure). Furthermore, it would be interesting to examine the circumstances in which they are naturally provoked in a workplace, how people decide whether a story is appropriate to their current situation, and in what ways stories can inform decision making. Overall, we think it would be useful to investigate war stories as occasioned components of practical reasoning and action in the workplace, and we intend to pursue this line of inquiry.

In conclusion, we believe that the website, as a resource that allows healthcare design professionals to share knowledge and experiences through war stories, could prove useful for them and may help them in their community building. This could be very important in the next few years, as similar projects - Public Private Partnerships (PPPs) to implement COTS electronic patient record systems - are taking place all across the NHS. Designing in these complex settings is inevitably


difficult, and so it is crucially important that professionals share their knowledge of problems so as to help this process run more smoothly. We would like our website to contribute in a small way.

7. ACKNOWLEDGMENTS
We would like to thank the UK Engineering and Physical Sciences Research Council (grant number GR/M52786) and the Dependability Interdisciplinary Research Collaboration (DIRC), and all the staff at the medical field sites where we conducted our ethnographic studies.

8. REFERENCES
[1] Brooks, F., The Mythical Man Month: Essays on Software Engineering, anniversary edition. Addison-Wesley, Boston, 1995.
[2] Cohen, D. & Prusak, L., In Good Company: How Social Capital Makes Organizations Work, Harvard Business School Press, 2001.

[3] http://www.creatingthe21stcentury.org/
[4] Davenport, T. & Prusak, L., Working Knowledge: How Organisations Manage What They Know, Harvard Business School Press, 1998.
[5] Hildreth, P., Wright, P. & Kimble, C., Knowledge Management: Are We Missing Something? In Brooks, L. & Kimble, C. (eds), Information Systems - The Next Generation. Proceedings of the 4th UKAIS Conference, York, UK, pp. 347-356, 1999.
[6] Orr, J., Talking About Machines: An ethnography of a modern job. ILR Press, Ithaca, NY, 1996.
[7] War Stories Web Site: http://www.comp.lancs.ac.uk/computing/users/mackie/WarStoriesWeb/hazards.php
[8] War Stories submission form: http://www.comp.lancs.ac.uk/computing/users/mackie/WarStoriesWeb/submit.php


Studies in Behavioural Decision Making

Lisbeth Aagaard
City University, London
Northampton Square
London EC1V 0HB
0044 207 040 4580

[email protected]

Keywords
Mental Algorithms; Risk Assessment; Complex Decision Making; Structure of Information; Sunk Cost; Mental Accounting; Rule Use; Development.

INTRODUCTION
I am currently reading into various areas of behavioural decision making and planning experiments to investigate the following issues:

Ecological Bayesian Inference & Mental Algorithms
Gigerenzer (1998) defines mental algorithms as induction mechanisms that perform classification, estimation, or other forms of uncertain inference, such as deciding what colour an object is, or inferring whether or not a person has a disease. Gigerenzer proposes that human reasoning algorithms are designed for information that comes in a format that was present in the EEA (environment of evolutionary adaptedness (Tooby & Cosmides, 1992)) and thus are not optimally equipped to deal with probability or statistical problems (e.g. Eddy's (1982) classic mammography problem) expressed in other formats. Further to the aims of the DIRC project, I am interested in investigating how individuals assess risk and make judgments about safety when dealing with low probability events, such as differentiating a risk of 10⁻⁷ from one of 10⁻⁵, a risk 100 times as large. How do individuals judge the safety of one computerised system over another (say, one controlling an atomic plant), and does stating the problem or requesting the answer in different formats significantly affect judgment or understanding of such low probability questions for laypersons and/or experts? Gigerenzer's group also emphasise the importance of the structure of information in environments; for example, Gigerenzer & Goldstein's (1996) study showed, using computer simulations, that a simple satisficing 'take the best' algorithm matched or outperformed various 'rational' inference procedures (e.g. multiple regression) on two-alternative-choice tasks. This result raises interesting questions regarding the design of decision support systems. Complex multi-attribute decisions might be best supported by a system that acknowledged that people, regardless of their intuitions about how they perform the task, may use 'simple heuristics' to reduce the information processing load.
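The 'take the best' heuristic referred to above is simple enough to state in a few lines: inspect cues in descending order of validity and decide on the first cue that discriminates between the two alternatives, guessing if none does. The sketch below follows Gigerenzer & Goldstein's (1996) two-alternative city-size task in spirit, but the cue names and cue values are illustrative assumptions, not their actual data:

```python
def take_the_best(a, b, cues):
    """'Take the best' for a two-alternative choice: go through the cues in
    order of validity; the first cue on which the alternatives differ
    decides; if no cue discriminates, guess.
    a, b: dicts mapping cue name -> binary cue value (1 = positive).
    cues: cue names ordered from most to least valid."""
    for cue in cues:
        if a.get(cue, 0) != b.get(cue, 0):
            return "a" if a.get(cue, 0) > b.get(cue, 0) else "b"
    return "guess"


# Which of two cities is larger? Cue values here are made up for illustration.
cues = ["has_soccer_team", "is_state_capital", "has_university"]
city_x = {"has_soccer_team": 1, "is_state_capital": 1, "has_university": 1}
city_y = {"has_soccer_team": 0, "is_state_capital": 0, "has_university": 0}

print(take_the_best(city_x, city_y, cues))  # prints "a": first cue already discriminates
```

Note that the heuristic is non-compensatory: once the first discriminating cue is found, the remaining cues are ignored entirely, which is exactly the property that lets it rival weighted procedures such as multiple regression in the simulations cited above.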

The Sunk Cost Effect
The sunk cost effect is an irrational economic behaviour, manifested in a greater tendency to continue an endeavour once an investment of money, time, or effort has been made (Arkes & Blumer, 1985). Research indicates the effect may be a result of cognitive processes such as mental accounting (Thaler, 1980, 1999), impression management (Arkes & Blumer, 1985), and/or inappropriate use of an abstract 'don't waste' rule (Arkes, 1996). Recently it has been proposed that those with more modest cognitive abilities than adult humans (e.g. children and animals) may be less likely than adults to use these cognitive processes in decision making, and thus may be less likely to exhibit the sunk cost effect (Arkes & Ayton, 1999). Studies have investigated children's use of mental accounting (Webley & Plaiesier, 1997; Krouse, 1986) as well as their use of rules in decision making (Jacobs & Potenza, 1991), but little direct research has been done on the sunk cost effect in children. I am interested in investigating the sunk cost effect for monetary and temporal investments in children, to establish the developmental stages, if any, of the effect, and the possible correlation with the development of mental accounting and/or abstract rule use.

REFERENCES
[1] Arkes, H. R. (1996). The psychology of waste. Journal of Behavioral Decision Making, 9, 213-224.
[2] Arkes, H. R. & Blumer, C. (1985). The psychology of sunk cost. Organizational Behavior and Human Decision Processes, 35, 124-140.
[3] Eddy, D. M. (1982). Probabilistic reasoning in clinical medicine: Problems and opportunities. In D. Kahneman, P. Slovic, & A. Tversky (Eds.), Judgment under uncertainty: Heuristics and biases. Cambridge University Press.
[4] Gigerenzer, G. (1998). Ecological intelligence: An adaptation for frequencies. In D. D. Cummins & C. Allen (Eds.), The Evolution of Mind (pp. 9-29). New York: Oxford University Press. [Reprinted in Psychologische Beiträge, 1997, 39, 107-125.]
[5] Gigerenzer, G., & Goldstein, D. G. (1996). Reasoning the fast and frugal way: Models of bounded rationality. Psychological Review, 103, 650-669.
[6] Jacobs, J. E., & Potenza, M. (1991). The use of judgment heuristics to make social and object decisions: A developmental perspective. Child Development, 62, 166-178.
[7] Krouse, H. J. (1986). Use of decision frames by elementary school children. Perceptual and Motor Skills, 63, 1107-1112.
[8] Thaler, R. H. (1980). Toward a positive theory of consumer choice. Journal of Economic Behavior and Organization, 1, 39-60.


[9] Thaler, R. H. (1999). Mental accounting matters. Journal of Behavioral Decision Making, 12, 183-206.
[10] Tooby, J. and Cosmides, L. (1992). The psychological foundations of culture. In J. H. Barkow, L. Cosmides, & J. Tooby (Eds.), The Adapted Mind: Evolutionary Psychology and the Generation of Culture. Oxford University Press.
[11] Webley, P. & Plaiesier, Z. (1997, August). Mental accounting in childhood. Paper presented at the 16th biennial conference on Subjective Probability, Utility, and Decision Making, Leeds, UK.


Building the NTRAC: Something Old, Something New … Kate Ho University of Edinburgh School of Informatics 1 Buccleuch Place, Edinburgh, EH8 9LW +44 (0)131 650 4412

[email protected] ABSTRACT A forgotten dimension of systems development is how developers use similar systems as a resource for design. This thesis examines the building of a new cancer research platform – the National Translational Cancer Research (NTRAC) system in Edinburgh. I will examine the way in which the developer takes procedures, working practices and data definitions from similar systems and "re-uses" them in order to build the new system without "reinventing the wheel". Healthcare systems are of particular interest because of their highly regulated nature, especially with regard to data protection.

Keywords Risk, ethnomethodology, co-realisation, healthcare systems, trust

1. INTRODUCTION
In IT development, there has been a tendency to orient towards the future. Existing systems tend to be seen as a legacy and a burden rather than as a resource for design. The temptation to start from scratch is very strong, especially given the need to incorporate current best practice. And yet, are such systems really built from nothing? Once requirements have been collected, turning them into real, workable solutions seems relatively easy if the system exists in a social vacuum. However, this is rarely the case. When computer systems are built, they inevitably replace other systems (IT or manual). Developers need to recognise that users do not react well to "starting from scratch", and need to acknowledge the current working practices of the systems theirs seeks to replace. This thesis supports the current body of literature favouring incremental systems development.

Healthcare systems are, by their very nature, complex entities. There exist a number of rules and regulations which stipulate what can be done, and in what ways (e.g. the Data Protection Act 1998 and the Freedom of Information Act 2000). The problem arises when policies have to be implemented in practical terms – i.e. how to make these rules apply to everyday work. Whilst some of the challenges posed are not technically difficult, solving them in a workable way in the social environment of the workplace is more challenging. During observation of the case study and discussions of the problems facing the building of the NTRAC system, it was clear that these problems were not trivial: there are no straightforward or standardised solutions.

One resource for solving them is to examine similar systems currently in operation. It can be easier to build "new" systems by looking at the mechanisms, procedures, working practices, data definitions etc. of other systems. Note that this is not an attempt at reusing code to save costs but, rather, a recognition that there is real value in viewing other systems as a resource. The important point here is that developers should work up the future by understanding both the present and the past.

The above has two implications. Firstly, by looking at this type of procedural and definitional (re)use, it supports the view that systems design does come from "somewhere" and that working practices cannot be drawn up in a social vacuum. This supports Suchman's (2002) criticism of system developers practising what she terms "design from nowhere", which leaves developers "to be ignorant of their own positions within the social relations that comprise technical systems" (2002: 95). Secondly, there has been insufficient emphasis on the process by which previous (or current) systems are brought forward into the new. There seems to be a tendency to overlook this, and perhaps not enough recognition (or willingness) to look beyond the developer's own environment for insight.

Taking co-realisation (Hartswood et al., 2002) as the orientation to systems development, this thesis aims to examine the process by which old systems are filtered into the new.

2. CO-REALISATION
Co-realisation is an approach to systems design which arose out of a "synthesis of ethnomethodology and participatory design" (Hartswood et al., 2002). The "paradox of ethnomethodologically-informed design" (Dourish and Button, 1998, quoted in Hartswood et al., 2002) states that future working practices cannot be designed simply by looking at current working practices. By and large, participatory design has focused on design and thereby tended to neglect technology in use – which means that the innovation and learning that occur during use are neglected. Co-realisation calls for the design (and use) of systems which afford work by making designers examine closely the ways users work. This close examination requires "IT professionals to shift the technical work of design and development into the users' workplace, if not completely, then at least routinely and over sustained periods of time" (ibid). In other words, the designer should be around the workplace in


question, and observe the ways users go about their everyday working lives. In doing so, the "emphasis in co-realisation is on tightly coupled, 'lightweight' design, construction and evaluation techniques" (ibid).

3. RESEARCH METHODOLOGY
The research will take the form of an ethnographic study at the NTRAC office. At the time of writing, one member of staff is employed, with more to follow by March 2005. The current staff member is the e-scientist, who acts as a co-realiser at the NTRAC office. The study will cover not only the NTRAC office but also the various sites and meetings in which the future shape of NTRAC will be determined.

4. CASE STUDY: NTRAC
The study will investigate the Edinburgh node of the National Translational Cancer Research Network (NTRAC). Translational research is defined by NTRAC as the "integration of bench and clinical research for the benefit of cancer patients and those at risk of developing cancer." The aim of the Edinburgh node is to build an infrastructure that facilitates the recruitment of patients into cancer research. The infrastructure will need to support all parts of the research process, including epidemiology studies (is factor x significant in determining whether patient y will develop this type of cancer?) and clinical trials (testing new treatment regimes on patients). The infrastructure consists of systems and practices which should support the collection of the core dataset, data linkage with other sources (such as death records), data curation and analysis.

On a basic level, NTRAC will have two nurses who recruit cancer patients into the programme. Tissue samples are routinely taken as part of the treatment of patients and, for consenting patients, parts of these samples will be taken for research purposes. In addition, blood will be taken for DNA analysis. Other data, such as food frequency questionnaires, lifestyle questionnaires and family history, will be recorded in the NTRAC database. Data from patients' clinical records will be acquired by record linkage with existing data sources such as disease registers or audit data. One outcome is that, with NTRAC, datasets can be used for cancer research without each individual cancer study needing to recruit patients. In summary, NTRAC is an attempt to build a common platform for different areas of research (e.g. colorectal cancer, breast cancer) as well as different stages of research (e.g. epidemiology studies, clinical trials).
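The record linkage mentioned above can be illustrated with a purely hypothetical sketch of deterministic linkage on a shared patient identifier. The field names and data are invented, and this is not a description of NTRAC's actual design.

```python
# Hypothetical sketch of deterministic record linkage: joining consented
# study records to an external register (e.g. a disease register) on an
# exact match of a shared patient identifier. All names/values are invented.

patients = [
    {"patient_id": "1234567890", "consented": True},
    {"patient_id": "0987654321", "consented": True},
]
disease_register = {
    "1234567890": {"diagnosis": "colorectal cancer"},
}

def link(patients, register):
    """Return study records enriched with register data on an exact ID match."""
    linked = []
    for record in patients:
        match = register.get(record["patient_id"])
        if match is not None:
            linked.append({**record, **match})
    return linked

print(link(patients, disease_register))
```

In practice, linkage of this kind raises exactly the confidentiality concerns discussed later: the shared identifier is itself sensitive, so access to the linked dataset must be tightly controlled.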

5. DEPENDABILITY ISSUES There are a number of dependability issues that will arise out of the study.

This study has resonance with the debate around software reuse and dependability. Reused software, because of its tried-and-tested nature, may be more reliable (although this is debatable). Does the same argument apply here? How does "borrowing" working practices from a previous system affect dependability? Of course, simply putting parts together does not make a system more dependable – it is how the components are put together that makes the impact.

Another important issue arising from the study of NTRAC is trust. Trust, in this case, relates to patients, practitioners, clinicians and researchers, and occurs at all levels. Without trust in the accuracy, integrity etc. of the system, healthcare professionals and patients will not use it (Booth, 2003). This also points to the evolution of trust relations (between practitioners and patients, in the system, and vice versa).

Tradeoffs between dependability attributes are another interesting issue. For example, in NTRAC, security is a higher concern than availability: the confidentiality and anonymity of patient records matter more than NTRAC being non-operational for a few hours. Moreover, these tradeoffs and attributes shift over time and across systems. For example, how will the movement towards paperless (electronic) records affect the system's attributes and hence its dependability?

There are also issues of risk in NTRAC, especially in relation to data security. With personalised information in the database, controlling who can access data, and how, is an important issue to resolve. However, risk is not restricted to outcomes; it also includes perceptions of risk. As stated above, confidentiality is more important than accessibility, and hence perceptions of how and whether personal information could be released in unlawful ways should also be taken into consideration.

6. REFERENCES
[1] Hartswood, M., Procter, R., Slack, R., Voss, A., Büscher, M., Rouncefield, M. and Rouchy, P. (2002) "Co-realisation: Towards a Principled Synthesis of Ethnomethodology and Participatory Design" Scandinavian Journal of Information Systems, 14(2)
[2] Suchman, L. (2002) "Located Accountabilities in Technology Production" Scandinavian Journal of Information Systems, 14(2): 91-105
[3] Booth, N. (2003) "Sharing patient information electronically throughout the NHS" BMJ, 327: 114-115


Opacity and Secure Information Flow Thea Peacock University of Newcastle upon Tyne Newcastle upon Tyne NE1 7RU +44-191-222-7895

[email protected] ABSTRACT In this paper, opacity, a recent information flow property, is briefly described. Extensions to existing work on opacity are outlined, and a unified model for information flow is proposed.

Keywords
Risk; security; information flow; anonymity; non-interference; opacity.

1. INTRODUCTION
This research stems from the scenario of a secure system in which the flow of information is based on contextual factors (e.g. time, location, history of past accesses) as well as the identity of the person making the request. Information flow control therefore needs to be dynamic and finely grained. Interactions between processes are complex, and secure information flow in computer systems is a difficult problem. As systems become more sophisticated and distributed computing more prevalent, the problem can only grow in magnitude.

Information flow properties have been a subject of security research for over 20 years, and the various existing definitions are still debated. Recently, a new property has emerged which we call opacity. The term probably originated in the expression of anonymity and privacy as various forms of 'opaqueness' under different 'function views' [8], a mathematical abstraction for partial knowledge of a function. Subsequently, 'opacity' was used to describe a predicate over system traces for analysing security protocols [9]. Although both ideas are strongly related, the second is closest to the present work. Opacity has since been given a more formal structure and used to model information flow in various situations [2] [3] [4] [5]. Adopting the labelled transition system (LTS) convention of earlier work, a predicate φ is defined as opaque if, whenever it is true in one run of the system but false in another, it is impossible for an observer to distinguish between the two [9]. A run can be described as a transition from an initial state to another reachable state via a sequence of actions, represented by labels. Further details of the LTS can be found in [2]. In common with most definitions of information flow, opacity is based on observational equivalence.

Opacity may be particularly suited to dynamic, finely-grained flow control as it appears not to follow the 'all or nothing' approach of many existing definitions. So far, opacity has been mapped to several information flow properties as a first step to understanding it more fully, and to get a measure of its flexibility. This paper provides a brief description of research to date, as well as the anticipated future work.

2. INFORMATION FLOW PROPERTIES
An early attempt at mandatory control of access to information is the Multi-Level Security system, of which the Bell-LaPadula (BLP) model [1] is a famous example. Subjects are grouped by security clearances, and objects by classifications, forming a lattice. Information flow is controlled by the so-called 'no-read-up' and 'no-write-down' rules.

2.1 Problems with existing definitions
A serious flaw with the BLP model is covert channels, i.e. implicit and illegal flows. Another is that it is often found to be too rigid in practice: any change to the security level of information is prohibited unless performed by a 'trusted subject', which is an arguable concept. There have been numerous attempts to solve the problem of covert channels, but perhaps the most notable is non-interference [7]. Assuming a system partitioned into two sets of users, High and Low, non-interference is satisfied if, from Low's observations, no knowledge about High actions can be gained. Non-interference was, however, limited to deterministic systems, and has subsequently been modified in various ways. Recent formulations of non-interference account for non-determinism [11], but this is no easy task, and none of the existing definitions is completely fail-safe.

Various existing definitions of intransitive information flow attempt to solve the problem of declassification. A flow is intransitive if information can pass from High to Low via some downgrader subject, but not directly from High to Low. Intransitive flow systems fall broadly into two categories: those using complicated mathematics to describe permitted forward and backward flows, and those imposing architectural constraints on the system; there are also combinations of the two. Preliminary work indicates that opacity is capable of modelling both transitive and intransitive information flow cleanly and succinctly.
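The BLP 'no-read-up' / 'no-write-down' rules can be sketched directly. This is a minimal illustration with invented level names; categories/compartments and trusted subjects are omitted.

```python
# Minimal sketch of the BLP rules over a totally ordered set of security
# levels. Level names are invented for illustration.

LEVELS = {"unclassified": 0, "confidential": 1, "secret": 2, "top_secret": 3}

def can_read(subject_clearance, object_classification):
    # Simple security property ('no read up'): read only at or below clearance.
    return LEVELS[subject_clearance] >= LEVELS[object_classification]

def can_write(subject_clearance, object_classification):
    # *-property ('no write down'): write only at or above clearance,
    # so information cannot leak to a lower level.
    return LEVELS[subject_clearance] <= LEVELS[object_classification]

print(can_read("secret", "confidential"))   # True
print(can_write("secret", "confidential"))  # False: would move data downwards
```

Note that these access rules say nothing about covert channels, which is precisely the flaw discussed above: information can still leak through behaviour the rules do not mediate.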

3. OPACITY
An informal definition of opacity has been given in the Introduction. It should be noted that it is asymmetric: it is only of concern whether φ can be established.
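Assuming a finite set of runs, the asymmetric definition can be checked by brute force. The following toy sketch is invented for illustration (the labels, predicate and observation function are not from the papers cited): High actions are hidden behind a dummy label, and the predicate holds on runs containing a High action.

```python
# Toy brute-force check of (asymmetric) opacity over a finite set of runs.
# A run is a tuple of labels; obs() hides the High label 'h' behind the
# dummy label 'tau'; phi holds on runs containing a High action.

def obs(run):
    """Observable projection of a run: High labels become 'tau'."""
    return tuple("tau" if label == "h" else label for label in run)

def phi(run):
    """Secret predicate: did a High action occur?"""
    return "h" in run

def opaque(runs, phi, obs):
    """phi is opaque iff every run satisfying phi is observationally
    equivalent to some run not satisfying phi."""
    return all(
        any(obs(other) == obs(run) and not phi(other) for other in runs)
        for run in runs if phi(run)
    )

runs = [("l1", "h", "l2"), ("l1", "tau", "l2")]
print(opaque(runs, phi, obs))                  # True: the runs look identical
print(opaque([("l1", "h", "l2")], phi, obs))   # False: no covering run
```

A symmetric check would additionally require every run satisfying ¬φ to be matched by an observationally equivalent run satisfying φ.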

3.1 Asymmetric vs. symmetric opacity
For reasoning about security properties, however, it is sometimes also important that the observer should not be able to establish ¬φ. For example, if φ denotes the existence of High labels, then establishing ¬φ, i.e. the absence of High actions, could enable the observer to make certain deductions about High behaviour. This is especially true if High has control over actions at that level.


A symmetric version of opacity has therefore been defined: a system satisfies (symmetric) opacity if, for any run where φ (resp. ¬φ) is true, there exists an observationally equivalent run where ¬φ (resp. φ) is true.

It should be noted, however, that the asymmetric version is more fundamental; symmetric opacity is useful when it is necessary to guard both φ and ¬φ.

[Figure 1. Mapping opacity to other properties. The diagram relates opacity and initial opacity to non-deducibility, non-inference, non-deducibility on strategies, and non-interference + non-leakage = non-influence.]

3.2 Variations of opacity
Several variations of opacity have been proposed: initial-, final-, always- and total-opacity [5]. Initial-opacity is useful when some information must be protected at system start-up, e.g. the initialisation of cryptographic keys. With final-opacity, the observer cannot determine the result of a computation at the end of a run. Always-opacity describes the situation in which, for every run where φ is true, there is an observationally equivalent run to a corresponding state where φ is not true. For total-opacity, the observer should not be able to establish φ over and up to the end of the run.

3.3 Observation of the LTS
An obs() function, defined over runs of the LTS, returns the observable aspects of the system. It can be either state- or label-oriented, or both [2] [5]. 'Hidden' actions can be represented by allowing obs() to return some 'dummy' label [4]. The obs() function can be static or dynamic, and can thus also model the computational power of the observer. It is static if the observer always observes the same transition in the same way. A dynamic obs() allows the observer some accumulation of knowledge based on previous (observable) states and/or labels [2], e.g. key compromise in modelling an adaptive adversary. A dynamic obs() has interesting possibilities for intransitive information flow: extending the existing definition, the observer's view of the system could depend on the states and/or labels leading to, possibly even from, a given state. This is part of current research.

4. MAPPING OPACITY TO OTHER PROPERTIES
Each of several existing information flow properties – non-interference [7], non-inference [10], non-deducibility [11], non-deducibility on strategies [14], non-leakage [13] and non-influence [13] – has been cast as opacity. In all cases, an inverse mapping is also possible. For lack of space, the individual properties are not described here, nor are the definitions of the properties cast as opacity (and vice versa); full details will appear in a forthcoming paper.

Figure 1 shows how opacity maps to the properties mentioned above. The arrows carry a transitive implication: for example, opacity can be cast as non-deducibility and, since non-deducibility on strategies can be regarded as a subset of non-deducibility, opacity also maps to non-deducibility on strategies. Again, an explanation of the subset inclusion will appear in the forthcoming paper. Non-influence has been defined as non-interference plus non-leakage [13]. It is possible to cast non-leakage as initial opacity and vice versa; hence opacity cast as non-interference, plus initial opacity cast as non-leakage, equals non-influence. Information flow is an elusive subject, and many of its intrinsic subtleties have yet to be resolved.

5. FUTURE WORK
As mentioned in Section 3.3, it may be possible to define intransitive information flow in terms of opacity, and work in this direction has been started. Interesting work has been done casting several levels of anonymity as opacity [2] [3]. An expressive framework for anonymity properties has been defined in [9], and it has already been mentioned that there appears to be a strong link with the present work. It is planned to investigate whether the various forms of anonymity in [9] can be cast as opacity; the first attempts are promising. Finally, an examination of information flow in an e-voting scheme [6], using the results of the work on opacity, is planned as a case study.

The present aim is to produce a unified way of expressing a wide range of information flow properties based on opacity, encompassing confidentiality, both transitive and intransitive, and anonymity. This, as yet, is unique in the field.

6. CONCLUSION
Existing work on opacity has been recalled, and links to various information flow properties briefly described. An overview of planned future work, with extensions to intransitive flow and anonymity, has also been given. Results so far have shown that opacity is extremely flexible and capable of forming a basis for various security properties. It would also be interesting to find the limits of its capabilities; it is possible that these will be revealed as research progresses.

7. ACKNOWLEDGMENTS
Many thanks to Professor Peter Ryan and Dr. Jeremy Bryans for their invaluable help and advice.


8. REFERENCES
[1] Bell, D. E. and LaPadula, L. J. Secure Computer Systems: Mathematical Foundations and Model. Technical Report M74-244, The MITRE Corp., 1973.
[2] Bryans, J., Koutny, M., Mazaré, L. and Ryan, P. Y. A. Opacity Generalised to Transition Systems. Submitted to Computer Security Foundations Workshop, 2005.
[3] Bryans, J., Koutny, M., Mazaré, L. and Ryan, P. Y. A. Opacity Generalised to Transition Systems. Technical Report CS-TR-868, University of Newcastle upon Tyne, 2004.
[4] Bryans, J., Koutny, M. and Ryan, P. Y. A. Modelling Dynamic Opacity using Petri Nets with Silent Actions. Technical Report CS-TR-855, University of Newcastle upon Tyne, 2004.
[5] Bryans, J., Koutny, M. and Ryan, P. Y. A. Modelling Opacity using Petri Nets. Workshop on Security Issues with Petri Nets and other Computational Models, 2004.
[6] Chaum, D., Ryan, P. Y. A. and Schneider, S. A. A Practical Voter-verifiable Election Scheme. Technical Report CS-TR-880, University of Newcastle upon Tyne, 2004.
[7] Goguen, J. and Meseguer, J. Security Policies and Security Models. IEEE Symposium on Security and Privacy, 1982.
[8] Hughes, D. and Shmatikov, V. Information Hiding, Anonymity and Privacy: A Modular Approach. Journal of Computer Security, 12(1), 2002.
[9] Mazaré, L. Using Unification for Security Properties. Workshop on Issues in the Theory of Security, 2004.
[10] O'Halloran, C. A Calculus of Information Flow Properties. European Symposium on Research in Computer Security, 1990.
[11] Ryan, P. Y. A. and Schneider, S. Process Algebra and Non-interference. Journal of Computer Security, 9, 2001.
[12] Sutherland, D. A Model of Information. 9th National Computer Security Conference, 1986.
[13] Von Oheimb, D. Information Flow Revisited: Noninfluence = Noninterference + Nonleakage. European Symposium on Research in Computer Security, 2004.
[14] Wittbold, J. and Johnson, D. Information Flow in Nondeterministic Systems. IEEE Symposium on Research in Security and Privacy, 1990.


Cognitive Assistance for Dementia Joe Wherton Centre for Usable Home Technology (CUHTec) Department of Psychology, University of York Heslington, York, YO10 5DD 01904 433178

[email protected] ABSTRACT This paper summarises the proposed research for my PhD thesis. The project centres upon the development of assistive technology (AT) for cognition that can support people with dementia in their homes. A report on ten interviews and one focus group with professional carers is presented, and the types of activities of daily living (ADL) and hazards that pose the most problems are identified. These are discussed in relation to the DIRC theme of Risk.

Keywords Risk, Assistive Technology, Activities of Daily Living, Dementia, Distributed Cognition, Episodic Memory, Executive Functioning.

1. INTRODUCTION
Clinicians and researchers are now discovering ways in which technological aids can compensate for cognitive impairments. Much of the work has focused on how assistive technology can minimise the risks of independent living amongst vulnerable elderly groups. One of the most common causes of cognitive impairment amongst the elderly is dementia. Approximately 15% of people over 65 acquire degenerative dementia, the most common cause being Alzheimer's disease. A number of projects are underway exploring the use of AT to support this population in the home [1,3,7]. Such developments include reminder devices, communication supports, orientation aids, and task prompting systems. Many of these concepts have arisen from the experience of patients and their caregivers. However, little has been done to describe these problems in relation to cognitive theory. This is an important part of the design process, in order to establish how AT can best match users' cognitive capabilities.

Two cognitive domains show profound impairment during the early and middle stages of dementia. First, the episodic memory system is severely impaired [4]. This system stores information about personal experience, which is essential for remembering previous and prospective events, and is therefore important for everyday living. Second, the central executive (CE) begins to deteriorate at the onset of dementia [2,8]. This system is central to performing goal-directed actions, such as planning, sequencing and divided attention. Most ADLs consist of action sequences, some of which occur in parallel; impairment in this domain is therefore a major obstacle to achieving simple daily tasks. Research has demonstrated that deterioration of the episodic memory system and the CE has a severe impact on the patient's functional status [6,9]. These deficits pose a number of risks in the home, which can be detrimental to independent living.

2. PHD THESIS
The overall objective is to develop an assistive device that can support a person with mild to moderate dementia in the home. The system must be relevant to the user's needs, and suited to their cognitive capabilities. The first (and current) stage is to explore the problems experienced by people with dementia within the home. The analysis of the problems will be based upon interviews and observational work with patients and their carers. Findings from the initial interviews are discussed later in this paper. Subsequent work will focus on the development and evaluation of an assistive device to support a specific ADL.

2.1 Distributed Cognition Approach
Distributed cognition (DC) is a suitable framework for the research objective. DC theorists have described how the external environment forms part of the problem-solving system [10]. By using and manipulating external resources (including artefacts and people), the nature of a problem becomes re-represented to suit our cognitive capabilities. Similarly, ATs are compensatory strategies that "alter the patient's environment" in a manner that is compatible with their functional skills [5].

2.2 DIRC Theme: Risk
The theme of Risk relates to the problems experienced by people with dementia, and the consequences for independent living. In designing assistive technology, it is necessary to establish inductively which activities need supporting. It is important to remember that AT should be designed to assist the user, not to find interesting applications for new technology. The cognitive model of dementia is well established. However, it would be unreliable to deduce from the model alone what problems are likely to occur: factors beyond cognitive status may play a significant role. For example, patients can successfully achieve a task by using compensatory strategies or seizing upon implicit cues [6]. It is therefore necessary to explore the problems within a naturalistic setting before identifying areas for intervention.

2.2.1 Initial Analysis
So far, ten interviews and one focus group have been conducted with professional carers. Figure 1 presents a summary of the three issues that emerged from the analysis: the types of problems in the home, the underlying deficits behind these problems, and the consequences of the problems for independent living.


2.2.1.1 Problems in the home
The first issue describes commonly occurring problems in the home. These can be identified either as failures in performing ADLs, or as areas of potential hazard. The problematic ADLs include dressing, preparing food or drink, toileting and communication. The likely hazards involve cooker safety (leaving the cooker on or not lighting the gas), not taking the correct medication, and security (wandering outside or letting strangers into the house).

supported. This will ensure that subsequent work into the development of an AT will be appropriate and useful. Problems in the Home Performing ADLs

Hazards

Underlying Deficits

2.2.1.2 Underlying Deficits Sequencing

The second issue relates to the underlying deficits. This represents general problems that impede performance of ADLs, or provoke hazards in the home. This is the level at which specific problems can be conceived in accordance with cognitive theory. The deficits are sequencing, memory and orientation, and learning. Sequencing refers to the patients’ difficulty in correctly initiating each stage of an activity. They may perform the steps in the wrong order, or have trouble knowing when to initiate or finish a particular action. Additionally, they often become confused when confronted with a problem or distracter whilst carrying-out the sequence. These issues relate to deterioration of the CE which is responsible for planning, sequencing, problems solving and divided attention. The second major deficit relates to memory and orientation. For example, patient’s had problems in recognising people, finding their way, or keeping track of daily events. This deficit can be explained by the damage caused to the episodic memory system. The third deficit relates to problems of learning new information. Difficulty in adapting to new environments and household appliances facilitates many of the problems i n daily living. For example, patients struggled to use new household technologies that had a different design to the one they previously owned. Problems in learning relate t o impairment to both episodic memory and the CE [2]. This issue is important with regards to designing an appropriate AT.

2.2.1.3 Consequences


Figure 1. Problems of independent living for people with mild to moderate dementia. [The figure links three boxes: Problems in the Home (Performing ADLs; Hazards), Underlying Deficits (Sequencing; Memory and Orientation; Learning), and Consequences (Quality of Life; Safety).]
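The three-layer model in Figure 1 can be pictured as a small data structure. The sketch below is purely illustrative: the particular links between deficits, problems and consequences are plausible readings of the text, not data from the study, and the names `model` and `risks` are invented for this sketch.

```python
# Hypothetical encoding of the Figure 1 model: each underlying deficit
# is linked to the home problems it impedes and the consequences it risks.
# (Illustrative only; the mappings are not taken from the paper's data.)
model = {
    "sequencing": {
        "problems": ["preparing food or drink", "dressing", "cooker safety"],
        "consequences": ["safety"],
    },
    "memory and orientation": {
        "problems": ["taking medication", "security", "communication"],
        "consequences": ["safety", "quality of life"],
    },
    "learning": {
        "problems": ["using new household technologies"],
        "consequences": ["quality of life"],
    },
}

def risks(deficit):
    # Look up which consequences a given deficit can lead to.
    return model[deficit]["consequences"]

print(risks("sequencing"))  # → ['safety']
```

A structure of this kind would let a subsequent task analysis trace each stage of an ADL back to the deficit that makes it risky.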

The third issue describes the consequences of the problems in the home. These consequences affect either the patient's quality of life or their safety. Issues regarding quality of life include loneliness, loss of personal space, and a sense of failure. Issues regarding safety include physical harm (often involving the cooker), security, and health. The categories that constitute safety and quality of life are essential in promoting independent living. However, as safety risks are the most common factors leading to care home admissions, safety should be an issue of primary importance.

2.2.2 Next Stage

The present model will be developed further, to incorporate the views of patients and their informal caregivers. Such accounts will provide additional perspectives on the risks of daily living. Development of the model will also include a task analysis of specific ADLs. This will provide detailed descriptions of how the cognitive impairments pose a risk during specific stages of an activity. An analysis of the occurrence of these risks, and their consequential impact, will provide an empirical basis for identifying ADLs that should be supported. This will ensure that subsequent work into the development of an AT will be appropriate and useful.

3. REFERENCES

[1] Alm, N., Astell, A., Ellis, M., Dye, R., Gowans, G., and Campbell, J. A cognitive prosthesis and communication support for people with dementia. Neuropsychological Rehabilitation, 14, 1/2, (2004), 117-134.

[2] Baddeley, A. D., Baddeley, H. A., Bucks, R. S., and Wilcock, G. K. Attentional control in Alzheimer's disease. Brain, 124, (2001), 1492-1508.

[3] Mihailidis, A., Barbenel, J. C., and Fernie, G. The efficacy of an intelligent cognitive orthosis to facilitate handwashing by persons with moderate to severe dementia. Neuropsychological Rehabilitation, 14, 1/2, (2004), 135-171.

[4] Huff, F. J., Becker, J. T., and Belle, S. Cognitive deficits and clinical diagnosis of Alzheimer's. Neurology, 37, (1987), 1119-1124.

[5] Kirsch, N. L., Levine, S. P., Fallon-Kreuger, M., and Jaros, L. The microcomputer as an 'orthotic' device for patients with cognitive deficits. Journal of Head Trauma Rehabilitation, 2, 4, (1987), 77-86.

[6] Nadler, J. D., Richardson, E. D., Malloy, P. F., Marran, M. E., and Hostetler Brinson, M. E. The ability of the Dementia Rating Scale to predict everyday functioning. Archives of Clinical Neuropsychology, 8, (1993), 449-460.

[7] Orpwood, R., Bjorneby, S., Hagen, I., Maki, O., Faulkner, R., and Topo, P. User involvement in dementia product development. Dementia, 3, 3, (2004), 263-279.

[8] Rainville, C., Amieva, H., Lafont, S., Dartigues, J., Orgogozo, J., and Fabrigoule, C. Executive function deficit in patients with dementia of the Alzheimer's type: A study with the Tower of London task. Archives of Clinical Neuropsychology, 17, (2002), 513-530.

[9] Royall, D. R., Palmer, R., Chiodo, L. K., and Polk, M. J. Declining executive control in normal aging predicts change in functional status: The Freedom House Project. Journal of the American Geriatrics Society, 52, 3, (2004), 346-352.

[10] Zhang, J. and Norman, D. A. Representations in distributed cognitive tasks. Cognitive Science, 18, 1, (1994), 87-122.

Page 21

Page 22

Diversity Theme

Page 23

A study of confidence in safety judgements 1

E. Alberdi, R. Bloomfield, M. van der Meulen, B. Littlewood
Centre for Software Reliability, City University
Northampton Square, London EC1V 0HB
Tel: +44 20 7040 8423
{e.alberdi, reb, mjpm, b.littlewood}@csr.city.ac.uk

P. Ayton
Psychology Department, City University
Northampton Square, London EC1V 0HB
Tel: +44 20 7040 8524
[email protected]

ABSTRACT
We report the results of a preliminary psychological study investigating how experts express confidence in their safety judgements. In earlier work we illustrated issues related to confidence by examining and mathematically modelling a number of judgements about the safety integrity level (SIL) of a system. That work assumed some plausible but not empirically substantiated distributions for experts' confidence, yielding a model of some of the behaviour of expert regulators in judging system integrity. The goal of the study reported here was to obtain empirical psychological data to verify and improve the model. The participants were professional safety experts who were asked to judge a system's probability of failure on demand (pfd) and to express their confidence in that judgement. The experts were asked for judgements in four phases: a) after a 20-minute presentation describing a safety-critical system and the implementation of a particular safety function; b) after a request for additional information, which (if available) was provided individually; c) after a group presentation of all items of additional information provided individually to the different participants in the previous phase; d) after a Delphi phase, where there was an opportunity to discuss decisions with the other participants. Preliminary analyses suggest that the population split into two parts: a subset of the participants (among the most experienced) were very doubtful about the system and judged the pfd to be in the higher categories; the rest of the population fluctuated among lower pfd categories. As a general trend, "trust" in the system (reflected partly in expressions of confidence in the chosen pfd categories) seemed to decrease as the study progressed.

For both groups, the provision of more information seemed to lead to more uniform judgements of which category the system was in; the distributions were less spread in the last two phases. We are currently exploring the implications of the observed phenomena for the probabilistic modelling of confidence in safety judgements.

1 Due to copyright restrictions the full text of the paper will be distributed for internal use only as a separate supplement to the conference proceedings.
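The narrowing of the panel's judgements across phases can be illustrated with a toy spread measure. This is not the probabilistic modelling used in the study: the phase data below are invented, and `spread` (the fraction of experts outside the modal pfd category) is just one crude way to quantify how dispersed a set of categorical judgements is.

```python
from collections import Counter

def spread(judgements):
    # Fraction of experts whose judgement falls outside the modal pfd
    # category: 0 means unanimity, values near 1 mean high dispersion.
    counts = Counter(judgements)
    return 1 - counts.most_common(1)[0][1] / len(judgements)

# Invented pfd-category judgements for five experts in the first and
# last phases of a study like the one described above.
phase_a = ["1e-2", "1e-3", "1e-4", "1e-2", "1e-5"]
phase_d = ["1e-2", "1e-2", "1e-3", "1e-2", "1e-2"]

print(spread(phase_a) > spread(phase_d))  # → True
```

In this toy example the judgements are less spread in the final phase, the pattern the abstract reports for the last two phases of the study.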

Page 24

TA Weinberg

David Greathead, Budi Arief & Joey Coleman
School of Computing Science
University of Newcastle
Newcastle Upon Tyne, NE1 7RU
+44 191 2226384
{david.greathead, L.B.Arief, J.W.Coleman}@ncl.ac.uk

ABSTRACT

This paper summarizes the main areas of research within TA Weinberg, namely an investigation into code review ability and personality; an investigation into code comprehension and personality; and an investigation into the results from an online programming competition and personality.

With regard to code review, it was discovered that there is a relationship between code review ability and personality type. The code comprehension and programming competition research is ongoing.

Keywords
Personality, MBTI, code-review, code comprehension, programming competition, diversity.

1. INTRODUCTION
There is a tendency in DIRC to focus on the dependability of the created systems (including, of course, the human system around the technical components). But we clearly state, in various places, that DIRC is also concerned with how to increase the dependability of the creation process. System design is obviously a human process, and DIRC's mixture of different disciplines offers us the opportunity to look at factors which might increase the dependability of created systems by looking at the creation process.

Little empirical research has been carried out examining the links between personality and programming and, of the research that has been carried out, most treats programming as a single monolithic process. This fails to reflect the fact that different stages of the software development process, such as design, coding and testing, are markedly different in nature, and therefore people with differing strengths might be better suited to the different tasks.

This TA is a continuation of the work started in PA8 (Effective Collaboration in Design), which was initially inspired by Weinberg's classic work, begun in the early 1970s, on programming and psychology [18]. Weinberg conjectured that different aspects of the software development process require different abilities and that factors such as personality may, to some extent, be used to predict performance in the various stages.

2. PSYCHOLOGY AND PROGRAMMING
Cooperation between these two sciences was proposed by Weinberg in 1971 [18], and since then many other theorists have tried to look at some of the psychological aspects involved in the programming process.

In his book, Weinberg [18] presented evidence supporting his hypotheses regarding psychology and the programming process and, while this information tended to be anecdotal in nature, it was nonetheless compelling. He covered such diverse areas as personality, intelligence, motivation, training and experience, as well as considering what makes a good program and the various aspects of programming in groups or teams.

3. VARIATIONS IN PERFORMANCE
Some theorists noted that there were surprisingly large variations in individual productivity and accuracy while executing parts of the software development process. Boehm [2] described the variation as being between a factor of 10 and a factor of 30. There has also been some speculation that there is some other factor, some 'innate human trait' [16], which may explain some of the large variations observed. This variation has been observed even between those with the same programming background, suggesting that there are factors beyond normal teaching programmes which influence ability in areas such as debugging.

It has also been suggested by Weinberg [18] that the type of task may be responsible for some of the variation observed. That is, as different types of task require different sets of skills, and different people possess different skills, some people are better suited to certain types of task. For example, an individual could be an excellent program designer but lack the skills required to effectively debug a program, and vice versa. Neither person is better than the other; each is simply better suited to certain kinds of task.

Various studies have been carried out examining individual characteristics influencing performance at work [7-12]. These individual characteristics can be assessed using some of the many personality tests available. Some personality tests are biased towards a clinical sample and are aimed at aiding diagnosis of disorders (such as the Minnesota Multiphasic Personality Inventory, or MMPI), while other tests are not aimed at diagnosis (such as the Myers Briggs Type Indicator, or MBTI, and the 16 Personality Factors, or 16PF). Many of the studies carried out within personality and computing science were done with the MBTI.

Bishop-Clark [1] analysed the personality types of college students in a programming class and made some suggestions about the right type for each phase of the programming process, given that, according to Weinberg [18], each phase is distinct, requiring different types of people to work on them. Recently, Capretz [3] analysed the personality types of software engineers. He found that, while the MBTI describes 16 personality types, 24% of the engineers were of the same type (ISTJ), which characterises the person as quiet, serious, concentrated, logical and realistic. A person presenting this type is technically oriented, does not like dealing with people, and when working prefers to deal with facts and reasons. However, there is no indication whether a programmer with such characteristics performs better than programmers with other personality characteristics. Other personality types were given, but again there was no evidence which could reinforce the idea that such types would be better at certain phases of programming than others.

Page 25

4. MBTI
The MBTI is a popular personality assessment [1, 5, 6, 17]. Originally based on Jung's theory of psychological types, it included three bipolar factors or dimensions: extroversion/introversion (EI), sensing/intuition (SN), and thinking/feeling (TF). Myers and Briggs later added another pair of characteristics relating to judgment and perception (JP). The MBTI exhibits acceptable internal consistency: 'the estimates of internal consistency reliabilities for the continuous scores of the four MBTI scales are acceptable for most adult samples' [13, p.165-169]. The MBTI also shows consistency over time, with good test-retest reliability. Changes in letter type normally occur with only one letter between tests, and then only if the strength of the preference associated with that letter was low [13, p.170-171].

4.1 The dimensions
Beyond the everyday understanding of the words extroversion and introversion, Myers [15] explained that the EI dimension is related to the way people tend to "recharge their energy". That is, extroverts focus their attention on other people through the external environment and feel more energetic after interacting with other people in a social setting, while the opposite is true of introverts, who are drained by such interactions and prefer to spend time with close friends or family (or alone) in an internal environment. This obviously has an influence on general career choice, as this dimension has an impact on the type of job a person would consider. So, extroverts are drawn to jobs which involve interacting with people, while introverts prefer to work with impressions and ideas. However, interest in jobs is mainly determined by the dimensions SN and TF.

The sensing/intuition dimension is concerned with how people gather information from the world. In the case of SN, the concern is whether information is gathered through the five senses in a concrete manner, purely accepting what is directly observed, or through intuition, using imagination and inspiration. The dimension of thinking/feeling, however, is more related to the manner in which people make decisions. As the name thinking/feeling suggests, decisions can be made in one of two ways: either following some logical sequence, or basing decisions on a people-centred opinion considering feelings over logic. These two dimensions have an influence on career choice in that they impact not only on how satisfied people are by their career choice, but also how they are initially drawn to their choice [14]. For instance, it seems reasonable to assume that a psychological counsellor would be more likely to have a feeling bias rather than a thinking bias. It is again important to emphasise that one type is not regarded as being better than another; people simply need to be aware of how to take advantage of their type and then work with their preferences and their abilities.

The way a person lives their life on a daily basis is influenced by the judging/perceiving dimension. People with a judgement preference tend to plan the events in their lives as much as possible, while those with a perception bias are more likely to be spontaneous and adaptable in their everyday lives. This preference tends to exhibit itself in the manner in which a person lives their working life, insofar as judgement-based people will tend to have specific self-imposed schedules and deadlines, while people more towards the perception end of the scale will deal with issues as they arise and feel more comfortable without a strongly defined timetable.

The MBTI is intended to be used with "normal individuals in counselling and within organizations" [5]. Smither [17] explains that such an instrument is not a useful tool in recruiting people. Rather, the MBTI is a useful tool in relocating people within organizations, i.e. selecting current employees for a special task, and in improving work communication and interaction.
As such, it is a popular personality measure within industry, being familiar to employers and employees alike. In the second edition of his book, Weinberg [18] states that he would have written a completely different chapter about personality had he known the advantages of the MBTI. He adds that this assessment deals "with normal personality differences" [18, p.8i], whereas some personality tests (such as the MMPI) are related to mental disorders, i.e. they do not see the person as being normal.

5. THE CODE-REVIEW STUDY
5.1 Methodology
5.1.1 Apparatus
In order to collect the relevant information in this study, two different instruments were employed: the Myers-Briggs Type Indicator (MBTI) and a code-review task. The 'MBTI Step 1, European English Edition' was used to assess participants' personality. This is an 88-item forced-response personality inventory, which returns scores on four bipolar scales. Four letters are returned to indicate the type of preference; for example, an individual would have either I or E as one of their letters (for Introversion or Extroversion), with a corresponding value indicating the strength of this preference. Similar scores are obtained for Sensing (S) or iNtuition (N); Thinking (T) or Feeling (F); and Judging (J) or Perceiving (P).
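The 'letter plus strength' output of the instrument can be pictured with a toy function. The real MBTI scoring keys are proprietary; `mbti_type` below is a hypothetical illustration of how four signed preference scores might map to a four-letter type.

```python
def mbti_type(ei, sn, tf, jp):
    # Map four signed preference scores to a four-letter type.
    # Convention (invented for this sketch): a positive score favours the
    # first pole of each pair (E, S, T, J), a negative score the second
    # (I, N, F, P); the magnitude is the strength of the preference.
    letters = [("E", "I"), ("S", "N"), ("T", "F"), ("J", "P")]
    scores = [ei, sn, tf, jp]
    return "".join(a if s >= 0 else b for (a, b), s in zip(letters, scores))

print(mbti_type(-5, -12, 8, 3))  # → "INTJ"
```

Grouping participants by two-letter types, as done later in the analysis, is then just a matter of slicing this four-letter string.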

Page 26

The code-review task consisted of four pages (282 lines) of Java code, which had been written by an experienced programmer. The program was a pattern search program operating on an ASCII file. After the code had been examined for the presence of bugs, 16 semantic bugs were inserted into it by the programmer. The program was accompanied by a two-page manual and API, including an example of the output the debugged code would produce when executed, and the ASCII file which was used for the example. A title page provided instructions on how to complete the task. One of the participants returned the task essentially blank and as such was excluded from certain statistical tests. Java was chosen as it was the only language that all of the participants knew, being the language taught during the first year at the university.

5.1.2 Participants
Sixty-four participants completed both stages of the study. The participants were all undergraduate students from Newcastle University in the United Kingdom. While participants were unselected for age and sex, the majority (81%) were male and 77% were aged 19-21, due to the nature of the population. Participants were all awarded extra marks in their course for each aspect of the research in which they took part. Three prizes per programme of study were also awarded (of £10, £20 and £30) to the participants who scored most highly on the code review task. This was in order to encourage participants to apply themselves to the task.

5.1.3 Procedure
The two instruments were administered on separate occasions, in regularly scheduled, one-hour lecture slots. The code review task also occupied part of the adjacent lecture slot. At both stages of the research, participants were reassured that their data would not be used in such a way that they could be individually identified.

Participants were given one and a half hours to complete the code review task. They were informed that this was an individual task and were asked not to talk to one another. They were also asked to spread themselves out as much as possible in the lecture theatre. As well as being reminded that they would receive extra course credit for their participation, the participants were informed that there would be three prizes for each of the programmes of study, awarded to the highest scoring participants in each programme. They were also given additional information about the task, namely that all of the errors in the code were semantic errors, and that they did not have to correct the errors, only identify them in some way. They were not informed how many errors there were in total.

In a separate session, participants were allowed up to one hour to complete the MBTI questionnaire and were free to leave the lecture theatre once they had finished. Participants were also offered an individual feedback session if they desired. The researchers were available to answer any questions participants may have had with regard to the questionnaire.

5.2 Results
In order to examine the possible links between MBTI type and code review ability (score), a number of correlations were carried out.1 The results are presented in Table 1 below.

Table 1 - Correlations between MBTI type and code review score

                             E        S        T        J
  Code review correlation   -0.197   -0.251    0.197    0.000
  Sig. (2-tailed)            0.121    0.047    0.122    0.998
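The figures in Table 1 are Pearson product-moment correlations between continuous MBTI scale scores and code review scores. A minimal sketch of the computation follows; the sample data are invented, and the study's own analysis presumably used a statistics package rather than hand-rolled code.

```python
import math

def pearson_r(xs, ys):
    # Pearson product-moment correlation between two equal-length samples.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def t_statistic(r, n):
    # Transform r into a t statistic with n - 2 degrees of freedom;
    # the two-tailed p-value then follows from the t distribution.
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Invented data: continuous Sensing-scale scores and code review scores.
sensing = [25, 11, 3, 19, 7, 15, 1, 21]
scores = [4, 8, 10, 5, 9, 6, 12, 3]
r = pearson_r(sensing, scores)
print(round(r, 3))  # r is negative here: higher Sensing score, lower review score
```

The significance values in Table 1 correspond to referring such a t statistic to a t distribution with n - 2 degrees of freedom.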

As can be seen, the only significant correlation was that between the Sensing scale and the code review score. High scores on the Sensing scale represent an individual with a high sensing preference, while low scores represent an individual with a high intuitive preference. As this is a negative correlation, it indicates that people who are more intuitively inclined performed significantly better on the code review task than sensing types. There were minor, non-significant correlations with the extroversion and thinking scales, but no correlation whatsoever with the judging scale.

Further examination of the data revealed an interesting interaction between two variables. If participants are grouped according to two-letter types, performance on the code review task can be compared according to these types. The most notable differences using this method came with the SN and TF scales. Comparison of the mean code review scores shows that the NT students scored 9.10, as compared to the non-NTs, who scored 6.14 on average. This illustrates that the NT individuals performed better on average than non-NT people. The mean scores for these four types are shown in Table 2.

Table 2 - Mean code review score by SN/TF types

         F       T
  N     8.71    9.10
  S     4.27    6.62

As can be seen from the table, the most marked difference was between NT participants and SF participants, the NT participants scoring on average more than twice as well as the SF participants. A t-test comparing NTs with non-NTs yielded a significant result, illustrating that NTs were better at the task than non-NTs (1-tail sig = 0.039, t = 1.801, df = 61).
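The NT versus non-NT comparison is a two-sample t-test. A sketch of the pooled-variance form, whose df = n1 + n2 - 2 matches the reported df = 61 for 63 usable participants, is given below; the arrays passed in would be the two groups' code review scores (the data here are invented).

```python
import math

def two_sample_t(a, b):
    # Student's two-sample t statistic with pooled variance;
    # degrees of freedom are len(a) + len(b) - 2.
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    t = (ma - mb) / math.sqrt(sp2 * (1 / na + 1 / nb))
    return t, na + nb - 2

# Invented scores for an NT group and a non-NT group.
nt_scores = [9, 10, 8, 11, 9]
other_scores = [6, 7, 5, 6, 8]
t, df = two_sample_t(nt_scores, other_scores)
print(t > 0, df)  # positive t: the NT group's mean is higher
```

A one-tailed p-value, as reported in the paper, is then read off the t distribution with the stated degrees of freedom.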

5.3 Discussion
It would appear that Weinberg's hypothesis was correct in that people of a certain personality type performed better at one aspect of the software development process. This being the case, further research was recommended in order to analyse more precisely what these differences are. The results of this were the follow-up Code Comprehension and Online Programming Competition studies (see below).

In addition, it may be advantageous for software companies to consider the strengths of their employees when assigning tasks in the workplace. If some people, for whatever reason, are better able to perform code review tasks than others, then it would seem prudent for software companies to capitalise on the strengths of their employees, and consider employees perhaps previously overlooked for this particular task. This, in turn, would lead to the creation of more dependable systems.

It seems clear that there are mental processes involved in software development which are not, as yet, fully understood, because software engineering is still in its relative infancy. If, as may well be the case, certain types of people are better at certain tasks (other than code review), then utilising these people for those tasks will lead to fewer errors in the software creation process, and thus produce more dependable systems. In addition, if a model illustrating the mental processes involved in the various tasks can be produced, then this too would lead to a better understanding of the practical processes involved, and ultimately to the production of more dependable methods for software development, and consequently more dependable software. A more detailed analysis of the code-review study can be found in [4].

1 For more information about correlations and other statistical analyses adopted in this study, please see a book on research methods, such as Robson, C., Real World Research: A Resource for Social Scientists and Practitioner-Researchers. 2nd ed. 2002, Oxford, UK; Malden, Mass.: Blackwell Publishers.

Page 27

6. THE CODE COMPREHENSION STUDY
In order to gain a greater understanding of the mental processes involved in the creation of a software product, it was deemed necessary to examine another aspect of software development. Given the difficulty some of the participants had with the code-review task, the idea of measuring an individual's ability to understand a piece of code was considered. An experiment was conceived whereby participants would be given a piece of Java code in addition to the MBTI personality questionnaire. The participants would be required to demonstrate their understanding by answering a number of multiple-choice questions concerning the function of a particular section of code, or to state what would happen if the program were run with a particular set of variables.

To this end, a piece of Java code was selected, which was sufficiently long and intricate that gaining a proper understanding of its function would not be a trivial task. The program in question is a simulator for a number of lifts in a building, with a number of variables which can be set by the user. For instance, the user may decide that there are four lifts and fifteen passengers scattered throughout ten floors, who will decide to push the call buttons at random intervals within a user-specified timeframe. The lift simulator then uses a random seed to distribute the passengers. Once the simulation is started, the program measures the waiting and travel times for each of the passengers.

In addition to simply understanding what certain sections of the code are for, participants will be required to answer specific questions on the program, such as 'where do the lifts rest?' The format for these questions will be multiple-choice (with a space for comments) in order that marking can be carried out for large numbers of participants with ease and accuracy. The generation of these questions is now underway.

In a further test, the participants will be required to say how they would cope with modifying the code under certain circumstances. That is, they will be given a particular set of problems and must say how they would implement these changes. In addition to the code comprehension task itself, participants will be given the MBTI to complete, in order to assess whether the results observed with the code-review task are also indicative of good performance on this task.

In addition to measuring the normal ability to understand the code, the influence of comments will also be assessed. That is, there will be two different versions of the code utilised for this between-groups experiment, one with few comments present and one with more comments present, particularly concerning assertions at relevant points in the code. While it is generally anticipated that the presence of additional comments will be universally beneficial to the participants, the question is how much these comments aid comprehension, i.e. just how important are the comments in aiding code comprehension? MBTI results will also feature with regard to comments, in that it will be investigated whether certain types of personality benefit more from the presence or otherwise of these comments.

The whole task will be repeated with participants of varying levels of experience: second year undergraduate students, third year undergraduate students, and a group of more experienced programmers, preferably from an industrial background. With these participants, it will be possible to examine the extent to which experience matters when attempting to understand a piece of code. It will also be possible to examine whether or not more experienced programmers rely more or less on the aforementioned comments.
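For readers unfamiliar with this kind of program, a heavily simplified lift simulator can be sketched as follows. The study's actual task used Java and a far more intricate program; this Python toy (all names and the nearest-lift dispatch rule are invented) only illustrates the general shape: randomly timed calls, a dispatch choice, and per-passenger waiting times. Travel to the passenger's destination is ignored.

```python
import random

def simulate(lifts=4, passengers=15, floors=10, horizon=100, seed=42):
    # Toy model: passengers press call buttons at random times; the lift
    # that can reach them soonest serves them (1 time unit per floor);
    # we record each passenger's waiting time.
    rng = random.Random(seed)
    lift_pos = [0] * lifts          # all lifts start at the ground floor
    lift_free_at = [0] * lifts      # time at which each lift becomes idle
    waits = []
    calls = sorted((rng.uniform(0, horizon), rng.randrange(floors))
                   for _ in range(passengers))
    for t_call, floor in calls:
        best = min(range(lifts),
                   key=lambda i: max(lift_free_at[i], t_call)
                                 + abs(lift_pos[i] - floor))
        arrive = max(lift_free_at[best], t_call) + abs(lift_pos[best] - floor)
        waits.append(arrive - t_call)
        lift_pos[best] = floor
        lift_free_at[best] = arrive
    return waits

waits = simulate()
print(len(waits))  # one waiting time per passenger
```

Comprehension questions of the kind described ('where do the lifts rest?', 'what happens with this set of variables?') probe exactly the state that such a loop maintains.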

7. THE ONLINE PROGRAMMING COMPETITION STUDY
The third aspect of TA Weinberg is the online programming competition. This study comprises generating an online questionnaire form for participants to complete (which has now been done), including a link to a pre-existing personality questionnaire similar to the MBTI. Permission to use this personality questionnaire has now been obtained. Initially, one programming problem from the Valladolid online programming competition was selected. The individuals who attempted to solve this problem will be sent an e-mail with a link to the form, inviting them to take part in the research. The results from the online form and the personality questionnaire will be compared to the solutions they submitted to the programming competition. Specifically, the types of errors participants made in their early attempts at solving the problem have been analysed and will be compared to personality type, educational level, etc., in order to establish any possible links. This analysis will be mostly exploratory in nature and will potentially generate interest in a much larger study in the future, encompassing the thousands of participants in this particular programming competition.

8. CONCLUSIONS
TA Weinberg most directly relates to the Diversity research theme in that it is concerned with the impact of different personality types on the process of developing software. A factor which is often overlooked is that the process of creating a piece of software can be directly related to the type of person involved. If, by examining some of the different factors concerning these people, it is possible to shed some light on the mental processes involved, then this should in turn lead to the ability to build more dependable systems. If it is the case that NT-type people perform better through all of the tasks involved in TA Weinberg, then the question of what makes them better can be addressed. If, on the other hand, it transpires that some other type is better at code comprehension, and yet another type performs best at the online competition, this is equally interesting.

However, the ultimate aim of this thrust of research (albeit perhaps not within the scope of TA Weinberg) is to examine groups of people, rather than individuals, and how they interact when designing and building a piece of software. Only then will the issue of diversity of personality truly become apparent.

9. ACKNOWLEDGMENTS
Thanks go to all of those involved with TA Weinberg and the production of this document. This work was carried out under the EPSRC-funded DIRC project. Special thanks go to Alessandra Devito da Cunha, who worked closely with the members of TA Weinberg in the earlier stages of the work.

10. REFERENCES
[1] Bishop-Clark, C. and D. Wheeler, The Myers-Briggs Personality Type and its Relationship to Computer Programming. Journal of Research on Computing Education, 1994. 26(3): p. 358-370.
[2] Boehm, B.W., Software Engineering Economics. Prentice-Hall Advances in Computing Science and Technology Series. 1981, Englewood Cliffs, N.J.: Prentice-Hall.
[3] Capretz, L.F., Personality types in software engineering. International Journal of Human-Computer Studies, 2003. 58: p. 207-214.
[4] Devito Da Cunha, A., "The Myers Briggs Personality Type as a Predictor of Success in the Code-Review Task." MPhil Dissertation, University of Newcastle, 2003.
[5] Devito, A.J., Review of Myers-Briggs Type Indicator, in The Ninth Mental Measurements Yearbook, J. Mitchell, Editor. 1985, Lincoln, Neb. p. 1030-1032.
[6] Edwards, J.A., K. Lanning, and K. Hooker, The MBTI and Social Information Processing: An Incremental Validity Study. Journal of Personality Assessment, 2002. 78(3): p. 432-450.
[7] Furnham, A. and T. Miller, Personality, absenteeism and productivity. Personality & Individual Differences, 1997. 23(4): p. 705-707.
[8] Furnham, A., C.J. Jackson, and T. Miller, Personality, learning style and work performance. Personality & Individual Differences, 1999. 27: p. 1113-1122.
[9] Furnham, A., et al., Do personality factors predict job satisfaction? Personality & Individual Differences, 2002. 33(8): p. 1325-1342.
[10] Furnham, A., L. Forde, and K. Ferrari, Personality and work motivation. Personality & Individual Differences, 1999. 26: p. 1035-1043.
[11] Furnham, A., Personality and individual differences in the workplace: Person-organization-outcome fit. 2001.
[12] Furnham, A., Personality at Work: The Role of Individual Differences in the Workplace. 1994, New York: Routledge.
[13] Myers, I.B. and M.H. McCaulley, Manual: A Guide to the Development and Use of the Myers-Briggs Type Indicator. 5th ed. 1985, Palo Alto, Calif.: Consulting Psychologists Press.
[14] Myers, I.B. and P.B. Myers, Gifts Differing: Understanding Personality Type. 1995, Palo Alto, Calif.: Davies-Black Publishing.
[15] Myers, I.B., Introduction to Type: A Description of the Theory and Applications of the Myers-Briggs Type Indicator. 12th ed. 1990, Palo Alto, CA: Consulting Psychologists Press.
[16] Pressman, R.S., Chapter 19 - Software testing strategies, pp. 654-658, in Software Engineering: A Practitioner's Approach. 1992, New York: McGraw-Hill.
[17] Smither, R.D., The Psychology of Work and Human Performance. 3rd ed. 1998, New York: Longman.
[18] Weinberg, G.M., The Psychology of Computer Programming. 2nd ed. 1998, New York: Van Nostrand Reinhold.

Page 28

Page 29

Some difficult decisions are easier without computer support¹

Andrey A. Povyakalo, Eugenio Alberdi, Lorenzo Strigini
Centre for Software Reliability
Tel: +44 20 7040 8247 / 8424 / 8245
[email protected]

Peter Ayton
Psychology Department
Tel: +44 20 7040 8524
[email protected]

City University, Northampton Square, London EC1V 0HB

ABSTRACT Developments in computing offer experts in many fields specialised support for decision making under uncertainty. However, the impact of these technologies remains controversial. In particular, it is not clear how - or how much - imperfect computer responses may affect human decision making. This question applies even when such responses are rare and even where the human expert retains full authority and responsibility for the final decision. Here we report the discovery of strikingly diverse effects of computer support on breast cancer experts' clinical decisions, as detected by statistical analysis of trial results. This was part of an interdisciplinary study conducted in the DIRC project on the dependability of computer-based systems. These technologies have typically been assessed in terms of their aggregate effects in a clinical trial. Instead, we analyse variations of outcomes across (1) different cases, (2) different human experts and (3) correct and erroneous computer output. Here we show that computer support: a) was less useful for (and sometimes hindered) those experts who were relatively good at detecting difficult-to-detect cancers without support; b) helped those experts who were less good at detecting easier-to-detect cancers without support; c) harmed experts' decisions when it failed to prompt difficult-to-detect cancers; d) unexpectedly improved decisions about some normal cases. Thus, experts' professional decisions can be influenced by computer support in intricate and counterintuitive ways that are not normally considered by the accepted approaches to assessment.

Keywords Human-machine diversity, Statistical modelling, Breast cancer screening, mammography, computer aided detection, medical decisions

¹ Full text of the paper is distributed separately as a supplement.

Page 30

A Socio-technical Approach to Voting Systems

P Y A Ryan
University of Newcastle
Newcastle upon Tyne NE1 7RU, UK
+44 191 222 8972

[email protected]

ABSTRACT
I outline the key developments of the investigation of the interdisciplinary aspects of electronic voting technologies and systems. In particular, I give an outline of a new voter-verifiable scheme, [2], based on the Chaum original but significantly simpler, both conceptually and from an implementation point of view. It is hoped that this simpler scheme may be of interest to the socio-technical researchers of the DIRC project.

Keywords Risk, trust, E-voting, dependability, voter-verifiability.

1. INTRODUCTION
The scheme, [1], proposed by David Chaum was chosen as a case study for the security strand of DIRC. The problem of designing, evaluating and maintaining a dependable voting system has all the right ingredients for a security-oriented DIRC study. E-voting is of course a very topical subject and highly interdisciplinary in nature: a voting system will typically have a technical core but be surrounded by a complex, socio-technical system. Besides the technical issues of accuracy, secrecy and availability, public trust and usability are paramount.

Most traditional approaches to this problem involve placing significant trust in the technology, mechanisms or processes. Thus, for the traditional paper ballot, the handling of the ballot boxes and the counting process must be trusted: that the boxes are not lost or manipulated and that the counting process is accurate. Various observers can be introduced to the process, which helps to spread the dependence but does not eliminate it. With many of the touch-screen DRE devices widely used in the recent US presidential elections, the voter at best gets some form of acknowledgement of the way they cast their vote. After that, they can only hope that their vote will be accurately included in the final tally. The goal of schemes like that proposed by Chaum, or the Prêt à Voter scheme described here, is to ensure accuracy of the vote tally whilst ensuring ballot secrecy. Ideally, this should be achieved with minimal dependence on the components of the system.

With most DRE-style systems, the voting devices must be completely trusted to accurately record and count votes and to maintain the secrecy of all ballots. The level of trust can be reduced, or at least shifted, by a scheme along the lines of that proposed by CESG. Here, voters are provided with ballot forms with unique validation codes, one for each candidate. Voters use the appropriate code to cast their vote, over the internet or another channel. A central authority maintains a list of codes against candidates and uses this to tally the votes cast. This is quite an elegant solution, providing a high degree of end-to-end security, but it still requires complete trust to be placed in the tallying authority. By contrast, the scheme described here achieves the goals with virtually no need to trust system components or agents. Assurance arises from close auditing of the vote capture and counting process. The scheme thus achieves high assurance through a high degree of transparency within the constraints of the secrecy requirements. Phrased another way: we verify the actual election (at run-time) rather than attempting to verify the electoral system (at design-time).

1.1 Voter-verifiability
The key ingredients for providing voter-verifiability are:

• Provide the voter with a receipt showing their vote in encrypted form.

• Enable the voter to confirm in the booth that her intended vote is correctly encoded in the receipt, whilst preventing the vote from being revealed outside the booth.

• Have a number of tellers perform an anonymising mix on the batch of encrypted ballot receipts, with all intermediate steps of the tellers' processing posted to the web bulletin board.

• Perform random checks on all steps of the process to ensure that, with high probability, any attempt to corrupt the vote capture and counting will be detected.

The point of the encrypted receipt is to provide the voter with a way to check that her ballot is entered into the tallying process and indeed, if her receipt has not been included, to prove this to a third party. The fact that her vote is in encrypted form ensures that there is no way for her to prove to a third party which way she voted. Voters can visit the web bulletin board and check that their (encrypted) ballot receipt has been correctly posted. The tellers process these posted

Page 31

receipts and there are mechanisms in place to ensure that all posted receipts are entered into the tallying process.

The anonymising mixes performed by the tellers ensure that there is no link between any encrypted ballot receipt and the decrypted vote that is finally output by the tallying process.

2. Prêt à Voter
The original Chaum scheme used visual cryptography to represent the vote and resulted in a scheme of considerable complexity, both conceptually and from an implementation point of view. The voter experience was also rather unfamiliar and reasonably elaborate.

The new scheme, outlined in this paper, greatly simplifies the presentation of the voter's choice and results in a system that is significantly easier to understand and to implement.

Here we present the supervised version of the scheme, in which voters authenticate themselves and register at a polling station and cast their vote in isolation in a booth. Remote versions of the scheme, in which voters are able to cast their votes over various channels (telephone, SMS, internet etc.), are the subject of ongoing work.

2.1 Casting your vote
Once registered at the polling station, the voter is presented with a familiar-looking ballot form, chosen from a heap at random. This has the candidates listed in the left hand column, along with a right hand column into which the voter selection can be inserted:

    Buddhist   ✻
    Alchemist  ❂
    Anarchist  ✭
    Nihilist   ✮
                   &r8*Kp%SD6$5

So far, so familiar. There are, however, two features of these ballot forms that are non-standard:

• The ordering of the candidates is randomized.

• There is a funny looking string of random garbage in the lower right hand corner. We will refer to this as the ballot onion.

These are crucial to the way the system works and will be explained shortly. For the moment, let's continue with the voter experience. The voter trots over to a vacant booth and places an X in the appropriate cell against their candidate. Suppose that our enlightened voter wishes to vote for the Buddhist candidate:

    Buddhist   ✻   X
    Alchemist  ❂
    Anarchist  ✭
    Nihilist   ✮
                   &r8*Kp%SD6$5

She should now detach the left hand strip along the perforation thoughtfully provided down the middle and drop this in a shredder. She is thus left with:

    X
    &r8*Kp%SD6$5

She now inserts this in an optical reader that records the information: the position of the cross and the value of the onion. The strip is then returned to the voter, who will retain it as the ballot receipt.

The point of the randomization of the candidate list should now be apparent: the right hand strip alone does not reveal which way the vote was cast. Of course, if this were the end of the story we'd be snookered: there would be no way for anyone to count the votes. This is where the onion comes in: it has buried in it, cryptographically, the information needed to reconstruct the candidate ordering.

2.2 Verifying your vote
Once the polling stations close, all recorded ballot receipts should be sent to a central server and then posted to a Web Bulletin Board (WBB). Ideally, these should be lexically ordered according to the onion values. The WBB can only be written to by the server and the tellers (see later) but can be read by anyone. Once written, nothing can be erased. Voters should visit the WBB and confirm that their receipt appears there accurately. That is, the onion value should agree exactly and their X should appear in the correct cell, exactly as shown on their receipt. If their receipt does not appear, or is incorrect, they should kick up a stink. Note that, like snowflakes, the onions should all be distinct. No two ballot forms should have the same onion value. So, every voter who took the trouble to exercise their democratic right should find a unique onion matching theirs on the WBB. The point of this check is for the voter to confirm that their receipt is entered accurately into the tallying process. How precisely such a WBB would be implemented to enforce the above properties is a tricky question, outside the scope of this paper.
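The layered onions and the tellers' anonymising mix described in the following sections can be illustrated with a toy, standard-library-only Python sketch. A hash-based XOR stream stands in for the public-key encryption of the real scheme, and the teller keys, candidate names and batch below are all invented for illustration; real onions also carry random seeds, so that no two onions are ever equal.

```python
import hashlib
import random

def keystream(key: bytes, n: int) -> bytes:
    # Toy stream cipher (SHA-256 in counter mode). NOT secure: it only
    # stands in for the public-key onion layers of the real scheme.
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:n]

def xor_layer(data: bytes, key: bytes) -> bytes:
    # XOR is its own inverse, so the same call adds or strips a layer.
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

def make_onion(vote: bytes, teller_keys: list) -> bytes:
    # Wrap the vote under one layer per teller.
    for key in reversed(teller_keys):
        vote = xor_layer(vote, key)
    return vote

def mix_tally(receipts: list, teller_keys: list) -> list:
    # Each teller in turn strips its layer and shuffles the batch, so
    # the final plaintext votes cannot be linked to the input receipts.
    batch = list(receipts)
    for key in teller_keys:
        batch = [xor_layer(r, key) for r in batch]
        random.shuffle(batch)
    return batch

teller_keys = [b"teller-1", b"teller-2", b"teller-3"]
receipts = [make_onion(v, teller_keys)
            for v in (b"Buddhist", b"Nihilist", b"Buddhist")]
print(sorted(mix_tally(receipts, teller_keys)))
# -> [b'Buddhist', b'Buddhist', b'Nihilist']
```

The shuffle inside each teller's step is what breaks the link between a receipt's position on the WBB and the decrypted vote that finally emerges.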

Page 32

2.3 Processing receipts
Once a suitable time period has passed after the posting of the receipts, the processing of the votes will start. The details of how this processing is performed are not so important for the purposes of this paper. Indeed, various techniques can be applied here and are being investigated. Reference [2] uses Chaumian decryption mixes, in a similar way to Chaum's original scheme.

In rough terms, the information required to reconstruct the candidate ordering is buried in layered fashion in the onions under the public keys of the tellers. The tellers take it in turns to take in a batch of receipts, strip off a layer of encryption with their secret keys and shuffle the resulting receipts. The resulting, partially decrypted and shuffled receipts are then posted to the next column of the WBB. The last teller will strip off the last layer of encryption, thus revealing the original vote value. The last column of the WBB thus contains a list of the decrypted votes, and this is available to anyone to check the final vote tally.

2.4 Assurance of Accuracy
Thus far, our description has assumed that all the elements of the scheme perform correctly. With this assumption, it is clear that the outcome of the election will indeed be accurate and secret. However, the whole design philosophy of voter-verifiable schemes is to avoid having to make such assumptions and place such trust. In place of such trust, we perform close auditing of the behaviour of the authority and tellers in such a way as to detect, with high probability, any corruption or malfunction. There are two places in which things could go wrong with respect to the accuracy requirement in our description so far:

• The ballot forms could be incorrectly constructed.

• The tellers could perform their decryptions of receipts incorrectly.

More precisely, the information buried in the onions might not correspond to the candidate ordering shown on the form. Clearly this could lead to an incorrect decryption of the receipt and so undermine the accuracy of the tally. Alternatively, even if the ballot forms are all correctly constructed, it may be that one of the tellers is corrupt or defective and so performs some decryptions inaccurately. Again this could undermine the accuracy of the tally.

2.5 Auditing the Authority
The Authority is required to generate a large number of ballot forms in advance of the election, significantly more than will actually be required for the size of the electorate. To detect whether the Authority has corrupted any of the ballot forms, we perform audits on a random selection of forms. Thus, various independent organizations, like the Electoral Reform Society, would be invited to make a random selection of, say, 50% of the forms. These will be checked for the correctness of their construction.

To check a form, the crypto information has to be revealed. An innovation of the new scheme is to use the tellers in an oracle mode during this phase of the election to reveal this information for the ballot forms selected for audit. This avoids the need to store and selectively reveal keys. Recall that the tellers collectively have the necessary secret keys to extract this information. Once a form has been audited in this fashion, it should be discarded, since the secret crypto information has been revealed.

If all are found to be correctly constructed, we can reasonably conclude that it is highly probable that the remaining forms are all correctly formed. More precisely, the probability that x forms are malformed and go undetected falls off exponentially with x.

Alternatively, voters could be invited to choose a dummy ballot form and use this to cast a dummy vote. This could be in the presence of the voting officials and would comprise the voter making a random choice of candidate and submitting the right hand strip to a reader device in the same manner as casting a real vote (except that a real vote would be cast in a booth on a separate device). The tellers should return the correctly decrypted dummy vote value, and this serves as a partial check on the correctness of the form.

The fact that the ballot forms are all generated in advance actually opens up the possibility of other checking modes, some of which do not necessitate the revealing of the crypto information. For example, for an audited form, the tellers could be required simply to return the candidate ordering given only the onion value.

Care has to be taken, of course, to ensure that the oracle mode for the tellers is only available before and during the election phase, not after, and that no real votes be submitted to this mode. Details of these alternative checking modes are given in [2]. The psychology of such dummy voting or checking modes, when made available to voters rather than just to auditing officials, raises fascinating questions. On the one hand, the successful decrypting of dummy votes should presumably provide some reassurance to voters that their real votes will eventually be correctly decrypted and counted. On the other hand, it seems likely that dummy voting could undermine voters' confidence that the secrecy of their vote will be maintained. Investigating these psychological effects would be an excellent avenue of research for DIRC. We should stress at this point that these checks serve only to verify the correct construction of the ballot forms. They do not serve any purpose with regard to checking the behaviour of the tellers during the tallying phase. For this we have quite different checking procedures, which we come to in the next section.

2.6 Auditing the Tellers
The checks described above should ensure that, in the absence of collusion between the authority and the auditors at least, the chance of the Authority corrupting a significant number of ballot forms without being detected is vanishingly small. In

Page 33

this section we outline the checks that would be performed to detect any corruption by the tellers during the tallying phase. Recall that at each stage of the anonymising mix each teller is required to post the results of its processing to the WBB. Once all teller processing has completed and the tellers have publicly committed to their results, we commence the teller audit process. For each teller, an auditor now makes a random selection of half the receipts input to the teller's mix. For these selected receipts, the teller is required to reveal the corresponding link, along with some crypto information associated with the processing of that receipt. For each such revealed link, the auditor will check the decryption computations performed by the teller. If the decryption was not correctly performed, this will be detected.
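The exponential fall-off of these random audits can be checked numerically: if each item is audited independently with probability 1/2, then x corrupted items all escape with probability 2^-x. The sketch below estimates this by simulation; it is a simplification, since the scheme's audits actually draw exactly half the batch without replacement, which behaves very similarly for small x.

```python
import random

def undetected_probability(x: int, audit_prob: float = 0.5,
                           trials: int = 200_000) -> float:
    # Monte Carlo estimate: probability that x corrupted items ALL
    # escape when each item is independently audited with probability
    # audit_prob.
    escapes = sum(
        all(random.random() > audit_prob for _ in range(x))
        for _ in range(trials))
    return escapes / trials

for x in (1, 2, 4, 8):
    # Simulated estimate alongside the closed form 2^-x.
    print(x, round(undetected_probability(x), 3), 0.5 ** x)
```

With a 50% audit, a single corrupt item escapes half the time, but corrupting enough items to swing an election becomes detectable with near certainty.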

The selection of links to be revealed is carefully contrived to ensure that no complete links are revealed across the entire mix process. Thus no ballot receipt in the far left column of the WBB can be linked to its eventual decryption in the far right column. Thus the secrecy of all ballots is ensured.

Full details of this process are given in [2].

2.7 Assurance of Secrecy
The mechanisms described above are designed to ensure that there is a negligible probability of significant corruption of votes going undetected. We need also to ensure the secrecy of votes. Here the mechanisms are the encryption of the ballot receipts and the anonymising mixes. Thus, some trust needs to be placed in the crypto algorithms, but these are publicly known and subject to expert scrutiny. Off-the-shelf, proven algorithms like RSA would probably be used. Trust must be placed in the Authority to keep the information about the association of onions with candidate orderings secret. Enhancements to the scheme to spread this trust and minimize the threats are ongoing. Similarly, some trust needs to be vested in the tellers to keep their shuffles secret. Here we have defence in depth: we can tolerate some of the tellers being subverted and still maintain ballot secrecy.

3. Way forward
The scheme described here is significantly simpler than the Chaum original. It is hoped that, as such, it will be more accessible to socio-technical folk in DIRC and that they will be tempted to dabble in this topical and exciting subject. As part of the early activity in the TA, groundwork was laid for the development of a dependability case for an e-voting system. It is hoped to build on this and adapt the existing work to the new scheme. Further enhancements and adaptations of the scheme are being developed, for example using alternative robust mix techniques and facilitating remote voting. Investigation of the psychological aspects of the user experience, in particular of the dummy voting and ballot form checking modes, would be useful. It is hoped to develop prototypes of the system and run trials.

The possible deployment and trialling of the scheme is being discussed with the ODPM.

4. Conclusions
Voting technology is currently a very hot topic and one that poses major socio-technical challenges. The DIRC project has already made significant contributions in this field, both in terms of the new family of voter-verifiable schemes outlined here and in terms of a framework for an interdisciplinary study of the dependability of such systems.

5. ACKNOWLEDGMENTS
The author would like to thank Ben Adida, Jeremy Bryans, Jeroen van der Graaf, Michael Jackson, Cliff Jones, Rene Peralta, Brian Randell, Ron Rivest, Fred Schneider and Poorvi Vora for many helpful discussions.

6. REFERENCES
[1] Chaum, D., Secret-Ballot Receipts: True Voter-Verifiable Elections. IEEE Security and Privacy, 2(1): 38-47, Jan/Feb 2004.
[2] Chaum, D., Ryan, P.Y.A., and Schneider, S.A., A Practical, Voter-verifiable Election Scheme. Newcastle Technical Report 880, submitted to CSFW 2005.

Page 34

The effectiveness of choice of programming language as a diversity seeking decision∗

M.J.P. van der Meulen, P.G. Bishop
Centre for Software Reliability
City University
London EC1V 0HB, UK
mjpm@csr.city.ac.uk

M. Revilla Department of Applied Mathematics University of Valladolid 47011 Valladolid, Spain [email protected]

ABSTRACT
Software reliability can be increased by using a diverse pair of programs (a 1-out-of-2 system), both written to the same specification. The improvement in the reliability of the pair versus the reliability of a single version depends on the degree of diversity of the programs. The choice of programming language has been suggested as an example of a diversity seeking decision. However, little is known about the validity of this recommendation. This paper assesses the effect of language on program diversity. We compare the effects of the choice of programming language as a diversity seeking decision by using programs written to three different specifications in the "UVa Online Judge". Thousands of programs have been written to these specifications; this makes it possible to provide statistical evidence. The experiment shows that when the average probability of failure on demand (pfd) of the programs is high, the programs fail almost independently, and the choice of programming language does not make any difference. When the average pfd of the pools gets lower, the programs start to fail dependently, and the pfd of the pairs deviates more and more from the product of the pfds of the individual programs. Also, we observe that the diverse C/Pascal or C++/Pascal pairs perform as well as or better than the other possible pairs.

Keywords
Diversity, Programming Language.

1 INTRODUCTION
The use of a diverse pair of programs has often been recommended to achieve high reliability [1] [2] [3] [4] [5].

Software diversity may however not lead to a dramatically high improvement. This is caused by the fact that the behaviour of the programs cannot be assumed to be independent [3] [5] [6] [7]. Two program versions written by independent teams can still contain similar programming mistakes, thus limiting the gain in reliability of the diverse pair.

In spite of this, the case for diversity for achieving high reliability remains strong. The possible gain using diversity appears to be higher than can be achieved by trying to write a high reliability single program [6]. Several techniques have been proposed to decrease the likelihood that different programs fail dependently. These are called "Diversity Seeking Decisions" in [9]. Examples are:

• Data diversity. Using random perturbations of inputs; using algorithm-specific re-expression of inputs.

• Design diversity. Separate ("independent") development; diversity in programming language; diverse requirements/specifications; different expressions of identical requirements; etc.

In this paper we will concentrate on design diversity and specifically on programming language diversity. This is a potential defence against some programming slips, and provides some, limited, cognitive diversity against mistakes in higher-level problem solving; the efficacy will however depend heavily on "how different" the programming languages are.

The "UVa Online Judge" website (http://acm.uva.es) provides many programs written to many specifications, and gives us the opportunity to compare diverse pairs. In this research we use the programs written in C, C++ and Pascal, written to three different specifications. Our aim is to compare the reliability performance of diverse pairs with each other and with single programs.

∗ The full version of this paper will appear in the proceedings of the Fifth European Dependable Computing Conference, EDCC-5, 20-22 April 2005, Budapest, Hungary.

Page 35

Table 1: Some statistics on the three problems.

                                       3n+1                Factovisors          Prime Time
                                  C     C++    Pas      C     C++    Pas     C     C++    Pas
  Number of authors             5897   6097   1581     212    582     71    467    884    183
  First attempt correct         2483   2442    593     113    308     42    356    653    127
  First version completely
  incorrect                      723    761    326      27     97      9     93    194     49

2 THE EXPERIMENT

2.1 The UVa Online Judge
The "UVa Online Judge" website is an initiative of Miguel Revilla of the University of Valladolid [10]. It contains problems to which everyone can submit solutions. The solutions are programs written in C, C++, Java or Pascal. The correctness of the programs is automatically judged by the "Online Judge". Most authors submit solutions until their solution is judged as being correct. There are many thousands of authors, and together they have produced more than 3,000,000 solutions to the approximately 1500 problems on the website. In this paper we will analyse the programs written to three different problems on the website. We will submit every program to a test set, and then compare their failure behaviour.

There are some obvious drawbacks to using this data as a source for scientific analysis. First of all, these are not "real" programs: the programs under consideration solve small, mostly mathematical, problems. We have to be careful not to overinterpret the results. Another point of criticism might be the fact that the Online Judge does not give feedback on the demand on which the program failed. This is not necessarily a drawback. It is certainly not comparable to a development process involving a programmer and a tester, because in that case there will be feedback on the input data for which the program fails. It has however similarities with a programmer's normal development process: a programmer will, in spite of the fact that there are no examples of inputs for which a program fails, assume that it is not yet correct. The programmer works until he is convinced that the program is correct, based on his own analysis and testing. From this perspective, the Online Judge only confirms the programmer's intuition that the program is not yet correct. In this experiment, we circumvent this drawback by only using first submissions.

A last possible criticism of our approach is that programmers may copy each other's results. This may be true, but it is possible to limit the consequences of this plagiarism for the analyses by assuming that authors will only copy correct results from each other. For the analyses in this paper, the consequence is that we cannot trust absolute results, and we will limit ourselves to observing trends in relative performance.

2.2 Problems Selected
We selected problems conforming to the following criteria:

• The problem does not have history, i.e. subsequent inputs should not influence each other. Of course, some programmers may implement the problem in such a way that it has history. Given our test approach, see below, we will not detect these kinds of faults.

• The problem has a limited demand space: two integer inputs.

Both restrictions lead to a reduction of the size of the demand space, and this keeps the computing time within reasonable bounds (the necessary preparatory calculations for the analysis of these problems take between a day and two weeks to complete).

Below, we provide a short description of the problems, although this information is not necessary for reading this paper: we will not go into detail with respect to functionality. It gives some idea of the nature and difficulty of the problems, which is useful for interpreting our results. See the website http://acm.uva.es for more detailed descriptions of the problems.

The "3n+1" problem. A number sequence is built as follows: start with a given number; if it is odd, multiply by 3 and add 1; if it is even, divide by 2. The sequence length is the number of these steps to arrive at a result of 1. Determine the maximum sequence length for the numbers between two given integers 0 < i, j ≤ 100,000.

The "Factovisors" problem. For two given integers 0 ≤ i, j ≤ 2^31, determine whether j divides factorial i.

The "Prime Time" problem. Euler discovered that the formula n^2 + n + 41 produces a prime for 0 ≤ n ≤ 40; it does however not always produce a prime. Write a program that calculates the percentage of primes the formula generates for n between two integers i and j with 0 ≤ i ≤ j ≤ 10,000.
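As an illustration of the kind of program under study, a straightforward Python solution to the "3n+1" problem might look as follows. The counting convention (the length includes both the starting number and the final 1) is our assumption about what the judge expects; submissions on the site are in C, C++, Java or Pascal, not Python.

```python
def sequence_length(n: int) -> int:
    # Count the terms of the sequence, including n itself and the
    # final 1 (so sequence_length(1) == 1) -- an assumed convention.
    length = 1
    while n != 1:
        n = 3 * n + 1 if n % 2 else n // 2
        length += 1
    return length

def max_sequence_length(i: int, j: int) -> int:
    # The judge permits i > j, so treat the input range as unordered.
    lo, hi = min(i, j), max(i, j)
    return max(sequence_length(n) for n in range(lo, hi + 1))

print(max_sequence_length(1, 10))  # -> 20 (attained at n = 9)
```

Even a problem this small leaves room for the faults the study counts: mishandling i > j, off-by-one range errors, or overflow in fixed-width integer languages.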

2.3 Running the Programs

For all problems chosen, a “demand” is a set of two integer inputs. Every program is restarted for every demand; this is to ensure the experiment is not influenced by history, e.g. when a program crashes for certain demands. We set a time limit on each demand of 200 ms. This time limit is chosen to terminate programs that are very slow, stall, or have other problems.
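A per-demand harness along these lines can be sketched with Python's standard subprocess module; the path to a compiled submission is hypothetical, and /bin/cat merely stands in for a program that reads stdin and writes stdout.

```python
import subprocess

def run_one(binary, demand, timeout_s=0.2):
    # Restart the submission for every demand so no state can leak
    # between demands; kill it after the 200 ms time limit. 'binary'
    # is a hypothetical path to a compiled submission that reads two
    # integers from stdin and writes its answer to stdout.
    try:
        result = subprocess.run(
            [binary],
            input="%d %d\n" % demand,
            capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return None            # too slow or stalled: failure
    if result.returncode != 0:
        return None            # crash also counts as failure
    return result.stdout

# /bin/cat simply echoes its input, standing in for a submission:
print(repr(run_one("/bin/cat", (1, 2))))  # -> '1 2\n'
```

Treating a timeout or crash as just another failed demand is what makes restarting per demand safe: one pathological input cannot poison the rest of the run.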

Page 36

Every program is submitted to a series of demands. The outputs generated by the programs are compared to each other. Programs that produce exactly the same outputs form an “equivalence class”. These equivalence classes are then converted into score functions. A score function indicates which demands will result in failure. The difference between an equivalence class and its score function is that programs that fail in different ways (i.e. different, incorrect outputs for the same demands) are part of different equivalence classes; their score functions may however be the same. The score functions are used for the calculations below. For all three problems, we chose the equivalence class with the highest frequency of occurrence as the oracle, i.e. the version giving all the correct answers. “3n+1” and “Factovisors” were run using the same set of demands: two numbers between 1 and 50, i.e. a total of 2500 demands. In both cases the outputs of the programs were deemed correct if they exactly match those of the oracle. “Prime Time” was run using a first number between 0 and 79, and a second number between the first and 79, i.e. a total of 3240 demands. The outputs of the programs were deemed correct if they were within 0.01 of the output of the oracle, thus allowing for errors in round off (the answer is to be given in two decimal places). Table 1 gives some statistics on the problems.
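The grouping into equivalence classes and score functions can be sketched in a few lines of Python; the program names and output values below are invented toy data, not results from the study.

```python
from collections import defaultdict

def equivalence_classes(outputs):
    # Programs with identical output vectors (one entry per demand)
    # form one equivalence class.
    classes = defaultdict(list)
    for program, vector in outputs.items():
        classes[vector].append(program)
    return dict(classes)

def score_function(vector, oracle):
    # 1 where the program fails a demand, 0 where it matches the
    # oracle; programs failing in *different* ways can share a score.
    return tuple(int(a != b) for a, b in zip(vector, oracle))

# Toy data: four programs, three demands (values are made up).
outputs = {"p1": ("4", "7", "9"), "p2": ("4", "7", "9"),
           "p3": ("4", "0", "9"), "p4": ("4", "1", "9")}
oracle = ("4", "7", "9")  # the most frequent class plays the oracle
classes = equivalence_classes(outputs)
scores = {p: score_function(v, oracle) for p, v in outputs.items()}
print(len(classes))                  # -> 3 equivalence classes
print(scores["p3"] == scores["p4"])  # -> True: same score function
```

Here p3 and p4 give different wrong answers, so they sit in different equivalence classes, yet they fail on exactly the same demand and therefore share one score function, just as the text describes.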

2.4 Graphs

In this experiment we wish to investigate the reliability improvement gained by choosing different programming languages for the programs in a pair, and we therefore need to use the Littlewood and Miller model. First, we establish pools of programs, each pool containing programs in C, C++ or Pascal. To compare individual programs and pairs, we need pools with the same pfd. To manipulate the pfd of the pools, we remove programs from them, starting with those with the highest pfd, until the average pfd of the pool has the desired value. This is a possible way of simulating testing of the programs: the tests remove the programs with the highest pfd first. Pools with the same pfd can then be assumed to have undergone the same scrutiny of testing.

We select a first program from one of the pools. Then we select a second program from a pool, and calculate the ratio of the pfd of the first program to the pfd of the pair:

R = pfd_A / pfd_AB = [ Σ_x θ_A(x) P(X = x) ] / [ Σ_x θ_A(x) θ_B(x) P(X = x) ]    (1)

In this equation, θ_A(x) and θ_B(x) represent the difficulty of demand x for the first and second program respectively, and P(X = x) the probability that demand x occurs. We do this for varying values of the pfd of the pools; the varying pfd is shown on the horizontal axis in the graphs. Figure 1 shows these ratios for the "3n+1" problem for different choices of the programming language of the first program. The graphs show R on the vertical axes on a logarithmic scale, because we are interested in the reliability improvement; with a logarithmic scale, equal improvements have equal vertical distance.
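Equation (1) can be evaluated directly from empirical score functions. The sketch below is our illustration, assuming a score function is a tuple of booleans (True = failure on that demand) and the demand profile P(X = x) is given as a list of probabilities summing to 1:

```python
def difficulty(pool, demands):
    """theta(x): fraction of programs in the pool failing on demand x.
    pool is a list of score functions (True = fail on that demand)."""
    return [sum(sf[x] for sf in pool) / len(pool) for x in range(demands)]

def reliability_improvement(pool_a, pool_b, p_x):
    """R = pfd_A / pfd_AB as in equation (1) of the
    Littlewood and Miller model."""
    n = len(p_x)
    th_a, th_b = difficulty(pool_a, n), difficulty(pool_b, n)
    pfd_a = sum(ta * p for ta, p in zip(th_a, p_x))
    pfd_ab = sum(ta * tb * p for ta, tb, p in zip(th_a, th_b, p_x))
    return pfd_a / pfd_ab
```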

3 Conclusion

We analysed the effect of the choice of programming language in diverse pairs using three different problems from the “UVa Online Judge”. The results seem to indicate that diverse language pairs outperform other pairs, but the evidence is certainly not strong enough for a definite conclusion. Analysis of more problems could help to strengthen the evidence and also to identify the factors that influence the gain possibly achieved by diversity of programming language.

ACKNOWLEDGEMENT This work is supported by EPSRC/UK in the framework of the Interdisciplinary Research Collaboration in Dependability-Project (DIRC).

[Figure 1: These graphs show the reliability improvement of a pair of programs over a single version for the "3n+1" problem, in three panels (first program in C, C++ and Pascal). The horizontal axis gives the average pfd of the pools of programs involved in the calculation; the curves show the reliability improvement R, on a logarithmic vertical scale, for the different possible choices of the programming language for the second program as a function of the average pfd of the pools. The diagonal in the graphs shows the reliability achievement if the programs' behaviours were independent.]

References

[1] Brilliant, S.S., J.C. Knight, N.G. Leveson, Analysis of Faults in an N-Version Software Experiment, IEEE Transactions on Software Engineering, Vol. SE-16(2), pp. 238-247, February 1990.

[2] Voges, U., Software Diversity, Reliability Engineering and System Safety, Vol. 43(2), pp. 103-110, 1994.

[3] Eckhardt, D.E., L.D. Lee, A Theoretical Basis for the Analysis of Multi-Version Software Subject to Coincident Errors, IEEE Transactions on Software Engineering, Vol. SE-11(12), pp. 1511-1517, December 1985.

[4] Eckhardt, D.E., A.K. Caglayan, J.C. Knight, L.D. Lee, D.F. McAllister, M.A. Vouk, J.P.J. Kelly, An Experimental Evaluation of Software Redundancy as a Strategy for Improving Reliability, IEEE Transactions on Software Engineering, Vol. 17(7), July 1991.

[5] Knight, J.C., N.G. Leveson, An Experimental Evaluation of the Assumption of Independence in Multiversion Programming, IEEE Transactions on Software Engineering, Vol. SE-12(1), pp. 96-109, 1986.

[6] Hatton, L., N-Version Design Versus One Good Version, IEEE Software, 14, pp. 71-76, 1997.

[7] Littlewood, B., D.R. Miller, Conceptual Modelling of Coincident Failures in Multiversion Software, IEEE Transactions on Software Engineering, Vol. 15(12), pp. 1596-1614, December 1989.

[8] Lyu, M.R., Software Reliability Engineering, McGraw-Hill, 1995.

[9] Popov, P., L. Strigini, A. Romanovsky, Choosing Effective Methods for Design Diversity - How to Progress from Intuition to Science, Proceedings of the 18th International Conference, SAFECOMP '99, Lecture Notes in Computer Science 1698, Toulouse, 1999.

[10] Skiena, S., M. Revilla, Programming Challenges, Springer Verlag, March 2003.

[11] Lee, P.A., T. Anderson, Fault Tolerance: Principles and Practice, Dependable Computing and Fault-Tolerant Systems, Vol. 3, Second Revised Edition, 1981.

[12] Chen, L., A. Avizienis, N-Version Programming: A Fault-Tolerance Approach to Reliability of Software Operation, Digest of the 8th Annual International Symposium on Fault-Tolerant Computing, Toulouse, France, pp. 3-9, June 1978.

[13] IEC, IEC 61508, Functional Safety of E/E/PE Safety-Related Systems, Geneva, 2001-2.

The Limits of Personas

Peter Bagnall, Guy Dewsbury and Ian Sommerville
Computing Department, InfoLab 21, South Drive, Lancaster University, Lancaster LA1 4WA
+44 1524 510 352 / 510 351 / 510 307
bagnall@comp.lancs.ac.uk, dewsbury@comp.lancs.ac.uk, is@comp.lancs.ac.uk

ABSTRACT
While personas are effective for workplace systems design, they are less useful when designing for vulnerable users, due to problems with gaining sufficient understanding of the target audience and problems making evaluations without prior knowledge of the users. Designers need to be involved in the development of personas to gain the most benefit from them. Problems also arise when designing for populations with impairments, since this implies multiple interfaces, which increases the number of personas required.

Keywords
Diversity; Personas; Personae; Interaction Design; Perspective taking; Empathy; Goal directed design.

1. INTRODUCTION
The use of personas within the interaction design community is growing, but there is still a significant level of disagreement as to their merits and uses. There is also a great deal of variation in the detail of how they are created and used. To inform the debate it is important to understand why personas work; this understanding should also lead to a clearer view of how they should be used and where their limits lie. Personas, used correctly, can be a powerful tool, but their use requires some care.

2. THE PURPOSE OF PERSONAS
When designing a system, an understanding of the intended audience is vital. The design clearly has to solve problems that genuinely exist for the target audience [1], in a way which they will recognise and understand. The potential pitfall, known as self-referential design [2], is designing a system which serves the designer's needs and desires, but fails to fulfil those of the target users. Without a model of the target audience the designer can only fall back on their own taste and preferences when evaluating design decisions. A key part of the model is an understanding of the motivations of the target users, in other words the users' goals.

To avoid self-referential design the designer must keep the target audience in mind almost constantly, and be able to carry out very rapid tests of new concepts against this audience. During the design phase doing this with a real test group is impractical, since ideation is too rapid to test every single idea. Personas allow the designer to play the role of an archetypal user and evaluate ideas from that user's perspective. Although this cannot be as accurate as genuine user testing, it is sufficient to cull the most problematic design ideas, and at the very least to reduce the options to a small set which, if necessary, can be tested with actual users.

To be effective, a persona-based design process relies heavily on the ability of the designers to accurately portray the persona and predict their responses [3]. Because of this, any problem with either the information used to generate the persona, or with the designer's ability to get into role, has the potential to weaken the resulting design.

3. PROBLEMS USING PERSONAS
Using a persona requires the designer to take a design element and accurately answer the question of whether that element would be effective and appropriate for that persona. To do this the designer needs to be able to see the world from the persona's point of view, what psychologists refer to as "perspective taking". Epley et al [4] suggest that the way people take other perspectives is an iterative process, starting with an assumption that other people have motivations and behaviours similar to their own. They then modify their mental model of the other person until they feel it explains the observed behaviour. Doing this requires an understanding of the direction in which to modify the model, and some experience of the actual behaviour of the person being modelled against which to test it. Clearly it is not possible to observe the behaviour of a persona, so the designer must draw on their experience of real people who share traits with the persona.

The implication is that while personas are very useful in providing a focus for designers and for communicating design solutions, they cannot work if the designer does not have a pre-existing understanding of the population the persona was drawn from. In many, perhaps most, cases this does not cause a problem, since people generally have an enormous wealth of experience in understanding other people. However, there are populations that many people have little experience working with. Specifically, disabled and elderly users are likely to be underserved by personas. With elderly users, although most designers will have elderly relatives who inform their understanding of the elderly, the range of impairments is vast, so it is very unlikely that a designer without specific training in designing for the elderly would appreciate all the problems just from a persona description.

A solution to this problem would simply be to avoid splitting the roles of persona creation and persona use.
Many design firms do precisely this, and have their designers carry out the fieldwork, create the personas, and then design the product. This helps to ensure that the designers have contact with, and therefore a better understanding of, the target audience.

To complicate things further, as digital products penetrate further into the home, the population being designed for is more heterogeneous. When designing office products, the behaviour and goals of the user are largely prescribed by their job role; in the home there are no such restrictions. A number of designs may be required which offer the same functionality but with different interfaces for different sections of the population. This can be dealt with by creating a number of personas and designing a system to suit each one. However, when the population is heterogeneous both in terms of desires and in terms of impairments, the number of resulting combinations threatens to make using personas unwieldy. While the personas may be useful for expressing the goals of the various groups, using them for each interface variation may be impractical.

Given that using personas requires an ability to see the world from another's perspective, it may be that the technique is most attractive to exactly those designers who have this ability. It is possible that the technique is less useful for those who are less able to do this.

4. PROBLEMS GENERATING PERSONAS
A persona should be an accurate representation of the archetypal member of the target audience. To create this there clearly needs to be some understanding of the audience, and this is best gathered by field studies and ethnographic methods. The researcher who creates the persona has to understand the audience before they can represent them, so anything which disrupts the researcher's ability to understand the audience will distort the resulting persona, and through that may distort the ultimate design solution.

Again, with digital products entering the home this problem increases. When designing office systems, the level of understanding required is at a fairly impersonal level. When the systems being designed are used in a more intimate environment, the understanding needed to design them is similarly more intimate. The designer will ultimately need to understand the constraints on the design that are imposed by the lifestyle and abilities of the users. However, especially with users who suffer impairment, discussing problems which will impact the design may be uncomfortable, making it difficult for the researcher to fully assess their impact with respect to the design problem. Fear of technology may also be a factor, especially with older users. The problem here is that what might naively be considered a discussion about desired functionality may in fact be an emotionally sensitive topic, where the researcher needs to understand the emotions of the subject in order to incorporate them into the persona description so they can inform the design.

This requires an ability on the part of the researcher to be empathic towards the subject. Buie [5] suggests, however, that there are limits on the accuracy of empathy. He argues that empathy is an inferential process: the observer uses visual and auditory cues and attempts to model the emotional state that would result in those cues.
This process is prone to error, though, and is exacerbated if the subject wishes to hide their emotional state. Problems also arise from overconfidence in the ability to empathise. Buie reports that clinicians failed to detect emotional states that led to suicide because the patients successfully hid their emotions, despite other evidence that the patients were suicidal. While this example is extreme, it suggests that researchers trying to create personas for use in design need to be aware that they cannot rely on their ability to empathise, because while it might feel convincing it is unreliable.

5. CONCLUSION
While personas are an extremely effective tool for aiding the design of workplace systems, and for many devices in the home, there are limits to their effectiveness. Designers are likely to find personas less useful when designing for a population who suffer impairments, simply because a single persona cannot be used to represent the whole target population, and working with a large cast of personas is likely to be unwieldy. In these cases, using the personas to decide on the necessary functionality is probably still effective, since impairments are less important at that level of detail.

For designers to get the best out of personas it is strongly suggested that they carry out the field studies and ethnographic work that leads to the creation of the personas, rather than work with personas created by other members of the team, especially when designing for populations that are dissimilar to themselves. This ensures that the designer has some personal experience of the population to draw on to fill in the characterisation of the persona. In this respect using a persona is somewhat similar to acting.

Finally, it should be realised that personas can never provide a perfect prediction of how the eventual users will respond. A level of testing of the design will therefore be helpful, especially when designing for more challenging populations.

6. ACKNOWLEDGMENTS We would like to thank the UK Engineering and Physical Sciences Research Council, grant number GR/M52786 and the Dependability Interdisciplinary Research Collaboration (DIRC).

7. REFERENCES
[1] Norman, D. Emotional Design. Perseus, New York.
[2] Cooper, A. The Inmates Are Running the Asylum. SAMS, Indianapolis, Indiana. ISBN 0-672-31649-8.
[3] Pruitt, J., Grudin, J. Personas: Practice and Theory. Proc. DUX (2003).
[4] Epley, N., Keysar, B., Van Boven, L., Gilovich, T. Perspective Taking as Egocentric Anchoring and Adjustment. Journal of Personality and Social Psychology, 87, 3 (2004), 327-339.
[5] Buie, D.H. Empathy: Its Nature and Limitations. Journal of the American Psychoanalytic Association, 29 (1981), 281-307.


Towards A JXTA "Peer to Peer" Infrastructure For Dependable Services

Stephen Hall
Lancaster University, InfoLab 21, Lancaster LA1 4YR
s.hall@comp.lancs.ac.uk

ABSTRACT When discussing distributed systems, we often see the desirable terms of diversity, performance and fault tolerance. With new generations of distributed system founded on service oriented architectures, resource management and Peer to Peer, we need to incorporate these dependability attributes into the fabric of such systems. This paper discusses the extension of the service proxy [1] developed by the Dependable, Service Centric Grid Project as a way of incorporating just such dependability attributes. To increase the capability of the service proxy we enable it with Peer to Peer technology.

Keywords Diversity, Fault-Tolerance, Peer-to-Peer, Services

1. INTRODUCTION

We discuss the Service Proxy and Peer to Peer technologies.

1.1 Service Proxy

The service proxy is a system that sits between clients and groupings of logically related (for example, syntactically equivalent) services. By adding a level of indirection, messages between services and clients can be introspected, manipulated and routed. This ability allows any policy to be applied to service calls. The operation of the Service Proxy is determined by a policy model described in XML and specifically written Java classes.

The service proxy was written primarily to achieve fault tolerance using diversity, recovery and voting mechanisms, but its extensible nature and position within an n-tier distributed system allow for other uses. The Dependable, Service Centric Grid Project hints at uses such as service monitoring or security authorization, though to date these have not been successfully demonstrated. A voting system has been successfully demonstrated [1]: this demonstration incorporated diverse ephemeris services whose results were aggregated through a majority voting policy in the Service Proxy to produce one coherent set of results.
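Such a majority voting policy can be sketched in a few lines. This is our Python illustration only; the actual Service Proxy expresses its policies in XML and Java classes:

```python
from collections import Counter

def majority_vote(results):
    """Aggregate outputs from diverse services: return the value
    produced by a strict majority of services, or None if no
    strict majority exists."""
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count > len(results) / 2 else None
```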

1.2 Peer to Peer

Peer to Peer technologies have been around since the dawn of the Internet, but recent innovations such as file sharing applications have raised the profile of these technologies [3]. Networks made of peers have demonstrated emergent properties such as fault-tolerance, availability and performance. Possibly the greatest benefit of a peer network is the removal of a single point of failure. The client-server architecture upon which Web Services are based is always vulnerable to server failure, whether through occurrence of a fault or by malicious means such as a Denial of Service attack [2]. Peer networks simply tolerate such occurrences and continue to offer peer services.

Peer and Web Services are analogous, but peer service discovery is far more adaptable than its service oriented counterpart. A web service is expected to have an indefinite life span, so indexes of services such as those provided by UDDI are seldom updated. A peer service is expected to be far more transient, so its appearance and subsequent disappearance are reported to the peer network in real time.

There are big differences between peer networks. Most are devised purely to share files (such as MP3s). Gnutella is perhaps the best known of these networks; it operates the simplest method of search, broadcasting search requests to all known peers. There is a recent trend towards "small world" networks such as Freenet and Overnet (eDonkey2000). Small world networks employ distributed hash tables that associate data with a hashed key using an algorithm such as MD5 or SHA-1. In addition, peers themselves have a key associated with them. If a peer does not hold the data key associated with a given request in its hash table, the request is routed to the peer with the closest key [3, 4]. The closeness algorithm varies from network to network. Small world networks have the advantage of not being flooded by requests, usually resulting in a faster search time.
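The key-based routing step can be sketched as follows. This is a toy illustration of ours: plain numeric distance stands in for the closeness metric, whereas real networks use, for example, XOR distance (Kademlia) or prefix matching (Pastry):

```python
import hashlib

def key_of(data: bytes) -> int:
    """Hash data to a key, as DHTs do with SHA-1 or MD5."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big")

def closest_peer(peer_keys, key):
    """Route a request towards the peer whose key is 'closest'
    to the data key; here closeness is plain absolute distance."""
    return min(peer_keys, key=lambda pk: abs(pk - key))
```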

1.3 JXTA

The main problem with most peer networks is the proprietary nature of their protocols and the fact that they are devised for file sharing. Additionally, networks are optimized for some attribute such as anonymity (Freenet) or performance (Overnet). We need a network that is open, extensible and designed for generality. JXTA is a set of protocols and abstractions that enable the development of a peer network [2, 5]. Key aspects of JXTA are as follows [5]:

• Peer Discovery and Rendezvous - a low level abstraction that allows any peer to locate any other peer, group or service connected to the network.

• Peer Groups - a grouping abstraction and associated protocols that allow peers to self-organize to enforce security and enable effective peer service indexing.

• Advertisements - information about peers, groups and services; these enable the discovery mechanism.

• Peer Routing and Pipes - because of the ad hoc nature of peer networks, messaging is often not direct from peer to peer but routed through other peers. Pipes abstract this mechanism, giving the impression of direct connections. Pipes are asynchronous in nature.

Unlike the name would suggest, JXTA is not directly related to Java. However, Java bindings exist, allowing peer applications to be developed in Java. The latest incarnation of JXTA (JXTA2) employs distributed hashing in its discovery mechanisms, making it a small world network.

A service proxy is potentially dynamically adaptable by looking up services, but (referring to section 1.2) web service indexes are seldom updated. Couple that with the extreme amount of interface code required to call index services like UDDI and you have something that is nice in theory! Contrast this with peer services, which are transient and must be bound to by the pipe through the dynamic discovery process. A JXTA pipe can be used to bind to one or more peer services and apply a policy to them.

2. Contrasting Service Oriented Architecture and JXTA Peer Network

The aim of this paper is to discuss the transposition of functionality provided by the service proxy into the fabric of a JXTA network.

A service oriented architecture is directly related to the internet protocols, such as TCP/IP and HTTP, upon which it operates. Only SOAP can be considered an abstracting protocol in a service oriented architecture. Any policy concern such as fault-tolerance must be applied at the application level, as it is with the Service Proxy; it is impractical to embed a policy into TCP/IP, HTTP or SOAP. An additional constraint forced by a service oriented architecture is the synchronous nature of client-server applications. Although the industry would have you believe that services embrace a loosely-coupled asynchronous infrastructure, this simply is not true: the close relationship between web services and HTTP enforces synchronicity. Application level abstractions such as messaging, which enable asynchronicity, increase complexity. Service oriented architectures are not flawed because of their synchronicity; it is a simple and effective way of building distributed systems.

Peer networks, and in particular JXTA, sit between the internet protocol layer and the application. The JXTA abstractions are asynchronous in terms of message passing and are, even more importantly, programmable. We want to import the web services from the service oriented architecture and provide them in a JXTA context. These services may be executed directly as web services or be transposed to JXTA peer services; it doesn't matter. JXTA provides advertisements for these services that can be consumed by other peers.

3. Moving Policy to JXTA Fabric

The key mechanism for incorporating policy-implementing code (from the service proxy) into a JXTA infrastructure is the JXTA pipe. As stated in section 1.3, pipes abstract message routing between peers. A piped message could be queried or manipulated by any peer involved in that pipe. Though this initially sounds like a security loophole, the service providing and consuming peers can both choose the pipe to use. A pipe is treated like a service: it is advertised and bound to. Imagine a "fault tolerant" pipe that connects to not just one peer service but several diverse services; it acts just like the service proxy.

What is the advantage of a pipe over a service proxy? On the face of it, none; however, the pipe is provided within the context of the peer network. It is actioned by multiple peers, so a failure at any point will be hidden from the application running on top of JXTA. The top level service proxy, by contrast, is a single point of failure.

4. Grouping Diverse Services

We review a key objective of the Service Proxy, and indeed a key theme: enabling diversity of services. Of course we need to have a diverse set of services available [6], an increasingly likely prospect with service oriented architectures and peer networks. Given these, we need to aggregate and group them together. Peer groups are the JXTA abstraction that provides a common point of service discovery. Each peer providing a diverse ephemeris service would join a peer group, for example the Ephemeris peer group. If a peer wants to discover an ephemeris service, it queries the Ephemeris group. Effectively, JXTA adds a level of slickness to the indexing and aggregating process. There are of course service oriented equivalents such as UDDI and meta-data services, but realisation of these has been slow and they are not integral to the infrastructure.
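The group-based discovery idea can be sketched as a toy registry. This is our illustration only; JXTA's actual peer group protocols are distributed and considerably richer:

```python
from collections import defaultdict

class GroupRegistry:
    """Toy sketch of group-scoped discovery: peers offering a diverse
    implementation of a service join a named group, and consumers
    query the group rather than individual peers."""

    def __init__(self):
        self.groups = defaultdict(set)

    def join(self, group, peer):
        self.groups[group].add(peer)

    def discover(self, group):
        return sorted(self.groups[group])
```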

5. Dependable JXTA Based Systems

The original goal of the service proxy was not to provide fault-tolerant, highly available, or diverse services, but to create an infrastructure where a policy model could execute and then provide those aspects. In reality, all we are doing is moving that policy model into a peer to peer environment. The reasons for doing this include making it simpler to achieve diversity in an asynchronous and far more dynamic environment. We can take advantage of the emergent properties of peer networks such as performance, or more specifically scalability [4]. Fault-tolerance is similarly an emergent property; however, to take full advantage we require a policy. JXTA provides the perfect environment in which to achieve dependability because of its agnostic abstractions. The most notable abstraction is the JXTA pipe: its ability to route messages through peers, the fact that it is itself treated as a service, and its ubiquity throughout JXTA make it the perfect mechanism for executing our policies.

6. ACKNOWLEDGMENTS

EPSRC Award 004092147, Dependable, Service Centric Grid Project.

7. REFERENCES

[1] Dobson, G., S. Hall, I. Sommerville, A Container-Based Approach to Fault Tolerance in Service Oriented Architectures. http://digs.sourceforge.net/papers/2005icse-paper.pdf
[2] Brookshier, D., D. Govoni, N. Krishnan, JXTA: Java P2P Programming, SAMS Publishing, 2002. ISBN 0672323664.
[3] Oram, A. (ed.), Peer-to-Peer: Harnessing the Power of Disruptive Technologies, O'Reilly, 2001. ISBN 0-596-00110-X.
[4] Dependability in Peer-to-Peer Systems, IEEE Internet Computing, pp. 54-60, July 2004.
[5] Oaks, S., B. Traversat, L. Gong, JXTA in a Nutshell, O'Reilly, 2002. ISBN 0-596-00236-X.
[6] Pullum, L.L., Software Fault Tolerance: Techniques and Implementation, Artech House, 2001. ISBN 1-58053-137-7.


Exploiting Diversity in Peer-to-Peer Systems Daniel Hughes

Geoff Coulson

Ian Warren

Lancaster University, Computing Department, Lancaster. LA2 4WA +44 (0)1524 510351

Lancaster University, Computing Department, Lancaster. LA2 4WA +44 (0)1524 510306

Auckland University, Computing Department, Auckland. 92019 +64 9 37 37599

[email protected]

[email protected]

[email protected]

ABSTRACT The capabilities of the nodes which compose peer-to-peer networks vary significantly in terms of connection speed and local resources. In such an environment, it is essential that peer-to-peer systems efficiently exploit the resources available on strong nodes, while at the same time allowing weaker nodes to participate in the network. To accomplish this, it is necessary to be aware of the resources available to nodes and to adapt the role that each node plays in the system. This paper gives a brief overview of RaDP2P, a framework for developing adaptive peer-to-peer systems. RaDP2P uses a hybrid peer-to-peer model similar to Structella as a novel mechanism for supporting resource awareness and adaptation.

Keywords Diversity; P2P; Adaptation

1. INTRODUCTION Peer-to-Peer (P2P) applications use resources available on nodes around the edge of the network to provide services; for example Napster [1] uses the disk space of home PCs to provide a large library of music files, while Seti@Home [2] uses the CPU power of home PCs to process extraterrestrial radio signals. The nodes which compose P2P networks are highly heterogeneous, ranging from mobile nodes with restricted local resources and low-bandwidth, unreliable connections to powerful workstations connected to the Internet by fixed, high-speed links. Where networks are composed of such diverse nodes, it is essential that systems allow participation for ‘weak’ nodes, while efficiently exploiting the resources available on ‘strong’ nodes. Just as the requirements of the nodes which compose a P2P network are diverse, so are the requirements of the applications which run on these networks. Popular P2P applications such as file sharing [1], Internet telephony [3] and distributed computation [2] each have specific and different requirements of the underlying P2P network. By adapting the role that each node plays in the system to better exploit its capabilities, it is possible to maximize the contribution that each node makes to the system as a whole. Similarly, by adapting the way that each node is treated by the system, it is possible to maximize its suitability for supporting different node and application classes. This is a common idea in real world cooperative systems; “from each according to their abilities, to each according to their needs” [4].

2. CLASSIFYING ADAPTATION

We classify the adaptation demonstrated in P2P systems into three discrete levels: network restructuring adaptation, routing behaviour adaptation, and service selection adaptation.

Network restructuring adaptation adapts the relative position of nodes on the network through selective (re)connection. For example, in a peer-to-peer resource sharing network, a node may wish to modify its position in the network so that it is closer to the content it is seeking [4].

Routing behaviour adaptation adapts the routing behaviour of nodes on the network. For example, if the message-passing load on a neighbouring peer is known to be high, a node may choose to route fewer messages to that peer based upon this information [5].

Service selection adaptation adapts which service a peer selects following the resource discovery phase. For example, a node may discover several peers offering a service it wishes to use; meta-information provided about these peers may be used to inform the decision about which service to select [1].

We argue that current P2P systems are not adaptive enough. Most systems do not support adaptation, and those that do are limited in the scope of adaptation they allow, typically engaging in only one class of adaptation (network restructuring, routing behaviour or service selection); in any case the policy used to inform this adaptation is fixed. The resource awareness that existing systems offer is not extensible, restricting the factors that can be used to inform adaptation. We argue that a generic framework for building adaptive P2P systems is required, and that it must provide:

• Support for each class of adaptation.
• Support for multiple adaptation policies.
• Extensible support for resource awareness.
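To make the three requirements concrete, here is a minimal sketch of what the core of such a framework could look like (illustrative only; all names and the single-number monitor signature are invented here, not the paper's actual interfaces):

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical identifiers for the three adaptation classes from the paper.
NETWORK_RESTRUCTURING = "network_restructuring"
ROUTING_BEHAVIOUR = "routing_behaviour"
SERVICE_SELECTION = "service_selection"

@dataclass
class Policy:
    """An adaptation policy: a monitor that harvests meta-information,
    plus a rule turning that information into a decision."""
    monitor: Callable[[], float]    # extensible resource awareness
    decide: Callable[[float], str]  # policy logic

class AdaptationFramework:
    def __init__(self) -> None:
        # Multiple policies may be registered per adaptation class.
        self.policies: Dict[str, List[Policy]] = {
            NETWORK_RESTRUCTURING: [],
            ROUTING_BEHAVIOUR: [],
            SERVICE_SELECTION: [],
        }

    def register(self, adaptation_class: str, policy: Policy) -> None:
        self.policies[adaptation_class].append(policy)

    def step(self, adaptation_class: str) -> List[str]:
        """Run every policy of one class and collect its decisions."""
        return [p.decide(p.monitor()) for p in self.policies[adaptation_class]]

# Example: a load-aware routing policy on one node.
fw = AdaptationFramework()
fw.register(ROUTING_BEHAVIOUR, Policy(
    monitor=lambda: 0.9,  # pretend the neighbour's load is 90%
    decide=lambda load: "route_less" if load > 0.8 else "route_normally",
))
decisions = fw.step(ROUTING_BEHAVIOUR)  # → ['route_less']
```

Any monitorable factor can be plugged in as a `monitor`, which is what makes the resource awareness extensible.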

3. SUPPORTING ADAPTATION

We use a hybrid network architecture as a novel and powerful mechanism for supporting adaptation. We use a distributed hash table (DHT) similar to Pastry [6] for message routing, overlaid by an unstructured decentralised network similar to Gnutella [7] in order to support complex queries. The benefits of this kind of architecture have been demonstrated by Structella [8]; unlike Structella, however, we use the underlying structure of the key-based routing (KBR) layer as a powerful mechanism to support adaptation.

Key allocation in RaDP2P differs from that of most structured overlays, in that the key value is used to reflect information about each node. This information is then used for network restructuring and routing behaviour adaptation.

Network restructuring adaptation is accomplished using a globally defined network structure policy together with a resource-awareness policy, which harvests meta-information from each node to generate the most significant bits of the node's key. Since the KBR layer is structured by key value, and the most significant bits of each key are derived from meta-information, the network structuring policy defines each node's relative position in a very fine-grained manner. Applications include structuring by content and incentive schemes.

Routing behaviour adaptation is accomplished using a globally defined routing policy together with meta-information to generate the least significant bits of each node's key. In this case, the goal is not to modify the relative position of the node on the network, but simply to mark nodes for differential treatment by their peers. Applications include content-based routing and load balancing.

Service selection adaptation will occur via the exchange of metadata and requirements between peers following the resource discovery phase, and is therefore outside the scope of the RaDP2P network architecture, though it may be informed by the same extensible resource-awareness components as the two levels discussed above.
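The key layout (meta-information in the most significant bits to set network position, routing tags in the least significant bits) can be sketched as follows. The 32-bit key size, the field widths and the use of SHA-1 for the middle bits are assumptions for illustration; the paper does not specify these details:

```python
import hashlib

KEY_BITS = 32
MSB_BITS = 8   # assumed width for the structure-policy field
LSB_BITS = 8   # assumed width for the routing-tag field

def make_key(structure_field: int, routing_tag: int, node_id: str) -> int:
    """Compose a node key: structure-policy output in the most significant
    bits, routing tag in the least significant bits, and a hash of the node
    id in between to keep keys distinct."""
    assert 0 <= structure_field < (1 << MSB_BITS)
    assert 0 <= routing_tag < (1 << LSB_BITS)
    mid_bits = KEY_BITS - MSB_BITS - LSB_BITS
    mid = int.from_bytes(hashlib.sha1(node_id.encode()).digest(), "big")
    mid &= (1 << mid_bits) - 1
    return (structure_field << (KEY_BITS - MSB_BITS)) | (mid << LSB_BITS) | routing_tag

# Nodes given the same structure_field share a key-space prefix, so the
# KBR layer places them close together; the routing tag marks each node
# for differential routing treatment without moving it.
k1 = make_key(structure_field=3, routing_tag=1, node_id="node-a")
k2 = make_key(structure_field=3, routing_tag=0, node_id="node-b")
assert (k1 >> (KEY_BITS - MSB_BITS)) == (k2 >> (KEY_BITS - MSB_BITS)) == 3
```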

4. SYSTEM ARCHITECTURE

The RaDP2P architecture is separated into three primary concerns: awareness and adaptation, network abstraction, and applications, as shown in Figure 1. The awareness and adaptation sub-system is responsible for the adaptation behaviour of each node, which is defined by a global adaptation policy and informed by extensible monitoring components.

A network restructuring policy defines a monitoring component (which harvests the meta-data used to form the most significant bits of the key), the interval at which adaptation should be performed, and how this meta-data should be used in key manufacture. In this way, a network restructuring policy defines the desired structure of the network and the level of dynamicity.

A routing adaptation policy defines a monitoring component (which harvests the meta-data used to form the least significant bits of the key), the action to be taken (e.g. lowering the volume of messages being routed on this connection) and an interval of adaptation. In this way, each node remains tagged for the most appropriate routing treatment. Any factor which can be monitored may be used to inform adaptation at each level, providing flexible, extensible support for adaptation and resource awareness.

Applications interact with the system through the network services abstraction, which provides a high-level interface to the underlying network, supplying common functionality such as connection, search, broadcast and point-to-point message delivery, thus abstracting over the specific resource discovery and routing mechanisms used.

Figure 1 – The RaDP2P Architecture

RaDP2P is implemented in Java, and all policy files and resource-awareness components are implemented from supplied Java interfaces. By developing appropriate policies, it is possible for developers to define adaptation at any level, informed by any resource-awareness factor and at a specified level of dynamicity. We hope that this will encourage the development of both novel adaptation strategies and novel applications.

6. ACKNOWLEDGMENTS The authors would like to thank James Walkerdine and Jo Mackie for their comments and suggestions on this work.

7. REFERENCES

[1] Napster, www.napster.com
[2] Seti@Home, www.setiathome.com
[3] Skype, www.skype.com
[4] Marx, K., A Critique of the Gotha Programme, 1875.
[5] Hughes, D., Warren, I., Coulson, G., "AGnuS: The Altruistic Gnutella Server", in Proceedings of the 3rd IEEE International Conference on Peer-to-Peer Computing (P2P'03), Linköping, Sweden, September 2003.
[6] Rowstron, A., Druschel, P., "Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems", available at http://research.microsoft.com/antr/PAST/, 2001.
[7] Gnutella, www.gnutella.com
[8] Castro, M., Costa, M., Rowstron, A., "Should we build Gnutella on a structured overlay?", in Proceedings of the 2nd Workshop on Hot Topics in Networks (HotNets-II), Cambridge, MA, USA, November 2003.


The Effect of Diverse Development Goals upon Computer-Based System Dependability

Tony Lawrie
DIRC - Interdisciplinary Research Collaboration
Centre for Software Reliability, School of Computing Science, University of Newcastle
+44 191 222 6858

[email protected]

ABSTRACT

This paper reports on the author's PhD thesis, conducted over a four-year period while working within an EPSRC-sponsored, multi-university programme researching interdisciplinary approaches to improving computer-based system dependability. The research considers a process intervention, the setting of diverse development goals during the development process, for improving assumption detection in the wider context of computer-based system dependability.

Keywords: Dependability; Fault-Avoidance; Human Redundancy; Human Diversity; Computer-Based Systems; Development Process; Assumptions; Goal-Setting; Search Simulations.

1. INTRODUCTION

As society becomes increasingly dependent upon information technology and information processing provision, in its many everyday forms, there has been an ongoing requirement for improvements in the dependability of such technologies. The approach adopted so far has primarily focused upon incorporating redundant/diverse masking, prevention and recovery computation mechanisms into software artefacts to increase resilience against the inevitable residual software faults introduced by the creation process. While this has resulted in significant improvements in software dependability, it is becoming increasingly apparent that research into improving the dependability of the creation process itself, in terms of increased fault avoidance, can complement the more traditional fault-tolerant approach to promoting software dependability.

2. RELATED WORK

The software development process presents a number of unique challenges that are not commonly (or collectively) experienced in more traditional engineering domains. Software, as a construction medium, is intrinsically complex and intangible, making communication, collaboration, and representation difficult and the potential for human error correspondingly high. The ongoing sophistication of modern information technology also ensures that its scope of applicability is vast, highly novel, and changing. These influences combine to make the software creation process and its immediate process environment intrinsically dynamic and difficult to control and manage. This is borne out by the endless reports of software project cancellations and schedule and budget overruns.

The approach adopted in this thesis is therefore to follow the dependability approach and provide an initial definitional view of the important attributes one might expect in a dependable software creation process. From this view, it is possible to begin to consider: a) the interrelated dynamics of process technology, human resources, the software creation task, the application domain, and process management; and b) how latent and active fault phenomenology can occur in both the creation process and its immediate process environment as threats to their achievement. Finally, the issue of process redundancy and diversity was considered. Approaches to employing human redundancy to aid fault avoidance in the creation process, or fault tolerance in the eventual software artefact, vary from 'natural' diversity (e.g. pair programming, egoless programming, and open source development), to 'forced' diversity (e.g. diverse process technology employed in the software task), to 'composed' diversity (e.g. diverse human resources on some uncontrollable psychological dimension such as personality, culture, etc.).¹

The wider computer-based system view of improving dependability places even more challenges on the software creation process. Once the system view is extended to consider both the technical computer system and the human system as subsystems of interest, it is clear that sociological, organisational, and situational influences can combine to result in judgements of undependability for strictly non-technical reasons. Examples considered in the literature include cases where notions of the purpose of the system vary in potentially conflicting ways across areas of responsibility, motivations, and values. To help unearth these different perspectives it is necessary to recognise that computer-based systems require a higher-level, holistic understanding of dependability from different stakeholder contexts-of-interest (e.g. deployment, strategic, engineering, etc.).

3. FOCUS OF THE THESIS

In applying a computer-based system view to the long-standing Automatic Teller Machine (ATM) domain as a case study, it becomes apparent that many faults and vulnerabilities occur because of conflicts that can be viewed as:

¹ 'Uncontrollable' is meant in terms of the everyday work situation, not in the academic research sense.



• Assumptions made due to the absence of representation of important dependability goal(s);
• Assumptions made about the priority or importance of one or more dependability goal(s) over another;
• Assumptions concerning purpose ascription made about the same dependability goal from different computer-based system contexts-of-interest.

With regard to improving the dependability of the process, in terms of computer-based systems, a greater understanding of the fault phenomenology of assumptions became one focus 'strand' of the thesis. Assumptions are unavoidable and necessary reasoning mechanisms in complex knowledge-searching and decision-making situations. Nevertheless, they are often made in an unquestioned or unconscious way, underpinned by our beliefs, biases, and values. Because of the abstract and highly complex decision-making nature of software development, flawed assumptions can quickly become subtly embedded in the software artefact. From the literature, there are four main assumption concerns:

1. Explicit assumptions: made in the form of conscious reasoning;
2. Implicit assumptions: made in the form of unconscious reasoning;
3. Shared assumptions: made collectively from a common context-of-interest in an unquestioned manner;
4. Invalidated assumptions: which may be consciously or unconsciously made, individually or collectively, and which, even if originally valid, can become invalidated over time as the nature of, and the demands placed upon, (say) a computer-based system change.

It was highlighted in the literature on assumptions that assumption identification is often hard and requires a significant level of conflict and tension to unearth them. Therefore another 'strand' of the thesis was the question: "What process intervention could help improve assumption detection during software creation?" In answering this question, diverse goal-setting was adopted for a number of reinforcing reasons. Firstly, from the literature on goal-setting, industrial psychologists have found that goal-setting represents one of the most robust and replicable ways of increasing human performance. Secondly, from both a computer-based system and a psychological research perspective, goal-setting influences and focuses human cognition and behaviour, affecting people's values, reasoning, and priorities, which is considered crucial for unearthing different stakeholders' notions of purpose (and the underlying assumption set supporting them). Thirdly, because of this cognitive influence, setting different goals is a more practical, feasible, and controllable way for an organisation to employ human redundancy/diversity than other, uncontrollable forms such as 'composed' diversity. Fourthly, because of these practicalities of goal-setting, and the fact that software development is a clear case of 'external' teleology, satisfactory levels of dependability representation during the creation process can be promoted. Finally, different goals inevitably result in high levels of tension and conflict; diverse goal-setting therefore provides the necessary task-climate coverage for helping unearth assumptions.

4. SIMULATION FINDINGS

An analogous search-theoretic simulation model was developed and analysed to get an indication of the assumption-detection coverage benefits offered by a diverse goal-setting process intervention, when compared with well-established software defect predispositioning techniques.² The first simulations and analysis were therefore based on four possible coverage and/or defect-diversity predispositioning search strategies. To increase confidence in the simulation output, the detection distributions were statistically validated. The simulation and statistical analyses indicated that the diverse coverage predispositioning search strategy produced a statistically significantly superior detection function, detecting 80% of the objects at 100% search-effort coverage. When this search strategy was further analysed into over- and under-predispositioning of searchers to search location types (analogues of goals), it was revealed that:

• Over-representation of goals, at high levels of search effort, produced a statistically significantly inferior detection function, indicating a homogeneous detection effect when compared with equal goal-to-searcher predispositioning;
• While under-representation of goals did not produce a statistically significantly inferior detection function, it can be intuitively argued, from a computer-based system perspective, that under- or non-representation of important dependability goals is likely both to increase assumption occurrence and to reduce subsequent detection.

These simulation results suggest that it is effective to factor out redundant human resources on searching/detecting tasks up to the number of distinct representations possible/required. However, further division of redundant human resources below this desired representation level is counter-productive and appears to increase process loss on a searching/detection task.
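The thesis's simulation model is not reproduced in this summary, but the qualitative effect of goal-to-searcher predispositioning can be illustrated with a toy Monte Carlo (all parameters invented: one hidden object per goal location, fixed per-searcher detection probability):

```python
import random

def detected_fraction(assignment, n_goals, p_detect=0.8, trials=2000, rng=None):
    """Fraction of objects found when assignment[i] searchers work goal
    location i. One object is hidden at each of the n_goals locations."""
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    found = 0
    for _ in range(trials):
        for goal in range(n_goals):
            searchers = assignment[goal]
            # Probability that at least one assigned searcher finds the object.
            if rng.random() < 1 - (1 - p_detect) ** searchers:
                found += 1
    return found / (trials * n_goals)

equal = detected_fraction([1, 1, 1, 1], n_goals=4)   # one searcher per goal
skewed = detected_fraction([4, 0, 0, 0], n_goals=4)  # all effort on one goal
assert equal > skewed  # unrepresented goals are never detected
```

Even this toy model shows the pattern reported above: piling redundant searchers onto one goal location barely improves detection there, while the unrepresented goals contribute nothing.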

5. ACKNOWLEDGMENTS

The author would like to thank, first and foremost, his PhD supervisor, Professor Cliff Jones, for his invaluable support and guidance throughout this PhD work. Secondly, the author would like to thank friends, family, and colleagues for their encouragement during this time. Next, the author would like to acknowledge the funding sponsors, EPSRC and the DIRC research programme, for providing the financial means to conduct this work. Last, but not least, the author would like to formally recognise the positive research environment afforded to him by the Centre for Software Reliability, within the School of Computing Science at the University of Newcastle upon Tyne, where the author was based throughout this period.

² Reading techniques, from the software inspection literature, were considered.


Self Organisation in Large Scale Peer-to-Peer Systems Richard Paul School of Informatics University of Edinburgh The Kings Buildings Edinburgh EH9 3JZ

[email protected]

ABSTRACT

We consider the problem of analysing how self-organising networks arrange their topology. Our framework considers networks where the connection capacity between nodes varies. The capacity of individual links is adjusted dynamically according to demand and external environmental factors. This is the dual of the problem of maximum network flow: instead of varying traffic to achieve peak flow, we vary the link capacities of the network. As we vary capacity, some crucial network properties must be achieved and maintained (for example short path length, desired redundancy levels, and a desired overall clustering coefficient value). Our analysis framework is intended to be used as the specification of a feedback mechanism, enabling the definition of algorithms for the growth of networks with given properties. In the first part of the work the ant colony optimisation algorithm is studied as a key self-organising graph-generating algorithm, and is used as an abstract model for self-organising peer-to-peer systems. We propose a number of metrics and analytical techniques that can be used to measure and specify the state of organisation of the system at any time in its evolution. These metrics can be used to specify a control surface of a self-organising network algorithm.

1. INTRODUCTION

True peer-to-peer systems have no central point of coordination, yet there is evidence that these systems exhibit macroscopic structure. This suggests the system organises in order to:

1. achieve internal load balancing in order to prevent or respond to congestion, and/or
2. respond to structure (e.g. symmetries, correlations, or other persistent spatio-temporal patterns) in the demand function the system is subject to.

Other forms of organisation can be imposed on P2P systems in order to modify their operation, such as shut-down or epochs to control "global" parameters. These intentional forms of organisation overlay the basic peer-to-peer organisation. Our focus here is on forms of self-organisation that cause traffic to originate from and terminate at some nodes in preference to others.

The main aim of this project is to explore self-organisation in large-scale peer-to-peer systems. These systems have a large number of degrees of freedom and a huge configuration space. An important goal of this project is to find effective descriptions and metrics for organisation in such systems, together with algorithms that can organise the system to match a given description. We model peer-to-peer systems as a graph (a set of vertices and edges) which indicates the connections between the nodes in the system.

We have chosen ant colony optimisation (ACO) as the first self-organising algorithm to study, for several reasons. It is an adaptive routing algorithm that uses a small number of locally defined parameters to perform a highly efficient parallel search through the routing possibilities of the network. ACO has been shown to quickly identify efficient routes for applications in telecommunication networks. ACO also has properties that are potentially useful in P2P systems. It does not converge to a single solution but rather maintains diversity by establishing and maintaining backup and secondary routes in addition to the primary routes for the system. All these routes are continuously polled for their fitness and points of congestion as the algorithm iterates, and the probabilities for routing traffic between nodes are adjusted accordingly through the parallel searching properties of the algorithm. The adaptive nature of ACO also allows the system to adapt to changes in demand.

While there have been many areas of application for the ACO algorithm, little is known about its underlying dynamics. So, as well as serving as a good model for general peer-to-peer systems, the analysis of the underlying dynamics of the ant colony optimisation algorithm is of interest in its own right. Here, a further allowance must be made for the continual variation in capacity of the edges of the graph, which can be considered to be composed of a number of graph planes superimposed upon one another.

The main aim of this project is supported by two further aims: firstly, the description of the overall state of the peer-to-peer (or ant colony) system; and secondly, the development of metrics that can specify how organised (or correlated) the system is at any point in its evolution. In order to do this we examine the behaviour of the peer-to-peer system under two categories of condition: the system in equilibrium and the system in dis-equilibrium. The effects of changing the scale of the system must also be considered. These categories of behaviour are summarised below, but explained in more detail later in this report. Finally, we wish to relate peer-to-peer self-organisation to self-assembly, as the ability to self-assemble graphs with desired topological properties in a distributed fashion would have many applications, such as search algorithms [2]. The notion of the distributed self-assembly of networks with desired topological properties is new, but Barabasi et al. [1] give an example of a simple algorithm for the construction of a scale-free graph.
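The evaporation-and-reinforcement cycle by which ACO maintains primary and backup routes follows the standard textbook update rule; a generic sketch (not this project's specific parameterisation):

```python
def update_pheromone(tau, rho, deposits):
    """Standard ACO update: tau[(i, j)] is the pheromone on edge (i, j),
    rho is the evaporation rate, deposits[(i, j)] is the reinforcement
    laid down by ants that used the edge this iteration."""
    return {
        edge: (1 - rho) * level + deposits.get(edge, 0.0)
        for edge, level in tau.items()
    }

def route_probabilities(tau, node, neighbours, alpha=1.0):
    """Probability of routing from `node` to each neighbour,
    proportional to pheromone ** alpha."""
    weights = {n: tau[(node, n)] ** alpha for n in neighbours}
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}

# Two equally attractive outgoing edges; ants reinforce (a, b).
tau = {("a", "b"): 1.0, ("a", "c"): 1.0}
tau = update_pheromone(tau, rho=0.1, deposits={("a", "b"): 0.5})
probs = route_probabilities(tau, "a", ["b", "c"])
# The reinforced edge now attracts more traffic, but the secondary
# route (a, c) keeps a non-zero probability, preserving diversity.
assert probs["b"] > probs["c"] > 0
```

Because evaporation never drives unused edges to exactly zero in finite time, secondary routes stay available, which is the diversity-preserving property noted above.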

Thus our work covers three broad areas that relate strongly to the work of DIRC. In the following sub-sections we present a brief summary of these areas and their relevance to DIRC.

1.1 Equilibrium Behaviours

We are concerned here with the state of the system once it has come to 'rest'. By "rest" we mean that further self-organisation leaves the properties of the system unchanged, given an unchanging pattern of demand on the system. We see this work relating to RT Structure, since we are interested in the emergence of large-scale structure in response to local parameters of the self-organising system. There may also be some connection with RT Diversity, since we are interested in controlling the level of redundancy on routes through the network. Initially we will look for classic features, e.g. the existence of giant components (collections of nodes with highly interwoven connections), tendrils (nodes which form loosely connected strands), or disconnected components (nodes with no substantial connections to any other part of the peer-to-peer system).

1.2 Non-Equilibrium Behaviours

Here we seek to track the changes in the peer-to-peer system's configuration as it approaches equilibrium from a non-equilibrium state. Dis-equilibrium can arise from a change in the demand on the system or from the failure of a node or link in the system. Measurements of the speed of convergence or divergence are the focus here. These measurements also help in defining the behaviour of the system in response to a perturbation such as a spike in traffic demand or a disaster which knocks out part of the system. Again this work seems most relevant to RT Structure, but it may also have some relevance for RT Timing, since we anticipate observing significantly different timing behaviour as the system adapts to change.

1.3 System Size and Scaling

We hope to identify here which properties of our network remain invariant with scale, scale in accordance with existing theories (such as the theory of random graphs), or perhaps behave in a completely unexpected way. In so doing we can begin to understand why the structures of peer-to-peer systems adopt the particular scale that they do, and whether increasing the number of nodes in the system, or adding further connections to the system, would create any advantages. Again the clearest link is to RT Structure, since we are interested in how network properties are maintained as systems scale.

2. ACHIEVEMENTS AND PROGRESS

In pursuing this project the following major achievements have been made to date:

1. A theoretical framework describing and defining the different components and mechanisms of a generic peer-to-peer system has been defined. This framework can also be used to define a control surface for the topology of the system and to describe mechanisms for self-assembly.

2. A taxonomy for the analysis and description of the behaviour of the self-organising system has been developed by understanding and applying techniques from graph theory and condensed matter physics (percolation theory and statistical mechanics).

3. This taxonomy for analysis has been used to generate a number of metrics which we can use to specify the state of the peer-to-peer system at any given time.

4. A schedule of experiments has been devised in order to chart the parameter space of the peer-to-peer system under consideration most efficiently.

5. A control surface is a means of relating the inputs of a given process (or, in this case, algorithm) to its outputs, specifically to determine regions of stability and instability. By applying the taxonomy of analysis and the schedule of experiments to the ACO algorithm, a formulation of, and a method with which to explore, the control surface of the ACO algorithm has been developed.

6. A simulator has been developed which enables the creation of a peer-to-peer system to varying specifications of number of nodes, traffic dynamics, and the criteria for the fitness of the routes being used.

7. A software suite of management functions has been developed, allowing the collection and analysis of information about the state of the system at any given time.

The process of self-organisation in peer-to-peer systems is, at the time of writing, only just beginning to be studied. The approaches described here, drawn from graph theory and condensed matter physics to describe these self-organisation processes, are also new in their application both to computer architecture and to networks in general.
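Two of the basic organisation metrics used in this work, path length and clustering coefficient, can be computed directly from the connection graph; a dependency-free sketch:

```python
from collections import deque
from itertools import combinations

def shortest_path_length(adj, src, dst):
    """BFS hop count between two nodes; None if disconnected."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, d = queue.popleft()
        if node == dst:
            return d
        for n in adj[node]:
            if n not in seen:
                seen.add(n)
                queue.append((n, d + 1))
    return None

def clustering_coefficient(adj, node):
    """Fraction of pairs of a node's neighbours that are themselves linked."""
    nbrs = adj[node]
    if len(nbrs) < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return links / (len(nbrs) * (len(nbrs) - 1) / 2)

# A triangle (a, b, c) plus a pendant node d attached to c.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
assert shortest_path_length(adj, "a", "d") == 2
assert clustering_coefficient(adj, "a") == 1.0  # neighbours b, c are linked
assert clustering_coefficient(adj, "c") == 1/3  # only (a, b) of 3 pairs linked
```

Tracking how such values change as the self-organising algorithm iterates is one way to realise the "state of organisation" metrics described above.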

REFERENCES

[1] Barabasi, A., Ravasz, E., and Vicsek, T., "Deterministic scale-free networks", Physica A, pages 559-564, 2001.
[2] Walsh, T., "Search on high degree graphs", in Proceedings of IJCAI 2001, 2001.


Timeliness Theme


When-to-Act: Evidence for Action Bias in a Dynamic Belief-Updating Task¹

Michael Hildebrandt
Department of Computer Science, University of York, York YO10 5DD, UK
+44-1904-43 3376
[email protected]

Joachim Meyer
Department of Industrial Engineering and Management, Ben Gurion University of the Negev, Beer Sheva 84105, Israel
+972-8-647 2216
[email protected]

ABSTRACT

Diagnostic decisions in dynamic environments often require trade-offs between decision accuracy and timeliness. The longer a diagnostic decision is postponed, the more the accuracy of the decision may increase, while at the same time the probability of successfully executing remedial actions decreases. Kerstholt (1994) reports that in a task where a continuous process had to be monitored, subjects' reliance on a judgment-oriented strategy (requesting additional information before making a decision) frequently led to late decisions. In this study, we were interested in whether similar effects appear when the motivation to postpone the decision is induced by the prospect of an alarm appearing later in the trial. A normative model based on Bayesian belief updating was constructed to determine optimal strategies under the conditions of the independent variables alarm timing (early, late) and alarm reliability (0.7, 0.9). Contrary to our expectations, we found evidence for an action-oriented strategy (a preference for timeliness over accuracy) and for a failure to integrate the two available information sources.

Keywords: Timeliness; Human Factors; temporal decision-making; cognition; decision bias; Bayesian updating

1. INTRODUCTION

The pace and complexity of many of today's work domains, such as air traffic control, transportation, or manufacturing, require operators to manage multiple tasks and to adapt to a dynamic environment. Analysis and design of such systems therefore requires a detailed understanding of the temporal properties of the physical system, the task, the environment and the agents. While some temporal problems such as multi-tasking and decision-making under time pressure have received much attention, other aspects are less well understood. These include the role of time perception in decision-making and control, the control of time-lagged systems, temporal awareness, sequence errors (e.g. omission, commission, revision or repetition), duration errors such as temporal overshoot or undershoot, duration neglect in judgment and decision-making, interruption scheduling, and human scheduling performance (see De Keyser, 1995, for a review).

1.1 When-to-act problems

This study is concerned with a particular aspect of time in decision-making, namely biases in the management of accuracy-timeliness trade-offs. For many diagnostic tasks, the quality of diagnostic decisions increases over time, while the probability of successfully executing the action decreases. For instance, the more pronounced a patient's symptoms become, the more certain we can be that the patient is suffering from a particular disease. At the same time, the longer the diagnostic decision is postponed, the more difficult it may become to treat the disease. Kerstholt (1994) used a similar cover story in an experimental study and found that under conditions of high time pressure (rapid deterioration of the controlled process), participants' use of judgment-oriented strategies (requesting additional information about the cause of a problem) led to an increase in system failures. This finding suggests that decision-makers may be biased towards improving accuracy at the expense of timeliness. In the current experiment, the incentive to postpone a decision was induced by an alarm that appeared at some point during the trial and would help to improve the quality of the decision. Based on Kerstholt's (1994) findings, over-reliance on the alarm was expected in situations where a decision should be taken immediately, so that decisions would frequently be late.

¹ Submitted to the Human Factors and Ergonomics Society conference.

2. TASK

Participants had to make binary choices between two possible routes for aircraft approaching a sector boundary. The 'direct' route was to be chosen if the adjacent sector was free of turbulence; otherwise aircraft were to be sent on the 'detour'. Two sources of information were available:

• System 1 provided a numerical value. The higher the value, the more likely the presence of turbulence. Participants had to learn over the course of the experiment which values were indicative of turbulence and which were not. System-1 information was available at the start of each trial and remained constant during the trial.

• System 2 was an alarm that returned a binary recommendation ("turbulence", "no turbulence"). The alarm would appear at some point during each trial.

Participants could make a decision at any point during the trial, before or after the occurrence of an alarm. Once made, the decision could not be revised. Also, if the decision was made before the occurrence of the alarm, the display of the alarm was suppressed.

Each 18-second trial started with an aircraft entering the sector and ended with the aircraft leaving the sector. Participants received notification about these events and were informed that aircraft passed through the sector at a constant speed, but did not have a visual cue (progress marker) indicating the position of the aircraft in the sector. Thus participants had to rely on their own perception of time to estimate the position of the aircraft. This judgment was important, as the probability of the aircraft successfully executing the command decreased over time.

Overall success on a trial required the decision to be both correct and timely. There was no binary cut-off point for decision timeliness; instead, the probability of successful execution decreased linearly over the trial. The payoff for a correct decision was not fixed: participants could choose to invest 0, 5, 10, 15 or 20 credits in their decision. If the decision was both timely and correct, the amount was added to their balance; otherwise it was deducted. After they had provided feedback about their confidence in the correctness and timeliness of the decision, they were informed about the outcome of the trial (overall success, accuracy, timeliness, credits earned this trial, total credits earned). Participants worked through 100 trials (50 turbulence, 50 no turbulence), with additional feedback questionnaires after every 20 trials. Participants received financial rewards of £5-10 depending on their performance.

3. METHOD

3.1 Independent variables

3.1.1 Timeliness
Participants either received an early alarm (onset sampled from a Gaussian distribution with a mean of 5 seconds and a standard deviation of 1.5) or a late alarm (mean 13 s, SD 1.5). The probability of successfully executing the command decreased across the trial according to the linear function f(x) = 1.0875 − 0.675x/18 (capped at 1), so that a decision made before 2.3 seconds was always executed, at 5 seconds the probability was 90%, at 13 seconds 60%, and at the end of the trial 41% (Figure 1).
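As a check, the timing manipulation can be sketched in Python; the linear form below is reconstructed from the probabilities quoted in the text (certain before ~2.3 s, 0.9 at 5 s, 0.6 at 13 s, ~0.41 at 18 s):

```python
import random

def execution_probability(t, trial_len=18.0):
    """Probability that a command issued at time t (seconds into the
    trial) is successfully executed; linear decay, capped at 1."""
    return min(1.0, 1.0875 - 0.675 * t / trial_len)

# Alarm onsets for the two between-subjects timing conditions.
rng = random.Random(0)
early_onset = rng.gauss(5.0, 1.5)   # early-alarm condition
late_onset = rng.gauss(13.0, 1.5)   # late-alarm condition
```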

3.1.2 Accuracy
The second independent variable was the reliability of the alarm. The high-reliability alarm correctly identified the presence or absence of turbulence in 90% of cases and incorrectly in 10% (low reliability: 70% and 30%). To generate the value provided by system 1, a signal detection paradigm was used, with the value on turbulence trials sampled from a Gaussian distribution with a mean of 5 and a standard deviation of 1 (no turbulence: mean = 4, SD = 1).

3.2 Normative model: Bayesian updating
To decide whether a system-1 value was indicative of turbulence, participants had to learn to discriminate the two distributions. With the payoff matrix symmetrical and p(turbulence) = p(¬turbulence) = 0.5, participants should assume the presence of turbulence for any value above 4.5. This indifference point can also be determined by calculating the point where the conditional probability of turbulence given x is 0.5:

p(turb|x) = p(x|turb) * p(turb) / p(x)
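A minimal sketch of this posterior, assuming the unit-variance Gaussians defined in 3.1.2; with equal priors, p(turb|x) crosses 0.5 exactly at the indifference point x = 4.5:

```python
import math

def normal_pdf(x, mu, sigma=1.0):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def p_turb_given_x(x, prior=0.5):
    """Bayes' rule for the system-1 value alone:
    p(turb|x) = p(x|turb)p(turb) / p(x), with signal ~ N(5,1), noise ~ N(4,1)."""
    joint_t = normal_pdf(x, 5.0) * prior
    joint_n = normal_pdf(x, 4.0) * (1.0 - prior)
    return joint_t / (joint_t + joint_n)

# p_turb_given_x(4.5) = 0.5, so 'turbulence' should be assumed
# for any value above 4.5.
```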

The intention of this experiment was to create conditions where it was normatively correct to wait for an alarm, even if the probability of execution decreased, and others where participants should not wait for additional information. In this experiment, the incentive for delaying the decision was moderated through the manipulation of the alarm's timing (early, late) and reliability (high, low; both factors between-subjects).

The probability of turbulence given an alarm can be calculated as

p(turb|alarm) = p(alarm|turb) * p(turb) / p(alarm)

This equation returns 0.9 for the high-reliability and 0.7 for the low-reliability group.

Figure 1. Probability of early and late alarm, and action success, over time


With the diagnostic value of the two information sources calculated individually, we can compute their combined diagnostic value using the Bayesian updating equation:

p(turb|x∩alarm) = p(turb) * p(x|turb) * p(alarm|turb) / p(x∩alarm)

Figure 2 shows the criterion shift in the cumulative probability function produced by the high- and low-reliability alarms. Whereas without an alarm participants should respond 'turbulence' for any value above 4.5, this criterion shifts to 2.3 and 3.65 with a high- and low-reliability alarm, respectively. Without an alarm, the probability of making the right decision is lowest (50%) halfway between the means of the signal and noise distributions (x = 4.5). At this indifference point, the occurrence of an alarm increases the probability of the presence of turbulence to 0.7 (low reliability) or 0.9 (high reliability). Notice that when the alarm indicates no turbulence, the cumulative probability function p(¬turb|x∩noTurbAlarm) is a reflection of p(turb|x∩turbAlarm) about the ordinate at x = 4.5.
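The criterion shift can be reproduced numerically; a short sketch assuming the stated unit-variance Gaussians (means 4 and 5) and equal priors. Solving pdf(x; 5)·r = pdf(x; 4)·(1 − r) gives x = 4.5 + ln((1 − r)/r):

```python
import math

def normal_pdf(x, mu, sigma=1.0):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def p_turb_given_x_and_alarm(x, reliability):
    """Posterior after combining the system-1 value x with a 'turbulence'
    alarm of the given reliability (Bayesian updating, equal priors)."""
    like_t = normal_pdf(x, 5.0) * reliability
    like_n = normal_pdf(x, 4.0) * (1.0 - reliability)
    return like_t / (like_t + like_n)

def shifted_criterion(reliability):
    """System-1 value at which the combined posterior equals 0.5."""
    return 4.5 + math.log((1.0 - reliability) / reliability)

# High- and low-reliability alarms shift the criterion from 4.5
# down to ~2.30 and ~3.65, matching the values quoted in the text.
```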

Figure 2. Cumulative probabilities p(turb|x) and p(turb|x∩alarm)

3.2.1 Optimal strategies
To decide whether to wait for an alarm, the gain in decision accuracy has to be traded off against the decrease in execution probability. As success in a trial can be defined as p(success) = p(decisionCorrect) * p(decisionInTime), and assuming that decisions made on the basis of system 1 alone are taken at the start of the trial (where p(decisionInTime) = 1), the normative strategy is to wait for the alarm if the expected success probability after the alarm, p(turb|x∩alarm) * p(decisionInTime), exceeds p(turb|x).

Computing the values for the two reliability levels (0.9, 0.7), the means of the two timing conditions (5 s, 13 s), and system-1 values of 4.5 (the indifference point), 4 and 5 (the means of the noise and signal distributions), the following optimal strategies emerge:

• With a low-reliability alarm, never wait for the alarm, unless in the early-alarm condition with a system-1 value close to the indifference point.

• With a high-reliability alarm in the early-alarm condition, wait for the alarm even for system-1 values around the distribution means (4, 5). With late alarms, only wait if the value is near the indifference point.

4. RESULTS
As data collection is still in progress (N=26 of 40), we illustrate some of the strategies employed in the task using descriptive statistics and the written feedback provided by participants after each block of 20 trials. Figure 3 shows the effect

… > 50ms then Cost < £40. The actual specification of field names and values is only of limited use when considering negotiation. Additional information is required to allow the negotiator to compare the weighting (which can also be considered the importance) of a given field within a specific negotiation, i.e. whether it is more important to satisfy some requirements than others. The negotiation itself consists of a number of offer/reply cycles, an iterative improvement process in which client and service attempt to find a mutually acceptable compromise between their differing requirements. The area of contract negotiation is filled with risks, some of which are more manageable than others. These, amongst other issues, are explored in the following section.
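The offer/reply cycle just described can be illustrated with a deliberately simplified, price-only sketch; the concession rule and all parameter names are invented for illustration, not taken from the PhD's actual protocol:

```python
def negotiate(client_max, provider_min, step=5.0, max_rounds=20):
    """Iterative-improvement sketch: each round both sides concede by
    `step` until their constraints overlap; returns the agreed price,
    or None if the positions never meet within max_rounds."""
    client_offer, provider_ask = 0.0, provider_min * 2
    for _ in range(max_rounds):
        if provider_ask <= client_offer:            # positions already crossed
            return (provider_ask + client_offer) / 2
        client_offer = min(client_max, client_offer + step)    # concede upward
        provider_ask = max(provider_min, provider_ask - step)  # concede downward
        if client_offer >= provider_ask:            # mutually acceptable region
            return (client_offer + provider_ask) / 2
    return None

deal = negotiate(client_max=40.0, provider_min=25.0)
```

Real contracts would concede across many weighted fields rather than a single price, but the fixed-point structure (iterate until the parties' constraints overlap, or a round limit is reached) is the same.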

2. RESEARCH BACKGROUND

As introduced in the previous section, the aim of the PhD is to examine the possibilities for, and the creation of, a workable automated contract negotiation system for both clients and service providers. The research encompassed areas of contract design, security, negotiation tactics and GUI development. To this end an

3. THOUGHTS ON DIRC THEMES

Timeliness
The operation of service monitoring, negotiation and agreement mechanisms requires time synchronisation and accuracy throughout, without which it would be impossible to enforce agreements. Common time representations and translations therefore need to be established. This may lead to the development of temporal ontologies, similar to those identified in PA9 [3] as areas of possible future development.

Risk
The risks associated with negotiation are considerable in situations where the client in question must trust a number of potentially malicious bodies in order to do business. These include, but are not limited to, discovery services, monitoring services, PKI bodies, and the services themselves. The VO (Virtual Organisation) vision for the future is one of dynamic discovery and binding between services and clients. However, services cannot be assumed to be atomic: composite services may make use of sub-services that are not necessarily within the same geographic or organisational boundary, and this entails a number of problems and issues. For example, some countries, including the USA, place legal restrictions on the export of computational resources. Many general services, which could potentially be put to multiple uses, could therefore be hampered, or could expose their companies to risks of service abuse, unless clients were carefully vetted. The vetting of potential clients for the negotiation of potentially valuable service operation requires a number of infrastructure components to be in place. These include third-party monitoring, required to ensure that a given service performs to the standard it has agreed to.
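As an illustration of the common time representations called for under the Timeliness theme, the sketch below normalises locally-stamped monitoring events to UTC ISO-8601 before deadline comparison; the function name and offsets are illustrative only:

```python
from datetime import datetime, timezone, timedelta

def to_common_time(local_iso, utc_offset_hours):
    """Translate a locally-stamped ISO-8601 event time into UTC so that
    deadlines in an agreement can be compared across organisations."""
    local = datetime.fromisoformat(local_iso).replace(
        tzinfo=timezone(timedelta(hours=utc_offset_hours)))
    return local.astimezone(timezone.utc).isoformat()

# An event stamped 09:30 in a UTC+1 organisation is the same instant
# as 08:30 UTC in the agreement's common representation.
stamp = to_common_time("2005-03-17T09:30:00", utc_offset_hours=1)
```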

Structure
In order for an automated negotiation tool to be successfully utilised, a number of criteria need to be addressed. Firstly, the structure of the contracts must be both extensible and standardised: an automated tool is only of use where the majority adhere to a given format and security model. This PhD has addressed these particular issues through the development of a format based on XML, to which different contract clauses can be added and subtracted as necessary. Secondly, different organisations must adhere to standardised interpretations of given contractual phrases. For example, can everyone agree on an understanding of the phrase TimeToComplete? Some may argue this constraint covers only the time a service has to complete the computation on a given request; others may argue that it must also take into account the network time taken to deliver the results. Even the format of such a phrase may give contractual pause for thought, in a domain that abhors ambiguity. The development of standardised agreements on the understanding of words and phrases, often through the implementation of ontologies, is a slow process, and will likely become more of an issue as dynamic binding to services becomes more common.
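A possible shape for such an extensible XML contract is sketched below; the element and attribute names (clause, field, weight) are hypothetical, not taken from the actual PhD format:

```python
import xml.etree.ElementTree as ET

# Two weighted clauses of the kind discussed above: a time constraint
# and a cost constraint, each with a machine-readable comparison.
contract = ET.fromstring("""
<contract>
  <clause name="TimeToComplete" weight="0.7">
    <field unit="ms" op="lessThan">50</field>
  </clause>
  <clause name="Cost" weight="0.3">
    <field unit="GBP" op="lessThan">40</field>
  </clause>
</contract>
""")

# A negotiator can rank clauses by weight before proposing concessions.
clauses = sorted(contract.findall("clause"),
                 key=lambda c: float(c.get("weight")), reverse=True)
most_important = clauses[0].get("name")
```

The weight attribute carries exactly the extra information the text calls for: it lets the negotiator decide which fields to hold firm on and which to concede.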

Responsibility
The concept of a contract embodies many responsibility issues, most pertinently the trust that both parties understand the process they are agreeing to and will abide by the legalities of the agreements signed. Contracting is of little use without adequate monitoring mechanisms, and the monitoring mechanisms themselves have to be trusted to gather statistics responsibly, and to analyse and present information to those who require it in an unbiased and, above all, responsible manner.

Diversity
The model for the project depends upon the assumption that a suitable number of compatible services will exist for a given set of requirements. This diversity of services is essential, as automated negotiation only becomes desirable once the complexity and customisability of agreements between clients and providers reaches the point where assistance is required to make a meaningful decision within a given timeframe. Without sufficient service diversity, research into many forms of fault tolerance and service negotiation techniques would prove redundant. Diversity is therefore at the heart of the SOA metaphor.

4. CONCLUSIONS
In conclusion, it is clear that the research completed in the progression of this project contains much of interest to those considering the five themes of Risk, Responsibility, Timeliness, Structure and Diversity, and that these considerations need to be taken into account for automated negotiation to succeed and the area to progress. Future work in relation to the themes could include an examination of third-party monitoring, addressing the risks and responsibilities placed upon an unbiased party with regard to the gathering and publishing of potentially damaging information.

5. ACKNOWLEDGMENTS
I would like to thank the UK Engineering and Physical Sciences Research Council, grant number GR/M52786, and the Dependability Interdisciplinary Research Collaboration (DIRC).

6. REFERENCES
[1] WSRF. www.oasis-open.org/committees/tc_home.php
[2] Globus. www.globus.org
[3] PA9. http://wiki.nesc.ac.uk/read/pa9?HomePage


Human Error Analysis for Collaborative Work and the Timing Theme

Angela Miguel
Department of Computer Science, University of York
Heslington, York
01904 434755

[email protected]

ABSTRACT
This short paper describes how the CHLOE technique for the human error analysis of collaborative work relates to the Timing theme. Firstly, the general relationship between collaborative work, time and error is outlined. Then the CHLOE technique is described, with reference to how it tackles timing issues. Finally, case studies of CHLOE analyses that emphasise the importance of timing in collaborative error are discussed.

Keywords Timeliness; Human Error Analysis; Collaborative Work

1. COLLABORATIVE WORK, TIME AND ERROR
Timing is an important factor in collaborative work. For collaborative work to be effective, both coordination and cooperation are required. Successful collaboration is all about people and artefacts being in the right place at the right time, doing the right thing. Information therefore has to reach those who need it when they need it, goals have to be shared and complementary when required, and actions have to be coordinated. Effective planning is essential to ensure this happens, whether it is pre-planning achieved through rules or procedures, or on-the-spot (dynamic) planning.

The importance of time to collaboration between individuals or groups is perhaps best highlighted by examining what can go wrong with collaborative work when there are problems with timing. Collaborative work is susceptible to errors that emerge from the distributed knowledge this type of work involves, which places extra demands on participants; in particular, it is susceptible to failure when information does not reach those who need it when it is needed. Collaborative errors may be caused by factors such as a lack of situation awareness or awareness of each other, misunderstandings between participants, conflicts, and failures of co-ordination.

The form that collaborative work takes is heavily affected by time-related issues such as whether it is co-present or distributed, and synchronous or asynchronous. This determines, and is in turn affected by, the media used to perform the work. Different errors are possible, and more or less likely, according to the form the collaboration takes.

2. CHLOE
CHLOE [2] is a technique designed to help identify possible failures within collaborative work. It analyses both social collaborative (Human-Human) and technology-mediated collaborative (Human-Computer-Human) work. The method takes a question-based approach, which aims to reduce the requirement for analysts to have substantial knowledge of human factors or cognitive psychology. Questions are based on the failures that are possible within a cognitive model of collaboration. Using a suitable model as the basis for the error assessment technique should also allow the complex interrelationships between the people and the technology they use within the system to be considered more effectively.

The CHLOE method consists of four main stages: scenario description, task identification, error analysis (using questions based on a cognitive model of collaboration), and design suggestions. The error analysis questions based around the model identify problems with collaboration associated with goals, plans, actions, perception and interpretation. CHLOE has been developed not only to analyse collaborative work, but also to help suggest design improvements to support this type of work. The error analysis questions tackle the cognitive causes (cognitive failures or errors) behind the visible failures, and the reasons for these. Questioning the cognitive reasons for failures can help lead to design solutions because it focuses on why the observable failures occur, so improvements can be made to the design on this basis.

CHLOE does not explicitly focus on timing errors. The error guidewords applied to the cognitive model of collaboration to identify possible failures do not specifically include timing errors (e.g. too early/late). However, timing is an important aspect of collaborative error, and therefore many of the error analysis questions in CHLOE refer to timing issues as a possible cause or consequence of collaborative error. The Perception questions refer to timing issues such as whether information is immediately obvious, when updates happen, and how long information is available for, and refer to delays as a possible consequence of some forms of error. The Goals section also refers to how up-to-date information is, in addition to triggering issues relating to tasks.
The Planning section considers whether planning is done before or during a task, how quickly participants have access to information, the possibility of delays in communications, and whether a shared representation is consistently available. Finally, the Actions questions are concerned with possible interruptions causing delays, and with the speed of recovery or repair after errors.

A second version of the technique (CHLOE2), with an improved model of collaboration [3], takes timing into consideration more explicitly than the first. Collaboration is explicitly split into coordination and cooperation, which emphasises the importance of timing, and shared understanding in the cognitive model is split into past, present and future issues.
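To make the question-based approach concrete, here is a small hypothetical sketch of how timing-related questions (paraphrased from those above) might be applied systematically to identified tasks; CHLOE itself defines no code, so every structure here is invented:

```python
# Timing-related analysis questions, grouped by the model categories
# described in the text (paraphrased, not CHLOE's actual wording).
TIMING_QUESTIONS = {
    "perception": ["Is the information immediately obvious?",
                   "How long is the information available for?"],
    "planning": ["Could communication delays affect the plan?",
                 "Is a shared representation consistently available?"],
    "actions": ["Could interruptions delay the action?",
                "How quickly can errors be repaired?"],
}

def analyse(tasks):
    """Yield (task, category, question) triples for an analyst to answer,
    one question set per task identified in the earlier CHLOE stages."""
    for task in tasks:
        for category, questions in TIMING_QUESTIONS.items():
            for q in questions:
                yield (task, category, q)

rows = list(analyse(["pass flight strip", "annotate strip"]))
```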

3. CASE STUDIES

3.1 A Neo-natal Intensive Care Unit (NICU)
In the development of a model of collaboration for CHLOE2, the importance of timing and collaboration for the safety and dependability of a neo-natal intensive care unit was examined.


This work is described in [1]. Timing-related communication issues affect work in the NICU in several ways. Communication plays an important role in allowing the staff in the NICU to work as a team: it is used to achieve coordination, cooperation, negotiation, planning and decision-making, and generally for sharing information. Information must be shared when appropriate, and it is essential for all participants to have a shared, up-to-date awareness of a situation. This is essential for dependability and helps the team to perform several tasks concurrently.

It may be important that information persists so that it can be checked or re-checked. For example, the information held in records may be needed in the long term, but information about staff duties and timetables only needs to be available for as long as it is relevant. As staff work in shifts, it may be important to leave information for those who work a later shift. Informal communication via gestures or notes allows information to be shared between people who work at different times, and provides a faster way of passing information when time is short.

The structure of roles and responsibilities is also affected by timing-related considerations. A decision about whether to refer a case or act oneself is affected by time: it will affect, and may be affected by, how long it takes to contact another member of staff and how long it takes them to arrive. Thus, there may be problems if knowledge is distributed too widely in geographical terms.

3.2 The Air Traffic Control Domain: Ground Movements Control
CHLOE2 is being used to analyse the work of Ground Movements Controllers (GMCs). The job of a GMC is to direct aircraft (arriving and departing) and airport vehicles safely around taxiways and 'parking bays'. Timing is extremely important in the work of a GMC. At a busy airport time is at a premium and aircraft have to be dealt with quickly, but safely. An accident or near-miss will occur if two aircraft or vehicles are in the same place at the same time: a runway incursion may be viewed as an aircraft or vehicle in the wrong place at the correct time, or in the correct place but at the wrong time. When issuing instructions to aircraft and vehicles, the GMC must predict how long it will take a pilot to obey the instruction, how long the action itself takes to perform, and what the state of nearby traffic will be at that time. If several aircraft are going to arrive in the same place at the same time, the GMC has to decide which aircraft or vehicle should hold position and which should take evasive action. Timing considerations such as who can act more quickly, or possibly who has been waiting longest, will heavily influence this decision.

In order to carry out his or her work a GMC has to cooperate, coordinate and communicate with several people. Communication with aircraft and ground vehicles is by radio, and the pilots and controllers are often waiting to hear from one another. Language, hearing, or attention problems can mean that extra time is needed to pass on information to pilots and ensure that it is understood. It may also take a long time to contact pilots or drivers who are not listening to their radios; this must be worked around in order to organise other aircraft and vehicles.

As the GMC and other controllers can speak to each other face-to-face, they can use gesture in addition to speech to help maintain an awareness of each other's current workload. This is essential for coordinating effectively with one another. Without it, too many aircraft may be passed to or from the GMC, leading to too many aircraft waiting within a given space, or too many to deal with in a given time. It is also of vital importance that information is shared between controllers at the correct time. If passed on too early, the information may be forgotten; if too late, it is irrelevant, or can already have caused an accident.

In order to keep control over so many aircraft and vehicles at once, much of the work is proceduralised. Departing flights must contact Clearance Delivery before Ground Movements; this step allows a planner controller to order the flights before they are passed to the GMC. However, pilots sometimes try to take shortcuts to save time. Paper flight strips are used to organise information about aircraft and vehicles. Most controllers organise their strips in time order. The columns of each controller's strip bay are used to group flights together according to their progress with that controller, e.g. those in a certain position or awaiting certain instructions. In addition, most controllers organise strips within columns according to the order in which they need to be, or are expected to be, dealt with. Flight strips must be annotated and passed between controllers in a way that coincides with the progression of the aircraft between the runways and parking areas. The timing of when these strips are passed is important: if a strip is passed too early or too late it can lead to confusion between the pilots and the controllers involved over who is instructing the aircraft. For example, a strip may be passed across before a controller has completed the instructions to an aircraft. It is also important that a strip is noticed on time when it has been passed.

4. CONCLUSION
Timing is an important factor in the success of collaborative work, as demonstrated in the work of the NICU and Ground Movements Control. Although timing is not an explicit focus of the CHLOE technique, many CHLOE questions refer to timing issues. CHLOE2 considers timing issues more explicitly by splitting collaboration into cooperation and coordination; shared understanding is also split into past, present and future issues.

5. REFERENCES
[1] Baxter, G., Kuster-Filipe, J., Miguel, A. & Tan, K. (2005) The Effects of Timing and Collaboration on Dependability in the Neonatal Intensive Care Unit. Safety Critical Systems Symposium, Southampton. Springer.
[2] Miguel, A. & Wright, P. (2003) CHLOE: A Technique for Analysing Collaborative Systems. Proc. of the 9th CSAPC, Amsterdam, G. van der Veer & J. F. Hoorn (Eds), pp. 53-60.
[3] Miguel, A. & Wright, P. (2004) Towards A Framework for Systematically Analysing Collaborative Error. Proc. of the 7th HESSD, IFIP 18th World Computer Congress, Toulouse, C. W. Johnson & P. Palanque (Eds), pp. 271-284. Kluwer.



Responsibility Theme


TA Vulnerability Analysis

Karen Clarke
Lancaster University, Infolab 21
Lancaster LA1 4WA
Tel: +44(0)1524
[email protected]

ABSTRACT
In this position paper, the progress to date of the TA 'Vulnerability Analysis' is set out.

Keywords
Responsibility, Vulnerability, Process, Method, Analysis.

1. INTRODUCTION
This TA is concerned with developing a method of process assessment in socio-technical systems, with the aim of discovering vulnerable areas of these processes in order to produce a method of identifying potential failure points at the design stage of a system.

CHLOE has been designed by Angie Miguel [1] as a technique to help identify possible failures in collaborative work, both between people and between people and information systems. The CHLOE method consists of four stages:

1. Scenario description: scenarios allow 'systems' to be broken down and analysed in smaller sections. The scenarios may be data-led or 'what if' scenarios from domain experts. CHLOE scenarios are set out using sequence diagrams, and can be either 'problem-based' or 'smooth-running' situations.

2. Task identification: hierarchical goal decomposition is applied to the sequence diagram in order to identify the tasks essential to the successful completion of a process. This abstraction allows the analyst to set aside inessential interactions in the sequence diagram of the scenario.

3. Error analysis using a cognitive model of collaboration: the model of collaboration used here is from Dix [2], and centres on the types of communication involved in collaborative work. CHLOE thus treats failures in collaboration as failures in communication between participants. The collaborative aspects of the model have been fed into a set of 21 error analysis questions: 6 on perception/interpretation, 5 on goal formation, 5 on planning/co-ordination, and 5 on essential actions for participants. An example question is (Goals Q5) 'Are participants' goals or sub-goals likely to come into conflict (e.g. same resources required)?'. The 21 error analysis questions are applied to each task identified in stage 2.

4. Design suggestions: design suggestions may be made either after completion of error analysis for the whole scenario, or at the time a question is asked for a particular task. Design suggestions are thus generated through the question and answer sessions. A table is used to formulate the results.

2. BACKGROUND
The Vulnerability Analysis TA initially brought together two separately developed strands of work in vulnerability analysis and applied them to an ethnographic study of work in a hospital setting. These were:

1. The PERE methodology, a general approach to analysing process vulnerabilities which was developed in the context of REAIMS and builds on work by Reason, Rasmussen and others.

2. Work at Newcastle carried out over some years by John Dobson, which has developed a method for analysing weaknesses in responsibility and communication structures within and between organisations. This method fits nicely into and complements the PERE methodological framework.

However, although the initial plan was to 'simply' integrate these approaches in order to form a systematic method for analysing process vulnerabilities in socio-technical systems, we have included a further strand to the work. Since this TA was started by myself and John Dobson, we have developed the interdisciplinary nature of the TA and have brought Pete Wright and Angie Miguel from York University on board. We have a set of objectives in mind for the TA that will enable us to contribute to the DIRC themes, particularly responsibility, but also risk and structure. That said, at least for now we are focusing on the responsibility theme as it is the most directly relevant DIRC theme. We are also now planning to contribute directly to the DIRC Method.

2.1 From PERE to CHLOE
Following discussions and preliminary meetings/e-mails with Angie Miguel and Peter Wright, I have taken the last version of the CHLOE method (developed by Angie Miguel) for analysing process vulnerabilities and am applying this to the existing ethnographic data, bed management initially, but to be extended to other areas if we feel this is the way to proceed. This will be the first stage of 'testing' the method and will involve a number of feedback sessions between TA members.

2.2 Current Work
The four stages of the CHLOE method are being applied to ethnographic fieldwork from the DIRC project. Miguel has previously 'tested' the method using a scenario from Air Traffic Control. This was done using seven PhD students and Research Associates from York University, who were split into two groups, 'novices' and 'experts'. These groups then worked through the four stages of CHLOE. The results are detailed in [1]. Briefly put, however, Miguel identified a number of problems with the design suggestions proffered at the end of this exercise, and sees these problems as related to problems with the earlier stages in the process, e.g. lack of domain expertise, and lack of direction in the questions such that only superficial analysis is provided. In this exercise, 128 potential difficulties were identified, but the design suggestions made were superficial, due partly to a lack of domain knowledge and technical expertise.

With these problems in mind, I am now assessing the CHLOE method using fieldwork data from three hospitals. The aim of the CHLOE method was that it should be used to answer questions on behalf of, and in conjunction with, domain experts. In this way, I am testing the method in my capacity as an ethnographer, but with the 'domain knowledge' provided by fieldwork material, i.e. real-life situations. The method is firstly being applied to a number of scenarios: bed management, waiting list management, and the multiple responsibilities of directorate managers. For example, one scenario derives from the fact that directorates have conflicting interests in terms of achieving targets and goals, i.e. treating a large number of minor conditions may help achieve waiting list targets but will not bring in enough revenue to meet financial targets. Alternatively, I am also looking at 'workaround' scenarios, e.g. a situation where nurses have a workaround to deal with a flawed dispensing and prescription process. These workarounds are examples of situations which may be seen as 'smooth-running' but which, subject to scrutiny, uncover a range of areas for process vulnerability analysis.

3. OBJECTIVES AND OUTCOMES
This work is one part of the TA, which also involves work on modelling responsibility and providing a systematic method for analysing process vulnerabilities in a socio-technical system. It will thus contribute directly to both the responsibility theme and to the DIRC method. This work will be disseminated through short conference papers, journal publications and, possibly, a DIRC Method book.

4. ACKNOWLEDGMENTS
My thanks to John Dobson, Angie Miguel & Peter Wright for their participation in this TA.

5. REFERENCES
[1] Miguel, A. & Wright, P. 'CHLOE: A Technique for Analysing Collaborative Systems'. Paper presented at DIRC Workshop 2004.
[2] Dix, A., Finlay, J., Abowd, G. & Beale, R. (1998) Human-Computer Interaction, 2nd Ed. Prentice Hall Europe.


Complexities of Multi-organisational Error Management

John Dobson
31 Wentworth Park, Allendale NE47 9DR
+44(0)1434 683 657
J.E.Dob [email protected]

Simon Lock
University of Lancaster, Department of Computing, Lancaster LA1 4YR
+44(0)1524 510 304
lock@comp.lancs.ac.uk

Dave Martin
University of Lancaster, Department of Computing, Lancaster LA1 4YR
+44(0)1524 510 348
d.b [email protected]

ABSTRACT
In this paper we shall look at some of the problems in designing an information and communication technology (ICT) system for an organisation located in a complex multi-organisational setting. We shall look in particular at the handling of errors, both within the ICT itself and in the complex multi-organisational activities which the ICT is designed to support. We shall conclude by offering some advice to system designers which should prevent them from repeating mistakes that have been made all too often before.

Keywords Responsibility, error management, organisational boundaries.

1. INTRODUCTION
This paper looks at organisational complexity and the problems it raises for the design of information and communication systems. Specifically, we shall look at problems arising from complex patterns of responsibility that are shared between separate organisations engaged in a joint enterprise, or between whom some relationship exists, and the issues that arise when these shared responsibilities are called into play, either to prevent failure or as a consequence of failure. We shall look at the stresses that shared responsibilities place on information and communication systems, and, after arguing that most current approaches to the procurement and design of information systems do not allow for these stresses, we shall indicate an approach to dealing with them.

One clarification is necessary at the start. When we use the term "information and communication system" we are not assuming anything about the extent to which it has been computerised. Information systems are taken to include not only paper records but also individual and organisational memory. Similarly, communication systems are taken to include teleconferencing and face-to-face meetings. To stress this point, we have deliberately chosen to illustrate our points by reference to an example in which serious failings in the information and communication systems were uncovered, though no computer systems were implicated. More will be said about this later, when the case study is introduced. However, our recommendations and conclusions are intended to be applied to systems built using information and communication technology; we hope to show that the kinds of problems such systems have to deal with require a certain reconceptualisation of information, and of the approach to design, when complex shared responsibilities are to be supported.

Many ICT systems are designed for a context which is restricted to the organisation that deploys them. This is often an oversimplification, since organisations often do not work as closed systems with relationships confined to defined interfaces. Standard system design paradigms are not well adapted to designing systems to be deployed in complex multi-organisational settings, often because the procurement process is specifically designed to exclude this. Procurement is thought of as a single contract between a single purchasing organisation and a single supplier (though the supplier may be a consortium of organisations). This model works well when the nature of the goods to be supplied is well understood ("Please supply half a ton of broken biscuits"), but fails when the relationship between the parties is more complex than a consumer-supplier one, or when it involves something more complex than goods, or when recovery from failure is problematical and involves society as a whole, not just the interested parties. To make this clear, here are three examples of organisational relationships that are well understood, and for which standard system design paradigms can be made to work quite well: support for consumer-supplier relationships; implementation of straightforward financial transactions; licence-handling applications. Here, by contrast, are three examples of more complex multi-organisational systems where standard system design paradigms have been found not to work well: systems to support healthcare for the citizen; integrated transport systems; military command and control systems.
These systems are all complex because they all include patterns of shared responsibilities which are implicit, negotiated and dynamic; and, as we shall see, it is often not until a failure occurs that the full complexity of these shared responsibilities is appreciated, and the simplified assumptions about them that are implicit in the information and communication systems supporting the joint enterprise are exposed and break down. This makes such systems hard to design, because more attention has to be paid to what happens when things go wrong. It is in the presence of failure that responsibilities are assumed, rearticulated and renegotiated; this requires flexibility of role definitions and organisational boundaries. Rigidity of boundaries and interface definitions all too often serves to prevent recoverability. Many system design methods start from the assumption that the functionality of the system can be defined in terms of activities that are expressed in terms of their behaviour when accessed through defined interfaces. Although this is a


simplified model of the way real things work in the real world, it works well enough as a representation when things are working correctly in a stable, well-understood environment.

But in the kinds of complex multi-organisational systems we are considering, when things do not work correctly, human ingenuity is often able to conceive some kind of workaround, which may well require some extension or renegotiation of responsibilities, and this in turn may require some adaptation of the information and communication system's representation of the real-world entities. Although we shall be developing a solution strategy to this complex of related problems later on, we think it might help to indicate its main features now. It has three main characteristics, which we summarise as vocabulary, viewpoint and process. The vocabulary is to use the language of pathology (what can go wrong) as often as the language of physiology (what happens when it goes right). The viewpoint instructs us to look first not at what people do, but at what their responsibilities are. The process describes how failure modes and recovery plans are analysed and implemented. In this paper, we shall concentrate on the sorts of things that can go wrong in a multi-organisational setting, and on problems in designing systems to support responsibilities to put things right. There is, unfortunately, no space to look at analytic processes.

2. AN ILLUSTRATIVE EXAMPLE One of the best simple examples of multi-organisational complexity is the relationship between Railtrack and the train operating companies, as illustrated by the Ladbroke Grove disaster.1 On 5 October 1999, a train operated by Thames Trains and driven by a newly qualified driver passed a red signal (referred to as SN109) at Ladbroke Grove (just outside Paddington main line station, London) and continued for some 700 metres into the path of a high speed train. As a result of the collision and the ensuing fires, 31 people died and 227 were taken to hospital. A subsequent investigation identified a number of significant factors which resulted in the signalling in the Paddington area not always being compliant with relevant industry standards. Signal sighting experts came to the overall conclusion that the viewing conditions at the relevant signal presented an exceptionally difficult signal reading task. The reasons why the train passed the red light are complex. There were no indications that the driver deliberately set out to pass the signal at red, and the investigation concluded that any acts or omissions by him were just one group of contributory factors. A full report of the enquiry is available at //www.hse.gov.uk/railways/ladbrokegrove.htm

1 [For readers not familiar with the UK railway system it will help to know that at the time of the disaster, railway responsibility was divided between a single organisation (Railtrack) responsible for the entire infrastructure of track, signalling, bridges and property, and a number of train operating companies (of whom Thames Trains was one) responsible for operating and maintaining rolling stock and the conveyance of passengers or goods. A regulator was appointed to oversee that the whole system worked in the public interest, to award franchises to the operating companies, and to enforce public policy in the areas of competition and cooperation.]

As we shall show, the information and communication systems in these organisations, though partly manual, were deficient. There is no reason to believe that fully automated systems would have been any better, given the system design paradigms for computer-based systems prevalent at the time.

3. DEPENDABILITY BASICS In this section we introduce, with examples, some basic vocabulary for talking about dependability. The terms used are standard in the domain of dependability of computer-based systems, but we will explain them in the context of any socio-technical system, including those in which the technology is not computer-based (or can be so regarded: computers are in fact used in signalling systems, but in the Ladbroke Grove case this was completely irrelevant). FAULT: a fault is something that is not as it should be; a state. It may be a state of a human or of a machine. It may be latent (not visible) and it may be benign (does not cause an error). ERROR: an error is the manifestation of a fault, and is a behaviour of something. Often an error is a manifestation of an interaction between two or more faults. CONSEQUENCE: a consequence is the observable effect or outcome of an error. FAILURE: a failure has occurred when an undesirable consequence is experienced. It is a judgement about erroneous behaviour, based on its consequences. We shall endeavour to be quite consistent in our usage, which is based on strong typing: a fault is a state, an error is a behaviour, a consequence is a result (a causal concept), a failure is a judgement.

We shall also introduce some terms associated with the achievement of dependability. Faults can be avoided during the creation or implementation of system components. Faults can be removed from components after they have been created or implemented. Faults can also be tolerated, which means that if an error occurs and is detected as such, some recovery or workaround is initiated which prevents the error from causing a consequence judged to be a failure. This implies the need for monitoring and exception handling.
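To make the strong typing concrete, here is a minimal sketch in Python (our own illustration; the class and function names are not taken from any real dependability library) of fault as state, error as behaviour, consequence as result, and failure as judgement, together with the detect-and-recover shape of fault tolerance:

```python
from dataclasses import dataclass


@dataclass
class Fault:                # a STATE that is not as it should be
    description: str
    latent: bool = True     # a fault may be invisible until activated


@dataclass
class Error:                # a BEHAVIOUR: the manifestation of fault(s)
    description: str
    caused_by: list         # often an interaction of two or more faults


@dataclass
class Consequence:          # a RESULT: the observable outcome of an error
    description: str


def judge(consequence: Consequence, acceptable) -> bool:
    """FAILURE is a JUDGEMENT about a consequence, not an intrinsic
    property of it: different judges may disagree."""
    return not acceptable(consequence)


def tolerate(error: Error, detect, recover) -> bool:
    """Fault tolerance: a detected error triggers recovery, which can
    prevent the consequence ever being judged a failure."""
    if detect(error):
        recover(error)
        return True         # error handled; no failure arises
    return False            # error undetected; a consequence may follow
```

Note that `judge` takes the acceptability criterion as a parameter: as the text stresses, a failure is a judgement, so the judge is part of the model rather than being fixed in advance.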


The risk of failure can also be accepted, with the cost of failure (if it occurs) being met, for example, through insurance, compensation or writing off. Acceptance of risk can be transferred to users or operators through the use of disclaimers and warning signs. There are many possible accounts of an incident which leads to an adjudged failure, taken from different viewpoints. Indeed, though not in this case, some accounts may lead to the view that a consequence was not, in fact, a failure, since a different judge is making the judgement.

3.1 Example One possible account is that which places the responsibility with the train operator (there are others, equally valid):
FAULT: a poorly trained driver. "A Mr Holmes of the Railway Inspectorate told a Mr Franks of Thames Trains that he was 'very concerned about driver training'."
ERROR: a signal passed at danger (called a SPAD).
FAILURE: a crash.
Another possible account is that which places the responsibility with the infrastructure provider:
FAULT: a badly designed and/or positioned operational signal. "there would appear to be three principal factors – the siting of signals in terms of design, local orientation and conspicuity – visibility – and human factors which may have contributed to the error."
FAULT: inadequate monitoring and countermeasure guidelines and practice.
ERROR: SN109 not identified as dangerous, due to poor monitoring and countermeasure processes.
CONSEQUENCE: the continued use in operation of SN109.

3.2 Achieving Railway Dependability In this subsection we briefly look at the mechanisms in place that were intended to achieve dependability of the system.

Operational faults.
Removal: it was assumed that (re)training and information would remove driver errors due to faults in insufficient skill and knowledge. "But… Mr Adams, who supervised Driver Hodder's practical training… was not aware that SN109 was a multi-SPAD signal."
Tolerance: it was assumed that an automatic warning system in the driver's cab and a 700-yard run-on (between the signal and the points it controlled) would be sufficient to allow error recovery and so avoid failure.

Signal design and placement faults.
Removal: procedures were in place to identify and rectify problematic signals, but a solution had not been found or agreed upon for SN109. "What is unquestionably the case is that the bodies that I have identified generated a considerable quantity of paper. What is less clear is how effective they were at identifying problems and rectifying them."
Tolerance: in addition to the assumed tolerance of driver error, procedures were in place in the signal control room to detect SPADs and to take appropriate action on signals on the other line(s).

We can summarise the story so far in the following picture: [figure omitted]

3.3 Multiple Faults We provide a brief summary of the factors which lead to complexity; these points will be expanded upon as we proceed. A single failure may be the consequence of multiple faults, all acting together. The removal of, or tolerance of (recovery from), a single fault may prevent a subsequent failure from occurring. The danger, especially with multi-organisational systems, is that the faults which are not removed or protected against will remain latent, and may later be reactivated by changing conditions or by the injection of further faults.

For example, if the infrastructure provider (namely Railtrack) did all that it could to remove faults from the system, this would at best improve the positioning of the signal, removing only one fault from the system. The adequacy of driver training would not be affected; indeed, the deficiencies in training might go unnoticed, as the improved signal positioning would be likely to reduce or prevent failures. This brings us to the issue which is at the heart of this paper: the complexity arising from multiple faults situated in different organisations. Examples of this complexity are:
* With different organisations, how do different possible faults interact? Whose responsibility is it to work this out? Who is responsible for the interaction?
* What is the model of the relationship between companies? What is the nature of the contract between them?
* Is the peer relationship of very loose cooperation adequate for creating a safety structure?
* How do faults, errors and failures in the system that creates a given system undermine the effectiveness of fault avoidance strategies? In a similar way, how are fault-error-failure chains associated with the other failure management schemes (fault removal, fault tolerance, failure acceptance)?
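The danger of latent faults left in place can be made concrete with a toy simulation (the fault names are our own shorthand for the Ladbroke Grove analysis, in which the failure required the conjunction of faults held by different organisations):

```python
# Illustrative sketch: a crash here requires BOTH the siting fault and
# the training fault to be active at once.
def failure_occurs(active_faults: set) -> bool:
    return {"bad_signal_siting", "inadequate_training"} <= active_faults


faults = {"bad_signal_siting", "inadequate_training"}
assert failure_occurs(faults)

# One organisation removes the fault it is responsible for:
faults.discard("bad_signal_siting")
assert not failure_occurs(faults)       # no failure is observed...

# ...but the training fault stays latent, and a change in the
# environment (another hard-to-read signal) reactivates the chain:
faults.add("new_hard_to_read_signal")


def reactivated(active_faults: set) -> bool:
    return ("inadequate_training" in active_faults
            and "new_hard_to_read_signal" in active_faults)
```

The point of the sketch is that removing one fault suppresses the observed failure while hiding, rather than curing, the other.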

4. RESPONSIBILITIES FOR HANDLING ERRORS In this section, we shall expand a theme mentioned in the introduction: responsibilities for handling errors. We maintain that in complex multi-organisational settings, failures often occur because mechanisms for sharing responsibilities are inadequate or absent; and this is particularly true of responsibilities for preventing or managing failure. Our paper is concerned with design considerations for such mechanisms.

4.1 Causal and Consequential Responsibility There are many meanings of the word 'responsibility', which we will not discuss here. (Good books to read on the nature and importance of responsibility are those by Lucas [5] and Jonas [4], respectively.) However, for our present purposes it is useful to distinguish causal responsibility, when an agent has an obligation to make something happen, to prevent it from happening, or to maintain a state, from consequential responsibility, when an agent is answerable when something happens or does not happen or a state is not maintained. These different responsibilities do not always rest on the same agent (the doctrine of 'ministerial responsibility'), and consequential responsibility may be held to rest with an organisation as a whole, whereas causal responsibility can most usually be traced to an individual, or to the fact that no particular individual held the responsibility at the time. Causal responsibility may sometimes be delegated, though some responsibility remains with the delegating agent (i.e. the responsibility for having chosen to delegate), whereas consequential responsibility is not normally capable of delegation, though it may sometimes be transferred. We shall refer to these different responsibilities in our discussion of Ladbroke Grove, and deal with the complexities they pose for system design in a later section.

4.2 Lifecycle and Responsibilities In preparation for mapping out the responsibilities implicated in a failure, it is useful to start by looking at the major lifecycle phases of an operational system as a way of distinguishing different responsibilities. There are four major phases (defined by processes) in the life cycle of an operational system: procurement, operation, maintenance and decommissioning (in the case of Ladbroke Grove, decommissioning was not an issue). It is easier to deal with particular faults in particular ways at particular points in the life cycle. Procurement includes making assessments of the risks and consequences of operational failures. Operation includes monitoring errors and following plans for recovering from the errors so as to prevent them from giving rise to failures. Maintenance includes taking retrospective action to prevent subsequent occurrences of fault-error-failure chains. Decommissioning includes ensuring that documentation concerning the (in)accuracy of the failure mode assumptions, and the (un)successful ways discovered of managing failures, is preserved for posterity. The previous analysis leads to the following articulation of overall responsibilities: [figure omitted]

The use of the word 'agent' here indicates a responsibility for doing something or seeing that it gets done – the actual execution could be performed by a machine or delegated to humans. An agent is always a person or group of people sharing a responsibility. The lines in the diagram represent not just information flows but conversations. A conversation is a possibly extended series of exchanges, distributed in time and space, between two agents. The information exchanged can also be seen as a partial state of that conversation as it exists at any instant. More details about the modelling presented here will appear in a forthcoming book [3]. The picture is intended to be normative. Its use is in performing a comparison with a description of the responsibilities as they are articulated in the actual setting, in order to identify such things as missing or ill-defined responsibilities, or shared responsibilities that cross inter- or intra-organisational boundaries, as it is these that so often give rise to failures, and in particular to failures in failure prevention or management. This comparison can be used, as in the soft systems methodology (see Checkland [1] and Checkland and Scholes [2] for the theory and practice of soft systems methodology), as a way of determining expectations of a new information system. The positioning in this model of (intra- and inter-) organisational boundaries is key to effective error recovery. This will be discussed in the next section.
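A minimal sketch of this model, under our own naming assumptions, might represent agents as bundles of responsibilities and conversations as stateful series of exchanges; comparing a normative model with the actual articulation then reduces to a set difference:

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Agent:
    """An agent is always a person or group of people sharing
    a responsibility (execution may be delegated to a machine)."""
    name: str
    responsibilities: List[str]


@dataclass
class Conversation:
    """A possibly extended series of exchanges between two agents;
    the information exchanged is a partial state of the conversation."""
    between: Tuple[str, str]
    exchanges: List[str] = field(default_factory=list)

    def say(self, message: str) -> None:
        self.exchanges.append(message)

    def state(self) -> List[str]:
        return list(self.exchanges)     # the conversation's state so far


def missing_responsibilities(normative: Agent, actual: Agent) -> set:
    """Compare the normative picture with the actual articulation to
    expose missing or ill-defined responsibilities."""
    return set(normative.responsibilities) - set(actual.responsibilities)
```

The comparison function is the sketch's analogue of the soft-systems step described above: it surfaces the responsibilities the normative model expects but the actual setting does not hold.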

5. ORGANISATIONAL BOUNDARIES In order to discuss the problems arising when responsibilities cross organisational boundaries, we start by taking a slight simplification of the previous figure: [figure omitted]


If maintenance responsibilities lie in a different enterprise from the operation responsibilities, where exactly does the boundary lie? It could, for example, be like this: [figure omitted]

Here, system maintenance is carried out either by a separate organisation or by a separate division within the operating enterprise. As part of the maintenance, all the monitoring responsibilities can be transferred, but the operator is then dependent on another organisation for critical management information; there are a number of possible organisational failures associated with such a critical dependence. An alternative that is theoretically possible, but in practice would be defective, is shown below: [figure omitted]

But in practice, maintenance will include at least some monitoring, and therefore some error handling: [figure omitted]

so that monitoring and error handling responsibilities are shared between the operational organisation and the maintenance organisation. Such shared responsibilities require good communications channels and some way of resolving conflicts in priorities, because this model is equivalent to the following: [figure omitted]

The problems here are clear. Inter-organisational conversations are required to coordinate shared responsibilities; but the media and channels required for such coordination may be unclear, and the supposedly communicating processes may be mutually opaque, as indeed they were at Thames Trains and Railtrack, as the Ladbroke Grove enquiry shows.

6. BOUNDARY OBJECTS There has been much discussion of the concept and nature of boundary objects, and indeed of whether they can easily be determined. In simple cases, however, the idea is a useful one. A boundary object is one which is visible on both sides of an organisational boundary, but which has different connotations on each side. For example, to Railtrack a train is something which uses their infrastructure; to Thames Trains, a train is something which delivers their transport services to their customers.

So information about boundary objects is generated in two distinct contexts. Normally, such information is interpreted in the context in which it is generated, though parts of the context may be shared (e.g. the whereabouts of the train on the line). In the presence of failure, however, when shared responsibilities are actually called upon to be exercised, information generated on one side (e.g. about the training of drivers) has to be interpreted on the other (e.g. was their driver appropriately trained from our point of view?). In addition, two other things tend to happen:


i) what constitutes the boundary object is re-articulated (e.g. trains are now seen to have drivers);

ii) things previously not seen as boundary objects now take on that significance (e.g. signals have now to be treated as boundary objects; as indeed have drivers, because Railtrack now realises that it has an interest in the driver's training and experience).

There are three distinct, but related, information management problems that now arise: 1) What one party judges to be a failure of, or implicating, a boundary object might not be so judged by the other party (e.g. the fact that drivers had difficulty in reading a particular signal was initially treated by Railtrack as a form of driver failure, not signal failure). This point is not always appreciated by those who consider a failure to be a state or behaviour (i.e. something which all competent observers can agree upon): although a failure is a mismatch between what actually occurred and what a specification says should occur, there might be more than one valid specification. Clearly, a train crash is a failure, as is a SPAD: but wherein lies the error (or errors)? And what of the fault that gave rise to the error(s)? This is the problem of socially determined failures, i.e. consequences whose subsequent characterisation as failures is a process of social agreement. 2) Information on one side of the boundary – including its context of interpretation and generation – might not be visible on the other side. This undoubtedly occurred at Ladbroke Grove. The report comments unfavourably again and again on the way that information passed across the boundary but was not acted on, for reasons that were obscure. 3) A shared approach to recoverability or repair might well be hampered by the invisibility of relevant information or its processing. These problems are deep-rooted and give rise to a number of issues in the procurement and design processes for an information management system, to which we shall subsequently return.

7. THE USE OF ETHNOGRAPHY In this section we look at the acquisition of information about responsibilities. Clearly one way of finding out about responsibilities is direct enquiry: asking informants (and their managers), looking at their job descriptions and contracts, and so on. But the direct approach, although necessary, is also limited. People's interpretations of their responsibilities are often nuanced, and this nuancing is often better determined as a result of observation and subsequent elaboration, since direct questions are usually answered directly. It is one of the roles of ethnography to observe the manifestation of interpretation of responsibility. It can do this by explicating social aspects of work and considering the relationship between local practice and formal procedure – how procedures are enacted, and how practice is related to, explained as, or accounted for in terms of formal process. It can probe into aspects of safety culture as these are enacted in the organisation. Because ethnographic interpretation considers systems in a broad socio-technical sense, it is particularly useful for analyses of 'systems' where computers are minor players. Ethnography is also useful in identifying boundary objects and the ways their interpretations differ on each side.

Ethnography can also be useful in failure mode analysis. One particular use is in situations where the response to potential or actual failure is a preventative or recovery action: ethnography provides a description of what actions actually occurred, as opposed to what actions were supposed to occur. (It is hardly necessary to stress how often these differ.) It can show how fault-error-failure chains are investigated, and it can examine the nature of interactions across organisational boundaries – how processes are brought into alignment, and who or what does the job of translation. (This is particularly important for recoverability.)

8. SOME IMPLICATIONS FOR PROCUREMENT 8.1 Procurement and design There are many complex links between the procurement process and the design process. Although we cannot, for reasons of space, explain procurement process complexities in this paper, we can point to one major source of problems: the idea that a requirements process (considered as part of procurement) specifies what is to be done, while the design process suggests how it is to be done, is far too simplistic a view of a complex information system. The language used to specify requirements invariably carries design implications with it. For example, whether a static or dynamic representation of an object is required, or whether an object can have only one or more than one of a set of attributes, or whether the classification of an object can change and, if so, under what circumstances, all provide deep-rooted constraints on the design of a system, stretching from database structure to choice of human interface metaphor to recoverability procedures in the organisation. Some design work during requirements formulation is unavoidably necessary, however much the purist may deprecate it. The root cause of many failures can often be traced back to faults in the procurement process, particularly for complex ICT systems. Designers, of course, are not usually in a position to influence the structure of the procurement process, but they should be aware of the different possible models of procurement and the kinds of error that arise in each.

8.2 Requirements-based procurement In requirements-based procurement, in which the problem has been analysed to some extent but the solution has not, the design responsibility is to architect and implement a solution conformant to the requirements. A common fault is that the requirements may have arisen not from the problem owner's concerns but from the problem analyst's understanding of them. This can be alleviated by some kind of ethnographic exercise, but this is not always done. There is, we think, a design responsibility to be aware of the provenance of the requirements, and in particular of the assumptions (particularly about patterns of future change in the environment) that lie behind them, and to question those that seem implausible or contradictory.

8.3 Solution-based procurement In this model of procurement, a solution – possibly outline, possibly complete – is presented to the designer for implementation. There are, unfortunately, many cases where the solution has been specified in such detail that the designer is left no scope for using the designer's own expertise and experience to suggest better solutions, because details of the problem have been hidden.

8.4 Problem-based procurement Here, the designer is given a set of concerns and issues and invited to suggest a design to address them. The main failure mode here is that the characteristics of the problem owner–problem analyst conversation are not compatible with the consumer-supplier relationship implicit in a tendering and provision conversation.

9. SOME IMPLICATIONS FOR DESIGN 9.1 Agency The binding between individuals and responsibilities is a complex many-to-many relationship. We structure this using two distinct concepts. We have introduced the concept of role to classify the relationships held by an individual: an individual can hold many roles simultaneously, and a role may imply many related responsibilities. The concept of role is related to the structure of an organisation and is defined in organisational terms. Agency, on the other hand, is abstracted away from any actual organisational structure and is simply a collection of related responsibilities; how it maps onto work roles is a matter of organisational design and will change as the organisation changes. In particular, one particular agency may span inter-organisational boundaries, such as the consequential responsibility for a collision. The concept of agency allows a conceptual separation to be maintained as organisations change. For example, small organisations often combine the sales agency and the marketing agency in the same role; as the organisation grows, this is often a desirable separation to make. Since agency is a more stable concept than role, an information system based on agency rather than role is more likely to be capable of change as the organisational structure changes.
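As an illustrative sketch (the agencies and roles here are invented for the example), the stability of agency can be expressed by keeping the agency definitions fixed while only the role-to-agency binding changes as the organisation grows:

```python
# Agencies: stable bundles of related responsibilities.
agencies = {
    "sales":     {"quote prices", "take orders"},
    "marketing": {"run campaigns", "analyse markets"},
}

# A small organisation binds both agencies to a single role...
small_org = {"commercial_manager": {"sales", "marketing"}}

# ...while a grown organisation re-binds the SAME agencies to separate
# roles, without touching the agency definitions an information system
# built on agencies would rely on.
large_org = {
    "sales_manager": {"sales"},
    "marketing_manager": {"marketing"},
}


def responsibilities_of(role: str, org: dict) -> set:
    """Resolve a role to the union of its agencies' responsibilities."""
    return set().union(*(agencies[a] for a in org[role]))
```

The point of the sketch is that `agencies` is the stable layer: reorganisation changes only the role-to-agency mapping.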

9.2 Conversations Dealing with multi-organisational failure and its consequences requires communication and cooperation. This implies that information, as well as being about the state of a boundary object, is also the (partial) state of a conversation between the communicating and cooperating parties. This means that an information system is sometimes better reconceptualised as a communication system (and this in turn requires a reconceptualisation of communication and conversation), one that provides a basis for understanding failure modes. Conversations, and the relationships that they define, sometimes fail. The purpose of a theory of conversations is to explain the failures associated with the intentions of the participants. It is clear that the bringing together of obligations and responsibilities can create conflicts of interest as well as synergies. It can also create overloads and imbalances which could lead to failure in operation. In addition to failures of organisational policy and design, we have operational failures due to a lack of correspondence between the expectations of the participants. We have developed a theory of the attributes of roles and conversations that provides a basis for analysing such situations.

We also need an analysis of failures to perform the intended role by failing to generate correct information, by misinterpreting presentations, or by proffering incorrect or inappropriate resources. These failures would be accounted for in a theory of instruments. Finally, failures in reading, writing or transporting data are the province of a theory of communication.

9.3 The Need to Record Responsibility modelling raises three important information questions: What do I need to know? What do I need to do? What do I need to record to show what I have done? It is this last that becomes important when the possibility of failure raises questions of answerability. Recording can be seen as an anticipated conversation between a responsibility holder and an investigator. For example, one possible organisational policy is that consequential responsibility following a technical malfunction rests with the person who chose to deploy the malfunctioning device. This answerability could be mitigated by providing recorded evidence about the soundness of the choice process.

9.4 Boundaries and boundary objects Problems with systems are particularly likely to arise if the systems are intended to cut across organisational or professional boundaries. One reason why such problems arise is that the responsibilities on each side of the boundary are either undefined or incompatible. One design implication is the need for explicit representation of the nature of relationships across the boundary, identifying boundary objects, conversations and communication, and shared and non-shared responsibilities. Because boundary objects have differing interpretations on the two sides of the boundary, there is often a need for two distinct representations of the object. For example, to the train operator, a train is a unit of schedulable resource, which requires other schedulable resources, such as a trained driver; essentially, the need is for a static representation. But for the infrastructure provider, a train is a dynamic unity defined by a head code and a moving point on the track; the need is for a dynamic representation. Tying the two different representations together is not, to be sure, an insuperable problem, but it does present a certain complexity in system design.
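A sketch of the two representations, with invented field names, might look like this; the design complexity lies in the correlation that ties them together:

```python
from dataclasses import dataclass


@dataclass
class OperatorTrain:
    """Train operator's view: a static, schedulable resource."""
    service_id: str
    rolling_stock: str
    driver: str              # itself a schedulable resource


@dataclass
class InfrastructureTrain:
    """Infrastructure provider's view: a dynamic unity."""
    head_code: str
    position_km: float       # a moving point on the track


def link(op: OperatorTrain, infra: InfrastructureTrain,
         table: dict) -> None:
    """Tie the two representations together via a correlation table
    keyed on shared context."""
    table[op.service_id] = infra.head_code
```

The correlation table is the minimal version of the "tying together" the text describes; in a real design it would have to survive failure on either side of the boundary.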

9.5 Monitoring Not everything need be monitored. Obviously, if failure is likely to be catastrophic, fault tolerance and recoverability measures are important. But if the consequences of failure are likely to be merely inefficiencies, resources for planning and implementing monitoring are best spent where the structures of the system or its associated responsibilities cross organisational boundaries, since it is there that disputes are both more likely to arise and more difficult and costly to resolve.

9.6 Audit trail
It is unusual for information systems to have the capability to record things that happen in the social domain, such as delegation. It is in the interest of agents who hold consequential responsibility that the audit trail is correct, reconstructable and complete. For example, one possible organisational rule is that consequential responsibility following a technical malfunction rests with the agent who chose to deploy the malfunctioning device. This answerability could perhaps be mitigated by providing evidence about the soundness of the choice process, including those aspects of it that took place in the social domain.
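One way such a trail could be kept correct, reconstructable and complete is to make it append-only and hash-chained, so that later edits or deletions are detectable. The sketch below is purely illustrative (the field names and the two example entries are invented, not taken from any system discussed in this paper); it records social-domain acts such as delegation and deployment decisions.

```python
import hashlib
import json

class AuditTrail:
    """Append-only log of social-domain acts (delegation, deployment choices)."""

    def __init__(self):
        self.entries = []

    def record(self, actor, action, detail):
        # Each entry carries the digest of the previous one, forming a chain.
        prev = self.entries[-1]["digest"] if self.entries else ""
        core = {"actor": actor, "action": action, "detail": detail, "prev": prev}
        digest = hashlib.sha256(
            json.dumps(core, sort_keys=True).encode()).hexdigest()
        self.entries.append({**core, "digest": digest})

    def verify(self):
        """Recompute the chain: any tampering or deletion breaks it."""
        prev = ""
        for e in self.entries:
            core = {"actor": e["actor"], "action": e["action"],
                    "detail": e["detail"], "prev": prev}
            expected = hashlib.sha256(
                json.dumps(core, sort_keys=True).encode()).hexdigest()
            if e["prev"] != prev or e["digest"] != expected:
                return False
            prev = e["digest"]
        return True

trail = AuditTrail()
trail.record("ward_manager", "delegate", "device checks delegated to nurse B")
trail.record("nurse_b", "deploy", "infusion pump #7 deployed after checklist")
assert trail.verify()
```

The point of the chaining is precisely the anticipated conversation with an investigator: the holder of consequential responsibility can show not only what was recorded but that the record has not been quietly revised.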

9.7 Design for recoverability
There are two main classes of strategy for recoverability: backward recovery is the restoration of a previous state known to be safe, usually saved as a checkpoint or archive; forward recovery is the forcing of the system into a known safe state.

Backward recovery is not always possible for systems that are closely coupled to the world: although the system can be rewound, the world often cannot. We shall therefore concentrate on some design considerations for forward recovery. There are many risk assessment methods available for analysing possible vulnerabilities (weaknesses in the system), hazards (potential threats in the environment) and risks (decisions concerning what to do, or what not to do, about the vulnerabilities and hazards). However, one important strategy for recovery after a failure is diversity: not putting all your eggs in one basket is as important during recovery as it is during normal operation. Remember that independent systems of the same functionality may well not fail independently (e.g. having a second driver in the cab may not help if both have been on the same defective training course).

10. SUMMARY AND CONCLUSIONS

10.1 Responsibility Modelling
Focussing on responsibility is to make a distinction between who is responsible for performing a role and who (or what) actually executes it. We advocate that looking at responsibilities is a better guide for designing information systems than looking at executions, since it allows analysis of the problems that can arise when responsibility has not been clearly allocated, when those responsible do not or cannot actually perform the role, when responsibility cannot be enforced because of a lack of corresponding authority, when communication between jointly responsible actors is difficult or impossible, and from many other causes of organisational failure.

So far, three possible uses for models of responsibility seem to be emerging:

1. During planning/procurement/requirements, when there is a need to clarify the responsibilities of the different actors in the system, especially where multiple organisations are involved.

2. During an enquiry, when there is a need to find out who takes the blame and (perhaps) who should have done something.

3. During system operation, when a problem arises and there is a need to find out who needs to know and what they need to know.

10.2 Uses of ethnography
Ethnographic methods can be used to identify organisational procedures, different types of responsibility, the agents who hold those responsibilities, patterns of communication and organisational boundaries, as these actually exist as opposed to what is supposed or thought to exist; they can thereby be used as a basis for discussion about change, for example following a failure.

10.3 Organisational Complexity
Organisational complexity requires an ICT system design method which recognises that multi-organisational systems need to extend current methods in the way they deal with failure. Three examples seem particularly important. 1. Procurement processes which are based on the single-organisation assumption may not work well. 2. Failures which can be traced back to errors in the sharing of responsibility are going to occur, and the recovery procedures also have to be designed for a multi-organisational context of use. Where consequential responsibility is unclear, the social and legal processes require more information than just that immediately prior to the triggering event, and the nature of the contract between the parties may have implications for existing (or non-existing) systems. 3. Information is often best regarded as a partial state of a conversation, and understanding the nature of the conversation is needed to construct the multiple contexts of generation and interpretation.

11. ACKNOWLEDGMENTS
We wish to thank all those who have participated in discussions with us at Lancaster (particularly Mark Rouncefield, Guy Dewsbury and Ian Sommerville) and Newcastle (particularly Mike Martin and Ros Strens).

12. REFERENCES
[1] Checkland, P. (1981). Systems Thinking, Systems Practice. Chichester: John Wiley.

[2] Checkland, P. and Scholes, J. (1990). Soft Systems Methodology in Action. Chichester: John Wiley.

[3] Clarke, K.M. and Hardstone, G. (2005). Trust in Technology. Kluwer (in press).

[4] Jonas, H. (1984). The Imperative of Responsibility. Chicago: University of Chicago Press.

[5] Lucas, J.R. (1995). Responsibility. Oxford: Clarendon Press.


‘That’s How The Bastille Got Stormed’: Issues of Responsibility in User-Designer Relations

Dave Martin and Mark Rouncefield
Computing Department, Lancaster University, Lancaster, UK
+44 (0)1524 510348

ABSTRACT
This paper presents data and analyses from a long-term ethnographic study of the development of an electronic patient records system in a UK hospital Trust – TA ‘Dependable Deployment’. The project is a public private partnership (PPP) between the Trust and a US-based software house (USCo) contracted to supply, configure and support their customizable-off-the-shelf (COTS) healthcare information system in cooperation with an in-hospital project team. We use data drawn from our observational studies to highlight a range of responsibility issues in designer-user relationships.

Keywords
Responsibility, ethnography, user-designer relations, integration, healthcare

Vic – “X has drawn my attention to upcoming changes in procedures – it is important that these are done before go-live so they are not associated with the system. If they are done before go-live, the system will be seen to automate and speed this up. If not then you’ll have a revolt and that’s how the Bastille got stormed.”
Barney – “They had an EPR (electronic patient record) in the Bastille?”
Vic – “Yes they did.”

1. Introduction: Responsibility & Design: ‘Life Is Shit, Organised By Bastards’
Ever since the much heralded ‘turn to the social’ in systems design, the responsibilities of, and relationships between, users and designers have been held to be of crucial importance in both designing and deploying information systems. Research and experience appear to have produced a common ethos – if not a cosy shibboleth – in HCI and related disciplines (e.g. CSCW and PD): that it is part of the designers’ responsibility to understand those they design for, to understand their work, and to build systems with users and other stakeholders participating. In HCI a proliferation of techniques and methods for understanding the user and their work, and for involving them in design, have emerged to enable designers to discharge this responsibility. But whether these ideals about responsibility ever work out in the ‘real world, real time’ practice of developing and deploying multi-million pound IT projects remains debatable.

Quite how designers might discharge their responsibilities to users is itself a topic of dispute. Out of a miasma of ideas, beliefs and approaches, ideas have emerged that inform our understanding of the relationships and responsibilities between systems designers and systems practitioners – the notions of designing both for the user and with the user. In this paper we point to various features of the relationship and responsibilities between users and designers to consider what designing with and for users means in the context of an electronic patient record (EPR) development in a hospital trust in the North of England. In so doing we sketch out some issues in user-designer relations and responsibilities and suggest how the ‘Janus faces’ of design (Bowers) have multiplied and become ever more intricate. We use our ethnographic observations to suggest that this is further complicated by complexities over exactly who the users are and how they can be represented and accommodated within the design process; to the extent that, to the jaundiced eye, and to the hard-pressed designer, getting users involved often appears to be the beginning and the cause, rather than the end and resolution, of design problems. The ‘real time, real world’ issue then becomes exactly when and how designers (and users) wish to face up to and address these responsibilities and these problems, perhaps best characterised in Arthur Smith’s heartfelt and resonant phrase, “Life is shit, organised by bastards”.

2. NHS Modernisation and Computerisation
The National Health Service (NHS) in England is currently undergoing a major period of upheaval, ‘modernization’ and computerization (a process that has been going on in different guises since the 1980s) (Bloomfield and Vurdubakis, 1997). In this paper we focus on moves to provide comprehensive, integrated computer support through developing and deploying electronic patient records (EPRs), which all NHS Trusts are required to develop in the next 5-10 years. The systems are envisaged to enhance medical work not only through better information (accessible at the point of service, more timely, better quality etc.) but also through better support of best practice and decision support, as well as providing the means for integrated working (for commentary on the process, problems and evaluation of current EPR systems see Ellingsen & Monteiro, 2000; Hanseth & Monteiro, 1998; Hartswood et al., 2001). Trusts are on a trajectory that requires them to integrate their services electronically with other care providers in their area. At the same time they are required to provide core sets of data expressed in particular ways for national purposes. Integration is then not just a problem for individual Trusts but one that must be worked out in relation to requirements for regional integration with other services, and national integration.

The UK Government has instituted a programme to deliver the systems required to achieve this – the National Programme for IT (NPfIT). Local NHS Trusts will work in concert with the local service provider (LSP), who will provide a suite of products (not necessarily all their own) to be configured to the individual Trusts’ requirements. As yet the exact contractual and working relationship between Trusts, LSPs and suppliers is not entirely clear, but we know that the relationship is one of public private partnership (PPP). In a PPP the private company is contracted to supply, implement and maintain the Trust’s systems for a given period of time (usually 8-10 years). Currently, this programme is in its infancy – the LSP contracts have been awarded and work is beginning but is still at an early stage. When the LSP programme was announced, certain Trusts that were deemed special cases (i.e. where they had already signed contracts with suppliers and their procurement process was judged to have been sound) were allowed to continue implementing systems outside of the LSP programme while having to conform to national guidelines. This study focuses on just one of these, based in the North of England – the ‘Trust’. In August 2002, the Trust signed an £8.3 million, nine-year contract with a US software provider to supply, implement and support an EPR system. The Trust currently comprises three hospitals and the system is due to be delivered in three phases. Phase 1 (a core administrative and reporting system, theatres, A & E, radiology and links to legacy laboratory applications) is due to ‘go-live’ this February (2005) after being delayed a number of times since last February.

The second and third phases will bring other specialities and GPs on-line, automated pharmacy applications, care pathways, decision support and so on, turning the system into a full-scale EPR. Our ethnographic study began in May 2003. We were provided with an interesting opportunity to gain access to the design team as they progressed the design: attending meetings of many sorts involving the project team (and particularly the project manager), shadowing, attending testing and so forth, and collecting a wealth of material (field notes, tape recordings and various documents). The implementation team – the Trust analysts to whom we mainly refer throughout this paper – is made up of an analyst for each of the system areas/modules (e.g. theatres, A & E etc.). It is the analysts in the implementation teams who carry out most of the day-to-day systems work – specifying what the build of the database should be and then carrying it out, demonstrating it to ‘users’ and then refining, re-building and so forth. Each analyst is part of a wider team comprising a Trust analyst, a USCo analyst, a team leader (a manager from that area) and various ‘users’ (medical and administrative staff of various ‘jobs’ and levels). Importantly, while this was happening ahead of the schedule of the main NHS programme, it is a very similar situation to that which many other Trusts will be experiencing over the next few years, and most of the other NHS EPR projects will have a similar configuration of players and technologies involved. An outside (often international) supplier will provide a customizable-off-the-shelf (COTS) EPR system to be configured for the particular Trust. This may well be integrated with other specialist legacy applications (particularly, for example, for laboratory work), some of which will have different suppliers. The business of building and configuring the system will be managed in partnership – i.e. by a joint project team involving members of the Trust and the supplier. Of course the situation is more complicated, since the supplier’s analysts and designers essentially act as intermediaries between the Trust and their employers. The design of any system for an individual Trust is likely to encounter limits as to how much the supplier will want to tailor the system for a given client. The issues we indicate are likely therefore to be generalisable across a number of EPR projects, and may well have relevance to COTS systems in general. We therefore attempt to make some general points about the complexities of user-designer relations in design and project work: the issues of multinational cooperation in development and deployment, and how COTS systems get tailored in massive commercial projects. We also point to how issues of project management, usability and integration are influenced by such relations within a ‘real time, real world’ commercial project, where ‘time is money’.

3. Designers, Users and Responsibility: Contractual Relations
The contract is a massive document, developed throughout the four-year procurement phase and ‘finalized’ in August 2002 when the Trust signed it with the US-based supplier USCo. It has since gone through a couple of official larger-scale ‘change contract’ revisions and numerous minor alterations. When we originally started the fieldwork the project manager – ‘Helen’ – pointed it out on her desk, patted it and said, what seemed truthfully and ruefully, that it was her “Bible….. and her bedtime reading!”. Although this section is about contracts, it is not about ‘the law’ regarding contracts, the construction of contracts or executive-level contract negotiations, although these too would be interesting topics. Rather, it is about everyday design problems and how ‘the contract’, or what is assumed to be in ‘the contract’, or what is involved in meeting the contract, figures in project work. It is surprising how rarely ‘the contract’ appears in research on user-designer relations, given our routine observations that reference to it is a persistent feature of the design process. The ‘contract’ – the formal, legal stipulation of work and responsibilities – gets dragged into everyday work and used in a number of ways. It provides a formal framework within which, and in reference to which, user-designer relations get worked out in practice, for, as with any ‘plan’ (Suchman), how the contract gets worked out in a contingent and rapidly changing world is a product of intense negotiation. In this project a continual feature of the relationship between designers (and designers and users) is the on-going negotiation, by reference to the contract, over where work is, what work is required, and who should undertake it. Certainly some work specification and allocation is relatively unproblematic.
Problems may occur as the requirement for extra work emerges during the development process (as is common), and it may have to be portioned out. When negotiation occurs both sides have room to manoeuvre and they may trade work activities. During such discussions it is common to invoke the ‘contract’ and take recourse to its specifics. In implementation team meetings, discussions involving the ‘contract’ are relatively commonplace due to its importance in specifying responsibility – who is formally responsible for what – as illustrated in the following quotes taken from talk between the UK analysts and project manager:

“…you can bet that he went back and checked on the contract right away and he was the one who actually pointed out to me that it was in the contract so.. he was going to speed this through”

“.. why are they talking to us about cost?.. contractually its on USCo's head”

Attention to the detail of the contract ensures that the organization, through the project team, effectively ‘covers its bases’ – or fulfills its obligations – ensuring that any (inevitably costly) breakdowns cannot be attributed to the project team or the organization it represents:

“….we have to be very pro-active and keep emailing your analyst and say what do you want me to work on? what d’you want me to do? ..-I’m getting nervous for a variety of reasons .. I’m just not sure what they’re going to throw back at me .. just want to make sure we’re .. covering our bases as well…”

The contract, like any plan, does not and cannot lay out in endless detail exactly what it takes to fulfill it. Ambiguities regularly arise over the definition of actions, such as what the nature of ‘participation’ versus ‘direction’ should be during the phase of configuration:

“..this goes back to the issue of..
whose responsibility is it to do certain things with setting up and configuration .…the expectation has always been that well we would participate in configuration… it was on the understanding that they would be directing that configuration” (UK analyst)

While the UK Project Team may feel that they sometimes end up with more and different work than they read into the contract, in a similar manner the contract offers them possibilities for finding flexibility within the formal contractual limits (what Bittner [4] might term ‘organisational acumen’) to ensure they get what they want:

“…its important that we are getting the things that we require within the contractual limitations and y’know I understand that we have to work within that but if also within that we need to make sure we are getting what we require” (Helen, UK project manager)

While the contract constitutes the ‘official’ documentation for specifying activities and responsibilities, the Project Team also use a variety of other means to ‘try to get the best deal’, as shown in the following discussion of the media manager product (for managing images, e.g. from radiology) between Helen (UK project manager) and Peter (senior UK analyst):

Peter – “what functionality is required, we seem to be getting a lightweight version but we want as much functionality as possible.. we have been given less than we were demonstrated”.
Helen – “Let’s see if we have a hard copy of what was demonstrated to aid in negotiations”.

Helen and Peter discuss how their version of ‘media manager’ seems to have less functionality than that which they were demonstrated, but that if they can find a copy of the demonstration this may aid in asking for more functionality (for the money, one assumes). Thus, not only is the official documentation of the contract used as a bargaining tool, but ‘unofficial’ artefacts like a CD-ROM demonstrator can also be used for this purpose, to gain leverage on the ‘good faith’ of a supplier.
Contractual and quasi-contractual issues also impinge on user-designer relations in other ways, in particular through the notion of the ‘sign-off’, in that ‘sign-off’ can provide ways of keeping users on board while effectively providing contractual protection for designers. The next excerpt is taken from a discussion between Gail, the UK patient administration system (PAS) analyst, and Alice (her US counterpart). It provides an insight into the way the relationship between users and designers is managed. Gail begins by stating that it is of ‘crucial importance’ to get the administrative system build ‘validated by the data management group’. Alice’s comments are particularly revealing in that she describes the reason for getting the system signed off as being to ‘protect the analyst’ (the UK analyst) from complaints they might receive about aspects of the system during later stages of design.

Gail – “PAS, crucial importance of getting it validated by the data management group.”
Alice – “…..the importance of buy in.”
Gail – “Do I have to fill out a sign off form for each waiting list”.
Alice – “No – the reason for sign off is to protect the analyst because without it you can get complaints on procedural changes during testing and go-live… you need to ensure buy in through use of these documents with expert and superusers”.

Interestingly, this process is not described in terms of making sure the design is ‘correct’; rather it is described as ensuring the users have officially signed up to the design, because this undermines any basis for user complaints later on. In this way we see that the design team limit when users can have input into design and what that input will be. Of course, ironically, it may be, and often is, the case that users only achieve the requisite levels of skill and understanding of the design, and of how it will impact on their work, towards the end of the design process. This, of course, leads to new requirements coming along late in the day, often when the design has progressed to a stage where these are hard to accommodate, or at least to accommodate with any level of elegance.
Given that this may be a commonplace feature of design, official ‘sign offs’ effectively limit possible disruptions later in design (or at least make them more obviously available to monetary renegotiation). Clearly the contract, and what lies within it, is not a passive document that unproblematically prescribes a division of tasks and labour for the development and deployment of the system. The contract has to be worked with during design as its shortcomings become apparent, problems emerge, new requirements come on line and so forth. When discussing the purpose and use of organizational rules, Bittner (1965) urges that research should progress from noting that organizational practice does not and cannot be in “strict obedience” with the letter of rules and procedures, to instead looking “…to the investigation of the limitations of maneuverability within them, to the study of the skill and craftsmanship involved in their use….” In this study we have sought to echo this programme, but instead of looking at organizational rules we have considered the practical use of the contract – a description of tasks, duties and responsibilities as distributed between a supplier and customer in cooperative partnership. Bittner continues to define organizational acumen as follows: “Extending to the rule the respect of compliance, while finding in the rule the means for doing whatever need be done, is the gambit that characterizes organizational acumen.”

Drawing on this, we might consider that a key feature of ‘acumen’ in project management would be the ability to draw on one’s knowledge and skills to masterfully achieve the system one requires within the limits stipulated in the contract. Clearly the details of the contract always require elaboration into actual work, in practice. The ability to skillfully elaborate what the contract should mean in terms of work tasks and their allocation, for the benefit of one’s organization, and successful bargaining over the contract, are doubtless requirements for project managers in these situations.

4. Design For Users: Identifying Problems
Although users have direct access to the analysts and designers, a lot of design, and decisions about design, nevertheless have to be taken in their absence. It is consequently interesting to explicate some of the ways in which users are considered in design meetings – how responsibility to users is factored into the accomplishment of the meeting. Design meetings are often about sorting out problems, where the issues often become: ‘who are our users and how do we get worthwhile cooperation?’; ‘what type of user problem is it and how do we solve it?’; and ‘whose problem is it and how do we evidence it?’.

4.1 Taking Responsibility: Who are our users and how do we get worthwhile cooperation?

In this example Barney (a senior UK analyst) relates his difficulty in getting the information he requires to build the clinic scheduling application for the new system. He acknowledges the diversity of his user group and the need to include ‘many different users’ in testing, but his design problem is that he does not have the ‘correct’ information (it is incomplete and in the wrong format) on current process and practice on which to base a new design, and he seems unable to access users who can provide him with the information he requires. For him part of the frustration has been that he does not know if he is just talking to the wrong person, whether nobody actually has this information, or whether users are deliberately withholding information. Alice (US analyst) suggests that the problem should be escalated (‘to the IM & T steering group’ – upper management) as a means of putting pressure on hospital staff to cooperate ‘properly’ with the designers.

Barney – “For this area we need many different users to test as it is different for different areas. I’m basing the build on call centre information. There’s a problem that the build comes from either PAS or how you do it. Information has not been provided in full or in a format to be used so I think I will just have to go on how PAS does it.”
Alice – “I think this has to go to the IM & T steering group”
Barney – “We wanted to set up clinics the way they work – it would have been magnificent, but have to go to PAS instead. No-one in this hospital is capable of providing a list of clinics.”

Barney formulates the problem as one in which the users are ‘shooting themselves in the foot’, i.e. if he could have received the correct information the new system would have been ‘magnificent’ for the users. This prompts Alice to describe this problem as an instance of a more general difficulty in the design – that the current situation is one where departments or areas operate as ‘silos’, and that this is having a knock-on effect in achieving the desired integration of work processes to produce ‘enterprise wide scheduling’:

Alice – “Enterprise wide scheduling would be full integration of a series of procedures, bringing resources together in the ‘correct’ order to support care…. the system would automatically work out what can be done, when… indicate what is required, as opposed to scheduling that is not seamless across procedures.”

Thus, the current situation of design is contrasted with design ideals, and the lack of achievement of these ideals is attributed to the users rather than the designers.

Alice – “We need to make a cut-off date.”
Barney – “I could do it, all I need is a correct, full data set…..Other jobs got in the way of chasing up the data.”
Alice – “There’s a real problem of the validation of the data set”.
Helen – “There’s a problem of change management going on in the Trust right now, particularly in the call centre, there are disputes over how things are currently done and the requirements for modernisation.”
Barney – “Well I’m not going to worry about other people giving me the right information as long as its signed off.”
Alice – “But I must stress the importance of buy-in from the most tricky people and areas during QA testing.”

While Barney re-iterates that it is only a lack of the required information (‘a correct, full data set’) that is stopping him from achieving the design, Alice indicates a problem of ‘validation of the data set’ – when users sign off the data set for the design. Clearly, if there is disagreement amongst users about the data set, such that it has been difficult to collect (for whatever reason), then there may well be problems getting it signed off. If it is not signed off, then there may be problems progressing to the subsequent stages of design. This leads Helen to reformulate the problem as illustrative of organizational struggles to do with ‘change management’ and ‘modernization’, and therefore as a problem not necessarily to do with the EPR project alone. Of particular interest here is the manner in which the designers treat the users as troublesome, and the way design involves trying to control when and how the users will be involved. Users are meant to be cooperative in providing the required information that will eventually benefit them, by helping to design a suitable new system. However, because of their intransigence to change and integration, they are seen to be resisting the new system. There is also a concern to ensure that user complaints are minimized during later stages of the project (and this is seen as a real danger), and that this must be achieved by keeping them on board at this stage. But user involvement is not always welcomed (since user involvement can actually inhibit testing, by providing comments that are extraneous to the job in hand).

4.2 Responsibility: What type of problem is it and how do we solve it?

In the excerpt below (from a UK analysts’ meeting) the discussion begins with the A & E analyst (Bob) discussing with Lenny (the pathology analyst) the problem that A & E staff may not remember to log out of the system if they are called away suddenly to an incident. It is interesting to see how this problem is formulated. Bob begins by suggesting that since in current practice staff do not log out, they will not do this with the new system. Lenny responds by suggesting that the new system might log users out quickly anyway once they had stopped interacting with it. Bob then raises the problem that another user might then use the system under the previous person’s signature. This would be a concern for both security and the integrity of records.

Bob – “Because if they’ve got to log out people will not log out of it they don’t now ..”
Lenny – “But maybe they won’t have a chance because the log in time out will...”
Bob – “Well I understand that .. but if it doesn’t time out before someone gets their hands on the keyboard, .hh that next action is taking place under someone else’s signature”
Lenny – “Mm hm”
Bob – “And that’s a problem”
Helen – “Mm hm it is a problem”
Bob – “And in A & E, in that chaotic, you know, environment, they will not log out”

The discussion continues as to whether the problem can be solved technically. Firstly, the analysts discuss whether an optimum time-out can be set, but dismiss this because the shorter it is (which would suit security), the more problems it creates for usability (users would inadvertently be logged out when they stopped typing). They also discuss the possibility of using a plug-in key device or biometrics for access and authentication, but these are rejected for other reasons. We return to the conversation as the project manager (Helen) proposes her ‘solution’.

Helen – “Well and again that is something I mean again this is one of the reasons why we’ve asked for the IT trainers here as well so that this is ... yesterday I met with the IT trainers and we started talking about some of the issues that we need to make sure that everyone is aware of .. this is one of the key ones .
making sure that people log out and understanding the implications because in a fact it’s an electronic signature, and that’s going to give a print, of where you’ve been on the system and if you don’t log out you’re allowing someone else to use that that signature” Bob – “But it’s not a training issue.. the fact is that the log out procedure will not be looked upon as important as treating a patient” Helen – “Sure” Bob – “And in that environment they’re not going to turn round, and log out, every time they walk away from a PC, I can guarantee that” Helen – “Yeah so .. we need to to look at it.. I agree it’s not completely a training issue I do think it is partially a training issue” We can see in this example one of the ways in which user problems become issues for design. For analysts there is an on-going consideration of what the design is and how this corresponds to their understanding of the work done in the area they are responsible for. Through their discussions with users and observations of work they make decisions about the fit of the system to work practice and raise them as problems

when the ‘fit’ is considered bad. The system logging on and off procedure is described as a bad fit with the actualities of A & E work – where other duties will sometimes take priority over logging out. The team search for a technical solution and, interestingly, when no workable technical solution is found Helen re-casts the problem as another type of problem – as a problem of current practice – and therefore something to be dealt with by a change in practice. The solution is to be implemented by training that stresses to the users that their personal integrity with the system is compromised if they do not log off. This new conception of the problem, however, is modified by Bob when he re-iterates that other matters naturally take priority in Casualty, suggesting that it would not be a question of staff deliberately going against what they were trained to do. Here, what is particularly interesting is the ‘mobility’ of problems and solutions. Problems of usability can be problems to do with the system or to do with the users. In this case the problem is set as a ‘system not fitting in with the users/users’ environment’ problem. However, when no easy technical solution can be found it is re-cast as potentially being a user problem – ‘intransigence to change’ to put it bluntly. But in this case the solution of ‘training’ is rejected and we reach (for now) an impasse on how it will be solved. In general technical solutions are preferred as they ‘solve’ the problem, while there is always doubt about how well training will stick and how well users will adapt. However, it is worth noting that when a technical solution is not found (even if the team agree it is a thoroughly technical problem) it inevitably becomes a problem to deal with through user adaptation (hence why workarounds proliferate during the course of a project).
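The time-out trade-off that the analysts weigh can be sketched in a few lines of code. This is purely an illustrative model – the class, names and parameters below are ours, not a description of the EPR system’s actual session mechanism: a short idle time-out protects the integrity of the electronic signature (the next person at the keyboard cannot act under the previous user’s identity), but it also logs out a legitimate user who merely pauses.

```python
import time

class Session:
    """Illustrative sketch of an idle time-out (hypothetical, not the EPR design).

    A short timeout suits security: a stale session cannot be used by
    the next person at the keyboard. But it hurts usability: a user who
    simply stops typing is logged out -- the tension Bob and Lenny debate.
    """

    def __init__(self, user, timeout_seconds):
        self.user = user
        self.timeout = timeout_seconds
        self.last_activity = time.monotonic()

    def interact(self, action):
        """Return the signature the action is recorded under,
        or None if the session has already timed out."""
        now = time.monotonic()
        if now - self.last_activity > self.timeout:
            return None          # timed out: action refused, no signature
        self.last_activity = now
        return self.user         # action recorded under this signature

# Bob's scenario: he is called away mid-task without logging out.
s = Session("bob", timeout_seconds=0.2)
assert s.interact("open record") == "bob"  # immediate action: signed by Bob
time.sleep(0.4)                            # Bob is called away to an incident
# Someone else reaches the keyboard only after the timeout has elapsed:
assert s.interact("edit record") is None   # refused, not signed as Bob
```

Shrinking `timeout_seconds` narrows the window in which a second person could act under Bob’s signature, but increases the chance that Bob himself is locked out during a natural pause – which is exactly why the analysts conclude there is no purely technical optimum.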

4.3 Responsibility: Whose problem is it and how do we evidence it? In the previous example log-out was readily accepted as a problem, and while there was a discussion of how it could be technically solved, there was no specific discussion about whether this was the responsibility of the US or UK analysts. In the following example (taken from a joint US and UK analysts’ meeting) we can see that these responsibility issues do enter into analysts’ talk, as well as discussions of the means for evidencing problems in the ‘correct’ fashion for the correct audience. The extract begins with Lenny (UK pathology analyst) discussing how the data entry process for laboratory access to the new system is not ‘slicker’ and ‘smoother’. The problem he refers to is that lab staff are being asked to input five items of demographic data, when previously they only had to input a single code. In consequence the new system will be less efficient, produce bottlenecks and therefore users will view the system negatively.

Lenny – “If the data entry process does not work in a smoother, slicker fashion there will be bottlenecks which will slow the process and cause problems… we already attract criticisms and problems with GP ordering which will be manually input... It sounds like 5 steps when currently it is only one step – we only take one code”.

In the next part of the conversation Vic explains that the reason for requiring the 5 demographic details is that the application (a GP (doctor) finder) is generic to the system and requires five items for the Commissioning Data Set (Government requirements). Thus, the reason for the ‘problem’ is the requirement to produce an integrated system in line with Government requirements. (Interestingly, ‘for the purpose of integration’ and ‘for CDS (NHS/Government) requirements’ progressively become the most prevalent ways designers (both UK and US) account to users for why they must do more work, or why the usability is not what is desired.) This view is partially rejected by Alan (pathology team leader) who takes up the issue of integration but lodges it firmly as a supplier rather than a user problem: it is the supplier’s problem to achieve integration while maintaining the same level of service.

Vic – “You need to have the ability for other areas of the system – what should be easy is a problem because you risk the CDS integrity”.
Alan – “Integration is the number one job…it’s how systems will become part of the family… it’s an issue for USCo, fitting legacy lab applications to the EPR”.
Helen – “Can someone take a stop-watch and time this?”
Alan – “It will take twice the time, more personnel and over 100,000 transactions you can imagine… it takes Lenny longer and he knows what he’s doing”.
Helen – “We need the timing so we can take it up as an issue”.
Alan – “It’s the same thing for Bob and A & E, it has great importance for system success, if inputters aren’t happy, the department’s not happy”.

While Helen asks how long it takes to input the data, so it can be taken up as an issue with the appropriate people, the excerpt finishes with Alan stating that the problem is the same in other departments (A & E), and re-iterating that user attitudes to the system are important for any successful implementation. This builds on the previous example in illustrating the different ways in which a problem is cast, how users’ interests (different users’ interests) are represented by designers, and how problems are tailored to various audiences.
Here the problem is framed and measured in different ways – firstly by Lenny as an efficiency problem that would lead to an interrupted process viewed negatively by the individual users. Vic responds by suggesting that it is inevitable due to the need to integrate processes and to meet NHS requirements (the organizational user), essentially suggesting that it is not a problem to be solved by the supplier. This is turned around by Alan when he suggests that problems of integration are problems for the supplier. Helen responds by asking for the problem to be timed – so she can make a case to her superiors (this is the route used to put pressure on the supplier when problems are deemed serious). Here we see some of the ‘escalation’ techniques used to get a problem identified, categorized and accepted, and how the user is represented in this process: for example, by concentrating on individual users, as making sure they are happy is an important principle in this design; by scaling the problem up by looking at the bigger, organizational picture (100,000 transactions); or by suggesting that the problem is more widespread (it also affects other areas) than the doubters might consider.

5. Design With Users: Discussions With Users So far our examples have dealt with users at second hand. They have shown how the design team seeks to understand and reason about the work of users, how such work fits with the developing system, how to understand what types of problems are thrown up during this process, and how they can be

appropriately managed. We have also seen how user involvement is partitioned to particular areas and times in the schedule of design, and how users are dealt with as something that can be problematic to the design process if allowed or involved in the wrong place, at the wrong time. Now we turn to situations in which users are specifically involved – in this case in QA (quality assurance) and integration testing. Here the main questions posed by users centre around the fit with current working practices, the reasons and justifications for the particular design, and the likely training demands to learn to use the system. Such discussions can be awkward for the design team since their scope extends beyond the individual user or user group experiences to touch on difficult issues of system integration.

5.1 User problems: How does this fit with our work, why is it designed like this, how do we learn to use this? The following excerpts highlight many of the common types of user concerns that arise and how they are addressed. In the first, two of the US staff (Vic and Brad) are ‘walking’ two of the A & E super-users (Jenny and Brian) through clinic bookings for their department. Here Jenny is evidently unhappy with the fact that to go from one step to another in the workflow ‘you have to go through seven screens’. Brad, currently demonstrating the process on a computer, responds that there is a shortcut to avoid the long sequence of key strokes. Jenny replies by re-stating the problem as one where complex sequences of interaction are required for simple tasks. Brad replies by saying ‘that’s the way it is’. This comment is taken up by the senior US analyst (Vic) who provides a fuller explanation of why the interaction proceeds as it does – for the purposes of collecting the data they are required to by the NHS. He also describes how a series of alternative solutions to this problem were tried, listing the reasons why they were not taken up. Following this Jenny poses a few more questions about functions important to a ‘typical’ A & E worker, asking whether they are supported by the system.

Jenny – “There’s one field to fill in but you have to go through 7 screens to get to it.”
Brad – “But you can just F7 to get to the field.”
Jenny again voices their concern about the amount of time it takes to carry out actions – complains about “having to do x clicks to carry out simple tasks”.
Brad – “… that’s the way it is..”
Vic – “It’s required for the A & E CDS….. A & E visits need to be counted as clinics.” – thus mirroring other aspects of hospital work (i.e. so they have a generic form). Vic then explains why other options would not work.
Jenny – “Can we see a day’s schedule… can we tell who’s had x-rays.. how do we change an appointment”.
The next comment comes from Brian, pointing out some buttons on the screen and asking whether they will be using them. Since the system is an integrated one, there is a possibility that for an area there will be functions that are not required (or extra functions may be required). As the subsequent comment by Vic suggests, the system may be fairly easily tailored in this respect.

Brian – “I’ve a question about the buttons… do we use these (and points to some of the buttons).”


Vic – “We’ll have to check whether they have any values or we might be able to switch them off.”
Jenny – “This is the first time I’ve seen a clinic, before they’ve never been working so I’ll need to go back and practice it.”
Helen – “You need to fit in with the Trust that’s why it’s like this.”
Brian – “But it’s a problem that fitting in with the Trust involves more work.”
Helen – “Anything we can streamline we will… in the future with USCo… and you have to realise the importance of data gathering and sharing information across the Trust.”

Helen adds to Vic’s point about NHS requirements by stating that another part of the reason for the design is to ‘fit in with the Trust’, i.e. for the purposes of integration. Brian responds by stating what might be considered the classic problem between designing to support local practice and the constraints placed by needing to integrate processes – meeting the demands of integration is seen as a problem when it means extra effort by local users. Helen promises future efforts to ‘streamline’ things before again stating the case for integration. But then Jenny persists in describing her concerns with the new system:

Jenny – “I’ve been trying registration for months and have a problem of getting lost and not knowing where I am and I’m worried about how much training for our receptionists will be required.”
Vic – “Could you drive (control the computer) and show us where you are getting lost?”

Jenny notes that even though she has been practising ‘registration for months’ she still has difficulties, and these involve ‘getting lost’ on the system. To her this suggests the proposed training for receptionists may be insufficient. This triggers a discussion regarding the interfaces and interaction sequences required by the new and old systems. The old system simply took the user through a series of screens which they filled out item by item. The new system requires navigation back and forward and in and out of menus.
For Jenny and Brian the new system is harder to learn, less straightforward and easier to get lost/confused with. Finally, Vic and Helen reiterate their comments about the need for organizational and systems integration, and that the information is required by the Trust:

Helen – “This is a Trust wide system, you get the benefits of the information gathering of other people so you need to do this….As a teaching hospital we need to do research so we need good data…since there are no A & E people on the PAS team I’ll now put you on as stuff like this is a PAS requirement so it will help you to understand and keep informed of decisions.”
Vic – “If a patient is sent to A & E from elsewhere you won’t need to fill in these details as they will have been done elsewhere so you do get benefits.”

As a ‘Trust wide’ (integrated) system, the extra information gathered is often of benefit elsewhere, and since the hospital is a teaching hospital (required to do research) it needs ‘good data’. Furthermore, users in any particular department will receive benefits from others as well as doing extra work to benefit others. In this long example we can see how the analysts try to sort through different types of problems that are raised as they take the expert users through their workflow

for the purposes of integration testing. When expert users single out aspects of the design and workflow that produce more work for those inputting data – that involve more steps of interaction or more data collection than is presently the case – these are presented as unfortunate by-products of the constraints placed on the design by demands for integration and satisfying new NHS requirements. However, such reasons may also be proffered when the analysts believe the problems to be clinically insignificant, or something that may be dealt with by training and during the domestication of the design. Issues of fitting new systems to working practices also surface in these next excerpts, which come from discussions during integration testing for the patient administration system (PAS) team – whose leader is Christine:

Christine – “There’s a problem of doing QA’ing when you’re QA’ing something but you don’t actually know what you’ll be getting… ‘cos they don’t have a PAS system in the States… it’s like fitting a square peg in a round hole… in America they just go ‘have you got the money – bang’.. at the end of the day it’s our managerial problem so we need to start thinking of workarounds… we have to rely on the Trust when they emphasise the clinical suitability of the system.”

While analysts explain the complications for users as attributable to requirements for integration within the hospital and the NHS, Christine attributes them to trying to fit a US (insurance and payment) oriented system to the UK – ‘it’s like fitting a square peg in a round hole.’ She casts the problem as one of PAS having to make the adaptations (workarounds) to fit with the system, on the basis that it will fit clinical requirements. This is illustrated when Gail (PAS analyst) describes the model for patient allocation to orthopaedic consultants. The system is set up to allow doctors to monitor their lists of allocated patients, with the feature that they can reject or accept them.
In previous discussions users had flagged this up as a problem, since doctors are not necessarily thorough and their secretaries often prompt them on their responsibilities. Consequently, the workaround – that consultants’ secretaries would also have access to these lists – is introduced by Gail:

Gail – “When a patient is allocated to an orthopaedic consultant it goes to his queue but if consultants don’t answer/accept requests they also sit together on all secretaries’ queues so they can monitor if appointments aren’t being picked up by consultants.”
Christine – “What about generic referrals where we usually allot to the shortest waiting list.”

This, however, is not taken as a complete solution by Christine and instead provokes her to raise further problems of the fit of the system to the work of organizing clinics. Firstly, she raises the problem that the system is not set up to allow them to allot patients to the shortest list, only to a specific consultant. The next comment from Christine highlights one of the major problems of implementing an integrated system when previously workers have used dedicated systems. Since the new system has a number of generic applications that dictate, for example, how resources are ordered and activities scheduled, local workflow must integrate with these. This means that users often complete some details on one screen and then move to these generic applications. As a result, the flow through the system appears more complicated as screens and menus are logged into and out of. Christine explains the


process of learning interaction sequences with the new system to her user group by using an analogy:

“I imagine it’s like the map of the tube (London Underground Trains)… (she gestures as she speaks) you go along and sometimes you get off here, go up there, and back, to get to there… it’s not a completely linear process”

Christine’s final comment (below) also takes up some of the previous themes of the analysis. As noted before, the UK project team are instructed to ensure buy-in from the UK users by getting them to ‘sign off’ the stages of the work. Indeed, refusal of an area to sign off represents a major problem for the project team as this could provide a legitimate reason for users to reject the design. No doubt Christine is aware of this when she states her reluctance to sign off testing:

Christine – “We don’t want to sign this off before we go through everything in the proper detail… we are not fully happy about accepting that training will sort out all of these problems… some of them seem like major problems.”

Just as when she did not want to sign off QAing before the system was finished, here she states her reluctance given that testing has not been conducted in ‘proper detail’. Interestingly, she is only sticking to getting things carried out as the project schedule dictated – ‘the system would be built, then it would be QA tested until users and designers were satisfied, then integration testing would proceed’. For UK and US analysts there is an acceptance that the idealisation of design as discrete phases is only something to be worked towards, serving as a means to measure progress. But this is not necessarily the case when users are involved. Although they may concede the need for compromise, as we have seen they can throw the ‘structure’ and ‘methods’ of design back in the faces of the designers by insisting on following the plan.
And, of course, they are both entitled to and may also be wise to do so, to ensure they have the best design to suit their needs. Another point to note is the issue of whether training will solve all the difficulties encountered. While it appears inevitable that problems, particularly when deemed clinically unimportant and technically difficult to fix, have to end up being solved by training, workarounds and so forth, it is important that users do not feel that problems are being trivialised and merely driven down to a training issue. This is part of a more general issue of how information is presented to users throughout the design. This is not simply a question of honesty, as obviously a whole lot of translation (of technical and theoretical details) between stakeholders goes on all the time. However, as the design progresses, and as users become more knowledgeable and involved, they can begin to be more militant, and see where explanations fall short. This suggests that there is a need to communicate with them in a more sophisticated manner as the design goes on. But it also raises questions about how, for example, ‘sign-offs’ work – how can you expect users to be bound into signing off stages when the stages do not flow in the manner specified? While these matters are usually and eventually worked out, they can become serious sticking points.

6. Discussion: Responsibility Issues in Designer-User Relations - 'That's How The Bastille Got Stormed' As IT systems become steadily more complex and organizationally embedded the challenges of and for design

increase. Achieving systems dependability is of crucial importance since research has already indicated how systems can be disastrously, often fatally, unsuccessful. As with the EPR system reported in this paper, progress in dependable design depends on understanding the fundamental problems that arise in attempts to build systems involving complex organizational interactions. Our interest is therefore in developing improved means of specifying, designing, assessing, deploying and maintaining complex computer-based systems in the (often mundane) contexts where high dependability is crucial. It is an old refrain from researchers using ethnographic studies (like us) that the details of work, achieved as a recognisable social accomplishment and explicated by our studies, can inform the better design of systems. In this case we have studied not the healthcare and administrative work to be supported by the system but the work of those delivering the system. Our experience suggests that such an approach can provide useful information about how to effectively target our ethnographic research in a complex setting like this. Firstly, ethnography could be particularly useful when considering integration and 'hand-offs' – the points where processes pass between one part of the organisation and another – the non-integrated parts. This provides better understandings of how processes mesh (or not) with one another and the work required (by talk etc.) to bring things into line. Secondly, problem-targeted ethnography could illustrate and evaluate issues of practice to aid stakeholders in sorting out problems (what exactly they are and how serious they are) and which organizational and systems features it is important to support and which might be less relevant.
As such it presents further support for the ideas of 'co-realisation' (Hartswood et al.) which challenge conventional presumptions about IT system design and development practice, the organizational division of labour, and temporal and organizational divisions between designers and users. This paper has considered some of the difficult issues in what is fundamentally mundane, everyday design work. It is certainly no news to point to the ways in which design is enmeshed in organizational processes, involves various (ultimately political) alignments and is practically resolved. Nevertheless, our sympathy went out to the Trust-employed analysts (on whom much of our research is based) – stuck in the middle between users (in all their diversity) and the US analysts. They understand the workings of the Trust and the people within it but also the constraints of design and the problems that USCo face in trying to achieve a workable solution. They are caught in the push and pull of developing and changing user requirements, which become better articulated, and it may be argued more insightful, the later the project goes on, while understanding that the design conversely needs to become more stable (and closed). It might be easy to proclaim that at least some of the difficulties in this project could have been avoided by understanding users and their work practices better, by better management of user participation, by better design methods and process, by procuring another system, etc. However, this is the real world, real time design of a complex system, in a setting where design is constrained by budgets, by time-scales, by personnel numbers, by expertise, by knowledge of developing methods and by a welter of organizational features. In this context participation is unlikely to be the simple, convivial activity idealised in academic research. Getting a proper idea of who your users are, how they can be stratified, how their


requirements can be assessed and prioritised, how they can be trained, cajoled, nurtured and so on is a real problem that must be worked out as the project progresses – and may (just) stop the 'storming of the Bastille'. Our long term observations suggest that even with the best intentions there is rarely time out to sort this out before the project starts. While drawing attention to these ‘intractables’ of design we would like to indicate some of the ways in which we think some of these difficulties could be addressed with minor interventions in this type of setting – hopefully aiding those faced with similar design tasks in similar environments.

6.1 Users, Participation and Training "There is a limit to the extent to which you can seek to design procedures for doing a job without having to depend upon the good sense of those who are to follow them" (Button and Sharrock)

Why do users often fail to participate in integration? Traditionally it has been seen as too technical. Users are good at telling you what they do now but not at comprehending technical considerations – instead, the work of integration involves the project team deciding what compromises or alterations need to be made to local instantiations of the system in order to satisfy integration requirements. This can cause consternation amongst users as they see their interfaces changing form and appearance from exposure to exposure. Involving users in the rationale of decision making around integration might better allow the project team to sort out what needs to be supported in the design. To do this successfully there needs to be an undertaking to train users in the technical aspects of design so they can appreciate the difficulties of accommodating specific local requirements within the generic modelling of integrated processes. However, the effort might well be worth it for two reasons. Firstly, users become more and more capable of understanding and articulating their requirements as the project goes on. This is partly a matter of learning about systems and design as the project progresses, and partly about the provocation that comes from seeing successive versions, which allows users to match the system to what they do and have now. Training in technical aspects would hopefully allow a more sophisticated appreciation of the system earlier on in the project. Secondly, one of the big problems for the project team is working out whether reported usability issues are serious or not, i.e. ‘would this really have an impact on care, and how much time should be spent trying to solve the problem technically?’ Better trained users could better elucidate these issues.

6.2 Integration and Dependability The requirement for integration is one of the key design problems in this project. Trust-wide applications, like the replacement PAS system, clinic scheduling, etc., have caused some of the greatest difficulties for the project team. The off-the-shelf system already contained generic models for these processes, but adapting this to the requirements of the UK NHS, this Trust and the local users has been traumatic. Quite apart from the fact that requirements have constantly changed during the development, not least as previously ‘unknown’ users (and their requirements) become apparent, balancing the ideal of supporting the multi-fold current local practices against the need for core standardization to integrate processes has been fraught. With the benefit of hindsight this problem would have been foregrounded as a major issue prior to design. The Trust may have accurately felt that they could not afford the luxury of taking time out to work out how previously ad hoc (talk and document supported) integration would integrate electronically, and instead that they had to use the EPR project as a forcing device. However, might they have tackled the problem differently? As we have seen in our examples, users participate in design where it concerns their work, and as such evaluate design against current working practices. Change is judged as good when it appears to make things easier. Given their circumscribed role in design, it is little wonder that users may seem intransigent and skeptical when told that their personal compromises are for others’ benefit, and it may lead them to fear transformation and cling to current practice. This suggests a need to involve users in other parts of the design – most obviously in designing the models for integration, as this will allow them to appreciate the tension between generic and specific requirements, and in doing so may allow better sorting out of which local requirements are necessary and which might be transformed. It may also encourage them to appreciate how integration may benefit others and in doing so ‘get them on board’ more effectively.

ACKNOWLEDGEMENT
Thanks to all the staff at the Trust who assisted with this work.

REFERENCES


1. Bloomfield, B. and Vurdubakis, T. (1997). Visions of Organization and Organizations of Vision. Accounting, Organizations and Society, 22(7), 639-668.

2. Berg, M. (1997). Rationalizing Medical Work: Decision support techniques and medical practices. Boston: MIT Press.

3. Bittner, E. (1965). 'The concept of organisation'. Social Research, 23, 239-255.

4. Bjerknes, G. and Bratteteig, T. (1995). User Participation and Democracy: A Discussion of Scandinavian Research on System Development. Scandinavian Journal of Information Systems, 7(1), 73-98.

5. Blomberg, J. and Kensing, F. (eds) (1998). Special Issue on Participatory Design. CSCW Journal, 7(3-4).

6. Blythin, S., Hughes, J., Kristoffersen, S. and Rouncefield, M. (1997). 'Recognising "success" and "failure": towards the "illuminative" evaluation of groupware'. In Proceedings of Group'97, Phoenix, USA.

7. Brooks, F. (1995). The Mythical Man Month: Essays on Software Engineering, anniversary edn. Boston: Addison-Wesley.

8. Button, G. and Sharrock, W. (1994). Occasioned practices in the work of software engineers. In Jirotka, M. and Goguen, J. (eds), Requirements Engineering: Social and Technical Issues. London: Academic Press.

9. Carroll, J., Chin, G., Rosson, M. and Neale, D. (2000). The development of cooperation: Five years of participatory design in the virtual school. In Proceedings of DIS '00.

10. Ciborra, C. (1994). The Grassroots of IT and Strategy. In Ciborra, C. and Jelassi, T. (eds), Strategic Information Systems. Wiley.

11. Clarke, K., Hughes, J., Martin, D., Rouncefield, M., Hartswood, M., Procter, R., Slack, R. and Voss, A. (2003). Dependable Red Hot Action. In Proceedings of ECSCW '03, Helsinki, September. Kluwer.

12. Crabtree, A. (1998). Ethnography in Participatory Design. In Proceedings of the 1998 Participatory Design Conference, 93-105.

13. Ehn, P. (1988). Work-Oriented Design of Computer Artefacts. Stockholm: Arbetslivscentrum.

14. Ellingsen, G. and Monteiro, E. (2000). A patchwork planet: The heterogeneity of electronic patient record systems in hospitals. In Proceedings of IRIS'2000, Uddevalla, Sweden, August.

15. Geertz, C. (1973). 'Thick description: Toward an interpretive theory of culture'. In The Interpretation of Cultures. New York: Basic Books, 3-30.

16. Grudin, J. (1990). The Computer Reaches Out: The historical continuity of the interface. In Proceedings of CHI '90.

17. Hanseth, O. and Monteiro, E. (1998). Changing Irreversible Networks. In Proceedings of ECIS, Aix-en-Provence, June.

18. Hartswood, M., Procter, R., Rouncefield, M. and Sharpe, M. (2000). Being There and Doing IT in the Workplace: A Case Study of a Co-Development Approach in Healthcare. In Proceedings of the 2000 Participatory Design Conference.

19. Hartswood, M., Procter, R., Rouncefield, M. and Sharpe, M. (2001). Making a Case in Medical Work: Implications for the Electronic Medical Record. CSCW Journal.

20. Hartswood, M., Procter, R., Slack, R., Voß, A., Büscher, M., Rouncefield, M. and Rouchy, P. (2003). Co-realisation: Towards a Principled Synthesis of Ethnomethodology and Participatory Design. Scandinavian Journal of Information Systems.

21. Heath, C. and Luff, P. (1996). Documents and Professional Practice: 'bad' organizational reasons for 'good' clinical records. In Proceedings of CSCW '96.

22. Hughes, J., Randall, D. and Shapiro, D. (1992). Faltering from ethnography to design. In Proceedings of ACM CSCW '92, 115-122.

23. Martin, D. and Rouncefield, M. (2003). Making the Organisation Come Alive: Talking Through and About the Technology in Remote Banking. Human-Computer Interaction, 18(1-2), 111-148.

24. Williams, R., Slack, R. and Stewart, J. (2000). Social Learning in Multimedia. Final Report to European Commission, DGXII TSER, University of Edinburgh.

Patterns of Responsibility

Dave Martin, Mark Rouncefield, Marian Iszatt-White, Simon Kelly

Computing Department, Lancaster University, Lancaster, UK

ABSTRACT

This paper considers issues of responsibility, leadership and leadership development by drawing on prolonged periods of observational, ethnographic research of educational leaders 'at work'. In an era of a supposed crisis in leadership, we use our rich data and interdisciplinary backgrounds to consider leadership development as essentially a design problem, adapting the notion of patterns that emerges in the architectural work of Christopher Alexander and the organisational studies of Tom Erickson.

Keywords

Responsibility; Leadership; Ethnography; Patterns of Interaction.

1. Introduction: Responsibility - a DIRC approach

This paper attempts to address the issue of making ethnography accessible by taking a novel concern - the problem of 'leadership' - that originated from outside of DIRC, and by applying several aspects of what might reasonably be regarded as a 'DIRC approach' in an attempt to fully understand, represent and explicate exactly what this problem looks like. For us a DIRC approach involves: treating problems as design problems, deploying ethnographic approaches, emphasising the value of an interdisciplinary approach, and, in this particular case, utilising the framework provided by developing patterns of interaction. Of course, the issue of leadership raises a whole range of issues concerning 'responsibility'; leaders (and leadership roles) are commonly regarded as both wielding and delegating responsibility, being held responsible and holding others to account. Equally obviously, leadership is an interesting and different, if strange, 'design' problem, but what we are attempting to do is take seriously the notion of the socio-technical system by, in this instance, focusing on some of the human factors involved in responsibility in order to show clearly the ways, the patterns, in which the acceptance, recording and discharge of responsibilities are reflected in these systems, with the eminently practical concern of using such investigations to provide 'teachable moments' in the creation of leadership development packages.

Leadership as a Design Problem

Our starting point is a 2003 report from the Council for Excellence in Management and Leadership, which found that practical leadership skills in the UK were "in short supply from top to bottom of organisations". The 1992 Further and Higher Education Act increased colleges' responsibilities enormously (and suddenly) to include managing multi-million pound budgets, negotiating staff pay and conditions, resolving legal issues of ownership and maintenance of property, and so on. Faced with an increase in the numbers of colleges in serious difficulty, 'managers' right to manage' has been swiftly overtaken by a 'crisis in leadership' as failure has increasingly been assigned to Chief Executives: ".. ambitious to the point of recklessness.. and have got their way in doing all this by being 'strong', 'ruthless', 'heavyweight', 'determined' and 'visionary'" (Goddard-Patel and Whitehead 2000: 202) - paradoxically the very qualities often associated with 'good leadership'. Despite this supposed crisis, leadership itself appears poorly understood as both problem and solution - or, as Sacks (1972) so famously commented (on police work), leadership seems "a solution to an unknown problem arrived at by unknown means". Leadership seen as the problem in FE appears simultaneously as the solution to that very problem - though, of course, 'good leadership' as opposed to 'bad leadership'. In much of the established literature leadership appears as a quality, a skill, an aptitude that transcends the everyday, the mundane and the ordinary, often associated with mystical qualities - the ability to influence, arouse, inspire, enthuse and transform. Within organizational settings leadership is associated with the exercise of power, the setting of goals and objectives, and the mobilisation of others to get work done (Kotter, 1990; Wright, 1996) - 'a saviour-like essence in a world that constantly needs saving' (Rost, cited in Barker, 1997: 348). This, of course, begs a whole gamut of questions. Is leadership the solution or the problem? What is it? Who does it? How do we recognize it? Can we develop it?

What appears to be at stake, however, is not an adequately worded definition, but rather a more fundamental agreement on what leadership - when all is said and done - actually consists of. What makes someone a leader (good or bad) rather than a follower? How can we identify leadership and how can it be adequately measured? In short, most calls for a definition of leadership are concerned with a kind of purity, a boiling down, or isolating of leadership qualities and characteristics. As Grint (2002: 14-15) observes:


"…It is rather as if a leadership scientist had turned chef and was engaged in reducing a renowned leader to his or her elements by placing them in a saucepan and applying heat. Eventually the residue left from the cooking could be analysed and the material substances divided into their various chemical compounds. Take, for instance, Wofford's (1999: 525)

claim that laboratory research on charisma would develop a 'purer' construct 'free from the influences of such nuisance variables as performance, organizational culture and other styles of leadership'. What a culture-free leader would look like is anyone's guess…" We are neither leadership scientists nor chefs, but we are involved in understanding just what is so 'special' about leaders and leadership, and for us the starting point to this kind of understanding comes from considering, in detail, what it is that leaders actually do. The need to conduct more detailed studies of leadership-in-practice has long been recognised (Gronn, 1982, 2003; Yukl, 2002), and yet few studies venture into the everyday doing of leadership, concentrating instead on developing new theories or explanations of leadership. We are not interested in developing any new theories of leadership - or even in attempting to evaluate the plethora of theories and approaches that currently exist - since we doubt the 'work' that such theories do in actually understanding the phenomena they purport to explain. Our interest in leadership is rather different and, in a sense, more practical (what might be seen as another DIRCish quality) - we are looking for ways for research to contribute to leadership development - and consequently we approach leadership as a 'design' problem. We want to know, from the interdisciplinary perspective common to the design enterprise, what the requirements for a leadership development programme might be, and how we might best design and deploy it. Clearly leadership is what Rittel and Webber (1973) might term a 'wicked problem', and viewing it as a practical design problem has some benefit. When leadership is regarded as a 'design' problem - rather than one associated with personality traits or cultural characteristics - the essence of 'leadership as design' becomes both that 'good' leadership can be taught and that it can become embedded within the organisation.
The point of uncovering and relating 'patterns of interaction' lies in developing a set of scenarios of 'teachable moments' that resonate with participants' experiences, that connect with the reality of everyday leadership work in the post-16 educational sector. However, unlike the patterns presented by Alexander, these are not typically presented as 'problem-solution' - though they could well be - but (much in the fashion of DIRC (Martin et al. 2001)) as stories or scenarios (as in the tradition of 'scenario-based design') that are recognisable as what Clark (1972) terms 'organisational sagas'. In this way we accommodate what is sometimes termed 'the turn to the social' in design (Grudin REF): the recognition of and central concern with users and understanding situations of use, not divorcing systems - and 'system' here incorporates people and their activities as well as technology - from the settings in which they would be deployed and used. Ever since this much heralded 'turn to the social' in systems design, research and experience appear to have produced a common ethos: designers need to understand those they design for; they need to understand their work. What we are designing here are sets of leadership development programmes and packages rather than technology, but the argument and its force remain the same. The challenge to which we hope patterns of interaction present some kind of initial response is to design teaching and

learning programs for leaders and managers in the post-16 education sector that somehow mesh gracefully and meaningfully with the readily observed practices and activities of Further Education. In other words, we are seeking to look beyond developing generic management or leadership skills towards identifying and encouraging skills and abilities that are rooted in the sector. While empirical research in this area is growing, we need to make such findings accessible; we need ways of representing knowledge about leadership and leadership activities in the sector so that it is accessible to the increasingly diverse set of people involved in designing leadership development programmes - diverse enough to include 'horse-whisperers and chocolate makers'. It is in this sense that leadership both becomes and remains a design problem. The use of the notion of patterns is an attempt not only to represent such workplace knowledge, but also to provide a framework within which it can be discussed, explicated, extended, and generalized. In turning to the detection and utilisation of patterns as instantiated in DIRC (Martin et al) - patterns of interaction found in a number of instances - many of the issues of generalisability of ethnographic research findings and their translation to policy and training are thereby avoided.

Patterns of Leadership

While every child understands the notion of a pattern, the academic origin and relevance of patterns for us lies in the work of the architect Christopher Alexander, notably his books 'The Timeless Way of Building' and 'A Pattern Language' (Alexander 1979; Alexander et al 1977). Alexander uses 'patterns' to marry the relevant aspects of the physical and social characteristics of a setting into a design. For Alexander, patterns are "....
ways of representing knowledge about the workplace so that it is accessible to the increasingly diverse set of people involved in design..". For us the 'workplace' is that of College Principals in Further Education. As Alexander suggests, "each pattern describes a problem which occurs over and over again .., and then describes the core of the solution to that problem, in such a way that you can use this solution a million times over, without ever doing it the same way twice". As such these patterns, when applied in an educational setting, provide both focus and possible solutions for leadership development programmes. For us the advantage of the notion of patterns lies in finding ways of transforming and representing our wealth of observational materials that are sensitive to both the observed practices and needs of 'leaders' and that can therefore be readily used in leadership development programmes. There are, however, a number of rather different conceptualisations of patterns, and while inspired by Alexander's original work, the notion of patterns has moved on. We wish to exploit patterns in the much looser spirit suggested by Alexander's original work, where familiar situations were used to convey potential (in his case, architectural) solutions. Put simply, the observed reoccurrence of familiar situations lies at the core of our advocacy of patterns. People - designers, College Principals, Senior Managers, etc. - often encounter situations that are similar to previous ones, and one justification for this focus on patterns is the emphasis on drawing from previous experience to


support the collection and generalisation of successful solutions to common problems. As Alexander suggests, "each pattern describes a problem which occurs over and over again in our environment, and then describes the core of the solution to that problem, in such a way that you can use this solution a million times over, without ever doing it the same way twice". Another intriguing rationale behind patterns that may prove useful to us in the context of leadership studies and leadership development is Alexander's notion of 'quality' ('The Quality Without A Name'). This quasi-mystical property both attracts and repels designers, but for Alexander it consists of answering questions such as "what makes a good cafe?", where 'quality' refers not to some mystical characteristic but to features that ensure that buildings, organisations and activities 'really work', that they fit with the social circumstances of use. For us, in contrast, the question is "what makes a good leader?" - but we suggest the steps towards resolution, the careful observation and documentation of everyday activities, remain the same. Our interest is to break down the question 'what makes a good leader' into more manageable, more digestible segments - what makes a good meeting, what makes a good public presentation, what makes a good staff meeting, what makes a good presentation of accounts, etc. - and to document the patterns that comprise them through a number of empirical examples. Of course, we are not the first to point to the idea of 'patterns' as offering possibilities for leadership development - and there is seemingly no end of 'self-help', 'self-improvement' management books that attest to this fact. In "The Manager Pool: Patterns for Radical Leadership" (Olson and Stimmel 2001), for example, the concept of patterns as general solutions to recurring problems is applied to management and leadership.
They argue that knowing a number of patterns will both identify and improve the rare and desirable skills of leadership. But the patterns they produce and analyse (some 61 patterns in five different categories) bear no obvious or stated association to any rigorous empirical reality - instead we are presented with a number of largely 'commonsense' or theoretically derived categories such as psychological patterns (states of mind); behavioural patterns (behaviour); strategic patterns; tactical patterns; and environmental patterns. These patterns, drawing on a vast range of theories of good leadership, supposedly describe how people interact, how they are led, and the environments they work in - but the data seems to consist of anecdote and cutesy homily. Environmental patterns, for example, supposedly offer ways to improve team morale. In this category the 'Living Space' pattern builds on Christopher Alexander's ideas and suggests that the family home, with its mix of private and public areas for work, communication, rest, and play, is an effective model for organising workspace. Well, who says so? Where's the data? What's the evidence? We certainly would not be the only people to suggest that the family home is neither so simply or unproblematically organised, nor such a wonderful place for serious work activity. This is not, particularly, to critique this, or any other, approach that uses patterns in this way. What we suggest, however, given that the proliferation of theories of leadership appears to be part of the 'problem' rather than the solution to understanding and developing leadership, is that

the place to start looking for patterns, at least if the point is development programmes, is in the setting itself, in the everyday, mundane, empirical reality of leadership work.

Observing Patterns: Following the Leader

Our research uses observational - specifically, ethnomethodologically informed ethnographic - methods to study 'leaders' and 'leadership' in the post-compulsory education sector. Our data comes from 'shadowing', 'following about', various education sector leaders in various institutions as they went about their everyday work. The central characteristic of our research has been an emphasis on the detailed observation of how work - 'leadership' work - actually 'gets done'. Within mainstream Sociology this approach, ethnography, has often been presented as essentially a methodology of last resort, used primarily for obtaining information about groups and cultures, usually 'deviant cultures' (sometimes, perhaps unfairly, stereotyped as the Sociology of 'nuts, sluts and perverts'), that are impossible to investigate in other ways. For us, however, the main virtue of ethnography is its ability to make visible the 'real world' sociality of a setting, producing detailed descriptions of the 'workaday' activities of FE Principals and Senior Managers. This approach runs counter to the temptation, common in the Social Sciences when studying others' lives, to read things into them. We would not be the first to note that the social world is not organised in ways that analysts and researchers want to find it. The phenomenon of leadership is no exception to this. We do not want to impose a framework on the setting but to discover the social organisational properties of leadership as it is naturally exhibited. However, we should not underestimate the difficulty of this methodological choice, for things that are familiar - in this case everyday leadership and leadership work - are often extremely difficult to see clearly because of their very familiarity.
In this paper we explicate two especially common and interlinked patterns - maintaining some kind of public face of leadership, and the work involved in organisational audit. In each of these we document the centrality of the college Principal and accordingly view them as patterns of leadership and responsibility.

Finding Patterns in the Fieldwork: Pattern One: The Public Face of Leadership

There is no single definition of what patterns are, how they should be presented, what their purpose should be and how they should be used. We started by considering that in finding patterns in the fieldwork we were looking for examples of repeated, grossly observable phenomena in our ethnographic studies of everyday leadership work, describing them in detail and seeking a way to present them as interesting and useful scenarios for leadership development work. What we are looking for when we analyse our fieldwork are patterns of observed behaviour and activity that draw on and reflect the experience of leadership, for as one college Principal commented: "…the only difference between an experienced principal, for example, and an inexperienced one is you've just had more time to make more mistakes and to learn from them. The critical thing, I suppose, is to be able to know your mistake, because you don't learn anything, really, like as much until you find


out. You like to try and convince yourself, on your better days, that something may have gone right, but you learn a lot more from this - from the things that go wrong. And it often is so frequently tied up with people who just aren't quite doing what you want them to do. ..." One persistent, grossly observable feature that emerges in a range of our fieldwork settings is the extent to which college Principals and their senior management teams engage in activities to manage the visible, the public face of their institution. This idea of working to maintain in some way the public face of the college takes a variety of forms and surfaces in a number of different contexts. Whilst there are clearly elements of Goffman's (REF) 'presentation of self' involved here - including notions of 'front-stage', where the performance is given, and 'back-stage', where the performance is prepared - the work involved, concerned as it is with the perception of the institution, often goes beyond such simplistic, dramaturgical analogies (dripping with insincerity). The fact that the organisation has some kind of image to defend and project is often the subject of powerful and persistent organizational sagas (Clark 1972). Such sagas generally involve stories of some form of organizational change, periods of great instability, instances of 'organisational nostalgia' (Gabriel REF), references to the 'good' or 'bad' old days of the college, or more recent periods of change such as incorporation in the early 1990s. As with all sagas, retelling becomes yet another powerful means by which the public face of the college is both outlined and reinforced. The following is one observed saga that combines what could be described as the 'ancient' history of the college with the more recent changes in the early 1990s involving a major cultural change programme. This extract is taken from a speech given five times during one day by the college principal.
Along with the retelling of the organizational saga, the audience - made up of staff and new students - is reminded of the college's set of core values, which must (according to the principal) be 'lived' by all those working and studying at the college:

“When I first came to [this college] I was actually intimidated. Before I even got inside I had to push through a gang of students stood smoking near the main entrance, y’know, literally push my way through. I’m being honest here, I felt intimidated, and I remember thinking, if I feel intimidated and I’m the Principal then how are other visitors to the college going to feel? When I reached what is now the main reception area I was greeted by the sight of bodies – bodies everywhere – students standing around, lying around, chatting. It looked like what we used to call back home a ‘doss house’. I remember thinking ‘what kind of place have I come to?’ For me a good college is not a youth club, it’s a place of learning, it can be fun as well, but people have to take responsibility for that. We have to make each other feel valued. That’s why we don’t have

strict rules here. We don't need them so long as we have mutual respect…" The principal takes great pride in the change that has taken place in the college since his arrival over two decades ago. This is evident in the number of times the story of the college's transformation was recounted to us and overheard over the course of the ethnography. It is a story told not just by the principal, but by senior managers, middle management, administrative and teaching staff. It is a story, a 'war story' (Orr 1996), that people within the college draw upon to build and maintain a sense of professional identity; a story whose telling and retelling plays an important role in developing and projecting the public face of the organisation. In analysing these mundane observations of various forms of presentation of a public face of leadership and the college, we particularly draw on Yates' notion of 'control through communication' (Yates, 1989), particularly the argument that there is a link, an interrelationship, between technology use and changing managerial philosophies. The education sector in the UK has undergone radical change and restructuring over the past decade. In particular, this has produced a 'customer driven' approach to further education where entrepreneurial ideologies challenge more traditional and increasingly outmoded notions (e.g. the professional autonomy of teaching staff). Instead, in order for colleges to thrive they are adopting the language and presentational practices of business, one key element of this being the engineering of new cultures, systems and technologies that promote, practice and present these new managerial and customer-focused philosophies. However, as Yates (1989) suggests, new technologies alone are insufficient: what is required is the vision to use them in new ways.
This is clearly seen in the proliferation of college newsletters - to staff, students and the wider community - and the way they are used both to communicate to customers and staff and to promote a 'brand approach' to education. Consider, for instance, these details from a college newsletter:

"During 2003 SMT recognised that, with increased individual use of IT, there was a need for more consistency of style in College documentation. Examples of the range of diversity in practice were evident in papers that went to Governors' meetings, in letters from different parts of the College to the same external organization (e.g. the Learning and Skills Council) and in memos from different departments. Font sizes varied from 8 to 14 point size and a variety of typefaces were used… … this inconsistency potentially 'dilutes' the 'brand value' of the College .. A group of 'professionals' was formed to develop documentation standards or 'house style' guidelines for use by all College staff. These guidelines should now be followed: Develop and maintain a consistent identity for the College so that all readers will quickly recognise a document as being from the College.


Ensure documents portray a consistent high quality, attractive, modern image that accords with the College’s vision, mission and values etc. etc.” These ‘technologies’ are then clearly used by principals and senior managers to promote and disseminate specific leadership visions and objectives. Such technological accomplishments represent and draw upon specific ‘genres’ of communication (REF), genres that evolve over time as new technologies are employed to generate, process and disseminate information in new and innovative ways across organizational domains. These are not isolated incidents but part of a pattern – a series of activities and incidents – that have at the heart of them the desire to project a particular image of the college. This is clearly seen in the next set of abbreviated fieldnotes where a College Senior Management Team are considering how best to present a proposed ‘merger’ of two colleges.

Looking for forms of words and motivating ideas - how best to present this. Meeting begins, discussing a recent presentation by the HR Director re: collaboration etc. Says they need to put their stamp on it, tone down some of the phrasing etc.

R: "What we want is high quality provision across the curriculum for our students"

B: "I like the language there .. it's direct... I think there's another one which I want for the college about sustainability. At the moment we're living hand to mouth from year to year, which is quite demoralizing for staff.. so how do we say that then? .. give me some words.."

.. agree other actions to move things forward .. think of strategies for moderating the language used

R: "Are you OK for the staff briefing tomorrow?"

B: "Nearly.. this is where we earn our money, in how we put it to staff. And my instinct is not to say too much."

Pattern Two: Leadership and Audit - Audit as an Organising Device

Our second, and associated, set of patterns comes from observing and understanding a range of activities associated with various notions of audit. It is hardly 'news' to anyone in the FE sector - accustomed by now to the ritualised nightmare of the Ofsted inspection - that the managerial philosophy that currently dominates issues of leadership in FE is evidently one of 'audit' and the need to demonstrate competence, compliance and effectiveness. As Strathern (2000) argues, 'audit cultures' are increasingly common in both public institutions and private enterprise, reflecting the need to practice and perform a new kind of accountability based around the twin goals of economic efficiency and good practice. These new kinds of accountability have generated new managerial and organizational forms and technologies (the Ofsted inspection; the Quality audit; the Excel spreadsheet) through which they can be expressed. The concept of the audit, previously constrained within financial applications, has now expanded to become a ubiquitous element of daily life, with the education sector being no exception. The result is a raft of 'technologies of accountability' which "do as much to construct definitions of quality and performance as to monitor them" (Power, 1994: 33). Audit in this sense represents less an evaluative tool than a means of indirect control over work practices through monitoring and regulation.

To anyone who spends any length of time with College Principals or Senior Managers, the extent to which notions of audit dominate their everyday lives is blindingly obvious. But there are different forms of audit, and it impacts on everyday working life in different ways - i.e. there are different patterns whereby Principals and their Senior Management Team can be seen to be visibly oriented to notions of audit. 'Doing' audit in an accountable fashion requires different, observable patterns of work. Our interest here is then in how and in what ways 'audit' drives the everyday work of college Principals and how this may be of interest to those wishing to learn such 'skills of leadership'. An example of the importance of such audit work is presented in the abbreviated fieldwork transcript below:

X tidies papers ahead of 10:45 meeting with SMT - final run-through of scripts for LSC meeting to make sure they are all clear and all telling the same story.

SMT arrive - X leads the meeting by identifying an error in the student numbers which have been sent to the LSC, leading to an error in the financial calculations which have been made subsequently.

X: "..and we need to get the numbers right, don't we?" (looking at Z). Tells Z the right figures to insert.

X works through the numbers on a calculator - rehearses argument in terms of funding implications. Z gives a clear walkthrough of the financial data.

X: ".. and we want a clear indication that we're going to get Premium Funding.. that's the key outcome we want from the meeting." X now concerned about whether college will get premium funding due to a change of emphasis in the criteria.

X: "We need to present the numbers in a way that makes it easy for them to tick the criteria.."

Later.. X is finalising update paper for LSC (re progress against strategic targets).... 'thinking on screen and playing around with content'... Has found a way of using the numbers re: student recruitment and retention selectively to strengthen their case for premium funding.


In the example above, the Principal is observed manipulating management data to consider how best to present important information to their funding body, the Learning and Skills Council (LSC). The existence of various categories within which colleges can calculate recruitment, retention and results, and the differing funding formulae provided by official bodies, mask the way in which reasoning is shaped by contingencies and the ‘skill’ that goes with recognising, identifying and addressing such contingencies. These circumstances influence how the ‘formula’ is applied in specific cases, what determines the extent or limitations of its applicability, and the requirements for making any formula ‘work’ and be seen to work - “grappling with the sheer practical difficulties of determining which figures are wanted, pulling them out, and then knowing how to manipulate them and assess their product.” (Anderson et al, 1989:105-6) An understanding of ‘leadership’ within the learning and skills sector must also include an appreciation of the role of the new accountabilities in rendering organizational information and accounts of everyday practice visible. Much of what counts as everyday leadership work within UK FE colleges appears to consist of producing, sharing and manipulating accounts of events, producing a number of subtly different versions. These versions of events are constructed to conform to the new accountabilities of audit in that they consist of conscious displays of compliance and effectiveness (Neyland & Woolgar, 2002), and yet they can also serve as forms of organizational communication and accountability that allow other kinds of ‘ordinary’ work to be done within the college (Button & Sharrock, 1998; Suchman, 1993).
For example, the components of a successful Ofsted inspection may be recycled as the justification for a Beacon Status/premium funding application, as an indication of quality provision to entice students to apply to the college, as an opportunity for the public praise of staff and as the motivational basis for exhortations to further achievement. In each case, the mode of delivery and the specific choice of content will serve to construct a version or account suited to the leadership work it is required to perform. As this example suggests, our observations indicate how and in what ways organizational life within post-incorporation FE colleges in the United Kingdom is increasingly characterized by a particular managerial pattern of activity: the need to construct accounts and to make oneself, other members of staff and the college accountable to a variety of internal and external audiences. 'Leadership' work here consists in the selection and calculation through which activities on the ground, as understood through the management information collected, are made to visibly fit the requirements imposed upon the organization by external agencies. It is not simply a question of seeing what is 'in the figures' and then working out what should be done since, as the transcript documents, 'what is in the figures' has to be worked out. As one Principal told us:

“…the data’s clean, but in terms of can you use it, is it good enough to use, would you rest your life on it today? – that’s more tricky ... it’s so complex, in a way you have to manage that ambiguity … I know how many students I need to achieve overall at the college ... but that’s probably got no relationship to enrolments because, you know, somebody can be enrolled on 8 things, or you can break the course up into four.”

In such activities, there is a need for "managing the interplay between precision and interpretation in calculation" (Anderson et al, 1989:121) in order to produce an appropriate, and defensible, account of events. Thus the documents produced and the accounts which underpin them also represent ‘gambits of compliance’ (Bittner, 1965) in respect of the perceived rules of conduct imposed by external agencies, such that the process through which decisions are made can be seen as “extending to the rule the respect of compliance, while finding in the rule the means for doing whatever needs to be done.” (Bittner, 1965:273) As one Principal said:

“…you play the game, you see, y’know ... You see, theoretically what happens is you should put all the figures in and out the end pops what level of support you need. But the reality is you never bloody win! We were told actually if we try to get a thirty-five percent grant that we would never get it, so what we did was we made the figures show that we could just do it on thirty-five, but it is a very tough squeeze…”

In this way, the work of Principals and senior managers when they engage in decision-making and analysis of management information involves an observable (and teachable) pattern of continuous (and often ingenious) struggles with the technology and the data.

1. Conclusion: Patterns and the Shock of the Familiar
In tackling leadership (and specifically leadership development) as a design problem our approach differs somewhat from that taken by Alexander (and the software design community) since we follow Erickson (2000a; 2000b) in suggesting that our principal and rather different emphasis is on the use of pattern languages as a descriptive device, a lingua franca for creating a common ground among people who lack a shared discipline or theoretical framework. Given the varied backgrounds from which educational leaders are drawn, such an interdisciplinary approach is both essential and inevitable. Our patterns attempt to capture actual lived experience rather than abstract principles: “abstract principles require users of the principles to understand some conceptual framework, and to be able to map the principles onto their domain of concern; the concrete prototypes in pattern languages make direct contact with the user's experience” (Erickson 2000a). Nevertheless our exposition does abide by some of Alexander’s central concerns since, whilst not using patterns prescriptively (rather as ‘aids to a sluggish imagination’), we are attempting to use patterns to capture accepted practice and support generalization. We are also suggesting the value of this perspective, this way of looking at the problem of and the solution to ‘leadership’ in


this sector, for the pattern language is not intended to be a book of patterns that is followed by rote. This is not a crib sheet – rather we are presenting a number of ‘sensitising’ issues that can be modified and re-presented according to local circumstances. Any college Principal or Senior Manager who has experience with the situation can quickly understand, discuss, and contest these patterns. As Erickson (2000a) argues: “It is actually a meta-language which is used to generate languages for particular sites. For any particular situation a subset of existing patterns is selected; in addition, designers modify existing patterns and create new patterns that reflect the culture, environment, history, customs and goals of the site's location and inhabitants. These patterns - old, modified, and new - form a site-specific language which is used to guide reflection and discussion about the relationships among the site, the proposed design, and the activities of the inhabitants”. Like Erickson we wonder what advantages and benefits this approach to leadership development might afford. Like Erickson we suggest, firstly, that patterns “are more concrete, more tightly bound to the situation at hand, and thus more accessible to an audience that lacks a common disciplinary framework”. Secondly, that presenting empirical studies of leadership in action, the ‘doing’ of leadership, “results in the modularization of workplace knowledge, and thus makes it easier to take a subset of a pattern language and apply it to a new type of workplace”. Thirdly, that this approach “makes pattern languages more amenable to generalization across workplaces”. Finally, that there are some important advantages stemming from this particular representational approach, this way of moving from research finding to practical implementation, that is linked to the recognition that researchers and practitioners often have different audiences and different needs.
Leadership development is a pragmatic activity. In designing leadership development programmes the needs of ‘users’ – Principals and Senior Managers – are paramount, and the use of patterns offers a ready means of establishing a dialogue with such users: “communicating effectively with their users, noticing connections between activities and artifacts that would have been otherwise missed, or simply decrease the time between encountering a workplace and being able to ask useful questions”. We have long been suspicious of the special, almost mythic, status (and hype) accorded to ‘leadership’. If there is anything special about leadership it is simply that researchers have yet to realise the importance of the largely unexplicated and seemingly invisible ‘work’ that is essential in the doing of educational leadership. Good leaders are competent and skilled in Bittner’s (1965) gambit of compliance. They know what stories to tell at the right times; they know what figures to produce, how and when. They are skilled in managing performances, images and interpretations. These seem to be teachable yet rarely taught skills. We are not uncovering or revealing secret or esoteric skills. If there is any shock value in the fieldwork extracts above it comes from their very familiarity – the ‘been there, done that’ experience, the rueful shake of the head that accompanies painful memories. It is exactly this quality that makes this work and these patterns useful for leadership development. And such skills are not the esoteric preserve of ‘leadership’. These are skills available to just about anyone working in an organization and used

every day. But because of the miasma of beliefs and conceptual approaches to leadership (see REFS..), in the absence of any shared conceptual framework, knowledge, if it is to be teachable and transferable, must be embodied in a concrete, recognizable form – and for this we advocate ‘patterns’ as a representational mechanism for design.

1.1 Acknowledgements This is a shortened version of a presentation given at the ‘Refrains’ Conference, Exeter University. We thank the participants for their insightful comments, and the staff at various colleges for their continued tolerance and support.

References
[1] Alexander, C. (1979) The Timeless Way of Building. New York: Oxford University Press.
[2] Alexander, C., Ishikawa, S., Silverstein, M., Jacobson, M., Fiksdahl-King, I. and Angel, S. (1977) A Pattern Language. New York: Oxford University Press.
[3] Alvesson, M. and Sveningsson, S. (2003a) ‘The great disappearance act: difficulties in doing “leadership”’. Leadership Quarterly, 14: 359-381.
[4] Alvesson, M. and Sveningsson, S. (2003c) ‘Managers doing leadership: the extra-ordinarization of the mundane’. Human Relations, 56(12): 1435-1459.
[5] Anderson, R.J., Hughes, J.A. and Sharrock, W.W. (1989) Working for Profit: the social organisation of calculation in an entrepreneurial firm. Aldershot: Avebury.
[6] Barker, R. (1997) ‘How can we train leaders if we don’t know what leadership is?’ Human Relations, 50: 343-362.
[7] Bass, B.M. and Avolio, B. (1994) Improving Organizational Effectiveness Through Transformational Leadership. California: Sage.
[8] Bittner, E. (1965) ‘The concept of organization’. Social Research, 32(3): 239-255.
[9] Bryman, A. (1992) Charismatic Leadership in Organizations. London: Sage.
[10] Bryman, A. (1999) ‘Leadership in organizations’. In Clegg, S., Hardy, C. and Nord, W. (eds) Managing Organizations: Current Issues. Thousand Oaks, CA: Sage.
[11] Clark, B.R. (1972) ‘The organizational saga in higher education’. Administrative Science Quarterly, 17: 178-184.
[12] Council for Excellence in Management and Leadership (2002) Managers and Leaders: Raising the Game. http://www.managementandleadershipcouncil.org/press/release2.htm
[13] Erickson, T. (2000a) ‘Supporting interdisciplinary design: towards pattern languages for workplaces’. In Luff, P., Hindmarsh, J. and Heath, C. (eds) Workplace Studies: Recovering Work Practice and Informing System Design. Cambridge: CUP.
[14] Erickson, T. (2000b) ‘Lingua francas for design: sacred places and pattern languages’. In Proceedings of Designing Interactive Systems: Processes, Practices, Methods, and Techniques, August 17-19, 2000, Brooklyn, NY, pp. 357-368.
[15] Gabriel, Y. (1993) ‘Organizational nostalgia: reflections on The Golden Age’. In Fineman, S. (ed) Emotion in Organizations. London: Sage.
[16] Goddard-Patel, P. and Whitehead, S. (2000) ‘Examining the crisis of further education: an analysis of “failing” colleges and failing policies’. Policy Studies, 21(3): 191-212.
[17] Goffman, E. (1959) The Presentation of Self in Everyday Life. Harmondsworth: Penguin Books.
[18] Grint, K. (2002) ‘What is leadership? From hydra to hybrid?’ Paper presented at the EIASM Workshop on Leadership Research, Oxford.
[19] Gronn, P. (1982) ‘Methodological perspective: neo-taylorism in educational administration?’ Educational Administration Quarterly, 18(4): 17-35.
[20] Gronn, P. and Ribbins, P. (1996) ‘Leaders in context: postpositivist approaches to understanding educational leadership’. Educational Administration Quarterly, 32(3): 452-473.
[21] Grudin, J. (1990) ‘The computer reaches out: the historical continuity of interface design’. In Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI '90), Seattle, pp. 19-26.
[22] Hughes, J.A., Randall, D. and Shapiro, D. (1993) ‘From ethnographic record to system design: some experiences from the field’. Computer-Supported Cooperative Work, 1(3): 123-141.
[23] Kerfoot, D. and Whitehead, S. (1998) ‘Boys own stuff: masculinity and the management of further education’. The Sociological Review, 46(3): 436-457.
[24] Martin, D., Rodden, T., Rouncefield, M., Sommerville, I. and Viller, S. (2001) ‘Finding patterns in the fieldwork’. In Proceedings of ECSCW ’01, Bonn, pp. 39-58. Kluwer.
[25] Orr, J.E. (1996) Talking About Machines: an ethnography of a modern job. Ithaca, NY: ILR Press.
[26] Pfeffer, J. (1977) ‘The ambiguity of leadership’. Academy of Management Review, 2: 104-112.
[27] Power, M. (1994) The Audit Explosion. London: Demos.
[28] Rittel, H. and Webber, M. (1973) ‘Dilemmas in a general theory of planning’. Policy Sciences, 4(2): 155-169.
[29] Rost, J.C. (1991) Leadership for the Twenty-first Century. Westport, CT: Praeger.
[30] Sacks, H. (1972) ‘Notes on police assessment of moral character’. In Sudnow, D. (ed) Studies in Social Interaction. New York: Free Press, pp. 280-293.
[31] Sherwood Olson, D. and Stimmel, C.L. (2001) The Manager Pool: Patterns for Radical Leadership. Addison-Wesley.
[32] Strathern, M. (ed.) (2000) Audit Cultures: anthropological studies in accountability, ethics and the academy. London: Routledge.
[33] Suchman, L. (1993) ‘Technologies of accountability: of lizards and aeroplanes’. In Button, G. (ed) Technology in Working Order: Studies of Work, Interaction, and Technology. London: Routledge.
[34] Suchman, L. (1995) ‘Making work visible’. Communications of the ACM, 38(9): 56-64.
[35] Wright, P. (1996) Managerial Leadership. London: Routledge.
[36] Yates, J. (1989) Control through Communication: the rise of system in American management. Baltimore: Johns Hopkins University Press.
[37] Yates, J. and Orlikowski, W.J. (1992) ‘Genres of organizational communication: a structurational approach to studying communication and media’. Academy of Management Review, 17(2): 299-326.
[38] Yukl, G. (2002) Leadership in Organizations, 5th edition. Upper Saddle River, NJ: Prentice Hall.


Some Notes on the Social Organization of Responsibility

Dave Martin, Mark Rouncefield
Computing Department, Lancaster University, Lancaster, UK

John Hughes
Department of Sociology, Lancaster University, Lancaster, UK

ABSTRACT This paper is intended as a preliminary contribution to the ongoing discussions in the Responsibility Theme of the DIRC project. The paper’s aim is to present some ideas from the point of view of Sociology, prompted by the claim that most computer system design fails to adequately understand and model both the notion and the distribution of responsibility within complex organisations. In accord with the TA ‘Making Ethnography Accessible’ we will illustrate the approach using fieldwork material drawn from our ethnographic studies of complex organisations, in this case a multi-national bank as a perspicuous example of an organization expressly, manifestly, interested in issues of responsibility.

1. Introduction: ‘what’s in a word’?
In ordinary English, the word ‘responsibility’ is used in a number of related senses. One of these is when we might refer to the duties or the tasks of a job as ‘responsibilities’. Another is the sense in which someone can be ‘responsible for’, or ‘held responsible for’, someone or something. Yet another, and looser, sense is when some thing is ‘held responsible’ for, say, a crash or something not working out. Here the sense is equivalent to ‘cause’, as in identifying the cause of some disaster. However, the point we are trying to bring out here is not simply one about words and their definitions but rather about their ‘logical grammar’ as Wittgenstein would understand this. That is, it is to do with how various concepts meaningfully and intelligibly relate to other concepts and not to others. As Coulter (1979) illustrates: grasping the concept of ‘chair’, knowing what a chair is, involves knowing how it can be related to concepts like ‘wood’, ‘legs’, ‘sitting’, ‘broken’, ‘repaired’, and so on, in different ways and in different contexts. Grasping the concept ‘broken’ involves knowing how it signals different states of affairs when related to ‘chair’, ‘promise’ and ‘marriage’. The mastery of natural language requires the mastery of the logical grammar of concepts, that is, of what it makes sense to say: a mastery that involves knowing the sense of what is being said even when the same word is used. It is knowing the sense of what is being said from the context in which it is being said. Further, it is the vernacular, everyday language that is used to describe and characterise what it is we do, the actions we perform. It is through this language that we know and constitute the world for what it is, and an aspect of human life

that has profound implications for the discipline of sociology and its phenomena. Its phenomena consist in what Schutz termed the ‘first order constructs’, that is, the constructs of ordinary language and action (Schutz, 1964, 1967). Sociology’s own analyses – and one could just as well extend this to social science more generally – are, accordingly, ‘second order constructs’ which need to have a close relationship to the ‘first order constructs’ upon which they depend for their sense. In discussing some of the implications of vernacular language for sociological – social science? – inquiry, the point is to draw attention to the immense relevance of how people talk about the things in the world, in their lives, for understanding the social organisation of activities. This is in no sense to award any kind of ontological privilege to the vernacular but simply to make the point that what things are – what emotions we have, how others are feeling, what sort of things there are in the world, the position we take with respect to various things in the world, etc., etc. – has everything to do with how these things are talked about within a language community. As far as the vexed question of meaning is concerned, on this view meaning is an intersubjective accomplishment and, in this respect, has close affinities with Wittgenstein’s (1968) adage to the effect that if one wants to know the meaning of a term then look to its use – acknowledging, importantly, that such uses extend beyond the purposes of description. As Austin (REF) pointed out, many words are ‘performatives’ in that uttering the words in appropriate circumstances is to perform an action. Thus, and to use Austin’s own example, if I utter the words ‘I promise X’ under appropriate circumstances (for example, not acting in a play or reporting on what someone else said) then I am not reporting upon, or describing, some ‘inner state’ but I am making a promise.
Accordingly, ‘describing what is in the world’ is but one of the ‘games’ that we can play with language.1 And even this ‘game’ is very often bound up with other activities – in particular, and to put it briefly, with issues of praise or blame. This brings us to another important aspect of the viewpoint on responsibility being set out here, namely, social organisation as a moral order. Obviously the idea of society as a moral or normative order is not new in sociology, though the emphasis given to the idea, along with any methodological consequences that might flow from it, has varied. It is an idea closely bound up with the further idea of social action as rule-following; yet another

1 It should go without saying that Wittgenstein’s use of the analogy of language with games is just that – an analogy. He is not, and nor are we, saying that language is a game.


notion which has been variously treated within the discipline. For the most part rules have been regarded as one of the elements to be included in the explanation of action and, as such, external to, and constraining of, action. One of the main problems with such a stance is that rules have to be treated as ‘external’ to, and independent of, the actions they are supposed to circumscribe. However, the description of many actions is dependent upon the rules which may be invoked in their description – in other words, this violates the requirement of causal explanation that a regular connection be established between independent phenomena. Thus, and perhaps an obvious example, to describe a move in chess depends upon the rules of chess in the very description of the move itself. Moreover, it would not make sense to ‘behaviourise’ the description – ‘thing shaped like a castle on 3rd black square from the right….’ – and still see the move as ‘following a rule’. The rule enables us to see what, in chess, particular moves are. Such rules do not constrain us but enable us to play chess in the first place. Many but not all rules are of this character. They serve as instructions rather than as devices forbidding us to do something we might otherwise do were it not for the rule in question. Wittgenstein draws a distinction between ‘action conforming to a rule’ and ‘action according to a rule’ in order to make the point that it is not difficult to formulate a rule which covers the action concerned, but this is not the same thing as an action being performed as a rule-following action.2 In other words, we need to know whether or not an action is, in fact, being performed as a rule-following action, that is, whether or not the actor is orienting to a rule that could be said to be governing his/her behaviour. In which case, the rule being followed is intimately involved in what action is being performed and, of course, how it is to be described.
Further, and again borrowing from Wittgenstein, no rule dictates its own application but must be applied within some context which may determine what the relevant application of a rule might be in this case, in these circumstances, in respect of these activities, and so on. It is a point such as this that provoked Bittner (1965) to claim that traditional sociological analysis of organisations had failed to appreciate in, for example, its ubiquitous use of the distinction between ‘formal’ and ‘informal’ organisation as an analytic scheme for understanding organisational action. For him, traditional approaches fail to address the problem of systematically determining which of the ‘real world, real time’ activities of organisational members are categorisable as ‘formal’ or ‘informal’ without consulting how the organisational members themselves treat such matters. Typically what happens is that ‘real world’ activities are described as ‘formal’ or ‘informal’ by some theoretical-cum-methodological fiat.

2. Gaining a Perspicuous View on Responsibility
In this section we are concerned with illustrating some ideas about responsibility through fieldwork extracts from a long-term ethnography in a multi-national bank. Responsibility was certainly not one of the main concerns or motivations behind the original study - the emphasis was on explicating everyday work. But there is no 'in-principle' reason why our approach cannot examine issues of responsibility -

2 In actual fact Wittgenstein’s point was about meaning and whether or not meaning could be seen as rule following.

responsibility is merely a topic of investigation like any other. Moreover, if responsibility is a topic in the interactions or activities we observed then the organisation and operation of such 'responsibility' is generally someone's mundane work. Furthermore, in its accountable and reflexive achievement, responsibility is oriented to by members, and what responsibility amounts to will be "displayed in the particular circumstances of 'this interaction' between 'these people' for 'whatever purposes' done 'somehow'". While notions such as 'responsibility' may be a favoured topic for social science research, the point is not to make it an a priori focal concern but rather to view it as one possible phenomenon amongst innumerable relevancies which may or may not have importance for those doing the work. That is, what is of interest in our ethnographic studies is the determination of the relevancies that those who do the work see, the considerations that are important to the carrying out of the work-in-hand. Furthermore, these sensitivities would be of interest as observable features of the way in which a job was done. So, for example, 'responsibility' would be of interest if work was carried out in such a way as to make plain in some way that 'responsibility' was being taken, avoided or misused, or so as to ensure that others were made aware of their 'responsibility'. As Lynch notes: "In a variety of ways, members can be held to account for what they are doing. They can be asked to keep records, to show that they have followed instructions, to justify their actions in terms of a set of rules or guidelines, and to inform others about what to do and where to go." Any examination of the fieldwork notes reveals the omnipresence of issues of responsibility through the ways in which responsibility manifested itself as an integral part of a working division of labour.
So, for example, it is readily apparent that grossly observable features of a working division of labour call into play, as a feature of everyday work, a number of aspects of responsibility. Examination of the field notes also indicates the ways in which responsibility interacts with and incorporates other features such as calculation and accountability: most notably in the notion of being responsible in making lending decisions. So, for example, 'proper', 'responsible' calculation is not merely about the manipulation of numbers but includes the idea of audit and accountability: that figures have been properly arrived at, that the calculation and any action emanating from it are the product of responsible accounting work and are therefore 'accountable'. Such ideas about responsibility and its connection to audit also extend to various aspects of work management, the calculation of work and bonuses and the production of management information. Finally, ideas about responsibility are not simply an academic interest but regularly appear in everyday interaction and conversation. Ideas about responsibility emerge and are articulated in the course of everyday work. Amongst the most frequently expressed ideas about responsibility are notions of responsibility to customers. This is rather more than the notion of legal responsibility but includes, for example, the idea that 'local knowledge' - a particular knowledge of a customer and their account and circumstances - should be used for the benefit of the customer (and also the bank's customer relations) so that, for example, cheques are not 'bounced'


unnecessarily. In the same fashion, ideas about responsibility to the organisation, to their 'team' and to themselves also strongly feature, as do conversations about correct procedure and appropriate routines, as well as talk of 'taking ownership' of problems.

• a ‘Doer2’ who checks the work of Doer1s and deals with more complicated processes involving a greater understanding of complex, generally legal, matters and procedures. In particular, and in this instance, the changes in ILA independent legal advice - that had resulted from recent court cases and which had not yet been written into the software program;

3. Responsibility and the Working Division of Labour
A working division of labour refers to the way in which the tasks performed by an individual form an interdependent part of a larger sequence of tasks done by others. The working division of labour necessarily implicates some notion of responsibility - the division of labour is a division of responsibility. One of the features noted about such working that connects strongly to ideas to do with 'responsibility' is ’the egological principle of organisation’. As Anderson et al (1989) note:

• a Supervisor who distributes the work around each team, prioritises the work and deals with computer problems; • a Senior Securities Adviser who deals with the more complex procedures of mortgage debentures and terminal indemnities - as well as performing a checking function; • an Assistant Manager who performs a range of checking, advisory and procedural responsibilities - and who is generally regarded as a

"From the point of view of an actor in a division of labour, working through the endless stream, getting things done, means doing-what-I-do and passing tasks on to others so they can do what they do." (1989: 161)

In this view responsibility - in the sense of responsibility to a team - resides in completing a task in such a way as to enable others to attend to their tasks. Each person within a ‘working division of labour’ operates within ‘horizons of relevance and responsibility’ in which events, persons, information, incidents, knowledge, and so on, move in and out depending upon, and dealing with, work’s eventualities. A further dimension of individual responsibility lies in the fact that the practical accomplishment of the work involves some notion of ‘responsible working’ - it requires learning, knowing about, knowing how to use, that information, those artefacts, those files, etc. which are relevant to work and those which are not. This is, invariably, a matter of exercising judgement or responsibility in light of the various contingencies and uncertainties that arise during the course of an ordinary working day. Responsibility in everyday work is not necessarily encountered as a single coherent entity or philosophy or stream of work (though sometimes it is) but as a variable practice that has to be aligned in different ways with changing organisational routines and notions of responsibility. It is the ability to ‘just do things’, the detailed understanding that ‘this is my work’, that enables workers to quickly evaluate any interruption, any contingency that arises in the course of the everyday flow of work as either ‘their responsibility’ or ‘not their responsibility but someone else’s responsibility and thus somebody else’s (not their) problem’.

‘font of all knowledge’.

4. Responsibility and Coordination. The notion of a 'working' division of labour also refers to the ways in which many aspects of work in the Bank are explicitly concerned with coordinating interdependencies and responsibilities of various kinds in order to ‘get the work done’. Much of this coordination work consists of accepting responsibility for distributing relevant information t o relevant parties and keeping this flow of information going as a routine state of affairs. This routinisation and 'proceduralising' of responsibility is, for example, made obvious in the organisational procedures for lending money outlined below. Here the organisationally prescribed actions of different grades of staff as well as the various interdependencies between activities and organisational personnel - the ‘Appointed Officer, the ‘Grade 7’ and the ‘Grade 6’, are outlined. Similarly, as part of a formal notion of responsibility, the ‘procedural implicativeness’ of the various forms of paperwork and computerwork (ISS; GAPP analysis) that have to be completed are clearly indicated - "if approved, sign sanction and pass to Personal Loan Team".

The importance of the organisational division of labour i n distributing tasks and responsibilities was grossly observable in all the fieldwork settings. For example, work within the Securities Centre of the bank involves a very basic division of labour and responsibility. Overseen by the Manager and provided with secretarial support, two teams of nine workers service the branches .The hierarchy of Doer1 to Assistant Manager is largely seen as hierarchy of responsibility, knowledge and experience. Each team consists of the following: • five ‘Doer1s’ who do the basic processing of securities requisitions;

Page 102

Example 1: System for Personal Loan Process
1. Appointed Officer: attach face sheet and CN P/L Application £.... rec’d.
2. Grade 7: Credit Score then pass back to Appointed Officer. If refer - write reason on Application Form.
3. Appointed Officer:
   a) Approved - check 836 and if happy sign sanction and pass to Personal Loan Team.
   b) Refer - look at account, make decision:
      i) If approved, sign sanction and pass to Personal Loan Team.
      ii) Decline - pass to Grade 6 to see if we can help any other way, i.e. PMRL or Consolidation. Send letter from Fileserver.
   c) Decline - pass to Grade 6 to see if we can help any other way. Send Fileserver letter.
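Read purely as a workflow specification - of the kind later embedded in the 'workflow' software packages discussed below - Example 1 amounts to a small decision procedure routing an application between grades of staff. The following sketch is ours, not the Bank's: the function name and outcome strings are hypothetical glosses on the procedure, and it deliberately illustrates how little of the 'responsible working' surrounding each step such a rendering captures.

```python
# Hypothetical sketch only: Example 1's routing rules rendered as a small
# decision procedure. Outcome strings are our glosses, not Bank terminology.

def route_application(credit_score: str, officer_approves: bool = False) -> str:
    """Route a personal loan application following the steps of Example 1.

    credit_score: the Grade 7 credit-scoring outcome -
                  'approve', 'refer' or 'decline'.
    officer_approves: the Appointed Officer's own decision when the
                      score comes back 'refer'.
    """
    if credit_score == "approve":
        # 3a) Approved - check 836, sign sanction, pass to Personal Loan Team
        return "sign sanction; pass to Personal Loan Team"
    if credit_score == "refer":
        # 3b) Refer - Appointed Officer looks at the account and decides
        if officer_approves:
            return "sign sanction; pass to Personal Loan Team"
        return "pass to Grade 6 (PMRL or Consolidation); send Fileserver letter"
    # 3c) Decline - see if the Bank can help any other way
    return "pass to Grade 6; send Fileserver letter"
```

The point of the ethnography, of course, is precisely that the judgement exercised at each of these steps escapes such a formalisation.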

In this next example, of the commercial sanctioning process (lending money to businesses), responsibilities, and who is to discharge them (the Briefing Officer or the Interviewing Officer), are clearly outlined in a series of activities.

Example 2: Commercial Sanctioning Process
Activity 1: Information gathering (pre-interview)
Briefing Officer:
- structured request for information pre-meeting:
- customer letters on ISS;
- Customer Information Request form;
- arrange interview.
Activity 2: Analysis & Briefing
Briefing Officer:
- structured briefing financial analysis, including:
- Interview Note Pad;
- Balance Sheet Assessment Form;
- GAPP Analysis.
Activity 3: Customer Interview
Interviewing Officer:
- prompt to ensure all areas of risk assessment are covered;
- use Interview Note Pad to record notes;
- refer to Balance Sheet Assessment Form;
- Non-Financial Information.
Activity 4: Appraisal Form/Report Preparation
Interviewing Officer:
- structured output to record full Risk Assessment for UDP and Region files;
- record assessment on Appraisal Form, or Small Appraisal Form plus Non-Financial Information Form.

These examples illustrate how coordination and its associated responsibilities can also be a feature of specifically designed artefacts that facilitate coordination by embedding descriptions of the task, along with other relevant information, within the format of a document - as ‘instructions’, as ‘persons responsible’, and so on. These artefacts serve as instructions for a set of institutionally identified persons - the ‘Business Manager’s Assistant’, the ‘Records Clerk’ and so on - to perform particular tasks and, in addition, by the clear assignation of responsibilities, provide for the possibility of audit. The ‘completeness’ of the paper record acts as an audit trail, providing an outline, rationalisation and justification for administrative decisions as well as a clear delineation of administrative responsibility. However, the two formal procedures displayed above are just idealised outlines of processes - in the course of everyday work responsibility is demonstrated, made manifest and oriented to in practice in various ways. Responsibility has to be discharged and displayed in everyday work. A good example of this is in the lending interview that is a part of the sanctioning process. In this example the Manager has to consider an approach to borrow money to purchase a hairdresser’s shop. Prior to the meeting he looks at the Customer Brief from the Lending Centre along with some Interview Notes, as well as a file containing a range of computer printouts. He also has his own set of notes, consisting of a set of questions concerning the relationship between the people proposing to borrow the money, use of the account, the prospective borrowers’ contribution to the purchase, serviceability of the debt, and an outline of some issues connected to hairdressing as a business proposition. His preparation is very thorough, especially since he considers the request a ‘non-starter’ due to the lack of any obvious contribution from the borrowers.

LO: “What figures are we talking?”
C: “.. 68K .. the Building Society say it’s worth 65... we think it’ll come down..”
LO: “..first question - what have you got to put into it?”
C: “.. my own home.. that’s all .. we haven’t really got any ideas..”
LO: “For a commercial proposition to get off the ground we’re looking at a third.. the Banks have had their fingers burnt in the past.. (explains) ... it’s 20K .. or something like that..”
C: “There’s no way round it?”
LO: “No.. that’s the first thing that any Bank will ask..”
C: “So we have to get 20K..”
LO: “Not necessarily - speaking as a cautious banker .. we’re interested in your commitment to the business .. if you’re raising money on your property .. (but) you’re looking for the Bank to raise it all... I’m being honest with you .. you’ll incur a lot of expenses .. (and) you can’t get a domestic mortgage on it... (also) I’m talking off the top of my head (but) it’s a lot of money for a hairdresser’s.. the business has got to service that..”
C: “OK .. so if we find something different and get a bit of a contribution it’s worth coming back?”
LO: “It depends where you get it from….”
LO: “I’m going to play Devil’s Advocate.. it’s (the proposition) a lot of debt to have around .... cheer up .. it could be for the best”

In this interview the Lending Officer clearly establishes the bank’s basic position on lending to business propositions: “for a commercial proposition to get off the ground we’re looking at a third.. it’s 20K .. or something like that”. He also outlines other aspects of the bank’s position: “we’re interested in your commitment to the business… you’re looking for the bank to raise it all”. Of course this is a ‘routine’ rejection of the proposition, but it is not rejected out of hand - these people are, after all, customers and may well come back with a better proposition or a desire for other bank products.


5. Responsibility, plans and procedures
Within most major organisations, one of the more prominent ways in which distributed coordination is achieved and responsibility displayed is through institutionalised plans and procedures and their ‘situated interpretation’ (Suchman 1987; Dant and Francis 1998). This refers to a wide range of formal procedures which, in the Bank, would include, for example, ‘How to do an Annual Review’, ‘How to Sanction a Loan’, ‘How to Write a Report’ and so on - the step-by-step processes for the accomplishment of a procedure contained in manuals such as the PIF, the Lending Manual and the Action Sheets. Their explicit point is to coordinate the work of numbers of people in order that separate work activities and tasks come to have a coherence and, typically, through this to meet other goals such as efficiency, meeting time constraints, assigning responsibility and so on. However, as Suchman (1987) argues, the plan is an abstract construction which will, at the very least, require articulation with, and application to, the specifics of the circumstances in which it is to be followed. ‘Following the plan’ will always involve more than can be specified within it - and the notion of 'responsibility' incorporates this idea that plans should be activated in a sensible and 'responsible' fashion. Following plans in ‘real world, real time’ work does not involve the supposition that everything must be spelled out in minute detail. Instead, plans are characteristically ‘recipient designed’; that is, spelled out only to the extent that those who are to follow them are, for example, familiar with the circumstances in which they are to follow them, sufficiently trained in the tasks involved, and a host of other possible considerations. Nor does the making of plans indicate any expectation that the courses of action that they specify will, of necessity, follow through.
Indeed, the point of plans – followed responsibly - is often to maximise the chances that courses of action will ensue despite the contingencies that can arise. Here, for example, is a Business Account Manager talking about the Bank taking a debenture as part of the process of providing an increased overdraft facility to a company, thereby illustrating the way in which plans are practically accomplished in the course of the work and how, in this instance, paperwork acts as a mechanism of distributed coordination and for developing the awareness of work;

“..this ..is a limited company account and it works very well,.... (looking at file/printouts) and computer information, yes, used to quite a degree,.... a limit of 50K, .. I did look at the 836 and the 838 printout again to see this utilisation of an account, see what it’s doing (looking at printout) ...it works very well, no excess there, is there, at all, no excess days, ..that’s a very important part of information produced from the computer system,.. number of days in credit is important so it’s not in overdraft all the time, .. shows that credit balances are seen, .. we know that by those days but it does appear, ..together with maximum facilities are fairly lightly used .. ..so in discussion we go down and we talk to them about how the company works, ...the modus operandi of their trading businesses so that I could get a feel and get a handle on how it operates, get a feel of what the management is like, .. it all comes into the decision-making process ....because seeing the operation, talking to them, trying to ask questions and get a feel as to how good they are, and they’re pretty switched on these guys... they both know, they talk about these deals, ...they know what they’re doing and it’s difficult to get that picture over to an obscure lender who’s stuck up there in Regional office, that’s why recommendation is so important .. …….he (the customer) said “I’d like a 100(K) as standby and if something comes up”.... well, ..that was over ‘dp’ (discretionary power) ...we’re in for a total exposure of 140K, ...overall, ... so we’re then left with do we lend him 100 grand? ...decision making process, what’s the company’s trading performance been like? quite good, ..what’s its proven track record from audited figures? that’s quite good, surplus resources in the company, retained profits in the company.. that’s quite good. What about the product that they’re dealing with? do we consider that’s the sort of thing that can be moved on and sold.... (discussion of business) ...it shifts .. So, the product, the siting, the company, proven record, management; what do we think of the management? pretty good, pretty switched on, .. everything about it looks OK, so we want to go ahead and do a recommendation, we’ve got to go up to Region because it’s over those limits. Look at security, yeah, the Bank should have a debenture because we’re principally lending, well we’re lending on the company, we have got security, we’ve got a guarantee and it’s backed up by deeds... we’ve got security for about 50 grand, which covers that (pointing at file).., ... but, we should have a debenture, if it all fails we want to put a receiver in and take that stock.... we’ll put that (debenture) as a condition of sanction... and I think that’s right, .. we should have a debenture now, we’re lending 50, ..... we think we should do it (make the loan) but Bank policy says we should have a debenture..”

Here we see how responsibility involves not just following the plan, completing the paperwork and consulting the necessary computer printouts, but also 'getting a feel' for the business - enabling the Manager to make a judgement as to its soundness and creditworthiness. The extract also demonstrates how ‘responsible’ plans (and plans in the bank are almost always ‘responsible’) often include ‘fail safe’ devices to cope with situations where things are ‘not going to plan’, by specifying arrangements for adaptation of the plan to exceptions, unforeseen circumstances, even extensive revision, as well as mechanisms to oversee the implementation of the plan and enforce its requirements - in this particular case, if the loan ‘goes bad’, because they hold a debenture they can “put a receiver in and take the stock”. But, of course, responsibilities can be modified as circumstances change. In the following example a Business Manager outlines the actions taken on an account that is 'under report':

"..where you might get an account that is within your discretionary lending limits..but it may be in trouble, it may be naff, it may be at risk..and if you feel that that is the case then you are duty bound to report that to region


as a risk of loss or as an unsatisfactory account.. … If it’s under report.... I as a branch really have no discretion whatever .. I can’t let that drift up over whatever that max is for anything, I’ve no discretion at all. Having said that, that max was 31750.... and I’ve, in listening to him and talking to him, and I’ve put a little note on screen... I’ve said he could go to 32000, so I have exceeded what I should do, but only by £250, but what I did do, to save paper and generating a .. paperchase, I just rang a fellow at Region and said, ‘look, you know this is what the score is, I’ve agreed he can go from that to that (pointing at file) for the remainder of that term.. and that’s my max, … and will you give me verbal sanction that I can increase that limit on the computer to 32000 for the remainder of that term’, so he said yes... I wasn’t going to increase the Bank’s exposure any more whatever unless there was a guaranteed repayment source for that bit.. by increasing it by £250 that’s marginal..." In consequence we are somewhat sceptical about stilted, rationalistic models of responsibility and decision making for, as Feldman & March (1981) note, models of strictly rational decision making create expectations that are rarely met in practice.

6. Embedding responsibility in artefacts: using technology to support responsible decision making.
One of the ways in which plans and procedures appear in work activities, as aids to responsible decision making, is to formulate the work’s activities as step-by-step stages - and then embed these in a ‘technology’ of some kind - paper or computerised checklists, workflow managers, etc. This feature is, of course, a standard one in bureaucratic organisations, one which has been explicitly designed as a means of making work subject to a system (Yates, 1989; Weber, 1947) [3] and one commonly found, for example, in 'workflow' software packages. In this section we examine various ways in which responsibility is formalised and embedded in artefacts as part of a process of attempting to secure and ensure a standardised approach to the everyday business of the bank and its treatment of customers. Such artefacts are ways in which organisations attempt both to ensure responsible working - by insisting on the following of procedure - and, perhaps paradoxically, to reduce individual responsibility by insisting on the implementation of standard procedures. In the Business Centre a number of software packages, notably ‘GAPP’ (‘Grading and Pricing Program’) and the ‘Risk Grading’ on ‘Fileserver’, had been introduced and were intended both to support decision making and to improve the speed of processing, thereby giving staff more time to be ‘pro-active’, to develop customer relationships and sell Bank

[3] Much of the concern of Weber’s work is with the extent to which the rise of rationality in the West is marked by an extension of planning and calculation to more and more areas of social life.

products. [4] The following extract shows a Business Manager’s Assistant carrying out a ‘GAPPing’ exercise prior to the Manager’s visit to the company:

1. Gets screen ‘Customer New Record’ - fills in details from GAPP data input form (obtained from company’s accounts).
2. Screen ‘Customise’ - (name) - fills in details - date acc obtained etc.
3. Screen ‘Business Definition’ - “What does pharmacist go under?” - discussion with other BMAss - “try that one” - clicks on various titles - “what’s other?” - other small screens appear - eventually finds it.
4. Screen ‘Audited Management Accounts’ - “do you put a minus in here if it’s in brackets?” “Yes - it will print up then” - filling in details from form.
5. Screen ‘Management Details’ - (series of yes/no clicks) - management assessment; financial monitoring; trading environment; short term problems.
6. Screen ‘Facility Summary’ - ‘New Customer facility’ - as each section of the screen is entered, ‘help/explanation’ messages appear at the bottom of the screen.
7. Prints out ‘Risk Analysis Summary’ - gives risk grade and ratings on facilities (what should be charged).

Again - as with other software packages - the material to be entered, manually, into the program already existed elsewhere in the system, yet the inability of packages to ‘talk’ to each other resulted in wasteful duplication of effort, e.g. with the ‘Decipher’ package. It is important to recognise that GAPP was simply an addition to the existing risk assessment and pricing ‘devices’ - in some senses merely automating what had previously been done (and continued to be done) manually. GAPP, although incorporated into the lending process, appeared as a mere additional check in that process rather than integral to it. This meant that GAPPing seemed less important as a decision-making device than as a ‘security blanket’ for decisions already made, and the starting point for negotiation (particularly over pricing) with the business concerned.
As an Assistant Manager said: “you cannot say straightaway... just because the computer program says 1% higher... you can’t just impose a 1% rise... you’ve got to use it as a tool... you’ve got to sum up how much the overdraft is and whatever..” This position - of using the software to confirm rather than determine decisions - may have arisen as a consequence of the inclusion in the program of ‘non-financial’

[4] A number of new software programs had been placed in the Business Centre, notably ‘GAPP’ (grading and pricing policy) and ‘Decipher’ (a balance sheet information and analysis package) and eventually the ‘Balanced Business Scorecard’ (a performance measurement program). The GAPP machine was used to calculate the Risk Grade of businesses (1-9; 1 = “substantially risk free, with minimal risk of failure”; 9 = “loss likely”), thereby influencing lending decisions, and the pricing policy that should be adopted (which was also influenced by the Risk Grade).


information which could significantly influence the risk grade obtained and which was dependent on the Manager’s store of local and anecdotal knowledge; e.g. “are there any signs of creative accountancy?”; “are there any anecdotal signs of problems?”. It may also represent a reflection of managerial experience and scepticism about the information provided - an awareness of the variety of techniques that could be employed to disguise the ‘true’ nature of an account. One other ‘technology’ was the lending mnemonic 'CAMPARI & ICE' that appeared in the Lending Manual and on the paperwork that was used to support lending decisions to Regional Office. ‘CAMPARI & ICE’ was used to guide lending decisions by highlighting the range of factors - ability to repay, purpose, etc. - that needed to be taken into account when making lending decisions. If the Lending Manual might be regarded as the ‘Bible’ of Bank products and procedures, it might be suggested that ‘CAMPARI & ICE’ constitutes the ‘Ten Commandments’; commandments, moreover, whose neglect in the past, in the late 1980s, had, at least in Bank mythology, contributed to a massive increase in bad debts. However, what was notable in the practical application of ‘CAMPARI & ICE’ was its skilful application to assemble a 'case' for a lending decision, often being used retrospectively to justify a decision already made on 'gut feeling' or the apparently 'intuitive' deployment of lending ‘lore’ developed over the years. As one Lending Manager said, “You usually find that the decision you make from your gut is the one you go with.” It was thus an informal/formal mechanism for assessing lending proposals: 'formal' because it appears in the Lending Manual; 'informal' in the sense that attention to its details, as opposed to its general spirit, is relatively rare.
For Lending Officers and Business Managers much of the work of Report Writing is primarily concerned with developing a persuasive 'account' of a lending decision for Regional Office, focussing on those factors with which officials at Regional Office will be most concerned. In this sense ‘CAMPARI & ICE’ is less about the process of making the decision than about accounting for it - in much the same way as Garfinkel (1967) portrays the process by which juries account for their decisions. In the following extract a Business Manager is justifying a lending decision (for a customer who is being pursued by the Inland Revenue). What is important about this extract is that this is precisely the kind of conversation the Business Account Manager will have with his Assistant and with other Business Managers; it consequently reflects aspects of organisational socialisation - of developing a sense of the organisation, what it does and why it does it - that lie behind the rather dry notions of ‘responsibility’ or ‘planning’:

.. yeah, but where do you draw the line? is it right to go throwing good money after bad? we’ve done that, been down that road, in the past .. throw out a bit more and you think .. it just staves it off .. why do we stand in the shoes of the Revenue? Is it right that you throw another £6000? What are you going to do about next year’s tax? ... that’s probably where some of that (pointing at figures) has come from... tax or something, .. where somebody has gone out and thrown out a little bit more, thrown out a little bit more, and so on and so on, .. and probably other managers have just nodded it through, without actually saying, hang on, how are you, realistically, going to deal with this? how are you going to service this borrowing? where is it coming from? ... I don’t see there is any benefit in just shelling out money all the time, .. there must come a point where you’ve got to say, ‘I’m going to draw a line here’, ... and I think it should have been drawn a long time ago, myself.”

It is as a consequence of just this kind of conversation that Business Managers do not need total recall of ‘CAMPARI & ICE’ to get a sense of when lending decisions are ‘wrong’. This constitutes a divergence between what might be characterised as an 'institutional logic' (Buttny 1993:166) - of highly specialised rules and procedures - and a 'folk logic' (Buttny 1993:49), “a logic for action, that is, what is right, moral or at least acceptable... what counts as 'right', 'smart' or at least 'passable' conduct...”, in which lending decisions are a product of 'gut feeling'. As Buttny writes, “..negotiating and coordinating diverging logics is more complex than simply applying a general rule to a particular situation.” (Buttny 1993:167) The 'logic' deployed in this instance includes the ‘practice’ of using a range of implicit and explicit rules and guidelines in appropriate contexts. One of the properties of such formats as ‘CAMPARI & ICE’ is the way in which they can proceduralise representation and, through this, represent the work to others for particular purposes. The format - for example, how to complete the Lending Appraisal Form, how to complete a GAPPing exercise, how to complete a Balance Sheet Assessment Form, and so on - functions as a set of instructions in both its creation and its use. This is at least one reason for the importance of standardisation. The various forms in use in the bank are all designed to collect standard information, to make the information comparable and to control the information that is provided.
In this respect they are solutions to the problem of the assembly of information in organisations identified by Garfinkel (1967); that is, what information is needed and its ‘value’ - the worth of collecting the information with reference to the effort involved in its collection. The attempt to get people to comply with formats is often, with regard to those who must apply them, a disciplinary matter; that is, an attempt to ensure, by laying out a series of procedures to be followed, for example in the Action Sheets or the Lending Manual, that the persons who need to comply actually do so. Nevertheless, despite the obvious benefits of standardised processes and formats, the format does not always, in itself, convey an adequate ‘sense of the work’, and it is in these circumstances that local knowledge and a range of interactional skills are deployed to help ‘make sense of’ the work.

7. Responsibility, the Customer and Complaints. One of the professed aims of the strategic planning of financial service organisations has been the reconfiguration of customers such that their behaviours and interactions are rendered reasonably predictable. With the growing amount of information compiled and used there is a commensurate attempt to formalise and standardise the formats for the presentation of information (Randall et al., 1995) and efforts to ensure that customers behave in a way that will facilitate such a uniform approach. At the heart of this ‘configuring the user’ (Woolgar 1991) lies the notion that both customers and


staff can be trained to behave in an ordered fashion. Managers routinely talk about how the organisation is keen to configure its customers to the bank’s way of doing things - aiming to “train the customer to do the work of the bank”, viewing the customer as “a partial employee of the bank”. So, for example, in telephone banking it is common for operators to explain what is and is not possible over the phone, and to direct customers to where to find information on bills, chequebooks and so on. All this information helps to configure the customer as to what is required for smooth interaction, how information should be packaged, what services are possible, how they are delivered and so on.

in terms of customer satisfaction in being “bad for efficiency, overall satisfaction, dealing with standing orders, (and) sorting out mistakes”. In the Branches, while, in general, the error rate was small, it was constant and resulted in time consuming 'error hunts' and corrections. Further, with increased staff absence, the employment of part-timers unfamiliar with the particular office - where files were located, what correct procedures were, and so on - and redeploying staff unfamilar with the processes to clear backlogs of work, the error and complaint rate was not being significantly reduced. A number of other comments and typical exchanges from the fieldnotes support this view; “Ask X, she might know”

Both computer and paper based technologies act as ways of ensuring and standardizing ways of responsible working. Meanwhile other devices – notably scripting, the use of standard letters – whilst also having this objective also served to pass responsibility ‘back’ to the customer. Customer interaction is in some senses the most unreliable aspect of everyday work – because, of course, they have their interests not the banks foremost. Customers cannot be relied on t o produce their questions in a fashion that is predictable or consistent with the institution’s order of things, nor can they be relied on to furnish all relevant information. Interactions with customers can, then, be hugely unpredictable for even if not all customers are awkward, many are. Customers ‘typically’ make multiple enquiries involving moving in and out of a range of screens and software packages. They typically ‘forget’ then ‘remember’ enquiries, digress, 'waste time' and generally behave in ways that cannot be accounted for by any simple process model. The ‘art’ of operator work resides then, in the accomplishment of a customer’s individual requirements and making this fit with the more standardised requirements of the bank. In customer interactions trying t o keep the customer satisfied is a matter of juggling a quite complex and potentially conflicting series of demands (Randall and Hughes, 1994). Banks employ the use of scripts as a means for operators to organise their interaction with customers. The use of scripts means that when customers interact with their bank the interaction has generally a standard form, staff are prevented from engaging i n organisationally problematic, irresponsible, talk and processes are streamlined. As Strathern (2000) argues, ‘audit cultures’ are increasingly common in both public institutions and private enterprise, reflecting the need to practice and perform accountability based around the twin goals of economic efficiency and good practice. 
These new kinds of accountability have generated new managerial and organizational forms and technologies through which they can be expressed. The concept of the audit, has conjured up a raft of ‘technologies of accountability’ which, as Power (1994) suggests both monitor and construct notions of quality and performance. Audit in this sense represents less an evaluative tool than a means of indirect control over work practices through monitoring and regulation.

The issues of responsibility, error and complaint were of particular importance for the management of the Bank in general and the branches in particular. This concern had been heightened by a Which? report that had placed the Bank lowest.

“To be honest with you, I don’t know how it works... I’ve only done this job for a week... what’s your phone number? I’ll find out and ring you back.” On the phone: “Your charging structure... we used to charge them... how does it work now?” “The girl there thinks it’s... but she’s not that sure.”

The main sources of error and complaint came in the charging of customers; in a number of cases this was the result of previous errors which had not been thought through when originally corrected. Part of the problem here is that of going beyond the routine, and part is the variable distribution of decision making between the computer system and the staff. The failure to cancel standing orders in time occasionally means that the account becomes overdrawn and incurs charges; the latter are automatically triggered on the computer when the agreed limit for the account is exceeded. When the initial mistake is rectified - typically after a customer complaint - and the money paid back into the account, the charges were often overlooked, so initiating a further complaint from the customer. As a Manager commented: “people aren’t instructed to think through the effect of that change - they’re only interested in putting it right”. In general, and contra the impression given in the Which? report, complaints appeared to be dealt with in a courteous and effective, if not always speedy, manner. The instructions on complaints handling - including the ‘do’s’, such as ‘listen and don’t interrupt’, ‘apologise’, ‘deal with complaints immediately’, and the ‘don’ts’, such as ‘argue’, ‘blame others’, ‘ignore complaints’ - seemed to be followed. Nevertheless, there was a feeling that staff were already doing all that could be done to prevent or mollify complaints and that much of the remaining source of complaint was a product either of customer stupidity or Bank policy, neither of which was amenable to dramatic change.

“It depends what people think is a complaint... if they complain about second class mail, that’s still a complaint even though it isn’t anything we can do anything about”... “It’s the fault of the system... they’ve cut out all checking and that’s why it happens.” This idea, that Bank policy is a source of many complaints, clearly emerges in the following extract from the fieldnotes:

Page 107

Phone call: customer has gone into another branch without his cheque card and been charged.

Uses computer to check account, explains to customer about charges; asks Assistant Manager.

Explains charges to customer; customer had not been charged before, but this had been at the manager’s discretion.

Customer wants to talk to Senior Manager.

Senior Manager says discretionary charge was “in years gone by”; tells customer to have a word with his relationship manager about a possible reimbursement; explains that they are charging for a service, “like a grocer”. Customer is getting angry, saying that “if you don’t want my account I’d be better off moving it”. Replies: “All I’m saying is that is the Bank’s policy.”

8. Conclusion: Making Ethnography Accessible
This paper has highlighted a series of ethnographic ‘vignettes’ that touch on issues of ‘responsibility’ as they were observed in the everyday work of a ‘high street’ bank. The value of ethnography in design has always been a matter of some controversy (Plowman et al 1995). Within DIRC we believe that ethnographic observations can be of critical value in making visible the ‘real world’ aspects of a work setting. We suggest that ethnographic approaches may clarify the role that actual practices play in the management of work through viewing activities as social actions embedded within a socially organised domain and accomplished in and through the day-to-day activities of its users. This is, in fact, a sociologically partisan conception of ethnography, but it does have the advantage of focusing upon the specific and detailed organisation of activities which designers are concerned to understand, analyse and reconstruct. It is this ability of ethnography to describe a social setting as it is perceived by those involved in the setting (the archetypal ‘users’) which underpins its appeal to designers. Obviously there are problems in enabling designers to utilise ethnography. The need to increase the utility of ethnography and to foster communication has directly motivated a number of developments for collecting, organising and presenting ethnographic material. Of particular relevance for the TA ‘Making Ethnography Accessible’, it has fostered the construction of a ‘framework’, a ‘sensitising device’ - partially utilised in this paper - that allows the results of ethnographic studies to be structured for presentation in a manner which makes the emerging results more digestible by designers (Hughes et al 1997).
In deploying aspects of this provisional framework we are attempting to steer a difficult course between the accusation that ethnography is simply ‘hanging around’ and its findings entirely idiosyncratic, fortuitous and inconsequential, and the over-systematisation of a ‘cookbook’ or ‘painting by numbers’ approach that effectively defeats the purpose of an ethnographic approach.

REFERENCES:

Anderson, R. J., Hughes, J. A., and Sharrock, W. W. (1989) Working for Profit: The Social Organisation of Calculation in an Entrepreneurial Firm. Aldershot: Avebury.
Austin, J. (1962) How To Do Things With Words. Cambridge, MA: Harvard University Press.
Bittner, E. (1965) ‘The concept of organisation’, Social Research, 23, 239-255.
Bittner, E. (1973) ‘Objectivity and Realism in Sociology’, in Psathas, G. (ed.) Phenomenological Sociology. New York: John Wiley.
Buttny, R. (1993) Social Accountability in Communication. London: Sage.
Coulter, J. (1979) The Social Construction of Mind: Studies in Ethnomethodology and Linguistic Philosophy. London: Macmillan Press.
Dant, T. and Francis, D. (1998) ‘Planning in Organisations: Rational Control or Contingent Activity?’, Sociological Research Online, Vol. 3, No. 2. http://www.socresonline.org.uk/socresonline/3/2/4.html
Feldman, M. and March, J. (1981) ‘Information in organisations as signal and symbol’, Administrative Science Quarterly, 26, 171-186.
Garfinkel, H. (1967) Studies in Ethnomethodology. Englewood Cliffs, NJ: Prentice-Hall.
Hughes, J., O’Brien, J., Rodden, T., Rouncefield, M. and Blythin, S. (1997) ‘Designing with ethnography: a presentation framework for design’, Proceedings of the ACM Conference on Designing Interactive Systems: processes, practices, methods, and techniques. Amsterdam: ACM Press, 147-158.
Hughes, J., O’Brien, J., Randall, D., Rouncefield, M. and Tolmie, P. (2001) ‘Some “real” problems of “virtual” organisation’, New Technology, Work and Employment, Vol. 16, No. 1.
Martin, D. and Rouncefield, M. (2004) ‘“Making the organisation come alive”: talking through and about the technology in remote banking’, HCI Journal, Vol. 18, Nos 1-2, 111-148.
Plowman, L., Rogers, Y. and Ramage, M. (1995) ‘What are workplace studies for?’, Proceedings of ECSCW ’95, Stockholm, Sweden: ACM Press, 309-324.
Power, M. (1994) The Audit Explosion. London: Demos.
Randall, D. and Hughes, J. A. (1994) ‘Sociology, CSCW and Working with Customers’, in Thomas, P. (ed.) Social and Interaction Dimensions of System Design. Cambridge: Cambridge University Press.
Randall, D., Hughes, J., O’Brien, J., Rodden, T., Rouncefield, M., Sommerville, I. and Tolmie, P. (1999) ‘Banking on the Old Technology: understanding the organizational context of “legacy” issues’, Communications of the Association for Information Systems, Vol. 1, Article 21, June 1999.
Randall, D., Rouncefield, M. and Hughes, J. (1995) ‘Chalk and Cheese: BPR and Ethnomethodologically Informed Ethnography in CSCW’, Proceedings of ECSCW ’95, Stockholm: ACM Press, 325-340.
Schutz, A. (1964) Collected Papers, Vols 1-3. The Hague: Martinus Nijhoff.

Page 108

Schutz, A. (1967) The Phenomenology of the Social World, trans. George Walsh and Frederick Lehnert. Northwestern University Press.
Strathern, M. (ed.) (2000) Audit Cultures: Anthropological Studies in Accountability, Ethics and the Academy. London: Routledge.
Suchman, L. (1987) Plans and Situated Actions: The Problem of Human-Machine Communication. Cambridge: Cambridge University Press.
Weber, M. (1947) The Theory of Social and Economic Organisation. New York: The Free Press.

Wittgenstein, L. (1968) Philosophical Investigations. Oxford: Blackwell.
Wittgenstein, L. (1998) The Blue and Brown Books. Oxford: Blackwell.
Woolgar, S. (1991) ‘Configuring the user: the case of usability trials’, in Law, J. (ed.) A Sociology of Monsters: Essays on Power, Technology and Domination. London: Routledge.
Yates, J. (1989) Control through Communication: The Rise of System in American Management. Baltimore: Johns Hopkins University Press.

Page 109

The influence of regret on choice: Theoretical and applied perspectives
Chris Wright
Department of Psychology, City University, London
Current address: Department of Mental Health Sciences, Royal Free and University College Medical School, Hampstead Campus, Rowland Hill Street, London NW3 2PF, United Kingdom

E-mail address: [email protected]

ABSTRACT
In this paper, I give an overview of some of the research that I have conducted for my DIRC-funded PhD thesis (submitted in June 2004). The thesis explores both applied and theoretical aspects of regret in decision-making.

Keywords Responsibility; risk communication; persuasion; user behaviour; intervention; regret; decision-making; choice; Regret Theory; Decision Justification Theory.

1. INTRODUCTION
Recent psychological theory and research suggests that emotions can have an important influence on human choice and judgment. Rather than disrupting cognitive processes, as was once believed, it now appears that emotions may actually be essential for effective decision making [4]. One emotion is thought to be particularly relevant to choice: regret [2, 5]. Regret is experienced when a choice turns out badly. In their mind, the decision-maker compares the negative outcome they have experienced with what they know or believe would have happened if they had chosen differently. Thus, in addition to feelings of disappointment about the outcome itself, regret is associated with a sense of self-blame about having made the wrong decision. The consequences of experiencing regret may not, however, be all bad. Research suggests that regretful decision-makers are motivated to try to improve their performance by doing things differently in the future [9]. Studies have, for example, demonstrated that the experience of regret over a choice of service provider promotes a desire to switch to an alternative service provider in future [10].

1.1 Using Regret to Change System User Behaviour (Applied Perspective)
It is not only experienced emotions that can affect decision-making. Regret Theory [2, 5] claims that decision makers can also anticipate the possible emotional consequences of their choices before they make them, based on an assessment of the perceived likely outcomes. According to the theory, when choosing between available options, an individual takes these anticipated future emotions into account, typically selecting the option they believe will be associated with the least future negative affect. A growing body of research supports the view

that the anticipation of regret influences choices made by participants in gambling and negotiation experiments [8] and by lottery players in real life [11]. ‘Intervention studies’ also provide evidence that highlighting future regret can influence people’s subsequent thoughts and actions. All of these studies have involved either health or road safety choices [1, 6, 7]. Part of my research has explored whether focusing individuals on their future regret can persuade them to make more security-conscious choices in relation to backing up their work and internet security¹. Computing science students from the University of Newcastle and City University London took part in an experiment over a 12-week period. They reported their attitudes and behaviour in relation to the two issues at three timepoints (baseline, intervention and follow-up phases). In the intervention phase, students imagined themselves in a scenario where, had they chosen to act differently, they could have avoided a negative outcome. For example, in relation to backing up, participants imagined themselves as a researcher in a computer research centre who does not back up their work regularly because of the time and effort required; instead, they keep all their research files only on their laptop. One day they discover that their laptop has been stolen and, because they have not backed up recently, they have lost all the work they had done over the previous two months. As a result, there is a delay in being able to publish their research findings and it is likely that researchers from competing research centres will publish similar work first and thus reap all the rewards. Having imagined themselves in this scenario, the students then considered how regretful they would be feeling in that situation.
The results of the experiment showed that, for backing up, students reported more data-protective behavioural choices immediately after the intervention and at follow-up (five weeks later), compared to their baseline measures. Whilst the second scenario was less effective at changing students’ behaviour in relation to disabling active scripting (although there were positive changes in the short term in their perceptions of the advantages of adopting this security measure), the experiment nonetheless provides some preliminary evidence that a regret-based communication could be a useful persuasion tool in the domain of computer security.

Page 110

1.2 Regret Components and their Antecedents (Theoretical Perspective)
Recent theory in the field of regret has proposed that there are two ‘core components’ of regret. Decision Justification Theory (DJT) argues that regret consists of a negative feeling about the bad outcome (‘outcome regret’) and a sense of self-blame about making a bad decision [3]. According to the theory, the two components have different antecedents: outcome regret is influenced by how bad the outcome is, while decision-related regret is related to the perceived quality of the decision - how justifiable the decision seems. Since they have different antecedents, DJT proposes that, depending on the circumstances surrounding a choice, the two components of regret may be experienced together or one without the other [3]. For example, if decision-makers feel they made a justifiable decision, they may experience regret about the outcome (if things turn out badly), but they will not experience any self-blame (since they made a ‘good’ decision). Their overall feeling of regret is therefore likely to be less than that of a decision-maker who experiences the same negative outcome but made a poor (unjustifiable) choice - on top of facing the regrettable outcome, they also experience self-blame for not choosing better. A series of experiments reported in my thesis investigated some theoretical aspects of regret in decision making, arising out of DJT’s proposals. The results from my scenario-based studies suggest that, rather than having totally distinct antecedents, the two components of regret may in fact share similar antecedents, in that the nature of the outcome of a choice appears to affect individuals’ perceptions of the justifiability of their decision. Decisions which were followed by more serious outcomes were seen as being less justifiable than the same decisions followed by less severe outcomes.
Thus, outcome severity appears to influence both regret about a bad outcome and regret about making a poor decision. Overall, outcome regret tended to be rated as greater than decision regret, perhaps reflecting a sub-conscious psychological mechanism that protects the individual from self-blame and negative mood. A content analysis of reported “biggest regrets”, derived from a search of UK newspaper articles, also found that people mention regrettable outcomes more frequently than they mention regrettable decisions. However, the findings of this newspaper review also indicated that individuals may not distinguish as clearly between ‘regret’ and ‘disappointment’ as some researchers and theorists do - many of the reported ‘regrets’ referred to bad outcomes that were not related to the individual’s own choice. My research findings suggest that it is the valence of the outcome of a choice (rather than concerns about the quality of the choice itself) which plays the stronger part in determining our experiences of regret in decision-making.
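DJT’s two-component account, together with the finding that outcome severity also feeds into perceived justifiability, can be captured in a toy model. The sketch below is purely illustrative: the function names, the linear form and the `spillover` weight are assumptions made for exposition, not part of the theory or of the thesis experiments.

```python
def regret_components(outcome_severity, decision_quality, spillover=0.3):
    """Toy sketch of Decision Justification Theory (DJT).

    outcome_severity: 0 (benign) .. 1 (very bad outcome)
    decision_quality: 0 (unjustifiable) .. 1 (fully justifiable)
    spillover: how strongly a bad outcome erodes perceived
               justifiability (the 'shared antecedent' finding).
    The linear form and weights are illustrative assumptions.
    """
    # Outcome regret is driven by how bad the outcome is.
    outcome_regret = outcome_severity
    # Perceived justifiability is eroded by severe outcomes
    # (the shared-antecedent finding), floored at zero.
    perceived_justifiability = max(
        0.0, decision_quality - spillover * outcome_severity)
    # Decision regret (self-blame) grows as justifiability falls.
    decision_regret = 1.0 - perceived_justifiability
    return outcome_regret, decision_regret

# A justifiable decision followed by a bad outcome yields outcome
# regret but comparatively little self-blame; the same outcome after
# a poor decision yields both components.
good_choice = regret_components(outcome_severity=0.8, decision_quality=0.9)
poor_choice = regret_components(outcome_severity=0.8, decision_quality=0.2)
```

In such a model, the two decision-makers share the same outcome regret but differ sharply in self-blame, which is exactly the contrast DJT draws.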

2. ACKNOWLEDGMENTS
The research reported in my PhD thesis was funded by Engineering and Physical Sciences Research Council (EPSRC) Research Studentship Award Number 317441 and the Dependability Interdisciplinary Research Collaboration (DIRC) Project. I wish to thank Professor Lorenzo Strigini and Dr Andrey Povyakalo, Centre for Software Reliability, City University, London, for their assistance with the technical aspects of the study materials for the regret intervention study; Professor Cliff Jones, Mr David Greathead, Dr John Fitzgerald, Dr Cristina Gacek (University of Newcastle) and Dr Peter Popov (City University London), who facilitated data collection for the regret intervention study; and Professor Peter Ayton, Department of Psychology, City University, London, who supervised my PhD.

3. REFERENCES
[1] Abraham, C., and Sheeran, P. (2003). Acting on intentions: The role of anticipated regret. British Journal of Social Psychology, 42, 495-511.
[2] Bell, D.E. (1982). Regret in decision making under uncertainty. Operations Research, 30, 961-981.
[3] Connolly, T., and Zeelenberg, M. (2002). Regret in decision making. Current Directions in Psychological Science, 11, 6, 212-216.
[4] Damasio, A.R. (1994). Descartes’ error: Emotion, reason and the human brain. New York: G.P. Putnam’s Sons.
[5] Loomes, G., and Sugden, R. (1982). Regret Theory: An alternative theory of rational choice under uncertainty. Economic Journal, 92, 805-824.
[6] Parker, D., Stradling, S.G., and Manstead, A.S.R. (1996). Modifying beliefs and attitudes to exceeding the speed limit: An intervention study based on the Theory of Planned Behavior. Journal of Applied Social Psychology, 26, 1, 1-19.
[7] Richard, R., van der Pligt, J., and de Vries, N. (1996). Anticipated regret and time perspective: Changing sexual risk-taking behavior. Journal of Behavioral Decision Making, 9, 185-199.
[8] Zeelenberg, M. (1999). Anticipated regret, expected feedback and behavioural decision making. Journal of Behavioral Decision Making, 12, 93-106.
[9] Zeelenberg, M., van Dijk, W.W., Manstead, A.S.R., and van der Pligt, J. (1998). The experience of regret and disappointment. Cognition and Emotion, 12, 2, 221-230.
[10] Zeelenberg, M., and Pieters, R. (1999). Comparing service delivery to what might have been: Behavioral responses to regret and disappointment. Journal of Service Research, 2, 1, 86-97.
[11] Zeelenberg, M., and Pieters, R. (2004). Consequences of regret aversion in real life: The case of the Dutch Postcode Lottery. Organizational Behavior and Human Decision Processes, 93, 155-168.

¹ A more detailed report of the regret-based intervention study is currently under review by the International Journal of Human-Computer Studies. Copies of the draft manuscript are available from [email protected].

Page 111

Page 112

Structure Theme

Page 113

Structuring dependable on-line services: A case study using internet grocery shopping
Gordon Baxter¹, Budi Arief², Shamus Smith³ and Andrew Monk¹

¹ Department of Psychology, University of York, Heslington, York YO10 5DD, +44 1904 434369, {g.baxter, a.monk}@psych.york.ac.uk
² School of Computing Science, University of Newcastle, Newcastle upon Tyne NE1 7RU, +44 191 2228971, [email protected]
³ Department of Computer Science, University of Durham, Durham DH1 3LE, +44 191 3344284, [email protected]

ABSTRACT
Whilst we are entering the era of the silver surfer, there are still many older people who do not have access to internet-based services. Grocery shopping via the internet, for example, is potentially very useful to older people with mobility problems. The standard model of internet shopping involves a do-it-yourself (DIY) approach; an alternative approach is to get someone else to do it for you (GSETDIFY). The GSETDIFY model has been implemented by the Net Neighbours scheme with the aim of providing social support and human contact with a volunteer. The DIY model and the Net Neighbours implementation are compared and contrasted here in terms of how their structure affects the dependability of the service offered. Whilst the Net Neighbours structure is necessarily more complex, as it involves more stakeholders, it has been possible to achieve a high level of dependability by drawing on these stakeholders as additional resources.

Keywords
Structure, dependability, socio-technical systems, Age Concern, internet shopping, shopping by proxy

1. INTRODUCTION
While there has been much talk about so-called silver surfers—older people who regularly use internet services—there is still a proportion of older people who do not have access to the internet and the services that it can provide. This paper focuses on access to internet services, focusing in particular on internet grocery shopping, and how these services can be accessed by older people whether they have direct access to the internet or not. More particularly, it compares and contrasts the dependability of what many would regard as the standard way of internet shopping with the more complex structure of an indirect method in which the internet access is provided by intermediary volunteers.
In section 2 the basic process of grocery shopping is described, before going on to consider two possible alternative mechanisms for grocery shopping over the internet: personally (i.e. directly), or via an intermediary (i.e. indirectly). Our main concern is with the second mechanism and how it can be implemented in practice to provide a service that is dependable (reliable, available, safe and secure). Section 3 describes Net Neighbours, a pilot scheme in York designed to support indirect internet grocery shopping. This scheme adds intermediary stakeholders between the purchaser and the on-line shop. The relationships between these stakeholders need to be carefully monitored and, where appropriate, controlled if a dependable service is to be provided. In section 4 we consider the dependability issues of Net Neighbours and compare it to the direct internet shopping approach. In section 5 we discuss how explicit management of the extra organisational structures needed for indirect shopping provides support not only for the shopping itself but also for other desirable consequences, for example social interaction for the older shopper.

2. GROCERY SHOPPING
The basic task of grocery shopping is simple when viewed at a high level of abstraction. It is simply a matter of:
1. Decide what groceries are required
2. Select the groceries
3. Pay for the groceries
4. Get the groceries home
5. Put the groceries away
This simple model also applies to internet grocery shopping, with the difference that item 4 is carried out by the supermarket rather than the customer. If one views internet grocery shopping as a service, then the way in which that service is implemented will affect the dependability of the service. Here, two ways of implementing internet grocery shopping are described.

2.1 DIY INTERNET GROCERY SHOPPING
In the do-it-yourself (DIY) approach to internet grocery shopping, the customer is the person who decides what groceries are required, places the order by selecting the groceries from the supermarket website, and pays for the groceries using the supermarket website. There are some other subtle nuances to the process, however. The structure of the service is described as a Hierarchical Task Analysis (HTA) in Table 1. In order to be able to order groceries over the internet, customers first have to register with the supermarket by setting up an account. This requires the use of an e-mail account, which can be quite a hurdle if the customer is unfamiliar with online shopping, as there is no equivalent in bricks and mortar shopping. As part of the process of placing the order, the customer has to select a time slot in which the groceries should be delivered and must then pay for the order, typically with a credit card. The customer is sent a confirmation of the order by e-mail. The customer’s credit card is debited by the internet supermarket and the order is collated and dispatched from a local supermarket. The groceries are then delivered to the customer,

Page 114

sometime during the selected time slot (depending on the supermarket, this is usually a one- or two-hour slot). The customer then accepts or rejects any substituted items (if an item is out of stock, most internet supermarkets will substitute a similar in-stock item). Any rejected items are returned to the delivery driver. The driver is responsible for informing the store that the rejected substitutions have been returned and that the customer’s account should be credited. It is the customer’s responsibility to ensure that the order is complete and correct and to check that the value of any missing items is credited back to their payment method, e.g. back onto their credit card account.
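As a hypothetical illustration (not part of the paper’s own analysis, and with task wording paraphrased from Table 1), the DIY process can be written down as a small task structure from which a per-step exception checklist of the kind given in Table 2 can be read off mechanically:

```python
# Illustrative sketch: the DIY internet shopping HTA (cf. Table 1)
# paired with per-step exceptions (cf. Table 2).
DIY_HTA = {
    "1":   "Register with supermarket (provide personal details)",
    "2.1": "Decide what groceries are required",
    "2.2": "Select groceries (online)",
    "2.3": "Select delivery slot (online)",
    "2.4": "Pay for groceries with credit card (online)",
    "2.5": "Be in to receive groceries",
    "2.6": "Check groceries and return unwanted substitutions",
    "2.7": "Put away groceries",
    "2.8": "Check credit card account against receipt",
}

DIY_EXCEPTIONS = {
    "1":   ["Cannot understand purpose or jargon", "No e-mail account"],
    "2.3": ["No suitable delivery slot available"],
    "2.4": ["No credit card"],
    "2.5": ["Forgets or unable to be in", "Delivery does not arrive on time"],
    "2.6": ["Essential items not delivered"],
    "2.8": ["Apparent error on statement"],
}

def exception_checklist(hta, exceptions):
    """Walk the HTA in step order, pairing each task with its exceptions."""
    return [(step, task, exceptions.get(step, []))
            for step, task in hta.items()]
```

Systematically walking the task structure in this way is just what the authors do by hand: every step is inspected for ways the service could fail from the customer’s point of view.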

Having summarised the structure of the service as an HTA, it is then possible to list points where the service could go wrong; these can be thought of as exceptions.

1. Register with supermarket (provide personal details)
2. Shop on line
2.1 Decide what groceries are required (shopping list)
2.2 Select groceries (online)
2.3 Select delivery slot (online)
2.4 Pay for groceries with credit card (online)
2.5 Be in to receive groceries
2.6 Check groceries and return unwanted substitutions/errors
2.7 Put away groceries
2.8 Check credit card account against receipt
Plan: 2.1, then 2.2 and 2.3 in any order, then 2.4-2.8 in that order
Plan: 1 only once, then repeat 2 as necessary

Table 1. HTA summarising structure of DIY service.

Cannot understand purpose of, or jargon in, 1
Do not have e-mail account for 1
No suitable delivery slot available for 2.3
Do not have credit card for 2.4
Forget or unable to be in for 2.5
Delivery does not arrive on time for 2.5
Essential items not delivered in 2.6
Apparent error in 2.8

Table 2. Possible exceptions for an older person using such a service.

2.2 GSETDIFY INTERNET GROCERY SHOPPING
The main benefits of internet grocery shopping, namely time and effort saved, as well as time and place independence [1], should make it an attractive means for people with limited mobility, such as the elderly, to do their grocery shopping. But according to [2], the proportion of people aged 65+ who shop online is relatively small - 1.7% in the US and 0.9% in Belgium (based on surveys in 2001). There are many reasons behind this; the most frequently quoted ones are difficulty with the interface and the lack of an internet connection.
Get-someone-else-to-do-it-for-you (GSETDIFY) internet grocery shopping is an approach where an intermediary is used to insulate the customer (the elderly person) from the complexities and technical difficulties of internet exposure, or where the customer does not have access to the internet. The intermediary facilitates the internet interaction of the DIY approach. This GSETDIFY approach is best illustrated with an example: the next section describes the Net Neighbours internet grocery shopping scheme that has been set up with Age Concern, York (ACY). The motivation behind the design of this service was to provide social contact at the same time as a shopping service. The Net Neighbours scheme provides a socialising opportunity by establishing a befriending relationship between the volunteer and the client. Isolation is a major problem for older people, and for some the opportunity to get out and shop is an important opportunity for social contact [Blythe reference].

3. NET NEIGHBOURS
The Net Neighbours scheme [3] enables older people in the York region to shop on-line, in effect, without direct access to a computer. The scheme uses volunteers to act as intermediaries to on-line shopping at the main supermarkets based in York. The older people (customers or clients) are chosen by ACY [4]. In the Net Neighbours scheme the customers decide what groceries are required, the volunteers select the groceries from the supermarket website, and the customers pay for the groceries, albeit indirectly (the volunteers pay the supermarket; the volunteer agency refunds the volunteers; and the customers pay the volunteer agency). The stakeholders (client, agency, volunteer and supermarket) in the Net Neighbours scheme are separated in space and time. This makes the structure of internet grocery shopping more complex. In particular, the financial transactions that take place are more difficult to track, because it is no longer simply a case of the customer directly paying the supermarket.

3.1 Registering the volunteer
Currently volunteers are drawn from the Active York scheme [5] - a scheme that encourages staff and students at York University to contribute actively to the local community - and are put in touch with ACY. ACY then carries out the necessary background checks on the person, currently by taking up references. Once these are complete, the volunteer undergoes training from ACY. The development worker from ACY sets up the database so that it can be accessed by the volunteer.

3.2 Registering the client
There are two stages to registering the client. The first is that the client registers with ACY. Once this is complete, the development worker from ACY selects an appropriate volunteer who will act as the client’s shopper. The development worker contacts the volunteer to pass on the relevant information about the client. A meeting is then set up between the development worker, the client, and the volunteer, which may simply be a telephone conference. Once the meeting has taken place, the second stage is for the volunteer to register the client’s details in the database. In addition to a standard pro forma, the volunteers can add notes which provide snippets of information about the client, such as the fact that they are expecting to have a grab rail installed on a particular date. This information can be used by the volunteer when talking to the client on the telephone, making the process more engaging for the client.

Page 115

Figure 1: The GSETDIFY Internet Shopping Model as implemented in the Net Neighbours scheme

3.3 Placing first order for the client
Before the first order can be placed, there are several things that need to be set up by the development worker. They have to establish a free e-mail account (Hotmail, Yahoo, etc.) for the client (if the client has an existing account, this should not be used, because the new account may need to be accessed by the volunteer). The development worker then accesses the client’s preferred supermarket web site and sets up an account for the client using this new e-mail address.

3.4 Placing an order, delivery and after delivery of an order
The volunteer can then go ahead and place the order by carrying out a sequence of steps. The process is encapsulated in the HTA below. This time the stakeholder(s) involved in each task have to be specified. The nature of these dependencies is also captured in Figure 1, where each of the numbers in the figure corresponds to a task step in Table 3.

1. DW Register volunteer
1.1 DW Get references
1.2 ACY Train volunteer to work with older people
1.3 ACY Train volunteer to use service
2. DW Register client with supermarket
3. DW Match volunteer to client
4. DW, V, C Introduce volunteer to client
5. V, C Shop on line
5.1 C Decide what groceries are required (shopping list)
5.2 V, C Select groceries (online)
5.3 V, C Select delivery slot (online)
5.4 V Pay for groceries with credit card (online)
5.5 V Send order details to development worker
5.6 V, C Agree time for next call
5.7 C Be in to receive groceries
5.8 C Check groceries and return unwanted substitutions/errors
5.9 C Put away groceries
5.10 C Send receipt and cheque to ACY
5.11 DW Arrange with accounts for volunteer to be reimbursed
5.12 V Check credit card account against reimbursements
Plan: 5.1, then 5.2 and 5.3 in any order, then 5.4-5.10 in that order, 5.11 and 5.12 when available
6. ACY Audit payments and receipts
Plan: 1-4 only once, then repeat 5 and 6 as necessary

Table 3. HTA summarising structure of GSETDIFY service.
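Annotating each HTA step with its stakeholders, as Table 3 does, makes it straightforward to derive one view per stakeholder by filtering the step list. The sketch below is a minimal illustration (step wording paraphrases Table 3 and only a subset of steps is shown; it is not the authors’ tooling), where DW is the development worker, V the volunteer and C the client:

```python
# Illustrative: a subset of GSETDIFY HTA steps (cf. Table 3), each
# tagged with the stakeholders who take part in it. Filtering by
# stakeholder yields the per-stakeholder views from which exception
# lists like Tables 4-6 can be compiled.
GSETDIFY_STEPS = [
    ("1",    {"DW"},           "Register volunteer"),
    ("2",    {"DW"},           "Register client with supermarket"),
    ("3",    {"DW"},           "Match volunteer to client"),
    ("4",    {"DW", "V", "C"}, "Introduce volunteer to client"),
    ("5.1",  {"C"},            "Decide what groceries are required"),
    ("5.2",  {"V", "C"},       "Select groceries (online)"),
    ("5.4",  {"V"},            "Pay for groceries with credit card"),
    ("5.7",  {"C"},            "Be in to receive groceries"),
    ("5.10", {"C"},            "Send receipt and cheque to ACY"),
    ("5.12", {"V"},            "Check credit card against reimbursements"),
    ("6",    {"ACY"},          "Audit payments and receipts"),
]

def steps_for(stakeholder):
    """Return the HTA steps a given stakeholder takes part in."""
    return [(num, task) for num, parties, task in GSETDIFY_STEPS
            if stakeholder in parties]
```

Walking each stakeholder’s own step list and asking “what could go wrong here?” reproduces the systematic procedure the paper applies to generate its exception tables.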

4. DEPENDABILITY MATTERS
Jones and Randall [6] define dependability as “…the ability to avoid service failures that are more frequent and more severe than is acceptable to the user(s).” Here we broaden the definition slightly to make it apply to the stakeholders, rather than just the users. The main reason for doing so is that there are some stakeholders who may be affected by the service even though they do not directly use it. In the case of the Net Neighbours scheme, there are several stakeholders, and each of these will have a different view of the factors that contribute to the dependability of the service. In other words, each will have different views on what are acceptable service failures. Table 2 in Section 2.1 listed “exceptions” - things that could make the DIY service not dependable from the point of view of the customer. It was obtained by systematically working through the structure characterised as an HTA in Table 1. The same thing can be done for the structure of the GSETDIFY service as characterised as an HTA in Table 3, only


this time three sets of exceptions are generated, one for each of the stakeholders mentioned. Note that we have not considered the role of the supermarket, credit card companies and so on in this analysis, as it is assumed that they have their own procedures to ensure dependability that will not be compromised by the service.

No suitable delivery slot available for 5.3
Forget or unable to be in for 5.7
Delivery does not arrive on time for 5.7
Essential items not delivered in 5.8
No bank account for 5.10
Volunteer does not ring when agreed for 5.
Insensitive volunteer upsets client in 5.
Table 4. Possible exceptions for the client using the GSETDIFY service

No suitable time for next call for 5.6 (e.g., on holiday)
Delay in reimbursement or apparent error in 5.12
Table 5. Possible exceptions for the volunteer using the GSETDIFY service

Volunteer does not ring when agreed for 5.
Essential items not delivered in 5.8
Client does not return payment or receipt for 5.10
Insensitive volunteer upsets client in 5.
Payment error or delay causes volunteer to resign in 5.
Apparent error in audit for 6.
Table 6. Possible exceptions for the development worker and ACY using the GSETDIFY service

Tables 4-6 give the possible exceptions for the three stakeholders mentioned in Table 3 (ACY and DW are grouped together for these purposes). Note that the number of exceptions for the client is reduced by one, and the total number of exceptions is not greatly increased. This section discusses how these exceptions are handled to ensure a dependable system. The introduction of more stakeholders into the system in the Net Neighbours scheme creates dependencies. In the pilot scheme, the first clients were people who were part of the ACY hospital aftercare scheme. One of the conditions of early discharge from hospital is that the clients are able to fend for themselves. This includes the ability to carry out everyday tasks, such as grocery shopping. The Net Neighbours scheme was seen as one way of fulfilling this condition. The introduction of the extra agents adds to the complexity of the structure, however, in that there is now much more communication required between the agents. The dependencies also raise issues of trust. ACY, who have overall responsibility for the GSETDIFY scheme implemented by the Net Neighbours service, have taken two measures to ensure that the service is viewed as dependable by all the stakeholders. The first is to define the role of the development worker in such a way that she can sweep up many of the exceptions. The second is through the selection and training of the volunteers. Clients and volunteers are encouraged to see the development worker as the first person to contact if something goes wrong,

and there is a dedicated answerphone line monitored throughout the day. Thus if the volunteer does not call, the delivery is not on time, or essential items are missing, the development worker can sort out the problem for the client immediately. On the rare occasions when the client has no bank account, the development worker can call round to collect cash. When the volunteer is unable to make the call, the development worker can step in. Volunteers go through the standard Age Concern training for volunteers, which explores the issues of ageism and the other stereotypes we all hold. This is designed to make volunteers sensitive to the problems older people have, and can also act as a (self-)selection measure. Volunteers are also required to provide character references. It has to be said that weeding out unsuitable volunteers is much less of a problem than recruiting a sufficient number of volunteers in the first place.

5. DISCUSSION
The previous section highlighted the fact that even though the service being provided by the two different schemes is ostensibly the same, the dependability of the two implementations of the service has a different structure. What is important, however, is that the Net Neighbours implementation should be at least as dependable as the DIY scheme. Through an analysis of the structure of the service using hierarchical task analysis, it was possible to identify exceptions: points in the structure that made its dependability vulnerable. We were also able to identify how the Net Neighbours service has handled these exceptions, by defining the role of the development worker and by the training and selection of volunteers. The key part of the Net Neighbours scheme is the role played by ACY. They provide the volunteers and administer the scheme, making sure that clients and volunteers are meeting their requirements. ACY are experienced in these areas, and already provide a manual shopping service to clients as part of their hospital aftercare scheme. They are also used to carrying out checks on volunteers and to providing training for them. Furthermore, they have experience of administering other financial schemes, which they have to audit internally, since they are a non-profit-making agency. The dependability of the service therefore hinges on the role played by ACY as administrators. The structure of the Net Neighbours scheme offers some benefits over and above those provided by the DIY scheme. Grocery shopping can be one of the few social activities still available to some older people, so when they are unable to do their own grocery shopping they can become socially isolated. The DIY scheme reinforces this isolation, because the older person does not need to talk to anybody to order their shopping; on the other hand, it does mean that they retain their independence.
The Net Neighbours scheme, however, incorporates a socialising activity, in that the volunteers, as well as taking the client’s grocery order, chat to the clients about more general matters [8].

6. FUTURE WORK
The GSETDIFY model that is implemented by the Net Neighbours scheme could be used for other shopping purposes. The scope could be extended to include shopping for books, for example, or adapted to provide access to on-line services, such as train timetables. At present the scheme is being piloted using volunteers from the University of York.


There are plans in hand to extend the scheme to cater for more clients, using volunteers from other local organisations. There are some limitations on the size of the scheme, based on ACY's policy of having a maximum of two clients per volunteer.


The Net Neighbours scheme as it currently stands uses a single implementation of the GSETDIFY model. Another possible implementation is one where the person placing the order is a friend or relative who physically visits the older person to place the order and helps with the driver interaction on delivery day. This would mean a change to the structure of the system, so some further work would be required to assess how this affected the overall dependability.


7. SUMMARY
The standard model of internet grocery shopping involves a DIY approach. There is an alternative, however: the GSETDIFY approach, which uses a proxy shopper to order the groceries. Whilst the alternative approach (which can be implemented in at least two ways) is more involved, the Net Neighbours scheme shows that a judicious choice of the structure of the system can result in a grocery shopping service that is as dependable as the DIY method. In the Net Neighbours scheme much of the structure is provided by people, rather than technology. This does raise other issues, however, which means that the key to dependability lies in the way that the overall service is administered. As the baby boomers approach retirement, the demand for services like on-line grocery shopping looks likely to expand, and on-line retailers cannot afford to ignore this expanding market sector [8]. The Net Neighbours scheme offers an alternative way of providing such services in a dependable way.

8. ACKNOWLEDGMENTS
The authors would like to thank the volunteers, clients and Age Concern York (particularly Jenny Jarred) for their involvement in the Net Neighbours scheme. This work was carried out under the EPSRC-funded (grant GR/N13999) DIRC project (http://www.dirc.org.uk). The pilot scheme of Net Neighbours was conceived by Mark Blythe and is financially supported by Active York.

9. REFERENCES
[1] Raijas, A. The consumer benefits and problems in the electronic grocery store. Journal of Retailing and Consumer Services, 9, 2 (Mar. 2002), 107-113.
[2] Brengman, M., Geuens, M., Weijters, B., Smith, S.M., and Swinyard, W.R. Segmenting Internet shoppers based on their Web-usage-related lifestyle: a cross-cultural validation. Journal of Business Research, 58, 1 (Jan. 2005), 79-88.
[3] Blythe, M., Monk, A., Baxter, G., and Jarred, J. Making a Net Neighbours Service. Available at http://www-users.york.ac.uk/~am1/Making.PDF.
[4] Age Concern York. Online at http://www.ageconcernyork.org.uk/.
[5] Active York. Online at http://www.york.ac.uk/admin/ssdu/ay/ayold/index.html.
[6] Jones, C.B., and Randell, B. Dependability and the role of structure. Manuscript submitted for publication.
[7] Geuens, M., Brengman, M., and S'Jegers, R. Food retailing, now and in the future. A consumer perspective. Journal of Retailing and Consumer Services, 10, 4 (Jul. 2003), 241-251.
[8] Blythe, M. and Monk, A.F. Net Neighbours: adapting HCI methods to cross the digital divide. Interacting with Computers, 17 (2005), 35-56.
[9] Keh, H.T. and Shieh, E. Online grocery retailing: Success factors and potential pitfalls. Business Horizons (July-August 2001), 73-83.


Cognitive conflicts in aviation: Managing computerised critical environments Denis Besnard

Gordon Baxter

School of Computing Science Claremont Tower University of Newcastle Newcastle upon Tyne NE1 7RU +44 (0)191 222 8058

Department of Psychology University of York Heslington York YO10 5DD +44 (0)1904 434 369

[email protected]

[email protected]

ABSTRACT The purpose of introducing automation into critical environments was to lower human operators' workload and increase the reliability of human-machine interaction (HMI), thereby improving systems' safety. However, automation also triggered side effects, in that it allowed systems to act in ways that could be neither understood nor anticipated by operators. These so-called cognitive conflicts impair dependability because the system's behaviour falls outside the operators' control. Such a phenomenon has been reported in several commercial air accidents, such as the one at Cali (Colombia). After analysing this crash and the cognitive factors involved, we discuss two complementary approaches to HMI (assistant tools and flightdeck transparency) that can contribute to the design of more supportive automation in critical environments.

1. INTRODUCTION The performance of any socio-technical system is the result of an interaction between the humans, the technology, and the environment, including the physical and organisational context within which the system is located. Each of these high-level components (human, technology and environment) has its own structure and behaviour. In addition to the static features of some components (such as the human's physiology, the architecture of the technology, etc.), the dynamic structure of their interaction also has to be considered. In particular, the human use and understanding of automation in critical environments deserves attention, given its contribution to dependability. This certainly applies to commercial aviation. Automation has been used extensively in the cockpit in the belief that this would increase flight safety and help humans cope with the complexity of flying an aircraft in increasingly cluttered skies. This objective has certainly been achieved, but it has also generated side effects in terms of cognitive failure modes that were less prevalent in the classical cockpit. Some investigation is therefore required to understand the nature of these side effects and their consequences for human cognition. In this paper, we focus on cognitive conflicts in aviation and discuss the role played by the design of automation in their occurrence. We investigate the extent to which a discrepancy between the operator's understanding of the system and what the system actually does can lead to a degraded HMI, using an example from commercial aviation. After defining cognitive conflicts, we analyse the Cali air accident in cognitive terms and explain how the mismatch between the crew's expectations and the actual behaviour of the aircraft contributed to the mishap. We then discuss possible dimensions related to the remediation of cognitive conflicts and propose some design

guidelines for better supporting HMI in dynamic, critical, computer-based systems.

2. HUMAN-MACHINE INTERACTION IN AVIATION Glass-cockpit aircraft1 such as the Boeing B747-400 are mainly piloted through the flight management computer (FMC) and the autopilot. When the FMC (which holds the flight plan as programmed by the crew) is coupled to the autopilot (which executes the flight plan), the pilots are no longer physically flying the aircraft. This coupling provides a high degree of precision in flying the aircraft and also diminishes crew fatigue during long flight legs. Because the automated flightdeck can reliably perform several tasks simultaneously, airmanship is now just one of many required skills, alongside others such as interacting with the on-board computers and various other digital instruments. This situation is typical of many critical systems: human operators increasingly depend on automation that handles more and more critical functions of the controlled process. However, the interaction with the system, and the feedback received from it, is what keeps operators aware of what the system is doing. The dependability of the control and supervision task is therefore strongly related to the operators' understanding of the automation's behaviour. As a consequence, getting the design of the automation right is one of the most important challenges currently facing critical systems designers. Many computer-based systems use multiple operating modes (see Degani, 1997 for a classification) and decision rules that interact with each other. In dynamic environments, this leads to actions being triggered under conditions whose complexity is sometimes beyond human understanding. The net effect is that operators sometimes find themselves in problematic out-of-the-loop situations (FAA, 1996; Sarter & Woods, 1995), in which they have difficulty understanding (and predicting) the temporal sequence of output actions of the system (Jones, 2000).
Operators do not always detect that something they cannot explain is happening, and this can lead to conflicts. For instance, Rushby (1999; Rushby et al., 1999), Sarter et al. (1997) and Palmer (1995) describe examples of cockpit automation surprises, i.e. cases where a normal event occurs that was not expected, or where an expected normal event does not occur.

1 The name is derived from the CRT displays in the cockpit.


3. WHAT IS A COGNITIVE CONFLICT? A cognitive conflict is an incompatibility between an operator's mental model and the behaviour of the process under control. It often manifests itself as a surprise on the part of the operator. However, such conflicts are not always detected. For instance, if an operator fails to characterise the system state as abnormal, the conflict is hidden. Nevertheless, the conditions necessary for a mishap might already be in place. This situation is similar to Reason's (1990) description of latent errors and to Randell's (2000) definition of dormant errors. An example of a hidden conflict is the accident of the Royal Majesty (NTSB, 1997), where the crew grounded the ship after many hours of navigation on the wrong heading without noticing the silent failure in the positioning system. We consider cognitive conflicts to be split along two dimensions2:
• Nature. The conflict is due to an unexpected event or an unexpected non-event;
• Status. The conflict is detected (the operator is alerted by a system state) or hidden (the operator is not aware of the system state).
A conflict requires some remedial action to bring the operators' mental model and the automation's behaviour back into step. However, the occurrence of the conflict and its resolution can be completely disjoint in time. For instance, a hidden conflict can persist long enough for an accident to occur. On the other hand, a pilot trying to make an emergency landing following a loss of power will voluntarily leave a number of detected conflicts unresolved, for which a clear understanding will deliberately not be sought (e.g. leaving alarms unattended).
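Read together, the two dimensions form a simple two-by-two taxonomy. The sketch below (our own illustration; the paper itself proposes no implementation) makes the four combinations explicit:

```python
from enum import Enum
from itertools import product

class Nature(Enum):
    UNEXPECTED_EVENT = "unexpected event"          # something happened that was not predicted
    UNEXPECTED_NON_EVENT = "unexpected non-event"  # something predicted failed to happen

class Status(Enum):
    DETECTED = "detected"  # the operator is alerted by a system state
    HIDDEN = "hidden"      # the operator is not aware of the system state

def classify(nature, status):
    """Label a cognitive conflict along the two dimensions."""
    return f"{status.value} {nature.value}"

# Enumerate the four possible kinds of conflict.
for n, s in product(Nature, Status):
    print(classify(n, s))
```

Only detected conflicts prompt remedial action; the hidden column of this grid is where latent conditions for a mishap can accumulate.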

4. COGNITIVE CONFLICTS ILLUSTRATED We have chosen an example of a cognitive mismatch combined with the execution of an ill-defined plan. Two important issues are worth highlighting here. First, it is only when the outcomes of a given action conflict with their expectations that operators can make a judgement about its correctness. Second, although the conflict described here arose after the crew had taken an action that was subsequently found to be erroneous, variants exist (see e.g. indirect mode changes in Leveson & Palmer, 1997; Rodriguez et al., 2000). Typically, the detection of a conflict triggers some diagnostic activity as operators attempt to reconcile their expectations with the system's behaviour. However, the time pressure faced by crews during busy periods (in this case, the approach phase) can disrupt recovery. Moreover, fixation errors (de Keyser & Woods, 1990) can impair situation assessment and prevent the execution of appropriate actions (e.g. abandoning an approach).

4.1 The Cali crash
In December 1995, a Boeing B757 flying at night from Miami (Florida) crashed into a 12,000 ft mountain near Cali, Colombia, killing nearly all of the 163 people on board (Aeronautica Civil of the Republic of Colombia, 1996). This Controlled Flight Into Terrain (CFIT) accident was attributed to the crew losing position awareness after they had decided to reprogram the flight management computer (FMC) to implement the direct approach suggested by air traffic control (ATC)3. ATC suggested that the aircraft could land directly on the southbound runway instead of flying around the airport. The approach for this landing starts 63 nautical miles from Cali and proceeds through a number of beacons. Because the crew knew they had missed the first one (TULUA), they reprogrammed the FMC and intended to enter another beacon (ROZO4) as the next waypoint to capture the extended runway centreline. However, when the crew entered the first two letters of the beacon name ("RO") into the FMC, ROMEO came first in the list and the crew erroneously accepted it. Unfortunately, ROMEO is located 132 miles east-northeast of Cali. It took the crew over a minute to notice that the aircraft was veering off onto a wrong heading. Turning back to ROZO put the aircraft on a fatal course, and it crashed into a mountain near Buga, 10 miles east of the descent track. The inquiry commission noted several failures in the crew's performance, most notably:
• the acceptance of ATC guidance without having the required charts to hand;
• continuing the initial descent while flying a different flight plan;
• persisting with the (new) southbound approach despite evidence of a lack of time.
This case highlights the criticality of delays between an action (beacon selection) and the detection of its inappropriate outcomes (veering off onto a wrong heading). Another dimension related to the beacon selection mistake is the frequency gambling heuristic (see Reason, 1990). People operating in routine situations select actions on the basis of intuitive success rates in the past. Specifically, entering the first two letters of a beacon should return only one item for a particular airspace. The repetition of this co-occurrence progressively reinforced the pilots' expectation that the beacon returned by the FMC was the right one. Because heuristics aim at saving cognitive resources, the crew's selection of the wrong beacon was performed with a low level of control. Moreover, because of the combination of a workload peak (reprogramming the FMC) and an exception in the beacon database, the crew did not immediately detect their mistake. As is typical with heuristics, frequency gambling does not guarantee a perfect result but offers a satisfactory solution in most cases - what Simon (1996) refers to as satisficing.

2 A third dimension could be the interpretation of the situation by the operator: correct, false positive, false negative. Within the scope of this paper, this dimension will not be developed.
3 Late acceptance of an approach was previously involved in the crash of an A320 on Mont Sainte-Odile (Strasbourg, France) in 1992 (see Monnier, 1993).
4 This beacon has subsequently been renamed PALMA, as shown in Figure 1.


Figure 1: Partial, amended chart of the approach to runway 19 (southbound) at Cali, showing the intended route, the route to ROMEO, and the actual route. © Jeppesen Sanderson, Inc.


4.2 Summary The case presented here highlights the possible consequences of the system's behaviour being misunderstood by the operators, and the importance of an undetected initial problem in the emergence of a conflict. In our opinion, there is a class of factors whose combination can generate aberrant behaviours. Namely, the following factors (or a combination thereof) contribute to the decay of the quality of mental models and to the occurrence of cognitive conflicts:
• high-tempo, dynamic system;
• low predictability of the machine's behaviour;
• occurrence of an undetected error;
• erroneous continuation of an initial plan.
These factors may be relatively independent of the nature of the system. For instance, the erroneously engaged go-around mode aboard an Airbus A300 at Nagoya airport (Ministry of Transport, 1996) caused the co-pilot to fight against the climbing aircraft in order to land it. Another example is the grounding of the Royal Majesty (NTSB, 1997) mentioned earlier. Having defined cognitive conflicts and illustrated the concept with an example, we now consider what can be done to help manage them. More particularly, we consider what is technically feasible to help overcome such conflicts in dynamic systems, and what can be done to help maintain a better alignment between the operator's mental model and the behaviour of an automation-driven dynamic critical system.

5. ALIGNING THE HUMAN MENTAL MODEL AND THE PROCESS

Humans make errors but recover most of them (Helmreich, 2001) and also compensate for contingencies during exception handling (Reason, 2001). Therefore, automation is now more a matter of distributing decisions across multiple complementary agents than of excluding humans from the control loop. Today, this view is widely accepted. However, the development of critical systems in aviation has been heterogeneous, leading to the emergence of integration issues and reliability concerns. Current cockpits are largely the result of a bottom-up approach in which new systems are added in an almost ad hoc fashion. The flight crew are then left to try to cope with complex, disparate pieces of automation. The recent trend in aircraft cockpit design has been to increase the level of automation (e.g. Flight Management System, Airborne Collision Avoidance System, Enhanced Ground Proximity Warning System), which requires increased programming time and effort from the pilots. Concomitant with this change, the predictability of the behaviour of aircraft has decreased, leading to new sorts of conflicts (e.g. due to indirect mode changes; see Leveson & Palmer, 1997). This is the latest episode in a heavily computer-driven evolution of flightdeck systems in which the side effects of automation were not anticipated. Given the growing number of pieces of automated equipment in modern cockpits and the number of automation-related incidents (see for instance the FlightDeck Automation Issues website5 for a survey), one may ask whether cockpits, as they are designed today, have reached their limits in terms of improvements to the HMI. Here we consider two possible design alternatives. The first is the deployment of more powerful and better integrated cockpit assistants (section 5.1).

5 Visit the FDAI website at http://www.flightdeckautomation.com


The second deals with transparent flightdecks (section 5.2) based on less knowledge-demanding interfaces.

5.1 Glass-cockpit assistants The success of the joint cognitive systems framework proposed by Hollnagel and Woods (1983) depends on the automation maintaining a model of the operator's behaviour. The main implication is that such systems can infer the operator's intentions from a combination of the history of the interaction, the operational state of the system, and reference plans (Leveson et al., 1997). The assumption is that if the operator's intentions can be reliably inferred, then context-specific monitoring and assistance can be provided by the automation. In aviation, several experimental cooperative systems exist in which assistants intervene with various levels of authority (see Wickens, 2001 and Olson & Sarter, 2000 for a description of these levels). Most of these assistants work on an anticipated picture of reality, thereby providing timely advice that helps the pilot stay "ahead of the aircraft". This capability is known to be a strong determinant of the reliability of cognitive activities in dynamic, critical systems (Amalberti, 1996). Hazard Monitor (Bass et al., 1997) is an example of such an experimental anticipative system. Other examples include Pilot's Associate (Rouse et al., 1990), CASSY (Onken, 1997), CATS (Callantine, 2001) and GHOST (Dehais et al., 2003). In essence, most of these tools compare the actions of the crew against a reference library of plans for a given operational context, and use the results of the comparison to generate expectations about the interaction. When a conflict is anticipated or actually happening, the system can send appropriate advice and warnings to the pilot. Several of these systems have undergone experimental testing in simulated flights, but none has yet flown on a commercial flight. One crucial feature, which varies among the systems mentioned above, concerns the nature of the advice that is given to the pilot and its integration into the task.
In CATS, lines of text can be displayed in the cockpit when pilots fail to complete required actions. Conversely, GHOST blanks or blinks displays and then displays text when pilots make a fixation error. GHOST revealed that pilots can be made more aware of the potential failure associated with their decisions (maintaining their flight plan, in this instance), thereby making it easier to abandon the mission.
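In outline, the plan-matching core shared by these assistants can be sketched in a few lines. The sketch below is a deliberately simplified illustration of the general idea only; the context name, action names and `monitor` function are hypothetical, and the cited systems use far richer plan and context models:

```python
# Hypothetical sketch of plan-based monitoring in the spirit of CATS-like
# assistants: crew actions are compared against a reference plan for the
# current operational context, and unmet expectations raise advisories.
# The context and action names below are invented for illustration.
REFERENCE_PLANS = {
    "approach": ["set_approach_mode", "extend_flaps", "lower_gear", "arm_spoilers"],
}

def monitor(context, observed_actions):
    """Return advisories for expected actions not yet observed, in plan order."""
    expected = REFERENCE_PLANS.get(context, [])
    done = set(observed_actions)
    return [f"advisory: '{a}' expected but not observed"
            for a in expected if a not in done]

for advice in monitor("approach", ["set_approach_mode", "extend_flaps"]):
    print(advice)
```

The downside noted in section 6.1 follows directly from this structure: any crew action outside the reference library, however sensible, registers only as a deviation.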

5.2 Transparent flightdeck needed One way to avoid conflicts is to design the system in such a way that its operation is transparent to the operators. Here, it is assumed that the HMI is more reliable if the operator can easily understand the principles underlying the displays. Attaining such an understanding offers greater predictability of the system's future behaviours, which is one of the core features of the reliability of HMI, especially in emergency situations (FAA, 1996). Systems designers sometimes falsely assume that operators fully understand the functioning principles of flightdeck systems. This can cause systems to exhibit behaviours that operators cannot always understand, even in the absence of any obvious error on their part. For instance, on the Bluecoat Forum6, a pilot reported an unexpected mode

6 The Bluecoat Forum is an international e-mailing list on the subject of FMS, EFIS and EICAS displays, automated subsystems, flight mode annunciators, flight directors, autopilots, and the integration of all avionics equipment in the modern cockpit. Visit http://www.bluecoat.org.

reversion. The aircraft was given clearance for an altitude change from 20,000 to 18,000 ft, and the crew selected the descent mode (V/S, for vertical speed) and rate (1000 feet per minute). Instead of executing the descent plan, the aircraft autonomously swapped to a different mode (LVL CHG, for level change). A possible cause is an invalid airspeed as read by the flight control computer, despite the fact that the airspeed selected by the crew (280 knots) was well inside the normal envelope. This shows the lack of understanding that even experienced pilots can encounter when dealing with opaque automated systems. It also highlights:
• the need for pilots to understand automated behaviours;
• the insufficiency of the design effort allocated to easing this understanding.
Complex automated systems exhibit behaviours that cannot always be forecast or explained by the operators. This is partly caused by the complex structure of triggering conditions hidden behind the interface. The relatively unpredictable nature of these critical computer-based systems, combined with the associated likelihood of operators having flawed mental models, implies that conflicts are almost inevitable.

6. GUIDELINES The undependability of HMI in complex, dynamic systems partly originates in the lack of alignment between the system's inner model and the human mental model. This issue is genuinely about the structure of these systems and their dialogue. Following the brief description of assistant tools and transparency above, this section introduces some considerations related to cooperative interfaces in critical systems.

6.1 Better assistant tools Developers of assistant tools need to consider the following features if the tools are to help increase the dependability of HMI in critical systems:
• Timeliness. The advice delivered by assistant tools has to be timely. The span of the system's anticipation is a matter of trade-off: the longer the span, the earlier events can be forecast, but the more competing hypotheses will have to be analysed and eventually brought to the operator's attention.
• Intention. Capturing what the operator wants to do would help avoid pilots misinterpreting symptoms that cannot be interpreted meaningfully. Current assistant systems only match the operator's actions against reference plans, the downside being that safe violations (see Besnard & Greathead, 2003) cannot be supported.
• Integration. Today, most of the support given to pilots uses the visual and aural channels. Making more use of the kinesthetic channel, e.g. through vibrations (as for stall warnings via the control column), would help to diminish the visual processing load and de-clutter flightdeck displays. On another dimension, how the various functions and subsystems can be integrated with one another (e.g. via similar interface principles under multifunction displays) is still not resolved.
• Troubleshooting support. Beyond advising on forecast events, assistant tools should provide specific support for troubleshooting. Namely, assistants need to provide operational help including a holistic view of available resources and deadlines, relative




likelihood of causes of problems, technical solutions and associated risks.
• Evolution. Any new automation needs to take into account existing automation and related practices, as well as planned future automation. Each piece of equipment will have an impact on the flight crew's cognitive resources.

6.2 Transparent flightdeck We believe that a flightdeck that operates in a manner that is transparent to flight crews could improve human performance and, as a consequence, the dependability of HMI in critical systems. Such a system would allow pilots, on the basis of elementary knowledge, to have more direct access to the inner functioning of the automation, thereby leading to a mental model that is compatible with the system. This approach is not only compatible with the assistant tools described above, but complementary to them. We conceive of transparent systems in terms of the issues below:
• Predictable systems. Automation provides reliability only when its actions are understood by the human operators. Such an understanding should be achieved not by more training on current designs but by more intuitive systems. In other words, there is a need to reduce the operational complexity induced by the way technology is deployed (as already suggested by Sarter & Woods, 1995). Today, modern cockpits mostly provide indirect feedback: pilots flick switches and press buttons, and then let the software perform the command. In some cases, software systems even filter human actions, to the extent of sometimes preventing them. This evolution was made for the sake of safety. However, the initial intention has sometimes been stretched beyond human understanding capabilities, thereby impairing the viability of pilots' mental models.
• Computers should mainly be monitoring/advisory systems. The automation should take last-resort emergency decisions (e.g. pull up in front of an obstacle) only if the corresponding situation can be unambiguously characterised as time-critical and not requiring any human intervention. Responsibility for such decisions should remain with human operators for as long as possible. This is actually the case for e.g.
the Airborne Collision Avoidance System but some mode changes and reversions (as described in section 5.2) occur in non-critical situations and unduly take the operator out of the control loop, thereby contributing to losses in situation awareness (Endsley, 1996).

7. CONCLUSION
It is crucial for the dependability of critical computer-based systems that automation be designed in such a way that the behaviours it triggers are understood by humans. When this is not guaranteed, there is a risk that operators are placed out of the control loop, thereby degrading the contribution of human-machine cooperation to system safety. Although our line of argument relied heavily on commercial aviation, we believe that most computer-based systems which are used to control a critical process (e.g. power production, healthcare) share similarities with HMI in the glass cockpit. As far as the dependability of the interaction is concerned, the compatibility between human mental models and system models is of primary importance.

We believe that there are two main complementary ways to improve this compatibility. One is to have the system work on automation-related events that the operator may not foresee (the assistant tools approach). The other is to design systems that reduce the likelihood of unforeseen automation-related events (the transparent flightdeck approach). These two views deal with how the structure of the system is modelled by its user, and how this finally impacts the dependability of the interaction.

In the days of classical aircraft, electro-mechanical instruments provided the crew with many tasks of low complexity. The glass cockpit has transformed the flight into a job with fewer tasks but of higher complexity. The net balance is one where the workload has shifted, instead of having decreased. Certainly, the overall dependability of air transport has improved. But since the hardware and software components of aircraft have now reached unprecedented levels of reliability, HMI has moved up the dependability agenda. Cognitive mismatches are part of this big picture as a generic mechanism that is potentially involved in any control and supervision activity. Their occurrence can be facilitated by over-computerised environments, since the opacity of automation's decision rules can trigger misunderstood behaviours. Because the failure of complex socio-technical systems is rarely a mere technical issue, we hope that the cognitive approach adopted in this paper contributes to a better understanding of the dependability potential of ergonomics and HMI in critical environments, and of potential research avenues.

8. REFERENCES
Aeronautica Civil of the Republic of Colombia. (1996). Controlled flight into terrain, American Airlines flight 965, Boeing 757-233, N651AA, near Cali, Colombia, December 20, 1995 (Aircraft Accident Report).
Bass, E. J., Small, R. L., & Ernst-Fortin, S. T. (1997). Knowledge requirements and architecture for an intelligent monitoring aid that facilitates incremental knowledge base development. In D. Potter, M. Matthews, & M. Ali (Eds.), Proceedings of the 10th international conference on industrial and engineering applications of artificial intelligence and expert systems (pp. 63-68). Amsterdam, The Netherlands: Gordon & Breach Science Publishers.
Besnard, D., & Greathead, D. (2003). A cognitive approach to safe violations. Cognition, Technology & Work, 5, 272-282.
Callantine, T. (2001). The crew activity tracking system: Leveraging flight data for aiding, training and analysis. Proceedings of the 20th Digital Avionics Systems Conference (Vol. 1, pp. 5C3/1-5C3/12). Daytona Beach, FL: IEEE.
de Keyser, V., & Woods, D. D. (1990). Fixation errors: Failures to revise situation assessment in dynamic and risky systems. In A. G. Colombo & A. Saiz de Bustamante (Eds.), Systems reliability assessment (pp. 231-251). Dordrecht, The Netherlands: Kluwer.
Degani, A. (1997). On the types of modes in human-machine interactions. Proceedings of the ninth international symposium on aviation psychology. Columbus, OH.
Dehais, F., Tessier, C., & Chaudron, L. (2003). GHOST: Experimenting conflict countermeasures in the pilot's activity. Proceedings of the 18th international joint conference on artificial intelligence (pp. 163-168). Acapulco, Mexico.

Page 123

Endsley, M. (1996). Automation and situation awareness. In R. Parasuraman & M. Mouloua (Eds.), Automation and human performance: Theory and applications (pp. 163-181). Mahwah, NJ: Lawrence Erlbaum.
FAA Human Factors Team. (1996). The interfaces between flightcrews and modern flight deck systems. Washington, DC: Federal Aviation Administration.
Helmreich, B. (2001). A closer inspection: What happens in the cockpit. Flight Safety Australia, January-February, 32-35.
Hollnagel, E., & Woods, D. (1983). Cognitive systems engineering: New wine in new bottles. International Journal of Man-Machine Studies, 18, 583-600.
Jones, C. (Ed.) (2000). Preliminary version of conceptual model: Basic concepts. DSOS Project, deliverable BC1. http://www.newcastle.research.ec.org/dsos/deliverables/BC1.pdf (last accessed 26/01/2005).
Leveson, N., & Palmer, E. (1997). Designing automation to reduce operator errors. Proceedings of the IEEE Conference on Systems, Man and Cybernetics. Orlando, FL: IEEE.
Leveson, N. G., Pinnel, L. D., Sandys, S. D., Koga, S., & Reese, J. D. (1997). Analysing software specifications for mode confusion potential. In C. W. Johnson (Ed.), Proceedings of the Workshop on Human Error and Systems Development (pp. 132-146). Glasgow, UK: Glasgow Accident Analysis Group.
Ministry of Transport. (1996). Aircraft Accident Investigation Commission. China Airlines Airbus A300B4-622R, B1816, Nagoya Airport, April 26, 1994 (Report 965). Japan: Ministry of Transport.
Monnier, A. (1993). Rapport de la commission d'enquête sur l'accident survenu le 20 janvier 1992. Paris, France: Ministère de l'Equipement, des Transports et du Tourisme.
NTSB. (1997). Grounding of the Panamanian passenger ship Royal Majesty on Rose and Crown shoal near Nantucket, Massachusetts, June 10, 1995 (Marine Accident Report NTSB/MAR-97/01). Washington, DC: National Transportation Safety Board.
Olson, W. A., & Sarter, N. (2000). Automation management strategies: Pilots' preferences and operational experiences. International Journal of Aviation Psychology, 10, 327-341.
Onken, R. (1997). The cockpit assistant system CASSY as an on-board player in the ATM environment. Proceedings of the first air traffic management research and development seminar. Saclay, France.
Palmer, E. (1995). "Oops, it didn't arm": A case study of two automation surprises. Proceedings of the 8th symposium on aviation psychology. Ohio State University, Columbus, OH.
Randell, B. (2000). Facing up to faults. The Computer Journal, 43, 95-106.
Reason, J. (1990). Human error. Cambridge, UK: Cambridge University Press.
Reason, J. (2001). Heroic compensations. Flight Safety Australia, January-February, 28-31.
Rodriguez, M., Zimmerman, M., Katahira, M., de Villepin, M., Ingram, B., & Leveson, N. (2000). Identifying mode confusion potential in software design. Proceedings of the Digital Aviation Systems Conference, October.
Rouse, W. B., Geddes, N. D., & Hammer, J. M. (1990). Computer-aided fighter pilots. IEEE Spectrum, 27, 38-41.
Rushby, J. (1999). Using model checking to help discover mode confusions and other automation surprises. In D. Javaux & V. de Keyser (Eds.), The 3rd Workshop on Human Error, Safety and System Development. Liège, Belgium.
Rushby, J., Crow, J., & Palmer, E. (1999). An automated method to detect potential mode confusions. Proceedings of the 18th AIAA/IEEE Digital Avionics Systems Conference. St Louis, MO: IEEE.
Sarter, N., & Woods, D. D. (1995). How in the world did we ever get into that mode? Mode error and awareness in supervisory control. Human Factors, 37, 5-19.
Sarter, N. B., Woods, D. D., & Billings, C. E. (1997). Automation surprises. In G. Salvendy (Ed.), Handbook of human factors and ergonomics (2nd ed., pp. 1926-1943). New York, NY: Wiley.
Simon, H. A. (1996). The sciences of the artificial (3rd ed.). Cambridge, MA: MIT Press.
Wickens, C. D. (2001). Attention to safety and the psychology of surprise. Keynote address, 2001 Symposium on Aviation Psychology, Ohio State University.

Dynamic Coalitions: A position paper

The Dynamic Coalitions Coalition∗

14 Feb 2005

1 Context of the work

This work has been motivated by work within the GOLD [2] project, which seeks to build an architecture to facilitate the creation and maintenance of Virtual Organisations within the Chemical Engineering sector, and by DSTL. Within the School of Computing Science at Newcastle, a group has been working to develop a common understanding of Dynamic Coalitions. This document seeks to set out our common position, insofar as one has been achieved.¹

We begin by describing some possible contexts in which Dynamic Coalitions may arise. In Section 3 we discuss our progress in formal modelling, and in Section 4 we discuss the tools we intend to build.

2 Dynamic Coalition situations

Dynamic Coalitions emerge when the individual interests of a number of parties are considered to be best served by co-operation with each other. Within a political context, the OED defines a coalition as "An alliance for combined action of distinct parties, persons, or states, without permanent incorporation into one body". We can step from politics into many other contexts by allowing these "parties, persons or states" to also include companies, software or hardware agents, web services and military units, among others.

The word "dynamic" points to the nature of the alliance mentioned above. A dynamic coalition may be formed spontaneously, members may join and leave without warning, and the nature of the relationships between participants may vary dramatically across the coalition as well as throughout its lifetime.

Dynamic coalitions are distinct from a collection of interacting parties by having a common goal or intention. Each party may have several goals, and some of these may be in conflict with those of other parties in the coalition, but the common goal is sufficiently strong for them to choose to form a coalition together.

Dynamic coalitions occur in many settings, for example:

∗ Budi Arief, Jeremy Bryans, John Fitzgerald, Carl Gamble, Michael Harrison, Nigel Jefferson, Cliff Jones, Igor Mosolevsky, Thea Peacock, Peter Ryan.
¹ Although this work is not being performed within a formal DIRC Targeted Activity, we believe that it may be of interest to many within the DIRC project.

• In response to an emergency, where dealing with the immediate emergency becomes the overriding interest of each party. This will be a dynamic grouping, because as the focus of the problem changes the necessary capabilities will be better supplied by different parties. For example, in the event of an earthquake, immediate evacuation of the area may be crucial and require the rapidly deployable capabilities of the army; later the health of the refugees may become the focus, requiring the intervention of the Red Cross.

• In response to commercial market forces, where each company believes that profitability will increase if they cooperate. In this context, they are often referred to as Virtual Organisations or Virtual Enterprises. For example, in the Chemical Engineering sector, the patent-holder of a


drug will have only a finite amount of time to manufacture and sell the drug before any other company may also exploit the same drug. If a company does not have all the necessary manufacturing skills “in-house”, it may reduce its time-to-market by forming a suitable coalition to compensate for the missing techniques. It will then have more time to exploit its monopoly.

• In a programmed-agent context, where the agents have been designed to use input from other agents or sensors around them. These are called multi-agent systems. As the communications topology changes, agents will have to continually reconfigure their "dynamic coalition" of communications with the other agents in order to maintain some desired functionality.

• In a military context, where achieving an objective may require the cooperation of a large number of military units. Here the dynamic nature of the cooperation could be due to new objectives becoming necessary as the conflict evolves.

3 Formal modelling

We are interested in the flow of information around these models of coalitions. For example, we are interested in identifying states of formal models in which information has reached the "wrong" actor, or where information has not reached the "right" actor. The purpose of the formal modelling will be to identify these states.

We therefore intend to model a theory of knowledge distribution in a distributed environment in which dynamic coalitions form, change and disperse. To do this we will develop a formal model, within which we can specify a number of overlapping dynamic coalitions. We will begin with as simple a model as possible and continue by building an increasingly refined set of formal models.

Our formal model will at least be able to describe the following properties:

• Parties may know (or believe) different things.

• Information will be "time-valued" — items of information may become more or less valuable to parties over time.

• Different parties may make different inferences from the same pieces of information.

We have begun work on a simple formal model which will contain these elements. We have said above that the purpose of our formal modelling will be to identify potentially interesting or problematic information flow properties within coalitions. This will lead us to identify the behaviours of the individual parties that led to these properties. After this, we may be able to make some deductions about the motives of the parties for these behaviours. For example, trust is an important aspect of dynamic coalitions. Differing trust relationships may lead to differing behaviours, which may lead to differing information flow outcomes. It may be possible at a later stage of development to ascribe trust models or policies to the actors. These would essentially act as a set of constraints on the possible behaviours of the coalition.

We discuss below a set of open questions which we have discussed for the formal model, and our current thinking on each. We do not necessarily expect any one model to be able to describe all of these situations.

• Should we represent knowledge as atoms, or should relations between these atoms also be a component of knowledge? Atomic representation of knowledge may be sufficient to throw up interesting problems to do with information transfer, but it seems cumbersome to allow agents to reason about and manipulate these atoms in a meaningful way. It therefore seems sensible to continue to consider both.

• How do the knowledge and/or beliefs of individual actors map to the real world? Actors themselves may have some internal model of the real world. This will include facts which they consider to be true. We could include in our model an "oracle" component to allow the user of the model to state

whether or not these facts actually are true. This has proved a thorny subject, and at this stage it is still not clear if it is a good idea or not.

• We should be able to allow actors to associate metadata with different "bits" of knowledge. This could include degree of certainty, secrecy, allowed recipients and provenance of data, among other things.

4 Analysis Tools

[Figure 1: A possible view of the animator. A scenario (join(A,B), break(B,C), send(A,C,x), ...) drives a visual model of agents and channels; queries and properties (checked via CTL/SPIN) are applied to it.]

We have introduced the idea of building formal models that will allow us to explore the properties of dynamic coalitions, particularly in relation to the flow of knowledge/belief among agents. We also aim to develop tool support that will allow a range of different analyses to be performed on the formal models.

One relatively simple tool is an animator that supports the exploration of an executable version of a formal model, showing individual agents and the coalitions of which they are members. Figure 1 shows a possible sketch of what this might look like. The model contains agents (circles) linked by possible communication channels. The scenario on the left describes some constraints on how these agents may evolve and communicate. We can query the model at stages of this evolution, to learn the answers to questions such as "when does agent A know this fact?", "what does agent B know at this stage?", "from whom did agent C get this piece of info?", etc.

A further form of automated analysis is the use of model checking to generate scripts that lead to undesirable states. The model of agents and coalitions needs to be represented as a state-transition system and encoded in a suitable model checking tool. The property characterising states of interest is presented to the model checker, which generates a trace leading to such a state. This would need to be converted back into a script which could be executed through the tool interface. We are exploring the use of Spin [1] and CTL as a basis for such a tool. The challenging question is what properties characterise the states we wish to search for.
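The kind of scenario script and queries described here can be made concrete in a few lines. The following is a loose simplification of ours for illustration only, not the intended tool: the event names follow the scenario of Figure 1, while the knowledge-propagation rule (a fact moves only along an open channel, from an agent who holds it) is our own assumption.

```python
# Minimal executable sketch of the animator idea (our own simplification).

class Model:
    def __init__(self, agents):
        self.channels = set()                 # open communication links
        self.knows = {a: {} for a in agents}  # agent -> {fact: who it came from}

    def join(self, a, b):
        self.channels.add(frozenset((a, b)))

    def break_(self, a, b):
        self.channels.discard(frozenset((a, b)))

    def send(self, src, dst, fact):
        # A fact moves only along an open channel, from an agent who holds it.
        if frozenset((src, dst)) in self.channels and fact in self.knows[src]:
            self.knows[dst][fact] = src

    def what_does(self, agent):
        """Query: what does this agent know at this stage?"""
        return set(self.knows[agent])

    def who_told(self, agent, fact):
        """Query: from whom did this agent get this piece of info?"""
        return self.knows[agent].get(fact)

m = Model(["A", "B", "C"])
m.knows["A"]["x"] = "A"   # A starts out holding fact x
m.join("A", "B")
m.break_("B", "C")
m.send("A", "C", "x")     # no open A-C channel: nothing happens
m.send("A", "B", "x")
print(m.what_does("B"), m.who_told("B", "x"))   # {'x'} A
```

Even this toy version supports the queries quoted above; a real tool would of course also track the stage of the scenario at which each fact arrived.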

Acknowledgements: We would like to thank Fred Schneider for his insightful comments on early versions of this work. Thanks also to the EPSRC projects DIRC and GOLD, and to DSTL.

References
[1] Edmund Clarke, Orna Grumberg, and Doron Peled. Model Checking. MIT Press, 1999.
[2] The GOLD project. www.gigamesh.ac.uk.


Developing an Ontology for QoS

Glen Dobson and Russell Lock
Computing Department, Lancaster University, Lancaster, UK
[email protected], [email protected]

ABSTRACT This paper examines the development of an ontology for Quality of Service (QoS). This ontology is being developed to promote consensus on QoS concepts by providing a model which is generic enough for reuse across many domains. Our specific application is to the domain of service-based systems, and we have a particular interest in the QoS attributes which are part of dependability. A further emphasis of this paper is the way in which this work relates to the DIRC research themes. The main theme addressed by the paper is structure, though the applications of the ontology we aim to develop also touch upon risk, diversity and timeliness.

Keywords Structure, ontologies, QoS

1. INTRODUCTION
In computing, an ontology can be defined as a specification of a conceptualization [1]. The use of ontologies in computing has gained popularity in recent years for two main reasons:

• They facilitate interoperability.
• They facilitate machine reasoning.

Ontologies are already used to aid research in a number of fields. One of the more interesting to DIRC is the National Cancer Institute Thesaurus [2], which contains over 500,000 nodes covering information ranging from disease diagnosis to the drugs, techniques and treatments used in cancer research. Ontologies are also often used in the development of thesauri, which need to model the relationships between nodes. Ontology use pervades much of our daily lives, and deaths: deaths are recorded by correct reference to the WHO's ICD-10 ontology [3], the complexity of which should not be underestimated. The following code is given for a death involving a volcanic eruption whilst waterskiing in a public library:

X35.2.0

There is a degree of confusion between a taxonomy and an ontology, and between data modelling and ontology engineering. In its simplest form an ontology is simply a taxonomy of domain terms, which in turn is clearly a form of data model. Such a taxonomy aids interoperability, but does little to aid machine reasoning. Generally, the term ontology implies that domain rules as well as terminology are modelled. As such rules are added to the taxonomic classes, the data model becomes more likely to be called an ontology. However, the choice of formal representation depends on exactly what is to be represented. It may therefore be the case that a data modelling language proves more suitable than an ontology language for representing certain conceptualisations (particularly those which are largely taxonomical). For a general introduction to the common aspects of ontological construction see [4].

This paper describes the development of an ontology for QoS. QoS research aims to allow the clear statement of non-functional requirements; reasoning about such requirements during design; and the specification, monitoring, negotiation and provision of the relevant levels of service once the system is deployed. Ontologies are designed to make the standardisation of terms for given domains simpler and easier to achieve. Standardisation of terms and structure has been a considerable stumbling block in the development of many QoS specification projects, which often rely on proprietary language formats that rarely become popular enough to provide a complete vocabulary of terms, or mechanisms for translation between different interpretations. The idea of a QoS ontology for use across technologies is therefore an appealing one. For more information on existing QoS specification languages see [5].

The structure of the remainder of this report is as follows. Firstly, a discussion of the motivation for a QoS ontology. Secondly, an examination of the QoS ontology itself, including an introduction to the hierarchies we are currently in the process of designing. Thirdly, a look at the way in which the DIRC themes of Risk, Structure, Timeliness, Diversity and Responsibility impact on the creation of such constructs. Finally, a brief statement outlining our current state of development and the potential for future work.

2. MOTIVATION FOR A QOS ONTOLOGY
An ontology is not in itself an end, but rather a means to many ends. We are seeking to provide ourselves (and hopefully, in the future, others) with a unifying base on which to build QoS subsystems. Our main aim is therefore standardisation. We see the use of a standard itself as a simple component of a "structure" for dependability. Moreover, our explicit model of QoS (and specifically dependability) concepts is something which is useful in the formal discussion of "structure" and the effects of different structures. The ontology can be easily extended with new concepts, and existing reasoning tools can be used to highlight ambiguity or inconsistency.

The main pitfall we seek to avoid is straightforward disagreement about, or misunderstanding of, terms. Taking availability as an example QoS attribute demonstrates the extent of this problem.


Availability is generally represented as the probability that a system is responsive at a randomly selected point in time, i.e. it is essentially a mean of responsiveness over the lifetime of the system (or some period which is agreed to be representative of system lifetime).

However, such information is limited in its usefulness. Statistical summaries by their very nature hide information. An availability of 96.6% may equally refer to regular downtime of 2 seconds per minute or one day per month. Since system usage will often fall into a particular temporal pattern, the mean availability will often be misleading. For instance, the 96.6% availability figure may represent regular maintenance which always takes place on the last Sunday of the month. If the user in question never uses the system at a weekend then this downtime is irrelevant to them. In practice, they may find the system actually has better availability than they expected.

Other possible representations of availability therefore include a complete downtime history or a listing of regular downtimes. All of these representations of availability (and others) are useful in certain situations. An ontology should therefore allow their use and make explicit their interrelation, whilst avoiding ambiguity or confusion.

Similarly, a number of different reliability metrics are commonly used in industry, but it cannot be argued for the purposes of this research project that any given metric is the correct one, or even the most preferable. Instead a number of possible metrics are put forward, with the proviso that a given interaction may involve all, some or none of them, dependent on the situation:

• POFOD (probability of failure on demand)
• ROCOF (rate of occurrence of failure)
• MTTF (mean time to failure)
• MTTR (mean time to repair)
• MTBF (mean time between failures; MTTF + MTTR)

Generally speaking, reliability metrics are used to measure three main areas of system operation:

• Time to failure
• Time between failures
• Time to restart

Note that time does not necessarily flow in hours, minutes, etc., but could instead be measured in transactions, etc. In the interests of brevity we will not continue this discussion of QoS attributes and their representation further here. Instead please refer to [6].

As well as standardisation and unification of terms such as reliability and availability, a QoS ontology provides some other useful capabilities. For instance, even if a human misunderstands the usage of a given instance of availability, a reasoner will spot the different uses.

Also, certain things which are not explicitly classified at all can be classified automatically, and therefore have the correct domain rules applied to them. This is unlikely to be particularly useful for QoS attributes, but could for instance be used to classify the faults which are returned by a fault reporting system.

3. THE QOS ONTOLOGY
To facilitate reusability and extensibility the ontology has been designed from the beginning to be modular in nature. Modules (i.e. sub-ontologies) fall into three layers, as shown in Figure 1.

[Figure 1. Layers of the Ontology]

Some of our less well-defined sub-ontologies could perhaps be replaced with more complete third-party ones at a later date. The base QoS layer contains generic concepts relevant to QoS. It currently consists of a single sub-ontology. Alongside it sit unit ontologies. The only example unit sub-ontology defined at the moment defines units of time (and how to convert between them). This means that an inference engine could establish, for instance, that 1 minute is the same as 60,000 milliseconds. The base ontology represents a minimal set of generic concepts (illustrated in Figure 2), but is the most likely to be expanded upon as the ontology is put to use. Figure 2 shows the properties and classes of the sub-ontology, but not the logical rules which actually define a class in terms of necessary and sufficient conditions for membership.


[Figure 2. The Base QoS Sub-Ontology]

We introduce the concept of a QoS attribute, and its un-measurable and measurable subclasses. In using the ontology it is entirely optional whether one chooses to use these sub-classes or to create one's own. Un-measurable in this context relates to attributes which cannot necessarily be measured from a given viewpoint. An example of this could be adherence to the Data Protection Act. Measurable attributes have one or more associated metrics. At this level in the architecture we do not prescribe what the individual metrics and formats are; these are defined in more specific attribute sub-ontologies. We define a metric to consist of a description, an acceptability direction and zero or more values. The acceptability direction indicates whether higher or lower values are preferable for the metric (e.g. a low probability of failure on demand is more desirable). It must be remembered that these classes can be extended or constrained by their subclasses, so being over-specific at this base level is undesirable.

A "physical quantity" has one or more associated "units". In many cases a numerical value alone cannot be understood without its unit type (e.g. you need to know whether "time to complete" is quoted in seconds, microseconds, milliseconds, etc.). For metrics which have values with simple types (e.g. alphanumeric strings or integer counts) a new datatype property would be included in that sub-class of "metric".

Figure 3 shows some attributes from both the dependability and performance sub-ontologies (prefixed by d: and p: respectively). The dependability sub-ontology is largely based upon the taxonomy defined in [7]. It not only includes attributes of dependability, but also means of achieving dependability and dependability threats. The latter may be of less relevance to QoS, but will find use in other forms of specification. There is therefore an overarching concept of dependability, as shown in Figure 3. It is likely that such a class will later be defined for the concept of performance as well.

[Figure 3. Example classes from the attribute layer]

Some work is also yet to be done on relating security to dependability. There is no problem with attributes having multiple classifications in an ontology (so, for instance, confidentiality can subclass SecurityAttribute and DependabilityAttribute). We also need to model the fact that dependability attributes are inherently dependent on security. Once modelled, this interrelation could be used, for instance, to disallow the specification of availability without specifying the security attributes needed to avoid maliciously caused unavailability.

The most important part of the service sub-ontology simply links the concepts of "QoS attribute" and "service". Since we are working in the web services arena we have encoded a version of the whole ontology in the OWL Web Ontology Language [8]. This means that we can reuse the Service class from the OWL-S ontology [9], which is an existing ontology representing the service domain. Our ontology can also enhance the OWL-S ontology by providing concrete classes to act as its "ServiceParameters". Certain QoS attributes are also operation-specific (e.g. time-to-complete, accuracy) and therefore reference the OperationRef class from OWL-S. For attributes such as reliability which are specific to a usage pattern, it may also be useful to reference a workflow in some cases. OWL-S provides a Process class which is much like a workflow. However, it would be preferable to also reference other types of workflow definition (e.g. BPEL4WS).

On top of the sub-ontologies discussed above we are still in the process of extending the finer-grained details of dependability attributes (essentially defining metrics). This is an ongoing process as we make more use of the ontology.

4. USING THE QOS ONTOLOGY IN SERVICE-ORIENTED ARCHITECTURES
We envisage the parameters populating the different ontologies being used throughout the service cycle illustrated in Figure 4. Service capabilities, client requirements, etc. could be expressed through expressions made up of appropriate parameters. The following could be considered a simple example of an expression:

TimeToComplete < 1000ms

The attribute TimeToComplete (or more precisely one of its metrics) could then be referenced using the QoS ontology in order to reason about both its properties and relation to other attributes.

The fact that the ontology itself can be extended (as well as having instances added to it, forming what is essentially a knowledge base) means that new inferences can be achieved as new knowledge representations are added.
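As a loose illustration of how such an expression might be evaluated against metric concepts of the kind described here (a metric with an associated unit and an acceptability direction), consider the following sketch. The class name, conversion table and values are ours, invented for illustration; they are not part of the ontology itself.

```python
# Illustrative sketch (names and values are ours): a metric carrying a unit
# and an acceptability direction, so "TimeToComplete < 1000ms" can be checked
# against a value quoted in a different unit.

TO_MILLISECONDS = {"ms": 1.0, "s": 1000.0, "min": 60_000.0}  # cf. the time-units sub-ontology

class Metric:
    def __init__(self, name, value, unit, lower_is_better=True):
        self.name = name
        self.value_ms = value * TO_MILLISECONDS[unit]
        self.lower_is_better = lower_is_better  # the acceptability direction

    def satisfies(self, bound_value, bound_unit):
        """Check an expression such as TimeToComplete < 1000ms."""
        bound_ms = bound_value * TO_MILLISECONDS[bound_unit]
        if self.lower_is_better:
            return self.value_ms < bound_ms
        return self.value_ms > bound_ms

ttc = Metric("TimeToComplete", 0.8, "s")  # provider quotes 0.8 seconds
print(ttc.satisfies(1000, "ms"))          # 800 ms < 1000 ms -> True
```

In the ontology proper, the unit conversions and the acceptability direction would of course be stated as ontology axioms and applied by a reasoner rather than hand-coded.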

Figure 4. Service Usage Process

Figure 4 shows the sub-processes involved in service usage in a QoS-enabled architecture. These are: the provider publishing service information to a registry, followed by the client discovering the service via the registry (both supported by existing mechanisms such as UDDI). The third stage is negotiation. This may be a full negotiation of a Service Level Agreement (SLA) or it may, at its simplest, represent acceptance of published service specifications by the client. Having agreed on the service level to be provided, monitoring of the relevant attributes is set up before the service is finally utilised by the client. Monitoring continues throughout service usage. We envision the following as possible applications of our ontology in the service domain:

• Specification (at design time as well as at runtime, e.g. specifying the results of monitoring, client testing or provider capability claims)
• Requirements specification as input to service differentiation and selection
• Fault reporting
• SLAs and their negotiation
• Reasoning about QoS of composed services

The advantage of an ontology over some other standard for these purposes is the possibility of machine reasoning. This allows inferences such as that dependability is inherently linked to security, that end-to-end availability can be no greater than network availability for a particular route, that a time-to-complete of 1 second is the same as one of 1000 milliseconds, etc. An ontology can also infer classifications of objects. It might therefore be useful as the back-end of a fault reporting system: faults reported through a standard form could be automatically classified as well as compared against known fault instances, to see if the reported fault has already been identified.

5. THINKING THEMES

Section 2 briefly discussed how a QoS ontology contributes to DIRC's "structure" research theme. It was suggested that the ontology itself is a structural component for building more dependable QoS-enabled systems. By simply using the ontology as their base QoS representation, developers will avoid certain pitfalls common in the QoS domain. In this context the ontology has a similar function to a design pattern, in that it embodies practically gained knowledge in a reusable way. In its current state this almost certainly overstates the ontology's power. However, after a number of iterations of feedback from practitioners (including ourselves), and subsequent improvements, we believe this will become a realistic claim.
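As a concrete illustration of this kind of machine reasoning (e.g. that a TimeToComplete of 1 s equals one of 1000 ms), the sketch below (our own simplified Python model, not the OWL ontology's actual representation) normalises time metrics to a common base unit and uses that to check a requirement expression against a published capability:

```python
# Illustrative sketch only: unit-aware comparison of QoS metric values and
# requirement checking. The names and structures here are our own invention,
# standing in for the unit reasoning an OWL reasoner would perform.

TO_MS = {"ms": 1, "s": 1000, "min": 60_000}  # assumed time-unit conversions

def ms(amount, unit):
    """Normalise a time value to milliseconds, the assumed base unit."""
    return amount * TO_MS[unit]

# The inference that 1 second is the same as 1000 milliseconds:
assert ms(1, "s") == ms(1000, "ms")

# A requirement expression "TimeToComplete < 1000ms" checked against a
# provider capability published in a different unit:
name, amount, unit = ("TimeToComplete", 0.8, "s")  # provider claims 0.8 s
satisfies = ms(amount, unit) < ms(1000, "ms")
print(satisfies)  # True
```

In the ontology itself, the conversion factors would of course live in the "time" units sub-ontology rather than in a hard-coded table.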

The ontology is also a useful means of discussing "structure" and its effects on system dependability. The ontology covers dependability means, threats and attributes as discussed in [7], providing a way to represent the interrelationship of these concepts. In the system development process, it also minimises ambiguity and misunderstanding between stakeholders, thus minimising design faults in the sub-systems designed specifically for dependability.

Our applications of the ontology in Service-Oriented Architectures also touch upon issues relating to other DIRC themes. Our "performance" attribute sub-ontology and "time" units sub-ontology obviously have some relation to the "timeliness" theme. As it stands this relation does not go very deep. However, one area in which we are looking to use the ontology is choosing between service compositions which perform the same task. Essentially, our ontology will give us the ability to combine QoS data (including timing and performance data) with a workflow. We aim to model how various QoS metrics aggregate when services are composed, and to produce overall QoS values for the workflow(s). In the service world it will not be possible to achieve any strict guarantees, but this will at least give us the ability, for example, to choose the composition likely to complete most quickly. Most services operating on an economic footing require temporal constraints. The ability of services to state the timeliness of operations is a vital component in the service cycle. These could be applied in a number of different areas, including:

• Process time available (scheduling / load balancing)
• Accurate monitoring of the time taken in a request
• Constraints and capabilities expressed in service contracts / SLAs
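The aggregation of QoS metrics over a composition can be sketched as follows. The aggregation rules here (sequential times add and availabilities multiply; a parallel join completes with its slowest branch) are simplifying assumptions of ours, not rules the ontology prescribes:

```python
# Illustrative aggregation of QoS values over a service composition.
# The combination rules below are assumptions made for this sketch.

def aggregate_sequence(services):
    """Sequential workflow: times to complete add, availabilities multiply."""
    total_time = sum(s["time_ms"] for s in services)
    availability = 1.0
    for s in services:
        availability *= s["availability"]
    return {"time_ms": total_time, "availability": availability}

def aggregate_parallel(services):
    """Parallel branches joined at a barrier: overall time is the slowest branch."""
    return {
        "time_ms": max(s["time_ms"] for s in services),
        "availability": min(s["availability"] for s in services),  # crude bound
    }

# Overall QoS of a workflow: one service followed by two parallel branches.
flow = aggregate_sequence([
    {"time_ms": 200, "availability": 0.99},
    aggregate_parallel([
        {"time_ms": 300, "availability": 0.995},
        {"time_ms": 450, "availability": 0.97},
    ]),
])
print(flow["time_ms"])  # 650
```

Given such aggregate values for each candidate composition, choosing "the composition likely to complete most quickly" reduces to a comparison of the aggregated time metrics.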

Composing services from many sources (and potentially having further sources provide QoS measurements) also raises problems of "responsibility". One of our aims is therefore to ensure that the provenance of QoS data is carried with it. A further possibility, if we continue to enhance existing service specification languages, might be to introduce the concept of a responsibility structure into the ontology. This could explicitly model whether the party offering the composition takes complete responsibility, or whether responsibility remains with the atomic services, etc.

Another DIRC theme touched upon by composition is "diversity". The very idea of QoS in Service-Oriented Architectures relies on diverse service implementations being available. Since competing implementations would most likely offer much the same functionality, they would compete by offering unique quality/price trade-offs. As well as providing a means of modelling QoS attributes of diverse implementations, we also hope to provide other information, such as hardware platform, geographic location, network route, etc., all of which could be used to determine the most diverse service composition achievable. This information could, for instance, be used to choose the most suitable (i.e. diverse) component services for use in a fault-tolerant container [10]. Most of the links between DIRC research themes and the ontology will only be fully realised through applying the ontology. The following section discusses some future work which will build upon the ontology.
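As a toy illustration of the diversity-driven selection just described, metadata such as platform, location and route could feed a simple diversity score over candidate implementations. The attribute names and the scoring rule below are our own inventions for the sketch:

```python
# Illustrative only: rank pairs of service implementations by how much their
# metadata differs, e.g. to pick members of a fault-tolerant container.
from itertools import combinations

ATTRS = ("platform", "location", "network_route")

def diversity(a, b):
    """Count the metadata attributes on which two implementations differ."""
    return sum(a[k] != b[k] for k in ATTRS)

services = [
    {"name": "S1", "platform": "x86/Linux", "location": "UK", "network_route": "r1"},
    {"name": "S2", "platform": "x86/Linux", "location": "UK", "network_route": "r2"},
    {"name": "S3", "platform": "SPARC/Solaris", "location": "DE", "network_route": "r3"},
]

# Choose the most mutually diverse pair of implementations.
best = max(combinations(services, 2), key=lambda pair: diversity(*pair))
print(sorted(s["name"] for s in best))  # ['S1', 'S3']
```

A real measure would weight the attributes (a shared network route, say, matters more for common-mode failure than a shared location), but the shape of the computation is the same.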

6. CONCLUSIONS AND FUTURE WORK

This paper details a work in progress. We have only outlined the base components and the upper hierarchies needed to support them. We hope to continue populating the hierarchies we have created, and to develop software tools suitable for the creation and manipulation of QoS specifications, over the next few months. Initially we plan to work on requirements specification as input to service differentiation and selection. The standardisation of attribute structure and QoS hierarchies is essential to the future adoption of more advanced negotiation and discovery mechanisms. The way in which attributes and their metrics are combined into expressions regarding QoS (e.g. assertions in a specification, or requirement statements) is the logical next step to investigate. We are in the process of creating suitable user interfaces to allow the construction of expressions to differentiate between discovered services and to specify service capabilities (regardless of how such capabilities are measured), as well as potentially to be used in the design process. In addition, these constructs could be used to aid in the setup of monitoring services designed to prove adherence to a given agreement. The structure which emerges can be used to provide higher-level services that support QoS throughout the operational envelope.

The structure of service operation ensures that services provide agreements to cover their liabilities, and monitoring to ensure they do not breach them. The service cycle relies on both discovery and monitoring structures to build trust into the process. Trust in turn reduces the risk inherent in the use of services with which the client may not have dealt before. The main aim of the standardisation process is to allow greater diversity of services, grounded through a common set of parameters. Along with the parallel work on reasoning about composed services, the scope of this work is ambitious, but the ontology serves as a good base for this and other future work. Ideally, after serving our own pragmatic purposes, the ontology can also go on to form the basis of a standard for QoS specification for use across the Web and Grid service community.

7. REFERENCES

[1] T. R. Gruber, "A translation approach to portable ontologies", Knowledge Acquisition, 5(2):199-220, 1993.
[2] National Cancer Institute (NCI) Thesaurus, http://www.mindswap.org/2003/CancerOntology/
[3] ICD-10 WHO Ontology, http://www.who.int/classifications/icd/en/
[4] D. McGuinness, "Ontologies come of age", http://www.ksl.stanford.edu/people/dlm/papers/ontologiescome-of-age-mit-press-(with-citation).htm
[5] G. Dobson, "Quality of service in Service-Oriented Architectures", http://digs.sourceforge.net/papers/qos.html
[6] G. Dobson and R. Lock, "QoS specification in service centric systems", http://wiki.nesc.ac.uk/read/pa9?ParametersOfQoS
[7] J.-C. Laprie, B. Randell and C. Landwehr, "Basic Concepts and Taxonomy of Dependable and Secure Computing", IEEE Transactions on Dependable and Secure Computing, Vol. 1, No. 1, pp. 11-33.
[8] "OWL Web Ontology Language Reference", http://www.w3.org/TR/owl-ref/
[9] "OWL-S", http://www.daml.org/services/owl-s/
[10] G. Dobson, S. Hall and I. Sommerville, "A Container-Based Approach to Fault Tolerance in Service-Oriented Architectures", http://digs.sourceforge.net/papers/2005-icsepaper.pdf


Capturing Emerging Complex Interactions: Safety Analysis in ATM

Massimo Felici
LFCS, School of Informatics, The University of Edinburgh
Edinburgh EH9 3JZ, UK
http://homepages.inf.ed.ac.uk/mfelici/

Abstract

The future development of Air Traffic Management (ATM), set by the ATM 2000+ Strategy, involves a structural revision of ATM processes, a new ATM concept and a systems approach for the ATM network. This requires ATM services to go through significant structural, operational and cultural changes that will contribute towards the ATM 2000+ Strategy. Moreover, from a technology viewpoint, future ATM services will employ new systems forming the emergent ATM architecture underlying and supporting the European Commission's Single European Sky Initiative. Introducing safety relevant systems in ATM contexts requires us to understand the risk involved in order to mitigate the impact of possible failures. This paper is concerned with some limitations of safety analyses with respect to operational aspects of introducing new systems (functionalities).

Keywords: Safety Analysis, ATM, Complex Interactions, System Evolution

1. Introduction

The future development of Air Traffic Management (ATM), set by the ATM 2000+ Strategy [7], involves a structural revision of ATM processes, a new ATM concept and a systems approach for the ATM network. The overall objective [7] is, for all phases of flight, to enable the safe, economic, expeditious and orderly flow of traffic through the provision of ATM services which are adaptable and scalable to the requirements of all users and areas of European airspace. This requires ATM services to go through significant structural, operational and cultural changes that will contribute towards the ATM 2000+ Strategy. Moreover, from a technology viewpoint, future ATM services will employ new systems forming the emergent ATM architecture underlying and supporting the European Commission's Single European Sky Initiative. ATM services, it is foreseen, will need to accommodate increasing traffic, as much as twice the number of flights, by 2020. This challenging target will require extra capacity to be gained cost-effectively together with an increase in safety levels [21, 22]. Enhancing safety levels affects the ability to accommodate increased traffic demand as well as the operational efficiency of ensuring safe separation between aircraft. Suitable safe conditions shall precede the achievement of increased capacity (in terms of accommodated flights). Therefore, it is necessary to foresee and mitigate safety issues in aviation where ATM can potentially deliver safety improvements.

Introducing safety relevant systems in ATM contexts requires us to understand the risk involved in order to mitigate the impact of possible failures. Safety analysis involves the activities (i.e., definition and identification of the system(s) under analysis, risk analysis in terms of tolerable severity and frequency, and definition of mitigation actions) that allow the systematic identification of hazards, risk assessment and mitigation processes in critical systems [17, 29]. Diverse domains (e.g., nuclear, chemical or transportation) adopt safety analyses that originate from a general approach [17, 29]. Recent safety requirements, defined by EUROCONTROL (the European organisation for the safety of air navigation), imply the adoption of a similar safety analysis for the introduction of new systems and their related procedures in the ATM domain [6]. Unfortunately, ATM systems and procedures have distinct characteristics (e.g., openness, volatility, etc.) that expose limitations of the approach. In particular, the complete identification of the system under analysis is crucial for its influence on the cost and the effectiveness of the safety analysis.
Some safety-critical domains (e.g., nuclear and chemical plants) allow the proper application of conventional safety analyses. Physical design structures constrain system interactions and stress the separation of safety-related components from other system parts. This ensures the independence of failures. In contrast, ATM systems operate in open and dynamic environments where it is difficult to completely identify system interactions. For instance, there exist complex interactions between aircraft systems and ATM safety relevant systems. Unfortunately, these complex interactions may give rise to catastrophic failures. The accident (1 July 2002) between a Boeing B757-200 and a Tupolev TU154M [3], which caused the deaths of 71 people, provides an instance of unforeseen complex interactions. These interactions triggered a catastrophic failure, although all aircraft systems were functioning properly. Hence, safety analysis has to take into account these complex interaction mechanisms (e.g., failure dependence, reliance in ATM, etc.) in order to guarantee and even increase the overall ATM safety as envisaged by the ATM 2000+ Strategy.

This paper is concerned with some limitations of safety analyses with respect to operational aspects of introducing a new system (functionality). The paper is structured as follows. Section 2 introduces safety analysis in the ATM domain. The EUROCONTROL Safety Regulatory Requirement ESARR4 [6] requires the use of a risk-based approach in ATM when introducing and/or planning changes to any (ground as well as onboard) part of the ATM System. Unfortunately, ATM systems, procedures and interactions expose limitations of safety analyses. Section 3 proposes a framework for capturing complex interactions. The framework supports the iterative aspects of safety analyses. Section 4, finally, discusses the proposed framework and draws some conclusions.

2. Safety Analysis in ATM

ATM services across Europe are constantly changing in order to fulfil the requirements identified by the ATM 2000+ Strategy [7]. Currently, ATM services are going through a structural revision of processes, systems and underlying ATM concepts. This highlights a systems approach for the ATM network. The delivery and deployment of new systems will let a new ATM architecture emerge. The EUROCONTROL OATA project [27] intends to deliver the Concepts of Operation, the Logical Architecture in the form of a description of the interoperable system modules, and the Architecture Evolution Plan. All this will form the basis for common European regulations as part of the Single European Sky. The increasing integration, automation and complexity of the ATM System require a systematic and structured approach to risk assessment and mitigation, including hazard identification, as well as the use of predictive and monitoring techniques to assist in these processes. Faults [15] in the design, operation or maintenance of the ATM System, or errors in the ATM System, could affect the safety margins (e.g., loss of separation) and result in, or contribute to, an increased hazard to aircraft or a failure (e.g., a loss of separation and, in the worst case, an accident). Increasingly, the ATM System relies on the resilience (e.g., the ability to recover from failures and accommodate errors) and safety (e.g., the ability to guarantee failure independence) features placed upon all system parts. Moreover, the increased interaction of ATM across State boundaries requires that a consistent and more structured approach be taken to the risk assessment and mitigation of all ATM System elements throughout the ECAC (European Civil Aviation Conference) States [5].

Although the average trends show a decrease in the number of fatal accidents for Europe, approach and landing accidents are still the most pressing safety problems facing the aviation industry [25, 26, 30]. Many relevant repositories¹ report critical incidents involving the ATM System. Unfortunately, even maintaining the same safety levels across the European airspace would be insufficient to accommodate increasing traffic without affecting the overall safety of the ATM System [4]. The introduction of new safety relevant systems in ATM contexts requires us to understand the risk involved in order to mitigate the impact of possible failures. The EUROCONTROL Safety Regulatory Requirement ESARR4 [6] requires the use of a risk-based approach in ATM when introducing and/or planning changes to any (ground as well as onboard) part of the ATM System. This concerns the human, procedural and equipment (i.e., hardware or software) elements of the ATM System as well as its environment of operations, at any stage of the life cycle of the ATM System. ESARR4 [6] requires that ATM service providers systematically identify any hazard for any change to the ATM System (parts). Moreover, they have to assess any related risk and identify relevant mitigation actions.
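The shape of such a risk-based assessment, classifying each hazard by severity and likelihood against a tolerability threshold, can be illustrated with a toy risk matrix. The categories, scores and threshold below are invented for illustration; they are not ESARR4's actual classification scheme:

```python
# Toy risk matrix, illustrative only; not the ESARR4 classification scheme.
SEVERITY = {"catastrophic": 4, "hazardous": 3, "major": 2, "minor": 1}
LIKELIHOOD = {"frequent": 4, "probable": 3, "remote": 2, "extremely_remote": 1}

def risk_class(severity, likelihood, tolerable=6):
    """Combine severity and likelihood; above the threshold, mitigation is required."""
    score = SEVERITY[severity] * LIKELIHOOD[likelihood]
    return "unacceptable: mitigate" if score > tolerable else "tolerable"

print(risk_class("catastrophic", "remote"))  # score 8 > 6, so mitigation required
print(risk_class("minor", "probable"))       # score 3 <= 6, so tolerable
```

The substance of ESARR4 lies, of course, in how severity and frequency are defined and evidenced for ATM hazards, not in the arithmetic of the matrix itself.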
In order to provide guidelines for, and to standardise, safety analysis, EUROCONTROL has developed the EATMP Safety Assessment Methodology (SAM) [8], reflecting best practices for the safety assessment of Air Navigation Systems. The SAM methodology provides a means of compliance with ESARR4. It describes a generic process for the safety assessment of Air Navigation Systems; the objective of the methodology is to define the means for providing assurance that an Air Navigation System is safe for operational use. This process consists of three major steps:

¹ Some repositories are: Aviation Safety Reporting System - http://asrs.arc.nasa.gov/; Aviation Safety Network - http://aviationsafety.net/; Flight Safety Foundation: An International Organization for Everyone Concerned With Safety of Flight - http://www.flightsafety.org/; Computer-Related Incidents with Commercial Aircraft: A Compendium of Resources, Reports, Research, Discussion and Commentary, compiled by Peter B. Ladkin et al. - http://www.rvs.uni-bielefeld.de/publications/Incidents/.

Functional Hazard Assessment (FHA), Preliminary System Safety Assessment (PSSA) and System Safety Assessment (SSA). Figure 1 shows how the SAM methodology contributes towards system assurance. The process covers the complete life cycle of an Air Navigation System, from initial system definition, through design, implementation, integration and transfer to operations, to operations and maintenance. Moreover, it takes into account three different types of system elements (human, procedure and equipment elements), the interactions between these elements and the interactions between the system and its environment.

[Figure 1. Contribution of the Safety Assessment Methodology towards system assurance with respect to the lifecycle. The figure aligns the lifecycle stages (System Definition, System Design, System Implementation, Operation) with the SAM steps (FHA, PSSA, SSA) and the assurance question each answers: "How safe does the system need to be, to achieve tolerable risk?", "Is the proposed design able to achieve tolerable risk?" and "Does the system achieve tolerable risk?"]

The FHA is a top-down iterative process, initiated at the beginning of the development or modification of an Air Navigation System. The objective of the FHA process is to determine the overall safety requirements of the system (i.e., it specifies the safety level to be achieved by the system). The process points out potential functional failure modes and hazards. It assesses the consequences of their occurrence on the safety of operations, including aircraft operations, within a specified operational environment. The FHA process specifies the overall Safety Objectives of the system. The PSSA is another top-down iterative process, initiated at the beginning of a new design, or of a modification to an existing design, of an Air Navigation System. The objective of performing a PSSA is to demonstrate whether the assessed system architecture can reasonably be expected to achieve the Safety Objectives specified in the FHA. The PSSA process refines the Safety Objectives into Safety Requirements allocated to the system elements. That is, it identifies the risk level to be achieved by each system element. The SSA is a process initiated at the beginning of the implementation of an Air Navigation System. The objective of performing an SSA is to demonstrate that the implemented system achieves an acceptable (or at least tolerable) risk and consequently satisfies its Safety Objectives specified in the FHA. Moreover, the SSA assesses whether each system element meets its Safety Requirements specified in the PSSA. The SSA process collects evidence and provides assurance throughout the system life cycle (i.e., from implementation to decommissioning). Although the SAM methodology describes the underlying principles of the safety assessment process, it provides limited information on applying these principles in specific projects. The hazard identification, risk assessment and mitigation processes comprise a determination of the scope, boundaries and interfaces of the constituent part being considered, as well as the identification of the functions that the constituent part is to perform and the environment of operations in which it is intended to operate. This supports the identification and validation of safety requirements on the constituent parts.

2.1. Modelling Integration

The definition and identification of the system under analysis is extremely critical in the ATM domain and can have a significant influence on the safety analysis. System models used during design phases provide limited support to safety as well as risk analysis. This is because existing models defined in the design phases are adapted and reused for safety and risk analysis. Organizational and cost-related reasons often determine this choice, without questioning whether the models are suitable for the intended use. The main drawback is that design models are tailored to support the work of system designers. Thus, system models capture characteristics that may be of primary importance for design, but irrelevant for safety analysis. On the contrary, models should be built as working tools that, depending on their intended use, ease and support specific cognitive operations of users, for instance by highlighting some aspects and neglecting others. The level of granularity of the model should be adaptable to the safety relevance of the part under analysis.

Modeling has attracted a substantial effort from research and practice in system engineering. In spite of quality and effective development processes, many system faults are traced back to high-level requirements. This has motivated the increasing use of modeling in system engineering. The aim of modeling is twofold. On the one hand modeling contributes towards correctness and completeness of system requirements. On the other hand modeling supports validation and verification activities. The overall goal of modeling is mainly to reduce the gap between system requirements and design. The requirements-design gap represents a major source of (requirements) changes. Although this gap is one of the sources of requirements changes, research on (requirements) evolution clearly points out other origins of changes [24].

Modeling tackles two main issues. The first is that translations from requirements to design are error-prone. The second is that stakeholders (e.g., system users, system engineers, etc.) often have contradictory understandings of the system to be built. These problems have motivated the blossoming of many modeling methodologies and languages (e.g., UML [28]) used in practice. Modeling incorporates design concepts and formalisms into system specifications. This enhances our ability to assess safety requirements. For instance, Software Cost Reduction (SCR) consists of a set of techniques for designing software systems [11, 12]. The SCR techniques support the construction and evaluation of requirements. They use formal design techniques, like tabular notation and information hiding, in order to specify and verify requirements. According to information hiding principles, separate system modules have to implement those system features that are likely to change. Although module decomposition reduces the cost of system development and maintenance, it provides limited support for system evolution. Intent Specifications provide another example of modeling that further supports the analysis and design of evolving systems [18]. Intent Specifications extend over three dimensions. The vertical dimension represents the intent and consists of five hierarchical levels: Level 1, system purpose; Level 2, system principles; Level 3, blackbox behavior; Level 4, design representation; Level 5, physical representation or code. Note that a recent version of Intent Specifications introduces two additional levels: Level 0 and Level 6. Level 0, the management level, provides a bridge from the contractual obligations and the management planning needs to the high-level engineering design plans. Level 6, the system operations level, includes information produced during the actual operation of the system.
Along the horizontal dimension, Intent Specifications decompose the whole system into heterogeneous parts: Environment, Operator, System and Components. The third dimension, Refinement, further breaks down both the Intent and Decomposition dimensions into details. Each level provides rationale (i.e., the intent or "why") for the level below. Each level has mappings that relate the appropriate parts to the levels above and below it. These mappings provide traceability of high-level system requirements and constraints down to the physical representation level (or code) and vice versa. In general, the mappings between Intent levels are many-to-many relationships. In accordance with the notion of semantic coupling, Intent Specifications support strategies (e.g., eliminating tightly coupled, many-to-many, mappings or minimizing loosely coupled, one-to-many, mappings) to reduce the cascade effect of changes. Although these strategies support the analysis and design of evolving systems, they provide limited support for understanding the evolution of high-level system requirements². The better our understanding of system evolution, the more effective design strategies are. That is, understanding system evolution enhances our ability to inform and drive design strategies. Hence, evolution-informed strategies enhance our ability to design evolving systems.

Modeling methodologies and languages advocate different design strategies. Although these strategies support different aspects of software development, they originate in a common Systems Approach³ to solving complex problems and managing complex systems. In spite of common ground, modeling methodologies and languages usually differ in the way they interpret the relationships among heterogeneous system parts (e.g., hardware components, software components, organizational components, etc.). A common aspect is that models identify the relations between the different system parts. On the one hand these relations constrain the system behavior (e.g., by defining environmental dependencies), and system (architectural) design partially captures them. On the other hand they are very important for system management and design. Among the different relations over heterogeneous system parts and hierarchical levels is traceability. Although traceability supports management, it often faces many issues in practice; in particular, it struggles with evolution. Research and practice in system engineering highlight critical issues; among these, evolution affects many aspects of the system life cycle. Unfortunately, most methodologies provide limited support to capture and understand system evolution, often because their underlying hypotheses are unable to capture it. Although requirements serve as the basis for system production, development activities (e.g., system design, testing, safety analysis, deployment, etc.) and system usage feed back into system requirements. Thus system production as a whole consists of cycles of discovery and exploitation. The different development processes (e.g., V model, Spiral model, etc.) capture these discovery-exploitation cycles in different ways, although development processes constrain any exploratory approach that investigates system evolution. Thus system-engineering methodologies mainly support strategies that consider changes from a management viewpoint. In contrast, system changes, like the ones occurring in the ATM System, are emergent behaviors of combinations of development processes, products and organizational aspects.

² Leveson in [18] reports the problem caused by Reversals in TCAS (Traffic Alert and Collision Avoidance System): "About four years after the original TCAS specification was written, experts discovered that it did not adequately cover requirements involving the case where the pilot of an intruder aircraft does not follow his or her TCAS advisory and thus TCAS must change the advisory to its own pilot. This change in basic requirements caused extensive changes in the TCAS design, some of which introduced additional subtle problems and errors that took years to discover and rectify."

³ "Practitioners and proponents embrace a holistic vision. They focus on the interconnections among subsystems and components, taking special note of the interfaces among various parts. What is significant is that system builders include heterogeneous components, such as mechanical, electrical, and organizational parts, in a single system. Organizational parts might be managerial structures, such as a military command, or political entities, such as a government bureau. Organizational components not only interact with technical ones but often reflect their characteristics. For instance, a management organization for presiding over the development of an intercontinental missile system might be divided into divisions that mirror the parts of the missile being designed." Introduction, p. 3, [13].
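The semantic-coupling point about cascading changes can be made concrete with a small sketch. This is our own illustration, not Intent Specifications' notation: given mappings from each item to the items that realise it at lower levels, the potential impact of a change is its transitive image through the mappings, so many-to-many mappings inflate the impact set:

```python
# Illustrative only: transitive change impact through level-to-level mappings.
# The items and mappings are invented for the example.

# mapping[item] = the items at the level(s) below that realise it
mapping = {
    "R1": ["D1", "D2"],   # a requirement with many-to-many style coupling
    "R2": ["D2"],         # a more loosely coupled requirement
    "D1": ["C1"],
    "D2": ["C2", "C3"],
}

def impact(item):
    """All items reachable from `item` through the mappings (change cascade)."""
    seen, stack = set(), [item]
    while stack:
        for child in mapping.get(stack.pop(), []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

print(sorted(impact("R1")))  # ['C1', 'C2', 'C3', 'D1', 'D2']
print(sorted(impact("R2")))  # ['C2', 'C3', 'D2']
```

Eliminating the tightly coupled mapping from R1 would shrink its impact set, which is exactly the design strategy the Intent Specifications literature recommends.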

2.2. Limitations Conventional safety analyses are deemed acceptable in domains such as the nuclear or the chemical sector. Nuclear or chemical plants are well-confined entities with limited predictable interactions with the surroundings. In nuclear and chemical plants design stresses the separation of safety related components from other plant systems. This ensures the independence of failures. Therefore, in these application domains it is possible to identify acceptable tradeoffs between completeness and manageability during the definition and identification of the system under analysis. In contrast, ATM systems operate in open and dynamic environments. Hence, it is difficult to identify the full picture of system interactions in ATM contexts. In particular: There is a complex interaction between aircrafts’ controls and ATM safety functions. Unfortunately, this complex interaction may give rise to catastrophic failures. Hence, failure separation (i.e., understanding the mechanisms to enhance failure independence) would increase the overall ATM safety. Humans [10, 23] using complex language and procedures mediate this interaction. Moreover, most of the final decisions are still demanded to humans whose behaviour is less predictable than that of automated systems. It is necessary further to understand how humans use external artifacts (e.g., tools) to mediate this interaction. Moreover, this will allow the understanding of how humans adopt technological artifacts and adapt their behaviours in order to accommodate ATM technological evolution. Unfortunately, the evolution of technological systems often corresponds to a decrease in technology trust affecting work practice. Work practice and systems evolve rapidly in response to demand and a culture of continuous improvements. A comprehensive account of ATM systems, moreover. will allow the modeling of the mechanisms of evolution. 
This will enhance strategies for deploying new system configurations or major system upgrades. On the one hand, modelling and understanding system evolution support the engineering of (evolving) ATM systems. On the other hand, modelling and understanding system evolution allow the communication of changes across different organisational levels. This would enhance the visibility of system evolution as well as trust in transition to operations.

3. Capturing Emerging Complex Interactions

Heterogeneous engineering provides a different perspective that further explains the complex interaction between system (specification) and environment. Heterogeneous engineering considers system production as a whole. It provides a comprehensive account that stresses a holistic viewpoint, which allows us to understand the underlying mechanisms of evolution of socio-technical systems. Heterogeneous engineering involves both the systems approach [13] and the social shaping of technology [20]. On the one hand, system engineering devises systems in terms of components and structures. On the other hand, engineering processes involve social interactions that shape socio-technical systems. Hence, stakeholder interactions shape socio-technical systems. Heterogeneous engineering is therefore a convenient lens through which to understand engineering processes further. The most common understanding in system engineering considers requirements as goals to be discovered and design solutions as separate technical elements. Hence, system engineering is reduced to an activity in which technical solutions are documented for given goals or problems. In contrast, according to heterogeneous engineering, system requirements specify mappings between problem and solution spaces. Both spaces are socially constructed and negotiated through sequences of mappings between solution spaces and problem spaces [1, 2]. Therefore, system requirements emerge as a set of consecutive solution spaces justified by a problem space of concerns to stakeholders. Requirements, as mappings between socio-technical solutions and problems, represent an account of the history of socio-technical issues arising and being solved within industrial settings. The formal extension of these mappings (or solution space transformations) identifies a framework to model and capture evolutionary system features (e.g., requirements evolution, evolutionary dependencies) [9].
The resulting framework is sufficient to interpret system changes.
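As a rough illustration of the idea (hypothetical names and a toy data structure, not the formal framework of [9]), requirements can be treated as mappings between problems and solutions, and evolution as a sequence of solution space transformations:

```python
# A minimal sketch of requirements as problem-to-solution mappings, with
# system evolution as consecutive solution space transformations.
# All names and examples here are illustrative, not from the cited framework.

from dataclasses import dataclass, field

@dataclass(frozen=True)
class Mapping:
    """A requirement: links a stakeholder problem to a design solution."""
    problem: str
    solution: str

@dataclass
class SolutionSpace:
    release: int
    mappings: list = field(default_factory=list)

def transform(space, resolved_problems, new_mappings):
    """Produce the next solution space: drop mappings whose problems have
    been resolved, add the mappings introduced by the new release."""
    kept = [m for m in space.mappings if m.problem not in resolved_problems]
    return SolutionSpace(space.release + 1, kept + list(new_mappings))

# Evolution as an iteration of the cycle: solutions, problems, solutions.
s0 = SolutionSpace(0, [Mapping("separate civil/military airspace", "fixed sectors")])
s1 = transform(s0, {"separate civil/military airspace"},
               [Mapping("flexible use of airspace", "dynamic sectorisation")])
```

Each transformation records which stakeholder problems justified the new solution space, which is the evolutionary history the framework aims to capture.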

Page 137

“People had to be engineered, too - persuaded to suspend their doubts, induced to provide resources, trained and motivated to play their parts in a production process unprecedented in its demands. Successfully inventing the technology turned out to be heterogeneous engineering, the engineering of the social as well as the physical world.” (p. 28, [19])

Therefore, the formal framework captures how design solutions evolve through subsequent releases. Hence, it is possible to define system evolution in terms of sequential solution space transformations. Moreover, it is possible to capture evolution at different abstraction levels with diverse models. This defines evolutionary cycles of iterations of the form: solutions, problems, solutions. This implies that engineering processes consist of solutions searching for problems, rather than the other way around (that is, problems searching for solutions). This holistic viewpoint allows us to understand the underlying mechanisms of evolution of socio-technical systems, like the ATM System. Capturing cycles of discoveries and exploitations during system design involves the identification of mappings between socio-technical solutions and problems. The proposed framework exploits these mappings in order to construct an evolutionary model that informs safety analyses of ATM systems. Figure 2 shows the proposed framework, which captures these evolutionary cycles at different levels of abstraction and over diverse models. The framework consists of three hierarchical layers: System Modeling Transformation (SMT), Safety Analysis Modeling Transformation (SAMT) and Operational Modeling Transformation (OMT).

Figure 2. A framework for modelling evolutionary safety analyses.

The SMT layer captures how solution models evolve in order to accommodate design issues or evolving requirements. An SMT therefore captures system requirements as mappings between socio-technical solutions and problems. This allows changes to be gathered into design solutions; that is, it is possible to identify how changes affect a design solution. Moreover, this enables sensitivity analyses of design changes. In particular, it allows the revision of safety requirements and the identification of hazards due to the introduction of a new system. Therefore, the SMT supports the gathering of safety requirements for evolving systems. That is, it supports the main activities occurring during the top-down iterative FHA process in the SAM methodology [8]. The FHA in the SAM methodology then initiates another top-down iterative approach, the PSSA. Similarly, the framework considers design solutions and safety objectives as input to safety analyses. Safety analyses assess whether the proposed design solution satisfies the identified safety objectives. This phase involves different methodologies (e.g., Fault Tree Analysis, HAZOP) that produce diverse (system) models. System usage or operational trials may give rise to unforeseen safety issues that invalidate (part of) the safety models. In order to take these issues into account, it is necessary to modify the safety analyses. Therefore, safety analysis models evolve too. SAMT, the second layer of the framework, captures how safety analysis models evolve in order to accommodate arising safety issues. Although design models serve as a basis for safety models, they provide limited support for capturing unforeseen system interactions. Therefore, SAMT supports the activities involved in the PSSA process of the SAM methodology [8]. Note that although the SAM methodology stresses that both FHA and PSSA are iterative processes, it provides little support for managing process iterations or system evolution in terms of design solutions and safety requirements. The framework supports these evolutionary processes. Finally, operational models (e.g., structured scenarios, patterns of interactions, structured procedures, workflows) capture heterogeneous system dynamics. Unfortunately, operational profiles often change with system usage.
For instance, system users often refine procedures in order to integrate different functionalities or to accommodate system failures. OMT, the bottom layer of the framework, captures how operational models change in order to accommodate arising issues. The evolution of operational models informs safety analyses of new hazards. Therefore, OMT supports the activities involved in the SSA process of the SAM methodology.

4. Discussion and Conclusions

The proposed framework addresses three main points in order to support evolutionary safety analyses effectively. Firstly, the model questions the system boundaries and the required level of detail. These aspects vary considerably between design models and risk analysis models, since system parts that need to be specified in detail for the design may be much less relevant from a safety point of view. The typical drawback experienced in most cases is that resources for risk analysis may be consumed in investigating detailed aspects of every system part, instead of trying to identify unknown risks that may be related to elements not central to the design model. Furthermore, it is often the case that system boundaries can be defined more neatly with respect to the design objectives, whilst risk analysis often requires the adoption of a larger focus. All the recent major accidents in the civil aviation domain have proved to stem from unexpected interactions among a large variety of elements, differently located in space and time. Those elements were often judged to be outside the system boundaries (or outside normal operating conditions) when the safety analysis was conducted. For instance, the investigation report [3] of the accident between two aircraft highlights that although individual ATM systems and procedures worked properly, the ATM socio-technical interactions may, unfortunately, result in a catastrophic event. The second point directly addresses these unexpected interactions between system elements as the main source of incidents. Best practices and standards in safety analysis prescribe that the mutual impact between different risks be analysed. A system model is a key support in performing this task effectively, but the possible interactions need to be represented explicitly. By contrast, models defined for design purposes usually outline the relationships between system elements by a functional (or physical) decomposition. Whenever design models are exploited for safety analysis, the functional decomposition principle may unduly provide the structure for the analysis of incident causal dynamics [14, 16], thus failing to acknowledge their different underlying nature.
Furthermore, a correct model should not only ensure that interactions and the mutual impact between different risks are analysed, but also outline interactions between everyday productive processes in "normal operating conditions", since risk factors are likely to interact along these lines. The third characteristic of the model refers to the possibility of effective re-use of (part of) the model to inform other safety analyses. This would ensure that part of the safety feedback and experience related to a system can be beneficial when introducing major changes to the current system or when developing new, similar systems. In the same way, effective reuse of the model would result in safety analyses that have better means to achieve a good balance between exhaustiveness and cost, as findings of closely related analyses could easily be considered. In order to realize the ATM 2000+ Strategy realistically and cost-effectively, systems from different suppliers will be interconnected to form a complete functional and operational environment, covering ground segments and aerospace. Industry will be involved as early as possible in the life cycle of ATM projects. EUROCONTROL manages the processes that involve the definition

and validation of new ATM solutions using industry capabilities (e.g., SMEs). In practice, safety analyses adapt and reuse system design models (produced by third parties). Technical, organisational and cost-related reasons often determine this choice, although design models are unfit for safety analysis. Design models provide limited support for safety analysis because they are tailored to system designers. The definition of an adequate model, and of an underlying methodology for its construction, will be highly beneficial for those performing safety analyses. As stated before, the model definition phase currently cannot be properly addressed as an integral part of safety analysis, mostly because of limited budgets and resources. This paper is concerned with problems in modelling ATM systems for safety analysis. The main objective is to highlight a model specifically targeted at supporting safety analysis of ATM systems. Moreover, the systematic production of safety analysis (models) will decrease the cost of conducting safety analyses by supporting reuse in future ATM projects.

Acknowledgements. I would like to thank Alberto Pasquini and Simone Pozzi for their information about the ATM domain. This work has been supported by the UK EPSRC Interdisciplinary Research Collaboration in Dependability, DIRC - http://www.dirc.org.uk - grant GR/N13999.

References

[1] Mark Bergman, John Leslie King, and Kalle Lyytinen. Large-scale requirements analysis as heterogeneous engineering. Social Thinking - Software Practice, pages 357–386, 2002.
[2] Mark Bergman, John Leslie King, and Kalle Lyytinen. Large-scale requirements analysis revisited: The need for understanding the political ecology of requirements engineering. Requirements Engineering, 7(3):152–171, 2002.
[3] BFU. Investigation Report, AX001-1-2/02, 2002.
[4] John H. Enders, Robert S. Dodd, and Frank Fickeisen. Continuing airworthiness risk evaluation (CARE): An exploratory study. Flight Safety Digest, 18(9-10):1–51, September-October 1999.
[5] EUROCONTROL. EUROCONTROL Airspace Strategy for the ECAC States, ASM.ET1.ST03.4000-EAS-01-00, 1.0 edition, 2001.
[6] EUROCONTROL. EUROCONTROL Safety Regulatory Requirements (ESARR). ESARR 4 - Risk Assessment and Mitigation in ATM, 1.0 edition, 2001.
[7] EUROCONTROL. EUROCONTROL Air Traffic Management Strategy for the years 2000+, 2003.
[8] EUROCONTROL. EUROCONTROL Air Navigation System Safety Assessment Methodology, 2.0 edition, 2004.
[9] Massimo Felici. Observational Models of Requirements Evolution. PhD thesis, Laboratory for Foundations of Computer Science, School of Informatics, The University of Edinburgh, 2004.


[10] Flight Safety Foundation. The Human Factors Implications for Flight Safety of Recent Developments in the Airline Industry, number (22)3-4 in Flight Safety Digest, March-April 2003.
[11] Constance L. Heitmeyer. Software cost reduction. In John J. Marciniak, editor, Encyclopedia of Software Engineering. John Wiley & Sons, 2nd edition, 2002.
[12] Daniel M. Hoffman and David M. Weiss, editors. Software Fundamentals: Collected Papers by David L. Parnas. Addison-Wesley, 2001.
[13] Agatha C. Hughes and Thomas P. Hughes, editors. Systems, Experts, and Computers: The Systems Approach in Management and Engineering, World War II and After. The MIT Press, 2000.
[14] Chris W. Johnson. Failure in Safety-Critical Systems: A Handbook of Accident and Incident Reporting. University of Glasgow Press, Glasgow, Scotland, October 2003.
[15] Jean-Claude Laprie et al. Dependability handbook. Technical Report LAAS Report no 98-346, LIS LAAS-CNRS, August 1998.
[16] Nancy Leveson. A new accident model for engineering safer systems. Safety Science, 42(4):237–270, April 2004.
[17] Nancy G. Leveson. SAFEWARE: System Safety and Computers. Addison-Wesley, 1995.
[18] Nancy G. Leveson. Intent specifications: An approach to building human-centered specifications. IEEE Transactions on Software Engineering, 26(1):15–35, January 2000.
[19] Donald A. MacKenzie. Inventing Accuracy: A Historical Sociology of Nuclear Missile Guidance. The MIT Press, 1990.
[20] Donald A. MacKenzie and Judy Wajcman, editors. The Social Shaping of Technology. Open University Press, 2nd edition, 1999.
[21] Stuart Matthews. Future developments and challenges in aviation safety. Flight Safety Digest, 21(11):1–12, November 2002.
[22] Michael Overall. New pressures on aviation safety challenge safety management systems. Flight Safety Digest, 14(3):1–6, March 1995.
[23] Alberto Pasquini and Simone Pozzi. Evaluation of air traffic management procedures - safety assessment in an experimental environment. Reliability Engineering & System Safety, 2004.
[24] PROTEUS. Meeting the challenge of changing requirements. Deliverable 1.3, Centre for Software Reliability, University of Newcastle upon Tyne, June 1996.
[25] Harro Ranter. Airliner accident statistics 2002: Statistical summary of fatal multi-engine airliner accidents in 2002. Technical report, Aviation Safety Network, January 2003.
[26] Harro Ranter. Airliner accident statistics 2003: Statistical summary of fatal multi-engine airliner accidents in 2003. Technical report, Aviation Safety Network, January 2004.
[27] Review. Working towards a fully interoperable system: The EUROCONTROL overall ATM/CNS target architecture project (OATA). Skyway, 32:46–47, Spring 2004.
[28] James Rumbaugh, Ivar Jacobson, and Grady Booch. The Unified Modeling Language Reference Manual. Addison-Wesley, 1999.

[29] Neil Storey. Safety-Critical Computer Systems. Addison-Wesley, 1996.
[30] Gerard W.H. van Es. A review of civil aviation accidents - air traffic management related accidents: 1980-1999. In Proceedings of the 4th International Air Traffic Management R&D Seminar, New Mexico, December 2001.


Specification and Satisfaction of SLAs in Service Oriented Architectures

Stuart Anderson†∗, Antonio Grau†∗ and Conrad Hughes†∗

February 15, 2005

1 Introduction

This paper presents two essential aspects related to the specification of Quality of Service (QoS) in service oriented architectures. Firstly an overview is presented of the concepts and requirements that a language for the specification of QoS agreements (Service Level Agreements, or SLAs) must address, and how the WSLA language supports them. We concentrate on the components that an SLA language should have, and do not address other issues of the SLA life cycle such as negotiation, provisioning and monitoring. Secondly a work-in-progress is described which can provide statistical estimates of multiple dimensions of behaviour of composite services, a facility which will be essential when choosing services and adjusting workflows in order to satisfy SLA requirements. Section 2 presents SLA language requirements, section 3 gives a brief introduction to WSLA and shows its use in the specification of tradeoffs between the QoS parameters (as well as evaluating WSLA against requirements), and section 4 describes the work on predicting behaviour of aggregate workflows.

2 Requirements for an SLA language

A service level agreement is an agreement between the provider of a service and their customer that defines the set of Quality of Service (QoS) guarantees and the obligations of the parties. An SLA ensures that both parties understand the service to be provided and that it will conform to certain QoS requirements. An SLA language usually defines the following aspects of the SLA: the purpose; the validity period; the parties involved and their respective rôles; the scope (i.e. the service operations covered in the agreement); the set of service level indicators (or QoS parameters), and their associated metrics, over which the QoS levels can be measured; the set of Service Level Objectives (SLOs) or guarantees to be fulfilled; and the penalties and actions to be undertaken when these guarantees are not satisfied. QoS parameters refer to observable properties relating to nonfunctional aspects of the service, e.g. availability, performance and reliability. SLOs are constraints defined over the values of those parameters that may be dependent on a precondition. An SLA language should fulfil the following desirable properties and requirements:

Precision

An SLA must be specified in a precise and unambiguous way. This is essential in order that the service provider and customer clearly understand the offered quality of service, and so that no misunderstandings arise later. Precision is also essential in order to help automate the SLA management process, such as the automatic negotiation, provisioning and monitoring of the SLA. Precision must cover all the elements of an SLA specification, including the definitions of the QoS parameters, their metrics and the agreed SLOs. For example, dependability concepts such as availability and reliability and how these are measured should be well understood by both parties. Issues such as when the SLA should be checked for compliance, over which sample of data and from where the measurements should be taken must be unambiguously specified. Offering an average availability of 99.9% does not entirely assure that the service will be available on 99.9% of client accesses, as availability may depend on the time of access. Also, response time measurements made by the service client are likely to be entirely different to those taken on the network or by the service provider. An average response time may be different depending on the averaging window, for example, five minutes or one hour. The correct definition of QoS parameters corresponds to the establishment of an ontology between a service provider and the client [4]. This ontology can be a definition of terms and the semantics of the relationships between them. The definitions could then be available at referenced URIs on the "Semantic Web" as suggested by the OWL-S approach [7], so that the provider and the client have a means of sharing definitions of the terms and concepts used in the SLA.

† School of Informatics, University of Edinburgh
∗ These authors acknowledge the support of EPSRC award no. GR/S04642/01, Dependable, Service-centric Grid Computing
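The averaging-window point above can be made concrete with a small sketch (the numbers are invented): the same one-hour response-time trace looks compliant when averaged over the whole hour, but exposes a violating burst when averaged over five-minute windows.

```python
# Illustrative only: why an SLA must pin down the averaging window.
# The same response-time trace yields different "average" values depending
# on whether the mean is taken over 5-minute windows or the whole hour.

def window_means(samples, window):
    """Mean of each consecutive fixed-size window (sample count stands in
    for a time window here)."""
    return [sum(samples[i:i + window]) / window
            for i in range(0, len(samples) - window + 1, window)]

# One sample per minute for an hour: fast, except one congested 5-minute burst.
trace = [100] * 30 + [900] * 5 + [100] * 25      # response times in milliseconds

hourly_mean = sum(trace) / len(trace)            # one 60-minute window
five_min = window_means(trace, 5)                # twelve 5-minute windows

assert hourly_mean < 200                         # the hourly figure hides the burst
assert max(five_min) == 900                      # one 5-minute window clearly violates
```

This is exactly the ambiguity the Precision requirement rules out: without the window in the SLA, both parties can honestly report different compliance results from the same data.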

Flexibility

Web services can be extremely diverse. The QoS parameters and SLOs referenced in an SLA are often specific to the particular web service under negotiation. For example, the SLA of a service providing streaming multimedia would contain QoS parameters such as bandwidth limits and quality of the streaming media, while the SLA of an e-banking service would have guarantees on the security of the transmitted information. The nature of interactions between the service customer and provider is likely to be diverse too, ranging from single request-reply interactions to interactions that can last for several hours. This diversity demands SLA languages that can be extended to fit the needs of the specific web service domain. Extensibility should cover the definition of new QoS parameters and new metrics as well as other language terms necessary for the definition of SLOs specific to the web service. XML-based languages are good candidates, not only because of their ease of extension but also because they are extensively used in other web service-related specifications such as WSDL and SOAP [3].

Definition of qualitative parameters

Existing SLA languages refer to observable QoS parameters that can be measured and therefore have an associated metric. However, they do not refer to qualitative, unmeasurable parameters such as the chosen security model or supported standards. These parameters are probably excluded because one of the main objectives of formalising SLAs is the automation of SLA management. Including unmeasurable parameters in SLAs would break the homogeneity of how parameters are handled by some of the components of the SLA management framework, such as the provisioning of resources and the measuring and monitoring of the parameters. As SLAs are usually not stand-alone documents but embedded in legal contracts, a possible option for the consideration of qualitative parameters would be their direct inclusion in the contract, leaving the SLA document intact. However, this would prevent the use of such qualitative parameters in SLOs, for instance for the specification of tradeoffs between measurable and unmeasurable parameters. It remains an open issue how unmeasurable parameters can be expressed in an SLA and how the client can be confident that the service fulfils them.

Definition of relationships between the QoS parameters

An SLA language should allow the specification of guarantees that relate different QoS parameters by providing a set of logical expression constructs. This is essential in order to specify tradeoffs between the parameters. For instance, a client could be interested in making tradeoffs between response time and availability, or between size and quality of the data. Another interesting possibility for relating QoS parameters is to give them a weight or level of importance, allowing customers to establish their priorities. This could prove useful to the SLA provisioning system if excess resources are available after all SLOs are met.
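A minimal sketch of such a tradeoff guarantee (thresholds and names are illustrative, not any particular SLA language's syntax): a logical implication stating that if the average response time exceeds 2 s, availability must be at least 99.9%.

```python
# Illustrative tradeoff between two QoS parameters expressed as an
# implication; the 2-second and 99.9% thresholds are invented for the example.

def implies(p, q):
    """Material implication: p -> q."""
    return (not p) or q

def tradeoff_slo(avg_response_s, availability):
    """If the service is slow, it must at least be highly available."""
    return implies(avg_response_s > 2.0, availability >= 0.999)

assert tradeoff_slo(1.5, 0.990)       # fast enough: availability unconstrained
assert tradeoff_slo(3.0, 0.9995)      # slow, but availability compensates
assert not tradeoff_slo(3.0, 0.990)   # slow AND insufficiently available: violated
```

The implication is what makes this a tradeoff rather than two independent SLOs: the availability constraint only binds when the response-time condition holds.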

Definition of effects

The SLA must specify the consequences of not meeting an SLO. This includes the specification of a list of actions to perform, such as sending notifications to the parties, as well as the penalties for the provider. Penalties would incur a certain cost for the provider and should also include the option of terminating the contract in the case that the service levels are unacceptable. Alternatively, meeting an SLO could also generate a reward for the service provider. An SLA should also facilitate notification of the parties when the service level is near to violating an SLO, so the provider has the opportunity to assign additional resources to guarantee the SLO, and the client has advance warning of possible breaches.

Definition of endogenous and exogenous parameters

The level of service offered for some QoS parameters can be affected by other parameters that are not within the control of the service provider. This is the case, for example, for web services that are accessed through the public Internet, where the service provider has no control over the network properties that are provided by the client's ISP. An SLA must specify under which conditions, or values of the exogenous parameters, the SLOs are to be guaranteed by the service provider. The SLA can define different guarantee levels as a function of the values of the exogenous parameters. This affords the service provider protection from liability in case of low QoS levels resulting from circumstances outside their control. Although the client is only interested in the end-to-end QoS, it is important for the provider to identify exogenous components in the parameters and the effect that these components have on the end-to-end values, so that guarantees can be specified accordingly, e.g. end-to-end availability as a function of network availability.
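The end-to-end availability example can be sketched as follows (numbers invented, and failure independence assumed): the provider's guarantee is written as a function of the exogenous network availability supplied by the client's ISP.

```python
# Sketch of an end-to-end guarantee conditioned on an exogenous parameter.
# Assumes service and network fail independently; all figures are invented.

def end_to_end_availability(service_avail, network_avail):
    """Both the service (endogenous) and the network (exogenous) must be up
    for a client request to succeed, so availabilities multiply under
    independence."""
    return service_avail * network_avail

# The provider only promises ~99.8% end-to-end when the network delivers 99.9%;
# with a worse network, the same service falls below that level through no
# fault of the provider.
assert end_to_end_availability(0.999, 0.999) > 0.998
assert end_to_end_availability(0.999, 0.990) < 0.998
```

Expressing the guarantee this way protects the provider from liability for the exogenous component while still giving the client a usable end-to-end figure.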

Exceptions

An SLA must not only specify the level objectives in normal conditions but also in exceptional circumstances. Exceptions can be provoked by many diverse events and can have effects of diverse magnitude on the terms of the SLA. For example, a server attack may produce a denial of service that entails the suspension of all the SLOs established in the SLA, while maintenance operations at the provider side may just affect the levels of some parameters. Exceptions can be caused by the service provider (e.g. a server-side hardware failure), the client, an external provider (e.g. the client ISP), or complete externalities such as natural catastrophes or war. Providers must specify the actions to take when an exception for which they are responsible happens (e.g. recovery mechanisms and notifications in case of failures), and also clearly identify the exceptions that are not under their control, so that no liability problems arise. Exceptions can be classified as follows (Table 1):

Table 1: Exceptions

Type: Failures
Examples: hardware failure, software bugs/flaws, telecommunication failure, measurement or monitoring failure

Type: Service maintenance
Examples: hardware upgrades, software upgrades, backups

Type: Network properties (not the responsibility of the provider)
Examples: low network availability, high network error rate, ...

Type: Denial of service
Examples: client negligence/wilful misconduct, network/Internet security breaches (floods, hacks and attacks), Acts of God (fire, earthquakes, ...) and circumstances beyond reasonable control (war, terrorism, strike, ...)

Composition

A service provider can make use of other providers to supply some part of the service functionality. In this case, the values of the SLA agreed by the provider and the end customer will depend on the values of the SLAs agreed by the provider and their suppliers. Composite web services therefore require a precise understanding of how the individual QoS properties of a component contribute to the overall QoS properties of the composite. This will depend on the structure of the composition and the nature of the QoS parameters. A precise QoS ontology must therefore encompass the definition of how QoS parameters behave in composition.
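As a hedged illustration of how two common QoS parameters compose (these aggregation rules are standard assumptions, not taken from this paper, and the service names and figures are invented): for a purely sequential workflow, response times add and availabilities multiply under independent failures.

```python
# Illustrative aggregation of QoS parameters for a sequential composition
# of three hypothetical services. Assumes independent failures.

from math import prod

def sequential_response_time(times):
    """In a sequential workflow, each step waits for the previous one."""
    return sum(times)

def sequential_availability(avails):
    """All steps must be up for the composite to be up."""
    return prod(avails)

# (name, response time in seconds, availability) for each step in the chain.
chain = [("auth", 0.050, 0.9999), ("synthesise", 0.400, 0.995), ("stream", 0.100, 0.999)]
names, times, avails = zip(*chain)

assert abs(sequential_response_time(times) - 0.550) < 1e-9
assert sequential_availability(avails) < min(avails)  # composite is weaker than any part
```

Other composition structures change the rules (e.g. parallel redundant invocations improve availability rather than degrade it), which is why the structure of the composition must be part of the QoS ontology.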


Definition of parties

The SLA must include information about the parties involved in SLA management and their responsibilities. Apart from the signatory parties, i.e. the service provider and the service client, the SLA should contain information about third parties, i.e. parties supporting part of the management functionality, such as measurement and monitoring.

SLA management

The components of an SLA must be defined in the context of an SLA management framework. SLA management includes tasks such as the negotiation, creation, provisioning, monitoring and compliance checking of the SLAs. The creation of the SLA may be as simple as the customer selecting one of the pre-specified SLAs offered by the provider, or may involve customisation via a negotiation process. Different service clients may have different requirements and preferences regarding QoS levels, so the service provider must define several offers with different service levels. Pre-specified, fixed and negotiable information about the QoS values offered by the provider can be captured by SLA templates. The structure of an SLA template may be the same as that of an SLA but partially completed and containing an additional section where constraints on the values of the unfilled fields are defined. The constraints must be followed by the customer when negotiating the SLA.
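A toy sketch of the template idea (the structure below is invented for illustration, not WSLA's actual template format): a partially completed SLA plus constraints that any customer-negotiated values for the unfilled fields must satisfy.

```python
# Hypothetical SLA template: pre-filled fields, unfilled fields, and
# constraints that negotiated values must respect. All names are invented.

template = {
    "filled": {"service": "text-to-speech", "validity_days": 365},
    "unfilled": ["availability", "max_response_s"],
    "constraints": {
        "availability": lambda v: 0.95 <= v <= 0.999,   # provider's offer range
        "max_response_s": lambda v: v >= 0.5,           # nothing faster is offered
    },
}

def negotiate(template, offer):
    """Accept the customer's offer only if every unfilled field is supplied
    and satisfies the template's constraints; return the completed SLA."""
    for field in template["unfilled"]:
        if field not in offer or not template["constraints"][field](offer[field]):
            return None
    return {**template["filled"], **offer}

sla = negotiate(template, {"availability": 0.999, "max_response_s": 2.0})
rejected = negotiate(template, {"availability": 0.9999, "max_response_s": 2.0})
```

Here `sla` is a completed agreement, while `rejected` is `None` because the requested availability exceeds what the template allows the provider to promise.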

3 SLA specification in WSLA

WSLA (Web Service Level Agreement) is a formal language for defining service level agreements, developed by IBM [2, 5, 6]. It is based on XML, and an XML schema has been defined for its syntax. The language is extensible and allows the derivation of new domain-specific elements from existing language elements. This can easily be done by making use of the ability to create derived types using XML schemas. A WSLA specification is structured in three main sections: the parties, the service description and the obligations.

Parties specification

This section describes the parties involved in the management of the web service. A party can be a signatory party or a supporting party. Signatory parties are either the service provider or the service customer. The information for a party includes the name, contact details, and the definition of the interfaces of actions that it offers. The interface definitions are specified in WSDL and describe operations that a party can perform when invoked by the occurrence of an event, e.g. a notification when a guarantee is violated. The information for a supporting party additionally includes the sponsor of the party, either the provider or the customer, and the supporting rôle that it assumes, for example, measurement service or condition evaluation service.

Service definition

A service definition specifies the information needed about the service to define the agreed service level guarantees, i.e. the operations offered by the service provider, the QoS parameters to be considered for each of these operations and the metrics used for measuring these parameters. In WSLA, QoS parameters are called SLA parameters and must be measurable. Each SLA parameter has a name, a type and a unit. In addition, a parameter refers to one metric that describes how the value of the parameter is measured or computed. A metric can be either a resource metric or a metric composed from other metrics. In the former case, this is specified by a measurement directive; in the latter case, by a function. Examples of resource metrics are system uptime, service outage period and number of service invocations. Examples of composite metrics are maximum response time and average availability of the service. A measurement directive describes how the values are retrieved from the resources. Examples of measurement directives are the URI of a computer program, a command for invoking scripts and database queries.
A function represents a measurement formula that specifies how the composite metric is computed. Examples of functions are mean, median, sum, minimum, maximum and time series constructors. For every function a schedule is described. The schedule specifies the time intervals during which the function is executed to compute the metric. The time intervals are defined by means of the start time, duration and frequency. Moreover, SLA parameters can include information about the party that provides the values and the parties that receive them, either by active update (push) or by providing access to the parties to retrieve them (pull). Each operation also contains a reference to the service that contains the operation definition to which the WSLA operation refers. This reference depends on the way in which the service is described. In the context of web services, this is described in a WSDL specification. The reference to a WSDL-defined service includes the name of the WSDL file, the kind of binding, i.e. the transport encoding for the SOAP messages, and the operation name as specified in the WSDL file.

Obligations

This section defines the guarantees and constraints that are imposed on the SLA parameters. Two kinds of obligation arise: Service Level Objectives (SLOs) and action guarantees. SLOs are restrictions on the values of the SLA parameters in a given period of time. An SLO specifies the party that is responsible for delivering what is imposed in this guarantee, a validity period defining when the restriction is applicable and a logical expression defining the assertion to be tested. In addition, an SLO may contain information about when the assertion should be evaluated. This can be done by defining either an evaluation event expressing when the assertion should be evaluated (for example, every time a new value for an SLA parameter included in the assertion is available), or a schedule according to which the assertion is evaluated. An action guarantee specifies the actions that must be performed by the parties in the case that a given precondition is met. A typical predicate in a precondition is the violation predicate that expresses whether an SLO has been violated.
The definition of an action guarantee contains the name of the party in charge of this guarantee, a logical expression defining the precondition, an evaluation event or schedule describing when the precondition should be evaluated, and the actions to be invoked at particular parties in the case that the precondition holds. The interface of the actions is defined in the parties' specification section of the WSLA document; examples of actions are the notification of events, problem reports and the payment of penalties and premiums. An action guarantee may also include an execution modality that expresses the frequency with which the action must be executed depending on the value of the precondition, for instance always, on entering a condition, or on entering and leaving a condition.
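The obligation concepts just described (SLOs with an obliged party, validity period and assertion; action guarantees with a precondition and actions) can be illustrated with a small object model. The sketch below is our own illustrative Python rendering, not part of the WSLA specification; all class and field names are chosen for exposition only.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable, List

# Illustrative model of the WSLA obligation concepts described above.
# Names are ours, not taken from the WSLA XML schema.

@dataclass
class SLAParameter:
    name: str                      # e.g. "BitsDelay"
    value: float = 0.0             # latest value computed from the metric

@dataclass
class ServiceLevelObjective:
    obliged: str                   # party responsible, e.g. "TTSProvider"
    start: datetime                # validity period start
    end: datetime                  # validity period end
    assertion: Callable[[], bool]  # logical expression over SLA parameters

    def evaluate(self, now: datetime) -> bool:
        # Outside the validity period the SLO imposes no restriction.
        if not (self.start <= now <= self.end):
            return True
        return self.assertion()

@dataclass
class ActionGuarantee:
    obliged: str
    precondition: Callable[[], bool]           # e.g. "the SLO is violated"
    actions: List[Callable[[], None]] = field(default_factory=list)

    def evaluate(self) -> None:
        if self.precondition():
            for action in self.actions:
                action()  # e.g. notify a party, file a problem report

# Usage: an SLO asserting no transmission gap (BitsDelay == 0).
bits_delay = SLAParameter("BitsDelay", value=0.0)
slo = ServiceLevelObjective(
    obliged="TTSProvider",
    start=datetime(2005, 1, 1, 9), end=datetime(2005, 1, 31, 9),
    assertion=lambda: bits_delay.value == 0,
)
guarantee = ActionGuarantee(
    obliged="AuditingCompany",
    precondition=lambda: not slo.evaluate(datetime(2005, 1, 15)),
    actions=[lambda: print("notify TTSProvider: gap occurred")],
)
bits_delay.value = 3.0     # a shortfall of 3 bits is observed
guarantee.evaluate()       # the precondition holds, so the notification fires
```

The evaluation-event style of triggering (re-checking the assertion whenever a new parameter value arrives) would correspond to calling `guarantee.evaluate()` from the metric-update path.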

Tradeoffs specification in WSLA

A main feature of WSLA is that it allows the use of logical expressions in the specification of the SLOs and action guarantees. WSLA provides logical expressions that follow first-order logic, including logic operators and predicates but not quantifiers. The logic operators are And, Or, Not and Implies. Tradeoffs between QoS parameters (i.e. dependencies between the values of the parameters) can be specified using the Implies operator. Predicates are functions that return true or false. The predicates needed may vary depending on the SLA parameters and metrics defined for a particular domain. WSLA allows the definition of new predicates to fit the needs of the specific application domain by defining the predicate type as an abstract type that can be extended in the XML schema. WSLA also provides a set of built-in predicates: Violation, Greater, Less, Equal, GreaterEqual, LessEqual, True and False.

As an example to illustrate these concepts, suppose we have a web service providing text-to-speech audio streaming. Customers send requests in the form of texts to be synthesised. The provider generates the corresponding audio signal and transmits it to the customers over the Internet in a compressed stream format. A key requirement on the quality of the service is the continuous playback of the audio stream by the client, i.e. the elimination of the gaps or silences which occur (usually as a result of network congestion) when the rate of audio delivery drops below real time for long enough to exhaust the client's buffers. Assuming an idealistic case in which the provider promises that no gaps in transmission will occur, the SLO within the SLA document that guarantees this requirement could be specified as follows:


TTSProvider 2005-01-01T09:00:00-00:00 2005-01-31T09:00:00-00:00 BitsDelay 0 NewValue ...

The complete WSLA specification can be found in Appendix A. This SLO specifies first the obliged party in charge of assuring the level of service, and then the validity period. Assuming we have an SLA parameter, BitsDelay, which represents the difference in bits between the expected and the actual bits received by the client, the SLO asserts that this difference must be equal to zero, i.e. there is no gap during the transmission.

A more realistic example is the assurance by the provider of no gaps only after a percentage of the audio has been transmitted. This allows the provider first to negotiate and calculate the appropriate encoded content for the best audio quality depending on the customer's real connection bandwidth. The SLO specifying this guarantee could be defined as follows:

TTSProvider 2005-01-31T09:00:00-00:00 2005-01-31T9:00:00-00:00 StreamTransmittedPercentage 10 BitsDelay 0

NewValue

Given an SLA parameter, StreamTransmittedPercentage, which represents the percentage of the audio stream that has been transmitted to the client, this SLO asserts that there are no gaps after 10% of the audio stream has been transmitted. Note that as WSLA does not allow the use of variables whose values can be computed or retrieved from a resource, the percentage of the audio stream has been defined as an SLA parameter.

WSLA allows the specification of actions to be executed by the parties when a guarantee or SLO is violated. In our example, the following action guarantee specifies the execution of a notification to the provider indicating that a gap in the transmission has violated the SLO:

AuditingCompany GapOccurrence NewValue TTSProvider Violation GapOccurrenceGuarantee BitsDelay Always
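The conditional SLO discussed above (StreamTransmittedPercentage > 10 Implies BitsDelay == 0) can be sketched as an executable check. The parameter names follow the example; the evaluator itself is our own illustration of WSLA's Implies semantics, not WSLA tooling.

```python
# Sketch of evaluating the conditional SLO discussed above:
# "StreamTransmittedPercentage > 10 Implies BitsDelay == 0".
# The parameter names come from the example; the code is illustrative.

def implies(antecedent: bool, consequent: bool) -> bool:
    """Material implication, as in WSLA's first-order expressions."""
    return (not antecedent) or consequent

def gap_slo_holds(stream_transmitted_percentage: float,
                  bits_delay: float) -> bool:
    # No gaps are guaranteed only once 10% of the stream has been sent.
    return implies(stream_transmitted_percentage > 10, bits_delay == 0)

# During the first 10% of transmission a non-zero delay does not
# violate the SLO; afterwards it does.
print(gap_slo_holds(5, 120))   # True: still within the negotiation phase
print(gap_slo_holds(50, 120))  # False: gap after 10% transmitted
print(gap_slo_holds(50, 0))    # True: no gap
```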

Analysis of WSLA against requirements

WSLA is one of the most suitable QoS languages for achieving the requirements discussed in the previous section. In particular, the language presents the following strengths:

• It allows the precise definition of the QoS parameters and their metrics, i.e. how and under which schedule the measurement values are retrieved from the resources and how they can be combined by functions to obtain meaningful parameters such as the mean and the sum of a series of values.

• It is flexible. The XML schema specification allows the definition of new functions, predicates and measurement directives by deriving them from their abstract types. Furthermore, the inherent extensibility of XML makes the definition of new elements straightforward.

• The language provides logical expressions to relate QoS parameters and specify tradeoffs between them.

• It allows the specification of actions to be performed by the parties when some QoS values or service level objectives are not met. However, it does not specify any penalty/reward policy.

• It gives full support to the definition of parties and their responsibilities.

Nevertheless, some of the requirements are not supported:

• It does not provide any specific construct for the specification of penalties/rewards.

• It does not allow the specification of logical or non-quantifiable parameters.

• It does not support the possibility of giving priorities or weights to the QoS parameters.

• Although the logical expressions provided by the language allow the specification of SLOs in different conditions (normal, maintenance, failure/exception cases, etc.), the language does not support the explicit definition of these conditions. Different cases could be specified by the definition of variables, or predicates on the values of the variables, that could be instantiated during the service execution. However, WSLA only allows the declaration and definition of QoS parameters.

These issues need to be addressed in order to obtain more complete SLA specifications. Many of the language concepts are currently being used and extended to cover grid services in the context of the WS-Agreement standard from the Global Grid Forum, where the WSLA authors are active members [1].

4 Behaviour of composite services

The DIGS project seeks to automate as much as possible the composition of services in pursuit of QoS requirements through choice of appropriate components (diversity) and modification of workflow (structure). This can only be done if the effect of differing service choices and workflow modifications can be understood. Consequently we are developing a tool which predicts the aggregate behaviour of a composition in terms of its workflow structure and the known behaviour of its individual services. Once such predictions are possible, candidate compositions can be evaluated against client SLA demands and duly accepted or disregarded. Our solution breaks the problem into the following four elements.

Properties

These simply correspond to the QoS parameters in terms of which SLOs are written — things like time to complete, availability, accuracy, peak bandwidth usage, perhaps even cost.

Workflow

While so sophisticated a representation of composition structure as BPEL or a scripting language may not be necessary, at least the basics of parallel, conditional and serial operation need to be modelled. Parallel operations range from simple variations like waiting for the first or all responses through to more complicated ones such as voting or waiting for a certain number of responses to satisfy a condition (such as succeeding rather than failing).

Behaviours

Each property will respond to different workflow operations in different ways. For example, time to complete is additive under serial composition, maximising under all-of parallelism (i.e. the time to complete of an all-of parallel composition will be the maximum of its components' times), and minimising under one-of parallelism; network bandwidth is maximising under serial composition and additive for all forms of parallelism; probability of success is multiplicative in series and "complement-multiplicative", 1 − Π_i (1 − p_i^success), in parallel; etc. Considered in this light, a "property" becomes a mapping from workflow operations to (numerical or logical) operations on values.

Values

Workflow-induced behaviours of properties must be executed on actual values if useful predictions are to be made; the choice of underlying representation for these values will be critical, as (for example) it will be very difficult to say anything about the slowest 10% of a class of operation if all that is known about them is their average performance. Possible representations for numeric quantities include the expected value; a minimum-maximum range; the mean and variance of the best normally distributed approximation; a probability density function; a Markov model of states; etc. Even within an individual problem it will almost certainly make sense to use different value representations for different properties. At the moment the main value representation used is an approximation of the variable's statistical distribution using a variable number of uniform segments. This allows for moderately accurate representations of a number of situations, including delta functions, bimodal variables, etc.

Processes

A process is a bundle of properties representing the known behaviour of an actual workflow element, usually a service call. These are the objects with which aggregation computations are made. It is necessary to process all properties at once, in bundles, because, once elements such as conditional operations and parallel-first-response are admitted to the workflow, individual properties start to affect the probabilities with which other processes are executed, hence affecting all properties for dependent elements of the workflow.
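The per-property aggregation rules described under Behaviours can be sketched as simple combinators. The code below is an illustrative sketch using the crude expected-value representation, not the DIGS tool itself; the class and function names are ours.

```python
# Illustrative aggregation of two properties (time to complete and
# probability of success) under serial and parallel composition,
# following the rules given in the text. Sketch only, not the DIGS tool.

from dataclasses import dataclass
from math import prod

@dataclass
class Behaviour:
    time: float        # time to complete (simple expected-value representation)
    p_success: float   # probability of success

def serial(*services: Behaviour) -> Behaviour:
    # Time is additive in series; success probability is multiplicative.
    return Behaviour(
        time=sum(s.time for s in services),
        p_success=prod(s.p_success for s in services),
    )

def parallel_all(*services: Behaviour) -> Behaviour:
    # Wait for all responses: time is maximising; all must succeed.
    return Behaviour(
        time=max(s.time for s in services),
        p_success=prod(s.p_success for s in services),
    )

def parallel_any(*services: Behaviour) -> Behaviour:
    # First response wins: time is minimising (optimistic for the
    # expected-value representation); success is "complement-multiplicative",
    # 1 - prod(1 - p_i).
    return Behaviour(
        time=min(s.time for s in services),
        p_success=1 - prod(1 - s.p_success for s in services),
    )

a = Behaviour(time=2.0, p_success=0.9)
b = Behaviour(time=3.0, p_success=0.8)
print(serial(a, b))        # time 5.0, p_success ~ 0.72
print(parallel_all(a, b))  # time 3.0, p_success ~ 0.72
print(parallel_any(a, b))  # time 2.0, p_success ~ 0.98
```

The `parallel_any` time rule illustrates why the expected-value representation is weak: the minimum of two expected values is not the expected value of the minimum, which is exactly the motivation given in the text for richer representations such as the segment-based distribution approximation.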

Examples

In reference to the text-to-speech service, it might be reasonable to expect the difference between the received and expected quantity of data to be normally distributed. The likelihood of a gap in audio manifesting on the recipient's machine will then be determined by the size of the recipient's buffer: if the received-versus-expected difference exceeds buffer capacity then the buffer will be empty and a gap will occur — the probability of this event can be exactly determined once the difference's distribution (normal or otherwise) is known. If two sufficiently similar text-to-speech services can be sourced then the likelihood of both buffers emptying simultaneously will be the product of each individual service's gap likelihoods. Obviously this use of redundant parallel data sources will be significantly more reliable than a single source. However, it clearly also uses twice as much network bandwidth, which in a bandwidth-constrained situation could cause the very situation it seeks to mitigate.

Similarly, as reported in last year's DIRC conference, making queries simultaneously against several diversely implemented databases can substantially improve response times — you can forget about the other queries as soon as you have any result that is good enough. This has been demonstrated in practice, but also follows logically from the mathematical fact that the minimum of two random variables will always have a lower mean than either variable individually (as long as their distributions overlap — if one variable is always lower than the other this strategy cannot have any useful effect). The caveat here is that in a commercial situation this improved performance will come at the — likely financial — cost of always making two queries instead of one.

A further example: simply waiting for a small period for an answer from one service and then (absent said answer) making the same request of another service will decrease the rate of failures (both services must fail in order to break this strategy), probably also slightly improve response times (the balance here will depend on the failure rates), and — a tradeoff again — slightly increase costs. Calling the more expensive service first will have a higher average cost than starting with the less expensive service, but it might be expected that this more expensive service will also offer a better response time: another choice for the user to make.

As can be seen in all of the above examples, there exist situations where using multiple services in place of one can improve certain properties of the system, usually at a tradeoff against other properties — a tradeoff heavily influenced by the manner in which the services are combined. The tool we are developing will greatly facilitate the analysis of such situations.
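The buffer-gap reasoning in the first example can be checked numerically. The sketch below is a Monte Carlo estimate under invented parameters (a zero-mean normal shortfall with standard deviation 400 bits and a 1000-bit buffer); it is illustrative only, and the independence assumption behind squaring the single-source probability is stated rather than derived.

```python
# Monte Carlo sketch of the buffer-gap example: the shortfall between
# expected and received data is assumed normally distributed, and a gap
# occurs when the shortfall exceeds the client's buffer. All parameter
# values here are invented for illustration.

import random

random.seed(1)  # deterministic run for reproducibility

def gap_probability(mean: float, stdev: float, buffer_bits: float,
                    trials: int = 100_000) -> float:
    gaps = sum(1 for _ in range(trials)
               if random.gauss(mean, stdev) > buffer_bits)
    return gaps / trials

p_single = gap_probability(mean=0.0, stdev=400.0, buffer_bits=1000.0)

# With two independent sources, both buffers must empty simultaneously
# for a gap to manifest, so the individual likelihoods multiply.
p_dual = p_single ** 2

print(f"single source gap probability ~ {p_single:.4f}")
print(f"dual source gap probability   ~ {p_dual:.6f}")
```

With these parameters the single-source gap probability is the normal tail beyond 2.5 standard deviations (under one percent), and the dual-source probability is smaller by several orders of magnitude, which is the reliability gain the text describes.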


Status

The aggregate prediction tool is only now approaching useful functionality, as a result of which no analysis has yet been done of how it compares with (for example) simulation, or whether the choice of distribution function as underlying value representation gives significantly better results than the alternatives in practical situations. This analysis will be undertaken over the coming months.

5 Conclusion

This paper has presented the requirements for a precise SLA specification language and how the WSLA language meets them. Although the language supports many of the requirements, it still needs to be extended in order to support the specification of richer service level objectives, such as the definition of logical parameters and exceptions. We expect that combining SLAs with the ability to predict the behaviour of composite workflows should offer substantial gains in efficiency of resource usage, as well as allowing service providers and consumers to start thinking about their compositions at a much higher level than before.

References

[1] A. Andrieux et al. Web Services Agreement Specification (WS-Agreement). Draft, Global Grid Forum, August 2004.
[2] A. Dan et al. Web Services on Demand: WSLA-driven Automated Management. IBM Systems Journal, 43(1):136-158, 2004.
[3] G. Dobson. Quality of Service in Service-Oriented Architectures, 2004. http://digs.sourceforge.net/papers/qos.html.
[4] G. Dobson and R. Lock. Developing an Ontology for QoS, 2005.
[5] A. Keller and H. Ludwig. The WSLA Framework: Specifying and Monitoring Service Level Agreements for Web Services. IBM Research Report, May 2002.
[6] H. Ludwig, A. Keller, A. Dan, R. P. King, and R. Franck. Web Service Level Agreement (WSLA) Language Specification, Version 1.0. Technical report, IBM Corporation, January 2003.
[7] D. Martin et al. Bringing Semantics to Web Services: The OWL-S Approach. In SWSWPC04, 2004.

A Example SLA: text-to-speech

NeSC Edinburgh, UK Notification.wsdl SOAPNotificationBinding Notify JCMB, King’s Buildings Edinburgh, UK Notification.wsdl SOAPNotificationBinding Notify BP 1 Edinburgh, UK TTSProvider


2005-01-01T09:00 2005-01-31T09:00 5

TTSService.wsdl SOAPNotificationBinding getAudioStream

BitsDelayMetric AuditingCompany MainSchedule IdealBitsReceived MainSchedule BitsReceived AuditingCompany http://nesc.ed.ac.uk/TTS/ipKbitsIn AuditingCompany http://nesc.ed.ac.uk/TTS/IdealKbitsIn

TTSProvider 2005-01-01T09:00:00-00:00 2005-01-31T09:00:00-00:00 BitsDelay 0 NewValue AuditingCompany GapOccurrence NewValue TTSProvider Violation GapOccurrenceGuarantee BitsDelay TTSCustomer Violation GapOccurrenceGuarantee BitsDelay Always


Structuring defences in dependability arguments

Mark Sujan (1), Shamus Smith (2), Michael Harrison (3)

(1) Department of Computer Science, University of York, Heslington, York YO10 5DD, United Kingdom
(2) Department of Computer Science, University of Durham, Durham DH1 3LE, United Kingdom
(3) Dependability Interdisciplinary Research Collaboration, Informatics Research Institute, University of Newcastle Upon Tyne, Newcastle Upon Tyne NE1 7RU, United Kingdom

Abstract

In this paper it is argued that structure is a key aspect of understanding the strength of an argument of dependability. Structure can be used in the analysis of safety arguments to highlight properties that may make arguments unsafe.

1 Introduction

Argumentation is essential in assuring a system's dependability to a third party. The process of providing such arguments can itself improve the dependability of a system. Convincing the third party that an argument is adequate (1) is problematic, and for this reason quantifiable arguments that can be repeated are preferred to descriptive arguments that convince through their clarity, exhaustiveness and depth – qualities that are notoriously difficult to measure. The problem is that in practice it is often impossible to quantify the likelihood of undependabilities of a system that is yet to be fielded and has only been tested in a limited, possibly simulated set of conditions.

The paper is concerned with the adequacy of descriptive arguments. It makes two claims, both using the structure of a descriptive argument. The first is that understanding the argument's structure makes it possible to reflect on its adequacy based on notions such as depth, coverage and strength of mitigation; the structure makes the analysis of such notions more straightforward. It is necessary to see how the causes are identified and the consequences elaborated. The second is that, given this structure, it can be analysed how implemented structures, barriers, mitigate the potential consequences of failure or prevent causes from arising. These structures are investigated in existing arguments to pave the way for methods in which they can be made more explicit, thereby helping to see the strength of the argument.

(1) The word adequate is left deliberately vague; it might mean confidence


A regulator might therefore use them to analyse the quality of an argument. Safety industries either demonstrate process and/or product arguments directly in a system's development and evaluation, or provide justification based on the presumed confidence that follows the application of standards in a system's development. Direct and indirect arguments appeal to processes that provide a traceable justification for the decisions made at a particular stage of system development in the context of possible hazardous events which threaten system dependability. The paper focuses on direct arguments.

Although assessing the quality of a descriptive argument is problematic, the structural properties of such arguments can be measured to determine whether desired properties are present, using for example (i) the depth and breadth of arguments to demonstrate coverage and amount of support, and (ii) the representation of hazard barriers as a manifestation of arguments and their physical structure.

The role of structure in descriptive arguments

Dependability arguments often have a structure that consists of claim, argument and evidence. Barriers perform two roles in mitigation: they might be preventative, reducing the likelihood of the cause, or may be protective of the consequence of the hazard. In discussing the role of the barrier, a quantitative argument may be used to justify the basis on which the mitigating role of the barrier has been effective in similar situations.

Cause, consequence, mitigation arguments use a structure with general characteristics that were observed by Toulmin [20], taking the form of a link between evidence, claim and support. In fact Toulmin developed a notation that can be used to structure such an argument (see Figure 1). "We may symbolise the relation between the data and the claim in support of which they are produced by an arrow, and indicate the authority for taking the step from one to the other by writing the warrant immediately below the arrow:" [20, p. 99]

D → So C, Since W

Harry was born in Bermuda → So, Harry is a British subject; Since, A man born in Bermuda will be a British subject

Figure 1: Toulmin's initial argument pattern and an example

Toulmin's notation can be used to express any argument and can be augmented with additional components, for example qualifiers on claims and rebuttals on qualifiers, but in the context of the current work the initial definition in Figure 1 is sufficient. Toulmin's argument structure can be used to define arguments with claim → argument → evidence relations. Two data independent structures (2) are formed by this approach: tree-like parent/child relations, for direct support, and sibling relations, for diverse support. The structures can be defined using methods based on depth first or breadth first approaches.

Using explicit barriers in arguments

A barrier is an obstacle, an obstruction, or a hindrance that may either (i) prevent an action from being carried out or an event from taking place, or (ii) prevent or lessen the impact of the consequences, limiting the reach of the consequences or weakening them in some way [10]. Barriers are often complex sociotechnical systems: a combination of technical, human and organisational measures that prevent or protect against an adverse effect. Barriers for safety-critical systems include physical representations, for example a mechanical guard on an electronic throttle [2], as well as beliefs, such as confidence in system safety based on conformance to applied standards.

(2) By data independent structure we mean that the argument structures are independent of the data they contain. It is the structural properties of the argument trees that we are interested in. See [16].


Barriers embody both abstract and concrete representations of safety properties found in safety cases. Such cases implicitly document the barriers that must exist between hazards and hazardous states and the vulnerable components of a system. For certification it is the verification of these barriers that provides confidence in the safety of the system. However, explicit representations of such barriers are commonly absent from safety case documentation and the associated arguments for compliance with particular standards.

Explicit barrier description in hazard analysis can provide insight throughout the development of safety-critical systems and can provide a more substantial reason for having confidence in a mitigation [17]. For example, if there is a hazard mitigation that an interlock (3) inhibits some type of behaviour, this may feature as evidence in a safety case. It should be possible to prove that it is in place in the live system and that its performance can be assessed and compared to the predicted performance in the initial hazard analysis.

Barriers represent the diverse physical and organisational measures that are taken to prevent a target from being affected by a potential hazard [11, p. 359]. The concepts and terminology related to barriers or safety features vary considerably [9]. We present one barrier classification, by Hollnagel [10].

1. Material barriers physically prevent an action from being carried out or the consequences of a hazard from spreading, for example a fence or wall.

2. Functional barriers impede an action from being carried out, for instance the use of an interlock. (3)

3. Symbolic barriers require an act of interpretation in order to achieve their purpose. For example, a "give way" sign indicates a driver should give way but does not actively enforce/stop non-compliance.

4. Immaterial barriers are not physically present or represented in the situation, but depend on the knowledge of the user to achieve their purpose, for example the use of standards.

(3) An interlock is a mechanism which ensures that potentially hazardous actions are only performed at times when they are safe [18].

The remainder of this paper is as follows. In Section 2 the role that structure plays will be explored by identifying similarities in argument structures; structural properties of arguments are demonstrated that can be utilised to identify arguments that could be strengthened. In Section 3 the nature of these structures is explored as it relates to the Reduced Vertical Separation Minimum Functional Hazard Analysis. In Section 4 the nature of particular barrier configurations, as an indication of direct arguments in a deployed system, will be examined. Such analysis can identify dependability concerns such as single-point failures and highlight human and technology dependencies in the system.

2 The quality of an argument

Arguments can be reinforced in two ways: using "depth" or "breadth" to strengthen the claim that is being made. Hence figure 2(i) reinforces the fact that Jane has a UK passport by claiming that Jane is British; this claim is further strengthened by claiming that Jane was born in the UK. Reinforcement can be achieved not only with depth but also through diversity or variety. Hence the claim that London is an accessible city is reinforced by claiming not only that it has good rail links, but also bus links and air links (figure 3(i)). Similar structures are used broadly: if we have an argument that London is an accessible city and the same factors hold for Leeds, then we can reuse the argument to argue that Leeds is also an accessible city. Indeed, if the same factors hold for Leeds and yet we believe that Leeds is lacking in terms of accessibility, this might provoke the analyst to strengthen the London argument, perhaps qualifying the air links claim by a further claim that there is frequent public transport from the centre of the city to the airport. Typically, arguments use variants of one or other of these argument types and in practice will make use of similar patterns that have worked in the past. This reuse can be good, particularly when it can be justified through


past experience — the fact that London is an accessible city and that these subclaims are true. In fact it may be possible to quantify the claim of accessibility and carry such quantification to the new situation. However, reuse can also be a bad thing: it may be applied inappropriately in a situation where the argument is not analogous.

Figure 2: Depth first reuse example — (i) a first argument ("Jane can travel freely through Europe", since "UK passport holders can travel freely in Europe" and "Jane has a UK passport", supported in depth by "Jane is British" and "Jane was born in the UK"); (ii) a new argument with the same structure for John; (iii) the reused lower justification ("John is British", "John was born in the UK") applied to John.

A safety case often involves hazard analysis. Hazard identification, classification and mitigation techniques establish that hazards can either be avoided or that they will not affect the dependability of the system. Here, descriptive arguments, often based on barriers, are used to mitigate the perceived severity of hazards. In analysing the quality of such arguments there is a requirement that the analysis has (i) sufficient rigour and (ii) sufficient coverage. Our confidence in the rigour of a safety case is directly linked to the confidence or strength of the hazard analysis itself. This confidence will be reinforced by objective evidence of coverage and depth of the analysis, and by an understanding of the dependability of the barriers that have been used in the argument — that there are no unexpected adverse consequences within a safety-critical system.
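Once an argument is represented as a claim → argument → evidence tree, the two structural measures appealed to here (depth of direct support and breadth of diverse support) are straightforward to compute. The sketch below is our own illustrative rendering, not a tool from the paper; the node texts reuse the accessibility example, and the definition of breadth (largest sibling group) is one plausible reading among several.

```python
# Illustrative claim/support tree with the two structural measures the
# text appeals to: depth (direct support chains) and breadth (diverse
# sibling support). The measure definitions are ours, for exposition.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Claim:
    text: str
    support: List["Claim"] = field(default_factory=list)

def depth(claim: Claim) -> int:
    # Longest parent/child chain, counting the root claim itself.
    if not claim.support:
        return 1
    return 1 + max(depth(child) for child in claim.support)

def breadth(claim: Claim) -> int:
    # Largest sibling group anywhere in the tree: a rough measure of
    # how diverse the support for some claim is.
    if not claim.support:
        return 0
    return max(len(claim.support),
               *(breadth(child) for child in claim.support))

# The diverse accessibility argument from the text:
london = Claim("London is an accessible city", [
    Claim("Good rail links to London"),
    Claim("Good bus links to London"),
    Claim("Good air links to London"),
])
print(depth(london), breadth(london))  # 2 3
```

A depth-first reinforcement (as in the passport example) would grow `depth` while leaving `breadth` unchanged, and vice versa for adding diverse sibling claims.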

Figure 3: Breadth first reuse example — (i) an initial diverse argument ("London is an accessible city", supported in breadth by "Good rail links to London", "Good bus links to London" and "Good air links to London"); (ii) a new argument ("Leeds is an accessible city", supported by "Good rail links to Leeds"); (iii) a potential breadth first reuse of the rail, bus and air link claims for Leeds.

A range of methods have been developed to support systematic hazard analysis, for example HAZOP

(Hazard and Operability Studies) [14], FMEA (Failure Modes and Effects Analysis) [4] and THEA (Technique for Human Error Assessment) [15]. All these methods use structure explicitly in the arguments that they generate. All these techniques involve significant personnel effort and time commitment; as a result it is common for similarities in the content and context of such analyses to be exploited in order to reuse analysis fragments [3, 13, 19].

In what follows we explore the structure of two public domain arguments. In the first case the RVSM Functional Hazard Analysis has been translated into ASCE [1] format in order to explore the structure of the argument that is provided. In the second case study, a safety assessment of the eight state route airspace, we focus on how barriers are used in the argument.

3 Case Study: RVSM FHA

The Reduced Vertical Separation Minimum (RVSM) programme is an EATMP programme, and was established to contribute to the overall objective of enhancing capacity and efficiency while maintaining or improving safety within the ECAC airspace. The main scope of RVSM is to enhance airspace capacity. The introduction of RVSM will permit the application of a 1000 ft vertical separation minimum (VSM) between suitably equipped aircraft in the level band FL290 - FL410 inclusive. Before the introduction of RVSM the VSM was 2000 ft (referred to as CVSM).

A prerequisite to the introduction of RVSM is the production of a safety case to ensure that the minimum safety levels are maintained or improved. The Functional Hazard Analysis (FHA) constitutes an essential part of the Pre-Implementation Safety Case (PISC). The FHA document is publicly available [6] and forms the basis for the study reported in this section. Three areas have been considered in the FHA:

1. Mature / Core EUR RVSM area
2. Mature / Transition space
3. Switchover

For each area a number of scenarios were created for the FHA sessions. In total 73 hazards have been analysed during the FHA. For all of these, safety objectives have been established. It was concluded that 71 hazards achieved their safety objectives, while 2 hazards were assessed as safety critical / not tolerable.

In the analysis below the FHA Session 1 / Scenario 1 was considered; therefore 7 hazards out of 73 have been analysed, pertaining to the core EUR RVSM airspace and focussing on the controller's ability to issue ATC clearances and to monitor compliance (ground-related hazards). Some of the shortcomings and potentially invalid claims of the FHA had already been pointed out in various places soon after the publication of the document, see for example the discussion in the University of York based Safety-Critical Mailing List. However, the aim of the current study was to understand the structural properties of the FHA arguments, and the type of barriers implied therein.

The FHA arguments in the document were provided in textual form. This makes it difficult to analyse and describe precisely the structure and the dependencies of the argument. The difficulties attached to textual descriptions of dependability arguments have been pointed out before, see for example [12, 1]. For this reason the arguments were transformed into Goal Structuring Notation (GSN). Such a post-hoc transformation is not ideal, as the uncertainties or ambiguities inherent in the textual description cannot be resolved. Preferably, the GSN goal structures would be derived by the people performing the FHA in order to make best use of its capabilities. However, for the current study these uncertainties are acceptable.

Figures 4 and 5 illustrate two of the FHA arguments which have been transformed into GSN format. Figure 4 shows the argument provided for demonstrating that the risk associated with nuisance Traffic Advice (TA) and Resolution Advice (RA) due to older versions of the Airborne Collision Avoidance System (ACAS) is tolerable. Figure 5 shows the respective argument for the hazard arising from a failure of the RX/TX.

Figure 4: Functional Hazard Analysis 1.1

All arguments follow the same top-level structure, namely the claim that the risk arising from a hazard is tolerable is broken down into a claim that the severity is at most X, and a second linked claim that the probability of occurrence of this hazard is not greater than Y. A GSN Pattern has been created from which all the arguments have been instantiated [12].

Figure 5: Functional Hazard Analysis 1.15

During the analysis of the structure of the arguments it is useful to recall Govier's support pattern types [8] (see also [21]). Here a distinction is made between different types of argument support: single, linked, and convergent support. With reference to the GSN diagrams, this means we need to take into account whether a child node satisfies the parent node by itself (single support), whether a number of child goals interdependently support the parent node (linked support, corresponding roughly to an AND in the strategy description), or whether each of a number of child nodes individually supports the parent node (convergent support) and, if so, how mutually dependent they are. This latter support type corresponds to a fully diverse argument form.

The structural analysis of the hazard mitigation arguments reveals that single claims for both the severity and the probability branch are common, though they usually do not appear conjointly in a single hazard mitigation argument; this can be seen in the figures. While short and precise arguments are desirable, it is quite likely that further elaboration would enable a better assessment of the claim. This point can, in fact, be extended to all of the arguments analysed from this FHA document. The depth of the arguments usually does not exceed three, which indicates very simple arguments. Further inspection reveals a range of hidden assumptions which are not explicitly stated. It would need to be investigated where the trade-off between the simplicity and the detail of an argument should be set.

A further point revealed by the analysis is that truly diverse arguments are not encountered. Usually, all the arguments are linked in some way, or are single support arguments. For example, in Fig. 4 the claim that the probability of occurrence of the hazard is not greater than remote is supported by a claim that the expected number of aircraft carrying ACAS V6.04 is small and will be reduced even further, and by a claim that any future problem will be dealt with quickly. Clearly, both arguments interdependently support the claim. Likewise, the first claim is subsequently broken down into its two constituent parts. One of these parts, the claim that the expected number of ACAS V6.04 aircraft is small, is supported by reference to three different studies.
Again it appears that all three references interdependently support the claim, though here we may assume a greater degree of independence than in the previous example. This illustrates that in the production and assessment of dependability claims a measure of the relevance of each supporting argument will be helpful.

A final concern of the study was the analysis of the use of barriers in the hazard mitigation arguments. The mitigation argument in Fig. 5 makes reference to four barriers in the severity claim branch. At least two of these are references to procedures (Lost Communication Procedure, CVSM Application Procedure), while a third can be interpreted as a procedure, a tool, or a combination of both (compulsory entry points for later calculation). Finally, the fourth barrier refers to the pilot (or crew). What is interesting in this FHA document is that the barriers just mentioned are quite typical of the barriers implied in the entire FHA: they are usually references to procedures, to certain programmes (such as monitoring programmes), or to the crew or ATCO. In the analysis of the FHA document conducted thus far there was little mention of any kind of technological barrier or technological support. In fact, and this refers back to what was said above, there appears to have been a tendency to simplify complex matters into generic statements such as "The crew will regain control", without explicit reference to how this is achieved and on what kind of support it relies. Whether this is a feasible approach will have to be assessed.
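Govier's classification and the observation about argument depth can be made concrete with a small sketch. This is illustrative Python only, not tooling used in the study: the `Goal` class and its `support` labels are our own invention, and the example claims merely paraphrase the Fig. 4 probability branch.

```python
# Sketch: a GSN-style goal structure as a tree, with Govier-style support
# labels and the depth measure discussed above. All names are illustrative.
from dataclasses import dataclass, field

@dataclass
class Goal:
    claim: str
    support: str = "single"          # "single", "linked", or "convergent"
    children: list = field(default_factory=list)

def depth(goal):
    """Depth of the argument tree; depth <= 3 suggests a very simple argument."""
    if not goal.children:
        return 1
    return 1 + max(depth(c) for c in goal.children)

# Toy version of the Fig. 4 probability branch: two linked sub-claims.
prob = Goal("P(hazard) <= remote", support="linked", children=[
    Goal("Few aircraft carry ACAS V6.04, and the number is falling"),
    Goal("Any future problem will be dealt with quickly"),
])
top = Goal("Risk from hazard is tolerable", support="linked",
           children=[Goal("Severity is at most X"), prob])

print(depth(top))   # 3: the typical (shallow) depth observed in the FHA
```

A convergent (fully diverse) argument would instead have children that each support the parent independently; no `support="convergent"` node appears here, mirroring the finding that truly diverse arguments were not encountered.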

4

Case study - Eight states free route airspace

Eurocontrol's European Air Traffic Management Programme requires a safety assessment to be performed for "all new systems and changes to existing systems" [5]. Therefore a safety assessment was commissioned for the eight-states (Belgium, Denmark, Finland, Germany, Luxembourg, The Netherlands, Norway and Sweden) free route airspace concept. The overriding aim of the concept was to obtain benefits in terms of safety, capacity, flexibility and flight efficiency by removing the constraints imposed by the fixed route structure and by optimising the use of more airspace [7, pg xiii]. The principal safety objective was to ensure that free route airspace operations are at least as safe as the current fixed route operations. A functional hazard assessment was completed to determine how safe the various functions of the system need to be in order to satisfy the safety policy requirements. This assessment investigated each function of the proposed system and identified ways in which it could fail (i.e. the hazards) [7, pg 10].
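The shape of such an assessment record can be sketched as follows. This is a hypothetical Python rendering (the field names are ours), and the two sample rows paraphrase entries 210 and 212 of the fragment in Table 2 of Appendix A; the real assessment is a 105-row table.

```python
# Sketch: one row of a functional hazard assessment, invented for
# illustration. Sample data paraphrases Table 2 of Appendix A.
from dataclasses import dataclass

@dataclass
class Assessment:
    function: str
    failure_condition: str       # the hazard
    consequences: str
    mitigations: list            # proposed barriers / safety requirements

rows = [
    Assessment("Conflict identification",
               "Controller fails to identify conflict",
               "Potential collision risk",
               ["MTCD", "Controller training", "Airspace design",
                "Procedure review"]),
    Assessment("Conflict identification",
               "Controller mistakenly identifies conflict when none existed",
               "Extra workload",
               ["MTCD", "Controller training"]),
]

# Barriers per hazard, the quantity examined in Section 4.3:
print([len(r.mitigations) for r in rows])   # [4, 2]
```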

4.1

Preventive vs. protective barriers

It is common for hazard mitigations to be considered in terms of independence and diversity. The belief that a hazard has been mitigated may be given a higher level of confidence if multiple diverse arguments are present. The nature of the associated barrier in the context of the initiating hazard event is also of relevance. Classifying preventive and protective barriers highlights this consideration. For example, if a hazard has only preventive barriers there is no fault tolerance in the system, as provided by protective barriers.

A sample from the functional hazard assessment can be seen in Table 2 in Appendix A. Examining the explicit barriers present in the airspace route provides insight into the identification of barriers as both a design tool and a possible analysis metric. The analysis is based on the mitigations associated with the new hazards introduced by the implementation of free route operations and ignores existing mitigations in the previous system. The functional hazard assessment contains 105 rows, of which 69 contained new hazards that required mitigation. Newly identified hazards are not mitigated by existing mitigating factors in the system. The output of the hazard assessment was a set of safety requirements for the proposed free route environment. In total 128 barriers can be identified in the safety requirements. For example, assessment 210 in Table 2 of Appendix A contains four existing mitigating factors and four proposed barriers described as safety requirements. Other implied barriers in this case study include human oriented barriers such as "controller training", environmental conditions, for example "airspace design", and system components, for example "MTCD (Medium Term Conflict Detection) system usage". The particular barrier configurations are representative of the direct arguments supplied for the deployed system. By examining their structural properties, dependability concerns such as single-point failures can be identified. In addition, the structural configuration of the barriers highlights human and technology dependencies in the system that may be of concern.

No protective barriers and 128 preventive barriers were identified in the free route airspace example. The majority consist of the enforcement or review of different operating procedures. Other barriers include controller and pilot training and monitoring system technology. Twenty-two different preventive barriers can be identified as unique barrier forms. All of the barrier protection here is based on preventive barriers, which has implications for the fault tolerance of the system. The following sections are indicative of the set of barrier properties and of their implications for safety.

4.2

Barrier frequency and type

Commonly there is not a one-to-one relation between hazards and barriers. One hazard may be protected against by several barriers (see Section 4.3) and one barrier may feature in the mitigation arguments of several hazards. A barrier mitigating a number of high consequence hazards will require greater reliability, as more of the system safety will be dependent on it. This is particularly the case if a single barrier is the only defence to a hazard (see Section 4.3). In addition there may be cost-benefit tradeoffs between barriers. Expensive barriers, in terms of physical cost, time to implement and/or ongoing maintenance, that provide protection against a single hazard may be less desirable than alternative barrier solutions that provide protection from multiple hazards. Such knowledge can provide justification for particular design decisions.

Table 1 shows the eight most common barriers in the airspace analysis. The two barriers that appear most common in the hazard analysis are technological systems and together contribute 39% of the barriers. In Table 1 technological systems represent 48% of the total barriers and the human oriented barriers represent 24%. Of the 22 unique barriers identified in the analysis, those in Table 1 represent 84% of all the barriers in this hazard analysis. In addition to the occurrence of particular barriers in this case study, the frequency of demand on barriers significantly modifies the predicted risk. Expectations of how often a barrier will be expected to be active, and not fail, determine how critical it is to the system it is protecting. However, the analysis material discussed in this chapter does not provide details of such expectations, so they are not discussed further here.
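Several of the percentages quoted in this section are simple tallies over the barrier counts. A sketch of the arithmetic, using only the frequencies reported in Table 1 and the counts given in Sections 4.2 and 4.3 (the underlying hazard-to-barrier mapping is not reproduced here):

```python
# Sketch: tallying barrier frequencies as reported in Table 1, and the
# single-point-failure share from Section 4.3. Counts come from the paper;
# the variable names are ours.
from collections import Counter

barrier_freq = Counter({
    "MONA (MONitoring Aid) system": 32, "Controller training": 18,
    "MTCD system": 18, "Free Route Airspace contingency procedures": 15,
    "Airspace design": 8, "Review procedures": 8,
    "Transfer procedure": 5, "Area Proximity Warning (APW) system": 4,
})
total_barriers = 128   # all barriers in the safety requirements

# The two most common barriers are both technological systems:
top_two = barrier_freq["MONA (MONitoring Aid) system"] + barrier_freq["MTCD system"]
print(round(100 * top_two / total_barriers))                      # 39 (%)

# Table 1's eight barriers as a share of all 128:
print(round(100 * sum(barrier_freq.values()) / total_barriers))   # 84 (%)

# Hazards by number of protecting barriers (Section 4.3); 28 hazards rely
# on a single barrier, i.e. 28 of the 128 barriers are potential
# single-point failures:
hazards_by_barrier_count = {1: 28, 2: 31, 3: 10, 4: 3}
print(round(100 * hazards_by_barrier_count[1] / total_barriers))  # 22 (%)
```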

4.3

Barriers per hazard

Accidents happen because barriers fail and hazards are present. Hollnagel [10] observes that accidents are frequently characterised in terms of the events and conditions that led to the final outcome or in terms of the barriers that have failed. As a consequence, redundancy is a common feature in the safety aspects of dependable systems. In particular, redundancy is used to prevent the failure of a single component causing the failure of a complete system - a so-called single-point failure [18, pg 132]. Identifying potential single-point failures is essential for determining problem areas in a system's reliability. Hazards with only single barriers, and in particular single preventive barriers, represent a significant threat to system safety. By examining the structure of the barriers, we reflect on the arguments that they represent. In addition, identifying multiple barriers does not necessarily imply greater prevention or tolerance properties. Barrier interdependence will compromise any diversity based arguments if combined dependability between barriers results in single-point failure situations. A common preventive barrier pair is the use of "controller training" and "procedure following", which are clearly interrelated.

In this analysis 28 hazards are protected against by single barriers, 31 hazards by double barriers, 10 hazards by triple barriers and 3 hazards by quadruple barriers. Therefore 22% of the barriers in this analysis suffer from potential single-point failures. This represents a considerable percentage of the barriers proposed in this assessment. In this analysis each barrier was considered independent; determining independence between barriers is outside the scope of this chapter.

5

Conclusions

The paper explores the role that structure in general, and depth and breadth style arguments in particular, play in strengthening an argument for the dependability of a system. The paper then takes a further step and notes that exploring the barriers that are used in mitigation enables a better understanding of the effectiveness of the argument. The aim is that explorations such as those described in the paper will be carried out by auditors as they explore the quality of qualitative dependability arguments.

References

[1] Adelard. The assurance and safety case environment - ASCE. Technical report, http://www.adelard.co.uk/software/asce/, 2005.

[2] Stephen Barker, Ian Kendall, and Anthony Darlison. Safety cases for software-intensive systems: an industrial experience report. In Peter Daniel, editor, 16th International Conference on Computer Safety, Reliability and Security (SAFECOMP 97), pages 332–342. Springer, 1997.

[3] David Bush and Anthony Finkelstein. Reuse of safety case claims - an initial investigation. In Proceedings of the London Communications Symposium. University College London, September 2001. http://www.ee.ucl.ac.uk/lcs [last access 6/06/03].

[4] B. S. Dhillon. Failure modes and effects analysis - bibliography. Microelectronics and Reliability, 32(5):719–731, 1992.


[5] European air traffic management programme safety policy, November 1995. SAF.ET1.ST01.1000-POL-01-00, Edition 1.0.

[6] Eurocontrol. EUR RVSM programme: Functional hazard assessment. Working Draft 1.0, European Organisation for the Safety of Air Navigation, February 2001.

[7] Eurocontrol. Safety assessment of the free route airspace concept: Feasibility phase. Working Draft 0.3, European Organisation for the Safety of Air Navigation, October 2001. 8-States Free Route Airspace Project.

[8] T. Govier. A Practical Study of Argument. Wadsworth, 1988.

[9] Lars Harms-Ringdahl. Investigation of barriers and safety functions related to accidents. In Proceedings of the European Safety and Reliability Conference ESREL 2003, Maastricht, The Netherlands, 2003.

[10] Erik Hollnagel. Accidents and barriers. In J-M Hoc, P Millot, E Hollnagel, and P. C. Cacciabue, editors, Proceedings of Lex Valenciennes, volume 28, pages 175–182. Presses Universitaires de Valenciennes, 1999.

[11] C. W. Johnson. Failure in Safety-Critical Systems: A Handbook of Accident and Incident Reporting. University of Glasgow Press: Glasgow, Scotland, October 2003. ISBN 0-85261-784-4.

[12] Tim P. Kelly. Arguing Safety - A Systematic Approach to Managing Safety Cases. PhD thesis, Department of Computer Science, The University of York, 1999.

[13] T.P. Kelly and J.A. McDermid. Safety case construction and reuse using patterns. In P. Daniel, editor, 16th International Conference on Computer Safety, Reliability and Security (SAFECOMP 1997), pages 55–69. Springer, 1997.

[14] Trevor Kletz. Hazop and Hazan: Identifying and Assessing Process Industrial Hazards. Institution of Chemical Engineers, third edition, 1992. ISBN 0-85295-285-6.

[15] S. Pocock, M.D. Harrison, P. Wright, and P.D. Johnson. THEA: A technique for human error assessment early in design. In M. Hirose, editor, Human-Computer Interaction INTERACT'01: IFIP TC.13 International Conference on Human Computer Interaction, pages 247–254. IOS Press, 2001.

[16] Shamus P. Smith and Michael D. Harrison. Improving hazard classification through the reuse of descriptive arguments. In Cristina Gacek, editor, Software Reuse: Methods, Techniques, and Tools, volume 2319 of Lecture Notes in Computer Science (LNCS), pages 255–268, Berlin Heidelberg New York, 2002. Springer.

[17] Shamus P. Smith, Michael D. Harrison, and Bastiaan A. Schupp. How explicit are the barriers to failure in safety arguments? In Computer Safety, Reliability and Security (SAFECOMP 2004), Lecture Notes in Computer Science (LNCS), Berlin, 2004. Springer.

[18] Neil Storey. Safety-Critical Computer Systems. Addison-Wesley, 1996.

[19] J. R. Taylor. Risk Analysis for Process Plant, Pipelines and Transport. E & FN SPON, London, 1994.

[20] Stephen E. Toulmin. The Uses of Argument. Cambridge University Press, Cambridge, 1958.

[21] R. Weaver, J. Fenn, and T. Kelly. A pragmatic approach to reasoning about the assurance of safety arguments. In Proceedings of the 8th Australian Workshop on Safety Critical Systems and Software, 2003.


A

Summaries and fragments

Barrier                                       Frequency
MONA (MONitoring Aid) system                     32
Controller training                              18
MTCD system                                      18
Free Route Airspace contingency procedures       15
Airspace design                                   8
Review procedures                                 8
Transfer procedure                                5
Area Proximity Warning (APW) system               4

Table 1: Eight most common barriers in the airspace analysis

ID 210. Task: Handling aircraft. Function: Conflict identification.
  Failure condition: Controller fails to identify conflict.
  Operational consequences: Potential collision risk.
  Existing mitigating factors: Controller training. Pilot awareness of other traffic. STCA (Short Term Conflict Alert system), TCAS (Traffic Alert Collision Avoidance System).
  Proposed Free Route safety requirement: MTCD (Medium Term Conflict Detection system). Controller training. Airspace design. Procedure review.

ID 211. Task: Handling aircraft. Function: Conflict identification.
  Failure condition: Controller unable to make timely identification of conflict.
  Operational consequences: Potential collision risk.
  Existing mitigating factors: Controller training. Pilot awareness of other traffic. STCA, TCAS.
  Proposed Free Route safety requirement: MTCD. Airspace design. Controller training. Transfer procedures.

ID 212. Task: Handling aircraft. Function: Conflict identification.
  Failure condition: Controller mistakenly identifies conflict when none existed.
  Operational consequences: Extra workload.
  Existing mitigating factors: Controller training. Traffic monitoring.
  Proposed Free Route safety requirement: MTCD. Controller training.

Table 2: Fragment of safety assessment for the free route airspace concept

Analysing user confusion in context aware mobile applications

K. Loer¹ and M. D. Harrison²

¹ Department of Computer Science, University of York, York, YO10 5DD, UK
² Informatics Research Institute, University of Newcastle upon Tyne, NE1 7RU, UK

Abstract

Mobility of ubiquitous systems offers the possibility of using the current context to infer information that might otherwise require user input. This can either make user interfaces more intuitive or cause subtle and confusing mode changes. We discuss the analysis of such systems that will allow the designer to predict potential pitfalls before the design is fielded. Whereas the current predominant approach to understanding mobile systems is to build and explore experimental prototypes, our exploration highlights the possibility that early models of an interactive system might be used to predict problems with embedding in context before costly mistakes have been made. Analysis based on model checking is used to contrast configuration and context issues in two interfaces to a process control system.

1 Background

Mobile interactive technologies bring new opportunities for flexible work and leisure. The fact that the mobile device is context aware means that user interaction can be more natural. The system (that is, the whole software, human and hardware infrastructure) can detect where the device and its user are, infer information about what the user is doing, recognise urgency, even be aware of the user's emotional state. As a result user actions may be interpreted appropriately. The benefits that context awareness brings can be obscured by difficulties: interaction may be confusing, surprising the user and causing failure to occur.

Context aware systems are still mainly at an experimental stage of development and there is considerable interest in how these systems are used. The cost of user confusion about how action is interpreted may be expensive in terms of poor take up and potential error in safety critical situations. Techniques are required that can predict these difficulties at design time. An important question therefore is what these techniques should be and whether the cost of using them is justified by the early understanding of design. The work underlying this paper uses formal modelling techniques and model checking. The point of using these techniques is not to suggest that an industrially scaleable technique should necessarily use them precisely as given in the paper; it is instead to illustrate the type of issues that approaches such as these can help understand. An important question to be asked of any technique is whether early analysis requires a level of effort that can be justified in terms of the potential costs of user confusions in business and safety critical systems.

The purpose here is to explore how analytic techniques might be used to:

• analyse differences between different interface configurations, in this case the difference between a central control room and a mobile handheld device.

• analyse contextual effects. A simple model of context based on location is developed to analyse user action and user process.


The structure of the paper is as follows. The next section gives a scenario to illustrate the kind of system that is being considered here. Section 3 discusses the analysis to be performed. Section 4 presents in brief the model of the two user interfaces to the system. Section 5 explores the analysis of the system based on the models. The paper concludes by discussing the relevance of this approach and how future techniques might emerge.

2 A Scenario

A control room contains full wall displays on three sides. Plant schematics are displayed to represent the plant's state and can be manipulated through the control room interface using physical devices (e.g., switches), command line or direct manipulation interaction techniques, through the PDA interface, or through the physical components of the plant itself (e.g., closing a valve). Trend data about the plant is also displayed and helps operators anticipate emerging situations. Workflow information indicating today's schedule for an individual operator is contained in the operator's window, also displayed on part of the wall.

A problem occurs in the plant requiring "hands-on" inspection and possible action from one or several operators. Operators (perhaps working as a team) take PDAs as they go to find out what has happened. General situation information and prompts about what to do next can be accessed from the PDA. The PDA can also be used to monitor and control a valve, pump or heater in situ (some of the monitoring characteristics of this device are similar to those described in [Nilsson et al., 2000]). A limited subset of information and controls for these components will be "stored" in the PDA to ease access to them in the future - analogous to putting them on the desktop. These desktop spaces are called buckets in [Nilsson et al., 2000]. The operator can view and control the current state of the components when in their immediate vicinity. Context is used in identifying the position of an operator, checking the validity of a given action, inferring an operator's intention, checking action against an operator's schedule, and assessing and indicating urgency.

For example, a leak in a pipe is indicated in the control room by a red flashing symbol over the relevant part of the schematic. Two operators walk out of the control room leaving it empty; one walks to the location of a heater downstream of the leak, the other walks to the valve upstream of the leak. The operator upstream attempts to close off the valve using the PDA but is warned not to, while the other operator is told by the PDA that the heater should be turned off quickly because the first operator is waiting. Both operators, after having carried out their actions, put heater and pump status and controls (respectively) in buckets in their PDAs and move to the location of the leak to deal with it. When they have fixed the leak together they each check and restore the controls that they had previously put in buckets to the state before the leak was identified and walk back to the control room.

This scenario indicates the variety of modes and contexts that can occur. Confusions can arise if there is more than one plant component in close proximity, if the operator forgets which component they have saved, or if one operator forgets that another operator is nearby. These problems can be exaggerated by poor design.

3 Analysing the interface

Given a design such as the one above, it is clear that configuration and context are important to the success of the system. What happens to the interface when the operator moves from the control room to the handheld device and begins to move around the plant? What changes occur between the control room and the hand held device? How is the hand held device affected by the context? An operator will have a number of goals to achieve using these interfaces, and the actions involved will be different in the two interfaces and, in the mobile case, dependent on context.

A typical approach to analysing these differences might be to perform a task analysis in different situations and produce task descriptions that can be used to explore the two interfaces and how the interfaces support the interactions. This might involve considering the information resources that would be required in the two cases [Wright et al., 2000]. Such an approach would have much in common with [Paternò and Santoro, 2003, Fields, 2001]. Indeed some of this analysis is performed in a fuller version of this paper [Loer and Harrison, 2004]. However there are difficulties with such an approach. Task descriptions assume that the means by which an operator will achieve a goal can be anticipated with reasonable accuracy. In practice, a result is that strategies or activities that the operator actually engages in may be overlooked.

A different approach is to take the models and check whether a goal can be reached at all. The role of a model checker is to find any path that can achieve a user goal. This approach also has difficulties, because the sequence of actions found may not make any sense in terms of the likely actions of an operator. To alleviate this, the analyst's role is to inspect possible traces and decide whether assumptions about use should be included that would enable more realistic sequences to be generated. The advantage of this approach is that analysis is not restricted to sequences that are imposed - the presumed tasks. The disadvantage is that in some circumstances there may be many paths that might require such exploration.

A model of context is required, as well as of the devices, that will enable an analysis of the effects of the user interface of the mobile device in this way. Since the problem here is that actions or sequences of actions (processes) may have different meanings depending on context, a clear definition of context is required. Persistently forgetting to restore information when the context has changed could be one effect of context, and can be considered as part of the analysis.
In the case study the environment is described simply in terms of physical positions in the environment and transitions between these positions. As the hand-held device makes transitions it is capable of interacting with or saving different plant components onto the device. A model of the plant is included in order to comprehend how the interfaces are used to monitor and control. Context confusions can be avoided through design

by changing the action structure (for example, using interlocks) so that these ambiguities are avoided, or by clearly marking the differences to users. Techniques are required that will enable the designer to recognise and consider situations where there are likely to be problems. The process is exploratory: different properties are attempted and modified as different traces are encountered as counter-examples or instances. Traces that are "interesting" are scrutinised in more detail to investigate the effectiveness of the design and the possibility of confusion - discovering an interesting trace does not of itself mean that the design is flawed or prone to human error. Implications of different configurations are explored by considering simple assumptions about the user. In what follows we describe an experiment in which questions are articulated in CTL and checked by the SMV model checker [Clarke et al., 1999].
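The goal-reachability idea can be illustrated with an explicit-state search over a toy model. The location/pump model below is invented for illustration only; the paper's actual analysis uses CTL properties checked with SMV, not this Python sketch, but the search for *any* action sequence reaching a goal is the same in spirit.

```python
# Sketch: finding any action sequence that reaches a user goal, as a model
# checker would. The tiny context model (two locations, one pump, with the
# pump control only available in its vicinity) is invented for illustration.
from collections import deque

def transitions(state):
    loc, pump_on = state
    moves = {"room": ["pump"], "pump": ["room"]}
    for nxt in moves[loc]:
        yield (f"walk_to_{nxt}", (nxt, pump_on))
    if loc == "pump":                     # control only available in context
        yield ("toggle_pump", (loc, not pump_on))

def find_trace(start, goal):
    """Breadth-first search for a shortest action sequence satisfying `goal`."""
    queue, seen = deque([(start, [])]), {start}
    while queue:
        state, trace = queue.popleft()
        if goal(state):
            return trace
        for action, nxt in transitions(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, trace + [action]))
    return None                           # goal unreachable

trace = find_trace(("room", False), lambda s: s[1])   # goal: pump is on
print(trace)   # ['walk_to_pump', 'toggle_pump']
```

As in the paper's process, the analyst would then inspect such traces, decide whether they are realistic, and add assumptions about use before re-running the check.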

4

Modelling the user interface

The characterisation of the device and of the control room are both much simplified for the purposes of exposition. The icons on the hand-held device are the only means available to the user to infer the current system state and the available operations. Since the visibility of icons is important to the operation of the plant and the usability of the hand-held device, the basis for the analysis is (i) that all available operations are visible, and (ii) that all visible operations are executable. The analysis uses Statecharts: an example of how an interface can be developed using Statecharts is given in [Horrocks, 1999]. The Statecharts in the current scenario are structured into different components as suggested by [Degani, 1996] to make interaction with the device and the effect of the environment clearer. The interactive system that controls the process is designed: (1) to inform the operator about progress; (2) to allow the operator to intervene appropriately to control the process; (3) to alert the operator to alarming conditions in the plant and (4) to enable recovery from these conditions. A model is required to explore usability issues and design alternatives in the light of these goals of the underlying process. The

Page 163

The control room, with its central panel, aims to provide the plant operator with a comprehensive + + overview of the status of all devices in the plant. VOLUME ON/OFF VOLUME ON/OFF 1.0 2.5 Availability and visibility of action will be the primary concern here. Other aspects of the problem Pump 3 Pump 4 Direction Direction can be dealt with by using complementary models of ON/OFF ON/OFF the interface, for example alarms structure and preBWD FWD BWD FWD sentation, but analysis is restricted for present purPump 5 poses. The specification describes the behaviour of ON/OFF the displays and the associated buttons for pump 1 (and equivalently pump 2). The effects of actions are described in terms of the signals that are used to synchronise with the pump description and the states in Figure 1: Control Screen layout. which the buttons are illuminated. The control panel is implemented by a mousecontrolled screen (see Figure 1). Screen icons act as central control mechanism provides all information in both displays and controls at the same time. Hence one display (Section 4.1), while the personal applifrom Figure 2 we can see that PUMP1USERINTERFACE ance displays partial information (Section 4.2). supports four simple on-off state transitions defining the effect of pressing the relevant parts of the dis4.1 Representing and modelling the play. The state indicates when icons are illuminated but also shows that the actions trigger corresponding central panel actions in the underlying process. The Statechart This paper deliberately glosses over the model of the here builds a bridge between actions that relate to process. The process involves tanks and pumps that the behaviour of the process underneath and actions feed material between tanks. The tanks can be used that correspond to using the mouse to point and click for more than one process and, in order to change at the relevant icons. 
A detailed account of what the processes, a tank must be evacuated before material specification means is not presented here. An indicacan be pumped into it. In order to achieve this some tion of what it would look like is all that is intended at of the pumps are bi-directional. In fact the process is this stage – an indication of the scale of the modelling expressed as a simple discrete model in which the sig- problem using this style of specification. Many other nificant features of the environment can be explored, approaches could have been used: Patern` o used LOfor more details, see [Loer and Harrison, 2004] or TOS [Patern` o and Santoro, 2003], Campos and Har[Campos and Harrison, 2001]. Hence the state of the rison used MAL [Campos and Harrison, 2001]. Notatank is simply described as one element of the set tions such as Promela that are supported directly by {f ull, empty, holding} — there is no notion of quan- model checkers are also relatively straightforward to tity or volume in the model. This is adequate to cap- use [Holzmann, 2003]. It should be noted that pumps ture the key features of the process from the point of 1 and 2 and pumps 3 and 4 are identical in terms of view of interaction with the system. their behaviour and hence the specifications are not The control panel contained in the control room repeated. can be seen in Figure 1. All the pumps in the plant are visible and can be adjusted directly using a 4.2 Representing context and the mouse. As can be seen from the display all the pumps hand-held control device can be switched on and off, some pumps (3 and 4) can be reversed and the volume of flow can also be The hand-held device uses individual controls that modified in the case of pumps 1 and 2. are identical to those of the central control panel Pump 1


Figure 2: Initial specification of control screen behaviour.

but only a limited amount of space is available for them. As a controller walks past a pump it is possible to “save” the controls onto the display. Thereafter, while the controls continue to be visible on the display, it is possible to control the pumps from anywhere in the system.

The hand-held control device (Figure 3) knows its position within the spatial organisation of the plant. Hence the environment model used to describe the system involving this device is extended to take account of context. A simple discrete model describes how an operator can move between device positions in the plant, modelled as transitions between position states, as shown in Figure 4. By pointing the laser pointer at a plant component and pressing the component selector button, the status information and soft controls for that component are transferred into the currently selected bucket. Components can be removed from a bucket by pressing the delete button. With the bucket selector button the user can cycle through buckets. The intended use of the device has been altered from the published description (monitoring and annotating) to monitoring and manipulation.

The specification of the hand-held device describes both the physical buttons that are accessible continuously and other control elements, like pump control icons, that are available temporarily and depend on the position of the device. When the operator approaches a pump, its controls are automatically displayed on the screen (it does not require the laser pointer). The component may be “transferred” into a bucket for future remote access by using the component selector button. Controls for plant devices in locations other than the current one can be accessed remotely if they have been previously stored in a bucket. When a plant component is available in a bucket and the bucket is selected, the hand-held device can transmit commands to the processing plant, using the pump control icons.

Figure 5 shows an extract of the specification. Here the user can choose between three buckets and each bucket can store controls for up to two components. In the BUCKETS state the current contents of each bucket x are encoded by variables “BxCONTENT”. The environment in this case is a composition of the tank content model and the device position model in Figure 4. The model presumes that the appliance always knows its location. This is of course a simplification. Alternative models would allow the designer to explore interaction issues when there is a dissonance between the states of the device and its location. A richer model, in which variables are associated with states and actions may depend on values of the state that have actually been updated, may lead to asking questions of the models such as whether “the action has a false belief about the state”. These issues are important but are not considered in this paper.
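The two discrete models just described, tank contents drawn from a small finite set and operator movement as transitions between position states, can be illustrated in executable form. The following Python fragment is a sketch only: the position topology, the function names and the composition shown are invented here for illustration and are not taken from the paper's figures or notation.

```python
# Hypothetical sketch of the kind of discrete environment model the paper
# composes: tank contents from a small finite set, plus a graph of device
# positions. The topology below is illustrative, not read off Figure 4.

TANK_STATES = {"full", "empty", "holding"}

POSITION_MOVES = {
    "CTRLROOM": ["POS1"],
    "POS1": ["CTRLROOM", "POS2", "POS6"],
    "POS2": ["POS1", "POS3"],
    "POS3": ["POS2", "POS4", "POS5"],
    "POS4": ["POS3"],
    "POS5": ["POS3", "POS6"],
    "POS6": ["POS5", "POS1"],
}

def environment_states():
    """The composed environment: every (tank state, position) pair."""
    return [(t, p) for t in sorted(TANK_STATES) for p in POSITION_MOVES]

def moves(state):
    """Transitions of the composed model when the operator walks; the tank
    state is changed only by pump actions, which are elided here."""
    tank, pos = state
    return [(tank, nxt) for nxt in POSITION_MOVES[pos]]

print(len(environment_states()))  # 3 tank states x 7 positions = 21
```

Composing component models as a product in this way is what makes the state space grow, which is one reason the paper keeps each component model deliberately small.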


5 Analysis


Model-checking is a technique for analysing whether a system model satisfies a requirement. These requirements may be concerned with a number of issues including safety and usability. The model checker traverses every reachable system state to check the validity of the given property. If the property “holds”, a True answer is obtained. Otherwise, the property is False, and the tool attempts to create a sequence of states that lead from an initial state to the violating state. These “traces” are a valuable output because they help in understanding why a specification is violated. There are many detailed expositions of approaches to model checking, see for example [Clarke et al., 1999, Huth and Ryan, 2000, Bérard et al., 2001, Holzmann, 2003], and a number of treatments of interactive systems from a model checking perspective, see for example [Paternò and Santoro, 2003, Fields, 2001, Campos and Harrison, 2001, Rushby, 2002].
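The traversal a model checker performs can be illustrated with a toy explicit-state search. The Python sketch below is not the tool used in the paper; the model, states and actions (echoing names like openPmp5) are invented. It finds a shortest trace to a goal state, which is the kind of witness a checker reports: a counter-example to "the goal is never reachable" is precisely a trace that achieves the goal, mirroring the use of the negated property in the paper.

```python
from collections import deque

def find_trace(initial, successors, is_goal):
    """Breadth-first reachability: return a shortest action trace from the
    initial state to a goal state, or None if the goal is unreachable."""
    frontier = deque([initial])
    parent = {initial: None}  # state -> (predecessor, action taken)
    while frontier:
        state = frontier.popleft()
        if is_goal(state):
            trace = []
            while parent[state] is not None:
                prev, action = parent[state]
                trace.append(action)
                state = prev
            return list(reversed(trace))
        for action, nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = (state, action)
                frontier.append(nxt)
    return None  # the property "goal unreachable" holds

# Toy model: the goal needs pump 5 on and tank 1 holding substance C.
def successors(s):
    pump5, tank1 = s
    out = []
    if pump5 == "off":
        out.append(("openPmp5", ("on", tank1)))
    if tank1 == "empty":
        out.append(("fillTank1", (pump5, "holds_C")))
    return out

print(find_trace(("off", "empty"), successors,
                 lambda s: s == ("on", "holds_C")))
# -> ['openPmp5', 'fillTank1']
```

Real checkers use symbolic or on-the-fly techniques rather than an explicit dictionary of visited states, but the trace they return plays exactly this role.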

5.1 Comparing the control room and the hand-held device

In order to explore the effect of the difference between the control room and the hand-held device, a reachability property may be formulated for a user level “goal” of the system. The goal chosen here for illustration is “Produce substance C”, which is a primary purpose of the system.


Figure 4: Model of device positions.

The idea is that differences are explored between the traces produced by two models: one containing the control room interface, the other containing the mobile device. If a property does not hold then the checker finds one counter-example. Alternatively, the negated property may be used to find a trace that satisfies the property. Usually the model checker only produces a single trace, giving no guarantee that it is an interesting one from the point of view of understanding design implications. Additional traces can be created by adding assumptions about the behaviour. This contrasts with an approach using explicit tasks, where the model checker is used to explore a particular way in which the goal can be achieved (the task); here the idea is to use as few behavioural assumptions as possible. We are interested in any behaviours. The sequences in Figure 6 are visualisations of the traces obtained by checking, for the different models, if and how the plant can deliver substance C to the outside world. This is specified as the property:

SAN1: F (PUMP5CTRLM.state=PMP5ON) & (TANK1.state = HOLDS_C)


Figure 3: A hand-held control device (modified version of the “Pucketizer” device in [Nilsson et al., 2000]).

In this case the negated property “not SAN1” is used because instances that satisfy the property are required. The two models involving the different interfaces are checked with the same property. The first sequence in Figure 6 was generated from the model containing the control room interface. The second sequence was generated by checking the property against the hand-held device


model. While the first two traces assume a serial use of pumps, the third and fourth sequences show the same task for a concurrent use of pumps. Comparison of these sequences yields information about the additional steps that have to be performed to achieve the same goal.

5.2 Analysing context effects

As a result of making a comparison between the traces for the control room and for the hand-held device, the analyst might come to the conclusion that the repetitive process of saving controls may cause slips or mistakes, a direct effect of location on the actions of the hand-held device. To explore the effect of this, a further assumption may be introduced into the property to be analysed: that an operator might forget certain steps. This assertion “alwaysForget” is described as follows:

assert alwaysForget: G !(savePmp1ctrls| [...] |savePmp5ctrls);

The original property SAN1 is checked under the assumption that this assertion holds:

assume alwaysForget;
using alwaysForget prove SAN1;

Checking this property leads to the sixth sequence in Figure 6. Exploring this sequence highlights the likelihood of context confusions, and therefore the need to redesign the device. As can be seen, an identical subsequence of actions at positions 2 and 6 has different effects. An interlock mechanism is therefore introduced with the aim of reducing the likelihood of human error arising from forgetfulness. One candidate design solution is to use an interlock that warns the user and asks for acknowledgement that the currently displayed control elements are about to disappear. The warning is issued whenever a device position is left and the device’s control elements are neither on screen nor stored in a bucket. It is straightforward to adjust the model of the interface to the hand-held device to capture this idea, and this specification is

given in the fuller paper [Loer and Harrison, 2004]. The design however does not prevent the user from acknowledging and then doing nothing about the problem. Checking the same properties, including the assumptions about the forgetful user, produces sequences 7 and 8 in Figure 6.

In this example the central control panel characterises the key actions for achieving the goal, since the additional actions introduced by the hand-held device are concerned exclusively with the limitations that the new platform introduces: dealing with physical location, and uploading and storing controls of the visited devices as appropriate. The analysis highlights these additional steps to allow the analyst to subject the sequence to human factors analysis and to judge whether such additional steps are likely to be problematic. The reasons why a given sequence of actions might be problematic may not be evident from the trace, but it provides an important representation that allows a human factors or domain analyst to consider these issues. For example, action goPOS6 may involve a lengthy walk through the plant, while action savePmp4ctrls may be performed instantaneously, and the performance of action getPmp3ctrls might depend on additional contextual factors like the network quality. The current approach leaves the judgement of the severity of such differences to the designer, the human factors expert or the domain expert. It makes it possible for these experts to draw important considerations to the designer’s attention.
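In an explicit-state setting, the effect of an assumption such as alwaysForget can be approximated by deleting the excluded actions from the transition relation before searching. The sketch below is a minimal illustration with invented states and action names; it is not the mechanism of the checker used in the paper.

```python
# Impose a behavioural assumption by restricting the transition relation:
# any successor reached by a forbidden action is simply removed.

def restrict(successors, forbidden):
    """Wrap a successor function so that forbidden actions never occur."""
    def restricted(state):
        return [(a, s) for a, s in successors(state) if a not in forbidden]
    return restricted

# Toy model: at a pump, the user can save its controls or walk on.
def successors(state):
    if state == "atPump":
        return [("savePmp1ctrls", "savedAndLeft"), ("walkOn", "leftUnsaved")]
    return []

forgetful = restrict(successors, {"savePmp1ctrls"})
print(successors("atPump"))  # both behaviours possible
print(forgetful("atPump"))   # -> [('walkOn', 'leftUnsaved')]
```

Any reachability check run over the restricted relation then yields only traces consistent with the forgetful-user assumption.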

6 Conclusions

The paper illustrates how configuration and context confusions might be analysed in the early stages of design, before a system is fielded. The particular method described involves comparing and inspecting sets of sequences of actions that reach a specified goal state. No assumptions are made about user behaviour initially; constraints based on domain and user concerns are then used to explore subsets of the traces that can achieve the goals. Experts assist the process of adding the appropriate constraints to the properties to be checked. In order to do this a human factors


expert or a domain expert may be provided with sufficiently rich information that it is possible to explore narratives surrounding the traces generated. Hence traces can form the basis for scenarios that aid exploration of potential problems in the design of mobile devices, e.g. the additional work that would be involved for the system operator if subtasks are inadvertently omitted in achieving the goal. The tool can also be used to find recovery strategies if an operator forgets to store control elements.

Further work is of course needed to devise strategies for appropriate guidance with respect to (i) finding an efficient sequence of analysis steps and (ii) devising a strategy for the introduction of appropriate assumptions. Guidance is also required to help limit the size of the models to be analysed. Suitable techniques and heuristics for semantic abstraction of system models need to be devised to avoid the state explosion problem. However, the size of models that can be dealt with is encouraging and this situation can be improved through appropriate abstraction and consistency checking. As has been said, the case described in the paper involves an oversimplistic model of context for the purpose of presentation. The following questions require exploration:

• What are the key features of the design that are relevant to these context confusions?

• What are appropriate models of context — what about the information that might be inferred at these different positions? What about knowledge about history or urgency? What about the proximity, knowledge and behaviour of other mobile agents in the environment? What about issues such as the staleness of data? A number of papers [Dey et al., 2001, Grudin, 2001] classify and critique notions of context.

• How should the models be analysed for potential user confusions — can techniques such as model checkers still be used, or do the models that are generated have too many states?

• If more than one model is appropriate, at different stages of the design or at the same time, how are these different stages and complementary models used together?

More elaborate analysis would involve models of context in which other users and configurations (for example PDAs) may enter or leave dynamically. In order to reason about contexts such as these, knowledge logics using operators such as the K operator could be used to express what an agent knows in a given context [Fagin et al., 2004]. Hence, given the scenario example, a question may be asked such as whether it is common knowledge that the repair has been completed, in order that all agents can restore the state of the components they were dealing with to their original states. The model and logic may also be used to ask whether it is possible that an agent can think that the state of their component can be restored before it is time to do it. Hence the logic would be used to express properties that capture potential user confusions in this richer notion of context. Once appropriate models and formulations of user context confusion have been developed, the next stage is to analyse the pragmatics of modelling using these techniques and analysing realistic properties. It is envisaged that similar strategies will be developed for exploring other aspects of context confusion; for example, another line of analysis might be to explore the knowledge that users have of whether the information representing the state of a bucket in the PDA is up-to-date with respect to the actual state of that component – that is, issues of temporal validity of data might be considered.

References

[Bérard et al., 2001] Bérard, B., Bidoit, M., Finkel, A., Laroussinie, F., Petit, A., Petrucci, L., and Schnoebelen, P. (2001). Systems and Software Verification. Model-Checking Techniques and Tools. Springer.

[Campos and Harrison, 2001] Campos, J. and Harrison, M. (2001). Model checking interactor specifications. Automated Software Engineering, 8:275–310.

[Clarke et al., 1999] Clarke, E., Grumberg, O., and Peled, D. (1999). Model Checking. MIT Press.

[Degani, 1996] Degani, A. (1996). Modeling Human-Machine Systems: On Modes, Error, and Patterns of Interaction. PhD thesis, Georgia Institute of Technology.

[Dey et al., 2001] Dey, A., Abowd, G., and Salber, D. (2001). A conceptual framework and a toolkit for supporting the rapid prototyping of context-aware applications. Human-Computer Interaction, 16:97–166.

[Fagin et al., 2004] Fagin, R., Halpern, J., Moses, Y., and Vardi, M. (2004). Reasoning about Knowledge. MIT Press.

[Fields, 2001] Fields, R. (2001). Analysis of erroneous actions in the design of critical systems. PhD thesis, Department of Computer Science, University of York, Heslington, York, YO10 5DD.

[Grudin, 2001] Grudin, J. (2001). Desituating action: digital representation of context. Human-Computer Interaction, 16:257–268.

[Holzmann, 2003] Holzmann, G. (2003). The SPIN Model Checker: Primer and Reference Manual. Addison Wesley.

[Horrocks, 1999] Horrocks, I. (1999). Constructing the User Interface with Statecharts. Addison Wesley.

[Huth and Ryan, 2000] Huth, M. R. A. and Ryan, M. D. (2000). Modelling and reasoning about systems. Cambridge University Press.

[Loer and Harrison, 2004] Loer, K. and Harrison, M. (2004). Analysing and modelling context in mobile systems to support design. http://homepages.cs.ncl.ac.uk/michael.harrison/publications.htm.

[Nilsson et al., 2000] Nilsson, J., Sokoler, T., Binder, T., and Wetcke, N. (2000). Beyond the control room: mobile devices for spatially distributed interaction on industrial process plants. In Thomas, P. and Gellersen, H.-W., editors, Handheld and Ubiquitous Computing, HUC’2000, number 1927 in Lecture Notes in Computer Science, pages 30–45. Springer.

[Paternò and Santoro, 2003] Paternò, F. and Santoro, C. (2003). Support for reasoning about interactive systems through human-computer interaction designers’ representations. The Computer Journal, 46(4):340–357.

[Rushby, 2002] Rushby, J. (2002). Using model checking to help discover mode confusions and other automation surprises. Reliability Engineering and System Safety, 75(2):167–177.

[Wright et al., 2000] Wright, P., Fields, R., and Harrison, M. (2000). Analyzing human-computer interaction as distributed cognition: the resources model. Human-Computer Interaction, 15(1):1–42.


Figure 5: Ofan model for the hand-held device: The User Interface and Control Mechanism modules.


Figure 6: Traces generated by runs of the model checker


Long Running Composite Services

Jamie Hillman
Computing Department, Lancaster University, Lancaster, UK
Email: j.hillman@comp.lancs.ac.uk

Abstract

Non-trivial services often cross organisational boundaries and so must be constructed as composite services, with each organisation providing a service to carry out part of the task. These individual services can be time consuming, as complex computation is undertaken or human input is required. The combination of service composition and long running transactions puts great pressure on the current model of service construction, which is essentially an RPC-based model. A growing number of developers and standards organisations are moving to an asynchronous and document oriented model for service construction, which shows great promise in particular for constructing composite long running services. This paper describes this architecture and sets out some of the challenges remaining.

1 Introduction

Composing services inevitably leads to dependencies between the services, where the outcome or at least the completion of one service is dependent on another. These dependencies can take many forms, from simple chaining of requests to complex nesting. A common feature of these dependencies, where web services are composed in the current RPC-based approach, is that components have to block whilst nested or chained transactions complete. Resources are often held whilst this blocking is taking place, preventing their use for other requests or by other services on the same host. This is not a problem if transactions are not long running, and this is the assumption on which this method of constructing services is built. Where transactions take minutes, hours or even days, problems arise. Scalability is reduced greatly when services are blocking for long periods of time, especially if they are holding resources during that period. The RPC-based approach also treats a long running transaction as a single operation, and so no information, such as the progress of the transaction, can be obtained until the transaction has completed. More generally, synchronous communication is not well suited to what are naturally asynchronous interactions. Much of the complexity of current standards for web services could be attributed to this mismatch in communication paradigms.

The next section describes an asynchronous approach to web services and its merits. It is followed in Section 3 by a brief description of the complementary concept of document-oriented services, before conclusions and a proposal for future work are described in Section 4.

2 Asynchronous Web Services

Asynchronous web services communicate through message exchanges: a service sends a message and continues executing. If a reply is generated in response to a message then the reply is sent as a separate message and handled by a separate thread of control. This approach is more flexible than the standard RPC model because request-reply is not the only mechanism that can be used. One-way notification messages can be sent out, for example to update a service on the progress of an action it initiated. Other invocation patterns include consumption and solicit-response. Long running services are particularly suited to this



model, as traditionally a service would block when calling a service it depends upon, transferring control to that service until a reply is generated. In the asynchronous model the service would be invoked by simply dispatching a message representing a request for some action to be carried out. This allows the requesting service to continue performing other functions until a reply is received, if a reply is needed at all. The difference between these two approaches is similar to the difference between telephoning someone and sending them an email.

Long running transactions in an RPC-based model are often black-box, in that you do not know what has happened until the procedure call returns. One benefit of an asynchronous model is that it allows progress messages to be dispatched to the client of a service, informing the client of progress, requesting further information or reporting errors. Another benefit of this approach is that services become more efficient, as they aren’t blocking for long periods of time: they generate requests without blocking and handle responses as they arrive, allowing them to interleave many different transactions. Also, if services are built using message queues to handle incoming requests (as described by the worker pattern) then scalability increases, as a service need not have finished processing the previous message in order to receive a new request.

Finally, systems built around asynchronous communication are often also more loosely coupled as they tend to be more event oriented, avoiding complex protocols of procedure calls. This is particularly true when communication is document-oriented too, as will be described later. Loosely coupled systems are generally more scalable and flexible.
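The queue-based worker pattern mentioned above can be sketched in a few lines. This is an illustrative toy rather than a web-service implementation: the message shapes and names are invented here, and the long running work itself is elided.

```python
import queue
import threading

requests = queue.Queue()
replies = queue.Queue()

def worker():
    # Pull requests off the queue one at a time; the service can keep
    # accepting new messages while earlier ones are still being processed.
    while True:
        msg = requests.get()
        if msg is None:          # shutdown sentinel
            break
        # ... long running work would happen here ...
        replies.put({"request_id": msg["id"], "status": "done"})

t = threading.Thread(target=worker)
t.start()

# The client dispatches messages and carries on; replies arrive separately.
for i in range(3):
    requests.put({"id": i, "body": "do something"})
requests.put(None)
t.join()

results = []
while not replies.empty():
    results.append(replies.get())
print(results)  # three 'done' replies, one per request
```

Because requests are buffered, the sender never blocks on the worker, and the worker can be replicated to process the same queue concurrently.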

2.1 Challenges & Related Standards

2.1.1 Correlation

Due to requests and replies being sent as separate messages in asynchronous communication, there is a need for a means of correlating which replies correspond to which requests when request-reply semantics are required; similarly, for solicit-response a response must be correlated with a solicitation. This is most often achieved through unique identifiers for requests which the corresponding reply references. Various standards listed in the related standards section below provide such functionality.

2.1.2 Addressing

A service must know where it can direct messages related to another message - for example replies, errors or progress updates. This requires some form of addressing scheme. Many suitable addressing schemes exist, some of which are referenced in the related standards section below.

2.1.3 Programming Style

Though asynchronous communication is a very old concept, most programmers are used to synchronous, request-response-based programming. Tool support and bodies of knowledge such as design patterns and architectural styles are not as well developed for asynchronous programming as for synchronous, for the simple reason that synchronous programming is more popular. As asynchronous web services are developed further, the body of knowledge surrounding them should develop.

2.1.4 Reliable Transport

Often actions in service-oriented architectures are not idempotent and so messages must be guaranteed to arrive once and once only. This is a classic computing problem and algorithms exist to solve it. References to specific protocols are listed in the next section.

2.1.5 Related Standards

WS-TXM [5] provides an asynchronous model for long running transactions and covers a lot of the issues described above. WS-COORDINATION [3] provides a means for co-ordinating asynchronous message exchanges to form a protocol or to carry out some composite function. WS-ADDRESSING [4] provides an addressing scheme identifying endpoints as required in asynchronous communications. AWSP [8] and ASAP [6] both extend SOAP [9] to cater for asynchronous communications. Finally, HTTPR [2] and WSRM [7] cater for reliable message delivery.
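The correlation and reliable-transport challenges described above can be sketched together. The names and message shapes below are invented for illustration; a real system would use one of the standards just listed rather than this ad hoc scheme.

```python
import uuid

# Correlate replies with requests via unique identifiers, and discard
# duplicate deliveries so non-idempotent actions run at most once.

pending = {}  # correlation id -> description of the outstanding request
seen = set()  # correlation ids already processed (at-most-once handling)

def send_request(body):
    corr_id = str(uuid.uuid4())
    pending[corr_id] = body
    return {"id": corr_id, "body": body}   # the message put "on the wire"

def handle_reply(reply):
    """Match a reply to its request; ignore duplicates and strays."""
    corr_id = reply["correlates"]
    if corr_id in seen:
        return None                         # duplicate delivery, discarded
    if corr_id not in pending:
        return None                         # unknown correlation id
    seen.add(corr_id)
    return pending.pop(corr_id), reply["result"]

req = send_request("transfer funds")        # not an idempotent action!
reply = {"correlates": req["id"], "result": "ok"}
print(handle_reply(reply))   # ('transfer funds', 'ok')
print(handle_reply(reply))   # None: the redelivered duplicate is dropped
```

The same identifier serves both purposes: it ties a reply back to its request, and it lets the receiver recognise a message it has already acted upon.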


3

3

DOCUMENT ORIENTED SERVICES

Document oriented Services

3

3.1

Challenges & Related Standards

Document orientation is a complementary approach to asynchronous communication. Asynchronous services simply exchange messages, providing addresses of endpoints where replies, errors and progress messages can be sent. These messages can still be procedural, however, for example saying "carry out operation Z with arguments X and Y". This naturally leads to a fixed mapping from messages to procedures, where a particular message causes a particular procedure/method to be invoked. Mapping messages to procedures produces messaging protocols that are tied to implementations and tightly couples services, reducing the decoupling enabled by asynchronous communication.

The document oriented approach separates the data model from the code that operates upon it. Thinking in document oriented terms means thinking about how best to model the data you are transporting without thinking about the code that operates upon it. This leads to more loosely coupled and flexible service architectures, as data is kept independent of use. Instead of saying "carry out operation Z with arguments X and Y" you would create a document representing X and Y and pass it to a Z service, which would add to or transform the document.

Beyond this significant advantage of more loosely coupled and flexible services, a document oriented model combined with a powerful addressing scheme can improve the efficiency and scalability of long running services by allowing persistent documents to be passed by reference. Long running transactions often operate on large data sets which, when passed by value, can cause great waste of resources. A document oriented model could allow a document to reference another document by stating the address at which it can be retrieved.

Many such extensions to a simple document model can be envisaged, improving the scalability and flexibility of service oriented architectures. Like asynchronous messaging, document orientation is a very simple concept that is more a way of thinking about services than anything else, but it seems promising, particularly in the context of long running composite services.

Document orientation is more of an approach than a technology, but there are related technical challenges and existing standards that aim to tackle them. This section briefly details the challenges.

3.1.1 Data Model

In a model where everything is a document there must be some form of data model. The data model specifies how data in a document is encoded and a scheme which constrains this data. The encoding in a data model could be as simple as the Unicode character set, for example. As argued earlier, documents should be implementation independent, and so the data model should also be independent of the implementation. XML is a popular choice for expressing a data model for these reasons, and a document oriented service model could be constructed around XML.

3.1.2 Addressing

Documents generally should avoid referencing specific services, as this reduces their independence of implementation, but it may often be desirable to reference documents, either to avoid passing around large documents or to refer to commonly used documents. An addressing scheme as required by asynchronous communication may be suitable for referencing documents as well as endpoints, though a protocol for accessing documents and, if possible, modifying them would be necessary.

3.1.3 Existing Standards & Related Work

The REST model [1] attempts to capture the architectural style of the world wide web, and as such describes an asynchronous document oriented model for building services. REST advocates HTTP as a lowest common denominator protocol for transferring documents, and its advocates argue that the application protocol should be as simple as HTTP in order to allow maximum flexibility.

Page 174

4 Conclusions & Proposal

Asynchronous communication and document orientation are simply means of communicating and of structuring communications. As such they are deceptively simple concepts. Building service architectures using an asynchronous, document oriented approach, however, can greatly increase efficiency, scalability and reusability. This is not the case in every situation, and in some cases the current RPC-based approach is clearly stronger, but for services that provide long-running functionality the benefits offered by an asynchronous document oriented approach are clearly attractive. Services can spend more time carrying out other tasks instead of blocking while waiting for responses. Errors and progress information can be transmitted while a transaction is carried out. Services are more loosely coupled, and so more reusable and flexible, and finally the messages and the data they contain are not tied to the implementation, leading to further flexibility and reusability.

There are still challenges to be addressed when building systems based on this approach: some are issues of standardisation, others of familiarity with this way of thinking. For example, designing interfaces independently of data models, and data models independently of code, is difficult for a programmer used to constructing closed-environment applications where there is no need for flexibility and decoupling. Existing mechanisms for composition and fault tolerance are also tested by the approach described in this paper. The choreography model of composition, where there is no central point of control for a composite service, fits more naturally than the more popular and better developed model of orchestration, where a single controlling party directs the flow of events. If the choreography model is used, however, fault tolerance becomes difficult, as there is no single point of control where fault tolerance mechanisms can be deployed.

As many of the concepts described here are fairly abstract, and the applicability of existing models for fault tolerance and composition is hard to predict, we propose that a significant case study be carried out. A real-world example should be developed as a composite service, built upon the principles described in this paper. Only with such examples can the merits of this approach, and its effect on existing web services best practice, be observed.

References

[1] R. Fielding. Representational state transfer. http://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm

[2] IBM. Reliable hypertext transfer protocol. http://www-106.ibm.com/developerworks/webservices/library/wshttprspec/

[3] IBM. Web services coordination. ftp://www6.software.ibm.com/software/developer/library/WSCoordination.pdf

[4] IBM et al. Web services addressing. http://www-106.ibm.com/developerworks/library/specification/wsadd/

[5] Sun Microsystems. Web services transaction management. http://developers.sun.com/techtopics/webservices/wscaf/wstxm.p

[6] OASIS. Asynchronous service access protocol. http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=asap

[7] OASIS. Web services reliable messaging. http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=wsrm

[8] Trans-enterprise. Asynchronous web service protocol. http://xml.coverpages.org/AWSPDraft20020405.pdf

[9] W3C. Simple object access protocol. http://www.w3.org/TR/soap/

Research Into Architectural Mismatch In Web Services And Its Relation To DIRC Themes

Carl Gamble
Newcastle University
Room 1101, Claremont Tower
Newcastle upon Tyne
+44 191 3464

[email protected]

ABSTRACT

Carl Gamble is a PhD student under the supervision of Dr. C. Gacek at Newcastle University. He is investigating the field of architectural mismatch with web services. This paper outlines what architectural mismatch is and what the current intended outcomes of his research are. It is then shown how this research may relate to some of the DIRC themes.

Keywords
Structure, Web Services, Architecture, Dependability

1. Introduction
The title of my research is "Architectural Mismatches in Web Services", so the first question to answer is "what is an architectural mismatch?". According to Garlan et al. [1], architectural mismatch is the result of a software component making some assumptions about the system it is to be integrated with, and that system making a different set of assumptions. Mismatched assumptions can greatly increase the cost of integrating a component, due to the extra effort required to write the wrapping code needed to get the component working.

An ever-increasing number of web services are being published and made available on the internet; however, support for discovering these services is still very limited, being based mainly upon a keyword search for the business area in which a service-offering company operates. Work is being done in the area of semantic matching of web services to client needs [2]. The goal of this work is to improve on keyword searching by selecting providers based upon the distance between the required service and the advertised service in a hierarchical ontology.

There are, however, other aspects that should be considered when selecting a provider. The current standard data publicised about a web service is little more than an interface description, which ignores both how a service is intended to be used (which could be elaborated with control and/or data flow architectural structures) and what its performance characteristics might be (which can be derived from a process architectural structure). If these and other architectural structures relating to a web service were publicised, then it might be possible to gauge the effort required for a client to bind to a service, and also to determine how a service will perform from both response time and dependability points of view.

2. What do I plan to do
Firstly I need to define the scope of the problem I intend to research and solve. In their paper, Garlan et al. [1] define four categories of architectural mismatch. These are:



- Nature of Components
- Nature of Connectors
- Global Architecture Structure
- Construction Process

Nature of Components refers to the assumptions a software component makes about the infrastructure it requires, which components have control, and what level of access other components have to its internal data.

Nature of Connectors refers to the assumptions made about the protocol or choreography of access to a component's interface, and about how data passed to and from the component is represented. For example, one component may expect an ASCII string but be passed a reference to a C type array.

Global Architecture Structure refers to the overall structure of components that a component assumes. For example, a component may expect a data-centric structure, where all data and messages passed between components go through a central database, when in fact the system is being built using an event-based model in which components send messages directly to each other.

Construction Process refers to the dependencies a component exhibits at compile time. It may, for example, assume that certain libraries exist on a system, which may not become clear until an attempt is made to build the system.

It appears at this time that the nature of components and the nature of connectors are the most relevant to my study, having the most potential for causing architectural mismatch, while the global architecture structure and construction process may also warrant study later on. In addition to the architectural features from Garlan et al., I also believe that the concept of the dependability of a service needs to be included in its description, so a similar exercise of selecting the dependability attributes relevant to web services needs to be carried out.

With the desired attributes selected, it then follows that this data needs to be represented in some way. Currently the leading contenders for the architectural description language (ADL) of choice are ACME [3] and XADL [4]. ACME is an architectural interchange language and has the advantage that it can contain the connector and component semantics from many of the existing ADLs; however, in earlier versions these semantics were not interpreted by the ACME tools, only by special tools for each language. XADL, on the other hand, has the advantage that it is already based on XML and would therefore simplify the task of publishing on the internet; however, at this stage I do not know to what extent its semantics are defined.

The final part of the research will be to develop a reasoning logic which will allow a system developer to select a web service which is the best match, from both architecture and dependability viewpoints, for the system they are attempting to build.
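The kind of reasoning step envisaged can be given a toy flavour in code (a hypothetical Python sketch; the attribute names, weights and candidate services are invented, and a formally proven logic would of course be far more rigorous than this scoring heuristic):

```python
# A toy version of the selection logic: score each advertised service by
# how closely its published architectural and dependability attributes
# match the client's requirements, then pick the best.

REQUIRED = {"connector": "async-message",
            "data-format": "xml",
            "availability": 0.99}

def score(advertised, required=REQUIRED):
    s = 0
    if advertised.get("connector") == required["connector"]:
        s += 2        # a matching connector means little or no wrapper code
    if advertised.get("data-format") == required["data-format"]:
        s += 1        # a matching representation avoids data translation
    if advertised.get("availability", 0.0) >= required["availability"]:
        s += 1        # dependability requirement met
    return s

candidates = {
    "service-a": {"connector": "rpc", "data-format": "xml",
                  "availability": 0.999},
    "service-b": {"connector": "async-message", "data-format": "xml",
                  "availability": 0.95},
}
best = max(candidates, key=lambda name: score(candidates[name]))
print(best)           # service-b: the connector match outweighs availability
```

The point is only that published architectural and dependability data makes mechanical comparison possible; a real reasoning logic would weigh many more attributes and justify its ranking formally.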

3. Anticipated Contributions
The intended contribution of this research is the development of a formally proven reasoning logic that would assist in the selection of web services with respect to their defined architecture and its closeness to the architecture of a client application. For this to be possible, another contribution of the research, the formalisation of architectural descriptions of web services and a means of publicising them, will also be necessary. Although outside the scope of this research, this would lay the groundwork for more broadly scoped architectural descriptions and reasoning logic to assist with the selection of more general software components. These could form the basis of developer tools which could reduce the amount of time developers spend on producing wrapper code to get a component working within the system they are building, by selecting a component which is a better match and therefore requires less wrapping. This would also have implications for the dependability of such a system, which would be inherently simpler and therefore easier to understand. Another benefit could be a reduced number of data type translations, in the case that two components assume different representations, which would increase a system's performance.
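The wrapper code and data type translations mentioned above can be illustrated with a minimal adapter (hypothetical Python; the two components and their representations are invented):

```python
# Two components with mismatched connector assumptions: the consumer
# expects a text string, while the producer hands out a list of
# character codes (a stand-in for "a reference to a C type array").

def producer():
    return [72, 105]                       # mismatched representation

def consumer(text):
    return text.lower()                    # expects an ASCII string

# Integration only works through wrapper code translating between the
# two assumed representations -- exactly the effort a closer
# architectural match would avoid.
def adapter():
    return consumer("".join(chr(c) for c in producer()))

print(adapter())                           # hi
```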

4. Relation to DIRC themes
I will now indicate the ways in which I believe my research will touch upon the DIRC themes.

4.1 Diversity
This theme is related to the study of using multiple instances of an item, or multiple different implementations of an algorithm, to increase the dependability of a system. One possible reason that system integrators do not build systems using multiple components is that integrating a component is rarely simple, due to the architectural assumptions components make. My research will help with the selection of components which most closely match the architecture of the system they are to be integrated into. This will reduce the overhead of integrating those components and allow developers to use a richer base of components if they desire to.

4.2 Risk
This theme studies the quantification of, and reasoning about, risk. My research will study the dependability parameters related to web services and will include the development of a reasoning logic that takes these into account when choosing a service to integrate. This will allow developers to select components which build toward the dependability characteristics desired of the system they are constructing.

4.3 Structure
This theme is investigating the notion that understanding a system is crucial to its dependability, and that as systems get increasingly large and complex they become much harder to understand. Part of my research will allow web service providers to make their assumptions about the use of their service explicit and publicly available. This will help developers building a system, firstly to choose a suitable component, and secondly to understand and document the system they build.

4.4 Timeliness
This research theme appears mainly concerned with timeliness with respect to safety-critical systems, which my research will likely not impact upon; however, the research will have two time-based implications. The first is that the response time of a service may well be described as part of either the architecture or the dependability description, which will afford developers the opportunity to select services based upon this characteristic. The second is that when a web service component does fail, this research will make the selection and integration of a replacement component quicker than it is today, expediting the resumption of normal service.

5. REFERENCES
[1] D. Garlan, R. Allen, J. Ockerbloom. Architectural Mismatch: Why Reuse Is So Hard. IEEE Software, pp. 17-26, November 1995.
[2] M. Paolucci, T. Kawamura, T. R. Payne, K. Sycara. Semantic Matching of Web Services Capabilities. Carnegie Mellon University, date unspecified.
[3] School of Computer Science, Carnegie Mellon University. The ACME Architecture Description Language Homepage. http://www-2.cs.cmu.edu/~acme/
[4] University of California, Irvine. XADL 2.0. http://www.isr.uci.edu/projects/xarchuci/


Exploration games with UML software design

Jennifer Tenzer
Laboratory for Foundations of Computer Science
School of Informatics
University of Edinburgh
+44-(0)131-650 51 46

[email protected]

ABSTRACT
The aim of this work is to use formal games as the foundation for a design tool which supports the exploration and evaluation of software designs in UML. The designer sets up a game on the basis of a UML model and repeatedly plays the game with the tool to detect flaws or incompleteness in the design. During a play the game definition, including the underlying UML model, may be incremented. This allows the modeller to add information while the game is being played and to react to discoveries made during the play. This research builds on previous work which has been published in [1], [2] and [3].

Keywords Structure, object-oriented software design, UML, formal games, tool support.

1. UML SOFTWARE DESIGN
For most systems there exist different design solutions, each of which has particular strengths and weaknesses. The non-trivial task of the software designer is to choose a suitable design which fulfils the system requirements. In order to make this choice the designer has to understand the system, the different design options and their consequences. A well-chosen design and structure splits the system and its behaviour into smaller entities which are much easier to understand, develop and evaluate than the complete system. This "divide and conquer" approach is essential for building complex software systems.

The Unified Modeling Language (UML) [4] is a standard language for modelling the design of object-oriented software systems and recording the results of the designer's decisions. UML consists of various diagram types, a constraint language and an action framework. Each diagram type provides a specific view of the system and can be used at different levels of abstraction. A UML design model usually contains class diagrams, which specify the structure of the system. The behaviour of the system may be modelled from different perspectives which complement each other. For example, a UML state machine diagram models how objects of a class react to events, while UML activity diagrams focus on the sequence of steps needed to perform a particular piece of functionality.

Currently available UML design tools mainly assist the designer in drawing UML diagrams and generating code from them. Some tools, such as [5] and [6], allow the modeller to "play through" UML state machines or activity diagrams. However, the modeller cannot refine the design during this

process and experiment with different design options. The design model must be very precise and contain much detail before such an animation can be attempted. So far there exists no tool support for the transition from a loosely defined design model to one that is precise enough for animation, code generation or verification.
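The "play through" facility that such tools offer can be sketched as a tiny interpreter (a hypothetical Python illustration with an invented cash-machine state machine; real tools animate full UML semantics):

```python
# "Playing through" a state machine: feed a sequence of events to an
# object and watch its state evolve.

STATE_MACHINE = {                 # state -> event -> next state
    "Idle":           {"insertCard": "Authenticating"},
    "Authenticating": {"pinOk": "Ready", "pinBad": "Idle"},
    "Ready":          {"withdraw": "Dispensing"},
    "Dispensing":     {"done": "Idle"},
}

def animate(events, state="Idle"):
    """Return the trace of states visited; undefined events are ignored."""
    trace = [state]
    for event in events:
        state = STATE_MACHINE[state].get(event, state)
        trace.append(state)
    return trace

print(animate(["insertCard", "pinOk", "withdraw", "done"]))
# ['Idle', 'Authenticating', 'Ready', 'Dispensing', 'Idle']
```

Even this trivial animation presupposes a complete, precise transition table, which is exactly the gap in tool support identified above.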

2. EXPLORATION GAMES
In order to enhance the capabilities of existing UML tools, exploration games have been developed as a formal framework. Exploration games are a variation of the two-player games known from the verification of software systems (for an overview see [7]). There are two main problems with applying a formal technique to UML. First, software designers rarely model the complete system in UML, but focus on the system's most interesting parts. Second, UML is a semi-formal language whose diagrams can contain constraints and descriptions in natural language. The fact that games are an interactive technique is essential for our solution to these problems. The progress of a play is determined by the decisions of the players, who react to each other's moves. The designer takes an active part in the game and can decide about uncertain situations or increment the design model while s/he is playing. Thus UML models which are incomplete or imprecise are allowed as the basis for a game and may be refined during a play.

The participants in an exploration game are two players called Verifier and Refuter, a Referee and an Explorer. The human designer can play the role of one or more of these participants. Verifier's aim is to show that the UML design is correct with respect to the system requirements, while Refuter attempts to find a flaw in the design. The UML design model determines many parts of the game, such as its positions and the possible moves for each player. However, in order to check whether the design is correct, the system requirements have to be integrated into the game. They constitute the winning condition for Verifier in the exploration game. If the winning condition is violated, the design does not meet the specification and Refuter wins the game. For example, in a game which is based on a set of UML state machines, the positions capture the current state of the system, which consists of a collection of objects.
The players move by generating events or firing transitions in the state machines for the objects. The winning condition could be given by specifying the set of state combinations which are considered legal.

In an exploration game the players have responsibilities assigned to them. For example, each player must decide about the legality of particular kinds of moves in a play if it is not clearly defined by the UML model. If any uncertain situations arise which cannot be resolved by the players, the Referee decides how the play is continued. The Explorer may increment the game definition at any point during a play. This can involve a modification of the UML design model, of the requirements specification which the design is verified against, or both. Such incremental development of the game is the central idea of this work.

The exploration game framework can be applied to UML in many different variants. A game variant has to specify how exactly the exploration game is defined, i.e. what its positions, moves and winning condition look like, which parts of the UML model are used for the game definition, how the responsibilities are assigned to the players, and how the Explorer may increment the game. Game variants can either be used to check one design with respect to desired or undesired properties, or to compare two different designs.
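A minimal flavour of an exploration game over a state machine can be given in code (a hypothetical Python sketch; the state machine, the legal-state winning condition and the moves are invented, and for simplicity all moves are given to Refuter):

```python
# A toy exploration game over a single state machine. Positions are
# states; Refuter tries to steer the play into an illegal state, and
# Verifier wins if no reachable state violates the winning condition.

TRANSITIONS = {                      # state -> event -> successor state
    "idle":    {"start": "running"},
    "running": {"stop": "idle", "fail": "error"},
    "error":   {},
}
LEGAL = {"idle", "running"}          # winning condition: legal state set

def refuter_wins(state="idle", seen=None):
    """True if Refuter can force the play into an illegal state."""
    seen = (seen or set()) | {state}
    if state not in LEGAL:
        return True
    return any(refuter_wins(nxt, seen)
               for nxt in TRANSITIONS[state].values()
               if nxt not in seen)

print(refuter_wins())       # True: firing 'fail' reaches 'error'

# The Explorer increments the design to remove the flaw, and replays.
del TRANSITIONS["running"]["fail"]
print(refuter_wins())       # False: Verifier now wins
```

The interactive framework described above is far richer: moves may be of uncertain legality, the Referee arbitrates, and the model itself may be incomplete.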

3. TOOL SUPPORT
The main functionality of a design tool based on exploration games is to let the user play a game once it has been set up. The simplest possibility is that the tool merely ensures that the game is played correctly and declares the winner. In this case the user takes on the role of all four participants in the play. He plays Refuter and Verifier, increments the game as Explorer, and decides about uncertain situations as Referee. In fact there might be several users who share these tasks between them. For example, Refuter and Verifier could be played by different users who "discuss" and improve the design, and yet another user might act as Referee. This mode of the tool is probably particularly useful in early design phases, where more detail needs to be added to the game before it becomes interesting and challenging to play with less involvement of the user(s).

Another operation mode for the tool would be to play the role of Verifier or Refuter. The user chooses which of these parts he wants to play and the tool takes the opponent's part. If it is possible for the tool to calculate a winning strategy for the current game, then the tool might play this winning strategy, or use it to suggest better moves to the user. Otherwise, the tool might use random choices and/or heuristics to play as well as possible. If incompleteness of the design model prevents the tool from computing a complete winning strategy in advance, it will have to adapt its strategy during the play.

We assume that the role of the Explorer is always played by the human designer. The incrementation of a game generally requires knowledge about the system and design skills. We do not aim at substituting the designer, but at developing a tool which supports him in using his skills. A very advanced version of a game-based design tool could try to give the designer feedback about which kind of incrementation is beneficial for a player in specific situations.
However, it is then still the designer who has to make a concrete incrementation according to the tool's suggestion.
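The winning strategy computation the tool would need can be sketched as a fixed-point iteration on a finite game graph (hypothetical Python; the game graph is invented, and real games derived from UML models would be far larger and possibly incomplete):

```python
# Backward induction on a finite two-player game graph: compute the set
# of positions from which Verifier can force a win, and a strategy
# (a move for each such position where Verifier is to move).

GRAPH = {                  # position -> (player to move, successor positions)
    "p0": ("Verifier", ["p1", "p2"]),
    "p1": ("Refuter",  ["p3"]),
    "p2": ("Refuter",  ["p3", "p4"]),
    "p3": ("Verifier", []),          # target position: Verifier wins here
    "p4": ("Verifier", []),          # dead end: Refuter wins here
}
TARGET = {"p3"}

def winning_strategy():
    win, strategy = set(TARGET), {}
    changed = True
    while changed:                   # iterate to a fixed point
        changed = False
        for pos, (player, succs) in GRAPH.items():
            if pos in win or not succs:
                continue
            if player == "Verifier" and any(s in win for s in succs):
                strategy[pos] = next(s for s in succs if s in win)
                win.add(pos)
                changed = True
            elif player == "Refuter" and all(s in win for s in succs):
                win.add(pos)         # Refuter cannot avoid the win set
                changed = True
    return win, strategy

win, strategy = winning_strategy()
print(sorted(win))                   # ['p0', 'p1', 'p3']
print(strategy)                      # {'p0': 'p1'}
```

If the design model is incomplete, the graph is only partially known, and the tool would have to recompute or approximate this set as the play unfolds, as discussed above.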

4. RELATED WORK
Reference [8] describes how model checking techniques can be applied to UML. Some of these ideas are realised by the tools vUML [9] and HUGO [10]. Both require the definition of a UML sequence diagram and a set of state machines. They transform these diagrams into a formal model and use it as input for the model checking tool SPIN, which analyses whether the given sequence is realisable by the state machines. The tools differ in the way they translate the UML model and in their coverage of state machine features. These tools are purely aimed at verification and do not help the user to create and extend the design, which is the focus of our work.

5. REFERENCES
[1] Tenzer, J. Exploration games for safety-critical system design with UML 2.0. In Proceedings of the 3rd International Workshop on Critical Systems Development with UML, CSDUML'04, Report I0415, Technische Universität München, September 2004, pages 41-55.
[2] Stevens, P., and Tenzer, J. Games for UML software design. In Proceedings of Formal Methods for Components and Objects, FMCO'02, volume 2852 of LNCS. Springer, 2003.
[3] Tenzer, J. Improving UML design tools by formal games. In Proceedings of the International Conference on Software Engineering, ICSE'04, pages 75-77. IEEE Computer Society, 2004. Research abstract for the ICSE doctoral symposium.
[4] Object Management Group (OMG). UML 2.0 Superstructure Final Adopted Specification, August 2003. Available at http://www.uml.org.
[5] Rhapsody, version 5.0. Available from I-Logix at http://www.ilogix.com.
[6] Real Time Studio Professional, version 4.3. Available from Artisan Software at http://www.artisansw.com.
[7] Grädel, E., Thomas, W., and Wilke, T. Automata, Logics and Infinite Games: A Guide to Current Research, volume 2500 of LNCS. Springer, 2002.
[8] Del Mar Gallardo, M., Merino, P., and Pimentel, E. Debugging UML designs with model checking. Journal of Object Technology, 1(2):101-117, July-August 2002.
[9] Lilius, J., and Paltor, I. vUML: A tool for verifying UML models. In Proceedings of Automated Software Engineering, ASE'99. IEEE, 1999.
[10] Knapp, A., and Merz, S. Model checking and code generation for UML state machines and collaborations. In Proceedings of the 5th Workshop on Tools for System Design and Verification, FM-TOOLS'02, Report 2002-11. Institut für Informatik, Universität Augsburg, 2002.
