Systematic Development of the Human Interface

D.J. Duke
Department of Computer Science, University of York
Heslington, York YO1 5DD, U.K.

P.J. Barnard and J. May
MRC Applied Psychology Unit
15 Chaucer Road, Cambridge CB2 2EF, U.K.

D.A. Duce
Rutherford Appleton Laboratory
Chilton, Didcot, Oxon. OX11 0QX, U.K.

In: Proceedings of APSEC'95: Second Asia-Pacific Software Engineering Conference, Brisbane, Dec 6-11. IEEE Computer Society Press.

Abstract

The problem of developing software to meet precise specifications has led to the development of mathematical notations for expressing and reasoning about the behaviour of a required or extant system. In this paper we describe a different use of formal models: as tools for gathering and consolidating requirements on interaction between engineered systems and their users. This change in focus reflects the growing use of sophisticated interactive technology in domains, such as medicine, where human comfort or safety is an issue. Not only must software systems function correctly, but the demands that the interface places on users of those systems need to be understood. This problem cannot be addressed by formal models in isolation. Instead, we describe an approach that uses formal models of human information processing to augment models of system functions. As a result it becomes possible, at an early stage in system design, to consider the role of human cognition in the correct behaviour of the system.

Introduction

Widespread availability of graphical and multi-modal interfaces means that software systems can make increasingly strong and subtle demands on the cognitive abilities of the users who monitor and mediate their performance [1]. As examples, we cite the use of visual and audio presentations in process and power-plant control, aircraft cockpits, medical imaging and surgery. If software engineering is to address the growing importance of the user in controlling the behaviour of systems, then its tools must be extended and adapted to deal with the problems that emerge. These include the separation of internal and perceivable state, and the need to assess system behaviour against the capabilities and limitations of the human. While the HCI community has generated many theories and evaluative techniques for assessing usability, these are either (a) post-hoc, or (b) suited for participatory or experimental design, not for the detailed and systematic process needed, for example, to construct high-integrity software.

This paper demonstrates how recent work within the European Amodeus-2 project (a research action in the ESPRIT programme) provides a foundation for applying existing tools for rigorous development within the context of human-system interaction. As an example, we will consider the development of a multi-modal interface to an information system. Section 2 introduces this example, and uses it to illustrate the role of two quite different modelling techniques that are applicable to the development of interactionally rich systems. One model derives from the use of formal methods in software engineering; the other, from cognitive psychology. The problem is that to address design issues such as multi-modal blending, these disparate representations must be brought into contact. How that contact can be achieved is the subject of Section 4. The solution we propose represents the structure and constraints of the cognitive model in the same framework as the formal model. As a result, the two models can be combined to give a syndetic representation of the problem (syndetic, from the Greek: to bind). The conclusion sets out current progress on system and syndetic modelling. A short glossary of the abbreviations used in the paper is provided in Appendix A.

Systems and Models

Most disciplines routinely use various kinds of models to represent and reason about problems in their domain. For human-computer interaction, that domain includes the properties and behaviour of software artefacts, and the cognitive capabilities of their users. Two quite different classes of model are thus applicable when formulating requirements on systems that rely on human cognitive ability. These are models that express the required or expected behaviour of the system to be constructed, and models of human information processing and mental performance that determine the constraints under which the user will be able to interact successfully with the proposed system. If we are to gain some assurance that the system will be usable for its intended purpose, then somehow these two distinct kinds of model must be brought into contact.


[Figure 1: Deictic blending of speech and gesture in MATIS. Two query forms, each with fields From, To, Dep Time, Arr Time and Airline, and Search and Clear buttons. On the left, the user says 'Show me flights from this city ...' while Query 1 shows LONDON to MANCHESTER; on the right, after '... to Oslo', Query 2 reads MANCHESTER to OSLO.]

The Amodeus project is concerned with the development of user and system models for interactionally sophisticated technology, for example multi-modal and gestural interaction, and the integration of these models within the practice of system design. In this section we describe two models (a user model and a system model) that can be used to capture and reason about requirements on human-computer interaction. After describing the individual role of the models, this section explains how they can be brought into contact so that the different demands placed on the implementation by the two disciplines can be related and reconciled. Our example is MATIS [2], a multi-modal interface to an airline flight information system designed to evaluate implementation techniques for combining input from multiple modalities [3]. It allows a user to plan a multi-stage journey by completing 'query' forms that can be used to search a flight database. These can be specified using multiple modalities, either individually or in combination. The example in Figure 1 shows a user combining spoken natural language with mouse-based gesture to fill in the second query template. On the left hand side the user has begun to speak a request that contains a deictic reference, 'this city'. To resolve the deictic reference (this), the user needs to select the city that she is referring to by using a pointing device such as a mouse. That is, she needs to locate the name of the city on some part of the display, position the pointer over that name, and click on a button. The right hand side of Figure 1 shows the query form after the user has completed articulating the question. This process of combining data from multiple streams is called data fusion [4]. A generic framework for implementing this type of interface, described in [3], uses a concept of a temporal window that defines how data originating from multiple streams is matched and fused into domain structures.
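The temporal-window idea can be made concrete with a small sketch. The following Python fragment is our own illustration, not the generic fusion platform of [3]: the event types, the single fixed window, and the nearest-click policy are assumptions made purely for exposition.

    from dataclasses import dataclass
    from typing import Optional

    WINDOW = 1.0  # seconds: a hole and a click closer than this may be fused

    @dataclass
    class SpeechEvent:
        time: float
        field: str             # e.g. 'From'
        value: Optional[str]   # None encodes a deictic hole such as 'this city'

    @dataclass
    class MouseEvent:
        time: float
        value: str             # e.g. 'LONDON'

    def fuse(speech: list, mouse: list) -> dict:
        """Resolve deictic holes in the speech stream against mouse selections
        inside the temporal window; return field -> value bindings."""
        query = {}
        clicks = list(mouse)
        for ev in speech:
            if ev.value is not None:
                query[ev.field] = ev.value
            else:
                # find a selection near enough in time to blend with the hole
                near = [c for c in clicks if abs(c.time - ev.time) <= WINDOW]
                if near:
                    chosen = min(near, key=lambda c: abs(c.time - ev.time))
                    query[ev.field] = chosen.value
                    clicks.remove(chosen)
        return query

    # 'Show me flights from this city ...' + click on LONDON, then '... to Oslo'
    q = fuse([SpeechEvent(0.0, 'From', None), SpeechEvent(2.5, 'To', 'OSLO')],
             [MouseEvent(0.4, 'LONDON')])
    assert q == {'From': 'LONDON', 'To': 'OSLO'}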

Formal Models for Human-Computer Interaction

Our approach builds on work on using interactors to present formal models of computer agents [5, 6, 7]. An interactor contains an internal state representing some facet of the application domain, and a presentation that describes the perceivable components of that state. Changes to the state and presentation are effected by actions. Effective reasoning about a complex system like a virtual environment requires a description that abstracts away from all but the essential details of the system's behaviour. To this end, software engineering, like many other disciplines, has recruited the descriptive power and economy of expression afforded by various branches of mathematics, in particular logic [8], and discrete structures such as sets and relations [9]. The complexity of large system models is managed by organising specifications into structures. Interactors enable the specification to be structured around concepts that are significant from the viewpoint of the user, for example presentations and actions.

As we have argued elsewhere [6], the use of formal models in interactive systems is somewhat different from the role of formal methods in software development. Formal methods support the precise specification of system functions in order to ensure, either through refinement or verification, that an implementation meets its requirements. In contrast, the models we build are part of the process of understanding what those requirements should (or could) be in the first place. The process is thus closer to the role of a scientific model in supporting description and reasoning about observations, as opposed to an engineering model that defines the structure of an artefact.

The focus of this paper is on the problem of considering the role of the human in interactionally rich systems, rather than results specific to any one application domain. Our model abstracts away from the problems of implementing multi-modal fusion, as described in [3]. Instead, its focus is the interaction between user and system. The specification assumes the existence of certain types that represent domain concepts:

  type qnr   - identifiers for different queries
       name  - field names on query forms
       data  - values assigned to query fields

Type constructors include cartesian product (S × T), partial functions (S ⇸ T) and sequences (seq S). These are based on the Z notation [9], from which we also make use of 'free type' definitions that are built from constructors. For example, a type to represent 'optional' values is given by:

  opt[T] ::= none | some⟨⟨T⟩⟩
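For readers who prefer a programming-language rendering, such a free type has a direct analogue as a tagged union. The following Python sketch is ours and purely illustrative of how opt[T] behaves; the names NoneConst, Some and Opt are not part of the specification.

    from dataclasses import dataclass
    from typing import Generic, TypeVar, Union

    T = TypeVar('T')

    class NoneConst:
        """The constant 'none': a deictic hole carrying no payload."""

    @dataclass
    class Some(Generic[T]):
        """The injection some<<T>>: carries a datum of type T."""
        value: T

    Opt = Union[NoneConst, Some[T]]   # opt[T] ::= none | some<<T>>

    slot: Opt[str] = Some('OSLO')     # an articulated data value
    hole: Opt[str] = NoneConst()      # 'this city' leaves a hole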

This definition is generic with respect to some other type 'T'. For example, a value of type opt[ℕ] is either the constant 'none', or, for any number n ∈ ℕ, the value some(n). The optional type is used in the specification to distinguish between data values and deictic references within the speech data stream. In this particular example, the behaviour of the system is described within a single interactor. The state of the model is given by a collection of typed attributes or variables that represent those aspects of the MATIS system that we are interested in observing. Our model of data fusion simply captures the input arriving on the two data streams, and the current state of the queries. We indicate that queries are a user-perceivable part of the system by an annotation; vis indicates that the attribute is a visual percept [10].

  interactor MATIS
  attributes
    vis fields  : qnr × name ⇸ data
        mouse   : seq data
        speech  : seq (name × opt[data])
        result  : name ⇸ data
    vis current : qnr

These attributes represent: the content of the query forms (fields), the sequence of data on the mouse stream (mouse), the sequence of data, or holes, on the speech stream (speech), the outcome of resolving deictic references (result), and the currently 'active' query (current). Four actions are defined on the model:

  actions
    art speak  : name × opt[data]
    lim select : data
        fuse
        fill

These represent the articulation of a data value for a named field (speak), selection of a value using the mouse (select), the fusion of input streams to yield a result (fuse), and the transfer of a result into the active query (fill). The relationship between the observables of the system is described by a number of axioms that we introduce below. These are expressed in modal action logic (MAL) [8]. This contains the usual connectives and quantifiers of first order logic, e.g. and (∧), implies (⇒) and for-all (∀). For any action A and predicate Q, the logic also includes a modal predicate [A] Q, meaning that Q is required to hold in any state that results after performing the action. Deontic operators [11] are also defined in the logic. The predicate per(A) indicates that the action 'A' is permitted, while an obligation to perform 'A' is written as obl(A). For MATIS, axiom 1 defines the effect of the 'speak' action on the speech data stream. If the value of the speech stream is X, the axiom requires that the effect of speaking a name-data pair is to append that pair to X.

  axioms
    1. speech = X ⇒ [speak(nm, d)] speech = X ⌢ ⟨(nm, d)⟩

Axiom 2 defines similar behaviour for the 'select' action, though here the new value is a data item that is appended to the stream of mouse input.

    2. mouse = M ⇒ [select(d)] mouse = M ⌢ ⟨d⟩
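The operational content of these axioms is small enough to check against an executable approximation. The sketch below is our own Python rendering of the interactor under simplifying assumptions (streams as lists, attributes as instance variables); it is not part of the MATIS specification.

    class MATIS:
        """Executable approximation of the MATIS interactor: actions are
        methods whose postconditions mirror axioms 1 and 2 (appending to
        the speech and mouse streams)."""

        def __init__(self):
            self.fields = {}    # (qnr, name) -> data; vis: user-perceivable
            self.mouse = []     # seq data
            self.speech = []    # seq (name, opt[data]); None is a deictic hole
            self.result = {}    # name -> data
            self.current = 1    # active query number

        def speak(self, name, datum):   # axiom 1: speech = X ^ <(nm, d)>
            self.speech.append((name, datum))

        def select(self, datum):        # axiom 2: mouse = M ^ <d>
            self.mouse.append(datum)

    m = MATIS()
    m.speak('From', None)   # 'this city' leaves a hole to be fused later
    m.select('LONDON')
    m.speak('To', 'OSLO')
    assert m.speech == [('From', None), ('To', 'OSLO')] and m.mouse == ['LONDON']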

We could go on to define further axioms, for example the effect of the 'fuse' action, but these are beyond the needs of this paper. Our concern is with the role of such a model in software development. On its own, this system model can be a valuable source of insight that can raise significant design questions; for example, what should happen if a field name is mentioned twice in the speech stream? What the model does not (and cannot) address is how users will be able to deploy the resources of the interface to achieve their tasks. Will users of systems like MATIS even be able to perform deixis with the proposed interface? To try and address these issues, we need to consider what kinds of cognitive resources are involved in user-system interaction.

Interacting Cognitive Subsystems (ICS)

ICS [1, 12, 13] is a comprehensive model of human information processing that describes cognition in terms of a collection of subsystems that operate on specific mental codes. Although specialised to deal with specific codes, all subsystems have a common architecture, shown in Figure 2. Incoming data streams arrive at an input array, from which they are copied into an image record representing an unbounded episodic store of all data received by that subsystem. In parallel with the basic copy process, each subsystem also contains transformation processes that convert incoming data into certain other mental codes. This output is passed through a data network to other subsystems. If the incoming data stream is incomplete, a process can augment it by accessing the image record. Coherent data streams may be blended at the input array of a subsystem, with the result that a process can transform data derived from multiple input sources.

[Figure 2: Generic structure of an ICS subsystem. Input of code C arrives at an input array; a copy process writes to and reads from the image record, while transformation processes convert code C into other codes X, Y and Z.]

ICS assumes the existence of 9 distinct subsystems, each based on the architecture described above:

Sensory subsystems:
  VIS  visual: hue, contour etc. from the eyes
  AC   acoustic: pitch, rhythm etc. from the ears
  BS   body-state: proprioceptive feedback

Structural subsystems:
  OBJ  object: mental imagery, shapes, etc.
  MPL  morphonolexical: words, lexical forms

Meaning subsystems:
  PROP    propositional: semantic relations
  IMPLIC  implicational: holistic meaning

Effector subsystems:
  ART  articulatory: subvocal rehearsal, speech
  LIM  limb: motion of limbs, eyes, etc.

Overall behaviour of the cognitive system is constrained by the possible transformations and by several principles of processing. Visual information, for instance, cannot be translated directly into propositional code, but must be processed via the object system that addresses spatial structure. The collection of processes that are deployed in supporting a particular task is called a configuration. In addition to the nine subsystems and their transformations, Figure 3 also shows a configuration that might be deployed while searching for an icon on a display. In order to locate some object with a mouse, information arriving at the visual system (1) will be transformed into object code (2) that contains the basic organisation of visual elements on the display. This transformation is written as :vis-obj:. At the same time, the propositional subsystem is buffering information about the target (3) through its image record, and using :prop-obj: to produce an object code representation (4). When this representation can be blended at the obj subsystem with the incoming representation from :vis-obj:, :obj-prop: will be able to return a matching representation (5) to the propositional subsystem to indicate that a possible target has been found. Finally, motion of the mouse via the hand is controlled by the limb subsystem (6) through :obj-lim:. While this configuration is actively locating an object, a second sequence of processes could be engaged in producing spoken output, such as "now where is that icon?" This would require the :prop-mpl: and :art-speech: processes. Principle 1 of ICS constrains the occurrence of secondary configurations such as this, by limiting the action of any one transformation process to a single data stream.

[Figure 3: The ICS model, showing one configuration of resources: the nine subsystems with the transformation processes between them (plus somatic and visceral response outputs), and the numbered stages (1)-(6) of the icon-search configuration described in the text.]

A cognitive model such as ICS can be used to address the kind of design question that we confronted at the end of the previous section. The design options under consideration within the system model could be represented within some 'mediating' framework, for example Design Rationale [14, 15]. The advice provided by modellers could be used to define criteria for choosing between options. This approach is critically dependent upon the success of the mediating representation, and it is more likely than not that there would be some loss in translation both from the system modellers to the user modellers, and back again. Furthermore, the use of a mediating expression also makes it difficult to generalise beyond a particular situation, since it is not possible to see directly whether the assumptions of each model concur, or whether changing aspects of one model would have consequences for the other. These problems can be overcome by syndetic modelling, where user and system representations can be expressed within a common framework, without the need for additional representations. It provides a direct link between the observables and actions of a theory that prescribes a system's behaviour and the resources and constraints that characterise users' capabilities. In practical terms, syndetic modelling is a framework for bringing theoretically grounded models to bear on complex problems of human-computer interaction.

D’ to mean that C ‘is a part of’ D; thus S ? @ (ST), S ?@ SR, and, for a dual processing configuration, S ? @ S + T. The configuration in Figure 3 can be expressed as:

where user and system representations can be expressed within a common framework, without the need for additional representations. It provides a direct link between the observables and actions of a theory that prescribes a system’s behaviour and the resources and constraints that characterise users’ capabilities. In practical terms syndetic modelling is a framework for bringing theoretically grounded models to bear on complex problems of humancomputer interaction.

[:vis-obj:(:obj-prop::prop-obj:BUF)R:obj-lim::lim-hand:] + [:prop-mpl::mpl-art:] The current configuration is one observable that characterises the overall behaviour of cognitive processing in ICS. We are also concerned with the mental representations available at a given subsystem; we write ‘r@s’ to indicate that representation ‘r’ is available at subsystem ‘s’. ICS assumes that all representations are built from basic units organised into a super/subordinate structure, with differences in mental codes corresponding to encoding dimensions of the basic units [12, 1]. However a thorough treatment of these structures is beyond the scope of the present paper. Here, we simply parameterise the model with a generic type to stand for mental representations. This can then be instantiated, as needed, with a set of values that captures the representations involved in a particular domain. The potential for a transformation process to operate on data derived from multiple sources is represented by an attribute called ‘blended’. For a subsystem ‘s’, ‘blended(s)’ is the set of subsystems whose output can be blended on the input array of ‘s’ to form a single coherent data stream. The model also describes which transformations of particular representations have been proceduralised, and the location of the buffered transformation. The latter can be derived from the structure of the system configuration.

Syndetic Modelling This section describes how the structure and axioms of an interactor provide a common framework for bringing the previously disparate viewpoints of rigorous software development and cognitive modelling into direct contact. Once user and system theories are expressed in this common framework, we can use the expressive power of the formal representation to describe and reason about how the cognitive systems of a user are deployed in performing tasks with the system. By doing so, we can gain some understanding on the requirements that a system must satisfy if it is to be usable for its intended task. To illustrate this approach we will investigate how constraints on the blending of cognitive data streams might present problems for the use of deictic reference within a multi-modal interface. An Axiomatic Description of ICS A comprehensive model of ICS is beyond the scope of this paper. Instead, our specification is built around the data streams that define and control the flow of information through the subsystems. We are particularly interested in reasoning about configurations like the one shown in Figure 3. Our concern is how the properties of the configuration might affect users’ ability to blend data streams in the way needed to interact with a system like MATIS. A definition of resource configurations is given below: Config

::= j j j j

:sys-sys: Config Config ConfigR ConfigBUF Config + Config

interactor ICS [Repr] attributes config @

: Config

: Repr  sys ! B

blended : sys ! P sys buffered : sys  sys

– transformation – chaining – recipricol loop – buffering – dual processing

proc

: sys  sys  Repr ! B

The model includes two actions that modify the configuration of resources and information in the system. A transformation action (trans) represents the processing of data available at one subsystem into a representation that becomes available at another subsystem. The second action, ‘buffer’, captures changes in the location of the buffered transformation that otherwise preserves the overall structure of a configuration. Oscillation of the buffer within a configuration is indicative of a novel or difficult task where focal awareness shifts between the different kinds of representation involved in performing the task.

The basic unit of a configuration is a process that transforms representations from one mental code to another. A transformation process :src-dst: is located in the ‘src’ subsystem and generates representations for the ‘dst’ system. For example, :vis-obj: is the process within the visual subsystem that extracts an object-level representation of visual @ information. If C and D are configurations, we write ‘C ? 5

actions

tents of the image record, thus enriching the available representation. Axiom 6 expresses this property by requiring that a non-proceduralised transformation will be competing for buffering.

trans

buffer : sys  sys Principle 1 of ICS constrains the occurrence of secondary configurations by limiting the action of any one transformation process to a single coherent data stream. We express this constraint as axiom 1: for any two processes :s-t: and :u-t: that generate the same code (i.e. t), the output of those processes must be blended at the destination subsystem. If data streams cannot be blended, cognition may ‘oscillate’ between different configurations that draw on one or other of the streams. As we shall discuss later, such oscillations can be problematic. For present purposes we consider two data streams to be coherent only if the same representations are available at both originating subsystems (axiom 2). In practice, coherence only requires that the representations involved have a consistent structure; a detailed formal account of this concept is not developed here.

5 6

@ config ^ : proc(src; dst; p) p@src ^ :src-dst: ? ) obl(buffer(src dst)) ;

Blending Speech and Gesture We can now combine the system and user models of MATIS into a single, syndetic theory. This explains how cognitive systems need to be configured to accomplish particular tasks with the system. Figure 4 brings together the system components involved in effecting multi-modal fusion with the ICS structures from Figure 3 involved in processing. A syndetic model, bringing together the user and system specifications provides the necessary framework for expressing constraints that involve both agents. In this example, the observables inhereted from the two ‘local’ theories are augmented by an action that represents the user reading a data item from the contents of the presentation.

axioms 1

p@src ^ :src-dst: ? @ config ) [trans(p)] p@dst

8 s t u : sys  :s-t: ? @ config ^ :u-t: ? @ config , fs ug  blended(t) 8 s t u : sys  s 6= t ^ fs ug  blended(t) ) 8 r : Repr  r@s , r@u ; ;

;

2
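To make the axioms above concrete, the following sketch gives an executable approximation of the ICS interactor, under the simplifying assumption that a configuration is flattened to the set of transformation processes it contains (ignoring chaining order and the R and BUF structure). The class and method names are ours, chosen for exposition only.

    class ICSModel:
        def __init__(self, config, blended):
            self.config = set(config)      # {(src, dst), ...}, e.g. ('vis', 'obj')
            self.blended = blended         # dst -> set of sources blendable there
            self.at = set()                # (repr, subsystem) pairs: r@s
            self.proceduralised = set()    # (src, dst, repr) triples
            self.obligations = []          # pending obl(buffer(src, dst))

        def check_principle_1(self):
            """Axioms 1-2: processes feeding the same subsystem must be
            blendable there, and blended sources must carry the same
            representations."""
            for (s, t) in self.config:
                for (u, t2) in self.config:
                    if t == t2 and s != u:
                        assert {s, u} <= self.blended.get(t, set()), \
                            f'{s} and {u} cannot blend at {t}'
                        assert ({r for (r, x) in self.at if x == s} ==
                                {r for (r, x) in self.at if x == u}), \
                            f'incoherent streams at {t}'

        def trans(self, rep, src, dst):
            """Axiom 5: transform rep from src code to dst code. Axiom 6: a
            non-proceduralised transformation raises a buffering obligation."""
            if (rep, src) in self.at and (src, dst) in self.config:
                self.at.add((rep, dst))
                if (src, dst, rep) not in self.proceduralised:
                    self.obligations.append(('buffer', src, dst))

    ics = ICSModel({('vis', 'obj'), ('obj', 'prop')}, {'obj': {'vis', 'prop'}})
    ics.at.add(('icon', 'vis'))
    ics.trans('icon', 'vis', 'obj')
    assert ('icon', 'obj') in ics.at
    assert ics.obligations == [('buffer', 'vis', 'obj')]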

Blending Speech and Gesture

We can now combine the system and user models of MATIS into a single, syndetic theory. This explains how cognitive systems need to be configured to accomplish particular tasks with the system. Figure 4 brings together the system components involved in effecting multi-modal fusion with the ICS structures from Figure 3 involved in processing. A syndetic model, bringing together the user and system specifications, provides the necessary framework for expressing constraints that involve both agents. In this example, the observables inherited from the two 'local' theories are augmented by an action that represents the user reading a data item from the contents of the presentation.

  interactor MATIS-User
  includes
    MATIS
    ICS
  actions
    read : data

The conjoint behaviour of the two agents is captured by three axioms that span the two sets of observables. The first axiom defines the condition under which it is possible for the user to read an item of data from the presentation. On the system side, there must exist a field on a query such that the value of the field is the data item. On the user side, the configuration must include a data stream from the visual system, through the object and morphonolexical levels, to the propositional subsystem.

  axioms
    1. per(read(d)) ⇒ ∃ q : qnr; n : name • d = fields(q, n) ∧
         :vis-obj: :obj-mpl: :mpl-prop: ⊑ config

Axioms 2 and 3, given below, address the cognitive requirements associated with the action of selecting a data item with the mouse, and uttering some part of a query.

[Figure 4: Information flow between user and system processes. On the user side, the ICS configuration (vis→obj, obj→mpl, mpl↔prop, prop→obj, obj→lim, lim→hand, prop→mpl, mpl→art, art→speech); on the system side, the MATIS model. speak(n,d) feeds the speech stream via articulation, select(d) feeds the mouse stream via the limb subsystem, and data is read from the screen presented by MATIS.]

    2. per(select(d)) ⇒ d@prop ∧ wordsearch ⊑ config
    3. per(speak(n, d)) ⇒ (n, d)@prop ∧ :prop-mpl: :mpl-art: ⊑ config

In order to select an item (with the mouse) the user will need to have a proposition involving that item, either through having read and recoded a value or as part of goal formation. Now, since items on the MATIS display are lexical structures, the configuration for object search is not sufficient. The mpl and prop systems need to be recruited to find lexical objects (words) on the screen and compare them with the users' goals, and this will require the configuration

  wordsearch = :vis-obj: (:obj-mpl: (:mpl-prop: :prop-mpl:)^R :prop-obj:)^R :obj-lim: :lim-hand:

This explains axiom 2; axiom 3 likewise sets out the configuration required for the user to articulate part of a query. The key point is that 'speaking' and 'selecting' both require use of the morphonolexical system, and ICS axiom 1 states that this resource can only be shared if it is operating on two coherent data streams. In terms of our approximate model, this means that the user can only be articulating a phrase and searching for a lexical item on the screen if that phrase is also the focus of the visual search. We can express this property formally:

  MATIS-User ⊢ per(speak(s) & select(d)) ⇒ s@obj

A sketch of the proof is set out below, writing s for the spoken pair (n, d):

    per(speak(s) & select(d))
  ⇒ per(speak(s)) ∧ per(select(d))                                  [deontic axiom]
  ⇒ s@prop ∧ :prop-mpl: :mpl-art: ⊑ config ∧ per(select(d))         [MATIS-User.3]
  ⇒ s@prop ∧ :prop-mpl: :mpl-art: ⊑ config ∧ wordsearch ⊑ config    [MATIS-User.2]
  ⇒ s@prop ∧ :prop-mpl: ⊑ config ∧ :obj-mpl: ⊑ config               [defn of ⊑]
  ⇒ s@prop ∧ {prop, obj} ⊆ blended(mpl)                             [ICS.1]
  ⇒ s@obj                                                           [ICS.2]

The result above shows that a user will not be able to articulate a phrase at the same time as they search for a different value on the display, since the two representations needed at the morphonolexical level are not the same. Thus, the syndetic model shows that in order to employ the resources defined in the system model, a user of MATIS may have to 'interrupt' a spoken request in order to locate a value for deictic reference. This need to switch processing mode will be distracting. If the system also requires selection to occur within some temporal window around a deictic utterance, the user may not be able to carry out the context switching and location of an appropriate value in time. Such timing constraints could be included in a more detailed account.
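Under the same flattened reading of configurations used in the sketch earlier, the crux of this proof can be replayed mechanically. The fragment below is our own illustration and assumes the process-set encoding introduced above, not the structured Config algebra.

    # Speaking needs :prop-mpl:, word search needs :obj-mpl:, so both feed
    # mpl; by ICS.1 prop and obj must then blend at mpl, and by ICS.2 the
    # representations at prop and obj must coincide.

    wordsearch = {('vis', 'obj'), ('obj', 'mpl'), ('mpl', 'prop'),
                  ('prop', 'mpl'), ('prop', 'obj'), ('obj', 'lim'),
                  ('lim', 'hand')}
    articulate = {('prop', 'mpl'), ('mpl', 'art')}

    config = wordsearch | articulate
    feeds_mpl = {src for (src, dst) in config if dst == 'mpl'}
    assert feeds_mpl == {'obj', 'prop'}   # ICS.1: obj and prop must blend at mpl

    # ICS.2 then forces the same representation to be available at obj and
    # prop: the phrase being articulated (s@prop) must also be the focus of
    # the visual search, i.e. s@obj.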

Prospects and Conclusion

The previous section showed that detailed and sometimes unexpected constraints on user performance can be deduced from syndetic modelling. In contrast, system models can only describe what the system should do. Any claims or assumptions made about user performance must be validated separately, either through appeal to cognitive theory or directly, through prototyping and experimental evaluation. User models likewise require assumptions about system behaviour. This is a key limitation for both. By expressing user and system constraints in equal terms, syndetic models allow direct (and formal) comparison between the capabilities and limitations of the two parties in an interactive system. As the underlying cognitive and system theories are built into the model, the reason why some problem exists, such as difficulty in expressing deictic queries, can be found. Alternative design solutions can then be driven by theoretical insight, rather than through a potentially expensive 'generate and test' cycle of ad-hoc changes. In the case of a system that builds on the technology of MATIS, location of cities might be supported better through a graphical display, for example a map in which spatial location may reduce the need to invoke :prop-mpl: transformations.

Two other approaches that attempt to put integration on a theoretical basis are the Interaction Framework [16] and the production system model described by Kieras and Polson [17]. The former is intended to provide an agent-neutral view, abstracting away from any specific user or system representation by working with a notion of event trajectory. In contrast, Kieras and Polson combine a production-system model of the user with a GTN representation of the system to obtain a detailed operational model of both agents. Syndetic modelling avoids the level of detail that makes Kieras and Polson's model difficult to apply, by operating with abstract axiomatic specifications of user and system. This involves commitment to specific cognitive and user models, in contrast to the Interaction Framework, which operates with its own event-based representation. The benefit of our model is that we can make use of existing cognitive and system theories directly. The results of analysis are then expressed in terms that can be used to guide behavioural evaluation or system implementation.

In this paper we have had space only to outline the theory and benefit of syndetic models. We are currently concerned with expanding syndetic analysis to cope with larger design spaces, and investigating its utility in design. Work on expressing the ICS model with the axiomatic framework used in this paper is ongoing, with case studies such as this serving to validate the feasibility of the approach. The model used in this paper is still quite approximate. In particular, the connection between the property described at the end of the previous section, and the actual constraint on human performance, could be made clearer by enriching the model to include the different phases of cognitive activity described in [12], for example. One factor brought out in the syndetic analysis of gestural interaction [18] is the significance of timing constraints on system-user feedback. It would be useful if the theory could express these directly. The duration calculus [19] may provide a suitable framework for expressing and reasoning about these constraints. This might, for example, allow us to reason about the use of temporal windows to moderate blending of data streams [4].

Acknowledgements

We thank A.E. Blandford, T. Green, and M.D. Harrison for their helpful comments on the development of this paper. This work was carried out as part of the Amodeus-2 project, ESPRIT Basic Research Action 7040 funded by the Commission of the European Communities. Information about the Amodeus-2 project, as well as many of the technical reports produced to date, is available electronically: http://www.mrc-apu.cam.ac.uk/amodeus/amodeus.html, or ftp://ftp.mrc-apu.cam.ac.uk/pub/amodeus

References

[1] P.J. Barnard and J. May. Interactions with advanced graphical interfaces and the deployment of latent human knowledge. In Eurographics Workshop on Design, Specification and Verification of Interactive Systems. Springer, June 1994. Held in Bocca di Magra, Italy. To appear 1995.
[2] L. Nigay. Conception et modélisation logicielles des systèmes interactifs. PhD thesis, Université Joseph Fourier, Grenoble, 1994.
[3] L. Nigay and J. Coutaz. A generic platform for addressing the multimodal challenge. In Proc. of CHI'95. Addison-Wesley, 1995.
[4] L. Nigay and J. Coutaz. A design space for multimodal systems: Concurrent processing and data fusion. In S. Ashlund, K. Mullet, A. Henderson, E. Hollnagel, and T. White, editors, Proc. INTERCHI'93, pages 172-178. Addison-Wesley, 1993.
[5] D.J. Duke and M.D. Harrison. Abstract interaction objects. Computer Graphics Forum, 12(3):25-36, 1993. Conference Issue: Proc. Eurographics'93.
[6] D.J. Duke and M.D. Harrison. From formal models to formal methods. In Proc. Intl. Workshop on Software Engineering and Human-Computer Interaction, volume 896 of Lecture Notes in Computer Science, pages 159-173. Springer-Verlag, 1994.
[7] F. Paternò and A. Leonardi. A semantics based approach for the design and implementation of interaction objects. Computer Graphics Forum, 13(3), 1994. Conference Issue: Proc. Eurographics'94, Oslo, Norway.
[8] M. Ryan, J. Fiadeiro, and T. Maibaum. Sharing actions and attributes in modal action logic. In T. Ito and A.R. Meyer, editors, Theoretical Aspects of Computer Software, volume 526 of Lecture Notes in Computer Science, pages 569-593. Springer-Verlag, 1991.
[9] J.M. Spivey. The Z Notation: A Reference Manual. Prentice Hall International, second edition, 1992.
[10] D.J. Duke and M.D. Harrison. A theory of presentations. In FME'94: Industrial Benefit of Formal Methods, volume 873 of Lecture Notes in Computer Science, pages 271-290. Springer-Verlag, 1994.
[11] J.-J. Ch. Meyer and R.J. Wieringa, editors. Deontic Logic in Computer Science: Normative System Specification. Wiley Professional Computing, 1993.
[12] P.J. Barnard and J. May. Cognitive modelling for user requirements. In P.F. Byerley, P.J. Barnard, and J. May, editors, Computers, Communication and Usability: Design Issues, Research and Methods for Integrated Services, North Holland Series in Telecommunication. Elsevier, 1993.
[13] P.J. Barnard. Interacting cognitive subsystems: A psycholinguistic approach to short-term memory. In A.W. Ellis, editor, Progress in the Psychology of Language, volume 2. Lawrence Erlbaum Associates, 1985.
[14] V. Bellotti. Integrating theoreticians' and practitioners' perspectives with design rationale. In S. Ashlund, K. Mullet, A. Henderson, E. Hollnagel, and T. White, editors, Proc. INTERCHI'93, pages 101-106. Addison-Wesley, 1993.
[15] A. MacLean, R. Young, V. Bellotti, and T. Moran. Questions, options, and criteria: Elements of design space analysis. Human-Computer Interaction, 6(3&4):201-250, 1991.
[16] M.D. Harrison and P.J. Barnard. On defining requirements for interaction. In Proc. of the IEEE International Workshop on Requirements Engineering, pages 50-54. IEEE Press, 1993.
[17] D. Kieras and P.G. Polson. An approach to the formal analysis of user complexity. International Journal of Man-Machine Studies, 22:365-394, 1985.
[18] D.J. Duke. Reasoning about gestural interaction. Computer Graphics Forum, 14(3), 1995. Conference Issue: Proc. Eurographics'95, Maastricht, The Netherlands.
[19] Zhou Chaochen. Duration calculi: An overview. In D. Bjørner, M. Broy, and I.V. Pottosin, editors, Formal Techniques in Programming and Their Applications, volume 735 of Lecture Notes in Computer Science, pages 256-266. Springer-Verlag, 1993.

A Glossary

A list of the abbreviations and names used in the paper is given below. The following markers are used: (A) action name, (I) ICS subsystem, (T) type name, and (V) attribute (variable) name.

  ac        (I)  acoustic subsystem
  art       (I)  articulatory subsystem
  @         (V)  codes available at a subsystem
  bs        (I)  body-state subsystem
  blended   (V)  data streams blended at a subsystem
  buffer    (A)  move location of ICS buffer
  buffered  (V)  location of buffer in ICS configuration
  Config    (T)  structure of ICS configurations
  config    (V)  current configuration of ICS resources
  current   (V)  active query in MATIS
  data      (T)  data values that can appear on a query
  fields    (V)  the content of a slot on each query
  implic    (I)  implicational subsystem
  lim       (I)  limb-control subsystem
  mouse     (V)  stream of data from the mouse
  mpl       (I)  morphonolexical subsystem
  name      (T)  field names on a MATIS query
  prop      (I)  propositional subsystem
  qnr       (T)  identifier for queries on MATIS
  obj       (I)  object (structural) subsystem
  opt       (T)  optional values
  proc      (V)  is a transformation proceduralised?
  read      (A)  get visual data from MATIS display
  select    (A)  choose a value using the mouse
  speak     (A)  articulate a name-value pair
  speech    (V)  stream of data from speech recogniser
  trans     (A)  transformation of mental codes
  result    (V)  final assignment of values to slots
  vis       (I)  visual subsystem