
Accepted for publication in Human-Computer Interaction, 31 January 2005.

Automating Human-Performance Modeling at the Millisecond Level

Alonso H. Vera, NASA Ames Research Center & Carnegie Mellon University
Bonnie E. John, Carnegie Mellon University
Roger Remington, NASA Ames Research Center
Michael Matessa, NASA Ames Research Center
Michael A. Freed, NASA Ames Research Center

RUNNING HEAD: HUMAN PERFORMANCE MODELING

Corresponding Author's Contact Information:
Dr. Alonso H. Vera
Mail Stop 262-4
NASA Ames Research Center
Moffett Field, CA 94035

Brief Authors' Biographies:

Alonso Vera is a Cognitive Scientist with an interest in human performance modeling tools; he is faculty at Carnegie Mellon and a Senior Research Scientist at NASA Ames Research Center, where he leads the HCI Group. Bonnie John is an Engineer and Cognitive Psychologist with an interest in modeling as a usability assessment method; she is a Professor in the Human Computer Interaction Institute at Carnegie Mellon University. Roger Remington is a Cognitive Scientist with an interest in basic cognitive processes; he is a Senior Research Psychologist and heads the Cognition Group at NASA Ames Research Center. Michael Matessa is a Cognitive Scientist with an interest in communication and modeling; he is a Research Psychologist at NASA Ames Research Center. Michael Freed is a Computer Scientist with an interest in cognitive architectures and autonomy; he is faculty at the Institute for Human and Machine Cognition and a Senior Research Scientist at NASA Ames Research Center, where he leads the Intelligent Architectures group.

ABSTRACT

A priori prediction of skilled human performance has the potential to be of great practical value but is difficult to carry out. This paper reports on an approach that facilitates modeling of human behavior at the level of cognitive, perceptual, and motor operations, following the CPM-GOMS method (John, 1990). CPM-GOMS is a powerful modeling method that has remained underused because of the expertise and labor required. We describe a process for automatically generating CPM-GOMS models from a hierarchical task decomposition expressed in a computational modeling tool, taking advantage of reusable behavior templates and their efficacy for generating zero-parameter a priori predictions of complex human behavior. To demonstrate the process, we present a model of automated teller machine interaction. The model shows that it is possible to string together existing behavioral templates that compose basic HCI tasks (e.g., mousing to a button and clicking on it) in order to generate powerful human performance predictions. Because interleaving of templates is now automated, it becomes possible to construct arbitrarily long sequences of behavior. In addition, the manipulation and adaptation of complete models has the potential of becoming dramatically easier. Thus, the tool described here provides an engine for CPM-GOMS that may facilitate computational modeling of human performance at the millisecond level.

1. INTRODUCTION

Engineering design makes extensive use of computer simulation to explore the consequences of design options. Similarly, Human-Computer Interaction (HCI) has long recognized the potential value of having computer models of human performance that would allow anticipation of human responses to operational situations. Though the field is far from having a comprehensive model of human performance characteristics, several computational approaches have been successful in making accurate predictions of user choices as well as task completion times (e.g., Card, Moran, & Newell, 1983; Gray, John, & Atwood, 1993; Kitajima & Polson, 1995; Pirolli & Card, 1999; Young, Green, & Simon, 1989; Young & Whittington, 1990).

While some of the above efforts target specific classes of HCI behavior (e.g., label following on web pages), others try to provide a more general human model. These approaches combine theories from information processing psychology and cognitive science to model the flow of information from perception through cognition to action, incorporating known properties of human information processing (see Pew & Mavor, 1998). Architectures such as ACT-R (Anderson & Lebiere, 1998), Soar (Newell, 1990), and EPIC (Meyer & Kieras, 1997a, 1997b), which were developed to explore computational theories of the underlying psychology, are now being applied to more complex real-world tasks (e.g., Salvucci & Macuga, 2001; Tambe, Johnson, Jones, Koss, Laird, Rosenbloom, & Schwamb, 1995). Still other architectures, such as MIDAS (Laughery & Corker, 1994; Corker, 2000) and Omar (Deutsch, Adams, Abrett, Cramer, & Freehrer, 1993), were developed explicitly as engineering models to assist in understanding and designing complex human-machine systems, such as those found in military and aerospace operations.

Despite these efforts, there has been little penetration of user modeling into HCI design practice. While modeling tools are well informed by current theory and empirical work, and can fit some existing data well, there remain significant usability problems with the tools themselves that pose barriers to their use. Crafting computational cognitive models requires specialized expertise in cognitive psychology as well as extensive knowledge of the specific underlying architecture itself. Consequently, the activity of modeling human performance has been confined largely to researchers involved in developing particular modeling architectures. It is clear that before cognitive modeling can be used in engineering development environments, we will have to advance our techniques and tools to enable easier and faster model development, testing, and refinement. What limited penetration into engineering development has occurred has been largely with task analysis methods that make strong simplifying assumptions about human performance (e.g., Haunold & Kuhn, 1994; John & Kieras, 1996b).

One such class of simplifying assumptions was introduced in The Psychology of Human-Computer Interaction (Card, Moran, & Newell, 1983), where the authors described a computational method that could be used to make a priori predictions about how users would accomplish a given task with a specified interface. Called GOMS – an acronym for Goals, Operators, Methods, and Selection rules – the method involved decomposing a task into a set of nested goals and subgoals. A GOMS analysis assumes that users accomplish a task by executing operators that move them from one state in the goal space to another. This straightforward method of cognitive task analysis has proven useful in representing the procedural knowledge that characterizes tasks in many domains. GOMS achieves a reasonable level of simplicity and usability by assuming serial execution of the leaf-node activities of the hierarchical decomposition.

Card, Moran, and Newell (1983) also provided a simplified model of human information processing that embodied a set of assumptions (taken from theory and empirical data) about human perception, cognition, and motor behavior from which human performance predictions could be made. The approach was based on Bell and Newell's (1971) analysis of computer architectures, which abstracted over details of circuitry to describe the functional architecture of the computer and allowed black-box-like descriptions of the functions of particular aspects of the hardware. Similarly, Card et al. proposed that the human cognitive system could be described in terms of separate processors for perception, cognition, and motor, each with associated cycle times. They called this characterization of the human information processing architecture the Model Human Processor (MHP). Though GOMS and the MHP would not be integrated until later (John, 1988), the method embodied in GOMS and the structure of human mental processing outlined in the MHP have constituted one of the primary tools for applied cognitive modeling for almost two decades. Even when not used directly, their influence on the architectures of many applied modeling systems is clear (see Pew & Mavor, 1998, Chapter 3).

In this paper, we introduce a new tool for GOMS modeling at the MHP level that retains the accuracy of previous models while automating much of the difficulty in constructing such models. First we review GOMS modeling, to provide background on the successes our tool preserves and the difficulties it overcomes. We then describe the tool and how it automates those parts of the modeling process most difficult for human analysts. In Section 4, we present an example of using our tool to illustrate its current strengths and limitations. Finally, we conclude with a general discussion of how this tool fits into a larger picture of using computational cognitive modeling as a design method for interactive systems and future research toward that goal.

2. Background on GOMS

As described by Card, Moran, and Newell (1983), GOMS task decomposition into a nested goal hierarchy is relatively straightforward given a sufficient understanding of the task itself. The hierarchy is created by recursive deepening of goals until the desired level of granularity is achieved (i.e., the operator level). Sequences of behavior derive from methods in GOMS that specify the order in which subgoals and operators are executed to achieve a goal. The effect of context and user preference can be captured by selection rules that determine which method is executed and, hence, which behavioral sequence will emerge. Because of a simplifying assumption of serial execution of operators, task completion times can be computed in a straightforward way by assigning times for individual operator execution and summing them for the desired behavioral sequence. Furthermore, earlier work has demonstrated that GOMS can be applied to modeling not only user-driven tasks but highly interactive tasks as well (John, Vera & Newell, 1994; Vera & Rosenblatt, 1995).

Because GOMS can make good time-course approximations for procedural tasks and is relatively easy to learn and use, it is often taught in university courses. Most of the commonly used reference books and textbooks in HCI have at least several pages and worked examples of GOMS models (e.g., Brinck, Gergle, & Wood, 2002; Dix, Finlay, Abowd, & Beale, 1998; Eberts, 1994; Helander, Landauer, & Prabhu [Eds.], 1997; Newman & Lamming, 1995; Preece, Rogers, Sharp, Benyon, Holland, & Carey, 1994; Raskin, 2000; Shneiderman, 1998). Its use in research contexts as an applied HCI task analysis method is also widespread (e.g., Bovair, Kieras & Polson, 1990; Byrne, Wood, Sukaviriya, Foley, & Kieras, 1994; Gray, John & Atwood, 1993; Kieras, Wood & Meyer, 1997; Lerch, Mantei & Olson, 1989; Irving, Polson & Irving, 1994; Young & Whittington, 1990).

The description above characterizes the original formulation of GOMS, from which variants have emerged. John & Kieras (1996a) describe four varieties of GOMS modeling techniques. Three make the assumption that all operators occur in sequence and usually do not contain operators below the task activity level (e.g., type-string, move-and-click-mouse). These three are the original formulation by Card, Moran, and Newell (1980a, 1983), now termed CMN-GOMS; the Keystroke-Level Model (KLM), also formulated by Card, Moran, and Newell (1980b, 1983); and NGOMSL (Kieras, 1996).

Software tools providing computational support for some aspects of GOMS modeling have been developed. QGOMS (Beard, Smith, & Denelsbeck, 1996) allows modelers to draw a hierarchical goal decomposition in a tree diagram. GLEAN (Kieras, Wood, Abotel, & Hornof, 1995) allows modelers to program an NGOMSL model in a dedicated programming environment.

CATHCI (Williams, 1993) is a computer-based technique for eliciting GOMS models from domain experts. CRITIQUE (Hudson, John, Knudsen, & Byrne, 1999) allows the modeler to automatically generate a KLM and most of a GOMS model simply by demonstrating a task.

The fourth variant, called CPM-GOMS (John, 1988, 1990), allows parallel execution of Cognitive, Perceptual, and Motor processors, represented using the Critical Path Method, to achieve tighter estimates of performance for highly skilled users. The detail required for a CPM-GOMS model goes beyond the task-decomposition level supported by the software tools available for other GOMS methods. Human performance predictions are constructed from primitives explicitly based on estimates of the times for the elementary cognitive, motor, and perceptual operations. Activities such as typing a key or moving a mouse are modeled as an ordered set of cognitive, perceptual, and motor operators. Much of the power of CPM-GOMS to predict skilled behavior comes from its assumption of parallel operator execution. It models overlapping actions as the interleaving of cognitive, perceptual, and motor operators from neighboring elements in the behavior stream. By capturing the overlapping of perceptual, cognitive and motor operators for one task with those of subsequent tasks, it better approximates the smooth transitions between actions that characterize highly skilled human behavior.

CPM-GOMS has been shown to make very accurate a priori predictions of human performance in real-world task domains. An example is Project Ernestine, which predicted the outcome of a test of new computer consoles, saving a telephone company $2 million per year (Gray et al., 1993). Despite its success, there is no currently available software implementation of CPM-GOMS. As a result, the difficulty of constructing a CPM-GOMS model has been a significant barrier to its widespread use. Since the greater part of the difficulty lies in correctly capturing the intricate interplay of elementary perceptual, cognitive, and motor operators, great benefit would derive from a software tool that relieves the analyst of dealing with this level of detail, leaving only the relatively straightforward GOMS task decomposition.

To illustrate what makes CPM-GOMS modeling difficult, we continue this background section by detailing the structure of CPM-GOMS models and the knowledge-intensive and tedious procedure analysts use to produce them by hand. We then describe how that manual procedure is automated in a system called Apex-CPM (Section 3) and present an example model with comparisons to data (Section 4).

2.1 The Structure of a CPM-GOMS Model

A CPM-GOMS model combines the hierarchical task decomposition of CMN-GOMS with a detailed representation of the MHP-level operators required to achieve task goals. Whereas CMN-GOMS models typically stop expanding the goal hierarchy at a linear sequence of operators at the keystroke level (about 200 ms), CPM-GOMS models continue to expand the goal hierarchy to a more complex schedule of concurrent and sequential operator executions at the level of elementary cognitive, perceptual, and motor operators. These cognitive, perceptual and motor components are of very short duration – tens of milliseconds to hundreds of milliseconds – making their individual manipulation cumbersome for any but the shortest tasks to be modeled.

In response, it has become standard practice to construct templates that describe the cognitive, perceptual, and motor sequences underlying commonly recurring task-level activities in HCI, such as mouse moving-and-clicking or typing, which range from a fraction of a second up to several seconds (e.g., John & Gray, 1992; Gray & Boehm-Davis, 2001). Because they describe recurring interface activities, templates can be transferred from one application to another, often with no modification. Extended behavioral sequences can be created by stitching together strings of such templates. Templates currently exist for several common HCI activities, including typing, visually acquiring information from a screen (with or without eye-movements), pressing a single key, having a short conversation, and so forth. These templates were not implemented in any computational architecture but rather distributed as text and graphic descriptions across journal papers, conference papers, and tutorial materials in the HCI literature (e.g., John, 1996; John & Gray, 1992; Gray & Boehm-Davis, 2001).

Templates are typically represented as PERT charts (Program Evaluation Review Technique; US Navy PERT Summary Reports, 1958; Modell, 1996), which depict the flow of activity in the cognitive, perceptual, and motor processors over the time needed to accomplish the activity. Figure 1 shows a template, in PERT chart format, adapted from Gray and Boehm-Davis (2001) that models a person moving a mouse to and clicking on a target. Each row in Figure 1 depicts a resource stream, which from top to bottom are: World events, Perception, Cognition, and two separate motor resources, Right-Hand and Eye-movements. Each box represents an operator in the respective resource stream with its duration (in milliseconds) at the lower left. The width of each box is proportional to the duration of the operator, so this representation is also a timeline of operation; this timeline representation is not standard in PERT charts, but provides significant value in visualizing the activity of a model.

Figure 1. Model of carefully moving the cursor to a target and clicking the mouse button. (Adapted from Gray & Boehm-Davis, 2001)

Boxes are connected by lines, which represent dependencies on their execution. For example, the cursor must be moved to the target location before the mouse button can be clicked. Thus, the right-hand motor operator mouse-down must wait for the right-hand motor operator move-cursor to complete. With this in mind, it can be seen that this template, called Slow-Move-Click by Gray & Boehm-Davis, is composed of movements of the mouse done with the right hand (move-cursor, mouse-down), movements of the eyes to the target, as well as cognitive and perceptual activities that localize the target (attend-target, perceive-target, verify-target-position) and verify that the cursor is over the target before clicking the mouse button (attend-cursor-at-target, perceive-cursor-at-target, verify-cursor-at-target). The convention in CPM-GOMS, derived from the MHP, is to precede every motor action by a cognitive operator that initiates the action. Thus, move-cursor, eye-move, and mouse-down are preceded by cognitive initiate operators (init-move-cursor, init-eye-move, and init-click).

Two concepts related to scheduling are useful in understanding CPM-GOMS models: the critical path and slack time. In any CPM-GOMS model, be it of a single template or a total task, there is a critical path composed of those processes whose durations directly influence the total duration. In Figure 1, the critical path (depicted by the thicker box outlines and connecting lines) is determined in large part by the move-cursor operator and the subsequent operators that depend on its completion. Slack time occurs in a resource stream when all subsequent operators in that resource depend on the completion of an operator in another resource that has not yet occurred, resulting in a gap in the use of that resource. This can be seen in Figure 1 in the cognitive stream between init-eye-move and verify-target-position, and between attend-cursor-at-target and verify-cursor-at-target. The presence of slack time, or more precisely the lack of activity in resource streams, sometimes creates the opportunity for operators from subsequent templates to execute (Section 2.2). Both the critical path and slack time are properties of a model that emerge only after all operators and dependencies of a task have been scheduled. This fact will become important when we discuss the differences between CPM-GOMS modeling by hand and automatic CPM-GOMS modeling in Section 3.

2.2 Building CPM-GOMS Models by Hand

Building a CPM-GOMS model of a task of any length by hand is a tedious and error-prone job. We report elsewhere on the many mundane sources of tedium and error (John, Vera, Matessa, Freed, & Remington, 2002). However, there are also substantive conceptual difficulties associated with interleaving CPM-GOMS templates.

To build a CPM-GOMS model, the analyst first produces a CMN-GOMS goal hierarchy that expands to the level of the names of templates. The analyst then finds these templates in a template library (John & Gray, 1992) and lays templates end-to-end in a project management tool (MacProject™). The templates are locked into sequential order by making the first operator of a template dependent on the completion of the last operator of its predecessor (see Figure 2a).

This produces a model that performs overt motor actions in the correct task order, but predicts behavior that contains substantial slack time and is too slow to match highly skilled human behavior. The analyst then has to interleave the templates as much as possible to model highly skilled behavior.

In order to begin interleaving the templates, the analyst first identifies opportunities for cognitive operators to move forward from subsequent templates by looking for slack time in the earlier templates. Slack time occurs within templates because of dependencies that cause one process to wait for the completion of another. For example, in Figure 1, verify-cursor-at-target must wait for perceive-cursor-at-target. This results in the cognitive resource stream sitting idle until the process in the perceptual stream completes. For an opportunity for interleaving to exist, the duration of the slack time has to be at least 50 ms, the duration of a cognitive operator. Figure 2a shows two areas of slack time in the first template (white) that are opportunities for interleaving: between the init-move-cursor and the verify-targ-pos and between the verify-targ-pos and the init-click cognitive operators. If the slack time is so large that multiple cognitive operators would fit, multiple operators can be considered for interleaving into that slack time. In Figure 2a, all the slack times are large enough for multiple operators to potentially interleave.

Figure 2. Two consecutive templates before (a) and after (b) interleaving, with the critical path in bold.
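Because a CPM-GOMS model is, at bottom, a network of operators with durations and dependencies, both the critical path and the slack gaps that create interleaving opportunities can be computed mechanically. The following Python sketch is purely illustrative: the operator names, durations, and dependency links are a condensed stand-in for a template like Figure 1, not the actual values behind Figures 1 and 2.

from collections import defaultdict

# Illustrative operators: name -> (resource, duration in ms, dependencies).
# Dependencies stand in for both logical waits and serial order on a resource.
OPS = {
    "attend-target":     ("cognition",   50, []),
    "init-move-cursor":  ("cognition",   50, ["attend-target"]),
    "move-cursor":       ("right-hand", 530, ["init-move-cursor"]),
    "init-eye-move":     ("cognition",   50, ["init-move-cursor"]),
    "eye-move":          ("eyes",        30, ["init-eye-move"]),
    "perceive-target":   ("vision",     100, ["eye-move"]),
    "verify-target-pos": ("cognition",   50, ["perceive-target"]),
    "init-click":        ("cognition",   50, ["verify-target-pos", "move-cursor"]),
    "mouse-down":        ("right-hand", 100, ["init-click"]),
}

# Earliest-start schedule (OPS is listed in dependency order).
start, finish = {}, {}
for name, (_, dur, deps) in OPS.items():
    start[name] = max((finish[d] for d in deps), default=0)
    finish[name] = start[name] + dur

# Critical path: walk back from the last operator along its gating predecessor.
node = max(finish, key=finish.get)
path = [node]
while OPS[node][2]:
    node = max(OPS[node][2], key=finish.get)
    path.append(node)
print("total time:", max(finish.values()), "ms")
print("critical path:", " -> ".join(reversed(path)))

# Slack: idle gaps of 50 ms or more in a resource stream are interleaving opportunities.
busy = defaultdict(list)
for name, (res, _, _) in OPS.items():
    busy[res].append((start[name], finish[name]))
for res, intervals in busy.items():
    intervals.sort()
    for (_, f1), (s2, _) in zip(intervals, intervals[1:]):
        if s2 - f1 >= 50:
            print(f"{res}: {s2 - f1} ms of slack from t={f1} to t={s2}")

In this toy network the long move-cursor operator dominates the critical path, and the cognitive stream shows gaps well over 50 ms, exactly the kind of slack into which cognitive operators from a following template could be interleaved.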

Once an opportunity is identified, the analyst searches for a candidate cognitive operator to move forward. In Figure 2a, some candidates in the second template (gray) are attend-target, init-eye-move and init-move-cursor. Each cognitive operator in the next template is considered in turn and rejected as a candidate if it is logically dependent on any operators in its own template. For example, init-eye-move would be rejected as a candidate because of a modeling idiom in the MHP stating that an eye movement to a target cannot be initiated until the intention to attend to that target has occurred (the attend-target cognitive operator). On the other hand, the init-move-cursor operator in the gray template is not dependent on anything within that template, so it would not be rejected by this criterion. Likewise, the attend-target operator in the gray template is not logically dependent on any operator within the template; there is no logical restriction on whether the eyes start to move first or the hand starts to move first in a mouse movement. Therefore, the attend-target operator would also remain a candidate under this criterion. The determination of which operator is logically dependent on which other operator in a specific template requires an understanding of the psychology embedded in the template construction. Note that since all the slack time is over 100 ms, the combination of attend-target and init-eye-move could be considered as a pair for interleaving as long as they maintain their dependency order as described above.

Once a candidate operator satisfies the above criterion, the analyst evaluates whether the operator is dependent on any operators in the previous template that occur later than the slack time. For example, consider moving the init-move-cursor operator from the gray template into the rightmost slack time of the white template. To fit transcription typing data, the TYPIST model (John, 1996) requires that a motor operator on one hand complete before the cognitive operator initiating the next movement can begin, if the next movement is on the same hand. Extending this modeling result to mouse movements (an assumption that has yielded good fits to data, as will be shown in Section 4), if the previous template ends with a motor operator on the right hand and the candidate operator is an init-move-cursor for the right hand, this previously established MHP modeling idiom results in the rejection of the init-move-cursor as a candidate for interleaving. Thus, init-move-cursor cannot be moved into the white template at all because it must wait for the completion of the mouse-up; a dependency line is drawn from mouse-up (white) to init-move-cursor (gray) to denote this relationship.

As another example, imagine attend-target and init-eye-move from the gray template advancing to between init-move-cursor and verify-targ-pos in the white template. Because nothing is constraining the gray eye-move operator from moving forward, it too could move to between init-move-cursor and verify-targ-pos. The resulting model would have an eye movement happening in the middle of a perceive-target perceptual operator. This model does not make sense because the eye would be moving away from the very information it was currently perceiving. Therefore, the attend-target and init-eye-move cannot move that far forward into the white template.
However, attend-target and init-eye-move can move into the slack time between verify-targ-pos and the init-click because the perception is concluded by that time. Figure 2b shows the attend-target and init-eye-move in this new interleaved position.

The effect of this interleaving is that the eye moves to the second target sooner, the second target is perceived sooner, and the critical path shortens from 1554 ms to 1209 ms.

If the potential movement of a candidate operator across templates does not violate the assumptions imposed by the application of MHP-level psychological findings, the analyst moves it into the slack time and then examines the resulting critical path. If this interleaving causes the order of physical actions to change such that the task is no longer executed correctly, the operator is moved back to its original place and another candidate is considered. Although not demonstrated in Figure 2, this situation is prevalent in models involving two hands, e.g., the order of typed characters reverses, or a mouse click occurs before a string is typed by the other hand. The analyst has to catch the error and undo the interleaving that produced it.

As evident from the description above, the interleaving process is a difficult one. It requires a great deal of knowledge of cognitive psychology, MHP idioms, and an intimate knowledge of the task being modeled, as well as attention to the details of an iterative process where many intermediate states might fail and have to be undone. The interleaving process is very difficult to explain, teach, or write about, resulting in the correct perception that CPM-GOMS modeling is more art than science. Clearly, CPM-GOMS will not become widespread in HCI practice until some of these difficulties are resolved, perhaps with computational support like the system we are proposing here.
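The screening of candidate operators just described boils down to three checks: the slack must be long enough to hold a cognitive operator, the candidate must not still be waiting on anything in its own template, and nothing in the earlier template that must precede the candidate may finish after the slack begins. A minimal Python sketch of that filter follows; the Gap type and the example times are invented for illustration and are not taken from Figure 2.

from dataclasses import dataclass

@dataclass
class Gap:
    start: int    # ms at which the idle period in the earlier template begins
    length: int   # ms of idle time in that resource stream

COGNITIVE_OP_MS = 50

def can_interleave(gap, unmet_own_deps, blocking_finish_times):
    """True if a cognitive operator from the next template may move into gap.
    unmet_own_deps: operators in the candidate's own template it still waits for.
    blocking_finish_times: finish times of operators in the earlier template that
    must precede the candidate (e.g., a same-hand motor operator)."""
    if gap.length < COGNITIVE_OP_MS:      # the gap must fit a cognitive operator
        return False
    if unmet_own_deps:                    # within-template dependencies come first
        return False
    return all(t <= gap.start for t in blocking_finish_times)

# Hypothetical case: init-move-cursor for the right hand cannot advance past a
# mouse-up on the same hand that finishes after the gap begins.
print(can_interleave(Gap(start=580, length=300), [], [680]))   # False
# attend-target, with no such cross-template constraint, could use the same gap.
print(can_interleave(Gap(start=580, length=300), [], []))      # True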

3. Implementing CPM-GOMS in Apex

Apex is a software system for resource scheduling and plan execution (Freed, 1998).1 It was designed to simulate an agent deciding how best to allocate its limited resources to accomplish a set of tasks. Apex includes a Resource Architecture that can be used to represent the cognitive, perceptual, and motor resources necessary to carry out tasks, and an Action Selection Architecture that determines how those resources will be allocated. The Action Selection Architecture implements a procedure-based reactive planner that represents plans at multiple levels of abstraction, committing resources only at execution time. This deferment of resource allocation creates a nested set of intermediate plans that specify abstractly what needs to be accomplished at the next level of plan detail. As with a GOMS goal hierarchy, plans are recursively deepened until execution time, when primitive plans are allocated resources. Together, the Resource Architecture and Action Selection Architecture provide the necessary mechanisms for implementing the constraints needed to produce CPM-GOMS templates and accomplish their interleaving.

The Resource Architecture in Apex is flexible and supports a direct representation of MHP processors. Apex treats all resources as equal and unary, providing a mechanism to enforce resource use by only one activity at a time.

1 http://human-factors.arc.nasa.gov/apex/index.html

Apex-CPM, the system presented in this paper, uses a subset of the features of Apex specifically for creating CPM-GOMS models. Apex-CPM repurposes Apex's resources to map to the MHP's cognitive (Apex's memory), perceptual (Apex's vision), and motor processes (Apex's gaze and hands). As in other cognitive architectures (EPIC, ACT-R), sensory and motor functions are not modeled in detail, but simplified as needed to provide timing estimates and the interconnection of resource usage with cognitive operations. As a general simulation platform, Apex allows the addition of other resources, unassociated with the MHP. As discussed below, we take advantage of this in order to create a "virtual resource" mechanism that allows the modeler control over the template interleaving process.

All behavior in Apex is generated by the execution of activities, which are derived from procedures that represent the "how to" knowledge within a domain. The hierarchical goal structure of a GOMS model is expressed in Apex as a nested set of procedures in its Procedure Description Language (PDL). In PDL, a procedure consists of a number of steps. PDL steps are themselves procedures that can be further decomposed hierarchically into procedures of simpler steps, until those steps bottom out in primitive activities that occupy resources. Procedures and the procedure expansions do not occupy resources. Because procedures are the only executable data structures in Apex, goals and methods in GOMS are represented as procedures. The PDL language is similar to the implementation of NGOMSL in GLEAN (Kieras et al., 1995). However, PDL is closer to CMN-GOMS and CPM-GOMS in that it assigns no time to goal manipulation, only the execution of operators (primitive activities).

3.1 Representing a CPM-GOMS Template in Apex PDL

Figure 3 shows the PDL expression of the CPM-GOMS template corresponding to the PERT chart in Figure 1. The procedure for the template consists of an index statement followed by a series of steps. Each step in a template is a primitive activity equivalent to a GOMS operator. A primitive activity requires the use of one or more resources, typically the cognitive resource, one or more hand resources, or a perceptual resource. To make the template human-readable, the steps are conventionally labeled "c" for cognitive, "p" for perceptual, "m" for motor, or "w" to denote an activity in the world. The index clause serves both as the name of the procedure and as the unique identifier used by the Action Selection Architecture to access the procedure. For example, the procedure in Figure 3 will be activated by a step in a higher procedure of the following format:

(step s1 (slow-move-click menu))

Given the step invocation above, "menu" would be bound to the variable ?target. Likewise, each step in the (slow-move-click ?target) procedure is linked to a procedure whose index clause matches the step clause. Once (slow-move-click ?target) is called from another procedure, with a variable to bind to its argument, the corresponding procedures for each of its steps can become enabled. An enabled procedure is referred to as an activity. Procedures in an enabled activity themselves become enabled when their preconditions are satisfied.

In Figure 3, these preconditions are established by use of the waitfor clause. For example, the move-cursor motor operator (step m1) must wait for the initiate-move-cursor cognitive operator (step c1) to complete before its procedure can become enabled. The waitfor clause can specify preconditions based on the state of other procedures, the results of other procedures, or events in the simulation world. Each enabled activity will execute if it does not require resources, or if no resource conflicts exist. Resource conflicts in Apex are resolved by the setting of explicit priorities (described in Section 3.3).

(procedure
  (index (slow-move-click ?target))
  (step c1 (initiate-move-cursor ?target))
  (step m1 (move-cursor ?target)               (waitfor ?c1))
  (step c2 (attend-target ?target))
  (step c3 (initiate-eye-movement ?target)     (waitfor ?c2))
  (step m2 (eye-movement ?target)              (waitfor ?c3))
  (step p1 (perceive-target-complex ?target)   (waitfor ?m2))
  (step c4 (verify-target-position ?target)    (waitfor ?c3 ?p1))
  (step w1 (WORLD new-cursor-location ?target) (waitfor ?m1))
  (step c5 (attend-cursor-at-target ?target)   (waitfor ?c4))
  (step p2 (perceive-cursor-at-target ?target) (waitfor ?p1 ?c5 ?w1))
  (step c6 (verify-cursor-at-target ?target)   (waitfor ?c5 ?p2))
  (step c7 (initiate-click ?target)            (waitfor ?c6 ?m1))
  (step m3 (mouse-down ?target)                (waitfor ?m1 ?c7))
  (step m4 (mouse-up ?target)                  (waitfor ?m3))
  (step t1 (terminate)                         (waitfor ?m4)))

Figure 3. PDL code for the CPM-GOMS template shown in Figure 1.

Notice that in Figure 3 neither c1 (initiate-move-cursor) nor c2 (attend-target) waits for the completion of any step in this template. This is theoretically appropriate because, when selecting a target with a mouse, a skilled user can start to point before she starts to look at the target, or start to look before she starts to point. Thus, the model is indifferent to the order of execution of these activities, and the PDL code enforces no dependency between these two cognitive operators. The order of execution in this case will be determined by Apex's Action Selection Architecture. A computational perspective on the implementation of Apex-CPM mechanisms is provided in Freed, Remington, Matessa, and Vera (2002).

3.2 Building CPM-GOMS Models in Apex-CPM

Building CPM-GOMS models in Apex-CPM has similarities to and differences from the procedure for building CPM-GOMS models by hand. This section outlines these similarities and differences. Section 3.3 details the mechanisms underlying Apex-CPM's implementation of CPM-GOMS models.


The procedure for building a CPM-GOMS model by hand, described in Section 2.2, is summarized below so that we can compare it, point by point, to the procedure for using Apex-CPM. The h preceding each number indicates that it is a step in the hand-coded modeling procedure.

h1 Construct a CMN-GOMS model, expanding the task hierarchy down to operators that are the names of templates.

h2 String templates together in correct order as dictated by the task.

h3 Identify sufficient slack time into which cognitive operators might be moved.

h4 Bring knowledge of psychology to bear toward identifying candidate cognitive operators in a template that might move forward into an earlier template without violating necessary dependencies within their own template.

h5 Bring knowledge of psychology to bear toward identifying candidate cognitive operators in a template that might move forward into an earlier template without violating dependencies within that earlier template.

h6 Finally, bring task knowledge to bear to identify errors in the overt physical operators of the resulting model.

The corresponding Apex-CPM mechanisms that implement each hand-coded CPM-GOMS modeling heuristic are summarized below and detailed in the next section. The a preceding each number indicates that it is a mechanism of the automated CPM-GOMS modeling process.

a1 Using PDL, construct a hierarchical CMN-GOMS model terminating in steps that are calls to templates.

a2 Specify the order in which the templates need to be executed to yield correct task performance using the rank mechanism (detailed in Section 3.3).

a3 Apex schedules operators incrementally, as the required resource becomes available and the operator's dependencies are satisfied. The concept of slack time in hand-constructed models therefore corresponds to a resource being available during scheduling. An operator from a lower-ranked template (i.e., a template that comes later in the schedule) can be scheduled to use a resource if an operator from a higher-ranked template (i.e., an earlier template in the schedule) is not using that resource.

a4 Within-template dependencies are implemented with the waitfor mechanism. An operator from a lower-ranked template cannot be interleaved into a higher-ranked template if it has to wait for an operator within its own template that has not yet been completed.


a5 Operator relationships across templates are established during scheduling using virtual resources. A virtual resource (detailed in Section 3.3) is a mechanism that can be used to block specific operators in the cognitive, perceptual, and motor resource streams from being executed even when that resource is available. The virtual resource mechanism is used such that blocks are started and stopped within templates, based on the specifics of each template, but their effects are manifest across templates.

a6 The waitfor and virtual resource blocking mechanisms work together to prevent the scheduling of overt task actions in the incorrect task order.

The next section provides a detailed description of the role of the goal hierarchy, the rank mechanism, and virtual resources in managing the interleaving process across templates in the context of an example. These are the critical mechanisms underlying the automation of CPM-GOMS.

3.3 Using Apex-CPM

The previous section summarized the process of creating a CPM-GOMS model using Apex-CPM. Here we present an example to illustrate two important aspects of such models: the model in program form (John & Kieras, 1996) as specified by a human analyst, and the runtime behavior of Apex-CPM that translates that model into a specific prediction of behavior.

The first step in creating a CPM-GOMS model in Apex-CPM is to express the CMN-GOMS goal hierarchy in PDL (a1, above). Consider the task of using an ATM to withdraw $80 from a bank account. A simple goal hierarchy might be that to accomplish this banking task, the user must complete three subgoals: initiate the session, do the transaction, and end the session. Each of these subgoals in turn expands into a series of subgoals until they bottom out in calls to templates, as explained in Section 3.2. Figure 4 shows PDL code that represents the top of this goal hierarchy and the partial expansion of the subgoal to do a withdrawal transaction.2 The expansion of the goal hierarchy stops with calls to the templates. The templates themselves expand into cognitive, perceptual and motor operators with dependencies as discussed in Section 3.2 and shown previously in Figure 3.

2 Because the simple goal hierarchy in Figure 4 has no conditionals and no strict waitfor dependencies, we can use a simpler syntax than that shown in Figure 3 for the slow-move-click template. The step label (e.g., "step c1") is assumed by Apex and only the action of each step need appear in the list of subgoals or templates that make up a procedure.

The top of the goal hierarchy for withdrawing money from an ATM:

(procedure :ranked
  (index (do banking))
  (initiate session)
  (do transaction)
  (end session))

Each of the three subgoals above is expressed as its own PDL procedure, indexed with the name of the subgoal. For example, the initiate session and do transaction subgoals are expanded below.

(procedure :ranked
  (index (initiate session))
  (insert card)
  (enter password))

(procedure :ranked
  (index (do transaction))
  (choose withdraw)
  (choose account)
  (enter amount)
  (retrieve money))

Procedures bottom out in calls to templates. For example, the insert-card and retrieve-money subgoals both bottom out in a call to the slow-move-click template. Enter-password bottoms out in a call to get-PIN (a cognitive operator that recalls the PIN) and four calls to fast-move-click templates to enter that PIN.

(procedure :ranked
  (index (insert card))
  (slow-move-click card-slot))

(procedure :ranked
  (index (retrieve money))
  (slow-move-click money-slot))

(procedure :ranked
  (index (enter password))
  (get-PIN)
  (fast-move-click 4)
  (fast-move-click 9)
  (fast-move-click 0)
  (fast-move-click 1))

Finally, PDL expresses primitive activities (equivalent to GOMS operators) that make up the templates. The profile clause declares which resource is used by the activity for its :duration, as can be seen in the PDL code for get-PIN below.

(procedure
  (index (get-PIN))
  (profile memory)
  (step s1 (start-activity memory memory-act :duration 50 => ?a))
  (step t (terminate) (waitfor (completed ?a))))

Figure 4. Part of a PDL representation of a model to withdraw $80 from an ATM. (Note: this model is of a person withdrawing money from a VisualBasic mock-up of an ATM rather than of a physical banking machine. Therefore, a mouse-based template is called to provide the lowest-level operators rather than templates that physically insert a card, hit buttons, or remove money from a slot.)
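The CMN-GOMS layer of Figure 4 can be viewed as ordinary hierarchical data that expands, depth first, into the ranked sequence of template calls that Apex-CPM will later interleave at the operator level. The Python sketch below mirrors the procedures of Figure 4 (names follow the figure; the choose-withdraw, choose-account, enter-amount, and end-session branches are left unexpanded, as they are in the figure). It is a paraphrase in another notation, not part of Apex.

# Strings are subgoals to expand further; tuples are template calls.
HIERARCHY = {
    "do banking":       ["initiate session", "do transaction", "end session"],
    "initiate session": ["insert card", "enter password"],
    "do transaction":   ["choose withdraw", "choose account", "enter amount",
                         "retrieve money"],
    "insert card":      [("slow-move-click", "card-slot")],
    "enter password":   [("get-PIN",), ("fast-move-click", "4"),
                         ("fast-move-click", "9"), ("fast-move-click", "0"),
                         ("fast-move-click", "1")],
    "retrieve money":   [("slow-move-click", "money-slot")],
}

def expand(goal):
    """Depth-first expansion of a goal down to its template calls."""
    calls = []
    for step in HIERARCHY.get(goal, []):
        if isinstance(step, tuple):     # bottomed out in a template call
            calls.append(step)
        else:                           # a subgoal: recurse
            calls.extend(expand(step))
    return calls

print(expand("do banking"))
# [('slow-move-click', 'card-slot'), ('get-PIN',), ('fast-move-click', '4'), ...,
#  ('slow-move-click', 'money-slot')]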

The second step in creating a CPM-GOMS model in Apex-CPM (a2) is for the analyst to specify the order in which the operators of the bottom-level templates should be executed. To do this, the analyst puts the word :ranked on the top line of each procedure. This tells Apex-CPM that the steps in the procedure are ranked in order of their appearance. For example, initiate session is ranked higher than do transaction, which is ranked higher than end session. The ranking of a procedure is used when there is a competition for resources, as will be described below.

The third step (a3) refers to what Apex-CPM does with the analyst's PDL code rather than any action by the analyst. At runtime, Apex-CPM's action-selection architecture determines which activities (operators) are eligible to be executed. If an activity's preconditions are satisfied, e.g., if all the activities it is waiting for have been completed (a4), then it is enabled. Once enabled, its resource demands are assessed. If there is a conflict between two activities, Apex-CPM assigns the contended resource to the activity with the higher stated rank, and suspends the lower-ranked activity. For example, if init-eye-move in the slow-move-click to the card-slot (which is in service of initiate session) and init-eye-move in the slow-move-click to the money-slot (which is in service of do transaction) are both enabled, they contend for the cognitive resource. Apex-CPM will assign the resource to init-eye-move to the card slot because it is in service of the higher-ranked procedure, initiate session. Once init-eye-move to the card slot is complete, freeing the cognitive resource, Apex-CPM reactivates init-eye-move to the money slot and assigns the resource if no other higher-ranked activity is enabled. Thus, by assigning a higher rank to templates earlier in the sequence it is possible to ensure the proper overall ordering of templates while allowing their component activities to move forward as permitted by interleaving.

The by-hand heuristic h3, however, says not only that a resource must be free, but that the resource must be free for a sufficient period of time (i.e., sufficient slack time). When done by hand, the entire task schedule must be defined in order to establish slack time. Apex, however, uses a greedy scheduler that operates on the fly. Therefore, there is no exact correspondence to slack time, only whether a resource is free or not at any given point in time. Thus, Apex would allocate resources to any operator enabled to use a free resource without judging whether there is sufficient time to complete this operator, and operators from later templates would be interleaved in ways that no analyst would produce by hand. However, Apex-CPM's use of the rank mechanism also solves this problem. If the Action Selection Architecture allocates a resource to an operator, it begins to execute. If an operator with a higher rank that also uses that resource then becomes enabled, the action-selection architecture interrupts the executing operator and reallocates the resource to the operator with the higher rank. The interrupted operator merely goes back onto the agenda and waits for the resources it requires to be free again. A detailed trace of the Apex-CPM run shows this operator partially executing, being interrupted, and appearing later.
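To make the rank-and-waitfor logic concrete, here is a toy time-stepped scheduler in Python written in the spirit of the Action Selection Architecture described above. It is not Apex code: the operator set is a drastically reduced slice of the ATM model, the names are condensed, and the durations are illustrative. Preempted operators here simply lose their progress and restart later.

OPS = {
    # name: (resource, duration in ms, rank, waitfors)
    "attend-card-slot":    ("cognition",   50, 1, []),
    "init-move-card-slot": ("cognition",   50, 1, []),
    "move-to-card-slot":   ("right-hand", 500, 1, ["init-move-card-slot"]),
    "get-PIN":             ("cognition",   50, 2, []),
    "attend-4-key":        ("cognition",   50, 3, []),
}

DT = 50                                  # ms per simulation step
completed, running, log = set(), {}, []  # running: resource -> [name, remaining]
t = 0
while len(completed) < len(OPS):
    # 1. Retire operators whose time is up, freeing their resources.
    for res in list(running):
        name, remaining = running[res]
        if remaining <= 0:
            completed.add(name)
            del running[res]
    # 2. Give each resource to the highest-ranked enabled operator that wants it.
    for name, (res, dur, rank, waits) in OPS.items():
        busy_here = any(name == entry[0] for entry in running.values())
        if name in completed or busy_here:
            continue
        if not all(w in completed for w in waits):
            continue                                   # waitfor not yet satisfied
        if res not in running:
            running[res] = [name, dur]
            log.append((t, name))
        elif rank < OPS[running[res][0]][2]:           # preempt a lower-ranked operator
            running[res] = [name, dur]                 # the loser goes back on the agenda
            log.append((t, name + " (preempts)"))
    # 3. Advance simulated time.
    for entry in running.values():
        entry[1] -= DT
    t += DT

for when, name in log:
    print(f"t={when:4d} ms  start {name}")

Run as written, the sketch shows get-PIN (rank 2) claiming the cognitive resource as soon as the rank-1 cognitive operators are done, and attend-4-key (rank 3) following right behind it even though the model is still engaged with the card slot; the virtual resources described below exist precisely to block that second, inappropriate interleaving.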
We do not believe that skilled people actually start operators and interrupt them; we believe they have learned an efficient sequence of operators through experience.


Thus, we do not claim that the millisecond-by-millisecond behavior of Apex-CPM accurately reflects the behavior of people, merely that the resulting PERT charts, minus the interrupted operators, are a good representation of human behavior.

Allowing interleaving with the use of the rank mechanism introduces a new problem: how to prevent theoretically incorrect interleaving (heuristic h5 above). Apex-CPM solves the problem by providing a mechanism whereby templates can specify when resources are available for use by other templates, even if those resources are not actively being used by the current template. The need for such specification arises, for example, where a template (T1) might not use a resource continuously, but it would not be desirable for that resource to be used by another template (T2) until T1 is completely finished with it. We discussed an example of this problem when we explained modeling by hand (Section 2.2), when we considered moving the attend-target and init-eye-move from the gray template into the earlier white template by hand. This candidate interleaving was rejected by the human analyst because, although neither the cognitive resource nor the eye-movement resource was being used, this would result in an eye movement before perception is complete. Just like the human analyst, Apex-CPM also "considers" assigning the cognitive resource to attend-target at this point in its execution because the cognitive resource is free and attend-target is enabled. We next illustrate this problem (Figure 5) and its solution (Figure 6) using a mechanism to block resource use with virtual resources that allows Apex-CPM to produce valid psychological models.

Figure 5 illustrates this problem in the context of the ATM withdrawal task. In its upper left quadrant, Figure 5 lists the first 20 operators in the ATM task coming from the insert-card procedure (slow-move-click card-slot) and the enter-password procedure (get-PIN and fast-move-click 4-key) shown in Figure 4. Following those operators across to the upper right quadrant, Figure 5 indicates which operators Apex considers allocating resources to at each 50 ms slice in time. The operators selected by Apex's Action Selection Architecture using only its knowledge of the completion of waitfors, free resources, and rank are indicated by tall rectangles in the upper right quadrant. Following the columns down to the lower right quadrant, Figure 5 shows the familiar PERT chart representation of the resulting behavior of the model. We now walk through the decisions Apex's Action Selection Architecture makes to produce this behavior.

When Apex begins executing the model code shown in Figure 4, it enables all the procedures that do not have unsatisfied waitfors. In particular, Apex enables the initiate-session procedure, its two children (insert-card and enter-password) and their children, which include slow-move-click card-slot, get-PIN, and fast-move-click 4-key, shown in Figure 5. When slow-move-click card-slot is enabled, both initiate-move-cursor card-slot and attend-target card-slot are enabled because neither must waitfor anything else in the template (Figure 3). As shown by the thin rectangles in Figure 5, get-PIN, initiate-move-cursor 4-key and attend-target 4-key are also enabled at this point in time and Apex considers assigning the cognitive resource to all five of these operators.

Because initiate-move-cursor card-slot and attend-target card-slot have a rank of 1, Apex eliminates the other operators from consideration, and assigns the resource randomly to attend-target card-slot. When attend-target card-slot completes, initiate-eye-movement card-slot is enabled because its waitfor is now satisfied. The completion of attend-target card-slot also releases the cognitive resource, so Apex considers assigning it to the five enabled operators shown in Figure 5. Again, there are two operators with rank 1, initiate-eye-movement card-slot and init-move-cursor card-slot, and again Apex assigns the resource at random, this time to initiate-eye-movement card-slot. When initiate-eye-movement card-slot completes, it frees the cognitive resource for init-move-cursor card-slot and enables eye-movement card-slot because its waitfor is now satisfied (Figure 3). Since eye-movement uses the ocular-motor resource and that resource is free, Apex assigns that resource to eye-movement and it begins execution in parallel with init-move-cursor. Perceive-target card-slot executes on the vision resource as soon as eye-movement is complete. Thus, the execution of these first few operators is straightforward and sensible, using only Apex's standard operating mechanisms.

As soon as initiate-move-cursor card-slot is finished using the cognitive resource, that resource is idle and Apex's greedy scheduling algorithm searches for other enabled operators that can be assigned to it. As shown by Figure 5's thin rectangles, get-PIN, initiate-move-cursor 4-key and attend-target 4-key have all been enabled since the beginning of the run, and now contend for the cognitive resource. Since get-PIN has a rank of 2 and the others are of rank 3, Apex assigns the cognitive resource to get-PIN. This is an example of appropriate interleaving behavior; it violates neither psychological research nor task logic. This interleaving is Apex's equivalent of the user recalling her PIN as she inserts her card into the bank machine.

However, when get-PIN completes, the cognitive resource is free again and Apex considers the enabled operators initiate-move-cursor 4-key and attend-target 4-key. Both have rank 3, so Apex chooses randomly, assigning the cognitive resource to attend-target 4-key. This begins a chain of assignments (t=200 to t=300) that initiates an eye-movement to the 4-key and moves the eye to the 4-key while perceive-target card-slot is still executing and well before perceive-cursor-at-target card-slot can even begin. Thus, the eye is moved away from the card-slot before the user has completed the perception necessary to click on it. Similar interleaving occurs to allow the hand to move the cursor away from the card-slot before clicking on it (t=250 to t=900).

Thus, in Figure 5, the model's behavior is a mixture of plausible interleaving and nonsense. On the one hand, the purely cognitive operator, get-PIN, interleaves to an early location, as soon as the cognitive processor is free. This makes intuitive sense; the user thinks about her PIN as she inserts her banking card. On the other hand, the eyes move away from the card-slot before verifying that the cursor has reached it and the hand moves the cursor away from the card-slot before clicking on it, violating psychological knowledge, MHP modeling idioms, and task logic. It is clearly a sequence of actions no skilled user of this device would perform.

To solve this problem, researchers who create templates use virtual resources to specify when resources are available for use by other templates. Virtual resources then tell the Action Selection Architecture to block interleaving from subsequent templates under conditions that violate psychological knowledge, MHP idioms, or task logic. Thus, researchers who create primitives and templates encode their own knowledge relevant to modeling into the templates so that analysts who use their templates do not have to have the same level of psychological or modeling sophistication.

A virtual resource acts like any other resource with respect to resource allocation by the Action Selection Architecture, but it does not correspond to a physical or mental component of the human user like the right hand, eyes, or cognitive processor (i.e., an MHP resource). When deciding which operator should receive a resource allocation, the action-selection architecture considers whether a virtual resource is free, just as it considers whether MHP resources are free. Thus, researchers who create primitives simply put the name of a virtual resource in the profile clause of that primitive along with the MHP resources the primitive requires. Virtual resources, however, can be "held" up the goal hierarchy by a special hold-resource step in a procedure. When a virtual resource is held by a template, it is reserved for use only by operators that share the same rank as that template, thereby blocking operators from subsequent templates that require this virtual resource from being executed. A virtual resource can be released by a release-resource step in a procedure or it releases automatically when the goal holding it completes.

Figure 6 illustrates the solution with the addition of another column in the upper left quadrant, the Virtual Resources column. This column indicates which virtual resource, if any, each operator consumes. The execution of the model begins exactly as it did before the addition of virtual resources. All the same operators are considered to receive the cognitive resource and that resource is allocated to attend-target card-slot for the same reasons. However, this time, the attend-target primitive has been rewritten as follows to include virtual resources.3

(procedure
  (index (attend-target ?target))
  (profile memory vision-block)
  (step s1 (start-activity memory memory-act :duration 50 => ?a))
  (step s2 (hold-resource vision-block :ancestor 2) (waitfor (completed ?a)))
  (step r (reset ?self) (waitfor (resumed ?self)))
  (step s3 (terminate) (waitfor ?s2)))

3 The attend-target procedure also contains explicit instructions for resetting itself should it be suspended. The reset ?self command instructs the Action Selection Architecture to restart the activity for its entire duration when resumed after being interrupted. Recall that, because Apex schedules on the fly, the solution to interleaving into a period of slack that was insufficient is to allow the interleaving and suspend the interleaved procedure when higher-priority steps of the template are enabled. The reset capability allows the procedure to be started afresh later.

The attend-target primitive now includes the virtual resource vision-block in its profile clause. At the beginning of the model's run, all resources are free, so both the cognitive resource (which PDL calls memory) and vision-block are allocated to attend-target card-slot. Resource blocking is done by the hold-resource step (s2), which waits for completion of the MHP resource use designated in step s1. The argument :ancestor 2 in the hold-resource step specifies that the vision-block virtual resource is held by the procedure that called attend-target, in this case the template slow-move-click card-slot. This is indicated in the lower right quadrant of Figure 6 by the vision-block rectangle running from attend-target through verify-cursor-at-target. (The primitive for verify-cursor-at-target contains a release-resource step because all activities in the template involving the eyes are complete at the end of verify-cursor-at-target; a sketch of such a releasing primitive appears at the end of this section.)

The run of the model continues as before, selecting initiate-eye-movement card-slot, eye-movement card-slot, initiate-move-cursor card-slot, move-cursor card-slot, and get-PIN, for all the same reasons as before. However, the effects of the virtual resource can be seen when get-PIN is complete (t=200), the cognitive resource is freed, and the operators of rank 3 remain to contend for this resource. This time, attend-target 4-key requires the vision-block virtual resource. However, that resource is being held by the slow-move-click card-slot template and is not free at t=200. Since slow-move-click card-slot has rank 1, the Action Selection Architecture does not allocate this resource to attend-target 4-key. An operator cannot execute until all its specified resources are free, and therefore attend-target 4-key does not interleave at t=200. It must wait until t=700 for both its MHP resource and its virtual resource to be free. Thus, the virtual resource effectively blocks operators from subsequent templates from interleaving inappropriately.

Several virtual blocking resources are required to produce correct CPM-GOMS models: the vision-block detailed above, an analogous right-hand-block to prevent moving the mouse away from a target before clicking on it (also shown in Figure 6, t=100 to t=900), and several motor and cognitive blocks to get typing to work correctly. All of this blocking is built into the primitives and templates by the researchers who create them, so analysts need not concern themselves with the inner workings.

Thus, Apex-CPM fulfills all the steps and heuristics used by human analysts when constructing CPM-GOMS models by hand, but does so in a very different way. Both processes start with an analyst doing task analysis and encoding it into a CMN-GOMS model, but there the similarity ends. As taught in past CHI tutorials (John & Gray, 1992, 1994, 1995), the process of creating a CPM-GOMS model by hand involved working backward from the critical path to eliminate slack time by interleaving operators. The analyst had to be knowledgeable about the details of the task, psychological data, and MHP modeling idioms. Working backward from an initial critical path is very different from building up the model from task and resource constraints at run time, as Apex-CPM does. The analyst no longer needs to know details of psychology, check consistency in task logic, or work with the critical path; these details and checks can be built into the library of templates and primitives by researchers in HCI and delivered with Apex-CPM.
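As noted above, the primitive for verify-cursor-at-target frees the vision-block virtual resource once all of the template's eye activities are complete. A hypothetical sketch of such a releasing primitive, patterned on the attend-target listing above, might look like the following; the operator duration, step names, and exact arguments are illustrative assumptions rather than the primitive as delivered in the Apex-CPM library.

(procedure
  (index (verify-cursor-at-target ?target))
  (profile memory vision-block)             ; a cognitive operator that also touches the virtual resource
  (step s1 (start-activity memory memory-act :duration 50 => ?a))  ; assumed 50-ms cognitive verification
  (step s2 (release-resource vision-block)  ; free vision-block so eye operators from later templates may interleave
           (waitfor (completed ?a)))
  (step s3 (terminate) (waitfor ?s2)))

In the Figure 6 run, this release is what allows attend-target 4-key to obtain both its MHP resource and its virtual resource at t=700 and finally begin execution.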

4. Applying Automated Template Interleaving

4.1 Modeling User Interaction with an ATM

We used Apex-CPM to create a CPM-GOMS model to predict the time it would take well-practiced users to withdraw money from an ATM as described in the previous section. In addition to testing the rules for interleaving, we exploited the ease of creating models in Apex-CPM to explore alternative models. We adopted two of the mouse move-and-click templates from Gray and Boehm-Davis (2000) to see if models could be constructed successfully by directly reusing templates developed for a different task and device. This is critically important: if genuinely HCI-general templates are possible, as would be demonstrated by their successful use in tasks other than the one for which they were created, then template-based modeling can scale up to longer, more complex sequences of behavior, and there is hope that the tool can be applied by non-cognitive scientists. Demonstrating the potential generality of templates available in the literature is consequently an important goal.

4.1.2 Templates for Modeling an ATM User

The leaf nodes of the ATM task decomposition are the two CPM-GOMS templates Slow-Move-Click and Fast-Move-Click, adapted from Gray and Boehm-Davis (2000). The only change we made to Gray and Boehm-Davis's templates was to increase the time for the perceive-target operator from 100 ms to 290 ms (taken from the TYPIST model, John, 1996) to account for the increased difficulty of perceiving the target in our experiment compared to the simple binary detection task of Gray and Boehm-Davis. Fitts's Law calculations were based on screen distance, computed as the Euclidean distance between the centers of each pair of objects. Thus, all parameters in the models were set a priori from prior research, without reference to the collected data.

For Gray and Boehm-Davis, Slow-Move-Click represents the selection of a target when there is uncertainty about where the target appears in each trial; Fast-Move-Click represents selection of a target at a known location. We applied the two templates somewhat differently in the ATM task. The Slow-Move-Click template applies to mouse movements to distant or small targets (e.g., the card and cash slots), which require more careful verification of target and cursor location before clicking. The Fast-Move-Click template was used to predict actions on buttons of reasonable size and at a reasonable distance (e.g., the buttons on the keypad).

The resulting model consisted of the goal hierarchy described in the previous section, which expanded into 15 templates and about 180 cognitive, perceptual, and motor operators that produce about 11 seconds of behavior. Fifty-three of these operators interleave; that is, they begin before all the operators in the preceding template are completed. Figure 7 shows a portion of the PERT chart that Apex-CPM produces when this model is run.
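Before turning to the chart, it may help to see how such a hierarchy bottoms out in templates. The listing below is a hypothetical sketch of one branch of the ATM decomposition (entering the PIN 4901 used in the experiment), written in the PDL style shown earlier; the procedure name, step labels, and the exact way the Fast-Move-Click template is invoked are illustrative assumptions, not the actual model code.

(procedure
  (index (enter-pin))                         ; one subgoal of the overall withdrawal task
  (step s1 (fast-move-click 4-key))           ; each step expands into a full move-and-click template
  (step s2 (fast-move-click 9-key) (waitfor ?s1))
  (step s3 (fast-move-click 0-key) (waitfor ?s2))
  (step s4 (fast-move-click 1-key) (waitfor ?s3))
  (step s5 (terminate) (waitfor ?s4)))

Because interleaving is handled by the architecture and by the templates' virtual resources, a decomposition at roughly this level of detail is all the analyst writes; the approximately 180 underlying operators and their overlaps are generated automatically.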


Figure 7: PERT chart view of interleaving templates. The darker boxes are from the earlier template and the lighter ones from the later one. This figure shows a view of the PERT chart where the width of boxes is proportional to time.

4.1.3 Empirical Procedure

Two student participants served as users in the empirical observation. The goal here was not to validate CPM-GOMS (the method has been validated elsewhere: Gray & Boehm-Davis, 2000; Gray, John, & Atwood, 1993) but rather to provide detailed keystroke timing data against which to assess the automatic interleaving of pre-existing templates. The skill acquisition profile of these two participants resembles that documented in other similar studies (Baskin & John, 1998; Card, Moran, & Newell, 1983); however, those studies did not provide keystroke-level data to compare to Apex-CPM models.

A simulation of an ATM was implemented in Visual Basic. All actions in the task were executed with the mouse. The trials were run on a PC. The display is shown in Figure 8.


Figure 8. Visual Basic mock-up used in the ATM task.

On each trial participants followed a scripted procedure to withdraw $80 from the checking account on a Visual Basic mock-up of an Automated Teller Machine (ATM). Users performed actions on the ATM by using a mouse to click on simulated keys and slots. The procedure consisted of the following steps:

Insert card (click on the picture of the card slot)
Enter PIN (click on the 4, 9, 0, and 1 buttons in turn)
Press OK (click on the OK button)
Select transaction type (click on the withdraw button)
Select account (click on the checking button)
Enter amount (click on the 8 and 0 buttons)
Answer the "correct/not correct?" question (click on the correct button)
Take cash (click on the picture of the cash slot)
Answer question about wanting another transaction (click on the No button)
Take card (click on the picture of the card slot)
Take receipt (click on the picture of the cash slot)


The participants performed the procedure for 200 consecutive trials. This level of practice follows that used by Card, Moran, and Newell (1983) in a text-editing task and by Baskin and John (1998) in a CAD drawing task, where they explored the effects of extensive practice on the match to various GOMS models.

4.1.4 Results

Individual participant data, consisting of click times, are compared with the model in Figure 9. Because Baskin and John (1998) found that CPM-GOMS models predict behavior well between the 50th and 100th trial of a practiced procedure, the means of trials 51-100 for each ATM user were used for analysis. The correlation between the predictions of the CPM-GOMS model and the individual user data over the 15 mouse clicks was quite high: r = .92 and .93 for participants 1 and 2, respectively.

Figure 9: Mean individual subject performance over the 51st-100th trials compared to the predictions generated by the CPM-GOMS model.

The average absolute difference between model predictions and average observed data was 75 ms. The root mean squared error of prediction of the CPM-GOMS model was calculated as error = √[Σ(p − o)² / n], where n = 15, o = the average of the two users' observed times, and p = the predicted time. For the CPM-GOMS model, the root mean squared error was 93 ms. We also computed the percentage of error for each point by dividing the absolute error by the response time for that point. Over the 15 clicks, the average error of prediction was 12%.

4.1.5 Discussion

The a priori predictions of the CPM-GOMS model correlated highly with the observed results and predicted the absolute response times with an average percent error of 12 percent.

The average absolute difference between model predictions and observed data was only 75 ms, and no parameters were set with reference to the data. This good fit is evidence that automatic interleaving in Apex-CPM can be used with pre-constructed templates to provide a good approximation of task execution time.

Results were analyzed using the means of trials 51-100 because Baskin and John (1998) found that CPM-GOMS models predict behavior well at that level of practice. But what about earlier and later performance? Figure 10 compares average participant performance early in training (trials 1-50), in trials 51-100, and late in training (trials 151-200). Earlier performance tended to be slower than later performance, with both the trials 51-100 performance and the model predictions generally somewhere in between.

Figure 10: Average subject performance earlier and later in training.

The model generally follows the variation in inter-click time observed in the participant performance, but this could be due entirely to the geometry of the interface, since Fitts's Law is used by Apex-CPM to calculate movement time. An alternative hypothesis is that Apex-CPM’s interleaving contributes to the inter-click time variability. Since modifying and running the model is now automated, we can easily explore alternative hypotheses. To see the benefit of interleaved templates, predictions from two other models are also shown: a model of only mouse movements (not including mouse clicks) based on Fitts’s Law predictions, and the CPM-GOMS model with sequential templates but no interleaving. The results of these explorations are shown in Figure 11.
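For reference, the movement times in the Mouse Movements Only model come from Fitts's Law. One common formulation (the Shannon form) is

MT = a + b · log₂(D/W + 1)

where D is the center-to-center Euclidean distance to the target and W is the target's width. The intercept a, the slope b, and the exact formulation used inside Apex-CPM's cursor-movement operators are not reproduced here; the expression is meant only to indicate the form of the calculation, not the specific parameter values the model uses.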


Figure 11: Participant data plotted against a model that does not interleave, a full Apex-CPM model, and mouse movements only (not including click times).

The first thing to notice is the good a priori fit of the model to the user data. The degree to which interleaving contributes to this fit can be seen by comparing the interleaving and non-interleaving models. The non-interleaving model generally predicts a longer time for a mouse click and does not capture the variation in user click times. This variation is better captured by the Mouse Movements Only model, which shows just the motor time predicted by Fitts's Law from target distance and size. The correlation of participant performance with the Mouse Movements Only model is high: r = .98 for participant 1 and .96 for participant 2. The Mouse Movements Only model does not include click times but, even with these included, it would still predict faster times than the participants produced and than the CPM-GOMS model predicts. The better CPM-GOMS predictions are a consequence of the perceptual and cognitive processes incorporated into the templates.

These comparisons show that templates contain important predictions of perceptual, cognitive, and motor processes. Furthermore, template interleaving captures the anticipatory quality of skilled performance by moving components of future actions into present actions when resources are available. These aspects of CPM-GOMS models combine to produce results that fit human data better than simpler models involving no interleaving or Fitts's Law mouse-movement predictions alone.


5. General Discussion

The previous sections demonstrate that Apex-CPM's mechanisms of resource allocation, rank, and virtual resources combine to automatically generate CPM-GOMS models from a GOMS-like specification of the task coupled with pre-constructed templates of basic HCI behavior (e.g., moving a mouse and clicking). Templates, including the virtual resource machinery, embody the expertise in psychology required to carry out a CPM-GOMS analysis; templates are intended to be constructed by cognitive/HCI researchers for use by analysts less expert in cognitive psychology. The automation of the interleaving process embodies some of the modeling expertise required. Taking these two elements out of the modeling process is a step toward a tool that could be useful and valuable to engineers and interface designers. Users of robust versions of such tools could then work at the task-analysis level, requiring only the generation of basic GOMS-like hierarchical goal decompositions.

Apex-CPM represents an important first step in making cognitive modeling more accessible in engineering environments. Application to simple HCI tasks, like the ATM task shown here, is relatively straightforward, and the template-based approach shows promise for tackling more complex procedural tasks. We believe the advance presented here, even with its limitations, could facilitate widespread exploration of CPM-GOMS, both as a useful tool in application domains and as the embodiment of psychological theory.

Like all first steps, Apex-CPM and the lessons learned from it leave many issues yet to be resolved. With respect to Apex-CPM itself, this paper presents an example using only mouse movements and clicks as its interaction technique, limited empirical evidence for the accuracy of automatically generated CPM-GOMS models, and only a logical argument for the reduction in the cost of producing such models. More empirical validation is needed on all these fronts. We also need to extend the set of HCI behavior templates to typing, speech input and output, visual search, gestures, and other interaction techniques. Some of these behaviors already exist as CPM-GOMS templates expressed in PERT charts (John & Gray, 1992), which provide a good start for the operators, their durations, and the waitfors within a template, but appropriate virtual resources need to be added to produce usable templates. Automatic interleaving will only generate valid CPM-GOMS models if these individual templates are themselves validated empirically, and if combinations of these templates are also validated against appropriately skilled human performance. Investigation into automatic composition across a greater variety of templates also needs to be carried out (e.g., Lewis, Howes, & Vera, 2004).

We expect that template use and automatic interleaving will reduce the knowledge needed by an analyst to create valid CPM-GOMS models, and that it will also reduce the effort needed to produce such models. However, this expectation is itself a hypothesis to be tested empirically. While construction of an initial CPM-GOMS model may take less expertise and time, it is possible that debugging or modifying a model may take more time and special expertise about Apex-CPM. It is also possible that the expertise and process required to build a CPM-GOMS model by hand encourages the analyst to think more deeply about the task than using Apex-CPM does, producing implicit learning we do not yet understand and leading to more insights for design. All of these hypotheses, and more, could be tested by HCI researchers even while HCI practitioners pick up the tool itself.

Beyond Apex-CPM as a specific tool lie theoretical issues about skilled human behavior that are captured by reusable templates and automatic interleaving. Do reusable templates capture the skills humans bring to new tasks, e.g., typing, mousing, and strategies for visual search? Does automatic interleaving represent an end-point of human skilled behavior after practicing a new complex task? Although Apex-CPM provides a convenient framework with which to demonstrate template construction and interleaving, the underlying Apex architecture is not, at core, a model of human psychological mechanisms. For example, as discussed previously, we do not claim that starting an operator and then stopping it when a higher-ranked operator requests a resource is a human-like mechanism. In order to explore the theoretical underpinnings of human-like skill and its acquisition, it may be profitable to investigate analogous mechanisms in cognitive architectures. The work presented here has fed into such investigations in ACT-R: Matessa (2004) has looked at reusable templates in ACT-R, and Taatgen (submitted) has explored how interleaving of tasks emerges with practice. It has also generated explorations into the assumptions of cognitive architectures themselves and their effects on interleaving (Vera, Howes, Lewis, & McCurdy, 2004).

Finally, Apex-CPM represents only one point in the space of tools that might make cognitive modeling more accessible to engineers and designers. As mentioned in the introduction, several tools for GOMS and KLM emerged in the mid to late 1990s (Beard, Smith, & Denelsbeck, 1996; Byrne, Wood, Sukaviriya, Foley, & Kieras, 1994; Hudson, John, Knudsen, & Byrne, 1999; Kieras, Wood, Abotel, & Hornof, 1995; Williams, 1993). Since Apex-CPM first appeared (John et al., 2002), other research efforts have started to explore alternate ways of expressing procedural knowledge and its relationship to the environment. Salvucci and Lee (2003) produced ACT-Simple to compile KLMs into ACT-R productions, and St. Amant, Freed, and Ritter's (2005) G4A did the same for GOMSL models. Tollinger, Lewis, McCurdy, Tollinger, Vera, Howes, and Pelton (2005) developed a modeling interface called X-PRT for the CORE modeling architecture (Howes, Vera, Lewis, & McCurdy, 2004) to support going from hierarchical task representations to CPM-GOMS graphically, without having to specify model code. John, Prevas, Salvucci, and Koedinger (2004) used modeling-by-demonstration with a connected storyboard, in a system called CogTool, to achieve an order-of-magnitude reduction in the analyst time needed to produce a valid KLM. John, Salvucci, Centgraf, and Prevas (2004) extended CogTool to multi-tasking, combining typical interactive HCI tasks with automobile driving to predict not only HCI task time but also lateral deviation from the center of the road and increased braking time. Salvucci, Zuber, Beregovaia, and Markley (2005) take this work one step further, addressing usability issues found in CogTool. This upswing in tool research is making it steadily easier to model human performance on HCI tasks.


ACKNOWLEDGEMENTS

This research was supported by funds from the NASA Aviation Operations Safety Program and the Intelligent Systems Program, and by the Office of Naval Research (N00014-03-1-0086). The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of NASA, the Office of Naval Research, or the U.S. Government.

REFERENCES

Anderson, J. R., & Lebiere, C. (1998). The Atomic Components of Thought. Mahwah, NJ: Lawrence Erlbaum Associates.
Baskin, J. D., & John, B. E. (1998). Comparison of GOMS analysis methods. Proceedings Companion of CHI 1998 (Los Angeles, CA, April 18-23, 1998), pp. 261-262. New York: ACM.
Beard, D. V., Smith, D. K., & Denelsbeck, K. M. (1996). Quick and dirty GOMS: A case study of computed tomography. Human-Computer Interaction, 11(2), 157-180.
Bell, C. G., & Newell, A. (1971). The design and analysis of instruction set processors. New York, NY: McGraw-Hill.
Bovair, S., Kieras, D. E., & Polson, P. G. (1990). The acquisition and performance of text-editing skill: A cognitive complexity analysis. Human-Computer Interaction, 5(1), 1-48.
Brinck, T., Gergle, D., & Wood, S. D. (2002). Usability for the Web: Designing Web Sites that Work. San Francisco: Morgan Kaufmann.
Byrne, M. D., Wood, S. D., Sukaviriya, P. N., Foley, J. D., & Kieras, D. (1994). Automating interface evaluation. In B. Adelson, S. Dumais, & J. Olson (Eds.), Proceedings of ACM CHI'94 Conference on Human Factors in Computing Systems, Vol. 1, pp. 232-237. New York: ACM Press.
Card, S. K., Moran, T. P., & Newell, A. (1980a). Computer text-editing: An information-processing analysis of a routine cognitive skill. Cognitive Psychology, 12, 32-74.
Card, S. K., Moran, T. P., & Newell, A. (1980b). The keystroke-level model for user performance time with interactive systems. Communications of the ACM, 23, 396-410.
Card, S. K., Moran, T. P., & Newell, A. (1983). The Psychology of Human-Computer Interaction. Hillsdale, NJ: Lawrence Erlbaum Associates.
Corker, K. M. (2000). Cognitive models & control: Human & system dynamics in advanced airspace operations. In N. Sarter & R. Amalberti (Eds.), Cognitive Engineering in the Aviation Domain. Mahwah, NJ: Lawrence Erlbaum Associates.
Deutsch, S. E., Adams, M. J., Abrett, G. A., Cramer, N. L., & Feehrer, C. E. (1993). RDT&E Support: OMAR Software Technical Specification (AL/HR-TP-1993-0027). Wright-Patterson AFB, OH.

Dix, A., Finlay, J., Abowd, G., & Beale, R. (1998). Human-Computer Interaction (2nd ed.). Hertfordshire, England: Prentice Hall.
Eberts, R. E. (1994). User Interface Design. Englewood Cliffs, NJ: Prentice Hall.
Freed, M. (1998). Managing multiple tasks in complex, dynamic environments. In Proceedings of the 1998 National Conference on Artificial Intelligence. Cambridge, MA: MIT.
Freed, M., Matessa, M., Remington, R., & Vera, A. (2003). How Apex automates CPM-GOMS. In Proceedings of the Fifth International Conference on Cognitive Modeling, pp. 93-98. Bamberg, Germany: Universitäts-Verlag.
Gray, W. D., & Boehm-Davis, D. A. (2000). Milliseconds matter: An introduction to microstrategies and to their use in describing and predicting interactive behavior. Journal of Experimental Psychology: Applied, 6(4), 322-335.
Gray, W. D., John, B. E., & Atwood, M. E. (1993). Project Ernestine: Validating a GOMS analysis for predicting and explaining real-world task performance. Human-Computer Interaction, 8(3), 237-309.
Haunold, P., & Kuhn, W. (1994). A keystroke level analysis of a graphics application: Manual map digitizing. Proceedings of ACM CHI'94 Conference on Human Factors in Computing Systems, Vol. 1, pp. 337-343.
Helander, M., Landauer, T. K., & Prabhu, P. V. (Eds.). (1997). Handbook of Human-Computer Interaction. Amsterdam: North-Holland.
Howes, A., Vera, A. H., Lewis, R. L., & McCurdy, M. (2004). Cognitive constraint modeling: A formal approach to supporting reasoning about behavior. In Proceedings of the 26th Annual Meeting of the Cognitive Science Society, Chicago, IL.
Hudson, S. E., John, B. E., Knudsen, K., & Byrne, M. D. (1999). A tool for creating predictive performance models from user interface demonstrations. Proceedings of the ACM Symposium on User Interface Software and Technology, pp. 93-102.
Irving, S., Polson, P., & Irving, J. E. (1994). A GOMS analysis of the advanced automated cockpit. Proceedings of ACM CHI'94 Conference on Human Factors in Computing Systems, Vol. 1, pp. 344-350.
John, B. E. (1988). Contributions to engineering models of human-computer interaction. Ph.D. thesis, Carnegie Mellon University.
John, B. E. (1990). Extensions of GOMS analyses to expert performance requiring perception of dynamic visual and auditory information. In Proceedings of CHI 1990 (Seattle, WA, April 30 - May 4, 1990), pp. 107-115. New York: ACM.
John, B. E. (1996). TYPIST: A theory of performance in skilled typing. Human-Computer Interaction, 11, 321-355.
John, B. E., & Gray, W. D. (1992). GOMS analyses for parallel activities. Tutorial materials presented at CHI 1992 (Monterey, CA, May 3-7, 1992), CHI 1994 (Boston, MA, April 24-28, 1994), and CHI 1995 (Denver, CO, May 7-11, 1995). New York: ACM.

John, B. E., & Kieras, D. E. (1996a). The GOMS family of user interface analysis techniques: Comparison and contrast. ACM Transactions on Computer-Human Interaction, 3(4), 320-351.
John, B. E., & Kieras, D. E. (1996b). Using GOMS for user interface design and evaluation: Which technique? ACM Transactions on Computer-Human Interaction, 3(4), 287-319.
John, B. E., Prevas, K., Salvucci, D. D., & Koedinger, K. (2004). Predictive human performance modeling made easy. In Proceedings of CHI 2004 (Vienna, Austria, April 2004). New York: ACM.
John, B. E., Salvucci, D. D., Centgraf, P., & Prevas, K. (2004). Integrating models and tools in the context of driving and in-vehicle devices. Proceedings of the International Conference on Cognitive Modeling 2004 (Pittsburgh, PA, July 30 - August 1, 2004).
John, B. E., Vera, A. H., Matessa, M., Freed, M., & Remington, R. (2002). Automating CPM-GOMS. In Proceedings of CHI'02: Conference on Human Factors in Computing Systems. New York: ACM.
John, B. E., Vera, A. H., & Newell, A. (1994). Towards real time GOMS: A model of expert behavior in a highly interactive task. Behaviour and Information Technology, 13(4), 255-267.
Kieras, D. E. (1996). Guide to GOMS model usability evaluations using NGOMSL. In M. Helander & T. Landauer (Eds.), The Handbook of Human-Computer Interaction (2nd ed.). Amsterdam: North-Holland.
Kieras, D. E., Wood, S. D., Abotel, K., & Hornof, A. (1995). GLEAN: A computer-based tool for rapid GOMS model usability evaluation of user interface designs. International Journal of Man-Machine Studies, 22, 365-394.
Kieras, D. E., Wood, S. D., & Meyer, D. E. (1997). Predictive engineering models based on the EPIC architecture for a multimodal high-performance human-computer interaction task. ACM Transactions on Computer-Human Interaction, 4(3), 230-275.
Kitajima, M., & Polson, P. G. (1995). A comprehension-based model of correct performance and errors in skilled, display-based, human-computer interaction. International Journal of Human-Computer Studies, 43(1), 65-99.
Laughery, K. R., & Corker, K. M. (1994). Computer modeling and simulation of human/system performance. In G. Salvendy (Ed.), Handbook of Human Factors (2nd ed.). New York, NY: John Wiley and Sons.
Lerch, F. J., Mantei, M. M., & Olson, J. R. (1989). Translating ideas into action: Cognitive analysis of errors in spreadsheet formulas. Proceedings of CHI 1989, pp. 121-126. New York: ACM.
Lewis, R. L., Howes, A., & Vera, A. H. (2004). A constraint-based approach to understanding the composition of skill. In Proceedings of the International Conference on Cognitive Modeling, Pittsburgh, PA.

Matessa, M. (2004). An ACT-R framework for interleaving templates of human behavior. In Proceedings of the Twenty-sixth Annual Conference of the Cognitive Science Society, pp. 903-908, Chicago, IL.
Meyer, D. E., & Kieras, D. E. (1997a). A computational theory of human multiple-task performance: The EPIC information-processing architecture and strategic response deferment model. Psychological Review, 104, 1-65.
Meyer, D. E., & Kieras, D. E. (1997b). A computational theory of human multiple-task performance: Part 2. Accounts of psychological refractory phenomena. Psychological Review, 104, 749-791.
Modell, M. E. (1996). A Professional's Guide to Systems Analysis. New York: McGraw-Hill.
Newell, A. (1990). Unified Theories of Cognition. Cambridge, MA: Harvard University Press.
Newman, W. M., & Lamming, M. G. (1995). Interactive System Design. Wokingham, England: Addison-Wesley.
Pelz, J. B., & Canosa, R. (2001). Oculomotor behavior and perceptual strategies in complex tasks. Vision Research, 41, 3587-3596.
Pew, R., & Mavor, A. (1998). Modeling Human and Organizational Behavior: Application to Military Simulations. Washington, DC: National Academy Press.
Pirolli, P., & Card, S. K. (1999). Information foraging. Psychological Review, 106, 643-675.
Preece, J., Rogers, Y., Sharp, H., Benyon, D., Holland, S., & Carey, T. (1994). Human-Computer Interaction. Wokingham, England: Addison-Wesley.
Raskin, J. (2000). The Humane Interface: New Directions for Designing Interactive Systems. Boston: Addison-Wesley.
Salvucci, D. D., & Lee, F. J. (2003). Simple cognitive modeling in a complex cognitive architecture. In Human Factors in Computing Systems: CHI 2003 Conference Proceedings, pp. 265-272. New York: ACM Press.
Salvucci, D. D., & Macuga, K. L. (2001). Predicting the effects of cell-phone dialing on driver performance. In Proceedings of the Fourth International Conference on Cognitive Modeling, pp. 25-30. Mahwah, NJ: Lawrence Erlbaum.
Salvucci, D. D., Zuber, M., Beregovaia, E., & Markley, E. (2005). Distract-R: Rapid prototyping and evaluation of in-vehicle interfaces. In Proceedings of the ACM Conference on Human Factors in Computing Systems, CHI'05, Portland, Oregon.
Shneiderman, B. (1998). Designing the User Interface: Strategies for Effective Human-Computer Interaction (3rd ed.). Reading, MA: Addison-Wesley.
St. Amant, R., Freed, A. R., & Ritter, F. E. (2005). Specifying ACT-R models of user interaction with a GOMS language. Cognitive Systems Research, 6, 71-88.


Taatgen, N. A. (submitted). Modeling parallelization and flexibility improvements in skill acquisition: From dual tasks to complex dynamic skills. Manuscript submitted for review. Available from http://www.ai.rug.nl/~niels/publications.html.
Tambe, M., Johnson, W. L., Jones, R. M., Koss, F., Laird, J. E., Rosenbloom, P. S., & Schwamb, K. (1995). Intelligent agents for interactive simulation environments. AI Magazine, 16(1), 15-39.
Tollinger, I., Lewis, R. L., McCurdy, M., Tollinger, P., Vera, A. H., Howes, A., & Pelton, L. (2005). Supporting efficient development of cognitive models at multiple skill levels: Exploring recent advances in constraint-based modeling. In Proceedings of the ACM Conference on Human Factors in Computing Systems, CHI'05, Portland, Oregon.
US Navy (1958). PERT Summary Reports, Phase 1 and Phase 2. Special Projects Office, Bureau of Naval Weapons, Department of the Navy, Washington, USA.
Vera, A. H., & Rosenblatt, J. K. (1995). Developing user model-based intelligent agents. In Proceedings of the 17th Annual Meeting of the Cognitive Science Society (Pittsburgh, PA, July 22-25, 1995).
Vera, A. H., Howes, A., McCurdy, M., & Lewis, R. L. (2004). A constraint satisfaction approach to predicting skilled interactive cognition. In Proceedings of the Conference on Human Factors in Computing Systems, CHI'04, Vienna, Austria, April 24-29, 2004.
Williams, K. E. (1993). Automating the cognitive task modeling process: An extension to GOMS for HCI. In Proceedings of the Fifth International Conference on Human-Computer Interaction, Poster Sessions: Abridged Proceedings (Vol. 3, p. 182).
Young, R. M., Green, T. R. G., & Simon, T. (1989). Programmable user models for predictive evaluation of interface designs. In J. C. Chew & J. Whiteside (Eds.), CHI'89 Conference Proceedings: Human Factors in Computing Systems, pp. 15-19. New York: ACM Press.
Young, R. M., & Whittington, J. E. (1990). Using a knowledge analysis to predict conceptual errors in text-editor usage. In J. C. Chew & J. Whiteside (Eds.), CHI'90 Conference Proceedings: Human Factors in Computing Systems, pp. 91-97. New York: ACM Press.
