Systematic Procedural Error as a Result of Interaction Between Working Memory Demand and Task Structure

A Thesis Presented to The Academic Faculty by Michael D. Byrne

In Partial Fulfillment of the Requirements for the Degree Master of Science in Psychology

Georgia Institute of Technology August 1993

Copyright ©1993 Michael D. Byrne


Abstract

Despite the presence of correct knowledge, people seem to systematically make errors on tasks that share a particular structure: that of having some steps left to complete after the main goal has been fulfilled. An example of this kind of error would be leaving one’s gas cap behind after filling up. Such errors are termed post-completion errors. Though little empirical or theoretical work has been done on this particular phenomenon, a synthesis of ideas from various sources suggests that such errors are a result of an interaction between post-completion goal structure and working memory limitations. In particular, people should be prone to making post-completion errors when working memory load is high. Preliminary research provided some support for this claim, so in further investigation a computational model of a complex task was constructed using CAPS (Just & Carpenter, 1992). The model does, in fact, make post-completion errors when the working memory activation allowed to the system is reduced. A second experiment was conducted to evaluate the claims of the model, and results from the experiment are consistent with the CAPS simulation. Implications for the design of both human-machine systems and cognitive models are discussed.

Introduction

In the course of everyday interaction with the world, people make countless mistakes, slips, lapses, miscalculations and the like. Errors are a common and natural phenomenon, and many errors stem from incorrect, incomplete, or misapplied knowledge. People can only perform tasks without error when they have the correct knowledge—or good luck. While many errors are knowledge-related errors, does correct knowledge guarantee error-free performance? If not, where would errors come from? It seems to be assumed by the psychological community that with correct knowledge, errors may be possible but are likely negligible. Non-knowledge errors are often viewed as being essentially stochastic, and it is common in psychological research for error trials to be either discarded or ignored. Yet there is anecdotal support for the notion of systematic errors in contexts where the correct knowledge is present. For example, most people who work in office or academic environments use photocopiers on at least an occasional basis, and use them successfully. That is, the correct knowledge is typically present. Yet, after making their copies, people often leave their original sitting on the glass, under the cover. Another example is the use of automated teller machines (ATMs). Imagine someone using an ATM with the goal of withdrawing $20. Most people know at least one person who has left their bank card in the ATM after getting their money, and can easily imagine themselves doing so, especially if the machine had not beeped or displayed a reminder. These two errors have a common theme: operations (removing the original, removing the card) are required after the main goal of the task (get copies, get money) has been satisfied. Thus, this type of error will be referred to as post-completion error. Post-completion errors abound in day-to-day interaction with the world: leaving the gas cap behind after filling the tank, leaving one’s headlights on, and so on.
These are all tasks that people usually perform correctly, tasks for which they have the right knowledge. Despite this, people still seem to be prone to post-completion errors. Yet little psychological research explicitly addresses the subject of errors, particularly those that are not knowledge-based. “One reason for this is that error is frequently considered only as a result or a measure of some other variable, and not as a phenomenon in its own right” (Senders & Moray, 1991, p. 2). Typically, errors are treated as some stochastic function, where decrements in performance are manifested as an increase in global error rate, with little or no attention to what causes any particular error.


Previous Work on Errors

One of the most surprising things about the literature on non-knowledge-based errors is the conspicuous lack of it. There are two likely causes for this: the difficulty of getting human subjects to make any but apparently stochastic errors in the laboratory and the lack of interest in error as a phenomenon in its own right. There has been more treatment of non-stochastic error as a phenomenon in the human-machine systems community than in traditional experimental psychology or cognitive science, but little of that treatment has been empirical. Senders & Moray (1991) describe the results of one of the few conferences ever held on the subject of error, the Conference on the Nature and Source of Human Error of 1983. The bulk of their book on this conference is a recognition of the legitimacy of errors as a phenomenon worthy of study and an appeal for more research to be done on the subject. The remainder of the book is concerned with the problem of generating simple taxonomies and terms so that research in this area can have a basic starting point. There is little in the way of actual theory or data presented, so this work gets us no closer to an understanding of or explanation for post-completion error. One of the largest and most well-known works in error research is James Reason’s (1990) book Human Error. He divides errors into three classes, corresponding to the three levels of behavior he describes: skill, rule, and knowledge. A significant bulk of the theory and model he proposes deals with errors based on incomplete or incorrect knowledge, and therefore at the “knowledge” level. He does, however, give consideration to non-knowledge-based errors, those at the “skill” level. These errors are termed “slips” or “lapses,” and are characterized by “failure to make an attentional check” or “failure to monitor the current intention” (pp. 60-61).
Errors at higher levels, rule and knowledge, are termed “mistakes,” which occur “at the level of intention formation.” So how would Reason explain the photocopier or ATM errors? Likely, he would characterize these errors as “slips.” Unfortunately, Reason does not go into much detail about how slips emerge from the human cognitive system. In his framework, they come from either “inattention” or “overattention”; people either fail to provide an attentional check when they should or they perform one when they should not. This explanation raises a further question: where do the attentional failures come from? Reason would probably classify post-completion errors as resulting from inattention, but he does not describe how the cognitive system would be systematically prone to these attentional glitches. They just seem to occur, possibly stochastically, possibly not—no further explanation is provided. In his classic paper, Norman (1981) defines a slip as “a form of human error defined to be the performance of an action that was not what was intended” (p. 1). That is, a slip is a mis-execution of a correct procedure—exactly the kind of error in question. This paper is primarily a brief taxonomy of different kinds of slips, based on naturalistic observations and self-reports of errors. It does not provide any direct or conclusive evidence of systematic human error. Rather, the paper describes several kinds of errors and gives a few anecdotes for each. Norman does, however, begin the paper with some thoughts about how errors might emerge. He describes a model based on an “activation-trigger-schema” (ATS) system. In such a system, various schemata (“organized memory units”) for action exist at varying levels of activation. Each schema has a series of “triggering conditions” associated with it, much like the if side of a production rule.
He assumes that, at any given moment, multiple schemata will be active and that schemata will be applied according to some function that trades off between activation level and “goodness-of-match of the trigger conditions.” He does not describe this trade-off in more detail. Errors, then, typically emerge either because the activation of the correct schema was too low or because the goodness of match to some incorrect one was high. While the ATS framework may begin to suggest an explanation for errors in correct knowledge situations, it does not precisely specify when those errors are more likely. It does not, for instance, provide an explanation for the errors in the photocopier and ATM examples. Baars (1992a) has also undertaken an extensive and ambitious research program on errors. Though this program has concentrated on speech errors, Baars (1992b) suggests that the same mechanisms responsible for speech errors cause action errors as well. In particular, he suggests that slips occur because “[c]ompetition between two alternative plans
may also overload the limited-capacity system momentarily, thereby interfering with the conscious access that is needed to resolve nonroutine choices caused by the competing plans” (p. 30). His conceptualization is, however, rather underspecified (largely due to his reliance on “consciousness”) and neither directly addresses post-completion errors nor contains enough specification to predict their occurrence or non-occurrence with much precision. The ideas of competition and capacity limitations are crucial, however, and will be revisited.

Task Analysis

One way to begin searching for an explanation of post-completion errors is to carefully examine the tasks in which these errors seem to occur. The tasks will be analyzed using a GOMS analysis. GOMS stands for Goals, Operators, Methods, and Selection rules (Card, Moran, & Newell, 1983) and is useful for describing the procedural knowledge necessary to perform a routine cognitive skill, as well as formalizing notions about the structure of the task in question. GOMS analyses have shown themselves to be useful in the design of better help/instructional systems (Elkerton & Palmiter, 1989), and have been highly formalized by Kieras (1988). It is possible to directly transform a GOMS model into a production system model of a task (Bovair, Kieras, & Polson, 1990). A GOMS analysis consists of a description of the goal structure of a task, along with the methods to accomplish those goals and the operators used to implement those methods. Figure 1 graphically depicts the structure of the photocopier task. Ovals represent goals, and rectangles represent operators. A method is represented by a column with the goal satisfied by the method on top and operators in temporal order going downward. Methods proceed in left-to-right fashion. It is important to distinguish between the task structure that is a function of the device and that which is a function of the user’s goals. When using a simple device such as a photocopier, the user typically has only one goal: getting copies. However, if the user is to successfully operate the device in service of this goal, they will have to generate subgoals according to the task structure inherent in the device. Thus, the user will typically follow the procedural path required by the device. At the “remove copies” operator, however, the path required by the device and that which is necessary to satisfy the user’s goal diverge. (The double vertical lines serve to emphasize that operator.)
When the copies are removed, the user has fulfilled the “get copies” goal—he or she now has copies, so the “get copies” goal is no longer relevant. From the user’s point of view, the goal is satisfied and the next goal may be pursued. However, from the perspective of the device, the task is not complete. The divergence is represented by the dashed arrow. The model for the ATM is strikingly similar to that of the photocopier (see Figure 2). While this task is slightly more complex in terms of the number of methods and operators, the associated error is probably less common, partly because there is more of a cost associated with leaving one’s bank card in an ATM than there is with leaving a document on a photocopier. In addition, there are perceptual differences. With the photocopier, it is impossible to see the original because it is hidden under the cover. ATMs, on the other hand, eject the card near or in the field of vision. Most also display a “don’t forget your card” message and beep until the card has been removed. If the user’s top-level goal is fulfilled before completing all steps in a task and the step where this occurs can be identified, these GOMS models indicate what steps are likely to be omitted. That part of the task that occurs after the user’s goal is satisfied but before the completion of the device task will be referred to as post-completion structure. It seems to be the case that if a task includes post-completion structure, errors are more likely.


Figure 1. Graphic GOMS for photocopier. [Figure: the goal “Get copies” decomposes into four methods, pursued left to right: Place paper (Lift cover; Place original; Close cover), Number of copies (Punch in number), Make copies (Press “Copy”; Wait, then remove copies), and Remove original (Lift cover; Remove original; Close cover), followed by the Next Goal.]

While GOMS modeling may provide a tool to systematically identify where post-completion structure might occur, it does not explain why post-completion structure is hard for people to execute correctly. If post-completion structure directly caused errors, people should always slip on tasks where it appears—yet they do not. If post-completion does not directly cause error, where does the error come from? Obviously, more is involved than simply the structure of the tasks themselves. Since people typically have the ability to perform the task without error, the explanation for this phenomenon cannot be in the form of “knowledge bugs.” Therefore, something about the underlying cognitive architecture must be responsible for these errors.

Figure 2. Graphic GOMS for ATM. [Figure: the goal “Get cash” decomposes into five methods, pursued left to right: Insert card (Align card; Push card in), Enter PIN (Enter code; Press “Enter”), Perform transaction (Press “Withdrawal”; Enter amount; Verify response; Press “Enter”), Pick up cash (Wait for cash to appear; Remove cash), and Clean up (Remove receipt; Retrieve card), followed by the Next Goal.]


Unified Theories of Cognition

Since it is likely that post-completion errors emerge from the underlying cognitive architecture, one approach that should be productive is to examine large-scale cognitive theories and try to understand how post-completion errors might emerge from them. Soar, ACT*, and construction-integration have demonstrated predictive or explanatory power in a variety of domains and seem well-suited to application in new domains. The power and generality of these theories makes them particularly appealing starting points in areas where little specific empirical work has been done. While none of these theories provides a compelling account of post-completion error, they suggest two potential sources of the error: goal management and working memory failure. Soar (Laird, Newell, & Rosenbloom, 1987; Newell, 1990) is said to be a “universal weak method” that learns for itself, thereby acquiring domain knowledge that overcomes the need to do search. Soar is built on just a few mechanisms: search through a problem space; a single, universal production system; automatic subgoal generation; and a unitary learning mechanism, chunking. For the purposes of this discussion, it is not necessary to go into great detail on these components. Soar has been used to model systematic errors (Polk & Newell, 1988). In an effort to address the use of mental models, Polk and Newell (1988) applied Soar to the task of syllogistic reasoning. Their aim was to account for the data gathered in Johnson-Laird’s (1983) classic work on syllogistic reasoning, where the errors made by the subjects were non-random in that some problems systematically generated more errors. Newell (1990) concluded that these systematic errors are a function of systematic knowledge deficiencies. Thus, the errors in this task are also “knowledge errors” and such work does not address post-completion errors. In fact, it is not clear how Soar would make non-knowledge-based errors.
Typically, Soar only makes errors while learning a procedure; once a procedure is learned, it is always executed correctly. When given an identical problem over and over again, Soar’s performance only improves. Clearly, this is not what goes on when humans perform tasks such as photocopying and using an ATM. However, Soar does suggest a useful approach to this problem: Soar uses a “goal stack” to maintain a recursive set of goals and subgoals in which the top goal on the stack is the current goal the system is pursuing. When a new goal is created, it is pushed onto the stack, becoming the current goal. Subgoals generated by a goal on the stack are also pushed onto the stack. When a goal is satisfied, it is popped from the stack, making what was the second goal on the stack the current goal. It is possible in the course of pursuing a subgoal that not only will that subgoal be satisfied, but one of the goals further down in the stack will be satisfied; thus, satisfying some subgoals may satisfy a supergoal as well. In that case, the supergoal and everything above it in the stack are popped. This leaves Soar free to pursue whatever was below that supergoal on the stack. In the ATM example, for instance, the supergoal might be “get money.” This goal would be pushed onto the stack. Since Soar is at an ATM, an “insert card” subgoal would be pushed, then popped when the card is successfully inserted into the ATM. A subgoal like “perform transaction” and the “get money” supergoal will be on the stack when the ATM spits out money, and both the subgoal of completing the transaction and the supergoal of getting money are satisfied. Thus, the stack should be popped from the “get money” goal on up. The error of forgetting to remove the card arises because the subgoal of retrieving the card is required by the task, but at the time when it is appropriate, there is no goal to which this subgoal can attach itself, as there are no goals relevant to the ATM on the stack. 
That is, the “retrieve card” subgoal is lost because the relevant goal has been popped from the stack already. Figure 3 provides an illustration in the form of a graphic depiction of a goal stack over the course of some limited problem-solving (trying to go to dinner).
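The stack discipline just described can be sketched as follows; the class and goal names are illustrative, not Soar code:

```python
# Sketch of a Soar-style goal stack for the ATM example. The key behavior:
# satisfying a goal pops it *and everything above it*, so a post-completion
# subgoal ("retrieve card") finds no parent goal left to attach to.

class GoalStack:
    def __init__(self, goals=None):
        self.stack = list(goals or [])   # bottom ... top

    def push(self, goal):
        self.stack.append(goal)          # new goal becomes the current goal

    def satisfy(self, goal):
        # Pop the satisfied goal and every goal above it on the stack.
        i = self.stack.index(goal)
        self.stack = self.stack[:i]

stack = GoalStack(["purchase dinner"])
stack.push("get money")
stack.push("insert card")
stack.satisfy("insert card")             # card accepted
stack.push("acquire cash")
# Cash appears: this satisfies "get money", which also pops "acquire cash".
stack.satisfy("get money")
print(stack.stack)                       # -> ['purchase dinner']
```

At this point no ATM-related goal remains on the stack, which is exactly the condition under which the "retrieve card" step is lost.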

Figure 3. Goal stack for ATM example. [Figure: six snapshots of the goal stack while pursuing “Purchase dinner”: the initial goal stack; “Get money” pushed; “Insert card” pushed; “Insert card” popped; “Acquire cash” pushed; and, once money is in hand, “Get money” popped, which also pops “Acquire cash,” leaving the system free to pursue whatever was on the stack before “Get money.”]


Although this explanation has a lot of intuitive appeal, it does not make entirely palatable predictions. According to this account, people should make this error every time, which is not the case. Of course, the Soar account can be “patched” to stipulate that another goal gets placed on the stack, which is the goal to perform the post-completion step. (This is similar to the modification of the main goal to include the post-completion step, essentially making the main goal a conjunctive goal.) This would solve the problem in that Soar would perform the task correctly, but then it should perform the task correctly on every single trial, which is no better than Soar making the error on every single trial. One might hypothesize that Soar begins with the non-patched version of the procedure and learns the patch. This account predicts that people should transition from always making this error to never making this error. Once again, this does not seem to match most people’s experience; if this account is correct, people should leave the original on the photocopier the first few times they use it, and once they get it right they never make the post-completion error again. While this explanation does not satisfactorily cover the photocopier or ATM examples, it suggests an important concept: it is possible to satisfy the goals present in the cognitive system before completing all of the task-relevant goals. It should be noted that the post-completion account developed here is almost identical to the one developed by Young, Barnard, Simon, and Whittington (1989), though they termed the phenomenon “secondary subgoal omission.” Unfortunately, that account was never supported by either a running model or laboratory data. Whereas Soar is primarily derived from the Newell and Simon approach to problem solving, ACT* (Anderson, 1983) is derived from Anderson’s research on memory. Anderson’s approach to errors typically has been through working memory.
In ACT*, working memory is simply that part of the long-term declarative store that is active. Productions match these active elements, and can posit new elements. Goals are seen as sources of activation for working memory elements. Activation of elements decays over time, so some kind of maintenance is necessary. ACT* does not explicitly predict or attempt to model errors, but in a study on students learning LISP, Anderson (1987) claims that “the major source of errors will be working-memory failures. The student gets good feedback on the facts about the programming language, and so misconceptions would probably be rapidly stamped out.” (pp. 202-203). That is, the correct knowledge should be present in most cases, yet errors emerge anyway. However, Anderson provides no more than a cursory explanation of the source of the “working memory failure.” Working memory capacity is “overwhelmed,” which produces errors that are “nonsystematic.” Anderson describes one common systematic error: the omission of parentheses, which is twice as common as the addition of parentheses. Unfortunately, Anderson more or less leaves the discussion there, raising but not answering questions. Why is this particular error more common? Why do programmers lose this particular piece of information from working memory more often? This example seems parallel to the photocopier example and the Soar explanation because the closing parentheses are typically added after functions are written in LISP. That is, the goal of writing the function is satisfied and, in addition, the correct knowledge is present. Anderson’s data seems to support the notion of a systematic error that is not knowledge-based. Anderson was certainly not the first to use working memory as an explanation for errors. In 1978, Hitch suggested that a memory model which includes some decay of short-term information best explained his data on arithmetic errors. 
It is unclear how Hitch’s model would be generalized to other tasks, but it at least provides a plausible place to begin the search for an explanation. Another architecturally-inspired account of post-completion errors is described in Polson, Lewis, Reiman, and Wharton (1992). This account is based on the construction-integration (C-I) system (Kintsch, 1988), and uses the term “supergoal kill-off” for the error. The construction-integration system was originally developed to model text comprehension, but has been applied in other areas as well (e.g. Mannes & Kintsch, 1991; Kitajima & Polson, 1992). The construction-integration model is a hybrid symbolic/connectionist system, and its account of post-completion error depends on inhibition of goals. C-I uses inhibition from “done-it” nodes to goal nodes to manage goals. In this account (which also used ATMs as an example), post-completion error arises because the “done-it” node for the main goal (“get cash”) is very “similar” to the “done-it” node of the “pick up cash” subgoal. When the “done-it” node for the main goal becomes active, it inhibits the main goal, which causes the subgoal to lose activation, so the subgoal is never executed. When the error is made, it is because the system confuses one “done-it” node for another. There are some problems with this approach, both theoretical and empirical. One theoretical problem is that “similar” is never operationalized; how “similar” do two goals have to be to generate the error? Second, it has the same problem the Soar account does: if the two goals are similar, then the error should always be made; if the goals are not similar, the error should never be made. Also, like the account of Young et al., this account lacks empirical support. In particular, experiments based on this account have failed to produce post-completion errors (Polson, personal communication). It is important to note, however, that in this account, the post-completion error occurs because the relevant subgoal is lost. This agrees with the Soar account in that management of goals is responsible for the error, suggesting that goal management is critical. So, while looking at extant theories of cognition is suggestive, it does not offer a satisfactory explanation of the phenomena in question. But it does suggest that the errors may have something to do with the structure of the task and may also be a function of working memory failure.
In particular, it appears that the temporal relationship between the goal structure and task completion is important because of working memory constraints.

Working Memory and Post-Completion Error

Omitting post-completion structure (either goals, operators, or both) can be thought of as a kind of forgetting; the person performing the task simply forgets to perform some portion of the task. Since it is typically the case that the person performing the task knows the correct procedure, this is not an issue of permanent or long-term memory (LTM), as the correct knowledge is present in long-term memory. One could argue that this error is a result of a failure to retrieve the appropriate material from LTM, but this explanation does not make clear why errors in one part of the task would be more common than errors in any other part, or why so many tasks with this structure would be prone to the same type of error. Post-completion error, then, may be a working memory phenomenon. Working memory is generally thought of as the cognitive space in which computations (such as planning) are carried out and temporary memory structures are stored. Thus, it seems likely to be responsible for post-completion error. Various conceptualizations of working memory have been proposed (e.g. Anderson, 1983; Baddeley, 1986), and appeals to working memory have previously been used to explain certain types of errors (e.g. Hitch, 1978). Post-completion omissions can be explained in a relatively straightforward way through goal maintenance in working memory. When working memory is not taxed, post-completion structures should be no more likely to be omitted than any other structure. People should be more prone to omission of post-completion structure when working memory load is high. That is, task structure and working memory conditions systematically interact to increase the likelihood of error. The concept underlying this mechanism is simple. Supergoals in working memory supply activation to subgoals.
When the supergoal is eliminated from working memory by being satisfied, the subgoal loses the benefit of this supply from the supergoal, causing it to decay. When working memory load is high, the subgoal will be closer to some “threshold” level of activation and this decay may lead to the loss of the subgoal.


Thus, subgoals depend on supergoals in working memory. When working memory is taxed, post-completion goals are the ones most likely to suffer from this dependence. When working memory is not taxed, this should not be a problem.

Modeling

Recently, a number of computational models that incorporate some kind of working memory limitation have surfaced (e.g. Shastri, 1992; Kotovsky & Kushmerick, 1991). The model with the most impressive track record and the most well-developed theoretical foundation is the CAPS production system model (Just & Carpenter, 1992). Using the CAPS framework, it should be possible to build a model that is capable of performing tasks with post-completion structure correctly most of the time and failing when memory load is high. As with most production systems, CAPS has two memory systems, a long-term memory and a working memory. Working memory contains elements such as goals and propositions. In this model, each element in working memory has some continuous activation value associated with it. Not all working memory elements can be used in the matching of productions; an element’s activation value must be above a threshold level in order for that element to be matched by productions. The long-term memory contains productions, simple if-then rules. Productions match their “if” conditions against the above-threshold elements in working memory. On a given execution cycle, all productions that match are allowed to fire in parallel; there is no conflict resolution. When a production fires, it can cause one of two things: direct actions, such as pushing a button; or a request for activation of some element or elements in working memory. If a production requests activation for an element that is not present in working memory, that element is created. There is a ceiling to the total amount of activation that may be present in the system at any given time.
On a given cycle, all the productions that have their conditions matched request some amount of activation from the system. The amount requested is totaled and added to the total activation of the currently active working memory elements. If this total exceeds the ceiling, the system “cuts back” the amount of activation propagated and maintained according to the amount that the total requested exceeds the ceiling. For example, if a set of productions fire requesting 60 units of activation and currently active elements use 60 units, then 120 total units of activation are requested. If the ceiling is 100 units, the activation propagated by each production and the activation of each working memory element will be cut back according to the amount requested that is above the ceiling. In this case, that would be 20/120 (or 1/6) of the requested amount. That is, 5/6 (100 units) of the 120 units of activation requested will be propagated and maintained as each individual element and production is scaled back to 5/6 of its request. The primary implications of such a capacity limit are twofold. First, when the system is operating at the capacity ceiling and memory load is increased, the functional decay rate for elements in working memory will increase—that is, elements will be “displaced” faster. This is because the amount of scaling will have to increase to accommodate the increased demand. Second, it is not necessary to implement a “goal stack” in such a system. Stack-like behavior emerges from the differential activation levels of the goals that are present in working memory. An example of the photocopier will be helpful to both clarify the operation of the model and explain how post-completion errors might occur when working memory load is high. (Example productions referred to here may be found in Appendix A.)
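The cut-back arithmetic described above can be sketched numerically. This is a minimal illustration, not CAPS code; the constants are the example figures from the text:

```python
# CAPS-style capacity cut-back: when total requested activation exceeds the
# ceiling, every request is scaled by the same fraction.

def scale_factor(requested, maintained, ceiling):
    """Fraction of each activation request that is actually granted."""
    total = requested + maintained
    return 1.0 if total <= ceiling else ceiling / total

# Productions request 60 units, active elements hold 60 units, ceiling is 100:
f = scale_factor(requested=60, maintained=60, ceiling=100)
print(f)          # -> 0.8333... (each request honored at 5/6)
print(f * 120)    # -> 100.0 units propagated and maintained in total
```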
This example takes a detailed look at the execution of the photocopier procedure, focusing in particular on the execution of the first method and the post-completion step. When the system approaches the photocopier, the goal of “get copies” will be active in working memory. This goal will be maintained by the fact that the system does not have copies, a function provided by the MAINTAIN-GET-COPIES production.


The MAINTAIN-PLACE-PAPER production has its conditions matched and so it fires for several cycles, building the activation of the “place paper” subgoal until it reaches threshold. Then, productions to implement this subgoal fire (LIFT-COVER-1, PLACE-ON-GLASS, and CLOSE-COVER-1) and have their actions executed over the next few cycles. Once the paper has been placed, MAINTAIN-PLACE-PAPER will stop firing and the “place paper” goal will decay below threshold. The same pattern continues through the next two methods until the method implementing “make copies” completes. When this happens, the MAINTAIN-GET-COPIES production will no longer fire, since one of its conditions (not having copies) is no longer met.

This is where, from the standpoint of post-completion errors, things get interesting. Since the “remove original” subgoal will be below threshold, it depends on the MAINTAIN-REMOVE-ORIGINAL production to bring it back above threshold. However, this production depends on the presence of the “get copies” goal. If this goal is not above threshold in working memory, MAINTAIN-REMOVE-ORIGINAL cannot fire and the “remove original” goal will never get above threshold. This is how the post-completion step comes to be omitted. As described above for CAPS, if working memory load is high, the functional decay rate will be high. Thus, the “get copies” goal will decay rapidly once MAINTAIN-GET-COPIES stops firing. If the decay rate is high enough, MAINTAIN-REMOVE-ORIGINAL will simply not have enough cycles to bring “remove original” above threshold. If working memory load is low, however, the “get copies” goal should be present long enough to bring “remove original” above threshold. Thus emerges the prediction that post-completion structure should be problematic when working memory load is high, and relatively incidental when it is not.
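The race described here can be illustrated with a deliberately simplified simulation (all numeric parameters are invented; this is a sketch of the dynamics, not the actual CAPS model):

```python
def remove_original_reaches_threshold(decay, boost=0.2, threshold=1.0,
                                      supergoal_start=1.5, cycles=50):
    """Toy version of the race described above (all parameters invented).

    Once the main goal is satisfied, its maintenance production stops
    firing, so the "get copies" goal simply decays each cycle at a rate
    that stands in for working memory load.  While "get copies" is still
    above threshold, MAINTAIN-REMOVE-ORIGINAL can add activation to the
    "remove original" subgoal; once "get copies" drops below threshold,
    the subgoal stops gaining activation.
    """
    get_copies, remove_original = supergoal_start, 0.0
    for _ in range(cycles):
        if get_copies >= threshold:        # supergoal still above threshold
            remove_original += boost       # maintenance production fires
        if remove_original >= threshold:
            return True                    # subgoal reached threshold in time
        get_copies *= 1.0 - decay          # load-dependent displacement
    return False                           # post-completion step omitted

# Low load (slow decay): the subgoal gets enough cycles to reach threshold.
low_load = remove_original_reaches_threshold(decay=0.05)   # True
# High load (fast decay): the supergoal vanishes first; the step is omitted.
high_load = remove_original_reaches_threshold(decay=0.50)  # False
```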

Preliminary Research

The fact that one can predict an interaction between post-completion structure and working memory load leaves open a number of questions. First and foremost, is it possible to generate empirical evidence for claims about systematic error in the laboratory? Second, does such evidence lend support to the hypotheses about the relationship between working memory and post-completion omission? A pilot study was conducted using a fictional “Star Trek” setting, with the primary purposes of getting subjects to make systematic errors in the laboratory and of providing at least some insight into the relationship with working memory if such errors were produced (see Byrne, 1992). This was a two-factor experiment, the first factor being the working memory load imposed by two different tasks. In the experiment, there were two tasks that the subject had to perform: the phaser bank task and the deflector shield task. These tasks differed substantially in the number of goals and operators required (the phaser task had more of both than the deflector task), which was intended as a manipulation of working memory load. The second factor was post-completion structure: each task either contained post-completion structure or it did not. The post-completion version of each task contained goals and operators nearly identical to those of the “normal” version; the versions differed only in whether the task had post-completion structure. In the experiment, subjects went through two phases: training and trial. Subjects first trained with each task to ensure that they had correct knowledge of how to do each task. This was followed by the trial phase, in which each subject received 10 trials with each task. The order of tasks within trials was randomly determined.


Results for trial error rate are depicted in Figure 4. Within the deflector task, there is no effect of post-completion, t(26) = .00, not significant. Within the phaser task, the effect of post-completion is also not statistically significant, though it shows a trend in the predicted direction, t(26) = 1.26, p = .110, one-tailed.

[Figure 4. Trial error rate (errors/trial): Deflector, 0.02 without post-completion and 0.02 with; Phaser, 0.32 without post-completion and 0.62 with.]

There is another way of looking at these data. There are eleven physical actions performed as part of the phaser task. In the best-case scenario, the phaser hits and destroys the enemy vessel on the first shot, meaning that exactly eleven actions are performed. If errors are equiprobable, as would be suggested by a stochastic error theory, any particular step should be responsible for 1/11 or 9.1% of the errors in the task. In actuality, the post-completion error accounted for just under 15% of the errors made by subjects (13 out of 87 errors). By a binomial test of proportion, this is significantly more than would be expected under the stochastic error hypothesis (z = 1.96, p = .025, one-tailed). This is a conservative test of the hypothesis, since the phaser often missed the target and did not always destroy it, so the first ten steps were performed more times than the post-completion step. Thus, the real expected proportion of errors for which the post-completion step should have been responsible is actually less than 9.1%. No test was possible for the deflector task, since a grand total of three errors was made on that task; obviously, this is a severe floor effect, and it affects the other tests as well. However, note that the error rate in the phaser/post-completion cell is almost twice that of the phaser/no post-completion cell. This suggests a difference of some kind. Since subjects in training were engaged in the joint tasks of procedure execution and procedure acquisition, they likely experienced a high working memory load. There is some evidence (e.g., Reber & Kotovsky, 1992) that working memory load is particularly critical in training, so looking at the training data for this study may also be informative. Results are presented in Figure 5.
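The proportion test used above is a one-sample z approximation to the binomial; it can be computed as follows (a sketch; exact values may differ slightly from the reported statistic depending on rounding and continuity corrections):

```python
from math import sqrt

def binomial_z(successes, n, p0):
    """Normal approximation to the one-sample binomial test of proportion."""
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)  # standard error of the proportion under the null
    return (p_hat - p0) / se

# 13 of the 87 observed errors were post-completion errors; under the
# stochastic-error null, each of the 11 steps accounts for 1/11 of errors.
z = binomial_z(13, 87, 1 / 11)  # roughly 1.9
```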


These data are somewhat more conclusive. Individual t-tests within tasks indicate the clear presence of an effect. Within the phaser task, the cells are significantly different, t(26) = 2.25, p = .017, one-tailed (the direction of difference was predicted), but they are not in the deflector task, t(26) = -1.74, p = .094, two-tailed (no difference predicted). That is, post-completion did not have an effect in the simpler task, but in the more memory-intensive phaser task, post-completion was associated with a sizably increased error rate. One notable effect not apparent in the means comes from the very first training trial. Subjects had access to the training manuals for the first two trials, so the correct knowledge was most certainly available to them. Despite the fact that the manual explicitly instructs subjects on the post-completion step (it is mentioned in three places in the manual), 93% of the subjects (13 of 14 in this condition) made this error on the initial trial. Errors are typically considered difficult to predict, but here subjects consistently made this error in a laboratory situation.

[Figure 5. Training error rate (errors/trial): Deflector, 0.25 without post-completion and 0.05 with; Phaser, 0.85 without post-completion and 2.47 with.]

Thus, results from preliminary work seem to indicate that it is, in fact, possible to get subjects to consistently err in the laboratory. Second, there is some evidence supporting the hypothesis that working memory load and post-completion structure interact to produce increased error. Clearly, more sensitive measures and a cleaner experimental design are necessary before strong conclusions can be drawn. The rest of this paper is an attempt to realize those aims, through a running computer simulation model and a second, more refined, experiment. The experiment is essentially a replication of the previous experiment in which working memory load and the particular tasks used are unconfounded, and individual differences in working memory are included in the experimental design. The fictional Star Trek setting is retained and the same phaser task used. The


deflector task was removed and replaced with a “transporter” task, which is isomorphic to the phaser task in terms of GOMS structure. The computational model is a simulation of the tasks used in the experiment. The combination of the cleaner experiment and the simulation model should provide a strong test of the hypothesis of post-completion error as a result of interaction between task structure and working memory.

Computational Model

Though there are a total of four tasks used in the experiment, the isomorphic nature of the phaser and transporter tasks makes it unnecessary to model both of them. Thus, only two models were constructed: the post-completion version of the phaser, and the non-post-completion version. The graphical representation of the GOMS model for the post-completion version of the phaser appears in Figure 6. This GOMS model was transformed into a production system model, specifically a CAPS model, by a procedure similar to the one outlined in Bovair, Kieras, and Polson (1990). Since the post-completion and non-post-completion versions of the task share much of the same structure, a common set of productions was used for the first portion of the task. These productions appear in section 1 of Appendix C. The productions that fire after the point where the tasks diverge appear in section 2 (those used to model the post-completion task) and section 3 (those used to model the non-post-completion version). Since the two tasks are quite similar, these latter two sets of productions are also quite similar. The model that was actually built is very similar to the production system model described earlier for the photocopier. Rather than a goal stack, the model uses “maintenance” productions to manage goals. Maintenance productions have the following general form:

IF the supergoal [is above threshold],
AND NOT the goal [is above threshold],
AND the preconditions for the goal are met,
AND the goal state has not been reached
THEN give activation to the goal

This yields behavior quite similar to a goal stack, with one minor difference: goals can still be matched after they have been satisfied, at least until they are displaced. What becomes critical is the rate of displacement. Rate of displacement is, as described earlier, a function of the load on working memory: the higher the load, the faster satisfied goals will be displaced.
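The condition side of this general form can be paraphrased as a simple predicate (schematic Python, not actual CAPS syntax):

```python
def maintenance_production_matches(supergoal_active, goal_active,
                                   preconditions_met, goal_state_reached):
    """Condition side of a maintenance production as described in the text.

    When these conditions hold, the production fires and its action side
    propagates activation to the (sub)goal.
    """
    return (supergoal_active             # the supergoal is above threshold
            and not goal_active          # the goal itself is below threshold
            and preconditions_met        # the goal's preconditions are met
            and not goal_state_reached)  # the goal has not yet been satisfied

# An unsatisfied subgoal whose supergoal is active receives activation:
fires = maintenance_production_matches(True, False, True, False)  # True
```

Note that nothing on the condition side checks how long the supergoal has been active; this is exactly what allows a rapidly displaced supergoal to strand its remaining subgoal.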
The results of running the two versions of the model are, as expected, not the same, and depend on the value of the activation ceiling given to the CAPS system. If the activation ceiling is 7.0 units or higher, both models execute the task without error. However, if the ceiling is lowered to 6.5 units or less, the post-completion step of “turn off tracking” is often omitted by the post-completion version of the model. The non-post-completion model does not make any errors with a ceiling of 6.5. Thus, depending on the value of the ceiling, the post-completion model may or may not make the omission. The omission is dependent on the state of working memory at the time that the main goal is fulfilled, and this in turn is dependent on the number of times the phaser must be fired and how long the target must be tracked before firing. With a ceiling below approximately 6.0 units, the omission is made every time. This suggests a correlation should be observed between working memory and post-completion error.

16

[Figure 6. Graphic GOMS for phaser task, post-completion version. Top goal: destroy Romulan ship; subgoals include charging the phaser bank, setting the focus, turning on tracking, and tracking and firing (if the target is killed, proceed; otherwise fire again), with the post-completion step of turning off tracking (click “Tracking”) remaining after the main goal is satisfied. Operators include clicking “Power Connected,” “Charge,” “Stop Charging,” “Settings,” “Focus Set,” “Tracking,” and “Firing,” waiting until charged, deciding upon and setting the desired focus value on the slider, determining the firing criterion, pressing the cursor keys until the firing criterion is reached, and pressing the space bar.]


In fact, one of the factors that determines whether a post-completion error will be made by the model is the number of cycles spent tracking the target. This is because as the model tracks the target, the old (satisfied) goals are displaced more and more, relieving the load on working memory. This is functionally equivalent to raising the capacity ceiling. Thus, if the tracking time is very long, the effective capacity of the system will be higher and a lower “real” ceiling is necessary to cause the post-completion error. If the tracking time is very short, however, the model can omit the post-completion step even with the activation ceiling just over 7.0, because the functional capacity after tracking is low. Thus, if two tasks share exactly this same structure but differ in the time for the tracking subtask, one might expect differences in post-completion error rates. There are several features of the CAPS system that make these models work. One of these features is the ability to function without a goal stack. If a strict goal stack is used, the model will make the post-completion error every time, regardless of memory capacity, in exactly the same way Soar would. What makes the errors happen is the parallelism that is inherent in CAPS. The incorrect post-completion action (exiting the task) and the correct action (turning off tracking) begin to compete in parallel once the supergoal (destroy the enemy) is satisfied. If there is enough “room” in working memory for both of them, no error occurs. However, if there is little room in working memory, the incorrect post-completion action will win the competition, since the correct action is dependent on the main goal for its activation—that is, the main goal appears on the IF side of the maintenance production that activates the “turn off tracking” subgoal.
This main goal is rapidly displaced once it has been satisfied, giving the incorrect post-completion action the opportunity to win the competition (since it does not depend on the main goal) and be executed. Thus, the claim that post-completion error arises from the combination of a strained working memory and a particular task structure, despite the presence of correct knowledge, is supported by the CAPS models constructed. The experiment described below is intended to evaluate this hypothesis empirically. More specifically, the model predicts that there should be an increase in total error rate, due primarily to post-completion errors, when working memory load is high. In terms of the experiment, “high working memory load” should manifest itself as an interaction between the externally imposed load and subjects’ individual working memory capacity. Thus, the model predicts a three-way interaction between post-completion structure, memory load, and memory capacity. This three-way interaction should disappear when post-completion errors are removed from the analysis. The possibility exists that the two tasks, the phaser and the transporter, have different characteristics that have an impact on error rates. If the difference between the two tasks is a difference in tracking time, this should have an effect on post-completion error rates, resulting in a four-way interaction between post-completion structure, memory load, working memory capacity, and task. If such an effect is found, controlling for differences in tracking time should eliminate it.

Method

Subjects

Subjects were 64 Georgia Institute of Technology undergraduates who participated for extra credit in a psychology course.

Apparatus/Materials

The materials used for the experiment consisted of a sheet of written instructions, a set of paper manuals, and a Macintosh IIfx personal computer running HyperCard. The sheet of instructions described to the subjects the fictional “Star Trek” setting for the experiment: they were “Operating Engineers” aboard the Columbia, a sister vessel of the Enterprise. The sheet then explained that they were to learn how to operate the ship’s phaser bank and transporter system. (See Appendix B.1 for a screen shot of the phaser bank.) Once they had completed training, they would later be assigned to actual duty aboard the ship. The instructions stressed that in the training phase they were to


concentrate on learning the procedures and not to worry about speed, but that once the trials were “for real” it was critical that they work quickly in order to ensure the safety of their ship. The sheet outlined the experimental procedure, which is described in the Procedures section below. The paper manuals contained instructions for how to perform each task. These instructions were based on the GOMS models of the tasks (see Figure 6). First, the manual described the top-level goal structure of the task, and the remainder was divided into sections that described the method for accomplishing each goal. Each section started with a summary of the steps for accomplishing the goal. These steps were taken directly from the GOMS models. The remainder of each section gave a detailed description of the steps and the “reason” for having to do each step. The “reasons” provided included things like “Once the phaser has charged, it is necessary to disconnect the battery from the power source.” (See Appendix B.2 for a more complete example.) These were included to facilitate learning of the task, as Kieras and Bovair (1984) demonstrated that “mental model” information aids the learning of procedures. Finally, the last page of each manual contained a full summary of the procedure for easy reference. (See Appendix B.3 for an example.) The experiment program was developed with Claris’s HyperCard software, version 2.1. The experiment program handled all aspects of the experiment not covered by the instruction sheet or the manuals. This included not only presentation of the tasks, but recording of subjects’ actions, keeping track of where the subject was in the training procedure, randomizing trials, assigning experimental condition, and so on.

Design

This experiment was a four-way design with three between-subjects factors and one within-subject factor.
The factors were the following: post-completion (within-subjects), task mix (described below), working memory load, and working memory capacity. The experiment was designed so that multiple regression analysis could be performed, with post-completion, task mix, and memory load as categorical variables and memory capacity as a continuous variable. Post-completion had two levels: control and post-completion. Subjects learned and performed two tasks, the phaser bank and the transporter. These two tasks were identical in terms of GOMS structure, but one task contained an operator that satisfies the main goal prior to completion of all the subgoals. Thus, one of the tasks contained post-completion structure, while the other did not. Note that this is not a change in the number of steps or the order of the steps in the procedure, but in the goal structure of the task only; opportunities for error are the same in both the post-completion and control versions of the task. Task mix, the second factor, had two levels. Since it is desirable to attribute any effects of post-completion to the post-completion structure and not the specific task, two “versions” of the experiment were run: one in which the phaser contained the post-completion, and one in which the transporter contained the post-completion. Subjects were randomly assigned to one of these conditions. The task mix in which the phaser was the task with post-completion and the transporter was the control task will be referred to as Mix A, the reverse as Mix B. The third factor was external working memory load, which also had two levels (no load and load) to which subjects were randomly assigned. In previous work, working memory load was manipulated by varying task complexity, but this was a relatively weak manipulation. In this experiment, subjects in the load-present condition performed a concurrent, unrelated task, similar to that used by Reber and Kotovsky (1992).
There were several reasons for using this task: it has been shown to be a successful manipulation, it uses material that is unrelated to the phaser and transporter tasks, and it is relatively straightforward to implement in HyperCard. In this task, subjects were presented with auditory stimuli (letters) at a rate of approximately one every three seconds. At random intervals ranging from 3 to 15 seconds, subjects heard a tone rather than a letter. When they heard this tone, subjects were to recall (write down) the last three letters that they had heard. Thus, subjects in this condition were forced not only to remember three letters, but to constantly update the letters that they were to remember.


The fourth factor, working memory capacity, was a continuous independent variable in the multiple regression. In order to obtain a good estimate of working memory capacity, two different tasks were used, and these tasks were designed to have as little domain-specific crossover as possible. A subject’s working memory score was the average of their z-scores on two working memory tasks: Sentence Digit and Operation Word, taken from Turner and Engle (1989). These tasks make use of different items in their computation and storage components in order to minimize domain-dependence within each task. A good estimate of general working memory capacity should be provided by the convergence of these two measures. In the Sentence Digit task, subjects read a sentence followed by a digit from 1-9. They were asked a simple question about the sentence and told to retain the digit until recall was requested. One or more sentence-digit pairs were presented, after which the subject was asked to recall the digits. Subjects began with one sentence-digit pair, then received two, then three, and so on until the subject either incorrectly answered questions or misremembered digits on two out of three trials for a given span. The subject’s score was the total number of digits recalled correctly. Operation Word was similar to Sentence Digit, except that instead of being presented with a sentence, subjects were presented with an arithmetic problem followed by a word. Subjects were asked to perform the arithmetic computation and then remember the word. Presentation length was gradually increased until the same stopping criterion was reached; that is, until subjects made errors in either computation or recall on two out of three trials. A subject’s score was again the total number of words recalled correctly. The dependent measure was the total number of errors made by the subject on each task.
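The composite capacity score described here (the mean of a subject’s z-scores on the two span tasks) can be computed as, for example (the sample spans below are invented):

```python
from statistics import mean, pstdev

def composite_wm_scores(sentence_digit, operation_word):
    """Average each subject's z-scores on the two span tasks.

    Scores are z-transformed within each task so that the two measures
    are on a common scale before averaging.
    """
    def zscores(xs):
        m, sd = mean(xs), pstdev(xs)
        return [(x - m) / sd for x in xs]
    return [mean(pair) for pair in zip(zscores(sentence_digit),
                                       zscores(operation_word))]

# Hypothetical spans for four subjects:
wm = composite_wm_scores([12, 15, 9, 18], [20, 25, 15, 28])
```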
Table 1 summarizes the layout of the design (not including the continuous factor), where check marks indicate the presence of post-completion in the task. The design is a 2 x 2 x 2 (within-subjects) x 1 (continuous) multiple-regression design.

Table 1. Summary of design

                     Phaser    Transporter
Mix A, WM load         √
Mix B, WM load                     √
Mix A, no WM load      √
Mix B, no WM load                  √

Procedures

The subjects were run in two sessions, spaced one week apart. This was done to try to overcome the floor effects seen in the preliminary study. In that experiment, evaluation of subjects’ training performance was based explicitly on errors, and subjects reported a high level of conscious self-monitoring even when training was complete. A week’s delay between training and test trials was used to help alleviate this effect, while being a small enough delay that subjects should still have remembered the tasks. In the first session, subjects read the instruction sheet describing the experimental procedure, after which the training procedure began. First, subjects read the instruction manual for the phaser bank task. They next performed one trial with the phaser bank, during which they were allowed to refer to the manual. This was essentially a “familiarization” trial. The manual was then taken from the subject to encourage the subject to memorize the


procedure. Training continued with the phaser bank until the subject was able to complete three trials without error. During training, if the subject made an error executing the procedure, the error was explicitly flagged and the trial ended. Once training with the phaser was finished, training with the transporter began. This was nearly identical to training with the phaser: subjects read the manual for the task, tried once with the manual at hand, and then trained to criterion without the manual. The second session consisted of two parts: working memory assessment and test trials of the two tasks. Subjects first performed the working memory tasks until they reached the limits of their span. Once a subject had completed working memory assessment, they performed ten trials each with the phaser and transporter, presented in random order. That is, unlike in training, trials were not blocked but were randomized to prevent differential practice effects. During the second session, the experiment program emitted a loud buzz whenever an error was made, so subjects were aware that they had just made an error. Subjects were, however, allowed to continue the trial. In order to encourage subjects to work quickly, the display screen for each task contained a “timer” that displayed the amount of time (in seconds) that had passed since the subject started that particular trial. At the end of each trial, subjects were shown their average “time to complete” for that task. Subjects were run alone or in pairs. In order to minimize noise interference between subjects assigned to the working memory load condition, subjects wore headphones connected to their Macintosh. To maintain consistency across subjects, all subjects wore headphones, regardless of whether they were run in pairs or individually.

Results

A few observations had to be adjusted. One subject clearly forgot one of the tasks and was approximately six standard deviations from the mean on the dependent variable (number of errors); this subject’s data were excluded from the analysis. Two subjects failed to complete even one trial with the Sentence Digit task. For these subjects the only available estimate of working memory was the task they did complete, so their score on the Operation Word task was substituted. The correlation between the two working memory measures was much smaller than expected, r = .25, p = .042. This is likely due to ceiling effects; the tasks simply were not difficult enough to create much variance in the subjects that were recruited. Histograms for the Sentence Digit and Operation Word tasks are presented in Figures 7 and 8. One question of interest not addressed in the CAPS model is that of practice (CAPS does not include a learning mechanism). Figure 9 depicts the mean errors as a function of trial number. Note the large number of errors made on the first trial, and that this jump is a function of errors that are not post-completion errors. In total errors, there was a main effect for trial, F(9,558) = 25.14, p < .001, and an interaction between trial and post-completion structure, F(9,558) = 19.43, p < .001. Similar results were observed for the difference between the errors in the control condition and exclusively post-completion errors (such errors occurred only in the post-completion condition). The first trial had not only the highest mean number of errors, but also the highest variability. Eliminating the first trial leaves the main effect for trial number intact, but eliminates the interaction between post-completion and trial number. Thus, while subjects did improve over the course of the experiment, there is no evidence supporting the idea of different levels of improvement between post-completion and other errors.
For the remaining analyses, errors will be collapsed across trials 2-10.
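The exclusion criterion applied above (a subject roughly six standard deviations from the sample mean on total errors) amounts to the following check (illustrative only; the error counts are invented):

```python
from statistics import mean, pstdev

def flag_outliers(error_counts, k=6.0):
    """Return indices of subjects whose error counts lie more than k
    standard deviations from the sample mean (k = 6 per the text)."""
    m, sd = mean(error_counts), pstdev(error_counts)
    return [i for i, x in enumerate(error_counts) if abs(x - m) > k * sd]

# Hypothetical data: 63 typical subjects plus one extreme score.
outliers = flag_outliers([6] * 63 + [90])  # flags only the last subject
```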


[Figure 7. Frequency distribution for the Sentence Digit task (spans 1-7).]

[Figure 8. Frequency distribution for the Operation Word task (spans 1-7).]


[Figure 9. Mean number of errors by trial number (trials 1-10), plotted separately for control-task total errors, post-completion-task total errors, and post-completion errors only.]

The cell means for each condition within each task mix are presented in Figures 10 and 11. Means for the post-completion conditions with the actual post-completion errors removed are also presented. The complete ANOVA table is presented in Appendix D. The result of interest was a reliable three-way interaction; as one might guess from the figures, this was the post-completion by task mix by memory load interaction, F(1,55) = 4.70, p = .034. This three-way interaction decomposes in a relatively straightforward way. Within task mix A, the interaction of memory load and post-completion is reliable, F(1,27) = 5.66, p = .025. This matches quite closely with the predictions of the CAPS model: working memory load and post-completion structure must both be present to cause an increase in errors. Note, though, that there was no effect involving individual working memory capacity. This is likely due to the low variability in this measure; since the subjects were generally similar in working memory capacity, they were not differentially affected by the working memory load. A key question is whether or not the increased error rate is due to post-completion errors. This appears to be the case, for if the error of omitting the post-completion step is subtracted from the errors observed in the post-completion condition, no main effects or interactions are observed at the .05 level.


[Figure 10. Task Mix A errors: mean number of errors by WM load (no load vs. load) for the control condition, the post-completion (PC) condition, and the PC condition with PC errors removed. The PC condition under load shows by far the most errors (11.6); the remaining cell means range from 3.25 to 7.20.]

Within task mix B, however, the pattern of results is somewhat different. There is only a main effect of post-completion, F(1,28) = 23.26, p < .001. Again, this appears to be due to the presence of post-completion errors; when post-completion errors are removed, there are once again no effects at the .05 level. While this difference between task mixes may seem counter to what one might expect from the CAPS model, in reality it is not. Recall that the CAPS model predicts post-completion errors at higher memory capacity if the time spent tracking the target is short. As it turned out, there was a significant difference in the amount of time spent tracking the target in the two tasks. In the last training trial (the most stable time estimate), subjects averaged 16.7 seconds to track with the phaser, while they averaged only 7.8 seconds with the transporter. This difference is reliable, t(62) = 7.22, p < .001. On average, subjects took more than twice as long to track with the phaser as with the transporter. This is entirely consistent with the CAPS simulation, as subjects working with the transporter were prone to making post-completion errors even without additional memory load.


[Figure 11. Task Mix B errors: mean number of errors by WM load (no load vs. load) for the control condition, the post-completion (PC) condition, and the PC condition with PC errors removed. The PC condition produces the most errors under both load conditions (9.06 and 10.81); the remaining cell means range from 3.88 to 5.81.]

Another way of looking at the role of tracking time in the difference between tasks is to directly control for differences in tracking time. The correlation between the tracking-time difference and the task mix factor is .69, p < .001. Statistically controlling for tracking time does indeed eliminate the three-way interaction of post-completion, task mix, and memory load. If this tracking time variable is substituted for the task mix variable in the multiple regression, the overall results are quite similar to those obtained with the original dichotomous task mix variable (see Appendix D for the ANOVA table using the tracking time variable instead of the task mix variable). The difference between the phaser and transporter, then, can be attributed to differences in the time spent on the tracking subtask. The basic hypothesis of working memory strain yielding post-completion errors is supported not only by a running CAPS model, but to some degree by laboratory data as well. Given the differences in tracking time between the two tasks, the CAPS model predicts a four-way interaction between post-completion structure, memory load, task mix, and working memory capacity. Instead, a three-way interaction involving all of those factors except working memory capacity was found. This would be consistent with the CAPS model if there were no individual differences in working memory capacity. The subjects in this study did, in fact, demonstrate minimal between-subjects variability in memory capacity. Thus, the prediction of the CAPS model of a three-way interaction is supported by the empirical results. This finding is particularly noteworthy given the limited amount of previous formal and empirical work on systematic procedural errors.


Discussion

This work represents a first step into a largely unexplored area of cognition. While errors of various types have been studied by many researchers, this kind of common, everyday procedural error has remained largely untouched. One reason may be that such errors are believed to be stochastic and therefore uninteresting. This work demonstrates that this is not the case. At least one type of procedural error, the post-completion error, is not purely stochastic: it was made consistently by subjects in a laboratory situation, despite the presence of correct knowledge. Thus, laboratory studies of systematic error are viable, particularly if the mechanisms responsible for the error can be identified. In this case, systematic error is a result of both working memory load and a particular task structure.

Using CAPS as a modeling tool was critical in developing this hypothesis, as CAPS is one of the few systems available in which the concept of limited working memory capacity is operationalized. One of the critical aspects of the CAPS model that gave it the ability to make post-completion errors is that the model can "lose" goals. Forgetting goals is likely to play a crucial role in formal and empirical approaches to other kinds of common error, such as capture errors (Norman, 1981). The classic example of a capture error occurs when driving a very familiar route, such as home from work: one plans a deviation from the route, perhaps to run an errand on the way home. Despite the fact that the goal of running the errand remains unsatisfied, people often find themselves at home, wondering how they could have forgotten the errand. The "drive home" procedure is said to "capture" the intended errand, hence the name. Clearly, the forgetting of the goal to run the errand is the critical event in causing the error; the question is how the goal gets lost.
The approach developed here may be able to account for this kind of error as well, since goal forgetting is part of the system. On the other hand, the lack of an explicit goal stack makes it necessary to run CAPS at ceiling values low enough that displacement decay is forced; CAPS only loses things from working memory when it tries to use more activation than it is allowed. If the system is not run with a capacity low enough to cause displacement, satisfied goals do not decay, and correct sequencing is not ensured since irrelevant goals may remain active. That is, CAPS can only work if there is some strain on working memory, yet people seem to work quite effectively when there is no such strain. There are two approaches that would solve this problem, neither of which is supported by CAPS. One approach is to have goals at a particular level of the goal hierarchy inhibit one another, forcing decay. This is similar to the lateral inhibition proposed in the interactive activation model of word recognition (McClelland & Rumelhart, 1981). However, empirical evidence for such inhibition at the level of system goals is, at best, lacking. Another approach, for which there is considerable evidence (see Salthouse & Babcock, 1991, for a review), is to have the activation of all elements in working memory that are not explicitly maintained decay over time. If satisfaction of a goal equates with a lack of explicit maintenance, this would yield a model that should run correctly regardless of the capacity limitation. More research is needed to determine the best way to construct such a model.

The CAPS model is lacking in other areas as well. While it predicts post-completion errors reasonably well, it makes no other errors while executing the tasks. The subjects, on the other hand, averaged approximately one non-post-completion error every other trial. The CAPS model is silent on the possible causes of these other errors.
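The time-based decay alternative described above can be made concrete with a toy activation loop. This is a sketch only: the update rule, parameter values, and forgetting threshold are illustrative assumptions, not the CAPS mechanism.

```python
def step(memory, maintained, decay=0.8, ceiling=3.0, threshold=0.1):
    """One cycle of a toy activation-based working memory.

    Explicitly maintained elements are refreshed to full activation; all
    others decay multiplicatively. If total activation exceeds the ceiling,
    everything is scaled down (displacement). Elements whose activation
    falls below threshold are forgotten.
    """
    new = {k: (1.0 if k in maintained else a * decay)
           for k, a in memory.items()}
    total = sum(new.values())
    if total > ceiling:
        scale = ceiling / total
        new = {k: a * scale for k, a in new.items()}
    return {k: a for k, a in new.items() if a >= threshold}

wm = {"goal:destroy-ship": 1.0, "goal:turn-off-tracking": 1.0,
      "daydream": 1.0, "route-home": 1.0}
for _ in range(10):
    wm = step(wm, maintained={"goal:destroy-ship"})
# Unmaintained items, including a satisfied-but-unrefreshed goal, decay
# below threshold over cycles; only the maintained goal survives.
```

The point of the sketch is that such a model forgets unmaintained goals whether or not the ceiling is ever reached, which is exactly the property the text argues CAPS lacks.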
In addition, the number of errors subjects made, both post-completion and otherwise, did decline slightly with practice. Since CAPS does not include a learning mechanism, it cannot account for this trend. It may be possible to account for this kind of small improvement over time with a strength-adjustment mechanism such as the one found in ACT-R or in some PDP models, but until such a learning mechanism is actually implemented, the reduction in errors over time must be considered a shortcoming of the model.

This research has implications for the design of human-machine systems as well. Designers of such systems would do well to avoid building post-completion structure into the interaction, even for tasks that do not themselves impose much of a working memory load. A person's working memory can easily become "filled" with task-irrelevant information at any time: daydreaming, planning the route to drive after getting gas, worrying about an account balance, and so on. Since the interface designer has no control over memory load, post-completion steps should always be avoided. There are examples of such safeguards in real systems today, such as ATMs that will not dispense money until the card has been removed, ATMs where the card is swiped rather than inserted, and gas caps that are attached to the car so that they cannot be left behind. One measure that probably serves to reduce, though may not eliminate, post-completion errors is to provide some cue about the post-completion step. This is common in ATMs in the United States: they typically beep and flash a message on the screen until the card is removed. Both the experiment and the model presented here assume that no such cue is present. Obviously, more work is necessary to determine just what the impact of such a cue would be.

Another issue raised here is that of goal identification. One of the oldest problems confronting research in cognition is identifying "what's in people's heads." How, for instance, is it possible to know that the subjects' goal was "destroy enemy" and not "destroy the enemy and turn off the tracking system"? The question of conjunctive goals is critical and difficult, both in cognitive psychology and in artificial intelligence. One answer is provided by the data: the fact that just over 90% of the subjects made the post-completion error at least once indicates that such a conjunctive goal was not present. This is not, however, a globally satisfactory answer. Much more work needs to be done to understand how and when people form conjunctive goals and how such goals affect behavior.

This work also points out a weakness of most of the dominant modeling approaches in cognitive science today. Most systems, once they have the requisite knowledge to perform a task correctly, always perform the task correctly. People do not. Whether remedying this requires major or minor changes depends on the system in question, but most systems simply fail to address it.
The solutions found to this problem will probably affect how these systems work in other areas as well. The CAPS model, in particular, points to two areas in which many systems might be revised: goal stacks and working memory. While there is evidence that many forms of problem-solving behavior are stack-like, there is no conclusive proof that the entire goal-management system for all tasks should be based on a stack. This research indicates that a goal stack might not be effective for modeling errors, which is consistent with other work on errors (e.g., Baars, 1992b) that also relies on "goal competition." Most computational models also omit any operationalization of a working memory constraint. While this may well be adequate for a wide variety of tasks, there may be some phenomena, such as errors, that require some method for modeling the limited capacity of human information processing.
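The contrast between stack-based and competition-based goal management can be shown with a toy sketch; the threshold and activation values below are arbitrary illustrations, not model parameters.

```python
# Stack-based control: once pushed, a pending goal is guaranteed to run,
# because popping is the only way off the stack; it cannot be "forgotten".
stack = ["turn-off-tracking",   # post-completion step, below the main goal
         "destroy-ship"]        # main goal, on top
executed = []
while stack:
    executed.append(stack.pop())

# Competition-based control: only goals whose activation clears a threshold
# ever fire, so an unrefreshed post-completion goal can be lost entirely.
goals = {"destroy-ship": 1.0, "turn-off-tracking": 0.4}  # toy activations
THRESHOLD = 0.5
performed = [g for g, a in sorted(goals.items(), key=lambda kv: -kv[1])
             if a >= THRESHOLD]
```

Under the stack regime the post-completion step always executes; under the competition regime it is skipped whenever its activation is too low, which is the error pattern observed empirically.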


Appendix A: Example Production System

;;; this production maintains the main goal in working memory and creates
;;; the subgoals
MAINTAIN-GET-COPIES
IF
  (goal get-copies) AND (not (have copies))
THEN
  (activate (goal get-copies))

;;; this production maintains the first subgoal
MAINTAIN-PLACE-PAPER
IF
  (goal get-copies) AND (not (paper placed))
THEN
  (activate (goal place-paper))

;;; this production implements the first step of the method to
;;; accomplish the PLACE-PAPER subgoal
LIFT-COVER-1
IF
  (goal place-paper) AND (cover closed) AND (not (paper on-glass))
THEN
  (enact (lift cover)) (activate (cover open))

;;; this production continues the first method
PLACE-ON-GLASS
IF
  (goal place-paper) AND (cover open) AND (not (paper on-glass))
THEN
  (enact (put-on paper glass)) (activate (paper on-glass))

;;; this production closes the cover, thereby satisfying the first subgoal
CLOSE-COVER-1
IF
  (goal place-paper) AND (cover open) AND (paper on-glass)
THEN
  (enact (close cover)) (activate (paper placed)) (activate (cover closed))

;;; this production maintains the second subgoal
MAINTAIN-ENTER-NUMBER
IF
  (goal get-copies) AND (paper placed) AND (not (number entered))
THEN
  (activate (goal enter-number))

;;; this production maintains the third subgoal
MAINTAIN-MAKE-COPIES
IF
  (goal get-copies) AND (number entered) AND (not (copies made))
THEN
  (activate (goal make-copies))

;;; this production maintains the final subgoal
MAINTAIN-REMOVE-ORIGINAL
IF
  (goal get-copies) AND (copies made) AND (not (original removed))
THEN
  (activate (goal remove-original))


Appendix B: Experimental Materials

B.1 Screen shot of phaser bank

B.2 Example of a section from the phaser manual

Step 2. Setting phaser beam focus

Summary of steps to set phaser beam focus:
• Click "Settings"
• Adjust location of slider to desired focus
• Click "Focus Set"

The opti-magnetic focusing apparatus of the X3 class phaser makes it possible to control the amount of dispersion of the phaser beam itself. The higher the dispersion, the larger the perpendicular cross-section of the beam; therefore, the higher the dispersion, the easier it is to hit the target. However, the more dispersed the beam, the less damaging it is. Dispersion is controlled by the Phaser Focus Index, which is set on the Phaser Control Bank. The first step in setting the Focus Index is to enable the alteration of current settings. This is done by clicking on the "Settings" button, as demonstrated in Figures 7 and 8.

30

Figure 7. Before clicking “Settings”

Figure 8. After clicking “Settings”

Once this is done, the Focus Index may be changed by means of a slider-style control. The slider is adjusted by clicking on the indicator and dragging it to the desired setting. This is illustrated in Figures 9 and 10.

Figure 9. The mouse button is clicked down and held

Figure 10. The mouse button is released at the desired index

Once the desired focus is set, it is necessary to inform the X3 firing system that the current focus is the desired one. That is, the focus setting must be “locked in.” This is done by clicking the “Focus Set” button, as demonstrated in Figures 11 and 12.

31

Figure 11. Before clicking “Focus Set”

Figure 12. After clicking “Focus Set”

Once this has been completed, the Phaser Focus Index has been set. As with charging the phaser battery, it is impossible to operate the other controls until this has been done.

B.3 Example procedure summary from phaser manual

Summary of Phaser Bank Operating Procedures:

Step 1. Charge the phaser:
• Click "Power Connected"
• Click "Charge"
• Wait until phaser charges the appropriate amount
• Click "Stop Charging"
• Click "Power Connected"

Step 2. Set phaser beam focus:
• Click "Settings"
• Adjust location of slider to desired focus
• Click "Focus Set"

Step 3. Track the target:
• Click "Firing"
• Click "Tracking"
• Use arrow keys to adjust location of the target indicator

Step 4. Fire the phaser:
• Press the space bar
• Determine if the target has been destroyed
• If not, return to Step 1

Then return to Main Control.
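The summary above can be encoded as a simple ordered step list, which makes the task's structure explicit in a toy trace generator. The step names paraphrase the summary, and the final tracking-off action is the post-completion step of the experimental task (described elsewhere in the thesis), not listed in this manual excerpt.

```python
# Condensed encoding of Steps 1-4 from the procedure summary (paraphrased).
PHASER_PROCEDURE = [
    "click Power Connected", "click Charge", "wait for charge",
    "click Stop Charging", "click Power Connected",
    "click Settings", "adjust slider", "click Focus Set",
    "click Firing", "click Tracking", "track target with arrow keys",
    "press space bar (fire)",
]

def run_trial(shots_to_kill):
    """Return the full action trace for a trial in which the target is
    destroyed on shot number `shots_to_kill` (Steps 1-4 repeat until then)."""
    trace = []
    for _ in range(shots_to_kill):
        trace.extend(PHASER_PROCEDURE)
    trace.append("click Tracking (off)")  # post-completion step of the task
    trace.append("return to Main Control")
    return trace
```

The trace shows why the structure invites error: the main goal is satisfied at "press space bar (fire)", yet one step remains before leaving the task.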


Appendix C: CAPS model

C.1 Productions common to both models

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;
;;; Main goal: Destroy the enemy ship
;;;
(p MAINTAIN-DESTROY-SHIP 1 {(goal ^is destroy-ship) } < 1.0 (stored-state ^enemy-destroyed yes) --> (delta-spew 1.2))

(p MAINTAIN-STATE {(goal ^is destroy-ship) } 0.7 {(stored-state) } < 1.0 --> (delta-spew 1.2))

;;;
;;; First subgoal: Fire the phaser
;;;

(p CREATE-FIRE-PHASER 1 {(goal ^is destroy-ship) } (stored-state ^just-fired no ^enemy-destroyed no) - (goal ^is fire-phaser) 0.1 --> (delta-spew (goal ^is fire-phaser ^parent destroy-ship) 1.2)) (p MAINTAIN-FIRE-PHASER 1 {(goal ^is destroy-ship) } (stored-state ^just-fired no ^enemy-destroyed no) {(goal ^is fire-phaser) } < 1.0 --> (delta-spew 1.2))

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;
;;; First subgoal of firing phaser: Charge it
;;;
(p CREATE-CHARGE-PHASER 1 {(goal ^is fire-phaser) } (visible-state ^charged yes ^focus-set no) (stored-state ^enemy-destroyed no) - (goal ^is charge-phaser) 0.1 --> (delta-spew (goal ^is charge-phaser ^parent fire-phaser) 1.2))


(p MAINTAIN-CHARGE-PHASER 1 {(goal ^is fire-phaser) } (visible-state ^charged yes ^focus-set no) (stored-state ^enemy-destroyed no) {(goal ^is charge-phaser) } < 1.0 --> (delta-spew 1.2))

(p CONNECT-POWER 1 {(goal ^is charge-phaser) } (visible-state ^charged no ^power-connected no) - (click ^object power-connected) 0.9 - (click ^object power-connected) --> (delta-spew (click ^object power-connected) 1.2)) (p CLICK-POWER-CONNECTED-ON 1 {(click ^object power-connected) } {(visible-state ^power-connected no) } --> (modify ^power-connected yes ^feedback none) (delta-spew 0.0)) (p CHARGE-PHASER 1 {(goal ^is charge-phaser) } (visible-state ^power-connected yes ^charged no) - (click ^object charge) 0.9 - (click ^object charge) --> (delta-spew (click ^object charge) 1.2)) (p CLICK-CHARGE-PHASER 1 {(click ^object charge) } {(visible-state ^charged no) } --> (modify ^charged working) (delta-spew 0.0)) (p STOP-CHARGING 1 {(goal ^is charge-phaser) } (visible-state ^charged working) - (click ^object stop-charging) 0.9 - (click ^object stop-charging) --> (delta-spew (click ^object stop-charging) 1.2)) (p CLICK-STOP-CHARGING 1 {(click ^object stop-charging) } {(visible-state ^charged working) } --> (modify ^charged yes) (delta-spew 0.0))


(p DISCONNECT-POWER 1 {(goal ^is charge-phaser) } (visible-state ^charged yes ^power-connected yes) - (click ^object power-connected) 0.9 - (click ^object power-connected) --> (delta-spew (click ^object power-connected) 1.2)) (p CLICK-POWER-CONNECTED-OFF 1 {(click ^object power-connected)
} {(visible-state ^power-connected yes) } --> (modify ^power-connected no) (delta-spew 0.0))

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;;; ;;; Second subgoal of fire-phaser: Set the focus ;;; (p CREATE-SET-FOCUS 1 {(goal ^is fire-phaser) } (visible-state ^charged yes ^focus-set no) (stored-state ^tracking off) - (goal ^is set-focus) 0.1 --> (delta-spew (goal ^is set-focus ^parent fire-phaser) 1.2)) (p MAINTAIN-SET-FOCUS 1 {(goal ^is fire-phaser) } (visible-state ^charged yes ^focus-set no) (stored-state ^tracking off) {(goal ^is set-focus) } < 1.0 --> (delta-spew 1.2)) (p SETTINGS 1 {(goal ^is set-focus) } (visible-state ^focus-value 0.0 ^triple-button battery) - (click ^object settings) 0.9 - (click ^object settings) --> (delta-spew (click ^object settings) 1.2)) (p CLICK-SETTINGS 1 {(click ^object settings)
} {(visible-state ^triple-button battery) } --> (modify ^triple-button settings) (delta-spew 0.0))


(p FOCUS-VALUE 1 {(goal ^is set-focus) } (visible-state ^triple-button settings ^focus-value 0.0) - (click ^object slider) 0.9 - (click ^object slider) --> (delta-spew (click ^object slider) 1.2)) (p CLICK-SLIDER 1 {(click ^object slider)
} {(visible-state ^focus-value 0.0) } --> (modify ^focus-value 1.0) (delta-spew 0.0)) (p LOCK-IN-FOCUS 1 {(goal ^is set-focus) } (visible-state ^focus-value 0.0 ^focus-set no) - (click ^object focus-set) 0.9 - (click ^object focus-set) --> (delta-spew (click ^object focus-set) 1.2)) (p CLICK-FOCUS-SET 1 {(click ^object focus-set) } {(visible-state ^focus-set no) } --> (modify ^focus-set yes) (delta-spew 0.0))

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;;; ;;; Third subgoal of fire-phaser: Turn on the tracking system ;;; (p CREATE-TURN-ON-TRACKING 1 {(goal ^is fire-phaser) } (visible-state ^focus-set yes) (stored-state ^tracking off ^just-fired no) - (goal ^is turn-on-tracking) 0.1 --> (delta-spew (goal ^is turn-on-tracking ^parent fire-phaser) 1.2)) (p MAINTAIN-TURN-ON-TRACKING 1 {(goal ^is fire-phaser) } (visible-state ^focus-set yes) (stored-state ^tracking off ^just-fired no) {(goal ^is turn-on-tracking) } < 1.0 --> (delta-spew 1.2))


(p FIRING-MODE 1 {(goal ^is turn-on-tracking) } (visible-state ^triple-button firing) (stored-state ^tracking off) - (click ^object firing) 0.9 - (click ^object firing) --> (delta-spew (click ^object firing) 1.2)) (p CLICK-FIRING 1 {(click ^object firing)
} {(visible-state ^triple-button firing) } --> (modify ^triple-button firing) (delta-spew 0.0)) (p TRACKING-ON 1 {(goal ^is turn-on-tracking) } (visible-state ^triple-button firing) (stored-state ^tracking off) - (click ^object tracking) 0.9 - (click ^object tracking) --> (delta-spew (click ^object tracking) 1.2)) (p CLICK-TRACKING-ON 1 {(click ^object tracking) } {(stored-state ^tracking off) } {(visible-state ^target-visible no) } --> (modify ^tracking on) (modify ^target-visible yes) (delta-spew 0.0))

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;
;;; Fourth (and final) subgoal of fire-phaser: Track and fire
;;;
(p CREATE-TRACK-AND-FIRE 1 {(goal ^is fire-phaser) } (stored-state ^tracking on ^just-fired no ^enemy-destroyed no) - (goal ^is track-and-fire) 0.1 --> (delta-spew (goal ^is track-and-fire ^parent fire-phaser) 1.2))

(p MAINTAIN-TRACK-AND-FIRE 1 {(goal ^is fire-phaser) } (stored-state ^tracking on ^just-fired no ^enemy-destroyed no) {(goal ^is track-and-fire) } < 1.0 --> (delta-spew 1.2))

(p CREATE-CLICK-SPACE-BAR 1 {(goal ^is track-and-fire) } (stored-state ^just-fired no ^tracking on)


- (click ^object space-bar) 0.1 --> (delta-spew (click ^object space-bar) 1.2)) (p MAINTAIN-CLICK-SPACE-BAR 1 {(visible-state ^target-visible yes) } {(click ^object space-bar)
} < 1.0 (click ^object space-bar) > 0.1 --> (delta-spew 1.2)) (p TRACK-TARGET 1 {(visible-state ^in-sights no ^tracks ^target-visible yes) } --> (modify ^tracks (compute + 1) ^in-sights (target-in-sights?)))

C.2 Productions used to model the post-completion version of the phaser task

(p CLICK-SPACE-BAR 1 {(click ^object space-bar)
} {(visible-state ^in-sights yes) } {(stored-state ^just-fired no) } --> (modify ^tracks 0 ^charged no ^triple-button battery ^focus-set no ^focus-value 0.0 ^feedback (target-destroyed?) ^in-sights no) (modify ^just-fired yes ^enemy-destroyed unknown) (delta-spew 0.0)) ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;;; ;;; Second step (an operator) of destroy-ship: Confirm kill ;;; Note: this doesn't set up an explicit goal, really it's just a check ;;;

(p ENEMY-DESTROYED (goal ^is destroy-ship) {(stored-state ^just-fired yes ^enemy-destroyed unknown) } (visible-state ^feedback killed) --> (modify ^enemy-destroyed yes))


(p ENEMY-STILL-ALIVE {(goal ^is destroy-ship) } {(stored-state ^just-fired yes ^enemy-destroyed unknown) } (visible-state ^feedback >) --> (delta-spew (goal ^is fire-phaser ^parent destroy-ship) 1.2) (modify ^just-fired no ^enemy-destroyed no ^tracking off))

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;;; ;;; Final subgoal of destroy-ship: Turn off tracking ;;; (p CREATE-TURN-OFF-TRACKING 1 {(goal ^is destroy-ship) } (stored-state ^enemy-destroyed yes ^tracking on) - (goal ^is turn-off-tracking) 0.1 --> (delta-spew (goal ^is turn-off-tracking ^parent destroy-ship) 1.2)) (p MAINTAIN-TURN-OFF-TRACKING 1 {(goal ^is destroy-ship) } (stored-state ^enemy-destroyed yes ^tracking on) {(goal ^is turn-off-tracking) } < 1.0 --> (delta-spew 1.2)) (p TURN-OFF-TRACKING {(goal ^is turn-off-tracking) } (stored-state ^tracking on) - (click ^object tracking) 0.9 - (click ^object tracking) --> (delta-spew (click ^object tracking) 1.2)) (p CLICK-TRACKING-OFF 1 {(click ^object tracking)
} {(stored-state ^tracking on) } --> (modify ^tracking off) (delta-spew 0.0))

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;;; ;;; After the task: Exiting ;;; (p CREATE-EXIT-TASK 1 {(stored-state ^enemy-destroyed yes) } - (goal ^is destroy-ship) - (goal ^is exit-task) 0.1 --> (delta-spew (goal ^is exit-task) 1.2))


(p MAINTAIN-EXIT-TASK 1 {(goal ^is exit-task) } < 1.0 --> (delta-spew 1.2)) (p READY-TO-GO {(goal ^is exit-task) } - (click ^object main-control) 0.9 --> (delta-spew (click ^object main-control) 1.2)) (p CLICK-MAIN-CONTROL 1 {(click ^object main-control)
} --> (delta-spew 0.0) (halt))

C.3 Productions used to model the non-post-completion version of the task

(p CLICK-SPACE-BAR 1 {(click ^object space-bar)
} {(visible-state ^in-sights yes) } {(stored-state ^just-fired no) } --> (modify ^tracks 0 ^charged no ^triple-button battery ^focus-set no ^focus-value 0.0 ^in-sights no) (modify ^just-fired yes ^enemy-destroyed unknown) (delta-spew 0.0))

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;;; ;;; Second subgoal of destroy-ship: Turn off tracking ;;; (p CREATE-TURN-OFF-TRACKING 1 {(goal ^is destroy-ship) } (stored-state ^enemy-destroyed unknown ^tracking on) - (goal ^is turn-off-tracking) 0.1 --> (delta-spew (goal ^is turn-off-tracking ^parent destroy-ship) 1.2)) (p MAINTAIN-TURN-OFF-TRACKING 1 {(goal ^is destroy-ship) } (stored-state ^enemy-destroyed unknown ^tracking on) {(goal ^is turn-off-tracking) } < 1.0 --> (delta-spew 1.2))


(p TURN-OFF-TRACKING {(goal ^is turn-off-tracking) } (stored-state ^tracking on) - (click ^object tracking) 0.9 --> (delta-spew (click ^object tracking) 1.2)) (p CLICK-TRACKING-OFF 1 {(click ^object tracking)
} {(stored-state ^tracking on) } {(visible-state) } --> (modify ^tracking off) (modify ^feedback (target-destroyed?)) (delta-spew 0.0))

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;;; ;;; Operator of destroy-ship: Confirm kill ;;; Note: this doesn't set up an explicit goal, really it's just a check ;;;

(p ENEMY-DESTROYED (goal ^is destroy-ship) {(stored-state ^enemy-destroyed unknown ^tracking off) } (visible-state ^feedback killed) --> (modify ^enemy-destroyed yes)) (p ENEMY-STILL-ALIVE {(goal ^is destroy-ship) } {(stored-state ^enemy-destroyed unknown ^tracking off) } (visible-state ^feedback >) --> (delta-spew (goal ^is fire-phaser ^parent destroy-ship) 1.2) (modify ^just-fired no ^enemy-destroyed no))

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;;; ;;; After the task: Exiting ;;; (p CREATE-EXIT-TASK 1 {(stored-state ^enemy-destroyed yes) } - (goal ^is exit-task) --> (delta-spew (goal ^is exit-task) 1.2)) (p MAINTAIN-EXIT-TASK 1 {(goal ^is exit-task) } < 1.0 --> (delta-spew 1.2))


(p READY-TO-GO {(goal ^is exit-task) } - (click ^object main-control) 0.9 --> (delta-spew (click ^object main-control) 1.2)) (p CLICK-MAIN-CONTROL 1 {(click ^object main-control)
} {(stored-state) } --> (delta-spew 0.0) (halt))


Appendix D: ANOVA tables

ANOVA table for original multiple regression

Between-subjects effects
Source             SS        df   MS       F       p
Mix                2.785     1    2.785    0.078   0.781
Load               134.743   1    134.743  3.764   0.057
IDWM               26.574    1    26.574   0.742   0.393
Load*Mix           10.138    1    10.138   0.283   0.597
Load*IDWM          48.608    1    48.608   1.358   0.249
Mix*IDWM           0.444     1    0.444    0.012   0.912
Load*Mix*IDWM      81.224    1    81.224   2.269   0.138
ERROR              1968.622  55   35.793

Within-subjects effects
Source             SS        df   MS       F       p
PC                 437.739   1    437.739  31.959  0.000
PC*Mix             11.574    1    11.574   0.845   0.362
PC*Load            23.873    1    23.873   1.743   0.192
PC*IDWM            8.432     1    8.432    0.616   0.436
PC*Load*Mix        64.397    1    64.397   4.702   0.034
PC*Load*IDWM       14.409    1    14.409   1.052   0.310
PC*Mix*IDWM        15.270    1    15.270   1.115   0.296
PC*Load*Mix*IDWM   15.769    1    15.769   1.151   0.288
ERROR              753.332   55   13.697

PC = post-completion task structure (control, post-completion)
Mix = task mix (A: phaser has PC, transporter control; B: transporter PC, phaser control)
Load = external memory load (no load, load)
IDWM = individual differences in working memory (average z-score on Sentence Digit and Operations Word tasks)


ANOVA table for tracking instead of task mix

Between-subjects effects
Source             SS        df   MS       F       p
Load               144.016   1    144.016  3.952   0.052
IDWM               44.438    1    44.438   1.219   0.274
Track              0.660     1    0.660    0.018   0.893
Load*Track         10.668    1    10.668   0.293   0.591
Track*IDWM         16.596    1    16.596   0.455   0.503
Load*IDWM          36.261    1    36.261   0.995   0.323
Load*Track*IDWM    45.111    1    45.111   1.238   0.271
ERROR              2004.251  55   36.441

Within-subjects effects
Source              SS       df   MS       F       p
PC                  513.123  1    513.123  37.018  0.000
PC*Load             12.251   1    12.251   0.884   0.351
PC*IDWM             26.435   1    26.435   1.907   0.173
PC*Track            1.020    1    1.020    0.074   0.787
PC*Load*Track       76.330   1    76.330   5.507   0.023
PC*Track*IDWM       38.527   1    38.527   2.779   0.101
PC*Load*IDWM        10.225   1    10.225   0.738   0.394
PC*Load*Track*IDWM  17.231   1    17.231   1.243   0.270
ERROR               762.381  55   13.861

PC = post-completion task structure (control, post-completion)
Track = difference between tracking time in post-completion task and control task
Load = external memory load (no load, load)
IDWM = individual differences in working memory (average z-score on Sentence Digit and Operations Word tasks)


References

Anderson, J. R. (1983). The architecture of cognition. Cambridge, MA: Harvard University Press.

Anderson, J. R. (1987). Skill acquisition: Compilation of weak-method problem solutions. Psychological Review, 94, 192-210.

Anderson, J. R. (1993). Rules of the mind. Hillsdale, NJ: Lawrence Erlbaum.

Baars, B. J. (1992a). Experimental slips and human error: Exploring the architecture of volition. New York: Plenum Press.

Baars, B. J. (1992b). The many uses of error: Twelve steps to a unified framework. In B. J. Baars (Ed.), Experimental slips and human error: Exploring the architecture of volition. New York: Plenum Press.

Baddeley, A. D. (1986). Working memory. Oxford: Oxford University Press.

Bovair, S., Kieras, D. E., & Polson, P. G. (1990). The acquisition and performance of text-editing skill: A cognitive complexity analysis. Human-Computer Interaction, 5, 1-48.

Byrne, M. D. (1992). Towards understanding one type of systematic procedural error. Unpublished manuscript.

Card, S., Moran, T., & Newell, A. (1983). The psychology of human-computer interaction. Hillsdale, NJ: Erlbaum.

Elkerton, J., & Palmiter, S. (1989). Designing help systems using a GOMS model: Part I. An information retrieval evaluation (Technical Report C4E-ONR-3). Ann Arbor, MI: Center for Ergonomics, University of Michigan.

Hitch, G. J. (1978). The role of short-term working memory in mental arithmetic. Cognitive Psychology, 10, 302-323.

Johnson-Laird, P. N. (1983). Mental models. Cambridge, MA: Harvard University Press.

Just, M. A., & Carpenter, P. A. (1992). A capacity theory of comprehension: Individual differences in working memory. Psychological Review, 99, 123-148.

Kieras, D. E. (1988). Towards a practical GOMS model methodology for user interface design. In M. Helander (Ed.), Handbook of human-computer interaction (pp. 135-158). Amsterdam: North-Holland Elsevier.

Kieras, D. E., & Bovair, S. (1984). The role of a mental model in learning to operate a device. Cognitive Science, 8, 255-273.

Kintsch, W. (1988). The role of knowledge in discourse comprehension: A construction-integration model. Psychological Review, 95, 163-182.

Kitajima, M., & Polson, P. (1992). A computational model of skilled use of a graphical user interface. In CHI '92 Conference Proceedings. Reading, MA: Addison-Wesley.

Kotovsky, K., & Kushmerick, N. (1991). Processing constraints and problem difficulty: A model. In Proceedings of the Thirteenth Annual Conference of the Cognitive Science Society.

Laird, J. E., Newell, A., & Rosenbloom, P. S. (1987). Soar: An architecture for general intelligence. Artificial Intelligence, 33, 1-64.

Mannes, S. M., & Kintsch, W. (1991). Routine computing tasks: Planning as understanding. Cognitive Science, 15, 305-342.

McClelland, J. L., & Rumelhart, D. E. (1981). An interactive activation model of context effects in letter perception: Part 1. An account of basic findings. Psychological Review, 88, 375-407.

Newell, A. (1990). Unified theories of cognition. Cambridge, MA: Harvard University Press.

Newell, A., & Simon, H. A. (1972). Human problem solving. Englewood Cliffs, NJ: Prentice-Hall.

Norman, D. A. (1981). Categorization of action slips. Psychological Review, 88, 1-15.

Polk, T. A., & Newell, A. (1988). Modeling human syllogistic reasoning in Soar. In Proceedings of the Eleventh Annual Conference of the Cognitive Science Society.

Polson, P. G., Lewis, C., Rieman, J., & Wharton, C. (1992). Cognitive walkthroughs: A method for theory-based evaluation of user interfaces. International Journal of Man-Machine Studies, 36, 741-773.

Reber, P. J., & Kotovsky, K. (1992). Learning and problem solving under a memory load. In Proceedings of the Fourteenth Annual Conference of the Cognitive Science Society.

Reason, J. T. (1990). Human error. New York: Cambridge University Press.

Salthouse, T. A., & Babcock, R. L. (1991). Decomposing adult age differences in working memory. Developmental Psychology, 27, 763-776.

Senders, J. W., & Moray, N. P. (1991). Human error: Cause, prediction, and reduction. Hillsdale, NJ: Lawrence Erlbaum.

Shastri, L. (1992). Neurally motivated constraints on the working memory capacity of a production system for parallel processing: Implications of a connectionist model based on temporal synchrony. In Proceedings of the Fourteenth Annual Conference of the Cognitive Science Society.

Turner, M. L., & Engle, R. W. (1989). Is working memory capacity task dependent? Journal of Memory and Language, 28, 127-154.

Young, R. M., Barnard, P., Simon, T., & Whittington, J. (1989). How would your favorite user model cope with these scenarios? ACM SIGCHI Bulletin, 20, 51-55.
