Mental Models: A Survey

Karl B. Schwamb
Department of Information and Computer Science
University of California
Irvine, CA 92717
(714) 856-4196
ArpaNet: [email protected]

12 November 1990

Abstract
This paper reviews a class of cognitive representations collectively known as mental models. These representations purport to integrate certain aspects of propositional and imagistic theories of knowledge representation while enabling more detailed theories of human cognitive behavior. This paper explores the nature of mental models by considering basic philosophical issues, including ontology and semantics. Psychological evidence supporting the existence and identifying characteristics of these models is reviewed in the areas of comprehension, problem solving, and learning. Where available, computational implementations of mental model theories are discussed. After a comparison of mental models with more conventional knowledge representations, a brief discussion and directions for future research are given.
1 Introduction

Philosophers through the ages have wrestled with the problem of understanding human mental processes. This paper surveys a fairly recent contribution made in the field of cognitive science: mental models. These representations hold promise because they integrate work on the diametrically opposed topics of propositions and mental imagery. Mental models can contain much of the same information as a propositional encoding while preserving certain implicit relations associated with imagery. This produces a clean referential semantics, a compact encoding of knowledge, and a form manipulable by procedures previously acquired through interaction with the real world.

This paper surveys the usage of this term in cognitive science and associated fields, attempting both to distinguish it from related concepts in mental representation and to identify its most salient features. Indeed, "model" is a ubiquitous term in studies of cognition, and the following discussion aims to separate the more specific sense of the phrase mental model from its more general usage. Rather than taking an approach which merely records the views of numerous researchers by listing a number of systems or separate theories, this paper will consider fundamental dimensions of representation theory in an attempt to find integrating principles across the various applications.

To explore the area, this paper first considers desiderata of any cognitive representation. These serve to identify important issues that should be addressed in a full assessment of any representation. Definitions given in the literature are reviewed to illustrate the wide scope of usage associated with mental models. To further clarify these definitions, an elaboration of the role of implicit relations is given, followed by some comments on ontology. The use of these aspects in providing a referential semantics is considered along with contrasting semantic theories.

Using this philosophical backdrop, efforts in psychology and artificial intelligence are examined. Psychological evidence is presented which indicates certain aspects of comprehension are nicely accounted for in mental models. A simple computational model of comprehension is sketched out to clarify this approach. Problem solving methods using mental models are considered in the areas of deduction and simulation. Strong evidence is taken from the literature to show that mental models explain human reasoning on syllogisms much better than theories presupposing logical abilities. Much artificial intelligence work in problem solving with these representations has been in simulation, and several computer programs are considered along with their implications for a theory of mental models. Two aspects of mental models and learning are considered also. Instruction including a model appears to improve the performance of device users. In addition, novice and expert problem solvers in physics appear to have significant differences in their modeling of the world.

After the aforementioned review of these topics, several typologies offered in the literature are considered in an attempt to place more structure on this rich field. Mental
models are contrasted with more conventional representations to clarify the distinguishing characteristics of this form. Finally, a brief discussion and directions for future research are given.
2 Desiderata

If the mental model construct is to have merit, it should meet generally accepted criteria for theories of cognition in the field of cognitive science. It must distinguish itself as a unique mental construct to justify investigation. It must be describable and measurable as psychologically valid to allow scientific study of its properties. In addition, it must be specific enough to be modeled in a computational framework. This provides both a rigorous test of theoretical coherence and specificity as well as a practical tool for behavioral prediction. These criteria are elaborated before proceeding with a discussion of the philosophical foundations of mental models.

Any mental representation must be definable. Without definition scientific inquiry cannot proceed and researchers will be doomed to futile arguments about vague notions. The definition should include enough semantic constraints to avoid being synonymous with "thing". One should be able to identify other constructs which are clearly subsumed as well as others which are not. Lack of such considerations may easily result in a theory which is not falsifiable.[1] A construct should also add something new to existing theory. Ideally, it should clarify or unify older constructs.

[1] Garnham (1987) has made such a claim against schemata.

A construct and its properties should be measurable. Without objective measures of presence and behavior a construct is vulnerable to superstitious beliefs. Gentner and Stevens (1983) list a number of measurement techniques which have been used in the study of mental models: protocol analysis, psychological experiments, developmental studies, expert-novice studies, simulations, field observation, cross-cultural comparison, historical comparison, and computer microworlds. While a small number of measuring tools would be preferable, the complex nature of cognition requires a plethora of techniques, often from a number of fields.

The construct should provide some explanatory power over natural phenomena. A rational cognitive theory[2] should be able to describe the behavior under study, including both normative and anomalous modes; ideally it should be able to make predictions about behavior (i.e., possess generative power). It should have robust explanatory power, by explaining individual differences, and be extensible to related rational behaviors. Acquisition of the fundamental behavior as well as learning based on the behavior should be explained. It should account for exceptional behavior by allowing that people can be rational. On the other hand, the theory should clarify why formal systems have been developed and how they help in the reasoning process. There should also be a test of practical power by demonstrating applications to education.[3]

[2] These desiderata of cognitive theory are based on those given for a psychological theory of deduction in (Johnson-Laird, 1983).
[3] This latter point is persuasively argued in (Anderson, 1987).

In step with current standards in cognitive science, the construct should be modeled by some computational scheme. Hence, for comparison purposes, formulations of cognitive phenomena predating theories of computation will be ignored. This is not to say such formulations are not important or influential; this merely serves to separate conjecture from scientific exploration. In a poorly specified theory, intuitions may well be responsible for predictions when committing to specific hypothesis tests. A computational model of a cognitive theory is necessary to avoid this problem by clarifying what theoretical claims are made.

The goals of a computational model of a cognitive theory are numerous. Many of these originate from the concerns of psychological validity mentioned above. The model must reflect regularities in human cognitive behavior, e.g., no unlimited short-term memory. It should match the psychological theory in descriptive and generative behavior. Inherent properties of the psychological process should be explicable through algorithms and environmental influences through the data. The computational model should be extensible to account for related behaviors. A procedure for acquiring new behavior (i.e., learning) should be given. It should be clear how the use of an external formal system would aid the procedure. Finally, an intelligent tutoring system (ITS) should be used to demonstrate practical feasibility of the cognitive modeling.[4]

[4] See (Wenger, 1987) for an overview of intelligent tutoring systems.

It is clear no single research effort has met all the desiderata mentioned above. These desiderata, however, provide a clear set of goals to guide cognitive research. Fortunately, the field of mental models as a whole has addressed many of these criteria. More will certainly be forthcoming.
3 Definitions

There have been many definitions of the term mental model offered in the literature. The philosopher Craik viewed all mental representation as mimicking the physical world. He conjectured that people operate on mental representations to simulate real-world behavior and produce predictions (Craik, 1943). Johnson-Laird, one of the most influential psychologists in the area, describes a mental model as a cognitive representation which represents a "state of affairs" in part by having a similar "relation structure". Mental models are composed of a finite set of elements, each of which denotes some object, along with operations on those elements for construction, revision, and evaluation. The structure is derived
from ontological constraints, relations in the perceived or conceived world, and the need to avoid contradictions (Johnson-Laird, 1983).

In the field of text comprehension, van Dijk and Kintsch have argued for the importance of a situation model, which appears to have much in common with other notions of mental model (van Dijk & Kintsch, 1983). A situation model represents the people, actions, and events relevant to a situation which a text describes. They argue readers comprehend text by building models of the described situations. Johnson-Laird (1983) has also put forth his version of mental models as an integral component of comprehension. These approaches are discussed in more detail in the section on comprehension.

Rouse and Morris, in a survey primarily directed at the "manual control" field, view mental models as mechanisms whereby humans generate descriptions of system purpose and form, explanations of system functioning and observable system states, and predictions of future system states (Rouse & Morris, 1986). This is an admitted intersection of the usages given in the manual control literature, which presents even vaguer descriptions. For example, they cite Veldhuyzen and Stassen as believing that mental models include knowledge about the system to be controlled, the properties of disturbances likely to act on the system, and knowledge about the criteria, strategies, and so forth associated with the control task. In short, the manual control community appears to use the term so broadly as to be almost vacuous. The heavy reliance on the term "system" makes it unclear what is disallowed from their formulations.

Rouse and Morris cite a few other researchers who have a more directed focus on mental models of devices. Young holds that a mental model is a way of describing a device independent of usage (Young, 1983). This corresponds to de Kleer and Brown's no function in structure (NFIS) principle. This principle states that devices should be described structurally in a manner which does not presume their function. de Kleer and Brown's research interest focuses on qualitative models of simple devices (de Kleer & Brown, 1983). These models are embodied in artificial intelligence (AI) programs that include the specification of a device topology, an envisionment process which can determine function from structure, and a causal model used to describe that functioning when the model is "run". Their representation scheme utilizes a context-free description of components and a procedure to determine function from a structural description of the topology of a device using such components. Because measured values in the device are qualitative, behavior is underdetermined, leading to the need to apply assumptions to resolve ambiguities.

Like de Kleer and Brown, researchers Williams, Hollan, and Stevens view mental models as composed of autonomous objects with an associated topology, "runnable" by means of local qualitative inference, as well as being decomposable (Williams, Hollan, & Stevens, 1983). Each object has an explicit representation of state, its connection to other objects, and a set of internal parameters. Each object has a set of rules which modify its parameters, thereby defining its behavior. "Running" is equivalent to propagating information through
the device topology. Their formulation is apparently a weaker version of de Kleer and Brown's.[5]

[5] These views of mental models as device models could be modeled computationally using an object-oriented programming language.

All of the above definitions have certain commonalities. Mental models are representations of knowledge dealing with particular objects and/or people in particular situations. Ancillary knowledge needed to simulate the behavior of the objects may also be included. There is also a concern that relations between objects either be implicit in their structure or arrangement (cf. Craik and Johnson-Laird) or be emergent through simulation (cf. de Kleer and Brown). In both cases, there is a desire to avoid the explicit representation of certain kinds of relational knowledge.

At this point, it is worth considering a possible pitfall in trying to integrate the conclusions of some of this research. Much of the mental models research has been conducted by AI researchers (e.g., de Kleer & Brown, 1983; Forbus, 1983) whose models have not been adequately compared to human performance. Although inspired by psychological theories, these approaches clearly have elements which are psychologically implausible. In fact, these qualitative models of physical systems resemble mechanical models in the world as much as mental models in the head. For similar reasons, Young (1983) and Rouse and Morris (1986) disagree with formulations based solely on device models, since the most appropriate conceptualizations for problem solvers may depend on the task and a system-oriented version of mental model would be incomplete (Rouse & Morris, 1986). Norman further notes that since science is inherently a modeling enterprise, researchers must be careful to distinguish the following four viewpoints of a phenomenon: the phenomenon itself, subjects' mental model of the phenomenon, the cognitive scientists' model of the subjects' model, and (if the phenomenon is an artifact) the artifact designers' model (Norman, 1983).

It is also judicious to point out that not all cognitive scientists are agreeable to the aforementioned defining characteristics of mental models. Guided by his observations of people interacting with artifacts, Norman concludes that mental models are inherently difficult to define (Norman, 1983). He finds that people's mental models of artifacts are: unstable, because details are not always remembered; not clearly bounded, because similar devices and operations are confused with each other; "unscientific", because occasionally actions appear to be based on superstition; and not easily simulated, due to model complexity. This latter point is also stressed by Kahneman and Tversky, who considered whether people occasionally utilize a simulation heuristic to make judgements. They found the heuristic to be of fairly limited power (Kahneman & Tversky, 1982).

On a positive note, Norman finds that mental models are somewhat parsimonious because people prefer to use extra physical operations in problem solving rather than complicate their model of a device. However, in stark contrast to the qualitative models proposed in AI, Norman finds people's understanding of devices "meager, imprecisely specified, and full of inconsistencies, gaps, and idiosyncratic quirks." He finds that people are often aware of their limited knowledge and they can offer assessments of certainty regarding components and results of manipulation. Guided by their certainties, people develop behavior patterns that make them more secure in their actions (Norman, 1983).
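Footnote [5]'s suggestion can be made concrete. The sketch below, a hypothetical illustration rather than any of the cited systems, renders a device-style mental model as objects with local state, connections, and rules; all class and component names are invented for the example.

```python
# A minimal sketch, assuming (per footnote [5]) that device-style mental
# models map onto object-oriented programming. The classes here are
# hypothetical illustrations, not any cited system.

class Component:
    """A part described locally, without reference to the whole device's
    function (the 'no function in structure' principle)."""
    def __init__(self, name):
        self.name = name
        self.state = {}       # explicit internal state
        self.neighbors = []   # connections: the device topology

    def connect(self, other):
        self.neighbors.append(other)
        other.neighbors.append(self)

    def step(self):
        """Local rule: update own state from neighbors' states.
        Overridden per component type; returns True if state changed."""
        return False

class Battery(Component):
    def __init__(self, name):
        super().__init__(name)
        self.state['voltage'] = 1   # qualitative: present/absent

class Wire(Component):
    def step(self):
        v = max((n.state.get('voltage', 0) for n in self.neighbors), default=0)
        changed = self.state.get('voltage') != v
        self.state['voltage'] = v
        return changed

class Bulb(Component):
    def step(self):
        lit = any(n.state.get('voltage', 0) for n in self.neighbors)
        changed = self.state.get('lit') != lit
        self.state['lit'] = lit
        return changed

def run(components):
    """'Running' the model is propagating information through the
    topology until no local rule changes any state."""
    while any(c.step() for c in components):
        pass

battery, wire, bulb = Battery('battery'), Wire('wire'), Bulb('bulb')
battery.connect(wire)
wire.connect(bulb)
run([battery, wire, bulb])
print(bulb.state)   # {'lit': True}: behavior emerges from structure
```

On this rendering, a fault is modeled by deleting a connection or altering one component and rerunning the model; its consequences emerge, rather than being enumerated in advance.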
4 Implicit Relations

The implicit relation property of mental models mentioned above is derived primarily from observations made in the mental imagery literature.[6] Palmer (1978) holds that this property is derived from an analogue representation of knowledge. The nature of analogue representation can be illustrated by considering two objects, one of which is longer than the other. This might be represented as a set of two elements (one mark longer than the other). This contrasts with a propositional representation, which would require an expression such as longer(a, b). The latter representation introduces a relation term longer to record the predication. The analogue form preserves the relative length relation implicitly. This property enables analogue representations to support low-cost non-deductive inferences. For example, one can infer which object is shorter via inspection in the analogue representation, while the propositional representation requires additional rules of inference.

[6] This property of mental models may lead some to question whether mental models are a form of analogy. While analogical reasoning also considers similar relation structures, that field focuses on potential mappings between systems. Restricted by a semantics of reference, mental models make the modest presumption of a lawful relation between the structure of a mental model and the represented situation. See (Hall, 1989b) for an overview of analogical reasoning.

Lindsay (1988) cites Dretske as noting that analogue forms (i.e., imagery) can generally be reexamined for other information and relationships, in contrast to propositional forms. Levesque and Brachman (1985) state that analogues are more complete in that they contain more information about the domain being represented. They observe that one drawback to this approach is that a richly analogic representation may leave nothing unsaid about a situation, begging the question as to what is being represented.

Hayes (1974) claims that analogue properties derive from the medium used to represent information. More specifically, the representation mirrors the relation structure of a situation by placing the representing objects in a medium which possesses the same critical property as the medium which contains the represented objects.[7] For example, the preservation of the implicit ordering of objects along a straight line would require the use of a one-dimensional spatial medium. Without a notion of medium, the inherentness would be lost. While maintaining inherentness in a medium which lacks the critical property appears unlikely, the ability to reproduce an intermediate form can recover inherent constraints. For example, if object positions were noted in a propositional form, the placement of objects in a spatial medium at the specified locations would restore the implicit relations.

[7] Hayes (1974) argues that the properties are preserved via homomorphism rather than denotation. But Garnham (1987) observes that this denotation through structural similarity produces a transparent semantics. This correspondence provides a justification for a representational form, unlike the homomorphism argument.

Since all encodings lose some information, it is preferable to encode and represent data as close to sensory input as possible. Such information is presumably stored in episodic memory. Due to the large amount of sensory data it is not always feasible to record all input. Palmer (1978) discusses a series of representational forms that people appear to use in filtering input data. These forms are defined by a hierarchy of isomorphisms. A physical isomorphism represents information by using the same relations as those found in the represented situation. This is a property of imagery. A natural isomorphism utilizes certain inherent constraints in the medium. Maps and diagrams are examples of this category. A functional isomorphism preserves only the algebraic structure of relations. This most abstract category contains such representations as propositions.

There are several positive reasons for utilizing a representation that encodes certain relations implicitly. Such analogue representations enable a reasoner to use the same operations in manipulating internally generated forms that are used to process sense data. Another parsimony argument is found in the compact representation of knowledge. An indefinite number of propositions might be required to encode the knowledge given in a single image. In a more abstract form, a device model is a more compact representation than a set of propositions or rules for modeling the real world. This is particularly evident when considering device fault diagnosis. Using so-called "mal-rules" requires designers to enumerate all possible faults (Sleeman, 1982). In a device model, faults are modeled by changes to component objects or their topology. The fault characteristics are emergent in the device model but must be explicitly stated a priori using "mal-rules".[8]

[8] A single change to a device can be represented by a single change to the device model, while a large number of "mal-rules" may need to be changed, usually requiring operation of the device in question.

Another benefit of inherent relations is the efficiency gained by enabling quick correspondence checks with the real world. This creates a sharp contrast between imagery and propositional representations. Since images have lawful relations to the real-world phenomena they record, they should be more easily compared to new incoming sense data than representations which can possess arbitrary relations. Logical languages proliferate relation objects, creating bloated representations and combinatoric processing complexity. Indeed, every representation which represents relations as objects is inherently artificial. Relation verification is more parsimonious when the same procedure can be applied both in examining the real world (or our perceptions thereof) and in examining our internal models. Propositional approaches, on the other hand, rely on extrinsic constraints which are typically unspecified.

This point has also been investigated by AI researchers Waltz and Boggess (1979), who used spatial models to capture the meaning of English locative expressions. They
constructed a program which represented objects by rectangular solids, oriented in space in positions dependent on the locative expression used to constrain each object's location. This program illustrated how simple spatial relations could be inferred quickly, in contrast to logical representations which might require a long deductive chain.

From a problem-solving standpoint, the physical description of objects can be used to infer roles they can play in situations. Gibson's ecological psychology recognized the importance of this relationship between objects and their uses and named it affordance. A similar theme is found in de Kleer and Brown's NFIS principle. Considering this and the other important properties of implicit relations mentioned above, Anderson (1978) may well be wrong in concluding that the choice of representation is arbitrary for cognitive modelling and only functionality remains important. While functionality may be preserved in other representations, it is difficult to see how timing regularities (possibly gained by the efficiencies of implicit relations) can scale up properly without mimicking the structure of the human cognitive architecture.

The evaluation of implicit relations is not solely positive, however. One difficulty arises when a model is underdetermined. Johnson-Laird (1983) claims that a single state of affairs can be represented by a single model even when the description used to build it is incomplete or indeterminate. Mental models can directly represent indeterminacies if and only if their use is not computationally intractable. One way this can be accomplished is to select a representative model even when more than one is possible. The model builder is then allowed to change the model as long as no previous statements are falsified, a relation being true if it is true in all possible models. As an example, consider the two statements "A is left of C" and "B is left of C". In a spatial model all three elements are placed in particular locations, creating some relation between A and B even though the statements make no such commitment. Waltz and Boggess (1979) had difficulty with this point, noting that having to use a single concrete model to represent underdetermined relations can introduce unintended interpretations of linguistic statements. Indeed, Johnson-Laird (1983) finds that people appear to use propositional representations when a single model cannot be formed easily.

A difficulty with this approach to uncertainty is the fact that propositional representations are used to retain knowledge even after forming models, because that information is needed to check the coherence of any model changes. Thus, one might wonder why a model is needed at all, since the knowledge appears duplicated. Solid evidence for the separate existence of mental models will require careful experimental procedures. Williams, Hollan, and Stevens (1983) profess ignorance of how to test properties of subjects' mental models without proposing specific models. Fortunately, researchers such as Garnham (1987) have some clever experimental procedures which try to address these concerns. One such procedure varies the order of statement presentation to subjects and tracks eye movements to provide evidence for the importance of referents in the comprehension process.
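The underdetermination problem discussed above can be made concrete with a minimal sketch in the spirit of Waltz and Boggess's spatial models; the placement strategy and function names below are hypothetical, not theirs.

```python
# A sketch of implicit spatial relations: objects get concrete positions
# in a one-dimensional medium, and 'left of' is read off by inspection
# rather than deduced through rules. The placement policy is an invented
# illustration of committing to one representative model.

positions = {}

def place_left_of(a, b):
    """Place a somewhere to the left of b, inventing coordinates.
    One arbitrary but concrete choice: left of everything so far."""
    positions.setdefault(b, 0.0)
    positions[a] = min(positions.values()) - 1.0

def left_of(a, b):
    """Inspection, not inference: one comparison, no deductive chain."""
    return positions[a] < positions[b]

# "A is left of C" and "B is left of C":
place_left_of('A', 'C')
place_left_of('B', 'C')

print(left_of('A', 'C'))  # True, as asserted
print(left_of('B', 'C'))  # True, as asserted
# The medium forces an unintended commitment: A and B now stand in a
# definite order, though the premises said nothing about them.
print(left_of('A', 'B') or left_of('B', 'A'))  # True: overdetermined
```

The final line is exactly the difficulty Waltz and Boggess noted: the concrete medium cannot help asserting more than the premises licensed.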
5 Ontology

In order to clarify the representation of relations, the ontology of mental models must be considered. Unfortunately, the literature has little to say on this issue. By an ontology, we refer to the components which make up a mental model and the collection of properties we use to describe these objects.[9]

[9] The notion of ontology deals with more issues as well, including fundamental properties of media in which objects are embedded. However, those issues may be more properly ascribed to the more general study of metaphysics, and in any event this paper does not address them.

Greeno has discussed several ways in which an ontology is important to modeling (Greeno, 1983). The selection of conceptual entities determines what procedures may be applied to them. Poor selections may lead to either problem-solving impasses or inefficiencies. He cites the work of Simon & Hayes (1976), which showed that the choice of conceptual entities could dramatically affect the performance of problem solvers on even logically isomorphic problems. Greeno also claims that mathematical problem solving can be enhanced if subjects are told the conceptual entities underlying equations rather than just how to manipulate them.

The use of conceptual entities by researchers such as Greeno invites an analysis of whether they are concrete or abstract in nature. The literature is equivocal on this point. The argument that models have a similar relation structure to the real world suggests that the conceptual entities represent concrete objects. The nonarbitrary nature of models has been used to distinguish them from propositional representations. But models are more than just imagistic projections, since they represent objects in situations, not pictures. On the other hand, there are limits to what characteristics are represented, since if all were represented, the represented object would be in the head! Interestingly, some examples of mental models given in the literature suggest considerably more abstract representations of conceptual entities. In the extreme, Johnson-Laird (1983) represents objects by using mere tokens to explain deductive inference. However, he suggests other models which have a more natural correspondence to the physical world.

Hayes is sympathetic to the research framework of mental models in its attempt to understand the world in finer detail than other knowledge representation schemes (Hayes, 1985). In step with this program, he has produced a preliminary ontology for liquids, a difficult class to describe (Hayes, 1979). This ontology does not appear to have been used, so it is unclear what value it has for physical models in general. Also, his theory was developed analytically, not by observing human behavior, so its psychological validity is questionable.

DiSessa had similar objectives in suggesting "phenomenological primitives" (p-prims for short) underlying our understanding of the physical world (DiSessa, 1983).[10] He defines these primitives as "minimal abstractions of simple common phenomena". These are simple knowledge structures, evoked as a whole, whose meanings are "relatively independent of context". Examples of p-prims include rigidity, springiness, bouncing, rolling, and pivoting. Associated with these primitives are cuing and reliability assessments used to help select the p-prims appropriate to a situation. The priorities themselves are context-independent. The highest-reliability primitive is used of those which are applicable in context.[11] DiSessa has used a small set of p-prims to explain protocols taken of subjects explaining and justifying simple physical situations.

[10] It is unclear whether these notions more properly belong to the more general area of metaphysics, since the primitives deal somewhat with lawful behavior and the nature of the physical world. On the other hand, the p-prims DiSessa identifies can also be viewed as properties or states of objects, since they are often attributed to a single object and are supposedly context-independent.
[11] DiSessa speculates that p-prims are recognized by subjects at a low level of cognition. Harnad (in press) argues that a connectionist implementation may be used to determine the applicability of such concepts.
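DiSessa's cuing-and-reliability scheme can be rendered as a small sketch; the particular p-prims, cue sets, and priority values below are invented for the example.

```python
# A minimal sketch of the selection mechanism described above: each
# p-prim carries a context-independent reliability priority, and of
# those cued by the features of a situation, the most reliable is used.
# The p-prims, cues, and numbers are hypothetical.

P_PRIMS = {
    # name: (features that cue it, reliability priority)
    'springiness': ({'elastic', 'contact'}, 0.8),
    'rigidity':    ({'solid', 'contact'},   0.6),
    'bouncing':    ({'elastic', 'impact'},  0.9),
}

def select_p_prim(situation_features):
    cued = [(name, priority) for name, (cues, priority) in P_PRIMS.items()
            if cues <= situation_features]        # all cues present
    if not cued:
        return None
    return max(cued, key=lambda np: np[1])[0]     # highest reliability wins

print(select_p_prim({'elastic', 'contact', 'impact'}))  # 'bouncing'
```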
6 Semantics

For any communicative act, one can ask how its meaning is expressed in a representation.[12] This section reviews some problems with the semantics of conventional representations and the improvements brought through a referential approach to meaning used in mental models. Notions of truth are discussed also, since mental models, unlike most other theories of knowledge representation, relate meaning closely to truth conditions. To limit the discussion, we consider only linguistic communication.[13] Also excluded are peripheral aspects of meaning, including the connotative, stylistic, affective, and thematic (see Leech, 1974).

[12] Many of the ideas discussed in this section are due to Johnson-Laird (1983).
[13] However, the structural similarity property of mental models seems to hold promise for handling certain nonverbal communicative acts, such as pointing.

Frege identified two main aspects of conceptual semantics: sense and reference. These correspond closely to the more contemporary terms "intension" and "extension". These aspects of meaning are motivated by two philosophical theories of truth: coherence and correspondence, respectively. The former ascribes truth to statements which are consistent with other true statements, as in logic. The latter ascribes truth to statements which correspond to actual states of affairs in the world. While other representations have grave difficulty describing a correspondence theory, mental models have an advantage by the structural similarity principle.[14] This principle provides an obvious mapping between a representation and a represented object. The significance of this approach is more clearly revealed in the context of traditional approaches to semantics.

[14] Those sympathetic to the philosophy of Realism find meaning outside the head. The field of situated reasoning holds that meaning is partly in the world and what is in the head is inherently incomplete. Radical Realists, such as Putnam, hold that meaning is entirely outside the head, i.e., meanings are independent of mind (Johnson-Laird, 1983).
Conventional theories of word meaning focus on coherence. Such methods are typically based on a set of undefined primitives and rules for defining new terms based on old terms and/or primitives. Katz and Fodor (1963) proposed that word meaning is found by decomposing a set of necessary and sufficient conditions into more primitive semantic components. Recursively, these components are also decomposable, eventually reaching a specification based on primitives. Semantic networks represent words (nodes) as a set of associations (links) to other words. These associations often include classification along with descriptive properties and relations. Meaning postulates are expressions in formal logic which specify the necessary relations between predicates. Individual words are represented by tokens in logical expressions.

Kintsch, Fodor, Johnson-Laird and others have conducted experiments calling into question whether people decompose meanings. Their studies show that the complexity of word definitions, as predicted by representations such as those above, is not a reliable predictor of sentence processing time. In experiments targeted solely at semantic networks, similar results have been obtained. Consider the two facts that a poodle is classified as a dog and that a dog is classified as an animal. While spreading activation would predict a longer latency for associating poodle with animal than poodle with dog, no such latency was observed. The failure of researchers to specify necessary and sufficient conditions for word meanings also seriously weakens the explanatory power of meaning postulates.

In addition to problems of coherence, these theories do not provide an adequate account of certain referential issues in language use. Referentially ambiguous statements often rely on context for proper resolution. Context-free rules such as selectional restrictions cannot be expected to provide reasonable accounts of the variety of expressions used to select individuals, as in "The meatball wants a sub." Another difficulty is illustrated in experiments where words more specific than those given in training are actually better cues than the original words.[15] For example, when given a statement such as "Tom whacked the nail", the word "hammer" might be a better recall cue than "whacked". Transitivity is another issue that appears sensitive to context. Relations such as "left of" are typically thought to be transitive in the abstract. But when placed in the context of people sitting around a table, the transitivity breaks down, since the objects being placed are no longer in a linear arrangement. Finally, referents can play a role in determining the senses of other expressions. For example, the metonymy in the sentence "The crown is sending its troops to Iraq." requires the reader to determine the referent of "the crown". A theory which assumes literal meanings are assigned to words (and then selectional restrictions are applied) fails to handle such cases.

[15] This is the so-called "instantiation effect".

Before considering the referentially-based semantics of mental models, it is important to consider a superficially similar semantic theory developed by logicians. "Model-theoretic" semantics maps linguistic strings to token-set models. This approach is based on Frege's
compositional semantics, which holds that the intension of an utterance can be composed from the intensions of its constituents, restricted only by grammatical constraints. Logicians gave this idea a precise formulation by mapping each proper name to a separate token, each property to a set of tokens, and each relation to sets of ordered tuples of tokens. This approach finesses the hard semantic issues by essentially creating abstracted worlds of tokens which serve as referents. Any utterance is interpreted as being true or false with respect to the token model. Hence, the semantic value of an utterance is simply a truth value. Kripke showed how model-theoretic semantics could be extended to the modal logics of necessity and possibility, resulting in possible world semantics. Intension and extension are viewed as mappings from possible worlds to individuals. Hence, the mappings identify individuals which possess the associated property either over all time or in a particular situation. Possible worlds have been applied to language semantics in Montague grammar.

Unlike the above approaches, which utilize a coherence semantics, mental model theories focus on referents. Garnham (1987) summarizes this philosophy by noting that the structural similarity property provides a "transparent" referential semantics. Johnson-Laird (1983) completes this idea by stating that an utterance is true if there exists a mental model, satisfying the truth conditions of the utterance, which can be embedded in a mental model of the world. Hence, this formulation provides a technique for dealing with both coherence and reference issues. Garnham (1987) points out that mental models serve not only the role of representing the world but also the role of interpreting linguistic expressions.

The semantic properties of relations emerge from the explicit representation of situations which satisfy the truth conditions of utterances. In other words, one builds a model such that assertions are true of the model. Then the characteristics of any semantic relation emerge from the model. The spatial domain provides a simple example. Consider the two statements "A is left of B" and "B is left of C". By placing objects A, B, and C in the indicated arrangement, we can observe that the statement "A is left of C" holds by inspection. Hence, transitivity of the "left of" relation becomes an emergent property, not one requiring an explicit meaning postulate. Another interesting example is the conjunction "and". A conjunction of any two true statements in logic produces the same result (i.e., the conjunction is true), whereas in a mental model, if the two statements cannot be assimilated into one model, then the additional model created (for the second true statement) leads to uncertainty and additional cognitive load. Hence, this emergence property enables mental models to account for certain context effects.

These same ideas can be applied to create models of actual, possible, hypothetical, and imaginary situations. Johnson-Laird conjectures that mental models enable people to use much the same process to understand fiction as is used to understand non-fiction. He illustrates how mental models can use these notions to provide a single semantics for the conditional, which is often used to discuss non-actual situations.
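The contrast between the logicians' token models and the emergent semantics of mental models can be shown in a few lines; the encoding below is a hypothetical simplification, not a formulation from the literature.

```python
# A sketch of the token-model idea described above: proper names map to
# tokens, properties to sets of tokens, relations to sets of ordered
# pairs. An utterance's semantic value is just its truth against the
# token model. The particular names are hypothetical.

individuals = {'a', 'b', 'c'}                   # tokens
red = {'a'}                                     # a property: set of tokens
left_of = {('a', 'b'), ('b', 'c'), ('a', 'c')}  # a relation: set of pairs

def true_in_model(relation, x, y):
    return (x, y) in relation

print(true_in_model(left_of, 'a', 'c'))  # True, but only because the
                                         # pair ('a','c') was stipulated,
                                         # meaning-postulate style
print('a' in red)                        # property check: True

# Contrast with a mental-model treatment of the same assertions: place
# the referents so the premises hold, and transitivity of 'left of'
# simply emerges by inspection, with nothing stipulated.
model = ['a', 'b', 'c']                  # a left-to-right arrangement

def left_of_by_inspection(x, y):
    return model.index(x) < model.index(y)

print(left_of_by_inspection('a', 'c'))   # True: emergent, not stipulated
```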
Mental models are not stand-alone structures, of course, but are created and manipulated by procedures. Thus, the semantics of a mental model depends on the procedural (operational, in the domain of computer science) semantics of the manipulation routines. The section on problem solving below discusses this aspect in greater detail.

Not everyone is agreeable to a referential theory of meaning, however. In particular, Rips levels several criticisms at the mental model community (Rips, 1986). He is not convinced by Johnson-Laird's arguments regarding truth and reference. He invokes the "formalist" argument: cognitive science researchers cannot concern themselves with semantic issues in formal approaches to issues in the field. That is, truth is a matter for philosophers, not cognitive scientists (who follow a formalist approach). Rips argues further that mental models are just another representation and have no better connection to the real world than any other. While technically true, any representation theory which utilizes the structural similarity principle appears to have a privileged semantic status due to the close correspondence between representation and represented objects. Rips holds that even if such semantic considerations were necessary, a propositional representation could do just as well. For example, in a spatial reasoning task, one could encode the positions and directions of objects using propositions and apply rules to infer relations. While it does not appear that all analogue structure could be encoded in this fashion, this argument is harder to refute.

Rips finds mental inference rules more plausible than mental models for several reasons. Mental models use rules of their own in manipulation (manipulation is discussed in more detail below). He argues that although model elements are supposed to provide referents, manipulation rules, such as those posed by Johnson-Laird, seem to focus on syntactic model properties. This point is well taken, and the onus appears to be on Johnson-Laird to address this matter. Rips also finds that the use of tokens as modeling elements sacrifices the ability to account for content effects. Finally, he finds mental logic rules better supported by the literature. While these criticisms of mental models are applicable to Johnson-Laird, other approaches discussed below differ enough to make these criticisms less damaging. Some of these efforts address the structural similarity principle and many show support from the psychological literature.

The issue of semantics is a vitally important one and has often been ignored in accounts of cognition. The limitation of a cognitive agent without some form of referential semantics is clear. Such an agent has no embedded capability to check the truth of assertions and must assume that both its input and its inference rules are correct. Clearly, such an agent has little chance for survival in a less than benevolent world.
7 Comprehension

In this section, psychological evidence is presented supporting the use of mental models in linguistic comprehension. This evidence is consistent with properties of mental models as well as the construction processes used to build them. After discussing these matters, a computational approach to comprehension via mental models is sketched, providing a more detailed account of this process.

Bransford and his colleagues have examined numerous aspects of comprehension (Bransford & Johnson, 1972; Bransford & McCarrell, 1974). Their studies have considered how people understand objects, events, and linguistic statements. The meaning of an object is determined, in part, from the functions it performs. To support this point, Bransford applies Gibson's notion of affordance. A novel physical object can be understood if its perceptual properties are sufficient to specify its possible role in events. When this is not the case, objects become meaningful through their interrelations with other objects. Bransford presents several devices whose purpose only becomes clear upon use, including an antiquated apple peeler.

To understand events, we need to include or create enough information to achieve coherence. For example, all movement requires some instigating force; these forces need to be accounted for in understanding dynamic events. Events which require an agent need to have such agents identified in order to comprehend the nature of actions. In addition to coherence concerns, understanding is subject to constraints given in observed situations. An agent's capability for self-initiated motion can often be determined through perception. Repeated exposure to similar events enables us to acquire abstract event notions (e.g., walking). These are possible due to certain invariant relations and constraints on the roles objects can play.

Bransford finds that linguistic comprehension is guided by a constructive process. He conjectures that comprehension "may involve options such as whether or not to create new [objects], whether or not to judge the truth value of a statement or presuppose its truth value and see what its implications might be". In particular, he and his colleagues state that people augment the literal knowledge given in linguistic utterances by 1) inferring spatial relations between objects, 2) inferring instruments used to carry out actions, 3) inferring consequences of events, and 4) creating situations that justify the relations between two separate events (Bransford & Johnson, 1972; Bransford & McCarrell, 1974). They cite several studies to support these claims, typically involving the reading of a short passage followed by some form of recall test. Subjects were likely to err by believing that sentences that were only implied by linguistic input were actually read. In fact, good readers are more likely than bad readers to make errors based on plausible inference. They were also better at rejecting implausible inferences.

The importance of the constructive process is further illustrated by experiments which provided too little information to allow the development of a concrete, coherent situation
model. They discuss two studies involving passages about a serenade and clothes washing that were worded too generally to enable comprehension. The addition of a single phrase to provide a familiar context enabled understanding to occur. Johnson-Laird (1983) has noted similar behavior when varying the order of statements within text concerning spatial relations. When statements were ordered such that each statement referred to at least one previously encountered referent, the passage was easily comprehended. When this constraint was violated, comprehension was considerably more difficult.

van Dijk and Kintsch, in offering a theory of discourse comprehension, describe a construct that appears to handle many of the observations made by Bransford (van Dijk & Kintsch, 1983). A situation model is used to augment the knowledge contained in a "textbase" which contains propositional information about text that has been read. The textbase is checked for consistency and the situation model for referents and truth. They cite numerous issues that support the use of a situation model in discourse comprehension: reference, coherence, situational parameters (possible worlds, locations, and time), perspective (different agents and points of view), language translation, individual differences, level of description, memory, reordering, cross-modality integration, problem solving, updating, and learning. On the other hand, they believe textbases are essential because people tend to remember many aspects of particular discourses: the specific linguistic manner, discourse styles, rhetorical devices (such as rhyme, alliteration, metaphor), text themes, ordering, and point of view. They claim that situation models integrate episodic information with general information from semantic memory. It is unclear whether situation models are unique (and updated upon recall) or used to collect similar situations as in MOPS theory (Schank, 1982). van Dijk and Kintsch believe the latter is more likely, although those sympathetic to mental models would probably prefer the former.

Garnham (1987) presents further evidence on the nature of reference, adding support to the idea of using a mental model in comprehension. He argues people remember content over the literal presentation given in an utterance. More precisely, people remember referents, not the compositional semantics of utterances. Garnham's experiments show that people tend to confuse different referential phrases for the same object in recall tests. Garnham cites Winograd's SHRDLU program as exemplifying such model-based reasoning, since its limited natural language understanding was performed with the aid of a small closed world of objects in a particular setting (Winograd, 1972). He also maintains that people use a demand-driven procedure to fill in missing information from texts. Using a special self-paced reading technique, his experiments seem to indicate that people don't fill in missing agents and instruments upon encoding. Using eye movement data and fixations on referents, he claims that indeterminate passages are encoded verbally while determinate ones are converted into a mental model. While somewhat speculative, these conjectures are consistent with the approach of van Dijk and Kintsch.

Johnson-Laird (1983) offers a computational model of comprehension to provide a more detailed basis for hypothesis testing. His approach indicates when to create a model and
how to integrate it with existing ones. A summary of this technique is presented here to clarify the mental model approach to comprehension. A current model of discourse is presumed to provide a basis for later constructions. A new mental model is created when an assertion makes a reference to a non-existent entity in the current model of discourse. If such an entity exists, then the new information is added to the current model accordingly. It follows from this that procedures are needed to integrate two or more mental models if an assertion interrelates them. If an assertion is made where all referenced entities exist, then the existing relations in the model are checked to verify the truth of the statement.

Two procedures check for validity. If an assertion is true, a check for falsifiability is made by examining alternative yet valid models, guided by the constraints in the statements already given. If a model cannot be constructed to make the statement false, then it can be assumed to follow from the givens. If the statement can be falsified, the model is modified to make it consistent with previous assertions while rendering the current assertion true. Where no such modification is possible, the current statement is inconsistent.

The propositional attitudes of possibility and necessity can be handled with the above approach. If an assertion is made concerning possibility, then the assertion is true in some model but not exclusively. Necessity requires that the assertion be true in some model and that no other models are possible. Discourse is considered coherent if it is possible to construct a single mental model from the statements given. Coherence is independent of plausibility; it depends on reference and consistency. It might be argued that the completion of a single mental model is necessary to achieve the "click of comprehension". Conversely, it might be the case that our quest for understanding originates from a natural desire to create a single mental model of the world.
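A minimal sketch of this comprehension procedure follows, assuming a drastic simplification of assertions to relation-argument triples; the data structures and names are illustrative only, and the falsifiability check described above is elided.

```python
# A sketch of the model-construction policy summarized above: start a new
# model for unknown referents, extend or integrate models when referents
# are shared, and verify when all referents are already present.

models = []   # the current models of the discourse

def find_model(entity):
    return next((m for m in models if entity in m['entities']), None)

def comprehend(relation, a, b):
    ma, mb = find_model(a), find_model(b)
    if ma is None and mb is None:
        # No referent exists: create a new model of the discourse.
        models.append({'entities': {a, b}, 'facts': {(relation, a, b)}})
    elif ma is not None and mb is not None and ma is not mb:
        # The assertion interrelates two models: integrate them.
        ma['entities'] |= mb['entities']
        ma['facts'] |= mb['facts'] | {(relation, a, b)}
        models.remove(mb)
    else:
        m = ma or mb
        if ma is mb:
            # All referents already known: verify against the model.
            print('verified:', (relation, a, b) in m['facts'])
        m['entities'] |= {a, b}
        m['facts'].add((relation, a, b))

comprehend('owns', 'Mary', 'dog')     # new model
comprehend('bit', 'dog', 'postman')   # extended via the shared referent
comprehend('owns', 'Mary', 'dog')     # all referents known: verified True
print(len(models))                    # 1: a single, coherent model
```

The final count of one model corresponds to the coherence criterion above: discourse is coherent when a single mental model can be constructed from it.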
8 Problem Solving

In the mental model framework, problem solving is explained as model manipulation.[16] This manipulation is achieved by applying various procedures to a model structure. These procedures perform creation, revision, and examination. The nature of the procedures depends on the domain associated with the mental model.

[16] In some ways, problem solving via model manipulation appears to have much in common with state-space search used in AI. State-space search makes no commitment to the particular representational structure, being a general technique, while mental models do. This is particularly evident in the simulation section below.

Oddly, model creation has received the least attention of the three. Rather than explaining the origin of a model, most researchers specify the type of model used in a given domain as a precursor to explaining problem-solving behavior. This approach either prespecifies
the mental model construction process or presumes the existence of a mental model. When this is not sufficient, a problem solver might either restructure an existing model or use some form of analogy. Montgomery (1988) argues that restructuring the problem-solving representation can result from mental model manipulation, leading to subsequent insight into a problem. Collins and Gentner (1984) use "structure mapping" to create new mental models via analogy. Their procedure partitions the target structure into components, maps analogous familiar models into components, connects them together, and tests the new model's operation through qualitative simulation. They give an example of how the conceptual structure of evaporation is derived from knowledge of people in crowded rooms and rocket trajectories.

Model revision has numerous facets, as is shown in the following two subsections. As with model construction, the origin of revision procedures is murky. One interesting conjecture is that these procedures are in large part the same as those we use in our experience with the physical world. For example, we are familiar with arranging objects spatially, and the same skill may be used in arranging objects in spatial mental models. This idea gives a plausible account of manipulation procedure origins in a parsimonious fashion. Extending this notion, procedures learned in a domain, particular to certain tasks and objects, may be utilized in the manipulation of mental models of the same.

The following two subsections explore the two most active research domains for problem solving with mental models: deduction[17] and simulation. Johnson-Laird (1983) presents convincing evidence that mental models are significantly better explanatory devices of syllogistic reasoning than previous approaches. The simulation subsection describes the work of several AI researchers who have looked at commonsense reasoning and device models.

[17] The literature has very little to say on the other two main aspects of scientific reasoning: induction and abduction.

Before delving into these areas, it is important to keep in mind that much of the problem solving described here occurs in a larger context guided by several meta-level concerns. While models are being built they should be checked for consistency and completeness. When a model is inconsistent it must be revised or abandoned. If it is incomplete, it requires further elaboration. The particular action to take at any given time is often guided by belief and confidence in aspects of the model. In addition to intra-model issues, Williams, Hollan, and Stevens (1983) note that a full account must also explore the relation of mental models to other forms of knowledge and reasoning (such as constraints). For example, Hall, Kibler, Wenger, and Truxaw (1989) observed both the use of multiple mental models and interactions between them and other representations (text and algebra) in solutions to algebra word problems. It was observed that solution strategies may be suggested by certain mental models when not accessible through more conventional context-free representations such as algebra.
8.1 Deduction
Logic and so-called "natural deduction"[18] are based on a content-free notion of validity. However, humans do not appear to reason in this manner. Their reasoning is sensitive to content, and any theory with explanatory power must account for this. The common basis for deduction in logic and mental models is the belief that an argument is valid if there is no interpretation of the premises consistent with a denial of the conclusion. Johnson-Laird shows how this may be achieved without appealing to formal logic. Inferences are not the whole story, however, since children apparently learn the meaning of connectives before acquiring inference (Johnson-Laird, 1983). Connective meaning can be determined by considering the truth conditions which support their valid use. This observation highlights the fundamental nature of truth conditions, which is addressed directly in mental models.

[18] Natural deduction is a model of human reasoning which presumes that people use non-standard rules of inference in an otherwise logical framework. For example, they may not possess a rule corresponding to modus tollens but rather something more specific in their inferential repertoire.

A simple case of deductive reasoning is found in syllogisms. A syllogism is a simple problem which invites one to deduce some conclusion from two simple statements, such as "All A's are B's" and "Some B's are not C's". Johnson-Laird (1983) presents an effective procedure for syllogistic reasoning using mental models. Each group of objects referred to (such as the A's) is represented by a set of tokens. The modelling process consists of three simple steps. The initial step represents the first statement by creating a small number of tokens for each set and recording the relationship between the sets. For example, the first sentence above would be represented by each element of the set A being connected to each element of the set B. Since there may be more B's than A's, two models are possible. The second step of the modelling process adds the information stated in the second premise to the existing model using the same process of noting codesignation. Finally, a conclusion is formed expressing the relation that holds between the "end" terms, in this case A and C. (The "middle" term is mentioned in both premises.) In this case, we might conclude that "Some of the A's are C's".

Each step in the process creates some number of models. Johnson-Laird and Bara (1984) have conducted experiments which indicate that the number of models possible at any given stage is strongly correlated with the number of reasoning errors generated by subjects. They also consider other factors, but the main point is that this approach provides good predictions of subjects' performance on syllogisms. Indeed, these errors are not even explained by logical formulations of human rationality such as the ANDS system constructed by Rips (1983). Johnson-Laird has also shown how this simple method can be applied to statements involving more than one quantifier (Johnson-Laird, 1988). More recently, he has also shown how certain kinds of meta-logical reasoning can be handled (Johnson-Laird, 1990).
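The defining criterion of this model-based account, that a conclusion follows only when no model of the premises falsifies it, can be conveyed with a small sketch. Brute-force enumeration of token models stands in here for Johnson-Laird's incremental construction and revision, and the encoding of tokens as property sets is a hypothetical simplification.

```python
# A sketch of model-based syllogistic validity: enumerate small token
# models, keep those satisfying the premises, and accept a conclusion
# only if it holds in every such model. No inference rules are invoked.

from itertools import product

PROPS = 'ABC'
# Each token is a set of properties; a model is a tuple of tokens.
TOKENS = [frozenset(p for p, on in zip(PROPS, bits) if on)
          for bits in product([0, 1], repeat=3)]

def all_are(x, y):       # "All X's are Y's" (with existential import)
    return lambda m: (all(y in t for t in m if x in t)
                      and any(x in t for t in m))

def some_are(x, y):      # "Some X's are Y's"
    return lambda m: any(x in t and y in t for t in m)

def some_are_not(x, y):  # "Some X's are not Y's"
    return lambda m: any(x in t and y not in t for t in m)

def follows(premises, conclusion, size=3):
    models = [m for m in product(TOKENS, repeat=size)
              if all(p(m) for p in premises)]
    # Valid iff no model of the premises falsifies the conclusion.
    return bool(models) and all(conclusion(m) for m in models)

# The text's example: the tempting conclusion is refuted by a
# counterexample model, which is the error subjects tend to make.
print(follows([all_are('A', 'B'), some_are_not('B', 'C')],
              some_are('A', 'C')))                            # False

# A genuinely valid syllogism survives every model.
print(follows([all_are('A', 'B'), all_are('B', 'C')],
              all_are('A', 'C')))                             # True
```

On the mental model account, errors arise because people fail to search out such counterexample models, which is why problems admitting more models produce more errors.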
As shown above, logical operations can emerge from mental models. It is interesting to ask, then, why logic was invented. Johnson-Laird (1983) conjectures that logic was developed to deal with situations involving a number of interpretations. This would serve both to reduce cognitive load and to provide a more reliable method of dealing with complex statements. Rips (1986) makes a valid criticism in noting that the use of mental models lacks the power of general inference rules, such as modus ponens, that people seem to be able to apply in many different domains. On the other hand, the nature and power of mental models appears to provide a better explanatory theory of human reasoning in this domain than natural deduction schemes.

Other efforts have focussed on model representations that mirror more of the structure of the physical world. Gelernter (1963) describes a program which uses a sort of mental model to aid in proving geometry theorems. He uses a standard theorem prover, except that subgoals are checked against a geometric diagram to determine if the conjecture is true before attempting a proof. By checking the truth status of the conjecture using the diagrammatic model, the proof procedure was able to achieve impressive gains in efficiency. Inspired by Gelernter, Lindsay (1988) presents a set of construction and retrieval operations for a more general form of diagrams which he claims is a more detailed form of mental model. He distinguishes between 1) the diagram, 2) what the diagram denotes, 3) the diagram representation, and 4) the class of representation structures for diagrams. Unfortunately, he makes no claims for a descriptive theory of human reasoning.

Furnas (1990) takes this idea further by proposing the use of imagery in deductive rules. Assuming a large target image is present to represent an initial state, rules are applied to the image to "deduce" a new image. The rules applied have both antecedent and consequent given as (small) image patterns. A single "inference" is performed by matching a rule antecedent to some portion of the target image and replacing the matching portion by the consequent pattern. While mental models and imagery have aspects of the structural similarity principle in common, the use of imagery has some problems in this case. The order of scanning the target image can produce different results when patterns in the target image overlap. It is not clear whether all possible images produced by differing scan sequences are "valid". Indeed, it is not even clear what valid and invalid mean in this context.
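The scan-order problem is easy to demonstrate. The toy Python fragment below is our illustration, not Furnas's system: it applies a single match-and-replace rule to a one-dimensional "image", and when occurrences of the antecedent pattern overlap, starting the scan at different points yields different successor images.

    # A toy illustration of rule application in imaginal deduction: both
    # the antecedent and the consequent of a rule are small patterns, and
    # one "inference" rewrites a matched portion of the target image.  A
    # 1-D string keeps the sketch short; a 2-D array works the same way.

    def find(image, pattern, start=0):
        # Return the first index >= start where pattern occurs, else -1.
        for i in range(start, len(image) - len(pattern) + 1):
            if image[i:i + len(pattern)] == pattern:
                return i
        return -1

    def apply_rule(image, antecedent, consequent, start=0):
        # Perform a single inference: replace one occurrence in place.
        i = find(image, antecedent, start)
        if i < 0:
            return None
        return image[:i] + consequent + image[i + len(antecedent):]

    image = "xooox"                     # the target image
    rule = ("oo", "o-")                 # antecedent -> consequent

    print(apply_rule(image, *rule))           # xo-ox  (match at index 1)
    print(apply_rule(image, *rule, start=2))  # xoo-x  (overlapping match)
    # Two scan orders produce two different "deduced" images, with no
    # obvious criterion for which result counts as the valid one.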
8.2 Simulation
By far the most research on the topic of mental models has occurred in the area of reasoning about the physical world. Rather than replicating the knowledge of physics, these efforts attempt to determine the qualitative knowledge that humans use in making judgements. The aspects that have attracted the most interest are spatial reasoning, causation, and device modeling. The manipulation of these qualitative mental models naturally produces simulation. As objects are changed or rearranged in a model, the model simulates activity in the real world.
Due to this property, manipulation routines that mimic action in the physical world would appear to be the most natural operators. More abstract procedures could be based on these physically based manipulations. Norman and Rumelhart (1975) have said that much of a mental model's power comes from its simulation capabilities.[19] On the other hand, Kahneman and Tversky (1982) state that simulation is difficult for humans. Forbus' (1981) computer simulations of qualitative reasoning about simple motion echo this concern. Both suggest that simulation is probably not done by humans except in trivial cases. Kahneman and Tversky suggest that some simulations may be just rule-of-thumb reasoning. Despite these criticisms, researchers have examined the potential computational advantages of problem solving using mental model simulations.

[19] In an interesting conjecture, Norman and Rumelhart suggest that simulation might be the mechanism whereby we determine the affordance of an object, as in "Can you sit on that?".

To simplify the problem of characterizing these qualitative mental models, some researchers have examined simple devices. de Kleer and Brown (1981) have looked at a mechanical buzzer which involves only three components. One of the most interesting observations made in this research is the importance of the "no function in structure" principle, which states that "the rules for specifying the behavior of any constituent part of the overall system can in no way refer, even implicitly, to how the overall system functions." Besides being a conventional argument for modularity of description, this has important ramifications for model revision. If model components are described presuming that the other portions of the device are working, then minor changes to the structure of the device will produce incorrect simulations. The functioning of the device should be an emergent property of its structure. This notion is closely related to the notion of implicit relations typically found in mental models. Both describe phenomena in a structural manner to avoid unnecessarily committing to a particular interpretation. While the principle has been criticized as unachievable (Keuneke & Allemang, 1989), it has been a useful guide in developing robust models.

de Kleer and Brown (1983) characterize their approach to mental models of devices as involving four aspects: a specification of device topology (implicitly including device descriptions); envisioning, which determines function using the structure of the device; causal models, formed from envisioning the possible functions of the device; and simulation. While these terms are very closely related in common discourse, de Kleer and Brown view these as clearly separate steps in the modelling process. Given a device structure, envisionment determines all possible causal pathways. Given a particular set of inputs, a particular causal model can be formed as a subset of what can be envisioned. Finally, the simulation displays the sequential nature of value propagation in the causal network. The qualitative nature of the models stems in part from the modal characterization of quantity values. de Kleer and Brown (1983) have noted that this unfortunately leads to underconstrained models of behavior, implying the need for some form of nondeterministic control in simulation.
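The buzzer example can be made concrete with a small sketch. The following Python fragment is our illustration, not de Kleer and Brown's program: each component rule mentions only its own inputs and outputs, values are qualitative (present or absent rather than numeric), and the buzzing emerges from the loop in the structure without being stated in any single rule.

    # A minimal sketch of a qualitative buzzer model obeying "no function
    # in structure".  Component names and the update scheme are ours.

    def battery(_state):
        return {'voltage': True}                  # always supplies voltage

    def switch(state):
        # conducts only while the clapper is not pulled away by the field
        return {'conducting': not state['field']}

    def circuit(state):
        # current flows iff voltage is supplied and the switch conducts
        # (class-wide assumption: current flow is instantaneous)
        return {'current': state['voltage'] and state['conducting']}

    def coil(state):
        # a magnetic field is present iff current flows through the coil
        return {'field': state['current']}

    COMPONENTS = [battery, switch, circuit, coil]

    state = {'voltage': True, 'conducting': True,
             'current': False, 'field': False}
    for step in range(6):
        for component in COMPONENTS:
            state.update(component(state))
        print(step, state)
    # The oscillation of 'current' in the trace (the buzzing) is an
    # emergent property of the structure, mentioned in no single rule.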
To disambiguate the operation of models and avoid this control problem, explicit consideration of assumptions is used.[20] One class of assumptions they call "class-wide", since they apply to all devices in a given class, e.g., current flows instantaneously. Other assumptions apply to particular devices, such as having a particular voltage on an input/output line.[21]

[20] An elegant mechanism for assumption management is developed in a later paper of de Kleer (1986). This technique enables multiple-fault diagnosis, among other things.
[21] In de Kleer and Brown's models, input is not distinguished from output, since that depends on how a device is used.

The envisionment process is used to construct a causal model. This differs from approaches such as Forbus' qualitative process theory, which assumes that the developer specifies causal interactions (Forbus, 1984). A causal model is simply a graph with quantity values and states as nodes. Links to a node indicate which nodes were used to define it. The construction of causal models often involves ambiguities, so assumptions are necessary to resolve them. Ambiguities originate from the use of qualitative quantity values and qualitative time (leading to ordering ambiguities). Additional ambiguities arise from the fact that some knowledge is not encoded in the model, such as constraints in cause and effect rules. The use of assumptions forces envisionment to construct a set of plausible models, so external evidence must be used to select one as "the" causal model.

The primary motivation for developing the circuit model was to investigate its applicability to instruction in device troubleshooting (Brown, Burton, & de Kleer, 1982). de Kleer and Brown view the diagnosis process as a form of abduction, attempting to determine the (change in) structure from the (mis)functioning device. Davis (1984) concurs with this view. He describes a system which utilizes information about not only the connectivity of a circuit, but its spatial layout as well. Davis' approach stresses a principle that might be stated as: single faults tend to originate from the interaction of adjacent structures along some causal pathway. Sets of these causal pathways serve to define a representation. He claims that faults are difficult to diagnose when they occur in representations that are unusual. Hence, the art of diagnosis includes the proper choice of representation: that in which change is compact. Davis gives an illustrative example in which consideration of electrical connectivity is not sufficient to diagnose a bridging fault (the inadvertent connection of neighboring pins, typically induced by a poor soldering job); the physical layout of chips provides the critical information needed to isolate the fault. This step requires removing the assumption that spatial layout is unimportant, in order to consider a more detailed model of the device. He argues that the single technique of constraint suspension (a dual of constraint satisfaction) can provide a methodical search both within and across causal pathways to isolate a fault.[22] Davis notes that multiple faults lead to a much more difficult diagnosis problem.

[22] However, this search relies on a prespecified ordered list of fault class priorities to be efficient.
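The core move of constraint suspension can be illustrated with a toy. The Python sketch below is our simplification, not Davis' system: it assumes, unrealistically, that all internal values of a tiny network of arithmetic components are observable, and it suspends each component's constraint in turn to see whether the remaining constraints become consistent with the observations.

    # A minimal sketch of constraint suspension over a toy network of two
    # multipliers feeding an adder.  Component names are invented.

    def consistent(observed, suspended):
        a, b, c = observed['a'], observed['b'], observed['c']
        checks = {
            'M1': observed['x'] == a * b,                        # x = a*b
            'M2': observed['y'] == b * c,                        # y = b*c
            'A1': observed['f'] == observed['x'] + observed['y'] # f = x+y
        }
        # Suspend one component's constraint and test the rest.
        return all(ok for name, ok in checks.items() if name != suspended)

    # Correct behaviour would give x=6, y=12, f=18; we observe f=22.
    observed = {'a': 2, 'b': 3, 'c': 4, 'x': 6, 'y': 12, 'f': 22}
    for comp in [None, 'M1', 'M2', 'A1']:
        print(comp, consistent(observed, comp))
    # Only suspending A1 restores consistency, so the adder is the
    # single-fault suspect.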
Unlike these researchers, who presuppose a model of the device under consideration, Doyle (1988) discusses a system, JACK, which hypothesizes device models given only input/output data and a library of device components. While Doyle's work also does not model human behavior, it is interesting to consider how model construction might be accomplished. JACK uses primarily two classes of information: constraints and hypothesis ordering. Constraints include information on component parameters, such as type, direction, and medium. Hypotheses on device structure consider, in order, causal paths of linear connections, interactions, hidden inputs, and finally cycles. A hypothesis, which is typically some additional device component, is formed by scanning the partially specified timeline associated with the device specification (initially just input/output information) for pairs of components which exhibit symptoms characteristic of the aforementioned four causal paths. Once an incompleteness in the device structure has been classified, a device component can be added to address the local problem. The procedure iterates until the input/output behavior is completely explained.

Another category of mental model simulation involves qualitative spatial reasoning. Forbus (1984) promotes a theory using qualitative process descriptions, attempting to focus more on actions than objects. Funt (1980) has investigated the use of imagery manipulation in making judgements about the results of events. Waltz and Boggess (1979) have examined models of spatial placement.

Forbus first explored qualitative reasoning about space in his FROB system (Forbus, 1980). FROB reasons about the motion of balls in a two-dimensional space. The space is partitioned into rectangular regions, and the ball's motion is modelled as movement to regions selected by the qualitative direction and velocity of the ball. In a manner similar to the circuit troubleshooter of de Kleer and Brown, FROB must select spatial paths from the many that are possible due to the underdetermined nature of qualitatively described behavior.

A more general approach to qualitative reasoning about space (and other aspects of the physical world) is given in Forbus' qualitative process theory (Forbus, 1984). His focus is on processes, in contrast to de Kleer and Brown's interest in objects. Examples of processes include moving, flowing, stretching, and boiling. Bobrow (1984) states that while the underconstrained specifications may lead to possibly inconsistent models of the physical world, Forbus' theory is better suited to explaining human reasoning. Forbus has worked with Gentner to propose a framework for studying human learning using qualitative process theory (Forbus & Gentner, 1983).

At the lowest level of Forbus' theory are quantities. Quantities are defined in the same fashion as with device models, having qualitative parameters for type, sign, and magnitude. The possible magnitudes are defined by a quantity space, which is a partial order of possible symbolic values. Any two quantities may be related by specifying that one is qualitatively proportional to the other. These constructs are used to specify individual views and processes. An individual view defines the conditions for the existence and/or state of an object. This includes a specification of what other objects must exist and what
quantities must hold, along with any other preconditions. When these conditions are satisfied, the individual view holds and its associated relations are asserted. Processes are defined in the same manner as individual views, except that they can have additional "influences" which affect how quantities can change. Furthermore, all changes are caused by processes. A qualitative process model specified using the above language would be created by identifying and codifying the conditions for individual views and processes. A simulation can be enacted by checking the preconditions of these entities and, whenever true, asserting the corresponding relations and/or influences. Hence, the underlying operation of such a model is very similar to a forward-chaining production system. The primary difference results from the management of assumptions, which is omnipresent in qualitative models.

Funt (1980) describes the system WHISPER, which performs problem solving with the aid of simulated imagery manipulation. WHISPER addresses the problem of unstable block structures in the "blocks world". The system proceeds by generating a diagram which is projected onto an array (modeled after certain aspects of the human retina). The array is then examined for local relationships between blocks, such as touching and nearness. A "high-level reasoner" uses the results of the array examination in making predictions about subsequent motion. These predictions lead to the formation of a new diagram, and the process reiterates. Processing continues until a stable configuration is achieved. The purpose of this work is somewhat unclear. Information about future states can be computed from the data used to generate diagrams; hence the importance of diagram generation and "retina" examination is unclear. This program is not offered as a model of human visual processing, so the inefficiency does not appear justified.[23]

[23] Indeed, the use of the term "retina" is unfortunate. Researchers in mental imagery have concluded that imagery processing does not extend down to the level of the retina (see Finke, 1989). Certain properties of the retina (e.g., afterimages) do not appear to be producible in mental imagery experiments.

Waltz and Boggess (1979) have proposed a visual analogue model for modelling English locative expressions. The basic representation uses three-dimensional rectilinear metric volumes to bound the spatial extent of the objects mentioned in locative expressions. They claim to handle context by adding constraints on objects involved in the described situation. The representation uses numeric constants in specifying coordinates (arbitrarily selected to satisfy the various constraints), so it is not clear how this commitment is relaxed. Each locative has an associated function that makes decisions about word sense and builds the appropriate model. The system answers questions by using direct tests on the model, thereby bypassing long chains of inference and combinatorial proof methods. As in the circuit diagnosis problems of de Kleer and Brown, ambiguity is a problem since several models are often possible, although the representation is not qualitative in the manner of Forbus' FROB program. Consequently, for natural language generation, the system has difficulty in selecting locative expressions based on the model, since the information may be overconstraining. A minimal sketch of this style of model appears below.
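The sketch below is in the spirit of Waltz and Boggess's representation but is our invention, not their program: objects occupy rectilinear volumes with numeric coordinates chosen arbitrarily to satisfy the constraints of a locative expression, and questions are answered by direct tests on the model rather than by chains of inference.

    # Objects as axis-aligned 3-D boxes; names and coordinates invented.
    from dataclasses import dataclass

    @dataclass
    class Box:
        name: str
        lo: tuple   # (x, y, z) minimum corner
        hi: tuple   # (x, y, z) maximum corner

    def on_top_of(a, b):
        # Direct geometric test: a rests on b if a's bottom meets b's top
        # and their horizontal extents overlap.
        return (a.lo[2] == b.hi[2] and
                a.lo[0] < b.hi[0] and b.lo[0] < a.hi[0] and
                a.lo[1] < b.hi[1] and b.lo[1] < a.hi[1])

    # "The lamp is on the table": build the model with arbitrary but
    # constraint-satisfying coordinates.
    table = Box('table', (0, 0, 0), (10, 6, 3))
    lamp  = Box('lamp',  (4, 2, 3), (6, 4, 8))

    print(on_top_of(lamp, table))   # True  -- read directly off the model
    print(on_top_of(table, lamp))   # False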
Johnson-Laird (1981) also discusses the modelling of spatial placement. He presents a persuasive argument for its use in comprehension by considering the number of placement models that can be generated from linguistic descriptions. Johnson-Laird and his colleagues have investigated simple domains such as the relative placements of tableware. When only one (qualitative) model can be generated, the description is more easily comprehended. Unfortunately, he gives little detail on the computer implementation of these ideas, especially for cases where more than one model is possible.
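The determinacy point can be illustrated computationally. The sketch below is our illustration, not Johnson-Laird's implementation: it enumerates the left-to-right placement models consistent with a set of "left of" premises, so that a determinate description yields exactly one model and an indeterminate one yields several.

    from itertools import permutations

    def placements(items, premises):
        # Return every left-to-right ordering consistent with the
        # premises; a premise (a, b) means "a is to the left of b".
        return [order for order in permutations(items)
                if all(order.index(a) < order.index(b) for a, b in premises)]

    items = ['fork', 'plate', 'knife']

    # Determinate: "the fork is left of the plate; the plate is left of
    # the knife" yields exactly one model, hence is easy to grasp.
    print(placements(items, [('fork', 'plate'), ('plate', 'knife')]))

    # Indeterminate: "the fork is left of the plate; the fork is left of
    # the knife" leaves two models to be kept in mind.
    print(placements(items, [('fork', 'plate'), ('fork', 'knife')]))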
9 Learning

This section discusses several aspects of learning and mental models. Studies are presented to illustrate that problem solvers find model information helpful in problem solving. A more detailed examination of learning in problem solving is found in psychological studies of differences between novices and experts. Finally, some preliminary computational models are mentioned briefly.

Halasz and Moran (1983) examined the differences between subjects who performed computational tasks on a reverse Polish notation calculator. The control group received training in procedures for using the calculator in simple problems. The experimental group received the same instruction along with information describing a stack model of the calculator's operation. While there was little difference on simple problems, the modeling group performed better on novel problems, indicating better transfer.

Kieras and Bovair (1984) expanded this line of research by considering other factors that may affect learning of device operation. Using a "Star Trek Enterprise phaser panel" device, they found that instruction including a model of device operation led to better operation of the device, improved memory retention, and the introduction of short-cuts in problem solving. In a factorial experiment examining the effects of a motivational story (present or absent) and of specific information relevant to the phaser panel, the group given the specific information did better, irrespective of the motivational content, which included general principles of device operation and information about system components. The authors conjecture that this result may be due to the motivational elements not contributing to the formation of precise inferences. They conclude that problem solvers don't always need a precise device model to perform adequately. As anecdotal supporting evidence they cite our lack of knowledge of the internals of telephone operation; this deficiency does not prevent us from adequately operating the device. The generality of this result is unclear, however. Norman (1989) illustrates how telephone features (other than dialing) are very confusing, due in part to our lack of a suitable device model.

Young (1983) elaborates on Halasz and Moran's work by considering two different types of mental models applied to three calculators. He states that a learning approach needs to consider the relation between the tasks to be performed and the devices to be used. He considers two classes of mental model: surrogate and task/action mapping.
A surrogate model is a physical or notational analogue and can be used to answer questions about the modeled object's behavior. This is the same type of model as the device models described above; a minimal sketch of one appears below. It explains input/output behavior of the device but doesn't explain a user's performance or learning.[24] The task/action mapping is the core of the relation between user and system actions. Young claims the mapping helps to explain performance but is not detailed enough to account for calculator properties and hence input/output behavior.

[24] For this reason, Young notes that a model oriented strictly around the device is not a good basis for design. This point is supported by Norman (1989).

Young gives a discussion of these two model types vis-a-vis three calculators: reverse Polish, algebraic, and the traditional "four function" variety. These attempts illustrate why surrogate models are poor psychological models. There is no facility to model the solution sequences of users or their gain in efficiency with learning. The surrogate model is likely to be too complicated to seriously entertain as being held and manipulated in the mind. As an example, Young suggests that if a calculator can handle parenthesis matching, the internal device model is likely to be too complicated for users. Finally, the algebraic structure users expect in the input does not always match what the device will accept as valid input. This aspect can lead to surprises in calculator behavior. Some calculators will produce the output "25" given the input "5 * =". Due to these problems, Young suggests that a vague statement of calculator operation, such as "it evaluates expressions according to the rules of arithmetic", will be more likely to communicate a useful model of operation than the surrogate model.

The other approach Young examines is the task/action mapping. The task is simply a specification of what is to be done, and the actions are the interface actions necessary to perform the task. The mapping between them is a hierarchical user-device interface description. Young notes that the calculator operation may induce a particular structure in the user's model of operation. The four-function calculator, oriented toward computation of simple binary expressions, induces left-branching binary trees in long expressions. The reverse Polish notation calculator requires the user to preprocess the input expression to convert it to postfix. While it provides a better performance model of the user, Young admits the task/action mapping does not allow for learning.
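The following Python fragment is a minimal surrogate model of a reverse Polish calculator (our sketch, not Halasz and Moran's training materials): a stack plus ENTER and operator keys. Such a model answers questions about the device's input/output behaviour but, as Young observes, says nothing about how a user performs or learns.

    class RPNCalculator:
        def __init__(self):
            self.stack = []

        def enter(self, number):
            self.stack.append(number)      # ENTER pushes onto the stack

        def operate(self, op):
            # an operator key pops two values and pushes the result
            b, a = self.stack.pop(), self.stack.pop()
            self.stack.append({'+': a + b, '-': a - b,
                               '*': a * b, '/': a / b}[op])
            return self.stack[-1]

    # Keystrokes for (2 + 3) * 4, preprocessed into postfix order, as
    # Young notes the user must do:
    calc = RPNCalculator()
    calc.enter(2)
    calc.enter(3)
    calc.operate('+')
    calc.enter(4)
    print(calc.operate('*'))   # 20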
While Kieras and Bovair (1984) claim that users are not helped by internal models of a device, others are interested in how a technician understands a device. de Kleer and Brown's work (1983) discusses a series of mental models used in understanding devices. They don't explain how a learner switches from one model to another, although they speculate on how learning might occur, guided by their view of device modelling. They conjecture the existence of three kinds of learning: connecting structure and function, making implicit assumptions explicit, and "caching" the results of selecting one causal model from the set of possibilities. The first creates a causal model and connects it to a structural model. When violations of consistency, correspondence, or robustness occur, an opportunity arises to make implicit assumptions explicit. Adding explicit assumption handling requires external evidence to resolve the device behavior. An expert can recover assumptions along with the corresponding causal model when there is a diagnosis to be made; novices must resort to the second kind of learning. The expert performs more efficiently because of "caching" the possible causal models and associated assumptions.

de Kleer and Brown hold that explanations of device behavior are critical to learning. They propose that a sequence of mental models can serve adequately in explaining the operation of a device. An explanation would initially use simplified components that enable envisionment to produce the desired causal models. Further explanation would refine components to encourage the removal of implicit assumptions. They claim this approach leads to efficient learning, since the mental model would only be refined, not reorganized.

In addition to the study of device models, psychologists have examined the relationship between novices and experts in order to illuminate the differences in their models. The studies mentioned in the remainder of this section examine the domain of physics. Chi, Feltovich, and Glaser (1981) found that novice physics students tend to use surface features of problems while experts tend to use theoretical principles. More specifically, novices tended to classify problems and organize their knowledge around surface features, while experts utilized theoretical constructs, such as mass, force, and momentum. These findings are further supported by Clement (1983) and McCloskey (1983), who found that subjects with little exposure to the formal principles of physics tended to reason about motion problems using an approach much like Galileo's impetus theory. In contrast, experts appear to utilize the theoretical ontology of physics in solving such problems. diSessa (1983) speculates that lay people may have "phenomenological primitives" based on false intuitions, such as "force is a mover" or "energy dies away". Through training and experimentation, experts gradually revise or abandon these primitives and come to see the more theoretical elements as naturally occurring.

Roschelle (1988) suggests that experts don't actually lose the qualitative reasoning abilities that novices depend on, but rather use a mixture of qualitative commonsense reasoning and theoretically grounded reasoning. He presents a computational model which uses qualitative simulation of a situation model of objects to support problem solving done in a theoretical model of point masses and forces. Roschelle argues that this approach is a more robust model than simple textbook algebra or a mathematical case analysis. Qualitative reasoning with a problem situation is supposedly easier than manipulating the mathematical equations describing it, even for experts.
10 Typologies

Researchers have offered numerous classification schemes in the literature. These schemes are often trivial, containing only one or two distinctions. Hopefully, a more useful and detailed framework will be advanced in the future.

Johnson-Laird (1983) provides a binary division of mental models into physical and conceptual domains. In the physical domain, models may be formed which are spatial, temporal, kinematic, dynamic, or imagistic. In the conceptual domain, models can be classified as monadic (involving a single relation), relational, meta-linguistic (leading to recursive mental models), and set-theoretic (i.e., a finite number of tokens used to directly represent sets).

Since mental models are closely related to the structure of the domain being modeled, classification schemes based on physical domain properties naturally arise. Johnson-Laird (1983) has provided simple models of spatial relationships as tokens or objects placed in the appropriate arrangement. Kuipers (1978) gives a more elaborate model of spatial reasoning in the large. Davis (1984) uses spatial knowledge to isolate device faults. Forbus (1984) uses qualitative process theory to describe dynamic phenomena. Imagistic reasoning is discussed by Lindsay (1988), Palmer (1978), and Funt (1980). Finke (1989) covers several aspects of mental models under the topic of mental imagery.

Johnson-Laird (1983) also discusses conceptual models as another class. These models are more abstract than those above, since they involve only tokens for individuals and (typically) a simple tabular format to express relations. Johnson-Laird applies this method primarily to explain rational inference, including deduction, reasoning with quantified expressions, and meta-logical reasoning.

Rips (1986) notes a distinction between literal mental models (e.g., those posited by Johnson-Laird, 1983) and figurative ones (such as those discussed in Gentner & Stevens, 1983). A literal modeler would propose that we hold elements in our minds that represent objects in the world and that when we reason, we manipulate this model. A figurative modeler would make the weaker claim that while humans appear to perform in a manner consistent with the belief that they manipulate models in the mind, they do not commit to any particular representation which may achieve that performance.

In the domain of manual control, Rouse and Morris (1986) give a chart of mental model classes organized by the nature of model manipulation versus the level of behavioral control. Assembling a toy from instructions would be an example of explicit model manipulation with a low level of behavioral discretion. Problem solving in the domain of physics would exemplify a higher level of behavioral discretion, since the actions the problem solver may take are much less limited. At this same level of behavioral discretion, the problem of making value judgements would presumably involve more implicit model manipulation. Supposedly, this characterization provides a rough indication of the kind of mental model one might expect in a domain classified in this manner.
Young (1983) posits eight views of a mental model, oriented toward explaining users' models of devices used in support of problem solving. These include surrogate, task/action mapping, strong analogy (a device is similar to another), coherence (a schema for the device), vocabulary (terms used in describing the device), problem space (used to solve problems using the device), psychological grammar (a generative model of device behavior), and commonality (shared data structures explaining device behaviors). Each view may be seen as defining a type of mental model. Unfortunately, Young's typology is not systematic, and its ad hoc character makes its utility unclear.
11 Uniqueness

In the previous sections, we have considered many aspects of mental models. It is clear that these representations have certain features in common with other conventional representations. In this section, we attempt to distinguish mental models from these related representations in an effort to justify their study as a separate entity. Only representations proposed for modeling rational cognitive behavior are considered. In the following, mental models are compared to propositions, semantic networks, schemas, natural logic, and mental imagery.

Johnson-Laird (1983) claims that mental models are a generalization of both propositional and imagery representations. He states that mental models are not capable of representing all knowledge, and therefore memory is more likely a hybrid system including propositional and imagery representations as well. Hence, this view might be seen as building on Paivio's (1986) dual coding approach. On the other hand, several results from imagery research seem to indicate that current thought on the nature of mental imagery is closer to mental models than to a static pictorial view (Finke, 1989).

Propositions can handle indeterminate relations easily, while mental models are better with determinate ones. Johnson-Laird (1983) conjectures that people form a mental model of determinate descriptions. Experimental evidence appears to support this belief (Bransford & McCarrell, 1974; Bransford & Johnson, 1972; Garnham, 1987). When faced with determinate descriptions, subjects remember the gist of a passage better than the literal text, while verbatim recall is better in cases where text passages are indeterminate. Johnson-Laird proposes methods that might be used in coping with indeterminacy. One possibility is to encode the information in propositions rather than a mental model; the unfortunate redundancy this incurs was mentioned in the comprehension section. Another option would encode indeterminacy using alternative mental models; this, however, appears psychologically invalid. In still another option, the indeterminacy could be reified as an element in an otherwise determinate model. Finally, one might simply note the indeterminacy but go on to select one representative model that is valid within the constraints given in the description; the noted indeterminacy would be used in revising the model should an inconsistency arise. The last two options are sketched below.
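These last two options can be pictured with a schematic sketch (our invention, not a proposal from the literature) for an indeterminate description such as "the cup is next to the jug", which leaves the left-right order open.

    # Option: keep every alternative model explicitly (can explode).
    alternative_models = [('cup', 'jug'), ('jug', 'cup')]

    # Option: commit to one representative model, but annotate the
    # indeterminacy so the model can be revised instead of abandoned if
    # a later premise contradicts the arbitrary choice.
    model = ('cup', 'jug')
    notes = {('cup', 'jug'): 'order-unknown'}

    def assimilate(model, notes, new_order):
        # If new information contradicts the chosen order and the order
        # was noted as indeterminate, flip the model rather than fail.
        if (new_order == tuple(reversed(model))
                and notes.get(model) == 'order-unknown'):
            return new_order
        return model

    print(assimilate(model, notes, ('jug', 'cup')))   # ('jug', 'cup')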
Brewer (1987) distinguishes between mental models and other cognitive representations by an appeal to the form of suggested memory storage. He is concerned with whether mental models are general knowledge structures or the episodic representations formed from them. He believes mental models are constructed upon demand, as opposed to being simply retrieved. Schemata (a term which here includes schemas, frames, and scripts) are generic memory structures used to create instantiations from incoming episodic information. While schemata account for some inferences, and for comprehension and memory phenomena, they don't account for novel situations, new actions, or new arguments. The fact that mental models are constructed and can mimic the relations in the situation at hand supports their use in representing novel phenomena. As mentioned in earlier sections, Johnson-Laird's (1983) conception of mental models views these representations as modeling a state by matching its structure. Hence, their form is not arbitrary, as with schemata. This gives mental models additional capability to handle text comprehension for spatial descriptions and related inferences. Brewer (1987) claims that both schemata and mental models mimic domain structure, but mental models are specific while schemata are generic. In general, the notion that schemata can mimic the structure of a domain appears to be false. The only exception is the script representation of Schank and Abelson (1977). Both mental models and scripts can be used to simulate event sequences.

In a critical vein, Garnham (1987) argues that both schemata and mental models are more properly descriptive of frameworks than of specific representations. His position stems from the observation that schema theory is vague to the point of being untestable and unfalsifiable. Garnham also makes an interesting distinction between certain types of mental models. He suggests that the form advocated by Johnson-Laird is likely to be stored in episodic memory, due to its close relation to the real world. The formulation suggested by researchers such as Forbus (1984) is more likely to be stored in semantic memory, since its knowledge is applicable to classes of objects rather than specific cases.

In contrast, Rips (1986) states that both the literal and figurative mental model approaches are structured after the referent domain and that they are consistent with evidence about the effects of domain content.[25] He cites Medin and Schaffer as arguing that mental representations of categories may consist of memories of exemplars. This is also supported by Tversky and Kahneman (1982), who show that people sometimes estimate probabilities based on characteristics of an event rather than using abstract properties such as base rate or variance. (These characteristics include representativeness, availability, and anchoring.)

[25] See the section on comprehension for a review of this evidence.

Norman and Rumelhart (1975) argue against mental models indirectly by preferring case frames. They argue against analogic forms, citing Pylyshyn's arguments for propositional representations.
They also cite some interesting experiments supporting their viewpoint: eidetic memory may involve reconstruction from propositional memory, since reconstructed images typically are missing properties or relations, not portions of some geometric form. Their studies have addressed floor plans, impressions of a university building, and Go board positions. This point, however, was noted and accepted by Finke (1989), who argues for mental imagery.

Shanon (1988) has a more radical interpretation of the difference between mental models and schemata. He argues that traditional representations can't capture all the subtleties of word usage and fundamental cognitive behavior. For example, propositional theories would seem to indicate that word definition should be trivial for humans, when in fact it is very difficult. Word meanings cannot be defined from a given set of terms or distinctions because words not only can have more than one meaning, but meaning can vary as a function of context and even be novel. We can also recover from misusage. Shanon views representation not as an object but as one end of a dimension. The other end is defined by the notion of presentation, i.e., that which we experience in the world, a state of affairs. Since mental models reflect the structure inherent in a state of affairs, Shanon places them closer to the presentational end of the representation-presentation dimension. He does not view the traditional representational approaches as fundamentally wrong, but argues that they are limited and overapplied. Greeno (1989) conjectures some general functional relationships between the representational and presentational levels.

Also close to the presentational side of the representation dimension is mental imagery. Finke (1989) gives an overview of this field, putting forth five principles which he feels are supported by the experimental data: implicit encoding, and the equivalences of perception, space, transformation, and structure. The implicit encoding principle he attributes to Pinker; it has been discussed in previous sections. The other principles, of equivalence, say in essence that our manipulation of imagery is performed in much the same manner as our manipulation of physical objects in the world. The results and timing relationships are closely related to those derived from manipulations of actual objects.
12 Summary

While research into mental models does not meet many of the proposed desiderata, the field does seem to hold promise for the development of unique and useful representations. The current definitions of the term are more appropriate to a research framework than to a specific hypothesis concerning representation. Consequently, the measurement of the presence of these models and their properties is somewhat ad hoc. However, mental models have an appealing explanatory power, accounting for some aspects of meaning, content, and context which are particularly difficult issues for theories of representation. Computational models have been offered in some domains, although few have matched the standard set by Johnson-Laird in terms of psychological plausibility.
Definitions of the term mental model are varied. Rouse and Morris (1986) offer a description which includes most knowledge necessary to operate and describe a system. de Kleer and Brown (1981) prefer a version which includes knowledge of device components independent of their potential uses. Young (1983) finds this too device-centered and argues that mental models must also include knowledge about the task and permissible actions on the device. Norman (1983) finds all of these efforts overly structured, pointing out that people's knowledge is often incomplete and inconsistent. In general, however, mental models are used to account for the representation of particular objects and/or people in particular situations.

Certain relations (especially temporal and spatial) are often captured implicitly through representational structure, or emerge through simulation. In both cases, explicit representation of certain kinds of knowledge is avoided. The implicit relations give mental models an analogue representational character. Hayes (1974) would argue that this stems from preserving fundamental relations found in the original medium, in most cases situations in the physical world. Palmer (1978) gives a hierarchy of isomorphisms between models and the physical world, defining a simple taxonomy of increasingly abstract representations. Implicit relations provide several useful functions: more information from the represented situation is preserved, the knowledge is stored compactly, conceivably the same mental functions which operate on sense data could manipulate these forms, and the lawful manner in which the representation matches the physical world provides a basis for a correspondence theory of truth and a referential theory of meaning.

In addition to perceived situations, mental models can also be used to represent described situations. This view serves to explain certain features of comprehension and is discussed below in more detail. On the other hand, if such models are used to represent certain underdetermined situations, errors of commission may arise, as in the research of Waltz and Boggess (1979). Their efforts point out a problem with mental models: are they underspecified or overspecified? Approaches such as Forbus' (1984) qualitative process theory or device models are inherently underspecified with respect to the phenomena under study. Other efforts emphasizing concrete or imagistic properties are overspecified (with respect to the information given in utterances about the situation). The difficulty of selecting the appropriate representation without being overly specific or overly abstract is underscored by Davis' (1984) work.

While the study of relations has led to provocative ideas and received a fair amount of attention, the investigation of mental model ontology has been somewhat neglected. Some researchers view model objects as being abstract, while others hold they are more concrete. Johnson-Laird (1989) primarily utilizes tokens as modeling objects, while Funt (1980) considers the physical shape of objects. Many researchers focus on objects in their models, but some consider processes (e.g., Forbus). The two researchers contributing the most to this problem are Hayes (1985a) and diSessa (1983). Hayes has sketched an ontology for the behavior of liquids, while diSessa has considered the phenomenological primitives
which might form our ontology of the physical world. It is interesting to note that both consider a mixture of physical objects and processes as their ontology, in contrast to most other mental model researchers.

By representing the world in a more direct fashion than most other representations, mental models serve both to represent actual situations and to interpret linguistic expressions (Garnham, 1987). The use of similar structure gives mental models a "transparent" semantics through referential means. Experiments by Johnson-Laird (1983), Garnham (1987), Bransford and McCarrell (1974), and others have shown the importance of reference in comprehension. This enables the inference of spatial relations, instruments, and consequences of situations. People often remember referents but not specific referential phrases from text passages. Readers also find text that relates new referents to previously mentioned ones more coherent than text that does not state such relationships. Reference provides straightforward truth conditions for linguistic utterances: an utterance is true if there exists a mental model that satisfies the truth condition of the statement and that can be embedded in a mental model of the world (Johnson-Laird, 1983). A benefit of this approach is that certain semantic properties emerge from these truth conditions and need no independent specification.

Another indication of mental model utility is found in problem solving. For the purposes of this paper, problem solving has been separated into deduction and simulation. In the area of deduction, Johnson-Laird (1983) has used mental models to explain numerous results from subjects' reasoning about syllogisms. Gelernter (1963) has used a diagram to make geometric theorem proving more efficient. The primary application of mental models in problem solving has been as simulation vehicles. Artificial intelligence researchers have investigated qualitative models of the physical world and simple artifacts, such as circuits. de Kleer and Brown (1981) proposed the no-function-in-structure principle as a design "aesthetic". They convincingly demonstrate its usefulness in achieving a concise and robust knowledge representation of device behavior. They use a context-free description of device components to create particular device models with a topology matching the electrical connectivity of the actual device. Their envisionment procedure generates possible causal models, from which one is selected as the most plausible. The qualitative nature of the device specification leads naturally to ambiguities; these are managed via the explicit consideration of assumptions. Davis (1984) shows how the additional consideration of spatial layout adds more power to the device model. He stresses the importance of causal pathways in both defining specialized representations and isolating faults. While the above device modeling efforts presume a device model is given and concentrate primarily on envisioning causal models or performing fault diagnosis, Doyle's (1988) JACK system hypothesizes the structure of device models given only input/output behavior.

Oddly, the interest in simulation illustrates the lack of communication between artificial intelligence researchers and psychologists. Psychologists such as Norman (1983) and
Kahneman and Tversky (1982) believe that simulation is too hard for people to perform with any fidelity. Indeed, it is hard to imagine that people can reason about device models without the aid of the actual device or some external representation thereof.

Psychologists have had more impact in the area of learning using mental models. Halasz and Moran (1983), along with Kieras and Bovair (1984), have shown that explaining a model of device operation to subjects results in improved problem-solving performance. However, Young (1983) makes the observation that device models offer an incomplete, if not misleading, account of people's mental models. He states that researchers must be careful to distinguish among the device model, the user's model of the device, the psychologist's model of the user, and the designer's model of the device. Another aspect of learning investigated by mental model researchers is the differences between novices and experts, primarily in the domain of simple physics problems. Chi, Feltovich, and Glaser (1981) found that novices differed from experts not just in the amount of knowledge but in kind as well. Novices tend to focus on surface features of situations given in problems, while experts tend to focus on theoretical notions of mass and force. Roschelle (1988) speculates that experts retain aspects of naive qualitative reasoning (found in novices) as well as theoretical notions. Both are used to advantage when solving problems which may be difficult to solve using only one form of knowledge.

Learning with mental models brings up the issue of how mental models relate to other representations. Johnson-Laird (1983) states that mental models are used by people in addition to propositional and imagistic representations. He argues that propositions are better for encoding indeterminate relations and mental models are better at representing determinate relations. Because of the determinate nature of mental models, Brewer (1987) holds that they are episodic representations, while schemata are held in semantic memory. Brewer also finds that the elements of mental models enable them to represent novel situations, while schemata are better suited to stereotypical events. This also follows from the nonarbitrary nature of encoding relations. In fact, Shanon (1988) finds mental models to be more of a presentational nature (like images) than representational. Finke's (1989) account of mental imagery seems to lend support to such a notion. Garnham (1987) finds both mental models and schemata to be terms that characterize frameworks instead of specific representations, and the diversity of research mentioned above seems to support that claim.

In (Funt, 1980), Kosslyn is cited as stating that our internal representation of images is analogous to the way images are stored in computer graphics. That is, we store enough detailed information to be able to reconstruct images, rather than storing the images themselves. This notion suggests an interesting conjecture about mental model representation. Mental models may be compact representations that enable us to reconstruct the world in our heads. Such a theory would provide an integrated theory of memory applicable to all modalities. Through constructing a mental model, we re-present to ourselves a phenomenon that is then projected onto the various sensory systems (not necessarily at the beginning of sensory pathways). These systems can then regenerate the previously
experienced events associated with the situation the model is representing. The act of recreating a presentation may be critical to compactly storing our experience, as well as providing a facility for reinterpreting previous experience. The elegance of this notion is in the manner in which a single structure handles multi-modal recall of visual, auditory, emotional, and propositional memory. This idea is related to Finke's (1989) principles of equivalence, which imply that our judgements regarding imagery are due in large part to our manipulation of mental representations which mimic the structure of the world. Of course, this hypothesis is a bit extreme: if people did possess such capabilities, it would seem that we would naturally acquire proper laws of physics (at least to an approximation relevant to our everyday experience). On the other hand, people by and large have no trouble navigating our physical world.
13 Future Research

It is clear from the variety of efforts in this area that many fundamental issues still need to be addressed. Many of the topics covered in this paper are still open, including ontology, learning, and interaction with other forms of knowledge. The origins of mental models and their associated manipulation procedures should be clarified. The basic operation of mental models should also be clarified, such as when entities are introduced into a model, how they are identified, and how they are elaborated. This requires the specification of enabling conditions for the construction, use, and modification of models (Garnham, 1987). Much of this foundational work requires psychologists and cognitive scientists to clarify how mental models can be concisely described, detected, and measured.

Since the term mental models appears to describe a framework rather than a specific representation, it behooves researchers in this area to develop a more detailed typology. This will also serve to clarify the nature of specific mental model representations. In addressing that issue, investigators will need to consider several key questions aimed at separating out the contribution of mental models from other representation schemes. First, if models possess implicit relations, do they differ from earlier proposals for analogue representations (Sloman, 1971)? These were criticized heavily by Hayes (1974) and others, who claimed that nothing could be gained that was not already available from propositional representations. In fact, Forbus (1984) gives an implementation of qualitative process theory as logical axioms. Such arguments retarded work in this area for years. In a similar vein, schemas have been used to explain much of the phenomena that mental models address, especially with regard to comprehension. From an implementation standpoint, it is unclear how mental models might differ from schemata unless the role of implicit relations is addressed. This problem gets at the heart of the distinction between representation and presentation.

One promising direction is to consider how computational implementations of mental models might demonstrate the improved representation of particulars over schemata
as claimed in the theory. One possibility is to draw on work done with imagery, such as Funt (1980). However, even those working in mental imagery don't believe imagery is pictorial in form; Finke's (1989) principles suggest a more structured representation.

A related issue is the consideration of efficiency. Since certain relations are represented implicitly, checking those relations should be faster than in other representations, which would require inference to derive them. This might be supported by a computational complexity theory of human processing. Such a theory might be based on primitive human information-processing operations rather than machine instructions. If possible, this might enable a rational claim for constant-time processing of inherent relations in a mental model and slower times for other representations requiring inference.

Another interesting area is the interaction between models. The study of algebra word problem solvers by Hall, Kibler, Wenger, and Truxaw (1989a) illustrates several unsolved issues. Problem solvers often create several external representations reifying aspects of their internal models. The role of this reification activity needs to be clarified and promises to provide insight into the interaction between people's models and the world. Understanding how and why certain models are selected or reexamined is vital to a proper theory of model formation (and representation formation in general) and the associated meta-level control. The role of the diagrammatic languages used by subjects may provide constraints critical to the identification of important model properties.
References

Addanki, S., Cremonini, R., & Penberthy, J. (1989). Reasoning about assumptions in graphs of models. In Proceedings of the International Joint Conference on Artificial Intelligence.

Anderson, J. (1978). Arguments concerning representations for mental imagery. Psychological Review, 85, 249–277.

Anderson, J. (1987). Methodologies for studying human knowledge. Behavioral and Brain Sciences, 10, 467–505.

Bransford, J., & Johnson, M. (1972). Considerations of some problems of comprehension. In W. Chase (Ed.) Visual Information Processing. New York: Academic Press.

Bransford, J., & McCarrell, N. (1974). A sketch of a cognitive approach to comprehension: Some thoughts about understanding what it means to comprehend. In W. Weimer & D. Palermo (Eds.) Cognition and the symbolic processes. New Jersey: Lawrence Erlbaum Associates.

Bobrow, D. (1984). Qualitative reasoning about physical systems: An introduction. Artificial Intelligence, 24, 1–5.

Brewer, W. (1987). Schemas versus mental models in human reasoning. In P. Morris (Ed.) Modelling Cognition. New York: John Wiley.

Brown, J., Burton, R., & de Kleer, J. (1982). Pedagogical, natural language and knowledge engineering techniques in SOPHIE I, II, and III. In D. Sleeman & J. Brown (Eds.) Intelligent Tutoring Systems. New York: Academic Press.

Burstein, M. (1988). Combining analogies in mental models. In Analogical Reasoning.

Chi, M., Feltovich, P., & Glaser, R. (1981). Categorization and representation of physics problems by experts and novices. Cognitive Science, 5, 121–152.

Clement, J. (1983). A conceptual model discussed by Galileo and used intuitively by physics students. In D. Gentner & A. Stevens (Eds.) Mental models. Hillsdale, NJ: Lawrence Erlbaum Associates.

Collins, A., & Gentner, D. (1984). How people construct mental models. In N. Quinn & D. Holland (Eds.) Cultural models of language and thought. New York: Cambridge University Press.

Craik, K. (1943). The nature of explanation. Cambridge: Cambridge University Press.

Davis, R. (1984). Diagnostic reasoning based on structure and behavior. Artificial Intelligence, 24, 347–410.

Davis, R., & Hamscher, W. (1988). Model-based reasoning: Troubleshooting. In H. Shrobe and AAAI (Eds.) Exploring artificial intelligence. San Mateo, CA: Morgan Kaufmann.

de Kleer, J., & Brown, J. (1981). Mental models of physical mechanisms and their acquisition. In J. Anderson (Ed.) Cognitive skills and their acquisition. Hillsdale, NJ: Lawrence Erlbaum Associates.
de Kleer, J., & Brown, J. (1983). Assumptions and ambiguities in mechanistic mental models. In D. Gentner & A. Stevens (Eds.) Mental models. Hillsdale, NJ: Lawrence Erlbaum Associates.

de Kleer, J. (1986). An assumption-based TMS. Artificial Intelligence, 28, 127–162.

diSessa, A. (1983). Phenomenology and the evolution of intuition. In D. Gentner & A. Stevens (Eds.) Mental models. Hillsdale, NJ: Lawrence Erlbaum Associates.

Doyle, R. (1988). Hypothesizing device mechanisms: Opening up the black box. Technical Report AI-TR-1047. Cambridge: MIT AI Laboratory.

Finke, R. (1989). Principles of Mental Imagery. Cambridge: MIT Press.

Forbus, K. (1981). A study of qualitative and geometric knowledge in reasoning about motion. Technical Report TR-615. Cambridge: MIT AI Laboratory.

Forbus, K. (1983). Qualitative reasoning about space and motion. In D. Gentner & A. Stevens (Eds.) Mental models. Hillsdale, NJ: Lawrence Erlbaum Associates.

Forbus, K. (1984). Qualitative process theory. Artificial Intelligence, 24, 85–168.

Forbus, K., & Gentner, D. (1983). Learning physical domains: Towards a theoretical framework. In Proceedings of the Second International Machine Learning Workshop.

Funt, B. (1980). Problem solving with diagrammatic representations. Artificial Intelligence, 13, 201–230.

Furnas, G. (1990). Formal models for imaginal deduction. In Proceedings of the Annual Conference of the Cognitive Science Society.

Garnham, A. (1987). Mental models as representations of discourse and text. New York: John Wiley.

Gentner, D., & Stevens, A. (1983). Mental models. Hillsdale, NJ: Lawrence Erlbaum Associates.

Gelernter, H. (1963). Realization of a geometry-theorem proving machine. In E. Feigenbaum & J. Feldman (Eds.) Computers and Thought. New York: McGraw-Hill.

Greeno, J. (1983). Conceptual entities. In D. Gentner & A. Stevens (Eds.) Mental models. Hillsdale, NJ: Lawrence Erlbaum Associates.

Greeno, J. (1989). Situations, mental models, and generative knowledge. In D. Klahr & K. Kotovsky (Eds.) Complex Information Processing: The Impact of Herb Simon. Hillsdale, NJ: Lawrence Erlbaum Associates.

Greer, B. (1987). Understanding of arithmetical operations as models of situations. In J. Sloboda & D. Rogers (Eds.) Cognitive processes in mathematics. Oxford: Clarendon Press.
Halasz, F., & Moran, T. (1983). Mental models and problem solving in using a calculator. In A. Janda (Ed.) Human factors in computing systems: ACM CHI conference proceedings.
Hall, R., Kibler, D., Wenger, E., & Truxaw, C. (1989a). Exploring the episodic structure of algebra story problem solving. Cognition and Instruction, 6, 223–283.
Hall, R. (1989b). Computational approaches to analogical reasoning: A comparative analysis. Artificial Intelligence, 39, 39–120.
Harnad, S. (in press). The symbol grounding problem. Physica D.
Hayes, P. (1974). Some problems and non-problems in representation theory. In R. Brachman & H. Levesque (Eds.) Readings in knowledge representation. Los Altos, CA: Morgan Kaufmann.
Hayes, P. (1985a). Naive physics I: Ontology for liquids. In J. Hobbs & R. Moore (Eds.) Formal theories of the commonsense world. Norwood, NJ: Ablex.
Hayes, P. (1985b). The second naive physics manifesto. In J. Hobbs & R. Moore (Eds.) Formal theories of the commonsense world. Norwood, NJ: Ablex.
Holland, J., Holyoak, K., Nisbett, R., & Thagard, P. (1986). Induction: Processes of inference, learning, and discovery. Cambridge: MIT Press.
Johnson-Laird, P. N. (1981). Mental models of meaning. In A. Joshi, B. Webber, & I. Sag (Eds.) Elements of discourse understanding. Cambridge: Cambridge University Press.
Johnson-Laird, P. N. (1983). Mental models. Cambridge: Harvard University Press.
Johnson-Laird, P. N., & Bara, B. (1984). Syllogistic inference. Cognition, 16, 1–61.
Johnson-Laird, P. N. (1988). Reasoning by rule or model? In Proceedings of the Annual Conference of the Cognitive Science Society.
Johnson-Laird, P. N. (1989). Mental models. In M. Posner (Ed.) Foundations of cognitive science. Cambridge: MIT Press.
Johnson-Laird, P. N. (1990). Knights, knaves, and Rips. Cognition, 36, 69–84.
Kahneman, D., & Tversky, A. (1982). The simulation heuristic. In D. Kahneman, P. Slovic, & A. Tversky (Eds.) Judgment under uncertainty: Heuristics and biases. New York: Cambridge University Press.
Katz, J., & Fodor, J. (1963). The structure of a semantic theory. Language, 39, 170–210.
Keuneke, A., & Allemang, D. (1989). Exploring the no-function-in-structure principle. Journal of Experimental and Theoretical Artificial Intelligence, 1, 79–89.
Kieras, D., & Bovair, S. (1984). The role of a mental model in learning to operate a device. Cognitive Science, 8, 255–273.
Kosslyn, S. (1981). The medium and the message in mental imagery: A theory. Psychological Review, 88, 46–66.
Kuipers, B. (1978). Modeling spatial knowledge. Cognitive Science, 2, 129–153.
Leech, G. (1974). Semantics. New York: Penguin Books.
Levesque, H., & Brachman, R. (1985). A fundamental tradeoff in knowledge representation and reasoning (revised version). In R. Brachman & H. Levesque (Eds.) Readings in knowledge representation. Los Altos, CA: Morgan Kaufmann.
Lewis, C. (1986). A model of mental model construction. In Proceedings of the ACM CHI Conference.
Lindsay, R. K. (1988). Images and inference. Cognition, 29, 229–250.
Macaulay, D. (1988). The way things work. Boston: Houghton Mifflin.
McCloskey, M. (1983). Naive theories of motion. In D. Gentner & A. Stevens (Eds.) Mental models. Hillsdale, NJ: Lawrence Erlbaum Associates.
Montgomery, H. (1988). Mental models and problem solving: Three challenges to a theory of restructuring and insight. Scandinavian Journal of Psychology, 29, 85–94.
Norman, D., & Rumelhart, D. (1975). Memory and knowledge. In D. Norman & D. Rumelhart (Eds.) Explorations in cognition. San Francisco: Freeman.
Norman, D. (1983). Some observations on mental models. In D. Gentner & A. Stevens (Eds.) Mental models. Hillsdale, NJ: Lawrence Erlbaum Associates.
Norman, D. (1989). The design of everyday things. New York: Doubleday.
Paivio, A. (1986). Mental representations: A dual coding approach. New York: Oxford University Press.
Palmer, S. (1978). Fundamental aspects of cognitive representation. In E. Rosch & B. Lloyd (Eds.) Cognition and categorization. Hillsdale, NJ: Lawrence Erlbaum Associates.
Rips, L. (1983). Cognitive processes in propositional reasoning. Psychological Review, 90, 38–71.
Rips, L. (1986). Mental muddles. In M. Brand & R. Harnish (Eds.) The representation of knowledge and belief. Tucson: University of Arizona Press.
Rollins, M. (1989). Mental imagery. New Haven: Yale University Press.
Roschelle, J. (1988). Integrated commonsense and theoretical mental models in physics problem solving. In Proceedings of the Annual Conference of the Cognitive Science Society.
Rouse, W., & Morris, N. (1986). On looking into the black box: Prospects and limits in the search for mental models. Psychological Bulletin, 100, 349–363.
Rumelhart, D., & Norman, D. (1988). Representation in memory. In R. Atkinson, R. Herrnstein, G. Lindzey, & R. Luce (Eds.) Stevens' handbook of experimental psychology. New York: John Wiley.
Schank, R., & Abelson, R. (1977). Scripts, plans, goals, and understanding. Hillsdale, NJ: Lawrence Erlbaum Associates.
Schank, R. (1982). Dynamic memory. New York: Cambridge University Press.
Shanon, B. (1988). Semantic representation of meaning: A critique. Psychological Bulletin, 104, 70–83.
Sleeman, D. (1982). Assessing aspects of competence in basic algebra. In D. Sleeman & J. Brown (Eds.) Intelligent tutoring systems. Orlando, FL: Academic Press.
Sloman, A. (1971). Interactions between philosophy and artificial intelligence: The role of intuition and non-logical reasoning in intelligence. In Proceedings of the International Joint Conference on Artificial Intelligence.
Sloman, A. (1975). Afterthoughts on analogical representations. In Proceedings of Theoretical Issues in Natural Language Processing. Cambridge, MA.
Tversky, A., & Kahneman, D. (1982). Judgment under uncertainty: Heuristics and biases. In D. Kahneman, P. Slovic, & A. Tversky (Eds.) Judgment under uncertainty: Heuristics and biases. New York: Cambridge University Press.
van Dijk, T., & Kintsch, W. (1983). Strategies of discourse comprehension. New York: Academic Press.
Waltz, D. (1981). Generating and understanding scene descriptions. In A. Joshi, B. Webber, & I. Sag (Eds.) Elements of discourse understanding. Cambridge: Cambridge University Press.
Waltz, D., & Boggess, L. (1979). Visual analog representation for natural language understanding. In Proceedings of the International Joint Conference on Artificial Intelligence.
Wenger, E. (1987). Artificial intelligence and tutoring systems. Los Altos, CA: Morgan Kaufmann.
White, B., & Frederiksen, J. (1986). Intelligent tutoring systems based on qualitative model evolutions. In Proceedings of the National Conference on Artificial Intelligence.
Williams, M., Hollan, J., & Stevens, A. (1983). Human reasoning about a simple physical system. In D. Gentner & A. Stevens (Eds.) Mental models. Hillsdale, NJ: Lawrence Erlbaum Associates.
Winograd, T. (1972). Understanding natural language. New York: Academic Press.
Young, R. (1983). Surrogates and mappings: Two kinds of conceptual models for interactive devices. In D. Gentner & A. Stevens (Eds.) Mental models. Hillsdale, NJ: Lawrence Erlbaum Associates.