Psychological Review, 1988, Vol. 95, No. 1, 115-123
Copyright 1988 by the American Psychological Association, Inc. 0033-295X/88/$00.75
Recognition of Disoriented Shapes

Michael C. Corballis
University of Auckland, Auckland, New Zealand

People can usually recognize a familiar shape independently of its orientation in three-dimensional space. I suggest in this article that they do this by extracting a description of the shape that is frame-independent, or independent of any coordinate system. Such a description is usually sufficient to locate the stored representation of the shape uniquely in long-term memory. However, a frame-independent description does not discriminate mirror-image shapes, which could explain the strong tendency to treat mirror images as equivalent in shape. Once a shape is identified, information about its internal axes (e.g., its top and bottom) can be recovered from memory, so that its orientation relative to the observer can be determined. The shape can then be mentally rotated to its normal or upright orientation; this normative transformation appears to be necessary if the shape is to be distinguished from its mirror image.
The research discussed in this article was supported by the National Research Council of Canada, the New Zealand Neurological Foundation, and the University of Auckland Research Committee. I thank Janice E. Murray and two anonymous referees for their helpful discussions. Correspondence concerning this article should be addressed to Michael C. Corballis, Department of Psychology, University of Auckland, Auckland, New Zealand.

As freely moving organisms in a world of movable objects, we are faced continually with the problem of recognizing objects or shapes in varying orientations. This is part of the more general problem of pattern recognition, whereby we recognize patterns as invariant despite the fact that they may present themselves to our senses in an infinite variety of manifestations. We may recognize a particular person, for example, whether that person is near or far, standing or sitting, in left or right profile, laughing or crying, in sunshine or in shadow.

It is convenient to distinguish properties that are intrinsic to the pattern itself and that serve to define that pattern from those that depend on the particular circumstances under which the pattern is manifest to the observer. We may identify these properties as invariant and circumstantial properties, respectively. That is, we identify what things are by identifying their invariant characteristics, and we also perceive something of the circumstances surrounding them. We recognize a dog, say, but we also perceive where it is in relation to ourselves, what it is doing, and so forth.

For most purposes orientation can be considered a circumstantial property. Common movable objects can appear in any orientation, yet we can usually recognize them for what they are. At the same time we can also perceive the orientations they are in. In some cases, however, it is not easy to identify patterns in unusual orientations. Rock (1973) pointed out, for instance, that it is peculiarly difficult to recognize familiar faces if they are upside down, and that it is also difficult to read cursive script upside down. These observations illustrate a further point about pattern recognition, namely, that it is hierarchical; one may still recognize an upside-down face as a face, even though
one may not know whose face it is, just as one may recognize upside-down cursive script as script without being able to read it.

Although we may be able to recognize most common objects independently of orientation, there is usually some orientation that is considered upright or canonical. Uprightness alone may not be sufficient to define the canonical orientation; for instance, an automobile is perhaps best perceived as an automobile in side view rather than from front or back. According to Palmer, Rosch, and Chase (1981), the canonical orientation of a three-dimensional object is the one that "reveals the most information of greatest salience about it" (p. 147); these authors show that the latency to name familiar objects decreases as the view of the objects becomes more canonical. Warrington and Taylor (1973) have shown that patients with right posterior cerebral lesions have great difficulty recognizing common objects depicted from unconventional points of view but relatively little difficulty if those same objects are depicted conventionally.

It has been suggested that if a shape is presented in some orientation other than its canonical one, the observer might compensate by an act of mental rotation. That is, the shape is imagined as it would look in its canonical orientation (Rock, 1973)--that is, with a viewer-centered coordinate system replaced by a canonical, object-centered coordinate system (Marr & Nishihara, 1978; Pinker & Finke, 1980)--and only when such normalization is complete can recognition occur. There is empirical evidence that mental rotation is indeed invoked in certain judgments about disoriented shapes. Cooper and Shepard (1973) timed subjects as they decided whether rotated alphanumeric characters were normal or backward (mirror reversed) and found that their reaction times increased sharply with the angular departure of the characters from the normal upright orientation. They took this to mean that the subjects mentally rotated the characters to the upright before making their decisions. Similar results have been obtained in experiments on judging whether a rotated picture of a hand is a left or a right hand (Cooper & Shepard, 1975), or whether a rotated polygon is a normal or a reflected version of a previously learned canonical shape (Cooper & Podgorny, 1976). These and other experiments have been taken as evidence
that mental rotation is a smooth, analogue process and that the representation of a mentally rotated shape is at some level the same as that of a shape that has actually been rotated (Cooper, 1976; Shepard, 1978).

However, not all decisions about rotated shapes require mental rotation. For instance, reaction time to name rotated alphanumeric characters (Corballis, Zbrodoff, Shetzer, & Butler, 1978; White, 1980), to identify rotated letterlike symbols (Eley, 1982), or to classify rotated alphanumeric characters as letters or digits (Corballis & Nagourney, 1978) does not show the sharp dependence on angular orientation that is usually taken to imply mental rotation, although recognition may not be wholly independent of orientation (Jolicoeur & Landau, 1984). There is evidence from reaction time studies that mental rotation is involved in the identification of line drawings of rotated natural objects (Jolicoeur, 1985; Schwartz, 1981), although the role of mental rotation diminishes rapidly with repeated presentations of the same objects (Jolicoeur, 1985).

These last results notwithstanding, there is a logical difficulty in supposing that mental rotation precedes recognition. It is hard to understand how one could mentally rotate an unrecognized shape to a canonical or upright orientation, because in the absence of recognition one could hardly know what its canonical orientation was. The observer might try mentally rotating the representation on some systematic basis until it appeared familiar, but such a strategy scarcely accords with the sharp functions relating recognition time to angular departure from the upright. In the case of three-dimensional shapes, there would be a further complication, since parts of the shape would be hidden from view and would have to be exposed if the shape were rotated in depth. The hidden parts could scarcely be represented unless the identity of the shape was known.¹

¹ Even in the case of some familiar objects, the observer may lack knowledge of hidden parts. Many people are no doubt unfamiliar with the underside of an automobile, for example. A referee has suggested to me that this may limit one's ability to mentally rotate such objects. I suspect, however, that nonmechanical observers may still imagine an automobile turned upside down, with the underside represented simply as a flat surface, say. In my own case, I find that the imagined underside resembles that of the toy models I remember from childhood!

These considerations suggest that there may be at least two levels in the recognition of rotated shapes, one that is prior to mental rotation and one that is subsequent to it. For instance, observers seem to have little difficulty identifying rotated letters or digits (except those represented in cursive script) without mentally returning them to the upright, yet they do resort to mental rotation in order to distinguish them from their mirror images. In some cases mirror-image discrimination is necessary for identification itself, and here mental rotation is required. For instance, Corballis and McLaren (1984) reported evidence that mental rotation was involved in making the critical distinctions among the lowercase letters b, d, p, and q. Similarly, mental rotation may not be required in order to identify a drawing of a rotated hand as a hand, but mental rotation is required if one is to identify it as a left or a right hand (Cooper & Shepard, 1975).

Although mental rotation does not always play a role in the identification of shapes, then, it seems to have a special function in the discrimination of mirror images. Cooper and Shepard (1973) argued that this is because mirror images are featurally identical and so cannot be distinguished on the basis of featural descriptions; the observer must therefore work with holistic, analogue representations.
Hinton and Parsons (1981) proposed that perception of a shape requires the assignment of an intrinsic frame of reference to the shape, and this frame can be either left- or right-handed; the handedness of the frame is not coded explicitly, however, and can only be determined in analogue fashion. I proposed a similar account (Corballis, 1982; Corballis
& Beale, 1976). However, as we have seen, the role of mental rotation does not appear to be restricted to decisions involving mirror images. Sometimes, it may simply serve as a check. In the naming of rotated natural objects, for example, subjects must have at least some notion of what the objects are before mentally rotating them to the upright, because otherwise they could scarcely know where to rotate them to. Mental rotation may therefore simply provide a way of verifying their identities.

Part of my aim in the following sections is to clarify the role of mental rotation in the recognition of disoriented shapes. This aim is embedded, however, in an attempt to sketch a more general account of how disoriented shapes are recognized. This involves the specification of the nature of the information that is extracted from the stimulus input, the matching of this input against information stored in long-term memory, and the further use of stored information to enhance or "flesh out" the image of the shape.

Toward a Theory
Perceptual Processing

First, it is necessary to distinguish perceptual processing from the processes involved in shape recognition itself. As Pinker (1984) pointed out, it is misleading to proceed directly to a theory of shape recognition from the retinal arrays, or even from the known feature-extracting properties of the visual cortical areas. This is because the boundaries of a given shape, such as a square, might be defined in different manifestations by quite different elements, such as lines, rows of dots, boundaries between colors or shades, or disparities in random-dot stereograms. Consequently it is likely that a good deal of perceptual processing occurs before shape recognition even begins.

Following Marr and Nishihara (1978; Marr, 1982), I shall assume that early visual processing supplies the so-called 2½-D sketch and that this provides the raw material for subsequent recognition processes. The 2½-D sketch is a representation of a visual display from the viewpoint of the observer and includes information about edges, corners, surfaces, and other surface details. Surfaces are coded according to their distance from the observer and their orientation or slope relative to the observer. Discontinuities in distance or orientation that might signal edges or ridges are also marked. This representation is intended to include the richest information that purely bottom-up visual processing can provide. (For further details, see Marr, 1982.)

Although the 2½-D sketch in principle supplies the observer with as much information about the world as his or her retinal image allows, this representation is still a poor basis for shape recognition. It can only represent visible surfaces and so provides no information about the hidden sides of solid shapes. Because it is specific to the observer's viewpoint, it will change
as the observer's position or eye fixation changes. Of itself, it does not segment the visual scene into discrete objects. Much further processing must therefore take place if the observer is to understand and recognize what the scene actually contains.

According to Marr (1982; Marr & Nishihara, 1978), the next step in the processing of shapes involves the construction of a 3-D model. As a prelude to this, it is necessary to partition the display in order to locate different shapes contained within it--a problem that need not concern us here. For any given shape, construction of the 3-D model involves departing from the reference frame of the 2½-D sketch, which is centered on the viewer's own vantage point, and locating a reference system centered on the shape itself. The parts of the shape are then located or characterized with respect to this shape-centered coordinate system and compared with characterizations stored in memory. According to Marr (1982), the assigning of a shape-centered coordinate system is necessary if shapes are to be recognized independently of orientation:

Object recognition demands a stable shape description that depends little, if at all, on the viewpoint. This, in turn, means that the pieces and articulation of a shape need to be described not relative to the viewer but relative to a frame of reference based on the shape itself. This has the fascinating implication that a canonical coordinate system must be set up within the object before its shape is described, and there seems to be no way of avoiding this. (Marr, 1982, p. 296, his emphasis)

This statement will be challenged below. It is clear, however, that the establishing of a shape-centered coordinate system effectively solves the problem of orientation, because the description of the shape with respect to this coordinate system is invariant with respect to changes in the orientation of the shape.

Just how the shape-centered coordinate system is located within a shape is a nontrivial problem, however. Marr and Nishihara (1978) suggested a number of possible rules and heuristics. They pointed out that many shapes and their parts can be described in terms of generalized cones organized in hierarchical fashion. Thus a human body, for instance, can be regarded at the top level as an elongated cone centered on the torso, with subordinate cones representing head, arms, and legs and lower-order cones representing smaller parts such as forearm, hand, and fingers. Given this overall plan, rules for the location of shape-centered reference axes might then be formulated. For instance, the top-bottom axis might be aligned with an axis of elongation within the shape or with an axis of bilateral or radial symmetry; linear movement might define a front-back axis; and so on. Parts could be assigned their own coordinate systems and integrated into the overall coordinate system for the shape as a whole.

As Pinker (1984) pointed out, such heuristics leave many problems unsolved. Not all objects lend themselves to description in terms of generalized cones; indeed not all are elongated or symmetrical, and such properties may not in any case be readily detectable from the vantage point represented in the 2½-D sketch. Other authors have proposed alternative schemes for the generalized description of shapes; for instance, Hoffman and Richards (1984) suggested a scheme for discovering the parts of shapes based on patterns of inflection and extremes of curvature rather than on generalized cones.
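To make the flavor of these axis-based heuristics concrete, the following sketch (my own illustration, not an implementation from Marr and Nishihara; the coordinates and function names are invented) treats the axis of elongation of a small point set as a candidate shape-centered axis and re-describes the points relative to it, so that the resulting description no longer changes when the shape is rotated.

```python
import numpy as np

def shape_centered_frame(points):
    """Assign a crude shape-centered frame: origin at the centroid, with the
    principal axis of elongation taken as the candidate top-bottom axis.
    Returns the points re-expressed in that frame."""
    pts = np.asarray(points, dtype=float)
    centered = pts - pts.mean(axis=0)            # origin at the centroid
    # Principal axes of the point cloud; the first is the axis of elongation.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt.T                        # coordinates relative to the shape's own axes

# A toy elongated "shape" (invented coordinates), and the same shape rotated 40 degrees.
shape = np.array([[0, 0], [4, 0], [4, 1], [2, 1.5], [0, 1]])
theta = np.radians(40)
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
rotated = shape @ rot.T

a, b = shape_centered_frame(shape), shape_centered_frame(rotated)
# Up to sign flips of the axes, the two shape-centered descriptions coincide.
print(np.allclose(np.abs(a), np.abs(b), atol=1e-6))
```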
Richards (1979, 1982, cited in Pinker, 1984) suggested a two-stage analysis, in which easily sensed cues for broad classes of objects (e.g., animal, vegetable, or mineral) are extracted first and are used to formulate likely hypotheses about reference frames. One wonders, however, if any set of heuristics could ever be powerful enough to locate shape-centered reference frames prior to the act of recognition itself. That is, it is difficult to see how the intrinsic axes of a shape can possibly be determined unless one already knows what the shape is. In the following section, therefore, I argue that recognition might occur prior to the location of the reference frame, at least in some cases.
Description Without a Reference Frame

The idea of a description of a shape that is independent of any frame of reference is not new. For instance, some authors have suggested that the recognition of shapes might be achieved by the extraction of features that are independent of orientation (e.g., Milner, 1974; Sutherland, 1968). Although feature-based theories have largely fallen from favor (Pinker, 1984), there is some recent evidence that locally defined features may well constitute one source of information about shapes that is orientation free. Humphreys and Riddoch (1984) reported evidence that patients with right-hemispheric lesions are impaired in their ability to match objects in different orientations if deprived of information enabling them to locate the shape-centered reference frame. Under appropriate conditions, these patients could still match such objects on the basis of distinctive features. A patient with a bilateral lesion was unable to match objects on the basis of distinctive features but was apparently able to do so on the basis of overall reference axes. The authors therefore suggested that there are two "routes to object constancy," one based on global assignment of axes, the other based on local features (Humphreys & Riddoch, 1984, p. 385). As noted above, the objection to feature-based theories applies to the attempt to extract features directly from the retinal image. This objection is overruled if one seeks to identify orientation-free features from the 2½-D sketch.

Notwithstanding the evidence reported by Humphreys and Riddoch (1984), we need not be limited to the search for local features. The more general question is whether there is any appropriately rich description of a shape that is sufficient to allow it to be recognized but is nevertheless free of any reference frame. I suggest that most shapes do allow such a description. It might take any of several forms. It might be propositional (Pylyshyn, 1973), but it need not be. For example, Deutsch (1955) proposed that a two-dimensional shape might be characterized by a ranked list of distances between all possible pairs of points on its contours. Similarly, one might characterize a three-dimensional object in terms of a ranked list of the distances between pairs of points on its surface. Such a description captures much of what we mean by shape and is clearly independent of orientation in space. Moreover, most shapes characterized in this fashion are unique, the exceptions being shapes that are the same as other shapes rotated to a different orientation (such as a square and a diamond). As Deutsch himself recognized, the listing of intercontour or intersurface distances may provide a description that is too complete, and some subset may be sufficient. For instance, it may be enough to list the distances between corners or between
other discontinuities on the shape's contour or surface. In any given context all that is needed is a description sufficient to eliminate other shapes. As noted earlier, recognition is hierarchical, and a description that is insufficient in one context may be adequate in another.

One problem with Deutsch's solution is that it may not deal adequately with inexact matches. For instance, an airplane with its wings missing might better fit the stored description of a submarine or a fish, yet it would surely be recognized for what it was. I suspect therefore that the description of complex shapes must include some propositional information concerning the interrelations among parts. Deutsch's theory may well apply to the descriptions of the parts themselves or to the elemental shapes that are combined to form more complex objects.

A frame-independent description of a shape can be at least as rich as a verbal description of that shape, with the proviso that one eschews terms such as top, bottom, back, front, left, and right or their equivalents, which imply a reference frame. To take a very simple example, one can easily identify a particular alphanumeric character from the following description: It consists of a straight line, with two rounded loops, one typically slightly larger than the other. The loops begin from opposite ends of the line, and curve around to meet the line again at a common point in about the middle of the line. Both loops are on the same side of the line. This description is orientation free and does not invoke any particular coordinate system. If one were actually to construct the shape from the description, its orientation in space would be entirely arbitrary because it is unspecified in the description. I do not of course mean to argue that our description of letters is in fact verbal; rather, the verbal description is simply an illustration that a symbolic, frame-independent description is possible.

One difficulty with a description like the one above is that although it might describe a shape with sufficient specificity to distinguish it from other shapes, it is not the only possible description. There are alternatives to this description of the letter B, even in verbal terms. This is due in part to my having expressed it in English, a sloppy language. Some potential variation could be removed with a more formal symbolic description, but the problem runs deeper. My description effectively treats the line as a reference and relates the two curves to the line; alternatively, I could have treated the curves as the reference and related the line to them. The term reference here should not be confused with reference frame, because no orientation is implied. Thus, if the stored description were of one form and the perceptual description of another, there would still be a matching problem. Partial solutions to this problem could be achieved by giving priority to lines over curves and other features as references and to longer lines over shorter ones. Context could also help; in an alphanumeric context, the mere listing of two loops and a line would be sufficient to rule out all but an uppercase B. Some form of top-down interrogation might remove any remaining uncertainties; having matched the two loops and the line, for example, the system might then inquire as to whether they are joined as specified by the stored description of a B. It might be noted that this matching problem does not arise in the case of Deutsch's scheme, at least for exact matches.
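Deutsch's proposal lends itself to a simple computational sketch. The fragment below is only an illustration of the idea, not Deutsch's own algorithm, and the landmark coordinates are invented: it describes a shape by the ranked list of distances between its landmarks and confirms that the description survives rotation unchanged. As the next section emphasizes, the same description is also blind to reflection.

```python
import itertools
import math

def ranked_distances(points):
    """Deutsch-style description: the sorted list of distances between all
    pairs of landmark points. Independent of position and orientation."""
    return sorted(math.dist(p, q) for p, q in itertools.combinations(points, 2))

def rotate(points, degrees):
    t = math.radians(degrees)
    return [(x * math.cos(t) - y * math.sin(t),
             x * math.sin(t) + y * math.cos(t)) for x, y in points]

def reflect(points):
    return [(-x, y) for x, y in points]   # mirror image about the vertical axis

# Invented landmark points for an asymmetrical "shape".
shape = [(0.0, 0.0), (3.0, 0.0), (3.0, 1.0), (1.0, 2.0)]

d0 = ranked_distances(shape)
d_rot = ranked_distances(rotate(shape, 72))      # same shape, different orientation
d_mir = ranked_distances(reflect(shape))         # the mirror image of the shape

print(all(math.isclose(a, b) for a, b in zip(d0, d_rot)))  # True: rotation leaves the description intact
print(all(math.isclose(a, b) for a, b in zip(d0, d_mir)))  # True: the description cannot tell mirror images apart
```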
A description that is free of any particular reference frame is of course also orientation free. In the case of three-dimensional shapes, however, the ease of extracting such descriptions will vary with orientation, because parts of a shape are always hidden. For example, it is clearly easier to recognize a person from the front than from the back. Earlier, I referred to the evidence of Palmer et al. (1981) that the time to name familiar three-dimensional objects decreases as they approach the canonical orientation. Although this might be taken as evidence that the subjects mentally rotated the objects to the canonical orientation before naming them, the more likely explanation is that fewer salient characteristics of the objects are available as the orientation departs from the canonical one. The match to information stored in long-term memory is simply better and faster when the maximum of salient information is available in the depicted view. In most cases, though, even a poor viewing angle should provide sufficient information for recognition; Palmer et al. found that naming errors remained very low even for the least canonical of views despite the increase in naming latency.

According to this account, frame-independent descriptions would also form part of the defining properties of a known shape or object and be stored in symbolic fashion in long-term memory. They would be supplemented by other information about the shape, including information about its coordinates, as well as more qualitative information (e.g., what the object is used for, whether it is ugly or beautiful, pleasant or unpleasant, and so on). The point here is simply that this orientation-free component of the total information about a shape may be sufficient to mediate recognition. Once the shape is recognized, the observer can then identify its coordinates from the extra information stored in long-term memory.

Of course there are some shapes, even two-dimensional ones, that cannot be identified uniquely from orientation-free descriptions. The lowercase letters b, d, p, and q constitute one such set. Other pairs include the digits 6 and 9 and the uppercase letters M and W. Similarly, a diamond cannot be distinguished from a square unless a top-bottom axis is defined (Rock, 1973). In such cases a top-bottom axis may be assigned arbitrarily, or according to context, or by identifying the gravitational upright with the canonical upright of the displayed pattern. Such examples, however, are surely the exceptions rather than the rule, and even in the exceptional cases recognition is usually reduced to a choice of a small number of possible alternatives, so that uncertainty is greatly diminished by a frame-independent description.
The Mirror-Image Problem

Frame-independent descriptions possess one interesting property: They do not discriminate shapes from their mirror images. Every distance between pairs of points on a shape is matched by an equal distance within the mirror image of the shape; for every line, corner, or curve in one there is a corresponding line, corner, or curve in the other. Any verbal description of a shape that omits any reference to a reference frame applies equally to the mirror image of that shape. The problem of the descriptive equivalence of mirror images was recognized 200 years ago by Kant (1783/1953), who wrote as follows:

What can more resemble my hand or my ear, and be in all points
more alike, than its image in the looking glass? And yet I cannot put such a hand as I see in the glass in the place of its original. (p. 42)

Gardner (1967) has referred to the problem as the "Ozma Problem," after Project Ozma. This project was started in Green Bank, West Virginia, in 1960, in an effort to search the galaxy by radio telescope for intelligent life on other planets. Suppose we had coded instructions on how to construct a certain shape or pattern and were able to transmit the instructions to some distant planet, called Oz, inhabited by intelligent beings. Even if those beings were able to decode our message, there would be no guarantee that the shape they reconstructed would not be the mirror image of the original. The parity, or left-right sense, of the shape requires reference to some analogue representation of the difference between left and right (Corballis & Beale, 1983).

I should emphasize that this indeterminacy is specific to parity. That is, a frame-independent description might in principle capture all of the subtleties of shape except parity. The world viewed through a looking glass has all the richness and complexity of the real world--point for point, feature for feature--and the descriptions that capture this richness and complexity in one apply equally to the other. But so long as there is no appeal to a reference frame, that description cannot specify parity.

This failure of the frame-independent description to distinguish mirror images might well explain in part why most species, including human children and, up to a point, human adults, seem to have special difficulty with mirror-image discrimination. This difficulty has its adaptive consequences, because there is seldom if ever any need to discriminate mirror images in the natural world, and it is often an advantage to treat them as equivalent. These themes have been explored in detail by Corballis and Beale (1976, 1983).
Orientation-Free Coding of Parity

The handedness, or parity, of a shape is an elusive property. There are several lines of evidence that shapes of opposite parity are weakly discriminable regardless of their angular orientations and in the absence of mental rotation. Earlier, I cited studies showing that people can name or categorize rotated two-dimensional shapes without first having to mentally rotate them to the upright (Corballis & Nagourney, 1978; Corballis, Zbrodoff, Shetzer, & Butler, 1978; Eley, 1982; White, 1980). However, they respond more quickly to normal than to backward shapes regardless of their orientations. This is true not only of adults but also of 11- to 13-year-old children, including those suffering from specific reading disability (Corballis, Macadie, Crotty, & Beale, 1985)--a group often thought to have special difficulty with mirror-image discriminations (Corballis & Beale, 1976, 1983; Orton, 1937).

These results are somewhat paradoxical. They show that mirror images are responded to differentially prior to any act of mental rotation, and yet studies of mental rotation indicate that it is precisely in the discrimination of mirror images that mental rotation plays a crucial role (Cooper & Shepard, 1973; Corballis, 1982). One can only conclude that information about the parity of a shape is available prior to mental rotation but is too weak to sustain reliable discrimination or is for some other reason inaccessible to the discrimination process.
There may be interspecies differences in the strength or availability of this parity code. Hollard and Delius (1982) timed pigeons as they chose which of two mirror-image forms was the same as a previously presented sample form and found that their times were independent of the orientation of the test forms relative to the sample. By contrast, humans performed the same task much more slowly but no more accurately, and their times increased sharply with relative orientation, suggesting mental rotation. It appeared that the pigeons had access to a parity code that was independent of orientation.

What could be the nature of the information about a shape that is at once parity specific and orientation free? This question might seem unanswerable, because parity is in a sense only a special case of angular orientation. One can reverse a shape's parity by rotating it 180° through a space that has one dimension more than that of the shape itself. Thus a two-dimensional shape is mirror reversed if flipped through the third dimension, and a left-footed shoe becomes a right-footed one if flipped over in the fourth dimension.

It is possible, however, to conceive of a reference frame that can specify the parity of a shape but not its angular orientation. For a two-dimensional shape, one can imagine a clockwise spiral superimposed on the shape and centered on some landmark within the shape (e.g., its centroid). This will intersect with the shape differently from the ways in which it will intersect with the mirror image of the shape, regardless of their angular orientations. In developmental biology, the notion of a spiral gradient embedded in the embryo has been similarly invoked to explain how the situs, or parity, of the internal organs is established (Bateson, 1980; Lepori, 1969). Equivalently, a rotation superimposed on a shape will interact differently from the same rotation superimposed on its mirror image and so provide an orientation-free basis for parity discrimination. Imagine, for instance, a clockwise rotation superimposed on the letter L, centered on its fulcrum. If the longer arm of the L was held fixed while the shorter one was allowed to be influenced by the rotation, the rotation would act to increase the angle between the arms of a normal L but to decrease the angle between the arms of a backward L.

An unpublished experiment of my own (Corballis, 1987) shows that subjects can judge whether a 30° jump of a hand on a clockface is clockwise or counterclockwise more or less equally quickly regardless of where the hand is located; there is some variation in reaction time, but the function is much flatter than the typical mental-rotation function. Rotational direction can therefore serve as a frame for mirror-image discrimination that is more or less orientation free. In another experiment, however, subjects proved generally unable to use such a frame to discriminate normal from backward letters independently of their orientation. The subjects were presented with the letters F, P, and R in different orientations and versions. Half of the subjects were instructed to think of the normal versions of these letters as clockwise and the backward versions as counterclockwise and to respond accordingly; the other half simply judged whether the letters were normal or backward. The results were essentially the same for both groups and suggested that they mentally rotated the letters to the upright before making either decision.
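One concrete analogue of such a rotational frame, offered purely as an illustration rather than as an implementation of the spiral-gradient idea, is the signed area obtained when the landmarks of a two-dimensional shape are visited in a fixed order: its sign is unaffected by rotating or translating the shape but reverses for the mirror image, so it behaves like an orientation-free parity code. The coordinates below are invented.

```python
import math

def signed_area(points):
    """Shoelace formula: the magnitude is the enclosed area, and the sign is the
    sense of circulation (positive for a counterclockwise traversal of the landmarks)."""
    area = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        area += x1 * y2 - x2 * y1
    return area / 2.0

def parity(points):
    return "normal" if signed_area(points) > 0 else "backward"

def rotate(points, degrees):
    t = math.radians(degrees)
    return [(x * math.cos(t) - y * math.sin(t),
             x * math.sin(t) + y * math.cos(t)) for x, y in points]

# A crude outline of an L-like shape (invented coordinates), always visited in the same order.
L_shape = [(0, 0), (1, 0), (1, 1), (3, 1), (3, 2), (0, 2)]
mirror_L = [(-x, y) for x, y in L_shape]                # its mirror image

print(parity(L_shape), parity(rotate(L_shape, 135)))    # the same label at every orientation
print(parity(mirror_L), parity(rotate(mirror_L, 135)))  # the mirror image always gets the opposite label
```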
Whether or not it is based on reference to a spiral frame, the asymmetry that provides the orientation-free coding of parity
in shape recognition must be supplied by the actual observer. That is, a bilaterally symmetrical observer could not accomplish such coding. The reason for this is explained by Corballis and Beale (1970, 1976, 1983) and can be briefly restated as follows. Suppose a symmetrical observer systematically identifies normal letters more rapidly than backward ones regardless of their orientation. But now suppose this scenario is mirror reversed, as though viewed in a looking glass. The observer, being symmetrical, is unchanged but is now seen to respond more quickly to backward than to normal characters, contradicting its own earlier behavior. This proves that a symmetrical observer could not achieve the coding of parity--that is, the observer must supply the essence of the code.

The constraints imposed by bilateral symmetry may seem of marginal relevance because the human brain, at least, is well known to exhibit striking functional asymmetries (see Corballis, 1983, for a review). It may well be argued that the formation of structural memory traces would quickly provide the necessary asymmetry for mirror-image discrimination. There is some evidence, however, that memory traces themselves tend to be "symmetrized" in formation and that this process indeed provides the basis for mirror-image equivalence (see Corballis & Beale, 1976, 1983, for reviews). The very contingencies that have shaped bilateral symmetry in evolution may have also favored bilateral symmetry in the mechanisms of learning and memory. Such considerations must be weighed against the pressures favoring a systematic distinction between left and right, especially in human environments.

Whatever its nature, this orientation-free coding of parity must imply a structural asymmetry in the observer and may be incorporated in the stored representation of information about shapes. It is presumably what enables observers to recognize and classify alphanumeric characters more quickly if they are normal than if they are backward, regardless of their orientation and independently of mental rotation. It is not clear whether parity information is also coded in an orientation-free manner for three-dimensional shapes--whether observers are sensitive to whether a shoe, say, fits a left or a right foot regardless of the orientation of the shoe.
Coding of Orientation

It was suggested above that most shapes can be recognized from a frame-independent description. The stored information about known shapes presumably also includes information about their internal or shape-centered reference frames. Once a shape is recognized, this stored information can be used to locate its internal reference frame and thus to determine the orientation of the shape relative to the observer. At this point, the orientation of the shape can be understood as a circumstantial attribute of the shape--an attribute that depends on the particular viewing circumstances. Once its orientation is determined, the shape can be mentally rotated to its upright or canonical orientation, in which the shape's internal reference frame is aligned with the perceived axes of the environment. Locating the internal reference frame also brings us to a description resembling the 3-D model proposed by Marr and Nishihara (1978). To recapitulate, then, the principal difference between this account and theirs is that it is here proposed that recognition normally precedes the location of
the internal reference frame, whereas they proposed that locating the reference frame is necessary before recognition can occur.

The different dimensions of the shape-centered reference frame may not be assigned with equal ease. The top and bottom of a shape are evidently especially salient (Rock, 1973), probably because there are important functional differences between the tops and bottoms of many shapes, which give rise to featural differences. Moreover, the top-bottom axis seems to be critical to the mental rotation of shapes to the upright, because it effectively defines the upright. Logically, then, the top-bottom axis must be identified prior to mental rotation.

There is empirical evidence that this is true. Corballis and Cullen (1986) timed subjects as they decided whether an asterisk was to the left, right, top, or bottom of rotated shapes. When the shapes were letters, the subjects could generally identify the asterisk as being at the top or bottom without mentally rotating the letters to the upright. There was some evidence that they occasionally resorted to mental rotation when the letters were horizontally symmetrical (e.g., D or E), with no features distinguishing top from bottom. In these cases mental rotation was presumably guided by prerotational identification of the left-right axis, which is featurally marked. It was also of interest that the subjects could apparently determine the tops and bottoms of horizontally symmetrical letters most of the time without first mentally rotating them to the upright. This required a mirror-image discrimination, because the set of rotated Ds, say, with asterisks located at their tops is the mirror image of the set with asterisks located at their bottoms. In order to solve this problem the subjects must have made use of the orientation-free parity code discussed above. Taken overall, however, these results show that the identification of top and bottom has priority over the identification of left and right, even when there are features distinguishing left and right but no features distinguishing top and bottom.

In the case of relatively unfamiliar shapes learned just prior to the experiment, Corballis and Cullen found that judgments about top and bottom regularly induced mental rotation when the shapes were horizontally symmetrical. Orientation-free coding of the parity of a shape may depend on a high degree of familiarity with that shape. It might even be argued that it is the product of accumulated experience with particular shapes in different angular orientations, although it is a moot point whether even experienced readers have had more than a trivial amount of experience with inverted words or letters.

It seems likely that the fronts and backs of three-dimensional shapes can also be identified without mental rotation to some canonical orientation, although I know of no empirical data on the matter. Front and back are virtually always distinguishable in featural terms--indeed the front-back dimension can scarcely be said to exist unless featurally defined. Some mental-rotation tasks presumably require that front and back be identified prior to the rotation. In deciding whether a shoe fits the left or right foot, for example, one would rotate it physically or mentally into alignment with the foot, with the toe facing forward and the heel backward. To do this requires that one can identify the toe and heel prior to rotation.
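The proposed sequence (identify the top-bottom axis from stored knowledge, then rotate the image to the upright) can be sketched numerically. In the fragment below, which is a toy illustration of my own with invented landmark coordinates, the "top" landmark is assumed to have been recovered from memory; the angular departure from the upright is read off the top-bottom axis, and the whole shape is then rotated back to its canonical orientation.

```python
import math

def angle_from_upright(centroid, top_landmark):
    """Clockwise departure of the identified top-bottom axis from the
    gravitational upright (the +y direction), in radians."""
    dx = top_landmark[0] - centroid[0]
    dy = top_landmark[1] - centroid[1]
    return math.atan2(dx, dy)          # zero when the "top" lies straight above the centroid

def rotate_about(points, center, theta):
    """Counterclockwise rotation of the points about the given center."""
    cx, cy = center
    c, s = math.cos(theta), math.sin(theta)
    return [(cx + (x - cx) * c - (y - cy) * s,
             cy + (x - cx) * s + (y - cy) * c) for x, y in points]

# Invented landmarks for a disoriented shape; stored knowledge identifies landmark 0 as its "top".
landmarks = [(4.0, 2.0), (2.0, 1.0), (3.0, 0.0), (1.0, 2.5)]
centroid = (sum(x for x, _ in landmarks) / len(landmarks),
            sum(y for _, y in landmarks) / len(landmarks))

theta = angle_from_upright(centroid, landmarks[0])       # how far the shape is tilted clockwise
upright = rotate_about(landmarks, centroid, theta)       # "mental rotation" back to the canonical upright

top_x, top_y = upright[0]
print(math.isclose(top_x, centroid[0], abs_tol=1e-9))    # True: the top now sits directly above the centroid
print(top_y > centroid[1])                               # True
```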
Corballis and Cullen also tested subjects on a task requiring them to make one response if an asterisk was to either side of a rotated letter and another response if it was at the top or bottom. This task did not induce mental rotation. That is, mental rotation is not involved in locating the side of a letter when there is no requirement to decide which side is which. When the subjects were required to specify whether the asterisk was to the left or right side, then they generally did resort to mental rotation, even in the case of asymmetrical letters (such as E or R) in which there are featural cues as to which is the left or right side.
Coding of Left-Right Parity

This brings us to the point at which mental rotation plays its major role in decisions about rotated shapes. Although the orientation-free parity code described above can in principle enable one to solve the left-right problem without resorting to mental rotation, it seems that it does not normally do so. Perhaps this is because the left-right problem is not exclusively one of parity or mirror-image discrimination. As noted above, subjects in Corballis and Cullen's (1986) study adopted a mental-rotation strategy in making left-right decisions about asymmetrical letters--decisions that did not involve mirror-image discrimination.

The importance of mental rotation in left-right discrimination may have to do with the fact that left and right must be effectively defined in relation to the human body (Corballis & Beale, 1983). In order to decide which is the left or right side of a disoriented shape, we must rotate the shape, physically or mentally, into alignment with our own sense of which are the left and right sides of space. This implied anisotropy of space may derive initially from a sense of which side of space contains the dominant and which the nondominant hand, or it may owe something to the asymmetrical scanning habits acquired in learning to read (Corballis & Beale, 1976). However derived, it provides us with an egocentric coordinate system that distinguishes left from right. In order to use this system in telling the left- from the right-hand side of a shape, one must first bring the coordinates of the shape into at least rough alignment with the coordinates of the space by an act of mental rotation.

This anisotropic egocentric space is almost certainly not retinal. When subjects with their heads tilted to one or other side mentally rotate alphanumeric characters to the upright, the "upright" is generally more closely aligned with the gravitational than with the retinal upright, although it may coincide with neither (Corballis, Nagourney, Shetzer, & Stefanatos, 1978; Corballis, Zbrodoff, & Roldan, 1976).

Knowledge of which is the left or right side is implicit in the decision as to whether an asymmetrical alphanumeric character, such as R or G, is normal or backward, which is presumably why the decision as to whether a rotated character is normal or backward also normally requires that the letter be mentally rotated to the upright. Rather than identify the sides explicitly, however, the observer might simply observe how the character, once rotated, is aligned with the coordinates of the egocentric space and compare this with an alignment code stored in long-term memory. We have seen that subjects apparently cannot identify the left or right sides of rotated shapes without first mentally rotating them to the upright, even when the shapes are asymmetrical, suggesting that the sides are not explicitly labeled as left or right depending on their lateralized features.

I suggested earlier that the distinction between left and right is egocentric and is woven into the subjective sense of space itself, although this anisotropy may depend on literacy or other forms of asymmetrical training. Whatever its nature, it is important to stress that the asymmetry must be supplied by the observer. A bilaterally symmetrical organism could not tell left from right (Corballis & Beale, 1970), so even if it could mentally rotate a shape to its canonical orientation it would still have no basis for distinguishing it from its mirror image.
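A minimal sketch of this alignment idea follows; it is my own illustration, and the landmark coordinates and the stored "alignment code" are invented assumptions rather than a claim about how such a code is actually represented. Once the character has been brought to the upright, the decision reduces to noting on which side of egocentric space a distinguishing feature falls and comparing that with the side recorded in long-term memory.

```python
def side_of_feature(centroid, feature):
    """Which side of egocentric space the feature occupies once the shape is upright."""
    return "right" if feature[0] > centroid[0] else "left"

# An upright letter "R" reduced to two invented landmarks: its centroid and the tip of its diagonal leg.
centroid = (0.0, 0.0)
leg_tip_normal = (0.6, -1.0)           # a normal R carries its leg on the right
leg_tip_backward = (-0.6, -1.0)        # the mirror-reversed R carries it on the left

stored_alignment_code = "right"        # hypothetical entry in long-term memory for a normal R

for leg in (leg_tip_normal, leg_tip_backward):
    decision = "normal" if side_of_feature(centroid, leg) == stored_alignment_code else "backward"
    print(decision)                    # prints "normal", then "backward"
```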
Long-Term Memory and the Short-Term Buffer

Following Kosslyn (1980), it is convenient to distinguish between long-term visual memory, where information about familiar shapes is stored, and a short-term visual buffer, where an on-going visual image is constructed. Perception, pattern recognition, and imagery involve interplay between these two structures. The long-term store presumably holds symbolically coded information about familiar shapes and objects. I suggested earlier that this includes frame-independent descriptions of known shapes. These descriptions may also be coded independently of size, because people readily recognize shapes regardless not only of retinal size but also of actual size--even children easily recognize toys for what they represent. Nevertheless there is probably an additional parameter to represent size where this is relevant, precisely so that we can tell that a particular shape is a toy or a model rather than the real thing.

These frame-independent descriptions of shape are assumed to be supplemented by descriptors that do in fact specify a shape-centered reference frame. Earlier I suggested a verbal description of the letter B that made no reference to its orientation in space. Such a description might then be supplemented by information to the effect that the straight-line portion is to be understood as the top-bottom axis, and the curved portions are to be understood as on the right-hand side of space. Again, I do not mean to imply that these descriptions are in fact verbal; rather, my purpose is to illustrate the kind of information that might be encoded. If a letter B is actually presented, extraction of a frame-independent description could allow it to be located in long-term memory. Information about its internal axes would then be accessed and applied to the representation in the short-term buffer. At this point, the observer might be said to perceive the letter in a particular angular orientation.

Frame specifications may be coded symbolically in long-term memory but may require reference to spatial information in the short-term buffer itself if the parity of the shape is to be resolved. For instance, long-term memory may contain instructions as to how to construct a normal R, with appropriate reference to the left and right sides. However, the information as to which is the left and right might be built into the dimensions of the buffer itself. The same stored information could therefore produce a backward R if decoded in a different spatial medium. A spiral or torque embedded in the visual buffer, or subjective space, may also allow the observer to extract information about the parity of the presented shape. Parity may also be stored symbolically in long-term memory, so that access to memory may be faster if the extracted parity information matches the stored information. However, this parity code does not seem to be strong enough to permit most observers to decide between a familiar rotated stimulus and its mirror image without the further step of mental rotation.
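The division of labor proposed here between the two stores can be summarized in a small data-structure sketch. It is schematic only; the field names are my own and nothing in the account commits to this particular format. A long-term entry carries the frame-independent description that mediates recognition, together with supplementary descriptors (internal axes, parity, size, associations) that are applied to the short-term buffer only after a match has been located.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class LongTermShapeEntry:
    """Hypothetical long-term memory record for a familiar shape."""
    name: str
    frame_free_description: frozenset        # e.g., ranked inter-landmark distances; mediates recognition
    top_bottom_axis: str                      # how to locate the internal top-bottom axis once a match is found
    parity_code: str                          # which way round the "normal" version goes
    associations: dict = field(default_factory=dict)   # verbal label, uses, contexts, affect, and so on

@dataclass
class ShortTermBuffer:
    """Hypothetical viewer-centered image under construction."""
    sketch: list                              # 2½-D-style input supplied by early vision
    assigned_axes: Optional[str] = None       # filled in only after a long-term match is located
    rotated_to_upright: bool = False

# Recognition first: match the frame-free description, then flesh out the buffer from memory.
letter_b = LongTermShapeEntry(
    name="B",
    frame_free_description=frozenset({"one straight line", "two loops on the same side of it"}),
    top_bottom_axis="runs along the straight line",
    parity_code="loops fall on the right when the letter is upright",
)

buffer = ShortTermBuffer(sketch=["edges", "corners", "surfaces"])
buffer.assigned_axes = letter_b.top_bottom_axis   # recovered from long-term memory after recognition
buffer.rotated_to_upright = True                  # the image is then mentally rotated to the canonical orientation
print(buffer)
```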
Figure 1. Schematic representation of the theory. [The original figure diagrams the flow between the short-term visual buffer (2½-D sketch, frame-independent description, 3-D sketch, location of internal axes, image rotated to canonical coordinates) and long-term visual memory (symbolic shape code, frame-independent description, other stored information, shape generated from long-term memory).]
The descriptions outlined above do not constitute the only information stored in long-term memory. Descriptions might be added to specify the contexts in which the shape is likely to be found. To take a well-known example, a particular stimulus might be seen as a B in a verbal context and as 13 in a numerical one. Possible contexts are of course many and varied, and likely to be organized hierarchically. We may also add descriptions that include associative aspects, a verbal label, color, emotional aspects, and so forth.

Stored representations can of course be accessed in the absence of a presented shape. For example, a verbal label may be sufficient to locate a stored description and allow an image to be generated in the short-term buffer. Perception and imagery are thus intimately related. Indeed, any "filling out" of a percept with stored information may be considered an imaginal process. For instance, one may view another person from the back, locate a match in memory to identify that person, and fill out the hidden facial features, only to be shocked when the person turns around to discover that it is someone else.

Conclusions

I have sketched a theory as to how people recognize rotated shapes or objects. Its essential components are summarized in Figure 1. The theory is computational in spirit if not in detail and owes much to the theoretical frameworks already provided by such authors as Marr and Nishihara (1978) and Kosslyn (1980). The theory is also sequential, in that it specifies a sequence of steps in the processing of visuospatial input.

According to the theory, recognition may be said to occur as soon as the match is found in memory. However, recognition may be a somewhat hierarchical notion. Long-term memory is no doubt itself hierarchically organized, so that a match may be achieved at one level before it is reached at another.
I have already suggested several examples of this; a particular stimulus may be seen as a lowercase letter, but it may require mental rotation in order to be identified as a b or a d; or a shape may be recognized as a hand, but it may require mental rotation if it is to be recognized as a left or a right hand. In these examples, the extra processing is necessary because the parity of the shape must be resolved before recognition reaches the desired level of specificity.

It is also possible that information extracted from a rotated shape is of poorer quality than that extracted from a shape in its upright or canonical orientation. In part, this is due to the fact that critical parts of three-dimensional shapes may be hidden if they are rotated away from their canonical orientations. But this is not the whole explanation; even two-dimensional shapes seem to be recognized more accurately or slightly more quickly if they are upright than if disoriented, even though mental rotation does not seem to be involved. To take an extreme example referred to earlier, individual faces are virtually unrecognizable upside down (Rock, 1973). The frame-independent description may be sufficient for the observer to establish that the shape is a face and to recover the stored information to further perceive that it is upside down. If the face is to be processed to a sufficient level of description for its identity to be established, however, it must be presented in its upright orientation, or at least within tolerable limits of it. To speculate as to precisely why this is so is beyond the scope of this article.

The relative poverty of information extracted from disoriented shapes might also explain why subjects sometimes do resort to mental rotation, especially in the identification of rotated natural objects. The initial extraction of frame-independent information may be sufficient to locate a match in memory and to allow axes to be assigned so that the object can be mentally rotated to the upright. Once the object is imagined in its upright orientation, the observer may then be able to carry out further processing, perhaps to check the initial, cruder act of recognition. Faces suffer an extra disadvantage because they are too subtle or complex for mental rotation to occur without loss of critical information. Thus if an inverted face is to be identified, it must be reoriented by a physical act of rotation.

The main point I have tried to make in this article is that recognition of a rotated shape can occur before shape-centered axes are assigned to it. That is, a frame-independent description is usually sufficient to uniquely identify a shape, and shape-centered axes can then be assigned. This approach overcomes some of the difficulties inherent in the theoretical framework proposed by Marr (1982; Marr & Nishihara, 1978) and also provides some insights into the problems associated with mirror-image shapes. It may not be without problems of its own, however, and the truth may lie somewhere between the two approaches. The initial act of recognition may be relatively crude, although sufficient perhaps for the location of axes to be determined, and more elaborate and meaningful descriptions may indeed depend on further processing.

References

Bateson, G. (1980). Mind and nature: A necessary unity. Glasgow: Fontana/Collins.
Cooper, L. A. (1976). Demonstration of a mental analog of an external rotation. Perception & Psychophysics, 19, 296-302.
Cooper, L. A., & Podgorny, P. (1976). Mental transformations and visual comparison processes: Effects of complexity and similarity. Journal of Experimental Psychology: Human Perception and Performance, 2, 503-514.
Cooper, L. A., & Shepard, R. N. (1973). Chronometric studies of the rotation of mental images. In W. G. Chase (Ed.), Visual information processing (pp. 75-176). New York: Academic Press.
Cooper, L. A., & Shepard, R. N. (1975). Mental transformation in the identification of left and right hands. Journal of Experimental Psychology: Human Perception and Performance, 1, 48-56.
Corballis, M. C. (1982). Mental rotation: Anatomy of a paradigm. In M. Potegal (Ed.), Spatial abilities: Developmental and physiological foundations. New York: Academic Press.
Corballis, M. C. (1983). Human laterality. New York: Academic Press.
Corballis, M. C. (1987). Distinguishing clockwise from counterclockwise: Does it require mental rotation? Manuscript submitted for publication.
Corballis, M. C., & Beale, I. L. (1970). Bilateral symmetry and behavior. Psychological Review, 77, 451-464.
Corballis, M. C., & Beale, I. L. (1976). The psychology of left and right. Hillsdale, NJ: Erlbaum.
Corballis, M. C., & Beale, I. L. (1983). The ambivalent mind. Chicago: Nelson-Hall.
Corballis, M. C., & Cullen, S. A. (1986). Decisions about the axes of disoriented shapes. Memory & Cognition, 14, 27-38.
Corballis, M. C., Macadie, L., Crotty, A., & Beale, I. L. (1985). The naming of disoriented letters by normal and reading disabled children. Journal of Child Psychology & Psychiatry, 26, 929-938.
Corballis, M. C., & McLaren, R. (1984). Winding one's ps and qs: Mental rotation and mirror-image discrimination. Journal of Experimental Psychology: Human Perception and Performance, 10, 318-327.
Corballis, M. C., & Nagourney, B. A. (1978). Latency to categorize disoriented alphanumeric characters as letters or digits. Canadian Journal of Psychology, 23, 186-188.
Corballis, M. C., Nagourney, B. A., Shetzer, L. I., & Stefanatos, G. (1978). Mental rotation under head tilt: Factors influencing the location of the subjective reference frame. Perception & Psychophysics, 24, 263-273.
Corballis, M. C., Zbrodoff, N. J., & Roldan, C. E. (1976). What's up in mental rotation? Perception & Psychophysics, 19, 525-530.
Corballis, M. C., Zbrodoff, N. J., Shetzer, L. I., & Butler, P. B. (1978). Decisions about identity and orientation of rotated letters and digits. Memory & Cognition, 6, 98-107.
Deutsch, J. A. (1955). A theory of shape recognition. British Journal of Psychology, 46, 30-37.
Eley, M. G. (1982). Identifying rotated letter-like symbols. Memory & Cognition, 10, 25-32.
Gardner, M. (1967). The ambidextrous universe. New York: Academic Press.
Hinton, G. E., & Parsons, L. M. (1981). Frames of reference and mental imagery. In A. D. Baddeley & J. Long (Eds.), Attention and performance (Vol. 9, pp. 261-278). Hillsdale, NJ: Erlbaum.
Hoffman, D. D., & Richards, W. (1984). Parts of recognition. Cognition, 18, 65-96.
Hollard, V. D., & Delius, J. D. (1982). Rotational invariance in visual pattern recognition by pigeons and humans. Science, 218, 804-806.
Humphreys, G. W., & Riddoch, M. J. (1984). Routes to object constancy: Implications from neurological impairments of object constancy. Quarterly Journal of Experimental Psychology, 36A, 385-415.
Jolicoeur, P. (1985). The time to name disoriented natural objects. Memory & Cognition, 13, 289-303.
Jolicoeur, P., & Landau, M. J. (1984). Effects of orientation on the identification of simple visual patterns. Canadian Journal of Psychology, 38, 80-93.
Kant, I. (1953). Prolegomena to any future metaphysics (P. G. Lucas, Trans.). Manchester, England: Manchester University Press. (Original work published 1783)
Kosslyn, S. M. (1980). Image and mind. Cambridge, MA: Harvard University Press.
Lepori, N. G. (1969). Sur la genèse des structures asymétriques chez l'embryon des oiseaux. Monitore Zoologico Italiano, 3, 33-53.
Marr, D. (1982). Vision. San Francisco: Freeman.
Marr, D., & Nishihara, H. K. (1978). Representation and recognition of the spatial organization of three-dimensional shapes. Proceedings of the Royal Society of London, 200, 269-294.
Milner, P. M. (1974). A model for visual shape recognition. Psychological Review, 81, 521-535.
Orton, S. T. (1937). Reading, writing, and speech problems in children. New York: Norton.
Palmer, S., Rosch, E., & Chase, P. (1981). Canonical perspective and the perception of objects. In A. D. Baddeley & J. Long (Eds.), Attention and performance (Vol. 9, pp. 135-151). Hillsdale, NJ: Erlbaum.
Pinker, S. (1984). Visual cognition: An introduction. Cognition, 18, 1-63.
Pinker, S., & Finke, R. A. (1980). Emergent two-dimensional patterns in images rotated in depth. Journal of Experimental Psychology: Human Perception and Performance, 6, 244-264.
Pylyshyn, Z. W. (1973). What the mind's eye tells the mind's brain: A critique of mental imagery. Psychological Bulletin, 80, 1-24.
Richards, W. (1979, April). Natural computations: Filling a perceptual void. Paper presented at the 10th Annual Pittsburgh Conference on Modeling and Simulation, University of Pittsburgh.
Richards, W. (1982). How to play twenty questions with nature and win (Memo No. 660). Boston: MIT Artificial Intelligence Laboratory.
Rock, I. (1973). Orientation and form. New York: Academic Press.
Schwartz, S. P. (1981). The perception of disoriented complex objects. Unpublished manuscript, Yale University.
Shepard, R. N. (1978). The mental image. American Psychologist, 33, 125-137.
Sutherland, N. S. (1968). Outlines of a theory of visual pattern recognition in animals and man. Proceedings of the Royal Society of London, 171B, 297-317.
Warrington, E. K., & Taylor, A. M. (1973). The contribution of the right parietal lobe to object recognition. Cortex, 9, 152-164.
White, M. J. (1980). Naming and categorization of tilted alphanumeric characters do not require mental rotation. Bulletin of the Psychonomic Society, 15, 153-156.

Received November 21, 1986
Revision received June 8, 1987
Accepted June 11, 1987