COGNITIVE PSYCHOLOGY 20, 319-343 (1988)

Sequential Processes in Image Generation

STEPHEN M. KOSSLYN
Harvard University

CAROLYN BACKER CAVE
Harvard University

DAVID A. PROVOST
Johns Hopkins University

AND

SUSANNE M. VON GIERKE
Harvard University

The results reported in this paper indicate that images of simple two-dimensional patterns are formed sequentially. Segments of block letters were imaged in roughly the order in which most people draw the letters, and segments of novel patterns were imaged in the order in which the segments were learned. These results were shown not to be an artifact of how people scan over images once they are formed, and could not have been due to experimenter expectancy effects. In addition, a new objective methodology indicated that images of complex letters required more time to generate, and the increases in time with complexity were very similar to those obtained simply by measuring the time subjects required to report that images had been formed. © 1988 Academic Press, Inc.

Requests for reprints should be sent to S. M. Kosslyn, 1236 William James Hall, 33 Kirkland Street, Cambridge, MA 02138. This work was supported by NIMH Grant MH 39478 and ONR Contracts N00014-83-K-0095 and N00014-85-K-0291. The second author was supported by an NSF Graduate Fellowship. The authors thank Eric Domeshek, the late Jeffrey Holtzman, and Jim Roth for useful criticism, and Lori Cronin, Tim Doyle, and Phil James for technical assistance.

One of the most fundamental observations about mental images is that we somehow create them on the basis of stored information. Images are transient, perceptlike representations that exist in short-term memory. When we do not experience imagery, the information necessary to create images presumably resides only in long-term memory; images proper must arise as a consequence of this information's being processed. Upon introspection, images of simple objects, such as letters or line patterns, may seem to appear all at once. In this paper we present evidence that such introspections are incorrect; even when images of simple, well-learned patterns are formed, the parts of the pattern are activated sequentially, with the image being built up over time.

There are several reasons why serial generation of visual mental images is to be expected. One pertinent observation is that in many (if not most) cases, images are composed of objects or parts that were encoded separately. This obviously is true for images of objects arranged into novel scenes (e.g., a dog chasing a car that has a monkey riding on the roof), but also will often be the case when single objects are imaged (e.g., a bicycle, which contains wheels, handles, and so on). When an object is initially viewed, its parts need not be encoded simultaneously, especially if the object subtends a large visual angle and is examined via multiple eye fixations made over the course of several seconds (or even minutes).

If parts of an object are encoded separately, it will be necessary also to encode the spatial relations among the parts if an image is later to be formed correctly. Many theories of perception posit that such spatial relations are local: they relate each part to one other part (e.g., see Marr, 1982; Palmer, 1977). For example, when scanning a human form, one might encode that the shin bone is connected to the knee bone, and the knee bone is connected to the thigh bone, and so on, with each part being located relative to a neighbor. If the locations of parts are represented relative to other parts, then, it will be necessary to have the reference part in the image before knowing where to position another part. When generating an image of a person, for example, if the location of the hand is specified relative to the wrist, one will need to "see" the wrist to know where the hand should be placed in the image. These sorts of concatenated "structural contingencies" lead us to expect that images will be generated by activating parts sequentially, placing each part relative to its previously imaged reference part.

There is suggestive evidence that a part is generated in an image only when the appropriate reference point is "visible." Farah and Kosslyn (1981) found that subjects required more time to image complex objects at small sizes (i.e., so that they appeared to subtend small visual angles) than at larger sizes when the subjects were motivated to include all of the details. This result can be interpreted in terms of Kosslyn's (1975) finding that parts are harder to "see" when objects are imaged at small sizes, which would make it more difficult to "see" where additional parts should be added during image generation. In addition, Kosslyn, Reiser, Farah, and Fliegel (1983) found that it was more difficult to form images of matrices of letters when the letters were very similar. This result may have reflected greater difficulty in discriminating among letters of the partially completed image, which was necessary in order to "see" where to insert new letters.
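To make the logic of these "structural contingencies" concrete, the following sketch (ours, not the authors'; the part names and relation table are purely illustrative) shows how storing each part's location relative to a single reference part forces a sequential generation order:

    # Illustrative sketch: each part's location is stored relative to one
    # reference part, so a part can be placed only after its reference is
    # already in the image. Part names and relations are hypothetical.
    RELATIONS = {          # part -> the reference part it is located relative to
        "knee": "shin",
        "thigh": "knee",
        "hip": "thigh",
    }

    def generation_order(start, relations):
        """Return one order in which parts can be imaged: each part enters
        the image only after its reference part (a topological ordering)."""
        order, placed = [start], {start}
        pending = dict(relations)
        while pending:
            ready = [p for p, ref in pending.items() if ref in placed]
            if not ready:
                raise ValueError("some part's reference part was never imaged")
            for p in ready:
                order.append(p)
                placed.add(p)
                del pending[p]
        return order

    print(generation_order("shin", RELATIONS))  # ['shin', 'knee', 'thigh', 'hip']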


Furthermore, there are two kinds of evidence that images are generated by activating distinct perceptual units. First, in some experiments subjects were asked to study a pattern, and then to decide in its absence whether a second pattern was embedded within it (e.g., Reed, 1974; Reed & Johnsen, 1975). For example, subjects might study the Star of David, image it, and then decide whether a parallelogram was contained within it. If the probed part corresponded to a perceptual unit (e.g., a triangle, for the Star of David), subjects were faster and more accurate than if the part did not correspond to a unit (e.g., a parallelogram). Presumably, a set of such units was initially encoded as one examined the figure, and the image later was composed by activating these units. If so, then the inspection task would be easiest when the probe corresponded to one of the units inherent in the image, and would be more difficult when it cut across these units (e.g., a parallelogram falls over parts of two triangles in the star). However, the fact that similar effects were obtained perceptually, when the object is physically present, clouds the issue: it could be that the part effect reflects the inspection process, and images are formed all of a piece.

A second source of evidence that images are built up by activating separate perceptual units comes from studies of image formation time. Typically, the time subjects require to report that an image is complete increases when more objects are included in an imaged scene, or when more parts are included in an imaged object (e.g., Beech & Allport, 1978; Kosslyn et al., 1983; Paivio, 1975; Weber & Harnish, 1974; see Kosslyn, Roth, & Mordkowitz, 1986). This result has been obtained when "parts" are defined in terms of various physical properties (e.g., grouped according to the Gestalt laws of organization) or when they are conceptually imposed on a figure (e.g., seeing a Star of David as two overlapping triangles or as a hexagon with six small triangles attached; see Kosslyn et al., 1983).

None of the previously reported research informs us about the nature of the image generation sequence. In particular, the finding that more time is required to generate images of patterns that contain more parts does not implicate a sequential generation process. All that can be said is that individual units are activated, and that "capacity" is limited so that the system slows down when more units must be activated. It is possible that parts are activated in parallel, but that the rate of activation depends on the total number of parts being activated. The present experiments allow us to examine the sequential nature of the image generation process.

EXPERIMENT 1

It is not clear how the previously used methodologies can be used to study the temporal course of image generation; typically, they reflect the time to complete forming an image. In addition, there are several potential difficulties with much of the previous research. Simply asking subjects to press a key when an image is formed not only depends on subjects' knowing when an image is "completely formed," but on their cooperation and good will. In addition, given the subjective nature of the judgment, the technique seems susceptible to possible experimenter demand effects and task demands (see Intons-Peterson, 1983; Pylyshyn, 1981). Furthermore, it may not be a good measure of image generation per se; after the image is formed, it is possible that the subjects scan around and "inspect" it prior to pressing the key. If so, then response time may reflect the time to inspect an image in addition to the time to generate it, and there is no way to assess the relative contributions of the two kinds of processes to the overall times.

The methodology used in this experiment circumvents the possible problems with the traditional method. This new methodology is derived from a task developed by Podgorny and Shepard (1978). In one condition, they showed subjects a 5 x 5 matrix, and filled in specific cells so that a letter was displayed. Shortly thereafter, one or more dots appeared in the grid, and subjects were asked to decide whether all of the dots fell on the figure. In a second condition, the subjects saw an empty matrix, and now imaged the letter as if the appropriate cells had been filled. After forming the image, the subjects then decided whether dots fell in cells that were imagined to be filled. Podgorny and Shepard found remarkably similar patterns of response times in the two conditions. Times in both cases depended on whether dots fell on intersections or limbs, the size of the figure, and the number of dots. In addition, Podgorny and Shepard reported that there was no evidence of top-to-bottom or left-to-right scanning or the like; in both cases, the cells apparently could be monitored in parallel.

We have adapted the Podgorny and Shepard paradigm to study image generation in the following way: subjects are shown a blank grid along with a lowercase letter. They are asked to decide whether two x marks fall in cells that would be occupied by the uppercase version of the letter if it were present in the matrix (displayed as shown previously). The x marks appear only 500 ms after the lowercase cue appears, which, if previous estimates of image generation time are anywhere near correct, is not enough time to read the cue and finish generating an image. Thus, some additional time will be required to generate the image prior to evaluating the probes, with more time being required when more of an image must be formed. Hence, if more complex forms (i.e., ones with more segments) require more time to image, increasing stimulus complexity ought to result in longer response times in this task.
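To make the structure of the task concrete, here is a minimal sketch of the trial's decision rule (our own illustration, not the authors' program, which ran on an Apple II; the cell coordinates below are invented rather than taken from the actual font):

    # Block letters represented as sets of filled (row, col) cells in the
    # 4 x 5 grid. The cells listed for "L" are hypothetical.
    LETTER_CELLS = {
        "L": {(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (4, 1), (4, 2), (4, 3)},
    }

    def correct_response(letter, probes):
        """A trial is a "Yes" trial only if both x marks fall on the letter."""
        return "Yes" if all(p in LETTER_CELLS[letter] for p in probes) else "No"

    print(correct_response("L", [(0, 0), (4, 3)]))  # both marks on the letter -> Yes
    print(correct_response("L", [(0, 0), (0, 3)]))  # one mark off the letter  -> No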


The task just described involves not only forming an image, but "inspecting" it and making a decision about the relationship between the probe marks and the imaged pattern. In order to argue that specific effects reflect image generation per se, and not the other components of the task, we included two control conditions. First, given Podgorny and Shepard's results, it seems that the same factors affect inspecting a physically present display and an image of the display. Thus, we included a perceptual condition exactly analogous to the imagery one, except that the appropriate cells were actually filled. Second, we asked people to inspect an imaged pattern after the generation process was complete; this task does not require image generation, but does require all other aspects of the primary task. These control tasks are very similar to ones studied by Podgorny and Shepard (1978), and we wished to contrast their results on inspecting perceived and imaged patterns, which suggested a parallel search, with the present predictions on image generation, which posit a serial process.

The most important feature of this experiment is that, in addition to varying the complexity of the letters, we systematically varied the locations of the probe marks on key trials. The letters in the grid were composed of segments, which consisted of two or more filled squares that formed a straight line. On some trials, both probe marks were located on segments that should be generated near the beginning of the sequence, if we suppose that a letter's segments are imaged in the sequence they are usually drawn or that the single longest line is generated first and each connecting link is thereafter generated in order. On other trials, one of the probe marks was located on a "far" segment, which should be imaged at the end of the sequence; this probe should thus take more time to evaluate because one will have to wait longer for the segment to be generated. If the generation sequence is really at the root of such an effect, of course, then we expect no such result when the subjects evaluate marks on physically present letters or on previously imaged letters.

Finally, in order to compare the new image generation task with the traditional subjective method, we also included a fourth task. Here the subjects saw a lowercase cue, and formed an image of the uppercase version of the letter in the grid. When the image was complete, the subject simply pressed a key. The time to press the key, reporting that the image was formed, was measured. We compared the differences in image generation times among the different letters obtained from the key press technique with the differences obtained from the new image generation task. If both measures in fact gauge image generation time, then we should find similar effects of the complexity of the imaged letters in both cases.


Method

Subjects

Twelve Johns Hopkins University undergraduate students volunteered to participate as paid subjects.

Materials

A 4 x 5 grid was displayed on a white-on-black fast-phosphor video display monitor (subtending approximately 6 x 8 deg. of visual angle), and a program was written to form the uppercase versions of the letters L, C, J, G, H, F, P, and U by selectively filling in the appropriate cells. The lowercase version of the letter was presented directly beneath the grid, using the standard Apple II computer font. We were interested primarily in responses to the letters L and C (simple letters, with relatively few segments in our font) and J and G (complex letters, with relatively many segments in our font); a close examination of these two complex letters is particularly interesting because they face in opposite directions, which will allow us to examine one type of possible scanning effect. The additional letters were included in part to allow us to analyze the effects of complexity over a wider range of stimuli, giving us firmer grounds for drawing generalizations about effects of complexity per se; the additional letters also served in part as fillers, so that no particular locations in the grid were especially likely to be the location of a probe mark.

The computer was programmed to deliver three types of trials. For the Perception Condition, a letter was actually displayed by filling in the appropriate cells of the grid, and the lowercase version was presented beneath the grid; 500 ms after the stimulus was presented, an x mark appeared in two of the cells of the grid. Each x appeared in the center of a cell (in reverse video, if the cell was filled) and remained on until the subject responded. On half the trials both marks fell on the block letter (these were "Yes" trials), and on the other half only one mark fell on the letter (these were "No" trials). On No trials, one of the probe marks fell in a cell located adjacent to the letter. Each letter was presented 20 times, 10 with both probe marks on it and 10 with only one mark on it. The presentation sequence was random with the constraints that each of the 16 stimulus types (Yes and No trials for each of the eight letters) appeared an equal number of times in each block of 80 trials, no letter could appear twice in succession, and the same response could not appear more than three times in succession; the same sequence of trials was used in all three conditions (a sketch of one way to satisfy these constraints is given below). A new trial was presented 5 s after the subject made a response.

As is illustrated in Fig. 1, for the letters "G" and "J" the "Yes" trials were divided into two types: "Near" trials had one x mark on the spine (the single longest line, the vertical line on the left of the "G" and on the right of the "J") and another on a segment adjacent to the spine (e.g., the bottom of the G). "Far" trials had one x mark on the spine and one on a segment separated from the spine by the maximum possible number of intervening segments (i.e., the short inward-turning horizontal segment at the right of the G or the vertical tip of the hook on the left of the J). The near and far locations were selected after a separate group of 12 Johns Hopkins students were given blank grids and asked to copy the two complex stimulus letters, which were displayed in other grids; an experimenter observed the order in which the two key segments of each letter were drawn. All of these subjects drew near segments before far ones.
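As a concrete illustration, the sequencing constraints just described can be satisfied by a simple greedy procedure with restarts (a sketch under our own assumptions; the paper does not say how its sequences were actually generated):

    import random

    LETTERS = ["L", "C", "J", "G", "H", "F", "P", "U"]

    def make_block(rng):
        """One 80-trial block: each of the 16 stimulus types (8 letters x
        Yes/No) appears five times, no letter twice in succession, and the
        same response at most three times in a row."""
        while True:
            remaining = {(l, r): 5 for l in LETTERS for r in ("Yes", "No")}
            seq = []
            while len(seq) < 80:
                last_letter = seq[-1][0] if seq else None
                banned = (seq[-1][1] if len(seq) >= 3
                          and seq[-1][1] == seq[-2][1] == seq[-3][1] else None)
                options = [t for t, n in remaining.items()
                           if n > 0 and t[0] != last_letter and t[1] != banned]
                if not options:      # dead end: restart the block
                    break
                trial = rng.choice(options)
                remaining[trial] -= 1
                seq.append(trial)
            if len(seq) == 80:
                return seq

    block = make_block(random.Random(0))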
The probe marks in the near condition were separated by a mean of 5.7 cm and 5.5 cm on the screen for the J and G, respectively; the probe marks in the far condition were separated on the screen by a mean of 4.5 cm for both letters. Thus, although we expected far segments to be generated farther apart in the generation sequence, the marks were actually closer together on the screen than the near marks. The different stimuli are illustrated in Fig. 1.

FIG. 1. The simple and complex stimuli used in Experiment 1.

The trials for the Image Evaluation Condition were exactly like those just described, except that none of the cells of the matrix were filled. The lowercase letter was presented centered beneath the empty matrix, and the probe marks appeared 500 ms after the grid and lowercase letter were presented.

Finally, the trials for the Image Report Condition were designed to obtain two separate response times. These trials also began by presenting an empty grid with a lowercase letter centered beneath it. However, the subject now first pressed a key to indicate that he or she had finished forming the image; this response time was recorded. Five hundred milliseconds after this initial key press, two probe marks were presented, and the subject evaluated them and pressed one of two response keys, exactly as in the Image Evaluation Condition.

In all three conditions the decisions and the response times were recorded by the computer, with the clock starting when the probes were presented and stopping when either of two response keys was pressed. All types of trials were preceded by eight practice trials in which each letter appeared once; half of these trials had "Yes" probes and half had "No" probes.

Procedure

The subjects were tested individually, beginning with a series of study trials to familiarize them with the block letters as they appeared in the grid. The subjects were first shown an empty grid with the lowercase version of the letter beneath it. The appropriate cells of the grid were then filled (in a single raster scan of the screen) to form the corresponding uppercase letter. The subjects were asked to study each letter until they could close their eyes and form an image of it as it appeared in the grid, and were given as much time and as many exposures to the letters as was needed for them to report being able to form good images of the stimuli. This was a very easy task for all subjects.

All subjects participated in all three conditions. Half of the subjects received the Image Report Condition first, followed by the Image Evaluation Condition, and half received the Image Evaluation Condition first, followed by the Image Report Condition. The Perception Condition was always administered last because we worried that the subjects might begin to learn where the probe marks appeared, which would obviate the need for imagery in the other conditions.

In the Perception Condition the subjects were told simply to indicate whether or not both probe marks fell on the visible block letter. If so, they were to press the key labeled "Yes"; if not, they were to press the key labeled "No." The "z" and "/" keys of the Apple II keyboard were used for responses. Within each counterbalancing group, half the subjects responded "Yes" by pressing the key beneath their dominant hand and half responded "Yes" by pressing the key beneath their nondominant hand; the remaining key was used to respond "No." The response keys were kept constant for all of the conditions.

In the Image Evaluation Condition the subjects were told to decide whether both probe marks would have fallen on the uppercase version of the lowercase cue letter located beneath the grid. That is, they were asked, "If the corresponding block letter were displayed in the grid, would both marks fall in cells that were filled?" No instructions to use imagery were given here; given Podgorny and Shepard's results, we expected that imagery would be used spontaneously in this task. Again, responses were made by pressing the appropriate key.

In the Image Report Condition there were two responses: first, upon seeing the empty grid and lowercase cue, subjects were to "project" an image of the corresponding uppercase letter (as previously studied) into the grid. As soon as the image was complete, the subject was to press the key also used to indicate "Yes" decisions. Second, 500 ms after making the response, probe marks appeared and the subject decided whether both would have fallen on the letter had it actually been present in the grid. In all conditions subjects were urged to respond as quickly as possible while keeping errors to a minimum.

Results

In this and all subsequent experiments reported in this paper, only times from trials on which a correct decision was made were included in the analyses. In addition, "wild" scores were eliminated prior to analysis; a wild score was defined as one two times as large as the mean of the remaining scores in a given condition for a given subject. Finally, all effects and interactions not discussed were not significant, p > .05 in all cases.
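The trimming rule lends itself to a simple implementation (a minimal sketch of one natural reading; the paper does not say whether the rule was applied once or repeatedly):

    def trim_wild_scores(times):
        """Drop "wild" reaction times: a score counts as wild if it is at
        least two times the mean of the remaining scores in its condition
        for that subject. The largest score is tested and removed first."""
        kept = sorted(times)
        while len(kept) > 1:
            largest, rest = kept[-1], kept[:-1]
            if largest >= 2 * (sum(rest) / len(rest)):
                kept.pop()   # wild: discard, then re-test the next largest
            else:
                break
        return kept

    print(trim_wild_scores([850, 910, 930, 980, 4200]))  # -> [850, 910, 930, 980]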

Complexity Effects

The results from the simple self-report task in the Image Report Condition replicated previous findings, as shown in the left panel of Fig. 2, with more time being taken for the complex letters (a mean of 1490 ms) than the simple ones (1211 ms), F(1,11) = 18.43, p < .002. Similarly, in the Image Evaluation Condition the complex letters required more time to evaluate than the simple ones, F(1,11) = 42.05, p < .0001 (for simple and complex stimuli, the means were 872 vs 1234 ms for "Yes" responses and 904 vs 1184 ms for "No" responses, respectively). In a single analysis including the times from both the Image Evaluation and the Image Report Condition, we found that complexity had the same effects on both measures, F < 1 for the interaction of measure and complexity. An additional 279 ms were required for the more complex letters in the Image Report Condition, compared to 321 ms in the Image Evaluation Condition. The self-report times generally were longer, F(1,11) = 21.77, p < .001, and more time generally was required for the more complex letters, F(1,11) = 61.80, p < .0001. To examine the generality of our findings beyond the four stimuli examined in detail, we simply correlated the self-report times for all eight letters with those from the far probes in the Image Evaluation Condition, which should have required completing most of the image; this correlation was very high, r = .87, p < .005. Thus, both measures seem to be tapping the same kinds of processing.

In order to argue that the Image Report and Image Evaluation times reflect image generation per se, as opposed to image inspection, we looked for contrasting results in the image inspection time data. As is evident in the right panel of Fig. 2, we found that more time also was required to inspect previously formed images of the more complex letters in the Image Report Condition, F(1,11) = 6.37, p < .03, replicating Podgorny and Shepard (1978). (With means of 670 vs 763 ms for "Yes" responses to simple and complex letters, respectively, and 747 vs 810 ms for "No" responses to simple and complex letters, respectively.) "Yes" responses were faster than "No" responses, F(1,11) = 7.02, p < .03.

FIG. 2. Mean response times for simple and complex letters in Experiment 1, for two image generation tasks, image inspection, and perceptual inspection.

More importantly, we next compared the time to inspect the pattern after it had been imaged in the Image Report Condition to the time to perform the Image Evaluation Condition, which presumably contains a component due to having to generate the image. As is evident in Fig. 2, there were greatly diminished effects of complexity in the inspection task, F(1,11) = 23.84, p < .001, for the interaction of condition and complexity. In addition, there was a 301-ms overall advantage (337 and 266 ms for "Yes" and "No" responses, respectively) in inspecting a previously imaged letter relative to the Image Evaluation Condition, in which subjects presumably had to generate much of the image after the probes appeared, F(1,11) = 7.72, p < .02. Finally, there was an overall effect of complexity, F(1,11) = 36.66, p < .001, which was larger for "Yes" than "No" trials, F(1,11) = 8.42, p < .02.

The data from the Perception Condition were also collected to be used as a control. The analysis of these data again revealed that more time was required to examine the complex letters than to examine the simple ones, F(1,11) = 13.52, p < .005 (for simple and complex stimuli, a mean of 478 vs 498 ms for "Yes" responses and 505 vs 525 ms for "No" responses). "Yes" responses were faster than "No" ones, F(1,11) = 11.03, p < .007. A single analysis of the times from the Image Evaluation and Perception Conditions revealed a large disparity in the effects of complexity in the two conditions, F(1,11) = 34.13, p < .0002, with larger effects in the Image Evaluation Condition. In addition, more time was required overall for more complex letters, F(1,11) = 50.54, p < .0001, and subjects generally were faster in the Perception Condition, F(1,11) = 40.63, p < .0002.

We also compared the time to inspect the letters after they were imaged in the Image Report Condition with the time to inspect the physically present stimuli in the Perception Condition, which are the two conditions originally examined by Podgorny and Shepard (1978). Replicating their results, there were no interactions with condition, p > .1 in all cases. Subjects required more time for the complex letters, F(1,11) = 10.57, p < .01, in the image inspection task, F(1,11) = 14.46, p < .003, and when making "Yes" responses, F(1,11) = 12.88, p < .005.

Perceptual control group. We next considered the results from a separate Perception Condition group, which was tested as a control for possible practice or fatigue effects in the Perception Condition. This group of 12 Harvard University undergraduates (volunteers who were paid for their participation) was tested by a different experimenter who was ignorant of the purposes and predictions of the experiment. The method and procedure used here were identical to those used for the previous group. The results of testing this group were similar to those found for the original Perception Condition, except that there was no difference between the simple and complex stimuli, F < 1. This finding was consistent for both types of responses (for simple and complex stimuli, the means were 475 and 475 ms for "Yes" responses, and 521 and 503 ms for "No" responses). "Yes" responses were faster than "No" ones, F(1,11) = 7.30, p < .03, and this difference was smaller for C than L, and for G than J, F(2,22) = 4.21, p < .03.

Probe Position

The finding that more time is required to form images of more complex patterns is as expected if segments are generated individually. However, this result would also be expected if more complex perceptual units simply require more time to image all of a piece. Our next set of analyses addresses the question of whether images of some parts of the letters are generated before others. We now examined the effects of near versus far probes (relative to the beginning of the path along which the letter is typically drawn) for the two complex letters in the Image Evaluation Condition; these probes occurred only on "Yes" trials (see Fig. 1). First, response times for near probes were in fact faster than those for far probes (1121 vs 1351 ms, respectively), F(1,11) = 14.39, p < .005. Second, there was no effect of letter (J vs G), F < 1. And third, the effects of near vs far probes were the same for the two letters, even though the far probes were to the left on the J and to the right on the G, F = 1.06 for the appropriate interaction.

In order to consider the generality of our findings over all of the stimuli, we also correlated the response times with the number of segments from the first one typically drawn to the farthest probe for all eight letters. (Recall that a "segment" was defined as a set of contiguous filled cells that formed a straight line.) Consistent with the previous results, the correlation between the mean response times and the number of segments to the second probe mark was r = .80, p < .01.

A separate analysis of the effects of probe position in the image inspection task was also conducted, revealing that the near probes were not evaluated faster than the far ones (722 vs 804 ms), F(1,11) = 1.70, p > .2; this was true for both stimulus letters, F < 1 for the appropriate interaction. The correlation between the mean response times and the number of segments to the second probe mark was r = .48, p < .1. However, a separate analysis comparing the effects of probe location in the Image Evaluation Condition and in image inspection revealed only a marginal trend toward an interaction of location and condition, F(1,22) = 2.90, p = .1. The means of the near and far probe locations in the various conditions are presented in Fig. 3.

FIG. 3. Mean response times for near and far probe locations in Experiment 1.

We next examined the effects of probe position in the Perception Condition. The results of this analysis are easy to summarize: there was absolutely no hint of an effect of probe position, nor were there interactions with this factor, p > .20 in all cases (the means were 497 vs 499 ms for near vs far, respectively). In another analysis we compared the times from the Perception Condition with those from the Image Evaluation Condition, and found that the two types of probes do indeed affect the two measures differently, F(1,11) = 11.57, p < .01, for the interaction between probe type and condition. (There was, however, an overall increase in times for far probes, F(1,11) = 15.96, p < .005, and times were faster in the Perception Condition, F(1,11) = 38.68, p < .001.) The correlation between the response times and the number of segments from the beginning to the second x mark was r = .24, p > .25; this correlation contrasts with the .80 correlation found in the Image Evaluation Condition, which is consistent with the claim that effects of segments reflect image generation per se.

Perceptual control group. An analysis of the effects of near versus far probes in the separate control group again revealed absolutely no effect of probe location, with means of 478 and 471 ms for the two types of trials, respectively, F < 1. The correlation between the mean response times and the number of segments to the second probe mark was r = .20, p > .25. Of particular interest are the results of analyses in which the data from this group were compared to the data from the initial Perception Condition. Not only was there no significant effect of group, F < 1, but there were no interactions with group, p > .19 in all cases. Finally, there were no significant interactions or main effects when we analyzed only the effects of probe position, p > .15 in all cases. Apparently, the results from the Perception Condition were not contaminated by the preceding imagery tasks. Indeed, the magnitude of the complexity effect in the two perception groups differed by only 11 ms, and the disparity in the difference between near and far probe types in the two conditions was only 5 ms.
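The correlational analyses reported above are ordinary Pearson correlations between condition means and segment counts. A self-contained sketch (the numbers below are invented purely for illustration and are not the paper's data):

    from statistics import mean

    def pearson_r(xs, ys):
        """Pearson correlation, as used for the segment-count analyses."""
        mx, my = mean(xs), mean(ys)
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        var_x = sum((x - mx) ** 2 for x in xs)
        var_y = sum((y - my) ** 2 for y in ys)
        return cov / (var_x * var_y) ** 0.5

    # Hypothetical values: number of segments up to the farther probe for
    # the eight letters, and mean evaluation times in ms.
    segments = [2, 2, 4, 4, 3, 3, 3, 3]
    mean_rt = [880, 900, 1300, 1280, 1050, 1020, 1010, 1075]
    print(round(pearson_r(segments, mean_rt), 2))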


The error rates for the various conditions were positively correlated with the response times: for Image Evaluation, r = .40, p < .1; image inspection (in the Image Report Condition), r = .56, p < .02; Perception, r = .15, p > .1; and the Perceptual control group, r = .41, p < .1. Thus, there was no evidence of a speed-accuracy tradeoff.

Discussion

Perhaps the most important results obtained here are the effects of probe position, which provide support for the claim that segments are imaged sequentially. The increased times for far probes did not occur in the analogous perceptual task nor in the image inspection task. The correlational analyses dovetailed nicely with these results, with a high correlation between the times in the Image Evaluation Condition and the number of segments necessary to image before being able to evaluate the probe mark on the "farthest" segment. It is of interest that these correlations were low in the perceptual task and in the image inspection task, neither of which required image generation.

There was good convergence in the results from the simple self-report task traditionally used to measure image formation time and the new image generation task, with the effect of letter complexity being very similar in the two measures. In principle, this effect of complexity could simply reflect added time to inspect images of more complex patterns in both tasks. However, when physically present letters or previously imaged letters were inspected, the effect of complexity was greatly diminished or absent; hence, the added time for more complex letters in the Image Evaluation task apparently is not due solely to difficulty in evaluating the locations of the probe marks. In addition, as just noted, the effects of probe position suggest that the Image Evaluation task reflects image generation time. Given the sum of these results, the most parsimonious account of the convergence in times from the Image Report and Image Evaluation tasks is that both tasks reflect image generation time.

One might attempt to argue that the present results do not indicate that images of segments are generated one at a time. Instead, perhaps the image of a letter is formed all of a piece, but then is scanned over to ensure that it is fully present. Such scanning would only occur when one has just formed an image from memory. If so, then the more scanning that occurs, the more time will be required. One variant of the scanning theory attributes the effects of complexity and probe position to systematic left-right scanning over the screen after the image was formed. However, the effects of probe position in the image generation task were obtained when the far segment was to the left or to the right of the near one, and hence these effects cannot be due to such a scanning strategy over the screen. In addition, the probe marks in the near pairs were actually physically farther apart on the display than were the x marks in the far pairs, but nevertheless near probes required less time to evaluate than far ones.

EXPERIMENT 2

The results of Experiment 1 suggest that images of letters are generated segment by segment. However, these findings are also consistent with the hypothesis that subjects generate the image all of a piece and then scan along it, tracing its contour in order to ensure that the pattern is fully present; such postgeneration scanning times presumably would increase with complexity and distance along the letter. Alternatively, it is possible that the effects reflect not sequential generation of individual segments, but a continuous unfurling of the image (akin to the kind of processing discussed by Freyd, 1983).

The stimuli used in the present experiment included probes that were different distances along a single segment, as well as probes on different segments. Thus, we could examine separately the role of segment and distance in determining response times. If postgeneration scanning or continuous generation occurs, then we expect increased times when probes are greater distances along the letter (see Finke & Pinker, 1982, 1983; Jolicoeur & Kosslyn, 1985; Kosslyn, 1980; Kosslyn, Ball, & Reiser, 1978); if image generation is segment by segment with no scanning, we expect effects of segment but not of distance per se.

Subjects were tested in three conditions in order to investigate these hypotheses. In the Fixation Condition, subjects were instructed to focus their attention on a dot in the center of the screen immediately after reading the cue in the image evaluation task of Experiment 1. If this manipulation discourages scanning and image generation is not continuous, then there should be no effects of distance on the response times. And if the previously observed effects of complexity and segment were due to postgeneration scanning, then such effects should be eliminated along with scanning effects. In contrast, if image generation is gradual and continuous, more time should be required to evaluate probes greater distances along the letter even when scanning is discouraged. But if effects of complexity and segment persist in the absence of effects of distance, we have evidence that neither continuous generation nor scanning underlies these effects.

In the Scanning Condition, subjects explicitly were instructed to scan over the imaged letters in the course of searching for the two probe marks. This condition was conducted as a control for the sensitivity of the paradigm to distance effects: when subjects are instructed to scan the imaged letters, response times should increase with increasing distance traversed; if they do not, the paradigm is insensitive to scanning effects. If we do obtain effects of distance when subjects are instructed to scan, but not otherwise, this is evidence that subjects do not spontaneously scan before responding.

In the Neutral Condition, subjects were instructed neither to fixate nor to scan, but simply to evaluate the probes, exactly as in the Image Evaluation Condition of Experiment 1.

Finally, this experiment takes an additional precaution not used in the within-subjects conditions of the first experiment: the experimenters were totally ignorant of the stimulus design, hypotheses, and predictions of this experiment and of the results of the first experiment. This precaution minimizes possible effects of experimenter expectancy (see Intons-Peterson, 1983), particularly if the experimenters remain ignorant and do not develop similar hypotheses over the course of the experiment. To maintain this state of ignorance, the experimenters were not privy to the results until all subjects were tested.

Method

Subjects

Thirty-six Harvard University undergraduates volunteered to participate as paid subjects, being equally divided among the three conditions. No subject participated in Experiment 1.

Materials

The stimuli used in the Image Evaluation Condition of Experiment 1 were also used here. However, the stimulus set was modified by including two versions of the "near" trials for the letters J and G: "near-close" trials had the second x (not on the spine) closer to the spine, and "near-far" trials had the second x farther from the spine, on the same segment as the corresponding "near-close" trial. The two types of trials are illustrated in Fig. 4. Six trials of each type for "G" and for "J" were included. In addition, four of the J and G "far" trials used before were used here. The stimuli were randomized with the same constraints used in Experiment 1, and the same sequence was used in all three conditions.

For the Scanning Condition, we drew on paper a set of 4 x 5 grids and formed the letters within them by blacking out the appropriate squares. Arrows were drawn showing the direction in which the strokes are usually drawn (as determined by a consensus of people around the lab, excluding the experimenters who tested these subjects; the drawings from the earlier group of 12 subjects had been scored only in terms of relative near and far segments). These materials were used to teach subjects how to scan along the letter in search of the probe marks. Finally, for the Fixation Condition a 3-mm round piece of masking tape was affixed to the location on the screen corresponding to the center of the grid.

FIG. 4. The two types of near probe locations used in Experiment 2.


Procedure

This experiment used a purely between-subjects design, with each subject being tested in only one condition, which resulted in less load on individual subjects and eliminated any possible order effects between conditions. One experimenter administered the Scanning Condition, and another administered the Fixation and Neutral Conditions.

Subjects in the Fixation Condition were asked to read the lowercase cue and then immediately to focus their attention on the tape marker. As an aid to such attention focusing, they were told to fixate their eyes on the tape throughout the task. Posner, Nissen, and Ogden (1978, p. 149) demonstrated that subjects are very good at following this kind of fixation instruction; indeed, anticipatory eye movements of 1 deg. or more occurred on only about 4% of their trials. We took the Posner et al. results as evidence that extensive monitoring of eye fixation (which is an overt indication of attention fixation) was not necessary, particularly because we could later check whether times increased with distance; if not, we could be confident that attentional scanning did not occur. Nevertheless, during the practice trials the experimenter watched the subjects' eyes and ensured that they remained fixated on the center of the screen; during the actual test trials the experimenter was out of sight, to eliminate potential experimenter effects (even though the experimenter was ignorant of the purposes and predictions of this experiment). Aside from the addition of the instructions to focus attention on the tape, the instructions were identical to those used in the Image Evaluation Condition of Experiment 1.

The Scanning Condition was like the Image Evaluation Condition of Experiment 1 except that subjects were told to image the letter in the grid, and then to mentally scan along it, tracing out the pathway shown in the sample drawing of the letter. Subjects first were asked to study the sample materials and to draw each letter, indicating the direction of scan by the way the letter was drawn. The subjects were asked to memorize the scan paths, and were tested until they could draw each of the scan paths correctly from memory. Following this, they were told that we did not want them to respond in the probe evaluation task until they had scanned along the imaged letter and passed over both x marks (on "Yes" trials), or had scanned over the entire letter and not "seen" both marks (on "No" trials).

The instructions and procedure for the Neutral Condition were identical to those used in the Image Evaluation Condition of Experiment 1. We did not test an additional group in the Perception Condition because the data from the between-subjects Perception Condition of Experiment 1 did not reveal effects of complexity or probe type; these results suggest that effects of complexity and probe type are not due to encoding, evaluation, or response processes. Given the questions being asked here, we had no need to obtain data from another perceptual baseline group. As before, eight practice trials were given at the outset of each condition and subjects were urged to respond as quickly as possible, keeping errors to a minimum while fully following the instructions.
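Before turning to the results, the competing predictions can be stated as toy response-time models (our own formalization; the intercepts and slopes are arbitrary placeholders, not estimates from the data):

    # Toy models of the two accounts, with placeholder parameters.
    def rt_segmental(segment_index, base=900.0, per_segment=200.0):
        """Segment-by-segment generation: time grows with the ordinal index
        of the probed segment, not with distance along that segment."""
        return base + per_segment * segment_index

    def rt_scanning(distance_cells, base=900.0, per_cell=80.0):
        """Scanning or continuous generation: time grows with the distance
        traversed along the letter, in grid cells."""
        return base + per_cell * distance_cells

    # Near-close and near-far probes lie on the SAME segment at different
    # distances, so the two accounts diverge there:
    print(rt_segmental(1), rt_segmental(1))  # segmental account: equal times
    print(rt_scanning(3), rt_scanning(5))    # scanning account: near-far slower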

Results

We first examined the results from each condition separately, and then compared conditions in additional analyses. The data were analyzed as in Experiment 1, except that only responses from "Yes" trials were considered, because we were particularly interested in effects of different positions along the letters. All effects and interactions not mentioned were not significant, p > .1 in all cases. The results are illustrated in Fig. 5.

FIG. 5. The left panel illustrates the mean response times for simple and complex letters in Experiment 2. The right panel illustrates the mean response times from the three types of probe locations on the complex letters.

Of most interest are the effects of probe position along a segment in the Fixation Condition. An analysis examining the two sorts of "near" trials revealed that the difference between them did not even approach significance, F < 1. This result is exactly as expected if image generation is a segment at a time and the instructions did in fact discourage the subjects from scanning. Thus, it is of interest that in a second analysis the effects of complexity persisted (1105 vs 1506 ms, for simple and complex stimuli, respectively), F(1,11) = 14.26, p < .005. This result was also reflected in an analysis of the three probe positions, which revealed that the far probes required more time than the near ones (with means of 1293, 1350, and 1876 ms for near-close, near-far, and far probes, respectively), F(2,22) = 7.85, p < .003.

An analysis of the two near positions revealed an effect of distance in the Scanning Condition, F(1,11) = 28.10, p < .0005 (for J, 1332 and 1532 ms; for G, 1481 and 1673 ms for near-close and near-far trials, respectively). Thus, the paradigm is sensitive to scanning effects. In addition, less time was required for the letter J, F(1,11) = 6.98, p < .03, which may reflect the smaller difference in distance between the two probe positions for this letter (see Fig. 4). A separate analysis revealed an effect of complexity (1193 vs 1606 ms for simple and complex stimuli, respectively), F(1,11) = 51.15, p < .0001. A third analysis revealed that times varied with probe position (1407, 1603, and 1809 ms for near-close, near-far, and far probes, respectively), F(2,22) = 22.77, p < .0001.

Probes in the near-far position required more time to evaluate even in the Neutral Condition, F(1,11) = 17.43, p < .002 (for J, 1183 and 1358 ms; for G, 1144 and 1350 ms for near-close and near-far trials, respectively). This effect was consistent for both letters, F < 1 for the interaction of letter and distance. An additional analysis revealed longer times for more complex letters (1109 vs 1367 ms, for simple and complex, respectively), F(1,11) = 10.62, p < .01. Finally, times varied with probe position (1168, 1354, and 1575 ms for near-close, near-far, and far positions, respectively), F(2,22) = 21.98, p < .0001.

Three additional analyses were conducted to compare directly the results from the different conditions. In an analysis conducted to examine the effects of probe position on a single segment, near-close probes required less time than near-far ones, F(1,33) = 19.97, p < .0001. The form of the interaction being what it was, it is not surprising that the interaction between position and condition was not significant, F(2,33) = 1.88, p > .1, nor was any other effect or interaction, p > .09 in all cases. Similarly, although more complex letters required more time to evaluate, F(1,33) = 55.05, p < .0001, this effect was the same in all conditions, F = 1.09 for the interaction of complexity and condition. Finally, an analysis of the three probe positions revealed an effect of position, F(2,66) = 30.10, p < .0001, but this effect was comparable in the different conditions, F(4,66) = 1.47, p > .1. Because it is of particular theoretical importance, we directly compared the effects of complexity in the Fixation Condition, in which there was no evidence for scanning, and the Neutral Condition. These effects were comparable in the two conditions, F(1,22) = 1.18, p > .25.

The error rates in all three conditions increased with increasing response times: for the Fixation Condition, r = .46, p > .1; Scanning Condition, r = .51, p < .05; and Neutral Condition, r = .79, p < .001. There was no evidence of a speed-accuracy tradeoff.

Discussion

The most important results are from the Fixation Condition, in which we did not find a significant difference between the two "near" probe positions, indicating that the fixation instructions did in fact discourage scanning. Nevertheless, the amount of increased time for more complex letters was comparable to that observed when scanning was not discouraged, as was the amount of increased time for probes on "far" segments. Thus, postgeneration scanning does not underlie the effects of complexity or segment, at least for this condition. We can also conclude that scanning does not play a critical role in image generation, at least of the sort examined here. Similarly, these results are counter to what is expected if image generation is continuous, which also should have resulted in subjects requiring more time to evaluate more distant probes in this task.

However, increased times for probes farther along a segment were observed in the Neutral Condition, and these effects were comparable to those observed in the Scanning Condition. Thus it is possible that scanning or continuous generation (e.g., of individual filled cells) did indeed occur in the Neutral Condition. But given the results from the Fixation Condition, we can conclude that such processing is not necessary in image generation, and that the effects of complexity and segment are not simply artifacts of scanning.

In addition, postexperiment debriefing revealed that the experimenters remained ignorant of the purposes and predictions of the experiment; indeed, the experimenters were unaware that letter complexity and probe position had been systematically varied. Thus, the effects of complexity and probe position observed here do not reflect the experimenters' beliefs about the expected outcome. Given the similarity between these results and those of Experiment 1, we have reason to infer that our earlier findings also were not due to experimenter expectancy effects.

Finally, it may be worth noting that the present results allow us to reject another alternative interpretation of our findings. The results cannot be due to eye movements. That is, the increases in time for increasingly complex letters or farther probes cannot reflect the time required to make additional eye movements. This notion is implausible on the face of things, given the magnitudes of our effects relative to eye movement time, but it is worth having the convergent evidence from the Fixation Condition.

EXPERIMENT 3

The previous experiments used letters as stimuli, and we assumed that the most common order in which the segments are hand printed would be the most common order in which they are generated into an image. It is possible that our findings have something to do with letters per se, and it is possible that the order in which segments are drawn reflects some other, irrelevant principle (e.g., having to do with the structure of our hands). The claim that images are generated a segment at a time would be stronger if we used novel stimuli, especially if we experimentally manipulated the order in which the segments are learned.

Thus, in this experiment we created two novel, four-segment patterns and asked subjects to learn the segments in one of two orders. Following this, the subjects participated in the Image Evaluation task used previously, except that in this experiment only a single probe mark was used. We expected that the time to decide whether a probe would have fallen on the pattern would depend on the order in which the segments were learned, with time increasing as additional segments had to be generated into the image.

Method

Subjects

Sixteen members of the Harvard University community volunteered to participate as paid subjects. None had participated in any of the previous experiments.


Materials

A 4 x 5 grid, subtending 10 x 11 deg. of visual angle, was presented on the video display. The computer was programmed to deliver a single probe, as is described below. Two novel patterns were created by filling in selected cells of the grid; these patterns each had four segments, as is evident in Fig. 6. (Note: the numbers were not present on the actual stimuli; they are included here to illustrate the order in which the segments were presented, as is discussed below.) These patterns were presented to subjects on paper, one segment at a time. Thus, the subjects never saw the entire pattern, but were required to "mentally glue" the segments together to form a complete image of each pattern. The patterns were arbitrarily named "A" and "B".

Procedure. Each subject was randomly assigned to one of two groups, which differed only in the order in which the figure segments were presented. The subjects were asked to study a sequence of four grids, each of which illustrated a single segment. Each segment contained an arrow indicating how the segment should be drawn. These grids were presented on paper at the same size as those which later appeared on the screen. The subject was asked to study the location of the first segment, close his or her eyes and form a mental image of it, and then turn to the second segment. This segment was to be studied and imagined as it would look connected to the first segment, and so on through the four segments. The object was to "mentally glue" the segments together, forming an image of what they would look like if drawn together within a single grid. The segments were presented in the order 1-2-3-4 (as indicated in Fig. 6) for one group, and in the opposite order, 4-3-2-1, for the other group. (The numbers were not on the patterns the subjects learned.)

The subjects were allowed to study the segments for pattern "A," in the specified order, until they thought they could form a mental image of the entire pattern. They then were asked to form an image of pattern "A," and then to draw it six times, each segment in the order in which it had been presented. Drawings were made on blank 4 x 5 grids on paper; these grids were 5.7 x 6.4 cm. If an error was made, the stimuli were given back to the subject and the study procedure was repeated. Once the subject drew pattern A correctly six times, pattern B was learned using the same procedure. Following this, the subjects were asked to draw each of the patterns 12 times; the pattern cues ("A" and "B") were presented in a random order. If they made an error, the study phase was again repeated.

The procedure of the Image Evaluation Condition of Experiment 1 was also used here. Either "a" or "b" appeared beneath a blank grid on the screen, followed 300 ms later by a single x within one of the cells of the grid (the interstimulus interval was reduced from the 500 ms used previously because only two cues could occur and only one probe was used). Half of the trials contained x marks that fell in a cell that would have contained a segment ("Yes" trials), and half contained x marks that fell in an empty cell that was adjacent to a filled one ("No" trials). The subjects were asked to read the lowercase pattern cue, and to form an image of the specified pattern within the blank grid on the screen. They were told that the task was to attend to the part of the grid "covered" by the image, and to decide whether the x would fall in a box that would be filled by the pattern were it actually present in the grid. If so, they were to press the "Yes" key; if not, they were to press the "No" key. The subjects were asked to respond as accurately and quickly as possible.

Before beginning the experiment proper, 8 practice trials were presented; if the subjects made any errors, an additional 8 practice trials were presented. Following this, the subjects evaluated eight blocks of 16 trials, for a total of 128 trials. Within each block of trials, each pattern and each type of response (Yes/No) occurred an equal number of times, and every segment of each pattern was probed an equal number of times. The trials were presented in a random order, with the constraint that no pattern or response condition occurred more than three times in succession. Equal numbers of males and females were tested in each group, one-half of each gender in each group responding "Yes" with their dominant hand, and one-half responding "Yes" with their nondominant hand. In addition, half of the subjects in each of these counterbalancing groups received the trials in one order, whereas the other half received the trials in the reverse order. (The block-construction constraints are sketched in code below.)

FIG. 6. The stimuli used in Experiment 4. The numbers were not present on the stimuli given to the subjects; they are included here for identification purposes only. [The figure itself, showing the two four-segment patterns "A" and "B" in 4 x 5 grids, is not reproduced here.]

FIG. 7. Mean response times for probes on the segments illustrated in Fig. 6, learned in ascending order by Group 1 and in descending order by Group 2. [The figure itself, a plot of mean response time against segment number, is not reproduced here; the plotted means ranged from roughly 1500 to 2700 ms.]
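The block structure just described is concrete enough to write out. The sketch below is our reconstruction, not the original software; in particular, how "No" probes were balanced across segments is an assumption on our part.

import random

PATTERNS = ("A", "B")
SEGMENTS = (1, 2, 3, 4)

def no_long_runs(seq, max_run=3):
    """True if no item occurs more than max_run times in succession."""
    run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        if run > max_run:
            return False
    return True

def make_block(rng):
    """One block of 16 trials: for each pattern, one "Yes" probe on each of
    its four segments plus four "No" probes (tagged here with the segment
    the probed empty cell adjoins -- an assumption, since the paper does
    not say how "No" probes were distributed over segments). Reshuffles
    until neither the pattern nor the response occurs more than three
    times in a row."""
    trials = [(pattern, response, segment)
              for pattern in PATTERNS
              for response in ("Yes", "No")
              for segment in SEGMENTS]
    while True:
        rng.shuffle(trials)
        if (no_long_runs([t[0] for t in trials])
                and no_long_runs([t[1] for t in trials])):
            return trials

rng = random.Random(0)                         # fixed seed for reproducibility
session = [make_block(rng) for _ in range(8)]  # eight blocks = 128 trials
assert sum(len(block) for block in session) == 128

Rejection sampling is a reasonable choice here because a block is short and the run constraint is weak, so a valid shuffle is found after only a few attempts on average.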

Results

For each subject, eight mean response times were computed for "Yes" probes, one for each of the four segments of each of the two patterns. The most interesting results are apparent when only the "Yes" responses are analyzed, because only for these do we know which segment had to be imaged to evaluate the probe ("No" probes often were adjacent to more than one segment). These means were then submitted to an analysis of variance. As is illustrated in Fig. 7, probes on each segment had radically different effects in the two learning groups, F(3,42) = 12.86, p < .0001, for the interaction of segment and group. This interaction was the same for both stimulus patterns, F(3,42) = 1.39, p > .25, for the interaction of pattern, group, and segment. In general, more time was taken to evaluate pattern "A," F(1,14) = 5.89, p < .03. There was no difference in the overall means for each group, F < 1, nor were any other effects or interactions significant, p > .14 in all cases.

In a second analysis we evaluated the speed of generating the individual segments in the two orders. We examined effects of the order in which each segment was learned, re-coding each probed segment as its ordinal position in the learned sequence for both groups (so that, e.g., segment 4 counted as the first-learned segment for Group 2). Now there was a steady increase in time for segments farther along the sequence, F(3,42) = 12.38, p < .0001, and this increase was the same for both groups, F(3,42) = 1.26, p > .25, for the interaction of segment and group. Thus, the increases in time per segment were of comparable magnitude regardless of the order in which the segments of the figures were imaged. The error rates were positively correlated with the response times, with r = .81, p < .02, in the forward order group, and r = .54, p > .1, in the backward order group. There was no speed-accuracy tradeoff. (The data coding behind these two analyses is sketched below.)
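A minimal sketch of that data coding, assuming a simple per-trial record (the field names are ours, not the authors'): the first analysis averages correct "Yes" response times within each pattern-by-segment cell, and the second re-codes each probed segment by its ordinal position in the learned sequence.

from statistics import mean

def mean_yes_rts(trials):
    """First analysis: the eight per-subject cell means (2 patterns x 4
    segments) over correct "Yes" trials. Each trial is assumed to be a
    dict with keys "pattern", "segment", "response", "correct", "rt_ms"."""
    cells = {}
    for t in trials:
        if t["response"] == "Yes" and t["correct"]:
            cells.setdefault((t["pattern"], t["segment"]), []).append(t["rt_ms"])
    return {cell: mean(rts) for cell, rts in cells.items()}

def learning_position(segment, group):
    """Second analysis: re-code a probed segment as its ordinal position in
    the learned sequence. Group 1 learned 1-2-3-4, so position equals the
    segment number; Group 2 learned 4-3-2-1, so position reverses."""
    return segment if group == 1 else 5 - segment

# Sanity check: for Group 2, segment 4 was learned first.
assert [learning_position(s, 2) for s in (1, 2, 3, 4)] == [4, 3, 2, 1]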

Discussion

The results from this experiment dovetail nicely with those from the previous experiments. The time to decide whether a probe would have fallen on a pattern depends on the amount of the image that must be generated before the to-be-evaluated segment is present: segments farther along the sequence require more time to evaluate. Moreover, the order in which segments were learned proved to dictate the order in which they were later imaged. As is evident in Fig. 7, the learning procedure had dramatic effects on the ease of evaluating any given probe. The fact that the order of encoding affects the order of image generation does not imply that this is the only principle underlying such ordering. Indeed, Roth and Kosslyn (1988) found very different principles at work when images of three-dimensional objects were generated. Nevertheless, the order of encoding is one principle which, all other things being equal, will determine the serial unfolding of the generation process. (A schematic code sketch of this idea follows.)
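To make the principle concrete, segment-by-segment generation can be caricatured in code: each stored part carries a local relation to a reference part, and a part can be placed only once its reference is already in the image, so the order of encoding fixes the order of assembly. This is a schematic illustration only, not the authors' model; the relation vocabulary and the example pattern are invented.

# Grid-cell offsets for a toy vocabulary of local spatial relations.
OFFSETS = {"right-of": (1, 0), "left-of": (-1, 0),
           "above": (0, -1), "below": (0, 1)}

def generate_image(parts):
    """parts: list of (name, reference_name, relation) triples in encoding
    order; the first part's reference is None. Each part is placed only
    after its reference part is already "visible," so the encoding order
    fixes the order of assembly. Returns name -> (column, row)."""
    placed = {}
    for name, reference, relation in parts:
        if reference is None:
            placed[name] = (0, 0)          # anchor the first segment
        else:
            dc, dr = OFFSETS[relation]
            col, row = placed[reference]   # reference must already be imaged
            placed[name] = (col + dc, row + dr)
    return placed

# A made-up four-segment pattern. A group that learned the segments in the
# opposite order would store the relations anchored the other way around
# (s4 first, s3 relative to s4, and so on), reversing the assembly order.
pattern = [("s1", None, None), ("s2", "s1", "right-of"),
           ("s3", "s2", "below"), ("s4", "s3", "right-of")]
print(generate_image(pattern))
# {'s1': (0, 0), 's2': (1, 0), 's3': (1, 1), 's4': (2, 1)}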

GENERAL DISCUSSION

The results from these experiments allow us to draw three general conclusions. First, we showed that imaged letters and letterlike patterns are generated sequentially over time. Although it is possible that subjects sometimes may continuously generate an image of a pattern, the results suggest that such generation is at best one option, along with segment-by-segment generation. Caution in positing continuous generation is warranted by the results from the Fixation Condition of Experiment 2, which suggest that a process of segment-by-segment generation can account for the effects of complexity and segment observed here. In addition, the reasoning that motivated our predicting sequential generation, which rests on discrete encodings of parts made as one changes fixation over time, suggests that segment-by-segment, and not continuous, generation will occur when images are later formed.

Second, the order in which the segments of a pattern are encoded is one factor that determines the order in which they subsequently will be imaged. This inference is justified by the results from Experiment 4, in which we varied the order of encoding. In addition, if we assume that one monitors oneself while printing block letters, then the first-drawn segments will be visually encoded before later-drawn ones. Thus, it is not surprising that segments of block letters are imaged in the order in which they typically are drawn.

Third, the traditional subjective technique for measuring image generation does in fact reflect such processing. In Experiment 1 we found very similar effects of the complexity of a letter on the time to generate an image of it in a simple self-report task and in the Image Evaluation task, and we had independent evidence that the Image Evaluation task reflects sequential image generation. Furthermore, the results of Experiment 2 showed that the effects in the new task were not due to experimenter expectancy, which indirectly demonstrates that the results of the subjective self-report task were not produced by experimenter effects. Thus, the previously reported results from experiments that used the subjective method are unlikely to have led us too far astray.

Careful debriefing following testing indicated that the subjects did not infer the purposes or predictions of the experiments. Indeed, many denied that they formed images one segment at a time, and the majority of subjects reported not being aware of any systematic differences in their response times to the different types of probes. Even though the time to form each segment is relatively long in information-processing terms, apparently people are not usually aware of time differences between mental events of at most about one-third of a second.

The present results are interesting in part because they generalize to new stimuli, as witnessed by the results of Roth and Kosslyn (1988), who used the present methodology to demonstrate that closer parts of three-dimensional objects are imaged prior to farther parts. The present finding of corresponding sequential effects with letters and novel two-dimensional patterns serves to underline the generality of the claim that parts of imaged objects are generated sequentially, although the Roth and Kosslyn results suggest that different principles are at work in the two- and three-dimensional cases. However, one might challenge the generality of these results by pointing out that in both series of experiments images were formed within grids. Podgorny and Shepard (1978) found that the presence or absence of a grid did not affect their results, and we have no reason to believe it is critical here. But in any event, we have good evidence that at least in some situations parts of imaged objects are generated in sequence.

The claim that an image is generated by sequentially integrating parts of the imaged object also illuminates results in a very different domain. Kosslyn, Holtzman, Farah, and Gazzaniga (1985) found that the right hemispheres of split-brain patients have difficulty in forming images of letters and in imaging the correct relative spatial relation of two parts of an object, whereas their left hemispheres have no difficulty in these tasks. If uppercase letters are imaged a segment at a time, it is easy to argue that the left hemisphere will be superior: the local spatial relations among parts may be represented by descriptions (e.g., "connected to," "left of," and so on; see Kosslyn, 1987). These descriptions are languagelike, and hence may be processed better in the linguistically sophisticated left hemisphere when used to arrange segments into an image. In addition, if segments of uppercase letters are imaged sequentially, it is easy to argue that the right hemisphere will have difficulty imaging letters: these patients have difficulty performing numerous sequential-processing tasks in their right hemispheres, such as drawing simple inferences about the next item in a progression of successive cues. In contrast, both hemispheres were equally good at imagery tasks requiring the generation of a single "global" shape. The distinction between the two types of tasks suggested that the deficit was specific only to imagery tasks that involved integrating separate units into a single composite form. Thus, these results converge nicely with the claim that letters are imaged a segment at a time.

In conclusion, although the individual stages of processing may be carried out using massively parallel networks, the present results indicate that these computations are at least sometimes sequenced precisely over time. Both parallel and sequential processing must be explained by a comprehensive theory of information processing, which should account for how the two kinds of mechanisms are intimately linked in the brain.

REFERENCES

Beech, J. R., & Allport, D. A. (1978). Visualization of compound scenes. Perception, 7, 129-138.
Farah, M. J., & Kosslyn, S. M. (1981). Structure and strategy in image generation. Cognitive Science, 4, 371-383.
Finke, R. A., & Pinker, S. (1982). Spontaneous imagery scanning in mental extrapolation. Journal of Experimental Psychology: Human Learning and Memory, 8, 142-147.
Finke, R. A., & Pinker, S. (1983). Directional scanning of remembered visual patterns. Journal of Experimental Psychology: Learning, Memory, and Cognition, 9, 398-410.
Freyd, J. J. (1983). Representing the dynamics of a static form. Memory & Cognition, 11, 342-346.
Intons-Peterson, M. J. (1983). Imagery paradigms: How vulnerable are they to experimenters' expectations? Journal of Experimental Psychology: Human Perception and Performance, 9, 394-412.


Jolicoeur, P., & Kosslyn, S. M. (1985). Is time to scan visual images due to demand characteristics? Memory & Cognition, 13, 320-332.
Kosslyn, S. M. (1975). Information representation in visual images. Cognitive Psychology, 7, 341-370.
Kosslyn, S. M. (1978). Measuring the visual angle of the mind's eye. Cognitive Psychology, 10, 356-389.
Kosslyn, S. M. (1980). Image and mind. Cambridge, MA: Harvard Univ. Press.
Kosslyn, S. M. (1987). Seeing and imagining in the cerebral hemispheres: A computational approach. Psychological Review, 94, 148-175.
Kosslyn, S. M., Ball, T. M., & Reiser, B. J. (1978). Visual images preserve metric spatial information: Evidence from studies of image scanning. Journal of Experimental Psychology: Human Perception and Performance, 4, 47-60.
Kosslyn, S. M., Holtzman, J. D., Farah, M. J., & Gazzaniga, M. S. (1985). A computational analysis of mental image generation: Evidence from functional dissociations in split-brain patients. Journal of Experimental Psychology: General, 114, 311-341.
Kosslyn, S. M., Reiser, B. J., Farah, M. J., & Fliegel, S. J. (1983). Generating visual images: Units and relations. Journal of Experimental Psychology: General, 112, 278-303.
Kosslyn, S. M., Roth, J. D., & Mordkowitz, E. (1986). Theories of image formation: A computational approach. In D. M. Marks (Ed.), Theories of image formation. New York: Brandon House.
Marr, D. (1982). Vision. San Francisco, CA: Freeman.
Paivio, A. (1975). Imagery and synchronic thinking. Canadian Psychological Review, 16, 147-163.
Palmer, S. E. (1977). Hierarchical structure in perceptual representation. Cognitive Psychology, 9, 441-474.
Podgorny, P., & Shepard, R. N. (1978). Functional representations common to visual perception and imagination. Journal of Experimental Psychology: Human Perception and Performance, 4, 21-35.
Posner, M. I., Nissen, M. J., & Ogden, W. C. (1978). Attended and unattended processing modes: The role of set for spatial location. In H. L. Pick & I. J. Saltzman (Eds.), Modes of perceiving and processing information. Hillsdale, NJ: Erlbaum.
Pylyshyn, Z. W. (1981). The imagery debate: Analogue media versus tacit knowledge. Psychological Review, 88, 16-45.
Reed, S. K. (1974). Structural descriptions and the limitations of visual images. Memory & Cognition, 2, 329-336.
Reed, S. K., & Johnsen, J. A. (1975). Detection of parts in patterns and images. Memory & Cognition, 3, 569-575.
Roth, J. D., & Kosslyn, S. M. (1988). Construction of the third dimension in mental imagery. Cognitive Psychology, 20, 344-361.
Weber, R. J., & Harnish, R. (1974). Visual imagery for words: The Hebb Test. Journal of Experimental Psychology, 102, 409-414.

(Accepted November 25, 1987)
