Perception, 2007, volume 36, pages 1715–1729
doi:10.1068/p5592
Pictures in mind: Initial encoding of object properties varies with the realism of the scene stimulus
Benjamin W Tatler
School of Psychology, University of Dundee, Dundee DD1 4HN, Scotland, UK; e-mail:
[email protected]
David Melcher
Center for Mind/Brain Science and Department of Cognitive Science, University of Trento, I 38068 Rovereto, Italy
Received 3 March 2006, in revised form 3 April 2007; published online 5 December 2007
Abstract. A fundamental question in perception is how we visually encode and retain information about a complex scene in order to allow effective operation within it. Interestingly, the stimuli used to investigate scene perception have varied greatly between studies, ranging from line drawings to coloured drawings, computer-generated scenes, photographs, and real scenes. Are findings from these different types of scene stimulus equally ecologically valid? Two experiments are reported that address this issue. In the first we compared photographic and non-photographic scenes and found that observers perform better in questions testing object memory when viewing photographs, suggesting an initial benefit for encoding information from photographs. In the second we found that whether or not non-photographic scenes obeyed realistic scene-organising properties influenced object-memory formation. Effects varied for the different question types, but again were most prominent early in viewing. We conclude that in the search for an understanding of everyday scene perception we must be very careful in our choice of scene stimuli and in our interpretation of findings from the laboratory.
1 Introduction
Humans are expert perceivers of natural scenes. Most of our daily interactions involve perceiving, thinking, and acting in complex three-dimensional environments. Our environments tend to follow a predictable set of constraints based both on physics (such as the law of gravity) and the norms of our particular culture (such as the pattern of roads, buildings, and landscapes). Thus, one might expect our perceptual and memory systems to be particularly adept at processing and using information from natural scenes. Our scientific knowledge about visual perception and memory, however, is largely based on simplified, two-dimensional stimuli. This raises the question of whether scenes are 'special' for vision and, if so, whether the 'scene-ness' of a particular stimulus influences how it is perceived and remembered.
The ubiquity of visual scenes in everyday life means that it would be grossly inefficient to remember in great detail each aspect of the three-dimensional visual world that we encounter during the day. Consensus now favours the view that we do not typically encode exact, metric details about a scene. Instead, scene perception can be characterised as a two-step process. First, we quickly grasp the 'gist' of the scene (Biederman 1981; Intraub 1980, 1981). Learning about the specific details of the environment, including its constituent objects, however, requires a series of saccadic eye movements. Depending on the pattern of fixations, we extract and retain in memory some properties of objects in the scene across separate glances (eg Hollingworth 2004; Hollingworth and Henderson 2002; Hollingworth et al 2001; Melcher 2001, 2006; Melcher and Kowler 2001; Melcher and Morrone 2003; Tatler 2002; Tatler et al 2003, 2005).
Given the complexity of natural scenes, and the difficulty in measuring and controlling behaviour in real environments, perception and memory are often measured with simple displays of text, numbers, or coloured shapes.
When scene-like stimuli have
been used, such studies have variously involved simple line drawings (eg De Graef 1998; De Graef et al 1990; Henderson et al 1999), coloured images of drawings and paintings (eg included in Melcher 2006), computer-rendered scenes (eg Henderson and Hollingworth 2003; Hollingworth 2006; Hollingworth and Henderson 2002; Melcher and Kowler 2001), photographs of real-world scenes (eg Henderson 2003; Melcher 2006; Tatler et al 2003; and many more), and real-world environments themselves (Tatler et al 2005). In most of these studies, the chosen stimulus type has been assumed to be a valid representation of the real scene, and as such it has been assumed that the characteristics of scene perception and object-memory formation observed are ecologically valid. That is, it is often assumed that the way that information is extracted and retained from a simplified scene, such as a drawing, is equivalent to the way in which information is extracted from a real scene. Furthermore, it is often assumed that findings from studies of a particular stimulus type can be compared directly to studies of a different stimulus type. These assumptions lead to two fundamental questions, which form the basis of the present study: (i) Does our ability to encode and retain information vary according to the manner in which the scene is depicted? (ii) Given that non-veridical depictions can violate fundamental characteristics of real scenes (such as obeying gravity and perspective), does the level of realism influence our ability to encode and retain object memories? If we consider first the nature of the scenes, it is clear that the different depictions of a complex scene, including drawn, painted, and photographed versions as described above, have very different visual characteristics. As such, the information available for encoding from such scenes varies considerably between these different depictions. 
For example, simple black-and-white line drawings of scenes contain positional information and object-identity information (but only from the shape of drawn objects); these scenes necessarily lack colour information, shading, most texture information, and depth. Paintings generally offer a wider range of sources of information, but still tend to be gross simplifications of the level of detail in real-world scenes. Computer-rendered scenes, though rich in detail, are far less complex than a photograph of a real scene. While photographs offer the richest information and are closest to the real scenes that they depict, they are still a simplification of the real scene, with, for example, a smaller field of view, less variation in brightness, and no real binocular depth. Previous studies of picture perception have demonstrated that recognition memory is sensitive to manipulations of low-level image properties such as luminance (Loftus 1985) and contrast (Loftus and McLean 1999). Given the large differences in the range and richness of information available to the viewer when viewing different stimulus types, it might seem logical to suggest that these differences may have real and measurable consequences for the way in which information is extracted and retained from the depicted scenes. Furthermore, it is possible that different object properties may be differentially sensitive to the nature of the scene stimulus, with different object properties being assimilated in different ways according to the medium in which the scene is depicted. To our knowledge, there has been only one study to date in which object-memory formation when viewing different stimulus types has been directly compared (Tatler et al 2005). In this study, the characteristics of object-memory formation and persistence were compared for real-world scenes and photographs of real-world scenes.
This study showed that the overall patterns of accumulation of object information were comparable between these two stimulus types, but that the richness of the encoded memory was greater for those who viewed the real scenes than those who viewed photographs of the scenes. This previous work also suggested that there may be some degree of differential sensitivity of object properties to the scene stimulus: in particular, differential patterns of information persistence for colour and position were observed
for the real and photographic scenes. While this previous study provides a useful exploration of the validity of using photographic stimuli as representations of real-world scenes, it is not the only important comparison for evaluating the current literature. Two of the most common stimulus types used in the literature in recent years have been non-photographic scenes (such as drawings) and photographs of real-world scenes. Perhaps, therefore, a crucial comparison is whether insights gained from drawings are equally valid as those gained from photographs and whether photographs and non-photographic depictions of scenes offer equivalent portrayals of the processes of object-memory formation. When dealing with non-photographic depictions of scenes (eg paintings and drawings) a further issue arises: it is possible that the artistic depiction of the scene could violate basic principles of natural scenes such as support (objects are subject to gravity), linear perspective, or size constancy (a near object is depicted as larger than the same size object positioned further away) and as such would not be a realistic depiction of a real scene (Biederman 1972). Any violations of realism in a non-photographic scene may also disrupt the semantic relations between the objects and scene. Previous researchers have considered whether semantically incongruent objects pop out from the scene, but their findings have been controversial and inconsistent (eg Davenport and Potter 2004; De Graef et al 1990; Friedman 1979; Henderson and Hollingworth 1998, 1999; Hollingworth and Henderson 1998, 2000, 2003; Pezdek et al 1989). Recently, Underwood and colleagues (Underwood and Foulsham 2006; Underwood et al 2007) have argued that the congruency effect may depend upon the nature and richness of the scene depiction, with early fixation of semantically inconsistent objects when viewing photographic scenes, but not when viewing line drawings of scenes. 
It is entirely plausible, therefore, that our ability to extract and retain information from a scene might depend upon whether the stimulus is a realistic or an unrealistic depiction of a natural scene. Realism might have a global effect on memory formation and persistence, or may have differential effects on different features. For example, a depiction that contravenes the laws of gravity might lead to a larger effect on certain object properties (such as position) than others (such as object colour). If the realism with which a scene is depicted influences any aspect of object-memory formation and persistence, there will be clear and important implications for interpreting studies in which non-photographic or artificially altered scenes are used.
Here we present data from two experiments: object memory for photographic and non-photographic scenes is compared in the first; and object memory for realistic and unrealistic non-photographic scenes is compared in the second. As such, both can be seen as comparing different levels of realism in the scene stimulus ('optical' realism in experiment 1 and structural realism in experiment 2). In both experiments we consider both memory formation and persistence beyond the end of viewing. While these two aspects of object memory cannot truly be dissociated in the present experiments, we can make inferences about the manner in which information is built up and also for how long it persists after viewing has ended. Memory formation can be inferred by comparing recall performance after exposures of differing duration to the scenes. Memory persistence can be inferred either by introducing a delay between viewing the scene and the object-memory questions (experiment 1) or by presenting two separate exposures of a scene, separated by a number of intervening trials and comparing this performance to a single exposure of the same total duration (experiment 2).
The effects observed in both experiments may depend upon the duration of the stimulus. As mentioned above, scene gist is extracted quickly whereas object details require a serial process of looking around the scene. In the case of a less realistic scene (a non-photographic scene in experiment 1 or an unrealistic non-photographic scene in experiment 2), one might predict lower performance either at brief durations (poor extraction of gist and initial details; figure 1a) or at longer durations (poor retention of details in a scene-based representation; figure 1b). The potentially separable influences of realism on initial encoding and effortful encoding can be illustrated as differences in either the intercept or the slope of a learning curve (figure 1). In addition, interpretations of memory tests must also take into account ceiling effects (figure 1c). Depending on the difficulty of the memory test, performance might plateau below 100%, demonstrating the need to test memory for displays at more than a single duration in order to avoid mistaken interpretations of the data.
Figure 1. Sample predictions for memory performance as a function of display duration. [Three panels, (a) to (c), each plotting performance against presentation time.] (a) Predicted performance if there is a benefit of one stimulus type for initial encoding, but no difference in the overall rate of accumulation across saccades. (b) Predicted performance if the initial encoding of the scene is identical but one stimulus type supports better accumulation of memory across separate glances. (c) Illustration of an initial benefit for one stimulus type, followed by similar accumulation up to a performance ceiling less than 100% correct.
In addition, realism (in either experiment) might influence the persistence of object properties in memory: there may be an effect of the delay between presentation of the stimulus and the memory test. Long-term memory for scene stimuli, unlike for simpler stimuli, such as random digits or coloured shapes, is remarkably good (Hollingworth 2005; Melcher 2001, 2006; Melcher and Kowler 2001). An open question is whether the strong persistence of memory for scene-like stimuli is linked to the 'scene-ness' of the stimulus (Melcher 2001; Melcher and Morrone 2007). If so, then less photorealistic scenes might be expected to suffer more from delays between stimulus presentation and the memory test (experiment 1) or between separate presentations of the same scene (experiment 2).
2 Experiment 1
2.1 Introduction
The first experiment allows us to address the first of the two questions posed above: whether our ability to encode and retain information varies according to the manner in which the scene is depicted. Observers viewed both photographic and non-photographic scenes, and after each scene presentation four questions were asked, each testing a different object property.
The object properties tested were the colour, position, and identity of objects in the scene, along with a recognition test in which the correct object was presented alongside two very similar exemplars. Each of these was tested under three exposure conditions: 1 s exposure, 10 s exposure, and 10 s exposure followed by a 1 s exposure after a 60 s delay (see section 2.2.3 for further details). Information accumulation can be inferred by comparing performance after 1 s and after 10 s of viewing. Information persistence can be inferred by comparing performance when testing immediately follows viewing the scene for 10 s and when testing is delayed by 60 s after viewing the scene for 10 s.
2.2 Method
2.2.1 Participants. There were thirty-one observers, all naive to the purposes of the experiment. Informed consent was obtained from all participants.
Figure 2. Examples of non-photographic [(a) and (b)] and photographic [(c) and (d)] scenes.
2.2.2 Stimuli. 60 photographs and artistic renderings of natural scenes (drawings and paintings) were taken from non-copyrighted images available in the public domain and edited with Adobe Photoshop. Examples are shown in figure 2. The 30 photographs included both indoor and outdoor scenes (figures 2c and 2d). The 30 non-photographic scenes included pictorial depictions of scenes (see figure 2a), scenes that did not contain realistic scene organisation (see figure 2b), and computer-generated images of scenes.
The memory test included a set of questions and a recognition test. This test was divided over two different display frames, with the order of the frames (questions or recognition test) randomised across trials. The recognition test measured memory for visual detail. One object that had been presented in the picture, along with two conceptually identical items, was presented as a greyscale image (figures 3b and 3c). On a separate display frame, there were three questions displayed on the screen along with three multiple-choice answers to each question (see figures 3a and 3d). Two questions examined visual details about the colour or position of objects, and the third question dealt with the identity of a particular object. Questions about the images were written by a group of four people and each person's questions were evenly distributed throughout the experiment to control for any effects of difficulty. The full set of questions, without any pictures, was given to two naive observers to determine chance performance. Neither observer was able to guess the correct answer at a rate significantly better than chance.
The stimuli were displayed on a 21 inch Sony LCD monitor viewed from 60 cm and controlled with Superlab. Each image was approximately 20 deg of visual angle in height and width.
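As a worked illustration of the viewing geometry just described, the sketch below converts the stated visual angle (20 deg) and viewing distance (60 cm) into an approximate on-screen image size. The function names are hypothetical, and the calculation assumes a flat stimulus centred on the line of sight; it is not taken from the original method.

```python
import math


def size_from_visual_angle(angle_deg: float, distance_cm: float) -> float:
    """Extent (cm) of a centred stimulus that subtends angle_deg at distance_cm."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


def visual_angle_from_size(size_cm: float, distance_cm: float) -> float:
    """Inverse mapping: visual angle (deg) subtended by a centred stimulus."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))


# Values taken from the text: 20 deg images viewed from 60 cm.
side_cm = size_from_visual_angle(20.0, 60.0)
print(f"{side_cm:.1f} cm")  # prints "21.2 cm"
```

Under these assumptions each image would have been roughly 21 cm on a side.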
(a) How many flags are flying? (1) One (2) Two (3) Three
Where is the man who is sitting in a wicker chair: (1) Centre (2) Bottom right (3) Bottom left
What is the colour of the umbrella held by the woman who is standing close to the sea? (1) Yellow (2) White (3) Red
(b) [Recognition test: three greyscale object images, labelled (1) to (3)]
(c) [Recognition test: three greyscale object images, labelled (1) to (3)]
(d) What colour is the line painted on the ground around the aeroplane? (1) Red (2) Green (3) Light blue
Where does the aeroplane's shadow fall? (1) Behind the aeroplane (2) In front of the aeroplane (3) There is no shadow
How many white vehicles are parked in the bottom right of the picture? (1) One (2) Three (3) Six
Figure 3. Sample questions for the scenes shown in figure 2. For (a) and (d), examples are shown for identity, colour, and position questions. For (b) and (c), the object-recognition questions are shown.
2.2.3 Procedure. Each trial commenced with a button press, followed by a stimulus display and then the two test frames. There were three experimental conditions. In the first condition, the stimulus was shown for 1 s, followed by the memory test. In the second condition, the stimulus duration was 10 s. In the third condition, an initial 10 s view was followed by a 60 s delay, and then followed by an additional 1 s view before the memory test. The purpose of the delay manipulation was to examine whether information was lost across the delay (Melcher 2001, 2006). The delay period contained a display of text from a history textbook. Participants read the paragraph silently at their own pace throughout the delay period, repeating from the beginning if time permitted. The reading task was used, rather than leaving the screen blank, in order to occupy working memory and prevent rehearsal. The experimenter observed each participant to ensure that he/she was reading the paragraph during the delay. The experiment was self-paced, allowing the participants to type in the answer to each question (by giving the corresponding number to their choice: 1, 2, or 3) and their response in the recognition test before moving to the next trial. The 30 photographic and 30 non-photographic scenes were evenly distributed between the three presentation conditions. The scenes used in each condition were counterbalanced across subjects, as was the order in which the conditions were tested.
2.3 Results and discussion
As shown in figures 4 and 5, performance was consistently better for photographic stimuli and for the longer display durations. A 3-way repeated-measures ANOVA was run to consider the influence of the scene stimulus type (photographic and non-photographic), question type (identity, colour, position, and recognition), and presentation time (1 s, 10 s with no delay before testing, 10 s with a 60 s delay before testing) upon performance
in the object-memory questions. Bonferroni-corrected t-tests were used to break down significant interactions. There was a significant main effect of scene stimulus type (F(1, 30) = 74.446, p < 0.001, partial η² = 0.713), with better performance when viewing photographic stimuli than when viewing non-photographic stimuli. Previous work, comparing real scenes and photographic scenes, similarly suggested that object memories were richer when viewing the richer, real scenes than when viewing photographic scenes (Tatler et al 2005). Here we extend this finding to suggest that photographs promote greater encoding of object memories than do non-photographic scenes.
Performance varied significantly across the three presentation conditions (F(2, 60) = 72.721, p < 0.001, partial η² = 0.708). Performance was lower when viewing images for only 1 s than for either of the 10 s presentation conditions (both ps < 0.001). This result supports the well-established finding that object memories improve with prolonged exposure to the scene (eg Hollingworth 2004; Hollingworth and Henderson 2002; Hollingworth et al 2001; Melcher 2001, 2006; Melcher and Kowler 2001; Melcher and Morrone 2003; Tatler 2002; Tatler et al 2003, 2005). For scenes viewed for 10 s, there was no difference in performance between those where the questions were asked immediately, and those where there was a 60 s delay before testing (p > 0.999).(1) The lack of difference in performance when a 60 s delay was introduced suggests that the encoded object memories persist stably for at least this period of time. Persistence of object memories over several seconds is consistent with previous studies of scene perception (eg Hollingworth and Henderson 2002; Melcher 2001, 2006; Melcher and Kowler 2001; Tatler et al 2005).
There was a significant main effect of question type (F(3, 90) = 5.647, p = 0.001, partial η² = 0.158).
Performance was poorer for position questions than for either identity questions (p = 0.002) or colour questions (p = 0.001). There were no other differences between the question types (all ps > 0.141). Of course, direct comparisons between different question types are problematic, as we cannot assume perceptual equivalence of the different object properties, or that the position questions were as difficult as the colour questions, and as such this main effect should be interpreted with caution (see Tatler et al 2003 for a discussion of this issue). The only significant interaction was between stimulus type and presentation condition (F(2, 60) = 4.24, p = 0.019, partial η² = 0.124). As shown in figure 4, performance was better for photographic stimuli than non-photographic stimuli for all presentation conditions (all ps < 0.001), but the magnitude of this difference was greater for scenes presented for only 1 s than for scenes presented for 10 s. The trend is consistent with the interpretation that photographic scenes promote faster extraction early in viewing. Based on the similarities in slope, the overall rate of information extraction appeared similar for photographs and non-photographic stimuli (although there is some suggestion of a slightly shallower slope for photographic scenes).
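The intercept-versus-slope reading used here (and in the predictions of figure 1) can be made concrete with a toy learning curve. This is only a sketch: the log-linear form, the function names, and all parameter values are assumptions chosen for illustration and are not fitted to the data reported in this paper.

```python
import math


def predicted_accuracy(t_s, intercept, slope, ceiling=1.0):
    """Toy learning curve (hypothetical): proportion correct after t_s seconds.

    intercept: performance after 1 s of viewing (initial encoding).
    slope: gain per tenfold increase in viewing time (accumulation rate).
    ceiling: plateau imposed by the difficulty of the memory test.
    """
    return min(ceiling, intercept + slope * math.log10(t_s))


def advantage(t_s, curve_a, curve_b, ceiling=1.0):
    """Difference in predicted accuracy between two stimulus types at t_s."""
    return (predicted_accuracy(t_s, ceiling=ceiling, **curve_a)
            - predicted_accuracy(t_s, ceiling=ceiling, **curve_b))


# Illustrative parameter values only; they are not estimates from the experiments.
encoding_a = {"intercept": 0.5, "slope": 0.15}  # figure 1a: higher intercept
encoding_b = {"intercept": 0.4, "slope": 0.15}
rate_a = {"intercept": 0.4, "slope": 0.20}      # figure 1b: steeper slope
rate_b = {"intercept": 0.4, "slope": 0.15}

for t in (1, 10, 100):
    print(t,
          round(advantage(t, encoding_a, encoding_b), 2),                # constant gap
          round(advantage(t, rate_a, rate_b), 2),                        # growing gap
          round(advantage(t, encoding_a, encoding_b, ceiling=0.6), 2))   # gap shrinks at ceiling
```

In this toy model an initial-encoding benefit gives a constant advantage across durations, a rate benefit gives an advantage that grows with viewing time, and a ceiling makes an initial-encoding advantage shrink at longer durations, which is why a single test duration cannot separate the three cases.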
Figure 4. Mean (±1 SE) overall performance in the three presentation conditions in experiment 1 for photographic and non-photographic scenes. [Plot of proportion correct against presentation condition: 1 s, 10 s (no delay), and 10 s (60 s delay).]
(1) Note: p values are Bonferroni corrected and this can result in corrected values of > 0.999.
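To illustrate why a corrected value can be reported as greater than 0.999, the sketch below applies the standard Bonferroni adjustment, in which each p value is multiplied by the number of comparisons and capped at 1. The function name and the example p values are hypothetical; they are not values from this study.

```python
def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical uncorrected p values for three pairwise comparisons.
raw = [0.0002, 0.01, 0.4]
corrected = bonferroni(raw)
print(corrected)  # the last value is capped at 1.0, so it would be reported as p > 0.999
```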
Figure 5. Mean (±1 SE) performance in each of the three presentation conditions for photographic and non-photographic scenes for (a) identity, (b) colour, (c) position, and (d) recognition questions. [Each panel plots proportion correct against presentation condition: 1 s, 10 s (no delay), and 10 s (60 s delay).]
Figure 5 shows the performance after each presentation condition for photographic and non-photographic stimuli, for each of the four question types. The lack of any significant interaction involving question type suggests that the patterns of information extraction and persistence did not vary according to the type of information encoded. That different question types do not show different patterns of extraction according to the realism of the scene is consistent with previous work comparing real and photographic scenes (Tatler et al 2005). However, in this previous work Tatler et al found evidence for differential persistence of encoded object properties after attention was withdrawn from the object. The discrepancy between the current findings and this previous study may arise from differences in the manner in which persistence is tested (by introducing a delay in the present study with no intervening visual images, rather than by considering the number of fixations between the final fixation on the object and the end of viewing in the previous report by Tatler et al).
Overall, in the present experiment there was a benefit in performance for photographic scenes that was already noticeable within 1 s and continued up to 10 s of viewing without becoming larger. This suggests that the main influence of photographic realism was in the initial encoding of the display, with similar rates of learning for the two types of scenes and similar persistence across a 60 s delay (consistent with the learning curves depicted in figure 1a).
3 Experiment 2
3.1 Introduction
The non-photographic scenes used in experiment 1 included 'realistic' scenes and 'unrealistic' scenes in which aspects of natural scenes were violated. As such, we cannot be certain whether the advantage for photographs is due to their more optically accurate portrayal of surfaces or rather from the fact that the photographs always followed the
'rules' of natural scenes (in terms of both physics and cultural norms) while many of the non-photographic scenes did not. It may be that unrealistic scenes, in which fundamental aspects of natural scenes (such as support) are violated, place different processing requirements and constraints on the viewer and thus influence object-memory formation. The data presented in experiment 2 allow us to explore the influence of realism when viewing non-photographic stimuli.
3.2 Method
3.2.1 Participants. Twenty-three students participated in the experiment for course credit. All were naive regarding the purpose of the experiment and gave informed consent to participate in the experiment.
3.2.2 Stimuli. The stimulus set comprised a mix of three types of images: pictorial depictions of realistic scenes (38; for example see figure 2a), drawings or paintings that violated some aspect of natural scenes (52; for example see figure 2b), and photographs (10). The 'unrealistic' depictions violated one or more basic principles of real scenes, such as support (objects are subject to gravity), size constancy (a near object is depicted as larger than the same size object further away), or linear perspective (Biederman 1972). For the analyses presented in this article, photographic stimuli were excluded; hence, only non-photographic stimuli are analysed here.
3.2.3 Procedure. The procedure was similar to that in experiment 1, except that there was no forced-choice visual recognition test. Each scene was presented for a single exposure of 5, 10, or 20 s, or was presented for two exposures, separated by a number of intervening trials. If two exposures of the same scene were presented, total viewing time was either 10 s (two 5 s exposures) or 15 s (a 10 s exposure followed, later, by a 5 s exposure); when total presentation time was split across two exposures, questions were asked only after the second presentation of the scene.
The 5 s and 10 s trials were run together in two blocks of 20 trials each, with half of the stimuli in each block shown once (continuous trials) before the memory test and the other half viewed twice before the test. Re-tests of the same stimulus were given 4 to 6 trials after the first view of the picture. The 20 s trials were shown in a separate block of 20 trials in the same experiment, with the order of the 20 s block and the other blocks counterbalanced across subjects. A different analysis of an overlapping portion of this same data set, without consideration of the realism of the pictures, was presented as experiment 1 in Melcher (2006). Data for scenes presented only once and those presented twice were analysed separately.
3.3 Results and discussion
The main results are summarised in figure 6 (left) for scenes presented only once, and in figure 6 (right) for scenes presented twice. A 3-way repeated-measures ANOVA was run to consider the influence of the scene realism (realistic and unrealistic), question type (identity, colour, and position), and presentation time (5 s, 10 s, and 20 s) upon performance in the object-memory questions (Bonferroni-corrected t-tests were used to break down significant interactions). For the scenes presented with only a single exposure, the influence of realism differed depending on the question type and display duration. This difference is reflected in the 3-way interaction between scene realism, the type of question asked, and the presentation time (F(4, 88) = 2.95, p = 0.024, partial η² = 0.118). Bonferroni-corrected t-tests showed that for identity questions (figure 6a), when the scene was presented for 5 s, performance was better when viewing unrealistic scenes than when viewing realistic scenes (p = 0.036). There were no other differences between realistic and unrealistic scenes for identity questions (all ps > 0.098).
Thus, the data suggest an initial benefit for extracting object-identity information from unrealistic scenes, followed by a shallow slope in
learning from 5 to 20 s. The more realistic scenes showed poorer performance at 5 s, but then quickly reached plateau performance by 10 s. One potential explanation for this trend is that violations in realism can lead to greater attention to (Biederman et al 1982) and/or longer fixation times (eg De Graef et al 1990; Friedman 1979; Hollingworth and Henderson 1998) on the objects that violate the rules of scene depiction. This might, in theory, have led to an initial boost in processing these objects but no benefit in learning more about the scene. Thus, these findings are consistent with recent work that has highlighted that the nature of the scene stimulus (line drawing or photographic) may determine whether or not congruency effects emerge (see Underwood and Foulsham 2006; Underwood et al 2007).
For colour questions (figure 6b), there were no differences in performance for realistic and unrealistic scenes for any of the presentation times (all ps > 0.298). On the other hand, position questions (figure 6c) showed an overall benefit for realistic scenes, with these differences reaching significance for 5 s (p = 0.038) and 10 s (p = 0.042), but not for scenes presented for 20 s (p = 0.158). The data for realistic scenes are consistent with a plateau in performance at around 70%–75% within 5 s. Such an early promotion of position information is consistent with previous suggestions that layout information is extracted and integrated into scene memory faster than information regarding other object details (eg Aginsky and Tarr 2000; Rensink 2000; Tatler et al 2003). This may reflect the importance of spatial information in organising a coherent scene representation (Melcher 2001; Melcher and Morrone 2007; Tatler et al 2003). Overall, there were differences between the three question types (F(2, 44) = 4.98, p = 0.011, partial η² = 0.185).
As stated previously, however, it is difficult to interpret such differences because we cannot be certain that the different question types are perceptually equivalent in terms of encoding information (see Tatler et al 2003). While there was no main effect of scene realism (F1,22 = 0.08, p = 0.778, partial η² = 0.004), the 3-way interaction described above indicates that the different types of object information tested were differentially sensitive to the level of realism in the depicted scene. In order to explore information persistence we compared single and split exposures to scenes. The data from the split-exposure scenes can be used to consider whether extracted information is retained during the period between exposures: if information persists stably, two exposures of 5 s should be equivalent to a single exposure of 10 s (Melcher 2001). We therefore ran a separate 3-way ANOVA to explore this question: the repeated-measures variables were scene realism (realistic and unrealistic), question type (identity, colour, and position), and exposure type (single or split). Bonferroni-corrected t-tests were used to follow up any significant effects or interactions. Figure 6 (right) shows the performance when viewing realistic and unrealistic scenes for each question type for scenes presented for single exposures of 5 s, two exposures of 5 s, and single exposures of 10 s.(2) There was a main effect of realism (F1,22 = 8.69, p = 0.007, partial η² = 0.283), with better performance when viewing realistic scenes. There was no main effect of exposure type between single exposures of 10 s and two exposures of 5 s each separated by a number of intervening trials (F1,22 = 1.53, p = 0.229, partial η² = 0.065). This result supports previous findings that object memories can persist stably for some time (eg Hollingworth and Henderson 2002; Melcher 2001, 2006; Melcher and Kowler 2001; Tatler et al 2005).
(2) Note that the data for single exposures of 5 s are included in figure 6 (right) for comparison purposes only. As explained in the text, the ANOVA compared split exposures of 5 s + 5 s to single exposures of 10 s.
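The Bonferroni correction applied to the follow-up t-tests can be illustrated with a minimal sketch. It assumes the common form of the correction, in which each raw p value is multiplied by the number of comparisons and capped at 1; the text does not state which form was used. The function name `bonferroni` and the raw p values are hypothetical and are not data from this study.

```python
def bonferroni(p_values):
    """Return Bonferroni-adjusted p values: each raw p is multiplied by
    the number of comparisons and capped at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw p values for three pairwise comparisons
# (illustrative only; these are NOT values from the experiments).
raw_p = [0.0004, 0.0003, 0.62]
adjusted_p = bonferroni(raw_p)

for raw, adj in zip(raw_p, adjusted_p):
    print(f"raw p = {raw:.4f} -> adjusted p = {adj:.4f}")
```

Under this form of the correction, a large raw p value reaches the cap of 1, which is one way in which an adjusted value reported as p > 0.999 could arise.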
[Figure 6: six panels, (a) to (f), plotting proportion correct (axis range 0.3 to 1.0) against presentation time in seconds for realistic and unrealistic scenes. Left panels (a) to (c): 5, 10, and 20 s. Right panels (d) to (f): 5, 5 + 5, and 10 s.]
Figure 6. Mean (±1 SE) performance for realistic and unrealistic non-photographic scenes for each of the three single-exposure presentation times in experiment 2 (left) and for stimuli presented for a single exposure of 5 s, two exposures of 5 s each, and a single exposure of 10 s in experiment 2 (right). Data are shown for (a) and (d) identity, (b) and (e) colour, and (c) and (f) position questions.
There were no significant interactions, suggesting that the different question types were not differentially sensitive to either the realism of the scene stimulus or whether exposure was single or split. This result suggests that while information accumulation is sensitive to the realism of the depicted scene and varies between question types, the stability of the extracted information is not influenced by either of these factors. There was a main effect of question type (F2,44 = 12.18, p < 0.001, partial η² = 0.356). Performance was lower for colour questions (figure 6e) than for either identity (p = 0.001; figure 6d) or position (p = 0.001; figure 6f) questions; there was no difference between identity and position questions (p > 0.999). Again, the interpretation of the difference in questions is problematic since the difficulty in extracting and maintaining these different types of information was not equated a priori.
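As a consistency check on the effect sizes reported for experiment 2, partial η² can be recovered from an F ratio and its degrees of freedom using the standard identity partial η² = (F × df_effect) / (F × df_effect + df_error). The sketch below applies that identity to the F ratios reported in this section; the function name `partial_eta_squared` and the row labels are illustrative choices, not part of the original analysis.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (label, F, df_effect, df_error, partial eta squared as reported in the text)
reported = [
    ("question type, single exposures", 4.98, 2, 44, 0.185),
    ("scene realism, single exposures", 0.08, 1, 22, 0.004),
    ("scene realism, single vs split", 8.69, 1, 22, 0.283),
    ("exposure type, single vs split", 1.53, 1, 22, 0.065),
    ("question type, single vs split", 12.18, 2, 44, 0.356),
]

computed = [partial_eta_squared(f, df1, df2) for _, f, df1, df2, _ in reported]

for (label, _, _, _, eta_reported), eta in zip(reported, computed):
    print(f"{label}: computed {eta:.3f}, reported {eta_reported:.3f}")
```

Each computed value agrees with the reported one to three decimal places, which suggests that the statistics as printed are internally consistent.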
4 General discussion
In this study we examined the accumulation and persistence of visual memory for scenes varying in realism, from photographs of natural scenes to non-realistic coloured drawings of scenes. The overall finding was that differences did emerge for learning scene details as a function of realism, but these differences were greatest for brief exposures and varied depending on the type of question (identity, colour, position, or object recognition). Thus, at least in these experiments, the influence of realism appeared to be during the first few glances and not during the more laborious process of memorising the details of individual objects across numerous fixations. Realism did not appear to influence the persistence of the visual details in memory, at least over a period of seconds to minutes.
The data from experiment 1 showed that recall of object properties was better overall when viewing photographic scenes than when viewing non-photographic scenes. This result implies that the nature of the scene presented to observers must be considered when extrapolating results obtained in the laboratory to real-world scene perception. This overall effect of scene stimulus mirrors that found previously when comparing real-world scenes with photographic scenes (Tatler et al 2005): the extent of information extraction and recall was greater for the more realistic scenes. While the nature of the scene stimulus influenced the overall efficacy of object memory, it did not have differential effects on the different object properties that comprise object memory. There was no evidence for differential accumulation or persistence for the different question types. The interaction between stimulus type and presentation condition suggests an initial benefit for extracting information from photographic scenes.
However, it is not clear whether the rates of extraction from 1 to 10 s were the same or whether the rate was slightly slower for photographic scenes; given the present data we do not have sufficient evidence for the latter possibility. As such, it would appear that non-photographic scenes still promote largely the same mechanisms of object-memory accumulation and persistence as photographic scenes. Therefore, findings from non-photographic scene stimuli will still reflect the process of learning about individual objects across several fixations, even though the overall efficacy is lower.
Non-photographic scenes can either realistically adhere to principles of scene organisation such as support, or unrealistically break these principles. In experiment 2 we found that the observer's ability to extract and recall information about objects was influenced by whether the depicted scene adhered to natural scene properties or not. This influence differed for the three types of object information tested. There was an initial benefit for object-identity information with unrealistic scenes, but with prolonged viewing this advantage for unrealistic scenes disappeared. The extent and rate of accumulation of colour information were not influenced by the realism of the scene. Position information was extracted more faithfully from realistic scenes (presented for 5 or 10 s) than from unrealistic scenes, but the advantage for realistic scenes disappeared for more prolonged viewing. The finding that scene realism can influence the manner in which information is extracted and assimilated into object memories, and that the nature of this effect depends upon the type of information being tested, suggests that caution must be exercised when interpreting findings from studies that include non-photographic scenes that violate realistic scene organisation.
Our finding of differential effects of scene realism upon different aspects of object memory has parallels in the debate surrounding the influence of semantic consistency between objects and the scene upon perception. Early suggestions that semantically inconsistent objects attract attention sooner than semantically consistent objects (Loftus and Mackworth 1978) were later challenged (eg De Graef et al 1990; Henderson and Hollingworth 1998, 1999; Hollingworth and Henderson 1998); indeed it has even been argued that object-memory construction might operate in isolation from scene semantics
(Hollingworth and Henderson 1999). It is worth noting, however, that these studies used line-drawing stimuli, in which semantic inconsistency might not be particularly obvious in peripheral vision. The type of real-scene information eliminated by line-drawing stimuli is quite important for both scene gist recovery and object identification (Goffaux et al 2005; Oliva and Schyns 2000). In fact, recent studies using photographs have reported semantic consistency effects on both the object and its background in several tasks (Davenport and Potter 2004; see also Underwood and Foulsham 2006; Underwood et al 2007, for a discussion of this issue). Other effects of semantic consistency have also been reported. Attentional dwell time appears to be higher for semantically inconsistent objects than for semantically consistent objects (eg De Graef et al 1990; Friedman 1979; Hollingworth and Henderson 1998). An increased attentional dwell upon inconsistent objects might also be used to explain our result: if the unrealistic nature of the scene promotes longer dwell time on objects, and, as has been shown, information accumulates over fixation time (Loftus 1972; Hollingworth and Henderson 2002), then objects in unrealistic scenes may be encoded more extensively. While our realism manipulation will have influenced the semantics of the scene, we were not employing a de facto semantic manipulation. Rather, our manipulation is what could be termed structural: unrealistic scenes contain violations of basic properties of real scenes such as gravitational support. One of the few studies that looked at structural rather than semantic inconsistency found that structurally inconsistent objects were processed more slowly and with more errors (Biederman et al 1982).
Once again, however, this result is somewhat controversial, and a more recent replication of the study argued that Biederman et al's result was an artefact of not controlling response bias fully (Hollingworth and Henderson 1998), although, again, the latter study used line drawings exclusively.
5 Conclusion
The manner in which a scene is depicted when presented to observers had a clear influence upon object memory. Photographic scenes promoted object memories of greater efficacy than did non-photographic scenes. These effects were most pronounced during the initial encoding of the scene. However, subsequent encoding appeared to progress at similar rates for photographic and non-photographic scenes. There were no differences between photographic and non-photographic scenes in the persistence of information after the end of viewing. Scene realism, within the set of non-photographic pictures, had a rather different influence upon extracting information. Not only did realism have differing effects upon the initial encoding period, but there was evidence for differences in the accumulation of information across multiple saccadic eye movements. Thus, unlike the situation when comparing photographic and non-photographic scenes, realism did appear to influence the underlying processes of object-memory formation. Increasingly, researchers are striving to arrive at an ecologically valid account of the mechanisms of scene perception, in which we can understand the way in which the system works under real-world conditions. The present study suggests that, if we are to achieve this aim, careful consideration must be given to the nature of the stimuli that we use in scene perception experiments. Clearly, as we move further from the real world in terms of our scene stimuli, so we move further from understanding the real operation of the visual perceptual system.
References
Aginsky V, Tarr M J, 2000 "How are different properties of a scene encoded in visual memory?" Visual Cognition 7 147–162
Biederman I, 1972 "Perceiving real-world scenes" Science 177 77–80
Biederman I, 1981 "On the semantics of a glance at a scene", in Perceptual Organization Eds M Kubovy, J R Pomerantz (Hillsdale, NJ: Lawrence Erlbaum Associates) pp 213–253
Biederman I, Mezzanotte R J, Rabinowitz J C, 1982 "Scene perception: Detecting and judging objects undergoing relational violations" Cognitive Psychology 14 143–177
Davenport J L, Potter M C, 2004 "Scene consistency in object and background perception" Psychological Science 15 559–564
De Graef P, 1998 "Prefixational object perception in scenes: Objects popping out of schemas", in Eye Guidance in Reading and Scene Perception Ed. G Underwood (Oxford: Elsevier) pp 315–338
De Graef P, Christiaens D, d'Ydewalle G, 1990 "Perceptual effects of scene context on object identification" Psychological Research – Psychologische Forschung 52 317–329
Friedman A, 1979 "Framing pictures: the role of knowledge in automatized encoding and memory for gist" Journal of Experimental Psychology: General 108 316–355
Goffaux V, Jacques C, Mouraux A, Oliva A, Schyns P G, Rossion B, 2005 "Diagnostic colours contribute to the early stages of scene categorization: Behavioural and neurophysiological evidence" Visual Cognition 12 878–892
Henderson J M, 2003 "Human gaze control in real-world scene perception" Trends in Cognitive Sciences 7 498–504
Henderson J M, Hollingworth A, 1998 "Eye movements during scene viewing: An overview", in Eye Guidance in Reading and Scene Perception Ed. G Underwood (Oxford: Elsevier) pp 269–298
Henderson J M, Hollingworth A, 1999 "The role of fixation position in detecting scene changes across saccades" Psychological Science 10 438–443
Henderson J M, Hollingworth A, 2003 "Eye movements and visual memory: Detecting changes to saccade targets in scenes" Perception & Psychophysics 65 58–71
Henderson J M, Weeks P A, Hollingworth A, 1999 "The effects of semantic consistency on eye movements during complex scene viewing" Journal of Experimental Psychology: Human Perception and Performance 25 210–228
Hollingworth A, 2004 "Constructing visual representations of natural scenes: The roles of short- and long-term visual memory" Journal of Experimental Psychology: Human Perception and Performance 30 519–537
Hollingworth A, 2005 "The relationship between online visual representation of a scene and long-term scene memory" Journal of Experimental Psychology: Learning, Memory, and Cognition 31 396–411
Hollingworth A, 2006 "Scene and position specificity in visual memory for objects" Journal of Experimental Psychology: Learning, Memory, and Cognition 32 58–69
Hollingworth A, Henderson J M, 1998 "Does consistent scene context facilitate object perception?" Journal of Experimental Psychology: General 127 398–415
Hollingworth A, Henderson J M, 1999 "Object identification is isolated from scene semantic constraint: Evidence from object type and token discrimination" Acta Psychologica 102 319–343
Hollingworth A, Henderson J M, 2000 "Semantic informativeness mediates the detection of changes in natural scenes" Visual Cognition 7 213–235
Hollingworth A, Henderson J M, 2002 "Accurate visual memory for previously attended objects in natural scenes" Journal of Experimental Psychology: Human Perception and Performance 28 113–136
Hollingworth A, Henderson J M, 2003 "Testing a conceptual locus for the inconsistent object change detection advantage in real-world scenes" Memory & Cognition 31 930–940
Hollingworth A, Williams C C, Henderson J M, 2001 "To see and remember: Visually specific information is retained in memory from previously attended objects in natural scenes" Psychonomic Bulletin & Review 8 761–768
Intraub H, 1980 "Presentation rate and the representation of briefly glimpsed pictures in memory" Journal of Experimental Psychology: Human Learning and Memory 6 1–12
Intraub H, 1981 "Identification and processing of briefly glimpsed visual scenes", in Eye Movements: Cognition and Visual Perception Eds D F Fisher, R A Monty, J W Senders (Hillsdale, NJ: Lawrence Erlbaum Associates) pp 181–190
Loftus G R, 1972 "Eye fixations and recognition memory for pictures" Cognitive Psychology 3 525–551
Loftus G R, 1985 "Picture perception: Effects of luminance on available information and information-extraction rate" Journal of Experimental Psychology: General 114 342–346
Loftus G R, Mackworth N H, 1978 "Cognitive determinants of fixation location during picture viewing" Journal of Experimental Psychology: Human Perception and Performance 4 565–572
Loftus G R, McLean J, 1999 "A front end to a theory of picture recognition" Psychonomic Bulletin & Review 6 394–411
Melcher D, 2001 "Persistence of visual memory for scenes – A medium-term memory may help us to keep track of objects during visual tasks" Nature 412 401
Melcher D, 2006 "Accumulation and persistence of memory for natural scenes" Journal of Vision 6 8–17
Melcher D, Kowler E, 2001 "Visual scene memory and the guidance of saccadic eye movements" Vision Research 41 3597–3611
Melcher D, Morrone M C, 2003 "Spatiotopic temporal integration of visual motion across saccadic eye movements" Nature Neuroscience 6 877–881
Melcher D, Morrone M C, 2007 "Transsaccadic memory: Building a stable world from glance to glance", in Eye Movements: A Window on Mind and Brain Eds R P G Van Gompel, M H Fischer, W S Murray, R L Hill (Oxford: Elsevier), in press
Oliva A, Schyns P G, 2000 "Diagnostic colors mediate scene recognition" Cognitive Psychology 41 176–210
Pezdek K, Whetstone T, Reynolds K, Askari N, Dougherty T, 1989 "Memory for real-world scenes – the role of consistency with schema expectation" Journal of Experimental Psychology: Learning, Memory, and Cognition 15 587–595
Rensink R A, 2000 "The dynamic representation of scenes" Visual Cognition 7 17–42
Tatler B W, 2002 "What information survives saccades in the real world", in The Brain's Eye: Neurobiological and Clinical Aspects of Oculomotor Research Eds J Hyönä, D Munoz, W Heide, R Radach (Amsterdam: Elsevier) pp 149–163
Tatler B W, Gilchrist I D, Land M F, 2005 "Visual memory for objects in natural scenes: From fixations to object files" Quarterly Journal of Experimental Psychology, Section A – Human Experimental Psychology 58 931–960
Tatler B W, Gilchrist I D, Rusted J, 2003 "The time course of abstract visual representation" Perception 32 579–592
Underwood G, Foulsham T, 2006 "Visual saliency and semantic incongruency influence eye movements when inspecting pictures" Quarterly Journal of Experimental Psychology 59 1931–1949
Underwood G, Humphreys L, Cross E, 2007 "Congruency, saliency and gist in the inspection of objects in natural scenes", in Eye Movements: A Window on Mind and Brain Eds R P G Van Gompel, M H Fischer, W S Murray, R L Hill (Oxford: Elsevier), in press
© 2007 a Pion publication
ISSN 0301-0066 (print)
ISSN 1468-4233 (electronic)
www.perceptionweb.com