Cogn Comput DOI 10.1007/s12559-010-9062-3
Salience in Paintings: Bottom-Up Influences on Eye Fixations

Isabella Fuchs • Ulrich Ansorge • Christoph Redies • Helmut Leder

Received: 1 April 2010 / Accepted: 1 August 2010
© Springer Science+Business Media, LLC 2010

I. Fuchs (✉) • U. Ansorge • H. Leder
University of Vienna, Liebiggasse 5, 1010 Vienna, Austria
e-mail: [email protected]

U. Ansorge
University of Osnabrück, Osnabrück, Germany

C. Redies
University of Jena School of Medicine, Jena, Germany
Abstract In the current study, we investigated whether visual salience attracts attention in a bottom-up manner. We presented abstract and depictive paintings as well as photographs to naïve participants in free-viewing (Experiment 1) and target-search (Experiment 2) tasks. Image salience was computed in terms of local feature contrasts in color, luminance, and orientation. Based on the theories of stimulus-driven salience effects on attention and fixations, we expected salience effects in all conditions and a characteristic short-lived temporal profile of the salience-driven effect on fixations. Our results confirmed the predictions. Results are discussed in terms of their potential implications.
Keywords Visual attention • Salience • Eye movements • Art perception

Introduction

Visual processing is selective. Humans process some visual stimuli, while they ignore others (cf. [1]). This selectivity is called visual attention. But which factors drive attention? One universal principle could be bottom-up visual salience, that is, local feature contrast (cf. [2]). The current study tested this hypothesis.

Eye Movements and Attention

Visual attention serves different purposes. It helps to improve the quality of perception of attended compared to unattended stimuli (cf. [3, 4]). It also serves to direct actions, as has been emphasized in the selection-for-action view of attention (cf. [5, 6]). In particular, attention could be a mechanism for picking the next fixation location toward which to direct the eyes [7, 8]. Fixations can thus help us to understand which image features capture our attention [9, 10].

Visual Salience

Visual salience is a factor assumed to drive the eyes and attention (cf. [2, 11, 12]). In this context, visual salience denotes local feature contrast (cf. [13]). In its strongest form, the salience-capture hypothesis claims that salience captures attention in a bottom-up or stimulus-driven way (e.g., [2, 14]); strong local feature contrasts, such as a color difference or a luminance edge, capture attention regardless of the participants' search goals or intentions. Much of the evidence for this position is equivocal (e.g., [15, 16]; for a review, see [17]). For instance, Bacon and Egeth [15] found that a local feature contrast captures attention if participants actively search for local feature contrasts. However, if participants search for specific features, no bottom-up salience-driven capture is observed whatsoever [18, 19]. Salience-driven attentional capture effects seem fast and short-lived [20, 21]. If this were true, it would be in line with the assumption of an early stimulus-driven effect giving way to subsequent top-down control over attention (e.g., [21]). However, this interpretation is again uncertain because in previous studies, salience was task-relevant.
Participants could therefore have searched for salient objects in a goal-directed or top-down way. Moreover, top-down controlled attention can also be quick (cf. [22–24]). Thus, to confirm salience-driven capture of attention, it is necessary to demonstrate that a salience-driven capture effect is (1) short-lived and (2) does not require a fitting top-down goal of the participants to search for salient objects. To show only that salience-driven attention is quick is not sufficient to confirm the bottom-up effect of salience.

Visual Salience as a Predictor of Fixations

One very interesting line of research examined whether salience drives human gaze (cf. [11, 25–27]). This research used a salience model of visual images to predict fixation directions. The model utilizes local contrasts of color, intensity, and edge orientation and identifies image areas as salient in proportion to these contrasts. The resulting conspicuity maps for color, intensity, and orientation are then weighted and combined linearly into an overall salience map (for more details on how this model works, see [28]). The likelihood of a fixation on a particular image location is then proportional to the amount of local salience at this location. In line with the speed of bottom-up salience-driven attentional capture, the salience model of fixations is usually based on the first five fixations. The model even predicts an eye scan path (first five fixations) that can be compared to real gaze data. For this prediction, re-fixations are prevented by modeling an inhibition of return (IOR) mechanism of the gaze to previously fixated locations. The salience model indeed predicts, with a better than chance likelihood, human fixations in 2D images (photographs) of natural scenes [29]. This salience effect, however, is not necessarily due to bottom-up processes. One reason for this is that salient regions of images are also highly correlated with the locations of objects that are depicted in the images [30]. This means that the top-down exploited diagnostic value of feature contrasts for the identity of interesting objects, rather than salience per se, could be responsible for the salience model's fit to the fixation data (cf. [31]). It might seem as if this is not a valid argument, since salience also predicts fixations in more artificial images such as fractals [27, 32]. Whether fractals provide an appropriate baseline is, however, questionable; fractals also nicely conform to ecological principles, such as the distribution of different spatial frequencies according to a 1/f² ratio (cf. [33]).
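As a concrete illustration of the linear combination described above, the following is a minimal Python sketch (the original model and the toolbox used in this study are implemented in Matlab). The equal weights, the simple min–max normalization, and the smoothing step are placeholder assumptions; the full model uses more elaborate center–surround feature extraction and normalization.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def normalize(conspicuity_map):
    """Scale a conspicuity map to [0, 1]; the full model uses a more
    elaborate normalization that promotes maps with few strong peaks."""
    m = conspicuity_map - conspicuity_map.min()
    return m / m.max() if m.max() > 0 else m

def salience_map(color_contrast, intensity_contrast, orientation_contrast,
                 weights=(1 / 3, 1 / 3, 1 / 3), smooth_sigma=2.0):
    """Combine per-feature contrast maps linearly into an overall salience map."""
    maps = [color_contrast, intensity_contrast, orientation_contrast]
    combined = sum(w * normalize(m) for w, m in zip(weights, maps))
    return gaussian_filter(combined, sigma=smooth_sigma)

# Hypothetical usage with random contrast maps standing in for real feature maps:
rng = np.random.default_rng(0)
c, i, o = (rng.random((480, 640)) for _ in range(3))
s = salience_map(c, i, o)
peak_y, peak_x = np.unravel_index(np.argmax(s), s.shape)  # most salient location
```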
In addition, as in controlled psychological experiments, salience effects on fixations are easily overridden by top-down control. Salience effects on fixations are typically observed in free-viewing conditions, where the participants have no explicit task. However, if there is a task, models involving task-congruent top-down control of attention typically outperform predictions based on salience alone (cf. [34–38]). Furthermore, participants' interest and knowledge seem to affect gaze behavior more strongly than any low-level factors [39, 40].

Paintings

We used paintings to test whether salience drives fixations in a bottom-up way. Paintings have the potential to violate ecological visual principles. Even the natural character of depictive paintings (see Fig. 1), which aim at copying real objects or scenes, suffers from the constraints imposed by insufficient painting skills and by the limitations of painting materials. Moreover, as the very phrase 'abstract painting' denotes, there are paintings in which the painter actively seeks to disregard, if not to violate, our visual experiences based on natural scene perception. Therefore, we tested the bottom-up salience hypothesis using depictive and abstract paintings. If salience influences fixations in a bottom-up manner, we expected salience-driven effects on fixation probabilities in abstract and depictive paintings. Moreover, we expected to find evidence for the short-lived nature of these bottom-up effects (cf. [20]): if salience effects influence fixations in a stimulus-driven way, we expected to find decreasing salience effects over time. Alternatively, it is possible that no salience-based fixation effects would be found during the viewing of paintings; prior reports of salience effects in photographs and fractals may reflect preferences for ecologically valid visual input, which would be uncertain for abstract paintings. Therefore, we also included two control conditions with photographs to demonstrate that salience-driven fixation effects could be replicated if participants looked at photographs rather than paintings. In Experiment 1, we used a free-viewing task. Free viewing theoretically allows participants to search for contrasts if they think that this is worthwhile. Therefore, in Experiment 2, we conducted a visual search task that required participants to disregard salience. This was meant to test whether salience has the power to overcome top-down control and draw fixations away from task-relevant stimuli (cf. [19]).
Experiment 1

Methods

Participants

Twelve undergraduate students (nine female; mean age: 25.4 years) from the University of Vienna participated for course credit.
Fig. 1 a Example images of all image classes. b Fourier power spectra of the example images A–E. c The mean curves of radially averaged power for all 25 images of each image class. Curves and confidence intervals for abstract paintings are shown in direct comparison to depictive paintings, Cézanne's paintings, Machotka's photographs, and Shore's photographs, respectively. The depicted example images are (A) abstract paintings (Willem de Kooning), (B) depictive paintings (Franz Radziwill), (C) Paul Cézanne, and photographs by (D) Pavel Machotka and (E) Stephen Shore. Images are reproduced with kind permission of the museums (A: [58] © Peggy Guggenheim Collection; B: [54] © Staatliche Museen zu Berlin, Kupferstichkabinett, Photo by Jörg P. Anders; C: [55] © The State Hermitage Museum St. Petersburg, Photo by Vladimir Terebenin, Leonard Kheifets, Yuri Molodkovets) and the artists (D: [56] © Pavel Machotka, E: [57] © Stephen Shore)
Here and in Experiment 2, the procedure was explained prior to data acquisition, and informed consent was obtained from each participant. All participants had normal or corrected-to-normal vision, as well as normal color vision, as assessed by Ishihara plates. Participants were naïve with respect to the purpose of the study and not formally educated in art.
Stimuli

We used five stimulus classes from two categories, paintings and photographs (see Fig. 1a). Each class consisted of 25 color images, 125 images altogether.

Paintings: (1) Abstract, (2) Cézanne, and (3) Depictive Paintings

We used abstract paintings by different artists.
The paintings did not have any recognizable objects and varied in complexity and style. Monochrome and unstructured paintings, such as uni-colored paintings, were not used because the salience model would clearly not apply to them. The paintings by Cézanne depicted landscapes. Cézanne's paintings were chosen as a unique stimulus class because they corresponded closely with Machotka's photographs, which served as a control condition. Machotka took photographs of the scenes depicted in Cézanne's paintings. A third class of stimuli contained depictive paintings by various other artists. None of the paintings depicted people because faces and people seem to capture attention [41].
Photographs: (4) Machotka's and (5) Shore's Photographs

Pavel Machotka photographed the sites of Cézanne's landscape paintings from the same perspective and at the same time of day as depicted in the paintings. Therefore, each of Cézanne's images corresponded to a photograph in the Machotka set. Stephen Shore photographed real-world scenes. These photographs had already been used in prior studies [31], and fixations on them were shown to be influenced by salience. Shore's photographs thus served as a control condition against which to compare the fixations from the other stimulus classes.

Procedure

Eye movements were recorded using the EyeLink 1000 (SR Research, Mississauga, Ontario, Canada) at 1,000 Hz. Head position was stabilized using a chin rest with a forehead strip; signal quality was checked prior to each trial, and the tracker was recalibrated if necessary. Stimulus materials were presented on a 19-inch CRT monitor with a resolution of 800 × 640 pixels at a distance of 64 cm and thus subtended 25 × 20 degrees of visual angle. Stimulus presentation was controlled using Experiment Builder (SR Research). Before each trial, a fixation cross was presented at the center of the screen. The trial started only if participants fixated within one degree of visual angle of the fixation cross (mean: 0.34°, SD: 0.15°, over participants and trials). Participants were instructed to inspect the presented images carefully for five seconds during a free-viewing task. Stimuli were presented in randomized order. They were centered on the screen on a gray background (CIELab color coordinates: 86.7/0.6/−34.9). All participants saw the complete set of pictures from all classes: abstract paintings, depictive paintings, Shore's photographs, and either Cézanne's paintings or Machotka's photographs. The latter two classes were tested as a between-participants variable to avoid confounded repetition effects of picture content.

Data Analysis

Data analysis was conducted with Data Viewer (SR Research). Fixations were identified with standard settings and thresholds (eye movement displacement <0.15°, minimal duration = 100 ms), and further analyses were implemented in Matlab (MathWorks, Natick, MA).

Analysis and Results

Power Spectrum Analysis

Fourier power spectra were analyzed for all image classes [42]. In a first step, all images were converted to grayscale by using the luminance component of the YIQ transform [43]. Each input image was resized to 1,024 × 1,024 pixels. A fast Fourier transform was computed for each frequency. Figure 1b shows the power spectra for the example images. The mean curves of radially averaged power for all 25 images of each image class are depicted in Fig. 2 (for more details on the procedure, see [43]). As can be seen in Fig. 1c, compared to photographs, the abstract paintings contained less high spatial frequency information but otherwise nicely adhered to the power law of spatial frequency distributions.

Fixation Data

The first fixation in each image was excluded, since this fixation was on the fixation cross (98% within 2° of visual angle from the image center). Raw fixations are shown separately for the different image classes in Fig. 3. A central bias was detected in all classes. (Note that Cézanne's paintings and Machotka's photographs received fewer fixations, since each of the pictures in these classes was presented to only half of the participants.)
Fig. 2 Log–log plots of radially averaged Fourier power versus spatial frequency, averaged over all 25 images of each image class
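As a rough illustration of the radially averaged power analysis summarized in Fig. 2, the following Python/NumPy sketch computes such a spectrum for one square grayscale image. The original analyses were run with the authors' Matlab programs, so the function name, the omission of the grayscale conversion and resizing steps, and the integer binning are illustrative assumptions.

```python
import numpy as np

def radially_averaged_power(gray_image, size=1024):
    """Return the power spectrum averaged over annuli of equal spatial frequency.
    Assumes a square grayscale array; resizing/YIQ conversion is omitted for brevity."""
    f = np.fft.fftshift(np.fft.fft2(gray_image, s=(size, size)))
    power = np.abs(f) ** 2
    y, x = np.indices(power.shape)
    center = (size // 2, size // 2)
    r = np.hypot(y - center[0], x - center[1]).astype(int)  # radius in cycles/image
    radial_sum = np.bincount(r.ravel(), weights=power.ravel())
    counts = np.bincount(r.ravel())
    return radial_sum / np.maximum(counts, 1)  # mean power per frequency bin

# Example usage with a random image (white noise has a roughly flat spectrum):
rng = np.random.default_rng(1)
spectrum = radially_averaged_power(rng.random((1024, 1024)))
```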
Fig. 3 Fixation distributions on images. Data from all participants, separately for different image classes
Number of Fixations and Fixation Durations

Table 1 shows the mean number of fixations made on the images, aggregated over participants and separated by image class. A significant effect of image class was found on the average number of fixations made on each image, F(4, 1193) = 16.26, p < .001, η²p = 0.05. Post hoc t-tests showed a higher number of fixations on depictive paintings and Shore's photographs than on the other stimulus classes (all significant ps < .01, all Cohen's Ds > 0.33). Mean fixation durations also differed significantly between image classes, F(4, 1193) = 6.06, p < .001, η²p = 0.02. Mean fixation durations were shortest for Shore's photographs and longest for Machotka's photographs.
Table 1 Means and standard deviations (SD) for the number of fixations and the fixation durations (over participants), listed separately for image categories

Stimulus class   Number of fixations       Fixation duration (ms)
                 Mean      SD              Mean       SD
Abstract art     12.95     3.14            349.76     186.50
Cézanne          12.97     3.22            350.33     224.31
Depictive art    14.00     2.90            322.55     211.66
Machotka         12.43     3.45            403.55     415.10
Shore            14.41     3.05            298.82     117.80
Post hoc t-tests revealed a significantly shorter mean fixation duration on Shore's photographs than on all other image classes (for abstract paintings, Cézanne's paintings, and Machotka's photographs, all ps < .05, all Cohen's Ds > 0.28), except for the depictive paintings (p = .20). In addition, fixations on Machotka's photographs were on average longer compared to Shore's photographs and to depictive paintings (p < .01, Cohen's D = 0.25).

Mean Salience

We calculated the mean salience value for each stimulus (using the Matlab code from [44]). A univariate ANOVA of the mean salience indicated no significant differences between the five image classes, F(4, 120) = 1.94, p = .11.

Effects of Salience Versus Random Distributions

For our test of the predictive power of the salience model against chance level, we used random distributions of fixations across the images as baselines [45]. We used two kinds of random distributions. In baseline 1, fixations were distributed randomly across the image. In baseline 2, a biased distribution was computed that takes the probability distribution of human fixations into account. For the biased-random distribution, the image was divided into 20 equally sized 160 × 160 pixel sections (yielding a 5 × 4 section grid). Next, salience was computed for the five most salient regions of interest (ROIs) within each image. The salience maps were calculated with the saliency toolbox by Walther and Koch [46], a Matlab implementation of Itti and Koch's model [25], using standard parameters. The ROIs were set to 2° so that the top-five salient regions covered at most 12.6% of an image. In case of overlapping ROIs, fixations within the ROIs were only counted once.
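To make the hit/pseudo-hit logic concrete, here is a small Python sketch that picks the top-five salient ROIs and counts the proportion of fixations falling into them, compared with a uniformly random baseline (baseline 1). The ROI radius in pixels, the greedy peak picking, and the random data are stand-ins for the toolbox output described above, not the authors' actual script.

```python
import numpy as np

def top_salient_rois(salience, n_rois=5, radius_px=50):
    """Greedily pick the n most salient peaks, suppressing a disk around each."""
    s = salience.copy()
    rois = []
    yy, xx = np.indices(s.shape)
    for _ in range(n_rois):
        y, x = np.unravel_index(np.argmax(s), s.shape)
        rois.append((y, x))
        s[(yy - y) ** 2 + (xx - x) ** 2 <= radius_px ** 2] = -np.inf  # inhibit region
    return rois

def hit_proportion(fixations, rois, radius_px=50):
    """Proportion of fixations landing within any ROI disk (each fixation counted once)."""
    hits = 0
    for fy, fx in fixations:
        if any((fy - y) ** 2 + (fx - x) ** 2 <= radius_px ** 2 for y, x in rois):
            hits += 1
    return hits / len(fixations)

# Hypothetical comparison of observed fixations against a uniform random baseline:
rng = np.random.default_rng(2)
salience = rng.random((640, 800))
observed = [(rng.integers(640), rng.integers(800)) for _ in range(100)]
random_baseline = [(rng.integers(640), rng.integers(800)) for _ in range(100)]
rois = top_salient_rois(salience)
print(hit_proportion(observed, rois), hit_proportion(random_baseline, rois))
```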
For the test of the salience model, the proportion of hits (i.e., the participants' fixations within the ROIs) was then compared to the proportion of pseudo-hits (i.e., the proportion of the randomly distributed, baseline 1, and of the biased-randomly distributed, baseline 2, fixations landing in the same ROIs). In this test, the salience model is confirmed if the observed proportion of hits exceeds that of both sorts of pseudo-hits. The fixations were aggregated over participants separately for each class of images. An ANOVA of the proportion of hits and pseudo-hits with the variables distribution (observed vs. random vs. biased-random) and image class (5 steps) indicated a significant advantage of the salience model for predicting observers' fixations compared to randomly and biased-randomly distributed fixations (see Fig. 4), reflected in a significant main effect of the variable distribution, F(2, 360) = 105.7, p < .001, η²p = 0.37. Post hoc Bonferroni tests confirmed that the proportions of hits of the salience model were higher than both proportions of pseudo-hits (both ps < .01) and that the proportion of pseudo-hits was higher in the more conservative biased distribution than in the pure random distribution (p < .05). There was neither a significant main effect of image class nor a significant interaction, Fs < 1.00, suggesting that the salience map predicted fixations with better than chance accuracy for all of our image classes. (We also repeated this analysis with a 5 × 5 grid for the biased-random distribution. This distribution yields an even stronger central bias, but we found the same results.)
The Influence of Salience: Signal Detection Analysis

We additionally assessed the predictions of the salience model, comparing the probability of hits (predicted and fixated regions) and false alarms (predicted but non-fixated regions) [47]. These probabilities were plotted against one another, while the threshold of the salience map was varied between zero and one. This technique renders a receiver operating characteristic (ROC) curve. The area under this curve (AUC) can then be used as a measure of the quality of the salience map's predictive power (Matlab script adapted from [44], available at http://www.klab.caltech.edu/~harel/share/gbvs.php). AUCs were significantly larger than chance level (AUC = .5) for all image classes (abstract paintings, Cézanne's paintings, depictive paintings, Machotka's photographs, and Shore's photographs; all ts > 5.00, all ps < .001). We also found that AUCs differed significantly between image classes, F(4, 120) = 3.34, p < .05, η²p = 0.10. Post hoc tests revealed significantly larger AUCs for Shore's photographs than for all other image classes (abstract, Cézanne, and Machotka: all ps < .01, all Cohen's Ds > 0.74), except the depictive paintings (p = .20). To investigate the influence of the feature contrasts that contribute to the salience map, ROCs for all underlying conspicuity maps (color, intensity, and orientation) were calculated. Figure 5 shows AUCs for each of these maps (plus overall salience). Regression analyses of salience's AUC on the different conspicuity maps as predictors, conducted separately for each image class, showed that color and orientation contrasts were good predictors of overall salience (all R² > .92, all ps < .01) in all stimulus classes. This, as well as further effects, is shown in more detail in Table 2.
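A compact Python sketch of the ROC/AUC logic described above follows. It treats each pixel as predicted-salient or not under a sweeping threshold and scores salience values at fixated locations as positives; it is a simplified stand-in for the adapted Matlab script cited above, and the number of thresholds and the pixel-level grain are assumptions.

```python
import numpy as np

def salience_auc(salience, fixation_xy, n_thresholds=100):
    """Area under the ROC curve for a salience map predicting fixated pixels.
    Positives: salience values at fixated locations; negatives: all other pixels."""
    s = (salience - salience.min()) / (salience.max() - salience.min())
    fix_vals = np.array([s[y, x] for x, y in fixation_xy])
    mask = np.zeros(s.shape, dtype=bool)
    for x, y in fixation_xy:
        mask[y, x] = True
    other_vals = s[~mask]
    hit_rates, fa_rates = [], []
    for t in np.linspace(0, 1, n_thresholds):
        hit_rates.append(np.mean(fix_vals >= t))
        fa_rates.append(np.mean(other_vals >= t))
    # Integrate hit rate over false-alarm rate (trapezoid rule), reversed for ascending x
    return np.trapz(hit_rates[::-1], fa_rates[::-1])

# Hypothetical usage with random data:
rng = np.random.default_rng(3)
sal = rng.random((600, 800))
fixations = [(rng.integers(800), rng.integers(600)) for _ in range(50)]
print(salience_auc(sal, fixations))
```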
Fig. 4 Mean proportion (±standard error) of all hits and pseudo-hits (fixations on salient regions) over participants, separately for different image classes
Fig. 5 Area under curves (AUCs) for salience (filled circles) and the underlying conspicuity maps for color, intensity, and orientation, plotted separately for each image class
Table 2 Results of linear regressions, calculated for overall salience, separately for image classes

Predictor     Abstract              Cézanne               Depictive             Machotka              Shore
              B     SE    b         B     SE    b         B     SE    b         B     SE    b         B     SE    b
Color         .68   .11   .66***    .74   .18   .56***    .14   .11   .12       .39   .13   .36**     .35   .13   .30*
Intensity     -.09  .16   -.07      .10   .16   .08       .38   .14   .31*      .37   .16   .33*      .37   .18   .30
Orientation   .50   .11   .44***    .43   .08   .42***    .60   .09   .61***    .55   .12   .39***    .54   .13   .44***
Total R²            .93***                .93***                .93***                .93***                .93***

B, standard error of B (SE), and b of each linear regression are reported
* p < .05; ** p < .01; *** p < .001

Fig. 6 Mean area under curves (AUCs) over all participants and images, separately for each image class, plotted for paintings (left) and photographs (right), as a function of temporal ordinal fixation position
Time Course

We also assessed the impact of salience on fixations as a function of time since picture onset. To that end, we calculated AUCs separately for the first 8 fixations (see Fig. 6). In a first ANOVA, we compared the AUCs with one another, with the variables fixation (2 to 9) and image class (5 steps). This ANOVA confirmed significant decrements of AUCs over time, F = 14.29, p < .01, and a significant main effect of image class, F = 3.70, p < .01, but no interaction between these variables, F < 1.00. (A second ANOVA conducted solely on the first 4 fixations (2 to 5) confirmed this picture.)
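For concreteness, a small Python sketch of this time-course analysis is given below: it simply groups fixations by their ordinal position and computes an AUC per position. The data structure, the fixation-index range, and the reuse of the salience_auc sketch from above are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def auc_by_fixation_index(trials, auc_fn, indices=range(2, 10)):
    """trials: list of (salience_map, fixations-in-temporal-order) pairs.
    Returns the mean AUC separately for each ordinal fixation position."""
    results = {}
    for k in indices:
        aucs = [auc_fn(salience, [fixations[k - 1]])          # k-th fixation only
                for salience, fixations in trials if len(fixations) >= k]
        results[k] = float(np.mean(aucs)) if aucs else float("nan")
    return results

# Hypothetical usage, reusing the salience_auc sketch from above:
# time_course = auc_by_fixation_index(trials, salience_auc)
```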
Discussion

The goal of this experiment was to investigate the predictive value of Itti and Koch's salience model [11, 25] on gaze behavior for photographs and paintings. The experiment indicates some interesting findings:

1. Raw eye movement data showed differences in average fixation numbers and average fixation durations for the different stimulus categories.
2. Predictions for observers' gaze behavior were by far better than chance in all stimulus categories used in this experiment. This was also the case using a biased-random data set instead of a purely random distribution as a baseline, which takes observer biases such as the central bias tendency into account. The finding that salience outperforms the biased distribution for all images suggests that the salience map also works well for artworks.
3. ROC AUC scores also revealed effects of the salience map that exceeded chance behavior in all stimulus categories. Here, we found the best scores for the photographs by Shore, which indicates that the salience map works best for this category.
4. Regression analyses showed that in all stimulus categories, orientation is a highly valuable predictor of overall salience. Furthermore, color also contributes in abstract artworks and Cézanne's landscapes, but not so much in depictive artworks and photographs.
5. Investigating the time course of the salience map's impact on gaze behavior also revealed an advantage of Shore's images over the other stimulus categories. These photographs yield generally higher and more constant ROC AUC scores over time.
Inspecting the raw eye movement data, we found differences in the average fixation number and fixation durations for our stimulus categories. Henderson and Pierce [48] addressed the question of what determines fixation durations during scene viewing. They found longer fixation durations for 'directly controlled fixations', when increasing time was needed for scene analysis. A second category of fixations was shown to be independent of the current visual input. The authors attributed these fixations to a general stimulus-dependent and task-dependent parameter. Following this hypothesis, our findings indicate that artworks are more difficult to analyze and therefore resulted in longer fixation durations. Henderson and Pierce suggested that the direct
control of fixation duration can also be elicited by feature analysis or object recognition. This might explain why abstract artworks without objects attracted longer fixations than depictive artworks or Shore's photographs. Findings from visual feature search showing that fixation duration increases with increasing target–distractor similarity [49] also support our interpretation: fixation durations increase when processing effort increases. The signal detection analysis also supports the predictive value of the salience map. The probability that participants fixated salient regions was considerably higher than the probability that the same regions were not looked at, as indicated by ROC AUC scores in all stimulus categories. Although all results were better than chance, Shore's photographs, along with depictive artworks, resulted in the highest ROC AUC scores. This finding raises the question of whether these effects can be entirely explained by salience. Considering what factors these two classes have in common and what varies between the classes, the answer is: the nature of the depicted content. The real-world photographs as well as our depictive artworks are composed similarly. The higher scores in the signal detection analysis for real-world photographs and depictive art can therefore be interpreted as a processing facilitation, since we are very familiar with the content of these images. Abstract artworks, in contrast, require more top-down resources for processing their content. To investigate the time course of bottom-up features on gaze behavior, we calculated ROC AUC scores for the first fixations individually. We found that Shore's photographs not only resulted in the highest AUC values but also remained constantly high over time. For artworks, we found that the salience-dependent AUC scores decreased over the first few fixations. This can be attributed to the fact that top-down influences are presumably weakest just after stimulus onset [27]. If the top-down processes elicited by the artworks increase over time, it seems plausible that the salience effect decreases. According to this argument, the fact that top-down factors counteract salience-dependent fixation behavior in abstract paintings over time, whereas the same top-down factors seemingly led to the maintenance of salience-dependent fixation behavior in Shore's photographs, reflects the selective presence of objects coinciding with salience in Shore's photographs but not in abstract artworks. In other words, real-world photographs are not the most sensitive means to unveil the typical short-lived time course of a bottom-up salience effect.
Experiment 2

One difficulty with the interpretation of the behavior in the free-viewing task of Experiment 1 is that we do not know
whether participants had a top-down setting to search for contrasts (or salience). To rule out such a top-down search goal and to test whether salience captures attention regardless of a top-down control setting, we varied the images and used a different task. In Experiment 2, we used hard-to-detect targets in half of the images. These targets were such that they did not alter the salience maps of the images. Most importantly, in target-present images, targets were equally likely to be presented at positions of low and of high salience, and our participants knew this. Under these conditions, participants had no reason to preferentially search for salient image regions. This should be demonstrated by an overall better than chance likelihood of detecting targets even at low-salience regions, together with an increased likelihood of fixations on targets in target-present images with a target at a lowly salient position. Of interest, however, was the behavior in the non-target trials. In these trials, no target was shown. Hence, there was no reason to fixate either a highly salient or a lowly salient region in the non-target trials. However, if salience captures attention and the gaze in a stimulus-driven way, regardless of the currently active task set, we expected to observe again more fixations on highly salient than on lowly salient image regions in the non-target images. For our test, we used only images of abstract paintings because these images most strongly deviated from 2D images of natural scenes.

Methods

Participants

Twelve undergraduate students (seven female; mean age: 22.7 years) from the University of Vienna participated for course credit.

Stimuli

The 25 abstract paintings from Experiment 1 were used for this experiment. A target in the form of a blurred disk, with a diameter of 1° of visual angle, was superimposed on half of the images using a Gaussian blurring filter (radius = 1.5 pixels) in Photoshop. This filter produces a hazy effect by adding low-frequency detail. The target disk was positioned either at a highly salient or at a lowly salient position. The most salient region of each stimulus, according to the analysis from Experiment 1, was used as the highly salient target position. Within the same image, the most salient location from a randomly assigned alternative image served as the lowly salient target position. The minimum distance of the lowly salient position from the first five salient positions according to Experiment 1 was at least 4° of visual angle in each image.
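A rough Python/Pillow sketch of how such a blurred target disk might be created is given below; the authors produced the targets in Photoshop, so the compositing approach (locally blurring a disk-shaped region), the disk size in pixels, and the blur radius here are only illustrative assumptions.

```python
from PIL import Image, ImageDraw, ImageFilter

def blur_disk_region(image, center_xy, diameter_px=32, blur_radius=1.5):
    """Blur a disk-shaped region of the image, leaving the rest untouched."""
    img = image.convert("RGB").copy()
    blurred = img.filter(ImageFilter.GaussianBlur(blur_radius))   # whole-image blur
    mask = Image.new("L", img.size, 0)                             # black = keep original
    draw = ImageDraw.Draw(mask)
    x, y = center_xy
    r = diameter_px // 2
    draw.ellipse((x - r, y - r, x + r, y + r), fill=255)           # white disk = take blurred
    return Image.composite(blurred, img, mask)

# Hypothetical usage: place the target at the most salient location found earlier
# target_img = blur_disk_region(Image.open("abstract_01.png"), (412, 280))
```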
The first five salient regions were calculated again for each modified image. Salience was not affected at all by the presence of targets at the lowly salient positions. Targets at the highly salient positions also hardly influenced the overall salience maps and, if anything, reduced salience (i.e., shifted the most salient to the second most salient region in 2 out of 25 images). A manipulation check confirmed that salience measured at the highly salient position was significantly higher than at the lowly salient position, t(24) = 5.85, p < .001. To rule out that fixation biases toward particular regions could account for our results, the whole stimulus material was presented in the original orientation and, additionally, rotated by 180°. Participants saw half of the original images and half of the rotated images, with the identity of the original and the rotated images perfectly balanced across participants. Because each highly salient region in one image was taken to pick a lowly salient region in an alternative image, and because all lowly salient regions were created in this way, the relative positions of the targets at highly salient and at lowly salient regions were also perfectly balanced across participants, and a spatial bias as an alternative potential explanation of our results was thus ruled out beforehand.

Procedure

Eye movement recording and settings were the same as in Experiment 1. Participants were instructed to inspect the images carefully for fifteen seconds. Each participant saw the 12 (or 13) original and 13 (or 12) rotated images twice: once without a target (absent condition) and once with a target (present condition). In the present condition, the target was at either a highly or a lowly salient position. The images and, hence, trials with and without targets were presented in a pseudo-randomized order. Participants had to indicate after each trial whether they had found a target. If participants reported seeing a target, they were asked to mark the target position in the original image (now presented without the target) on a subsequent image display by moving a mouse cursor to the perceived target position and clicking the left mouse button.

Analysis and Results

Target Detection: Sensitivity Measure

Hits (i.e., targets that the participants correctly localized) were defined as a mouse click at a maximum distance of 2°
of visual angle from the center of the displayed target. The sensitivity index d′ was calculated separately for each participant and for targets at highly salient (d′ = 1.81) and lowly salient positions (d′ = 1.30), respectively. Both of these d′ values differed significantly from chance level (i.e., d′ = 0), both ps < .001. A paired-samples t-test revealed no significant difference between the participants' sensitivity to targets at highly salient and at lowly salient positions, p = .084.

Fixation Data

The first fixation of each trial was excluded from further analysis.

Highly Salient Versus Lowly Salient Areas of Interest

AOIs with a radius of 2° of visual angle around the highly salient and the lowly salient target locations were created for each image, in target-present and target-absent images. Mean fixation durations on the AOIs did not differ between highly salient (m = 379 ms) and lowly salient AOIs (m = 388 ms), t(11) = .48, p = .64. Next, the number of fixations was aggregated over stimuli, separately for each participant and each AOI. A paired-samples t-test, t(11) = 4.48, p = .001, revealed that highly salient AOIs (mean = 184.9) received more fixations than lowly salient AOIs (mean = 136.6). A 2 × 2 repeated measures ANOVA, with the variables AOI (highly salient vs. lowly salient) and condition (highly salient target vs. lowly salient target), indicated a significant interaction, F(1, 11) = 68.8, p < .001, η²p = .86. This interaction confirms that the task-set manipulation affected participants' gaze behavior. The salient AOIs were fixated more often in highly salient target trials (mean = 118.8) than in lowly salient target trials (mean = 9.2), whereas the lowly salient AOIs received more fixations when a target appeared at the lowly salient position (mean = 100.3) than at a highly salient position (mean = 17.7). Most importantly, and in line with salience-driven capture of the gaze, the significant main effect of the variable AOI, F(1, 11) = 7.5, p < .05, η²p = 0.41, reflects the fact that highly salient AOIs received a higher number of fixations overall (as shown above, including also the target-absent trials). (An analysis restricted to the first five seconds of presentation time revealed the same data patterns as described above.)
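As a side note on the sensitivity measure reported in the target-detection analysis above, a minimal Python sketch of a d′ computation follows; the correction that keeps hit and false-alarm rates away from 0 and 1 is a common convention and an assumption here, not necessarily the authors' choice.

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index: z(hit rate) - z(false-alarm rate), with a standard
    correction for extreme rates (an assumption, not taken from the paper)."""
    z = NormalDist().inv_cdf
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    return z(hit_rate) - z(fa_rate)

# Hypothetical counts for one participant and one target-position condition:
print(d_prime(hits=20, misses=5, false_alarms=3, correct_rejections=22))
```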
One possible explanation for the higher number of fixations on highly salient AOIs could be that participants found the target at the highly salient position more often than at the lowly salient position, so that salience gained a reward function. To rule out this reward hypothesis, we included only those trials without a target (absent trials) that had been shown prior to the same image's version containing a target (present trials). This analysis again confirmed the salience hypothesis, t(11) = 6.28, p < .001, with highly salient AOIs receiving on average nearly twice as many fixations as lowly salient AOIs (means: 22.3 vs. 11.8). Since the visual search task in our study affected gaze behavior, in terms of more fixations on the target region both when the target was at a highly salient and when it was at a lowly salient position, we further investigated how persistently this target manipulation influenced participants' gaze behavior. Therefore, we also analyzed only those target-absent trials that followed a trial in which participants had seen the same image but containing a target at a lowly salient position. Results indicated that although viewers had fixated the lowly salient AOI more often when a lowly salient target was presented, they did not retain this setting: when seeing the same image again without a target in it, they again preferred fixating a highly salient AOI (m = 11.5) over a lowly salient AOI (m = 7.0), t(10) = 5.67, p < .001.
Discussion

Experiment 2 investigated the influence of salience on fixations during a visual search task. Our results strongly supported the salience hypothesis: participants fixated salient regions more often than less salient regions. This was the case although participants understood the task and followed the instructions. Participants found the targets at highly and lowly salient positions with better than chance likelihood, an observation that was also reflected in significantly more fixations on lowly salient regions if targets were presented at a lowly salient position and more fixations on highly salient regions if targets were presented at a highly salient position. This experiment thus replicated prior findings that a certain task set, such as a visual search task set, can override the influence of salience at least to some degree [34–36]. Additionally, however, we showed that this task-set effect is not carried across trials, since the number of fixations was higher on salient regions overall, even when participants had seen the same image with a lowly salient target in it before. Highly salient regions were also preferred when no target was presented and the image had not been seen before. This means that the salience effect on fixations found in the present experiment did not reflect that participants learned to look at a particular highly salient (or lowly salient) region in a specific image because they had found the target at this position.
General Discussion

The goal of this study was to investigate the bottom-up theory of salience-driven attention (cf. [14]). For our experiments, we used paintings as conservative stimuli that, in the case of the abstract paintings, bear less resemblance to ecological visual input. We thereby ruled out, for example, the effects of objects on fixation probabilities, which are typically correlated with the effects of salience in 2D images of natural scenes. For our test, we used Itti and Koch's salience model [25] and fixation probabilities as dependent variables. Our results were clearly in favor of the bottom-up hypothesis of salience-driven attention and of Itti and Koch's and Parkhurst et al.'s [27] models. We found that the salience model predicted fixations far better than chance in all image classes. This was also observed in the abstract paintings and in comparison to a biased-random distribution of fixations that controlled for participants' biases toward the screen center. AUC analyses confirmed this conclusion, again ruling out that biases better account for the results. As an aside, our regression analyses showed that in all image classes, orientation and color were significant predictors of overall salience. Maybe even more important, a time course analysis demonstrated that the salience effect was indeed short-lived. This was predicted by stimulus-driven capture (cf. [20]) and was necessary to show a stimulus-driven salience effect. If anything, our data showed only one exception to this trend: less decrease of the salience effect on fixations and, thus, higher overall salience effects in Shore's photographs. This was potentially due to top-down biases toward objects, because objects are highly correlated with salience in 2D images of natural scenes [30]. By and large, however, the finding that salience effects were weaker in the paintings than in Shore's photographs confirmed that paintings indeed provided a more conservative and appropriate measure of stimulus-driven salience effects on fixations. Finally, we also confirmed a stimulus-driven influence of salience on fixations when in fact participants successfully searched for targets at highly salient as well as at lowly salient locations. This is even clearer evidence of a stimulus-driven effect of salience because it is not clear what participants searched for under (standard) free-viewing conditions. In contrast, in our Experiment 2, we knew exactly what participants searched for, as indicated by their successful search for the lowly salient target blurs, which were equally likely to be located at highly salient and at lowly salient regions. One limitation of the present study concerns our use of fixations as the only dependent variable. We cannot determine whether the attentional effect that we measured
Table 3 Examples (references) for principled models and approaches to explain gaze behavior

             Bottom-up                        Top-down
Perceptual   Itti & Koch (cf. [11, 25, 26])   Torralba (cf. [37, 50]); Tatler (cf. [47, 51])
Motor        Tatler (cf. [52])                Ballard et al. (cf. [53])
Perceptual explanations typically refer to image properties, such as objects, features, and feature contrasts within images, when accounting for gaze directions. Motor theories refer to factors in action control, such as the need to return to optimal spatial viewing positions for subsequent quick and unbiased saccades to different positions, or to the role of fixations during specific actions, like the inspection of objects for subsequent successful grasping of such objects. Orthogonal to the perceptual-motor dimension, theories of gaze behavior have focused on task-specific principles, here denoted as top-down models, or on general task-unspecific factors, here referred to as bottom-up models. In addition, some models (like Guided Search) take more than one of these factors into account. The present paper focuses only on perceptual factors and emphasizes the power of the bottom-up account, but certainly the other principles are also important factors for a full explanation of gaze behavior.
reflected selection for action or whether it also reflected selection for perception. To flesh out the latter, future studies should also test the accuracy of discrimination or the recollection of fixated compared to non-fixated image regions. Another limitation is that we focused exclusively on visual factors and emphasized bottom-up influences, although a more complete picture of fixation behavior should also include more action-related or motor principles, as represented by our schematic overview of confirmed principles related to fixation behavior (cf. Table 3).
Conclusion

We found further support for the salience model by Itti and Koch [11, 25], extending the visual stimulus categories to paintings. However, the conclusion that our findings are solely due to salience effects should be drawn with caution. During art perception, top-down processes neglected in this study could be important factors.

Acknowledgments We would like to thank our reviewers for valuable comments on an earlier version of this manuscript. Furthermore, we thank Elisa Tersteegen and Helmut Perrelli for their support in data collection, as well as Joachim Denzler and Michael Koch for kindly providing the programs for the Fourier power spectrum analyses.
References

1. Posner MI. Orienting of attention. Q J Exp Psychol. 1980;32:3–25.
2. Theeuwes J. Perceptual selectivity for color and form. Percept Psychophys. 1992;51:599–606.
3. Bashinski HS, Bacharach VR. Enhancement of perceptual sensitivity as the result of selectively attending to spatial locations. Percept Psychophys. 1980;28:241–8.
4. Müller HJ, Rabbitt PMA. Reflexive and voluntary orienting of visual attention: time course of activation and resistance to interruption. J Exp Psychol Hum Percept Perform. 1989;15:315–30.
5. Neumann O. Beyond capacity: a functional view of attention. In: Heuer H, Sanders AF, editors. Perspectives on perception and action. Hillsdale, NJ: Erlbaum; 1987. p. 361–94.
6. Allport A. Selection for action: some behavioral and neurophysiological considerations of attention and action. In: Heuer H, Sanders AF, editors. Perspectives on perception and action. Hillsdale, NJ: Erlbaum; 1987. p. 395–419.
7. Deubel H, Schneider WX. Saccade target selection and object recognition: evidence for a common attentional mechanism. Vision Res. 1996;36:1827–37.
8. Rizzolatti G, Riggio L, Sheliga BM. Space and selective attention. In: Umiltà C, Moscovitch M, editors. Attention and performance, XV: conscious and nonconscious information processing. Cambridge, MA: MIT Press; 1994.
9. Henderson JM. Regarding scenes. Curr Dir Psychol Sci. 2007;16:219–22.
10. Rayner K. Eye movements in reading and information processing: 20 years of research. Psychol Bull. 1998;124:372–422.
11. Itti L, Koch C. Computational modelling of visual attention. Nat Rev Neurosci. 2001;2:194–203.
12. Van Zoest W, Donk M, Theeuwes J. The role of stimulus-driven and goal-driven control in visual selection. J Exp Psychol Hum Percept Perform. 2004;30:746–59.
13. Bergen JR, Julesz B. Parallel vs. serial processing in rapid pattern discrimination. Nature. 1983;303:696–8.
14. Theeuwes J. Top-down and bottom-up control of visual selection. Acta Psychol (Amst). in press.
15. Bacon WF, Egeth HE. Overriding stimulus-driven attentional capture. Percept Psychophys. 1994;55:485–96.
16. Leber AB, Egeth HE. It's under control: top-down strategies can override attentional capture. Psychon Bull Rev. 2006;13:132–8.
17. Burnham BR. Displaywide visual features associated with a search display's appearance can mediate attentional capture. Psychon Bull Rev. 2007;14:392–422.
18. Eimer M, Kiss M. Involuntary attentional capture is determined by task set: evidence from event-related brain potentials. J Cogn Neurosci. 2008;20:1423–33.
19. Folk CL, Remington RW, Johnston JC. Involuntary covert orienting is contingent on attentional control settings. J Exp Psychol Hum Percept Perform. 1992;18:1030–44.
20. Donk M, van Zoest W. Effects of salience are short-lived. Psychol Sci. 2008;19:733–9.
21. Ogawa T, Komatsu H. Neuronal dynamics of bottom-up and top-down processes in area V4 of macaque monkeys performing a visual search. Exp Brain Res. 2004;173:1–13.
22. Ansorge U, Horstmann G. Preemptive control of attentional capture by color: evidence from trial-by-trial analysis and ordering of onsets of capture effects in RT distributions. Q J Exp Psychol. 2007;60:952–75.
23. Bichot NP, Rossi AF, Desimone R. Parallel and serial neural mechanisms for visual search in macaque area V4. Science. 2005;308:529–34.
24. Zhang W-W, Luck S. Feature-based attention modulates feedforward visual processing. Nat Neurosci. 2009;12:24–5.
25. Itti L, Koch C. A saliency-based search mechanism for overt and covert shifts of visual attention. Vision Res. 2000;40:1489–506.
26. Itti L, Koch C, Niebur E. A model of saliency-based visual attention for rapid scene analysis. IEEE Trans Pattern Anal Mach Intell. 1998;20:1254–9.
27. Parkhurst D, Law K, Niebur E. Modeling the role of salience in the allocation of overt visual attention. Vision Res. 2002;42:107–23.
28. Underwood G. Cognitive processing in eye guidance: algorithms for attention in image processing. Cognit Comput. 2009;1:64–76.
29. Peters RJ, Iyer A, Itti L, Koch C. Components of bottom-up gaze allocation in natural images. Vision Res. 2005;45:2397–416.
30. Elazary L, Itti L. Interesting objects are visually salient. J Vis. 2008;8(3):1–15.
31. Einhäuser W, Spain M, Perona P. Objects predict fixations better than early saliency. J Vis. 2008;8:1–26.
32. Frey HP, Honey C, König P. What's color got to do with it? The influence of color on visual attention in different categories. J Vis. 2008;8:1–17.
33. Simoncelli EP, Olshausen BA. Natural image statistics and neural representation. Annu Rev Neurosci. 2001;24:1193–216.
34. Chen X, Zelinsky GJ. Real-world visual search is dominated by top-down guidance. Vision Res. 2006;46:4118–33.
35. Einhäuser W, Rutishauser U, Koch C. Task-demands can immediately reverse the effects of sensory-driven saliency in complex visual stimuli. J Vis. 2008;8:1–19.
36. Rutishauser U, Koch C. Probabilistic modeling of eye movement data during conjunction search via feature-based attention. J Vis. 2007;7:1–20.
37. Torralba A, Oliva A, Castelhano MS, Henderson JM. Contextual guidance of eye movements and attention in real-world scenes: the role of global features in object search. Psychol Rev. 2006;113:766–86.
38. Underwood G, Foulsham T. Visual saliency and semantic incongruency influence eye movements when inspecting pictures. Q J Exp Psychol. 2006;59:1931–49.
39. Humphrey K, Underwood G. Domain knowledge moderates the influence of visual saliency in scene recognition. Br J Psychol. 2009;100:377–98.
40. Underwood G, Foulsham T, Humphrey K. Saliency and scan patterns in the inspection of real-world scenes: eye movements during encoding and recognition. Vis Cogn. 2009;17:812–34.
41. Cerf M, Harel J, Einhäuser W, Koch C. Predicting human gaze using low-level saliency combined with face detection. Adv Neural Inf Process Syst. 2008;20:241–8.
42. Redies C, Hasenstein J, Denzler J. Fractal-like image statistics in visual art: similarity to natural scenes. Spat Vis. 2007;21:137–48.
43. Redies C, Hänisch J, Blickhan M, Denzler J. Artists portray human faces with the Fourier statistics of complex natural scenes. Network. 2007;18:235–48.
44. Harel J, Koch C, Perona P. Graph-based visual saliency. Adv Neural Inf Process Syst. 2006.
45. Foulsham T, Underwood G. What can saliency models predict about eye movements? Spatial and sequential aspects of fixations during encoding and recognition. J Vis. 2008;8:1–17.
46. Walther D, Koch C. Modeling attention to salient proto-objects. Neural Netw. 2006;19:1395–407.
47. Tatler BW, Baddeley RJ, Gilchrist ID. Visual correlates of fixation selection: effects of scale and time. Vision Res. 2005;45:643–59.
48. Henderson JM, Pierce GL. Eye movements during scene viewing: evidence for mixed control of fixation durations. Psychon Bull Rev. 2008;15:566–73.
49. Wienrich C, Heße U, Müller-Plath G. Eye movements and attention in visual feature search with graded target-distractor-similarity. J Eye Mov Res. 2009;3:1–19.
50. Oliva A, Torralba A, Castelhano MS, Henderson JM. Top-down control of visual attention in object detection. Proc Int Conf Image Proc. 2003;1:253–6.
51. Tatler BW, Baddeley RJ, Vincent BT. The long and the short of it: spatial statistics at fixation vary with saccade amplitude and task. Vision Res. 2006;46:1857–62.
52. Tatler BW. The central fixation bias in scene viewing: selecting an optimal viewing position independently of motor biases and image feature distributions. J Vis. 2007;7:1–17.
53. Ballard D, Hayhoe M, Pook P, Rao R. Deictic codes for the embodiment of cognition. Behav Brain Sci. 1997;20:723–67.
54. Radziwill F. Stilleben. Berlin: Kupferstichkabinett, Staatliche Museen zu Berlin; n.a.
55. Cézanne P. La Sainte-Victoire vue des Infernets. St. Petersburg: The State Hermitage Museum; 1898.
56. Machotka P. Cézanne: landscape into art. New Haven: Yale University Press; 1996.
57. Shore S, Schmidt-Wulffen S, Tillman L. Stephen Shore: uncommon places: the complete works. New York: Aperture; 1982.
58. De Kooning W. Untitled. Venice: Peggy Guggenheim Collection; 1958.