Journal of Experimental Psychology: Human Perception and Performance, 2016, Vol. 42, No. 9, 1429–1442

© 2016 American Psychological Association 0096-1523/16/$12.00 http://dx.doi.org/10.1037/xhp0000224


Observers' Cognitive States Modulate How Visual Inputs Relate to Gaze Control

Omid Kardan, University of Chicago
John M. Henderson, University of California, Davis
Grigori Yourganov, University of South Carolina
Marc G. Berman, University of Chicago

Previous research has shown that eye-movements change depending on both the visual features of our environment and the viewer's top-down knowledge. One important open question is the degree to which the visual goals of the viewer modulate how the visual features of scenes guide eye-movements. Here, we propose a systematic framework to investigate this question. In our study, participants performed 3 different visual tasks on 135 scenes (search, memorization, and aesthetic judgment) while their eye-movements were tracked. Canonical correlation analyses showed that eye-movements were reliably more related to low-level visual features at fixations during the visual search task compared to the aesthetic judgment and scene memorization tasks. Different visual features also had different relevance to eye-movements between tasks. This modulation of the relationship between visual features and eye-movements by task was also demonstrated with classification analyses, in which classifiers were trained to predict the viewing task based on eye movements and visual features at fixations. Feature loadings showed that the visual features at fixations could signal task differences independent of the temporal and spatial properties of eye-movements. When classifying across participants, edge density and saliency at fixations were as important as eye-movements in the successful prediction of task, with entropy and hue also being significant, but with smaller effect sizes. When classifying within participants, brightness and saturation were also significant contributors. Canonical correlation and classification results, together with a test of moderation versus mediation, suggest that the cognitive state of the observer moderates the relationship between stimulus-driven visual features and eye-movements.

Keywords: eye-movements, visual features, cognitive state, gaze control, overt attention

Supplemental materials: http://dx.doi.org/10.1037/xhp0000224.supp

This article was published Online First April 28, 2016. Omid Kardan, Department of Psychology, University of Chicago; John M. Henderson, Department of Psychology and Center for Mind and Brain, University of California, Davis; Grigori Yourganov, Department of Psychology, University of South Carolina; Marc G. Berman, Department of Psychology, University of Chicago. This work was partially supported by a TKF grant and an internal grant from the University of Chicago to Marc G. Berman, and National Science Foundation Grant BCS-1151358 to John M. Henderson. Correspondence concerning this article should be addressed to Omid Kardan or Marc G. Berman, Department of Psychology, University of Chicago, 5848 South University Avenue, Chicago, IL 60637. E-mail: [email protected] or [email protected]

Our eye-movements provide strong links between our external environment and our intentions/internal states. Specifically, an individual's saccadic eye-movements are driven by visual features of the environment and also by cognitive factors such as the goals of the viewer (Henderson, 2007, 2013). For example, if individuals examine the same scene with different purposes (e.g., aesthetic judgment vs. visual search), their eyes are likely to move differently (Castelhano, Mack, & Henderson, 2009; Henderson, Weeks, & Hollingworth, 1999; Kardan, Berman, Yourganov, Schmidt, & Henderson, 2015; Mills, Hollingworth, Van der Stigchel, Hoffman, & Dodd, 2011; Yarbus, 1967). This is because the characteristics of our eye movements when viewing an image are influenced top-down by cognitive processes during scene viewing (e.g., Borji & Itti, 2013; Buswell, 1935; Henderson, 2013; Land & Tatler, 2009; Yarbus, 1967). At the same time, the visual environment also influences eye-movement behavior. When viewing different images, our eye-movements are influenced by the low-level visual feature distribution of the images (Andrews & Coppola, 1999; Cerf, Harel, Einhäuser, & Koch, 2008; Henderson, Chanceaux, & Smith, 2009; Itti et al., 1998; Itti & Koch, 2001; Koch & Ullman, 1985; Parkhurst, Law, & Niebur, 2002). It is therefore important to determine how the external environment and our intentions combine to govern our eye-movements.

To study how our eye-movements are controlled by the combination of visual features and mental states, we first need to reduce the dimensions of the visual environment and of cognitive states, and examine their independent contributions and interactions.


In this study we examined how saccadic eye-movement measures and their dynamics are altered based on the visual environment, which we operationalized as stimulus-driven features (i.e., hue, saturation, brightness, edge density, entropy, and saliency at fixations), and whether that relationship is moderated by the cognitive state of observers, which we operationalized as the states induced by three viewing tasks (i.e., search, memorization, and aesthetic preference).

According to the cognitive relevance hypothesis, eye-movement patterns when performing different visual tasks are driven by top-down factors that intentionally direct fixations toward informative, task-relevant locations (Henderson, Brockmole, Castelhano, & Mack, 2007; Henderson, Malcolm, & Schandl, 2009; see Henderson, 2013). For example, Castelhano et al. (2009) investigated how task instructions influence specific eye-movement control parameters. In their study, participants viewed color photographs of natural scenes under two instruction sets: (a) searching a scene for a particular item or (b) remembering characteristics of that same scene. They found that the viewing task biased aggregate eye-movement measures such as average fixation durations and average saccade amplitudes. Mills et al. (2011) also examined the influence of task set on the spatial and temporal characteristics of eye movements during scene perception. They found that task set affected both spatial (e.g., saccade amplitude) and temporal characteristics of fixations (e.g., fixation duration). Both results lend support to the cognitive relevance hypothesis because the task that one performs alters eye-movement patterns in systematic ways.

In addition to the cognitive relevance hypothesis, another hypothesis that describes different types of contributions to attention and gaze control, called the saliency hypothesis, focuses more on bottom-up control of eye-movements based on the visual features of the stimuli. The major tenet of this theory is that our eyes are directed, in a bottom-up fashion, to low-level image discontinuities such as bright regions, edges, color differences, and so forth, which are termed salient regions (Itti et al., 1998; Itti & Koch, 2000; Koch & Ullman, 1985; O'Connell & Walther, 2015; Parkhurst et al., 2002; Treisman & Gelade, 1980; Walther & Koch, 2006). For instance, some evidence suggests that in a search task, the location containing the most salient stimulus initially receives attention. Following inhibition of the most salient location, the location containing the next most salient element receives spatial attention, and so forth, until the target is found (Wolfe, 1994). However, it could be that these results are modulated by the viewer's objectives.

Building upon these models and the evidence for them, we constructed a hybrid model, depicted in Figure 1, showing the hypothesized relationships between eye movements, the visual environment, and cognitive state. The a path represents how stimulus-driven features could affect gaze control mechanisms in ways that can be observed through the eye-movement outputs of fixation durations and saccade amplitudes. Existence of this path is supported by the body of research consistent with the saliency hypothesis. The b path represents how the top-down context of viewing directly affects gaze control mechanisms, again observed through fixations and saccades.
Existence of this path is supported by research consistent with the cognitive relevance hypothesis, as well as by studies demonstrating that cognitive states can be systematically predicted to some degree from differences in eye-movement patterns using machine learning classification methods (Borji & Itti, 2014; Henderson, Shinkareva, Wang, Luke, & Olejarczyk, 2013; Kardan, Berman, et al., 2015).1

Figure 1. Correspondence model of the relationship between the visual environment, eye-movements, and the top-down context of viewing. The observed variables (in rectangles) also include the interaction of each feature with time (eye-movement side) and with spatial positions (visual features side); we do not include them explicitly in this figure for ease of display. The a path represents how stimulus-driven features could affect gaze control, consistent with the saliency hypothesis. The b path represents how the top-down context of viewing directly affects gaze control mechanisms, and the c path represents the moderation of the relationship between the visual environment and gaze control by the top-down context. Both the b and c paths support the cognitive relevance hypothesis. HSV = hue-saturation-value; SAC = saccade. See the online article for the color version of this figure.

1 Although the degree to which classification is possible is also dependent on the nature of the task and stimulus (Borji & Itti, 2014; Greene et al., 2012).

While the previous literature suggests that bottom-up (the a path) and/or top-down (the b path) processes are at play to guide our overt attention/eye movements, vision scientists have only recently started to investigate explicitly the extent to which cognitive and stimulus-driven aspects of scene viewing independently or interactively contribute to viewing behavior, and many aspects of this possible interaction are yet to be determined (e.g., see Anderson, Ort, Kruijne, Meeter, & Donk, 2015; O'Connell & Walther, 2015). Of course, the degree of involvement of each aspect could depend on the nature of the task and stimulus. For example, whether the task is attentionally demanding, and whether there are well-defined strategies learned through experience to perform the task, could make viewing behavior more or less dependent on the stimulus. To assess this question, one could hypothesize that there is a c path, as shown in Figure 1, which represents the modulation of the relationship between the visual environment and gaze control by the top-down context of the viewer. In this study, we hypothesized that this moderation exists and tested for it in two ways: directly, by canonical correlation analysis, and indirectly, by classification analysis.

First, we performed three different canonical correlation analyses in which the relationship between extracted stimulus-driven features and eye-movements (the a path) was assessed under three different visual tasks: (a) search, (b) memorization, and (c) aesthetic judgment.


Figure 2. A sample image (top left), its saliency map (top middle) and salient shapes (top right) based on intensities, orientations, and colors. These conspicuity maps were created using the Saliency Toolbox 2.3 (Walther & Koch, 2006). The isolated conspicuity maps are also shown in the bottom row; intensity (left), orientation (middle), and color (right). Hotter colors (red) in the conspicuity maps show higher salience values. See the online article for the color version of this figure.

The stimulus-driven features that were used to define the visual environment included low-level color features (local hue, saturation, and brightness of the fixations), low-level spatial features (grayscale local entropy and edge density), and low-level salience. These features were chosen because they have been previously shown to successfully summarize scenes and behavioral reactions to those scenes (e.g., see Berman et al., 2014; Kardan, Demiralp, et al., 2015; Walther & Koch, 2006). The eye-driven features that were used to define gaze control were fixation durations, saccade amplitudes, and their interactions with time passed while viewing. These features and the shapes of their distributions have been successfully used to summarize components of gaze control (e.g., see Kardan, Berman, et al., 2015; Castelhano et al., 2009; Borji & Itti, 2013). If the relationship between stimulus-driven features and eye characteristics systematically changes based on the visual task, the c path should exist and improve the fit of the model.

Second, we examined the same moderation question with a classification analysis, which provides an estimate of the reproducibility of the moderation effect.2 To do so, we performed a visual task classification analysis (i.e., predicting the viewing task) based on stimulus-driven and eye-movement features. In this analysis we regressed the stimulus-driven features out of the eye movements to make them independent of one another. If the a path is independent of the visual task and not moderated by it (i.e., if the c path = 0), then it does not contribute to successful classification of cognitive goals, and removing the a path should not change the predictability of goals based on eye characteristics. Alternatively, if the top-down context moderates a (c is nonzero), visual features can independently signal cognitive state differences because the classifiers can capitalize on the modulation of a to discern tasks.

2 This was more of an indirect test in the sense that the eye-movements were not being predicted directly, but rather the strength of their relationship with visual features was being tested to see if those relationships systematically distinguished the tasks.

In summary, in this study we set out to determine if top-down aspects of viewing may modulate the relationship between bottom-up stimulus features and eye movement dynamics (i.e., gaze control), and assess the extent to which this could happen.

Method

Participants

We used the data collected for our previous study (Kardan, Berman, et al., 2015), in which 72 Edinburgh undergraduate students with normal or corrected-to-normal vision participated. All participants were naïve concerning the purposes of the experiment and provided informed consent as administered by the Institutional Review Board of the University of Edinburgh.

Apparatus

Eye movements were recorded via an SR Research EyeLink 1000 eye tracker (spatial resolution of 0.01°) with a sampling rate of 1,000 Hz. Participants were seated 90 cm away from a 21-in. CRT monitor. Head movements were minimized with a chin and head rest. Although viewing was binocular, eye movements were recorded from the right eye. The experiment was controlled with SR Research Experiment Builder software.


Stimuli and Tasks

Materials. 135 unique full-color 800 × 600 pixel (32 bits; subtending 25.8° × 19.4°) photographs of real-world scenes from a variety of scene categories were used during the initial task phase of the experiment.

Procedure. The scenes were split into blocks of 45 images, and participants were instructed to perform one of three tasks during each block: search for an object (with a word template), memorize the scene for a subsequent old/new recognition test, or rate aesthetic preference (1 = dislike to 4 = like). All scenes were presented for 8 s. The order of tasks and the scenes used in each of the tasks were rotated across nine participant groupings using a dual-Latin square design. This ensured that every order of task and combination of scenes within each task was represented at least once across the nine participant groupings.

During the search task, an object name (e.g., fish, hand fan, keys, vase, pigeon, wig) was presented in Arial font (vertical height 1.62 cm) for 800 ms, followed by a fixation cross for 200 ms. The scene was then displayed for 8 s. If participants located the search target, they responded by pressing a trigger on the controller (Microsoft SideWinder), but the scene remained on the screen until the 8 s were over.3 The search task was preceded by three practice trials. The behavioral results showed that in 88.3% of the search trials the target was found by the participant (the participant pressed the button and his or her fixation coordinates contained the object).

During the aesthetic preference task, participants were instructed to examine the scenes, which were each presented for 8 s, and then rate how much they liked each scene. After each scene, a response screen appeared asking the participant to indicate on a 1–4 scale how much they liked the scene (1 = dislike, 4 = like). Responses were made via four buttons on the controller. Three practice trials were presented at the start of the preference task.

During the scene memorization task, scenes were presented for 8 s. Participants were not required to make any responses during the initial encoding phase and no practice was given. In the memory test phase of the experiment (i.e., the recognition memory test), 134 of the original scenes were presented, half of them mirrored horizontally, along with 22 new scenes (half of these new scenes had also been flipped to ensure that any reversed text in the images could not be used as an indicator of whether a scene was old), each for 3 s. The participants were then asked whether each scene was old, new, or an altered version of one they had seen before. They were also asked to identify the task in which they saw old or altered images. The behavioral results showed that in 73.6% of memory test trials, participants correctly identified the image as old, new, or altered. Finally, in 60.5% of trials, participants correctly identified the source task (whether it had previously been seen in the search task, the memorization task, or the preference task) of an old or altered image.

Fixation-related features.

Eye-movement features. In the canonical correlation analyses, we computed four eye-movement features for each trial: (a) mean fixation duration, (b) its interaction with time (mean duration weighted by time passed during viewing), (c) mean saccade amplitude, and (d) its interaction with time (mean saccade amplitude weighted by time passed during viewing). In the classification analyses, we also used these additional eye-movement features: the number of fixations and the standard deviation and skewness of fixation durations and saccade amplitudes. We added these features to the classification analyses because it has been shown that the shape of the distribution of eye movements varies across visual tasks in systematic ways (e.g., see Castelhano et al., 2009; Henderson & Luke, 2014; Kardan, Berman, et al., 2015). We used the MATLAB functions mean, std, and skewness to compute these features.

Local visual features. Our visual image features included edge density, hue, saturation, brightness, entropy, and saliency. To quantify the value of the different visual features at each fixation, we first generated the feature map of each image. The maps for the first four visual features (edge density, hue, brightness, saturation) were created using MATLAB (Release 2014a, The MathWorks, Inc., Natick, MA) as in Berman et al. (2014) and Kardan, Demiralp, et al. (2015). Mean values of hue for every fixation, as well as the average hue of the images, were calculated using the directional (circular) mean (Circular Statistics Toolbox for MATLAB; see www.kyb.mpg.de/~berens/circStat.html). The saliency maps were created using the Saliency Toolbox 2.3 in MATLAB, as in Walther and Koch (2006). Figure 2 shows a sample image and its saliency map based on intensity, orientation, and color conspicuity maps. It is noteworthy that although saliency is calculated based on the integration of low-level visual features and biological considerations such as inhibition-of-return processes, it is a stimulus-driven feature (i.e., it uses no fixation data) and thus is grouped with edge density, entropy, and the other low-level stimulus-driven features.

After the maps were created, for each fixation time point a 600 × 800 Gaussian mask (the same size as the images)4 was centered at the corresponding fixation location, with horizontal and vertical standard deviations of σx = σy = 58 pixels (2° of visual angle). This Gaussian map was then multiplied by the feature map to obtain local feature maps. For example, Figure 3 shows a sample image, its edge map, and the masked fixation neighborhoods of the edge map (every third fixation is plotted, i.e., Fixations 1, 4, 7, . . ., 19). Figure 4 shows another sample image and its saturation map with the masked fixation neighborhoods of the saturation map in the same manner. Next, we normalized the visual features' values for each fixation. This was done by dividing the average value of the masked feature map's pixels by the average amplitude of the mask for that fixation to generate the normalized local visual feature value.5 Finally, the mean feature value across all pixels of the image was subtracted from each local feature value on that image, so that the fixation features are corrected for the global baseline of that visual feature.

3 We used the entire 8 s in the analysis because we have previously shown that including fixations made after finding the object does not artificially help distinguish the search task from the other tasks. If anything, it makes search trials resemble the other tasks a little more (see Kardan, Berman, et al., 2015).

4 Please note that a 600 × 800 matrix (mask) corresponds to an 800 × 600 image, because the dimensions of a matrix go as rows by columns, whereas the dimensions of a photo are usually reported as width by height.
5 Note that the average amplitude of the mask for a fixation is the maximum value possible for a local visual feature, for both central and peripheral fixations on the scene; thus this step normalizes values to the [0, 1] range.
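
To make the masking and normalization steps concrete, the following is a minimal MATLAB sketch of the procedure described above; the variable names and placeholder data are ours and not from the authors' code.

```matlab
% Local feature value at one fixation via a Gaussian mask (illustrative sketch;
% placeholder data, not the authors' code).
featureMap = rand(600, 800);          % placeholder 600 x 800 feature map (e.g., edge density)
fx = 400; fy = 300;                   % fixation location in pixels (column, row)
sigma = 58;                           % ~2 degrees of visual angle

[X, Y] = meshgrid(1:size(featureMap, 2), 1:size(featureMap, 1));
mask   = exp(-((X - fx).^2 + (Y - fy).^2) ./ (2 * sigma^2));   % Gaussian centered at the fixation
masked = mask .* featureMap;                                    % Gaussian-weighted local feature map

% Divide the mean of the masked map by the mean mask amplitude (so the value lies
% in [0, 1]), then subtract the image-wide mean to correct the global baseline.
localVal = mean(masked(:)) / mean(mask(:)) - mean(featureMap(:));
```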


Figure 3. A sample image, its edge map, and the masked fixation neighborhoods of the edge map (every third fixation is plotted, i.e., Fixations 1, 4, 7, . . ., 19). Gray to white pixels show intensity of edges at respective fixations. See the online article for the color version of this figure.

The local grayscale entropy for each fixation neighborhood was calculated as \( \text{entropy} = -\sum_{n=1}^{256} p_n \log_2 p_n \), where p_n is the probability value of the nth bin (out of 256 bins) of the grayscale intensity histogram (i.e., the number of pixels with intensity values in [n − 1, n) over the total number of pixels in the neighborhood). The neighborhood for a fixation was defined as a circle with radius = 2° (58 pixels) centered at the specific fixation's coordinates (i.e., a neighborhood contains 10,568 pixels or fewer, depending on whether some of the neighborhood falls outside the image boundaries). The overall grayscale entropy of each image was also subtracted from each local entropy value on that image.6
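
A minimal MATLAB sketch of this local entropy computation, assuming a uint8 grayscale image and a fixation at pixel coordinates (fx, fy); the placeholder data and variable names are ours.

```matlab
% Local grayscale entropy in a circular neighborhood around one fixation
% (illustrative sketch with placeholder data).
grayImg = uint8(255 * rand(600, 800));   % placeholder grayscale image (0-255)
fx = 400; fy = 300;                      % fixation location in pixels (column, row)
radius = 58;                             % 2 degrees of visual angle

[X, Y] = meshgrid(1:size(grayImg, 2), 1:size(grayImg, 1));
inCircle = (X - fx).^2 + (Y - fy).^2 <= radius^2;    % circular neighborhood (clipped at image borders)
pixels   = double(grayImg(inCircle));

% 256-bin intensity histogram -> probabilities -> entropy = -sum(p .* log2(p))
counts = histc(pixels, 0:255);
p      = counts / sum(counts);
p      = p(p > 0);                                   % empty bins contribute 0 to the sum
localEntropy = -sum(p .* log2(p));
% As described in the text, the whole-image entropy is then subtracted from localEntropy.
```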

Spatial positions. The spatial positions of fixations have also been shown to carry information about bottom-up and top-down contributions to overt attention (e.g., O'Connell & Walther, 2015). Spatial patterns of fixations can also help to discriminate between tasks (Borji & Itti, 2014; Coco & Keller, 2014; Mills et al., 2011). Therefore, we added the spatial positions of fixations and position-weighted visual features to our models. This was done by having three variables coding right, down, and proximity to the center of the coordinates for each fixation. Right and down are simply the Cartesian coordinates of a fixation from the center, normalized to the [−1, 1] range, with the right side and lower parts of the image being positive. The closeness to center was measured as the Gaussian distance (σx = 4°, σy = 3°) of the center of each fixation from the center of the image (as opposed to the Euclidean distance). We did this because our principal component analysis on the coordinates showed that Gaussian distance explained slightly more of the variance of the spatial positions than Euclidean distance. The interaction of a specific visual feature with spatial position was calculated by first multiplying each fixation's value of right, down, and center by the z score of the visual feature's value.

6 We did this because an image with higher entropy is also likely to have higher local entropy values. Nevertheless, an analysis without subtracting the overall entropy values made little (if any) change to entropy’s contributions to the results.


Figure 4. A sample image, its saturation map, and the masked fixation neighborhoods of the saturation map (every third fixation is plotted, i.e., Fixations 1, 4, 7, . . ., 19). Warmer colors show higher values of saturation. See the online article for the color version of this figure.

Then, to avoid overcrowding the models with too many variables, we performed a PCA on the Feature × Position values and reduced them into one variable (the first principal component of the three interactions), which explained at least 70% of the total variance in the three vectors. Thus, for every visual feature, there is one interaction term with the spatial positions.
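
A minimal sketch of this reduction step using MATLAB's pca (the text does not name the exact function; the placeholder vectors below stand in for one visual feature and the three position variables).

```matlab
% Reduce the three Feature x Position interaction vectors to one variable
% (illustrative sketch; variable names and data are placeholders).
nFix   = 1000;                         % number of fixations (placeholder)
featZ  = randn(nFix, 1);               % z-scored visual feature values at fixations
right  = rand(nFix, 1) * 2 - 1;        % normalized horizontal position, [-1, 1]
down   = rand(nFix, 1) * 2 - 1;        % normalized vertical position, [-1, 1]
center = rand(nFix, 1);                % Gaussian distance from the image center

interactions = [featZ .* right, featZ .* down, featZ .* center];

% The first principal component summarizes the three interaction terms
[~, score, ~, ~, explained] = pca(zscore(interactions));
featureByPosition = score(:, 1);       % the single interaction term entered into the models
fprintf('PC1 explains %.1f%% of the variance\n', explained(1));
```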

Results

Canonical Correlation Analysis: Method

In a canonical correlation analysis (Hotelling, 1936; Johnson & Wichern, 1992), first, the weights that maximize the correlation of the two weighted sums (linear composites) of each set of variables (called canonical roots) are calculated. Then the first root is extracted, and the weights that produce the second largest correlation between the summed scores are calculated, subject to the constraint that the next set of summed scores is orthogonal to the previous one. Each successive root explains a unique additional proportion of variability in the two sets of variables. There can be as many canonical roots as the minimum number of variables in either of the two sets, which is four in this analysis. Therefore, we obtain four sets of canonical weights for each set of variables, and each of these four canonical roots has a canonical correlation coefficient, which is the square root of the explained variability between the two weighted sums (canonical roots).

To obtain canonical weights for variables and canonical correlation coefficients relating eye-movements to visual features, we used the same approach as in Kardan, Gozdyra, et al. (2015) for each visual task separately, and then compared the loadings between tasks to look for systematic changes in the relationships between eye-movements and visual features based on the visual task (moderation). We first performed a canonical correlation analysis on the z scores of the variables using MATLAB.
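
A minimal sketch of this step using MATLAB's canoncorr on z-scored variable sets; the text does not name the exact function, and the matrices below are placeholders standing in for the fixation-level visual-feature and eye-movement variables.

```matlab
% Canonical correlation between the two variable sets (illustrative sketch).
% X: observations x visual-feature variables (features, Feature x Position terms, positions)
% Y: observations x eye-movement variables (4 columns here, hence 4 canonical roots)
nObs = 500;
X = zscore(randn(nObs, 21));           % placeholder predictor set (z-scored)
Y = zscore(randn(nObs, 4));            % placeholder eye-movement set (z-scored)

[A, B, r, U, V] = canoncorr(X, Y);     % A, B: canonical weights; r: canonical correlations
% U and V are the canonical variates (roots); r(k)^2 is the variance shared by
% the k-th pair of variates.
disp(r.^2);
```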


For a more straightforward interpretation and a better characterization of the underlying latent variables, instead of using the canonical weights we calculated the Pearson correlation coefficients (canonical loadings) of each observed variable in the set with the weighted summed scores for each of the four linear composites. This way, each canonical root (linear composite) could be interpreted as an underlying latent variable whose degree of relationship with each of the observed variables in the set (i.e., how much the observed variable contributes to the canonical variate) is reflected in the observed variable's loading. Each variable is represented by the loading of the observed variable and its error bar, which is the 95% confidence interval calculated by bootstrapping the data (1,000 samples with replacement) and choosing the symmetrical range around each average that contains 95% of all values in the loading distribution.

The partial r2 of every variable on the X side of the canonical analysis (the side of the latent variable consisting of the visual features in Figure 1), within a single component, is then calculated as the square of its loading in that component, multiplied by the sum of the squared loadings of the variables on the Y side (the latent variable consisting of the eye-movements in Figure 1), multiplied by the shared variance of the two latent variables (R2) in that component. Summing the partial r2 of a variable across the components then gives the proportion of variance in eye-movements explained by that specific variable, because the components are orthogonal to each other.

To investigate the relationship between the eye-movement features (saccade amplitudes, fixation durations, and their interactions with time passed viewing in the trial) and the stimulus-driven features (hue, saturation, brightness, edge density, entropy, and saliency), their interactions with positions (Hue × Position, Saturation × Position, Brightness × Position, Edge Density × Position, Entropy × Position, and Saliency × Position), and the spatial locations (right, down, center), we performed canonical correlation analyses relating these two sets of variables separately for the three different visual tasks (search, memorization, and preference), based on the model presented in Figure 1. We then compared the differences in the partial r2 of each visual feature across tasks to investigate the task-dependent changes in the relationship between visual features and eye-movements.
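
A sketch of how the canonical loadings and the partial r2 defined above could be computed from canoncorr output; this is our reading of the formula in the text with placeholder data, not the authors' code.

```matlab
% Canonical loadings and per-variable partial r^2 (illustrative sketch).
nObs = 500;
X = zscore(randn(nObs, 21));                     % visual-feature side (placeholder)
Y = zscore(randn(nObs, 4));                      % eye-movement side (placeholder)
[A, B, r, U, V] = canoncorr(X, Y);

loadX = corr(X, U);                              % loadings: each X variable vs. each canonical variate
loadY = corr(Y, V);                              % loadings on the eye-movement side

% partial r^2 of X variable i in component k:
%   loadX(i,k)^2 * sum_j(loadY(j,k)^2) * r(k)^2, summed over the four components
nComp = numel(r);
partialR2 = zeros(size(X, 2), 1);
for k = 1:nComp
    partialR2 = partialR2 + loadX(:, k).^2 * sum(loadY(:, k).^2) * r(k)^2;
end
```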

Canonical Correlation Analysis: Results

Canonical correlation analyses revealed that fixation durations, saccade amplitudes, and their interactions with time of viewing were reliably related to spatial positions, visual features, and the interaction of visual features with positions in all three tasks. In all four canonical components, a significant amount of variance in eye-movements was explained by visual features and spatial locations (see Supplementary Materials, Figures S1–S12). To contrast the proportions of variance explained by each visual feature in each task, we calculated the partial r2 for each feature and its interaction with positions in each component and then summed them across the four components, as explained in the Method section above. The results are shown in Figure 5, which contrasts the proportion of variance in eye-movements explained by each of the visual features, indicated by different colors, between tasks.

Eye-movements were related to visual features more strongly in the search task than in both the memory task (Fisher's Z = 6.00, p < .001)7 and the preference task (Z = 7.77, p < .001). Specifically, saturation (Z = 6.85, p < .001), entropy (Z = 8.57, p < .001), and saliency (Z = 3.48, p = .011) explained more of the variance in eye-movements in the search task compared to the preference task. In addition, saturation (Z = 3.04, p = .050) and entropy (Z = 5.07, p < .001) were more related to eye-movements in the search task than in the memory task. Although the total explained variance of eye-movements by visual features was not found to differ between the preference and memory tasks, the relative contributions of the different features were different. Saturation (Z = 3.81, p = .002) and entropy (Z = 3.50, p = .011) were more predictive in the memory task, and hue (Z = 6.85, p < .001) was more predictive of eye-movements in the preference task.

7 All p values are Bonferroni corrected for 21 comparisons, yielding an alpha of 0.0024.
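
The comparisons above rely on Fisher's r-to-z transformation. As a rough illustration of that kind of test only (the correlations and sample sizes below are placeholders, and the authors' exact inputs may differ), a comparison of two independent correlations could look like this:

```matlab
% Fisher r-to-z comparison of two independent correlations (illustrative sketch;
% all numbers are placeholders, not values from the study).
r1 = 0.45; n1 = 3000;                 % e.g., relationship strength in the search task
r2 = 0.35; n2 = 3000;                 % e.g., relationship strength in the memory task

z1 = atanh(r1);  z2 = atanh(r2);      % Fisher transformation
se = sqrt(1/(n1 - 3) + 1/(n2 - 3));   % standard error of the difference
Z  = (z1 - z2) / se;                  % test statistic
p  = 2 * (1 - normcdf(abs(Z)));       % two-tailed p value (Bonferroni correction applied afterward)
fprintf('Z = %.2f, p = %.4g\n', Z, p);
```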

Classification Analysis: Methods

Classifiers. Throughout this study, we used four distinct classifiers: linear discriminant (LD), quadratic discriminant (QD), linear Gaussian naïve Bayesian classifier (GNB-L), and nonlinear Gaussian naïve Bayesian classifier (GNB-N), as our previous work has indicated that the differences in the assumptions of these classifiers can lead to different levels of performance when using eye-tracking variables to predict behavioral tasks (Kardan, Berman, et al., 2015). Implementation of the classifiers was provided by the classify function in the Statistics Toolbox in MATLAB (the classifier type was set to linear, quadratic, diagLinear, and diagQuadratic, respectively). These classifiers use a multivariate Gaussian distribution to model the classes and classify a vector by assigning it to the most probable class.

The LD classification model contains an assumption of homoscedasticity, that is, that all classes are sampled from populations with the same covariance matrix. For our purposes, this assumption means that (a) the variance of each feature does not change across tasks, and (b) the correlation between each pair of features is the same for all tasks. The QD makes no such assumption and instead estimates the covariance matrices separately for each class (that is, the variances of, and the correlations between, features are allowed to differ across tasks). The GNB classifiers impose a constraint that the covariance matrices are diagonal (in our case, this implies that the eye-movement features are uncorrelated); furthermore, linear GNB makes a further assumption of homoscedasticity, whereas nonlinear GNB does not (that is, GNB-L assumes that the variance of each feature is the same across tasks, and GNB-N allows it to differ). Because the interactions between features are ignored by both versions of GNB, these two classifiers are univariate, whereas LD and QD are multivariate classifiers because they use the feature interactions to predict the class. Another distinction is between linear and nonlinear classifiers: the assumption of homoscedasticity is equivalent to separating the classes with a linear plane; otherwise the classes are separated with a nonlinear curved surface (therefore, both QD and GNB-N are nonlinear classifiers, whereas LD and GNB-L are both linear).
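
As a concrete illustration, the four variants can be obtained from MATLAB's classify by switching the type argument; the training and test data below are placeholders, and the real feature construction is described in the Method section.

```matlab
% The four discriminant / naive-Bayes variants via MATLAB's classify
% (illustrative sketch with placeholder data).
nTrain = 300; nTest = 50; nFeat = 15;
Xtrain = randn(nTrain, nFeat);                 % training feature vectors (one row per trial)
labels = randi(3, nTrain, 1);                  % task labels: 1 = search, 2 = memorize, 3 = preference
Xtest  = randn(nTest, nFeat);                  % held-out trials

types = {'linear', 'quadratic', 'diagLinear', 'diagQuadratic'};   % LD, QD, GNB-L, GNB-N
for t = 1:numel(types)
    pred = classify(Xtest, Xtrain, labels, types{t});
    fprintf('%-13s predicted %d test trials\n', types{t}, numel(pred));
end
```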


Figure 5. Aggregated results from the canonical components showing the explained variance of eye-movements (i.e., latent variables comprised of the fixation durations, saccade amplitudes, and their interactions with time of view) by each visual feature. Each color represents the partial r2 of one visual feature together with its interaction with the spatial positions. Asterisks accompanied by letters show significantly higher partial variance explained by the visual feature of the corresponding color and column (task) compared to that of other tasks (M = compared to memory, P = compared to preference). Black asterisks (top) compare the total variance explained between tasks. Numbers in the table correspond to the partial variance explained by each variable in each task. * p < .05. ** p < .001, with Bonferroni correction for 21 comparisons. See the online article for the color version of this figure.

Cross-validation procedure. All classifiers were evaluated using a cross-validation approach. A subset of trials was used to train the classifier, and the task was predicted for the trials that were not included in the training set. We used two approaches for separating the data into training and test sets. The first approach used within-participant classification, training and testing on data from the same participant in an iterative fashion. At each iteration, one trial from a single task was held out for testing, and the remaining trials (134 trials in total across all tasks) were used to train the classifier; this process was repeated until each trial had been tested.8 The second approach used across-participant classification, in which all trials for a particular participant were iteratively tested using all of the trials from the remaining 71 participants for training. Likewise, this process was iterated until all trials for all participants had been tested.

8 Leaving out 10% of trials (10-fold classification) yielded very similar results.
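
A minimal sketch of the within-participant leave-one-trial-out loop using classify, with placeholder data; the across-participant version instead holds out all trials of one participant and trains on the remaining 71 participants' data.

```matlab
% Leave-one-trial-out (within-participant) cross-validation sketch
% (placeholder data; feature construction omitted).
nTrials = 135; nFeat = 15;
X = randn(nTrials, nFeat);                 % one participant's trial-level features
y = kron((1:3)', ones(45, 1));             % 45 trials per task

pred = zeros(nTrials, 1);
for i = 1:nTrials
    testIdx = false(nTrials, 1); testIdx(i) = true;    % hold out one trial
    pred(i) = classify(X(testIdx, :), X(~testIdx, :), y(~testIdx), 'diagQuadratic');
end
accuracy = mean(pred == y);                % chance level is 1/3
```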

Feature loadings. The relative importance of each of the stimulus-driven features in the classifications was evaluated by removing the feature and also regressing it out of all of the remaining variables (i.e., the residuals of the other variables entered the classification analysis). For the eye-movement characteristics, all the stimulus-driven features were first regressed out of fixation durations and saccade amplitudes for each task before entering the classifications (i.e., their residuals were used for the classification). The relative importance of each of the eye-movement characteristics was then evaluated by removing it, its interaction with time of viewing, and its standard deviation and skewness from all participants' data. After removal, we computed the difference in task prediction accuracy relative to prediction using the full set of features. This difference in accuracy was computed for all trials in the within-participant classification and for all 72 participants in the across-participant classification, and this accuracy drop was used as the variable's contribution to the classification.9
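
A compact sketch of this elimination procedure for a single visual feature, with placeholder data and a diagonal-quadratic (GNB-N-style) classifier; the variable names are illustrative rather than the authors' own.

```matlab
% Contribution of one visual feature: remove it, regress it out of the remaining
% predictors, and measure the drop in cross-validated accuracy relative to the
% full feature set (illustrative sketch with placeholder data).
nTrials = 135; nFeat = 15; target = 3;              % index of the feature being evaluated
X = randn(nTrials, nFeat);                          % full feature set (one row per trial)
y = kron((1:3)', ones(45, 1));                      % task labels

f      = X(:, target);                              % the feature to be removed
others = X(:, setdiff(1:nFeat, target));
Xred   = zeros(size(others));
for j = 1:size(others, 2)                           % regress the feature out of every remaining variable
    b = [ones(nTrials, 1), f] \ others(:, j);
    Xred(:, j) = others(:, j) - [ones(nTrials, 1), f] * b;
end

pred = zeros(nTrials, 1);                           % leave-one-trial-out accuracy on the reduced set
for i = 1:nTrials
    idx = true(nTrials, 1); idx(i) = false;
    pred(i) = classify(Xred(i, :), Xred(idx, :), y(idx), 'diagQuadratic');
end
accReduced = mean(pred == y);
% Running the same loop on X gives accFull; the feature's contribution (loading)
% is the accuracy drop, accFull - accReduced.
```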


Classification Analysis: Results

Within-participants. All classifiers performed significantly better than chance in predicting the visual task performed within participants (see Table 1). As can be seen in the boxplots in Figure 6a, the distribution of prediction accuracies for all participants was always better than chance (except for GNB-L), with the nonlinear GNB classifier having the best performance. Confusion matrices10 in Panel (b) of Figure 6 show that the GNB-N classifier achieved the best average accuracy and had the best specificity in differentiating all three tasks from each other equally well. Panel (c) of Figure 6 shows the feature loadings for the GNB-N classifier (see the Feature Loadings section under Method). The results show that attended visual features signal task-goal differences, with fixation saturation, entropy, edge density, and saliency contributing the most, and hue and brightness also contributing, in distinguishing the goal of observers based on their own behavior (within-participant). Eye-movement characteristics also signal goal differences, even with all the stimulus-driven features regressed out of them. Recall that when a visual feature was excluded from classification, its interaction with time of viewing was also removed. Therefore, each feature's loading includes the contribution of that feature's interaction with time of viewing as well.

Across-participants. All classifiers performed significantly better than chance in predicting the visual tasks across participants as well (see Table 2). However, as one would expect, performance of the classifiers was worse compared to the within-participant classifications, due to between-participants heterogeneity (see Kardan, Berman, et al., 2015). Figure 7a shows that the distribution of prediction accuracies for all participants was always better than chance only for LD. Confusion matrices also show that LD had the best specificity in differentiating all three tasks from each other equally well. Panel (c) of Figure 7 shows the feature loadings in the LD classification across participants. Again, the results showed that attended visual features signaled task-goal differences, with fixation edge density and saliency contributing the most, and entropy also contributing, in distinguishing the goal of observers based on the other participants' eye-movement behavior (i.e., across-participants). Compared to the within-participant analysis, color preference (hue) seems not to generalize across participants in distinguishing their goal at hand. Here again, eye-movement characteristics such as saccade amplitudes, fixation durations, and the number of fixations also signal goal differences (even with all the stimulus-driven features regressed out of them).

Discussion

In this study, we showed that an observer's visual task (at least when the tasks are as distinct as search, memorization, and aesthetic judgment) systematically changes the observer's eye-movements and the visual features they fixate on through the duration of the viewing task (i.e., over the time course of the task). The visual goal an observer pursues places the observer in different mindsets, and this moderates how visual features contribute to their eye-movement behavior. We provide evidence for this in two ways. Directly, we found that locations with higher entropy and salience (containing more visual information), as well as more saturated colors, relate to eye-movements more when performing a visual search task, but these relationships are reliably smaller (sometimes much smaller) when people have other visual goals, such as memorizing a scene or judging the aesthetic beauty of a scene. This finding has an immediate implication for the modeling of overt attention: salience-based attention models do not tell the whole story if the context/mental state of the viewer is not considered (e.g., see Parkhurst, Law, & Niebur, 2002). This is because salience's (as well as some other stimulus-driven features') systematic relevance to overt attention can vary between tasks (e.g., salience is not as important for aesthetic judgment as it is for the visual search task). Indirectly, using classification analysis we found that differences in attended locations with regard to local edge density, entropy, and salience were related to goal differences across individuals.

We argue that the task-dependency of the size of the relationships between visual features and eye-movements shows a moderated effect (i.e., an interaction between the visual task and the visual environment, as shown by path c in Figure 1). The assumption here is that the change in the relationship between stimulus-driven features and eye-movements (the change in a) is directly due to the task (arrow c) and is not mediated by the eye-movements, which one might argue against. This argument is based on the possibility that arrow a is bidirectional, and hence the observed change in its magnitude based on the task is not a direct moderation of the forward a path (i.e., from image features to eye-movements), but is caused by a mediation of eye-movements through a backward a path (from eye-movements to image features). This appears to be a possibility at first glance because the canonical analysis that is used to measure a is a covariance-based analysis that has no directionality (it is bidirectional).

Table 1
The Statistics for t Tests Between Classification Accuracies of Within-Participant Task Classification and Chance

9 Another way to look at the feature loadings is to introduce only one visual feature and its interaction with time, and examine the classification's prediction accuracy versus chance. This method gives inflated contributions for each visual feature because features are correlated, and the contributions are not unique to singular features when adding features in isolation. However, this method provides additional information about each feature's contribution to successful classification when compared with the elimination method. The results from this method are in Figures S26 and S27 of the Supplementary Materials.

10 Confusion matrices show the probability of true positive (diagonal) and false positive (off-diagonal) predictions. Higher values on the diagonal and lower values off the diagonal show better performance. More similar values on the diagonal show that the average accuracy is not driven by exceptional performance in detecting one specific task.

Classifier    Mean accuracy    t stat    df    p
LD            64.5%            21.59     71    <.001
QD            63.1%            25.89     71    <.001
GNB-L         60.7%            17.90     71    <.001
GNB-N         69.6%            30.59     71    <.001

Note. LD = linear discriminant; QD = quadratic discriminant; GNB-L = linear Gaussian naïve Bayes; GNB-N = nonlinear Gaussian naïve Bayes. All classifiers performed well above chance (.33).


Figure 6. (a) Performance of the four classifiers for within-participant classifications. Accuracy of each classifier when predicting the task is shown on the y-axis. The dot represents the median of classification performance of all participants. The box represents the middle two quartiles of classification performance and the whiskers represent the full range of classification performance. Outliers (±3.5 SD) are marked with a red +. The dashed line indicates the level of chance prediction (0.333 or 33.3%). (b) Confusion matrices of classifications, showing the probability of false and true positive predictions. Higher values (hotter colors) on the diagonal and lower values off the diagonal show better performance, and more similar values on the diagonal show that the average accuracy is not driven by exceptional performance in detecting one specific task. (c) Isolated contribution of each of the visual and eye-movement features to successful prediction of task within participants, as indexed by the accuracy drop due to removing the feature and its interaction with time from the classification, as well as regressing it out of all other features. Saccade (Sac.) amplitude, fixation duration (Fix.Dur), and number of fixations (Num.FIX) were independent of visual features because all visual features were regressed out of them in the classification analyses. LD = linear discriminant; QD = quadratic discriminant; LGNB = Linear Gaussian Naive Bayes; NLGNB = Non-linear Gaussian Naive Bayes. See the online article for the color version of this figure.

This alternative explanation (mediation) assumes that the specific changes in the distributions of saccades and fixation durations (which comprise the eye-movements latent variable) based on the task (path b) have consequently caused the visual features to change (hence causing a change in path a due to the mediation by eye-movements).

To test this possibility, we took the scan-paths from the search task, which were made on Images 1, 2, 3, . . ., 135, assigned them to a random permutation of the images, and recalculated the visual features at those fixations. We performed the same operation for the memorization task and the preference task, and ran the same canonical analyses, relating the newly calculated visual features to the eye-movement variables in each task (which have the same distributions and time interactions as before).
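
A schematic of this scan-path-shuffling control, with placeholder feature maps and fixations; the original analysis used the Gaussian masking described in the Method, not the crude point sampling shown here.

```matlab
% Scan-path shuffling used in the moderation-vs-mediation check (illustrative
% sketch with placeholder data): fixations recorded on each image are reassigned
% to a randomly chosen image and the local feature values are re-sampled there.
nImages = 135;
featureMaps = arrayfun(@(k) rand(600, 800), 1:nImages, 'UniformOutput', false);   % placeholder maps
fixations   = arrayfun(@(k) [randi(800, 20, 1), randi(600, 20, 1)], 1:nImages, ...
                       'UniformOutput', false);                                   % 20 [x y] fixations per image

perm = randperm(nImages);                              % random image-to-scan-path reassignment
shuffled = cell(1, nImages);
for img = 1:nImages
    fix = fixations{img};                              % fixations originally made on image "img"
    map = featureMaps{perm(img)};                      % feature map of the randomly assigned image
    idx = sub2ind(size(map), fix(:, 2), fix(:, 1));    % point sampling at the old fixation locations
    shuffled{img} = map(idx);                          % recomputed "local" feature values
end
% The canonical analyses are then rerun with these shuffled feature values,
% separately for each task, keeping the eye-movement variables unchanged.
```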


Table 2
The Statistics for t Tests Between Classification Accuracies of Across-Participant Task Classification and Chance

Classifier    Mean accuracy    t stat    df    p
LD            58.9%            22.01     71    <.001
QD            58.2%            14.17     71    <.001
GNB-L         53.1%            13.54     71    <.001
GNB-N         54.8%            12.17     71    <.001

Note. LD = linear discriminant; QD = quadratic discriminant; GNB-L = linear Gaussian naïve Bayes; GNB-N = nonlinear Gaussian naïve Bayes. All classifiers performed well above chance (.33).

If we found that the relationship between the newly calculated visual features and eye-movements still systematically changed by task, the mediation hypothesis would be supported. On the other hand, if the change in a due to task is dependent on the visual features to begin with (hence an interaction between task and visual features), this procedure would remove all the differences in path a between the tasks and make a zero for every task, because the moderation effect is dependent on path a; by randomizing the scan-paths over different images, the interaction between a and b becomes zero as well. We performed this analysis, and the results showed that all relationships between visual features and eye-movements were removed similarly across tasks (see Figure S13 in the Supplementary Material). This means that any task-dependent changes in the eye-movements that are correlated with the visual features, as we found in our results, have to be due to a mechanism that is dependent on the visual features to begin with (i.e., an interaction between the task type and the visual features). Therefore, the differences in path a based on the tasks are not caused by changes in the distributions of fixation durations, saccade amplitudes, and their interactions with time, but rather are caused by the task interacting with the visual features.

Importantly, our results not only support the model we proposed by providing evidence for the a (saliency hypothesis) and c pathways, they also support the b pathway (cognitive relevance hypothesis). The b pathway represents a direct and independent relationship between the top-down context of the viewer and gaze control. To show this, in the classification analyses we removed any linear relationship that existed between the low-level visual features and the eye-movements by regressing the visual features out of fixation durations and saccades before introducing them into the classification analyses. Feature loadings for both within- and across-participant classifications showed that even after removing the possible effects of local visual features on eye-driven characteristics within each task, the distinction of these characteristics between different tasks still systematically signaled the observer's visual goal. Hence, the visual goal can, independent of low-level properties of the visual environment, change eye-movement behavior (in addition to its interactive effect). This finding provides strong support for the cognitive relevance theory. An alternative, but less likely, account would be that there are other local low-level visual features, independent of all of the six visual features that we measured, that guide the eyes differently for the three visual tasks and would account for all of this direct contribution we found.


Our second finding was related to the importance of each visual feature in guiding eye-movements across the different tasks, and to determining which features are more idiosyncratic (i.e., subject to individual differences) and which are more generalizable. Canonical correlation analyses showed that visual features seem to be more tightly relevant to eye-movements in the search task compared to the memorization and aesthetic judgment tasks. Specifically, locations with higher "information" density in the scene (higher entropy and salience) tend to be more relevant to overt attention when the visual search task is performed. This could be the result of the time-limited nature of the visual search task (i.e., one only has 8 s to find the target), which may make a place with denser information more carefully assessed to avoid having to return to it again. Tasks with no time-dependent goals, such as aesthetic judgment, on the other hand, could be performed more arbitrarily, with little or no cost of "missing something." It is a bit surprising that we did not find a larger systematic relationship between eye-movements and low-level visual features in the scene memorization task compared to aesthetic judgment because, from an information-processing point of view, it is cost-effective to look for "denser information" when trying to encode a "summary" of the scene. However, it could be the case that, for memorization, strategies that rely on the higher-level semantic content of the scenes (relationships between the objects, social content, etc.) are much more effective. Those strategies may be more diverse and idiosyncratic, and low-level features such as salience may not be able to capture them (Birmingham, Bischof, & Kingstone, 2009; also see Khosla, Raju, Torralba, & Oliva, 2015). Finally, attended colors in the scene could also become relevant or irrelevant based on the goal of a task in a systematic way. This is inferred from local hue's relationship with overt attention being relatively large in the preference task but not in the other two tasks.

In our previous research (Kardan, Berman, et al., 2015), we found that saccades and fixations were most prone to changes across viewing time when performing visual search, moderately prone to changes across viewing time in the memory task, and most homogenous for aesthetic judgments. Thus, we inferred that adaptive alterations in eye-movement behavior were more drastic in search, more modest in memorization (with a less immediate goal), and relatively small in preference judgments over the time course of scene viewing. Those results suggested that the gaze control system could be modeled as a control system with feedback, where the gain on the feedback component changes based on the task or cognitive goal. When the gain of the feedback component of the gaze control system is turned up and is comparable to the gain of the feed-forward component of the system, more temporally and spatially drastic eye-movements are likely to happen (as in the search task), whereas a smaller gain on the feedback component (e.g., aesthetic judgment) makes eye-movements less variable (Kardan, Berman, et al., 2015). Interestingly, the current study's results could point to a similar process.
The moderation of the relationship between the stimulus-driven visual features and eye-movements (the c pathway) could be due to the same process that modulates the gain of the feedback in the model of the gaze control system just described, based on our previous work (Kardan, Berman, et al., 2015).


Figure 7. (a) Performance of the four classifiers in across-participant classifications. Accuracy of each classifier when predicting the task is shown on the y-axis. The dot represents the median of classification performance of all participants. The box represents the middle two quartiles of classification performance and the whiskers represent the full range of classification performance; outliers (±3.5 SD) are marked with a red +. The dashed line indicates the level of chance prediction (0.333 or 33.3%). (b) Confusion matrices of classifications, showing the probability of false and true positive predictions. Higher values (hotter colors) on the diagonal and lower values off the diagonal show better performance, and more similar values on the diagonal show that the average accuracy is not driven by exceptional performance in detecting one specific task. (c) Isolated contribution of each of the visual and eye-driven features to successful prediction of task across participants, as indexed by the accuracy drop due to removing the feature and its interaction with time from the classification, as well as regressing it out of all other features. Saccade (Sac) amplitude, fixation duration (FIX.Dur), and number of fixations (Num.FIX) were independent of visual features because all visual features were regressed out of them in the classification analyses. LD = linear discriminant; QD = quadratic discriminant; LGNB = Linear Gaussian Naive Bayes; NLGNB = Non-linear Gaussian Naive Bayes. See the online article for the color version of this figure.



In this study, we reduced the top-down cognitive context of the viewer to the goal of one of three possible tasks. This is probably the main limitation of our study, because many more variables could play a role in setting the "context" for the viewer. For example, familiarity/novelty of the scenes and their semantic content can put people in different levels of arousal or systematically change their motivations, which could interact with which features of the environment they attend to. Another example is the age or culture of the observer, which could potentially affect the top-down control of viewing. In future work, we aim to assess the model more comprehensively and to look for neural evidence for the different components of the proposed model.

In summary, these findings suggest that the cognitive state of the observer, as indexed by the instructed visual task, moderates the relationship between low-level features of the visual environment and the observer's overt attention.

References

Anderson, N. C., Ort, E., Kruijne, W., Meeter, M., & Donk, M. (2015). It depends on when you look at it: Salience influences eye movements in natural scene viewing and search early in time. Journal of Vision, 15, 9. http://dx.doi.org/10.1167/15.5.9

Andrews, T. J., & Coppola, D. M. (1999). Idiosyncratic characteristics of saccadic eye movements when viewing different visual environments. Vision Research, 39, 2947–2953. http://dx.doi.org/10.1016/S0042-6989(99)00019-X

Berman, M. G., Hout, M. C., Kardan, O., Hunter, M. R., Yourganov, G., Henderson, J. M., . . . Jonides, J. (2014). The perception of naturalness correlates with low-level visual features of environmental scenes. PLoS ONE, 9, e114572. http://dx.doi.org/10.1371/journal.pone.0114572

Birmingham, E., Bischof, W. F., & Kingstone, A. (2009). Saliency does not account for fixations to eyes within social scenes. Vision Research, 49, 2992–3000. http://dx.doi.org/10.1016/j.visres.2009.09.014

Borji, A., & Itti, L. (2013). State-of-the-art in visual attention modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35, 185–207.

Borji, A., & Itti, L. (2014). Defending Yarbus: Eye movements reveal observers’ task. Journal of Vision, 14, 29. http://dx.doi.org/10.1167/14.3.29

Buswell, G. T. (1935). How people look at pictures. Chicago, IL: University of Chicago Press.

Castelhano, M. S., Mack, M. L., & Henderson, J. M. (2009). Viewing task influences eye movement control during active scene perception. Journal of Vision, 9, 1–15. http://dx.doi.org/10.1167/9.3.6

Cerf, M., Harel, J., Einhäuser, W., & Koch, C. (2008). Predicting human gaze using low-level saliency combined with face detection. In D. Koller, D. Schuurmans, Y. Bengio, & L. Bottou (Eds.), Advances in neural information processing systems 21: Proceedings from Neural Information Processing Systems Conference (pp. 241–248). La Jolla, CA: NIPS.

Coco, M. I., & Keller, F. (2014). Classification of visual and linguistic tasks using eye-movement features. Journal of Vision, 14, 11. http://dx.doi.org/10.1167/14.3.11

Greene, M. R., Liu, T., & Wolfe, J. M. (2012). Reconsidering Yarbus: A failure to predict observers’ task from eye movement patterns. Vision Research, 62, 1–8. http://dx.doi.org/10.1016/j.visres.2012.03.019


Henderson, J. M. (2007). Regarding scenes. Current Directions in Psychological Science, 16, 219–222. http://dx.doi.org/10.1111/j.1467-8721.2007.00507.x

Henderson, J. M. (2013). Eye movements. In D. Reisberg (Ed.), The Oxford handbook of cognitive psychology (pp. 69–82). New York, NY: Oxford University Press. http://dx.doi.org/10.1093/oxfordhb/9780195376746.013.0005

Henderson, J. M., Brockmole, J. R., Castelhano, M. S., & Mack, M. (2007). Visual saliency does not account for eye movements during visual search in real-world scenes. In R. van Gompel, M. Fischer, W. Murray, & R. Hill (Eds.), Eye movements: A window on mind and brain (pp. 537–562). Oxford, United Kingdom: Elsevier. http://dx.doi.org/10.1016/B978-008044980-7/50027-6

Henderson, J. M., Chanceaux, M., & Smith, T. J. (2009). The influence of clutter on real-world scene search: Evidence from search efficiency and eye movements. Journal of Vision, 9, 32, 1–8.

Henderson, J. M., & Luke, S. G. (2014). Stable individual differences in saccadic eye movements during reading, pseudoreading, scene viewing, and scene search. Journal of Experimental Psychology: Human Perception and Performance, 40, 1390–1400. http://dx.doi.org/10.1037/a0036330

Henderson, J. M., Malcolm, G. L., & Schandl, C. (2009). Searching in the dark: Cognitive relevance drives attention in real-world scenes. Psychonomic Bulletin & Review, 16, 850–856. http://dx.doi.org/10.3758/PBR.16.5.850

Henderson, J. M., Shinkareva, S. V., Wang, J., Luke, S. G., & Olejarczyk, J. (2013). Predicting cognitive state from eye movements. PLoS ONE, 8, e64937. http://dx.doi.org/10.1371/journal.pone.0064937

Henderson, J. M., Weeks, P. A., Jr., & Hollingworth, A. (1999). The effects of semantic consistency on eye movements during complex scene viewing. Journal of Experimental Psychology: Human Perception and Performance, 25, 210–228. http://dx.doi.org/10.1037/0096-1523.25.1.210

Hotelling, H. (1936). Relations between two sets of variates. Biometrika, 28, 321–377. http://dx.doi.org/10.1093/biomet/28.3-4.321

Itti, L., & Koch, C. (2000). A saliency-based search mechanism for overt and covert shifts of visual attention. Vision Research, 40, 1489–1506. http://dx.doi.org/10.1016/S0042-6989(99)00163-7

Itti, L., Koch, C., & Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20, 1254–1259. http://dx.doi.org/10.1109/34.730558

Johnson, R. A., & Wichern, D. W. (1992). Applied multivariate statistical analysis (Vol. 4). Englewood Cliffs, NJ: Prentice Hall.

Kardan, O., Berman, M. G., Yourganov, G., Schmidt, J., & Henderson, J. M. (2015). Classifying mental states from eye movements during scene viewing. Journal of Experimental Psychology: Human Perception and Performance, 41, 1502–1514. http://dx.doi.org/10.1037/a0039673

Kardan, O., Demiralp, E., Hout, M. C., Hunter, M. R., Karimi, H., Hanayik, T., . . . Berman, M. G. (2015). Is the preference of natural versus man-made scenes driven by bottom-up processing of the visual features of nature? Frontiers in Psychology, 6, 471. http://dx.doi.org/10.3389/fpsyg.2015.00471

Kardan, O., Gozdyra, P., Misic, B., Moola, F., Palmer, L. J., Paus, T., & Berman, M. G. (2015). Neighborhood greenspace and health in a large urban center. Scientific Reports, 5, 11610. http://dx.doi.org/10.1038/srep11610

Khosla, A., Raju, A. S., Torralba, A., & Oliva, A. (2015). Understanding and predicting image memorability at a large scale. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2390–2398). Retrieved from http://www.cv-foundation.org/openaccess/content_iccv_2015/html/Khosla_Understanding_and_Predicting_ICCV_2015_paper.html

Koch, C., & Ullman, S. (1985). Shifts in selective visual attention: Towards the underlying neural circuitry. Human Neurobiology, 4, 219–227.


Land, M., & Tatler, B. (2009). Looking and acting: Vision and eye movements in natural behaviour. New York, NY: Oxford University Press.

Mills, M., Hollingworth, A., Van der Stigchel, S., Hoffman, L., & Dodd, M. D. (2011). Examining the influence of task set on eye movements and fixations. Journal of Vision, 11, 17. http://dx.doi.org/10.1167/11.8.17

O’Connell, T. P., & Walther, D. B. (2015). Dissociation of salience-driven and content-driven spatial attention to scene category with predictive decoding of gaze patterns. Journal of Vision, 15, 20. http://dx.doi.org/10.1167/15.5.20

Parkhurst, D., Law, K., & Niebur, E. (2002). Modeling the role of salience in the allocation of overt visual attention. Vision Research, 42, 107–123. http://dx.doi.org/10.1016/S0042-6989(01)00250-4

Treisman, A. M., & Gelade, G. (1980). A feature-integration theory of attention. Cognitive Psychology, 12, 97–136.

Walther, D., & Koch, C. (2006). Modeling attention to salient proto-objects. Neural Networks, 19, 1395–1407. http://dx.doi.org/10.1016/j.neunet.2006.10.001

Wolfe, J. M. (1994). Visual search in continuous, naturalistic stimuli. Vision Research, 34, 1187–1195. http://dx.doi.org/10.1016/0042-6989(94)90300-X

Yarbus, A. L. (1967). Eye movements and vision. New York, NY: Springer. http://dx.doi.org/10.1007/978-1-4899-5379-7_8

Received September 15, 2015
Revision received January 11, 2016
Accepted February 8, 2016

