Visual synchrony affects binding and segmentation in ... - CiteSeerX

27 downloads 0 Views 137KB Size Report
Neil Blue and Andrew Found for software assistance, and David Vernon and Zeri James for help in running experiments. Correspondence and requests for ...
Visual synchrony a ects binding and segmentation in perception Marius Usher and Nick Donnelly Department of Psychology University of Kent at Canterbury Kent, CP2 7NP, UK

Published in Nature, 394, 9 July, 1998, pp179-182

The visual system analyses information by decomposing complex objects into simple components (visual features) widely distributed across the cortex [1, 2] When several objects are simultaneously present in the visual eld, a mechanism is required to group (bind) together visual features belonging to each object and to separate (segment) them from features of other objects. An attractive scheme for binding visual features into a coherent percept consists of synchronising the activity of their neural representations [3, 4, 5, 6]. If synchrony plays a major role in binding, one should expect that grouping and segmentation are facilitated in visual displays that induce stimulus dependent synchrony by temporal manipulations. Here we show that visual grouping is indeed facilitated when elements of one percept are presented simultaneously and are temporally separated (on a scale below the integration time of the visual system [7]) from elements of another percept or from background elements. The experimental results indicate that the e ect is due to a global mechanism of grouping caused by synchronous neural activation and not to a local mechanism of motion computation. 1

Despite compelling neurophysiological support for the synchrony-binding hypothesis from animal studies (for review see [8, 9]), the rst psychophysical studies on humans which tested the e ects of temporal manipulations on visual binding did not provide positive results [10, 11, 12]. Recently however, a new series of studies has demonstrated [13, 14, 15] that human subjects are able to perform texture discrimination when texture and background elements were spatially identical but presented in a di erent temporal phase (i.e., on the basis of temporal information only). While this might provide the rst psychophysical con rmation of the use of temporal synchrony for visual binding in humans, an alternative explanation needs to be ruled out. Texture discrimination can be computed either on the basis of grouping (of the elements within each texture separately) or on the basis of local gradients at texture boundaries [16]. In this case, one could argue that such gradient computations at boundaries generate a (possibly implicit) motion signal, that partially mediates the e ect. To rule out such an explanation, two experimental paradigms (different from texture discrimination) were used in the present study, in an attempt to, i) engage global grouping and segmentation processes but not local boundary computations, ii) demonstrate a grouping mechanism that is independent of the computation of motion (even if only implicit) and, iii) demonstrate that grouping by temporal asynchrony interacts with spatial information thus excluding a statistical decision based on two independent sources of information. The rst set of experiments tested grouping within a symmetric square lattice display (Figure 1A), which is typically perceived as either rows or columns [17, 18]. Three types of displays were randomly presented in a mixed design (see Figure 1A), and subjects were required to perform a forced choice about their perception of the lattice (rows or columns). In condition one, all the elements were ashed on and o together (synchronous trials); in condition two, at each successive time cycle, alternating rows of the lattice were ashed on together (asynchronous-row trials); in condition three, alternating columns were ashed on together (asynchronous-columns trials). The temporal asynchrony in all the experiments reported was 16 ms which corresponds to a total time cycle of 32 ms. This asynchrony 2

resulted in a perfectly steady lattice percept with no icker or motion reported by subjects. As expected, decisions on synchronous trials were evenly divided between rows and columns (due to the symmetry of the square lattice). However, on asynchronous displays, the temporal structure a ected subjects' perception. The probability for choosing rows or columns consistent with the temporal manipulation is shown in Figure 1B, and is much larger than chance (one sample t[7] = 35:3; p < :001). This was so although, (i) display times were so short that eye movements could not be made, (ii) the temporal asynchrony was much lower than that which can be detected in temporal discrimination tasks using similar displays [19]. A second result that is of interest is that the probability of correct detection on asynchronous trials increases with contrast (F (1; 7) = 34:76; p < :01) but decreases with increasing display duration (see Figure 1B; F (2; 14) = 6:84; p < :01). While the e ect of display duration is opposite to that predicted by Bloch's law, it can be easily explained by a simple model which shows that short displays (that result in transient activation) are temporally more distinctive than longer displays (that engage a steady activation pattern), as shown in Figure 1D. Finally, we nd a strong trend for better performance with circles than crosses (F (1; 7) = 5:3; p = :055). This might suggest that the impact of stimulus induced synchrony is larger when internally induced synchronisation does not dominate; internal synchronisation is likely to be stronger for crosses than for non-oriented elements (circles), due to the lateral connections between cells with similar orientation preference within V1 [20, 21]. A similar nding was reported in [14], where it was shown that the tendency to choose a target based on temporal grouping is diminished when its percept con icts with another percept based on spatial grouping. Insert Figure 1 about here Although the subjects reported no motion in the lattice display, we attempted to rule out an explanation of the previous results based on an implicit motion computation (vertical oscillatory `motion' of rows and horizontal oscillatory `motion' of columns). To achieve 3

this, another group of subjects performed an experiment using the same displays, but was required to make a three alternative forced choice (3AFC) between rows (or `moving' vertically), columns (or `moving' horizontally) and synchronous (or `not moving') responses. If motion was the mechanism mediating performance on this task, one should expect more confusions between types of asynchronous (`moving') displays than between synchronous and asynchronous displays, as studies of motion perception have shown that the threshold for discriminating between two orthogonal directions is higher than the threshold for detecting motion [22] (the thresholds are close to equal only for directions with angles larger than 120 deg.). The results (see Figure 1c), show that subjects performed much better than chance (Pcorrect = :59  :06(SE ); Pchance = :33, one sample t(7) = 4:03; p < :005). Most importantly, a comparison of error types between conditions demonstrates that subjects are much more likely to confuse synchronous (`non-moving') with either of the asynchronous (`moving') displays, than to confuse between asynchronous displays themselves (F (1; 7) = 21:28; p < :01). Furthermore, the stimulus-response matrix (averaged across subjects and durations; shown in Table 1) demonstrates that the tendency to confuse synchronous and asynchronous stimuli is not the result of a moderate response bias for generating rows or columns responses relative to synchronous responses overall (1.02 and 1.12 versus .87). When either row or column displays were presented, and errors made, they were more likely to involve synchronous responses than the opposing asynchronous response (.23 and .18 relative to .15); this is opposite from that expected on the basis of a response bias alone (that should favour row/column responses). Therefore, the pattern of confusions contradicts the predictions of a motion mechanism that predicts better motion detection than motion discrimination. These results strongly suggest that grouping (in rows/columns) and not implicit motion is the mechanism responsible for the e ect. In a second type of experiment, we tested the e ect of temporal asynchrony on a task that required subjects to group and detect quasi-collinear line elements among a background of randomly oriented line elements using a four alternative force choice (4AFC) task (illustrated in Figure 2A). Note that this is not a texture discrimination task, since 4

elements within the background are not uniformly oriented and the `target' does not enclose a surface. Fast grouping on the basis of spatial information only has been demonstrated for such a task [23]. As shown in Figure 2B, subjects performed much better when target and background elements were ashed cyclically and asynchronously at a time lag of 16 ms (collinear-asynchronous condition), than in the collinear-synchronous condition (F (1; 4) = 10:6; p < :05). This demonstrates that a short temporal asynchrony of 16 ms facilitates the binding of elements within the target and their segmentation from background elements. Figure 2C shows that performance drops to chance level (25%), at the left most data point (one sample t(9) = 1:83; p > :1), when the same targets elements (with the same spatial locations) have collinearity removed by randomising their orientations so that they are distinguished from background only by the 16 ms asynchrony (random-asynchronous condition). Since the random-asynchronous condition should provide the same motion signal as the original display (because the spatial-temporal di erence between target and background elements are statistically the same), this demonstrates that the facilitation e ect is not due to an implicit motion detection (according to which some facilitation should have been found in the random-asynchronous condition). A replication of this study, using a within-subject design (8 subjects), and increasing the temporal resolution of the display to 75 Hz (13 ms.) provided similar results. For each subject, we compared performance in the collinear-asynchronous condition with the performance predicted from a multinomial model based on the independent probabilistic summation of spatial and temporal information and guessing (with chance probability :25, corresponding to 4AFC); computed aacording to: Pind = Pspatial +(1 ; Pspatial )Ptemporal + :25(1 ; Pspatial )(1 ; Ptemporal ); where Pspatial and Ptemporal are estimated from: Prandom;async = Ptemporal + :25(1 ; Ptemporal ), Pcolinear;sync = Pspatial + :25(1 ; Pspatial ). For each subject, the performance in the collinear-asynchronous condition was higher than that predicted on the basis of independent probabilistic summation; the average di erence was .1 (one sample t(7) = 7:32; p < :001). The results of the grouping by collinearity experiments indicate that the temporal informa5

tion in our displays interacts with spatial information and is not combined independently. This is consistent with the nding reported in [11, 14], showing that temporal synchrony between a random set of elements of a texture target and elements in the background does not interfere with texture segregation on the basis of spatial features. A common explanation [14] is that temporal grouping between elements takes place only when the target elements have compatible spatial alignments through which they can be grouped into a coherent percept (in [14] this takes place when the elements enclose a surface; in our second set of experiments it takes place when they can be grouped into a continuous curve). Insert Figure 2 The two sets of experiments, together with other recent studies [14, 15] suggests that a small temporal asynchrony, below the visual integration time scale, can have a direct effect on grouping. These results provide evidence for a synchrony-binding mechanism and support previous animal studies which demonstrated that a failure of neural synchrony correlates with visual binding de cits in strabismic amblyopic cats [24]. Nevertheless, further combined psychophysical and neurophysiological studies are required to test and reveal the nature of the synchrony-binding mechanism.

References [1] Van Essen D., C., (1985), "Functional organisation of primate visual cortex", in A. Peters & E. G. Jones (Eds.), A. Peters & E. G. Jones (Eds.), Cerebral Cortex, New-York: Plenum. [2] Tanaka K., & Saito Y. (1991). "Coding of visual images of objects in the inferotemporal cortex of macaque monkey. Journal of Neurophysiology, 66, 170-189. [3] Milner P. (1974), "A model for visual shape recognition", Psychological Review, 81, 521-535. [4] von der Malsburg C. (1981), "The correlation theory of brain function", Internal Report 81-2. Gottingen, Max Plank Institute for Biophysical Chemistry. Reprinted in: Domany 6

E., van Hemmen J., L., & Shulten K., (Eds.), "Models of Neural Networks II. Berlin: Springer, pp 95-119. [5] Crick F. & Koch C. (1990), "Towards a neurobiological theory of consciousness", Seminars in Neuroscience, 2, 263-275. [6] von der Malsburg C. (1995), "Binding in models of perception and brain function". Current Opinion in Neurobiology, 5, 520-526. [7] Colheart M (1980), "Iconic memory and visible persistence", Perception and Psychophysics, 27, 183-228. [8] Singer W. & Gray, C. M. (1995), "Visual feature integration and the temporal correlation hypothesis", Annual Review of Neuroscience, 18, 555-349-374. [9] Singer W., Engel A. K., Kreiter A. K., Munk M. H., J. (1997), "Neuronal assemblies: necessity, signi cance and detectability", Trends in Cognitive Science, 1, 252-261. [10] Keele S. W., Cohen A., Ivry R., Liotti M. & Lee P. (1988), "Tests of a temporal theory of attentional binding", Journal of Experimental Psychology: HPP, 14, 444-452. [11] Kiper D. S., Gegenfurtner K. R., & Movshon A. (1996), "Cortical oscillatory responses do not a ect visual segmentation", Vision Research, 36, 539-544. [12] Fahle M. & Koch C. (1995), "Spatial displacement, but not temporal asynchrony, destroys gural binding", Vision Research, 35, 491-494. [13] Fahle M. (1993) "Figure ground discrimination for temporal information", Proceedings of the Royal Soc. London, B, 254, 199-203. [14] Leonards U., Singer W., Fahle M. (1996), "The in uence of temporal phase di erences on texture segmentation" Vision Research, 36, 2689-2697 [15] Leonards U. & Singer W. (1998), "Two segmentation mechanisms with di erential sensitivity for colour and luminance contrast", Vision Research, 38, 101-109. [16] Beck J. (1982), "Textural segmentation", In J. Beck (Ed.), Organization and representation in perception, Hilsdale, NY: Erlbaum. 7

[17] Gregory R., L. (1997), "Eye and brain: the psychology of seeing" (3rd ed), London, Weidenfeld and Nicolson. [18] Barchilon Ben-Av M., & Sagi D., (1995) \Perceptual grouping by similarity and proximity: Experimental results can be predicted by intensity autocorrelations". Vision Research, 35, 853-866. [19] Oyama T & Yamada W (1978), "Perceptual grouping between successively presented stimuli and its relation to visual simultaneity and masking", Psychological Research, 40, 101-112. [20] Tso D., Y., Gilbert C. & Wiesel T. N, (1986), "Relationships between horizontal interactions and functional architecture in cat striate cortex as revealed by cross-correlation analysis", Journal of Neuroscience, 3, 1160-1170. [21] Lund J. S., Takashi Y. & Levitt J. B. (1993), "Comparison of intrinsic connectivity in di erent cortex of the cerebral cortex", Cerebral Cortex, 3, 148-162. [22] Ball K., Sekuler R., & Machamer J. (1983), "Detection and identi cation of moving targets", Vision Research, 23, 229-238. [23] Field D., J., Hayes A., & Hess R. F. (1993), "Contour integration by the human visual system: evidence for a local "association eld" ", Vision Research, 33, 173-193. [24] Roelfsema P. R., Konig, P., Engel A. K., Sireteanu R., & Singer W. (1994), "Reduced synchronisation in the visual cortex of cats with strabismic amblyopia", European Journal of Neuroscience, 6, 1645-1655. [25] Loftus G. R. , Busey T., A., & Senders J. W. (1993), \Providing a sensory basis for models of visual information acquisition", Perception & Psychophysics, 54, 535-554.

Acknowledgements. We want to thank Yoram Bonneh, Ernst Niebur and Dov Sagi for very helpful discussions, Neil Blue and Andrew Found for software assistance, and David Vernon and Zeri James for help in running experiments.

Correspondence and requests for materials should be sent to Marius Usher, Dept. of Psychology, University of Kent, CT2 7NP, UK, email:[email protected] 8

stimulus-R stimulus-C stimulus-S Total

response-R response-C response-S .62 (.07) .15 (.04) .23 (.04) .15 (.04) .68 (.07) .18 (.04) .25 (.04) .29 (.05) .46 (.07) 1.02 1.12 .87

Table 1. Stimulus-response matrix (means and SE across 8 subjects) for the 3AFC task.

The three stimuli, R (rows), C (columns), and S (synchronous), are shown as rows, and the three responses as columns. The values are normalised to the fractions of response generated for each stimulus, and thus the numbers in each row sum to unity.

9

Figure Captions Figure 1. E ects of temporal structure on grouping. a) Display used in Experiment 1.

A square lattice of elements (either lled circles, as shown in the gure, or crosses) was presented for a variable duration time in one of three conditions (rows, columns and synchronous, as described below). b) Probability of choosing rows/columns (8 subjects) consistent with the stimulus manipulation (for asynchronous trials), as a function of the total display time, two levels of contrast (low-"L" and high "H") and the two types of elements ( lled circles or crosses); chance level is 50%. c) Error rates (8 subjects) for a triple choice between rows (moving up-down) columns (moving left-right), and static (synchronous). confusions between synchronous and asynchronous-rows stimuli (HS), confusions between synchronous and asynchronous-columns (VS), and between rows and columns (VH); chance level is 66%. d, e) The e ect of total duration on discriminability, D, of the two consecutive inputs in the ASYNC condition. Inputs (I1 ; I2 ) and the activations of hypothetical visual detectors, a1; a2 (obtained by convolving the inputs with an alpha function, f (t) = t exp(; t ), with  =R 10 ms [25], are shown for a short (d) and a long (e) display. (D is computed according to Rja1ja;1ajdt2 jdt ). The elements of the lattice appear relatively bright on a grey background; two levels of L2 element brightness were used with relative contrast ( LL1; 1+L2 ) of :3 and :07. The temporal structure of the display involves three type of trials, randomized within a a mixed design: i) Synchronous trials in which all the elements are ashed together for a 16ms time interval followed by a blank screen on the following 16 ms interval. This 32 ms cycle is repeated for a total duration time of either 32; 96; 160 ms (SYNC condition), ii) asynchronous row trials and, iii) asynchronous column trials, where successive rows or columns of elements, respectively, are ashed in successive 16 ms intervals (ASYNC condition). Subjects were required to make a forced choice decision about their perception of the lattice. Auditory feedback was given after responses inconsistent with the stimulus condition.

10

Figure 2. E ects of temporal structure on the detection of collinear target elements presented within a noisy background. a) Example display (the target is in the right-bottom quadrant and vertically oriented). b) Probability for correct detection (5 subjects) as a

function of the duration of the display for two temporal structures: i) synchronous (s) and ii) asynchronous (a). c) Probability of detecting the target against the background (10 subjects), when the orientation of the target line elements is randomised (control condition). The total display time is 200 ms, and the cycle time of the asynchrony is either 16, 32 or 48 ms. In both tasks the chance level is 25%: A curved line segment is embedded within a background noise mask made of randomly oriented line elements using the algorithm presented in [23]. The line segment is made of seven elements and appears randomly in one of the four quadrants of the display. Subjects were required to perform a 4AFC task, choosing the quadrant where the line segment was shown. Stimuli were randomized according to the following 2 conditions: i) synchronous (target and background were ashed together for 16 ms, every 32 ms, as in Figure 1A SYNC condition), and ii) asynchronous (target and background were ashed in opposite phase, as in Figure 1A ASYNC condition). A control condition required detection when the target line elements were presented in the same positions as in the collinear condition but with their orientation randomized; under this condition the target is characterised by temporal infrormation only. Auditory feedback was provided after incorrect responses.

11

Suggest Documents