I. Ohzawa, G. DeAngelis, & R. Freeman --- Encoding of Binocular Disparity by Simple Cells
The Internet edition -- Note that this version is based on a pre-galley proof file. The formatting and other minor details are different from the published version. Please refer to the published article for citations. For our other on-line publications, please see our home page: http://totoro.berkeley.edu/
Encoding of Binocular Disparity by Simple Cells in the Cat's Visual Cortex
Izumi Ohzawa#, Gregory C. DeAngelis*, and Ralph D. Freeman
Group in Vision Science, School of Optometry, University of California, Berkeley, California 94720-2020
* Present address: Department of Neurobiology, Stanford University School of Medicine, Stanford, CA 94305-5401
# Reprint request
J. Neurophysiol. 75: 1779-1805, 1996
Abbreviated title: Encoding of Binocular Disparity by Simple Cells
Key words: binocular disparity, stereopsis, binocular vision, receptive field, cat, visual cortex
43 pages, 19 figures, 1 table
Summary and Conclusions
1. Spatiotemporal receptive fields (RFs) for left and right eyes were studied for simple cells in the cat's striate cortex to examine the idea that stereoscopic depth information is encoded via structural differences of RFs between the two eyes. Traditional models are based on neurons that possess matched RF profiles for the two eyes. We propose a model that requires a subset of simple cells with mismatched RF profiles for the two eyes in addition to those with similar RF structure.
2. A reverse correlation technique, which allows a rapid measurement of detailed RF profiles in the joint space-time domains, was used to map RFs for isolated single neurons recorded extracellularly in the anesthetized paralyzed cat.
3. Approximately 30% of our sample of cells shows substantial differences between spatial RF structure for the two eyes. Nearly all of these neurons prefer orientations between oblique and vertical, and are therefore presumed to be involved in processing horizontal disparities. On the other hand, cells that prefer orientations near horizontal have matched RF profiles for the two eyes. Considered together, these findings suggest that the visual system takes advantage of the orientation anisotropy of binocular disparities present in the retinal images.
4. For some cells, the spatial structure of the RF changes over the time course of the response (inseparable RF in the space-time domain). In these cases, the change is similar for the two eyes, and therefore the difference remains nearly constant at all times. Since the difference of the RF structure between the two eyes is the critical determinant of a cell's relative depth selectivity for the proposed model, space-time inseparability of RFs is not an obstacle for consistent representation of stereoscopic information.
5. RF parameters including amplitude, RF width, and optimal spatial frequency are generally well matched for the two eyes over the time course of the response. The preferred speed and direction of motion are also well matched for the two eyes. These results suggest that the encoding of motion-in-depth is not likely to be a function of simple cells in the striate cortex.
6. The results presented here are consistent with our model in which stereoscopic depth information is encoded via differences in the spatial structure of RFs for the two eyes. This model provides a natural binocular extension of the current notion of monocular spatial form encoding by a population of simple cells. Note, however, that our findings do not exclude the possibility that positional shifts of RFs also play a role in determining the disparity selectivity of cortical neurons.
Introduction
Visual systems of animals with frontally located eyes possess the remarkable ability to construct a perception of a three-dimensional world based on a pair of two-dimensional retinal images. The relative difference in position of these images provides the stereoscopic cue for the perception of three-dimensional space. Since Wheatstone's demonstration (1838) that minute differences in left and right images, which result from binocular parallax, provide very fine depth discrimination, this phenomenon has received extensive attention.

Electrophysiological studies of neural mechanisms of stereopsis were initiated by Pettigrew (1965) and Barlow et al. (1967). Although the existence of binocular neurons in the visual cortex had been established previously (Hubel and Wiesel 1962), Barlow and colleagues presented, for the first time, the notion that there are neurons in the primary visual cortex which respond selectively to stimuli positioned in space at various distances from the eyes (see Bishop and Pettigrew, 1986). By having a collection of these neurons covering visual space, it was proposed that the visual system could represent information about objects in depth. This notion is based on simple geometrical constraints of how images of objects in space are formed on the retinas of the two eyes (Joshua and Bishop, 1970).

To illustrate the geometrical requirements for RFs of a binocular neuron, consider the special case illustrated schematically in Fig. 1A. The figure shows the condition in which the animal directly fixates a vertical bar stimulus at position P. An image P' of the bar is formed on each retina via the eye's optics. Left and right images fall on retinal corresponding points. By definition, a stimulus whose image falls on retinal corresponding points has a binocular disparity of zero.
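As a rough numerical illustration of this geometry (not part of the original analysis), the disparity of a target on the midline can be computed as the difference between the angle that the target subtends at the two eyes and the angle subtended by the fixation point. The interocular separation and viewing distances below are hypothetical values chosen only for the example, and the function names are ours.

```python
import math

def vergence_angle_deg(distance_cm, interocular_cm):
    """Angle (deg) subtended at a midline point by the two eyes."""
    return math.degrees(2.0 * math.atan(interocular_cm / (2.0 * distance_cm)))

def disparity_deg(target_cm, fixation_cm, interocular_cm):
    """Disparity of a midline target relative to the fixation point.

    Zero at the fixation distance, positive for nearer targets
    (crossed disparity), negative for farther targets (uncrossed).
    """
    return (vergence_angle_deg(target_cm, interocular_cm)
            - vergence_angle_deg(fixation_cm, interocular_cm))

# Hypothetical example values (assumptions, not measurements).
INTEROCULAR_CM = 3.5
FIXATION_CM = 57.0

for target_cm in (40.0, 57.0, 80.0):
    d = disparity_deg(target_cm, FIXATION_CM, INTEROCULAR_CM)
    print(f"target at {target_cm:5.1f} cm: disparity = {d:+.3f} deg")
```

A target at the fixation distance yields zero disparity, which is the corresponding-points condition described above.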
For a binocular neuron to respond maximally to the stimulus at position P, the peak of sensitivity of both the left and right RFs must be located at corresponding points. Fig. 1B shows a schematic depiction of such a RF pair. If the bar is moved closer, to position Q, while fixation is maintained at the original position P, left and right images no longer fall on corresponding points (represented here by a shift in the right retinal image). Therefore, the cell of Fig. 1B will no longer respond effectively to the stimulus at position Q. Instead, the RF arrangement illustrated in Fig. 1C is required. For a more distant stimulus, yet another neuron will be needed with RFs offset in the opposite direction. This traditional encoding notion requires a set of neurons responsive to different disparities at each retinal location.

Fig. 1 The geometry of stereoscopic viewing is shown along with alternative models of disparity encoding. A: An observer views two vertical line targets P and Q and fixates on target P. Q lies nearer to the observer than P. The image of target P falls on each retina at a pair of points P' known as retinal corresponding points. For the left eye, the image of target Q falls at the same position as that of P but for the right eye, the image is laterally displaced. Displacement of images from corresponding points is a measure of binocular disparity (see text). B: A simple scheme is shown to create neurons that are selective to binocular disparity by matching the locations of the RFs (open rectangles) to where images land on the retinas. The neuron is activated most when the images of the targets fall exactly within the RF for both eyes. A neuron that is selective to target P must have its RFs at corresponding points. Note that, strictly speaking, a RF is an area defined in the visual field within which visual stimulation elicits a response. However, we may use it to refer also to the conjugate area of the retina on which the RF in object space is imaged. C: RFs must be laterally offset from corresponding points for a neuron to become selective to targets at distances nearer (as in this case) or farther than the plane of fixation. D: RFs of simple cells possess internal structures consisting of multiple subregions, a cross-section of which is shown by solid curves. Dashed curves indicate the envelopes of the RFs. Representation of RFs by rectangles, as in B and C, is clearly an oversimplification. A neuron with identical RFs at corresponding points for the two eyes is selective to targets with zero disparity. E: Another cell that has an offset of one RF is selective to disparity Dopt. The position model assumes that the whole RF is offset while maintaining an identical internal RF structure for the two eyes. F: We hypothesize that selectivity to the same non-zero disparity as in E may arise from a difference in internal structure of left and right RFs while their envelopes remain at corresponding points. This is the phase model.
Panel labels in the figure: A, viewing geometry (targets P and Q; left and right retinal images P' and Q'; corresponding points); B, Zero Disparity Cell; C, Crossed Disparity Cell; D, Zero Disparity Cell (Position & Phase Models); E, Crossed Disparity Cell (Position Model, Dopt); F, Crossed Disparity Cell (Phase Model, Dopt).

Early studies of neural disparity selectivity (Barlow et al. 1967; Nikara et al. 1968; Pettigrew et al. 1968; Joshua and Bishop, 1970; von der Heydt et al., 1978; Poggio and Fischer, 1977) characterized RFs in terms of minimum response fields. The minimum response field is a rectangular area of the visual field which is highly excitatory (Barlow et al. 1967). However, this description is clearly inadequate, because a typical RF cannot be represented accurately by a rectangular box. RFs of simple cells usually consist of multiple ON and OFF subregions of various strengths (Hubel and Wiesel, 1962; Schiller et al. 1976; Movshon et al. 1978a; Mullikin et al., 1984; Jones and Palmer, 1987a). More recent studies have included additional details of RFs (Maske et al., 1984; Ferster, 1981; LeVay and Voigt, 1988), but the basic notion that has been retained is that a neuron's relative depth preference is determined by the offset of left and right RF positions from the corresponding points (see Figs. 1D and 1E). The implicit assumption has been that the RF profiles for left and right eyes are the same.

There are problems with the basic premises of this scheme, which we designate here as the position offset model, or the position model. First, the assumption of identical left and right RFs has not been established firmly, despite claims to that effect made by Hubel and Wiesel (1962) and Maske et al. (1984). The former study noted the results of subjective determinations of the RF structure for the two eyes. The latter presents RF profiles obtained using bright and dark bars that are swept over the RF. However, no quantitative method was used to evaluate the similarity of the left and right RFs. In addition, the use of moving bar stimuli may distort the RF profile due to the effects of stimulating ON and OFF regions sequentially (e.g., the profiles obtained from forward and reverse sweeps are generally not identical; Casanova et al. 1992).

If RF structure of left and right eyes is not always identical, an alternative scheme of disparity representation, which has appealing attributes, becomes possible. In this case, selectivity to a variety of disparities may arise because the internal subregion structures of RFs are different for the two eyes while the RFs are centered at corresponding points on the two retinas. Fig. 1F depicts a RF arrangement for a cell that responds to a crossed-disparity stimulus (position Q in Fig. 1A). Note that the envelopes of the RFs (shown by dashed curves) are located exactly at corresponding points, unlike the position model shown in Fig. 1E. The selectivity to a specific disparity arises from the shift of the internal subregion structure.
This internal structure of simple cell RFs is known to be fit well by a Gabor function, which is the product of a Gaussian and a sinusoid (Gabor 1946; Marcelja, 1981; Daugman, 1985; Jones and Palmer, 1987a,b; DeAngelis et al. 1993a). The difference in subregion structure depicted in Fig. 1F corresponds to a phase shift of the sinusoidal component of a Gabor function. Hence, we refer to this scheme as phase-based disparity encoding, or in short, the phase model. For a cell that responds to a zero disparity stimulus, the RF configurations are the same for the phase model and the position model (see Fig. 1D). While the difference between the traditional position model and the phase model may appear to be minor, we will show that an important distinction should be made, and that the phase model can provide a unified understanding of the role of simple cells in the encoding of both monocular and binocular information.

Phase-based disparity representation is attractive because it can be integrated more smoothly than the position model with current notions of monocular spatial form representation. Form representation schemes often employ wavelet-like basis functions that respond to small ranges of visual space, spatial frequency, and orientation (Marcelja, 1982; Robson, 1983; Watson, 1983; Sakitt and Barlow, 1982; Geisler and Hamilton, 1986; Daugman, 1985). The exact form of the basis functions is not important here, but one of the critical requirements for encoding form without loss of information is the presence of at least two basis functions at each position, spatial frequency and orientation. If the basis functions are Gabor functions, then two of them that are 90° apart in phase will suffice. Since simple cell RFs come in a variety of monocular phases (Field and Tolhurst, 1986; Jones and Palmer, 1987a; DeAngelis et al., 1993a), the idea of using phase for encoding form and disparity is appealing.
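The distinction between the two schemes can be sketched in one spatial dimension, assuming a Gabor RF profile of the form exp(-x²/2σ²)·cos(2πfx + φ). The parameter values below (σ, f, and a 90° interocular phase difference) are illustrative assumptions, not measurements from this study, and the function names are ours. In the position model the right-eye RF is a displaced copy of the left-eye RF; in the phase model the envelope stays at the same location and only the carrier phase differs. For a phase difference Δφ, the equivalent displacement of the carrier is Δφ/(2πf).

```python
import math

def gabor(x, sigma, freq, phase, center=0.0):
    """1-D Gabor profile: Gaussian envelope times a cosine carrier."""
    u = x - center
    envelope = math.exp(-(u * u) / (2.0 * sigma * sigma))
    return envelope * math.cos(2.0 * math.pi * freq * u + phase)

def equivalent_shift(dphi, freq):
    """Carrier displacement (deg) equivalent to a phase difference dphi (rad)."""
    return dphi / (2.0 * math.pi * freq)

# Illustrative values only (assumed, not taken from the data).
SIGMA = 1.0          # envelope width, deg
FREQ = 0.5           # carrier spatial frequency, cycles/deg
DPHI = math.pi / 2   # interocular phase difference, rad (90 deg)

shift = equivalent_shift(DPHI, FREQ)

def left_rf(x):
    return gabor(x, SIGMA, FREQ, 0.0)

def right_rf_phase_model(x):
    # Envelope stays at the same position; only the carrier phase differs.
    return gabor(x, SIGMA, FREQ, DPHI)

def right_rf_position_model(x):
    # Identical internal structure; the whole RF is displaced by -shift.
    return gabor(x, SIGMA, FREQ, 0.0, center=-shift)

if __name__ == "__main__":
    for x in (-1.0, -0.5, 0.0, 0.5, 1.0):
        print(f"x={x:+.1f}  L={left_rf(x):+.3f}  "
              f"R(phase)={right_rf_phase_model(x):+.3f}  "
              f"R(position)={right_rf_position_model(x):+.3f}")
```

With these values the two right-eye profiles share the same carrier but differ in where the envelope is centered, which is the difference between panels E and F of Fig. 1. The left RF and the phase-model right RF also form a pair that is 90° apart in phase.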
Furthermore, computational schemes for stereopsis have been proposed in which the phase components of Gabor-filtered left and right images are used to obtain a binocular disparity map (Sanger, 1988; Jenkin and Jepson, 1988; Fleet et al. 1991). Although obtaining disparity from phase information is the primary focus of these schemes, the computed representation itself contains rich spatial form information, and allows a single unified representation to serve both stereoscopic and form perception. In other words, stereoscopic depth information may be "tapped off" of such a unified multimodal representation using binocular neurons with mismatched RFs for the two eyes. Since a mismatch of RFs is significant only in the binocular context, these neurons may still participate in monocular form encoding. This is more elegant than postulating "depth neurons" that specifically encode disparity information alone. It also provides a more realistic view of the striate cortex, since simple cells are selective to numerous parameters, and therefore cannot be considered to play an exclusive role in just one function. In this sense, the phase-based disparity encoding model that we propose here provides a natural extension, into the binocular domain, of the current notion of representation of monocular spatial form information.

With respect to the proposed phase model, the first key question is whether there are simple cells with different left and right RF profiles, as illustrated in Fig. 1F. To address this question, we employ a measurement technique known as reverse correlation (DeBoer and Kuyper, 1968; Eggermont et al. 1983; Jones and Palmer, 1987a; Freeman and Ohzawa, 1990; DeAngelis et al. 1991, 1993a). This method allows a highly efficient quantitative determination of detailed RF profiles in two dimensions of space and one dimension of time.

Other important questions concern the relationships of disparity-related properties to other aspects of stimulus specificity of cortical neurons, such as orientation, spatial frequency and velocity tuning. For example, one of the most interesting questions is whether there is any specialization in the binocular system for processing horizontal disparities, which are larger on average than vertical disparities due to lateral displacement of the two eyes. This specialization may arise because contours oriented near vertical produce larger horizontal disparities in retinal images than horizontally oriented contours. Such an asymmetry may be detected by studying the dependence of binocular properties on the orientation preference of neurons. The report by Barlow et al. (1967) included evidence of a specialization for horizontal disparities in the form of a larger range of preferred binocular disparities along the horizontal dimension. However, subsequent studies have failed to duplicate this result (Joshua and Bishop, 1970; von der Heydt et al., 1978; Ferster, 1981; LeVay and Voigt, 1988). This issue, therefore, requires further investigation. We present evidence here that a specialization for processing horizontal disparities is indeed present, not in the form of positional offsets, but in terms of differences in RF phase.
It is somewhat surprising that no previous physiological study has addressed the relationship between binocular disparity selectivity and spatial frequency selectivity, although there are reports that show evidence for correlation between RF size and preferred disparity (Pettigrew et al., 1968; Ferster, 1981). This is in marked contrast to psychophysical and computational investigations of stereopsis. The question of how the spatial frequency selectivity of the underlying neural mechanisms may affect psychophysical stereo performance has received a substantial amount of attention (Frisby and Mayhew, 1978; Mayhew and Frisby, 1979; Schor and Wood, 1983; Schor et al. 1984a,b; Wilson et al. 1991; Smallman and MacLeod, 1994). In computational and theoretical models of stereopsis, disparity information processing based on multiple spatial frequency channels is a key feature (Marr and Poggio, 1979; Sanger, 1988; Jenkin and Jepson, 1988). Given such extensive interest in this question from a psychophysical and theoretical standpoint, a physiological study of the dependence of disparity selectivity on spatial frequency preference is clearly needed.

Methods
Surgical procedure and preparation
Adult cats (2-4 kg) are prepared for physiological experiments as described elsewhere (DeAngelis et al. 1993a). Briefly, premedication consisting of atropine sulfate (0.2 mg kg⁻¹) and acepromazine (1 mg kg⁻¹) is given subcutaneously. Anesthesia is induced with halothane (2.5-3% in oxygen) and maintained during surgery. ECG (electrocardiogram) electrodes and a rectal temperature probe are installed, and these two physiological parameters are monitored during initial surgery. The thermometer controls a heating pad and a lamp to maintain the body temperature close to 38 °C. A catheter is inserted into a femoral vein on two forelimbs. Next, a tracheal cannula is inserted and the halothane anesthesia is continued. The animal's head is secured in a stereotaxic device.
Lidocaine ointment (5%) is used at pressure points. The skull is exposed and two small machine screws are inserted over the frontal sinus for use as EEG electrodes. Then, a craniotomy is performed directly above the central representation of the visual field in the striate cortex (Horsley-Clarke P4 L2.5). The dura is dissected away to allow insertion of microelectrodes. While carefully monitoring respiration and heart rate, anesthesia is switched to sodium thiamylal (Surital). After the anesthesia is stabilized, paralysis is induced with a loading dose of gallamine triethiodide (Flaxedil, 7-10 mg kg⁻¹), and the animal is placed under artificial respiration at the rate of 25 strokes per min. Anesthesia is maintained by a combination of nitrous oxide (70% mixed with oxygen for respiration) and Surital (1 mg kg⁻¹ hr⁻¹ in the infusion solution). The infusion fluid also contains Flaxedil (10 mg kg⁻¹ hr⁻¹) and lactated Ringer's solution with 5% dextrose. At this time, a CO2 sensor and EEG (electroencephalogram) electrodes are attached. Expired CO2 level and EEG are monitored continuously in addition to the ECG and the core temperature. A PC-based physiological monitoring system that we have developed (Ghose et al. 1995) monitors and displays these parameters, automatically records parameter values into a log file, provides voice warnings when any of the parameters go beyond preset limits, and requests experimenter verification of vital signs at 30-minute intervals. This system helps to maintain physiological conditions near optimal levels at all times.

Pupils are dilated with 1% atropine sulfate and nictitating membranes are retracted with 5% phenylephrine HCl. Contact lenses with 4 mm artificial pupils are applied to protect the corneas. Locations of optic disks and the areae centrales are mapped onto a tangent screen using a reversible ophthalmoscope. Tungsten-in-glass microelectrodes (Levick 1972) are used for recording spike activity extracellularly. Typically, to increase the chance of encountering cells, two electrodes are mounted in parallel in a single protective guide tube and driven by a common microelectrode drive (Inchworm, Burleigh). The two electrodes are not attached and allow cortical tissue to pass between them. The horizontal separation of electrode tips is typically about 300-400 µm, while the vertical separation ranges from 0 to around 200 µm.

Experimental apparatus
A schematic diagram of our experimental setup is shown in Fig. 2.
The setup consists of a dichoptic visual stimulator, a pair of CRT (cathode ray tube) displays, a tangent screen with joystick-controlled optical rear projector, microelectrodes and amplifiers, a data acquisition system, and a computer for experiment control and data analysis. The dichoptic visual stimulator is based on a PC with two high-resolution graphics boards (Imagraph) running software written in our laboratory. The graphics boards generate video signals that drive a pair of displays with a resolution of 1024 x 804 pixels at a frame rate of 76 Hz. The timing of video frames on the two channels is synchronized in hardware to ensure that onset and offset of dichoptic stimuli are time-locked. Stimuli are delivered with a temporal resolution of 1 frame period (13.2 msec) by custom temporal modulation driver software. The driver utilizes CPU interrupts generated from one of the graphics boards on every vertical retrace to control the timing of stimulus sequences. The stimulator is capable of generating a wide variety of stimuli including multiple sinusoidal grating patches, and reverse correlation stimuli (see below). The stimulator receives stimulus specifications via a serial port from the experiment control computer. In addition to two video signals, the stimulator generates sync pulses that are time-locked to the temporal modulation of visual stimuli.

Fig. 2 A diagram is shown that illustrates our visual stimulation and data acquisition system. See text for details.
Labels in the figure: Microelectrodes; Amplifiers & Filters; Action Potentials; Data Acquisition System; Experiment Control & Analysis Computer; Serial (RS232); 2-Channel Visual Stimulator; Video Signals; Sync Pulses; Display (Left); Display (Right); Tangent Screen; Cat.

The CRT displays are a pair of color monitors (Mitsubishi 6600K) operated in gray scale mode, i.e., a single video signal drives the RGB inputs in parallel. The termination resistors for two of the three channels are removed to prevent impedance mismatches that result in ghosts, i.e., multiple horizontally displaced edges resulting from reflected video signals. The mean luminance of the displays is 45 cd m⁻², and 17 cd m⁻² as viewed via a half-silvered mirror. The mirrors are positioned at a 45° angle in front of the cat's eyes, and form a simple haploscope that allows dichoptic viewing of independently controlled left and right stimuli. The mean luminance and contrast are matched for the two displays. Each screen has an image area of 28° x 22° at a distance of 57 cm from the cat's eyes. This screen resolution translates to 36.5 pixels per degree of visual angle, providing sufficient resolution to cover the range of spatial frequencies necessary for studying the cat's visual system. The tangent screen is viewed by the animal through the half-silvered mirrors.
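The display figures quoted above (frame period and pixels per degree) can be checked with a few lines of arithmetic. This is only a sanity-check sketch: it takes the 28 x 22 image area to be in degrees of visual angle, and the Nyquist limit it prints is a derived illustration rather than a value stated in the text.

```python
import math

H_PIXELS, V_PIXELS = 1024, 804        # display resolution (from the text)
WIDTH_DEG, HEIGHT_DEG = 28.0, 22.0    # image area, assumed to be in degrees
FRAME_RATE_HZ = 76.0                  # frame rate (from the text)
VIEW_DIST_CM = 57.0                   # viewing distance (from the text)

# Duration of one video frame, i.e., the temporal resolution of the stimuli.
frame_period_ms = 1000.0 / FRAME_RATE_HZ

# Spatial sampling density along each screen dimension.
ppd_h = H_PIXELS / WIDTH_DEG
ppd_v = V_PIXELS / HEIGHT_DEG

# Screen extent subtending 1 deg at the viewing distance (close to 1 cm at 57 cm).
cm_per_deg = 2.0 * VIEW_DIST_CM * math.tan(math.radians(0.5))

# Highest grating frequency representable at two pixels per cycle.
nyquist_cpd = min(ppd_h, ppd_v) / 2.0

print(f"frame period      : {frame_period_ms:.2f} msec")
print(f"pixels per degree : {ppd_h:.2f} (horizontal), {ppd_v:.2f} (vertical)")
print(f"cm per degree     : {cm_per_deg:.3f}")
print(f"Nyquist limit     : {nyquist_cpd:.1f} cycles/deg")
```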
A bar-shaped stimulus, i.e., a rotatable rectangle of light, is back-projected onto the tangent screen via scanning mirrors mounted on pen motors. The position and orientation of the stimulus are controlled by a joystick. This manually controlled stimulus is used only during initial exploration of the RF, and is turned off while the CRT displays are used for quantitative measurements.

Two types of data acquisition systems have been used to record action potential data and stimulus sync pulses, as the system was upgraded during the course of these experiments. The majority of the data were collected by a system that uses an analog window discriminator device and custom real-time event buffer hardware. The window discriminator generates digital pulses for selected action potentials whose peak amplitude falls within an adjustable voltage range. The pulses from the discriminator and sync pulses from the stimulator are time-stamped with 1 msec resolution by the event buffer hardware and sent to the experiment control computer. Some of the data were collected by a new data acquisition system based on a digital signal processor (DSP) and custom spike waveform discrimination software running on a Unix workstation (NeXT). Spike waveforms are sampled at 25 kHz, discriminated digitally and time-stamped with 40 µsec resolution. For most spikes, only the time code is stored, but for 2 to 10% of all spikes, waveform information is also stored. The time-stamped data are sent to the control computer through an Ethernet network connection. The data from the two systems are equivalent for the purpose of this study, except that the new system has higher time accuracy and records waveform data. The waveform data are useful in evaluating the reliability of spike discrimination when multiple cells are recorded simultaneously from one electrode.

The experiment control computer directs all aspects of an experimental run.
It accepts specifications of a test from the experimenter, controls the visual stimulator, receives data from the data acquisition system, and performs preliminary analyses and real-time display of the data. Immediately after each run, data are plotted graphically to determine stimulus parameters for use in subsequent runs.

Preliminary procedures
Once a cell is encountered and a spike waveform is isolated, the location and preferred orientation of the RFs (for each eye) are determined approximately using a bar stimulus projected on the tangent screen. Next, an interactive search program (DeAngelis et al. 1993a) is used to determine optimal parameters for a circular patch of drifting sinusoidal grating presented on one of the CRT displays. A pointing device (mouse) controls the spatial frequency, orientation, position and size of the grating patch. Spikes are marked as bright dots on an X-Y position map and on a polar map indicating the stimulus spatial frequency and orientation. A cluster of these dots indicates the range of stimulus parameters that excite the neuron effectively. In order to determine the RF position accurately, the grating patch is made as small as possible (typically 2-4° in diameter) and moved slowly around and throughout the RF.

The optimal parameters obtained this way are used for subsequent quantitative measurements, which in turn provide more accurate estimates of the relevant parameters. First, orientation and direction selectivities are determined for the two eyes using drifting sinusoidal grating stimuli. Typically, gratings of 7-9 orientations (in steps of 8 to 15°) and two opposite directions of motion are presented in a randomly interleaved fashion. The orientation of all of our stimuli may be specified with a resolution of a 32-bit floating point number. Measurements for the two eyes are also interleaved in a single run. To improve the accuracy of determining an optimal orientation that is likely to fall between the test orientations used, optimal orientation and direction are determined for each eye by automatically locating a maximum in interpolated tuning curves (cubic spline, Press et al., 1992). Using these optimal values for each eye, spatial frequency tuning curves are then obtained. Temporal frequency tuning curves are also obtained similarly. Next, gratings of optimal parameters are presented dichoptically at a variety of interocular spatial phases to determine the relative phase dependence of the neuron (Ohzawa and Freeman 1986a,b).
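The spline-based estimate of the optimal orientation can be sketched as follows. This is a minimal illustration, not the routine actually used: it assumes a natural cubic spline (zero second derivative at the end points, a boundary condition the text does not specify), locates the maximum by a dense grid search, and uses a hypothetical tuning curve in place of measured responses.

```python
import bisect
import math

def second_derivatives(xs, ys):
    """Second derivatives of a natural cubic spline through (xs, ys)."""
    n = len(xs)
    c_prime = [0.0] * n   # modified super-diagonal (Thomas algorithm)
    d_prime = [0.0] * n   # modified right-hand side
    for i in range(1, n - 1):
        h0 = xs[i] - xs[i - 1]
        h1 = xs[i + 1] - xs[i]
        a, b, c = h0 / 6.0, (h0 + h1) / 3.0, h1 / 6.0
        d = (ys[i + 1] - ys[i]) / h1 - (ys[i] - ys[i - 1]) / h0
        denom = b - a * c_prime[i - 1]
        c_prime[i] = c / denom
        d_prime[i] = (d - a * d_prime[i - 1]) / denom
    m = [0.0] * n         # natural boundary: m[0] = m[n-1] = 0
    for i in range(n - 2, 0, -1):
        m[i] = d_prime[i] - c_prime[i] * m[i + 1]
    return m

def spline_eval(xs, ys, m, x):
    """Evaluate the spline at x (xs must be sorted in ascending order)."""
    k = max(0, min(bisect.bisect_right(xs, x) - 1, len(xs) - 2))
    h = xs[k + 1] - xs[k]
    a = (xs[k + 1] - x) / h
    b = (x - xs[k]) / h
    return (a * ys[k] + b * ys[k + 1]
            + ((a ** 3 - a) * m[k] + (b ** 3 - b) * m[k + 1]) * h * h / 6.0)

def spline_peak(xs, ys, step=0.1):
    """Location of the maximum of the interpolated curve (grid search)."""
    m = second_derivatives(xs, ys)
    n_steps = int(round((xs[-1] - xs[0]) / step))
    grid = [xs[0] + i * step for i in range(n_steps + 1)]
    return max(grid, key=lambda x: spline_eval(xs, ys, m, x))

# Hypothetical tuning data: 9 orientations in 10 deg steps, with a true
# optimum at 97 deg that falls between the tested orientations.
orientations = [60.0 + 10.0 * i for i in range(9)]
responses = [40.0 * math.exp(-((o - 97.0) ** 2) / (2.0 * 20.0 ** 2))
             for o in orientations]

best = spline_peak(orientations, responses)
print(f"interpolated optimal orientation: {best:.1f} deg")
```

The point of the interpolation is visible in the example: the largest sampled response is at 100°, whereas the interpolated maximum lies between the tested orientations.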
RF measurement and analysis
To obtain a detailed space-time map of a simple cell's RF for each eye, we use a procedure known as reverse correlation (DeBoer and Kuyper, 1968; Eggermont et al. 1983). This method has been applied extensively to visual cortex by Jones and Palmer (1987a), and we have adapted their method for our system. Our reverse correlation method has been described in our recent papers (Freeman and Ohzawa, 1990; DeAngelis et al. 1993a, 1995a). The method is explained here briefly.

Stimuli used in reverse correlation experiments are randomized sequences of bars presented one at a time at various locations throughout the RF. Typically, the position of a bar stimulus is selected randomly from 20 x 20 two-dimensional grid positions. The orientation of individual bars and the stimulus grid is always matched closely to the preferred RF orientation for each eye. By orienting the grid, the RF always appears vertically in the resulting spatial maps. The sign of contrast of the bar (bright or dark bar) is picked randomly for each presentation. Each randomized stimulus sequence contains all possible stimuli, i.e., for each sequence, bright and dark bars are presented once and only once at each grid point. The stimulus sequence is rerandomized and presented repeatedly until reasonably smooth RF profiles are obtained, which typically requires 20 to 40 repetitions (or about 15 to 30 minutes). These sequences are generated using a random number generator, so that a seed given to the generator uniquely identifies a particular stimulus sequence. Therefore, for subsequent analysis, only these seeds must be recorded in order to regenerate all stimulus sequences.

Fig. 3 shows a stimulus sequence and the process by which a space-time RF is derived. The top section of the figure represents a segment of a randomized stimulus sequence, containing about 40 consecutive stimuli.
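The seeded stimulus sequences and the spike-triggered summation that reverse correlation performs can be sketched as follows. This is a toy illustration under stated assumptions, not the laboratory's software: time is measured in stimulus frames rather than milliseconds, bright and dark bars are counted as +1 and -1, the generator is Python's `random.Random`, and the "cell" is a hypothetical deterministic unit that fires a fixed number of frames after two particular stimuli.

```python
import random

GRID = 20  # 20 x 20 grid of bar positions, as in the text

def make_sequence(seed, grid=GRID):
    """One randomized sequence: each (x, y, polarity) appears exactly once.

    The seed fully determines the order, so storing the seed is enough
    to regenerate the sequence later.
    """
    rng = random.Random(seed)
    stimuli = [(x, y, pol)
               for x in range(grid) for y in range(grid) for pol in (+1, -1)]
    rng.shuffle(stimuli)
    return stimuli

def toy_spikes(stimuli, delay=2):
    """Hypothetical cell: fires `delay` frames after a bright bar at
    (10, 10) or a dark bar at (11, 10)."""
    triggers = {(10, 10, +1), (11, 10, -1)}
    return [t + delay for t, s in enumerate(stimuli)
            if s in triggers and t + delay < len(stimuli)]

def reverse_correlate(stimuli, spike_frames, n_lags, grid=GRID):
    """Average the stimuli that preceded each spike.

    Returns rf[lag][y][x], where lag counts frames back from the spike.
    """
    rf = [[[0.0] * grid for _ in range(grid)] for _ in range(n_lags)]
    for t in spike_frames:
        for lag in range(n_lags):
            if t - lag >= 0:
                x, y, pol = stimuli[t - lag]
                rf[lag][y][x] += pol   # bright = +1, dark = -1
    n = max(len(spike_frames), 1)
    return [[[v / n for v in row] for row in plane] for plane in rf]

# Twenty repetitions, each regenerated from its own seed.
stimuli = [s for seed in range(20) for s in make_sequence(seed)]
spikes = toy_spikes(stimuli)
rf = reverse_correlate(stimuli, spikes, n_lags=5)

print("lag 2, bright subregion (10, 10):", rf[2][10][10])
print("lag 2, dark subregion   (11, 10):", rf[2][10][11])
```

At the lag matching the toy cell's delay, the map is positive where a bright bar drives the cell and negative where a dark bar does, which is how ON and OFF subregions appear in the measured profiles.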
Fig. 3. A reverse correlation technique is used to measure space-time RFs of simple cells. A section of a stimulus sequence is illustrated at the top. A part of it is shown in an exploded view to reveal individual stimuli that are typically 30-50 msec in duration. Each stimulus is a bright or dark bar presented on a gray background at one of 20x20 grid points. The position and the polarity (bright or dark) are randomized. Although stimuli are presented with no interstimulus interval, not all will elicit a discharge from a neuron as indicated by the relatively sparse spike train. The reverse correlation technique may be understood as a procedure for obtaining the average stimulus profile that has caused a cell to fire. This is achieved by summing sections of the stimulus sequence that immediately precede each spike (shown by shaded cubes in the stimulus sequence) for all spikes generated. The result is a three-dimensional map of X, Y and T (two of space and one of time) that indicates the effectiveness of a stimulus flashed at position (X, Y) in causing the cell to fire T msec after the flash. This is the definition of a RF in space and time. Various cross-sections and projections of the 3-D data are used for clarity of presentation as shown at the bottom.
The sequence can be thought of as a time series of two-dimensional (X,Y) spatial patterns, as shown in the figure by a collection of square "slabs" stacked from right to left. These slabs are shown separately at the top in an exploded view for 5 consecutive stimuli. Each slab represents a single stimulus presentation containing a bar stimulus that is either bright or dark and flashed on an otherwise uniformly gray screen somewhere within the square region. Some of these stimuli fall on excitatory regions of the RF and spikes are elicited. A spike train is depicted just below the stimulus sequence in Fig. 3. For each spike generated, we look back in time (rightward in the figure) for stimuli that are likely to have elicited the spike, starting at the instant of spike occurrence. How far into the past should we look? Typically, for cells in area 17 of adult cats, only stimuli presented in the past several hundred milliseconds contribute to the response (DeAngelis et al. 1993a; McLean et al. 1994). This time epoch is depicted in Fig. 3 as a shaded cube shown within the stimulus sequence for each generated spike. Note that stimuli are presented continuously between shaded cubes, but this is not shown, for clarity. The selection of bounds in the time domain is not critical because we can always perform the analysis of saved data again with different time parameters if we find that a particular range is not appropriate. On the other hand, the spatial stimulus parameters, particularly the size and centering of the stimulus grid, must be set carefully because these parameters cannot be readjusted later. The stimuli in the shaded cubes are summed for all spikes generated in a stimulus sequence, counting each bright bar stimulus as +1, each dark bar as -1, and the background as zero. Note that the shaded cubes are time-locked to each occurrence of a spike. This produces the average stimulus profile (after division by the number of cubes added) that elicits a spike. The average profile is depicted within a cube shown in the
center of Fig. 3. An element within this cube at (X,Y,T) represents the contribution that a stimulus makes to the generation of a spike, i.e., a stimulus presented T milliseconds ago at spatial position (X, Y), contributes to spike generation by the amount indicated by the value of the element at (X, Y, T) in the averaged cube. From the above specification of the procedure, the reverse correlation process yields a time-locked average of all stimuli that produce spikes. Since it is difficult to present 3-dimensional data such as these in print, we show either a cross-section or a projection of the cube onto two dimensions. For example, a cross section at a given T is a spatial map of the RF at that instant, as shown at the bottom left in Fig. 3. A cross section or the projection of the cube along the Y dimension gives the space-time (X-T) RF, as depicted by the inset at the bottom right. Since the orientation of the grid is always matched to the preferred orientation of the RF, the process of obtaining an X-T profile is simplified. Note that some RFs could not be excited sufficiently by short bar stimuli. In these cases, bar length was extended such that it equals or exceeds the size of the X domain, and measurements were performed in the X-T domain (DeAngelis et al. 1993a). This procedure yields data that are nearly equivalent with the exception that a 2-D (X-Y) spatial map cannot be obtained. However, the signal to noise ratio of the data is improved since a long bar is generally more effective in driving the cell (care must be used so that a bar does not extend into an end-inhibitory region). Also, many more repetitions are possible in a given time because the total number of stimulus locations is reduced. The reverse correlation procedure and our interpretation of the resulting RF profiles depend on a linearity assumption. 
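The spike-triggered summation and the X-T projection described in this section can be sketched in a few lines. The sketch makes simplifying assumptions that are not part of the procedure as described: delays are counted in whole stimulus frames from the frame containing the spike (rather than in msec from the center of the stimulus pulse), and the names `reverse_correlate` and `xt_profile` and the toy data are hypothetical.

```python
def reverse_correlate(stimuli, spike_times_ms, frame_ms, grid_size, n_delays):
    """Spike-triggered average of a randomized bar sequence.

    stimuli        -- list of (x, y, polarity), one per frame; polarity is
                      +1 for a bright bar and -1 for a dark bar
    spike_times_ms -- spike times measured from the start of the sequence
    Returns rf[t][y][x], where t is the correlation delay in frames.
    """
    rf = [[[0.0] * grid_size for _ in range(grid_size)]
          for _ in range(n_delays)]
    for spike in spike_times_ms:
        spike_frame = int(spike // frame_ms)
        for t in range(n_delays):           # look back t frames from the spike
            k = spike_frame - t
            if 0 <= k < len(stimuli):
                x, y, polarity = stimuli[k]
                rf[t][y][x] += polarity     # the gray background counts as zero
    n_spikes = max(len(spike_times_ms), 1)  # divide by the number of cubes added
    return [[[v / n_spikes for v in row] for row in plane] for plane in rf]


def xt_profile(rf):
    """Project the X-Y-T cube onto X-T by summing along the Y dimension."""
    return [[sum(column) for column in zip(*plane)] for plane in rf]


# Toy example on a 2x2 grid with 40 msec frames: the hypothetical cell
# fires one frame after each bright bar at (x=0, y=0).
stimuli = [(0, 0, +1), (1, 0, -1), (0, 0, +1), (1, 1, +1)]
rf = reverse_correlate(stimuli, [50, 130], frame_ms=40, grid_size=2, n_delays=2)
xt = xt_profile(rf)
```

In the toy example, the only element that survives averaging is the bright-bar response at (0, 0) one frame before the spike; the unrelated stimuli at zero delay cancel in the X-T projection.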
We and others have partially tested this assumption by comparing spatial and temporal frequency tuning curves measured by grating stimuli to those predicted from X-T RFs via the Fourier transform, and generally, good matches have been found (DeAngelis et al. 1993b; McLean and Palmer, 1994b). It has also been shown that responses and direction preferences to moving bar stimuli are predictable from the X-T RF of simple cells (McLean et al. 1994a). These results confirm that simple cells may generally be considered to be linear under our measurement conditions.
Histology and laminar analysis
At the end of each electrode penetration, electrolytic lesions (5 µA, 10 sec) are made at 700-1500 µm intervals while the electrodes are retracted. The animal is then given an overdose of pentobarbital sodium (Nembutal), and perfused through the heart with formaldehyde (4% in buffered saline). Coronal sections are made at 40 µm intervals and they are stained with thionin. Electrode tracks are reconstructed, and based on lesions and the depth of electrodes for each recorded cell, the laminar locations of the cells are identified. Histological analyses confirm that all recordings were made from area 17, and cells from all laminae are included in our sample.
Results
We have analyzed single unit recordings from a total of 257 neurons in the striate cortex of 18 normal adult cats. Of these, 142 were classified as simple, and 115 as complex based on the degree of modulation of responses to drifting gratings at the temporal frequency of drift (Skottun et al. 1991) and the presence or absence of discrete bright and dark excitatory subregions as determined by our RF measurements. Of the simple cells, appropriate RF measurements were completed for both eyes for 65 cells. The remaining cells were either lost during the lengthy and extensive recording period required to obtain complete data for each cell or they were completely monocular. Results in this paper are restricted to simple cells.
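The modulation criterion mentioned above can be illustrated with a small calculation of the ratio between the response component at the drift frequency (F1) and the mean response (F0). This is a sketch under stated assumptions: the text does not give the computation or a numerical cutoff, so the function `modulation_ratio`, the synthetic responses, and the commonly used reading that a ratio above 1 suggests a simple cell are all illustrative assumptions.

```python
import cmath
import math


def modulation_ratio(psth, bin_ms, drift_hz):
    """F1/F0: amplitude at the drift frequency divided by the mean rate.

    psth -- firing rate per time bin, spanning a whole number of cycles
    """
    n = len(psth)
    f0 = sum(psth) / n
    if f0 == 0:
        return 0.0
    # Discrete Fourier component at the temporal frequency of drift.
    component = sum(rate * cmath.exp(-2j * math.pi * drift_hz * i * bin_ms / 1000.0)
                    for i, rate in enumerate(psth))
    f1 = 2.0 * abs(component) / n
    return f1 / f0


# Synthetic responses to one cycle of a 1 Hz grating, 10 msec bins.
n_bins, bin_ms, drift_hz = 100, 10.0, 1.0
phase = [2.0 * math.pi * drift_hz * i * bin_ms / 1000.0 for i in range(n_bins)]
modulated = [max(0.0, math.sin(p)) for p in phase]        # half-wave rectified
unmodulated = [10.0 + 2.0 * math.sin(p) for p in phase]   # elevated mean rate

ratio_modulated = modulation_ratio(modulated, bin_ms, drift_hz)
ratio_unmodulated = modulation_ratio(unmodulated, bin_ms, drift_hz)
```

A half-wave rectified response gives a ratio close to pi/2, whereas a response with a high mean rate and weak modulation gives a ratio well below 1.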
Spatial structure of RFs
The spatial RF structure of simple cells generally consists of elongated bright- and dark-excitatory subregions arranged alternately (Hubel and Wiesel, 1962; Jones and Palmer, 1987a; DeAngelis et al. 1993a).
Fig. 4. Two-dimensional spatial (X-Y) RFs are shown as contour plots for left and right eyes for 5 simple cells (A-E). The X dimension of the RF, which is perpendicular to the preferred orientation in all cases, is represented on the horizontal axis. In this and all subsequent figures, solid and dashed contours represent bright and dark excitatory subregions, respectively. The shading is proportional to the strength of the response. A 1-dimensional profile, obtained by integrating the 2-D profile along the Y dimension, is also shown below each 2-D profile. Details of recording are indicated to the right of the profiles in the following order: animal code-cell number, correlation delay at which spatial profiles are obtained, size of the stimulus grid, preferred orientation (OR; 0, 180=horizontal; 90, 270=vertical) for left and right eyes, efficiency (E) which is the average number of spikes generated by a single flash at the highest peak of the spatial profile, and signal-to-noise ratio (S/N) of the 1-D profile for the two eyes (see text). Because the mapping grid has been rotated to match the preferred orientation of the RF, subregions are always elongated along the vertical.
Recording details for Fig. 4 (paired values are left eye, right eye):
A: kd905-12, 80 msec, 4 x 4 deg, OR=115°, 115°, E=0.96, 0.43, S/N=23.6, 14.5 dB
B: kd905-19, 70 msec, 6 x 6 deg, OR=110°, 100°, E=2.1, 2.4, S/N=19.7, 20.2 dB
C: bk318-20, 60 msec, 5 x 5 deg, OR=260°, 270°, E=2.3, 0.5, S/N=19.8, 16.4 dB
D: kd123-28, 60 msec, 6 x 6 deg, OR=270°, 260°, E=1.2, 1.5, S/N=24.6, 22.5 dB
E: kd121-17, 60 msec, 5 x 5 deg, OR=210°, 195°, E=2.6, 3.0, S/N=20.9, 20.4 dB
Although it is customary to refer to the subregions of simple cells as ON or OFF areas (Hubel and Wiesel, 1962), these are not strictly equivalent to the subregion structure defined in the space-time domain (see below). Since one of our goals is to determine how left and right RFs of binocular simple cells might differ, we first examine the spatial domain and then the space-time domains of representative examples. Fig. 4 presents left and right spatial RFs of five representative simple cells. For each cell, the RFs for the two eyes are measured with identical stimulus parameters using the reverse correlation procedure outlined above (see Methods). The cell shown in Fig. 4A exhibits a large difference in the spatial structure of left and right RF
profiles. This is apparent by examining the arrangement of bright and dark excitatory flanks. For the left eye, the RF subregions are arranged in the order of bright excitatory (solid contours) and dark excitatory (dashed contours) from left to right. For the right eye, the subregions are arranged in the order of weak bright-excitatory, strong dark-excitatory, and strong bright-excitatory. This difference in the RF structures for the two eyes is even more apparent when we examine the profiles using one-dimensional (1-D) plots, as shown below each two-dimensional (2-D) profile in Fig. 4. These 1D RF profiles are obtained by integrating the 2-D profiles along the Y axis (the axis parallel to elongation of the subregions). The 1-D plots obtained this way correspond to RFs measured with long bar stimuli (Ohzawa et al., 1990). Another example of a cell with different RF profiles for the two eyes is shown in Fig. 4B. For this cell, both the left and right RFs have 3 clear subregions, but the 1-D profiles are nearly mirror images of each other with respect to the Y axis. Note that the apparent difference between positions of left and right RFs within the square domains (e.g., Figs. 4B, C and E) simply reflects slight errors in centering the stimulus grid for each eye's RF (see Methods), and does not represent RF incongruity. Fig. 4C presents another example of a cell with different RF profiles for the two eyes. Each profile consists of two distinct subregions. Again, the order of bright- and dark-excitatory regions is reversed for the two eyes. To evaluate the quality of RF measurements, signal-to-noise analysis is used. Signal-to-noise ratio (S/N) is defined as the energy (sum of squares) of the profile at the optimal delay (shown as 1-D profiles in Fig. 4) to the energy of the profile at a delay of -100 msec (not shown). The energy at the optimal delay represents the strength of signal+noise, while that at -100 msec represents the amount of noise. 
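The 1-D profile and the signal-to-noise measure can be written out as a short calculation. This is a minimal sketch, assuming the profiles are available as plain lists of numbers; the function names and the example values are hypothetical. The decibel conversion follows the 10 log10(S/N) convention used for the values listed with each profile in Fig. 4.

```python
import math


def profile_1d(rf_xy):
    """Integrate a 2-D map rf_xy[y][x] along Y to obtain a 1-D profile over X."""
    return [sum(column) for column in zip(*rf_xy)]


def energy(profile):
    """Energy of a profile, defined as the sum of squares."""
    return sum(v * v for v in profile)


def snr_db(profile_at_optimal_delay, profile_at_negative_delay):
    """Ratio of profile energies (optimal delay over -100 msec), in dB."""
    ratio = energy(profile_at_optimal_delay) / energy(profile_at_negative_delay)
    return 10.0 * math.log10(ratio)


# Hypothetical 2-D map with one bright and one dark subregion.
signal_profile = profile_1d([[1.0, -2.0], [3.0, -2.0]])
example_db = snr_db([10.0, 0.0], [1.0, 0.0])
```

In this toy case the energy ratio is 100, which corresponds to 20 dB, comparable in magnitude to the values listed for the cells in Fig. 4.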
This is because a negative delay means that one measures the correlation of response to stimuli in the future. The S/N ratio is self-normalizing since the same number of spikes contribute to defining a RF profile at any delay. The S/N ratios for each RF map are noted to the right of the profiles in decibels (dB), which is defined as 10 log10(S/N). The examples described above clearly demonstrate that the RFs of binocular simple cells are not always identical as was reported previously (Maske et al. 1984; Hubel and Wiesel, 1962). The presence of these cells indicates that the phase-encoding hypothesis for binocular disparity representation is plausible. Of course, not all simple cells possess different RF profiles for the two eyes. The cells shown in Fig. 4D and 4E have quite similar RFs for the two eyes. These cells are certainly in accordance with the original notion of matched left and right RFs. The relative proportions of cells with similar and dissimilar RF structures for the two eyes will be addressed below. Space-time RF profiles The RF maps shown in Fig. 4 are spatial profiles taken at time T, where the peak response occurs. However, the spatial map alone is not an adequate description of the RF. Ideally, one would like to visualize the RFs in three-dimensions (X-Y-T), as presented schematically as a cube in Fig. 2. Since this is not practically feasible for our presentation here, we obtain a 2-D (X-T) projection of the cube by collapsing the Y-dimension. Because we rotate the stimulus grid to match each cell's preferred orientation, the Y-dimension is always parallel to the length dimension of elongated RF subregions, while the X-dimension traverses the bright and dark subregions. Fig. 5 presents space-time (X-T) RF profiles for the same cells depicted in Fig.4. Inspection of the X-T profiles in Fig. 5 immediately reveals properties that are not apparent in the spatial RF profiles shown in Fig. 4. For example, the left RF profile of the cell in Fig. 
5A shows at least three clear subregions, whereas the spatial map in Fig. 4A only displays two major flanks. This is because the subregion located in the upper-left region of the X-T domain in Fig. 5A (left) has a latency longer than the peak response latency (indicated by the horizontal dashed line) at which the spatial RF profile shown in Fig. 4 was obtained. Similar features are also observed for the right eye response (Fig. 5A, right). The left-most bright excitatory subregion does not reach its peak strength until approximately 150 msec after the stimulus flash. This subregion is very weak at 80 msec where the overall maximum response occurs (dashed line).
Fig. 5. Space-time (X-T) RFs are shown for the 5 simple cells (A-E) presented in Fig. 4. Horizontal and vertical axes represent space (X) and time (T) dimensions, respectively. T=0 at the bottom of each profile, and is maximum at the top. Details of experimental conditions are shown to the right of the profiles in the following order: animal code-cell number, size of spatial domain, and the maximum correlation delay. Horizontal dashed lines indicate the delay at which the spatial RF profiles in Fig. 4 were obtained (i.e., for time values at which there was a maximum response).
Experimental conditions for Fig. 5:
A: kd905-12, 4 deg, 300 msec
B: kd905-19, 6 deg, 250 msec
C: bk318-20, 5 deg, 250 msec
D: kd123-28, 6 deg, 200 msec
E: kd121-17, 5 deg, 250 msec
Another notable feature of the X-T profiles in Fig. 5A is that the subregions are tilted to the right. For example, the dominant bright excitatory region for the left eye in Fig. 5A shifts to the right as time T increases. This is true for the dark excitatory subregions as well. The right RF exhibits a similar tilt of the subregions. Oriented space-time RF profiles have been presented previously (McLean and Palmer, 1989; McLean et al. 1994; Reid et al. 1991; DeAngelis et al. 1993a,b). The existence of such RFs has been predicted by models of motion processing mechanisms (Watson and Ahumada 1985; Adelson and Bergen 1985), and inferred by indirect physiological measurements (Emerson et al. 1987; Reid et al. 1991; Hamilton et al., 1989; Tolhurst and Dean, 1991). It should be noted, however, that the orientation of space-time RFs usually cannot be described with a simple mathematical formula, e.g., as a Gaussian derivative or Gabor function that is oriented in the space-time domain (Young and Lesperance, 1993; Qian 1994a). For many cells, there are "kinks" in the subregions (Hamilton et al., 1989). For example, the bright excitatory region for the left eye in Fig. 5A displays a relatively large rightward tilt up to about 150 msec, but the tilt is much reduced thereafter. Because of this feature, we have not attempted to obtain the slope of the subregions as
McLean and Palmer have done (1989, 1994) to predict the optimal velocity from the space-time RF. A straight line fit is often impossible to perform without large residual errors. Frequency domain analysis is generally more useful for this purpose, and is presented below. The tilt or orientation of the space-time RFs varies greatly from cell to cell. Fig. 5B shows a cell that exhibits almost no tilt of subregions in the X-T domain. In particular, the left eye profile is almost perfectly space-time separable, i.e., the X-T RF may be expressed as the product of a function of space and a function of time. The right eye RF shows a small degree of tilt in the left-most dark excitatory region. Fig. 5C and D show cells that exhibit intermediate behavior, possessing some subregions that are oriented in the X-T domain, and some that are not. Fig. 5E shows another cell with clearly oriented subregions in space-time for the two eyes. Although this figure contains only a few examples from the spectrum of space-time RFs, we find that X-T profiles for the left and right eyes generally exhibit similar degrees of tilt. It is important to emphasize that the X-T RFs described above represent intrinsic response properties of simple cells and are largely independent of the stimulus parameters used to obtain the profiles. In particular, the delayed second phase of the biphasic temporal response, such as in Fig. 5B, is not a response to the offset of the stimulus, although one might be tempted to interpret the data that way. For example, Eckhorn et al. (1993) describe the second excitatory temporal phase as the "exit response" caused by disappearance of the stimulus. We have performed a control experiment which demonstrates that this is not the case. This issue can be addressed by varying the duration of the stimulus, and examining whether the first and second phases of the temporal response are time-locked to stimulus onset and offset. 
If the second phase of a biphasic response is indeed caused by stimulus offset, we would expect the peak of the second phase to occur with a constant delay after stimulus offset. On the other hand, if the timing of peaks in the biphasic temporal response is not affected by stimulus duration, this would indicate that the X-T RF represents intrinsic properties. Results of this control experiment are shown in Fig. 6. In Fig. 6A, an X-T RF is presented for a simple cell that has a clear biphasic response. Note that this X-T plot is rotated 90 deg relative to those of the previous figure so that the temporal responses are plotted in a familiar format. A cross section through this profile at one spatial position, indicated by a dashed horizontal line, represents a temporal response profile. The first trace of Fig. 6B shows this temporal response, with a large dark-excitatory phase followed by a smaller but longer lasting bright excitatory second phase. As shown at the beginning of the trace, the stimulus duration was 52.8 msec (4 video frame periods). Identical measurements were also performed using different stimulus durations in multiples of the frame duration (13.2 msec). Temporal responses obtained at the same spatial position are shown in Fig. 6B. Latencies of waveform features (peaks and a zero-crossing) referenced to stimulus onset, temporal center of stimulus, and stimulus offset are shown in Fig. 6C, D, and E, respectively. From Fig. 6D, it is clear that the latencies to the peaks and the zero-crossing from the center of stimuli remain relatively constant, except for a small increase for the case of 52.8 msec. This probably indicates that with 52.8 msec stimuli, there was a small temporal smearing (filtering) effect. On the other hand, the latencies from stimulus offset (Fig.
6E) to any of the waveform features, including the zero-crossing (filled circles) and the peak of the second phase of the response (open squares), decrease with stimulus duration and are not time-locked to the stimulus offset. Since contrast of stimuli affects temporal responses (Shapley and Victor, 1981; Sclar 1987; Carandini and Heeger, 1994), it may be argued that the changes in latencies are due to changes in the stimulus energy (or effective contrast) contained in stimuli of different durations. However, the opposite effects apparent in onset (Fig. 6C) and offset (Fig. 6E) latencies suggest that this is not the case. If the effective contrast were the primary determinant of latency, then it is expected that the phase advance or speed-up of the response would be reflected similarly for both onset and offset latencies. These results show that the spatiotemporal response obtained by the reverse correlation method represents the intrinsic response properties of a RF, and generally is not affected by the stimulus duration. The second phase of the temporal response is not the OFF response or the exit response caused by stimulus offset. Rather, the space-time RF is a close approximation of the space-time impulse response of a neuron (see Methods). Note that the temporal response is invariant when the center of the stimulus pulse is taken as time zero, instead of stimulus onset. For this reason, all of the spatiotemporal data presented here are computed by referencing time to the center of the stimulus pulse. The same definition of the time reference is used in all of our previous publications, except for the earliest (Freeman and Ohzawa, 1990) in which time was referenced to the onset of the stimulus. The addition of 1/2 of the stimulus duration to the correlation delays in that study is the only correction needed.
Fig. 6. Results of a control experiment demonstrate that the second phase of a temporal response profile is not due to the offset of flashed stimuli used in reverse correlation mapping. A: A space-time RF is shown rotated 90 deg from those in Fig. 5. The horizontal line indicates a cross section of the profile for one position at which temporal responses are examined in detail. B: Temporal responses, at the position indicated in A, are shown for 4 different stimulus durations (1 to 4 video frames, 1 frame period = 13.2 msec). Although stimulus duration varies, the temporal responses show little change, indicating that the space-time map represents the intrinsic structure of the RF, not the onset and offset responses to stimulus transitions. C: Latencies of waveform features in (B) are shown as a function of stimulus duration. Latencies referenced to stimulus onset are plotted separately for negative peak (open circles), zero-crossing (filled circles), and positive peak (open squares) of the response waveforms. D: Latencies referenced to temporal center of stimulus pulse are shown similarly. E: Latencies from stimulus offset are given.
Having clarified the interpretation of space-time RFs, we return to the analysis of left and right eye profiles. One potential problem associated with oriented X-T profiles is that there is no unique spatial RF profile for these neurons because spatial structure changes over the time course of the response (DeAngelis et al. 1993a).
This could be a problem because we are comparing spatial RF profiles for the two eyes to determine if their difference can be the basis for encoding binocular disparity. Dynamic changes in spatial RF profiles may make the encoding of binocular disparity difficult for the visual system. Therefore, we have evaluated the transformations of spatial profiles by quantitatively tracking the changes in various parameters over the time course of the response. This is done by obtaining 1-dimensional spatial RF profiles, at 5 msec intervals, over a range of T and by fitting a Gabor function (Gabor, 1946) to each of these profiles (DeAngelis et al. 1995a). Curve fitting was performed using a modified version of the downhill simplex method (Press et al. 1992). All fits are examined graphically to make sure that the simplex optimization algorithm is not caught in a local minimum. An example of a Gabor function is shown in Fig. 7. A Gabor function is the product of a sinusoid and a Gaussian, and is defined by:
$$G(x) = k \exp\left[-\frac{(x - x_0)^2}{(w/2)^2}\right] \cos\left[2\pi f_{opt}(x - x_0) + \Phi\right] \qquad \text{(eq. 1)}$$
where k, w, and x0 are the amplitude, full width at 1/e of the peak, and center position of the Gaussian envelope, respectively, and fopt and Φ are the spatial frequency and phase of the sinusoid, respectively. This figure relates a different attribute of the RF to each of the five parameters of the function. The Gaussian part, or envelope, of the function is shown by the dashed curve. The center of the RF, x0, is defined as the center of this Gaussian envelope. Therefore, depending on the phase parameter, Φ, it is possible that the center of the RF coincides with the boundary between bright and dark excitatory regions, at which position the cell does not respond. It has been determined empirically that 1-D spatial RF profiles of simple cells are fitted well with this class of function (Marcelja, 1980; Jones and Palmer, 1987a; DeAngelis et al. 1991, 1995a), although other formulations have also been proposed (e.g., Young 1987; Stork and Wilson, 1990; Klein and Beutter, 1992). We have chosen a Gabor function for quantifying various attributes of spatial RF profiles, because each fitted parameter has a well-defined meaning.
Fig. 7. Five parameters that define a Gabor function (solid curve) are illustrated. Parameters k, w, and x0 are the amplitude, full width at 1/e of the peak, and center position of the Gaussian envelope, respectively, and fopt and Φ are the spatial frequency and phase of the sinusoid, respectively (see eq. 1 in text). Dashed curves represent the Gaussian envelope. The phase is referenced to the center of the Gaussian envelope. This function is used to fit the spatial RF profiles of simple cells, and to extract five key parameters that characterize the RF.
Once the curve fitting procedure is completed, we are able to examine how each spatial parameter of a RF varies with time, T. Fig. 8 presents the time courses of all 5 fitted parameters for the simple cell shown in Fig. 4B and 5B. The time course of the envelope amplitude (k) is shown for both left (solid curve) and right eye (dashed curve) RFs in Fig. 8A. The peak amplitudes are reached at 65 and 75 msec for the left and right eyes, respectively. The minimum amplitudes occur at about 130 msec, which corresponds to the zero crossings of the temporal response function in Fig. 5B. The right eye response appears to have a small delay of about 5-8 msec relative to that for the left eye. Because the curve fitting cannot be reliably performed when the amplitude is small, we placed a minimum amplitude criterion at 0.2 times the peak amplitude, below which other fitted parameters are not shown. As a result, there are gaps in the curves, from 100 - 150 msec, for the rest of the panels in Fig. 8.
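The Gabor function of eq. 1 and the amplitude criterion can be evaluated with a few lines of code. This sketch covers only the evaluation of the function and the masking of unreliable time slices; it does not reproduce the modified downhill simplex fit. The function names and all parameter values in the example are hypothetical, and the phase is taken in degrees as in the figures.

```python
import math


def gabor(x, k, w, x0, f_opt, phi_deg):
    """Evaluate eq. 1: a Gaussian envelope multiplied by a sinusoid.

    k, w, x0 -- amplitude, full width at 1/e, and center of the envelope
    f_opt    -- spatial frequency of the sinusoid (cycles per unit of x)
    phi_deg  -- phase in degrees, referenced to the envelope center
    """
    envelope = k * math.exp(-((x - x0) ** 2) / (w / 2.0) ** 2)
    carrier = math.cos(2.0 * math.pi * f_opt * (x - x0) + math.radians(phi_deg))
    return envelope * carrier


def reliable_slices(amplitudes, criterion=0.2):
    """Flag time slices whose fitted amplitude reaches criterion x peak."""
    peak = max(abs(a) for a in amplitudes)
    return [abs(a) >= criterion * peak for a in amplitudes]


at_center = gabor(1.0, k=2.0, w=1.5, x0=1.0, f_opt=0.5, phi_deg=0.0)
at_half_width = gabor(2.0, k=1.0, w=2.0, x0=1.0, f_opt=1.0, phi_deg=0.0)
odd_symmetric = gabor(1.0, k=2.0, w=1.5, x0=1.0, f_opt=0.5, phi_deg=90.0)
mask = reliable_slices([0.1, 1.0, 0.5, 0.15, 0.3])
```

With a phase of 90 deg the function is zero at x0, which corresponds to the case in which the RF center falls on the boundary between bright and dark excitatory regions.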
Other parameters, including the center of the Gaussian envelope (8B), the RF width (8C), and the optimal spatial frequency (8D) remain quite constant throughout the time course of the response and are well-matched for the two eyes. The RF phase also remains relatively fixed, but is clearly different for the two eyes, as shown in Fig. 8E. The RF phase jumps by 180 deg across the gap for both eyes. These jumps represent the inversion of the spatial profiles due to a biphasic temporal response, i.e., a bright excitatory region becomes dark excitatory after 140 msec, and vice versa. Since all parameters of the RF except for the phase, Φ, are matched for the two eyes, phase is the key parameter that distinguishes the left and right RF profiles. Therefore, the phase difference between the left and right RFs, δΦ = (ΦR - ΦL), is the metric that quantifies the structural difference between the left and right RFs. Fig. 8F plots the phase difference, as derived from data in Fig. 8E, as a function of time. The phase difference remains relatively constant at 115 ± 20 deg during the course of the response. The phase difference confirms quantitatively the obvious difference between the left and right RFs illustrated in Fig. 4B. The constancy of all parameters over time is expected for a cell such as this one since it has space-time separable RFs, and the spatial profiles change little over time except for scaling and inversion.
Fig. 8. The time course of five spatial RF parameters is plotted for left and right eyes. Data are shown for the simple cell presented in Figs. 4B and 5B (KD905-19). This cell has RFs that are nearly spatio-temporally separable. A: Amplitudes of the fitted Gabor functions are plotted as a function of time. Values are normalized to the maximum for each eye. The horizontal line at 0.2 indicates the (arbitrary) criterion below which the fitted parameters are deemed unreliable due to low signal-to-noise ratio. Consequently, there are breaks in the graphs of other parameters for the interval during which the amplitude is less than this criterion. Such dips in amplitude are typical of RFs that are space-time separable. B, C, D, E: center position (of envelope), full width at 1/e of the peak amplitude, spatial frequency, and phase are plotted for the RFs, respectively. F: Phase difference (right - left) is computed from the data in E and plotted similarly. [Axes: time, T (msec), 0 to 200, against normalized amplitude k; RF center x0 (deg); RF width w (deg); optimal spatial frequency fopt (cyc/deg); RF phase Φ (deg); phase difference δΦ (deg).]
Fig. 9. The time course of spatial RF parameters of another simple cell (KD121-17) is plotted in the same format as that of Fig. 8. X-Y and X-T RF profiles of this cell are shown in Figs. 4E and 5E, respectively. The RFs are spatio-temporally inseparable.
We next examine the time course of the parameters for a cell that has space-time inseparable RFs. Fig. 9 presents the time courses of the 5 RF parameters for the cell depicted in Fig. 4E and 5E. This cell has oriented bright and dark excitatory subregions in the space-time domain. The peak amplitude of the response is reached at 70 msec for both eyes as shown in Fig. 9A, and there is practically no delay between the left and right eye responses. The RF center, width, and optimal spatial frequency are again fairly constant throughout the time course of the response. Unlike the cell of Fig. 8, there are no gaps in the curves in this figure, since the amplitude k is always above the criterion value 0.2 for the interval from 20 to 200 msec. Note that the small difference in the RF center position for the two eyes, shown in Fig. 9B, depends on the centering of the stimulus grid with respect to the RF, and thus the vertical offset of the curves does not reflect any physiological property. The optimal spatial frequency for the right eye is slightly higher than that for the left eye, as shown in Fig. 9D. Separate direct measurements of spatial frequency tuning using drifting sinusoidal gratings show a similar difference between the optimal spatial frequencies for the two eyes (not shown). Differences of optimal spatial frequencies for the two eyes are present for some cells, although they are generally small (see below). These differences appear to be neural in origin and not due to optical factors such as refractive errors or magnification differences, because even within a given electrode penetration, the eye that has higher preferred frequency changes. Indeed, a cell recorded immediately after that of Fig. 9 had the opposite difference of optimal spatial frequencies. In any case, the most notable difference between the cells of Figs. 8 and 9 can be seen in the plot of RF phase shown in Fig. 9E. For this cell, the phase decreases
18
I. Ohzawa, G. DeAngelis, & R. Freeman --- Encoding of Binocular Disparity by Simple Cells
monotonically until about 120 msec, and then it becomes nearly constant. However, the time course of the phase change is closely matched for the two eyes, as seen by the solid and dashed curves that are almost superimposed. Therefore, the phase difference, shown in Fig. 9F, remains constant at about 0 ± 20 deg. Together, Figs. 8 and 9 show that a constant phase difference is maintained over time for both space-time separable RFs and space-time inseparable RFs, respectively. The phase difference between left and right RFs is a constant attribute of these neurons, in spite of the fact that the phase of the monocular RFs is a dynamic property.

These results show that, of the five parameters that describe the spatial RF, only the amplitude and the phase change substantially over the time course of the response. The remaining attributes (width, spatial frequency, and center position of the RFs) remain highly stable throughout the response. Based on these findings, it is reasonable to describe the space-time RF of a simple cell as a fixed-frequency sine wave that moves or phase-reverses through a fixed two-dimensional Gaussian window. Phase-reversal of the sine wave corresponds to a separable space-time RF, and movement of the sinusoid corresponds to a non-separable space-time RF that is direction selective. However, the speed at which the sinusoid moves is not generally constant over the time course of the response (as a result of "kinks" in the X-T profiles). The velocity selectivity of these neurons is examined further below using frequency domain analysis.

For other binocular simple cells that we have recorded, findings are similar to those described above (Figs. 8 and 9). In Fig. 10, we show additional data from four simple cells. In this figure, only the time course of RF phase and phase difference is shown. All four examples are from cells that had space-time inseparable RFs, as indicated by the changes in RF phase shown in Fig. 10A, C, E, and G.
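The description of the space-time RF as a fixed-frequency sinusoid that moves through a fixed Gaussian window can be illustrated with a minimal sketch. The one-dimensional Gabor form, the parameter values, and the linear phase drift used below are illustrative assumptions, not the fitted function or the data of this study, and the function names are hypothetical. The sketch shows that when the monocular phases follow the same time course, the interocular phase difference δΦ = ΦR - ΦL stays constant even though each monocular phase changes.

```python
import math

def gabor_rf(x, phase_deg, k=1.0, x0=0.0, w=2.0, sf=0.3):
    """Assumed 1-D Gabor profile: amplitude k, center x0 [deg],
    width w [deg], spatial frequency sf [c/deg], phase in degrees."""
    envelope = math.exp(-((x - x0) / w) ** 2)
    carrier = math.cos(2.0 * math.pi * sf * (x - x0) + math.radians(phase_deg))
    return k * envelope * carrier

def wrap_deg(angle):
    """Wrap an angle in degrees to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0

def drifting_phase(t_msec, phase0_deg, drift_deg_per_msec):
    """Hypothetical linear phase drift; a drift of 0 gives a fixed phase."""
    return phase0_deg + drift_deg_per_msec * t_msec

# Illustrative values only: both eyes drift at the same rate,
# and the right-eye phase is offset from the left by 90 deg.
times = list(range(20, 201, 20))  # correlation delays [msec]
left = [drifting_phase(t, 135.0, -1.0) for t in times]
right = [drifting_phase(t, 225.0, -1.0) for t in times]
phase_diff = [wrap_deg(r - l) for l, r in zip(left, right)]

# Spatial profiles for the two eyes at one delay (t = 60 msec, index 2).
xs = [i * 0.5 - 4.0 for i in range(17)]  # positions from -4 to 4 deg
left_profile = [gabor_rf(x, left[2]) for x in xs]
right_profile = [gabor_rf(x, right[2]) for x in xs]
```

In this toy case every entry of phase_diff is the same, while the entries of left and right change with delay.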
Despite the fact that RF phases for each eye change over the time course of the response, the phase difference remains relatively constant for the first 3 cells, as shown in Fig. 10B, D, and F. Of these, the cell at the top (Fig. 10A and B) shows highly matched time courses for the two eyes, and thus a phase difference that is nearly constant around zero. On the other hand, the middle two cells (Fig. 10C, D, E, F) show a relatively constant but non-zero phase difference over the course of the response. However, the cell shown at the bottom exhibits mismatched time courses for left and right RF phase (Fig. 10G), and therefore a phase difference that is not constant. In fact, during the time course of the response (20 - 200 msec), the phase difference increases monotonically by about 280 deg. This type of non-constancy of phase difference is rare among simple cells in the striate cortex, and was observed for only 4 of the 65 cells we studied.

Dependence of phase difference on preferred orientation

Thus far, we have established time constancy of the phase difference between left and right RFs, and the basis for using it as a metric for the structural difference between the two RFs. We now examine the relationship of this metric, the phase difference δΦ, to other attributes of neurons. First, a possible dependence of the phase difference on preferred orientation is examined. As described above (Introduction), this is of particular interest because an orientation-specific functional specialization is predicted on the basis of the lateral placement of the two eyes. If the visual system has adapted itself to the horizontal-vertical asymmetry present in the images from the two eyes, we expect to find a corresponding asymmetry in aspects of the disparity selectivity of cortical neurons. Our results show a horizontal-vertical asymmetry with respect to differences in the phase parameters of left and right RFs. Fig.
11 shows the phase difference between RFs for the two eyes plotted against preferred orientation. We observe a striking bias in the distribution of phase differences. Specifically, most of the points are located in the lower-right half of the square domain (i.e., below the diagonal), and there is a notable lack of points in the upper-left region (except for one near the upper left corner). This graph indicates that almost all cells that prefer orientations near horizontal have a small phase difference, meaning that their RFs are similar for the two eyes. On the other hand, cells that prefer orientations near vertical have a variety of phase differences from 0 to 180 deg, indicating that their RFs may be similar or very different.

In order to examine how changes in RF phase over time affect the horizontal-vertical asymmetry, an error bar is drawn for each point indicating the degree of confidence we have in the phase difference estimate for a given cell. The error bars represent ±1 standard deviation of the mean of the phase difference over the portion of the response for which the amplitude is equal to or greater than 1/2 of the peak amplitude for both eyes. This criterion value of 1/2 is chosen arbitrarily. The contribution that a cell makes toward neural disparity representation should be positively related to the response amplitude. In addition, portions of the response with a small amplitude do not allow reliable measurement of the phase parameter because of a low signal-to-noise ratio. Therefore, only the strongest half of the time course of the space-time RF is considered. For simplicity, zero weight is given to the weak portion of the responses. It is clear from Fig. 11 that the size of error bars does not vary systematically with preferred orientation. Therefore, changes in RF phase over time do not alter the finding of a vertical-horizontal asymmetry in the phase difference distribution.

[Fig. 10 plot omitted. Panels A-H; cells KD905-06, KD969-21, KD496-02, and BK318-18; curves: Left eye, Right eye; vertical axes: Receptive Field Phase, Φ [deg] and Phase Difference, δΦ [deg]; horizontal axis: Time, T [msec]]
Fig. 10 Time course of the RF phase parameter is shown for 4 additional cells. Other parameters are not shown. A, C, E, and G depict the phase parameter for the two eyes for cells indicated by the animal code-cell number at the right. B, D, F, and H depict the phase difference between the two eyes for the same cells.

The orientation asymmetry we find is consistent with the notion that the visual system is organized to take advantage of the asymmetries in the distribution of disparities found in the images from the two eyes, so that the neural representation is optimized to encode binocular information most efficiently. Since substantial vertical disparities do not occur in association with horizontally oriented contours, only matched RFs
for the two eyes are needed for cells tuned to an orientation near horizontal. On the other hand, for representing vertically oriented contours with a wide range of possible horizontal disparities, our phase model requires that cells have a range of phase differences, and hence different RF profiles for the two eyes. It is important to note that the phase difference distribution appears to be continuous, and is not bimodal with peaks at zero and 90 deg. This indicates that the neural representation of disparity does not follow a strict form of quadrature encoding in which binocular cells with only two possible phase differences at 0 and 90 deg are expected.

[Fig. 11 plot omitted. N = 65; vertical axis: Phase Difference [deg]; horizontal axis: Preferred Orientation [deg]; inset labels: θ, π, π sinθ]
Fig. 11 Phase differences between left and right RFs are plotted against preferred orientation for each of 65 simple cells. The orientation angle is defined from horizontal. When the preferred orientations are different for the two eyes (typically by less than 15 deg), the average value is used. Each point represents the mean phase difference averaged over time (obtained at 5 msec intervals) where the normalized amplitude exceeds 0.5 for both eyes. Error bars represent ±1 SD of the mean. A similar figure in our previous publication (DeAngelis et al. 1991) presents the phase difference distribution determined only at the optimal delay. The distribution shown here incorporates phase difference data at other correlation delays because monocular RF phase for many cells changes due to the orientation of space-time RFs. Sample size is also slightly increased. The solid curve is a quarter cycle of a sinusoid and gives a theoretical upper limit for the phase difference distribution (see text). The inset shows the derivation of this limit. The distance between the dashed lines indicates a phase difference of π (180 deg) for vertical and oblique (θ) orientations. The horizontal disparity that corresponds to a phase difference of π at a vertical orientation is matched by that corresponding to a phase difference of π sinθ at an oblique orientation.

Why is the distribution of data points in Fig. 11 shaped roughly like a triangle? The curve in Fig. 11 represents a function 180 sinθ (where θ is preferred orientation), and most of the data points fall below this curve. A rationale, based on the phase model, for treating this curve as the upper bound of the phase difference is as follows. Due to the periodic nature of simple cell RFs, the maximum phase difference between left and right RFs is 180 deg (see Blake and Wilson, 1991 for a review). Therefore, for a given spatial frequency, the maximum horizontal disparity, Dhmax, that a cell tuned to vertical orientation can encode unambiguously is given by one-half of the period of the sinusoid at that frequency. For a cell tuned to an oblique orientation θ and the same spatial frequency, Dhmax corresponds to a phase difference of δΦ(θ) = 180 sinθ, as shown by the inset in Fig. 11. Therefore, it is not necessary for cells tuned to oblique orientations to have phase differences greater than this limit. In principle, cells tuned to oblique orientations can encode horizontal disparities larger than Dhmax, by having a larger phase difference.
However, this attempt will result in a suboptimal representation because, at disparities exceeding Dhmax, cells tuned to vertical, the very cells that provide the most useful disparity information, cannot signal disparities unambiguously. Although it would be difficult to prove this idea, it provides an attractive explanation for the shape of the distribution in Fig. 11. Implications of these findings are considered further in the Discussion.

In our experiments, RF maps were obtained separately for each eye. Therefore, it is important to consider the effects of variations in responsiveness of cells and eye positions. Changes in responsiveness primarily affect the amplitude parameter of the RF but not RF structure as long as a sufficient number of spikes is collected. To evaluate the consistency of RF maps, we have occasionally repeated RF measurements, and have found that RF parameters remained highly stable except for amplitude. In another project in which we actively tried to induce changes in RF structure under various conditions (DeAngelis et al., 1995b), spatial phase, spatial frequency, and RF size were remarkably stable across multiple measurements performed up to 2-3 hours apart, even in cases where the amplitude changed by a factor of two. In particular, for the critical phase parameter, the standard deviation of the mean of phase change was only 17 degrees for 77 pairs of RF measurements. It should also be noted that, because the fitted function is formulated to isolate the position parameter from phase (see eq-1), RF position shifts caused by eye movements between separate measurements do not affect the estimates of phase, spatial frequency, and width. Only short-term stability during each measurement (15-30 minutes) is required, and this is generally attained (Freeman and Ohzawa, 1990).

Another factor that affects estimates of phase difference is the S/N ratio of RF measurements. The mean and the standard deviation of S/N for RFs of the dominant eye were 18.7 ± 4.3 dB, whereas those for the nondominant eye were 17.2 ± 3.9 dB. The worst-case S/N ratio for all 65 cells was 9.1 dB. For this worst-case profile, the standard error of the estimated phase parameter, obtained by a Levenberg-Marquardt fitting procedure (Press et al. 1992), was only 13 degrees. For more typical cells, the standard errors for the phase parameter were several degrees (DeAngelis et al. 1995b). Clearly, therefore, the S/N ratio could not have had much effect on the distribution shown in Fig. 11.
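The upper bound δΦ(θ) = 180 sinθ drawn in Fig. 11 follows from simple geometry, which the sketch below works through numerically. It assumes a sinusoidal carrier whose bars are oriented at θ from horizontal, so that a horizontal shift Dh displaces the carrier by Dh sinθ perpendicular to the bars. The function names and the example spatial frequency are illustrative and are not taken from the study.

```python
import math

def phase_shift_deg(dh_deg, sf_cpd, orientation_deg):
    """Phase shift [deg] produced by a horizontal disparity dh_deg [deg]
    for a carrier of spatial frequency sf_cpd [c/deg] whose bars are
    oriented at orientation_deg from horizontal (90 = vertical)."""
    perpendicular_shift = dh_deg * math.sin(math.radians(orientation_deg))
    return 360.0 * sf_cpd * perpendicular_shift

def max_phase_difference_deg(orientation_deg, sf_cpd=0.4):
    """Phase difference that encodes Dhmax, taken as half a period of the
    carrier (the largest unambiguous disparity at vertical orientation)."""
    dh_max = 0.5 / sf_cpd
    return phase_shift_deg(dh_max, sf_cpd, orientation_deg)

# Bound at a few preferred orientations (0 = horizontal, 90 = vertical).
bound = {theta: max_phase_difference_deg(theta) for theta in (0, 30, 60, 90)}
```

Because Dhmax scales with the period, the spatial frequency cancels and the bound depends only on orientation: it is zero at horizontal, about 90 deg at 30 deg from horizontal, and 180 deg at vertical.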
Dependence of phase difference on other parameters

By definition, the phase model is closely related to the notion of spatial frequency analysis. Phase is defined at the spatial frequency of the sinusoidal component of the Gabor function used to model the RF. Therefore, to represent a range of disparities at all spatial frequency scales, the phase model requires that neurons with a variety of phase differences between the left and right RFs exist at all spatial frequency scales. If the distribution of phase differences is highly biased with respect to spatial frequency, similar to that found for orientation, then phase encoding is not likely to work. In order to examine this issue, we have plotted, in Fig. 12, phase difference against optimal spatial frequency for each neuron. A wide variety of phase differences is present among cells at all spatial frequencies. There is no statistically significant dependence of the phase difference on the preferred spatial frequency (P=0.18, linear regression analysis, correlation coefficient=0.17). Therefore, the uniform distribution of phase differences at all spatial frequencies allows phase encoding to function as proposed in the model.

A possible dependence of phase difference on other RF parameters was also examined using multiple regression analysis, and the results are presented in Table 1. The DSI (direction selectivity index), as defined by equation 2 below, quantifies the degree of direction selectivity. The index is zero for a cell that responds equally to either direction of motion. It is 1 for a cell that only responds to one direction of motion. The OBI (ocular balance index) is a metric of ocular dominance (Anzai et al., 1995; DeAngelis et al., 1995a). It is 1 for a cell with a balanced ocular dominance (group 4 in the conventional 7-group classification, Hubel and Wiesel, 1962) and zero for a cell that is exclusively driven by one eye (groups 1 and 7).
It should be noted that, of 32 cells that were highly binocular (OBI >= 2/3), 14 cells (44%) had a phase difference greater than 45 deg, indicating that there was no tendency for highly binocular cells to have similar RF structure for the two eyes (DeAngelis et al. 1995a). Of 33 cells that were dominated by one eye (OBI < 2/3), 9 cells (27%) had a phase difference greater than 45 deg. As summarized in Table 1, phase difference does not show a statistically significant dependence on any parameter, including ocular dominance, except for orientation. These results establish that the preferred orientation is unique in exhibiting a correlation with the interocular difference in RF profiles of simple cells in the striate cortex.
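Two of the figures quoted above can be checked with simple arithmetic. The sketch below converts the reported correlation coefficient between phase difference and preferred spatial frequency (0.17) into a t-statistic, under the assumption that the regression used all 65 cells (the sample size for that regression is an assumption here), and recomputes the proportions of cells with a phase difference greater than 45 deg in the two ocular balance groups. Function and variable names are illustrative.

```python
import math

def t_from_correlation(r, n):
    """t-statistic for a Pearson correlation r estimated from n points
    (n - 2 degrees of freedom)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

# Reported correlation coefficient; n = 65 is assumed, not stated for Fig. 12.
t_sf = t_from_correlation(0.17, 65)

# The tabulated two-sided 5% critical value of t near 60 degrees of
# freedom is close to 2.0, so |t| below that is not significant.
significant_at_5_percent = abs(t_sf) > 2.0

# Proportions of cells with a phase difference greater than 45 deg.
highly_binocular_fraction = 14 / 32   # OBI >= 2/3
one_eye_dominated_fraction = 9 / 33   # OBI < 2/3
```

Under the stated assumption the t-statistic is roughly 1.4, below the 5% critical value, which agrees with the reported lack of a significant dependence on spatial frequency.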
Table 1
Dependence of phase difference (δΦ = ΦR - ΦL) on other parameters

Independent Variable   Mean Value    t-statistic   Significance
ORopt                  45.7 deg       2.32         P=0.024
ORdiff                 6.9 deg        0.24         P=0.808
DSIave                 0.34           0.67         P=0.509
DSIdiff                0.14           1.37         P=0.175
SFopt                  0.39 c/deg     0.99         P=0.323
SFdiff                 0.06 c/deg    -1.36         P=0.178
TFopt                  2.46 Hz       -1.42         P=0.161
OBI                    0.62           1.19         P=0.237

Key:
ORopt : average preferred orientation for the two eyes as angle from horizontal
ORdiff : difference in preferred orientation between the two eyes
DSIave : average direction selectivity index for the two eyes
DSIdiff : difference in direction selectivity index between the two eyes
SFopt : average preferred spatial frequency for the two eyes
SFdiff : difference in preferred spatial frequency between the two eyes
TFopt : average preferred temporal frequency for the two eyes
OBI : ocular balance index defined as OBI = 1 - 2 | RI/(RI + RC) - 0.5 |, where RI and RC are responses of the cell to an optimal sinusoidal grating for ipsi- and contra-lateral eyes, respectively.
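The OBI definition in the key to Table 1 can be written as a short function. This is a minimal sketch; the function name and the example response values are hypothetical.

```python
def ocular_balance_index(r_ipsi, r_contra):
    """OBI = 1 - 2 |RI / (RI + RC) - 0.5| for responses to an optimal
    grating through the ipsilateral (RI) and contralateral (RC) eyes."""
    total = r_ipsi + r_contra
    if total <= 0:
        raise ValueError("responses must sum to a positive value")
    return 1.0 - 2.0 * abs(r_ipsi / total - 0.5)

# Hypothetical response values, for illustration only.
balanced = ocular_balance_index(20.0, 20.0)    # equal drive from both eyes
monocular = ocular_balance_index(0.0, 35.0)    # driven by one eye only
two_to_one = ocular_balance_index(20.0, 10.0)  # one eye twice as effective
```

With this definition, the criterion OBI >= 2/3 used in the text corresponds to the weaker eye producing at least half the response of the stronger eye.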
Comparison of left and right RFs in the frequency domain

Although the two-dimensional version of the Gabor function provides a generally good description of spatial RF profiles of simple cells (Jones and Palmer, 1987b; DeAngelis et al. 1993a), it is difficult to provide a complete mathematical characterization of these RFs as an explicit function in three dimensions f(x, y, t). This is primarily because the phase of the RF does not change linearly with time (see Figs. 5A, 8E), i.e., the time axis appears to be skewed such that the initial portion of the response is relatively compressed while the later part is expanded. This results in a fast rise of the temporal envelope and a relatively slow decay (DeAngelis et al., 1993a). Intervals between zero crossings of the temporal response also lengthen during the later part of the response (DeAngelis et al., 1993a). For this reason, quantitative estimation of some parameters such as direction selectivity and preferred velocity may be obtained more easily in the frequency domain and with fewer arbitrary assumptions than by direct space-time domain analyses.

A frequency domain description is obtained by taking the two-dimensional Fourier transform of the space-time RF. The amplitude component of the transform represents the spatio-temporal frequency tuning profile predicted from the space-time RF profile. It has been shown that tuning profiles derived using a linear prediction agree well with tuning data obtained by direct measurements with drifting sinusoidal gratings (DeAngelis et al. 1993b; McLean et al., 1994a,b). Fig. 13 shows, in the same format as that of Fig. 5, space-time RF profiles for the two eyes and the corresponding spatio-temporal frequency tuning profiles. Fig. 13A shows RF profiles from a simple cell that exhibits clear non-separable space-time behavior. It is interesting to note that, at the center position of the
RFs, the second temporal phase (bright excitatory) of the response is larger than the first dark excitatory phase, as indicated by the contour levels and denser shading. The degree of tilt of the subregions is quite similar for the two eyes. Two-dimensional Fast Fourier Transforms (FFT) of these X-T profiles are computed, and the amplitude spectra are shown in Fig. 13B. Because of properties of the Fourier Transforms of real signals, the amplitude spectra are symmetrical with respect to the origin (Bracewell, 1978). Subjectively, it is clear that the spectral distributions for the left and right eyes are similar. For each eye, there is a strong peak in the lower right quadrant of the spectrum, whereas the peak in the upper right quadrant is clearly present but substantially weaker. The larger peak represents the sensitivity of the RF to motion in the preferred direction, whereas the smaller peak represents that for motion in the non-preferred direction.

[Fig. 12 plot omitted. Vertical axis: Phase Difference [deg]; horizontal axis: Spatial Frequency [c/deg]]
Fig. 12 Phase differences between left and right RFs are plotted against each cell's preferred spatial frequency. Preferred spatial frequencies are generally well matched for the two eyes (see below), but when they are different, the average value is used. Preferred spatial frequency is obtained from a Gabor function fit to each RF. Filled triangles represent cells that had preferred orientations within 30 deg of vertical. Open circles represent the remainder.

[Fig. 13 plot omitted. Panel A (LEFT, RIGHT): Time [msec] against Space [deg]; panel B: Temporal Frequency [Hz] against Spatial Frequency [c/deg]]
Fig. 13 Space-time RFs and their amplitude spectra in the spatio-temporal frequency domain are shown for a binocular simple cell. A: Space-time RFs are shown in the same format as that of Fig. 5. The RFs of this cell are clearly inseparable in the space-time domain as indicated by the tilted orientation of the subregions. B: Amplitude spectra for the two RFs are shown in the spatial and temporal frequency domain. The spectra are symmetric about the origin because of the nature of the Fourier transform. Positive and negative frequencies may be interpreted as representing responses to a sinusoidal grating moving in forward and reverse directions, respectively. A large amplitude difference for the peaks in the first and fourth quadrants indicates direction selectivity. Preferred orientations were 75 deg for both eyes. Bar stimuli used in reverse correlation mapping had dimensions of 2 x 0.75 deg. Stimulus duration was 52.8 msec (4 frames).

In order to quantify these comparisons, we extract some key parameters from these spectral distributions and compare them for the two eyes. Details of the definitions of the parameters have been presented previously (DeAngelis et al., 1993b; Ghose et al. 1994). Briefly, we define a direction selectivity index (DSI) as

DSI = (Rp - Rnp) / (Rp + Rnp)    (eq. 2)
where Rp and Rnp are the amplitudes of the spectral peaks for motion in the preferred and non-preferred directions, respectively. This index takes on a value between 0 and 1. A value of zero indicates a perfectly bi-directional neuron. A completely uni-directional neuron with no response to the non-preferred direction has a DSI of 1. It should be noted that the peak spatial and temporal frequencies are not necessarily the same for preferred and non-preferred directions. The DSI is therefore defined by using the peak for the preferred direction and its mirror image point in the non-preferred direction. We also extract, from the spectral distributions, an estimate of the optimal velocity Vopt for a drifting sinusoidal grating. This is obtained by

Vopt = TFopt / SFopt    (eq. 3)
where TFopt and SFopt are the optimal temporal and spatial frequencies, respectively, for the preferred direction of motion. The ratio given by eq. 3 has been shown to correlate well with estimates of preferred velocity measured using a swept bar stimulus (Baker 1990). The optimal velocity estimate obtained in this manner is more robust and less subjective than attempting to fit a line directly to oriented subregions in the space-time domain, as employed by McLean and Palmer (1989, 1994a). This is because subregions in the X-T domain are not always straight (see Fig. 5A). In addition, estimation of optimal velocity from the spectral distribution utilizes all data contained in a space-time RF profile, and the analysis can be applied uniformly to all X-T profiles. However, with the line fitting approach, only the strongest and the straightest subregion in the X-T domain is typically selected for fitting.

For the cell shown in Fig. 13, the optimal spatial frequencies obtained from the peaks of the spectra are 0.23 and 0.27 cyc/deg for the left and right eyes, respectively. Optimal temporal frequencies are also similar for the left and right eyes (3.7 and 4.2 Hz, respectively). The spatial frequency and temporal frequency bandwidths are also comparable for the two eyes.

Similar data from another cell are shown in Fig. 14. This neuron has RFs that are roughly separable in space and time for the two eyes. It is also notable that the temporal responses exhibit a clear tri-phasic behavior as indicated by the three alternating polarity contour regions one encounters when traversing the maps in Fig. 14A parallel to the vertical axis. Reflecting this multiphasic temporal response, the temporal frequency bandwidth is correspondingly narrow (2.46 and 2.32 octaves for the left and right eyes), and the low cutoff frequency (at half height) of 2.4 Hz for both eyes is relatively high. For most cells, the low cut-off is below 1 Hz (Movshon et al. 1978c; DeAngelis et al. 1993b).
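Equations 2 and 3 reduce to simple ratios once the spectral peaks have been located. The sketch below applies eq. 3 to the optimal frequencies quoted above for the cell of Fig. 13, and applies eq. 2 to hypothetical peak amplitudes, since the peak amplitudes of that cell are not given in the text. Function names are illustrative.

```python
def direction_selectivity_index(r_pref, r_nonpref):
    """Eq. 2: DSI from the spectral peak amplitudes for the preferred
    and non-preferred directions of motion."""
    return (r_pref - r_nonpref) / (r_pref + r_nonpref)

def optimal_velocity(tf_opt_hz, sf_opt_cpd):
    """Eq. 3: optimal velocity [deg/sec] from the optimal temporal
    frequency [Hz] and optimal spatial frequency [c/deg]."""
    return tf_opt_hz / sf_opt_cpd

# Values quoted in the text for the cell of Fig. 13.
v_left = optimal_velocity(3.7, 0.23)
v_right = optimal_velocity(4.2, 0.27)
velocity_ratio = max(v_left, v_right) / min(v_left, v_right)

# Hypothetical peak amplitudes, for illustration only.
dsi_selective = direction_selectivity_index(1.0, 0.25)
dsi_bidirectional = direction_selectivity_index(1.0, 1.0)
```

For these inputs the optimal velocities are about 16.1 and 15.6 deg/sec for the left and right eyes, a difference of less than 4%.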
Since the space-time RFs are separable, the cell is not selective to direction of motion. This is indicated by the fact that the peaks for both the forward and reverse portions of the spectrum are nearly equally strong (Fig. 14B). Separate tests using drifting sinusoidal gratings also confirm that this cell is bidirectional.

[Fig. 14 plot omitted. Panel A (LEFT, RIGHT): Time [msec] against Space [deg]; panel B: Temporal Frequency [Hz] against Spatial Frequency [c/deg]]
Fig. 14 Space-time RFs and their amplitude spectra in the spatio-temporal frequency domain are shown for another cell in the same format as that of Fig. 13. This cell had RFs that are space-time separable for the two eyes. Preferred orientations were 205 and 193 deg for left and right eyes, respectively. Bar stimuli used in reverse correlation mapping had dimensions of 2 x 0.5 deg. Stimulus duration was 52.8 msec (4 frames).

Results from all the binocular simple cells tested are summarized in Fig. 15, which shows correlations between parameters for the left and right eyes. The parameters are all derived from the space-time spectra except for those shown in C and E, which depend also on parameters defined in the space domain. The parameters are: optimal spatial frequency (A), optimal temporal frequency (B), RF width (C), direction selectivity index (DSI) (D), number of RF subregions (E), and optimal velocity (F). Matching of left and right eye values varies. The optimal spatial frequencies (Fig. 15A) are matched to within a factor of 1.2 for 77% of cells. Linear regression in the logarithmic domain gives a correlation coefficient of 0.88 and a slope of 1.02. Optimal temporal frequencies are also highly matched, to within a factor of 1.4 for 74% of cells (correlation coefficient=0.77, slope=0.81). For RF width, a match to within a factor of 1.5 is found for 82% of the cells (correlation coefficient=0.77, slope=0.96). The matching of direction selectivity indices for the two eyes appears weaker on visual inspection, and this is confirmed by regression analysis (correlation coefficient=0.60, slope=0.62). Only 67% of the cells have the DSI for the two eyes matched to within a factor of two (Fig. 15D). However, note that none of the highly direction selective cells (DSI > 0.3 for both eyes) preferred opposite directions of motion for the two eyes. The number of subregions (Fig. 15E) is an intuitive metric that is derived from the optimal spatial frequency and the RF width (Fig. 15A and 15C). It is the number of subregions that one expects to find in the spatial RF profile. Note that it is possible to count the number of subregions directly from a RF profile.
However, such attempts involve identifying "significant" subregions with respect to the noise level that varies from one RF to another, and tend to require arbitrary criteria, while the derivation procedure we employ here (see the legend of Fig. 15 for details) may be applied uniformly to all data without an arbitrary criterion. The number of subregions was matched to within a factor of 1.5 for 78% of the cells (correlation coefficient=0.64, slope=0.66).

Fig. 15F presents a comparison between preferred velocities for left and right eyes for all cells we have tested. For a substantial majority of neurons, optimal velocities for the two eyes are highly matched. The velocities are matched to within a factor of two for 92% of neurons. Yet as a population, the cells cover a wide range of the velocity domain from 1.5 to 40 deg/sec. These results are consistent with the results from our earlier study conducted using drifting sinusoidal gratings (Skottun and Freeman, 1984). However, the previous study compared only the optimal spatial frequency and orientation for the two eyes. It should be noted that all of the five descriptive parameters of the frequency spectrum are obtained from a single test for each eye. These results show that the left and right RFs are highly matched for all these frequency domain parameters. Taken together with the comparisons of RFs in the space-time domain presented earlier, they indicate that the only parameter that exhibits a systematic difference for many cells is the spatial RF phase. All other space-time and frequency domain parameters are generally well matched between the eyes.

[Fig. 15 plots omitted. Panels A-F: Opt. SF [cyc/deg], Opt. TF [Hz], RF Width [degs], DSI, Number of Subregions, Opt. Velocity [deg/sec]; horizontal axes: Left Parameter; vertical axes: Right Parameter]
Fig. 15 Comparisons of left and right RFs are made with respect to relevant parameters. The solid oblique line in each scatter plot represents the exact match of left and right parameter values. A: Optimal spatial frequency. Dashed oblique lines indicate a factor of 1.2 difference between left and right values. B: Optimal temporal frequency. Dashed oblique lines indicate a factor of 1.4 difference. C: RF width at 5% of the peak of the envelope. This is given by 1.73 times the value of w obtained by fitting a Gabor function (Fig. 7). Dashed oblique lines indicate a factor of 1.5 difference. D: Direction selectivity index (DSI). Dashed oblique lines indicate a factor of two difference. E: Number of RF subregions. Dashed oblique lines indicate a factor of 1.5 difference. This parameter is derived as: 2 (RF width) (preferred spatial frequency). F: Optimal speed. Dashed oblique lines indicate a factor of two difference. This parameter is given by the ratio of the optimal temporal and spatial frequencies.

The results of the left and right eye velocity comparisons are of interest because differential velocity selectivity for the two eyes is thought to be related to motion-in-depth preferences of cortical neurons (Cynader and Regan, 1978, 1982; Spileers et al., 1990). Fig. 16 depicts a modified version of the polar diagram devised by Cynader and Regan (1978). In this diagram, the angle of the vector from the origin to a point indicates the direction of motion in depth as viewed from above. The length of the vector indicates speed.
Fig. 16. Interpretation of monocular horizontal velocities is illustrated schematically with respect to motion-in-depth selectivity of neurons (Cynader and Regan, 1978). Neurons with matched preferred velocities for the two eyes should be selective to stimuli moving along the fronto-parallel plane, and will fall on the horizontal axis (e.g., point marked as A). A mismatch in the two eyes' preferred velocities will give the neuron selectivity to motion-in-depth along other trajectories. For example, if a neuron has preferred velocities that are opposite in direction but of the same magnitude for the two eyes, it is represented by a point along the vertical axis (e.g., point B). [Labels in the diagram: Left, Right (eyes); AWAY, TOWARD, LEFT, RIGHT (directions of motion); loci VL = VR, VL = -VR, VL = 0, VR = 0, 2VL = VR; example points A and B.]
Note that the angle between the lines of sight for the two eyes is 90 deg, and is highly exaggerated compared with realistic viewing conditions (e.g., with an inter-pupillary distance of 4 cm and at a fixation distance of
57 cm, the angle between the two eyes' directions is only 4 deg). An advantage of this diagram is that target velocity in depth is given by a simple vector sum of monocular image velocities for the two eyes, because the axes are orthogonal. For a given vector on this diagram, the monocular image velocity for the right eye is given by a projection of the vector onto the +45 deg axis, whereas that for the left eye is given by a projection onto the -45 deg axis. For example, a neuron that has matched optimal velocities for the two eyes would respond best when a stimulus moves within the fronto-parallel plane (vector A in Fig. 16). A neuron that shows opposite preferred directions for the two eyes would respond maximally for movements either toward or away from the animal (vector B in Fig. 16). Mismatched optimal velocities generally result in a preference for motion-in-depth along an oblique trajectory. Therefore, we wish to examine if, as a population, simple cells in the striate cortex are able to encode such information. In order to interpret our results with respect to motion-in-depth, we replot, in Fig. 17A, the data presented in Fig. 15F after correcting for the preferred orientation of each neuron to obtain the optimal horizontal velocity for the two eyes. The optimal horizontal velocity is obtained by dividing the optimal velocity by the sine of the preferred orientation given by the angle from horizontal (i.e., vertical orientation is 90 deg). Points for neurons that prefer orientations near horizontal are shifted toward the upper right corner. Open circles represent cells that are direction selective (DSI > 0.3) for both eyes, whereas filled circles depict those that are bidirectional. There is no systematic difference between the distributions for these two groups of neurons. In Fig. 17B, optimal horizontal velocities for the two eyes are plotted for each neuron on the polar diagram of Fig. 16. A point in Fig.
17B represents a motion-in-depth to which a neuron responds optimally. The angle of a vector from the center to a point specifies the preferred direction in depth and the length represents the optimal speed in depth. Coverage of the polar domain by the data points indicates the range of motion-in-depth that the cell population is capable of encoding. In this plot, direction selective neurons and bidirectional neurons must be plotted separately. For direction selective neurons, there is only one direction of motion that the neuron will respond to in this velocity space. Considering that the preferred directions are matched for all of our direction selective cells (N=20, DSI > 0.3 for both eyes; see Fig. 15D), these cells will only respond to motion to the left or right, near the fronto-parallel plane, as plotted by open circles. However, for bidirectional neurons, because there are two directions to which a neuron responds for each eye, the neuron will be responsive to any of the four cardinal directions in this diagram. Thus, bidirectional neurons are plotted four times in Fig. 17B. From inspection of the coverage of the velocity domain, it is clear that there are four sectors that lack points. These are centered on the lines of sight from the two eyes where the image velocity for one eye is close to zero (objects moving along a line of sight for an eye will have a stationary image on that eye's retina). Therefore, the results shown in Fig. 17 imply that simple cells in the striate cortex cannot represent the motion of objects that fall within these blank regions. We conclude that the encoding of motion-in-depth is not likely to be a function of simple cells in the striate cortex.
Fig. 17. Results are shown of a comparison of the optimal horizontal velocities between the two eyes. A: Optimal horizontal velocities for left and right eyes are plotted as a scatter diagram. Open circles represent direction selective cells that had a DSI (direction selectivity index) > 0.3 for both eyes. Closed circles represent data from non-direction selective cells. The solid oblique line indicates an exact match for the two eyes. Dashed oblique lines indicate a factor of two difference in velocities. Seven of 65 cells are not shown because they had a preferred orientation within 5 deg of horizontal for at least one eye, and consequently the preferred horizontal velocity was too large to plot. B: Coverage of the velocity space by the cell population in A is examined by plotting the velocity data on the diagram presented in Fig. 16. The radial dimension of the polar plot is logarithmic. Direction selective cells (open circles) in our sample preferred a matched direction of motion for the two eyes. Therefore, these cells can represent motion only along or near the horizontal axis. The two clusters of points for left- and right-ward motion along the horizontal are duplicates of each other (symmetric about the center). This is done to maintain a sufficient number of points in each sector by assuming optimal velocity distributions that are symmetric for left- and right-ward motions for a population of cells. Non-direction selective cells (filled circles) are plotted 4 times in the diagram because they will respond to any combination of left and right directions as long as the preferred speed is appropriate for each eye. Loci of velocities where left and right values differ by a factor of two are shown by oblique dashed lines. [Axes in A: Left Horizontal Opt. Velocity [deg/sec] and Right Horizontal Opt. Velocity [deg/sec], logarithmic. Labels in B: AWAY, TOWARD, LEFT, RIGHT; loci VR = 2VL and 2VR = VL; radial axis in deg/sec.]
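The geometry used in Figs. 16 and 17 can be illustrated with a minimal Python sketch. It assumes only what is stated in the text: the optimal horizontal velocity is the optimal velocity divided by the sine of the preferred orientation (measured from horizontal), and the right- and left-eye image velocities are the projections of the motion-in-depth vector onto the +45 deg and -45 deg axes. The function names are hypothetical, and the sign that distinguishes AWAY from TOWARD on the vertical axis is an assumption of this sketch, not something taken from the figure.

```python
import math


def vergence_angle_deg(ipd_cm, distance_cm):
    """Angle between the two lines of sight for symmetric fixation."""
    return math.degrees(2.0 * math.atan((ipd_cm / 2.0) / distance_cm))


def horizontal_velocity(optimal_velocity, orientation_deg):
    """Optimal velocity divided by sin(orientation from horizontal)."""
    return optimal_velocity / math.sin(math.radians(orientation_deg))


def depth_vector(v_left, v_right):
    """Vector sum of the monocular velocities on the polar diagram.

    The right-eye velocity lies along the +45 deg axis and the left-eye
    velocity along the -45 deg axis, so projecting the result back onto
    those axes returns v_right and v_left.
    """
    c = math.cos(math.radians(45.0))
    x = (v_right + v_left) * c  # fronto-parallel (LEFT/RIGHT) component
    y = (v_right - v_left) * c  # in-depth component (sign is an assumption)
    return x, y


def direction_and_speed(v_left, v_right):
    """Polar angle [deg] and length of the motion-in-depth vector."""
    x, y = depth_vector(v_left, v_right)
    return math.degrees(math.atan2(y, x)), math.hypot(x, y)
```

With an inter-pupillary distance of 4 cm and a fixation distance of 57 cm, `vergence_angle_deg` returns about 4 deg, in line with the value quoted in the text. Matched velocities give a vector on the horizontal axis (point A in Fig. 16), and equal but opposite velocities give a vector on the vertical axis (point B).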
Discussion
We have proposed a model of binocular disparity representation that is based on the phase difference between left and right RFs of simple cells, and we have presented results that are consistent with this model. We now consider how these results relate to previous studies, the implications of our findings, and some potential difficulties with our experiments.
Vertical-horizontal asymmetry
The most striking finding of this study is that there are simple cells with different left and right RF profiles, and that this difference is dependent on the preferred orientation of the cells. Cells with different RF profiles for the two eyes are tuned to orientations between oblique and vertical (see Fig. 11). On the other hand, cells that prefer horizontal orientations possess very similar RF profiles for the two eyes. This horizontal-vertical anisotropy of phase differences is functionally important, as it implies that the visual system has evolved to take advantage of a statistical bias in the distribution of binocular disparities in stereo image
pairs. Because of the horizontal separation of the two eyes, the range of binocular disparities in stereo image pairs is larger along the horizontal dimension than along the vertical. This was, of course, known to Barlow et al. (1967) who were the first to address this issue. They reported, for a population of cells, that the distribution of optimal binocular disparities is broader along the horizontal dimension than along the vertical dimension. The finding of a vertical-horizontal asymmetry, however, has not been confirmed by subsequent studies (von der Heydt et al., 1978; Ferster, 1981; LeVay and Voigt, 1988) in which procedures were employed to minimize the effects of eye drifts during experiments. It is possible that the lack of precise control for eye drifts may be a reason for this discrepancy. Other factors may also have affected the results of previous investigations. In early studies, a bright bar stimulus was usually used to map RFs and to study binocular interactions. The minimum response field mapping used in the first study explicitly excludes those areas of the RF that are inhibitory (Barlow et al, 1967). However, areas that are inhibitory to a bright bar are generally excitatory to a dark bar (Kulikowski et al. 1981; Jones and Palmer 1987a; DeAngelis et al. 1993a). Failure to use both bright and dark bars must have influenced estimates of the precise locations of simple cell RFs. Although dark excitatory areas generally produce some OFF response to bright bar stimulation, this is not always the case. For those dark excitatory areas that have a monophasic temporal response, it can be shown that the OFF response to a bright bar will be absent or severely reduced in amplitude. Thus, dark excitatory areas can appear as purely inhibitory if one uses only a bright bar stimulus, and purely inhibitory regions are generally not detectable for simple cells due to a lack of spontaneous discharge (Movshon et al. 1978a). 
However, there is an additional factor that should be noted. The term "binocular disparity" has been used to refer to two different concepts. On the one hand, the term is used to mean retinal image disparity which is the attribute of the stimuli. On the other, the term has been used to designate receptive field disparity or incongruity which is an attribute of neural wiring (Rodieck 1971; Bishop and Pettigrew, 1986). In the study by Barlow et al. (1967), the distribution of optimal stimulus disparities for a population of neurons showed a larger scatter along the horizontal dimension than along the vertical. Note that this finding does not require any a priori assumption about how disparity is encoded (i.e., position or phase model), as it is specified strictly in terms of stimulus attributes. However, other investigators have looked for, and failed to find, a horizontal-vertical asymmetry in receptive field disparities (Joshua and Bishop, 1970; von der Heydt et al., 1978; Ferster, 1981; LeVay and Voigt, 1988), assuming that the incongruity of minimum response fields is directly related to the preferred disparity (i.e., the position model). Interestingly, Barlow et al. (1967) noted that there are cells for which the preferred disparity cannot be predicted from the positions of minimum response fields for the two eyes. Therefore, it is possible that the data from all of the previous studies are consistent, and that the horizontal-vertical asymmetry observed by Barlow et al. may actually result from the orientation asymmetry that we have observed with respect to RF phase.
Relationship between the phase model and the traditional disparity tuning types
How does the phase model that we propose relate to the traditional notion of disparity selective neurons?
Specifically, how does the model relate to the disparity tuning categories – tuned-excitatory (now subdivided into tuned-zero, tuned-near, tuned-far), near, far, and tuned-inhibitory – that were originally proposed by Poggio and Fischer (1977)? It has been shown that the phase model is consistent with the disparity tuning data presented in previous studies. Prototypical disparity tuning curves of various types may be fit well with the phase model (Nomura et al. 1990; DeAngelis et al., 1995a). Does this mean that the phase model confirms the basic concept of disparity tuning types proposed by Poggio and colleagues? In what follows, we argue that this is not the case. If the phase model is correct, then a classification scheme based on conventional disparity tuning types is flawed.
Fig. 18. Predictions of the phase model are shown in order to relate disparity selectivity of neurons to the traditional classes of disparity tuning. A: A contour plot is shown of preferred disparity as a function of the phase difference between left and right RFs and the preferred spatial period. Each contour represents possible combinations of a phase difference and the spatial period (reciprocal of optimal spatial frequency) that result in preference to a particular disparity. The value indicated along each curve denotes the preferred disparity. The two thick curves show the boundaries of different disparity tuning types as defined by Poggio and colleagues between tuned-zero (T0) and tuned-near, tuned-far types (0.05 deg), and also between near, far and tuned-near, tuned-far types (1 deg) (Poggio et al., 1988). B & C: Left and right RFs are shown for two cases that result in the same preferred disparity of 0.5 deg. The optimal spatial period for B is 6 deg and the phase difference is 30 deg, while the spatial period for C is 2 deg and the phase difference is 90 deg. Positions of these cases are indicated also in A along the curve for 0.5 deg disparity. D: This case also has a phase difference of 90 degs as in C, but because it is tuned to low spatial frequency, preferred disparity is correspondingly large. [Axes in A: Phase Difference [deg] and Spatial Period [1/fopt, arc deg]; regions are labelled with the disparity tuning types (T0, tuned near and far, near and far, tuned inhibitory). In B, C, and D the offset d between the left and right RFs is marked as 0.5, 0.5, and 1.5, respectively.]
Fig. 18A describes how the preferred disparity of a neuron depends on both the preferred spatial period (reciprocal of spatial frequency) and the phase difference between left and right RFs under the phase model. Since there are cells tuned to a variety of spatial frequencies at each retinal region and eccentricity (Maffei and Fiorentini, 1973; DeValois et al. 1982), the figure should apply to each retinal location. Each contour is a hyperbolic locus of points defined by
(spatial period) × (phase difference) = 2π × (disparity)    (eq. 4)
and the value of disparity is parametrically changed from 0.05 to 4 deg for different curves. (There is an approximation in eq. 4 because peaks of a Gabor function do not exactly match with those of its sinusoidal component.) Each of these curves represents the locus of possible combinations of phase difference and preferred spatial period (frequency) that produces a given preferred disparity. For example, the points labelled B and C in Fig. 18A, on the locus of disparity = 0.5 degs, correspond to left and right RFs as shown in Fig. 18B and 18C, respectively. One cell, B, has a large spatial period and a small phase difference (30 deg), while the other, C, has a smaller spatial period and a larger phase difference (90 degs). However, the preferred disparities for both are the same, as indicated by the offset of the vertical line, d, in Fig. 18B and C. Disparity tuning types as defined by Poggio et al. (1988) are noted in the figure. Several additional points can be made based on Fig. 18. First, because the traditional disparity tuning types are based on preferred disparity (at least for the tuned types), the neurons' size or spatial frequency preference is not taken into account (Poggio and Fischer, 1977; Poggio and Poggio, 1984). For example, tuned-excitatory (tuned-inhibitory) neurons are defined as cells that have their peak (or minimum) responses
within a narrow range (0 ± 0.1 deg) of horizontal disparities (Poggio et al. 1985). The tuned-zero (T0) type includes cells that exhibit peak responses within ± 0.05 deg of zero disparity. Tuned-near (TN) and tuned-far (TF) neurons are those that show a peak at a disparity between 0.05 and 1 deg (Poggio et al., 1988). Therefore, this classification scheme ignores small variations in preferred disparity below a certain criterion. However, such variations may be precisely the mechanism used to represent small disparities with high precision, and therefore may be the basis of the remarkably high stereoacuity that is observed psychophysically (Westheimer and McKee, 1980). In other words, the traditional scheme of disparity tuning types may not adequately describe the actual variety of disparity preferences that are present at high spatial frequencies. Perhaps for this reason, Poggio et al. (1988) subdivided the tuned-excitatory category into three subclasses: tuned-zero, tuned-near, and tuned-far. However, this subdivision merely adds yet another arbitrary criterion for classification, and potentially one requires as many subdivisions as there are size or frequency scales in the visual system (Blakemore and Campbell 1969; Wilson et al. 1983, 1990; DeValois and DeValois, 1988). Clearly, such subdivisions cannot be a fundamental solution in reconciling neurons' selectivities to disparity and spatial scale, as long as the classification is based on the absolute disparity preference alone. Thus, the classification scheme employed by Poggio et al. (1988) does not capture the diversity of disparity selectivities that are possible. The phase model, on the other hand, has a built-in scaling that allows a uniform description of preferred disparity variations across the entire spatial frequency range, without arbitrary criteria to separate one tuning type from another.
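The relationship in eq. 4 and its interaction with the disparity-based tuning categories can be made concrete with a short Python sketch. With the phase difference expressed in degrees rather than radians, eq. 4 becomes disparity = (spatial period) × (phase difference) / 360. The function names are hypothetical, the treatment of the exact boundary values (0.05 and 1 deg) is an assumption, and near/far sign is ignored. The third example cell (period 6 deg, phase difference 90 deg) is an illustrative choice, not a measured value.

```python
def preferred_disparity(spatial_period_deg, phase_difference_deg):
    """eq. 4 with phase in degrees: d = period * phase / 360."""
    return spatial_period_deg * phase_difference_deg / 360.0


def max_disparity(spatial_period_deg):
    """Largest disparity reachable at this period (phase difference of 180 deg)."""
    return preferred_disparity(spatial_period_deg, 180.0)


def tuning_type(disparity_deg):
    """Label by the absolute-disparity criteria quoted in the text.

    Boundary handling (<= versus <) is an assumption of this sketch.
    """
    d = abs(disparity_deg)
    if d <= 0.05:
        return "tuned-zero"
    if d <= 1.0:
        return "tuned-near/tuned-far"
    return "near/far"


# (spatial period [deg], phase difference [deg])
examples = {
    "B": (6.0, 30.0),
    "C": (2.0, 90.0),
    "low-frequency example": (6.0, 90.0),
}
for label, (period, phase) in examples.items():
    d = preferred_disparity(period, phase)
    print(label, d, tuning_type(d))
```

Cells B and C land on the same 0.5 deg contour despite very different phase differences, while the same 90 deg phase difference at a 6 deg period yields 1.5 deg and crosses the 1 deg criterion. The label therefore depends on spatial period as much as on phase difference, which is the point made in the text.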
Second, phase difference and disparity preference are not equivalent, in that a large phase difference does not necessarily imply selectivity to large disparities, and vice versa. For example, points C and D in Fig. 18A have the same phase difference (90 degs), but the preferred disparities are widely different, as a result of the RF profiles shown in Fig. 18C and D, respectively. Furthermore, cells with a spatial period smaller than 1 deg (or preferred spatial frequency higher than 1 c/deg), for example, cannot be selective to disparities larger than 0.5 deg for any phase difference. This means that all cells selective to high spatial frequencies are tuned types, i.e., tuned-zero, tuned-near and tuned-far. Conversely, near and far cells must be tuned to low spatial frequencies. These predictions are consistent with the reported tendency for tuned-near and tuned-far cells to have larger disparity tuning widths than tuned-zero neurons (Poggio et al., 1988). In addition, near and far cells exhibit even larger disparity tuning widths than tuned-near and tuned-far types (Poggio et al., 1988). Some of the disparity tuning curves for near and far cells appear to show a flat responsive region on one side of zero disparity that extends out to a large disparity (e.g., see Fig. 9 of Poggio, 1984). However, such a curve can be consistent with the phase model, if the cell was driven to saturation and the range of disparities tested was insufficient to cover the complete tuning curve. Indeed, it has been shown that such curves can be fit well by a phase model that also incorporates a response threshold (Nomura et al. 1990). The phase model may also account for the claim (Hubel and Wiesel, 1970, 1973) that area 17 (V1) neurons are tuned to zero disparity whereas area 18 (V2) neurons are tuned to non-zero disparities. This is perhaps a trivial consequence of a large difference in the average preferred spatial frequencies of neurons in the two areas.
The optimal spatial frequency for most area 17 cells is 0.3-2.0 c/deg, while that for most area 18 cells is below 0.3 c/deg (Movshon et al., 1978c). LeVay and Voigt (1988) also report that cells which have an optimal disparity within 0.5 deg of zero show sharper disparity tuning than those tuned to larger disparities. This is also consistent with the expectation of the phase model (i.e., the size-disparity correlation; see DeAngelis et al. 1995a). Another factor that previous studies have examined is the relationship between the ocular dominance of cells and their disparity tuning types. It has been reported that highly binocular neurons tend to be tuned-excitatory types, and near and far cells tend to be nearly monocular (Poggio and Fischer, 1977; Ferster, 1981; Fischer and Krüger, 1979; LeVay and Voigt, 1988). However, we have found no correlation between the phase difference and the ocular dominance (Table 1). In particular, nearly half of our sample of highly binocular cells had a phase difference greater than 45 degs. This is not surprising since the preferred disparity of a cell depends on both the phase difference and optimal spatial frequency. Because of this, a substantial number of cells that were classified as tuned-excitatory in the previous studies might have had a large phase
difference, especially those tuned to high spatial frequencies. It must be noted, however, that our sample does not include cells that are truly monocular. It is conceivable that we are missing a class of near and far cells whose disparity tuning arises from an entirely different mechanism such as a purely inhibitory region similar to that responsible for end stopping (Maske et al. 1986). If such an area is present in the silent eye and is offset from retinal corresponding points, it may endow an apparently monocular neuron with a disparity tuning.
Lack of an exact quadrature phase relationship
For cells that are tuned to orientations near vertical, we should note that the phase difference distribution does not appear to follow an exact quadrature relationship, i.e., a 90-deg phase difference between left and right fields. In computational models, such quadrature filters are frequently used to encode disparity because of the efficiency and simplicity of computation that results from the quadrature relationship (Sanger, 1988; Jenkin et al., 1988; Fleet et al., 1991). However, the visual cortex does not appear to have taken advantage of the possible gain in efficiency in this case. This is not surprising, however, given that other aspects of RFs also do not follow a quadrature relationship that may be desirable theoretically. For example, the monocular RF phase of neurons does not appear to favor even or odd symmetries. Rather, phase is distributed quite uniformly (Field and Tolhurst, 1986; Jones and Palmer, 1987a; Hamilton et al. 1989; DeAngelis et al., 1993a). There are reports, however, that monocular phase differences between RFs of nearby simple cells occur in multiples of 90 degs (Pollen and Ronner, 1981; Foster et al., 1983; but see DeAngelis et al. 1992). Therefore, at this time, it is not clear which aspects of RF organization may follow the expectations derived from theoretical considerations.
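The computational convenience of a quadrature pair, and what is lost when the pair is not in exact quadrature, can be sketched in Python under simplifying assumptions. Each filter is idealized here as responding to a sinusoidal input with amplitude × cos(stimulus phase - RF phase); this is an assumption of the sketch, not a description of measured simple cell responses, and all function names are hypothetical. For a pair with an arbitrary phase difference, one orthogonalization step (subtract the projection onto the first filter, then rescale) recovers the quadrature component, provided that the phase difference is not 0 or 180 deg.

```python
import math


def filter_response(amplitude, stim_phase_deg, rf_phase_deg):
    """Idealized linear response to a sinusoid (assumption of this sketch)."""
    return amplitude * math.cos(math.radians(stim_phase_deg - rf_phase_deg))


def to_quadrature(r1, r2, phase_diff_deg):
    """Convert responses of a pair with RF phases 0 and phase_diff_deg
    into the (even, odd) responses that a 0/90 deg pair would give."""
    delta = math.radians(phase_diff_deg)
    s = math.sin(delta)
    if abs(s) < 1e-9:
        raise ValueError("a phase difference of 0 or 180 deg is degenerate")
    even = r1
    odd = (r2 - r1 * math.cos(delta)) / s
    return even, odd


def odd_noise_gain(phase_diff_deg):
    """Noise amplification in the recovered odd component, assuming
    independent noise of equal variance in r1 and r2."""
    delta = math.radians(phase_diff_deg)
    return math.sqrt(1.0 + math.cos(delta) ** 2) / abs(math.sin(delta))


# Usage: recover stimulus phase and amplitude from a 50 deg pair.
r1 = filter_response(2.0, 35.0, 0.0)
r2 = filter_response(2.0, 35.0, 50.0)
even, odd = to_quadrature(r1, r2, 50.0)
recovered_phase = math.degrees(math.atan2(odd, even))
recovered_amplitude = math.hypot(even, odd)
```

In this idealization the recovered phase and amplitude match those of the input for any usable phase difference, but `odd_noise_gain` grows without bound as the phase difference approaches 0 or 180 deg, so the conversion becomes increasingly sensitive to noise.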
The fact that the phase difference between the two eyes' RFs is not always 90 degs does not invalidate the basic notion of the phase model. Clearly, the visual system has developed highly matched RF phase for the two eyes for cells tuned to near-horizontal orientations (Fig. 11). In addition, it can be shown mathematically that a pair of RFs that are not in an exact quadrature can encode the same information as a quadrature pair, as long as the phase difference is not 0 or 180 degs (Gram-Schmidt orthogonalization; see e.g., Williamson and Trotter, 1979). However, the overall signal-to-noise ratio of the encoded information will suffer if the phase difference is too close to 0 or 180 degs, because signals carried by the pair become highly correlated. At least, it is clear that there are sufficient numbers of cells with suitable phase differences in the distribution shown in Fig. 11. It is possible that the developmental cost of achieving an exact quadrature relationship would be too high. Alternatively, the visual system may employ a somewhat redundant representation of binocular disparity using many more phase differences than the most parsimonious construction dictates. Computational modeling work by Qian (1994a) employs 8 phase differences to encode disparity at one spatial frequency scale. Although this causes an over-representation of information at each scale, there may be other benefits to a redundant representation, such as more reliable determination of disparity at each location.
Integration of disparity information in the 3-D orientation-phase difference-spatial frequency domain
Considerations of stereo mechanisms tend to focus on cells that are tuned to near-vertical orientations, because vertical contours carry the predominant information about horizontal disparities. However, cells that are tuned to oblique orientations can still contribute useful information toward a representation of depth (see Fig. 11 inset).
Therefore, disparity selectivity of a binocular simple cell must be considered in the 3-D joint domain of spatial frequency, phase difference between left and right RFs, and orientation.
Fig. 19. A locus of constant horizontal disparity (=1 deg) is shown as a surface plot as a function of preferred orientation and spatial period, and phase difference between RFs for the two eyes. Any binocular simple cell that lies on this surface has a preferred horizontal disparity of 1 deg. Surfaces are defined by the equation: (spatial period) × (phase difference) = 2π × (horizontal disparity) / sin(orientation). The grid on the surface consists of three sets of curves, each representing contours for one fixed parameter. [Axes: Orientation [deg], Spatial Period [deg], Phase Difference [deg].]
The surface plotted in Fig. 19 represents a locus of constant horizontal disparity (1 deg) in this 3-D domain. Any cell whose attributes fall on this surface has a preferred horizontal disparity of 1 deg. Loci for other disparities form similar surfaces (not shown) that are layered around the example surface. Surfaces for larger horizontal disparities lie progressively farther away from the view point used in this plot, whereas those for smaller disparities are
situated progressively closer. Note that the plot shown in Fig. 18A corresponds to the top face of the cube in Fig. 19. Loci for all horizontal disparities meet at the phase difference and spatial period axes. Therefore, not all parts of the cube are populated with cells. Specifically, the region near the far bottom vertex of the cube is empty. There are some interesting implications that may be drawn from Fig. 19. First, almost all binocular simple cells may contribute to stereopsis because they are contained within this cube. Second, the notion of coarse-to-fine (or multiple spatial frequency) stereo mechanisms (Marr and Poggio, 1979; Marr 1982; Wilson et al. 1991; Blake and Wilson 1991) should be extended to include the dimension of orientation. One of the motivations for coarse-to-fine stereo models is that ambiguous stereo matches are eliminated or reduced by combining information across multiple spatial frequency scales. This disambiguation process applies equally well to schemes that combine disparity information across multiple orientations. Therefore, we expect that if disparity information is integrated over the joint spatial frequency-orientation domain, further improvements may be achieved for accurate estimation of horizontal disparities. One may visualize the cube in Fig. 19 as a volume populated by binocular simple cells. Presentation of a broad-band or textured planar stimulus in depth (e.g., a random dot stereogram), at a disparity of 1 deg, will "light up" the surface shown in Fig. 19. Such a distinct pattern of neural activity in V1 may uniquely signal the presence of a specific disparity. A neuron that receives input from cells that lie on this surface is a good candidate for a "depth neuron", although it is not clear if there are such neurons in the visual system.
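The disambiguation that motivates coarse-to-fine schemes can be illustrated with a toy Python sketch. It is not the mechanism proposed here, nor any published algorithm: it simply assumes that each spatial frequency scale reports an interocular phase difference wrapped to the range -180 to 180 deg, so that a single scale determines disparity only up to a multiple of its period. The coarsest scale is assumed to be unambiguous (true disparity smaller than half its period), and each finer scale then selects the candidate closest to the running estimate. All names and parameter values are hypothetical, and the extension to orientation discussed above is not modeled.

```python
import math


def wrapped_phase_deg(disparity_deg, freq_cpd):
    """Interocular phase difference produced by a disparity, wrapped to [-180, 180)."""
    phase = 360.0 * freq_cpd * disparity_deg
    return (phase + 180.0) % 360.0 - 180.0


def candidates(phase_deg, freq_cpd, max_abs_disparity_deg):
    """All disparities consistent with one wrapped phase at one scale."""
    period = 1.0 / freq_cpd
    base = phase_deg / 360.0 * period
    k_max = int(math.ceil(max_abs_disparity_deg / period)) + 1
    values = [base + k * period for k in range(-k_max, k_max + 1)]
    return [d for d in values if abs(d) <= max_abs_disparity_deg]


def coarse_to_fine(true_disparity_deg, freqs_cpd, max_abs_disparity_deg=2.0):
    """Start from the coarsest scale, then refine with each finer scale."""
    freqs = sorted(freqs_cpd)
    estimate = wrapped_phase_deg(true_disparity_deg, freqs[0]) / 360.0 / freqs[0]
    for f in freqs[1:]:
        phase = wrapped_phase_deg(true_disparity_deg, f)
        options = candidates(phase, f, max_abs_disparity_deg)
        estimate = min(options, key=lambda d: abs(d - estimate))
    return estimate


# Usage: a 0.7 deg disparity seen at 0.5 and 2 c/deg.
fine_only = wrapped_phase_deg(0.7, 2.0) / 360.0 / 2.0
combined = coarse_to_fine(0.7, [0.5, 2.0])
```

In this example the 2 c/deg scale alone reports the aliased value of 0.2 deg, whereas the combination with the 0.5 c/deg scale selects the candidate at 0.7 deg.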
Computational models and physiological representations
Recently, several computational algorithms for stereopsis have been presented that rely on the phase information contained in a multi-scale image representation (Sanger, 1988; Jenkin et al., 1988; Fleet et al. 1991). In these schemes, each monocular image of a stereo-pair is first transformed into a multi-scale representation using Gabor filters. Then the phase of local spatial frequency components is extracted. The differences between the phases of left and right images at corresponding points are combined across spatial frequency scales to arrive at the final estimate of disparity. On the surface, these computational schemes are quite similar to the phase-based disparity encoding model that we have proposed. In both cases, Gabor functions at multiple spatial frequency scales are used to extract phase information. Also, Gabor patches at corresponding points are used without any positional offset (incongruity) of left and right RFs. One critical difference, however, is that these computational models are not physiologically realizable. In the computational schemes, the phases are obtained by processing monocular images separately and then these phase estimates are binocularly combined to arrive at an estimate of disparity at each location. None of these models employ binocular RFs as they exist for most simple cells in the visual cortex (Kato et al. 1981; Ohzawa and Freeman, 1986a,b). This fact has been pointed out recently (Qian, 1994a). In other words, simple cells implement Gabor filtering and binocular combination simultaneously using binocular RFs. In this sense, the very notion of extracting "matching primitives" from each monocular image is not suitable when considering physiological models of stereopsis. Whether or not an algorithm for disparity computation is physiologically realizable may not be a significant factor for machine vision systems, and at a fundamental level the computational models and the brain may encode information that is mathematically equivalent. Ultimately, however, a physiologically accurate model of visual processing must be based on the properties of binocular RFs that we find.
Comparisons with psychophysics
The phase-based disparity encoding model has been criticized on the basis of data from human psychophysics (Liu et al. 1992). Although there may be species differences, it is important to consider evidence for and against our model.
In humans, there is some evidence that the prediction of a strict phase model does not hold at spatial frequencies higher than 2-3 c/deg. Results of psychophysical experiments on the fusion range for difference-of-Gaussian (DOG) stereo targets indicate that, for spatial frequencies above 2-3 c/deg, subjects are able to fuse stimuli with disparities larger than that predicted from the phase model (Schor and Wood, 1993; Schor et al, 1984a,b). For spatial frequencies below 2-3 c/deg, the prediction of the phase model is in good agreement with psychophysical results. Threshold disparity, measured as a function of the spatial frequency of a sinusoidal grating stimulus, has also been shown to behave in a similar manner although there is substantial individual variation (Legge and Gu, 1989). However, a recent psychophysical study by Smallman and MacLeod (1994) provides somewhat different results. Their psychophysical data generally show good agreement with the prediction of the phase model. Data from one subject closely follows the prediction of the phase model up to a spatial frequency of 11 c/deg. Data from another subject show a gradual deviation from the prediction of the phase model in the form of a reduced slope. In either case, there is not an obvious two-segment curve as found by Schor and colleagues. The gradual deviation found by Smallman and MacLeod (1994) may be consistent with the phase model if one accounts for the fact that the number of subregions in simple cell RFs tends to increase somewhat with spatial frequency (DeValois et al., 1982; DeAngelis et al., 1995a). This means that RF size does not scale inversely with spatial frequency, but shrinks at a smaller rate. For binocular matching, this allows a relatively larger range of disparities to be encoded at high spatial frequencies. 
Characteristic disparity and the phase model
A recent physiological study suggests that the owl's visual system may encode disparity using a mechanism that is based on positional offsets rather than phase differences (Wagner and Frost, 1993, 1994). Using binocularly presented drifting sinusoidal gratings with a variety of phase shifts (Freeman and Robson, 1982; Ohzawa and Freeman, 1986a), they show that peaks of disparity tuning curves obtained at different spatial frequencies coincide at a particular disparity, which is defined as the characteristic disparity for the neuron. Because the peaks will coincide only if the left and right RFs are identical, Wagner and Frost conclude that
left and right RF profiles must be the same. This leads one to the conclusion that, if the characteristic disparity is not zero, then left and right RFs must not be centered at corresponding retinal points. This is a puzzling result in light of our findings from the cat. We have clearly shown that left and right RFs are different for many neurons. For those neurons, a characteristic disparity cannot exist because the prerequisite of matched RF profiles is not satisfied. Why do our results disagree with those of Wagner and Frost? One possibility is that there is simply a species difference between cats and owls in terms of how disparity is encoded. Another possibility is related to the analysis of Wagner and Frost (1993). They compare the goodness of fit of their data to the position and the phase models. It is not clear from the report, however, if this comparison has been made in a neutral way. In our fitting of the left and right RF profiles, the RF phase for each eye is referenced to the center of the Gaussian envelope, which is one of the free parameters. This is because an absolute disparity reference is not available for the paralyzed cat. In other words, our analysis decouples phase and position, allowing us to estimate the phase difference without any assumption concerning incongruity. However, in Wagner and Frost's fitting of the phase model to the data, it appears that the envelopes of the left and right RFs were constrained to the corresponding points that were estimated by measurements. This is a reasonable analysis if there is no possibility of measurement errors. Assume, however, that there is some estimation error for zero disparity (i.e., retinal corresponding points). This error would not affect the fit to the characteristic disparity model (i.e., the position model) as it simply produces an offset that is the same for all spatial frequencies. 
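The effect of an error in the zero-disparity reference can be sketched numerically. The example below assumes idealized cells responding to gratings: for a pure position offset the tuning peak sits at the offset for every spatial frequency, whereas for a pure phase difference the peak sits at (phase difference / 360) times the spatial period. The function names, the 0.1 deg reference error, and the test frequencies are hypothetical values chosen for illustration; they are not taken from Wagner and Frost's data or from the measurements reported here.

```python
def position_model_peak(offset_deg, spatial_freq_cpd):
    """Peak disparity for matched RFs displaced by offset_deg.

    Idealized assumption: the peak does not depend on spatial frequency.
    """
    return offset_deg


def phase_model_peak(phase_diff_deg, spatial_freq_cpd):
    """Peak disparity for RFs that differ only by an interocular phase."""
    return (phase_diff_deg / 360.0) / spatial_freq_cpd


def apparent_phase_deg(measured_peak_deg, spatial_freq_cpd):
    """Phase difference implied by a peak measured from the estimated zero."""
    return 360.0 * spatial_freq_cpd * measured_peak_deg


if __name__ == "__main__":
    freqs = (0.25, 0.5, 1.0)  # c/deg, hypothetical test frequencies
    zero_error = 0.1          # deg, hypothetical error in the zero-disparity estimate

    # Position cell: every peak shifts by the same amount, so the peaks
    # still coincide at a single (shifted) disparity.
    shifted = [position_model_peak(0.3, f) + zero_error for f in freqs]
    print("position-model peaks with error:", shifted)

    # Phase cell with a true 90 deg phase difference: the phase inferred
    # from the shifted peaks is no longer constant across frequency.
    for f in freqs:
        measured = phase_model_peak(90.0, f) + zero_error
        print(f, "c/deg -> apparent phase", apparent_phase_deg(measured, f), "deg")
```

In this sketch the reference error adds 360 x frequency x error to the inferred phase, so a constant-phase description fits progressively worse at higher spatial frequencies, while the coincidence of position-model peaks is untouched.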
In other words, peaks will still align at one disparity regardless of the size of the error in estimating zero disparity. They will just coincide at a slightly shifted disparity. On the other hand, if phase is referenced to the estimate of zero disparity, then any errors in this estimate may have a serious impact on the fit of the data to the phase model. For a conclusive resolution of this apparent discrepancy between results from the owl and the cat, one must obtain measurements of RF profiles in the owl's visual Wulst. Testing whether there is a characteristic disparity for cortical neurons in the cat does not seem profitable, for we have already shown that left and right RF profiles are clearly different for a substantial number of cells. For these neurons, there is, by definition, no characteristic disparity. In addition, a computational study has shown that the presence of a characteristic disparity does not necessarily support the position model and may be consistent with both the position and phase models (Qian, 1994b) if responses of a complex cell are considered (Ohzawa et al., 1990; Qian, 1994a). Since Wagner and Frost (1993) did not distinguish simple and complex cells (or their equivalents in the owl's Wulst), it is possible that some of their data are actually consistent with the phase model.
Sensitivity to motion-in-depth
Our measurements show that, among all parameters, only the response amplitude and the RF phase change systematically over the time course of the response. However, the changes that occur in these parameters are matched well for the two eyes. Therefore, the phase difference for the two eyes remains quite stable over the time course of the response for most simple cells. Considering that it is the phase difference that determines the preferred disparity (assuming no position disparity), preferred disparity remains constant over the time course of the response for most cells.
An implication of this finding is that these cells prefer motion parallel to the fronto-parallel plane. Additional data with respect to simple cells' selectivity for various trajectories of motion-in-depth have also been obtained. For each cell, the preferred horizontal velocity for left and right RFs is determined (Fig. 17A) from analysis of the data in the frequency domain. When the velocity distributions are plotted on the binocular velocity diagram (Fig. 17B), there is a conspicuous lack of data points for directions of motion along the line of sight for each eye. This means that simple cells, as a population, are unable to represent stimuli that move along these trajectories. This situation results from the fact that the preferred velocities of RFs for the two eyes are well matched. To have cells in the empty areas of Fig. 17B would require widely different velocity preferences for the two eyes. Reasons for the missing representations are not known. It
is possible that cells which represent these areas of binocular velocity space may appear monocular due to widely disparate velocity preferences, simply because the stimulus set used in initial tests (with gratings) may not extend to a low enough velocity range. Therefore, we may have inadvertently excluded these cells from our population, because detailed mapping of RFs could be carried out only for binocular neurons. It should also be noted that binocular velocity space, as presented in Figs. 16 and 17B, is quite distorted as compared with the real map of 3-D space. The missing area of representation, in real spatial terms, represents only a small angle of subtense (Cynader and Regan, 1978). Therefore, the lack of representation for a small fraction of real space may not be significant for the animal. Alternatively, it is possible that these velocity combinations are encoded elsewhere in the brain.
Roles of simple and complex cells
In this paper, we have examined RF profiles of simple cells in detail to evaluate whether the behavior of a population of cells is consistent with the phase model. We have shown that there is a sufficient set of simple cells to support a phase-based encoding of binocular disparity. Our results, however, do not exclude the possibility that positional offsets (incongruities) also play a role in stereo vision. Definitive determination of the relative contributions of phase differences and position offsets to a cell's disparity preference must await further study. Encoding information for stereopsis, however, is not the sole purpose of simple cells. In fact, because simple cells exhibit approximately linear spatiotemporal response properties (Movshon et al., 1978a; DeAngelis et al., 1993b; McLean et al., 1994), a population of simple cells can preserve the information that is available in the responses of retinal ganglion cells, although in a different form (Watson and Ahumada, 1989; Watson, 1991).
Simple cells represent visual information in a binocularly combined transform that appears to employ wavelet-like basis functions or local spatiotemporal frequency analyzers (Marcelja, 1980; Robson, 1983; Watson, 1983; Sakitt and Barlow, 1982; Geisler and Hamilton, 1986; Daugman, 1985). Therefore, it would be inappropriate to link the response properties of simple cells exclusively to specific visual functions, e.g., perception of form, depth, or motion. Rather, they carry information that is potentially useful for all of these functions. The implication of such a representation is that simple cells are selective to all of the parameters that they encode. Therefore, if one looks at the information available from a single cell, ambiguity is unavoidable as to which parameter change causes a given change in response. This is a serious problem for any notion of single neurons underlying perception (Barlow, 1972). One way to solve such an ambiguity problem is to combine outputs of multiple neurons. Specifically, activity from multiple simple cells may be combined to produce an output that retains selectivity to a smaller set of parameters of interest, but is insensitive to all other parameters. Complex cells appear to perform precisely such a function. Many complex cells retain selectivity to orientation, spatial frequency, and motion, but their sensitivities to position and the sign of contrast (bright or dark) are largely eliminated (Movshon et al., 1978b; DeValois et al., 1982). An especially notable group of complex cells, with regard to binocular function, are those that respond to a particular disparity independent of stimulus position and contrast polarity, a feature that simple cells lack (Ohzawa et al., 1990). Other complex cells are insensitive to binocular disparity in addition to position and contrast polarity (non-phase-specific cells; Ohzawa and Freeman, 1986b).
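The idea of pooling simple cells to remove unwanted selectivity can be sketched with an energy-type calculation in the spirit of the complex cell model of Ohzawa et al. (1990). The example below is a minimal illustration under stated assumptions, not the published model: each hypothetical subunit sums left- and right-eye responses to a grating linearly and is then half-squared, the four subunits have left-eye RF phases in 90 deg steps, and all share the same interocular RF phase difference. All function and parameter names are invented for this example.

```python
import math


def simple_subunit(stim_phase, interocular_phase, rf_phase_left, rf_phase_right):
    """Half-squared output of one hypothetical binocular simple-cell subunit.

    The linear stage adds left- and right-eye responses to a grating; the
    right-eye grating is shifted by interocular_phase (disparity, in radians).
    """
    linear = (math.cos(stim_phase - rf_phase_left)
              + math.cos(stim_phase + interocular_phase - rf_phase_right))
    return max(0.0, linear) ** 2


def complex_cell(stim_phase, interocular_phase, rf_phase_diff):
    """Sum of four subunits whose left-eye RF phases step by 90 deg."""
    total = 0.0
    for k in range(4):
        rf_left = k * math.pi / 2.0
        total += simple_subunit(stim_phase, interocular_phase,
                                rf_left, rf_left + rf_phase_diff)
    return total


if __name__ == "__main__":
    preferred = math.pi / 2.0  # hypothetical interocular RF phase difference
    for stim_phase in (0.0, 1.0, 2.0, 3.0):
        simple = simple_subunit(stim_phase, preferred, 0.0, preferred)
        pooled = complex_cell(stim_phase, preferred, preferred)
        print(f"stimulus phase {stim_phase:.1f} rad: "
              f"simple {simple:.3f}, pooled {pooled:.3f}")
```

Under these assumptions the pooled output reduces to 2 + 2cos(interocular_phase - rf_phase_diff), which depends on disparity but not on stimulus phase. Shifting the stimulus phase by 180 deg, which reverses the contrast of a grating, therefore leaves the pooled output unchanged, whereas the output of a single subunit changes.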
It is interesting that physiologists tend to seek ever sharper selectivities in a neural response (e.g., the concept of a "grandmother cell"), but do not usually think about a mechanism whose purpose is to make cells insensitive to a particular stimulus parameter. The implicit assumption of many studies is that sharp selectivity is something to be achieved, but insensitivity is not. Although we tend to ignore the insensitivity of neurons to some stimulus parameters, perhaps this should not be the case. Complex cells appear to be actively eliminating sensitivities to a selected set of parameters. To understand how this may be beneficial, we have proposed a complex cell model that combines the outputs of four or more simple cells, and we have shown that such a model is able to predict responses of complex cells to binocular stimuli (Ohzawa et al., 1990). In this model, simple subunits with different left and right RFs play a key role in producing asymmetric disparity tuning profiles. The evidence we have presented in this study, that there are such simple cells, is a crucial requirement and lends further support to the complex cell model that we have proposed.
In summary, we have presented neurophysiological results that are consistent with the notion that binocular disparity information is encoded via structural differences between RFs for the two eyes. This phase-based encoding model provides a natural binocular extension to the current notion of spatial form encoding by a population of simple cells. These simple cells form a unified source of information which other cortical neurons can draw upon to subserve a multitude of visual functions including (but not limited to) form and depth perception.
Acknowledgments
We thank our colleagues, Geoff Ghose for help with these experiments, and Aki Anzai for valuable comments on the manuscript. We also thank Larry Cormack for suggesting an upper bound for the distribution in Fig. 11. This work was supported by research and CORE grants from the National Eye Institute (EY01175 and EY03176), and by a grant from the Human Frontier Science Program.
References
Adelson, E.H., and Bergen, J.R. Spatiotemporal energy models for the perception of motion. J. Opt. Soc. Am. A 2: 284-299, 1985.
Anzai, A., Bearse, M.A. Jr., Freeman, R.D., and Cai, D. Contrast coding by cells in the cat's striate cortex: Monocular vs. binocular detection. Visual Neurosci. 12: 77-93, 1995.
Baker, C.L. Jr. Spatial- and temporal-frequency selectivity as a basis for velocity preference in cat striate cortex neurons. Visual Neurosci. 4: 101-113, 1990.
Barlow, H.B., Blakemore, C., and Pettigrew, J.D. The neural mechanism of binocular depth discrimination. J. Physiol. (London) 193: 327-342, 1967.
Barlow, H.B. Single units and sensation: A neuron doctrine for perceptual psychology? Perception 1: 371-394, 1972.
Bishop, P.O., and Pettigrew, J.D. Neural mechanisms of binocular vision. Vision Res. 26: 1587-1600, 1986.
Blake, R., and Wilson, H.R. Neural models of stereoscopic vision. Trends in Neurosci. 14: 445-452, 1991.
Blakemore, C., and Campbell, F.W. On the existence of neurones in the human visual system selectively sensitive to the orientation and size of retinal images. J. Physiol. (London) 203: 237-260, 1969.
Bracewell, R.N. The Fourier transform and its applications (2nd ed.). New York: McGraw-Hill, 1978.
Carandini, M., and Heeger, D.J. Summation and division by neurons in primate visual cortex. Science 264: 1333-1336, 1994.
Casanova, C., Nordmann, J.P., Ohzawa, I., and Freeman, R.D. Direction selectivity of cells in the cat's striate cortex: differences between bar and grating stimuli. Visual Neurosci. 9: 505-513, 1992.
Cynader, M., and Regan, D. Neurones in cat parastriate cortex sensitive to the direction of motion in three-dimensional space. J. Physiol. (London) 274: 549-569, 1978.
Cynader, M., and Regan, D. Neurons in cat visual cortex tuned to the direction of motion in depth: effect of positional disparity. Vision Res. 22: 967-982, 1982.
Daugman, J.G. Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters. J. Opt. Soc. Am. A 2: 1160-1169, 1985.
DeAngelis, G.C., Ohzawa, I., and Freeman, R.D. Depth is encoded in the visual cortex by a specialized receptive field structure. Nature (London) 352: 156-159, 1991.
DeAngelis, G.C., Ghose, G.M., Ohzawa, I., and Freeman, R.D. Spatiotemporal receptive field structure and phase relationships between adjacent simple cells in the cat's striate cortex. Soc. Neurosci. Abstr. 18: p. 10, 1992.
DeAngelis, G.C., Ohzawa, I., and Freeman, R.D. Spatiotemporal organization of simple-cell receptive fields in the cat's striate cortex. I. General characteristics and postnatal development. J. Neurophysiol. 69: 1091-1117, 1993a.
DeAngelis, G.C., Ohzawa, I., and Freeman, R.D. Spatiotemporal organization of simple-cell receptive fields in the cat's striate cortex. II. Linearity of temporal and spatial summation. J. Neurophysiol. 69: 1118-1135, 1993b.
DeAngelis, G.C., Ohzawa, I., and Freeman, R.D. Neural mechanisms underlying stereopsis: how do simple cells in the visual cortex encode binocular disparity? Perception 24: 3-31, 1995a.
DeAngelis, G.C., Anzai, A., Ohzawa, I., and Freeman, R.D. Receptive field structure in the visual cortex: does selective stimulation induce plasticity? Proc. Nat. Acad. Sci. 92 (in press), 1995b.
DeBoer, E., and Kuyper, P. Triggered correlation. IEEE Trans. on Biomed. Engin. 15: 169-179, 1968.
DeValois, R.L., Albrecht, D.G., and Thorell, L.G. Spatial frequency selectivity of cells in macaque visual cortex. Vision Res. 22: 545-559, 1982.
DeValois, R.L., and DeValois, K.K. Spatial Vision. New York: Oxford University Press, 1988.
Eckhorn, R., Krause, F., and Nelson, J.I. The RF-cinematogram. A cross-correlation technique for mapping several visual receptive fields at once. Biol. Cybern. 69: 37-55, 1993.
Eggermont, J.J., Johannesma, P.I.M., and Aertsen, A.M.H.J. Reverse-correlation methods in auditory research. Quarterly Rev. Biophys. 16: 341-414, 1983.
Emerson, R.C., Citron, M.C., Vaughn, W.J., and Klein, S.A. Nonlinear directionally selective subunits in complex cells of cat striate cortex. J. Neurophysiol. 58: 33-65, 1987.
Ferster, D. A comparison of binocular depth mechanisms in Areas 17 and 18 of the cat visual cortex. J. Physiol. (London) 311: 623-655, 1981.
Field, D.J., and Tolhurst, D.J. The structure and symmetry of simple-cell receptive-field profiles in the cat's visual cortex. Proc. Roy. Soc. London B: Biol. Sci. 228: 379-400, 1986.
Fischer, B., and Krüger, J. Disparity tuning and binocularity of single neurons in cat visual cortex. Exp. Brain Res. 35: 1-8, 1979.
Fleet, D.J., Jepson, A.D., and Jenkin, M.R.M. Phase-based disparity measurement. CVGIP: Image Understanding 53: 198-210, 1991.
Foster, K.H., Gaska, J.P., Marcelja, J., and Pollen, D.A. Phase relationships between adjacent simple cells in the feline visual cortex. J. Physiol. (London) 345: 22P, 1983.
Freeman, R.D., and Ohzawa, I. On the neurophysiological organization of binocular vision. Vision Res. 30: 1661-1676, 1990.
Freeman, R.D., and Robson, J.G. A new approach to the study of binocular interaction in visual cortex: Normal and monocularly deprived cats. Exp. Brain Res. 48: 296-300, 1982.
Frisby, J.P., and Mayhew, J.E.W. Contrast sensitivity function for stereopsis. Perception 7: 423-429, 1978.
Gabor, D. Theory of communication. J. Inst. Elec. Engin. 93: 429-457, 1946.
Geisler, W.S., and Hamilton, D.B. Sampling-theory analysis of spatial vision. J. Opt. Soc. Am. A 3: 62-70, 1986.
Ghose, G.M., Ohzawa, I., and Freeman, R.D. Receptive-field maps of correlated discharge between pairs of neurons in the cat's visual cortex. J. Neurophysiol. 71: 330-346, 1994.
Ghose, G.M., Freeman, R.D., and Ohzawa, I. Local intracortical connections in the cat's visual cortex: postnatal development and plasticity. J. Neurophysiol. 72: 1290-1303, 1994.
Ghose, G.M., Ohzawa, I., and Freeman, R.D. A flexible PC-based physiological monitor for animal experiments. J. Neurosci. Methods (in press), 1995.
Hamilton, D.B., Albrecht, D.G., and Geisler, W.S. Visual cortical receptive fields in monkey and cat: Spatial- and temporal-phase transfer function. Vision Res. 29: 1285-1308, 1989.
Hubel, D.H., and Wiesel, T.N. Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. J. Physiol. (London) 160: 106-154, 1962.
Hubel, D.H., and Wiesel, T.N. Cells sensitive to binocular depth in area 18 of the macaque monkey cortex. Nature 255: 41-42, 1970.
Hubel, D.H., and Wiesel, T.N. A re-examination of stereoscopic mechanisms in area 17 of the cat. J. Physiol. (London) 232: 29-30P, 1973.
Jenkin, M.R.M., and Jepson, A.D. The measurement of binocular disparity. In: Pylyshyn Z (ed), Computational processes in human vision. Ablex, Norwood, NJ, 1988.
Jones, J.P., and Palmer, L.A. The two-dimensional spatial structure of simple receptive fields in the cat striate cortex. J. Neurophysiol. 58: 1187-1211, 1987a.
Jones, J.P., and Palmer, L.A. An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex. J. Neurophysiol. 58: 1233-1258, 1987b.
Joshua, D.E., and Bishop, P.O. Binocular single vision and depth discrimination. Receptive field disparities for central and peripheral vision and binocular interaction on peripheral single units in cat striate cortex. Exp. Brain Res. 10: 389-416, 1970.
Kato, H., Bishop, P.O., and Orban, G.A. Binocular interaction on monocularly discharged lateral geniculate and striate neurons in the cat. J. Neurophysiol. 46: 932-951, 1981.
Klein, S.A., and Beutter, B. Minimizing and maximizing the joint space-spatial frequency uncertainty of Gabor-like functions: comment. J. Opt. Soc. Am. A 9: 337-340, 1992.
Kulikowski, J.J., Bishop, P.O., and Kato, H. Spatial arrangements of responses by cells in the cat visual cortex to light and dark bars and edges. Exp. Brain Res. 44: 371-385, 1981.
Legge, G.E., and Gu, Y. Stereopsis and contrast. Vision Res. 29: 989-1004, 1989.
Lehky, S.R., and Sejnowski, T.J. Neural model of stereoacuity and depth interpolation based on a distributed representation of stereodisparity. J. Neurosci. 10: 2281-2299, 1990.
LeVay, S., and Voigt, T. Ocular dominance and disparity coding in cat visual cortex. Visual Neurosci. 1: 395-414, 1988.
Levick, W.R. Another tungsten microelectrode. Med. Biol. Engin. 10: 510-515, 1972.
Liu, L., Tyler, C.W., Schor, C.M., and Ramachandran, V.S. Position disparity is more efficient in encoding depth than phase disparity. Inv. Ophthalmol. Vis. Sci. (suppl.) 33: 1373, 1992.
Maffei, L., and Fiorentini, A. The visual cortex as a spatial frequency analyser. Vision Res. 13: 1255-1268, 1973.
Marcelja, S. Mathematical description of the responses of simple cortical cells. J. Opt. Soc. Am. 70: 1297-1300, 1980.
Marr, D., and Poggio, T. A computational theory of human stereo vision. Proc. Roy. Soc. London B: Biol. Sci. 204: 301-328, 1979.
Marr, D. Vision. WH Freeman Co., New York, 1982.
Maske, R., Yamane, S., and Bishop, P.O. Binocular simple cells for local stereopsis: Comparison of receptive field organizations for the two eyes. Vision Res. 24: 1921-1929, 1984.
Maske, R., Yamane, S., and Bishop, P.O. End-stopped cells and binocular depth discrimination in the striate cortex of cats. Proc. Roy. Soc. London B 229: 257-76, 1986.
Mayhew, J.E.W., and Frisby, J.P. Convergent disparity discriminations in narrow-band-filtered random-dot stereograms. Vision Res. 19: 63-71, 1979.
McLean, J., and Palmer, L.A. Contribution of linear spatiotemporal receptive field structure to velocity selectivity of simple cells in the cat's striate cortex. Vision Res. 29: 675-679, 1989.
McLean, J., Raab, S., and Palmer, L.A. Contribution of linear mechanisms to the specification of local motion by simple cells in areas 17 and 18 of the cat. Visual Neurosci. 11: 271-294, 1994.
McLean, J., and Palmer, L.A. Organization of simple cell responses in the three-dimensional (3-D) frequency domain. Visual Neurosci. 11: 295-306, 1994.
Movshon, J.A., Thompson, I.D., and Tolhurst, D.J. Spatial summation in the receptive fields of simple cells in the cat's striate cortex. J. Physiol. (London) 283: 53-77, 1978a.
Movshon, J.A., Thompson, I.D., and Tolhurst, D.J. Receptive field organization of complex cells in the cat's striate cortex. J. Physiol. (London) 283: 79-99, 1978b.
Movshon, J.A., Thompson, I.D., and Tolhurst, D.J. Spatial and temporal contrast sensitivity of neurones in areas 17 and 18 of the cat's visual cortex. J. Physiol. (London) 283: 101-120, 1978c.
Mullikin, W.H., Jones, J.P., and Palmer, L.A. Periodic simple cells in cat area 17. J. Neurophysiol. 52: 372-387, 1984.
Nikara, T., Bishop, P.O., and Pettigrew, J.D. Analysis of retinal correspondence by studying receptive fields of binocular single units in cat striate cortex. Exp. Brain Res. 6: 353-372, 1968.
Nomura, M., Matsumoto, G., and Fujiwara, S. A binocular model for the simple cell. Biol. Cybern. 63: 237-242, 1990.
Ohzawa, I., and Freeman, R.D. The binocular organization of simple cells in the cat's visual cortex. J. Neurophysiol. 56: 221-242, 1986a.
Ohzawa, I., and Freeman, R.D. The binocular organization of complex cells in the cat's visual cortex. J. Neurophysiol. 56: 243-259, 1986b.
Ohzawa, I., DeAngelis, G.C., and Freeman, R.D. Stereoscopic depth discrimination in the visual cortex: Neurons ideally suited as disparity detectors. Science 249: 1037-1041, 1990.
Pettigrew, J.D. Binocular interaction on single units of the striate cortex of the cat. Thesis submitted for the degree of Sc.B. (Med.), Dept. of Physiology, Univ. of Sydney, Australia, 1965.
Pettigrew, J.D., Nikara, T., and Bishop, P.O. Binocular interaction on single units in cat striate cortex: Simultaneous stimulation by single moving slit with receptive fields in correspondence. Exp. Brain Res. 6: 391-410, 1968.
Poggio, G.F., and Fischer, B. Binocular interaction and depth sensitivity in striate and prestriate cortex of behaving Rhesus monkey. J. Neurophysiol. 40: 1392-1405, 1977.
Poggio, G.F. Processing of stereoscopic information in primate visual cortex. In: Dynamic Aspects of Neocortical Function, Eds: Edelman G, Gall WE, Cowan WM. New York: Wiley, pp. 613-635, 1984.
Poggio, G.F., Motter, B.C., Squatrito, S., and Trotter, Y. Responses of neurons in visual cortex (V1 and V2) of the alert macaque to dynamic random-dot stereograms. Vision Res. 25: 397-406, 1985.
Poggio, G.F., Gonzalez, F., and Krause, F. Stereoscopic mechanisms in monkey visual cortex: Binocular correlation and disparity selectivity. J. Neurosci. 8: 4531-4550, 1988.
Poggio, G.F., and Poggio, T. The analysis of stereopsis. Ann. Rev. Neurosci. 7: 379-412, 1984.
Pollen, D.A., and Ronner, S.F. Phase relationships between adjacent simple cells in the visual cortex. Science 212: 1409-1411, 1981.
Press, W.H., Flannery, B.P., Teukolsky, S.A., and Vetterling, W.T. Numerical Recipes in C, 2nd ed. Cambridge Univ. Press, 1992.
Qian, N. Computing stereo disparity and motion with known binocular cell properties. Neural Computation 6: 390-404, 1994a.
Qian, N. Stereo model based on phase parameters can explain characteristic disparity. Soc. Neurosci. Abstr. 20: 624, 1994b.
Reid, R.C., Soodak, R.E., and Shapley, R.M. Directional selectivity and spatiotemporal structure of receptive fields of simple cells in cat striate cortex. J. Neurophysiol. 66: 505-529, 1991.
Robson, J.G. Frequency domain visual processing. In: Physical and biological processing of images, Eds: Braddick OJ, Sleigh AC. Springer-Verlag, Berlin, 1983.
Rodiek, R.W. Central nervous system: afferent mechanisms. Ann. Rev. Physiol. 33: 203-240, 1971.
Sakitt, B., and Barlow, H.B. A model for the economical encoding of the visual image in cerebral cortex. Biol. Cybern. 43: 97-108, 1982.
Sanger, T.D. Stereo disparity computation using Gabor filters. Biol. Cybern. 59: 405-418, 1988.
Schiller, P.H., Finlay, B.L., and Volman, S.F. Quantitative studies of single-cell properties in monkey striate cortex. I. Spatiotemporal organization of receptive fields. J. Neurophysiol. 39: 1288-1319, 1976.
Schor, C.M., and Wood, I. Disparity range for local stereopsis as a function of luminance spatial frequency. Vision Res. 23: 1649-1654, 1983.
Schor, C.M., Wood, I., and Ogawa, J. Spatial tuning of static and dynamic local stereopsis. Vision Res. 24: 573-578, 1984a.
Schor, C.M., Wood, I., and Ogawa, J. Binocular sensory fusion is limited by spatial resolution. Vision Res. 24: 661-665, 1984b.
Sclar, G. Expression of "retinal" contrast gain control by neurons of the cat's lateral geniculate nucleus. Exp. Brain Res. 66: 589-596, 1987.
Shapley, R.M., and Victor, J.D. How the contrast gain control modifies the frequency responses of cat retinal ganglion cells. J. Physiol. (London) 318: 161-179, 1981.
Skottun, B.C., DeValois, R.L., Grosof, D.H., Movshon, J.A., Albrecht, D.G., and Bonds, A.B. Classifying simple and complex cells on the basis of response modulation. Vision Res. 31: 1079-1086, 1991.
Skottun, B.C., and Freeman, R.D. Stimulus specificity of binocular cells in the cat's visual cortex: ocular dominance and the matching of left and right eyes. Exp. Brain Res. 56: 206-216, 1984.
Smallman, H.S., and MacLeod, D.I.A. A size-disparity correlation in stereopsis at contrast threshold. J. Opt. Soc. Am. A 11: 2169-2183, 1994.
Spileers, W., Orban, G.A., Gulyás, B., and Maes, H. Selectivity of cat Area 18 neurons for direction and speed in depth. J. Neurophysiol. 63: 936-954, 1990.
Stork, D.G., and Wilson, H.R. Do Gabor functions provide appropriate descriptions of visual cortical receptive fields? J. Opt. Soc. Am. A 7: 1362-1373, 1990.
Tolhurst, D.J., and Dean, A.F. Evaluation of a linear model of directional selectivity in simple cells of the cat's striate cortex. Visual Neurosci. 6: 421-428, 1991.
von der Heydt, R., Adorjani, C.S., Hänny, P., and Baumgartner, G. Disparity sensitivity and receptive field incongruity of units in the cat striate cortex. Exp. Brain Res. 31: 523-545, 1978.
Wagner, H., and Frost, B. Disparity-sensitive cells in the owl have a characteristic disparity. Nature 364: 796-798, 1993.
Wagner, H., and Frost, B. Binocular responses of neurons in the barn owl's visual Wulst. J. Comp. Physiol. A 174: 661-670, 1994.
Watson, A.B. Detection and recognition of simple spatial forms. In: Physical and biological processing of images, Eds: Braddick OJ, Sleigh AC. Springer-Verlag, Berlin, 1983.
Watson, A.B., and Ahumada, A.J. Jr. Model of human visual motion sensing. J. Opt. Soc. Am. A 2: 322-342, 1985.
Watson, A.B., and Ahumada, A.J. Jr. A hexagonal orthogonal-oriented pyramid as a model of image representation in visual cortex. IEEE Trans. Biomed. Engin. 36: 97-106, 1989.
Watson, A.B. Cortical Algotecture. In: Blakemore CB (Ed.), Vision: Coding and efficiency. Cambridge: Cambridge University Press, 1991.
Westheimer, G., and McKee, S.P. Stereoscopic acuity with defocused and spatially filtered retinal images. J. Opt. Soc. Am. 70: 772-778, 1980.
Wheatstone, C. Contributions to the physiology of vision. Part the first. On some remarkable, and hitherto unobserved, phenomena of binocular vision. Philos. Trans. Roy. Soc. London 2: 371-393, 1838.
Williamson, R.E., and Trotter, H.F. Multivariable Mathematics, 2nd ed. Prentice-Hall, New Jersey, 1979.
Wilson, H.R., McFarlane, D.K., and Phillips, G.C. Spatial frequency tuning of orientation selective units estimated by oblique masking. Vision Res. 23: 873-882, 1983.
Wilson, H.R., Levi, D., Maffei, L., Rovamo, J., and DeValois, R. The perception of form: Retina to striate cortex. In: Visual perception: The neurophysiological foundations, Eds: Spillman L, Werner JS. Academic Press, New York, 1990.
Wilson, H.R., Blake, R., and Halpern, D.L. Coarse spatial scales constrain the range of binocular fusion of fine scales. J. Opt. Soc. Am. A 8: 229-236, 1991.
Young, R.A. The Gaussian derivative model for spatial vision: I. Retinal mechanisms. Spatial Vision 2: 273-293, 1987.
Young, R.A., and Lesperance, R.M. A physiological model of motion analysis for machine vision. Technical Report GMR-7878, General Motors Research Laboratories, Warren, MI, 1993.