Neural Networks, Vol. 3, pp. 45-74, 1990. Printed in the USA. All rights reserved.
0893-6080/90 $3.00 + .00 Copyright © 1990 Pergamon Press plc
ORIGINAL CONTRIBUTION
Self-Organizing Neural Networks for Perception of Visual Motion
JONATHAN A. MARSHALL
Boston University
(Received 6 January 1989; revised and accepted 16 June 1989)
Abstract--The human visual system overcomes ambiguities, collectively known as the aperture problem, in its local measurements of the direction in which visual objects are moving, producing unambiguous percepts of motion. A new approach to the aperture problem is presented, using an adaptive neural network model. The neural network is exposed to moving images during a developmental period and develops its own structure by adapting to statistical characteristics of its visual input history. Competitive learning rules ensure that only connection "chains" between cells of similar direction and velocity sensitivity along successive spatial positions survive. The resultant self-organized configuration implements the type of disambiguation necessary for solving the aperture problem and operates in accord with direction judgments of human experimental subjects. The system not only accommodates its structure to long-term statistics of visual motion, but also simultaneously uses its acquired structure to assimilate, disambiguate, and represent visual motion events in real-time.
Keywords--Self-organization, Motion perception, Hypercomplex cells, Cooperative-competitive learning, Neural networks, Aperture problem, Intrinsic connections, Visual tracking.

1. INTRODUCTION
Subjectively, the process of seeing is effortless; our visual systems perform complex visual tasks, such as motion detection and depth perception, with striking ease. Yet even the simplest, most commonplace visual tasks, such as distinguishing light from dark, marshal an enormous array of neural processing mechanisms. Not surprisingly, the visual areas of the brain appear to have a highly complex internal organization in order to accomplish their diverse set of processing tasks. Despite its complexity, the structure of visual cortex in higher animals is not formed strictly by genetic means. Rather, it is determined by genetics plus adaptation to visual experience during a developmental period. In animals such as cats, monkeys, and human beings, a rudimentary neural interconnection structure is genetically set up in visual cortex before birth. After birth, the animal is exposed to the visual world. Its cortical interconnection structure becomes tuned, adaptively gaining sensitivity to the kinds of visual input that are likely to occur in the animal's environment, and losing sensitivity to unlikely visual events. Such developmental tuning has been demonstrated experimentally for several aspects of visual processing, including sensitivity to orientation (Braastad & Heggelund, 1985; Frégnac & Imbert, 1978, 1984; Gary-Bobo, Milleret, & Buisseret, 1986; Hirsch & Spinelli, 1970; Hubel & Wiesel, 1970; Wiesel & Hubel, 1965), spatial frequency (Derrington, 1984), binocularity (Frégnac & Imbert, 1978; Trotter, Frégnac, & Buisseret, 1987), depth (Graves, Trotter, & Frégnac, 1987), and motion (Cremieux, Orban, Duysens, & Amblard, 1987; Kennedy & Orban, 1983; Pasternak & Leinen, 1986). This paper explores how a visual system can adapt to aspects of its visual environment, and in so doing, can produce useful visual processing structures without human intervention. A general-purpose class of adaptive mechanisms is proposed and applied to a fundamental visual processing problem: the aperture problem in motion detection. The results suggest not only novel avenues of research in visual motion processing, but also a general means by which a perceptual system can adaptively use its input history to construct representation mechanisms for new input.

Several papers have explored adaptive approaches to understanding important issues in early visual processing, such as how cells can become sensitive to spatial contrasts and contrast orientations, and how orientation "columns" can form in visual cortex (Ahumada & Yellott, 1988; Amari, 1977; Bienenstock, Cooper, & Munro, 1982; Fukushima & Miyake, 1982; Grossberg, 1976b; Linsker, 1986a, 1986b, 1986c; Nagano & Kurata, 1981; O'Toole & Kersten, 1986; Pearson, Finkel, & Edelman, 1987; Poggio & Hurlbert, 1988; Singer, 1983, 1985a, 1985b; Takeuchi & Amari, 1979; von der Malsburg, 1973; von der Malsburg & Cowan, 1982). This paper builds on the earlier results, but addresses issues at a somewhat higher processing level: perception of visual motion.

Acknowledgements--Based on the author's Ph.D. dissertation submitted to Boston University. Supported in part by Boston University (University Graduate Fellowship award) and by grants to Dr. Stephen Grossberg and the Boston University Center for Adaptive Systems from the Air Force Office of Scientific Research (AFOSR 85-0149, F49620-86-C-0037, and F49620-87-C-0018), the Army Research Office (ARO DAAG-29-85-K-0095), and the National Science Foundation (NSF IRI-84-17756). The author thanks Stephen Grossberg for his support, instruction, and useful criticisms. The author also gratefully acknowledges the help of Ennio Mingolla in every aspect of this research. Requests for reprints should be sent to Jonathan A. Marshall, Center for Research in Learning, Perception, and Cognition, 205 Elliott Hall, University of Minnesota, Minneapolis, MN 55455.
2. THE APERTURE PROBLEM

Direction Ambiguity

What the activity of a simple cell (Hubel & Wiesel, 1962, 1963, 1968) represents is inherently ambiguous. Grossberg and Mingolla (1985a, 1985b) showed that the spatial extent of a simple cell's receptive field induces uncertainty about both the position and orientation of a visual stimulus. A simple cell's activity is ambiguous in an additional sense, however, because it can indicate the presence of a visual contrast edge moving in any of several directions, instead of just one direction. Because each cell's receptive field is sensitive to visual stimuli only within a spatially local region, or "aperture," no single simple cell can determine the actual direction of motion of an edge (Figure 1a). The aperture problem (Hildreth, 1983; Marr, 1982; Marr & Ullman, 1981) arises as a consequence of this ambiguity: since the local direction of motion signaled by the activity of such cells is ambiguous, how can the actual direction of motion of a contrast edge be computed and represented unambiguously? The task associated with the aperture problem is to specify how such ambiguous local activations can be combined coherently to form unambiguous global motion percepts.

The ambiguity in the aperture problem can be traced back to two degrees of freedom in the motion measurements performed by simple cells. First, a simple cell can only partly localize the position of an edge segment. An edge segment can activate many simple cells whose receptive fields are positioned along the edge (Figure 1a). But none of the simple cells
individually can indicate where the segment lies along the cell's preferred orientation axis relative to the cell's receptive field position (Figure 1b). Second, a simple cell is sensitive only to one local component of motion direction and hence responds to many possible actual motion directions consistent with its local motion preference. It cannot indicate which actual direction activates it (Figure 1a). In many analyses of the aperture problem (e.g., Adelson & Movshon, 1982; Ferrera & Wilson, 1988; Heeger, 1987, 1988; Movshon, Adelson, Gizzi, & Newsome, 1985; Sereno, 1986, 1987; Tanner, 1986; Welch, 1988), the two degrees of freedom are treated simultaneously. In a hierarchy of cortical visual processing stages (Van Essen & Maunsell, 1983), a stage of simple cells is considered to supply input to a stage of "pattern-motion" cells, similar to ones found in area MT of macaque monkey cortex (Movshon et al., 1985). A pattern-motion cell combines motion information from multiple local sources and becomes
activated when globally coherent motion in a prescribed direction is present in the cell's receptive field. The motion sensitivity of higher-level cortical cells, such as those in areas V3, MT, and STS, is the subject of intensive research (Albright, 1984; Allman, Miezin, & McGuinness, 1985; Felleman & Van Essen, 1987; Maunsell & Van Essen, 1983; Mikami, Newsome, & Wurtz, 1986a, 1986b; Newsome, Mikami, & Wurtz, 1986; Newsome & Paré, 1988; Rodman & Albright, 1987; Saito et al., 1986). However, the neural mechanism by which local sources, such as simple cells, contribute to the activity of pattern-motion cells in MT has not yet been precisely identified. This paper will treat the two degrees of freedom in the aperture problem separately, resulting in a new kind of solution. One degree of freedom is eliminated by first localizing visual features, such as contrast edges. The second degree of freedom can then also be eliminated quite easily using a simple tracking network.

Effect of Window Shape on Perceived Direction
Wallach (1935, 1976) investigated certain phenomena related to the aperture problem. He showed that when human subjects view moving objects that appear to be occluded except within a "window" region, the shape of the window affects the perceived direction of motion of objects viewed through the window. For example, he showed experimental subjects a diagonal line moving behind a horizontal rectangular window (Figure 2a). His subjects tended to report that the visible segment appeared to move diagonally when it was in the left or right corner of the rectangle, but horizontally when it was in the middle portion of the rectangle (Figure 2b). When the segment is near the corners of the rectangle, its endpoints appear to move in perpendicular directions, and the segment's length changes (Figure 2c). But when it travels along the middle section of the rectangle, its endpoints both move horizontally, and the segment's length is constant. The direction percepts are the same even if the borders of the window are invisible (Figure 2d). How do our visual systems determine the apparent direction of motion of the segment? A variety of factors, including depth (Shimojo, Silverman, & Nakayama, 1988), presence of additional segments (Wallach, 1935), length, endpoint motion, and position, may influence the segment's apparent actual direction of motion. It is important to note that the segment's apparent actual direction of motion cannot be determined by examining the motion of either of its endpoints separately; information from both of the bar's endpoints must be integrated nonlocally in order to construct the percept of motion. Thus, our visual systems are capable of integrating visual information from widely separated regions of an image (Nelson & Frost, 1985; Norman, Lappin, & Wason, 1988). From that fact, it is only a short step to the conclusion that our visual systems contain processing units sensitive to widely separated regions of the image. Wallach's (1935) study thus raises a key question for visual perception: how is information combined over long spatial ranges to produce a unitary motion percept for each visual object?

FIGURE 2. (a) Subject views a diagonal stripe moving behind a horizontally elongated rectangular window. (b) Stripe segment appears to move diagonally near the rectangle's corners but horizontally along its length. (c) Stripe's position, length, and endpoints may influence its perceived direction of motion. Arrows indicate motion of endpoints. (d) Same direction percepts arise when window borders are invisible. [Panel (d) frames labeled t = 1 to t = 9.]

In order to address unsolved aspects of the aperture problem, the present work expands the formulation of the problem proposed by Marr and Ullman (1981). This paper shows how a visual system can use spatially nonlocal contextual cues (Marshall, 1988a) to construct a representation of the motion of a contour as a whole, not just at confined regions where motion is determined by intersecting local constraints. The definition of the aperture problem proposed
by Marr and Ullman (1981) applies only to systems that use local contour detectors, where "the motion is detected by a unit that is small compared with the overall contour" (p. 154). However, their definition may be overly restrictive as a characterization of the operations that human motion perception systems must perform, because Wallach's (1935) study of window shape showed that the motion of a single bar can produce strong directional percepts, even where no small detector units can signal the overall direction of motion. In recognition of the fact that information from the endpoints of a moving contour must be combined, an analysis of the aperture problem must permit the effective sizes of the receptive fields of some contour-detection units to be large, comparable to the image sizes of the contours whose direction of motion is to be determined. This choice will allow information about the motion of distinguishing points, such as endpoints, to be integrated over the long spatial ranges necessary for motion perception (Burbeck, 1985, 1986, 1987; Norman et al., 1988), producing a unitary representation of each contour's velocity and direction of motion. The statement of the aperture problem remains the same: how does the system obtain global representations of actual direction of motion, given locally ambiguous motion measurements? The present analysis emphasizes these fundamental questions raised by Wallach (1935) regarding the endpoints of contours. In addition, this paper will specify how certain aspects of a visual system's motion processing mechanisms can adaptively self-organize, that is, form without guidance from an external teacher.

3. A MOTION PROCESSING NETWORK
This section describes the architecture of a self-organizing neural network that computes actual direction of motion. The network generates representations of motion analogous to the percepts arising from Wallach's (1935) displays. The network's structure and operation after self-organization has occurred will be described first, in order to motivate the analysis of how the network acquired its special structure. The network's self-organization will be described later in this paper. The network contains two layers of cells: an input layer L1 and a processing layer L2. For the example at hand, L1 consists of idealized cells analogous to the hypercomplex cells (Hubel & Wiesel, 1965, 1968, 1977) found in visual cortex, that is, cells sensitive to the orientation, length, and local direction of motion of visual stimuli in their receptive field. Hypercomplex cells are similar to simple cells; however, hypercomplex cells also possess inhibitory end-zones
flanking the central excitatory regions of their receptive fields (Hubel & Wiesel, 1965, 1977; Kato, Bishop, & Orban, 1978; Orban, Kato, & Bishop, 1979a, 1979b). A hypercomplex cell's activity is inhibited if an edge extends into one or both of its inhibitory end-zones. Both simple and hypercomplex cells fire with increasing strength as a function of input edge segment length, up to the length of the excitatory region of the cell's receptive field. For longer stimuli, simple cells continue to fire at their maximum rate, whereas hypercomplex cells respond more weakly. A hypercomplex-type L1 cell thus can detect a contrast segment of its preferred orientation and length moving through its receptive field in any of several possible direction/speed combinations. However, it cannot distinguish in which one of the direction/speed combinations the segment is actually moving.

The network takes the output of the L1 cells, which are already sensitive to locally measured ("short-range") direction of motion (Adelson & Bergen, 1985; Anstis, 1977; Braddick, 1974, 1980; Harris, 1986; Reichardt, 1961; Sperling, van Santen, & Burr, 1985; van Santen & Sperling, 1984), and, in a manner to be shown, creates a representation in L2 that indicates actual velocity. As an oriented edge segment moves across the retina, the responses of hypercomplex cells whose stimulus preferences do not match the properties of the edge segment (e.g., wrong position or wrong length) are attenuated due to their tuning. The only hypercomplex cells that respond optimally are those whose receptive field position, preferred orientation, length, and speed most closely match those of the edge segment; the activity of this limited subset of the hypercomplex cells can be used as the representation of the edge segment. Edge segments can be effectively localized by finding the activity peaks across the hypercomplex layer. (Simple cells cannot localize visual features in this manner.)
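The localization argument can be made concrete with a toy one-dimensional sketch. It assumes that an edge segment occupies an interval along the cells' preferred orientation axis, that each cell's excitatory region is an interval of its preferred length centered on the cell, and that each hypercomplex cell has end-zones of a fixed width (`zone`) whose overlap with the segment subtracts from the response with weight `k`. These linear response rules and the helper names (`simple_response`, `hypercomplex_response`, `peak_cells`) are illustrative assumptions, not the model's equations, which are given in the Appendix.

```python
from itertools import product


def overlap(a0, a1, b0, b1):
    """Length of the intersection of the intervals [a0, a1) and [b0, b1)."""
    return max(0.0, min(a1, b1) - max(a0, b0))


def simple_response(seg, center, pref_len):
    """Grows with the part of the segment inside the excitatory region, then saturates."""
    s0, s1 = seg
    e0, e1 = center - pref_len / 2, center + pref_len / 2
    return overlap(s0, s1, e0, e1) / pref_len


def hypercomplex_response(seg, center, pref_len, zone=2.0, k=1.0):
    """Like simple_response, but inhibited by any part of the segment in either end-zone."""
    s0, s1 = seg
    e0, e1 = center - pref_len / 2, center + pref_len / 2
    excite = overlap(s0, s1, e0, e1)
    inhibit = overlap(s0, s1, e0 - zone, e0) + overlap(s0, s1, e1, e1 + zone)
    return max(0.0, excite - k * inhibit) / pref_len


def peak_cells(response, seg, centers, lengths):
    """All (center, preferred length) cells that share the maximal response."""
    acts = {(c, L): response(seg, c, L) for c, L in product(centers, lengths)}
    best = max(acts.values())
    return sorted(cell for cell, a in acts.items() if a == best)


segment = (3, 11)            # a segment of length 8 centered at position 7
centers = range(15)          # hypothetical cell positions
lengths = [2, 4, 8, 12]      # hypothetical preferred lengths

print(peak_cells(hypercomplex_response, segment, centers, lengths))  # [(7, 8)]
print(len(peak_cells(simple_response, segment, centers, lengths)))   # 13
```

Under these assumptions exactly one hypercomplex cell, the one centered on the segment with the matching preferred length, responds maximally, whereas 13 simple cells tie for the maximum (every cell whose excitatory region fits inside the segment), so the simple-cell peak does not pin down either the segment's position or its length.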
The activity of hypercomplex cells can thus constitute a measure of a contrast edge segment's position, orientation, local direction component of motion, and length. A curved edge would activate hypercomplex-type cells of a variety of orientations, approximating its curvature to the extent permitted by the scale and length-sensitivity of the cells (Dobbins, Zucker, & Cynader, 1987, 1988; Hubel & Wiesel, 1977). The correctness of the present arguments does not depend specifically on the use of hypercomplex cells. Rather, what is important is the existence of classes of cells that localize different visual features. Hypercomplex cells are useful because they can localize edge segments. As this paper will show, the problem of determining the direction of motion for a localized edge segment is much easier than that for an edge whose
length is locally unknown. In fact, a simple tracking mechanism can easily determine the direction of motion of any localized feature. For example, if a class of cells responds preferentially to the bright diamond-shaped intersections of a "plaid" pair of gratings (Adelson & Movshon, 1982), then the direction of motion of the diamonds can be tracked. The tracked direction is equivalent to the direction computed from the "intersection of constraints" proposed by Adelson and Movshon (1982); however, tracking does not require any explicit computation of constraint intersections. Psychophysical evidence indicates that human visual systems are indeed very good at localizing the relative positions of visual features (Burbeck, 1985, 1986, 1987).

It might be reasonable to suggest that hypercomplex-type cells serve to localize edge segments in the visual motion detection systems of animals, but only if their visual systems contain such cells with a sufficiently wide range of preferred lengths. Current physiological evidence does not support the existence in cortical area 17 of hypercomplex cells with receptive fields large enough to localize very long edge segments. However, receptive field sizes generally become larger at higher stages of visual cortex (Van Essen & Maunsell, 1983). For example, the preferred lengths for hypercomplex cells typically range between 0.3 and 3 degrees of visual angle in area 17 of cat visual cortex (Kato et al., 1978; Orban et al., 1979a, 1979b), but in cat area 19, hypercomplex cells respond optimally to edge segments between 2 and 8 degrees in length (Saito, Tanaka, Fukada, & Oyamada, 1988). It thus may be possible that the hypercomplex property of length-sensitivity will be observed in cells with even larger receptive fields when more detailed probes of higher stages of visual cortex are reported.
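The claim that tracking a localized feature yields the same direction as the intersection of constraints can be checked numerically. The sketch below assumes two straight gratings, each described by a unit normal n and a normal speed s, so that one bar of grating i lies on the line n_i . p = d_i + s_i t at time t. The intersection of constraints is the velocity v satisfying n_1 . v = s_1 and n_2 . v = s_2; tracking instead localizes the point where two bars cross in two successive frames and differences the positions. The function names are hypothetical, and the straight-line geometry is a simplification of a real plaid.

```python
import math


def solve2(a, b, c, d, e, f):
    """Solve [[a, b], [c, d]] @ [x, y] = [e, f] by Cramer's rule."""
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("parallel gratings: the constraints do not intersect")
    return ((e * d - b * f) / det, (a * f - e * c) / det)


def ioc_velocity(n1, s1, n2, s2):
    """Intersection of constraints: the velocity v with n1 . v = s1 and n2 . v = s2."""
    return solve2(n1[0], n1[1], n2[0], n2[1], s1, s2)


def crossing_point(t, n1, d1, s1, n2, d2, s2):
    """Where a bar of grating 1 (n1 . p = d1 + s1 t) crosses a bar of grating 2 at time t."""
    return solve2(n1[0], n1[1], n2[0], n2[1], d1 + s1 * t, d2 + s2 * t)


def tracked_velocity(n1, d1, s1, n2, d2, s2, t0=0.0, dt=1.0):
    """Velocity from localizing the crossing point in two frames and differencing."""
    x0, y0 = crossing_point(t0, n1, d1, s1, n2, d2, s2)
    x1, y1 = crossing_point(t0 + dt, n1, d1, s1, n2, d2, s2)
    return ((x1 - x0) / dt, (y1 - y0) / dt)


def unit(deg):
    return (math.cos(math.radians(deg)), math.sin(math.radians(deg)))


# Two gratings whose normals point at 45 and 135 degrees, each drifting at unit normal speed.
n1, n2 = unit(45), unit(135)
v_ioc = ioc_velocity(n1, 1.0, n2, 1.0)
v_track = tracked_velocity(n1, 0.5, 1.0, n2, -2.0, 1.0)
print(v_ioc, v_track)  # both approximately (0, 1.414): the plaid moves straight up
```

Both estimates agree, but `tracked_velocity` never forms constraint lines in velocity space: it only locates the crossing point in each frame, which is the kind of localization the hypercomplex-type cells are proposed to supply.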
The use of hypercomplex-type cells in this paper is a convenient device to show how localization of an edge segment allows the segment's actual direction of motion to be computed. These hypercomplex-type L1 cells project excitatory connections forward, in an orderly fashion, to cells in L2. To fix ideas, only a subset of the network's cells will be shown: those preferring a particular orientation, length, and local direction component of motion (Figure 3). The kind of structure described in this single case is reproduced correspondingly for all other combinations of orientation, length, and local direction to which the system is sensitive.

Figure 3 depicts the structure of the feedforward (or bottom-up) excitatory connections from cells in L1 to cells in L2. Each L1 cell projects to a cluster of retinotopically neighboring L2 cells. For simplicity, the figure shows three cells in each such cluster, but increasing the number would serve only to increase the resolution of the system. Each of the three cells in an L2 cluster receives bottom-up connections of equal strength from a single L1 cell. Hence, all the cells in a cluster inherit identical bottom-up receptive field properties: those of the L1 cell. All of the L2 cells in a cluster are sensitive to a particular orientation, length, and local direction component of motion of visual stimuli traversing their receptive field.

FIGURE 3. Subnetwork of cells preferring a particular combination of orientation, local direction component of motion, and length. Each L1 cell projects excitatory connections to a cluster of nearby cells in the corresponding L2 position.

Although the L2 cells within a cluster all receive the same bottom-up input, their lateral connectivities (to and from other L2 cells) differ. Figure 4a shows a close-up of the lateral input and output structure of a cluster. Each L2 cell in a cluster receives a strong excitatory input connection from an L2 cell displaced spatially in one direction and sends a strong excitatory output connection to another L2 cell displaced in the opposite direction. Furthermore, each cell within a cluster connects to others along a different axis in the L2 plane. In the figure, one cell receives and sends lateral connections along a horizontal axis, another along a diagonal axis, and another along a vertical axis. The directions of the lateral connections along each of the axes are consistent with the possible actual directions in which an edge segment activating the cluster's L1 input could be moving. In this manner, each L2 cell constitutes a link in a chain of lateral connections along successive positions in a direction in which an edge segment might travel. Figure 5 depicts the embedding of an L2 cluster in its three lateral connection chains.

In addition, each lateral connection possesses a signal transmission latency (Adelson & Bergen, 1985; Barlow & Levick, 1965; Fleet & Jepson, 1985; Reichardt, 1961; Sperling et al., 1985; van Santen & Sperling, 1984; Waibel, Hanazawa, Hinton, Shikano, & Lang, 1987). That is, a signal emitted by one cell does not reach its lateral destination cell until a prescribed time later. The timing of the lateral transmission latencies figures prominently in the operation of the network. In vivo, all synaptic connections possess signal transmission latencies. However, for convenience in computational simulation, the bottom-up excitatory connections transmit their signals instantaneously, and delays are simulated only in the lateral excitatory connections.

The final element in the structure of L2 is the set of lateral inhibitory connections between cells in each cluster (Figure 4b). Inhibitory connections project reciprocally between all L2 cells in a cluster. For reasons of computational ease and numerical stability, the lateral inhibitory connections are assumed to have negligible signal transmission latencies.

The exposition to this point has sketched the static structure of the network and the operation of individual cells. Details are in the Appendix, and further exposition is supplied by Marshall (1988b, 1989). How the network operates dynamically, in response to visual input, will be examined next. The network shown here is an example of a more general class of networks for visual processing. It can be thought of as a stage within a hierarchy of visual processing networks; this stage is suited for certain aspects of visual motion processing.
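The two timing assumptions above, delayed lateral excitation and effectively instantaneous lateral inhibition, can be sketched in a few lines. `DelayedLink` models one lateral excitatory connection as a first-in, first-out delay line with an integer latency measured in simulation steps, and `inhibit_within_cluster` applies one round of subtractive, zero-latency inhibition among the cells of a cluster. Both are illustrative assumptions: the model's actual dynamics are the equations given in the Appendix, not this one-shot rule, and the weight, latency, and inhibition strength used here are arbitrary.

```python
from collections import deque


class DelayedLink:
    """One lateral excitatory connection with a fixed transmission latency.

    Signals enter a FIFO delay line and emerge `latency` simulation steps later.
    """

    def __init__(self, weight, latency):
        if latency < 1:
            raise ValueError("a lateral link is assumed to have latency >= 1 step")
        self.weight = weight
        self.in_transit = deque([0.0] * latency)

    def step(self, presynaptic_output):
        """Emit this step's signal and deliver the one sent `latency` steps ago."""
        self.in_transit.append(self.weight * presynaptic_output)
        return self.in_transit.popleft()


def inhibit_within_cluster(activations, strength):
    """One round of zero-latency, reciprocal, subtractive inhibition in a cluster.

    Each cell is reduced by `strength` times the summed activity of the other
    cells, then rectified at zero.
    """
    total = sum(activations)
    return [max(0.0, a - strength * (total - a)) for a in activations]


link = DelayedLink(weight=0.5, latency=2)
outputs = [link.step(x) for x in [1.0, 0.0, 0.0, 0.0]]
print(outputs)  # [0.0, 0.0, 0.5, 0.0]: the pulse arrives two steps late

equal = inhibit_within_cluster([1.0, 1.0, 1.0], strength=0.45)   # all three stay weakly active
biased = inhibit_within_cluster([1.5, 1.0, 1.0], strength=0.45)  # only the first cell survives
```

With equal inputs, all three cells remain equally (weakly) active, so the uncertainty is preserved; a modest advantage for one cell, such as the extra lateral excitation a cell receives when the edge continues along its chain, lets that cell silence the other two.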
FIGURE 4. (a) Each L2 cell receives input excitatory connections from other L2 cells in positions displaced in one direction and sends output excitatory connections to L2 cells displaced in the opposite direction. Each L2 cell within a cluster connects to others displaced along a different axis in the L2 plane. (b) Cells in a cluster project inhibitory connections to one another; if one cell is active, then the others in its cluster are less likely to be active.

FIGURE 5. Lateral chains of … one cluster (…) are … connections. Each cell in a cluster participates in a different chain. The chains … arrows.

How the Network Resolves Uncertainty

The network described above resolves uncertainty by tracking (Anstis & Ramachandran, 1987; Burr & Ross, 1986; Ramachandran & Anstis, 1983; Sethi & Jain, 1987; Thompson & Pong, in press; Waxman & Duncan, 1986) visual features as they move across the visual field. It operates in a two-phase fashion, as shown in Figure 6. Consider a diagonally oriented contrast segment moving horizontally to the right. It excites L1 cells with the corresponding preferred orientation, length, and local direction, at successive positions along its path of travel.
FIGURE 6. (a) Horizontally moving stripe activates L1 cell a, which excites cells b, c, d in L2 cluster. Moderately active L2 cells (shaded) emit lateral excitatory signals, which do not arrive at their destinations yet because of transmission latencies. (b) A short while later, stripe has moved farther to the right and activates L1 cell e, which excites L2 cluster f, g, h. Lateral signals now reach their destinations: cells f, i, j. One cell receives both bottom-up and lateral excitation, becomes strongly active (solid), and suppresses its neighbors' activities via lateral inhibition. Cells that receive only lateral excitation (i, j) are too weakly activated to propagate their lateral signals. Only one cell per cluster propagates its lateral signals in this phase.
In Phase 1 (Figure 6a), the segment activates cell a in L1, which, like all L1 cells, measures only the local component of motion perpendicular to the orientation of its receptive field. Cell a transmits bottom-up excitation to L2 cells b, c, and d. Since cells b, c, and d are equally activated, they in turn initiate the transmission of excitatory signals through their lateral output connections. However, because the lateral connections possess a transmission time-delay, the output signals do not reach their destinations yet. The equal activation of the three cells b, c, and d represents the uncertainty at this phase about the actual direction in which the contrast is moving. By permitting more than one cell at a spatial position to be active, the network multiplexes, or represents simultaneously, all possible actual directions in which the segment could be moving. A short time later, Phase 2 begins (Figure 6b). By
this time, the input segment has moved farther to the right. It no longer activates cell a; instead it now activates L1 cell e, which in turn sends bottom-up excitatory signals to its corresponding cluster of L2 cells: f, g, and h. At just this moment, the delayed lateral signals from b, c, and d are delivered to L2 cells f, i, and j. Thus cell f receives both bottom-up and lateral excitation, while cells g and h receive only bottom-up input. The extra excitation delivered to f increases its activation and enables it to suppress (via lateral inhibition) the activities of g and h. A single cell in the cluster is thereby chosen to be active. The lateral excitation received by i and j is too weak to activate those cells supraliminally; hence only cell f is fully active at Phase 2. Consequently, only cell f can propagate its lateral signals to its own successor. Since cell f is on a horizontal chain, the full activation of the single chosen cell f represents the network's newly computed decision that the input segment is moving horizontally at Phase 2. The initial broad dispersal of lateral signals (three active cells) at Phase 1 has been narrowed (one active cell) at Phase 2. As long as the input segment continues to follow the same trajectory, the lateral signals continue to propagate along the same horizontal chain. Thus, the signal transmission latencies allow the representation of the segment's horizontal motion direction to predictively track the segment's changing position.
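The two-phase operation can be reproduced in a minimal discrete-time sketch. It assumes a square grid of cluster positions, one cell per chain axis in each cluster (horizontal, diagonal, and vertical, as in Figure 4a), a lateral latency of exactly one time step, a bottom-up input of 1.0, a lateral input of `lateral_gain` = 0.5, and an activity threshold of 1.0, so that lateral input alone is subliminal. Within-cluster lateral inhibition is replaced by a hard winner-take-all rule that keeps only the most excited cells of each cluster; ties all survive, which is how the uncertainty of Phase 1 is multiplexed. These parameters and the function names are hypothetical simplifications, not the dynamics given in the Appendix.

```python
from collections import defaultdict

# Chain axes of the three cells in every cluster (grid steps per time step).
H, D, V = (1, 0), (1, 1), (0, 1)
DIRECTIONS = (H, D, V)


def step(active_prev, input_pos, lateral_gain=0.5, threshold=1.0):
    """Advance L2 by one time step; return the set of fully active cells.

    A cell is identified by (position, chain_direction).
    """
    excitation = defaultdict(float)
    # Lateral signals emitted at t-1 arrive now (latency = 1 step) at each successor.
    for (x, y), d in active_prev:
        excitation[((x + d[0], y + d[1]), d)] += lateral_gain
    # Bottom-up: the L1 cell at input_pos excites every cell of its L2 cluster equally.
    if input_pos is not None:
        for d in DIRECTIONS:
            excitation[(input_pos, d)] += 1.0
    # Stand-in for within-cluster inhibition: per position, keep only the most
    # excited cells (ties all survive), and only if they reach threshold.
    best = defaultdict(float)
    for (pos, _), a in excitation.items():
        best[pos] = max(best[pos], a)
    return {(pos, d) for (pos, d), a in excitation.items()
            if a == best[pos] and a >= threshold}


def run(trajectory, **kwargs):
    """Feed a sequence of input positions; record which chain directions are active."""
    active, history = set(), []
    for pos in trajectory:
        active = step(active, pos, **kwargs)
        history.append(sorted(d for _, d in active))
    return history


# A segment moving horizontally to the right (cf. Figure 6).
print(run([(0, 0), (1, 0), (2, 0), (3, 0)]))
# [[(0, 1), (1, 0), (1, 1)], [(1, 0)], [(1, 0)], [(1, 0)]]

# A segment that moves diagonally, then turns horizontal.
print(run([(0, 0), (1, 1), (2, 2), (3, 2), (4, 2)]))
# [[(0, 1), (1, 0), (1, 1)], [(1, 1)], [(1, 1)], [(0, 1), (1, 0), (1, 1)], [(1, 0)]]
```

The turning trajectory reproduces, in miniature, the pattern reported for Simulation I: all three chains are equally active at onset and again at the moment the direction changes, and a single-chain decision follows one step later, when bottom-up and delayed lateral excitation coincide again.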
4. DETERMINATION OF ACTUAL DIRECTION

The network sketched in the previous section can be elaborated in a variety of ways and applied to several uncertainty-resolution problems. This section develops the details of how such a network can function to disambiguate local motion measurements and to spatially track edge segments, even when the length of a segment can vary. Simulations are presented showing how the network's outputs are analogous to the direction judgments of human beings in response to Wallach's (1935) motion displays. This approach allows a new and easily generalizable solution to the aperture problem to be implemented.
Simulation I: Direction Judgments in Wallach's Display

Simulation I shows how the kind of network described in the previous section can resolve motion ambiguity and respond to changes in direction, using a slightly more general version of the network. Implementation details for Simulation I are described in the Appendix and are elaborated further by Marshall (1988b, 1989). Figure 7 shows a grid of cell positions within layer L2. Bars representing the network's input pattern sequence are superimposed on the grid. The input sequence represents one of Wallach's displays: a diagonal line moving behind a horizontal rectangular window (Figure 2d). A single diagonally oriented segment is presented at each discrete time step, so that a single bar appears to sweep across the window. The bar starts at one corner of the rectangle and appears to become disoccluded, lengthening as it changes position diagonally upward. At time 5, the bar appears to stop lengthening and to change direction, shifting horizontally. It continues shifting horizontally until time 10, when it appears to shorten and shift diagonally upward again, through time 12.

FIGURE 7. Grid of cell positions (dotted lines) is overlaid with bars representing changing position and length of input segment. Arrows represent direction ambiguity in L1 cell activations.

Each box in the grid contains a cluster of 12 cells; Figure 8 describes the preferred orientation, preferred length, and lateral connectivity of each of the 12 cells at a position. The 12 cells shown at each position are all sensitive to the same orientation and local direction component of motion, and thus only a subset of the full network is shown. However, the 12 cells in a cluster each possess a different combination of one of four preferred lengths and one of three lateral connection chain directions. Thus, one of the 12 cells might, for example, respond preferentially to presentation of a short, diagonally oriented contrast segment, preceded by activation of its predecessor cell along a horizontal axis. By virtue of its lateral inputs, the cell can be said to respond preferentially to a horizontally moving diagonal edge traversing its receptive field.

FIGURE 8. Structure of a cell cluster (recoverable labels: lateral output, lateral input, lateral connection, cell cluster position, preferred lengths). Each cell in the cluster is represented by a dot. …

At each time step in the simulation, the input pattern is fed into L1 cells sensitive to the appropriate position, orientation, length, and local direction component of motion. None of the L1 cells individually can determine the actual direction in which the bar appears to be moving; determination of actual direction is the task of L2. Excitatory signals from the maximally active L1 cells are fed forward to clusters of L2 cells. The simulation shows how the network at layer L2 computes and represents the actual direction in which the bar is moving, in a manner consistent with human percepts of actual direction of motion.

Figure 9 summarizes the network's direction computations as the segment traverses the network. The initial (time 1) uncertainty about the segment's direction of motion is represented by the equal bottom-up activation of three cells (three output arrows, shaded area) belonging to separate chains. Next (time 2), only a single cell (on a diagonal chain) receives both bottom-up and lateral excitation; lateral inhibition then ensures that only that cell remains significantly active. The network's decision that the segment is moving in a diagonal direction is represented by that cell's activity (single output arrow). The representation of diagonal motion propagates (times 3 and 4). At time 5 (and time 10), the segment begins to shift in a new direction, preventing the bottom-up and lateral signals from arriving at the same location.
Three cells receive equal excitation (bottom-up only), which creates a new moment of uncertainty (three output arrows). The uncertainty is resolved at time 6 (and time 11), when again a single cell receives both bottom-up and lateral excitation. That cell represents the network's decision about the segment's direction of motion (single output arrow). Because of the lateral chains, the preferences of L2 cells for actual direction of motion were not necessarily perpendicular to the cells' orientation preferences. Cells whose orientation preference is not perpendicular to their direction preference in response to moving edge segments (slits) have been found in area MT of macaque monkey visual cortex (Albright, 1984). Thus, the dissociation of orientation and direction preferences exhibited by L2 cells in the present simulations may correspond to the dissociation found in such physiologically identified cells.
FIGURE 9. Summary of the network's direction computations. The number adjacent to each diagonal row indicates the time-step at which the moving stripe reaches the row's position. Input arrows indicate lateral excitation; diagonal bars indicate cells receiving bottom-up excitation. Output arrows (shaded regions) indicate active cells. Triple output arrows (times 1, 5, and 10) indicate direction uncertainty; single output arrows indicate a direction decision.
Simulation I illustrates how the two degrees of freedom in the aperture problem can be disentangled and then dealt with separately. First, the hypercomplex-type cells localize each edge segment. Second, the lateral chains track the segments' actual directions of motion. The direction decisions produced by the simulation accord with the percepts reported by Wallach's (1935) experimental subjects. Simulation I shows how a simple network can overcome the ambiguity inherent in L1 cell activations and can represent actual directions of motion. The same kind of mechanism can be replicated for other orientations and velocities; each such subnetwork is engaged by stimuli of its preferred orientation and velocity.
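The decision sequence summarized in Figure 9 can be mimicked by a deliberately simplified sketch. Every element of it is a hypothetical stand-in, not the paper's equations (those are in the Appendix):

- the three direction labels;
- unit bottom-up and lateral inputs;
- a grid on which the segment advances one column per step;
- an idealized inhibition that silences every cell whose total input falls below the cluster maximum, while letting ties survive.

```python
# Hypothetical direction labels for the three chains in one L2 cluster; each maps
# to the (dx, dy) step a segment with that actual direction makes per time step.
DIRECTIONS = {"up": (1, 1), "horizontal": (1, 0), "down": (1, -1)}


def step(position, prev_active):
    """Return the set of (position, direction) cells active at one time step.

    All three cells at `position` get equal bottom-up input, because they share the
    segment's orientation and normal motion. A cell also gets lateral input if its
    chain predecessor (one step back along its own direction) was active.
    """
    x, y = position
    support = {}
    for name, (dx, dy) in DIRECTIONS.items():
        bottom_up = 1.0
        lateral = 1.0 if ((x - dx, y - dy), name) in prev_active else 0.0
        support[name] = bottom_up + lateral
    best = max(support.values())
    # Idealized inhibition: cells below the maximum are silenced; ties all survive,
    # so bottom-up-only activation leaves several cells active (uncertainty).
    return {(position, name) for name, s in support.items() if s == best}


def run(path):
    """Feed a segment trajectory through the network; return active directions per step."""
    active, history = set(), []
    for position in path:
        active = step(position, active)
        history.append(sorted(name for _, name in active))
    return history


# Diagonal motion for four steps, then a turn to horizontal motion.
path = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 3), (5, 3)]
for t, names in enumerate(run(path), start=1):
    print(t, names)
```

The printout follows the uncertainty-then-decision pattern described above:

- time 1: three active cells;
- times 2 to 4: a single "up" cell;
- time 5, when the trajectory turns: three active cells again;
- time 6: a single "horizontal" cell.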
5. HOW THE NETWORK SELF-ORGANIZES
The complex chain structures described above might seem arbitrary, or even bizarre, as processing structures for visual information, were it not for a special property: self-organization. Self-organization refers to the network's ability to acquire its structure adaptively without detailed external "teaching." The networks described here begin with a rudimentary, undifferentiated interconnection structure. As cells in the network are exposed to sequences of visual input, a simple adaptation rule causes them to modify their connection strengths according to the spatiotemporal correlations in the input. As a result, certain characteristic patterns of connections form between cells in the network. The chain structures in L2 form in this manner; they are thus a natural consequence of simple adaptation to ordinary moving visual input.
Initially, the networks are formed according to simple growth rules (Cohen & Grossberg, 1987; Dammasch, Wagner, & Wolff, 1986; Grossberg, 1976a, 1976b; Kohonen, 1982a, 1982b; Linsker, 1986a; von der Malsburg, 1973; von der Malsburg & Cowan, 1982; Willshaw & von der Malsburg, 1976). Cells are distributed uniformly throughout each layer. The weights of connections between cells decrease with distance according to a Gaussian function. Thus, the initial processing capabilities of cells in each layer are uniform and nonspecific. The strengths of neural interconnections are modified on a "use it or lose it" basis. Connections involved in representing frequent visual events are exercised often and become stronger, while infrequently exercised connections become weaker (Amari & Takeuchi, 1978; Bienenstock et al., 1982; Carpenter & Grossberg, 1987a, 1987b; Dubin, Stark, & Archer, 1986; Fukushima & Miyake, 1982; Garey & Pettigrew, 1974; Globus, Rosenzweig, Bennett, & Diamond, 1973; Greenough, 1975; Grossberg, 1972, 1976a, 1976b, 1984; Hebb, 1949; Hubel, Wiesel, & LeVay, 1977; Kohonen, 1982a, 1982b, 1984, 1987; Kohonen & Oja, 1976; Linsker, 1986a, 1986b, 1986c; Rakic, 1977; Singer, 1983, 1985a, 1985b; von der Malsburg, 1973; von der Malsburg & Cowan, 1982).
The present paper goes beyond previous analyses in several ways. First, the self-organization is applied to the domain of motion, which implies that representations of temporal sequences must be formed. Second, the dual processes of input representation and network self-organization occur simultaneously, even when uncertainty is represented. Third, a new kind of inhibitory learning rule is combined with an excitatory learning rule to regulate the amount of permissible overlap between the input patterns represented by each cell.
A simple learning rule governs the gradual changes in the excitatory connection weights between cells. Whenever a cell is active, its input connections from active cells become stronger, at the expense of its input connections from inactive cells. This kind of rule belongs to the class of instar learning rules (Grossberg, 1982b). That is, for the (excitatory) connection from cell j to cell i, learning occurs whenever cell i is active, but the target value of the connection strength is determined by the activity of cell j. Each cell follows the learning rule independently. Yet, when combined in a network with excitatory and inhibitory signal transmission, the rule leads to several important properties of the network as a whole. In particular, two global network properties emerge from the local cellular interactions governed by such learning:
1. Selectivity Property. Each cell becomes increasingly sensitive to a particular input pattern.
2. Dispersion Property. Every cell tends to become sensitive to a different input pattern.
Because learning occurs whenever a target cell is active, the dispersion property is upheld when inhibition is strong enough to prevent two target cells from responding to the same input pattern. (This condition can be relaxed somewhat: see the section below on Tuning of Inhibition Strength.) Selectivity follows from dispersion: input patterns that differ sufficiently will activate different target cells, and each of those cells then learns its own input pattern more strongly.
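The instar rule and the two emergent properties can be illustrated with a minimal sketch. It makes these assumptions:

- a common discrete-time form of the instar rule, w_ij ← w_ij + η·x_i·(x_j − w_ij), where η is the learning rate;
- an idealized winner-take-all stand-in for strong lateral inhibition.

The learning rate, the two input patterns and the small initial asymmetries are hypothetical. The paper's own equations are given in its Appendix.

```python
import numpy as np


def instar_update(w, pre, post, rate=0.2):
    """Instar rule: row i (cell i's input weights) moves toward the presynaptic
    vector only while cell i is active: w[i, j] += rate * post[i] * (pre[j] - w[i, j])."""
    return w + rate * post[:, None] * (pre[None, :] - w)


def winner_take_all(net):
    """Idealized strong lateral inhibition: only the most excited cell stays active."""
    post = np.zeros_like(net)
    post[np.argmax(net)] = 1.0
    return post


patterns = np.array([[1.0, 1.0, 0.0, 0.0],
                     [0.0, 0.0, 1.0, 1.0]])
w = np.full((2, 4), 0.5)   # two target cells start with nearly identical weights
w[0, 0] += 0.02            # tiny asymmetries stand in for initial noise
w[1, 2] += 0.01

for epoch in range(50):
    for x in patterns:
        post = winner_take_all(w @ x)
        w = instar_update(w, x, post)

print(np.round(w, 3))      # each row ends close to one of the two patterns
```

The two patterns are captured by different cells, which is dispersion. Each cell's weights then converge onto its own pattern, which is selectivity.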
Spatiotemporal Correlations in Moving Images
The networks presented here exploit certain spatiotemporal correlation properties (Field, 1987; Kersten, 1987; Knill & Kersten, 1988) of information carried by images moving on the retina. These ecologically derived correlations are described next, because they explain how the networks self-organize. The self-organization that occurs in these networks is based on the following premise: Moving visual features detected in an image have visual inertia (Anstis & Ramachandran, 1987; Ramachandran & Anstis, 1983; Sethi & Jain, 1987; Waxman & Duncan, 1986). For example, suppose a vertically oriented contrast segment moving rightward at velocity v activates a cell at position (x, y) at time t. Then another cell near position (x + vΔt, y) is likely to become activated at time t + Δt (for small Δt). On average, it is possible to predict the position and velocity of a given visual feature a short time into the future, given past measurements of position and velocity. For example, measuring the motion of an oriented edge segment over an interval of time can lead to a fairly reliable estimate of the segment's future position. Of course, such predictions often fail to hold, because of noise, changes in the object's motion, cell unreliability, and so on. However, if construed as probabilities rather than rigid sequences, such predictions can be used to resolve motion ambiguities, as in the aperture problem. The self-organizing networks described here learn and exploit these inertial tendencies.
Figure 10 sketches a contour plot of a probability density function. The function estimates the a posteriori likelihood that cells at various positions become activated, given that a cell sensitive to a particular local visual motion was activated Δt time units ago. The likeliest positions toward which a segment might move lie along a ridge whose location is consistent with the possible actual motions of the segment. A posteriori activation can even occur in the direction opposite to the one the given cell's activity nominally indicates, because of factors such as a change in motion or noise. Some of these factors could be accommodated within this framework by allowing connections to spread across a "fuzzy" region (see Discussion). This paper, however, is limited to the sharply defined connections arising from the likeliest activation sequences.
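The ridge in Figure 10, and the density given in its caption, can be checked with a short Monte Carlo sketch. It follows the caption in assuming that all radial directions of actual motion are equally likely. It further assumes that only directions with a positive (rightward) normal component are relevant, so θ is uniform on (−π/2, π/2). The normal speed `v_n`, the time step `dt` and the sample size are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
v_n, dt = 1.0, 1.0          # hypothetical normal speed (positions/step) and time step

# Actual direction, measured from the edge normal; uniform over the forward half-plane.
theta = rng.uniform(-np.pi / 2, np.pi / 2, size=200_000)

# The normal (x) displacement is always v_n * dt, so every sample lands on the ridge;
# only the offset along the ridge depends on the unseen tangential motion.
y = v_n * dt * np.tan(theta)


def ridge_density(y):
    """Density along the ridge (unit scale): (1/pi) * d(theta)/dy = 1 / (pi * (1 + y^2))."""
    return 1.0 / (np.pi * (1.0 + y**2))


a, b = -0.5, 0.5
empirical = np.mean((y > a) & (y < b))                 # Monte Carlo estimate
grid = np.linspace(a, b, 1001)
mid = 0.5 * (grid[:-1] + grid[1:])
integral = np.sum(ridge_density(mid) * np.diff(grid))  # midpoint-rule integral
exact = (np.arctan(b) - np.arctan(a)) / np.pi
print(f"P({a} < y < {b}): sampled {empirical:.3f}, integrated {integral:.4f}, exact {exact:.4f}")
```

All three numbers agree, at roughly 0.295. This is what the change of variables θ = arctan y predicts: a uniform distribution over direction becomes a density along the ridge proportional to 1/(1 + y²).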
The networks described here incorporate the probabilities of these sequences into their connection weights via an adaptive rule, producing a connection structure suitable for resolving motion ambiguity. The networks exploit the interactions between signal transmission timing, retinotopic distance, and velocity measurements to generate prediction signals. These signals correspond to the probabilities that a measured visual motion will be followed by a particular future visual input. Motion of a visual feature is represented in terms of the extent to which it follows these predicted motion sequences. In other words, the network automatically forms a model of the spatiotemporal structure of its input patterns and represents incoming visual information in terms of that model. In these respects, the approach resembles some maximum a posteriori (MAP) or Bayesian methods (Anderson & Abrahams, 1987; Golden, 1988; Kersten, O'Toole, Sereno, Knill, & Anderson, 1987; Knill & Kersten, 1988), which have begun to be explored in the area of motion perception (Sereno, 1986, 1987). Such methods are useful because they show how a perceiving organism can maximize its representational efficiency and minimize its coding redundancy (Barlow, 1980, 1981; Bossomaier & Snyder, 1986; Daugman, 1985, 1988; Field, 1987; Field, Kersten, & Barlow, 1988; Kersten, 1987; Stork & Wilson, 1988; Watson, 1987). However, such methods may require an external "teacher" to supply a vector specifying the desired outcome of the learning. An analysis of self-organization provides even more power. It specifies how an organism can use environmental statistics to encode environmental events efficiently, and also how the same organism can take up such statistics without an external teacher.
FIGURE 10. Assuming that a cell is active, which cells will become activated a short time into the future? The figure is a schematic contour plot of the probability density function of the a posteriori likelihood that cells at various positions become activated, given that a particular cell is activated. Suppose a particular cell (ellipse) is active; it responds maximally to a vertically oriented edge segment moving with a particular horizontal speed component (arrow). Then a cell at the position marked by the x is the most likely cell to be activated next. Cells located along the contour ridge are also likely candidates for activation. Cells located elsewhere are much less likely to be activated next, although they might become activated by chance or by a change in the segment's trajectory. The sketch assumes, for simplicity, that motion of visual objects is equally likely in all radial directions. In the case of linear motion without noise, the probability density along the ridge would be proportional to $d\theta/dy = 1/(1 + y^2)$. Here $y$ is the distance from the midpoint of the ridge and $\theta = \arctan y$ is the radial direction angle of the motion.
Self-Organization of the Network Structure
The networks described here incorporate the sequence probabilities of such motion stimuli into their connection weights via an adaptive rule, producing a connection structure suitable for resolving motion ambiguity. Because the lateral connections are time-delayed, the receptive field profile of each L2 cell develops to include information about the prior state of other cells in its own layer. Each L2 cell acquires strong lateral connections from other L2 cells whose activations are likely to precede its own. These are cells with the same direction and speed sensitivity that lie earlier on the same direction-and-speed trajectory. Likewise, each L2 cell is likely to develop strong lateral connections to other cells with the same sensitivity that lie later on the same trajectory. Thus, the inertial tendencies of moving visual features lead to the formation of lateral chain structures in layer L2 of the network. The signal transmission latencies of connections along each chain match the rate at which bottom-up signals activate the cells along the chain. The chains allow the network to track moving visual features, as in Simulation I. To illustrate, Figure 11a shows how each L2 cell initially receives an undifferentiated profusion of excitatory connections from nearby cells in L1 and L2. After a period of exposure to moving visual images, most of those connections weaken or disappear (Figure 11b). Each L2 cell then receives bottom-up input representing a single preferred orientation, length, and local direction component of motion. Each L2 cell also receives lateral connections consistent with its bottom-up input; that is, it receives connections from other L2 cells with similar receptive field preferences.
After sufficient exposure to moving visual input, the only lateral connections that remain are those whose transmission latency equals the travel time of a visual feature between the two cells' receptive fields. That travel time is the time a feature moving at the originating cell's preferred speed would take to go from the originating cell's receptive field to the destination cell's receptive field (Figure 11c). Because of the dispersion property, neighboring L2 cells tend to acquire different lateral connections, even if they receive the same bottom-up connections (Figure 11d). Thus, even cells that are physically clustered together tend to spread their input topologies apart.
FIGURE 11. (a) Initially, each L2 cell receives excitatory connections from all its neighbors in L1 and L2. (b) Activity [...] (shading) strengthen some connections and weaken others. (c) When a cell receives both bottom-up excitation and time-delayed lateral excitation, it becomes more sensitive to the [...] cells with like receptive field properties. Transmission latencies [...] their usual activation asynchrony. (d) Lateral inhibition ensures [...] simultaneously. This gives rise to the dispersion property: every [...]; even if two cells receive the same connections from L1, they [...]
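The survival condition stated above amounts to a simple predicate: a connection survives when its latency equals the travel time at the originating cell's preferred velocity. The sketch below is minimal and uses a hypothetical integer grid, with velocity measured in grid positions per time step. `link_survives` is an illustrative name, not a function from the paper.

```python
from itertools import product


def link_survives(src, dst, velocity, latency):
    """True if a feature moving at `velocity` (grid positions per time step) that is
    at `src` at time t arrives at `dst` exactly `latency` steps later."""
    return all(d - s == v * latency for s, d, v in zip(src, dst, velocity))


# Cells tuned to diagonally up-right motion: which nearby cells keep a lateral
# connection from the cell at the origin, for each transmission latency?
velocity = (1, 1)
candidates = list(product(range(-3, 4), repeat=2))
for latency in (1, 2, 3):
    survivors = [dst for dst in candidates if link_survives((0, 0), dst, velocity, latency)]
    print(latency, survivors)
```

For velocity (1, 1) this prints one survivor per latency: (1, 1), (2, 2) and (3, 3). The result is a chain whose links cross more positions as the latency grows.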
6. SELF-ORGANIZATION OF VELOCITY SENSITIVITY
This section shows how a neural network that implements the adaptive principles described above acquires sensitivity to motion at several velocities. In particular, the network's bottom-up and lateral excitatory connections develop, in response to moving visual input, to give L2 cells predictive velocity-tracking capabilities. The lateral connections in L2 form chain structures, linking cells with similar bottom-up velocity sensitivities along successive spatial positions. The global pattern of excitatory connections to all the L2 cells becomes maximally selective and dispersed. The system climbs out of local minima in its connection landscape as it proceeds to a globally consistent L1 → L2 mapping.
Simulation II: (A) Development of Bottom-Up Connections
Simulation II shows how adaptation, combined with moving visual input, leads to the formation of lateral connection chains consistent with bottom-up connections in a 1-dimensional slice across the network. Implementation details are specified in the Appendix and by Marshall (1988b, 1989). At each spatial position in the simulation there are 6 L1 cells and 6 L2 cells. Each L1 cell is sensitive to visual input moving at a different speed across the visual field: fast, medium, or slow to the left, or fast, medium, or slow to the right (Figure 12a). The network's input sequences are designed according to the simplest possible assumption: that each L1 cell is equally likely over time to be stimulated. The L1 cells are activated sequentially by simulated stimuli moving at velocities chosen randomly from the set {−3, −2, −1, +1, +2, +3}. This design is intended only as a rough approximation of the ecologically constrained behavior of an animal's visual inputs. A wider range of input velocities could be accommodated simply by adding properly tuned L1 cells. Intermediate velocities could be handled by broadening the cells' tuning curves. Limiting input patterns to a small discrete set does not impair the generality of the networks. Intermediate patterns would be approximated by similar neighbors (Grossberg, 1976a, 1976b; Kohonen, 1984) and processed appropriately.
FIGURE 12. (a) Input patterns for Simulation II. A visual feature sweeps either left or right across the 1-D simulated visual field, with speed 1, 2, or 3. (b) Initially, the 6 L1 cells connect equally well to the 6 L2 cells. The length of each vertical bar in the matrix indicates the bottom-up connection strength from an L1 cell to an L2 cell. (c) After repeated exposure to visual input, most of the connections have weakened. (d) The surviving strong connections form a one-to-one mapping.
Figures 12b-c depict the development of the bottom-up connection structure from a group of L1 cells to a group of L2 cells. An L1 cell's preferred speed is indicated by a number: −3, −2, −1, +1, +2, or +3. The number is how many positions a visual feature moving at the cell's preferred speed would traverse in one unit of time. One matrix shows the initial strengths of the connections from the 6 L1 cells to the 6 L2 cells at a single position (Figure 12b). Another matrix shows the strengths of the same connections after a period of exposure to moving visual input (Figure 12c). The pattern of connections develops so that each L1 cell connects strongly to exactly one L2 cell. Figure 12d details the one-to-one pattern of bottom-up connections. Each L2 cell thus inherits the receptive field preferences of a single L1 cell. For example, the L2 cell numbered 2 has become sensitive to visual input moving at the rate of 1 position to the left per unit of time. The same kind of bottom-up self-organization occurs at every position in the network.
The selectivity and dispersion properties of the network's adaptive rules make the network useful as a general-purpose pattern classification scheme. Together, selectivity and dispersion enable the network to climb out of local minima in its connection landscape. This resembles the formation of globally consistent feature maps in Kohonen's (1982a, 1982b, 1984, 1987) self-organizing networks. Occasionally, spurious correlations in the input patterns let improper bottom-up connections gain a small amount of strength. However, these connections tend to disappear rapidly. In general, the activity of an L1 cell is highly correlated with the activity of the L2 cell to which it connects. These high correlations keep the correct connections strong and the incorrect connections weak. In this manner, each L2 cell acquires a unique bottom-up receptive field profile. Each L2 cell within a position becomes preferentially sensitive to a particular local velocity. Figure 13 summarizes the bottom-up velocity preference of each L2 cell in the 6 simulated network positions.
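The development of a one-to-one bottom-up mapping at a single position can be sketched with the same instar-plus-competition idea. The sketch's hypothetical assumptions are:

- each stimulus drives only the L1 cell for its velocity;
- velocities are drawn uniformly from the six values, matching the text's "equally likely" design;
- strong inhibition is idealized as winner-take-all;
- the noise level and learning rate are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
VELOCITIES = [-3, -2, -1, +1, +2, +3]   # L1 cell k prefers VELOCITIES[k]
n = len(VELOCITIES)

# "Initially, the 6 L1 cells connect equally well to the 6 L2 cells": equal weights
# plus a little noise to break ties (the noise level is an assumption).
W = np.full((n, n), 1.0 / n) + rng.uniform(0.0, 0.01, size=(n, n))  # W[L2 cell, L1 cell]
rate = 0.2

for _ in range(600):
    k = rng.integers(n)                   # each velocity equally likely
    x = np.zeros(n)
    x[k] = 1.0                            # the stimulus drives the single matching L1 cell
    winner = np.argmax(W @ x)             # idealized strong inhibition within the position
    W[winner] += rate * (x - W[winner])   # instar update for the active L2 cell only

mapping = {VELOCITIES[k]: int(np.argmax(W[:, k])) for k in range(n)}
print(mapping)
```

The random initial differences decide which L2 cell ends up coding which velocity, but the result is always a permutation. Once a cell has won one velocity, its weights for the other velocities fall below those of any uncommitted cell, so each new velocity recruits a fresh cell.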
Simulation II: (B) Development of Lateral Connections
In addition to the bottom-up connections, each L2 cell acquires a set of lateral excitatory connections. Bottom-up and lateral excitatory connection strengths both vary according to the same adaptive rule. Figure 13a shows all the surviving strong connections with transmission latency 1. All the +1 cells participate in a single chain of connections that jumps from each +1 cell to the +1 cell one position to its right (heavy arrows). Likewise, every cell connects laterally only to other cells with the same velocity sensitivity. The direction and number of positions that each link crosses correspond to the bottom-up velocity preferences of the cells it connects. Figures 13b and 13c show the surviving strong connections with latencies 2 and 3, respectively. These also form chains that link cells of like velocity preference. However, each link crosses 2 or 3 positions, respectively, for each unit of velocity to which its endpoint cells are sensitive. Here each cell acquires lateral connections from the L2 cells likeliest to have been active 2 or 3 time units before its own activation. Figures 13a-c display the connections with latencies 1, 2, and 3 separately, but they all exist simultaneously in the single network of Simulation II. Given the bottom-up velocity sensitivity of each cell, the surviving lateral connections displayed in Figure 13 are exactly the correct ones: none are missing, and none are superfluous. Simulation II demonstrates three main features of the network's self-organization in response to visual input:
(a) the initial nonspecific bottom-up mapping becomes maximally selective and dispersed, in this case coding each velocity by a separate L2 cell;
(b) the lateral excitatory connections that thrive respect the bottom-up sensitivity of the cells they link;
(c) unidirectional chain-like structures of lateral connections form along successive positions, linking cells that respond to visual features with similar motion characteristics.
FIGURE 13. The bottom-up velocity preference of each L2 cell in the 6 simulated network positions is indicated by a number within the corresponding box. All surviving strong connections of time-delay 1 (a), time-delay 2 (b), and time-delay 3 (c) are displayed as arrows. The chain structure, connecting cells of like bottom-up velocity sensitivity, is [...]; one chain is shown by heavy arrows.
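A minimal 1-D sketch shows how delayed lateral signals, combined with the same instar-type rule, could yield the chains of Figure 13. It is not Simulation II itself. It assumes, hypothetically:

- a ring of 12 positions;
- a bottom-up mapping that has already become one-to-one;
- one stimulus at a time, with blank gaps at least as long as the longest latency;
- a separate weight matrix for each latency;
- an arbitrary 0.3 threshold for calling a connection "strong".

```python
import numpy as np

rng = np.random.default_rng(1)
N = 12                                    # positions on a ring (hypothetical; avoids edge effects)
VELOCITIES = [-3, -2, -1, 1, 2, 3]
LATENCIES = [1, 2, 3]
n_cells = N * len(VELOCITIES)


def cell(pos, vel):
    """Index of the L2 cell at ring position `pos` that prefers velocity `vel`."""
    return (pos % N) * len(VELOCITIES) + VELOCITIES.index(vel)


def stimulus_frames(n_stimuli, sweep=24, gap=3):
    """One activity vector per time step. Each stimulus moves at one random velocity
    for `sweep` steps, then `gap` blank steps follow (gap >= max latency, so successive
    stimuli are never correlated through the delays)."""
    frames = []
    for _ in range(n_stimuli):
        vel = VELOCITIES[rng.integers(len(VELOCITIES))]
        pos = int(rng.integers(N))
        for _ in range(sweep):
            a = np.zeros(n_cells)
            a[cell(pos, vel)] = 1.0       # bottom-up mapping assumed already one-to-one
            frames.append(a)
            pos += vel
        frames.extend(np.zeros(n_cells) for _ in range(gap))
    return frames


W = np.full((len(LATENCIES), n_cells, n_cells), 0.5)   # W[k, target, source]: uniform start
rate = 0.2
frames = stimulus_frames(400)
for t in range(max(LATENCIES), len(frames)):
    post = frames[t]
    for k, d in enumerate(LATENCIES):
        pre = frames[t - d]               # the lateral signal sent d steps ago arrives now
        W[k] += rate * post[:, None] * (pre[None, :] - W[k])   # instar on delayed input

# Predicted chains: target (q, u) keeps only the link from (q - u*d, u) at latency d.
expected = np.zeros(W.shape, dtype=bool)
for k, d in enumerate(LATENCIES):
    for q in range(N):
        for u in VELOCITIES:
            expected[k, cell(q, u), cell(q - u * d, u)] = True

print("learned strong links == predicted chains:", np.array_equal(W > 0.3, expected))
```

A cell in this sketch is active only when a feature of its preferred velocity reaches its position. The delayed presynaptic activity it sees at that moment is therefore always one of two things: the cell u·d positions behind it on the same trajectory, or nothing at stimulus onset. Every other incoming weight only decays. This is why the surviving links form velocity-matched chains and nothing else.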
7. SELF-ORGANIZATION OF DIRECTION SENSITIVITY
Simulation II illustrated how a network can acquire sensitivity to visual motion trajectories based on velocity. Self-organization must also be shown to provide all the structural elements the network needs for its motion-disambiguation tasks. This requires demonstrating one more adaptive capability: the network's ability to acquire sensitivity to visual motion trajectories based on 2-D actual direction of motion. The adaptive rule used in Simulation II, combined with a new inhibitory adaptive rule, allows layer L2 to become sensitive to actual direction of motion. Each L2 cell becomes a member of a lateral chain that connects cells of similar direction preference. In addition, the L2 cells in a local neighborhood all become members of different chains. The inhibitory learning rule enables the system to develop representations of both uncertainty and decision, to use its full representational capacity, and to dynamically maintain symmetry of inhibition.
Simulation III: (A) Development of Sensitivity to Actual Direction
Simulation III shows how layer L2 of the network can become sensitive to actual direction of motion. Implementation details are described in the Appendix and by Marshall (1988b, 1989). Figure 14 depicts the 3 × 5 grid of L2 cell clusters on which Simulation III takes place. At each of the 3 × 5 spatial positions in L1 there is a single cell. Each such cell is assumed to be sensitive to vertically oriented edge segments moving in any of three directions with a rightward normal component. Each L1 cell projects bottom-up connections to the three cells in its topographically corresponding L2 cluster. Lateral excitatory connections within L2 are initially symmetric: each L2 cell sends weak excitatory connections to all its neighbors, as shown in Figure 14. Strong reciprocal inhibitory connections between cells within each cluster are also present.
FIGURE 14. Time = 0. Schematic diagram of L2. Each cell in the 3 × 5 matrix of clusters is identified by number. Some cells are displayed more than once to facilitate display of "wraparound" connections. Initially, every L2 cell is connected weakly to all its L2 neighbors. The output connections from a single cell are displayed.
The network is exposed to an input sequence representing oriented edge segments moving across a region of the visual field. At random intervals, a vertical edge segment appears in the visual field at one of the 3 × 5 L1 cell positions. It then sweeps in one of three rightward directions (horizontally, diagonally upward, or diagonally downward) for several time-steps, and finally it disappears. Such simulated visual motion stimuli are presented repeatedly. On each
presentation, the weights of both the bottom-up and lateral connections change only slightly. The learning therefore reflects the accumulated effects of statistical trends in the input over long periods of time, not the effects of any single input presentation. Figures 15 and 16 show the resulting lateral excitatory connection patterns after progressively longer periods of simulated visual exposure. Already after 2000 units of time (Figure 15), a universal rightward trend in the direction of lateral excitatory connections is discernible. Much of the desired chain structure is still missing at this stage, but the lack of leftward input motion is already reflected in the absence of leftward connections. By time 8000 (Figure 16), the network has self-organized its lateral chain structure completely. Every L2 cell is a link in one lateral chain, and within each cluster, each of the cells is part of a different lateral chain. Thus, Simulation III illustrates how each cluster of neighboring cells in L2 independently develops a separate means of representing each of the possible actual directions in which an L1 activation can move.
FIGURE 15. Time = 2000. Already, the connections show a distinct left-to-right trend.
After time 8000, the connection strengths continue to vary; the learning rules are not shut off. However, the overall pattern of lateral chains established at this point (Figure 16) does not change during the remainder of the simulation, which runs to time 15,000. Thus, the overall pattern of lateral excitatory connections is stable, as long as the system's input sequences continue to follow similar statistical distributions. The section below on Stability and Plasticity of the Network discusses a tradeoff: the network's learning rate versus its sensitivity to temporary fluctuations in the statistics of its input sequences. Once self-organization has occurred and the lateral chains have become established, as in Figure 16, the network processes visual input in its normal fashion, as described above (Figures 6 and 9).
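The Simulation III input sequence described above can be sketched as a generator. All parameters are hypothetical choices, not the paper's:

- the grid has 3 rows × 5 columns;
- sweeps last 3 to 6 steps;
- blank intervals last 1 to 4 steps;
- positions wrap around at the edges (suggested by the "wraparound" connections in Figure 14, but still an assumption here).

```python
import random

ROWS, COLS = 3, 5   # assumed layout of the 3 x 5 grid; indices wrap around at the edges
# (d_row, d_col) per time step; row 0 is taken to be the top, so "up" decreases the row.
DIRECTIONS = {"up": (-1, 1), "horizontal": (0, 1), "down": (1, 1)}


def input_sequence(n_events, rng, min_len=3, max_len=6, max_gap=4):
    """Yield one frame per time step: None (blank) or (row, col, direction)."""
    for _ in range(n_events):
        for _ in range(rng.randint(1, max_gap)):        # random interval before onset
            yield None
        row, col = rng.randrange(ROWS), rng.randrange(COLS)
        name = rng.choice(list(DIRECTIONS))
        d_row, d_col = DIRECTIONS[name]
        for _ in range(rng.randint(min_len, max_len)):  # sweep for several steps
            yield (row, col, name)
            row, col = (row + d_row) % ROWS, (col + d_col) % COLS


frames = list(input_sequence(1000, random.Random(0)))
counts = {name: sum(1 for f in frames if f and f[2] == name) for name in DIRECTIONS}
print(counts)
```

Every stimulus frame advances one column to the right, so, as in the simulation, the input contains no leftward motion. With statistics like these, a correlation-driven rule of the kind described can strengthen only rightward lateral connections.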
Simulation III: (B) Tuning of Inhibition Strength
Simulation III combines excitatory and inhibitory learning (Easton & Gordon, 1984) in a new way. Both types of learning contribute to the network's self-organization. The excitatory learning allows the L2 cells to form categories for the input, based on spatiotemporal correlations. The inhibitory learning (Amari & Takeuchi, 1978; Easton & Gordon, 1984; Nagano & Kurata, 1981; Wilson, 1988) governs the amount of coactivation, or permissible overlap, between categories. Figure 17 illustrates the desired inhibition properties. If too little inhibition were present within each cluster, all three cells in a cluster could become active simultaneously (Figure 17a). Since connection strengths are initially isotropic, all the cells in a cluster would then become sensitive to the same input pattern. Each cell ought to acquire a different sensitivity, so the inhibition must be strong enough (at least initially) for slight differences in cell inputs to produce large differences in cell activations (Figure 17b). Over long periods, this would ensure (by dispersion) that no two cells tend to acquire the same input sensitivity. However, once strong inhibition has established the cells' input pattern sensitivities, those sensitivities become incorporated into the excitatory connection weights. Thereafter, less inhibition is needed because cells tend not to become coactivated anyway. Moreover, too much inhibition would then prevent multiple cells from becoming transiently active when a new stimulus first appears. Instead, a single cell, probably the wrong one within the L2 cluster, would be chosen to represent the input. If the wrong cell became the only active cell, the resulting incorrect learning would be likely to distort the network's connection patterns. In general, that would prevent a stable configuration from becoming established. The inhibitory learning rule in Simulation III prevents such pathologies by letting the network settle on appropriate intermediate levels of inhibition.
FIGURE 16. Time = 8000. The desired chain lattice has formed perfectly. All strong excitatory connections are part of a lateral chain. Also, each cell in every cluster is part of a different chain. The same overall chain structure remains permanently through time 15,000, even though learning is allowed to continue.
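The text specifies the inhibitory learning rule for Simulation III, which mirrors the excitatory one. Whenever a cell is active, its outgoing inhibitory connections to other active cells grow stronger and those to inactive cells grow weaker. The text also argues that this asymmetric rule keeps reciprocal inhibition balanced. The minimal two-cell sketch below illustrates both points. Its activation model is hypothetical: a one-shot competition in which each cell is active if its random bottom-up drive, minus the other cell's drive times the inhibitory weight, exceeds a threshold. This stands in for the paper's recurrent dynamics. `theta`, `rate` and the trial counts are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, rate, n_trials = 0.2, 0.002, 20_000

# z[src, dst] is the inhibitory weight src -> dst; the diagonal stays zero.
z = np.array([[0.0, 0.9],
              [0.1, 0.0]])          # deliberately asymmetric start: 0 -> 1 is strong

history = []
for _ in range(n_trials):
    b = rng.uniform(0.0, 1.0, size=2)              # bottom-up drive to the two cells
    inhibition = z.T @ b                           # inhibition[i] = sum_j z[j, i] * b[j]
    active = (b - inhibition > theta).astype(float)
    # Presynaptically gated rule: an active cell's outgoing inhibition moves toward the
    # target's activity, so it grows onto active targets and decays onto inactive ones.
    z += rate * active[:, None] * (active[None, :] - z)
    np.fill_diagonal(z, 0.0)                       # no self-inhibition
    history.append((z[0, 1], z[1, 0]))

late = np.array(history[-5000:]).mean(axis=0)      # average over the last 5000 trials
print("weights 0->1, 1->0 at start: 0.9, 0.1")
print("late averages:", np.round(late, 3))
print("weaker / stronger:", round(late.min() / late.max(), 3))
```

In this toy, the two reciprocal weights end up nearly equal. At equilibrium, each weight tracks P(target active | source active) = P(both active) / P(source active). The two weights can therefore agree only when both cells are active equally often. The cell that inhibits more strongly is active more often, and that extra activity pulls its own outgoing weight down.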
The desired inhibition properties fit well with the network's representation of uncertainty via multicell activation. Most of the time, as a stimulus sweeps across the visual field, the system is able to make a definite choice about its actual direction of motion. The kind of visual uncertainty examined in this paper generally occurs only at the moments when a moving visual stimulus appears or changes direction. Thus, in this case, coactivation of multiple cells within a cluster occurs relatively rarely. When it does occur, it is characterized by the absence of lateral excitation. Simulation II was not designed to demonstrate how multiple-cell activity can represent uncertainty, so it did not use such tuning of lateral inhibition strength. Instead, a high but constant amount of inhibition in Simulation II ensured that the L2 cells would simply develop highly selective receptive field profiles. These high inhibition levels prevented simultaneous activation of multiple L2 cells within each spatial position.
The rule governing inhibitory learning in Simulation III was similar to the excitatory learning rule. Whenever a cell is active, its output inhibitory connections to other active cells become stronger; its output inhibitory connections to inactive cells become weaker. Thus, if two cells tend to be coactivated, the amount of inhibition between them tends to increase (Easton & Gordon, 1984), making them less likely to be coactivated. If two cells tend not to be coactivated, the amount of inhibition between them tends to decrease, so that they can become coactivated on relatively rare occasions. This rule is the reverse of previously proposed inhibitory learning rules (Amari & Takeuchi, 1978; Nagano & Kurata, 1981), in which coactivation decreases inhibition strength. The inhibitory learning rule helps ensure that the network's representational capacity is fully used. For instance, a cell that is never activated cannot learn to code any input pattern, and its role in the network is wasted. The inhibitory learning rule, however, would cause such a cell to receive less and less inhibition until it begins to respond to some input pattern. The cell would generally continue to receive reduced inhibition until it is active as often as other cells.
FIGURE 17. (a) When the inhibition within an L2 cluster is weak, all the cells in the cluster can become active. (b) When the inhibition is strong, one cell can suppress the others, so uncertainty cannot be represented via simultaneous activation of multiple cells. An intermediate level of inhibition is therefore desirable. It lets the network (c) allow a cell that receives both bottom-up and lateral excitation to suppress the other cells' activity, but also (d) allow simultaneous activation of cells when only bottom-up input is present.
Symmetric Inhibition from an Asymmetric Learning Rule
It is usually desirable to keep the amount of mutual inhibition between cells symmetric; that is, the strength of the inhibitory connection j → i should be the same as the strength of the inhibitory connection i → j. Otherwise, one cell could become strong enough to inhibit all the other cells. Thus whenever j → i changes, so must i → j. In addition, theorems proving stability of learning in neural networks require the assumption of symmetric inhibition (Cohen & Grossberg, 1983; Grossberg, 1982b). How can symmetry be preserved when both j → i and i → j are allowed to vary independently? Ordinarily one would think either that the connections j → i and i → j must communicate with each other (via some privileged nonlocal means) or that the inhibitory learning rule must treat j and i symmetrically and interchangeably. Both of these options for maintaining symmetry are unattractive: in the first case, the physical requirement that all neural interactions be locally mediated (Grossberg, 1984; Stent, 1973) rules out such weight-communication or weight-transport schemes; in the second case, explicitly symmetric learning rules are unlikely to possess a physiological interpretation (Grossberg, 1984; Stent, 1973). The problem of maintaining connection symmetry is thus a difficult one. The inhibitory learning rule proposed in this paper successfully maintains connection symmetry using a local, asymmetric learning rule, thereby avoiding the difficulties outlined above. The following example sketches how the learning rule keeps the connection strengths in balance. Suppose j → i is stronger than i → j. Then cell j is more likely to become activated than i, because j can suppress i's activity. However, when j does become activated while i stays inactive, the rule causes j → i to weaken, heading toward restored symmetry. The following statistic indicates the rule's effectiveness: within every reciprocal pair of inhibitory connections in Simulation III,
the strength of the weaker connection was at least 83% of that of the stronger connection at time t = 15,000. Thus the simple learning rule described above nicely solves the problem of keeping inhibition symmetric, using an asymmetric learning rule. Simulation III shows (a) that development of sensitivity to velocity and to direction can both be governed by the same excitatory learning rule; (b) that even if all the L2 cells in a cluster receive the same bottom-up input, they can all acquire different lateral input connections; (c) that both uncertainty and decision can be represented without disrupting the network's development; and (d) that excitatory and inhibitory learning have different roles and can be combined to produce sophisticated and useful forms of adaptation.
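The asymmetric inhibitory rule and the symmetry argument above can be illustrated with a small sketch. It is a deterministic caricature, not the paper's simulation: it assumes two cells with equal bottom-up input, assumes that the cell receiving weaker inhibition wins outright (a tie activates both), and uses a simple update of the form w += rate × (target − w). The names `inhibitory_update`, `compete`, and `symmetry_ratio` are hypothetical.

```python
def inhibitory_update(W, active, rate):
    """Asymmetric inhibitory learning (sketch).

    W[j][i] is the strength of the inhibitory connection j -> i. Only the
    outgoing connections of active cells change: connections to other active
    cells grow toward 1, connections to inactive cells decay toward 0.
    """
    n = len(W)
    for j in range(n):
        if not active[j]:
            continue  # inactive cells leave their output connections unchanged
        for i in range(n):
            if i != j:
                target = 1.0 if active[i] else 0.0
                W[j][i] += rate * (target - W[j][i])
    return W


def compete(W):
    """Hypothetical two-cell competition with equal bottom-up input:
    the cell that receives weaker inhibition wins alone; a tie activates both."""
    if W[1][0] < W[0][1]:  # cell 0 is inhibited less than cell 1
        return [True, False]
    if W[0][1] < W[1][0]:  # cell 1 is inhibited less than cell 0
        return [False, True]
    return [True, True]


def symmetry_ratio(W):
    """Strength of the weaker reciprocal connection relative to the stronger."""
    weak, strong = sorted((W[0][1], W[1][0]))
    return weak / strong


# Start strongly asymmetric: 0 -> 1 is four times stronger than 1 -> 0.
W = [[0.0, 0.8], [0.2, 0.0]]
for _ in range(60):
    inhibitory_update(W, compete(W), rate=0.1)
print(f"weaker/stronger after 60 steps: {symmetry_ratio(W):.3f}")
```

Because the winner is always the cell whose outgoing inhibition is stronger, each update shrinks exactly the connection that is out of balance; once the two strengths cross, they stay within a factor of (1 − rate) of each other. In this caricature both strengths drift toward zero because the cells are never coactivated; in a full network, coactivation would push them back up.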
8. DISCUSSION: ISSUES IN REPRESENTATION AND SELF-ORGANIZATION
Representation in Visual Processing Networks
Visual systems construct sharp, vivid percepts from diffuse, uncertain, noisy measurements. In the domain of motion perception, for example, a moving visual stimulus appears smeared when it is viewed for a brief exposure (30 ms), yet perfectly sharp when viewed for a longer exposure (100 ms) (Burr & Ross, 1986; Van Essen & Anderson, 1987). Our ability to counteract smear at longer exposures suggests that our visual systems combine and sharpen motion information from multiple locations along a spatiotemporal trajectory that matches the motion of the stimulus (Barlow, 1979, 1981; Burr & Ross, 1986). What neural mechanisms perform such spatiotemporal integration? The type of long-range lateral connections proposed in this paper allows motion information computed at one location to propagate to successive locations in the correct direction and at the correct velocity. The lateral motion signals can then influence the outcome of inhibitory sharpening of bottom-up visual data. Thus, such lateral tracking mechanisms can implement the kind of integration along a trajectory found in human vision.
The "shifter circuits" suggested by Anderson and Van Essen (1987) and by Van Essen and Anderson (1987) may permit a visual system to compensate for the blurring effects of eye movements. However, their shifter circuits are controlled by a "black box" that takes its input from a global retinal motion signal. Consequently, their mechanism handles only the kinds of motion that arise from eye motion; it does not compensate for blur due to motion of visual features independent of eye motion. In contrast, the tracking mechanisms proposed in this paper do not require a global motion signal to regulate image shifts. Instead, the amount of shift is controlled separately for each visual feature, independent of other visual features and independent of eye motion.
The amount of L2 shift for each visual feature is governed by the feature's locally measured L1 velocity. Furthermore, mechanisms that compensate for global eye motion are easy to add to this kind of system, simply by adding a global eye-motion input to each L2 cell. The system can then develop a chain structure, again connecting cells with similar receptive-field characteristics, where the notion of similarity is expanded to include eye motion as well as orientation, local velocity, and length. Several subclasses of cells that represent different eye motions would result, and the representation of each object's motion could then be allocated appropriately between retinal and object components. Thus, the kind of self-organized tracking mechanisms proposed in this paper can be expanded to handle both eye motion and visual object motion.
The network's tracking mechanisms can be implemented either by lateral connections that traverse a single distance with a variety of transmission latencies, or by lateral connections across a variety of distances but with a single, fixed transmission latency. Functionally, these two alternatives are equivalent, because in either case a connection spanning a distance d with latency τ favors the velocity d/τ. However, biological evidence tends to favor either the latter alternative (Amthor & Grzywacz, 1988; Movshon, Newsome, Gizzi, & Levitt, 1988) or a combination of both alternatives (Baker, 1988). The adaptive mechanisms of Simulation II can produce connection chains consistent with either alternative, or with both together, as illustrated in Figure 13. Baker (1988) discusses how small variations in the spatial and temporal response properties of cells in striate cortex combine to produce large variations in the range of the cells' velocity sensitivities.
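The equivalence of the two wiring schemes, and the way a lateral chain favors the velocity that matches a trajectory, can be checked with a small event-count sketch. The assumptions are ours, not the paper's: positions and times are integers; bottom-up input at the stimulus position drives every velocity class equally (an ambiguous local measurement); and a "hit" is counted whenever a lateral signal arrives at exactly the position and time the stimulus does. The function name `coincidences` and the `links` format are hypothetical.

```python
def coincidences(p0, v_true, steps, links):
    """Count, per velocity class, the time steps at which a cell receives
    bottom-up input and a lateral signal at the same moment.

    links maps a velocity class v to (distance, latency): the cell tuned to v
    at position x sends a lateral signal that reaches the cell tuned to v at
    position x + distance exactly `latency` steps later.
    """
    counts = {}
    for v, (distance, latency) in links.items():
        arrivals = set()  # (time, position) pairs at which lateral signals land
        hits = 0
        for t in range(steps):
            x = p0 + v_true * t  # stimulus position at time t
            if (t, x) in arrivals:  # bottom-up and lateral input coincide
                hits += 1
            # the cell driven at (t, x) primes its chain successor
            arrivals.add((t + latency, x + distance))
        counts[v] = hits
    return counts


fixed_latency = {1: (1, 1), 2: (2, 1), 3: (3, 1)}   # distances vary, latency fixed
fixed_distance = {1: (6, 6), 2: (6, 3), 3: (6, 2)}  # distance fixed, latencies vary
print(coincidences(0, 2, 10, fixed_latency))   # {1: 0, 2: 9, 3: 0}
print(coincidences(0, 2, 10, fixed_distance))  # {1: 0, 2: 7, 3: 0}
```

In both schemes only the class whose ratio distance/latency equals the stimulus velocity (2 positions per step) receives coincident lateral and bottom-up input; the fixed-distance scheme simply starts scoring later because its first signal takes 3 steps to arrive.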
Simultaneous Representation of Multiple Moving Objects
Unlike schemes that represent visual motion via optical flow or velocity fields (Gibson, 1950; Gibson, Olum, & Rosenblatt, 1955; Horn & Schunck, 1981; Koenderink, 1986; Marr & Ullman, 1981; Nakayama & Loomis, 1974; Regan, 1986; Waxman & Duncan, 1986), the networks described here do not maintain a field of vectors to indicate local velocity at every visual position. Rather, these networks represent motion via a localized cell activation for each visual feature, such as an edge segment. For example, an entire edge segment is represented by activation of a cell whose receptive field is centered retinotopically at the centroid of the segment. Because the network's representation of a visual feature is local and moves with the feature itself, more than one feature can be represented simultaneously. In response to multiple moving features in the visual field, the activations representing each feature propagate independently. Some computational schemes for representing visual motion require that each visual object be explicitly identified, segregated, and labeled before its motion can be determined, and then further require explicit mechanisms to map an abstract representation of motion back to a retinotopic representation of the object (e.g., Feldman, 1988). In the networks presented here, the segregation (Marr, 1982; Orban & Gulyás, 1988), identification, and motion mapping of visual features are accomplished implicitly by simply preserving retinotopy throughout all network layers and allowing the internal representations to track the external stimuli.
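The contrast between per-feature tracking and a single shared shift can be sketched in a few lines. This is an illustration under our own assumptions: each feature is encoded as one active (position, velocity) cell, activity hands off along a chain to the cell with the same velocity tuning at position + velocity, and the "global" alternative moves every feature by one shared amount, as a single global motion signal would. The function names are hypothetical.

```python
def step_local(active):
    """Each active cell hands its activity to its chain successor: the cell
    tuned to the same velocity at position + velocity (one time step later)."""
    return {(x + v, v) for (x, v) in active}


def step_global(active, shift):
    """Hypothetical alternative: every feature is moved by one shared shift,
    as if driven by a single global motion signal."""
    return {(x + shift, v) for (x, v) in active}


# Two edge segments, encoded as (position, velocity): one moving right at
# 1 position/step, one moving left at 2 positions/step.
features = {(0, 1), (20, -2)}
local, shared = set(features), set(features)
for _ in range(3):
    local = step_local(local)
    shared = step_global(shared, shift=1)

print(sorted(local))   # [(3, 1), (14, -2)]: both features tracked
print(sorted(shared))  # [(3, 1), (23, -2)]: only the rightward feature tracked
```

No labels or segmentation step are involved: each representation moves with its own feature, so both are tracked at once, whereas the shared shift is correct only for the feature whose velocity happens to match it.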
How Many Cell Types Are Needed?
Coding schemes like the present one, in which the activity of each cell represents a different situation (Walters, 1987), always exhibit a tradeoff between sampling resolution and coding economy (Duda & Hart, 1972). One consequence of how this kind of network represents information is that many categories of cell can develop. For instance, in Simulation III, each position contains one type of cell for every actual direction of motion that can be represented. If the network can represent 3 directions, 4 lengths, and 5 velocities, then one might argue that 3 × 4 × 5 = 60 cell types are needed at each position in the network. Even more might be needed if more input dimensions are represented. This is a serious concern, but it can be addressed in at least four ways.
First, for computational simplicity, every simulated cell was tuned very sharply to a specific input feature. However, the tuning curves of real cells are typically much broader than those depicted in the simulations. If more realistic input conditions were applied to the network (e.g., more noise, nonlinear trajectories), then the cells would naturally develop broader tuning curves. Fewer broadly tuned cells than sharply tuned cells would then be required to represent the input, although fine degrees of resolution might be unavailable at higher network levels. The broader tuning would help establish connections between pairs of cells preferring somewhat more disparate stimulus attributes. These connections would in turn allow the network to represent with greater certainty some changes of direction and velocity.
Second, cells at a given network level could have larger receptive fields than cells at prior levels. This amounts to broader tuning in the positional domain and would allow cell positions at higher levels to be spaced farther apart than cells at lower levels.
The notions of broader tuning and increased receptive field size at higher levels have a great deal of support in the physiological literature (Van Essen & Maunsell, 1983). In trading sharpness for cell types, we actually lose nothing, because sharp information is still available at lower levels. Alternatively, certain neural network architectures (e.g., Grossberg & Marshall, 1989) can recover sharpness after a stage of diffusion (Grossberg & Mingolla, 1985a, 1985b).
Third, it might be further argued that the apparent massive redundancy of cells in physiological measurements of visual cortex could be illusory. Electrophysiological and cytohistochemical studies have concluded that cells often are organized into cortical columns, in which cells have similar receptive field properties (Hubel & Wiesel, 1977). However, such studies, by their very nature, may use insufficiently detailed visual probes to reveal fine receptive field differences. For instance, studies of cell response in various cortical areas to actual (versus local) direction of motion could be informative and could serve as a test of the biological applicability of some of the hypotheses in this paper.
Fourth, it is possible that the visual systems of animals use axo-axonal connections, in addition to the axo-somatic connections modeled here. Analogous rules can be derived to describe the behavior of such neural architectures. An axo-axonal system would allow each cell's activity to represent many possible points along a feature dimension, instead of just one point. Thus, the number of cells required could be cut by orders of magnitude.
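The arithmetic of the cell-count argument above, and how broader tuning (the first remedy) might shrink it, can be sketched as follows. The assumption that widening every tuning curve by a factor k divides the number of values needing their own cell along each dimension by roughly k (rounded up) is ours, for illustration only; the function names are hypothetical, and `math.prod` requires Python 3.8 or later.

```python
from math import ceil, prod


def cell_types_per_position(values_per_dimension):
    """Sharply tuned code: one cell type for every combination of feature values."""
    return prod(values_per_dimension)


def broadened(values_per_dimension, factor):
    """Assumed effect of tuning curves `factor` times wider: about `factor`
    times fewer distinct values need their own cell along each dimension."""
    return [ceil(n / factor) for n in values_per_dimension]


sharp = [3, 4, 5]  # directions, lengths, velocities (the text's example)
print(cell_types_per_position(sharp))                # 60
print(cell_types_per_position(sharp + [6]))          # 360 with a hypothetical 4th dimension of 6 values
print(cell_types_per_position(broadened(sharp, 2)))  # 12
```

The multiplicative growth is why each added input dimension is costly, and why even a modest broadening along every dimension can reduce the count substantially.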
Dependence of Self-Organized Structures on Input Sequence
Ideally, Simulations II and III could be merged into a single simulation showing how self-organization based on both velocity and direction could occur simultaneously. Because of the high cost of simulating parallel self-organization processes on a serial computer, it has been necessary to split the scope of the ideal simulation into two separately manageable tasks. (The numerical integration of Simulation III alone required approximately six weeks to run on a 0.3-megaflop computer, for example.) However, in principle, a network can structure itself in the desired manner when presented with input sequences that combine a variety of velocities and directions of motion, because both Simulations II and III operate according to the same kind of network rules. No cell "knows" whether its input represents a given velocity or direction; its input is simply a pattern of activity distributed spatially across other cells and correlated in time. Thus the same learning algorithm would work whether the network's input patterns represent visual velocity, direction, or velocity and direction combined.
Simulations II and III together illustrate how lateral chains can self-organize in a network that operates according to quite simple rules. The simulations show how the structures that develop can reflect the velocities and directions represented in the network's visual input history. In general, the adaptive rules that the network obeys can enable it to learn and encode any statistical input distributions, not just of velocity and direction, but also of attributes such as orientation (Bienenstock et al., 1982; Linsker, 1986b), contrast (Linsker, 1986a), and length. For example, the premise that the detection of a short bar is more likely to be followed by other detections of a short bar than by detections of a long bar could be captured by this type of network and encoded in cell connections. The explicit assumption of this premise formed part of the basis of the hardwiring of connections in Simulation I. What is truly striking about this type of network is not just that it encodes sequence probabilities in connection weights, but rather that it also subsequently uses those weights to represent and disambiguate actual incoming visual data. There is an elegant isomorphism between the coded structure of the world (in the connection chains) and the propagated processing of visual input (which becomes represented in terms of the chains).
The generality of these adaptive properties opens the possibility that a lengthy hierarchy of visual processing layers can form through self-organization. It has been shown that contrast-detection layers and orientation-detection layers can form through self-organization (Amari, 1977; Bienenstock et al., 1982; Fukushima & Miyake, 1982; Grossberg, 1976b; Linsker, 1986a, 1986b, 1986c; Pearson et al., 1987; Singer, 1983, 1985a, 1985b; Takeuchi & Amari, 1979; von der Malsburg, 1973; von der Malsburg & Cowan, 1982). This paper adds direction and velocity sensitivity to the list of known visual processing functions that can self-organize. Eventually, it may be possible, by using the output of each layer as input to subsequent layers, to show that even higher-order capabilities, such as depth perception and object recognition, can self-organize.
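The short-bar/long-bar premise above can be illustrated with a generic outstar-style rule, used here as a stand-in and not as the paper's own equations: whenever the cell coding one category was active on the previous step, its lateral weights move a small step toward the current activity pattern. With a small rate, each weight settles near the conditional probability that one category follows the other. Nothing in the rule refers to what the categories mean, which mirrors the point that no cell "knows" whether its input represents velocity, direction, or length. The function name and the toy input history are hypothetical.

```python
def learn_transitions(sequence, labels, rate=0.01, init=0.5):
    """Outstar-style lateral learning (sketch).

    w[i][j] is the lateral weight from the cell coding category i to the cell
    coding category j. When i was active on the previous step, its weights move
    toward the current one-hot activity pattern, so w[i][j] tracks an
    exponentially weighted estimate of P(next = j | previous = i).
    """
    w = {i: {j: init for j in labels} for i in labels}
    for prev, cur in zip(sequence, sequence[1:]):
        for j in labels:
            target = 1.0 if j == cur else 0.0
            w[prev][j] += rate * (target - w[prev][j])
    return w


# Toy input history: a short bar is usually followed by another short bar.
history = ["short", "short", "short", "long"] * 1000
w = learn_transitions(history, ["short", "long"])
for i in ("short", "long"):
    print(i, {j: round(x, 2) for j, x in w[i].items()})
```

In this history a short bar is followed by another short bar two times out of three, and the weights from the "short" cell end up near 2/3 (to "short") and 1/3 (to "long"); a long bar is always followed by a short one, so that weight approaches 1.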
Stability and Plasticity of the Network
The network's connection weights constitute a coding of the structure of the external world. The formation and persistence of this code depend heavily on the statistics of the network's visual input history. That is, if the probabilities of events are altered over a long enough period of time, then the connection patterns will change. For example, in Simulation II, if the positional displacements of the input sequences were changed from {±1, ±2, ±3} to {±2, ±3, ±4}, then the network would eventually lose its strong lateral connections between cells ±1 position apart and gain strong connections between cells ±4 positions apart. This plasticity is an advantage under many circumstances; for example, it would compensate for the systematic distortions produced by growth of the eyeball as a newborn animal ages. However, plasticity could also be a liability in other cases. For instance, if the alteration of input statistics is only temporary or spurious, then the changes it induces might erode desired connectivity patterns. The networks in this paper control the degree of plasticity and stability of connection strengths by ensuring that a single input presentation can change connection strengths by only a tiny amount. Only the cumulative and systematic effects of many input presentations can significantly recode the network's connectivity. If the rate of such weight change is made small enough, then one can be reasonably sure that the resultant connection patterns stably code the statistics of the input history rather than the adventitious correlations in a small number of input presentations. However, if the rate of weight change is made too small, then a very large number of input presentations would be needed to produce the desired adaptation effects. A more sophisticated approach to the tradeoffs between stability and plasticity is explored by Carpenter and Grossberg (1987a, 1987b) and Grossberg (1980, 1982a). For the purposes of this paper, though, the simple expedient of fixing the rate of connection weight change is sufficient. The approach taken in Simulations II and III was to choose a rate of weight change small enough that even several spuriously correlated input presentations would not change the connection topology, but no smaller. The result was that the connection pattern was quite stable once it settled into its final form, and that the network would reach its final structure after a reasonable amount of visual exposure. Simulation II required approximately 30 presentations per cell before it reached its final overall structure (1080 time-steps ÷ 36 cells in L2). Simulation III required approximately 270 presentations per cell before it reached its final overall structure (12,000 time-steps ÷ 45 cells in L2).
Physiological and Psychophysical Correlates
A number of lines of evidence support the notion that long-range lateral connections exist in visual cortex and are used for motion processing. Long-range, direction-specific spatiotemporal facilitatory interactions have been found in area MT of macaque monkeys (Mikami et al., 1986a, 1986b). Long-range axons and long-range excitatory interactions between cells of similar orientation preferences have been found in area 17 of the visual cortex of cats, tree shrews, and macaque monkeys (Blasdel, Lund, & Fitzpatrick, 1985; Gabbott, Martin, & Whitteridge, 1987; Luhmann, Martinez-Millán, & Singer, 1986; Lund, 1987; Michalski, Gerstein, Czarkowska, & Tarnecki, 1983; Mitchison & Crick, 1982; Nelson & Frost, 1985; Rockland & Lund, 1982, 1983; Rockland, Lund, & Humphrey, 1982; Ts'o, Gilbert, & Wiesel, 1986). Gabbott et al. (1987) suggest that such lateral interactions may be used in motion perception: "This feedforward excitation by a neuron into whose receptive field an object has just entered could act as a facilitatory device to 'prime' (by increasing their response sensitivity) other neurons into whose receptive fields the object might eventually travel. These operations could also provide some predictive estimate of the future position of the object" (pp. 378-379). Such "priming" is exactly the function of the lateral connections in Simulations I-III.
Luhmann et al. (1986) further note that the development of such lateral connections in cat area 17 is dependent on visual experience: "There is an inborn pattern of discrete horizontal connections in striate cortex which is shaped by visual experience and requires contour vision for its maintenance" (p. 443). "The development of the horizontal connections in striate cortex occurs in at least two phases, an early phase during which connections are formed in excess and a later phase during which connections are again eliminated" (p. 447). The self-organization processes presented in this paper offer a ready explanation of the mechanism and function of such developmental processes. Luhmann et al. (1986) found that such horizontal connections spanned distances of up to 10.5 mm across area 17, that is, very far in retinotopic coordinates. The present paper does not make an explicit identification of particular levels in the network with area 17. Rather, the levels L1 and L2 are predicted to correspond to certain higher cortical processing areas, such as MT (Albright, 1984; Allman et al., 1985; Maunsell & Van Essen, 1983; Mikami et al., 1986a, 1986b; Rodman & Albright, 1987) and STS (Saito et al., 1986). Since long-range lateral connections are already found in area 17, similar connections are likely to be found within higher levels as well. The long-range horizontal connections found in cat area 17 serve to illustrate (a) that the visual systems of animals do make use of long-range interactions, and (b) that the kind of lateral connections proposed in this paper does have a degree of support in the physiological literature.
Lateral connections can contribute to explanations of certain psychophysical data, as well as physiological data. Phenomena such as visual entrainment or inertia suggest neural mechanisms by which motion computations at one spatial position influence motion computations at other spatial positions. For example, the multistable displays of Anstis and Ramachandran (1987) and Ramachandran and Anstis (1983) show that the directions of motion computed over one pair of image frames in a sequence cause the same directions to be favored in successive pairs of frames, at successive positions along the same motion trajectories. Such inertial phenomena can be readily analyzed in light of the lateral connection chains proposed in this paper: the chains cause the directions of motion computed at one location to propagate to successive locations along the corresponding motion trajectories, with the appropriate timing. The laterally propagated direction signals cause the representation of motion in the inertial directions to be favored.
Williams and Phillips (1987) report certain cooperative phenomena in the perception of motion direction; their results are consistent with the cooperative (i.e., excitatory) properties of the lateral connections proposed in this paper. They presented observers with a moving random-dot field in which the individual dots moved either in random directions or in a globally coherent direction. By temporally varying the proportions of randomness and coherence, they showed that the percept of coherent motion tended to persist hysteretically. That is, an observer began to detect coherent motion at a certain point as the proportion of coherence was increased; subsequently, coherence persisted even when the proportion was decreased below that point. Lateral excitatory connections could contribute to the hysteresis by propagating the computed motion directions at one moment to successive moments. However, since lateral connections propagate spatially as well as temporally, they may be predicted to cause coherence to appear to spread spatially in the direction of motion, or at least to be influenced by spatial factors. Williams and Phillips (1987) did not control for such spatial factors; that might be done by varying the spatial displacement of each dot over successive frames or by varying the rate of change of the proportion of coherence. If their experiments are extended to determine whether spatial, as well as temporal, factors influence perceived motion, then the persistence of perceived direction may be found to be attributable to propagation of computed direction at one moment to successive moments via lateral excitatory connections. The psychophysical phenomena described above can be used to analyze one further fundamental question regarding the structure of motion-processing networks.
Why not just use time-delayed bottom-up connections, instead of bottom-up and lateral connections, to accomplish motion tracking? The answer is that vision requires feedback: if all connections were bottom-up, then the network's motion computations would be performed afresh at each spatial position, without the feedback necessary to produce the phenomena of visual inertia (Anstis & Ramachandran, 1987; Ramachandran & Anstis, 1983) and motion hysteresis (Williams & Phillips, 1987). Lateral connections allow the results of motion computations at each spatial location to influence the outcome of subsequent motion computations at other locations, thus permitting inertia and hysteresis.
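The feedback argument above can be illustrated with a one-unit caricature. The assumptions are ours, not the paper's: coherence is swept in 5% steps from 0% to 100% and back down; coherent motion is reported when the bottom-up coherence plus a feedback term carried over from the previous moment reaches a fixed threshold; and the hypothetical parameter `gain` stands in for the strength of lateral excitatory propagation. With `gain = 0` (purely bottom-up processing, computed afresh each moment), the percept appears and disappears at the same coherence; with positive gain, it outlasts the coherence level at which it first appeared.

```python
def sweep(gain, threshold=50, step=5):
    """Sweep coherence (in percent) up from 0 to 100, then back down to 0.

    Returns (onset, lowest_sustained): the coherence at which coherent motion
    is first reported on the way up, and the lowest coherence at which it is
    still reported on the way down.
    """
    levels_up = list(range(0, 101, step))
    levels_down = levels_up[::-1]
    state = 0  # 1 = coherent motion currently reported
    onset = None
    for c in levels_up:
        # bottom-up evidence plus feedback from the previous moment's percept
        state = 1 if c + gain * state >= threshold else 0
        if state and onset is None:
            onset = c
    lowest = None
    for c in levels_down:
        state = 1 if c + gain * state >= threshold else 0
        if not state:
            break
        lowest = c
    return onset, lowest


print(sweep(gain=0))   # (50, 50): no hysteresis without feedback
print(sweep(gain=20))  # (50, 30): the percept persists below its onset level
```

The gap between onset and offset grows with the feedback gain, which is the qualitative signature of hysteresis; a purely feedforward unit cannot produce it because its output depends only on the current input.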
CONCLUSIONS
One can imagine many reasons for a visual system to be adaptive. Adaptive mechanisms could allow the system to compensate for distortions, for example, due to growth of the eyeball. They could maintain the proper behavior of the network by compensating for changes or deterioration in the behavior of individual cells. They could reduce the informational burden on genetic coding by allowing the details of neural interconnection structure to be specified by the network's input correlation history. The exposition here highlights some of the adaptive issues a visual system must confront and some possible solutions to those issues. The strength of this approach is its reliance on simple, general-purpose adaptive processes in showing how a rudimentary network can acquire an ability to represent and disambiguate visual input.
Simulations I-III sketch separate but related fragments of the puzzle of visual processing. The behaviors illustrated in the three simulations must be joined into a single model, and then combined with many other features, in order to constitute a full theory of vision. With sufficient computational resources, this can be done. So, the true value of this path of research lies not in its simulated construction of specific adaptive networks, but in its broader principles: use of input correlation history to guide the formation of network structure; use of strictly local adaptive rules to govern the self-organization of global processing mechanisms; use of lateral time delays to bring events into temporal register; competition between cells and between connections to foster selectivity and dispersion; inhibitory as well as excitatory learning to balance selectivity and multiplexing; preservation of spatiotopic relations to allow simultaneous representation of multiple independent objects; and simultaneous use of network structures for both processing and adaptation.
REFERENCES
Adelson, E. H., & Bergen, J. R. (1985). Spatiotemporal energy models for the perception of motion. Journal of the Optical Society of America A, 2, 284-299.
Adelson, E. H., & Movshon, J. A. (1982). Phenomenal coherence of moving visual patterns. Nature, 300, 523-525.
Ahumada, A. J., Jr., & Yellott, J. I., Jr. (1988). A connectionist model for learning receptor positions. Investigative Ophthalmology and Visual Science, 29(Suppl.), 58.
Albright, T. D. (1984). Direction and orientation selectivity of neurons in visual area MT of the macaque. Journal of Neurophysiology, 52, 1106-1130.
Allman, J., Miezin, F., & McGuinness, E. (1985). Direction- and velocity-specific responses from beyond the classical receptive field in the middle temporal visual area (MT). Perception, 14, 105-126.
Amari, S. (1977). Neural theory of association and concept formation. Biological Cybernetics, 26, 175-185.
Amari, S., & Takeuchi, A. (1978). Mathematical theory on formation of category detecting nerve cells. Biological Cybernetics, 29, 127-136.
Amthor, F. R., & Grzywacz, N. M. (1988). The time course of inhibition and the velocity independence of direction selectivity in the rabbit retina. Investigative Ophthalmology and Visual Science, 29(Suppl.), 225.
Anderson, C. H., & Abrahams, E. (1987). The Bayes connection. In M. Caudill & C. Butler (Eds.), Proceedings of the First IEEE International Conference on Neural Networks, III (pp. 105-112). Piscataway, NJ: IEEE.
Anderson, C. H., & Van Essen, D. C. (1987). Shifter circuits: A computational strategy for dynamic aspects of visual processing. Proceedings of the National Academy of Sciences of the U.S.A., 84, 6297-6301.
Anstis, S. M. (1977). Apparent movement. In R. Held, H. W. Leibowitz, & H.-L. Teuber (Eds.), Handbook of sensory physiology, Vol. VIII: Perception. New York: Springer-Verlag.
Anstis, S. M., & Ramachandran, V. S. (1987). Visual inertia in apparent motion. Vision Research, 27, 755-764.
Baker, C. L., Jr. (1988). Spatial and temporal determinants of directionally selective velocity preference in cat striate cortex neurons. Journal of Neurophysiology, 59, 1557-1574.
Barlow, H. B. (1979). Reconstructing the visual image of space and time. Nature, 279, 189-190.
Barlow, H. B. (1980). The absolute efficiency of perceptual decisions. Philosophical Transactions of the Royal Society of London, Ser. B, 290, 71-82.
Barlow, H. B. (1981). Critical limiting factors in the design of the eye and visual cortex. Proceedings of the Royal Society of London, Ser. B, 212, 1-34.
Barlow, H. B., & Levick, W. R. (1965). The mechanism of directionally selective units in the rabbit's retina. Journal of Physiology, 178, 477-504.
Bienenstock, E. L., Cooper, L. N., & Munro, P. W. (1982). Theory for the development of neuron selectivity: Orientation specificity and binocular interaction in visual cortex. Journal of Neuroscience, 2, 32-48.
Blasdel, G. G., Lund, J. S., & Fitzpatrick, D. (1985). Intrinsic connections of macaque striate cortex: Axonal projections of cells outside lamina 4C. Journal of Neuroscience, 5, 3350-3369.
Bossomaier, T., & Snyder, A. W. (1986). Why spatial frequency processing in the visual cortex? Vision Research, 26, 1307-1309.
Braastad, B. O., & Heggelund, P. (1985). Development of spatial receptive-field organization and orientation selectivity in kitten striate cortex. Journal of Neurophysiology, 53, 1158-1178.
Braddick, O. J. (1974). A short-range process in apparent motion. Vision Research, 14, 519-527.
Braddick, O. J. (1980). Low-level and high-level processes in apparent motion. Philosophical Transactions of the Royal Society of London, Ser. B, 290, 137-151.
Burbeck, C. A. (1985). Separate channels for the analysis of form and location. Investigative Ophthalmology and Visual Science, 26(Suppl.), 82.
Burbeck, C. A. (1986). Orientation selectivity in large-scale localization. Journal of the Optical Society of America A, 3, 98.
Burbeck, C. A. (1987). Position and spatial frequency in large-scale localization judgments. Vision Research, 27, 417-427.
Burr, D., & Ross, J. (1986). Visual processing of motion. Trends in Neuroscience, 9, 304-307.
Carpenter, G. A., & Grossberg, S. (1987a). A massively parallel architecture for a self-organizing neural pattern recognition machine. Computer Vision, Graphics, and Image Processing, 37, 54-115.
Carpenter, G. A., & Grossberg, S. (1987b). ART 2: Self-organization of stable category recognition codes for analog input patterns. Applied Optics, 26, 4919-4930.
Cohen, M. A., & Grossberg, S. (1983). Absolute stability of global pattern formation and parallel memory storage by competitive neural networks. IEEE Transactions on Systems, Man, and Cybernetics, SMC-13, 815-826.
Cohen, M. A., & Grossberg, S. (1987). Masking fields: A massively parallel neural architecture for learning, recognizing, and predicting multiple groupings of patterned data. Applied Optics, 26, 1866-1891.
Cremieux, J., Orban, G. A., Duysens, J., & Amblard, B. (1987). Response properties of area 17 neurons in cats reared in stroboscopic illumination. Journal of Neurophysiology, 57, 1511-1535.
Dammasch, I. E., Wagner, G. P., & Wolff, J. R. (1986). Self-stabilization of neuronal networks. Biological Cybernetics, 54, 211-222.
Daugman, J. G. (1985). Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters. Journal of the Optical Society of America A, 2, 1160-1169.
Daugman, J. G. (1988). Complete discrete 2-D Gabor transforms by neural networks for image analysis and compression. IEEE Transactions on Acoustics, Speech, and Signal Processing, 36, 1169-1179.
Derrington, A. M. (1984). Development of spatial frequency selectivity in striate cortex of vision-deprived cats. Experimental Brain Research, 55, 431-437.
Dobbins, A., Zucker, S. W., & Cynader, M. S. (1987). Endstopped neurons in the visual cortex as a substrate for calculating curvature. Nature, 329(6138), 438-441.
Dobbins, A., Zucker, S. W., & Cynader, M. S. (1988). Endstopped simple cells and curvature: Predictions from a computational model. Investigative Ophthalmology and Visual Science, 29(Suppl.), 331.
Dubin, M. W., Stark, L. A., & Archer, S. M. (1986). A role for action-potential activity in the development of neuronal connections in the kitten retinogeniculate pathway. Journal of Neuroscience, 6, 1021-1036.
Duda, R. O., & Hart, P. E. (1972). Use of the Hough transformation to detect lines and curves in pictures. Communications of the ACM, 15, 11-15.
Easton, P., & Gordon, P. E. (1984). Stabilization of Hebbian neural nets by inhibitory learning. Biological Cybernetics, 51, 1-9.
Ellias, S. A., & Grossberg, S. (1975). Pattern formation, contrast control, and oscillations in the short term memory of shunting on-center off-surround networks. Biological Cybernetics, 20, 69-98.
Feldman, J. A. (1988). Time, space and form in vision. Unpublished manuscript, University of Rochester Department of Computer Science.
Felleman, D. J., & Van Essen, D. C. (1987). Receptive field properties of neurons in area V3 of macaque monkey extrastriate cortex. Journal of Neurophysiology, 57, 889-920.
Ferrera, V. P., & Wilson, H. R. (1988). Perceived direction of moving 2D patterns. Investigative Ophthalmology and Visual Science, 29(Suppl.), 264.
Field, D. J. (1987). Relations between the statistics of natural images and the response properties of cortical cells. Journal of the Optical Society of America A, 4, 2379-2394.
Field, D. J., Kersten, D., & Barlow, H. B. (1988). Is redundancy increased or decreased in visual coding? Investigative Ophthalmology and Visual Science, 29(Suppl.), 408.
Fleet, D. J., & Jepson, A. D. (1985). Spatiotemporal inseparability in early vision: Centre-surround models and velocity selectivity. Computational Intelligence, 1, 89-102.
Frégnac, Y., & Imbert, M. (1978). Early development of visual cortical cells in normal and dark-reared kittens: Relationship between orientation selectivity and ocular dominance. Journal of Physiology, 278, 27-44.
Frégnac, Y., & Imbert, M. (1984). Development of neuronal selectivity in primary visual cortex of cat. Physiological Reviews, 64, 325-434.
Fukushima, K., & Miyake, S. (1982). Neocognitron: A new algorithm for pattern recognition tolerant of deformations and shifts in position. Pattern Recognition, 15, 455-469.
Gabbott, P. L. A., Martin, K. A. C., & Whitteridge, D. (1987).
Connections between pyramidal neurons in layer 5 of cat visual cortex (area 17). Journal of Comparative Neurology, 259,364381. Garey, L. J., & Pettigrew, J. D. (1974). Ultrastructural changes in kitten visual cortex after environmental modification. Brain Research, 66, 165-172. Gary-Bobo, E., Milleret, C., & Buisseret, l~. (1986). Role of eye movements in developmental process of orientation selectivity in the kitten visual cortex. Vision Research, 26, 557-567. Gibson. J. 1 (1950). The perception of the visual world. Westport, CT: Greenwood Press. Gibson. J. J.. Olum. P.. & Rosenblatt, F. ¢1955/. Parallax and perspective during aircraft landings. American Journal of Psychology, 68, 372-385. Globus. A., Rosenzweig, M. R.. Bennett E l_. & Diamond. M. C. t1973). Effects of differential experience on dendritic spine counts in rat cerebral cortex. Journal of Comparattve and Physiological Psychology, 82. t 75-- l 81 Golden. R M. (1988). A unified framework for connecuonist systems. Biological Cybernetics. 59. 109-120. Graves. A. L., Trotter. Y.. & Fr6gnac, Y. 11987). Role of extraocular muscle propnoception m the development of depth perception in cats. Journal of Neurophysiology, 58, 816-831 Greenough, W q'. (1975). Experimental modification of the developing brain. American Scientist. 63, 37-46 Grossberg, S. (1972). Neural expectation: Cerebeltar and retinal analogs of cells fired by learnable or unlearned pattern classes. Kvbernetik. 10.49-57. Grossberg, S. (1976a). Adaptive pattern classification and universal recoding: ! Parallel development and coding of neural feature detectors. Biological Cybernetics, 23. 121-134. Grossberg, S. (1976b). On the development of feature detectors m the visual cortex with applications to learning and reacuondiffusion systems. Biological Cybernetics. 21, 145-159. Grossberg, S. (1980). How does a brain build a cognitive code'? Psychological Review. 87. 1-51. Grossberg, S. (1982al. 
Processing of expected and unexpected events during conditioning and attention: A psychophysiological theory. Psychological Review. 89 529-572 Grossberg. S. (1982b). Studies of mind and brain: Neural prin-
ciples of learning, perception, developmem, cognition, and motor control. Boston: Reidel Press. Grossberg, S. (1984]. Some psychophysiological and pharmacological correlates of a developmental, cognitive, and motivational theory. In R. Karrer. J. Cohen, & P. Tueting CEds.), Brain and information: event related potentials, New York: New York Academy of Sciences. Grossberg, S., & Marshall, J. A. (1989). Stereo boundary fusion by cortical complex cells: A system of maps. filters, and feedback networks for multiplexing distributed data. Neural Networks, 2, 29-.5 i, Grossberg, S.. & MingoUa, E. 11985a). Neural dynamics of Iorm percepuon: boundary completion, illusory figures, and neon color spreading. Psychological Review, 92, 173-21 l, Grossberg, S., & Mingolla, E. (1985b). Neural dynamics of perceptual grouping: Textures, boundaries, and emergent segmentations. Perception & Psychophysics, 38, 141~171. Harris. M. G. (1986). The perception of moving stimuli: A model of spatiotemporal coding in human vision. Vision Research, 26, 1281-1287. Hebb. D. O. (1949). The organization of behavior. New York: Wiley. Heeger, D. J. ~1987). Model for the extraction of image flow. Journal of the Optical Society o f America A. 4. 1455-1471. Heeger, D. J. (1988). Optical flow using spatiotemporal filters. International Journal of Computer Vision, 1,279-302, Hildreth. E. C. (1983). Computing the velocity field along contours. Proceedings of the ACM SIGGRAPH/SIGART Interdisciplinary Workshop on Motion (pp. 26-32/. Association for Computing Machinery.
Hirsch, H. V. B., & Spinelli, D. N. (1970). Visual experience modifies distribution of horizontally and vertically oriented receptive fields in cats. Science, 168, 869-871.
Horn, B. K. P., & Schunck, B. G. (1981). Determining optical flow. Artificial Intelligence, 17, 185-203.
Hubel, D. H., & Wiesel, T. N. (1962). Receptive fields, binocular interactions, and functional architecture in cat's visual cortex. Journal of Physiology, 160, 106-154.
Hubel, D. H., & Wiesel, T. N. (1963). Shape and arrangement of columns in cat's striate cortex. Journal of Physiology, 165, 559-568.
Hubel, D. H., & Wiesel, T. N. (1965). Receptive fields and functional architecture in two nonstriate visual areas (18 and 19) of the cat. Journal of Neurophysiology, 28, 229-289.
Hubel, D. H., & Wiesel, T. N. (1968). Receptive fields and functional architecture of monkey striate cortex. Journal of Physiology, 195, 215-243.
Hubel, D. H., & Wiesel, T. N. (1970). The period of susceptibility to the physiological effects of unilateral eye closure in kittens. Journal of Physiology, 206, 419-436.
Hubel, D. H., & Wiesel, T. N. (1977). Functional architecture of macaque monkey visual cortex. Proceedings of the Royal Society of London, Ser. B, 198, 1-59.
Hubel, D. H., Wiesel, T. N., & LeVay, S. (1977). Plasticity of ocular dominance columns in monkey striate cortex. Philosophical Transactions of the Royal Society of London, Ser. B, 278, 377-409.
Kato, H., Bishop, P. O., & Orban, G. A. (1978). Hypercomplex and simple/complex cell classifications in cat striate cortex. Journal of Neurophysiology, 41, 1071-1095.
Kennedy, H., & Orban, G. A. (1983). Response properties of visual cortical neurons in cats reared in stroboscopic illumination. Journal of Neurophysiology, 49, 686-704.
Kersten, D. (1987). Predictability and redundancy of natural images. Journal of the Optical Society of America A, 4, 2395-2400.
Kersten, D., O'Toole, A. J., Sereno, M. E., Knill, D. C., & Anderson, J. A. (1987). Associative learning of scene parameters from images. Applied Optics, 26, 4999-5006.
Knill, D. C., & Kersten, D. (1988). The perception of correlational structure in natural images. Investigative Ophthalmology and Visual Science, 29(Suppl.), 407.
Koenderink, J. J. (1986). Optic flow. Vision Research, 26(1), 161-180.
Kohonen, T. (1982a). Self-organized formation of topologically correct feature maps. Biological Cybernetics, 43, 59-69.
Kohonen, T. (1982b). A simple paradigm for the self-organized formation of structured feature maps. In S. Amari & M. A. Arbib (Eds.), Competition and cooperation in neural networks. New York: Springer-Verlag.
Kohonen, T. (1984). Self-organization and associative memory. New York: Springer-Verlag.
Kohonen, T. (1987). Adaptive, associative, and self-organizing functions in neural computing. Applied Optics, 26, 4910-4918.
Kohonen, T., & Oja, E. (1976). Fast adaptive formation of orthogonalizing filters and associative memory in recurrent networks of neuron-like elements. Biological Cybernetics, 21, 85-95.
Linsker, R. (1986a). From basic network principles to neural architecture: Emergence of spatial-opponent cells. Proceedings of the National Academy of Sciences of the U.S.A., 83, 7508-7512.
Linsker, R. (1986b). From basic network principles to neural architecture: Emergence of orientation-selective cells. Proceedings of the National Academy of Sciences of the U.S.A., 83, 8390-8394.
Linsker, R. (1986c). From basic network principles to neural architecture: Emergence of orientation columns. Proceedings of the National Academy of Sciences of the U.S.A., 83, 8779-8783.
Luhmann, H. J., Martínez Millán, L., & Singer, W. (1986). Development of horizontal intrinsic connections in cat striate cortex. Experimental Brain Research, 63, 443-448.
Lund, J. S. (1987). Local circuit neurons of macaque monkey striate cortex: I. Neurons of laminae 4C and 5A. Journal of Comparative Neurology, 257, 60-92.
Marr, D. (1982). Vision: A computational investigation into the human representation and processing of visual information. San Francisco: W. H. Freeman and Company.
Marr, D., & Ullman, S. (1981). Directional selectivity and its use in early visual processing. Proceedings of the Royal Society of London, Ser. B, 211, 151-180.
Marshall, J. A. (1988a). Aperture effects in visual motion: Velocity and direction judgments via lateral intrinsic connections. Investigative Ophthalmology and Visual Science, 29(Suppl.), 2511.
Marshall, J. A. (1988b). Self-organizing neural networks for perception of visual motion (Tech. Rep. 88-010). Boston University Computer Science Department.
Marshall, J. A. (1989). Neural networks for computational vision: Motion segmentation and stereo fusion. Ph.D. Dissertation, Boston University, MA. Ann Arbor, MI: University Microfilms Inc.
Maunsell, J. H. R., & Van Essen, D. C. (1983). Functional properties of neurons in middle temporal visual area of the macaque monkey. I. Selectivity for stimulus direction, speed, and orientation. Journal of Neurophysiology, 49, 1127-1147.
Michalski, A., Gerstein, G. L., Czarkowska, J., & Tarnecki, R. (1983). Interactions between cat striate cortex neurons. Experimental Brain Research, 51, 97-107.
Mikami, A., Newsome, W. T., & Wurtz, R. H. (1986a). Motion selectivity in macaque visual cortex: I. Mechanisms of direction and speed selectivity in extrastriate area MT. Journal of Neurophysiology, 55, 1308-1327.
Mikami, A., Newsome, W. T., & Wurtz, R. H. (1986b). Motion selectivity in macaque visual cortex: II. Spatiotemporal range of directional interactions in MT and V1. Journal of Neurophysiology, 55, 1328-1339.
Mitchison, G., & Crick, F. (1982). Long axons within the striate cortex: Their distribution, orientation, and patterns of connection. Proceedings of the National Academy of Sciences of the U.S.A., 79, 3661-3665.
Movshon, J. A., Adelson, E. H., Gizzi, M. S., & Newsome, W. T. (1985). The analysis of moving visual patterns. In C. Chagas, R. Gattass, & C. Gross (Eds.), Pattern recognition mechanisms (pp. 117-151). Vatican City: Pontifical Academy of Sciences.
Movshon, J. A., Newsome, W. T., Gizzi, M. S., & Levitt, J. B. (1988). Spatio-temporal tuning and speed sensitivity in macaque visual cortical neurons. Investigative Ophthalmology and Visual Science, 29(Suppl.), 327.
Nagano, T., & Kurata, K. (1981). A self-organizing neural network model for the development of complex cells. Biological Cybernetics, 40, 195-200.
Nakayama, K., & Loomis, J. M. (1974). Optical velocity patterns, velocity-sensitive neurons, and space perception: A hypothesis. Perception, 3, 63-80.
Nelson, J. I., & Frost, B. J. (1985). Intracortical facilitation among co-oriented, co-axially aligned simple cells in cat striate cortex. Experimental Brain Research, 61, 54-61.
Newsome, W. T., Mikami, A., & Wurtz, R. H. (1986). Motion selectivity in macaque visual cortex. III. Psychophysics and physiology of apparent motion. Journal of Neurophysiology, 55, 1340-1351.
Newsome, W. T., & Paré, E. B. (1988). A selective impairment of motion perception following lesions of the middle temporal visual area (MT). Journal of Neuroscience, 8, 2201-2211.
Norman, J. F., Lappin, J. S., & Wason, T. D. (1988). Long-range detection of the geometric components of optic flow. Investigative Ophthalmology and Visual Science, 29(Suppl.), 251.
Orban, G. A., & Gulyás, B. (1988). Image segregation by motion: Cortical mechanisms and implementation in neural networks. In R. Eckmiller & C. von der Malsburg (Eds.), Neural computers. NATO ASI Series, 41. New York: Springer-Verlag.
Orban, G. A., Kato, H., & Bishop, P. O. (1979a). End-zone region in receptive fields of hypercomplex and other striate neurons in the cat. Journal of Neurophysiology, 42, 818-832.
Orban, G. A., Kato, H., & Bishop, P. O. (1979b). Dimensions and properties of end-zone inhibitory areas in receptive fields of hypercomplex cells in cat striate cortex. Journal of Neurophysiology, 42, 833-849.
O'Toole, A. J., & Kersten, D. (1986). Adaptive connectionist approach to structure from stereo. Journal of the Optical Society of America A, 3, 72.
Pasternak, T., & Leinen, L. J. (1986). Pattern and motion vision in cats with selective loss of cortical directional selectivity. Journal of Neuroscience, 6, 938-945.
Pearson, J. C., Finkel, L. H., & Edelman, G. M. (1987). Plasticity in the organization of adult cerebral cortical maps: A computer simulation based on neuronal group selection. Journal of Neuroscience, 7, 4209-4223.
Poggio, T., & Hurlbert, A. C. (1988). Learning receptive fields for color constancy. Investigative Ophthalmology and Visual Science, 29(Suppl.), 301.
Rakic, P. (1977). Prenatal development of the visual system in rhesus monkey. Philosophical Transactions of the Royal Society of London, Ser. B, 278, 245-260.
Ramachandran, V. S., & Anstis, S. M. (1983). Extrapolation of motion path in human visual perception. Vision Research, 23, 83-85.
Regan, D. (1986). Visual processing of four kinds of relative motion. Vision Research, 26, 127-145.
Reichardt, W. (1961). Autocorrelation, a principle for the evaluation of sensory information by the central nervous system. In W. A. Rosenblith (Ed.), Sensory communication. New York: Wiley.
Rockland, K. S., & Lund, J. S. (1982). Widespread periodic intrinsic connections in the tree shrew visual cortex. Science, 215, 1532-1534.
Rockland, K. S., & Lund, J. S. (1983). Intrinsic laminar lattice connections in primate visual cortex. Journal of Comparative Neurology, 216, 303-318.
Rockland, K. S., Lund, J. S., & Humphrey, A. L. (1982). Anatomical banding of intrinsic connections in striate cortex of tree shrews (Tupaia glis). Journal of Comparative Neurology, 209, 41-58.
Rodman, H. R., & Albright, T. D. (1987). Coding of visual stimulus velocity in area MT of the macaque. Vision Research, 27, 2035-2048.
Saito, H., Tanaka, K., Fukada, Y., & Oyamada, H. (1988). Analysis of discontinuity in visual contours in area 19 of the cat. Journal of Neuroscience, 8, 1131-1143.
Saito, H., Yukie, M., Tanaka, K., Hikosaka, K., Fukada, Y., & Iwai, E. (1986). Integration of direction signals of image motion in the superior temporal sulcus of the macaque monkey. Journal of Neuroscience, 6, 145-157.
Sereno, M. E. (1986). Neural network model for the measurement of visual motion. Journal of the Optical Society of America A, 3, 72.
Sereno, M. E. (1987). Implementing stages of motion analysis in neural networks. Program of the Ninth Annual Conference of the Cognitive Science Society (pp. 405-416). Hillsdale, NJ: Lawrence Erlbaum Associates.
Sethi, I. K., & Jain, R. (1987). Finding trajectories of feature points in a monocular image sequence. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-9, 56-73.
Shimojo, S., Silverman, G. H., & Nakayama, K. (1988). Occlusion and the solution to the aperture problem for motion. Investigative Ophthalmology and Visual Science, 29(Suppl.), 264.
Singer, W. (1983). Neuronal activity as a shaping factor in the self-organization of neuron assemblies. In E. Basar, H. Flohr, H. Haken, & A. J. Mandell (Eds.), Synergetics of the brain. New York: Springer-Verlag.
Singer, W. (1985a). Activity-dependent self-organization of the mammalian visual cortex. In D. Rose & V. G. Dobson (Eds.), Models of the visual cortex (pp. 123-136). New York: John Wiley and Sons.
Singer, W. (1985b). Central control of developmental plasticity in the mammalian visual cortex. Vision Research, 25, 389-396.
Sperling, G., van Santen, J. P. H., & Burt, P. (1985). Three theories of stroboscopic motion detection. Spatial Vision, 1, 47-56.
Stent, G. S. (1973). A physiological mechanism for Hebb's postulate of learning. Proceedings of the National Academy of Sciences of the U.S.A., 70, 997-1001.
Stork, D. G., & Wilson, H. R. (1988). Considerations of Gabor functional descriptions of visual cortical receptive fields. Preprint.
Takeuchi, A., & Amari, S. (1979). Formation of topographic maps and columnar microstructures in nerve fields. Biological Cybernetics, 35, 63-72.
Tanner, J. E. (1986). Integrated optical motion detection. Ph.D. Dissertation, California Institute of Technology (Caltech Tech. Rep. 5223:TR:86).
Thompson, W. B., & Pong, T. C. (in press). Detecting moving objects. International Journal of Computer Vision.
Trotter, Y., Frégnac, Y., & Buisseret, P. (1987). The period of susceptibility of visual cortical binocularity to unilateral proprioceptive deafferentation of extraocular muscles. Journal of Neurophysiology, 58, 795-815.
Ts'o, D. Y., Gilbert, C. D., & Wiesel, T. N. (1986). Relationships between horizontal interactions and functional architecture in cat striate cortex as revealed by cross-correlation analysis. Journal of Neuroscience, 6, 1160-1170.
Van Essen, D. C., & Anderson, C. H. (1987). Reference frames and dynamic remapping processes in vision. In E. Schwartz (Ed.), Computational neuroscience. Cambridge, MA: MIT Press.
Van Essen, D. C., & Maunsell, J. H. R. (1983). Hierarchical organization and functional streams in the visual cortex. Trends in Neurosciences, 6, 370-375.
van Santen, J. P. H., & Sperling, G. (1984). A temporal covariance model of motion perception. Journal of the Optical Society of America A, 1, 451-473.
von der Malsburg, C. (1973). Self-organization of orientation sensitive cells in the striate cortex. Kybernetik, 14, 85-100.
von der Malsburg, C., & Cowan, J. D. (1982). Outline of a theory for the ontogenesis of iso-orientation domains in visual cortex. Biological Cybernetics, 45, 49-56.
Waibel, A., Hanazawa, T., Hinton, G., Shikano, K., & Lang, K. (1987). Phoneme recognition using time-delay neural networks. ATR Technical Report. Advanced Telecommunications Research Institute International, Japan.
Wallach, H. (1935). Über visuell wahrgenommene Bewegungsrichtung. Psychologische Forschung, 20, 325-380.
Wallach, H. (1976). On perception. New York: Quadrangle.
Walters, D. (1987). Properties of connectionist variable representations. Program of the Ninth Annual Conference of the Cognitive Science Society (pp. 265-273). Hillsdale, NJ: Lawrence Erlbaum Associates.
Watson, A. B. (1987). Efficiency of a model human image code. Journal of the Optical Society of America A, 4, 2401-2417.
Waxman, A. M., & Duncan, J. H. (1986). Binocular image flows: Steps toward stereo-motion fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-8, 715-729.
Welch, L. (1988). Speed discrimination and the aperture problem. Investigative Ophthalmology and Visual Science, 29(Suppl.), 264.
Wiesel, T. N., & Hubel, D. H. (1965). Comparison of the effects of unilateral and bilateral eye closure on cortical unit responses in kittens. Journal of Neurophysiology, 28, 1029-1040.
Williams, D., & Phillips, G. (1987). Cooperative phenomena in the perception of motion direction. Journal of the Optical Society of America A, 4, 878-885.
Willshaw, D. J., & von der Malsburg, C. (1976). How patterned neural connections can be set up by self-organization. Proceedings of the Royal Society of London, Ser. B, 194, 431-445.
Wilson, H. R. (1988). Development of spatiotemporal mechanisms in infant vision. Vision Research, 28, 611-628.

APPENDIX: IMPLEMENTATION DETAILS
This section describes the three simulations in detail. For each simulation, equations and parameters specify the coordinate positions of the cells in the network, the strengths of excitatory and inhibitory connections between cells, the manner in which each cell's activity level changes according to its inputs, the manner in which connection strengths vary according to cell activity correlations, and the sequences of simulated visual input to the network. In the following discussion, let the symbol ~)~ (with indexing subscripts and superscripts) refer to a number drawn pseudorandomly from the interval [0, 1), and define the notations
$$[x]^+ \equiv \max(x, 0), \qquad \lfloor x \rfloor = \mathrm{floor}(x), \qquad \lceil x \rceil = \mathrm{ceil}(x).$$
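These three notations, together with the pseudorandom draw from [0, 1), correspond directly to standard numeric primitives. The following is a minimal Python sketch. It assumes that `math.floor`, `math.ceil`, and `random.Random.random` (which returns a float in the half-open interval [0.0, 1.0)) stand in for the paper's floor, ceil, and pseudorandom source. The helper names and the fixed seed are hypothetical and serve only as illustration.

```python
import math
import random


def rectify(x: float) -> float:
    """[x]^+ = max(x, 0): pass non-negative values, clip negatives to zero."""
    return max(x, 0.0)


def floor_int(x: float) -> int:
    """Greatest integer less than or equal to x."""
    return math.floor(x)


def ceil_int(x: float) -> int:
    """Least integer greater than or equal to x."""
    return math.ceil(x)


# Hypothetical seeded generator, used only so that a run can be reproduced.
rng = random.Random(1989)


def draw_unit() -> float:
    """One pseudorandom number from the half-open interval [0, 1)."""
    return rng.random()


if __name__ == "__main__":
    print(rectify(-0.3), rectify(0.7))
    print(floor_int(-2.5), ceil_int(-2.5))
    print(floor_int(2.5), ceil_int(2.5))
    print(0.0 <= draw_unit() < 1.0)
```

Floor and ceiling behave differently on negative arguments: $\lfloor -2.5 \rfloor = -3$ whereas $\lceil -2.5 \rceil = -2$.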
The function floor(x) produces the greatest integer less than or equal to x, and the function ceil(x) produces the least integer greater than or equal to x.

Network Structure: Cell Coordinate Positions
The cells in each simulated network are organized into two layers, $L_1$ and $L_2$; the layer in which the ith cell resides is specified by ':-> Within each layer, every cell has X and Y spatial coordinates, specified by rL and %. Let the quantities -= r't{2)~'~!2)t~2~ be the number of cells in $L_2$. Then ! ~:., =
if0~i< 2.~'', if-'~) ~ i < ?l~> + -~l:, otherwise;
= / l ( i mod rL
[1((i
-
(!'/'(ll('~l)))/~ a{l)]
-:u) rood (r~,'(:)~:(:)))/02' I
(1) if ~,, = 1, if ":,, = 2:
(2)
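Eq. (1) assigns each global cell index i to a layer by dividing the index range into two consecutive blocks: the first block belongs to $L_1$ and the next to $L_2$. The Python sketch below illustrates that bookkeeping under assumptions that are not taken from the paper. The helper names `layer_of` and `locate` are hypothetical. Each layer is assumed to hold one cell per grid position, and cells within a layer are assumed to follow row-major order. The paper's own coordinate equations may factor the index differently.

```python
from typing import Optional, Tuple


def layer_of(i: int, n1: int, n2: int) -> Optional[int]:
    """Return the layer (1 or 2) holding cell i, mirroring the case split
    of Eq. (1): indices 0..n1-1 belong to L1 and n1..n1+n2-1 to L2."""
    if 0 <= i < n1:
        return 1
    if n1 <= i < n1 + n2:
        return 2
    return None  # assumption: the "otherwise" case marks an index outside both layers


def locate(i: int, grid1: Tuple[int, int], grid2: Tuple[int, int]) -> Tuple[int, int, int]:
    """Map a flat cell index to (layer, x, y).

    Hypothetical layout: one cell per grid position, row-major order
    (x varies fastest) within each layer; grid1 and grid2 are (width, height).
    """
    w1, h1 = grid1
    w2, h2 = grid2
    n1, n2 = w1 * h1, w2 * h2
    layer = layer_of(i, n1, n2)
    if layer is None:
        raise ValueError(f"cell index {i} lies outside both layers")
    offset = i if layer == 1 else i - n1  # position counted within its own layer
    width = w1 if layer == 1 else w2
    x = offset % width   # column index: offset mod width
    y = offset // width  # row index: floor(offset / width)
    return layer, x, y


if __name__ == "__main__":
    # A 4 x 3 grid for L1 (12 cells) followed by a 2 x 2 grid for L2 (4 cells).
    for i in (0, 5, 11, 12, 15):
        print(i, locate(i, (4, 3), (2, 2)))
```

Storing both layers in a single index range keeps all cells in one flat array. Subtracting the size of $L_1$ then recovers each $L_2$ cell's position within its own layer.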
~,, = Ili/(