Segmentation in Scale Space*

Rolf D. Henkel, Institute of Theoretical Physics, University of Bremen, Germany

Abstract. A segmentation scheme based on tracing objects and borders through scale space is proposed. Scale space allows one to create a hierarchical representation of the input data, which can be used to tessellate input space into objects with closed and orientable borders. For analyzing the structure of scale space, a neural network approach using synchronizing neural oscillators is proposed.

1 Introduction

Despite a wide variety of different segmentation techniques [1, 2, 11], no general theory of segmentation exists. In this paper, the segmentation task is explored in a biological context. Specifically, we want to show how neurons with small and large receptive fields can cooperate in order to group visual data into disjoint classes. One might call this process early segmentation, its sole purpose being the grouping of raw data into some meaningful chunks. The grouping operation leads to a tremendous data reduction for higher cognitive functions, which we assume not to interfere with the segmentation process in question. Thus grouping will be done only on the basis of some intrinsic image characteristics, and not on the basis of pre-learned knowledge about the data being processed.

It is not a trivial question what kind of image characteristic one should choose for early segmentation. The possibility of finding one is closely connected to the fact that the visual signals in question are generated by only a few physical processes, which in turn create some regularities in the sensory signals. One can exploit these regularities in two ways: by trying to invert the image-formation processes, or by utilizing some invariants of these processes. A simple example of the first kind are the various shape-from-shading algorithms. They can be interpreted as trying to invert the physical processes leading to shadows on the surface of objects. Easier than the inversion type of processing is the grouping of data based on some invariant image feature. One trivial candidate for such a feature is the optical flow induced by the motion of objects in the visual field or by ego-motion. Using this feature for segmentation corresponds to utilizing the "common fate" paradigm of gestalt theory.

However, we will not be concerned with the appropriate image features to choose for segmentation. At the moment, this seems to

* In: Proc. of 6th Int. Conf. on Computer Analysis of Images and Patterns, CAIP '95, Prague 1995. See also: WWW-based Image Processing, http://pooh.physik.uni-bremen.de/oncalc.html.

be a largely heuristic question. Instead, we ask how to segment a given feature set into a few correct object chunks.

Usually, the regions grouped together into one object are required to be uniform and homogeneous with respect to the chosen image feature. However, strictly uniform and homogeneous regions are simply not present in generic datasets. This has a variety of reasons, some connected with sensor noise, others with the interference of image-formation processes causing shading and highlights to appear on the objects in question. Accordingly, segmentation based on a strict notion of uniform feature values across objects leads to regions full of small holes, with ragged borders or no borders at all. Relaxing the uniformity requirement helps, but also tends to merge regions corresponding to different objects. A generic segmentation process has to deal with this problem.

In the following, we propose a scheme which utilizes neurons with differently sized receptive fields for the segmentation of data given over a two-dimensional input field. The objects detected by the algorithm possess some nontrivial properties, like being compact and enclosed by orientable borders.

1.1 Local and Global Analysis

Basically, only two approaches toward segmentation can be differentiated. Since we are dealing with features f(x) given over some input space X ⊂ ℝ², one element of the dataset has two distinct properties: a value f and a spatial coordinate x. Depending on the feature quality used, we can distinguish two different ansätze for segmentation, called in the following global and local analysis. In a global analysis, one ignores metrical information; in a local analysis, one uses it. As we will see, both types of segmentation are closely connected with two complementary properties of objects: objects defined as prominent signal variations versus objects defined by being encircled by borders.

Global Analysis as Clustering Process. Ignoring the spatial coordinates of the feature data, we are left with a set of feature values f approximating a global probability distribution P(f). Analysis of P(f) can be done via a variety of cluster algorithms. These algorithms search for prominent clusters in the dataset and compute some prototypical representation f_p for each cluster. In a second processing step, all data is grouped into one of the found clusters by associating each f(x) with the nearest prototype f_p. Since in image-analysis tasks the available database for building the prototypical feature vectors f_p is large, the central limit theorem assures stable performance of these types of algorithms. The intrinsic properties of the physical world, which created the data in the first place, normally assure connectedness of the detected segments in input space (despite the fact that the spatial information was not used in the grouping operation). However, strong noise will cause algorithms based on global analysis to fail (figure 1.b). If the distributions of feature values corresponding to different objects overlap, detected regions will have many holes, or they may even consist of several disconnected patches. In addition, regions far apart from each other, but possessing approximately equal feature values, will be grouped into a single object. Both types of errors can be attributed to the neglect of spatial information in a global analysis.
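As an illustration only (not the authors' implementation), a global analysis of this kind can be sketched as a one-dimensional k-means over feature values that completely ignores pixel positions; image size, noise level, and all function names below are hypothetical:

```python
import numpy as np

def cluster_segment(features, k, iters=20, seed=0):
    # global analysis: 1-D k-means over feature values; spatial positions are ignored
    rng = np.random.default_rng(seed)
    f = features.ravel().astype(float)
    protos = rng.choice(f, size=k, replace=False)   # initial prototypes f_p
    for _ in range(iters):
        # associate each feature value with the nearest prototype
        labels = np.argmin(np.abs(f[:, None] - protos[None, :]), axis=1)
        for j in range(k):
            if np.any(labels == j):
                protos[j] = f[labels == j].mean()   # update prototype
    return labels.reshape(features.shape), protos

# hypothetical test image: bright square on dark background plus gaussian noise
img = np.zeros((32, 32))
img[8:24, 8:24] = 1.0
img += np.random.default_rng(1).normal(0.0, 0.1, img.shape)
seg, protos = cluster_segment(img, k=2)
```

Note that the resulting label image is connected here only because the underlying objects are; the algorithm itself has no notion of spatial proximity, which is exactly the weakness discussed above.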

Local Analysis as Grouping in Measurement Space. Another approach towards segmentation is based on a local analysis of the raw data, explicitly utilizing the spatial information content of the dataset. Within this approach, objects are defined as areas enclosed by borders, i.e., strong local signal variations. However, a local analysis can yield only local data, i.e., edge elements. Thus, in a second processing step, these edge elements have to be grouped into continuous and closed borderlines defining valid object regions. The edge-grouping process has to delete edge elements not consistent with other data, and to create some missing edge elements in order to close boundaries (figure 1.c). This requires, of course, some global knowledge about the borderlines present.
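The paper does not commit to a particular edge detector; as a minimal sketch of the local-analysis step, a central-difference gradient with a fixed threshold already produces the raw edge elements that a subsequent grouping stage would have to close into borders (all names and parameter values here are illustrative):

```python
import numpy as np

def edge_map(img, thresh):
    # local analysis: threshold the central-difference gradient magnitude
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = (img[:, 2:] - img[:, :-2]) / 2.0   # horizontal gradient
    gy[1:-1, :] = (img[2:, :] - img[:-2, :]) / 2.0   # vertical gradient
    return np.hypot(gx, gy) > thresh

img = np.zeros((16, 16))
img[4:12, 4:12] = 1.0           # one noise-free square "object"
edges = edge_map(img, 0.25)     # True along the square's border only
```

With noise added, this map acquires exactly the spurious and missing edge elements described in the text, which is why the grouping step is unavoidable.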


Fig. 1. The image a) used for testing segmentation algorithms. It consists of several simple objects, with gaussian-distributed noise added. In b) the results of a simple cluster algorithm are displayed; an edge-detection scheme was employed in c).

1.2 Scale Space

It follows from these considerations that we cannot circumvent the simultaneous application of global and local analysis. One hint at how to combine both approaches can be obtained from biological vision systems. In these systems, a scene is examined with neurons having differently sized receptive fields. For example, within the tectum opticum of the salamander one finds many neurons with small receptive fields, some neurons with larger ones, and even a few neurons sensitive to the total visual field of the salamander [13]. Yet another hint comes from theoretical considerations about the scales of the image-formation process. Trivially, an upper limit of resolution is given by the receptor spacing, and a lower limit of resolution by the size of the observation window. One nice way to interpolate between these different scales is to utilize the two-dimensional diffusion equation [3]. Starting from the data f(x, y), one creates a family of images f(x, y, t), where t measures scale, by setting f(x, y, t = 0) = f(x, y) and using the diffusion equation

∂_t f(x, y, t) = [∂_xx + ∂_yy] f(x, y, t)        (1)

to obtain the images at coarser scales. This diffusion process has a natural analog in the sampling of input space by neurons with differently sized receptive fields. The solution f(x, y, t) of the diffusion equation at scale t can be obtained from f(x, y, t = 0) through convolution with the gaussian kernel

g(x, y) = 1/(4πt) · exp[−(x² + y²)/(4t)]        (2)

Thus neurons with different receptive field sizes can be interpreted as sampling scale space at certain discrete points f(x_k, y_k, t_k).
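Since the kernel in Eq. (2) is a gaussian of variance 2t, such a discrete sampling of scale space can be sketched with separable convolutions. The following illustration (not the authors' code; image contents and scale values are arbitrary) builds f(x, y, t_k) for a few scales t_k:

```python
import numpy as np

def gauss_kernel(t):
    # sampled 1-D gaussian for diffusion time t; Eq. (2) has variance 2t
    sigma = np.sqrt(2.0 * t)
    r = int(np.ceil(4 * sigma))          # truncate at 4 standard deviations
    x = np.arange(-r, r + 1)
    k = np.exp(-x**2 / (4.0 * t))
    return k / k.sum()                   # normalize to preserve mean intensity

def diffuse(img, t):
    # solve the diffusion equation at scale t by separable gaussian convolution
    k = gauss_kernel(t)
    pad = len(k) // 2
    blur = lambda v: np.convolve(np.pad(v, pad, mode='reflect'), k, 'valid')
    tmp = np.apply_along_axis(blur, 1, img)   # smooth rows
    return np.apply_along_axis(blur, 0, tmp)  # then columns

img = np.random.default_rng(0).normal(size=(32, 32))
scale_space = [diffuse(img, t) for t in (0.5, 2.0, 8.0)]   # f(x, y, t_k)
```

Each entry of `scale_space` corresponds to the response of a layer of neurons with one fixed receptive field size; larger t means larger receptive fields and smoother data.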

1.3 Hierarchical Segmentation

During resolution reduction by diffusion, local signal variations smooth out as "time" t proceeds. Extrema disappear one after the other (only rarely is a new one created, see [6]), and finally only the strongest signal variations survive. Thus a hierarchical ordering of extrema is induced by the diffusion process. The images f(x, y, t) obtained from the original data f(x, y) further obey a concept of causality [3]: every value f(x, y, t_s) = λ found at a specific scale t_s > 0 can be traced back to the original resolution t = 0. The contour sheets f(x, y, t) = λ are, except for a finite number of critical values λ_c, two-dimensional submanifolds of ℝ³. This follows directly from Sard's theorem. At the critical values λ_c, the topology of the contour sheets changes. These events correspond to the aforementioned disappearance or appearance of local extrema. From Thom's theory it follows that every such event has qualitatively the same generic form [3]. Since f(x, y, t) is a one-parameter family of functions in t, the only possible way to change is given by the so-called fold catastrophe, generically described through the function

f(x, y, t) = x³ + 6t(x + 1) + y²        (3)

In figure 2 the contour sheets of this function are displayed near a critical value λ_c. For λ < λ_c, we see a branching of the contour sheet as one descends towards finer resolutions (figure 2.a). For λ > λ_c (figure 2.c), the manifold separates into two disjoint pieces, one continuing towards coarser resolutions, the other descending in scale space towards finer resolutions. Clearly, the surface spawned from the main surface is orientable, and encircles an area at the finest resolution which is topologically equivalent to a disk (or a union of disks if it branches again). Between critical values, all surfaces are similar and can be continuously deformed into each other. We thus have an onion-like, hierarchical structure of

[Figure 2: three panels a)-c) showing contour sheets over the (X, Y, scale) volume, for contour values f(x, y, t) = λ_0 < λ_crit, λ = λ_crit, and λ_1 > λ_crit.]

Fig. 2. The change of the topology of contour sheets near a critical value λ_c. Connected at λ < λ_c (a), the sheet splits into two disjoint sets for λ > λ_c (c).

contour sheets, changing topology only at a few critical values λ_c, corresponding to the disappearance of local extrema present in the input data.

Several schemes have been proposed for exploiting the structure of scale space [3, 6, 7]. Most approaches need a very dense sampling of scale space and are therefore computationally expensive. Furthermore, the connection with biological vision systems is not clear. We propose here a single and simple neuronal mechanism for utilizing scale space: the tracing and merging of contour sheets in scale space by neuronal oscillators. Within this ansatz, neurons distributed over scale space at discrete points f(x_k, y_k, t_k) can participate in a common slow modulation of their mean firing rate if they code approximately the same intensity level. By this process, each contour sheet in scale space can be marked with a specific "timecode". These patches of synchronization are allowed to merge further, exhibiting a common frequency, if, in scale space, no pronounced border exists between them. As we will show, such a process is able to segment and mark data into a few object chunks.
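The gradual disappearance of extrema under diffusion is easy to verify numerically. The sketch below (an illustration in one dimension for simplicity, not taken from the paper) blurs a random signal to increasing diffusion times t and counts its strict local maxima; the count shrinks as t grows, giving the hierarchical ordering described above:

```python
import numpy as np

def blur1d(f, t):
    # 1-D diffusion to time t via the gaussian kernel of Eq. (2)
    sigma = np.sqrt(2.0 * t)
    r = int(np.ceil(4 * sigma))
    x = np.arange(-r, r + 1)
    k = np.exp(-x**2 / (4.0 * t))
    k /= k.sum()
    return np.convolve(np.pad(f, r, mode='reflect'), k, 'valid')

def count_maxima(f):
    # strict local maxima in the signal interior
    return int(np.sum((f[1:-1] > f[:-2]) & (f[1:-1] > f[2:])))

rng = np.random.default_rng(0)
signal = rng.normal(size=256)
counts = [count_maxima(blur1d(signal, t)) for t in (0.1, 1.0, 10.0, 100.0)]
```

In one dimension, gaussian smoothing provably never creates new extrema; in two dimensions this holds only generically, which is the "only rarely" caveat cited from [6].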

1.4 Possible Neural Implementation

We now sketch a possible neuronal mechanism for the creation and merging of contour sheets in scale space. As basic computing units, pools of densely interconnected excitatory and inhibitory spiking neurons are used. Single neurons are described by their mean firing rate, which is given by the internal field h via a sigmoid transfer function f(h) = (1 + exp(−h/ε))⁻¹. Connecting all neurons within a pool in an all-to-all fashion leads to the following mean-field equations for the averaged firing rates of the excitatory (E) and inhibitory (I) neuron populations:

τ ∂E/∂t = −E + f(αE − βI − θ_E + S)        (4)
τ ∂I/∂t = −I + f(γE − θ_I)        (5)
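A minimal Euler-integration sketch of pool dynamics of this type follows. This is an illustration, not the authors' simulation code; the parameter values are taken from the caption of figure 3, while the Greek parameter names and the exact form of the inhibitory input are assumptions:

```python
import numpy as np

def f_sig(h, eps=0.5):
    # sigmoid transfer function f(h) = (1 + exp(-h/eps))^-1
    return 1.0 / (1.0 + np.exp(-h / eps))

def simulate_pool(S, steps=20000, dt=0.001,
                  alpha=1.0, beta=1.0, gamma=3.0,
                  thE=0.075, thI=0.15, tau=0.025):
    # forward-Euler integration of the mean-field equations (4)-(5)
    E, I = 0.0, 0.0
    trace = np.empty(steps)
    for n in range(steps):
        dE = (-E + f_sig(alpha * E - beta * I - thE + S)) / tau
        dI = (-I + f_sig(gamma * E - thI)) / tau
        E += dt * dE
        I += dt * dI
        trace[n] = E          # record the excitatory firing rate
    return trace

trace = simulate_pool(S=0.5)
```

Because the sigmoid maps into (0, 1), the averaged firing rate stays bounded in [0, 1] regardless of the stimulus, which is the regime in which figure 3 is plotted.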

[Figure 3: two panels, a) mean firing rate and b) frequency of oscillation, both plotted against the stimulus.]

Fig. 3. Mean firing rate (a) and slow modulation frequency of the mean firing rate (b) for pools of coupled excitatory and inhibitory neurons. Simulation parameters were α = β = 1.0, γ = 3.0, ε = 0.5, θ_E = 0.075, θ_I = 0.15, and τ = 0.025.

These equations have been derived in a variety of contexts [14, 12, 9] and display the following basic behavior: as the applied stimulus S gets stronger, the mean firing rate of the pool increases (fig. 3.a). In addition, the system displays limit-cycle behavior, resulting in a slow modulation of the mean firing rate. The frequency of this slow modulation is a monotonic function of the input stimulus (fig. 3.b) until saturation effects take over. It is well known that systems of coupled limit-cycle oscillators display frequency and phase locking for a variety of connection schemes [5, 4, 10, 8]. In figure 4.a a prototypical network is displayed. Four oscillators receive fixed stimuli, whereas a fifth oscillator is tuned through the available stimulus range. The oscillators with fixed input have small synaptic links from their excitatory pools to the inhibitory pool of the oscillator which is tuned through.

[Figure 4: a) network sketch of excitatory/inhibitory oscillator pools; b) correlation plotted against the stimulus value, with peaks near 0.2, 0.4, 0.6, and 0.8.]

Fig. 4. a) In a prototypical network, one of the oscillators is tuned through the stimulus range, while four other oscillators are kept fixed at stimulus values of 0.2, 0.4, 0.6, and 0.8. In b) the correlation measure indicates synchronization within a small range around each of these values.

The correlation measure c_ij = (⟨E_i E_j⟩ − ⟨E_i⟩⟨E_j⟩) / √(σ²_{E_i} σ²_{E_j}) (where σ²_x = ⟨x²⟩ − ⟨x⟩²) between the various oscillator pairs shows four pronounced peaks over the whole stimulus range (figure 4.b). These peaks indicate frequency locking of the central oscillator with one of the four driving oscillators. Generally, depending on the coupling strength between them, oscillators close in frequency will tend to group together. They will form clusters oscillating at a common frequency, while oscillators further away from the principal cluster frequency stay unsynchronized with that cluster.
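The correlation measure itself is straightforward to compute from two firing-rate traces. In the sketch below the traces are synthetic sine stand-ins, not simulation output; synchronized (identical-frequency) traces yield c_ij ≈ 1, while a detuned oscillator yields a correlation near zero:

```python
import numpy as np

def corr(Ei, Ej):
    # c_ij = (<Ei*Ej> - <Ei><Ej>) / sqrt(var(Ei) * var(Ej)), time averages
    cov = np.mean(Ei * Ej) - Ei.mean() * Ej.mean()
    return cov / np.sqrt(Ei.var() * Ej.var())

t = np.linspace(0.0, 10.0, 1000)
a = 0.5 + 0.1 * np.sin(2 * np.pi * t)           # unit-frequency trace
b = 0.5 + 0.1 * np.sin(2 * np.pi * t)           # synchronized partner
c = 0.5 + 0.1 * np.sin(2 * np.pi * 1.37 * t)    # detuned oscillator
```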

[Figure 5: three layers of units, level l+1, level l, and level l-1, connected by inter- and intralevel links.]

Fig. 5. The link scheme used for segmentation. Neurons with identically sized receptive fields are arranged in layers. Neurons with partially overlapping receptive fields are connected by uni- or bidirectional synaptic links.

1.5 Segmentation Results

This dynamic linking scheme is used with a specific link structure to analyze scale space (figure 5). Arranging all neurons in layers of fixed resolution, two types of links are introduced: unidirectional interlevel links, connecting coarser resolution levels with the next finer levels, and bidirectional intralevel links, connecting neighboring neurons within a layer. Due to the local nature of the link scheme, only oscillators close in scale space can couple with each other, thus forcing synchronized activity to follow contour sheets.

In order to perform large-scale simulations, a simplified locking scheme was used. Two units coupled by uni- or bidirectional links were marked as synchronized if the difference in their firing rates was less than a prescribed threshold. In the case of bidirectional links, the common frequency of the synchronized units was assumed to be the average of the individual modulation frequencies before synchronization. During linking, units with the smallest frequency difference were allowed to synchronize first. Units at the end of unidirectional links were simply set to the modulation frequency of the driving units if the sync threshold was not exceeded.
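At its core, a simplified locking scheme of this kind is a greedy merge over links, which can be sketched with a union-find structure. This is an illustration under stated assumptions, not the authors' code: the rates, the link list, the threshold, and the omission of the frequency-averaging step are all made up for brevity:

```python
def lock_units(rates, links, threshold):
    # greedy locking: merge linked units whose firing-rate difference is below
    # the threshold, processing the smallest differences first (union-find)
    parent = list(range(len(rates)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    # candidate links sorted by firing-rate difference, smallest first
    for i, j in sorted(links, key=lambda p: abs(rates[p[0]] - rates[p[1]])):
        if abs(rates[i] - rates[j]) < threshold:
            parent[find(i)] = find(j)       # lock the two synchronization patches

    return [find(i) for i in range(len(rates))]

rates = [0.20, 0.22, 0.21, 0.80, 0.82]      # two groups of similar units
links = [(0, 1), (1, 2), (2, 3), (3, 4)]    # nearest-neighbour link scheme
labels = lock_units(rates, links, threshold=0.05)
```

Units 0-2 and units 3-4 end up in two separate synchronization patches, because the link (2, 3) crosses a pronounced rate border and is never merged.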


Fig. 6. Two segmentation results. In a), the input to the network was the same as in figure 1. In b), a picture from a standard dataset (CIL-0001, S.A. Shafer, Calibrated Imaging Lab, Carnegie Mellon University) was used.

The results of two segmentation runs are depicted in figure 6. The neural network consisted of a total of six resolution layers with the link structure proposed in figure 5. The only parameters of the model, the inter- and intralevel link thresholds, were kept fixed for both simulation runs (all intralevel thresholds were set at θ_i = 0.04, except θ_1 = 0.05, and all interlevel thresholds at θ_ij = 0.03, except θ_21 = 0.5).

Acknowledgements. Stimulating discussions with H. Schwegler and H.-U. Bauer are acknowledged. This work was supported by a grant of the Deutsche Forschungsgemeinschaft.

References

1. Fu, K.S., Mui, J.K.: A Survey on Image Segmentation. Patt. Recog. 13 (1981) 3-16
2. Haralick, R.M., Shapiro, L.G.: Survey: Image segmentation techniques. CVGIP 29 (1985) 100-132
3. Koenderink, J.J.: The Structure of Images. Biol. Cybern. 50 (1984) 363-370
4. Kopell, N., Ermentrout, G.B.: Phase Transitions and other Phenomena in Chains of Coupled Oscillators. SIAM J. Appl. Math. 50 (1990) 1014-1052
5. Kuramoto, Y.: Chemical Oscillations, Waves, and Turbulence. Springer-Verlag, New York, 1984
6. Lifshitz, L.M., Pizer, S.M.: A Multiresolution Hierarchical Approach to Image Segmentation Based on Intensity Extrema. IEEE PAMI 12 (1990) 529-540
7. Lindeberg, T.: Detecting Salient Blob-Like Image Structures and Their Scales with a Scale-Space Primal Sketch: A Method for Focus-of-Attention. Int. J. Comp. Vis. 11:3 (1993) 283-318
8. Lumer, E.D., Huberman, B.A.: Hierarchical Dynamics in Large Assemblies of Interacting Oscillators. Physics Letters A 160 (1991) 227-232
9. Malsburg, C., Buhmann, J.: Sensory Segmentation with Coupled Neural Oscillators. Biol. Cybern. 67 (1992) 233-242
10. Niebur, E., Schuster, H.G., Kammen, D.M., Koch, C.: Oscillator-phase Coupling for Different Two-dimensional Network Connections. Physical Review A 44 (1991) 6895-6904
11. Reed, T.R., du Buf, J.M.H.: A Review of Recent Texture Segmentation and Feature Extraction Techniques. CVGIP 57 (1993) 359-372
12. Schuster, H.G., Wagner, P.: A Model for Neuronal Oscillations in the Visual Cortex. Biol. Cybern. 64 (1990) 77-82
13. Wiggers, W., Roth, G., Eurich, C., Straub, A.: Binocular Depth Perception Mechanisms in Tongue-Projecting Salamanders. J. Comp. Physiol. A 176 (1995) (to appear)
14. Wilson, H.R., Cowan, J.D.: Excitatory and Inhibitory Interactions in Localized Populations of Model Neurons. Biophys. J. 12 (1972) 1-24

This article was processed using the LaTeX macro package with LLNCS style.
