Spatio-temporal CNN Algorithm for Object Segmentation ... - CiteSeerX

Spatio-temporal CNN Algorithm for Object Segmentation and Object Recognition

Abraham Schultz**, Csaba Rekeczky+*, István Szatmári+*, Tamás Roska+*, and Leon O. Chua+

**Naval Research Laboratory, Radar Division, 4555 Overlook Ave., S.W., Washington D.C., USA
+Electronics Research Laboratory, College of Engineering, University of California at Berkeley, 258-M, Cory Hall, Berkeley, CA 94720, USA
*Analogical and Neural Computing Laboratory, Computer and Automation Institute, Hungarian Academy of Sciences, Kende utc. 13-17, 1111 Budapest, Hungary

ABSTRACT: In this paper a spatio-temporal analogic CNN algorithm is designed for front-end filtering, segmentation and object recognition. First, a generalized segmentation strategy is presented based on various diffusion models. Both PDE and non-PDE related schemes are discussed and their VLSI complexity is analyzed. In classification (object recognition) a CNN implementation of the autowave metric, a “nonlinear” variant of the Hausdorff metric, is used. This approach turned out to be superior to some other classification methods, e.g. the Hamming distance calculation. A number of tests have been completed within the so-called “bubble/debris” segmentation experiments using original and artificial gray-scale images.

1. Introduction
Since the publication of the original paper in 1988 ([1]), the rapidly growing field of Cellular Neural Networks (CNNs) has found numerous potential applications, especially in image processing problems where real-time signal processing is required. The CNN approach has evolved into a widely accepted computational paradigm [2], and a dedicated hardware architecture, the CNN Universal Machine (CNN UM, [3]), has recently been designed for it. The CNN UM is the first parallel, stored-program analogic and visual array microprocessor that can be fabricated on a single chip ([5], [6]). The new device is programmed by analogic algorithms, i.e. sequences of analog operations combined with local logic at the cell level. In this paper we describe a complex analogic CNN algorithm designed for front-end filtering, segmentation and classification in the so-called bubble-debris classification experiments.

2. Preliminaries in Bubble/debris Classification Experiments
The bubble-debris classification project aims to develop a real-time warning system for the condition-based maintenance of helicopter (and jet) engines. The classification system should warn the pilot to land if the number of debris particles in the engine oil exceeds a predefined threshold indicating a likely engine failure. Optical sensing with subsequent image analysis was chosen for the inspection because strong mechanical vibration and electromagnetic interference make the problem very difficult to solve by other methods. In the current system a pulsed laser illuminator projects an image of the fluid containing various suspended objects onto a CCD sensor. An image usually contains debris particles and air bubbles, which can overlap when projected onto the image plane. The major goal is to separate debris particles from the air bubbles. Distinguishing debris particles from air bubbles is difficult because of the coarse resolution of the images and the requirement for an extremely low false alarm rate for misclassified bubbles. A bubble-debris classifier is being developed at NRL. The approach followed was to tune the system parameters to achieve the desired false alarm rate and to minimize the percentage of misclassified debris particles. The system operates as a series of rejection filters, with increasingly time-consuming classification tasks applied to a smaller and smaller number of objects. First, single bubbles are removed using a relatively simple test involving the variance of a set of radii drawn through the object's center. Then double bubbles are removed using an arc test. Finally, multiple bubble groups are detected using the erosion operator to find the largest bubble in the group.
The percentage of pixels on the entire boundary of the unknown object that match the circumference of the hypothetical circle associated with the largest bubble in the group is then computed. The radius of this circle is the number of times the erosion operator was applied to obtain the “extinction set” (the set that vanishes with the next application of the erosion operator), and the center of mass of the extinction set is taken to be the center of the circle. Matching is done to within a specified tolerance, which has been taken to be about 1.5 pixels. If the unknown object is a debris particle, the percentage of matching pixels will in most cases be relatively low. The strategy of using a sequence of tests of increasing computational complexity was determined by the limitations of the standard (digital) single CPU computer. This approach makes the classification algorithm difficult to tune, in that the thresholds used for one test can affect subsequent tests.

CNN offers the possibility of a totally different approach to the bubble-debris classification problem. The features defined in the NRL system cannot be easily implemented in this environment, and it is necessary to define a new feature set. It should be noted that at the highest frame rate of the image acquisition system in the bubble-debris experiments, the frame interval can be as short as 2 milliseconds. A complex CNN algorithm can be executed within this interval, since most CNN operations (and reprogramming of the hardware) take on the order of a microsecond, which leaves room for on the order of a thousand operations per frame. The approach followed here is to filter out air bubbles using CNN based binary morphology [15] and the autowave metric. In the experiments we used binary images, the output of the field-programmable gate array that in the current system thresholds all gray-scale input images at a fixed level right after acquisition. We have also investigated how the quality of these binary images can be improved if the fixed thresholding of the original gray-scale image is replaced with a CNN based locally adaptive front-end filtering and segmentation strategy.
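The extinction-set construction used in the NRL multiple-bubble test can be illustrated with a short sketch. This is not the NRL code: the 3x3 cross-shaped structuring element, the treatment of pixels outside the image as background, and the function names `erode` and `extinction_set` are assumptions made for illustration only.

```python
def erode(img):
    """One binary erosion with a 3x3 cross (assumed structuring element).

    Pixels outside the image are treated as background.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            if img[i][j] and all(
                0 <= i + di < h and 0 <= j + dj < w and img[i + di][j + dj]
                for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1))
            ):
                out[i][j] = 1
    return out


def extinction_set(img):
    """Erode until the next erosion would empty the image.

    Returns (number of erosions applied, centre of mass of the extinction set).
    The first value plays the role of the radius of the hypothetical circle,
    the second the role of its centre.
    """
    if not any(any(row) for row in img):
        raise ValueError("image contains no object pixels")
    steps = 0
    current = img
    while True:
        nxt = erode(current)
        if not any(any(row) for row in nxt):
            break  # 'current' is the extinction set
        current, steps = nxt, steps + 1
    pts = [(i, j) for i, row in enumerate(current) for j, v in enumerate(row) if v]
    ci = sum(p[0] for p in pts) / len(pts)
    cj = sum(p[1] for p in pts) / len(pts)
    return steps, (ci, cj)
```

With a cross-shaped element each erosion removes one layer in the city-block sense, so the returned count is a city-block radius; a different structuring element would give a different radius estimate.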

3. Object Segmentation via Nonlinear Diffusion
In this section we define the CNN based diffusion models used in the experiments. It is assumed that the image is corrupted by additive Gaussian noise, which justifies the application of diffusion-type filters for noise reduction and image enhancement. First, CNN models of the so-called constrained linear and nonlinear diffusion, derived from a PDE formulation, are discussed. Then, an algorithmic CNN approximation of a novel (non-PDE related) nonlinear diffusion formulation is presented and analyzed.

A major breakthrough in the field of edge detection came from Perona and Malik [10], who proposed anisotropic diffusion for adaptive smoothing, formulating the problem in terms of the nonlinear heat equation. Other researchers raised concerns about the uniqueness of the solution, and a “pre-diffusion” strategy was proposed to improve the original model (e.g. [12]). In [14] we investigated several approaches (based on the studies [10]-[12]), of which only one will be recalled here, because of its favorable VLSI implementation properties (stability and robustness) and its flexible framework for incorporating further spatial adaptivity into the operator. This approach is a generalization of Nordström's model ([11], variational regularization for global edge detection), named constrained anisotropic diffusion:

$$\frac{d}{dt} I(\vec{x},t) - \mathrm{div}\left[ g(\vec{x},t)\, \mathrm{grad}(I(\vec{x},t)) \right] = \beta(I(\vec{x},t_0)) - I(\vec{x},t), \qquad I(\vec{x},t_0) = I_0(\vec{x}) \tag{1}$$

where I(x, t) is the image intensity (I0(x) is the original image), the vector x represents the spatial coordinates, the time variable t can also be interpreted as the scaling parameter, and g(x, t) is the thermal conductivity. Functions proposed for g(.) that satisfy the immediate localization and piecewise smoothing criteria without sacrificing causality [10] are:

$$g_1 = \exp\left( -\left\| \mathrm{grad}(I(\vec{x},t)) \right\|^2 / K^2 \right), \quad K > 0$$

$$g_2 = \left[ 1 + \left( \left\| \mathrm{grad}(I(\vec{x},t)) \right\| / K \right)^{1+\alpha} \right]^{-1}, \quad \alpha > 0,\ K > 0 \tag{2}$$
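As a rough illustration of (1) and (2), the following sketch evaluates the two conductivity functions and performs one explicit Euler step of the constrained diffusion on a 4-connected pixel grid. It is a minimal software sketch, not the CNN template: the explicit time stepping, the step size `dt`, the zero-flux border handling, the use of the neighbour difference as the gradient magnitude, and the function names are all assumptions.

```python
import math


def g1(grad_mag, K):
    """Exponential conductivity from (2): exp(-(|grad I| / K)^2)."""
    return math.exp(-(grad_mag / K) ** 2)


def g2(grad_mag, K, alpha):
    """Rational conductivity from (2): 1 / (1 + (|grad I| / K)^(1 + alpha))."""
    return 1.0 / (1.0 + (grad_mag / K) ** (1.0 + alpha))


def constrained_diffusion_step(I, constraint, g, dt=0.2):
    """One explicit Euler step of (1) on the 4-neighbourhood.

    I          : image as a list of rows of floats
    constraint : same shape as I, plays the role of beta(I(x, t0))
    g          : callable mapping a difference magnitude to a conductivity
    Missing neighbours at the border contribute no flow (assumed zero flux).
    """
    h, w = len(I), len(I[0])
    out = [row[:] for row in I]
    for i in range(h):
        for j in range(w):
            flow = 0.0
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < h and 0 <= nj < w:
                    d = I[ni][nj] - I[i][j]
                    flow += g(abs(d)) * d  # discrete div(g grad I)
            # dI/dt = div(g grad I) + beta - I, rearranged from (1)
            out[i][j] = I[i][j] + dt * (flow + constraint[i][j] - I[i][j])
    return out
```

A call such as `constrained_diffusion_step(img, img, lambda d: g1(d, 0.1))` uses the image itself as the constraint; with a small K a unit step edge is left almost untouched while small fluctuations are smoothed.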

Formulation (1) makes it possible to force the output to remain close to a pre-defined local morphological constraint or to any image calculated by β(.). If β(.) is chosen properly, this generalization will not violate the causality assumption of the multi-scale image description, but gives a flexible practical framework in which different filtering strategies can be efficiently connected to the anisotropic diffusion. For example, one can define β(.) as the weighted average in a given neighborhood, which can be obtained by convolving a Gaussian with the original (initial) signal ( β(I(x, t0)) = Gσ ∗ I(x, t0) ). A possible CNN template reflecting these ideas is the following (the associated CNN model is derived using spatial discretization and simplification of the nonlinearities):

$$A = 0, \quad D = \begin{bmatrix} 0 & \Phi & 0 \\ \Phi & 0 & \Phi \\ 0 & \Phi & 0 \end{bmatrix}, \quad \Phi = g\, \Delta v_{xx}, \quad g = \begin{cases} 1 - \dfrac{|\Delta v_{xx}|}{2K} & \text{if } |\Delta v_{xx}| < 2K \\ 0 & \text{otherwise} \end{cases}, \quad B = \beta(\cdot) \tag{3}$$

where B calculates the local morphological constraint. Note that the contribution added by the B term can also be interpreted as a pre-calculated bias map of the anisotropic template. If the diffusion coefficients are not spatially varying and β(.) is defined as a simple low-pass filter, then from (1) we obtain a constrained linear diffusion. This robust diffusion scheme is thoroughly discussed in [4].

In contrast to the previous models, which were derived from a PDE formulation, a novel nonlinear diffusion formulation that is not PDE related was recently proposed [13]. The form assumed for the diffusion equation is as follows:

$$\frac{d}{dt} I_{ij} = J_e (I_{i,j-1} - I_{i,j}) + J_n (I_{i-1,j} - I_{i,j}) + J_w (I_{i,j+1} - I_{i,j}) + J_s (I_{i+1,j} - I_{i,j}) \tag{4}$$

Here the diffusion connection weights are functions of the region of the image within the window Nr, as shown in Fig. 1 (Nr is the union of four subwindows Nr,e, Nr,n, Nr,w, and Nr,s).

Let us define the local variances and the total variance:

$$V_e = \sum_{kl \in N_{r,e}} (I_{k,l} - I_{i,j})^2, \ \ldots, \quad V_t = V_e + V_n + V_w + V_s \tag{5}$$

[Figure 1: pixel grid with rows i-1, i, i+1 and columns j-1, j, j+1, showing the subwindows Nr,n, Nr,e, Nr,w and Nr,s for r=1 and r=2]
Figure 1 The four symmetric subwindows surrounding each pixel used in variance estimation. The eastern (Nr,e) subwindow is shaded with gray and the pixels involved in the estimation process (r=2) are drawn with black.

[Figure 2: image panels (a), (b), (c)]
Figure 2 Comparison of CNN based diffusion methods in filtering and segmentation, (a) original image corrupted by additive Gaussian (b) output of the constrained nonlinear diffusion derived from PDE formulation, (c) output of the nonlinear diffusion derived from non-PDE approach.

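The diffusion coefficients can now be defined:

$$J_e = \frac{V_w}{V_t}\left(1 - e^{-V_t/\lambda}\right) + 0.25\, e^{-V_t/\lambda}, \qquad J_w = \frac{V_e}{V_t}\left(1 - e^{-V_t/\lambda}\right) + 0.25\, e^{-V_t/\lambda},$$

$$J_n = \frac{V_s}{V_t}\left(1 - e^{-V_t/\lambda}\right) + 0.25\, e^{-V_t/\lambda}, \qquad J_s = \frac{V_n}{V_t}\left(1 - e^{-V_t/\lambda}\right) + 0.25\, e^{-V_t/\lambda} \tag{6}$$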
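The coefficient formulas in (6) and the right-hand side of (4) can be written down directly. The sketch below is only a software illustration under stated assumptions: the function names are hypothetical, the variances are passed in as plain numbers (the subwindow geometry of Fig. 1 is not reproduced), and the case Vt = 0 is handled by returning the limiting value 0.25 for every weight.

```python
import math


def diffusion_weights(Ve, Vn, Vw, Vs, lam):
    """Diffusion coefficients (Je, Jn, Jw, Js) following (6).

    Each weight uses the variance of the opposite subwindow, blended with the
    isotropic value 0.25 by the factor exp(-Vt / lam). lam must be positive.
    """
    Vt = Ve + Vn + Vw + Vs
    if Vt == 0.0:
        # Limit of (6) for vanishing total variance (assumed handling).
        return (0.25, 0.25, 0.25, 0.25)
    blend = math.exp(-Vt / lam)

    def weight(v_opposite):
        return (v_opposite / Vt) * (1.0 - blend) + 0.25 * blend

    return (weight(Vw), weight(Vs), weight(Ve), weight(Vn))


def diffusion_rate(I, i, j, Je, Jn, Jw, Js):
    """Right-hand side of (4) at an interior pixel (i, j).

    The pairing of weights and neighbours is copied from (4) as printed.
    """
    c = I[i][j]
    return (Je * (I[i][j - 1] - c) + Jn * (I[i - 1][j] - c)
            + Jw * (I[i][j + 1] - c) + Js * (I[i + 1][j] - c))
```

Because the four variance quotients sum to one, the four weights returned by `diffusion_weights` always sum to one as well, whatever the value of λ.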

Simply defining the diffusion coefficients as the quotient of the local and total variance would lead to a model that is extremely noise sensitive. The above formulation is a possible solution to the problem that arises when the total variance is relatively small and one of the local variances has a comparable value. Introducing the “variance scale factor” λ in (6), as in homotopy methods, avoids this problem: when Vt is much smaller than λ all four coefficients approach 0.25 (isotropic diffusion), and when Vt is much larger than λ they approach the variance quotients. Furthermore, it has been found that performance is relatively insensitive to the value of λ over a moderate range. The nonlinear diffusion model defined by (4)-(6) does not have an associated PDE (a constructive proof can be given based on [9]).

Using some simplifications it is possible to derive an iterative CNN approximation that has similar VLSI complexity to the PDE related models based on (1)-(3) and still keeps some favorable properties of the original model. Two simplifications seem to be necessary in the diffusion coefficient calculation: (i) local variance estimation by employing the absolute value function in the nearest neighborhood (these estimates can be calculated by simple nonlinear templates and stored in local analog memories), and (ii) replacing the normalization (division) with an iterative solution that forces the sum of the weight values to stay close to unity without significantly altering the relative ratio of these values. After the weight values have been shifted to the proper locations, the diffusion process can be run for some time T, controlled from the local analog memories. This is repeated in a cycle (n times), always updating the weight values before the diffusion is performed. Since the output of this nonlinear diffusion model converges to a nontrivial solution, the timing (nT) is not a critical issue (only a lower limit needs to be specified).

Though this algorithm does not need a complex nonlinear CNN template, it operates with space-variant linear templates (the diffusion is controlled from local analog memories), which places its VLSI complexity above that of the existing CNN Universal Chip implementations ([5]-[6], [18]-[20]). The major advantage of the non-PDE related model is that under heavy noise corruption it can reconstruct piecewise-constant signals better than the PDE related models (Fig. 2). Initial experiments also indicate that this method does not break down if the basic assumptions about the noise process (additive zero mean Gaussian) are violated (e.g. additive double-tailed exponential or multiplicative Gaussian noise).

The segmentation algorithm (see the flowchart in Fig. 3) consists of three computational blocks: (i) linear or nonlinear diffusion based pre-filtering, (ii) local threshold estimation, (iii) locally adaptive segmentation and edge detection.

The first block is a diffusion-type filter (DIFF_FILT) designed for noise reduction and edge enhancement. The second block computes a locally optimal threshold level combining the first order (mean) and second order (variance) image statistics. The mean is estimated through a constrained linear diffusion process (MEAN_EST), while the variance is approximated using the absolute value function in a nonlinear CNN template (VAR_EST). These two outputs are scaled (α, β) and added to calculate the bias map of the adaptive thresholding. In the last block adaptive thresholding (ADTHRES) gives the binary segmentation output. Sorting (SORTING) eliminates objects smaller than a specified size, and the edge map is obtained using binary morphology (EDGEDET). If α = 1 and β = 0 the main part of the algorithm reduces to a DoG type operator [7], since the difference of two diffusion outputs is thresholded. When α = 0 and β = 0 the method reverts to a fixed thresholding approach that employs only pre-filtering. Combining both statistical estimates (α ≠ 0, β ≠ 0) results in an efficient and robust tool for different image and noise models. The algorithm has been tested on a number of real images containing both bubbles and debris particles (e.g. Fig. 4).

[Figure 3: flowchart. Input: ORIGINAL IMAGE (a). Pre-filtering (linear or nonlinear diffusion based): DIFF_FILT (b). Local threshold estimation: MEAN_EST (c) scaled by α, VAR_EST (d) scaled by β, BMAP_CR (e). Locally adaptive segmentation and edge detection: ADTHRES (f), SORTING (g), EDGEDET (h). Output: EDGE MAP.]
Figure 3 Flowchart of the segmentation algorithm

[Figure 4: image panels (a)-(h)]
Figure 4 Intermediate processing results, (a) original image, (b) nonlinear diffusion, (c) mean estimate, (d) variance estimate, (e) bias map, (f) segmentation, (g) sorting, (h) edge detection.
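The bias-map thresholding of the second and third blocks can be sketched in software as follows. This is a simplified stand-in, not the CNN templates: a 3x3 box average replaces the constrained linear diffusion of MEAN_EST, a mean absolute difference in the same window replaces VAR_EST, and the fixed offset `base`, the comparison direction (bright objects) and all function names are assumptions.

```python
def window(img, i, j):
    """Values in the 3x3 window around (i, j), clipped at the image border."""
    h, w = len(img), len(img[0])
    return [img[a][b]
            for a in range(max(0, i - 1), min(h, i + 2))
            for b in range(max(0, j - 1), min(w, j + 2))]


def mean_est(img):
    """Local mean: 3x3 box average (stand-in for MEAN_EST)."""
    out = []
    for i in range(len(img)):
        row = []
        for j in range(len(img[0])):
            win = window(img, i, j)
            row.append(sum(win) / len(win))
        out.append(row)
    return out


def var_est(img):
    """Local spread: mean absolute difference to the window (stand-in for VAR_EST)."""
    out = []
    for i, img_row in enumerate(img):
        row = []
        for j, v in enumerate(img_row):
            win = window(img, i, j)
            row.append(sum(abs(v - n) for n in win) / len(win))
        out.append(row)
    return out


def adaptive_threshold(filtered, mean, var, alpha, beta, base=0.0):
    """Binary output: 1 where the pre-filtered image exceeds the bias map.

    Bias map = base + alpha * mean + beta * var. With alpha = beta = 0 this is
    a fixed threshold at 'base'; with alpha = 1, beta = 0 the pre-filtered
    image is compared against its local mean.
    """
    return [[1 if filtered[i][j] > base + alpha * mean[i][j] + beta * var[i][j] else 0
             for j in range(len(filtered[0]))]
            for i in range(len(filtered))]
```

Passing the statistics in as separate maps keeps the sketch neutral about which image (original or pre-filtered) they are estimated from.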

Artificial test images with a global histogram similar to that of the original images were also generated according to the general image model. The performance and robustness of the algorithm were tested by altering the image and noise model parameters (shifting the object gray-scale levels, changing the background illumination, violating the additive Gaussian noise assumption etc.). These experiments showed that the locally adaptive (nonlinear) method performs better than a globally optimal fixed thresholding, edge detectors based on DoG-type operators, and Canny’s edge detector [8].

4. Object Recognition Using the Autowave Metric on CNN
The algorithm designed for object recognition consists of two major blocks (see the flowchart in Fig. 5): (i) center point detection based on binary morphology, and (ii) object recognition based on the autowave metric. In the first step a so-called masked erosion operation is implemented in CNN using the templates of binary morphology [15]. The result of this procedure is a set of points estimating the object centers.

[Figure 5: flowchart. Recoverable labels: IN, Center point detection (n steps, Mmask, CenterPoints), Object radii estimation, Controlled Autowave (Mmask), Bubble Models, Comparison, Classification, OUT: Debris particles.]
Figure 5 Flowchart of the bubble-debris classification algorithm containing center point detection using binary morphology and bubble model generation and classification via autowaves. Masked erosion (M1) is employed to extract the center points. M3 contains information related to radii of the objects. This image controls the autowave when bubble models are grown around the center points.

[Figure 6: image panels (a)-(e)]
Figure 6 Consecutive steps of the bubble-debris classification algorithm. (a) original gray-scale image, (b) adaptive segmentation, (c) center points, (d) bubble models, (e) detected debris particle
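The bubble models of Fig. 5 are unions of circles placed at the detected center points. As a plain software stand-in for growing them with a controlled autowave, the sketch below simply rasterises filled disks; the function name `bubble_model`, the Euclidean disk test and the list-of-rows image format are assumptions for illustration.

```python
def bubble_model(shape, centers, radii):
    """Binary image containing the union of filled disks.

    shape   : (height, width) of the output image
    centers : list of (row, column) center points
    radii   : list of radii, one per center
    """
    h, w = shape
    model = [[0] * w for _ in range(h)]
    for (ci, cj), r in zip(centers, radii):
        for i in range(h):
            for j in range(w):
                # Pixel belongs to the disk if it lies within radius r of the center.
                if (i - ci) ** 2 + (j - cj) ** 2 <= r * r:
                    model[i][j] = 1
    return model
```

For example, `bubble_model((5, 5), [(2, 1), (2, 3)], [1, 1])` produces two overlapping disks that share the pixel (2, 2), a small instance of a double-bubble model.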

In the next step the autowave approach is used to generate bubble models around the detected centers and to use them for pattern matching. Autowaves represent a particular class of nonlinear waves which spread in active media at the expense of the energy stored in the medium [16]. They can be described by a PDE of the form:

$$\frac{\partial u}{\partial t} = D\left[ \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} \right] + f(u) \tag{7}$$

Here, for an image, ∂u/∂t is the rate of change of the intensity values u. It is induced by f(u) plus the diffusion term D(∂²u/∂x² + ∂²u/∂y²). We have focused on the simplest type of autowaves, called trigger waves, in which a transition of the state of a cell can propagate through the system.

The autowave approach can be applied to the problem of image classification or recognition via comparison with prototypes (pattern matching). This comparison requires measuring the coincidence of two different overlapping point sets. One possibility is to compute the Hamming distance between the point sets. Another known distance is the Hausdorff metric, which is more tolerant to shift and noise [17]. A (nonlinear) variant of the latter, called the autowave metric, provides an increased tolerance to noise effects [16]. In our problem this latter approach was taken, with the hypothesis that an object is a group of one or more overlapping bubbles, meaning that these objects are set unions of circles. In the classification algorithm these models are generated from the center points and compared to the original objects. A number of experiments showed that this recognition method is more robust than a classification based simply on roughness measurement or on calculating the Hamming distance.
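The difference between the Hamming distance and the Hausdorff distance on point sets can be seen in a small sketch. It implements only these two textbook distances, not the autowave metric of [16]; the set-of-tuples representation and the function names are assumptions, and both sets must be non-empty for the Hausdorff functions.

```python
import math


def hamming(a, b):
    """Number of points that belong to exactly one of the two sets."""
    return len(a ^ b)


def directed_hausdorff(a, b):
    """Largest distance from a point of a to its nearest point in b."""
    return max(
        min(math.hypot(p[0] - q[0], p[1] - q[1]) for q in b)
        for p in a
    )


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# A horizontal line of ten pixels and the same line shifted down by one row.
line = {(0, j) for j in range(10)}
shifted = {(1, j) for j in range(10)}

# The same line with a single stray pixel nine rows away.
with_outlier = line | {(9, 0)}
```

For `line` and `shifted` the Hamming distance counts all twenty pixels as mismatches while the Hausdorff distance is one pixel, which illustrates the shift tolerance mentioned above. For `line` and `with_outlier` the roles are reversed: a single stray pixel changes the Hamming distance by one but sets the Hausdorff distance to nine, the kind of noise sensitivity that motivates a more tolerant variant.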

5. Implementation on a CNN Universal Chip
The analogic CNN algorithm designed for object segmentation and recognition can be implemented on a future version of a CNN Universal Chip ([6]-[19]). The solution discussed requires the so-called bias map and fixed-state map techniques, two additional local analog memories (LAMs) at the cell level, and nonlinear cell interactions. The VLSI implementation complexity of the solution depends mainly on the implementation of the diffusion models and the autowave generation, since these are the only building blocks that require nonlinear templates. By simplifying these models (e.g. using constrained linear diffusion in segmentation and trigger waves generated by linear templates in recognition), the algorithm can be tested on gray-scale input-output chips once the less complex functionalities mentioned above are available.

6. Conclusions
We have described a methodology for designing analogic CNN algorithms for object segmentation and classification implemented on the CNN Universal Machine. PDE and non-PDE related diffusion approaches were presented and discussed. The classification problem is solved by implementing the autowave metric, an approach superior to classification methods based simply on Hamming distance calculation. The algorithm was tested on real and artificial images within the framework of the bubble-debris classification experiments.

This work has been supported by a grant of the ONR (No. N00014-89-J-1402) and a grant of the NSF (No. INT-9413186). The fruitful discussions with Tibor Kozek are gratefully acknowledged.

7. References
[1] L. O. Chua and L. Yang, "Cellular Neural Networks: Theory and Applications", IEEE Trans. on Circuits and Systems, Vol. 35, pp. 1257-1290, Oct. 1988.
[2] L. O. Chua and T. Roska, "The CNN Paradigm", IEEE Trans. on Circuits and Systems, Vol. 40, pp. 147-156, March 1993.
[3] T. Roska and L. O. Chua, "The CNN Universal Machine", IEEE Trans. on Circuits and Systems, Vol. 40, pp. 163-173, March 1993.
[4] K. R. Crounse and L. O. Chua, "Methods for Image Processing in Cellular Neural Networks: A Tutorial", IEEE Trans. on Circuits and Systems, Vol. 42, No. 10, pp. 583-601, October 1995.
[5] J. M. Cruz, L. O. Chua, and T. Roska, "A Fast, Complex and Efficient Test Implementation of the CNN Universal Machine", in Proc. CNNA’94, pp. 61-66, Rome, 1994.
[6] S. Espejo, R. Carmona, R. Domínguez-Castro, and A. Rodríguez-Vázquez, "CNN Universal Chip in CMOS Technology", Int. Journal of Circ. Theory and Appl., Vol. 24, pp. 93-111, 1996.
[7] D. Marr and E. Hildreth, "Theory of Edge Detection", Proc. Roy. Soc., pp. 187-217, 1980.
[8] J. Canny, "A Computational Approach to Edge Detection", IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. PAMI-8, pp. 679-698, Nov. 1986.
[9] J. P. Keener, "Propagation and its Failure in Coupled Systems of Discrete Excitable Cells", SIAM J. Appl. Math., 47, 556-572.
[10] P. Perona and J. Malik, "Scale-Space and Edge Detection Using Anisotropic Diffusion", IEEE Trans. on PAMI, Vol. 12, pp. 629-639, July 1990.
[11] N. Nordström, "Biased Anisotropic Diffusion - A Unified Regularization and Diffusion Approach to Edge Detection", Image and Vision Computing, Vol. 8, No. 4, pp. 318-327, July 1990.
[12] F. Catté, P. L. Lions, J. M. Morel, and T. Coll, "Image Selective Smoothing and Edge Detection by Nonlinear Diffusion", SIAM Journal on Numerical Analysis, Vol. 29, No. 1, pp. 182-193, 1992.
[13] A. Schultz, "A Nonlinear Diffusion Method for Image Segmentation", Technical Report, Radar Division, Naval Research Laboratory, 1993.
[14] Cs. Rekeczky, T. Roska, and A. Ushida, "CNN Based Difference-Controlled Adaptive Nonlinear Image Filters", International Journal of Circuit Theory and Applications, in press, 1998.
[15] Á. Zarándy, A. Stoffels, T. Roska, F. Werblin, and L. O. Chua, "Implementation of Binary and Gray-Scale Mathematical Morphology on the CNN Universal Machine", Memo No. UCB-ERL, Univ. of Cal. Berkeley, 96/19, 1996.
[16] V. Biktashev, V. Krinsky, and H. Haken, "A wave approach to pattern recognition (with application to optical character recognition)", Int. Journal of Bifurcation and Chaos, Vol. 4, No. 1, pp. 193-207, 1994.
[17] D. P. Huttenlocher, G. A. Klanderman, and W. Rucklidge, "Comparing Images Using the Hausdorff Distance", IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 15, No. 9, September 1993.
[18] S. Espejo, R. Carmona, R. Domínguez-Castro, and A. Rodríguez-Vázquez, "CNN Universal Chip in CMOS Technology", International Journal of Circuit Theory and Applications, Special Issue on CNN II: Part I, Vol. 24, pp. 93-111, 1996.
[19] R. Domínguez-Castro, S. Espejo, A. Rodríguez-Vázquez, and R. Carmona, "A 0.8 µm CMOS 2-D Programmable Mixed-Signal Focal-Plane Array Processor with On-Chip Binary Imaging and Instructions Storage", IEEE J. Solid State Circuits, pp. 103-1026, July 1997.
[20] J. M. Cruz and L. O. Chua, "A 16x16 Cellular Neural Network Universal Chip: the First Complete Single-chip Dynamic Computer Array with Distributed Memory and with Gray-scale Input-output", Journal Analog Integrated Circuits and Signal Processing, in press, 1997.
