On-Line Visual Learning Method for Color Image Segmentation and Object Tracking

Takayuki Nakamura∗

Tsukasa Ogasawara

Nara Institute of Science and Technology, Dept. of Information Systems
8916-5, Takayama-cho, Ikoma, Nara 630-0101, Japan
∗E-mail: [email protected]

Abstract

To keep visual tracking systems based on color segmentation running in real environments, an on-line learning method is needed to update their models so that they adapt to dynamic changes in the surroundings. To deal with this problem, we propose an on-line visual learning method for color image segmentation and object tracking in dynamic environments. Our method utilizes the fuzzy ART model, a neural network for competitive learning whose mechanism is suited to on-line learning, unlike that of backpropagation-type neural networks. In order to use the fuzzy ART model for on-line color segmentation, we transform the color signal produced by the frame grabber into a particular color space called the Yrθ space. To show the validity of our method, we present results of experiments using sequences of real images.

Keywords: On-line visual learning, Fuzzy ART model, Segmentation, Tracking

1 Introduction

Robust visual tracking is indispensable for building up a vision-based robotic system. The hard problem in visual tracking is performing fast and reliable matching of the target in every frame. A variety of tracking techniques and algorithms have been developed. Among them, color-based image segmentation and tracking algorithms seem practical and robust in the real world, because color is comparatively insensitive to changes in scene geometry and to occlusion.

To date, there has been some research on practical color tracking algorithms. Du and Crisman [1] proposed a simple color tracking algorithm. Their method takes a set of arbitrary representative colors in RGB space and constructs membership volumes for objects based on nearest-neighbor calculation. The objects are characterized by their histograms over these volumes. Wren et al. [2] proposed Pfinder, a specialized system for tracking people. Pfinder uses a statistical characterization of color variation in a static image to perform color-based change detection. Rasmussen et al. [3] proposed a color blob tracking technique based on color image segmentation. Their method describes the sample color distribution by an ellipsoidal model. The shape of this ellipsoid defines a static membership function that is used to track objects under varying levels of illumination. To find the parameters of the ellipsoidal model, their method needs to perform principal component analysis of sample colors every time a target appears in the scene. In this way, most existing methods need a bootstrap process to model the sample color distribution for image segmentation. That is, in order to calibrate their statistical model for image segmentation, these visual tracking systems have to wait for enough color data to accumulate, even when they operate in a dynamic environment.

In order to keep visual tracking systems running in real environments, an on-line learning method for acquiring the models used for image segmentation should be developed. This paper proposes such an on-line visual learning method for color image segmentation and object tracking in dynamic environments; this capability is indispensable for a vision-based system that keeps running in the real world. To realize on-line learning, our method utilizes the fuzzy ART model [4], a neural network for competitive learning. Although the color images we deal with are represented in the YUV color space, YUV space is not suitable as input to the fuzzy ART model. For this reason, the YUV signal is transformed into another color space, which enables the fuzzy ART model to segment color images on-line. As a result, even if the surroundings, such as the lighting condition, change, our on-line visual learning method can perform color image segmentation and object tracking correctly. To show the validity of our method, we present experimental results in which sequences of real images are used as inputs.

2 Visual Tracking System with On-line Learning Capability

Figure 1: An architecture of our system

Fig.1 shows the configuration of our visual tracking system. The main part of the system is the area surrounded by a dotted line in this figure. As shown in Fig.1, our system consists mainly of two subsystems. One is the on-line learning system based on the fuzzy ART model, which constructs, matches and updates the color models of objects. The other is the signal transform system, which generates a color signal suitable as input to the learning system; in practice, it transforms the YUV color signal of the input image into a Yrθ color signal.

Visual tracking proceeds as follows. First, at time t = 0, we specify the target regions we would like to extract as closed regions on an initial color image. Initial color models of the target regions are constructed from the color information inside these closed regions. At the same time, spatial models of the target regions are constructed from the locations of these closed regions. During tracking, the input color image at time t = k is first transformed into the Yrθ signal. This color signal is then compared with the color models of the specified targets in the on-line learning system. Pixels that match the color models are compared with the spatial models of the targets, and pixels that also match the spatial model at time t = k are treated as the observation of the target position. Based on this observation, the position of the target is estimated and updated using a Kalman filter in the same way as described in [2]. The color models at time t = k are obtained by updating the color models at time t = k − 1 using the color information in the target regions at time t = k.

In the following subsections, we first explain the fuzzy ART system, which is the main part of our learning system, following the literature [4]. Next, we present how the YUV signal is transformed into the Yrθ signal. Finally, we show how the spatial model of the tracked target is updated.
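The per-frame flow above can be summarized in code. The following sketch is only illustrative: the names (track_sequence, build_color_model, match_color_model, update_color_model) are placeholders for the components described in Sections 2.1–2.3, and the fixed gain stands in for the Kalman filter gain of [2].

```python
import numpy as np

def track_sequence(frames, initial_mask, build_color_model,
                   match_color_model, update_color_model, gain=0.5):
    """frames: iterable of color images (H x W x 3); initial_mask: boolean
    H x W mask of the target region specified by hand at t = 0."""
    frames = iter(frames)
    first = next(frames)
    color_model = build_color_model(first[initial_mask])      # initial color model (t = 0)
    estimate = np.argwhere(initial_mask).mean(axis=0)         # initial (row, col) estimate
    for frame in frames:                                      # t = 1, 2, ...
        support = match_color_model(frame, color_model)       # pixels consistent with the model
        ys, xs = np.nonzero(support)
        if xs.size == 0:
            continue                                          # target lost in this frame
        observation = np.array([ys.mean(), xs.mean()])        # observed mean location
        estimate = estimate + gain * (observation - estimate) # simplified Kalman-style correction
        color_model = update_color_model(color_model, frame[support])
    return estimate
```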

2.1 On-line learning system based on Fuzzy ART model

A fuzzy ART model consists of three fields F0, F1 and F2. In the field F0, the current input vector is represented by an internal node. In F1, competition occurs between the bottom-up input from F0 and the top-down input from F2. In F2, each category is represented by an internal node of the field. First of all, we explain some preconditions required to use the fuzzy ART model.

Input vector: Let a = (a1, · · ·, aM) be an M-dimensional input vector to the fuzzy ART model. The value of each component ai is normalized so that it takes values in the interval [0, 1].

Complement coding: The input vector a is transformed so that its amplitude takes a constant value in the fuzzy ART model. Such a transformation is called "complement coding." The complementary value of ai is defined as aci ≡ 1 − ai. With this complement coding, an input vector I to the learning system is represented by the 2M-dimensional vector I = (a, ac) ≡ (a1, · · ·, aM, ac1, · · ·, acM). For example, if the input set consists of two-dimensional vectors, the input vector I in the fuzzy ART model is the four-dimensional vector I = (a, ac) = (a1, a2, 1 − a1, 1 − a2). Without this rule, as the size of the input vector varies, the size of a category in F2, that is, of its prototype vector, can become arbitrarily small or large; the representation keeps the size of the input vectors constant. Indeed, the size of the input vector |I| is preserved:

|I| = |(a, ac)| = Σi ai + (M − Σi ai) = M,

where the sums run over i = 1, · · ·, M.
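As a quick numerical check of this property, the following snippet (our own illustration, using a two-dimensional input as in the example above) applies complement coding and verifies that the coded vector always has city-block norm M.

```python
import numpy as np

def complement_code(a):
    """a: 1-D array with components in [0, 1]; returns I = (a, a^c) = (a, 1 - a)."""
    a = np.asarray(a, dtype=float)
    return np.concatenate([a, 1.0 - a])

a = np.array([0.2, 0.9])                     # M = 2
I = complement_code(a)                       # -> [0.2, 0.9, 0.8, 0.1]
assert np.isclose(np.abs(I).sum(), a.size)   # |I| = M for every input a
```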

Weight vector: Each category j is described by a 2M-dimensional weight vector wj = (uj, vcj). The number of categories N (j = 1 ∼ N) is arbitrary. Each of uj and vj is an M-dimensional vector. uj corresponds to one vertex of a hyper-rectangle Rj, and vj corresponds to the diagonally opposite vertex. In the case M = 2, as shown in Fig.3 (a), each category j can be represented by a rectangle Rj. As described later, the hyper-rectangles grow during the learning process, but the size of Rj cannot grow without bound. For any input vector, the size of Rj, that is, |Rj| = |vj − uj|, satisfies the constraint |Rj| ≤ (1 − ρ)M, where ρ is called the "vigilance parameter." Initially, all weight components are set to 1, i.e. wj1(0) = · · · = wj,2M(0) = 1, and all categories are said to be uncommitted. Once a category is selected during the learning process, it becomes committed.

Parameters: The learning in the fuzzy ART model is controlled by a choice parameter α > 0, a learning rate parameter β ∈ [0, 1] and a vigilance parameter ρ ∈ [0, 1].

Figure 2: An architecture of fuzzy ART model. [The input a is presented to F0, the complement-coded vector I = (a, ac) is passed to F1, and F2 holds the category nodes with weights wj = (uj, vcj); a reset signal is sent back when the vigilance test fails.]
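To make the weight/rectangle correspondence concrete, the small example below (our own illustration, not from the paper) recovers the corners of Rj from a stored weight vector and checks the vigilance bound |Rj| ≤ (1 − ρ)M.

```python
import numpy as np

M, rho = 2, 0.8
u_j = np.array([0.30, 0.40])               # one corner of the rectangle R_j
v_j = np.array([0.40, 0.50])               # diagonally opposite corner
w_j = np.concatenate([u_j, 1.0 - v_j])     # stored weight w_j = (u_j, v_j^c)

# Recover the corners from the weight and measure the city-block size |R_j|.
u_rec, v_rec = w_j[:M], 1.0 - w_j[M:]
size = np.abs(v_rec - u_rec).sum()         # |R_j| = |v_j - u_j| = 0.2
assert size <= (1.0 - rho) * M             # 0.2 <= 0.4: allowed under rho = 0.8
```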

Based on these preconditions, the learning of the fuzzy ART model takes place as follows.

1 Category choice: For each input I and each category j in F2, the choice function Tj(I) is defined by

Tj(I) = |I ∧ wj| / (α + |wj|),

where, for any M-dimensional vectors p and q, the operator ∧ is defined by (p ∧ q)i = min(pi, qi), and the norm |·| is defined by |p| = Σi |pi|. For the input I, a category J is selected according to

TJ = max{Tj : j = 1, · · ·, N}.

If more than one Tj is maximal, the category j with the smallest index is chosen. In this way, categories become committed in the order j = 1, 2, 3, · · ·.

2 Resonance or reset: If the match function |I ∧ wJ| / M of the selected category J meets the vigilance criterion, that is, if |I ∧ wJ| ≥ ρM, "resonance" is said to occur, and category J becomes the candidate whose weight vector wJ is updated as described below. If, on the contrary, |I ∧ wJ| < ρM, a "mismatch reset" occurs: the value of the choice function TJ is reset to 0 so that the same category is not selected again during the search. This search process continues until a chosen J leads to resonance. If several categories J maximize the choice function, the search is repeated for each such index J.

3 Learning: When a category is selected as the resonance category, its weight vector wJ is updated according to

wJ(new) = β(I ∧ wJ(old)) + (1 − β) wJ(old).

In this way, learning is performed. The case β = 1 is called fast learning.

Steps 1, 2 and 3 are repeated for all input vectors. Fig.3 (b) gives a geometrical explanation of how the weight vector wJ is updated during the learning process. As mentioned before, in a fuzzy ART model with complement coding, a weight vector wj corresponds to a hyper-rectangle Rj. Suppose an input vector a is assigned to category J; then the region RJ expands to RJ ⊕ a, the minimum rectangle containing both RJ and a. The corners of RJ ⊕ a are given by a ∧ uJ and a ∨ vJ, where the operator ∨ is defined by (p ∨ q)i ≡ max(pi, qi).

Figure 3: Weight representation of Fuzzy ART. [Part (a) shows a category j drawn as a rectangle Rj with corners uj and vj in the unit square; part (b) shows how RJ expands to RJ ⊕ a when an input a outside RJ is assigned to category J.]

According to the original fuzzy ART algorithm, many mutually overlapping categories are generated. Among them, there are categories whose hyper-rectangles are very small or which contain only a few samples. In this paper, such extra categories are excluded according to thresholds on the size of the hyper-rectangle and on the number of sample data. After excluding these extra categories, the remaining categories are treated as the result of the fuzzy ART model. In this paper, it is assumed that the color model of a target can be represented by several categories.
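The listing below is a compact sketch of one fuzzy ART presentation (category choice, vigilance test, weight update) together with the category pruning described above. It follows the equations in this subsection, but the data layout, the search by descending choice value, and the pruning thresholds are our own illustrative choices rather than the authors' implementation.

```python
import numpy as np

def fuzzy_art_step(I, weights, counts, rho=0.9, alpha=0.001, beta=1.0):
    """Present one complement-coded input I (length 2M) to the committed
    categories in `weights`; returns the index of the resonating category."""
    M = I.size // 2
    if weights:
        T = np.array([np.minimum(I, w).sum() / (alpha + w.sum()) for w in weights])
        for j in np.argsort(T)[::-1]:                  # search in order of decreasing T_j
            if np.minimum(I, weights[j]).sum() >= rho * M:   # resonance
                weights[j] = beta * np.minimum(I, weights[j]) + (1.0 - beta) * weights[j]
                counts[j] += 1
                return j
            # otherwise: mismatch reset, try the next-best category
    weights.append(I.copy())                           # no resonance: commit a new category
    counts.append(1)
    return len(weights) - 1

def prune_categories(weights, counts, min_size=0.01, min_count=5):
    """Drop 'extra' categories whose hyper-rectangle is very small or which
    contain few samples (the thresholds here are placeholders)."""
    kept = [(w, c) for w, c in zip(weights, counts)
            if np.abs((1.0 - w[w.size // 2:]) - w[:w.size // 2]).sum() >= min_size
            and c >= min_count]
    return [w for w, _ in kept], [c for _, c in kept]
```

In use, the complement-coded pixels of a specified target region would be fed one by one to fuzzy_art_step, and prune_categories applied afterwards to obtain the color model of that target.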

2.2 Color Space Transformation for On-Line Visual Learning

Although our frame grabber yields a YUV color signal, YUV space is not suitable as input to the fuzzy ART model. As shown in Fig.4 (a), even when two sets of data have different color characteristics, they may be distributed in almost the same part of YUV color space. It is therefore difficult to divide such distributions into two regions of uniform color using the hyper-rectangles that represent categories in the fuzzy ART model. In order to make the data distribution suitable as input to the fuzzy ART model, it is transformed into a particular color space, called the Yrθ color space, in which the distributions become separable by planes parallel to the principal axes. With this transformation, the resulting color signal in Yrθ can be used as input to the fuzzy ART model.

Figure 4: YUV and Yrθ color space. [Part (a) shows sample data plotted in YUV space; part (b) shows the same data after transformation into Yrθ values.]

Figure 5: Transformation of YUV color space to Yrθ space. [A point (u, v) in the UV plane is mapped to its saturation r and hue θ.]

We define the transformation from YUV color space to Yrθ as follows: the luminance y is left unchanged,

r = √(u² + v²),   θ = arctan(u / v),

where r and θ indicate the saturation and hue components, respectively. As shown in Fig.5, the neighborhood of a color (u, v) forms an arc-shaped region in the UV plane. In contrast, the same neighborhood, that is, a cluster of similar colors, forms a rectangular region in the rθ plane. Fig.4 (b) shows the result of transforming the YUV values of Fig.4 (a) into Yrθ values.
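A per-pixel version of this transform is sketched below. The arctan2 form and the scaling of the three components into [0, 1] (needed because fuzzy ART inputs must lie in the unit interval) are our own assumptions about the signal ranges; the paper does not specify them.

```python
import numpy as np

def yuv_to_yrtheta(yuv):
    """yuv: float array of shape (..., 3), with Y in [0, 255] and U, V roughly
    in [-128, 127]; returns (y, r, theta) with each component scaled to [0, 1]."""
    y, u, v = yuv[..., 0], yuv[..., 1], yuv[..., 2]
    r = np.hypot(u, v)                     # saturation r = sqrt(u^2 + v^2)
    theta = np.arctan2(u, v)               # hue theta = arctan(u / v), full-circle version
    return np.stack([y / 255.0,
                     r / (128.0 * np.sqrt(2.0)),
                     (theta + np.pi) / (2.0 * np.pi)], axis=-1)
```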

2.3 Updating a spatial model

We represent the location of a target by a cluster of two-dimensional (hereafter, 2D) points with 2D spatial mean µ and covariance matrix σ. For computational convenience, this can be interpreted as a 2D Gaussian model. Under this representation, µ corresponds to the center of gravity of the target region; in this paper, we call µ the "mean location."

First, we define a fitness function Φtarget(x, y) at a pixel (x, y) as follows:

Φtarget(x, y) = 1 if C(x, y) ∈ CMtarget, and 0 otherwise,

where C(x, y) and CMtarget denote the Yrθ value at the pixel (x, y) and the color model of the target region, respectively. This fitness function indicates whether the color model of the target supports the pixel or not. Based on Φtarget(x, y), the mean location (xtarget, ytarget) of the target at time t is calculated as

xtarget = Σ(xi,yi)∈R xi Φtarget(xi, yi) / Σ(xi,yi)∈R Φtarget(xi, yi),
ytarget = Σ(xi,yi)∈R yi Φtarget(xi, yi) / Σ(xi,yi)∈R Φtarget(xi, yi),

where R denotes the search area of the target in the image. Initially, or when the target is lost, R is the entire image plane. After an initial estimate of the target location has been obtained, the standard deviations σ(xtarget) and σ(ytarget) of (xtarget, ytarget) are known. Based on these deviations, R is restricted during tracking to a local region centered on the mean location of the target:

R : {(x, y) | xtarget − 2.5σ(xtarget) ≤ x ≤ xtarget + 2.5σ(xtarget), ytarget − 2.5σ(ytarget) ≤ y ≤ ytarget + 2.5σ(ytarget)}.

The sum Σ(xi,yi)∈R Φtarget(xi, yi) gives the area of the target in the image, and we use this value to judge whether the target is visible. If it falls below a pre-defined threshold, the target is considered lost, and R is reset to the entire image plane for the estimation at the next time step. We set this threshold for the target area to 0.05 · S, where S is the area of the entire image. Restricting R in this way helps to reduce the computational cost of extracting regions with similar color.

In the same way as described in [2], the best estimate X̂t|t = (x̂target, ŷtarget) of the target location at time t is calculated with the Kalman filter as

X̂t|t = X̂t|t−1 + Ĝt (Ŷt − X̂t|t−1),

where Ŷ is the mean location of the target in the current image and Ĝ is the Kalman filter gain matrix, assuming simple Newtonian dynamics.
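The following sketch mirrors this spatial-model update: it computes the mean location over the search region R from the fitness map, derives the 2.5σ window for the next frame, and applies the lost-target test. The function and argument names are our own; the fitness map would come from matching pixels against the fuzzy ART color model, and the Kalman correction above would then be applied to the returned mean location.

```python
import numpy as np

def update_spatial_model(fitness, window=None, lost_ratio=0.05):
    """fitness: boolean H x W map of Phi_target(x, y); window: (x0, x1, y0, y1)
    search region R, or None for the entire image plane."""
    h, w = fitness.shape
    region = np.zeros_like(fitness)
    if window is None:
        region[:] = fitness
    else:
        x0, x1, y0, y1 = window
        region[y0:y1, x0:x1] = fitness[y0:y1, x0:x1]
    area = region.sum()                               # sum of Phi over R = target area
    if area < lost_ratio * fitness.size:              # below 0.05 * S: target is lost
        return None, None                             # next search covers the whole image
    ys, xs = np.nonzero(region)
    x_t, y_t = xs.mean(), ys.mean()                   # mean location (x_target, y_target)
    sx, sy = xs.std(), ys.std()                       # sigma(x_target), sigma(y_target)
    new_window = (int(max(x_t - 2.5 * sx, 0)), int(min(x_t + 2.5 * sx, w)),
                  int(max(y_t - 2.5 * sy, 0)), int(min(y_t + 2.5 * sy, h)))
    return (x_t, y_t), new_window
```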

3 Experimental Results

Figure 6: Our vision-based mobile robot. [The robot integrates a SONY EVI D30 camera, an IBM Smart Capture Card II, a Toshiba Libretto 100/32M notebook, a WaveLAN PCMCIA card, a motor driver board, tactile sensors, and a NIKKO BLACK BEAST chassis, connected through the notebook's serial and parallel ports.]

As shown in Fig.6, we have developed a vision-based mobile robot which is compact and cheap [5]. For the experiments, we used the visual sensing system of this mobile robot. The visual sensing system consists of two parts. One is the IBM Smart Capture Card II (hereafter SCCII), a video capture PCMCIA card which can easily be plugged into a portable PC. The other is a SONY EVI D30 (hereafter EVI-D30), a color CCD camera with a motorized pan-tilt unit. In the experiments, all processing, including image capture and image processing, is performed by the CPU (MMX Pentium 166MHz) of a Toshiba Libretto 100.

Using these vision-based mobile robots, we are developing robotic soccer players with on-board visual sensors, like human soccer players. To deal with the robotic soccer task, our robots first have to discriminate a ball, goals, white lines, teammates and opponents. Currently, our mobile robots discriminate such objects based on color information. They accomplish the visual tracking and discrimination tasks by the following procedure. First, we specify the target regions for the objects that will appear in the perceived images. Based on the color information in the specified regions, color models of the objects are constructed. Using these color models, an input image is segmented into several regions corresponding to the objects. After color segmentation, spatial models of the regions required for tracking are constructed from the segmented regions. Using these spatial models, the coordinates of the centers of the objects are estimated with the Kalman filter method.

Fig.8 shows the performance of color segmentation by our on-line visual learning system in the real environment. In this figure, parts (a), (b), (c) and (d) show an input color image, the distribution of Yrθ values of the input image, the color models of the targets produced by our method, and the result of segmentation based on these color models, respectively. The size of an input image is 80 × 60 pixels. In this experiment, we specified a red ball, a yellow goal, the green field, and a purple marker as the targets that our robots should discriminate. Table 1 shows the processing time and the number of categories in the fuzzy ART model required for extracting the objects: 5 categories are generated for the ball, 3 for the marker, 9 for the field, and 6 for the goal, 23 categories in total.

Table 1: The processing time and the number of categories
  time (msec): 206
  # of categories: 23 (= 5 + 3 + 9 + 6)

Fig.9 shows a sequence of tracking a ball being dribbled by a robotic soccer player. The sequence consists of 4 snapshots; the leftmost and the rightmost snapshots are taken at the start and at the end of the sequence, respectively. The top left and top right of each snapshot show an input color image and the result of segmentation based on the color model of the ball, respectively. In the top-left image, a black rectangle and a gray filled rectangle indicate the detected size and location of the ball. The bottom of each snapshot shows the distribution of Yrθ values of the input image and the color model of the ball produced by our method. At the start of the displayed sequence, the ball is detected by 4 categories in Yrθ space, which form the initial color model of the ball. As time goes on and the ball approaches our visual sensing system, the color model of the ball is updated, because the lighting condition changes with the location on the field. At the end of the sequence, the ball is correctly detected by 7 categories, as shown at the bottom of the last snapshot.

To show the usefulness of our on-line visual learning system in the real environment, we compare our tracking algorithm with and without the on-line learning process. Fig.10 shows the result of this comparison, where the tracking target is a red glossy file that a person is swinging. The sequence consists of 3 snapshots (a), (b) and (c), each of which includes a comparison of the detected results. At the beginning, the two detected results are the same because the categories used to detect the region are the same. As time goes by, a distinct gap between the two results appears: the tracking algorithm with the on-line learning process updates the categories for detecting the red file as its appearance changes due to its surface reflectance. At the end, when the appearance of the red file becomes bright, the tracking algorithm without learning can no longer detect it, whereas the algorithm with learning still can, because it has updated its categories during the tracking sequence.

Fig.7 shows the change in the number of categories during tracking by our method with the on-line learning process; it also shows the convergent property of our on-line learning. As shown in this figure, the number of categories converges at about 44. This means that our learning process converges once the sampled data for detecting a target are sufficient.

Figure 7: Change of the number of categories during tracking by our method. [The horizontal axis is the time step (x 15 sec).]

4 Discussion and Conclusion

Generally, the fuzzy ART model has many of the advantages a learning algorithm should have. For example, the fuzzy ART architecture has mechanisms for on-line and unsupervised learning and can be applied to a non-stationary world [4]. However, a learning algorithm based on the fuzzy ART architecture can be applied only to an input feature space in which the distribution of input data is separable by planes parallel to the principal axes, as shown in Fig.4, because a category produced by the fuzzy ART model forms a rectangular region. Therefore, this architecture cannot be applied to an arbitrary input space. To cope with this problem, we transform YUV space to a "linear" color space Yrθ and use it as the input feature space for the fuzzy ART architecture. This transformation enables the fuzzy ART model to segment color images on-line.

In this paper, as an example of image processing, we dealt with image segmentation based on color information because of the limited performance of our current hardware. With more powerful hardware, we could additionally use motion and texture measurements as input vectors to our on-line learning system, besides position and color measurements. In such a case, our method might be more robust than the current version.

In this paper, we proposed to apply the fuzzy ART model to color image segmentation and constructed a color space transformation that generates input vectors suitable for learning in the fuzzy ART model. As shown in the experimental results, even when the lighting condition of the surroundings changes, our method was able to update the color models of the objects in response to the change and to extract the target regions in the image almost correctly. To show the usefulness of our learning process, we presented a comparison between our tracking algorithms with and without the learning process. Furthermore, we showed the convergent property of our algorithm, although it was verified only experimentally. This method should be applicable to a much wider range of image processing applications.

Acknowledgments

I would like to thank Prof. G. Bartfai for letting me know where free software for fuzzy ART is available. Thanks to his notification, I was able to implement the fuzzy ART system easily.

References

[1] Y. Du and J. Crisman. "A Color Projection for Fast Generic Target Tracking". In Proc. of IEEE/RSJ/GI International Conference on Intelligent Robots and Systems 1995 (IROS '95), pages 360–365, 1995.

[2] C. R. Wren, A. Azarbayejani, T. Darrell, and A. P. Pentland. "Pfinder: Real-Time Tracking of the Human Body". IEEE Trans. on PAMI, 19(7):780–785, 1997.

[3] C. Rasmussen, K. Toyama, and G. D. Hager. "Tracking Objects By Color Alone". Technical Report TR1114, Dept. of Computer Science, Yale University, June 1996.

[4] G. A. Carpenter, S. Grossberg, and D. B. Rosen. "Fuzzy ART: Fast Stable Learning and Categorization of Analog Patterns by an Adaptive Resonance System". Neural Networks, 4:759–771, 1991.

[5] T. Nakamura, K. Terada, et al. "Development of a Cheap On-board Vision Mobile Robot for Robotic Soccer Research". In Proc. of IEEE/RSJ International Conference on Intelligent Robots and Systems 1998 (IROS '98), pages 431–436, 1998.


Figure 8: A result of color segmentation by our method

Figure 9: A sequence of tracking a moving ball

Figure 10: Comparison of two tracking algorithms without and with learning process
