
SETIT 2005

3rd International Conference: Sciences of Electronic, Technologies of Information and Telecommunications March 27-31, 2005 – TUNISIA

Real-time moving object detection and shadow removing in video surveillance

Mohamed Dahmane, Jean Meunier
Université de Montréal, DIRO, CP 6128, Succursale Centre-Ville, 2920 Chemin de la tour, Montréal, Québec, Canada, H3C 3J7
{dahmanem, meunier}@iro.umontreal.ca

Abstract: In automatic video monitoring, real-time detection and in particular shadow elimination are critical to correct moving object segmentation, since shadows severely affect the surveillance process. In this study, we propose a fast and flexible approach to movement detection based on an adaptive background subtraction technique, together with an effective shadow elimination model based on the color constancy principle in the RGB color space. The results show the robustness of the model and, in particular, its capacity to work in a completely autonomous way. As with any modular design, the real performance of an algorithm must be tested in its global context; this is why a complete automatic monitoring system was developed. In this article, however, the emphasis is put on the detection part.

Keywords: background subtraction, movement detection, RGB color space, shadow elimination.

1 Introduction

The role of an automatic video surveillance system is to detect suspicious human behaviour by considering information such as color, shape, size, centroid, etc. Generally this task can be summarized in a typical modular form (Figure I.), in which the moving object detection stage constitutes the most critical part.

Figure I. Typical pipeline of a video surveillance process: Detection (current image, background estimation, object extraction), Object tracking, Event recognition, Decision.

Mobile entity detection implies detecting the temporal variations of luminous intensity. However, the changes due to light fluctuations, and more particularly shadows, severely alter the segmentation process; a condition that is even worse when the scene is observed by only one camera. Shadow delimitation is therefore a paramount step for providing correct moving target detection, because of the problems that arise when shadow pixels are incorrectly classified as foreground, leading to illusory shapes or, worse, to false connectivity between completely independent blobs. In order to mitigate these major inconveniences, we propose a flexible and effective spatiotemporal segmentation approach, integrating a shadow elimination model that is very functional while respecting real-time requirements.

2 Related works

Several segmentation approaches allow shadow and even penumbra detection. Each one has its own advantages but also its weaknesses (Nadimi & al., 2004). The principles used can be gathered into four categories (Prati & al., 2003), which we describe now:

2.1 Nonparametric statistical methods

These methods suppose, for example, that the color is the product of radiance and reflectance, and use two distortion measures: the chrominance and the brightness distortions. The technique described in (Horprasert & al., 1999) is not well suited to a systematic update of the reference image because of its parameter standardization process, which is necessary to make the comparisons possible. Thus, dynamic scenes severely limit its performance.

2.2 Parametric statistical methods

Here, the probability that a pixel belongs to shadow is calculated using a linear transformation matrix D which estimates the shadowed pixel components (r′, g′, b′) from the non-shadowed ones (r, g, b). The spatial information is exploited by carrying out an iterative probabilistic relaxation (Mikic & al., 2000). The disadvantage of this method is that it requires a manual segmentation of a certain number of images in order to collect the statistics and to construct the matrix D. Moreover, in the case of a scene made up of several surfaces with different reflectance properties (structure and orientation), a zoning is necessary, implying a significant number of matrices.

2.3 Deterministic methods without model I

An example of these techniques refers to the HSV color space by exploiting the photometric invariance property of the components H and S (Cavallaro & al., 2004), assuming that a shadow does not change the hue H of the background pixels in an important way while it appreciably decreases their saturation S (Cucchiara & al., 2001). However, this approach needs to convert the RGB data to the HSV color space, a costly step.

2.4 Deterministic methods without model II

Shadow detection is based on the following criteria: (a) the presence of a darker uniform area, (b) a significant difference in brightness with regard to the reference image, and (c) the presence of edges. Although this approach handles penumbra, the assumptions made are not always verified (Stauder & al., 1999).
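To make the HSV criterion of section 2.3 concrete, here is a minimal Python/OpenCV sketch; the function name and the threshold values alpha, beta, tau_s and tau_h are illustrative choices of ours, not the settings of the cited papers. A pixel is marked as a shadow candidate when its value channel is moderately darkened, its saturation does not increase and its hue is nearly unchanged with respect to the background:

    # Illustrative sketch of the HSV shadow test of section 2.3
    # (thresholds are arbitrary, not those of the cited papers).
    import cv2
    import numpy as np

    def hsv_shadow_mask(frame_bgr, bck_bgr, alpha=0.4, beta=0.95, tau_s=60.0, tau_h=30.0):
        # Convert both images to HSV (OpenCV hue range is [0, 180)).
        hsv_i = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
        hsv_b = cv2.cvtColor(bck_bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
        h_i, s_i, v_i = cv2.split(hsv_i)
        h_b, s_b, v_b = cv2.split(hsv_b)
        ratio = v_i / np.maximum(v_b, 1e-6)          # darkening of the value channel
        dh = np.abs(h_i - h_b)
        dh = np.minimum(dh, 180.0 - dh)              # circular hue difference
        return ((ratio >= alpha) & (ratio <= beta)   # darker, but not too dark
                & ((s_i - s_b) <= tau_s)             # saturation does not increase
                & (dh <= tau_h))                     # hue nearly unchanged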

3 The spatiotemporal segmentation model

In this study, the detection of moving visual objects (MVO) is based on background subtraction, which is a very effective technique when it is adapted to dynamic scenes. The basic idea is to compare the data gathered in the operational mode with a pre-registered scene model in order to extract the regions of interest. In its most general form, the foreground segmentation model includes the following stages:
1. Background modelling: building a reference image representing the stationary part of the scene;
2. Subtraction: classifying a given pixel as part of a MVO or of the background;
3. Background updating.

3.1 Background model

The detection algorithm that we developed is inspired by a nonparametric statistical technique (section 2.1). The principle, at the origin of several works (Horprasert & al., 1999; Cucchiara & al., 2001), aims at benefiting from a singular shadow characteristic: a shadow covers a pixel by appreciably decreasing its intensity, while keeping its chrominance practically invariant. Thus, a shadow affects the RGB components in a proportional way (Kumar & al., 2002), with a quasi-preservation of the dominant color, producing a semi-transparency effect. Explicitly, two informational criteria are introduced: the brightness distortion (darkening level) and the chromatic distortion.

3.1.1 Brightness distortion

Suppose that I(x) = [IR(x), IG(x), IB(x)] are the RGB components of the pixel x in the current image and BCK(x) = [BCKR(x), BCKG(x), BCKB(x)] those of its corresponding pixel in the reference (background) image. We then define the brightness distortion δBr(x) of I(x) with regard to BCK(x) as the difference between ||BCK(x)|| and the norm of I′(x), the projection of I(x) onto BCK(x) (Figure II.); formally we write:

δBr(x) = ||BCK(x)|| − ( I(x) • BCK(x) ) / ||BCK(x)||    (1)

Three cases arise: δBr(x) takes a null value if no brightness change is observed in the scene; it takes a positive value if x is a shadowed pixel, in which case we rather speak of a darkening level; finally, a negative distortion denotes an intensity gain (i.e. x is illuminated).

Figure II. Brightness distortion: I(x), its projection I′(x) onto BCK(x), and δBr in the RGB space.

3.1.2 Chromatic distortion

Since the brightness distortion varies only along the BCK(x) vector, pixels belonging to the equal-brightness plane (|δBr(x)| ≈ ε) (Figure III.) whose color has changed will not be detected as being part of the foreground. Under these conditions, a chrominance model is necessary to take into account color changes occurring at the same brightness. We therefore define the chromatic distortion δCr(x) of the vector I(x) with regard to the reference vector BCK(x) as the angle ∠(I(x), BCK(x)) (Figure IV.):

δCr(x) = arccos[ ( I(x) • BCK(x) ) / ( ||I(x)|| · ||BCK(x)|| ) ]    (2)
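For illustration, equations (1) and (2) can be evaluated per pixel as in the following NumPy sketch, assuming float RGB images of shape (H, W, 3); the function and variable names are ours, not the authors':

    # Sketch of equations (1) and (2) for float RGB images of shape (H, W, 3).
    import numpy as np

    def distortions(I, BCK, eps=1e-6):
        dot = np.sum(I * BCK, axis=2)                    # I(x) . BCK(x)
        n_bck = np.linalg.norm(BCK, axis=2)              # ||BCK(x)||
        n_i = np.linalg.norm(I, axis=2)                  # ||I(x)||
        delta_br = n_bck - dot / np.maximum(n_bck, eps)  # equation (1): brightness distortion
        cos_angle = dot / np.maximum(n_i * n_bck, eps)
        delta_cr = np.arccos(np.clip(cos_angle, -1.0, 1.0))  # equation (2): chromatic distortion
        return delta_br, delta_cr

Positive delta_br values correspond to darkened (possibly shadowed) pixels and negative ones to illuminated pixels, consistently with the three cases discussed in section 3.1.1.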

Figure III. The equal-brightness plane BCK⊥(x) related to BCK(x) in the RGB cube.

Figure IV. Chromatic distortion: the angle ∠(I(x), BCK(x)) in the RGB space.

3.2 Background subtraction

A pixel is classified as shadow if its darkening level is reasonable and its chromatic distortion is weak:

δBr(x) < τ0    (3)
and
δCr(x) < θ    (4)

Equations (3) and (4) define what we call a shadow volume Γ (Figure V.).

Figure V. The conical shadow volume Γ around BCK(x), delimited by the thresholds τ0 and θ, in the RGB space.

By putting τ0 = (1 − τ1) · ||BCK(x)||, we guarantee a semi-dynamic thresholding (a threshold proportional to the magnitude of BCK(x)), which ensures constant-contrast thresholding. Substituting τ0 and multiplying both sides by ||BCK(x)||, equation (3) becomes:

τ1 · ||BCK(x)||² < I(x) • BCK(x)    (5)

The values of τ1 (0.0 < τ1 < 1.0) are empirically fixed according to the nature of the scene, making the detection system more flexible. Furthermore, we observe that the distribution of the brightest pixels tends to have a weaker variance; besides, a shadow does not affect the enlightened areas in the same way as the darker ones. As a consequence, a Chromatic (bright pixel) vs. Achromatic (dark pixel) cleavage was used to consider more suitable thresholds (τ1 and θ) in each case. We define a background pixel as achromatic if:

max_{c∈{R,G,B}} BCKc(x) < 50

Real-time applications require straightforward, fast and reliable algorithms. To satisfy these strict temporal requirements, we increase the detection frequency by subjecting only the potential candidate pixels to the detection module, i.e. by retaining only those verifying:

∃ c ∈ {R,G,B} : |Ic(x) − BCKc(x)| > k · σc(x)

The thresholds σc(x) are automatically determined and represent the maximum inter-frame absolute differences (Haritaoglu & al., 2000) over a preset updating time interval ∆t (= 10 s in this work). By setting k = 2, at least 75% of the values fall within the interval BCKc(x) ± 2σc(x), supposing that σc(x) corresponds to a reliable standard deviation estimate (1). Besides, equation (5) offers the advantageous possibility of pre-storing τ1 · ||BCK(x)||², so that, in surveillance mode, only I(x) • BCK(x) needs to be evaluated.

3.3 The detection

Pseudo-code:

// let I(x) be the intensity vector of the pixel x in the current
// image and BCK(x) its intensity vector in the reference image.

// check if x represents a potential candidate
if ( |IR(x) − BCKR(x)| > k·σR(x) or |IG(x) − BCKG(x)| > k·σG(x) or |IB(x) − BCKB(x)| > k·σB(x) ) then
    // check the darkening level (2)
    τ = ( I(x) • BCK(x) ) / ||BCK(x)||²
    if ( τ1 < τ and τ < 1.0 ) then    // x is reasonably darkened
        // check the chromatic distortion
        if ( δCr(x) > θ ) then
            // x has a different colour
            x ∈ MVO
        else
            x ∈ shadow
        endif
    else
        x ∈ MVO
    endif
endif

1 Chebyshev's theorem.
2 Two couples of empirical thresholds (τ1, θ) were considered, according to the BCK(x) pre-segmentation (Chromatic or Achromatic).
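As a complement to the pseudo-code, a vectorized NumPy sketch of the same decision rule is given below, reusing the distortions() helper introduced after equation (2); the boolean-mask formulation, the array shapes and the function name are our own assumptions, not the authors' implementation:

    # Vectorized sketch of the detection pseudo-code: builds boolean masks
    # for MVO and shadow pixels. sigma has shape (H, W, 3); tau1 and theta
    # may be scalars or (H, W) arrays (one couple per chromatic/achromatic partition).
    import numpy as np

    def classify(I, BCK, sigma, tau1, theta, k=2.0, eps=1e-6):
        candidate = np.any(np.abs(I - BCK) > k * sigma, axis=2)  # potential candidates
        dot = np.sum(I * BCK, axis=2)                            # I(x) . BCK(x)
        n_bck2 = np.sum(BCK * BCK, axis=2)                       # ||BCK(x)||^2 (could be pre-stored)
        tau = dot / np.maximum(n_bck2, eps)                      # darkening level
        _, delta_cr = distortions(I, BCK, eps)                   # chromatic distortion, eq. (2)
        darkened = (tau > tau1) & (tau < 1.0)                    # reasonably darkened, cf. eq. (5)
        shadow = candidate & darkened & (delta_cr <= theta)
        mvo = candidate & ~shadow    # recoloured, brightened or strongly darkened pixels
        return mvo, shadow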

3.4 Background updating

To keep the system functional as long as possible in the case of dynamic scenes, a periodic maintenance of the background BCK is necessary. This is obtained by updating a "parallel" background, which we denote BP, according to a recursive estimation of a weighted intensity average:

BPc^(t+δt)(x) = α · BPc^t(x) + (1 − α) · Ic^(t+δt)(x)

The factor α ∈ [0,1] (= 0.9 in this work) is set empirically, taking into account the camera frame rate; indeed, this value controls the adaptation rate. Thus, the final value BPc(x) represents an exponentially weighted average of the values taken by the pixel x during the updating period ∆t. At t + ∆t, the reference image BCK is re-initialized from BP, where only the pixels not visited by a valid (3) MVO are considered, using their tracks provided by the target tracking process:

BCK^(t+∆t)(x) = BCK^t(x)        if ∃ t′ ∈ [t, t+∆t] : x ∈ O and O ∈ {MVO}^t′
BCK^(t+∆t)(x) = BP^(t+∆t)(x)    otherwise

3 If the MVO has been correctly tracked for at least three frames.
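A minimal sketch of these update rules follows; the boolean mask visited_by_mvo marks pixels crossed by a validated MVO during [t, t+∆t], the σc(x) refresh as a running maximum of inter-frame absolute differences follows section 3.2, and all names are ours:

    # Sketch of the background maintenance of section 3.4.
    import numpy as np

    def update_parallel_background(BP, frame, alpha=0.9):
        # Recursive exponentially weighted average of the incoming frames.
        return alpha * BP + (1.0 - alpha) * frame

    def update_sigma(sigma, frame, prev_frame):
        # Running maximum of the inter-frame absolute differences;
        # reset it at the beginning of each updating interval delta_t.
        return np.maximum(sigma, np.abs(frame - prev_frame))

    def refresh_reference(BCK, BP, visited_by_mvo):
        # At t + delta_t: take BP everywhere except under pixels visited
        # by a valid MVO, where the previous reference is kept.
        out = BP.copy()
        out[visited_by_mvo] = BCK[visited_by_mvo]
        return out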

Background subtraction alone does not deal with noisy images. Denoising requires an additional phase, usually a morphological filtering well adapted to practical video surveillance cases. However, authors have great difficulty finding the right combination of morphological operators, and the resulting system is often quite scene-dependent (Bevilacqua, 2002). In this respect, considering the robustness of our detection model, only a median filter (3x3) is necessary to eliminate isolated noisy pixels and the edge effects due to camera jittering (Figure VI.).

Figure VI. The camera jitter effect.

The retained regions of interest are then labelled using a fast connected-components labelling algorithm based on a union-find data structure (Cormen & al., 2001). Finally, the small blobs are simply ignored.
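The post-processing just described could look like the following sketch (3x3 median filtering, two-pass 4-connectivity labelling with a small union-find structure, and removal of small blobs); this is an illustrative, unoptimized reading of the step, not the authors' code, and the min_area value is arbitrary:

    # Illustrative post-processing: median filter, union-find labelling,
    # removal of small blobs.
    import numpy as np
    from scipy.ndimage import median_filter

    def _find(parent, i):
        # Union-find root search with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def label_and_filter(mask, min_area=50):
        # mask: boolean foreground mask of shape (H, W).
        mask = median_filter(mask.astype(np.uint8), size=3).astype(bool)
        h, w = mask.shape
        labels = np.zeros((h, w), dtype=np.int32)
        parent = [0]                       # label 0 = background
        next_label = 1
        for y in range(h):                 # first pass: provisional labels
            for x in range(w):
                if not mask[y, x]:
                    continue
                left = labels[y, x - 1] if x > 0 else 0
                up = labels[y - 1, x] if y > 0 else 0
                if left == 0 and up == 0:
                    parent.append(next_label)
                    labels[y, x] = next_label
                    next_label += 1
                elif left != 0 and up != 0:
                    a, b = _find(parent, left), _find(parent, up)
                    labels[y, x] = a
                    if a != b:
                        parent[b] = a      # merge the two equivalence classes
                else:
                    labels[y, x] = left if left != 0 else up
        # second pass: resolve equivalences, then drop small blobs
        roots = np.array([_find(parent, i) for i in range(next_label)])
        labels = roots[labels]
        areas = np.bincount(labels.ravel())
        small = areas < min_area
        small[0] = False                   # never touch the background
        labels[small[labels]] = 0
        return labels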

4 Experimental results

For evaluating the system, video sequences with a 320x240 pixel resolution were taken, using a Logitech® QuickCam Pro4000 webcam installed on the first floor of the André-Aisenstadt building of the University of Montreal, in such a way as to overlook a part of the ground floor hall. To test the algorithm we considered rather difficult video conditions:
- the camera observes the scene at a steep downward (plunging) angle;
- the scene is influenced by the outside lighting coming from a glass door situated upstairs, almost in front of the camera;
- the light in the scene is not uniform;
- surfaces of different natures are present: a very reflective floor and a dark, rough-textured staircase;
- the fluorescent white-balance mode was activated, to simulate a light source that is not completely white (the actual light sources are incandescent).

The insensitivity of the model to weak variations of illumination leads to good performance (Figure VII.): a high detection rate (true positives), a weak omission rate and, in particular, an acceptable commission percentage (false positives) for a real-time algorithm (Table I.).

Detection rate (TP)    Omissions    FP
97.47%                 2.53%        7.60%
Table I. Average performances of the conical model.

The false positives (7.6%) are due, for the greater part, to the strong shadows situated just under the MVOs and, particularly, to the 'chromatic' instability of the achromatic pixels, whose RGB components are quite close to the origin.

Figure VII. Detection performances: (a) Chromatic region (floor); (b) Achromatic region in the staircase; (c) 3D-vision laboratory at University of Montreal; (d) Laboratory_raw (4); (e) IntelligentRoom (4).

4 http://cvrr.ucsd.edu/aton/shadow

In Figure VII, the blue color represents the detections resulting from a brightness distortion, while pixels in red are those presenting a significant chromatic distortion. Shadows were correctly identified by the green regions. These experiments were performed by fixing the thresholds (τ1, θ) to (0.75, 0.1 rad) for the pixels corresponding to the achromatic partition of the reference image (BCK) and to (0.80, 0.04 rad) for those of the complementary (chromatic) one.
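For illustration, the per-pixel threshold maps implied by these settings could be built as follows, using the achromatic test of section 3.2; the function name is ours:

    # Per-pixel (tau1, theta) maps from the chromatic/achromatic partition
    # of the reference image BCK, with the couples reported above.
    import numpy as np

    def threshold_maps(BCK):
        achromatic = BCK.max(axis=2) < 50        # achromatic background pixels
        tau1 = np.where(achromatic, 0.75, 0.80)
        theta = np.where(achromatic, 0.1, 0.04)  # radians
        return tau1, theta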

5 Conclusion

This study presents a fast and flexible approach to background modelling with robust shadow elimination for correct moving object detection, based on the pixel intensity I(x), the background average intensity BCK(x), and the maximal absolute inter-frame intensity difference σc(x). The classification of a pixel as MVO or shadow relies on a physical property of the latter (color constancy), by modelling both the pixel luminance and chromaticity and their respective distortions in the RGB space. Besides, this classification employs a Chromatic / Achromatic background pre-segmentation, because the behaviour of the intensity-level variations differs according to the degree of illumination of the corresponding pixel.

The systematic update of the background model reduces the noise sensitivity and strengthens the robustness to variations of illumination. The false positives are mainly due to the strong shadows located directly under the MVOs. In practice, this type of detection does not affect the system too much because, generally, an automatic surveillance process considers the global movement properties. However, these misclassifications could be overcome by a simple criterion based on geometrical considerations (shadows are MVO projections).

By its simplicity, since a simple thresholding allows the foreground to be correctly segmented from the background/shadow, and by its flexibility, due to the thresholds being adjustable according to the nature of the scene, the system is very successful and effective in real time. On a 2.66 GHz processor the minimal video frequency obtained was 16 Hz, which should be largely sufficient to perform robust target tracking and to correctly complete the event recognition stage.

References

-(Bevilacqua, 2002). Effective Object Segmentation in a Traffic Monitoring Application. Proc. of the 3rd ICVGIP, pages 125-130, 2002.
-(Cavallaro & al., 2004). Detecting shadows in image sequences. Proc. of IEE Conf. on Visual Media Production (CVMP), London (UK), March 2004.
-(Cormen & al., 2001). Introduction to Algorithms, 2nd ed. MIT Press / McGraw-Hill, 2001.
-(Cucchiara & al., 2001). Improving shadow suppression in moving object detection with HSV color information. Proc. of the IEEE 4th Inter. Conference on Intelligent Transportation Systems, pages 334-339, August 2001.
-(Haritaoglu & al., 2000). A Fast Background Scene Modeling and Maintenance for Outdoor Surveillance. Int. Conf. on Pattern Recognition (ICPR'00), 4:179-183, 2000.
-(Horprasert & al., 1999). A Statistical Approach for Real-time Robust Background Subtraction and Shadow Detection. ICCV'99 Frame-Rate Workshop, 1999.
-(Kumar & al., 2002). A comparative study of different color spaces for foreground and shadow detection for traffic monitoring system. Proc. of the IEEE 5th Inter. Conference on Intelligent Transportation Systems, pages 100-105, September 2002.

-(Mikic & al., 2000). Moving shadow and object detection in traffic scenes. Proc. of Inter. Conference on Pattern Recognition, September 2000.
-(Nadimi & al., 2004). Physical Models for Moving Shadow and Object Detection in Video. IEEE Trans. on Pattern Analysis and Machine Intelligence, 26(8):1079-1087, 2004.
-(Prati & al., 2003). Detecting moving shadows: algorithms and evaluation. IEEE Trans. on Pattern Analysis and Machine Intelligence, 25(7):918-923, 2003.
-(Stauder & al., 1999). Detection of moving cast shadows for object segmentation. IEEE Transactions on Multimedia, 1(1):65-76, 1999.