Shadow Detection by Combined Photometric Invariants for Improved Foreground Segmentation

Filiz Bunyak, Ilker Ersoy, S.R. Subramanya
University of Missouri-Rolla, Department of Computer Science
1870 Miner Circle, Rolla, MO 65409
{bunyak,ersoy,subra}@umr.edu

Abstract

Detection and tracking of moving objects are essential steps in many video understanding applications such as traffic monitoring, video surveillance, and visual event recognition. The moving object detection process segments the scene into foreground (moving) and background regions. Moving cast shadows cause serious problems in this process because they can easily be misclassified as foreground. This misclassification may lead to drastic changes in the shapes of objects or the merging of multiple objects. In this paper, we present a method to detect moving cast shadows to improve the performance of moving object detection. The foreground regions are processed in terms of intensity, chromaticity, and reflectance ratio. To further refine the results, a compactness constraint is enforced on the foreground and shadow masks. The algorithm exploits spatial and spectral information; no a priori knowledge about camera, illumination, or object/scene characteristics is required. The obtained results show better performance compared to other work in the recent literature.

1. Introduction

The proliferation of video cameras has resulted in diverse applications of computer vision such as traffic monitoring, event recognition in intelligent surveillance systems, content-based video annotation, and human-computer interaction. The success of these applications depends on reliable tracking and analysis of moving objects in image sequences. Applications that involve shape analysis of objects, such as human-computer interaction, or counting and identifying objects, such as traffic monitoring, require reliable detection and tracking of the objects in question. Hence, reliable moving object detection is an essential step in the further analysis of image sequences in these applications. The moving object detection process segments the moving regions/objects (foreground) from the rest of the image (background). A generally accepted method to achieve this segmentation is to maintain a background model and subtract it from the current frame in the sequence ([2, 5, 11]). Moving cast shadows cause serious problems in this process because they can easily be misclassified as foreground: they may darken the background significantly, and they have the same motion as the objects they belong to. This misclassification may lead to drastic changes in the shapes of objects or the merging of multiple objects, which subsequently results in poor performance in the respective application. Shadow detection is a widely studied problem and draws increasing interest, as evidenced by the recent literature ([1, 6, 9, 10]). A recent survey ([9]) gives a taxonomy of shadow detection methods and compares four different methods quantitatively. In [6], results of shadow detection using a spatio-temporal albedo test and body color estimation and verification are reported. In [1], a chromaticity-based approach (normalized RGB) and some empirical rules followed by spatio-temporal verification are used to detect shadows. A slightly different and harder problem, the removal of shadows in still images, is studied in [3]. In this paper, we present a method to detect moving cast shadows by exploiting spatial and spectral information. Two photometric invariants, normalized color and the reflectance ratio, are combined to obtain a reliable classification of shadow and object pixels in the detected foreground. In section 2, the rationale for the invariants is given and the proposed method is described. In section 3, experiments and results are discussed. Section 4 concludes the paper.
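As a minimal illustration of this background-subtraction step, the sketch below classifies pixels by a simple per-pixel distance test against a fixed background model. This is a simplified stand-in, not the mixture-of-Gaussians model the paper adopts in section 2.1, and the threshold value is hypothetical.

```python
import numpy as np

def foreground_mask(frame, background, threshold=30.0):
    """Mark pixels as foreground where the current frame deviates from
    the background model by more than `threshold` (per-pixel Euclidean
    distance in RGB)."""
    diff = np.linalg.norm(frame.astype(float) - background.astype(float), axis=-1)
    return diff > threshold

# Example: a 2x2 "scene" where one pixel darkens strongly.
bg = np.full((2, 2, 3), 100, dtype=np.uint8)
frame = bg.copy()
frame[0, 0] = (20, 20, 20)          # moving object (or shadow) pixel
print(foreground_mask(frame, bg))   # only pixel (0, 0) is flagged
```

Note that a cast shadow darkens pixels just like a moving object does, which is exactly why such a mask misclassifies shadows as foreground.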

2. Proposed method

A common observation is that a shadow does not have a texture or color of its own; instead, it changes the illumination of the background surface it is cast upon. This results in a change in the observed surface color, but the surface material properties remain the same. In contrast, when an object occludes the background, the material properties of that image region change. In order to differentiate illumination changes due to shadows from material changes due to actual moving objects, foreground points should be compared to the background model in terms of features that depend on material but are invariant to illumination. To ensure a robust comparison, we use two photometric invariants: one derived from spectral information (normalized color), and one derived from spatial information (reflectance ratio). The steps of the proposed method, depicted in figure 1, are given below:

1. Obtain the background model BG and the foreground mask FM through background subtraction. Mask the current frame to obtain foreground pixels FG.
2. Tag every foreground pixel FG(x, y) as a shadow candidate SC(x, y) if it is darker than the background pixel BG(x, y). (This step eliminates the object pixels that are lighter than the background.)
3. Compare shadow candidates SC(x, y) to the corresponding background pixels BG(x, y) in terms of normalized color; mark the pixels with similar normalized color as part of shadow mask SM_NC.
4. Compare shadow candidates SC(x, y) to the corresponding background pixels BG(x, y) in terms of reflectance ratio; mark the pixels with similar reflectance ratios as part of shadow mask SM_RR.
5. Combine the masks obtained in steps (3) and (4): SM = SM_NC AND SM_RR. Obtain the shadow-corrected foreground: FG_SC = FG − SM.
6. Enforce a compactness constraint on the shadow mask SM and the foreground FG_SC.

The following subsections elaborate on the steps of our method and the two photometric invariants that are used.
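Steps 2-5 can be sketched end to end as follows. The two photometric-invariant tests are passed in as callables, since they are detailed in sections 2.3 and 2.4; the all-pass placeholders in the example are illustrative only.

```python
import numpy as np

def detect_shadows(frame, bg, fg_mask, nc_test, rr_test):
    """Steps 2-5: tag darker foreground pixels as shadow candidates,
    keep those that pass BOTH photometric-invariant tests, and
    subtract the shadow mask SM from the foreground FG."""
    frame = frame.astype(float)
    bg = bg.astype(float)
    # Step 2: shadow candidates are foreground pixels darker than background.
    darker = frame.sum(axis=-1) < bg.sum(axis=-1)
    candidates = fg_mask & darker
    # Steps 3-4: normalized color and reflectance ratio comparisons
    # (supplied as callables returning boolean per-pixel masks).
    sm_nc = candidates & nc_test(frame, bg)
    sm_rr = candidates & rr_test(frame, bg)
    # Step 5: combine the masks; compute the shadow-corrected foreground.
    sm = sm_nc & sm_rr
    fg_sc = fg_mask & ~sm
    return sm, fg_sc

# Illustration: one darker pixel (shadow candidate) and one lighter pixel.
bg = np.full((1, 2, 3), 200, dtype=np.uint8)
frame = np.array([[[100, 100, 100], [250, 250, 250]]], dtype=np.uint8)
fg = np.array([[True, True]])
always = lambda f, b: np.ones(f.shape[:2], dtype=bool)   # placeholder tests
sm, fg_sc = detect_shadows(frame, bg, fg, always, always)
# The darker pixel lands in SM; the lighter pixel stays in FG_SC.
```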

2.1. Moving object detection

To obtain moving objects/regions, we use the mixture of Gaussians (MoG) approach described in [11] because of its adaptive and multi-modal nature, which is robust to lighting changes, repetitive motion of the background (such as waving trees), slow-moving objects, and the introduction and removal of scene objects. In the mixture of Gaussians approach, the recent history of each pixel, X1, ..., Xt, is modeled by a mixture of K Gaussian distributions. Each distribution is characterized by its mean, its variance, and a weight that indicates what portion of the previous values was assigned to this distribution. Using a K-means approximation, the RGB color vector of each new pixel is assigned to a Gaussian distribution and the parameters of the distributions are updated.

[Figure 1 block diagram: New Frame → Moving Object Detection (BG model, FG mask) → Shadow Detection (identification of darker regions; normalized color comparison; reflectance ratio comparison) → Combination (FG mask, shadow mask) → Post Processing (FG mask, shadow mask).]
Figure 1. Steps of the proposed method.

The distributions are then labeled as foreground or background based on the weight/σ ratio (σ denotes the standard deviation in this context). Gaussian distributions that have the most supporting evidence (high weight and low σ) are considered to be produced by background processes; the remaining distributions are considered to be produced by foreground processes. This process results in a background model BG and a binary foreground mask FM identifying the moving regions/objects. This mask is used to obtain the foreground pixels FG. All frames are passed through a smoothing operation prior to any further processing.
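The background/foreground labeling of the mixture components can be sketched as follows: rank components by weight/σ and take as background the first components whose cumulative weight covers a background fraction T, as in [11]. The value T = 0.7 is illustrative, not one reported in the paper.

```python
import numpy as np

def background_components(weights, sigmas, T=0.7):
    """Rank mixture components by weight/sigma (most supporting
    evidence first) and label as background the leading components
    whose cumulative weight first exceeds T."""
    weights = np.asarray(weights)
    order = np.argsort(-(weights / np.asarray(sigmas)))
    cum = np.cumsum(weights[order])
    n_bg = int(np.searchsorted(cum, T)) + 1
    return set(order[:n_bg].tolist())

# Three components: a stable high-weight/low-sigma background mode,
# a moderate mode, and a noisy low-weight foreground mode.
print(background_components([0.6, 0.3, 0.1], [5.0, 8.0, 20.0]))
```

Components not in the returned set are treated as foreground processes when the current pixel matches them.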

2.2. Image color model

The observed color I_k of a surface depends on the spectral reflectance of the surface that the light is leaving and the spectral radiance of the light falling on the surface. The value measured by the kth camera sensor can be expressed as in (1):

I_k = ∫_Λ σ_k(λ) ρ(λ) E(λ) dλ    (1)

Here, I_k is the response of the kth camera sensor, σ_k(λ) is the sensitivity of the kth camera sensor, ρ(λ) is the spectral reflectance, and E(λ) is the spectrum of the illuminating light ([4]). If the kth camera sensor is assumed to be sensitive to a single wavelength λ_k, then σ_k(λ) = σ_k × δ(λ − λ_k), and I_k simplifies to (2) ([3]):

I_k = σ_k(λ_k) ρ(λ_k) E(λ_k),  k ∈ {R, G, B}    (2)

The following discussions of normalized color and reflectance ratio are based on the image color model given in (2).

2.3. Normalized color comparison

Normalized color is one of the photometric invariants widely used in shadow detection, but a rationale is usually not given ([1]). Here, we will show that it is invariant to illuminant intensity, but depends on illumination color and surface reflectance. Normalized color I_k^norm is defined as in (3):

I_k^norm = I_k / Σ_{i=R,G,B} I_i ,  k = R, G, B    (3)

Assuming the illuminant is a black-body radiator, illumination may be modeled by Planck's law as in (4), where λ is the wavelength, T is the temperature, and c1 and c2 are constants ([3, 4]):

E(λ, T) = I × c1 × λ^{-5} × (e^{c2/(Tλ)} − 1)^{-1}    (4)

Substituting I_k in (3) by (2) and using (4) for the illumination, (3) may be rewritten as in (5):

I_k^norm = σ_k ρ(λ_k) I c1 λ_k^{-5} (e^{c2/(Tλ_k)} − 1)^{-1} / Σ_{i=R,G,B} σ_i ρ(λ_i) I c1 λ_i^{-5} (e^{c2/(Tλ_i)} − 1)^{-1}    (5)

The wavelength-independent terms cancel out and (6) is obtained:

I_k^norm = σ_k ρ(λ_k) λ_k^{-5} (e^{c2/(Tλ_k)} − 1)^{-1} / Σ_{i=R,G,B} σ_i ρ(λ_i) λ_i^{-5} (e^{c2/(Tλ_i)} − 1)^{-1}    (6)

This shows that normalized color is invariant to illumination intensity, but not to illuminant color or surface reflectance. Hence, if a shadow candidate SC(x, y) has the same normalized color components as the corresponding background BG(x, y), it can be marked as part of shadow mask SM_NC. In our algorithm, we use a threshold in (0.01, 0.02) to allow for small changes in each normalized color component.
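This invariance can be checked numerically under the narrow-band model of (2): scaling the overall intensity I leaves the normalized components of (3) unchanged, while changing the black-body temperature T in (4) does not. The sensitivities, reflectances, and nominal wavelengths below are illustrative values, not calibrated ones.

```python
import numpy as np

C2 = 1.4388e-2  # second radiation constant c2 (m*K)

def planck(lam, T, I=1.0, c1=1.0):
    """Black-body illumination of (4), up to the constant c1."""
    return I * c1 * lam**-5 / (np.exp(C2 / (T * lam)) - 1.0)

def normalized_color(sigma, rho, lam, T, I):
    """Sensor responses under model (2), then normalization (3)."""
    Ik = sigma * rho * planck(lam, T, I)
    return Ik / Ik.sum()

lam = np.array([610e-9, 545e-9, 450e-9])   # nominal R, G, B wavelengths
sigma = np.array([1.0, 1.1, 0.9])          # illustrative sensor sensitivities
rho = np.array([0.5, 0.3, 0.2])            # illustrative reflectances

nc_bright = normalized_color(sigma, rho, lam, 5500.0, I=1.0)
nc_shadow = normalized_color(sigma, rho, lam, 5500.0, I=0.3)   # dimmer, same color
nc_colder = normalized_color(sigma, rho, lam, 4000.0, I=1.0)   # different illuminant color

assert np.allclose(nc_bright, nc_shadow)       # intensity I cancels, as in (6)
assert not np.allclose(nc_bright, nc_colder)   # temperature T does not
```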

2.4. Reflectance ratio comparison

The reflectance ratio is a photometric invariant developed by Nayar and Bolle ([7]). Originally, it was developed for gray-scale images and used to obtain a surface segmentation that is invariant to illumination and surface geometry. The camera sensor responses of two points p1 and p2 can be written as in (7) and (8):

I_k^{p1} = σ_k ρ^{p1}(λ_k) E^{p1}(λ_k)    (7)

I_k^{p2} = σ_k ρ^{p2}(λ_k) E^{p2}(λ_k)    (8)

If p1 and p2 are two adjacent points on a surface whose geometry changes smoothly, it may be assumed that they are illuminated by the same light source (E^{p1} = E^{p2}) and that the surface normal vector, light source direction, and sensor direction are the same. Then the sensor responses of p1 and p2 may be rewritten as in (9) and (10):

I_k^{p1} = σ_k ρ^{p1}(λ_k) E(λ_k)    (9)

I_k^{p2} = σ_k ρ^{p2}(λ_k) E(λ_k)    (10)

The reflectance ratio p is written as in (11). The reflectance ratio is independent of the illumination E; it depends only on the reflectance properties of the points p1 and p2:

p = I^{p1} / I^{p2} = ρ^{p1} / ρ^{p2}    (11)

Instead of this definition of p (when I^{p2} = 0, p = ∞), a well-behaved p function in [−1, +1] can be defined as in (12) ([7]):

p = (I^{p1} − I^{p2}) / (I^{p1} + I^{p2}) = (ρ^{p1} − ρ^{p2}) / (ρ^{p1} + ρ^{p2})    (12)

In our algorithm, we do not use the reflectance ratio to segment the image; instead, we calculate this ratio for the 4-connected neighbors of a pixel and for all color components. If the pixel is a shadow, these ratios should be the same for both foreground and background pixels. Again, we use a small threshold to allow for noise and other imperfections. Rewriting p for the kth color component of a shadow candidate SC_k(x, y) at location (x, y) and its neighbor's color component SC_k(x+i, y+j) at location (x+i, y+j), we obtain (13):

p_k(x, y, i, j) = (SC_k(x, y) − SC_k(x+i, y+j)) / (SC_k(x, y) + SC_k(x+i, y+j))    (13)

Here, k = R, G, B denotes the three color components, and i = −1, 1; j = −1, 1 denote the 4-connected neighbors. Similarly, we define p′_k(x, y, i, j) for the components of a background pixel BG_k(x, y). Using these ratios, we define our shadow constraint as in (14):

Σ_{k=R,G,B} |p_k(x, y, i, j) − p′_k(x, y, i, j)| < T    (14)

For i = −1, 1 and j = −1, 1, we obtain four constraints for the four neighbors of a pixel. If all these constraints are satisfied at location (x, y), that pixel is marked as part of shadow mask SM_RR.

2.5. Combination of the masks and postprocessing

The value of an invariant expression may not be unique to a particular material, and there may be singularities and instabilities for particular values (e.g., normalized color is not reliable around the black vertex). To obtain a robust result, the results from the two invariants based on two different properties (i.e., normalized color based on a spectral property and the reflectance ratio based on a spatial property) are combined; the shadow mask SM is obtained by ANDing the corresponding masks SM_NC and SM_RR. The shadow-corrected foreground FG_SC is then obtained by subtracting SM from FG. To further refine the results, postprocessing is applied to both the SM and FG_SC masks separately. Isolated pixels are removed and contours are refined using morphological operators such as open and close. Compactness of the foreground and shadow masks is enforced by filling holes. Because conservation of foreground regions is more important than conservation of shadow regions, the shadow-corrected foreground mask is given higher priority in postprocessing (i.e., if a pixel is marked both as foreground in FG_SC and as shadow in SM after postprocessing, the pixel is assumed to be a foreground pixel). At shadow boundaries, the same-illuminant assumption fails, resulting in different reflectance ratios for the neighboring pixels. This leads to misclassification of shadow pixels as foreground. To undo this effect, the cleaned shadow mask is dilated. Figure 2 shows SM_NC, SM_RR, and their combination together with the foreground, and the result after postprocessing for a sample frame.
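The reflectance ratio test of (13)-(14) can be sketched per pixel as follows. The threshold T is illustrative, boundary pixels are assumed to be skipped, and a small epsilon guards the 0/0 case; standard 4-connected offsets are used.

```python
import numpy as np

def reflectance_ratio(img, x, y, i, j):
    """p_k(x, y, i, j) of (13) for all three color components at once."""
    a = img[y, x].astype(float)
    b = img[y + j, x + i].astype(float)
    return (a - b) / (a + b + 1e-9)   # epsilon avoids division by zero

def shadow_by_reflectance(fg, bg, x, y, T=0.1):
    """Constraint (14) over the four neighbors: the pixel joins SM_RR
    only if its per-channel ratios match the background everywhere."""
    for i, j in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        p = reflectance_ratio(fg, x, y, i, j)
        p_bg = reflectance_ratio(bg, x, y, i, j)
        if np.abs(p - p_bg).sum() >= T:
            return False
    return True

# A uniformly darkened patch keeps its local ratios -> shadow.
bg = np.tile(np.array([120, 100, 80], dtype=np.uint8), (3, 3, 1))
shadowed = (bg * 0.5).astype(np.uint8)
# A patch whose center changes material (color) breaks the ratios -> object.
obj = bg.copy()
obj[1, 1] = (10, 200, 30)
print(shadow_by_reflectance(shadowed, bg, 1, 1))  # shadow case
print(shadow_by_reflectance(obj, bg, 1, 1))       # object case
```

Darkening scales all channels by a common factor, which cancels in (13); a material change alters the channel ratios and violates (14).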

3. Experimental results

The proposed method is tested with various indoor and outdoor image sequences. In order to compare the performance of the proposed shadow detection method to the methods reported in a recent survey ([9]), the available sequence intelligent-room and its ground-truth data are obtained. In [9], a two-layer taxonomy is proposed, where shadow detection algorithms are first grouped as deterministic versus statistical and further divided into parametric and non-parametric. Four algorithms representative of three classes, SNP (statistical non-parametric), SP (statistical parametric), DNM1 (deterministic non-model based -1-), and DNM2 (deterministic non-model based -2-), are picked and compared. Results are reported in terms of the shadow detection rate η (15) and the shadow discrimination rate ξ (16):

η = TP_shadow / (TP_shadow + FN_shadow)    (15)

ξ = TP̄_foreground / (TP̄_foreground + FN_foreground)    (16)

where TP is the number of true positives, FN is the number of false negatives, and TP̄_foreground is the number of ground-truth pixels of the foreground objects minus the number of pixels detected as shadows but belonging to foreground objects. In [9], the background model is not updated. To establish a fair comparison, we used non-adaptive background subtraction with a fixed background model, taking the median of the first N frames as the background model. Our method resulted in higher values in both detection and discrimination rates. Table 1 presents the results of the comparison.

Method             η%       ξ%
SNP                72.82    88.90
SP                 76.27    90.74
DNM1               78.61    90.29
DNM2               62.00    93.89
Proposed method    83.48    96.68

Table 1. Shadow detection performance compared to other methods reported in [9].

The main goal of the proposed method is not accurate detection or discrimination of shadow pixels per se, but the improvement of moving object detection, because accurate object detection is crucial for further tasks in a vision application. Thus, another set of experiments measures the improvement of object detection after shadows are removed. The performance of moving object detection is measured in terms of foreground recall (17) and foreground precision (18):

Recall_FG = TP_FG / (TP_FG + FN_FG)    (17)

Precision_FG = TP_FG / (TP_FG + FP_FG)    (18)

where TP, FN, and FP are the numbers of true positives, false negatives, and false positives of the foreground, respectively. Two sets of recall and precision rates are computed for the intelligent-room sequence, whose ground-truth is available. The first set is computed for moving object detection with MoG only, followed by postprocessing; the second set is computed for the shadow-corrected sequence, followed by postprocessing. Results are shown in figure 3. As expected, shadow detection improves foreground precision considerably, increasing average foreground precision from 54.17% to 86.42% for a comparably small decrease in average foreground recall, from 96.23% to 92.43%. Detection results along with ground-truths for frames #170 and #296 of the intelligent-room sequence are shown in figure 4. Detection results are superimposed on the original images; dark gray regions correspond to moving objects, black regions to moving shadows. The proposed method is also tested with other indoor and outdoor sequences, but ground-truths for these sequences are not available at this time. Visual results of selected frames from the sequences walk-in and parking-lot are shown in figure 5. From figures 4 and 5, it can be seen that the proposed method yields good shadow detection and discrimination in the presence of both intense outdoor shadows and less intense indoor shadows, greatly improving moving object detection performance. Some of the less intense shadows are misclassified as background by the mixture of Gaussians method, but this misclassification is not disruptive since it does not affect object detection performance.
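The evaluation metrics (15)-(18) can be computed from binary masks as a sketch below. The TP̄ term follows the verbal description after (16); this is one reading of the metric defined in [9], so treat it as an assumption rather than a reference implementation.

```python
import numpy as np

def shadow_rates(det_shadow, gt_shadow, det_fg, gt_fg):
    """Shadow detection rate eta (15) and discrimination rate xi (16).
    TP-bar = ground-truth foreground pixels minus foreground pixels
    misdetected as shadow (assumed reading of [9])."""
    tp_s = np.sum(det_shadow & gt_shadow)
    fn_s = np.sum(~det_shadow & gt_shadow)
    eta = tp_s / (tp_s + fn_s)
    tp_f_bar = np.sum(gt_fg) - np.sum(det_shadow & gt_fg)
    fn_f = np.sum(~det_fg & gt_fg)
    xi = tp_f_bar / (tp_f_bar + fn_f)
    return eta, xi

def fg_recall_precision(det_fg, gt_fg):
    """Foreground recall (17) and precision (18)."""
    tp = np.sum(det_fg & gt_fg)
    fn = np.sum(~det_fg & gt_fg)
    fp = np.sum(det_fg & ~gt_fg)
    return tp / (tp + fn), tp / (tp + fp)

# Toy 1x4 masks: two ground-truth foreground pixels, two shadow pixels.
gt_fg = np.array([[True, True, False, False]])
gt_shadow = np.array([[False, False, True, True]])
det_fg = np.array([[True, False, True, False]])
det_shadow = np.array([[False, False, True, False]])
eta, xi = shadow_rates(det_shadow, gt_shadow, det_fg, gt_fg)
recall, precision = fg_recall_precision(det_fg, gt_fg)
```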

4. Conclusion

Figure 2. Frame #100 in intelligent-room. Top left: SMN C and foreground, top right: SMRR and foreground, bottom left: combined SM and foreground, bottom right: result after postprocessing superimposed on frame.

[Figure 3 plots: foreground recall (left) and foreground precision (right), 0–100%, versus frame number (100–280), each comparing "Before Shadow Removal" and "After Shadow Removal".]

Figure 3. Foreground recall and precision with and without shadow detection.

References

[1] A. Cavallaro, E. Salvador, and T. Ebrahimi. Detecting shadows in image sequences. In Proc. of IEE Conf. on Visual Media Production (CVMP04), London, UK, March 2004.
[2] S. Cheung and C. Kamath. Robust techniques for background subtraction in urban traffic video. In Proc. of Video Communications and Image Processing, SPIE Electronic Imaging, San Jose, January 2004.
[3] G. Finlayson, S. Hordley, and M. Drew. Removing shadows from images. In Proc. of the 7th European Conf. on Computer Vision (ECCV02), pages 823–836, Copenhagen, Denmark, May 2002.
[4] D. Forsyth and J. Ponce. Computer Vision: A Modern Approach. Prentice Hall, 2002.
[5] D. Gutchess, M. Trajkovic, E. Cohen-Solal, D. Lyons, and A. Jain. A background model initialization algorithm for video surveillance. In Proc. of the 8th IEEE Intl. Conf. on Computer Vision (ICCV01), pages 733–740, Vancouver, Canada, July 2001.
[6] S. Nadimi and B. Bhanu. Physical models for moving shadow and object detection in video. IEEE Trans. on Patt. Anal. and Mach. Intel., 26(8):1079–1087, August 2004.
[7] S. Nayar and R. Bolle. Computing reflectance ratios from an image. Pattern Recognition, 26(10):1529–1542, October 1993.
[8] S. Nayar and R. Bolle. Reflectance based object recognition. Intl. Journal of Computer Vision, 17(3):219–240, March 1996.
[9] A. Prati, I. Mikic, M. Trivedi, and R. Cucchiara. Detecting moving shadows: algorithms and evaluation. IEEE Trans. on Patt. Anal. and Mach. Intel., 25(7):918–923, July 2003.
[10] J. Renno, J. Orwell, and G. Jones. Evaluation of shadow classification techniques for object detection and tracking. In Proc. of the IEEE Intl. Conf. on Image Processing (ICIP04), Singapore, October 2004.
[11] C. Stauffer and W. Grimson. Adaptive background mixture models for real-time tracking. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR99), pages 246–252, Ft. Collins, CO, June 1999.

Figure 4. Intelligent room. Top row: frame #170; bottom row: frame #296. Left to right: (a) original image, (b) after MoG, (c) after shadow detection, (d) ground-truth.

Figure 5. Top row: walk-in #16; middle row: walk-in #35; bottom row: parking-lot #34. Left to right: original, after MoG, after shadow detection.