2010 13th International IEEE Annual Conference on Intelligent Transportation Systems, Madeira Island, Portugal, September 19-22, 2010
A Shadow Removal Algorithm for Vehicle Detection based on Reflectance Ratio and Edge Density

M. Vargas, Member, IEEE, S. L. Toral, Senior Member, IEEE, J. M. Milla, F. Barrero, Senior Member, IEEE

This work has been supported by ACISA and the Spanish Ministry of Education and Science (National Research Projects DPI2007-60128 and DPI2007-64697) and the Consejería de Innovación, Ciencia y Empresa (Research Project P07-TIC-02621). J. M. Milla and M. Vargas are with the Department of Automation and Systems Engineering, University of Seville, Spain (e-mail: {vargas,jmilla}@esi.us.es). S. L. Toral and F. Barrero are with the Electronic Engineering Department, University of Seville, Spain (e-mail: {toral,fbarrero}@esi.us.es).
Abstract— Automatic vehicle detection systems for urban and inter-urban traffic using computer vision are frequently based on background subtraction methods. Moving shadows represent a serious difficulty for these methods, as they appear as part of the segmented foreground vehicles. Shadow removal algorithms usually rely on exploiting color properties. However, the use of image color information, when available, is more computationally demanding and may compromise many real-time implementations. This paper proposes a shadow removal algorithm, suitable for background subtraction methods, that requires only grayscale information. The method is based on edge density computation on a quotient image obtained from the current frame and the background model. Experimental results from various traffic scenes are provided to demonstrate the validity of the proposed method.
I. INTRODUCTION

Moving object segmentation is an important issue in many applications, such as automatic video surveillance of people activities or traffic scenes [1], [2]. In many cases, motion detection must be performed under natural or unstructured illumination conditions, where shadows cast by moving objects play a very important role, as they will naturally be segmented as part of the foreground moving objects. As a consequence, the object size and shape can be seriously distorted, or even fairly separated objects can be connected by their shadows, causing, in any case, the tracking or recognition system to fail.

Shadows are caused by objects occluding the light coming from a light source. The object blocks the light on one side, causing a dark area on the other, with an outline similar to its silhouette. This shadow is known as the cast shadow, as opposed to the self shadow, which is the part of the object not illuminated by the light source. For object recognition, object tracking and many other applications, cast shadows are undesired and should be removed, while self shadows are parts of the objects and should be preserved [3].
Fortunately, the cast shadow falls on a surface belonging to the background model of the image, while the self shadow falls on a surface belonging to the object itself. The comparison of foreground objects with the background model therefore allows not only the detection of moving objects, but also the distinction between cast and self shadows [4]. Consequently, the rest of the paper will focus on the cast shadow, which will be referred to from now on simply as shadow.

Typically, shadow removal is one of the early stages in video-based vehicle detection systems. Shadow removal algorithms try to discriminate shadows from the rest of the foreground image by exploiting some of their physical properties [5]. The majority of the techniques are based on color images, which provide much richer information [6], [7], [8]. However, color information is not always available, as grayscale video streams are still usual in video surveillance and traffic monitoring applications [9]. Moreover, if processing the image color information is only a demand of the shadow removal algorithm (not being necessary for other processing steps), significant computational effort can be saved by providing a shadow removal algorithm based only on grayscale information. This constitutes a more challenging problem, which is tackled in this paper.

Some alternatives for shadow detection using just luminance information are based on the comparison between the current frame, I, and the background model, M, using textures, quotients or correlations. This paper takes the quotient approach as a starting point and introduces several enhancements to improve the detection performance without compromising a real-time implementation.

The rest of the paper is organized as follows. The next section gives an overview of the existing techniques for moving shadow removal in grayscale video streams. Section III describes the proposed approach and Section IV provides results about the shadow detection performance in several typical traffic scenes. Finally, the paper is concluded in Section V.

II. OVERVIEW OF GRAYSCALE SHADOW REMOVAL ALGORITHMS

Theoretically, a shadow does not affect the texture of the surface on which it is cast [10]. Consequently, an accurate measurement of the texture of surfaces could be used for shadow detection. The main problem of an exhaustive measurement of textures is its high computational cost. That is the reason why several simpler texture indicators have been proposed in the literature. One possibility consists of using the gradient images $\nabla I$ and $\nabla M$, instead of the original current frame I and background model M, to derive a texture measurement [11], like the expression given by equation (1):

$$R(i,j) = 1 - \frac{\sum_{(n,m)} 2\,\|\nabla I(n,m)\|\,\|\nabla M(n,m)\|\cos\theta}{\sum_{(n,m)} \left(\|\nabla I(n,m)\|^{2} + \|\nabla M(n,m)\|^{2}\right)} \qquad (1)$$

where (n,m) are the coordinates of pixels belonging to a given neighborhood of (i,j), and $\theta$ is the angle between $\nabla I(n,m)$ and $\nabla M(n,m)$.
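As an illustration only, the following Python sketch computes this measure with OpenCV, using the identity $\|\nabla I\|\|\nabla M\|\cos\theta = \nabla I \cdot \nabla M$; the use of Sobel gradients, the window size and the numerical guard are our assumptions, not prescribed by [11]:

```python
import numpy as np
import cv2

def texture_ratio(I, M, win=5):
    """Gradient-based texture measure of eq. (1).
    I, M: float32 grayscale current frame and background model.
    Values near 0 indicate similar local texture (background or
    shadowed background); values near 1 indicate a texture change."""
    # Sobel gradients of frame and background model
    Ix, Iy = cv2.Sobel(I, cv2.CV_32F, 1, 0), cv2.Sobel(I, cv2.CV_32F, 0, 1)
    Mx, My = cv2.Sobel(M, cv2.CV_32F, 1, 0), cv2.Sobel(M, cv2.CV_32F, 0, 1)

    dot = Ix * Mx + Iy * My                  # |grad I||grad M| cos(theta)
    mag2 = Ix**2 + Iy**2 + Mx**2 + My**2     # |grad I|^2 + |grad M|^2

    # Sum numerator and denominator over a (win x win) neighborhood
    k = np.ones((win, win), np.float32)
    num = cv2.filter2D(2.0 * dot, -1, k)
    den = cv2.filter2D(mag2, -1, k) + 1e-6   # guard against division by zero
    return 1.0 - num / den
```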
Other measurements reported in the literature are based on local binary patterns (LBP) [12], [13]. They consist of applying a mask, usually 3x3, with a weight associated to each of the 8 neighbors; the first consecutive powers of two, starting from 1, are used in [12]. Each of the eight neighbors is thresholded against the value of the central pixel, binarizing the result of this operation. Then, the weights of the pixels activated by the thresholding operation are added, obtaining a measure of the relative local intensity of each pixel. A local contrast value is also obtained, as detailed in equation (2):

$$C(i,j) = \frac{\sum_{(n,m)} I(n,m)\,B(n,m)}{\sum_{(n,m)} B(n,m)} - \frac{\sum_{(n,m)} I(n,m)\,\bar{B}(n,m)}{\sum_{(n,m)} \bar{B}(n,m)} \qquad (2)$$

where B(n,m) is the binary mask resulting from the previous thresholding operation and $\bar{B}(n,m)$ its complement.
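A minimal, unoptimized sketch of the 3x3 LBP code and the contrast value of equation (2) for a single pixel; the clockwise neighbor ordering is one common convention, not mandated by [12]:

```python
import numpy as np

def lbp_and_contrast(I, i, j):
    """3x3 LBP code and local contrast C of eq. (2) at pixel (i, j)."""
    patch = I[i-1:i+2, j-1:j+2].astype(np.float64)
    center = patch[1, 1]

    # The 8 neighbors in clockwise order, thresholded against the center
    neighbors = np.array([patch[0, 0], patch[0, 1], patch[0, 2],
                          patch[1, 2], patch[2, 2], patch[2, 1],
                          patch[2, 0], patch[1, 0]])
    B = neighbors >= center

    # LBP code: add the powers-of-two weights of the activated neighbors
    lbp = int(np.sum((2 ** np.arange(8)) * B))

    # Contrast: mean of activated pixels minus mean of the remaining ones
    above, below = neighbors[B], neighbors[~B]
    C = (above.mean() if above.size else 0.0) - \
        (below.mean() if below.size else 0.0)
    return lbp, C
```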
Although a lot of information is summarized in these two values, the disadvantage of such measurements is their high noise sensitivity, mainly in those pixels weighted with the highest powers of two. Several improvements have been introduced to overcome these difficulties, like the rotation-invariant LBP [13].

A different strategy for shadow removal is based on the image quotient, obtained as I/M. The image quotient has the property of remaining almost constant, with a small standard deviation, in those areas where shadows are cast [14]. Some authors have observed that the intensity variation on this quotient image is approximately linear, which means a uniform gradient [15].

Finally, the correlation between the same surface with and without shadow can be used for shadow removal. The normalized cross-correlation is calculated in [16] for each pixel according to equation (3):

$$NCC(i,j) = \frac{E_R(i,j)}{\sqrt{E_M(i,j)\,E_I(i,j)}} \qquad (3)$$

where

$$E_R(i,j) = \sum_{(n,m)} M(n,m)\,I(n,m), \quad E_M(i,j) = \sum_{(n,m)} M(n,m)^{2}, \quad E_I(i,j) = \sum_{(n,m)} I(n,m)^{2},$$

$E_M$ being the energy of the background model and $E_I$ the energy of the current frame, both computed over a neighborhood of (i,j).
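The per-pixel NCC of equation (3) can be computed efficiently with box filters, as in this sketch; the window size and the epsilon are illustrative choices:

```python
import numpy as np
import cv2

def ncc_map(I, M, win=5):
    """Per-pixel normalized cross-correlation of eq. (3).
    NCC close to 1 means the pixel differs from the background only by
    a gain factor, which makes it a shadow candidate."""
    k = np.ones((win, win), np.float32)
    ER = cv2.filter2D(I * M, -1, k)   # sum of M(n,m) I(n,m)
    EM = cv2.filter2D(M * M, -1, k)   # background energy
    EI = cv2.filter2D(I * I, -1, k)   # frame energy
    return ER / (np.sqrt(EM * EI) + 1e-6)
```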
Despite the diverse strategies considered by the previous algorithms, all of them present common disadvantages:
• They heavily depend on the illumination conditions of the scene. In indoor environments with controlled illumination they can achieve very good results, but in outdoor scenes with unknown illumination conditions, like traffic scenes, their performance is considerably reduced.
• Another problem is the removal of the shadow periphery, where the shadow properties these algorithms rely on are not preserved.
• In the particular case of traffic scenes, surfaces like vehicles or pavement exhibit a poor variety of textures, which is clearly inadequate for those techniques relying on the preservation of the texture of the surface where shadows are cast.

III. PROPOSED SHADOW REMOVAL ALGORITHM

Basically, the proposed algorithm consists of computing the quotient image between the current frame and the background model; on this image, the areas with high edge density are located (these are expected to be concentrated on the vehicles) and a region growing process is applied in order to recover each whole vehicle as a unified blob. However, after the previous steps, vehicles are often partitioned into two or more blobs. To overcome this situation, a subsequent clustering procedure is applied to reunify separate regions belonging to the same vehicle. The proposed procedure can be decomposed into the following six steps. The starting point is the application of background subtraction using a Sigma-Delta algorithm [17], [18], [19], [20].

Step 1. Computation of the modified quotient image. The quotient image is typically calculated as the quotient between the current frame I and the background model M, or its inverse, as in equation (4):

$$C_1(i,j) = \frac{I(i,j)}{M(i,j)}\,; \qquad C_2(i,j) = \frac{M(i,j)}{I(i,j)} \qquad \forall\ \text{pixel}\ (i,j) \qquad (4)$$
The quotient image has the property of amplifying the intensity of the surfaces where shadows or reflections are cast. However, the effects of shadows and reflections are just the opposite: while shadows darken the regions where they are cast, reflections lighten them. To remove both shadows and reflections indistinctly, the quotient image must be calculated in a way in which both phenomena are treated equally. In the case of C1, the shadow spectrum is concentrated in the [0, 1) interval, while the reflection spectrum spans the (1, ∞) interval. The case of C2 is just the opposite. Notice that both C1 and C2 are unbounded, which may cause problems from a computational perspective [17]. One way of treating both phenomena equally consists of taking logarithms in (4); both intervals then become equal in amplitude. An alternative solution provided in the literature achieves the same spectrum amplitude, but leads to better results [21], eq. (5):

$$C_3(i,j) = \frac{I(i,j) - M(i,j)}{I(i,j) + M(i,j)} \in [-1, 1] \qquad (5)$$
According to [21], the quotient image is a measure of the relative reflectance of the pixels; that is, the quotient image represents the relative reflectance of the images to which it is applied. This value is uniform in those regions which are identical in both images (like the visible background areas) or in which one image is a constant multiple of the other (like shadows and reflections). Compared to other alternatives, the C3 measure proposed in [21] exhibits better noise immunity.
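A minimal sketch of Step 1, assuming float32 grayscale inputs and a boolean detection mask; the epsilon guarding black pixels is our addition:

```python
import numpy as np

def modified_quotient(I, M, D):
    """Modified quotient image C3 of eq. (5), masked with the detection
    mask D from background subtraction.
    C3 lies in [-1, 1]: negative where the frame is darker than the
    model (shadows), positive where it is lighter (reflections), and
    near 0 on unchanged background."""
    C3 = (I - M) / (I + M + 1e-6)   # epsilon (our addition) avoids 0/0
    return np.where(D, C3, 0.0)
```

Note that shadows and reflections are symmetric around zero in C3, so a single magnitude threshold treats both phenomena alike.

Figure 1 illustrates the modified quotient image, C3, on a typical urban traffic scene.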
Figure 1. Example of quotient image C3. From left to right and from top to bottom: current frame I, background model M, detection mask D and quotient image C3 (masked with D).

Step 2. Computation of the modified-quotient gradient image. The gradient image is obtained from the modified quotient image by applying 3x3 horizontal and vertical Sobel operators. Figure 2 shows the resulting gradient image. It must be noticed that the gradient computation is restricted to the active pixels in the detection mask, D, obtained from background subtraction. It can be seen that the shadows appear as gradient-free areas, except at their contours.

Figure 2. Modified-quotient gradient image (right), after masking with detection image (left).

Step 3. Morphological erosion of the detection mask. In view of the results from the previous step, it is necessary to avoid the undesirable detection of edges belonging to the shadows' periphery. In this step, a morphological erosion operation is applied to the detection mask, D, obtained through background subtraction. The problem is that this erosion step also removes the desirable vehicles' periphery, so an appropriate selection of the erosion operator size is required to minimize this inconvenience. In general, the size of the vehicles and their edge thickness will differ depending on their position in the image. Hence, a fixed-size erosion operation may have an insufficient effect on the nearest shadows, while more distant vehicles can even disappear as a result of this operation. Instead, an adaptive selection of the erosion operator size is performed, taking into account how the perspective affects the size of vehicles. During an off-line calibration step, two bounding boxes are defined. They correspond to a medium-sized typical vehicle located at the most distant and at the closest positions (usually, upper and lower rows, respectively, inside the desired region of interest for detection) with respect to the camera. Figure 3 illustrates the effect of the adaptive erosion on the detection mask. The left image of this figure corresponds to the adaptively-eroded detection mask, and the effect of masking the gradient image with the eroded detection mask is shown in the right image. Compared with the result shown in Figure 2, it can be seen that most of the shadows' boundaries have been removed.
Figure 3. Modified-quotient gradient image (right), after masking with adaptively-eroded detection image (left).

Step 4. Computation of the edge density image. The previous step does not completely remove those parts of the shadow periphery which are closest to the vehicle periphery itself. In this fourth step, an edge image is first obtained from the binarization of the gradient image, highlighting relevant edges (see upper-right image in Figure 4). Next, in order to emphasize the core of the foreground vehicles, which is expected to have a higher edge density, an adaptive-sized density mask is applied (the adaptation rule is based on a principle similar to that of the erosion operation). Basically, the density operation consists of a convolution smoothing operation. To prevent the resulting density image from expanding beyond the regions delimited by the gradients, the obtained image is again masked using the eroded detection image (see bottom-left image in Fig. 4). A subsequent binarization operation filters out areas with low edge density, as shown in the bottom-right image in Fig. 4. It can be seen that, as a result of this step, many disjoint blobs are obtained, some of them possibly belonging to the same foreground object. The next two steps will try to
reunify those blobs which are, hopefully, part of the same object.
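A sketch of Steps 3 and 4 under assumed parameters: the band-wise approximation of the per-row kernel adaptation and the threshold values are ours; the paper calibrates the sizes from the two prototype bounding boxes.

```python
import numpy as np
import cv2

def adaptive_erosion(D, row_far, row_near, k_far=3, k_near=9, bands=4):
    """Step 3 sketch: erode the detection mask D with a kernel growing
    linearly from k_far (top of the ROI) to k_near (bottom), applied in
    horizontal bands to approximate a per-row adaptation."""
    out = np.zeros_like(D)
    rows = np.linspace(row_far, row_near, bands + 1).astype(int)
    for b in range(bands):
        t = (b + 0.5) / bands                             # band position
        k = int(round(k_far + t * (k_near - k_far))) | 1  # odd kernel size
        eroded = cv2.erode(D, np.ones((k, k), np.uint8))
        out[rows[b]:rows[b + 1]] = eroded[rows[b]:rows[b + 1]]
    return out

def edge_density(grad_C3, D_eroded, grad_thr=0.1, win=15, dens_thr=0.35):
    """Step 4 sketch: binarize the gradient of C3, smooth it with a box
    kernel (the density operation), mask it with the eroded detection
    mask and keep only the high-density areas (vehicle cores)."""
    edges = (grad_C3 > grad_thr).astype(np.float32)
    box = np.ones((win, win), np.float32) / (win * win)
    density = cv2.filter2D(edges, -1, box)
    density *= (D_eroded > 0)           # do not expand beyond the mask
    return density, (density > dens_thr).astype(np.uint8)
```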
Figure 4. From left to right and from top to bottom: masked gradient image, binary edge image, edge density image, binarized edge density.
Step 5. Seeded region growing. The disjoint blobs resulting from the previous step are used as possible multiple seeds for recovering each whole vehicle. A seeded region growing algorithm is performed on the gradient density image, using, as a stopping condition, a more permissive threshold than the one used for seed extraction. The growth-limit threshold should also be adaptive to the relative size of the vehicles in the image (and, consequently, to the size of the density mask). Figure 5 gives the result of this step for the exemplified frame.
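Step 5 can be sketched as a hysteresis-style morphological reconstruction: seeds above the strict threshold are repeatedly dilated, but constrained to the region above the more permissive one. The threshold values here are placeholders; as noted above, the paper adapts them to vehicle size.

```python
import numpy as np
import cv2

def seeded_growing(density, seed_thr=0.35, grow_thr=0.15):
    """Step 5 as morphological reconstruction: blobs above seed_thr are
    grown, by repeated dilation, into the more permissive region where
    the edge density exceeds grow_thr."""
    grow = (density > grow_thr).astype(np.uint8)
    blob = (density > seed_thr).astype(np.uint8)
    kernel = np.ones((3, 3), np.uint8)
    while True:
        grown = cv2.dilate(blob, kernel) & grow  # expand, but stay inside
        if np.array_equal(grown, blob):          # converged
            return grown
        blob = grown
```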
Figure 5. Result of the seeded growing algorithm.

Ideally, the resulting image contains single blobs corresponding to vehicles, while shadows have been removed. However, in many cases, a single vehicle will still remain as a compound of several disjoint blobs. An example is shown in Figure 6, where the nearest car appears as three disjoint blobs, and where the missing parts of the vehicle correspond to relatively large uniform patches. A final clustering step tries to solve these cases.

Step 6. Blob clustering. The chosen blob clustering algorithm is based on the simple concept of minimum distance: whenever the distance between two blobs is below a given threshold, they are considered to belong to the same object and they are labelled as connected blobs. Finally, all the blobs with the same label are reunified in a single cluster, the properties of which are recorded (centre of mass, area, bounding box). Of course, an appropriate definition of "distance" is critical for this step. The exact computation of the minimum distance between two blobs of arbitrary shape would be too computationally expensive, so an alternative solution is adopted in order to reduce this cost. A grid is overlaid on the binary image resulting from the previous step, decomposing it into a cell matrix. The object fragment inside each cell (the set of white pixels inside the cell) is then modelled as a circle whose centre coincides with the centre of mass of the fragment and whose radius is proportional to the cell occupancy. With this simplification, the problem is reduced to the computation of the minimum distance between each two circles, i and j, given by $d_{ij} = \|CM_i - CM_j\| - (R_i + R_j)$, where $CM$ denotes the centre of mass of each circle and $R$ the corresponding radius. Again, a progressive distance metric is used in order to take the perspective effect into account; according to this, the distance threshold is more permissive as the objects come closer to the camera.
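The following sketch implements the described cell-circle approximation with a union-find structure for label merging; the cell size, the fixed distance threshold and the exact radius rule are our assumptions (the paper makes the threshold grow towards the camera):

```python
import numpy as np

def cluster_blobs(binary, cell=16, dist_thr=8.0):
    """Step 6 sketch: model the white pixels inside each grid cell as a
    circle (center of mass, radius proportional to occupancy) and link
    circles whose border distance d_ij is below a threshold."""
    H, W = binary.shape
    circles = []
    for r in range(0, H, cell):
        for c in range(0, W, cell):
            ys, xs = np.nonzero(binary[r:r+cell, c:c+cell])
            if ys.size:
                occupancy = ys.size / float(cell * cell)
                radius = (cell / 2.0) * occupancy  # proportional to occupancy
                circles.append((r + ys.mean(), c + xs.mean(), radius))

    parent = list(range(len(circles)))
    def find(a):                        # union-find with path compression
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for i in range(len(circles)):
        for j in range(i + 1, len(circles)):
            (yi, xi, ri), (yj, xj, rj) = circles[i], circles[j]
            if np.hypot(yi - yj, xi - xj) - (ri + rj) < dist_thr:
                parent[find(i)] = find(j)          # same cluster

    return circles, [find(i) for i in range(len(circles))]
```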
Figure 6 illustrates how the clustering algorithm works.

Figure 6. Example of the proposed clustering algorithm.

This figure shows, from left to right and from top to bottom, the current image, the resulting image after the seeded region growing (with the superimposed grid), the same image showing the fragment-modelling circles, and the final result of the clustering algorithm with the vehicle bounding boxes. Notice that the three blobs corresponding to the nearest vehicle have been clustered together, while several distant vehicles can be mixed in the same cluster.

IV. EXPERIMENTAL RESULTS

The proposed shadow removal procedure has been tested under different conditions. A quantitative metric is necessary to measure the suitability of the proposed algorithm. Benchmark metrics have been proposed in [5], but they are not suitable for obtaining large statistics, since they require a priori labeling of each pixel on each frame, which is very time-consuming and tedious work. In this paper, bounding-box fitting is proposed as a benchmark metric instead. The shadow removal algorithm is considered to behave properly if the bounding-box size of the resulting cluster does not imply an increment or reduction of more than 25% with respect to the original bounding-box size of the vehicle. Figure 7 illustrates two possible situations.
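One possible reading of this criterion, interpreting "bounding-box size" as area (an assumption; side lengths would work similarly):

```python
def bbox_acceptable(w_true, h_true, w_out, h_out, tol=0.25):
    """Accept the result if its bounding-box area deviates by no more
    than 25% from the true vehicle bounding box."""
    area_true, area_out = w_true * h_true, w_out * h_out
    return abs(area_out - area_true) <= tol * area_true
```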
Figure 7. Bounding box of the vehicle (1), and acceptable (2) and unacceptable (3) bounding boxes after shadow removal.

The bounding box labeled as 1 is the true bounding box of the vehicle. The bounding boxes labeled as 2 represent the limit bounding boxes considered acceptable after shadow removal (by excess and by default). Finally, the bounding boxes labeled as 3 are unacceptable, either over-sizing or under-sizing the right one.

Three scenarios have been used to test the proposed algorithm. The first scenario (Figure 8, upper image) corresponds to the Highway I sequence (http://cvrr.ucsd.edu:88/aton/shadow), frequently used as a benchmark for vehicle and shadow detection [5], [22], [23]. The other two scenarios illustrate typical urban and highway situations and correspond to currently installed cameras in Seville and Almeria, Spain. It must be emphasized that both of these sequences have been recorded using low-cost analog video cameras.

A variable mask size has been used in several steps of the proposed algorithm to compensate for the perspective effect. In particular, variable mask sizes are involved in the morphological erosion process and in the edge density computation. For instance, Table 1 relates the area of a typical medium-sized vehicle (estimated, by interpolation, for each row inside the region of interest for detection) to the kernel size used for erosion. Similarly, an adaptive distance metric is used during the clustering step.

Table 2 details the obtained results. For each sequence, the first row is the number of shadows correctly removed, according to the previously defined bounding-box fitting criterion. The second and third rows distinguish the cases of erroneous shadow removal due to a resulting bounding box above and below the chosen threshold, respectively. Finally, the fourth row is the number of vehicles merged during the shadow removal algorithm (also accounted as over- or under-sized bounding boxes).
Figure 8. Test traffic sequences.
AREA_Veh            Kernel size
0 – 15 pixels       3x3
15 – 65 pixels      5x5
65 – 110 pixels     7x7
110 – 150 pixels    9x9

Table 1. Kernel size as a function of the prototype vehicle area.

The Highway I sequence shows a traffic environment where shadow suppression is very important to avoid misclassification and erroneous counting of vehicles. Shadows have been successfully removed for 76 of a total of 89 vehicles, which represents a success rate of 85.4%. The success rates for the second and third scenarios are reduced to 76.12% and 78.43%, respectively. This is due to the low quality of both sequences and to the presence of motorbikes and trucks. In the case of the Seville sequence, vehicles are close to each other due to the narrow lanes, producing a much higher vehicle-merging ratio. The third sequence has the additional difficulty of a traffic panel casting its shadow on the road. Despite these difficulties, an acceptable success rate is achieved.

Highway I          Cars   M.bikes   Trucks
Detection            76         0        0
Oversized BB          2         0        0
Undersized BB         2         0        0
Vehicles merged       9         0        0
Total                89         0        0

Seville            Cars   M.bikes   Trucks
Detection           366         4        3
Oversized BB          1        25        5
Undersized BB         6         1        0
Vehicles merged      68         6        5
Total               441        36       13

Almeria            Cars   M.bikes   Trucks
Detection           234         0       17
Oversized BB          6         0       22
Undersized BB        25         0        1
Vehicles merged       8         0        7
Total               273         0       47

Table 2. Shadow detection results for the three proposed test sequences.
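The mapping of Table 1 reduces to a simple lookup; the fallback for areas above 150 pixels is our assumption:

```python
def erosion_kernel_size(area_veh):
    """Kernel size as a function of the prototype vehicle area (Table 1)."""
    for limit, k in ((15, 3), (65, 5), (110, 7), (150, 9)):
        if area_veh <= limit:
            return k
    return 9  # fallback for larger areas (our assumption)
```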
V. CONCLUSION

Moving shadow removal is a complex and very important problem for outdoor, real-world applications in unstructured environments. Although many algorithms based on colour information have been reported in the literature, this paper deals with B/W sequences. The proposed method is based on the combination of a quotient image, representing the reflectance ratio between the current frame and a background model, and an edge-density estimation process. The specific problem caused by the shadow boundaries is dealt with by an adaptive erosion step, prior to the edge-density estimation. Finally, a very time-efficient blob clustering algorithm is applied in order to combine different parts of the same object.

Three different scenarios have been chosen to test the proposed method. The first one is a common benchmark sequence, exhibiting long shadows and a good point of view. The other two are low-quality sequences with a worse point of view and slow-moving background shadows. A suitable metric is proposed, in terms of bounding-box deviation, in order to evaluate the algorithm. A success rate above 75% is obtained in all reported cases.
REFERENCES

[1] A. Leone, C. Distante, F. Buccolieri, "A shadow elimination approach in video-surveillance context", Pattern Recognition Letters, Vol. 27, pp. 345–355, 2006.
[2] F. Barrero, S. Toral, M. Vargas, J. M. Milla, F. Cortés, "Internet in the Development of Future Road-Traffic Control Systems", Internet Research, Vol. 20, Iss. 2, pp. 154–168, 2010.
[3] J. M. Wang, Y. C. Chung, C. L. Chang, S. W. Chen, "Shadow Detection and Removal for Traffic Images", Proc. 2004 IEEE International Conference on Networking, Sensing and Control, Taipei, Taiwan, 2004, pp. 649–654.
[4] P.-M. Jodoin, M. Mignotte, J. Konrad, "Statistical Background Subtraction Using Spatial Cues", IEEE Transactions on Circuits and Systems for Video Technology, Vol. 17, Iss. 12, pp. 1758–1763, 2007.
[5] A. Prati, I. Mikic, M. Trivedi, R. Cucchiara, "Detecting moving shadows: algorithms and evaluation", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 25, Iss. 3, pp. 918–923, 2003.
[6] J. Yao, Z. M. Zhang, "Hierarchical shadow detection for color aerial images", Computer Vision and Image Understanding, Vol. 102, pp. 60–69, 2006.
[7] E. Salvador, A. Cavallaro, T. Ebrahimi, "Cast shadow segmentation using invariant color features", Computer Vision and Image Understanding, Vol. 95, pp. 238–259, 2004.
[8] S. Nadimi, B. Bhanu, "Moving shadow detection using a physics-based approach", Proc. IEEE International Conference on Pattern Recognition, Vol. 2, 2002, pp. 701–704.
[9] Y. Wang, "Real-Time Moving Vehicle Detection With Cast Shadow Removal in Video Based on Conditional Random Field", IEEE Transactions on Circuits and Systems for Video Technology, Vol. 19, Iss. 3, pp. 437–441, 2009.
[10] O. Javed, M. Shah, "Tracking and object classification for automated surveillance", Proc. European Conference on Computer Vision, Vol. 4, 2002, pp. 343–357.
[11] L. Li, M. K. H. Leung, "Integrating Intensity and Texture Differences for Robust Change Detection", IEEE Transactions on Image Processing, Vol. 11, Iss. 2, pp. 105–112, 2002.
[12] T. Ojala, M. Pietikainen, D. Harwood, "A Comparative Study of Texture Measures with Classification based on Feature Distributions", Pattern Recognition, Vol. 29, pp. 51–59, 1996.
[13] T. Ojala, M. Pietikainen, T. Maenpaa, "Multiresolution Gray-Scale and Rotation Invariant Texture Classification with Local Binary Patterns", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, Iss. 7, pp. 971–987, 2002.
[14] J. C. Silveira Jacques Jr., C. Rosito Jung, S. Raupp Musse, "Background Subtraction and Shadow Detection in Grayscale Video Sequences", Proc. 18th Brazilian Symposium on Computer Graphics and Image Processing (SIBGRAPI 2005), 2005, pp. 189–196.
[15] A. Bevilacqua, "Effective Shadow Detection in Traffic Monitoring Applications", Journal of WSCG, Vol. 11, No. 1, pp. 57–64, 2003.
[16] Y.-L. Tian, M. Lu, A. Hampapur, "Robust and Efficient Foreground Analysis for Real-time Video Surveillance", Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), 2005, pp. 1182–1187.
[17] S. L. Toral, M. Vargas, F. Barrero, M. G. Ortega, "Improved Sigma-Delta Background Estimation for Vehicle Detection", Electronics Letters, Vol. 45, Iss. 1, pp. 32–34, 2009.
[18] S. L. Toral, M. Vargas, F. Barrero, "Embedded Multimedia Processors for Road-Traffic Parameter Estimation", Computer, Vol. 42, No. 12, pp. 61–68, 2009.
[19] A. Manzanera, J. C. Richefeu, "A new motion detection algorithm based on Sigma-Delta background estimation", Pattern Recognition Letters, Vol. 28, pp. 320–328, 2007.
[20] M. Vargas, J. M. Milla, S. L. Toral, F. Barrero, "An Enhanced Background Estimation Algorithm for Vehicle Detection in Urban Traffic Scenes", IEEE Transactions on Vehicular Technology, in press, doi: 10.1109/TVT.2010.2058134.
[21] S. Nayar, R. Bolle, "Reflectance based object recognition", International Journal of Computer Vision, Vol. 17, No. 3, pp. 219–240, 1996.
[22] R. Cucchiara, C. Grana, M. Piccardi, A. Prati, "Detecting objects, shadows and ghosts in video streams by exploiting color and motion information", Proc. 11th International Conference on Image Analysis and Processing, 2001, pp. 360–365.
[23] M. M. Trivedi, I. Mikic, G. Kogut, "Distributed video networks for incident detection and management", Proc. IEEE Intelligent Transportation Systems Conference, 2000, pp. 155–160.