GENERATIVE MODEL FOR ABANDONED OBJECT DETECTION

Jianting Wen 1,2, Haifeng Gong 1,3, Xia Zhang 2 and Wenze Hu 1,3

1 Lotus Hill Institute, Ezhou, China
2 Institute of Remote Sensing Applications, Chinese Academy of Sciences, Beijing, China
3 Department of Statistics, University of California Los Angeles, USA

ABSTRACT

This paper proposes an algorithm for abandoned object detection based on a generative model of low-level features. First, suspected blobs are detected by foreground detection and pixel variance thresholding. Then several low-level features are calculated on the blobs to remove false alarms: pixel variance over time, edge intensity score, edge variance over time, foreground completeness, histogram contrast with respect to the surrounding background, a bag color model, and priors on height and width. The last two features are used as thresholds. For the other features, the log probability ratio between their distributions on positive and negative examples is fitted by a sigmoid function. Finally, the fitted log probability ratios are weighted to construct a classifier. The algorithm has been verified in 29 challenging scenes and produces very few false alarms and missing detections.

Index Terms— Abandoned object detection, generative model, video surveillance

1. INTRODUCTION

Abandoned object detection is one of the most important tasks in video surveillance. In the literature, there are mainly two types of methods. The first type uses a background model to detect suspected regions and certain low-level processing to suppress possible false alarms; such processing includes shadow removal and boundary matching [1], classification on gradient histograms [2], and multi-layer modeling [3]. However, methods of this type are usually limited to simple situations and produce many false alarms in complex scenes. The second type of methods is based on tracking. For example, Auvinet et al. [4] use a multi-camera event detection system to recognize bags left by their owners. Ferrando et al. [5] also detect abandoned bags by tracking, and differentiate removed objects as well. Spengler and Schiele [6] detect abandoned objects based on a multi-person tracker and a blob-based object detector.
Smith et al. [7] track people and bags simultaneously and tell bags from people by their lifespan and size. The effectiveness of such methods relies on the robustness of tracking, which has not yet been ensured.

In this paper, we propose a method to detect abandoned bags with generative models for a set of carefully designed low-level features. We use only a single-camera system and deal with challenging scenes including railways, waiting rooms, bus shelters, etc. The method depends only on a background model, so it is more robust than tracking-based methods. Because the features are carefully designed to target several sources of false alarms, the system is very robust. First, suspected blobs are detected by foreground detection and pixel variance thresholding. Then several low-level features are calculated on the blobs to remove false alarms: pixel variance over time, edge intensity score, edge variance over time, foreground completeness, histogram contrast with respect to the surrounding background, a bag color model, and priors on height and width. The last two features are used as thresholds. For the other features, the log probability ratio between their distributions on positive and negative examples is fitted by a sigmoid function. Finally, the fitted log probability ratios are weighted to construct a classifier.

The remainder of the paper is organized as follows. Section 2 introduces the preprocessing step, and Section 3 presents our low-level features. Section 4 constructs the assembled classifier that discriminates abandoned bags from other suspected objects. In Section 5, experiments and results are reported to verify the method's effectiveness. Finally, conclusions are given in Section 6.

This work was done at Lotus Hill Institute and supported by the following funding: NSFC 60875005, 60832004 and Commission of Science Technology and Industry for National Defense of China 5.3.1.

978-1-4244-5654-3/09/$26.00 ©2009 IEEE. ICIP 2009.

2. SUSPECTED FOREGROUND MASK DETECTION

To avoid exhaustive scanning of all possible bounding boxes, we first introduce two criteria to screen out a small number of suspected regions. To be an abandoned object, a region should satisfy two conditions. First, it should be a foreground object.
Second, it should remain static in recent frames. So we require that suspected areas be mostly foreground in recent frames and have low variance. Our algorithm is integrated into a video surveillance system, and the background modeling module is provided by that system. In the integrated system, three background models


(LBP [8], IPCA [9], and a Gaussian mixture model [10]), together with different shadow removal strategies, are switched according to the scene type. Based on the background modeling, we find the pixels occupied by foreground in 70% of recent frames and compute their robust variances over those frames. Pixels with robust variance lower than a threshold (e.g. 200) are taken as suspected regions. On the mask of suspected pixels, we apply connected component analysis to obtain a set of connected components. We remove components with small area and fit the remaining ones with rectangles by second-order moment analysis, which yields a set of bounding boxes. For each bounding box, several features are computed and combined into a classifier. To tolerate inaccuracy in locating the bounding box, we scan around each bounding box and take the maximal classifier score as the final score.

3. FEATURE EXTRACTION

Several low-level features are extracted for each suspected bounding box and used to construct a classifier.

Pixel Variance (PV). Static objects should have small variance over time. We calculate PV = Σ_{x∈Ω} log σ_x², where Ω is the suspected foreground mask and σ_x² is the robust variance at position x from frame n to frame n + t. Robust variance is used instead of the plain variance to allow occasional occlusions of the bag.

Edge Variance (EV). Since objects with small movements may also have a small PV, we add EV to further check the static status and discriminate true static objects from small movements. EV is more sensitive to small motions than PV, as illustrated in Fig. 1. Let u_x and v_x denote the horizontal and vertical gradient values at position x. We calculate EV = Σ_{x∈Ω} (log σ²_{u_x} + log σ²_{v_x}), where σ²_{u_x} and σ²_{v_x} are the robust variances of u_x and v_x over time, respectively.
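The screening step and the PV feature can be sketched in a few lines of numpy. This is a minimal sketch under stated assumptions, not the system's actual implementation: the paper does not specify its robust variance estimator, so a scaled median-absolute-deviation estimate is assumed here, while the 70% foreground ratio and the variance threshold of 200 are taken from the text.

```python
import numpy as np

def robust_variance(stack):
    """Per-pixel robust variance over time via the median absolute
    deviation (MAD), so occasional occlusions barely inflate the
    estimate. The paper does not specify its robust estimator."""
    med = np.median(stack, axis=0, keepdims=True)
    mad = np.median(np.abs(stack - med), axis=0)
    return (1.4826 * mad) ** 2  # MAD scaled to a std for Gaussian data

def suspected_mask(fg_masks, frames, fg_ratio=0.7, var_thresh=200.0):
    """Pixels that are foreground in at least fg_ratio of the recent
    frames and whose robust temporal variance is below var_thresh."""
    fg_freq = fg_masks.astype(float).mean(axis=0)
    low_var = robust_variance(frames.astype(float)) < var_thresh
    return (fg_freq >= fg_ratio) & low_var

def pv_feature(frames, mask, eps=1e-6):
    """PV = sum over mask pixels of the log robust variance over time."""
    var = robust_variance(frames.astype(float))
    return float(np.sum(np.log(var[mask] + eps)))

# Toy example: a static bright patch on a jittering background.
rng = np.random.default_rng(0)
frames = rng.normal(100.0, 30.0, size=(20, 32, 32))  # noisy background
frames[:, 8:16, 8:16] = 200.0                        # static object
fg = np.zeros((20, 32, 32), dtype=bool)
fg[:, 8:16, 8:16] = True                             # its foreground mask
mask = suspected_mask(fg, frames)
print(mask[10, 10], mask[0, 0])  # True False
```

In a real system the per-frame foreground masks would come from the switched background models above; here they are synthesized for the toy scene.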


Bag Color feature score (BG). The above features cannot tell bags from other objects effectively and would still leave many false alarms, such as grass, trees and walls. To further discriminate them, we define a BG feature by hypothesis testing against the typical false alarms, i.e. ground, wall and vegetation. We first learn color histograms of bags, ground, wall and vegetation offline, denoted by p_b, q_g, q_w and q_v, respectively. We then compute BG = Σ_x min( log p_b(I_x)/q_g(I_x), log p_b(I_x)/q_w(I_x), log p_b(I_x)/q_v(I_x) ), where I_x is the pixel value at position x in the current frame. Abandoned bags should have a large BG.

Edge Score (ES). The boundaries of bags should appear as edges in the image, so we calculate the ES feature, defined as the Kullback-Leibler divergence between the edge intensities in the suspected box and in the background image: ES = Σ_r p_e(r) log( p_e(r)/q_e(r) ), where p_e and q_e are histograms of edge intensities collected from the suspected bounding box and from the whole background image. This feature can remove false alarms such as shadows and lighting changes. To reduce the computational burden, the edge histogram of the background can be recomputed once per minute.

Foreground Shape Completeness (FC1). In addition to the above features, we notice that an abandoned bag should be separated from its owner, which can be measured by its foreground shape completeness. If a bag has been abandoned, the foreground mask will fill its bounding box. So FC1 is defined as the log ratio between the foreground percentage inside the suspected bounding box and the foreground percentage outside the box within a certain range. If the bag's owner does not leave, FC1 will be lower, as in Fig. 2. Moreover, FC1 suppresses other suspected objects that do not have a near-rectangular shape.
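The BG score can be sketched as follows, assuming gray-level histograms with 8 bins; the color space and binning actually used by the authors are not specified, and the toy histograms below are invented for illustration.

```python
import numpy as np

def bg_score(pixels, p_bag, q_ground, q_wall, q_veg, bins=8):
    """BG = sum over pixels of min over false-alarm classes of
    log p_bag(I_x) / q_class(I_x): how much better the bag color
    model explains each pixel than the best competing model."""
    idx = np.clip(pixels * bins // 256, 0, bins - 1).astype(int)
    eps = 1e-6
    lp = np.log(p_bag[idx] + eps)
    ratios = [lp - np.log(q[idx] + eps) for q in (q_ground, q_wall, q_veg)]
    return float(np.sum(np.minimum.reduce(ratios)))

# Toy gray-level histograms over 8 bins (learned offline in the paper).
p_bag  = np.array([.01, .01, .02, .06, .30, .40, .15, .05])
q_grnd = np.array([.40, .30, .15, .08, .04, .02, .005, .005])
q_wall = np.array([.05, .10, .30, .35, .10, .05, .03, .02])
q_veg  = np.array([.10, .35, .35, .10, .05, .03, .01, .01])

bag_like = np.full(50, 160)  # pixels falling where p_bag dominates
print(bg_score(bag_like, p_bag, q_grnd, q_wall, q_veg) > 0)  # True
```

Taking the minimum over the competing models makes the score conservative: a pixel only contributes positively if the bag model beats every false-alarm model.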


Fig. 2. Foreground shape completeness and rectangular measure.
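The FC1 contrast shown in Fig. 2 can be sketched as follows. The paper does not give the width of the surrounding range, so an 8-pixel margin is assumed here.

```python
import numpy as np

def fc1(fg_mask, box, margin=8):
    """FC1 = log(foreground fraction inside the box / foreground
    fraction in a surrounding ring). High when the blob fills its
    box and the neighbourhood (e.g. a lingering owner) is empty."""
    x0, y0, x1, y1 = box
    inside = fg_mask[y0:y1, x0:x1]
    X0, Y0 = max(0, x0 - margin), max(0, y0 - margin)
    X1 = min(fg_mask.shape[1], x1 + margin)
    Y1 = min(fg_mask.shape[0], y1 + margin)
    window = fg_mask[Y0:Y1, X0:X1]
    ring_sum = window.sum() - inside.sum()
    ring_area = window.size - inside.size
    eps = 1e-6
    return float(np.log((inside.mean() + eps) / (ring_sum / ring_area + eps)))

fg = np.zeros((64, 64), dtype=float)
fg[20:40, 20:40] = 1.0                    # bag fills its bounding box
alone = fc1(fg, (20, 20, 40, 40))
fg[20:40, 40:52] = 1.0                    # owner still beside the bag
with_owner = fc1(fg, (20, 20, 40, 40))
print(alone > with_owner)  # True
```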


Fig. 1. EV is more sensitive than PV. (a) is one dimensional pixel values at time t, (b) is its pixel values at time t + 1, (c) shows its pixel variances over time t and t + 1 at each pixel. Correspondingly, (d) is its edge values at time t, (e) is its edge values at time t + 1, (f) shows the edge variance over time t and t + 1 at each pixel.

Foreground Shape Completeness 2 (FC2). Different background models produce different foreground detection results; some have severe hollows, while others introduce more false alarms. To compensate for these differences, a simple internal background model is maintained: we apply a single Gaussian model to the background in addition to the other background models, and define another shape completeness feature like FC1, called FC2.

Histogram Contrast (HC). The above features focus on finding abandoned objects that are static in the foreground and left by their owners, and on discriminating them from the background and other surrounding objects. However, they cannot differentiate abandoned objects from removed ones: when a background object is removed, the change is also detected as foreground, and its features may be accepted by the classifier. So we introduce HC to suppress this case. HC is defined as the KL divergence between the color histogram inside the suspected bounding box and that of its surroundings: HC = Σ_r p_c(r) log( p_c(r)/q_c(r) ), where p_c and q_c are the color histograms of the suspected bounding box and of its surroundings, respectively. As shown in Fig. 3, a dropped object has a high HC, while a removed one has a low HC value.
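The HC computation can be sketched as follows; the 16-bin gray-level histograms are an assumption, since the paper does not specify its color histogram binning.

```python
import numpy as np

def kl(p, q, eps=1e-6):
    """Discrete KL divergence sum_r p(r) log(p(r)/q(r))."""
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def hist(pixels, bins=16):
    h, _ = np.histogram(pixels, bins=bins, range=(0, 256))
    return h.astype(float)

# Dropped object: box colors differ sharply from the surroundings.
inside_drop = np.full(200, 40)    # dark bag
around      = np.full(200, 200)   # bright pavement
hc_drop = kl(hist(inside_drop), hist(around))

# Removed object: the revealed background matches its surroundings.
inside_rem = np.full(200, 198)
hc_rem = kl(hist(inside_rem), hist(around))
print(hc_drop > hc_rem)  # True
```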


Fig. 3. HC: A dropped object has a high HC, while a removed one has a low HC value.

The features listed above are assembled into a classifier, which is detailed in the next section. Besides, to remove some obvious false alarms, we constrain suspected objects to meet size requirements: considering the inaccuracy of calibration, the height and width of a suspected object's rectangle should be below 1.1 and 2.5 meters, respectively.

4. GENERATIVE MODELS FOR FEATURES

To obtain a classifier with good generalization ability from few training examples, we build generative models for each feature. For each feature r, we learn its probability model p(r) on true abandoned bags. Let q(r) be the distribution of feature r on negative examples. We assume the positive distribution p(r) is of the following form:

p(r) = (1/Z) exp{λ h(kr + b)} q(r),    (1)

where h(a) = (e^a − e^{−a}) / (e^a + e^{−a}) is a sigmoid function, k and b are scale and offset parameters that adjust the feature value, λ is a shape parameter that adjusts the bandwidth of the distribution, and Z = ∫ exp{λ h(kr + b)} q(r) dr is the normalization constant. This model fits the log probability ratio log p(r)/q(r) with a transformed sigmoid function. To learn the model, we enumerate the three parameters k, b and λ over a certain range to find the maximal log-likelihood score. Given k, b and λ, the normalization constant can be approximated by the mean of exp{λ h(kr + b)} on negative examples, because Z = E_q[exp{λ h(kr + b)}]. The sigmoid-fitted results are shown in Fig. 4; the regression results are good because large errors occur only at positions with very few negative samples. After learning the generative model for each feature, the log probability score log p(r)/q(r) = λ h(kr + b) − log Z becomes a natural measure of the contribution of the corresponding feature to the overall classification. We therefore add up these scores directly as the final score; because FC1 and FC2 are highly correlated, their contributions are halved.

Fig. 4. Sigmoid function regression of each feature's log ratio. Blue lines indicate the original log ratios and green lines are the regression results. They are learnt from the waiting room videos.
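The fitting procedure of Eq. (1) can be sketched as follows. This is an illustrative implementation under stated assumptions: the grid ranges for k, b and λ are invented, and synthetic Gaussian feature values stand in for the real positive and negative examples.

```python
import numpy as np

def h(a):
    return np.tanh(a)  # h(a) = (e^a - e^-a) / (e^a + e^-a)

def fit_feature(pos, neg, ks, bs, lams):
    """Grid-search k, b, lambda maximizing the positive log-likelihood
    sum_i [lambda*h(k*r_i + b) - log Z], with Z approximated by
    E_q[exp{lambda*h(k*r + b)}] on the negative examples."""
    best, best_ll = None, -np.inf
    for k in ks:
        for b in bs:
            for lam in lams:
                Z = np.mean(np.exp(lam * h(k * neg + b)))
                ll = np.sum(lam * h(k * pos + b) - np.log(Z))
                if ll > best_ll:
                    best, best_ll = (k, b, lam), ll
    return best

def log_ratio_score(r, k, b, lam, neg):
    """log p(r)/q(r) = lambda*h(kr + b) - log Z: the feature's vote."""
    Z = np.mean(np.exp(lam * h(k * neg + b)))
    return lam * h(k * r + b) - np.log(Z)

rng = np.random.default_rng(1)
pos = rng.normal(4.0, 1.0, 200)   # feature values on abandoned bags
neg = rng.normal(0.0, 1.0, 500)   # feature values on false alarms
k, b, lam = fit_feature(pos, neg, np.linspace(0.2, 2, 10),
                        np.linspace(-4, 4, 9), np.linspace(0.5, 4, 8))
print(log_ratio_score(5.0, k, b, lam, neg) >
      log_ratio_score(-1.0, k, b, lam, neg))  # True
```

Per the paper, the per-feature scores are then summed into the final classifier score, with the FC1 and FC2 contributions halved because the two features are highly correlated.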

5. EXPERIMENTS

We have tested the proposed algorithm in nearly 30 scenes, including rails, waiting rooms, bus shelters, etc. The results are shown in Tab. 1. 4 of the 29 testing scenes show missing detections; all the others have neither missing detections nor false alarms. All rail scenes passed testing, even when the abandoned bags have a color similar to the crossties, and false alarms caused by slight movements of vegetation and by illumination changes are suppressed. In the waiting room videos, one of five scenes shows a missing detection, caused by the owner staying for a long time beside his abandoned bag; the other results are good, and many false alarms caused by pantomime movement are suppressed. 2 of the 9 bus shelter scenes have missing reports, because some people keep standing near the abandoned bag after it is dropped. However, the other scenes received good detections, and it is verified that occlusion is not a problem, because detections are carried over several frames. 1 of the 9 other scenes has a missing alarm: in an indoor scene, a suitcase is abandoned under heavy shadow; the shadow fails to be removed and becomes part of the object, which lowers its feature scores so that it fails to pass the classifier. Some of the video examples are shown in Fig. 6.

Video scene     Correct   Missing   False alarm   Total
Rail            6         0         0             6
Waiting room    4         1         0             5
Bus shelter     7         2         0             9
Others          8         1         0             9
All             25        4         0             29

Tab. 1. Testing result in 29 scenes.

Fig. 6. Some experiment results. Red masks are suspected foreground mask detection results. White rectangles are suspected objects that have passed some of the feature classifiers. Yellow rectangles are abandoned object detection results that pass all the classifiers. (a) Railroad detection result: vegetation movements and illumination changes produce a large number of suspected objects. (b) Waiting room detection result: pantomime movement introduces many suspected areas. (c) Bus shelter detection result: besides suspected areas caused by pantomime, traveling buses generate large areas of suspected objects and cause occlusions. (d) PETS video result: no missing detections and no false alarms.

6. CONCLUSIONS

In this paper, an abandoned object detection method has been proposed based on low-level features including PV, EV, BG, ES, FC1, FC2, HC and the sizes of objects. The last two features are used as thresholds; the others are assembled into a classifier with sigmoid functions. The method has been verified in 29 video scenes, most of which are challenging scenes with complex backgrounds and crowded people. It suppresses false alarms caused by illumination changes, slight movements of background objects and some shadows, and works under slight occlusions.

7. REFERENCES

[1] P. Spagnolo, A. Caroppo, M. Leo, T. Martiriggiano, and T. D'Orazio, "An abandoned/removed objects detection algorithm and its evaluation on PETS datasets," in Proc. IEEE International Conference on Video and Signal Based Surveillance, 2006, pp. 17–22.

[2] R. Miezianko and D. Pokrajac, "Detecting and recognizing abandoned objects in crowded environments," in Computer Vision Systems, 2008, vol. 5008/2008, pp. 241–250.

[3] S. Denman, S. Sridharan, and V. Chandran, "Abandoned object detection using multi-layer motion detection," in Proc. International Conference on Signal Processing and Communication Systems, 2007.

[4] E. Auvinet, E. Grossmann, C. Rougier, M. Dahmane, and J. Meunier, "Left-luggage detection using homographies and simple heuristics," in IEEE Int'l Workshop on PETS, 2006, pp. 51–58.

[5] S. Ferrando, G. Gera, M. Massa, and C. Regazzoni, "A new method for real time abandoned object detection and owner tracking," in Proc. ICIP, 2006, pp. 3329–3332.

[6] M. Spengler and B. Schiele, "Automatic detection and tracking of abandoned objects," in IEEE Int'l Workshop on PETS, 2003.

[7] K. Smith, P. Quelhas, and D. Gatica-Perez, "Detecting abandoned luggage items in a public space," in IEEE Int'l Workshop on PETS, 2006.

[8] M. Heikkilä and M. Pietikäinen, "A texture-based method for modeling the background and detecting moving objects," PAMI, vol. 28, no. 4, pp. 657–662, 2006.

[9] Y. Zhao, H. Gong, L. Lin, and Y. Jia, "Spatio-temporal patches for night background modeling by subspace learning," in ICPR, 2008.

[10] A. Elgammal, R. Duraiswami, D. Harwood, and L.S. Davis, "Background and foreground modeling using nonparametric kernel density estimation for visual surveillance," Proc. IEEE, vol. 90, no. 7, pp. 1151–1163, 2002.