
2012 IEEE Ninth International Conference on Advanced Video and Signal-Based Surveillance

Abandoned Object's Owner Detection: A Case Study of Hybrid Mobile-fixed Video Surveillance System

Keita Masui*, Noboru Babaguchi
Media Integrated Communication LAB (MICL), Osaka University
2-1 Yamadaoka, Suita, Osaka 565-0871, Japan
Email: {masui, babaguchi}@comm.eng.osaka-u.ac.jp

Minh-Son Dao*, Riccardo Mattivi, Francesco G.B. De Natale
MultiMedia Signal Processing and Understanding LAB (mmLAB), Trento University
Via Sommarive, 5 I-38123 POVO, Trento, Italy
Email: {dao, mattivi, denatale}@disi.unitn.it

To overcome these problems, mobile and hybrid camera surveillance systems have been introduced [1], [5], [11]–[13]. These systems offer more flexible schemes that extend camera coverage of the observed areas and thereby increase the accuracy of activity and object detection. What they have in common is that cameras or remote controls are mounted on self-moving platforms (e.g., robots, or cameras that move along a predefined route) or carried by agents or officers (e.g., shoulder-mounted mobile cameras). Nevertheless, these systems can be accused of moral harassment because of unfriendly or unnaturally installed mobile sensors (e.g., people may feel uncomfortable when they see robots running around, or when they are captured by strange equipment worn by agents or officers). Moreover, moving equipment can, in some cases, be easily disabled by criminals or blocked by natural obstacles. In addition, using remote controls to operate cameras directly [1] (e.g., to zoom in or out, or to change the viewing angle) can hinder teamwork, since only one person can control a camera at a time. An approach that reduces these disadvantages while maintaining privacy and security would therefore be valuable. In recent years, several companies have launched a new type of gadget, called "Spyglass", in which a tiny camera, a microphone, and wireless components are discreetly integrated into a glasses frame (see footnote 1). Users can thus record video without harassing people, focus on whatever they want to see, and transfer what they record to a third party. Unfortunately, this gadget has so far been used mainly for personal purposes such as entertainment (e.g., recording a picnic without having to carry a camera by hand) or even for criminal actions (e.g., secretly recording in private or prohibited areas), and rarely for public security. Recently, an interesting application has been deployed for US police forces in the form of on-officer cameras. This system aims to "catch you in the act, right to remain silent imperative", as announced in footnote 2.

Abstract—In this paper, a new framework for a hybrid mobile-fixed video surveillance system (HMFVSS) is introduced. The purpose of this framework is to overcome common problems of existing mobile or fixed video surveillance systems: (1) moral harassment due to unfriendly or unnaturally installed mobile sensors, and (2) blind areas due to the limited range of motion of fixed cameras. A case study, the abandoned object's owner alert system (AOOAS), is also presented to highlight the framework's advantages. IP cameras and a "Spyglass" (i.e., a mobile camera embedded in glasses) are used as fixed and mobile sensors, respectively. Three main tasks are inherited, developed, and integrated: (1) image registration for automatically locating the abandoned object, (2) common-histogram-based detection of the abandoned object's owner, and (3) face recognition. The experimental results, together with a careful evaluation and a comparison with other systems, show that the proposed framework is a step forward for video surveillance systems.

Keywords-Video Surveillance; Abandoned Objects; Image Registration; Spyglass;

I. INTRODUCTION

Ambient surveillance has become very popular in daily life and plays a very important role in security, in both civilian and military settings. In public spaces such as subway stations, parks, and streets, various types of camera surveillance systems are installed. These systems may use only fixed cameras (i.e., cameras set up at a fixed position, which may or may not be able to change their viewing angle) [3], [14], only mobile cameras (i.e., cameras that can move around) [12], or a hybrid of both fixed and mobile cameras [5] as input sensors to record all activities within the covered areas. Among these, fixed camera systems are the most popular and have achieved several remarkable results in public security. Nevertheless, some problems remain to be solved: (1) blind areas: areas that cannot be observed by fixed cameras, (2) low-resolution images of suspect objects: object images that are too small because of the distance between cameras and objects, and (3) evasion by criminals: criminals can cover themselves or hide in blind areas to avoid being recorded.

1 http://pivothead.com/product/product.php
2 http://www.engadget.com/2012/02/21/tasers-on-officer-cameras-catchyou-in-the-act-right-to-remain/

* authors with equal contributions

978-0-7695-4797-8/12 $26.00 © 2012 IEEE DOI 10.1109/AVSS.2012.4


Therefore, in this paper, this gadget is integrated into a fixed camera surveillance system, and a new mobile-fixed surveillance system prototype for public security is designed and introduced to overcome the existing problems. To demonstrate the effectiveness of the proposed prototype, abandoned object's owner detection (AOOD), one of the most important issues in public security, is developed as a case study. Abandoned object detection has attracted the attention of researchers for decades [10], [15]. Unfortunately, AOOD has received much less attention, although it plays a very important role in fast reaction, both for crime prevention and for client support, and only a few works discuss this topic [4], [7]. The difficulty of AOOD is a very high "false alarm" rate caused by occlusion or by the lack of viewing angles and of a "memory" of owners in the recorded data. By building several essential functions for the AOOD problem, we confirm that the proposed framework moves a step ahead in this direction. Moreover, we show that the proposed framework is easy to extend according to clients' demands and the scope of applications. The major contributions of the proposed framework are:
• to compensate for problems of traditional video surveillance systems such as blind areas, occlusion, low-resolution object images, and the lack of effective applications
• to decrease moral harassment (e.g., help people feel that their privacy is respected, and reduce direct contact between people and security forces)
• to provide an effective solution to the AOOD problem
• to serve public security with high productivity

Figure 1. The proposed framework architecture

In this framework, AXIS 207W network cameras (see footnote 3) are used as fixed cameras because of their low price, their available API, and their ability to work over either wired or wireless networks. The "Spyglass", a special device in which a tiny camera, a wireless transmitter, a memory card slot, and a decoder/encoder are integrated into a glasses frame (as shown in Fig. 2), is used as the mobile sensor and is worn by officers like a pair of sunglasses.

II. HYBRID MOBILE-FIXED VIDEO SURVEILLANCE SYSTEM

Figure 2. The "Spyglass"

A case study, the Abandoned Object's Owner Alert System (AOOAS), is also developed to show how well the proposed framework overcomes the problems mentioned in Section I. As noted there, an AOOAS is needed in public security to detect owners quickly, either to return harmless abandoned objects or to apprehend the owners in a harmful situation, yet an economical and effective system is still lacking. The overview diagram of AOOAS is shown in Fig. 3. The system includes two main tasks:
• Human task: Humans play two roles: (1) officers who patrol with the Spyglass, and (2) controllers who sit in control rooms and watch the video streams from the cameras. In the first role, whenever officers notice an abandoned object, they raise an alert and send images of the object to the system. In the second role, the system is alerted either by an automatic abandoned object

In this section, a new, low-cost, and simple hybrid mobile-fixed video surveillance system that integrates fixed cameras and mobile sensors is introduced. It consists of the following components (as shown in Fig. 1):
• IP cameras are installed in predetermined areas (e.g., shopping halls, student restaurants, laboratories, open spaces in universities, etc.) and serve as fixed cameras.
• A "Spyglass" and handheld communication equipment (e.g., walkie-talkie, smartphone, PDA, etc.) are carried by officers who patrol the predetermined areas. The former serves as the mobile sensor, and the latter as the remote controller.
• Wireless receivers and transmitters are installed in the observed areas and serve as communication bridges.
• Servers are used to store and process the videos recorded by the cameras.
• A friendly user interface supports both the controllers who stay in the monitoring rooms and the officers who carry out patrols.

3 http://www.axis.com/products/video/camera/


There are two cases in which abandoned objects can be recognized by humans: (1) by controllers who sit in the server room and watch the monitors, and (2) by officers who wear the "Spyglass" and patrol their route. Since the former is trivial, we concentrate on the latter. In this case, the problem is how to locate the same object in both the Spyglass data and the IP camera data. In other words, when an officer notices an abandoned object, he/she focuses on that object for around 5 seconds, and the object is then automatically marked in the IP camera data by a blue rectangle, called the AbandonedObjectArea, which activates the second module, AOOD. This is an image registration problem belonging to the "different viewpoint" and "different sensor" groups, as analyzed in [16]. In our case, we use a feature-based method, with SURF-64 [2] for feature detection and RANSAC [8] for feature matching. Owing to space limitations, we do not describe this module in further detail.
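To make the role of RANSAC in the matching step more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that putative keypoint matches between the Spyglass image and the IP camera image are already available (the SURF detection step is not shown), and it fits only a 2D translation for brevity; a registration across different viewpoints would normally fit a richer model such as a homography. The function name `ransac_translation` and all parameter values are hypothetical.

```python
import random


def ransac_translation(matches, threshold=3.0, iterations=200, seed=0):
    """Estimate a 2D translation (dx, dy) from putative point matches.

    matches: list of ((x1, y1), (x2, y2)) pairs, e.g. keypoints matched
    between the mobile-camera image and the fixed-camera image.
    Returns (dx, dy, inlier_indices).
    NOTE: the translation model is a simplifying assumption of this sketch.
    """
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        # Minimal sample: a single match fully determines a translation.
        (x1, y1), (x2, y2) = rng.choice(matches)
        dx, dy = x2 - x1, y2 - y1
        # Matches whose residual under this hypothesis is small are inliers.
        inliers = [
            i for i, ((ax, ay), (bx, by)) in enumerate(matches)
            if ((ax + dx - bx) ** 2 + (ay + dy - by) ** 2) ** 0.5 <= threshold
        ]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refit on the inliers of the best hypothesis (mean offset).
    n = len(best_inliers)
    dx = sum(matches[i][1][0] - matches[i][0][0] for i in best_inliers) / n
    dy = sum(matches[i][1][1] - matches[i][0][1] for i in best_inliers) / n
    return dx, dy, best_inliers
```

The inlier set returned by such a procedure is what would delimit the registered region (the yellow polygon in Fig. 4); mismatched keypoints are discarded because they do not agree with the dominant transformation.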

Figure 3. Abandoned Object’s Owner Alert System built on the proposed framework

detection function [10], [15], or manually by controllers (e.g., by drawing a rectangle around a suspect object). The results (i.e., the owner's information) are finally returned to officers via smartphone, together with appropriate orders to follow.
• Machine task: Using all the information fed in by officers, controllers, or the machine, the AOOD task is activated to detect owners. After owners have been detected successfully, their faces are registered to the blacklist, and two further tasks are performed: (1) suspect registration and detection from the current time onwards (e.g., looking for owners in the observed areas, sending warrants to other departments, etc.), and (2) face detection in the stored data to obtain more information about owners, depending on the demands of the investigation.
AOOAS has three major modules: (1) the abandoned object detection module, (2) the abandoned object's owner detection module, and (3) the owner recognition module. In general, existing methods for abandoned object's owner detection assume that a memory trace exists along the time axis up to the moment objects are completely abandoned. The question is whether owners can still be detected successfully when abandoned objects are detected or located without any knowledge of their previous state. This situation occurs often in practice, since most abandoned object detection methods do not achieve 100% accuracy. Some abandoned objects usually have to be pointed out manually, and in this case no knowledge of the owners' previous activities is recorded. Therefore, AOOAS is built to address this situation.

IV. ABANDONED OBJECT'S OWNER DETECTION

From observations of real-life situations, owners usually carry their objects to the AbandonedObjectArea, may stand or walk around for a while, and then leave the area entirely. Therefore, we define two terms that mark the period during which owners start and finish leaving their object (KeyTime): (1) appearance time (Tapp): the time at which the object appears completely in the AbandonedObjectArea, and (2) start-appearance time (Tstartapp): the time at which the object just starts to appear in the AbandonedObjectArea. Note that Tstartapp ≤ Tapp. In other words, when owners carrying their objects approach the AbandonedObjectArea, we mark that moment as Tstartapp; and when the objects appear completely in the AbandonedObjectArea, we mark that moment as Tapp. Therefore, KeyTime = [Tstartapp, Tapp].

A. KeyTime Detection
Detecting KeyTime is straightforward: template matching is used to measure the dissimilarity between the object template T extracted from the AbandonedObjectArea and the ROI at the same location in the video frames traversed backward in time, starting from the time at which the object is located.

B. Candidate Owner Detection
During the KeyTime period, objects and their owners must stay together (as mentioned above). Therefore, by tracking objects backward in time from Tapp to Tstartapp - e (where e is a margin that avoids missing information useful for identifying candidate owners; in our case, e is 100 frames), we can capture the covering area that contains the owner area. For object tracking, we use MEANSHIFT [6] and a Kalman filter [9]. After object tracking is finished, the candidate owner area is selected by using background subtraction and connected component methods. Nevertheless, this task cannot guarantee that the detected area contains

III. AUTOMATICALLY LOCATING ABANDONED OBJECT ON IP CAMERA DATA FROM "SPYGLASS" DATA

As mentioned in the previous section, we focus on the case in which abandoned objects are located manually by humans. Automatic abandoned object recognition is outside the scope of this paper.


by the previous module. With these face images, we use the face registration tools offered by face.com to trigger the face detection scheme across our system. The advantage of this system lies in the "Spyglass". With the "Spyglass", officers who patrol their route can observe persons at the same time as the controllers who work in the control center, because the video data recorded by the "Spyglass" and by the fixed cameras are synchronized. Therefore, as soon as the AOOD module finishes registering the owner's faces, the whole system can start searching for the owner immediately, as long as the owner is still within the covered areas.

only one owner. It may contain one person, two persons, or a group of persons who happen to be near the object at the time we analyze (as shown in Fig. 5c, f). These persons may occlude the right owner or pass behind him/her. Therefore, one more step is needed to select the right owner.

C. Owner Selection
To overcome the problem mentioned above, we propose an algorithm that uses a common color histogram (as shown in Alg. 1). We assume that the right owner appears within the period [Tstartapp - e, Tapp]. Hence, by calculating the average color histogram (AVH) of all frames within the period [Tstartapp - e, Tapp], we can select the frame whose color histogram is closest to the AVH.
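The KeyTime detection step of Section IV-A can be illustrated with a short sketch. This is one possible reading and not the authors' code: it assumes grey-level patches stored as nested lists, uses the mean absolute difference as the dissimilarity measure, and introduces two hypothetical thresholds, `low` (object fully present) and `high` (object absent), neither of which is specified in the paper.

```python
def dissimilarity(template, roi):
    """Mean absolute grey-level difference between two equally sized patches."""
    total = sum(abs(a - b)
                for row_t, row_r in zip(template, roi)
                for a, b in zip(row_t, row_r))
    count = sum(len(row) for row in template)
    return total / count


def detect_keytime(scores, t_located, low, high):
    """Walk backward in time from t_located over per-frame dissimilarities.

    scores[t] is the dissimilarity between the object template and the ROI
    at frame t. Frames with score <= low are treated as "object fully
    present", frames with score >= high as "object absent" (both thresholds
    are assumptions of this sketch).
    Returns (t_startapp, t_app) with t_startapp <= t_app.
    """
    t = t_located
    # Earliest frame of the run in which the object is fully visible.
    while t > 0 and scores[t - 1] <= low:
        t -= 1
    t_app = t
    # Continue backward through frames in which the object is partly visible.
    while t > 0 and scores[t - 1] < high:
        t -= 1
    t_startapp = t
    return t_startapp, t_app
```

With per-frame scores computed by `dissimilarity`, the returned pair delimits KeyTime = [Tstartapp, Tapp]; the backward tracking of Section IV-B would then start from Tapp.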

VI. EXPERIMENTAL RESULTS

In this section, experimental results are reported to evaluate the effectiveness of the proposed framework as well as of the AOOAS case study.

Algorithm 1 Owner selection
 1. for t = Tstartapp − e to Tapp do
 2.     ConvertRGB2HSV(Frame_t);
 3.     Masking(Frame_t, OwnerForegroundMask_t);
 4.     Hist[t] = Calc_HS_Hist(Frame_t);
 5.     NormalizeHist(Hist[t])
 6.     Hc += Hist[t] − C
 7. end for
 8. NormalizeHist(Hc)
 9. for t = Tstartapp − e to Tapp do
10.     Distance[t] = Bhattacharyya(Hc, Hist[t])
11. end for
12. sort(Distance)
13. Output top 10 frames
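Algorithm 1 can be sketched in plain Python as follows. This is an illustrative reading under several assumptions that are not stated in the paper: each frame is given as a list of already masked foreground RGB pixels, the hue-saturation histogram uses 8 × 8 bins, negative bins of the accumulated histogram are clamped to zero before normalization, and the Bhattacharyya distance is computed as sqrt(1 − Σ sqrt(p·q)) on histograms that sum to one. The noise threshold is exposed as the parameter `c` because its value depends on the normalization convention, which the paper does not specify. All function names are hypothetical.

```python
import colorsys
import math


def normalize(hist):
    """Scale a histogram so that its bins sum to 1 (no-op if empty)."""
    total = sum(hist)
    return [v / total for v in hist] if total > 0 else list(hist)


def hs_histogram(pixels, h_bins=8, s_bins=8):
    """Normalized hue-saturation histogram of foreground RGB pixels (0-255)."""
    hist = [0.0] * (h_bins * s_bins)
    for r, g, b in pixels:
        h, s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        hi = min(int(h * h_bins), h_bins - 1)
        si = min(int(s * s_bins), s_bins - 1)
        hist[hi * s_bins + si] += 1.0
    return normalize(hist)


def bhattacharyya(p, q):
    """Bhattacharyya distance between two histograms that each sum to 1."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return math.sqrt(max(0.0, 1.0 - bc))


def select_owner_frames(frames, c=0.0, top_k=10):
    """frames: dict {t: list of foreground (r, g, b) pixels} for the frames
    in [Tstartapp - e, Tapp]. Returns the top_k frame indices whose
    histogram is closest to the common histogram Hc."""
    hists = {t: hs_histogram(px) for t, px in frames.items()}
    n_bins = len(next(iter(hists.values())))
    common = [0.0] * n_bins
    for hist in hists.values():
        for i, v in enumerate(hist):
            common[i] += v - c          # step 6: Hc += Hist[t] - C
    # Assumption: clamp negative bins so the distance stays well defined.
    common = normalize([max(0.0, v) for v in common])
    ranked = sorted(hists, key=lambda t: bhattacharyya(common, hists[t]))
    return ranked[:top_k]
```

Colors that appear in only a few frames, such as those of a person passing behind the owner, contribute little to the common histogram and can be suppressed entirely by a larger `c`, so frames dominated by such colors are ranked last.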

A. Hybrid System Architecture
The proposed system is set up as follows: (1) 4 IP cameras are installed as fixed cameras in 2 separate rooms, 2 cameras per room; (2) 2 Spyglasses are worn by 2 volunteers who play the role of "officers" and patrol these two rooms freely; (3) 1 server stores and analyzes the data recorded or streamed from these cameras; (4) 1 client is operated by 1 volunteer who plays the role of "controller" and is in charge of activating the AOOD module, updating the blacklist, and issuing orders to officers; and (5) C# and Ajax are used as the main programming languages. Table I presents an evaluation, voted on by volunteers, of three kinds of surveillance video systems in real situations: (1) a single stable camera [3], (2) multiple cameras [14], and (3) a mobile camera [12]. The first system suffers from occlusion and from small object images because the camera is stable. The second one solves these problems, but the small size of objects has a significant impact on the video processing stage, which leads to low system accuracy. The last one is robust to occlusion because it does not depend on a fixed camera position, but its view area is too narrow. Besides, it could harass people because of the unnaturally installed mobile equipment. In contrast, the proposed system can cover blind corners, compensate for occlusion by using both fixed and mobile cameras, and avoid moral harassment.

Here, C is the noise-decreasing threshold. When Hc is calculated, noise caused by occlusion or by persons passing behind is accumulated into Hc. Therefore, this noise needs to be subtracted during accumulation. In our experiments, C is set to the total histogram bins/500.

D. Face Detection and Registration
The human face is widely used to identify persons. Thus, we extract faces from the detected owner area by using the REST API from face.com (see footnote 4). Using this API, we perform three functions: (1) face detection, (2) face learning using face tag upload, and (3) face recognition. If more than one person remains after the "Owner Selection" step (as confirmed by the number of faces detected with the face.com API), we report all of these persons as suspect owners and alert the controllers, who make the final decision.

B. Abandoned Object's Owner Alert System
In this subsection, we evaluate the three major modules of AOOAS as follows:
1) Abandoned Object Location Module: For each IP camera and object, we capture several images from the Spyglass at various viewing angles and distances. Then, we evaluate the image registration results by asking volunteers to vote on their degree of satisfaction. The accuracy rate of this module does not exceed 65%. The main factors are the number of feature points extracted from the object and its surroundings, and the viewing angle of the "Spyglass". In

V. OWNER RECOGNITION MODULE

This module serves as the post-processing stage of the previous module. In this module, we use all detected owner faces with different poses, extracted by tracing all backward and forward frames starting at the located frame pointed out
4 http://developers.face.com


Table I. The comparison of surveillance systems

System          View area                   Security support             Occlusion
Hybrid (ours)   can see blind corner        help officer catch suspect   compensate by other camera view
Fixed [3]       have blind corner           detect in video only         poor
MultiCam [14]   missing too small objects   detect in video only         compensate by other camera view
Mobile [12]     narrow and discontinuous    moral harassment             robust manually

Figure 5. Occlusion situations: (a) and (b) are heavy occlusion examples, defined as cases in which an object is totally occluded for more than 5 frames. (c) and (d) are partial occlusion examples, in which an object is occluded but the requirement for heavy occlusion is not met. (e) and (f) are passing-behind examples, in which another person passes behind the owner.

(a) Ideal case

means that the system detects the owner of the abandoned object, and "False" means the opposite. "Passing", "Part", and "Heavy" are subsets of our own dataset, each containing one of the occlusion situations defined in Fig. 5. "AVSS" is the dataset from AVSS 2007, and "ITEA" is the one from ITEA CANDELA. Occlusion is one of the major problems affecting the quality of owner extraction. Two major situations occur with high frequency: (a) Passing situation: a pedestrian passes behind the owner (as shown in Fig. 5, right column), and (b) Part or Total occlusion: a pedestrian is occluded partly or totally by others (as shown in Fig. 5, center and left columns, respectively). Thanks to Alg. 1, we can handle such situations with high accuracy. Fig. 6 illustrates one such case.

(b) Acceptable case

Figure 4. Automatically locating an abandoned object in IP camera data from "Spyglass" data. The upper right corner shows the image captured by the "Spyglass"; the rest is the IP camera image. The yellow polygon is the image registration result. The blue rectangle is the AbandonedObjectArea. In (a), the AbandonedObjectArea contains only the object. In (b), the object is still included in the AbandonedObjectArea.

the future, we will try to integrate area-based and feature-based methods to increase the accuracy rate of this module. Nevertheless, since controllers can assist, the final AbandonedObjectArea can be located well enough to activate the second module, AOOD. Although the current accuracy rate of this module is not high, the module helps to reduce the burden on controllers of manually marking abandoned objects.
2) Abandoned Object's Owner Detection Module: The proposed method is evaluated on AVSS 2007 (see footnote 5), ITEA CANDELA (see footnote 6), and our own dataset. The first contains video recorded in subway stations. The second focuses on indoor data. The last was recorded in our LAB, lasts 12 hours, and consists of Passing, Part occlusion, and Heavy occlusion situations (as shown in Fig. 5). Thorough tests focusing on these situations are carried out to evaluate the accuracy of our method. Table II shows how well our method deals with these situations. Each clip includes one abandoning scene by one owner. "Correct"
5 http://www.eecs.qmul.ac.uk/~andrea/avss2007

Figure 6. Owner detection using Alg. 1 (right column), and without Alg. 1 (left column)

3) Owner Recognition Module: After detecting owners successfully, by continuously tracking backward and forward

d.html

6 http://www.multitel.be/~va/candela/abandon.html


In the future, more functions and components will be developed and integrated into this framework to extend its scope of application in real life, especially in the understanding of human activities. Owing to the lack of time, suitable materials, and testing environments, the proposed framework has so far been evaluated only in a simulated locally distributed camera network. Widely distributed networks will be investigated next to evaluate the productivity of the proposed framework more thoroughly.

Table II. Results of owner detection

Dataset    Clips   Correct   False   Accuracy
Passing    15      14        1       93%
Part       10      10        0       100%
Heavy      13      12        1       92%
AVSS       3       2         1       67%
ITEA       5       4         1       80%
Total      46      42        4       91%
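As a quick consistency check, the accuracy column of Table II can be recomputed from the clip counts. The snippet below is only an arithmetic illustration; it assumes that accuracy is the number of correct clips divided by the number of clips, rounded to the nearest percent.

```python
# (clips, correct) per dataset, copied from Table II
results = {
    "Passing": (15, 14),
    "Part": (10, 10),
    "Heavy": (13, 12),
    "AVSS": (3, 2),
    "ITEA": (5, 4),
}


def accuracy_percent(clips, correct):
    """Accuracy as a whole-number percentage (assumed rounding convention)."""
    return round(100 * correct / clips)


total_clips = sum(clips for clips, _ in results.values())
total_correct = sum(correct for _, correct in results.values())

for name, (clips, correct) in results.items():
    print(name, accuracy_percent(clips, correct))
print("Total", accuracy_percent(total_clips, total_correct))
```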

REFERENCES

detected owners, we can collect a series of face images with diverse poses. These face images are fed to the "face learning API" of face.com for training, and this module then uses the trained data to recognize the owner. Fig. 7 shows a case in which the owner is recognized from the "Spyglass" video while the IP camera fails because the face image is too small. This confirms that the proposed framework can successfully solve the AOOD problem.

Figure 7. Owner Recognition Result

[1] B. Barake, P. Atrey, J. Zhao, and A. El Saddik. A user friendly mobile device interface to assist security officers in physical monitoring at public places. In Proc. of the 2009 workshop on Ambient media computing, pages 29–36. ACM, 2009.
[2] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool. Surf: Speeded up robust features. Journal of CVIU, 110(3):346–359, 2008.
[3] B. Benfold and I. Reid. Stable multi-target tracking in real-time surveillance video. In Proc. of CVPR, pages 3457–3464. IEEE, 2011.
[4] M. Bhargava, C. Chen, M. Ryoo, and J. Aggarwal. Detection of object abandonment using temporal logic. Machine Vision and Applications, 20(5):271–281, 2009.
[5] P. Biswas and S. Phoha. Self-organizing sensor networks for integrated target surveillance. IEEE Trans. on Computers, 55(8):1033–1047, 2006.
[6] G. Bradski. Computer vision face tracking for use in a perceptual user interface. Proc. of Intel Technology Journal, 1998.
[7] J. Chang, H. Liao, and L. Chen. Localized detection of abandoned luggage. EURASIP Journal on Advances in Signal Processing, 2010:11, 2010.
[8] M. Fischler and R. Bolles. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Magazine of Comm. of the ACM, 24(6):381–395, 1981.
[9] R. E. Kalman. A new approach to linear filtering and prediction problems. Trans. of the ASME–Journal of Basic Engineering, 82(Series D):35–45, 1960.
[10] H. Kong, J. Audibert, and J. Ponce. Detecting abandoned objects with a moving camera. IEEE TIP, 19(8):2201–2210, 2010.
[11] E. Menegatti, M. Cavasin, E. Pagello, E. Mumolo, and M. Nolich. Combining audio and video surveillance with a mobile robot. Int. Journal on AI Tools, 16(2):377, 2007.
[12] Y. Tomioka, A. Takara, and H. Kitazawa. Generation of an optimum patrol course for mobile surveillance camera. IEEE TCSVT, (99):1–1, 2011.
[13] Y. Tseng, Y. Wang, K. Cheng, and Y. Hsieh. iMouse: an integrated mobile surveillance and wireless sensor system. Computer, 40(6):60–66, 2007.
[14] X. Wang, K. Tieu, and E. Grimson. Correspondence-free activity analysis and scene modeling in multiple camera views. IEEE TPAMI, 32(1):56–71, 2010.
[15] Z. Yang and L. Rothkrantz. Surveillance system using abandoned object detection. In Proc. of the 12th ICCST, pages 380–386. ACM, 2011.
[16] B. Zitova and J. Flusser. Image registration methods: A survey. Image and Vision Computing, 21:977–1000, 2003.

VII. CONCLUSIONS

Hybrid mobile-fixed video surveillance systems have attracted a lot of attention from both the academic and industrial communities. An ideal hybrid system should satisfy several conditions, such as low cost, simple operation, little moral harassment, and low computational complexity. In addition, reducing as far as possible unexpected interference with the system's productivity, from both humans (e.g., criminals, careless clients) and the environment (e.g., sudden obstacles), is also a key requirement. In this spirit, a new, low-cost, and simple hybrid mobile-fixed video surveillance framework using IP cameras and a "Spyglass" is introduced in this paper. Moreover, a case study built on the proposed framework, the "Abandoned Object's Owner Alert System" (AOOAS), is investigated to show how well the framework can contribute to public security.
