Mobile Video Stream Monitoring System
Kam-Yiu Lam and Calvin K. H. Chiu
Department of Computer Science, City University of Hong Kong
83 Tat Chee Avenue, Kowloon, Hong Kong
[email protected],
[email protected]
ABSTRACT
IMVS (Intelligent Mobile Video Stream Monitoring System) is a mobile video surveillance system. The objective of IMVS is to provide high-performance video stream monitoring in a mobile computing environment. In particular, the technical questions to be addressed are: (1) how to minimize the amount of video data transmitted between the front-end mobile device and the backend server over the mobile network; and (2) how to divide the work between the front-end and backend processes so that the workload at the front-end mobile device stays within its processing capacity.
Categories and Subject Descriptors
I.4.8 [Image Processing and Computer Vision]: Scene Analysis – Object recognition, Tracking.

General Terms
Performance, Design

Keywords
Video surveillance, wireless computing, scheduling and rule evaluation
1. INTRODUCTION
IMVS is suited to applications where close monitoring of the status of objects is required but the background environment is largely static. Two example applications of IMVS are monitoring the critical areas of a building and monitoring the safety of a baby in a house. Parents who are away may issue a monitoring rule such as: "send me the images if the baby has entered the kitchen". IMVS has a two-tier architecture consisting of a front-end mobile device equipped with a video camera and a powerful backend server. The two tiers work cooperatively to monitor video images against the queries submitted by the clients of the system. The backend server is responsible for video context analysis, visual object modeling, and monitoring-rule evaluation. A hierarchical object model is adopted to model the objects of interest in the monitoring scene. The front-end device is responsible for video capturing, pre-analysis and filtering of visual objects, and transmitting the images of the objects of interest to the backend server for detailed analysis.
2. OPERATION PROCEDURE

Figure 1. System components (ROUs, RSA, and VMAs connected over the wireless network).

As shown in Figure 1, the VMA (the front-end mobile unit) is a mobile computing device with limited processing power. It periodically detects and tracks visual objects in its monitoring environment and forwards them conditionally to the RSA (the backend server). The conditions are defined by the backend server based on the monitoring queries submitted by the ROUs (the users).
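To make this division of work concrete, the following is a minimal C++ sketch of the per-frame logic a VMA could run under this design. The names (VisualObject, ForwardingCondition, processFrame) and the bounding-box overlap test are illustrative assumptions, not the IMVS implementation; the point is that only objects satisfying a condition pushed down from the RSA leave the device over the wireless network.

```cpp
// Hypothetical sketch of the VMA's per-frame loop (names are illustrative,
// not taken from the IMVS source): extract visual objects from the frame and
// forward only those that satisfy a condition defined by the RSA.
#include <vector>

struct VisualObject {
    int  label;        // connected-component label
    int  x, y, w, h;   // bounding box in the frame
    bool moving;       // static/dynamic classification
};

struct ForwardingCondition {
    // Region of interest derived by the RSA from the monitoring queries.
    int x, y, w, h;
    bool matches(const VisualObject& vo) const {
        return vo.x < x + w && vo.x + vo.w > x &&
               vo.y < y + h && vo.y + vo.h > y;   // bounding boxes overlap
    }
};

// Called once per captured frame on the front-end device.
void processFrame(const std::vector<VisualObject>& objects,
                  const std::vector<ForwardingCondition>& conditions,
                  void (*forwardToRSA)(const VisualObject&)) {
    for (const VisualObject& vo : objects) {
        for (const ForwardingCondition& c : conditions) {
            if (c.matches(vo)) {   // only interesting objects leave the device
                forwardToRSA(vo);
                break;
            }
        }
    }
}
```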
Figure 2. The initial phase and monitoring phase.

The operation of IMVS consists of two phases, as shown in Figure 2: an initial phase and a monitoring phase. In the initial phase, the VMA forwards an initial video image frame to the RSA, which performs a pre-analysis on the frame to identify and annotate the visual objects it contains and their properties, such as size and identity. (The pre-analysis may be assisted by information input from the users.) The RSA then generates a set of monitoring rules according to the context of the identified objects, their properties, and the submitted user monitoring queries, which define how closely the tracking should follow individual visual objects. A visual object is traced at higher priority if its status is close to the monitoring target.
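As an illustration of what the RSA might hand back at the end of the initial phase, the sketch below shows one possible shape of a monitoring rule record. The field names, the event set, and the base period are assumptions for exposition; the paper does not specify the rule format.

```cpp
// Illustrative sketch (not the actual IMVS data structures) of a monitoring
// rule the RSA could generate from an annotated object and a user query such
// as "send me the images if the baby has entered the kitchen".
#include <string>

enum class EventType { Entered, Left, Moved };   // assumed set of event types

struct MonitoringRule {
    int         subjectObjectId;   // e.g. the VO annotated as "baby"
    int         regionObjectId;    // e.g. the VO annotated as "kitchen"
    EventType   event;             // condition to evaluate
    std::string action;            // e.g. "send image to the requesting ROU"
    int         basePeriodMs;      // default evaluation period for this rule
};
```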
In the monitoring phase, the VMA analyzes video frames periodically and maintains the hierarchical object model (HOM). Each visual object (VO) is adaptively scheduled for re-evaluation based on its previous status relative to the monitoring rules. In principle, if the status of an object is close to the conditions defined in the associated monitoring rules, it is given a high priority, meaning that its evaluation period is shorter. Video frames are segmented into visual objects, which are further pre-filtered according to the monitoring rules. The pre-filtering process is an object-based approach that compares the status of the objects in successive video frames. We considered applying more sophisticated object detection techniques [3,4], but they are too demanding for a mobile computing device in terms of processing speed and memory requirements. For computational efficiency, the implemented detection module is a modified hybrid algorithm based on [1,2] and connected-component labeling.
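The paper does not give the detection code, but a lightweight pipeline of the kind it describes (differencing successive frames, then connected-component labeling of the resulting foreground mask) can be sketched as follows. The fixed threshold, function names, and choice of 4-connectivity are assumptions, not details from the IMVS implementation.

```cpp
// Minimal sketch of a lightweight detection pipeline: threshold the difference
// between successive frames into a binary foreground mask, then group
// foreground pixels into candidate visual objects with connected-component
// labeling.
#include <cstdint>
#include <cstdlib>
#include <stack>
#include <utility>
#include <vector>

// Foreground mask from frame differencing with a fixed threshold.
std::vector<uint8_t> differenceMask(const std::vector<uint8_t>& prev,
                                    const std::vector<uint8_t>& cur,
                                    int threshold) {
    std::vector<uint8_t> mask(cur.size());
    for (size_t i = 0; i < cur.size(); ++i)
        mask[i] = std::abs(int(cur[i]) - int(prev[i])) > threshold ? 1 : 0;
    return mask;
}

// Label 4-connected foreground components; returns the number of components.
// labels is filled with 0 for background and 1..n for the components.
int labelComponents(const std::vector<uint8_t>& mask, int width, int height,
                    std::vector<int>& labels) {
    labels.assign(mask.size(), 0);
    int next = 0;
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            int idx = y * width + x;
            if (!mask[idx] || labels[idx]) continue;
            ++next;                                   // start a new component
            std::stack<std::pair<int, int>> stk;
            stk.push({x, y});
            labels[idx] = next;
            while (!stk.empty()) {
                auto [cx, cy] = stk.top(); stk.pop();
                const int dx[4] = {1, -1, 0, 0}, dy[4] = {0, 0, 1, -1};
                for (int k = 0; k < 4; ++k) {
                    int nx = cx + dx[k], ny = cy + dy[k];
                    if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
                    int nidx = ny * width + nx;
                    if (mask[nidx] && !labels[nidx]) {
                        labels[nidx] = next;          // same component as (cx, cy)
                        stk.push({nx, ny});
                    }
                }
            }
        }
    }
    return next;
}
```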
3. OBJECT MODELING & ADAPTIVE MONITORING
A monitoring scene is composed of a collection of visual objects. A visual object consists of an entity together with its region in the scene, i.e., its dimensions in the image. The VOs in a scene are organized into a hierarchical relationship. Each VO is described by a set of attributes, which fall into two types: dynamic and static. Dynamic attributes represent properties of the VO that change over time and are frequently updated to reflect its current status; examples are position, direction, and speed. Static attributes are more stable in nature: dimension, size, color, classification (e.g., human), type (static/dynamic), and label. (An illustrative record combining these attributes is sketched later in this section.) Monitoring queries are defined based on the context of the monitoring environment:
Send me the images if someone entered my home.
In this example, "someone" and "my home" represent two visual objects, "send me the images" is the action to take when the event occurs, and "entered" is one of the supported event types. In IMVS, the object-based adaptive monitoring (OBAM) scheme is proposed to minimize the processing overhead at the VMA: the evaluation frequency for an individual VO adapts to the current and previous status of the object compared with the associated monitoring rules. If the current status of a visual object is similar to its status in the previous video frame, the VMA operates in loose mode for that VO, with a long evaluation period, to conserve the energy of the mobile device. If the status of the object is approaching the conditions defined in the monitoring rules, i.e., it is entering the monitored area, the VMA switches to full strength with a shorter evaluation period, as shown in Figure 3, to provide closer monitoring of the object's status. If the VMA finds that the object is leaving the monitored area or shows no significant change in status, the evaluation period is gradually lengthened back towards the loose monitoring mode.
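The following sketch combines the two ideas above: a VO record holding static and dynamic attributes, and an OBAM-style adjustment of its evaluation period. The constants, field names, and distance measure are illustrative assumptions; the paper only states that the period shortens as an object approaches a rule condition and lengthens gradually otherwise.

```cpp
// Hedged sketch of the OBAM idea: each VO keeps its own evaluation period,
// shortened when its status approaches a rule condition and relaxed gradually
// otherwise. Constants and fields are assumptions, not values from the paper.
#include <algorithm>

struct VOState {
    // Static attributes
    int  id;
    bool isHuman;             // classification
    // Dynamic attributes
    double x, y;              // position
    double speed;             // e.g. pixels per frame
    // OBAM scheduling state
    int evaluationPeriodMs;   // current re-evaluation period for this VO
};

const int kMinPeriodMs = 100;    // full-strength (close) monitoring
const int kMaxPeriodMs = 2000;   // loose monitoring

// distanceToCondition: how far the VO currently is from satisfying its
// associated monitoring rule (0 means the condition is met).
void adjustPeriod(VOState& vo, double distanceToCondition, double nearThreshold) {
    if (distanceToCondition < nearThreshold) {
        // Object is approaching the rule condition: monitor at full strength.
        vo.evaluationPeriodMs = kMinPeriodMs;
    } else {
        // Object is leaving or unchanged: back off gradually toward loose mode.
        vo.evaluationPeriodMs = std::min(kMaxPeriodMs, vo.evaluationPeriodMs * 2);
    }
}
```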
4. IMPLEMENTATION & DEMONSTRATION
IMVS is implemented in C++ and J2ME. The RSA and the VMA currently run on ordinary PCs; a Pocket PC version of the VMA is under development. The ROU is developed in J2ME and runs on a Java phone. In the demonstration, we will set up a stationary camera connected to a PC acting as the VMA. The ROU is demonstrated in a Java-phone emulator. Through the GUIs of the RSA and the ROU, we can clearly observe the real-time status of objects in the monitoring environment. Monitoring queries can be modified to respond to the current conditions. Performance statistics are available at runtime, such as the number of messages transferred between the RSA and the VMA, and the computation cost of video frame processing and analysis. Figure 4 shows the identification of a visual object displayed at the RSA.
5. REFERENCES
[1] R. T. Collins, A. J. Lipton, and T. Kanade, "A System for Video Surveillance and Monitoring", Conference on Automated Deduction, 497-501, 2000.
[2] A. F. Bobick and J. W. Davis, "The Recognition of Human Movement Using Temporal Templates", IEEE Transactions on PAMI, vol. 23, no. 3, 2001.
[3] C. Ridder, O. Munkelt, and H. Kirchner, "Adaptive Background Estimation and Foreground Detection using Kalman Filtering", International Conference on Recent Advances in Mechatronics, 1995.
[4] J. Barron, D. Fleet, and S. Beauchemin, "Performance of Optical Flow Techniques", International Journal of Computer Vision, 12(1):42-77, 1994.
Figure 3. The evaluation period is adjusted according to the status of visual objects: the evaluation frequency rises when an object is detected and returns to the loose monitoring mode when the object does not match any rules.
Figure 4. The IMVS server program.