Experiential Sampling for Monitoring

Wei-Qi Yan
Mohan S. Kankanhalli
School of Computing, National University of Singapore
{yanwq, mohan}@comp.nus.edu.sg

Wang Jun
Marcel J. T. Reinders
Department of Mediamatics, Faculty of EEMCS, Delft University of Technology
{j.wang, m.j.t.reinders}@ewi.tudelft.nl

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ETP 2003, November 7, 2003, Berkeley, California, USA. Copyright 2003 ACM 1-58113-780-X/03/00011…$5.00.

ABSTRACT
This demonstration presents a novel prototype of experiential sampling for monitoring in a multi-camera setting. The system uses the experiential sampling technique to compute the importance of each frame in multiple video streams and selects the most important scene to display on a monitor, which can be connected over a network or through a wireless communication device such as the infrared port of a laptop. Furthermore, each displayed frame is adaptively centered on the region of interest, such as moving objects or a human face. The monitored region is zoomed and panned based on the number and distribution of attention samples at that moment. The demonstration therefore displays on the monitor the most relevant frame, with a focus on the region of interest.

Categories and Subject Descriptors
I.4.8 [Image Processing and Computer Vision]: Scene Analysis - color, motion, sensor fusion, time-varying imagery. I.6.5 [Simulation and Modeling]: Model Development.

General Terms
Algorithms, Performance, Design, Experimentation.

Keywords
Experiential sampling, experiential computing, telepresence monitoring, face detection.

1. INTRODUCTION
Human beings utilize all their senses to experience the environment. Experiential environments let users apply their senses to observe data and information about an event and to interact with the aspects of the event that are of particular interest. Such environments are essential for exploring large volumes of heterogeneous, multifarious, spatio-temporal data and for gaining insight into a situation by experiencing the data. Thus, in an experiential computing environment, users apply their senses directly, observing event-related data and information of interest. An early example of a computing system built with this goal in mind is the Praja system [1]. The idea is to allow real-time data analysis to be coupled with real-time data assimilation in order to obtain deeper insight into the situation being monitored. Many of these ideas have been explored in the context of video surveillance through the notion of multiple-perspective interactive video [2,3]. This approach can be characterized as an early precursor of experiential computing [4]. Digital experience involves immersion in a rich set of data and information in a way that allows us to observe the relevant subset directly. The experiential computing paradigm is elaborated in [2]. In an experiential computing environment, users directly apply all their senses to observe data and information of interest related to an event. Furthermore, each user can explore the data according to his particular interest in the context of that event. Experiential environments free people from the tedious management of enormous volumes of heterogeneous data. By dealing with spatio-temporal live data streams, experiential computing can address many new real-world problems.

The experiential sampling technique provides a formal basis for analysis in experiential environments. The environment is sensed through sensor samples. Based on the context as well as the sensed environment, attention samples perform adaptive sampling of the data most relevant to the user's purpose at hand. Moreover, the context information and the attention can be evolved temporally using a dynamical system. In [5], we used the experiential sampling technique to analyze multimedia data consisting of both audio and video streams. We used the technique for object detection and tracking, including moving cars on a highway and people walking in outdoor environments.
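The sampling mechanism described above can be illustrated with a toy sketch: sensor samples probe the scene uniformly, attention samples are resampled in proportion to a relevance cue, and the samples diffuse over time so they can track motion. Note that the relevance function, sample counts, and diffusion model below are our illustrative assumptions, not the actual cues fused by the system in [5].

```python
import random

random.seed(0)  # reproducible toy run

def relevance(x, y):
    # Hypothetical relevance cue: peaks near a "moving object" at (60, 40).
    return 1.0 / (1.0 + (x - 60) ** 2 + (y - 40) ** 2)

def experiential_step(attention, n_sensor=50, width=100, height=100):
    # 1. Sensor samples: uniform probes of the environment.
    sensor = [(random.uniform(0, width), random.uniform(0, height))
              for _ in range(n_sensor)]
    # 2. Weight the pooled samples by the relevance cue.
    pool = attention + sensor
    weights = [relevance(x, y) for x, y in pool]
    # 3. Resample attention samples in proportion to relevance.
    chosen = random.choices(pool, weights=weights, k=len(attention) or 20)
    # 4. Dynamical evolution: diffuse the samples slightly so they can track motion.
    return [(x + random.gauss(0, 1.0), y + random.gauss(0, 1.0))
            for x, y in chosen]

attention = []
for _ in range(10):
    attention = experiential_step(attention)

mean_x = sum(x for x, _ in attention) / len(attention)
mean_y = sum(y for _, y in attention) / len(attention)
# The attention mass gravitates toward the relevant region around (60, 40).
```

After a few steps the attention samples concentrate around the high-relevance region, which is the behavior the weighted dots on our monitor display visualize.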
Human face detection and tracking was also performed in both indoor and outdoor environments. In [6], we applied our basic experiential sampling scheme to situations where multiple cameras (data streams) are present or necessary. The framework presented there calculates the importance of each video frame emanating from the different cameras, and the frames are ranked by importance at every time instant. The idea is that if only one screen is available for tele-monitoring applications, the most important frame can be displayed on it. The sequence of most important frames can be thought of as a newly composed video. In a person-monitoring application, this video reveals the best captured snapshots of a walking person. In [7], we utilize the control features of a video camera and adaptively adjust its parameters to best frame the object position for the purpose of tele-monitoring. We use a closed-loop feedback control mechanism to adaptively change the camera parameters in order to centrally frame the object of interest. A proportional feedback control strategy is employed for this purpose; for human faces, it essentially centers the face in the video frame. Our ongoing work focuses on the continual adjustment of lighting, contrast and sharpness parameters to obtain good-quality sensor data. Our monitoring demonstration is essentially a demonstration of the ideas described in [5,6,7]. We have integrated the multi-camera setting with the feedback control mechanism, which can therefore adaptively pan and zoom for the objects of interest.
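The proportional control loop can be sketched as follows. The frame size, gain, and camera model are hypothetical stand-ins, not the actual parameters used in [7]; the point is only that each step corrects a fixed fraction of the pixel error between the face center and the frame center.

```python
FRAME_W, FRAME_H = 320, 240
KP = 0.1  # proportional gain: fraction of the pixel error corrected per step

def pan_tilt_step(face_cx, face_cy, pan, tilt):
    """One control step: nudge pan/tilt toward centering the face."""
    err_x = face_cx - FRAME_W / 2   # horizontal pixel error
    err_y = face_cy - FRAME_H / 2   # vertical pixel error
    return pan + KP * err_x, tilt + KP * err_y

# Simulate a static face at (220, 90); panning shifts its apparent position.
pan, tilt = 0.0, 0.0
face_x, face_y = 220.0, 90.0
for _ in range(50):
    # The apparent face position moves opposite to the camera motion.
    cx, cy = face_x - pan, face_y - tilt
    pan, tilt = pan_tilt_step(cx, cy, pan, tilt)
# pan converges to ~60 (= 220 - 160); tilt converges to ~-30 (= 90 - 120)
```

The residual error shrinks geometrically (by a factor of 1 - KP per step), so the face settles at the frame center without overshoot for small gains.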
2. THE PROTOTYPE
Figure 1 illustrates our experiential sampling prototype. At the first level, multiple sensors are mounted to gather information from a wide variety of monitoring environments. At the second level, feature extraction and sensor fusion are performed; the assimilated information at this phase is used for experiential sampling. At the third stage, the dynamic attention extracted via sampling is computed for fine visual analysis. Finally, the resulting attention information for monitoring is updated from time to time according to the importance of the acquired samples. The key feature of the system is the dynamical evolution of the attention samples. If motion is the activity being monitored, the number of attention samples increases with more motion and falls correspondingly as motion activity decreases. The prototype implements face detection and monitoring, in which a moving face is detected and displayed on the monitor while the attention samples keep changing. The attention samples are also displayed on the monitor as weighted dots.
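One way the attention-sample count could track motion activity is sketched below. The linear mapping, bounds, and intensity threshold are our assumptions for illustration; the paper only states that the count grows and shrinks with the amount of motion.

```python
N_MIN, N_MAX = 10, 200  # assumed bounds on the attention-sample count

def num_attention_samples(prev_frame, frame, threshold=20):
    # Motion energy: fraction of pixels whose intensity changed noticeably.
    changed = sum(1 for a, b in zip(prev_frame, frame) if abs(a - b) > threshold)
    activity = changed / len(frame)
    # Allocate attention samples linearly between the bounds.
    return round(N_MIN + activity * (N_MAX - N_MIN))

still = [100] * 1000                  # grayscale frame as a flat pixel list
moving = [100] * 500 + [200] * 500    # half the pixels changed
quiet = num_attention_samples(still, still)    # no motion: minimum count
busy = num_attention_samples(still, moving)    # 50% activity: count grows
```

In the real system the motion cue would come from frame differencing on live video rather than toy pixel lists, but the adaptive allocation follows the same principle.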
Figure 1: Illustration of experiential sampling for monitoring

Figure 2 shows our set-up for monitoring with multiple cameras. Each camera is connected to its own monitor or to its grid region within a single monitor. Multiple monitors or grid regions can easily cause an important event or activity to be missed, owing to fatigue from continually monitoring several data streams. This prototype therefore analyzes the output of several sensors in real time and displays the most relevant one (as defined by the monitoring task) on one single monitor. This serves as a tremendous aid in the tele-monitoring of several data streams.

Figure 2: Our prototype for experiential sampling monitoring

Figure 3 is a snapshot of the GUI of our prototype. It detects faces in real time based on the motion cue. The number of attention samples depends on the amount of motion activity. Face detection is then performed in the attention-sample regions, and the detected faces are marked with a yellow rectangle.

Figure 3: The GUI of our experiential sampling monitoring prototype

Our prototype has the following features:

• Tracks moving objects across multiple cameras.
• Detects faces in real time and raises an alarm whenever a face is detected.
• Performs automatic zooming and panning during face detection and tracking so as to centrally frame the detected face.
• Displays the varying number of attention samples, which can be used for a variety of multimedia analysis tasks.
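The single-monitor selection behind these features can be sketched as picking, at each time instant, the camera stream with the highest importance score. Using the attention-sample count as that score is our illustrative assumption; the actual importance measure is the one defined in [6].

```python
def most_important(frames_by_camera):
    """Pick the camera whose current frame has the highest importance score.

    frames_by_camera maps a camera id to (frame, importance_score).
    """
    return max(frames_by_camera.items(), key=lambda kv: kv[1][1])[0]

# Three cameras at one time instant: camera "B" attracted the most attention.
snapshot = {"A": ("frame_a", 12), "B": ("frame_b", 85), "C": ("frame_c", 3)}
best = most_important(snapshot)
```

Concatenating the winning frames over time yields the newly composed video that is shown on the single monitor.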
3. CONCLUSION
Experiential computing represents a powerful paradigm for handling and assimilating information from multiple live data sources. In this demonstration, we mainly exhibit the monitoring of moving objects in a multi-camera setting. We also illustrate camera control through panning and zooming for the purpose of centrally framing the object. All ideas have been demonstrated using the human face as a prototypical example of an object. We believe this is a first step towards tele-monitoring for the purpose of surveillance. We are currently working on a theoretical framework that incorporates user interaction so as to interactively steer the sensors to provide data for better analysis in a dynamical system with feedback. The idea is to steer the sensors towards configurations that provide the optimal view of the situation being monitored. For this purpose, we are also incorporating multiple types of sensors into the framework.
4. REFERENCES
[1] A. Katkere, S. Moezzi, D. Y. Kuramura, P. Kelly and R. Jain. Towards Video-Based Immersive Environments. Multimedia Systems, Vol. 5, No. 2, pp. 69-85, 1997.
[2] R. Jain. Experiential Computing. Communications of the ACM, 46(7): 48-55, July 2003.
[3] P. H. Kelly, A. Katkere, D. Y. Kuramura, S. Moezzi, S. Chatterjee and R. Jain. An Architecture for Multiple Perspective Interactive Video. Proc. of ACM Multimedia 1995, pp. 201-212.
[4] S. Santini and R. Jain. A Multiple Perspective Interactive Video Architecture for VSAM. Proc. of the Image Understanding Workshop, Monterey, November 1998.
[5] J. Wang and M. S. Kankanhalli. Experiential Sampling for Multimedia Analysis. Proc. of ACM Multimedia 2003, Berkeley, November 2003.
[6] J. Wang, M. S. Kankanhalli, W.-Q. Yan and R. Jain. Experiential Sampling for Video Surveillance. Proc. of the 1st ACM Int. Workshop on Video Surveillance, Berkeley, November 2003.
[7] J. Wang, W.-Q. Yan, M. S. Kankanhalli, R. Jain and M. J. T. Reinders. Adaptive Monitoring for Video Surveillance. Proc. of the Fourth IEEE Pacific-Rim Conference on Multimedia (PCM 2003), Singapore, December 2003.