Multimed Tools Appl DOI 10.1007/s11042-010-0677-x
Intelligent video surveillance system: 3-tier context-aware surveillance system with metadata

Yunyoung Nam & Seungmin Rho & Jong Hyuk Park
© Springer Science+Business Media, LLC 2010
Abstract This paper presents an intelligent video surveillance system with a metadata rule for the exchange of analyzed information. We define the metadata rule for exchanging analyzed information between intelligent video surveillance systems that automatically analyze video data acquired from cameras. The metadata rule is intended to effectively index very large video surveillance databases and to unify search and management across distributed or heterogeneous surveillance systems. The system consists of low-level context-aware, high-level context-aware and intelligent services to generate metadata for the surveillance systems. Various contexts are acquired from physical sensors in the monitoring areas for the low-level context-aware system. The situation is recognized in the high-level context-aware system by analyzing the context data collected in the low-level system. The system provides intelligent services to track moving objects in Fields Of View (FOVs) and to recognize human activities. Furthermore, the system supports real-time tracking of moving objects with Panning, Tilting and Zooming (PTZ) cameras in overlapping and non-overlapping FOVs.

Keywords Object identification . Object localization . Object tracking . CCTV . Surveillance . PTZ camera . Metadata
Y. Nam
Center of Excellence for Ubiquitous System, Ajou University, Suwon, South Korea
e-mail: [email protected]

S. Rho (*)
School of Electrical Engineering, Korea University, Seoul, South Korea
e-mail: [email protected]

J. H. Park
Department of Computer Science and Engineering, Seoul National University of Science and Technology, Seoul, South Korea
e-mail: [email protected]

1 Introduction

The advent of increased network bandwidth and improved image processing technologies has led to the rapid emergence of intelligent video surveillance systems.
Traditional closed-circuit television (CCTV) requires relatively few operators to continuously monitor a large number of cameras in areas that need security, such as military installations, roads, and airports. Intelligent surveillance systems can provide automated services, such as abrupt incursion detection, shortest-path recommendation using traffic jam analysis, robbery monitoring, and people counting. In general, video surveillance systems monitor specific activities by analyzing recorded video and by multi-channel monitoring. It is a considerably tedious task for people to monitor multi-channel division screens for 24 h. In addition, when the target object moves from one area to another, a camera handover is required. Accordingly, intelligent video surveillance systems not only provide real-time abnormal event detection by analyzing images acquired from cameras, but also acquire continuous video sequences from adjacent cameras using panning, tilting and zooming (PTZ), with multi-channel, multi-area, and immediate acquisition of video of the monitored area in complex and vast environments. More and more cameras are being put in place, forming huge surveillance systems. Within those systems, the required basic functionalities are identical. The video stream needs to be transmitted from the site to an appropriate place where it will be archived. The video might be viewed by a number of people, and in case of an incident it could be exported to the appropriate authorities. In order to identify a requested stream, it is necessary to enhance the pure video stream with appropriate metadata. Information about the recording time and place, as well as the camera parameters used for recording, is sufficient to achieve basic interoperability. For efficient archiving, packaging of the video and metadata information into a file format should be supported. That file format should also provide for the inclusion of user data and possibly additional MPEG-7 metadata. This metadata should provide key functionality to support the activities of CCTV manufacturers, installers and users. This paper describes the metadata rule for exchanging analyzed information in surveillance systems.

The remaining sections of this paper are organized as follows. Section 2 introduces related work. The system architecture and metadata scheme are given in Section 3. Section 4 presents video analysis methods for metadata generation. Section 5 shows physical prototyping of an intelligent video surveillance system. Finally, we conclude this paper and discuss our future work in Section 6.
2 Background

2.1 Surveillance systems

Surveillance systems have played an important role in the management of public places relating to safety and security. The explosion in the number of cameras that must be monitored, the accruing costs of monitoring personnel, and the limited ability of human operators to sustain concentration severely limit the effectiveness of these systems. Alternatively, advances in information and communication technologies can potentially offer considerable improvements. The deployment of technology to maintain surveillance is common in modern urban environments [17]. In many surveillance applications, events of interest may occur rarely. For these unusual events (or abnormal, rare events), it is difficult to collect sufficient training data for supervised learning to develop unusual event models. In this case, many unusual event detection algorithms [7, 25, 26, 28] that require a large amount of training data become
unsuitable. Several algorithms have been proposed to address the difficulty of unusual event recognition with sparse training data. Zelnik-Manor et al. [25] and Zhong et al. [28] clustered divided video clips into different groups based on a similarity measure. Groups with relatively small numbers of video clips are detected as unusual events. However, since unusual events have insufficient training data, the clusters for these events may not be sufficiently representative to predict future unusual events. Zhang et al. [26] proposed a method that derives the unusual event model from that of usual events. This method provides a hint on how to deal with the lack-of-training-data issue. However, they obtained all unusual event models by adapting the general usual-event model, while in reality usual and unusual events can be vastly different in nature. In this paper, we develop abnormal activity recognition based on a motion history image and the moving trajectory of objects.

2.2 Tampering detection

The position, angle, and power of a camera can be changed arbitrarily, whether intentionally or accidentally. We implemented a tampering module using image difference calculation to deal with this situation. Scene change detection is essential for the tampering module to detect tampering. Numerous scene change detection schemes have been proposed. Nam [14] detected gradual scene changes, such as fade-in, fade-out, and overlaps, using B-spline interpolation. However, this is inappropriate for the tampering module, because it does not cope with abrupt scene changes. In addition, Zhao [27] and Huang [9] detected scene changes in moving pictures using color histograms and pixel-based features. The color or pixel features reflect the global characteristics of the image when detecting a change of the screen. Unlike tampering detection, scene change detection in moving images is more involved than detecting a change in camera orientation; thus, it needs to be modified for our tampering method. In addition, Ribnick [20] used short-term and long-term image buffers to calculate image similarity using image chromaticity, L1-norm values and histogram values. In this paper, we implemented tampering detection using the RGB color feature. The threshold value was set by trial and error.

An object in an image can be represented by a point, circle, square, contour, or silhouette. A point-based method describes an object as a set of points [22] or by its centroid [21]. An object can also be represented by an ellipse [5]. The contour and silhouette method [24] represents the object by its contour, that is, the boundary of the object, with the silhouette being the region inside the contour; it is suitable for tracking complex non-rigid shapes. Last, the object skeleton method [1] applies the medial axis transformation to the silhouette of an object. In this paper, we represent an object by a representative point and use it to track the object.

2.3 Human activity recognition

Human activity recognition is a challenging task due to the non-rigidness of the human body and because human motion lacks a clear categorical structure: a motion can often be classified into several categories simultaneously, because some activities have a natural compositional structure in terms of basic action units, and even the transition between simple activities naturally has temporal segments of ambiguity and overlap.
Human motion often displays multiple levels of increasing complexity, ranging from action units to activities and behaviors. The more complex the human behavior, the more difficult it becomes to perform recognition in isolation. Motions can occur at various timescales, and as they often exhibit
long-term dependencies, long contexts of observations may need to be considered for correct classification at particular time-steps. For instance, the motion class at the current time-step may be hard to predict using only the previous state and the current image observation alone, but may be less ambiguous if several neighboring states or observations, possibly both backward and forward in time, are considered. However, this computation would be hard to perform using a Hidden Markov Model (HMM) [18], where stringent independence assumptions among observations are required to ensure computational tractability. Many algorithms have been proposed to recognize human activities. Lv et al. [13] and Ribeiro [19] focus on the selection of suitable feature sets for different events. Models such as HMMs [7, 18], state machines [2], and Adaboost [23] are also widely used for activity recognition. However, most of the methods proposed in these works are inflexible when new activities must be added: they are trained or constructed to recognize predefined events. If new activities are added, the entire model has to be re-trained or the entire system re-constructed. Other methods [25, 28] use a similarity metric so that different events can be clustered into different groups. This approach is more flexible for newly added events. However, due to the uncertain nature of activity instances, it is difficult to find a feature set such that all samples of an event cluster closely around a center.

2.4 Object tracking based on multiple cameras

A single camera is insufficient to detect and track objects due to its limited field of view (FOV) and occlusion. Many approaches address detection and tracking using overlapping or non-overlapping multiple views. Tracking algorithms [3, 11] require camera calibration and a computation of the handoff of tracked objects between overlapping cameras. To accomplish this, a camera must share a considerable common FOV with the first camera. These requirements of overlapping cameras, however, are impractical due to the large number of cameras required and the physical constraints on their placement. Thus, the system must be able to deal with non-overlapping regions, where an object is invisible to any camera. Kettnaker and Zabih [12] presented a Bayesian solution to track objects across multiple cameras with non-overlapping fields of view. They used constraints on the motion of the objects between cameras, namely positions, object velocities and transition times. A Bayesian formulation of the problem was used to reconstruct the paths of objects across multiple cameras. Their approach required manual input of the topology of allowable movement paths and the transition probabilities. Huang and Russell [10] used a probabilistic approach that combines appearance matching and the transition times of cars between non-overlapping cameras with known topology. The appearance of a car is evaluated using its color, and the transition times are modeled as Gaussian distributions.
3 System architecture and metadata scheme

The surveillance system operates continuously or only as required to monitor a particular event. To develop the intelligent surveillance system, various contexts are acquired from physical sensors in the monitoring areas. The situation is recognized by analyzing the context data collected from the physical sensors. Then, the surveillance system generates metadata. The goal of the metadata rule is to effectively index very large video surveillance databases and to enable unified search and management across distributed or heterogeneous surveillance systems.
3.1 System architecture

Figure 1 depicts our intelligent video surveillance system, which consists of three layers. First, the low-level context module in the bottom layer collects measurable data from sensing hardware in the monitoring area. In this paper, the system receives audio-visual data and RFID tag data from cameras, microphones, and RFID readers. Data acquired from the various sensors are transmitted to the high-level context-aware module. The high-level context module recognizes human actions, such as hugging, snatching, trespassing and tampering, by analyzing the audio-visual data. The abnormal context-aware module judges whether the context is normal; if it is abnormal, it constructs the community and gives an instruction for the appropriate services, as shown in Fig. 2. Figure 3 shows the intelligent surveillance system architecture. The components are described as follows.

– Sensing Infrastructure: Sensing Infrastructure is used to collect various data from heterogeneous sensing hardware devices in a ubiquitous network environment. This paper used cameras, GPS as a location awareness sensor, and a microphone as a noise sensor to acquire various data in the monitoring area. Data from the Sensing Infrastructure are transmitted to the Context Aggregator and converted to our predefined format for context awareness.
– Context Database: The Context Database is the module in which the modified data from the Context Broker (which are used for future context awareness) are stored. The corresponding data are represented as a space safety index, a personal safety index, and so on.
– Context Broker: The Context Broker stores the data transmitted from the Context Aggregator into the context DB. The data are processed for the usage of the corresponding space.
– Community Manager: When an event occurs in a specific location according to our predefined criteria, the Community Manager instructs its Service Invocator to construct the relevant services that are defined by the Community Editor.
– Community Editor: The Community Editor constructs the community that makes a service when a predefined event occurs in the monitoring area. The community is dynamically constructed and stored in the Community Template Repository.
Fig. 1 3-tier context-aware surveillance system: low-level context-aware, high-level context-aware, and abnormal context recognition layers
Fig. 2 Appropriate services for the intelligent surveillance system: tampering detection, camera association, multi-camera tracking, topology-based context propagation, and object localization
– Service Discoverer and Invocator: When an event occurs, Context Manager finds an appropriate service through the Service Discoverer; if it exists, Service Invocator performs the relevant service stored in the Community Template.
When a tampering action occurs in the monitoring area, the sensing data are transmitted to the Context Broker, and the Context Broker commands the Index Agent to update the latest space safety index in the Index Database. A camera application computes its space safety index in the index DB. If the computed space safety index exceeds the threshold, it commands the camera to monitor the area using the PTZ function. Finally, the user agent sends an alarm message to users based on the computed space safety index.
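To make this flow concrete, the fragment below is a minimal sketch of the threshold-and-alarm sequence; every name in it (SafetyIndexStore, PtzCamera, notifyUsers) and the threshold value are hypothetical placeholders, not components of the actual implementation.

#include <string>

// Hypothetical sketch of the alarm flow described above; class and function
// names and the threshold value are placeholders, not the actual system.
struct SafetyIndexStore {                       // stands in for the Index Database
    void   update(const std::string& area, double value) { latest = value; }
    double get(const std::string& area) const  { return latest; }
    double latest = 0.0;
};

struct PtzCamera {                              // stands in for a PTZ-capable camera
    void monitorArea(const std::string& area) { /* pan/tilt/zoom toward the area */ }
};

void notifyUsers(const std::string& message) { /* user-agent alarm message */ }

void onTamperingEvent(const std::string& area, double observedIndex,
                      SafetyIndexStore& indexDb, PtzCamera& camera) {
    indexDb.update(area, observedIndex);        // Context Broker -> Index Agent update
    const double threshold = 0.8;               // assumed threshold value
    if (indexDb.get(area) > threshold) {        // space safety index exceeds threshold
        camera.monitorArea(area);               // steer an adjacent camera via PTZ
        notifyUsers("Safety index exceeded in area " + area);
    }
}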
Fig. 3 System architecture
A multi-camera tracking scheme is applied for continuous video acquisition of the object movement. After the physical setup of the camera and system environments, the Fields Of View (FOVs) of the fixed and PTZ surveillance cameras are set automatically by an image similarity comparison. The fixed cameras cover the Region Of Interest (ROI) and analyze the real-time images for object representation and tracking in abnormal situations. The system automatically sends an alarm message to the surveillance system when tampering or violence occurs. After receiving the message, the system shows the object in the images and indicates the object location in the safety index, a Google satellite map, and a 2D map. If the object moves within the FOV of the fixed camera, a PTZ camera traces the object using PTZ control. Otherwise, if the object disappears from the FOV of the fixed camera, our system attempts to reacquire the object through our autonomic collaboration method, which employs the adjacent camera topology in the non-overlapping zone.

The main purpose of the intelligent surveillance system is to provide real-time event detection based upon established rules. Monitoring and surveillance agents then receive alerts in real time, allowing them to address threats and other events of importance proactively within their environment. However, surveillance systems from different vendors have different established event rules and message exchanging rules. Thus, metadata standardization is required to enable intelligent surveillance systems to exchange analyzed data. The metadata rule helps exchange analyzed information between distributed or heterogeneous systems. We define the metadata rule for exchanging analyzed information between intelligent video surveillance systems that automatically analyze video data acquired from cameras.

3.2 Metadata scheme

Surveillance metadata should be constructed with a camera unique ID, camera resolution, power on/off status, and camera location information. When a moving object appears in the FOV, the object color feature is added to the metadata. The object color feature is classified into head, body, upper, and lower parts in the HSI color space. In addition, the object metadata consist of a unique ID, size, object location, camera location, type, action, and additional information. Figure 4 shows the schema diagram of the metadata. In the next section, we describe the audio-visual data analysis methods used to generate the metadata.
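Before moving on, the fields just described can be pictured roughly as the following data structure; the names and types are assumptions derived from this description and Fig. 4, not a normative definition of the schema.

#include <string>
#include <vector>

// Rough, non-normative illustration of the metadata fields described above;
// field names and types are assumptions based on the text and Fig. 4.
struct ColorFeature {                    // HSI color feature per body part
    std::string head, body, upper, lower;
};

struct CameraEntry {
    std::string  cameraId;               // camera unique ID
    std::string  resolution;             // e.g. "704x480"
    bool         powerOn = true;         // power on/off status
    std::string  location;               // camera location (e.g. GPS coordinates)
};

struct ObjectEntry {
    std::string  objectId;               // object unique ID
    ColorFeature color;                  // color feature of the moving object
    double       size = 0.0;             // estimated object size
    std::string  location;               // estimated object location
    std::string  type;                   // object type
    std::string  action;                 // recognized action
    std::string  comment;                // additional information
};

struct SurveillanceMetadata {
    std::string               metadataId;
    std::vector<CameraEntry>  cameras;
    std::vector<ObjectEntry>  objects;
    std::string               sync;      // synchronization information
    std::string               file;      // associated video file
    std::string               comment;
};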
4 Audio-visual data analysis methods

The framework described in this paper includes real-time video data analysis methods for an automated surveillance system, namely object identification, tampering detection, object size analysis, object location analysis, and moving object tracking. This section describes the audio-visual data analysis methods used to develop the intelligent video surveillance system and to generate metadata.

4.1 Object identification

We used a background subtraction scheme for object classification, which separates the background and moving objects. The background subtraction algorithm captures a sequence of images containing moving objects from a static single camera and detects moving objects against the reference background image.
Fig. 4 Schema diagram of metadata: the Metadata element contains MetadataID, Camera, Object, Sync, File, and Comment; Camera contains CameraID, Resolution, Status, and Location; Object contains ObjectID, Color (Head, Body, Upper, Lower), Size, Location, Type, Action, Comment, and Sync
We statistically analyze the reference background image in HSI color space over fifty frames with different illuminations, and all pixels of the static background scene image are modeled as Gaussian distributions with respect to the hue and saturation values. After this preprocessing of the background image, a sequence of images containing a moving human captured from a camera is converted into HSI color images and subtracted from the reference background image. If the subtraction values are greater than the threshold values, which are derived from the variance values of the background image, those pixels are determined to belong to the foreground. After background subtraction, the object is identified by its moving direction and color histogram. When objects move in the monitoring area, pixel-level subtraction separates the background and the object image. However, the subtracted data contain numerous useless noisy and ungrouped pixels. Thus, these pixels are eliminated, and blobs of the grouped pixels are treated as a moving object. Figure 5 shows background extraction and object movement orientation analysis.
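The per-pixel Gaussian background model described above can be sketched roughly as follows. This is an illustrative reimplementation using OpenCV, not the authors' code, and the deviation factor k is an assumption.

#include <opencv2/opencv.hpp>
#include <vector>

// Illustrative sketch (not the authors' code) of the per-pixel Gaussian background
// model described above. The hue and saturation of every background pixel are modeled
// by a mean and variance estimated over the background frames; a pixel of a new frame
// is marked foreground when it deviates by more than k standard deviations in either
// channel (the factor k is an assumption).
cv::Mat buildForegroundMask(const std::vector<cv::Mat>& bgFramesBGR,
                            const cv::Mat& frameBGR, double k = 3.0)
{
    CV_Assert(!bgFramesBGR.empty());

    auto toHS32 = [](const cv::Mat& bgr) {            // BGR -> float (H,S) image
        cv::Mat hsv;
        cv::cvtColor(bgr, hsv, cv::COLOR_BGR2HSV);
        std::vector<cv::Mat> ch;
        cv::split(hsv, ch);                            // H, S, V channels
        cv::Mat hs;
        cv::merge(std::vector<cv::Mat>{ch[0], ch[1]}, hs);
        hs.convertTo(hs, CV_32FC2);
        return hs;
    };

    cv::Mat sum  = cv::Mat::zeros(bgFramesBGR[0].size(), CV_32FC2);
    cv::Mat sum2 = cv::Mat::zeros(bgFramesBGR[0].size(), CV_32FC2);
    for (const cv::Mat& f : bgFramesBGR) {
        cv::Mat hs = toHS32(f);
        sum  += hs;                                    // running sum of (H,S)
        sum2 += hs.mul(hs);                            // running sum of squares
    }
    const double n = static_cast<double>(bgFramesBGR.size());
    cv::Mat mean = sum / n;                            // per-pixel mean of H and S
    cv::Mat var  = sum2 / n - mean.mul(mean);          // per-pixel variance
    cv::Mat clamped = cv::max(var, 0.0);               // guard against negative rounding
    cv::Mat sigma;
    cv::sqrt(clamped, sigma);                          // per-pixel standard deviation

    cv::Mat diff   = cv::abs(toHS32(frameBGR) - mean); // deviation of the new frame
    cv::Mat excess = diff - k * sigma;                 // positive where |diff| > k*sigma
    std::vector<cv::Mat> e;
    cv::split(excess, e);
    return (e[0] > 0) | (e[1] > 0);                    // 8-bit mask, 255 = foreground
}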
Fig. 5 Background extraction and object movement orientation analysis (the overlays show the timestamp, the object's center coordinate, its movement angle, and its previous center coordinates)
4.2 Tampering detection

Image difference comparison is used for tampering detection and for the camera position setting. Figure 6 shows tampering detection; Fig. 6(c) shows the subtraction result of Fig. 6(a) and (b). In Fig. 6(c), the unchanged parts of Fig. 6(a) and (b) have RGB values close to zero. When the image difference exceeds the threshold of 80% non-zero RGB pixels over the entire image, an alarm message is initiated. When a tampering alarm is received, the system predicts the object's movement and controls the adjacent cameras to acquire continuous views of the object using the PTZ function. In our previous paper [15], an object movement routine was presented graphically, considering the spatial relation of the cameras and the spatio-temporal relation of the object's appearance and disappearance.

Fig. 6 Tampering detection: (a) monitoring area, (b) tampering action, (c) image difference
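A minimal sketch of this check, assuming RGB frame differencing against a reference view and the 80% non-zero-pixel rule stated above (the noise margin of 10 is an assumption, not a value given in the paper):

#include <opencv2/opencv.hpp>

// Illustrative sketch (not the authors' code) of the tampering check described above:
// difference the current frame against a reference view of the monitored area and
// raise an alarm when more than 80% of the pixels differ noticeably in RGB.
bool isTampered(const cv::Mat& referenceBGR, const cv::Mat& currentBGR)
{
    CV_Assert(referenceBGR.size() == currentBGR.size());

    cv::Mat diff;
    cv::absdiff(referenceBGR, currentBGR, diff);              // per-channel absolute difference

    cv::Mat gray, changed;
    cv::cvtColor(diff, gray, cv::COLOR_BGR2GRAY);             // collapse the RGB difference
    cv::threshold(gray, changed, 10, 255, cv::THRESH_BINARY); // 10: assumed noise margin

    const double changedRatio =
        static_cast<double>(cv::countNonZero(changed)) / changed.total();
    return changedRatio > 0.8;                                // 80% rule from the text
}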
4.3 Object size and location analysis

The size of an object is determined by the distance to the object and the focal length of the camera. The distance to an object of unknown size can be determined using knowledge of the height of the camera and the bearing to the point where the object meets the ground. Therefore, the object size y is computed from the focal length f, the camera height y_c, and the tilting angle \theta_x as

y = \frac{ f y_c \left( f \sin\theta_x - (v_c - v_t)\cos\theta_x \right) / \left( f \sin\theta_x - (v_c - v_b)\cos\theta_x \right) - f y_c }{ (v_c - v_t)\sin\theta_x + f \cos\theta_x },   (1)

where v_c is the center coordinate of the object in the camera image, v_t is the top coordinate of the object in the camera image, and v_b is the bottom coordinate of the object in the camera image. The object location y' is computed from the object size y, the camera location h', and the camera height h as

y' = h' + (h - y)\tan\theta,   (2)

where h' is the GPS location of the camera (Fig. 7).
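For illustration, Eqs. (1) and (2) translate directly into code. The sketch below is a literal transcription of the formulas as written above; units, angle conventions (radians), and sign conventions are assumptions.

#include <cmath>

// Sketch of Eqs. (1) and (2) as written above (illustrative; units and sign
// conventions are assumptions). Angles are in radians, image coordinates in pixels.
double objectSize(double f, double yc, double thetaX,
                  double vc, double vt, double vb)
{
    const double s = std::sin(thetaX), c = std::cos(thetaX);
    const double numerator =
        f * yc * (f * s - (vc - vt) * c) / (f * s - (vc - vb) * c) - f * yc;  // Eq. (1), top
    const double denominator = (vc - vt) * s + f * c;                          // Eq. (1), bottom
    return numerator / denominator;
}

double objectLocation(double y, double hPrime, double h, double theta)
{
    return hPrime + (h - y) * std::tan(theta);   // Eq. (2)
}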
4.4 Activity recognition

Human activity is recognized by the method of [4], as shown in Fig. 8. The method in [4] can recognize human activities such as walking, turning, punching and sitting. The proposed system adopts several action classifiers according to the movement direction of the object in order to recognize human actions view-invariantly. The proposed system then selects a classifier based on the moving path of the target object. We train a Multi-Layer Perceptron (MLP) using 320 actions obtained from four subjects. When a punching action occurs, the method sends an alarm message to our surveillance system.

4.5 Moving object tracking

In the case of a multi-camera tracking system, while one fixed camera shows the ROI, the PTZ camera is controlled by analyzing the moving image of the fixed camera.
Fig. 7 Calculation of object size and location in camera environments: (a) side view, (b) top-down view (α is the FOV)
Fig. 8 Screenshot of activity recognition
At this time, the PTZ camera must be controlled depending on the angle of view and the zoom level. Thus, an additional physical position adjustment between the fixed and PTZ cameras is essential for object tracking. In this paper, the camera topology is adjusted automatically by image similarity comparison. The camera position setting algorithm is as follows.

Algorithm 1: Camera Position Setting
1: SetCameraPosition(&FixedCamera, fZoomLev, fHeight);
2: SetCameraPosition(&PTZCamera, fZoomLev, fHeight);
3: image Rep[], FixedImg;
   FixedImg = SaveImgFromFixedCam();
   do { Rep = SaveImage(); } while (panning the PTZ camera from leftmost to rightmost)
4: CalculateImgDiff(Rep[], FixedImg);
5: SetPosition(Min(Rep[]));

We set the fixed camera position to a specific height and zoom level, as shown above. The PTZ camera is calibrated using the fixed camera's height and zoom level. Then, we collect representative images through PTZ camera panning that cover the entire monitoring area. We set the PTZ camera position to the one that yields the minimum difference when the fixed-camera image is compared with the representative images.

An object is detected by preprocessing, which subtracts the object from a background image. The background image without any objects is stored, so that an object can be extracted by subtracting object images from the background image when needed. However, the background subtraction method cannot be started immediately after setting up the physical camera; it must wait until no object appears in the background image. It also takes time to store the background image. In addition, the background image may have to be stored anew when the illumination changes or the scene is altered by light and wind. In this paper, we therefore use the motion history method of the OpenCV library [16], which requires no background learning process, to trace moving objects by their center points in real time. The tracking process is as follows.
Algorithm 2: Moving Object Tracking
Image buffer[];
Point objPoint[];
MotionSegmentation seg[];
int minDistance, nowobjPoint;
int angle[];
buffer[] = SaveImg();
cvCvtColor(buffer[], CV_BGR2GRAY);
cvAbsDiff(buffer[]);
seg[] = cvUpdateMotionHistory(buffer[], DURATION);
for (i = 1; i < Num(seg); i++) {
    extractObjFromSegmentation(seg[]);
}
objPoint[] = GetCenterPointofObj(seg[]);
angle[] = GetObjAngle(seg[]);
for (i = 1; i < Num(seg); i++) {
    int tmpDistance;
    tmpDistance = CalculateEuclideanDistance(prevobjPoint, objPoint[i]);
    if (tmpDistance <= ACCEPTABLE_MIN_DISTANCE)   // ambiguous match (possible occlusion): use direction
        nowobjPoint = Compare(prevAngle, objAngle[i]);
    else
        nowobjPoint = objPoint[i];
}

As depicted in Algorithm 2, images are stored in a buffer and then converted into gray-scale images. Our system obtains the motion history of two images using the image difference calculation. We adopted the cvUpdateMotionHistory function of the OpenCV API to update the motion history. The motion history is updated from the non-zero-pixel silhouette image when motion occurs in the image. In this paper, we used a 1-second time-stamp for the image storage time and excluded non-zero-pixel silhouettes whose summed width and height are below 20 pixels. As shown in Fig. 9, when objects move in the image for a specific period, a blue-marked motion history is updated by comparing two image frames. We represent an object by the center point of a circle that covers the whole object. In addition, a line from the center point to the circular arc shows the movement direction of the object, as shown in Fig. 9. We can predict the center point of an object in the next frame using the center point and direction of the object. First, we continuously store the center point and direction in the buffer. The object center point in the next frame is set to the point that has the shortest distance to the object center point of the previous frame. The Euclidean distance [6] is used to measure the distance between center points. In the case of an occlusion of multiple objects, the movement direction between the previous frame and the next frame is used for the object's identification. The center point is then used for panning and tilting the PTZ camera, as in the following algorithm, considering the zoom level difference between the fixed camera and the PTZ camera.
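Before turning to that PTZ control, the frame-to-frame matching step just described can be sketched as a small self-contained routine. This is not the authors' implementation; the ambiguity threshold and the angle-based tie-break are illustrative assumptions consistent with the description above.

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct CenterPoint { double x, y; };

// Illustrative sketch of the matching step: choose the candidate nearest to the
// previous center point (Euclidean distance); if candidates lie closer than
// minAmbiguousDist (assumed value), treat the match as ambiguous (e.g. occlusion)
// and resolve it by the smallest change in movement direction instead.
std::size_t matchNextCenter(const CenterPoint& prev, double prevAngleDeg,
                            const std::vector<CenterPoint>& candidates,
                            const std::vector<double>& candAnglesDeg,
                            double minAmbiguousDist = 15.0)
{
    if (candidates.empty() || candidates.size() != candAnglesDeg.size()) return 0;

    auto dist = [](const CenterPoint& a, const CenterPoint& b) {
        return std::hypot(a.x - b.x, a.y - b.y);           // Euclidean distance
    };

    std::size_t best = 0;
    bool ambiguous = false;
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        if (dist(candidates[i], prev) < dist(candidates[best], prev)) best = i;
        if (dist(candidates[i], prev) < minAmbiguousDist) ambiguous = true;
    }
    if (!ambiguous) return best;                            // clear nearest-neighbor match

    std::size_t bestDir = 0;
    double bestDiff = 1e9;
    for (std::size_t i = 0; i < candidates.size(); ++i) {   // direction-based tie-break
        double d = std::fabs(candAnglesDeg[i] - prevAngleDeg);
        d = std::min(d, 360.0 - d);                         // wrap-around angle difference
        if (d < bestDiff) { bestDiff = d; bestDir = i; }
    }
    return bestDir;
}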
Fig. 9 Object tracking: (a) real image, (b) silhouette image
Algorithm 3: PTZ Function
1: int MovFactor, prevX, prevY, newX, newY;
2: MovFactor = (PTZCamZoomLev / FixedCamZoomLev);
3: newX = FixedCamCenterX + (ObjCenterX - FixedCamCenterX) * MovFactor;
4: newY = FixedCamCenterY + (ObjCenterY - FixedCamCenterY) * MovFactor;
5: DoPTZ(newX, newY);
We can calculate the degrees of panning and tilting from the coordinates given to the function DoPTZ(newX, newY).
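As a worked example of Algorithm 3 (all numeric values are purely illustrative): with a fixed-camera zoom level of 1, a PTZ zoom level of 2, a fixed-camera image center of (352, 240), and an object center of (400, 260), the mapping yields the target coordinates (448, 280) passed to DoPTZ.

#include <cstdio>

// Worked example of the mapping in Algorithm 3; all numeric values are illustrative.
int main() {
    const double fixedZoom = 1.0, ptzZoom = 2.0;          // assumed zoom levels
    const double fixedCx = 352.0, fixedCy = 240.0;        // fixed-camera image center (704x480)
    const double objCx = 400.0, objCy = 260.0;            // tracked object's center point

    const double movFactor = ptzZoom / fixedZoom;         // zoom-level ratio
    const double newX = fixedCx + (objCx - fixedCx) * movFactor;
    const double newY = fixedCy + (objCy - fixedCy) * movFactor;

    std::printf("DoPTZ(%.0f, %.0f)\n", newX, newY);       // prints DoPTZ(448, 280)
    return 0;
}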
5 Physical prototyping of the intelligent video surveillance system

We performed experiments with seven CCD cameras (704×480 resolution) to evaluate the performance of our system. We used PCs with 64-bit Intel Xeon 3.2 GHz processors and 2 GB of RAM as the hardware platform, and Microsoft SQL Server 2000 as the underlying DBMS. The system automatically recognizes various dangerous situations in public areas and classifies the safety level by means of the environment's safety index models using network camera collaboration.

Figure 10 shows color-based object identification using two cameras and violence recognition using an acoustic sensor. In Fig. 10(a), the face of an entering person is detected by the Adaboost algorithm at the entrance. Each object is classified by the HSI color model. Thus, the intelligent surveillance system identifies and tracks unauthenticated people by analyzing the color and pattern of their clothing. In Fig. 10(b), dangerous situations are recognized by analyzing audio and visual data. Our system detects abnormal situations using the decibel (dB) levels of scream pitches.

Figure 11 shows the physical prototype of the intelligent video surveillance system, composed of a screen (4.2 m×2 m) with six projectors, one ceiling projector, four fixed cameras, four PTZ cameras, one speed dome camera and two acoustic sensors.
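The Adaboost-based face detection mentioned for Fig. 10(a) is commonly realized with a boosted Haar cascade; the following is a minimal OpenCV sketch, not the authors' code, and the cascade file name is an assumption.

#include <opencv2/opencv.hpp>
#include <vector>

// Minimal sketch of Adaboost (Viola-Jones style) face detection with an OpenCV
// Haar cascade. Not the authors' implementation; the cascade file path is an assumption.
std::vector<cv::Rect> detectFaces(const cv::Mat& frameBGR)
{
    static cv::CascadeClassifier cascade("haarcascade_frontalface_alt.xml");  // assumed model file
    if (cascade.empty()) return {};                      // cascade file not found

    cv::Mat gray;
    cv::cvtColor(frameBGR, gray, cv::COLOR_BGR2GRAY);
    cv::equalizeHist(gray, gray);                        // improve contrast before detection

    std::vector<cv::Rect> faces;
    cascade.detectMultiScale(gray, faces, 1.1, 3, 0, cv::Size(30, 30));  // scale step, min neighbors, flags, min size
    return faces;
}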
Fig. 10 Object identification and violence recognition using multiple sensors: (a) object identification, (b) abnormal situation detection
Fig. 11 Physical prototyping of the intelligent video surveillance system: (a) control center, (b) screenshot of ISS, (c) GIS using Google Earth, (d) USS monitor
Figure 11(c) shows a satellite map that covers the entire globe. When an accident occurs, the system zooms into the accident area. The system indicates the accident point with a red circle on a 2D map. We use the Google Earth API functions [8] to mark the monitoring area efficiently. In Fig. 11(d), the level of spatial importance is computed using space features and facilities. Based on this space safety index, we reconstruct the monitoring area when an accident occurs in a specific space. For example, the safety index of an ATM facility is higher than that of other areas. At this time, if violence happens in the ATM area, the level of spatial importance is recalculated. Then, the system controls the adjacent camera to monitor the ATM area using PTZ control. If violence occurs in the ATM area, the area is marked in red on the 2D map.
6 Conclusions

In this paper, we have developed an intelligent surveillance system that provides various services, such as object identification, object size analysis, object localization, tampering detection, activity recognition, and moving object tracking. Surveillance systems from different vendors have different established event rules and message exchanging rules. Thus, we have defined metadata rules to exchange analyzed information between distributed or heterogeneous surveillance systems. A 3-tier context-awareness conceptual framework is presented to identify the design principles of the intelligent surveillance system. Most importantly, the design prototypes, as the convergence of computers and buildings, have shown the potential for a profound transformation of design practice in smart space design. The design framework and the implementation of the prototypes have served as a logical basis to elaborate broad design concepts and intelligent video computing technologies that may be carried toward future smart surveillance systems. In future work, we will improve robust object identification methods and create an administrative mobile device interface.
Acknowledgment “This research was supported by the MKE(The Ministry of Knowledge Economy), Korea, under the ITRC(Information Technology Research Center) support program supervised by the NIPA (National IT Industry Promotion Agency)” (NIPA-2010-C1090-1031-0004) and this research is also supported by the Ubiquitous Computing and Network (UCN) Project, Knowledge and Economy Frontier R&D Program of the Ministry of Knowledge Economy (MKE), the Korean government, as a result of UCN’s subproject 10C2-T3-10M.
References

1. Ali A, Aggarwal J (2001) Segmentation and recognition of continuous human activity. In: Proceedings of the IEEE Workshop on Detection and Recognition of Events in Video, pp 28–35
2. Ayers D, Shah M (2001) Monitoring human behavior from video taken in an office environment. Image Vis Comput 19(12):833–846
3. Cai Q, Aggarwal JK (1996) Tracking human motion using multiple cameras. In: Proceedings of the International Conference on Pattern Recognition (ICPR '96), Washington, DC, USA. IEEE Computer Society, pp 68–72
4. Chae YN, Kim Y-H, Choi J, Cho K, Yang HS (2009) An adaptive sensor fusion based objects tracking and human action recognition for interactive virtual environments. In: Proceedings of the 8th International Conference on Virtual Reality Continuum and its Applications in Industry (VRCAI '09), New York, NY, USA. ACM, pp 357–362
5. Comaniciu D, Ramesh V, Meer P (2003) Kernel-based object tracking. IEEE Trans Pattern Anal Mach Intell 25:564–575
6. Danielsson P (1980) Euclidean distance mapping. Comput Graph Image Process 14(3):227–248
7. Duong T, Bui H, Phung D, Venkatesh S (2005) Activity recognition and abnormality detection with the switching hidden semi-Markov model. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), vol 1, pp 838–845
8. Google Earth API. Online. Available: http://code.google.com/apis/earth/
9. Huang C-L, Liao B-Y (2001) A robust scene-change detection method for video segmentation. IEEE Trans Circuits Syst Video Technol 11(12):1281–1288
10. Huang T, Russell S (1997) Object identification in a Bayesian context. In: Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence (IJCAI '97), San Francisco, CA, USA. Morgan Kaufmann, pp 1276–1282
11. Kelly PH, Katkere A, Kuramura DY, Moezzi S, Chatterjee S (1995) An architecture for multiple perspective interactive video. In: Proceedings of the Third ACM International Conference on Multimedia (MULTIMEDIA '95), New York, NY, USA. ACM, pp 201–212
12. Kettnaker V, Zabih R (1999) Counting people from multiple cameras. In: IEEE International Conference on Multimedia Computing and Systems, vol 2, pp 267–271
13. Lv F, Kang J, Nevatia R, Cohen I, Medioni G (2004) Automatic tracking and labeling of human activities in a video sequence. In: PETS04
14. Nam J, Tewfik A (2005) Detection of gradual transitions in video sequences using B-spline interpolation. IEEE Trans Multimedia 7(4):667–679
15. Nam Y, Ryu J, Joo Choi Y, Duke Cho W (2007) Learning spatio-temporal topology of a multi-camera network by tracking multiple people. World Acad Sci Eng Tech 4(4):254–259
16. OpenCV, Open Computer Vision Library. http://sourceforge.net/projects/opencvlibrary/
17. Petrushin V, Wei G, Ghani R, Gershman A (2005) Multiple sensor indoor surveillance: problems and solutions. In: IEEE Workshop on Machine Learning for Signal Processing, pp 349–354
18. Rabiner L (1989) A tutorial on hidden Markov models and selected applications in speech recognition. Proc IEEE 77(2):257–286
19. Ribeiro PC, Santos-Victor J (2005) Human activity recognition from video: modeling, feature selection and classification architecture. In: International Workshop on Human Activity Recognition and Modeling, pp 61–70
20. Ribnick E, Atev S, Masoud O, Papanikolopoulos N, Voyles R (2006) Real-time detection of camera tampering. In: IEEE International Conference on Video and Signal Based Surveillance (AVSS '06)
21. Serby D, Meier E, Van Gool L (2004) Probabilistic object tracking using multiple features. In: Proceedings of the 17th International Conference on Pattern Recognition (ICPR 2004), vol 2, pp 184–187
22. Veenman C, Reinders M, Backer E (2001) Resolving motion correspondence for densely moving points. IEEE Trans Pattern Anal Mach Intell 23(1):54–72
23. Viola P, Jones M, Snow D (2005) Detecting pedestrians using patterns of motion and appearance. Int J Comput Vis 63:153–161
24. Yilmaz A, Li X, Shah M (2004) Contour-based object tracking with occlusion handling in video acquired using mobile cameras. IEEE Trans Pattern Anal Mach Intell 26(11):1531–1536
25. Zelnik-Manor L, Irani M (2001) Event-based analysis of video. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001), vol 2, pp 123–130
26. Zhang D, Gatica-Perez D, Bengio S, McCowan I (2005) Semi-supervised adapted HMMs for unusual event detection. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), vol 1, pp 611–618
27. Zhao W, Wang J, Bhat D, Sakiewicz K, Nandhakumar N, Chang W (1999) Improving color based video shot detection. In: IEEE International Conference on Multimedia Computing and Systems, vol 2, pp 752–756
28. Zhong H, Shi J, Visontai M (2004) Detecting unusual activity in video. In: Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol 2, pp 819–826
Yunyoung Nam received his B.S., M.S. and Ph.D. degrees in Information and Computer Engineering from Ajou University, Korea, in 2001, 2003, and 2007, respectively. He was a research engineer at the Center of Excellence in Ubiquitous Systems from 2007 to 2009. He was a post-doctoral researcher at Stony Brook University, New York, in 2009. He is currently a research professor at Ajou University in Korea. He also spent time as a visiting scholar at the Center of Excellence for Wireless & Information Technology (CEWIT), Stony Brook University - State University of New York, Stony Brook, New York. His research interests include multimedia databases, ubiquitous computing, image processing, pattern recognition, context-awareness, conflict resolution, wearable computing, and intelligent video surveillance.
Seungmin Rho received his MS and PhD degrees in Information and Computer Engineering from Ajou University, Korea, in 2003 and 2008, respectively. In 2008–2009, he was a Postdoctoral Research Fellow at the Computer Music Lab of the School of Computer Science at Carnegie Mellon University. He is currently working as a Research Professor at the School of Electrical Engineering at Korea University. His research interests include databases, music retrieval, multimedia systems, machine learning, knowledge management and intelligent agent technologies. He has been a reviewer for Multimedia Tools and Applications (MTAP), the Journal of Systems and Software, and Information Sciences (Elsevier), and a Program Committee member of over 10 international conferences. He has published 14 papers in journals and book chapters and 21 in international conferences and workshops. He is listed in Who's Who in the World.
Dr. Jong Hyuk Park received his Ph.D. degree from the Graduate School of Information Security, Korea University, Korea. From December 2002 to July 2007, Dr. Park was a research scientist at the R&D Institute, Hanwha S&C Co., Ltd., Korea. From September 2007 to August 2009, he was a professor at the Department of Computer Science and Engineering, Kyungnam University, Korea. He is now a professor at the Department of Computer Science and Engineering, Seoul National University of Science and Technology, Korea. Dr. Park has published about 100 research papers in international journals and conferences. He has been serving as chair, program committee member, or organizing committee chair for many international conferences and workshops. He is president of the Korea Information Technology Convergence Society (KITCS). He is editor-in-chief (EiC) of the International Journal of Information Technology, Communications and Convergence (IJITCC), Inderscience. He was EiC of the International Journal of Multimedia and Ubiquitous Engineering (IJMUE) and the International Journal of Smart Home (IJSH). He is Associate Editor / Editor of 14 international journals, including 8 journals indexed by SCI(E). In addition, he has been serving as a Guest Editor for international journals by several publishers: Springer, Elsevier, John Wiley, Oxford University Press, Hindawi, Emerald, and Inderscience. His research interests include security and digital forensics, ubiquitous and pervasive computing, context awareness, and multimedia services. He received the best paper award at the ISA-08 conference in April 2008, and outstanding leadership awards from IEEE HPCC-09 and ISA-09 in June 2009.