Dual camera intelligent sensor for high definition 360 degrees surveillance

G. Scotti, L. Marcenaro, C. Coelho, F. Selvaggi and C.S. Regazzoni

Abstract: A novel integrated multi-camera video-sensor (panoramic scene analysis, PSA) system is proposed for surveillance applications. In the proposed set-up, an omnidirectional imaging device is used in conjunction with a pan tilt zoom (PTZ) camera, leading to an innovative kind of sensor that is able to automatically track, at a higher zoom level, any moving object within the guarded area. In particular, the catadioptric sensor is calibrated and used to track every moving object within its 360 degree field of view. Omnidirectional image portions are then rectified, and the pan, tilt and zoom parameters of the moving camera are automatically adjusted by the system in order to track the detected objects. In addition, a co-operative strategy has been developed for selecting the object to be tracked by the PTZ sensor when multiple targets are present.
1 Introduction
In the design of an architecture for video surveillance applications, many functionalities, such as detection, tracking and classification of the objects acting within the guarded environment, have to be taken into account. In particular, achieving robust and reliable behaviour of the system against data corruption is an important issue. In order to enhance overall robustness and to extend the spatial coverage of the system, a multi-sensor approach has been chosen. The architecture proposed here is composed of a 360 degree catadioptric sensor for full-range monitoring and a pan tilt zoom (PTZ) camera for high-resolution surveillance. The catadioptric sensor is adopted in video surveillance systems because of its advantages in terms of coverage and cost. In particular, Harabar [1] utilises this kind of optics to automatically pilot a small traffic-surveillance helicopter or a robot. Nayar and Boult at Columbia University produced a system [2] that fuses catadioptric sensors with PTZ cameras and is capable of detecting activity in the monitored scene. Boult, in [3], describes a video surveillance system based upon a catadioptric sensor and capable of detecting and tracking objects in complex environments. The system proposed in this paper 'closes the loop', as it is a complete dual-camera real-time system that actively tracks objects of interest with a high degree of independence. In fact, the PSA works like an embedded smart sensor
capable of detecting and tracking multiple objects at low resolution or, by using the mobile camera, at high resolution. This approach greatly simplifies data fusion in multi-camera video surveillance systems, drastically reducing set-up and maintenance costs as well as computational complexity.
2 The catadioptric sensor
Different types of catadioptric sensors can be found in the literature; what characterises each of them is the shape of the adopted mirror (parabolic, hyperbolic, conical or ellipsoidal). The mirror must ensure a single centre of projection in order to allow perspective images to be generated. The mirror chosen for this work has a double parabolic envelope; this means that the projection centre is in the parabola focus and the geometric mapping between the image and the real world is invariant to mirror translations (Fig. 1). The chosen reflector is placed directly over a conventional camera sensor, and its field of view extends from 4 degrees over the horizon to 56 degrees below it. With such a sensor, the resolution of the obtained image is not constant (Fig. 2). In the next Section, an intrinsic and extrinsic sensor calibration procedure is explained for transforming the polar image into a Cartesian one and for locating targets on the ground plane. The extrinsic calibration is also required for pointing the PTZ camera via the omnidirectional sensor.
3 Sensor calibration
Sensor calibration is the procedure for obtaining the sensor parameters and positioning. It is divided into three phases:
1. Intrinsic calibration
2. Extrinsic calibration
3. Joint calibration
In the first step, the image centre, eccentricity and radius are estimated in order to achieve image rectification [4].
The rectified image (domain 2) is related to the polar one (domain 1) by the mapping:

x_1 = G(y_2) \cos(x_2)
y_1 = G(y_2) \sin(x_2)    (1)

where (x_2, y_2) is a point in domain 2, (x_1, y_1) is a point in domain 1 and G(.) is a stretching function along the vertical dimension. The values (x_1, y_1) are in general not integer, so four integers X_I, X_F, Y_I, Y_F are obtained as follows:

X_I, X_F integers such that X_I + X_F/8 \le x_1 < X_I + (X_F + 1)/8    (2)
Fig. 1 Catadioptric sensor [10]
In the second phase, the relationship between the image plane and the world plane is computed, assuming that the ground plane and the image plane are parallel. In the third phase, a methodology is described for pointing the PTZ camera at the target using information taken from the catadioptric sensor.
3.1 Intrinsic calibration

Intrinsic calibration is the procedure required for image rectification. In particular, rectification is used to enhance the comprehensibility of the polar image (Fig. 3). More complex rectification algorithms exist in the literature [5], generating a perspective view from a rectified one. In our method, an image re-sampling is first performed to overcome the non-homogeneous resolution of the input polar image. Figure 4 shows a schematic representation of this procedure, in which every point of domain 2 (rectified) is mapped into domain 1 (polar). The mapping functions are those given in (1); each coordinate is then split as in (2) and, analogously for the vertical coordinate:
Y_I, Y_F integers such that Y_I + Y_F/8 \le y_1 < Y_I + (Y_F + 1)/8    (3)
This procedure clearly introduces a certain amount of aliasing, depending on the resolution chosen for the output image (domain 2). The interpolating function is then:

V = \sum_{k=0}^{3} A_k P(Y_F, k)    (4)

where:

A_k = \sum_{n=0}^{3} IMM(X_I - 1 + n, Y_I - 1 + k) P(X_F, n),   k = 0, 1, 2, 3    (5)
where IMM is the input image (domain 1), V is the pixel value in domain 2 and P is an 8 x 4 matrix of interpolative filters. In order to obtain a correct image re-sampling, five parameters have to be estimated:
1. C(X_0, Y_0): the common centre of the two circumferences
2. K: scale factor along x
3. R(E): external radius
4. R(I): internal radius
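As an illustration, the whole re-sampling of (1)-(5) can be condensed into a few lines using OpenCV's remap, which takes care of the sub-pixel interpolation that the paper implements with the 8 x 4 filter bank. This is only a minimal sketch, not the authors' implementation: the function name and defaults are ours, the scale factor K along x is omitted for brevity, and G defaults to a linear stretching in place of the fitted polynomial (7).

import cv2
import numpy as np

def rectify_panoramic(img, cx, cy, r_in, r_out, out_w=1024, out_h=256, G=None):
    # Unwrap the polar omnidirectional image (domain 1) into a
    # Cartesian panoramic one (domain 2) through the mapping (1).
    # G maps an output row to a radius in the input image; by default
    # a linear stretching between the two calibrated circumferences
    # is used instead of the fitted polynomial (7).
    if G is None:
        G = lambda v: r_in + (r_out - r_in) * v / (out_h - 1)

    # x2 spans the full 360 degrees, y2 the radial extent of the mirror.
    x2 = np.linspace(0.0, 2.0 * np.pi, out_w, endpoint=False)
    y2 = np.arange(out_h, dtype=np.float32)
    theta, rows = np.meshgrid(x2, y2)
    radius = G(rows)

    # Equation (1): every (x2, y2) of domain 2 falls back into domain 1.
    map_x = (cx + radius * np.cos(theta)).astype(np.float32)
    map_y = (cy + radius * np.sin(theta)).astype(np.float32)

    # remap performs the sub-pixel interpolation that (2)-(5) implement
    # with the 8 x 4 bank of cubic filters.
    return cv2.remap(img, map_x, map_y, interpolation=cv2.INTER_CUBIC)

# Hypothetical usage with the five estimated parameters:
# pano = rectify_panoramic(frame, cx=384, cy=288, r_in=60, r_out=270)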
Fig. 2 Extrinsic sensor calibration image
Fig. 3 PSA functionalities [9]

Fig. 4 Image re-sampling
The estimation procedure is based on an iterative least-mean-square methodology and takes place by selecting N pairs of points along the two circumferences. In particular, the procedure iteratively minimises the following functional Q:

Q = \sum_{n=1}^{N} (R_n(E) - R(E))^2 + \sum_{n=1}^{N} (R_n(I) - R(I))^2    (6)
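A minimal sketch of this estimation step is given below, with scipy's least_squares standing in for the paper's iterative minimisation of Q; the function and variable names are ours, and the selected points are assumed to be supplied as N x 2 arrays.

import numpy as np
from scipy.optimize import least_squares

def fit_concentric_circles(pts_ext, pts_int):
    # Estimate the common centre C(X0, Y0) and the radii R(E), R(I)
    # by minimising the functional Q of equation (6) over the N
    # points selected on each circumference.
    pts_ext = np.asarray(pts_ext, dtype=float)
    pts_int = np.asarray(pts_int, dtype=float)

    def residuals(p):
        x0, y0, re, ri = p
        # R_n(E) - R(E) and R_n(I) - R(I) for every selected point.
        res_e = np.hypot(pts_ext[:, 0] - x0, pts_ext[:, 1] - y0) - re
        res_i = np.hypot(pts_int[:, 0] - x0, pts_int[:, 1] - y0) - ri
        return np.concatenate([res_e, res_i])

    # Initial guess: centroid of all points, radii from mean distances.
    x0, y0 = np.vstack([pts_ext, pts_int]).mean(axis=0)
    re0 = np.hypot(pts_ext[:, 0] - x0, pts_ext[:, 1] - y0).mean()
    ri0 = np.hypot(pts_int[:, 0] - x0, pts_int[:, 1] - y0).mean()

    sol = least_squares(residuals, [x0, y0, re0, ri0])
    return sol.x  # X0, Y0, R(E), R(I)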
The last quantity to be evaluated is the stretching function G(r), which is strictly perspective dependent. For this reason, a vertically graded pattern bar is placed in the scene at a fixed distance L from the camera. By selecting several image points belonging to the pattern bar, G(r) can then be computed with the least squares technique:

G(r) = g_0 + g_1 r + g_2 r^2 + g_3 r^3 + g_4 r^4    (7)
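The polynomial fit of (7) reduces to a standard least-squares problem; a short sketch with numpy follows, in which the sample radii and pattern positions are purely illustrative values, not measured data.

import numpy as np

# Radii (pixels) of points selected on the graded pattern bar and the
# corresponding normalised positions along the bar: illustrative data.
r = np.array([62.0, 95.0, 131.0, 170.0, 208.0, 247.0])
g = np.array([0.0, 0.2, 0.4, 0.6, 0.8, 1.0])

# Least-squares fit of the 4th-order polynomial of equation (7);
# polyfit returns the coefficients from g4 down to g0.
g4, g3, g2, g1, g0 = np.polyfit(r, g, deg=4)
G = np.poly1d([g4, g3, g2, g1, g0])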
3.2 Extrinsic calibration

Generally, extrinsic calibration deals with the estimation of the mathematical relationship between the image plane and the world space. For a traditional camera, this means finding the tilt angle and the sensor displacement.
Fig. 5 Angular resolution
Traditional calibration methodologies have to be modified in order to satisfy the sensor requirements; several algorithms have been proposed in the literature [6, 7]. In our case, owing to the physical configuration of the sensor (i.e. the mirror field of view covers from 4 degrees over the horizon to 56 degrees below it), it is possible to identify a different angle value for each incoming ray of light. In order to solve the problem, a radioscopic chessboard grid is wrapped around the sensor (Fig. 2), obtaining an image that is particularly useful for estimating the angle as a function of the distance r from the image centre. It is then possible to find the relationship between the radius r in pixels and the angle value \theta:

\theta = f(r)    (8)

Interpolating the experimental data obtained by applying the chessboard pattern to the sensor, the dependence has been found to be linear (Fig. 5). It is also important to underline that this procedure is totally application independent and depends only on the catadioptric sensor geometry. By using this dependence, once the distance in pixels from the centre of the image is known, the distance of an object from the camera on the ground plane can be evaluated (Fig. 6):

r_w = H \tan(\theta)    (9)
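Assuming the linear form of (8), the chain (8)-(9) amounts to the following sketch; the slope and offset used in the example are hypothetical values chosen only to be roughly consistent with Table 1, not the calibrated ones.

import numpy as np

def ground_distance(r_px, f_slope, f_offset, H):
    # Equation (8) with the linear dependence of Fig. 5, followed by
    # equation (9). H is the height of the panoramic camera.
    theta = f_slope * r_px + f_offset
    return H * np.tan(theta)

# Hypothetical coefficients for a camera placed at 5.7 m:
# ground_distance(157, f_slope=0.0053, f_offset=0.195, H=5.7) -> ~9.4 m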
Fig. 8 PTZ camera pan and tilt angles

Fig. 6 System resolution

The object 'world position' in a polar coordinate system is finally estimated after evaluating the angular displacement on the image plane.

3.3 Joint calibration strategy

This strategy is fundamental for the optimal functioning of the proposed system. Vertical camera optical axis alignment and absolute PTZ camera positioning are utilised in this calibration phase (Fig. 7). In particular, a reference zero position Z(x_0, y_0) for the PTZ camera is first defined on the catadioptric 360 degree image. This position coincides with the default orientation of the PTZ camera and is chosen as the starting point for pan angle evaluation (Fig. 8). In fact, given a point P(x, y) and the image centre C(X_0, Y_0), it is simple to find the absolute pan angle \varphi as the angle subtended at C between Z and P:

\varphi = \widehat{ZCP}    (10)

The movable camera tilt angle \psi is instead evaluated using the extrinsic calibration. By knowing the mobile camera height H_m and the object distance r_w from the camera on the ground plane, the tilt angle for each image point can be calculated simply by using the following expression:

\psi = \arctan(H_m / r_w)    (11)

It is then possible for the user to point the moving camera simply by clicking on the omnidirectional image or directly on the rectified one.
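The pointing logic of (10) and (11) can be sketched as follows; the function signature and the linear extrinsic coefficients are our assumptions, with the pan angle computed around C between the zero reference Z and the target P.

import numpy as np

def ptz_angles(px, py, cx, cy, zx, zy, Hm, f_slope, f_offset, H):
    # Pan: angle phi subtended at the image centre C between the zero
    # reference Z and the target P, as in equation (10).
    phi = np.arctan2(py - cy, px - cx) - np.arctan2(zy - cy, zx - cx)
    phi = (phi + 2.0 * np.pi) % (2.0 * np.pi)

    # Tilt: ground distance of the target from the extrinsic
    # calibration (8)-(9), then equation (11).
    r_px = np.hypot(px - cx, py - cy)
    r_w = H * np.tan(f_slope * r_px + f_offset)
    psi = np.arctan2(Hm, r_w)
    return np.degrees(phi), np.degrees(psi)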
4 Multi-target tracking system
The PSA sensor is able to track multiple objects directly on the panoramic image [8]. The software processing chain used to detect and track objects is depicted in Fig. 9; a brief description of the most important modules follows. Object detection is performed by subtracting a reference image B (background image) from the current image I, obtaining as output an image containing the changes in the scene [8-10]:

|I_h(j, i) - B(j, i)| > Th  =>  foreground pixel
|I_h(j, i) - B(j, i)| <= Th  =>  background pixel

By using this image, a bounding box is drawn around each moving region, and some features (such as colour, shape and position) are extracted for each object by the feature extraction module (blob colouring), in order to track the same blob in the next image.
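A minimal sketch of this detection step follows, using OpenCV connected components for the blob extraction step (the paper's blob colouring); the threshold and minimum blob area are illustrative values.

import cv2
import numpy as np

def detect_blobs(frame, background, th=25, min_area=80):
    # Change detection: |I - B| > Th marks a foreground pixel.
    diff = cv2.absdiff(frame, background)
    if diff.ndim == 3:
        diff = diff.max(axis=2)  # collapse colour channels
    mask = (diff > th).astype(np.uint8)

    # Blob colouring: label connected foreground regions and keep a
    # bounding box for each sufficiently large one.
    n, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
    boxes = [tuple(stats[i, :4]) for i in range(1, n)
             if stats[i, cv2.CC_STAT_AREA] >= min_area]
    return mask, boxes  # boxes: (x, y, width, height) per blob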
Fig. 7 Sensors physical set-up
Fig. 9 Image processing chain
Fig. 10 Multi-target tracking
In particular, each moving area (blob) detected in the scene is bounded by a rectangle to which a numerical label is assigned. Through the detection of temporal correspondences among bounding boxes, a graph-based temporal representation of the dynamics of the image primitives can be built. The temporal graph provides information on the current bounding boxes and their relations to the boxes detected in the previous frames. By using the temporal graph layer as a scene representation tool, tracking can be performed while preserving temporal coherence between blobs. An alarm is then raised every time an object enters a forbidden area. The PSA sensor is able to display the tracked object either on the panoramic camera using rectification (low resolution) or through the pan tilt zoom camera (high resolution). Many false alarms (i.e. false blob tracks) can appear in outdoor environments when meteorological conditions are adverse. In fact, shadows and illumination changes generate noisy artefacts, which lead to misdetections at the low level (i.e. change detection), so a background-updating algorithm is required. The updated background B_{k+1}(x, y) is obtained using an alpha filter as follows:

B_{k+1}(x, y) = I_k(x, y) + a [B_k(x, y) - I_k(x, y)]    (12)

where I_k(x, y) is the current image, B_k(x, y) is the background and a \in [0, 1] is an updating coefficient.
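Equation (12) is a one-line exponential smoothing of the background; a sketch follows, with the value of a chosen only for illustration.

import numpy as np

def update_background(B_k, I_k, a=0.95):
    # Equation (12): an alpha filter. A value of a close to 1 makes the
    # background adapt slowly, absorbing gradual illumination changes
    # while ignoring fast-moving objects.
    B = B_k.astype(np.float32)
    I = I_k.astype(np.float32)
    return (I + a * (B - I)).astype(B_k.dtype)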
4.1 Target selection

Target selection is performed:
1. Automatically: when an object enters an alarmed zone previously outlined on the image by the user.
2. Manually: the target is selected by the operator on the panoramic image or on the rectified one.

In both cases the system is able to track the object at low resolution, redirecting the moving camera by using the blob position on the image [11]. During automatic functioning, the system is also able to generate an alarm log file and to save a rectified image for each alarmed blob (Fig. 10). In addition to the functionalities described above, the sensor is able to track several blobs at the same time (multi-target tracking mode), showing a rectified image for each one. In this way, it is possible to automatically monitor a wide area at low resolution.

5 Results

In order to test the proposed system, the panoramic camera has been installed at a height of 5.7 m and the PTZ sensor at 8 m over a car park (Fig. 7). For the implementation, two 1/3-inch sensor cameras have been used, ensuring an image resolution of 768 x 576 both for the panoramic 360 degree camera and for the PTZ.
Fig. 11 System working zones
The maximum frame rate obtained with this configuration is about 15 fps on an Intel Pentium IV 2.4 GHz, which also permits tracking high-speed moving objects such as cars or running pedestrians; the maximum trackable object speed has been found to be 60 km/h for blobs belonging to the green 'tracking' belt of Fig. 11. The panoramic image is otherwise characterised by a nonlinear resolution as a function of the distance from the centre (Fig. 6) and by a blind circular zone that can be a problem both for detection and for tracking. Consequently, working zones have been identified for each of the system functionalities, as depicted in Fig. 11. From experimentation it has been found that:

\theta_{oi} = 34°,  \theta_i = 39°,  \theta_f = 79°,  \theta_{of} = 94°    (13)

where \theta = \alpha + 90° (\alpha being the elevation angle of the incoming ray); \theta_{oi} is the minimum theoretical (radial) visibility angle, \theta_i is the real visibility angle, \theta_f is the maximum angle of detection and \theta_{of} is the maximum angle of visibility.
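For illustration, these limits translate into a simple zone lookup; note that the inner boundaries separating the 'classification', 'tracking' and 'detection' belts are read off Fig. 11 and are not given numerically in the text, so those belts are reported together here.

def working_zone(theta_deg):
    # Rough zone lookup from the radial angle theta (degrees), using
    # the experimental limits of equation (13).
    if theta_deg < 34.0:    # below theta_oi: central blind zone
        return 'blind'
    if theta_deg < 39.0:    # theta_oi..theta_i: not yet usable
        return 'margin'
    if theta_deg < 79.0:    # theta_i..theta_f: usable field
        return 'classification/tracking/detection'
    if theta_deg <= 94.0:   # theta_f..theta_of: visible, no detection
        return 'horizon'
    return 'out of view'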
Table 1: Calibration error

r (pel)    r_w (m) measured    r_w (m) evaluated    S (m)
89         4.50                3.94                 0.56
157        9.50                8.90                 0.60
205        19.50               19.30                0.20
217        24.50               25.88                1.38
229        34.50               38.60                3.90
In particular, in the 'detection' zone only detection is allowed, because of the low resolution and the large distance from the camera, while in the 'tracking' belt tracking is also granted, and in the 'classification' area blob classification is additionally possible, in order to determine whether a detected 'change' is a pedestrian or a car. The sensor extrinsic calibration error S has also been evaluated as the difference between the measured distance and the calculated one:

S = |r_{w,meas} - r_{w,eval}|    (14)
as can be seen in Table 1. The values obtained for S demonstrate the validity of the methodology used for calibrating the sensor. Indeed, the error remains quite small for points belonging to the detection field, as can be seen in Table 1, where, with the camera placed at 5.7 m, a measured r_w of 24.5 m corresponds to an angle \theta of about 76 degrees. In order to show the PSA capabilities, three image sequences (Figs. 12-14) are presented, corresponding to outdoor (Figs. 12 and 13) and indoor (Fig. 14) environments. In Figs. 12-14 it is possible to see how the target object is tracked over time. In particular, the system monitors a large area by operating directly on the 360 degree image and is able to track objects at both low and high resolution, also in the presence of more than one target, granting full area coverage. From 200 different tests it has been found that the PSA is characterised by a frequency of object loss (FL) equal to 5% in the 'tracking' field, by a frequency of false detection (FFD) lower than 2% and by a frequency of correct detection (FCD) greater than 95%.
Fig. 12 Pedestrian tracking
Fig. 13 Car tracking

Fig. 14 Indoor functioning
It is also important to observe how strongly the PSA reduces installation and maintenance costs. In fact, this solution is equivalent to a system of four traditional fixed cameras (or three fixed plus one moving camera) but does not require complex data fusion and calibration procedures. One system limitation is the poor resolution at relatively high distances from the projection of the camera optical axis.
Fig. 15 Distributed VSS architecture
This fact enlarges the 'horizon' and 'detection' zones, limiting the system functionalities. A possible solution to this problem is the use of mega-pixel cameras, which would provide the same frame rate together with a higher resolution, leading to a strong increase in tracking capabilities.
6 Applications
The proposed architecture can be used as part of a distributed video surveillance system composed of two video cameras and four processors (Fig. 15). The described system is capable of detecting and classifying targets in order to reveal intrusions into protected areas. In particular, when an alarm occurs, the omnidirectional camera is able to trigger the PTZ camera, which begins to track the target autonomously. This feature is required whenever the object goes outside the tracking field of the panoramic camera. Furthermore, each processed video stream is stored in a distributed database together with metadata for future queries, making remote 'intelligent' database retrieval possible: the user can look for an event of interest using several search keys, such as date, temporal interval, alarm typology or alarmed area crossing. Therefore, the integration of the PSA system in a distributed architecture such as the one depicted here greatly reduces complexity and costs, granting complete high-performance 360 degree surveillance.
7 Conclusions
In this paper, a novel 360 degree video surveillance sensor called PSA has been described. It has the advantages of a multi-camera system while maintaining the robustness of a single camera. The presented results demonstrate that the tracking capabilities of the system are equal to those of traditional multi-camera systems, while building costs and configuration procedures are reduced. This sensor has been designed as a multi-purpose video surveillance system, so possible applications include traffic monitoring, monitoring of sensitive sites, military
applications and virtual reality. Possible future evolutions include an independent PTZ tracking camera in order to follow objects even when they move out of the 'tracking' field. In this way, the limits imposed by the reflector resolution can be easily overcome, making it possible to continue tracking targets well outside the tracking belt.
8 Acknowledgments
This work was performed under the co-financing of the MIUR within the project FIRB-VICOM and of ELSAG S.p.A. G. Scotti has been funded by a grant from the County of Genoa and ELSAG S.p.A.
9 References
1 Harabar, S., and Sukhatme, G.: 'Omnidirectional vision for an autonomous helicopter'. Proc. IEEE Int. Conf. on Robotics and Automation (ICRA), 2003
2 Nayar, K., and Boult, T.: 'Omni-directional vision systems: PI report'. Proc. DARPA Image Understanding Workshop, Monterey, Nov. 1998
3 Boult, T.E., Micheals, R.J., Gao, X., and Eckmann, M.: 'Into the woods: visual surveillance of non-cooperative and camouflaged targets in complex outdoor settings', Proc. IEEE, 2001, 89, pp. 1382-1402
4 Baker, S., and Nayar, S.K.: 'Catadioptric image formation'. Proc. DARPA Image Understanding Workshop, May 1997
5 Daniilidis, K., Makadia, A., and Bulow, T.: 'Image processing in catadioptric planes: spatiotemporal derivatives and optical flow computation'. Proc. 3rd Workshop on Omni-directional Vision (OMNIVIS), IEEE, 2002, pp. 3-10
6 Paulino, A., Araujo, H., and Salvi, J.: 'Pose estimation for central catadioptric systems: an analytical approach'. Proc. ICPR 2002, Quebec, Canada
7 Sturm, P.: 'Mixing catadioptric and perspective cameras'. Proc. 3rd Workshop on Omni-directional Vision (OMNIVIS), IEEE, 2002, pp. 37-44
8 Marcenaro, L., Gera, G., and Regazzoni, C.S.: 'Adaptive change detection approach for object detection in outdoor scenes under variable speed illumination changes'. Proc. EUSIPCO, Tampere, Finland, 2000, pp. 1025-1028
9 Regazzoni, C.S., Vernazza, G., and Fabri, G. (Eds.): 'Advanced video-based surveillance systems' (Kluwer Academic Publishers, Norwell, MA, 1999)
10 Skifstad, K., and Jain, R.: 'Illumination independent change detection for real world image sequences', Comput. Vis. Graph. Image Process., 1989, 46, pp. 387-399
11 Marchesotti, L., Messina, A., Marcenaro, L., and Regazzoni, C.: 'A cooperative multi-sensor system for face detection in video surveillance applications', Int. J. Chin. Autom., 2002, 5, (5)