Robust Data Fusion in a Visual Sensor Multi-Agent Architecture

F. Castanedo, M.A. Patricio, J. García and J.M. Molina
University Carlos III of Madrid, Computer Science Department, Applied Artificial Intelligence Group
Avda. Universidad Carlos III 22, 28270-Colmenarejo (Madrid)
{fcastane, mpatrici, jgherrer}@inf.uc3m.es, [email protected]

Abstract—A surveillance system that fuses data from several data sources is more robust than one that depends on a single source of input. Fusing the information acquired by a vision system is a difficult task, since the system needs to use reliable error models and take into account poor performance when taking measurements. In this research, we use a bidimensional object correspondence and tracking method based on the ground-plane projection of the blob centroid. We propose a robust method that employs a two-phase algorithm, using a heuristic value and context information to automatically combine each source of information. The fusion process is carried out by a fusion agent in a multi-agent surveillance system. The experimental results on real video sequences show the effectiveness and robustness of the system.

Keywords: Multi-Camera Image Fusion, Distributed Surveillance Systems, Multi-Agent Systems

I. INTRODUCTION

Data fusion from multiple cameras observing the same object is one of the main challenges in multi-camera surveillance systems [1], and it is related to combining data from different sources in an optimal way [2]. The data fusion process in multi-sensor networks is the key step in building a coherent time-space description of the objects of interest in the area (the "level-one fusion task" [3] [5]). To do so, it is necessary to estimate the reliability of the available sensors and processes, so that complementary information can be combined (removing redundant information) in areas with multiple views, in order to solve sensor-specific problems such as occlusions, overlaps, shadows, etc. Besides extended spatial coverage, the traditional advantages include improved accuracy through combination by means of covariance reduction, robustness through the identification of malfunctioning sensors, and improved continuity through complementary detections [5] [6] [7]. To achieve these goals, some of the basic aspects to be taken into account when fusing data from different cameras are the following:

• changing into a common coordinate space (global coordinates) and synchronization on a common time basis;
• dynamic time-space alignment (or recalibration) to guarantee unbiased information to fuse;
• removing corrupted or wrong objects with analysis tests;
• data association at the right level;
• combining estimates obtained with different sensors and local processors.

One of the key steps in data fusion is determining how to represent information and uncertainty. For this purpose, much of the data fusion literature relies on Bayesian approaches, which use probabilities to represent degrees of belief. However, Hall [4] describes a list of problems associated with such techniques:

• the difficulty of defining prior likelihoods;
• the complexity when there are many potential hypotheses and many condition-dependent events;
• the requirement that hypotheses be mutually exclusive;
• the inability to describe uncertainty in decisions.

To deal with the last problem, Dempster [11] and later Shafer [12] generalized the traditional Bayesian belief model to allow an explicit representation of uncertainty with no model for the sources to fuse. On the other hand, the effect of one sensor on another, also known as data correlation, has been studied in depth. Some authors use measurement reconstruction [8], a technique that can be applied in a global fusion node, which compares the remote estimates received with its own version of the global estimate. Julier and Uhlmann proposed the Covariance Intersection (CI) algorithm [13]. CI solves the problem of correlated inputs, but it is undefined for inconsistent inputs. To address this, Uhlmann developed Covariance Union [14] to handle inconsistent sources and achieve robust fusion. In vision applications, Snidaro et al. [15] proposed a track-to-track scheme without feedback for combining data from different sensors. They also propose a confidence measure, called the appearance ratio (AR), to automatically control the fusion process according to the performance of the sensors [16]. In [17] the authors proposed the application of fuzzy and neuro-fuzzy techniques in a multi-target tracking video system in order to weight update decisions both for the trajectories and for the shapes estimated for the targets. The main objective of this contribution is an accurate and robust fusion process for tracking people with a multi-camera surveillance system embedded in a multi-agent architecture. The overlapped regions of the cameras and their calibration with respect to a common reference system allow

the fusion node to correct measurements in order to improve the tracking accuracy of the surveillance system. Indeed, an operator could easily select the most suitable visual sensor for any situation, but having a system that automatically picks the right camera or set of cameras is less trivial. In the next section we present the multi-agent architecture; section III explains the two-phase fusion algorithm; section IV describes the experiments and results; finally, section V discusses the conclusions of this research.


II. MULTI-AGENT ARCHITECTURE

Using a multi-agent architecture for video surveillance has several advantages [19] [18] [20]. First of all, the loosely coupled nature of a multi-agent architecture allows more flexibility in the communication processes, and the ability to assign responsibilities to each agent is ideal for solving complex tasks in a surveillance system. These complex tasks involve mechanisms such as coordination, dynamic configuration and cooperation, which are widely studied in the multi-agent community. Other ideas from multi-agent systems and distributed artificial intelligence, such as dynamic role distribution, could also be applied [9]. In our system, the data fusion process is carried out by the fusion agent of the multi-agent architecture depicted in Figure 1.

Figure 2. Multi-camera geometry for global coverage

A. Surveillance-sensor agent: Synchronization

Each surveillance-sensor agent S_i acquires images I(x, y) at a certain frame rate, V_i. For each target T_j, the tracking process provides an associated track vector of features X̂^{S_i}_{T_j}[n], containing the numeric description of its features and state (location, velocity, dimensions, etc.), and an associated error covariance matrix P̂^{S_i}_{T_j}[n]. Usually, video frame grabbers (with A/D conversion in the case of analog cameras, or directly with digital cameras) provide a sequence of frames, f_n, which can be assigned a time stamp by knowing the initial time of capture, t_0, and the grabbing rate, V_i (frames per second):

$$f_n = V_i (t - t_0) \qquad (1)$$
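As an illustration of equation (1), the following minimal Python sketch maps between capture timestamps and frame indices for a camera with a known start time and frame rate. The function and variable names are ours, not taken from the paper's implementation.

    # Minimal sketch of equation (1): mapping capture time to frame index.
    # t0 (initial capture time, seconds) and frame_rate (V_i) are assumed known per camera.

    def time_to_frame(t: float, t0: float, frame_rate: float) -> int:
        """Return the frame index f_n grabbed at absolute time t."""
        return int(round(frame_rate * (t - t0)))

    def frame_to_time(n: int, t0: float, frame_rate: float) -> float:
        """Return the time stamp assigned to frame index n."""
        return t0 + n / frame_rate

    # Example: a PAL camera (25 fps) that started capturing at t0 = 0.0 s.
    assert time_to_frame(frame_to_time(140, 0.0, 25.0), 0.0, 25.0) == 140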

Although external time stamps solve the problem of numbering each frame, the clocks of the different machines which generate the time stamps must be synchronized. Using the Network Time Protocol (NTP) [22] as an external clock source to stabilize the local clock of each machine is one way to resolve the local clock differences.

Figure 1. MAS architecture

In the multi-agent architecture there are several surveillance-sensor agents which track all the targets and send vectors of features and their associated covariance matrices to their respective fusion agent. Each surveillance-sensor agent coordinates with the other surveillance-sensor agents in order to improve surveillance quality. The fusion agent integrates the data (vectors of features and associated covariance matrices) of the targets from all surveillance-sensor agents. We consider that the cameras in the surveillance system are deployed so that their fields of view partially overlap; Figure 2 shows a possible geometry. This redundancy allows smooth transitions between overlapped areas, which may be affordable given the current low cost of equipment and processors. The intersection region between cameras is used to track targets as they transit between different fields of view, to fuse the output and compute the corrections.

B. Surveillance-sensor agent: Calibration and Correspondence

Camera calibration is the projection from the local space of each camera to the common representation in central coordinates. Correspondence between multiple cameras involves finding, at the same time instant, correspondences between objects in the different image sequences. Each camera in the surveillance system is therefore assumed to measure the location of mobile targets within its field of view with respect to a common reference system. This is a mandatory step, since the cameras must use the same metrics during the cooperative process. In order to calibrate the multiple cameras we use Tsai's calibration method [23]. As Khan et al. [25] [27], we use the points located on the feet to match people in multiple views, based on the homography constraint defined by the ground plane (see Figure 3). We project the centroid of each blob onto the local ground plane and then apply the Tsai calibration to transform the local coordinates and their associated error covariance matrices into the common representation.
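The paper uses Tsai's method for full camera calibration; as a simplified, hedged sketch of the same ground-plane idea, the Python snippet below maps the bottom-centre ("feet") point of a detected blob to common ground-plane coordinates through a planar homography. The homography matrix, the calibration points and the bounding box are hypothetical values introduced only for illustration; this is not the authors' implementation.

    import numpy as np
    import cv2

    def blob_ground_point(bbox, H):
        """Project the bottom-centre of a blob bounding box (x, y, w, h) onto the
        common ground plane using a 3x3 image-to-ground homography H."""
        x, y, w, h = bbox
        foot = np.array([[[x + w / 2.0, y + h]]], dtype=np.float32)  # feet location in the image
        ground = cv2.perspectiveTransform(foot, H)                   # homogeneous projection
        return ground[0, 0]                                          # (X, Y) on the ground plane

    # Hypothetical homography estimated from four known floor points of this camera.
    image_pts  = np.array([[100, 500], [600, 520], [620, 300], [120, 280]], dtype=np.float32)
    ground_pts = np.array([[0, 0], [5, 0], [5, 8], [0, 8]], dtype=np.float32)  # metres
    H, _ = cv2.findHomography(image_pts, ground_pts)

    print(blob_ground_point((350, 200, 40, 120), H))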

Figure 3. Planar projection on the ground plane

Correspondence results are used to improve the tracking through the fusion of consistent trajectories between cameras. Once the correspondence problem is solved, the aim is to improve the tracking results of each camera by using the tracking results of the other cameras. This means that the data fusion process makes robust tracking possible even when the position of the person is affected by shadows or occlusions.

III. FUSION AGENT: FUSION ALGORITHM

If several surveillance-sensor agents with different points of view track the same object, it may happen that one of these surveillance-sensor agents provides wrong information about the position of that object. Wrong information can arise for many reasons (e.g. the tracked object may be affected by a shadow, communication errors, hardware failures, occlusions, changes in the illumination conditions, dew, etc.). Therefore, it is necessary to detect inconsistencies and problems before the fusion process.

Let N be the number of surveillance-sensor agents in the surveillance system and S = {S_1, S_2, ..., S_i} be a subset of surveillance-sensor agents which are monitoring the same area, with 3 ≤ i ≤ N. Thus, we suppose that at least 3 surveillance-sensor agents in {S} are monitoring the same area. Let j be the number of common targets in {S} and T = {T_1, T_2, ..., T_j} the set of targets. For each target, the surveillance-sensor agent observes a vector of features. Each feature is an observable characteristic that could be taken into account in order to fuse the information; examples of features are position, velocity, color, size, shape, etc. Therefore, we have a vector of features X̂^{S_i}_{T_j}[n] for each target j acquired by each surveillance-sensor agent i. F is the set of feature vectors

F = {{X̂^{S_1}_{T_1}[n], X̂^{S_1}_{T_2}[n], ..., X̂^{S_1}_{T_j}[n]}, {X̂^{S_2}_{T_1}[n], X̂^{S_2}_{T_2}[n], ..., X̂^{S_2}_{T_j}[n]}, ..., {X̂^{S_i}_{T_1}[n], X̂^{S_i}_{T_2}[n], ..., X̂^{S_i}_{T_j}[n]}}

and P is the set of associated error covariance matrices

P = {{P̂^{S_1}_{T_1}[n], P̂^{S_1}_{T_2}[n], ..., P̂^{S_1}_{T_j}[n]}, {P̂^{S_2}_{T_1}[n], P̂^{S_2}_{T_2}[n], ..., P̂^{S_2}_{T_j}[n]}, ..., {P̂^{S_i}_{T_1}[n], P̂^{S_i}_{T_2}[n], ..., P̂^{S_i}_{T_j}[n]}}.

In the fusion process we assume that the correspondence problem between the same target in different surveillance-sensor agents is solved. The fusion algorithm carried out by the fusion agent is a two-phase algorithm: in the first phase we select the consistent tracks acquired from each surveillance-sensor agent S_i, and subsequently the selected tracks are fused.

A. Phase 1. Consistent Tracks

In order to detect inconsistent tracks we use the following two methods:

• Calculating the Mahalanobis Distance (MD) [26] between each surveillance-sensor agent's (S_i) track features and the mean (M̂) of all candidate features:

$$MD_{S_i} = (\hat{X}^{S_i}_{T_j}[n] - \hat{M}^{S_i}_{T_j}[n]) (\hat{P}^{S_i}_{T_j}[n])^{-1} (\hat{X}^{S_i}_{T_j}[n] - \hat{M}^{S_i}_{T_j}[n])^T \qquad (2)$$

If the MD exceeds the threshold λ, the track is not selected to be taken into account in the second phase.

• Taking into account context information. The idea is to establish a priori spatial context information by which tracking measurements that make no sense (spatial tracking restrictions) are ruled out. For example, if a surveillance system is tracking a person inside an office, spatial context information could be the positions of the desks in the office.

Algorithm Phase 1: Select consistent tracks of each camera

    SelectConsistentTracks({S}, {T}, {F})
      for each S_i ∈ {S}
        for each common target T_j ∈ {T}
          M̂^{S_i}_{T_j}[n] ← CalculateMean(S_i, T_j, X̂^{S_i}_{T_j}[n])
      Initialize fusion set: {S^F} ← {S}
      for each S_i ∈ {S}
        for each common target T_j ∈ {T}
          if MD(X̂^{S_i}_{T_j}[n], M̂^{S_i}_{T_j}[n]) > λ or IsOutOfContext(X̂^{S_i}_{T_j}[n])
            {S^F} ← {S^F} − S_i
      for each S_i ∈ {S^F}
        for each common target T_j ∈ {T}
          M̂^{S_i}_{T_j}[n] ← CalculateMean(S_i, T_j, X̂^{S_i}_{T_j}[n])
      if {S^F} = ∅
        {S^F} ← take S_i with Min(P̂^{S_i}_{T_j}[n])
      return({S^F})
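The following Python sketch illustrates the Phase 1 selection step for a single target. The data structures, the out-of-context stand-in, the scalar ordering of covariance matrices by their trace and the gating threshold λ are our own assumptions for illustration, not the authors' implementation.

    import numpy as np

    def mahalanobis(x, mean, cov):
        """Squared Mahalanobis distance between a track feature vector and the
        mean of all candidate tracks, as in equation (2)."""
        d = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
        return float(d @ np.linalg.inv(cov) @ d)

    def select_consistent_tracks(tracks, covariances, lam, out_of_context=lambda x: False):
        """Phase 1: keep the sensors whose track for this target is consistent.

        tracks      : dict sensor_id -> feature vector (e.g. ground-plane position)
        covariances : dict sensor_id -> error covariance matrix of that track
        lam         : gating threshold on the Mahalanobis distance
        Returns a non-empty set of consistent sensor ids (falling back to the
        sensor with the smallest covariance, as in the pseudocode above).
        """
        mean = np.mean(list(tracks.values()), axis=0)
        consistent = {
            s for s, x in tracks.items()
            if mahalanobis(x, mean, covariances[s]) <= lam and not out_of_context(x)
        }
        if not consistent:  # every track was ruled out: keep the most reliable sensor
            consistent = {min(covariances, key=lambda s: np.trace(covariances[s]))}
        return consistent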

B. Phase 2. Fusion between consistent tracks

Once the consistent tracks are selected, the data fusion is performed according to the reliability of each track. We take a simple fusion approach based on weighting each source of information according to its level of confidence, α^{S_i}_{T_j}[n]. We therefore need to calculate the level of confidence α^{S_i}_{T_j}[n] for each target T_j and for all surveillance-sensor agents S_i in the consistent set {S^F}.

Algorithm: Calculate weighted values

    CalculateWeightValues({S^F}, {T}, {F})
      for each consistent camera S_i ∈ {S^F}
        for each common target T_j ∈ {T}
          α^{S_i}_{T_j}[n] ← (#{S^F})^{-1} · (P̂^{S_i}_{T_j}[n])^{-1} + h_{S_i}
          α^{S_i}_{T_j}Norm[n] ← Normalize(α^{S_i}_{T_j}[n])
      return(α^{S_i}_{T_j}Norm[n])

In the previous step we calculate the level of confidence for each consistent camera and for each common target. This value is based on the inverse covariance of each sensor and target, plus a heuristic value h_{S_i} per sensor that is set by a human operator of the surveillance system. The Normalize() function then rescales the values in order to satisfy equation (3).

Figure 4. Geometric patterns of the floor used for calibration purposes

$$\forall T_j \in \{T\}: \quad 1 = \sum_{i=1}^{\#\{S^F\}} \alpha^{S_i}_{T_j}[n] \qquad (3)$$

The vector obtained is used in the second phase of the algorithm.

Algorithm Phase 2: Fusion between consistent tracks

    FusionConsistentTracks({S^F}, {T}, {F}, α^{S_i}_{T_j}[n])
      for each consistent camera S_i ∈ {S^F}
        for each common target T_j ∈ {T}
          X̂F_{T_j}[n] ← X̂F_{T_j}[n] + α^{S_i}_{T_j}[n] · X̂^{S_i}_{T_j}[n]
      return(X̂F_{T_j}[n])
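The confidence computation and the weighted combination can be sketched in Python as below. This is a hedged illustration under our own assumptions about details the paper leaves open: each inverse covariance matrix is reduced to a scalar through its trace, and the heuristic terms h_{S_i} are operator-supplied constants; the example values are hypothetical.

    import numpy as np

    def fuse_consistent_tracks(tracks, covariances, heuristics):
        """Phase 2: fuse the tracks of the consistent sensors for one target.

        tracks      : dict sensor_id -> feature vector on the ground plane
        covariances : dict sensor_id -> error covariance matrix
        heuristics  : dict sensor_id -> operator-defined confidence term h_Si
        """
        n = len(tracks)
        # Raw confidence per sensor: inverse-covariance magnitude plus the heuristic term.
        alpha = {
            s: (1.0 / n) * np.trace(np.linalg.inv(covariances[s])) + heuristics[s]
            for s in tracks
        }
        # Normalisation so that the weights sum to one, as required by equation (3).
        total = sum(alpha.values())
        alpha = {s: a / total for s, a in alpha.items()}
        # Weighted combination of the consistent estimates.
        fused = sum(alpha[s] * np.asarray(tracks[s], dtype=float) for s in tracks)
        return fused, alpha

    # Example with three hypothetical cameras tracking the same person.
    tracks = {"C1": np.array([2.1, 4.0]), "C3": np.array([2.0, 3.9]), "C4": np.array([2.3, 4.4])}
    covs = {s: np.eye(2) * v for s, v in {"C1": 0.20, "C3": 0.05, "C4": 0.40}.items()}
    h = {"C1": 0.0, "C3": 0.0, "C4": 0.0}
    print(fuse_consistent_tracks(tracks, covs, h))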

With the previous algorithm we obtain the fused values of each target feature from all the consistent sensors.

IV. EXPERIMENTS AND PRELIMINARY RESULTS

We evaluate the proposed fusion algorithm using the open computer vision data set PETS 2006 [21] and our surveillance system implementation, which is based on the well-known Open Computer Vision (OpenCV) library [24]. Many algorithms [27] [28] [29] [30] have been evaluated using the PETS database, and we think that using well-known data sets is a good approach to evaluate data fusion algorithms. All PETS 2006 sequences have PAL standard resolution (768 x 576 pixels, 25 frames per second) and are compressed as JPEG image sequences. The input images for the experiments are frames 0 to 199 of cameras C1 (Canon MV-1 1xCCD w/progressive scan), C3 (Canon MV-1 1xCCD w/progressive scan) and C4 (Sony DCR-PC1000E 3xCMOS). We do not consider the images of camera C2 (Sony DCR-PC1000E 3xCMOS) because they have poor image quality due to its location. These videos were taken in a real-world public setting, a railway station. The calibration data for each individual camera was given and computed from specific point locations taken from the geometric patterns on the floor of the station (see Figure 4). Figure 5 shows the tracking trajectories for each local camera. Camera 1 (on the left) shows an unstable tracking; camera 3 (in the center) presents the best local tracking results

due to its location; camera 4 (on the right) presents a stable but imprecise tracking due to the shadow of the tracked person. Figure 6 shows the foreground and blob detection for each local tracking trajectory. Figures 7, 8 and 9 show the global trajectory positions of camera 1, camera 3 and camera 4, that is, the ground-plane coordinates (x, y) of the tracked object after applying the Tsai transformation [23]. As we can see in figure 7 (trajectories of camera 1), the trajectory positions are scattered due to the tracking problems. On the other hand, figure 8 (trajectories of camera 3) presents continuous and stable trajectory positions for the same frames. We can see another tracking problem in figure 9, where the tracking is affected by a shadow. Therefore, the fusion process deals with three different types of sources.

Figure 7. Global trajectory positions of camera 1 (Frames: 89-199).

In figure 10, we show the difference between the mean position values of the three cameras, ((x_{C1}, y_{C1}) + (x_{C3}, y_{C3}) + (x_{C4}, y_{C4}))/3, and the tracking results obtained with the proposed fusion algorithm. The figure shows the improvement obtained by using the proposed fusion algorithm. In these experiments the algorithm detects an inconsistency between camera C4 and the rest of the cameras. Therefore, the fusion

Figure 5. Tracking trajectories in the ground plane of the same object from three different points of view (camera 1, camera 3 and camera 4 of PETS 2006 data set 1). From top to bottom, the frame numbers are, respectively, 91, 140 and 199.

Figure 8. Global trajectory positions of camera 3 (Frames: 89-199).

algorithm only takes into account the trajectory positions from camera C1 and camera C3. In these preliminary results we only use position features and we do not take context information into account.

Figure 9. Global trajectory positions of camera 4 (Frames: 89-199).

V. CONCLUSIONS AND FUTURE WORK

In this research we tackled the fusion of data from multiple cameras in a visual sensor multi-agent architecture. There are several approaches in the literature which deal with data fusion, although they present some specific problems when applied to distributed vision systems. Our approach uses a visual sensor multi-agent architecture and a robust data fusion process carried out by a fusion agent. Our data fusion algorithm is a two-phase algorithm. In the first phase we detect inconsistent tracks, using normalized residuals between them and context information. This makes it possible to eliminate inconsistent tracks; we then fuse each source according to its reliability.

Our method has been tested on the PETS 2006 data set [21]. We have shown preliminary results of the fusion algorithm tested with three different sequences of images acquired from three different cameras of the PETS 2006 data set. The preliminary experimental results show the robustness of the algorithm. In future work, we will exploit context information in the experiments and apply the algorithm in real-time surveillance systems.

Figure 6. Foreground of the tracking trajectories in the ground plane of the same object from three different points of view (camera 1, camera 3 and camera 4 of PETS 2006 data set 1). From top to bottom, the frame numbers are, respectively, 91, 140 and 199.

Figure 10. Comparison between mean values and fusion algorithm values (Frames: 89-199).

ACKNOWLEDGMENT

The authors are supported by projects CICYT TSI2005-07344, CICYT TEC2005-07186 and CAM MADRINET S-0505/TIC/0255.

REFERENCES

[1] J. Manyika and H. Durrant-Whyte. Data Fusion and Sensor Management: A Decentralized Information-Theoretic Approach. Ellis Horwood, 1994.
[2] E. Waltz and J. Llinas. Multisensor Data Fusion. Artech House Inc., Norwood, Massachusetts, USA, 1990.
[3] D.L. Hall and J. Llinas. Handbook of Multisensor Data Fusion. CRC Press, Boca Raton, 2001.
[4] D. Hall. Mathematical Techniques in Multisensor Data Fusion. Artech House, 1992.
[5] G.W. Ng. Intelligent Systems: Fusion, Tracking and Control. Research Studies Press, 2003. ISBN: 086380277X.
[6] L. Marchessoti, G. Vernazza and C. Regazzoni. A multicamera fusion framework for multiple occluding objects tracking in intelligent monitoring and sport viewing applications. IEEE International Conference on Image Processing, 2004, pp. 1033-1036.
[7] S. Mavandadi and P. Aarabi. Multi-sensor Information Fusion with Applications to Multi-Camera Systems. Proceedings of the 2004 IEEE International Conference on Systems, Man and Cybernetics, October 2004, The Hague, Netherlands, pp. 1267-1271.
[8] L.Y. Pao. Distributed Multisensor Fusion. Am. Inst. of Aeronautics and Astronautics, 1994.
[9] Gerhard Weiss. Multiagent Systems: A Modern Approach to Distributed Artificial Intelligence. The MIT Press, 1999.
[10] L.Y. Pao and M. Kalandros. Algorithms for Distributed Architecture Tracking. Proc. Am. Control Conference, June 1997.
[11] A.P. Dempster. A Generalisation of Bayesian Inference. J. Royal Statistical Soc., vol. 30, pp. 205-247, 1968.
[12] G. Shafer. A Mathematical Theory of Evidence. Princeton Univ. Press, 1976.
[13] S.J. Julier and J.K. Uhlmann. A Non-Divergent Algorithm in the Presence of Unknown Correlation. Proc. Am. Control Conference, June 1997.
[14] J.K. Uhlmann. Covariance Consistency Methods for Fault Tolerant Distributed Data Fusion. Information Fusion, vol. 4, pp. 201-215, 2003.
[15] L. Snidaro, R. Niu, P.K. Varshney and G.L. Foresti. Sensor fusion for video surveillance. Proc. of the Seventh International Conference on Information Fusion, Stockholm, Sweden, 2004, pp. 739-746.
[16] L. Snidaro, R. Niu, P.K. Varshney and G.L. Foresti. Automatic Camera Selection and Fusion for Outdoor Surveillance under Changing Weather Conditions. Proc. of the IEEE Conference on Advanced Video and Signal Based Surveillance, 2003.
[17] J. Garcia, J.M. Molina, J.A. Besada and J.I. Portillo. A multitarget tracking video system based on fuzzy and neuro-fuzzy techniques. EURASIP Journal on Applied Signal Processing, volume 14, 2007, pp. 2341-2358.
[18] F. Castanedo, M.A. Patricio, J. García and J.M. Molina. Extending Surveillance Systems Capabilities Using BDI Cooperative Sensor Agents. Proc. of the 4th ACM International Workshop on Video Surveillance and Sensor Networks, Santa Barbara, California, USA, 2006, pp. 131-138.
[19] F. Castanedo, M.A. Patricio, J. García and J.M. Molina. Coalition of Surveillance Agents: Cooperative Fusion Improvement in Surveillance Systems. Proc. of the 1st International Workshop on Agent-Based Ubiquitous Computing, Honolulu, Hawaii, 2007.
[20] M.A. Patricio, J. Carbó, O. Pérez and J. García. Multi-Agent Framework in Visual Sensor Networks. EURASIP Journal on Advances in Signal Processing, volume 2007.
[21] http://pets2006.net/
[22] Network Time Protocol. http://www.ntp.org
[23] R.Y. Tsai. An Efficient and Accurate Camera Calibration Technique for 3D Machine Vision. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Miami Beach, FL, pp. 364-374, 1986.
[24] http://sourceforge.net/projects/opencvlibrary
[25] S. Khan and M. Shah. Consistent Labeling of Tracked Objects in Multiple Cameras with Overlapping Fields of View. IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 25, no. 10, pp. 1355-1360, Oct. 2003.
[26] P. Mahalanobis. On the generalized distance in statistics. Proc. Natl. Inst. Sci., vol. 12, pp. 49-55, 1936.
[27] S. Khan, O. Javed, and M. Shah. Tracking in Uncalibrated Cameras with Overlapping Fields of View. Proc. IEEE Int'l Workshop on Performance Evaluation of Tracking and Surveillance, pp. 84-91, Dec. 2001.
[28] J. Black and T. Ellis. Multi-Camera Image Tracking. Proc. IEEE Int'l Workshop on Performance Evaluation of Tracking and Surveillance, pp. 68-75, Dec. 2001.
[29] Q. Zhou and J.K. Aggarwal. Tracking and Classifying Moving Objects from Video. Proc. IEEE Int'l Workshop on Performance Evaluation of Tracking and Surveillance, pp. 52-29, Dec. 2001.
[30] L.M. Fuentes and S.A. Velastin. People Tracking in Surveillance Applications. Proc. IEEE Int'l Workshop on Performance Evaluation of Tracking and Surveillance, pp. 20-27, Dec. 2001.
