Bottom-Up/Top-Down Coordination in a MultiAgent Visual Sensor Network∗

F. Castanedo, M.A. Patricio, J. García and J.M. Molina
University Carlos III of Madrid, Computer Science Department
Applied Artificial Intelligence Group
Avda. Universidad Carlos III 22, 28270 Colmenarejo (Madrid)
{fcastane, mpatrici, jgherrer}@inf.uc3m.es, [email protected]
Abstract
In this paper an approach for multi-sensor coordination in a multiagent visual sensor network is presented. A Belief-Desire-Intention (BDI) model of multiagent systems is employed. In this multiagent system, the interactions between several surveillance-sensor agents and their respective fusion agent are discussed. The surveillance process is improved using a bottom-up/top-down coordination approach in which a fusion agent controls the coordination process. In the bottom-up phase, tracking information is sent to the fusion agent; in the top-down stage, feedback messages are sent to those surveillance-sensor agents whose tracking process is inconsistent with the global fused tracking. This feedback information allows the surveillance-sensor agent to correct its tracking process. Finally, preliminary experiments with the PETS 2006 database are presented.
1. Introduction

A multiagent visual sensor network is a distributed network of several intelligent software agents with visual capabilities [1]. An intelligent software agent is a computational process that has several characteristics [2]: (1) "reactivity" (allowing agents to perceive and respond to a changing environment), (2) "social ability" (by which agents interact with other agents) and (3) "proactiveness" (through which agents behave in a goal-directed fashion). Wooldridge and Jennings give a strong notion of agent which also uses mental components such as beliefs, desires and intentions (BDI). The BDI model is one of the best known and most studied models of practical reasoning [3]. It is based on a philosophical model of human practical reasoning, originally developed by M. Bratman [4], and reduces the explanation of complex human behavior to a motivational stance [5]. This means that the causes of actions are always related to human desires, ignoring other facets of human motivation to act. Finally, it also uses, in a consistent way, psychological concepts that closely correspond to the terms humans often use to explain their behavior.

In a visual sensor network, the integration of the results obtained from multiple visual sensors can provide more accurate information than a single visual sensor [6][7]. This allows, for example, improved tracking accuracy in a surveillance system. However, data fusion must be performed with due care because, even though multiple visual sensors provide more information about the same object, this information could be mutually inconsistent. Many factors can produce inconsistent or wrong information in a visual sensor network while objects are being tracked. First, the tracked object could be affected by shadow [8], which may be caused by external conditions. Second, external conditions could affect the accuracy of the tracking process; for example, changes in illumination conditions or a sudden increase in wind velocity affect the foreground detector and therefore the global tracking process. Another problem that a multiagent visual sensor network must take into account is the partial occlusion of the objects being tracked [1].
In our proposed visual sensor system, the data fusion process is carried out by a fusion agent in the multiagent visual sensor network. The fusion agent informs each surveillance-sensor agent that is performing an inconsistent tracking. The main objective of our approach is to coordinate the network of visual sensors, and the fusion agent is the manager of this coordination process. This paper focuses on the interactions between several surveillance-sensor agents and their respective fusion agent in order to solve the specific problems of inconsistencies. In the next section the related work on multiagent systems in visual sensor networks is reviewed and our multiagent approach is presented. We then explain the bottom-up/top-down coordination in a multiagent visual sensor network, present experimental results of the proposed method and, finally, draw the conclusions of this research.
∗ Funded by projects Ministerio de Fomento (SINPROB), CICYT TEC2005-07186 and CAM MADRINET S-0505/TIC/0255
2. MultiAgent Visual Sensor Networks
There are a few works in the literature related to multiagent frameworks applied to surveillance systems. In [9] the authors presented a flexible and opportunistic agent-orientated framework, where agents are created when required, with the objective of constructing an automatic visual surveillance system. In another interesting work [10] a multiagent framework for visual surveillance is proposed; in their architecture, one object agent is created per target in the scene. A different framework approach [1] is to use one agent per sensor: each camera maintains the information of the object of interest in its Field-of-View (FoV) as an agent belief. An approach to recognize events using a multiagent system and complex temporal descriptions is presented in [11]. In [12] the multiagent paradigm is used in order to achieve scalable designs and cooperative behaviours; they use Hidden Markov Models for learning the scene.

Our multiagent visual sensor network is composed of several types of intelligent agents working together in order to achieve a common goal: a coherent surveillance among all the surveillance-sensor agents involved. The details of the multiagent visual sensor network architecture are described formally and more extensively in [1, 13, 14]. This paper focuses only on the interactions between these two types of agents:

1. surveillance-sensor agent: it tracks all the targets moving within its local field of view and sends data to its related fusion agent. It is coordinated with other surveillance-sensor agents in order to improve surveillance quality.

2. fusion agent: it integrates the information sent by the associated surveillance-sensor agents. It analyzes the situation in order to manage the resources and to coordinate the associated surveillance-sensor agents during the fusion stage.

As explained in [13], the surveillance-sensor agent's Desires capture the motivation of the agent. The final goal of each surveillance-sensor agent is the permanent surveillance of its environment. An overview of the implementation of this Desire is depicted in Figure 1. The surveillance desire implementation is arranged in a pipe-line structure of several modules; it directly interfaces with the image stream coming from a camera and extracts the track information of the mobile objects in the current frame. The interface between adjacent modules is symbolic data, and it is set up so that different algorithms are interchangeable for each module. The main modules of the surveillance desire implementation are: (1) a detector process of moving objects; (2) an association process; (3) a prediction process; (4) a blob deleter; (5) a track updater.

Figure 1: Tracking System Architecture.

The detector process (1) of moving objects gives a list of the blobs found in a frame; this list contains information about the position and size of each blob. Within the tracking process, and using the list of blobs obtained by the previous module, the association process (2) solves the problem of blob-to-track multi-assignment, where several (or no) blobs may be assigned to the same track while several tracks may simultaneously overlap and share common blobs. The association problem to solve is therefore the decision of the most appropriate grouping of blobs and their assignment to each track for each processed frame. The prediction process (3) uses the association made by the tracking process and predicts where each track will move during the next frame; this prediction is used by the tracking process in order to make the association. The blob deleter (4) module eliminates those blobs that have not been associated to any track, as they are considered to be noise. The last main module, the track updater (5), updates the tracks obtained in the last frame with the information obtained from the previous modules for the current frame.
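As an illustration only, the pipe-line could be wired as in the following minimal sketch. The interface and type names are illustrative assumptions; only the five stages are fixed, and different algorithms remain interchangeable behind each one:

    // Illustrative sketch of the surveillance desire pipe-line. Type and method
    // names are assumptions; only the five stages come from the text above.
    import java.util.ArrayList;
    import java.util.List;

    class Frame {}
    class Blob {}
    class Track {}

    interface Detector     { List<Blob> detect(Frame f); }                           // (1)
    interface Associator   { void associate(List<Blob> blobs, List<Track> tracks); } // (2)
    interface Predictor    { void predictNextFrame(List<Track> tracks); }            // (3)
    interface BlobDeleter  { void removeUnassociated(List<Blob> blobs); }            // (4)
    interface TrackUpdater { void update(List<Track> tracks, List<Blob> blobs); }    // (5)

    final class TrackingPipeline {
        private final Detector detector;
        private final Associator associator;
        private final Predictor predictor;
        private final BlobDeleter blobDeleter;
        private final TrackUpdater trackUpdater;
        private final List<Track> tracks = new ArrayList<>();

        TrackingPipeline(Detector d, Associator a, Predictor p,
                         BlobDeleter bd, TrackUpdater tu) {
            detector = d; associator = a; predictor = p;
            blobDeleter = bd; trackUpdater = tu;
        }

        // One pipe-line iteration for the current frame.
        void processFrame(Frame frame) {
            List<Blob> blobs = detector.detect(frame); // (1) blobs of moving objects
            associator.associate(blobs, tracks);       // (2) blob-to-track assignment
            blobDeleter.removeUnassociated(blobs);     // (4) unassociated blobs are noise
            trackUpdater.update(tracks, blobs);        // (5) refresh each track's state
            predictor.predictNextFrame(tracks);        // (3) prediction used by (2) next frame
        }
    }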
3. Bottom-Up/Top-Down Coordination Approach
The visual sensor network is composed of surveillance-sensor and fusion agents. Let n be the number of surveillance-sensor agents in the multiagent visual sensor network and S = {S_1, S_2, ..., S_n} the set of autonomous surveillance-sensor agents. Let m be the number of fusion agents and F = {F_1, F_2, ..., F_m} the set of autonomous fusion agents. Each surveillance-sensor agent S_i has a specific fusion agent F_j to which it sends tracking information; for example, a fusion agent F_1 can fuse the information received from the surveillance-sensor agents S_1, S_2 and S_3. Each surveillance-sensor agent S_i acquires images I(x, y) at a certain frame rate V_i. The internal tracking process of each surveillance-sensor agent provides, for each detected target T_j, an associated track vector of features \hat{X}^{S_i}_{T_j}[n], containing the numeric description of its features and state (location, velocity, dimensions, etc.), and an associated error covariance matrix \hat{P}^{S_i}_{T_j}[n]. The clocks of the different computer machines where each surveillance-sensor agent runs must be synchronized; the Network Time Protocol (NTP) is used as an external clock source to stabilize the local clock of each machine. The fusion agent, which is in charge of performing the data fusion, needs to receive the information on the objects being tracked by each surveillance-sensor agent. This process is related to the social ability of the agents in multiagent systems.
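For illustration, the per-target estimate that travels through the network can be represented as a plain container holding the feature vector, its covariance matrix and an NTP-synchronized timestamp. This is a minimal sketch; the class and field names are assumptions, not the actual implementation:

    // Minimal sketch of the track estimate \hat{X}^{S_i}_{T_j}[n] together with
    // its covariance \hat{P}^{S_i}_{T_j}[n]. Names are illustrative assumptions.
    public final class TrackEstimate {
        public final int targetId;          // index j of target T_j
        public final int sensorAgentId;     // index i of agent S_i
        public final long timestampMillis;  // NTP-synchronized acquisition time
        public final double[] state;        // location, velocity, dimensions, ...
        public final double[][] covariance; // error covariance matrix

        public TrackEstimate(int targetId, int sensorAgentId, long timestampMillis,
                             double[] state, double[][] covariance) {
            this.targetId = targetId;
            this.sensorAgentId = sensorAgentId;
            this.timestampMillis = timestampMillis;
            this.state = state;
            this.covariance = covariance;
        }
    }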
Figure 2: Bottom-Up process in the data fusion protocol.

Figure 3: Top-Down process in the data fusion protocol.
3.1 Bottom-Up Phase
In the bottom-up phase (see Figure 2) each surveillance-sensor agent sends information about the tracked objects to the fusion agent. The fusion agent analyzes the information received from all the surveillance-sensor agents and then, in the top-down stage (see Figure 3), sends quality feedback about the tracking process. This feedback information is sent to each surveillance-sensor agent involved in the data fusion process whose track vector is clearly inconsistent with the information vector fused by the fusion agent. The main objective of this two-stage data fusion protocol is to maintain a coherent network among the agents involved in the surveillance process, and also to automatically correct tracking inconsistencies for the same object. At each instant of time t, each surveillance-sensor agent S_i sends information about the tracked objects to its respective fusion agent F_j. For example, if the surveillance-sensor agent S_i at time t is tracking two objects (T_1 and T_2), it sends the following message to the fusion agent F_j:

SendTargetInfo^{F_j}_{S_i}(\hat{X}_{T_1}[n], \hat{X}_{T_2}[n])
Therefore, at time instant t, the fusion agent F_j has the information on the objects T_1 and T_2 detected by the surveillance-sensor agent S_i. A different message involving the same two objects is sent at time t from the surveillance-sensor agent S_k to the same fusion agent F_j:

SendTargetInfo^{F_j}_{S_k}(\hat{X}_{T_1}[n], \hat{X}_{T_2}[n])

It is also possible that a different surveillance-sensor agent S_l sends a message at time t involving only one object (T_1):

SendTargetInfo^{F_j}_{S_l}(\hat{X}_{T_1}[n])

Since all the messages are asynchronous and non-blocking, each agent continues its processing after they are sent. The fusion agent updates the information of all the objects as soon as it processes the received messages.
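As a purely illustrative sketch (the concrete FIPA-ACL encoding is omitted here, so this payload representation is an assumption), the content of a SendTargetInfo() message could be modelled as follows, reusing the TrackEstimate container sketched above:

    // Illustrative payload of a SendTargetInfo message; the concrete FIPA-ACL
    // encoding is not specified, so this representation is an assumption.
    import java.io.Serializable;
    import java.util.List;

    final class SendTargetInfoMessage implements Serializable {
        final int senderAgentId;           // surveillance-sensor agent S_i (or S_k, S_l)
        final int fusionAgentId;           // destination fusion agent F_j
        final long timestampMillis;        // NTP-synchronized sending time t
        final List<TrackEstimate> targets; // one estimate per tracked object

        SendTargetInfoMessage(int senderAgentId, int fusionAgentId,
                              long timestampMillis, List<TrackEstimate> targets) {
            this.senderAgentId = senderAgentId;
            this.fusionAgentId = fusionAgentId;
            this.timestampMillis = timestampMillis;
            this.targets = targets;
        }
    }

Because the messages are non-blocking, the sender never waits for an acknowledgement; the fusion agent simply consumes each message as it arrives.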
3.2 Inconsistent Tracking Detection
For each detected object the fusion agent maintains a tracking history. At time t it updates the information received for each object in a simple way, by taking the mean of the estimates provided by all the surveillance-sensor agents. The fusion agent also sends feedback information (every t * β steps) about the global quality of the tracking process to each surveillance-sensor agent that is performing an inconsistent tracking. An inconsistent tracking from a surveillance-sensor agent is detected using a disparity measure, a comparison between the local and the fused information over the last t * β steps:

\frac{\sum_{t=1}^{t \cdot \beta} \left( \hat{X}^{S_i}_{T_j}[n] - \hat{X}^{F_l}_{T_j}[n] \right)}{t \cdot \beta} \geq K \qquad (1)

where K and β are application-dependent constants.
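The following sketch transcribes Eq. (1) directly, together with the simple fusion-by-mean update mentioned above. Since the text does not state how the vector difference is reduced to a scalar, summarizing each deviation by its Euclidean norm is an assumption:

    import java.util.List;

    // Sketch of the disparity test of Eq. (1): the mean deviation between the
    // local and the fused track over the last t*beta steps, compared against K.
    // Reducing each vector difference with the Euclidean norm is an assumption.
    final class InconsistencyDetector {
        private final double k;   // threshold K, application dependent
        private final int window; // number of steps t*beta

        InconsistencyDetector(double k, int window) {
            this.k = k;
            this.window = window;
        }

        // localStates[s] and fusedStates[s]: state vectors at step s for the
        // last 'window' steps.
        boolean isInconsistent(double[][] localStates, double[][] fusedStates) {
            double sum = 0.0;
            for (int s = 0; s < window; s++) {
                double sq = 0.0;
                for (int d = 0; d < localStates[s].length; d++) {
                    double diff = localStates[s][d] - fusedStates[s][d];
                    sq += diff * diff;
                }
                sum += Math.sqrt(sq);   // deviation at step s
            }
            return (sum / window) >= k; // mean deviation against threshold K
        }

        // The fused estimate itself is, as stated above, a simple mean of the
        // state vectors received from all surveillance-sensor agents.
        static double[] fuseByMean(List<double[]> estimates) {
            double[] fused = new double[estimates.get(0).length];
            for (double[] e : estimates)
                for (int d = 0; d < fused.length; d++) fused[d] += e[d];
            for (int d = 0; d < fused.length; d++) fused[d] /= estimates.size();
            return fused;
        }
    }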
3.3 Top-Down Phase
If the disparity measure of a surveillance-sensor agent is significant, a FeedbackTargetInfo() message is sent to that surveillance-sensor agent in the top-down phase. The feedback information is provided less frequently than the SendTargetInfo() messages, in order to avoid network congestion. So, if a SendTargetInfo() message is sent at every time t, a FeedbackTargetInfo() message may be received every t * β steps. Like K, β is an empirically established constant. The FeedbackTargetInfo() message allows the surveillance-sensor agent to correct its tracking, thanks to its knowledge of how well the tracking is being performed by the other surveillance-sensor agents involved in the surveillance process. Let us suppose that, in the previous example, the surveillance-sensor agent S_l is performing a bad tracking of the object T_1; the fusion agent F_j then sends a FeedbackTargetInfo() message with the global tracking information to the surveillance-sensor agent S_l:

FeedbackTargetInfo^{S_l}_{F_j}(\hat{X}_{T_1}[n])

When the surveillance-sensor agent S_l receives the FeedbackTargetInfo() message, a reasoning process must take place in order to determine the problem affecting its tracking process.
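A minimal sketch of the fusion agent's side of this phase, reusing the InconsistencyDetector sketched above; the method names and the scheduling details are illustrative assumptions:

    // Sketch of the top-down feedback decision: every beta-th fused update the
    // fusion agent checks each sensor agent's disparity and, if it exceeds K,
    // sends the fused track back. Names and scheduling are assumptions.
    final class FusionFeedbackLoop {
        private final InconsistencyDetector detector;
        private final int beta; // feedback period relative to SendTargetInfo
        private int step = 0;

        FusionFeedbackLoop(InconsistencyDetector detector, int beta) {
            this.detector = detector;
            this.beta = beta;
        }

        void onFusedUpdate(int sensorAgentId,
                           double[][] localHistory, double[][] fusedHistory,
                           double[] fusedState) {
            step++;
            if (step % beta != 0) return; // feedback is sent less often than tracking data
            if (detector.isInconsistent(localHistory, fusedHistory)) {
                sendFeedbackTargetInfo(sensorAgentId, fusedState);
            }
        }

        // Placeholder for the actual FIPA-ACL FeedbackTargetInfo delivery.
        private void sendFeedbackTargetInfo(int sensorAgentId, double[] fusedState) {
            System.out.printf("FeedbackTargetInfo -> S%d (%d state components)%n",
                              sensorAgentId, fusedState.length);
        }
    }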
4. Experimental Results
In this section we show preliminary experimental results, in which we do not consider the automatic adjustment of the surveillance-sensor agents. Although many ad-hoc implementations of the BDI abstract interpreter proposed by Rao and Georgeff have been developed, the recently released JADEX [15] is gaining increasing acceptance, so we selected it for the implementation of the multiagent visual sensor network. Jadex facilitates FIPA-ACL communication between agents; therefore, each surveillance-sensor agent and the fusion agent is a Jadex agent.

For these experiments we selected images from the open computer vision data set PETS 2006. All PETS 2006 sequences are PAL standard (768 x 576 pixels, 25 frames per second) and compressed as JPEG image sequences. The input images for the experiments are frames 0 to 199 of the cameras C1 (Canon MV-1, 1xCCD w/progressive scan), C3 (Canon MV-1, 1xCCD w/progressive scan) and C4 (Sony DCR-PC1000E, 3xCMOS). We do not consider the images of camera C2 (Sony DCR-PC1000E, 3xCMOS) because they have poor image quality due to the camera's location. These videos were taken in a real-world public setting, a railway station. The calibration data for each individual camera are given and were computed from specific point locations taken from the geometric patterns on the floor of the station (see Figure 4). Consequently, we have three surveillance-sensor agents (S_1, S_2 and S_3), each of which controls the images from one camera, and a single fusion agent (F_1).

Each surveillance-sensor agent performs a local tracking on the ground plane of the scene. The results of this local tracking (at frame 160) applied to the woman with a black skirt are shown in Figures 5, 6 and 7. The surveillance-sensor agent S_1 (camera 1) tracking results are presented in Figure 8; the internal process of this agent detects the object of interest at frame 28. On the other hand, the surveillance-sensor agent S_2 (camera 3) results are presented in Figure 9; in this case the tracker detects the object at frame 76, but presents a more stable tracking due to a good foreground/background detection. In the case of camera 4, the surveillance-sensor agent S_3 starts the tracking process at frame 89. This tracking presents a problem related to shadow, which causes a bigger blob and affects the tracking process.

Figure 4: Geometric patterns of the floor used for calibration purposes. All the data provided by each surveillance-sensor agent use this system of reference.

Figure 5: S1 (camera 1) local tracking at frame 160.
Figure 6: S2 (camera 3) local tracking at frame 160.

Figure 7: S3 (camera 4) local tracking at frame 160.
The fusion agent fuses the information received from each surveillance-sensor agent. The fused tracking results are shown in Figure 12. An unstable tracking occurs in the first frames (from frame 28 to 76) because the fusion agent receives information from only one surveillance-sensor agent. The fused tracking results between frames 76 and 199 are obtained using only the information received from the surveillance-sensor agent S_1 (camera 1) and the surveillance-sensor agent S_2 (camera 3). The information received from the surveillance-sensor agent S_3 is ruled out of the global tracking because the foreground detector module of the surveillance-sensor agent S_3 (camera 4) performs a bad detection. In Figure 11 we can see how the shadow mixes up the tracking process.

Figure 8: S1 (camera 1) local tracking (frames 28-199).

Figure 9: S2 (camera 3) local tracking (frames 76-199).

Figure 10: S3 (camera 4) local tracking (frames 89-199).

Figure 11: (camera 3) Foreground detection result at frame 160.

Figure 12: Fused tracking (frames 28-199).
5. Conclusions
In this paper a coordination framework for a multiagent visual sensor network has been presented. We discussed the interactions between the surveillance-sensor agents and their respective fusion agent. These interactions take place using a bottom-up/top-down approach. In the bottom-up phase the involved surveillance-sensor agents send local tracking information. Then, if the fusion agent detects inconsistencies between the tracking process of a surveillance-sensor agent and the fused tracking, a feedback message is sent. This message allows the surveillance-sensor agent to correct its tracking process, which happens when the surveillance-sensor agent deviates in its tracking with respect to the fused tracking. Preliminary results show how the tracking process is improved in the fusion agent. However, the internal reasoning process of the surveillance-sensor agent that allows it to change its tracking process is still an open challenge. We are currently investigating this internal reasoning process; the idea is to change specific parameters of the tracking algorithm that could be affected by external conditions.
References

[1] M. A. Patricio, J. Carbó, O. Pérez, J. García, and J. M. Molina. Multi-agent framework in visual sensor networks. EURASIP Journal on Advances in Signal Processing, vol. 2007, Article ID 98639, 21 pages, 2007. doi:10.1155/2007/98639.

[2] M. Wooldridge and N. Jennings. Intelligent agents: Theory and practice. The Knowledge Engineering Review, 1995.

[3] A. Rao and M. Georgeff. BDI agents: From theory to practice. In Proceedings of the First International Conference on Multi-Agent Systems (ICMAS'95), pages 312-319, Cambridge, MA, USA, 1995. The MIT Press.

[4] M. E. Bratman. Intentions, Plans and Practical Reasoning. Harvard University Press, Cambridge, Massachusetts, 1987.

[5] D. Dennett. The Intentional Stance. Bradford Books, 1987.

[6] E. Waltz and J. Llinas. Multisensor Data Fusion. Artech House Inc., Norwood, Massachusetts, USA, 1990.

[7] D. L. Hall and J. Llinas. Handbook of Multisensor Data Fusion. CRC Press, Boca Raton, 2001.

[8] O. Pérez, M. A. Patricio, J. García, and J. M. Molina. Improving the segmentation stage of a pedestrian tracking video-based system by means of evolution strategies. In 8th European Workshop on Evolutionary Computation in Image Analysis and Signal Processing (EvoIASP 2006), Budapest, Hungary, April 2006.

[9] P. Remagnino, T. Tan, and K. Baker. Agent orientated annotation in model based visual surveillance. In ICCV '98: Proceedings of the Sixth International Conference on Computer Vision, page 857, Washington, DC, USA, 1998. IEEE Computer Society.

[10] J. Orwell, S. Massey, P. Remagnino, D. Greenhill, and G. A. Jones. A multi-agent framework for visual surveillance. In ICIAP '99: Proceedings of the 10th International Conference on Image Analysis and Processing, page 1104, Washington, DC, USA, 1999. IEEE Computer Society.

[11] S. Hongeng and R. Nevatia. Multi-agent event recognition. In Proceedings of the Eighth International Conference on Computer Vision (ICCV).

[12] P. Remagnino, G. A. Jones, and N. Monekosso. Reasoning about dynamic scenes using autonomous agents. In AI*IA 2001: Advances in Artificial Intelligence, 7th Congress of the Italian Association for Artificial Intelligence.

[13] F. Castanedo, M. A. Patricio, J. García, and J. M. Molina. Extending surveillance systems capabilities using BDI cooperative sensor agents. In VSSN '06: Proceedings of the 4th ACM International Workshop on Video Surveillance and Sensor Networks, pages 131-138, New York, NY, USA, 2006. ACM Press.

[14] O. Pérez, M. A. Patricio, J. García, and J. M. Molina. Fusion of surveillance information for visual sensor networks. In Proceedings of the Ninth International Conference on Information Fusion, Florence, Italy, July 2006.

[15] A. Pokahr, L. Braubach, and W. Lamersdorf. Jadex: Implementing a BDI-infrastructure for JADE agents. EXP - In Search of Innovation, 3(3):76-85, 2003.