Dynamic Sensor Selection for Single Target Tracking in Large Video Surveillance Networks

Eduardo Monari and Kristian Kroschel
Fraunhofer Institute of Optronics, System Technologies and Image Exploitation
Fraunhoferstr. 1, D-76131 Karlsruhe, Germany
[email protected]

Abstract

In this paper an approach for dynamic camera selection in large video-based sensor networks for the purpose of multi-camera object tracking is presented. The sensor selection approach is based on computational geometry algorithms and is able to determine task-relevant cameras (a camera cluster) by evaluating geometrical attributes, given the last observed object position, the sensor configurations and a building map. A special goal of this algorithm is the efficient determination of the minimum number of sensors needed to relocate an object, even if the object is temporarily out of sight. In particular, the approach is applicable in camera networks with overlapping and non-overlapping fields of view as well as with static and non-static sensors.

1. Introduction

Multi-camera video surveillance and monitoring is one of the most active fields of development and research. Due to the increasing threat of crime, industrial espionage and even terrorism, video surveillance systems have become more and more important in recent years. Especially in large camera networks, there is an increasing need for high-quality automated surveillance methods. With a small number of sensors, an operator is able to monitor all video streams simultaneously and to switch focus between them (e.g., to keep a person in view); with a larger number of sensors, the object detection, tracking and camera selection tasks can no longer be performed manually by a human operator. In an automated video surveillance system these tasks have to be fulfilled by integrated intelligent modules, which provide already processed information about events or locations of objects and thereby enable the operator to manage more complex situations. However, while many research groups focus on multi-camera tracking, only few activities have addressed intelligent camera selection, even though in very large camera networks the accurate selection of task-related sensors is essential for the reduction of bandwidth and computational resources. For this reason, we have developed a deterministic algorithm which is able to determine the relevant subset of cameras needed for consistent observation of a single object (or for its recognition if the object is temporarily out of sight). Due to the highly efficient camera cluster calculation, the proposed approach is applicable to very large static and non-static camera networks with overlapping and non-overlapping fields of view. This paper is organized as follows: After a short overview of related work in Section 2, the basic idea of our approach is introduced in Section 3. Section 4 describes the geometric modelling of the sensors and the prior knowledge about the environment, before the sensor selection approach is introduced in Section 5. Finally, performance results obtained by evaluations on simulated data are shown and future work is presented.

2. Related Work

There are several algorithms in the literature for camera selection in sensor networks. Most of them are designed to select a subset of sensors out of a set of cameras which all have the object in range. Most approaches estimate the observation quality and select the cameras with the best quality coefficients [1, 2, 3, 4]. In [5] a look-up table is used for camera selection in large networks: depending on the object position, sensors pre-assigned to a specific location are selected. Cameras needed for recognition of the object after its disappearance are not considered by these approaches. Concerning this matter, mainly probabilistic methods have been investigated for learning so-called "transition graphs" of camera networks [6, 7, 8, 9]. The main drawback of these methods is the need for a learning period, in which correspondences between objects disappearing in one camera and reappearing in another have to be established. Especially in camera networks with non-static sensors, such a learning phase is not applicable in most cases.

Figure 1. The two sketches show two examples of typical camera networks with non-overlapping fields of view. By involving prior knowledge about the freedom of movement of the objects to observe (persons in a building, or cars on streets), it is easy to see that the relevant cameras for further tracking are given by the subset of cameras which surrounds the object locally (red cameras).

To overcome these drawbacks, a deterministic knowledge-based approach with no need for a learning phase has been proposed in [10]. This approach uses geometric knowledge about camera fields of view and a building map to set up an efficient data structure for calculating the sensors which enclose the object to observe. It has been shown that the approach can be implemented for video-real-time applications in large networks. However, the approach described in [10] has two major disadvantages: First, the algorithm is designed for networks with static cameras only. In particular, the calculation of the efficient data structure (which consists of field-of-view polygons and the environment model in 2D) is performed in a preprocessing step. For large non-static sensor networks this is no longer applicable, due to the required updates of the geometrical representation of the sensing ranges. Second, if an object disappears for a longer period of time, the proposed algorithm involves more and more cameras in the tracking task, trying to relocate the object of interest. This leads to an overload of the centralized object tracker. In this paper the approach proposed in [10] is enhanced to applications with non-static cameras. Furthermore, the algorithm has been modified to minimize the growth of the subset of cameras (the so-called camera cluster) involved in the multi-camera tracking task.

3. Basic Approach

Our sensor selection approach is designed for the task of single object tracking in a large camera network. For tracking of multiple objects, the system instantiates multiple single-target tracking processes (e.g., tracking agents), each associated with an object to observe. Because such (multiple) centralized tracking processes are not able to perform data association on all observations provided by all intelligent video sensors simultaneously, a dynamic sensor selection algorithm has been integrated into each tracking agent to assign the temporarily task-relevant sensors to the tracking process. The basic idea of the approach is to find the subset of cameras in the network in which the object has to reappear after disappearing due to gaps in the sensor coverage. In Fig. 1a, for example, for relocation of the blue person only the blue cameras are needed, while for relocation of the red person only the red ones are relevant. Similarly, for tracking a vehicle in a city using a large camera network, the number of relevant cameras is usually quite small, due to the known freedom of movement of the observed object and the knowledge about the sensor network (Fig. 1b). We call this dynamic assignment of relevant cameras to a single-target tracking process task-oriented sensor clustering. In the following sections we propose an efficient algorithm for solving this camera selection problem. The approach evaluates geometrical relations between the object state prediction, the building map and the camera visibility polygons in a highly efficient way using a dynamic arrangement of line segments (see Fig. 3), and determines those cameras which are relevant for the multi-camera tracking task. In doing so, data association is only performed on data from relevant cameras. Before introducing the developed algorithm, the prior static and dynamic knowledge and its modelling is described and some basic terms from computational geometry are introduced.

4. Knowledge and Object State Modelling

4.1. Modelling of Camera Sensing Ranges

For each camera or sensor in the network a dedicated video analysis module is assumed (e.g., a vision agent on a smart camera). Additionally, self-localization and self-calibration capabilities are assumed, which means that each camera is able to provide a polygonal field-of-view (FoV) in world coordinates. More precisely, for our camera selection approach the so-called visibility polygons are needed. While a FoV represents the maximum observable area defined by the optical attributes of a camera, a visibility polygon is the subset of the FoV which is directly observable, regarding environmental restrictions (e.g., occlusions by static objects, walls etc.) as well as application-related requirements (e.g., minimum resolution, region-of-interest). By means of the FoV parameters and the prior known building map, it is possible to calculate the visibility polygon.

Figure 2. For the proposed algorithm, it is essential to evaluate the sensing ranges of the sensors in the network. In particular, the subsets of the sensing ranges are needed in which the object of interest can be detected directly. In the case of cameras, this is given by the intersection of the field of view (yellow polygon) and the visibility area (red polygon). The resulting visibility polygon (blue area) is involved as dynamic knowledge about the camera's state in the sensor selection algorithm.

In our system this is done by the visibility algorithm proposed in [11]. As a result of this processing step, for each camera s_i in the network a visibility polygon L_i = {l_1, l_2, ..., l_{j_i}} is determined (see Fig. 2). It is important to mention that for non-static sensors the sensing ranges have to be kept up to date during processing to guarantee correct sensor selection.
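As an illustration of this step, the following sketch (not the authors' implementation) derives a visibility polygon as the intersection of a camera's FoV polygon with the area actually visible from the camera position, the latter simply assumed to be given (e.g., computed by a visibility library such as VisiLibity). The shapely library and all coordinates are assumptions made for the example.

```python
# Sketch: visibility polygon = FoV ∩ visible area (illustrative data, shapely assumed).
from shapely.geometry import Polygon

# Maximum observable area of the camera (FoV), projected onto the 2D ground plane.
fov = Polygon([(0, 0), (8, -3), (8, 3)])

# Area visible from the camera position given walls/occluders; here assumed as given,
# e.g., returned by a visibility algorithm.
visible_area = Polygon([(0, 0), (5, -4), (5, 4)])

# Visibility polygon L_i; its boundary segments are the line segments fed into the
# arrangement of line segments described in Section 4.2.
visibility_polygon = fov.intersection(visible_area)
coords = list(visibility_polygon.exterior.coords)
segments = list(zip(coords[:-1], coords[1:]))

print(visibility_polygon.wkt)
print(len(segments), "boundary segments")
```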

4.2. Modelling of Geometrical Prior Knowledge

A central point of our approach is the modelling of both the environment (building map) and the camera visibility polygons in a common geometrical 2D representation. The building model (Fig. 3a) is given by a set of p line segments M = {m_1, m_2, ..., m_p}, and for each camera s_i ∈ S the sensing range is given by a visibility polygon (Fig. 3b), also represented by a set of line segments L_i = {l_1, l_2, ..., l_{j_i}}. All line segments are now used to induce a so-called arrangement of line segments A(L ∪ M) = (E, V, F), with L = ∪_i L_i [12]. That is, E is a set of edges, V is a set of vertices and F is a set of resulting faces in the plane (Fig. 3c). Here, a vertex is an intersection point or an endpoint of a line segment of L ∪ M, an edge is a maximal portion of a line segment without any vertex in its interior, and a face is a subset of the plane that contains no point of an edge or a vertex. Consequently, a face is an open polygonal region whose boundary is formed by edges and vertices. For the efficient handling of arrangements or planar graphs, the doubly-connected edge list (DCEL) has become a standard data structure in computational geometry [12]. Using DCELs, each edge is represented by a pair of directed halfedges, one going from the first endpoint of the line segment to the second one, while the other (its twin halfedge) goes in the opposite direction. A DCEL element consists of three containers of records: vertices, halfedges and faces,

where halfedges are used to separate faces and to connect vertices. That is, from each halfedge the incident faces and vertices are directly at hand. In addition, every halfedge is followed by another halfedge sharing the same incident face, such that the destination vertex of the halfedge is the same as the origin vertex of its following halfedge. Using this structure, it is possible to traverse the boundary of a face by simply following connected halfedges. A further advantage is that for DCELs and line arrangements there are highly efficient algorithms for point, edge or face localization and manipulation [12, 13, 14]. The DCEL was chosen since efficient local manipulation and point localization in large arrangements enhance the effectiveness of our approach.
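The following minimal sketch illustrates the halfedge idea described above; it is not a full arrangement implementation, and the class and field names are illustrative only. It shows how the connected component of a face boundary (CCB) can be traversed by following the "next" pointers of connected halfedges.

```python
# Minimal DCEL-style sketch: traverse a face boundary (CCB) via "next" pointers.
class HalfEdge:
    def __init__(self, origin):
        self.origin = origin   # origin vertex (x, y)
        self.twin = None       # oppositely directed halfedge (not paired in this sketch)
        self.next = None       # next halfedge along the same face boundary
        self.face = None       # incident face label

def make_face(vertices, face_label):
    """Create the boundary (CCB) of a simple polygonal face."""
    edges = [HalfEdge(v) for v in vertices]
    for edge, successor in zip(edges, edges[1:] + edges[:1]):
        edge.next = successor
        edge.face = face_label
    return edges[0]            # any halfedge on the CCB identifies the face

def ccb_vertices(start):
    """Walk the CCB once and collect the boundary vertices."""
    result, edge = [], start
    while True:
        result.append(edge.origin)
        edge = edge.next
        if edge is start:
            return result

start = make_face([(0, 0), (4, 0), (4, 3)], face_label="F1")
print(ccb_vertices(start))     # [(0, 0), (4, 0), (4, 3)]
```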

4.3. Object State Model

In this section the object state model used by our sensor selection algorithm is described. For object observation, an arbitrary object detection and tracking approach can be used. Therefore, a generic interface to the sensor selection algorithm was defined. First of all, we assume the observation process for the object state x_k = [x, y]^T to be of the form

z_k = x_k + v_k    (1)

where z_k = [x, y]^T is the measurement of the object state x_k and v_k represents the measurement uncertainty, which is assumed to be normally distributed with zero mean and covariance matrix R_k = E[v_k v_k^T]. For camera selection a prediction filter was added as a preprocessing step.

Figure 3. Given a set of line segments for the building map M (a) and the visibility polygons of the cameras L (b, in yellow), the arrangement of line segments (c) is the cell complex induced by L ∪ M. It consists of a set of edges, vertices and faces. A face can be described by its connected component of the boundary (CCB) using DCEL structures (yellow dashed line of the blue face).

The prediction filter is used for temporal adaptation of the state estimate if no new measurements are available for a longer period of time. Here, the prediction process is assumed to be of the form

x̂_k^- = A x̂_{k-1} + w_k    (2)

where A is the 2x2 state transition matrix and w_k represents the process noise, which is assumed to be a normally distributed random variable with zero mean and covariance matrix Q(ΔT) = E[w_k w_k^T]. ΔT denotes the elapsed time since the last measurement. The filtering is performed in two steps: a prediction step and an update step. The update step simply consists of setting the object state estimate to x̂_k = z_k and the estimate error covariance to P_k = R_k. This is performed each time new measurements are available. The prediction step uses the model parameters A and Q to predict the future state estimate x̂_k^- and the predicted error covariance P_k^- as follows:

x̂_k^- = A x̂_{k-1}    (3)
P_k^- = A P_{k-1} A^T + Q(ΔT)    (4)

The prediction step is performed at regular intervals, defined by the process clock of the sensor selection algorithm (e.g., 1 Hz). The model parameters A and Q of our prediction filter are defined as follows: Since a typical linear motion model (e.g., constant velocity or constant acceleration) is not applicable in nonlinear environments (building map), we use a constant position model for the state transition matrix, A = I, with I as the 2x2 identity matrix (see [15]). In doing so, the predicted state estimate is given by x̂_k^- = x̂_{k-1} = z_{k-1}, and is therefore always inside a visibility polygon of a camera. The complete uncertainty about the object movement is modelled as process noise by

Q(ΔT) = q · diag(ΔT, ΔT)    (5)

i.e., a diagonal matrix with ΔT on both diagonal entries, with q as the assumed position variance (motion dynamic) of the observed object (e.g., 1.5 m/s). The covariance matrix Q increases the state estimate uncertainty depending on the elapsed time ΔT since the last object position measurement. The resulting object state prediction x̂_k^- and the predicted error covariance P_k^- are then used by the sensor selection approach as information about the object state.
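A compact sketch of this prediction filter is given below, assuming the constant position model A = I and Q(ΔT) = q · diag(ΔT, ΔT) from Eqs. (3)-(5). The variable names mirror the text, but the code itself is an illustration, not the authors' implementation.

```python
# Sketch of the prediction filter (constant position model, growing uncertainty).
import numpy as np

A = np.eye(2)   # constant position model, A = I (Section 4.3)
q = 1.5         # assumed motion dynamic of the observed object, cf. Eq. (5)

def update(z_k, R_k):
    """Update step: the estimate is simply set to the new measurement."""
    return z_k.copy(), R_k.copy()          # x̂_k = z_k, P_k = R_k

def predict(x_hat, P, dt):
    """Prediction step, Eqs. (3)-(5): state stays put, uncertainty grows with ΔT."""
    Q = q * np.diag([dt, dt])
    x_pred = A @ x_hat                     # equals x̂_{k-1} for A = I
    P_pred = A @ P @ A.T + Q
    return x_pred, P_pred

# Example: last measurement 4 s ago, prediction fed to the sensor selection loop.
x_hat, P = update(np.array([12.0, 3.5]), np.eye(2) * 0.2)
x_pred, P_pred = predict(x_hat, P, dt=4.0)
print(x_pred, np.diag(P_pred))
```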

5. Camera Selection Algorithm / Clustering

In our framework the camera selection algorithm can be considered as an independent process which periodically determines the sensors (cluster members) relevant for a certain surveillance task (e.g., tracking). Given a prediction about the object state, N(x̂^-, P^-), and the prior knowledge about the sensors and the environment, the camera selection algorithm calculates the relevant subset of the sensor network that is needed for further observation of the object. In addition to determining the sensors with the estimated position in their FoV as a first objective, a special goal of this algorithm is to determine the minimum number of sensors needed to relocate an object even if the object is temporarily out of sight (e.g., due to non-overlapping sensor coverage). In this section we describe the algorithm in two steps. First, we introduce the basic idea formally by a basic algorithm. Here, we assume a "perfect detector and tracker". That is, we assume that if an object is visible in a camera, the video analytics integrated on the smart camera are always able to detect and recognize the object of interest. However, the basic algorithm already considers static as well as non-static cameras. In a second step, we apply more realistic assumptions to the object detectors, and the sensor selection algorithm is adapted to suboptimal detection and tracking approaches. That is, the sensor selection approach has to take into account "false negatives" generated by the object detection video analytics.

5.1. Basic Algorithm

Let S be a set of sensors distributed in an area under surveillance (e.g., as in Fig. 4a). The sensors relevant for multi-camera object tracking are called cluster sensors S_k^(cluster). The proposed approach distinguishes between three subsets of sensors in S_k^(cluster): One subset of the cluster, the so-called active sensors S^(active), includes all sensors with the object of interest in view (Fig. 4b, red FoVs). These sensors, of course, are responsible for direct object observation. The second subset, the so-called passive sensors, are those sensors (with no object in view) needed to delimit the local area in the surroundings of the object under observation (Fig. 4b, yellow FoVs). The passive sensors are determined for object recognition in case of object disappearance. Because the sensor selection algorithm guarantees that the object is surrounded by passive or active sensors at all times, the observed object has to reappear in one of them. Subsequently, the tracking process has to request observation data from these sensors only. However, while for multi-camera tracking the object observations are requested from the active cameras all the time, only a subset of the passive sensors is temporarily relevant for the tracking task. Due to the motion model defined for the object of interest, only the subset of the passive cameras which can be reached by the object within the time elapsed since the last detection is included in the multi-camera tracking. For object relocation, only the observations from the active cameras and this subset of the passive cameras are needed, so communication paths have to be set up to these sensors only. This subset is denoted by S^(com) (Fig. 4, red FoVs).

Now, after the introduction of these basic terms, we start with our sensor selection method, which is based on computational geometry algorithms. For each s_i ∈ S a set of line segments L_{i,k} = {l_1, l_2, ..., l_{j_i}} (the visibility polygon on the 2D ground plane at time k) is given. The collection of the visibility polygons is summarized by L_k = {L_{1,k}, L_{2,k}, ..., L_{n,k}}. Additionally, let a set of line segments M = {m_1, m_2, ..., m_p} represent the building map. Given this prior information, the sensor selection algorithm starts with an initialization step:

S_0^(active) = ∅,  S_0^(com) = ∅,  S_0^(passive) = S
A_0(L_0 ∪ M) = (E_0, V_0, F_0)

At this time, there is no initial object position available (tracker not initialized). Thus, S_0^(active) and S_0^(com) are empty sets. S_0^(passive) includes all sensors in the network, because without position information no spatial containment of the object is possible. Additionally, the complete arrangement of line segments A_0(L_0 ∪ M) is generated, including the line segments of the environment model and all line segments from the sensor visibility polygons. It should be noted that the full generation of the arrangement of line segments is performed beforehand; later updates for non-static cameras are performed only for cluster sensors. Once the tracker has been initialized (by definition of the object to observe), a prediction of the object state estimate is provided (e.g., by a tracking approach) and camera selection can be performed. The tracking process provides predictions of the object state estimate periodically. For each prediction a new sensor selection loop is performed. Because it is essential for the selection approach to evaluate valid (that is, most recent) information about the camera sensing ranges, in step 1 an update of the visibility polygons is performed first. Here, only the cluster sensors determined in the previous selection loop are updated, and therefore only a small subset of the arrangement has to be recalculated. After the update of the arrangement, the set of active sensors is calculated (step 2). Given the predicted object position estimate x̂_k^-, a point-in-polygon test is performed for all visibility polygons L_{i,k} of the cluster cameras S_{k-1}^(cluster), using an implementation of the well-known ray casting algorithm as described in [14]. The sensors with the object in sensing range are declared active sensors (Fig. 4b/e). It is important to note that while for the very first selection loop all sensors have to be evaluated (S_0^(passive) = S), in subsequent iterations the point-in-polygon test is only applied to a highly reduced number of camera polygons (S_{k-1}^(cluster)) and is therefore very efficient.
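The active-sensor test of step 2 can be illustrated by the following sketch: a textbook ray-casting point-in-polygon test applied to the visibility polygon of every camera in the previous cluster. The polygon data and sensor names are assumptions made for the example; the authors use the implementation described in [14].

```python
# Sketch of Step 2: ray-casting point-in-polygon test for the active sensors.
def point_in_polygon(pt, polygon):
    """Count crossings of a horizontal ray starting at pt with the polygon edges."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):                           # edge straddles the ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:                                 # crossing right of the point
                inside = not inside
    return inside

# Active sensors = cluster cameras whose visibility polygon contains the prediction
# (hypothetical polygons and predicted position).
cluster_polygons = {"s1": [(0, 0), (6, 0), (6, 4), (0, 4)],
                    "s2": [(7, 0), (12, 0), (12, 4), (7, 4)]}
x_pred = (5.0, 2.0)
active = {s for s, poly in cluster_polygons.items() if point_in_polygon(x_pred, poly)}
print(active)   # {'s1'}
```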

Algorithm 1: For each object state prediction (x̂_k^-, P_k^-), do

1. Update the arrangement of line segments:
   (a) Determine the edges of passive sensors to remove:
       L^(remove) = {L_i ∈ L_{k-1} | s_i ∈ S_{k-1}^(passive)}
   (b) If the object state prediction is the first since a successful object relocation, perform a full update of the cluster sensors (including reinsertion of active cameras). In case of a follow-up prediction, only passive sensors are updated:
       L^(update) = {L_i ∈ L_k | s_i ∈ S^(update)}, with
       S^(update) = S_{k-1}^(cluster) if first prediction, else S_{k-1}^(cluster) \ S_{k-1}^(active)
   (c) The updated arrangement is given by
       A'_k = A(L'_k ∪ M),  L'_k = (L_{k-1} \ L^(remove)) ∪ L^(update)

2. Determine the active sensors S_k^(active):
   S_k^(active) = {s_i ∈ S_{k-1}^(cluster) | x̂_k^- ∩ Face(L_{i,k}) ≠ ∅}

3. Determine the passive sensors:
   (a) Manipulate the arrangement of line segments:
       L_k = L'_k \ L_k^(active),  A_k = A(L_k ∪ M) = (E_k, V_k, F_k)
   (b) Find the face in A_k(L_k ∪ M) which includes the predicted object position:
       F^(passive) = {F_u ∈ F_k | x̂_k^- ∩ F_u ≠ ∅}
   (c) Determine the sensors involved in the boundary edges of the face F^(passive) (or which are completely inside F^(passive)):
       S_k^(passive) = {s_i ∈ S | L_{i,k} ∩ CCB(F^(passive)) ≠ ∅}

4. Determine the plausibility-of-presence ψ of the passive sensors and subscribe to the relevant cluster sensors (that is, to active sensors and to passive sensors with a sufficient plausibility-of-presence):
   S_k^(com) = {s_i ∈ S_k^(cluster) | s_i ∈ S_k^(active) ∨ (ψ_{i,k} > τ^(PoP))}

Once the active sensors are known, the passive sensors can be determined in a subsequent step (3). First, a slightly modified arrangement of line segments A_k((L'_k \ L_k^(active)) ∪ M) is created. That is, the arrangement includes the line segments given by the environment model M and the visibility polygons of all sensors in the network, but without the active ones. The subtraction of the

Figure 4. The algorithm starts with an update of the visibility polygons of the cluster sensors. After determination of the active sensors by a point-in-polygon test (b/e), the arrangement of line segments is manipulated (lines of active sensors are removed, see f). The passive sensors (highlighted in yellow in b) are determined by evaluation of the actual face (blue area). In a subsequent step, the algorithm involves the object state uncertainty for determination of the plausibility-of-presence of the object in passive sensors (c/g). If the object does not reappear for a longer period of time, the position uncertainty increases and more passive sensors are involved in the relocation task.

line segments of the active sensors is a crucial part of the algorithm. By excluding the visibility polygons of the active sensors, it is guaranteed that all remaining line segments can be considered as physical borderlines (environment model, e.g., walls) or virtual borderlines (entry edges of surrounding cameras) (Fig. 4f). The arrangement can also be constructed in a very efficient way by temporarily removing the edges which belong to the currently active sensors. In doing so, even in very large sensor networks, the computational power needed for recalculation of the arrangement is minimized. After manipulation of the arrangement A'_k to A_k, the face F^(passive) (containing the actual object position) is calculated. In a further step, the connected component boundary function CCB determines the edges which are incident to the face F^(passive) (Fig. 4f, blue area). The edges of the CCB can derive from the visibility polygons of the sensors (L_{i,k}) or from the environment model M. Because we are only interested in sensor selection, the edges deriving from the building map (M) remain unconsidered, and the sensors involved in the CCB are added to S_k^(passive). The active and the passive sensors are now defined as the actual camera cluster S_k^(cluster) = S_k^(active) ∪ S_k^(passive). The sensor selection process is then repeated from step 2 periodically. It is important to mention that this algorithm is able to determine locally task-relevant cameras by recursive determination of cluster members and an efficient local manipulation of the arrangement of line segments. Now that the cluster sensors are determined, in a further step the temporarily relevant cameras S_k^(com) ⊆ S_k^(cluster) are determined for the request of observation data (step 4). This is done by evaluating the plausibility-of-presence of the object of interest for each passive camera. The plausibility-of-presence for s_i ∈ S_k^(passive) is defined as

PoP_i(x̂_k^-, P_k^-) = 1 / (1 + ShortestPath(x̂_k^-, L_{i,k}) / σ^(max)(P_k^-)),

with ShortestPath as the length of the shortest path from the predicted object position x̂_k^- to the visibility polygon of s_i, and σ^(max) as the worst-case assumption about the uncertainty of the predicted object state estimate (given by the covariance matrix P_k^-). For the calculation of the shortest path we use a modification of Dijkstra's algorithm [16]. Dijkstra's algorithm, as a single-source query approach, is able to calculate the shortest paths to all passive sensors simultaneously. σ^(max) represents the worst-case assumption about the object movement (position uncertainty). In case of disappearance, the prediction filter leads to an increase of the position uncertainty about the object, as defined by the motion model. Finally, if the plausibility-of-presence exceeds a predefined threshold, the reappearance of the object in these cameras is regarded as plausible. In this case, the sensor is added to S^(com), that is, observation data is requested. In addition, the proposed algorithm optimizes the communication load by evaluating the plausibility-of-presence of the passive cameras. In doing so, the tracking process only needs to subscribe to the plausible cluster sensors. The plausibility-of-presence is based on a shortest path distance calculation in order to take into account the non-linear freedom of movement in buildings with physical barriers (obstacles, etc.).
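The following sketch illustrates step 4 under simplifying assumptions: a single-source Dijkstra run from the predicted position yields the shortest path lengths to all passive cameras at once, which are then mapped to plausibility-of-presence values and thresholded. The walkable graph, σ^(max) and τ^(PoP) are illustrative assumptions; the authors use a modified Dijkstra's algorithm on the geometric model.

```python
# Sketch of Step 4: single-source Dijkstra, then PoP = 1 / (1 + dist / sigma_max).
import heapq

def dijkstra(graph, source):
    """graph: {node: [(neighbour, edge_length), ...]} -> shortest distances."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Hypothetical walkable graph: "pred" is the predicted position, "cam3"/"cam7" are
# the entry points of two passive cameras' visibility polygons.
graph = {"pred": [("hall", 4.0)],
         "hall": [("pred", 4.0), ("cam3", 6.0), ("cam7", 15.0)],
         "cam3": [("hall", 6.0)], "cam7": [("hall", 15.0)]}
dist = dijkstra(graph, "pred")

sigma_max = 12.0   # assumed worst-case positional uncertainty derived from P_k^-
tau_pop = 0.4      # assumed subscription threshold τ^(PoP)
pop = {c: 1.0 / (1.0 + dist[c] / sigma_max) for c in ("cam3", "cam7")}
s_com = {c for c, p in pop.items() if p > tau_pop}
print(pop, s_com)  # cam3 is plausible, cam7 is not yet reachable
```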

5.2. Algorithm for Suboptimal Detectors

The approach described in the previous section has one disadvantage: if an object is able to pass a passive camera without being recognized as the object of interest, a recognition of the object of interest is no longer possible, because only cluster sensors are involved in the further detection and recognition process. Since we assume that the object of interest is surrounded by active and passive cameras at all times, the cluster is not expanded in this case.

Algorithm 2: For each object state prediction (x̂_k^-, P_k^-), do

1. Update the arrangement of line segments: see Algorithm 1.

2. Determine the preliminary active sensors S_k^(active): see Algorithm 1.

3. Repeat until S^(active)+ = ∅ (stable cluster). Determine the preliminary passive sensors:
   (a) Manipulate the arrangement of line segments:
       L_k = L'_k \ L_k^(active),  A_k = A(L_k ∪ M) = (E_k, V_k, F_k)
   (b) Find the face in A_k which includes the predicted object position:
       F^(passive) = {F_u ∈ F_k | x̂_k^- ∩ F_u ≠ ∅}
   (c) Determine the sensors involved in the boundary edges of the face F^(passive) (or which are completely inside F^(passive)):
       S_k^(passive)- = {s_i ∈ S | L_{i,k} ∩ CCB(F^(passive)) ≠ ∅}
   (d) Determine the plausibility-of-presence ψ_k of the preliminary passive sensors and re-classify the plausible sensors with no availability as active ones:
       S^(active)+ = {s_i ∈ S_k^(passive)- | (ψ_{i,k} > τ^(PoP)) ∧ (ϑ_i ≠ 1)}
       S_k^(active) = S_k^(active) ∪ S^(active)+
       S_k^(passive) = S_k^(passive)- \ S^(active)+

4. If the cluster is stable, subscribe to the relevant cluster sensors as in Algorithm 1.

To overcome this drawback, an additional variable is added to the sensor description: the so-called sensor availability. The availability is a boolean coefficient provided by the intelligent sensors and describes whether a camera is temporarily applicable as a passive sensor. If, for example, no objects, no movement and no lighting changes are detected by the video analytics integrated on the camera, this sensor is applicable as a passive one, because its visibility is not reduced. If objects are detected (or, in general, a part of the visibility polygon is not observable due to disturbance, noise or clutter), the camera is classified as not available as a

passive sensor. During sensor selection, such an unavailable passive camera is re-classified as active, because it cannot be guaranteed to act as a reliable boundary camera. However, because for the calculation of the passive sensors the arrangement of line segments has to be modified by removing the line segments of all active sensors, the subsequent re-declaration of passive sensors as active implies a recursive sensor cluster calculation. Thus, step 3 of Algorithm 2 is repeated iteratively until the cluster is stable, that is, until no further passive sensors are subsequently classified as active. The drawback of "losing" objects which passed through passive sensors unnoticed is thereby overcome. If an object passes through a passive camera, even if it is not recognized as the object of interest, the detected movement leads to a self-declaration of the passive camera as not available. Subsequently, the sensor selection algorithm re-declares this camera as active and the cluster is expanded. In doing so, the object is still surrounded by the new and the former passive cameras.
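The stabilization loop of Algorithm 2 can be summarized by the following sketch, in which plausible but unavailable passive sensors are promoted to active sensors until no further promotion occurs. The helper functions stand in for the arrangement/face and plausibility computations described above and are purely illustrative.

```python
# Sketch of the Algorithm 2 stabilization loop (Step 3d), with toy stand-ins for the
# geometric computations.
def stabilize_cluster(active, availability, compute_passive, pop, tau):
    while True:
        passive = compute_passive(active)                     # Steps 3a-c (face/CCB)
        promote = {s for s in passive
                   if pop(s) > tau and not availability[s]}   # plausible but unavailable
        if not promote:                                       # cluster is stable
            return active, passive
        active = active | promote                             # Step 3d: re-classify

# Toy example: s2 is plausible but reports itself unavailable and is promoted.
availability = {"s1": True, "s2": False, "s3": True}
compute_passive = lambda active: {"s1", "s2", "s3"} - active  # pretend CCB result
pop = lambda s: {"s1": 0.9, "s2": 0.8, "s3": 0.1}[s]
active, passive = stabilize_cluster({"s0"}, availability, compute_passive, pop, tau=0.5)
print(active, passive)   # s2 promoted to active; s1 and s3 remain passive
```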

6. Experimental Results

The developed simulation tool generates a simple environment model and randomly generates a predefined number of visibility polygons (as described in Section 4.1). Fig. 5 (left) shows an example of such a configuration for 50 cameras. The camera network size has been varied between 25 and 500 cameras, with constant density of coverage. Additionally, a predefined number of object tracks (random walks in the corridors) are generated (green ellipses in Fig. 5). For the simulation of different traffic densities, the number of tracks has been varied between 10 and 500. Sensor selection is then applied for single object tracking (the red ellipse). Given this framework, the processing time of the proposed algorithm has been analyzed. Fig. 5 (middle) shows the mean runtime and the standard deviation of the calculation time for different network sizes, split into four main parts: update of the arrangement (visibility polygons of non-static sensors), calculation of active sensors (point-in-polygon tests), manipulation of the arrangement (removing edges of active sensors), and calculation of passive sensors (face calculation). It is shown that with increasing network size the computational costs for sensor selection increase moderately, and the approach is applicable to real systems (absolute runtime < 1 s for 500 sensors). However, the maintenance of the arrangement of line segments (update of the non-static visibility polygons) takes the major part of the processing time, followed by the determination of the passive sensors by manipulation of the arrangement and evaluation of the faces. This highlights the manipulation (update) of the arrangement as the main cost factor of the algorithm.

Figure 5. The developed simulation tool shows an arrangement of line segments consisting of a simple building map and 50 cameras. All cameras are simulated as moving pan/tilt units. The actual passive sensors are highlighted in yellow, the active ones in red. The red ellipse represents the simulated moving object of interest, while the green ellipses are additional moving objects which represent traffic and clutter. The graphs in the middle and on the right show the measured calculation time (in ms) for sensor networks with 25 to 500 cameras and 25 to 500 moving objects in the area under surveillance (mean calculation time and standard deviation, split into update of the arrangement, calculation of active sensors, manipulation of the arrangement, and calculation of passive sensors).

7. Future Work

In future work, the algorithm will be evaluated under real conditions using our experimental surveillance system NEST [17]. Furthermore, we will investigate possibilities for speeding up the maintenance of the arrangement of line segments. A promising approach to overcome this bottleneck is a temporarily suboptimal sensor selection that updates only the relevant subset of the camera cluster S^(com), that is, the sensors which are involved in the observation task.

References

[1] P. Doubek, "Multi-view tracking and viewpoint selection," PhD Thesis, Swiss Federal Institute of Technology Zurich (ETH), 2005.
[2] A. Gupta, A. Mittal, and L. S. Davis, "COST: An approach for camera selection and multi-object inference ordering in dynamic scenes," in IEEE 11th International Conference on Computer Vision (ICCV), Oct. 2007.
[3] P. Pahalawatta, T. Pappas, and A. Katsaggelos, "Optimal sensor selection for video-based target tracking in a wireless sensor network," in International Conference on Image Processing (ICIP), vol. 5, 24-27 Oct. 2004, pp. 3073-3076.
[4] Y. Zhang and Q. Ji, "Active and dynamic information fusion for multisensor systems with dynamic Bayesian networks," IEEE Transactions on Systems, Man, and Cybernetics, Part B, vol. 36, no. 2, pp. 467-472, April 2006.
[5] J. Park, C. Bhat, and A. C. Kak, "A look-up table based approach for solving the camera selection problem in large camera networks," in Workshop on Distributed Smart Cameras (ACM SenSys'06), 2006.
[6] D. Makris, T. Ellis, and J. Black, "Bridging the gaps between cameras," in Proc. IEEE Computer Vision and Pattern Recognition, vol. 2, pp. II-205-II-210, Jul. 2004.
[7] S. Calderara, R. Vezzani, A. Prati, and R. Cucchiara, "Entry edge of field of view for multi-camera tracking in distributed video surveillance," in Proc. IEEE Conference on Advanced Video and Signal Based Surveillance (AVSS 2005), 15-16 Sept. 2005, pp. 93-98.
[8] X. Zou, B. Bhanu, B. Song, and A. K. Roy-Chowdhury, "Determining topology in a distributed camera network," in Proc. IEEE International Conference on Image Processing (ICIP 2007), vol. 5, 16 Sept.-19 Oct. 2007, pp. V-133-V-136.
[9] O. Javed, Z. Rasheed, K. Shafique, and M. Shah, "Tracking across multiple cameras with disjoint views," in Proc. Ninth IEEE International Conference on Computer Vision, 13-16 Oct. 2003, pp. 952-957.
[10] E. Monari and K. Kroschel, "A knowledge-based camera selection approach for object tracking in large sensor networks," in Proc. ACM/IEEE International Conference on Distributed Smart Cameras (ICDSC), 30 Aug.-2 Sept. 2009.
[11] K. J. Obermeyer, "The VisiLibity library," http://www.VisiLibity.org, 2008, R-1.
[12] M. de Berg, O. Cheong, M. van Kreveld, and M. Overmars, Computational Geometry - Algorithms and Applications, 3rd ed. Springer, 2008.
[13] "CGAL, Computational Geometry Algorithms Library," 2008. [Online]. Available: http://www.cgal.org
[14] R. Wein, E. Fogel, B. Zukerman, and D. Halperin, 2D Arrangements, 3rd ed., 2007.
[15] D. Kelly and F. Boland, "Motion model selection in tracking humans," in Proc. IET Irish Signals and Systems Conference, 28-30 June 2006, pp. 363-368.
[16] E. W. Dijkstra, "A note on two problems in connexion with graphs," Numerische Mathematik, vol. 1, no. 1, pp. 269-271, December 1959. [Online]. Available: http://dx.doi.org/10.1007/BF01386390
[17] J. Mossgraber, F. Reinert, and H. Vagts, "An architecture for a task-oriented surveillance system: A service- and event-based approach," in Second International Conference on Systems, 2010, pp. 146-151.
