International Journal of Information Acquisition, Vol. 7, No. 3 (2010) 225–242
© World Scientific Publishing Company
DOI: 10.1142/S0219878910002208
MULTI-CAMERA POSITIONING FOR AUTOMATED TRACKING SYSTEMS IN DYNAMIC ENVIRONMENTS

YI YAO
Visualization and Computer Vision Lab, GE Global Research
[email protected]

CHUNG-HAO CHEN
Department of Mathematics and Computer Science, North Carolina Central University
[email protected]

BESMA ABIDI, DAVID PAGE, ANDREAS KOSCHAN and MONGI ABIDI
Imaging, Robotics, and Intelligent Systems Lab, University of Tennessee, Knoxville
[email protected], [email protected], [email protected], [email protected]

Received 6 July 2010
Accepted 12 October 2010

Most existing camera placement algorithms focus on coverage and/or visibility analysis, which ensures that the object of interest is visible in the camera's field of view (FOV). In recent literature, a handoff safety margin has been introduced into sensor planning so that sufficient overlap between the FOVs of adjacent cameras is reserved for successful and smooth target transitions. In this paper, we investigate the sensor planning problem when the dynamic interactions between moving targets and observing cameras are taken into account. The probability of camera overload is introduced to model these interactions; it also captures the limitation that a given camera can simultaneously monitor or track only a fixed number of targets, and it incorporates the targets' dynamics into sensor planning. The resulting camera placement not only achieves the optimal balance between coverage and handoff success rate but also maintains this balance in environments with various target densities. The proposed camera placement method is compared with a reference algorithm by Erdem and Sclaroff. A consistently improved handoff success rate is illustrated via experiments using typical office floor plans with various target densities.

Keywords: Sensor planning; visual surveillance; multiple camera systems; camera handoff.
1. Introduction

With the increased scale and complexity involved in most practical surveillance applications, it is almost impossible for any single camera (either fisheye or PTZ) to fulfill automated and persistent tracking with an acceptable degree of continuity and/or reasonable accuracy. Systems with multiple cameras therefore find extensive use in surveillance applications, and the need for sensor planning emerges with the question of how to place multiple cameras to fulfill given tasks under given performance requirements. In the literature, most sensor planning algorithms have been proposed for applications such as 3D object inspection and reconstruction [Roy et al., 2004; Roy et al., 2005; Chen et al., 2008]. Sensor planning for surveillance systems has received increasing attention in recent years [Qureshi and Terzopoulos, 2005; Cai and Aggarwal, 1999; Isler et al., 2005; Pavlidis et al., 2001]. Cameras are placed such that full or specified coverage of the environment or object is achieved. A probabilistic camera planning framework with visibility analysis of dynamic occlusions was proposed by Mittal and Davis [2004]. Erdem and Sclaroff [2006] defined different types of coverage problems and developed corresponding solutions.

The conventional requirements in sensor planning, such as coverage and visibility, cannot by themselves ensure automated tracking in real-time surveillance systems. A uniform and sufficient amount of overlap between the FOVs of adjacent cameras should be reserved so that consistent labeling and camera handoff can be executed successfully. Sensor planning algorithms that achieve the optimal balance between coverage and handoff success rate are discussed in [Yao et al., 2010; Yao et al., 2008]. In this paper, additional considerations regarding the targets' dynamics, including their dynamic interaction with the observing cameras, are incorporated into sensor planning via the probability of camera overload. The optimal camera placement is obtained according to the targets' statistical distribution in
the environment. More overlapped FOVs are secured for more clustered environments, so that a target has more freedom to be transferred to another camera when experiencing camera overload or dynamic occlusion.

In addition, the introduction of the probability of camera overload simplifies the modeling of PTZ cameras, which most existing sensor planning algorithms find difficult to model properly. Let the instant FOV denote the FOV that a PTZ camera can see at any given time instant, and the achievable FOV the FOV that a PTZ camera can survey given a sufficient period of time. The camera's limited pan and tilt speeds lead to a discrepancy between the instant and achievable FOVs, which in turn introduces difficulties in modeling PTZ cameras for sensor planning; some algorithms simply use the achievable FOV for the sensor planning of PTZ cameras.

In brief, the major contribution of this paper is the derivation of the probability of camera overload and its application to sensor planning. The resulting sensor planning algorithm has the advantage of considering both the dynamics of the observing camera, especially for PTZ cameras, and the dynamic interactions between the objects and cameras. The algorithm presented in this paper is closely related to our previous work described in [Yao et al., 2010; Yao et al., 2008; Yao et al., 2008]. The major differences are as follows. (I) The algorithms in [Yao et al., 2010; Yao et al., 2008] construct an objective function without considering the dynamic interaction between objects and observing cameras, which is the major additional consideration introduced in this paper. (II) The algorithm in [Yao et al., 2008] concentrates on the placement of PTZ cameras, which is a special case of the scenarios addressed by our algorithm: there, the maximum number of targets that can be tracked simultaneously by a camera, N_obj, is set to 1, whereas in this paper we investigate the general case with N_obj ≥ 1.

The remainder of this paper is organized as follows. Table 1 lists the major notations used in this paper. A brief introduction to related work
Table 1. Major notations.

λ_j : Arrival rate in the FOV of the jth camera configuration
μ_j : Average residence time in the FOV of the jth camera configuration
A, A_C : Coverage matrices, with a_ij = 1 / a_C,ij = 1 indicating that the ith grid can be seen by the jth camera configuration
A_H : Handoff safety margin, with a_H,ij = 1 indicating that the ith grid is in the handoff safety margin of the jth camera configuration
A_V : Visible area, with a_V,ij = 1 indicating that the ith grid is in the visible area of the jth camera configuration
c_i : The cost function of the ith grid
K_o,j : Target density in the FOV of the jth camera configuration
M_D, M_D,ij : Distance component of the observation measure of the ith grid observed by the jth camera configuration
M_R, M_R,ij : Resolution component of the observation measure of the ith grid observed by the jth camera configuration
N_obj, N_obj,j : The maximum number of objects that can be tracked simultaneously by the jth camera configuration
P_do,ij : Probability of dynamic occlusion on the ith grid observed by the jth camera configuration
P_do,i : Probability of dynamic occlusion on the ith grid
P_co,i : Probability of camera overload on the ith grid
Q_i,j : Observation measure of the ith grid observed by the jth camera configuration
Q_F : Failure threshold that separates the handoff safety margin and invisible areas
Q_T : Trigger threshold that separates the handoff safety margin and visible areas
x : The solution vector, with x_j = 1 indicating that the jth camera configuration is selected
is given in Sec. 2. Section 3 defines the observation measure and the objective functions used in the search for the optimal camera placement. Section 4 introduces the probability of camera overload and the modified objective function considering targets' dynamics. Section 5 presents our experimental results and comparisons with the reference algorithm. Section 6 concludes the paper.
2. Related Work

In the literature, most placement algorithms for visual sensors have been proposed for applications such as 3D object inspection and reconstruction. Roy et al. [2004] reviewed existing sensor planning algorithms for 3D object reconstruction and proposed an online scheme using a probabilistic reasoning framework for next-view planning and object recognition [Roy et al., 2005]. A more recent and thorough discussion of sensor planning algorithms for 3D object reconstruction and recognition can be found in [Chen et al., 2008], where the authors also point out promising directions for future research, such as the combined optimization of the placement of both cameras and illumination sources. Wong et al. [1999] defined a metric evaluating the unknown information in each group of potential viewpoints and used it in the search for the next best view for 3D modeling. Yous et al. [2006] designed an active scheme for the assignment of multiple PTZ cameras so that each camera observes a specific part of a moving object, mainly pedestrians, and achieves the best visibility of the whole object. The selection of sets of omnidirectional views for the representation of a 3D scene is discussed in [Tosic and Frossard, 2006]. In [Saadatseresht et al., 2005], fuzzy logic inference is employed for camera placement considering the uncertainty in the analysis of visibility, accessibility, and camera-object distance. Chen and Li [2004] addressed the placement of active sensors in the context of robot vision. With the increased complexity of multiple camera systems, sensor planning is also conducted at a larger scale and a higher level, similar to sensor networks. Guo et al. [2008] modeled observability as a decreasing
exponential function of the observation distance and used this model in camera placement for the monitoring and tracking of mass objects. Dunn et al. [2006] employed the Parisian evolutionary algorithm to search for the optimal camera placement for 3D object reconstruction, aiming at reduced computational complexity.

Sensor planning for surveillance systems has received increasing attention in recent years [Qureshi and Terzopoulos, 2005; Cai and Aggarwal, 1999; Isler et al., 2005; Pavlidis et al., 2001]. Erdem and Sclaroff [2006] defined different types of coverage problems and developed corresponding solutions using perspective cameras. Their methods have been implemented in a simulator with a genetic algorithm as the optimization engine [David et al., 2007]. Several placement algorithms have been developed based on Erdem and Sclaroff's method. Angella et al. [2007] presented solutions for the more generalized M-coverage problem, where it is desired that the object of interest be observed by at least M cameras. Horster and Lienhart [2006] also addressed the M-coverage problem and transformed their nonlinear objective function into a linear one so that linear binary programming could be used in the search for the optimal camera placement.

The literature also includes sensor placement algorithms focusing on additional considerations such as path observability, dynamic occlusion, and frontal view availability. Bodor et al. [2007, 2005] presented a camera placement algorithm to maximize the observability of a path. In a similar vein, Fiore et al. [2008] used distance and foreshortening constraints to describe the observability of a path and defined a corresponding objective function to guide the search for the optimal camera placement; successful camera placement and online repositioning are demonstrated for tracking pedestrians moving along a regular path using two fixed cameras mounted on remotely controllable mobile platforms. A probabilistic camera planning framework with dynamic visibility analysis was proposed by Mittal and Davis [2004]. Another metric describing the likelihood of dynamic occlusion is discussed in [Chen and Davis].
Ram et al. [2006] introduced the frontal view probability into coverage analysis and demonstrated real-time camera selection for better observation of a pedestrian.

Online camera selection, also referred to as the focus of attention problem by Isler et al. [2005], has been introduced as a result of the improved mobility of cameras. Since the object of interest can be observed by multiple cameras, an online resource management mechanism is necessary to guide the coordination among multiple cameras for optimal system performance. The optimal performance is two-fold: the optimal observation of every object of interest and the optimal computational load for every camera deployed in the environment. Gupta et al. [2007] discussed a unified approach, referred to as COST, which selects a set of cameras to be used for the inferences for each person in a group of pedestrians, considering occlusions and visual confusion. Isler et al. [2005] proposed a selection framework that assigns cameras to track the object of interest with minimized expected error in the estimation of the object's location. The visibility interval is explored as one criterion for online camera selection in [Lim et al., 2005]. Online camera selection or scheduling is discussed for large-scale camera networks, where up to 80 cameras are deployed, in [Lim et al., 2007].

Since Erdem and Sclaroff's [2006] sensor planning algorithm is selected as the baseline method for performance comparison, more details are presented here. In [Erdem and Sclaroff, 2006], a polygonal floor plan is represented as an occupancy grid, and a binary vector b ∈ B^{N_g} is obtained by letting b_i = 1 if the corresponding grid can be seen by at least one camera and b_i = 0 otherwise, where B = {0, 1} and N_g denotes the number of grid points. We construct a binary matrix A ∈ B^{N_g × N_c} with a_ij = 1 if the ith grid is covered by the jth camera configuration, where N_c denotes the number of camera configurations. Each camera configuration specifies one combination of camera intrinsic and extrinsic parameters, including the camera's focal length f, pan/tilt angles θ_P/θ_T, and position T_c. The solution vector x ∈ B^{N_c} has x_j = 1 if the jth camera configuration is selected and x_j = 0,
otherwise. The following relation holds: b_i = 1 if (Ax)_i > 0 and b_i = 0 otherwise. Let the cost associated with the jth camera configuration be ω_j. Given the maximum cost C_max, the Max-Coverage sensor planning problem can be described by:

$$\max_{\mathbf{x}} \sum_i b_i, \quad \text{subject to} \quad \sum_j \omega_j x_j \le C_{max}. \tag{1}$$
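For illustration, here is a short Python sketch of the Max-Coverage problem of Eq. (1). Erdem and Sclaroff solve it as a binary integer program; the greedy heuristic below is a stand-in solver of our own for illustration, not the authors' method, and the toy data are invented:

```python
import numpy as np

def greedy_max_coverage(A, omega, C_max):
    """Greedy heuristic for the Max-Coverage problem of Eq. (1).

    A     : (N_g, N_c) binary coverage matrix, a_ij = 1 if grid i is
            covered by camera configuration j.
    omega : (N_c,) cost of each camera configuration.
    C_max : total cost budget.
    Returns a binary solution vector x.
    """
    N_g, N_c = A.shape
    x = np.zeros(N_c, dtype=bool)
    covered = np.zeros(N_g, dtype=bool)
    cost = 0.0
    while True:
        best_j, best_gain = -1, 0.0
        for j in range(N_c):
            if x[j] or cost + omega[j] > C_max:
                continue
            # Newly covered grids per unit cost for this configuration.
            gain = np.sum(A[:, j].astype(bool) & ~covered) / omega[j]
            if gain > best_gain:
                best_j, best_gain = j, gain
        if best_j < 0:
            break  # no affordable configuration adds coverage
        x[best_j] = True
        covered |= A[:, best_j].astype(bool)
        cost += omega[best_j]
    return x.astype(int)

# Toy example: 4 grids, 3 candidate configurations, budget of 2.
A = np.array([[1, 0, 1],
              [1, 0, 0],
              [0, 1, 0],
              [0, 1, 1]])
omega = np.array([1.0, 1.0, 1.0])
print(greedy_max_coverage(A, omega, C_max=2.0))  # e.g. [1 1 0]
```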
3. Sensor Planning

As seen in the previous section, a binary model, the visibility matrix A, is sufficient to tackle the optimization of coverage. However, this binary model is no longer adequate to incorporate the additional requirement of uniform and sufficient overlapped FOVs between adjacent cameras. Therefore, in this section, we first define a continuous observation measure to describe the suitability of the target being tracked by the current camera. Afterwards, this continuous observation measure is thresholded to generate three regions: invisible, visible, and a gray region. The gray region, which we refer to as the handoff safety margin, defines the area in the camera's FOV where a target needs a transfer. The basic idea of our sensor planning algorithm is to utilize this gray area so that uniform and sufficient overlapped FOVs are achieved. Note that the definition of the observation measure and the formulation of the objective function for static environments are directly inherited from our previous work presented in [Yao et al., 2010; Yao et al., 2008]. To save space, only a brief introduction is given in this section for the completeness of the algorithm description; interested readers may refer to our previous publications for more detailed derivations.
3.1. Observation measure

To describe the observation of a tracked target beyond visibility, we consider two components: the resolution M_R and the distance to the edges of the camera's FOV, M_D. The M_R component is designed to evaluate a valid observation for the viewer. For persistent object tracking and smooth camera handoff, the tracked target should be at a reasonable distance from the edges of the camera's FOV; the M_D component, therefore, considers the safety margin before the object falls out of the camera's FOV.

To begin our study, the camera and world coordinates are defined and illustrated in Fig. 1. A point g_i = [g_{x,i} g_{y,i} g_{z,i}]^T in the world coordinates is projected onto a point p_{ij} = [x'_{i,j} y'_{i,j} z'_{i,j}]^T in the jth camera's coordinates by

$$\begin{bmatrix} x'_{i,j}\\ y'_{i,j}\\ z'_{i,j} \end{bmatrix} = \begin{bmatrix} \cos\theta_{T,j} & 0 & -\sin\theta_{T,j}\\ 0 & 1 & 0\\ \sin\theta_{T,j} & 0 & \cos\theta_{T,j} \end{bmatrix} \begin{bmatrix} 1 & 0 & 0\\ 0 & \cos\theta_{P,j} & \sin\theta_{P,j}\\ 0 & -\sin\theta_{P,j} & \cos\theta_{P,j} \end{bmatrix} \begin{bmatrix} g_{x,i} - T_{x,j}\\ g_{y,i} - T_{y,j}\\ g_{z,i} - T_{z,j} \end{bmatrix}, \tag{2}$$

with T_{c,j} = [T_{x,j} T_{y,j} T_{z,j}]^T.

Fig. 1. Illustration of the camera and world coordinates.
Assuming zero skew, unit aspect ratio, and zero image center, the projected point in the image plane is given by:

$$x_{i,j} = f_j\, x'_{i,j}/z'_{i,j}, \qquad y_{i,j} = f_j\, y'_{i,j}/z'_{i,j}.$$

Letting g_{z,i} = 0 (in the ground plane), the target depth ẑ_{i,j}, i.e., the distance between the object's centroid and the camera's optical center, can be estimated by:

$$\hat{z}_{i,j} = \frac{-T_{z,j}}{(x_{i,j}/f_j)\cos\theta_{T,j} + \sin\theta_{T,j}}, \tag{3}$$

and the resolution component M_{R,ij} is then expressed as:

$$M_{R,ij} = \begin{cases} \dfrac{\alpha_R f_j}{\hat{z}_{i,j}}, & \hat{z}_{i,j} > -\dfrac{T_{z,j}}{\tan\theta_{T,j}},\\[2mm] \dfrac{\alpha_R f_j}{(\hat{z}_{i,j} + T_{z,j}/\tan\theta_{T,j})^2 - T_{z,j}/\tan\theta_{T,j}}, & \hat{z}_{i,j} \le -\dfrac{T_{z,j}}{\tan\theta_{T,j}}, \end{cases} \tag{4}$$

where α_R is a normalization coefficient.

In practice, for better observation and to reserve enough computation time for handoff, the target should remain at a distance from the edges of the camera's FOV. Moreover, this margin distance is affected by the target depth: when the target is closer, its projected image undergoes a larger displacement in the image plane, so a larger margin should be reserved. In our definition, different polynomial powers are used to achieve varying decreasing rates of the M_D component as the object of interest approaches the edges of the camera's FOV. The M_{D,ij} component is then given by:

$$M_{D,ij} = \alpha_D\left[\left(1 - \frac{|x_{i,j}|}{N_{row}/2}\right) + \left(1 - \frac{|y_{i,j}|}{N_{col}/2}\right)\right]^{\beta_1 \hat{z}_{i,j} + \beta_0}, \tag{5}$$

where N_row and N_col denote the image's width and height, α_D is a normalization coefficient, and β_1 and β_0 are used to adjust the polynomial degree.

The observation measure is then given by:

$$Q_{ij} = \begin{cases} w_R M_{R,ij} + w_D M_{D,ij}, & [x_{i,j}, y_{i,j}]^T \in \Pi,\\ -\infty, & \text{otherwise}, \end{cases} \tag{6}$$

where w_R and w_D are importance weights and Π denotes the image plane.

As for PTZ cameras, considering the additional flexibility afforded by the camera's adjustable pan and tilt angles, the resolution component M_{R,ij} is given by M_{R,ij} = α_R f_j/ẑ_{i,j}. We assume that the target is always maintained at the image center by panning and tilting the camera; therefore, the M_{D,ij} component can be eliminated from the computation of the observation measure.
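To make Eqs. (3)-(6) concrete, here is a minimal Python sketch of the observation measure for a single image point. The default parameter values are illustrative assumptions rather than the paper's calibrated settings, and Eq. (5) follows the reconstruction given above:

```python
import numpy as np

def observation_measure(x, y, f, T_z, theta_T,
                        N_row=640, N_col=480,
                        alpha_R=1.0, alpha_D=0.5,
                        beta_1=0.0, beta_0=2.0,
                        w_R=0.5, w_D=0.5):
    """Observation measure Q_ij of Eq. (6) for one image point (x, y)."""
    # Outside the image plane: the grid point is invisible to this camera.
    if abs(x) > N_row / 2 or abs(y) > N_col / 2:
        return -np.inf

    # Eq. (3): estimated target depth for a ground-plane point (g_z = 0).
    z_hat = -T_z / ((x / f) * np.cos(theta_T) + np.sin(theta_T))

    # Eq. (4): resolution component, piecewise in the estimated depth.
    z_break = -T_z / np.tan(theta_T)
    if z_hat > z_break:
        M_R = alpha_R * f / z_hat
    else:
        M_R = alpha_R * f / ((z_hat - z_break) ** 2 + z_break)

    # Eq. (5): distance-to-edge component with depth-dependent exponent.
    margin = (1 - abs(x) / (N_row / 2)) + (1 - abs(y) / (N_col / 2))
    M_D = alpha_D * margin ** (beta_1 * z_hat + beta_0)

    # Eq. (6): weighted combination of the two components.
    return w_R * M_R + w_D * M_D
```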
3.2. Objective function

A failure threshold Q_F and a trigger threshold Q_T are derived to define three disjoint regions: (I) the invisible area with Q_ij < Q_F, where Q_ij represents the observation measure value of the ith grid observed by the jth camera configuration, (II) the visible area with Q_ij ≥ Q_T, and (III) the handoff safety margin with Q_F ≤ Q_ij < Q_T. The failure threshold Q_F segments the invisible areas and is used for coverage analysis. The trigger threshold Q_T separates the visible areas from the handoff safety margins; it is introduced for handoff rate analysis, where the necessary overlapped FOVs between adjacent cameras are optimized. The trigger threshold is given by Q_T = Q_F + κ ū_obj t̄_H, where ū_obj represents the average moving speed of the object of interest, t̄_H denotes the average duration of a successful handoff, and κ is a conversion scalar.

Let A_C ∈ B^{N_g × N_c} represent the grid coverage with a_C,ij = 1 if Q_ij ≥ Q_F and a_C,ij = 0 otherwise. Matrix A_C resembles matrix A in the conventional coverage analysis discussed in the previous section. For the purpose of handoff analysis, two additional coefficient matrices, A_H and A_V, are constructed. The matrix A_H has a_H,ij = 1 if Q_F ≤ Q_ij < Q_T and a_H,ij = 0 otherwise. The matrix A_V has a_V,ij = 1 if Q_ij ≥ Q_T and a_V,ij = 0 otherwise. Matrices A_H and A_V represent the handoff safety margin and the visible area, respectively. Recall that the solution vector
x specifies a set of chosen camera configurations, with the corresponding element x_j = 1 if the configuration is chosen and x_j = 0 otherwise. Let c_C = A_C x, c_H = A_H x, and c_V = A_V x. The objective function of the ith grid is formulated as:

$$c_i = w_C(c_{C,i} > 0) + w_H(c_{H,i} = 2) - w_V(c_{V,i} > 1), \tag{7}$$

where w_C, w_H, and w_V are predefined importance weights, and the operation (c_{C,i} > 0) evaluates to 1 if c_{C,i} > 0 and to 0 otherwise. The first term in the objective function considers coverage, the second term produces sufficient overlapped handoff safety margins, and the third term penalizes excessive overlapped visible areas. Our objective function thus achieves a balance between coverage and sufficient margins for camera handoff. The optimal sensor placement for the Max-Coverage problem can then be obtained by:

$$\max_{\mathbf{x}} \sum_i c_i, \quad \text{subject to} \quad \sum_j \omega_j x_j \le C_{max}. \tag{8}$$

Note that the optimization of camera placement can be formulated in an alternative way, where an additional constraint is used instead of constructing an objective function as a weighted sum of three terms as in Eq. (7). To be more specific, the optimization formulation can be changed to:

$$\max_{\mathbf{x}} \sum_i c_i, \quad \text{subject to} \quad \sum_j \omega_j x_j \le C_{max} \ \text{and} \ A_H \mathbf{x} \ge \mathbf{2}, \tag{9}$$

where c_i = w_C(c_{C,i} > 0) − w_V(c_{V,i} > 1) and 2 represents a vector with all elements equal to 2. In our experiments, the hard-constraint approach suffers from a slower convergence speed in comparison with the scheme with an additive objective function; therefore, in the following discussion, we use the additive objective function approach.
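As a minimal sketch of how the per-grid objective of Eq. (7) can be evaluated for a candidate placement, assuming the matrices A_C, A_H, and A_V have already been built by thresholding Q_ij; the weight values are placeholders, not the paper's defaults:

```python
import numpy as np

def grid_objective(A_C, A_H, A_V, x, w_C=1.0, w_H=1.0, w_V=1.0):
    """Per-grid objective c_i of Eq. (7) for a candidate placement x.

    A_C, A_H, A_V : (N_g, N_c) binary matrices for coverage, handoff
                    safety margin, and visible area.
    x             : (N_c,) binary solution vector.
    """
    c_C = A_C @ x   # number of selected cameras covering each grid
    c_H = A_H @ x   # cameras whose handoff safety margin holds the grid
    c_V = A_V @ x   # cameras whose visible area holds the grid
    # First term rewards coverage, second rewards exactly two overlapping
    # safety margins, third penalizes more than one overlapping visible area.
    return (w_C * (c_C > 0).astype(float)
            + w_H * (c_H == 2).astype(float)
            - w_V * (c_V > 1).astype(float))

# The placement objective of Eq. (8) is then grid_objective(...).sum(),
# maximized subject to the cost budget, e.g. with the greedy search sketched
# earlier in place of a binary integer program.
```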
4. Dynamic Considerations

Environments with multiple moving objects impose additional difficulties on sensor planning. Multiple moving objects cause dynamic occlusions depending on their real-time relative positions. Figure 2 compares two camera placements in terms of their ability to handle dynamic occlusion. The camera placement in Fig. 2(a) is unable to deal with dynamic occlusion, since target 2 is blocked by target 1 in the FOVs of both cameras. On the contrary, in the camera placement shown in Fig. 2(b), target 2 can be seen by camera 2 when it is occluded by target 1 in camera 1. This illustrates that the probability of dynamic occlusion can be reduced by a proper camera placement. Due to the non-deterministic nature of dynamic occlusion, the analysis of such occlusions is conducted in a probabilistic framework: the probability of dynamic occlusion is derived and incorporated into sensor planning.

Fig. 2. Schematic illustration of the problem of dynamic occlusion: (a) target 2 is occluded by target 1 for both cameras; (b) target 2 can be observed by camera 2 when it is occluded by target 1 in the FOV of camera 1.

Another important issue in sensor planning for environments with multiple dynamic targets is the coordination among multiple cameras. In practice, a single camera can track a limited number of targets simultaneously because of its limited resolvable distance and computational capacity [Chen et al., 2008]. The camera may not be able to detect and/or track new objects when its maximum computational capacity has been reached. This scenario is referred to as the problem of camera overload and is illustrated in Fig. 3. Assume that the camera is able to track at most four targets simultaneously. When a new target enters the camera's FOV, a decision must be made so that an appropriate target is dropped due to the limited computational capacity. In Fig. 3(b), since target 3 is farthest from the camera, it is dropped so that the camera can track the new target. The goal of sensor planning is to automatically minimize the number of targets dropped due to camera overload.

Fig. 3. Schematic illustration of the problem of camera overload, assuming that the camera can track at most four targets simultaneously due to limited computational capacity: (a) the maximum number of targets is reached; (b) camera overload occurs when a new target enters the camera's FOV, and target 3 is dropped.
4.1. System model

For camera overload analysis, we model the multi-object tracking system as an M/M/N/N queuing system. Following the conventions of queuing theory, an M/M/N/N system has: (1) an arrival process that follows a Poisson distribution; (2) a residence time that follows an exponential distribution; and (3) N servers and N buffer slots. Poisson processes are employed to model the objects' arrival and departure. The Poisson process has proved effective in modeling random events that occur continuously and independently at a constant average rate, such as customer arrivals or incoming cellular phone calls, and it has found wide application in systems such as bank service and wireless communication [Panneerselvam, 2004; Trivedi, 2001]. An object of interest entering the FOV of a surveillance system for the service of "tracking" and "monitoring" is analogous to a customer entering a bank for the service of account transactions, or to a mobile call arriving at a base station for the service of wireless communication. Therefore, the Poisson process is chosen to model objects' arrival and departure in a surveillance system.

The Poisson and exponential probability distributions describe two aspects of a Poisson process: the Poisson probability function gives the distribution of the number of events that occur in a time interval of fixed length, whereas the exponential probability function gives the distribution of the length of the time interval between consecutive events. The rates of the objects' arrival and departure determine the target density in the environment: a higher arrival rate and a lower departure rate result in an environment with a higher target density.

Assume that the average arrival rate in the FOV of the jth camera is λ_j and that the mean camera-residence time is 1/μ_j. Let N_obj,j be the maximum number of targets that can be tracked simultaneously by the jth camera. From M/M/N/N queuing theory, the system can be described by the Markov chain shown in Fig. 4.
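Before deriving the closed-form state probabilities in Sec. 4.3, the M/M/N/N blocking behavior can be illustrated with a small Monte Carlo sketch. This is our own illustration; the rates below are the experimental values quoted later in Sec. 5.2:

```python
import numpy as np

rng = np.random.default_rng(0)

def blocking_probability(lam, mu, N, horizon=1e6):
    """Monte Carlo estimate of the M/M/N/N blocking (overload) probability.

    lam : arrival rate (targets/s); mu : departure rate (1 / mean residence);
    N   : simultaneous tracking capacity of the camera.
    """
    t, n = 0.0, 0                 # current time, targets currently tracked
    blocked = arrivals = 0
    while t < horizon:
        # Exponential time to the next event (arrival or one of n departures).
        rate = lam + n * mu
        t += rng.exponential(1.0 / rate)
        if rng.random() < lam / rate:   # next event is an arrival
            arrivals += 1
            if n == N:
                blocked += 1            # camera overload: target dropped
            else:
                n += 1
        else:                           # next event is a departure
            n -= 1
    return blocked / arrivals

# Example: lambda = 0.025 pps, mean residence 80 s, capacity N_obj = 4.
print(blocking_probability(0.025, 1.0 / 80, 4))  # close to Eq. (16)'s value
```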
Fig. 4. Illustration of the state transitions of an M/M/N/N queuing system, which is used to model a multi-object tracking system as more objects enter a camera's FOV.

4.2. Dynamic occlusion

For dynamic occlusion analysis, we adopt the probability of dynamic occlusion defined by Mittal and Davis [2004]. Objects are modeled as cylinders with a radius of r_obj
and a height of h_obj. Let the area of their projection onto the ground plane be fixed as A_ob = πr²_obj. Assume that the object of interest centered at the ith grid is observed by the jth camera from a distance D_ij; its region of occlusion is A_{o,ij} = 2 r_obj h_obj D_ij / T_{z,j}. The occlusion probability depends on the target density, which can be derived from our Markov chain based system model. Let A_j denote the FOV of the jth camera. The target density in the jth camera is given by:

$$K_{o,j} = \frac{\lambda_j}{A_j \mu_j}. \tag{10}$$

The occlusion probability at the ith grid observed by the jth camera, P_{do,ij}, can then be expressed as [Mittal and Davis, 2004]:

$$P_{do,ij} = 1 - \exp\left(-\frac{K_{o,j} A_{o,ij}\,(2 - K_{o,j} A_{ob})}{2(1 - K_{o,j} A_{ob})}\right), \tag{11}$$

where K_{o,j} is the object density of Eq. (10). Given the probability of dynamic occlusion at the ith grid observed from the jth camera, we can compute the overall probability of dynamic occlusion at the ith grid, P_{do,i}, by:

$$P_{do,i} = \prod_{j:\, a_{C,ij} x_j = 1} P_{do,ij}. \tag{12}$$
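A small Python sketch of Eqs. (10)-(12); note that Eq. (12) is reconstructed here as a product over the observing cameras (occlusion must occur in every camera that sees the grid), and all numeric values are illustrative:

```python
import math

def target_density(lam, mu, fov_area):
    """Target density K_{o,j} of Eq. (10) for one camera."""
    return lam / (fov_area * mu)

def p_occlusion_single(K_o, D, r_obj, h_obj, T_z):
    """Probability of dynamic occlusion P_{do,ij}, Eq. (11)."""
    A_ob = math.pi * r_obj ** 2           # ground-plane footprint
    A_o = 2 * r_obj * h_obj * D / T_z     # region of occlusion
    return 1 - math.exp(-K_o * A_o * (2 - K_o * A_ob)
                        / (2 * (1 - K_o * A_ob)))

def p_occlusion_grid(p_single_list):
    """Overall P_{do,i} of Eq. (12): occluded in every observing camera."""
    p = 1.0
    for p_ij in p_single_list:
        p *= p_ij
    return p

# Example: one grid seen by two cameras; the numbers are invented.
K = target_density(lam=0.025, mu=1.0 / 80, fov_area=50.0)
p1 = p_occlusion_single(K, D=8.0, r_obj=0.25, h_obj=1.7, T_z=3.0)
p2 = p_occlusion_single(K, D=12.0, r_obj=0.25, h_obj=1.7, T_z=3.0)
print(p_occlusion_grid([p1, p2]))
```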
The objective function becomes:

$$c_i = w_C(c_{C,i} > 0) + w_H(c_{H,i} = 2) - w_V(c_{V,i} > 1) + w_{do}(P_{do,i} \le P_{do,th}), \tag{13}$$

where P_{do,th} is a predefined threshold and w_{do} is the importance weight.

4.3. Camera overload

Given the probability of the (n−1)th state, P_{n−1,j}, of the Markov chain, the probability of the nth state, P_{n,j}, is expressed as:

$$P_{n,j} = \frac{\lambda_j}{n\mu_j} P_{n-1,j} = \frac{1}{n!}\left(\frac{\lambda_j}{\mu_j}\right)^n P_{0,j}, \tag{14}$$

for 1 ≤ n ≤ N_{obj,j}, where P_{0,j} is a normalization term that makes the probabilities of all possible states sum to one:

$$P_{0,j} = \left[\sum_{n=0}^{N_{obj,j}} \frac{1}{n!}\left(\frac{\lambda_j}{\mu_j}\right)^n\right]^{-1}. \tag{15}$$

The probability that the jth camera reaches its maximum computational capacity is the probability that the Markov chain reaches the N_{obj,j}th state:

$$P_{max,j} = \frac{\dfrac{1}{N_{obj,j}!}\left(\dfrac{\lambda_j}{\mu_j}\right)^{N_{obj,j}}}{\displaystyle\sum_{n=0}^{N_{obj,j}} \dfrac{1}{n!}\left(\dfrac{\lambda_j}{\mu_j}\right)^n}. \tag{16}$$

Let the average arrival rate at the ith grid be λ_{g,i} and the mean camera-residence time be 1/μ_{g,i}. The ith grid can be observed from multiple cameras. The probability of camera overload at the ith grid, P_{co,i}, is the probability that all the observing cameras have reached their maximum computational load when a new object appears at the ith grid:

$$P_{co,i} = \left(1 - e^{-\lambda_{g,i}/\mu_{g,i}}\right) \prod_{j:\, a_{C,ij} x_j = 1} P_{max,j}. \tag{17}$$

The objective function becomes:

$$c_i = w_C(c_{C,i} > 0) + w_H(c_{H,i} = 2) - w_V(c_{V,i} > 1) + w_{co}(P_{co,i} \le P_{co,th}), \tag{18}$$

where P_{co,th} is a predefined threshold and w_{co} is the importance weight.
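The overload probabilities of Eqs. (16) and (17) have simple closed forms (Eq. (16) is the Erlang-B blocking formula); a minimal sketch, with illustrative rates:

```python
import math

def p_max(lam, mu, N_obj):
    """Overload probability of Eq. (16): the probability that the
    camera's Markov chain sits in its last state N_obj (Erlang B)."""
    rho = lam / mu
    terms = [rho ** n / math.factorial(n) for n in range(N_obj + 1)]
    return terms[-1] / sum(terms)

def p_overload(lam_g, mu_g, p_max_list):
    """Probability of camera overload at a grid, Eq. (17): a new object
    appears there and every observing camera is already at capacity."""
    p_arrival = 1.0 - math.exp(-lam_g / mu_g)
    prod = 1.0
    for p in p_max_list:
        prod *= p
    return p_arrival * prod

# Example: two cameras observing the grid, lambda = 0.025 pps,
# mean residence time 80 s, capacity N_obj = 4 targets per camera.
pm = p_max(0.025, 1.0 / 80, 4)          # about 0.095
print(pm, p_overload(0.025, 1.0 / 80, [pm, pm]))
```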
Fig. 5. (a) Illustration of a PTZ camera's instant (lightly shaded lines) and achievable (black circle) FOVs; (b) illustration of the reachable regions of a PTZ camera as defined in Erdem and Sclaroff's algorithm; (c) illustration of the proposed dynamic modeling of a PTZ camera with N_obj,j = 1. Red circles and green dots depict untracked and tracked objects' trajectories.
Note that the computation of the probability of camera overload depends on the target's arrival rate and residence time, which describe the target's dynamic distribution in the environment. Therefore, the resulting optimal camera placement not only depends on the environment's geometry but also adjusts to the target's dynamics.
4.4. Dynamic modeling of PTZ cameras

The significance of introducing camera overload analysis becomes especially obvious for PTZ cameras. As mentioned before, camera placement algorithms find it difficult to properly model a PTZ camera's instant and achievable FOVs. Erdem and Sclaroff [2006] defined the reachable region to model PTZ cameras. It is assumed that a PTZ camera has two end points for panning; the reachable region corresponds to the intersection of the areas that the PTZ camera can reach by panning from these two end points within a given period of time, as shown in Fig. 5(b). The reachable region represents the camera's coverage in the worst case, which leads to a camera placement with excessive overlap among achievable FOVs.

In this paper, the discrepancy between a PTZ camera's instant and achievable FOVs is resolved elegantly by letting N_obj,j = 1. That is, at any given time instant, a single PTZ camera is able to track a single object within its 360° × 90° achievable FOV. Figure 5(c) illustrates a scenario where a PTZ camera is tracking an object in its instant FOV when a second object enters its achievable FOV; once the first object leaves the camera's achievable FOV, the camera turns toward the second object and begins tracking it. This seemingly trivial assumption of N_obj,j = 1 suffices to describe the dynamic interaction between a PTZ camera and the objects of interest: the achievable FOV applies when the number of currently tracked objects is zero, whereas the limited instant FOV corresponds to the achievable FOV once the maximum number of simultaneously tracked objects has been reached. Therefore, the achievable FOV with N_obj,j = 1 is sufficient to model both the instant and achievable FOVs of PTZ cameras for sensor planning.

In addition to resolving the dilemma of PTZ cameras' achievable and instant FOVs, our modeling also incorporates the analysis of PTZ cameras into a unified framework along with static perspective cameras. The only difference is the assumption regarding the maximum number of targets that can be tracked simultaneously: N_obj,j ≥ 1 for a static camera and N_obj,j = 1 for a PTZ camera.
5. Experimental Results In this section, our experimental results using static perspective and PTZ cameras are
presented and compared with the reference algorithm proposed by Erdem and Sclaroff [2006]. Both dynamic occlusion and camera overload are included in the objective function for the search for the optimal camera placement:

$$c_i = w_C(c_{C,i} > 0) + w_H(c_{H,i} = 2) - w_V(c_{V,i} > 1) + w_{do}(P_{do,i} \le P_{do,th}) + w_{co}(P_{co,i} \le P_{co,th}). \tag{19}$$
Two criteria are used to evaluate and compare the performances of various algorithms: coverage and handoff success rate. Handoff success rate denotes the ratio between the number of successful handoffs and the total number of handoff requests. Compared to Erdem and Sclaroff’s method, a consistently improved handoff success rate is expected from our proposed algorithm.
5.1. Parameter selection

There are two sets of parameters: normalization coefficients (α_R and α_D) and importance weights (w_R, w_D, w_C, w_H, w_V, w_do, and w_co). The goal of choosing appropriate normalization coefficients is to provide a uniform comparison basis for different types of cameras and for cameras with various intrinsic and extrinsic parameters. In so doing, sensor planning and camera handoff can be conducted independently of the actual types of cameras selected. In general, we normalize the M_R,ij and M_D,ij components to the range [0, 1], which leads to α_R = 1/max_j{−T_{z,j}/tan θ_{T,j}} and α_D = 0.5.

Unlike the normalization coefficients, whose selection depends on the characteristics of the cameras used, the selection of the importance weights is application dependent. We purposely leave users the freedom to choose different importance weights according to their specific requirements, which increases our algorithm's flexibility; default values are used if the corresponding variables are not specified. The default values are w_R = w_D = 0.5 and w_do = w_co = 0.2. The weights w_C, w_H, and w_V can be computed such that the turning point is placed at the middle point between the contours defined by Q_T and Q_F. To save space, we refer the reader to [Yao et al., 2008] for details on parameter selection. In the following experiments, the importance weights are set to their default values.
5.2. Experimental methodology The floor plans under test are shown in Fig. 6. The floor plan in Fig. 6(a) represents two types of environments commonly encountered in practical surveillance: space with obstacles (region
A, illustrated in yellow) and open space where pedestrians can move freely (region B, illustrated in green). Region B is deliberately included because it imposes more challenges on camera placement when handoff success rate is considered: camera handoff is relatively easy along a predefined path, whereas for subjects that move freely a handoff may be triggered at any point in the camera's FOV. Figure 6(b) illustrates an environment with a predefined path along which workers proceed in a fixed sequence. In the following experiments, we refer to these two floor plans as plans A and B.

Fig. 6. Tested floor plans. Two office floor plans: (a) without path constraints and (b) with path constraints.

To obtain a statistically valid estimate of the handoff success rate, simulations are carried out to enable a large number of tests under various conditions. A pedestrian behavior simulator [Antonini et al., 2006; Pettre et al., 2002] is implemented to closely resemble experiments in real environments and, in turn, to obtain an accurate estimate of the handoff success rate; to save space, interested readers can refer to the original papers for details. In our experiments, the arrival of pedestrians follows a Poisson distribution, and the average walking speed is 0.5 m/s. Several points of interest are generated randomly to form a pedestrian trace; Fig. 6 depicts some randomly generated traces.

Since one of the major advantages of our algorithm is the consideration of targets' statistical distribution, we focus on testing and comparing the algorithms' performance in environments with a variety of target densities. In theory, given a proper estimate of the targets' arrival rate in the environment, the same value should be used in sensor planning. In our experiments, however, we purposely use two sets of arrival rates. One set is used in sensor planning: three camera placements are generated with λ = 0.01 person/second (pps), 0.025 pps, and 0.05 pps. The other set is used in the simulation of target behavior: the tested arrival rates vary from 0.01 pps to 0.05 pps, representing environments with low to high target density. The residence time depends on the pedestrian walking speed and the average trajectory length; from a statistical study of the geometry of the environment, the average residence time is 80 s.

According to queuing theory, λ_j/(μ_j N_obj,j) must be less than one to maintain the equilibrium state of the Markov chain. Therefore, given a fixed μ_j, the object arrival rate is bounded by μ_j N_obj,j. This requirement relates the maximum number of objects N_max in the FOV of one camera to the maximum number of objects that one camera can handle. In our experiments, we have N_obj = 4 and N_obj = 1 for static and PTZ cameras, respectively. Under such a constraint, an arrival rate of 0.05 pps is the maximum rate that our sensor planning algorithm can handle; for more crowded scenarios, N_obj should be increased. Although the arrival rate is one controlling parameter in our experiments, it does not directly describe how crowded the environment is. Therefore, we use the maximum number of objects in the FOV of one camera, N_max, to describe the target density instead of the arrival rate. In addition, to remove the dependence of the performance evaluation on the camera's maximum tracking capability, N_max is divided by the maximum number of objects that a camera can handle. In brief, we use the ratio N_max/N_obj to describe the target density; a simulation with N_max/N_obj > 1 represents a scenario that exceeds the tracking capability of a static camera.

It is also desirable to investigate the ability of the proposed algorithm to handle crowded scenes. It has been shown that 0.30 persons per square meter is a typical target density for free-flow traffic, where pedestrians can still freely select their own walking speed, bypass slower-moving people, and readily avoid conflicts when crossing in front of others [Fruin]; beyond 0.30 persons per square meter, traffic congestion may occur. Therefore, we selected 0.30 persons per square meter as the average target density for crowded scenes. Given an FOV of 50 m², the maximum number of targets is 15, so a camera with the ability to track 15 targets is needed, i.e., N_obj = 15. We employ different values of target arrival rates to verify that the camera placement with a
certain arrival rate is able to maintain the handoff success rate for environments with an arrival rate up to the value used in sensor planning. For instance, the handoff success rate of a camera placement with λ = 0.025 pps should be maintained for environments with an arrival rate less than 0.025 pps or a maximum number of targets less than four.
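As a quick worked check of the equilibrium bound above (our own arithmetic, using the experimental values already quoted):

$$\frac{\lambda_j}{\mu_j N_{obj,j}} = \frac{0.05 \text{ pps} \times 80 \text{ s}}{4} = 1,$$

so λ = 0.05 pps with a mean residence time of 80 s and N_obj = 4 sits exactly at the stability boundary, which is why 0.05 pps is the largest arrival rate the planner can handle without increasing N_obj.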
5.3. Results

Figures 7 and 8 illustrate the optimal camera placements obtained by the reference algorithm and by our algorithm with various target densities for floor plans A and B, respectively. Note that the optimization of the cameras' parameters is not restricted to 2D: the optimized position of a camera includes both T_x,j/T_y,j and T_z,j (height). For clarity, Figs. 7 and 8 show the FOVs of the cameras at the optimal positions projected onto the 2D ground plane.

We can see from Fig. 7 that as the maximum number of targets in the environment increases from one to six, the handoff success rate of the reference method drops gradually from 74.1% to 44.2%. In contrast, the handoff success rate of our method stays above 90.0% for the camera placement with λ = 0.025 pps (λ = 0.05 pps) until the maximum number of targets reaches four (six). As expected, with different density parameters used in Eq. (16), the resulting camera placements yield different capacities for handling clustered environments. To examine the ability of the proposed sensor planning algorithm to handle crowded scenes, the simulated target density is increased to 0.3 persons per square meter, and the corresponding N_obj is set to 15. Using the same sensor placement, the handoff success rate shown in Fig. 7(e) presents performance similar to the scenarios with lower target density and N_obj = 4. This verifies the capability of the proposed sensor planning algorithm to deal with crowded scenes, provided the camera itself is able to handle crowds.

Figure 9 compares sample frames from two cameras with and without sufficient handoff margins. If only coverage is taken into
account, as shown in Fig. 9(a), the object of interest is lost before the left camera is able to identify the subject and cooperate with the right camera. With sufficient handoff margins, as shown in Fig. 9(b), the object of interest can be detected and labeled correctly before it becomes unidentifiable in the right camera.

Figure 10 illustrates our experimental results using PTZ cameras. Figures 10(b) and 10(c) show the camera placements obtained by our method with different target densities. A larger arrival rate/target density setting leads to a camera placement with more overlapped FOVs between adjacent cameras, so that a tracked target has more freedom to be transferred to another camera when experiencing dynamic occlusion and/or camera overload. The advantage of our method over the reference method becomes clear when we examine the handoff success rate with respect to the maximum number of targets in the environment, shown in Fig. 10(d). When the maximum number of targets is one, our method elevates the handoff success rate from 48.7% to 100% while maintaining similar coverage. As the maximum number of targets increases from one to six, the handoff success rate of the reference method drops gradually from 48.7% to 10.2%. In contrast, the handoff success rate of our method stays above 90% for the camera placement with λ = 0.025 pps (λ = 0.05 pps) until the maximum number of targets reaches four (six). The proposed algorithm is therefore able to achieve and maintain a significantly higher handoff success rate according to the target density in the environment.

In addition to simulation results, we also conducted real-time experiments based on the sensor placement derived from the proposed algorithm for floor plan A. However, producing results using real sequences requires specially designed handoff algorithms, which is beyond the scope of this paper. Interested readers may refer to [Chen et al., 2009], where a camera handoff algorithm based on the observation measure and the probability of camera overload is proposed and real-time experimental results of multi-camera multi-object tracking are demonstrated.
Fig. 7. Sensor planning results for floor plan A considering various target densities. The optimal camera positioning from: (a) the reference method (coverage: 98.5%), (b) our method with λ = 0.01 pps (coverage: 89.9%), and (c) our method with λ = 0.05 pps (coverage: 89.6%). System performance comparison based on handoff success rate with various target densities: (d) N_obj = 4 and (e) N_obj = 15. Target density is described by the maximum number of targets to be tracked simultaneously in the environment.
Fig. 8. Sensor planning results for floor plan B considering various target densities. The optimal camera positioning from: (a) the reference method (coverage: 94.1%), (b) our method with λ = 0.01 pps (coverage: 90.5%), and (c) our method with λ = 0.05 pps (coverage: 86.7%). (d) System performance comparison based on handoff success rate with various target densities.
The computational complexity of the proposed and reference algorithms consists of the computation of the objective function and the optimization process. With the same optimization algorithm deployed, the computation of the objective function dominates, at O(N_g N_c) for both algorithms. Therefore, the proposed algorithm has a computational complexity similar to that of the reference method for solving the Max-Coverage problem.
6. Conclusions

A sensor planning algorithm designed for both static perspective and PTZ cameras was proposed to achieve the optimal balance between coverage and handoff success rate. The probability of camera overload was derived and employed to resolve the discrepancy between a PTZ camera's instant and achievable FOVs. The probability of camera overload also introduced into sensor planning additional considerations regarding the dynamic interaction between targets and observing cameras.
Fig. 9. Illustration of sufficient safety margin for continuous and automated handoff using perspective cameras. Sample frames from two cameras (a) when only coverage is considered and (b) when both coverage and handoff are considered. The object of interest is visible in the right camera at t_o. In (a), the object of interest is lost in the right camera as it moves and becomes visible in the left camera at t_o + ∆t; the margin is insufficient for a successful handoff. In (b), the object of interest remains visible in the right camera at t_o + ∆t, which ensures a successful handoff to the left camera.
The proposed sensor planning algorithm not only considered the PTZ camera's dynamics from panning and tilting but also incorporated the targets' dynamics. Experimental results demonstrated a significantly and consistently improved handoff success rate in comparison with the reference algorithm described by Erdem and Sclaroff in environments with various target dynamics. The proposed algorithm presents superior and robust performance regardless of target density.

Fig. 10. Sensor planning results for PTZ cameras considering various target densities. The optimal camera positioning from: (a) the reference method (coverage: 100.0%), (b) our method with λ = 0.01 pps (coverage: 99.5%), and (c) our method with λ = 0.05 pps (coverage: 100.0%). (d) System performance comparison based on handoff success rate with various target densities.
Acknowledgment

This work was supported in part by the University Research Program in Robotics under grant DOE-DE-FG52-2004NA25589.
References

Angella, F., Reithler, L. and Gallesio, F. [2007] "Optimal deployment of cameras for video surveillance systems," in IEEE Int'l Conf. on Advanced Video and Signal Based Surveillance, London, United Kingdom, pp. 388–392.

Antonini, G., Venegas, S., Bierlaire, M. and Thiran, J. [2006] "Behavioral priors for detection and tracking of pedestrians in video sequences," Int'l Journal of Computer Vision 69, 159–180.

Bodor, R., Drenner, A., Schrater, P. and Papanikolopoulos, N. [2007] "Optimal camera placement for automated surveillance tasks," Journal of Intelligent and Robotic Systems 50, 257–295.

Bodor, R., Schrater, P. and Papanikolopoulos, N. [2005] "Multi-camera positioning to optimize task observability," in IEEE Conf. on Advanced Video and Signal Based Surveillance, Como, Italy, pp. 552–557.

Cai, Q. and Aggarwal, J. K. [1999] "Tracking human motion in structured environments using a distributed-camera system," IEEE Trans. on Pattern Analysis and Machine Intelligence 21(11), 1241–1247.

Chen, C.-H., Yao, Y., Page, D., Abidi, B., Koschan, A. and Abidi, M. [2008] "Camera handoff with adaptive resource management for multi-camera surveillance," in IEEE Conf. on Advanced Video and Signal Based Surveillance, Santa Fe, NM.

Chen, C.-H., Yao, Y., Page, D., Abidi, B., Koschan, A. and Abidi, M. [2009] "Camera handoff with adaptive resource management for multi-camera multi-object tracking," Image and Vision Computing, doi:10.1016/j.imavis.2009.10.013.

Chen, S., Li, Y., Zhang, J. and Wang, W. [2008] Active Sensor Planning for Multiview Vision Tasks, Springer-Verlag.

Chen, S. Y. and Li, Y. F. [2004] "Automatic sensor placement for model-based robot vision," IEEE Trans. on Systems, Man, and Cybernetics, Part B: Cybernetics 34, 393–408.

Chen, X. and Davis, J. "An occlusion metric for selecting robust camera configurations," Machine Vision and Applications.

David, P., Idasiak, V. and Kratz, F. [2007] "A sensor placement approach for the monitoring of indoor scenes," in European Conf. on Smart Sensing and Context, Kendal, England, pp. 110–125.

Dunn, E., Olague, G. and Lutton, E. [2006] "Parisian camera placement for vision metrology," Pattern Recognition Letters 27, 1209–1219.

Erdem, U. and Sclaroff, S. [2006] "Automated camera layout to satisfy task-specific and floor plan-specific coverage requirements," Computer Vision and Image Understanding 103, 156–169.

Fiore, L., Fehr, D., Bodor, R., Drenner, A., Somasundaram, G. and Papanikolopoulos, N. [2008]
"Multi-camera human activity monitoring," Journal of Intelligent and Robotic Systems 52, 5–43.

Fruin, J. J. Pedestrian Planning and Design, Elevator World.

Guo, Z., Zhuo, M. and Jiang, G. [2008] "Adaptive sensor placement and boundary estimation for monitoring mass objects," IEEE Trans. on Systems, Man, and Cybernetics, Part B: Cybernetics 38, 222–232.

Gupta, A., Mittal, A. and Davis, L. S. [2007] "COST: an approach for camera selection and multi-object inference ordering in dynamic scenes," in IEEE Int'l Conf. on Computer Vision, Rio de Janeiro, Brazil.

Horster, E. and Lienhart, R. [2006] "On the optimal placement of multiple visual sensors," in ACM Int'l Workshop on Video Surveillance and Sensor Networks, Santa Barbara, CA, pp. 111–120.

Isler, V., Khanna, S., Spletzer, J. and Taylor, C. J. [2005] "Target tracking with distributed sensors: the focus of attention problem," Computer Vision and Image Understanding 100(1–2), 225–247.

Lim, S.-N., Davis, L. and Mittal, A. [2007] "Task scheduling in large camera networks," in Asian Conf. on Computer Vision.

Lim, S.-N., Mittal, A. and Davis, L. [2005] "Constructing task visibility intervals for a surveillance system," in ACM Int'l Workshop on Video Surveillance and Sensor Networks.

Mittal, A. and Davis, L. S. [2004] "Visibility analysis and sensor planning in dynamic environments," in European Conf. on Computer Vision, Prague, Czech Republic, pp. 175–189.

Panneerselvam, R. [2004] Research Methodology, Prentice Hall.

Pavlidis, I., Morellas, V., Tsiamyrtzis, P. and Harp, S. [2001] "Urban surveillance systems: from the laboratory to the commercial world," Proceedings of the IEEE 89, 1478–1497.

Pettre, J., Simeon, T. and Laumond, J. P. [2002] "Planning human walk in virtual environments," in IEEE/RSJ Int'l Conf. on Intelligent Robots and Systems, Lausanne, Switzerland, pp. 3048–3053.

Qureshi, F. Z. and Terzopoulos, D. [2005] "Towards intelligent camera networks: a virtual vision approach," in IEEE Int'l Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, Beijing, China, pp. 177–184.

Ram, S., Ramakrishnan, K. R., Atrey, P. K., Singh, V. K. and Kankanhalli, M. S. [2006] "A design methodology for selection and placement of sensors in multimedia surveillance systems," in ACM Int'l Workshop on Video Surveillance and Sensor Networks, Santa Barbara, CA, pp. 121–130.

Roy, S. D., Chaudhury, S. and Banerjee, S. [2004] "Active recognition through next view planning: a survey," Pattern Recognition 37, 429–446.

Roy, S. D., Chaudhury, S. and Banerjee, S. [2005] "Recognizing large isolated 3-D objects through next view planning using inner camera invariants," IEEE Trans. on Systems, Man, and Cybernetics 35, 282–292.

Saadatseresht, M., Samadzadegan, F. and Azizi, A. [2005] "Automatic camera placement in vision metrology based on a fuzzy inference system," Photogrammetric Engineering and Remote Sensing 71(12), 1375–1385.

Tosic, I. and Frossard, P. [2006] "Omnidirectional views selection for scene representation," in IEEE Int'l Conf. on Image Processing, Atlanta, GA, pp. 2213–2216.

Trivedi, K. S. [2001] Probability and Statistics with Reliability, Queuing and Computer Science Applications, Wiley-Interscience.

Wong, L. M., Dumont, C. and Abidi, M. A. [1999] "Next best view system in a 3D object modeling task," in IEEE Int'l Symposium on Computational Intelligence in Robotics and Automation, Monterey, CA, pp. 306–311.

Yao, Y., Chen, C.-H., Abidi, B., Page, D., Koschan, A. and Abidi, M. [2008] "Sensor planning for automated and persistent object tracking with multiple cameras," in IEEE Conf. on Computer Vision and Pattern Recognition, Anchorage, AK.

Yao, Y., Chen, C.-H., Abidi, B., Page, D., Koschan, A. and Abidi, M. [2008] "Sensor planning for PTZ cameras using the probability of camera overload," in Int'l Conf. on Pattern Recognition, Tampa, FL.

Yao, Y., Chen, C.-H., Abidi, B., Page, D., Koschan, A. and Abidi, M. [2010] "Can you see me now? Sensor positioning for automated and persistent surveillance," IEEE Trans. on Systems, Man, and Cybernetics, Part B: Cybernetics 40(1), 101–115.

Yous, S., Ukita, N. and Kidode, M. [2006] "Multiple active camera assignment for high fidelity 3D video," in IEEE Int'l Conf. on Computer Vision Systems, New York, NY.
methodology for selection and placement of sensors in multimedia surveillance systems,” in ACM Intl Workshop on Video Surveillance and Sensor Networks, Santa Barbara, CA, pp. 121–130. Roy, S. D., Chaudhury, S. and Banerjee, S. [2004] “Active recognition through next view planning: a survey,” Pattern Recognition 37, 429–446. Roy, S. D., Chaudhury, S. and Banerjee, S. [2005] “Recognizing large isolated 3-D objects through next view planning using inner camera invariants,” IEEE Trans. on Systems, Man, and Cybernetics 35, 282–292. Saadatseresht, M., Samadzadegan, F. and Azizi, A. [2005] “Automatic camera placement in vision metrology based on a fuzzy inference system,” Photogrammetric Engineering and Remote Sensing 71(12), 1375–1385. Tivedi, K. S. [2001] Probability and Statistics with Reliability, Queuing and Computer Science Applications, Wiley-Interscience. Tosic, I. and Frossard, P. [2006] “Omnidirectional views selection for scene representation,” in IEEE Intl Conf. on Image Processing, Atlanta, GA, pp. 2213–2216. Wong, L. M., Dumont, C. and Abidi, M. A. [1999] “Next best view system in a 3D object modeling task,” in IEEE Int’l Symposium on Computational Intelligence in Robotics and Automation, Monterey, CA, pp. 306–311. Yao, Y., Chen, C.-H., Abidi, B., Page, D., Koschan, A. and Abidi, M. [2008] “Sensor planning for automated and persistent object tracking with multiple cameras,” in IEEE Conf. on Computer Vision and Pattern Recognition, Anchorage, AK. Yao, Y., Chen, C.-H., Abidi, B., Page, D., Koschan, A. and Abidi, M. [2008] “Sensor planning for PTZ cameras using the probability of camera overload,” in Int’l Conf. on Pattern Recognition, Tampa, FL. Yao, Y., Chen, C.-H., Abidi, B., Page, D., Koschan, A. and Abidi, M. [2010] “Can you see me now? sensor positioning for automated and persistent surveillance,” IEEE Trans. on Systems, Man, and Cybernetics, Part B: Cybernetics 40(1), 101–115. Yous, S., Ukita, N. and Kidode, M. [2006] “Multiple active camera assignment for high fidelity 3D video,” in IEEE Int’l Conf. on Computer Vision Systems, New York, NY.