Validation of Blind Region Learning and Tracking

James Black, Dimitrios Makris, Tim Ellis
Digital Imaging Research Centre, Kingston University, Kingston-upon-Thames, Surrey, UK
[email protected], {d.makris, t.ellis}@kingston.ac.uk

Abstract

Multi view tracking systems enable an object's identity to be preserved as it moves through a wide area surveillance network of cameras. One limitation of these systems is an inability to track objects between blind regions, i.e. parts of the scene that are not observable by the network of cameras. Recent interest has been shown in blind region learning and tracking, but little work has been reported on the systematic performance evaluation of these algorithms. The main contribution of this paper is to define a set of novel techniques that can be employed to validate a camera topology model and a blind region multi view tracking algorithm.

1. Introduction

Multi view tracking systems offer several advantages in a surveillance system. Firstly, it is possible to fuse information from multiple sensors and assign a unique identity to objects that are visible in two or more overlapping camera views. Secondly, assuming the spatial relations between the cameras are known, it is possible to coordinate object tracking between two or more non-overlapping views. This allows an object's identity to be preserved when it moves through the blind region between two camera views. In this paper we focus on how to automatically validate a camera network model using only within-camera tracking data. We will also discuss how such a model can be applied within an agent based framework to support multi view tracking across blind regions and improve tracking performance. Several strategies have been employed for coordinating object tracking between multiple overlapping camera views [1,8]. In [1] it was assumed that 3D ground plane calibration information was available for each camera using a common world coordinate system, and information was integrated between the camera views using a homography constraint. In [8] a method was proposed that could automatically identify the limits of the field of view of each camera by analysis of training data. Once the model has been determined it is possible to hand over objects between overlapping views.

A weakness of the methods in [1,8] is their limited support for tracking objects between non-overlapping views. In [1] a 3D Kalman filter was suggested for predictive tracking across blind regions, but this has serious limitations when the transition times exceed a few seconds. Recently there has been some interest in tracking objects between blind regions [2,4,5,6,7,11,12,13,14,15]. In [2] a system is described that can track objects across blind regions using a Kalman filter. It is assumed that the ground plane is known between each of the non-overlapping views, and motion and shape cues are used to match objects across blind regions. In [4,11] the camera topology is learned by the temporal correlation of exit and entry events. The major exit and entry zones in each camera are identified by clustering the start and end points of tracked object trajectories using an Expectation Maximisation (EM) algorithm [9,10]. A system is presented in [5] that uses a combination of spatio-temporal and object appearance information to track objects between blind regions. Parzen windows were used to estimate the transition time probability distribution from a set of training data. The approach adopted in [6] automatically learns salient transition times between non-overlapping cameras, or long-term occlusion regions within the same camera. The transition times are estimated by fuzzy histogram matching based on colour appearance information in the Munsell colour space. Once the model has been learnt it is possible to track objects across occlusion regions within the same camera or across blind regions between two cameras. A Bayesian formulation is proposed in [7] to coordinate tracking across blind regions on a traffic highway, although it is assumed that the camera topology, transition times, and transition probabilities are known in advance. In [12] a Markov Chain Monte Carlo (MCMC) method is used for object identification across a multi camera network. The MCMC approach allows accurate estimation of the origin/destination transition times even when individual links in the sensor chain are unreliable. In [13] a method is described that uses a combination of spatio-temporal and appearance cues to track objects between non-overlapping camera views. A correspondence based colour calibration model is defined, which allows an object's appearance to be predicted across blind regions.

A correlation matrix is defined between histograms of the same object observed in different camera views, and dynamic programming is used to estimate the optimal alignment function between the appearance histograms of the objects seen in different views. In [15] a transition correspondence model (TCM) is defined that estimates the likelihood of links existing between the sinks and sources of two non-overlapping camera views. The sources and sinks in each view are automatically identified by simultaneously estimating the optimal within-camera track stitching and source and sink assignment [14]. The claimed advantage over [4] and [11] is that the algorithm can handle signal correlations that are not stationary, which can occur in scenes with variable transition times due to constraints imposed by heavily congested traffic flow and stop lights. The method has currently only been evaluated using synthetic data. One common characteristic of the methods discussed in [2,5,6,7,13] is that they require some form of supervision, or rely on known object correspondences in training data between non-overlapping views. Ideally we want a solution that is unsupervised and operates in a correspondence free manner, in order to increase the robustness of the algorithm. Among the methods discussed, only [4,11,12,15] satisfy these properties.

The remainder of this paper has the following structure. In Section 2 we summarise the key points of an algorithm that can automatically learn the camera topology and transition times between a set of widely separated views, which has been discussed in previous work [4,11]. Section 3 describes how this model can be employed within an agent based framework for coordinating object tracking through blind regions. Section 4 contains a discussion of performance evaluation and results. We first discuss how within-camera tracking data can be employed to validate the learning algorithm for network calibration; this novel methodology can be used to evaluate any similar unsupervised learning algorithm and assess the accuracy of its transition time estimates. Secondly, we compare the performance of blind region tracking using a 3D Kalman filter and the agent based handover reasoning framework presented in Section 3, and demonstrate how the network calibration model can improve the performance of a multi view tracking algorithm. Finally, in Section 5 we summarise the key and novel points of the paper and our main conclusions on the current work.

2. Multiple Camera Activity Network

We automatically learn a Multiple Camera Activity Network (MCAN) from observations. The MCAN is then used to estimate the camera network calibration, and the transition times and probabilities that are used for blind tracking between cameras. A short description of the MCAN is given below; more details can be found in [11]. A MCAN is formed from the set of all the entry/exit zones that are estimated within all the cameras of the system. The entry/exit zones of each camera view (see Figure 1) are learnt automatically using [9]. A link between nodes i and j represents the activity between them and is expressed by a transition probability function $a_{ij}(\tau)$, where $\tau$ is the transition time. We proposed a statistical correspondence-free method to estimate the transition probabilities and times of the MCAN. Specifically, the entry/exit events for each pair of nodes i and j are correlated in the time domain and the cross-correlation function $R_{ij}(\tau)$ is used to provide an estimate of $a_{ij}(\tau)$ (see Eq. 4 in [11]). To further improve the accuracy of the estimated transition probabilities, the median absolute deviation of the function $R_{ij}(\tau)$ is used as an unbiased estimate of the noise:

$$n = E\{\,|R_{ij}(\tau) - \mathrm{median}(R_{ij}(\tau))|\,\} \qquad (1)$$

Then, the noise of the cross-correlation function is suppressed according to the formula:

$$R_{ij}(\tau) = \mathrm{median}(R_{ij}(\tau)), \quad \forall \tau : R_{ij}(\tau) < s \cdot n \qquad (2)$$

where s is a scale factor. In Section 4, we evaluate the result of the noise suppression for the values s=1 and s=2, against the original method. These values allow the outliers of the function $R_{ij}(\tau)$ to stand out as indicators of activity links; their selection is also justified by the results in Section 4. The implementation of the learning method is based on the accumulation of co-occurring events in a transition time histogram for each possible pair of nodes. In our experiments the bins of the histogram are one second wide, therefore the transition times are estimated to an accuracy of one second. The Network Camera Calibration is derived from the transition times of the MCAN. Specifically, two cameras have overlapping views if at least one transition time from an exit zone to an entry zone is negative. Otherwise, they have non-overlapping views and their separation can be defined in temporal terms, based on the minimum transition time from one camera to the other.
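To make the procedure concrete, the sketch below (Python, with illustrative function and variable names that are not taken from the paper) accumulates exit/entry event co-occurrences into a one-second transition time histogram and applies the median based noise suppression of Eq. (2). It is a minimal reading of the method under these assumptions, not the authors' implementation.

```python
import numpy as np

def transition_histogram(exit_times, entry_times, max_lag=50):
    """Accumulate co-occurring exit (node i) / entry (node j) events into a
    transition-time histogram with one-second bins over [-max_lag, +max_lag]."""
    bins = np.arange(-max_lag, max_lag + 2) - 0.5          # 101 bin edges, 1 s wide
    lags = [te - tx for tx in exit_times for te in entry_times
            if abs(te - tx) <= max_lag]                     # candidate transition times
    hist, _ = np.histogram(lags, bins=bins)
    return hist.astype(float)

def suppress_noise(R, s=2.0):
    """Noise suppression of Eq. (2): values of the cross-correlation below
    s times the noise estimate of Eq. (1) are replaced by the median."""
    med = np.median(R)
    n = np.mean(np.abs(R - med))                            # noise estimate, Eq. (1)
    R = R.copy()
    R[R < s * n] = med
    return R

# Toy usage: exit events at node i and entry events at node j (seconds).
exits = [10.0, 42.0, 95.0]
entries = [14.0, 46.5, 99.0, 120.0]
R_ij = suppress_noise(transition_histogram(exits, entries))
tau = np.argmax(R_ij) - 50                                  # most popular transition time
print("peak transition time: %d s" % tau)
```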

The MCAN is also used for blind tracking through the "gaps" between non-overlapping views, as described in the next section.

Figure 1: The entry/exit zones that were identified in the views of the six-camera network.

3. Multi View Blind Region Tracking

In a typical image surveillance network the cameras are usually organised so as to maximise the total field of coverage. As a consequence there can be several cameras in the surveillance network that are separated by a short temporal and spatial distance, or have minimal overlap. In these situations the system needs to track an object when it leaves the field of view of one camera and then re-enters the field of view of another after a short temporal delay. For transition times of less than two seconds the trajectory prediction of the Kalman filter can be used to predict where the object should become visible again to the system. However, if the object changes direction significantly or disappears for a longer time period this approach is unreliable. In order to handle these cases our system uses an object handover policy between each pair of non-overlapping cameras. The system waits for a new object to be created in the adjacent camera view, and a data association method is applied to check the temporal constraints of the object's exit from and re-entry into the camera network. In order to facilitate the object handover reasoning process a model of the major exit and entry regions is constructed for each pair of adjacent non-overlapping camera views.

These models can be hand crafted or automatically learned by analysis of entry and exit events, as was discussed in Section 2, and can be used to improve the performance of the object handover reasoning process. When an object is terminated within an exit region the system uses the camera network model to determine the regions where the object is most likely to reappear. The main benefit of using the model for handover reasoning is that it reduces the computational complexity of the data association process, since the model is used to focus attention on the major entry and exit regions where object handover is most likely to occur. In addition, if the two cameras are calibrated in different world coordinates the system can still track objects, since the model uses temporal properties to perform data association. The object handover region models consist of a linked entry and exit region along with the expected transition time between each region. The temporal delay can be determined manually by observation, or by generating statistics from the data collected over extended time periods. The temporal delay gives an indication of the transition time for the handover region. Each entry or exit region is modelled as a Gaussian $N(\mu, \Sigma)$, where $\mu = (x, y)$ is the centre of the distribution in 2D image coordinates and $\Sigma$ is the spatial covariance of the distribution. The following convention is used to describe the major entry and exit regions in each camera view:

$X_i^k$ is the kth exit region in the ith camera view.

$E_j^l$ is the lth entry region in the jth camera view. Given the set of major exit and entry regions in each camera, the following convention is used to define the handover regions between the non-overlapping camera views:

$H_{ij}^p = (X_i^k, E_j^l, t_{ij}, \sigma_{ij}^2)$ is the pth handover region between the ith and jth camera views. As previously discussed, each handover region $H_{ij}^p$ consists of a spatially connected exit and entry region pair $(X_i^k, E_j^l)$, along with the transition time and its variance $(t_{ij}, \sigma_{ij}^2)$. An example of object handover regions is shown in Figure 2.

The black and white ellipses in each camera view correspond to the major entry and exit regions in each camera, and the links represent the handover regions between each camera pair. The object handover mechanism only needs to be activated when an object is terminated within an exit region that is linked to an entry region in the adjacent camera view. Once the object disappears from one camera and is in transit through a blind region the system cannot reliably determine its exact position.
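As a rough illustration of the handover region model described above, the following sketch (hypothetical Python structures, not the authors' code) represents Gaussian entry/exit zones and linked handover regions, and shows the spatio-temporal test that could decide whether a newly created object can complete a pending handover.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Zone:
    """Gaussian entry/exit zone: centre (mu) and spatial covariance (sigma) in image coordinates."""
    camera: int
    mu: np.ndarray      # 2D centre (x, y)
    sigma: np.ndarray   # 2x2 covariance

    def contains(self, point, gate=3.0):
        """True if the point lies within 'gate' Mahalanobis distances of the zone centre."""
        d = np.asarray(point) - self.mu
        m2 = d @ np.linalg.inv(self.sigma) @ d
        return m2 <= gate ** 2

@dataclass
class HandoverRegion:
    """Linked exit/entry zone pair with the expected transition time and its variance."""
    exit_zone: Zone
    entry_zone: Zone
    t_ij: float         # expected transition time (seconds)
    var_ij: float       # variance of the transition time

    def accepts(self, entry_point, elapsed, gate=3.0):
        """Spatio-temporal constraint: the new object must appear inside the linked
        entry zone within a few standard deviations of the expected transition time."""
        temporal_ok = abs(elapsed - self.t_ij) <= gate * np.sqrt(self.var_ij)
        spatial_ok = self.entry_zone.contains(entry_point, gate)
        return temporal_ok and spatial_ok
```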

Figure 2: Handover regions for six cameras in the surveillance system.

3.1 Handover Initiation

The handover agent is activated when an object is terminated within an exit region $X_i^k$ that is included in the handover region list. The handover agent records the geometric location and time at which the object left the field of view of the ith camera. Only activating the object handover agent when an object is terminated in a handover region eliminates the case where an object is prematurely terminated within the field of view due to tracking failure caused by a complex dynamic occlusion. In addition, once the handover agent has been activated the handover region model can be used to determine the most likely regions where the object is expected to re-appear, hence reducing the computational cost of completing the handover process.

3.2 Handover Completion

The handover agent achieves completion when an object is created within the entry region $E_j^l$ that forms a handover region with the exit region $X_i^k$ where the object was terminated in the ith camera view. The handover agent's task is only complete if the new object satisfies the spatial and temporal constraints of the handover region. In some scenes it is possible that two or more handover regions share a common entry or exit zone. In these instances it is necessary for each agent to exchange information on the status of possible candidates for handover completion. Assuming that we have transition probabilities for each of the links, it is possible to select the best candidate for handover completion (3):

$$\arg\max_{h \in H_s} \; p(h \mid O_E, O_X) \qquad (3)$$

where $H_s$ is the set of shared links with a common entry or exit zone, and $O_E, O_X$ are the sets of observations of exit and entry events across the set of shared links. The likelihood of an object moving along each link can be decoupled into several components: the link transition probability, and the object's exit/entry posterior probabilities (4):

$$p(h_{ij} \mid O_E(i), O_X(j)) = a_{ij}(\tau)\, p(i \mid X_i, O_E(i))\, p(j \mid E_j, O_X(j)) \qquad (4)$$

The handover region with the highest likelihood can be chosen by the tracker as the best candidate for handover completion. By adopting this approach it is possible for the agent based tracker to manage multiple hypotheses for 1-M and M-N relationships between entry and exit zones.
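A minimal sketch of the completion step, under the assumption that the link transition probability $a_{ij}(\tau)$ and the exit/entry posteriors are available as callables (the names below are illustrative, not from the paper): it evaluates the decoupled likelihood of Eq. (4) for every shared link and pending candidate and keeps the most likely pair.

```python
def best_handover(shared_links, candidates):
    """Pick the (link, candidate) pair maximising the decoupled likelihood of Eq. (4).

    shared_links: objects exposing transition_prob(tau), exit_posterior(obs),
                  entry_posterior(obs)  (hypothetical interface).
    candidates:   dicts with the observed transition time 'tau' and the
                  exit/entry observations 'obs_exit', 'obs_entry'.
    """
    best, best_score = None, 0.0
    for link in shared_links:
        for cand in candidates:
            score = (link.transition_prob(cand["tau"])          # a_ij(tau)
                     * link.exit_posterior(cand["obs_exit"])    # p(i | X_i, O_E(i))
                     * link.entry_posterior(cand["obs_entry"])) # p(j | E_j, O_X(j))
            if score > best_score:
                best, best_score = (link, cand), score
    return best  # None if no candidate has non-zero likelihood
```

Because every shared link is scored independently, the same loop naturally supports multiple simultaneous hypotheses for 1-M and M-N entry/exit relationships.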

3.3 Handover Termination

The handover agent is terminated once an object has not reappeared after the maximum transition time, which can be determined from the statistical properties of the handover regions related to the exit region where the object left the field of view. The maximum transition time for the handover region is an important characteristic, since the scene is not constrained in such a way that an object must re-appear in the field of view once it enters a handover region. It is possible that once the object leaves one camera it will not re-appear within the scene again. When this situation occurs it is not possible for the system to locate the object, since it will not be visible to any of the cameras in the surveillance network. The framework used for tracking objects between non-overlapping views makes several assumptions. It is assumed that the temporal delay between the camera views is of the order of seconds. If the handover regions are located on the same ground plane and calibrated in the same world coordinate system then 3D trajectory prediction can be used to add another constraint to the data association between the handover object and candidate objects which appear in entry regions in the adjacent camera view. The 3D trajectory prediction is only valid if the object maintains the same velocity and does not significantly change direction once it has entered the handover region.
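The termination rule and the optional 3D prediction constraint can be summarised in a few lines; the sketch below (illustrative names, constant-velocity assumption) is one possible reading of them, not the paper's implementation.

```python
import numpy as np

def handover_expired(elapsed, t_ij, var_ij, gate=3.0):
    """Terminate the handover agent once the elapsed time exceeds the maximum
    transition time implied by the link statistics (mean + gate * std)."""
    return elapsed > t_ij + gate * np.sqrt(var_ij)

def predicted_ground_position(last_pos, last_vel, elapsed):
    """Constant-velocity ground plane prediction, only meaningful when both cameras
    share a calibrated world coordinate system and the object does not change
    direction inside the blind region."""
    return np.asarray(last_pos) + np.asarray(last_vel) * elapsed
```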

4. Results and Evaluation

4.1 Network calibration validation

There are various approaches to validating the performance of a camera network calibration learning algorithm. In [12] and [15] synthetic data was used to determine performance. This approach offers some advantages, since it is possible to simulate scenarios that may not be commonplace in real data, for example extremely dense traffic flow with variable transition times between cameras. However, we prefer to use real data captured by a real-time surveillance system, since it is more likely to allow the true operational performance of the algorithm to be characterised. We validate the unsupervised learning algorithm discussed in Section 2 using several criteria: the accuracy of the most popular transition time, the similarity between the estimated and actual transition time distributions, and the consistency of ground plane velocity profiles. We initially captured within-camera tracking data using a six camera surveillance system over an extended period of 24 hours. We then applied a technique to identify the major entry and exit regions in each camera view [9]. The algorithm described in Section 2 was used to estimate the transition time distributions of within-camera objects between the various entry and exit zones. The actual within-camera transition times were estimated from the captured tracking data. By comparing the transition time distributions obtained from the tracking based method and from unsupervised learning it is possible to evaluate the performance of the algorithm. A qualitative comparison of the transition time distributions is shown in Figure 3; the tracking based transition time distributions are shown in blue, and the distributions estimated by unsupervised learning are shown in red. A key conclusion we can draw from these results is that the unsupervised learning algorithm can consistently estimate the transition times for each camera. This illustrates the effectiveness of the network calibration algorithm, since it operates in a correspondence free manner and does not make any prior assumptions about the organisation of the camera network.
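For the tracking based reference distribution, each tracked object directly provides one within-camera transition time (the gap between its entry-zone and exit-zone timestamps). A small sketch of that bookkeeping, with hypothetical track field names, might look as follows.

```python
import numpy as np

def ground_truth_histogram(tracks, exit_zone, entry_zone, max_lag=50):
    """Histogram of within-camera transition times measured from tracked objects
    whose trajectory starts in 'entry_zone' and ends in 'exit_zone'.

    tracks: objects with fields start_zone, end_zone, start_time, end_time
            (illustrative field names, not from the paper).
    """
    lags = [t.start_time - t.end_time              # entry precedes exit within a camera,
            for t in tracks                        # so values are typically negative
            if t.end_zone == exit_zone and t.start_zone == entry_zone]
    bins = np.arange(-max_lag, max_lag + 2) - 0.5  # 101 one-second bins over (-50, +50)
    hist, _ = np.histogram(lags, bins=bins)
    return hist / max(hist.sum(), 1)               # normalise so the bins sum to 1
```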


Figure 3: Plots of the estimated transition times between various entry/exit zones using within-camera tracking data (blue) and unsupervised learning (red), using scale factor s=2.

A quantitative assessment of the within-camera transition time estimates was made by measuring the similarity of the distributions. The transition time distributions are modelled using a 101 bin histogram, where each bin indicates the likelihood of the object transition time within the interval (-50, +50) seconds. We used the Bhattacharyya coefficient to compare the similarity of the two non-parametric distributions:

$$\beta(q_a, q_b) = \sum_{u=1}^{N} \sqrt{q_a(u) \times q_b(u)} \qquad (5)$$

where N is the number of bins in the histogram, $q_a$ is the transition time distribution estimated from the within-camera tracking data, and $q_b$ is the transition time distribution estimated by the unsupervised learning algorithm. It is assumed that both distribution histograms are normalised such that $\sum_{u=1}^{N} q(u) = 1$.

The Bhattacharyya coefficient is a popular similarity measure that has been used for robust object tracking. It has an advantage over the chi-square and Kullback-Leibler measures, which can suffer from singularity problems when comparing empty histogram bins [3].
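A direct transcription of Eq. (5) for two normalised histograms, shown here as a short sketch rather than the evaluation code used in the paper:

```python
import numpy as np

def bhattacharyya(q_a, q_b):
    """Bhattacharyya coefficient of Eq. (5) for two normalised histograms.
    Returns 1.0 for identical distributions and 0.0 for non-overlapping ones;
    empty bins contribute zero, so no singularities arise."""
    q_a = np.asarray(q_a, dtype=float)
    q_b = np.asarray(q_b, dtype=float)
    return float(np.sum(np.sqrt(q_a * q_b)))

# Example: two 101-bin transition time histograms over (-50, +50) seconds.
q_tracking = np.full(101, 1.0 / 101)     # placeholder uniform distributions
q_learned  = np.full(101, 1.0 / 101)
print(bhattacharyya(q_tracking, q_learned))   # -> approximately 1.0 for identical histograms
```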

The similarity measurements are summarised in Table 1. $N_s$ is the number of objects observed by the tracker between the within-camera exit and entry region. $M_{TT}$ is the peak transition time from the tracking data, and $M_{TU}$ is the peak transition time estimated by the unsupervised learning algorithm discussed in Section 2. The largest error for peak detection is 3 seconds, for link (4,4); in general the transition time error was less than 2 seconds. A more accurate estimation could be achieved by refining the quantisation of the histogram bins. The value $\beta_A$ is the similarity between the transition time distributions estimated from tracking data and by the method described in [11]. In Section 2 we described an extension to this method that suppresses the noise in the transition time distribution estimated by the unsupervised learning algorithm; the values $\beta_B$ and $\beta_C$ correspond to the noise suppression method using scale factor s=1 and s=2 respectively. The values of the similarity measures are shown in Table 1. The means and standard deviations of $\beta_A$, $\beta_B$, and $\beta_C$ are (0.752, 0.810, 0.831) and (0.0944, 0.1089, 0.0942) respectively. From these results we can conclude that the noise filtering method described in Section 2 performs better than the method described in [11].

The final method we employed to validate the camera network model was to compare the ground plane velocity profiles of links identified from unsupervised learning and from within-camera tracking data. The distances between the centres of the zones on the ground plane were used to provide an estimate of the most popular speed across each link, assuming linear motion of the targets on the ground plane. For the within-camera tracking data we estimate the velocity profiles by measuring the ground plane distance between the start and end points of each tracked object, and computing its speed given the corresponding transition time. The estimated speeds provide a quantitative validation of the method. For instance, links with slow speed values (less than 3 m/sec) correspond to pedestrian motion, fast speed values (more than 10 m/sec) correspond to vehicle motion (e.g., most links between zones in cameras 4 and 6), and medium speed values (between 3 m/sec and 10 m/sec) correspond to vehicle motion on low-speed lanes (e.g., links between zones in cameras 3, 4, 5 and 6). The velocity profiles are summarised in Table 2.
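A sketch of the link speed comparison under the stated linear-motion assumption (zone centres and peak transition times are the inputs; names are illustrative):

```python
import numpy as np

def link_speed(exit_centre, entry_centre, transition_time):
    """Most popular speed across a link: ground plane distance between the zone
    centres divided by the magnitude of the peak transition time (seconds)."""
    if transition_time == 0:
        return None                                  # undefined, reported as '-' in Table 2
    dist = np.linalg.norm(np.asarray(entry_centre) - np.asarray(exit_centre))
    return dist / abs(transition_time)

# Example: zones 8 m apart on the ground plane, peak transition time of -5 s.
print(link_speed((0.0, 0.0), (8.0, 0.0), -5.0))      # -> 1.6 m/sec
```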

cam  exit  entry   Ns   MTT (s)  MTU (s)   βA     βB     βC
 1     1     3     57    -16      -16     0.84   0.90   0.91
 1     2     2     77      0        0     0.76   0.80   0.78
 1     3     1     46    -14      -15     0.77   0.81   0.79
 2     4     4    219     -4       -1     0.84   0.91   0.82
 2     4     5     86    -14      -13     0.84   0.90   0.91
 2     5     4     67    -13      -13     0.85   0.91   0.87
 2     6     4     61    -13      -14     0.81   0.91   0.91
 2     6     6     27      0        0     0.65   0.66   0.69
 3     7     9     31    -11      -11     0.72   0.80   0.85
 3     7    10     41     -5       -5     0.79   0.88   0.91
 3     8     8     36      0        0     0.60   0.63   0.65
 3     8     9     42     -3       -3     0.69   0.70   0.71
 3     8    10     59     -5       -5     0.82   0.91   0.95
 3     9     8     51     -2       -2     0.77   0.82   0.84
 3    10     7     59     -5       -4     0.83   0.93   0.91
 3    10     8     35     -5       -5     0.76   0.84   0.87
 4    11    12    125     -8       -6     0.88   0.96   0.97
 4    12    11     76     -8       -7     0.83   0.88   0.94
 4    14    16    127     -4       -4     0.72   0.76   0.80
 4    15    13    338     -3       -3     0.76   0.80   0.82
 5    18    20    117     -4       -4     0.87   0.94   0.95
 5    20    19     78     -4       -4     0.75   0.78   0.82
 5    20    20     19      0        0     0.53   0.58   0.72
 5    21    19     54     -6       -8     0.85   0.92   0.95
 5    21    21     14     -2       -1     0.60   0.66   0.73
 6    23    25    508     -2       -2     0.63   0.67   0.71
 6    23    26     75     -5       -4     0.64   0.70   0.72
 6    24    22    547     -2       -2     0.67   0.71   0.76

Table 1: Summary of quantitative evaluation results

The values Speed(T) and Speed(UL) in Table 2 refer to the estimates of the link speed from the within-camera tracking data and from unsupervised learning respectively. The mean and standard deviation of the speed error were 0.795 m/sec and 0.931 m/sec respectively. These results indicate the consistency of the velocity profiles. The large variance is due to the three links (24,22), (23,25), and (20,19), which relate to fast and medium speed traffic. One benefit of this evaluation methodology is that it allows the performance of any similar network calibration algorithm to be compared quantitatively. The approach makes use of real data that contains different sources of noise, including poorly initialised tracked objects, false correlations due to free-flowing vehicle traffic, and loitering people. Each of these sources of noise tests the robustness of the unsupervised learning algorithm.

cam  exit  entry  Speed(T) m/s  Speed(UL) m/s  error m/s
 1     1     3        1.60          1.47          0.13
 1     2     2        0.83           -             -
 1     3     1        1.70          1.55          0.15
 2     4     4        1.36          1.62          0.26
 2     4     5        1.70          1.60          0.10
 2     5     4        1.56          1.47          0.09
 2     6     4        1.64          1.59          0.05
 2     6     6        1.90           -             -
 3     7     9        0.88          1.88          1.00
 3     7    10        1.84          1.75          0.09
 3     8     8        0.83           -             -
 3     8     9        2.16          2.47          0.31
 3     8    10        4.33          4.74          0.41
 3     9     8        2.74          3.29          0.55
 3    10     7        2.15          2.43          0.28
 3    10     8        4.10          4.52          0.42
 4    11    12        1.64          2.75          1.11
 4    12    11        1.47          2.42          0.95
 4    14    16       11.65         11.07          0.58
 4    15    13       10.57         11.59          1.02
 5    18    20        8.89          9.61          0.72
 5    20    19        7.64          4.87          2.77
 5    20    20        1.37           -             -
 5    21    19        1.95          2.68          0.73
 5    21    21        0.74          0.14          0.60
 6    23    25       10.86         14.74          3.88
 6    23    26        7.05          7.68          0.63
 6    24    22        8.17         10.43          2.26

Table 2: Comparison of velocity profiles

4.2 Object Handover Between Multiple Views

The purpose of this experiment was to determine the reliability of the multi view tracker for coordinating tracking between widely separated camera views, which are non-overlapping or have limited overlap. We manually defined ground truth for a 30 minute video sequence that comprised 10,000 image frames from six camera views. The ground truth defines the 2D track identification for objects moving between different camera views. The multi view tracking results and the ground truth were compared to determine whether the correct track identity was preserved when objects moved between the camera views. In total 134 ground-truth objects were manually selected from the multi view video sequence. A summary of the tracking results is given in Table 3. When using a 3D Kalman filter the overall accuracy of handover completion was 83.58%. The object handover failures were due to poor track initialisation by the 2D tracker, and to the size of objects such as large vehicles whose position on the ground plane could not be reliably estimated, resulting in tracking failure. When we only consider blind region links with a transition time of more than 2 seconds, the accuracy of handover completion drops considerably to 30% (6/20). The Kalman filter is unable to track objects moving along the link (11,20), where objects change direction as they enter the blind region; linear prediction of the object's position is not sufficient to track objects across this blind region link. The results of this experiment illustrate that the 3D trajectory prediction of the Kalman filter is not always adequate for tracking objects between blind regions.

Figure 4: Example of object tracking through blind regions. (a) an object moving through three different cameras, (b) the corresponding tracked 3D trajectory on the ground plane.

The same multi view video sequence was run again using the camera network model and the agent based handover region policy presented in Sections 2 and 3. The handover success rate of the multi view tracker increased to 92.54%. When we only consider blind region links the handover accuracy is 90% (18/20). The two failures were the result of an object moving along a link (10,18) that was not detected by unsupervised learning, and a very slow moving vehicle on link (20,11). Figure 4(a) shows an example of object tracking between several camera views with the correct identity being preserved, and Figure 4(b) shows the corresponding 3D ground plane trajectory of the tracked object. It can be observed that there are several kinks in the trajectory, which represent instances where the Kalman filter is attempting to predict the location of the object as it moves through a blind region between the three camera views. This demonstrates an instance where the agent based object tracking framework can preserve an object's identity while the 3D Kalman filter could fail.

                    3D Kalman Filter          Agent based tracker
exit  entry       hit   miss   Acc. (%)      hit   miss   Acc. (%)
  3     7           2     0     100.00         2     0     100.00
  7     3           5     1      83.33         5     1      83.33
 10*   18*          0     1       0.00         0     1       0.00
 11*   20*          0     8       0.00         8     0     100.00
 15    22          23     2      92.00        23     2      92.00
 18*   10*          6     2      75.00         8     0     100.00
 20*   11*          0     3       0.00         2     1      66.67
 23    16          76     5      93.83        76     5      93.83

Table 3: Comparison between Kalman Filter and agent based tracker. Blind region links are marked with *.

5. Summary and Conclusion

In this paper we have presented a novel set of techniques that can automatically validate an unsupervised camera network calibration learning algorithm. Our validation strategy has enabled us to produce qualitative and quantitative results which indicate the robustness of the unsupervised learning algorithm. We have also demonstrated how this camera network calibration model can be applied within an agent based framework to track objects through blind regions between widely separated views. The results demonstrate that the reliability of object handover reasoning through blind regions improved from 30% to 90% compared to a conventional 3D linear Kalman filter based approach.

References

[1] Black J., Ellis T.J., Rosin P., "Multi View Image Surveillance and Tracking", IEEE Workshop on Motion and Video Computing (MOTION02), Orlando, USA, pp. 169-174, December 2002.

[2] Chilgunde A., Kumar P., Ranganath S., WeiMin H., "Multi-Camera Target Tracking in Blind Regions of Cameras with Non-Overlapping Fields of View", British Machine Vision Conference (BMVC04), London, pp. 397-406, September 2004.

[3] Comaniciu D., Ramesh V., Meer P., "Kernel-Based Object Tracking", IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), Vol. 25, No. 5, pp. 564-577, May 2003.

[4] Ellis T.J., Makris D., Black J., "Learning a Multi-Camera Topology", Joint IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance (VS-PETS03), Nice, France, pp. 165-171, October 2003.

[5] Javed O., Rasheed Z., Shafique K., Shah M., "Tracking Across Multiple Cameras With Disjoint Views", IEEE International Conference on Computer Vision (ICCV03), Nice, France, pp. 952-957, October 2003.

[6] KaewTraKulPong P., Bowden R., "Probabilistic Learning of Salient Patterns Across Spatially Separated, Uncalibrated Views", IEE Intelligent Distributed Surveillance Systems (IDSS04), London, pp. 36-40, February 2004.

[7] Kettnaker V., Zabih R., "Bayesian Multi-Camera Surveillance", IEEE Conference on Computer Vision and Pattern Recognition (CVPR99), Fort Collins, Colorado, pp. 253-259, June 1999.

[8] Khan S., Shah M., "Consistent Labeling of Tracked Objects in Multiple Cameras with Overlapping Fields of View", IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), Vol. 25, No. 10, pp. 1355-1360, October 2003.

[9] Makris D., Ellis T.J., "Automatic Learning of an Activity-Based Semantic Scene Model", IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS03), Miami, FL, USA, pp. 183-188, July 2003.

[10] Makris D., Ellis T.J., "Path Detection in Video Surveillance", Image and Vision Computing, 20(12), pp. 895-903, October 2002.

[11] Makris D., Ellis T.J., Black J., "Bridging the Gaps between Cameras", IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR04), Washington DC, USA, pp. 205-210, June 2004.

[12] Pasula H., Russell S., Ostland M., Ritov Y., "Tracking Many Objects with Many Sensors", Proceedings of the 16th International Joint Conference on Artificial Intelligence (IJCAI99), Stockholm, Sweden, pp. 1160-1171, August 1999.

[13] Porikli F., Divakaran A., "Multi-Camera Calibration, Object Tracking and Query Generation", International Conference on Multimedia and Expo (ICME03), Baltimore, pp. 653-656, July 2003.

[14] Stauffer C., "Estimating Tracking Sources and Sinks", Second IEEE Workshop on Event Mining (CVPRW), Vol. 4, p. 35, July 2003.

[15] Stauffer C., "Learning to Track Objects Through Unobserved Regions", IEEE Workshop on Motion and Video Computing (MOTION05), Colorado, USA, pp. 96-102, January 2005.
