REAL-TIME ESTIMATION OF GEOMETRICAL TRANSFORMATION BETWEEN VIEWS IN DISTRIBUTED SMART-CAMERAS SYSTEMS

Liliana Lo Presti and Marco La Cascia
Dipartimento di Ingegneria Informatica, University of Palermo - ITALY

ABSTRACT

We present a method to automatically estimate the geometric relations among the views of cameras with partially overlapping fields of view, using the positions of detected moving objects visible at the same time in two or more views. The correspondences among objects are found by comparing their appearance based on dominant colour descriptors. The geometric transformations are computed iteratively and may be used to solve the consistent labelling problem. As a significant part of the processing is performed on smart cameras, the method has been conceived by taking into account the limited resources and low computational capability of these devices. Experiments are done on publicly available datasets in order to show the accuracy of the geometric relations found by our method.

Index Terms— Consistent labelling, smart cameras, homography, motion detection, midground

1. INTRODUCTION

Nowadays, video-surveillance systems are in great demand in order to guarantee the safety of people and sites of interest against dangerous events. The main goals of an intelligent video-surveillance system are the detection and monitoring of moving objects in the scene, the understanding of events occurring in the site, and the prediction of actions and intents of the detected people, in an automatic real-time way without the direct control of a human operator. Particular care must be taken to detect potentially dangerous situations arising from the introduction of new stationary objects, such as abandoned luggage.

A distributed video-surveillance system is composed of many cameras that can be organized in a wireless network. In this case, each node consists of a micro-controller, a radio-communication device, a battery for its own power supply and a camera to sense the environment [1]. Smart cameras collect and process the gathered data in real time in order to monitor interesting events concerning the safety of the site and to recover information useful to understand the scene. They must produce and forward only aggregated data in order to reduce the number of data transmissions towards a base station.

This base station collects and merges the information coming from the whole network and acts as a gateway to the user. Because the system is composed of a great number of cameras, automatic methods are required to align them, i.e., to establish the correspondences among the several views acquired by each camera. Outdoors, cameras often have to work in hostile conditions without any human intervention after deployment, so camera-view alignment is periodically required.

In realizing this kind of system many issues have no solution yet and are the subject of current research activities. In particular, outdoor video surveillance is challenging because the area to monitor is often very wide and has no well-defined boundaries. Moreover, outdoors the system is very sensitive to problems like sudden lighting variations and is heavily affected by weather changes. Under these conditions, automatic detection of moving objects in the scene and recognition of dangerous events are more difficult [2, 3, 4]. On the other hand, distributed video surveillance is challenging due to the need of both saving power and continuously monitoring the site, so suitable arrangements are necessary to adequately satisfy these constraints.

For these reasons, methods to track moving objects across the several distributed cameras play a central role; they must establish geometric relations among the different views of the cameras deployed in the site in order to unambiguously identify objects and solve potential partial/total occlusion problems (consistent labelling) [5, 6].

In this paper we describe a system composed of a set of distributed smart cameras able to detect moving and new stationary objects introduced in the site. Our system takes into account both the dynamic changes of the outdoor environment and the constraints imposed by the limited computational power of the devices used in the network. We propose an algorithm to estimate the geometric relations among the different views of the smart cameras based on the correspondences among the moving objects simultaneously detected by two or more nodes in the network. Correspondences are found by comparing the appearance models of the detected objects and are used to iteratively estimate the geometric relations among the views, so as to solve the consistent labelling problem. Our method estimates the camera alignment by using the trajectories of the detected objects.

The estimated geometric transformation is the homography between the views. The most important contribution is the on-line computation of the transformation, differently from other methods [7, 8, 5, 6, 9] that collect a certain amount of correspondences and then compute an off-line alignment among the camera views. In the following we briefly describe some related works about the solution of the consistent labelling problem and then we describe our method.

2. RELATED WORKS

The automatic visual control of human presence and actions in an outdoor environment is a challenging problem. A large literature exists about surveillance in structured environments, but much research work is still necessary to develop outdoor systems and to migrate video-surveillance algorithms to architectures with low computational power and limited storage capability.

There are many methods trying to solve the consistent labelling problem. Some methods rely on feature matching to establish correspondences among the several camera image planes. Generally they match colour or other sets of features of the objects being tracked in each camera to learn the geometry and the constraints among the views. These methods usually compute epipolar geometry, homographies or landmarks in the views and may be based on the appearance of the objects detected in the site. Other methods are based on 3D information and use camera calibration data and a 3D environment model to solve the problem. Finally, alignment approaches estimate the geometric transformation between the cameras so that the same object in different cameras maps to the same location [5, 6].

In [7], the authors present a technique for the registration of multiple surveillance cameras through the automatic alignment of image trajectories constructed by tracking pedestrians as they move through the area under surveillance. Their method computes the relative rotation from each camera to the local ground plane with minimal user input. Once cameras have been aligned, objects can be tracked in multiple views along the local ground plane. The relative camera-to-ground rotation is estimated in a semi-automated fashion: a user denotes two different bundles of parallel lines in the image so as to identify at least 4 image line endpoints. These points, together with the focal length, permit to compute the relative orientation of the camera with respect to the ground. This transformation is used to back-project the found trajectories onto a reprojection plane. The transformed trajectories are used to estimate the rotation, scale and translation transformations that align the trajectories in the reprojection plane.

Stauffer et al. [8] propose a system that can automatically determine the topology of camera overlap and determine pairwise camera correspondence models directly from the tracking data, and that is capable of determining object correspondences. They propose a method for estimating the planar correspondence model given time-aligned tracking data and introduce a mechanism for validating the model itself, to determine whether the best correspondence model is valid.

By using the tracking sequences of the cameras deployed in the site, they estimate the correspondence matrix among the tracked objects. Finally, the homography between the two views with the maximum number of corresponding tracking sequences is chosen as the model for the current camera pair.

The approach described in [6] uses uncalibrated cameras with overlapping fields of view. It consists of finding the limits of the field of view (FOV) of each camera by observing motion in the environment. FOV lines are the edges of the footprint of a camera as seen in other cameras; they permit to establish correspondences between trajectories, to disambiguate between multiple possibilities for correspondence, and to determine the set of cameras in which an object will be visible. The instant when a new object is seen is used to establish just one correspondence between the tracks of the same object in different views. Since the false candidates are randomly spread on both sides of the line whereas the correct candidates are clustered on a single line, correct correspondences yield a line in a single orientation, while wrong correspondences yield lines in scattered orientations. They use the Hough transform to find the best lines.

In [9] the problem of self-calibration of multiple cameras is solved by using feature correspondences that satisfy planar geometric constraints on the moving objects in the scene. The method estimates the homography which aligns the planar structure in the scene and then computes the 3D position of the plane and of the relative camera positions. Since objects that appear to be moving simultaneously in two camera views are likely to correspond to the same 3D object, they are used to determine the homography: the centroids of moving objects in the images are used as features, and motion trajectories in different cameras are randomly matched against one another to compute plane homographies. The correct homography is the one that is statistically most frequent. The homography is computed in two steps. First, a rough alignment is obtained by using moving objects tracked in both views; then a finer alignment is achieved through a robust estimation technique on static features to determine a more accurate registration of the ground plane.

In our work, we compute the homography based on motion detection. Corresponding points are found by comparing the appearance models of the detected objects. In the following, we first describe the architecture of the implemented system by highlighting the main tasks of the smart cameras and of the centralized processing unit that we call the logical reasoner. Then we describe in detail the proposed algorithm to estimate the homography between two different camera views.

3. SYSTEM OVERVIEW

The proposed technique has been designed for a distributed video-surveillance system in wide outdoor environments.

We assume that each camera in the system has no knowledge about its own location or that of the other cameras in the site. It only knows of a centralized processing unit to which it sends the information detected about the site. As figure 1 shows, there is a central unit and a set of peripheral nodes linked to it by means of a wireless network, through which they communicate the information detected about the monitored site. Each node represents a smart camera.

Fig. 1. Distributed set of smart cameras connected to a centralized processing unit we call the logical reasoner

In the start-up phase, the logical reasoner must only know which cameras have overlapping views; no other calibration information is needed. We assume that the ground plane is visible in all cameras and that the sequences are time-aligned; moreover, cameras have overlapping FOVs and, when a moving object disappears from all the camera views and reappears afterwards in some camera view, it is treated as a new object. To explain our technique we present the work done by the smart cameras to detect the potential corresponding points and the work done by the reasoner in order to estimate the homography. The main tasks of each node are shown in figure 2, while those performed by the reasoner in order to realize multi-camera tracking are shown in figure 3.

Fig. 2. Tasks performed by each smart-camera to detect the objects

Even though it is reported in figure 3, the use of the estimated homography to perform multi-camera tracking is beyond the scope of this paper and is not described here.

Fig. 3. Tasks performed by the logical reasoner to perform multi-camera tracking

4. SMART CAMERAS PROCESSING

A smart camera is a combination of video-sensing, processing and communication devices able to extract information from the scene for higher-level processing. Generally, smart cameras are designed as flexible and programmable modules of a wider network in order to realize a scalable system. When used in a wireless network, they cannot transmit video streams but only compact features that allow understanding the events occurring in the site. We assume that smart cameras work independently. By classifying image pixels as foreground or background, smart cameras detect moving objects and any changes in the scene that become permanent. To rapidly identify suspicious objects by highlighting the new stationary ones, a further class, called midground, is used [10]. Once objects are detected, smart cameras track them locally, taking care to handle problems like split/merge of the detected objects [11]. Each node tracks objects and assigns them a unique label. Nodes estimate the trajectory of each object on the image plane by approximating it with a piecewise linear function. Vertices of this function correspond to the points in which the object changes its direction and are used to estimate the homography among views.

4.1. Objects Detection

In our previous work [10], we presented an algorithm to highlight changes in the background by distinguishing at pixel level between moving and new stationary objects introduced in the scene. We modelled the scene by using three different memories in order to store short-term, medium-term and long-term changes. In classifying pixels as foreground, midground and background we took advantage of the multi-modal nature of the pixels and of their temporal evolution. In particular, midground permits to put in evidence the new stationary objects in the site, so as to make possible the analysis of suspicious objects (for example abandoned luggage) appearing in the scene. In our algorithm, the background adapts only to permanent changes.

The training phase of the models is done on-line and the results become more and more accurate after some frames. The algorithm has been conceived with particular care to limit computational load and memory occupancy in order to guarantee real-time operation on a node of a WSN. Figure 4 shows an example of the detected midground/foreground objects in the scene.
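To make the three-memory idea concrete, the following is a minimal pixel-level sketch using memories updated at different rates. It is an illustration only, not the algorithm of [10]: the class name, learning rates, thresholds and update rules are assumptions.

```python
import numpy as np

class ThreeMemoryModel:
    """Illustrative foreground/midground/background classifier with three memories.

    The update rules and default values below are assumptions for this sketch,
    not the method described in [10].
    """
    def __init__(self, first_frame, alpha_short=0.5, alpha_medium=0.05,
                 alpha_long=0.001, diff_thr=25, stable_frames=50):
        f = first_frame.astype(np.float32)
        self.short, self.medium, self.long = f.copy(), f.copy(), f.copy()
        self.alpha = (alpha_short, alpha_medium, alpha_long)
        self.diff_thr = diff_thr
        self.stable_frames = stable_frames
        self.stability = np.zeros(first_frame.shape[:2], np.int32)

    def update(self, frame):
        f = frame.astype(np.float32)
        far_from_long = np.abs(f - self.long).max(axis=-1) > self.diff_thr
        far_from_short = np.abs(f - self.short).max(axis=-1) > self.diff_thr
        near_medium = np.abs(f - self.medium).max(axis=-1) <= self.diff_thr

        # A pixel that differs from the long-term background, has stopped
        # changing (it matches the short-term memory) and agrees with the
        # medium-term memory for enough frames is labelled midground.
        settled = far_from_long & ~far_from_short & near_medium
        self.stability = np.where(settled, self.stability + 1, 0)
        foreground = far_from_long & far_from_short
        midground = far_from_long & (self.stability >= self.stable_frames)

        # The three memories adapt at different rates; the long-term background
        # absorbs only changes that are not flagged as midground.
        a_s, a_m, a_l = self.alpha
        self.short += a_s * (f - self.short)
        self.medium += a_m * (f - self.medium)
        self.long = np.where(midground[..., None], self.long,
                             self.long + a_l * (f - self.long))
        return foreground, midground
```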


Fig. 4. Example of results: a) frame, b) hand-segmented foreground, c) and d) detected foreground and midground.

To detect blobs, we apply an initial morphological opening to eliminate some noisy points in the midground and foreground; a size filtering then permits to purge the blobs that are too small, on the basis of their area. For each blob, information like the bounding box and the area is stored to characterize it. During the start-up step, we also need to compute an appearance model of the objects. In this step, blobs near the border of the image are not considered, since their point of support on the ground might not be inside the image.
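As an illustration of this blob-extraction step, the following sketch applies a morphological opening and an area-based filter to a binary foreground/midground mask using OpenCV; the kernel size and the area threshold are assumptions chosen for the example, not values from the paper.

```python
import cv2

def extract_blobs(mask, min_area=200, kernel_size=3):
    """Clean a binary mask and return the surviving blobs (bounding box, area)."""
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE,
                                       (kernel_size, kernel_size))
    # Morphological opening removes isolated noisy pixels.
    opened = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)

    # Connected-component analysis gives one candidate blob per component.
    n, labels, stats, _ = cv2.connectedComponentsWithStats(opened, connectivity=8)
    blobs = []
    for i in range(1, n):                 # label 0 is the background
        x, y, w, h, area = stats[i]
        if area < min_area:               # size filtering: purge too-small blobs
            continue
        blobs.append({"bbox": (int(x), int(y), int(w), int(h)),
                      "area": int(area)})
    return blobs
```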

4.2. Objects Characterization

We use appearance-based descriptors to characterize the blobs in order to solve the local tracking and to find correspondences during the start-up stage of the system. To construct an appearance model for each detected object, an Incremental Major Colour Spectrum Histogram Representation (I-MCSHR) is used [12]. For each new object appearing in the scene, its RGB colours are clustered by the K-means method so as to consider the most representative colours of the detected object. As explained in [12], a good representation may have 30 clusters; nevertheless, in our work this is an upper bound, because clusters that are too close are merged together. The resulting clusters are used as major colours for our model. To each one we assign a probability of appearance, computed as its frequency with respect to the total number of samples used. To consider the temporal evolution of the objects, the descriptor is constructed in an incremental way by updating the colour probabilities at each frame. In this way the model is refined as we collect more information about the object. In our experiments we have used 3 frames to construct the model.

To compute the homography, we determine for each blob a point of the image plane lying on the ground. We approximate this point by the middle point of the baseline of the detected object. This point (feet point) is used to compute the pairs of corresponding points in the different views, but also to allow a local tracking of the object. Tracking is done using a Kalman filter to predict the new location in the next frame [11]; the blob nearest to the predicted location is considered the corresponding blob. In case of ambiguity we compare the appearance models of the detected objects so as to label them accurately.

Feet points are computed on the image plane and are used by the reasoner to estimate the homography. The main problem in computing this type of geometric transformation is that the points cannot lie on the same straight line, otherwise the transformation degenerates. Moreover, in realizing our system we want to limit the number of data transmissions in the network. So, we compute and transmit only the points in which the object changes its direction or disappears. To do this, we approximate the object trajectories by means of a piecewise linear function whose nodes correspond to the points in which the object changes its direction. If the object is at the same time in the FOV of another camera, these nodes are candidates to be corresponding points. To estimate the trajectory, we identify the feet point of the detected object and use it to compute the straight line along which the object moves. If a point is close to the straight line, then it is used to update the parameters identifying the line; otherwise it is considered a node of the trajectory in which the object changes direction. In the latter case, a new straight line is computed while the point-node is sent to the reasoner. Given N points and defining Y, A and P as

$$Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix} \quad (1) \qquad A = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_N \end{bmatrix} \quad (2) \qquad P = \begin{bmatrix} n \\ m \end{bmatrix} \quad (3)$$

the points on the straight line must satisfy the following equation:

$$Y = A \cdot P \quad (4)$$

The parameters P may be computed as:

$$P = (A^T \cdot A)^{-1} \cdot A^T \cdot Y \quad (5)$$

In our system we compute this solution iteratively by using two points at a time, thus avoiding degenerate cases. At the k-th step, to update the parameters with the points (x1(k), y1(k)) and (x2(k), y2(k)), we define the matrices A(k), Y(k), Mx(k) and My(k) as

$$A(k) = \begin{bmatrix} 1 & x_1(k) \\ 1 & x_2(k) \end{bmatrix} \quad (6) \qquad Y(k) = \begin{bmatrix} y_1(k) \\ y_2(k) \end{bmatrix} \quad (7)$$

$$M_x(k) = M_x(k-1) + A^T(k) \cdot A(k) \quad (8)$$

$$M_y(k) = M_y(k-1) + A^T(k) \cdot Y(k) \quad (9)$$

and then we use the following equations to update the parameters:

$$\delta(k) = (M_x(k))^{-1} \cdot (M_y(k) - M_x(k) \cdot P(k-1)) \quad (10)$$

$$P(k) = P(k-1) + \delta(k) \quad (11)$$

P(k) is thus the current estimate of the parameters of the straight line. When a new point is computed, before updating the parameters we check the distance of the point from the current line. If the point is too far from the estimated line, then a new line is computed by resetting P(k), Mx(k) and My(k) to zero.
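A minimal sketch of this recursive line fit, including the reset when a direction change is detected, might look as follows; the distance threshold and class name are assumptions made for illustration, and a pseudo-inverse is used in place of the inverse in (10) for numerical safety.

```python
import numpy as np

class IncrementalLineFit:
    """Recursive least-squares fit of y = n + m*x, following equations (6)-(11)."""

    def __init__(self, dist_thr=5.0):
        self.dist_thr = dist_thr     # illustrative value, not from the paper
        self.reset()

    def reset(self):
        self.Mx = np.zeros((2, 2))
        self.My = np.zeros((2, 1))
        self.P = np.zeros((2, 1))    # [n, m]^T
        self.n_points = 0

    def _far_from_line(self, x, y):
        n, m = self.P.ravel()
        # Perpendicular distance from (x, y) to the line y = n + m*x.
        return abs(m * x - y + n) / np.hypot(m, 1.0) > self.dist_thr

    def add(self, points):
        """Update the line with two feet points [(x1, y1), (x2, y2)].

        Returns True if the points were absorbed by the current line,
        False if a direction change was detected and the fit was reset.
        """
        absorbed = True
        if self.n_points >= 2 and any(self._far_from_line(x, y) for x, y in points):
            self.reset()             # the object changed direction: start a new line
            absorbed = False
        A = np.array([[1.0, points[0][0]], [1.0, points[1][0]]])
        Y = np.array([[points[0][1]], [points[1][1]]])
        self.Mx += A.T @ A           # eq. (8)
        self.My += A.T @ Y           # eq. (9)
        delta = np.linalg.pinv(self.Mx) @ (self.My - self.Mx @ self.P)  # eq. (10)
        self.P = self.P + delta      # eq. (11)
        self.n_points += 2
        return absorbed
```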

5. PROPOSED TECHNIQUE

In a distributed video-surveillance system, one of the major difficulties is the extraction of meaningful information from remote cameras to detect abnormal situations. Moreover, we need to take into account the very limited bandwidth of the network and the fact that data transmission is power consuming. For this reason, data processing is done locally and only aggregated data are sent over the network. Each smart camera may provide only local information, so a centralized logical reasoner is used to merge and correlate the information coming from different geographical locations and to infer a global representation of the events happening in the site. The reasoner must assign the same identity to the different views of the same object acquired by different cameras and has to track an object moving from one camera field of view to another.

In our approach, in order to learn the correspondences among the cameras, the reasoner iteratively computes the homography between each pair of camera views by matching the appearance models of the moving objects they detect. This is done by locally tracking the objects and using the resulting trajectories to determine the corresponding points in the processed views. Once the transformation is estimated, each correspondence between the two views is known and this information can be used to label the detected objects in a consistent way and to solve problems of partial/global occlusion or of split/merge of the detected objects. For each pair of cameras we determine which are the corresponding blobs by using the value of their descriptors, and we use their points on the ground to determine the geometric transformation between the two camera views.

To accomplish these tasks, the reasoner needs to know some information about the detected moving objects, which it stores in suitable data structures. In particular, for each pair of cameras it maintains the current homography and the number of correspondences found (it needs at least four pairs of points to compute a valid homography). For each camera it stores data about the currently detected blobs: each blob is represented by a unique ID, its appearance model and the last two tuples (ti, xi, yi) storing the instant and the location in which the blob appeared/disappeared or changed its direction. When the reasoner receives information about a blob, it updates these data and searches among the blobs of the other cameras with overlapping views. Blobs with similarity higher than a threshold T (T = 0.95 [12]) are considered correspondent.
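A minimal sketch of these reasoner-side data structures might look as follows; the class and field names are hypothetical, introduced only to mirror the description above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple
import numpy as np

@dataclass
class BlobRecord:
    blob_id: int
    # Appearance model: list of (major colour, probability of appearance).
    appearance: List[Tuple[Tuple[int, int, int], float]]
    # Last two (t, x, y) tuples: instants and feet-point locations at which
    # the blob appeared/disappeared or changed direction.
    nodes: List[Tuple[float, float, float]] = field(default_factory=list)

    def add_node(self, t: float, x: float, y: float) -> None:
        self.nodes = (self.nodes + [(t, x, y)])[-2:]   # keep only the last two

@dataclass
class CameraPairState:
    homography: Optional[np.ndarray] = None   # 3x3 matrix, None until estimated
    n_correspondences: int = 0                 # at least 4 pairs are needed

@dataclass
class ReasonerState:
    # camera id -> blob id -> record
    blobs: Dict[int, Dict[int, BlobRecord]] = field(default_factory=dict)
    # (camera a, camera b) -> current homography and correspondence count
    pairs: Dict[Tuple[int, int], CameraPairState] = field(default_factory=dict)
```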

5.1. Blob Matching and Corresponding Points

To determine whether two objects seen by different cameras are the same, we compare their appearance models by computing a similarity measure. To do that we refer to [12], in which a similarity measure is introduced to compare two MCSHRs. Assuming that the two descriptors of the blobs A and B we want to compare are, respectively,

MCS(A) = {(C_{A_1}, p(A_1)), (C_{A_2}, p(A_2)), ..., (C_{A_M}, p(A_M))}
MCS(B) = {(C_{B_1}, p(B_1)), (C_{B_2}, p(B_2)), ..., (C_{B_N}, p(B_N))}

where C_{X_i} represents the i-th major colour of blob X and p(X_i) its probability, the similarity measure we use is defined as

$$sim(A, B) = \min\{sim(A \rightarrow B),\ sim(B \rightarrow A)\} \quad (12)$$

where

$$sim(X \rightarrow Y) = \frac{\sum_i p(Y_i \mid X)}{\sum_i p(Y_i)} \quad (13)$$

$$p(Y_i \mid X) = \min\Big\{p(Y_i),\ \sum_{j:\, dist(C_{Y_i}, C_{X_j}) < th} p(X_j)\Big\} \quad (14)$$

i.e., the sum in (14) is taken over the major colours of X whose distance from C_{Y_i} is below a given threshold.

The trajectory nodes detected by two cameras generally refer to different time instants, so the corresponding points must be referred to the same instant. In particular, we assume a linear motion with constant speed between consecutive nodes. Let (x_1^{c_1}, y_1^{c_1}) and (x_2^{c_1}, y_2^{c_1}) be two consecutive trajectory nodes of an object in camera c_1, detected at times t_1^{c_1} and t_2^{c_1}, and let (x_1^{c_2}, y_1^{c_2}) be a node of the corresponding object detected in camera c_2 at time t_1^{c_2}. If t_1^{c_2} > t_1^{c_1} and t_1^{c_2} < t_2^{c_1}, then the pair of corresponding points will be ((x_1^{c_2}, y_1^{c_2}), (\tilde{x}^{c_1}, \tilde{y}^{c_1})), where

$$\tilde{x}^{c_1} = x_1^{c_1} + (x_2^{c_1} - x_1^{c_1}) \cdot \frac{t_1^{c_2} - t_1^{c_1}}{t_2^{c_1} - t_1^{c_1}} \quad (15)$$

$$\tilde{y}^{c_1} = y_1^{c_1} + (y_2^{c_1} - y_1^{c_1}) \cdot \frac{t_1^{c_2} - t_1^{c_1}}{t_2^{c_1} - t_1^{c_1}} \quad (16)$$
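The following sketch illustrates the major-colour descriptor, the similarity of equations (12)-(14), and the temporal interpolation of equations (15)-(16). The colour-distance threshold, the use of scikit-learn's KMeans, and the omission of the close-cluster merging step are assumptions made for this example.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_descriptor(pixels_rgb, max_colours=30):
    """Cluster a blob's RGB pixels into major colours with their frequencies."""
    k = min(max_colours, len(pixels_rgb))
    km = KMeans(n_clusters=k, n_init=5).fit(pixels_rgb)
    counts = np.bincount(km.labels_, minlength=k).astype(float)
    probs = counts / counts.sum()
    return list(zip(km.cluster_centers_, probs))   # [(colour, probability), ...]

def _sim_directed(X, Y, colour_thr=30.0):
    """sim(X -> Y) as in eq. (13); colour_thr plays the role of the distance
    threshold in eq. (14) and is an illustrative value."""
    num, den = 0.0, 0.0
    for c_y, p_y in Y:
        close_mass = sum(p_x for c_x, p_x in X
                         if np.linalg.norm(np.asarray(c_y, float) -
                                           np.asarray(c_x, float)) < colour_thr)
        num += min(p_y, close_mass)    # eq. (14)
        den += p_y
    return num / den if den > 0 else 0.0

def similarity(A, B):
    """Symmetric similarity of eq. (12)."""
    return min(_sim_directed(A, B), _sim_directed(B, A))

def interpolate_node(node1_c1, node2_c1, t_c2):
    """Camera-1 position at the camera-2 node time t_c2, assuming linear
    motion with constant speed (eqs. (15)-(16)). Nodes are (t, x, y) tuples."""
    t1, x1, y1 = node1_c1
    t2, x2, y2 = node2_c1
    alpha = (t_c2 - t1) / (t2 - t1)
    return x1 + (x2 - x1) * alpha, y1 + (y2 - y1) * alpha
```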

5.2. Homography Estimation

For each pair of cameras, the reasoner determines the homography between the two views by using the found pairs of corresponding points. Generally speaking, the image planes of two cameras with overlapping fields of view are related by a homography given by the following equations:

$$x' = \frac{a x + b y + c}{g x + h y + 1} \quad (17)$$

$$y' = \frac{d x + e y + f}{g x + h y + 1} \quad (18)$$

where (x', y') is a point in the original image while (x, y) is a point in the transformed image (backwards transformation). To compute this homography at least four points are necessary. For each pair ((x'_i, y'_i), (x_i, y_i)) of corresponding points, equations (17) and (18) can be written as:

$$a x_i + b y_i + c - g x_i x'_i - h y_i x'_i - x'_i = 0 \quad (19)$$

$$d x_i + e y_i + f - g x_i y'_i - h y_i y'_i - y'_i = 0 \quad (20)$$

By defining the vector of parameters p = [a, b, ..., h, 1]^T and considering n pairs of corresponding points, the following equation must be satisfied:

$$C_n \cdot p = 0 \quad (21)$$

where

$$C_n = \begin{bmatrix}
x_1 & y_1 & 1 & 0 & 0 & 0 & -x_1 x'_1 & -y_1 x'_1 & -x'_1 \\
0 & 0 & 0 & x_1 & y_1 & 1 & -x_1 y'_1 & -y_1 y'_1 & -y'_1 \\
\vdots & & & & & & & & \vdots \\
x_n & y_n & 1 & 0 & 0 & 0 & -x_n x'_n & -y_n x'_n & -x'_n \\
0 & 0 & 0 & x_n & y_n & 1 & -x_n y'_n & -y_n y'_n & -y'_n
\end{bmatrix}$$

Our approach is an iterative and incremental method able to refine the homography as new correspondences are recovered. The contribution of a new correspondence to the transformation in (17) and (18) can be computed by considering that the parameters of the transformation must satisfy the following equation:

$$\begin{bmatrix} C_{n-1} \\ A_n \end{bmatrix} \cdot p = 0 \quad (22)$$

To solve equation (22), we multiply on the left by the transposed matrix and obtain:

$$\begin{bmatrix} C_{n-1}^T & A_n^T \end{bmatrix} \cdot \begin{bmatrix} C_{n-1} \\ A_n \end{bmatrix} \cdot p = 0 \quad (23)$$

and then

$$(C_{n-1}^T \cdot C_{n-1} + A_n^T \cdot A_n) \cdot p = 0$$

When a new correspondence is known, we define the matrix A_n, computed from only the n-th correspondence between the two views, as:

$$A_n = \begin{bmatrix}
x_n & y_n & 1 & 0 & 0 & 0 & -x_n x'_n & -y_n x'_n & -x'_n \\
0 & 0 & 0 & x_n & y_n & 1 & -x_n y'_n & -y_n y'_n & -y'_n
\end{bmatrix} \quad (24)$$

By defining the matrix M as:

$$M = C_{n-1}^T \cdot C_{n-1} + A_n^T \cdot A_n \quad (25)$$

the parameters p must satisfy the equation

$$M \cdot p = 0 \quad (26)$$

To obtain an iterative relation, each time a new point is known we update the matrix M and estimate the parameter variation δ so as to satisfy, at the n-th step, the equation:

$$M(n) \cdot (p(n-1) + \delta(n)) = 0 \quad (27)$$

and thus

$$M(n) \cdot p(n-1) + M(n) \cdot \delta(n) = 0 \quad (28)$$

To update the data structures and compute δ from the knowledge of the new correspondence, we use the following set of equations:

$$M(n) = M(n-1) + A_n^T \cdot A_n \quad (29)$$

$$\delta(n) = -pInv(M(n)) \cdot M(n) \cdot p(n-1) \quad (30)$$

$$p(n) = p(n-1) + \delta(n) \quad (31)$$

where pInv(·) is a function computing the pseudo-inverse of a matrix, while p(n) is the estimate of the parameters at the n-th step.
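A minimal sketch of this incremental update, using numpy's pseudo-inverse in place of pInv(·), could be as follows; the class name, the renormalization of the last component, and the check on the number of correspondences are assumptions made for this sketch.

```python
import numpy as np

class IncrementalHomography:
    """Incremental estimate of p = [a, b, c, d, e, f, g, h, 1]^T, eqs. (24)-(31)."""

    def __init__(self):
        self.M = np.zeros((9, 9))
        self.p = np.zeros(9)
        self.p[8] = 1.0      # the last parameter is fixed to 1, as in the definition of p
        self.n = 0

    def add_correspondence(self, pt, pt_prime):
        """pt = (x, y) in the transformed image, pt_prime = (x', y') in the original one."""
        x, y = pt
        xp, yp = pt_prime
        A_n = np.array([
            [x, y, 1, 0, 0, 0, -x * xp, -y * xp, -xp],   # row from eq. (19)
            [0, 0, 0, x, y, 1, -x * yp, -y * yp, -yp],   # row from eq. (20)
        ], dtype=float)
        self.M += A_n.T @ A_n                                  # eq. (29)
        delta = -np.linalg.pinv(self.M) @ (self.M @ self.p)    # eq. (30)
        self.p = self.p + delta                                # eq. (31)
        if abs(self.p[8]) > 1e-12:
            self.p = self.p / self.p[8]   # keep the last component equal to 1 (sketch assumption)
        self.n += 1
        # At least four pairs of points are needed for a valid homography.
        return self.p.reshape(3, 3) if self.n >= 4 else None
```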

After some correspondences, the estimated homography is reasonably precise. We stop the iteration when the estimated parameters are stable and no significant change occurs. The major advantage is that, at any moment, when a new calibration of the system is needed, it is possible to use this procedure to update the parameters of the homography without storing all the correspondences coming from the cameras. Figure 5 shows a pair of frames from two different cameras and one of the two frames after the estimated homography was applied.

Fig. 5. Example of results: a) frame from camera 1, b) frame from camera 2, c) frame from camera 2 after the estimated homography was applied, d) frame from camera 2 after applying the homography estimated from five hand-detected landmarks.

6. EXPERIMENTAL RESULTS

We tested our method on publicly available datasets [13, 14] consisting of outdoor sequences taken by several cameras with overlapping FOVs. To measure the goodness of the homography H̃ automatically estimated by our method, we compared it against the homography H̄ computed from suitable hand-annotated landmarks. We computed the mean-square error between the image Ĩ obtained by applying H̃ and the image Ī obtained by applying the homography H̄:

$$mse = \frac{1}{M \cdot N} \cdot \sum_i \sum_j \big(\tilde{I}(i, j) - \bar{I}(i, j)\big)^2 \quad (32)$$

As expected, as the number of pairs of corresponding points increases, the resulting homography H̃ becomes more and more robust and reliable. Figure 6 shows the trend of the mean-square error with respect to the number of correspondences used for the images in figure 5.

Fig. 6. Mean-Square Error trend versus increasing number of correspondences. A minimum of four points is needed to estimate an acceptable transformation.
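A small sketch of how this evaluation can be computed with OpenCV, given the automatically estimated homography and a hand-annotated one, might be the following; it assumes both homographies are expressed in the convention expected by warpPerspective.

```python
import cv2
import numpy as np

def homography_mse(frame, H_auto, H_manual):
    """Mean-square error (eq. 32) between the frame warped with the estimated
    homography and the same frame warped with a hand-annotated one."""
    h, w = frame.shape[:2]
    I_auto = cv2.warpPerspective(frame, H_auto, (w, h)).astype(np.float64)
    I_manual = cv2.warpPerspective(frame, H_manual, (w, h)).astype(np.float64)
    return float(np.mean((I_auto - I_manual) ** 2))
```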

6.1. Two-camera Scenario

To better illustrate how the method works, we demonstrate the results by means of a pair of outdoor cameras [13] placed in two different locations and with overlapping FOVs. Figure 5, a) and b), shows two images of the area under surveillance taken by the two cameras, while figure 7 shows, as red circles, the corresponding points used to iteratively estimate the homography. Finally, figure 8 shows the images obtained by applying the updated homography each time a new point is detected.

Fig. 7. Correspondence points: a) frame from camera 1, b) frame from camera 2. In both images, the corresponding points used to estimate the homography are represented by red circles.

Fig. 8. Example of results: transformed frame from camera 2 a) by using four points, b) by using five points, c) by using six points, d) by using seven points.

7. CONCLUSIONS AND FUTURE WORKS

In this paper we have described a fully automatic method to estimate the homography between a pair of overlapping camera views. Our method can be used in outdoor distributed video-surveillance systems in which smart cameras compute low-level information, like the appearance model and the image-plane location of the detected moving objects, and send these data to a central unit acting as a logical reasoner. By establishing correspondences among the camera views, the reasoner estimates the homography among the views in an incremental way. Our method works on-line and only needs to know which cameras have overlapping views. Smart cameras work independently of each other, so the resulting system is scalable: each time the homography between a couple of cameras must be recovered or updated, the procedure can be started without involving all the other cameras.

Improvements to the method are the subject of current research activities in our laboratory. In particular, we want to make the system less sensitive to sudden variations of the detected objects' speed; in these cases, indeed, there is an error in establishing the correspondences that lowers the performance of the method. Another interesting development is toward a distributed computation of the homography, in which the transformation is computed directly by the smart cameras without using the reasoner.

8. ACKNOWLEDGEMENT

This work is partially supported by the project FREE SURF funded by the Italian MIUR Ministry (2007-2008).

9. REFERENCES

[1] M. Valera and S. A. Velastin, "Intelligent distributed surveillance systems: a review," Vision, Image and Signal Processing, IEE Proceedings, vol. 152, pp. 192-204, 2005.

[2] I. Haritaoglu, D. Harwood, and L. S. Davis, "W4: Real-time surveillance of people and their activities," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, pp. 809-830, 2000.

[3] T. P. Chen, H. Haussecker, A. Bovyrin, R. Belenov, K. Rodyushkin, A. Kuranov, and V. Eruhimov, "Computer vision workload analysis: Case study of video surveillance systems," Intel Technology Journal - Compute-intensive, highly parallel applications and uses, vol. 9, no. 2, pp. 109-118, 2005.

[4] D. Koller, J. Weber, T. Huang, J. Malik, G. Ogasawara, B. Rao, and S. Russell, "Towards robust automatic traffic scene analysis in real-time," Proceedings of the 12th IAPR International Conference on Pattern Recognition, vol. 1, pp. 126-131, 1994.

[5] S. Calderara, R. Cucchiara, and A. Prati, "Bayesian-competitive consistent labeling for people surveillance," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 2, pp. 354-360, 2008.

[6] S. Khan and M. Shah, "Consistent labeling of tracked objects in multiple cameras with overlapping fields of view," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 10, pp. 1355-1360, 2003.

[7] C. Jaynes, "Multi-view calibration from planar motion for video surveillance," Second IEEE Workshop on Visual Surveillance (VS'99), pp. 59-66, Jul 1999.

[8] C. Stauffer and K. Tieu, "Automated multi-camera planar tracking correspondence modeling," IEEE Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 259-266, 18-20 June 2003.

[9] L. Lee, R. Romano, and G. Stein, "Monitoring activities from multiple video streams: Establishing a common coordinate frame," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 758-767, 2000.

[10] L. Lo Presti and M. La Cascia, "Real-time object detection in embedded video surveillance systems," 9th International Workshop on Image Analysis for Multimedia Interactive Services, IEEE Proceedings, 2008.

[11] L. Snidaro, C. Micheloni, and C. Chiavedale, "Video security for ambient intelligence," IEEE Transactions on Systems, Man and Cybernetics, Part A, vol. 35, pp. 133-144, Jan. 2005.

[12] M. Piccardi and E. D. Cheng, "Multi-frame moving object track matching based on an incremental major color spectrum histogram matching algorithm," Conference on Computer Vision and Pattern Recognition, IEEE Proceedings, p. 19, 2005.

[13] PETS Metrics Datasets, http://www.cvg.cs.rdg.ac.uk/cgi-bin/petsmetrics/page.cgi?dataset

[14] CAVIAR Datasets, http://homepages.inf.ed.ac.uk/rbf/caviar/
