Robust Unsupervised Motion Pattern Inference from Video and Applications
Xuemei Zhao and Gérard Medioni
University of Southern California, Los Angeles, CA 90089
{xuemeiz|medioni}@usc.edu

Abstract

We propose an unsupervised learning framework to infer motion patterns in videos and in turn use them to improve tracking of moving objects in sequences from static cameras. Based on tracklets, we use Tensor Voting, a manifold learning method, to infer the local geometric structures in (x, y) space, and embed tracklet points into (x, y, θ) space, where θ represents motion direction. In this space, points automatically form intrinsic manifold structures, each of which corresponds to a motion pattern. To identify each group, a novel robust manifold grouping algorithm is proposed. Tensor Voting is performed to provide multiple geometric cues, which formulate multiple similarity kernels between any pair of points, and a spectral clustering technique is used in this multiple kernel setting. The grouping algorithm achieves better performance than state-of-the-art methods in our applications. Extracted motion patterns can then be used as a prior to improve the performance of any object tracker; they are especially useful for reducing false alarms and ID switches. Experiments are performed on challenging real-world sequences, and a quantitative analysis of the results shows that the framework effectively improves a state-of-the-art tracker.

1. Introduction

With the decreasing cost of collecting data, the deluge of surveillance videos makes it necessary to carry out automatic intelligent processing to understand scenes and analyze activities. Learning the motion patterns of moving objects is an important way to address this problem, and in recent years significant effort has been devoted to this topic in the visual surveillance community. Recent work in scene understanding has shown that high-level knowledge about the scene in the form of motion patterns helps low-level detection and tracking [5, 6, 8, 16, 21], as well as high-level anomaly detection and behavior prediction [6, 8, 16, 19].

Figure 1. Examples from NGSIM (a, b) and YouTube (c, d): arrows illustrate the directions of movement.

A motion pattern or flow is a smooth and compact spatio-temporal structure that describes a set of neighboring objects moving in a similar way; some examples are shown in Fig. 1. For instance, Fig. 1(a) shows an intersection: vehicles moving east to west form one motion pattern, while vehicles from the south making a right turn form another. Fig. 1(d) shows a marching band, where two groups of performers form two intersecting, rotating circles, giving rise to two different motion patterns. Our method is able to separate flows with different directions even when they overlap spatially.

Motion patterns are often obtained from sparse feature points, such as sparse optical flow and trajectory points, but the information propagates to all pixels in the image. Motion patterns therefore serve as a geometric and statistical model of the scene, providing information on where and what is likely to happen. Using them as a prior makes tasks like tracking and activity analysis easier. Note that we focus here on global motion patterns, while another category of methods [9] focuses on local motion patterns. A combination of the two would be interesting, but it is beyond the scope of this paper.

In this paper, we propose a novel unsupervised motion pattern inference method based on tracklets in static camera scenes. We mainly deal with far-field surveillance videos, which are often of low quality and in which the size of moving objects is relatively small.

The low resolution makes it difficult, if not impossible, for appearance-based detectors to work in complex scenes. Therefore, we first extract motion blobs corresponding to foreground moving objects via background modeling, and then perform local association to obtain tracklets. By applying Tensor Voting [11] to these tracklets, we obtain a refined motion direction for each tracklet point, and then embed the 2D tracklet point positions into (x, y, θ) space, where θ represents the motion direction. When motion patterns exist, points form intrinsic manifold structures. To segment these structures, a novel robust manifold grouping algorithm is proposed. It explicitly handles outliers and uses local geometric information such as the normal/tangent space and the dimensionality of the local structure each point belongs to. In addition, kernel density estimation is performed to propagate the grouping information from tracklet points to all pixels, thus defining dense motion patterns.

With flow information as a prior, tracking becomes easier. Each track on the image lattice can be seen as a random walk where the prior probabilities of transitions at each state are given by the motion pattern estimates. These prior probabilities help remove false associations, thus greatly decreasing false alarms. Since this is a general approach, it can be integrated into any tracking algorithm.

Contributions. Our goal here is not to provide a new tracking algorithm, but a general framework to represent motion pattern information that can be incorporated as prior knowledge in any tracker. The key contribution of our work is twofold. First, a novel framework for learning motion patterns is proposed. A robust multiple manifold grouping algorithm making full use of local geometric information is presented. Compared to state-of-the-art candidate algorithms that focus on linear subspace segmentation, the new method proves better at grouping motion patterns, which are nonlinear in many cases. Second, motion pattern knowledge is used in turn to improve tracking, and a quantitative analysis shows that our method effectively improves a state-of-the-art tracker [13, 14] on challenging real-world sequences.

The rest of this paper is organized as follows. Section 2 discusses related work. Section 3 describes feature extraction. Section 4 presents a new robust non-linear manifold grouping algorithm. Sections 5 and 6 explain how to learn motion patterns and how to use them to improve tracking, respectively. Section 7 shows experimental results, followed by conclusion and future work.

2. Related Work

Features. Pioneering works on motion pattern extraction can be roughly classified by the features they use. One major class of methods uses optical flow [10, 15, 21] as input, while the other uses trajectories obtained by tracking moving objects [8, 19]. Systems built on motion flow fields have broad applications, for example in extremely crowded scenes where other features are difficult to obtain. However, simple and fast optical flow methods cannot generate results reliable enough to serve as a solid basis, while more sophisticated ones are too slow to be practical [21]. On the other hand, in many cases multi-target tracking is a difficult task in itself. We therefore sit between the two, and use local moving object association results (tracklets) as features instead of global association results (trajectories).

Applications. From the viewpoint of applications of motion patterns, works can be roughly divided into tracking [3, 4, 21], segmentation [10, 18], and anomaly detection [8, 18, 19]. In segmentation, [10] segments every frame of a video into regions of different motions based on the similarity of neighboring streamlines, and [18] obtains motion patterns by segmenting video sequences according to the types of interaction occurring, detecting activities both temporally and spatially. Pure multi-target tracking methods are object-centric and do not exploit any high-level or global knowledge that may aid tracking; this is one of the major differences between those approaches and algorithms that track with flow knowledge. In both [3, 4, 21] and our work, high-level constraints resulting from scene structure, in the form of motion patterns, are integrated into the tracking algorithm.

3. Feature Extraction

Extraction of Foreground Motion Blobs. The goal of background modeling is to learn the static part of the scene in an unsupervised fashion. This is a challenging task in many settings, such as traffic surveillance videos, due to the low quality of images and complex background structure. Here we use Robust Alignment by Sparse and Low-rank Decomposition (RASL) [12] to learn foreground points, i.e., the pixels supposed to lie on moving objects, by treating the static scene (background) as intrinsic images and the moving objects (foreground) as sparse errors, or corruptions of the intrinsic images. Notice that RASL estimates a global transformation, which means it could handle moving camera scenes. Our current framework deals with fixed camera scenes, but this potential makes RASL the best choice for future generalization. To our knowledge, this is the first time RASL is used in a tracking system. After performing RASL, foreground points ei are obtained for every frame. Connected components are then extracted directly from the foreground points and treated as motion blobs corresponding to moving objects or parts of them. Examples of foreground points and motion blobs acquired from a frame are shown in Fig. 2; a minimal sketch of this blob-extraction step follows the figure.


Figure 2. Segmentation: (a) foreground pixels; (b) motion blobs.
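The connected-component step can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it takes a binary per-frame foreground mask (e.g., the sparse-error support from RASL, which is not reproduced here), and the min_area threshold is a hypothetical noise filter.

```python
import numpy as np
from scipy import ndimage

def extract_motion_blobs(foreground_mask, min_area=20):
    """Return centers of connected foreground components (motion blobs).

    foreground_mask: 2D boolean array; min_area is a hypothetical
    threshold for discarding tiny noise blobs (not from the paper).
    """
    labels, n = ndimage.label(foreground_mask)
    idx = range(1, n + 1)
    centers = ndimage.center_of_mass(foreground_mask, labels, idx)
    areas = ndimage.sum(foreground_mask, labels, idx)
    return [c for c, a in zip(centers, areas) if a >= min_area]
```

The blob centers returned here are what the local association step below links into tracklets.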

Local Association. Using the method proposed in [13, 14], we perform local association to get initial tracklets. The algorithm takes the set of motion blobs in each frame as input and infers an optimal set of tracklets. This is formulated by constructing a set of Bayesian networks, each of which models the joint distribution of detections over a sliding window of length L. Using consistency of motion and appearance as the driving force, MAP inference in this graphical model then gives an efficient and optimal solution. It is worth noting that any other local association method producing reasonable tracklets could be used instead.

These initial tracklets may contain errors due to many factors. In the motion blob extraction step, a single moving object is often split into multiple moving regions when some of its parts are indistinguishable from the background, or multiple neighboring moving regions are merged into a single blob when they overlap from the camera's viewpoint. Both cause appearance changes that may lead to errors in the association step. In spite of these errors, by observing enough tracklets, a fairly good understanding of the scene can be acquired. We use these tracklets in the training phase to learn motion patterns.

4. Robust Non-linear Manifold Grouping

As an important step in motion pattern learning, automatic and robust grouping is needed. State-of-the-art subspace grouping methods like [7, 17, 20] achieve good performance on multiple linear subspace segmentation problems. However, motion patterns in reality often have nonlinear shapes (like a right turn), which poses difficulties for them, mainly because these methods assume the intrinsic manifolds have linear structures. Furthermore, robustness is not explicitly considered in most of these frameworks. Driven by these factors, we propose a novel Robust Non-linear Manifold Grouping (RNMG) method, which explicitly considers outliers and can handle multiple nonlinear intrinsic subspaces with different dimensionalities. It proves effective in our applications. It is worth noting that we do not claim a better algorithm than [7, 17, 20], which achieve the best results on multiple linear subspace grouping and motion segmentation tasks; our algorithm instead focuses on multiple nonlinear manifold grouping in 3D space.

One of the most popular clustering methods is spectral clustering, which has a solid theoretical foundation (graph spectral theory) and an elegant computational framework (eigen-decomposition). However, it is pointed out in [22] that pairwise similarity is not enough for manifold learning, i.e., high-order relationship information between point sets is missing. Inspired by [22], we present a novel way to construct the similarity graph in a multiple kernel setting, which includes high-order point set information.

4.1. N-D Tensor Voting

An important step in our grouping algorithm is the use of Tensor Voting to provide multiple geometric cues for constructing the similarity graph. Tensor Voting is an unsupervised computational framework to infer the local geometric structures of manifolds [11]. It has been proved capable of estimating structures in N-D space from very noisy input data. Given samples in an input space of dimensionality N, the local geometric information at each point is encoded in a tensor T, whose quadratic form is a symmetric and non-negative definite matrix. Recall that a tensor can be decomposed as

T = \sum_{i=1}^{N} \lambda_i e_i e_i^T = \sum_{i=1}^{N-1} (\lambda_i - \lambda_{i+1}) \sum_{k=1}^{i} e_k e_k^T + \lambda_N \sum_{i=1}^{N} e_i e_i^T    (1)

where {\lambda_i} are the eigenvalues in descending order and {e_i} are the corresponding eigenvectors. This way, local geometric information such as dimensionality and normal/tangent space at every point can be derived by examining the eigensystem of the corresponding tensor.

Intrinsic dimensionality. The largest gap between two consecutive eigenvalues \lambda_i - \lambda_{i+1} indicates the dimensionality d of the local structure that the point belongs to:

d = N - \arg\max_i (\lambda_i - \lambda_{i+1})    (2)

Local normal and tangent space. A compact representation of the local manifold structure at the point corresponding to T is an (N - d)-dimensional normal space spanned by {e_1, ..., e_{N-d}} and a d-dimensional tangent space spanned by {e_{N-d+1}, ..., e_N}.

Suppose we have a set of points P_i (i = 1, 2, ..., N) in N-D space. Initially, we encode every data point as an identity matrix, indicating no orientation preference, since we have no knowledge of its local structure at the beginning. In the voting process, each point P_i propagates its information to its neighbors and meanwhile collects information from them. The vote from a voter to a receiver depends on the tensor of the voter, and on the orientation and distance between the two. After the voting process, the sum of all votes received from neighbors becomes the new tensor of a point, and by analyzing it as in Eq. 1, we obtain the point's geometric properties. More details can be found in [11].
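To make Eqs. (1)-(2) concrete, here is a minimal NumPy sketch (ours, not the paper's code) that analyzes an accumulated tensor: it locates the largest eigengap to estimate the intrinsic dimensionality and splits the eigenvectors into normal and tangent bases.

```python
import numpy as np

def analyze_tensor(T):
    """Analyze a point's tensor: T is an (N, N) symmetric PSD matrix."""
    lam, E = np.linalg.eigh(T)            # ascending eigenvalues
    lam, E = lam[::-1], E[:, ::-1]        # reorder to descending
    N = len(lam)
    gaps = lam[:-1] - lam[1:]             # lambda_i - lambda_{i+1}, i = 1..N-1
    i_star = int(np.argmax(gaps)) + 1     # position of the largest eigengap
    d = N - i_star                        # Eq. (2): local structure dimensionality
    normal_space = E[:, :i_star]          # (N - d)-dim normal space
    tangent_space = E[:, i_star:]         # d-dim tangent space
    return d, normal_space, tangent_space
```

The full framework also needs the vote accumulation itself (the decaying, orientation-dependent votes of [11]), which is omitted here for brevity.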


4.2. Multiple Kernel Similarity Graph

After estimating the local geometric structures on the manifolds, we construct multiple kernels.

Distance kernel. This kernel is widely used in graph spectral clustering and is defined as

w_{dis}(x_i, x_j) = \exp(-dis(x_i, x_j)^2 / \sigma_{dis}^2)    (3)

Here dis(x_i, x_j) can be the simple L2 distance in Euclidean space, or the geodesic distance inspired by ISOMAP. One may argue that geodesic distance is a better measure here. However, there are usually multiple manifold structures in the scene, so the geodesic distance between two points is difficult to define without knowing whether they belong to the same structure, which is exactly what the grouping algorithm aims to determine. Another potential problem with geodesic distance is the shortcut issue.

Normal space kernel. The intuition is simple: if two points are on the same manifold, in particular the same local manifold, their normal spaces should be similar. Although two points far from each other on the same manifold may have a large principal angle between their normal spaces, we still use normal space similarity to build our kernel. The main reason is that a motion pattern usually has relatively low curvature, i.e., it changes smoothly and slowly as spatial distance increases. Thus we have

w_{nor}(x_i, x_j) = \exp(-\sin(\theta(x_i, x_j))^2 / \sigma_{nor}^2)    (4)

where \theta(x_i, x_j) measures the principal angle between the normal space E_i of x_i and E_j of x_j.

Intrinsic dimensionality kernel. Tensor Voting provides reliable intrinsic dimensionality estimates, in particular when a sufficient number of samples on the manifolds is available [11]. This is helpful for constructing the similarity graph, since a single motion pattern usually has a unique intrinsic dimensionality. Thus

w_{dim}(x_i, x_j) = \exp(-(d_i - d_j)^2 / \sigma_{dim}^2)    (5)

Multiple kernel association. There are multiple ways to combine these kernels, and which is best is an open problem in machine learning. In the current setting, we use the simple product of the three and find it effective:

w(x_i, x_j) = w_{dis}(x_i, x_j) \, w_{nor}(x_i, x_j) \, w_{dim}(x_i, x_j)    (6)

A similarity matrix W = [w_{ij}] = [w(x_i, x_j)] is then obtained. Smaller angles between two normal spaces, smaller dimensionality differences, and smaller Euclidean distances all lead to larger weights, i.e., larger similarity scores.

4.3. Graph Spectral Grouping

Once we have the N × N similarity graph, a standard spectral clustering technique can be applied. Specifically, we form the unnormalized Laplacian matrix L = D - W, where D is a diagonal matrix whose entries are the row sums of W. We then select the C eigenvectors corresponding to the C smallest eigenvalues, where C is the number of groups. Finally, the K-means algorithm is applied to those eigenvectors to obtain the grouping results.
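Below is a hedged Python sketch of Secs. 4.2-4.3 (our illustration, not the authors' implementation). It assumes X holds the embedded (x, y, θ) points, normals[i] is an orthonormal basis of point i's normal space, and dims[i] its intrinsic dimensionality, both obtained from Tensor Voting (e.g., the analyze_tensor sketch above); the sigma values are the settings reported in Sec. 7.1.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def similarity_graph(X, normals, dims, s_dis=80.0, s_nor=0.5, s_dim=0.75):
    """Build the multiple-kernel similarity matrix W of Eqs. (3)-(6)."""
    n = len(X)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            w_dis = np.exp(-np.sum((X[i] - X[j]) ** 2) / s_dis ** 2)   # Eq. (3)
            # largest principal angle between the normal spaces, via the
            # smallest singular value of the basis inner product
            s = np.linalg.svd(normals[i].T @ normals[j], compute_uv=False)
            sin2 = max(0.0, 1.0 - float(np.min(s)) ** 2)
            w_nor = np.exp(-sin2 / s_nor ** 2)                          # Eq. (4)
            w_dim = np.exp(-(dims[i] - dims[j]) ** 2 / s_dim ** 2)      # Eq. (5)
            W[i, j] = W[j, i] = w_dis * w_nor * w_dim                   # Eq. (6)
    return W

def spectral_grouping(W, C):
    """Unnormalized spectral clustering (Sec. 4.3) into C groups."""
    L = np.diag(W.sum(axis=1)) - W                 # L = D - W
    _, vecs = np.linalg.eigh(L)                    # ascending eigenvalues
    embedding = vecs[:, :C]                        # C smallest eigenvectors
    _, labels = kmeans2(embedding, C, minit='++', seed=0)
    return labels
```

The O(n^2) double loop is the simplest correct construction; in practice one would restrict kernel evaluations to spatial neighbors.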

4.4. Outlier Rejection

Outlier rejection is an extremely important issue for robust grouping. Previous state-of-the-art approaches can handle outliers to some degree, e.g., up to 5%, but our experiments show that more than 10% outliers are possible in many cases due to tracking errors (false alarms, wrong associations, etc.), which makes it necessary to handle outliers explicitly in our algorithm. In our experiments, 3D Tensor Voting is performed to obtain three eigenvalues {\lambda_i}, i = 1, 2, 3, in descending order. A point is considered an outlier if all of its eigenvalues are small. If the percentage p of outliers is known, we rank all points in descending order of \lambda_1 and reject the bottom p% as outliers. Otherwise, we threshold against the median of all \lambda_1 values to select outliers.
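A minimal sketch of this test, under our assumptions about the interface (lam1 is the array of largest eigenvalues after 3D Tensor Voting; the 0.02 factor is the experimental setting reported in Sec. 7.1):

```python
import numpy as np

def inlier_mask(lam1, p=None, factor=0.02):
    """Return a boolean mask; False marks rejected outliers.

    p: known outlier percentage (reject the bottom p% by lambda_1),
       or None to fall back to the median-based threshold.
    """
    lam1 = np.asarray(lam1, dtype=float)
    if p is not None:
        return lam1 > np.percentile(lam1, p)
    return lam1 > factor * np.median(lam1)
```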

4.5. Number of Motion Patterns

Due to the intrinsic vagueness of motion patterns, it is difficult to tell exactly how many motion patterns exist, even for humans. Because of this vagueness, the number of patterns is not a sensitive parameter in our application, as long as the motion patterns help us understand the scene. Still, there are two ways to decide the number of patterns: the first is to have a human pre-define it, which is what our current framework uses; the second is to learn patterns in a hierarchical fashion, which will be explored in future work.

5. Learning Motion Patterns

Taking the tracklets from the pre-processing step as input, 2D Tensor Voting is performed to get the local tangent direction of every tracklet point. Every tracklet point (x, y) is thereby mapped to a point in (x, y, θ) space, 0 ≤ θ < 360. Previous works like [15, 21] use velocity information of both magnitude and direction. In practice, however, we found directions to be more reliable than magnitudes, whether they come from optical flow or from global/local object association: Tensor Voting can refine directions, but no good method exists to correct magnitudes once they are wrong. That is why we only use direction information in our framework.
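As a simplified stand-in for this embedding (the paper refines directions with 2D Tensor Voting, which is more involved), the following sketch estimates per-point directions with smoothed finite differences along the ordered tracklet; it only illustrates the (x, y, θ) lifting.

```python
import numpy as np

def embed_tracklet(points):
    """points: (n, 2) array of ordered tracklet centers (x, y).

    Returns (n, 3) points in (x, y, theta), theta in [0, 360),
    measured from the positive x axis (east = 0, assuming standard axes).
    """
    pts = np.asarray(points, dtype=float)
    v = np.gradient(pts, axis=0)                          # per-point motion vector
    theta = np.degrees(np.arctan2(v[:, 1], v[:, 0])) % 360.0
    return np.column_stack([pts, theta])
```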


Figure 3. A tracklet: (a) tracked points; (b) motion directions after Tensor Voting.

Motion patterns are inferred in the following process.

Step 1 - Tracklet Analysis. A tracklet is an ordered sequence χ = {x_{p_i}}, where x_{p_i} is the spatial coordinate of the center of the motion blob in frame i. In tracklet analysis, we perform 2D Tensor Voting for every tracklet χ separately, taking {x_{p_i}} as input. The voting process yields the (two-sided) tangent direction of every tracklet point, and the order of the points lets us choose one side, which gives the motion direction of that point and of the corresponding motion blob. An example of a tracklet and the motion directions after Tensor Voting is shown in Fig. 3; Tensor Voting gives accurate direction information even in the presence of noise. Taking east as direction 0, every motion direction is mapped to a degree between 0 and 360. Every tracklet point (x, y) is thus mapped to a point in (x, y, θ) space, 0 ≤ θ < 360, and the original 2D problem is embedded into 3D space.

Step 2 - Multi-Motion Pattern Grouping. In (x, y, θ) space, we find that points automatically form manifold structures; one example is shown in Fig. 4(a). Each structure contains rich information describing a pattern of movement. For example, a specific location (x_0, y_0) may have a correspondence (x_0, y_0, θ_0), meaning that an object at that location is very likely to move in direction θ_0. The task is therefore to group the points in (x, y, θ) space into segments, each corresponding to a motion pattern. RNMG (Sec. 4) is performed to first reject outliers and then group the inliers into segments.

Step 3 - Dense Motion Pattern Inference. After grouping, each segment corresponds to a motion pattern, but so far motion pattern information exists only at sparse tracklet points. To get a full understanding of the whole scene, we propagate the information to all pixels in the image using kernel density estimation. Assume C groups are learned from the last step, corresponding to C motion patterns i (i = 1, 2, ..., C), and the number of tracklet points belonging to pattern i is N_i. Given a 3D point x = (x, y, θ), let m_x = i denote that x belongs to motion pattern i. The probability of x belonging to motion pattern i can be calculated as

P(m_x = i | x = (x, y, θ)) = \frac{P(x = (x, y, θ) | m_x = i) \, P(m_x = i)}{P(x = (x, y, θ))} \propto \frac{1}{Z} \sum_{m_a = i} \exp(-\|a - x\|^2 / \sigma^2) \times \frac{N_i}{\sum_i N_i}    (7)

Here, Z is a normalization term and σ is the bandwidth of the kernel. The motion pattern x belongs to is then the one with the largest probability.
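A hedged sketch of Eq. (7) follows; it is our illustration, not the paper's code. groups maps each pattern label i to its (N_i, 3) array of grouped (x, y, θ) tracklet points, and σ = 30 is the bandwidth setting reported in Sec. 7.1. For simplicity the sketch uses plain Euclidean distance in (x, y, θ) and ignores the 360° wrap-around in θ.

```python
import numpy as np

def pattern_posterior(x, groups, sigma=30.0):
    """Return {i: P(m_x = i | x)} following Eq. (7)."""
    x = np.asarray(x, dtype=float)
    total = sum(len(pts) for pts in groups.values())
    scores = {}
    for i, pts in groups.items():
        kde = np.exp(-np.sum((pts - x) ** 2, axis=1) / sigma ** 2).sum()
        scores[i] = kde * len(pts) / total     # likelihood times prior N_i / sum N_i
    Z = sum(scores.values()) or 1.0            # normalization term
    return {i: s / Z for i, s in scores.items()}

# The pattern x belongs to is the argmax:
# posterior = pattern_posterior(x, groups); m_x = max(posterior, key=posterior.get)
```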

6. Motion Patterns Improve Tracking

Once motion pattern information is obtained, it serves as a prior to facilitate tracking individual objects. In the association stage of our tracking system, the key step is to calculate the probability P_b of a predicted state x^R = (x^R, y^R, θ^R) being the next step of a current state x^C = (x^C, y^C, θ^C). In the original tracking framework, P_b is calculated from appearance similarity and motion similarity. Prior knowledge now tells us that x^R should also lie on the same motion pattern as x^C. To calculate this probability, we first define the motion pattern x^C belongs to as

m_{x^C} = i = \arg\max_j P(m_{x^C} = j | x^C = (x^C, y^C, θ^C))    (8)

The probability that x^R also belongs to pattern i is

P(m_{x^R} = i | x^R = (x^R, y^R, θ^R)) \propto P(x^R = (x^R, y^R, θ^R) | m_{x^R} = i) \times P(m_{x^R} = i) = \frac{1}{Z} \sum_{m_a = i} \exp(-\|a - x^R\|^2 / \sigma^2) \times \frac{N_i}{\sum_i N_i}    (9)

Assume that in the tracking system, the original probability of x^R being the prediction of x^C is P_b = P(X^R = x^R | X^C = x^C). A better measure of this probability using motion pattern information is then

P_b' = P(X^R = x^R | X^C = x^C, m_{x^R} = m_{x^C} = i) \approx \sum_{i=1}^{C} P(m_{x^R} = i | X^R = x^R) \, P_b \, P(m_{x^C} = i | X^C = x^C)

This use of motion pattern knowledge in tracking is general, so it is not limited to our tracker; it can easily be integrated into any tracking module.

Complexity Analysis. Assume the input images are of size W × H, and K tracklets of average length N are learned in the feature extraction step. The computational complexity of motion pattern learning is O(KNWH + K^3 N^3), and the additional cost of using this prior to improve tracking is O(KN). The Matlab implementation of the learning step runs at about 2 FPS on a 3.0 GHz PC.


In practice, we find that instead of incurring extra time cost, using this prior can actually speed up the tracking procedure, since false alarms are largely reduced.
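To make the integration concrete, here is a minimal sketch (ours, under the same assumptions as the Eq. (7) sketch in Sec. 5) of how the prior reweights an existing tracker's association score:

```python
def reweighted_association(P_b, x_c, x_r, groups, sigma=30.0):
    """P_b: the tracker's original probability that state x_r follows
    state x_c; groups/sigma as in pattern_posterior (Eq. (7))."""
    post_c = pattern_posterior(x_c, groups, sigma)
    post_r = pattern_posterior(x_r, groups, sigma)
    # sum over the C patterns, as in the expression for P_b' above
    return sum(post_r[i] * P_b * post_c[i] for i in groups)
```

Because the reweighting only multiplies the tracker's own score, it can wrap any association-based tracker without modifying its internals.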

7. Experimental Results

7.1. Learning Motion Patterns by RNMG

We apply the proposed method to several real-world video sequences. Three sequences are from the NGSIM data set [2]; they were acquired by static cameras installed on high-rise buildings and suffer from low image quality, with moving vehicles of average size about 22 × 15 pixels. Two other sequences are from YouTube. To test the effectiveness of RNMG, we compare it with five major algorithms [1]: K-means, RANSAC, Local Subspace Affinity (LSA) [20], Generalized PCA (GPCA) [17], and Sparse Subspace Clustering (SSC) [7]. The parameters σ_dis (L2 distance), σ_nor, σ_dim, σ, and L are fixed at 80, 0.5, 0.75, 30, and 20, respectively, for all experiments. Since GPCA, LSA, RANSAC, and K-means have no explicit outlier rejection step, to make the comparison fair we first reject outliers with the method proposed in our algorithm (threshold = 0.02 × median{λ_1}) and feed the same inliers to all methods for grouping; without outlier rejection, the non-robust segmentation methods would give worse results. Compared to algorithms like GPCA and LSA, which focus on linear subspace segmentation, the new method is better at grouping nonlinear motion patterns. Since our algorithm targets multiple nonlinear manifold grouping, results on Hopkins 155 [1] are not provided.

Taking sequence 2 as an example, three intrinsic manifold structures exist in (x, y, θ) space, corresponding to the three motion patterns in Fig. 5(f). We manually labeled ground truth segments. Visualizations of the ground truth and the grouping results of the different algorithms are shown in Fig. 4. Performance, shown in Table 1, is evaluated in terms of the grouping error, defined as the number of misgrouped points divided by the total number of points.

7.2. Tracking

To test the usefulness of motion patterns in tracking, we select 100 frames, not used in learning the motion patterns, from each of the three NGSIM sequences. Ground truth trajectories are generated by manually tracking each moving vehicle. The data set is challenging due to heavy traffic, occlusion, and low resolution. Four metrics are adopted to quantitatively analyze the results: (1) TRDR, the tracker detection rate = total true positives / total number of ground truth; (2) FAR, the false alarm rate = total false positives / (total true positives + total false positives); (3) MT, mostly tracked trajectories, the percentage of trajectories successfully tracked for more than 50% of their frames; (4) FRAG, the average number of fragments per tracked ground-truth trajectory. Higher values are better for TRDR and MT; lower values are better for FAR and FRAG. These four metrics are by no means complete, but they cover most of the typical errors observed in our experiments and form a reasonable evaluation basis.
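As a small reference sketch of the two counting metrics (our illustration; the counts are assumed to be accumulated over all test frames beforehand):

```python
def trdr(true_pos, n_ground_truth):
    """Tracker detection rate, metric (1)."""
    return true_pos / n_ground_truth

def far(true_pos, false_pos):
    """False alarm rate, metric (2)."""
    return false_pos / (true_pos + false_pos)
```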

Figure 4. Grouping results for the sequence in Fig. 5: (a) ground truth; (b) RANSAC; (c) LSA; (d) GPCA; (e) SSC1; (f) ours. Black points indicate outliers.

For comparison, we first perform tracking with the state-of-the-art tracker of [13, 14], then incorporate the motion pattern information into the tracker and track again. Tracking evaluation results are shown in Table 2. With a similar detection rate, our method significantly decreases false alarms, showing that motion pattern information prevents wrong associations by providing prior knowledge of the objects' movement. Since the detection rate depends largely on the initial motion blobs found by background modeling, motion patterns do not contribute much to improving detection.

8. Conclusion and Future Work

In this paper, we proposed a method to detect multiple semantically meaningful motion patterns in an unsupervised manner, and used them to improve tracking accuracy. A novel robust grouping algorithm making full use of local geometric information was designed.


K-means   RANSAC   LSA      GPCA     SSC1     SSC2     Ours
42.96%    40.34%   31.61%   21.28%   27.21%   23.03%   7.99%

Table 1. Misclassification rates of seven grouping algorithms on the real-world sequences. SSC1 uses SSC's own outlier rejection step; SSC2 uses our outlier rejection step.

Sequence   Method     TRDR     FAR      MT       FRAG
1          [13, 14]   81.60%   15.50%   76.92%   1.1429
1          Ours       82.12%    3.75%   80.77%   1.1429
2          [13, 14]   80.10%   33.45%   77.08%   1.9000
2          Ours       81.79%   20.95%   81.25%   1.5610
3          [13, 14]   85.21%   33.75%   88.46%   1.8333
3          Ours       88.76%   21.42%   88.46%   1.4000

Table 2. Tracking evaluation results on the three sequences.

The experimental results on complex video sequences show that the grouping algorithm outperforms state-of-the-art alternatives, and verify that using high-level knowledge about the scene in the form of motion patterns significantly improves tracking performance. In future work, we will generalize the current framework to moving camera scenes and apply sliding window techniques to update motion patterns online. With respect to applications, motion pattern information will be used to help anomaly detection and behavior prediction.

Acknowledgements

This work was supported in part by grant DE-FG52-08NA28775 from the U.S. Department of Energy.

References

[1] Hopkins 155 Matlab implementation for subspace clustering algorithms. http://www.vision.jhu.edu/code/.
[2] Next Generation Simulation (NGSIM) dataset. http://www.ngsim.fhwa.dot.gov/.
[3] S. Ali and M. Shah. A Lagrangian particle dynamics approach for crowd flow segmentation and stability analysis. CVPR, pages 1-6, 2007.
[4] S. Ali and M. Shah. Floor fields for tracking in high density crowd scenes. ECCV, pages 1-14, 2008.
[5] G. Antonini, S. V. Martinez, M. Bierlaire, and J. P. Thiran. Behavioral priors for detection and tracking of pedestrians in video sequences. IJCV, 69(2):159-180, 2006.
[6] J. Berclaz, F. Fleuret, and P. Fua. Multi-camera tracking and atypical motion detection with behavioral maps. ECCV, pages 112-125, 2008.
[7] E. Elhamifar and R. Vidal. Sparse subspace clustering. CVPR, pages 2790-2797, 2009.
[8] W. Hu, X. Xiao, Z. Fu, D. Xie, T. Tan, and S. Maybank. A system for learning statistical motion patterns. PAMI, 28(9):1450-1464, 2006.
[9] L. Kratz and K. Nishino. Tracking with local spatio-temporal motion patterns in extremely crowded scenes. CVPR, pages 693-700, 2010.
[10] R. Mehran, B. Moore, and M. Shah. A streakline representation of flow in crowded scenes. ECCV, pages 439-452, 2010.
[11] P. Mordohai and G. Medioni. Dimensionality estimation, manifold learning and function approximation using tensor voting. JMLR, 11:411-450, 2010.
[12] Y. Peng, A. Ganesh, J. Wright, and Y. Ma. RASL: Robust alignment by sparse and low-rank decomposition for linearly correlated images. CVPR, pages 763-770, 2010.
[13] J. Prokaj and G. Medioni. Inferring tracklets for multi-object tracking. Workshop on Aerial Video Processing, joint with IEEE CVPR, 2011.
[14] J. Prokaj and G. Medioni. Using 3D scene structure to improve tracking. CVPR, pages 1337-1344, 2011.
[15] I. Saleemi, L. Hartung, and M. Shah. Scene understanding by statistical modeling of motion patterns. CVPR, pages 2069-2076, 2010.
[16] I. Saleemi, K. Shafique, and M. Shah. Probabilistic modeling of scene dynamics for applications in visual surveillance. PAMI, 31(8):1472-1485, 2009.
[17] R. Vidal, R. Tron, and R. Hartley. Multiframe motion segmentation with missing data using PowerFactorization and GPCA. IJCV, 79(1):85-105, 2008.
[18] X. Wang, X. Ma, and E. Grimson. Unsupervised activity perception by hierarchical Bayesian models. CVPR, pages 1-8, 2007.
[19] X. Wang, K. Tieu, and E. Grimson. Learning semantic scene models by trajectory analysis. ECCV, pages 110-123, 2006.
[20] J. Yan and M. Pollefeys. A general framework for motion segmentation: independent, articulated, rigid, non-rigid, degenerate and non-degenerate. ECCV, pages 94-106, 2006.
[21] Q. Yu and G. Medioni. Motion pattern interpretation and detection. CVPR, pages 2671-2678, 2009.
[22] D. Zhou, J. Huang, and B. Schölkopf. Learning with hypergraphs: clustering, classification, and embedding. NIPS, pages 1601-1608, 2006.


Figure 5. Left column: three sequences; middle column: the inferred groups, with the main movement directions of the trajectory groups illustrated by white arrows; right column: 2D visualization of motion patterns.
