Proceedings of 2010 IEEE 17th International Conference on Image Processing

September 26-29, 2010, Hong Kong

IN-SEQUENCE VIDEO DUPLICATE DETECTION WITH FAST POINT-TO-LINE MATCHING

Bo Liu¹, Zhu Li¹, Meng Wang², A. K. Katsaggelos³

¹Department of Computing, The Hong Kong Polytechnic University, Hong Kong, CHN
²Microsoft Research Asia, Beijing, CHN
³EECS Department, Northwestern University, Evanston, Illinois, USA

ABSTRACT

A computational geometry approach is developed to detect video duplicates with mild transformations. We model the video sequence as a trajectory after scaling and projection. Through interpolation and equal-curve-length sampling, a subset of the frame points is selected, and a simplified video representation is obtained as the set of line segments connecting the neighboring retained points. For a given query, the match distortion is calculated by projecting the query frame points onto this line segment set, guided by the temporal relationship between frames. Experiments demonstrate the effectiveness of the proposed approach.

Index Terms— Duplicate Detection, Cubic Interpolation, Curve Matching

1. INTRODUCTION

Recent years have witnessed an explosive increase of video content, driven by the development of video capture and transmission capabilities. This creates an urgent demand for robust and efficient large-scale video database management technologies. Video duplicate detection, which seeks duplicate video clips in a database, has become a highly desired technique, and it plays a unique role in applications such as web video search [12], copyright protection [8], and topic detection and tracking [3]. Here a duplicate is not necessarily an exact copy of the original video sequence; it may undergo transformations such as changes in lighting, viewpoint, and camera length.

Extensive research efforts have been dedicated to video duplicate detection, and a comprehensive literature survey can be found in [10]. Most approaches accomplish the task via key-frame matching. Although encouraging results are achieved, the considerable computational cost of key-frame extraction, together with feature extraction and matching, makes this technology difficult to use in real-time applications. In-sequence video duplicate detection is a more challenging problem [1] [7] [9]. Instead of finding duplicates of a query clip in a set of segments, the task is to detect a duplicate within a long video sequence, which means the duplicate must also be located, and this localization introduces much higher computational cost. The technology is also of real significance.


For example, statistics indicate that about 20% of the videos in the collected web duplicate video database of [12] are exact duplicates, and exact duplicates or duplicates with mild transformations are also common in TV news, advertisements, and social web repositories such as YouTube [6] and Google Video [4].

This paper introduces a fast approach for in-sequence video duplicate detection. We investigate the problem from a computational geometry perspective and propose a fast approach based on trajectory matching. In our scheme, a video sequence is reduced to a sequential trajectory in the principal component space and each frame is viewed as a point on this trajectory, which turns duplicate detection into a trajectory matching problem. To reduce the computational cost, we select a subset of the points to approximate the trajectory through interpolation and sampling. The temporal relationship between frames is then used to match the query clip against the simplified video trajectory.

The rest of this paper is organized as follows. In Section 2, we introduce our approach. Experiments are provided in Section 3. Finally, we conclude the paper in Section 4.

2. VIDEO DUPLICATE DETECTION WITH POINT-TO-LINE MATCHING

2.1. Video Luminance Trajectory Model

In order to detect and locate video duplicates in real time, it is necessary to find an efficient feature descriptor for the video frames. Here we utilize the global luminance field trace model, which has proved successful in previous work [11]. All frames in the video sequence are mapped into a d-dimensional space after scaling and PCA transformation. The feature dimension is chosen as a compromise between computational efficiency and detection accuracy. By exploiting the temporal consistency of video, the video sequence can be viewed as a trajectory composed of ordered frame points in the principal component space. A typical example is shown in Fig. 1(a), which is the 2D trajectory of the 300-frame video sequence "Foreman" (http://trace.eas.asu.edu/yuv/index.html).
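As a rough illustration of this trajectory model, the sketch below scales each frame to a small icon, flattens it, and projects it onto the leading principal components. It is a minimal sketch under our own assumptions (frames supplied as 8-bit grayscale arrays, PCA fitted on the sequence itself rather than on the whole repository); the function and parameter names (build_trajectory, icon_size, d) are illustrative, not from the paper.

```python
import numpy as np
from PIL import Image

def build_trajectory(frames, icon_size=(11, 8), d=10):
    """Map a video sequence to a d-dimensional luminance trajectory.

    frames: iterable of 2-D uint8 grayscale arrays, one per frame.
    Returns an (n, d) array; row i is the trajectory point of frame i.
    """
    # Scale every frame down to a small icon and flatten it.
    icons = []
    for f in frames:
        img = Image.fromarray(f).resize(icon_size)   # small luminance icon
        icons.append(np.asarray(img, dtype=np.float64).ravel())
    X = np.stack(icons)                              # shape (n, icon_w * icon_h)

    # PCA via SVD of the mean-centered icon vectors.
    X -= X.mean(axis=0, keepdims=True)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:d].T                              # principal component coefficients
```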

Fig. 1. (a) 2D trajectory of the "Foreman" sequence; (b) sequence interpolation result; (c) distribution of retained points after sampling with the equal-curve-length strategy; (d) distribution of retained points after sampling with the equal-time strategy.

2.2. Video Representation with a Compact Luminance Trajectory

After modeling the video sequences as a set of principal component coefficient trajectories, we can detect the duplicates of a given query clip with point-by-point comparison. Mathematically, denoting the query trajectory by Q = {q_i, i = 1, 2, ..., m}, its duplicate can be detected by scanning the video repository C and seeking the potential match location j that minimizes the match distortion:

D(j) = \frac{1}{m} \sum_{i=1}^{m} \lVert q_i - c_{i+j-1} \rVert    (1)

Clearly, higher detection efficiency is achieved if fewer points are used to represent the video trajectories. Here an interpolation-sampling approach is proposed to select a subset of the control points and obtain a simplified video representation.

2.2.1. Trajectory Interpolation

Consider a d-dimensional video trajectory T = {(1, x_1), (2, x_2), ..., (n, x_n)}, where the first coordinate is the time index and x ∈ R^d is the feature vector. We first interpolate the trajectory to obtain a curve representation of it. The cubic spline is preferred because of its efficiency and good behavior [2]. For the j-th feature dimension, the piecewise curve function is defined as:

f_j(x) = f_{j,i}(x), \quad i \le x \le i+1, \quad i = 1, 2, \ldots, n-1    (2)

where f_{j,i}(x) is a third-degree polynomial:

f_{j,i}(x) = a_{i1} x^3 + a_{i2} x^2 + a_{i3} x + a_{i4}    (3)

Several additional stipulations are imposed to guarantee curve smoothness: f_j(x), f'_j(x) and f''_j(x) should be continuous at the knot points. Together with the given n points and the initializations, we obtain a set of equations from which the unknown parameters are easily solved:

f_{j,i-1}(x_i) = f_{j,i}(x_i)    (4)
f'_{j,i-1}(x_i) = f'_{j,i}(x_i)    (5)
f''_{j,i-1}(x_i) = f''_{j,i}(x_i)    (6)
f''_j(1) = f''_j(n) = 0    (7)

We estimate the curve function dimension by dimension, and finally the trajectory is modeled by d curve functions F(x) = [f_1(x), f_2(x), ..., f_d(x)]. This interpolation process turns a video clip into a "continuous" curve in the principal component space controlled by the n video trajectory points. Fig. 1(b) illustrates the cubic interpolation result for the 2D "Foreman" trajectory.

2.2.2. Equal-Curve-Length Sampling

As we aim at choosing a subset of the trajectory points to simplify the video representation, it is intuitively ideal if the selected control points are uniformly distributed along the curve, so that its shape and distribution are better preserved. Here we perform sampling with an equal-curve-length approach. Thanks to the cubic interpolation, the curve length between two points can be calculated to any precision needed:

L(x_i, x_{i+j}) = \int_{x_i}^{x_{i+j}} ds = \lim_{\lambda \to 0} \sum_i \Delta s_i \approx \sum_i \Delta s_i    (8)

We predefine a step length as the criterion for sampling all video sequence curves in the repository. In the sampling process, every selected control point is represented by its principal component coefficients and its location index. For each video sequence curve, the overall sampling process is described in Algorithm 1; a code sketch of the interpolation and sampling steps is given at the end of this subsection.

Algorithm 1: Equal-Curve-Length Sampling
Input: video sequence curve F_T controlled by trajectory T = {(1, x_1), (2, x_2), ..., (n, x_n)}; predefined sampling threshold l_th.
Initialization: selected frame set S = [(1, x_1)]; i = 2; index of the last selected point ind = 1.
Iteration:
  While i ≤ n
    If L(x_ind, x_i) ≥ l_th
      S = [S; (i, x_i)]; ind = i; i++;
    Else
      i++;
    End if
  End While
Output: selected point set S.

Fig. 1(c) illustrates the result of selecting 60 points for the 2D "Foreman" sequence trajectory with the above method, while Fig. 1(d) shows the result of selecting 60 points with a fixed time interval. We can see that the sampling result of our method better preserves the details of the original trajectory.
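To make the interpolation and sampling steps concrete, the sketch below builds a natural cubic spline per principal-component dimension and then applies the equal-curve-length selection of Algorithm 1. It is only a sketch under stated assumptions: SciPy's CubicSpline is used instead of solving Eqs. (4)-(7) by hand, the curve length of Eq. (8) is approximated by chord lengths on a dense grid, and names such as sample_equal_curve_length, l_th and substeps are ours.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def sample_equal_curve_length(traj, l_th, substeps=20):
    """Select control points roughly every l_th units of curve length (Algorithm 1).

    traj: (n, d) array of trajectory points, one per frame.
    Returns (indices, points) of the selected control points.
    """
    n = traj.shape[0]
    t = np.arange(1, n + 1)
    # One natural cubic spline per principal-component dimension (Eqs. (2)-(7)).
    spline = CubicSpline(t, traj, axis=0, bc_type='natural')

    selected = [0]                       # always keep the first frame
    last = 0
    for i in range(1, n):
        # Approximate the curve length between the last selected point and frame i
        # by summing chord lengths over a dense grid (Eq. (8)).
        grid = np.linspace(t[last], t[i], (i - last) * substeps + 1)
        pts = spline(grid)
        length = np.sum(np.linalg.norm(np.diff(pts, axis=0), axis=1))
        if length >= l_th:               # accumulated curve length reached the step
            selected.append(i)
            last = i
    return np.array(selected), traj[selected]
```

For the repository, the threshold l_th would be set relative to the maximum distance L between neighboring trajectory points (e.g., 0.3L), as in Section 3.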


Fig. 2. Illustration of the query match process.

2.3. Query Matching

For a given query sequence, we scale and project the frames in the same way as we process the repository. To reduce the computational cost, we first sample the query in time with a fixed frequency f and obtain a sparse query trajectory with p frames in all, denoted QS = {(fq_1, qs_1), (fq_2, qs_2), ..., (fq_p, qs_p)}. We then seek its duplicate in the sampled video trajectory repository. Since the query and the video repository are sampled with different approaches, the point correspondence is destroyed and consequently Eqn. (1) cannot be applied. Here an alternative matching approach is developed. As shown in Fig. 2, we use the set of line segments connecting neighboring simplified trajectory points to describe the video sequence, and we project the sampled query points onto this set to estimate the match distortion. The problem is then to decide onto which line segment each query point should be projected. Temporal correlation is an important property of video data, and the time interval between frames serves as guidance here. For a candidate location i in the video sequence VS = {(fv_1, v_1), (fv_2, v_2), ..., (fv_n, v_n)}, we calculate the matching distortion from the first sampled query point (fq_1, qs_1) to the line segment [(fv_i, v_i), (fv_{i+1}, v_{i+1})]. The line through v_i and v_{i+1} can be modeled as a parametric function:

s(v_i, v_{i+1}, \alpha) = v_i + \alpha (v_{i+1} - v_i)    (9)

It can easily be shown that the projection distance from qs_1 to s(v_i, v_{i+1}, \alpha) is:

D_1^i = d(qs_1, s(v_i, v_{i+1}, \alpha)) = \lVert qs_1 - v_i - \alpha (v_{i+1} - v_i) \rVert    (10)

where \alpha = \frac{(qs_1 - v_i)^T (v_{i+1} - v_i)}{\lVert v_{i+1} - v_i \rVert^2}.

Next, consider the second query point (fq_2, qs_2). If the time interval (fq_2 - fq_1) < (fv_{i+1} - fv_i), we still project it onto [(fv_i, v_i), (fv_{i+1}, v_{i+1})]. Otherwise we check whether (fq_2 - fq_1) < (fv_{i+2} - fv_i), and this continues until we find the point (fv_{i+j}, v_{i+j}) that satisfies (fv_{i+j} - fv_i) > (fq_2 - fq_1). Then:

D_2^i = d(qs_2, s(v_{i+j}, v_{i+j-1}, \alpha))    (11)
s.t. (fv_{i+j} - fv_i) > (fq_2 - fq_1) and (fv_{i+j-1} - fv_i) < (fq_2 - fq_1)

For the third point, we start from the (i+j)-th repository point and search in the same manner, and so on for the remaining points. The average projection distortion E(QS, VS) is defined as:

E(QS, VS) = \min_i \frac{1}{p} \sum_{j=1}^{p} D_j^i    (12)

A predefined threshold d_{th} decides the existence and location of a duplicate in the repository:

location = \arg\min_i \frac{1}{p} \sum_{j=1}^{p} D_j^i  if  \min_i \frac{1}{p} \sum_{j=1}^{p} D_j^i < d_{th},  and "not found" otherwise    (13)
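A sketch of this matching procedure (Eqs. (9)-(13)) is given below. It assumes the sampled query and repository trajectories are supplied as arrays of time indices and d-dimensional points; the helper names (point_to_segment, match_distortion, detect) and the plain linear scan over candidate locations are our own illustration, not the paper's implementation.

```python
import numpy as np

def point_to_segment(q, a, b):
    """Projection distance from query point q to the line through a and b (Eqs. (9)-(10))."""
    d = b - a
    alpha = np.dot(q - a, d) / np.dot(d, d)
    return np.linalg.norm(q - a - alpha * d)

def match_distortion(q_times, q_pts, v_times, v_pts, i):
    """Average projection distortion of the query against candidate start location i."""
    dist, k = 0.0, i
    for j in range(len(q_pts)):
        # Advance k until segment [k, k+1] temporally covers this query point
        # (clamped to the last repository segment near the end).
        while k + 2 < len(v_pts) and v_times[k + 1] - v_times[i] < q_times[j] - q_times[0]:
            k += 1
        dist += point_to_segment(q_pts[j], v_pts[k], v_pts[k + 1])
    return dist / len(q_pts)

def detect(q_times, q_pts, v_times, v_pts, d_th):
    """Scan all candidate locations and apply the decision rule of Eq. (13)."""
    scores = [match_distortion(q_times, q_pts, v_times, v_pts, i)
              for i in range(len(v_pts) - 1)]
    best = int(np.argmin(scores))
    return best if scores[best] < d_th else None   # None means "not found"
```

In practice one would restrict candidate locations so that the query's temporal span fits inside the repository and prune the scan; the sketch keeps the plain loop over all locations for clarity.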

3. EXPERIMENTS

We conduct our experiments on videos collected from the NIST TRECVID 2006 shot boundary test set [5]. The repository is composed of 6 video sequences, each about 30 minutes in length. We randomly generate positive queries from the repository and additionally apply the following transformations to them: (1) median filtering with a 5 × 5 template; (2) additive Gaussian noise at a PSNR of 25 dB; (3) TV logo insertion; (4) five percent frame loss in the query clip, with each lost frame replaced by the previous frame. Negative queries are collected from other videos. For each video sequence, the frames are first scaled to 11 × 8 icons and then projected into the principal component space as a trajectory. Precision-recall is adopted as the performance metric:

precision = (number of true duplicates detected) / (total number of detected duplicates)
recall = (number of true duplicates detected) / (total number of true duplicates)    (14)
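The precision-recall curves in Figs. 3 and 4 are traced by sweeping the decision threshold d_th of Eq. (13). A minimal sketch of that sweep is below; it assumes each query's best match distortion and a ground-truth label are already available, and (as a simplification) it does not separately check whether the detected location is correct. The names (pr_curve, scores, is_positive) are illustrative.

```python
import numpy as np

def pr_curve(scores, is_positive, thresholds):
    """Precision and recall (Eq. (14)) as the decision threshold d_th is swept.

    scores: best match distortion E(QS, VS) of each query;
    is_positive: ground-truth duplicate labels for the same queries.
    A query is declared a duplicate when its score falls below the threshold.
    """
    scores = np.asarray(scores, dtype=float)
    is_positive = np.asarray(is_positive, dtype=bool)
    curve = []
    for d_th in thresholds:
        detected = scores < d_th
        tp = np.sum(detected & is_positive)            # true duplicates detected
        precision = tp / max(np.sum(detected), 1)      # guard against zero detections
        recall = tp / max(np.sum(is_positive), 1)
        curve.append((float(precision), float(recall)))
    return curve
```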


Fig. 3. Precision-recall comparison between different curve sampling step-sizes.

Fig. 4. Precision-recall comparison between different query sampling frequencies.

Table 1. Number of Points and Average Detection Time Comparison

           Original   0.1L     0.3L     0.5L     0.7L
CPN        308006     22692    8593     5389     3943
SR (%)     -          7.37     2.79     1.75     1.28
ADT (s)    -          1.2039   0.4378   0.2757   0.1993

In the proposed scheme, precision-recall is controlled by changing the decision threshold. The experiments are conducted on a PC with a 2.8 GHz CPU and 4 GB RAM.

3.1. Experiment Results

The repository sampling rate and the frequency used to sample the query are clearly two key factors for our algorithm, so we analyze the effect of different parameter choices on its performance. Previous research indicates that a satisfactory result can usually be achieved with the first 8 principal component coefficients [11]; in this paper the feature dimension is fixed to 10. 300 positive queries with transformations and 300 negative queries are generated as test samples; each query contains 2000 frames. For each query, we set the sampling frequency to f = 30. For the video repository, for simplicity, the curve sampling parameter is decided according to the maximum distance between neighboring trajectory points, denoted L here. We set the sampling curve length to 0.1L, 0.3L, 0.5L and 0.7L, respectively. Table 1 lists the number of control points after sampling (CPN), the sampling rate (SR) and the average duplicate detection time (ADT) under these parameters. If we choose a larger sampling step-size, fewer control points are left and less computational cost is required to search for the duplicate in the candidate space. The precision-recall curves are shown in Fig. 3. Next we fix the curve sampling step-size at 0.3L and, for each query, set the sampling frequency f to 30, 50, 60 and 70, respectively. The precision-recall curves are shown in Fig. 4. From these two experiments we can clearly see that the detection time is nearly linear in the number of points left after sampling. Too large a sampling step-size may lead to a certain precision loss since it discards too much detail information. However, we can easily obtain more than 90% accuracy at little time cost (about 0.2 s) even if we sample the video at a considerably high step-size (only 1.28% of the points left).

4. CONCLUSION


This paper describes a video duplicate sequence detection algorithm. Each video sequence is modeled as a trajectory after scaling, projection, interpolation and sampling. We detect query duplicates by matching points to a line segment set, guided by the temporal relationship between video frames. The results indicate encouraging potential for application to large-scale video repositories. In the future we plan to further develop our approach by integrating an indexing structure to handle large datasets.

5. REFERENCES

[1] D. A. Adjeroh, M. C. Lee, and I. King. A distance measure for video sequences. Computer Vision and Image Understanding, 75:25–45, 1999.
[2] S. A. Dyer and J. S. Dyer. Cubic-spline interpolation: Part 1. IEEE Instrumentation & Measurement Magazine, 4:44–46, 2001.
[3] W. H. Hsu and S. F. Chang. Topic threading for structuring a large-scale news video archive. In Proc. of International Conference on Image Processing, 2006.
[4] Google Video. http://video.google.com/.
[5] NIST TRECVID 2006. http://www-nlpir.nist.gov/projects/tv2006/tv2006.html.
[6] YouTube. http://www.youtube.com/.
[7] K. Iwamoto, E. Kasutani, and A. Yamada. Image signature robust to caption superimposition for video sequence identification. In Proc. of International Conference on Image Processing, 2006.
[8] Y. Ke, R. Sukthankar, and L. Huston. Efficient near-duplicate detection and sub-image retrieval. In Proc. of ACM Multimedia, 2004.
[9] Y. T. Kim and T. S. Chua. Retrieval of news video using video sequence matching. In Proc. of International MultiMedia Modeling Conference, 2005.
[10] J. Law-To, L. Chen, A. Joly, I. Laptev, O. Buisson, V. Gouet-Brunet, N. Boujemaa, and F. Stentiford. Video copy detection: a comparative study. In Proc. of International Conference on Image and Video Retrieval, 2007.
[11] Z. Li, Y. Fu, J. Yuan, Y. Wu, A. K. Katsaggelos, and T. S. Huang. Multimedia Data Indexing. 2008.
[12] X. Wu, A. G. Hauptmann, and C.-W. Ngo. Practical elimination of near-duplicates from web video search. In Proc. of ACM Multimedia, 2007.
