Data-parallel full reference algorithm for dropped frame identification in uncompressed video using genetic algorithm

Author's Copy. Citation details: Thakur, M. K., Saxena, V., and Gupta, J. P., "Data-parallel full reference algorithm for dropped frame identification in uncompressed video using genetic algorithm," in Proc. 6th International Conference on Contemporary Computing (IC3-2013), August 8-10, 2013, pp. 467-471. Available at: ieeexplore.org

Manish K Thakur, Vikas Saxena
Department of CSE & IT
Jaypee Institute of Information Technology
Noida, India – 201307
mthakur.jiit@gmail.com, vikas.saxena@jiit.ac.in

J P Gupta
Vice Chancellor
Sharda University
Greater Noida, India – 201310
[email protected]

Abstract— In recent years, due to the easy availability of video editing tools, video sequences which are to be presented as evidence during court trials can be tampered with and can therefore misguide court proceedings. Such video sequences must be authenticated by forensic experts before being considered as evidence. Frame drop is one of the most common forms of temporal tampering, and a video sequence therefore needs to be authenticated against frame drop. The current work is an extension of the authors' previously published work, which presented a full reference (FR) algorithm for dropped frame identification in uncompressed video using a genetic algorithm and efficiently identifies dropped frame indices with accuracy ranging from 78 to 100% under different test scenarios. To resolve the issue of scalability, this paper extends that algorithm by identifying the set of independent tasks in the previously published FR algorithm and presents a data-parallel algorithm. We simulated the scheme over five video sequences (of duration 10 to 540 seconds) by dropping 0.5% and 1% of video frames and analyzed the required processing time with 1, 2, 4, and 8 processors. Simulation results suggest that the presented algorithm is scalable and efficiently identifies dropped frame indices, with average speedups of 1.77, 2.81, and 3.35 for 2, 4, and 8 processors respectively.

Keywords— video authentication; tampering; temporal; spatial; spatio-temporal; frame drop; data parallelism

I. INTRODUCTION

Whenever a video sequence is presented as evidence during court trials, it must first be authenticated before being considered as evidence [1]. Video forensic laboratories, with the help of forensic tools and the expertise of forensic analysts, play a vital role in authenticating such evidence [2]. If the video sequence is found to be authentic and all parties agree, a copy of the authenticated evidence (hereafter referred to as the original video sequence) is created, which is later used during long court trials [3]. During court proceedings, if someone tampers with the copy, it is often required to authenticate the copy with respect to the available original video sequence [3]. Here the authentication is conducted under full reference (FR) mode, where forensic experts have the availability of both the original and the tampered video sequences.

Apart from FR, a video sequence may also be required to be authenticated under no reference (NR) and reduced reference (RR) modes. Under NR mode, forensic experts have a single video sequence whose authenticity is to be analyzed; the scenario where a video sequence must be authenticated when it is first presented as evidence during court trials is authentication under NR mode. Under RR mode, forensic experts have the tampered video sequence along with partial information about the original video sequence. A video sequence can be tampered in the following domains: spatial, temporal, and spatio-temporal. In spatial tampering (ST), attackers manipulate pixel bits of a video frame, whereas in temporal tampering (TT) attackers manipulate video frames in the time domain, e.g., frame drop, frame swapping, frame copying, and frame addition. In spatio-temporal tampering (STT), attackers tamper a video sequence at both inter- and intra-frame level, i.e., spatially and temporally [4][5].

Numerous approaches have been proposed by researchers to detect various types of temporal tampering under FR, NR, and RR modes. As authentication is an intractable problem, the schemes presented by researchers are based upon approximation [6][7]. Therefore, achieving better accuracy during video authentication is a major challenge before the research community.

Frame drop is one of the most common forms of temporal tampering, where some frames of a video sequence are cut or dropped [8][9]. These dropped frames may partially or fully belong to a video scene. In one contemporary work, S. Wolf presented an NR and RR scheme which identifies dropped frames by maintaining a motion energy time history of a video clip [10]. In another work, the Graphics and Media Lab of Moscow State University (MSU) presented a dropped frame metric which identifies dropped frames by calculating each frame's difference with the previous one [11].


In [12] the authors of this paper presented an FR scheme which efficiently identifies dropped frame indices in uncompressed video sequences. The accuracy of the scheme is claimed to be between 78 and 100% under different test cases when the video is spatio-temporally tampered by frame drop together with high and low spatial distortions. As elaborated in II.C, the scheme proposed in [12] requires heavy computation, due to which it might not be scalable. Therefore, to resolve the issue of scalability, this paper presents a data parallelization of the scheme presented in [12].

This paper is organized as follows: apart from the introduction in section 1, section 2 describes the problem along with the major steps of the scheme presented in [12]. Section 3 presents the data-parallel algorithm. Section 4 describes simulation details, followed by the conclusion.

II. DROPPED FRAME IDENTIFICATION

This section restates the problem of dropped frame indices identification presented in [12] along with its sequential algorithm.

A. Problem Definition

Let us consider an original video sequence (VO) with m video frames VO1, VO2, VO3 .. VOm. The copy (VC) of VO is distorted (intentionally or accidentally) by one form of temporal tampering (TT), namely frame drop, such that we have a tampered video sequence (VC) with n video frames VC1, VC2 .. VCn, where m > n. Let us further consider that VC is spatially tampered (ST) too, with low and high spatial distortions, i.e., the video sequence VC is a spatio-temporally tampered version of the original video sequence VO. Given both VC and VO, it is required to identify the frame indices which were dropped in VC.

B. Sequential Scheme presented in [12]

The following steps identify the dropped frame indices.

Step 1: Input original video VO (m frames) and tampered video VC (n frames); the dropped frames count (dfc) is m – n.

Step 2: Compute the peak signal to noise ratio (PSNR) between the ith frame of VC and the ith to (i+dfc+1)th frames of VO, for all i from 1 to n. Store these PSNR values in a matrix DiffMat. In each row of DiffMat store the maximum PSNR and its index (frame number of VO) in the (m+1)th and (m+2)th fields of DiffMat respectively.

Step 3: Compute the longest increasing sequence (LIS) in the Index field of DiffMat. The following possibilities arise:
Case 1: The length of the LIS is n. Here, call dropped frames those indices which are not in the LIS, and exit.
Case 2: If the length of the LIS is less than n, go to Step 4.

Step 4: Create an initial population with the indices present in the LIS. Compute a fitness score (FS), which is the average PSNR of all indices included in the initial population. Using the genetic algorithm's mutation operator, generate a new population and compute its FS; if it is better, store the new population. Repeat the process up to the kth population or until max fit (100 dB) is achieved. After the kth step or best fit, call the indices which are not in the population list dropped frames, and exit.

C. Analysis

In [12] the achieved accuracy of the scheme presented in II.B is claimed to be between 78 and 100% under different test cases. This accuracy comes at the cost of computation, which is significant due to the heavy computations required in Step 2 of II.B: it is O((m-n+1) × n), due to which the scheme might not be scalable. To overcome this scalability problem, we present a data-parallel algorithm in the next section which makes the scheme scalable.

III. DATA-PARALLEL ALGORITHM

This section presents a data parallelization of the scheme presented in II.B for the problem defined in II.A. We identified that the operations involved in Step 2 are data independent. Therefore we applied data parallelism to this step and distributed the tasks to p processors.

Apart from data parallelism, we made one modification in analyzing the difference between two frames. Instead of comparing two video frames for similarity using the peak signal to noise ratio (PSNR), in this paper we use the mean square error (MSE), as it requires less computation [13].
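For reference, a minimal sketch of the two frame-difference measures (assuming 8-bit grayscale frames stored as NumPy arrays; the paper does not specify the frame format). PSNR is a logarithmic transform of MSE, which is why MSE is the cheaper of the two:

    import numpy as np

    def mse(frame_a: np.ndarray, frame_b: np.ndarray) -> float:
        """Mean square error between two equally sized frames."""
        diff = frame_a.astype(np.float64) - frame_b.astype(np.float64)
        return float(np.mean(diff ** 2))

    def psnr(frame_a: np.ndarray, frame_b: np.ndarray, peak: float = 255.0) -> float:
        """PSNR adds a logarithm and a division on top of the MSE computation."""
        return 10.0 * np.log10(peak ** 2 / mse(frame_a, frame_b))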

The following paragraphs describe the steps of the proposed data-parallel algorithm.

Step 1: Input original video VO and tampered video VC with m and n frames respectively, where m > n; the dropped frames count (dfc) is m – n.

Step 2: For each processor Pi, set the range of frames in VC which the ith processor compares with dfc+1 frames of VO using MSE. Store these MSE values in a matrix (DiffMat) having n rows, corresponding to the length of VC, and m+2 columns: m columns corresponding to the length of VO, plus two additional fields, Min and Index, which store the lowest MSE in a row and its frame number/index in VO respectively.

Create DiffMat ( )
Global
  VO, VC                   {original and tampered video sequence}
  m, n                     {count of frames in VO and VC}
  dfc                      {count of dropped frames}
  min                      {minimum MSE of a row}
  DiffMat[1..n][1..m+2]    {stores MSEs, min, and index}
begin
  m ← VO.Length, n ← VC.Length, dfc ← m – n
  activate processors P1, P2, ..., Pp
  for all Pi where 1 ≤ i ≤ p do    {each processor sets bounds of frames in VC}
    start ← (i – 1) × n / p + 1, end ← i × n / p
    for j ← start to end do
      min ← ∞
      for k ← j to min(j + dfc + 1, m) do
        DiffMat[j][k] ← MSE(VCj, VOk)
        if min > DiffMat[j][k] then
          min ← DiffMat[j][k]
          DiffMat[j][m+1] ← min
          DiffMat[j][m+2] ← k
        end if
      end for
    end for
  end for
end

Fig. 1. Proposed data-parallel algorithm to create DiffMat
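A minimal Python sketch of this step (illustrative names, not from the paper; the paper gives only the pseudocode of Fig. 1). Since the rows of DiffMat are mutually independent, a process pool can map them across p workers, mirroring the per-processor bounds above:

    import numpy as np
    from multiprocessing import Pool

    def mse(a, b):
        d = a.astype(np.float64) - b.astype(np.float64)
        return float(np.mean(d ** 2))

    def row_task(args):
        # One row of DiffMat: compare VC frame j with its band of VO frames.
        j, vc_frame, vo_band, band_start = args
        mses = [mse(vc_frame, vo) for vo in vo_band]
        best = int(np.argmin(mses))
        return j, mses, mses[best], band_start + best   # row, MSEs, Min, Index

    def create_diffmat(vo_frames, vc_frames, p=4):
        m, n = len(vo_frames), len(vc_frames)
        dfc = m - n
        tasks = []
        for j in range(n):                    # 0-based row index
            hi = min(j + dfc + 2, m)          # band VO[j .. j+dfc+1], clipped at m
            tasks.append((j, vc_frames[j], vo_frames[j:hi], j + 1))
        with Pool(p) as pool:                 # rows are independent, so map them
            return pool.map(row_task, tasks)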


        VC1   VC2   VC3   VC4   VC5   VC6
VO1      25
VO2      80    17
VO3      60    58    81
VO4      85    54    82    91
VO5      75    51    86    94    74
VO6      83    68    89    73    70    66
VO7      69    90    29    63    79    67
VO8            86    88    23    84    69
VO9                  68    61    20    71
VO10                       62    71    72
VO11                             75    78
VO12                                   22
Min      25    17    29    23    20    22
Index     1     2     7     8     9    12

Fig. 2. An example of case C2.2a where DiffMat is created with m = 12 and n = 6 video frames (blank cells are outside the compared frame range)

        VC1   VC2   VC3   VC4   VC5   VC6
VO1      25
VO2      80    17
VO3      60    58    81
VO4      85    54    82    91
VO5      75    51    86    94    20
VO6      83    68    89    23    70    66
VO7      69    90    29    63    79    67
VO8            86    88    35    84    69
VO9                  68    61    26    71
VO10                       62    71    72
VO11                             75    78
VO12                                   22
Min      25    17    29    23    20    22
Index     1     2     7     6     5    12

Fig. 3. An example of case C2.2b where DiffMat is created with m = 12 and n = 6 video frames (blank cells are outside the compared frame range)

If more than one field in a row has the same minimum MSE, store the smaller index in the Index field of DiffMat. Fig. 1 describes the first two steps of the proposed algorithm. Since some frames of VC are spatially tampered too, the minimum MSE in each row of DiffMat may not be 0. The following cases are possible at this stage:

C2.1: If there is no spatial tampering in VC (i.e., VC is only temporally tampered), then there will be at least one MSE of 0 in each row of DiffMat. Since there is no ST in VC, the indices in the Index field of DiffMat will be in increasing order.

C2.2: If, apart from frame drop, VC is spatially tampered too, the MSEs in a row of DiffMat may not be 0, and thus the Min field of that row will not be 0. The indices in the Index field of DiffMat may then be (a) in increasing order or (b) in random order. These cases (C2.2a and C2.2b) are illustrated by the examples in Fig. 2 and Fig. 3, with m = 12 (12 frames in VO) and n = 6 (6 frames in VC).

Step 3: Apply a longest increasing sequence algorithm [14][15] to compute the longest increasing sequence (LIS) in the Index field of DiffMat. The following possibilities arise while computing the LIS:

C3.1: The length of the identified LIS is equal to n (the number of frames in VC). This indicates a unique matching of each frame of VC with a frame of VO. Call these indices the frames of VO present in VC, call the other indices dropped frame indices, and exit.

C3.2: A length of the identified LIS (say u) < n indicates that there are some frames in VC which do not have a unique matching frame in VO. Since our objective is to minimize the average MSE (over all n frames), as a first step we select the LIS which gives the minimum average MSE (over the u frames, u < n). There can be more than one LIS of length u; select the LIS which gives the minimum average MSE and go to Step 4.
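A sketch of the LIS computation over the Index field, using the standard O(n log n) patience-sorting technique (helper names are ours, not the paper's):

    import bisect

    def longest_increasing_subsequence(seq):
        """Return one longest strictly increasing subsequence of seq."""
        tails = []                 # tails[k]: position in seq of the smallest tail of an increasing subsequence of length k+1
        prev = [-1] * len(seq)     # predecessor links for reconstruction
        for i, x in enumerate(seq):
            k = bisect.bisect_left([seq[t] for t in tails], x)
            if k == len(tails):
                tails.append(i)
            else:
                tails[k] = i
            prev[i] = tails[k - 1] if k > 0 else -1
        # Walk predecessor links back from the last tail.
        out, i = [], tails[-1] if tails else -1
        while i != -1:
            out.append(seq[i])
            i = prev[i]
        return out[::-1]

    # Index field from Fig. 3: one LIS of {1, 2, 7, 6, 5, 12} is [1, 2, 5, 12].
    print(longest_increasing_subsequence([1, 2, 7, 6, 5, 12]))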

In the example of Fig. 2, since all indices of the Index field are in increasing sequence, we call these indices {1, 2, 7, 8, 9, and 12} the frame numbers of VO present in VC, and the other indices {3, 4, 5, 6, 10, and 11} the dropped frame indices in VC.

In the example of Fig. 3, the length of the LIS (i.e., u) is 4, which is less than n. There are 3 such sequences, but the sequence {1, 2, 5, and 12} gives the minimum average MSE (over the MSEs {25, 17, 20, and 22}), thus we select the LIS {1, 2, 5, and 12}.

As our objective is to select the combination of indices (of length n) which produces the maximum overall similarity (i.e., the minimum average MSE over all n frames), all combinations of n frames would have to be explored, which is computationally expensive. To obtain a feasible solution, we next present a genetic algorithm based approach which does not guarantee an optimal solution, but which, with a guided initial population, can achieve a near optimal solution.

Step 4: Create a population array (PopArr) of length m (i.e., the count of frames in VO). Use the frame indices of VO which are available in the selected LIS (of length u) to create the initial population: fill a field of PopArr with 1 if that index is part of the selected LIS, randomly fill n – u further entries with 1, and set the remaining entries to 0. Make a copy of PopArr into Final. Define MinFit as the ratio of (the sum of MSEs in the Min field of DiffMat) to u, and MaxFit as the ratio of (the sum of MSEs in the Min field of DiffMat) to n. In the example of Fig. 3, MinFit is 34 and MaxFit is 22.67. PopArr for this example is represented in Fig. 4: first insert 1 into the fields {1, 2, 5, and 12}, then randomly insert n – u (i.e., 2) further entries of 1 and set the rest to 0. The copy of PopArr in the array Final is {1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, and 1}.

Index   1   2   3   4   5   6   7   8   9  10  11  12
(a)     1   1           1                           1
(b)     1   1   0   0   1   1   0   0   0   0   1   1

Fig. 4. PopArr for the example of Fig. 3: (a) after inserting 1 at the selected LIS indices {1, 2, 5, 12}; (b) after randomly inserting the remaining n – u ones
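A sketch of the Step 4 initialization (variable names are ours; the random choices make the extra filled positions differ between runs):

    import random

    def init_population(lis_indices, m, n):
        """PopArr: 1 marks a VO frame assumed present in VC (Step 4)."""
        pop = [0] * m
        for idx in lis_indices:          # 1-based VO indices from the selected LIS
            pop[idx - 1] = 1
        zeros = [i for i in range(m) if pop[i] == 0]
        for i in random.sample(zeros, n - len(lis_indices)):
            pop[i] = 1                   # randomly place the remaining n - u ones
        return pop

    pop_arr = init_population([1, 2, 5, 12], m=12, n=6)
    final = pop_arr[:]                   # Final starts as a copy of PopArr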


Index    1   2   3   4   5   6   7   8   9  10  11  12
PfxSum   1   2   2   2   3   4   4   4   4   4   5   6

Fig. 5. Prefix sum of PopArr for the example of Fig. 3

Step 5: Store the prefix sum [16] of the array PopArr in an array PfxSum. Compute the fitness score (FS) of the current population by averaging the MSEs DiffMat[PfxSum[i]][i] over all i = 1 to m where PopArr[i] = 1. If the FS of the current population is less than MinFit, assign FS to MinFit and make a copy of PopArr into Final. In the preceding example the FS of the current population is 41.3, which is greater than MinFit, therefore MinFit and Final remain unchanged.

Step 6: Randomly flip any two bits of the current population using bit string mutation [17], such that one of the flipped bits is 0 and the other is 1, and store the newly generated population in PopArr. As the count of 1s is unchanged by this mutation, the new population will contain n entries of 1 placed at random positions in PopArr.
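A sketch of Steps 5 and 6, together with the repeat loop of Step 7 below (helper names are ours; diffmat is assumed stored as an n × m array with entries outside the compared band set to infinity):

    from itertools import accumulate
    import random

    def fitness(pop, diffmat):
        """Average MSE DiffMat[PfxSum[i]][i] over positions with PopArr[i] = 1 (Step 5)."""
        pfx = list(accumulate(pop))      # prefix sum [16], 0-based here
        vals = [diffmat[pfx[i] - 1][i] for i in range(len(pop)) if pop[i] == 1]
        return sum(vals) / len(vals)

    def mutate(pop):
        """Bit string mutation flipping one 1 and one 0, keeping the count of 1s (Step 6)."""
        ones = [i for i, b in enumerate(pop) if b == 1]
        zeros = [i for i, b in enumerate(pop) if b == 0]
        child = pop[:]
        child[random.choice(ones)] = 0
        child[random.choice(zeros)] = 1
        return child

    def search(pop, diffmat, max_fit, k=1000):
        """Repeat Steps 5 and 6 until the kth population or the fit reaches MaxFit (Step 7)."""
        best, min_fit = pop[:], fitness(pop, diffmat)
        for _ in range(k):
            pop = mutate(pop)
            fs = fitness(pop, diffmat)
            if fs < min_fit:
                best, min_fit = pop[:], fs
            if min_fit <= max_fit:
                break
        return best, min_fit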

Step 7: Repeat Step 5 and Step 6 until the kth population or until MinFit = MaxFit.

Step 8: We call a frame of VO present in VC if the corresponding index in Final is 1. Call the remaining indices dropped frame indices in VC (i.e., these frames of VO are not present in VC) and exit. In the preceding example MaxFit cannot be achieved, but the best achievable fit is as follows: frame indices 1, 2, 7, 8, 9, and 12 of VO, with an average MSE of 25.66 {(DiffMat[1][1] + DiffMat[2][2] + DiffMat[3][7] + DiffMat[4][8] + DiffMat[5][9] + DiffMat[6][12]) / 6}, are present in VC. We call the remaining indices {3, 4, 5, 6, 10, and 11} the frames of VO which are dropped in VC.

IV. SIMULATION

As this paper is an extension of the authors' previously published work, where the achieved accuracy in identifying dropped frame indices was claimed to be between 78 and 100% under different test cases, here we present only the simulation results analyzing the required processing time and average speedup over various video data sets.

A. Original and tampered video sequences

The presented algorithm has been simulated on a multi-core machine (having 8 processors) and tested with 5 different uncompressed video sequences of the following durations: Video 1 (10 seconds), Video 2 (30 seconds), Video 3 (200 seconds), Video 4 (215 seconds), and Video 5 (540 seconds). We created five sets of tampered video sequences by introducing temporal and spatial tampering in all five original video sequences. Each original video sequence was temporally tampered by dropping 0.5% and 1% of its video frames. Further, we introduced low and high spatial distortions by modifying the least significant bit and the fourth least significant bit of some pixels in randomly selected video frames, as sketched below.
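For illustration, the spatial distortions described above might be generated as follows (a sketch; the fraction of modified pixels and the 8-bit frame format are our assumptions, as the paper does not give implementation details):

    import numpy as np

    def spatially_distort(frame: np.ndarray, high: bool, fraction: float = 0.01) -> np.ndarray:
        """Flip the LSB (low distortion) or the 4th LSB (high distortion) of random pixels."""
        out = frame.copy()
        mask = np.uint8(8) if high else np.uint8(1)   # bit 3 vs bit 0
        flat = out.reshape(-1)
        idx = np.random.choice(flat.size, int(flat.size * fraction), replace=False)
        flat[idx] ^= mask
        return out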

B. Analysis parameters

With the five sets of original and tampered video sequences, we analyzed the performance of the scheme presented in the previous section over the following parameters:

- Required processing time with 8 processors to identify dropped frame indices when there are (a) 0.5% dropped frames, and (b) 1% dropped frames.
- Average speedup with one, two, four, and eight processors to identify dropped frame indices when there are (a) 0.5% dropped frames, and (b) 1% dropped frames.

Fig. 6 describes the required processing time (with 0.5% and 1% dropped frames), whereas Fig. 7 and Fig. 8 describe the average speedup with 0.5% and 1% dropped frames respectively.

Fig. 6. Required processing time with 8 processors

Fig. 7. Average speedup when 0.5% frames were dropped


Fig. 8. Average speedup when 1% frames were dropped

C. Discussions

Although video authentication is an intractable problem, our scheme efficiently identifies dropped frame indices and, as presented in Fig. 6, requires reasonable processing time with 8 processors. It is noted that the required processing time for 0.5% frame drop and 1% frame drop varies linearly: to identify dropped frame indices in the 215 seconds long video we require 177.29 seconds when 0.5% of frames are dropped, whereas 348.79 seconds (almost double that needed with 0.5% dropped frames) are needed for the same video to identify 1% dropped frame indices. Similar observations were made for the other video sequences. Further, it is observed from Fig. 7 (0.5% dropped frames) and Fig. 8 (1% dropped frames) that the average speedups are 1.77, 2.82, and 3.35 for 2, 4, and 8 processors respectively. The average processor efficiencies are 0.92, 0.71, and 0.43 for 2, 4, and 8 processors respectively. The achieved speedup and processor efficiency (with 2, 4, and 8 processors) suggest that the required processing time can be reduced further by increasing the number of processors. Thus the presented algorithm scales with the number of processors and resolves the scalability issue of the scheme presented in [12].
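For clarity, the quantities reported here follow the usual definitions (our notation, not the paper's), where T_p is the processing time with p processors:

    speedup     S_p = T_1 / T_p
    efficiency  E_p = S_p / p

For example, E_4 = 2.82 / 4 ≈ 0.71, the reported average efficiency for 4 processors.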

CONCLUSION

Authentication of a video sequence against tampering is one of the major challenges for forensic experts. Dropping of video frames is one form of temporal tampering, addressed by the authors in [12]. In this paper we extended the scheme presented in [12] to resolve the issue of scalability. We presented a data-parallel algorithm which distributes one of the involved algorithmic steps (the creation of DiffMat) to p processors, and simulated the presented algorithm by introducing 0.5% and 1% frame drop over five video sequences of duration ranging from 10 seconds to 540 seconds. We noted the required processing time with 1, 2, 4, and 8 processors and observed that it reduces with an increasing number of processors, with average speedups of 1.77 to 3.35 for 2 to 8 processors respectively. The simulation results suggest that the proposed scheme scales with the number of processors.

ACKNOWLEDGMENT

We are thankful to all who directly or indirectly helped us to develop and simulate the data-parallel algorithm. Our sincere thanks to Sanjay sir for insisting that we propose a scalable scheme. We also thank Yamuna Shukla for providing resources for the simulation of the data-parallel algorithm.

REFERENCES

[1] http://www.bbc.co.uk/news/science-environment-20629671. Accessed 25 April 2013.
[2] http://www.videoforensicexpert.com/tag/digital-video-forensic-evidence/. Accessed 25 April 2013.
[3] http://www.videoforensicexpert.com/video-forensics/video-authentication-services/. Accessed 25 April 2013.
[4] S. Upadhyay and S. K. Singh, "Video Authentication: Issues and Challenges," International Journal of Computer Science Issues, vol. 9, issue 1, no. 3, pp. 409-418, January 2012.
[5] M. D. Swanson, M. Kobayashi, and A. H. Tewfik, "Multimedia Data-Embedding and Watermarking Technologies," Proceedings of the IEEE, vol. 86, pp. 1064-1087, June 1998.
[6] N. D. Beser, T. E. Duerr, and G. P. Staisiunas, "Authentication of digital video evidence," in Proc. SPIE International Conference on Applications of Digital Image Processing XXVI, vol. 5203, pp. 407-416, November 2003.
[7] N. Memon, P. Vora, B.-L. Yeo, and M. Yeung, "Distortion Bounded Authentication Techniques," in Proc. SPIE, Security and Watermarking of Multimedia Contents II, vol. 3971, pp. 164-174, 24-26 January 2000.
[8] Y. Yusoff, W. Christmas, and J. Kittler, "Video shot cut detection using adaptive thresholding," in British Machine Vision Conference, Bristol, UK, pp. 362-371, September 2000.
[9] J. Yu and M. D. Srinath, "An efficient method for scene cut detection," Pattern Recognition Letters, vol. 22, pp. 1379-1391, 2001.
[10] S. Wolf, "A No Reference (NR) and Reduced Reference (RR) Metric for Detecting Dropped Video Frames," in Fourth International Workshop on Video Processing and Quality Metrics for Consumer Electronics (VPQM), 2009.
[11] http://compression.ru/video/quality_measure/metric_plugins/dfm_en.htm. Accessed 25 April 2013.
[12] M. K. Thakur, V. Saxena, and J. P. Gupta, "A Full Reference Algorithm for Dropped Frames Identification in Uncompressed Video Using Genetic Algorithm," International Journal of Digital Content Technology and its Applications, vol. 6, no. 20, pp. 562-573, 2012.
[13] B. Girod, "What's wrong with mean-squared error," in Digital Images and Human Vision, A. B. Watson, ed., MIT Press, pp. 207-220, 1993.
[14] M. Saks and C. Seshadhri, "Estimating the Longest Increasing Sequence in Polylogarithmic Time," in Proc. 51st Annual IEEE Symposium on Foundations of Computer Science (FOCS), pp. 458-467, 2010.
[15] W. T. Chan, Y. Zhang, S. P. Y. Fung, D. Ye, and H. Zhu, "Efficient algorithms for finding a longest common increasing subsequence," in Proc. 16th Annual International Symposium on Algorithms and Computation, Hainan, China, pp. 665-674, 2005.
[16] http://www.cs.cmu.edu/~blelloch/papers/Ble93.pdf. Accessed 25 April 2013.
[17] http://en.wikipedia.org/wiki/Mutation_%28genetic_algorithm%29. Accessed 25 April 2013.

