IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 19, NO. 4, APRIL 2009


Temporal Feature Modulation for Video Watermarking

Young-Yoon Lee, Sang-Uk Park, Chang-Su Kim, Senior Member, IEEE, and Sang-Uk Lee, Fellow, IEEE

Abstract—We propose two temporal feature modulation algorithms that extract a feature from each video frame and modulate the features for a series of frames to embed a watermark codeword. In the first algorithm, the existence of a frame is used as the frame feature, and a watermark codeword is embedded into the original video by skipping selected frames. In the second algorithm, the centers of gravity of blocks in a frame are used as the frame feature. By modifying the centers of gravity, we embed 1-bit information into each frame. Simulation results demonstrate that the proposed algorithms are robust against compression and temporal attacks.

Index Terms—Frame skipping, image fingerprinting, temporal feature modulation, video watermarking.

I. INTRODUCTION

Nowadays, the problem of copyright violation of video content has become very serious. For example, a digital recorder can acquire high-quality movies by capturing, which is an easy and attractive tool for piracy [1]. Even though it is very difficult to prevent piracy completely, it is useful to track and punish pirates by employing watermarking or fingerprinting schemes [1]–[3].

Video watermarking techniques can be classified into three categories according to the embedding domain: frame-based modulation, three-dimensional (3-D) transform modulation, and temporal feature modulation. Early algorithms extended image watermarking techniques to video watermarking [4], [5], which is called frame-based modulation. In 3-D transform modulation, a group of video frames is represented using 3-D transforms, and watermarks are embedded into the 3-D transform coefficients [6]. In temporal feature modulation, a temporal feature is extracted from each video frame and modulated to convey watermark information. In other words, the temporal features from a series of frames form a one-dimensional (1-D) signal, which is used as the watermark embedding domain. Haitsma and Kalker's algorithm [7] can be regarded as a temporal feature modulation scheme: it computes the average pixel value of each frame and modulates the average values, as illustrated by the sketch below.
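The following sketch forms such a 1-D feature signal from per-frame mean luminance, in the spirit of [7]. It is our own illustration, not code from [7]; the function name and the array-based frame representation are assumptions.

```python
import numpy as np

def temporal_feature_signal(frames):
    """Extract a 1-D temporal feature signal from a video: the feature of
    each frame is its average pixel value, so T frames yield a length-T
    signal that serves as the watermark embedding domain.

    frames: iterable of 2-D numpy arrays (grayscale frames).
    """
    return np.array([float(frame.mean()) for frame in frames])
```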

Manuscript received March 15, 2007; revised May 11, 2008. First version published March 4, 2009; current version published May 20, 2009. The work of Y.-Y. Lee, S.-U. Park, and S.-U. Lee was supported by the Ministry of Knowledge Economy (MKE), Korea, under the ITRC support program supervised by the IITA (grant number IITA-2008-C1090-0801-0018). The work of C.-S. Kim was supported by the MKE, Korea, under the ITRC support program supervised by the IITA (grant number IITA-2008-C1090-0801-0017). This paper was recommended by Associate Editor Q. Sun. Y.-Y. Lee, S.-U. Park, and S.-U. Lee are with the School of Electrical Engineering and Computer Science and INMC, Seoul National University, Seoul 151-742, Korea (e-mails: [email protected], [email protected], [email protected]). C.-S. Kim is with the School of Electrical Engineering, Korea University, Seoul 136-713, Korea (e-mail: [email protected]). Digital Object Identifier 10.1109/TCSVT.2009.2014011

In this paper, we propose two temporal feature modulation algorithms. The first algorithm skips a selected set of frames from the original video according to a watermark codeword. Thus, the feature of a frame is represented by its existence in the watermarked video. The decoder matches the frames in the watermarked video with those in the original to extract the skipping pattern, which is used to estimate the watermark codeword. The first algorithm requires the original video at the decoder. The second algorithm uses Pröfrock et al.'s normed center of gravity (NCG) scheme [8] to embed 1-bit information into each frame, and it enables the decoder to reconstruct the watermark codeword without requiring the original video. Simulation results demonstrate that the proposed algorithms are robust against H.264/AVC compression attacks and temporal attacks.

The rest of the paper is organized as follows. Sections II and III propose the two temporal feature modulation algorithms and evaluate their performances. Section IV gives concluding remarks.

II. ALGORITHM I: FRAME SKIPPING

We embed a temporal watermark by skipping a selected set of frames from the original video. The frame skipping pattern becomes the temporal watermark. The human visual system is not sensitive to a slight change in the frame rate. For example, 29.97-Hz TV signals can be converted into 24-Hz motion pictures without noticeable artifacts. Thus, frame skipping does not cause severe degradation if the original video has a high frame rate and the skipping rate is relatively low.

A. Watermark Embedding

Fig. 1 shows the watermark embedder using frame skipping. We partition the original video into embedding units and embed watermarks into them separately. Although embedding units can be determined by a scene-change detection scheme, we use uniform partitioning for simplicity and assume that each embedding unit consists of a fixed number of frames.

For each embedding unit X, the message encoder φ_m hashes a message M using a key K and generates a watermark codeword c. The codeword c is a binary vector of length l_X, where l_X is the number of frames in X. The frame selector φ_w reads the bits of c, which decide the states of the video frames in X. If the ith bit is "0," the frame selector skips the ith frame in X. Otherwise, it preserves the frame. Thus, the watermarked unit Y is obtained by preserving a selected set of frames in X. Note that the number of frames in Y, denoted by l_Y, is equal to the number of "1" bits in c. Finally, the watermarked video is obtained by merging the watermarked units at the scene unifier. A minimal sketch of the embedding step follows.
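The sketch below is our own illustration of the embedding step, not the authors' implementation. message_encoder stands in for φ_m (in the paper, the keyed hash selects a valid codeword from the BCH-derived watermark code of Section II-B, which we omit here), and frame_selector implements φ_w; all names are assumptions.

```python
import hashlib
import numpy as np

def message_encoder(message: bytes, key: bytes, l_X: int) -> np.ndarray:
    """Stand-in for phi_m: hash the message M with the key K and expand
    the digest into a binary vector of length l_X. (The paper maps the
    hash onto a valid BCH-derived codeword instead; see Section II-B.)
    """
    digest = hashlib.sha256(key + message).digest()
    bits = np.unpackbits(np.frombuffer(digest, dtype=np.uint8))
    return np.resize(bits, l_X)  # repeat/truncate to l_X bits (sketch only)

def frame_selector(unit, codeword):
    """phi_w: skip the ith frame of the embedding unit if the ith codeword
    bit is 0, and preserve it otherwise.
    """
    assert len(unit) == len(codeword)
    return [frame for frame, bit in zip(unit, codeword) if bit == 1]
```

The watermarked unit Y then contains l_Y frames, one for each "1" bit of c.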



Fig. 1. Block diagram of the watermark embedder: the scene change detector partitions the original video into embedding units X; the message encoder φ_m maps the message M and the key K to a codeword c; the frame selector φ_w produces the watermarked unit Y; and the scene unifier merges the units into the watermarked video.

B. Watermark Code Design

We begin with a BCH(n, k, 2t + 1) code to design a robust watermark code C, which is the set of possible watermark codewords. The BCH code encodes k-bit information into a binary vector of length n and can correct up to t bit errors [9]. The length n is set to l_X, the number of frames in X. The error correction capability t is determined as follows. We select watermark codewords from the BCH code such that the number of "1" bits in each codeword is l_Y, the number of preserved frames in Y. Since all codewords have the same number of "1" bits, the minimum distance d_min of the watermark code C is an even number. Also, d_min is guaranteed to be larger than the minimum distance (2t + 1) of the entire BCH code. Therefore, given d_min, t can be set to d_min/2 − 1.

Embedding a watermark should not introduce noticeable artifacts into the watermarked video. Some codewords in the watermark code, however, can cause the skipping of several consecutive frames, yielding unnatural motion in the watermarked video. To alleviate these artifacts, we further choose only the codewords in which "0" bits are separated by at least ν successive "1" bits. The number of such codewords then becomes the watermark capacity n_c. A more detailed analysis of the robustness, transparency, and capacity of the proposed watermark code can be found in [10].

C. Frame Matching

Fig. 2 shows the block diagram of the proposed watermark decoder. The watermark extractor φ_w^−1 aligns the input video Ỹ temporally to the original embedding unit X and estimates the frame skipping pattern. The frame skipping pattern corresponds to a codeword c̃, which is provided to the message decoder φ_m^−1. Finally, the message decoder estimates the message M̂.

Fig. 2. Block diagram of the watermark decoder: the frame matching module estimates the selection function λ̂ from the input video Ỹ and the original embedding unit X; the watermark codeword extractor converts λ̂ into the estimated codeword ĉ; and the message decoder φ_m^−1 recovers the message M̂ using the key K.

Given the input video Ỹ and the embedding unit X, we introduce a selection function λ: {1, ..., l_Ỹ} → {1, ..., l_X} ∪ {0} to describe the frame correspondence between Ỹ and X, where l_Ỹ and l_X are the numbers of frames in Ỹ and X, respectively. The selection function is defined on the space of frame indices and represents the temporal modification of the video. More specifically, if the ith frame Ỹ(i) matches the jth frame X(j) in X, then we set λ(i) = j. On the other hand, if Ỹ(i) does not match any frame in X, it is an unmatched frame and λ(i) = 0.

Suppose that the frame selector in the watermark embedder uses λ to skip frames. Then, Y(i) is obtained by preserving X(λ(i)). Thus, if Ỹ is the watermarked copy of X without any attacks, then Ỹ(i) = X(λ(i)). However, when Ỹ is corrupted by attacks, Ỹ(i) may differ from every frame of X. We hence define the matching cost d_m(i, j) as a distance measure from Ỹ(i) to X(j); we use the sum of absolute differences (SAD) between Ỹ(i) and X(j) for d_m(i, j). Then, we can estimate the selection function λ by the local minimization (LM) of the matching cost via

    λ_LM(i) = arg min_{1 ≤ j ≤ l_X} d_m(i, j),   for each 1 ≤ i ≤ l_Ỹ.    (1)

This implies that the selected frame X(λ_LM(i)) is most similar to the input frame Ỹ(i). To recover from frame insertion attacks, Ỹ(i) can be declared an unmatched frame if the minimum cost min_{1 ≤ j ≤ l_X} d_m(i, j) is larger than a threshold. However, in this LM approach, the selection function may not be one-to-one, and the estimation results are very sensitive to the threshold.

We therefore adopt the globally optimal matching algorithm [11], which minimizes the total matching cost subject to the one-to-one matching constraint:

    λ̂ = arg min_{λ ∈ Λ} ( Σ_{i: λ(i) > 0} d_m(i, λ(i)) + Σ_{i: λ(i) = 0} d_u(i) )    (2)

where Λ = {λ | λ(i) < λ(i′) if i < i′, λ(i) > 0, λ(i′) > 0} is the set of monotone one-to-one selection functions, and d_u(i) is the unmatched cost for the ith frame [11]. The exhaustive minimization of (2) requires a huge amount of computation, which is impractical in most applications. However, the complexity can be reduced to O(l_Ỹ l_X + l_Ỹ ν_I²) using dynamic programming [11], [12], where ν_I is the number of inserted frames. The reduced complexity is still too high for real-time applications, but the proposed algorithm can be useful in applications, such as forensic ones, that do not require real-time detection.

D. Watermark Decoding

The watermark codeword extractor in Fig. 2 accepts the estimated selection function λ̂, which corresponds to a codeword c̃. The estimated codeword c̃, however, may not belong to the watermark code C. The watermark code is derived from a BCH code, so we can correct errors up to its error correction capability. But, since we do not use the entire BCH code, the corrected codeword may not be included in the watermark code C either. Therefore, given c̃ ∉ C, we can correct further bit errors by finding the nearest codeword

    ĉ = arg min_{c ∈ C} d_H(c̃, c)    (3)

where d_H denotes the Hamming distance. A simplified sketch of the frame matching and codeword recovery is given below.
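To make the matching and decoding concrete, here is a simplified dynamic-programming sketch, our own illustration rather than the exact implementation of [11]. The transitions mirror the three events behind (2): matching a frame pair with cost d_m, skipping an original frame at no cost (skipping is the embedding mechanism itself), and leaving an input frame unmatched at cost d_u, taken here as a scalar for simplicity. All function names are assumptions.

```python
import numpy as np

def global_frame_matching(Y_tilde, X, unmatched_cost):
    """Globally optimal monotone one-to-one frame matching, a simplified
    dynamic-programming sketch of (2). Returns lam, where lam[i] = j
    (1-based) if Y_tilde[i] matches X[j-1], and lam[i] = 0 if unmatched.
    """
    lY, lX = len(Y_tilde), len(X)
    # Matching cost d_m(i, j): sum of absolute differences (SAD).
    d = np.array([[np.abs(yi.astype(np.int64) - xj.astype(np.int64)).sum()
                   for xj in X] for yi in Y_tilde])
    INF = float("inf")
    # D[i, j]: minimum cost of aligning the first i frames of Y_tilde
    # against the first j frames of X. Skipped X frames cost nothing,
    # since skipping is how the watermark was embedded.
    D = np.full((lY + 1, lX + 1), INF)
    D[0, :] = 0.0
    for i in range(1, lY + 1):
        for j in range(0, lX + 1):
            # Option 1: Y_tilde[i-1] is unmatched (e.g., inserted frame).
            best = D[i - 1, j] + unmatched_cost
            if j > 0:
                # Option 2: X[j-1] was skipped by the embedder.
                best = min(best, D[i, j - 1])
                # Option 3: match Y_tilde[i-1] to X[j-1].
                best = min(best, D[i - 1, j - 1] + d[i - 1, j - 1])
            D[i, j] = best
    # Backtrack to recover the selection function lambda.
    lam = [0] * lY
    i, j = lY, lX
    while i > 0:
        if j > 0 and D[i, j] == D[i - 1, j - 1] + d[i - 1, j - 1]:
            lam[i - 1] = j
            i, j = i - 1, j - 1
        elif j > 0 and D[i, j] == D[i, j - 1]:
            j -= 1
        else:
            lam[i - 1] = 0
            i -= 1
    return lam

def skipping_pattern(lam, lX):
    """Estimated codeword c~: bit j is 1 iff some input frame matched X(j)."""
    c = np.zeros(lX, dtype=np.uint8)
    for j in lam:
        if j > 0:
            c[j - 1] = 1
    return c

def nearest_codeword(c_tilde, code):
    """Eq. (3): the watermark codeword minimizing the Hamming distance."""
    return min(code, key=lambda c: int(np.sum(c != c_tilde)))
```

The recovered selection function directly yields the skipping pattern c̃, and nearest_codeword implements (3) by exhaustive search over C, which is feasible for the code sizes used in Section II-E.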


Fig. 3. Representative frames of the test sequences: (a) Movie 1, frame 86,295; (b) Movie 2, frame 161,500; (c) Movie 3, frame 118,400; (d) Movie 11, frame 52,076; (e) Movie 13, frame 104,396; (f) Movie 14, frame 135,976.

The message decoder φ_m^−1, which is the inverse of the message encoder φ_m, decodes the message from ĉ using the key:

    M̂ = φ_m^−1(K, ĉ).    (4)
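Since φ_m is a keyed mapping from messages to codewords, one simple way to realize φ_m^−1 (an assumption on our part, since the paper does not specify the construction) is to re-encode candidate messages, e.g., the user identifiers in a fingerprinting application, and return the one whose codeword matches ĉ:

```python
import numpy as np

def message_decoder(key, c_hat, candidate_messages, encoder, l_X):
    """Hypothetical phi_m^{-1}: invert the keyed encoder phi_m by
    re-encoding each candidate message and comparing with c_hat.
    Returns the matching message, or None if no candidate matches.
    """
    for m in candidate_messages:
        if np.array_equal(encoder(m, key, l_X), c_hat):
            return m
    return None
```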

E. Experimental Results

The performance of the proposed algorithm is evaluated using 20 Korean movies of resolution 360 × 240. The frame rates of these sequences are either 29.97 or 23.976 frames per second (fps). Fig. 3 shows representative frames of six of these sequences. In the case of stationary scenes, in which neighboring frames are very similar, it is not easy to discern video frames and align them temporally to the original video. Thus, we select an embedding unit so that the average SAD between neighboring frames is larger than a threshold. In total, 9,619 embedding units are tested. Each embedding unit consists of 63 consecutive frames. We skip 12 frames from each embedding unit and preserve at least two frames after each skipped frame to satisfy the transparency constraint. We design our codewords using a BCH(63, 45, 7) code. The number of valid codewords is 14,482 (≈ 13.8 bits). A sketch of the validity test that encodes these constraints is given below.
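The following check, our own illustration with the parameters of this experiment (l_Y = 51 ones, ν = 2), verifies that a candidate codeword has the required weight and that every "0" bit (a skipped frame) is followed by at least two "1" bits (preserved frames); a test of this kind is what defines the 14,482 valid codewords reported above.

```python
import numpy as np

def is_valid_codeword(c, l_Y=51, nu=2):
    """Transparency check for the watermark code: the codeword must
    contain exactly l_Y ones (so 12 of the 63 frames are skipped), and
    each '0' must be followed by at least nu '1's. Defaults follow the
    experiment in this section.
    """
    c = np.asarray(c)
    if c.sum() != l_Y:
        return False
    for i in np.flatnonzero(c == 0):
        run = c[i + 1 : i + 1 + nu]
        if len(run) < nu or run.sum() < nu:
            return False
    return True
```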

To evaluate the robustness of our watermarking scheme, we measure the message error probability, denoted by P_e, which is the probability that the decoded message differs from the original message. Fig. 4 shows the message error probability when the test sequences are compressed by H.264/AVC. We compare two frame matching algorithms: the local matching (LM) approach in (1) and the proposed global matching algorithm using dynamic programming (DP). The proposed algorithm yields a lower P_e than LM. Note that the proposed algorithm can extract watermark messages without any error when the quantization parameter (QP) is less than 35 and the average peak signal-to-noise ratio (PSNR) of a compressed sequence is higher than about 31 dB. When the watermarked sequences are compressed at higher QPs, the reconstructed videos are very crude and the differences between neighboring frames can be ambiguous. This ambiguity can yield decoding errors.

Fig. 4. Message error probability P_e of the first algorithm after H.264/AVC compression attacks (P_e versus average PSNR in dB, for QP values from 15 to 45). The frame matching is performed by the local matching (LM) approach or the proposed global matching algorithm using dynamic programming (DP).

TABLE I
THE MESSAGE ERROR PROBABILITY P_e OF THE FIRST ALGORITHM AND THE SECOND ALGORITHM FOR TEMPORAL ATTACKS

Temporal attack                              | Severity                       | Algorithm I  | Algorithm II
---------------------------------------------|--------------------------------|--------------|-------------
Frame insertion only                         | 20 frames                      | 0            | 0
Frame swapping only                          | 20 pairs                       | 0            | 0
Frame removal only                           | 8 frames                       | 8.33 × 10⁻³  | 0
Frame removal + frame insertion + swapping   | 8 frames + 8 frames + 8 pairs  | 8.33 × 10⁻³  | 0
Frame averaging                              | all pairs                      | 1.65 × 10⁻²  | 4.73 × 10⁻²

Next, we evaluate the performance against temporal attacks, including frame removal, insertion, and swapping. Watermarked sequences, compressed at QP 20, are used in the tests. For frame removal attacks, we omit a number of frames at random.

For frame insertion attacks, we duplicate randomly chosen frames. For frame swapping attacks, we exchange pairs of randomly chosen frames. For each attack scenario, 100 different seeds are used to choose frames at random, and the average performance is reported. A sketch of these attack procedures is given at the end of this subsection.

Table I summarizes the performance of the proposed algorithm (Algorithm I) in several attack scenarios. The proposed algorithm yields no error when up to 20 frames are inserted into an embedding unit or up to 20 pairs of frames are swapped. The message error probability is 8.33 × 10⁻³ when the number of removed frames amounts to 8. Note that the message error probability does not increase even when the removal attacks are further combined with the insertion of eight frames and the swapping of eight pairs of frames. This indicates that the proposed algorithm is very robust against frame insertion and swapping attacks.

We also employ frame averaging, which is a simple collusion attack. We attack a watermarked unit by averaging randomly selected pairs of consecutive frames. When all pairs of frames are averaged, the message error probability is relatively high (1.65 × 10⁻²). However, if all pairs of frames are averaged, the attacked sequence generally loses its value, since its quality is degraded too much by blurring and ghost artifacts around the edges of moving objects.
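For concreteness, the attack procedures can be sketched as follows; this is our own illustration of the stated scenarios, with frames as numpy arrays and all function names assumed.

```python
import random
import numpy as np

def attack_remove(frames, n, rng):
    """Frame removal: omit n randomly chosen frames."""
    drop = set(rng.sample(range(len(frames)), n))
    return [f for i, f in enumerate(frames) if i not in drop]

def attack_insert(frames, n, rng):
    """Frame insertion: duplicate n randomly chosen frames in place."""
    out = list(frames)
    for _ in range(n):
        i = rng.randrange(len(out))
        out.insert(i, out[i])
    return out

def attack_swap(frames, n_pairs, rng):
    """Frame swapping: exchange n_pairs pairs of randomly chosen frames."""
    out = list(frames)
    for _ in range(n_pairs):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out

def attack_average_all(frames):
    """Frame averaging collusion: average every pair of consecutive frames."""
    return [(frames[i].astype(np.float64) + frames[i + 1].astype(np.float64)) / 2
            for i in range(len(frames) - 1)]

# One of the 100 random trials of the combined scenario, e.g., with seed 0:
# rng = random.Random(0)
# attacked = attack_swap(attack_insert(attack_remove(frames, 8, rng), 8, rng), 8, rng)
```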


III. ALGORITHM II: NORMED CENTER OF GRAVITY

Algorithm I requires the original video at the decoder, which is a burden for the watermarking system. In this section, instead of frame skipping, we modify block features to embed 1-bit information into each frame according to a watermark codeword. This approach enables blind decoding. We use the NCG, proposed in [8], as the block feature. However, the main contribution of this paper lies not in the choice of the specific feature, but in the method of modulating the feature vector to form watermark codewords that are robust against temporal attacks. Instead of the NCG, other features, such as those in [13], [14], can be combined with the proposed modulation technique.

Fig. 5. Set of spread patterns equivalent to the extended BCH (8,4,4) codewords: (a) the spread patterns p^(i,0) and their complements p^(i,1) for frame indices i = 1, ..., 8; (b) a colluded spread pattern, e.g., p^(3,0)+(4,1), obtained by frame averaging of p^(3,0) and p^(4,1).

A. Compression Resilient Feature

For efficient compression, the state-of-the-art video coding standard H.264/AVC effectively eliminates spatio-temporal redundancies in video sequences, so it is very difficult for watermarks to survive compression attacks. To overcome this weakness, it is beneficial to embed watermarks into resilient features that are not sensitively affected by compression tools. Pröfrock et al. proposed the NCG as a robust block feature in their video watermarking system [8], [15]. In this paper, we also use the NCG as the feature for watermark embedding. For the sake of completeness, let us briefly describe the NCG.

For a block b of size n × n, we compute the vertical mean vector m_x = (m_x(1), ..., m_x(n)), whose elements are given by m_x(k) = (1/n) Σ_{l=1}^{n} b(k, l) for 1 ≤ k ≤ n, where b(k, l) is the (k, l)th pixel of the block b. We then obtain the weighted sum of the m_x(k)'s by

    γ_x = L_x exp(iθ_x) = Σ_{k=1}^{n} m_x(k) exp(i 2πk/n)    (5)

where L_x and θ_x denote the magnitude and the phase of γ_x, respectively. The phase θ_x indicates which element in the mean vector m_x is dominant. Similarly, we can compute the horizontal mean vector m_y, γ_y, L_y, and θ_y. The NCG of the block b is then defined as the pair (θ_x, θ_y). A block is called NCG-robust if the corresponding magnitude L_x or L_y is larger than a pre-specified threshold L_min. We set L_min to 200 and use only NCG-robust blocks for data embedding.

B. Watermark Embedding

Let X denote an embedding unit and l_X the number of frames in X. The message encoder generates a codeword c = (c_1, ..., c_{l_X}), where c_i = 0 or 1. In contrast to Section II, adjacent bits can both have value 0. In the watermark multiplexer, we map each codeword bit c_i to a spread pattern p = (p_1, ..., p_l) of length l. Then, we partition the NCG-robust blocks in X(i) into l disjoint sets and embed the kth spread bit p_k repeatedly into the kth set. For the embedding, we warp an NCG-robust block iteratively to move its NCG component θ toward a specified value θ̂ based on quantization index modulation (QIM) [16]. A sketch of the NCG computation and the QIM target follows.
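As a minimal sketch (our own, assuming numpy arrays for blocks; delta and the function names are illustrative assumptions), the NCG of (5) and the QIM target angle for one spread bit can be computed as follows.

```python
import numpy as np

def ncg(block):
    """Compute the NCG (theta_x, theta_y) and the magnitudes (L_x, L_y)
    of an n x n block, following (5).
    """
    n = block.shape[0]
    phase = np.exp(1j * 2 * np.pi * np.arange(1, n + 1) / n)
    m_x = block.mean(axis=1)      # vertical mean vector, m_x(k)
    m_y = block.mean(axis=0)      # horizontal mean vector, m_y(l)
    g_x = np.dot(m_x, phase)      # gamma_x = L_x exp(i theta_x)
    g_y = np.dot(m_y, phase)
    return (np.angle(g_x), np.angle(g_y)), (np.abs(g_x), np.abs(g_y))

def qim_target(theta, bit, delta):
    """QIM [16]: target angle for embedding one spread bit. Quantize
    theta with step delta on a lattice offset by delta/2 for bit 1.
    The iterative block warping that actually moves theta toward this
    target is beyond this sketch (see [8]); delta is an assumed parameter.
    """
    offset = 0.0 if bit == 0 else delta / 2.0
    return np.round((theta - offset) / delta) * delta + offset
```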

C. Decoding of Spread Bits

At the decoder side, a spread bit p_k is estimated from each NCG-robust block. Let θ̂ denote the NCG component of the NCG-robust block. We first compute the normalized re-quantization error of θ̂ [16]. To estimate p_k reliably, we take the average value e_k of the normalized re-quantization errors over the kth set of NCG-robust blocks. Since p_k is repeatedly embedded into the kth set of blocks, e_k is close to p_k and becomes a good estimator of p_k. A sketch of this estimation is given below.
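The sketch below continues the assumptions above; in particular, the exact normalization of the re-quantization error is our own formulation, chosen to be consistent with qim_target(), and not necessarily the expression of [16].

```python
import numpy as np

def requantization_error(theta, delta):
    """Normalized re-quantization error of an NCG angle under the QIM
    lattice: close to 0 if the embedded spread bit was 0, and close to 1
    if it was 1 (assumed formulation, matching qim_target() above).
    """
    frac = (theta % delta) / delta        # position within one QIM cell
    return 2.0 * min(frac, 1.0 - frac)

def estimate_spread_bits(thetas_per_set, delta):
    """e_k: average the normalized errors over the kth set of NCG-robust
    blocks; each e_k is then a soft estimate of the spread bit p_k.
    """
    return [float(np.mean([requantization_error(t, delta) for t in thetas]))
            for thetas in thetas_per_set]
```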

D. Spread Patterns and Their Decoding

We embed the ith bit c_i of the codeword c = (c_1, ..., c_{l_X}) into the ith frame of the embedding unit. This is achieved by spreading c_i into a spread pattern p = (p_1, ..., p_l) and modifying the NCGs of blocks within the ith frame according to the spread bits p_k. Since the temporal alignment of frames within the embedding unit can be broken by frame removal, insertion, or swapping attacks, the spread pattern should inform the blind decoder of the frame index i as well as the codeword bit c_i.

We design the spread patterns using the extended binary BCH (l, r, l/2) code, l = 2^(r−1), which is formed by adding an overall parity bit to the binary BCH (l − 1, r, l/2 − 1) code [9]. Let p^(i,c_i) denote the spread pattern representing the frame index i and its corresponding codeword bit c_i. We design p^(i,1) as the complementary pattern of p^(i,0). Fig. 5(a) shows the spread patterns for l = 8.

The minimum distance of the extended BCH code is l/2. Thus, we can effectively estimate the original frame index i of a watermarked frame Ỹ(j). First, as described in the previous subsection, we compute e_k from the kth set of NCG-robust blocks in Ỹ(j), which is an estimator of the kth spread bit. Then, we match these estimated bits to the spread patterns by

    ĩ = arg min_{1 ≤ i ≤ l_X} min_{c̃ ∈ {0,1}} Σ_{k=1}^{l} (n_{j,k}/n_j) |e_k − p_k^(i,c̃)|    (6)

where n_{j,k} is the number of NCG-robust blocks in the kth partition of Ỹ(j), and n_j = Σ_{k=1}^{l} n_{j,k}. The resultant ĩ is the estimated frame index. Also, the bit c̃ that minimizes the sum in (6) becomes the ĩth bit of the estimated codeword c̃. A sketch of this matching is given below.
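To make the matching in (6) concrete, the sketch below hardcodes the l = 8 patterns of Fig. 5(a) and scores every candidate (i, c̃) by the weighted L1 distance. The function names are assumptions, and a deployment with l_X = 63 frames would presumably use a longer extended BCH code so that l ≥ l_X.

```python
import numpy as np

# Spread patterns p(i,0) for l = 8, read off Fig. 5(a); p(i,1) is the
# bitwise complement of p(i,0).
P0 = np.array([
    [0, 0, 0, 0, 0, 0, 0, 0],   # p(1,0)
    [0, 0, 1, 0, 1, 0, 1, 1],   # p(2,0)
    [0, 0, 0, 1, 0, 1, 1, 1],   # p(3,0)
    [0, 1, 0, 0, 1, 1, 0, 1],   # p(4,0)
    [0, 1, 0, 1, 1, 0, 1, 0],   # p(5,0)
    [0, 0, 1, 1, 1, 1, 0, 0],   # p(6,0)
    [0, 1, 1, 1, 0, 0, 0, 1],   # p(7,0)
    [0, 1, 1, 0, 0, 1, 1, 0],   # p(8,0)
])

def match_spread_pattern(e, weights):
    """Decode one frame by (6): weighted L1 matching of the soft
    spread-bit estimates e = (e_1, ..., e_l) against every pattern
    p(i,c). weights holds n_{j,k} / n_j, the fraction of NCG-robust
    blocks per partition. Returns (i, c) with 1 <= i <= l, c in {0, 1}.
    """
    e = np.asarray(e, dtype=float)
    w = np.asarray(weights, dtype=float)
    best = None
    for i, p0 in enumerate(P0, start=1):
        for c, p in ((0, p0), (1, 1 - p0)):
            cost = float(np.sum(w * np.abs(e - p)))
            if best is None or cost < best[0]:
                best = (cost, i, c)
    return best[1], best[2]
```

Under a frame-averaging attack, the same scoring can be run against the colluded patterns (p^(i,c̃) + p^(i′,c̃′))/2, as described next.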


The extended BCH code has an interesting property: the distance between any two spread patterns p^(i,c_i) and p^(j,c_j) is l/2 when i ≠ j. In other words, two such patterns have the same bits at half of their coordinates and complementary bits at the other half. Let p^(i,c_i)+(j,c_j) denote the colluded spread pattern, which is obtained by averaging the two patterns p^(i,c_i) and p^(j,c_j). For instance, Fig. 5(b) shows the colluded spread pattern p^(3,0)+(4,1). Suppose that two watermarked frames are averaged by an attack. Then, if the corresponding spread bits from the two frames are identical, the e_k in (6) tend to be close to 0 or 1. On the other hand, if the corresponding bits are complementary, the e_k tend to be close to 0.5. Therefore, to decode the frame indices and the codeword bits from an averaged frame, we employ the colluded spread pattern p^(i,c̃)+(i′,c̃′) instead of p^(i,c̃) in the matching in (6). Finally, the watermark codeword extractor corrects possible bit errors in c̃ to obtain ĉ, and the message decoder estimates the message M̂ from ĉ in the same way as in Section II-D.

E. Experimental Results

We adopt the BCH (63, 16, 23) code as the watermark code. Note that the first algorithm has the transparency constraint that "0" bits are separated by at least two "1" bits (ν = 2). The second algorithm does not have this constraint and hence provides a capacity of 16 bits, which is larger than the capacity of the first algorithm (≈ 13.8 bits).

We compare the robustness of the second algorithm against H.264/AVC compression attacks with that of Alattar et al.'s algorithm [5]. In the implementation of [5], the capacity is controlled so that each frame contains 1-bit information, as in the proposed algorithm. Also, the global embedding strength is adaptively determined between 0.5 and 2.0 so that the increase in bit rate due to the watermark embedding does not exceed 10%. Fig. 6 shows the message error probabilities. Whereas Alattar et al.'s algorithm yields a nonzero message error probability at QPs higher than 20, the second algorithm decodes the messages perfectly when QP is less than 35. Comparing Fig. 6 with Fig. 4, note that the robustness of the first algorithm against H.264/AVC compression attacks is comparable with that of the second algorithm.

Fig. 6. Message error probabilities P_e of the second algorithm and Alattar et al.'s algorithm [5] after H.264/AVC compression attacks (P_e versus average PSNR in dB, for QP values from 15 to 45).

The attack scenarios in Table I are also employed to evaluate the robustness of the second algorithm. The second algorithm decodes the watermarks perfectly when the sequences are modified by frame insertion, swapping, removal, or their combination. For the frame averaging attack, when all pairs of frames are averaged, the second algorithm yields a higher message error probability (4.73 × 10⁻²) than the first algorithm (1.65 × 10⁻²).

To summarize, both algorithms tolerate compression attacks and temporal attacks effectively. Against video compression attacks, the performances of the two algorithms are comparable to each other. The second algorithm provides a larger capacity and better robustness against temporal attacks than the first algorithm. However, since the first algorithm does not introduce any spatial distortions in preserved frames, it can incorporate any image watermarking algorithm to embed additional information. In other words, the frame skipping method in the first algorithm can be viewed as orthogonal to image watermarking techniques.

IV. CONCLUSIONS

We proposed two temporal feature modulation algorithms. The first algorithm skips a selected set of frames from the original video according to a watermark codeword, and the decoder matches the watermarked video frames to the original frames to estimate the codeword. The second algorithm modulates the NCGs of blocks to embed 1-bit information into each frame, and its decoder can estimate the watermark codeword in a blind manner. Experimental results demonstrated that both algorithms are robust against H.264/AVC compression attacks and temporal attacks, including frame removal, insertion, and swapping.

REFERENCES

[1] J. Lubin, J. Bloom, and H. Cheng, "Robust, content-dependent, high-fidelity watermark for tracking in digital cinema," in Proc. SPIE Security and Watermarking of Multimedia Contents, vol. 5020, Jan. 2003, pp. 536–545.
[2] D. Boneh and J. Shaw, "Collusion-secure fingerprinting for digital data," IEEE Trans. Inform. Theory, vol. 44, no. 5, pp. 1897–1905, May 1998.
[3] M. U. Celik, G. Sharma, and A. M. Tekalp, "Collusion-resilient fingerprinting by random pre-warping," IEEE Signal Process. Lett., vol. 11, no. 10, pp. 831–835, Oct. 2004.
[4] I. J. Cox, J. Kilian, F. T. Leighton, and T. Shamoon, "Secure spread spectrum watermarking for multimedia," IEEE Trans. Image Process., vol. 6, no. 12, pp. 1673–1687, Dec. 1997.
[5] A. M. Alattar, E. T. Lin, and M. U. Celik, "Digital watermarking of low bit-rate advanced simple profile MPEG-4 compressed video," IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 8, pp. 787–800, Aug. 2003.
[6] H.-S. Jung, Y.-Y. Lee, and S.-U. Lee, "RST-resilient video watermarking using scene-based feature extraction," EURASIP J. Appl. Signal Process., vol. 2004, no. 14, pp. 2113–2131, Oct. 2004.
[7] J. Haitsma and T. Kalker, "A watermarking scheme for digital cinema," in Proc. IEEE ICIP, Oct. 2001, pp. 487–489.
[8] D. Pröfrock, M. Schlauweg, and E. Müller, "A new uncompressed domain video watermarking approach robust to H.264/AVC compression," in Proc. IASTED Conf. Signal Processing, Pattern Recognition and Applications, Feb. 2006, pp. 99–104.


[9] F. J. MacWilliams and N. J. A. Sloane, The Theory of Error-Correcting Codes (North-Holland Mathematical Library). Amsterdam: Elsevier, 1977.
[10] Y.-Y. Lee, C.-S. Kim, and S.-U. Lee, "Video fingerprinting based on frame skipping," in Proc. IEEE ICIP, vol. 1, Oct. 2006, pp. 2305–2308.
[11] Y.-Y. Lee, "Temporal feature modulation for video watermarking," Ph.D. dissertation, Seoul Nat. Univ., Korea, Feb. 2008.
[12] D. S. Hirschberg, "Algorithms for the longest common subsequence problem," J. ACM, vol. 24, no. 4, pp. 664–675, Oct. 1977.
[13] P. Bas, J. M. Chassery, and B. Macq, "Geometrically invariant watermarking using frame points," IEEE Trans. Image Process., vol. 11, no. 9, pp. 1014–1028, Sep. 2002.
[14] J. S. Seo and C. D. Yoo, "Image watermarking based on invariant regions of scale-space representation," IEEE Trans. Signal Process., vol. 54, no. 4, pp. 1537–1549, Apr. 2006.
[15] D. Pröfrock, M. Schlauweg, and E. Müller, "Geometric warping watermarking extended concerning geometric attacks and embedding artifacts," in Proc. ACM Multimedia & Security Workshop, Sep. 2007, pp. 169–174.
[16] B. Chen and G. W. Wornell, "Quantization index modulation: A class of provably good methods for digital watermarking and information embedding," IEEE Trans. Inform. Theory, vol. 47, no. 4, pp. 1423–1443, May 2001.
