
A Survey on Different Video Watermarking Techniques and Comparative Analysis with Reference to H.264/AVC

Sourav Bhattacharya, T. Chattopadhyay and Arpan Pal

Abstract: The last few years have witnessed rapid growth in video coding technology. Among the various standards, H.264/Advanced Video Codec (AVC) is of significant importance owing to its reduced bandwidth, better image quality and network friendliness. One current field of interest is the development of a system in which an authentication and copyright protection methodology is embedded within an efficient video codec. In this paper we first survey the available video watermarking techniques, then study the feasibility of watermarking techniques that meet application-specific criteria for H.264/AVC, and finally perform a comparative analysis of different watermarking algorithms based on robustness and computational complexity.

Index Terms: Video Watermarking, H.264/AVC.

Sourav Bhattacharya, T. Chattopadhyay and Arpan Pal are with Convergence Solutions Practice, Tata Consultancy Services Limited, Kolkata, India. E-mail: {sourav.bhattacharya, tanushyam.chattopadhyay, arpan.pal}@tcs.com

1-4244-0216-6/06/$20.00 ©2006 IEEE

I. INTRODUCTION

High-speed computer networks, the Internet and the World Wide Web have revolutionized the way in which digital data is distributed. Widespread and easy access to multimedia content, together with the possibility of making unlimited copies without considerable loss of fidelity, has motivated the need for digital rights management. Digital watermarking is a technology that can serve this purpose. A large number of watermarking schemes have been proposed to hide copyright marks and other information in digital images, video, audio and other multimedia objects [1, and references therein]. A watermark is digital data embedded in a multimedia object such that it can be detected or extracted later in order to make an assertion about the object. The main purpose of digital watermarking is to embed information imperceptibly and robustly in the host data. Typically the watermark contains information about the origin, ownership, destination, copy control, transaction etc. Potential applications of digital watermarking include transaction tracking, copy control, authentication, legacy system enhancement and database linking [2].

The growing popularity of video-based applications such as Internet multimedia, wireless video, personal video recorders, video-on-demand, set-top boxes, videophones and videoconferencing has created a demand for much higher compression, so that bandwidth criteria are met with the best possible video quality. Different video encoder-decoders (CODECs) have evolved to meet the current requirements of video-based products. Among the available standards, H.264/Advanced Video Codec (AVC) is becoming an important alternative because of its reduced bandwidth, better image quality in terms of peak signal-to-noise ratio (PSNR) and network friendliness [26], but it has higher computational complexity. Different watermarking techniques have been proposed for different video CODECs, but only a few works on H.264/AVC can be found in the literature. H.264/AVC uses a different transformation and different block sizes than the MPEG series, so new algorithms must be developed to integrate robust watermarking techniques into the different profiles of H.264/AVC.

In Section II we review the basics of digital watermarking and the terminology and techniques of video watermarking. In Section III we briefly discuss common video watermarking techniques. A comparative analysis of the different watermarking techniques is given in Section IV. Finally, in Section V the basics of the H.264/AVC encoder are explained and the applicability of the different watermarking techniques to H.264/AVC is assessed.

II. VIDEO WATERMARKING

A. Digital watermarking: Digital watermarking, also known as watermark insertion or watermark embedding, is the method of inserting information into multimedia data, also called the original media or cover media (e.g. text, audio, image, video). The embedded information, or watermark, can be a serial number or random number sequence, ownership identifiers, copyright messages, control signals, transaction dates, information about the creators of the work, bi-level or gray-level images, text or other digital data formats. A large number of text [3]-[5], image [6]-[9], audio [10] and video [11]-[15] watermarking algorithms can be found in the literature. These algorithms modify the original media to generate the watermarked media. There may be little or no perceptible difference between the original media and the watermarked media. Fig. 1 gives an overview of the different types of watermarking methodologies, classified by working domain, cover media, perceptibility and application area. After the watermark is embedded, the watermarked media are sent over the Internet or some other transmission channel. Whenever the copyright of the digital media is in question, the embedded information is decoded to identify the copyright owner. The decoding process can either extract the watermark from the watermarked media (watermark extraction) or detect the existence of a watermark in it (watermark detection).

[Fig. 1 (tree diagram): watermarking classified by domain (spatial, frequency), document (text, image, audio, video), perception (visible, invisible; robust, fragile; private, public; invertible, non-invertible, quasi-invertible, nonquasi-invertible) and application (source based, destination based).]

Fig. 1. Different types of watermarking methodologies.

The embedding or encoding process can be viewed as a function that maps the inputs $X$ (original media), $W$ (watermark) and, optionally, $K$ (key) to the output $X'$ (watermarked media). Mathematically it can be expressed as

$$X' = E(X, W, [K]) \tag{1}$$

where $E(\cdot)$ denotes the embedding process and $[\cdot]$ represents an optional argument. Similarly, the decoding or extraction process $D(\cdot)$ can be expressed formally as

$$W' = D(X'', [X], [K]) \tag{2}$$

and the detection process $d(\cdot)$ can be expressed as

$$\{\text{Yes or No}\} = d(X'', [X], W, [K]) \tag{3}$$

Here $X''$ denotes the watermarked media as received by the decoder, possibly altered during transmission or by an attack, and $W'$ denotes the extracted watermark.
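To make the roles of the arguments in (1)-(3) concrete, the following minimal Python sketch implements one possible instance of E, D and d. It is an illustration under stated assumptions, not a scheme taken from the surveyed papers: the watermark W is assumed to be a bipolar array of the same shape as X, the key K is assumed to be an integer seed for a pseudo-random pattern, and the names `key_pattern`, `embed`, `extract`, `detect`, `alpha` and `threshold` are hypothetical.

```python
import numpy as np


def key_pattern(key, shape):
    """Key-dependent pseudo-random pattern with entries in {-1, +1}."""
    rng = np.random.default_rng(key)
    return rng.integers(0, 2, size=shape) * 2 - 1


def embed(x, w, key, alpha=2.0):
    """E(X, W, K): add the bipolar watermark w, modulated by the key pattern."""
    return x + alpha * w * key_pattern(key, x.shape)


def extract(x_recv, x, key):
    """D(X'', X, K): non-blind extraction, the original X is required."""
    return np.sign((x_recv - x) * key_pattern(key, x.shape))


def detect(x_recv, x, w, key, threshold=0.5):
    """d(X'', X, W, K): yes/no decision from the agreement with the expected W."""
    score = np.mean(extract(x_recv, x, key) * w)
    return bool(score > threshold)
```

In this sketch the optional argument X of (2) and (3) is actually used, so the example is a non-blind scheme; a blind variant would have to estimate the watermark from X'' alone. Detection with a wrong key yields an agreement score close to zero, which is why the decision is made against a threshold.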

B. Video watermarking terminologies: Video watermarking describes the process of embedding information in video data. Different data hiding terminologies are given in [16]. The important terms pertaining to digital video watermarking are the following.

Digital video: A video sequence is a collection of consecutive still images equally spaced in time.

Payload: The amount of information that can be stored in a watermark. An important concept regarding the video watermarking payload is watermark granularity, which can be defined as the amount of data required for embedding one unit of watermark information.

Perceptibility: A video watermarking methodology is called imperceptible if humans cannot distinguish the original video from the video with the inserted watermark.

Robustness: A fragile watermark should not be robust against intentional modification techniques, as failure to detect the watermark signifies that the received data is no longer authentic. In applications such as copyright protection, it is desirable that the watermark always remains in the video data, even if the video data is subjected to intentional and unintentional signal processing attacks. Hence, depending on the requirements of the application, the watermark is embedded in a robust, semi-fragile or fragile manner.

Security: The security of the watermarking algorithm is ensured in the same way as in encryption methodology. According to Kerckhoffs' assumption, the algorithm for watermark embedding can be considered public, whereas the security depends solely on the choice of a key from a large key space.
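The relation between payload and watermark granularity can be illustrated with a short calculation. The sketch below assumes that granularity is measured in pixels per embedded bit; the function names, the CIF frame size (352 × 288) and the frame rate of 25 frames per second are illustrative assumptions, not values taken from the surveyed papers.

```python
def payload_bits_per_frame(width, height, granularity):
    """Bits that fit in one frame if each bit needs `granularity` pixels."""
    return (width * height) // granularity


def payload_bits_per_second(width, height, granularity, fps):
    """Payload rate for a sequence of equally spaced frames."""
    return payload_bits_per_frame(width, height, granularity) * fps


# One bit per 8 x 8 block of a 352 x 288 frame: 64 pixels per bit.
bits_per_frame = payload_bits_per_frame(352, 288, 8 * 8)
bits_per_second = payload_bits_per_second(352, 288, 8 * 8, 25)
```

A coarser granularity (more pixels per bit) lowers the payload but leaves more host data per bit, which is the trade-off that the robustness requirement above has to be weighed against.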
C. Video watermarking techniques: In principle any image watermarking technique can be extended to watermark videos, but in practice video watermarking techniques must meet challenges that image watermarking schemes do not face, such as the large volume of inherently redundant data between frames, the imbalance between moving and motionless regions, and the real-time requirements of video broadcasting. Watermarked video sequences are highly susceptible to pirate attacks such as frame averaging, frame swapping, statistical analysis, digital-analog (AD/DA) conversion and lossy compression. Video watermarking applications can be grouped into security-related applications such as copy control [18], fingerprinting, ownership identification, authentication and tamper resistance, and value-added applications such as legacy system enhancement, database linking [1], video tagging, digital video broadcast monitoring [19] and Media Bridge [20]. Apart from robustness, reliability, imperceptibility and practicality, video watermarking algorithms should also address issues such as localized detection, real-time algorithm complexity, synchronization recovery, effects of floating-point representation and power dissipation [17].

According to the working domain, video watermarking techniques are classified into pixel domain and transform domain techniques. In the pixel domain the watermark is embedded in the source video by simple addition or by bit replacement at selected pixel positions. The main advantages of pixel domain techniques are that they are conceptually simple and that their time complexity is low, which favours real-time implementation. However, these techniques generally fail to provide adequate robustness and imperceptibility. In transform domain methods, the host signal is transformed into a different domain and the watermark is embedded in selected coefficients. Commonly used transforms are the discrete cosine transform (DCT) and the discrete wavelet transform (DWT). Detection is generally performed by transforming the received signal into the appropriate domain and searching for the watermarking patterns or attributes. The main advantage of transform domain watermarking is that special properties of the transform domain can be applied easily. For example, working in the frequency domain enables us to apply more advanced properties of the human visual system (HVS) to ensure better robustness and imperceptibility.

III. SURVEY ON VIDEO WATERMARKING

A watermark can be inserted directly in the raw video data, integrated during the encoding process, or applied after the video data has been compressed. We now briefly discuss some common video watermarking techniques. A spread spectrum (SS) based watermarking technique was proposed in [11].
In the basic algorithm each watermark bit $a_j$, $a_j \in \{-1, 1\}$, is spread over a large number of chips ($cr$) and modulated by a binary pseudo-noise sequence $p_i$, $p_i \in \{-1, 1\}$. The video and the watermark are represented as vectors, and the watermark is inserted by scaled addition. The watermark is retrieved by high-pass filtering followed by a correlation-based method. The robustness of the algorithm can be increased by increasing $cr$, $\sigma_p^2$ (the variance of the pseudo-random sequence) or $\mu_\alpha$ (the mean of the locally adjustable amplitude factor). However, increasing $cr$ reduces the data rate of the scheme, whereas increasing $\sigma_p^2$ or $\mu_\alpha$ makes the watermark more perceptible. As the DCT is a linear transformation and the watermark is independent of the picture, the watermark can also be added in the DCT domain. The 1D watermark vector is rearranged into a frame structure and transformed to the 8 × 8 DCT domain, so that the watermark can be added directly to a partially decoded video stream. Since the size and transfer rate of the watermarked video should be identical to those of the original video, the DCT coefficients of the watermark and the video frame are combined only if the resulting VLC code has the same length as the original one. In addition, drift compensation is required to cancel out watermark components from P and B frames, because the decoder adds motion-compensated prediction or interpolation from other frames when constructing the P and B frames.

A 2D spread spectrum method for video watermarking (just another watermarking system, JAWS) was proposed in [19]; it is used for monitoring video data transmitted over different broadcast links. This pixel domain watermarking scheme is distinctive for its enhanced payload capabilities and shift invariance.

A novel collusion-resistant (CR) video watermarking approach is proposed in [21]. This is a practical frame-by-frame video watermarking technique. A basic s × s watermark pattern is first created, and this pattern is repeatedly embedded so that it is centred around a fixed number of selected points, known as anchors, in every video frame. The part of the video frame where the basic watermark is embedded is called the footprint. Anchor points are calculated using a feature extraction algorithm. As the content of the video frames changes, so do the selected feature points. As a result, the watermark footprints evolve with the video. After these watermark frames have been generated within a given host frame, spatial masking is applied to them to ensure robustness and imperceptibility. The scaled watermark is then embedded in the host data by addition.
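The basic spread-spectrum scheme of [11] described earlier in this section can be sketched in a few lines of Python. This is a simplified illustration under several assumptions rather than the implementation of [11]: a constant amplitude `alpha` replaces the locally adjustable amplitude factor, a first difference stands in for the high-pass filter, the key is an integer seed, and all function names are hypothetical.

```python
import numpy as np


def pn_sequence(key, n):
    """Binary pseudo-noise sequence p_i in {-1, +1}, seeded by the key."""
    rng = np.random.default_rng(key)
    return rng.integers(0, 2, size=n) * 2 - 1


def ss_embed(video, bits, key, cr, alpha=2.0):
    """Spread each bit a_j in {-1, +1} over cr chips and add it to the video."""
    v = np.asarray(video, dtype=np.float64).ravel().copy()
    n = len(bits) * cr
    if n > v.size:
        raise ValueError("video too short for this payload and chip rate")
    b = np.repeat(np.asarray(bits), cr)      # spread sequence b_i
    v[:n] += alpha * b * pn_sequence(key, n)  # scaled addition of b_i * p_i
    return v.reshape(np.shape(video))


def ss_retrieve(marked, n_bits, key, cr):
    """Recover the bits by high-pass filtering followed by correlation."""
    v = np.asarray(marked, dtype=np.float64).ravel()[: n_bits * cr]
    # Assumption: a first difference is used as a crude high-pass filter
    # to suppress the slowly varying host signal.
    hp = np.diff(v, prepend=v[0])
    corr = (hp * pn_sequence(key, n_bits * cr)).reshape(n_bits, cr).sum(axis=1)
    return np.where(corr >= 0, 1, -1)
```

The sketch makes the trade-off stated above visible: each bit occupies `cr` samples, so a larger chip rate strengthens the correlation sum but leaves room for fewer bits, while a larger `alpha` strengthens the correlation at the cost of a larger change to every sample.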
Watermarking using CDMA modulation was proposed in [22]. In this methodology one of the four least significant bitplanes is replaced by watermark planes. The bitplanes to be replaced are selected according to a random periodic quaternary sequence. The watermark plane is generated using a 1D spread spectrum methodology. For detection of the watermark, the author proposed a two-level hierarchical correlation methodology.

One of the prime motivations for integrating watermarking into video coding structures such as MPEG-2 and H.264 is to reduce the overall real-time video processing complexity. The reader is referred to [26] for an exposition on H.264/AVC. In [23] a watermarking method using variable length code (VLC) swapping was proposed. This methodology is based on the observation that the MPEG-2, H.26l VLC tables contain pairs of code words $(r, l) \mapsto c_0$ and $(r, l \pm 1) \mapsto c_1$ such that $\mathrm{length}(c_0) = \mathrm{length}(c_1)$ and $\mathrm{lsb}(c_0) \neq \mathrm{lsb}(c_1)$. Such level-adjacent pairs are called label-carrying VLCs (lc-VLCs). A covert data bit $U_i$ is embedded into a frame by extracting an eligible lc-VLC, $c_i \in \{c_0\} \cup \{c_1\}$, and swapping the codeword, if necessary, to ensure that $\mathrm{lsb}(c_i) = U_i$. Because this process does not use any random key-based component, the method is not robust against attacks.

In [24], Darmstaedter et al. proposed a data hiding method (region based energy modification, RBEM) in which data are embedded by manipulating the average energy or luminance intensities in sub-regions of each frame. This method achieves a high data capacity by embedding one bit into every 8 × 8 block, and error control coding is used to ensure robustness. Here the data sequence U is embedded directly in the cover data. The concept of block classification was introduced in this work. By classifying blocks, the scheme can take advantage of local spatial characteristics and adjust its embedding strategy to improve imperceptibility and robustness.

One of the first transform domain video watermarking methods (TDC) was proposed by Cox et al. in [25]. The authors stressed the importance of embedding the watermark into perceptually significant components to increase robustness against signal processing and lossy compression. The watermark of length n was drawn from a standard normal distribution, rather than from a binary PN sequence, in order to enhance robustness. This method uses a non-blind approach for watermark detection. Detection is performed by transforming the original and the test frame into the DCT domain and correlating the difference vector with the expected watermark pattern.

A perceptual watermarking (PW) method explicitly models the masking properties of the HVS and uses these models to analyse the video sequence or frames so that the watermark is embedded in an optimal way.
The five main properties of the HVS, namely frequency sensitivity, luminance sensitivity, contrast masking, edge masking and temporal masking, can be exploited by video watermarking techniques [29], [30].

In [31] a robust watermarking scheme based on the 3D DFT was proposed. Watermarking algorithms based on a group of frames (GOF) have some important benefits, as they utilize the temporal properties of the video. This consideration helps to maintain temporal imperceptibility.

IV. COMPARATIVE ANALYSIS OF DIFFERENT VIDEO WATERMARKING TECHNIQUES

In this section we present a comparative analysis of the different video watermarking techniques in Table 1. The following terminology is used in the table: R: robustness; Rl: reliability; I: imperceptibility; P: practicality; T: time complexity; S: synchronization recovery. The measure of goodness is denoted by the quantifiers Good (G), Acceptable (A) and Poor (P).

TABLE 1

Technique   R    Rl   I    P    T
SS          A    A    G    G    G
JAWS        A    A    G    G    G
CR          G    G    G    P    P
CDMA        A    A    G    A    A
VLC         P    P    G    G    G
RBEM        A    A    G    A    A
TDC         G    G    G    A    A
PW          A    A    G    A    G
3D DFT      G    G    G    A    A
GOF         G    A    G    A    G

V. WATERMARKING TECHNIQUES APPLICABLE TO H.264/AVC

H.264/AVC is becoming a popular video codec because of its better compression, picture quality and applicability to portable electronic devices. An H.264 video CODEC with a suitable watermarking scheme embedded in it is therefore an attractive consumer electronics product in the current scenario. Having discussed different watermarking techniques and compared their performance, we now give an overview of H.264/AVC and then discuss the applicability of the different watermarking techniques to it. The overview follows the block diagram shown in Fig. 2 [27]. The H.264 video encoder works as follows:
• The input image is captured.
• The prediction cost is computed by exploiting temporal (P) redundancy and spatial (I) redundancy.
• The best prediction mode (temporal (P) or spatial (I)) is selected by defining a minimizing function on the costs.
• The residue is computed for the best prediction mode.
• The residue undergoes an integer transformation on 4x4 sub-blocks, followed by quantization.
• The residue is reconstructed using inverse quantization and inverse transformation.
• The reconstructed image is filtered with a de-blocking filter to remove blocking artefacts.
• The quantized coefficients are reordered and entropy coded.
• The reconstructed and deblocked image is used as the reference for the prediction of the next frame.

[Fig. 2 (block diagram): input video (YUV 4:2:0 format), reference frame F'n-1, best prediction mode and block size selection (inter (P) / intra (I)), transform (T), quantization (Q), reorder, entropy encoder, NAL, inverse quantization (Q-1), inverse transform (T-1), de-blocking filter, reconstructed frame F'n.]

Fig. 2. Block diagram of the H.264/AVC encoder.

Watermarking can be implemented in the motion vectors or in the integer transform [28]. However, H.264 differs from other video CODECs such as MPEG in the following basic respects:
• All transformations are performed on 4x4 blocks instead of 8x8 or 16x16 blocks.
• An integer transformation is used, which differs from the DCT used in other CODECs.

Most of the reliable and robust watermarking techniques are applied in the transform domain, so some modifications to the existing algorithms are required to implement watermarking in an H.264 system. Moreover, the target applications of H.264 include videophone and video conferencing, which require that watermarking be performed in real time. The applicability of the different watermarking techniques to H.264 is summarized in Table 2.
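Before turning to Table 2, the second difference listed above can be made concrete with a small Python sketch of the 4x4 forward core transform commonly given for H.264/AVC (see, for example, [27]). The matrix `CF` is the widely published core matrix; the function names are hypothetical, and the inverse shown here is the plain mathematical inverse, used only for illustration, not the scaled integer inverse specified by the standard. Quantization and the associated scaling are omitted.

```python
import numpy as np

# Forward core transform matrix of the H.264/AVC 4x4 integer transform.
CF = np.array([
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
])


def forward_core_transform(block):
    """W = CF * X * CF^T, computed exactly in integer arithmetic."""
    x = np.asarray(block, dtype=np.int64)
    return CF @ x @ CF.T


def inverse_for_illustration(coeffs):
    """Mathematical inverse X = CF^T D W D CF with D = diag(1/4, 1/10, 1/4, 1/10).

    Assumption: this floating-point inverse only demonstrates that the
    transform is invertible; it is not the inverse used in the standard.
    """
    d = np.diag([1 / 4, 1 / 10, 1 / 4, 1 / 10])
    return CF.T @ d @ np.asarray(coeffs, dtype=np.float64) @ d @ CF
```

The rows of `CF` are orthogonal but have squared norms of 4 and 10 rather than 1, so the coefficients are scaled differently from those of an orthonormal DCT. An embedding strength tuned for 8x8 DCT coefficients therefore cannot be reused unchanged on these 4x4 coefficients, which is one reason the transform domain schemes need modification.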

TABLE 2

Technique   Domain               Basic Tool            Applicability in H.264
SS          Pixel/Transformed    Algebraic/DCT         Y
JAWS        Pixel                Algebraic             Y
CR          Pixel                Algebraic             Y
CDMA        Pixel                Algebraic             N
VLC         During Compression   Algebraic             Y
RBEM        Pixel                Algebraic             Y
TDC         Transformed          DCT                   N
PW          Pixel/Transformed    Algebraic/DFT/DCT     Y
3D DFT      Transformed          DFT                   N

REFERENCES

[1] F. Hartung and M. Kutter, "Multimedia watermarking techniques," Proceedings of the IEEE, vol. 87, no. 7, July 1999.
[2] I. J. Cox and M. L. Miller, "Electronic watermarking: the first 50 years," in Fourth IEEE Workshop on Multimedia Signal Processing, 2001, pp. 225-230.
[3] J. Brassil, S. Low, N. Maxemchuk, and L. O'Gorman, "Electronic marking and identification techniques to discourage document copying," IEEE J. Select. Areas Commun., vol. 13, pp. 1495-1504, Oct. 1995.
[4] S. Low and N. Maxemchuk, "Performance comparison of two text marking methods," IEEE J. Select. Areas Commun. (Special Issue on Copyright and Privacy Protection), vol. 16, pp. 561-572, May 1998.

[5] S. Low, N. Maxemchuk, J. Brassil, and L. O'Gorman, "Document marking and identification using both line and word shifting," in Proc. Infocom '95, Boston, MA, Apr. 1995.
[6] F. M. Boland, J. J. K. Ó Ruanaidh, and W. J. Dowling, "Watermarking digital images for copyright protection," in Proc. Int. Conf. Image Processing and Its Applications, vol. 410, Edinburgh, U.K., July 1995.
[7] M. S. Kankanhalli, Rajmohan, and K. R. Ramakrishnan, "Content-based watermarking of images," in Proc. ACM Multimedia '98, Bristol, U.K., Sept. 1998.
[8] I. Pitas, "A method for signature casting on digital images," in Proc. Int. Conf. Image Processing (ICIP), Lausanne, Switzerland, Sept. 1996.
[9] E. Koch and J. Zhao, "Toward robust and hidden image copyright labeling," in Proc. Workshop Nonlinear Signal and Image Processing, Marmaros, Greece, June 1995.
[10] L. Boney, A. H. Tewfik, and K. H. Hamdy, "Digital watermarks for audio signals," in Proc. EUSIPCO 1996, Trieste, Italy, Sept. 1996.
[11] F. Hartung and B. Girod, "Digital watermarking of raw and compressed video," in Proc. SPIE Digital Compression Technologies and Systems for Video Commun., vol. 2952, Oct. 1996, pp. 205-213.
[12] F. Jordan, M. Kutter, and T. Ebrahimi, "Proposal of a watermarking technique for hiding/retrieving data in compressed and decompressed video," ISO/IEC Doc. JTC1/SC29/WG11 MPEG97/M2281, July 1997.
[13] I. Cox, J. Kilian, T. Leighton, and T. Shamoon, "Secure spread spectrum watermarking for images, audio and video," in Proc. IEEE Int. Conf. Image Processing (ICIP 96), Lausanne, Switzerland, Sept. 1996.
[14] -----, "Digital watermarking of uncompressed and compressed video," Signal Processing (Special Issue on Copyright Protection and Access Control for Multimedia Services), vol. 66, no. 3, pp. 283-301, 1998.
[15] G. C. Langelaar, R. L. Lagendijk, and J. Biemond, "Realtime labeling methods for MPEG compressed video," in Proc. 18th Symp. Information Theory in the Benelux, Veldhoven, The Netherlands, May 1997.
[16] B. Pfitzmann, "Information hiding terminology," in Proc. First Int. Workshop on Information Hiding, Cambridge, U.K., May 30-June 1, 1996, Lecture Notes in Computer Science, vol. 1174, R. Anderson (Ed.), pp. 347-350.
[17] J. S. Pan, H. C. Huang, and L. C. Jain, "Intelligent Watermarking Techniques".
[18] J. A. Bloom, I. J. Cox, T. Kalker, J.-P. M. G. Linnartz, M. L. Miller, and C. B. S. Traw, "Copy protection of DVD video," Proceedings of the IEEE, vol. 87, pp. 1267-1276, 1999.
[19] T. Kalker, G. Depovere, J. Haitsma, and M. Maes, "A video watermarking system for broadcast monitoring," Proceedings of the SPIE, vol. 3657, pp. 103-112, 1999.
[20] Digimarc Company Website: http://www.digimarc.com
[21] K. Su, D. Kundur, and D. Hatzinakos, "A novel approach to collusion-resistant video watermarking," Proceedings of the SPIE, vol. 4675, pp. 491-502.
[22] B. G. Mobasseri, "Exploring CDMA for watermarking of digital video," Proceedings of the SPIE, vol. 3675, pp. 96-102, 1999.
[23] G. C. Langelaar, R. L. Lagendijk, and J. Biemond, "Realtime labeling of MPEG-2 compressed video," Journal of Visual Communication and Image Representation, vol. 9, pp. 256-270, 1998.
[24] V. Darmstaedter, J.-F. Delaigle, D. Nicholson, and B. Macq, "A block based watermarking technique for MPEG-2 signals: Optimization and validation on real digital TV distribution links," in Proc. 3rd European Conference on Multimedia Applications, Services and Techniques, pp. 190-206, 1998.

[25] I. J. Cox, J. Kilian, F. T. Leighton, and T. Shamoon, "Secure spread spectrum watermarking for multimedia," IEEE Transactions on Image Processing, vol. 6, pp. 1673-1687, 1997.
[26] T. Wiegand, G. Sullivan, G. Bjøntegaard, and A. Luthra, "Overview of the H.264/AVC video coding standard," IEEE Trans. Circuits Syst. Video Technol., vol. 13, pp. 560-576, July 2003.
[27] I. E. G. Richardson, H.264 and MPEG-4 Video Compression, ISBN 0-470-84837-5.
[28] G. Qiu, P. Marziliano, A. T. S. Ho, D. He, and Q. Sun, "A hybrid watermarking scheme for H.264/AVC video," in Proc. 17th International Conference on Pattern Recognition (ICPR'04).
[29] R. B. Wolfgang, C. I. Podilchuk, and E. J. Delp, "Perceptual watermarks for digital images and video," Proceedings of the IEEE, vol. 87, pp. 1108-1126, 1999.
[30] M. M. Reid, R. J. Millar, and N. D. Black, "Second-generation image coding: An overview," ACM Computing Surveys, vol. 29, pp. 3-29.
[31] F. Deguillaume, G. Csurka, J. O'Ruanaidh, and T. Pun, "Robust 3D DFT video watermarking," Proceedings of the SPIE, vol. 3657, pp. 113-124.

Sourav Bhattacharya was born in Kolkata, India, in 1982. He received the B.Tech. in Computer Science and Engineering from the Institute of Engineering and Management under West Bengal University of Technology in 2005. He is currently associated with the Research and Development section of the Embedded Systems group of Tata Consultancy Services Limited, Kolkata, India. His research interests include digital image processing, video compression and digital watermarking.

Tanushyam Chattopadhyay was born in Suri, India, in 1976. He received the B.Sc. in Physics from Visva Bharati and completed his MCA at Bengal Engineering College, Shibpur, India, in 1998 and 2002, respectively. He was awarded the University Gold Medal in MCA. He started his career as a researcher at the Indian Statistical Institute, Kolkata, and later joined the Research and Development section of the Embedded Systems group of Tata Consultancy Services Limited as an Assistant System Engineer. He is one of the key programmers involved in the development of an H.264 based video conferencing and video telephony system. His areas of interest include video compression, digital watermarking and encryption, and video segmentation and summarization.

Arpan Pal received the B.Tech. in Electronics and Electrical Communication Engineering and the M.Tech. in Telecommunication Systems Engineering from the Indian Institute of Technology, Kharagpur, India, in 1990 and 1993, respectively. From 1993 to 1997, he was a scientist with Research Center Imarat (RCI), a Defence Research and Development Organization laboratory at Hyderabad, India. From 1997 to 2002, he was with Macmet Interactive Technologies Pvt. Ltd., Kolkata. Since 2002, he has been with Tata Consultancy Services, Kolkata, where he leads the Convergence Solutions Practice. His areas of interest include digital signal processing, Kalman filters, wireless networks, wireless radio transceivers, audio/video compression and wireless/multimedia security. He has filed for some patents in the area of wireless security and wireless baseband communication and is also involved in next generation wireless standardization efforts such as European Union MAGNET.