IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 5, NO. 4, DECEMBER 2010
A Low Complexity Video Watermarking in H.264 Compressed Domain Azadeh Mansouri, Member, IEEE, Ahmad Mahmoudi Aznaveh, Member, IEEE, Farah Torkamani-Azar, Member, IEEE, and Fatih Kurugollu, Senior Member, IEEE
Abstract—In this paper, a new blind and readable H.264 compressed domain watermarking scheme is proposed in which the embedding/extracting is performed using the syntactic elements of the compressed bit stream. As a result, it is not necessary to fully decode a compressed video stream in either the embedding or the extracting process. The method also presents an inexpensive spatiotemporal analysis that selects the appropriate submacroblocks for embedding, increasing watermark robustness while reducing its impact on visual quality. Meanwhile, the proposed method keeps the bit-rate increase within an acceptable limit by selecting appropriate quantized residuals for watermark insertion. Regarding watermarking demands such as imperceptibility, bit-rate control, and appropriate level of security, a priority matrix is defined which can be adjusted based on the application requirements. The resulting flexibility expands the usability of the proposed method.
Index Terms—Bit-rate increase, compressed domain watermarking, discrete cosine transform (DCT) coefficients analysis, H.264, readable/detectable watermark, temporal quality, video watermarking.
I. INTRODUCTION
SINCE distribution of duplicated digital video can be performed easily, appropriate techniques are required to prevent illicit usage of the video content. In video watermarking, the watermark can be added either into the uncompressed (raw data) or compressed video. There are various methods for embedding the watermark in raw video [1]–[3]. Applying them to compressed video sequences, however, requires full decoding and re-encoding for watermark embedding or detection, since video signals are often stored and transmitted in compressed format. In many applications, it is not practical to decode the sequence completely. Consequently, compressed video watermarking has attracted more attention. Different video compression standards have emerged. The goal of each standard is providing more compressed data along
Manuscript received January 09, 2010; revised August 09, 2010; accepted August 22, 2010. Date of publication September 23, 2010; date of current version November 17, 2010. This work was supported in part by Iran Telecommunication Research Center (ITRC). The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Wenjun Zeng. A. Mansouri, A. M. Aznaveh, and F. Torkamani-Azar are with the Electrical and Computer Engineering Faculty, Shahid Beheshti University, G.C., Tehran 1983963113, Iran (e-mail:
[email protected];
[email protected];
[email protected]). F. Kurugollu is with The Institute of Electronics, Communications and Information Technology, Queen’s University, Belfast BT3 9DT, U.K. (e-mail:
[email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TIFS.2010.2076280
with better quality. H.264, as the most efficient and the latest compression standard, is utilized in a wide range of applications. As a result, providing a secure watermarking method which is appropriate for this standard is highly desirable. For watermark embedding in compressed video, there are two approaches [4]:
• joint compression embedding;
• compressed domain watermarking.
In the first category, the processes of watermarking and compression are performed jointly. The watermarked data is used for the next predictions; therefore, the error induced by watermark embedding does not propagate. Nevertheless, decompressing and recompressing the video stream are required, which are computationally expensive. Furthermore, bit-rate increase is one of the challenging issues in this category. In the second scenario, full decoding of the compressed video is not necessary. As a result, this approach can embed a watermark without performing the computationally expensive motion estimation process during recompression. Since the introduced error propagates, preserving quality is the main issue in such methods. Most of the compressed domain video watermarking methods presented so far use only detectable watermarks, i.e., they can verify the existence of a predetermined watermark rather than extracting the embedded payload. In contrast, readable watermarking schemes, in which the payload can be read without knowing it beforehand, are far more flexible, since a priori knowledge of the embedded watermark cannot always be provided from an application point of view. As stated in [5], the term watermark decoding is used for the extraction of a readable watermark, while verifying a detectable watermark is referred to as detection. Needless to say, all readable watermarks can be converted to detectable ones. The computational cost of watermark detection/decoding, in addition to that of the embedding process, is a significant issue since some applications necessitate real-time response.
Therefore, performing watermark detection/decoding without full decoding of the video stream can be instrumental [4]. However, most of the presented techniques cannot perform the detection/decoding phase in compressed domain. To achieve acceptable quality, in H.264 compressed domain, a subset of coefficients should be selected to embed the watermark. However, the detector/decoder may not find the same subset due to changes after re-encoding or imposed attacks. Therefore, desynchronization and consequent failure in watermark detection/decoding may occur. In order to overcome this problem, the robust compressed video watermarking schemes in the H.264 standard suggest two strategies: In the first category,
nonblind embedding is used; consequently, the precise locations of the watermark data can be obtained using the original video [6]. However, this technique restricts the scope of applications. On the other hand, the second group embeds the watermark in specified locations. These locations, however, can be identified by attackers [7]–[9]. Thus, security is the major problem in this category. Therefore, designing a blind, robust, and secure method is highly desirable. In recent years, various H.264 watermarking methods have been proposed. Most of them modify I frames since such frames convey a wide range of information compared to the P and B frames. Moreover, the existence of this type of frame is vital for decoding. In [10], a blind watermark embedding/detection algorithm is presented providing robustness against compression; however, it requires decompressing the video sequence in order to embed and detect the watermark. Another watermarking scheme for H.264/AVC is proposed in [8]. This algorithm is implemented in such a way that the watermark data is only embedded into the last nonzero and nontrailing ac coefficient in CAVLC. Although the induced artifacts are reduced, security issues are not considered. In effect, the specified locations can be identified by attackers. Moreover, the presented scheme is not robust against signal processing attacks. In [9], the authors embed a fragile watermark using motion vector information and a robust watermark by modifying discrete cosine transform (DCT) residual coefficients. In addition to the security problem due to exploiting just the diagonal positions for embedding, the robustness is examined just against re-encoding. Indeed, this method is not robust against common watermarking attacks. The other H.264/AVC watermarking method proposed by Noorkami et al. [11] embeds a readable watermark in the quantized ac coefficients of I frames.
Although this algorithm does not have a security problem, its robustness against common watermarking attacks is not satisfactory. They presented robust embedding/detection watermarking schemes in [6] and [12]. In these methods, the original (uncompressed) video is required for calculating the parameters of the visual model. In addition, the watermark is detected using the decompressed video sequence as well. The algorithm also generates a palette containing the actual locations of the watermark which should be transmitted to the decoder side. Although the presented scheme in [6] is robust, its main drawback is that the watermark embedding cannot be done without performing the computationally expensive prediction process. Moreover, the effect of intracollusion attack is not considered in this method. In [13], their algorithm is extended for embedding the watermark in P frames. In [7], a blind video watermarking scheme in the H.264 standard is proposed. The watermark information is embedded into H.264/AVC video at the encoder by modifying the quantized dc coefficients. Bit-rate increase is one of the major problems of this method. In addition, it may lead to a security problem since just dc coefficients are exploited for embedding. In [14], we proposed a blind watermarking algorithm by analyzing structural information of the blocks which can be robust against re-encoding and some common watermarking attacks;
however, the robustness of the algorithm is restricted due to intraprediction mode changes. To avoid previous drawbacks and provide a more efficient H.264 compressed domain watermarking, we introduce a new blind embedding/decoding watermarking algorithm in the compressed domain. In the proposed method, just the syntactic elements of the video stream are exploited. This information is analyzed to embed the watermark in a more robust fashion while maintaining the high quality of the watermarked video. Hence, the computationally expensive decompressing and recompression are avoided. Meanwhile, the robustness and security requirements are met at the same time.
This paper is organized as follows. In Section II-A, we discuss the robustness problem and perform a spatial analysis in order to enhance the robustness of the watermarking algorithm. In Section II-B, a low complexity method for improving the temporal visual quality is proposed. The priority of residuals for inserting the watermark is explained in Section II-C. Then, the embedding technique is illustrated in Section II-D. Finally, the implementation results are given and discussed in Section III.
II. PROPOSED METHOD
In the proposed method, the watermark is embedded into the luma components of 4 × 4 intrapredicted submacroblocks of I frames, $I_{4\times4}$. This is due to the fact that in the I frames, $I_{16\times16}$ is chosen for the smooth regions while $I_{4\times4}$ is selected for the detailed areas [15]. Since $I_{4\times4}$ indicates more textured macroblocks, embedding in this type of block is less perceptible to human eyes and does not severely sacrifice visual quality. The structural information is employed to provide a collusion-secure algorithm through generating a content-based key. The security of the algorithm is provided using random block selection based on the generated key for each macroblock. If the same key is used for watermark embedding in all frames, the method would be vulnerable to intracollusion attack [16].
Thus, an efficient algorithm uses the video content to get appropriate features in order to select a suitable area for embedding. These features should be robust enough to avoid the possibility of changes after watermark insertion or re-encoding [17]. In other words, the selected features need to be robust to prevent desynchronization during watermark extraction. In order to design a low complexity scheme, we used the H.264 codec information for generating the public key without any further decoding. We categorized the nine intraprediction modes into three groups: dc mode (2), horizontal modes (1, 6, 8), and vertical and diagonal modes (0, 3, 4, 5, 7). Since similar modes may be converted to each other after embedding or re-encoding, categorizing them makes the public key more robust in case of alterations. We assigned two bits to each mode. As a result, a 32-bit content-based key is generated for sixteen 4 × 4 submacroblocks. The robustness of the extracted public key can be perceived easily: to alter it, the attackers have to change the structural information of the image, resulting in a significant degradation in video quality. The public key is then scrambled using a private key to generate the resultant key, which is used to select the specified submacroblocks for embedding.
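As a rough illustration of the key construction described above, the following Python sketch maps each of the sixteen intraprediction modes of a macroblock to its group and packs the 2-bit group codes into a 32-bit value. The mode grouping is taken from the text; the specific 2-bit codes, the names `content_key` and `scramble`, and the use of XOR as the scrambling operation are assumptions made for illustration only, since the text does not specify them.

```python
# Grouping from the text: dc mode (2), horizontal modes (1, 6, 8),
# vertical and diagonal modes (0, 3, 4, 5, 7).
# The 2-bit codes chosen for each group are an illustrative assumption.
GROUP_CODE = {
    2: 0b00,
    1: 0b01, 6: 0b01, 8: 0b01,
    0: 0b10, 3: 0b10, 4: 0b10, 5: 0b10, 7: 0b10,
}


def content_key(intra_modes):
    """Pack the group codes of sixteen 4x4 intra modes into a 32-bit key."""
    if len(intra_modes) != 16:
        raise ValueError("expected sixteen 4x4 intraprediction modes")
    key = 0
    for mode in intra_modes:
        key = (key << 2) | GROUP_CODE[mode]
    return key


def scramble(public_key, private_key):
    """Hypothetical scrambling step: XOR with a 32-bit private key."""
    return (public_key ^ private_key) & 0xFFFFFFFF
```

Because modes in the same group share a code, a conversion between, say, modes 6 and 8 after re-encoding leaves the key unchanged, which is the robustness property the grouping is meant to provide.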
Although the watermark decoder could find the same $I_{4\times4}$ macroblocks to extract the watermark bits, after any simple processing followed by re-encoding, some of the $I_{4\times4}$ macroblocks may change to $I_{16\times16}$, which will cause the decoder to lose synchronization. Furthermore, these processes may also change the intraprediction modes in $I_{4\times4}$ blocks, which consequently leads to different residuals and hence makes the embedded watermark unrecoverable. Thus, the embedding should be performed just in $I_{4\times4}$ macroblocks which are robust to intraprediction mode changes. In this case, the proposed algorithm is restricted to embedding in busy areas in order to avoid the desynchronization problem, since more textured blocks are often encoded and remain as $I_{4\times4}$ even after applying any modifications. We show that the macroblocks with higher spatial activity are more robust against intraprediction mode changes in $I_{4\times4}$ submacroblocks. In Section II-A, the effect of spatial activity on intramode changes is explored.
A. Robustness Enhancement Through Spatial Analysis
In order to preserve the advantages of compressed domain watermarking, texture analysis should be restricted to syntactic elements. Thus, we utilized the quantized coefficients to estimate the spatial activity of each block. The number of nonzero (NNZ) quantized ac residuals in H.264 is considered as a measure indicating the busyness of each block. After re-encoding, the luma prediction modes are changed, which affects the synchronization in the watermark extraction process. To evaluate this effect, we investigated the rate of prediction mode changes after re-encoding. The rate of luma prediction mode changes from $I_{4\times4}$ to $I_{16\times16}$ for blocks with different numbers of nonzero quantized residuals is depicted in Fig. 1(a). The represented results are drawn from 150 frames of five different benchmarks: Mobile, Tempete, Suzie, Salesman, and Flower. As shown, for blocks containing a higher value of NNZ, fewer $I_{4\times4}$ to $I_{16\times16}$ mode changes happen. Hence, embedding in the $I_{4\times4}$ macroblocks with a higher value of NNZ prevents desynchronization of the watermark decoder. Another experiment is conducted in order to show the rate of conversions among the nine intramodes in 4 × 4 blocks with different numbers of nonzero quantized residuals. In doing so, we estimated the probability of intraprediction mode changes given the NNZ value, $P(\text{Mode Change} \mid \mathrm{NNZ})$. The probability of intramode alterations after re-encoding decreases for blocks with a higher number of nonzero quantized residuals, which can be considered as more textured regions. As depicted in Fig. 1(b), when the NNZ value increases, the probability of changes in intramodes decreases. In other words, more textured blocks can better withstand the re-encoding process and, therefore, other manipulations. These coefficients mostly correspond to nonflat areas, which can carry more embedded data with less degradation. As a result, the embedding algorithm is restricted to blocks with high NNZ values. Besides the benefit of further robustness, the spatial quality is considered as well, since a texture masking is implicitly employed. To select the more suitable subsets of blocks, a threshold, namely $\tau$, should be applied based on the
Fig. 1. Rate of changes after re-encoding considering number of nonzero quantized residuals. (a) I4 to I16 conversion; (b) probability of intraprediction mode alternations.
number of nonzero quantized residuals. It is not appropriate to select a constant threshold since the distribution of NNZ coefficients varies from sequence to sequence. In Fig. 2, the distributions of NNZ (within the first I frame) for different sequences with QP = 16 are depicted. It is clearly shown in Fig. 2 that for a more detailed sequence such as Mobile, more submacroblocks contain high values of NNZ; as a result, the distribution of NNZ is skewed toward the area with the highest number of nonzero quantized residuals, while for a smooth sequence such as Mother, the maximum occurrences fall within the area with a lower number of nonzero quantized residuals. Therefore, the appropriate threshold should be selected regarding the spatial characteristics of each sequence.
In this case, we adopted a method that uses statistical information gathered from the number of nonzero quantized coefficients of the previous frame in order to select the appropriate threshold $\tau$. Regarding the watermarking demands such as robustness and capacity, a percentage of 4 × 4 blocks, namely $P$, containing a high value of NNZ is selected for embedding. In this case, by choosing a higher value for $P$, the embedding capacity will increase while the perceptual quality and robustness will decrease. To obtain a target value of $P$, the survival function of the NNZ distribution, $S_{\mathrm{NNZ}}$, is utilized as follows:

$$S_{\mathrm{NNZ}}(\tau) = P \tag{1}$$

The survival function $S_{\mathrm{NNZ}}$, which is the complementary cumulative distribution function, is illustrated by the following statement:

$$S_{\mathrm{NNZ}}(x) = \Pr(\mathrm{NNZ} > x) = 1 - F_{\mathrm{NNZ}}(x) \tag{2}$$

where $F_{\mathrm{NNZ}}$ is the cumulative distribution function of NNZ. In Fig. 3, the $S_{\mathrm{NNZ}}$ of the four different sequences are depicted. Regarding (1) and (2), for the same $P$, different values of $\tau$ will be achieved depending on the spatial activity of the sequences. The achieved value for $\tau$ is utilized in order to establish a robustness threshold $T_R$ to select suitable submacroblocks for embedding. More details about using the robustness threshold will be described in Section II-D.

Fig. 2. Histogram of number of nonzero quantized coefficients in first I frame of different sequences with QP = 16.

Fig. 3. Survival function ($S_{\mathrm{NNZ}}$) for the four different sequences.

Since the quality of the media highly depends on the temporal features in addition to the spatial ones, Section II-B is dedicated to analyzing the effects of the embedded watermark on the perceptual temporal quality.
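The sequence-adaptive threshold selection of Section II-A can be sketched as follows. This is a minimal illustration, assuming the threshold is taken as the smallest NNZ value whose empirical survival function does not exceed the target fraction of blocks; the function names, the discrete search, and the tie handling are assumptions, not details given in the paper.

```python
def survival(nnz_values, x):
    """Empirical survival function: fraction of blocks with NNZ strictly above x."""
    if not nnz_values:
        raise ValueError("need at least one NNZ value")
    return sum(1 for v in nnz_values if v > x) / len(nnz_values)


def nnz_threshold(nnz_values, target_fraction, max_nnz=15):
    """Smallest threshold t whose survival value is at most target_fraction.

    max_nnz=15 reflects the fifteen ac coefficients of a 4x4 block.
    """
    for t in range(max_nnz + 1):
        if survival(nnz_values, t) <= target_fraction:
            return t
    return max_nnz


# NNZ values would be gathered from the previous frame; these are made-up samples.
busy_sequence = [12, 13, 14, 15, 10, 11, 9, 14]
smooth_sequence = [0, 1, 1, 2, 0, 3, 1, 2]
print(nnz_threshold(busy_sequence, 0.25), nnz_threshold(smooth_sequence, 0.25))
```

For the same target fraction, the busier sample yields a much higher threshold than the smooth one, which mirrors the observation that a constant threshold would not suit all sequences.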
B. Temporal Quality Consideration
In a watermarked video stream, two visual artifacts may occur: spatial noise and temporal flicker [18]. Considering error propagation in compressed domain watermarking, it is necessary to exploit the motion information to avoid temporal flicker. On the other hand, the motion analysis should be performed just based on the available information to prevent further decoding and re-encoding. Accordingly, we analyzed the motion information of H.264/AVC to achieve a better perceptual quality. To extract the motion information of a video sequence, the motion activity of each 4 × 4 submacroblock is estimated. In this case, the intermode to which the current 4 × 4 partition belongs in the previous frames is extracted. This information is employed to construct a matrix called the intermode map (IMM). As a case in point, for each frame with the size of 144 × 176, an IMM matrix with the size of 36 × 44 is obtained in which each value shows the intermode type of the specified 4 × 4 submacroblock. The different intermodes range from COPY (SKIP) to the smaller partition modes, to each of which a respective mode value is assigned correspondingly. We used this information in [19] to show the importance of considering the motion activity in the achieved watermarked video quality. Similar to intramodes, the intermodes may also alter after any manipulation followed by re-encoding. To limit the effect of intermode alterations, we categorized them into three groups: copy mode, 16 × 16, and a group encompassing the rest of the intermodes. This process makes the IMM matrix more robust against re-encoding and common watermarking attacks. To estimate the motion activity, the IMM information related to the previous GOP frames is exploited to construct the GOP motion activity map (GMAM)

$$\mathrm{NMA} = \frac{\mathrm{GMAM}}{\operatorname{mean}(\mathrm{GMAM})}, \quad \text{where} \quad \mathrm{GMAM} = \sum_{k} \mathrm{IMM}_k \tag{3}$$
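The temporal analysis can be sketched in a few lines of Python. The sketch assumes that the GMAM is the element-wise sum of the intermode maps of the previous GOP and that blocks whose normalized motion activity exceeds one are eligible for embedding; the numeric mode values (0 for copy, 1 for 16 × 16, 2 for the remaining intermodes) and all function names are illustrative assumptions.

```python
def gmam(imm_maps):
    """Element-wise sum of the intermode maps (IMM) of the previous GOP."""
    rows, cols = len(imm_maps[0]), len(imm_maps[0][0])
    return [[sum(m[r][c] for m in imm_maps) for c in range(cols)]
            for r in range(rows)]


def nma(gmam_map):
    """Normalize the GMAM by the arithmetic mean of its entries."""
    flat = [v for row in gmam_map for v in row]
    mean = sum(flat) / len(flat)
    if mean == 0:
        # No motion anywhere in the GOP: no block is above average.
        return [[0.0 for _ in row] for row in gmam_map]
    return [[v / mean for v in row] for row in gmam_map]


def embeddable(nma_map):
    """Mask of 4x4 blocks whose temporal activity exceeds the average."""
    return [[v > 1.0 for v in row] for row in nma_map]


# Two tiny 2x2 intermode maps standing in for the 36x44 maps of a QCIF frame.
imm_maps = [[[0, 2], [1, 0]],
            [[0, 2], [1, 1]]]
mask = embeddable(nma(gmam(imm_maps)))
```

Only syntax elements (the intermode of each partition) are needed, so the analysis stays within the compressed domain.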
To obtain the normalized motion activity (NMA) matrix, as described in (3), the GMAM information is processed. In this equation, $k$ indicates the index of a frame in the previous GOP and mean(GMAM) is a scalar value which represents the arithmetic mean of the GMAM matrix entries. For each 4 × 4 block, the temporal activity is estimated to be more than the average motion where the NMA is greater than one. As a result, such blocks are chosen for inserting the watermark due to the temporal characteristics of human perception. We avoid embedding the watermark if the NMA value is less than one, since inserting the watermark into nonactive areas induces noticeable visual artifacts. In addition to the selection of the appropriate macroblocks for watermark insertion, it is important to modify coefficients which are less sensitive to human visual perception. Furthermore, the role of each coefficient in the video bit-rate should be taken into account in order to keep the bit-rate increase within an acceptable limit. To fulfill this aim, the priority matrix is introduced, in which the preference of each coefficient in terms of visual perception and bit-rate increase is estimated.
C. Priority Matrix (PM)
The PM is utilized to locate the most appropriate coefficients for modification. To select the suitable coefficients, the quality of the watermarked video, its bit-rate, and also security have all been taken into consideration. A PM is devised for each demand, named $PM_q$, $PM_b$, and $PM_s$, respectively. $PM_q$ is a matrix prioritizing the coefficients which have less effect on visual degradation. $PM_b$ is a matrix ranking the coefficients based on their role in bit-rate variation. Furthermore, $PM_s$ is a randomly generated matrix utilized to enhance the security of the algorithm. Details of constructing these matrices are explained in the following. To design the PM of quality, $PM_q$, the structure type of each 4 × 4 submacroblock is determined first. Considering the probability density function (pdf) of each coefficient given the structure type, the effectiveness of each coefficient can be specified. As a result, the variances of the obtained distributions of the coefficients are utilized to construct the quality priority matrix. In this paper, we use the results of the analysis presented in [20]. The importance of the presence of each coefficient in a submacroblock is related to the block structure; i.e., two coefficients in the same position but with different structures have different impacts on the quality of their corresponding submacroblocks. In this case, the priority of coefficients needs to be identified based on the pattern directions. In [20], the authors show that the scan order for vertical and horizontal prediction modes should be different from the standard zigzag order. Based on the statistical results, they proposed two different scanning tables. In this paper, these two scanning tables (vertical and horizontal), in addition to the standard zigzag order, are considered as $PM_q$ in case of vertical and horizontal submacroblocks. In other words, the applied $PM_q$ is dependent on the structure of the current submacroblock. Moreover, one of the most challenging issues in video watermarking is bit-rate increase. Merely appraising the quality in designing the PM may increase the bit-rate. To prevent bit-rate changes, it is necessary to modify the coefficients in such a way that the zigzag order of the watermarked block does not alter
significantly. Thus, the priority for each coefficient in $PM_b$ is defined based on the zigzag ordering scan. On the other hand, to strengthen the security, a random selection of the coefficients is employed in addition to the block selection policy. Accordingly, a pseudorandom matrix is defined as $PM_s$. This random matrix is generated based on a seed extracted from the unused bits of the generated content-based key. Finally, the PM is calculated via the weighted summation of these three matrices

$$PM = w_q\,PM_q + w_b\,PM_b + w_s\,PM_s, \quad \text{where} \quad w_q + w_b + w_s = 1 \tag{4}$$

in which $PM_q$, $PM_b$, and $PM_s$ illustrate the quality, bit-rate, and security priority matrices. Moreover, $w_q$, $w_b$, and $w_s$ represent the contribution weights of these factors in generating the PM. A higher weight for a given control parameter makes the results reflect that factor more than the other two. As a case in point, better quality will be achieved if the modification is applied using a higher value of $w_q$.
D. Watermark Embedding
The difference between the numbers of nonzero quantized coefficients in the two selected 4 × 4 blocks (namely $B_i$ and $B_j$) indicates the watermark bit. Wherever the watermark bit is opposite to the sign of the result, the modification should be applied. Regarding watermarking demands, two constraints are established for the embedding procedure: a robustness threshold and a quality threshold. We denote the robustness and quality thresholds as $T_R$ and $T_Q$, respectively. In order to prevent quality degradation, the modification is applied on the 4 × 4 blocks satisfying the following condition:

$$\text{if} \quad \left|\mathrm{NNZ}(B_i) - \mathrm{NNZ}(B_j)\right| \le T_Q \tag{5}$$

where NNZ indicates the number of nonzero quantized ac coefficients in a selected 4 × 4 block, and the threshold, $T_Q$, varies depending on the application requirements. On the other hand, to enhance the robustness, $T_R$ is introduced as the robustness threshold. The two candidate submacroblocks will be selected for embedding through the following condition:

$$\text{if} \quad \mathrm{NNZ}(B_i) \ge T_R \ \text{and} \ \mathrm{NNZ}(B_j) \ge T_R, \quad \text{where} \quad T_R = \alpha\,\tau \tag{6}$$

The parameter $\alpha$ has to be selected in such a way that the robustness threshold does not exceed the maximum number of residuals. The parameter $\tau$ is derived based on the spatial activities of the sequence according to (1) and (2). Consequently, the embedding is restricted to more textured areas, which are more robust against re-encoding and other signal processing manipulations. In our experiments, $\alpha$ was set to 1.5. If the watermark bit is equal to one ($W_k = 1$), then the condition $\mathrm{NNZ}(B_i) > \mathrm{NNZ}(B_j)$ should be met; otherwise, the coefficients should be altered until this condition is satisfied. In doing so, the suitable nonzero coefficients in $B_j$ (based on PM)
TABLE I SIMULATION RESULTS FOR “TEMPETE” SEQUENCE,
have to be converted to zero. In this case, the least probable coefficients in PM are candidates for this modification. If the mentioned condition is not satisfied yet, the appropriate zero coefficients in the $B_i$ block should be converted to (±1) randomly. These processes are repeated until the condition is fulfilled. In a reverse manner, if the embedding watermark bit is equal to zero ($W_k = 0$), then the condition $\mathrm{NNZ}(B_i) < \mathrm{NNZ}(B_j)$ should be satisfied through a similar procedure. In order to improve the robustness, the mentioned modifications are applied until the absolute value of the difference between the numbers of nonzero quantized coefficients reaches the quality threshold. Moreover, we also modify the submacroblocks in which $\mathrm{NNZ}(B_i) = \mathrm{NNZ}(B_j)$. In this case, $B_i$ and $B_j$ are altered in such a way that $\left|\mathrm{NNZ}(B_i) - \mathrm{NNZ}(B_j)\right| = \text{offset}$. Thus, more degradation is imposed on the watermarked media in favor of a more robust embedding. In this case, adjusting the quality threshold used in the extraction phase would lead to better results, especially after applying attacks. In this paper, we considered an offset of 2 in the experimental results. The watermark extraction is performed after entropy decoding. The same procedures as in the watermark embedding are used to extract the watermark. After determining the location of the watermarked submacroblocks, the embedded watermark is extracted as follows:

$$W_k = \begin{cases} 1, & \mathrm{NNZ}(B_i) > \mathrm{NNZ}(B_j) \\ 0, & \text{otherwise} \end{cases} \tag{7}$$

where $W_k$ is the $k$th watermark bit, and $B_i$ and $B_j$ are the corresponding watermarked submacroblocks.
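The embedding and extraction rules of Section II-D, together with the weighted priority matrix of Section II-C, can be sketched as follows. This is a simplified illustration under several assumptions: a block is a list of sixteen quantized coefficients in scan order with the dc term at index 0, a higher PM score means a coefficient is modified first, and `margin` stands in for the required NNZ difference (between 1 and 15). The function names and the exact alternation between zeroing and inserting coefficients are hypothetical choices, not the authors' implementation.

```python
import random


def nnz(block):
    """Number of nonzero quantized ac coefficients (index 0 is the dc term)."""
    return sum(1 for c in block[1:] if c != 0)


def priority_matrix(pm_q, pm_b, pm_s, w_q, w_b, w_s):
    """Weighted sum of the quality, bit-rate, and security priorities."""
    if abs(w_q + w_b + w_s - 1.0) > 1e-9:
        raise ValueError("weights must sum to one")
    return [w_q * q + w_b * b + w_s * s for q, b, s in zip(pm_q, pm_b, pm_s)]


def embed_bit(b_i, b_j, bit, pm, margin, rng):
    """Return modified copies of (b_i, b_j) whose NNZ difference carries `bit`."""
    # `hi` must end up with at least `margin` more nonzero ac coefficients.
    hi, lo = (list(b_i), list(b_j)) if bit == 1 else (list(b_j), list(b_i))
    order = sorted(range(1, 16), key=lambda k: pm[k], reverse=True)
    while nnz(hi) - nnz(lo) < margin:
        # First remove a nonzero coefficient from the block that must be smaller.
        drop = next((k for k in order if lo[k] != 0), None)
        if drop is not None:
            lo[drop] = 0
        if nnz(hi) - nnz(lo) >= margin:
            break
        # If that is not enough, turn a zero coefficient of the other block into +/-1.
        add = next((k for k in order if hi[k] == 0), None)
        if add is not None:
            hi[add] = rng.choice((-1, 1))
        if drop is None and add is None:
            break  # guard: nothing left to modify
    return (hi, lo) if bit == 1 else (lo, hi)


def extract_bit(b_i, b_j):
    """Read the bit back from the sign of the NNZ difference."""
    return 1 if nnz(b_i) > nnz(b_j) else 0


rng = random.Random(0)
pm = priority_matrix(list(range(16)), list(range(16)), [0] * 16, 0.5, 0.5, 0.0)
block_i = [7, 3, 0, 1] + [0] * 12
block_j = [4, 2, 1, 1, 1] + [0] * 11
marked_i, marked_j = embed_bit(block_i, block_j, 1, pm, 2, rng)
```

Extraction needs only the coefficient counts available after entropy decoding, which is why no full decoding is required on the reading side.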
III. EXPERIMENTAL RESULTS
The proposed method was implemented using the H.264 reference software JM15.0 [21]. To evaluate the watermarking method fairly, we used different standard video sequences in QCIF format (176 × 144) (horizontal × vertical) at the rate of 30 frames/s. Since PSNR and other image quality metrics do not take the temporal activity into account, to compare the perceptual
QP = 28
quality, the visual quality metric (VQM) [22] is employed too. This metric is between zero and one; zero means not having any distortions while one shows maximum impairment. The original compressed and watermarked sequences are used as the original and the processed clips. Since the bit-rate increase is dependent on the embedded capacity, in order to perform a fair comparison, we define the bit-rate increase ratio (BIR) as the percentage of bit-rate increase per embedded bit

$$\mathrm{BIR} = \frac{\mathrm{BR}_{\text{Watermarked}} - \mathrm{BR}_{\text{Original}}}{\mathrm{BR}_{\text{Original}} \times \text{Capacity}} \times 100 \tag{8}$$

where $\mathrm{BR}_{\text{Watermarked}}$ and $\mathrm{BR}_{\text{Original}}$ are the numbers of bits utilized for coding the watermarked and original sequences, respectively, and Capacity is the number of embedded bits. To evaluate the proposed method, first the results of selecting different parameters are shown in Table I. In this case, we applied the algorithm to 95 successive frames of the “Tempete” sequence with an intraperiod of ten in the main profile with CAVLC entropy coding. Table I demonstrates the results in terms of bit-rate increase, perceptual quality using the VQM metric, and the achieved capacity for different threshold values and different weight factors. As shown in Table I, the desired capacity can be achieved by selecting the appropriate values for the threshold parameter $T_Q$ and $P$. Modifying these two parameters will affect the capacity of the embedded watermark. In this case, considering a higher value for $T_Q$ has almost a reverse effect on the quality of the watermarked video. On the other hand, a greater $P$ leads to achieving a higher capacity. It goes without saying that increasing the capacity affects the perceptual quality. Therefore, these two factors jointly regulate the embedding capacity versus the imposed distortion. The effects of these two parameters on the performance of the proposed method are depicted in Table I. In order to scrutinize the effect of the control parameters, the quality threshold and $P$ are kept fixed while the control parameters are varied. Selecting a higher value for $w_q$ leads to a better quality as it
TABLE III ROBUSTNESS AGAINST THREE KINDS OF ATTACKS IN BLIND EXTRACTION (SALT AND PEPPER NOISE DENSITY = 0.001, GAUSSIAN FILTER [5 × 5] SIGMA = 0.3, AND CIRCULAR AVERAGING FILTER r = 0.5) IN THE PROPOSED METHOD AND THE SCHEME IN [14]
Fig. 4. Re-encoding recovery rate.
TABLE II COMPARISON OF THE BIR (BIR 10 ) OF THE PROPOSED METHOD WITH METHOD [11] AND [7]
2
is expected. By increasing the participation of the bit-rate factor, the bit-rate increase can be managed better. It is obvious that if only the security factor is considered, the bit-rate increases substantially. It is clearly shown that after applying the different weights, the corresponding results and the expected ones are consistent with each other. It is worth noting that the summation of the weight factors should be equal to one. In fact, their values demonstrate the contribution percentage in constructing the PM. Adjusting the weight factors according to the application demands provides suitable results in terms of security, quality, and bit-rate control. In Fig. 4, the robustness against re-encoding in the proposed method and the two other similar readable methods is depicted. The results of the proposed method are the average values of the recovery percentage. It is clearly shown that the proposed method outperforms the other methods in most cases. Table II shows the comparison of the BIRs of different video sequences in the proposed algorithm with the methods presented in [11] and [7]. Our results show that the proposed algorithm can prevent the bit-rate increase for most of the video sequences. We evaluated the algorithm against some common signal processing attacks including “salt and pepper noise,” “Gaussian filter,” and “circular averaging filter.” The watermark is extracted after applying attacks followed by re-encoding. For evaluation,
75 frames of different sequences with an intra period of ten and  were employed. These results, along with the robustness evaluation of [14], are given in Table III.

In comparison to [14], the robustness threshold  is introduced in the proposed method. By applying this threshold during the watermarking process, a considerable improvement is achieved, especially in the results after applying attacks. In addition, we enhanced the spatiotemporal analysis, leading to better results in terms of visual quality. By modifying the defined GMAM matrix, the synchronization problem is mitigated in comparison to the scheme presented in [14]. As can be seen, the robustness improvement of the proposed method over [14] is noticeable. For decoding, the watermarked locations should be determined during the extraction process. In this case, even if one watermarked location is missed or one nonwatermarked position is selected incorrectly, the synchronization will be lost and the watermark bits cannot be extracted properly.

In another scenario, a location-aware alternative of the proposed method was explored. As in [6], the watermark information was extracted from specified locations that were provided to the decoder. This information was saved as a “palette” during the embedding process. In this situation, the synchronization of the extracted watermark is never lost, and only the robustness of the watermark against manipulations was evaluated. Table IV compares the robustness results of the proposed and the two other methods against “salt and pepper noise,” “Gaussian filter,” “circular averaging filter,” “Gaussian noise,” and “changing QP” from 24 to 26 and 22 when the “palette” is provided to the decoder. The spatiotemporal analysis, performed in the proposed method for selecting the appropriate blocks for embedding, enhances the robustness considerably.

IV. CONCLUSION

In this paper, a new low complexity video watermarking scheme in the H.264 compressed domain has been proposed.
The presented technique avoids full decoding and re-encoding in both the embedding and extracting phases. To enhance the video quality and robustness of the proposed method, a spatiotemporal analysis was adopted in which just the syntactic elements of the compressed video are utilized to preserve the advantage of compressed domain watermarking. We showed that the number of nonzero quantized residuals in frames can be used for achieving a more robust watermarking algorithm. Moreover, to improve the perceived temporal quality, a low computational cost analysis was developed. The evaluation of the watermarked sequence based on the VQM metric validates our claim. Furthermore, to select an appropriate subset of coefficients for watermark embedding, a new mechanism is proposed by which it is possible to adapt the embedding process based on the application demands.

TABLE IV
ROBUSTNESS AGAINST SIX KINDS OF ATTACKS USING PALETTE FOR EXTRACTION (SALT AND PEPPER NOISE DENSITY = 0.01, GAUSSIAN FILTER [5 × 5] SIGMA = 0.6, CIRCULAR AVERAGING FILTER r = 0.4, GAUSSIAN NOISE (0, 0.0001), AND CHANGING QP FROM 24 TO 26 AND 22)
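The synchronization issue noted for blind extraction, and the reason a “palette” sidesteps it, can be illustrated with a toy sketch. The Python below is not the proposed scheme: the carrier is a plain dictionary, and `embed`, `extract`, `bit_errors`, and the location indices are hypothetical names introduced only for illustration.

```python
def embed(bits, locations):
    """Toy embedding: store one watermark bit at each selected location."""
    return {loc: bit for loc, bit in zip(locations, bits)}


def extract(carrier, locations):
    """Read bits back, in order, from the locations the extractor believes were used."""
    return [carrier[loc] for loc in locations if loc in carrier]


def bit_errors(original, extracted):
    """Count positional mismatches; bits missing at the end also count as errors."""
    mismatches = sum(o != e for o, e in zip(original, extracted))
    return mismatches + abs(len(original) - len(extracted))


bits = [0, 1] * 8                      # 16 alternating watermark bits (assumed payload)
locations = list(range(10, 170, 10))   # 16 hypothetical block indices
carrier = embed(bits, locations)

# Palette case: the exact embedding locations are handed to the extractor.
palette_bits = extract(carrier, locations)

# Blind case: the extractor's own analysis misses the fourth location,
# so every later bit is read one position too early.
blind_locations = locations[:3] + locations[4:]
blind_bits = extract(carrier, blind_locations)

palette_errors = bit_errors(bits, palette_bits)
blind_errors = bit_errors(bits, blind_bits)
print(palette_errors, blind_errors)
```

With this alternating pattern, which is a worst case for a one-position shift, the palette run yields no errors, while the blind run mismatches on every bit after the missed location. For a random watermark, roughly half of the bits after the miss would be expected to be wrong.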
REFERENCES

[1] P. Campisi and A. Neri, “Video watermarking in the 3D-DWT domain using perceptual masking,” in IEEE Int. Conf. Image Processing, 2005 (ICIP 2005), pp. 997–1000.
[2] A. Koz and A. A. Alatan, “Oblivious spatio-temporal watermarking of digital video by exploiting the human visual system,” IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 3, pp. 326–337, Mar. 2008.
[3] F. Hartung and B. Girod, “Watermarking of uncompressed and compressed video,” Signal Process., vol. 66, no. 3, pp. 283–301, May 1998.
[4] E. T. Lin, Video and image watermark synchronization, Purdue University, 2005.
[5] B. Mauro and B. Franco, Watermarking Systems Engineering (Signal Processing and Communications, 21). Boca Raton, FL: CRC Press, Inc., 2004.
[6] M. Noorkami and R. M. Mersereau, “A framework for robust watermarking of H.264-encoded video with controllable detection performance,” IEEE Trans. Inf. Forensics Security, vol. 2, no. 1, pp. 14–23, Mar. 2007.
[7] D. Xu, R. Wang, and J. Wang, “Blind digital watermarking of low bitrate advanced H.264/AVC compressed video,” Digital Watermarking, pp. 96–109, 2009.
[8] T. Lihua, Z. Nanning, X. Jianru, and X. Tao, “A CAVLC-based blind watermarking method for H.264/AVC compressed video,” in IEEE Asia-Pacific Services Computing Conf., 2008 (APSCC ’08), pp. 1295–1299.
[9] Q. Gang, P. Marziliano, A. T. S. Ho, H. Dajun, and S. Qibin, “A hybrid watermarking scheme for H.264/AVC video,” in Proc. 17th Int. Conf. Pattern Recognition, 2004 (ICPR 2004), vol. 4, pp. 865–868.
[10] G.-Z. Wu, Y.-J. Wang, and W.-H. Hsu, “Robust watermark embedding/detection algorithm for H.264 video,” J. Electron. Imag., vol. 14, no. 1, p. 013013-9, 2005.
[11] M. Noorkami and R. M. Mersereau, “Compressed-domain video watermarking for H.264,” in Proc. IEEE Int. Conf. Image Processing, 2005 (ICIP 2005), pp. 890–893.
[12] M. Noorkami and R. M. Mersereau, “Towards robust compressed-domain video watermarking for H.264,” in Proc. Security, Steganography, and Watermarking of Multimedia Contents VIII, San Jose, CA, 2006, p. 60721A-9.
[13] M. Noorkami and R. M. Mersereau, “Digital video watermarking in P-frames with controlled video bit-rate increase,” IEEE Trans. Inf. Forensics Security, vol. 3, no. 3, pp. 441–455, Sep. 2008.
[14] A. Mansouri, A. M. Aznaveh, and F. Torkamani Azar, “Blind H.264 compressed video watermarking with pattern consideration,” in Proc. ICASSP, Dallas, TX, 2010, pp. 1754–1757.
[15] I. E. Richardson, H.264 and MPEG-4 Video Compression: Video Coding for Next-Generation Multimedia. Hoboken, NJ: Wiley, 2003.
[16] B. Furht and O. Marques, Handbook of Video Databases: Design and Applications. Boca Raton, FL: CRC Press, 2003.
[17] M. Holliman, N. Memon, and M. Yeung, “On the need for image dependent keys in watermarking,” in Proc. Second Workshop on Multimedia, Newark, NJ, 1999.
[18] S. Winkler, E. D. Gelasca, and T. Ebrahimi, “Toward perceptual metrics for video watermark evaluation,” in Proc. Applications of Digital Image Processing XXVI, San Diego, CA, 2003, pp. 371–378.
[19] A. Mansouri, F. Torkamani-Azar, and A. M. Aznaveh, “Motion consideration in H.264/AVC compressed video watermarking,” in Proc. 10th Pacific Rim Conf. Multimedia: Advances in Multimedia Information Processing, Bangkok, Thailand, 2009.
[20] F. Xiaopeng, L. Yan, and G. Wen, “A novel coefficient scanning scheme for directional spatial prediction-based image compression,” in Proc. 2003 Int. Conf. Multimedia and Expo, 2003 (ICME ’03), pp. II-557–560.
[21] H.264 Reference Software Group [Online]. Available: http://www.iphome.hhi.de/suehring/tml/download
[22] American National Standard for Telecommunications-Digital Transport of One-Way Video Signals-Parameters for Objective Performance Assessment, Standard T1.801.03-003, Jul. 2003, vol. ANSI.
Azadeh Mansouri (S’10–M’10) received the B.S. degree in computer engineering from Shiraz University, Shiraz, Iran, in 2002, and the M.S. degree from the Science and Research Branch, Azad University of Tehran, in 2004. She is currently working toward the Ph.D. degree in the Department of Electrical and Computer Engineering, Shahid Beheshti University, Tehran, Iran. Her research interests include image and video processing with an emphasis on compressed domain video watermarking techniques as well as quality assessment methods.
Ahmad Mahmoudi Aznaveh (S’10–M’10) received the B.S. degree in computer engineering from Isfahan University of Technology, Isfahan, Iran, in 2003, and the M.S. degree from the Department of Electrical and Computer Engineering, Shahid Beheshti University, Tehran, Iran, in 2005, where he is currently working toward the Ph.D. degree. His areas of research interest include image and video processing, especially digital watermarking and visual quality assessment.
Farah Torkamani-Azar (M’10) received the B.S. degree from the Amirkabir University of Technology, Iran, in 1986, the M.S. degree from Isfahan University of Technology, Iran, in 1991, and the Ph.D. degree from New South Wales University, Australia, in 1995, all in electrical engineering. From 1995 to 2000, she was with the academic staff at Isfahan University of Technology. From 2000 to the present, she has been with Shahid Beheshti University, Tehran, Iran, as a member of the Communication Department where she is currently an Associate Professor, and a member of the Cognitive Telecommunication Research Group. Her research interests include several aspects of image processing and neural networks. Dr. Torkamani-Azar is a member of the Signal Processing Society.
Fatih Kurugollu (M’02–SM’08) received the B.Sc., M.Sc., and Ph.D. degrees in computer engineering from Istanbul Technical University, Istanbul, Turkey, in 1989, 1994, and 2000, respectively. From 1991 to 2000, he was a Research Fellow in Marmara Research Center, Kocaeli, Turkey. In 2000, he joined the School of Computer Science, Queen’s University, Belfast, U.K., as a Postdoctoral Research Assistant. He was appointed Lecturer in the same department in 2003. His research interests include multimedia security, soft computing for image and video segmentation, visual surveillance, and hardware architectures for image and video applications. He is an affiliate member of the IEEE Information Forensics and Security technical committee.