AUDIO WATERMARKING IN PARTIALLY ...

AUDIO WATERMARKING IN PARTIALLY COMPRESSED-ENCRYPTED DOMAIN

A.V. Subramanyam, Sabu Emmanuel
School of Computer Engineering, Nanyang Technological University
[email protected], [email protected]

ABSTRACT

In Digital Asset Management systems, media is often handled in compressed and encrypted form. Such media sometimes need to be watermarked in the compressed-encrypted domain itself. In this paper, we propose a robust audio watermarking technique for partially compressed-encrypted MP3 audio. Arbitrary embedding of a watermark in partially compressed-encrypted MP3 audio can, however, drastically degrade the quality, because the underlying change may result in random decrypted values. In addition, encryption may result in very low compression efficiency. The challenge is therefore to design a watermarking technique that provides good watermarked audio quality and at the same time good compression efficiency. While the proposed technique embeds the watermark in the partially compressed-encrypted domain, the watermark can be extracted in either the encrypted or the decrypted domain. The experiments show that the watermarked audio quality is good and that the reduction in compression efficiency is low. The proposed watermarking technique is robust to common signal processing attacks.

Index Terms: Compressed-encrypted domain; audio

1. INTRODUCTION

The processing and distribution of digital audio has grown enormously over the past few years. Digital audio is often compressed using popular compression standards such as MP3. The compressed audio may be encrypted for confidentiality before distribution in systems such as Digital Rights Management (DRM) [1]. In DRM, the media may be transmitted from owners to consumers through different levels of distributors. In this scenario, the distributors are entitled only to distribute the compressed-encrypted media to the end user and, as such, cannot access the unencrypted content.
Distributors request the license server in the DRM system to issue to the consumers the associated license, which contains the decryption keys needed to open the encrypted content. However, each distributor sometimes needs to watermark the content for copyright violation detection, proof of distributorship, traitor tracing or other purposes. They thus need to watermark in the compressed-encrypted domain itself. In this paper we focus on robust watermarking of partially compressed-encrypted MP3 audio [2].

Several audio watermarking techniques have been proposed to date. A few of these, such as [3], [4], embed the watermark in the encrypted host. Some encrypted domain techniques for image watermarking have also been proposed [5], [6]. The aforementioned algorithms either assume that the plain-text content is accessible at the time of embedding or are designed such that decryption and embedding are performed simultaneously. In our algorithm, however, the watermark embedder does not have access to the plain-text values. The embedder has only the compressed-encrypted content and does not have the key to decrypt it and obtain the plain compressed values. Thus, the techniques of [3], [4], [5] or [6] cannot be employed for watermarking in the compressed-encrypted domain.

In [3], [4] the authors propose a secure embedding technique for forensic tracking, in which embedding is done by a joint decryption and watermarking method. However, when the watermark embedder has access only to the encrypted content (as in multilevel distribution), designing an encryption system that allows correct decryption of the content while embedding the watermark is a challenging problem. In [5], Deng et al. proposed an efficient buyer-seller watermarking protocol based on composite signal representation. However, when the content is available to the distributors only in encrypted form, the embedding scheme of [5] might not be applicable, as the host and watermark signals are represented in composite signal form using the plain-text features of the host signal. In [6], the authors proposed a client-side embedding technique using a look-up table, which suffers from a constraint similar to that of [3].

Therefore, we propose a robust watermarking technique for MP3 audio in which the watermark can be embedded in a predictable manner in the partially compressed-encrypted domain by exploiting the homomorphic property of the applied cryptosystem. The significance of the proposed technique is that it reduces computational complexity, as it requires only a partial decompression, while preserving the confidentiality of the content, as decryption is not required.
However, the proposed technique faces the following challenges:

1) Compressed domain watermarking: The information content of compressed media is much higher than that of the raw data, so modifying it may lead to a considerable deterioration in the quality of the decoded audio. The position and capacity for watermark embedding therefore have to be carefully investigated in the compressed data, so that the degradation is inaudible while robustness is preserved.

2) Encrypted domain watermarking: In an encrypted piece of content, a slight modification may lead to random decrypted values. The encryption should therefore be such that the distortion due to embedding can be controlled to maintain the audio quality. It should also be possible to detect the watermark correctly even after the content is decrypted. However, encrypting in such a manner may lead to a high amount of cipher text expansion. Therefore, the trade-off between the increase in compressed-encrypted file size and confidentiality has to be investigated carefully.

This paper is organized as follows. In Section 2, we describe the applied cryptosystem and the embedding and detection techniques. Section 3 gives experimental results. In Section 4, we present the conclusions.

2. PROPOSED ALGORITHM

The processing pipeline of the proposed algorithm, which works on compressed-encrypted MP3 audio [2], is given in Figures 1 and 2. For a quick understanding, we first describe the proposed algorithm briefly.

Brief description: In Figure 1, the input audio is first segmented into frames. Each frame comprises two granules, and each granule is made up of 576 samples. The frame samples are passed through the hybrid filterbank and the psychoacoustic model simultaneously, and are subsequently quantized. Quantized coefficients are then selected based on a particular criterion and encrypted, while the other coefficients are left unencrypted. All the encrypted and non-encrypted quantized coefficients are put back in their respective original positions and are then entropy coded. The entropy coded stream is sent to the watermark embedder. In addition, the plain-text difference between the coefficient pairs is given to the watermark embedder through a secret channel, without compromising the security of the proposed algorithm. The coefficient pairs are formed from the quantized coefficients in a non-overlapping manner, where each coefficient is selected randomly. The pair-wise coefficient difference is needed while embedding the watermark. The benefit of using the coefficient-pair difference is that it gives good watermark detection performance by eliminating the interference of the host signal with the watermark.

[Figs. 1 and 2 are block diagrams that are not reproduced here. Block labels recoverable from them: PCM audio input (frames); Hybrid filterbank; Psychoacoustic model; Bit/noise allocation, quantization, coding; If noise >= 0.0 dB (Yes/No); If final iteration done (Yes/No); Coefficient selection; Encryption; Coefficient replacement; Coefficient-pair difference; Huffman coding; Bitstream formatting; Encrypted bitstream; Bitstream unpacking; Encrypted coefficients; U / E / F_U; Watermark embedding; Encrypted watermarked bitstream; Decryption.]
Fig. 1: Compression and encryption of MP3

Fig. 2: (a) Watermark embedding (b) Watermark detection
[Further block labels recoverable from the detection diagram: Frequency sample reconstruction; Frequency to time mapping; Watermarked decoded audio; U / E / F_U; Decrypted domain detection; Encrypted domain detection; Watermark detected (Y/N).]

In the watermark embedder block in Figure 2-a, the entropy coded, partially encrypted quantized coefficients are received and partially decompressed to extract the encrypted quantized coefficients. The encrypted quantized coefficients are then selected in the same way as they were selected for encryption. Embedding is performed in the partially compressed-encrypted domain, as discussed in Section 2.2. The watermarked encrypted quantized coefficients are then put back in place and all the quantized coefficients are entropy coded, after which the stream can be distributed to the users. In the watermark detection block in Figure 2-b, the watermarked coefficients are selected in the same way as for embedding, and detection can be performed in the encrypted or the decrypted domain, as explained in detail in Section 2.3.

2.1. Encryption

In this section we discuss the RC4-based encryption technique [7], which has the desired properties discussed in Section 1. Since only the magnitudes (absolute values) of the quantized coefficients are used for embedding the watermark, the magnitudes of the quantized coefficients are encrypted using an additive homomorphic cryptosystem, as explained below.

Let the first $N$ quantized coefficients from each of the two granules of a frame be used for encryption. So that the encryption of the magnitude part is additive homomorphic, a coefficient is picked (randomly) only if its magnitude is less than $K_M$, where $K_M$ is a constant, and the picked coefficients are arranged sequentially in a separate 1-D array. Let the array representing the magnitudes of the chosen quantized coefficients be $Q = \{q(i)\}\ \forall\ i = 1, 2, \ldots, NG$, where $G$ is the total number of granules to be encrypted and $NG$ is the total number of coefficients encrypted. In the array $Q$, consecutive coefficients form pairs in a non-overlapping manner, and both coefficients of a pair are encrypted using the same key. In order to encrypt $Q$, we choose $K = \{k(j)\}$, where $k(j) \in [0, K_M - 1]\ \forall\ j = 1, 2, \ldots, NG/2$, a randomly generated key-stream using RC4 [8]. The encryption is then performed as

$$\begin{cases} q_e(i) = (q(i) + k(j)) \bmod K_M, & i = 1, 3, \ldots, NG - 1 \\ q_e(i+1) = (q(i+1) + k(j)) \bmod K_M, & j = 1, 2, \ldots, NG/2 \end{cases}$$

The coefficient-pair difference, denoted by $\delta_Q$, is computed from the plain-text magnitudes of the coefficient pairs selected for encryption, as shown in equation (1), and is transmitted to the watermark embedder. For watermarking, only the first $N_w \le N$ coefficients from each granule are used. The coefficient-pair difference is transmitted in order to cancel the effect of the host signal, i.e. the quantized coefficients, at the time of watermark detection.

$$\delta_q(z) = q(i) - q(i+1), \quad i = 1, 3, \ldots, N_w G - 1;\ \forall z \qquad (1)$$

2.2. Embedding Algorithm

The watermark embedder receives the entropy coded encrypted coefficients as well as the coefficient-pair differences.
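The two inputs of the embedder named above, the encrypted coefficients and the plain-text coefficient-pair differences, are produced as described in Section 2.1. The following is a minimal sketch of that step, not the authors' implementation. The function names are hypothetical, and Python's `random.Random` is used only as a stand-in for the RC4 key-stream (it is not RC4 and is not cryptographically secure). $K_M = 256$ is the value reported in Section 3.

```python
import random

KM = 256  # modulus K_M; value taken from Section 3


def keystream(n_pairs, seed):
    """Stand-in for the RC4 key-stream: one key k(j) in [0, KM-1] per pair.

    ASSUMPTION: random.Random replaces RC4 purely for illustration.
    """
    rng = random.Random(seed)
    return [rng.randrange(KM) for _ in range(n_pairs)]


def encrypt_pairs(q, keys):
    """q_e(i) = (q(i) + k(j)) mod KM; both members of pair j share k(j)."""
    if len(q) != 2 * len(keys):
        raise ValueError("need exactly one key per coefficient pair")
    return [(c + keys[i // 2]) % KM for i, c in enumerate(q)]


def decrypt_pairs(qe, keys):
    """Inverse of encrypt_pairs: q(i) = (q_e(i) - k(j)) mod KM."""
    return [(c - keys[i // 2]) % KM for i, c in enumerate(qe)]


def pair_differences(q):
    """Equation (1): delta_q(z) = q(i) - q(i+1) on plain-text magnitudes."""
    return [q[i] - q[i + 1] for i in range(0, len(q), 2)]
```

Because both members of a pair share one key, the difference of the encrypted pair equals the plain-text difference modulo $K_M$, and adding a small integer to an encrypted coefficient adds the same integer to the decrypted coefficient (again modulo $K_M$). This is the additive homomorphic property that the embedding relies on.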
The codestream is then partially decoded to obtain the encrypted quantized coefficients, which are used for embedding the watermark as shown in Figure 2-a. Let the watermark embedder receive the encrypted audio, which is subjected to entropy decoding. The embedder then picks the first $N$ coefficients from each granule, and the coefficient pairs are extracted in the same way as described in Section 2.1; these pairs are then watermarked based on the coefficient-pair difference.

Let $E = \{e_z\}\ \forall\ z = 1, 2, \ldots, N_w G/2$, where $e_z \in \{0, 1\}$, be the embedding key bit stream. Let the watermark information be a bipolar sequence of $+1$ and $-1$. The watermark information can then be denoted as $b = \{b_j\}\ \forall\ j = 1, 2, \ldots, N_w G/2c_r$, where $b_j \in \{0, 1\}$ such that $-1$ is mapped to $0$, and $c_r$ is the spreading factor. The watermark signal $W$ is generated by spreading the watermark information bits $b$ by $c_r$ as

$$w_z = b_j, \quad j c_r \le z < (j+1)c_r - 1, \quad \forall\ j = 1, 2, \ldots, N_w G/2c_r$$

Let $U$ be defined as

$$U = \begin{cases} \{l,\ l+1,\ l+2\} & \text{if } x = 1 \\ \{l+\Delta,\ l+1+\Delta,\ l+2+\Delta\} & \text{if } x = 0 \end{cases} \qquad l = -K_M + n\Delta\ \ \forall\ n$$

where $\Delta$ denotes the bin size, $n$ is an integer and $x$ indicates whether the bin corresponds to watermark bit 1 or 0.

Let $q_{ew}(\cdot)$ denote the watermarked encrypted quantized coefficients and let $d - d'$ be the amount of distortion introduced in a given encrypted quantized coefficient pair to embed a watermark bit. The values of $d$ and $d'$ can be obtained using the plain-text difference $\delta_q(z)$. In order to find $d$ and $d'$, let us first define $V$ as

$$V = e_z \oplus w_z \oplus F_U(\delta_q(z))$$

where $F_U(\cdot)$ is an indicator function that indicates whether the bin of $U$ in which $\delta_q(z)$ falls belongs to 0 or 1: $F_U(\cdot) = 0$ if the argument falls in a bin for bit 0, and $F_U(\cdot) = 1$ if the argument falls in a bin for bit 1. Then consider the equation

$$\delta_q'(z) = \delta_q(z) + d - d' \quad \forall\ z = 1, 2, \ldots, N_w G/2$$

Now, $d$ and $d'$ are selected such that $\delta_q'(z)$ takes the value of the center of the same bin if $V = 0$; if $V = 1$, $d$ and $d'$ are selected such that $\delta_q'(z)$ takes the value of the center of the next or the previous bin, whichever is nearest. As $d$ and $d'$ can have more than one solution, the solution that provides the least distortion is used for embedding. Once $d$ and $d'$ are computed, the embedding is given by

$$\begin{cases} q_{ew}(i) = q_e(i) + d \\ q_{ew}(i+1) = q_e(i+1) + d' \end{cases}$$

In order to facilitate detection in the encrypted domain, we store the watermark $W$ in a different form, $W_e = \{w_{e_z}\}$, given as

$$w_{e_z} = F_U(q_{ew}(i) - q_{ew}(i+1)) \oplus e_z$$

2.3. Extraction Algorithm

Once the watermarked compressed-encrypted data is received, detection can be performed in the encrypted or the decrypted domain, as shown in the watermark detection block of Figure 2-b. The watermarked encrypted or decrypted quantized coefficients are selected in the same manner as described in Section 2.1. The detector uses the encrypted or decrypted quantized coefficients, $U$, $E$ and $F_U(\cdot)$ to extract the watermark, which is then correlated with the correct watermark as well as with randomly generated watermarks. The presence of the correct watermark is decided by comparing the correlation value with a predefined threshold $T$.

2.3.1. Decrypted Domain Extraction

In decrypted domain detection, the watermark is extracted after the encrypted quantized coefficients are decrypted. With $q_w(\cdot)$ denoting the decrypted watermarked coefficients, the watermark signal bit $w_z'$ is extracted as

$$w_z' = F_U(q_w(i) - q_w(i+1)) \oplus e_z \quad \forall\ z = 1, 2, \ldots, N_w G/2$$

If $S_j$ denotes the $j$th watermark information bit, then

$$S_j = \begin{cases} 1 & \text{if } \sum_{z = j c_r}^{(j+1)c_r} w_z' \ge 0 \\ 0 & \text{if } \sum_{z = j c_r}^{(j+1)c_r} w_z' < 0 \end{cases}$$

where the extracted bits are presumably mapped back to the bipolar values $\pm 1$ before summation, since a sum of values in $\{0, 1\}$ can never be negative. In case the audio is available in the decompressed (time) domain, it is first passed through the re-compression process to extract the quantized coefficients, and extraction then follows the procedure given in this section.

2.3.2. Encrypted Domain Extraction

In the encrypted domain, the watermark is extracted from the encrypted quantized coefficients themselves. The difference between the encrypted watermarked coefficients is computed and the watermark signal bit is extracted as

$$w_{e_z}' = F_U(q_{ew}(i) - q_{ew}(i+1)) \oplus e_z \quad \forall\ z = 1, 2, \ldots, N_w G/2$$

If $S_{e_j}$ denotes the $j$th watermark information bit, then

$$S_{e_j} = \begin{cases} 1 & \text{if } \sum_{z = j c_r}^{(j+1)c_r} w_{e_z}' \ge 0 \\ 0 & \text{if } \sum_{z = j c_r}^{(j+1)c_r} w_{e_z}' < 0 \end{cases}$$
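The embedding rule of Section 2.2 and the extraction rules of Sections 2.3.1 and 2.3.2 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation. All function names are hypothetical. The bin labelling in `f_u` (bins of width $\Delta$ starting at $-K_M$ and alternating between bit 1 and bit 0) is one reading of the definition of $U$. Splitting the shift as evenly as possible between $d$ and $d'$ is an assumed interpretation of "least distortion". The majority vote maps bits to $\pm 1$ before summing. $K_M = 256$ and $\Delta = 3$ are the values from Section 3.

```python
KM = 256   # modulus K_M (Section 3)
DELTA = 3  # bin size (Section 3)


def bin_index(delta):
    """Index n of the bin [-KM + n*DELTA, -KM + (n+1)*DELTA) holding delta."""
    return (delta + KM) // DELTA


def f_u(delta):
    """Indicator F_U. ASSUMPTION: even-indexed bins carry bit 1, odd bins bit 0."""
    return 1 if bin_index(delta) % 2 == 0 else 0


def bin_center(n):
    """Center of bin n (l + 1 for DELTA = 3)."""
    return -KM + n * DELTA + DELTA // 2


def distortion(delta, e, w):
    """Return (d, d') so that delta + d - d' lands on the required bin center."""
    v = e ^ w ^ f_u(delta)
    n = bin_index(delta)
    if v == 0:
        target = bin_center(n)                 # stay in the same bin
    else:
        prev_c, next_c = bin_center(n - 1), bin_center(n + 1)
        target = prev_c if delta - prev_c <= next_c - delta else next_c
    shift = target - delta                     # equals d - d'
    d_prime = -(shift // 2)                    # ASSUMPTION: split the shift evenly
    return shift + d_prime, d_prime


def embed(qe, deltas, e, w):
    """q_ew(i) = q_e(i) + d and q_ew(i+1) = q_e(i+1) + d' for every pair z."""
    qew = list(qe)
    for z, delta in enumerate(deltas):
        d, d_prime = distortion(delta, e[z], w[z])
        qew[2 * z] += d
        qew[2 * z + 1] += d_prime
    return qew


def extract_bits(coeffs, e):
    """F_U(c(i) - c(i+1)) XOR e_z; works on decrypted or encrypted coefficients."""
    return [f_u(coeffs[2 * z] - coeffs[2 * z + 1]) ^ e[z] for z in range(len(e))]


def spread(b, cr):
    """Repeat every information bit cr times."""
    return [bit for bit in b for _ in range(cr)]


def despread(bits, cr):
    """Majority vote per group of cr bits, with bits mapped to +/-1 first."""
    return [1 if sum(2 * x - 1 for x in bits[j:j + cr]) >= 0 else 0
            for j in range(0, len(bits), cr)]
```

In the decrypted domain, `extract_bits` applied to the decrypted coefficients returns the spread watermark directly, provided that adding $d$ or $d'$ does not push a coefficient outside $[0, K_M - 1]$. In the encrypted domain the pair difference may be offset by $\pm K_M$, so the output of `extract_bits` on the encrypted coefficients is compared against the stored $W_e$ rather than against $W$.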

3. EXPERIMENTAL RESULTS

Experiments are carried out on 16-bit PCM audio files sampled at 48 kHz. The compression rate used is 112 kbps, with $K_M = 256$, $\Delta = 3$, $T = 0.6$ and $N_w = 50$.

3.1. Watermarked audio quality and robustness

The Objective Difference Grade (ODG) is used to measure the quality of the watermarked and attacked audio. The ODG lies in the range $[-4, 0]$, with 0 denoting no degradation, -1 slightly perceptible but not annoying, -2 slightly annoying, -3 annoying and -4 very annoying audio quality. The watermarked audio quality decreases as $N_w$ increases, because more coefficients are watermarked. For $N_w \le 70$, the ODG is no worse than -1, i.e. the degradation is at most slightly perceptible but not annoying. Thus a maximum of $N_w = 70$ coefficients per granule can be used for watermarking.

The watermarked audio can be attacked in the encrypted, decrypted or decompressed domain. However, the most favorable domain for an attack is the one in which the content is decrypted and decompressed. This is because, if the encrypted content is attacked by adding random noise or by other means, the quality of the decrypted and decompressed content may be highly degraded, as decryption may generate random values. In the absence of any attack, there is no error in the extracted watermark for either encrypted or decrypted domain detection. The results in Table 1 are therefore given for the case in which the watermarked audio is decrypted, decompressed and then attacked. Table 1 gives the normalized correlation between the embedded and extracted watermarks and the ODG of the decoded, attacked, watermarked audio. Table 1 shows that the proposed technique is robust against different attacks.

Table 1: Correlation (corr.) and ODG values of different audio clips under attacks

                              Belike         Arnie          Barebear       Not
                              (Music)        (Male Vocal)   (Music)        (Female Vocal)
Attack                        corr.  ODG     corr.  ODG     corr.  ODG     corr.  ODG
Lowpass (3 kHz)               0.81   -2.2    0.78   -3.0    0.85   -2.2    0.63   -2.2
Highpass (30 Hz)              0.9    -0.8    1      -0.5    0.85   -0.97   0.89   -0.5
AWGN                          0.72   -3.2    0.62   -3.0    0.78   -3.6    0.73   -3.4
Echo (100 ms, 20%)            0.63   -2.9    0.61   -3.5    0.61   -2.5    0.78   -3.8
Median filter (5x5)           0.90   -3.6    0.98   -3.77   0.95   -3.6    0.97   -3.7
Re-sampling (48/44.1/48)      0.92   -0.3    0.95   -0.39   0.9    -0.2    0.88   -0.3
Re-sampling (48/22.05/48)     0.9    -0.45   0.9    -0.44   0.9    -0.24   0.87   -0.35
Re-sampling (48/11.025/48)    0.88   -0.45   0.6    -2.3    0.87   -0.3    0.86   -0.44
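The paper reports a normalized correlation between embedded and extracted watermarks and compares it with the threshold $T = 0.6$, but it does not spell out the correlation formula. The sketch below assumes the common definition for binary watermarks: bits are mapped to $\pm 1$ and the products are averaged. The bit error rate is computed as the fraction of wrongly extracted bits. The function names and the use of `>=` in the threshold test are assumptions for illustration.

```python
T = 0.6  # detection threshold (value from Section 3)


def normalized_correlation(w_ref, w_ext):
    """Mean of the products of two bit sequences after mapping {0,1} -> {-1,+1}.

    ASSUMPTION: this definition is not given in the paper.
    """
    if not w_ref or len(w_ref) != len(w_ext):
        raise ValueError("sequences must be non-empty and of equal length")
    total = sum((2 * a - 1) * (2 * b - 1) for a, b in zip(w_ref, w_ext))
    return total / len(w_ref)


def bit_error_rate(b_ref, b_ext):
    """Ratio of incorrectly extracted bits to the total number of bits."""
    if not b_ref or len(b_ref) != len(b_ext):
        raise ValueError("sequences must be non-empty and of equal length")
    errors = sum(1 for a, b in zip(b_ref, b_ext) if a != b)
    return errors / len(b_ref)


def watermark_present(w_ref, w_ext, threshold=T):
    """Declare the watermark detected when the correlation reaches the threshold."""
    return normalized_correlation(w_ref, w_ext) >= threshold
```

Under this definition the two measures are linked by corr = 1 - 2 BER when both are computed on the same bit sequence, so a correlation of 0.6 corresponds to 20% of the bits being wrong.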

The detection performance of the proposed scheme is compared with that of [9], [4] and [3]. The BER (the ratio of incorrectly extracted watermark information bits to the total number of watermark information bits embedded) for the proposed technique is between 0 and 0.19, while the BER for the scheme proposed in [9] is zero under the attacks given in Table 1. The watermark data rate is 0.0008 and 0.0013 bits/sample for the proposed technique and [9], respectively. Since the proposed algorithm embeds the watermark in the compressed-encrypted domain, its detection performance and payload are lower than those of the scheme in [9], where embedding takes place in the uncompressed domain.

3.2. Compression Efficiency vs Confidentiality

Figures 3-a and 3-b give the average % rise in Huffman coded bits due to encryption and the percentage of coefficients encrypted per frame for different $N$. The % rise in the total number of Huffman coded bits increases with $N$, as more coefficients are encrypted. For $N = 50$, there is a rise of approximately 29% in the coded bits at 112 kbps, and 13.6% of the coefficients are encrypted per frame. In addition, the size of $\delta_Q$ can be reduced to less than 12 bytes/frame for $N_w = 50$ by compressing it. The encrypted-decoded audio file is further correlated with the original audio. A correlation value below 0.35 gives a security level comparable to that obtained by complete encryption of a frame [10]. The correlation value for $N = 100$ varies from 0.33 to 0.35, while for $N = 30$ to $N = 80$ it is between 0.36 and 0.4. Thus, $N = 30$ to $N = 80$ can be chosen for a moderate confidentiality application, whereas $N \ge 80$ can be chosen for high confidentiality encryption.

Fig. 3: (a) % rise in Huffman coded bits vs N; (b) % of coefficients encrypted/frame vs N
[Plots not reproduced. Both panels show curves labeled 128, 112, 96, 54 and 50 kbps, with N on the horizontal axis ranging from 30 to 100.]

4. CONCLUSION

In this paper, we proposed a novel robust watermarking technique for MP3 audio in the partially compressed-encrypted domain, in which the watermark can be extracted in both the encrypted and the decrypted domains. The watermarking scheme preserves the confidentiality of the content, as the watermark is inserted in the encrypted quantized coefficients themselves without requiring any decryption by the watermark embedder. Since embedding is performed after partial decoding, the algorithm avoids the need for a complete decompression, which makes it suitable for practical applications. The experimental results show that the watermarked audio quality is good and that the watermarking algorithm is robust against common signal processing attacks. The comparison of results shows that the performance of the proposed technique is good, and the loss in compression efficiency is not very significant.

5. REFERENCES

[1] T. Thomas, S. Emmanuel, A.V. Subramanyam, and M.S. Kankanhalli, "Joint watermarking scheme for multiparty multilevel DRM architecture," IEEE Transactions on Information Forensics and Security, vol. 4, no. 4, pp. 758-767, 2009.
[2] D. Pan, "A tutorial on MPEG/audio compression," IEEE Multimedia, vol. 2, no. 2, pp. 60-74, 1995.
[3] S. Katzenbeisser, A. Lemma, M.U. Celik, M. van der Veen, and M. Maas, "A buyer-seller watermarking protocol based on secure embedding," IEEE Transactions on Information Forensics and Security, vol. 3, no. 4, pp. 783-786, 2008.
[4] A. Lemma, S. Katzenbeisser, M. Celik, and M. van der Veen, "Secure watermark embedding through partial encryption," Lecture Notes in Computer Science, vol. 4283, pp. 433-445, 2006.
[5] M. Deng, T. Bianchi, A. Piva, and B. Preneel, "An efficient buyer-seller watermarking protocol based on composite signal representation," in Proceedings of the 11th ACM Workshop on Multimedia and Security, ACM, 2009, pp. 9-18.
[6] A. Piva, T. Bianchi, and A. De Rosa, "Secure client-side ST-DM watermark embedding," IEEE Transactions on Information Forensics and Security, vol. 5, no. 1, pp. 13-26, 2010.
[7] C. Castelluccia, E. Mykletun, and G. Tsudik, "Efficient aggregation of encrypted data in wireless sensor networks," in Mobile and Ubiquitous Systems: Networking and Services (MobiQuitous 2005), The Second Annual International Conference on, pp. 109-117, 2005.
[8] B. Schneier, Applied Cryptography, John Wiley and Sons, New York, 1996.
[9] A.N. Lemma, J. Aprea, W. Oomen, and L. van de Kerkhof, "A temporal domain audio watermarking technique," IEEE Transactions on Signal Processing, vol. 51, no. 4, pp. 1088-1097, 2003.
[10] H. Wang, M. Hempel, D. Peng, W. Wang, H. Sharif, and H.H. Chen, "Index-based selective audio encryption for wireless multimedia sensor networks," IEEE Transactions on Multimedia, vol. 12, no. 3, pp. 215-223, 2010.