Entropy-Based Audio Watermarking Using Singular Value Decomposition and Log-Polar Transformation
Pranab Kumar Dhar
Tetsuya Shimamura
Graduate School of Science and Engineering, Saitama University, 225 Shimo-okubo, Sakura-ku, Saitama, 338-8570, Japan
E-mail: [email protected], [email protected]
Abstract— This paper proposes an entropy-based audio watermarking scheme using singular value decomposition (SVD) and log-polar transformation (LPT) for copyright protection of audio data. In the proposed scheme, the original audio is first segmented into non-overlapping frames, and the discrete cosine transform (DCT) is applied to each frame. In each frame, the watermark is embedded by quantization into the LPT components of the largest singular value of the DCT sub band with the highest entropy. Simulation results demonstrate that the hidden watermark is robust against various attacks. A comparative analysis shows that the proposed method outperforms recently reported state-of-the-art watermarking methods.
I. INTRODUCTION
Digital audio watermarking is the process of embedding a watermark into audio data to establish authenticity and ownership. This technique serves a number of purposes, such as copyright protection, data authentication, data indexing, and broadcast monitoring. A watermarking scheme must achieve a good trade-off among the conflicting requirements of imperceptibility, robustness, and data payload. A comprehensive survey on audio watermarking can be found in [1]. Watermarking techniques of varying complexity have been proposed in [2]-[4]. In [2], a watermarking scheme that is robust to different attacks is proposed, but its data payload is limited. To improve the data payload, watermarking schemes operating in the wavelet domain have been proposed in [3]-[4]. Moreover, other techniques such as spread spectrum [5], singular value decomposition (SVD) [6]-[8], empirical mode decomposition (EMD) [9], and audio histogram techniques [10]-[11] are becoming increasingly popular in the audio watermarking field. The major limitation of existing audio watermarking techniques is the difficulty of obtaining a good trade-off among imperceptibility, robustness, and data payload. To overcome this limitation, we propose in this paper an entropy-based audio watermarking scheme in the discrete cosine transform (DCT) domain using SVD, log-polar transformation (LPT), and quantization. Simulation results show that the proposed scheme is highly robust against various attacks such as noise addition, cropping, resampling, requantization, and MP3 compression. Moreover, it outperforms the state-of-the-art methods [3]-[7], [9]-[11] in terms of imperceptibility, robustness, and data payload. This is because the watermark is embedded, by quantization, into the LPT components of the largest singular value of the DCT sub band with the highest entropy in each frame. The data payload of the proposed scheme is 172.3906 bps, which is higher than that of the state-of-the-art methods.

This work was supported by the Ministry of Education, Culture, Sports, Science, and Technology (MEXT), Japan.
978-1-4799-0066-4/13/$31.00 ©2013 IEEE

II. PROPOSED WATERMARKING SCHEME
Let A = {a(n), 1 ≤ n ≤ L} be an original audio signal with L samples, and let W = {w(i, j), 1 ≤ i ≤ M, 1 ≤ j ≤ M} be a binary logo image to be embedded into the original audio signal.
A. Watermark Preprocessing
In this paper, we use a tent map, which exhibits chaotic behavior, to encrypt the binary watermark image and thereby enhance the confidentiality of the proposed method. The tent map is defined as follows:
$$ y(i+1) = \begin{cases} \dfrac{1}{\beta}\, y(i), & 0 \le y(i) < \beta \\[4pt] \dfrac{1}{\beta - 1}\, y(i) + \dfrac{1}{1 - \beta}, & \beta \le y(i) \le 1 \end{cases} \tag{1} $$
where y(1) ∈ (0, 1) is the initial condition of the map and β ∈ (0, 1) is a real control parameter. Then z(i) is calculated using the following equation:
$$ z(i) = \begin{cases} 1, & \text{if } y(i) > T \\ 0, & \text{otherwise} \end{cases} \tag{2} $$
where T is a predefined threshold. The binary watermark image W is converted into a one-dimensional vector q = {q(i), i = 1, 2, 3, ..., M×M}. Finally, q(i) is encrypted using z(i) by the following rule:
$$ u(i) = z(i) \oplus q(i), \qquad 1 \le i \le M \times M \tag{3} $$
where ⊕ denotes the exclusive-OR (XOR) operation. Here, y(1), β, and the encrypted watermark sequence u(i) are used as secret key K1.
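The preprocessing of (1)-(3) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes row-major flattening of W into q, uses the parameter values y(1) = 0.6, β = 0.3, and T = 0.5 reported in Section III as defaults, and writes the second branch of (1) in the equivalent form (1 - y(i))/(1 - β). The function names are hypothetical.

```python
import numpy as np

def tent_map(y1: float, beta: float, length: int) -> np.ndarray:
    """Iterate the tent map of Eq. (1) from the initial condition y(1) = y1."""
    y = np.empty(length)
    y[0] = y1
    for i in range(length - 1):
        if y[i] < beta:
            y[i + 1] = y[i] / beta                  # first branch: y(i) / beta
        else:
            # second branch: y(i)/(beta - 1) + 1/(1 - beta) == (1 - y(i)) / (1 - beta)
            y[i + 1] = (1.0 - y[i]) / (1.0 - beta)
    return y

def chaotic_bits(y1: float, beta: float, threshold: float, length: int) -> np.ndarray:
    """Binarize the chaotic sequence with threshold T, Eq. (2)."""
    return (tent_map(y1, beta, length) > threshold).astype(np.uint8)

def encrypt_watermark(W, y1=0.6, beta=0.3, threshold=0.5) -> np.ndarray:
    """Flatten the M x M logo (row-major, an assumption) and XOR it with z(i), Eq. (3)."""
    q = np.asarray(W, dtype=np.uint8).ravel()
    return chaotic_bits(y1, beta, threshold, q.size) ^ q

def decrypt_watermark(u, M: int, y1=0.6, beta=0.3, threshold=0.5) -> np.ndarray:
    """XOR is its own inverse, so the same z(i) recovers q; reshape to M x M."""
    u = np.asarray(u, dtype=np.uint8)
    return (chaotic_bits(y1, beta, threshold, u.size) ^ u).reshape(M, M)
```

For a 32×32 logo, encrypt_watermark returns the 1024-bit sequence u(i), and decrypt_watermark(u, 32) recovers the logo exactly, because applying the same chaotic bit sequence twice with XOR cancels out.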
Figure 1. Watermark embedding process. [Block diagram: original audio signal → generate frame → apply DCT → generate sub bands and calculate entropy → reshape Bj(highest) into a square matrix and perform SVD → calculate Di and decompose it into Dix and Diy using LPT → insert watermark (watermark image → chaotic encryption with secret key K1 → watermark formation) → calculate Di' and Si'(1,1) and perform inverse SVD → apply inverse DCT → reconstruct frame → watermarked audio signal.]
B. Watermark Embedding Process
The proposed watermark embedding process is shown in Fig. 1. It is implemented in the following steps:
1) The original audio signal A is first segmented into M×M non-overlapping frames F = {F1, F2, F3, ..., FM×M}, and each frame Fi is transformed into the DCT domain to obtain the DCT coefficients Ci, where i is the frame index.
2) The DCT coefficients of each frame Fi are divided into m sub bands B = {B1, B2, B3, ..., Bm} with r coefficients in each sub band Bj, where j is the sub band index.
3) The entropy Ej of each sub band Bj of each frame Fi is calculated using the following equation:
$$ E_j = -\sum_{k=1}^{r} p_k \log_2 p_k \tag{4} $$
where pk is the histogram count of the DCT coefficients in the sub band and k is the index of the histogram count.
4) The sub band with the highest entropy value (denoted by Bj(highest)) of each frame Fi is selected. The DCT coefficients of Bj(highest) are rearranged into an N×N square matrix Ri by dividing the coefficient set into N segments of N coefficients each (hence N² = r).
5) SVD is performed to decompose each matrix Ri into three matrices Ui, Si, and Vi:
$$ R_i = U_i S_i V_i^{T} \tag{5} $$
6) The mean Yi and the variance Vi (not to be confused with the matrix Vi in (5)) of the diagonal elements (σ1, σ2, ..., σp) of each matrix Si, and their average Gi, are calculated using the following equations:
$$ Y_i = \frac{\sigma_1 + \sigma_2 + \cdots + \sigma_p}{p} \tag{6} $$
$$ V_i = \frac{(\sigma_1 - Y_i)^2 + (\sigma_2 - Y_i)^2 + \cdots + (\sigma_p - Y_i)^2}{p} \tag{7} $$
$$ G_i = \frac{Y_i + V_i}{2} \tag{8} $$
7) Let Di = Si(1,1)/Q, where Q is a predefined quantization coefficient. Di of each matrix Si is further decomposed into two components Dix and Diy using the following polar-to-log transformation:
$$ D_{ix} = e^{D_i}\cos\theta, \qquad D_{iy} = e^{D_i}\sin\theta \tag{9} $$
where θ is the angle of decomposition.
8) To guarantee robustness and transparency, the proposed method embeds one watermark bit into Dix and Diy of each matrix Si by quantization. This places the watermark in the most perceptually significant components of the audio signal. If the bit to be embedded is '1', the following embedding equation is used:
$$ D'_{ix} = D_{ix} + \frac{Q G_i}{B}, \qquad D'_{iy} = D_{iy} + \frac{Q G_i}{B} \tag{10} $$
where B is a user-defined constant. If the bit to be embedded is '0', the following embedding equation is used:
$$ D'_{ix} = D_{ix} - \frac{Q G_i}{B}, \qquad D'_{iy} = D_{iy} - \frac{Q G_i}{B} \tag{11} $$
9) The modified Di' of each matrix Si' is obtained using the following log-to-polar transformation, where log is the natural logarithm (the inverse of the exponential in (9)):
$$ D'_i = \log\sqrt{D'^{2}_{ix} + D'^{2}_{iy}}, \qquad \theta' = \tan^{-1}\!\left(\frac{D'_{iy}}{D'_{ix}}\right) \tag{12} $$
10) The modified largest singular value Si'(1,1) is calculated using the following equation:
$$ S'_i(1,1) = Q D'_i \tag{13} $$
The largest singular values Si(1,1) of the original audio signal are preserved for use in the detection process and serve as secret key K2.
11) Each modified largest singular value Si'(1,1) is reinserted into the matrix Si, yielding Si', and inverse SVD is applied to obtain the modified matrix Ri':
$$ R'_i = U_i S'_i V_i^{T} \tag{14} $$
Each matrix Ri' is then reshaped into the modified sub band B'j(highest) of frame Fi by performing the inverse of the operation in step 4.
12) After the modified sub band B'j(highest) is substituted for Bj(highest) in Ci, yielding Ci', the inverse DCT is applied to Ci' to obtain the watermarked audio frame Fi'.
13) Finally, all watermarked frames are concatenated to form the watermarked audio signal A'.
C. Watermark Detection Process
The proposed watermark detection process is shown in Fig. 2. It is implemented in the following steps:
1) The DCT is applied to each frame Fi* of the attacked watermarked audio signal.
2) The entropy of each sub band Bj* of each frame is calculated, and each B*j(highest) is rearranged to obtain Ri*.
3) SVD is performed on each Ri* to obtain the largest singular value Si*(1,1) of each matrix Si* of the attacked watermarked audio frame.
4) The watermark sequence is extracted using the secret key K2 as follows:
$$ u^*(i) = \begin{cases} 1, & \text{if } S_i^*(1,1) > S_i(1,1) \\ 0, & \text{otherwise} \end{cases} \tag{15} $$
where the Si(1,1) values of the original audio signal serve as secret key K2.
5) Chaotic decryption is performed using the secret key K1 to recover the hidden binary sequence with the following rule:
$$ q^*(i) = z(i) \oplus u^*(i) \tag{16} $$
6) Finally, the watermark image is obtained by rearranging the binary sequence q*(i) into a square matrix W* of size M×M.
Figure 2. Watermark detection process. [Block diagram: attacked watermarked signal → generate frame → apply DCT → generate sub bands and calculate entropy → reshape B*j(highest) into a square matrix → perform SVD → extract watermark using secret key K2 → chaotic decryption using secret key K1 → extracted watermark image.]

III. SIMULATION RESULTS AND DISCUSSION
In this section, experiments were conducted on four different types of 16-bit mono audio signals (Pop, Jazz, Folk, and Classical) sampled at 44.1 kHz. Each audio file contains 262,144 samples (duration 5.94 s). Each audio signal is divided into frames of 256 samples, and one bit of the binary logo image is embedded in each frame, so the 262,144/256 = 1024 frames carry the M×M = 32×32 = 1024 watermark bits. The binary logo image and the corresponding image encrypted by chaotic encryption are shown in Fig. 3. The selected values of y(1), β, T, B, θ, the number of sub bands m, and the number of coefficients in each sub band r are 0.6, 0.3, 0.5, 2, 45°, 16, and 16, respectively (so that m × r = 256 coefficients cover each frame). These parameters were selected to achieve a good compromise among the conflicting requirements of imperceptibility, robustness, and data payload.
Figure 3. (a) Binary watermark image. (b) Encrypted watermark image.
We have evaluated the performance of the proposed method in terms of data payload, mean opinion score (MOS), signal-to-noise ratio (SNR) between the original and watermarked audio signals, bit error rate (BER), and normalized cross-correlation (NC) [4].

TABLE I. SNR AND MOS RESULTS FOR DIFFERENT WATERMARKED SOUNDS
Type of signal | Pop | Jazz | Folk | Classical | Average
SNR (dB) | 40.74 | 50.19 | 51.68 | 50.72 | 48.33
MOS | 4.9 | 5.0 | 5.0 | 5.0 | 4.97

TABLE II. SNR AND MOS COMPARISON BETWEEN THE PROPOSED SCHEME AND SEVERAL RECENT METHODS
Reference | Algorithm | SNR (dB) | MOS
[9] | EMD | 24.12 | -
[4] | DWT-SVD | 26.84 | 4.60
[6] | SVD | 27.13 | 4.60
[7] | STFT-SVD | 28.36 | 4.70
[5] | Spread spectrum | 28.59 | 4.46
Proposed | SVD-LPT | 48.33 | 4.97

TABLE III. NC AND BER OF THE EXTRACTED WATERMARK IMAGE FOR THE AUDIO SIGNAL 'FOLK'
Attack type | NC | BER (%)
No attack | 1 | 0
Noise addition | 1 | 0
Cropping | 0.9984 | 0.1953
Re-sampling | 1 | 0
Re-quantization | 1 | 0
MP3 compression | 0.9772 | 2.7344

TABLE IV. NC AND BER OF THE EXTRACTED WATERMARK FOR DIFFERENT AUDIO SIGNALS
Audio signal | Attack type | NC | BER (%)
Pop | No attack | 1 | 0
Pop | Noise addition | 1 | 0
Pop | Cropping | 1 | 0
Pop | Re-sampling | 1 | 0
Pop | Re-quantization | 1 | 0
Pop | MP3 compression | 0.9820 | 2.1484
Jazz | No attack | 1 | 0
Jazz | Noise addition | 0.9984 | 0.1953
Jazz | Cropping | 0.9984 | 0.1953
Jazz | Re-sampling | 1 | 0
Jazz | Re-quantization | 1 | 0
Jazz | MP3 compression | 0.9766 | 2.8320
Classical | No attack | 1 | 0
Classical | Noise addition | 0.9959 | 0.4883
Classical | Cropping | 0.9984 | 0.1953
Classical | Re-sampling | 1 | 0
Classical | Re-quantization | 1 | 0
Classical | MP3 compression | 0.9746 | 3.0273
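The per-frame embedding steps of Section II-B and the extraction rule (15) can be condensed into the following sketch. It is an illustration under stated assumptions, not the authors' code. The DCT is assumed to be the orthonormal DCT-II, built here as an explicit matrix. The histogram in (4) is assumed to use r bins with counts normalized to probabilities. Q is not specified in the paper, so Q = 1.0 is a hypothetical choice. B = 2, θ = 45°, and m = r = 16 follow Section III. The sign convention assumes that bit '1' increases Si(1,1), which is what rule (15) requires. All function names are hypothetical.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix (assumed DCT variant); its transpose is the inverse DCT."""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c

def subband_entropy(band: np.ndarray, n_bins: int) -> float:
    """Eq. (4); histogram counts are normalized to probabilities (an assumption)."""
    counts, _ = np.histogram(band, bins=n_bins)
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def highest_entropy_band(coeffs, m: int, r: int):
    """Steps 2-4: split into m sub bands of r coefficients and pick the max-entropy one."""
    bands = np.asarray(coeffs, dtype=float).reshape(m, r)
    j = int(np.argmax([subband_entropy(b, r) for b in bands]))
    return bands, j

def embed_coeffs(coeffs, bit, m=16, r=16, Q=1.0, B=2.0, theta=np.pi / 4):
    """Steps 2-11 on one frame's DCT coefficients; returns (new coeffs, S_i(1,1) for K2)."""
    n = int(round(np.sqrt(r)))                        # N x N matrix with N*N = r
    bands, j = highest_entropy_band(coeffs, m, r)
    U, s, Vt = np.linalg.svd(bands[j].reshape(n, n))  # Eq. (5); Vt is V^T
    G = (s.mean() + s.var()) / 2.0                    # Eqs. (6)-(8), population variance
    D = s[0] / Q                                      # step 7
    dx, dy = np.exp(D) * np.cos(theta), np.exp(D) * np.sin(theta)  # Eq. (9)
    step = Q * G / B
    sign = 1.0 if bit == 1 else -1.0                  # Eqs. (10)-(11), '1' raises S_i(1,1)
    d_new = np.log(np.hypot(dx + sign * step, dy + sign * step))   # Eq. (12)
    s_new = s.copy()
    s_new[0] = Q * d_new                              # Eq. (13)
    out = bands.copy()
    out[j] = (U @ np.diag(s_new) @ Vt).ravel()        # Eq. (14), then undo step 4
    return out.ravel(), s[0]

def detect_coeffs(coeffs, key_s11, m=16, r=16) -> int:
    """Detection steps 2-4: recompute the band and compare its top singular value, Eq. (15)."""
    n = int(round(np.sqrt(r)))
    bands, j = highest_entropy_band(coeffs, m, r)
    s_star = np.linalg.svd(bands[j].reshape(n, n), compute_uv=False)[0]
    return 1 if s_star > key_s11 else 0

def embed_frame(frame, bit, **params):
    """Steps 1 and 12 for one frame: forward DCT, embed one bit, inverse DCT."""
    c = dct_matrix(len(frame))
    new_coeffs, key = embed_coeffs(c @ np.asarray(frame, dtype=float), bit, **params)
    return c.T @ new_coeffs, key
```

The sketch works directly on DCT coefficients so that the highest-entropy sub band can be identified deterministically. In a real signal, detection assumes that embedding and attacks do not change which sub band has the highest entropy. If that sub band changes, the comparison in (15) is made against the wrong singular value.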
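The evaluation measures used in Tables I-V can be computed as follows. This is a minimal sketch under assumptions. The SNR uses the usual ratio of signal energy to embedding-noise energy. The NC uses a common normalized cross-correlation definition for binary images, so the exact form in [4] may differ in detail. The function names are hypothetical.

```python
import numpy as np

def snr_db(original, watermarked) -> float:
    """SNR in dB between the original and watermarked signals."""
    x = np.asarray(original, dtype=float)
    noise = x - np.asarray(watermarked, dtype=float)
    return float(10.0 * np.log10(np.sum(x ** 2) / np.sum(noise ** 2)))

def ber_percent(w, w_star) -> float:
    """Percentage of watermark bits that differ after extraction."""
    w, w_star = np.asarray(w).ravel(), np.asarray(w_star).ravel()
    return 100.0 * np.count_nonzero(w != w_star) / w.size

def nc(w, w_star) -> float:
    """Normalized cross-correlation between original and extracted watermarks (assumed form)."""
    a = np.asarray(w, dtype=float).ravel()
    b = np.asarray(w_star, dtype=float).ravel()
    return float(np.dot(a, b) / np.sqrt(np.dot(a, a) * np.dot(b, b)))

def payload_bps(n_bits: int, duration_s: float) -> float:
    """Embedded bits per second of audio."""
    return n_bits / duration_s
```

With this BER definition, 2 erroneous bits out of 1024 give 0.1953%, which matches the cropping entry for 'Folk' in Table III. The reported payload of 172.3906 bps corresponds to 1024 bits over the rounded duration of 5.94 s. Using the exact duration of 262,144/44,100 s gives 44,100/256 ≈ 172.27 bps.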
Perceptual quality can be assessed by subjective listening tests based on human auditory perception and by objective tests that measure the SNR. The subjective listening test was conducted with ten listeners, and the results are summarized in terms of MOS [4]. The SNR and MOS results for the different watermarked signals are reported in Table I. After the watermark bits are embedded, the SNRs of all watermarked audio signals produced by the proposed scheme are above 20 dB, conforming to the International Federation of the Phonographic Industry (IFPI) standard [4]. Table II compares the SNR and MOS of the proposed scheme with those of several recent methods, based on the results reported in [4]-[7], [9]. These results show that the proposed scheme outperforms the recent watermarking methods in terms of both SNR and MOS. In other words, the subjective and objective evaluations indicate high transparency of the proposed scheme.
To assess the robustness of the proposed scheme, the common signal processing attacks used in [8] were applied. Table III shows the extracted watermark with NC and BER values for different attacks on the audio signal 'Folk'. All NC values are above 0.97, all BER values are below 3%, and the extracted watermark images are visually similar to the original watermark. This demonstrates the robustness of the proposed method for the audio signal 'Folk'. Table IV shows similar results for the audio signals 'Pop', 'Jazz', and 'Classical'. All NC values are above 0.97 and no BER value exceeds 3.03%, demonstrating the high robustness of the proposed scheme against different attacks. This is because the watermark is embedded, by quantization, into the LPT components of the largest singular value of the DCT sub band with the highest entropy in each frame. To enhance security, the proposed method uses chaotic encryption. Since the proposed watermark embedding and detection processes depend on the secret keys K1 and K2, it is difficult for an unauthorized party to detect the watermark without these keys.
Table V compares the proposed scheme with several recent methods [3]-[4], [9]-[11] in terms of data payload and robustness to resampling, requantization, and MP3 compression attacks. The data payload of the proposed scheme is 172.3906 bps. These results show that the proposed scheme achieves a considerably higher data payload than all the compared methods while keeping BER values low. Its BER is zero under resampling and requantization, and under MP3 compression it is lower than that of [3], [10], and [11].

TABLE V. A GENERAL COMPARISON OF AUDIO WATERMARKING ALGORITHMS WITH THE PROPOSED SCHEME, SORTED BY DATA PAYLOAD
Reference | Algorithm | Payload (bps) | Resampling BER (%) | Requantization BER (%) | MP3 compression BER (%)
Proposed | SVD-LPT | 172.3906 | 0 (22.05 kHz) | 0 (8 bits/sample) | 3.0273 (128 kbps)
[9] | EMD | 46.9-50.3 | 3 (22.05 kHz) | 0 (8 bits/sample) | 1 (32 kbps)
[4] | DWT-SVD | 45.90 | 2 (22.05 kHz) | 0 (8 bits/sample) | 1 (32 kbps)
[3] | DWT-LPC | 10.72 | 13.64 (22.05 kHz) | 5.24 (8 bits/sample) | 5.71 (128 kbps)
[10] | Histogram | 3 | 0 (--) | 0 (8 bits/sample) | 15 (128 kbps)
[11] | DWT-based histogram | 2 | 0 (16 kHz) | 0 (8 bits/sample) | 17.50 (64 kbps)

IV. CONCLUSION
An entropy-based audio watermarking scheme using SVD, LPT, and quantization was presented in this paper. Simulation results show that the proposed scheme is robust against attacks such as noise addition, cropping, resampling, requantization, and MP3 compression. In addition, audio quality evaluation tests show that the watermark is highly imperceptible. Moreover, the proposed scheme outperforms state-of-the-art audio watermarking methods in terms of robustness, imperceptibility, and data payload. These results indicate that the proposed watermarking scheme is suitable for audio copyright protection.
REFERENCES
[1] I. J. Cox and M. L. Miller, "The First 50 Years of Electronic Watermarking," J. Appl. Signal Processing, vol. 56, no. 2, pp. 225-230, 2002.
[2] D. Kirovski and H. S. Malvar, "Robust Spread Spectrum Audio Watermarking," IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'01), pp. 1345-1348, 2001.
[3] R. Wang, D. Xu, J. Chen, and C. Du, "Digital Audio Watermarking Algorithm Based on Linear Predictive Coding in Wavelet Domain," IEEE International Conference on Signal Processing (ICSP'04), vol. 1, pp. 2393-2396, 2004.
[4] V. K. Bhat, I. Sengupta, and A. Das, "An Adaptive Audio Watermarking Based on the Singular Value Decomposition in the Wavelet Domain," Digital Signal Processing, vol. 20, no. 6, pp. 1547-1558, 2010.
[5] I. Cox, J. Killian, F. Leighton, and T. Shamoon, "Secure Spread Spectrum Watermarking for Multimedia," IEEE Trans. Image Processing, vol. 6, no. 12, pp. 1673-1687, 1997.
[6] F. E. A. El-Samie, "An Efficient Singular Value Decomposition Algorithm for Digital Audio Watermarking," International Journal of Speech Technology, vol. 12, no. 1, pp. 27-45, 2009.
[7] H. Ozer, B. Sankur, and N. Memon, "An SVD-Based Audio Watermarking Technique," ACM Workshop on Multimedia and Security (MM-Sec'05), pp. 51-56, 2005.
[8] P. K. Dhar and T. Shimamura, "Audio Watermarking in Transform Domain Based on Singular Value Decomposition and Quantization," Asia-Pacific Conference on Communication (APCC'12), pp. 516-521, 2012.
[9] K. Khaldi and A. O. Boudraa, "Audio Watermarking Via EMD," IEEE Trans. Audio, Speech and Language Processing, vol. 21, no. 3, pp. 675-680, 2013.
[10] S. Xiang and J. Huang, "Histogram Based Audio Watermarking Against Time Scale Modification and Cropping Attacks," IEEE Trans. Multimedia, vol. 9, no. 7, pp. 1357-1372, 2007.
[11] S. Xiang, H. J. Kim, and J. Huang, "Audio Watermarking Robust Against Time Scale Modification and MP3 Compression," Signal Processing, vol. 88, no. 10, pp. 2372-2387, 2008.