ISCCSP 2008, Malta, 12-14 March 2008
A blind audio watermarking scheme based on Neural Network and Psychoacoustic Model with Error correcting code in Wavelet Domain
Charfeddine Maha, Elarbi Maher and Ben Amar Chokri
REsearch Group on Intelligent Machines (REGIM), University of Sfax, National Engineering School of Sfax (ENIS), Sfax, TUNISIA
[email protected],
[email protected],
[email protected]
Abstract—Audio watermarking is a method that embeds inaudible information into digital audio data. This paper proposes an audio watermarking technique for protecting audio copyrights based on the Human Psychoacoustic Model (HPM), the Discrete Wavelet Transform (DWT), a Neural Network (NN) and an error correcting code. Our technique exploits the frequency perceptual masking studied in the HPM to guarantee that the embedded watermark is inaudible. For watermark embedding and extraction, a neural network is used to memorize the relationships between a Wavelet central sample and its neighbors. To increase the robustness of the scheme, the watermark is protected by the Hamming error correcting code, and the encoded mark is embedded as the new watermark in the transformed audio signal. Our audio watermarking algorithm is robust to common audio signal manipulations such as MP3 compression, noise addition, silence addition, bit-per-sample conversion, noise reduction, dynamic changes and Notch filtering. Furthermore, it allows blind retrieval of the embedded watermark, which does not require the original audio, and keeps the watermark perceptually inaudible.

Keywords—Audio watermarking; Wavelet transforms; Psychoacoustic model; Neural networks; Hamming codes
I. INTRODUCTION
Recently, a necessity has arisen for copyright protection of digital content against piracy. The efficiency and ease of distribution, reproduction and manipulation of multimedia products over the Internet at very low cost have offered increasing potential for both legal and unauthorized data manipulation. A typical example of this problem is the piracy of high-quality music across the Internet in MP3 format. In the early days, encryption and access control techniques were employed to provide adequate protection of copyrighted contents. Those schemes effectively protect audio products from interception during content transmission, but they are no longer sufficient since they are unable to protect them once they are decrypted. Digital watermarking [1], the art of hiding information (the watermark) in the original audio signal, has been proposed as a potential solution to this problem. This mark should not degrade audio quality, but it should be detectable and indelible. This technology covers any type of digital document (text,
sound, image, video, etc.) and should satisfy a trade-off between three properties [2]: ratio (the number of watermark bits), inaudibility (watermarking should not degrade sound quality) and robustness (the watermark should resist any transformation applied to the audio signal, as long as sound quality is not unacceptably degraded).

In this paper, a blind audio watermarking scheme operating in the Wavelet domain [3] is described. The original audio is divided into blocks of 384 samples. Some blocks are chosen randomly and then transformed to the Wavelet domain to embed the watermark. Watermark embedding and extraction are based on a Neural Network (NN), which is used to memorize the relationships between a transformed central sample and its 8 neighbors. In addition, the Human Psychoacoustic Model (HPM) is used to select inaudible embedding positions, and an error correcting code is applied to the watermark in order to enhance robustness. The extraction is the inverse process.

The rest of this paper is organized as follows: section 2 presents some related works in audio watermarking. Section 3 introduces basic concepts of audio masking, which refer to frequency masking [4] with the MPEG-I Human Psychoacoustic Model 1 (HPM1) [5]. Section 4 then describes the proposed audio watermarking scheme. Section 5 shows the obtained results. Finally, section 6 concludes the paper.

978-1-4244-1688-2/08/$25.00 © 2008 IEEE

II. RELATED WORKS
There are two main ways to classify audio watermarking techniques. The first concerns whether the original signal is required: schemes which require the original signal to detect the watermark from the watermarked signal are known as nonblind techniques, while schemes which can recover the watermark data without access to the original audio are known as blind techniques [6][7]. The second is related to the domain of insertion: some schemes embed the watermark directly in the time domain [8][9], while others operate in a transformed domain, usually using the Discrete Fourier Transform (DFT) [10], the Discrete Cosine Transform (DCT) [11] or the Discrete Wavelet Transform (DWT) [12]. According to our survey of the audio watermarking literature, our contribution resides in proposing an algorithm which satisfies the following constraints:
• Taking into account the Human Psychoacoustic Model (HPM) to improve the robustness and inaudibility of the hidden information.
• Adopting blind detection, without resorting to the original digital audio signal.
• Exploiting the Wavelet domain to assure the robustness property, thanks to the advantages of this domain compared to others.
• Using a Neural Network (NN) in the insertion and detection processes to improve robustness. In fact, techniques with simple insertion and detection strategies fail to enhance robustness.
• Adding an error correcting code to refine the embedded watermark during detection.

III. PSYCHOACOUSTIC MODEL
The MPEG audio algorithm compresses audio data in large part by removing the acoustically irrelevant parts of the audio signal. It takes advantage of the human auditory system's inability to hear quantization noise under conditions of auditory masking. This masking occurs whenever the presence of a strong audio signal makes a temporal or frequency neighborhood of weaker audio signals imperceptible. Masking is usually divided into frequency masking and temporal masking [13]. The psychoacoustic model analyzes the audio signal and computes the amount of noise masking available as a function of frequency. The encoder uses this information to decide how best to represent the input audio signal with its limited number of code bits.

The psychoacoustic model 1 (HPM1) defined in the MPEG audio algorithm for layer 1 [5] exploits frequency masking: if two simultaneously occurring signals are close together in frequency, the lower-power frequency components may be inaudible in the presence of the higher-power frequency components. From an audio signal S1, this model calculates a curve LTg, called the masking threshold, that is homogeneous to a power spectral density (PSD) Xf [5]. If the PSD of a signal S2 is below LTg, then S2 is masked by S1; this means that the listener is unable to perceive any difference between S1 and (S1+S2). The same phenomenon applies to S3 and S4, as shown in Fig. 1.
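The masking test, and the later choice of embedding position, both reduce to simple array comparisons. A minimal NumPy sketch is given below; the threshold values are made up for illustration, since a real LTg comes from the 7-step HPM1 computation on an actual audio block:

```python
import numpy as np

def is_masked(psd_s2, lt_g):
    """S2 is inaudible in the presence of S1 if its power spectral
    density lies below S1's masking threshold at every frequency bin."""
    return bool(np.all(psd_s2 < lt_g))

# Toy values (a real LTg comes from the 7-step HPM1 procedure):
lt_g  = np.array([60.0, 55.0, 50.0, 45.0])   # masking threshold (dB)
psd_a = np.array([30.0, 20.0, 10.0,  5.0])   # weak signal, under LTg
psd_b = np.array([30.0, 70.0, 10.0,  5.0])   # exceeds LTg in one bin

print(is_masked(psd_a, lt_g))  # True: listener cannot perceive S2
print(is_masked(psd_b, lt_g))  # False: S2 is audible

# In the embedding scheme, the bit position t0 is chosen where the
# host signal most exceeds its own threshold (largest Xf - LTg margin):
xf = np.array([80.0, 58.0, 65.0, 40.0])      # PSD of the host block (dB)
t0 = int(np.argmax(xf - lt_g))
print(t0)  # 0: the bin with the largest masking margin
```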
We take a particular interest in HPM1 to find the masking threshold curve, calculated in 7 steps [5]. In audio watermarking, can the psychoacoustic model be used to ensure the inaudibility of the watermark? Will the watermark be inaudible if it lies under the masking threshold curve? This is what we try to answer in section 4 through our audio watermarking scheme using HPM1.

IV. THE PROPOSED AUDIO WATERMARKING SCHEME

The proposed scheme consists of two parts: watermark embedding and watermark detection. Details are described in the following subparts.

A. Watermark embedding
The main steps of the embedding procedure are depicted in Fig. 2. First, the original audio signal is divided into NB nonoverlapping blocks of 384 samples. We choose this number because we implement HPM1, which operates on blocks of 384 samples [5]. Then a 3-level DWT is performed. Simultaneously, the watermark is decomposed into sets of length 8, and each set is encoded with a (12,8) Hamming error correcting code [14]. After that, we generate a pseudo-random index sequence Z(x), where x ∈ {1, …, pq} and pq is the encoded watermark size. At the same time, a NN is trained to be used later in the watermark insertion and extraction operations. Given the pq selected blocks, we exploit HPM1 to provide the parameters necessary for the search of inaudible positions where the watermark bits are embedded. We then proceed to the watermark insertion using an adaptive strategy. Finally, we construct the watermarked audio signal after performing a 3-level inverse DWT.
Figure 1. Frequency masking phenomenon

Figure 2. Mark embedding process
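The block decomposition described above can be sketched with a hand-rolled Haar transform (the paper's "haar"/"db1" basis); in practice a wavelet library such as PyWavelets would normally be used, but only NumPy is assumed in this sketch. It shows how one 384-sample block splits into the ca3 approximation band (48 coefficients) and the detail bands cd3/cd2/cd1:

```python
import numpy as np

def haar_step(x):
    # One analysis level: pairwise sums -> approximation, pairwise
    # differences -> detail (orthonormal Haar scaling).
    a = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    d = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return a, d

def haar_dwt3(block):
    # 3-level decomposition of one 384-sample block.
    ca1, cd1 = haar_step(block)   # 192 / 192 coefficients
    ca2, cd2 = haar_step(ca1)     #  96 /  96
    ca3, cd3 = haar_step(ca2)     #  48 /  48
    return ca3, cd3, cd2, cd1

def haar_idwt_step(a, d):
    # Inverse of haar_step (perfect reconstruction).
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2.0)
    x[1::2] = (a - d) / np.sqrt(2.0)
    return x

block = np.random.randn(384)
ca3, cd3, cd2, cd1 = haar_dwt3(block)
print(len(ca3), len(cd3), len(cd2), len(cd1))   # 48 48 96 192

# Reconstruction check: inverting the three levels recovers the block.
rec = haar_idwt_step(haar_idwt_step(haar_idwt_step(ca3, cd3), cd2), cd1)
print(np.allclose(rec, block))  # True
```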
1) Hamming error correcting code
The (12,8) Hamming error correcting code is used to increase robustness. Error correcting codes play an important role in protecting the watermark, especially when it is significantly damaged by attacks.
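Reference [14] defines the Hamming code, but the paper does not spell out the bit layout of the (12,8) variant. The sketch below assumes the classic single-error-correcting layout, with parity bits at positions 1, 2, 4 and 8 of the 12-bit codeword and the 8 data bits in the remaining positions:

```python
from functools import reduce

def hamming12_8_encode(data_bits):
    """Encode 8 data bits into a 12-bit codeword (assumed layout:
    parity at 1-indexed positions 1, 2, 4, 8; data elsewhere)."""
    assert len(data_bits) == 8
    code = [0] * 13                      # index 0 unused (1-indexed)
    data_pos = [3, 5, 6, 7, 9, 10, 11, 12]
    for pos, bit in zip(data_pos, data_bits):
        code[pos] = bit
    for p in (1, 2, 4, 8):               # even parity over covered positions
        code[p] = reduce(lambda x, y: x ^ y,
                         (code[i] for i in range(1, 13) if i & p and i != p))
    return code[1:]

def hamming12_8_decode(codeword):
    """Correct up to one flipped bit and return the 8 data bits."""
    code = [0] + list(codeword)
    # The syndrome is the XOR of the indices of all set bits; for a
    # single error it equals the error position, 0 if the word is clean.
    syndrome = reduce(lambda x, y: x ^ y,
                      (i for i in range(1, 13) if code[i]), 0)
    if syndrome:
        code[syndrome] ^= 1
    return [code[i] for i in (3, 5, 6, 7, 9, 10, 11, 12)]

data = [1, 0, 1, 1, 0, 0, 1, 0]
cw = hamming12_8_encode(data)
cw[5] ^= 1                               # simulate a one-bit attack error
print(hamming12_8_decode(cw) == data)    # True: the error is corrected
```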
The Hamming code can overcome the corruption of a watermark and can help it survive serious attacks.

2) Selecting embedding positions
HPM1 is exploited to ensure good robustness and inaudibility. The first 7 HPM1 steps are applied to the original signal and provide, for each audio block of 384 samples, the power spectral density Xf and the masking threshold LTg. Once these parameters are obtained, the position "t0" of the watermark bit to be inserted can be determined: this index corresponds to the highest difference between Xf and LTg. If we substitute the watermark bit "j" in block "j" at this position "t0", the watermark will be perfectly inaudible because it is masked by the masking threshold curve.

Once the positions "t0" are obtained for all blocks, their correspondents in the Wavelet domain must be found because of the 3-level DWT decomposition. As is known, the low frequencies of a DWT block are localized in the 3-level approximation coefficients "ca3"; we therefore move away from this band, since the human ear is very sensitive to the low-frequency band, which carries the most important information in the audio signal. We choose the "t0" correspondents (t0cd3) in the 3-level detail sub-bands "cd3". The correspondence of t0 in the Wavelet domain is illustrated in Fig. 3.

Figure 3. Correspondence in DWT domain

3) Neural network architecture and training
We establish the relationship between the Wavelet samples around a central Wavelet sample by using the back-propagation neural network (BPNN) model [15]. The BPNN comprises three layers: an input layer with eight neurons, a hidden layer with nine neurons, and an output layer with a single neuron. The dependence of the performance of the NN on the number of hidden nodes can be found in [15]; in our case, the performance does not improve significantly with more than 9 nodes in the hidden layer. For a selected sample I(x), the network is trained with its 8 neighbors as input vector and the value of the sample itself as output. Once the training process is completed, a set of synaptic weights characterizing the behavior of the trained neural network is obtained and used in the NN simulation process.

4) Watermark insertion strategy
Each watermark bit Wi is embedded in a selected block by modifying the Wavelet sample at position t0cd3. The insertion is based on a comparison between the values Bs and Is, where Bs is the NN output and Is the value of the original central sample at position t0cd3. We use an adaptive watermarking strength, equal to abs(Bs-Is), to accentuate the difference between Bs and Is, and we then perform a substitutive insertion depending on the result of this difference and the watermark bit value (1 or 0).

B. Watermark retrieval
The extraction process is the inverse of the embedding process and is exhibited in Fig. 4. It does not require the original audio signal; it needs all the NN synaptic weights, the random generator results and the pq inaudible positions, and all these parameters must be secured against pirates. During watermark extraction, the NN is employed to estimate the central sample. Detection is based on a comparison between the values Bs' and Is', where Bs' is the NN output and Is' the value of the watermarked central sample at position t0cd3.

Figure 4. Mark retrieval process
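The paper specifies the Bs/Is comparison and the adaptive strength abs(Bs-Is), but not the exact substitution rule. The sketch below uses one plausible rule (an assumption, not the authors' exact formula): push the sample above the NN prediction for bit 1 and below it for bit 0. The `nn_predict` stub stands in for the trained 8-9-1 BPNN; blind detection works because the unmodified neighbors make Bs' reproduce the Bs used at embedding time:

```python
import numpy as np

def nn_predict(neighbors):
    # Stand-in for the trained 8-9-1 BPNN: predicts the central sample
    # from its 8 neighbors (here simply their mean, for illustration).
    return float(np.mean(neighbors))

def embed_bit(neighbors, Is, bit, min_strength=1e-3):
    Bs = nn_predict(neighbors)
    alpha = max(abs(Bs - Is), min_strength)   # adaptive strength
    # Hypothetical substitution rule: above the prediction encodes 1,
    # below the prediction encodes 0.
    return Bs + alpha if bit == 1 else Bs - alpha

def detect_bit(neighbors, Is_w):
    # Blind detection: the neighbors are unmodified, so the NN output
    # Bs' matches the one used at embedding time.
    Bs = nn_predict(neighbors)
    return 1 if Is_w >= Bs else 0

neighbors = np.array([0.2, 0.1, -0.3, 0.4, 0.0, 0.25, -0.1, 0.15])
Is = 0.05                                  # original central cd3 sample
for bit in (0, 1):
    Is_w = embed_bit(neighbors, Is, bit)
    print(detect_bit(neighbors, Is_w) == bit)   # True for both bits
```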
Once all the watermarks are detected and corrected using the Hamming code, the normalized cross-correlation NC [16] is calculated to measure the similarity between the extracted and the inserted watermarks:

NC = Σ(i,j=1..n) bin(i,j)·bin'(i,j) / ( √(Σ(i,j=1..n) bin'(i,j)²) · √(Σ(i,j=1..n) bin(i,j)²) )    (1)

The closer NC is to 1, the more similar the detected binary watermark "bin'" is to the inserted binary watermark "bin".
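Equation (1) translates directly to NumPy; the 32×32 binary watermark below matches the size used in the experiments:

```python
import numpy as np

def nc(bin_w, bin_det):
    """Normalized cross-correlation between the inserted and detected
    binary watermarks, as in Eq. (1)."""
    num = np.sum(bin_w * bin_det)
    den = np.sqrt(np.sum(bin_det ** 2)) * np.sqrt(np.sum(bin_w ** 2))
    return num / den

rng = np.random.default_rng(0)
w = rng.integers(0, 2, size=(32, 32))    # 32x32 binary watermark
print(nc(w, w))                          # 1.0 for a perfect detection

w_attacked = w.copy()
w_attacked[0, :8] ^= 1                   # flip a few bits (attack damage)
print(nc(w, w_attacked) < 1.0)           # True: similarity drops
```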
V. EXPERIMENTAL RESULTS

In order to test the imperceptibility and robustness characteristics of the proposed audio watermarking method, we performed several experiments. In addition, to draw sound conclusions about the performance of the proposed scheme, we compared it with other implemented techniques operating in different insertion domains. These techniques embed an identical watermark in the same test audio signal, and they are:

• A DCT audio watermarking scheme exploiting HPM1, NN and error correcting code. This frequency technique differs from the scheme proposed in this paper in the insertion domain, and it does not need to find insertion position correspondences in the frequency domain after HPM1 localization.
• A DCT technique using NN and error correcting code. This second frequency technique does not use HPM1 to find inaudible embedding positions but hides the watermark in middle frequency positions.
• A Wavelet technique using NN and error correcting code. Like the previous scheme, this technique does not use HPM1 to find inaudible embedding positions but hides the watermark in middle frequency positions, after moving away from the 3-level DWT approximation coefficients.
• A Modified Least Significant Bit (LSB) technique operating in the temporal domain. This scheme is based on the standard LSB method [17] and uses an efficient insertion strategy based on a pseudo-random number generator.

An audio "wav" file with a 44.1 kHz sampling rate and 16 bits per sample, depicted in Fig. 5, is used in our experiments. The watermark is a binary image of size 32×32, shown in Fig. 6. The wavelet bases "haar" and "db1" [3] are used, and they show similar results. In order to estimate the audio quality after watermark embedding for the different schemes, some listeners were presented with the original and the watermarked signals. We use the NC formula as a performance measure in watermark detection. All techniques guarantee error-free detection (NC=1) in the case of an ideal exchange (an exchange without signal processing manipulations or attacks). To take these eventual operations into account and to evaluate the robustness performance of the different watermarking schemes, we performed several tests in which the watermarked audio is subjected to commonly encountered degradations.

Figure 5. Original and watermarked audio files according to the corresponding technique: (a) Original audio file; (b) DWT, NN, HPM, Hamming; (c) DWT, NN, Hamming; (d) DCT, NN, HPM, Hamming; (e) DCT, NN, Hamming; (f) Modified LSB

Figure 6. Image watermark
The experimental results are described in detail as follows.

A. Inaudibility results
The inaudibility of our schemes has been certified through listening tests involving 9 persons after watermark embedding. For all techniques, the listeners were presented with the original and the watermarked audio and were asked to report any differences between the two signals. None of the subjects was able to tell any difference between the two audio files. Therefore, the proposed methods possess a remarkable capability for making watermarks inaudible.

The proposed technique (DWT, NN, HPM, Hamming), the frequency techniques and the temporal Modified LSB technique achieve better inaudibility performance than the (DWT, NN, Hamming) technique. They assure higher imperceptibility even when an important number of watermark bits is inserted in an audio signal of small length. However, this is not guaranteed with the (DWT, NN, Hamming) technique: with this last technique, we sometimes hear a light discontinuous noise if we strongly increase the ratio in signals of small length.

We can conclude that the temporal and frequency domains of insertion are very suitable for hiding a watermark in audio signals, contrary to the Wavelet domain which presents, when used alone, some audibility problems when the ratio is increased. For that reason, we have exploited HPM1 in the technique proposed in this paper, operating in the Wavelet domain, which guarantees higher imperceptibility thanks to the frequency masking exploited in the HPM1 model.

B. Robustness results
The degradations used to evaluate our schemes include MP3 compression/decompression, addition of brown or white noise, bit-per-sample conversion (conv1: 16 to 32 or conv2: 16 to 8 bits per sample, while keeping the 44.1 kHz sampling frequency), sampling frequency conversion (conv3: 44.1 to 48 kHz, while keeping 16 bits per sample), echo room addition, silence introduction, dynamic changes, hiss reduction, low-pass filtering and Notch filtering. Robustness has been assessed using the NC correlation measure between the embedded watermark and the watermark identified from the watermarked and manipulated signals. Fig. 7 summarizes the watermark detection results for these degradations.

Figure 7. NC values after the attacks applied to the watermarked signals for each technique
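As an illustration of how such degradations can be generated, the sketch below adds white noise to a watermarked signal at a chosen signal-to-noise ratio; the 30 dB value and the test tone are arbitrary examples for illustration, not taken from the experiments:

```python
import numpy as np

def add_white_noise(signal, snr_db):
    """Degrade a watermarked signal with white noise at a target SNR."""
    rng = np.random.default_rng(1)
    p_signal = np.mean(signal ** 2)
    p_noise = p_signal / (10.0 ** (snr_db / 10.0))  # SNR definition in dB
    noise = rng.normal(0.0, np.sqrt(p_noise), size=signal.shape)
    return signal + noise

# One 384-sample block of a 440 Hz tone at 44.1 kHz (toy watermarked signal).
watermarked = np.sin(2 * np.pi * 440 * np.arange(384) / 44100.0)
attacked = add_white_noise(watermarked, snr_db=30.0)

# Measure the realized SNR of the degradation.
err = attacked - watermarked
measured_snr = 10 * np.log10(np.mean(watermarked ** 2) / np.mean(err ** 2))
print(abs(measured_snr - 30.0) < 1.0)   # True: realized SNR near the target
```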
For all techniques, the watermark was detected successfully even when the watermarked signal was manipulated with MP3 compression/decompression or with silence introduction. None of the methods resists conv3 or the echo room attack. Tab. 1 summarizes the robustness results for each technique, ignoring the conv3 and echo room attacks.

TABLE I. ROBUSTNESS RESULTS

Scheme/NC              | Co/Dec MP3, Silence | Noise Brown, White | Lowpass filter | Dynamic changes, Hiss reduction, Notch Filter, Conv1, Conv2
DWT, NN, HPM, Hamming  | X                   | X                  |                | X
DWT, NN, Hamming       | X                   | X                  |                |
DCT, NN, HPM, Hamming  | X                   | X                  | X              | X
DCT, NN, Hamming       | X                   | X                  |                | X
Modified LSB           | X                   |                    |                |
From Tab. 1 and Fig. 7, we can conclude that the (DCT, NN, HPM, Hamming) scheme is the most robust technique. The (DCT, NN, Hamming) method and the scheme proposed in this paper are robust to the same attacks and present considerably higher robustness capability. However, the (DWT, NN, Hamming) method is less robust, and the Modified LSB technique possesses the lowest robustness capability. The experiments also demonstrate that the schemes exploiting the Hamming error correcting code have a higher capability of improving the NC value. In fact, detection errors introduced by some attacks were corrected: the Hamming code corrects one error per encoded 8-bit watermark set and thus helps to overcome part of the corruption of the watermark, so the NC values are higher than those obtained without the error correcting code.

VI. CONCLUSION
Compared with digital image and video watermarking technologies, digital audio watermarking provides a special challenge because the human auditory system is far more sensitive than the human visual system. In this paper, an original blind audio watermarking scheme combining several tools is presented. The proposed technique embeds the watermark into the digital audio signal in the Wavelet domain; operating in this domain makes the watermark more resistant to a wide range of attacks. The robustness is improved by using the NN, since it provides a highly adaptive decision learned from training examples, and by the Hamming code, because it overcomes part of the corruption of the watermark and can make it survive serious attacks. Moreover, the method makes the mark imperceptible by exploiting the frequency masking characteristics of HPM1. Watermark detection is done without referencing the original audio signal, which makes it easy to identify the audio copyright owner. Experiments have shown that the inaudibility and robustness performance goals can be achieved successfully.

ACKNOWLEDGMENT
The authors would like to thank Professor Mohamed Adel ALIMI from the REGIM (REsearch Group on Intelligent Machines) laboratory for his advice and the fruitful discussions elaborated with him. The authors also acknowledge the financial support of this work by grants from the DGRST (Direction Generale de la Recherche Scientifique et de Technologie), Tunisia, under the ARUB program 01/UR/11/02. We would also like to thank Mr. A. Damak from the Engineering School of Sfax, Tunisia, for his help with English.

REFERENCES
[1] W. Bender, D. Gruhl, N. Morimoto, and A. Lu, "Techniques for data hiding", IBM Systems Journal, vol. 35, pp. 313–336, 1996.
[2] R. Barnett, "Digital watermarking: applications, techniques and challenges", Electronics and Communication Engineering Journal, vol. 11, no. 4, pp. 173–183, 1999.
[3] I. Daubechies, "The wavelet transform, time-frequency localization and signal analysis", IEEE Transactions on Information Theory, vol. 36, pp. 961–1005, 1990.
[4] P. Srinivasan and L. H. Jamieson, "High quality audio compression using an adaptive wavelet packet decomposition and psychoacoustic modeling", IEEE Transactions on Signal Processing, vol. 46, no. 4, April 1998.
[5] ISO/IEC IS 11172 (MPEG), Information technology — coding of moving pictures and associated audio for digital storage up to about 1.5 Mbit/s, 1993.
[6] M. Arnold and Z. Huang, "Blind detection of multiple audio watermarks", in Proceedings of the First International Conference on Web Delivering of Music, IEEE Computer Society, Florence, Italy, pp. 4–11, Nov. 2001.
[7] C.T. Hsieh and P.Y. Tsou, "Blind cepstrum domain audio watermarking based on time energy features", 14th International Conference on Digital Signal Processing, Santorini, Greece, pp. 705–708, 2002.
[8] C. Laftsidis, A. Tefas, N. Nikolaidis and I. Pitas, "Robust multibit audio watermarking in the temporal domain", ISCAS (2), pp. 944–947, 2003.
[9] A.N. Lemma, J. Aprea, W. Oomen and L. van de Kerkhof, "A temporal domain audio watermarking technique", IEEE Transactions on Signal Processing, vol. 51, no. 4, pp. 1088–1097, April 2003.
[10] H. Malik, A. Khokhar and R. Ansari, "Robust audio watermarking using frequency selective spread spectrum theory", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '04), pp. 385–388, 2004.
[11] I.K. Yeo and H.J. Kim, "Modified patchwork algorithm: a novel audio watermarking scheme", IEEE Transactions on Speech and Audio Processing, vol. 11, no. 4, pp. 381–386, 2003.
[12] S.Q. Wu, J.W. Huang, D.R. Huang and Y.Q. Shi, "Self-synchronized audio watermark in DWT domain", Proceedings of the 2004 International Symposium on Circuits and Systems (ISCAS '04), pp. 712–715, 2004.
[13] P. Noll, "Wideband speech and audio coding", IEEE Communications Magazine, vol. 31, no. 11, pp. 34–44, 1993.
[14] R.W. Hamming, "Error detecting and error correcting codes", Bell System Technical Journal, vol. 29, no. 2, pp. 147–160, 1950.
[15] S. Haykin, Neural Networks: A Comprehensive Foundation, Macmillan College Publishing Company, New York, NY, USA, 1995.
[16] C.-T. Hsu and J.-L. Wu, "Digital watermarking for video", Proceedings of the 1997 13th International Conference on Digital Signal Processing (DSP 97), vol. 1, pp. 217–220, 2–4 July 1997.
[17] P. Bassia and I. Pitas, "Robust audio watermarking in the time domain", Proc. EUSIPCO 98, vol. 1, pp. 25–28, Rodos, Greece, 1998.