An Adaptive Encoder for Audio Watermarking
B. GUNSEL, S. SENER, Y. YASLAN
Department of Electrical and Electronics Engineering
Istanbul Technical University
34469 Maslak, Istanbul
TURKEY
Abstract: - This paper introduces an adaptive technique for audio watermarking. Almost all published audio watermarking methods report a tradeoff between audibility and detectability, and they perform well in the digital domain. However, in most applications the audio must be transmitted over channels (in analog form), where detection performance depends on the strength of the embedded watermark signal. This work proposes a new frequency-domain technique that increases the watermark strength in a data-adaptive way while preserving inaudibility. The method is compatible with MPEG Layer 1 and robust to stereo-to-mono conversion.
Key-Words: Psychoacoustic masking, data hiding, watermark decoding.
1 Introduction
In the past few years, digital technologies have made it easy to create and copy multimedia content and to distribute it over the Internet at very low cost. However, the same technologies have also made it easy to illegally copy, modify and redistribute multimedia data without paying for it. Digital watermarking embeds sideband data, such as author copyright information, into host data. It is therefore seen as a solution to the problem of protecting digital media.

This paper deals with the protection of audio content. Many audio watermarking techniques have been proposed, both in the time domain [1] and in the frequency domain [2]. Most of these techniques perform well in theory, but their performance drops sharply over real communication channels. This work proposes an adaptive frequency-domain audio watermarking technique that controls decoding accuracy at the watermark encoding stage. This keeps the watermark strength at an adequate level in a data-adaptive way while preserving inaudibility. The method is compatible with MPEG Layer 1 and robust to stereo-to-mono conversion. Section 2 presents the proposed audio watermarking technique. Section 3 describes our decoding scheme for watermarked audio. Section 4 reports the performance of the proposed adaptive technique and of the non-adaptive technique. Section 5 summarizes the conclusions.

2 Adaptive Audio Watermarking
Let $s_i^t(n)$ denote the $n$th original audio sample of the $i$th audio frame at state $t$. In our work, audio data is sampled at 44100 Hz and each frame contains N = 512 samples, so each audio frame is about 11.6 ms long. Our adaptive method inserts one watermark bit, $w_j$, into each audio frame iteratively. The index $t$ counts these insertions and therefore denotes the insertion state, and its initial value is one. Watermark bits are either 1 or -1, and j is an integer from 1 to L, where L is the watermark length.

Figure 1 illustrates a general block diagram of a watermarking system. The audio data is processed frame by frame. At each instant, the encoder takes an original audio frame, $s_i^t(n)$, n = 0, 1, ..., N-1, as its input and transmits the corresponding watermarked frame, $s_i^{t,\mathrm{WM}}(n)$, over the communication channel. At the receiver side, a watermark decoder decodes the received watermarked signal, $s_i^{t,\mathrm{R}}(n)$, and estimates the watermark bit $w_j$. Note that the WM decoder needs only the secret key, k, to extract the watermark data.

Figure 2 presents a block diagram of the proposed adaptive watermark encoding scheme. Our technique embeds a modulated secret key, $k_m$, into each audio frame while inserting one watermark bit per frame. A psychoacoustic frequency masking module generates masking thresholds in a data-adaptive way, and the secret key is modulated in the frequency domain using these masking thresholds. Data embedding is performed in the time domain, and the inserted watermark bit determines the sign of the modulated key. To improve performance over communication channels, the system also includes a channel encoder block, so a BCH encoder [3] generates the watermark bits. A pseudo-noise generator is used to generate the secret key [4].

2.1 Problem Definition
The main difficulty with inaudible WM encoding schemes is that the strength of the watermark depends on the frequency content of the input audio signal. In an original audio frame, frequency components that remain below the masking threshold are suitable places for watermark embedding. However, some audio frames allow only a few insertions, while others force insertions that result in decoding errors. The insertion points should also support the MPEG Layer 1 [5] masking scheme, which significantly reduces the number of available data embedding points.
Therefore, the key point in the design of an audio WM encoder is to use the available data embedding capacity efficiently. The proposed WM encoding scheme uses the maximum embedding capacity in a data-adaptive way to ensure high decoding accuracy. Unlike previously reported techniques, our watermark encoding scheme uses a feedback loop to minimize the decoding error at the encoding stage. As Figure 2 shows, a feedback block checks the WM decoding error at each state t. If the error is not zero, the watermark bit cannot be detected correctly, and the watermarked audio frame is watermarked again. The feedback loop uses the same decision criterion as the WM decoder, which is explained in the next section.

Fig. 1 Block diagram of an audio watermarking system.
Fig. 2 Block diagram of the proposed adaptive encoder.

3 Watermark Decoding
Assuming that the communication channel is additive, the received watermarked audio signal $s_i^{t,\mathrm{R}}$ can be expressed as

$$ s_i^{t,\mathrm{R}} = s_i^{t,\mathrm{WM}} + n = s_i^t + w\,k_m + n \qquad (1) $$

where n models the additive channel noise and w is the watermark bit stream.
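Eq. 1 summarizes the encoder of Section 2. The received frame is the original frame, plus the key modulated by the masking thresholds and signed by the watermark bit, plus channel noise. The Python sketch below illustrates that path under stated assumptions and is not the authors' implementation. The paper does not specify how the key spectrum is combined with the masking thresholds. `modulate_key` therefore assumes one plausible rule: keep the key's phase in each DFT bin and use the threshold as that bin's magnitude. The thresholds are supplied by the caller rather than computed by an MPEG Layer 1 psychoacoustic model. Other assumptions:

- The ±1 key drawn from Python's seeded `random.Random` only stands in for the pseudo-noise generator.
- The white Gaussian noise channel is an assumption.
- `pn_key`, `modulate_key`, `embed_bit` and `additive_channel` are hypothetical names.

```python
import cmath
import random
from typing import List, Sequence

SAMPLE_RATE = 44100                            # Hz, as in the paper
FRAME_LEN = 512                                # N samples per frame
FRAME_MS = 1000.0 * FRAME_LEN / SAMPLE_RATE    # about 11.6 ms per frame


def pn_key(length: int, seed: int) -> List[float]:
    """Hypothetical stand-in for the PN generator: +/-1 chips from a seeded PRNG."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(length)]


def dft(x: Sequence[complex]) -> List[complex]:
    """Direct O(N^2) DFT; an FFT library would normally be used instead."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * f * m / n) for m in range(n))
            for f in range(n)]


def idft(spectrum: Sequence[complex]) -> List[complex]:
    """Inverse of dft()."""
    n = len(spectrum)
    return [sum(spectrum[f] * cmath.exp(2j * cmath.pi * f * m / n) for f in range(n)) / n
            for m in range(n)]


def modulate_key(key: Sequence[float], threshold: Sequence[float]) -> List[float]:
    """Assumed rule: keep the key's phase in each DFT bin, use the threshold as magnitude.

    `threshold` holds one masking-threshold amplitude per bin and must be symmetric
    (threshold[f] == threshold[N - f]) so that the shaped key k_m is real-valued.
    """
    n = len(key)
    if len(threshold) != n:
        raise ValueError("need one threshold value per DFT bin")
    if any(abs(threshold[f] - threshold[-f % n]) > 1e-12 for f in range(n)):
        raise ValueError("threshold must be symmetric for a real-valued k_m")
    shaped = []
    for value, amp in zip(dft(key), threshold):
        mag = abs(value)
        shaped.append(amp * value / mag if mag > 1e-9 else 0j)  # empty bins stay empty
    return [v.real for v in idft(shaped)]  # imaginary parts are only rounding noise


def embed_bit(frame: Sequence[float], key_m: Sequence[float], bit: int) -> List[float]:
    """Time-domain embedding s_WM(n) = s(n) + w_j * k_m(n), with w_j = +1 or -1."""
    if bit not in (1, -1):
        raise ValueError("watermark bits are +1 or -1")
    return [s + bit * k for s, k in zip(frame, key_m)]


def additive_channel(frame: Sequence[float], noise_std: float, seed: int) -> List[float]:
    """Additive channel of Eq. 1, modeled here (an assumption) as white Gaussian noise."""
    rng = random.Random(seed)
    return [s + rng.gauss(0.0, noise_std) for s in frame]
```

For a real 512-sample frame `s` with thresholds from a psychoacoustic module, the received frame of Eq. 1 would be `additive_channel(embed_bit(s, modulate_key(pn_key(FRAME_LEN, seed), thresholds), w_j), sigma, seed2)`. The direct DFT keeps the sketch dependency-free.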
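The decision mechanism of our WM decoder is similar to that of a correlation receiver. The WM decoder computes the correlation between the received watermarked signal, $s_i^{t,\mathrm{R}}$, and the embedded secret key, k, using Eq. 2:

$$ r = \frac{\sum_{n=0}^{N-1} \bigl(k(n) - \langle k \rangle\bigr)\bigl(s_i^{t,\mathrm{R}}(n) - \langle s_i^{t,\mathrm{R}} \rangle\bigr)}{\sqrt{\sum_{n=0}^{N-1} \bigl(k(n) - \langle k \rangle\bigr)^2 \, \sum_{n=0}^{N-1} \bigl(s_i^{t,\mathrm{R}}(n) - \langle s_i^{t,\mathrm{R}} \rangle\bigr)^2}} \qquad (2) $$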
where $\langle \cdot \rangle$ denotes the mean operation. The extracted watermark bit is either 1 or -1, depending on the sign of the correlation r. Each inserted watermark bit, $w_j$, j = 1, 2, ..., L, is estimated as follows:

$$ \hat{w}_j = \begin{cases} 1, & \text{if } \operatorname{sign}(r) > 0 \\ -1, & \text{if } \operatorname{sign}(r) < 0 \end{cases} \qquad (3) $$
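Eqs. 2 and 3 translate directly into code. The sketch below is a plain-Python illustration, not the authors' implementation, and the function names are hypothetical. Eq. 3 does not cover r = 0, so the sketch returns 0 ("undecided") in that case. An encoder feedback loop can treat that value as a decoding error.

```python
import math
from typing import Sequence


def normalized_correlation(key: Sequence[float], received: Sequence[float]) -> float:
    """Eq. 2: correlation coefficient r between the key k(n) and the received frame."""
    if len(key) != len(received) or not key:
        raise ValueError("key and frame must be non-empty and have the same length N")
    n = len(key)
    mean_k = sum(key) / n          # <k>
    mean_s = sum(received) / n     # <s_R>
    num = sum((k - mean_k) * (s - mean_s) for k, s in zip(key, received))
    den = math.sqrt(sum((k - mean_k) ** 2 for k in key)
                    * sum((s - mean_s) ** 2 for s in received))
    return num / den if den > 0.0 else 0.0  # a constant frame carries no information


def decode_bit(key: Sequence[float], received: Sequence[float]) -> int:
    """Eq. 3: +1 if r > 0, -1 if r < 0; 0 ("undecided") if r == 0, a case Eq. 3 omits."""
    r = normalized_correlation(key, received)
    if r > 0.0:
        return 1
    if r < 0.0:
        return -1
    return 0
```

Because both signals are mean-removed and the result is normalized, r lies in [-1, 1]. It is unaffected by a DC offset or by the overall loudness of the received frame, which suits an analog channel with unknown gain.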
To improve decoding accuracy, the secret key is selected to be orthogonal to the original audio signal. Furthermore, the secret key is a zero-mean, unit-variance signal [6]. Because the channel is additive, decoding accuracy is directly related to the watermark-to-signal ratio (WSR), which can be computed by Eq. 4.
$$ \mathrm{WSR} = 10 \log_{10} \frac{\sum_{n=0}^{N-1} \bigl[s_i^{t,\mathrm{WM}}(n) - s_i^t(n)\bigr]^2}{\sum_{n=0}^{N-1} \bigl(s_i^t(n)\bigr)^2} \ \mathrm{dB} \qquad (4) $$
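Eq. 4 compares the energy of the embedded watermark, $s_i^{t,\mathrm{WM}}(n) - s_i^t(n)$, with the energy of the original frame. A minimal per-frame sketch follows. The function name `wsr_db` is hypothetical.

```python
import math
from typing import Sequence


def wsr_db(original: Sequence[float], watermarked: Sequence[float]) -> float:
    """Eq. 4: watermark-to-signal ratio of one frame, in dB."""
    if len(original) != len(watermarked):
        raise ValueError("frames must have the same length N")
    watermark_energy = sum((w - s) ** 2 for s, w in zip(original, watermarked))
    signal_energy = sum(s ** 2 for s in original)
    if watermark_energy == 0.0 or signal_energy == 0.0:
        raise ValueError("WSR is undefined for an empty watermark or a silent frame")
    return 10.0 * math.log10(watermark_energy / signal_energy)
```

For example, `wsr_db([10.0] * 4, [11.0, 9.0, 11.0, 9.0])` compares a watermark energy of 4 with a frame energy of 400, which gives a WSR of -20 dB.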
The stronger the embedded watermark, the higher the watermark decoding accuracy. Fig. 3 presents the main blocks of our watermark decoder. To improve decoding performance over the communication channel, a channel decoder is also used at the decoding stage.
4 Test Results
The test data set is prepared by sampling audio recorded from different radio and TV channels at 44100 Hz. The audio is stored in WAVE format and processed in frames of 512 samples, so each audio frame is about 11.6 ms long. We implemented an MPEG Layer 1 compatible psychoacoustic masking module, which identifies tonal and non-tonal components as specified in MPEG Layer 1. Stereo-to-mono conversions are performed with a professional software package.

Fig. 3 Audio watermark decoder.
Fig. 4 Distribution of correlations versus frame number for 64 audio frames.
The feedback block in Figure 2 increases the WSR through iterative insertions. At each state t, the embedding capacity varies with the masked original audio data. Each re-embedding changes the masking thresholds in a data-adaptive way. This keeps the inserted watermark just below the audible level while increasing the insertion capacity. The proposed encoder therefore maximizes the size of the embedded data, which results in higher decoding performance. Figure 4 illustrates the distribution of correlations versus frame number for 64 audio frames, i.e. for a watermark length of 64. The red line shows this distribution for the standard watermark encoder, and the blue line shows the distribution obtained by the proposed encoder. The proposed encoder increases the correlation between the watermarked data and the secret key. Observe that, when required, the encoder eliminates decoding errors by iteratively changing the sign of the correlation. Fig. 5 shows that, in our encoding scheme, the number of repeated watermark insertions varies from frame to frame. In this example the maximum is 18 and the minimum is 1. The adaptive encoder thus inserts the same watermark bit into an audio frame as many times as required.
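The feedback loop of Figure 2 can be sketched in three steps. First, embed the bit. Second, decode it with the same correlation rule as Eqs. 2 and 3. Third, if the decision is wrong, re-embed into the already watermarked frame and repeat until the decoded bit matches $w_j$. This is a simplified illustration under explicit assumptions:

- `toy_shape_key` is a hypothetical stand-in for the masking-threshold modulation. It only scales the key to 5% of the current frame's RMS and is not a psychoacoustic model.
- The cap `max_insertions` is an assumed safeguard. The paper reports up to 18 repeats in its example but does not state a limit.

```python
import math
from typing import Callable, List, Sequence, Tuple

ShapeFn = Callable[[Sequence[float], Sequence[float]], List[float]]


def correlation(key: Sequence[float], frame: Sequence[float]) -> float:
    """Eq. 2: correlation coefficient between key and frame (same rule as the decoder)."""
    n = len(key)
    mk, ms = sum(key) / n, sum(frame) / n
    num = sum((k - mk) * (s - ms) for k, s in zip(key, frame))
    den = math.sqrt(sum((k - mk) ** 2 for k in key) * sum((s - ms) ** 2 for s in frame))
    return num / den if den > 0.0 else 0.0


def toy_shape_key(key: Sequence[float], frame: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for masking-threshold modulation: key scaled to 5% of frame RMS."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return [0.05 * rms * k for k in key]


def adaptive_embed(frame: Sequence[float], key: Sequence[float], bit: int,
                   shape_key: ShapeFn, max_insertions: int = 32) -> Tuple[List[float], int, bool]:
    """Re-embed `bit` (+1 or -1) until the decoder's sign decision (Eq. 3) matches it.

    Returns the watermarked frame, the number of insertions t, and whether the bit
    now decodes correctly.
    """
    current = list(frame)
    for t in range(1, max_insertions + 1):
        key_m = shape_key(key, current)            # thresholds recomputed on the current frame
        current = [s + bit * k for s, k in zip(current, key_m)]
        if correlation(key, current) * bit > 0:    # feedback check: decoding error is zero
            return current, t, True
    return current, max_insertions, False          # give up after the assumed cap
```

A frame whose content is anti-correlated with the key needs several insertions before the sign of r flips, whereas a frame that already favours the bit is done after one insertion. This mirrors the frame-to-frame variation in Fig. 5.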
Tables 1 and 2 report the experimental results. To observe decoding performance for different watermark lengths, the watermark bits are generated with BCH(15,11) and BCH(63,16) encoders. Table 1 reports the results for L = 15 and Table 2 those for L = 63. The audio data is labeled as speech or music, and WM decoding performance is reported for both audio types.

Fig. 5 Distribution of watermark insertion repeats versus frame number for 64 audio frames.
Since the embedded watermark bits are generated as coded bit streams, the decoded watermark bits are analyzed as code-word blocks at the decoding stage. Channel codes are important for making WM encoding robust to communication errors, and in our work BCH codes are used as channel codes.
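BCH(15,11), with 11 message bits and 15 code bits, is the single-error-correcting BCH code of length 15. It is a cyclic Hamming code with generator polynomial $g(x) = x^4 + x + 1$. The minimal systematic encoder and syndrome decoder below show what the channel encoder and decoder blocks do. The paper does not give its implementation. The function names and the mapping of code bits 1/0 to watermark bits +1/-1 are assumptions. The 63-bit code used for Table 2 corrects multiple errors and needs a full algebraic decoder (for example Berlekamp-Massey), which is not shown.

```python
from typing import List

GENERATOR = 0b10011      # g(x) = x^4 + x + 1; bit i holds the coefficient of x^i
N_CODE, K_MSG = 15, 11


def poly_mod(value: int, gen: int = GENERATOR) -> int:
    """Remainder of value(x) divided by gen(x) over GF(2)."""
    deg = gen.bit_length() - 1
    while value.bit_length() - 1 >= deg:
        value ^= gen << (value.bit_length() - 1 - deg)
    return value


def bch15_11_encode(message: int) -> int:
    """Systematic encoding: 11 message bits on top, 4 parity bits (the remainder) below."""
    if not 0 <= message < 1 << K_MSG:
        raise ValueError("message must fit in 11 bits")
    shifted = message << (N_CODE - K_MSG)
    return shifted | poly_mod(shifted)


def bch15_11_decode(word: int) -> int:
    """Correct up to one bit error via the syndrome, then return the 11 message bits."""
    syndrome = poly_mod(word)
    if syndrome:
        for pos in range(N_CODE):
            if poly_mod(1 << pos) == syndrome:   # x^pos mod g(x) locates the error
                word ^= 1 << pos
                break
    return word >> (N_CODE - K_MSG)


def to_antipodal(word: int) -> List[int]:
    """Map code bits 1/0 to watermark bits +1/-1 (assumed convention)."""
    return [1 if (word >> i) & 1 else -1 for i in range(N_CODE)]


def from_antipodal(bits: List[int]) -> int:
    """Inverse of to_antipodal, e.g. applied to the decoded watermark bits w_j."""
    return sum(1 << i for i, b in enumerate(bits) if b > 0)
```

Because $g(x)$ is primitive, the 15 single-bit error patterns produce 15 distinct nonzero syndromes. One wrongly decided watermark bit per 15-bit block is therefore always corrected.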
To evaluate robustness to stereo-to-mono conversion, the left (L) and right (R) channels of stereo recordings are extracted with software. Both channels are watermarked simultaneously, and editing software then creates the watermarked stereo record. Robustness is analyzed by converting the watermarked stereo record to mono. The tables report results for the left (L) and right (R) channels and for the stereo-to-mono conversion (S-to-M). In an alternative experiment, the original stereo record is first converted to mono, and only the mono record is watermarked instead of the two channels. The tables report the results of this experiment as Mono.

Watermark decoding performance is reported in terms of correctly decoded blocks (C), missed blocks (M) and false alarms (F). In practice, the same watermark block is inserted repeatedly, so decoding performance is observed over the whole record. Errors are reported as bit errors rather than block errors, and the channel decoder performs error correction. Tables 1 and 2 report decoding performance for a difficult speech file and a highly complex music file. Results are given for the watermarked left and right channels, the mono record and the stereo-to-mono conversion. The false alarm count (F) is zero for both methods. However, without adaptivity most of the inserted bits are missed. The adaptive encoder is also robust to stereo-to-mono conversion, and it performs well even when the watermark length is increased from 15 to 63 bits. This is mainly because the adaptive encoder embeds more watermark data into the original audio. For t = 2, Fig. 6 illustrates the inserted watermark signal in the time domain for adaptive watermarking (a) and for standard encoding (b). Note that the adaptive encoder increases the size of the embedded data. Fig. 7 and Fig. 8 illustrate the original and watermarked signals in the time and frequency domains, respectively. Note that adaptive watermarking does not change the time- and frequency-domain characteristics of the audio data.

Fig. 6 Watermark signal embedded into one audio frame: a) using the adaptive encoder, b) using the standard encoder.

5 Conclusion
This paper proposes a new data-adaptive watermark encoder that minimizes decoding errors at the encoding stage. The encoding scheme iteratively extends the data embedding capacity specified by the original audio signal while preserving the imperceptibility criterion. The proposed encoder significantly improves the decoding performance of standard frequency-masking-based watermarking and is robust to stereo-to-mono conversion.

References:
[1] F. Hartung and M. Kutter, “Multimedia Watermarking Techniques,” Proc. IEEE, vol. 87, no. 7, pp. 1079-1107, July 1999.
[2] M. D. Swanson, B. Zhu, A. H. Tewfik, and L. Boney, “Robust Audio Watermarking Using Perceptual Masking,” Signal Processing, vol. 66, pp. 337-355, 1998.
[3] R. C. Bose and D. K. Ray-Chaudhuri, “On a Class of Error Correcting Binary Group Codes,” Information and Control, vol. 3, pp. 68-79, 1960.
[4] H. L. Van Trees, Detection, Estimation, and Modulation Theory, vol. 1, Wiley, New York, 1968.
[5] Information technology – Coding of audio-visual objects: Structured audio, ISO/IEC FDIS 144-3, sec. 5, 10th, March 1999.
[6] H. S. Malvar and D. A. F. Florêncio, “Improved Spread Spectrum: A New Modulation Technique for Robust Watermarking,” IEEE Trans. on Signal Processing, vol. 51, no. 4, pp. 898-905, April 2003.
Table 1 Decoding performance for a 15-bit watermark. Correct (C), False (F) and Misclassified (M) watermark blocks are reported for music and speech data. Each entry is C/F/M.

Channel | Speech (17 sec.), WM Enc. | Speech (17 sec.), Adaptive WM Enc. | Music (36 sec.), WM Enc. | Music (36 sec.), Adaptive WM Enc.
L | 7/0/94 | 101/0/0 | 58/0/154 | 212/0/0
R | 9/0/92 | 101/0/0 | 48/0/164 | 212/0/0
S-to-M | 4/0/97 | 101/0/0 | 64/0/148 | 212/0/0
Mono | 7/0/94 | 101/0/0 | 62/0/150 | 212/0/0

Fig. 7 a) Original, b) watermarked, c) adaptively watermarked signals in time domain.
Table 2 Decoding performance for a 63-bit watermark. Correct (C), False (F) and Misclassified (M) watermark blocks are reported for music and speech data. Each entry is C/F/M.

Channel | Speech (17 sec.), WM Enc. | Speech (17 sec.), Adaptive WM Enc. | Music (36 sec.), WM Enc. | Music (36 sec.), Adaptive WM Enc.
L | 1/0/23 | 24/0/0 | 15/0/35 | 50/0/0
R | 1/0/23 | 24/0/0 | 15/0/35 | 50/0/0
S-to-M | 1/0/23 | 24/0/0 | 17/0/33 | 50/0/0
Mono | 1/0/23 | 24/0/0 | 16/0/34 | 50/0/0
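The script below puts the block counts of Tables 1 and 2 on a common scale. It computes the fraction of inserted blocks decoded correctly, taken here as C / (C + M). Leaving F out of the denominator is an assumption, harmless only because F is zero in every row. The counts are copied from the tables.

```python
from fractions import Fraction

# (C, F, M) per channel: (standard WM Enc., Adaptive WM Enc.), copied from Tables 1 and 2.
RESULTS = {
    15: {
        "Speech": {"L": ((7, 0, 94), (101, 0, 0)), "R": ((9, 0, 92), (101, 0, 0)),
                   "S-to-M": ((4, 0, 97), (101, 0, 0)), "Mono": ((7, 0, 94), (101, 0, 0))},
        "Music": {"L": ((58, 0, 154), (212, 0, 0)), "R": ((48, 0, 164), (212, 0, 0)),
                  "S-to-M": ((64, 0, 148), (212, 0, 0)), "Mono": ((62, 0, 150), (212, 0, 0))},
    },
    63: {
        "Speech": {"L": ((1, 0, 23), (24, 0, 0)), "R": ((1, 0, 23), (24, 0, 0)),
                   "S-to-M": ((1, 0, 23), (24, 0, 0)), "Mono": ((1, 0, 23), (24, 0, 0))},
        "Music": {"L": ((15, 0, 35), (50, 0, 0)), "R": ((15, 0, 35), (50, 0, 0)),
                  "S-to-M": ((17, 0, 33), (50, 0, 0)), "Mono": ((16, 0, 34), (50, 0, 0))},
    },
}


def correct_rate(counts):
    """C / (C + M): share of inserted blocks decoded correctly (F excluded by assumption)."""
    c, _false_alarms, m = counts
    return Fraction(c, c + m)


for length, by_audio in RESULTS.items():
    for audio, rows in by_audio.items():
        for channel, (standard, adaptive) in rows.items():
            print(f"L={length:2d} {audio:6s} {channel:6s} "
                  f"standard {float(correct_rate(standard)):6.1%}  "
                  f"adaptive {float(correct_rate(adaptive)):6.1%}")
```

Every adaptive-encoder row evaluates to 100%, while the standard encoder stays below 50% in every row (at most 34%, for music after stereo-to-mono conversion in Table 2).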
Fig. 8 a) Original, b) watermarked, c) adaptively watermarked signals in frequency domain.