An Integrated Decoding Framework for Audio Watermark Extraction*
Yusuf Yaslan, Bilge Gunsel
Multimedia Signal Processing and Pattern Recognition Lab, Electrical-Electronics Faculty, Dept. of Electronics and Communications Eng., Istanbul Technical University, 34469 Maslak, Istanbul, Turkey
http://www.ehb.itu.edu.tr/~bgunsel/mspr
Abstract
This paper proposes a blind audio watermark extraction technique that performs watermark decoding while establishing data synchronization. The proposed decoding algorithm employs correlation techniques supported by a wavelet denoising process, which improves the decoding performance significantly. A data-adaptive, nonlinear, MPEG Layer 1 Model 1 compatible watermark encoder is designed for watermark embedding. A channel encoder is also included in the system to take advantage of error correction. The method does not require the original audio for decoding, and it is robust to channel noise, filtering, and stereo-to-mono conversions. It works at very low watermark-to-signal ratios and thus preserves inaudibility.
1. Introduction
Audio watermarking is a technique that can be used to declare ownership, to verify authenticity, or to carry or hide useful information in digital audio data. An effective watermarking scheme should satisfy a set of requirements, including transparency and robustness to intentional and unintentional attacks. Watermarks can be extracted with the help of the original data or without access to it (blind detection). Blind detection is practical because the receiver needs nothing except the secret key. It remains challenging, however, since the embedded watermark must be extracted from an attacked, transmitted signal. Most existing decoding methods detect watermarks using simple correlation-based decision rules [1] and usually do not deal directly with data synchronization. Consequently, decoding performance degrades significantly when the watermarked data is transmitted through a real communication channel. In this paper, we introduce a blind watermark decoding framework that integrates the data synchronization and data extraction processes, resulting in high and consistent decoding performance under many conditions. It is complemented by a data-adaptive watermark embedding scheme that maximizes the Watermark-to-Signal Ratio (WSR) according to the transmission channel conditions [2]. Our decoding framework first performs wavelet-based denoising on the received watermarked signal. Wavelet denoising is a common preprocessing step in communications that is used to eliminate the effect of noise [3]; in this work, however, it is used as a tool to access the useful portion of the data, i.e., the embedded watermark information. After denoising, data synchronization is performed on the decomposed signal, and the decoder then searches for the watermark synchronization pattern in order to extract the watermark bits. The extracted watermark data is delivered to the user. To improve robustness to intentional attacks and to mitigate channel noise during transmission, an optional channel encoder is also included in the system, with a corresponding channel decoding process at the receiver.
2. Adaptive Watermark Encoding
An adaptive audio watermarking technique introduced in [2] is used for watermark embedding. It maximizes the watermark-to-signal ratio iteratively and controls the decoding accuracy at the watermark encoding stage, while preserving inaudibility.
*This work is partially supported by Rumeli Software Inc.
0-7695-2128-2/04 $20.00 (C) 2004 IEEE
Let $s^t_i$ denote the $i$th audio frame at state $t$. Since the adaptive watermarking method iteratively inserts one watermark bit, $w_j$, into each audio frame, $t$ counts these insertions and thus refers to the state of insertion. The value of $t$ increases whenever the Watermark-to-Signal Ratio (WSR) is small, and the embedding algorithm converges to the optimal value of $t$ by minimizing the decoding error. Watermark bits are either 1 or -1, and $j$ is an integer from 1 to $L$, where $L$ is the watermark length. The audio data is processed frame by frame. At each instant, the encoder takes an original audio frame, $s^t_i$, as input and transmits the corresponding watermarked frame, $s^t_{i,WM}$, over the communication channel. Watermark embedding is performed as follows:
$$s^t_{i,WM} = s^t_i + f(s^t_i, w_j)\,k = s^t_i + w_j\,k_{m_i}, \qquad i = 1,\dots,L \times RP,\quad j = 1,\dots,L \qquad (1)$$
where the Refresh Period (RP) is the number of times the watermark block is embedded; RP is at least 1, and RP = 1 means the watermark bit stream is embedded once. In Eq. (1), $w_j$ is the watermark bit transmitted in frame $i$, and $k$ is a zero-mean secret key sequence generated by a pseudo-noise (PN) generator. $f(s^t_i, w_j)$ is a nonlinear function of the input audio signal $s^t_i$ and the watermark bit $w_j$; $f(\cdot)$ models the watermark generation. In [1], an analytic approach to $f(\cdot)$ is introduced, and it is shown that an analytic form for $f(\cdot)$ is difficult to obtain. In [2], an iterative approach that specifies $f(\cdot)$ in a data-adaptive way is proposed. Thus, in Eq. (1), $w_j k_{m_i}$ models the nonlinear distortions arising from watermarking. Here, $k_{m_i}$ is the embedded data obtained by applying psychoacoustic masking to $k$, which we refer to as the modulated key.
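The embedding rule of Eq. (1) can be sketched as follows. This is a minimal illustration under stated assumptions, not the encoder of [2]: the modulated key $k_{m_i}$ is replaced by a scaled copy $\alpha k$ (no psychoacoustic masking), Python's `random` module stands in for the PN generator, frame $i$ is assumed to carry bit $j = i \bmod L$ (0-based), and the hypothetical `embed_adaptive` loop simply raises $\alpha$ until the frame correlates with $k$ with the correct sign and a margin, as a rough analogue of the state counter $t$.

```python
import math
import random


def pn_key(length, seed):
    """Zero-mean +/-1 key k; Python's PRNG stands in for the PN generator (assumption)."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(length)]


def pearson(x, y):
    """Normalized correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den if den > 0 else 0.0


def embed_frame(frame, w_j, key, alpha):
    """Eq. (1) with k_m = alpha * k (no psychoacoustic masking): s_WM = s + w_j * k_m."""
    return [s + w_j * alpha * k for s, k in zip(frame, key)]


def embed_adaptive(frame, w_j, key, alpha=0.01, step=1.5, margin=0.1, max_iter=20):
    """Hypothetical stand-in for the iterative encoder of [2], not its actual procedure.

    The counter t mimics the insertion state: alpha grows until the watermarked
    frame correlates with k with the sign of w_j and at least the given margin.
    """
    for t in range(1, max_iter + 1):
        wm = embed_frame(frame, w_j, key, alpha)
        if w_j * pearson(wm, key) > margin:
            break
        alpha *= step
    return wm, t


def embed_stream(frames, bits, key, rp=1):
    """Embed L bits RP times, one bit per frame; frame i carries bits[i % L] (assumption)."""
    L = len(bits)
    return [embed_adaptive(frames[i], bits[i % L], key)[0] for i in range(L * rp)]
```

Growing the strength geometrically keeps the number of iterations small; a real encoder would also bound the strength by the masking threshold to stay inaudible.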
3. Synchronization and Watermark Decoding
The proposed decoding algorithm first applies wavelet-based denoising to the received watermarked audio data. A correlation-based decision rule is then employed to establish data synchronization between the transmitted and received audio streams. The embedded watermark bits are extracted after data synchronization.
The received watermarked audio signal, $s^t_{i,R}$, can be expressed as
$$s^t_{i,R} = s^t_{i,WM} + n = s^t_i + w_j\,k_{m_i} + n, \qquad i = 1,\dots,L \times RP,\quad j = 1,\dots,L \qquad (2)$$
where $s^t_{i,WM}$ is the watermarked signal and $n$ models the additive channel noise. Assuming that the original signal constitutes the approximation part of the wavelet-decomposed received data, an estimate of the original audio signal is obtained by taking the inverse wavelet transform of the thresholded received signal:
$$\hat{s}^t_i = W^{-1}\big(\Lambda_h\big(W(s^t_{i,R})\big)\big), \qquad i = 1,\dots,L \times RP \qquad (3)$$
In Eq. (3), $\Lambda_h$ is the wavelet thresholding operator, which eliminates the detail coefficients smaller than a threshold $h$; $W$ and $W^{-1}$ denote the wavelet and inverse wavelet transforms, respectively [3]. Using Eqs. (2) and (3), an estimate $\hat{e}$ of the embedded watermark signal, including channel noise, is obtained as
$$\hat{e} = s^t_{i,R} - \hat{s}^t_i = \hat{w}_j\,\hat{k}_{m_i} + n, \qquad j = 1,\dots,L \qquad (4)$$
To extract the inserted watermark data accurately, the decoding scheme must eliminate the noise component without knowing the original audio signal; only the secret key is known at the receiver. The proposed watermark decoding technique therefore estimates the embedded data using a correlation-based decision rule. The correlation function between $\hat{e}$ and the secret key $k$ is defined by Eq. (5):
$$r_i(\tau) = \frac{\sum_{n=0}^{M-1}\big(k(n) - \langle k \rangle\big)\big(\hat{w}_j\hat{k}_{m_i}(n+\tau) - \langle \hat{w}_j\hat{k}_{m_i}(\tau) \rangle\big)}{\sqrt{\sum_{n=0}^{M-1}\big(k(n) - \langle k \rangle\big)^2 \,\sum_{n=0}^{M-1}\big(\hat{w}_j\hat{k}_{m_i}(n+\tau) - \langle \hat{w}_j\hat{k}_{m_i}(\tau) \rangle\big)^2}} \qquad (5)$$
where $\langle\cdot\rangle$ is the mean operation defined by Eq. (6), $k(n)$ is a sample of the secret key signal, and the number of samples, $M$, is determined by the embedding frequency band; $\tau = 0,\dots,N-1$, where $N$ is the number of samples within an audio frame.
$$\langle \hat{w}_j\hat{k}_{m_i}(\tau) \rangle = \frac{1}{M}\sum_{n=0}^{M-1} \hat{w}_j\hat{k}_{m_i}(n+\tau) \qquad (6)$$
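Eqs. (3)-(6) can be illustrated with a short sketch. It assumes an orthonormal Haar wavelet (the text does not name the wavelet or the number of decomposition levels), hard thresholding in which detail coefficients with magnitude below $h$ are zeroed, and a received buffer long enough that $\hat{e}(n+\tau)$ is defined for every lag tested; the helper names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_forward(x, levels):
    """Orthonormal Haar DWT W(.); len(x) must be divisible by 2**levels."""
    approx, details = list(x), []
    for _ in range(levels):
        half = len(approx) // 2
        details.append([(approx[2 * n] - approx[2 * n + 1]) / SQRT2 for n in range(half)])
        approx = [(approx[2 * n] + approx[2 * n + 1]) / SQRT2 for n in range(half)]
    return approx, details


def haar_inverse(approx, details):
    """Inverse Haar DWT W^-1(.)."""
    x = list(approx)
    for band in reversed(details):
        y = []
        for a, d in zip(x, band):
            y.extend(((a + d) / SQRT2, (a - d) / SQRT2))
        x = y
    return x


def denoise(received, h, levels=3):
    """Eqs. (3)-(4): keep approximation and large details as s_hat; residual is e_hat."""
    approx, details = haar_forward(received, levels)
    kept = [[d if abs(d) >= h else 0.0 for d in band] for band in details]  # Lambda_h
    s_hat = haar_inverse(approx, kept)
    e_hat = [r - s for r, s in zip(received, s_hat)]
    return s_hat, e_hat


def mean(x):
    return sum(x) / len(x)


def correlation(key, e_hat, tau):
    """Eqs. (5)-(6): normalized correlation between key k and e_hat shifted by tau."""
    M = len(key)
    seg = e_hat[tau:tau + M]
    if len(seg) < M:
        raise ValueError("e_hat is too short for this lag")
    mk, ms = mean(key), mean(seg)  # <k> and the mean of Eq. (6)
    num = sum((k - mk) * (s - ms) for k, s in zip(key, seg))
    den = math.sqrt(sum((k - mk) ** 2 for k in key) * sum((s - ms) ** 2 for s in seg))
    return num / den if den > 0 else 0.0
```

With $h = 0$ every detail coefficient is kept, $\hat{s}$ reproduces the input, and $\hat{e}$ vanishes; larger $h$ pushes the small-amplitude watermark and noise into $\hat{e}$.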
Note that the correlation function is simplified under a number of assumptions. The channel noise is considered zero-mean i.i.d. noise, and the watermark bit stream $w$ and the secret key $k$ are chosen as independent signals, which is achieved by generating $k$ and $w$ as PN sequences. Optionally, $w$ can be generated as a channel codeword. For each audio frame, when the transmitter and receiver are synchronized, it is sufficient to compute $r_i(0)$ and then use the value of the correlation to decide whether the frame is watermarked and to extract the embedded watermark bit. In practice, however, the transmission channel introduces delays. The proposed watermark extraction technique therefore includes a data synchronization step that aims to find the exact value of $\tau$ that specifies the delay between the transmission and reception of audio bits. For each frame within the refresh period, $r_i(\tau)$ is computed for $\tau = 0,\dots,M-1$ using Eq. (5). A histogram of the $\tau$ values corresponding to the maximum $r_i(\tau)$, $i = 1,\dots,L \times RP$, is obtained, and the peaks of the histogram are considered potential synchronization locations. These locations are reduced to one according to a tolerance $t$, and the maximum $\tau$ is declared as the data synchronization delay, $\tau_{synch}$. Let $r_i(\tau_{synch} \pm t)$ denote the correlation computed around $\tau_{synch}$ for a given tolerance $t$. Then $R_i$, the maximum correlation value for audio frame $i$, is computed by maximizing $r_i$ with respect to the tolerance $t$:
$$R_i = \max_{t}\; r_i(\tau_{synch} \pm t) \qquad (7)$$
After data synchronization, watermark extraction is performed as an integrated estimation process. Let
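$\hat{d} = [\hat{d}_i]_{L \times RP}$ be the vector collecting all estimated watermark bits within the refresh period. Each decoded bit, $\hat{d}_i$, is estimated as follows:
$$\hat{d}_i = \begin{cases} 1, & \text{if } \operatorname{sign}(R_i) > 0 \\ -1, & \text{if } \operatorname{sign}(R_i) < 0 \end{cases} \qquad (8)$$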
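The synchronization and bit decisions of Eqs. (7)-(8) might be sketched as below, taking a precomputed table `r[i][tau]` of correlations from Eq. (5). Two simplifications are assumptions, not the paper's exact rule: the histogram peak is taken as the most frequent per-frame peak lag (the tolerance-based reduction of several peaks is not reproduced), and the maximizations are read over $|r_i(\tau)|$ so that the sign of $R_i$ still carries the bit. The function name `synchronize` is hypothetical.

```python
from collections import Counter


def synchronize(r, tol):
    """Sketch of Eqs. (7)-(8) from a table r[i][tau] of per-frame correlations (Eq. (5)).

    Assumptions: tau_synch is the most frequent per-frame peak lag, and maxima are
    taken over |r_i(tau)| so that the sign of R_i still carries the bit.
    """
    peaks = [max(range(len(ri)), key=lambda tau: abs(ri[tau])) for ri in r]
    tau_synch = Counter(peaks).most_common(1)[0][0]  # histogram peak
    R = []
    for ri in r:
        lo, hi = max(0, tau_synch - tol), min(len(ri) - 1, tau_synch + tol)
        R.append(max((ri[tau] for tau in range(lo, hi + 1)), key=abs))  # Eq. (7)
    # Eq. (8); R_i == 0 is undefined in the text and is mapped to -1 here.
    d_hat = [1 if Ri > 0 else -1 for Ri in R]
    return tau_synch, R, d_hat
```

Voting over all frames makes the delay estimate robust to a few frames whose correlation peak is displaced by noise; the tolerance window then lets each frame absorb small residual jitter.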
To extract the watermark vector $\hat{w}$, the decoder scans $\hat{d}$, shifting one bit to the right until it finds a match with the watermark synchronization pattern. In our work, the synchronization pattern is an eight-bit all-ones sequence. If a match is found within the refresh period, watermark synchronization is declared and watermark decoding is initialized; otherwise, the same process is repeated until synchronization is established. Establishing watermark pattern synchronization speeds up decoding and ensures that the decoder can detect unwatermarked bit streams, thus preventing false alarms. The decoder switches to the watermark search whenever watermark synchronization is established. Let $\hat{w}$ denote the extracted watermark vector. The decoder takes $L$ bits of $\hat{d}$ as the initial value of $\hat{w}$, where $\hat{w} = [\hat{d}_j]$, $j = 1,\dots,L$. Suppose an $(L, c)$ BCH code that corrects up to $\lfloor (d_{min}-1)/2 \rfloor$ errors, where $d_{min}$ is the minimum distance of the code, is used as the channel encoder. Then $\hat{w}$ should be a codeword. Therefore, when the estimated watermark vector $\hat{w}$ is processed by the channel decoder, up to $\lfloor (d_{min}-1)/2 \rfloor$ transmission errors are corrected and $\hat{w}$ is updated. Let the updated watermark vector be $\hat{w}_{BCH} = [\hat{d}^{BCH}_j]$, $j = 1,\dots,L$. $\hat{w}_{BCH}$ is treated as a
possible watermark vector and used as a query for a database search. If a match between $\hat{w}_{BCH}$ and any of the watermark bit streams stored in the model database is found, the name of the channel represented by $\hat{w}_{BCH}$ is declared, and a new search is initiated by shifting the input audio stream $L$ bits to the right. When the channel encoding step is skipped, a similar decoding process is performed on the input bit streams without error correction; in this case, channel decoding is replaced by a statistical decision rule. Eliminating the channel encoder is preferable when the watermark bit streams are long or otherwise unsuited to a channel encoder. It is also desirable for reducing computational complexity when the application does not require secure communication. In this case, to extract the watermark vector $\hat{w}$, the decoder scans $\hat{d}$, shifting $L$ bits to the right until it finds a match with any of the watermark bit streams. As in the coded case, the decoder takes $L$ bits of
$\hat{d}$ as the initial value of $\hat{w}$, where $\hat{w} = [\hat{d}_j]$, $j = 1,\dots,L$. At each iteration, the Hamming distance between $\hat{w}$ and the watermark vectors stored in the model database is computed. If the distance is zero for any of the stored watermarks, a match is declared and a new search is initiated. Otherwise, $\hat{w}$ is updated by a logical AND of the current $\hat{w}$ and the next block $[\hat{d}_j]$, $j = L+1,\dots,2L$. Updating is repeated within the refresh period until a match is found. To prevent false alarms, $\hat{w}$ is reset to the zero vector whenever a match is found.
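The two bit-stream searches described above can be sketched as follows: a one-bit sliding search for the eight-bit all-ones synchronization pattern, and the uncoded database match that ANDs successive $L$-bit blocks of $\hat{d}$ until the Hamming distance to a stored watermark is zero. BCH decoding is omitted; the mapping of ±1 bits to booleans for the AND and the function names are assumptions for illustration.

```python
SYNC_PATTERN = [1] * 8  # eight-bit all-ones watermark synchronization pattern


def find_sync(d_hat, pattern=SYNC_PATTERN):
    """Slide one bit at a time over d_hat; return the first matching index, or -1."""
    n = len(pattern)
    for start in range(len(d_hat) - n + 1):
        if d_hat[start:start + n] == pattern:
            return start
    return -1


def hamming(a, b):
    """Number of positions where two equal-length bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def as_bool(bits):
    """Map +/-1 bits to booleans (1 -> True); the mapping is an assumption."""
    return [b == 1 for b in bits]


def match_uncoded(d_hat, database, L):
    """Uncoded path: AND successive L-bit blocks of d_hat until a stored watermark matches.

    Returns (name, end_index) on a match, or (None, end_index) once the
    refresh period held in d_hat is exhausted.
    """
    stored = {name: as_bool(w) for name, w in database.items()}
    w_hat, pos = as_bool(d_hat[:L]), L
    while True:
        for name, w in stored.items():
            if hamming(w_hat, w) == 0:
                return name, pos
        if pos + L > len(d_hat):
            return None, pos
        w_hat = [a and b for a, b in zip(w_hat, as_bool(d_hat[pos:pos + L]))]
        pos += L
```

Note that the AND update can only clear spurious 1s; a 1 lost to a channel error is not recovered, which is one reason the coded path with error correction performs better.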
4. Experimental Results
A test data set is prepared by digitizing various speech and music files at a 44.1 kHz sampling rate (16 bits/sample). Two watermark lengths are considered, L = 15 bits and L = 63 bits, with the BCH(15,11) and BCH(63,16) codes, respectively, used for channel coding. Stereo-to-mono conversions were performed with professional audio editing software. Fig. 1 presents the watermark extraction performance of traditional spread spectrum audio watermarking (enc1) and the proposed adaptive audio watermarking (enc2) at different WSRs (L = 15). Watermark embedding is performed within the 2-22050 Hz frequency band. Note that the WSR values are chosen to be very low to guarantee inaudibility. To observe robustness to a filtering attack, the watermarked audio files are passed through a band-pass filter (25-13000 Hz). Fig. 1 shows the probability of detection (PD) versus WSR for each case. Observe that enc2 provides 100% decoding accuracy, which is achieved by the iterative embedding mechanism that minimizes the decoding error at the encoder. The PD of enc1 drops to 50% at WSR = -25 dB. Note that the WM extraction performance of both encoders remains almost the same under the filtering attack.
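The experimental conditions can be reproduced in outline with a few helpers: one scales a watermark to a target WSR and one adds i.i.d. Gaussian channel noise at a given SNR, as in Eq. (2). The text does not define WSR explicitly; the sketch assumes the usual power ratio $10\log_{10}(P_w/P_s)$ in dB, and the function names are hypothetical.

```python
import math
import random


def power(x):
    """Mean power of a sequence."""
    return sum(v * v for v in x) / len(x)


def wsr_db(signal, watermark):
    """Assumed WSR definition: 10*log10(P_watermark / P_signal)."""
    return 10.0 * math.log10(power(watermark) / power(signal))


def scale_to_wsr(signal, watermark, target_db):
    """Rescale the watermark so that wsr_db(signal, result) equals target_db."""
    gain = math.sqrt(power(signal) * 10.0 ** (target_db / 10.0) / power(watermark))
    return [gain * w for w in watermark]


def add_awgn(x, snr_db, seed=0):
    """Add i.i.d. zero-mean Gaussian noise n (Eq. (2)) at the requested SNR in dB."""
    rng = random.Random(seed)
    sigma = math.sqrt(power(x) / 10.0 ** (snr_db / 10.0))
    return [v + rng.gauss(0.0, sigma) for v in x]
```

At WSR = -30 dB, for instance, the watermark carries one thousandth of the signal power, which is why the decoder must first strip the host audio by denoising before correlating.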
To evaluate the performance over noisy communication channels, the watermarked audio files are distorted by i.i.d. Gaussian noise. WM extraction is performed with denoising (enc1-D, enc2-D) as well as without denoising (enc1, enc2). Fig. 2 presents the probability of detection versus SNR. Observe that wavelet denoising improves the watermark extraction capability of both encoders, especially enc1, and that the improvement is dominant at low noise levels. Fig. 3 illustrates the performance of our decoder versus SNR. PDs are plotted without denoising (enc2), with denoising (enc2-D), and with denoising followed by error correction (enc2-D-BCH). For this experiment the BCH(63,16) code is used, so L is increased from 15 to 63. Fig. 3 shows that error correction improves the WM extraction performance significantly, especially at high SNR levels; the gain is more than 15% at SNR = 20 dB.

Robustness to stereo-to-mono conversion attacks is also evaluated, and the results are presented in Table 1. First, the encoded left and right channels are converted to stereo and decoded as mono (S-to-M). For comparison, results for mono embedding with mono decoding are also reported (Mono). We conclude that the proposed blind decoding scheme is robust to stereo-to-mono conversions. Observe that even though the WSR is as low as -30 dB for the speech data, denoising followed by error correction improves the performance radically, while for the music data (WSR = -16 dB) the PD is higher than 90%.

Figure 1. PDs as a function of WSR. [Plot: probability of detection versus WSR (dB); curves: enc2 at 2-22050 Hz and 25-13000 Hz, enc1 at 2-22050 Hz, enc1 at 25-13000 Hz.]
Figure 2. PDs as a function of SNR. [Plot: probability of detection versus SNR (dB); curves: enc1, enc2, enc1-D, enc2-D.]
Figure 3. PDs as a function of SNR. [Plot: probability of detection versus SNR (dB); curves: enc2, enc2-D, enc2-D-BCH.]

Table 1. Robustness to stereo-to-mono conversions (WSR = -30 dB for speech and -16 dB for music). Entries are probabilities of detection (%).

| Data | Decoder | L = 15 bits, S-to-M | L = 15 bits, Mono | L = 63 bits, S-to-M | L = 63 bits, Mono |
| Speech | enc2 | 4 | 13 | 4 | 4 |
| Speech | enc2-D-BCH | 66 | 86 | 95 | 95 |
| Music | enc2 | 68 | 67 | 84 | 80 |
| Music | enc2-D-BCH | 92 | 92 | 100 | 100 |

5. Conclusions
This paper proposes a new audio watermark decoding framework that adopts wavelet denoising for blind watermark extraction along with data synchronization. It is shown that wavelet denoising combined with error correction improves the watermark extraction performance significantly. The proposed method does not need the original audio for decoding. By establishing data synchronization while decoding, it eliminates false alarms arising from transmission and is thus superior to traditional correlation-based schemes. It is robust to common attacks such as filtering, channel noise, and stereo-to-mono conversions.
6. References
[1] H. S. Malvar and D. F. Florencio, "Improved spread spectrum: A new modulation technique for robust watermarking," IEEE Trans. Signal Processing, vol. 51, no. 4, pp. 898-905, April 2003.
[2] B. Gunsel, S. Sener, and Y. Yaslan, "An adaptive encoder for audio watermarking," WSEAS Transactions on Computers, vol. 2, no. 4, pp. 1044-1048, October 2003.
[3] A. K. Fletcher, K. Ramchandran, and V. K. Goyal, "Wavelet denoising by recursive cycle spinning," Proc. IEEE ICIP 2002, Rochester, New York, September 22-25, 2002.