Embedding and Detecting Spread Spectrum Watermarks under the Estimation Attack

Darko Kirovski
Microsoft Research, One Microsoft Way, Redmond, WA 98052 {[email protected]}

Henrique Malvar
Microsoft Research, One Microsoft Way, Redmond, WA 98052 {[email protected]}

ABSTRACT

Spread-spectrum (SS) watermarking is one of the oldest methodologies for hiding data in multimedia signals. Recently, it has been demonstrated that simple block repetition codes make such watermarks strongly robust to noise addition and limited arbitrary geometric transformations. To further investigate the security of such watermarks, in this manuscript we explore their robustness with respect to the highly realistic watermark estimation attack. To launch this attack successfully, the adversary relies on knowledge of all the details of the watermark codec except the hidden secret. We define a modification to the traditional SS watermark detector that forces the adversary to add an amount of noise proportional in amplitude to the recorded signal in order to successfully remove a direct-sequence SS watermark.

1. INTRODUCTION

Audio watermarking¹ schemes rely on the imperfections of the human auditory system (HAS). Numerous secret hiding techniques exploit the fact that the HAS is insensitive to small amplitude changes, either in the time [2] or frequency [4], [9] domain. Information modulation is usually carried out using SS [4] or quantization index modulation (QIM) [3]. It is important to review the disadvantages that both methods exhibit:

• the marked signal and the watermark (WM) have to be perfectly synchronized at WM detection,
• by breaking a single player (e.g. debugging, the sensitivity attack [8]), one can extract the secret information (the SS WM or the hidden quantizers in QIM) and recreate the original (in case of SS) or create a new copy that induces the QIM detector to treat the content as unmarked.

While a relatively effective mechanism for enabling asymmetric SS WM systems has been developed [7], an equivalent system for QIM does not exist to date, which renders QIM relatively impractical for developing industry-strength content screening applications.

Recently, Kirovski and Malvar have demonstrated that simple block repetition codes enable strong robustness of SS WMs with respect to noise addition and limited arbitrary geometric transformations [5], [6]. Unfortunately, block repetition codes significantly increase the effectiveness of the watermark estimation attack [7]. When launching this attack, the adversary is assumed to know all the details of the WM codec except the hidden secret. By leveraging the repetition of WM chips within the signal, the attacker can estimate the WM using an optimal algorithm. The adversary creates an attacked copy by simply subtracting the amplified estimate from the marked signal.

Here, we explore the interaction of the parameters involved in the game of marking and detecting SS WMs under the assumption that the WM estimation attack is carried out prior to WM detection. As an important contribution, we define a modification to the classic SS WM detector that forces the adversary to add an amount of noise proportional in amplitude to the recorded signal in order to successfully remove an SS WM.

¹ With no loss of generality, in this paper, we restrict our attention to audio signals.

2. SS DATA HIDING

The host signal to be marked, $\tilde{x} \in \mathbb{R}^N$, can be modeled as a random vector, where each element $x_i$ of $\tilde{x}$ is a normal independent identically distributed (i.i.d.) random variable with standard deviation $\sigma_x$, i.e. $x_i \sim N(0, \sigma_x)$. Such modeling is arguable and is further discussed in Section 5. A watermark is defined as an SS sequence $\tilde{w}$, which is a vector pseudo-randomly generated in $\tilde{w} \in \{\pm 1\}^N$. Each element $w_i$ is called a "chip". WM chips are mutually independent with respect to $\tilde{x}$. The marked signal $\tilde{y}$ is created as $\tilde{y} = \tilde{x} + \tilde{w}$. Signal variance $\sigma_x^2$ impacts the security of the scheme: the higher the variance, the more securely the WM can be hidden in the signal.

WM detection is performed by correlating a given signal vector $\tilde{z}$ with the WM $\tilde{w}$:

$$C(\tilde{z}, \tilde{w}) = \tilde{z} \cdot \tilde{w} = E[\tilde{z} \cdot \tilde{w}] + N(0, \sigma_x/\sqrt{N}),$$

where $\cdot$ denotes a normalized inner product of vectors. Under no malicious attacks or other signal modifications, if signal $\tilde{z}$ has been marked, then $E[\tilde{z} \cdot \tilde{w}] = 1$, else $E[\tilde{z} \cdot \tilde{w}] = 0$. The detector decides that the WM is present if $C(\tilde{z}, \tilde{w}) > \tau$, where $\tau$ is a detection threshold that controls the tradeoff between the probabilities of false positive and false negative decisions. We recall from modulation and detection theory that under the condition that $\tilde{x}$ and $\tilde{w}$ are i.i.d. signals, such a detector is optimal [10].

The probability $P_{FA}|(\tilde{z} = \tilde{x})$ that the detection decision is a false alarm is:

$$P_{FA} = \Pr[\tilde{z} \cdot \tilde{w} \geq \tau] = \frac{1}{2}\,\mathrm{erfc}\left(\frac{\tau\sqrt{N}}{\sigma_x\sqrt{2}}\right), \tag{1}$$

and the probability $P_{MD}|(\tilde{z} = \tilde{x} + \tilde{w})$ that the detection decision is a misdetection equals:

$$P_{MD} = \Pr[\tilde{z} \cdot \tilde{w} \leq \tau] = \frac{1}{2}\,\mathrm{erfc}\left(\frac{E[\tilde{z} \cdot \tilde{w}] - \tau}{\sigma_x\sqrt{2/N}}\right). \tag{2}$$

3. THE SECURITY REQUIREMENT FOR SS WATERMARKING

Direct application of traditional SS as a data hiding tool is highly unreliable. The main reason is geometric transformations of the signal [1], which aim at desynchronizing $\tilde{w}$ with respect to its location in the marked content. Even a slight misalignment of the involved vectors causes an incorrect detection decision. To address this problem, Kirovski and Malvar introduced a technique which uses block repetition codes with multiple correlation tests to enforce synchronization for attacks with limited time-pitch scaling [5], [6]. In a generic setup, the marked signal $\tilde{y}$ is created by adding the WM with a certain magnitude $\delta$ to the original:

$$\tilde{y} = \tilde{x} + \delta\tilde{w}, \quad \tilde{w} \in \{\{-1\}^m, \{1\}^m\}^n. \tag{3}$$
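The embedding of Eqn.3 and the correlation detector of Section 2 can be illustrated with a minimal sketch. It is not the authors' codec: the function names (`make_chips`, `embed`, `correlate`, `detect`), the seed, and all parameter values are assumptions chosen for illustration, and the host signal is drawn from the i.i.d. gaussian model of Section 2. The correlation averages only the central $m_o$ samples of each block of $m$ repeated chips, as this section describes for the detector of [5].

```python
import random


def make_chips(n, rng):
    """Draw n pseudo-random chips from {-1, +1}; the seed stands in for the secret."""
    return [rng.choice((-1, 1)) for _ in range(n)]


def embed(x, chips, m, delta):
    """Eqn. 3: add delta * w to x, each chip repeated over m consecutive samples."""
    return [xi + delta * chips[i // m] for i, xi in enumerate(x)]


def correlate(z, chips, m, m_o):
    """Normalized correlation using only the central m_o samples of each block."""
    start = (m - m_o) // 2
    total = 0.0
    for i, chip in enumerate(chips):
        block = z[i * m + start : i * m + start + m_o]
        total += chip * sum(block) / m_o
    return total / len(chips)


def detect(z, chips, m, m_o, tau):
    """Declare the WM present when the correlation exceeds the threshold tau."""
    return correlate(z, chips, m, m_o) > tau


# Illustrative values only (assumptions, not taken from the paper).
rng = random.Random(1)
n, m, m_o, sigma_x, delta = 2000, 10, 5, 1.0, 1.0
chips = make_chips(n, rng)
x = [rng.gauss(0.0, sigma_x) for _ in range(n * m)]  # unmarked host signal
y = embed(x, chips, m, delta)                        # marked signal

print("marked:  ", round(correlate(y, chips, m, m_o), 3))
print("unmarked:", round(correlate(x, chips, m, m_o), 3))
```

With these settings the correlation of the marked signal should sit near $\delta$ and that of the unmarked signal near zero, so a threshold of $\tau = \delta/2$ separates the two cases.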

Vectors $\tilde{y}$ and $\tilde{x}$ have $N = m \times n$ samples, whereas $\tilde{w}$ has $n$ chips, each of them replicated successively $m$ times. The WM detector correlates the averages of the central $m_o$ elements of each region marked with the same chip, where commonly $m_o = m/k$, $k \in \{2, 5\}$. Such a detector can tolerate fluctuation in scaling up to $\frac{(k-1)m}{2k}$ coefficients [5].

The involved block repetition code improves the detection, but it also improves the efficacy of the estimation attack. If all details of the codec are known (except $\tilde{w}$), the adversary can compute the WM estimate, amplify it with a factor $\alpha > 1$, and then subtract it from the marked content [7].

Theorem 1. Given a set of $m$ samples of $\tilde{x}$, marked with the same chip $w_i$ such that $y_{(i-1)m+j} = x_{(i-1)m+j} + \delta w_i$, $1 \leq j \leq m$, the optimal estimate $v_i$ of the hidden WM chip $w_i$ is given as:

$$v_i = \mathrm{sign}\left(\sum_{j=1}^{m} \left(x_{(i-1)m+j} + \delta w_i\right)\right). \tag{4}$$

See Lemma 1 in [7] for proof. Note that $\tilde{v} \in \{\pm 1\}^N$.

Theorem 2. The optimal WM estimation yields the following probability of estimation error per WM chip:

$$\varepsilon = \Pr[v_i \neq w_i] = \frac{1}{2}\,\mathrm{erfc}\left(\frac{\delta\sqrt{m}}{\sigma_x\sqrt{2}}\right). \tag{5}$$

See Corollary 1 in [7] for proof.

We construct the estimation attack by subtracting an amplified WM estimate $\alpha\tilde{v}$ from the marked content $\tilde{y}$:

$$\tilde{z} = \tilde{y} - \alpha\tilde{v}. \tag{6}$$
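The chip estimator of Theorem 1, the error probability of Eqn.5, and the attack of Eqn.6 can be sketched together as follows. This is a hedged illustration under the i.i.d. gaussian host model, not the implementation from [7]; the helper names and the parameter values ($n$, $m$, $\sigma_x$, $\delta$, the seed) are assumptions. For simplicity the correlation here uses all $m$ samples of each block.

```python
import math
import random


def estimate_chips(y, m):
    """Theorem 1: each chip estimate is the sign of the sum over its m samples."""
    return [1 if sum(y[i:i + m]) >= 0 else -1 for i in range(0, len(y), m)]


def chip_error_probability(delta, sigma_x, m):
    """Eqn. 5: probability that an estimated chip differs from the embedded one."""
    return 0.5 * math.erfc(delta * math.sqrt(m) / (sigma_x * math.sqrt(2)))


def estimation_attack(y, m, alpha):
    """Eqn. 6: z = y - alpha * v, each estimated chip spread over its m samples."""
    v = estimate_chips(y, m)
    return [yi - alpha * v[i // m] for i, yi in enumerate(y)]


def correlate(z, chips, m):
    """Normalized inner product of z with the m-fold repeated chip sequence."""
    return sum(zi * chips[i // m] for i, zi in enumerate(z)) / len(z)


# Illustrative values only (assumptions, not taken from the paper).
rng = random.Random(7)
n, m, sigma_x, delta = 20000, 4, 2.0, 1.0
chips = [rng.choice((-1, 1)) for _ in range(n)]
y = [rng.gauss(0.0, sigma_x) + delta * chips[i // m] for i in range(n * m)]

v = estimate_chips(y, m)
empirical_eps = sum(vi != wi for vi, wi in zip(v, chips)) / n
print("empirical eps:", round(empirical_eps, 4))
print("Eqn. 5 eps:   ", round(chip_error_probability(delta, sigma_x, m), 4))

for alpha in (0.0, 1.0, 2.0):
    z = estimation_attack(y, m, alpha)
    print("alpha =", alpha, "correlation =", round(correlate(z, chips, m), 3))
```

Because $E[v_i w_i] = 1 - 2\varepsilon$, the printed correlation should fall roughly linearly as $\alpha$ grows, which is the effect Corollary 2 quantifies.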

The maximal value of the amplification factor $\alpha$ depends solely on the imperceptibility of the attack. This issue is discussed further in Section 5.

Corollary 1. The variance $Var[\tilde{z}] = \sigma_z^2$ of the attacked signal depends on $\alpha$ as follows:

$$\sigma_z^2 = \frac{1}{n} E\left[\sum_{i=1}^{n} (y_i - \alpha v_i)^2\right] = \sigma_x^2 + \delta^2 + \alpha^2 - 2\alpha\xi, \tag{7}$$

$$\xi = \sigma_x \sqrt{\frac{2}{\pi}}\, e^{-\frac{\delta^2}{2\sigma_x^2}} + \delta\,\mathrm{erf}\left(\frac{\delta}{\sigma_x\sqrt{2}}\right).$$

Corollary 2. After the attack, the expected correlation value computed by the WM detector equals:

$$E[\tilde{z} \cdot \tilde{w}] = \delta - \alpha(1 - 2\varepsilon) = \delta - \alpha\,\mathrm{erf}\left(\frac{\delta\sqrt{m}}{\sigma_x\sqrt{2}}\right), \tag{8}$$

with $Var[\tilde{z} \cdot \tilde{w}] \approx \frac{\sigma_z^2}{n m_o}$.

From Eqn.8, we compute that, in order to draw the expected correlation value to $E[\tilde{z} \cdot \tilde{w}] = \theta$, the attacker has to induce $\alpha(\theta)$ equal to:

$$\alpha(\theta) = \frac{\delta - \theta}{\mathrm{erf}\left(\frac{\delta\sqrt{m}}{\sigma_x\sqrt{2}}\right)}. \tag{9}$$

If $\tilde{v} \neq \tilde{w}$ or $\alpha \neq \delta$, the estimation attack adds noise to the marked signal. Part of this noise is an accurate estimate of the WM and it actually reverses the effect of the watermarking process. The remainder of the attack vector is applied in addition to the existing marked data.

Corollary 3. The estimation attack on a marked content described in Eqn.6 induces the following additive noise with respect to the original signal:

$$N/O \equiv E[|z_i - x_i|] = \begin{cases} \alpha + \delta(2\varepsilon - 1), & \alpha \geq \delta \\ \delta + \alpha(2\varepsilon - 1), & \alpha < \delta \end{cases} \tag{10}$$

A realistic attack/detect watermarking model would assume the following criteria:

Criterion 1. The amplitude of the attack $\alpha$ is limited by the induced noise as: $N/O \leq \beta$.

Criterion 2. An attack with fixed $\alpha(\theta)$ draws the expected value of the correlation to a value $E[\tilde{z} \cdot \tilde{w}] = \theta$. For a fixed watermark length $n$ and detection decision threshold $\tau = \theta/2$ that achieves symmetric probability of false alarm $P_{FA}$ and misdetection $P_{MD}$, the detection error probability $P_E = P_{FA} = P_{MD}$ is bounded above by $\gamma$:

$$P_E = \frac{1}{2}\,\mathrm{erfc}\left(\frac{\theta\sqrt{n m_o}}{2\sigma_z\sqrt{2}}\right) \leq \gamma. \tag{11}$$

It is important to stress that the efficiency of SS watermarking and detection depends by and large upon parameters that are content dependent. For example, for typical audio watermarking applications, realistic values can be adopted at $n m_o \leq 5 \cdot 10^4$, $\beta \geq 4$dB, and $\gamma \leq 10^{-6}$ [5].

Problem 1. For a given $\sigma_x$, what is the optimal value of $\delta$ such that under the optimal estimation attack described in Eqn.6 and quantified using $\alpha$, maximal $N/O$ is induced while Criterion 2 is satisfied?

The posed problem can be solved in two steps. First, from Criterion 2, we can compute the minimal expected value for the normalized correlation $E[\tilde{z} \cdot \tilde{w}] \geq \theta$ after the attack:

$$\theta = \frac{2\sigma_z\sqrt{2}\,\mathrm{erf}^{-1}(1 - 2\gamma)}{\sqrt{n m_o}}. \tag{12}$$

Second, from Eqn.5, Eqn.9, and Eqn.10, we can compute the dependency of $N/O$ upon the WM magnitude $\delta$:

$$N/O = f(\delta) = \frac{\delta - \theta}{\mathrm{erf}\left(\frac{\delta\sqrt{m}}{\sigma_x\sqrt{2}}\right)} - \delta\,\mathrm{erf}\left(\frac{\delta\sqrt{m}}{\sigma_x\sqrt{2}}\right), \tag{13}$$

from which we can numerically find the desired $\delta$ that maximizes the induced $N/O$. Figure 2 depicts the dependency of $N/O$ with respect to $\delta$ for $\sigma_x/\sqrt{m} \in \{2 \ldots 6.5\}$. Optimal values $\delta(\sigma_x)$, which result in maximal $N/O$, are depicted using the $\{\circ\}$ symbol.
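The numerical search for the $\delta$ that maximizes Eqn.13 can be sketched with a simple grid search. This is only an illustration: the grid bounds, the step count, and the function names are assumptions, $\theta = 0.3$ is the value quoted for Figure 2, and $\sigma_x/\sqrt{m} = 4$ is one point from the range plotted there. Eqn.13 uses the $\alpha \geq \delta$ branch of Eqn.10, so the sketch also reports $\alpha(\theta)$ from Eqn.9 to let the reader check that condition.

```python
import math


def erf_term(delta, sigma_x, m):
    """erf(delta * sqrt(m) / (sigma_x * sqrt(2))), shared by Eqn. 9 and Eqn. 13."""
    return math.erf(delta * math.sqrt(m) / (sigma_x * math.sqrt(2)))


def attack_amplitude(delta, theta, sigma_x, m):
    """Eqn. 9: amplification needed to pull the expected correlation down to theta."""
    return (delta - theta) / erf_term(delta, sigma_x, m)


def noise_over_original(delta, theta, sigma_x, m):
    """Eqn. 13: induced N/O as a function of delta (assumes alpha >= delta)."""
    e = erf_term(delta, sigma_x, m)
    return (delta - theta) / e - delta * e


def best_delta(theta, sigma_x, m, lo=0.5, hi=5.0, steps=451):
    """Grid search for the delta that maximizes Eqn. 13 (grid is an assumption)."""
    grid = [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    return max(grid, key=lambda d: noise_over_original(d, theta, sigma_x, m))


theta, sigma_x, m = 0.3, 4.0, 1  # sigma_x / sqrt(m) = 4
d = best_delta(theta, sigma_x, m)
print("delta maximizing N/O:", round(d, 2))
print("N/O at that delta:   ", round(noise_over_original(d, theta, sigma_x, m), 3))
print("alpha(theta):        ", round(attack_amplitude(d, theta, sigma_x, m), 3))
```

A finer grid or a scalar optimizer could replace the grid search; the coarse grid is used here only to keep the sketch dependency-free.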

[Figure 2: Diagram of the dependency of $N/O$ (Eqn.13) with respect to $\delta$ for given $\sigma_x/\sqrt{m} \in \{2 \ldots 6.5\}$ and $\theta = 0.3$. Vertical axis: noise with respect to original, $N/O$.]

4. UNDOING THE ESTIMATION ATTACK

In this section, we demonstrate a remedy for the estimation attack described in Eqn.6. The main idea is to optimally reverse the attack, i.e. estimate the signal coefficient $y_i$ from the attacked signal $z_i$. We also demonstrate that a slight modification to the attack in Eqn.6 succeeds in removing the WM (or preventing the detector from identifying the WM) by adding additional noise to the attacked signal.

Definition 1. The undo operator $U(z_i, \alpha)$, where $z_i, \alpha \in \mathbb{R}$, is defined as follows:

$$u_i = U(z_i, \alpha) = \begin{cases} z_i + \alpha\,\mathrm{sign}(z_i), & |z_i| > 2\alpha \\ z_i - \alpha\,\mathrm{sign}(z_i), & |z_i| \leq 2\alpha \end{cases} \tag{14}$$

Theorem 3. Given a signal coefficient $z_i$ created using the estimation attack as $z_i = y_i - \alpha\,\mathrm{sign}(y_i)$, where $y_i$ is a weighted sum $y_i = x_i + \delta w_i$ of a normal zero-mean i.i.d. variable $x_i$ and an SS sequence chip $w_i$, and $\alpha \geq \delta$, the optimal estimation $u_i$ of the signal $y_i$ such that $E[|u_i - y_i|]$ is minimal is given using the undo operator $u_i = U(z_i, \alpha)$.

Proof. (sketch) When doing the estimation attack, the adversary shifts the positive and negative pdf of the marked signal $\tilde{y}$ by $\alpha$ against the sign of $\tilde{y}$. The undo operation described in Eqn.14 retrieves all values of the original signal $\tilde{y}$: $\forall (y_i > 2\alpha),\ u_i = y_i$. Now, let us define a subset $Y(a, b) \subset \tilde{y}$ s.t. $y_i \in Y(a, b)$ iff $a < y_i < b$. Since for a zero-mean gaussian distribution of $x_i$ and $\alpha > \delta$, $|Y(0, \alpha)| > |Y(-2\alpha, -\alpha)|$, then $u_i$ is the optimal estimation of $y_i$ based on a given $z_i$ s.t. $-\alpha < z_i < 0$. Similarly, since $|Y(-\alpha, 0)| > |Y(\alpha, 2\alpha)|$, $u_i$ is an optimal estimation of $y_i$ based on $z_i$ s.t. $0 < z_i < \alpha$.

[Figure 1: Illustration of the undo of the estimation attack. Panels: signal pdf(y) before the attack, $N(0, \sigma_y = 3)$; decomposed pdf(z) after the attack, where the estimation attack moves each half of pdf(y) against its sign; decomposed pdf(u) after undoing the attack, where the undo moves pdf(z) against the likely direction of attack, with regions marked as correct and incorrect undo of the attack.]

Corollary 4. The expected value for the correlation of the recovered $\tilde{u}$ and $\tilde{w}$ is:

$$E[\tilde{u} \cdot \tilde{w}] = \delta - \alpha[\mathrm{erfc}(a) - \mathrm{erfc}(b) - \mathrm{erfc}(c) + \mathrm{erfc}(d)], \tag{15}$$

where $a = \frac{\alpha - \delta}{\sigma_x\sqrt{2}}$, $b = \frac{\alpha + \delta}{\sigma_x\sqrt{2}}$, $c = \frac{2\alpha - \delta}{\sigma_x\sqrt{2}}$, $d = \frac{2\alpha + \delta}{\sigma_x\sqrt{2}}$.

Proof. (sketch) The undo of the estimation attack cannot recover the magnitudes of $Y(-2\alpha, -\alpha) \cup Y(\alpha, 2\alpha)$, which got mixed with $Y(-\alpha, 0) \cup Y(0, \alpha)$ during the attack. We compensate the final correlation $E[\tilde{u} \cdot \tilde{w}]$ as follows: $E[\tilde{u} \cdot \tilde{w}] = \delta - C_1 + C_2$ with:

$$C_1 = \int_{\alpha}^{2\alpha} [(x - \delta) f(x - \delta) + (x + \delta) f(x + \delta)]\,dx,$$

$$C_2 = \int_{\alpha}^{2\alpha} [(x - \delta - 2\alpha) f(x - \delta) + (x + \delta - 2\alpha) f(x + \delta)]\,dx,$$

where $f(x + c)$ is a function of the normal distribution centered at $c$ with variance $\sigma_x^2$, which results in Eqn.15.

[Figure 3: The effect of the undo test on the correlation test. As $\alpha$ increases, the figure shows how $E[\tilde{z} \cdot \tilde{w}]$ and $E[\tilde{u} \cdot \tilde{w}]$ change for fixed $\sigma_x = 4$ and variable $\delta \in [0.5 \ldots 3]$. Horizontal axis: attack amplitude $\alpha$. Vertical axis: $E[\tilde{z} \cdot \tilde{w}]$ and $E[\tilde{u} \cdot \tilde{w}]$.]

Figure 3 illustrates the effect of the undo operation on WM detection. Whereas the correlation value of a traditional SS WM detector $E[\tilde{z} \cdot \tilde{w}]$ inevitably converges to zero, the correlation after the undo operation yields $\mu = \min(E[\tilde{u} \cdot \tilde{w}]) > 0$. Thus, according to Eqn.11, for a sufficiently long SS sequence ($n m_o$ elements of $\tilde{u}$ are integrated), a detection threshold at $\tau = \min(E[\tilde{u} \cdot \tilde{w}])/2$ would yield the desired detection results regardless of the strength of the estimation attack. In the region of interest, i.e. $\alpha$ s.t. $(\tilde{y} - \alpha\tilde{v}) \cdot \tilde{w} \approx \mu$, the variance of the correlation is approximately equal to $Var[\tilde{u} \cdot \tilde{w}] \approx \frac{\sigma_z^2}{n}$, where $\sigma_z$ is computed in Corollary 1.

The detector cannot possibly know the attack amplification value $\alpha$ while performing the detection. However, note that for any $\alpha$, $E[U(\tilde{x}, \alpha) \cdot \tilde{w}] = 0$², where $\tilde{x}$ is a signal which does not have $\tilde{w}$ embedded. Thus, the detector can perform $T$ tests $E[U(\tilde{x}, \alpha_t) \cdot \tilde{w}]$ for realistic values of $\alpha = \{\alpha_t, t = 1..T\}$ that can potentially break the system.

The power of the undo operation is based on the unequal distributions of magnitudes marked with positive and negative chips. Therefore, the attacker must impose additional noise $\tilde{n}$ on the attacked signal $\tilde{z}$ such that these distributions are equalized. While the "smart noise" $-\alpha\tilde{v}$ draws $E[\tilde{z} \cdot \tilde{w}]$ to zero, the additional noise $\tilde{n}$ ensures that no undo operation can retrieve even a small part of the original distributions of the signal marked with positive and negative chips.

Conclusion 1. Because of the undo operation, the estimation attack needs to be modified as follows: $\tilde{z} = \tilde{y} - \alpha\tilde{v} + \tilde{n}$, where $\tilde{n}$ is a noise pattern aiming to equalize the distributions of magnitudes marked with positive and negative chips. For example, white noise of amplitude $\sigma_n \sim \sigma_x/2$ commonly creates a difficult task for the designer of an undo operation.
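The undo operator and the idea of running one correlation test per candidate attack amplitude can be sketched as follows. The piecewise rule is transcribed literally from Definition 1; its cut-off of $2\alpha$ is exposed through the hypothetical parameter `threshold_factor` so that other cut-offs can be tried. The helper names, the seed, and the parameter values are assumptions, and one sample per chip ($m = 1$) is used for brevity.

```python
import random


def sign(value):
    """Return -1, 0 or +1 according to the sign of value."""
    return (value > 0) - (value < 0)


def undo(z, alpha, threshold_factor=2.0):
    """Definition 1 applied element-wise: U(z_i, alpha) with cut-off 2 * alpha."""
    out = []
    for zi in z:
        if abs(zi) > threshold_factor * alpha:
            out.append(zi + alpha * sign(zi))
        else:
            out.append(zi - alpha * sign(zi))
    return out


def correlate(u, chips):
    """Normalized inner product of u with the chip sequence (one sample per chip)."""
    return sum(ui * wi for ui, wi in zip(u, chips)) / len(chips)


def undo_tests(z, chips, alphas):
    """Run one undo-and-correlate test per candidate attack amplitude alpha_t."""
    return {a: correlate(undo(z, a), chips) for a in alphas}


# Illustrative values only (assumptions, not taken from the paper).
rng = random.Random(3)
n, sigma_x = 20000, 4.0
chips = [rng.choice((-1, 1)) for _ in range(n)]
x = [rng.gauss(0.0, sigma_x) for _ in range(n)]  # signal without the WM embedded

results = undo_tests(x, chips, alphas=(1.0, 2.0, 3.0))
for a, c in results.items():
    print("alpha_t =", a, "correlation =", round(c, 3))
```

On an unmarked signal every test should return a correlation close to zero, since $U$ is an odd function and $\tilde{x}$ is independent of $\tilde{w}$. This is what allows the detector to try several values $\alpha_t$ without inflating the expected correlation of unmarked content.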

5. SS WATERMARKING OF AUDIO

In this section, we discuss the following questions: (i) why using a gaussian distribution for the content is a justifiable assumption, (ii) how redundancy impacts detector and estimator performance on real-life data, and (iii) what the impact of the results obtained so far is on audio watermarking. We use an existing audio watermarking technology [5], [6] as a reference to discuss these questions. The linear marking (Eqn.3) and detection (Corollary 4) process is performed on the:

• audible; audibility is detected using a state-of-the-art psycho-acoustic masking model [6],
• averaged; to improve robustness against de-synch attacks, the detector averages the central $m_o$ coefficients in each repetition block of size $m$ coefficients [6],
• cepstrum-filtered; cepstrum filtering converts the coefficients into a zero-mean signal with significantly reduced variance [5]

coefficients of a 2048-long MCLT analysis window in the logarithmic (dB) domain.

We have observed that on a great variety of audio clips, even the individual filtered coefficients can be globally relatively accurately modeled using a normal pdf. In addition, the detector averages the central $m_o$ coefficients in each repetition block, which significantly improves the modeling accuracy due to the Central Limit Theorem. Thus, the final working vector $\tilde{y}$ extracted from the audio clip can be highly accurately macro-modeled as a gaussian. Local correlations and non-stationarity are effectively cancelled using sufficiently large windows (1024+ window size at a sampling frequency of 44.1kHz), cepstrum filtering, and running-average windowing along the time axis.

² $U(\tilde{x}, \alpha) = \{U(x_i, \alpha), i = 1..n\}$.

The reliability of detection as well as the performance of the WM estimator depend upon the variance of the original working vector $\tilde{x}$. Although Kirovski and Malvar restrict the location of the WM to the 2kHz-7kHz subband [5], we relocate the WM to the 200Hz-3kHz region for three reasons. First, the HAS is much more sensitive to noise in this subband (a noise of only 4dB can rarely be tolerated). Second, the variance of the carrier signal is higher in this region, providing a more robust host for data hiding with respect to the estimation attack. Third, although the ratio $\sqrt{m}/\sqrt{m_o} = \sqrt{3}$, in the proposed subband the actual ratio between $\sigma_E$ and $\sigma_D$ retrieved experimentally from over 100 audio clips is only 1.18.

Conclusion 2. In this manuscript, we have presented a generic recipe for using SS WMs to hide secrets in multimedia content. For typical music content, if the SS WM is located in the 200Hz-3kHz subband, in order to draw the correlation of the new undo correlation test to a value that forces detection failure, the adversary needs to add total noise in excess of 6dB, which is intolerable to most users. An SS WM length that would enable false alarm accuracy of $P_{FA} \approx 10^{-6}$ would require approximately an 80 second music frame. A WM of such length is difficult to synchronize at the detector. Although block repetition codes enable the wow-and-flutter tolerance required for most low-end turntables (e.g. 0.15% playtime fluctuation), it is arguable whether a common HAS would discard such content as having no value. On the other hand, the techniques presented in this manuscript may provide better results for data hiding in video signals, as we estimate that significantly more chips can be embedded per frame, resulting in higher robustness to frame dropping and limited geometric distortions.
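The link between a target error probability such as $10^{-6}$ and the WM length follows from Eqn.11, and can be sketched numerically. This is a minimal illustration, not a reproduction of the 80 second figure above: the values of $\theta$ and $\sigma_z$ are assumptions, as are the function names. The inverse of the complementary error function is obtained from the standard normal quantile, using $\frac{1}{2}\mathrm{erfc}(t) = Q(t\sqrt{2})$.

```python
import math
from statistics import NormalDist


def detection_error(n_mo, theta, sigma_z):
    """Eqn. 11 with tau = theta / 2: symmetric error probability P_E."""
    return 0.5 * math.erfc(theta * math.sqrt(n_mo) / (2.0 * sigma_z * math.sqrt(2)))


def required_length(theta, sigma_z, gamma):
    """Smallest number of integrated coefficients n * m_o giving P_E <= gamma."""
    # 0.5 * erfc(t) <= gamma  is equivalent to  t * sqrt(2) >= q, with q below.
    q = NormalDist().inv_cdf(1.0 - gamma)
    return math.ceil((2.0 * sigma_z * q / theta) ** 2)


# Illustrative values only (assumptions, not taken from the paper).
theta, sigma_z, gamma = 0.3, 1.0, 1e-6
length = required_length(theta, sigma_z, gamma)
print("required n * m_o:", length)
print("P_E at that length:", detection_error(length, theta, sigma_z))
```

The required length grows with $(\sigma_z/\theta)^2$, so halving the post-attack correlation $\theta$ quadruples the number of coefficients the detector has to integrate.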

6. REFERENCES

[1] Anderson, R.J., Petitcolas, F.A.P.: On the limits of steganography. IEEE J. Selected Areas in Communications 16 (1998) 474-481.
[2] Bassia, P., Pitas, I.: Robust audio watermarking in the time domain. Proc. EUSIPCO, Vol. 1. Rodos, Greece. IEE (1998) 25-28.
[3] Chen, B., Wornell, G.W.: Digital watermarking and information embedding using dither modulation. Workshop on Multimedia Signal Processing, Redondo Beach, CA. IEEE (1998) 273-278.
[4] Cox, I.J., Kilian, J., Leighton, T., Shamoon, T.: A secure, robust watermark for multimedia. Information Hiding Workshop, Cambridge, UK (1996).
[5] Kirovski, D., Malvar, H.: Robust Covert Communication over a Public Audio Channel Using Spread Spectrum. Information Hiding Workshop, Pittsburgh, PA (2001).
[6] Kirovski, D., Malvar, H.: Robust Spread-Spectrum Audio Watermarking. International Conference on Acoustics, Speech, and Signal Processing, Salt Lake City, UT. IEEE (2001).
[7] Kirovski, D., Malvar, H., Yacobi, Y.: A Dual Watermarking and Fingerprinting System. Microsoft Research Technical Report (2001).
[8] Linnartz, J.P., van Dijk, M.: Analysis of the sensitivity attack against electronic watermarks in images. Information Hiding Workshop, Portland, OR (1998).
[9] Swanson, M.D., Zhu, B., Tewfik, A.H., Boney, L.: Robust audio watermarking using perceptual masking. Signal Processing 66 (1998) 337-355.
[10] van Trees, H.L.: Detection, Estimation, and Modulation Theory. Part I. New York: John Wiley and Sons (1968).
