Watermarking via Optimization Algorithms for Quantizing Randomized Semi-Global Image Statistics
M. Kıvanç Mıhçak
Ramarathnam Venkatesan
Tie Liu
Microsoft Research One Microsoft Way Redmond, WA 98053, USA
Microsoft Research One Microsoft Way Redmond, WA 98053, USA
University of Illinois 405 North Mathews Avenue Urbana, IL 61801, USA
ABSTRACT We introduce a novel approach for blind and semi-blind watermarking and apply it to images. We derive randomized robust semi-global statistics of images in a suitable transform domain (wavelets, in the case of images) and quantize them in order to embed the watermark. Quantization is effectively carried out by embedding into the host a computed sequence, which is obtained by solving an optimization problem whose parameters are known to the information hider but unknown to the attacker. An essential emphasis of the proposed method is randomization, which is crucial for security and robustness against arbitrary quality-preserving attacks. We formally show that malicious optimal estimation attacks specifically derived for our algorithm are ineffective in practice. Furthermore, we experimentally demonstrate that our watermarking method survives many generic benchmark attacks for a large number of images.
Keywords Watermarking, image watermarking, security, randomization, semi-global image statistics, dithered quantization, estimation attacks
1. INTRODUCTION
Watermarking aims to invisibly and reliably embed a message, i.e., the watermark (WM), in the host data, from which the message can later be recovered and used by a receiver for identification or copyright purposes. The embedding is done by an encoder using a secret key, and its output is required to be perceptually approximately the same as the original. The watermarked image may then undergo many possible changes by users and attackers: unintentional modifications, and malicious attacks that specifically aim to disable the recovery of the WM. Ideally, the system must resist modifications and attacks as long as they result in images whose perceptual quality is approximately the same as that of the original. In this paper, we focus on the verification problem, where the receiver needs to make a binary decision regarding the existence of a possibly embedded WM¹. Practical applications of verification include automatic monitoring and access control of copyrighted materials.

We distinguish here between data embedding (for communications purposes) and watermarking (for security purposes), both of which fall into the category of information hiding. The latter involves a malicious attacker; therefore, one needs to consider security aspects in the presence of an adversary (akin to traditional cryptography). The former, on the other hand, can be modelled as an information-theoretic problem of communications with side information. Recently, extensive research has been carried out on the data embedding problem, and fundamental performance limits have been studied thoroughly; see for instance [2, 3, 4, 5, 6] and references therein². To the best of our knowledge, research on security-based watermarking is comparatively much more limited. Ideally, one would show that breaking a proposed secure watermarking system within perceptual constraints is equivalent to solving a hard problem for a resource-bounded adversary (using tools from complexity theory); this is our ultimate goal. In this paper, however, we relax this goal and aim for a milder one: we would like to design an image watermarking algorithm that has provable security properties in the presence of an intelligent attacker while being robust to reasonable quality-preserving conventional signal processing attacks, e.g., compression, independent noise addition, mild rotation and cropping, random bending, etc. Note that achieving robustness against (even mild) geometric attacks with conventional methods often compromises security; this issue is discussed further in Sec. 2.
¹ The verification problem differs from the decoding problem, in which the receiver outputs the embedded logical bits with minimal probability of error without checking for their potential existence. Although our scheme is specifically designed for verification purposes, it can be extended to the decoding problem with little effort; see [1].

² Note that most of these works impose significant restrictions on the attack channel, e.g., memoryless or block-wise memoryless attacks are often considered. Such models admit meaningful analysis and results for pure communications purposes; however, systems designed solely on the basis of these models can presumably be bypassed by realistic and intelligent attackers.

We emphasize and heavily use randomization in the proposed algorithm, both to increase the randomness of the resulting system and to achieve robustness against reasonable (possibly geometric) attacks. We demonstrate that our proposed algorithm satisfies the targeted criteria; in particular, we show robustness against both an optimal WM estimation attack (specifically designed for our algorithm) and standard benchmark attacks that include a wide range of JPEG compression, independent additive white Gaussian noise (AWGN) addition, random bending, mild rotation and cropping. There are fundamental difficulties in modelling all possible quality-preserving attacks and providing a formal security analysis, mainly because we do not have a reasonable and tractable model (and a corresponding metric) for perceptually non-distortive attacks. Conventional methods that use Euclidean-like metrics in deterministic transform domains (e.g., wavelets, DCT, etc.) may fall short in the case of malicious attacks. To overcome such difficulties, we propose a "hash-then-watermark" paradigm: we work on pseudorandomly derived semi-global image statistics (which we term hash values throughout this paper) that compactly capture the essential distinctive information in the input image and are robust to mild (including geometric) attacks that preserve perceptual quality. This results in a watermarking scheme with two main generic steps. Given an image, we first derive pseudorandom (PR) robust semi-global statistics (host hash values); see Sec. 2 for a detailed account of hash extraction. Next, we quantize them to perform watermarking: after quantizing the host hash values, we compute a WM sequence by solving a nontrivial optimization problem, so that the hash values of the watermarked image are equal to the quantized version of the host hash values. Randomization aside, some similarities may be observed between our approach and the "spread transform" watermarking discussed in [7, 8].
However, fundamental differences exist: our image statistics (hash values) are produced from non-disjoint sets of host coefficients (i.e., overlapping semi-global connected regions) that have geometric meaning, as opposed to the disjoint sets used in [7, 8], which are not necessarily geometrically meaningful. We mention three advantages. First, our approach allows us to embed the WM with minimal perceptual distortion; this can be imposed as a constraint in our optimization-problem formulation. Second, because of the way the hash values are generated, we demonstrate better robustness against a wide range of attacks. Third, because we employ overlapping regions to extract hash values, the dimension of the domain where the WM is computed and embedded increases significantly; as a byproduct, this allows us to avert adversarial estimation attacks. Overall, our approach computes the WM by solving a nontrivial optimization problem (which yields a combinatorially complicated PR structure that is desirable for security), whereas in [7, 8] the WM is obtained by a trivial inverse spread transform, which yields weaker security properties than the proposed scheme. After concluding this section with some notes on notation, we give a detailed account of hash extraction in Sec. 2 and demonstrate the effectiveness and robustness of the hash values, which can be viewed as a "PR signal representation". In Sec. 3, we give a general description of the proposed WM verification system and highlight some design considerations. In Sec. 4, we present in detail the design of the WM embedding algorithms. In Sec. 5, we propose WM detection algorithms for blind and semi-blind watermarking. In Sec. 6, we derive an optimal estimation attack algorithm (specifically designed for our scheme) and demonstrate the robustness of our algorithm against it. In Sec. 7, we apply our approach to images and report experimental results under a wide range of benchmark attacks. Finally, we give some concluding remarks in Sec. 8. This work was presented in part in [1, 9]. An extension of a similar approach has been used for video watermarking in [10].
1.1 Notation
Vectors and matrices are denoted by boldface letters, and their elements by the corresponding normal letters with subscripts. Alphabets are designated by calligraphic letters. The $n$-fold Cartesian power of a generic alphabet $\mathcal{A}$ is denoted $\mathcal{A}^n$. For real vectors, we use $\|\cdot\|$ and $\langle \cdot, \cdot \rangle$ to denote the Euclidean norm and inner product. For matrices, we use $(\cdot)'$, $\mathrm{tr}[\cdot]$ and $\|\cdot\|_F$ to denote the transpose, trace and Frobenius norm, respectively. The $n \times n$ identity matrix is denoted $\mathbf{I}_n$. For random variables, we use $E[\cdot]$ and $\mathrm{Var}[\cdot]$ to denote expectation and variance.
2. GENERAL APPROACH
A serious problem in watermarking is resistance against geometric attacks (e.g., rotation, cropping, shearing, etc.). Common techniques to achieve robustness against these attacks include repeating the WM [11] to gain approximate invariance, embedding a template or "pilot pattern" to perform re-synchronization [12], and exploiting periodicity properties of the embedded watermark to estimate the attack parameters [13]. However, we believe that such techniques may pose serious security threats: the redundancy in the embedded marks and/or pilot patterns causes "information leakage" and may enable effective estimation by the attacker; see [14] for an example of such an attack. Therefore, marks must be embedded so that robustness against acceptable geometric attacks is gained in a secure way. To achieve this, we would like to find "PR signal representations" that are both secure (in a complexity-theoretic sense akin to traditional cryptography) and approximately invariant under reasonable geometric attacks. Our hash extraction method is designed to serve this purpose. Note that the transforms yielding the desired PR signal representations would be radically different from those used in conventional signal processing applications (e.g., wavelets, DCT, etc.). Ideally, such transforms with provable security properties should be used in all practical media anti-piracy applications; our approach in this paper can be seen as a first step towards this goal.

[Figure 1 plot: (a) "MSE per sample" (log scale) vs. square size; (b) "Histogram".]
Figure 1: We consider four attacks on four standard test images (Lena, Barbara, Baboon, and Goldhill) to demonstrate robustness of the hash values; solid: JPEG compression attack with quality factor 10; dashed: StirMark random bending attack; dash-dotted: rotation by 5 degrees; dotted: cropping along each dimension by 5 percent. (a) MSE per sample between the hash values of the original images and their attacked versions vs. square size. (b) Histograms of the change of the hash values induced by the attacks for square size 32; kurtoses are 3.02, 4.01, 4.22, and 4.09 for JPEG compression, random bending, rotation and cropping attacks, respectively.

We choose our hash values as PR linear statistics of PR semi-global regions in the DC subband of the wavelet domain for images; the statistic of each region is computed as a weighted linear combination of the coefficients in that region, where the weights are chosen from a smoothly varying Gaussian random field. In practice, we apply an ideal low-pass filter to a field of independent identically distributed (i.i.d.) Gaussian random variables to achieve smoothness. These regions can potentially overlap and need not be connected in general³. The motivation for the semi-global nature of the hash values is that experiments indicate that simple adversarial attacks can change local statistics dramatically. The resulting semi-global hash values are used as the domain of WM embedding and detection, and thus constitute a PR signal representation that is unknown to the attackers. Furthermore, this means that we implicitly impose a PR perceptual model on images in the hash domain. The robustness of similar image statistics has previously been studied for the hashing problem [16], where no information embedding was considered. The randomized steps of our approach constitute an essential contribution of this work; they are crucial both for security and for handling arbitrary quality-preserving attacks. First, as it turns out, the randomness introduced by our transforms enables us to use (either explicitly or implicitly) probabilistic models for arbitrary quality-preserving attacks (here the probability space is induced by the randomness of the utilized transform).
Thus, many attacks can be shown to introduce an additive Gaussian noise effect in the hash domain, since the attackers do not know the PR Gaussian weights. Even for mild geometric attacks, in practice we observe approximately zero-mean additive Gaussian noise in the hash domain, as we demonstrate in Sec. 2.1. This is a desirable property for a mark-embedding domain, since conventional WM techniques are known to work well under zero-mean additive Gaussian noise attacks (as opposed to geometric attacks, which are often problematic). Furthermore, employing such randomness effectively "hides" the characteristics or the metric that the algorithm uses, making them hard for an adversary (without the secret key) to compute, and thereby leads to a potentially secure system. We believe that appropriately designed randomized algorithms are essential for secure watermarking systems.

³ In our prior work [15], we hide information in the statistics of non-overlapping regions. Restricting to non-overlapping regions not only limits the rate of the WM that can be embedded but also, for strong WMs, makes blocking artifacts visible due to the non-overlapping nature of the randomly chosen regions.
2.1 Robustness of the Hash Values
We now experimentally illustrate the robustness of PR linear semi-global statistics (hash values) and thereby justify that the PR hash domain is appropriate for mark embedding. We also show that the outcome of mild geometric attacks can be treated as additive Gaussian noise in the PR hash domain. Here, we show aggregate test results obtained from four standard grayscale test images, namely Lena, Barbara, Baboon and Peppers (all of size 512 × 512). We consider the following four attacks: JPEG compression with quality factor 10, StirMark random bending [17], rotation by 5 degrees, and cropping by 5 percent along each dimension; similar results have been obtained for other attacks. We compare the hash values obtained from the original images with those obtained from the attacked images. In computing the hash values, we use square regions in the DC subband after applying a 3-level discrete wavelet transform (DWT) with Daubechies length-8 wavelets. The weights are chosen from a PR Gaussian random field (bandlimited to 0.3π in both the horizontal and vertical dimensions) and are chosen independently for each square. In Fig. 1(a), we show the average mean squared error (MSE) between the hash values of the original image and the attacked image for the aforementioned four attacks. For each of the four standard test images and their attacked versions, we independently generate 10000 PR hash values, using independent square locations and weights. Then, the normalized MSE of the difference between each original hash value and its attacked version is computed; normalization is done by dividing by the number of coefficients in the square from which the hash value is computed, so the resulting MSE is per coefficient. Then, for each attack and each square size, we average all normalized MSE values (40000 of them, combining the results from all four images). We observe that, under all attacks, the MSE decreases monotonically and quite fast as the region size increases. In our watermarking algorithms, we use sufficiently large regions to operate in a regime of relatively small MSE. In Fig. 1(b), we fix the square size to 32 and show the histogram of the difference between the original hash values and their attacked versions for a total of 40000 independent realizations obtained from the four standard test images. We observe that, for all attacks, the distributions are sufficiently close to Gaussian (for all practical purposes, their kurtoses are close to 3, the value for a Gaussian distribution).

[Figure 2 diagram: the Encoder maps the host s and the message m ∈ {0, 1} to x; the Attacker maps x to y; the Decoder outputs m̂ ∈ {0, 1}; the secret key θ and the host hash values h^s are also shown.]
Figure 2: Diagram of a generic WM verification system. The dashed line is for semi-blind watermarking only.
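The two statistics used in this experiment, the per-coefficient normalized MSE and the kurtosis of the hash-value changes, can be computed as in the minimal Python/NumPy sketch below. The function names are hypothetical, and the synthetic Gaussian "attack" is only a stand-in for real attacked images; it illustrates the computation, not the reported results.

```python
import numpy as np

def normalized_mse(h_orig, h_attacked, region_size):
    """Average per-coefficient MSE between original and attacked hash values.

    Each squared hash difference is divided by the number of coefficients in
    its square region (region_size ** 2), then averaged, as described in Sec. 2.1.
    """
    diff = np.asarray(h_attacked, dtype=float) - np.asarray(h_orig, dtype=float)
    return np.mean(diff ** 2) / region_size ** 2

def kurtosis(x):
    """Kurtosis E[(x - mean)^4] / Var[x]^2; equals 3 for a Gaussian distribution."""
    c = np.asarray(x, dtype=float)
    c = c - c.mean()
    return np.mean(c ** 4) / np.mean(c ** 2) ** 2

# Synthetic stand-in data (NOT the paper's images): 40000 hash values and an
# "attack" that adds independent Gaussian noise in the hash domain.
rng = np.random.default_rng(0)
h_orig = rng.normal(scale=100.0, size=40000)
h_attacked = h_orig + rng.normal(scale=5.0, size=h_orig.size)
print(normalized_mse(h_orig, h_attacked, region_size=32))  # roughly 25 / 32**2
print(kurtosis(h_attacked - h_orig))                       # close to 3
```

Kurtosis is used here, as in the text, as a quick scalar check of Gaussianity; values well above 3 indicate heavier tails than a Gaussian.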
3. SYSTEM DESCRIPTION
Referring to Fig. 2, we denote by $\mathbf{s}$, $\theta$, and $m$ the host data, the secret key, and the message, respectively. Specifically:
• $\mathbf{s} \in \mathcal{S}^n$ is the $n$-sample host data, which in the most general case may be an audio, image, or video signal or a transform thereof. In this paper, $\mathbf{s}$ is a vector representation of the DC subband in the DWT domain of the host image.
• The secret key $\theta$ is chosen uniformly from the key space $\Theta$. In practice, it is generated by a secure PR number generator and may subsequently be used to seed other PR number generators that produce the required randomness. To simplify the analysis, we do not limit the amount of randomness provided by $\theta$, but it must be independent of the host data and the message. The secret key $\theta$ is shared between the encoder and the decoder, but not with the attacker.
• For the verification problem, the message $m \in \mathcal{M} = \{0, 1\}$. In this paper, we use the convention that $m = 0$ indicates that the image is unmarked and $m = 1$ that it is watermarked.
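One way to realize the key handling described above is to expand $\theta$ into several independent PR streams (e.g., for region placement, weights and dither). The sketch below assumes $\theta$ is a non-negative integer and uses NumPy's `SeedSequence.spawn`; the stream names are hypothetical. NumPy's default bit generator is not cryptographically secure, so this only illustrates the key-to-seed structure, not the secure PR number generator the text calls for.

```python
import numpy as np

def derive_generators(key, names=("regions", "weights", "dither")):
    """Expand one integer key into independent, reproducible PR generators (toy sketch).

    Caution: NumPy's default bit generator (PCG64) is NOT cryptographically secure;
    this only illustrates using the key to seed several PR number generators.
    """
    children = np.random.SeedSequence(key).spawn(len(names))
    return {name: np.random.default_rng(child) for name, child in zip(names, children)}

# Encoder and decoder holding the same key regenerate identical randomness.
enc = derive_generators(12345)
dec = derive_generators(12345)
print(np.array_equal(enc["dither"].random(4), dec["dither"].random(4)))  # True
```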
The rest of the system consists of an encoder, an attacker, and a decoder. Formally, encoding is the mapping
$$e: \mathcal{S}^n \times \mathcal{M} \times \Theta \to \mathcal{X}^n, \quad \mathbf{x} = e(\mathbf{s}, m, \theta). \tag{3.1}$$
When $m = 0$, we have $\mathbf{x} = \mathbf{s}$, i.e., the image is unmarked. When $m = 1$, the watermarked data are produced by the embedding algorithm. The watermarked data are required to be perceptually approximately the same as the host data; $\mathbf{x}$ is then made available to the public. The attacker takes $\mathbf{x}$ (the unmarked or the watermarked data) as input and produces the attacked data $\mathbf{y} \in \mathcal{Y}^n$, possibly using additional randomness, to fool the decoder. In communications theory, such an input-output relationship is modelled as a conditional distribution (or a class of them) under the name "communications channel". While in communications the channel is governed by a fixed (but complicated and possibly not completely known) probabilistic law, watermarking faces an intelligent and adaptive attacker. The attacker may try anything that might disrupt communication of the WM, as long as the attacked data are perceptually close to the input. In particular, the attacker could use knowledge of the algorithmic steps of the watermarking method to design the attack. Such an approach is cryptanalytic in nature and is what we target in this paper (together with conventional image processing attacks). Given the attacked data $\mathbf{y}$ and the secret key $\theta$, the decoder makes a binary decision regarding whether the image has been watermarked. Formally, this is the mapping
$$d: \mathcal{Y}^n \times \Theta \to \mathcal{M}, \quad \hat{m} = d(\mathbf{y}, \theta). \tag{3.2}$$
We call this setup blind watermarking, in contrast with non-blind watermarking, in which the host data $\mathbf{s}$ are also available at the decoder. In this paper, we consider blind watermarking as well as what we call semi-blind watermarking, which is formally defined as the mapping
$$d: \mathcal{Y}^n \times \Theta \times \mathcal{H}^k \to \mathcal{M}, \quad \hat{m} = d(\mathbf{y}, \theta, \mathbf{h}^s), \tag{3.3}$$
where $\mathbf{h}^s \in \mathcal{H}^k$ are the length-$k$ host hash values. For applications such as screening, it is more realistic to assume that $\mathbf{h}^s$ is available not only to the decoder but also to the attacker. Therefore, it is crucial to minimize the information these hash values leak to the attacker.
This is in part analogous to the security requirement in the design of cryptographic hash functions [18], and our design indeed lets the hash values depend on the host data in a PR way, thereby minimizing their utility to the attacker. We consider the Neyman-Pearson setup, where the probabilities of false alarm and miss are defined as
$$P_{FA} = \Pr[\hat{m} = 1 \mid m = 0], \tag{3.4}$$
$$P_M = \Pr[\hat{m} = 0 \mid m = 1]. \tag{3.5}$$
For simplicity, we assume $\mathcal{S} = \mathcal{X} = \mathcal{Y} = \mathcal{H} = \mathbb{R}$ in the rest of this paper.
4. WM EMBEDDING ALGORITHMS
In this section we present in detail the design of WM embedding algorithms including hash extraction, randomized quantization and computation of WM sequences, see Fig. 3.
4.1 Hash Extraction
Our hash extraction algorithm is based on the prototype proposed in [16] for hashing purposes and in [1] for watermarking purposes, and consists of the following three steps. First, use the secret key $\theta$ to randomly tile $\mathbf{s}$ with $k$ (possibly overlapping) rectangles, where $k$ is assumed to be less than the length $n$ of the host data; denote the index set of the $i$th rectangle by $R_i$. Second, for each $R_i$, use the secret key $\theta$ to generate random weights $\{t_{ij}\}$ for all $j \in R_i$, and let $t_{ij} = 0$ for $j \notin R_i$. Weights are generated independently for different rectangles, regardless of whether they overlap. Finally, compute the hash values
$$h^s_i = \sum_{j=1}^{n} t_{ij} s_j, \quad i = 1, \dots, k. \tag{4.6}$$
Letting $\mathbf{T} = \{t_{ij}\}$ be the $k \times n$ weight matrix, we have
$$\mathbf{h}^s = \mathbf{T}\mathbf{s}. \tag{4.7}$$
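The three steps above, which assemble $\mathbf{T}$ and compute $\mathbf{h}^s = \mathbf{T}\mathbf{s}$, can be sketched as follows. This is a toy illustration under stated assumptions: square regions at uniformly random positions, i.i.d. $N(0, 1)$ weights (the earlier variant of [15] and [1], without low-pass filtering), a random array standing in for the DC subband, and a hypothetical function name.

```python
import numpy as np

def hash_matrix(shape, k, rect_size, rng):
    """Build the k x n hash matrix T of (4.6)-(4.7) for a 2-D subband of given shape.

    Toy assumptions: square regions of side rect_size at uniformly random positions,
    and i.i.d. N(0, 1) weights inside each region (t_ij = 0 outside R_i).
    """
    rows, cols = shape
    T = np.zeros((k, rows * cols))
    for i in range(k):
        r0 = rng.integers(0, rows - rect_size + 1)   # random top-left corner of R_i
        c0 = rng.integers(0, cols - rect_size + 1)
        w = np.zeros(shape)
        w[r0:r0 + rect_size, c0:c0 + rect_size] = rng.standard_normal((rect_size, rect_size))
        T[i] = w.ravel()                             # row i holds the weights t_ij
    return T

rng = np.random.default_rng(2024)
subband = rng.normal(size=(64, 64))   # stand-in for the DC subband of an image
T = hash_matrix(subband.shape, k=20, rect_size=16, rng=rng)
s = subband.ravel()                   # vector representation s
h_s = T @ s                           # host hash values, h^s = T s as in (4.7)
```

Regions are placed independently, so they may overlap, as the text allows. Storing $\mathbf{T}$ densely keeps the sketch short; a sparse matrix would be the natural choice for full-size subbands.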
In previous work [15] and [1], the PR weights are generated as independent Gaussian random variables. From the security perspective, independent weights maximize the total randomness for given marginal distributions. From the robustness perspective, however, independent weights are fragile against desynchronization attacks, which aim at mismatching the weights and the DWT coefficients at the decoder. In the image processing literature, the DC subband of natural images is usually modelled as a smoothly varying Markov or hidden Markov random field. Therefore, correlated weights increase robustness against imperceptible desynchronization attacks, because they better match natural image spectra. Our experiments show that this improvement is especially substantial for semi-blind watermarking. In practice, correlated weights are generated by passing the independent weights through an ideal low-pass filter. The cutoff frequency of the low-pass filter is an important algorithmic parameter that controls the security-robustness tradeoff of our watermarking system. The choice of the cutoff frequency also turns out to have an important effect on the embedding distortion introduced to the original image, both in the MSE sense and in the perceptual sense. This issue is discussed further in Sec. 4.4.

Remark: From (4.6), it is clear that $E[h^s_i] = 0$ for all $i$, since $E[t_{ij}] = 0$ and $\{t_{ij}\}$ are independent of the host $\mathbf{s}$. Therefore, the higher-order moments (second and up) of $\{h^s_i\}$ become crucial in determining the information content of the hash values. It is well known that as the amount of information in the host increases (resp. decreases), the "watermarkability" (i.e., the strength of the WM that can be embedded in the host) also increases (resp. decreases). In our case, the information contained in the hash values decreases as their distribution becomes more sharply concentrated around 0; this may happen if the hash parameters are not chosen properly or if the underlying image is overly smooth (e.g., an image consisting approximately of a constant color). In our experiments, we avoid this issue in practice by choosing sufficiently large regions with sufficiently quickly varying weights, which yields high enough entropy in the resulting hash values; see Fig. 4 for the operational distribution of the hash values obtained from the Lena image for large enough region sizes (10000 independent realizations for each region size), with the same parameter choice as in Sec. 2.1. Although in practice we never observed hash values with very low entropy (i.e., sharply concentrated around 0), it would be a good precaution to apply a "watermarkability" measure (determining whether an input image is suitable for watermarking) before using the proposed technique; we do not address this issue in this paper and leave it for future research.
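The correlated-weight generation described in this subsection (i.i.d. Gaussian weights passed through an ideal low-pass filter) might be sketched as follows. We assume a square region, a separable ideal filter implemented by zeroing FFT bins outside the passband in both the horizontal and vertical directions, and rescaling to unit sample variance; the function name and these choices are illustrative, not the authors' implementation. The cutoff $0.3\pi$ matches the value used in Sec. 2.1, and such a field would supply the weights $t_{ij}$ inside one region $R_i$.

```python
import numpy as np

def lowpass_gaussian_field(size, cutoff, rng):
    """Correlated PR weights: an i.i.d. N(0, 1) field passed through an ideal low-pass filter.

    cutoff is in radians per sample (e.g. 0.3 * np.pi) and is applied to both the
    horizontal and the vertical frequency axis (separable ideal filter).
    """
    white = rng.standard_normal((size, size))
    omega = 2 * np.pi * np.fft.fftfreq(size)              # digital frequencies in [-pi, pi)
    keep = np.abs(omega) <= cutoff                        # 1-D ideal passband
    mask = np.outer(keep, keep)                           # 2-D separable passband (boolean)
    field = np.fft.ifft2(np.fft.fft2(white) * mask).real  # symmetric mask -> real output
    return field / field.std()                            # rescale to unit sample variance

rng = np.random.default_rng(7)
weights = lowpass_gaussian_field(32, 0.3 * np.pi, rng)   # weights for one 32 x 32 square
```

Lowering `cutoff` makes the weights smoother; the text notes that this trades off security, robustness, and (Sec. 4.4) embedding distortion.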
4.2 Randomized Quantization
The extracted hash values $\mathbf{h}^s$ are then quantized for mark embedding. For blind watermarking, quantization-based methods have been shown to be superior to spread-spectrum methods for both the decoding problem [5, 6] and the detection problem [20], owing to their host-interference rejection capability. For non-blind watermarking, quantization-based methods also have the practical advantage of being more secure, because the associated watermark sequence is host dependent and thus more difficult to estimate. The purpose of using a key-dependent quantizer is to further increase the security of the embedding algorithm. In this work, we use a subtractive dithered scalar quantizer to quantize the hash values:
$$h^x_i = Q(h^s_i + d_i) - d_i, \quad i = 1, \dots, k. \tag{4.8}$$
Here, $Q$ is the integer lattice quantizer scaled by $\Delta$, and $\mathbf{d}$ is the $k$-dimensional dither vector, whose components are i.i.d. uniformly distributed on $[-\frac{\Delta}{2}, \frac{\Delta}{2}]$ and are functions of the secret key $\theta$. It is also possible, and even preferable, to use a high-dimensional lattice quantizer instead of a scalar quantizer. In terms of data-embedding efficiency, a high-dimensional lattice quantizer yields a higher shaping gain for the verification system [20]. From the security perspective, it is essential that quantization occur in large enough dimensions to exploit the combinatorial hardness of some underlying problems [18]. However, working in a truly high-dimensional space comes at the price of high computational complexity.
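The subtractive dithered scalar quantizer (4.8) and the resulting quantization error can be sketched in a few lines. The host hash values below are synthetic stand-ins, the dither is drawn from an ordinary seeded generator rather than derived from a real key, and `dithered_quantize` is a hypothetical helper name.

```python
import numpy as np

def dithered_quantize(h_s, d, delta):
    """Subtractive dithered scalar quantization (4.8): h^x = Q(h^s + d) - d.

    Q rounds to the nearest point of the integer lattice scaled by delta.
    Returns h^x and the quantization error e = h^x - h^s (cf. (4.9)).
    """
    q = delta * np.round((h_s + d) / delta)
    h_x = q - d
    return h_x, h_x - h_s

rng = np.random.default_rng(1)
delta = 8.0
h_s = rng.normal(scale=200.0, size=10000)              # stand-in host hash values
d = rng.uniform(-delta / 2, delta / 2, size=h_s.size)  # dither; key-dependent in practice
h_x, e = dithered_quantize(h_s, d, delta)
print(np.var(e), delta ** 2 / 12)                      # the two values should be close
```

Every $h^x_i + d_i$ lands on the scaled integer lattice, and each error $e_i$ lies in $[-\frac{\Delta}{2}, \frac{\Delta}{2}]$ with variance near $\Delta^2/12$, the value used later in (4.19).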
[Figure 3 diagram: s → Hash Extraction → h^s → Randomized Quantization → h^x → WM Generation → x, with the secret key θ as a side input.]
Figure 3: Diagram of the proposed WM embedding scheme (m = 1).
[Figure 4 plot: "Distribution of hash values"; horizontal axis: hash value (×10^4, from −1 to 1); vertical axis from 0 to 300.]
Figure 4: Operational distribution of the hash values of the Lena image for squares of size 16 (solid) and 32 (dashed); the parameter selection is given in Sec. 2.1. Although Lena is a fairly smooth image, we observe that the resulting hash values are not sharply concentrated around 0 and contain high enough entropy for WM embedding.
4.3 Computation of WM Sequence
Mark embedding is carried out by finding the watermarked data $\mathbf{x}$ whose hash values are equal to $\mathbf{h}^x$. Denote by $\mathbf{e}$ the quantization error of the host hash values $\mathbf{h}^s$:
$$e_i = h^x_i - h^s_i = Q(h^s_i + d_i) - (h^s_i + d_i), \quad i = 1, \dots, k. \tag{4.9}$$
The quantization error $\mathbf{e}$ can be thought of as the watermark in the hash domain; the task of finding the watermarked data is to map $\mathbf{e}$ to the image domain such that the hash values of the watermarked data $\mathbf{x}$ are equal to $\mathbf{h}^x$. Because of the dimensionality reduction from the image domain to the hash domain ($k < n$), such a mapping is generally not unique. The algorithm should be designed to minimize the perceptual distortion between the watermarked data and the host data. In this section, we provide two different algorithms to generate the watermarked data.

4.3.1 The additive WM
Denote by
$$\mathbf{n} = \mathbf{x} - \mathbf{s} \tag{4.10}$$
the additive WM. The watermarked data are derived by solving the following optimization problem:
$$\min \|\mathbf{n}\| \quad \text{subject to} \quad \mathbf{T}\mathbf{x} = \mathbf{h}^x. \tag{4.11}$$
Our criterion is to minimize the Euclidean distance between the watermarked data and the host data. It is, of course, questionable whether the Euclidean metric is a good measure of perceptual distortion. However, it does provide an objective measure of the distortion level introduced. Furthermore, if the solution to (4.11) is visually annoying, it is possible to use heuristics, which can be viewed as analogous to regularization terms in inverse problems. In that case, our formulation is still useful, provided that the heuristics can be expressed as extra constraints in (4.11). This issue is addressed further in Sec. 4.5.

Lemma 4.1. The solution to the optimization problem (4.11) is given by
$$\mathbf{n}_{\min} = \mathbf{T}'(\mathbf{T}\mathbf{T}')^{-1}\mathbf{e}, \tag{4.12}$$
provided that $\mathbf{T}$ has full row rank.
Proof: Using (4.10), (4.7) and (4.9), we can rewrite the constraint in (4.11) as
$$\mathbf{T}\mathbf{n} = \mathbf{h}^x - \mathbf{T}\mathbf{s} = \mathbf{h}^x - \mathbf{h}^s = \mathbf{e}.$$
Note that both $\mathbf{T}$ and $\mathbf{e}$ are known a priori. The result (4.12) follows from the minimum-norm solution to the equivalent optimization problem $\min \|\mathbf{n}\|$ subject to $\mathbf{T}\mathbf{n} = \mathbf{e}$, provided that $\mathbf{T}$ has full row rank [21]. □

Our assumption that $\mathbf{T}$ has full row rank, while valid in most practical situations, constrains the choice of random regions and random weights. In some extreme cases where the average region size and the number of regions are large, $\mathbf{T}\mathbf{T}'$ may be poorly conditioned even though $\mathbf{T}$ has full row rank; owing to the resulting imbalance in how the hash values are computed, this can potentially leak information to the attacker. Hence, parameters must be chosen carefully to minimize the probability that $\mathbf{T}$ is ill-conditioned.

4.3.2 The multiplicative WM
Denote by $\mathbf{n}$ the multiplicative WM, where
$$x_j = n_j s_j, \quad j = 1, \dots, n. \tag{4.13}$$
The watermarked data are derived by solving the optimization problem
$$\min \|\mathbf{n} - \mathbf{1}\| \quad \text{subject to} \quad \mathbf{T}\mathbf{x} = \mathbf{h}^x, \tag{4.14}$$
where $\mathbf{1} \triangleq [1, \dots, 1]'$.

Lemma 4.2. The solution to the optimization problem (4.14) is given by
$$(\mathbf{n} - \mathbf{1})_{\min} = \mathbf{S}\mathbf{T}'(\mathbf{T}\mathbf{S}^2\mathbf{T}')^{-1}\mathbf{e}, \tag{4.15}$$
where $\mathbf{T} = \{t_{ij}\}$ as before and
$$\mathbf{S} \triangleq \mathrm{diag}(s_1, \dots, s_n), \tag{4.16}$$
provided that $\mathbf{T}\mathbf{S}$ has full row rank.
Proof: Using the definition of $\mathbf{S}$ in (4.16), (4.13) can be written as
$$\mathbf{x} = \mathbf{S}\mathbf{n}. \tag{4.17}$$
The constraint in (4.14) can then be written as
$$\mathbf{T}\mathbf{S}(\mathbf{n} - \mathbf{1}) = \mathbf{h}^x - \mathbf{T}\mathbf{S}\mathbf{1} = \mathbf{h}^x - \mathbf{T}\mathbf{s} = \mathbf{h}^x - \mathbf{h}^s = \mathbf{e}.$$
Now (4.15) follows from the minimum-norm solution to the equivalent optimization problem $\min \|\mathbf{n} - \mathbf{1}\|$ subject to $\mathbf{T}\mathbf{S}(\mathbf{n} - \mathbf{1}) = \mathbf{e}$, since $(\mathbf{T}\mathbf{S})' = \mathbf{S}\mathbf{T}'$ and $\mathbf{T}\mathbf{S}(\mathbf{T}\mathbf{S})' = \mathbf{T}\mathbf{S}^2\mathbf{T}'$, provided that $\mathbf{T}\mathbf{S}$ has full row rank. □
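Both closed-form solutions can be checked numerically. The sketch below computes (4.12) and (4.15) with linear solves instead of explicit inverses and verifies that $\mathbf{T}\mathbf{x} = \mathbf{h}^x$ holds for both watermarked signals. A random Gaussian $\mathbf{T}$, synthetic host data with nonzero entries (so $\mathbf{T}\mathbf{S}$ has full row rank), and synthetic targets $\mathbf{h}^x$ are assumptions made purely for illustration; the function names are hypothetical.

```python
import numpy as np

def additive_wm(T, e):
    """Lemma 4.1: n_min = T' (T T')^{-1} e (requires T with full row rank)."""
    return T.T @ np.linalg.solve(T @ T.T, e)

def multiplicative_wm(T, s, e):
    """Lemma 4.2: n = 1 + S T' (T S^2 T')^{-1} e with S = diag(s)."""
    TS = T * s                      # equals T @ diag(s) without forming diag(s)
    return 1.0 + TS.T @ np.linalg.solve(TS @ TS.T, e)

rng = np.random.default_rng(3)
k, n = 10, 200
T = rng.standard_normal((k, n))             # stand-in PR hash matrix
s = rng.uniform(50.0, 200.0, size=n)        # stand-in host data (nonzero entries)
h_x = T @ s + rng.uniform(-2.0, 2.0, k)     # stand-in quantized hash targets
e = h_x - T @ s                             # quantization error, cf. (4.9)

x_add = s + additive_wm(T, e)               # additive WM: x = s + n, cf. (4.10)
x_mul = s * multiplicative_wm(T, s, e)      # multiplicative WM: x_j = n_j s_j, cf. (4.13)
print(np.allclose(T @ x_add, h_x), np.allclose(T @ x_mul, h_x))  # True True
```

Using `np.linalg.solve` on the $k \times k$ system avoids forming an explicit inverse, which is numerically preferable when $\mathbf{T}\mathbf{T}'$ is only moderately well conditioned.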
Remark: To the best of our knowledge, multiplicative WMs have so far only been used in spread-spectrum watermarking. One novelty of the proposed approach is that it allows multiplicative WMs to be used jointly with quantization-based information hiding. The multiplicative and additive embedding algorithms differ in visual quality while being comparable in robustness. The distributions of the resulting distortions are quite different, since the two algorithms use different metrics, and which method is superior often depends on the input image. Techniques similar to those mentioned for additive embedding can also be used to achieve better perceptual quality for multiplicative embedding.
4.4 Analysis of MSE Embedding Distortion
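In this section, we give an estimate of the MSE distortion introduced by the additive watermarking scheme. Such an analysis provides guidelines for choosing algorithmic parameters to achieve better overall performance in terms of distortion, robustness and security. Given a secret key $\theta$, the MSE embedding distortion is
$$D_e \triangleq \frac{1}{n} E\big[\|x - s\|^2 \,\big|\, \theta\big]. \quad (4.18)$$
We make the following two assumptions on the distribution (induced by the host data $s$) of the host hash values $h^s$. First, the variance of $h^s_i$ is much larger than $\Delta^2$, so that quantization occurs in the low-distortion regime [23] and the $\{e_i\}$ are uniformly distributed in $[-\Delta/2, \Delta/2]$. Second, $h^s_i$ and $h^s_j$ are uncorrelated for all $i \neq j$, so we have
$$E[ee' \mid \theta] = \frac{\Delta^2}{12} I_k \quad (4.19)$$
for almost all choices of $\theta$. See Fig. 5 for an experimental verification of the above assumptions.

Lemma 4.3. Assuming (4.19), the MSE embedding distortion introduced by the additive watermarking scheme is
$$D_e = \frac{\Delta^2}{12n} \operatorname{tr}\big((TT')^{-1}\big) \le \frac{k\Delta^2}{12n\lambda_k^2}, \quad (4.20)$$
where $\lambda_k$ is the smallest singular value of $T$.

Proof: By Lemma 4.1, we have $x - s = T'(TT')^{-1}e$. Therefore, the MSE embedding distortion is
$$
\begin{aligned}
D_e &= \frac{1}{n} E\big[(x - s)'(x - s) \,\big|\, \theta\big] \\
&= \frac{1}{n} E\big[e'(TT')^{-1}e \,\big|\, \theta\big] \\
&= \frac{1}{n} E\big[\operatorname{tr}\big((TT')^{-1}ee'\big) \,\big|\, \theta\big] \\
&= \frac{1}{n} \operatorname{tr}\big((TT')^{-1} E[ee' \mid \theta]\big) \qquad (4.21) \\
&= \frac{\Delta^2}{12n} \operatorname{tr}\big((TT')^{-1}\big) \qquad (4.22) \\
&\le \frac{k\Delta^2}{12n\lambda_k^2}, \qquad (4.23)
\end{aligned}
$$
where the second equality uses $(x - s)'(x - s) = e'(TT')^{-1}TT'(TT')^{-1}e = e'(TT')^{-1}e$, (4.21) follows from the fact that $T$ is a deterministic function of the secret key $\theta$, (4.22) follows from (4.19), and (4.23) follows from
$$\operatorname{tr}\big((TT')^{-1}\big) = \sum_{i=1}^{k} \frac{1}{\lambda_i^2}, \quad (4.24)$$
where $\lambda_1 \ge \cdots \ge \lambda_k > 0$ are the singular values of $T$ (all positive, since $T$ has full row rank). ∎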
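The closed form (4.20) can be checked numerically. The pure-Python sketch below draws $e$ uniformly from $[-\Delta/2, \Delta/2]^k$, as in assumption (4.19), forms $x - s = T'(TT')^{-1}e$, and compares the Monte Carlo average of $\frac{1}{n}\|x - s\|^2$ with $\frac{\Delta^2}{12n}\operatorname{tr}\big((TT')^{-1}\big)$. The random Gaussian $T$, the sizes $k = 4$ and $n = 16$, and the function names are illustrative assumptions, not the parameters used in Sec. 7.

```python
import random


def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    """Matrix product A B for lists of rows."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def matvec(M, v):
    """Matrix-vector product M v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def solve(A, b):
    """Solve A y = b for square nonsingular A (Gauss-Jordan with partial pivoting)."""
    size = len(A)
    M = [list(A[i]) + [b[i]] for i in range(size)]
    for c in range(size):
        p = max(range(c, size), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(size):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [u - f * w for u, w in zip(M[r], M[c])]
    return [M[i][size] / M[i][i] for i in range(size)]


def predicted_mse(T, delta):
    """D_e = Delta^2 / (12 n) * tr((TT')^{-1}), as in (4.20)."""
    k, n = len(T), len(T[0])
    G = matmul(T, transpose(T))
    # The i-th diagonal entry of G^{-1} is the i-th component of G^{-1} applied to the i-th unit vector.
    trace_inv = sum(solve(G, [1.0 if r == i else 0.0 for r in range(k)])[i] for i in range(k))
    return delta ** 2 / (12 * n) * trace_inv


def empirical_mse(T, delta, trials, rng):
    """Monte Carlo estimate of (1/n) E||x - s||^2 with x - s = T'(TT')^{-1} e."""
    k, n = len(T), len(T[0])
    G = matmul(T, transpose(T))
    Tt = transpose(T)
    total = 0.0
    for _ in range(trials):
        e = [rng.uniform(-delta / 2, delta / 2) for _ in range(k)]
        wm = matvec(Tt, solve(G, e))          # additive WM sequence x - s
        total += sum(v * v for v in wm) / n
    return total / trials


rng = random.Random(1)
k, n = 4, 16
T = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)]
delta = 300.0
pred = predicted_mse(T, delta)
emp = empirical_mse(T, delta, 10000, rng)
print(pred, emp)   # should agree to within Monte Carlo error
```

Note that the image itself never enters the computation, which mirrors the observation that $D_e$ depends on the key only through $T$.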
Lemma 4.3 shows that, for the additive watermarking scheme, the MSE embedding distortion is independent of the image being watermarked but depends on the secret key via the PR matrix $T$. However, our experiments show that for appropriately chosen algorithmic parameters the MSE embedding distortion is almost independent of the secret key, owing to the remarkable stability of the empirical singular-value distribution of $T$ across keys. Fig. 6 plots the MSE embedding distortion against several cutoff frequencies of the ideal low-pass filter used to generate the correlated weights. As shown there, decreasing the cutoff frequency increases the probability that the PR matrix $T$ is ill-conditioned, and hence that the MSE blows up. The situation becomes severe when the cutoff frequency is less than $0.3\pi$. Of course, MSE cannot completely characterize the perceptual quality of the watermarked image. In fact, although cutoff frequency $0.3\pi$ results in a higher MSE than cutoff frequency $\pi$ (which corresponds to independent weights), our experiments show that the former yields much better watermarked images in terms of human perception, because the introduced watermark sequence is smoother. This motivates our study of regularization algorithms, which aim at further smoothing the introduced WM sequence.
[Figure 5 plots: (a) "Histogram" of the quantization error $e_i$ (horizontal axis from −150 to 150); (b) image of the autocorrelation matrix of $e$.]

Figure 5: (a) Histogram of the quantization error $e_i$; (b) autocorrelation matrix of the quantization-error vector $e$. The intensity of each pixel is drawn proportionally to the absolute value of the correlation, normalized to lie in the range [0, 1]. The experiment is done by averaging over 100 images with the secret key fixed. Similar experimental results have been observed for different choices of keys.

[Figure 6 plot: MSE vs. cutoff frequency; legend: "The mean value", "The maximum value", "The minimum value".]

Figure 6: MSE embedding distortion vs. cutoff frequency. The "o", "*", and "+" lines show the mean, maximum, and minimum values, respectively, over 1000 keys for each cutoff frequency.

4.5 Regularization

The results given in (4.12) and (4.15) are optimal in the sense of minimizing the Euclidean distance between the watermarked data and the host data. From a practical standpoint, the ideal solution would be to design the embedded WM so as to minimize perceptual artifacts. It is well known that, although the Euclidean metric is a reasonable quantity to optimize, it is certainly not the best one for the human visual system. For instance, the minimum-norm solutions may well yield spike-like artifacts while minimizing the Euclidean norm. In this section, we propose to further improve the minimum-norm solutions in a perceptual sense by imposing extra smoothness constraints in the optimization problem, so that the resulting WM is visually more appealing. Our approach is analogous to imposing extra constraints in regularization-based inverse problems such as denoising and restoration. Suppose that the perceptual quality of the watermarked signal is ensured by imposing the linear constraint $An = \mathbf{0}$, where $\mathbf{0} \triangleq [0, \cdots, 0]'$ and $A$ is appropriately chosen. In our algorithm, we constrain $n$ to be band-limited, in which case $A$ corresponds to a sub-matrix of a discrete Fourier transform (DFT) matrix of appropriate size. If such a linear constraint is imposed, the optimization problem that corresponds to the design of the additive WM is
$$\min \|n\| \quad \text{subject to} \quad Tx = h^x \ \text{and} \ An = \mathbf{0}. \quad (4.25)$$
Similarly, we can impose the smoothness constraint $\tilde{A}(n - \mathbf{1}) = \mathbf{0}$ for the design of the multiplicative WM, and the corresponding optimization problem is
$$\min \|n - \mathbf{1}\| \quad \text{subject to} \quad Tx = h^x \ \text{and} \ \tilde{A}(n - \mathbf{1}) = \mathbf{0}. \quad (4.26)$$

Lemma 4.4. The solutions to the optimization problems (4.25) and (4.26) are given by
$$n_{\min} = T_1'(T_1T_1')^{-1}e_1 \quad (4.27)$$
and
$$(n - \mathbf{1})_{\min} = T_2'(T_2T_2')^{-1}e_2, \quad (4.28)$$
respectively, where
$$T_1 = \begin{bmatrix} T \\ A \end{bmatrix}, \quad T_2 = \begin{bmatrix} TS \\ \tilde{A} \end{bmatrix}, \quad e_1 = \begin{bmatrix} e \\ \mathbf{0} \end{bmatrix}, \quad e_2 = \begin{bmatrix} e \\ \mathbf{0} \end{bmatrix},$$
provided that $T_1$ and $T_2$ have full row rank.
Proof: The proof is immediate by following steps similar to those in the proofs of Lemma 4.1 and Lemma 4.2 and incorporating the extra constraint $An = \mathbf{0}$ (respectively, $\tilde{A}(n - \mathbf{1}) = \mathbf{0}$). ∎

It is certainly possible to impose other constraints that may improve the quality even further. Such constraints may even be nonlinear or based on heuristics and still be useful in practice, as long as they improve the perceptual quality of the watermarked image. Next, we present an iterative algorithm for the additive watermarking scheme in which an arbitrary smoothness constraint can be incorporated.

1. Set $n_0 = n_{\min}$, where $n_{\min}$ is given by (4.12).
2. For $K$ steps, do:
   2.1) Set $n_1 = \mathrm{SMOOTH}(n_0)$.
   2.2) Set $n_2 = n_1 - n_{\min}$.
   2.3) Set $n_3$ to the projection of $n_2$ onto the null space of $T$: $n_3 = n_2 - T'(TT')^{-1}Tn_2$.
   2.4) Set $\tilde{n}_0 = n_3 + n_{\min}$.
   2.5) If $\|\tilde{n}_0 - n_0\| < \varepsilon$, stop; else set $n_0 = \tilde{n}_0$ and go to 2.1.
3. $n_0$ is the watermark sequence.

In the algorithm above, the operator SMOOTH can be an arbitrary smoothness operator. If the algorithm converges, both the smoothness condition and the embedding constraint $T(s + n_0) = h^x$ (equivalently, $Tn_0 = e$) are satisfied. Note that a similar algorithm can be written for the multiplicative case as well. Furthermore, the algorithm may be useful from a computational-complexity point of view when $T_1T_1'$ in (4.27) or $T_2T_2'$ in (4.28) is large and expensive to invert. Our experiments reveal that if we choose $\mathrm{SMOOTH}(\cdot)$ to be ideal low-pass filtering with cutoff frequency $0.3\pi$, the proposed algorithm typically converges in about 15 iterations for the additive watermarking scheme, and its computational complexity is considerably lower than that of computing (4.27) directly.

4.6 Extension against Value-Metric Scaling Attacks

Traditional quantization-based watermarking methods are criticized for their vulnerability to value-metric scaling attacks. However, there is a remedy within our "hash-then-watermark" framework. We start with the following simple modification of the hash extraction algorithm from Sec. 4.1. First, we use the secret key $\theta$ to randomly tile $s$ into rectangles $R_i$. Second, given each $R_i$, instead of generating one set of random weights $\{t_{ij}\}$, we use the secret key $\theta$ to generate two independent sets of random weights $\{a_{ij}\}$ and $\{b_{ij}\}$ for each $j \in R_i$. We let $a_{ij} = b_{ij} = 0$ for $j \notin R_i$. Finally, we compute
$$h^s_i = \frac{\sum_{j=1}^{n} a_{ij} s_j}{\sum_{j=1}^{n} b_{ij} s_j}, \quad i = 1, \ldots, k, \quad (4.29)$$
as the hash values. Now, $h^s$ is invariant under homogeneous value-metric scaling of $s$. In fact, $h^s$ is approximately invariant under any slowly varying value-metric scaling of $s$, because the computation of each component of $h^s$ involves only a local portion of $s$. The watermarked data are generated using optimization algorithms similar to those developed in Sec. 4.3.

4.6.1 The additive WM

Denote by $n = x - s$ the additive WM. The watermarked data are generated by solving the following optimization problem:
$$\min \|n\| \quad \text{subject to} \quad \frac{\sum_{j=1}^{n} a_{ij} x_j}{\sum_{j=1}^{n} b_{ij} x_j} = h^x_i, \quad i = 1, \ldots, k. \quad (4.30)$$

Lemma 4.5. Let $T$ be a $k \times n$ matrix with
$$T_{ij} = a_{ij} - h^x_i b_{ij}. \quad (4.31)$$
Then the solution to the optimization problem (4.30) is given by
$$n_{\min} = -T'(TT')^{-1}Ts, \quad (4.32)$$
provided that $T$ has full row rank.

Proof: Using the definition of $n$, we can rewrite the constraints in (4.30) as
$$\sum_{j=1}^{n} (a_{ij} - h^x_i b_{ij})\, n_j = -\sum_{j=1}^{n} (a_{ij} - h^x_i b_{ij})\, s_j, \quad i = 1, \ldots, k.$$
With $T$ defined as in (4.31), the above equation can be written in matrix-vector form as $Tn = -Ts$. Note that both $T$ and $s$ are known a priori. Hence, (4.32) follows from the minimum-norm solution to the equivalent optimization problem $\min \|n\|$ subject to $Tn = -Ts$, provided that $T$ has full row rank. ∎
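As an illustration of (4.29)-(4.32), the following pure-Python sketch builds rational statistics over hypothetical index windows that stand in for the rectangles $R_i$ (with positive denominator weights $b_{ij}$ so that every ratio is well defined), embeds a target $h^x$ via $n_{\min} = -T'(TT')^{-1}Ts$, and checks that the hash of the watermarked data equals $h^x$ and is unchanged when $x$ is multiplied by a constant. The window layout, the weight distributions, the target offset, and the function names are assumptions made only for this example.

```python
import random


def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    """Matrix product A B for lists of rows."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def matvec(M, v):
    """Matrix-vector product M v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def solve(A, b):
    """Solve A y = b for square nonsingular A (Gauss-Jordan with partial pivoting)."""
    size = len(A)
    M = [list(A[i]) + [b[i]] for i in range(size)]
    for c in range(size):
        p = max(range(c, size), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(size):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [u - f * w for u, w in zip(M[r], M[c])]
    return [M[i][size] / M[i][i] for i in range(size)]


def rational_hash(a, b, s):
    """h_i = sum_j a_ij s_j / sum_j b_ij s_j, as in (4.29)."""
    return [sum(p * q for p, q in zip(ar, s)) / sum(p * q for p, q in zip(br, s))
            for ar, br in zip(a, b)]


def rational_embed(a, b, s, hx):
    """Additive WM for rational statistics: x = s + n with n = -T'(TT')^{-1} T s."""
    T = [[aij - h * bij for aij, bij in zip(ar, br)] for ar, br, h in zip(a, b, hx)]  # (4.31)
    w = solve(matmul(T, transpose(T)), [-v for v in matvec(T, s)])
    n = matvec(transpose(T), w)                                                       # (4.32)
    return [sj + nj for sj, nj in zip(s, n)]


rng = random.Random(2)
n_samples, width = 40, 12
starts = [0, 8, 16, 24]   # hypothetical regions R_i: index windows [start, start + width)
a, b = [], []
for st in starts:
    inside = range(st, st + width)
    a.append([rng.gauss(0.0, 1.0) if j in inside else 0.0 for j in range(n_samples)])
    b.append([rng.uniform(0.5, 1.5) if j in inside else 0.0 for j in range(n_samples)])
s = [rng.uniform(50.0, 200.0) for _ in range(n_samples)]
hs = rational_hash(a, b, s)
hx = [h + 0.01 for h in hs]   # hypothetical target hash values
x = rational_embed(a, b, s, hx)
print(rational_hash(a, b, x))                      # matches hx
print(rational_hash(a, b, [2.5 * v for v in x]))   # unchanged by value-metric scaling
```

Scaling invariance is exact here because a common factor cancels in each ratio; for slowly varying scalings it holds only approximately, as noted above.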
4.6.2 The multiplicative WM
Denote by $n$ the multiplicative WM, where $x_j = n_j s_j$ for $j = 1, \ldots, n$. The watermarked data are derived by solving the optimization problem
$$\min \|n - \mathbf{1}\| \quad \text{subject to} \quad \frac{\sum_{j=1}^{n} a_{ij} x_j}{\sum_{j=1}^{n} b_{ij} x_j} = h^x_i, \quad i = 1, \ldots, k. \quad (4.33)$$

Lemma 4.6. The solution to the optimization problem (4.33) is given by
$$(n - \mathbf{1})_{\min} = -ST'(TS^2T')^{-1}Ts, \quad (4.34)$$
provided that $TS$ has full row rank.
Proof: By the definitions of $T$, $S$, and $n$, the constraints in (4.33) can be written as $TSn = \mathbf{0}$, i.e., $TS(n - \mathbf{1}) = -TS\mathbf{1} = -Ts$. The result in (4.34) follows from the minimum-norm solution to the equivalent optimization problem $\min \|n - \mathbf{1}\|$ subject to $TS(n - \mathbf{1}) = -Ts$, provided that $TS$ has full row rank. ∎

5. WM DETECTION ALGORITHMS
The decoder first extracts the hash values of the attacked data $y$ using the secret key $\theta$. WM detection then takes place in the hash domain, where, as discussed in Sec. 2.1, we may consider the following simple AWGN attack model:
$$m = 0: \ h^y = h^s + z, \qquad m = 1: \ h^y = h^x + z, \quad (5.35)$$
where $h^s$, $h^x$ and $h^y$ are the hash values of the host data $s$, the watermarked data $x$ and the attacked data $y$, respectively, and $z$ is additive white Gaussian noise. Next, we propose a blind and a semi-blind WM detector. Their performance under a wide range of realistic attacks is demonstrated via simulations in Sec. 7.
5.1 Blind WM Detection
For blind watermarking, we propose a sample-by-sample decision strategy, where the binary decision random variable $v_i$ is defined as
$$v_i \triangleq \begin{cases} 1, & |f_i| \ge \Delta/4, \\ 0, & |f_i| < \Delta/4, \end{cases} \quad (5.36)$$
and
$$f_i = h^y_i - \big(Q(h^y_i + d_i) - d_i\big), \quad i = 1, \ldots, k. \quad (5.37)$$
The detection statistic $t(h^y \mid \theta)$ simply takes the majority vote over all $v_i$:
$$t(h^y \mid \theta) \triangleq \frac{2}{k} \sum_{i=1}^{k} v_i \ \underset{\hat{m} = 1}{\overset{\hat{m} = 0}{\gtrless}} \ \rho, \quad (5.38)$$
where the threshold $0 \le \rho \le 1$ is chosen to balance $P_{FA}$ and $P_M$.
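The blind rule (5.36)-(5.38) takes only a few lines. The pure-Python sketch below assumes that the embedder produces $h^x_i = Q(h^s_i + d_i) - d_i$ (consistent with the form of $f_i$ in (5.37)), that $Q$ is a uniform quantizer with step $\Delta = 300$, and that the attack is AWGN of standard deviation 30 in the hash domain as in (5.35); the host-hash spread, the dither generation standing in for the key, and the threshold $\rho = 0.5$ are illustrative choices. For a marked input, $f_i$ reduces to the noise $z_i$, so few $|f_i|$ exceed $\Delta/4$ and the statistic is near 0; for an unmarked input, $f_i$ is roughly uniform on $[-\Delta/2, \Delta/2]$, so about half of the votes are 1 and the statistic is near 1.

```python
import random


def quantize(v, delta):
    """Uniform scalar quantizer Q with step delta (reconstruction points at multiples of delta)."""
    return delta * round(v / delta)


def embed_hash(hs, d, delta):
    """Assumed dithered quantization: h^x_i = Q(h^s_i + d_i) - d_i."""
    return [quantize(h + di, delta) - di for h, di in zip(hs, d)]


def blind_statistic(hy, d, delta):
    """Left-hand side of (5.38): (2/k) times the number of i with |f_i| >= delta/4."""
    votes = 0
    for h, di in zip(hy, d):
        f = h - (quantize(h + di, delta) - di)     # (5.37)
        votes += 1 if abs(f) >= delta / 4 else 0   # (5.36)
    return 2.0 * votes / len(hy)


rng = random.Random(3)
k, delta, sigma = 100, 300.0, 30.0
d = [rng.uniform(-delta / 2, delta / 2) for _ in range(k)]   # key-dependent dither (stand-in for theta)
hs = [rng.gauss(0.0, 3000.0) for _ in range(k)]              # host hash values, variance >> delta^2
hx = embed_hash(hs, d, delta)
t_marked = blind_statistic([h + rng.gauss(0.0, sigma) for h in hx], d, delta)
t_unmarked = blind_statistic([h + rng.gauss(0.0, sigma) for h in hs], d, delta)
rho = 0.5
print(t_marked, t_unmarked)
print("marked" if t_marked < rho else "unmarked")   # t < rho decides m_hat = 1
```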
5.2 Semi-blind WM Detection
For semi-blind watermarking, both the secret key $\theta$ and the original hash values $h^s$ are available to the detector. In this case, we use the following generalized correlation detector, which is detection-theoretically optimal assuming $z$ is white and Gaussian [22]:
$$t(h^y \mid \theta, h^s) \triangleq \frac{\langle h^y - h^s, \, h^x - h^s \rangle}{\|h^x - h^s\|^2} \ \underset{\hat{m} = 0}{\overset{\hat{m} = 1}{\gtrless}} \ \rho, \quad (5.39)$$
where $0 \le \rho \le 1$ is the threshold chosen to control the tradeoff between $P_{FA}$ and $P_M$.
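A matching pure-Python sketch of the correlation statistic in (5.39), under illustrative assumptions: $h^x = h^s + e$ with $e$ uniform on $[-\Delta/2, \Delta/2]$ (the low-distortion model of Sec. 4.4) and AWGN of standard deviation 30 in the hash domain. For a marked input the statistic concentrates near 1, and for an unmarked input near 0; the seed, sizes, and $\rho = 0.5$ are hypothetical.

```python
import random


def semi_blind_statistic(hy, hs, hx):
    """Left-hand side of (5.39): <h^y - h^s, h^x - h^s> / ||h^x - h^s||^2."""
    num = sum((y - s) * (x - s) for y, s, x in zip(hy, hs, hx))
    den = sum((x - s) ** 2 for s, x in zip(hs, hx))
    return num / den


rng = random.Random(4)
k, delta, sigma = 100, 300.0, 30.0
hs = [rng.gauss(0.0, 3000.0) for _ in range(k)]
# Low-distortion model of Sec. 4.4: h^x = h^s + e, e uniform in [-delta/2, delta/2].
hx = [h + rng.uniform(-delta / 2, delta / 2) for h in hs]
t_marked = semi_blind_statistic([h + rng.gauss(0.0, sigma) for h in hx], hs, hx)
t_unmarked = semi_blind_statistic([h + rng.gauss(0.0, sigma) for h in hs], hs, hx)
rho = 0.5
print(t_marked, t_unmarked)   # t > rho decides m_hat = 1
```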
6. ROBUSTNESS AGAINST ESTIMATION ATTACKS
Conventionally proposed attacks on mark-embedding systems are often generic, i.e., not dependent on the particular mark-embedding algorithm being attacked. Some examples are StirMark attacks [17], CheckMark attacks [25], generic estimation attacks for noise removal [26], sensitivity attacks [27], and "match-and-replace" type attacks [28, 29, 30]. Because such attacks are not algorithm-adaptive, it is reasonable to expect that a well-engineered mark-embedding scheme can withstand most of them to a reasonable degree. However, this conveys little information about the performance of that mark-embedding scheme against carefully designed algorithm-specific attacks. For instance, in [14] we proposed an algorithm-specific attack (and hence one that is cryptanalytic in spirit) on an audio mark-embedding system that is quite robust against most generic attacks. There, we argued that using stochastic models for attackers is realistic. We also noted that a similar assumption about the attacked data on the decoder's side is inappropriate, since the attacker may design an attack that specifically "fools" any such assumption of the decoder, thereby increasing the probability of error. Here, we follow an approach similar to [14]: we take the role of the attacker and derive an optimal estimation attack on the additive-embedding scheme using a particular stochastic model on the host data and the embedding sequence. We assume that the attacker knows the algorithmic steps of our additive mark-embedding algorithm. Moreover, we allow some partial information about the secret key to be known by the attacker. Hence, we give the attacker the "upper hand", which is a drastic departure from conventional attack algorithms [17, 25, 26, 27, 28, 29, 30]. Our attack consists of finding the linear minimum-mean-squared-error (LMMSE) estimate of the unmarked signal using this extra information.
Furthermore, we show that under some mild assumptions, the same attack is both the minimum-mean-squared-error (MMSE) and the maximum-a-posteriori (MAP) estimate of the unmarked signal. We then show the robustness of our algorithm against the proposed attack. The fact that our attack is algorithm-specific and uses extra information about the secret key is the main difference between our approach and conventional estimation-based attacks (e.g., [25, 26]) in the literature. We begin our analysis by stating the following immediate observations by the attacker:
1) $E[e] = 0$.
2) $e$ and $s$ are uncorrelated, i.e., $E[e_i s_j] = E[e_i]E[s_j]$ for all $i$ and $j$.
Note that these are classical results from quantization theory in the low-distortion regime [23], which is indeed the regime in which our mark-embedding algorithm operates. Recall that, in the additive-embedding scheme, we have $x = s + n$, where $n = T'(TT')^{-1}e$. Hence, based on observation 2, $s$ and $n$ are uncorrelated. Furthermore, we assume that the attacker knows the mean value of the unmarked host signal; equivalently, we assume that $E[s] = 0$ in the analysis below. We have the following result:
[Figure 7 plots: (a) "Empirical Distribution of Singular Values of T"; (b) "Distribution of Frobenius Norm of $R_n$".]

Figure 7: Experimental results on $T$ and $\|R_n\|_F$ using 10000 keys: (a) the estimated empirical distribution of the singular values of $T$; (b) the estimated distribution of $\|R_n\|_F$.

Lemma 6.1. The LMMSE estimate of the unmarked host data $s$ from the watermarked data $x$ (i.e., the optimal estimate of $s$ that is linear in $x$ and minimizes the MSE) is given by
$$\hat{s} = R_s (R_s + R_n)^{-1} x, \quad (6.40)$$
assuming $s$ and $n$ are both zero-mean and uncorrelated. Here, $R_s$ and $R_n$ are the autocorrelation matrices of $s$ and $n$.

Proof: Under the stated assumptions, this is a classical result; see, for example, [22] for a proof. ∎

The result (6.40) is also known as Wiener filtering in the literature. Next, we state two more observations by the attacker:
3) For each $i = 1, 2, \ldots, k$, the quantization error $e_i$ is uniformly distributed in $[-\Delta/2, \Delta/2]$.
4) $e_i$ and $e_j$ are uncorrelated for all $i \neq j$, i.e., $E[e_i e_j] = \frac{\Delta^2}{12}\delta_{ij}$.
Note that observations 3 and 4 are similar to, but not the same as, the assumptions made in Sec. 4.4. Here, from the attacker's perspective, the unmarked host data $s$ may be considered deterministic but unknown, and the probability space is defined solely over the random key selection. In that case, the probability space is induced by the random dithering in (4.9), which is a function of $\theta$. As a result, even for a deterministic unknown $s$, observations 3 and 4 are exact (unlike the analogous assumptions in Sec. 4.4, which hold only approximately in the low-distortion regime [24]). Using observation 4, we have
$$E[ee'] = \frac{\Delta^2}{12} I_k. \quad (6.41)$$
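Observations 3 and 4 rest on subtractive dithering. The small pure-Python sketch below assumes $h^x_i = Q(h^s_i + d_i) - d_i$, with $Q$ a uniform quantizer of step $\Delta$ and key-dependent dithers $d_i$ drawn independently and uniformly from $[-\Delta/2, \Delta/2]$; this is our stand-in for (4.9), whose exact sign convention may differ. It fixes two deterministic host hash values (chosen arbitrarily) and redraws only the dithers, i.e., only the key. The empirical mean, second moment, and cross-correlation of $e$ should come out near $0$, $\Delta^2/12$, and $0$, respectively.

```python
import random


def quantize(v, delta):
    """Uniform scalar quantizer Q with step delta."""
    return delta * round(v / delta)


rng = random.Random(5)
delta, trials = 300.0, 20000
hs = [1234.5, -987.0]   # two fixed (deterministic) host hash values, chosen arbitrarily
e1, e2 = [], []
for _ in range(trials):
    # Each trial models a fresh key: new independent dithers d_1 and d_2.
    d1 = rng.uniform(-delta / 2, delta / 2)
    d2 = rng.uniform(-delta / 2, delta / 2)
    e1.append(quantize(hs[0] + d1, delta) - d1 - hs[0])   # e_1 = h^x_1 - h^s_1
    e2.append(quantize(hs[1] + d2, delta) - d2 - hs[1])   # e_2 = h^x_2 - h^s_2

mean1 = sum(e1) / trials
var1 = sum(v * v for v in e1) / trials
cross = sum(u * v for u, v in zip(e1, e2)) / trials
print(mean1, var1, delta ** 2 / 12, cross)
```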
Let $T = U\Sigma_T V'$ be the singular-value decomposition (SVD) of the secret transformation matrix $T$. We assume that $T$ has full row rank. In that case, $U$ (the left singular-vector matrix) is a $k \times k$ unitary matrix, $\Sigma_T$ (the singular-value matrix) is a $k \times k$ diagonal matrix with positive entries, and $V$ (the right singular-vector matrix) is an $n \times k$ matrix with orthonormal columns. Using the SVD of $T$, the minimum-norm watermark sequence for the additive mark-embedding scheme can be rewritten as
$$n = T'(TT')^{-1}e = V\Sigma_T^{-1}U'e. \quad (6.42)$$
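Spelled out, the second equality in (6.42) follows by substituting the SVD and using $U'U = UU' = I_k$ and $V'V = I_k$; the same steps give a form that avoids computing the SVD explicitly:

$$
\begin{aligned}
TT' &= U\Sigma_T V'V\Sigma_T U' = U\Sigma_T^2U', \qquad (TT')^{-1} = U\Sigma_T^{-2}U', \\
T'(TT')^{-1} &= V\Sigma_T U'\,U\Sigma_T^{-2}U' = V\Sigma_T^{-1}U', \\
T'(TT')^{-2}T &= V\Sigma_T U'\,U\Sigma_T^{-4}U'\,U\Sigma_T V' = V\Sigma_T^{-2}V'.
\end{aligned}
$$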
Using (6.41) and (6.42), we obtain
$$R_n = E[nn'] = \frac{\Delta^2}{12} V\Sigma_T^{-2}V'. \quad (6.43)$$
Next, we provide some experimental results on $T$ and $R_n$ in order to understand the effect of $R_n$ in the estimation attack. In Fig. 7(a), we show the estimated empirical distribution of the singular values of $T$ for 10000 randomly chosen keys; the parameters used are the same as those given in Sec. 7. In Fig. 7(b), we show the estimated distribution of $\|R_n\|_F$ under the same setup. We consider conventional stochastic signal models on $s$ to estimate operational statistics of $R_s$. The first model we use is the "independent stationary Gaussian model": we assume that $s$ is a realization of an i.i.d. zero-mean Gaussian process. Under this model, $R_s$ is a scaled version of the identity matrix, where the scaling factor is the variance. We estimate this variance using 4000 images from our database. In Fig. 8(a) and Fig. 8(b), we show the estimated empirical distribution of the singular values of $R_s$ and of $\|R_s\|_F$, respectively, under this model. Following [19], the second model that we consider is the so-called non-stationary approximately locally i.i.d. Gaussian model. The same model or a variant of it has been shown to be effective in compression [31] and denoising [32] applications. We model $s$ as the realization of an independent, but not necessarily identically distributed, zero-mean Gaussian process. Furthermore, we assume that the variance field of $s$ is smoothly varying, based on which we adopt a locally approximately i.i.d. assumption on $s$. Using this approximation, we estimate the underlying variance field; see [19] for further details on the model and estimation procedure. Under this model, $R_s$ is a diagonal matrix with possibly different entries. We estimate $R_s$ under this model using 4000 images from our database. In Fig. 8(c) and Fig. 8(d), we show the estimated empirical distribution of the singular values of $R_s$ and of $\|R_s\|_F$, respectively, under this model.

[Figure 8 plots: (a) "Empirical Distribution of Singular Values of $R_s$, Stationary Model"; (b) "Distribution of Frobenius Norm of $R_s$, Stationary Model"; (c) "Empirical Distribution of Singular Values of $R_s$, Non-stationary Model"; (d) "Distribution of Frobenius Norm of $R_s$, Non-stationary Model".]

Figure 8: Experimental results on $R_s$ using 4000 images: (a) the estimated distribution of the singular values of $R_s$ using the stationary i.i.d. Gaussian model; (b) the estimated distribution of $\|R_s\|_F$ using the stationary i.i.d. Gaussian model; (c) the estimated distribution of the singular values of $R_s$ using the non-stationary approximately locally i.i.d. Gaussian model; (d) the estimated distribution of $\|R_s\|_F$ using the non-stationary approximately locally i.i.d. Gaussian model.

Note that, although our current attack analysis uses Gaussian models on the host data $s$, our algorithmic construction is motivated by the desire to derive similar results while avoiding or minimizing such stochastic assumptions, relying instead on the randomization steps that result from random key selection. This issue will be explored in our future research. We assume that the attacker knows $R_s$ and $R_n$. According to (6.43), $R_n = \frac{\Delta^2}{12} V\Sigma_T^{-2}V'$, so the attacker gets some information about $T$ for free. Hence, the attacker applies approximate Wiener filtering as an attack, using an estimate of $R_s$ in (6.40). The estimate is found using the original unmarked host $s$ under the two models explained above. Under both the stationary and the non-stationary Gaussian models, the attack was ineffective in all our experiments. In fact, comparing Fig. 7(b) with Fig. 8(b) and Fig. 8(d), this result is not surprising: according to our experiments, the empirical mean of $\|R_n\|_F$ is $2.57 \times 10^5$, whereas the empirical mean of $\|R_s\|_F$ is $1.52 \times 10^7$ and $2.18 \times 10^7$ under the stationary and non-stationary independent Gaussian models, respectively. This suggests that $R_s + R_n$ is very close to $R_s$, which implies that $R_s(R_s + R_n)^{-1}$ is approximately the identity matrix. Hence, mark embedding perturbs the second-order characteristics of $s$ only negligibly, which explains why the attack is ineffective even though the attacker has partial information on $s$ and $n$.
Note that if we further assume that $n$ is Gaussian (for a large number of samples and rectangles), the estimate (6.40) is both the MMSE and the MAP estimate of $s$ given $x$ under Gaussian stochastic models of $s$ [22].
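To see numerically why $R_s(R_s + R_n)^{-1}$ acts almost as the identity, the pure-Python sketch below forms $R_n$ through the equivalent expression $\frac{\Delta^2}{12}T'(TT')^{-2}T$ (equal to (6.43), since $T'(TT')^{-2}T = V\Sigma_T^{-2}V'$), takes the stationary model $R_s = \sigma_s^2 I$, and applies the LMMSE estimate (6.40) to $x = s + n$. The matrix sizes, the Gaussian $T$, and $\sigma_s^2 = 10^6$ are hypothetical choices, picked only so that $R_s$ dominates $R_n$, qualitatively as in the reported Frobenius norms; they are not the paper's parameters. The estimate stays very close to $x$, and the residual $\hat{s} - s$ still carries nearly all of the watermark $n$.

```python
import random


def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    """Matrix product A B for lists of rows."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def matvec(M, v):
    """Matrix-vector product M v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def solve(A, b):
    """Solve A y = b for square nonsingular A (Gauss-Jordan with partial pivoting)."""
    size = len(A)
    M = [list(A[i]) + [b[i]] for i in range(size)]
    for c in range(size):
        p = max(range(c, size), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(size):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [u - f * w for u, w in zip(M[r], M[c])]
    return [M[i][size] / M[i][i] for i in range(size)]


def norm(v):
    return sum(c * c for c in v) ** 0.5


rng = random.Random(6)
k, n, delta = 4, 16, 300.0
var_s = 1.0e6   # hypothetical host variance (stationary model: R_s = var_s * I)
T = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)]
Tt = transpose(T)
G = matmul(T, Tt)

# R_n = (Delta^2 / 12) T'(TT')^{-2} T, built column by column from T e_j.
Rn_cols = []
for j in range(n):
    w = solve(G, solve(G, [row[j] for row in T]))
    Rn_cols.append([delta ** 2 / 12 * v for v in matvec(Tt, w)])
Rn = transpose(Rn_cols)

s = [rng.gauss(0.0, var_s ** 0.5) for _ in range(n)]
e = [rng.uniform(-delta / 2, delta / 2) for _ in range(k)]
wm = matvec(Tt, solve(G, e))                 # n = T'(TT')^{-1} e
x = [a + b for a, b in zip(s, wm)]

# LMMSE (Wiener) attack (6.40) with R_s = var_s * I: s_hat = var_s * (R_s + R_n)^{-1} x.
Rsn = [[Rn[i][j] + (var_s if i == j else 0.0) for j in range(n)] for i in range(n)]
s_hat = [var_s * v for v in solve(Rsn, x)]

rel_change = norm([a - b for a, b in zip(s_hat, x)]) / norm(x)
residual = [a - b for a, b in zip(s_hat, s)]
retained = sum(r * w for r, w in zip(residual, wm)) / sum(w * w for w in wm)
print(rel_change, retained)   # small change to x; watermark largely retained
```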
7. EXPERIMENTAL RESULTS
In our experiments, we used many grayscale test images of size 512 × 512 from a database of approximately 4000 images. We applied our additive mark-embedding algorithm to the DC subband of a 3-level DWT with Daubechies length-8 wavelets. We found experimentally that the multiplicative mark-embedding algorithm yields slightly inferior results compared with the additive scheme. The PR transformation matrix $T$ is chosen such that the image statistics are weighted averages over pseudorandomly chosen squares of size 16 in this subband. The weights are initially chosen as i.i.d. realizations of a zero-mean Gaussian distribution with variance 0.7; they are then passed through an ideal low-pass filter with cutoff frequency $0.3\pi$. Typically, we choose about 100 rectangles. With these parameters, we never observed an ill-conditioned transformation matrix $T$. Ideally, the quantization step size $\Delta$ should be chosen in an image-dependent way; however, we observed that $\Delta = 300$ produces reasonable results for a wide class of images. When we compute the WM sequence, we use the regularization technique of Sec. 4.5, imposing the constraint that the WM be bandlimited to $[-0.6\pi, 0.6\pi]$. At the receiver, prior to applying our detection algorithms, we first resize the input image to size 512 × 512 via bicubic interpolation.

We tested our algorithm against several attacks. We applied these attacks to both watermarked and unmarked images and experimentally quantified the probabilities of "false alarm" ($P_{FA}$) and "miss" ($P_M$). Recall that a "false alarm" occurs if the detector declares the presence of the WM although it was not embedded; conversely, a "miss" occurs if the detector declares that the WM is absent even though it was embedded. In all of our experiments, neither event occurred, for either blind or semi-blind detection, under estimation attacks (explained in Sec. 6), JPEG compression attacks (with quality factor as low as 10), AWGN attacks (with standard deviation as high as 50), and several smoothing attacks with linear shift-invariant filtering. Under JPEG compression with quality factor 1, we observed no errors with semi-blind detection. In Fig. 9(a) and Fig. 9(b), we show the operational distributions of the soft (i.e., prior to thresholding) detector outputs for the AWGN attack with standard deviation 30 (solid line for unmarked inputs and dotted line for watermarked inputs) and 50 (circles for unmarked inputs and crosses for watermarked inputs) for the blind and semi-blind detectors, respectively. Recall that the soft detector outputs are the left-hand sides of (5.38) and (5.39) for blind and semi-blind detection, respectively. Using the same setup, in Fig. 9(c) and Fig. 9(d), we show the operational distributions of the soft detector outputs for the JPEG compression attack with quality factor 10 (solid line for unmarked inputs and dotted line for watermarked inputs) and 1 (circles for unmarked inputs and crosses for watermarked inputs) for the blind and semi-blind detectors, respectively. In all cases, we see a clear separation between the detector outputs for watermarked and unmarked inputs.
These results were produced using 100 randomly chosen images from our database and 100 randomly chosen keys; hence, the total sample size is 10000. Next, we tested our algorithm against geometric attacks such as rotation, cropping, and StirMark random bending [17]. In general, we observed that as long as these attacks are mild and produce perceptually pleasing images, $P_{FA}$ and $P_M$ remain very small. In Fig. 10, we show receiver operating characteristic (ROC) curves (i.e., $P_M$ vs. $P_{FA}$ as the detection threshold varies) under rotation, cropping, and random bending attacks with the same experimental setup as above. In Fig. 10(a), the solid (resp. dash-dotted) line shows the ROC under rotation by 1 degree for blind (resp. semi-blind) detection; the dashed (resp. dotted) line shows the ROC under rotation by 2 degrees for blind (resp. semi-blind) detection. In Fig. 10(b), the solid (resp. dash-dotted) line shows the ROC under cropping 5 rows from each dimension for blind (resp. semi-blind) detection; the dashed (resp. dotted) line shows the ROC under cropping 10 rows from each dimension for blind (resp. semi-blind) detection. In Fig. 10(c), the solid and dashed lines show the ROC under the StirMark random bending attack for blind and semi-blind detection, respectively. We observe that, under geometric attacks, semi-blind detection improves performance significantly.
[Figure 9 plots: (a) "Distribution of Blind Detector Output", AWGN with standard deviation 30 and 50; (b) "Distribution of Semi-blind Detector Output", AWGN with standard deviation 30 and 50; (c) "Distribution of Blind Detector Output", JPEG with QF = 10 and QF = 1; (d) "Distribution of Semi-blind Detector Output", JPEG with QF = 10 and QF = 1; each panel shows unmarked and marked inputs.]

Figure 9: Empirical distributions of the soft detector outputs under AWGN (panels (a) and (b)) and JPEG compression (panels (c) and (d)) attacks. Panels (a) and (c) (resp. (b) and (d)) show the results for the blind (resp. semi-blind) detector, i.e., the left-hand sides of (5.38) and (5.39), respectively. In (a) and (b), the solid (resp. dotted) line shows the distribution for noise standard deviation 30 for unmarked (resp. watermarked) inputs; circles (resp. crosses) show the distribution for noise standard deviation 50 for unmarked (resp. watermarked) inputs. In (c) and (d), the solid (resp. dotted) line shows the distribution for quality factor 10 for unmarked (resp. watermarked) inputs; circles (resp. crosses) show the distribution for quality factor 1 for unmarked (resp. watermarked) inputs.
[Figure 10 plots: $P_M$ vs. $P_{FA}$ on logarithmic axes from $10^{-4}$ to $10^{0}$: (a) "ROC, rotation attack"; (b) "ROC, cropping attack"; (c) "ROC, StirMark Random Bending Attack".]

Figure 10: Receiver operating characteristic (ROC) curves: (a) rotation by 1 degree using blind (solid) and semi-blind (dash-dotted) detectors, and rotation by 2 degrees using blind (dashed) and semi-blind (dotted) detectors; (b) cropping 5 rows from each dimension using blind (solid) and semi-blind (dash-dotted) detectors, and cropping 10 rows from each dimension using blind (dashed) and semi-blind (dotted) detectors; (c) StirMark random bending attack for blind (solid) and semi-blind (dashed) detectors.
We also tested the variant of our algorithm that uses scaling-invariant rational statistics (explained in Sec. 4.6) against the aforementioned attacks. We observed that its performance is comparable to that of the mark-embedding scheme using linear statistics.
8. CONCLUDING REMARKS
We proposed an image watermarking system in which the information is embedded via quantization of pseudorandomly derived robust semi-global image features. These features consist of pseudorandomly computed linear (or rational) statistics of pseudorandomly chosen regions. The corresponding WM sequence is computed by solving a nontrivial optimization problem whose parameters are known to the encoder and decoder but unknown to the attacker. We performed extensive tests of our algorithms using images from a database of approximately 4000 images and thousands of randomly chosen keys under various attacks. As a result, we experimentally demonstrated robustness not only against conventional perceptual-quality-preserving attacks (such as compression, AWGN, rotation, cropping, etc.) but also against a malicious estimation attack that is specifically derived for our algorithm. Our scheme is generic and can be applied to audio signals and fingerprinting applications as well; those results will be presented elsewhere. We stress the significance of randomization in mark-embedding problems for adversarial applications. Our general approach advocates using a PR signal representation that is both secure from an adversarial point of view (e.g., hard to invert without knowledge of the secret key, minimizing information leakage, etc.) and approximately invariant under perceptually acceptable attacks. Consequently, we wish to draw the reader's attention to the necessity of robustness not only against generic attacks but also against algorithm-specific attacks, which are often ignored in the literature. Our approach in this paper is a first step towards deriving secure and robust image statistics and using them for mark embedding. Our future work includes finding better signal-representation domains from both security and robustness aspects and using them for mark-embedding purposes. In addition, we plan to design stronger algorithm-specific attacks and perform cryptanalysis of the proposed algorithms.
Eventually, we would like to formally incorporate resource-bounded adversarial models into the mark-embedding problem.
9. ACKNOWLEDGMENTS
The authors wish to thank Mustafa Kesal, S. Serdar Kozat, Pierre Moulin, Yacov Yacobi and the anonymous reviewers for their helpful comments.
10. REFERENCES
[1] M. K. Mıh¸cak, R. Venkatesan, and M. Kesal, “Watermarking via optimization algorithms for quantizing randomized statistics of image regions,” in Proc. 40th Annual Allerton Conf. Comm., Control and Computing, Monticello, Illinois, October 2002. [2] P. Moulin and J. A. O’Sullivan, “Information-theoretic analysis of information hiding,” IEEE Trans. Inform. Theory, vol. 39, pp. 563-593, March 2003.
[3] A. S. Cohen and A. Lapidoth, “The Gaussian watermarking game,” IEEE Trans. Inform. Theory (Special Issue on Shannon Theory), vol. 48, pp. 1639-1667, June 2002.
[4] A. Somekh-Baruch and N. Merhav, “On the capacity game of public watermarking systems,” IEEE Trans. Inform. Theory, vol. 50, pp. 511-524, March 2004.
[5] B. Chen and G. W. Wornell, “Quantization index modulation: a class of provably good methods for digital watermarking and information embedding,” IEEE Trans. Inform. Theory, vol. 47, pp. 1423-1443, May 2001.
[6] R. Zamir, S. Shamai (Shitz), and U. Erez, “Nested linear/lattice codes for structured multiterminal binning,” IEEE Trans. Inform. Theory, vol. 48, pp. 1250-1276, June 2002.
[7] J. Eggers, “Information embedding and digital watermarking as communication with side information,” Ph.D. Thesis, Friedrich-Alexander-Universität Erlangen-Nürnberg, 2001.
[8] B. Chen and G. W. Wornell, “Achievable performance of digital watermarking systems,” in Proc. ICMCS, Florence, Italy, June 1999.
[9] T. Liu, R. Venkatesan, and M. K. Mıhçak, “Scale-invariant image watermarking via optimization algorithms for quantizing randomized statistics,” in Proc. ACM Multimedia and Security Workshop, Magdeburg, Germany, September 2004.
[10] M. Kucukgoz, O. Harmanci, M. K. Mıhçak, and R. Venkatesan, “Robust video watermarking via optimization algorithm for quantization of pseudo-random semi-global statistics,” in Proc. SPIE, San Jose, CA, January 2005.
[11] D. Kirovski and H. S. Malvar, “Spread-spectrum watermarking of audio signals,” IEEE Trans. Signal Processing, vol. 51, pp. 1020-1033, April 2003.
[12] S. Pereira and T. Pun, “Fast robust template matching for affine resistant watermarks,” Lecture Notes in Computer Science: Third International Workshop on Information Hiding, Springer, vol. 1768, pp. 199-210, 1999.
[13] F. Deguillaume, S. Voloshynovskiy, and T. Pun, “Method for the estimation and recovering from general affine transforms,” in Proc. SPIE, San Jose, CA, January 2002.
[14] M. K. Mıhçak, R. Venkatesan, and M. Kesal, “Cryptanalysis of discrete-sequence spread spectrum watermarks,” in Proc. Int. Inform. Hiding Workshop, Noordwijkerhout, The Netherlands, October 2002.
[15] M. K. Mıhçak and R. Venkatesan, “Blind image watermarking via derivation and quantization of robust semi-global statistics,” in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, Orlando, FL, May 2002.
[16] R. Venkatesan, S. M. Koon, M. H. Jakubowski, and P. Moulin, “Robust image hashing,” in Proc. IEEE Int. Conf. Image Processing, Vancouver, Canada, September 2000.
[17] F. A. P. Petitcolas and M. G. Kuhn, StirMark software, available from http://www.cl.cam.ac.uk/~fapp2/watermarking/image_watermarking/stirmark/.
[18] A. J. Menezes, P. C. van Oorschot, and S. A. Vanstone, Handbook of Applied Cryptography, Boca Raton: CRC Press, 1997.
[19] M. K. Mıhçak and P. Moulin, “Information embedding codes matched to locally stationary Gaussian image models,” in Proc. IEEE Int. Conf. on Image Processing, Rochester, NY, September 2002.
[20] T. Liu and P. Moulin, “Error exponents for one-bit watermarking,” in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, Hong Kong, April 2003.
[21] R. A. Horn and C. R. Johnson, Matrix Analysis, Cambridge: Cambridge University Press, 1985.
[22] H. V. Poor, An Introduction to Signal Detection and Estimation, 2nd ed., New York: Springer-Verlag, 1994.
[23] A. Gersho and R. M. Gray, Vector Quantization and Signal Compression, New York: Kluwer Academic Press, 1992.
[24] R. M. Gray and D. L. Neuhoff, “Quantization,” IEEE Trans. Inform. Theory, vol. 44, pp. 2325-2383, October 1998.
[25] S. Pereira, S. Voloshynovskiy, M. Madueño, S. Marchand-Maillet, and T. Pun, Checkmark software, available from http://watermarking.unige.ch/Checkmark/.
[26] S. Voloshynovskiy, S. Pereira, T. Pun, J. J. Eggers, and J. K. Su, “Attacks on digital watermarks: classification, estimation-based attacks and benchmarks,” IEEE Communications Magazine, vol. 39, pp. 118-127, August 2001.
[27] J.-P. M. G. Linnartz and M. van Dijk, “Analysis of the sensitivity attack against electronic watermarks in images,” in Lecture Notes in Computer Science, vol. 1525, pp. 258-272, 1998.
[28] M. Holliman and N. Memon, “Counterfeiting attacks on oblivious block-wise independent invisible watermarking schemes,” IEEE Trans. Image Processing, vol. 9, pp. 432-441, March 2000.
[29] M. Kutter, S. Voloshynovskiy, and A. Herrigel, “The watermark copy attack,” in Proc. Electronic Imaging, San Jose, CA, January 2000.
[30] D. Kirovski and F. A. P. Petitcolas, “Blind pattern matching attack on watermarking systems,” IEEE Trans. Signal Processing, vol. 51, pp. 1045-1053, April 2003.
[31] S. LoPresto, K. Ramchandran, and M. T. Orchard, “Image coding based on mixture modeling of wavelet coefficients and a fast estimation-quantization framework,” in Proc. Data Compression Conference, Snowbird, UT, 1997.
[32] M. K. Mıhçak, I. Kozintsev, K. Ramchandran, and P. Moulin, “Low complexity image denoising based on statistical modeling of wavelet coefficients,” IEEE Signal Processing Letters, vol. 6, pp. 300-303, December 1999.