Watermarking via Optimization Algorithms for Quantizing Randomized Statistics of Image Regions M. Kıvan¸c Mıh¸cak
Ramarathnam Venkatesan
Microsoft Research
[email protected]
Microsoft Research
[email protected]
Mustafa Kesal∗ University of Illinois, Urbana-Champaign
[email protected] Abstract We introduce a novel approach for blind signal watermarking and apply it to images. We derive randomized robust semi-global features in a suitable transform domain (wavelets in case of images) and quantize them in order to embed the watermark. Quantization is carried out by adding an embedding sequence to the host; this sequence is computed by solving an optimization problem whose parameters are known to the information hider, but unknown to the attacker. We experimentally identify some conditions (our randomizations are aimed at achieving them) satisfied by our parameters, which formally and experimentally imply robustness of our algorithm against malicious optimal estimation attacks. We also tested its robustness against many generic (i.e. non-malicious benchmark attacks ) attacks.
1
Introduction
Watermarking (wm) aims to embed a message (i.e., “watermark”) reliably in host data, where the message can be later detected and used by a decoder for identification or copyright information purposes. The embedding is done by an encoder using some secret key, which results in an output that is required to be perceptually (approximately) the same as the original. The watermarked image may then undergo many possible changes by users and attackers: unintentional modifications and malicious attacks that particularly aim to disable the detection of the watermark. Ideally, the system must resist modifications and attacks as long as they result in images that are of perceptually the same quality. We distinguish here between data embedding (for communications purposes) and watermarking (for security purposes) problems, both of which fall into the category of information hiding. The latter involves a malicious attacker, and our approach emphasizes this aspect and presumes the attacker to be resource-bounded, and our approach may be suitable data embedding by considering a very mild adversary. There are fundamental difficulties in modelling such attacks and providing a formal security analysis of a given algorithm. Firstly, we do not have a reasonable model of perceptually non-distortive attacks. The standard models measuring some Euclideanlike norm in some canonically associated vector space falls short when one considers ∗
Work was performed while M. Kesal was with Microsoft Research.
malicious attacks. Second, the security arguments must be carried out in a model that explicitly recognizes that the attacker is resource bounded and this is the case we are interested in.For a discussion related to these issues we refer to [?] As there, our blind watermarking method has two main generic steps: (1) Given an image, derivation of robust image characteristics (a vector) that are not local (i.e. in the domain where the watermarking takes place) and are in fact semi-global in nature. (2) Quantization of these characteristics for information hiding. It is not obvious that why such characteristics may exist or how to derive them. The motivation for the semi-global nature of the characteristics is that experiments indicate that simple adversarial attacks can change local characteristics quite dramatically. Randomness is used to implicitly hide the characteristics or the metric the algorithm is using, by making it not easy for an adversary (without the key) to compute them. Finally the characteristics are so chosen such that they will stay approximately invariant if one introduces non-noticeable visual distortions. Note that by hiding information in such statistics, we are implicitly imposing or assuming a perceptual model on images. The robustness of similar statistics has previously been studied for the image hashing problem [?], where no information embedding was considered. In our scheme, we use random linear statistics of random regions (i.e., these statistics are given by pseudo-randomly weighted linear combinations of data in randomly chosen regions, where the weights are chosen randomly). These regions can potentially overlap and need not be connected in general. In [?], we hide information in the statistics of nonoverlapping regions. Confining to non-overlapping regions not only brings limitations to the rate of the watermark to be embedded, but also in case of strong watermarks blocking artifacts would be visible due to non-overlapping nature of randomly chosen regions. The proposed scheme in this paper gets rid of these limitations by designing an embedding sequence for random overlapping regions using minimum-norm criterion. The existence is guaranteed under some mild assumptions. As a natural byproduct of our quantizationbased approach, we can perform decoding and detection jointly at the receiver side (unlike spread-spectrum based schemes). Randomized steps of our algorithm are crucial in terms of security and constitute an essential contribution of this work (for other works using explicit randomization as a defense see [?] and references in [?]). One can view this as a way of minimizing the chances that an adversary can foil certain security related conditions specified later (which imply robustness against estimation attacks), but it is only part of the picture. We need to formalize this last remark and the quantizations studied in this work need to be extended to higher dimensions. Note that, when randomization is mentioned in this paper, it should be understood that the randomization is carried out by means of a random number generator whose seed is the secret key, where this key is known to both the encoder and the decoder, however unknown to the attacker.We experimentally show the robustness of our algorithm against a particular type of malicious estimation attacks, where the attacker knows the unmarked host data statistics and partial information about the randomized steps of our algorithm. However, we stress that robustness against such a particular type of malicious attack (where the attacker has clearly the upper hand due to the information he has), by no means, implies absolute security for the proposed approach. Furthermore, we present successful experimental results relating to robustness against bench mark attacks that include a wide range JPEG compression and independent additive white Gaussian noise (AWGN) addition type attacks.
2
Watermarking algorithm
Our algorithm will use a secret K and we shall assume that a cryptographically secure pseudo-random generator is used to suitably randomize our steps below. We write B = {0, 1} representing the domain of each WM element. We represent the signals as vectors, which does not describe the geometry of a 2-D picture, but is sufficient to describe our algorithm. In the notation below, subscripts denote individual elements of vectors, unless otherwise stated. s ∈RN : The host signal. w ∈B M : Watermark to be embedded, an encoded version of some given message. Ri : A pseudo-randomly chosen region denoting a subset of {1, 2, . . . N }, i ≤ M . ci ∈RN :Pseudo-random host-dependent weights, one ci for each bit wi of w, 1 ≤ i ≤ M . 4 P m ∈ RM : Random linear statistics of host, mi = j∈Ri cij sj , 1 ≤ i ≤ M . Define f (c1 , ..., cM , s) = m, for a fixed choice of regions. C: A discrete set of pseudo-randomly specified points in RM (codebook). QK : B M × RM → C: A keyed quantization algorithm. b Given these definitions the watermarking We drop K and write Q(w, m) = m. algorithm is a map (dependent on K) that produces the watermarked signal x ∈RN : s 7→ x, so that b f (c1 , . . . , cM , x) = Q(w, m) =m. This will have many solutions but we pick our x to minimize certain distortion metric described below. In general distortion metrics are not helpful (and can be a main modeling weak-point) but our metric is randomized, and as such is not efficiently computable by an adversary who does not know K. Note that the above transformation f (·) is non-linear if ci are host-dependent in a non-linear fashion. In this paper we mainly consider ci to be independent of the input, and our limited experiments produced similar results when the dependencies were nonlinear (which is important from security view point, and not explored here). This work extends [?] in a crucial way, where the regions are not overlapping. The motivation is that of security concerns (as it enables increasing the dimensionality of the problem) as well as to minimize the perceptual distortions introduced by the watermarking procedure. The non-trivial problem is that the watermarking criteria which is specified globally can pose contradictory conditions on data in the intersection of regions Ri and Rj , where weighted average in Ri may have to be increased while that of Rj has to be decreased.The point in using an optimization algorithm is to resolve this conflict, and each choice of such algorithm’s effect must be formally cryptanalyzed, and this is what this work does in part. Quantization has been used in [?, ?, ?] but our use (and motivation) is fundamentally different, as we mentioned earlier. In this paper, we make our quantizations local (i.e. scalar) and study the effects of attacks on it. For our approach, it is essential that the quantization occurs in large enough dimensions to exploit the combinatorial hardness of some underlying problems, and this is being addressed in an ongoing work [?]. n We use scalar quantizers Q0 and Q1 , which are derived from the codebooks Ci = (−1)i ∆/4 + j∆ + kij | j ∈ Z}, i = 0, 1. Here ∆ is the step size of a base quantizer and {kij } are random quantities and chosen independently uniformly from [−γ∆/2, γ∆/2].
Here γ∈ (0, 1)is an input parameter to ensure sufficient randomization of the codebook design while maintaining desired amount of robustness. In order to enhance robustness properties, we use error-correction codes (ECC) at the encoder to produce w from a given message. For simplicity, we use block repetition ˆ 0 (resp. m ˆ 1 ) be the quantized version of m using Q0 (resp. codes at the moment. Let m ˆ where Q1 ). Then watermark embedding is carried out by finding m, ½ m ˆ 0i if wi = 0, 1 ≤ i ≤ M. m ˆi = m ˆ 1i if wi = 1, More explicitly the above criteria specifies the following task: given s, {Ri }, {ci }, m, ˆ to find x, such that and m, X (xj cij ) = m ˆ i , 1 ≤ i ≤ M. (2.1) j∈Ri
We provide a solution to this problem using the minimum-norm criterion [?]; this is discussed in detail in Sec. ??. At the receiver, the initial task is to find the random statistics of the input data given ˜ be the random statistics of y, i.e., the secret P key. Let y be the receiver input and m ˜ we find the most likely codewords m ˜0 m ˜ i = j∈Ri (xj cij ), 1 ≤ i ≤ M . Then, given m, ˜ 1 from C0 and C1 respectively via nearest neighbor decoding: and m 4
˜ j − t||, m ˜ ij = argmint∈Ci ||m
0 ≤ i ≤ 1,
1 ≤ j ≤ M.
The next step is soft decoding, where thresholding is applied to the log-likelihood ratios ˜ 0 or m ˜ 1 ). Consider (where AWGN attack channel is assumed on a possible codeword, m a block repetition code of rate 1/L applied at the encoder. In that case, the first decoded P L
(m ˜ j −m ˜ 0j )2
bit at the receiver is 0 if log Pj=1 L (m ˜ j=1
˜ 1j ) j −m
2
< 0; 1 otherwise. Similar procedure is applied
for the next decoded bits. Note that this decoding technique is similar to the one in [?] and a simpler variant of the one in [?]. Thus, we refer the reader to [?, ?] for further details on decoding. In this paper, we consider only the “decoding” problem (i.e., receiver finds an estimate of m with the goal of no-error under attacks). However, it is possible to use this framework for “verification” purposes as well (where detection is done at the receiver and the output is a binary decision on the existence of a possibly embedded wm). If this is the case, then the decision on the existence of the watermark is carried out by thresholding the Hamming distance between the decoded vector and the embedded watermark vector (similar to the method used in [?]).
3
Design of Embedding Sequences
We now present two methods to design x s.t. (??) is satisfied. First, we design an “additive” embedding sequence such that ||x − s|| is minimized and secondly we design “multiplicative” embedding sequence to minimize the distance between x./s (where ./ denotes element-wise division, si 6= 0, 1 ≤ i ≤ N ) and unity vector is minimized. Now, consider the following definitions: 4 ˆ − d ∈ RM : d = m ½ m. cij if j ∈ Ri 4 T ∈RM ×N :Tij = 0 else 1∈ RN : A vector with all entries equal to one.
ˆ Note that T is chosen such that Ts = m and our goal is to find x such that Tx = m. Furthermore, we assume that M < N and T is full-rank. Note that this is a mild assumption when M ¿ N which can be satisfied by keeping the watermarking rate M N small enough.
3.1
Design of Additive Embedding Sequences 4
In this section, we use the definition of n = x − s. Lemma 3.1 The solution to ˆ s.t. Tx = m.
min ||x − s|| x
(3.2)
¡ ¢−1 ˆ − m) . is given by x = s + TT TTT (m Proof : The problem (??) can be rewritten as minn ||n|| s.t. Tn = d, whose solu¡ ¢−1 tion is given by ([?]) nM N = TT TTT d. The result follows. Remark 1: Lemma ?? provides the solution that is optimal in the sense of Euclidean norm. It is, of course, questionable if Euclidean norm is a good measure of ¡ perceptual ¢−1 T ˆ − m) quality relative to perceptual distortion issues. For instance, the solution T TTT (m may have some spikes, which would degrade the perceptual quality. Our experiments suggest such artifacts become more likely as ∆ increases. For security concerns, we recall that our metric is randomized and not easily computable without the key. If the solution given by Lemma ?? is visually annoying, it is possible to use some heuristics. Such heuristics can be viewed as analogous to having regularization terms in inverse problems. In that case, the formulation is still useful, provided that the heuristics can be expressed as extra linear constraints in (??). For instance, we may be interested in finding the opˆ ⇔ Tn = d timal additive embedding sequence that both satisfies the condition Tx = m and is approximately band-limited (to provide smoothness). Hence, one possibility to analytically express the smoothness constraint could be to impose Dn = 0 where D is a submatrix of the conventional dct matrix, which ensures that n is approximately band-limited to a low frequency range. Then, (??) can be rewritten as min ||n|| n
4
s.t. T0 n = d0 ,
(3.3)
4
where T0 = [ T ; D ] and d0 = [ d ; 0 ]; here 0 is a vector of appropriate size that consists of 0’s everywhere. Therefore, the solution to (??) is given by applying Lemma ??, where T and d are replaced by T0 and d0 respectively. Remark 2: Our non-singularity assumption on TTT , while it is valid for most practical situations, imposes constraints on the choice of random regions and the random weights. In some extreme cases, where the average region size and the number of such regions is large, it may happen that even though T is still full-rank, the condition number of TTT is large, which may lead to numeric problems. Furthermore, if the condition number is large, then there is a potential possibility of information leakage to the attacker, due to the unbalanced nature of computing the statistics. Hence, parameters must be chosen to prevent these cases as our experiments suggest (possibly involving a search in secret key space).
3.2
Design of Multiplicative Embedding Sequences
Now we embed via a multiplicative embedding sequence n, i.e., xi = si ni , 1 ≤ i ≤ N and our measure of quality is the distance between n and 1, i.e., solve the optimization problem: ˆ min ||n − 1|| s.t. Tx = m. (3.4) x
Lemma 3.2 The solution to (??) is given by xi = n ˆ i si ,
1 ≤ i ≤ N where
¡ ¢−1 ˆ = 1 + STT TS2 TT ˆ − m) , n (m and S =diag(s1 , ..., sN ). Proof : The constraint in (??) can be rewritten as ˆ − m = d = T (x − s) = TS (n − 1) , m where S is as defined in the statement of Lemma ??. Thus the problem (??) is equivalent to minn ||n − 1|| s.t. T (n − 1) = d. Applying the standard minimum-norm result, we h i−1 ¡ ¢−1 get (with a diagonal S), (n − 1)M N = (TS)T TS (TS)T d = STT TS2 TT d. Remark 1: The difference between the multiplicative and the additive embedding schemes is in the visual quality while they are comparable in robustness aspects, as our tests suggest. The distribution of the resulting watermarking distortion is quite different since they use different metrics. The superior method often depends on the input image. Remark 2: We can use techniques similar to the ones mentioned in Remark 1 of the last section to achieve smoothing and avoiding visual artifacts.
4
Robustness Against Estimation Attacks
Attacks on watermarking schemes can be either generic (i.e., benchmark type [?, ?]) or algorithm-specific that take into account the structure of the targeted algorithm. Such attacks would be cryptanalytic in spirit. See [?], for a latter type of an attack on an algorithm that is quite robust against generic attacks. There, it is suggested that the use of stochastic models for attackers is realistic while a similar assumption on attacked signal by a detector can be inappropriate. In this section, we derive an optimal estimation attack on the additive embedding scheme using a particular stochastic model on the host data and the embedding sequence. Moreover, we allow some host data statistics and some partial information about T be known by the attacker. Consequently, the attacker applies minimum mean-squared error (MMSE) estimation on the watermarked data as an attack. We consider additive embedding sequences in this section. We now list a few assumptions: 1. For each i ≤ M, di is uniform in [−∆/2, ∆/2] . 2. {di } are uncorrelated, i.e., E [di dj ] =
∆2 δ . 12 ij
3. d and s are uncorrelated i.e., E(di sj ) = E(di )E(sj ).
Estimated autocorrelation matrix of quantization errors
1.2
Rectangle index
1
p.d.f.
0.8
0.6
0.4
0.2
0 −0.5
−0.4
−0.3
−0.2
−0.1 0 0.1 normalized quantization error
0.2
0.3
0.4
0.5
Rectangle index
(a)
(b)
Figure 1: (a) Solid : Histogram of {di /∆} for image Lena; dashed : probability density function £ T ¤ of uniform distribution in the corresponding region. (b) Empirical estimate of E dd over 80 different images. 4. n is Gaussian. First, we discuss only the experimental aspects of verifying these conditions. An inspection of the histogram of {di /∆} in Fig. ?? for various values of ∆ of interest verifies assumption (1). Next, we repeated the same experiment for 80 different images fixing the secret key (i.e., same T and same codebook is used in all these images). This yields empirical estimates of E [di dj ] and thus an estimate of the autocorrelation £ ¤ 4 matrix Rd = E ddT which is shown in Fig. ?? (b). The experimental estimate is nearly diagonal, which validates (2). In these experiments we worked in the 3rd dc subband of the input image; this is as in our algorithm in section Sec. ??. As M and the density of non-zero entries of T increases, assumption (4) becomes more valid by central limit theorem. 2 By (2), Rd = ∆12 IM , where IM is the identity matrix. Now, using the full rank assumption on T, we let singular value decomposition (SVD) of T be T = UΣT V where U ∈ RM ×M is unitary, ΣT ∈ RM ×M is a non-singular diagonal matrix and V T ∈ RM ×N with orthonormal columns. From section 3.1 , n = TT (TTT )−1 d = VΣ−1 T U d. 4
2
T From this we get Rn = ∆12 VΣ−2 T V . Our current attack analysis uses Gaussian models on the host data s; however our algorithmic construction is motivated by the desire to derive similar results while avoiding or minimizing such stochastic assumptions, but rather by the randomization steps that result from random secret key. This issue shall be explored in our future research. Following [?], we now model s as an independent, but not necessarily identically distributed 0-mean Gaussian vector. Furthermore, we assume that the variance field of s is smoothly varying, based on which we use locally approximately i.i.d. assumption on s. Using this approximation, we estimate the underlying variance field; see [?] for further details. Let Rs denote the autocovariance matrix of s. We now make the following assumption: • The attacker knows Rs and Rn
2
T Now, since Rn = ∆12 VΣ−2 T V , the attacker gets some information about T for free. As our experiments reveal. even then the attacker does not succeed: Indeed, in this Gaussian setup, the attacker can estimate s, given x = s + n optimally using Wiener filtering: ˆs = Rs (Rs + Rn )−1 x.
Due to the Gaussian nature of the setup, this estimate coincides with both mmse and map estimates of s given x [?]. Consequently, we apply approximate Wiener filtering attack on x; the estimation method of Rs given s in the locally Gaussian model is explained in detail in [?], estimation of Rs given x is however non-trivial. Nevertheless, we allow the adversary to know the estimate of Rs and it only strengthens our result. In all of our experiments, we observed that all the embedded bits are recovered without an error under Wiener filtering attack.
5
Application to Image Watermarking We used many standard test images of size 512 × 512 as inputs to our algorithm applied to dc subband coefficients for a 3-level discrete Wavelet transform dwt with Daubechies length-8 wavelets. The transformation matrix T is chosen such that m represents weighted averages of randomly-chosen rectangles in this subband. Weights are chosen as i.i.d. realizations of a Gaussian distribution, whose mean is the reciprocal of the area of the particular rectangle and variance is an algorithm parameter (we also tested when these are input dependent). Typically, we chose around 200 rectangles to avoid singularity of T; their locations and sizes are chosen randomly from uniform distri1 butions such that the expected area of each rectangle is 16 -th of the area of the subband. The exact parameters of these distributions are user-dependent algorithm parameters. The base quantization step size ∆ is chosen in an image-dependent manner, such that the distortion introduced at the encoder is just noticeable. We applied both additive and multiplicative designs for embedding sequences. We used rate 15 block repetition codes at the encoder where the repetitions are scattered randomly. At the receiver, we first resized the input image to size 512 × 512 by bicubic interpolation. Then, we used nearest-neighbor decoding followed by soft decoding based on log-likelihood ratios to perform decoding. In general, it turns out that soft decoding produces better results than hard decoding. We applied several attacks in our tests. In all our experiments, we observed that all embedded bits are recovered correctly (for both additive and multiplicative designs of embedding sequences) in case of Wiener filtering attack (explained in Sec. ??), JPEG compression attacks (with quality factor as low as 10%), AWGN attacks (with noise standard deviation as high as 40) and several smoothing attacks with linear shift-invariant filtering. Furthermore, we can withstand (i.e., recover all the bits correctly) geometric attacks mentioned in [?] when we use lower-rate codes (such as rate 1/40) as long as these attacks do not create severe perceptual distortion; for instance we are robust against all scaling attacks, rotation attacks up to ±1 degree and cropping attacks up to 2%. Watermarked image Lena and some attack examples are shown in Fig. ??.
6
Discussion We proposed a novel signal watermarking scheme, where message bits are embedded in randomly-chosen semi-global statistics of the input via quantization. The embedding sequence is found by solving an optimization problem, whose parameters are known to the encoder and decoder, but unknown to the attacker. Our approach stresses randomization
(a)
(b)
(c)
(d)
Figure 2: (a) Watermarked Lena with ∆ = 100. Attacked versions of watermarked Lena are shown in (b) (where Wiener filtering is applied as explained in Sec. ??), (c) (where JPEG compression with quality factor 10% is applied) and (d) (where AWGN attack is applied with noise standard deviation is 40). In all the attack examples, we recover all the embedded bits.
in watermarking applications. In this paper, we applied this approach to images and experimentally demonstrated robustness against not only non-malicious JPEG compression and AWGN attacks, but also malicious estimation attacks. Our scheme is generic and can be applied to audio signals as well; those results shall be presented elsewhere. Our future research includes design of better codes and better transforms (possibly randomized) to enhance robustness. Furthermore, we would like to apply this approach in the fingerprinting problem as an embedding layer. Acknowledgments : We thank Mariusz Jakubowski (Microsoft Research) and Pierre Moulin (University of Illinois) for useful discussions.
References [1] A. J. Menezes, P. C. van Oorschot and S. A. Vanstone, Handbook of applied cryptog-
raphy, CRC Press, Boca Raton, FL, 1997. [2] R. Venkatesan, S.-M. Koon, M. Jakubowski and P. Moulin, “Robust image hashing,”
Proc. IEEE ICIP 2000, Vancouver, Canada, September 2000. [3] M. K. Mıh¸cak and R. Venkatesan, “Blind Image Watermarking Via Derivation and
Quantization of Robust Semi–Global Statistics,” Proc. IEEE ICASSP 2002, Orlando, FL, May 2002. [4] R. Venkatesan and M. H. Jakubowski, “Image watermarking with better resilience,”
Proc. IEEE ICIP 2000, Vancouver, Canada, September 2000. [5] B. Chen and G. W. Wornell, “Quantization index modulation: A class of provably
good methods for digital watermarking and information embedding,” IEEE Trans. on Information Theory, vol. 47, no. 4, pp. 1423-1443, May 2001. [6] M. K. Mıh¸cak and P. Moulin, “Information Embedding Codes Matched to Locally
Stationary Gaussian Image Models,” Proc. IEEE ICIP 2002, Rochester, NY, Sep. 2002. [7] M. Kesal, M. K. Mıh¸cak, R. K¨ otter and P. Moulin, “Iteratively Decodable Codes
for Watermarking Applications,” Proc. 2nd Symposium on Turbo Codes and Their Applications, Brest, France, Sep. 2000. [8] K. Jain, M. K. Mıh¸cak and R. Venkatesan, “Lattice Rounding Approach to Water-
marking and Hashing,” preprint, 2002. [9] R. A. Horn and C. R. Johnson, Matrix Analysis, Cambridge University Press, NY,
1985. [10] F. A. P. Petitcolas and M. G. Kuhn:
StirMark software, available from www.cl.cam.ac.uk/ ˜fapp2/watermarking/image watermarking/stirmark/.
[11] S. Pereira, S. Voloshynovskiy, M. Madue˜ no, S. Marchand-Maillet and T. Pun: Check-
mark software, available from watermarking.unige.ch/Checkmark/. [12] M. K. Mıh¸cak, R. Venkatesan and M. Kesal, “Cryptanalysis of Discrete-Sequence
Spread Spectrum Watermarks,” Proceedings of 5th Information Hiding Workshop, Holland, Oct. 2002. [13] H. V. Poor, An Introduction to Signal Detection and Estimaton, 2nd Ed., Springer-
Verlag, 1994.