Rotation, Scaling, and Translation Robust Image Watermarking Using Gabor Kernels

Hiuk Jae Shim and Byeungwoo Jeon
School of ECE, Sungkyunkwan Univ., Korea

ABSTRACT

In this paper, we propose an RST-robust watermarking algorithm which exploits the orientation feature of a host image by using 2D Gabor kernels. From the viewpoint of watermark detection, host images are usually regarded as noise. However, since geometric manipulations affect the watermark and the host image simultaneously, evaluating the host image can help characterize the distortion. To make the most of this property, we first find the orientation of the host image hierarchically with 2D Gabor kernels and insert a modified reference pattern, aligned to the estimated orientation, in a selected transform domain. Since the pattern is generated in a repetitive manner according to the orientation, in the detection step we can simply project the signal along the direction of the image orientation and average the projected values to obtain a 1-D projection average pattern. Finally, correlating this 1-D projection average pattern with the watermark reveals periodic peaks. Experimental results against geometric attacks, including aspect ratio changes and rotation, are analyzed.

Keywords: Gabor kernels, projection average, rotation, scaling, translation
1. INTRODUCTION

Many watermarking schemes have been developed to obtain an adequate degree of robustness against a wide variety of distortions. One possible approach is to identify the nature of a distortion and measure its exact amount, so that the undistorted watermark signal can be restored by inverting the distortion. Another approach is to embed in a distortion-invariant domain that is independent of the expected distortions. Most watermarking algorithms pursue robustness along one of these two lines. Among the many kinds of attacks, watermark detection suffers most from geometric distortions such as rotation, scaling, translation (RST), and aspect ratio changes. Although this class of distortion neither degrades the image quality much nor obliterates the embedded watermark, it causes the detection process to fail. It is therefore a very simple yet quite effective class of attacks against watermark detection. To overcome this vulnerability, one can embed the watermark signal in an RST-invariant domain, which is a basic and theoretically sound way of handling geometric distortions. O'Ruanaidh and Pun [1] introduced a transform-invariant domain based on the log-polar map (LPM) and the Fourier-Mellin transform; however, it is difficult to implement. Pereira et al. [2, 3] used templates, the LPM, and the log-log map (LLM) to recover the watermark from geometrically distorted images; this scheme works with reasonable complexity and processing time, but a recent report by Herrigel et al. [5] on removing templates suggests that template-based methods are risky. Instead of auxiliary tools such as templates, Kutter [4] proposed using the autocorrelation function (ACF), with the watermark itself serving as the reference; since no extra signal is embedded, there is no additional degradation. This is a remarkable approach, but the exhaustive search required to find ACF peaks limits its use in real-time applications.
Further author information: Hiuk Jae Shim: E-mail: [email protected]; Byeungwoo Jeon: E-mail: [email protected]
One method [6] for reducing the complexity of the ACF approach is to insert a pattern periodically over the whole image. Since the magnitude spectrum of a periodic pattern forms a discrete grid that can serve as reference points, recovery from geometric distortions becomes easier. The periodicity of the embedded block reduces complexity considerably, but it also introduces a new problem: because the reference pattern is repeated, its period can be estimated without any knowledge of the embedded pattern and then used to remove it. Despite this risk, the complexity reduction offered by periodic embedding led us to adopt a periodically generated pattern in this paper. The proposed method consists of two parts. The first part exploits image features: geometric manipulations distort the host image in the same way as the watermark, so analyzing the host image helps characterize the distortion. The first step in using image features is to estimate the orientation of the image, which our method accomplishes with Gabor kernels. The second part is the projection average (PA). Since an image is a two-dimensional signal, most approaches to recovering from geometric distortions perform a two-dimensional search, which is generally costly. In this paper, the PA method turns the 2-D search into a 1-D search and thereby reduces the complexity.
2. WATERMARK EMBEDDING AND EXTRACTION

The previous section pointed out a possible attack on periodic pattern embedding; nevertheless, periodic embedding retains the attractive merit of an increased detection probability. In other words, robustness against other attacks such as compression is enhanced. Furthermore, it does not require precise localization of the embedded signal in up-sampled versions of the image. In this section, we describe how features of the host image are used so that the embedded pattern is generated both repetitively and adaptively to the host image.

2.1 ESTIMATING ORIENTATION OF HOST IMAGE BY USING GABOR KERNELS

Geometric distortion has little effect on the quality of the host image and the embedded signal, and its characteristics differ somewhat from those of other attacks. Because of this, the damage to the embedded signal is relatively smaller than that caused by other attacks; moreover, the distortion affects the host and embedded signals in the same way. Therefore, analysis of the distorted image can provide critical information about the degree of geometric distortion. In addition, the pattern is embedded according to features of the image, which can be seen as an image-adaptive way of embedding. There are many features that represent the nature of an image, such as lines, edges, and colors. One of the most interesting among them is the orientation of an image. Since every image has its own orientation (for instance, an image containing many buildings has much more horizontal frequency content than vertical, or vice versa), the main energy of an image tends to spread in a particular direction. Gabor kernels have been widely used in object recognition applications because filtering images with Gabor functions helps estimate directional blob or directional edge components [7-10]. In our method, we evaluate the directional feature of a host image with 2D Gabor kernels, whose energy is concentrated in specific directions. A 2D Gabor function is the product of an elliptical Gaussian and a complex exponential representing harmonic modulation. A general form of the 2D Gabor function [7] is given by
g(x, y) = e^{-[(x - x_0)^2 / 2\sigma_x^2 + (y - y_0)^2 / 2\sigma_y^2]} \, e^{-j \frac{2\pi}{\lambda} [(x - x_0)\cos(\theta) + (y - y_0)\sin(\theta)]}    (1)
where (x_0, y_0) is the center of the elliptical Gaussian, \sigma_x^2 and \sigma_y^2 are the variances of the Gaussian in the horizontal and vertical directions, and \theta and \lambda are the orientation and wavelength of the harmonic modulation, respectively. Assuming a unity aspect ratio (\sigma_y / \sigma_x = 1), the function can be separated into a real Gabor function (RGF) and an imaginary Gabor function (IGF) as follows.
RGF(x, y) = \cos\left( \frac{2\pi}{\lambda} \{ x\cos(\theta) + y\sin(\theta) \} \right) e^{-(x^2 + y^2) / 2\sigma^2}

IGF(x, y) = \sin\left( \frac{2\pi}{\lambda} \{ x\cos(\theta) + y\sin(\theta) \} \right) e^{-(x^2 + y^2) / 2\sigma^2}    (2)
These two functions have different features. RGF has one large positive central lobe with two smaller negative lobes on either side, while IGF has one large positive lobe and one negative lobe. These shapes are shown in Figure 1 (a) and (b); the differences come mainly from the cosine and sine factors in their definitions. RGF is therefore generally suited to detecting object height and width, while IGF is suited to detecting object edges.
Figure 1: (a) shape of RGF, (b) shape of IGF, (c) RGF (θ = 45°), (d) IGF (θ = 45°)

Since we are interested in Gabor kernels because they can indicate specific directions in images, we use them only as a directional estimator. Figure 1 (c) and (d) show Gabor kernels with specific directions; correlating them with any image of the same direction is therefore expected to yield a high correlation value.
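As an illustration of Eq. (2) and of how such kernels can serve as a directional estimator, the following sketch generates RGF/IGF kernels and computes a normalized correlation with a down-sampled image. It assumes NumPy; the sigma and wavelength defaults are illustrative choices (Section 3.1 reports the settings used in our experiments), and orientation_response() is a hypothetical helper rather than code from the paper.

```python
# A minimal sketch of Eq. (2); parameter defaults are illustrative assumptions.
import numpy as np

def gabor_kernels(size, sigma, wavelength, theta_deg):
    """Return (RGF, IGF) of shape (size, size) at orientation theta_deg, per Eq. (2)."""
    theta = np.deg2rad(theta_deg)
    half = size // 2
    y, x = np.mgrid[-half:size - half, -half:size - half]
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))   # isotropic Gaussian
    phase = (2.0 * np.pi / wavelength) * (x * np.cos(theta) + y * np.sin(theta))
    return np.cos(phase) * envelope, np.sin(phase) * envelope    # RGF, IGF

def orientation_response(image_small, theta_deg):
    """Normalized correlation between a down-sampled image and an RGF of the same size."""
    size = image_small.shape[0]
    rgf, _ = gabor_kernels(size, sigma=size / 4.0, wavelength=size / 2.0, theta_deg=theta_deg)
    img = image_small - image_small.mean()
    return float((img * rgf).sum() / (np.linalg.norm(img) * np.linalg.norm(rgf) + 1e-12))
```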
Figure 2: (a) original lena image, (b) image rotated by 15°, (c) correlation values
One example is given in Figure 2. The orientations of Figure 2 (a) and (b) are measured with Gabor kernels, and the results are shown in Figure 2 (c), which plots the normalized correlation between the Gabor kernel and the images as θ in (2) varies from 1° to 180°. The maximum correlation of the original and the rotated image occurs at about 48° and 63°, respectively, which illustrates the feasibility of estimating orientation with Gabor kernels. When correlating Gabor kernels with an image, the image and kernel must be the same size, and a small kernel clearly reduces the cost of computing the correlation. Since Gabor kernels are relatively small compared to the host image, we first down-sample the host image to the kernel size. An advantage of using the down-sampled image is that slight changes in detail are negligible to some extent.

To estimate the orientation of a host image, we need to locate the maximum correlation peak in Figure 2 (c), which in principle requires correlating the host image with Gabor kernels at every orientation from 1° to 180° in 1° steps. Despite the small image size, this is still computationally expensive, so we employ a hierarchical approach consisting of three steps. In the first step, the correlation is calculated every 20°, as shown in Figure 3, giving 9 correlation values. Among them, we select the angle with the largest correlation value together with whichever of its two adjacent angles has the larger correlation value; these two angles define a selected interval of 20°. In the second step, we divide the selected interval into sub-intervals of 5° and calculate the correlation for each, obtaining 4 values, and apply the same rule to find a candidate sub-interval of 5°. In the final step, the selected sub-interval is divided into 1° steps and only the angle with the largest correlation value is kept; this angle is taken as the orientation of the image. Since the correlation is computed 18 times in total, about one tenth of a full-range search, the computational saving is obvious.
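The following sketch illustrates the three-step coarse-to-fine search, assuming the orientation_response() helper above. For simplicity it re-scans a symmetric window around the current best angle at each level instead of the one-sided interval rule described in the text, so its evaluation count differs slightly from 18.

```python
# A simplified sketch of the coarse-to-fine orientation search (20 deg -> 5 deg -> 1 deg).
import numpy as np

def estimate_orientation(image_small):
    """Return the estimated host-image orientation in degrees."""
    def best_angle(angles):
        responses = [orientation_response(image_small, a % 180) for a in angles]
        return angles[int(np.argmax(responses))]

    a = best_angle(np.arange(0, 180, 20))            # step 1: 9 coarse candidates
    a = best_angle(np.arange(a - 20, a + 21, 5))     # step 2: 5-deg resolution near the coarse winner
    a = best_angle(np.arange(a - 5, a + 6, 1))       # step 3: 1-deg resolution
    return float(a % 180)
```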
Figure 3: Three-step search of image orientation
2.2 WATERMARK EMBEDDING

Once the orientation of the host image is estimated, an embedding pattern is generated from a user key and that orientation. For this purpose, we first generate a 1-D reference pattern W of length k determined by the user key; W consists of values -1 and +1. Let X be the transform that maps the host image I into a signal C in the selected embedding domain, so that C = X(I). When the transformed host image has size M x N, W is repeated along the horizontal length N, and the repeated sequence is denoted W_1D. When N is not an integer multiple of k, W is repeated J times, where J·k < N, and the remaining (N - J·k) positions are padded with zeros. Since the embedding domain is two-dimensional, W_1D is then repeated over the M rows. The final result is W_2D of size M x N, the same size as the transformed signal C. An example reference pattern is shown in Figure 4 (a); since the same 1-D pattern is repeated in each row, vertical lines appear, and for this reason we say it has an orientation of 90°.
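A minimal sketch of this construction is given below, assuming NumPy; mapping the user key to the +/-1 pattern through a seeded random generator is our own assumption.

```python
# A minimal sketch of building W_1D and W_2D from a key-generated pattern W.
import numpy as np

def make_w2d(user_key, k, M, N):
    """Return (W, W_2D): a length-k +/-1 pattern and its M x N tiling (90-deg orientation)."""
    rng = np.random.default_rng(user_key)
    w = rng.choice([-1, 1], size=k).astype(float)   # 1-D reference pattern W
    reps = N // k                                   # J full repetitions along a row
    w1d = np.zeros(N)
    w1d[:reps * k] = np.tile(w, reps)               # remaining N - J*k positions stay zero
    return w, np.tile(w1d, (M, 1))                  # same row repeated over M rows -> W_2D
```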
Figure 4: (a) W_2D (90° orientation), (b) W_2D (50° orientation)

In the actual embedding, we use a reference pattern that has the same orientation as the host image. If the host image has an orientation of 50°, for example, then the reference pattern with 50° orientation shown in Figure 4 (b) is used. Embedding then simply adds W_2D to C as described in (3), where s is a weighting factor and C' denotes the embedded signal.
C' = C + s \cdot W_{2D}    (3)
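The sketch below follows Eq. (3); producing the oriented pattern of Figure 4 (b) by rotating the 90° pattern with scipy.ndimage.rotate is our own assumption about how the aligned W_2D could be generated.

```python
# A minimal embedding sketch following Eq. (3).
import numpy as np
from scipy.ndimage import rotate

def embed(C, w2d, orientation_deg, s):
    """Return C' = C + s * W_2D with W_2D aligned to the estimated host orientation."""
    w_aligned = rotate(w2d, angle=orientation_deg - 90.0, reshape=False, order=1)
    return C + s * w_aligned
```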
2.3 WATERMARK EXTRACTION

The two-dimensional search generally required to estimate geometric distortions is one of the main sources of computational burden. If the 2-D search can be separated into two 1-D searches, or reduced to a single 1-D search, the complexity is substantially reduced. In this paper we use the so-called projection average (PA) approach, which makes this 1-D reduction possible. Simply put, the signal is projected onto an axis perpendicular to the orientation of the host image, and the projected values are averaged. The PA of the 2-D reference pattern W_2D with 90° orientation, as in Figure 4 (a), is given by
PA_{W_{2D}} = \frac{1}{M} \sum_{row} W_{2D}    (4)
Since the projection and averaging is performed along each column (the 90° orientation), PA_{W_{2D}} in (4) is exactly the 1 x N sequence W_1D. Similarly, the PA of C', assuming a host image orientation of 90°, can be derived as follows.
PA_{C'} = \frac{1}{M} \sum_{row} (C + s \cdot W_{2D}) = \frac{1}{M} \sum_{row} C + \frac{1}{M} \sum_{row} s \cdot W_{2D} = PA_C + s \cdot PA_{W_{2D}}    (5)
As mentioned above, PA_{W_{2D}} is identical to W_1D, so correlating PA_{C'} with W produces several correlation peaks with period k. Comparing the peaks of PA_{W_{2D}} and PA_{C'} therefore reveals the degree of geometric distortion. Figure 5 (a) shows an example of a generated W_2D whose PA direction is along the columns (90° orientation); PA_{W_{2D}} and its ACF are shown in Figure 5 (b) and (c), respectively. Since the orientation of the image is evaluated and used in generating and embedding the reference pattern, the proposed method is largely free from rotation, and the only remaining concern over the obtained peaks is the scaling information needed to recover the watermark properly. However, the PA approach assumes that the projection direction (i.e., the orientation of the host image) can be found exactly; otherwise, the correlation peaks are not obtained. Finding an accurate projection direction is therefore the main concern.
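A minimal sketch of the PA detector for the 90° case is given below, assuming NumPy; for other orientations the received signal would first be aligned so that the pattern stripes run vertically.

```python
# A minimal sketch of the projection-average detector (90-degree case).
import numpy as np

def projection_average(signal_2d):
    """Eq. (4)/(5): average the 2-D signal over its M rows, giving a length-N profile."""
    return signal_2d.mean(axis=0)

def pa_correlation(received, w):
    """Correlate the projection average with W; peaks should repeat with period k."""
    pa = projection_average(received)
    pa = pa - pa.mean()
    return np.correlate(pa, w - w.mean(), mode='full')   # periodic peaks if undistorted
```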
Figure 5: (a) W_2D, (b) PA_{W_{2D}} (below) and PA_{C'} (above), (c) ACF of PA_{W_{2D}} (below) and ACF of PA_{C'} (above)
The total procedure described so far is summarized in Figure 6.
Figure 6: Embedding part (above) and extraction part (below)
3. EXPERIMENTAL RESULTS

3.1 ACCURACY OF ORIENTATION ESTIMATION

To use Gabor kernels, we must determine the parameters that define them: the kernel size, the Gaussian variance σ^2, and the wavelength λ. After simulations with various parameters, we settled on a kernel size of 1/16 of the host image size and a Gaussian variance of 1/4 of the kernel size. The wavelength λ determines the frequency of the generated kernels: a small λ corresponds to a high frequency, and a large λ to a low frequency. Since the low-frequency case gives more accurate estimation, we set λ to half the length of the kernel's row (or column). Table 1 summarizes the accuracy of orientation estimation, where "diff." denotes the difference in degrees between the true orientation and its estimate after the specified distortion. Since the parameters defining the Gabor kernels interact with the performance, a more systematic criterion for selecting their values remains to be studied.
Table 1: Accuracy of orientation estimation ("diff." in degrees)

Aspect ratio change (H:V)   Rotation angle   lena diff.   baboon diff.   pepper diff.
1 : 1.2                     0                1.8          -2.83          0.04
1 : 1.4                     0                3.41         -4.6           -2.94
1 : 1.6                     0                5.68         -5.11          -4.67
1 : 1.8                     0                6.51         -5.21          -5.99
1 : 2                       0                6.79         -5.77          -6.76
1 : 1                       1                0            -2             -1
1 : 1                       10               3            -3             -1
1 : 1                       20               -4           1              4
1 : 1                       30               0            6              8
1 : 1                       40               2            12             -11
3.2 GEOMETRICAL DISTORTIONS

The host image is the 256 x 256 lena image, and the embedding domain is the spatial domain; other transform domains could be used as well. Since the simulation is carried out in the spatial domain, the weighting factor s in (3) is chosen to exploit the so-called spatial masking effect [11]. To obtain s, we use an edge filter in a manner similar to Kalker [12]. The weighting factor s is given by
s = \frac{\log(I_f)}{\log(I_f)_{max}} \cdot t    (6)
where I_f denotes the edge-filtered image and t is a gain factor, set to 2 in the experiments. The embedded image is shown in Figure 7.
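A minimal sketch of Eq. (6) is given below. It assumes NumPy/SciPy and uses a Sobel operator as the edge filter, which is our own stand-in since the exact filter is not detailed here; log1p is used to avoid log(0) on flat regions.

```python
# A minimal sketch of the spatial-masking weighting factor of Eq. (6).
import numpy as np
from scipy.ndimage import sobel

def weighting_factor(image, t=2.0):
    """Per-pixel weighting factor s derived from an edge-filtered image."""
    img = image.astype(float)
    i_f = np.abs(sobel(img, axis=0)) + np.abs(sobel(img, axis=1))   # edge-filtered image I_f
    log_if = np.log1p(i_f)
    return log_if / (log_if.max() + 1e-12) * t                      # s = log(I_f) / log(I_f)_max * t
```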
Figure 7: original lena image (left) and pattern-embedded image (right)

Figure 8 (a) shows the generated W_2D (50° orientation), whose PA direction follows the orientation estimated by the Gabor kernels, and Figure 8 (b) shows the scaled and rotated image. The scaling ratio is 1.2 : 1.4 (20% and 40% enlargement in the horizontal and vertical directions, respectively), and the rotation angle is 12°. Since the watermarked image is distorted, the correlation peaks of the PA in Figure 8 (c) do not appear periodic. Taking the DFT helps reveal the period, as shown in Figure 8 (d); for illustration, Figure 8 (d) shows only part of the discrete magnitude spectrum. Before taking the DFT, we compute the gradient of (c) and suppress the local peaks, which makes the scaling information easier to estimate. Moreover, since period T = DFT length L / frequency index, comparing the periods tells how much the reference pattern has been distorted.
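A minimal sketch of this period estimation is given below: compute the gradient of the correlation trace, clip strong local peaks, and read the period from the dominant DFT bin. The clipping threshold is an illustrative assumption.

```python
# A minimal sketch of DFT-based period estimation (period T = L / frequency index).
import numpy as np

def estimate_period(corr_trace):
    """Estimate the repetition period of the (possibly scaled) reference pattern."""
    g = np.gradient(np.asarray(corr_trace, dtype=float))
    g = np.clip(g, -3.0 * g.std(), 3.0 * g.std())        # suppress isolated strong peaks
    spectrum = np.abs(np.fft.rfft(g - g.mean()))
    k_idx = int(np.argmax(spectrum[1:])) + 1             # dominant non-DC frequency index
    return len(g) / k_idx                                # period T = L / frequency index
```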
Figure 8: (a) W_2D (50° orientation), (b) distorted lena image (1.2 : 1.4 aspect ratio change and rotation by 12°), (c) ACF of PA_{W_{2D}} (below) and ACF of PA_{C'} (above), (d) DFT of (c)
4. CONCLUSION

Geometric distortions such as rotation, scaling, and translation are very effective attacks. In this paper, we have proposed a method that is relatively robust to such attacks. Using Gabor kernels, we extract the orientation of the image, and based on this orientation a reference pattern is generated periodically. In the detection step, the PA approach projects the pattern along the estimated orientation, so that 2-D processing becomes 1-D processing, which is computationally much simpler. Consequently, the accuracy of the orientation estimate is one of the main concerns. As the results show, the orientation estimated by Gabor kernels is not yet accurate enough, which calls for an additional search around the estimate to refine it. Moreover, while the results in the previous section show robustness against geometric distortions, translation affects the proposed method differently from scaling and rotation: since the orientation is estimated under the assumption that the image center is unchanged, a shift of the center causes inaccuracy in the estimated orientation. We leave these issues as future work.
ACKNOWLEDGEMENTS This work was supported by the Information Technology Research Center (ITRC) program of the Ministry of Information & Communication of Korea.
REFERENCES

1. J. O'Ruanaidh and T. Pun, "Rotation, scale and translation invariant spread spectrum digital image watermarking," Signal Processing, 66(3), pp. 303-317, 1998.
2. S. Pereira, J. J. K. O'Ruanaidh, F. Deguillaume, G. Csurka, and T. Pun, "Template based recovery of Fourier-based watermarks using log-polar and log-log maps," in Int. Conference on Multimedia Computing and Systems, Special Session on Multimedia Data Security and Watermarking, vol. 1, pp. 870-874, June 1999.
3. S. Pereira and T. Pun, "Robust template matching for affine resistant image watermarks," IEEE Trans. on Image Processing, vol. 9, no. 6, pp. 1123-1129, June 2000.
4. M. Kutter, "Watermarking resisting to translation, rotation and scaling," in SPIE Conf. on Multimedia Systems and Applications, vol. SPIE 3528, pp. 423-431, 1998.
5. A. Herrigel, S. Voloshynovskiy, and Y. Rytsar, "The watermark template attack," in SPIE Conf. on Security and Watermarking of Multimedia Contents III, vol. SPIE 4314, pp. 394-405, 2001.
6. S. Voloshynovskiy, F. Deguillaume, and T. Pun, "Content adaptive watermarking based on a stochastic multiresolution image modeling," in EUSIPCO 2000, Tampere, Finland, 2000.
7. Y. Adini, Y. Moses, and S. Ullman, "Face recognition: The problem of compensating for changes in illumination direction," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 721-732, July 1997.
8. L. Chengjun and H. Wechsler, "A Gabor feature classifier for face recognition," in Eighth IEEE Int. Conf. on Computer Vision, ICCV 2001, vol. 2, pp. 270-275, 2001.
9. T. Shioyama, H. Wu, and S. Mitani, "Object detection with Gabor filters and cumulative histograms," in Int. Conf. on Pattern Recognition, vol. 1, pp. 704-707, 2000.
10. D. M. Weber and D. P. Casasent, "Quadratic Gabor filters for object detection," IEEE Trans. on Image Processing, vol. 10, no. 2, pp. 218-230, Feb. 2001.
11. G. C. Langelaar, I. Setyawan, and R. L. Lagendijk, "Watermarking digital image and video data: A state-of-the-art overview," IEEE Signal Processing Magazine, vol. 17, no. 5, pp. 20-46, Sept. 2000.
12. T. Kalker, G. Depovere, J. Haitsma, and M. Maes, "A video watermarking system for broadcast monitoring," in Proc. SPIE Electronic Imaging, Security and Watermarking of Multimedia Contents, pp. 103-112, 1999.