Robustness and Security of a Wavelet-based CBIR ∗ Hashing Algorithm Albert Meixner
Andreas Uhl
Department of Computer Science Duke University, USA
Department of Computer Sciences Salzburg University, Austria
[email protected]
[email protected]
ABSTRACT
1. Equal distribution of hash values:
A wavelet-based robust database hashing scheme is analyzed with respect to its resilience against image modifications and hostile attacks. A method to construct a forgery is presented and possible countermeasures are discussed.
P [H(X) = α] ≈
1 , ∀α ∈ {0/1}L . 2L
2. Pairwise independence for visually different images X and Y: ∀α, β ∈ {0/1}L holds P [H(X) = α|H(Y ) = β] ≈ P [H(X) = α].
Categories and Subject Descriptors I.4.10 [Image Processing and Computer Vision]: Image Representation; H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval
ˆ 3. Invariance for visually similar images X and X: ˆ ≈1. P [H(X) = H(X)] To fulfil this requirement, most proposed algorithms try to extract image features which are invariant to slight global modifications like compression or filtering.
General Terms Security
4. Distinction of visually different images X and Y: P [H(X) = H(Y )] ≈ 0 .
Keywords Image retrieval, robust hash functions, security
1.
This final requirement also means that given an image X, it is almost impossible to find a visually different image Y with H(X) = H(Y ). In other words, it should be impossible to create a forgery which results in the same hash value as the original image. Note that the visual features selected according to requirement 3) are usually publicly known and can therefore be modified. This might threaten security, as the hash value could be adjusted maliciously to match that of another image.
INTRODUCTION
The use of robust hash algorithms for media authentication has been extensively researched in recent years. A number of different algorithms [17] have been proposed and reviewed in various publications. Due to the intended use of these algorithms, security has always been a major design and evaluation criteria [10, 20, 18]. Several attacks on popular algorithms have been proposed and countermeasures to these attacks have been developed [13, 7, 11, 15]. Similar to cryptographic hash functions, robust hash functions for image authentication should satisfy 4 major requirements [21] (where P denotes probability, H is the hash ˆ Y are images, α and β are hash values, and function, X, X, {0/1}L represents binary strings of length L):
Another application for robust hashes is searching for images in a database (content based image retieval (CBIR) see e.g. [1]). A common scenario for this application is scanning the Internet for copyrighted material. An agent searches the Internet for images and compares them to a large number of registered images, who’s copyright holders are known. The cost of searching in such a database increases linearly with the number of stored images. A hash value of the image can be used to quickly identify a small number of candidates in the database, that resemble the image, resulting in a constant search cost. Obviously classical hash functions are not suitable for this purpose, as small image modifications, like compression artifacts or inserted logos, would prohibit correct identification. The desired properties of robust hash functions used for this purpose are different from those used for authentication. Contrary to security applications, a small number of false positives does not have any impact on the system. Thus, the requirement of returning different hash values for locally
∗This work has been partially supported by the Austrian Science Fund, project no. 15170
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. MM&Sec’06, September 26–27, 2006, Geneva, Switzerland. Copyright 2006 ACM 1-59593-493-6/06/0009 ...$5.00.
140
base algorithm is then applied to all regions, producing hash strings Hi . These strings are concatenated and a pseudo random subset of the resulting string is selected. This subset is the final hash value Hk (X), where k is a secret key used to seed the pseudo random number generator.
modified images, which has been set up for authentication hashes, is not valid for these hashes (e.g. the insertion of small objects or their removal should not have significant impact on the hash value). Thus, for database hashes it is required that, for any image X, the construction of a visual similar image X with H(X ) = H(X) should be impossible. This includes the derivation of X from X through any kind of image modification. While several such algorithms have been proposed, very few literature on the evaluation of these algorithms is available and existing evaluations are almost completely limited to the stability of the algorithm and do not address security aspects. In this paper we evaluate and discuss a wavelet based scheme proposed in literature [21]. The first part of the paper covers the resilience against image modifications, the second part sketches a possible attack on the described algorithm and discusses a possible countermeasure.
2.
3. ROBUSTNESS EVALUATION The scheme shows extraordinary stability under non-geometric transformations. In all tests, the results stayed far below a hamming distance of 0.1 even for strong modifications, while the distance between arbitrary images stayed above 13 in most cases (see Table 1 for some concrete values). baboon barb boat jet lena peppers truck
A CBIR HASHING SCHEME
2. XA is transformed into a binary image M by thresholding. T is selected, such that there is an equal number of black and white pixels in M (i.e. T is the median). 1 0
XA (i, j) ≥ T otherwise
boat
jet
lena
peppers
truck
0.50 0 0.29 0.42 0.44 0.40 0.39
0.49 0.29 0 0.25 0.40 0.36 0.56
0.49 0.42 0.25 0 0.34 0.38 0.47
0.48 0.44 0.40 0.34 0 0.40 0.46
0.47 0.40 0.36 0.38 0.40 0 0.49
0.40 0.39 0.56 0.47 0.46 0.49 0
JPEG Most of the JPEG artifacts are removed by using the approximation sub-band, which is mostly unaffected by lossy compression. Smaller remaining artifacts are removed during the median filtering. Even when using extremely low quality, the hamming distance stays below 0.06 ! (see Fig 1).
1. The original image X is decomposed, using a L level DWT. XA is the resulting approximation sub-band.
barb
0 0.50 0.49 0.49 0.48 0.47 0.38
Table 1: Hamming distance between the hash values of various test images.
M. K. Mihcak and R. Venkatesan proposed a scheme [21] based on wavelet decomposition and an iterative algorithm to extract a representative image structure as follows.
M (i, j) =
baboon
0.06 baboon lena peppers truck
(1)
0.05
3. Let S[m,n],p (A) be a statistical order filter, defined as S[m,n],p (A) = B, where ∀i, j B(i, j) is the p-th element of the ascending sequence A(i , j ) : i ∈ i − m, . . . , i + m, j ∈ j − n, . . . , j + n.
0.04
0.03
4. M2 = S[m,n],p (M1 )
0.02
5. M3 = AM2 , where A is a constant 0.01
6. Let f be a shift invariant filter. 7. M4 = f (M3 )
0 0
8. From M4 a binary image M5 is derived by thresholding.
10
20
30
40
50
60
70
80
90
100
Figure 1: Hamming distance of compressed images (Baboon, Lena, Peppers, Truck) versus JPEG quality factor
9. If the hamming distance between M5 and M1 is greater than and the number of iterations is < C, the algorithm continues with step 3 and M5 as new input image. Otherwise the algorithm continues with step 10.
Noise Similar to JPEG compression, noise addition does not have a big affect on the approximation sub-band. Most of the remaining noise is removed during the linear filtering step of the iteration. Fig. 2 shows that the hamming distance versus an increasing amount of noise remains below 0.04.
10. H = M5 Figure 8 (the first and third lines) show a visualization of the first and fifth iteration of the algorithm applied to the image Baboon (see Fig. 6(a) for the original). The basic variant of the algorithm does not depend on a key (this version is used for generating the illustrations in Fig. 8). To make it key depended, the original image is pseudo randomly split into rectangular regions Ri . The
Brightness Changes in brightness directly affect the approximation sub-band. These changes are almost completely removed during the initial thresholding step. As the median is used as threshold it is immune to any modification, that conserve relations between pixels. The only source of error in this test is clipping
141
0.04
0.004
0.035
0.0035
0.03
0.003
0.025
0.0025
0.02
0.002
0.015
0.0015
0.01
0.001
0.005
0.0005
baboon lena peppers truck
0
baboon lena peppers truck
0 0
5
10
15
20
25
30
35
40
45
50
0
Figure 2: Hamming distance of noisy images versus noise amount
2
4
6
8
10
12
14
16
18
20
Figure 4: Hamming distance of images with an increasing number of 8 × 8 pixels objects inserted. 0.03
of pixel values, which occurs for extreme parameter values. 0.025 0.4
0.02
0.35
0.3 0.015
0.25 0.01 0.2
0.005
0.15
baboon lena peppers truck
0.1 0 0 0.05
baboon lena peppers truck 20
40
60
80
100
120
140
160
180
200
Figure 3: Hamming distance of modified images versus change in average brightness value
6
8
10
12
14
16
18
20
against image modifications. The only fragile step is the initial generation of a black and white image by thresholding. The whole iteration part depends on the initial thresholding, making it the major target for an attack. During this step, the image median is used as a threshold, to guarantee an equal amount of black and white pixels. The image median of the low pass subband is publicly known, which allows to modify pixels which are within a certain range of the median. All of these pixels are modified to result in a different value after thresholding.
Fig. 3 shows that the hamming distance versus an increasing amount of change in average brightness remains below 0.06. The leftmost values in the plot are measurement artifacts. Object Insertion By inserting an increasing number of 8 × 8 and 32 × 32 pixels objects, this test changes the low frequency components and does not conserve pixel relations, so the introduced changes cannot be resolved by the first two steps of the algorithm. The removal of the objects takes place during the statistical order filtering, which effectively removes smaller disturbances. Figs. 4 and 5 shows the impressively low values for four images.
4.
4
Figure 5: Hamming distance of images with an increasing number of 32 × 32 pixels objects inserted.
0 0
2
• The image is decomposed using a DWT and the approximation subband L is processed further. • The median m of L is determined. • L is constructed by flipping pixels close to the median
SECURITY EVALUATION
Lij =
An attack against a database hash aims to modify an image such that it produces a different hash value, without visual changes to the image. As we have seen in the previous section, the Mihcak/Venkatesan scheme is very stable
Lij |Lij − m| ≥ 2m − Lij otherwise
(2)
• The image is reconstructed, using L as approximation subband.
142
(a) Original
when comparing it to the values among different images in Table 1 which are in the same order of magnitude. The attack works particularly well for images which have a high amount of pixels close to the median in general. Obviously, also the key dependency scheme does not help to prevent this attack effectively. The experiments show as well that even small values of result in a significant change in the resulting hash value for a large number of images (see Fig. 7). The quality of the resulting image highly depends on the actual image used. In the “best” case, the hash value of the modified image has a normalized hamming distance of about 0.4 to the original hash, while not perceptibly degrading image quality.
(b) Attacked ( = 0.2, hamming distance = 0.35)
Step 2
Figure 6: Baboon Testimage. In order to cope with the key dependency of the scheme involving the generation of the random rectangular regions Ri , we apply the described flipping procedure on the basis of a fixed square grid instead of applying it to the entire image. This means that the median is computed within a patch of the grid and the flipping is also applied on a local basis. The rationale behind this strategy is that the median in a random rectangular region Ri is often quite similar to the median values of those grid patches, which overlap Ri . Flipping only pixels close to the median results in a small visual impact, while changing a lot of pixels in the thresholded image. It should be noted that flipping all respective pixels in the defined way offers the possibility to reverse the attack by flipping all affected pixels back. A solution would be not to flip all pixels close to the median but to decide ramdomly, which of these pixels should be subject to flipping. This strategy introduces a tradeoff since it annihilates the possibility to revert the attack but the effect of attack is less severe. The experimental results are achieved by flipping all pixels close to the median.
Step 4
Step 7
(a) First Iteration Step 2
Step 4
Step 7
0.7
0.6
0.5
0.4
0.3
(b) Fifth Iteration
0.2 Baboon Barb Boat Jet Lena Peppers Truck Zelda
0.1
Figure 8: Steps 2, 4, and 7 of the Mihcak Venkatesan algorithm for the original and attacked Baboon image.
0 0.1
0.2
0.3
0.4
0.5
0.6
0.7
Figure 7: Attack results: Hamming distances for different attacked images
Fig. 8 shows first and fifth iteration of the algorithm applied to the original (first lines in subfigures) and attacked (second lines in subfigures) Baboon images (see Fig. 6 for the original). The large structural differences in the final images (which cause the high hamming distance value as obtained by the attack) are clearly visible.
Fig. 6 shows the Baboon image in original and in an attacked version which has a hamming distance of 0.35 to the original. Given the high degree of visual similarity the high value of the hamming distance is astonishing, especially
143
5.
IMPROVING ATTACK RESISTANCE Normalized hamming distance
The concept of secret transform domains has been exploited to some degree in the area of multimedia security during the last years. Fridrich [8, 6] introduced the concept of DCT-type key-dependent basis functions in order to protect a watermark from hostile attacks. Unnikrishnan and Singh [19] suggest to use secret fractional Fourier domains to encrypt visual data, a technique which was also used to embed watermarks in an unknown domain [4]. The many degrees of freedom available to design a wavelet transform has also been exploited in similar manner for image and video encryption [5, 14] and to secure watermarking copyprotection [2, 3] and authentication [9] schemes.
0.26 0.25 0.24 0.23 0.22 0.21 0.2 0.19 3 2 1 -3
-2
0 -1
0
-1 1
-2 2
3 -3
Normalized hamming distance
Figure 10: Results of attacks using different Pollen filters
0.09 0.08 0.07 0.06 0.05 0.04 0.03 0.02 0.01 0
6. CONCLUSIONS Our test results show, that the iterative processing used by the Mihcak/Venkatesan scheme results in high resistance to common global image manipulations. Even though the scheme is very stable to most manipulations, it is possible to maliciously modify the image to create false negatives, that are visually very similar to the original image. The key-depency scheme based on a random tiling can be overcome easily and turns out not to be helpful against this attack. This weakness can not be easily overcome by a generic method, which we proposed for protecting authentication hashes from similar attacks [11, 12]. One reason for this effect is the extremely high robustness of the scheme, since there exists a tradeoff between the security and robustness of visual hashing schemes as pointed out by [18]. These facts raise the question if the construction of highly robust and secure visual hashing schemes for CBIR is possible in general.
3 2 1 -3
-2
0 -1
0
-1 1
-2 2
3 -3
Figure 9: Distance of hashes created with varying Pollen filters In recent work [11] we have demonstrated an effective attack against a wavelet-based authentication hash algorithm. In subsequent work [12] we could reduce the effectiveness of this attack by using secret parameterized wavelet filters during the decomposition. As it is the case also in this algorithm, the attack takes place before the actual hash generation, therefore parameterized wavelets appear to be a feasible means to improve security for this algorithm as well. Figure 9 shows, that we indeed result in different hash values when using different wavelet filters generated by Pollens 2-D parameterization [16] during the algorithm. However, the amount of difference is not very large, because the hashing algorithm is designed for stability. To determine the effectiveness against the proposed attack, another experiment is performed. A hash value is computed for given parameters (1.0, 1.0) of the Pollen scheme and is compared to the hash values of images, that were attacked using various, incorrect parameters. As the goal of the attack was to modify the hash as much as possible, a successful improvement would show a lower difference between original and attacked image for incorrect parameter choice. The results in Fig. 10 show, that this is not the case. The choice of parameters influences the effectiveness of the attack only in a small range, too small to prevent the attack effectively.
7. REFERENCES [1] B. Coskun and B. Sankur. Robust video hash extraction. In Proceedings (CD-ROM) of the European Signal Processing Conference, EUSIPCO ’04, Vienna, Austria, Sept. 2004. [2] W. Dietl, P. Meerwald, and A. Uhl. Protection of wavelet-based watermarking systems using filter parametrization. Signal Processing (Special Issue on Security of Data Hiding Technologies), 83:2095–2116, 2003. [3] W. M. Dietl and A. Uhl. Robustness against unauthorized watermark removal attacks via key-dependent wavelet packet subband structures. In Proceedings of the IEEE International Conference on Multimedia and Expo, ICME ’04, Taipei, Taiwan, June 2004. [4] I. Djurovic, S. Stankovic, and I. Pitas. Digital watermarking in the fractional fourier transformation domain. Journal of Network and Computer Applications, 24:167–173, 2001.
144
[5] D. Engel and A. Uhl. Parameterized biorthogonal wavelet lifting for lightweight JPEG 2000 transparent encryption. In Proceedings of ACM Multimedia and Security Workshop, MM-SEC ’05, pages 63–70, New York, NY, USA, Aug. 2005. [6] J. Fridrich. Key-dependent random image transforms and their applications in image watermarking. In Proceedings of the 1999 International Conference on Imaging Science, Systems, and Technology, CISST ’99, pages 237–243, Las Vegas, NV, USA, June 1999. [7] J. Fridrich. Visual hash for oblivious watermarking. In P. W. Wong and E. J. Delp, editors, Proceedings of IS&T/SPIE’s 12th Annual Symposium, Electronic Imaging 2000: Security and Watermarking of Multimedia Content II, volume 3971, San Jose, CA, USA, Jan. 2000. [8] J. Fridrich, A. C. Baldoza, and R. J. Simard. Robust digital watermarking based on key-dependent basis functions. In D. Aucsmith, editor, Information hiding: second international workshop, volume 1525 of Lecture notes in computer science, pages 143–157, Portland, OR, USA, Apr. 1998. Springer Verlag, Berlin, Germany. [9] J. Huang, J. Hu, D. Huang, and Y. Q. Shi. Improve security of fragile watermarking via parameterized wavelet. In Proceedings of the IEEE International Conference on Image Processing (ICIP’04), Singapore, Oct. 2004. IEEE Signal Processing Society. [10] C.-S. Lu and H.-Y. M. Liao. Structural digital signature for image authentication: An incidental distortion resistant scheme. In ACM Multimedia 2000, pages 115–118, Los Angeles, CA, USA, Nov. 2000. [11] A. Meixner and A. Uhl. Analysis of a wavelet-based robust hash algorithm. In E. J. Delp and P. W. Wong, editors, Security, Steganography, and Watermarking of Multimedia Contents VI, volume 5306 of Proceedings of SPIE, pages 772–783, San Jose, CA, USA, Jan. 2004. SPIE. [12] A. Meixner and A. Uhl. Security enhancement of visual hashes through key dependent wavelet transformations. In F. Roli and S. Vitulano, editors, Image Analysis and Processing - ICIAP 2005, volume 3617 of Lecture Notes on Computer Science, pages 543–550, Cagliari, Italy, Sept. 2005. Springer-Verlag. [13] N. Memon and J. Fridrich. Further attacks on the Yeung-Mintzer fragile watermark. In P. W. Wong and E. J. Delp, editors, Proceedings of IS&T/SPIE’s 12th Annual Symposium, Electronic Imaging 2000: Security and Watermarking of Multimedia Content II, volume 3971, San Jose, CA, USA, Jan. 2000.
[14] A. Pommer and A. Uhl. Selective encryption of wavelet-packet encoded image data — efficiency and security. ACM Multimedia Systems (Special issue on Multimedia Security), 9(3):279–287, 2003. [15] R. Radhakrishnan, Z. Xiong, and N. D. Memom. Security of visual hash functions. In P. W. Wong and E. J. Delp, editors, Proceedings of SPIE, Electronic Imaging, Security and Watermarking of Multimedia Contents V, volume 5020, Santa Clara, CA, USA, Jan. 2003. SPIE. [16] J. Schneid and S. Pittner. On the parametrization of the coefficients of dilation equations for compactly supported wavelets. Computing, 51:165–173, May 1993. [17] C. J. Skrepth and A. Uhl. Robust hash-functions for visual data: An experimental comparison. In F. J. Perales et al., editors, Pattern Recognition and Image Analysis, Proceedings of IbPRIA 2003, the First Iberian Conference on Pattern Recognition and Image Analysis, volume 2652 of Lecture Notes on Computer Science, pages 986–993, Puerto de Andratx, Mallorca, Spain, June 2003. Springer Verlag, Berlin, Germany. [18] A. Swaminathan, Y. Mao, and M. Wu. Security of feature extraction in image hashing. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2005), Philadelphia, US, Mar. 2005. IEEE Signal Processing Society. [19] G. Unnikrishnan and K. Singh. Double random fractional fourier-domain encoding for optical security. Optical Engineering, 39(11):2853–2859, Nov. 2000. [20] R. Venkatesan, S.-M. Koon, M. H. Jakubowski, and P. Moulin. Robust image hashing. In Proceedings of the IEEE International Conference on Image Processing (ICIP’00), Vancouver, Canada, Sept. 2000. [21] R. Venkatesan and M. K. Mihcak. New iterative geometric methods for robust perceptual image hashing. In Proceedings of the Workshop on Security and Privacy in Digital Rights Management 2001, Philadelphia, PA, USA, Nov. 2001.
145