Robust Image Hashing by Downsampling: Between Mean and Median

Kannan Karthik
Department of Electronics and Electrical Engineering
Indian Institute of Technology Guwahati, Guwahati-781039, India
E-mail: [email protected], Tel: +91-361-2582516, Fax: +91-361-2582542

Abstract—The arithmetic mean and the non-linear median operation provide two slightly different abstractions of any discrete dataset. The differential information contained in these two filtering operations reveals several trends in the processed data. This principle has been applied to the process of deriving robust binary PINs from gray-scale images by first extracting pseudo-images through random rectangular downsampling of a parent image, fusing them with a median operation and then thresholding the filtered set with its arithmetic mean. Robustness against recompression, noise and, most importantly, geometric operations stems from two factors: (a) the random sampling, which provides elasticity, and (b) the process of selecting the PIN-thresholds as the arithmetic mean filtered set.
Keywords-Mean, Median, Image hash, Downsampling, Image search

I. INTRODUCTION

The urge to rummage through a large multimedia database in search of specific multimedia fragments can arise for several reasons. Every search implies seeking an association between two entities: (a) a known entity, termed the host or query input QI, which is supplied to some search engine, and (b) an unknown entity (or entities), called targets Ti, which are the search byproducts. Some examples are provided below.

Example 1: Locating crumpled artwork - A particular customer has recently purchased a digital artifact and wishes to crawl through internet databases to find out how many such art-works lie scattered about, not necessarily possessing the same digital quality as the new purchase. In this example, QI has a much higher quality than the targets Ti; the latter are like tossed-away, crumpled pieces of paper. A related application is using the snapshot of a criminal to search several immigration databases for similar-looking faces (accounting for disguises). An excellent overview of robust hash functions applied to locating file fragments, and of similarity hashes, is presented by Roussev [1].

Example 2: PIN based inquiry - In a PIN based search, the user supplies the search engine with a multimedia fragment such as a distorted video clip, an audio segment from a well known artist or a key phrase from a technical paper, and requests higher quality, full versions of the same. If QI happens to be a video clip, the motive can be the search for the parent movie which contained that clip. Here, the quality of the target Ti is much higher and more complete than the PIN QI provided. An example is a web-crawler application [2].

Example 3: Fingerprint identification - Since personal documents such as passports and identification cards can easily be forged and passwords compromised, the process of access control must invoke an association between the individual and some physiological trait, e.g. a thumbprint [3]. Here, QI is the print supplied by the individual and TF is the exact matching print within the database. For most fingerprint identification applications, it is required that the match be perfect and with zero false alarms.

The focus of this paper is on Example 1: given an undistorted, high quality host (base) image (or its PIN equivalent), the motive is to locate noisy, cropped, re-compressed and stretched versions of it in a database. Image hashing literature can typically be classified into the following categories: (i) hashes based on the preservation of image statistics [4], (ii) identification of certain invariant features (such as the geometric inter-relationship between object-centric anchor points) and their use for hash abstraction [5], and (iii) learning based approaches which require significant training with both content-different images and content-similar images (similarity transformed versions) [6]. Unfortunately, categories (i) and (iii) go together, as it is difficult to arrive at a set of robust statistics without sufficient learning and content summarization. Since the perceptual meaning in an image is a 'high level measure', it must be devoid of underlying pixel intensity fluctuations, whether artificially introduced or due to a set of incidental signal processing and compression operations. Hence, in the signal domain the definition of an image is elastic. In this paper we try to instill this elasticity through random downsampling, deriving several slightly different child images (distorted, shrunken versions) from the parent image. We then use this set to produce the statistical abstraction. A novel image hashing algorithm is described in Section II as a four stage process. The significance of median filtering followed by mean thresholding is discussed in Section III, while the motive for random sampling is presented in Section IV. Finally, the principle is applied to image hashing, with results in Section V.

II. PROPOSED IMAGE HASHING ALGORITHM

The proposed image hash is the function

$\vec{Z} = P_{hash}[I, T, p]$    (1)

where I is the original gray-scale image whose binary PIN must be computed, T is the number of pseudo-images which are allowed to be created during the process of hash construction (Sub-section A of the algorithm description) and the parameter p is the probability with which a particular row or column is chosen to form part of a pseudo-image. The byproduct of the hash operation is an L-bit binary string Z⃗ which captures the essential features of the original image I. The four stages in this algorithm are discussed below.

A. Pseudo-images by random downsampling
Let ζ1, ζ2, ..., ζT be independent downsampling experiments performed on the original image I. The outcomes of these experiments are child images IC1, IC2, ..., ICT, where the creation of a child is described by the relation

$\zeta_i : I \rightarrow I_{Ci}$    (2)

Let the image be of size N × N. The sets R = {1, 2, 3, ..., N} and C = {1, 2, 3, ..., N} represent the rows and columns respectively. Further, a critical sampling parameter p is defined as

$p = \Pr(\text{Row } r \in R \text{ is selected})$    (3)

Every row and column is chosen independently from the host image I. In every experiment ζi, the decision to choose a particular row or column is made 2 × N times and results in a splitting of R and C as R = RS ∪ RD and C = CS ∪ CD respectively. RS and CS represent the sets of rows/columns which have been chosen to form the child image ICi. Any child image ICi is the grid formed by the intersection of the selected rows RS with the selected columns CS. When the experiment ζi is independently repeated (e.g. ζ1 : I → IC1, ζ2 : I → IC2, ζ3 : I → IC3), the resulting sampled versions form the distorted child images (Fig. 1).
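For concreteness, the sampling experiment ζi can be sketched in a few lines of NumPy; the helper name make_child_image and the synthetic test image below are our own illustrative choices, not part of the paper:

import numpy as np

def make_child_image(I, p, rng):
    """One downsampling experiment: keep each row and each column of I
    independently with probability p and return the selected grid."""
    N = I.shape[0]                      # host image assumed to be N x N
    rows = rng.random(N) < p            # R_S: rows retained for the child image
    cols = rng.random(N) < p            # C_S: columns retained for the child image
    return I[np.ix_(rows, cols)]        # grid formed by intersecting R_S and C_S

# Example: T = 16 independent experiments on a synthetic 512 x 512 "image"
rng = np.random.default_rng(7)
I = rng.integers(0, 256, size=(512, 512)).astype(float)
children = [make_child_image(I, p=0.16, rng=rng) for _ in range(16)]
print([c.shape for c in children])      # child sizes fluctuate around p * N

Each child image is a slightly different, shrunken view of the same host, which is exactly the elasticity the hash construction relies on.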

Fig. 1. Child images as a result of random sampling.

B. Row-column borrowing
Since the randomized sampling process does not guarantee that all child images are of the same size, a row-column borrowing step is introduced. The step is executed in such a way that all child images have size Nch × Nch, where Nch = ceil(p · N). Let K = |RS| be the number of rows chosen for the child image. K follows a binomial distribution,

$\Pr(K = j) = \binom{N}{j} p^j (1-p)^{N-j}$    (4)

and the expected value of K, i.e. the average number of rows extracted, is E[K] = p · N. Hence, Nch = ceil(p · N). The consequence of this operation is a series of equal-sized child images ĨC1, ĨC2, ĨC3, ..., ĨCT.

C. Median filtering
The images ĨC1, ĨC2, ĨC3, ..., ĨCT are median filtered by stacking them one below the other and then performing the median operation along each pixel column (Fig. 2). Let c^(k)_{i,j} be one specific pixel value at position (i, j) ∈ Pch × Pch of the child image ĨCk, where Pch = {1, 2, ..., Nch}. Corresponding to each column in the stack there are T such pixels c^(1)_{i,j}, c^(2)_{i,j}, ..., c^(T)_{i,j}, one from each of the T child images. Let f_{i,j} be the pixel value at position (i, j) in the median filtered set ICF. Thus we have

$f_{i,j} = \mathrm{Median}\left[c_{i,j}^{(1)}, c_{i,j}^{(2)}, \ldots, c_{i,j}^{(T)}\right]$    (5)

D. PIN Computation
The binary PIN for the original image I is finally computed by comparing ICF with a threshold matrix Th, which is chosen as the arithmetic mean filtered matrix. Each entry t_{i,j} ∈ Th is calculated as

$t_{i,j} = \frac{1}{T}\sum_{k=1}^{T} c_{i,j}^{(k)}$    (6)

The binary PIN matrix PI has the same size as the child images, but is a binary representation derived from the following comparison:

$P_I(i,j) = \begin{cases} 1 & \text{if } f_{i,j} \ge t_{i,j} \\ 0 & \text{if } f_{i,j} < t_{i,j} \end{cases}$    (7)

The median and the arithmetic mean provide two slightly different abstractions of the same data set CHdata = {ĨC1, ĨC2, ĨC3, ..., ĨCT}. The question that needs to be asked is: "what form of information does the difference Median(CHdata) − Mean(CHdata) convey?"
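Stages B-D can then be summarised as a short sketch; for brevity, the row-column borrowing step is replaced here by a simple crop of every child to the smallest common size, which is an approximation of ours and not the borrowing procedure described above:

import numpy as np

def pin_from_children(children):
    """Median-fuse a stack of child images and threshold the result against
    the arithmetic mean image, in the spirit of Eqns. (5)-(7)."""
    # crude stand-in for row-column borrowing: crop each child to a common size
    n_ch = min(min(c.shape) for c in children)
    stack = np.stack([c[:n_ch, :n_ch] for c in children], axis=0)  # shape (T, N_ch, N_ch)
    I_CF = np.median(stack, axis=0)       # median filtered set, entries f_{i,j}
    Th = np.mean(stack, axis=0)           # mean threshold matrix, entries t_{i,j}
    return (I_CF >= Th).astype(np.uint8)  # binary PIN matrix P_I

# P_I = pin_from_children(children)      # 'children' from the previous sketch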

III. TREND DISCOVERY

We suggest that this problem translates to a piecewise trend discovery. Let X⃗_t = [x[0], x[1], x[2], x[3], x[4]] represent the host image data. Five (T = 5) downsampled sets X⃗_C1, X⃗_C2, X⃗_C3, X⃗_C4, X⃗_C5 are constructed from the parent set X⃗_t through the matrix A_S as shown below:

$A_S = \begin{bmatrix} \vec{X}_{C1} \\ \vec{X}_{C2} \\ \vec{X}_{C3} \\ \vec{X}_{C4} \\ \vec{X}_{C5} \end{bmatrix} = \begin{bmatrix} x[0] & x[1] & x[3] \\ x[2] & x[3] & x[4] \\ x[0] & x[3] & x[4] \\ x[1] & x[2] & x[4] \\ x[1] & x[3] & x[4] \end{bmatrix}$    (8)

Fig. 2. PIN construction by median filtering with mean thresholding.

The median and arithmetic means are evaluated column-wise, as described in the algorithm of Section II, to get the vectors MED_c(A_S) and MEAN_c(A_S) respectively, where the suffix 'c' indicates a column-wise operation. The PIN (binary string) Z⃗ is simply the outcome of the boolean result

$\vec{Z} = \mathrm{BOOLEAN}\left(\mathrm{MED}_c(A_S) > \mathrm{MEAN}_c(A_S)\right)$    (9)

The trends for various test vectors (host data sets X⃗_t) are shown in Fig. 3. To examine the robustness of the abstraction process under noisy conditions we have added zero mean Gaussian noise v[n] with a variance σv² = 9 to the original samples. Hence the actual sequence whose PIN is computed is

$y[n] = x[n] + v[n]$    (10)

where n = 0, 1, 2, 3, 4 and Y⃗_t = [y[0], y[1], y[2], y[3], y[4]] is the noisy sequence.

TABLE I. TRENDS IN X⃗_t = [x[0], x[1], x[2], x[3], x[4]].

Trend label | X⃗_t                  | Trend type               | PIN Z⃗
T1          | [13, 21, 28, 36, 44] | Increasing               | [1,1,1]
T2          | [44, 36, 28, 21, 13] | Decreasing               | [0,0,0]
T3          | [10, 40, 25, 42, 12] | Oscillation-1 (2 cycles) | [0,1,0]
T4          | [40, 25, 10, 27, 42] | Oscillation-2 (1 cycle)  | [0,1,1]
T5          | [10, 40, 15, 25, 43] | Oscillation-3            | [0,0,1]
T6          | [40, 10, 30, 10, 43] | Inverse of Osc-1         | [1,0,1]
T7          | [13, 30, 53, 28, 9]  | Inverse of Osc-2         | [1,0,0]
T8          | [13, 15, 13, 35, 9]  | Delay and Burst          | [1,1,0]

Note in Table I that each unique binary string obtained through the PIN generation process maps to a unique data-representation, which is an indication that relative mean-median information can be used to uncover trends buried in the datasets. Thus, MED_c(A_S) and MEAN_c(A_S) can be fused into a single binarized representation of the host data X⃗_t as the PIN Z⃗. Robustness to additive noise is examined in Fig. 3, which validates this form of trend discovery.
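The rows of A_S in Eqn. (8) make the trend table easy to verify directly; the index list and function name below are our own shorthand for the construction described in this section:

import numpy as np

# Which samples of x feed each downsampled set, read off the rows of A_S in Eqn. (8)
A_S_INDEX = [(0, 1, 3), (2, 3, 4), (0, 3, 4), (1, 2, 4), (1, 3, 4)]

def trend_pin(x):
    """3-bit PIN of Eqn. (9): 1 wherever the column-wise median exceeds the column-wise mean."""
    A_S = np.array([[x[i] for i in row] for row in A_S_INDEX], dtype=float)
    return (np.median(A_S, axis=0) > np.mean(A_S, axis=0)).astype(int)

print(trend_pin([13, 21, 28, 36, 44]))   # T1, increasing  -> [1 1 1]
print(trend_pin([44, 36, 28, 21, 13]))   # T2, decreasing  -> [0 0 0]
print(trend_pin([10, 40, 25, 42, 12]))   # T3, oscillation -> [0 1 0]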

Fig. 3. Capturing trends in the datasets by examining the relative mean-median binary information.

IV. PIXEL TRAJECTORY

Since it is important to gauge the performance of the algorithm (detectability) in spite of geometric operations and cropping, the question that first needs to be asked is: "Which are the pixels (from the host image) involved in the computation of the median or mean?" Alternatively, how many spots did the original pixel jump across to reach its current position represented by f_{i,j} (Eqn. 5)? Since the host image has been subjected to random rectangular sampling, the pixels migrate along the rectangular grid shown in Fig. 4. The probability with which a particular pixel will form the child pixel depends on its orientation in the grid. The paths taken by three host pixels have been expressed in terms of the unit vectors î, ĵ as: O_P1 → O_new: (2î − ĵ), O_P2 → O_new: (2î + ĵ), O_P3 → O_new: (−î + 2ĵ). Let l_x be the length of the path traversed along the X-direction and l_y be the length along the Y-direction. Then,

$\Pr(O_{P1} \rightarrow O_{new}) = \Pr(B_x \text{ AND } B_y \text{ AND } B_{ch}) = (1-p)^{l_x} \cdot (1-p)^{l_y} \cdot p = (1-p)^{l_x + l_y} \cdot p$    (11)

where (a) B_x is the event that l_x spots are skipped because of column sampling, (b) B_y is the event that l_y spots are skipped because of row sampling, and (c) B_ch is the event that O_P1 is chosen as the child pixel. B_x, B_y and B_ch are independent events because of the independence in the selection process. Equation (11) shows that all pixels on the contour l_x + l_y = r have the same probability of reaching O_new, and this probability decreases with the contour radius r. Thus the choice of p places an upper limit on the number of pseudo-images (child images) T which can be created before forming the PIN, limited by the size of the diamond shell. This contour-view with respect to every child pixel is similar, and so the fusion of several child images results in a globally smooth median operation. Since the median operation requires sorting in ascending order, only pixels contributing to significant texture are extracted from the diamond shell. The process of replacing the central spot can be viewed as a contraction of the pixel area about the pixel O_new. Thus randomized sampling provides some elasticity and resilience to scaling and cropping operations.

Fig. 4. Trajectory of O_P1, O_P2, O_P3 from the host which form the child image. The central pixel O_new is a child pixel.
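A small numeric sketch of Eqn. (11) illustrates how quickly the migration probability falls off with the contour radius r = l_x + l_y; the function name is ours and the printed values are purely illustrative:

def reach_probability(l_x, l_y, p):
    """Eqn. (11): probability that a pixel l_x columns and l_y rows away
    ends up as the child pixel at O_new."""
    return (1.0 - p) ** (l_x + l_y) * p

p = 0.16
for r in range(5):                                   # contour radius r = l_x + l_y
    print(r, round(reach_probability(r, 0, p), 4))   # identical anywhere on the diamond shell of radius r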

V. EXPERIMENTAL RESULTS

A 512 × 512 Lena gray-scale image is used as the host image I, the template for searching the database (Fig. 5(a)). By downsampling (p = 0.16, i.e. approximately one in seven pixels selected, a tradeoff between the degree of elasticity required and the preservation of spatial texture details), T = 16 child pseudo-images are created (Fig. 5(e)). These child images are stacked and subjected to two different forms of filtering, median fusion and then averaging, to produce two filtered images, ICF and the mean threshold image Th. By identifying the spatial positions where ICF > Th, an 87 × 87 binary PIN Z⃗ is extracted (Fig. 5(d)).

Fig. 5. (a) Original Lena, (b) Image obtained by median fusion of the child images, (c) Image obtained by averaging, (d) PIN Z⃗ extracted from host (a), (e) Pseudo-images obtained by downsampling the original host image.

The database used was self-constructed by integrating the following types of images:
• Transformed versions of the Lena image: (i) cropped and scaled, (ii) with additive noise, (iii) rotated by ±5 degrees and scaled, (iv) JPEG compressed, (v) sharpened, (vi) with periodic snip insertions, (vii) radial blurring, (viii) median filtering, (ix) brightness and contrast changes, (x) with bubbles.
• Face images of different people.
• Natural images.
• Fingerprint images.
• Medical images (including MRI and angiogram images).
The total number of images in the database was 64. The metric used for comparing hash values (or PINs) from the query PIN Z⃗_Q and a test image PIN Z⃗_T was the normalized Hamming distance,

$d_{Q,T} = \frac{1}{N_{ch}^2} \sum_{i=0}^{N_{ch}-1} \sum_{j=0}^{N_{ch}-1} \left[\vec{Z}_Q(i,j) \oplus \vec{Z}_T(i,j)\right]$    (12)

From the various test images, PINs were extracted in a similar fashion. The distances were sorted in ascending order and the top 15 closest matches are displayed in Fig. 6. Observe that amongst the top 15, the first 11 images are transformed versions of the original Lena (including the original), which illustrates the robustness of the hashing algorithm. Quantification of 'perceptual relevance' is application dependent, and if we are interested in more than one match, we must define a decision boundary in relation to the minimum distance. If we treat noisy and processed versions of the original image as perceptually relevant, we should allow images which are at least within 10% of the minimum distance. This tolerance may change if we include geometric transformations. Results of the top 4 matches for other queries are shown in Figs. 7 and 8. In the case of Fig. 7, the query image is the 'woman with a hat' (same orientation as Lena's hat). Since the test set includes the original image, the closest match is with the original image itself (distance of 0.347). Observe that this distance is not ZERO, since we are treating images as statistical abstractions. This imparts a certain elasticity to the hash representation of a particular image. Since there are no other images at a distance of 0.377 or less, one can say that there are no false positives. In Fig. 8, the query is the image of a 'diplomat'. Once again there are no images other than the original within 10% of the minimum distance. However, there is one problem with this hashing approach. It cannot be used as a hierarchical classifier as it extracts global statistical features. This is evident from the ranking results in Fig. 7 and Fig. 8, which contain some unrelated (perceptually irrelevant) images. The main advantage, however, is the resilience to geometric transformations and to forms of distortion which tend to preserve the perceptual value in the query image.
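For completeness, the matching step of Eqn. (12) amounts to counting differing PIN bits; the sketch below assumes a hypothetical container named database that maps image names to their precomputed PINs:

import numpy as np

def normalized_hamming(Z_q, Z_t):
    """Eqn. (12): fraction of differing bits between two equal-sized binary PIN matrices."""
    assert Z_q.shape == Z_t.shape
    return np.count_nonzero(Z_q != Z_t) / Z_q.size

# ranked = sorted((normalized_hamming(Z_query, Z), name) for name, Z in database.items())
# images within 10% of the smallest distance would then be treated as perceptually relevant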

VI. CONCLUSIONS

In this paper we have presented a fast and computationally simple procedure for abstracting images by deriving several randomly downsampled child pseudo-images from a parent image and then evaluating the median and the mean of this stack. The median image is thresholded with respect to the mean filtered image to produce the PIN. This relative information can be used for piecewise trend discovery, which in turn can be applied to the process of image searching. Experimentally, we have shown that the hashing algorithm is robust to geometric transformations, noise addition, cropping, signal processing operations and lossy compression.

REFERENCES
[1] V. Roussev, "Hashing and data fingerprinting in digital forensics," IEEE Security and Privacy Magazine, vol. 7, pp. 49–55, March 2009.
[2] J. P. G. N. G. D. Yadav, A. K. Sharma, and A. Mahajan, "Architecture for parallel crawling and algorithm for change detection in web pages," Int. Conference on Information Technology (ICIT), pp. 258–264, Dec. 2007.
[3] U. Uludag, S. Pankanti, S. Prabhakar, and A. K. Jain, "Biometric cryptosystems: issues and challenges," Proceedings of the IEEE, vol. 92, no. 6, pp. 948–960, 2004.
[4] R. Venkatesan, S.-M. Koon, M. Jakubowski, and P. Moulin, "Robust image hashing," in Proc. International Conference on Image Processing, vol. 3, pp. 664–666, 2000.
[5] V. Monga and B. L. Evans, "Perceptual image hashing via feature points: Performance evaluation and tradeoffs," IEEE Transactions on Image Processing, vol. 15, pp. 3452–3465, Nov. 2006.
[6] V. Monga, A. Banerjee, and B. L. Evans, "A clustering based approach to perceptual image hashing," IEEE Transactions on Information Forensics and Security, vol. 1, pp. 68–79, March 2006.

Fig. 6. Closest matches arranged in the order of increasing distance with respect to the original Lena image.

Fig. 7. Query image of a woman with a hat (similar orientation as Lena's hat) and the closest matches.

Fig. 8. Query image of the diplomat and the closest matches. The normalized Hamming distance of the original image with itself is 0.34, which is substantially lower than the other Hamming distances.
