IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 17, NO. 8, AUGUST 2015


Asymmetric Cyclical Hashing for Large Scale Image Retrieval Yueming Lv, Wing W. Y. Ng, Senior Member, IEEE, Ziqian Zeng, Daniel S. Yeung, Fellow, IEEE, and Patrick P. K. Chan, Member, IEEE

Abstract—This paper addresses a problem in the hashing technique for large scale image retrieval: learning a compact hash code to reduce the storage cost with performance comparable to that of a long hash code. A longer hash code yields a better precision rate of retrieved images. However, it also requires larger storage, which limits the number of stored images. Current hashing methods employ the same code length for both queries and stored images. We propose a new hashing scheme using two hash codes with different lengths for queries and stored images, i.e., the asymmetric cyclical hashing. A compact hash code is used to reduce the storage requirement, while a long hash code is used for the query image. The image retrieval is performed by computing the Hamming distance between the long hash code of the query and the cyclically concatenated compact hash code of the stored image to yield high precision and recall rates. Experiments on benchmarking databases consisting of up to one million images show the effectiveness of the proposed method.

Index Terms—Asymmetric hashing with different code lengths, hashing, large scale image retrieval.

I. INTRODUCTION

WITH the explosive growth of both the number of images available on the Internet and the dimensionality of image descriptors, two problems must be addressed: retrieval speed and storage cost [1]. On the issue of speed, hashing methods have a sublinear time complexity for solving large scale content based image retrieval problems [2], [3]. The computation of Hamming distances between the hash codes of the query and the stored images is very fast and can be accomplished with a simple data structure and efficient bit operations [4], [5]. Even the exhaustive computation of millions of Hamming distances can take only a second using a single CPU [1], [6]. Many hashing methods have been proposed in recent years for better retrieval performance. They can be divided into two major categories: random projection based and data-dependent methods.

Manuscript received October 22, 2014; revised February 02, 2015 and May 17, 2015; accepted May 19, 2015. Date of publication May 25, 2015; date of current version July 15, 2015. This work was supported by the National Natural Science Foundation of China under Grant 61272201 and the Program for New Century Excellent Talents in University of China under Grant NCET-11-0162. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Cees Snoek. (Corresponding author: Wing W. Y. Ng.) The authors are with the School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, China (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TMM.2015.2437712

The Locality Sensitive Hashing (LSH) is the first and most popular random projection based hashing method, with a strong theoretical foundation [3], [7]–[10]. The major drawback of the LSH and its variants (e.g., the SKLSH) is the requirement of a long hash code to achieve a high precision for the similarity estimation. This leads to a large storage cost and even slow hard disk access for large scale databases. To deal with this problem, data-dependent methods [2], [11]–[13] have been proposed to find compact hash codes by learning a mapping from the high dimensional real-valued vectors describing images to low dimensional hash codes. These data-dependent methods achieve satisfactory performances with short hash codes, but they fail to improve when the number of bits increases, as the LSH and its variants do. Recently, many sophisticated data-dependent hashing methods [4], [5], [14]–[18] have shown better performances by using longer hash codes. However, the storage cost of a long hash code for a large scale database is very large, and it limits the number of images that can be stored in memory. When the memory cannot accommodate the hash codes of all images, frequent access to hard disks or a distributed system is needed, which is much slower than direct memory access [1]. As a result, queries may collapse because of a long response time, which severely limits the application of hashing methods with long codes to large scale databases. For an image retrieval system processing millions of queries per second on the Internet, slow access to image hash codes costs much more time than the fast Hamming distance computation.

To address this problem created by using the same code length for both the query and the stored images, we propose the Asymmetric Cyclical Hashing (ACH). The major contribution of the ACH is to use a short hash code with $k$ bits of storage for each stored image and $n = mk$ bits for the query, providing a good retrieval performance by using a long hash code for the similarity computation, where $m$ and $k$ are positive integers. In contrast, the asymmetric hashing presented in [4] uses two different hash codes with the same length and thus still suffers from the aforementioned limitation. We select $n = mk$ bits for the query for ease of Hamming distance computation. In other words, the ACH computes the Hamming distances between the $n$-bit hash code of the query and the $m$-times repetitive concatenation of the $k$-bit hash codes of the stored images. Let $x$ be the descriptor vector of an image and $n = mk$. Fig. 1 provides a visual illustration of the ACH. The ACH uses $f(x)$ to generate the $n$-bit hash code for the query image, as all hashing methods do. In contrast, for each image in the database, the ACH uses $g(x)$ to generate a $k$-bit hash code $b$. During the query execution, $b$ is repeated $m$ times and the resulting $k$-bit codes are concatenated to form an $n$-bit hash code for computing the Hamming distances between the query and the stored images. Therefore, the storage requirement for each image is the short $k$-bit code, while long $n$-bit codes are used for the similarity computation during the query execution.


Fig. 1. Graphical illustration of the ACH.
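As a concrete illustration of this cyclical comparison, the following minimal sketch (ours, not part of the original paper; all names are illustrative) computes the Hamming distance between an $n$-bit query code and the $m$-fold cyclical concatenation of a $k$-bit stored code:

```python
import numpy as np

def cyclical_hamming(query_code: np.ndarray, short_code: np.ndarray) -> int:
    """Hamming distance between an n-bit query code and the m-fold
    cyclical concatenation of a k-bit database code (n = m * k)."""
    n, k = query_code.size, short_code.size
    assert n % k == 0, "query length must be a multiple of the short length"
    m = n // k
    long_code = np.tile(short_code, m)  # cyclical concatenation of the short code
    return int(np.count_nonzero(query_code != long_code))

# Toy example with k = 4 and m = 3 (so n = 12).
q = np.array([1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1], dtype=np.uint8)
b = np.array([1, 0, 1, 1], dtype=np.uint8)
print(cyclical_hamming(q, b))  # -> 2
```

In a deployed system the codes would be packed into machine words and compared with XOR and popcount; the array form above is used only for readability.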

This paper is organized as follows. Section II briefly introduces related works, while the ACH is proposed in Section III. Experimental results are presented and discussed in Section IV. Section V concludes this work.

II. RELATED WORKS

Only hashing methods with Euclidean ground truth will be discussed in this work. Semantic based image retrieval and supervised or semi-supervised hashing methods will not be covered. The LSH is the most widely used random projection based hashing method [3]. It has many variants, for instance, the $\ell_p$-norm LSH [10], the LSH forest [19], the entropy-based LSH [20], and the multi-probe LSH [21]. Several kernel projection based LSH methods have also been proposed: the Kernel LSH [8], the SKLSH [7], and the image kernel LSH [22]. The locality preserving principle states that, if hash functions are randomly selected, two images will be hashed into the same hash code with a probability directly proportional to the actual similarity between these two images. Therefore, a long hash code is required for high image retrieval precision in LSH based methods. A long hash code increases both the storage cost and the similarity computation time.

One solution is to use a data-dependent hashing method. The spectral hashing [11] assumes that the data follows a uniform distribution, and the hash code is computed using a one dimensional Laplacian eigenfunction. The major drawback of the spectral hashing is that the assumption of a uniform distribution is rarely met in applications. The unsupervised sequential projection learning for hashing [2] iteratively learns a hash bit by correcting the errors made by each previous bit. The Iterative Quantization (ITQ) [12], [5] minimizes the average quantization error between the real-valued outputs of the hash functions and the currently assigned hash codes of all images by iteratively rotating the orthogonal projection of the hash functions. The anchor graph hashing [13] obtains a tractable low-rank adjacency matrix using an anchor graph which captures neighborhood structure information from the data. The hash code is then

obtained via a Laplacian eigenfunction of the adjacency matrix. The performances of these data-dependent methods do not significantly improve when the number of hash bits increases.

However, there are some data-dependent hashing methods which show that a long code can achieve better performance than a short code. The binary reconstructive embedding [23] generates hash functions by minimizing the reconstruction error between the original Euclidean distances and the Hamming distances of the binary embedding. The minimal loss hashing (MLH) [14] learns similarity preserving hash functions by minimizing the upper bound of a hinge-like loss function. Because it first performs the Principal Component Analysis (PCA), the maximum number of hash bits of the ITQ is bounded by the number of dimensions of the original data. Hence, the ITQRFF [5] applies a kernel embedding to the PCA using the Random Fourier Features (RFF), allowing the ITQ to use a hash code with more bits than the number of dimensions of the original data. The Orthogonal K-means (OKM) [16] uses the vertices of the binary hypercube (constructed by the hash codes in the Hamming space) to approximate the centers of k-means clustering to improve the performance of the ITQ. The bilinear projection-based hashing [17] focuses on improving the performance of very high dimensional codes for extremely high dimensional visual descriptors (e.g., 4,000 bits or higher). It employs a bilinear rotation to maximize the angle between a rotated feature vector and its corresponding binary codes. The Spherical Hashing [24] uses hyperspheres to formulate hash functions which yield 1 when a data point is located inside the hypersphere and 0 otherwise. It iteratively optimizes the pivots and the thresholds of the hyperspheres to satisfy both the independence and the balancing properties. However, its bits are not equally important, because a bit equal to 1 provides precise location information about a data point while a bit equal to 0 only indicates that the data point is located outside the hypersphere. In [25], distance lookup tables are explicitly optimized to approximate the original distances between the query and the database codes to address the ranking issue. By assuming that both the query and the database points follow the same distribution, it minimizes the error between the original pairwise squared Euclidean distances of database points and the corresponding approximated distances. Its major drawback is the requirement of storing additional distance lookup tables. Owing to the page limit, we refer readers to the comprehensive surveys on hashing in [26] and [27].

Dong et al. [28] present an asymmetric scheme based on random projections to improve the precision rate of image retrieval. It employs the real-valued outputs of the hash functions to calculate a weighted Hamming distance based lower bound of the Euclidean distance between the query and the stored images. Gordo et al. [29] further generalize it to a broader family of binary embeddings. A weighted Hamming distance, based on the lower bound of the Euclidean distances between the real-valued outputs of the hash functions of the query and the binary hash codes of the stored images, is computed to enhance the image retrieval precision. In addition, an expectation-based asymmetric distance is proposed in [1]. These methods achieve the precision improvement at the cost of more computations in addition to the computation of the Hamming distance


between the query and the stored images. The Asymmetric Hashing (AH) [4] generates discriminative codes by employing two distinct hash functions for the query and for the stored images, with both hash functions having the same number of hash bits. Therefore, it requires a large storage for a long hash code. Although the aforementioned methods can achieve satisfactory performance with long codes, the large storage cost severely limits their usefulness for large scale retrieval problems. The ACH is proposed to address this problem.

III. ASYMMETRIC CYCLICAL HASHING

The aim of the ACH is to achieve both storage efficiency, by storing only a short hash code for the stored images, and a high similarity precision for image retrieval, by using a long hash code during the query execution. These two goals are usually incompatible, but we show that the ACH can achieve both. The ACH uses two distinct hash functions: $f$ for the query and $g$ for the stored images. The goal of designing the hash function $g$ is to preserve the similarity information of the original descriptor space as much as possible using a small storage cost. In contrast, during the query execution, bit operations are fast and the computation of a single long hash code for the query is negligible. Therefore, a long hash code is generated by $f$ to preserve the similarity between the query and the stored images in a high dimensional Hamming space for a better precision. The major research problem of the ACH is how to preserve the similarity between the query and the stored images using two different hash functions with different lengths. From the data compression point of view, $g$ compresses the original descriptor vectors of the stored images to low dimensional $k$-bit binary hash codes with a high compression ratio for a smaller storage requirement. In contrast, $f$ uses a smaller compression ratio for a better precision of the similarity comparison among images.

Algorithm 1 presents the ACH in two phases: mapping and quantization. In the mapping phase, the descriptor vector of an image is mapped onto a high dimensional real-valued space ($\mathbb{R}^n$) using the similarity preserving Random Fourier Features. The quantization phase finds $g$ and $f$ sequentially. Let $b$ be the $k$-bit hash code computed by $g$. Firstly, the ACH finds both the $k$-bit hash function $g$ and an orthogonal rotation matrix $R$ by minimizing the quantization error between the $n$-bit hash codes (each being the concatenation of the $m$-time repeated $b$) and the real-valued vectors of the stored images rotated by $R$. Then, an $n$-bit hash function $f$ is computed to minimize the quantization error between the rotated data and the corresponding hash codes generated by $f$.

Algorithm 1 Asymmetric Cyclical Hashing

Input:
$X$: the database
$n$: the number of bits for the query
$k$: the number of bits for images in the database
$\gamma$: the width parameter of the Gaussian kernel
$T$: the number of iterations

Procedures:
Phase 1: Compute the high dimensional RFF $V$ for $X$ by (1), by randomly drawing $w_1, \ldots, w_n$ and $\beta_1, \ldots, \beta_n$.
Phase 2: Compute the hash functions $g$ and $f$ using (8) and (11), respectively. Compute the hash codes $B$ for the database by (3).

Return:
Parameters of the RFF: $w_1, \ldots, w_n$ and $\beta_1, \ldots, \beta_n$
The hash function for the query: $f$
The hash function for the database: $g$
Hash codes of all images in the database: $B$

A. Phase 1: Mapping

The Random Fourier Features (RFF) were proposed by Rahimi and Recht [30] to approximate the value of a Mercer kernel. The SKLSH [7] applies the RFF to hashing based image retrieval and achieves a good performance using a long hash code. Consider a Mercer kernel $K(\cdot, \cdot)$ which satisfies the following properties P1 and P2:

P1: It is shift invariant: $K(x, y) = K(x - y)$; and
P2: It is normalized: $K(0) = 1$ and $|K(x - y)| \le 1$.

By Bochner's theorem, $K(x - y) = \mathbb{E}_{w \sim p(w)}\left[ e^{j w^T x} e^{-j w^T y} \right]$, where $p(w)$ denotes the distribution induced by the kernel. For a Gaussian kernel, i.e., $K(x - y) = \exp(-\gamma \| x - y \|^2 / 2)$, we have $p(w) = \mathcal{N}(0, \gamma I)$. By randomly drawing an i.i.d. sample $w_1, \ldots, w_n \sim p(w)$ and $\beta_1, \ldots, \beta_n \sim U[0, 2\pi]$, we have the following mapping:

$$z(x) = \sqrt{2/n}\, \left[ \cos(w_1^T x + \beta_1), \ldots, \cos(w_n^T x + \beta_n) \right] \quad (1)$$

where $d$ ($n$) denotes the number of dimensions of the original descriptor space (the mapped space). Then, $z(x) z(y)^T \approx K(x - y)$ for all $x$ and $y$. By the uniform convergence property of the RFF, the new $n$-dimensional space preserves the pairwise similarity of all pairs of images in the original $d$-dimensional space. In this work we use the RFF with the Gaussian kernel to map the stored images from the original descriptor space onto a high dimensional feature space $V = [z(x_1)^T, \ldots, z(x_N)^T]^T \in \mathbb{R}^{N \times n}$, where $N$ denotes the number of stored images. The Gaussian kernel is selected because it preserves the Euclidean-based similarity between images well.
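The mapping of (1) can be written down directly. The sketch below is a hedged illustration under the Gaussian-kernel reconstruction above (the function names and the random seed are ours, not the authors'):

```python
import numpy as np

rng = np.random.default_rng(0)

def draw_rff_params(d: int, n: int, gamma: float):
    """Draw the RFF parameters for the Gaussian kernel
    K(x - y) = exp(-gamma * ||x - y||^2 / 2):
    w ~ N(0, gamma * I) and beta ~ U[0, 2 * pi]."""
    W = rng.normal(scale=np.sqrt(gamma), size=(d, n))
    beta = rng.uniform(0.0, 2.0 * np.pi, size=n)
    return W, beta

def rff(X: np.ndarray, W: np.ndarray, beta: np.ndarray) -> np.ndarray:
    """Map X (N x d) onto the n-dimensional RFF space of (1)."""
    n = W.shape[1]
    return np.sqrt(2.0 / n) * np.cos(X @ W + beta)

# z(x) z(y)^T approximates K(x - y); the estimate tightens as n grows.
gamma = 0.1
X = rng.normal(size=(2, 8))
W, beta = draw_rff_params(d=8, n=4096, gamma=gamma)
Z = rff(X, W, beta)
print(Z[0] @ Z[1], np.exp(-gamma * np.sum((X[0] - X[1]) ** 2) / 2.0))
```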

B. Phase 2: Quantization

The $k$-bit short codes computed by $g$ for all stored images are denoted by $B \in \{-1, 1\}^{N \times k}$. Then, the $m$-time repetitive concatenation of the $k$-bit hash codes for the entire database is denoted by $\bar{B} = [B, B, \ldots, B] \in \{-1, 1\}^{N \times n}$. During the query execution, the Hamming distance


is used to compare the hash code of the query with $\bar{B}$ to find the nearest neighboring images. Hence, the ACH preserves the similarity among images using $\bar{B}$ in an $n$-dimensional Hamming space instead of a $k$-dimensional one, while requiring that $\bar{B}$ consist of $m$ repetitions of the $k$-bit hash codes. The orthogonal rotation is conformal; hence we find the binary codes $B$ using an orthogonal rotation $R$ on the mapped images $V$ to minimize the quantization error $Q$ as follows:

$$\min_{B, R}\; Q(B, R) = \left\| \bar{B} - V R \right\|_F^2 \quad (2)$$

where $R$ is an $n \times n$ orthogonal matrix and $\| \cdot \|_F$ denotes the Frobenius norm. Both the ITQ and the OKM use a similar objective function. The ITQ finds a rotation matrix in the subspace constructed by the top principal components to minimize the quantization error. On the other hand, the OKM finds a rotation matrix to minimize the quantization error in the original feature space. The ACH finds a rotation matrix to minimize the quantization error between the concatenated binary hash codes and the real-valued vectors of the mapped feature space. For the extreme case $m = 1$, the ACH is identical to the ITQRFF. Hence the ACH is more general than the ITQRFF, in that it provides flexibility on the compression ratio ($1/m$) for the user to trade off between the storage efficiency and the precision rate of retrieval.

The quantization error in (2) is optimized by the coordinate descent algorithm [5], [16], which solves two subproblems to optimize $B$ and $R$ iteratively by fixing the other. Firstly, $R$ is initialized by a random rotation. Then, for the subproblem of fixing $R$ and optimizing $B$, the global optimum is computed by the following equation:

$$B = \operatorname{sgn}\left( \sum_{j=0}^{m-1} (V R)_{jk+1:jk+k} \right) \quad (3)$$

where $(V R)_{jk+1:jk+k}$ denotes the matrix consisting of the $(jk+1)$th column to the $(jk+k)$th column of the matrix $V R$, and the function $\operatorname{sgn}(a)$ outputs $1$ when $a \ge 0$ and $-1$ otherwise, where $a$ is a real number.

Proof: For a fixed rotation matrix $R$,

$$Q(B, R) = \|\bar{B}\|_F^2 - 2 \operatorname{tr}\left( \bar{B} (V R)^T \right) + \|V R\|_F^2 \quad (4)$$

where $\|\bar{B}\|_F^2 = Nn$ and $\|V R\|_F^2 = \|V\|_F^2$ are constants. Hence the minimization of the quantization error using (4) is equivalent to the maximization of the following equation:

$$\operatorname{tr}\left( \bar{B} (V R)^T \right) = \sum_{i=1}^{N} \sum_{l=1}^{k} B_{il} \left( \sum_{j=0}^{m-1} (V R)_{i,\, jk+l} \right) \quad (5)$$

Owing to the fact that $B_{il} \in \{-1, 1\}$, the maximum of (5) can only be achieved by (3).

For the subproblem of fixing $B$ and optimizing $R$, $R$ is optimized via minimizing (2) subject to the constraint $R^T R = I$. This is equivalent to an orthogonal Procrustes problem [31], and the global optimum of $R$ can be found by a singular value decomposition of the matrix $V^T \bar{B}$, similar to [5], as follows:

$$V^T \bar{B} = S \Omega \hat{S}^T \quad (6)$$

$$R = S \hat{S}^T \quad (7)$$

By alternately updating $B$ and $R$, the ACH optimizes the quantization error in (2). In each step, the ACH finds the global optimal solution of the subproblem (fixing $R$ to find $B$ and fixing $B$ to find $R$), so it never increases the quantization error. Given the fact that the lower bound of the quantization error is zero, the ACH is guaranteed to converge to a local minimum, similar to the ITQ [5]. The final hash function $g$ for a stored image $x$ is computed by (8):

$$g(x) = \operatorname{sgn}\left( \sum_{j=0}^{m-1} \left( z(x) R \right)_{jk+1:jk+k} \right) \quad (8)$$

where $(z(x) R)_{jk+1:jk+k}$ denotes the vector consisting of the $(jk+1)$th entry to the $(jk+k)$th entry of $z(x) R$.

After finding the hash function for the stored images, the hash function for query images is generated in the same rotated feature space. Hence the ACH computes the similarity between the query and the stored images in the same rotated feature space. Given a fixed $R$, the hash code minimizing the quantization error in (2) is unique [5], so the hash function for the query image is computed directly using the fixed rotation matrix $R$. The optimization problem of finding the $n$-bit query codes $F \in \{-1, 1\}^{N \times n}$ is as follows:

$$\min_{F}\; \left\| F - V R \right\|_F^2 \quad (9)$$

Note that

$$\left\| F - V R \right\|_F^2 = \|F\|_F^2 - 2 \operatorname{tr}\left( F (V R)^T \right) + \|V R\|_F^2 \quad (10)$$

Both $\|F\|_F^2 = Nn$ and $\|V R\|_F^2$ are constants, so the minimization of (9) is equivalent to the maximization of $\operatorname{tr}(F (V R)^T)$. Given that $F_{ij} \in \{-1, 1\}$, the term $\operatorname{tr}(F (V R)^T)$ is maximized if the sign of $F_{ij}$ and that of $(V R)_{ij}$ is the same for all bits. So, the ACH computes the $n$-bit hash code of an image $x$ by

$$f(x) = \operatorname{sgn}\left( z(x) R \right) \quad (11)$$
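The whole of Phase 2 fits in a few lines of matrix code. The sketch below is our reading of (2), (3), (6)-(8), and (11), not the authors' implementation; it returns ±1 codes and maps sign zeros to +1 by convention:

```python
import numpy as np

def train_ach(V: np.ndarray, k: int, m: int, T: int = 50, seed: int = 0):
    """Alternate (3) and (6)-(7) to minimize ||[B,...,B] - V R||_F^2
    over B in {-1,1}^{N x k} and an orthogonal n x n rotation R."""
    rng = np.random.default_rng(seed)
    N, n = V.shape
    assert n == m * k
    R, _ = np.linalg.qr(rng.normal(size=(n, n)))  # random initial rotation
    for _ in range(T):
        VR = V @ R
        # (3): sign of the sum of the m column blocks of VR.
        B = np.sign(VR.reshape(N, m, k).sum(axis=1))
        B[B == 0] = 1
        B_bar = np.tile(B, (1, m))  # [B, B, ..., B]
        # (6)-(7): orthogonal Procrustes step via the SVD of V^T B_bar.
        S, _, St_hat = np.linalg.svd(V.T @ B_bar)
        R = S @ St_hat
    return B, R

def g_hash(z: np.ndarray, R: np.ndarray, k: int, m: int) -> np.ndarray:
    """(8): the k-bit code of a stored image from its RFF vector z."""
    code = np.sign((z @ R).reshape(m, k).sum(axis=0))
    code[code == 0] = 1
    return code

def f_hash(z: np.ndarray, R: np.ndarray) -> np.ndarray:
    """(11): the n-bit code of a query image from its RFF vector z."""
    code = np.sign(z @ R)
    code[code == 0] = 1
    return code
```

A usage pass would map the database with the RFF of (1), call train_ach, store only the $k$-bit codes from g_hash, and hash each query with f_hash.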

In this way, both $f$ and $g$ are optimized in the rotated RFF projected space, i.e., $V R$. Therefore, both $f$ and $g$ preserve the similarity of images in the $n$-dimensional Hamming space. Since $g(x)$ repeats $k$ bits $m$ times while $f(x)$ is a full hash code with $n$ bits, the precision of $g$ is expected to be weaker than that of $f$. The ACH trades off the similarity precision against the compression ratio ($1/m$) for storage efficiency. Experimental results show that the compression does not reduce the precision significantly.

C. Time Complexity Analysis

The time complexities of phases 1 and 2 of the ACH are $O(dnN)$ and $O(Tn^2N)$, respectively, where $T$ denotes the number of iterations of the coordinate descent algorithm and is a constant (usually set to 50 as in [5]). The overall time complexity of the ACH is dominated by the time complexity of phase 2, i.e., $O(Tn^2N)$, which is linear in the number of stored images. The value of $n$ is a user selected parameter which trades off the similarity precision against the computational complexity during the query execution. A larger number of hash bits leads to a higher precision. During the query execution, the time complexity of computing the hash code for the query image is $O(dn + n^2)$. A $k$-bit ITQRFF requires $O(dn + nk)$, where $k \le n$. Usually $d$ is comparable to or larger than $n$, so the time complexity of the ITQRFF is not much better than that of the ACH.

D. Relationship to Query Sensitive Weighted Hamming Distance

For the case of using a $k$-bit short hash code for an image in the database and an $n$-bit long hash code for the query image, the Hamming distance computed by the ACH between the $n$-bit query code $q$ and the $n$-bit cyclical database code is calculated as follows:

$$d_H(q, b) = \sum_{j=0}^{m-1} \sum_{i=1}^{k} \mathbb{1}\left( q_{jk+i} \neq b_i \right) \quad (12)$$

where $q \in \{0, 1\}^n$, $b \in \{0, 1\}^k$, and $\mathbb{1}(\cdot)$ denotes the indicator function: it outputs 1 when the condition is true and 0 otherwise. Equation (12) can be rewritten as

$$d_H(q, b) = \sum_{i=1}^{k} \sum_{j=0}^{m-1} \mathbb{1}\left( q_{jk+i} \neq b_i \right) \quad (13)$$

By rewriting (13), we have

$$d_H(q, b) = \sum_{i=1}^{k} \left( w_i^{0}\, \mathbb{1}(b_i = 1) + w_i^{1}\, \mathbb{1}(b_i = 0) \right) \quad (14)$$

where $w_i^{0} = \sum_{j=0}^{m-1} \mathbb{1}(q_{jk+i} = 0)$ and $w_i^{1} = \sum_{j=0}^{m-1} \mathbb{1}(q_{jk+i} = 1)$. The weight $w_i^{0}$ ($w_i^{1}$) counts the number of bits equal to zero (one) among the $i$th bit, the $(k+i)$th bit, …, and the $((m-1)k+i)$th bit of the query code $q$. The equivalent weighted Hamming distance in (14) is query sensitive because the weights $w_i^{0}$ and $w_i^{1}$ for the $i$th bit are determined by the query code $q$. Therefore, the ACH is equivalent to a query sensitive weighted Hamming distance with $k$-bit hash codes for both the images in the database and the query image. In contrast to the real-valued weighted Hamming distance, the computations of the ACH are in binary.
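The equivalence between (12) and (14) is easy to verify numerically; the following check (our illustration, with arbitrary random codes) compares the direct cyclical distance against the weighted form:

```python
import numpy as np

rng = np.random.default_rng(1)
m, k = 4, 8
q = rng.integers(0, 2, size=m * k)  # n-bit query code, n = m * k
b = rng.integers(0, 2, size=k)      # k-bit database code

# (12): direct Hamming distance against the cyclical long code.
d_direct = int(np.count_nonzero(q != np.tile(b, m)))

# (14): query sensitive weighted Hamming distance on the k-bit code.
Q = q.reshape(m, k)         # row j holds the bits q_{jk+1}, ..., q_{jk+k}
w0 = (Q == 0).sum(axis=0)   # zeros among the i-th, (k+i)-th, ... query bits
w1 = (Q == 1).sum(axis=0)   # ones among the same positions
d_weighted = int(np.sum(w0 * (b == 1) + w1 * (b == 0)))

assert d_direct == d_weighted
print(d_direct, d_weighted)
```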

IV. EXPERIMENTS

Four large scale databases are used to evaluate the performance of the ACH: the 22KLabelMe [32], the CIFAR10 [33], the SIFT1M [34], and the GIST1M [34]. The 22KLabelMe database [32] contains 22,019 images described by a 512-dimensional GIST descriptor. The CIFAR10 database [33] is a subset of the Tiny Images database and consists of 60,000 images. A 320-dimensional GIST feature set is used for the CIFAR10 as in [5]. Both the SIFT1M [34] and the GIST1M [34] databases consist of one million images. Each image is represented by 128-dimensional SIFT features and 960-dimensional GIST features in the SIFT1M and the GIST1M databases, respectively. The average Euclidean distance to the 50th nearest neighbor is used as the threshold of the ground truth as in [5]. For all the experiments, we randomly select 1,000 images as testing queries and the remaining images as training images. For each query, images are retrieved from the training sets.

The proposed method is compared with six popular hashing methods: the Iterative Quantization (ITQ) [5], the ITQ with Random Fourier Features (ITQRFF) [5], the Orthogonal K-means (OKM) [16], the Asymmetric Hashing (AH) [4], the Locality Sensitive Hashing (LSH) [10], and the Shift-invariant Kernel based Locality Sensitive Hashing (SKLSH) [7]. For all experiments, hashing with 32-bit, 64-bit, 128-bit, and 256-bit codes is compared. However, the code lengths of the ITQ and the OKM are limited by the number of dimensions of the image descriptor. Hence, the ITQ and the OKM cannot create a 256-bit hash code for the SIFT1M database with its 128-dimensional image descriptor. For the ITQ, the ITQRFF, the OKM, and the AH, we used the codes provided by the authors. For the SKLSH and the LSH, the codes provided in the ITQ program package were used directly. For the ITQ, the ITQRFF, the OKM, and the ACH, the number of iterations is set to 50 (the same as the default value in the ITQ and the OKM codes). For the SKLSH, the ITQRFF, and the ACH, the bandwidth of the Gaussian kernel is set equal to the average value of the Euclidean distances to the 50th nearest neighbors of the stored images. For the ITQRFF and the ACH, the number of Random Fourier Features ($n$) is set to 1024 for the experiments on the 22KLabelMe and the CIFAR10. For the SIFT1M and the GIST1M databases, the number of Random Fourier Features is set to 768 (i.e., $n = 768$) owing to the memory limitation. For the AH, we set its parameters as suggested in [4].

Figs. 2 to 5 plot the precision-recall curves on the 22KLabelMe [32], the CIFAR10 [33], the GIST1M [34], and the SIFT1M [34] databases, respectively.


Fig. 2. Precision recall curves on 22KLabelMe. (a) 22KLabelMe-32 bits. (b) 22KLabelMe-64 bits. (c) 22KLabelMe-128 bits. (d) 22KLabelMe-256 bits.

Fig. 3. Precision recall curves on CIFAR10. (a) CIFAR10-32 bits. (b) CIFAR10-64 bits. (c) CIFAR10-128 bits. (d) CIFAR10-256 bits.

Fig. 4. Precision recall curves on GIST1M. (a) GIST1M-32 bits. (b) GIST1M-64 bits. (c) GIST1M-128 bits. (d) GIST1M-256 bits.

Fig. 5. Precision recall curves on SIFT1M. (a) SIFT1M-32 bits. (b) SIFT1M-64 bits. (c) SIFT1M-128 bits. (d) SIFT1M-256 bits.

The ACH consistently outperforms all the other hashing methods in all experiments on the four databases with the same storage cost (i.e., the same number of hash bits). The AH yields a good performance using the 32-bit hash codes, which may be due to the better hashing efficiency yielded by the asymmetric hashing scheme. However, the AH shows little improvement when the hash code gets longer. This may be caused by the Stochastic Gradient Descent based training method of the AH, which is not efficient for training the large number of parameters of a long hash code. The OKM achieves a performance similar to the AH and outperforms the ITQ, which is consistent with the results shown in [16]. The ITQ yields better results than the ITQRFF on short hash codes and does not achieve significant improvement for longer hash codes. The ITQRFF achieves a better performance for the long hash codes, which is consistent with the findings

in [5], [12]. In contrast to the ITQ and the OKM, which find the hash functions based on the largest principal components or largest eigenvectors, both the LSH and the SKLSH are random projection based. Therefore, they both yield significant improvement for longer hash codes. Both the ACH and the ITQRFF first randomly project the image feature vectors to a high dimensional space and then reduce the number of dimensions to the required number of bits. Therefore, both methods yield a better performance when the number of hash bits ($k$) increases. The ACH yields a better performance than the ITQRFF because the asymmetric hashing scheme of the ACH provides more precise location information for the query.

In summary, the performance of the RFF based hashing methods (i.e., the SKLSH, the ITQRFF, and the ACH) improves when the length of the hash code increases. However, long hash codes are usually needed to achieve satisfactory precision-recall rates for


RFF based methods, especially for the SKLSH. The ACH combines the benefits of the asymmetric hashing scheme and the RFF projection to achieve better precision and recall rates in comparison to the other methods at the same storage cost. Therefore, the ACH yields better storage efficiency as well as higher similarity precision.

Fig. 6 plots the Hamming ranking versus recall rate curves on the four databases with 64-bit hashing. Each point on the Hamming ranking curves corresponds to the recall rate and the average number of images retrieved with a different Hamming radius. It shows that the ACH yields the highest recall rate for the same number of retrieved images among all methods in all experiments. The ITQRFF yields the second highest recall rate except on the GIST1M database. This may be due to the high dimensionality of the GIST1M database (960 dimensions), which requires higher dimensional Random Fourier Features and more principal component directions (i.e., bits) to yield a good performance for the non-asymmetric hashing based ITQRFF.

Fig. 7 shows retrieval results of different hashing methods with 64-bit code length for three sample queries from the CIFAR10 database. Each row shows the query image (left-most) and its corresponding top retrieved images for a particular hashing method, ranked by the Hamming distance from the query image. Although the ACH is an unsupervised hashing method, its retrieval results are still semantically consistent with the query to some degree. The ACH yields the best precision rates. However, none of the methods can distinguish well between the images of planes and birds. This is caused by the low level features used to describe the images, which yield similar feature values for images of birds and planes. For instance, birds and planes have similar shapes and their images usually have similar backgrounds (i.e., sky). One possible way to overcome this problem is to use a set of high level conceptual features for semantic image retrieval when using the ACH.

Fig. 8 shows that the mean loss value of the ACH decreases very fast in the first 5 iterations and tends to converge after 20 iterations. Therefore, for a further improvement of the efficiency of the ACH, an early stopping method can be used to reduce the number of iterations used for the loss function minimization.

The ACH provides a mechanism, via the compression ratio, for the selection of the trade-off between the similarity precision and the storage efficiency. Fig. 9 shows that the ACH with 256-bit codes (i.e., $k = 256$ and $m = 4$) yields only a small decrease in the average precision in comparison to the ACH with 1024-bit codes (i.e., no compression, in which case the ACH reduces to the ITQRFF), while reducing the storage cost by 75%. Another example shown in Fig. 9 is that the ACH with 64-bit codes ($k = 64$, $m = 16$) uses only 6.25% of the storage cost while maintaining 73.70% of the average precision yielded by the ACH with 1024-bit codes. Hence, the ACH provides an effective mechanism for the trade-off between storage and average precision.

Tables I and II show both the training time and the on-line query time for the SIFT1M and the GIST1M databases, respectively, for different compression rates of the ACH. The on-line query times reported are averages over 1,000 queries. All programs run on the same computer with an Intel Xeon


Fig. 6. Hamming ranking versus recall curves on four databases. (a) 22KLabelMe-64 bits. (b) CIFAR10-64 bits. (c) GIST1M-64 bits. (d) SIFT1M-64 bits.

CPU E5-1620 v2 @ 3.70 GHz and 32 GB of RAM. All methods are programmed in Matlab, while the Hamming distances and precision-recall rates are computed using the C program provided by [14]. Over different $k$ values, the training times of the


Fig. 9. Average precision for the CIFAR10 Database with different compression ratios of the 1024-dimensional random Fourier features.

Fig. 7. Sample top retrieved images for a query. (a) Plane. (b) Horse. (c) Car.

Fig. 8. Mean loss value of (2) of the ACH over all training samples for the experiment on the CIFAR10 database using 64 bits of storage ($k = 64$).

ACH are similar, while those of the other methods increase significantly. This is because the ACH with different $k$ values is trained using the same length of the long query code, i.e., $n$ bits. The ACH yields a smaller training time in comparison to the AH on both databases. Both the hardware and the software are 64-bit systems, which may be the reason why the on-line query time for 64-bit codes is smaller than that for 32-bit codes for all methods except the ACH. Regardless of the value of $k$, the ACH performs queries using 768-bit (i.e., $n$-bit) long codes, and therefore its on-line query time is larger than those of the other methods in the comparison and does not change over different $k$ values. In our experiments,

a linear scan over the one million images using Hamming ranking is performed to facilitate the comparison with other hashing methods using the precision-recall curves. However, for very large databases, retrieving only the images in the $r$-neighbors (i.e., the images yielding a Hamming distance of at most $r$ from the query) using hash table lookups is more efficient, where $r$ is a non-negative integer. In this case, the ACH can be combined with the Multi-Index Hashing (MIH) [6] for fast and efficient retrieval using hash table lookups. The MIH splits the $n$-bit long code into $s$-bit subcodes, where $s$ is a positive integer and $n$ is divisible by $s$. For retrieving the $r$-neighbors of the query, it first retrieves a set of candidate images by the union of the neighbors of each of the subcodes; then, the images in the candidate set with a Hamming distance larger than $r$ are filtered out by a Hamming distance filter using the full $n$-bit hash code. For the combination of the ACH and the MIH, the value of $s$ can be set equal to $k$ for ease of computation. We use an ACH with $n = 1024$, $k = 32$, $m = 32$, and $r$ up to 63 as an example. The long hash code for a query in the ACH is divided into 32 32-bit hash codes, and each corresponds to a copy of the short code. In this case, the MIH only needs to return the union of the 1-neighbors of all subcodes (i.e., the 32-bit codes) and filter out the images in the union yielding a Hamming distance larger than $r$ for the whole long code. The computation of the candidate image set requires only $32 \times 33 = 1056$ hash table lookups, and the filtering uses very little computational time because the candidate set is much smaller than the total number of images in the database. The benefit of the combination of the ACH and the MIH over the original MIH is that only $k$-bit storage is required, while the original one requires $n$-bit storage for each image. On the other hand, one can divide the 1024-bit query code into 64 16-bit subcodes and the 32-bit hash codes of the images in the database into two 16-bit subcodes. In this case, the candidate set of images is computed by the union of the exact matches of the hash codes of the 64 subcodes. So only 64 hash table lookups are needed, but the number of images in the union may be larger than the one created by using 32 32-bit subcodes. Nonetheless, the combination of the ACH and the MIH yields a fast retrieval time when only images yielding a small Hamming distance are selected.
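A simplified sketch of this combination is given below (our illustration of the scheme described above, not the MIH implementation of [6]; all names are ours). It stores one lookup table over the single copy of the $k$-bit codes, probes it with the 1-neighbors of each $k$-bit subcode of the query, and then filters the candidates with the full cyclical distance; with 1-neighbors per subcode the candidate set is complete for $r \le 2m - 1$:

```python
import numpy as np
from collections import defaultdict

def build_table(B: np.ndarray):
    """Index the single stored copy of the k-bit database codes."""
    table = defaultdict(set)
    for img_id, code in enumerate(B):
        table[code.tobytes()].add(img_id)
    return table

def one_neighbors(code: np.ndarray):
    """The subcode itself plus every code at Hamming distance 1."""
    yield code.tobytes()
    for i in range(code.size):
        flipped = code.copy()
        flipped[i] ^= 1
        yield flipped.tobytes()

def ach_mih_query(q: np.ndarray, B: np.ndarray, table, k: int, m: int, r: int):
    """Candidates: union of the 1-neighbors of each k-bit subcode of q.
    Filter: full n-bit cyclical Hamming distance at most r."""
    candidates = set()
    for j in range(m):
        for key in one_neighbors(q[j * k:(j + 1) * k]):
            candidates |= table.get(key, set())
    Q = q.reshape(m, k)
    return [i for i in sorted(candidates) if int((Q != B[i]).sum()) <= r]

# Toy usage: 1,000 images with k = 8 and m = 4 (small k keeps the toy dense).
rng = np.random.default_rng(2)
k, m = 8, 4
B = rng.integers(0, 2, size=(1000, k), dtype=np.uint8)
q = rng.integers(0, 2, size=m * k, dtype=np.uint8)
print(ach_mih_query(q, B, build_table(B), k, m, r=6))
```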


TABLE I TRAINING AND ONLINE QUERY TIMES FOR THE SIFT1M DATABASE

TABLE II TRAINING AND ONLINE QUERY TIMES FOR THE GIST1M DATABASE

Experiments are performed on the CIFAR10 database to illustrate how to apply the MIH to accelerate the search procedure of the ACH. For all schemes, the code lengths of the query image and the database image are $n = 1024$ and $k = 32$ bits, respectively. The ACH computes the $n$-bit Hamming distance between the query and all images in the database, and images are retrieved according to the sorted Hamming distances. The other four schemes are different combinations of the ACH and the MIH method. Both the ACH-MIH-r1 and the ACH-MIH-r2 apply the MIH to the ACH directly with different radii, while both the Hypertable-ACH-r1 and the Hypertable-ACH-r2 train independent ACH tables and apply the MIH on them. As in the MIH method, the ACH-MIH-r1 is performed by retrieving the union of the 1-neighbors of the $k$-bit codes in the hash tables (each table consists of the same $k$-bit database codes, of which only one copy is stored, and is probed with one $k$-bit subcode of the $n$-bit query code) and then filtering out the images with a Hamming distance larger than $r$ for the $n$-bit long codes. The ACH-MIH-r2 is performed by retrieving the union of the 2-neighbors of the $k$-bit codes in the hash tables and then filtering out the images with a Hamming distance larger than $r$ using the $n$-bit long codes. This is equivalent to returning the intersection of the 2-neighbors of the hash tables, because the Hamming distance of the $n$-bit long code of an image is larger than $r$ if the image is not in the 2-neighbors of all the hash tables. Therefore, it does not need to recompute the Hamming distance of the $n$-bit long code on the candidate set. The Hypertable-ACH-r1 independently trains several ACH tables and performs the ACH-MIH-r1 for each ACH table to yield the union of the retrieval results. Similarly, the Hypertable-ACH-r2 independently trains several ACH tables, performs the ACH-MIH-r2 for each ACH table, and then returns the union of the retrieval results.

Fig. 10 shows the precision-recall curves of the ACH and the four combinations of the ACH and the MIH schemes on the CIFAR10 database. Owing to the fact that the MIH is a multi-table hashing method which enables an exact K-nearest neighbor search in the Hamming space, the precision-recall

Fig. 10. Precision-recall curves of the ACH and four different combinations of the MIH and the ACH on the CIFAR10 Database.

TABLE III RETRIEVAL PERFORMANCE WITHIN HAMMING RADIUS 2 ON CIFAR10 DATABASE

curves of the ACH-MIH-r1, the ACH-MIH-r2, and the ACH are the same. Both the Hypertable-ACH-r1 and the Hypertable-ACH-r2 yield better precision-recall rates because they are bagging methods with several ACH tables. Table III shows the retrieval performance within a Hamming radius equal to 2 of the $k$-bit codes for the four combination schemes on the CIFAR10 database. The Hypertable-ACH-r1 scheme yields the highest F1-score among the four schemes. Moreover, Table III shows that the ACH-MIH-r2 and the Hypertable-ACH-r2 yield higher precisions than the ACH-MIH-r1 and the Hypertable-ACH-r1, while the ACH-


MIH-r1 and the Hypertable-ACH-r1 yield higher recalls. Different combination schemes can be selected to trade off between the precision and the recall rates according to the user requirements.

V. CONCLUSION

In this work, the Asymmetric Cyclical Hashing (ACH) is proposed to yield both high similarity preservation among images and better storage efficiency. The ACH stores only $k$-bit hash codes for the stored images, while it uses $n$-bit hash codes to compute the similarity between the stored images and the query. Experimental results show that the ACH yields higher precision and recall rates in comparison to current hashing methods using the same storage cost.

The ACH is an unsupervised hashing method. Although it yields a certain degree of semantic preserving capability, as shown in our experiments, it depends heavily on the semantic similarity preservation of the features being used. A semi-supervised ACH is expected to yield better retrieval results for semantic image retrieval problems. On the other hand, the weighted Hamming distance has been shown to yield a better precision in comparison to the traditional binary Hamming distance at the cost of more computation. One of our future works is to extend the ACH to a real-valued weighted Hamming distance based asymmetric hashing scheme to provide a higher similarity precision. One may consider reducing both the storage and the computational costs of real-valued weighted Hamming distance based hashing. Another improvement of the ACH is to relax the restriction that the length of the long hash code ($n$) be a multiple of the length of the short one ($k$). However, this also requires a more complicated optimization to find the long code based on the short code. Bit selection and weighting may be required in the optimization.

REFERENCES

[1] A. Gordo, F. Perronnin, Y. Gong, and S. Lazebnik, “Asymmetric distances for binary embeddings,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 36, no. 1, pp. 33–47, Jan. 2014.
[2] J. Wang, S. Kumar, and S.-F. Chang, “Semi-supervised hashing for large-scale search,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 12, pp. 2393–2406, Dec. 2012.
[3] A. Andoni and P. Indyk, “Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions,” in Proc. IEEE Symp. Found. Comput. Sci., Oct. 2006, pp. 459–468.
[4] B. Neyshabur, N. Srebro, R. Salakhutdinov, Y. Makarychev, and P. Yadollahpour, “The power of asymmetry in binary hashing,” in Proc. NIPS, 2013, pp. 2823–2831.
[5] Y. Gong, S. Lazebnik, A. Gordo, and F. Perronnin, “Iterative quantization: A procrustean approach to learning binary codes for large-scale image retrieval,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 12, pp. 2916–2929, Dec. 2013.
[6] M. Norouzi, A. Punjani, and D. J. Fleet, “Fast search in Hamming space with multi-index hashing,” in Proc. IEEE Conf. Comput. Vis. Pattern Recog., Jun. 2012, pp. 3108–3115.
[7] M. Raginsky and S. Lazebnik, “Locality-sensitive binary codes from shift-invariant kernels,” in Proc. NIPS, 2009, vol. 22, pp. 1509–1517.
[8] B. Kulis and K. Grauman, “Kernelized locality-sensitive hashing for scalable image search,” in Proc. IEEE Int. Conf. Comput. Vis., Sep.–Oct. 2009, pp. 2130–2137.

[9] A. Gionis et al., “Similarity search in high dimensions via hashing,” in Proc. VLDB, 1999, vol. 99, pp. 518–529.
[10] M. Datar, N. Immorlica, P. Indyk, and V. S. Mirrokni, “Locality-sensitive hashing scheme based on p-stable distributions,” in Proc. 20th Annu. Symp. Comput. Geometry, 2004, pp. 253–262.
[11] Y. Weiss, A. Torralba, and R. Fergus, “Spectral hashing,” in Proc. NIPS, 2008, vol. 9, no. 1, p. 6.
[12] Y. Gong and S. Lazebnik, “Iterative quantization: A procrustean approach to learning binary codes,” in Proc. IEEE Conf. Comput. Vis. Pattern Recog., Jun. 2011, pp. 817–824.
[13] W. Liu, J. Wang, S. Kumar, and S.-F. Chang, “Hashing with graphs,” in Proc. 28th Int. Conf. Mach. Learn., 2011, pp. 1–8.
[14] M. Norouzi and D. J. Fleet, “Minimal loss hashing for compact binary codes,” in Proc. 28th Int. Conf. Mach. Learn., 2011, pp. 353–360.
[15] W. Liu, J. Wang, R. Ji, Y.-G. Jiang, and S.-F. Chang, “Supervised hashing with kernels,” in Proc. IEEE Conf. Comput. Vis. Pattern Recog., Jun. 2012, pp. 2074–2081.
[16] M. Norouzi and D. J. Fleet, “Cartesian k-means,” in Proc. IEEE Conf. Comput. Vis. Pattern Recog., Jun. 2013, pp. 3017–3024.
[17] Y. Gong, S. Kumar, H. A. Rowley, and S. Lazebnik, “Learning binary codes for high-dimensional data using bilinear projections,” in Proc. IEEE Conf. Comput. Vis. Pattern Recog., Jun. 2013, pp. 484–491.
[18] Y. Gong, S. Kumar, V. Verma, and S. Lazebnik, “Angular quantization-based binary codes for fast similarity search,” in Proc. NIPS, 2012, pp. 1205–1213.
[19] M. Bawa, T. Condie, and P. Ganesan, “LSH forest: Self-tuning indexes for similarity search,” in Proc. 14th Int. Conf. World Wide Web, 2005, pp. 651–660.
[20] R. Panigrahy, “Entropy based nearest neighbor search in high dimensions,” in Proc. 17th Annu. ACM-SIAM Symp. Discrete Algorithms, 2006, pp. 1186–1195.
[21] Q. Lv, W. Josephson, Z. Wang, M. Charikar, and K. Li, “Multi-probe LSH: Efficient indexing for high-dimensional similarity search,” in Proc. 33rd Int. Conf. Very Large Databases, 2007, pp. 950–961.
[22] K. Grauman and T. Darrell, “Pyramid match hashing: Sub-linear time indexing over partial correspondences,” in Proc. IEEE Conf. Comput. Vis. Pattern Recog., Jun. 2007, pp. 1–8.
[23] B. Kulis and T. Darrell, “Learning to hash with binary reconstructive embeddings,” in Proc. NIPS, 2009, vol. 22, pp. 1042–1050.
[24] J.-P. Heo, Y. Lee, J. He, S.-F. Chang, and S.-E. Yoon, “Spherical hashing,” in Proc. IEEE Conf. Comput. Vis. Pattern Recog., Jun. 2012, pp. 2957–2964.
[25] J. Wang et al., “Optimized distances for binary code ranking,” in Proc. ACM Int. Conf. Multimedia, 2014, pp. 517–526.
[26] J. Wang, H. T. Shen, J. Song, and J. Ji, “Hashing for similarity search: A survey,” CoRR, vol. abs/1408.2927, 2014 [Online]. Available: http://arxiv.org/abs/1408.2927
[27] S. Bondugula, “Survey of hashing techniques for compact bit representations of images,” Ph.D. dissertation, Dept. Comput. Sci., Univ. Maryland, College Park, MD, USA, 2013.
[28] W. Dong, M. Charikar, and K. Li, “Asymmetric distance estimation with sketches for similarity search in high-dimensional spaces,” in Proc. 31st Annu. Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, 2008, pp. 123–130.
[29] A. Gordo and F. Perronnin, “Asymmetric distances for binary embeddings,” in Proc. IEEE Conf. Comput. Vis. Pattern Recog., Jun. 2011, pp. 729–736.
[30] A. Rahimi and B. Recht, “Random features for large-scale kernel machines,” in Proc. NIPS, 2007, vol. 3, no. 4, p. 5.
[31] P. H. Schönemann, “A generalized solution of the orthogonal Procrustes problem,” Psychometrika, vol. 31, no. 1, pp. 1–10, 1966.
[32] A. Torralba, R. Fergus, and Y. Weiss, “Small codes and large image databases for recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recog., Jun. 2008, pp. 1–8.
[33] A. Krizhevsky and G. Hinton, “Learning multiple layers of features from tiny images,” Comput. Sci. Dept., Univ. Toronto, Toronto, ON, Canada, Tech. Rep., 2009.
[34] H. Jegou, M. Douze, and C. Schmid, “Product quantization for nearest neighbor search,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 1, pp. 117–128, Jan. 2011.


Yueming Lv received the B.S. degree from the South China University of Technology, Guangzhou, China, and is currently working toward the M.S. degree at the South China University of Technology. His research interests include machine learning, computer vision, information retrieval, and evolutionary computation.

Wing W. Y. Ng (S’02–M’05–SM’15) received the B.Sc. and Ph.D. degrees from the Hong Kong Polytechnic University, Hong Kong, in 2001 and 2006, respectively. He is currently an Associate Professor with the School of Computer Science and Engineering, South China University of Technology, Guangzhou, China. His current research interests include neural networks, large-scale image retrieval, and machine learning in nonstationary environments. Dr. Ng is an Associate Editor of the International Journal of Machine Learning and Cybernetics. He is a Principal Investigator of two National Natural Science Foundation of China projects and of the Program for New Century Excellent Talents in University from the China Ministry of Education. He served on the Board of Governors of the IEEE Systems, Man, and Cybernetics Society from 2011 to 2013.

Ziqian Zeng is currently working toward the B.S. degree in computer science and engineering at the South China University of Technology, Guangzhou, China. Her current research interests include machine learning and computer vision.


Daniel S. Yeung (M’89–SM’99–F’04) received the Ph.D. degree from Case Western Reserve University, Cleveland, OH, USA. He was a Faculty Member with the Rochester Institute of Technology, Rochester, NY, USA, from 1974 to 1980. For the next ten years, he held industrial and business positions in the USA. In 1989, he joined the Department of Computer Science, City Polytechnic of Hong Kong, Hong Kong, as an Associate Head/Principal Lecturer. He served as the Founding Head and Chair Professor with the Department of Computing, Hong Kong Polytechnic University, Hong Kong, until his retirement in 2006. He is currently a Visiting Professor with the School of Computer Science and Engineering, South China University of Technology, Guangzhou, China. His current research interests include neural network sensitivity analysis, large-scale data retrieval problems, and cybersecurity. Dr. Yeung served as the President of the IEEE Systems, Man, and Cybernetics Society from 2008 to 2009.

Patrick P. K. Chan (S’03–M’07) received the Ph.D. degree from the Hong Kong Polytechnic University, Hong Kong, in 2009. He is currently an Associate Professor with the School of Computer Science and Engineering, South China University of Technology, Guangzhou, China. His current research interests include pattern recognition, adversarial learning, and multiple classifier systems. Dr. Chan has been a member of the Governing Board of the IEEE Systems, Man and Cybernetics Society since 2014. He was the Chairman of the IEEE SMCS Hong Kong Chapter in 2014 and 2015, and the Counselor of the IEEE Student Branch in the South China University of Technology.