World Wide Web DOI 10.1007/s11280-015-0346-0
Feature aggregating hashing for image copy detection

Lingyu Yan1 · Fuhao Zou2 · Rui Guo3 · Lianli Gao4 · Ke Zhou5 · Chunzhi Wang1
Received: 15 December 2014 / Revised: 31 March 2015 / Accepted: 14 April 2015 © Springer Science+Business Media New York 2015
Abstract Currently, research on content-based image copy detection mainly focuses on robust feature extraction. However, due to the exponential growth of online images, it is necessary to consider searching among large-scale image collections, which is very time-consuming and unscalable. Hence, we need to pay much attention to the efficiency of image detection. In this paper, we propose a fast feature aggregating method for image copy detection which uses machine learning based hashing to achieve fast feature aggregation. Since the machine learning based hashing effectively preserves the neighborhood structure of the data, it yields visual words with strong discriminability. Furthermore, the generated binary codes make building the image representation a low-complexity operation, making it efficient and scalable to large-scale databases. Experimental results show the good performance of our approach.

Keywords Image copy detection · Visual words · Feature aggregation · Machine learning based hashing

Rui Guo: [email protected]
Lingyu Yan: [email protected]
Fuhao Zou: [email protected]
Lianli Gao: [email protected]
Ke Zhou: [email protected]
Chunzhi Wang: [email protected]

1. School of Computer Science, Hubei University of Technology, Wuhan, China
2. School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China
3. School of Computer, Southeast University, Nanjing, China
4. School of Computer, University of Electronic Science and Technology of China, Hefei, China
5. Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan, China
1 Introduction

Since software for editing digital content is easily accessible, digital images may be subject to different kinds of attack, such as scale change, cropping, and resolution or contrast change, which yield image copies that preserve the main semantic content of the original image and thereby infringe the copyright of the original works. To protect the copyright of digital images, content-based image copy detection has been proposed. It performs detection by processing the content of the raw multimedia itself, ignoring partial data and avoiding the embedding of digital watermarks into the original works, and is hence an effective way to trace unmarked content after distribution.

Currently, many approaches based on local features have been proposed for image copy detection [7, 18]. In [10], Ke et al. proposed a local-region detector for near-duplicate detection and sub-image retrieval based on the PCA-SIFT descriptor. To further improve retrieval efficiency, Foo et al. [3] proposed a pruning strategy that reduces the number of SIFT features. According to their evaluation, the SIFT descriptor achieves the best performance among all local descriptors. To further improve efficiency, approaches based on bags of visual words [9, 14] have been proposed, which aggregate the visual words of one image into a compact representation of its local points. Jegou et al. [9] proposed a descriptor named VLAD (vector of locally aggregated descriptors), which aggregates SIFT descriptors into a vector of limited dimension. However, obtaining the visual vocabulary by quantization is very time-consuming and unscalable to large databases; on large datasets, such algorithms impose a very high computational load. Exact nearest neighbor search for similar features is therefore not a practical solution, while approximate nearest neighbor search is.

Researchers have accordingly introduced hashing into the field of retrieval [5, 6, 8, 19–21]. Tang et al. [15] developed a global method using nonnegative matrix factorization (NMF), which first converts the image into a fixed-size pixel array, then generates a secondary image by rearranging pixels and applies NMF to produce a feature-bearing hash code; the fingerprint is then coarsely quantized into a binary string and scrambled to generate the image fingerprint. Self-taught hashing (STH) [22] is considered one of the state-of-the-art works [13]. However, it suffers from an overfitting problem, since the generation of hash codes for the training data and of the hash function for the test data are handled independently, which leads to poor generalization ability. Spectral hashing (SpH) [16] uses a separable Laplacian eigenfunction formulation that ends up assigning more bits to directions along which the data has a greater range. However, this approach is somewhat heuristic and relies on the unrealistic
assumption that the data is uniformly distributed in a high-dimensional rectangle. On the whole, although hashing is considered an effective approach for approximate nearest neighbor search, semantic loss remains the big challenge in this area.

Motivated by this, we propose a promising approach that uses binary fingerprints to define visual words. To construct the visual vocabulary, we first extract local features from a training image dataset and apply K-means to them to generate K clusters. Then, we propose a machine learning based hashing scheme to generate binary codes for the visual words, obtaining hash functions that map local features into binary codes efficiently. After that, histograms of visual words are constructed as image representations, and histogram comparison is employed to measure the similarity between two images.

The main contributions of this work are as follows. Firstly, we propose a machine learning based hashing method to convert visual words into binary codes, which is efficient and scalable to large-scale databases. Secondly, we implement our hashing method by constructing a joint framework that simultaneously achieves neighborhood preservation and discrimination enhancement, making the hash codes discriminative for image representation.

The remainder of this paper is organized as follows. In Section 2, we present the hashing based feature aggregation for fast image search. In Section 3, extensive experiments are conducted to demonstrate the effectiveness and efficiency of the proposed algorithm. Finally, we draw a conclusion in Section 4.
2 Approach

2.1 Framework of image search

We propose the image search framework illustrated in Figure 1. It is composed of two stages, a training stage and a query stage, the key point of which is machine learning based hashing for feature aggregation.

In the training stage, we first extract local features from the training images, denoted by $X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{m \times n}$, where $m$ is the dimensionality of the features and $n$ is the number of local features. Secondly, to construct the visual vocabulary, we apply K-means to these local features to generate $K$ clusters and use machine learning based hashing to obtain hash functions $f(x)$, which then map local features into binary codes efficiently. After that, the $K$ centroids are mapped into binary visual words denoted by $C = \{c_1, c_2, \ldots, c_K\}$, $c_i \in \{-1, 1\}^d$, and the local features of the training images are mapped into binary codes $Y = [y_1, y_2, \ldots, y_n]$, $y_i \in \{-1, 1\}^d$ ($d < m$). Note that different images yield different numbers of local features; here we denote this number by $n$ in a unified manner for simplicity. Finally, for an image $I$, every binary code of a local feature is assigned the visual word nearest to it, and the histogram $H(I) = [H_1(I), H_2(I), \ldots, H_K(I)]$ of visual words is constructed as the training image representation, where $H_k(I)$ is the number of times the $k$-th visual word $c_k$ appears in image $I$.

In the online detection stage, we first extract local features from the query images, then map them into binary codes using the hash functions learned during training. After that, every binary code is assigned its nearest visual word, and the histogram of visual words is constructed as the test image representation. The offline stage consumes a lot of time, but the efficiency of image search mainly depends on the online stage.
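To make the training-stage pipeline concrete, the following is a minimal NumPy/SciPy sketch. It runs on synthetic data; the feature matrix, the projection $P$ (learned as in Section 2.2), and the toy sizes are stand-ins introduced for illustration only, except $d = 20$, which matches the code length used in the experiments.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(0)
m, n, d, K = 128, 5000, 20, 256          # toy sizes; the experiments use d = 20, K = 8192
X = rng.standard_normal((m, n))          # stand-in for pooled local features (e.g. SIFT)
P = rng.standard_normal((m, d))          # stand-in for the projection learned in Section 2.2

# K-means on the raw local features gives the K centroids (the visual vocabulary).
centroids, _ = kmeans2(X.T, K, minit="++", seed=0)       # K x m

# f(x) = sign(P^T x) turns centroids into binary visual words and features into codes.
C = np.sign(P.T @ centroids.T)           # d x K binary visual words
Y = np.sign(P.T @ X[:, :300])            # codes for the local features of one image I

# Assign each code to its nearest visual word (Hamming distance) and build H(I).
dists = (C[:, None, :] != Y[:, :, None]).sum(axis=0)     # 300 x K distance table
H = np.bincount(dists.argmin(axis=1), minlength=K)       # K-bin histogram of image I
```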
Figure 1 Framework of the proposed method
2.2 The proposed hashing algorithm

Given a data set $X = [x_1, x_2, \ldots, x_n]$, where $x_i \in \mathbb{R}^m$ and $n$ is the total number of data points, we first characterize its local and global structures by neighborhood preservation and discrimination enhancement, respectively, as detailed below. We then generate a hash function $f(x) = \mathrm{sign}(P^T x)$ based on the constructed structures, where $P \in \mathbb{R}^{m \times d}$ and $x \in \mathbb{R}^m$. Utilizing $f(\cdot)$, we map each data point $x_i$ in $X$ to $y_i \in \{-1, 1\}^d$ ($d < m$) such that close data points have similar codes while distant data points have dissimilar codes. $Y = [y_1, y_2, \ldots, y_n]$ lies in the Hamming space.
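As a toy numeric illustration of $f(x) = \mathrm{sign}(P^T x)$ (the matrix and points below are made up for illustration, not learned):

```python
import numpy as np

P = np.array([[ 0.6, -0.2],
              [-0.3,  0.7],
              [ 0.1,  0.5]])              # m = 3 original dims, d = 2 bits
x_a    = np.array([1.0, 0.2, -0.1])
x_near = np.array([0.9, 0.1, -0.1])       # a close neighbor of x_a
x_far  = np.array([-1.0, 1.5, 2.0])       # a distant point

for x in (x_a, x_near, x_far):
    print(np.sign(P.T @ x))
# x_a and x_near both map to [ 1. -1.]; x_far flips to [-1.  1.].
```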
2.2.1 Neighborhood preservation

Following local structure based methods such as SpH [17] and STH [22], we adopt an affinity graph to characterize the local structure. In the affinity graph, each vertex corresponds uniquely to a point in $X$, and two vertices are connected by an edge if one of them is among the k-nearest neighbors of the other. Mathematically, we represent the graph by the affinity matrix $A$ of the form

$$A(i, j) = \begin{cases} \frac{1}{k} & \text{if } x_i \in N_k(x_j) \text{ or } x_j \in N_k(x_i) \\ 0 & \text{otherwise} \end{cases}$$

where $A(i, j)$ is the weight of the edge between vertices $x_i$ and $x_j$, and $N_k(x)$ denotes the set of k-nearest neighbors of data point $x$.

Generally, we evaluate the extent of locality preservation by checking how much of the affinity graph is retained after the data is mapped into the Hamming space, namely, to what extent the neighborhood relations between data points still hold. Here, the extent of locality preservation is quantitatively expressed as the semantic loss

$$S = \sum_{i,j=1}^{n} A(i, j)\,\|y_i - y_j\|^2 = \mathrm{Tr}(Y L Y^T) \qquad (1)$$

where $L = D - A = I - A$ is the graph Laplacian and $D(i, i) = \sum_j A(i, j) = 1$; each data point is associated with a clique consisting of itself and its $k - 1$ nearest neighbors. However, directly optimizing (1) is liable to be NP-hard due to the discrete value constraint on the matrix $Y$. To avoid this, we relax the discrete value constraint by replacing $Y$ with $P^T X$, where $X$ is centered, i.e., $\sum_{i=1}^{n} x_i = 0$. Thus, the optimization of (1) reduces to

$$\min_P \; \mathrm{Tr}(P^T X L X^T P), \quad \text{subject to } P^T P = I_d \qquad (2)$$

where $I_d$ is the $d \times d$ identity matrix, and the constraint $P^T P = I_d$ avoids the trivial solution.
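A brute-force sketch of building $A$ and $L = I - A$ under these definitions (the helper below is our own, written for illustration):

```python
import numpy as np

def affinity_and_laplacian(X, k):
    """X: m x n data matrix (columns are points); returns A and L = I - A."""
    n = X.shape[1]
    sq = (X ** 2).sum(axis=0)
    D2 = sq[:, None] + sq[None, :] - 2 * (X.T @ X)   # pairwise squared distances
    np.fill_diagonal(D2, np.inf)                     # a point is not its own neighbor
    knn = np.argsort(D2, axis=1)[:, :k]              # k nearest neighbors of each point

    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), knn.ravel()] = 1.0 / k
    A = np.maximum(A, A.T)          # the "or" in the definition makes A symmetric
    L = np.eye(n) - A               # the paper takes D(i, i) = 1, so L = I - A
    return A, L
```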
2.2.2 Discrimination enhancement

In principle, if we only seek to preserve the local structure, we merely achieve the goal that close data points have similar codes; how to make distant data points have dissimilar codes remains to be solved. From LDA we know that discriminative structure is quite helpful for keeping distant points far apart. However, constructing the discriminative structure encoded by LDA requires label information, which severely restricts the application scope of the proposed hashing method. It is therefore desirable to
discover the discriminant structure in an unsupervised way. Inspired by the interpretation of LDA from the graph embedding point of view, and based on the obtained affinity graph (which has been shown to be a Laplacian graph), we define the characterization of the discriminant structure in an unsupervised way: each data point has been assigned to a unique cluster, and we assume the data set $X$ is partitioned into $K$ clusters.

Linear discriminant analysis (LDA) seeks a set of linear projections by which the within-class scatter $S_w$ is minimized while the between-class scatter $S_b$ is maximized. Specifically, the total scatter $S_t$, within-class scatter $S_w$, and between-class scatter $S_b$ are defined as

$$S_t = \sum_{i=1}^{n} (x_i - u)(x_i - u)^T, \qquad (3)$$

$$S_w = \sum_{k=1}^{K} \sum_{i=1}^{n_k} (x_i^{(k)} - u^{(k)})(x_i^{(k)} - u^{(k)})^T, \qquad (4)$$

$$S_b = \sum_{k=1}^{K} n_k (u^{(k)} - u)(u^{(k)} - u)^T, \qquad (5)$$

where $u$ is the centroid of all data points, $u^{(k)}$ is the centroid of the $k$-th class, $n_k$ is the number of data points in the $k$-th class, and $x_i^{(k)}$ denotes a data point $x_i$ belonging to the $k$-th class. As stated in [12], it follows from (3), (4), and (5) that $S_t = S_b + S_w$, namely the total scatter is equal to the sum of the within-class and between-class scatter. Formally, LDA finds a projection matrix $P = [p_1, p_2, \ldots, p_d] \in \mathbb{R}^{m \times d}$ by optimizing the objective

$$\max_P \; \mathrm{Tr}(P^T (S_t + \mu I)^{-1} S_b P), \quad \text{subject to } P^T P = I \qquad (6)$$

where $\mathrm{Tr}(\cdot)$ is the trace operator, the scaled identity matrix $\mu I$ is added to avoid the singularity problem, and $\mu > 0$ is a scale factor.

In [2], Cai interprets LDA from the graph embedding point of view. Let $\bar{x}_i^{(k)}$ denote a centered data point and $\bar{X}^{(k)} = [\bar{x}_1^{(k)}, \bar{x}_2^{(k)}, \ldots, \bar{x}_{n_k}^{(k)}]$ denote the centered data matrix of the $k$-th cluster. It is easy to show that

$$S_b = \sum_{k=1}^{K} \bar{X}^{(k)} W^{(k)} (\bar{X}^{(k)})^T$$

where $W^{(k)}$ is an $n_k \times n_k$ matrix with all elements equal to $\frac{1}{n_k}$, and $n_k$ is the number of data points in the $k$-th class. Let $\bar{X} = [\bar{X}^{(1)}, \bar{X}^{(2)}, \ldots, \bar{X}^{(K)}]$ be the centered data matrix of $X$ and let

$$W = \begin{bmatrix} W^{(1)} & 0 & \cdots & 0 \\ 0 & W^{(2)} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & W^{(K)} \end{bmatrix}. \qquad (7)$$

It follows that

$$S_b = \sum_{k=1}^{K} \bar{X}^{(k)} W^{(k)} (\bar{X}^{(k)})^T = \bar{X} W \bar{X}^T. \qquad (8)$$

Since $S_t = \bar{X}\bar{X}^T$, $S_w$ can then be written as

$$S_w = S_t - S_b = \bar{X}(I - W)\bar{X}^T. \qquad (9)$$
Recalling the formulation in (2), we observe that the $W$ in (9) has the same meaning as $A$, so we can optimize (2) and (6) simultaneously by replacing $W$ with $A$.
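The identity $S_b = \bar{X} W \bar{X}^T$ in (8) can be verified numerically. Below is a small sketch with our own variable names, using cluster assignments in place of the unavailable class labels:

```python
import numpy as np

def between_class_scatter(Xbar, labels, K):
    """Xbar: m x n globally centered data; labels: cluster index (0..K-1) per column."""
    m, n = Xbar.shape
    W = np.zeros((n, n))
    for k in range(K):
        idx = np.flatnonzero(labels == k)
        W[np.ix_(idx, idx)] = 1.0 / len(idx)      # block of 1/n_k, as in (7)
    Sb_graph = Xbar @ W @ Xbar.T                  # equation (8)

    Sb_direct = np.zeros((m, m))                  # definition (5), with u = 0
    for k in range(K):
        idx = np.flatnonzero(labels == k)
        u_k = Xbar[:, idx].mean(axis=1)           # u^(k) - u, since data is centered
        Sb_direct += len(idx) * np.outer(u_k, u_k)

    assert np.allclose(Sb_graph, Sb_direct)       # (5) and (8) agree
    return Sb_graph

rng = np.random.default_rng(1)
Z = rng.standard_normal((5, 60))
Zbar = Z - Z.mean(axis=1, keepdims=True)          # global centering so that u = 0
between_class_scatter(Zbar, rng.integers(0, 3, 60), 3)
```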
2.2.3 Unified objective function and solution

Since $\bar{X}^{(k)} = [\bar{x}_1^{(k)}, \bar{x}_2^{(k)}, \ldots, \bar{x}_{n_k}^{(k)}]$ denotes the centered data matrix of the $k$-th class, we can use $\bar{X} = [\bar{X}^{(1)}, \bar{X}^{(2)}, \ldots, \bar{X}^{(K)}]$ to replace the centered $X$ in (2). For simplicity, we use $X$ to denote the centered data in both objective functions. Then (6) becomes

$$\max_P \; \mathrm{Tr}(P^T (XX^T + \mu I_m)^{-1} X A X^T P), \quad \text{subject to } P^T P = I_d \qquad (10)$$

where $I_m$ is the $m \times m$ identity matrix. Considering that $P^T P = I_d$, (10) can be reformulated as

$$\min_P \; \mathrm{Tr}(P^T P - P^T (XX^T + \mu I_m)^{-1} X A X^T P) = \mathrm{Tr}(P^T (I_m - (XX^T + \mu I_m)^{-1} X A X^T) P), \quad \text{subject to } P^T P = I_d \qquad (11)$$

We can then integrate the objectives (2) and (11) into the unified objective function

$$\min_P \; \lambda\, \mathrm{Tr}(P^T X L X^T P) + (1 - \lambda)\, \mathrm{Tr}(P^T (I_m - (XX^T + \mu I_m)^{-1} X A X^T) P), \quad \text{subject to } P^T P = I_d \qquad (12)$$

where $0 \le \lambda \le 1$ balances the tradeoff between the two structures. Let $B = X L X^T$ and $C = (XX^T + \mu I_m)^{-1} X A X^T$. Then (12) becomes

$$\min_P \; \mathrm{Tr}(P^T B P) + \lambda\, \mathrm{Tr}(P^T (I_m - C) P) + (1 - \lambda)\, \mathrm{Tr}(P^T P) = \mathrm{Tr}(P^T D P), \quad \text{subject to } P^T P = I_d \qquad (13)$$

where

$$D = B + \lambda(I_m - C) + (1 - \lambda) I_m = B + \lambda I_m - \lambda C + I_m - \lambda I_m = B - \lambda C + I_m. \qquad (14)$$

Then $P$ is obtained as the $d$ eigenvectors of $D$ corresponding to the $d$ smallest nonzero eigenvalues. Based on $P$, the $K$ centroids are mapped into binary visual words and the local features of images are mapped into binary codes.

When the data dimension $m$ is comparatively low, the above hashing method can be performed efficiently according to the above steps. However, if $m$ is very large, it is computationally prohibitive. In most cases the data to be processed has very high dimension, especially for multimedia features such as documents, images, and video. This makes it infeasible to train directly on high-dimensional data and prevents some optimization strategies from being applied. To deal with this problem, we exploit the SVD trick to decrease the time complexity of our method without degrading the optimization objective defined in (13). Let $X = U \Sigma V^T$ be the SVD of $X$, where $X \in \mathbb{R}^{m \times n}$, $U \in \mathbb{R}^{m \times m}$, $V \in \mathbb{R}^{n \times n}$, and $\Sigma \in \mathbb{R}^{m \times n}$. Suppose that $t = \mathrm{rank}(X)$. Let $U_1 \in \mathbb{R}^{m \times t}$ and $V_1 \in \mathbb{R}^{n \times t}$ be the first $t$ columns of $U$ and $V$, respectively, and let the square matrix $\Sigma_t \in \mathbb{R}^{t \times t}$ consist of the first $t$ rows and $t$ columns of $\Sigma$. We have

$$X = U \Sigma V^T = U_1 \Sigma_t V_1^T. \qquad (15)$$
Substituting the decomposition (15) into $B$ and $C$ gives

$$B = X L X^T = U \Sigma V^T L V \Sigma^T U^T = U_1 \Sigma_t V_1^T L V_1 \Sigma_t U_1^T \qquad (16)$$

and

$$C = (XX^T + \mu I_m)^{-1} X A X^T = (U_1 \Sigma_t V_1^T V_1 \Sigma_t U_1^T + \mu I_m)^{-1} U_1 \Sigma_t V_1^T A V_1 \Sigma_t U_1^T = U_1 (\Sigma_t^2 + \mu I_t)^{-1} \Sigma_t V_1^T A V_1 \Sigma_t U_1^T \qquad (17)$$

where $I_t$ is the identity matrix of size $t \times t$ and $t \le m$. Let us define the matrices $B_1$ and $C_1$ as

$$B_1 = \Sigma_t V_1^T L V_1 \Sigma_t \qquad (18)$$

and

$$C_1 = (\Sigma_t^2 + \mu I_t)^{-1} \Sigma_t V_1^T A V_1 \Sigma_t. \qquad (19)$$

Therefore, (13) can be rewritten as

$$\min_P \; \mathrm{Tr}(P^T U_1 (B_1 + \lambda(I_t - C_1) + (1 - \lambda) I_t) U_1^T P) = \mathrm{Tr}(Q^T D_1 Q), \quad \text{subject to } Q^T Q = I_d \qquad (20)$$

where $D_1 = B_1 - \lambda C_1 + I_t$ and $Q = U_1^T P$. By performing the eigen-decomposition of $D_1$, we obtain $Q$ and then compute $P$ as

$$P = U_1 Q. \qquad (21)$$
Since $t = \mathrm{rank}(X)$ is usually far smaller than $m$, the matrices $B_1, C_1, D_1 \in \mathbb{R}^{t \times t}$ are significantly smaller than $B, C, D \in \mathbb{R}^{m \times m}$. This means that, by virtue of the SVD, the computational cost of obtaining $P$ decreases dramatically. We summarize the SVD-based LDPH in Algorithm 1.
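Algorithm 1 itself did not survive extraction, so the following sketch reconstructs the SVD-based training procedure from equations (15)–(21); it is our illustrative reading, not a verbatim transcription. One implementation note: $D_1$ is not exactly symmetric because $C_1$ is pre-multiplied by a diagonal inverse, so the sketch symmetrizes it before the eigen-decomposition.

```python
import numpy as np

def train_projection(X, A, lam=0.5, mu=1e-3, d=20):
    """X: m x n centered data; A: n x n affinity matrix; returns P in R^{m x d}."""
    n = X.shape[1]
    L = np.eye(n) - A                                    # graph Laplacian, L = I - A

    # Thin SVD, X = U1 diag(s) V1^T with t = rank(X), as in (15).
    U1, s, V1t = np.linalg.svd(X, full_matrices=False)
    t = int(np.sum(s > 1e-10))
    U1, s, V1 = U1[:, :t], s[:t], V1t[:t, :].T

    SV = V1 * s                                          # V1 @ Sigma_t  (n x t)
    B1 = SV.T @ L @ SV                                   # equation (18)
    C1 = np.diag(1.0 / (s ** 2 + mu)) @ (SV.T @ A @ SV)  # equation (19)
    D1 = B1 - lam * C1 + np.eye(t)                       # as in (20)

    w, Q = np.linalg.eigh((D1 + D1.T) / 2)               # symmetrized eigen-problem
    P = U1 @ Q[:, np.argsort(w)[:d]]                     # d smallest eigenvalues; (21)
    return P

# Binary codes then follow from f(x) = sign(P^T x):  codes = np.sign(P.T @ X)
```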
Figure 2 Sample images and their corresponding image copies
3 Experiments and analysis

3.1 Experiment setting

Our original image dataset consists of 46,735 images, where 1,000 images come from the COREL 1000 image database, 15,128 images come from the CEACLIC database, and the other 30,607 images come from the Caltech 256 database. We randomly select 1,000 images from the dataset as training images. Using StirMark, every training image is modified to generate 100 copies following [23]. Thereby, we obtain 100,000 copies and 46,616 non-copies. Figure 2 shows some sample images and their corresponding copies.
3.2 Performance of the proposed method

We first evaluate nearest neighbor search performance on GIST1M, which contains 1 million 384-d GIST features. We compare our method with four state-of-the-art methods: kernelized locality-sensitive hashing (KLSH) [11], spectral hashing (SpH) [17], self-taught hashing (STH) [22], and iterative quantization (ITQ) [4]. Following the Hamming ranking search strategy commonly adopted by hashing methods, we evaluate the recall at the first N Hamming neighbors; the results are shown in Figure 3. Our hashing method outperforms all the other methods.

We then explore the influence of code length on retrieval performance by tuning the code length c from 4 to 80. For each code length, we first find the 100 image copies whose Hamming distance to the query is smaller than that of the other images, and then calculate the F1-measure of this query. After the F1-measures of all queries are calculated, their average is taken as the F1-measure for the current code length. Figure 4 shows that our method is superior to all the other methods when the code length is small. As the code length increases, the F1-measure of most methods decreases gradually while SpH holds its superiority. However, since long binary codes demand more memory and computation, performance at long code lengths is less important. According to Figure 4, we set the code length to 20 for the further comparison experiments.

To evaluate the performance of the proposed approach, comparison experiments with VLAD and Fisher Vector are conducted. For both methods, SIFT features are extracted as local features. After the local features are aggregated, histogram comparison is performed to measure the similarity among images.
Figure 3 Nearest neighbor search performance of hashing methods on GIST1M
Let RP be the number of true copies correctly assigned to the positive class, FP the number of false copies incorrectly assigned to the positive class, and RN the number of true copies incorrectly rejected by the positive class. Precision and recall are defined as

$$\mathrm{precision} = \frac{RP}{RP + FP}, \qquad \mathrm{recall} = \frac{RP}{RP + RN}. \qquad (22)$$
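In these terms, the per-query measures are straightforward to compute. A small sketch with our own function name:

```python
def precision_recall_f1(retrieved, relevant):
    """retrieved: set of returned image ids; relevant: set of true copies of the query."""
    rp = len(retrieved & relevant)            # true copies correctly returned
    fp = len(retrieved - relevant)            # non-copies incorrectly returned
    rn = len(relevant - retrieved)            # true copies missed
    precision = rp / (rp + fp) if retrieved else 0.0
    recall = rp / (rp + rn) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```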
Figure 4 F1-measure of the comparative algorithms under various code lengths

Figure 5 shows the comparison between our method, Fisher Vector, and VLAD in terms of precision-recall curves for the query images. In the first step of implementing VLAD, we set the number of cluster centroids to 64 (k = 64), so a VLAD vector of 8192 dimensions (d = 8192) is obtained for each image. Second, each VLAD vector is reduced to 128 dimensions by principal component analysis (PCA). Third, we implement Fisher Vector with K = 64 and D = 4096. Fourth, we implement our method with K = 8192. It can be observed that our method yields good performance in terms of both precision and recall.

Figure 5 The P-R curve of the proposed method compared with VLAD

In addition, to demonstrate the efficiency of the proposed method, we compare it with VLAD alone on the average time cost of querying one image over datasets of different sizes, since VLAD is the relatively efficient one among the comparison methods. Figure 6 shows the average time cost of the proposed method (K = 8192) and VLAD (K = 64, D = 8192). All algorithms are run in Matlab R2012b on a 32-bit PC with a 2.4-GHz CPU and 2 GB of RAM. The time cost of our method is clearly lower than that of VLAD: although our method involves 8192 centroids, far more than the 64 centroids of VLAD, assignment to binary visual words requires only XOR and bit-count operations rather than floating-point distance computations, which keeps the query time low.

Figure 6 Average time cost of querying over different sample sizes
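The speed advantage rests on the cost of the code-to-word assignment: with d-bit codes packed into integers, Hamming distance is one XOR plus a bit count, so scanning even 8192 binary visual words is cheap. A toy sketch (the values are random placeholders, not the learned vocabulary; int.bit_count requires Python >= 3.10):

```python
import random

random.seed(0)
d, K = 20, 8192
centroid_bits = [random.getrandbits(d) for _ in range(K)]   # toy binary visual words
code = random.getrandbits(d)                                # one local-feature code

def hamming(a: int, b: int) -> int:
    return (a ^ b).bit_count()        # XOR, then popcount

# One nearest-word assignment costs K XOR/popcount ops, no floating-point math.
nearest = min(range(K), key=lambda k: hamming(code, centroid_bits[k]))
```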
4 Conclusion

In this paper, we propose a new hashing based feature aggregating method for image search, which adopts a unified training framework to simultaneously achieve neighborhood preservation and discrimination enhancement. During hashing, locality information is preserved by minimizing the semantic loss among nearest neighbors, while the within-class scatter is minimized and the between-class scatter is maximized. In this scheme, we project high-dimensional local features into a low-dimensional Hamming space, making it efficient to compute pairwise similarity using simple XOR and bit-count operations, which further improves the efficiency of feature aggregation. Extensive experiments demonstrate that our method performs well in image search.

Acknowledgments This work was supported by the National Natural Science Foundation of China (No. 61170135, No. 61202287, No. 61440024), the General Program of the Natural Science Foundation of Hubei Province, China (No. 2013CFB020, No. 2014CFB590), and the Natural Science Foundation of Hubei University of Technology (No. BSQD13039).
References

1. Berrani, S.A., Amsaleg, L., Gros, P.: Robust content-based image searches for copyright protection. In: International Workshop on Multimedia Databases, pp. 70–77 (2003)
2. Cai, D., He, X., Han, J.: SRDA: An efficient algorithm for large-scale discriminant analysis. IEEE Trans. Knowl. Data Eng. 20(1), 1–12 (2008)
3. Foo, J.J., Sinha, R.: Pruning SIFT for scalable near-duplicate image matching. In: Proceedings of the Eighteenth Australasian Database Conference, pp. 63–71 (2007)
4. Gong, Y., Lazebnik, S.: Iterative quantization: A procrustean approach to learning binary codes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 817–824 (2011)
5. Han, Y., Wu, F., Tian, Q., Zhuang, Y.: Image annotation by input-output structural grouping sparsity. IEEE Trans. Image Process. 21(6), 3066–3079 (2012)
6. Han, Y., Wu, F., Tao, D., Shao, J., Zhuang, Y., Jiang, J.: Sparse unsupervised dimensionality reduction for multiple view data. IEEE Trans. Circ. Syst. Video Technol. 22(10), 1485–1496 (2012)
7. Han, Y., Yang, Y., Zhou, X.: Co-regularized ensemble for feature selection. In: International Joint Conference on Artificial Intelligence (2013)
8. Han, Y., Yang, Y., Yan, Y., Ma, Z., Sebe, N., Zhou, X.: Semi-supervised feature selection via spline regression for video semantic recognition. IEEE Trans. Neural Netw. Learn. Syst. 26(2), 252–264 (2015)
9. Jegou, H., Perronnin, F., Douze, M., et al.: Aggregating local image descriptors into compact codes. IEEE Trans. Pattern Anal. Mach. Intell., 1704–1716 (2012)
10. Ke, Y., Sukthankar, R., Huston, L.: Efficient near-duplicate detection and sub-image retrieval. In: ACM Multimedia, pp. 869–876 (2004)
11. Kulis, B., Grauman, K.: Kernelized locality-sensitive hashing for scalable image search. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2130–2137 (2009)
12. Kokiopoulou, E., Chen, J., Saad, Y.: Trace optimization and eigenproblems in dimension reduction methods. Numer. Linear Algebra Appl. 18(3), 565–602 (2011)
13. Li, P., Wang, M., Cheng, J., Xu, C., Lu, H.: Spectral hashing with semantically consistent graph for image indexing. IEEE Trans. Multimed. 15(1), 141–152 (2013)
14. Poullot, S., Crucianu, M., Buisson, O.: Scalable mining of large video databases using copy detection. In: ACM Multimedia, pp. 61–70 (2008)
15. Tang, Z., Wang, S., Zhang, X., Wei, W., Su, S.: Robust image hashing for tamper detection using nonnegative matrix factorization. J. Ubiquitous Convergence Technol. 2(1), 18–26 (2008)
16. Weiss, Y., Torralba, A., Fergus, R.: Spectral hashing. In: Advances in Neural Information Processing Systems, pp. 1753–1760 (2009)
17. Weiss, Y., Torralba, A., Fergus, R.: Spectral hashing. In: Proceedings of the 2008 Conference on Advances in Neural Information Processing Systems, pp. 1753–1760 (2009)
18. Yan, Y., Liu, G., Wang, S., Zhang, J., Zheng, K.: Graph-based clustering and ranking for diversified image search. Multimed. Syst. doi:10.1007/s00530-014-0419-4 (2014)
19. Yang, Y., Nie, F., Xu, D., Luo, J., Zhuang, Y., Pan, Y.: A multimedia retrieval framework based on semi-supervised ranking and relevance feedback. IEEE Trans. Pattern Anal. Mach. Intell. 34(5), 723–742 (2012)
20. Yang, Y., Ma, Z., Xu, Z., Yan, S., Hauptmann, A.: How related exemplars help complex event detection in web videos? In: Proceedings of the 2013 IEEE International Conference on Computer Vision, pp. 1–8 (2013)
21. Yang, Y., Ma, Z., Nie, F., Chang, X., Hauptmann, A.: Multi-class active learning by uncertainty sampling with diversity maximization. Int. J. Comput. Vis. doi:10.1007/s11263-014-0781-x (2014)
22. Zhang, D., Wang, J., Cai, D., Lu, J.: Self-taught hashing for fast similarity search. In: Proceedings of the 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 18–25 (2010)
23. Zou, F., Feng, H., Ling, H., Liu, C., Yan, L., Li, P., Li, D.: Nonnegative sparse coding induced hashing for image copy detection. Neurocomputing 105(1), 81–89 (2013)