Int. J. Computer Applications in Technology, Vol. 54, No. 1, 2016
Trademark image retrieval using weighted combination of SIFT and HSV correlogram

Akriti Nigam* and R.C. Tripathi
Indian Institute of Information Technology, Allahabad, India
Email: [email protected]
Email: [email protected]
*Corresponding author

Abstract: This paper describes an effective tool to automate the process of trademark similarity checking at the time of registration. A combination of shape and colour features is presented so that images taken from different vantage points or at different scales fall into the same group. This is particularly important for trademark images, to curb attempts at trademark infringement through object transformations. SIFT is used as a powerful technique that is unaffected by transformations such as rotation, scaling, translation and change of viewpoint. The correlogram, on the other hand, is used to describe the colour content of the image; it is then combined with SIFT to represent a trademark image completely.

Keywords: SIFT; colour correlogram; interest points; trademark retrieval.

Reference to this paper should be made as follows: Nigam, A. and Tripathi, R.C. (2016) ‘Trademark image retrieval using weighted combination of SIFT and HSV correlogram’, Int. J. Computer Applications in Technology, Vol. 54, No. 1, pp.61–67.

Biographical notes: Akriti Nigam received her BTech degree in Information Technology from United College of Engineering and Research, Allahabad in 2009. In 2011, she received her MTech degree with specialisation in Human Computer Interaction from the Indian Institute of Information Technology, Allahabad, where she has been pursuing research since 2011. She specialises in image processing, intellectual property rights and pattern recognition.

R.C. Tripathi received his MSc degree (Physics with Electronics) in 1967 and PhD in Solid State Physics using Electronics Instrumentation in 1972 from the University of Allahabad. He was Sr. Director, R&D in IT Group, DIT, Ministry of Communications & IT (MCIT), Govt. of India, New Delhi, up to 31 March 2007 and is presently Dean (SA) and Head of Department (IPR) at the Indian Institute of Information Technology, Allahabad. He has 26 years of R&D and 29 years of teaching experience. His areas of interest include information and network security, digital and data communications, intellectual property rights, computer networks management and digital image processing.
1 Introduction
The field of image retrieval has been deeply researched for a long time. It is a very broad field, with work carried out in varied areas of application, for example medical diagnosis, geographical information and remote sensing, cultural and archaeological designs, and intellectual property rights. The present research work is primarily focused on trademark images. The objective is to develop a system that can be used to avoid duplication or infringement of trademarks. The system helps in identifying the similarity of a query image (a trademark image proposed for registration) with existing trademarks using the SIFT and colour correlogram techniques. SIFT is independent of variations such as rotation of the image, change in viewing angle and change in image size, which makes it an effective technique for dealing with trademark images.

Copyright © 2016 Inderscience Enterprises Ltd.

This technique arrests the possibility of approving an image as a trademark when it is similar to a previously existing trademark image, whether the similarity arises by mistake or through intentional trademark infringement. Previous techniques in this field focused mainly on colour, edges and texture, and could not detect orientation variance and other image transformations. Certain techniques handled a few transformations but were mainly restricted to corner detection. SIFT identifies image similarity irrespective of rotation, change in viewing angle, image size etc., and is capable of handling a wide range of transformations and orientation changes. For a trademark image, the contribution of the background is negligible compared with the central object.
Connected components of the image are calculated, the background components are removed and only the central object is kept, on which the SIFT vector is computed. The colour feature of the connected components is extracted using an HSV 2-D correlogram. A weighted combination of the SIFT vector and the correlogram feature is used to compare the similarity of the query image with the images in the database, and a list in descending order of similarity is formed for deciding the possibility of trademark infringement.
2 Related work
The amount of research done and the range of techniques established in the field of CBIR are immense. Notable amongst them are the works mentioned in this section. One such work is by Nandagopalan et al. (2008), who relied on three basic features of an image, namely its colour, texture and shape, to uniquely represent it. Prominent techniques discussed by them are edge histogram density, the autocorrelation function for quantifying texture, and the RGB colour histogram. A similar work on finding similar images in a large database, focused on trademark images, is presented in Qi et al. (2010). For defining shape, two descriptors are used: RAPC-HCD, which is contour based, and SDFP-FPM, which is region based. Using these two, together with an average probability distribution of dissimilarity values and a global dissimilarity value, the most similar images in the database are retrieved. SIFT feature extraction has found application in varied fields of research; for example, it has been used to match similar body tattoo images to assist law enforcement through victim identification (Jain et al., 2009). It also finds suitability in the field of digital archaeology, where SIFT feature points are matched to find similar images of cultural relics (Wen et al., 2011). Zhi et al. (2009) used SIFT features in medical imaging to find similar CT scan images using a bag-of-words framework. Hoang et al. (2010) proposed an image retrieval technique that uses the contourlet Harris detector, a combination of the Non-Subsampled Contourlet Transform (NSCT) and the Harris corner detector. Another variation on similar-image retrieval is the work of Velmurugan and Baboo (2011), where the Harris corner detector is again used to find interest points, which are further organised as Histograms of Oriented Gradients. Their retrieval system achieves an average precision of 82.46%.
Another research domain where SIFT has been used successfully is face recognition. In Velmurugan and Baboo (2011), person-specific SIFT features are extracted and matching is done using a combination of local and global similarity strategies. Experiments reveal that their proposed method works efficiently under variations in expression, pose etc., except for variations in lighting and the age of the person. In Wu et al. (2009) and Kogler and Lux (2010), the image is represented as a bag of visual words. The concept of bundled SIFT features is used, which proves to be a more powerful descriptor than SIFT used individually. Wen et al. (2011) used the concept of PCA-SIFT to save computational time by reducing the dimensionality of the SIFT vector from 128 to 20. After computing the reduced-dimensional SIFT vector, Euclidean distance is used to find the similarity between the query image and the database images. Their approach improves on the recall values obtained by previous cultural-relic image retrieval systems. One of the works focusing on colour means for image retrieval is discussed in Khokher and Talwar (2012). Janet and Reddy (2012) proposed an index model for image retrieval using SIFT distortion. Other techniques that also provide translation, scale and rotation invariance are discussed in Fu et al. (2006). Some of them are based on Zernike moments, Gabor filters and the like; their limitation, however, is that they focus only on the texture feature.
3 Methodology
Since image objects can appear at different sizes from different viewpoints and depths, it becomes difficult to compare the features of these perceptually similar images. In this method we do not apply the feature extraction algorithm to the entire trademark image. Instead, we first extract the central objects (foreground) from the image and remove the background using connected component analysis. Once we have all the disjoint components of the trademark image, we apply the following SIFT and colour feature extraction processes to each image component.
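The connected-component step above can be sketched in plain Python. This is a minimal 4-connected labelling with a helper that keeps the largest component as a crude stand-in for foreground extraction; the function names are illustrative and not from the paper, which keeps all foreground components.

```python
from collections import deque

def label_components(mask):
    """4-connected component labelling of a binary mask (list of lists of 0/1).
    Returns a label grid (0 = background) and the number of components."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    current = 0
    for sy in range(h):
        for sx in range(w):
            if mask[sy][sx] and not labels[sy][sx]:
                current += 1
                labels[sy][sx] = current
                queue = deque([(sy, sx)])
                while queue:  # breadth-first flood fill of one component
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = current
                            queue.append((ny, nx))
    return labels, current

def largest_component(mask):
    """Keep only the largest connected component of the mask."""
    labels, n = label_components(mask)
    if n == 0:
        return mask
    sizes = {k: 0 for k in range(1, n + 1)}
    for row in labels:
        for v in row:
            if v:
                sizes[v] += 1
    best = max(sizes, key=sizes.get)
    return [[1 if v == best else 0 for v in row] for row in labels]
```

In practice the trademark image would first be binarised (foreground vs. background) before labelling; OpenCV or SciPy provide optimised equivalents of this routine.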
3.1 Scale invariant feature transform

Determination of the SIFT (Lowe, 2004) vector of an image component involves identifying the locations of peaks in a scale space, which requires knowledge of the exact location of each interest point and of its dominant angular orientation. Locating a point through both its position in the X-Y plane and its dominant orientation is essential for achieving scale and rotation invariance. If only one of these (exact location or exact orientation) is detected, as in a corner detection technique such as the Harris detector (Velmurugan and Baboo, 2011), which detects only points of interest, there is no basis for a point descriptor. Figure 1 shows the different phases of SIFT vector computation.

Figure 1: SIFT computation pipeline
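The scale-space front end of this pipeline can be sketched with NumPy. This is a sketch of the difference-of-Gaussians idea only; the base sigma and number of levels are illustrative, not the paper's settings.

```python
import numpy as np

def gaussian_kernel(sigma):
    """1-D Gaussian kernel, applied twice (rows, then columns) for a 2-D blur."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def blur(img, sigma):
    """Separable Gaussian blur with edge clamping; output size equals input size."""
    k = gaussian_kernel(sigma)
    pad = len(k) // 2
    row_f = lambda r: np.convolve(np.pad(r, pad, mode='edge'), k, 'valid')
    tmp = np.apply_along_axis(row_f, 1, img)
    return np.apply_along_axis(row_f, 0, tmp)

def dog_stack(img, sigma=1.6, k=2**0.5, levels=4):
    """Difference of Gaussians at successive scales, approximating the LoG."""
    gs = [blur(img, sigma * k**i) for i in range(levels)]
    return np.stack([gs[i + 1] - gs[i] for i in range(levels - 1)])

def is_extremum(dog, s, y, x):
    """Check whether (x, y) at scale index s is a maximum or minimum of the
    3x3x3 block spanning its own scale and the scales above and below."""
    block = dog[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]
    v = dog[s, y, x]
    return bool(v == block.max() or v == block.min())
```

A point flagged by `is_extremum` would then be a candidate key point for the orientation and descriptor phases.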
3.1.1 Scale space and key point detection

Analysing an image at multiple scales is motivated by the fact that there is very little chance of obtaining the correct value of sigma (σ) in a single application of the Canny or Laplacian of Gaussian operator. Hence, LoG-filtered images for various sigma values are used to create a scale space. Building the scale space requires the Laplacian of Gaussian, which can be approximated by the subtraction of Gaussians at two different scales (Lowe, 2004). The following Gaussian is used to build the scale space:

G(x, y, kσ) = 1/(2π k²σ²) · exp(−(x² + y²)/(2k²σ²))    (1)

The steps (as shown in Figure 2) for determining whether a point 'p' on a particular scale is a point of interest are:

1 Create a 3×3 neighbourhood around the point on that scale.

2 Similarly create 3×3 neighbourhoods on the scales directly above and below the selected scale.

3 The above two steps yield a collection of 27 points, each represented by its pixel position and scale, i.e. (x, y, σ).

4 If the point 'p' is either a maximum or a minimum within this collection of 27 points, it is marked as a point of interest.

Figure 2: Locating a key point using the intensity values of 27 pixel points (see online version for colours)

3.1.2 Determination of orientation

As discussed earlier, the other key requirement for achieving rotation invariance is orientation. At each key point, the gradient magnitude m(x, y) and direction θ(x, y) are determined using equations (2) and (3) respectively:

m(x, y) = sqrt( [L(x+1, y) − L(x−1, y)]² + [L(x, y+1) − L(x, y−1)]² )    (2)

θ(x, y) = tan⁻¹( [L(x, y+1) − L(x, y−1)] / [L(x+1, y) − L(x−1, y)] )    (3)

Then, in a desired neighbourhood, a directional histogram of eight bins (each bin containing ten consecutive degrees) is created. Based on gradient magnitude, weights are assigned to the pixel directions. The pixel direction with the highest weight is taken as the orientation of the point of interest.

3.1.3 Trademark image descriptor at key point

After identifying the key points, the next step is to find their descriptors, for which various methods are in use. In the present work, keeping the stability of the feature in mind, we use gradient orientations to compute the descriptors; compared with raw intensity values, gradients are found to be more stable.

Figure 3: Computation of the gradient histogram in a 4×4 grid of an example image

3.2 2-D HSV correlogram

Colour is another prominent feature of an image. In similar works in the past, colours have been represented using the colour histogram. Being a global technique, the colour histogram has an inherent limitation: it extracts only global colour information, so local variations in colour cannot be taken into account. To overcome this limitation, the current work uses the 2-D HSV correlogram technique (Sebe and Lew, 2001), which captures the spatial distribution of colour intensity values. Input images are first resized to 255 × 255 pixels. The steps of the algorithm are:

1 Convert the RGB image to the HSV colour space.

2 For each HSV plane, the following steps are applied:
  a A 256 × 256 2-D matrix is created to store the correlogram values.
  b The values for the H plane are stored in Corr_H, and similarly the values for S and V in Corr_S and Corr_V respectively.

3 For any intensity value 'i', if a pixel with intensity value 'j' lies within its 5 × 5 neighbourhood, increment the value of Corr_H(i, j) by 1; the S and V planes are treated similarly.

4 Divide the value in each cell by 255 to normalise the correlogram matrix.
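The correlogram steps above can be sketched for a single 8-bit plane in plain Python. The 5 × 5 neighbourhood and the division by 255 follow the steps as stated; the function name and the reduced number of levels in the test are illustrative.

```python
def plane_correlogram(plane, levels=256, radius=2):
    """Steps 3-4 for one HSV plane: corr[i][j] counts how often a pixel of
    value j lies within the (2*radius+1)^2 neighbourhood of a pixel of
    value i, then every cell is divided by 255 as in the paper."""
    h, w = len(plane), len(plane[0])
    corr = [[0.0] * levels for _ in range(levels)]
    for y in range(h):
        for x in range(w):
            i = plane[y][x]
            # Scan the clipped 5x5 window around (y, x), excluding the centre.
            for ny in range(max(0, y - radius), min(h, y + radius + 1)):
                for nx in range(max(0, x - radius), min(w, x + radius + 1)):
                    if (ny, nx) != (y, x):
                        corr[i][plane[ny][nx]] += 1
    for row in corr:
        for j in range(levels):
            row[j] /= 255.0
    return corr
```

Running this once per plane (H, S and V) yields the Corr_H, Corr_S and Corr_V matrices described in step 2.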
In Figure 4, the left-hand side shows an example image and the right-hand side shows the correlogram obtained while processing the shaded pixel of the image.

Figure 4: Example of correlogram computation

3.3 Weighted combination

In the present work we take a combination of the two features discussed above, namely SIFT and the colour correlogram. The developed system works on a weighted combination of the two feature vectors. The weights of the individual feature vectors are flexible and can be chosen according to which feature needs to be emphasised. Equation (4) gives the weighted combination of the two feature vectors, where FV is the final feature vector, SV is the feature vector obtained by applying SIFT, WSV is the weight assigned to the SIFT feature, CCV is the feature vector obtained by applying the colour correlogram and WCCV is the weight assigned to the colour feature:

FV = (WSV · SV + WCCV · CCV) / (WSV + WCCV)    (4)

It is evident from the above equation that the weights of the two feature vectors vary in the range 0–1 such that they sum to 1. The weights can be varied according to the feature that should dominate the retrieved results.

3.4 Retrieval based on similarity

To find the similarity between the query trademark image and the set of images in the registered trademark database, we compute the weighted combination to get the feature vectors of both the query image and each database image, and use the Mahalanobis distance metric to compare the two vectors. If Q and D are the query and database image vectors respectively, then

d(Q, D) = sqrt( (D − Q)ᵀ Λ⁻¹ (D − Q) )    (5)

D* = arg min over D′ ∈ D of d(Q, D′)

where D′ ∈ D ranges over the database vectors and Λ is the covariance matrix of D and Q. We use the Mahalanobis distance metric rather than the Euclidean because it also takes the correlation between the vectors into account. Figure 5 depicts the process of weighted combination and similarity matching.

Figure 5: Process of weighted combination and similarity matching
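Equations (4) and (5) can be sketched with NumPy. This is a sketch under the assumption, not detailed in the paper, that SV and CCV have been brought to a common length; the inverse covariance matrix is passed in precomputed, and with the identity matrix the distance reduces to the Euclidean.

```python
import numpy as np

def weighted_combination(sv, ccv, w_sv, w_ccv):
    """Equation (4): weighted average of the SIFT and correlogram vectors
    (assumes both have been reduced/normalised to the same length)."""
    sv, ccv = np.asarray(sv, float), np.asarray(ccv, float)
    return (w_sv * sv + w_ccv * ccv) / (w_sv + w_ccv)

def mahalanobis(q, d, cov_inv):
    """Equation (5): sqrt((D - Q)^T Lambda^{-1} (D - Q))."""
    diff = np.asarray(d, float) - np.asarray(q, float)
    return float(np.sqrt(diff @ cov_inv @ diff))

def retrieve(query, database, cov_inv):
    """Return database indices ranked by increasing Mahalanobis distance,
    i.e. a descending-similarity list as in Figure 5."""
    dists = [mahalanobis(query, d, cov_inv) for d in database]
    return sorted(range(len(database)), key=lambda i: dists[i])
```

In a full system, `cov_inv` would be the inverse of the covariance matrix estimated from the database feature vectors.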
4 Results

To test retrieval performance we use recall and precision as the evaluation metrics. The method was tested on the standard WANG dataset, containing ten categories of 100 images each, and on a self-created dataset composed of 800 images of trademarks registered in India, taken from ipindia.nic.in, plus 200 images synthetically created to appear as duplicates of original trademark images. All images in the self-created dataset are labelled into ten groups. Query images were taken from each group, and the results obtained using several combinations of weights for the SIFT and correlogram features were analysed and are tabulated in Table 1. The highest recall value is obtained when WSV = 0.6 and WCCV = 0.4. The recall and precision values obtained using the proposed method are tabulated in Table 2.

Table 1: Weight combinations and corresponding average recall value obtained

WSV [0-1]    WCCV [0-1]    Avg. recall
0.3          0.7           0.79
0.4          0.6           0.98
0.5          0.5           0.89
0.6          0.4           0.94
0.7          0.3           0.84
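The precision and recall figures reported in Table 2 follow the usual retrieval definitions, which can be sketched as:

```python
def precision(total_retrieved, relevant_retrieved):
    """Fraction of the retrieved images that are relevant."""
    return relevant_retrieved / total_retrieved

def recall(total_relevant, relevant_retrieved):
    """Fraction of the relevant images in the database that were retrieved."""
    return relevant_retrieved / total_relevant
```

For example, the first row of Table 2 (10 retrieved, 8 relevant in the database, 8 relevant retrieved) gives precision 8/10 = 0.8 and recall 8/8 = 1.0.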
Table 2: Recall and precision values of the proposed system

S.No.   Total retrieved   Total relevant in the database   Relevant retrieved   Precision   Recall
1       10                8                                8                    0.8         1.0
2       10                9                                8                    0.8         0.88
3       10                9                                8                    0.8         0.77
4       10                9                                9                    0.9         1.0
5       10                7                                7                    0.7         1.0
The recall and precision values show an increase in the performance of the retrieval system compared with the one described in Nandagopalan et al. (2008), where a combination of colour, texture and edge histogram features gives average precision and recall values of 64% and 67.5% respectively, and with Kim and Kim (2000), where Zernike moments are used to describe the region of interest, giving a retrieval accuracy of 87.2%. Figure 6 shows a graph of the average recall values obtained against the number of images retrieved on the WANG dataset. The average recall of the proposed approach is compared with the works of Jhanwar et al. (2004), Huang and Dai (2003) and Lin et al. (2009). Jhanwar et al. (2004) used a combination of colour and texture features extracted through a motif co-occurrence matrix. Huang and Dai (2003) presented a technique combining a composite sub-band gradient vector and an energy distribution pattern string. Lin et al. (2009) used a colour representation through a co-occurrence matrix and the difference between pixels of a scan pattern.

Figure 6: Graph comparing recall values (y-axis: recall; x-axis: number of images retrieved, 20–100) obtained on the WANG dataset using the proposed approach and the approaches of Jhanwar et al. (2004), Huang and Dai (2003) and Lin et al. (2009) (see online version for colours)

The recall graph shows that the highest recall values are achieved by the proposed approach when compared with the works of Jhanwar et al. (2004), Huang and Dai (2003) and Lin et al. (2009). The images retrieved for queries from certain labelled groups are shown in Figure 10, where the query images are on the left of each row and the retrieved images are on the right, in decreasing order of similarity from left to right. The intermediate stages are shown in Figures 7–9.
Figure 7: Query image on the left-hand side and database image on the right-hand side

Figure 8: SIFT interest points in the query and database images
Figure 9: Lines connecting point pairs corresponding to matched SIFT interest points in the query and database images

Figure 10: Results of the proposed technique: query images on the left, retrieved images on the right
5 Conclusion
To automate and enhance the process of similar-trademark retrieval, a weighted combination of SIFT and colour feature vectors has been proposed. The invariance of the SIFT feature vector to scaling, translation and rotation is exploited to create an automated system that can catch transformations applied to an original image, whether unknowingly or with malicious intent. The SIFT feature in combination with the colour correlogram is seen to cope well with scaling and rotation. Varying the weights provides flexibility in the results in terms of the feature to be emphasised in the feature-matching phase. The average recall and precision achieved by the proposed system are 95% and 80% respectively, which demonstrates its effectiveness.
References

Fu, X., Li, Y., Harrison, R. and Belkasim, S. (2006) ‘Content-based image retrieval using Gabor-Zernike features’, IEEE 18th International Conference on Pattern Recognition, Vol. 2, pp.417–420.

Hoang, N., Thuong, L., Tuan, D., Cao, B. and Ty, N. (2010) ‘Image retrieval using contourlet based interest points’, IEEE 10th International Conference on Information Science, Signal Processing and their Applications (ISSPA), pp.93–96.

Huang, P.W. and Dai, S.K. (2003) ‘Image retrieval by texture similarity’, Pattern Recognition, Vol. 36, pp.665–679.

Jain, A.K., Lee, J.E., Jin, R. and Gregg, N. (2009) ‘Content-based image retrieval: an application to tattoo images’, IEEE International Conference on Image Processing, pp.2745–2748.

Janet, B. and Reddy, A.V. (2012) ‘Index model for image retrieval using SIFT distortion’, International Journal of Computer Applications in Technology, Vol. 6, No. 3, pp.289–306.

Jhanwar, N., Chaudhuri, S., Seetharaman, G. and Zavidovique, B. (2004) ‘Content based image retrieval using motif cooccurrence matrix’, Image and Vision Computing, Vol. 22, pp.1211–1220.

Khokher, A. and Talwar, R. (2012) ‘Evaluation of a content-based image retrieval system using features based on colour means’, International Journal of Computer Applications in Technology, Vol. 4, No. 1, pp.61–75.

Kim, W.-Y. and Kim, Y.-S. (2000) ‘A region based shape descriptor using Zernike moments’, Signal Processing: Image Communication, Vol. 16, Nos. 1/2, pp.95–102.

Kogler, M. and Lux, M. (2010) ‘Bag of visual words revisited: an exploratory study on robust image retrieval exploiting fuzzy codebooks’, Proceedings of the 10th International Workshop on Multimedia Data Mining, pp.3:1–3:6.

Lin, C.H., Chen, R. and Chan, Y. (2009) ‘A smart content-based image retrieval system based on color and texture feature’, Image and Vision Computing, Vol. 27, No. 6, pp.658–665.

Lowe, D.G. (2004) ‘Distinctive image features from scale-invariant keypoints’, International Journal of Computer Vision, Vol. 60, No. 2, pp.91–110.

Nandagopalan, S., Adiga, B.S. and Deepak, N. (2008) ‘A universal model for content-based image retrieval’, World Academy of Science, Engineering and Technology, Vol. 46, pp.644–647.

Qi, H., Li, K., Shen, Y. and Qu, W. (2010) ‘An effective solution for trademark image retrieval by combining shape description and feature matching’, Pattern Recognition, Vol. 43, pp.2017–2027.

Sebe, N. and Lew, M.S. (2001) ‘Color-based retrieval’, Pattern Recognition Letters, Vol. 22, pp.223–230.

Velmurugan, K. and Baboo, S.S. (2011) ‘Image retrieval using Harris corners and histogram of oriented gradients’, International Journal of Computer Applications, Vol. 24, No. 7, pp.6–10.

Wen, C., Geng, G.H. and Zhu, X.Y. (2011) ‘Cultural relic image retrieval method based on features of SIFT’, IEEE International Conference on Computational and Information Sciences, pp.125–128.

Wu, Z., Ke, Q.F. and Sun, J. (2009) ‘Bundling features for large-scale partial-duplicate web image search’, ACM International Conference on Multimedia Retrieval.

Zhi, L., Zhang, S., Zhao, D., Zhao, H. and Lin, S. (2009) ‘Medical image retrieval using SIFT feature’, IEEE 2nd International Congress on Image and Signal Processing, pp.1–4.