Classification of Near Duplicate Images by Texture Feature Extraction and Fuzzy SVM

G. Kalaiarasi* and Dr. K. K. Thyagharajan**

* Assistant Professor, Department of CSE, Dhanalakshmi Srinivasan College of Engineering & Technology, Anna University, Chennai, India
** Dean (Academic), R.M.D. Engineering College, Anna University, Chennai, India


Abstract: In this paper, a novel approach is proposed for the classification of near duplicate images based on a Fuzzy Support Vector Machine (FSVM). First, a gray level co-occurrence matrix (GLCM) is used to extract texture features. Next, the extracted features are given as input to an SVM. Finally, fuzzy logic is incorporated into the SVM classifier to classify near duplicate images. Experimental results show that the classification accuracy of the FSVM is higher than that of the SVM.

Keywords: Near duplicate images, Classification, Support Vector Machine (SVM), Fuzzy Support Vector Machine (FSVM).

Introduction
Organizing or classifying images into meaningful groups of near duplicates and non-near duplicates using low-level visual features is an important and challenging problem in content-based image retrieval. Many machine learning methods have been developed for automatic semantic image classification. Humans classify objects more easily than machines do. The availability of high-capacity computers and of high-quality, low-priced video cameras has created interest in classification algorithms. A classification system stores trained patterns and compares a detected object against them to assign it to an appropriate category. Image classification is an important and challenging task in many applications, including biomedical imaging, video surveillance, vehicle navigation and remote sensing. Digital image classification is an appealing area with applications in industry, medical diagnosis and research. Digital image processing has two main goals: improving pictorial information for human interpretation, and processing image data for storage and retrieval. Clustering and classification of large numbers of objects rely on various methods of similarity analysis. Despite the difficulties involved, finding a "good" solution to the problem of similarity evaluation, one that is reasonable from a human viewpoint, is key to a plausible classification. Such a classification enables practical applications such as automated visual search through large collections of images, sorting and classifying them for different kinds of decision making, and technical or medical diagnosis. With advances in computing and digital imaging, the number of images is increasing rapidly, which makes techniques for the efficient management and retrieval of stored images necessary.
Image classification and retrieval is also a major issue in pattern recognition, robotics and artificial intelligence. Systems requiring classification include, but are not limited to, visual tracking, image registration and content-based image retrieval. Classifying the images of a given database with traditional machine learning algorithms is a very complicated task because of the large size of image databases and the many details that describe an image. For these reasons, such learning algorithms are not suitable for image classification when the database is very large. Another limitation of these traditional algorithms is the long time they require for classification. Image classification is an important topic in the field of image processing, and the accuracy and convergence rate of a classification algorithm indicate its effectiveness. Most existing image classification algorithms classify images by their gray-scale values, using prior experience in image processing. However, because the properties and acquisition environments of different images vary widely, this approach has limitations in real-world applications. A better approach is to enhance the accuracy and robustness of image classification by exploiting the images' own characteristics. Classification of near duplicate images is important in almost every area of image processing: once near duplicate images are identified, processing time can be reduced, and unnecessary storage space can be freed by removing the near duplicates. For example, suppose a database contains three images, two of which are near duplicates of each other; the two near duplicates should be assigned to one class and the remaining image to the other class. To achieve this, texture features are extracted and a Fuzzy SVM is applied to them.

188 Advances in Engineering and Technology

This paper is organized as follows. Section 2 reviews related work on image classification. Section 3 describes the proposed work in detail, and Section 4 discusses the experimental results. Finally, Section 5 presents the conclusion.

Related Works
In this section, some of the related works are discussed. Prior research in scene classification has shown that high-level information can be obtained from low-level image features. In [1], low-level features are used for classification, and high classification rates are often achieved by using computationally expensive, high-dimensional feature sets. The removal of near duplicates results in higher error rates. Color and texture features are computed: color features are computed for the entire image and trained using an SVM, and texture features are obtained from a two-level wavelet decomposition. A two-stage SVM classification scheme is used: the first stage trains on the color and texture features, and the second stage trains a new SVM on the global color and texture features of the entire image. Conventional relevance feedback in content-based image retrieval (CBIR) systems uses only labelled images for learning. Image labelling is a tedious, time-consuming process, so users are not willing to label many images during feedback. Pseudo-labelled images are proposed in [2]: a pseudo-labelled image is not labelled explicitly by the user; instead, its label is estimated using a fuzzy rule. Such images carry a certain degree of uncertainty, or fuzziness, in their class information. The PLFSVM of [2] performs CBIR on this basis. Its main principles are the identification of useful unlabelled images as candidates for pseudo-labelling, the integration of the fuzziness embedded in the pseudo-labelled images into the learning scheme, and the development of the computational techniques needed to optimize learning. Two computational intelligence techniques, SVM and fuzzy logic, are combined to develop PLFSVM. By exploiting the characteristics of the labelled images, unlabelled images are chosen carefully and assigned pseudo-labels such as 'relevant' or 'irrelevant'. PLFSVM improves performance and reduces user workload.
In [6], texture features are extracted using the Gray-Local Texture Pattern (GLTP) operator, an extended version of the Local Texture Pattern (LTP) texture model. Contrast, another important image property, is extracted using Color-Local Contrast Variance Patterns (CLCVP). GLTP/CLCVP is used as a texture feature extraction technique for classifying color images. A pattern unification procedure is applied: when the LTP operator is applied over a local region of the R, G and B bands individually, there are three LTP values for each pixel. Similarly, three VAR values are computed. The three LTP and three VAR values are then combined to form a unique code. Classification is based on texture similarity and k-Nearest Neighbor (kNN). Texture similarity is computed by comparing GLTP/CLCVP histograms, with the G-statistic comparing the bins of two histograms. In kNN classification, the data are split into training and test sets, and the similarity or dissimilarity between each test point and the training points is calculated. In [8], a novel approach is proposed for semantic image classification based on a weighted feature support vector machine (WFSVM). Image data for classification have a large number of feature dimensions. Conventional SVM-based classification algorithms assign equal weights to these features, so the kernel computation may be dominated by trivial, weakly relevant or irrelevant features. The novelty of [8] is that the importance of each feature with respect to the classification task is taken into account. First, the relevant features are determined according to their degree of discreteness and assigned greater weights, and the irrelevant features are discarded. Second, the weighted features are used to compute the kernel functions and the SVM is trained. Finally, the trained SVM is used to classify new images automatically.
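In [6], texture similarity is measured by comparing histograms with the G-statistic, and classification uses k-nearest neighbours. The following is a minimal stdlib-only Python sketch of that idea, not the code of [6]. It assumes histograms are plain lists of bin counts, uses the standard log-likelihood-ratio form of the G-statistic (with 0 · log 0 taken as 0), and fixes k = 1. The function names are hypothetical.

```python
import math

def _xlogx(v):
    # Convention: 0 * log(0) = 0, so empty bins contribute nothing.
    return v * math.log(v) if v > 0 else 0.0

def g_statistic(h1, h2):
    """G-statistic (log-likelihood ratio) between two histograms of equal length.
    It is 0 for identical histograms and grows as they diverge."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    cells = sum(_xlogx(f) for f in h1) + sum(_xlogx(f) for f in h2)
    rows = _xlogx(sum(h1)) + _xlogx(sum(h2))          # per-histogram totals
    cols = sum(_xlogx(a + b) for a, b in zip(h1, h2))  # per-bin totals
    total = _xlogx(sum(h1) + sum(h2))
    return 2.0 * (cells - rows - cols + total)

def nearest_neighbor(test_hist, train_hists, train_labels):
    """1-NN: return the label of the training histogram with the smallest G."""
    best = min(range(len(train_hists)),
               key=lambda k: g_statistic(test_hist, train_hists[k]))
    return train_labels[best]
```

For example, nearest_neighbor([8, 1, 1], [[10, 0, 0], [0, 0, 10]], ["a", "b"]) picks "a", because the test histogram shares most of its mass with the first training histogram.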
An effective shift-invariant wavelet feature extraction method for classifying images of different sizes is proposed in [12]. Feature extraction consists of a normalization step followed by an adaptive shift-invariant wavelet packet transform. An energy signature is then computed for each sub-band of the invariant wavelet coefficients, and a reduced subset of these energy signatures is selected as the feature vector for classifying images of different sizes. Chang Jing Shang and Dave Barnes [13] presented a study of rock texture image classification using support vector machines (and also k-nearest neighbours and decision trees), aided by feature selection techniques. Zhang, Mao and Xu [14] define, according to the characteristics of image classification, two types of ants with different search strategies and update mechanisms. The stochastic ants identify new categories, construct category tables and determine the clustering center of each category. The intellectual ants classify the image pixels using search-advancing strategies, guided by the information provided by the stochastic ants. In [17], a new type of support vector machine is proposed that uses a kernel constructed from fuzzy basis functions. It combines the characteristics of a support vector machine and a fuzzy system: high generalization performance, even when the dimension of the input space is very high; a structured, numerical representation of knowledge; and the ability to extract linguistic fuzzy rules, in order to bridge the "semantic gap" between the low-level descriptors and the high-level semantics of an image. In [18], two multiclass classifiers with gene selection are proposed: a fuzzy support vector machine (FSVM) with gene selection, and a binary classification tree based on SVM with gene selection. The FSVM, combined with SVM-based recursive feature elimination, can find the most important genes affecting certain types of cancer with high recognition accuracy. [19] considers the automated classification of breast tissue as benign or malignant using a Weighted Feature Support Vector Machine (WFSVM), constructing a precomputed kernel function that assigns more weight to relevant features by the principle of maximizing deviations. This analysis shows that texture features yield better accuracy than the other features with both WFSVM and SVM, and that WFSVM creates fewer support vectors than the SVM classifier. [21] presents an iterative FSVM (I-FSVM) for generating membership values: the membership values are generated iteratively, based on the positions of the training vectors relative to the SVM decision surface itself, to improve accuracy. In computer-aided disease diagnosis, classification plays a major role in identifying an image or tissue as normal or abnormal. [23] focuses on statistical and texture-based brain image classification using a Support Vector Machine for different modalities (MRI, PET/SPECT) and for fused MRI and PET/SPECT images. The results show that the fused images are classified correctly with a lower error rate than the single-modality images.

Proposed Work
In this paper, a novel approach is proposed for the classification of near duplicate images based on a Fuzzy Support Vector Machine (FSVM). The block diagram is given in Fig. 1. First, texture features are extracted using the gray level co-occurrence matrix (GLCM) [23]. Five features, namely Angular Second Moment, Correlation, Dissimilarity, Inverse Difference Moment and Entropy, are extracted. These features have high discrimination accuracy and require little computation time, so they can be used efficiently for image classification. Next, the extracted features are given as input to the SVM. Finally, a fuzzy inference system is incorporated into the SVM classifier to classify near duplicate images.

Fig. 1 Block Diagram of Proposed Work

First, the images are loaded from the database and resized to a uniform size. Image filtering and histogram equalization are then performed to enhance the images and bring out their details, and histograms of the filtered and equalized images are plotted. Next, texture features are extracted; these features are the input to the SVM classifier and then to the FSVM classifier for further classification. Finally, two classes are obtained: one containing the near duplicate images and the other containing the non-near duplicates.

Texture Feature Extraction
Feature extraction describes a large set of data accurately with a small, informative set of features. Working directly with large or complex data involves a large number of variables, which requires a great deal of memory and computation; moreover, a classification algorithm trained on so many variables tends to classify its training samples well but new samples poorly. The main aim of texture analysis is to find a unique way of representing the characteristics of textures in a simple but distinctive form that can be used for robust and accurate classification. In this paper, the gray level co-occurrence matrix (GLCM) is used to obtain texture features. A large number of texture features can be extracted from the GLCM; five of them are used here: Angular Second Moment, Correlation, Dissimilarity, Inverse Difference Moment and Entropy. A GLCM is a square matrix whose number of rows and columns equals the number of gray levels in the image. Its element P(i, j | Δx, Δy) is the relative frequency with which two pixels separated by the displacement (Δx, Δy) occur with intensities i and j. Because of their large dimensionality, GLCMs are very sensitive to the size of the texture samples, so the number of gray levels is often reduced. The following features are used.

Angular Second Moment (or Uniformity) measures image homogeneity. ASM is high when the image is homogeneous, i.e., when pixels are very similar. It is calculated using eqn. 1,

$$\mathrm{ASM} = \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} P(i,j)^{2} \qquad (1)$$

where i and j index the gray levels (the rows and columns of the GLCM), P(i, j) is the normalized GLCM entry, and N is the number of gray levels.

Inverse Difference Moment (IDM) measures local homogeneity. If the local gray levels are uniform, most of the GLCM mass lies near the diagonal (i ≈ j) and IDM is high. It is calculated with eqn. 2,

$$\mathrm{IDM} = \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} \frac{P(i,j)}{1+(i-j)^{2}} \qquad (2)$$
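The IDM weight, 1/(1 + (i − j)^2), is the inverse of the contrast weight (i − j)^2, offset by one to avoid division by zero. Entropy measures the randomness of the gray-level distribution, which relates to the amount of information needed to compress (encode) the image; it is high when the GLCM entries are nearly equal. It is given by eqn. 3,

$$\mathrm{ENTROPY} = -\sum_{i=0}^{N-1}\sum_{j=0}^{N-1} P(i,j)\,\log P(i,j) \qquad (3)$$

Correlation measures the linear dependency of gray levels between neighbouring pixels. (Correlation techniques in general are widely applied in science and engineering, for example to measure deformation, displacement, strain and optical flow.) It is given by eqn. 4,

$$\mathrm{CORRELATION} = \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} \frac{(i-\mu_i)(j-\mu_j)\,P(i,j)}{\sigma_i\,\sigma_j} \qquad (4)$$

where μ_i = Σ_i Σ_j i P(i, j) and μ_j = Σ_i Σ_j j P(i, j) are the GLCM means, and σ_i^2 = Σ_i Σ_j (i − μ_i)^2 P(i, j) and σ_j^2 = Σ_i Σ_j (j − μ_j)^2 P(i, j) are the corresponding variances.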
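The GLCM and eqns. 1 to 4 can be sketched in plain Python as follows. This is a minimal illustration under assumptions the paper does not state: images are lists of rows of integer gray levels, gray levels are reduced by uniform quantization, the GLCM is not symmetrized, and the correlation of a constant image (where σ_i = 0 and eqn. 4 is undefined) is taken as 1. The function names are hypothetical.

```python
import math

def quantize(image, levels, max_value=255):
    """Map gray values 0..max_value onto 0..levels-1 to shrink the GLCM."""
    return [[min(levels - 1, p * levels // (max_value + 1)) for p in row] for row in image]

def glcm(image, dx, dy, levels):
    """Normalized co-occurrence matrix: P[i][j] is the relative frequency of a pixel
    with gray level i having a neighbour at offset (dx, dy) with gray level j."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    pairs = 0
    for y in range(rows):
        for x in range(cols):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < rows and 0 <= x2 < cols:
                counts[image[y][x]][image[y2][x2]] += 1
                pairs += 1
    if pairs == 0:
        raise ValueError("offset is larger than the image")
    return [[c / pairs for c in row] for row in counts]

def _cells(P):
    # Yield (i, j, P(i, j)) for every GLCM entry.
    for i, row in enumerate(P):
        for j, p in enumerate(row):
            yield i, j, p

def angular_second_moment(P):          # eqn. (1)
    return sum(p * p for _, _, p in _cells(P))

def inverse_difference_moment(P):      # eqn. (2)
    return sum(p / (1 + (i - j) ** 2) for i, j, p in _cells(P))

def entropy(P):                        # eqn. (3), with 0 * log 0 taken as 0
    return -sum(p * math.log(p) for _, _, p in _cells(P) if p > 0)

def correlation(P):                    # eqn. (4)
    mu_i = sum(i * p for i, _, p in _cells(P))
    mu_j = sum(j * p for _, j, p in _cells(P))
    sd_i = math.sqrt(sum((i - mu_i) ** 2 * p for i, _, p in _cells(P)))
    sd_j = math.sqrt(sum((j - mu_j) ** 2 * p for _, j, p in _cells(P)))
    if sd_i == 0 or sd_j == 0:
        return 1.0  # assumption: correlation of a constant image defined as 1
    cov = sum((i - mu_i) * (j - mu_j) * p for i, j, p in _cells(P))
    return cov / (sd_i * sd_j)
```

For example, glcm(quantize(img, 8), 1, 0, 8) pairs each pixel with its right-hand neighbour after reducing the image to 8 gray levels; the paper does not state which offsets or how many gray levels it uses.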

Dissimilarity measures the variation of gray-level pairs in an image. It is 0 when every pixel pair has equal gray levels and reaches its maximum, N − 1 (or 1 when gray levels are scaled to [0, 1]), when the gray levels of the reference and neighbour pixels lie at opposite extremes of the possible gray levels in the texture sample. It is obtained using eqn. 5,

$$\mathrm{DISSIMILARITY} = \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} |i-j|\,P(i,j) \qquad (5)$$
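Eqn. 5 can be sketched in a few lines. The sketch assumes the GLCM is already normalized and stored as a list of rows. The raw value ranges from 0 to N − 1, and the optional division by N − 1 that maps it to [0, 1] is an assumed convention, not one the paper specifies. The function name is hypothetical.

```python
def dissimilarity(P, normalized=False):
    """Eqn. (5): sum of |i - j| * P(i, j) over a normalized N x N GLCM (list of rows).

    The raw value lies in [0, N - 1]. With normalized=True it is divided by N - 1
    (an assumed convention) so that it lies in [0, 1]."""
    n = len(P)
    value = sum(abs(i - j) * p for i, row in enumerate(P) for j, p in enumerate(row))
    if normalized and n > 1:
        return value / (n - 1)
    return value
```

A GLCM whose mass sits entirely at P(0, N − 1) and P(N − 1, 0) gives the maximum N − 1, while a purely diagonal GLCM gives 0.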

Support Vector Machine
The SVM method comes from the application of statistical learning theory to separating hyperplanes for binary classification problems. Given a collection of images, the SVM searches for a hyperplane that divides them into two classes: one class contains a group of mutually related images, and the other class contains the images of the other group. A brief introduction to the SVM follows. Let S = {(x_i, y_i)}, i = 1, ..., n, be a set of n training samples, where x_i is an m-dimensional sample in the input space and y_i ∈ {−1, 1} is the class label of x_i. The SVM first maps the input into a high-dimensional feature space through a mapping function z = φ(x) and then finds the optimal hyperplane that separates the images into the two classes with minimal classification error. The hyperplane can be represented as w · z + b = 0, where w is the normal vector of the hyperplane and b is the (scalar) bias. The optimal hyperplane is obtained by solving

$$\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_i \quad \text{subject to}\quad y_i\,(w\cdot z_i + b) \ge 1-\xi_i,\ \ \xi_i \ge 0,\ \ i=1,\dots,n$$

where C is the regularization parameter that controls the trade-off between margin width and classification error, and ξ_i is the slack variable related to the classification errors of the SVM.

The basic structure of the SVM procedure is:
• Obtain the image datasets.
• Separate the images into a training set and a test set.
• Create labels for SVM training to distinguish the classes.
• Train the SVM.
• Classify the test set images.

Training set: the images used to train the SVM. Test set: the images classified once SVM training is complete. Labels: two labels are used, "near duplicate" and "not near duplicate". Classify: assign each test set image to one of the two classes.

Fuzzy Support Vector Machine
The fuzzy SVM has been proposed as an extension of the standard SVM. Data acquired in real-world applications are often vague, uncertain and/or incomplete, so they can be represented by fuzzy sets. SVMs in particular are quite sensitive to noise and to points that were not drawn properly from the underlying distribution. The main free parameter of an SVM is C, which weights the penalty term and hence the classification error. C is usually fixed for every input image during training, so all training images are treated alike before training. Outliers and noisy images therefore carry the same weight as reliable ones, and the SVM may overfit. As a consequence, the concept of a fuzzy support vector machine (FSVM) was introduced independently by two different research groups at about the same time. In an FSVM, a membership value μ_i is assigned to every training image x_i, so that training samples of different significance are treated differently. Each training sample is given a fuzzy membership value μ_i ∈ [0, 1], i = 1, ..., n. The membership value μ_i reflects the fidelity of the sample, in other words the confidence in its actual class information: the higher the value, the more confident we are about its class label. The optimization problem of the FSVM is formulated as

$$\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\mu_i\,\xi_i \quad \text{subject to}\quad y_i\,(w\cdot z_i + b) \ge 1-\xi_i,\ \ \xi_i \ge 0,\ \ i=1,\dots,n$$

Note that the error term is scaled by the membership value μ_i: the fuzzy membership values weight the soft penalty term, which therefore reflects the relative reliability of the training samples. Important samples with larger membership values have more influence on FSVM training than samples with smaller values.
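To make the SVM and FSVM optimization problems concrete, here is a minimal sketch under stated assumptions: a linear kernel (z = x), plain Python lists, and full-batch subgradient descent on the equivalent unconstrained primal, where ξ_i = max(0, 1 − y_i(w · z_i + b)), instead of the quadratic-programming solvers normally used. The function names, learning rate and epoch count are hypothetical choices, not the paper's implementation. Setting every μ_i = 1 recovers the standard SVM objective.

```python
def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def fsvm_objective(w, b, X, y, mu, C):
    """0.5*||w||^2 + C * sum(mu_i * xi_i), with xi_i = max(0, 1 - y_i (w.x_i + b))."""
    hinge = sum(m * max(0.0, 1.0 - yi * (_dot(w, xi) + b)) for xi, yi, m in zip(X, y, mu))
    return 0.5 * _dot(w, w) + C * hinge

def train_fsvm(X, y, mu=None, C=1.0, lr=0.01, epochs=500):
    """Linear (F)SVM trained by full-batch subgradient descent on the primal.
    mu=None gives every sample membership 1, i.e. the standard soft-margin SVM."""
    n, d = len(X), len(X[0])
    mu = [1.0] * n if mu is None else mu
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = list(w), 0.0                 # gradient of 0.5*||w||^2 is w
        for xi, yi, m in zip(X, y, mu):
            if yi * (_dot(w, xi) + b) < 1.0:  # margin violated, so xi_i > 0
                for k in range(d):
                    gw[k] -= C * m * yi * xi[k]  # membership scales the penalty
                gb -= C * m * yi
        w = [wk - lr * gk for wk, gk in zip(w, gw)]
        b -= lr * gb
    return w, b

def predict(w, b, x):
    """Side of the hyperplane w.x + b = 0 on which x falls."""
    return 1 if _dot(w, x) + b >= 0 else -1
```

For example, train_fsvm(X, y, mu=[1.0, 1.0, 0.1]) lets a third, possibly noisy image contribute only a tenth of the penalty it would carry in a standard SVM.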

Experimental Results
Initially, 30 images are taken. Twenty of them are used for training: the first ten belong to one class (near duplicates) and the remaining ten to the other class (non-near duplicates). The remaining ten images are used for testing, and the expected result is that the first five test images fall into one class and the other five into the other class. Sample input images are shown in Fig. 3. The following steps are performed:

• Load the images.
• Resize all the images to a uniform size.
• Perform image filtering.
• Enhance each image by computing its equalized image.
• Plot the histograms of both the filtered image and the equalized image.
• Extract five texture features: Angular Second Moment, Correlation, Dissimilarity, Inverse Difference Moment and Entropy.
• Store the feature values in an Excel sheet.
• Read the normalized values back from the Excel sheet.
• Train the SVM on these values.
• Incorporate fuzzy logic into the SVM, improving the classification of near duplicate images.
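Two of the steps above, enhancement by equalization and feature normalization, can be sketched as follows. The assumptions are not stated in the paper: images are 8-bit grayscale lists of rows, enhancement is global histogram equalization, and the "normalized values" are per-feature min-max scaling to [0, 1]. The paper does not specify its filter, so filtering is omitted, and the Excel sheet is replaced by in-memory lists. The function names are hypothetical.

```python
def equalize_histogram(image, levels=256):
    """Global histogram equalization of a grayscale image given as a list of rows."""
    hist = [0] * levels
    for row in image:
        for p in row:
            hist[p] += 1
    cdf, running = [], 0
    for h in hist:
        running += h
        cdf.append(running)
    total = cdf[-1]
    cdf_min = next(c for c in cdf if c > 0)   # count of the darkest occurring level
    if total == cdf_min:                       # constant image: nothing to spread out
        return [list(row) for row in image]
    # Map each level through the rescaled cumulative distribution.
    lut = [max(0, round((c - cdf_min) / (total - cdf_min) * (levels - 1))) for c in cdf]
    return [[lut[p] for p in row] for row in image]

def min_max_normalize(features):
    """Scale every feature column (e.g. ASM, Correlation, ...) to [0, 1]."""
    cols = list(zip(*features))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
            for row in features]
```

Min-max scaling keeps features with large raw ranges (such as Dissimilarity) from dominating features that naturally lie in [0, 1] (such as ASM) during SVM training.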

From the feature values, the maximum values of Correlation, Energy (Angular Second Moment) and IDM and the minimum values of Dissimilarity and Entropy are taken; the SVM classifier is trained with these values for the non-near-duplicate class and with the remaining values for the near-duplicate class. However, some images are not classified properly, so the FSVM is used: when an image could belong to both classes, it is classified by means of membership values. Membership values and membership functions are added to integrate fuzzy logic with the SVM, assigning an image to a class if it has the maximum value for all the features associated with that class. The FSVM thus classifies the images into two classes, near duplicates and non-near duplicates, with increased accuracy.
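The paper does not give its membership function explicitly. As an illustration only, the sketch below swaps in a widely used alternative, the class-centre distance membership from Lin and Wang's FSVM: μ_i = 1 − d_i / (r + δ), where d_i is the distance from sample i to the mean of its class, r is the largest such distance in that class and δ is a small positive constant. Samples far from their class centre (potential outliers or ambiguous near duplicates) receive small memberships. The function name and δ = 0.001 are hypothetical choices.

```python
import math

def class_center_memberships(X, y, delta=1e-3):
    """Membership mu_i = 1 - d_i / (r + delta), where d_i is the distance of x_i
    to the mean of its own class and r is the largest such distance in that class."""
    memberships = [0.0] * len(X)
    d = len(X[0])
    for label in set(y):
        idx = [i for i, yi in enumerate(y) if yi == label]
        center = [sum(X[i][k] for i in idx) / len(idx) for k in range(d)]
        dist = {i: math.dist(X[i], center) for i in idx}
        radius = max(dist.values())
        for i in idx:
            memberships[i] = 1.0 - dist[i] / (radius + delta)  # always in (0, 1]
    return memberships
```

The resulting list can be used as the per-sample weights μ_i of an FSVM trainer, for example with each x_i being the vector of five normalized texture features of an image.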


Fig. 3 Samples of input images

Fig. 4 Filtered image, equalized image and corresponding histogram plots for an input image

Fig. 6 Excel file with the values of the five texture features


Fig. 7 Extracted and normalized values of the five texture features

Conclusion
In this paper, a novel approach is proposed for the classification of near duplicate images based on a Fuzzy Support Vector Machine (FSVM). First, texture features are extracted using the gray level co-occurrence matrix (GLCM). Five features, namely Angular Second Moment, Correlation, Dissimilarity, Inverse Difference Moment and Entropy, are extracted. These features have high discrimination accuracy and require little computation time, so they can be used efficiently for image classification. Next, the extracted features are given as input to an SVM. Finally, fuzzy logic is incorporated into the SVM classifier, and the images are classified using the FSVM. Experimental results show that the FSVM performs better than the traditional SVM.

References
[1] Navid Serrano, Andreas Savakis and Jiebo Luo, "Computationally Efficient Approach to Indoor/Outdoor Scene Classification", Proceedings of the 16th International Conference on Pattern Recognition, Vol. 4, IEEE, 2002.
[2] Kui Wu and Kim-Hui Yap, "Fuzzy SVM for Content-Based Image Retrieval", IEEE Computational Intelligence Magazine, May 2006.
[3] Witold Pedrycz, Alberto Amato, Vincenzo Di Lecce and Vincenzo Piuri, "Fuzzy Clustering with Partial Supervision in Organization and Classification of Digital Images", IEEE Transactions on Fuzzy Systems, Vol. 16, No. 4, August 2008.
[4] Gancho Vachkov, "Online Incremental Image Classification by Use of Human Assisted Fuzzy Similarity", Proceedings of the 2010 IEEE International Conference on Information and Automation, 2010.
[5] Takuya Kawakami, Takahiro Ogawa and Miki Haseyama, "Novel Image Classification Based on Decision-Level Fusion of EEG and Visual Features", 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014.
[6] A. Suruliandi, E. M. Srinivasan and K. Ramar, "Image Resolution Dependency of Local Texture Patterns in Classification of Color Images", 2010 Annual IEEE India Conference (INDICON), 2010.
[7] Imran Ahmad and Muhammad Talal Ibrahim, "Image Classification and Retrieval using Correlation", Proceedings of the 3rd Canadian Conference on Computer and Robot Vision (CRV'06), IEEE, 2006.
[8] Keping Wang, Xiaojie Wang and Yixin Zhong, "A Weighted Feature Support Vector Machines Method for Semantic Image Classification", 2010 International Conference on Measuring Technology and Mechatronics Automation, 2010.
[9] Heather Dunlop, "Scene Classification of Images and Video via Semantic Segmentation", 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2010.
[10] Om Prakash, Manish Khare, Rajneesh Kumar Srivastava and Ashish Khare, "Multiclass Image Classification using Multiscale Biorthogonal Wavelet Transform", Proceedings of the 2013 IEEE Second International Conference on Image Information Processing (ICIIP-2013), 2013.
[11] Dharmendra Patidar, Nitin Jain and Ashish Parikh, "Performance Analysis of Artificial Neural Network and K Nearest Neighbors Image Classification Techniques with Wavelet Features", 2014 IEEE International Conference on Computer Communication and Systems (ICCCS '14), Chennai, India, Feb. 20-21, 2014.
[12] Chi-Man Pun and Moon-Chuen Lee, "Extraction of Shift Invariant Wavelet Features for Classification of Images with Different Sizes", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 26, No. 9, September 2004.
[13] Chang Jing Shang and Dave Barnes, "Support Vector Machine-Based Classification of Rock Texture Images Aided by Efficient Feature Selection", WCCI 2012 IEEE World Congress on Computational Intelligence, 2012.
[14] Wei-jiu Zhang, Li Mao and Wen-bo Xu, "Automatic Image Classification Using the Classification Ant-Colony Algorithm", 2009 International Conference on Environmental Science and Information Application Technology, 2009.
[15] Loris Nanni, Michelangelo Paci and Sheryl Brahnam, "Indirect Immunofluorescence Image Classification Using Texture Descriptors", Expert Systems with Applications, Elsevier, 2013.
[16] Caijuan Shi, Qiuqi Ruan and Gaoyun An, "Sparse Feature Selection Based on Graph Laplacian for Web Image Annotation", Image and Vision Computing, Elsevier, 2014.
[17] E. Spyrou, G. Stamou, Y. Avrithis and S. Kollias, "Fuzzy Support Vector Machines for Image Classification Fusing MPEG-7 Visual Descriptors", 2nd European Workshop on the Integration of Knowledge, Semantics and Digital Media Technology, 2005 (Ref. No. 2005/11099).
[18] Y. Mao, X. Zhou, D. Pi, Y. Sun and S. T. Wong, "Multiclass Cancer Classification by Using Fuzzy Support Vector Machine and Binary Decision Tree with Gene Selection", Journal of Biomedicine and Biotechnology, 2005.
[19] S. Kavitha and K. K. Thyagharajan, "Features Based Mammogram Image Classification Using Weighted Feature Support Vector Machine", Communications in Computer and Information Science (CCIS), Springer-Verlag Berlin Heidelberg, 2012.
[20] Taoufik Guernine and Kacem Zeroual, "New Fuzzy Multi-Class Method to Train SVM Classifier", DBKDA 2011: The Third International Conference on Advances in Databases, Knowledge and Data Applications, 2011, ISBN 978-1-61208-115-1.
[21] Alistair Shilton and Daniel T. H. Lai, "Iterative Fuzzy Support Vector Machine Classification", IEEE International Fuzzy Systems Conference (FUZZ-IEEE 2007), 2007.
[22] Takuya Inoue and Shigeo Abe, "Fuzzy Support Vector Machines for Pattern Classification", Proceedings of the International Joint Conference on Neural Networks (IJCNN '01), 2001.
[23] S. Kavitha and K. K. Thyagharajan, "A Classification System for Fused Brain Images using Support Vector Machine", International Journal of Applied Engineering Research, ISSN 0973-4562, Vol. 10, No. 8, 2015.
[24] T. Vignesh and K. K. Thyagharajan, "Efficient Classification Methodology for Change Detection using Satellite Imagery", Australian Journal of Basic and Applied Sciences, 9(20): 580-590, 2015.

AUTHORS
G. Kalaiarasi received her B.Tech. degree in Information Technology in 2004 from Srinivasa Institute of Engineering & Technology, affiliated to Madras University. She completed her M.E. in Computer Science & Engineering in 2008 at Dhanalakshmi Srinivasan Engineering College, affiliated to Anna University. She is presently a Ph.D. scholar in Information & Communication Engineering, Anna University, Chennai. Her research areas are Image Processing, Image Retrieval and Data Mining.

Dr. K. K. Thyagharajan obtained his B.E. degree in Electrical and Electronics Engineering from PSG College of Technology (Madras University) and his M.E. degree in Applied Electronics from Coimbatore Institute of Technology in 1988. He also holds a Post Graduate Diploma in Computer Applications from Bharathiar University. He obtained his Ph.D. degree in Information and Communication Engineering (Computer Science) from College of Engineering Guindy, Anna University. He has served at various educational institutions and has twenty-five years of teaching experience. Since September 2012 he has been the Dean (Academic) of R.M.D. Engineering College. He has written 5 books on computing, including "Flash MX 2004", published by McGraw Hill (India), which has been recommended as a text and reference book by universities and polytechnics. He has published more than 60 papers in national and international journals and conferences. He is a grant recipient of the Tamil Nadu State Council for Science and Technology. His biography has been published in the 25th Anniversary Edition of Marquis Who's Who in the World. He has been invited as chairperson and has delivered special lectures at many national and international conferences and workshops. His current research interests are Multimedia Networks, Content Based Multimedia Retrieval, Mobile Computing, e-learning and Image Processing. He is a reviewer for many international journals and conferences. He is a recognized Ph.D. supervisor at Anna University Chennai, MS University, JNTU and Sathyabama University.
