SEMI-SUPERVISED IMAGE CLASSIFICATION IN LARGE DATASETS BY USING RANDOM FOREST AND FUZZY QUANTIFICATION OF THE SALIENT OBJECT

Hager Merdassi, Walid Barhoumi and Ezzeddine Zagrouba
Research Team on Intelligent Systems in Imaging and Artificial Vision (SIIVA), RIADI Laboratory, ISI, 2 rue Abou Rayhane Bayrouni, 2080 Ariana, Tunisia.

ABSTRACT

In this paper, we are interested in semi-supervised image classification in large datasets. The main originality of the proposed technique resides in the fuzzy quantification of the salient object in each image, in order to guide the semi-supervised learning process during classification. Indeed, we detect the salient object in each image using soft image abstraction, which allows the subsequent global saliency cues to uniformly highlight entire salient regions. Then, fuzzy quantification is involved for the purpose of better modelling the belonging of pixels to the salient object in each image. For classification, ensemble projection is used, while training a random forest classifier on labeled images with the learned features in order to classify the unlabeled ones. Experimental results on two challenging large benchmarks show the accuracy and the efficiency of the proposed technique.

Index Terms— Saliency detection, fuzzy quantification, semi-supervised classification, ensemble projection, random forest, large datasets.

1. INTRODUCTION

Problems ranging from object detection to image classification have been two challenging and active fields of research in computer vision. On the one side, object detection is the process of detecting instances of semantic real-world objects of a certain class in images. Detecting such salient regions has benefited a wide range of tasks and applications [1, 2, 3, 4]. On the other side, image classification, especially in the case of large datasets, involves two main ways of performing machine learning. The first one is unsupervised learning, which is closely related to the problem of density estimation in statistics. Unsupervised learning tries to find a hidden structure in unlabeled data and has had some impressive successes. However, it has significant limitations: relying on very general notions to identify patterns and interesting structure in the data limits its performance on any specific task. Moreover, the scarcity of
prior knowledge makes the problem harder and decreases the accuracy. The second one is supervised learning, where data are labeled with predefined classes. A supervised learning algorithm analyzes the training data and produces an inferred function which can be used for mapping new examples. Several methods have been proposed to generalize standard supervised learning: active learning [5], structured prediction, learning to rank [6] and semi-supervised learning [7]. Despite the many applications of supervised learning, labels can be hard to get, because human annotation is slow, tedious and expensive. For these reasons, we use in this work semi-supervised learning (SSL) in order to minimize the classification errors. Our purpose is to increase the performance of image classification while optimizing the computation cost. The proposed technique incorporates four principal steps. The first one consists of detecting the salient regions of each image. Then, we use a fuzzy quantification in order to give more importance to pixels that certainly belong to the salient regions than to those on the object boundaries. After that, semi-supervised image classification based on ensemble projection is involved. For the classification itself, we explore the random forest [8, 9, 10] thanks to its benefits: it is unexcelled in accuracy among current classification algorithms, it runs efficiently on large datasets, it can handle thousands of input variables without variable deletion, and it gives estimates of which variables are important in the classification. The remainder of this paper is organized as follows. The next section focuses on related work on image classification based on semi-supervised learning. In Section 3, we describe the proposed technique based on fuzzy quantification. We show the experimental results in Section 4 and we draw conclusions, with some directions for future work, in Section 5.

2. RELATED WORK

Due to the availability of large quantities of unlabeled data, semi-supervised learning has recently become more
practical for resolving the classification problem and has improved performance in the presence of large volumes of data. Indeed, semi-supervised learning has been incorporated in many specific classification tasks and successfully applied in different domains, including segmentation [11], co-segmentation [12], handwritten digit recognition, text classification, multi-pose facial expression recognition [13], protein function prediction [25], object tracking [14], image retrieval [15], and person identification in multimedia data [16]. In fact, classification methods based on semi-supervised learning are more efficient than those based on supervised or unsupervised learning. In [17], for example, an empirical comparison of several semi-supervised and supervised learning methods shows that semi-supervised learning can achieve better predictive performance than supervised learning. In particular, the performance of supervised learning methods depends especially on the quality of the training set, contrary to semi-supervised learning, which makes use of both labeled and unlabeled data for classification. Some popular semi-supervised learning models include generative models, graph-based methods, self-training, co-training and ensemble projection. Semi-supervised Performance Evaluation (SPE), based on a generative model of the classifier's confidence scores, was proposed in order to estimate classifier performance from few labeled items [18]. In generative models, unlabeled data are used as additional information to perform a better clustering [19]. This can be improved by external knowledge in the form of ranking constraints [20] or by optimizing multiple objectives with a Pareto-optimal solution [21]. Furthermore, there are hybrid methods that combine generative models with discriminative ones [22]. Another major class of methods, often referred to as low-density separation methods, attempts to place boundaries in regions where there are few data points (labeled or unlabeled); one of the most commonly used algorithms is the transductive support vector machine. Otherwise, graph-based methods, as the name implies, operate on a graph structure that represents the data, with a node for each labeled and unlabeled example. The graph may be constructed using domain knowledge or the similarity of examples. Semi-supervised learning methods based on the normalized graph Laplacian [23] and the random walk graph Laplacian [24] have been successfully applied. For example, in order to explore graph-based methods, semi-supervised learning methods based on three graph Laplacians were combined [25]. Furthermore, higher classification accuracy, compared to existing state-of-the-art SSL methods, is obtained using a greedy Max-Cut based SSL [26] when it is applied on both artificial and standard benchmark datasets. Recently, another method based on ensemble projection [27] was proposed, where images are represented by the
concatenation of their projected values onto all the image prototypes.

3. PROPOSED TECHNIQUE

As shown in Fig. 1, the proposed technique begins by detecting the salient object in each image of the input dataset. This is done using global cue estimation, which produces two complementary saliency maps. The first one is the global uniqueness GUuniq. The uniqueness Uuniq(ci) of a global component ci (1) is defined as its weighted color contrast to all the other components [3].
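A plausible form of (1), consistent with the weighted color contrast described in [3] and with the variable definitions below (the exact spatial weighting may differ from the original formulation), is:

U_{uniq}(c_i) = \sum_{j \neq i} \exp\left(\frac{-d^2(c_i, c_j)}{\sigma^2}\right)\, \omega_{c_j}\, \left\lVert \mu_{c_i} - \mu_{c_j} \right\rVert \qquad (1)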
where d(ci, cj) represents the spatial distance between the centroids of the two Gaussian Mixture Model (GMM) components ci and cj. The standard deviation σ allows distant regions to also contribute to the global uniqueness (in our case, σ² = 0.4). ωc and μc represent respectively the weight and the mean color of the c-th component.
Fig. 1. Outline of the proposed technique.
The second saliency map is the color spatial distribution (CSD) of a clustered component Cc (2), and it is defined as follows:
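A plausible form of (2) and (3), consistent with the color spatial distribution cue of [3] (a component whose color is spread over the whole image receives low saliency), up to the exact normalization, is:

CSD(C_c) \propto 1 - V(C_c), \qquad V(C_c) = V_h(C_c) + V_v(C_c) \qquad (2), (3)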
where V(Cc) is the spatial variance (3) of the clustered component Cc, and Vh(Cc) and Vv(Cc) denote respectively its horizontal and vertical spatial variances. These two cues, GUuniq and CSD, allow identifying automatically which one correctly detects the salient object. Then, the second step consists in the fuzzy quantification, in order to give more importance to pixels that certainly belong to the salient region. To do that, for each saliency image SI of size N×M, we define the quantified image QI as follows (4):
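As an illustration only, a standard S-shaped fuzzy membership function applied to the saliency values SI(x, y) produces the intended behaviour (pixels with clearly high saliency are kept close to 1, low-saliency pixels are driven to 0); the thresholds a and b below are hypothetical and are not taken from the original formulation:

QI(x, y) =
\begin{cases}
0 & \text{if } SI(x, y) \le a \\
2\left(\frac{SI(x, y) - a}{b - a}\right)^{2} & \text{if } a < SI(x, y) \le \frac{a + b}{2} \\
1 - 2\left(\frac{SI(x, y) - b}{b - a}\right)^{2} & \text{if } \frac{a + b}{2} < SI(x, y) < b \\
1 & \text{if } SI(x, y) \ge b
\end{cases}
\qquad (4)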
This results in a saliency image with fuzzy pixel values, which maintains the pixels with high values (≈1) that construct the salient region and ignores the other pixels (=0). As shown in Fig. 2, this procedure enhances the object detection. In fact, the fuzzy quantification of the salient object offers a better modelling of the uncertainty and ambiguity concerning the belonging of a pixel to the object of interest.
After computing the features of each image, namely Local Binary Patterns (LBP), the Pyramid of Histograms of Oriented Gradients (PHOG) and GIST [27], we explore all the available data D = Dl ∪ Dunl (labeled and unlabeled data). First of all, we construct a prototype set Pset. In order to define prototypes that are inter-distinct and intra-compact, we use max-min sampling [27]. Then, the image prototype sets are used to build a new image representation through ensemble projection: we sample an ensemble of diverse prototype sets from all the known images and learn classifiers on them as projection functions. Using these functions, the images are projected to obtain the new representation. For classification, we train a random forest classifier on the labeled images with the learned features in order to classify the unlabeled ones. In fact, a random forest consists of a collection of tree-structured classifiers {H(x, Θl), l = 1, …, T}, where the Θl are independent identically distributed random vectors and each of the T trees votes for the most popular class at input x [8]. A minimal sketch of this classification step is given below.
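The following sketch is not the authors' implementation: it assumes precomputed descriptors, uses scikit-learn, and replaces the max-min prototype sampling of [27] with plain random sampling to keep it short; all names (ensemble_projection, classify_unlabeled, n_sets, n_prototypes, n_trees) are hypothetical.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier


def ensemble_projection(X, n_sets=30, n_prototypes=40, seed=0):
    """Project every image onto classifiers trained on sampled prototype sets.

    X: (n_images, n_features) array of descriptors (e.g. PHOG/LBP/GIST)
       computed on ALL images, labeled and unlabeled.
    """
    rng = np.random.default_rng(seed)
    projections = []
    for _ in range(n_sets):
        # The paper relies on max-min sampling to obtain inter-distinct,
        # intra-compact prototypes; random sampling is only a stand-in here.
        idx = rng.choice(len(X), size=n_prototypes, replace=False)
        pseudo_labels = np.arange(n_prototypes)        # one pseudo-class per prototype
        clf = LogisticRegression(max_iter=1000).fit(X[idx], pseudo_labels)
        projections.append(clf.predict_proba(X))       # projection scores for all images
    return np.hstack(projections)                      # concatenated new representation


def classify_unlabeled(X, y_labeled, labeled_idx, unlabeled_idx, n_trees=20):
    """Train a random forest on the projected labeled images and predict the rest."""
    Z = ensemble_projection(X)
    rf = RandomForestClassifier(n_estimators=n_trees, random_state=0)
    rf.fit(Z[labeled_idx], y_labeled)
    return rf.predict(Z[unlabeled_idx])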
4. EXPERIMENTS AND EVALUATIONS

In order to evaluate the proposed technique, this section is devoted to a comparative study with an efficient state-of-the-art technique [27]. This technique is the most similar to the suggested one, since it also addresses the semi-supervised classification problem through ensemble projection. Two challenging large datasets of color images are tested: "Caltech_101" and "Flowers". In Table 1, we report the accuracy metrics AM obtained respectively by [27] and by the proposed technique. The AM values are computed on a sample of 3000 images from the first database, "Caltech_101", and we train the same classifiers as in [27]: SVM (Support Vector Machine), LR (Logistic Regression) and KNN (K-Nearest Neighbor). We note the stability of the proposed technique, since its AM scores are higher than or equal to those of [27] for each feature (PHOG, LBP and GIST), as well as for the concatenation of all of them (ALL). Consequently, we conclude that the fuzzy quantification enhances the quantified saliency region of each image and thus improves the classification accuracy.
Fig. 2. Examples of saliency detection and fuzzy quantification results: (a) original images, (b) saliency detection results, (c) fuzzy quantification results.
Classifier   Features   Proposed Technique   [27]
SVM          PHOG       0.3                  0.2
SVM          LBP        0.4                  0.4
SVM          GIST       0.4                  0.4
SVM          ALL        0.5                  0.4
LR           PHOG       0.3                  0.2
LR           LBP        0.4                  0.4
LR           GIST       0.4                  0.4
LR           ALL        0.5                  0.4
KNN          PHOG       0.2                  0.2
KNN          LBP        0.3                  0.3
KNN          GIST       0.3                  0.3
KNN          ALL        0.4                  0.4

Table 1. Objective evaluation: comparison of the accuracy metrics AM obtained by the proposed technique against [27] on the database "Caltech_101".
Besides, we assess the proposed technique by comparing the performance of the different classifiers used in [27] with ours on the second database, "Flowers", which contains 1360 images. We remark that we successfully enhance the accuracy metrics both for the same classifiers used in [27] and for the random forest (RF) used in the proposed technique. For instance, the AM score obtained by the random forest classifier equals 0.94 against 0.26 for SVM; in this case, five labeled training examples per class were used. We also note that the classifiers applied within the proposed technique (SVM_PT, LR_PT, KNN_PT, RF_PT) reach higher values than the others (Fig. 3). As a result, we confirm the aforementioned benefits of random forest, mainly its accuracy and its efficient running on large datasets.
Fig. 3. Objective evaluation: comparison of the performance of the proposed technique, with different classifiers, on the database "Flowers" while testing various features (PHOG, LBP, GIST and concatenation of them).
Table 2 shows the classification accuracy on the dataset "Flowers" and the computation time for every classifier. We can clearly deduce that SVM has the highest execution time, equal to 23 seconds, whereas random forest is the fastest, since it runs in just 2 seconds. All results in this paper were obtained on an Intel® Core™ i5-2410M CPU system with 4 GB of RAM, running MATLAB R2012a under Ubuntu 12.10 64-bit. In fact, the computation time was optimized thanks to the use of a low number of trees and of a low number of randomly selected samples used to grow each tree. Fig. 4 shows the execution time of random forest according to the number of trees nTrees in the ensemble (Fig. 4(a)) and to the number of randomly selected samples nsamtosample (Fig. 4(b)). We note a remarkable growth of the execution time as nTrees increases, and only a slight variation when nsamtosample grows, while the classification accuracy varies very little in both cases. For this reason, we opted for low values of nTrees and nsamtosample in order to optimize the computation cost.

Classifier   Features   Accuracy   CPU (secs)
SVM          PHOG       0.22       23.73
SVM          LBP        0.16       19.24
SVM          GIST       0.12       18.20
SVM          ALL        0.20       21.40
LR           PHOG       0.24       17.11
LR           LBP        0.15       17.84
LR           GIST       0.12       17.60
LR           ALL        0.22       17.13
KNN          PHOG       0.17        7.34
KNN          LBP        0.12        6.53
KNN          GIST       0.11        6.52
KNN          ALL        0.15        6.37
RF           PHOG       0.94        2.32
RF           LBP        0.94        2.57
RF           GIST       0.93        2.45
RF           ALL        0.94        2.46

Table 2. Classification accuracy and execution time obtained by the proposed technique on the database "Flowers".
Fig. 4. Execution time of random forest according to: (a) number of trees, (b) number of randomly selected samples to use to grow each tree.
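As an illustration of this accuracy/time trade-off (scikit-learn is used here rather than the authors' MATLAB implementation, and the values are hypothetical), nTrees and nsamtosample map roughly onto the n_estimators and max_samples parameters of a random forest:

from sklearn.ensemble import RandomForestClassifier

# Few trees grown on small bootstrap samples: fast training,
# with only a small loss of accuracy in this setting.
rf_fast = RandomForestClassifier(n_estimators=10, max_samples=0.3, bootstrap=True)

# Many trees grown on full bootstrap samples: noticeably slower,
# for a comparable classification accuracy.
rf_slow = RandomForestClassifier(n_estimators=500)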
5. RELATION TO PRIOR WORK AND CONCLUSION

The work presented here has focused on semi-supervised classification based on fuzzy quantification, which takes advantage of the properties of the random forest. The work presented in [27] computes features directly from the original images. In contrast, in our work we first detect the salient object [3] and, thanks to the fuzzy quantification, we successfully enhance the detection of this object. Then, we make use of the ensemble projection of [27] and we train a random forest classifier with the learned features, which enhances the classification accuracy while being faster than the classifiers used in [27]. As future work, we plan to investigate other features while adopting the same proposed SSL model in order to select the most accurate ones.
6. REFERENCES

[1] H.J. Seo and P. Milanfar, "Visual saliency for automatic target detection, boundary detection, and image quality assessment", ICASSP, IEEE, Dallas, pp. 5578−5581, March 2010. [2] M.M. Cheng, G.X. Zhang, N.J. Mitra, X. Huang and S.M. Hu, "Global contrast based salient region detection", CVPR, IEEE, Providence, pp. 409−416, June 2011. [3] M.M. Cheng, J.W. Wen, Y.L. Shuai, Z.V. Vineet and N. Crook, "Efficient salient region detection with soft image abstraction", ICCV, IEEE, Sydney, pp. 1529−1536, December 2013. [4] M.M. Cheng, N.J. Mitra, X. Huang, P.H.S. Torr and S.M. Hu, "Salient object detection and segmentation", TPAMI, IEEE, 2014. [5] P. Jain and A. Kapoor, "Active learning for large multi-class problems", CVPR, IEEE, Miami, pp. 762−769, June 2009. [6] M.N. Volkovs, H. Larochelle and R.S. Zemel, "Learning to rank by aggregating expert preferences", CIKM, New York, pp. 843−851, October 2012. [7] R. Fergus, Y. Weiss and A. Torralba, "Semi-supervised learning in gigantic image collections", NIPS, Canada, pp. 522−530, December 2009. [8] L. Breiman, "Random forests", Machine Learning, pp. 5−32, 2001. [9] C. Leistner, A. Saffari, J. Santner and H. Bischof, "Semi-supervised random forests", ICCV, IEEE, Kyoto, pp. 506−513, September 2009. [10] X. Liu, M. Song, D. Tao and Z. Liu, "Semi-supervised node splitting for random forest construction", CVPR, IEEE, Portland, pp. 492−499, June 2013. [11] J. Xu, C. Sch, X. Chen and X. Huang, "Interactive image segmentation by semi-supervised learning ensemble", KAM, IEEE, Wuhan, China, pp. 645−648, December 2008. [12] Z. Wang and R. Liu, "Semi-supervised learning for large scale image cosegmentation", ICCV, IEEE, Sydney, Australia, pp. 1550−5499, December 2013. [13] B. Jiang and K. Jia, "Semi-supervised facial expression recognition algorithm on the condition of multi-pose", JIHMSP, New York, pp. 17−22, 2013. [14] Y.Y. Guang and S.M. Shah, "Semi-supervised learning of feature hierarchies for object detection in a video", CVPR, Portland, pp. 1650−1657, June 2013.
[15] K. Zhao, W. Liu and J. Liu, "Optimal semi-supervised metric learning for image retrieval", ACM, New York, pp. 893−896, October 2012. [16] M. Bäuml, M. Tapaswi and R. Stiefelhagen, "Semi-supervised learning with constraints for person identification in multimedia data", CVPR, IEEE, Portland, pp. 3602−3609, June 2013. [17] L. Jurica, D. Sašo, S. Fran and Š. Tomislav, "Semi-supervised learning for quantitative structure-activity modeling", STJ, pp. 173−179, June 2013. [18] P. Welinder, M. Welling and P. Perona, "A lazy man's approach to benchmarking: semi-supervised classifier evaluation and recalibration", CVPR, IEEE, Portland, pp. 3262−3269, June 2013. [19] X. Huang, H. Cheng, J. Yang, J.X. Yu, H. Fei and J. Huan, "Semi-supervised clustering of graph objects: a subgraph mining approach", DASFAA, Springer, South Korea, pp. 197−212, April 2012. [20] E. Ben Ahmed, A. Nabli and F. Gargouri, "SHACUN: Semi-supervised hierarchical active clustering based on ranking constraints", ICDM, Springer, Berlin, Germany, pp. 194−208, July 2012. [21] J. Ebrahimi and M.S. Abadeh, "Semi-supervised clustering: a Pareto approach", MLDM, Springer, Berlin, pp. 237−251, July 2012. [22] G. Druck and A. McCallum, "High-performance semi-supervised learning using discriminatively constrained generative models", ICML, IEEE, Israel, pp. 319−326, June 2010. [23] D. Zhou, O. Bousquet, T.N. Lal, J. Weston and B. Schölkopf, "Learning with local and global consistency", NIPS, MIT Press, Cambridge, pp. 321−328, December 2004. [24] X. Zhu and Z. Ghahramani, "Learning from labeled and unlabeled data with label propagation", Technical Report, CMU-CALD, Pittsburgh, June 2002. [25] L. Tran, "Application of three graph Laplacian based semi-supervised learning methods to protein function prediction problem", IJBB, India, pp. 11, June 2013. [26] J. Wang, T. Jebara and S.F. Chang, "Semi-supervised learning using greedy max-cut", JMLR, pp. 771−800, January 2013. [27] D. Dai and L.V. Gool, "Ensemble projection for semi-supervised image classification", ICCV, IEEE, Sydney, pp. 2072−2079, December 2013.