Building effective SVM concept detectors from clickthrough data for large-scale image retrieval

Ioannis Sarafis · Christos Diou · Anastasios Delopoulos

The final publication is available at Springer via http://dx.doi.org/10.1007/s13735-015-0080-5

Abstract Clickthrough data is a source of information that can be used for automatically building concept detectors for image retrieval. Previous studies, however, have shown that in many cases the resulting training sets suffer from severe label noise that has a significant impact on the SVM concept detector performance. This paper evaluates and proposes a set of strategies for automatically building effective concept detectors from clickthrough data. These strategies focus on (i) automatic training set generation, (ii) assignment of label confidence weights to the training samples and (iii) using these weights at the classifier level to improve concept detector effectiveness. For training set selection, and in order to assign weights to individual training samples, three Information Retrieval (IR) models are examined: Vector Space Models, BM25 and Language Models. Three SVM variants that take into account importance at the classifier level are evaluated and compared to the standard SVM: the Fuzzy SVM, the Power SVM, and the Bilateral-weighted Fuzzy SVM. Experiments conducted on the MSR–Bing Grand Challenge dataset (consisting of 1M images and 82.3M unique clicks) for 40 concepts demonstrate that (i) on average, all weighted SVM variants are more effective than the standard SVM, (ii) the Vector Space Model produces the best training sets and best weights, (iii) the Bilateral-weighted Fuzzy SVM produces the best results but is very sensitive to weight assignment and (iv) the Fuzzy SVM is the most robust training approach for varying levels of label noise.

I. Sarafis · C. Diou · A. Delopoulos
Multimedia Understanding Group, Electrical and Computer Engineering Department, Aristotle University of Thessaloniki, Greece

Keywords Clickthrough data · Concept based image retrieval · SVM · Fuzzy SVM · Power SVM · Bilateral-weighted Fuzzy SVM

1 Introduction Support Vector Machines have been commonly used for building concept detectors for image and video retrieval [26]. Like any other supervised learning algorithm, SVM relies on annotated image examples for concept detector training. Hand-labeling of concept detector training data, however, is impractical for large-scale image retrieval systems. This is not only due to the number of concepts required to effectively cover the user needs, but also due to the frequent updating required as concepts change or as new or ad hoc concepts emerge. Thus, creating concept detector training sets by other means, rather than hand-labeled annotations, is essential for the successful application of concept based retrieval to large scale image collections. Clickthrough data harvested by the search logs of multimedia search engines can be used for this purpose. These consist of clicked images after user-submitted queries and can provide weak indications of image relevance with respect to a concept and therefore can be used to automate concept detector training. One important advantage of clickthrough data compared to other sources of information (such as tags) is that it is almost effortless for image search engines to collect them without adding any overhead to the users. On the other hand, their direct usage for labeling can lead to annotations that suffer from noise, i.e., errors in training label assignment [30]. Such errors in training set labels have a significant impact on the performance of SVM-based concept detectors. Our previous work has
shown that it is feasible to train concept detectors using visual features and SVM training sets created automatically from query logs [31]. In [25] we showed that the use of Fuzzy Support Vector Machines (FSVM) in combination with a method for calculating weights from Language Models led to significant improvement compared to concept detectors based on the standard SVM. Experiments were conducted on a dataset of approximately 130,000 images with clickthrough data collected from a professional image search engine. The results indicated significant relative performance gains of Fuzzy SVM for increasing levels of label noise in the training sets. In [24] we explored the use of different IR models for automatically training noise-resilient SVM-based concept detectors using weighted SVM variants and weights derived from different IR models. The experimental results indicated that weights derived from Vector Space Models produced the best performance. In this paper, we further extend these works by proposing and evaluating a range of training set generation, sample weighting and classifier training strategies for building robust SVM image concept detectors from clickthrough data. More specifically, we first evaluate three IR models for image scoring: Vector Space Models (VSM), BM25 and Language Models (LM). The scores are then used for training set selection and weight generation for multiple weighted SVM variants, including the Fuzzy SVM (FSVM), the Power SVM (PSVM), and the Bilateral-weighted Fuzzy SVM (BFSVM). Experiments for 40 concepts on the MSR–Bing Grand Challenge dataset, consisting of 1M images and 82.3M unique clicks, demonstrate the performance gains of each of the proposed strategies. Furthermore, in this work, we perform an experimental analysis of the three different IR models in order to explain the observed differences between the effectiveness of concept detectors. This set of experiments leads to useful insights about the properties that a good scoring method should display. Finally, we complete our analysis with experiments where the noise level of the candidate positive training samples is artificially controlled, allowing us to draw conclusions on the robustness of the different SVM variants.

The paper is organized as follows. Section 2 presents an overview of related work on clickthrough data from image search engines and on the use of weighted SVM variants in pattern recognition problems, while Section 3 provides an overview of the proposed automatic concept detector training strategies. Sections 4 and 5 present an overview of the employed IR models and SVM algorithms, respectively, while Sect. 6 presents the experimental evaluation and analysis of results. Finally, Sect. 7 concludes the paper, summarizing the proposed approach and the main experimental findings.

2 Related Work This section presents an overview of research work related to the exploitation of clickthrough data and the applications of weighted SVM variants for robust classifier training under uncertainty.

2.1 Clickthrough data

Clickthrough data provide an implicit indication of the relevance of the results returned by search engines to the user-submitted queries and have been successfully used to model user behavior [4, 5], identify meaningful user groups [29] and predict document relevance [3, 18]. The use of clickthrough data for building concept detectors for image retrieval was introduced in [31]. The dataset was collected from a professional image search engine and multiple strategies for training set selection were evaluated, including the use of Language Models examined in this paper. Our work was further extended in [25], where the scores calculated by Language Models were used to produce weights for SVM-based classifiers. In [24] we proposed techniques for robust SVM-based concept detectors based on automatic training set creation and sample weight generation for weighted SVM variants. Recently, large datasets of clickthrough data from a commercial image search engine became publicly available [9], enabling new research in this field. The first methods to assign relevance for new images given a query were based on the visual similarity of the images or the textual similarity of the queries with the ones present in the search logs [32]. In [37] a graph-based image re-ranking method is used, which exploits the clicked images to find similar images and improve the retrieved results, and in [17] a combination of search queries and image mappings in latent subspaces is proposed to produce ranking mechanisms. The authors in [39] propose a re-ranking algorithm that employs clickthrough data for adaptive learning of query-dependent fusion weights for multiple modalities. In [11] a Gaussian Process (GP) regressor is trained to predict the normalized click counts of the images. The output of the GP regressor is then used, in combination with the original ranking scores produced by the search engine, to re-rank the retrieved list. The proposed method addresses the problem of incorporating visual features using training data that suffer from label noise, while producing more diverse results than a static ranker. In [34] special emphasis is given to handling named entities (e.g. celebrities), which occupy a large portion of the dataset and the search traffic, while random walk propagation models are used for general queries. Approaches based on ensembles of scoring methods such as SVMs, k-NN, and deep neural networks for assigning image relevance have also been proposed [6, 16]. In [16] it is remarked that SVM concept detectors can suffer from label noise present in the training sets, while other techniques, such as graph-based label propagation, could tackle this problem. This is caused by the fact that the quality of the selected positive images is ignored during training set creation; i.e., the images are considered positive examples with the same confidence no matter how many times they were clicked for the query at hand. Although SVM concept detectors have proved to be useful for image retrieval from clickthrough data, there is no work on how to treat the individual training sample quality at the classifier level in order to boost performance.

2.2 Weighted SVM variants

SVM variants that provide weighting schemes for individual samples have been widely used for a variety of problems. FSVM was initially proposed as a variant to handle the effect of outliers in the training set [8, 13] and has been extensively used in many fields, such as classification of medical images [27, 28, 36], improving the performance of multiclass classification problems [10], and managing relevance feedback in Content Based Image Retrieval systems [15, 19, 35]. The latter model the explicitly provided user feedback with FSVM in order to improve the retrieved results, and FSVM-based classifiers are re-trained iteratively until the user is satisfied. Based on the FSVM weighting scheme, BFSVM was proposed as a variant that better captures the label importance of the training samples. This is particularly useful in problems where the training samples cannot be clearly assigned, or should not be assigned, to a single class [33]. One example of BFSVM use is the problem of credit risk classification in economics [12, 33]. On the other hand, the discriminative ability of the training samples is the key idea behind PSVM [38]. The discriminative ability is a result of the position of the classes in the feature space and reflects how well a sample can represent its class. For multiclass image classification problems, PSVM displayed better performance relative to FSVM and SVM classifiers.


3 Building effective SVM-based concept detectors from noisy clickthrough data

This section outlines the proposed approach for training effective SVM-based concept detectors from noisy clickthrough data. Figure 1 illustrates the proposed strategies. Clickthrough data from image search engines consist of triples of the form (Ii, Qj, Ni,j), denoting that image Ii was clicked Ni,j times after users submitted query Qj. Initially we construct a textual representation Ti for each image Ii by concatenating all queries Qj for which the image was clicked. Repetitions are allowed, so each query Qj is repeated Ni,j times. Figure 1 displays examples of the textual representations constructed using clickthrough data as the second step of the proposed strategies. The resulting textual representations can be used to determine the relevance of an image to a concept. Given a concept c, we could select candidate positive images for the concept by searching for text documents Ti containing the terms of the concept name. Alternatively, we could select the images that have been clicked at least once for the concept name. These approaches, though, have a major drawback: the importance of each positive image is lost. To compensate for this effect, we treat the concept name c as a text query q and apply IR techniques to assign a degree of relevance to the textual representations Ti for concept c. Three well-known IR relevance methods are used: Vector Space Models, BM25, and Language Models. The output of the relevance methods is a list of scored images where, in our context, the scores reflect the importance of each sample for the training procedure. In the example, the textual representations are scored for the concept “windmill” and a scored list for each relevance method is produced. Important samples with many (relevant) clicks are scored with high values, whereas irrelevant images (i.e., images whose textual representation does not contain the concept name at all) are given a zero score. Section 4 presents the IR models and the procedure for assigning scores in detail. After scoring the images of the development set, we construct training sets Sc,m for each concept c and relevance method m. The training sets Sc,m contain the positive images along with the scores calculated by relevance method m. A fixed number of negative training samples are randomly selected among the development set images with zero relevance (according to m). It is important to note that, by construction, the training sets Sc,m contain the same images across the relevance methods m for a given concept c; only the scores of the positive samples differ.
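As an illustration of this first step, the following minimal Python sketch (function and variable names are ours, not from the paper) builds the textual representations Ti from clickthrough triples:

```python
from collections import defaultdict

def build_textual_representations(triples):
    """Build the textual representation T_i of each image from clickthrough
    triples (image_id, query, click_count). Each query is repeated as many
    times as it was clicked, as described above."""
    texts = defaultdict(list)
    for image_id, query, clicks in triples:
        texts[image_id].extend([query] * clicks)
    # Join the repeated queries into one pseudo-document per image.
    return {image_id: " ".join(queries) for image_id, queries in texts.items()}

# Example with the kind of data shown in Fig. 1 (hypothetical values):
triples = [(570272, "dutch windmill", 1), (570272, "european windmills", 2),
           (613595, "bicycle path", 2), (17507, "windmill", 24)]
T = build_textual_representations(triples)
```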


Fig. 1: An overview of the proposed approach for training set generation from clickthrough data. A textual representation Ti is constructed using the queries for which image Ii was clicked. The images are then scored against a selected concept c using IR relevance models. The scores are used for training set selection and training weight generation for the weighted SVM variants.

Next, SVM-based concept detectors are built from the training sets Sc,m using the standard SVM algorithm, which serves as the baseline, and three SVM variants: Fuzzy SVM, Power SVM and Bilateral-weighted Fuzzy SVM. The SVM variants incorporate weights for the training samples, providing the ability to model the training sample importance at the classifier level. The weights are generated from the scores in order to best model the importance of the samples for each SVM variant. Section 5 presents a theoretical analysis of the SVM variants and the procedure followed to generate appropriate weights from the scores assigned by the IR models.

4 Relevance Methods This section outlines the three relevance methods that are used to assign scores to images based on their textual representations Ti .

4.1 Vector Space Models

The Vector Space Model (VSM) [14, 22] is a popular scoring method commonly used by commercial search engines. In VSM a document d is represented as a vector V(d) = (w_1, w_2, ..., w_T), where T is the total number of terms present in the document collection D. The terms that are not in the document d are given zero weight. For the terms present in d, weights are assigned according to the tf-idf weighting scheme:

w_t = tf_{t,d} \cdot idf_t    (1)

where the term frequency tf_{t,d} is the number of occurrences of term t in document d and idf_t is the inverse document frequency of term t:

idf_t = \log \frac{|D|}{|\{d' \in D : t \in d'\}|}    (2)

|D| is the number of documents in the collection and |\{d' \in D : t \in d'\}| the number of documents in which the term t appears. A query q is treated as a vector similarly to the document vectors, V(q) = (w_1, w_2, ..., w_T). Then, the documents d are scored for query q using the cosine similarity:

score(q, d) = \cos\theta = \frac{V(d) \cdot V(q)}{|V(d)|\,|V(q)|}    (3)

The VSM scoring formula (and similarly the other scoring formulas presented next) can be easily used for our purposes: setting q = c and d = Ti in Eq. 3 identifies the positive images of the development set along with a relevance score for each one. Notice that VSM, as well as BM25 presented next, assigns positive scores to documents d even if they do not contain all the terms of the query q. For text retrieval or document similarity problems this may not be an issue; however, in our case, it can lead to unacceptable levels of noise. For example, for the concept “great wall” an overwhelming number of irrelevant images would be retrieved as candidate positive samples. To this end, we apply the relevance methods m only to images with documents Ti that contain all the query (concept) terms. The rest of the images are considered negative for the concept, i.e., they are assigned a zero relevance score.
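The following sketch illustrates how Eqs. 1–3 and the all-terms filter described above could be applied to the textual representations. It is a simplified illustration (whitespace tokenization, no stemming), not the exact implementation used in the paper; BM25 and Language Model scores can be produced analogously from the same term statistics.

```python
import math
from collections import Counter

def vsm_scores(concept_terms, documents):
    """Score image pseudo-documents T_i against a concept using tf-idf
    weights (Eqs. 1-2) and cosine similarity (Eq. 3). Documents missing
    any concept term receive a zero score, as described above."""
    N = len(documents)
    df = Counter()
    for doc in documents.values():
        df.update(set(doc.split()))
    idf = {t: math.log(N / df[t]) for t in df}

    def tfidf_vector(tokens):
        tf = Counter(tokens)
        return {t: tf[t] * idf.get(t, 0.0) for t in tf}

    def cosine(u, v):
        dot = sum(u[t] * v.get(t, 0.0) for t in u)
        nu = math.sqrt(sum(w * w for w in u.values()))
        nv = math.sqrt(sum(w * w for w in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0

    q_vec = tfidf_vector(list(concept_terms))
    scores = {}
    for image_id, doc in documents.items():
        tokens = doc.split()
        if not set(concept_terms) <= set(tokens):
            scores[image_id] = 0.0          # negative: missing a concept term
        else:
            scores[image_id] = cosine(tfidf_vector(tokens), q_vec)
    return scores
```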


4.2 BM25

The BM25 weighting scheme, also known as Okapi weighting, is a scoring method based on the Probabilistic Relevance Framework (PRF) and is the best-known model from the family of models derived from the PRF [14, 20, 21]. The scores of BM25 are called Retrieval Status Values (RSV) and are given by the formula:

score(q, d) = RSV = \sum_{t \in q} \log \frac{|D|}{|\{d' \in D : t \in d'\}|} \cdot \frac{(k_1 + 1)\, tf_{t,d}}{k_1 \left( (1 - b) + b \frac{L_d}{L_{ave}} \right) + tf_{t,d}}    (4)

where tf_{t,d} is the frequency of term t in document d, L_d is the length of document d and L_{ave} is the average document length of the collection. The variables k_1 and b are free parameters of the model and their values can be subject to optimization, which can be done either manually or using grid search methods and cross-validation schemes [14]. The variable k_1 tunes the document term frequency scaling, whereas the variable b (b ∈ [0, 1]) tunes the document length scaling. In absence of optimization, k_1 is usually set in the range [1.2, 2] (here k_1 = 1.2) and b = 0.75.

4.3 Language Models

Language models [7] treat a query q as a sequence of k binary random variables q_i, one for each term of the query. The documents are scored according to the query likelihood:

score(q, d) = P(q | \phi_d) = P(q_1, q_2, ..., q_k | \phi_d) = \prod_{i=1}^{k} P(q_i | \phi_d)    (5)

where \phi_d is the language model inferred from the document d. Equation 5 implies that the query likelihood is calculated under the assumption that the query terms q_i are generated independently. The values of P(q_i | \phi_d) are computed using the maximum likelihood estimate (MLE); if a term t occurs in a document d with frequency tf_{t,d} and the length of the document is \sum_t tf_{t,d}, then the maximum likelihood estimate is

P_{mle}(t | \phi_d) = \frac{tf_{t,d}}{\sum_t tf_{t,d}}    (6)

5 Support Vector Machines

This section provides an overview of the SVM algorithm and the weighted variants used in this paper. We focus on (i) how the individual sample weights are handled by each of the algorithms and (ii) the proposed procedure for generating appropriate weights from the image relevance scores.

5.1 SVM

Consider the binary classification problem with a training set of size N, S = {(x_i, y_i)}, i = 1, ..., N, where x_i are the feature vectors and y_i ∈ {−1, +1} are the labels of the training samples. The solution of the SVM algorithm is the optimal hyperplane w^T x + b = 0 which separates the two classes with the maximum margin 2/||w||. A test sample x is classified according to which side of the separating hyperplane it lies on, i.e., it is assigned to class sign(w^T x + b). The signed distance of the test sample from the separating hyperplane is:

f(x; w, b) = w^T x + b    (7)

In our context, we use Eq. 7 to rank new images with respect to a concept. If the training set S consists of l positive samples (y_i = +1, i = 1, ..., l) and k negative samples (y_i = −1, i = l + 1, ..., l + k, N = l + k) then the optimization problem of SVM is formulated as:

\min \; \frac{1}{2} w^T w + C^+ \sum_{i=1}^{l} \xi_i + C^- \sum_{i=l+1}^{N} \xi_i    (8)

s.t. \; y_i (w^T x_i + b) \geq 1 - \xi_i, \quad i = 1, ..., N    (9)

\xi_i \geq 0, \quad i = 1, ..., N    (10)

Minimizing the term \frac{1}{2} w^T w results in a wider margin. The variables \xi_i are called slack variables and are the misclassification penalties for the training samples x_i that violate the margin. They are added to the problem formulation to allow solutions for non-separable problems (i.e., when the classes in S cannot be perfectly separated by a hyperplane). As we will see in the subsequent sections, the employed SVM variants introduce weights
that modify the effect of the slack variables according to the importance of the training samples. The parameters C^+ and C^- are constants that scale the misclassification error (Eq. 8) for the positive and the negative samples, respectively. In general, the number of samples of each class is different (typically l < k) and their values can be used to balance the classification effort between the two classes. For example, in problems where there are fewer positive samples than negative ones in the training set, we would set a high value for C^+ relative to C^- in order to compensate for a small value of the misclassification penalty \sum_{i=1}^{l} \xi_i of the positive class. Furthermore, the values of C^+ and C^- tune the trade-off between a wider margin and a larger training error; larger values for C^+ and C^- lead to a smaller margin and fewer training errors, and vice versa. The values of the constants are typically set via a cross-validation procedure; however, in our case, cross-validation was not useful due to high label noise. To this end, the values of C^+ and C^- were set according to the prior p(c) of the concept c in the training set Sc,m as: C^+ = 1 − p(c) and C^- = p(c). The linear SVM model of Eq. 7 can directly be extended to use a nonlinear separating hyperplane by replacing the inner products that appear in Eq. 8 and Eq. 9 with a kernel K(x_i, x_j) = ⟨φ(x_i), φ(x_j)⟩ [1]. The same extension can be applied to all SVM variants used in this work.
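As an illustration of this class-balancing rule, the sketch below trains a baseline detector with a precomputed χ2 kernel and class penalties C+ = 1 − p(c), C− = p(c). It uses scikit-learn as a stand-in for the LIBSVM-based setup of the paper, so the library and parameter names are illustrative assumptions rather than the authors' exact configuration.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import chi2_kernel

def train_baseline_detector(X, y):
    """X: non-negative bag-of-visual-words features, y in {-1, +1}.
    Class penalties follow the rule above: C+ = 1 - p(c), C- = p(c),
    where p(c) is the prior of the concept in the training set."""
    p_c = np.mean(y == 1)
    K = chi2_kernel(X, X)                    # chi-squared kernel matrix
    clf = SVC(kernel="precomputed", C=1.0,
              class_weight={1: 1.0 - p_c, -1: p_c})
    clf.fit(K, y)
    return clf

# Ranking new images (Eq. 7): the signed decision value orders the test set.
# scores = clf.decision_function(chi2_kernel(X_test, X_train))
```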

5.2 Fuzzy SVM

FSVM modifies the effect of the slack variables in the objective function (Eq. 8) by introducing weights indicative of the importance of each sample. This is achieved by the following FSVM formulation:

\min \; \frac{1}{2} w^T w + C^+ \sum_{i=1}^{l} w_i^F \xi_i + C^- \sum_{i=l+1}^{N} w_i^F \xi_i    (11)

s.t. \; y_i (w^T x_i + b) \geq 1 - \xi_i, \quad i = 1, ..., N    (12)

\xi_i \geq 0, \quad i = 1, ..., N    (13)

The weights w_i^F reflect the importance of each training sample and higher weight values are assigned to the more prominent ones. This way, misclassification of important samples results in increased penalties, making these samples less likely to be misclassified during training. For automatic concept detector training, we calculate the weights w_i^F from a linear scaling of the scores assigned by the relevance method m. The normalization range is [s, 1], where s is a small positive value needed for the stability of the optimization problem (here s = 0.1). For the randomly selected negative samples, since no information about their quality can be inferred, we set the weights to 1.
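A minimal sketch of this weight-generation step, assuming a min–max scaling of the positive scores (the exact normalization used is not fully specified in the text):

```python
import numpy as np

def fsvm_weights(pos_scores, n_neg, s=0.1):
    """Linearly rescale the positive relevance scores to [s, 1]
    (s is a small constant for stability); negatives get weight 1."""
    scores = np.asarray(pos_scores, dtype=float)
    lo, hi = scores.min(), scores.max()
    if hi > lo:
        w_pos = s + (1.0 - s) * (scores - lo) / (hi - lo)
    else:
        w_pos = np.ones_like(scores)          # degenerate case: identical scores
    return np.concatenate([w_pos, np.ones(n_neg)])
```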

5.3 Power SVM

In PSVM, weights w_i^P are introduced in the constraints (Eq. 9), controlling the misclassification penalties in a different manner. The formulation of PSVM becomes:

\min \; \frac{1}{2} w^T w + C^+ \sum_{i=1}^{l} \xi_i + C^- \sum_{i=l+1}^{N} \xi_i    (14)

s.t. \; y_i (w^T x_i + b) \geq 1 - \xi_i - w_i^P, \quad i = 1, ..., N    (15)

\xi_i \geq 0, \quad i = 1, ..., N    (16)

In the above formulation, a training sample may be misclassified and still receive zero penalty. This results from Eq. 15: the penalty \xi_i will be zero until the training error reaches the value of the weight w_i^P. When the training error exceeds the weight w_i^P, the counted penalty \xi_i is mitigated by w_i^P. This way, PSVM may improve the classification performance for areas of the feature space containing opposite classes with training samples of low discriminative ability, by constructing simpler separating hyperplanes. The framework proposed in [38] provides a procedure for calculating the weights based on the discriminative ability of the training samples. The weights were set in the range [0, 1], with the most discriminative samples receiving zero weights and the least discriminative samples a weight value of 1. Note that PSVM was not proposed as a way to handle label importance; however, we can use the weights w_i^P to implicitly modify the effect of the misclassification penalties in the objective function (Eq. 14) and therefore model label importance in our context. In our experiments, the most important training samples receive smaller values of w_i^P (so they are not easily misclassified by the resulting model), while larger weights w_i^P are given to the less important ones. To this end, we generate the weights w_i^P of the positive samples from an inverse linear mapping of the relevance scores to the range [0, 1]. The negative samples are assigned weights w_i^P = 0.
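The corresponding PSVM weight generation, again assuming a min–max normalization of the positive scores before the inverse mapping:

```python
import numpy as np

def psvm_weights(pos_scores, n_neg):
    """Inverse linear mapping of positive relevance scores to [0, 1]:
    the most relevant samples get weight 0 (hard to misclassify),
    the least relevant get weight 1; negatives get weight 0."""
    scores = np.asarray(pos_scores, dtype=float)
    lo, hi = scores.min(), scores.max()
    w_pos = (hi - scores) / (hi - lo) if hi > lo else np.zeros_like(scores)
    return np.concatenate([w_pos, np.zeros(n_neg)])
```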


5.4 Bilateral-weighted Fuzzy SVM

BFSVM is based on the formulation of FSVM; however, each training sample is represented at the same time as a positive and a negative one in the objective function:

\min \; \frac{1}{2} w^T w + C^+ \sum_{i=1}^{N} w_i^B \xi_i + C^- \sum_{i=1}^{N} (1 - w_i^B) \xi_i'    (17)

s.t. \; y_i (w^T x_i + b) \geq 1 - \xi_i, \quad i = 1, ..., N    (18)

y_i (w^T x_i + b) \leq -1 + \xi_i', \quad i = 1, ..., N    (19)

\xi_i \geq 0, \quad i = 1, ..., N    (20)

\xi_i' \geq 0, \quad i = 1, ..., N    (21)

In our experiments, the weights w_i^B are generated in the same way as for FSVM (Sect. 5.2). Since w_i^B = 1 for the negative samples, they account only for one type of penalty. In our context, the only difference with FSVM is that the positive samples account for an additional training error. It is important to notice that when a positive sample is assigned a low weight value w_i^B, its contribution to the training error as a positive sample, C^+ w_i^B \xi_i, may become smaller than or comparable to its contribution to the training error as a negative sample, C^- (1 - w_i^B) \xi_i'. As we will see in the experiments, proper weight assignment plays an important role for the performance of BFSVM.
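To make this concrete, a small illustrative computation (with hypothetical slack and cost values) of the two penalty contributions of one positive sample in Eq. 17:

```python
def bfsvm_penalties(w_b, xi, xi_prime, c_pos, c_neg):
    """Contribution of one positive sample to the BFSVM objective (Eq. 17):
    penalized as a positive with weight w_b and as a negative with 1 - w_b."""
    as_positive = c_pos * w_b * xi
    as_negative = c_neg * (1.0 - w_b) * xi_prime
    return as_positive, as_negative

# With a low weight (w_b = 0.2) the 'negative' term can become comparable
# to the 'positive' one, which is the sensitivity discussed above.
print(bfsvm_penalties(w_b=0.2, xi=1.0, xi_prime=1.0, c_pos=0.9, c_neg=0.1))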

6 Experiments

6.1 Experimental Setup

For the experiments we used the MSR–Bing Grand Challenge dataset (also known as Clickture-Lite [9]). In total, the dataset contains 23.1 million triples, 1 million unique images and 82.3 million unique clicks sampled over a one-year period from the image search engine of Microsoft Bing (https://image.bing.com). The provided data are aggregated, i.e., no information for separate user sessions or clicks is given. In this paper, we randomly split the dataset into equally sized development and evaluation sets. Following the proposed strategies of Sect. 3, we first create training sets Sc,m for the combinations of the three relevance methods m and 40 selected concepts c using the development set of 500,000 images. The queries Qj of the search logs are stemmed before adding them to the text documents Ti, and the concept name c is stemmed as well before it is used by the relevance methods for scoring the documents Ti.

In order to test the concept detectors, we apply them on the evaluation set of 500,000 images, creating this way a challenging evaluation procedure. We manually annotated the top 100 results of each experiment and then calculated the Normalized Discounted Cumulative Gain (nDCG) at different levels. Given a ranked list of images, the nDCG at level d is given by:

nDCG@d = Z_d \sum_{i=1}^{d} \frac{2^{r_i} - 1}{\log(1 + i)}    (22)

where r_i ∈ {Bad = 0, Good = 2, Excellent = 3} and the term Z_d is a normalizer that makes the nDCG@d of d excellent retrieved images equal to 1.

Each image is represented via a bag-of-words feature vector based on Opponent SIFT [23] with dense sampling and a codebook of 512 visual words. The χ2 kernel was employed for concept detector training, using a set of 5000 randomly selected negative samples to form the training sets Sc,m. The SVM algorithms were implemented in LIBSVM [2]. The standard SVM and FSVM are directly supported by the current LIBSVM distribution, while we extended the library to implement the PSVM and BFSVM versions used in this paper. The next section presents the experimental results.
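A small sketch of the evaluation measure of Eq. 22, with the normalizer Zd taken as the DCG of d Excellent results as stated above (the base of the logarithm cancels in the ratio):

```python
import math

def ndcg_at_d(relevances, d, max_grade=3):
    """nDCG@d (Eq. 22) for a ranked list of manually assigned relevance
    grades r_i in {Bad: 0, Good: 2, Excellent: 3}."""
    gains = [(2 ** r - 1) / math.log(1 + i)
             for i, r in enumerate(relevances[:d], start=1)]
    ideal = [(2 ** max_grade - 1) / math.log(1 + i) for i in range(1, d + 1)]
    return sum(gains) / sum(ideal)

# Example: top-5 results graded Excellent, Good, Bad, Good, Excellent
print(ndcg_at_d([3, 2, 0, 2, 3], d=5))
```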

6.2 Evaluation of the proposed strategies

Table 1 displays the nDCG@100 results and the number of retrieved positive images for the 40 selected concepts. Given a concept c, the training sets Sc,m contain the same samples by construction across methods m (as explained in Sect. 3) and, since the scores are ignored, the baseline (standard SVM) concept detectors demonstrate the same performance across relevance methods in this set of experiments. First, we observe that every combination of SVM variant and relevance method improves the average performance relative to the standard SVM concept detectors, proving the usefulness of our proposed strategies. Across relevance methods, the SVM variants have higher performance when trained with weights derived from VSM. BFSVM trained with weights from VSM scores displays the best average nDCG@100 value, with a relative improvement of 130% compared to the baseline average performance. FSVM also shows an excellent performance with an improvement of 110%. At this point we must recall that, in our experiments, the only difference between FSVM and BFSVM is in the objective function, where the term C^- (1 − w_i^B) ξ_i' for the positive samples is added (Sect. 5.4). This indicates that the contribution of the positive samples in two types of

Table 1: nDCG@100 results and the number of retrieved positive images for 40 concepts for the combinations of relevance methods and SVM algorithms. The performance of the standard SVM algorithm is the same across relevance methods in these experiments. Best results are marked with bold font.

Concept          | # pos. | SVM    | FSVM (VSM) | PSVM (VSM) | BFSVM (VSM) | FSVM (BM25) | PSVM (BM25) | BFSVM (BM25) | FSVM (LM) | PSVM (LM) | BFSVM (LM)
aircraft carrier | 104    | 0.0679 | 0.1216 | 0.1504 | 0.1143 | 0.0528 | 0.0998 | 0.0527 | 0.0539 | 0.1118 | 0.0529
airplane flying  | 117    | 0.1774 | 0.1782 | 0.1747 | 0.1639 | 0.2012 | 0.1705 | 0.2013 | 0.1983 | 0.1689 | 0.199
basketball       | 2537   | 0.0322 | 0.2289 | 0.0309 | 0.346  | 0.0483 | 0.0365 | 0.0513 | 0.1302 | 0.0338 | 0.2814
bathroom shower  | 597    | 0.1192 | 0.2078 | 0.1242 | 0.2045 | 0.1323 | 0.1153 | 0.1235 | 0.1745 | 0.123  | 0.1695
bicycle          | 748    | 0.0591 | 0.593  | 0.1763 | 0.6422 | 0.1258 | 0.0849 | 0.1116 | 0.1617 | 0.1161 | 0.1313
birds            | 4821   | 0.1894 | 0.2047 | 0.2324 | 0.0087 | 0.1987 | 0.237  | 0.1402 | 0.0525 | 0.2416 | 0.239
birthday cake    | 2980   | 0.1297 | 0.2809 | 0.1991 | 0.3968 | 0.1894 | 0.1863 | 0.1752 | 0.2734 | 0.1945 | 0.3422
christmas tree   | 1516   | 0.1251 | 0.1405 | 0.1601 | 0.1493 | 0.1162 | 0.1422 | 0.1178 | 0.1372 | 0.1566 | 0.1709
cookies          | 1484   | 0.0269 | 0.0852 | 0.027  | 0.1396 | 0.0388 | 0.0262 | 0.0312 | 0.0446 | 0.034  | 0.0502
coral reef       | 190    | 0.171  | 0.2588 | 0.2155 | 0.238  | 0.1942 | 0.1889 | 0.2014 | 0.2279 | 0.1804 | 0.2263
eiffel tower     | 163    | 0.0954 | 0.1139 | 0.1192 | 0.1133 | 0.0945 | 0.1035 | 0.0934 | 0.0987 | 0.1066 | 0.0979
falling leaves   | 567    | 0.207  | 0.2834 | 0.2661 | 0.2757 | 0.2846 | 0.2581 | 0.2749 | 0.2679 | 0.26   | 0.2787
farm             | 1699   | 0.1629 | 0.1597 | 0.1572 | 0.1279 | 0.176  | 0.1845 | 0.1706 | 0.1467 | 0.1531 | 0.1375
fashion show     | 244    | 0.1514 | 0.1913 | 0.1881 | 0.1917 | 0.1745 | 0.1964 | 0.1741 | 0.1624 | 0.189  | 0.1682
fireplace        | 752    | 0.1111 | 0.1119 | 0.0549 | 0.1144 | 0.1074 | 0.1068 | 0.0911 | 0.109  | 0.0703 | 0.1002
fireworks        | 365    | 0.1181 | 0.3181 | 0.2065 | 0.3241 | 0.2541 | 0.2172 | 0.255  | 0.2804 | 0.185  | 0.2789
forests          | 1871   | 0.0608 | 0.3567 | 0.1481 | 0.689  | 0.1198 | 0.1025 | 0.1196 | 0.1647 | 0.1286 | 0.1822
great wall       | 471    | 0      | 0.1063 | 0.0104 | 0.1503 | 0.0478 | 0      | 0.0554 | 0.0797 | 0.0083 | 0.103
gymnastics       | 435    | 0.021  | 0.2491 | 0.0555 | 0.2364 | 0.0873 | 0.0476 | 0.0699 | 0.193  | 0.0555 | 0.1767
horses           | 3687   | 0.0608 | 0.3775 | 0.0996 | 0.3939 | 0.1696 | 0.0921 | 0.1632 | 0.3264 | 0.1123 | 0.3564
jewelry          | 1527   | 0.4337 | 0.6585 | 0.4571 | 0.5765 | 0.5308 | 0.5088 | 0.5066 | 0.5802 | 0.495  | 0.5236
kitchen cabinet  | 843    | 0.5895 | 0.6103 | 0.5967 | 0.5274 | 0.5856 | 0.6369 | 0.5812 | 0.5805 | 0.6248 | 0.5942
library          | 466    | 0      | 0.1231 | 0.0096 | 0.127  | 0.0212 | 0.0112 | 0.0252 | 0.0484 | 0.0077 | 0.0292
lightning        | 549    | 0.0473 | 0.4909 | 0.1367 | 0.5618 | 0.1519 | 0.0916 | 0.1077 | 0.1956 | 0.109  | 0.1793
meadow           | 118    | 0.1711 | 0.2183 | 0.1604 | 0.2169 | 0.2001 | 0.1766 | 0.2036 | 0.1659 | 0.1474 | 0.2142
mobile phones    | 217    | 0.1598 | 0.5238 | 0.319  | 0.5186 | 0.405  | 0.3268 | 0.3868 | 0.4248 | 0.3146 | 0.4041
modern buildings | 188    | 0.1506 | 0.3173 | 0.2761 | 0.3217 | 0.1886 | 0.2348 | 0.1891 | 0.2461 | 0.2667 | 0.2467
mountain peak    | 77     | 0.1866 | 0.3798 | 0.3728 | 0.386  | 0.238  | 0.2239 | 0.1983 | 0.3583 | 0.2728 | 0.2454
nascar           | 365    | 0.0602 | 0.1083 | 0.1076 | 0.1288 | 0.0752 | 0.0757 | 0.0759 | 0.0717 | 0.0774 | 0.0782
nebula           | 85     | 0.2413 | 0.2976 | 0.2709 | 0.3017 | 0.2742 | 0.2673 | 0.2742 | 0.2722 | 0.2646 | 0.2801
pizza            | 456    | 0.0301 | 0.0961 | 0.0738 | 0.1191 | 0.0645 | 0.0714 | 0.0642 | 0.0746 | 0.0723 | 0.0743
polar bear       | 302    | 0.1051 | 0.3257 | 0.2031 | 0.3079 | 0.2583 | 0.1634 | 0.2433 | 0.267  | 0.188  | 0.2647
sculptures       | 715    | 0.0401 | 0.2063 | 0.0964 | 0.1656 | 0.0768 | 0.0581 | 0.0623 | 0.1028 | 0.0764 | 0.0589
soccer team      | 329    | 0      | 0.2416 | 0      | 0.2367 | 0.0531 | 0      | 0.0432 | 0.1803 | 0      | 0.1878
sphinx giza      | 12     | 0.0521 | 0.101  | 0.0855 | 0.1005 | 0.0988 | 0.0857 | 0.0717 | 0.1018 | 0.0857 | 0.0697
swimming pool    | 371    | 0.205  | 0.2124 | 0.235  | 0.2045 | 0.2084 | 0.2183 | 0.2197 | 0.1953 | 0.2323 | 0.2021
tree branches    | 299    | 0.0641 | 0.1276 | 0.1093 | 0.1208 | 0.0997 | 0.0883 | 0.0962 | 0.1066 | 0.0869 | 0.0914
tulip            | 329    | 0.0252 | 0.0903 | 0.0233 | 0.1193 | 0.0216 | 0.0202 | 0.0217 | 0.0201 | 0.0251 | 0.0209
waterfalls       | 807    | 0.0279 | 0.1051 | 0.0596 | 0.208  | 0.0833 | 0.0568 | 0.084  | 0.0942 | 0.0696 | 0.0756
windmill         | 85     | 0.0351 | 0.1208 | 0.0266 | 0.1216 | 0.0613 | 0.0285 | 0.059  | 0.0789 | 0.0261 | 0.0653
Average          | 837.2  | 0.1178 | 0.2481 | 0.1604 | 0.2598 | 0.1627 | 0.1485 | 0.1547 | 0.1862 | 0.1518 | 0.1912

training errors allows BFSVM to converge to a better solution than FSVM.

On the other hand, PSVM has an inferior performance compared to the other SVM variants; however, it still displays a performance improvement of 35%. This can be explained by the fact that PSVM was proposed as an SVM variant that handles importance associated with the locality of the training samples in the feature space, whereas, in our experiments, we used the sample weights of PSVM to model the sample importance.

Figure 2 displays the average nDCG@d at various levels and demonstrates that the performance gains for lower depths d are significantly larger. For example, the average values of nDCG@5 and the relative improvements of the concept detectors using weights calculated from VSM scores are: 0.1545 for SVM, 0.4379 (+183%) for FSVM, 0.2635 (+71%) for PSVM, and 0.476 (+208%) for BFSVM. Figures 5, 7, and 9 display the top 10 retrieved images by concept detectors trained with VSM scores for the concepts “bicycle”, “horses”, and “modern buildings”.


Fig. 2: Average nDCG@d at various levels d for the 40 concepts with training sets generated using the three relevance methods


Fig. 3: Average true positive rate and percentage of positive samples across concepts for different score ranges for the three relevance methods.

6.3 Evaluation of relevance methods

Given that the training sets Sc,m contain the same samples for a concept c across methods m, the results indicate that the score assignment leads to significantly different performance gains and plays a very important role in the weight generation for the SVM variants. In this section, we study the relevance methods to understand the qualities that make them more or less appropriate for assigning scores for weight generation. First, we manually annotate the positive samples in the training sets Sc,m and consider training samples as correct if their quality is Good or Excellent. The image scores are normalized so that the top listed image for each method has score 1. We then select different subsets of the training sets by choosing only samples with scores that fall within predefined score ranges (i.e., (0, 0.1], (0.1, 0.2], . . .). Figure 3 displays the average true positive rate (TPR) and the percentage of the positive samples for the training sets Sc,m at different score ranges for the 40 concepts of the experiments. Regarding the TPR, all relevance methods display increasing accuracy for higher score ranges. This very important property indicates the ability of the employed IR models to assign meaningful scores to the retrieved positive images. As we can see from Fig. 3, VSM demonstrates the best accuracy at all score ranges. In addition, compared to the other relevance methods, VSM assigns low scores to a high percentage of inaccurate positive samples and high scores to a small percentage of mostly accurate samples. On the contrary, BM25 produces the noisiest scores; a large percentage of inaccurate positive samples are assigned to high score ranges. These observations explain the significant performance differences of the SVM classifiers across relevance methods: the SVM variants trained with VSM scores have the best performance, whereas training with BM25 scores results in poor performance. Furthermore, the results give us intuition about the qualities a scoring method should possess; assigning high scores to few, mostly accurate samples and lower scores to the many inaccurate ones can boost the performance of the SVM variants.
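A sketch of the per-score-range statistics behind this analysis, assuming manual correctness labels (Good/Excellent vs. Bad) are available as a boolean array and scores are normalized to (0, 1]:

```python
import numpy as np

def tpr_per_score_range(scores, is_correct, n_bins=10):
    """True positive rate and share of positive samples per score range,
    as in the analysis of Fig. 3."""
    scores = np.asarray(scores, dtype=float)
    is_correct = np.asarray(is_correct, dtype=bool)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (scores > lo) & (scores <= hi)
        if in_bin.any():
            rows.append((lo, hi, is_correct[in_bin].mean(), in_bin.mean()))
    return rows  # (range low, range high, TPR, fraction of positives in range)
```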


Fig. 4: Average nDCG@100 for training sets generated by VSM and subsequent thresholding of the positive samples for various thresholds

6.4 Evaluation of SVM concept detectors for different levels of label noise

One question that naturally arises from the results presented in Sect. 6.2 has to do with the behavior of the different weighted SVM algorithms for varying levels of label noise. To this end, we constructed training sets with different label noise levels artificially, by applying a threshold to the VSM scores of the candidate positive training samples. The ones with scores below the threshold were discarded from the development set. The training set subsets are therefore selected to have scores in [0, 1], [0.1, 1], [0.2, 1], etc. Then, the same procedure for generating weights was followed for each SVM variant as before. Figure 4 displays the average nDCG@100 of the standard SVM and the SVM variants for the 40 concepts and various thresholds. As expected, the performance of the standard SVM algorithm tends to increase when we apply thresholds (i.e., for lower noise levels). However, higher threshold values (0.4 and 0.5) adversely affect the performance of the standard SVM classifiers, due to the small number of positive examples that remain. Furthermore, these observations lead to important conclusions for the effectiveness of methods that use score filtering together with the standard SVM algorithm. Strategies based on score filtering to create training sets with low label noise and standard SVM classifiers may suffer from the loss of positive samples and display significantly inferior performance compared to the weighted SVM variants.

BFSVM performance deteriorates for training sets produced with higher threshold values. In our experiments, the positive training samples account for two types of training errors in BFSVM: C^+ w_i^B ξ_i and C^- (1 − w_i^B) ξ_i' (Sect. 5.4). There are two reasons that explain the reduced performance of BFSVM: (i) applying larger thresholds and then scaling the weights w_i^B to the same range causes an increasing number of true positive samples to receive smaller weights w_i^B; thus, the positive samples affect the training error in a much different way, with their training errors increasing faster through the term C^- (1 − w_i^B) ξ_i', and (ii) a large number of low-score, mostly inaccurate samples were removed; these previously acted as important negative examples (through C^- (1 − w_i^B) ξ_i') and had contributed to a better definition of the boundary between the positive and negative classes in BFSVM. The results indicate BFSVM’s high sensitivity to the w_i^B weights and that special care is required for proper weight assignment. On the other hand, PSVM displays increasing performance for lower noise levels; however, the performance gains are small. FSVM displays the best performance when thresholds are applied and its performance remains very stable for varying levels of label noise in the training sets. The noise insensitivity combined with the high performance suggests FSVM as the most robust SVM variant. Figures 6, 8, and 10 display the top 10 retrieved images using VSM scores and thresholds for the concepts “bicycle”, “horses” and “modern buildings”.
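A minimal sketch of the thresholding step used to construct these lower-noise training subsets (names are illustrative):

```python
def threshold_training_set(pos_images, scores, threshold):
    """Keep only candidate positives whose normalized VSM score reaches the
    threshold; the retained scores are then reused for weight generation."""
    kept = [(img, s) for img, s in zip(pos_images, scores) if s >= threshold]
    images = [img for img, _ in kept]
    kept_scores = [s for _, s in kept]
    return images, kept_scores

# E.g. training subsets with scores in [0.1, 1], [0.2, 1], ... :
# for t in (0.1, 0.2, 0.3, 0.4, 0.5):
#     imgs, sc = threshold_training_set(pos_images, vsm_scores, t)
```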


7 Conclusions

In this paper, we propose and evaluate strategies for building effective SVM-based concept detectors from clickthrough data. In the proposed strategies, queries from the search logs of image search engines are used to create text documents for the images. Then, we apply three IR relevance methods (Vector Space Models, BM25, and Language Models) to identify the positive samples, along with a score indicating the relevance of the candidate positive images given a concept. We use the scored lists of images to automatically generate training sets and we build and evaluate SVM-based concept detectors using the standard SVM algorithm, which serves as the baseline, and three weighted SVM variants using weights calculated from the scores. The SVM variants studied in this paper include the Fuzzy SVM, the Power SVM, and the Bilateral-weighted Fuzzy SVM. For the experiments, we use the MSR–Bing Grand Challenge dataset, which consists of 1M images and 82.3M unique clicks, and evaluate the automatically constructed concept detectors for 40 concepts. The results indicate the ability of the SVM variants to handle the label noise that is introduced by the automatic generation of training sets, leading to significant improvement compared to the standard SVM without weights. The best performance gains are observed for BFSVM, followed closely by FSVM. PSVM also demonstrated label noise handling capability for this dataset, although inferior to the other SVM variants. Across relevance methods, classifiers trained with weights generated from Vector Space Models displayed the greatest performance gains. In order to justify the observed differences, we manually annotated the retrieved candidate training samples and calculated the accuracy and the percentage of positive examples at different score ranges. Vector Space Models demonstrated higher accuracy for the top scored images and, at the same time, assigned low scores to a large portion of inaccurate candidate images. These qualities explain the performance differences and can be used to provide intuition for designing new scoring methods for clickthrough data. Finally, we evaluated the ability of the SVM variants to handle different label noise levels. To this end, we constructed training sets with lower noise levels by applying thresholds to the scores before selecting positive images. The experiments indicate that FSVM is the most noise-resilient variant, with high performance across different noise levels. On the other hand, BFSVM displays a high sensitivity to weight assignment and its performance severely suffers if the weights are not appropriate.

References

1. Burges C (1998) A tutorial on support vector machines for pattern recognition. Data Min and Knowl Discov 2(2):121–167, DOI 10.1023/A:1009715923555
2. Chang CC, Lin CJ (2011) LIBSVM: A library for support vector machines. ACM Trans Intell Syst Technol 2(3):27:1–27:27, DOI 10.1145/1961189.1961199
3. Chapelle O, Zhang Y (2009) A dynamic bayesian network click model for web search ranking. In: Proc of the 18th Int Conf on World Wide Web, ACM, New York, NY, USA, WWW ’09, pp 1–10, DOI 10.1145/1526709.1526711


4. Craswell N, Szummer M (2007) Random walks on the click graph. In: Proc of the 30th Annual Int ACM SIGIR Conf on Res and Develop in Information Retr, ACM, New York, NY, USA, SIGIR ’07, pp 239–246, DOI 10.1145/1277741.1277784
5. Dupret G, Liao C (2010) A model to estimate intrinsic document relevance from the clickthrough logs of a web search engine. In: Proc of the Third ACM Int Conf on Web Search and Data Min, ACM, New York, NY, USA, WSDM ’10, pp 181–190, DOI 10.1145/1718487.1718510
6. Fang Q, Xu H, Wang R, Qian S, Wang T, Sang J, Xu C (2013) Towards MSR-Bing challenge: Ensemble of diverse models for image retrieval. URL research.microsoft.com/en-us/events/irc2013/paper_irc_nlpr-mmc.pdf, Accessed: 15 August 2014
7. Hiemstra D (1998) A linguistically motivated probabilistic model of information retrieval. In: Research and Advanced Technology for Digital Libraries, Lecture Notes in Computer Science, vol 1513, Springer Berlin Heidelberg, pp 569–584, DOI 10.1007/3-540-49653-X_34
8. Hsu CC, Han MF, Chang SH, Chung HY (2009) Fuzzy support vector machines with the uncertainty of parameter C. Expert Systems with Appl 36(3, Part 2):6654–6658, DOI 10.1016/j.eswa.2008.08.032
9. Hua XS, Yang L, Wang J, Wang J, Ye M, Wang K, Rui Y, Li J (2013) Clickage: Towards bridging semantic and intent gaps via mining click logs of search engines. In: Proc of the 21st ACM Int Conf on Multimed, ACM, New York, NY, USA, MM ’13, pp 243–252, DOI 10.1145/2502081.2502283
10. Inoue T, Abe S (2001) Fuzzy support vector machines for pattern classification. In: Proc of Int Joint Conf on Neural Netw, 2001. IJCNN ’01, vol 2, pp 1449–1454, DOI 10.1109/IJCNN.2001.939575
11. Jain V, Varma M (2011) Learning to re-rank: Query-dependent image re-ranking using click data. In: Proc of the 20th International Conference on World Wide Web, ACM, New York, NY, USA, WWW ’11, pp 277–286, DOI 10.1145/1963405.1963447
12. Jilani T, Burney S (2008) Multiclass bilateral-weighted fuzzy support vector machine to evaluate financial strength credit rating. In: Proc of Int Conf on Comput Science and Inf Technol, 2008. ICCSIT ’08, pp 342–348, DOI 10.1109/ICCSIT.2008.191
13. Lin CF, Wang SD (2002) Fuzzy support vector machines. IEEE Trans on Neural Netw 13(2):464–471, DOI 10.1109/72.991432


Fig. 5: Top 10 retrieved images for concept “bicycle” using train sets generated with VSM. FSVM and BFSVM demonstrate excellent performance. The average nDCG@100 values are: 0.0591 for SVM, 0.593 for FSVM, 0.1762 for PSVM, and 0.6422 for BFSVM.


Fig. 6: Top 10 retrieved images for concept “bicycle” using train sets generated with VSM and threshold = 0.3. The performance of BFSVM has decreased, whereas FSVM maintains an excellent performance. The average nDCG@100 values are: 0.1023 for SVM, 0.4217 for FSVM, 0.1566 for PSVM, and 0.1646 for BFSVM.


Fig. 7: Top 10 retrieved images for concept “horses” using train sets generated with VSM. FSVM and BFSVM demonstrate high performance, whereas SVM and PSVM performance is poor. The average nDCG@100 values are: 0.0608 for SVM, 0.3775 for FSVM, 0.0996 for PSVM, and 0.3939 for BFSVM.


Fig. 8: Top 10 retrieved images for concept “horses” using train sets generated with VSM and threshold = 0.5. The performance of SVM and PSVM has improved significantly using thresholds. FSVM maintains a stable performance, whereas BFSVM performance has severely decreased as a result of the weight sensitivity of BFSVM. The average nDCG@100 values are: 0.1926 for SVM, 0.3743 for FSVM, 0.1863 for PSVM, and 0.1926 for BFSVM.


Fig. 9: Top 10 retrieved images for concept “modern buildings” using train sets generated with VSM. The average nDCG@100 values are: 0.1506 for SVM, 0.3173 for FSVM, 0.2761 for PSVM, and 0.3217 for BFSVM.

Fig. 10: Top 10 retrieved images for concept “modern buildings” using train sets generated with VSM and threshold = 0.5. The standard SVM and the SVM variants display excellent performance. Once more, FSVM demonstrates that it is the best SVM variant for varying noise levels. The average nDCG@100 values are: 0.4344 for SVM, 0.5016 for FSVM, 0.4887 for PSVM, and 0.4502 for BFSVM.

14. Manning CD, Raghavan P, Schütze H (2008) Introduction to information retrieval. Cambridge University Press
15. Min R, Cheng HD (2009) Effective image retrieval using dominant color descriptor and fuzzy support vector machine. Pattern Recogn 42(1):147–157, DOI 10.1016/j.patcog.2008.07.001
16. Pan Y, Yao T, Yang K, Li H, Ngo CW, Wang J, Mei T (2013) Image search by graph-based label propagation with image representation from DNN. In: Proc of the 21st ACM Int Conf on Multimed, ACM, New York, NY, USA, MM ’13, pp 397–400, DOI 10.1145/2502081.2508128
17. Pan Y, Yao T, Mei T, Li H, Ngo CW, Rui Y (2014) Click-through-based cross-view learning for image search. In: Proc of the 37th Int ACM SIGIR Conf on Res and Dev in Inf Retr, ACM, New York, NY, USA, SIGIR ’14, pp 717–726, DOI 10.1145/2600428.2609568
18. Radlinski F, Joachims T (2005) Query chains: Learning to rank from implicit feedback. In: Proc of the Eleventh ACM SIGKDD Int Conf on Knowl Discov in Data Min, ACM, New York, NY, USA, KDD ’05, pp 239–248, DOI 10.1145/1081870.1081899
19. Rao Y, Mundur P, Yesha Y (2006) Fuzzy SVM ensembles for relevance feedback in image retrieval. In: Proc of the 5th Int Conf on Image and Video Retr, Springer-Verlag, Berlin, Heidelberg, CIVR’06, pp 350–359, DOI 10.1007/11788034_36
20. Robertson S, Zaragoza H (2009) The probabilistic relevance framework: BM25 and beyond. Now Publishers Inc
21. Robertson SE, Walker S (1994) Some simple effective approximations to the 2-poisson model for probabilistic weighted retrieval. In: Proc of the 17th Annual Int ACM SIGIR Conf on Res and Dev in Inf Retr, Springer-Verlag New York, Inc., New York, NY, USA, SIGIR ’94, pp 232–241
22. Salton G, Wong A, Yang CS (1975) A vector space model for automatic indexing. Commun ACM 18(11):613–620, DOI 10.1145/361219.361220
23. Van de Sande KEA, Gevers T, Snoek CGM (2010) Evaluating color descriptors for object and scene recognition. IEEE Trans on Pattern Anal and Mach Intell 32(9):1582–1596, DOI 10.1109/TPAMI.2009.154
24. Sarafis I, Diou C, Delopoulos A (2014) Building robust concept detectors from clickthrough data: A study in the MSR-Bing dataset. In: Semantic and Social Media Adaptation and Personalization (SMAP), 2014 9th International Workshop on, pp 66–71, DOI 10.1109/SMAP.2014.22
25. Sarafis I, Diou C, Tsikrika T, Delopoulos A (2014) Weighted SVM from clickthrough data for image retrieval. In: IEEE Int Conf on Image Process 2014 (ICIP 2014), Paris, France, pp 3051–3055
26. Snoek CGM, Worring M (2009) Concept-based video retrieval. Found Trends Inf Retr 2(4):215–322, DOI 10.1561/1500000014
27. Sohail A, Bhattacharya P, Mudur S, Krishnamurthy S (2011) Classification of ultrasound medical images using distance based feature selection and fuzzy-SVM. In: Pattern Recognit and Image Anal, Lecture Notes in Computer Science, vol 6669, Springer Berlin Heidelberg, pp 176–183, DOI 10.1007/978-3-642-21257-4_22
28. Sun Z, Ruan D, Ma Y, Hu X, Zhang Xg (2009) Crack defects detection in radiographic weldment images using FSVM and beamlet transform. In: Proc of the 6th Int Conf on Fuzzy Systems and Knowl Discov - Volume 3, IEEE Press, Piscataway, NJ, USA, FSKD’09, pp 402–406
29. Tsikrika T, Diou C (2014) Multi-evidence user group discovery in professional image search. In: de Rijke M, Kenter T, de Vries A, Zhai C, de Jong F, Radinsky K, Hofmann K (eds) Advances in Information Retrieval, Lecture Notes in Computer Science, vol 8416, Springer International Publishing, pp 693–699, DOI 10.1007/978-3-319-06028-6_78
30. Tsikrika T, Diou C, de Vries AP, Delopoulos A (2009) Are clickthrough data reliable as image annotations? In: Proc of the Theseus/ImageCLEF workshop on vis inf retr eval, Fraunhofer Verlag, Corfu, Greece
31. Tsikrika T, Diou C, de Vries A, Delopoulos A (2011) Reliability and effectiveness of clickthrough data for automatic image annotation. Multimed Tools and Appl 55(1):27–52, DOI 10.1007/s11042-010-0584-1
32. Wang L, Cen S, Bai H, Huang C, Zhao N, Liu B, Feng Y, Dong Y (2013) France Telecom Orange Labs (Beijing) at MSR-Bing challenge on image retrieval 2013. URL research.microsoft.com/en-us/events/irc2013/paper_irc_orange.pdf, Accessed: 15 August 2014
33. Wang Y, Wang S, Lai KK (2005) A new fuzzy support vector machine to evaluate credit risk. IEEE Trans Fuzzy Syst 13(6):820–831, DOI 10.1109/TFUZZ.2005.859320
34. Wu CC, Chu KY, Kuo YH, Chen YY, Lee WY, Hsu WH (2013) Search-based relevance association with auxiliary contextual cues. In: Proc of the 21st ACM Int Conf on Multimed, ACM, New York, NY, USA, MM ’13, pp 393–396, DOI 10.1145/2502081.2508127
35. Wu K, Yap KH (2006) Fuzzy SVM for content-based image retrieval: A pseudo-label support vector machine framework. Comp Intell Mag 1(2):10–16, DOI 10.1109/MCI.2006.1626490
36. Xian Gm (2010) An identification method of malignant and benign liver tumors from ultrasonography based on GLCM texture features and fuzzy SVM. Expert Syst Appl 37(10):6737–6741, DOI 10.1016/j.eswa.2010.02.067
37. Yang X, Zhang Y, Yao T, Ngo CW, Mei T (2014) Click-boosting multi-modality graph-based reranking for image search. Multimed Systems pp 1–11, DOI 10.1007/s00530-014-0379-8
38. Yu SX (2012) Power SVM: Generalization with exemplar classification uncertainty. In: Proc of the 2012 IEEE Conf on Comput Vis and Pattern Recognit (CVPR), IEEE Computer Society, Washington, DC, USA, CVPR ’12, pp 2144–2151
39. Zhang Y, Yang X, Mei T (2014) Image search re-ranking with query-dependent click-based relevance feedback. IEEE Trans Image Process 23(10):4448–4459, DOI 10.1109/TIP.2014.2346991
