IPM 1370, No. of Pages 25, Model 3G, 15 November 2010
Information Processing and Management xxx (2010) xxx–xxx
Contents lists available at ScienceDirect
Information Processing and Management
journal homepage: www.elsevier.com/locate/infoproman
A relational vector space model using an advanced weighting scheme for image retrieval
Jean Martinet a,*, Yves Chiaramella b, Philippe Mulhem b
a Laboratoire d’Informatique Fondamentale de Lille, France
b Laboratoire d’Informatique de Grenoble, France
Article info
Article history: Received 7 July 2008; Received in revised form 5 July 2010; Accepted 30 October 2010; Available online xxxx
Keywords: Information retrieval; Vector space model; Conceptual graph; Image indexing; Weighting scheme; Evaluation
Abstract
In this paper, we lay out a relational approach for indexing and retrieving photographs from a collection. The proliferation of digital image acquisition devices, combined with the growth of the World Wide Web, requires the development of information retrieval (IR) models and systems that provide fast access to images searched for by users in databases. The aim of our work is to develop an IR model suited to images, integrating rich semantics for representing this visual data and user queries, which can also be applied to large corpora. Our proposal merges the vector space model of IR, widely tested in textual IR, with the conceptual graph (CG) formalism, based on the use of star graphs (i.e. elementary CGs made up of a single relation connected to some concepts representing image objects). A novel weighting scheme for star graphs, based on image object size, position, and image heterogeneity, is outlined. We show that integrating relations into the vector space model through star graphs increases the system’s precision, that the results are comparable to those from graph projection systems, and that processing time for user queries is shortened.
© 2010 Elsevier Ltd. All rights reserved.
1. Introduction
Image retrieval has been an active research topic for more than 15 years now (Smeulders, Worring, Santini, Gupta, & Jain, 2000). Historically, many of the initial studies were performed on content-based image retrieval systems: such proposals consider low-level features to be an accurate enough representation of image content for the purposes of retrieval. However, in applications like home photograph management, semantic descriptions of image content are useful (Rodden, 1999; Rodden & Wood, 2003). We have developed an image content description model that represents semantics related to visible objects, as well as to the overall organization of the image.

According to classical models of information retrieval (IR), a document is described by a set of representative index terms. In text retrieval, for instance, index terms are keywords extracted from the collection. Because the index terms of a document do not all describe it equally, they are assigned numerical weights. The purpose of a weighting scheme is to give emphasis to important terms, quantifying how well they semantically describe documents and distinguish them from each other.

In order to capture the complex knowledge related to either an image description or a user need, a formalism supporting relations is necessary. In particular, the knowledge representation formalism of conceptual graphs, introduced by Sowa (1984), has been used for image representation by Mechkour (1995) and for image retrieval by Ounis and Pasca (1998).
* Corresponding author. Tel.: +33 659691191; fax: +33 328778537.
E-mail addresses: [email protected] (J. Martinet), [email protected] (Y. Chiaramella), [email protected] (P. Mulhem).
0306-4573/$ - see front matter © 2010 Elsevier Ltd. All rights reserved. doi:10.1016/j.ipm.2010.10.003
Please cite this article in press as: Martinet, J., et al. A relational vector space model using an advanced weighting scheme for image retrieval. Information Processing and Management (2010), doi:10.1016/j.ipm.2010.10.003
We use conceptual graphs to index the images in the collection, assigning numerical values to elementary sub-graphs in order to represent their importance in the index. This paper describes an information retrieval model whose image representation is based on star graphs. The remaining sections are organized as follows: in Section 2, we review the state of the art in image indexing and retrieval, from the specific points of view of term weighting and relational descriptions. Section 3 describes how documents are represented in the proposed model. Section 4 details the novel weighting scheme for star graphs. Section 5 describes experiments evaluating the benefits of the weighted star graph approach. Section 6 discusses the impact and limitations of the model. Conclusions and future work are discussed in Section 7.
2. Related work
Approaches in classical text retrieval are mostly based on using keywords as document descriptors (Salton, 1971). The advantage of dealing with keywords is that the semantics are contained within the keywords themselves, unlike in other media such as image or video. For instance, a document in which the word "alpinism" appears is considered a document about "alpinism". Consequently, this document is considered relevant to a query about "alpinism".
2.1. Term weighting
In a text IR system, the keywords of the document model are similar to the words of the user’s natural language. They are automatically extracted, filtered, and weighted to give them more or less importance depending on how well they represent the document content. For instance, in the vector space model (VSM), documents and queries are represented by vectors in an n-dimensional space, where n is the number of terms in the index language. The index of a document $d_j$ is the vector
$$\vec{d_j} = (w_{1,j}, w_{2,j}, \ldots, w_{n,j})$$

where $w_{i,j} \in [0,1]$ represents the weight of the term $t_i$ in the document $d_j$. A query is represented by a vector $\vec{q} = (w_{1,q}, w_{2,q}, \ldots, w_{n,q})$, where $w_{i,q} \in [0,1]$ is the weight of the term $t_i$ in the query $q$. The weight of a term represents its importance in the document and its ability to distinguish between documents in the collection, in order to separate relevant documents from irrelevant ones. The most widely used weighting schemes for text retrieval are based on variations of tf.idf (term frequency × inverse document frequency) (Baeza-Yates & Ribeiro-Neto, 1999; Salton, 1971; Salton & Buckley, 1988; Salton & McGill, 1983). The tf measures the importance of a term in a document, and is usually defined according to the (normalized) number of occurrences of the term in the document. For instance, the tf of a term $t_i$ in a document $d_j$ is often defined as:
$$tf_{i,j} = \frac{freq_{i,j}}{|d_j|} \qquad (1)$$
where $freq_{i,j}$ is the number of occurrences of the term $t_i$ in the document $d_j$, and $|d_j|$ is the number of terms in $d_j$. The idf measures the discriminating power of the term (Salton & Buckley, 1988; Salton, Wong, & Yang, 1975) (also called resolving power (van Rijsbergen, 1979)), i.e. the ability of the term to distinguish between the documents of the collection. A widely used formula for calculating the idf of a term $t_i$ is (Salton, 1971):
$$idf_i = \log\left(\frac{N}{n_i}\right) \qquad (2)$$
where N is the total number of documents in the collection and $n_i$ is the number of documents in which the term $t_i$ occurs. This weighting scheme is based on two sources: the former represents the local importance of the term in the document, and the latter represents the global importance of the term in the whole collection. Weighting schemes are usually based on a combination of these two sources, e.g. $w_{i,j} = tf_{i,j} \times idf_i$ in documents and $w_{i,q} = idf_i$ in queries, and the matching score (Retrieval Status Value, RSV) is evaluated with the cosine of their vectors (Baeza-Yates & Ribeiro-Neto, 1999):
$$RSV(d_j, q) = \cos(\vec{d_j}, \vec{q}) \qquad (3)$$
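Eqs. (1)–(3) can be sketched in a few lines of Python. This is an illustrative implementation, not part of the original system; all function and variable names are our own, and the weighting follows the convention stated above ($w_{i,j} = tf_{i,j} \times idf_i$ in documents, $w_{i,q} = idf_i$ in queries).

```python
import math

def tf(freq_ij, doc_len):
    # Eq. (1): normalized frequency of term t_i in document d_j.
    return freq_ij / doc_len

def idf(N, n_i):
    # Eq. (2): inverse document frequency of term t_i.
    return math.log(N / n_i)

def cosine(u, v):
    # Eq. (3): cosine of the angle between two vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def rsv(doc_freqs, query_terms, df, N, vocab):
    """Score one document against a query: w_ij = tf.idf in the document,
    w_iq = idf in the query, matched with the cosine measure."""
    doc_len = sum(doc_freqs.values())
    d = [tf(doc_freqs.get(t, 0), doc_len) * idf(N, df[t]) for t in vocab]
    q = [idf(N, df[t]) if t in query_terms else 0.0 for t in vocab]
    return cosine(d, q)

# Toy two-document collection: a document about "alpinism" matches an
# "alpinism" query, while an unrelated document scores zero.
vocab = ["alpinism", "boat"]
df = {"alpinism": 1, "boat": 1}
s_relevant = rsv({"alpinism": 2}, {"alpinism"}, df, 2, vocab)
s_irrelevant = rsv({"boat": 3}, {"alpinism"}, df, 2, vocab)
```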
When dealing with image documents, the problem of the semantic gap (Smeulders et al., 2000) arises: because of the distance between the raw signal (i.e. the pixel matrix) and its interpretation, it is difficult to automatically extract an accurate semantic content representation of images. Image retrieval approaches can be divided into two main categories: signal-based approaches and semantic-based approaches. Signal-based approaches consider low-level image features (e.g. color, texture, etc.) to be image descriptors. Images and queries are represented with feature vectors, the coordinates of which represent the amount of different colors or textures in the image. Hence the numerical values of such vectors have a different meaning than the weights in the VSM, as they are not related to the features’ semantic importance. Another important difference is that the notion of idf is often not integrated into
these systems. However, a normalization of the dimensions of RGB color space is applied in QBIC (Flickner et al., 1995; Niblack et al., 1993): every coordinate in a vector (or histogram) is divided by the standard deviation of all values in the collection for this coordinate. Hence the mean color vector for a document dj is:
$$\vec{d_j} = \left(\frac{R_j}{\sigma_R}, \frac{G_j}{\sigma_G}, \frac{B_j}{\sigma_B}\right)$$
where $R_j$, $G_j$ and $B_j$ are respectively the R, G and B components of the image $d_j$, and $\sigma_R$, $\sigma_G$ and $\sigma_B$ are respectively the standard deviations of the R, G and B components for the whole collection. This normalization aims to homogenize value distributions across the three dimensions.

Wang, Li, and Wiederhold (2001) and Wang and Du (2001) introduced in SIMPLIcity the region frequency and inverse picture frequency (rf.ipf), a region-based measure for image retrieval inspired by the tf.idf weighting scheme. The images in the collection are segmented into regions, employing several features (e.g. color components). The rf measures how frequently a region feature occurs in a picture, and the ipf attaches a higher value to region features that occur in fewer images, which are therefore considered good discriminators. All regions are assigned an rf.ipf weight, which is stored for the image matching process. This weighting scheme has been evaluated for an image classification task with 10 categories such as "beach", "mountain" or "building". Results show that a higher classification quality is achieved with this weighting scheme than when using simple color histograms. The basic difference is the integration of the region distribution over the collection, which makes SIMPLIcity closer to semantic-based IR approaches.

The work presented here is inspired by text retrieval, in the sense that we attach a semantic label to (combinations of) objects appearing in images. The objective is to give them more or less importance according to how well they describe the image.
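The QBIC-style normalization described above (dividing every coordinate of every feature vector by the standard deviation of that coordinate over the collection) can be sketched as follows. This is a minimal Python illustration, not QBIC’s actual code; the use of the population standard deviation is our own assumption.

```python
from statistics import pstdev

def normalize_vectors(vectors):
    """Divide each coordinate of each feature vector by the standard
    deviation of that coordinate across the whole collection, so that
    value distributions are homogenized across dimensions."""
    dims = len(vectors[0])
    sigmas = [pstdev(v[i] for v in vectors) for i in range(dims)]
    # Guard against constant coordinates (zero deviation).
    sigmas = [s if s > 0 else 1.0 for s in sigmas]
    return [[v[i] / sigmas[i] for i in range(dims)] for v in vectors]

# Two toy 2-D color vectors: each coordinate is rescaled independently.
normalized = normalize_vectors([[1, 10], [3, 30]])
```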
2.2. Relational descriptions
Semantic-based approaches integrate a semantic interpretation of the image content. Simple keywords can be used to describe the main elements in an image, e.g. "man" or "sky". However, some information contained in the image cannot be expressed or modeled by keywords themselves (Ounis & Pasca, 1998), such as the semantic and spatial relations between objects, or object attributes. The expression of spatial relations between objects has been studied by Bertino and Catania (1998) and Di Sciascio et al. (2002), in which the authors develop languages to represent the objects’ shape and position in an image. These languages deal with relational image indexing and querying.

The semantic interpretation of image content can also be integrated using high-level knowledge representation formalisms, such as conceptual graphs (CG) (Sowa, 1984), which constitute a formal framework for image representation and retrieval. The CG formalism, like many knowledge representation formalisms in artificial intelligence, allows for Boolean truth values only. Numerous studies have been carried out to extend it. Morton (1987) extends Sowa’s conceptual graph theory to include fuzzy concepts, fuzzy referents, and fuzzy operations. Wuwongse and colleagues (1993) extend Morton’s fuzzy conceptual graphs to take into account fuzzy conceptual relations. Maher (1991) proposes a similarity measurement for matching simple fuzzy conceptual graphs, based on fuzzy similarity measurement for concepts. These studies aim to assign a non-Boolean truth value to a conceptual graph, in order to account for uncertainty when confronted with imprecise knowledge. The CG formalism has been used for image representation by Mechkour (1995) and for image retrieval by Ounis and Pasca (1998).
In such approaches, the similarity between a document and a query is determined using the graph projection operator, a form of subgraph matching known to be of exponential complexity (Chein & Mugnier, 1992; Mugnier & Chein, 1996). This negatively impacts document and query processing time, as pointed out by Mechkour (1995), where the processing time was close to 2 min for a query, which does not match the usability constraints of an end-user system (Nielsen, 1994; Bouch, Kuchinsky, & Bhatti, 2000). Ounis and Pasca suggest shifting a part of the data processing to indexing time (i.e. offline), in order to make the query processing algorithm polynomial. However, the indexing time required grows exponentially with the size of the collection. Moreover, this system does not allow users to rank the relevant documents in decreasing order of relevance, but only to organize documents into the three following relevance classes:
1. Total matching
2. Partial matching
3. No matching at all
The approach presented in this paper is an extension of the conceptual graph approach for representing and searching complex documents, such as images. In particular, this extension consists of using elementary sub-graphs as image descriptors in a vector space representation. Star graphs are made up of a single relation attached to one or more concepts representing the objects in an image. This representation is based on an image model described in the following section.
3. Image model
What do users actually consider significant when describing images, and how can a model reflecting their behavior towards image relevance be developed? According to previous experiments in the context of retrieving non-specific images1 (Hollink, Schreiber, Wielinga, & Worring, 2004) related to home photographs (Lim, 2001; Mulhem, Lim, Leow, & Kankanhalli, 2003), users tend to describe images based on the objects appearing in them, and seem to find it more intuitive to access images using natural language descriptions rather than low-level characteristics such as colors or textures (Rodden, 1999; Rodden & Wood, 2003). This explains why our approach focuses on visible objects and not, say, on color regions. It also explains why such visible objects are considered basic entities in our image model. The model in this paper is based on an indexing language composed of star graphs (see Definition 6 below), which are relational descriptors for the images.
3.1. Physical image
Firstly, we introduce the following notions of image and region on which our image model is based. An image is basically considered to be a matrix of pixels.
Definition 1 (Image). An image I is a set of spatially connected pixels organized in a rectangular matrix, corresponding to a two-dimensional projection (or visualization) of a real-world scene. The number of pixels of I is $|I|$, written $n_I$. The image represents a set X of visible physical objects.

In order to provide a description of the objects visible in an image, we now give the definition of a region.

Definition 2 (Region). A region is defined as a two-dimensional projection (or visualization) of a (part of a possibly occluded) physical object which is itself part of the scene displayed in an image. Formally, a region $r \subseteq I$ is a non-empty set of spatially connected pixels from an image I, $r \in \mathcal{P}(I)$, where $\mathcal{P}(I)$ is the set of non-empty spatially connected parts of I. The number of pixels of r is $|r| \geq 1$, written $n_p^r$.

Definition 3 (Region Segmentation). The set of regions in an image I forms a partition $S_R^I \subseteq \mathcal{P}(I)$ of I:
- $\forall r_i, r_j \in S_R^I, r_i \neq r_j : r_i \cap r_j = \emptyset$
- $\bigcup_{r \in S_R^I} r = I$
The number of regions of I is $|S_R^I|$, written $n_R^I$.
A relation R1, defined on $X \times S_R^I$, associates physical objects with regions and is one-to-n: several regions can be associated with different parts of one (occluded) physical object, and all visible physical objects are associated with at least one region. This relation is shown in Fig. 1.
3.2. Image objects
Our model is based on the notion of image object (IO). The following definition of an image object is inspired by Mechkour (1995):
Definition 4 (Image Object). An image object is defined as a two-dimensional projection (or visualization) of a physical object (or possibly several physical objects) which is (are) part(s) of the scene in an image. Formally, an image object $o \subseteq S_R^I$ is a non-empty set of not necessarily spatially connected regions from $S_R^I$, $o \in \mathcal{P}(S_R^I)$, where $\mathcal{P}(S_R^I)$ is the set of non-empty subsets of $S_R^I$. The number of regions of o is $|o| \geq 1$, written $n_r^o$, and the number of pixels of o is $\sum_{r \in o} n_p^r$, written $n_p^o$.

Image objects provide a flexible model for describing image content, as, according to Definition 4, an IO may correspond to several non-connected regions. Hence, they provide a solution for handling groups of objects in photographs. For instance, they can be used to describe a group of boats in a harbor, a group of trees in a forest, or a group of people in a crowd, with a given level of granularity. This flexibility also makes it possible to take into account the common occurrence in which the object is partially occluded (a car behind a tree, for instance). We will further refer to these two situations as fragmentation cases of an IO.

Definition 5 (Image Object Segmentation). The set of image objects of an image I forms a partition $S_O^I \subseteq \mathcal{P}(S_R^I)$ of $S_R^I$:
- $\forall o_i, o_j \in S_O^I, o_i \neq o_j : o_i \cap o_j = \emptyset$
- $\bigcup_{o \in S_O^I} o = S_R^I$
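Definitions 3 and 5 can be illustrated with a small sketch that checks the two partition conditions (pairwise disjointness and covering) on a toy image. The pixel coordinates and groupings below are hypothetical, and spatial connectivity of regions is not verified in this sketch.

```python
def is_partition(parts, whole):
    """Check the two conditions of Definitions 3 and 5: the parts are
    non-empty and pairwise disjoint, and their union covers the whole set."""
    union = set()
    total = 0
    for p in parts:
        if not p:
            return False  # every part must be non-empty
        total += len(p)
        union |= set(p)
    # Disjoint iff no element is counted twice; covering iff union == whole.
    return total == len(union) and union == set(whole)

# Toy 2x2 image: pixels, a region segmentation of the pixels, and an
# image-object grouping of the regions (r1 and r2 could be two fragments
# of one occluded physical object).
image = {(0, 0), (0, 1), (1, 0), (1, 1)}
r1, r2, r3 = frozenset({(0, 0)}), frozenset({(0, 1)}), frozenset({(1, 0), (1, 1)})
regions = {r1, r2, r3}
o1, o2 = frozenset({r1, r2}), frozenset({r3})
objects = {o1, o2}
```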
1 Non-specific images here means non-application-specific images, such as medical or satellite images usually are.
Fig. 1. Graphical view of the relation R1.
Fig. 2. Graphical view of the relation R2.
Fig. 3. Graphical view of the relation R3.
The number of image objects of I is $|S_O^I|$, written $n_O^I$. By construction, $S_O^I$ induces a partition of I. Every region belongs to one and only one image object. Hence, the region partition of the pixels is finer than the image object partition:
$$\forall r \in S_R^I, \; \exists! \, o \in S_O^I : r \in o$$
A relation R2, defined on $S_R^I \times S_O^I$, associates regions to image objects. This relation is n-to-one, as several regions can be grouped into a single image object. Fig. 2 shows a graphical view of R2. A relation R3, defined on $X \times S_O^I$, connects physical objects to image objects. This relation is also n-to-one, since several physical objects can be connected to one image object. Fig. 3 shows a graphical view of R3.
3.3. Interpretation of an image object
An IO is related to one or several semantic interpretations, which are defined by labels. Here a label stands for a semantic interpretation of the related IO. In its general definition, a label may belong to any kind of indexing language, as part of a particular indexing model. With regard to images, such labels are usually natural language words ("tree", "house", "sky", "boat", etc.). In some cases, it is not straightforward or relevant to assign a semantic interpretation to certain IOs: blurred, cluttered, or background objects, for example. The indexing language may contain one specific label to be attached to void objects that make no semantic contribution to the index.

Physical objects and IOs are connected through semantic interpretations of IOs. An IO o is connected to a concept2 c = [t, #o] whose type t corresponds to the label of o and whose referent #o is an identifier of the IO. This connection is one-to-n, since a single object can have several semantic interpretations in the indexing language. The concept type set $T_c$ is organized in a lattice $\langle T_c, \leq_c \rangle$ whose relation $\leq_c$ is a generic/specific (IsA) relation between concept types. Fig. 4 shows an example of a concept type lattice, in which concept types are organized according to the partial order $\leq_c$. For instance, Flower $\leq_c$ Foliage.

2 A concept as an element of the conceptual graph formalism.

Fig. 4. An example of concept type lattice.
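The partial order $\leq_c$ can be sketched as reachability over direct IsA edges of the concept type lattice. The Flower $\leq_c$ Foliage pair comes from the text; the other concept types below are illustrative assumptions, and the lattice is assumed acyclic.

```python
def leq_c(t1, t2, isa):
    """Generic/specific order: t1 <=_c t2 iff t2 is reachable from t1 by
    following direct IsA edges (reflexive-transitive closure). Assumes the
    edge relation is acyclic, as in a concept type lattice."""
    if t1 == t2:
        return True
    return any(leq_c(parent, t2, isa) for parent in isa.get(t1, ()))

# Direct IsA edges of a toy concept type lattice (Flower <=_c Foliage as in
# Fig. 4; "Vegetation" and "Tree" are our own illustrative types).
isa = {"Flower": ["Foliage"], "Foliage": ["Vegetation"], "Tree": ["Vegetation"]}
```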
3.4. Discussion
In linguistics, synonyms are different words or phrases with identical meanings, and polysemy refers to a word or phrase’s capacity for multiple meanings. A case in which several IOs have the same interpretation is referred to as visual synonymy and a case in which an IO can have several interpretations is referred to as visual polysemy. While the problem of visual synonymy is raised during the automatic selection of a label (or concept) to be attached to IOs, the model tackles the problem of visual polysemy by allowing several labels to be attached to a given IO.
3.4.1. Automatic label selection: the problem of visual synonymy

The automatic segmentation and labeling of images and video keyframes is prone to error and remains an open problem in computer vision. The challenge is to automatically determine a semantic label for a given set of pixels whose appearance is subject to great variations. The identification of IOs continues to be a problem, and most existing segmentation methods rely on the low-level characteristics of image parts, such as color, intensity, texture, and shape. The last 10 years have witnessed significant improvements in this regard. Segmentation methods include model-based segmentation (Carson, Thomas, Belongie, Hellerstein, & Malik, 1999), graph partitioning segmentation (Shi & Malik, 2000), clustering methods (Wang et al., 2001; Wang & Du, 2001), and multi-scale segmentation (ter Haar Romeny, 2003). Annotation/recognition methods include neural network classifiers (Lim, 2000, 2001) and SVM-based classification (Ayache, Quénot, & Gensel, 2007; Snoek & Worring, 2005; Vapnik, 1995), for instance using popular SIFT features (Lowe, 2004). In order to avoid the imprecise segmentation process, approaches increasingly rely either on grid-based segmentations, in which the images are tessellated into rectangular patches (Bastan & Duygulu, 2006; Lim, 2000, 2001; Martinet & Satoh, 2007; Wang et al., 2001; Wang & Du, 2001), or on interest point detection (Quelhas, Monay, Odobez, Gatica-Perez, & Tuytelaars, 2007; Hoerster, Lienhart, & Slaney, 2007; Quelhas & Odobez, 2006; Zheng, Wang, & Gao, 2006). Local features extracted from patches or around interest points are then quantized and classified according to their visual appearance. In grid-based approaches, contiguous patches sharing the same label can be merged to form larger regions (Lim, 2001). The annotation performance of automatic systems is improving steadily: in Quelhas and Odobez (2006), Quelhas et al. (2007), and Vogel and Schiele (2007), the authors achieve an annotation precision of about 70% on a 6-class problem. In the case where annotations are automatically derived, information such as the size and the position of the corresponding regions can be used directly in the model for weighting the indexing terms (see Section 4).

3.4.2. Multi-label justification: visual polysemy

The one-to-n connection between IOs and concepts originates in visual polysemy, and is justified by the need to represent all possible semantic interpretations of a single, real-world object. While most IOs will be connected to a single concept (and therefore a single label) in practice, the model can also represent cases in which it is necessary to define several labels for an IO, such as a window (or a fence, etc.) through which a tree can be seen, or a mirror (or the surface of calm water) reflecting a cloud. In such cases, several concepts are attached to the same IO.

3.4.3. Limitations for other types of relations

On one hand, the advantage of representing concepts in a lattice is that the IsA relation between concepts is integrated into the model (see Definition 8) according to the first-order logic interpretation of conceptual graph subsumption, because the generic/specific relation of concepts corresponds to a hyponym–hypernym relation of subsumption for the labels.
Fig. 5. An example of a relation lattice.
On the other hand, one limitation of the model is that it represents neither synonym/antonym relations nor holonym/meronym (partOf) relations, since the concept lattice does not represent them. Indeed, the concept lattice is part of the conceptual graph formalism, and represents only hyponym–hypernym relations between labels. Our model conforms to these relations in order to be consistent with the CG formalism and the graph projection. As a consequence, the issue of selecting the best label to attach to an IO among all concepts and their possible relations (e.g. should a wing be labeled with "wing" or "airplane", or should a forest be labeled with "forest" or "tree" in a fragmented case) is an unresolved problem in IR that lies outside the scope of this paper. However, the model makes it possible to use both labels, which is consistent with the fact that a given image displaying a wing is likely to be relevant to a query about both "wing" and "airplane". Nevertheless, the granularity and richness of the indexing language should match the image database domain, the labeling strategy being dedicated to fulfilling retrieval needs.
3.5. Relational vector space of star graphs
In order to allow relational indexing, concepts can be linked together by conceptual relations (Sowa, 1984). Like the set of concepts, the set of conceptual relations $T_r$ is organized in a lattice $\langle T_r, \leq_r \rangle$. Fig. 5 shows an example of a relation lattice in which relations are organized according to a partial order $\leq_r$; see for instance standing $\leq_r$ position. Concepts are connected to a relation by means of star graphs, to denote the relationship between them.
Definition 6 (Star graph). A star graph is an elementary conceptual graph consisting of one and only one relation r, connected to $a_r$ concepts $c_i$ ($a_r \geq 1$ is the arity of r), with the linear writing ⟨r(c1, c2, ..., c_{a_r})⟩.

A star graph (SG) can be either specific (when composed of specific concepts3) or generic (when composed of generic concepts). The indexing vocabulary is defined according to the canonical basis, containing the signatures of the relations from $T_r$.

Definition 7 (Indexing vocabulary). The indexing vocabulary is composed of all generic star graphs derived from the canonical basis star graphs by restricting the relation, or one or several concepts.

In other words, the indexing vocabulary contains all possible restrictions of the star graphs from the canonical basis. Examples of star graphs are given in Fig. 6. The following star graphs:

- ⟨rowing(Jean:#2)⟩
- ⟨on(Boat:#4, Sea:#3)⟩
- ⟨between(Boat:#4, Matthieu:#1, Jean:#2)⟩

are specific, as they contain specific concepts.4 A star graph is said to be generic when it contains generic concepts only. Our definition of a star graph is slightly different from the one proposed by Chein and Mugnier (1992) and Mugnier and Chein (1996), who define a star graph as a relation attached to concept types. In our definition, relations are attached to concepts (that is to say, a specific star graph). However, in essence, our definition of a star graph and the definition by Chein and Mugnier share the same underlying support:

- the generic star graphs are seen as building blocks for generating conceptual graphs by making them specific, and then by performing join operations on them to build the graph;
- the specific star graphs in the index of a document are seen as instances of generic star graphs, which are the index terms of the presented model.

3 A specific concept is a concept whose referent is an individual, and not the universal referent (∗).
4 A specific concept has an individual referent, unlike a generic concept whose referent is the universal referent (∗).
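A minimal sketch of star graphs as data structures, assuming a representation of a concept as a (type, referent) pair where `None` stands for the universal referent. The relation and concept names are taken from Fig. 6; the structure and function names are our own. Generalizing a specific star graph by dropping its referents yields the generic star graph used as an index term.

```python
from typing import NamedTuple, Optional, Tuple

class Concept(NamedTuple):
    type: str
    referent: Optional[str]  # None stands for the universal referent (*)

class StarGraph(NamedTuple):
    relation: str
    concepts: Tuple[Concept, ...]  # arity a_r >= 1

def is_generic(sg):
    """A star graph is generic when it contains generic concepts only."""
    return all(c.referent is None for c in sg.concepts)

def generalize(sg):
    """Drop the referents to obtain the generic star graph (index term)."""
    return StarGraph(sg.relation, tuple(Concept(c.type, None) for c in sg.concepts))

# <on(Boat:#4, Sea:#3)> from Fig. 6, and its generic index term <on(Boat, Sea)>.
specific = StarGraph("on", (Concept("Boat", "#4"), Concept("Sea", "#3")))
generic = generalize(specific)
```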
Fig. 6. Star graphs of unary, binary and ternary relations (from left to right).
Fig. 7. Example of two images likely to be labeled with "Jean", "Matthieu", "Boat", and "Sky". While these four labels do not make it possible to distinguish between these images, the star graphs help differentiate them, since they describe only the image on the left.
Fig. 8. General outline of the star graph lattice.
The star graphs given in Fig. 6 are taken from the description of the left image in Fig. 7. While the concepts of these star graphs could label the image on the right, the star graphs do not apply. Star graphs are more expressive than simple keywords, and less complex by construction than large conceptual graphs. Hence we consider them a satisfying trade-off between the lack of expressivity of keywords and the processing complexity of conceptual graphs. The set of star graphs (SGs) extracted from the collection of conceptual graphs has a lattice structure, composed of sub-lattices of relations sharing the same arity (number of concepts). Its partial order relation ≤sg is the graph projection (Sowa, 1984). This lattice is limited at the top by ⊤sg and at the bottom by ⊥sg. It has the general outline shown in Fig. 8. In this figure, there are three sub-lattices, containing star graphs with two, three, and four concepts respectively. Documents and queries are represented in our model by vectors of generic star graphs.⁵ For instance, if a document is described by the specific star graphs of Fig. 6, then its index will include the following generic star graphs: ⟨rowing(Jean)⟩, ⟨on(Boat, Sea)⟩ and ⟨between(Boat, Matthieu, Jean)⟩.
⁵ The referents are not included in the indexing language as they refer to particular instances of concepts.
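To make this representation concrete, the mapping from the specific star graphs describing an image to the generic star graphs used as index terms can be sketched as below. The data structures and the `to_generic` helper are illustrative choices of ours, not part of the formal model:

```python
# Sketch: specific star graphs (a relation over concepts carrying
# individual referents) are mapped to generic star graphs (referents
# dropped), which serve as the index terms of the vector space.

from collections import Counter

def to_generic(star_graph):
    """Drop referents: ('on', (('Boat', '#4'), ('Sea', '#3')))
    becomes ('on', ('Boat', 'Sea'))."""
    relation, concepts = star_graph
    return (relation, tuple(ctype for ctype, _ref in concepts))

# Specific star graphs of the left image of Fig. 7
specific_sgs = [
    ("rowing", (("Jean", "#2"),)),
    ("on", (("Boat", "#4"), ("Sea", "#3"))),
    ("between", (("Boat", "#4"), ("Matthieu", "#1"), ("Jean", "#2"))),
]

# Occurrence counts of generic index terms in this document
index_terms = Counter(to_generic(sg) for sg in specific_sgs)
print(index_terms[("on", ("Boat", "Sea"))])  # 1
```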
3.6. Image index
Automatic document expansion is an important part of the indexing process in our model. This document expansion adds related index terms to the index of the original document. When a document is indexed by a given index term, it is also implicitly described by all index terms that are generic to the given index term. For instance, a document that is originally described by the term ⟨left(Boat, Pine-tree)⟩ is also implicitly described by the term ⟨left(Boat, Tree)⟩ if, for example, the concept type lattice contains the information Pine-tree ≤c Tree. The automatic document expansion process adds the second term above (if not already included) to the index of the document. Adding a term to the index of a document means changing its weight to a non-zero positive value.
Definition 8 (Automatic document expansion). Let SG be the set of specific star graphs indexing a document. The expansion of this document is defined by the set of specific star graphs SG’:
SG′ = SG ∪ { sg_j | ∃ sg_i ∈ SG : sg_i ≤sg sg_j }
The set SG′ is the index of the document. The operation of adding an extra term to the index means changing its weight from 0 to a non-zero value. If the term that is added already belonged to the index, its weight is further increased. Fig. 9 shows the index of a document before (above) and after (below) the automatic document expansion. Dimensions corresponding to generalizations of the original dimensions are assigned non-zero values; namely, the weight of the term ⟨rowing(Person)⟩ is 0 before the document expansion, and becomes wk,1 > 0 after the document expansion, assuming that Jean, Matthieu and Caroline are specific concept types of Person.

3.6.1. Justification
The justification for the operation of document expansion as part of the model originates in the first-order logic interpretation of conceptual graph subsumption. Indeed, a projection of a graph g into a graph h exists if and only if h ⇒ g (Chein & Mugnier, 1992; Sowa, 1984). Since our model uses a lattice to represent the indexing vocabulary, the set of star graphs is not flat (i.e. a bag of terms). It is necessary to guarantee the consistency of the vector match with regard to the logical implication. Hence, this pre-processing step ensures a non-null term intersection between indexes containing star graphs that are comparable under the relation ≤sg.

3.6.2. Limitation
The implementation of automatic document expansion in the model raises exception problems, inherited from standard automatic document expansion in text retrieval. For instance, if the concept lattice states Penguin ≤c FlyingAnimal,
Fig. 9. Vector of the document before (above) and after (below) the document expansion.
then the automatic document expansion would generate index terms with ‘‘FlyingAnimal’’ for images containing a ‘‘Penguin’’. To avoid this still unresolved problem, a disambiguation step is necessary. One way of avoiding the problem is to take special care in the design of the ontology that results in the concept lattice. Another solution is to use Latent Semantic Indexing (LSI) techniques (Deerwester, Dumais, Furnas, Landauer, & Harshman, 1990; Hull, 1994). LSI is an IR method based on a singular value decomposition for identifying patterns in the relationships between terms and concepts in an unstructured collection of text. LSI is based on the principle that words that are used in the same context tend to have similar meanings. A key feature of LSI is its ability to extract the conceptual content of a body of text by establishing connections between terms that occur in similar contexts. Applying LSI to concepts and/or star graphs could potentially resolve this issue.
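Setting the disambiguation issue aside, the expansion operation itself (Definition 8) can be sketched as follows. The toy type hierarchy and the `ancestors`/`expand` helpers are our own illustrations; only concepts are generalized here, although the indexing vocabulary of Definition 7 also allows restriction of the relation:

```python
# Sketch of Definition 8: every term more generic than an original index
# term (obtained by generalizing its concepts along a toy type lattice)
# is added to the index, making implicit terms explicit.

from itertools import product

# Toy fragment of a concept type lattice: child -> parent
PARENT = {"Pine-tree": "Tree", "Tree": "Plant", "Boat": "Vehicle"}

def ancestors(ctype):
    """All generalizations of a concept type, including itself."""
    result = [ctype]
    while ctype in PARENT:
        ctype = PARENT[ctype]
        result.append(ctype)
    return result

def expand(term):
    """All star graphs generic to `term` (concepts generalized only)."""
    relation, concepts = term
    return {(relation, combo) for combo in product(*map(ancestors, concepts))}

index = {("left", ("Boat", "Pine-tree"))}
expanded = set().union(*(expand(t) for t in index))
# The implicit term <left(Boat, Tree)> is now explicit in the index
```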
4. Term weighting
We have developed the image model, in which documents and queries are represented with star graph vectors. In the context of IR, assigning weights to index terms in order to emphasize important terms is a central problem. The purpose is to quantify how well the terms describe documents semantically and distinguish between them. The previous section defined the indexing vocabulary; this section discusses the aboutness of an image considering a given index term, i.e. the importance of the index term regarding the image. The challenge is to assign a weight to an index term that measures the extent to which the image containing the corresponding star graph is relevant for a query about the index term. In order to define such a weight for an index term (i.e. a star graph), it is first necessary to define the notion of importance of concepts. This importance is directly related to the importance of their associated image objects. In our approach to image indexing, such image components are intended to provide a basis for semantic image indexing, but also potentially for feature-based (or signal-based) indexing of images. These definitions are related to the notion of image content in a way similar to the standard notion of term importance in the context of text retrieval. The notion of occurrence – now very classical when considering the weighting of index terms for textual documents – is much less intuitive in the case of images. Images are 2D data, and in this context it should be understood that the relevance of an image showing boats is not only related to the fact that it represents one or more distinct boats. In that regard, other 2D perceptual factors are possibly more important, such as the spatial position or the size of these boats.
4.1. Image objects importance
Considering an image I and an image object o of I, o is labeled by a concept type t (see Section 3.3). The importance of o regarding I is directly related to the aboutness of I considering t. For example, one will consider that an IO representing a boat (hence t = ‘‘boat’’) in a given image is important if most users would consider that this image is about a ‘‘boat’’. In our opinion, aboutness is a notion mainly local to a given document. It should not be confused with relevance, a notion that usually compares a particular document to the whole corpus. The three criteria outlined here are local to images and thus cannot be directly assimilated to relevance evaluation. A relation between these two notions is somewhat illustrated by the tf.idf weighting model: given a term t of document D, tf of t stands for the aboutness of D considering t, while tf.idf is an estimate of the relevance of D considering t. In earlier studies (Martinet, Chiaramella, & Mulhem, 2005, 2008), we propose and validate a definition of the importance of media objects, and more specifically image objects. We give a general overview of this work and refer interested readers to the original publications for more details. Connected to an image region (or a set of image regions), an IO is basically a 2D set of pixels whose basic geometrical characteristics (area, position, etc.) can be easily calculated. For instance, the area can be estimated by summing up the surface of polygons delimiting the IO, and the position can be assimilated to the center of the bounding box of the IO. According to the weighting model defined in Martinet et al. (2005, 2008), the importance of an IO is determined by the following three criteria:
• Criterion 1 (Size): the importance of an image object varies in the same way as its size S.
• Criterion 2 (Position): the importance of an image object is maximal when its position P is at the center of the image, and decreases as its distance from the image center increases.
• Criterion 3 (Homogeneity): the importance of an image object varies in the same way as the homogeneity H of its embedding image.
The modeling of the criteria is based on probability theory and Shannon information theory (Shannon, 1948), which provide a formal framework suited to IO importance modeling. The importance of an IO o in an image I is defined as:
importance_I(o) = S_norm(o, I) + P_norm(o, I) + H_norm(I)    (4)
where S_norm, P_norm and H_norm are the normalized values of criteria S, P, and H respectively. The normalization is done with:
v_norm = (v − min_value) / (max_value − min_value)
where min_value (resp. max_value) is the minimum (resp. maximum) value of the criterion found in the whole collection. Since the three criteria might have different variation ranges on different collections, the normalization is applied in order to align both lower and upper bounds (Lee, 1997). The criteria S, P, and H are modeled in the following way:

1. Since human perception of surfaces is logarithmic rather than linear (Rojet & Schwartz, 1990), we define a normalized surface according to the following formula:
S(o, I) = log(n_p^o) / log(n_p^I) = log_{n_p^I}(n_p^o)
where n_p^o is the number of pixels in o, n_p^I is the number of pixels in I, and log_{n_p^I} is the logarithm in base n_p^I. S(o, I) is an increasing function of n_p^o (according to the first criterion) that has a logarithmic variation, with values in [0, log_{n_p^I}(n_p^I)], that is [0, 1].

2. The position criterion P is integrated into our model through a non-uniform probability density function:
P(o, I) = p⁰_I(o)

where p⁰_I is a probability distribution that gives higher probability values to IOs in the center of an image I. Some examples of such a distribution are given in Fig. 10, where I is a simplified image seen as a 1D segment, and a 1D distribution is shown above I. Two image objects o1 and o2 are represented as parts of the segment, and their associated probabilities correspond to the black areas below the probability density curves. For 2D images, these distributions would correspond to (a) a flat distribution, (b) two plateaux: a square/circular high level in the center and a low level for the remainder of the image, (c) a conic distribution, and (d) a bell-shaped (Gaussian-like) distribution. The three distributions (b)–(d) meet the requirements related to the second criterion, as the probability associated with o1 is higher than the one associated with o2. The position values belong to [0, 1] and are greater for IOs in the center of the image, in accordance with this criterion.

3. With regard to the homogeneity criterion, let us consider the normalized sizes of IOs as probabilities: they verify all the classical Kolmogorov properties of a probability distribution and are therefore isomorphic to one. The probability associated with an IO o corresponds to the probability that a pixel taken randomly from the image belongs to o:
p(o, I) = n_p^o / n_p^I
Based on this probability, we can define a spatial entropy SH (according to Shannon (1948)) calculated from the spatial distribution of IOs in an image, in order to represent the ‘‘disorder’’ in the image:
SH(I) = Σ_{o ⊆ I} p(o, I) · (−log(p(o, I)))
The term −log(p(o, I)) represents the amount of information held by o in the context of I. IOs have a minimal size of one pixel, so p(o, I) ∈ [1/n_p^I, 1] and consequently −log(p(o, I)) ∈ [0, log(n_p^I)]. Entropy values are large when an image contains many objects of the same size, so that no object in particular is easily visible. Entropy values are small when there are fewer objects of different sizes, one of which is bigger than the others and therefore more easily visible in the image. To be consistent with our last criterion, the homogeneity value is defined as the complement to 1 of SH, to which we apply a normalization factor. The homogeneity H is defined according to the following formula:
H(I) = 1 − SH(I) / SH_max(I)
where SH_max(I) is the maximum value of the spatial entropy, corresponding to the virtual case where the image is composed of n_p^I one-pixel IOs. The homogeneity values are large for homogeneous images, according to the last criterion, and belong to [0, 1]. This value is the same for all IOs in one image.
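The three criteria can be sketched numerically as below. The Gaussian fall-off chosen for the position criterion corresponds to distribution (d) of Fig. 10 and is only one admissible choice; the collection-wide normalization v_norm is omitted, so the sketch combines raw criterion values:

```python
# Sketch of the criteria S, P, H and of Eq. (4). The sigma of the
# position fall-off and the [0, 1] center distance are assumptions;
# the paper's per-collection normalization (v_norm) is not applied.

import math

def size(n_o, n_i):
    """S(o, I): log of object pixels in base (image pixels) -> [0, 1]."""
    return math.log(n_o) / math.log(n_i)

def position(dist_to_center, sigma=0.3):
    """P(o, I): higher for IOs near the image center (Gaussian-like,
    distribution (d) of Fig. 10); dist_to_center assumed in [0, 1]."""
    return math.exp(-(dist_to_center ** 2) / (2 * sigma ** 2))

def homogeneity(object_pixel_counts, n_i):
    """H(I) = 1 - SH(I)/SH_max(I), with SH_max(I) = log(n_I)
    (the virtual case of n_I one-pixel IOs)."""
    sh = -sum((n / n_i) * math.log(n / n_i) for n in object_pixel_counts)
    return 1.0 - sh / math.log(n_i)

def importance(n_o, n_i, dist_to_center, object_pixel_counts):
    """Eq. (4), un-normalized sketch: importance_I(o) = S + P + H."""
    return (size(n_o, n_i) + position(dist_to_center)
            + homogeneity(object_pixel_counts, n_i))
```

A single centered object covering the whole image maximizes all three criteria, giving an importance of 3 in this un-normalized sketch.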
Fig. 10. Uniform probability distribution (a), and examples of non-uniform probability distributions (b–d).
Hence, our definition of IO importance combines the three criteria that have been experimentally validated in Martinet et al. (2005, 2008) to reflect the aboutness of images with regard to the semantic interpretation of IOs. This definition of IO importance provides the foundations of the weighting model for generic star graphs that are the index terms of the presented model.
4.2. Weighting scheme for index terms
Based on the definition of IO importance, we define the weighting model for the indexing vocabulary. The model is an extended and refined version of the model presented in Martinet, Chiaramella, Mulhem, and Ounis (2003, 2003). The main differences are the criteria by which the importance of individual concepts is defined and their combination.
4.2.1. Local importance of a star graph in a document
Specific star graphs connected to an image are composed of specific concepts whose referents are the image objects. Hence the importance of a specific star graph is related to the importance of the IOs of its concepts. Let us first associate the importance of a concept c_k = [t, #o] with the importance of its IO o:
importance_I(c_k) = importance_I(o)

A function combining the importance values of the concepts must satisfy the following requirements:

• The importance value of a star graph should increase when the importance values of its concepts increase.
• The arity of the relation of a specific star graph should not influence the importance value.

These requirements are satisfied by the mean value. We can define the local importance of a specific star graph s of an image j as:
local-specific_{s,j} = (1 / arity_s) · Σ_{c_k ∈ s} importance_j(c_k)
The local importance of an index term i (a generic star graph) for an image j is a function of the importance values of the specific star graphs associated with the index term. Inspired by well-known text IR weighting schemes for the tf value of a keyword, we define the local importance of an index term as the normalized number of occurrences of its corresponding specific star graphs:
local_{i,j} = ( Σ_{s, s ↦ i} local-specific_{s,j} ) / max_k ( local-specific_{k,j} )    (5)
where s refers to the specific star graphs associated with i (written s ↦ i) and k ranges over all specific star graphs of the image j.
4.2.2. Global importance of a star graph in a collection In traditional approaches to text vector space models, the global importance of an index term aims to emphasize the impact of the most discriminating index terms (e.g. the use of an idf value). This factor for a star graph i is calculated using the classical formula:
global_i = log(N / n_i)    (6)
where N is the number of documents in the collection, and n_i is the number of images of the collection that are described by the star graph i. An important aspect of this factor is that it is applied after the document expansion (see Definition 8). Hence, its semantics are slightly different from the traditional idf (Martinet et al., 2003, 2003), as the index contains all generalizations of the original star graphs defined by the indexer. The traditional idf of a keyword i is calculated by counting the number of documents in which the keyword i occurs, while our global_i is calculated for the generic star graph i by counting not only the documents in which i occurs, but also the documents in which any star graph more specific than i occurs. For instance, in order to calculate the global importance value of the term ⟨left(Boat, People)⟩, it is necessary to take into account not only the documents indexed by ⟨left(Boat, People)⟩, but also the documents indexed by ⟨left(Sailboat, People)⟩, ⟨left(Boat, Jean)⟩, or ⟨left(Boat, Matthieu)⟩ (assuming that the concept type lattice contains the following relations: Sailboat ≤c Boat, Jean ≤c People, and Matthieu ≤c People). Indeed, all of these documents are implicitly indexed by ⟨left(Boat, People)⟩, which is made explicit by the automatic document expansion process (see Definition 8).

4.2.3. Weight of a star graph
The above definitions of the local importance and the global importance of a star graph allow us to define a weighting scheme for star graphs that consists of a combination of both values.
Fig. 11. Example of local and global values for two images.
According to the classical weighting approaches of the VSM for textual documents (see Section 2), we define the weight of a star graph i in a document j based on the two following sources: the local_{i,j} value that is calculated for the star graph i in the document j, and the global_i value that is calculated for the star graph i, independently of the document in which it appears:
w_{i,j} = local_{i,j} · global_i    (7)
Hence, the weight of a star graph combines its local importance in the document and its global importance in the collection. In the context of our relational vector space model, the users’ queries, like the documents, are represented with star graph vectors. The weighting scheme for query star graphs is based on their global importance in the collection:
w_{i,q} = global_i    (8)
Fig. 11 shows an example of two images from the collection Coll-1 (see the description in the next section) with local and global values for the index term ⟨right(Horse, Tree)⟩. The details of the S, P, and H values are given for the concept ‘‘Horse’’. The real surface (not using the log) is also given for information between braces. Note that the H value is defined for the image itself, and is therefore the same for all concepts. Also note that the global value is defined for a given term in the whole collection. This term is given a high global value (5.296) because only 4 out of the 798 images in the collection are indexed with this term. The global values for terms in this collection range from 0.167 for the term ⟨touch(Natural_Object, Natural_Object)⟩, which belongs (after document expansion) to 675 images, to 6.682 for the term ⟨touch(Car, Flower)⟩, which belongs to only one document. While the first term is not useful for discriminating documents in the collection, the second term is very useful for selecting relevant documents in the collection.
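The weighting scheme of Eqs. (6)–(8) can be sketched as follows; the document frequencies are taken from the Coll-1 example above, and the function names are our own:

```python
# Sketch of the star graph weighting scheme: w_ij = local_ij * global_i
# for documents (Eq. 7) and w_iq = global_i for queries (Eq. 8), with
# global_i an idf-like factor (Eq. 6) counted after document expansion.

import math

N = 798  # number of documents in Coll-1

def global_importance(n_i):
    """Eq. (6): global_i = log(N / n_i), n_i counted after expansion."""
    return math.log(N / n_i)

def doc_weight(local_ij, n_i):
    """Eq. (7): w_ij = local_ij * global_i."""
    return local_ij * global_importance(n_i)

def query_weight(n_i):
    """Eq. (8): w_iq = global_i."""
    return global_importance(n_i)

# <right(Horse, Tree)> occurs in 4 of the 798 images -> high global value
print(round(global_importance(4), 3))  # 5.296
```

The natural logarithm reproduces the global values reported above: 5.296 for a term in 4 documents, and 0.167 for a term in 675 documents.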
5. Experiments
In the previous sections, we developed an image model for representing documents based on vectors of generic star graphs, which constitute the indexing vocabulary. A weighting scheme for star graphs has also been introduced as part of the retrieval model. This experimental section aims at:
• measuring the improvements in performance yielded by the operation of expansion described in Section 3.6,
• evaluating the benefits of using star graphs as compared to simple concepts in the indexing vocabulary,
• comparing the difference in precision between a star graph vector space system and a conceptual graph total projection system.
5.1. Image test collections
For the purpose of evaluating the presented model, we have used two image collections. The first one, which we will refer to as ‘‘Coll-1’’, is composed of about 800 personal photographs from the authors. The images have been segmented and indexed manually, by defining polygons corresponding to real-world objects and assigning a semantic label to them. This collection contains a wide range of the authors’ holiday photographs. A set of 24 queries has been designed for this collection, containing an equal number of queries with one keyword (or one concept, e.g. ‘‘a boat’’), several keywords, one relation (or one star graph, e.g. ‘‘a water stream under a leafed tree’’), and several relations. Fig. 12 shows a sample of images from Coll-1. The second collection (‘‘Coll-2’’) contains about 2300 images that have been indexed automatically (Lim, 2001). The automatic indexing process is based on an artificial neural network trained to classify 20 × 20 image blocks into categories such as rock, building, and water. Regions (or blobs) are formed by aggregating spatially connected blocks sharing the same label. For comparison purposes, we have used the same query set as in the experiments presented in Mulhem and Lim (2002), in which the authors used a set of 24 queries. Fig. 13 shows a sample of images from Coll-2. For both collections, the importance values of image objects are computed according to Eq. (4). Spatial and position relations have been automatically extracted by considering the barycenter of the regions, and the resulting star graphs are weighted according to Eq. (7) (and Eq. (8) for query star graphs). The relations in the considered lattice include:
• an indication of the absolute location of IOs in the image (such as one IO ‘‘is located in such part of the image’’, the images being tessellated with a 5 × 5 grid),
• the information of whether an IO is close to (and touches) the border of the image (such as one IO ‘‘touches the right border of the image’’), and
• an indication of the relative position of IOs (such as one IO ‘‘is located below’’ another IO).
Fig. 12. Sample of images from Coll-1.
Fig. 13. Sample of images from Coll-2.
Table 1
Statistics for the two collections.

                                                   Coll-1     Coll-2
Number of documents                                798        2382
Number of queries                                  24         24
Number of concepts in vocabulary: original         237        26
Number of concepts in vocabulary: extended         253        38
Number of star graphs in vocabulary: original      15,868     2000
Number of star graphs in vocabulary: extended      53,007     7361
Number of concepts per document: original          6.94       3.01
Number of concepts per document: extended          16.16      8.81
Number of star graphs per document: original       75.14      18.74
Number of star graphs per document: extended       907.58     317.74
Number of concepts per query: original             1.67       1.58
Number of concepts per query: extended             31.62      7.29
Number of star graphs per query: original          2.37       2.5
Number of star graphs per query: extended          267.58     38.58
These relations are 2D relations that are automatically computed based on the location and surface of the bounding boxes of IOs. The automated detection of 3D relations between real-world objects is more problematic, since IOs are a 2D projection of these objects onto the image. A finer interpretation is necessary to integrate 3D relations. This step would require a 3D scene analysis to extract depth information, by using a probabilistic modeling of the background/foreground over a sequence of images (Li, Huang, Gu, & Tian, 2004), analyzing focus maps (Czuni & Csordas, 2003; Malik & Choi, 2008), using occlusion information (Merchán, Vázquez, Adán, & Salamanca, 2008), or even using a stereoscopic camera (Francois & Chupeau, 1997). Table 1 shows some statistics for both collections. These two collections have been selected because they differ in content, in size (Coll-2 is three times as big as Coll-1), in the indexing process (manual versus automatic), and in the size of the vocabulary (the vocabulary for Coll-1 is about 10 times as big as the one for Coll-2).
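The extraction of these 2D relations from bounding boxes can be sketched as below. We use image coordinates with the y axis pointing down; the box format, the border tolerance, and the tie-breaking between horizontal and vertical relations are illustrative assumptions, not the authors' exact implementation:

```python
# Sketch of 2D relation extraction: relative position from barycenters
# of bounding boxes, absolute location on a 5 x 5 grid, and border
# contact, as described in Section 5.1.

def barycenter(box):
    """Center of a bounding box (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2, (y0 + y1) / 2)

def relative_position(box_a, box_b):
    """'left'/'right' or 'above'/'below' from barycenter offsets
    (dominant axis wins; y grows downward)."""
    (ax, ay), (bx, by) = barycenter(box_a), barycenter(box_b)
    if abs(ax - bx) >= abs(ay - by):
        return "left" if ax < bx else "right"
    return "above" if ay < by else "below"

def grid_cell(box, width, height, n=5):
    """Absolute location: cell of an n x n tessellation of the image."""
    cx, cy = barycenter(box)
    return (min(int(cx / width * n), n - 1), min(int(cy / height * n), n - 1))

def touches_border(box, width, height, tol=2):
    """Which image borders the IO touches, within `tol` pixels."""
    x0, y0, x1, y1 = box
    return [side for side, hit in [("left", x0 <= tol), ("top", y0 <= tol),
            ("right", x1 >= width - tol), ("bottom", y1 >= height - tol)] if hit]
```

Each extracted relation then yields one specific star graph, e.g. `relative_position` producing ⟨below(o1, o2)⟩.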
Fig. 14. Impact of the expansion for concepts (above) and star graphs (below) for Coll-1.
5.2. Evaluation protocol
We have implemented different systems and compared their retrieval performances in terms of Mean Average Precision (MAP) values and Recall–Precision (R–P) curves. We have used the trec_eval tool⁶ provided by NIST to produce the results. The first system (VSM-C) is a vector space system based on concepts only, which corresponds to an implementation of the traditional VSM for text. Star graphs are not used in this system. The second system (VSM-SG) is a vector space system based on star graphs, as defined in Section 3. Both VSM-C and VSM-SG can be run in the original space or in the extended space, where the operation of expansion adds new dimensions to the vector space and new terms to the indexes. In addition, the weighting schemes defined in Eqs. (7) and (8) are used to weight star graphs. The third system, called DIESKAU (Mulhem & Lim, 2002), is a CG-based system which uses the total graph projection operation to select relevant images for a query. A detailed analysis of the evaluation results is given below.
5.3. Impact of the vector expansion
Figs. 14 and 15 show the MAP values for concepts and star graphs according to the different weighting schemes in both the original and extended spaces, for Coll-1 and Coll-2 respectively. We can see that for both collections, the MAP value in the extended space is about twice as high as the one in the original space. For both Coll-1 and Coll-2, the retrieval performance has increased by a ratio of 2 when using the extended documents and queries for concepts, and by a ratio of 3 for star graphs.
5.4. Relations increase precision in extended space
We compared the results of VSM-C and VSM-SG to measure the increase in precision brought about by integrating relations into the VSM. The results given in Fig. 16 show the MAP value difference between the concept-based system and the
⁶ URL: http://trec.nist.gov/act_part/tools.html.
Fig. 15. Impact of the expansion for concepts (above) and star graphs (below) for Coll-2.
Fig. 16. From concepts to star graphs: impact of integrating the relations. Above: Coll-1, below: Coll-2.
star graph-based system for Coll-1 and Coll-2 in the extended space. These results indicate that for Coll-1 (Fig. 16, above) relations significantly increase the precision of the VSM in the extended space, from 0.461 to 0.585, a 27% relative gain. This can also be seen in the R–P curves given in Fig. 19 (above) by comparing ‘BEST EXTENDED STAR GRAPH’ and ‘BEST EXTENDED CONCEPTS’. In this bar chart, we also display the MAP for DIESKAU, of 0.497. For Coll-2 (Fig. 16, below), the results show that there is almost no difference – see the scale of the chart – in terms of MAP between VSM-C (0.234) and VSM-SG (0.227). The R–P curves given in Fig. 19 (compare ‘BEST EXTENDED STAR GRAPH’ and ‘BEST EXTENDED CONCEPTS’ in the lower chart) confirm that there is no significant gain with VSM-SG. This result is due to the two following reasons:

• the reasonably small size of graphs in Coll-2 documents and queries (on average 3.01 concepts/document and 1.58 concepts/query),
• the large proportion of non-relational queries in this collection (12 out of 24).
Fig. 17. Result for the subset of relational queries of Coll-2.
Fig. 18. Recall–precision curves for best-performing settings of each system (Coll-1).
However, if we select from among all queries the subset of 12 relational queries, the increase in precision becomes significant (see Fig. 17): 9%, from 0.254 (VSM-C) to 0.278 (VSM-SG). We also note that the MAP for VSM-SG remains comparable to that of VSM-C when using all queries, meaning that the star graph representation remains valid for non-relational queries. For Coll-2, the MAP values for DIESKAU show a slight increase in precision when using the subset of relational queries (0.136) compared to when all queries are used (0.129).
5.5. Star graphs versus conceptual graphs
In this section, we discuss in more detail the differences in search precision between the star graph approach (VSM-SG) and the conceptual graph approach (DIESKAU). Figs. 18 and 19 show the R–P curves of these systems for Coll-1 and Coll-2, respectively. The R–P curve for Coll-1 shows that for small recall values, the graph projection system (DIESKAU) has a higher precision than the star graph-based system. Indeed, this system is precision-oriented, since it uses whole conceptual graphs as document indexes. Also, because the total graph projection is used to select relevant images, the system finds fully matching documents accurately. However, the star graph system does not keep the joint information between indexing star graphs. For instance, the joint information on [Tree:#1] in the star graphs:
[Tree:#1] → (left) → [Building:#2]

and

[Tree:#1] → (above) → [Car:#3]

is lost in the vector space representation. Indeed, these two star graphs are part of the index, which could describe both 'a tree is located to the left of a building and above a car' (the original description contained in the whole graph with the joint information) and 'a tree is located to the left of a building, and another tree is located above a car'. Hence, documents containing both star graphs are retrieved by the star graph system regardless of this joint information, which may cause partially relevant documents to be judged fully relevant.
For recall values over 0.5, DIESKAU shows a drop in precision, while the precision of the star graph-based system remains reasonably high. This drop is a consequence of the rigidity of the projection system: while both the concept-based system and the star graph-based system retrieve partially relevant documents, the total projection-based system misses these documents. From this point of view, the vector space system is more flexible than the graph projection system. This result confirms that the graph projection system is precision-oriented while the vector space system is recall-oriented.
Fig. 19. Recall–precision curves for best-performing settings of each system (Coll-2).
This lack of flexibility is more explicit for Coll-2, where the R–P curves show that DIESKAU's results are lower than those of the star graph system. Besides, the R–P curves given in Figs. 18 and 19 confirm the following results explained in Sections 5.4 and 5.5:
- the large increase in precision brought by the operation of expansion for both collections, and
- the larger difference in precision between VSM-C and VSM-SG for Coll-1 than for Coll-2.
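The loss of joint information discussed above can be sketched in a few lines. This is a hypothetical illustration (the function and data names are ours, not the paper's): once the instance markers (#1, #2, …) are dropped to form vector-space terms, the two readings become indistinguishable.

```python
from collections import Counter

def star_graph_bag(graphs):
    """Drop instance markers (#1, #2, ...) to form vector-space terms."""
    return Counter((c1.split(":")[0], rel, c2.split(":")[0])
                   for c1, rel, c2 in graphs)

# One tree that is both left of a building and above a car:
joint = [("Tree:#1", "left", "Building:#2"), ("Tree:#1", "above", "Car:#3")]
# Two distinct trees playing those roles:
split = [("Tree:#1", "left", "Building:#2"), ("Tree:#4", "above", "Car:#3")]

# Both scenes map to the same bag of star-graph terms:
assert star_graph_bag(joint) == star_graph_bag(split)
```

A total graph projection system would distinguish the two scenes, since projection preserves coreference of [Tree:#1] across relations.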
6. Discussion
The results in the previous section show that the operation of document expansion yields considerable gains in precision, with a ratio of about 2 for concepts and about 2.5 for star graphs. Moreover, integrating relations in the document index and query representation is an important aspect of image retrieval. It helps to increase search precision for relational queries, while not affecting non-relational queries. Last but not least, the overall retrieval performance of our system is of equivalent quality to that of a graph projection system, while query processing time is much shorter (linear) in our system. In this paper, we have not addressed the validation of the Local weighting model (see Eq. (5)), since this specific part of the model has been assessed in Martinet et al. (2005, 2008).
6.1. Impact of local value
In the experiments in this paper, the small differences that can be noticed in the MAP values are the result of minor re-rankings of documents, since all the different weighting settings of a given system return the same set of retrieved images. The reason for this is that the standard recall–precision measures evaluate systems against a ground truth about the binary relevance of documents. Consequently, internal re-rankings within the set of retrieved relevant documents have no (or very little) effect on the final MAP value. For instance, if the result of one system setting – say Boolean – is:
rank_Boolean = [4, 3, 1, 2, 5]

(that is to say, the sorted list of images), and the result of the same system with another setting – say Local – is:

rank_Local = [4, 1, 2, 3, 5]

then (assuming that the only relevant documents are 1–3) the MAP value is identical for both settings. This can be seen in Figs. 14–16: the MAP values are very similar for all weighting schemes, although there are small, non-significant variations. However, the quality of the rankings produced with the Local weighting model is closer to users' perception of relevance, since this model is based on Eq. (4), which has proved to be a good estimator of this perception. Besides, it should be noticed that the sparseness of the vector spaces – in comparison to the size of the collections – makes it difficult to evaluate the weighting method directly. Indeed, in Coll-1 (resp. Coll-2), the star graph vector space has 53,000 (resp. 7000) dimensions, while the collection contains about 800 (resp. 2000) images. In Coll-1 in particular, this difference of two orders of magnitude makes the occurrence or absence of a star graph in a document index decisive for the retrieval, regardless of its weight.
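The invariance of average precision under such internal re-rankings can be checked directly. The sketch below is our own helper (assuming binary relevance judgments, as in standard trec_eval-style evaluation), applied to the two example rankings:

```python
def average_precision(ranking, relevant):
    """AP of a ranked list of document ids against a binary relevance set."""
    hits, total = 0, 0.0
    for i, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / i          # precision at each relevant rank
    return total / len(relevant)

relevant = {1, 2, 3}
rank_boolean = [4, 3, 1, 2, 5]
rank_local = [4, 1, 2, 3, 5]

# Relevant documents occupy ranks 2, 3, 4 in both lists, so AP is identical:
assert average_precision(rank_boolean, relevant) == average_precision(rank_local, relevant)
```

Only a re-ranking that moves a relevant document across a non-relevant one changes the AP value, which is why the MAP differences between weighting settings are small.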
6.2. Impact of global value
We have found that the Global part of the weighting scheme has no significant impact on the retrieval results. In the original space, the best weighting scheme seems to be Local × Global, while in the extended space slightly better results are achieved when Local is used alone. The first result is in agreement with what could be expected from tf.idf in the traditional vector space for text retrieval. The second result can be explained by the distribution of the index terms in the extended space, which differs from a standard vector space because of the expansion operation. Star graphs located in the upper part of the lattice (i.e. close to ⊤_sg) tend to occur more often than star graphs in the lower part (i.e. close to ⊥_sg), since the expansion adds terms that are more generic than those belonging to the index. Besides, due to the sparseness of the space, a small number of terms appear fairly often in the collection, while it is rare to see a wide range of terms with a large Global value.
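The effect of expansion on an idf-style Global weight can be illustrated with a toy example (the concept names and the ancestor table are invented for illustration, and the weight is a plain idf, not the paper's exact Global formula): terms added near the top of the lattice occur in nearly every expanded document and thus receive a near-zero weight.

```python
import math

# Hypothetical ancestor table for a tiny concept lattice:
ancestors = {"church": ["building", "construction"],
             "house": ["building", "construction"],
             "tower": ["construction"]}

docs = [{"church"}, {"house"}, {"tower"}]
# Document expansion: add every ancestor of every index term.
expanded = [d | {a for t in d for a in ancestors.get(t, [])} for d in docs]

N = len(expanded)
def idf(term, corpus):
    df = sum(term in d for d in corpus)
    return math.log(N / df) if df else 0.0

assert idf("construction", expanded) == 0.0            # generic term: in every document
assert idf("church", expanded) > idf("building", expanded)  # specific terms keep weight
```

This is consistent with the observation that, after expansion, a global idf-like factor contributes little on top of the Local weight.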
6.3. Processing time and scalability
A short processing time per query is a central requirement for interactive information systems. In IR, we distinguish between indexing time and search time. While a rather expensive indexing time is acceptable, it is crucial to guarantee a short search time. The indexing approach that we implemented requires close to 10 s per image to generate the star graph index, including the time to generate the labels. Note that this step is performed only once, for all evaluated systems. We emphasize that, besides providing a balanced representation between concepts and conceptual graphs, the proposed star graph vector space approach benefits from an important feature of the VSM: its low processing complexity. Indeed, for a given vocabulary size, the evaluation time required for a query is linear in the number of documents. This contrasts with total graph projection systems, where the NP-hardness of graph projection leads to non-polynomial implementations. In the context of user-oriented systems, it is an important requirement to guarantee a short processing time, whatever the size of the collection. While the average processing time for one query with DIESKAU is close to 1 s on a desktop computer, it is three orders of magnitude shorter (about 1 × 10⁻³ s per query) for both VSM-C and VSM-SG.
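The linear query-evaluation cost follows from scoring sparse vectors with an inverted index, as in any VSM implementation. The following is a minimal sketch, not our system's code; the term and document names are illustrative:

```python
from collections import defaultdict
import math

index = defaultdict(list)        # term -> [(doc_id, weight)] postings
doc_norms = defaultdict(float)   # doc_id -> squared vector norm

def add_document(doc_id, weights):
    for term, w in weights.items():
        index[term].append((doc_id, w))
        doc_norms[doc_id] += w * w

def search(query_weights):
    scores = defaultdict(float)
    for term, qw in query_weights.items():   # only postings of query terms are visited
        for doc_id, dw in index[term]:
            scores[doc_id] += qw * dw
    qn = math.sqrt(sum(w * w for w in query_weights.values()))
    return sorted(((s / (qn * math.sqrt(doc_norms[d])), d)
                   for d, s in scores.items()), reverse=True)

add_document("img1", {"tree-left-building": 0.8, "tree-above-car": 0.6})
add_document("img2", {"tree-left-building": 0.5})
results = search({"tree-left-building": 1.0})
assert results[0][1] == "img2"   # exact single-term match has cosine 1.0
```

Each query touches at most the postings of its own terms, so the work grows linearly with the number of matching documents, never with the combinatorics of graph projection.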
6.4. Robustness to labeling errors
While the images in Coll-1 were annotated manually, the images in Coll-2 were indexed fully automatically. The question of annotation quality arises when the annotation process is automated. Indeed, the presented model assumes that an indexing process has generated labels for the images beforehand, and relies on this annotation. As a consequence, the quality of the retrieval results obviously depends on the quality of the annotation. However, the comparison between the evaluated systems is fair, since all systems are compared on the same basis, with the same images, the same labels, and the same queries. Our focus is to evaluate the difference between a standard bag-of-words search and more elaborate approaches that include relations. Besides, while one could argue that automatic document expansion is likely to amplify labeling errors, it is interesting to notice that it has no impact on the star graph model compared to the other models. This shows in the experimental results, where the gains in precision for Coll-1 and Coll-2 are similar. The explanation is that, on the one hand, a label missing from an image (e.g. a "boat" in the image does not appear in the index) will result in this image never being retrieved by any model with this label as a query (silence). On the other hand, a wrong label (e.g. a plane labeled "boat") will result in a non-relevant image being retrieved by all models (noise).
6.5. Limitations of the model
Since our model is an extension of the VSM, it obviously suffers from the same limitations as the original VSM. For instance, it is not possible to express the negation of a term in a query; a user cannot search for "a boat without sea". In addition, the model relies on an open-universe assumption. For instance, the query "a tree in the center of the image" does not implicitly mean ". . . and no other tree in the image". When generating the relevant document list, we considered an image to be relevant whenever there actually was a tree in the center of the image, regardless of the remainder of the image. However, it could be argued that a picture of a forest covering the whole image is not specifically relevant to this query. Another limitation is the implicit assumption of orthogonal dimensions in the vector space, which makes the use of the cosine valid. Eq. (3) does not take into account the possible dependencies between terms, which has raised criticism against this model (Raghavan & Wong, 1986). However, it has been shown that taking the dependencies into account does not improve the quality of the model: given the localized nature of the dependencies, considering them globally across the collection is likely to produce the opposite effect and deteriorate the model (Baeza-Yates & Ribeiro-Neto, 1999).
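The orthogonality assumption can be made concrete with a small example (a hypothetical sketch, not our system's code): under a cosine over sparse term vectors, two semantically dependent terms such as "boat" and "ship" are treated as independent dimensions, so a document indexed only by one scores zero for a query on the other.

```python
import math

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts."""
    dot = sum(u.get(t, 0.0) * w for t, w in v.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Related terms live on orthogonal axes, so no partial credit is given:
assert cosine({"boat": 1.0}, {"ship": 1.0}) == 0.0
assert cosine({"boat": 1.0}, {"boat": 1.0}) == 1.0
```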
7. Conclusion
In this paper, we have introduced a new framework for relational information retrieval dedicated to images. This framework includes an image model in which the physical image is segmented into a set of regions that can be aggregated to form Image Objects. Image Objects are a flexible way to delimit real-world objects in the image. When associated with concepts, they have a semantic interpretation; the relations between them are expressed via conceptual relations. Star graphs, which are elementary expressions of the relations between concepts, are the essential descriptors of the presented model. They generate a vector space in which documents and queries are represented as vectors. The vector coordinates denote the importance of each star graph in the index; they are estimated using a dedicated weighting scheme built on top of a model of importance of Image Objects. Together with the vector expansion and the similarity function, this representation provides a complete information retrieval model adapted to image retrieval.
7.1. Contributions of the paper
After implementing the model, the test results for two different collections show that:
- Document expansion yields an important gain in precision, with a ratio of about 2 for concepts and about 2.5 for star graphs.
- Integrating the relations in the document index and the query representation is an important aspect to be taken into account in image retrieval. It helps to increase the search precision for relational queries, while not affecting non-relational ones.
- The overall retrieval performance of our system is comparable to that of a graph projection system, while the query processing time of our system is much shorter (linear).
This vector space model of star graphs contributes to relational image indexing.
7.2. Future work
The work presented here is meant to be integrated into a fully automatic image retrieval system. Future work mainly aims at designing and testing complementary approaches that either enhance the presented model or can be compared with the current proposition. Since the expansion process generates new dimensions in the document indexes, a side-effect is the introduction of strong dependencies between dimensions. This issue could be addressed through a theoretical and experimental analysis of the impact of these increased dependencies. For instance, a Singular Value Decomposition of the star graph vector space may help reduce the space to its useful dimensions, as has been done with the traditional VSM for text. A theoretical analysis of the structure of the star graph space will indicate how a dedicated idf could be designed for it, taking into account both the sparseness of the space and the effect of document expansion. Regarding the weighting issue, the presented geometrical modeling could be enriched with low-level indicators of region saliency. For instance, saliency maps (Itti, Koch, & Niebur, 1998) could be used to determine the regions and the importance of star graphs. Through this approach, the importance values of the regions could be a function of the estimated saliency value of interest points in the regions.
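As a sketch of the suggested direction (not an implemented component of our system), an LSA-style truncated SVD of a toy star-graph/document matrix would look as follows; the matrix values are invented for illustration:

```python
import numpy as np

# Rows: star-graph terms, columns: documents (tiny hypothetical matrix).
A = np.array([[1., 0., 1.],
              [1., 0., 0.],
              [0., 1., 1.],
              [0., 1., 0.]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2                                           # number of latent dimensions kept
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]     # best rank-k approximation of A

# Documents can then be compared in the k-dimensional latent space:
docs_k = np.diag(s[:k]) @ Vt[:k, :]
assert docs_k.shape == (k, 3)
assert np.linalg.matrix_rank(A_k) <= k
```

The open question raised above is whether such a reduction helps or hurts in a space whose dimensions are already made dependent by the expansion operation.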
Appendix A. Concept type lattice of Coll-1
Root > IO
Root > Image
Root > Artificial_Object
Root > Natural_Object
Artificial_Object > Other_Artificial_Object
Artificial_Object > Bell
Artificial_Object > Construction
Construction > Other_Construction
Building > Pond
Construction > Building
Building > Other_Building
Building > Castel
Building > Cult_Building
Cult_Building > Other_Cult_Building
Cult_Building > Church
Cult_Building > Monastery
Cult_Building > Mosque
Cult_Building > Synagogue
Cult_Building > Temple
Building > Fortress
Building > Hut
Building > Igloo
Building > Block
Building > House
Building > Tower
Tower > Other_Tower
Tower > Church_Tower
Tower > Minaret
Tower > Castel_Tower
Tower > Fortress_Tower
Construction > Cloister
Construction > Tier
Construction > Building_Detail
Building_Detail > Television_Antenna
Appendix B. Concept type lattice of Coll-2

Root > IO
Root > symb
symb > people
people > face
face > man
face > policeman
man > jluc
man > uncle
face > woman
woman > wife
woman > antie
face > child
child > son
child > moody-boy
child > funny-boy
woman > maid
woman > blonde
woman > old-lady
face > white-hair
people > crowd
crowd > heads
crowd > audience
crowd > cyclists
crowd > kids
crowd > group
Appendix C. Relation lattice

Root > comp
Root > IsA
Root > touchborder
touchborder > touchtop
touchborder > touchbottom
touchborder > touchright
touchborder > touchleft
Root > touch
Root > right_of
Root > below_of
Root > center0x
center0x > center00
center0x > center01
center0x > center02
center0x > center03
center0x > center04
Root > center1x
center1x > center10
center1x > center11
center1x > center12
center1x > center13
center1x > center14
Root > center2x
center2x > center20
center2x > center21
center2x > center22
center2x > center23
center2x > center24
Root > center3x
center3x > center30
center3x > center31
center3x > center32
center3x > center33
center3x > center34
Root > center4x
center4x > center40
center4x > center41
center4x > center42
center4x > center43
center4x > center44
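For illustration, the document-expansion step over such a lattice can be sketched as follows. This is a hypothetical implementation using only the touchborder fragment of Appendix C (the helper names are ours): each index term is expanded with all of its more generic ancestors.

```python
# child -> list of parents in the lattice ("Root" is the top element).
parents = {
    "touchtop": ["touchborder"], "touchbottom": ["touchborder"],
    "touchright": ["touchborder"], "touchleft": ["touchborder"],
    "touchborder": ["Root"], "Root": [],
}

def ancestors(term):
    """All terms more generic than `term`, by walking up the lattice."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents.get(p, []))
    return seen

def expand(index_terms):
    """Document expansion: add every ancestor of every index term."""
    return set(index_terms) | {a for t in index_terms for a in ancestors(t)}

assert expand({"touchtop"}) == {"touchtop", "touchborder", "Root"}
```

A document indexed with a specific relation thereby also matches queries phrased with a more generic one, which is the mechanism behind the expanded space evaluated in Section 5.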
References
Ayache, S., Quénot, G., & Gensel, J. (2007). Classifier fusion for SVM-based multimedia semantic indexing. In European conference on information retrieval (ECIR'2007) (pp. 494–504).
Baeza-Yates, R., & Ribeiro-Neto, B. (1999). Modern information retrieval. Addison-Wesley.
Bastan, M., & Duygulu, P. (2006). Recognizing objects and scenes in news videos. In Proceedings of CIVR'2006, Phoenix, AZ.
Bertino, E., & Catania, B. (1998). A constraint-based approach to shape management in multimedia databases. ACM Multimedia Journal, 6(1), 2–16.
Bouch, A., Kuchinsky, A., & Bhatti, N. (2000). Quality is in the eye of the beholder: Meeting users' requirements for internet quality of service. In Conference on human factors in computing systems – CHI'00 (pp. 297–304).
Carson, C., Thomas, M., Belongie, S., Hellerstein, J. M., & Malik, J. (1999). Blobworld: A system for region-based image indexing and retrieval. In International conference on visual information and information systems. Springer.
Chein, M., & Mugnier, M.-L. (1992). Conceptual graphs: Fundamental notions. Revue d'intelligence artificielle, 6(4), 365–406.
Czuni, L., & Csordas, D. (2003). Depth-based indexing and retrieval of photographic images. In LNCS visual content processing and representation (pp. 76–83). Berlin/Heidelberg: Springer.
Deerwester, S., Dumais, S., Furnas, G., Landauer, T., & Harshman, R. (1990). Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6), 391–407.
Di Sciascio, E., Donini, F. M., & Mongiello, M. (2002). Structured knowledge representation for image retrieval. Journal of Artificial Intelligence Research, 16, 209–257.
Flickner, M., Sawhney, H., Niblack, W., Ashley, J., Huang, Q., Dom, B., et al. (1995). Query by image and video content: The QBIC system. IEEE Computer, 28(9), 23–32.
Francois, E., & Chupeau, B. (1997). Depth-based segmentation. IEEE Transactions on Circuits and Systems for Video Technology, 7(1), 237–240.
Hoerster, E., Lienhart, R., & Slaney, M. (2007). Image retrieval on large-scale image databases. In Proceedings of CIVR'2007 (pp. 17–24). Amsterdam, The Netherlands.
Hollink, L., Schreiber, A. Th., Wielinga, B. J., & Worring, M. (2004). Classification of user image descriptions. International Journal of Human–Computer Studies, 61, 601–626.
Hull, D. A. (1994). Improving text retrieval for the routing problem using latent semantic indexing. In W. B. Croft & C. J. van Rijsbergen (Eds.), Proceedings of SIGIR-94, 17th ACM international conference on research and development in information retrieval (pp. 282–289). Dublin, IE; Heidelberg, DE: Springer-Verlag.
Itti, L., Koch, C., & Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11), 1254–1259.
Lee, J. H. (1997). Analyses of multiple evidence combination. In SIGIR'97, Philadelphia, USA (pp. 267–276).
Li, L., Huang, W., Gu, I. Y., & Tian, Q. (2004). Statistical modeling of complex backgrounds for foreground object detection. IEEE Transactions on Image Processing, 13(11), 1459–1472.
Lim, J.-H. (2000). Photograph retrieval and classification by visual keywords and thesaurus. New Generation Computing, 18, 147–156.
Lim, J.-H. (2001). Building visual vocabulary for image indexation and query formulation. Pattern Analysis and Applications [Special issue on Image Indexation], 4(2/3), 125–139.
Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91–110.
Maher, P. E. (1991). Conceptual graphs – A framework for uncertainty management. In NAFIPS'91 (pp. 106–110).
Malik, A. S., & Choi, T.-S. (2008). A novel algorithm for estimation of depth map using image focus for 3D shape recovery in the presence of noise. Pattern Recognition, 41(7), 2200–2225.
Martinet, J., & Satoh, S. (2007). Using visual–textual mutual information for inter-modal document indexing. In ECIR'07, Rome, Italy.
Martinet, J., Chiaramella, Y., Mulhem, P., & Ounis, I. (2003). Photograph indexing and retrieval using star-graphs. In Proceedings of CBMI'03 – Third international workshop on content-based multimedia indexing, Rennes (pp. 335–341).
Martinet, J., Ounis, I., Chiaramella, Y., & Mulhem, P. (2003). A weighting scheme for star-graphs. In Proceedings of ECIR'03 – 25th BCS-IRSG European conference on information retrieval research, Pisa (pp. 546–554).
Martinet, J., Chiaramella, Y., & Mulhem, P. (2005). A model for weighting image objects in home photographs. In ACM-CIKM'2005, Bremen, Germany (pp. 760–767).
Martinet, J., Satoh, S., Chiaramella, Y., & Mulhem, P. (2008). Media objects for user-centered similarity matching. Multimedia Tools and Applications [Special issue on Semantic Multimedia].
Mechkour, M. (1995). Emir2: An extended model for image representation and retrieval. In Database and expert systems applications conf., London (pp. 395–404).
Mechkour, M. (1995). Un modèle étendu de représentation et de correspondance d'images pour la recherche d'informations. Thèse de Doctorat, Université Joseph Fourier, Grenoble.
Merchán, P., Vázquez, A. S., Adán, A., & Salamanca, S. (2008). 3D scene analysis from a single range image through occlusion graphs. Pattern Recognition Letters, 29(8), 1105–1116.
Morton, S. (1987). Conceptual graphs and fuzziness in artificial intelligence. PhD thesis, University of Bristol.
Mugnier, M.-L., & Chein, M. (1996). Représenter des connaissances et raisonner avec des graphes. Revue d'intelligence artificielle, 10(1), 7–56.
Mulhem, P., & Lim, J. H. (2002). Symbolic photograph content-based retrieval. In CIKM'2002 – ACM conference on information and knowledge management, Virginia, USA (pp. 94–101).
Mulhem, P., Lim, J. H., Leow, W. K., & Kankanhalli, M. (2003). Advances in digital home photo albums. In S. Deb (Ed.), Multimedia systems and content-based image retrieval. Idea Group Publishing.
Niblack, W., Barber, R., Equitz, W., Flickner, M., Glasman, E., Petkovic, D., et al. (1993). The QBIC project: Querying images by content using color, texture, and shape. Research Report RJ 9203 (81511), IBM Almaden Research Center, San Jose, CA.
Nielsen, J. (1994). Usability engineering. San Francisco: Morgan Kaufmann.
Ounis, I., & Pasca, M. (1998). Relief: Combining expressiveness and rapidity into a single system. In SIGIR'98 (pp. 266–274).
Quelhas, P., & Odobez, J.-M. (2006). Natural scene image modeling using color and texture visterms. In CIVR'2006, Phoenix, AZ.
Quelhas, P., Monay, F., Odobez, J.-M., Gatica-Perez, D., & Tuytelaars, T. (2007). A thousand words in a scene. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(9), 1575–1589.
Raghavan, V. V., & Wong, S. K. M. (1986). A critical analysis of vector space model for information retrieval. Journal of the American Society for Information Science, 37(5), 279–287.
Rodden, K. (1999). How do people organise their photographs? In Proceedings of 25th BCS-IRSG colloquium on IR.
Rodden, K., & Wood, K. R. (2003). How do people manage their digital photographs? In ACM conference on human factors in computing systems – CHI'03, Florida, USA (pp. 409–416).
Rojet, A. S., & Schwartz, E. L. (1990). Design considerations for a space-variant visual sensor with complex-logarithmic geometry. In 10th international conference on pattern recognition (Vol. 2, pp. 278–285).
Salton, G. (1971). The SMART retrieval system. Prentice Hall.
Salton, G., & Buckley, C. (1988). Term-weighting approaches in automatic text retrieval. Information Processing and Management, 24(5), 513–523.
Salton, G., & McGill, M. (1983). Introduction to modern information retrieval. McGraw-Hill.
Salton, G., Wong, A., & Yang, C. (1975). A vector space model for automatic indexing. Communications of the ACM, 18(11), 613–620.
Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27, 379–423, 623–656.
Shi, J., & Malik, J. (2000). Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8), 888–905.
Smeulders, A. W., Worring, M., Santini, S., Gupta, A., & Jain, R. (2000). Content-based image retrieval at the end of the early years. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(12), 1349–1380.
Snoek, C. G. M., & Worring, M. (2005). Multimodal video indexing: A review of the state-of-the-art. Multimedia Tools and Applications, 25(1), 5–35.
Sowa, J. F. (1984). Conceptual structures. Reading, MA: Addison-Wesley.
ter Haar Romeny, B. M. (2003). Front-end vision and multi-scale image analysis. Kluwer Academic Publishers.
van Rijsbergen, C. J. (1979). Information retrieval (2nd ed.). London: Butterworths.
Vapnik, V. (1995). The nature of statistical learning theory. New York: Springer-Verlag.
Vogel, J., & Schiele, B. (2007). Semantic modeling of natural scenes for content-based image retrieval. International Journal of Computer Vision, 72(2), 133–157.
Wang, J. Z., & Du, Y. (2001). RF × IPF: A weighting scheme for multimedia information retrieval. In ICIAP (pp. 380–385).
Wang, J. Z., Li, J., & Wiederhold, G. (2001). SIMPLIcity: Semantics-sensitive integrated matching for picture libraries. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(9), 947–963.
Wuwongse, V., & Manzano, M. (1993). Fuzzy conceptual graphs. In ICCS'93 (pp. 430–449).
Zheng, Q.-F., Wang, W.-Q., & Gao, W. (2006). Effective and efficient object-based image retrieval using visual phrases. In ACM Multimedia'2006, Santa Barbara, California.