Multiple-Instance Image Database Retrieval by Spatial Similarity Based on Interval Neighbor Group

John Y. Chiang
Department of Computer Science and Engineering, National Sun Yat-sen University, Kaohsiung, Taiwan 804
886-934-151515
[email protected]

Yen-Ren Huang
Department of Computer Science and Engineering, National Sun Yat-sen University, Kaohsiung, Taiwan 804
[email protected]

Shuenn-Ren Cheng
Department of Business Administration, Cheng Shiu University, Kaohsiung, Taiwan
886-7-7310606 #1195
[email protected]

ABSTRACT
In this paper, a multiple-instance image retrieval system incorporating a general spatial similarity measure is proposed. Multiple-instance learning is employed to summarize the commonality of spatial features among positive and negative example images. The general spatial similarity measure evaluates the degree of similarity between matching atomic spatial relations present in the maximum common object set of the query and a database image, based on their nodal distance in an Interval Neighbor Group (ING): the shorter the distance, the higher the degree of similarity. An ensemble similarity measure, derived from the spatial relations of all constituent objects in the query and a database image, then integrates these atomic spatial similarity assessments into an overall similarity value between two images. Images in a database can therefore be quantitatively ranked according to their degree of ensemble spatial similarity with the query. To demonstrate the feasibility of the proposed approach, two sets of tests for querying an image database are performed: single-instance vs. multiple-instance retrieval using the proposed RSS-ING scheme, and the RSS-ING scheme vs. the 2D Be-string similarity method incorporating identical multiple-instance learning. The ING-based spatial similarity measure with fine granularity, combined with a multiple-instance learning paradigm to forge a unified query key, produces retrieval results that better match the user's expectation.

1. INTRODUCTION
Retrieval of digital images from a database is an active research area due to the inefficiency of query processing with traditional textual languages [1-3]. In content-based image retrieval (CBIR) applications, features sufficiently discriminative to summarize the image content are usually extracted first to reduce the complexity of the information contained in an image. However, images are inherently ambiguous, since they contain a great amount of information that supports many different interpretations. Using a single image to query a database might employ features that do not match the user's intention and thus retrieve dissimilar false-positive images, or leave similar false-negative ones behind. How to automatically extract reliable image features as a query key that meets the user's anticipation in a CBIR system is therefore an important topic. A multiple-instance learning procedure identifies common positive features and excludes negative ones to further clarify the user's search criteria. Spatial similarity is evaluated for objects present in the maximum common object set of the query and a database image. The derivation begins with the similarity between two matching atomic spatial relations, and then extends to two sets of atomic spatial relations; the Ensemble Spatial Relation Similarity (ESRS), reflecting the overall degree of spatial resemblance between two images, can then be determined. The last section demonstrates the effectiveness of this approach by comparing the retrieval results of (1) single-instance versus multiple-instance learning under the same proposed RSS-ING paradigm, and (2) multiple-instance retrieval employing the RSS-ING approach versus a 2D Be-string spatial representation technique. Due to the derivation of commonality from multiple example images through multiple-instance learning and the fine granularity of the proposed spatial similarity measure, consistent search results matching the user's expectation are obtained in both tests. Finally, concluding remarks are made.
Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]

General Terms
Theory.

Keywords
Retrieval by spatial similarity (RSS), Interval Neighbor Group (ING), multiple-instance learning, content-based image retrieval (CBIR).
2. MULTIPLE-INSTANCE LEARNING
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
CIVR '10, July 5-7, 2010, Xi'an, China.
Copyright 2010 ACM 978-1-4503-0117-6/10/07 ...$10.00.
Multiple-instance learning takes a small set of positive and negative example images, each associated with extracted feature clusters the user desires or wants excluded [11, 12]. The positive concepts are then integrated and the negative ones excluded to automatically generate a unified query key for a CBIR system. Each training example image corresponds to a bag containing a plurality of instances, and each instance is mapped to a point in feature space. A bag is labeled negative if all the instances in it are negative; it is labeled positive if at least one instance in it is positive. Note that the instances within a bag are not themselves labeled; only the bag is labeled positive or negative. Each bag is therefore an example image with ambiguous concepts. From a collection of labeled bags, the learner tries to induce common concepts that will label unseen bags correctly. The ambiguity among the multiple feature clusters existing within a single image is thus clarified with the help of multiple-instance learning. The features extracted from both positive and negative example images are forged through the multiple-instance learning procedure into a unified query key for retrieving relevant images from a database.
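The bag semantics above can be sketched in a few lines. Representing each example image as a set of discrete feature clusters, and the set-intersection style of combining them, are illustrative assumptions for this sketch; the paper's actual learner operates in a continuous feature space (e.g., via diverse density), and the function names are ours.

```python
# Hedged sketch of multiple-instance bag semantics as described above.
# Sets of discrete "feature clusters" stand in for points in feature space.

def bag_is_positive(instance_is_positive):
    """A bag is positive if at least one instance in it is positive."""
    return any(instance_is_positive)

def unified_query_key(positive_bags, negative_bags):
    """Integrate concepts common to all positive bags and exclude any
    concept that appears in a negative bag."""
    common = set.intersection(*(set(b) for b in positive_bags))
    unwanted = set().union(*(set(b) for b in negative_bags)) if negative_bags else set()
    return common - unwanted

# Example: two positive example images and one negative one.
key = unified_query_key([{"sky", "sea", "sand"}, {"sky", "sea", "boat"}],
                        [{"boat", "sky"}])
print(key)  # {'sea'}
```

The negative bag removes a concept ("sky") that all positive bags share, mirroring how negative examples further clarify the search criteria.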
3. DERIVATION OF THE GENERAL SIMILARITY MEASURE
3.1 Atomic Spatial Relation and Interval Neighbor Group (ING)
An image is first preprocessed into a symbolic picture, composed of minimum bounding rectangles (MBRs) enclosing the constituent objects, as a form of abstraction [13]. Given two non-zero-sized objects A and B that are abstracted as MBRs, there are 13 possible one-dimensional spatial relations between A and B [14], as shown in Fig. 1. Note that the assignment of labels to objects, e.g., A and B, is known a priori. Therefore, a total of 169 spatial relations exist for a pair of two-dimensional objects, as illustrated in Fig. 2. A 2D spatial relation describes the topological relation between two objects and is considered the most fundamental unit, i.e., the atomic spatial relation, in an RSS system. An atomic spatial relation can be decomposed into horizontal and vertical 1D spatial components by projecting onto the x- and y-axes, respectively. Conversely, given both the horizontal and vertical 1D spatial relations, the corresponding atomic 2D spatial relation is uniquely determined.
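The 13 one-dimensional relations of Fig. 1 are the interval relations of Allen's interval algebra. A compact classifier for them, under the assumption that each MBR projection is an interval (a1, a2) with a1 < a2, might look as follows; the relation names follow Table 1, and the function name is our own.

```python
from itertools import product

RELATIONS = ["before", "meet", "overlap", "finish", "during", "start", "equal",
             "~start", "~during", "~finish", "~overlap", "~meet", "~before"]

def relation_1d(a, b):
    """Classify the 1D spatial relation between intervals a = (a1, a2) and
    b = (b1, b2), with a1 < a2 and b1 < b2, into one of the 13 relations."""
    (a1, a2), (b1, b2) = a, b
    if a2 < b1:                  return "before"
    if a2 == b1:                 return "meet"
    if a1 < b1 < a2 < b2:        return "overlap"
    if b1 < a1 and a2 == b2:     return "finish"
    if b1 < a1 and a2 < b2:      return "during"
    if a1 == b1 and a2 < b2:     return "start"
    if a1 == b1 and a2 == b2:    return "equal"
    if a1 == b1 and a2 > b2:     return "~start"
    if a1 < b1 and a2 > b2:      return "~during"
    if a1 < b1 and a2 == b2:     return "~finish"
    if b1 < a1 < b2 < a2:        return "~overlap"
    if a1 == b2:                 return "~meet"
    return "~before"             # a1 > b2

# An atomic 2D relation is the pair of x- and y-projections: 13 x 13 = 169.
print(len(list(product(RELATIONS, RELATIONS))))  # 169
```

Applying `relation_1d` to the x- and y-projections of two MBRs yields the ING node pair used throughout Section 3.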
Figure 2: A total of 169 possible atomic spatial relations can be found between two MBRs.

In Fig. 3, the 13 spatial relations between two 1D objects are further arranged as an Interval Neighbor Group (ING) to illustrate graphically the relations between the various spatial arrangements [15]. Every 1D spatial relation R_1D(N^i), i = 1, ..., 13, in Fig. 1 can be associated with one specific node N^i in an ING. Two nodes are neighbors if the corresponding spatial relations can be directly transformed into one another by continuous shortening, lengthening or moving operations. An atomic spatial relation in image Q, R_2D(N_Q^{i,j}) = (R_1D(N_Qx^i), R_1D(N_Qy^j)), 1 <= i, j <= 13, can be represented by an ING node pair N_Q^{i,j} = (N_Qx^i, N_Qy^j), containing two ING nodes N_Qx^i, N_Qy^j in two INGs, each associated with the corresponding horizontal and vertical 1D components, respectively.
Three different search strategies are commonly employed, namely exact match, subimage and similarity retrieval [16]. An exact match search aims at finding a database image identical to the query [17, 18]: the image retrieved and the query have exactly the same object set and spatial arrangements. Subimage retrieval, on the other hand, intends to find database images that encompass all the objects and spatial arrangements in the query image [19]; it is analogous to the query "find all images in the database that include this query image." Similarity retrieval, which allows disparate objects in the query image and a database one, is the most general kind of spatial retrieval [20-24]. Its purpose is to find all database images that partially match a query in terms of constituent objects and spatial relations. Given a collection of partially matched database images, an effective similarity measure is required to generate a retrieval result ranked according to the degree of spatial resemblance with the query image. In the 2D Be-string scheme [20], the topology of objects in an image is represented by a string derived from the corresponding MBRs. The spatial relation between two boundary symbols is depicted by applying a dummy object, i.e., a virtual object that does not exist in the original image. By comparing the 2D Be-strings of the query and a database image, the length of the longest common subsequence (LCS) is used as the similarity index, and an overall similarity metric is determined from the average of the LCS lengths along the x- and y-axis projections of the MBRs. However, two database images sharing the same LCS, yet with different remaining substrings, are not differentiable under this approach. For a Retrieval by Spatial Similarity (RSS) system to be effective, the key issue is a general similarity measure with fine granularity based on the ensembles of spatial relations existing in two images.

Figure 1: A listing of the 13 spatial relations between two 1D objects.

Figure 3: The Interval Neighbor Group (ING), with nodes corresponding to the 13 1D spatial relations.

3.2 The ING Nodal Distance and Degree of Similarity between Two Atomic Spatial Relations
For every two objects in an image, there exists an atomic spatial relation, so a set of atomic spatial relations can be derived from an image with a plural number of objects. Before exploring the spatial similarity between two images, the similarity measure between two atomic relations has to be determined first. The degree of similarity between two atomic spatial relations is linked directly to the distance between the associated nodes in an ING. Once all the one-to-one atomic relation similarity values are derived, an ensemble similarity measure between two images can be determined.

An atomic 2D spatial relation R_2D(N_Q^{i,j}) in image Q can be decomposed into two 1D components R_1D(N_Qx^i), R_1D(N_Qy^j), each an orthogonal projection along the horizontal or vertical axis. For every 1D spatial relation R_1D(N^i), there is an associated node N^i in an ING. Take Fig. 4(a) as an example: the atomic spatial relation formed by objects o^1 and o^2 in image Q is decomposed into x- and y-axis components corresponding to ING nodes N_Qx^i = meet and N_Qy^j = overlap in two INGs, as shown in Fig. 4(b) and (c).

An ING can be represented as a tuple (V, E), where V is the set containing the 13 nodes associated with 1D spatial relations, and E the set of undirected edges connecting the nodes. An edge e(N^i, N^j), e in E, N^i, N^j in V, connects two neighboring nodes by a direct link. For a node N^i in V, its corresponding 1D spatial relation is denoted R_1D(N^i). The distance between two nodes connected by a direct link is one. A path is a sequence of nodes such that from each of its nodes there is an edge to the next node in the sequence; the shortest path between N^i and N^j is the path with the minimum number of edges among all paths from N^i to N^j. The distance dist_1D(N^i, N^j) is the number of edges traversed along the shortest path from N^i to N^j. The distances between all pairs of ING nodes are summarized in Table 1; the farthest is 6, the closest 0. The distance between two ING nodes indicates the degree of similarity between the two projected 1D spatial relations: neighboring nodes can be directly transformed into one another by a single shortening, lengthening or moving operation, while nodes far apart require a cascade of such operations. Based on this observation, the Spatial Relation Similarity (SRS) between two 1D spatial relations R_1D(N^i) and R_1D(N^j) is linked to the distance between the corresponding ING nodes N^i and N^j as follows:

    SRS_1D(R_1D(N^i), R_1D(N^j)) = 1 - dist_1D(N^i, N^j) / 6.

The range of SRS_1D(R_1D(N^i), R_1D(N^j)) is between 0 and 1, with 1 denoting an exact match and 0 total irrelevance.
Table 1: The shortest distance between two nodes in an ING

            Before Meet Overlap Finish During Start Equal ~Start ~During ~Finish ~Overlap ~Meet ~Before
Before        0     1     2      4      4      3     3     4      4       3       4       5      6
Meet          1     0     1      3      3      2     2     3      3       2       3       4      5
Overlap       2     1     0      2      2      1     1     2      2       1       2       3      4
Finish        4     3     2      0      1      2     1     2      2       2       1       2      3
During        4     3     2      1      0      1     1     2      2       2       2       3      4
Start         3     2     1      2      1      0     1     2      2       2       2       3      4
Equal         3     2     1      1      1      1     0     1      1       1       1       2      3
~Start        4     3     2      2      2      2     1     0      1       2       1       2      3
~During       4     3     2      2      2      2     1     1      0       1       2       3      4
~Finish       3     2     1      2      2      2     1     2      1       0       2       3      4
~Overlap      4     3     2      1      2      2     1     1      2       2       0       1      2
~Meet         5     4     3      2      3      3     2     2      3       3       1       0      1
~Before       6     5     4      3      4      4     3     3      4       4       2       1      0
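Table 1 can be encoded directly as a lookup matrix, giving both the 1D nodal distance and the SRS_1D = 1 - dist/6 similarity of Section 3.2. The identifier names below are our own; the numbers are transcribed verbatim from Table 1.

```python
RELATIONS = ["before", "meet", "overlap", "finish", "during", "start", "equal",
             "~start", "~during", "~finish", "~overlap", "~meet", "~before"]

# Table 1: shortest-path distances between the 13 ING nodes.
DIST = [
    [0, 1, 2, 4, 4, 3, 3, 4, 4, 3, 4, 5, 6],
    [1, 0, 1, 3, 3, 2, 2, 3, 3, 2, 3, 4, 5],
    [2, 1, 0, 2, 2, 1, 1, 2, 2, 1, 2, 3, 4],
    [4, 3, 2, 0, 1, 2, 1, 2, 2, 2, 1, 2, 3],
    [4, 3, 2, 1, 0, 1, 1, 2, 2, 2, 2, 3, 4],
    [3, 2, 1, 2, 1, 0, 1, 2, 2, 2, 2, 3, 4],
    [3, 2, 1, 1, 1, 1, 0, 1, 1, 1, 1, 2, 3],
    [4, 3, 2, 2, 2, 2, 1, 0, 1, 2, 1, 2, 3],
    [4, 3, 2, 2, 2, 2, 1, 1, 0, 1, 2, 3, 4],
    [3, 2, 1, 2, 2, 2, 1, 2, 1, 0, 2, 3, 4],
    [4, 3, 2, 1, 2, 2, 1, 1, 2, 2, 0, 1, 2],
    [5, 4, 3, 2, 3, 3, 2, 2, 3, 3, 1, 0, 1],
    [6, 5, 4, 3, 4, 4, 3, 3, 4, 4, 2, 1, 0],
]

def dist_1d(r1, r2):
    """ING nodal distance dist_1D between two 1D spatial relations."""
    return DIST[RELATIONS.index(r1)][RELATIONS.index(r2)]

def srs_1d(r1, r2):
    """SRS_1D = 1 - dist_1D / 6, ranging over [0, 1]."""
    return 1 - dist_1d(r1, r2) / 6

print(dist_1d("meet", "~start"), srs_1d("meet", "~start"))  # 3 0.5
```

The matrix is symmetric, as expected for shortest-path distances on an undirected graph.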
The approach for deriving the 1D similarity measure above is extended to the 2D case by considering the 2D distance first. Assume images Q and D contain exactly two common objects o^1 and o^2. The matching object pairs (o^1, o^2) in images Q and D form atomic spatial relations R_2D(N_Q^{i,j}) and R_2D(N_D^{k,l}), respectively. The x- and y-projections of R_2D(N_Q^{i,j}) and R_2D(N_D^{k,l}) are associated with ING node pairs N_Q^{i,j} = (N_Qx^i, N_Qy^j) and N_D^{k,l} = (N_Dx^k, N_Dy^l). The 2D distance between N_Q^{i,j} and N_D^{k,l} can be obtained by summing up the corresponding 1D shortest-path distances:

    dist_2D(N_Q^{i,j}, N_D^{k,l}) = dist_1D(N_Qx^i, N_Dx^k) + dist_1D(N_Qy^j, N_Dy^l).    (1)

Following the same paradigm, the degree of similarity between two atomic spatial relations can be expressed as the average of the 1D similarity values along the x- and y-axes:

    SRS_2D(R_2D(N_Q^{i,j}), R_2D(N_D^{k,l}))
        = [SRS_1D(R_1D(N_Qx^i), R_1D(N_Dx^k)) + SRS_1D(R_1D(N_Qy^j), R_1D(N_Dy^l))] / 2
        = 1 - dist_2D(N_Q^{i,j}, N_D^{k,l}) / 12.    (2)

Take Fig. 4(a) as an example. There are two common objects o^1 and o^2 in images Q and D, so an atomic spatial relation, R_2D(N_Q^{i,j}) and R_2D(N_D^{k,l}) respectively, can be derived for each image. The pair of spatial relations (R_2D(N_Q^{i,j}), R_2D(N_D^{k,l})) formed by two common objects in two images is called a matching atomic pair. The elements of this matching atomic pair are associated with the ING node pairs N_Q^{i,j} = (N_Qx^i, N_Qy^j) = (meet, overlap) and N_D^{k,l} = (N_Dx^k, N_Dy^l) = (~start, ~overlap), as illustrated in Fig. 4(b) and (c). The 2D distance dist_2D(N_Q^{i,j}, N_D^{k,l}) can be obtained by summing up the corresponding 1D path distances according to Eq. (1):

    dist_2D(N_Q^{i,j}, N_D^{k,l}) = dist_1D(meet, ~start) + dist_1D(overlap, ~overlap) = 3 + 2 = 5.

Therefore, the degree of spatial similarity between the matching atomic pair in images Q and D can be calculated according to Eq. (2):

    SRS_2D(R_2D(N_Q^{i,j}), R_2D(N_D^{k,l})) = 1 - 5/12 = 7/12.

Figure 4: (a) Two common objects, o^1 and o^2, are present in query image Q and database image D; an atomic spatial relation can be derived for each image. The ING nodes corresponding to the x- and y-axis projected spatial relations of the matching atomic pair are shown in (b) N_Qx^i = meet and N_Dx^k = ~start, and (c) N_Qy^j = overlap and N_Dy^l = ~overlap.

3.3 Similarity Measure between Two Sets of Atomic Spatial Relations, and Two Images (ESRS)
Assume q and d objects are contained in the query Q and a database image D, respectively, i.e., object sets O_Q = {o_Q^s | s = 1, ..., q} and O_D = {o_D^t | t = 1, ..., d}. These q + d objects can be partitioned into three disjoint sets, namely O_Q ∩ O_D, O_Q − O_D, and O_D − O_Q. The intersection M = O_Q ∩ O_D is called the maximum common object set. Objects in the maximum common object set can be found in both images, while objects belonging to a difference set, O_Q − O_D or O_D − O_Q, appear only in image Q or D, respectively. Every two objects in M constitute an atomic spatial relation in images Q and D, respectively, and these two atomic spatial relations further form a matching atomic pair. On the other hand, due to the lack of correspondence between images Q and D for objects in the difference sets O_Q − O_D and O_D − O_Q, no matching atomic spatial relations can be established for them. We therefore focus next on deriving the similarity measure of spatial relations for objects belonging to the maximum common object set.

Assume the maximum common object set M contains m objects common to images Q and D, i.e., M = {o^1, ..., o^m}. Among them, a total of C_2^m = m(m − 1)/2 object pairs (o^a, o^b), 1 <= b < a <= m, can be formed. For each of the object pairs (o^a, o^b), a matching atomic pair (R_2D^{(o^a,o^b)}(N_Q^{i,j}), R_2D^{(o^a,o^b)}(N_D^{k,l})) containing spatial relations in images
Q and D can be identified. The corresponding one-to-one spatial relation similarity of the matching atomic pair can be derived by following Eq. (2):

    SRS_2D(R_2D^{(o^a,o^b)}(N_Q^{i,j}), R_2D^{(o^a,o^b)}(N_D^{k,l})) = 1 - dist_2D(N_Q^{i,j}, N_D^{k,l}) / 12.

The total amount of spatial similarity contributed by all matching atomic pairs in M can be obtained by summing up the similarity values of the individual matching pairs of atomic spatial relations:

    SRS_2D^M = Σ_{o^a, o^b ∈ M, 1 <= b < a <= m} SRS_2D(R_2D^{(o^a,o^b)}(N_Q^{i,j}), R_2D^{(o^a,o^b)}(N_D^{k,l})).    (3)

The SRS_2D^M above represents the degree of spatial similarity between the matching atomic spatial relations constituted by objects in the maximum common object set of images Q and D. The ensemble spatial relation similarity (ESRS), in turn, stands for the overall resemblance between images Q and D, taking all possible correspondences of spatial relations formed by the constituent objects into consideration. For a query Q with q objects, C_2^q atomic spatial relations can be identified; similarly, C_2^d atomic spatial relations for a database image D with d objects. Therefore, a total of C_2^q · C_2^d possible correspondences of spatial relations between images Q and D can be established. Among them, only the C_2^m matching atomic pairs of spatial relations, formed by the m objects in the maximum common object set M, successfully find their corresponding counterparts. The ESRS can thus be formulated by normalizing SRS_2D^M with the ratio of the number of successful matching atomic pairs to the total number of possible correspondences:

    ESRS_2D(Q, D) = SRS_2D^M · C_2^m / (C_2^q · C_2^d),    (4)

where |O_Q| = q, |O_D| = d, and |M| = |O_Q ∩ O_D| = m.

ESRS_2D(Q, D) reflects the overall degree of spatial resemblance between images Q and D. When the numbers of objects present in Q, D and M are all equal to 2, Eq. (4) degenerates into Eq. (2), i.e., the degree of similarity between two matching atomic spatial relations.

Let us assume there are four objects {o^1, o^2, o^3, o_Q^4} in image Q and three objects {o^1, o^2, o^3} in D, as shown in Fig. 5(a). The maximum common object set M contains three common objects, i.e., M = {o^1, o^2, o^3}. Fig. 5(b) illustrates the three matching atomic pairs constituted by the common objects in M. The 2D ING nodal distance for each matching atomic spatial relation can be derived as follows:

    dist_2D((meet, overlap)_Q, (~start, ~overlap)_D) = dist_1D(meet, ~start) + dist_1D(overlap, ~overlap) = 3 + 2 = 5,
    dist_2D((~finish, ~overlap)_Q, (~overlap, overlap)_D) = 4,
    dist_2D((~meet, after)_Q, (~overlap, before)_D) = 7.

According to Eq. (3), the total amount of spatial similarity contributed by the matching object pairs in M equals

    SRS_2D^M = (1 - 5/12) + (1 - 4/12) + (1 - 7/12) = 5/3.

Finally, the degree of ensemble spatial resemblance between images Q and D can be obtained by following Eq. (4):

    ESRS_2D(Q, D) = (5/3) · C_2^3 / (C_2^4 · C_2^3) = 5/18.

Figure 5: (a) Image Q is composed of four objects and D of three objects; the maximum common object set M contains three common objects, i.e., M = {o^1, o^2, o^3}. (b) The three matching atomic pairs constituted by the common objects in M. (c) One disparate object o_D^4 is added to the image D in (a).

In the above example, all objects in database image D are also present in query Q, i.e., O_Q ⊃ O_D and M = O_Q ∩ O_D = O_D. Next, we add a new object o_D^4 to the current image D, as shown in Fig. 5(c). Since o_D^4 and o_Q^4 are two disparate objects, the maximum common object set M remains unchanged after the inclusion of the new object, i.e., M = {o^1, o^2, o^3}, and so does the total amount of spatial similarity SRS_2D^M contributed by the matching object pairs in M. However, the difference set O_D − O_Q includes a new element o_D^4 after the insertion, and the number of atomic spatial relations lacking correspondence increases due to the placement of a disparate object. This decrease in spatial resemblance between images Q and D is reflected in the ESRS:

    ESRS_2D(Q, D) = (5/3) · C_2^3 / (C_2^4 · C_2^4) = 5/36.

From the above discussion, we next prove the monotonicity property of the proposed ensemble spatial relation similarity
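The worked examples of Eqs. (1)-(4) can be reproduced end to end. Exact fractions are used so the 7/12, 5/3, 5/18 and 5/36 values come out verbatim; the function names are ours, while the formulas and the distances 5, 4 and 7 are from Figs. 4 and 5.

```python
from fractions import Fraction
from math import comb

def srs_2d(dist_x, dist_y):
    """Eqs. (1)-(2): SRS_2D = 1 - (dist_x + dist_y) / 12."""
    return 1 - Fraction(dist_x + dist_y, 12)

def esrs(srs_m, q, d, m):
    """Eq. (4): ESRS_2D(Q, D) = SRS^M_2D * C(m,2) / (C(q,2) * C(d,2))."""
    return srs_m * Fraction(comb(m, 2), comb(q, 2) * comb(d, 2))

# Fig. 4: dist_1D(meet, ~start) = 3 and dist_1D(overlap, ~overlap) = 2.
print(srs_2d(3, 2))                      # 7/12

# Fig. 5: the three matching atomic pairs have 2D distances 5, 4 and 7,
# so SRS^M_2D = (1 - 5/12) + (1 - 4/12) + (1 - 7/12) = 5/3.
srs_m = sum(1 - Fraction(d, 12) for d in (5, 4, 7))
print(srs_m)                             # 5/3
print(esrs(srs_m, q=4, d=3, m=3))        # 5/18
print(esrs(srs_m, q=4, d=4, m=3))        # 5/36, after adding disparate object
```

The last two lines also illustrate the monotonicity property: adding a disparate object to D (d: 3 -> 4 with m fixed) strictly lowers the ESRS.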
(ESRS). For a given query image, database images with different numbers of disparate objects are differentiable in terms of the ESRS, even when they share the same maximum common object set. This desirable fine-granularity feature of the proposed similarity measure cannot be found in the aforementioned exact match, subimage and similarity retrieval search strategies [16-24].

Proposition. The ensemble spatial relation similarity (ESRS) between two images increases as the number of objects in the difference set O_Q − O_D or O_D − O_Q decreases, and vice versa.

Proof. Assume q, d and m objects are contained in the query Q, the database image D and the maximum common object set M, respectively. The original ensemble spatial relation similarity is formulated according to Eq. (4):

    ESRS_2D(Q, D) = SRS_2D^M · C_2^m / (C_2^q · C_2^d).

Without loss of generality, assume the number of objects in the difference set O_Q − O_D is decreased by one due to the removal of an object from the query Q, and denote the image after the removal as Q'. Since the maximum common object set M remains intact, the total amount of spatial similarity SRS_2D^M contributed by the matching object pairs in M is unchanged. The ensemble spatial relation similarity can be rewritten as:

    ESRS_2D(Q', D) = SRS_2D^M · C_2^m / (C_2^{q-1} · C_2^d)
                   = SRS_2D^M · [C_2^m / (C_2^q · C_2^d)] · q/(q − 2) > ESRS_2D(Q, D).

4. EXPERIMENTAL STUDY
Every picture contains two to five MBR-covered objects, each object generated with random dimensions and positioning. In order to effectively compare the retrieval results, rotation in 90-degree increments, mirroring, inclusion of new object(s) and removal of existing object(s) are applied to every base picture to produce six more variants. Therefore, for every image in the database, a total of seven pictures are closely correlated in terms of constituent objects and topology, as shown in Fig. 6.

Figure 6: For every database image, there are a total of seven pictures closely correlated in terms of constituent objects and spatial arrangement.

4.1 Single-Instance vs. Multiple-Instance
Figure 7: (a) and (b), two positive images P1 and P2. (c) ING node pairs N_P1^{i,j} = (N_P1x^i, N_P1y^j) = (after, during) and N_P2^{k,l} = (N_P2x^k, N_P2y^l) = (after, during) corresponding to the x- and y-axis projected spatial relations of the matching atomic pair between objects (o^1, o^2) in images (a) and (b); note that both atomic relations are identical, so the x- and y-axis ING nodes are the same, and so is the common spatial characteristic induced by multiple-instance learning. (d) ING node pairs N_P1^{i,j} = (N_P1x^i, N_P1y^j) = (during, ~overlap) and N_P2^{k,l} = (N_P2x^k, N_P2y^l) = (before, after) for the matching atomic pair between objects (o^1, o^3) in images (a) and (b); the spatial commonality induced is (overlap, ~meet). (e) ING node pairs N_P1^{i,j} = (N_P1x^i, N_P1y^j) = (before, ~during) and N_P2^{k,l} = (N_P2x^k, N_P2y^l) = (before, after) for the matching atomic pair between objects (o^2, o^3) in images (a) and (b); the spatial commonality induced is (before, ~overlap). (f) The unified query forged through the multiple-instance paradigm, i.e., the spatial commonality induced by employing the diverse density strategy with the two positive images (a) and (b).

Figure 8: (a) The result retrieved by using the image in Fig. 7(a) as a single-instance query, and (b) the multiple-instance retrieval result using the two positive images in Fig. 7(a) and (b). The corresponding ESRS value and image number in the database are also listed.

Figure 9: (a) One negative image n1. (b) The ING node pairs N_P1^{i,j} = (N_P1x^i, N_P1y^j) = (during, ~overlap), N_P2^{k,l} = (N_P2x^k, N_P2y^l) = (before, after), and N_n1^{u,v} = (N_n1x^u, N_n1y^v) = (during, ~during) corresponding to the matching atomic pair between objects (o^1, o^3) in the two positive images of Fig. 7(a) and (b) and the negative image in (a). Note that the commonality induced, after the inclusion of the negative image, changes from (overlap, ~meet) in Fig. 7(d) to (before, after).
4.2 RSS-ING Proposed vs. 2D Be-string
Figure 10: (a) The multiple-instance counterpart obtained by employing 2D Be-string with the two positive example images and one negative image. (b) The top 50 images retrieved and the corresponding similarity values derived for the RSS-ING and 2D Be-string schemes. The plot shows a better discriminability between different spatial relations for RSS-ING.

5. CONCLUSIONS
The proposed RSS-ING spatial feature representation and similarity measure evaluates the degree of similarity between matching atomic spatial relations present in the maximum common object set of the query and a database image, based on their nodal distance in an Interval Neighbor Group (ING). The spatial features identified are further analyzed through a multiple-instance learning stage to induce the commonality among the numerous features existing in multiple example images. The combination of a good spatial feature assessor and feature cluster locator generates queries meeting user expectation, produces consistent database retrieval results and excludes both false-positive and false-negative cases. Constraints regarding color, texture, size, orientation, location, etc., can be further imposed upon the extracted feature clusters and incorporated in the similarity comparison. The similarity measure proposed can also be combined with other global or local features to produce an even more effective image feature evaluation.

6. REFERENCES
[1] W. C. Lin, Y. C. Chang and H. H. Chen, "Integrating textual and visual information for cross-language image retrieval: A trans-media dictionary approach," Inf. Process. Manage., Vol. 43, No. 2, pp. 488-502, Mar. 2007.
[2] K. Barnard and M. Johnson, "Word sense disambiguation with pictures," Artif. Intell., Vol. 167, No. 1-2, pp. 13-30, 2005.
[3] W. R. Hersh, H. Muller, J. R. Jensen, J. Yang, P. N. Gorman and P. Ruch, "Advancing biomedical image retrieval: Development and analysis of a test collection," J. Am. Med. Inf. Assoc., Vol. 13, No. 5, pp. 488-496, 2006.
[4] J. Y. Chiang and S.-R. Cheng, "Multiple-instance content-based image retrieval employing isometric embedded similarity measure," Pattern Recogn., Vol. 42, No. 1, pp. 158-166, Jan. 2009.
[5] D. R. Dooly, S. A. Goldman and S. S. Kwek, "Real-valued multiple-instance learning with queries," J. Comput. Syst. Sci., Vol. 72, No. 1, pp. 1-15, 2006.
[6] Z. H. Zhou and M. L. Zhang, "Solving multi-instance problems with classifier ensemble based on constructive clustering," Knowl. Inf. Syst., Vol. 11, No. 2, pp. 155-170, 2007.
[7] X. Liu, S. Shekhar and S. Chawla, "Object-based directional query processing in spatial databases," IEEE Trans. Knowl. Data Eng., Vol. 15, No. 2, pp. 295-304, 2003.
[8] Ahmad and W. Grosky, "Indexing and retrieval of images by spatial constraints," J. Vis. Commun. Image Represent., Vol. 14, pp. 291-320, 2003.
[9] J. T. Lee and H. P. Chiu, "2D Z-string: A new spatial knowledge representation for image databases," Pattern Recogn. Lett., Vol. 24, No. 16, pp. 3015-3026, Dec. 2003.
[10] S. M. Hsieh and C. C. Hsu, "Graph-based representation for similarity retrieval of symbolic images," Data Knowl. Eng., Vol. 65, No. 3, pp. 401-418, 2008.
[11] P. Punitha and D. S. Guru, "An effective and efficient exact match retrieval scheme for symbolic image database systems based on spatial reasoning: A logarithmic search time approach," IEEE Trans. Knowl. Data Eng., Vol. 18, No. 10, pp. 1368-1381, 2006.
[12] P. Punitha and D. S. Guru, "An invariant scheme for exact match retrieval of symbolic images: Triangular spatial relationship based approach," Pattern Recogn. Lett., Vol. 26, pp. 893-907, 2005.
[13] Y. H. Wang, "Image indexing and similarity retrieval based on spatial relationship model," Inf. Sci., Vol. 154, No. 1-2, pp. 39-58, 2003.
[14] P. W. Huang, L. P. Hsu, Y. W. Su and P. L. Lin, "Spatial inference and similarity retrieval of an intelligent image database system based on object's spanning representation," J. Vis. Lang. Comput., accepted 2007.
[15] T. K. Shin, C. S. Wang, A. Y. Chang and C. H. Kao, "Indexing and retrieval scheme of the image database based on color and spatial relations," Proc. IEEE Int. Conf. Multimedia and Expo (ICME), 2000.