A Multi-step Strategy for Approximate Similarity Search in Image Databases

Paul W.H. Kwan
School of Mathematics, Statistics and Computer Science, University of New England, Armidale, NSW 2351, Australia
[email protected]

Junbin Gao
School of Information Technology, Charles Sturt University, Bathurst, NSW 2795, Australia
[email protected]

Abstract

Many strategies for similarity search in image databases assume a metric and quadratic form-based similarity model for which an optimal lower bounding distance function exists for filtering. These strategies are mainly two-step, with an initial "filter" step based on a spatial or metric access method followed by a "refine" step employing expensive computation. Recent research on robust matching methods for computer vision has discovered that similarity models behind human visual judgment are inherently non-metric. When applying such models to similarity search in image databases, one has to address the problem of non-metric distance functions that might not have an optimal lower bound for filtering. Here, we propose a novel three-step "prune-filter-refine" strategy for approximate similarity search on these models. First, the "prune" step adopts a spatial access method to roughly eliminate improbable matches via an adjustable distance threshold. Second, the "filter" step uses a quasi lower-bounding distance derived from the non-metric distance function of the similarity model. Third, the "refine" step compares the query with the remaining candidates by a robust matching method for final ranking. Experimental results confirmed that the proposed strategy filters out more candidates than a two-step approach while causing close to no false drops in the final result.

Keywords: Multi-step Strategy, Similarity Search, Image Databases, Non-metric Distance, Lower Bound

1 Introduction

Research on similarity search in image databases covers both data representation and query specification. Images are usually represented as numerical vectors indexed by a spatial access method (Gaede and Günther 1998). In terms of query specification, both range and nearest neighbour queries have been widely studied and analyzed (Weber, Schek and Blott 1998). In practice, a similarity query is often processed in two steps: an initial "filter" step employing a spatial index built on the distance function of the underlying similarity model, followed by a "refine" step performing exact but expensive distance calculations (Brinkhoff et al. 1994, Ankerst, Kriegel and Seidl 1998).

Copyright © 2006, Australian Computer Society, Inc. This paper appeared at the Seventeenth Australasian Database Conference (ADC2006), Hobart, Australia. Conferences in Research and Practice in Information Technology (CRPIT), Vol. 49. Gillian Dobbie and James Bailey, Eds. Reproduction for academic, not-for-profit purposes permitted provided this text is included.

However, most strategies assume a metric and quadratic form-based similarity model for which an optimal lower bounding distance function exists for filtering (Ciaccia and Patella 2002). Recent research on robust image matching methods for appearance-based vision has discovered that similarity models behind human visual judgment are often non-metric (Jacobs, Weinshall and Gdalyahu 2000). When applying such models, one has to address the problem of non-metric distance functions that might not have an optimal lower bound for filtering. Here, we propose a novel three-step "prune-filter-refine" strategy for approximate similarity search on non-metric similarity models. First, the "prune" step adopts a spatial index to roughly eliminate improbable matches via an adjustable distance threshold. Second, the "filter" step uses a quasi lower-bounding distance derived from the non-metric distance function. Third, the "refine" step compares the remaining candidates with the query via a robust matching method for final ranking. Experimental evaluation confirmed that the proposed strategy filters out more candidates than a two-step approach while causing close to no false drops in the final result. The rest of this paper is organized as follows. In Section 2, the proposed multi-step strategy is described. In Section 3, related work is discussed. In Section 4, an experimental evaluation performed on a database of traditional Japanese kamon images is presented. Lastly, Section 5 offers concluding remarks.

2 The Multi-step Strategy

The proposed strategy consists of three successive steps, collectively known as the “prune-filter-refine” (or PFR) strategy. To facilitate explanation, Figure 1 illustrates the processing mechanisms behind the proposed strategy in the context of a shape-based image retrieval system that the authors developed in related research (Kwan, Kameyama and Toraichi 2003). This database application will serve as a guiding example throughout this paper.

2.1 The “Prune” Step

Similar to many two-step query processing strategies in the literature, the prune step of the proposed strategy uses a multi-dimensional spatial index to quickly eliminate irrelevant database objects from further matching. Each database object is represented as a numerical vector whose values are derived from features of the image. Here, the set of numerical vectors is constructed by performing the Discrete Fourier Transform (DFT) on the horizontal and vertical autocorrelation plots generated from each image, treating these plots as time sequence data.


Figure 1: The Multi-step Strategy in relation to a Shape-based Image Retrieval System

The algorithm used in constructing these plots is similar to Nagashima, Tsubaki and Nakajima (2003). The idea is illustrated visually in Figure 2. Starting from a complete overlap of an image by a copy of itself, the horizontal autocorrelation is taken by shifting the image one pixel at a time from left to right while calculating for each shift the degree of autocorrelation, measured in terms of the number of non-background pixels that overlap. This is repeated until the image and its copy no longer overlap. Similar steps are taken when the vertical autocorrelation is measured by shifting from top to bottom instead. More formally, let x and y be variables that denote the autocorrelation lengths measured in the horizontal and the vertical directions respectively. Then, the horizontal and vertical autocorrelations can be expressed as:

AC_h(x) = \frac{1}{l_Y} \sum_{Y=1}^{l_Y} \frac{1}{l_X - x} \sum_{X=1}^{l_X - x} f(X, Y)\, f(X + x, Y)    (1)

AC_v(y) = \frac{1}{l_X} \sum_{X=1}^{l_X} \frac{1}{l_Y - y} \sum_{Y=1}^{l_Y - y} f(X, Y)\, f(X, Y + y)    (2)

Here, f(·,·) is a function that returns 0 for a background pixel and 1 for a non-background pixel. Based on these equations, the horizontal and vertical autocorrelation plots shown in Figures 2(c) and (d) can be generated.

Considering both horizontal and vertical autocorrelation plots as time sequences, the DFT generates for each a set of Fourier series coefficients as in Rafiei and Mendelzon (1997). Let a time sequence x = [x_t] for t = 0, 1, ..., n-1 be a finite duration signal. The DFT of x, denoted by X = [X_f], is given by:

X_f = \frac{1}{\sqrt{n}} \sum_{t=0}^{n-1} x_t \, e^{-j 2\pi t f / n}, \quad f = 0, 1, \ldots, n-1    (3)

Here, j = \sqrt{-1} is the imaginary unit. The inverse DFT of X returns the original time sequence by the following equation:

x_t = \frac{1}{\sqrt{n}} \sum_{f=0}^{n-1} X_f \, e^{j 2\pi t f / n}, \quad t = 0, 1, \ldots, n-1    (4)

For each database image, the two sets of coefficients are concatenated to produce a numerical vector. As illustrated in Table 1, for the experiments the first 15 Fourier series coefficients from each plot are used.

Horizontal Autocorrelation: 163.334, 28.528, -3.659, 4.017, 6.754, 5.332, 2.356, -1.210, -0.703, 0.857, 2.701, 2.598, 0.811, -0.557, -0.562
Vertical Autocorrelation: 212.053, 49.417, 21.058, 27.413, 33.419, 29.126, 25.692, 21.002, 19.098, 21.020, 22.260, 21.999, 19.137, 15.825, 14.272

Table 1: The first 15 Fourier series coefficients of the autocorrelation plots shown in Figures 2(c) and (d)
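As an illustration of this feature construction, the following is a minimal sketch that computes the two autocorrelation plots of Eqs. (1)-(2) and the leading DFT coefficients of Eq. (3) for a binary image. It is not the authors' implementation: the NumPy-based helpers (autocorrelation, image_to_feature_vector) and the choice of keeping the real parts of the first 15 coefficients are assumptions made for illustration.

import numpy as np

NUM_COEFF = 15  # first 15 Fourier series coefficients per plot, as in Table 1

def autocorrelation(img, horizontal=True):
    """Shift a binary image (non-background pixels = 1) against itself along one
    axis and average the pixel-wise overlap, as in Eqs. (1) and (2)."""
    length = img.shape[1] if horizontal else img.shape[0]
    values = []
    for shift in range(length):
        if horizontal:
            a, b = img[:, :length - shift], img[:, shift:]
        else:
            a, b = img[:length - shift, :], img[shift:, :]
        values.append((a * b).mean())  # normalized count of overlapping non-background pixels
    return np.array(values)

def image_to_feature_vector(img):
    """Concatenate the leading DFT coefficients of the horizontal and vertical
    autocorrelation plots into one numerical vector."""
    feats = []
    for horizontal in (True, False):
        plot = autocorrelation(img, horizontal)
        coeffs = np.fft.fft(plot) / np.sqrt(len(plot))  # DFT normalized as in Eq. (3)
        feats.extend(coeffs[:NUM_COEFF].real)           # real parts kept (assumption)
    return np.array(feats)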


Figure 2: (a) and (b) illustrate how the horizontal and vertical autocorrelations of an image are taken; (c) and (d) are the generated horizontal and vertical autocorrelation plots respectively

To index the set of numerical vectors, a k-d-B tree based spatial index is chosen (Robinson 1981). The k-d-B tree combines the multi-dimensional search efficiency of k-d trees (Bentley and Friedman 1979) and the I/O efficiency of B-trees (Comer 1979) to handle multi-dimensional points. When applying the k-d-B tree in this work, the numerical vectors are normalized before being embedded in a d-dimensional Euclidean space E^d. Let U be the universe of all such vectors. For any O_i, O_j ∈ U, their dissimilarity is defined by a distance metric D(O_i, O_j) ∈ R^+ in E^d as:

D(O_i, O_j) = \sqrt{(O_i^1 - O_j^1)^2 + \cdots + (O_i^d - O_j^d)^2}    (5)

Here, O_i^k and O_j^k denote the attribute values of O_i and O_j in the k-th dimension respectively. Further, the condition below holds for the distance function:

0 < D(O_i, O_j) \le \sqrt{d}, \quad \forall\, O_i, O_j \in U,\; i \ne j    (6)

Lastly, as seen from Figure 1, a pointer (or ID) is appended as the last element in each numerical vector to facilitate retrieving shape attributes of the corresponding image that will be used in the filter and refine steps.
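To make the prune-step indexing concrete, the following is a minimal sketch under stated assumptions: the feature vectors are min-max normalized per dimension (the paper only says "normalized", so this is a guess), the image ID is kept alongside each vector rather than appended to it, and SciPy's in-memory k-d tree stands in for the disk-based k-d-B tree of Robinson (1981). Names such as build_prune_index are illustrative.

import numpy as np
from scipy.spatial import cKDTree

def build_prune_index(feature_vectors, image_ids):
    """Normalize every attribute to [0, 1] so that the Euclidean distance of
    Eq. (5) satisfies the bound of Eq. (6), then index the vectors."""
    X = np.asarray(feature_vectors, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    X_norm = (X - lo) / np.where(hi > lo, hi - lo, 1.0)  # per-dimension min-max scaling (assumed)
    return cKDTree(X_norm), np.asarray(image_ids), (lo, hi)

def normalize_query(query_vector, bounds):
    """Apply the same scaling to a query vector before searching the index."""
    lo, hi = bounds
    return (np.asarray(query_vector, dtype=float) - lo) / np.where(hi > lo, hi - lo, 1.0)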

2.2 The “Filter and Refine” Steps

The “filter and refine” steps of the proposed strategy are an application of Kwan et al. (2003). There, an approximate query processing approach was introduced for addressing the performance problem brought about by sequential matching in related research (Kwan, Kameyama and Toraichi 2003). Kwan, Kameyama and Toraichi (2003) applied a robust image matching algorithm based on probabilistic relaxation labeling when comparing the query with every database image, and ranked the images by a heuristic distance function. The features used were function-approximated contour segments derived from the closed contours of shapes extracted from the images. Figure 3 gives an example.

Based on the k-d-B tree spatial index, two types of similarity queries, namely nearest neighbour and range queries, are possible. However, in the context of the proposed strategy, only range queries are relevant because the objective of the “prune” step is to reduce the search space quickly to a much smaller candidate set for further processing by the “filter and refine” steps. Given O_q ∈ U as the query and specifying r ∈ R^+ as the range, such that 0 ≤ r ≤ \sqrt{d} holds, the candidate set (denoted C) that meets the condition below is returned:

C = \{\, O_p \mid O_p \in U \text{ and } D(O_p, O_q) \le r \,\}    (7)
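A minimal sketch of the prune step as a range query follows, reusing the illustrative helpers from the sketch above; query_ball_point is SciPy's k-d tree range search and stands in for the range query of the k-d-B tree, and the radius r plays the role of the adjustable distance threshold mentioned earlier.

def prune(tree, image_ids, bounds, query_vector, r):
    """Return the candidate set C of Eq. (7): every database object whose
    Euclidean distance to the normalized query vector is at most r."""
    q = normalize_query(query_vector, bounds)   # same scaling as the index (helper above)
    rows = tree.query_ball_point(q, r)          # indices of all points within radius r
    return [image_ids[i] for i in rows]

# Illustrative usage:
# tree, ids, bounds = build_prune_index(all_feature_vectors, all_image_ids)
# candidates = prune(tree, ids, bounds, image_to_feature_vector(query_image), r=0.8)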

Figure 3: Function-approximated image (on the right) showing the joint points between contour segments used by Kwan, Kameyama and Toraichi (2003)

Because the heuristic distance function is defined using probabilistic variables whose final values are not known until after the matching has converged, the distances from the query to all other objects in the space cannot be accurately determined even when the query is entered. Further, the space of all objects is non-metric, in the sense of not obeying the triangle inequality on distances, making it difficult to designate one database object as the vantage point for a possible index (Bozkaya and Ozsoyoglu 1997). The approximate query processing approach of Kwan et al. (2003), and thereby the “filter and refine” steps of the proposed strategy, enable a close approximation of the retrieval result while simultaneously reducing the amount of computation required. At its centre is the concept of a lower bounding distance function for excluding improbable database objects from expensive query distance computations. Provided that a provable lower bounding distance function exists and is employed, no relevant objects will be discarded by the filtering step. Because some objects that remain after filtering might not belong to the final result, a further refining step based on computing the actual query distances is required to eliminate any false alarms (Ciaccia and Patella 2002). Whereas related work used provable lower bounds due to the metric nature of their similarity models, in research like ours where non-metric distance functions are used, a provable lower bound might not be easily found. As one approach for addressing this problem, a quasi lower bounding distance function is introduced in our work. This function (of which there could be many) is defined using both the non-metric distance function and a confidence factor. In practice, it is computed right after the initial state of a robust matching algorithm is set, but before the process of advancing towards a final state has commenced. In our formulation, the following notations are defined:

D_initp ≡ the distance calculated right after the initial state is located;
D_query ≡ the actual query distance;
D_quasi ≡ the quasi lower bounding distance;
c_factor ≡ the confidence factor.

The following condition should hold during the entire retrieval process:

D_query ≤ D_initp    (8)

The role of the confidence factor is to facilitate adjusting D_initp for the dual purpose of minimizing the chance of false drops while avoiding excessive false alarms. It is applied in the following equation:

D_quasi = c_factor × D_initp,  0 < c_factor ≤ 1    (9)

Here, a heuristic procedure is introduced to determine the value of the confidence factor dynamically by treating it as a discrete random variable that takes the value of the cumulative sum of the ratio D_query / D_initp averaged by the number of times that D_query has been computed. In other words, the confidence factor represents a running average of the ratio of D_query to D_initp. Taken over an infinite time interval, it approximates the expected mean E[c_factor].

Changes to the query processing strategy of Kwan et al. (2003) (and likewise for other robust matching methods, such as those based on deformable template matching, hierarchical neural networks, etc.) were made in two places. The first change was made where the initial state of the relaxation labeling algorithm is set, and the second concerned the filtering step enabled by D_quasi, calculated as the product of D_initp and c_factor. An important assumption is that the amount of computation spent in setting the initial system state is much less than that of computing the actual query distances. In the context of Kwan et al. (2003), the changes made are summarized as follows:

1. Rough matching between the query and all database images is performed all at once initially. For each match, an initial D_initp is calculated. This differs from Kwan, Kameyama and Toraichi (2003), in which rough matching between every (query, database image) pair is immediately followed by the actual distance calculation.

2. In the filtering step, D_quasi is computed by taking the product of D_initp and the running value of c_factor. A database image is a candidate for refining whenever fewer than k nearest neighbours have been found so far or when the following criterion is satisfied:

D_quasi(db[i]) ≤ D_query(current kth-NN)    (10)

Here, db[i] refers to the current database image being compared, and kth-NN to the kth element in the nearest neighbour list accumulated so far. Only those images that satisfy (10) at their respective turn of comparison will have their actual D_query calculated. The D_query computed is used in updating the k-NN list. Further, the value of D_query / D_initp is used in updating the running value of c_factor to be used in matching the next database image. Pseudo code for the “filter and refine” steps is given below:

[BEGIN]
expected_c_factor = 0.0; cumulative_c_factor = 0.0; count = 0;

for (i : [1, NUMBER_DB_IMAGES]) {
    D_initp[i] = rough_matching(Query, DB_Image[i]);
}

for (i : [1, NUMBER_DB_IMAGES]) {
    if (Less than k images in NN-List) {
        D_query = fine_matching(Query, DB_Image[i]);
        Update the NN-List;
        count = count + 1;
        cumulative_c_factor += (D_query / D_initp[i]);
        expected_c_factor = cumulative_c_factor / count;
    } else {
        D_quasi = expected_c_factor * D_initp[i];
        if (D_quasi <= D_query of current kth-NN in NN-List) {
            D_query = fine_matching(Query, DB_Image[i]);
            Update the NN-List;
            count = count + 1;
            cumulative_c_factor += (D_query / D_initp[i]);
            expected_c_factor = cumulative_c_factor / count;
        }
    }
}
[END]
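For completeness, the following is a compact Python rendering of the same filter-and-refine loop under the assumptions stated above. Here rough_matching and fine_matching stand for the initial-state and fully converged relaxation labeling distances of Kwan, Kameyama and Toraichi (2003) and are assumed to be supplied externally, so the data structures are illustrative rather than the authors' implementation; db_images would be the candidate set returned by the prune step.

import heapq

def filter_and_refine(query, db_images, k, rough_matching, fine_matching):
    """Approximate k-NN search using the quasi lower bound D_quasi =
    c_factor * D_initp (Eq. 9) and the filter criterion of Eq. (10)."""
    # Rough matching sets the initial state for every candidate first,
    # yielding D_initp for each (query, database image) pair.
    d_initp = [rough_matching(query, img) for img in db_images]

    knn = []                      # max-heap of (-D_query, index); root holds the current kth-NN
    cumulative, count = 0.0, 0    # running statistics for the confidence factor

    for i, (img, d_init) in enumerate(zip(db_images, d_initp)):
        kth_dist = -knn[0][0] if len(knn) == k else float('inf')
        c_factor = (cumulative / count) if count else 1.0   # running average of D_query / D_initp
        d_quasi = c_factor * d_init                         # Eq. (9)

        if len(knn) < k or d_quasi <= kth_dist:             # filter criterion, Eq. (10)
            d_query = fine_matching(query, img)             # expensive actual distance
            count += 1
            cumulative += d_query / d_init                  # update the confidence factor
            if len(knn) < k:
                heapq.heappush(knn, (-d_query, i))
            elif d_query < kth_dist:
                heapq.heapreplace(knn, (-d_query, i))

    # Final ranking of the k nearest neighbours by their actual query distance.
    return sorted(((-neg_d, db_images[i]) for neg_d, i in knn), key=lambda t: t[0])

The first k images are always refined, so the starting value of the confidence factor is immaterial; thereafter the running average keeps D_quasi close to, and ideally below, the true D_query, which is exactly why the bound is only "quasi" and a small number of false drops remains possible.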