Similarity Matching

Simone Santini and Ramesh Jain
Visual Computing Lab, University of California, San Diego

Abstract

Image databases will force us to rethink many of the concepts that have guided us so far. One of these is matching. We argue that the fundamental operation in a content-indexed image database should not be matching the query against the images in the database in search of a "target" image that best matches the query. The basic operation in query-by-content will be ranking portions of the database with respect to similarity with the query. What kind of similarity measure should be used is a problem we begin exploring in this paper. We let psychological experiments guide us in the quest for a good similarity measure, and devise a measure derived from a set-theoretic measure proposed in the psychological literature, modified by the introduction of fuzzy logic.

1 Introduction

What makes a multimedia database different from the databases we have been using until now? Many things, one might say: there are different ways to express a query (a sketch, an image...), a difference of several orders of magnitude in the amount of data we might expect to deal with, and so on. All these differences will, and do, play an important role in designing an image database, but there is also a fundamental qualitative difference we must consider: image databases must be designed to handle a different type of search than conventional databases. While a search in a traditional database always results in a clear distinction between the elements that match the query and the elements that don't, this is no longer true when we search images by content. Although it is certainly possible to match one image against another, the most natural approach to image databases is based on similarity. When we make a search in a traditional database, we always have a target in mind: we look for those records that satisfy a certain query, and only those records. When searching an image database, most of the time, we don't have any particular target in mind: we might ask for a photograph with a certain color tone, or with certain elements placed in a given way. The basic operation in an image database is ordering a portion of the database with respect to the similarity with the query. Just as matching is the single most important operation in traditional databases (since it decides, ultimately, which data satisfy the query), similarity measurement is the single most important operation in image databases.

Since the results of the query must be judged by a human user, it seems natural to let the human concept of similarity drive us in the definition of a similarity measure. Psychologists have been experimenting for some 70 years trying to define the properties of human similarity judgment. Many of the resulting models succeed in explaining some experimental findings qualitatively, but few of them are in a mathematical form suitable for automatic computation. In section 2, we briefly review some of the most debated models of human similarity. The emphasis of this section will be on whether human similarity can or cannot be modeled as a distance in a metric space. In section 3, we analyze more closely one of the most successful psychological models and, in section 4, we discuss its fuzzification, which turns out to be necessary if the similarity measure is to be used in anything but the simplest laboratory experiments. Finally, in section 5, we present some of the results we obtained using these measures when applied to determining the similarity between human faces.

2 Metric or not Metric?

In the past decades, a number of psychologists developed experiments to measure human similarity perception, and devised models to explain the results they obtained. These models differ in several features, but one of the most characteristic differences (and one of the most debated points in the psychological community) is whether human similarity measurement follows the metric axioms and, if not all of them, which axioms are satisfied and which are not. Suppose $S_A$ and $S_B$ are two stimuli, represented as vectors in some space of suitable dimension, and let the similarity between the two be measured via a psychological distance function $d(S_A, S_B)$. In general, the assumption is made that the perceived similarity $d$ is different from the judged similarity $\delta$, and that the two are related by a monotonically nondecreasing function $g$:

$$ \delta(S_A, S_B) = g[d(S_A, S_B)] \qquad (1) $$

Only the properties of $\delta$ can be determined experimentally. If $d$ is a metric function, it has the following characteristics:

Constancy of self-similarity: for all stimuli $S$, $d(S, S) = 0$. This is an experimentally testable property, since it implies $\delta(S_A, S_A) = \delta(S_B, S_B)$ for all stimuli $S_A$, $S_B$.

Minimality: for all stimuli $S_A$ and $S_B$, $d(S_A, S_A) \le d(S_A, S_B)$. This is also an experimentally testable property since, due to the monotonicity of $g$, it implies $\delta(S_A, S_A) \le \delta(S_A, S_B)$.

Symmetry: for all stimuli $S_A$ and $S_B$, $d(S_A, S_B) = d(S_B, S_A)$. This property too can be tested experimentally.

Triangle inequality: for all $S_A$, $S_B$, $S_C$, $d(S_A, S_C) \le d(S_A, S_B) + d(S_B, S_C)$. This property cannot be tested experimentally: even if $d$ satisfies the triangle inequality, $\delta$ might not, or vice versa.

There is strong experimental evidence that self-similarity is not constant [5], that is, some things are more "similar to themselves" than others. Also, a number of researchers have shown experimentally that human similarity assessment is asymmetric [9, 1]. Minimality seems to hold, although in [9] it is argued that this might not always be the case. The triangle inequality is the most debated and troublesome of the metric axioms since, according to the model, whether the function $d$ satisfies or violates it is not accessible to experimentation. It is common wisdom, however, that the triangle inequality does not hold for human similarity perception [10, 1]. Tversky and Gati [10] proposed a substitute for the triangle inequality that they call the corner inequality.
If $x_1y_1$, $x_2y_2$, and $x_3y_3$ are three points in a two-dimensional space (where $x_iy_j$ denotes the point with coordinates $(x_i, y_j)$), with $x_1 < x_2 < x_3$ and $y_1 < y_2 < y_3$, the corner inequality holds if

$$ d(x_1y_1, x_3y_1) > d(x_1y_1, x_2y_2) \quad \text{and} \quad d(x_3y_1, x_3y_3) > d(x_2y_2, x_3y_3) \qquad (2) $$

or

$$ d(x_1y_1, x_3y_1) > d(x_2y_2, x_3y_3) \quad \text{and} \quad d(x_3y_1, x_3y_3) > d(x_1y_1, x_2y_2) \qquad (3) $$
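As a small illustration (not part of the original study), conditions (2) and (3) can be checked for one triple of points and any candidate distance function written as an ordinary Python function. The names `corner_inequality_holds` and `euclidean` are our own. This sketch tests only one triple, whereas the corner inequality is a statement about all admissible triples.

```python
import math
from typing import Callable, Tuple

Point = Tuple[float, float]
Distance = Callable[[Point, Point], float]


def corner_inequality_holds(d: Distance, p1: Point, p2: Point, p3: Point) -> bool:
    """Check condition (2) or (3) for the points p_i = (x_i, y_i).

    Requires x1 < x2 < x3 and y1 < y2 < y3; the "corner" point is (x3, y1).
    """
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    if not (x1 < x2 < x3 and y1 < y2 < y3):
        raise ValueError("coordinates must be strictly increasing")
    corner = (x3, y1)
    # Condition (2): each leg of the corner path is longer than the diagonal
    # step that shares its starting point (first leg) or end point (second leg).
    cond2 = d(p1, corner) > d(p1, p2) and d(corner, p3) > d(p2, p3)
    # Condition (3): the same comparison with the two diagonal steps swapped.
    cond3 = d(p1, corner) > d(p2, p3) and d(corner, p3) > d(p1, p2)
    return cond2 or cond3


def euclidean(p: Point, q: Point) -> float:
    """Ordinary Euclidean distance in the plane."""
    return math.dist(p, q)


print(corner_inequality_holds(euclidean, (0.0, 0.0), (1.0, 1.0), (2.0, 2.0)))  # True
```

Here both legs of the corner path have length 2, while each diagonal step has length $\sqrt{2}$, so condition (2) is satisfied.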

It can be proved that, if the function $g$ relating $d$ and $\delta$ in (1) is monotonic, then the corner inequality holds for $\delta$ if and only if it holds for $d$. That is, unlike the triangle inequality, compliance with the corner inequality can be verified experimentally and, indeed, it has been found not to hold for human observers. To sum up, a model of human similarity should ideally present the following characteristics:

• Non-constant self-similarity
• Asymmetry
• Violation of the corner inequality
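To make the axioms concrete, the following sketch tests a finite dissimilarity matrix `D` against constancy of self-similarity, minimality, symmetry and the triangle inequality. It is our own illustration: `D[i][j]` stands for $d(S_i, S_j)$, and the function name and the tolerance `tol` are hypothetical choices. If `D` holds judged values $\delta$ rather than $d$, only the first three checks say anything about the underlying model, as discussed above.

```python
from typing import Dict, Sequence


def check_metric_axioms(D: Sequence[Sequence[float]], tol: float = 1e-9) -> Dict[str, bool]:
    """Test a square dissimilarity matrix D (D[i][j] ~ d(S_i, S_j)) against the axioms of Sec. 2."""
    idx = range(len(D))
    return {
        # Every stimulus has the same dissimilarity from itself.
        "constant_self_similarity": all(abs(D[i][i] - D[0][0]) <= tol for i in idx),
        # No stimulus is closer to another stimulus than to itself.
        "minimality": all(D[i][i] <= D[i][j] + tol for i in idx for j in idx),
        # d(S_i, S_j) == d(S_j, S_i).
        "symmetry": all(abs(D[i][j] - D[j][i]) <= tol for i in idx for j in idx),
        # d(S_i, S_k) <= d(S_i, S_j) + d(S_j, S_k).
        "triangle_inequality": all(
            D[i][k] <= D[i][j] + D[j][k] + tol
            for i in idx for j in idx for k in idx
        ),
    }


# Distances between the points 0, 1 and 2 on a line: a genuine metric.
line = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 1.0, 0.0]]
# A hypothetical matrix of judged dissimilarities that is not symmetric.
judged = [[0.0, 1.0], [2.0, 0.5]]
print(check_metric_axioms(line))
print(check_metric_axioms(judged))
```

The first matrix passes every check. The second violates symmetry and constancy of self-similarity but still satisfies minimality.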

2.1 Some Geometric Models

In spite of evidence that constancy of self-similarity and symmetry do not hold in general, metric models of similarity are quite popular, mainly because of their simplicity and the powerful mathematical apparatus available to study them. Also, it must be considered that a property shown not to hold in an experiment with given stimuli can hold for a different type of stimuli. For instance, similarity of certain global properties, like overall color, seems to be metric. Throughout this section, we consider the problem of determining the perceptual similarity, or the perceptual distance, between two objects $S_A$ and $S_B$, described by the sets of features $\{a_1, \ldots, a_n\}$ and $\{b_1, \ldots, b_n\}$, respectively. The feature vectors are two elements of $\mathbb{R}^n$.

2.1.1 Euclidean Distance

The simplest way to compute the perceptual distance between the stimuli is by the Euclidean distance between the two vectors:

$$ d_e(S_A, S_B) = \left[ \sum_{i=1}^{n} (a_i - b_i)^2 \right]^{1/2} \qquad (4) $$

Although there is quite conclusive evidence that perceptual distance in humans is not Euclidean, this distance is still used as a reference against which other models are compared.

2.1.2 City Block Distance

In a series of experiments with rectangles changing in shape and size, Attneave [2] found good agreement between the experimental data and the city block model:

$$ d_c(S_A, S_B) = \sum_{i=1}^{n} |a_i - b_i| \qquad (5) $$
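Both distances operate directly on the feature vectors $\{a_1, \ldots, a_n\}$ and $\{b_1, \ldots, b_n\}$. The following minimal sketch implements eqs. (4) and (5) literally. The function names are our own.

```python
import math
from typing import Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Eq. (4): square root of the summed squared feature differences."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def city_block_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Eq. (5): summed absolute feature differences (Attneave's model)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum(abs(ai - bi) for ai, bi in zip(a, b))


print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))   # 5.0
print(city_block_distance([0.0, 0.0], [3.0, 4.0]))  # 7.0
```

For any pair of vectors, the city block distance is never smaller than the Euclidean one.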

2.1.3 Thurstone-Shepard Models

In these models [8, 6, 7, 4, 3], it is postulated that similarity is based on a momentary distance, which is assumed to be a Minkowski distance:

$$ d(S_A, S_B) = \left[ \sum_{k=1}^{n} |a_k - b_k|^{\gamma} \right]^{1/\gamma}, \quad \gamma \ge 1 \qquad (6) $$

The similarity between two stimuli is a function $g$ of the distance, usually assumed to be of the form

$$ g(d) = \exp(-d^{\alpha}) \qquad (7) $$

where $\alpha$ is a positive parameter.
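Eqs. (6) and (7) can be sketched as follows. The Euclidean and city block distances are the special cases $\gamma = 2$ and $\gamma = 1$. The sketch deliberately accepts any $\gamma > 0$, because Fig. 3 also reports a Shepard setting with $\gamma = 0.3$. For $\gamma < 1$, eq. (6) is no longer a metric, since the triangle inequality can fail. The function names are our own.

```python
import math
from typing import Sequence


def minkowski_distance(a: Sequence[float], b: Sequence[float], gamma: float) -> float:
    """Eq. (6): Minkowski distance with exponent gamma (gamma=2 gives eq. (4), gamma=1 eq. (5))."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum(abs(ai - bi) ** gamma for ai, bi in zip(a, b)) ** (1.0 / gamma)


def shepard_similarity(a: Sequence[float], b: Sequence[float], gamma: float, alpha: float) -> float:
    """Eq. (7): similarity exp(-d**alpha) of the Minkowski distance d."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    # ** binds tighter than unary minus, so this is exp(-(d ** alpha)).
    return math.exp(-minkowski_distance(a, b, gamma) ** alpha)


print(minkowski_distance([0.0, 0.0], [3.0, 4.0], 2.0))  # 5.0
print(shepard_similarity([0.0], [1.0], gamma=2.0, alpha=1.0))
```

Since $g$ is decreasing, the similarity is 1 for identical stimuli and falls toward 0 as the distance grows.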

3 The Feature Contrast Model

The failure of metric models to satisfy the requirements outlined in sec. 2 has led to the exploration of alternative ways of assessing similarity. One of the most influential papers in the field, published by Tversky [9], uses a set-theoretic model. Let $a$, $b$ be two stimuli, described by the presence or absence of features drawn from a common feature set. Let $A$ and $B$ be the sets of features present in the stimuli $a$ and $b$, respectively. Also, let $S(a, b)$ be a measure of the similarity between $a$ and $b$. Tversky considers similarity functions of the form

$$ S(a, b) = F(A \cap B, A - B, B - A) \qquad (8) $$

and then proves that, under reasonable hypotheses, the similarity $S$ can be rewritten as

$$ S(a, b) = \theta f(A \cap B) - \alpha f(A - B) - \beta f(B - A) \qquad (9) $$

without changing the ordering induced by $S$. Here $f$ is a salience function defined on sets of features, and $\theta, \alpha, \beta \ge 0$ are weights.

This representation is called the feature contrast model. The feature contrast model has proved very successful in explaining the outcome of experiments. In particular, it explains asymmetry. When $f$ is additive (that is, $f(X \cup Y) = f(X) + f(Y)$ for disjoint $X$ and $Y$), eq. (9) gives $S(a, b) - S(b, a) = (\alpha - \beta)[f(B) - f(A)]$. Therefore, when $\alpha < \beta$,

$$ S(a, b) > S(b, a) \quad \text{whenever} \quad f(A) > f(B). \qquad (10) $$

The model also accounts for variation of self-similarity and violation of the corner inequality.
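The crisp model (9) can be sketched with ordinary Python sets. This sketch assumes that $f$ is the set cardinality. Tversky's $f$ can be any salience measure; cardinality is simply an additive choice. The feature names below are invented for illustration.

```python
from typing import Callable, Hashable, Set

Salience = Callable[[Set[Hashable]], float]


def contrast_similarity(A: Set[Hashable], B: Set[Hashable], theta: float = 1.0,
                        alpha: float = 0.5, beta: float = 0.5,
                        f: Salience = len) -> float:
    """Eq. (9): theta*f(A & B) - alpha*f(A - B) - beta*f(B - A); f defaults to cardinality."""
    return theta * f(A & B) - alpha * f(A - B) - beta * f(B - A)


# Hypothetical binary features of two stimuli; stimulus a has more features.
A = {"eyes", "nose", "mouth", "beard"}
B = {"eyes", "nose"}
print(contrast_similarity(A, B, alpha=0.2, beta=0.8))
print(contrast_similarity(B, A, alpha=0.2, beta=0.8))
```

With $\alpha = 0.2 < \beta = 0.8$, the two calls give about 1.6 and 0.4, as predicted by (10). The self-similarities $S(a, a) = 4$ and $S(b, b) = 2$ also differ, so self-similarity is not constant.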

4 The Fuzzy Feature Contrast Model

The feature contrast model has been quite successful in explaining the characteristics of human similarity assessment in laboratory experiments, where the subjects were presented with simple stimuli. It encounters problems, however, when we try to apply it to the typical situations we have in computer vision. The main problem of the approach is that it requires the stimuli to be defined by a set of binary features. This can easily be accommodated in the controlled world of a psychology laboratory, but analysis of real-world images is more likely based on a series of continuous measurements, grouped in a feature vector. For instance, in a faces database, we may have feature vectors made of geometric measurements of a face, such as the width of the mouth or the length of the nose, while the feature contrast model would require binary features such as the presence of a wide mouth or of a long nose. In this section, we modify the feature contrast model into the fuzzy feature contrast model.

Consider again the problem of measuring the similarity between faces. A face is characterized by a number of features of different types but, for the following discussion, we will only consider geometrical features, since these lead naturally to predicate features. It seems intuitive that face similarity is influenced by things like the size of the mouth, the shape of the chin, and so on. Also, two faces with big mouths will be, all other things being equal, more similar than two faces of which one has a big mouth and the other a small one. A predicate like "the mouth of this person is wide" can be modeled as a fuzzy predicate whose truth, to a first approximation, is based only on the measurement of the width of the mouth. Once we have measured the width of the mouth, we can apply two truth functions like those in Fig. 1 to determine the truth of the predicates "the mouth is wide" and "the mouth is narrow".

Figure 1: Typical membership functions for "narrowness" and "wideness" (µ(narrow) and µ(wide) plotted against the normalized width of the feature, from 0 to 1).

In general, we have an image $I$ and a number of measurements $\phi_i$ on the image. We want to use these measurements to determine the truth of $p$ fuzzy predicates.
From the measurements $\phi_i$ we derive the truth values of a number $p$ of fuzzy predicates, and collect them into a vector:

$$ \mu(\phi) = \{\mu_1(\phi), \ldots, \mu_p(\phi)\} \qquad (11) $$

We call $\mu(\phi)$ the (fuzzy) set of true predicates on the measurements $\phi$. We use this fuzzy set as a basis to apply Tversky's theory. In order to apply the feature contrast model to the fuzzy sets $\mu(\phi)$ and $\mu(\psi)$ of the predicates true for the measurements $\phi$ and $\psi$, we need to compute the fuzzy sets $\mu(\phi) \cap \mu(\psi)$ and $\mu(\phi) - \mu(\psi)$ (and, by the same definition, $\mu(\psi) - \mu(\phi)$), and to choose a suitable salience function $f$. For the salience function $f$ of the Tversky measure (9), we take the cardinality of the fuzzy set $\mu = \{\mu_1, \ldots, \mu_p\}$:

$$ f(\mu) = \sum_{i=1}^{p} \mu_i \qquad (12) $$
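The exact shapes in Fig. 1 are not given in the text. As an assumption, this sketch uses piecewise-linear ramps on the normalized width: "wide" rises from 0 to 1 between the hypothetical thresholds `lo = 0.3` and `hi = 0.7`, and "narrow" is taken as its complement. It then builds a two-predicate vector in the spirit of eq. (11) and computes the fuzzy cardinality of eq. (12).

```python
from typing import List, Sequence


def mu_wide(x: float, lo: float = 0.3, hi: float = 0.7) -> float:
    """Hypothetical 'wide' membership: 0 below lo, 1 above hi, linear in between."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def mu_narrow(x: float, lo: float = 0.3, hi: float = 0.7) -> float:
    """Hypothetical 'narrow' membership, assumed here to be the complement of 'wide'."""
    return 1.0 - mu_wide(x, lo, hi)


def predicate_vector(width: float) -> List[float]:
    """Eq. (11) for a single normalized measurement: truth of 'wide' and 'narrow'."""
    return [mu_wide(width), mu_narrow(width)]


def fuzzy_cardinality(mu: Sequence[float]) -> float:
    """Eq. (12): salience f(mu) as the sum of the truth values."""
    return sum(mu)


print(predicate_vector(0.6))
print(fuzzy_cardinality([0.25, 0.5, 1.0]))  # 1.75
```

Because the two ramps are complementary here, the cardinality of `predicate_vector(x)` is always 1. With independently shaped membership functions, as Fig. 1 may well show, that would no longer be true.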

The intersection of the sets $\mu(\phi)$ and $\mu(\psi)$ is defined in the traditional way:

$$ \mu(\phi) \cap \mu(\psi) = \{\min\{\mu_1(\phi), \mu_1(\psi)\}, \ldots, \min\{\mu_p(\phi), \mu_p(\psi)\}\} \qquad (13) $$

For the difference of two fuzzy sets, we use the definition

$$ \mu(\phi) - \mu(\psi) = \{\max\{\mu_1(\phi) - \mu_1(\psi), 0\}, \ldots, \max\{\mu_p(\phi) - \mu_p(\psi), 0\}\} \qquad (14) $$

which has the advantage that $A - A = \emptyset$. With these definitions, we can write Tversky's similarity function between two fuzzy sets $\mu(\phi)$ and $\mu(\psi)$, corresponding to measurements made on two images, as

$$ S(\phi, \psi) = \theta \sum_{i=1}^{p} \min\{\mu_i(\phi), \mu_i(\psi)\} - \alpha \sum_{i=1}^{p} \max\{\mu_i(\phi) - \mu_i(\psi), 0\} - \beta \sum_{i=1}^{p} \max\{\mu_i(\psi) - \mu_i(\phi), 0\} \qquad (15) $$

We refer to the model defined by eq. (15) as the Fuzzy Feature Contrast (FFC) model.

5 Similarity of Faces

In this section we present an experimental comparison of the results obtained by some of the similarity measures discussed in the previous sections. The testbed we use is the similarity of human faces. Similarity of faces is a complex issue that depends on a number of factors, like the color and the shape of the hair, the texture of the skin, the geometry of the face components, and so on. In this experiment, we have chosen a limited approach, and we measure similarity based only on geometric measures.

5.0.4 Geometrical Measurements and Predicates

The geometric measurements we derive from a face image are described in Fig. 2. Suitable combinations of these measurements provide support for the predicates of Tab. 1. These predicates can be collected in a set of features and used to compute Tversky similarity. The FFC similarity model uses the truth values of the predicates, while the metric distances are based on the geometric measurements.

Figure 2: These 6 measures are taken from a face image (labeled a, b, d, e, f, and g in the figure).

Table 1: Predicates used for similarity evaluation, and measured quantities that support their truth.

Predicate       Supporting quantity
Long face       b
Long chin       d
Wide mouth      e
Long nose       a
Square face     (g − f)/g
Large face      g/b
Large chin      f/b
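Eqs. (13) to (15) translate almost line by line into code operating on truth-value vectors $\mu(\phi)$ and $\mu(\psi)$, such as the truth of the predicates of Table 1 for two faces. This sketch is our own illustration. The parameter defaults and the truth values in the example are made up. The mapping from the supporting quantities of Table 1 to truth values is left out.

```python
from typing import List, Sequence


def fuzzy_intersection(m1: Sequence[float], m2: Sequence[float]) -> List[float]:
    """Eq. (13): element-wise minimum of two truth-value vectors."""
    return [min(a, b) for a, b in zip(m1, m2)]


def fuzzy_difference(m1: Sequence[float], m2: Sequence[float]) -> List[float]:
    """Eq. (14): element-wise max(a - b, 0), so a set minus itself is empty."""
    return [max(a - b, 0.0) for a, b in zip(m1, m2)]


def ffc_similarity(m1: Sequence[float], m2: Sequence[float], theta: float = 1.0,
                   alpha: float = 0.5, beta: float = 0.5) -> float:
    """Eq. (15), using the fuzzy cardinality of eq. (12) as salience function."""
    if len(m1) != len(m2):
        raise ValueError("truth-value vectors must have the same length")
    return (theta * sum(fuzzy_intersection(m1, m2))
            - alpha * sum(fuzzy_difference(m1, m2))
            - beta * sum(fuzzy_difference(m2, m1)))


# Hypothetical truth values of three predicates (say "long face",
# "wide mouth", "long nose") for two faces.
face_1 = [0.9, 0.2, 0.7]
face_2 = [0.6, 0.4, 0.7]
print(ffc_similarity(face_1, face_2, alpha=0.2, beta=0.8))
print(ffc_similarity(face_2, face_1, alpha=0.2, beta=0.8))
```

With $\alpha = 0.2$ and $\beta = 0.8$, the two calls give about 1.28 and 1.22. Face 1 has the larger total truth ($f = 1.8$ against $1.7$), so the ordering agrees with (10). With $\alpha = \beta$, the two directions give the same value.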

5.0.5 Experimental results

For the experiments, we use ten faces taken from a subset of the MIT face images database. All the images in this subset are mug shots of faces at approximately the same scale and with approximately constant illumination. A typical stimulus-response result is presented in Fig. 3. We note several differences in the responses given by the similarity measures we considered. For instance, the person deemed "most similar" by three of the metric models (and "second most similar" by the fourth) is placed only in fifth position by the FFC measure. The only difference between the two instances of FFC is that the face ranked "most similar" by one of them is ranked third by the other. Since the parameters of the second instance were chosen with the declared purpose of finding the maximum difference, this indicates that, in this case, FFC is quite insensitive to the values of $\alpha$ and $\beta$. This makes the FFC model "almost" symmetric in the case of faces. We don't know of any experimental data that might confirm or disprove this symmetry in the judgment of face similarity by human observers. The only difference between the Attneave and Euclid distances is the inversion of the faces in the third and fifth positions. The Shepard model is much less consistent: comparing the case $\gamma = 6$ with the case $\gamma = 1$, we see that many faces are ranked differently. The average displacement between the two cases is 2, versus an average displacement of 1/2 between the two instances of FFC.

Figure 3: A stimulus and the faces ranked by each similarity measure: Attneave; Euclid; Shepard ($\gamma = 6$, $\alpha = 2$); Shepard ($\gamma = 1$, $\alpha = 1$); Shepard ($\gamma = 0.3$, $\alpha = 1$); Fuzzy FC 1 ($\alpha = \beta = 0$); Fuzzy FC 2 ($\alpha > 0$, $\beta > 0$).
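The text does not spell out how the "average displacement" between two rankings is computed. This sketch assumes one plausible reading: the mean absolute change in position of each face between the two rankings. It also shows the basic query-by-content operation of ranking a database by similarity to a query. A distance is turned into a similarity by negation, which preserves the ordering. All names are hypothetical.

```python
from typing import Callable, List, Mapping, Sequence

Vector = Sequence[float]


def rank_by_similarity(query: Vector, database: Mapping[str, Vector],
                       similarity: Callable[[Vector, Vector], float]) -> List[str]:
    """Return database keys ordered from most to least similar to the query."""
    return sorted(database, key=lambda k: similarity(query, database[k]), reverse=True)


def average_displacement(r1: Sequence[str], r2: Sequence[str]) -> float:
    """Mean absolute difference between each item's position in two rankings (assumed definition)."""
    if sorted(r1) != sorted(r2):
        raise ValueError("rankings must contain the same items")
    if not r1:
        return 0.0
    pos2 = {item: i for i, item in enumerate(r2)}
    return sum(abs(i - pos2[item]) for i, item in enumerate(r1)) / len(r1)


def neg_city_block(q: Vector, x: Vector) -> float:
    """Negated city block distance, so that larger values mean more similar."""
    return -sum(abs(a - b) for a, b in zip(q, x))


faces = {"f1": [0.0], "f2": [1.0], "f3": [3.0]}
print(rank_by_similarity([0.4], faces, neg_city_block))                  # ['f1', 'f2', 'f3']
print(average_displacement(["a", "b", "c", "d"], ["b", "a", "c", "d"]))  # 0.5
```

Under this reading, swapping two adjacent faces in a ranking of four gives a displacement of 1/2.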

References

[1] F. Gregory Ashby and Nancy A. Perrin. Toward a unified theory of similarity and recognition. Psychological Review, 95(1):124–150, 1988.
[2] Fred Attneave. Dimensions of similarity. American Journal of Psychology, 63:516–556, 1950.
[3] Daniel M. Ennis and Norman L. Johnson. Thurstone-Shepard similarity models as special cases of moment generating functions. Journal of Mathematical Psychology, 37:104–110, 1993.
[4] Daniel M. Ennis, Joseph J. Palen, and Kenneth Mullen. A multidimensional stochastic theory of similarity. Journal of Mathematical Psychology, 32:449–465, 1988.
[5] Carol L. Krumhansl. Concerning the applicability of geometric models to similarity data: The interrelationship between similarity and spatial density. Psychological Review, 85:445–463, 1978.
[6] Roger N. Shepard. Stimulus and response generalization: A stochastic model relating generalization to distance in psychological space. Psychometrika, 22:325–345, 1957.
[7] Roger N. Shepard. Toward a universal law of generalization for psychological science. Science, 237:1317–1323, 1987.
[8] L. L. Thurstone. A law of comparative judgement. Psychological Review, 34:273–286, 1927.
[9] Amos Tversky. Features of similarity. Psychological Review, 84(4):327–352, July 1977.
[10] Amos Tversky and Itamar Gati. Similarity, separability, and the triangle inequality. Psychological Review, 89:123–154, 1982.