An Approach to Interactive Retrieval in Face Image Databases Based on Semantic Attributes

Venkat N. Gudivada (1), Vijay V. Raghavan (2), and Guna S. Seetharaman (2)

(1) Department of Computer Science, Ohio University, Athens, OH 45701
(2) The Center for Advanced Computer Studies, University of Southwestern Louisiana, Lafayette, LA 70504

Abstract

The Image Retrieval (IR) problem is concerned with retrieving images that are relevant to users' requests from a large collection of images, referred to as the image database. A taxonomy for, and the limitations of, the existing approaches to image retrieval are discussed in [1]. Also, to alleviate some of the problems associated with these approaches, a unified framework for retrieval in image databases for a class of application areas is proposed in [1]. The framework provides a taxonomy for image attributes and identifies four generic types of retrieval based on the attribute taxonomy. Semantic attributes play a central role in supporting one of these generic retrieval types, referred to as Retrieval by Semantic Attributes (RSA). Semantic attributes are those attributes whose specification necessarily involves some subjectivity, imprecision, and/or uncertainty. In this paper, we introduce Personal Construct Theory (PCT) [2, 3] as a knowledge elicitation tool for systematically deriving semantic attributes to support RSA in image retrieval applications. As a case study, we use a prototype database system comprising human face images. The knowledge elicited from the face images is stored in matrix form, referred to as a repertory grid. We propose an algorithm

for RSA based on the repertory grid. The algorithm incorporates user relevance judgments as a means of dealing with the inherent problems associated with the specification of semantic attributes. The algorithm has been implemented and tested on the human face image database, and the initial results are encouraging. In essence, we have developed an overall methodology/test bed to facilitate experimentation with different algorithms for RSA.

1. Introduction

Recently, there has been widespread interest in various kinds of database management systems for managing information from images. The Image Retrieval (IR) problem is concerned with retrieving images that are relevant to users' requests from a large collection of images, referred to as the image database. There is a multitude of application areas that consider image retrieval the principal activity [1]. Since the application areas are greatly diverse, there seems to be no consensus as to what an image database system really is. Consequently, the features of existing image database systems have essentially evolved from domain-specific considerations.

A taxonomy for, and the limitations of, the existing approaches to image retrieval are discussed in [1]. Also, to alleviate some of the problems associated with these approaches, a unified framework for retrieval in image databases for a class of application areas is proposed in [1]. The framework provides a taxonomy for image attributes and identifies four generic types of retrieval based on the attribute taxonomy. The attribute types are: meta, logical, and semantic attributes. The taxonomy for image attributes is shown in Figure 1. The attributes of an image that are derived externally and do not depend on the contents of the image are referred to as meta attributes. Meta attributes that apply to the entire image are referred to as image meta attributes, and meta attributes that apply to constituent objects in an image are called image-object meta attributes. The attributes that are used to describe the contents/properties of an image, viewed either as an integral entity or as a collection of constituent objects, are referred to as logical attributes. In the former case they are referred to as image logical attributes, while in the latter case they are named image-object logical attributes. Simply stated, semantic attributes are those attributes that are used to describe the high-level domain concepts which the images manifest. Specification of semantic attributes necessarily involves some subjectivity, imprecision, and/or uncertainty. Subjectivity arises due to differing viewpoints of the users about various domain aspects. Difficulties in the measurement and specification of image features lead to imprecision. The following description further illustrates the imprecision aspect of semantic attributes. In many image database application domains, users prefer to express some semantic attributes using an ordinal scale even though the underlying representation of these attributes is numeric. For example, in face image databases, a user's query may specify one of the following values for an attribute that indicates nose length: short, normal, or long. The retrieval mechanism must map each value on the ordinal scale to a range on the underlying numeric scale. The design of this mapping function may be based on domain semantics and statistical properties of the feature over all the images currently stored in the database.
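The ordinal-to-numeric mapping just described can be sketched as follows. The paper leaves the design of the mapping function open; the use of tercile cut points over the stored feature values is purely an assumption for illustration, and the function and data names are hypothetical.

```python
# Hedged sketch: map an ordinal term ("short", "normal", "long") to a numeric
# range derived from the distribution of the feature over the database.
# Tercile boundaries are an assumption, not the paper's design.

def ordinal_to_range(term, values):
    """Return the (low, high) numeric interval for an ordinal term."""
    vs = sorted(values)
    n = len(vs)
    t1, t2 = vs[n // 3], vs[2 * n // 3]  # tercile cut points over the database
    ranges = {
        "short": (vs[0], t1),
        "normal": (t1, t2),
        "long": (t2, vs[-1]),
    }
    return ranges[term]

# Hypothetical nose-length measurements over the stored images
nose_lengths = [3.1, 3.4, 3.6, 3.8, 4.0, 4.2, 4.5, 4.7, 5.1]
low, high = ordinal_to_range("normal", nose_lengths)
```

A query value of "normal" would then be matched against images whose numeric nose length falls in the interval (low, high).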

Figure 1: A Taxonomy for Image Attributes


Uncertainty is introduced because of the vagueness in the retrieval needs of a user. The use of semantic attributes in a query forces the retrieval system to deal with subjectivity, imprecision, and/or uncertainty. Semantic attributes may be synthesized by applying user-perceived transformations/mappings on logical and/or meta attributes of an image. These transformations/mappings can be conveniently realized using a rulebase. Subjectivity and uncertainty in some semantic attributes may be resolved through user interaction/learning at query specification/processing time. Thus the meaning and the method of deriving semantic attributes in a domain may vary from one user to another. It is through these semantic attributes that the proposed unified model [1] captures domain semantics that vary from domain to domain as well as from user to user within the same domain. Semantic attributes pertaining to the whole image are named image semantic attributes, whereas those that pertain to the constituent image objects are named image-object semantic attributes. Retrieval based on semantic attributes is referred to as Retrieval by Semantic Attributes (RSA) and constitutes one of the four generic retrieval types identified in [1]. In this paper, we address two issues that relate to RSA. The first issue is concerned with methods for the systematic identification and analysis of semantic attributes in image database application areas. The second issue deals with developing an interactive approach/algorithm to facilitate RSA. As a case study, we use a prototype image database consisting of human face (HF) images. We refer to this database as the HF image database or simply the HF database. The organization of the paper is as follows. In section 2, we provide a retrieval requirements analysis of face image databases in general. Related work in face information retrieval is presented in section 3. This is followed by a terse introduction to Personal Construct Theory (PCT) as a tool for systematically deriving semantic attributes in section 4. Elicitation of semantic attributes in the form of a repertory grid for the HF database using PCT is described in section 5. In section 6, we provide an analysis of the repertory grid to discover any dependencies that may exist among the semantic attributes. Our proposed algorithm for face information retrieval based on the repertory grid is described in section 7. Finally, section 8 concludes the paper.

2. Retrieval Requirements Analysis of Face Image Databases

Law enforcement and criminal investigation agencies typically maintain large image databases of human faces. Such databases consist of faces of those individuals who have either committed crimes or are suspected of involvement in criminal activities in the past. Retrieval from these databases is performed in the context of the following activities: matching of Composite Drawings, Bann File searching, and Ranking for Photo Lineup. Composite Drawings are used in identifying a potential suspect from an image database. The victim or an eyewitness of a crime describes the facial features of the perpetrator to a forensic composite technician. There may be considerable imprecision and uncertainty associated with this description. The forensic composite technician then sketches a face image from these descriptions. The retrieval system is expected to display those images in the database that match the sketch. Bann File searching is performed when a (suspected) criminal at hand does not disclose his legitimate identification information that would enable a law enforcement/criminal investigator to retrieve the criminal's past history. Under such circumstances, the investigator visually scans the criminal's face to extract some features and uses them in performing the retrieval. In Ranking for Photo Lineup, the person performing the retrieval provides a vague and often uncertain set of features of a face and expects the system to provide a ranking of those faces in the database that match the feature descriptions. Often, this type of retrieval is performed in an exploratory manner, by emphasizing one combination of prominent features during a retrieval and then a different combination of features during subsequent retrievals, to assist the investigation process. Retrieval involving matching of Composite Drawings can be viewed as Retrieval by Semantic Attributes (RSA) [1], since considerable subjectivity, imprecision, and uncertainty are associated with the attributes used in the retrieval. In Bann File searching, the person performing the retrieval has "live" access to the features of the face to be retrieved. Therefore, there is little imprecision and uncertainty associated with the specification of the attributes. However, considerable subjectivity is involved in the specification of a query. The assignment of a symbolic or a numeric value to a semantic attribute may vary from one user to another. For example, the assignment of the value wide to the semantic attribute nose width may vary among the retrieval system users. Hence, Bann File searching can also be viewed as RSA. Finally, in Ranking for Photo Lineup, the person performing the retrieval uses some features about which he is very certain and also some other features with which a great deal of imprecision and uncertainty may be associated. In this sense, Ranking for Photo Lineup can be considered as both Retrieval by Non-semantic Attributes (RNA) and RSA, complementing each other. RNA and RSA are described in greater detail in [1]. In the next section, we describe related work in the face recognition and retrieval area.

3. Related Work

Automated systems for human face identification and classification are useful in a multitude of application areas, and the initial studies in this direction date back to the last century. Samal and Iyengar provide a survey of work done in the automatic recognition and analysis of human faces [12]. They identify five major aspects of face study: representation of faces, detection of faces, identification of faces, analysis of facial expressions, and classification of faces based on physical features. The early systems for face recognition were essentially manual systems such as Photofit [13]. Alternatives to the Photofit system include the Multiple Image-Maker and Identification Compositor (MIMIC) (which uses film strip projections), Identikit (which uses plastic overlays of drawn features), and Compusketch (a computerized version of Identikit). Automatic/semi-automatic face identification systems are based on feature representations of faces. Features can be derived either from the 2D intensity image or from the face profile. Examples of features in the former category include distance between eyes, hairline position, and nose width and height, among many others. A face profile is simply an outline view of a face as observed from a side (left or right). Features derived from a face profile are usually based on distances and angles between characteristic points on the profile. These may include nose protrusion, area of the profile, distances between characteristic points, and wiggliness of the profile, among others. In summary, a wide range of features have been used for the face identification task. However, there is no general consensus as to which features are significant for discriminating face images. Much of the research in the automatic identification of faces has been carried out from the perspective of image processing and computer vision researchers. The major focus has been on automatically extracting facial features from 2D intensity images. The success of these systems is quite limited despite several strong assumptions about the environment in which the image is produced as well as about the pose of the face [13, 14, 15]. Very little attention has been paid to database issues such as feature organization and matching algorithms. In most cases, the matching algorithm is simply the Euclidean distance between the corresponding features. However, recent research indicates that features stored in terms of their deviations from the typical face are more useful for retrieval than the actual feature values themselves [16]. It should be recognized that the above investigations were carried out from the perspective of the face recognition task. A face information retrieval system termed Xenomania is reported in [17]. The purpose of this system is face retrieval, not face recognition. The system is based on the VIMSYS data model [18] and employs a human-assisted approach to face feature extraction. In the next section, we provide a terse introduction to PCT.

4. Personal Construct Theory

Personal Construct Theory (PCT) was originally proposed by George Kelly in the clinical psychology domain [2, 3]. The theory is viewed as a formal model of the organization of human cognitive processes. Both the animate and inanimate objects with which a person interacts in everyday life constitute that person's environment. According to PCT, the objects comprising a person's environment profoundly influence his decision-making process. These objects are referred to as entities or elements. A property of an element that influences a person's decision-making process is known as a construct or a cognitive dimension. In other words, PCT assumes that people typically use these cognitive dimensions in evaluating their experiences for decision making. An element may possess many constructs. The process of assigning a value to a construct on an ordinal scale, to reflect the degree to which that construct is present in an element, is known as rating the element on that construct. Usually, a value of one is assigned if the construct is certainly present in the element, and a value of three is assigned if the construct is certainly absent. To indicate a subjectively neutral position, a value of two is used. However, the granularity of the rating scale can vary. A matrix that shows the elements and the corresponding construct values is referred to as a repertory grid. The rows are labeled with the construct names, and the element names form the column labels.

In the context of image databases, the following interpretations are given to the constructs and the repertory grid. Constructs are viewed as cognitive dimensions of the image domain that are useful in making relevant distinctions among the database images to facilitate retrieval. Repertory grid generation is viewed as a complex sorting test in which the images are rated with respect to a set of constructs. To be consistent with the terminology in sections 1 and 2, we always use the term semantic attribute to refer to a construct. The PCT experiment is carried out in two stages. During the first stage, a set of semantic attributes is discovered for the image database. The procedure for the first stage is as follows. Three randomly selected images from the image database are displayed in three quadrants of a computer display screen. The domain expert is asked to name the poles of a bipolar construct (i.e., a semantic attribute) by which the images in the first and second quadrants are similar to each other and maximally different from the image in the third quadrant. For the same set of images, the other two combinations are also considered. Then the next set of three images is shown to the domain expert and the same procedure is repeated. This process continues until the domain expert is unable to identify any more new semantic attributes. During the second stage, the repertory grid is generated. The images in the database are shown to the domain expert in sequence. The domain expert is asked to rate each of these images with respect to each of the semantic attributes identified in stage one.
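The structure of a repertory grid can be sketched as a small matrix, with constructs (semantic attributes) as rows and images (elements) as columns, using the 1-3 rating scale described above. The attribute and image names below are illustrative only, not entries from the paper's actual grid.

```python
# Minimal sketch of a repertory grid: rows are semantic attributes
# (constructs), columns are images (elements). Ratings use the scale above:
# 1 = construct certainly present, 2 = neutral, 3 = certainly absent.
# Names are hypothetical.

attributes = ["nose protrusion", "eyebrow thickness"]
images = ["img_01", "img_02", "img_03"]

# grid[i][j] = rating of attribute i on image j
grid = [
    [1, 2, 3],  # nose protrusion
    [3, 3, 1],  # eyebrow thickness
]

def rating(attr, image):
    """Look up the rating of a semantic attribute on an image."""
    return grid[attributes.index(attr)][images.index(image)]
```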

More details on the PCT experimental methodology can be found in [1, 4]. The following section describes the elicitation of semantic attributes and the generation of the repertory grid for the HF database.

5. Elicitation of Semantic Attributes

First we describe the HF database, and then the elicitation of semantic attributes and the generation of the repertory grid. A set of 93 black-and-white negatives of face photographs of students, taken for a university yearbook, is randomly selected from a large collection. Reprints of size 5" × 7" are made from these negatives. Then, all these photographs are scanned and stored as 256 × 256 resolution, 256-level gray scale digital images. Each image is assigned a unique identification (ID) number in the range 1 to 93 for referencing purposes. These images constitute our HF database. A campus police officer with expertise in forensic art served as the domain expert. A computer program using the X window library is developed on a UNIX workstation to facilitate both stages of the PCT experiment. The graphical user interface provided by the program proved to be valuable in making the expert-computer interaction take place naturally. Also, by capturing all the repertory grid entries directly into disk files, we were able to guard against data entry errors creeping into the grid. The PCT experiment is performed on the HF database using this program. The result of the first stage is the identification of a set of 19 semantic attributes by the domain expert. The semantic attribute names and their left and right poles are shown in Table 1. During the second stage, the repertory grid is generated. A scale of 1-3 is used in rating the images. For space reasons, we have not included the repertory grid in this paper; it can be found in Appendix C of [1]. Next, we describe the motivation for and a method for analyzing the repertory grid.

6. Repertory Grid Analysis Techniques

The repertory grid elicited from the domain expert is analyzed to extract implicit dependencies/relationships that may exist among the semantic attributes provided by the domain expert. In other words, repertory grid analysis techniques may be viewed as ways of determining the degree of dependence among the semantic attributes. The degree of dependence among attributes has important implications for retrieval. First, it can be used in reformulating a user query to improve retrieval quality. For example, assuming there is a high degree of dependency between two attributes, when a user query specifies only one of them, the retrieval system may implicitly reformulate the query by adding the other attribute. Query reformulation techniques have been investigated in bibliographic retrieval systems [5]. Second, the dependency information can be used in automatically rating (i.e., assigning values for) semantic attributes on new image instances. There may be some attributes for which the process of rating is algorithmically difficult or computationally expensive. In such cases, a high degree of dependency between a pair/set of attributes may obviate this difficulty: the same rating can be assigned to both attributes, assuming that one of them can easily be rated automatically. Third, several retrieval methods, especially those developed in the context of bibliographic information retrieval systems, assume independence among the attributes. Repertory grid analysis techniques can be used to ascertain the existence of any such independence. Once the independence among the semantic attributes is confirmed, most of the existing techniques for retrieval can be used with confidence.
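The query-reformulation use of dependency information can be sketched as follows. The dependency table, the threshold value, and the attribute names are assumptions introduced purely for illustration; the paper does not specify a reformulation procedure.

```python
# Hedged sketch of dependency-based query reformulation: if two attributes
# are strongly dependent (score above an assumed threshold), a query that
# mentions one is implicitly expanded with the other. All names and values
# here are hypothetical.

def reformulate(query_attrs, dependency, threshold=75):
    """Expand a query with attributes strongly dependent on those given."""
    expanded = set(query_attrs)
    for a in query_attrs:
        for b, score in dependency.get(a, {}).items():
            if score >= threshold:
                expanded.add(b)
    return expanded

# Hypothetical pairwise dependency scores on a 0-100 scale
dependency = {"beard": {"moustache": 80, "nose width": 10}}
q = reformulate({"beard"}, dependency)
# q now contains both "beard" and "moustache"
```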


Three methods exist for analyzing the repertory grid: the Rnorm measure [1, 6], Hart's measure [7], and the Expected Mutual Information Measure (EMIM) [8, 9]. The grid analysis involves first computing the similarity between each pair of attributes and then establishing the dominant dependencies among the attributes. The methods differ in the way they compute the similarity between a pair of attributes. Hart's measure assumes that the entries in the repertory grid are measured on an interval scale. The Rnorm measure assumes that the entries are measured on an ordinal scale, whereas EMIM considers the grid entries as being measured on a nominal scale. We now describe the EMIM method.

6.1 EMIM Method

Under this method, a semantic attribute vector Ci represents the rating values for semantic attribute i across all the images under consideration. The extent to which the attribute vectors Ci and Cj are mutually dependent can be measured by EMIM, denoted I, which is given by:

I = I(Ci, Cj) = Σp Σq P(Ci = p, Cj = q) ln [ P(Ci = p, Cj = q) / ( P(Ci = p) P(Cj = q) ) ]

where 1 ≤ p, q ≤ m and m is the maximum value on the rating scale. It should be noted from the above equation that when the attribute vectors Ci and Cj are independent, I(Ci, Cj) = 0, since P(Ci = p, Cj = q) = P(Ci = p) P(Cj = q) for any values of p and q from their respective domains. On the other hand, if the attribute vectors Ci and Cj are completely dependent, then P(Ci = p, Cj = q) = P(Ci = p) = P(Cj = q) = 1/3, and the expression for I evaluates to ln(3), which is equal to 1.10. Therefore, the maximum value of I is 1.10 and the minimum value is 0.0. Table 2 shows the normalized values of I between all possible pairs of attributes. The normalized value of I is calculated using the formula (I / 1.10) · 100. After normalization, the range of EMIM values is from 0 to 100. An EMIM value of 0 indicates that the attributes are completely independent, while a value of 100 indicates that the attributes are completely dependent.

EMIM has three distinct characteristics. First, given an attribute value in an image, it enables us to predict the other attribute values in the same image. Second, the measure is independent of the discrete scale used for rating the attributes. Finally, the measure is sensitive to the size of the image collection used in the experiment.

The attribute dependency information shown in Table 2 is useful when we consider only a pair of attributes at a time. However, it is difficult to establish attribute dependency patterns that may span more than two attributes. To identify such clusters of dependencies among the attributes, it is useful to construct a maximal spanning tree (MST). This is facilitated by viewing the attribute dependency information in Table 2 as a complete weighted graph. The set of attributes constitutes the vertices of the graph, and the weight of the edge connecting vertices i and j is the table entry corresponding to row i and column j. The MST for this graph can be obtained using an algorithm such as Kruskal's [10]. The maximal spanning tree provides some useful insights into the relationships that exist among the semantic attributes: attributes which are deemed highly interrelated by the domain expert are captured as adjacent vertices in the MST.

For the HF image database, since there are no significant dependencies between the facial features (i.e., the semantic attributes), the MST is not constructed. Also, we conclude that all the semantic attributes are independent. This observation is consistent with related studies in anthropology [11]. This independence among the attributes is important to the retrieval method that we propose in the following section.
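The EMIM computation and its normalization can be sketched as follows, estimating the probabilities as relative frequencies over the images. The function names are ours; the normalization divides by ln(3), the maximum value for a three-point rating scale.

```python
import math
from collections import Counter

# Sketch of EMIM for a pair of attribute vectors Ci, Cj (rating values over
# the same set of images). Probabilities are estimated as relative
# frequencies; terms with zero joint probability contribute nothing.

def emim(ci, cj):
    n = len(ci)
    joint = Counter(zip(ci, cj))
    pi = Counter(ci)
    pj = Counter(cj)
    total = 0.0
    for (p, q), c in joint.items():
        p_pq = c / n
        total += p_pq * math.log(p_pq / ((pi[p] / n) * (pj[q] / n)))
    return total

def normalized_emim(ci, cj):
    # Normalize to 0-100 using the maximum value ln(3) ≈ 1.10 for a
    # three-point rating scale, as in the text.
    return (emim(ci, cj) / math.log(3)) * 100

identical = [1, 2, 3, 1, 2, 3]
print(round(normalized_emim(identical, identical)))  # fully dependent → 100
```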

7. Proposed Algorithm for RSA Based on Repertory Grid

We introduce some notation first. The images in the database (D) are designated as f1, f2, f3, ..., fn, and the binary semantic attributes (to be explained later) (A) are denoted by a1, a2, a3, ..., ak. Also, Rb[i, j] denotes the (binary) value of the semantic attribute ai in image fj, where 1 ≤ i ≤ k and 1 ≤ j ≤ n. In the following discussion, we use the term attribute to refer to a binary semantic attribute. The algorithm that we propose is referred to as the Attribute Relevance Weight (ARW) algorithm, and the theoretical basis for the algorithm is presented next.

7.1 Theoretical Basis for ARW Algorithm

Statistical techniques are used in bibliographic retrieval systems for exploiting user-provided relevance information to assign weights to search terms, reflecting their relative importance in the context of a user retrieval need [19]. The weight assigned to a search term is referred to as its term relevance weight. Usually, user-provided relevance information is in the form of preference relations (e.g., image 5 is preferred over image 2). It should be noted that these preferences can be multilevel. That is, if we have two preference relations, for example, "image 5 is preferred over image 2" and "image 2 is preferred over image 7," then the degree by which image 5 is preferred over image 7 is, clearly, different from the degree by which image 2 is preferred over image 7. However, the methods for term relevance weighting assume that the preference relations are not multilevel and that the search terms are binary. Bollmann [20] provides a generalization of the term relevance weight problem to multilevel preference relations. Under this generalization, the term/attribute preference weight of an attribute ai, denoted ARWi, is given by:

ARWi = ln [ Prob(ai = 1 & ai' = 0 | f ·> f') / Prob(ai = 0 & ai' = 1 | f ·> f') ]

The term in the numerator indicates the probability that the attribute ai in image f has a value greater than its corresponding value (ai') in image f', given that the user prefers f over f'. The symbol ·> denotes the relation "preferred over." A similar interpretation is given to the denominator. Also, it is shown in [20] that

ARWi = ln [ (number of image pairs (f, f') with f ·> f' and ai > ai') / (number of image pairs (f, f') with f' ·> f and ai > ai') ]

Since all the ai are binary, the above expression can be rewritten as ARWi = ln(S+ / S-), where S+ denotes the number of image pairs (f, f') with f ·> f' and ai > ai', and S- denotes the number of image pairs (f, f') with f' ·> f and ai > ai'. Recall that the rating values in the repertory grid are in the range 1 to 3. Since the proposed algorithm works with binary semantic attributes, we need to transform the repertory grid. Binarization of the repertory grid is explained next.
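The counting form ARWi = ln(S+ / S-) can be sketched as follows. The add-0.5 smoothing that guards against empty counts is our assumption for illustration, not part of the formulation in [20].

```python
import math

# Sketch of ARW_i = ln(S+ / S-) from user preference pairs. `prefs` is a list
# of (f, f_prime) pairs meaning f ·> f_prime; `attr` maps an image id to its
# binary value of attribute a_i. The 0.5 smoothing term is an assumption.

def arw(prefs, attr):
    s_plus = sum(1 for f, fp in prefs if attr[f] > attr[fp])   # S+
    s_minus = sum(1 for f, fp in prefs if attr[fp] > attr[f])  # S-
    return math.log((s_plus + 0.5) / (s_minus + 0.5))

# Toy data: attribute values and preference relations
attr = {"f1": 1, "f2": 0, "f3": 1, "f4": 0}
prefs = [("f1", "f2"), ("f3", "f4"), ("f2", "f3")]
w = arw(prefs, attr)  # positive: preferred images tend to have a_i = 1
```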

7.2 Binarized Repertory Grid

In the binarized repertory grid, every semantic attribute is viewed as a binary variable. For example, consider the semantic attribute Eyebrows Shape of the HF database shown in Table 1. This attribute can assume any of the following three values: Arched, Twisted, or Straight. We decompose the attribute Eyebrows Shape into three distinct binary semantic attributes: Eyebrows Shape - Arched, Eyebrows Shape - Twisted, and Eyebrows Shape - Straight. Assuming that the attribute Eyebrows Shape has the value 3 (i.e., Straight) in the original repertory grid, the following values are assigned to the corresponding binary semantic attributes: Eyebrows Shape - Arched = 0, Eyebrows Shape - Twisted = 0, and Eyebrows Shape - Straight = 1. Similar mappings are provided for the other values that the semantic attribute Eyebrows Shape can assume. We now describe the ARW algorithm.
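The binarization just described amounts to a one-hot expansion of each 1-3 rating, and can be sketched as follows; the example data are illustrative.

```python
# Sketch of binarizing one row of the repertory grid: each rating on an
# m-point scale is expanded into m binary attributes, one per scale value,
# exactly as in the Eyebrows Shape example above.

def binarize_row(ratings, m=3):
    """Expand one attribute's rating row into m binary rows."""
    return [[1 if r == v else 0 for r in ratings] for v in range(1, m + 1)]

# Eyebrows Shape ratings for four images (1 = Arched, 2 = Twisted, 3 = Straight)
row = [3, 1, 2, 3]
arched, twisted, straight = binarize_row(row)
# straight == [1, 0, 0, 1]: images 1 and 4 have straight eyebrows
```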

7.3 ARW Algorithm

The following description is cross-referenced with the steps shown in Figure 2. First, a user specifies an (initial) query in terms of binary semantic attributes (step 1). The user may be uncertain about the accuracy or completeness of these attributes in precisely specifying his need. Second, the system retrieves and displays to the user the subset of database images that exactly match the binary semantic attributes specified in the initial query (step 2). We denote this set of images as F. Third, the system obtains user relevance judgments in the form of preference relations. It is assumed that each preference relation is of the form fr ·> fs, where fr, fs ∈ F. All such preference relations are denoted by the set P (step 3). Fourth, using these preference relations, the quality of all the binary semantic attributes for retrieval in the context of the present user need is evaluated. These quality assessments are quantified by ARW values and are calculated as shown in step 4. Fifth, using these ARW values, we assign a numeric value to each image in the database. These values are referred to as Retrieval Status Values (RSVs). The RSV of an image ft, denoted RSVt, is computed as shown in step 5. It should be noted that the independence assumption among the semantic attributes is exploited in the calculation of the RSV values. Sixth, the database images are rank ordered using the RSV values and are shown to the user in decreasing order of relevance (step 6). If the user is not satisfied with this rank ordering, he may choose to provide additional relevance judgments on a few images placed at the top of the ranking. Again, the quality of the attributes is reevaluated, the database images are rank ordered, and the results are shown to the user. This process continues until the user is satisfied with the retrieved images (i.e., the top few images in the rank ordering). The effectiveness of this approach will be measured in terms of two factors: first, whether or not the successive rank orderings provided by the system converge toward the user's expected rank ordering of the database images; second, the number of iterations the system takes for its rank ordering to converge (if at all) with the user's expected rank ordering, i.e., the rate of convergence. We have implemented a prototype system for face image retrieval based on the ARW algorithm. A user specifies an RSA query by selecting binary semantic attributes corresponding to a (small) subset of the 19 semantic attributes, as shown in Figure 3. The query is processed using the ARW algorithm. The window that is used to solicit user relevance judgments in the form of preference relations is shown in Figure 4. Initial testing indicates that the ARW algorithm is able to find the images relevant to queries within a few iterations. Controlled experimentation is planned to evaluate the ARW algorithm for convergence in general and the rate of convergence in particular. The following section concludes the paper.
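One feedback iteration of the algorithm (steps 4-6 above) can be sketched end to end as follows. The RSV formula itself appears only in Figure 2 of the paper; the linear form RSV_t = Σi ARWi · Rb[i, t] used here is an assumption, chosen as a common form that is consistent with the attribute-independence result of section 6, and the smoothing term is likewise our assumption.

```python
import math

# Hedged sketch of one feedback iteration: compute ARW weights from the
# preference set P, score every image with an assumed linear RSV, and rank.
# Rb[i][j] is the binary value of attribute i in image j.

def arw_weights(prefs, Rb):
    weights = []
    for i in range(len(Rb)):
        s_plus = sum(1 for f, fp in prefs if Rb[i][f] > Rb[i][fp])
        s_minus = sum(1 for f, fp in prefs if Rb[i][fp] > Rb[i][f])
        # add-0.5 smoothing to avoid log of zero (assumption)
        weights.append(math.log((s_plus + 0.5) / (s_minus + 0.5)))
    return weights

def rank_images(prefs, Rb):
    w = arw_weights(prefs, Rb)
    n = len(Rb[0])
    # Assumed RSV: weighted sum of the binary attribute values of each image
    rsv = [sum(w[i] * Rb[i][t] for i in range(len(Rb))) for t in range(n)]
    return sorted(range(n), key=lambda t: rsv[t], reverse=True)

# Toy binarized grid: 2 attributes, 4 images
Rb = [[1, 0, 1, 0],
      [0, 1, 1, 0]]
prefs = [(0, 1), (2, 3)]  # image 0 preferred over 1, image 2 over 3
order = rank_images(prefs, Rb)  # images having attribute 0 rank high
```

In a real session this loop would repeat, with new preference judgments refining the weights, until the user accepts the top-ranked images.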

8. Conclusions

Retrieval by Semantic Attributes (RSA) is one of the four generic retrieval types in image database applications identified in [1]. Methods for systematically deriving the semantic attributes and for analyzing the repertory grid for dependencies among these attributes are an important aspect of the design of image database systems. Though the notion of semantic attributes is used in [17, 21], no methods are provided there for systematically eliciting them. The semantic attributes that we have derived using PCT are comparable to those used by the U.S. Federal Bureau of Investigation (FBI), with the following exceptions. The FBI uses fewer semantic attributes than the number our study has identified. Also, not all of its semantic attributes conform to the same rating scale. For example, the semantic attribute nose can assume any one of the following values (a nominal scale): Average, Hooked, Snub, Small, Wide Base, Concave, Narrow Base, Downward Tip, and Large. Since PCT requires all semantic attributes to be bipolar (i.e., measured on an ordinal scale), the attribute nose has been decomposed into two semantic attributes: nose protrusion and nose width (see Table 1). This decomposition not only introduces an ordinal scale for the measurement of nose width but may also simplify the procedures for the automatic or semi-automatic extraction of attributes from the 2D intensity representation of face images. In addition to requiring that all semantic attributes be bipolar, PCT also requires the same level of granularity in the rating scale for all semantic attributes. However, in our experiments the domain expert acknowledged that there are semantic attributes that defy both requirements. We conclude that different levels of granularity, as well as a provision for non-bipolar semantic attributes, may be essential for PCT to be useful in image database applications. This naturally leads to a further conclusion: of the three measures for the analysis of the repertory grid mentioned in this paper, only the EMIM measure remains appropriate for grid analysis when the PCT requirements are not rigidly met.
In summary, we have developed an overall approach for facilitating RSA in image database application areas in general, and a test bed for experimenting with algorithms for RSA in particular. One of our future research directions is to conduct controlled experimentation with the ARW algorithm on the HF database. We hope that the results will enable us to draw some conclusions about convergence in general and to quantify the rate of convergence in particular. Another future research direction is to investigate methods for the automatic extraction of semantic attributes from the 2D intensity representation of face images. Over the years, both anthropologists and forensic artists have devised a unit of measurement, called the module, on the basis of which the face is divided into various proportions or areas. All these modules joined together form a structure referred to as the canon of the human head. This canon is used by forensic artists when recreating drawings or paintings of human subjects. It is our assessment that algorithms for the automatic extraction of the semantic attributes based on the canon structure would be both tractable and efficient.

Acknowledgment

This research is supported by the U.S. Department of Defense under Grant No. DAAL03-89-G-0118. The authors are grateful to Officer Rose Lachale for her involvement in this research as a domain expert and for providing insights into forensic art. The authors also acknowledge S. Bhakta and K. Sastry for their contributions to the implementation aspects of this research.

References

[1] Gudivada, V.N. (1993), A Unified Framework for Retrieval in Image Databases, Ph.D. Dissertation, University of Southwestern Louisiana, Lafayette, LA.

[2] Kelly, G. (1955), The Psychology of Personal Constructs, Norton Publishing Company.

[3] Kelly, G. (1969), "A Mathematical Approach to Psychology," in Clinical Psychology and Personality: The Selected Papers of George Kelly, B. Maher (Ed.), John Wiley, pp. 94-112.

[4] Raghavan, V.V., Gudivada, V.N., and Katiyar, A. (1991), "Discovery of Conceptual Categories in an Image Database," International Conference on Intelligent Text and Image Handling, RIAO 91, Barcelona, Spain, pp. 902-915.

[5] Salton, G. and McGill, M. (1983), Introduction to Modern Information Retrieval, McGraw-Hill, New York, NY.

[6] Bollmann, P., Jochum, F., Reiner, Weissmann, V., and Zuse, H. (1985), "The LIVE-Project: Retrieval Experiments Based on Evaluation Viewpoints," Proc. of the Eighth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Montreal, Canada, June 1985, pp. 213-214.

[7] Hart, A. (1986), Knowledge Acquisition for Expert Systems, McGraw-Hill Publishing Company, New York, NY.

[8] Chow, C.K. and Liu, C.N. (1968), "Approximating Discrete Probability Distributions with Dependence Trees," IEEE Transactions on Information Theory, Vol. 14, pp. 462-467.

[9] Deogun, J.S., Raghavan, V.V., and Bhatia, S.K. (1989), "A Theoretical Basis for the Automatic Extraction of Relationships from Expert-Provided Data," Fourth International Symposium on Methodologies for Intelligent Systems, pp. 123-131.

[10] Stinson, D.R. (1986), An Introduction to the Design and Analysis of Algorithms, Charles Babbage Research Centre, Manitoba, Canada.

[11] Personal communication, Professor D. Cring, Department of Anthropology and Sociology, University of Southwestern Louisiana, Lafayette, LA, February 1993.

[12] Samal, A. and Iyengar, P. (1992), "Automatic Recognition and Analysis of Human Faces and Facial Expressions: A Survey," Pattern Recognition, Vol. 25, No. 1, pp. 65-77.

[13] Kaya, Y. and Kobayashi, K. (1987), "A Basic Study on Human Face Recognition," in Frontiers of Pattern Recognition, Academic Press, pp. 265-289.

[14] Craw, I., Ellis, H., and Lishman, J.R. (1987), "Automatic Extraction of Face Features," Pattern Recognition Letters, Vol. 5, February, pp. 183-187.

[15] Sakai, T., Nagao, M., and Kanade, T. (1972), "Computer Analysis and Classification of Photographs of Human Faces," Proc. of the First USA-Japan Computer Conference, AFIPS, Montvale, NJ, pp. 55-62.

[16] Bruce, V. (1988), Recognising Faces, Lawrence Erlbaum Associates, Hillsdale, NJ.

[17] Bach, J., Paul, S., and Jain, R. (1993), "A Visual Information Management System for the Interactive Retrieval of Faces," IEEE Transactions on Knowledge and Data Engineering, Vol. 5, No. 4, pp. 619-628.

[18] Gupta, A., Weymouth, T., and Jain, R. (1991), "Semantic Queries with Pictures: The VIMSYS Model," Proc. of the 17th International Conference on Very Large Data Bases, Barcelona, Spain, pp. 69-79.

[19] Robertson, S.E. and Sparck Jones, K. (1976), "Relevance Weighting of Search Terms," Journal of the American Society for Information Science, May-June, pp. 129-146.

[20] Bollmann, P. (1993), Unpublished manuscript, Department of Computer Science, Technical University of Berlin, Germany.

[21] Hirabayashi, F., Matoba, H., and Kasahara, Y. (1988), "Information Retrieval Using Impression of Documents as a Clue," Proc. of the ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 233-244.


No.  Semantic Attribute Name        Left Hand Pole   Middle         Right Hand Pole
1    Eyebrows Shape                 Arched           Twisted        Straight
2    Eyebrows Width                 Thick/Bushy      Thin           Smooth
3    Eyelids                        Close Set        Squinting      Bulging
4    Nose Protrusion                Small            Upturn         Bulging
5    Nose Width                     Small            Large          Wide Base
6    Jaw Line Indentation           None             Shallow        Deep
7    Forehead Width                 Narrow           Medium         Wide
8    Hair Length                    Short            Medium         Large
9    Ears Shape                     Close            Normal         Protruding
10   Hairline Position              Normal           Receding       Bald
11   Face Shape                     Oval             Square         Round
12   Lip Thickness                  Thin             Medium         Large
13   Cheek Bones                    Lean             Round          Chubby
14   Lower Jaw Shape                Pointed          Trapezoidal    Round
15   Distance Between Eyebrows      Wide             Normal         Continuous
16   Pupil Distance                 Wide             Centered       Close Set
17   Nasal Lines                    None             Light          Prominent
18   Skin Tone                      Olive            Ruddy          Dark
19   Nasal-Lip Distance             Small            Medium         Large

Table 1: Semantic Attributes of Human Face Image Database


      S1  S2  S3  S4  S5  S6  S7  S8  S9  S10 S11 S12 S13 S14 S15 S16 S17 S18
S2     2
S3     2   1
S4     1   2   3
S5     1   1   3  13
S6     1   1   2   1   0
S7     0   1   1   1   2   0
S8     3   3   0   4   3   6   0
S9     3   3   0   5   1   2   0   5
S10    2   1   0   0   0   1   1   2   4
S11    1   2   2   6   3   1   0   6   0   1
S12    2   1   3   7   5   2   0   5   3   1   6
S13    2   3   3   6   4   3   1   1   1   0   4   1
S14    0   1   2   2   3   2   1   4   1   0   7   4   4
S15    1   1   1   0   1   1   2   4   1   0   1   1   1   1
S16    2   2   2   4   4   1   4   1   1   0   1   2   2   3   2
S17    2   3   2   0   0   4   1   7   1   1   1   2   4   0   2   1
S18    2   3   2   7   8   2   1   2   1   1   2   7   3   1   3   3   1
S19    3   2   0   1   1   3   1   6   4   1   1   2   1   1   2   1   5   0

Table 2: EMIM Values Between Human Face Image Database Semantic Attributes
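The EMIM values in Table 2 quantify pairwise association between attribute columns of the repertory grid. A minimal sketch of an expected-mutual-information computation over two discretized attribute columns follows; it uses the standard EMIM form, sum over joint states of p(x, y) ln(p(x, y) / (p(x) p(y))), and makes no claim about the authors' exact normalization or the scaling that produced the integer values above:

```python
import math
from collections import Counter

def emim(col_a, col_b):
    """Expected mutual information between two attribute rating columns."""
    n = len(col_a)
    pa = Counter(col_a)            # marginal counts for attribute A
    pb = Counter(col_b)            # marginal counts for attribute B
    pab = Counter(zip(col_a, col_b))  # joint counts
    total = 0.0
    for (x, y), c in pab.items():
        pxy = c / n
        # p(x,y) * ln(p(x,y) / (p(x) * p(y))), with counts converted to probabilities
        total += pxy * math.log(pxy * n * n / (pa[x] * pb[y]))
    return total

a = [1, 1, 2, 2]
print(emim(a, a))             # identical columns: ln(2) ≈ 0.693
print(emim(a, [1, 2, 1, 2]))  # independent-looking columns: 0.0
```

Attribute pairs with high EMIM (e.g., S4 and S5 in Table 2) are candidates for dependency analysis of the grid.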


Database Image Set (D) ← {f1, f2, f3, ..., fn}
Binary Semantic Attribute Set (A) ← {a1, a2, a3, ..., ak}
Rb[i, j] denotes the (binary) value of the semantic attribute ai in the image fj, where 1 ≤ i ≤ k and 1 ≤ j ≤ n

/* 1. User specifies an initial query Q */
Q ⊆ A

/* 2. Display images retrieved into F based on perfect match on attributes in Q */
F ← { fj | ∀ai ∈ Q, Rb[i, j] = 1 }

/* 3. Obtain user relevance judgments (P) on F, specified as preference relations of the form fr ≻ fs */
P ← { (fr ≻ fs) }

/* 4. Calculate Attribute Relevance Weights (ARWi, 1 ≤ i ≤ k) */
Calculate ARWi as:
    S+ ← 0, S− ← 0
    ∀p = (fr ≻ fs) ∈ P:
        if Rb[i, r] > Rb[i, s] then S+ ← S+ + 1
        elseif Rb[i, r] < Rb[i, s] then S− ← S− + 1
    ARWi ← ln(S+ / S−)

/* 5. Compute Retrieval Status Values (RSVt, 1 ≤ t ≤ n) for each image in the database */
Calculate RSVt as:
    RSVt ← 0
    ∀ai ∈ A: if Rb[i, t] = 1 then RSVt ← RSVt + ARWi

/* 6. Rank order database images (D) on RSV values and display, such that RSVl ≥ RSVm for 1 ≤ l < m ≤ n */
If desired, elicit additional preference relations and add them to P; then repeat steps 4-6. Otherwise, terminate.

Figure 2: Strategy for RSA based on ARW Computation


Figure 3: RSA Query Specification Window


Figure 4: User Preference Relations Specification Window

