Information Processing & Management, Vol. 33, No. 4, pp. 427-452, 1997 © 1997 Elsevier Science Ltd. All rights reserved. Printed in Great Britain 0306-4573/97 $17.00 + 0.00

Pergamon PII: S0306-4573(97)00007-1

MODELING AND RETRIEVING IMAGES BY CONTENT

VENKAT N. GUDIVADA¹* and VIJAY V. RAGHAVAN²

¹Computer Science Department, University of Missouri--Rolla, Rolla, MO 65409, U.S.A. and ²The Center for Advanced Computer Studies, University of Southwestern Louisiana, Lafayette, LA 70504, U.S.A.

(Received 2 October 1995; accepted 27 January 1997)

Abstract--A content-based image retrieval (CBIR) system is required to effectively utilize information from image databases. Content-based retrieval is characterized by the ability of the system to retrieve relevant images based on their visual and semantic contents rather than by using atomic attributes or keywords assigned to them. In this paper, we provide a taxonomy for approaches to image retrieval and describe their characteristics and limitations. We examined a number of image database applications to discover their retrieval requirements and to structure the requirements from a domain-independent perspective. This study enabled us to provide a taxonomy for image attributes and to propose a number of generic query operators. These operators are adequate to realize CBIR in a number of diverse applications. We propose a novel system architecture for CBIR that supports the generic query operators. The architecture is structured in a way that enables applications to inherit only those query operators that are useful in the domain. We have developed a partial prototype implementation of this architecture. The versatility and effectiveness of the architecture are demonstrated by configuring the prototype implementation for two image retrieval applications: a realtors' information system and a face retrieval system. The first application is for real estate marketing and the other is for law enforcement and criminal investigation. © 1997 Elsevier Science Ltd

1. RETRIEVING IMAGES BY CONTENT

Images are produced at an ever-increasing rate through defense and civilian satellites, military reconnaissance and surveillance flights, fingerprinting and mug-shot capturing devices, scientific experiments, biomedical imaging, and home entertainment systems. As an example, NASA's Earth Observing System (EOS) is projected to receive one terabyte of image data per day when fully operational. The image retrieval (IR) problem is concerned with retrieving images that are relevant to user requests from a large collection of images, referred to as the image database. Application areas that consider IR a principal activity are both numerous and diverse: art galleries and museums, architectural and engineering design, interior design, remote sensing and management of earth resources, geographic information systems, scientific databases, weather forecasting, retailing, fabric and fashion design, trademark and copyright, law enforcement and criminal investigation, and picture archiving and communication systems. With the recent ubiquitous interest in multimedia systems, IR has attracted the attention of researchers across several disciplines.

A content-based image retrieval (CBIR) system is required to effectively utilize information from the image databases. Content-based retrieval is characterized by the ability of the system to retrieve relevant images based on their semantic and visual contents rather than by using atomic attributes or keywords assigned to them. The relevance of retrieved images may be judged differently by the system users for an identically formulated query. That is, the notion of relevance is dynamic, subjective, and is a function of both the user's retrieval need and context. This necessitates that a CBIR system be adaptive and process queries from the viewpoint of the user's interpretation of the image content and domain semantics. As it is difficult to provide a precise definition of CBIR, we illustrate the concept with a few examples.

• Remote sensing application: given the query 'show me all geographic regions where the tropical rain forest has been destroyed by 50% in the last 10 years', the system retrieves all images that contain the qualifying regions. The retrieved images are ranked in the order of their similarity to the query.
• Medical application: given a radiological image of a patient, the system retrieves and rank orders images with similar features and patterns from large archives of radiological images. This helps a physician in diagnosis and treatment planning and is especially useful in training interns.
• Trademark and copyright application: given a trademark symbol, the system retrieves and rank orders similar trademark symbols in the database.
• Architectural design/real estate marketing: given a floor plan of a residential building, the system retrieves similar floor plans.
• Criminal investigation/security applications: given a sample mugshot image, the system retrieves similar images in the database.
• Defense application: given an ATR (automatic target recognition) image, the system retrieves similar images in the ATR database.

* To whom all correspondence should be addressed.

Interdisciplinary interest in CBIR has contributed research results on various aspects of the problem (Grosky & Mehrotra, 1992). However, the investigations leading to these results were carried out quite independently and often from application-specific concerns. As a consequence, the systems for CBIR have become limited in the range of application areas that are supported and address only those aspects of the problem that are relevant in a specific domain. In this paper, we provide an overview of various approaches to image retrieval from an evolutionary perspective (Section 2). We examine the conceptual issues of CBIR in Section 3. This helps in understanding the various facets of CBIR and has shaped our proposed framework for CBIR. We refer to this framework as CBAIR (content-based adaptive image retrieval), and it is described in Section 4. CBAIR supports several generic query operators in a domain-independent manner. These operators are as fundamental to CBIR as relational algebra is to the relational DBMS (database management system). The operators are adequate to realize CBIR in a number of applications. An image database designer can configure the system for various applications by selecting only those operators that are necessary in the application. This helps to produce a tailor-made system without inheriting all the overhead associated with CBAIR. We have partially implemented the CBAIR architecture and demonstrated its versatility and effectiveness by configuring it for two image retrieval applications: a realtors' information system and a face retrieval system (Section 5). The first application is for real estate marketing and is intended to demonstrate primarily the retrieval by spatial similarity query operator. The second application is for law enforcement and criminal investigation and demonstrates the retrieval by subjective attributes query operator. Section 6 concludes the paper.

2. AN OVERVIEW OF APPROACHES TO IMAGE RETRIEVAL

To provide a taxonomy for and perspective on systems for image retrieval, we first identify a number of facets or dimensions of IR systems. They include image data model, mode of feature extraction and semantics capture, when features and semantics are derived, level of abstraction manifested in the features, expressiveness and intuitiveness of the query language, how the user subjectivity in the interpretation of image content is handled, degree of domain independence, and system architecture. An image data model (IDM) is a scheme for representing entities of interest in the images,


their visual and geometric characteristics such as color, texture, shape, spatial and topological relationships between the entities, and semantic associations such as aggregation and generalization/specialization among the entities. An image retrieval model (IRM) encompasses the specification of the following: an image data model, a language for query specification, and matching or retrieval strategies for processing user queries. An image retrieval system (IRS) provides convenient and efficient access to image collections by implementing the image retrieval model. Since most image databases allow only queries and no update operations, concurrency control and transaction processing issues are not important, at least for the current generation of IRSs. An image database management system (IDBMS) is an IRS with additional functionality for concurrency control and transaction management. Features refer to image characteristics that can be used to describe the visual contents of an image. Semantics, on the other hand, denote high-level domain concepts manifested in the image. Features are more amenable to automatic extraction than semantics. Depending on the complexity of the images, features can be extracted with no manual intervention, or may require limited manual intervention. The first case is referred to as automatic mode whereas the second is termed semi-automatic. Though automatic feature extraction is appealing for developing large-scale image retrieval systems, the state of the art in image processing and pattern recognition has not advanced enough to make this a reality. Currently, automatic feature extraction is limited to narrow domains where the types of objects that occur in the images are known in advance and the images are generated in a controlled environment (Daneels et al., 1993).
Typically, deriving semantics requires considerable human involvement, and semantics are synthesized by using domain-specific transformations under the guidance of a domain expert. Features and semantics can be derived at the time an image is inserted into the database (termed a priori), or at the time of query processing (termed dynamic). Extracting features a priori facilitates efficient online query processing, though it limits the scope for ad hoc queries. The level of abstraction manifested in the features refers to the suitability of the features for direct use in query processing. When the level of abstraction is low (e.g., slope of a line segment), the features are not useful for direct use in processing user queries. On the other hand, highly abstracted features (e.g., polygonal approximation of an object in the image) lend themselves to use in query processing. The level of abstraction in the features is best viewed as a continuum spanning the raw or physical image (unstructured data) at one end and a highly abstracted image (structured data) at the other end. Unstructured data is also referred to as byte string, long field, and BLOB (binary large object). The term semi-structured data is used to refer to data that falls between unstructured and structured data. It should be noted that only unstructured data is in a format suitable for visual display of the image on output devices; all other abstractions or representations exist only to facilitate CBIR. The next facet, expressiveness and intuitiveness in the query language, deals with the naturalness and ease of use of the language employed for querying image databases. As we will see in Section 3, there are several query operators, each of which requires a specification scheme that is natural to its intrinsic nature. In contrast with queries in a DBMS environment, image retrieval queries tend to be incomplete and imprecise, and require incremental specification and refinement.
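The abstraction continuum described above, from raw pixels through low-level features to semantics, can be made concrete with a small sketch. The class and field names below are illustrative assumptions for exposition only; no system described in this paper uses this exact representation.

```python
from dataclasses import dataclass, field

@dataclass
class ImageRecord:
    """Illustrative image record spanning the abstraction continuum."""
    # Unstructured data: the raw image, suitable only for visual display.
    pixels: bytes = b""
    # Low-level features: not directly usable in query processing.
    edge_slopes: list = field(default_factory=list)
    # Highly abstracted features: directly usable in query processing.
    object_polygons: dict = field(default_factory=dict)  # label -> vertex list
    # Semantics: high-level domain concepts, usually supplied by a human.
    keywords: list = field(default_factory=list)

# Features derived a priori (at insertion time) support fast online queries;
# deriving them dynamically (at query time) widens the scope for ad hoc queries.
img = ImageRecord(keywords=["floor plan", "residential"],
                  object_polygons={"kitchen": [(0, 0), (4, 0), (4, 3), (0, 3)]})
print("kitchen" in img.object_polygons)  # query against abstracted features
```

A query such as "rooms adjacent to the kitchen" could only be answered from the abstracted polygons, not from the raw byte string, which is the point of the continuum.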
The system users may differ in the interpretation of the image content. As noted earlier, relevance of retrieved images may be judged differently by the system users for an identically formulated query. The user subjectivity in the interpretation of the image content facet looks into how the user subjectivity is incorporated into the query processing task. The degree of domain independence facet evaluates how generic the retrieval system is. In other words, it is a measure of the effort required to adapt the system for various applications. Finally, the system architecture facet describes implementation aspects of IR systems including persistent storage of the images, features, and semantics; and control and coordination among various functional components of the system. Our proposed taxonomy captures the chronology of developments and provides an evolutionary perspective on IR systems. The taxonomy identifies five approaches to image retrieval based primarily on the architecture used for implementing the system. In the first


approach, conventional data management systems have been used whereas in the second, database functionality has been added to an image processing and pattern recognition system. The third approach exploits the features of extensible and object-oriented database systems, and the fourth incorporates adaptive behavior to the retrieval system via user relevance feedback and iterative query refinement. The approaches that do not fall into the above categories are grouped under the fifth approach. In the following five subsections, we describe characteristics of these approaches in terms of the facets described above. We end the section with a summary.

2.1. Image retrieval architecture based on conventional data management systems

Conventional data management systems¹ are primarily designed for commercial and business applications where the data to be managed is of structured type. However, this has been by far the most popular approach for implementing image retrieval systems, especially the earlier ones. The system architecture for image retrieval using this scheme is shown in Fig. 1. An image is represented by a set of keywords, or attribute-value pairs. Keywords represent high-level domain concepts (i.e., semantics) manifested in the images. Attributes represent structured data that are derived extrinsic to the image contents, such as the date of acquisition and resolution of the image. Attributes are also used to represent data that are derived from the images through either automatic image interpretation or human involvement. The mode of acquisition of keywords or attributes is not a concern to the system, since the emphasis is on structuring and providing access to this information. Keywords and attributes are derived at the time an image is inserted into the database. The actual images are not managed by the DBMS, since the latter does not have the provision for storing the images in database record fields. The images are stored using the file system of the host computer. However, both the image keywords and attributes are stored in the database (Fig. 1). Queries are specified using the keywords or attribute-value pairs. Typically, the query language used is that of the host DBMS such as SQL (structured query language). The retrieval strategy is also essentially that of the host DBMS. Once a decision is made as to which images are to be retrieved, the corresponding image files are accessed and displayed. We refer to this

Fig. 1. Image retrieval architecture based on conventional data management systems.

¹Conventional data management systems are those that are based on the three classical data models, namely: hierarchical, network, and relational.


type of retrieval as attribute-based image retrieval. The primary advantage of this approach has been the cost effectiveness of the system, since all the DBMS functionality is readily available. The approach provides domain independence since feature extraction and semantics capture are outside the scope of the retrieval system. However, the cost effectiveness, domain independence, and simplicity of the approach bring with them a host of problems. Since the images are represented by a set of keywords or attribute-value pairs, the level of abstraction in the image representation is quite high. Generally, the higher the level of abstraction, the smaller the scope for posing ad hoc queries. In other words, the data model is too simple to capture the rich visual and semantic content manifested in the images. The host query language is inadequate in its expressiveness and naturalness for specifying image queries. The issue of user subjectivity is avoided altogether in this approach. Attribute-based retrieval is advocated and advanced primarily by DBMS specialists. Representative systems in this category include GRAIN (Chang et al., 1977), a system by Tang (1981), ADM (Takao, Itoh & Iisaka, 1980), AQL (Antonacci et al., 1980), GEO-QUEL (Berman & Stonebraker, 1977), GADS (Mantey & Carlson, 1980), and HI-MAP (Tanaka & Ichikawa, 1988).
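Attribute-based retrieval is simple enough to sketch end to end: metadata lives in the DBMS while the images themselves stay in the host file system. The schema, file paths, and attribute values below are illustrative assumptions, not drawn from any of the cited systems.

```python
import sqlite3

# Only keywords and attribute-value pairs are stored in the DBMS;
# the image files themselves remain in the host file system.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE images (path TEXT, acquired TEXT, resolution TEXT)")
db.execute("CREATE TABLE keywords (path TEXT, keyword TEXT)")
db.executemany("INSERT INTO images VALUES (?, ?, ?)", [
    ("/img/landsat_001.raw", "1995-10-02", "30m"),
    ("/img/landsat_002.raw", "1996-03-14", "30m"),
])
db.executemany("INSERT INTO keywords VALUES (?, ?)", [
    ("/img/landsat_001.raw", "rain forest"),
    ("/img/landsat_002.raw", "desert"),
])

# The query language and retrieval strategy are simply those of the host DBMS.
rows = db.execute(
    "SELECT i.path FROM images i JOIN keywords k ON i.path = k.path "
    "WHERE k.keyword = ? AND i.acquired >= ?",
    ("rain forest", "1995-01-01")).fetchall()
print(rows)  # the matching files would then be read from the file system
```

Note that nothing in this query touches the image content itself, which is exactly the limitation the text identifies: ad hoc content queries are impossible once the representation is reduced to keywords and attributes.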

2.2. Image retrieval architecture based on an image processing and pattern recognition system

The system architecture for image retrieval based on this approach is shown in Fig. 2. A distinctive characteristic of systems in this category is that an array of both general purpose and domain-specific image processing and pattern recognition operations are provided as an integral part of the system. There are two distinct image data models associated with the systems in this category: one for the structured data (primarily intended for users who are not domain experts)

Fig. 2. Image retrieval architecture based on an image processing and pattern recognition system.


and the other for the unstructured data (intended for domain experts). The data about an image that is extrinsic to the image (e.g., date of image acquisition, spectral channel number) is stored as structured data in the header portion of the image file. Additional structured data can include the results of automatic image interpretation, and attributes derived through human involvement. The data model then is simply a set of keywords or attribute-value pairs. User queries are specified by using a set of system-specific commands. For example, in the ELAS system (ELAS, 1990), commands exist for retrieving LANDSAT images based on attribute values such as the date of image acquisition, geographic area represented by the image, and spectral channel number, among others. The system-specific commands are implemented by enhancing the file system of the host computer. Alternatively, the structured data can be managed by using a conventional DBMS. User queries are specified by using the query language of the DBMS (e.g., SQL). The retrieval strategy is that of the DBMS. The model employed for the unstructured data is primarily one of the two fundamental physical representations: raster or vector. Queries are specified by using a sequence of system-specific commands. A command typically specifies an image processing or pattern recognition operation to be performed on the image(s). The user then determines the relevance of the image to his need by manual interpretation. It is expected that the users are domain experts who are familiar with the image processing and pattern recognition capabilities of the system. The structured data that is derived extrinsic to the image is manually generated and entered into the system at the time of inserting the image. However, the structured data obtained via semi-automatic or manual interpretation of the image by the domain experts or analysts is added incrementally.
Typically, several features and semantics are generated by various analysts by interpreting the image from different problem perspectives over a period of time. The level of abstraction manifested in the features ranges from low-level to highly abstract. The query language is very expressive but less intuitive, and is suitable only for domain experts. The subjectivity problem is not a major issue since the features are derived using well-established algorithms coupled with human interpretation. In other words, the features are less subjective and therefore no considerable imprecision or uncertainty is associated with the feature values. The systems tend to be highly domain-specific since they are so tightly integrated with the image processing and pattern recognition aspects of the domain. This architecture is primarily advanced by image processing and pattern recognition researchers. Representative systems in this category include MIDAS (McKeown & Reddy, 1977), REDI (Chang & Fu, 1980a), IMAID (Chang & Fu, 1980b), PICDMS (Chock, 1982; Joseph & Cardenas, 1988; Cardenas et al., 1993), VICAR-IBIS (Bryant & Zobrist, 1977), ELAS (ELAS, 1990), PDB (Maccarone & Tripiciano, 1993), I-See (Fierens et al., 1992), ERDAS (ERDAS, 1995), GRASS (GRASS, 1995), ARC/INFO (Morehouse, 1985), and SAND (Aref & Samet, 1991a,b).

2.3. Image retrieval architectures based on extensible and object-oriented database systems

The nature and the type of abstractions required to support CBIR vary from one application to another. Thus, each application is forced to choose abstractions that efficiently support the retrieval operations prevalent in the domain. This precludes designing and building a general-purpose IR system based on the conventional data models since they have limited data types. In response to this, some researchers have investigated developing IR systems based on extensible and object-oriented database systems. The basic idea behind extensibility is to provide facilities for the image database designers to define their own application-specific data types and operations on them. An extensible database system must support at least the facility for abstract data types, whereas extensibility is a central feature of object-oriented database systems. The architecture for systems under this category is shown in Fig. 3. Extensible and object-oriented data models provide the most flexibility for CBIR. Image features and semantics can be represented by using new features such as set-type attributes, procedural fields, binary large objects, and abstract data types. The query specification language of the extensible or


Fig. 3. Image retrieval architectures based on extensible and object-oriented database systems.

object-oriented DBMS is extended to include these new features. Unlike the previous two approaches, extensible and object-oriented database systems provide an integrated mechanism to store physical images, their features and semantics within the database system. This helps to reap the benefits of the database approach--convenient sharing, controlled redundancy, transaction support, concurrent access, security and authorization. Moreover, the object-oriented principles entail modular system development, easier maintenance, and evolution of the system. Features and semantics are derived at the time of image insertion into the database to ensure online query processing. Whether or not the features are automatically derived depends on the complexity of the image and the nature of the feature. For example, color histograms of an image can be automatically generated, whereas extracting object boundaries and assigning semantic labels to the objects is done semi-automatically. Though object-oriented database systems have been actively investigated for more than ten years, an intuitive and easy to use query language is yet to emerge. In the case of extensible database systems, the emerging SQL/3 standard provides several features for specifying options for storing the images and to formulate queries on image contents represented as structured data. Though the subjectivity problem has not been addressed in this approach, the system architecture provides a natural means to accommodate it. Compared to the previous approach, the domain dependence problem is somewhat alleviated since some aspects of the system are application invariant. 
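The color histogram mentioned above is the canonical example of a fully automatic feature: it can be computed and compared with no manual intervention. The sketch below uses histogram intersection as the similarity measure, one common choice in the literature; the quantization granularity and pixel data are illustrative assumptions.

```python
def color_histogram(pixels, bins=4):
    """Quantize (r, g, b) pixels into bins**3 buckets and count occupancy."""
    hist = [0] * (bins ** 3)
    for r, g, b in pixels:
        q = ((r * bins // 256) * bins * bins
             + (g * bins // 256) * bins
             + (b * bins // 256))
        hist[q] += 1
    return hist

def histogram_intersection(h1, h2):
    """Normalized overlap in [0, 1]; 1.0 means identical color distributions."""
    return sum(min(a, b) for a, b in zip(h1, h2)) / max(1, sum(h2))

# Two mostly-red images with slightly different pixel values still match well,
# because quantization maps nearby colors to the same bucket.
reddish = [(200, 10, 10)] * 90 + [(10, 200, 10)] * 10
also_red = [(210, 20, 20)] * 80 + [(10, 200, 10)] * 20
sim = histogram_intersection(color_histogram(reddish), color_histogram(also_red))
print(sim)  # prints 0.9
```

Contrast this with extracting object boundaries and assigning semantic labels, which, as the text notes, still requires a human in the loop.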
Representative investigations in this direction include QUEL as a datatype (Stonebraker, Rubenstein & Guttman, 1983), GEM (Zaniolo, 1983), AIM-P (Dadam et al., 1986), PSQL (Roussopoulos, Faloutsos & Sellis, 1988), POSTGRES (Stonebraker & Rowe, 1987), Starburst (Schwarz et al., 1986), GRAL (Güting, 1989), PROBE (Manola & Orenstein, 1986; Orenstein & Manola, 1988), GENESIS (Batory et al., 1987), and EXODUS (Carey, DeWitt & Vandenberg, 1988). On the commercial front, both Oracle and Informix corporations claim to have incorporated multimedia information retrieval capabilities into their yet-to-be-released universal database servers. The idea behind the universal server concept is to provide an integrated database engine to store and retrieve multimedia data. Informix's universal server is based on object-relational technology. The server features a central database, and relies on various data blades for multimedia data management. For example, a text data blade features full functionality for storing and retrieving text. Informix claims that hundreds of data blades will seamlessly cooperate and interact with the central database to collectively provide multimedia data management services. However, the true challenge lies in establishing cross-media correlation to realize CBIR.


2.4. Architecture for adaptive image retrieval systems

The systems in this category are adaptive in one or more of the following ways. They recognize that retrieval users may differ in their assessment of the relevance of a set of retrieved images for an identically formulated query (i.e., the subjectivity problem). Some systems are capable of dynamically deriving new features and semantics on the fly. Other systems provide a generic framework that enables the system to learn to derive features automatically after initial training. Therefore, learning and adaptation are central to the systems in this category (Fig. 4). One way to deal with the subjectivity problem is to provide a flexible query specification that accommodates incomplete and imprecise queries. The query processor then iteratively executes a query by incrementally learning the interpretation of images from the perspective of individual users by soliciting relevance feedback on the initial retrieval results. New features can be dynamically derived by invoking feature extraction modules during query processing time. Furthermore, user guidance is required to derive complex features by applying transformations on several low-level features. The characteristics of various facets of adaptive image retrieval systems are similar to those based on extensible and object-oriented database architectures. However, the architecture for the former has an in-built mechanism for dynamic feature extraction and for dealing with the

Fig. 4. Architecture for adaptive image retrieval systems.


subjectivity problem. Therefore, the adaptive image retrieval architecture has more domain independence. There is no standard query specification scheme or language, and most systems employ a graphical interface for query specification. We now briefly describe a few investigations that embrace the adaptive architecture. The system that we have developed belongs to this category and is described in Section 4. VIMSYS (visual information management system) is a multi-layer image data model based on an extended object-oriented data model (Gupta, Weymouth & Jain, 1991). An image interpretation system is an integral part of VIMSYS. Support for modeling image sequences is provided. Sequences can be based on spatial contiguity of images or on images of the same geographic region recorded at different time points. An application of the VIMSYS data model for interactive retrieval of face images is described in Bach, Paul & Jain (1993). MOODS (modeling of object oriented data semantics) is an automated image data modeling system that enhances object-oriented data modeling in three ways: dynamic data semantics, abstract function groups, and dynamic inheritance (Griffioen, Mehrotra & Yavatkar, 1993). Dynamic data semantics facilitates modification to the system functionality as the semantic meaning of data evolves over time. Abstract function groups allow defining domain objects in terms of abstract operations rather than specific image processing functions. The ability to combine semantically different data types into a new data type at run-time is enabled by the dynamic inheritance feature. CORE (content-based retrieval engine) is a generic engine for content-based retrieval of images (Wu et al., 1995). CORE features multiple feature extraction methods, automated content-based indexing using self-organizing neural networks, and fuzzy retrieval. The system has been used to develop two applications: computer-aided facial image inference and retrieval, and trademark archival and retrieval.
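The relevance-feedback loop that characterizes adaptive systems (retrieve, solicit judgments, re-weight the query, retrieve again) can be sketched with a Rocchio-style update over feature vectors. The formula, weights, and toy vectors below are illustrative assumptions; none of the systems cited in this section necessarily uses this exact update rule.

```python
def rocchio_update(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.25):
    """Move the query vector toward images judged relevant, away from the rest."""
    dim = len(query)
    def mean(vs):
        return [sum(v[i] for v in vs) / len(vs) if vs else 0.0 for i in range(dim)]
    r, n = mean(relevant), mean(nonrelevant)
    return [alpha * q + beta * r[i] - gamma * n[i] for i, q in enumerate(query)]

def score(query, image):
    """Simple dot-product match between query and image feature vectors."""
    return sum(q * f for q, f in zip(query, image))

images = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
query = [0.5, 0.5]  # the initial, imprecise query treats both axes equally

# Suppose the user marks "a" and "b" relevant and "c" non-relevant:
query = rocchio_update(query, [images["a"], images["b"]], [images["c"]])
ranking = sorted(images, key=lambda k: score(query, images[k]), reverse=True)
print(ranking)  # prints ['a', 'b', 'c']
```

Each iteration of this loop captures one user's subjective interpretation, which is how the subjectivity problem is accommodated without changing the stored features themselves.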
An approach to image retrieval that partitions database images into clusters based on their visual resemblance is presented in Oommen and Fothergill (1993). The partitioning is performed adaptively on the basis of the statistical properties of the user's query patterns. The grouping of similar images is expected to make subsequent searches faster. Explicit computation of statistics is avoided by employing a learning automaton.

2.5. Miscellaneous approaches to image retrieval

In this section, we briefly describe various approaches to image retrieval that do not fall under any of the previous categories. The intent is to inform the reader about other investigations in the area. Cognition-based approaches to image retrieval are discussed in Hirabayashi, Matoba and Kasahara (1988) and Kato et al. (1991). The approach proposed in Hirabayashi, Matoba and Kasahara (1988) is based on the impressions that the images are expected to make on the user. Impressions are modeled as index terms. However, these index terms are different from traditional index terms. The intensity of index terms is represented using an interval scale. An image may contain several impressions, and each impression corresponds to an axis in the semantic space. Thus, images are represented as points in a multidimensional semantic space. Queries are expressed as subspaces by selecting ranges on the semantic axes. In the approach proposed in Kato et al. (1991), the retrieval model is based on both an image model and a user model. The image model represents image features that are significant for retrieval. The user model reflects the visual perception processes of the user. The cognitive model integrates both these models. Photobook is a system for retrieving images by content in certain domains (Pentland, Picard & Sclaroff, 1994). Features are represented as lists of coefficients and are automatically extracted using a two-stage process. During the first stage, portions of an image are transformed into a canonical coordinate system that preserves perceptual similarity. A lossy compression method (e.g., Karhunen-Loève transform, Wold decomposition) is then used to extract and code the important parts of the image from the first stage. Since each feature is tuned for a specific type of image content (e.g., shape, texture), several features may be required for an application. Search for similar images is based on the coefficient values.
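The impression-based scheme of Hirabayashi, Matoba and Kasahara, where images are points in a semantic space and queries are subspaces defined by axis ranges, can be sketched directly. The axis names, intensity scale, and image values below are invented for illustration and do not come from the cited work.

```python
# Each image is a point in a semantic space; each axis is an impression
# whose intensity is measured on an interval scale (here, 0 to 10).
AXES = ("warm", "calm", "formal")
images = {
    "sunset.jpg": (9, 7, 2),
    "office.jpg": (3, 4, 9),
    "meadow.jpg": (6, 9, 1),
}

def range_query(images, ranges):
    """A query is a subspace: a (lo, hi) range selected on each semantic axis."""
    return sorted(name for name, point in images.items()
                  if all(lo <= x <= hi for x, (lo, hi) in zip(point, ranges)))

# 'warm and calm, but informal' expressed as ranges on the three axes:
hits = range_query(images, [(5, 10), (6, 10), (0, 5)])
print(hits)  # prints ['meadow.jpg', 'sunset.jpg']
```

The retrieval decision here is purely geometric containment; ranking within the subspace (e.g., by distance from its center) would be a natural refinement but is beyond the scheme as summarized in the text.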
The On-Line Images (OLI) collection at the US National Library of Medicine features 60,000
images from the history of medicine project (Rodgers & Srinivasan, 1994). This collection is accessible through a World Wide Web (WWW) client. The client features a form-based interface and supports keyword and attribute queries. A system for selecting the images of earth's aurora from a large collection of DE-1 satellite images based on the events the images depict is described in Samadani, Han & Katragadda (1993). The querying process proceeds as follows. A small subset (i.e., training set) of images from the collection is selected and the system extracts various features that relate to shape, size, and intensity of the aurora with the interactive involvement of the user. These features are fed to a supervised decision tree classifier and the user indicates the relevance of the images in the training set to the query. Using the decision tree, the relevance of the images in the collection to the query is determined. ACORN, a system for learning knowledge about spatial relations from raster images, is described in Hiraki et al. (1991). Learned knowledge is represented with constrained programs. ACORN can generate scenes from spatial relational descriptions and can describe spatial relations in new image instances using the learned spatial knowledge. An image retrieval system based on relevance feedback techniques is proposed in Al-Hawamdeh et al. (1991). The initial query is stated in textual form. The user browses the retrieved pictures and provides relevance feedback to the system. The query is reformulated using the relevance feedback to improve the retrieval effectiveness. An extension of this work appears in Price, Chua & Al-Hawamdeh (1992). A grammar-based approach to IR is introduced in Rabitti & Stanchev (1987). An extension of this work has resulted in a prototype system referred to as GRIM-DBMS and the associated data model is based on attributed relational graphs (Rabitti & Stanchev, 1989).
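The relevance-feedback loop described above can be sketched with a Rocchio-style reformulation, in which the query vector moves toward relevant images and away from non-relevant ones. The vector form and the weighting constants are conventional defaults, not details of the cited system:

```python
# Rocchio-style query reformulation from relevance feedback.
# alpha/beta/gamma are conventional default weights, assumed here.

def reformulate(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    terms = set(query) | {t for v in relevant + nonrelevant for t in v}
    new_q = {}
    for t in terms:
        w = alpha * query.get(t, 0.0)
        if relevant:
            w += beta * sum(v.get(t, 0.0) for v in relevant) / len(relevant)
        if nonrelevant:
            w -= gamma * sum(v.get(t, 0.0) for v in nonrelevant) / len(nonrelevant)
        new_q[t] = max(w, 0.0)  # clip negative weights
    return new_q

q = {"sunset": 1.0}
rel = [{"sunset": 0.8, "beach": 0.6}]   # image the user marked relevant
non = [{"city": 0.9}]                   # image the user marked non-relevant
new_q = reformulate(q, rel, non)
print(new_q["beach"] > 0)   # the query now also favors "beach"
```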
Further extension and related work appears in Rabitti & Savino (1992). An approach to image retrieval based on region feature description is proposed in Yamamoto and Takagi (1988). Objects in the image are described as regions and region properties are defined at both global (i.e., image) and local (i.e., object) levels. Image queries are expressed either in textual form using keywords or in pictorial form using region icons. IIDS (intelligent image database system) is a prototype image database system (Chang et al., 1988) based on a spatial knowledge representation structure called the 2D-string (Chang, Shi & Yan, 1987). This system supports spatial reasoning and visualization, in addition to the traditional image database operations. An extension of this work appears in Chang, Jungert and Li (1989). A PDL (picture description language) based on the entity-attribute-relationship model is proposed in Leung, Hibler and Mwara (1992). The PDL allows for the logical description of objects, attributes of objects, relationships among objects, and events in the image. The logical descriptions are manually derived. An extension of this work appears in Hibler et al. (1992). Pictorial semantic networks are proposed as a scheme for representing pictorial knowledge in Lee (1988). Pictorial knowledge is grouped into three classes: angular, side, and angular-and-side. Angular pictorial knowledge is expressed in terms of only angles between line segments of geometric objects, and side pictorial knowledge is expressed in terms of only (lengths of) sides. Likewise, the angular-and-side representation involves both angles and sides. A transformation module is provided to transform knowledge among these three classes. The pictorial semantic network is searched to retrieve relevant images. FINDIT is an image retrieval tool for locating images that contain the object specified in the query (Swain, 1993).
Objects are specified by closed spline curves and retrieval algorithms are based on histogram matching and correlation on wavelet-encoded images. An approach to retrieving similar images from a NOAA satellite imagery collection is discussed in Kitamoto, Zhou and Takagi (1993). The notion of similarity is based on the shape and the spatial relationships among the objects (i.e., regions) in the satellite imagery. Images are modeled as attributed relational graphs and similarity is computed by graph matching. The feature graph is introduced in Wakimoto et al. (1993) as a logical structure for describing the contents of structured drawings such as plant diagrams. The concept of a functional block is introduced to retrieve similar drawings. An approach to IR based only on spatial relationships is proposed in Chang and Lee (1991).
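Histogram matching of the kind FINDIT relies on can be illustrated with a histogram intersection measure: similarity is the sum of bin-wise minima, normalized by the model histogram's mass. The tiny three-bin histograms below are invented for illustration:

```python
# Histogram intersection: similarity of an image histogram to a
# model (target object) histogram. The 3-bin data are illustrative.

def histogram_intersection(image_hist, model_hist):
    overlap = sum(min(a, b) for a, b in zip(image_hist, model_hist))
    return overlap / sum(model_hist)

model = [10, 20, 30]          # target object's color histogram
candidate_a = [10, 18, 25]    # close match
candidate_b = [30, 5, 2]      # poor match

print(histogram_intersection(candidate_a, model))
print(histogram_intersection(candidate_b, model))
```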
A scheme for conceptual indexing of images is discussed in Halin and Mouaddib (1992). A method for retrieving image sequences using motion information as a key is proposed in Ioka and Kurokawa (1992). QBIC (query by image content) is a CBIR system commercially available from IBM corporation (Flickner et al., 1995). Images are indexed by color, texture, and shape. Indexing is done partly in automatic mode and partly in semi-automatic mode. Shape representation and matching has been an active research topic in the image processing and computer vision disciplines over the years and plays a principal role in CBIR. Much of the work in this area has been carried out from the automatic object recognition point of view. Only recently has there been a focus on similarity-based shape matching and retrieval (Gary & Mehrotra, 1992; Jagadish, 1991; Jagadish & Bruckstein, 1992; Mehrotra & Gary, 1993, 1995; Faloutsos et al., 1994; Jain & Vailaya, 1996). However, many of these algorithms have high computational complexity. Furthermore, how well the algorithmic shape similarity of existing algorithms captures human perceptual similarity remains to be investigated. A taxonomy for shape description is proposed in Mehtre, Kankanahalli & Lee (1995). The description schemes are broadly classified into two categories: boundary and region based. Boundary methods are based on the outline or contour of the shape and completely ignore the interior information. On the other hand, region based methods consider both the contour and interior of the shape. Region based methods are further classified into transform and spatial domain representations. As the name implies, transform based methods represent a shape in terms of the result of a transformation on the shape. A further distinction is made in spatial domain representations: structural and geometric. Structural representations employ certain primitives and rules for shape description.
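Shape similarity in this literature is typically the Euclidean distance between shape feature vectors, with smaller distance meaning higher similarity. A minimal sketch; the feature vectors (e.g., circularity, eccentricity, a moment invariant) are invented:

```python
import math

# Rank database shapes by Euclidean distance of their feature
# vectors from the query's vector. Feature values are hypothetical.

def shape_distance(f1, f2):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))

query_shape = (0.9, 0.1, 0.4)
db_shapes = {"disk": (0.95, 0.05, 0.38), "bar": (0.2, 0.9, 0.7)}

ranked = sorted(db_shapes, key=lambda k: shape_distance(query_shape, db_shapes[k]))
print(ranked)  # nearest shape first
```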
Geometric schemes are based on the interior geometry of the shape and different representations are used based on whether or not shapes are occluded. Though these representations have existed in the image processing and computer vision literature for quite some time, they are only recently being reinvestigated from the shape similarity perspective. Similarity measures are typically based on the Euclidean distance between the features used for the shape representation. No conclusive evidence exists as to which representations are more effective than the rest, though an attempt has been made (Mehtre, Kankanahalli & Lee, 1995). Studies on how algorithmic shape similarity corresponds to human perceptual shape similarity are only beginning (Scassellati, Alexopoulos & Flickner, 1994).

2.6. Summary

Image retrieval systems have shown impressive sophistication as they evolved from architectures based on conventional database systems to architectures that intrinsically support semi-automatic feature extraction and address the subjectivity problem. The data model has evolved from simple relational to ones that support a spectrum of image abstractions ranging from unstructured to semi-structured to structured. Typically, the larger the number of abstractions supported on the spectrum, the greater is the scope for ad hoc and flexible queries. Though some systems promote the idea of dynamic feature extraction, it is less likely that such systems can scale up and provide online query processing for large image retrieval applications. True domain independence is an important goal to achieve; however, it remains elusive. This is because the type of features required and associated algorithms for query processing vary from one domain to another. Furthermore, the algorithms for automatic feature extraction tend to be domain-specific, computationally intensive, and not robust (Daneels et al., 1993).
Semiautomated approaches to feature extraction and semantics capture hold promise for developing large-scale systems. The architectures based on extensible and object-oriented database systems, and those that support adaptiveness, represent promising directions toward achieving domain independence. Until recently (Kunii, 1989), most of the languages for querying image databases were based on either SQL or QBE (query by example). However, SQL and QBE are not natural and intuitive for querying image data. They assume that the user is familiar with the database schema. Moreover, there are several query operators that facilitate CBIR and each operator requires a
specification scheme that is most natural to its intrinsic nature. Recent research on CBIR recognizes the need for synergy between attribute based and image interpretation approaches (Gudivada & Raghavan, 1995a). Toward this goal, the recent research efforts draw upon ideas from areas such as knowledge-based systems, cognitive science, user modeling, computer graphics, image processing, pattern recognition, data management systems, and information retrieval. In the following section, we discuss various conceptual issues that shaped our proposed framework for CBIR.

3. CONCEPTUAL ISSUES OF CONTENT-BASED IMAGE RETRIEVAL

Achieving a reasonable degree of domain independence was the overriding consideration in developing our framework for CBIR. Toward this goal, we studied the retrieval requirements of a number of image retrieval applications including art galleries and museums, interior design, architectural design, real estate marketing, and face information retrieval. In all these domains, there is a need for flexible and efficient content-based retrieval. The study enabled us to provide a taxonomy for image attributes and generic query operators. Regardless of the domain, an image can be thought of as a complex object or entity. That is, an image is composed of one or more domain objects. The domain objects themselves can be complex objects. Intuitively, a domain object is a semantic entity contained in the image which is meaningful in the application. For example, in the interior design application, various furniture and decorative items in an image constitute the domain objects. At the physical representation (e.g., bitmap) level, a domain object is defined as a subset of the image pixels. Image features and semantics facilitate CBIR. Though the level of abstraction manifested in a feature can best be viewed as a point on the spectrum ranging from unstructured to structured data (Section 2), for pragmatic reasons, we classify features into two broad classes: primitive and logical. Primitive features are low-level image features which can be extracted from the images automatically or semi-automatically. For example, the coordinates of the centroid and the minimum bounding rectangle of a domain object are primitive features. Usually, they are not used directly for querying the images since they are either cumbersome or not natural for expressing the query. Logical features, on the other hand, are abstract representations of images at various levels of detail.
They characterize various properties of an image and its domain objects, and associations among the domain objects. Some logical features can be synthesized from primitive features whereas other features can only be obtained through considerable human involvement. Consider a database of mug-shot images. Jawline indentation is a logical feature that assumes one of the three values: none, shallow, deep. Jawline indentation is a measure of the extent of shadow area under the lower lip and can be computed automatically using a rule base. In contrast, image semantics denote complex domain-specific relationships among the domain objects that are extremely difficult to compute automatically. An example of a semantic is snow covered mountain in a natural scene image.
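The jawline-indentation example can be sketched as a small rule base that maps a primitive measurement to one of the three logical values. The threshold numbers below are assumptions for illustration, not values from the paper:

```python
# Synthesizing a logical feature from a primitive one via rules:
# map a measured shadow-area ratio under the lower lip to the
# logical value none / shallow / deep. Thresholds are hypothetical.

def jawline_indentation(shadow_area_ratio):
    if shadow_area_ratio < 0.05:
        return "none"
    elif shadow_area_ratio < 0.15:
        return "shallow"
    return "deep"

print(jawline_indentation(0.02))   # none
print(jawline_indentation(0.10))   # shallow
print(jawline_indentation(0.30))   # deep
```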

3.1. A taxonomy for image attributes

We use the term attribute to refer to the logical features and semantics suitable for formulating user queries. A taxonomy for image attributes is shown in Fig. 5. Attributes are grouped into two categories: extrinsic and intrinsic. Extrinsic attributes denote the characteristics of an image (or a domain object) which can only be obtained externally. That is, extrinsic attributes cannot be derived from the image itself. For example, spectral channel number, date of image acquisition, and name of the satellite are a few extrinsic attributes for remotely sensed images. As another example, the name of the physician who made the diagnosis of a chest X-ray image is an extrinsic attribute. In contrast, intrinsic attributes are those that can be extracted from the image manually, automatically, or by a combination of both. Intrinsic attributes are grouped into three categories:

Fig. 5. A taxonomy for image attributes.

objective, subjective, and semantic. The interpretation of an objective attribute does not vary from one system user to another. For example, the number of bedrooms and the total floor area are two objective attributes of a residential floor plan image. Compared to subjective attributes (discussed below), objective attributes are more precise and do not require domain expertise to either identify or quantify them in new image instances. The interpretation of subjective attributes, on the other hand, may vary significantly from one system user to another. The range of values assumed by a subjective attribute is best viewed as spanning a spectrum characterized by a left hand pole (one extreme position) and a right hand pole (the other extreme position). A user's conceptualization of the value for a subjective attribute is then associated with a specific position on the spectrum. For example, in a mug-shot image database, the subjective attribute eyebrow shape may assume one of three values: arched, normal, straight. The value arched represents, say, the left hand pole, whereas the value straight represents the right hand pole. For a given image, the indexer² might have assigned the value normal for the attribute, whereas from the viewpoint of the user it should have been arched. Semantic attributes denote deeper domain semantics manifested in the images. Often, a semantic attribute corresponds to a group of domain objects. Semantic attributes are used to capture geometrical and topological properties, aggregation and class hierarchy relationships, and other semantics among the domain objects. The previous example of a snow covered mountain is a semantic attribute. Natural language text that describes an image is also a semantic attribute. Usually, it is not possible to extract semantic attributes automatically from the images.
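The pole-to-pole spectrum for subjective attributes can be sketched as a numeric position in [0, 1], with the discrete labels mapped onto it; the numeric mapping below is an assumption made for illustration:

```python
# A subjective attribute as a position on a spectrum between two
# poles (arched = 0.0, straight = 1.0). Disagreement between the
# indexer and the user is a distance on the spectrum. The numeric
# mapping is illustrative only.

SPECTRUM = {"arched": 0.0, "normal": 0.5, "straight": 1.0}

def disagreement(indexer_label, user_label):
    return abs(SPECTRUM[indexer_label] - SPECTRUM[user_label])

# Indexer said "normal", but the user expected "arched":
print(disagreement("normal", "arched"))   # 0.5
```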

3.2. A taxonomy for query operators

Based on the attribute taxonomy, we have found the following generic query operators important for facilitating CBIR (Gudivada & Raghavan, 1995a): retrieval by color, texture, sketch, shape, volume, spatial constraints, browsing, objective attributes, subjective attributes, sequence, keywords, natural language text, and domain concepts. Retrieval by color and texture queries enable retrieving images that contain domain objects with specified color and texture (Mehtre et al., 1995; Ogle & Stonebraker, 1995; Picard & Minka, 1995). Using retrieval by sketch, a user simply sketches an image of interest and expects the system to retrieve images in the database that are similar to the sketch (Faloutsos et al., 1994; Flickner et al., 1995). Retrieval by sketch can be thought of as retrieving images by matching the dominant edges. Retrieval by shape (Gary & Mehrotra, 1992; Faloutsos et al., 1994; Jagadish, 1991; Jagadish & Bruckstein, 1992; Jain & Vailaya, 1996; Mehrotra & Gary, 1993, 1995; Mehtre, Kankanahalli & Lee, 1995; Scassellati, Alexopoulos & Flickner, 1994) facilitates a class of queries that are based on the shapes of objects in an image; its counterpart for 3D images is referred to as retrieval by volume.

² A person or an automated tool used to extract the features.

Retrieval by spatial constraints deals with a class of queries that is based on spatial and topological relationships among the domain objects. These relationships span a broad spectrum, ranging from directional relationships to adjacency, overlap, and containment, and may involve a pair of objects or multiple objects. This query is subdivided into two subclasses: retrieval by spatial similarity (Gudivada, 1997; Gudivada & Jung, 1995; Gudivada & Raghavan, 1995b) and retrieval by topological relationships (Papadias et al., 1995). Retrieval by spatial similarity queries require selecting database images that satisfy the spatial relationships specified in the query to varying degrees. This degree of conformance is used to rank order the database images with respect to the query. In contrast, retrieval by topological relationships involves selecting those database images in which the domain objects exhibit the topological relationships specified in the query. Adding topological relationships to a spatial similarity query makes the latter more constrained. For example, a user query may specify that object A be to the left of object B (spatial relationship) and that objects A and B be adjacent (topological relationship). Retrieval by browsing is performed when the user is vague about his retrieval needs or is unfamiliar with the structure and the types of information available in the image database. The functionality of a browser may vary from providing very little help to the user in guiding the search process to sophisticated filtering controls to effectively constrain the search (Ahlberg & Shneiderman, 1994; Plaisant, Carr & Shneiderman, 1994). Various types of attributes can be employed to facilitate content-based browsing (Grosky et al., 1994). In retrieval by objective attributes, a query is formulated using objective attributes and is similar to retrieval in a conventional DBMS using SQL. Query processing is based on exact match on the attribute values.
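The example query above, object A to the left of object B and adjacent to it, can be checked against axis-aligned bounding boxes. The adjacency tolerance is a modeling assumption introduced here:

```python
# Check a combined directional (spatial) and topological constraint
# on bounding boxes given as (x1, y1, x2, y2). "Adjacent" is
# approximated by a small horizontal gap tolerance (an assumption).

def left_of(a, b):
    return a[2] <= b[0]          # A's right edge at or before B's left edge

def adjacent(a, b, tol=1.0):
    gap = b[0] - a[2]
    return 0 <= gap <= tol       # boxes nearly touch horizontally

A = (0, 0, 10, 10)
B = (10.5, 0, 20, 10)
print(left_of(A, B) and adjacent(A, B))   # True
```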
Subjective attributes are used in specifying retrieval by subjective attributes queries (Gudivada, Raghavan & Seetharaman, 1994; Jung & Gudivada, 1994, 1995). Executing this query operator requires that the query processor be adaptive by learning from the user interaction at query processing time. Retrieval by sequence queries facilitate retrieving spatio-temporal image sequences that depict a domain phenomenon that varies in (geographic) space or time (Gupta, Weymouth & Jain, 1991). For example, in geographic information systems, a user query might involve retrieving a sequence of images of a geographic area exhibiting certain soil and surface characteristics (e.g., regions of clay soil where cotton is grown). A set of images corresponding to the same space also comprises an image sequence. Such sequences depict a domain phenomenon that varies over time (e.g., growth of a plant's root system in a laboratory setting). In some applications, images tend to be quite distinct from each other both in structure and semantic content. Structuring this content to fit the (rigid) structure of a relation (relational data model) or class (object-oriented data model) is quite difficult. The notion of keyword or term³ from the information retrieval (Salton, 1989) area is useful for modeling such images. An image is modeled by a set of keywords which are representative of the image content. Typically, keywords are assigned manually. Note that a keyword is different from an attribute in that the former has no value associated with it. However, optionally a weight (usually in the range 0 to 1) can be associated with a keyword to indicate the degree of relevance of the keyword in describing the contents of the image. Queries are processed using information retrieval models ranging from the simple set-theoretic Boolean model to advanced algebraic models such as the vector space model (Salton, 1989).
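Weighted keyword retrieval under the vector space model can be sketched as cosine similarity between keyword-weight vectors. The keyword data below are invented for illustration:

```python
import math

# Vector-space keyword retrieval: images and queries are sets of
# weighted keywords; relevance is cosine similarity. Data invented.

def cosine(q, d):
    dot = sum(w * d.get(k, 0.0) for k, w in q.items())
    nq = math.sqrt(sum(w * w for w in q.values()))
    nd = math.sqrt(sum(w * w for w in d.values()))
    return dot / (nq * nd) if nq and nd else 0.0

db = {
    "img1": {"beach": 0.9, "sunset": 0.7},
    "img2": {"city": 0.8, "night": 0.6},
}
query = {"sunset": 1.0, "beach": 0.5}

ranked = sorted(db, key=lambda i: cosine(query, db[i]), reverse=True)
print(ranked)   # most relevant image first
```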
A retrieval by keywords query is composed by specifying keywords and optionally assigning weights to them. In applications such as photo-journalism, a caption or natural language text typically accompanies the image. The text usually describes the contents of the image, among other things (Flank et al., 1995; Srihari, 1995). Retrieving images from such applications is modeled by the retrieval by natural language text query operator. The retrieval engine employs natural language processing techniques (morphological, syntactic, and semantic analysis) to retrieve relevant images. Alternatively, retrieval by natural language text queries can be processed as retrieval by keyword queries. Keywords can be automatically extracted and weighted from the text using

³ Keyword and term are used interchangeably in the information retrieval literature. We use keyword since it has been widely used in the image retrieval literature.

automatic indexing methods (Salton, 1989). This method is preferred when the image collection is very large and interactive query processing is essential. The above query operators can be used as fundamental, primitive building blocks in formulating complex queries referred to as retrieval by domain concepts (Gudivada & Raghavan, 1995a; Ogle & Stonebraker, 1995). An algebra needs to be developed to specify complex queries in terms of the above query operators. Furthermore, an API (application programming interface) is required to embed the generic query operators in an application program written in a general purpose programming language such as C.
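A query algebra of the kind called for here could compose the primitive operators with set operations. The sketch below is hypothetical: the operator names, data, and composition API are invented, not taken from an actual CBIR system:

```python
# Hypothetical composition of generic query operators into a
# "retrieval by domain concepts" query: each operator returns a
# set of image ids, and set algebra combines them.

def by_objective_attributes(db, **conds):
    return {i for i, rec in db.items()
            if all(rec.get(k) == v for k, v in conds.items())}

def by_keywords(db, *words):
    return {i for i, rec in db.items()
            if set(words) <= set(rec.get("keywords", []))}

db = {
    "img1": {"bedrooms": 3, "keywords": ["brick", "ranch"]},
    "img2": {"bedrooms": 3, "keywords": ["colonial"]},
}

# Complex query = intersection of two primitive operators.
result = by_objective_attributes(db, bedrooms=3) & by_keywords(db, "brick")
print(result)   # {'img1'}
```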

4. CBAIR--A FRAMEWORK FOR CONTENT-BASED IMAGE RETRIEVAL

We have designed a data model which supports the above query operators and is referred to as CBAIR (content-based adaptive image retrieval) (Gudivada, Raghavan & Vanapipat, 1995). We have also developed a system architecture for image retrieval based on the CBAIR data model, shown in Fig. 6. CBAIR enables instantiation of domain-dependent aspects of an application and features a domain-independent part. The instantiable component comprises a rule base to synthesize various attributes in semi-automated mode and to encode the domain semantics (e.g., generalization/specialization hierarchy, thesaurus). The domain-independent part of the system comprises the following major components: feature extraction and semantics capture, query specification and retrieval, relevance feedback elicitation and query reformulation, browser, and query manager.

4.1. CBAIR system architecture

When a new image is inserted into the database, various domain objects present in the image are identified and labeled, and their primitive and logical features are extracted. Also, various associations among the domain objects are derived. We refer to these activities as feature

Fig. 6. CBAIR system architecture.


extraction and semantics capture (FESC). Automated approaches to FESC are highly desirable, since manual approaches are expensive and tedious. However, as noted in Section 2.6, automated approaches are computationally expensive, difficult, and tend to be domain-specific. A simple task such as detecting and labeling a balloon by using shape and color analysis techniques is not robust in the general case, let alone determining whether the balloon is in the setting of a wedding or a birthday party. The problem becomes worse when we deal with continuous tone images of natural scenes, which may contain weak contrast along object boundaries, spurious edges, and overlapping and occluding objects. To be useful across a range of domains, a CBIR system should have the capability to deal with images originating from diverse domains, where the types of objects present in the images are not known a priori; otherwise, the number of models required to recognize the domain objects becomes very large, and adding contextual information to these models results in extremely complex models that have little practical value. Therefore, our approach to FESC is a hybrid one--features for which robust image interpretation techniques exist are extracted automatically while the others are derived semi-automatically or manually. However, manual approaches to FESC introduce inconsistency and subjectivity problems. Inconsistency refers to the problem of describing the contents of similar images differently by the same person (i.e., the indexer). Subjectivity arises due to the differing interpretations of a feature by the indexer and the system user. Controlled vocabulary (Salton, 1989) and semi-automated tools will help alleviate the inconsistency problem. The subjectivity problem can be resolved dynamically through relevance feedback at query processing time (Gudivada, Raghavan & Seetharaman, 1994; Jung & Gudivada, 1994, 1995). The design of FESC is described elsewhere (Gudivada & Jung, 1996).
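The hybrid FESC policy can be sketched as a dispatcher that applies automatic extractors where robust ones exist and queues the remaining features for manual entry. The feature names and extractor functions below are invented for illustration:

```python
# Hybrid feature extraction: features with robust automatic
# extractors are computed; the rest are queued for an indexer.
# Feature names and extractors are hypothetical.

AUTOMATIC_EXTRACTORS = {
    "centroid": lambda img: (sum(x for x, _ in img) / len(img),
                             sum(y for _, y in img) / len(img)),
    "pixel_count": lambda img: len(img),
}

REQUIRED_FEATURES = ["centroid", "pixel_count", "scene_semantics"]

def extract_features(img):
    features, manual_queue = {}, []
    for name in REQUIRED_FEATURES:
        if name in AUTOMATIC_EXTRACTORS:
            features[name] = AUTOMATIC_EXTRACTORS[name](img)
        else:
            manual_queue.append(name)   # left for the human indexer
    return features, manual_queue

img = [(0, 0), (2, 0), (0, 2), (2, 2)]   # toy "object" as pixel coords
feats, todo = extract_features(img)
print(feats["centroid"], todo)
```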
Extracted features and semantics are persistently stored in the image database. The query specification and retrieval module features a consistent and intuitive interface and integrates the specification of various query operators in a way that is transparent to the user. As an example, consider the specification of retrieval by spatial similarity (RSS) and retrieval by text (RBT) queries. A user specifies an RSS query by placing the icons corresponding to the domain objects in a special window named sketch pad window (see Fig. 8). The sketch pad window provides both the graphic icons of the domain objects and the necessary mechanisms for selecting and placing the graphic icons for composing an RSS query. Spatial relationships among the icons in the sketch pad window implicitly indicate the desired spatial relationships among the domain objects in the images to be retrieved. In contrast, users typically specify RBT queries in natural language text. This illustrates the complexity involved in the design of query specification and retrieval module to seamlessly integrate various query specification schemes. We are currently investigating these issues. The relevance feedback elicitation and query reformulation module provides the functionality to elicit user relevance feedback and to interactively reformulate the query (Gudivada, Raghavan & Seetharaman, 1994; Jung & Gudivada, 1994, 1995). This module interacts with the following domain-instantiated knowledge: generalization/specialization hierarchies, rule bases, thesaurus, and external programs. The browser enables IR in an exploratory and learning mode and involves integrating several novel ideas (Hock, Siong & Chi, 1991; Plaisant, Carr & Shneiderman, 1994). The query manager module comprises two submodules: query operators/processing algorithms, and index manager. The processing algorithms facilitate efficient query processing and provide extensibility to CBAIR. This module is shown in Fig. 7. 
It encapsulates all the application independent logical features. Each logical feature is modeled as a class which consists of a structure and a set of associated methods to manipulate the structure. In the example shown in Fig. 7, we consider seven generic logical feature representations: natural language text, ΘR-string, color, texture, shape, subjective attributes, and objective attributes. These logical features are used to efficiently process various query operators. For example, the ΘR-string class is used to process retrieval by spatial similarity queries. Three applications are shown in the figure: systems for real estate marketing, law enforcement and criminal investigation, and photo journalism. These applications simply inherit the necessary logical feature classes to process queries that are prevalent in their domains. The extensibility feature

Fig. 7. Application independent logical feature representation in CBAIR.

of CBAIR facilitates incorporating new logical features for processing other query operators. The index manager provides two functions: filtering and access. The filtering function helps

Fig. 8. Sketch pad window for specifying spatial similarity queries.


identify those images that have the most potential for being relevant to the user's query (Gudivada, Sonak & Grosky, 1997). The access mechanism supports efficient retrieval of the logical and primitive features of the images identified by the filtering function from secondary and tertiary storage devices. Query processing algorithms are then applied only on this subset of images to compute their degree of relevance to the user query. Since CBAIR is generic, not all of its functionality is required for a given image retrieval application. We have structured the system in such a way that an image database application designer can keep only that functionality of CBAIR which is required in the application. That is, we can produce a tailor-made image retrieval system for an application from CBAIR without inheriting all the overhead associated with the latter.
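The filter-then-rank behavior of the index manager can be sketched as a two-stage pipeline: a cheap filtering step narrows the candidate set, and an expensive ranking step runs only on the survivors. The attribute data and the scoring function are invented for illustration:

```python
# Two-stage query processing: filter cheaply, then rank only the
# surviving candidates. Data and scoring are hypothetical.

db = {
    "img1": {"bedrooms": 3, "score_features": [0.9, 0.1]},
    "img2": {"bedrooms": 2, "score_features": [0.8, 0.2]},
    "img3": {"bedrooms": 3, "score_features": [0.4, 0.6]},
}

def filter_stage(db, bedrooms):
    return [i for i, r in db.items() if r["bedrooms"] == bedrooms]

def rank_stage(db, candidates, query_features):
    def score(i):  # a dot product stands in for a real similarity measure
        return sum(a * b for a, b in zip(query_features, db[i]["score_features"]))
    return sorted(candidates, key=score, reverse=True)

candidates = filter_stage(db, bedrooms=3)      # img2 excluded cheaply
print(rank_stage(db, candidates, [1.0, 0.0]))  # ranked survivors
```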

5. APPLICATION PROTOTYPES

In this section, we describe two image retrieval applications developed using our implementation of CBAIR. The first application is a system for real estate marketing and the intended users are realtors. The second application is a face retrieval system for campus law enforcement and the intended users are police officers. The primary objective of the real estate marketing system is to demonstrate the significance of the RSS query operator, whereas the objective of the face retrieval system is to illustrate the effectiveness of the RSA query operator.

5.1. Realtors' information system

Currently, realtors use the MLS (multiple listing service) system to find houses that match the needs and preferences of their clients. The MLS system is designed to retrieve sale houses based on their extrinsic and simple logical attributes such as the age of the house, lot size, price range, number of bedrooms, and total heated area. Information provided by the MLS system does not include details either on the floor plan or the superstructure of the house. Independent of the MLS system, realtors may be able to display, from a video disk, an image of the house taken from a vantage point. This only provides a general feeling for the quality of the neighborhood and the exterior of the house. However, it has been noted that some people prefer a house that has a bedroom with an east-facing orientation so that waking up to the morning sun is a psychologically pleasant experience. Still other people may prefer certain orientations for specific units in the house based on cultural and religious backgrounds. Though this type of retrieval need has existed in the domain for some time, no existing systems seem to support such retrieval. In metropolitan areas having a large number of houses for sale, it is almost beyond the abilities of a human being to remember the spatial configuration of various functional and esthetic units in the sale houses.
The RSS query operator naturally models the above need, whereas the ROA operator models the functionality provided by the MLS system. Therefore, RSS and ROA provide the necessary functionality to develop the next generation of real estate marketing systems. User interaction with such a system proceeds as follows. First, the user specifies needs and preferences by using an ROA query. The system then retrieves a set of sale houses that meet the needs and preferences specified in the query. Second, the user sketches a desired floor plan (see Fig. 8), and this query is processed as an RSS query against the set of images retrieved by the ROA query. The user then browses the ranked list of floor plans returned by the RSS query and selects a small number of houses for a physical tour. This two-stage query specification and processing is meant only for exposition purposes; the two stages can be seamlessly integrated. The system that we have developed for real estate marketing is described next. First, we describe the floor plan database, followed by query specification and processing, system evaluation, and future enhancements.

5.1.1. The floor plan database. A set of 60 floor plans were scanned and stored in digital form, and these constitute our image database. The extrinsic attributes of an image include style, price, lot size, lot type, lot topography, school district, subdivision name, and age of the house. For
lack of real data, we have not included other extrinsic attributes. Logical attributes of the image include number of bedrooms, number of bathrooms, total floor area, total heated area, foundation type, roof pitch, and utility type. Dimensions and shapes of the various rooms constitute the logical attributes of domain objects. Three logical feature classes, namely the ΘR-string (Gudivada, 1997), the spatial orientation graph (Gudivada & Raghavan, 1995b), and the 2D-string (Lee, Shan & Yang, 1989), are inherited from CBAIR. Though only one logical feature class is required, we wanted to compare and contrast the performance of these feature classes on the RSS query operator. All the information about the floor plans is derived manually.

5.1.2. Query specification and processing. An RSS query is specified by first spatially configuring the icons corresponding to the domain objects and then assigning extrinsic and logical attributes to these icons (Fig. 8). The user also specifies which of the three spatial similarity algorithms (Gudivada, 1997; Gudivada & Raghavan, 1995b; Lee, Shan & Yang, 1989) is to be used in processing the query. The retrieved floor plans are rank ordered based on their spatial similarity values and are shown to the user in a browser (Fig. 9).

5.1.3. System evaluation. Evaluating the retrieval effectiveness of the spatial similarity algorithms featured in the prototype is outside the scope of this paper; each is described in its respective publication. Retrieval effectiveness is measured in terms of how well the rank ordering of the database images provided by the system for a query corresponds to the human rank ordering of the database images for the same query. This correspondence is quantified by a measure known as Rnorm (Gudivada & Raghavan, 1995b). An Rnorm value of 1 indicates that the system-provided rank ordering is identical to the human-provided rank ordering, and lower values indicate proportional disagreement between the two rank orderings.
The range of Rnorm values is [0, 1]. We randomly selected 20 floor plans from the pool of 60 and used them as test queries. A graduate student in computer science was asked to provide a rank ordering of the database images with respect to each of the test queries. The system-provided rank ordering of the database images for the test queries was obtained by submitting each of them to the system as a query in turn. The Rnorm value for each test query was consistently high (>0.96), confirming the retrieval effectiveness of the system. The overall assessment of the prototype by realtors indicates that the ...
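One common way to compute a rank-agreement measure of this kind is from pairwise preference agreement, in the style Rnorm = ½(1 + (S⁺ − S⁻)/S⁺max), where S⁺ counts image pairs the system ranks in the same relative order as the human, and S⁻ counts pairs ranked in the opposite order. Whether this matches the paper's exact definition is an assumption; the sketch below, with illustrative image ids, shows the idea.

```python
from itertools import combinations

def r_norm(user_rank, system_rank):
    """Preference-agreement form of Rnorm (an assumption; see text above).
    Both arguments list image ids from most to least relevant, no ties."""
    pos = {img: i for i, img in enumerate(system_rank)}
    s_plus = s_minus = 0
    for better, worse in combinations(user_rank, 2):
        if pos[better] < pos[worse]:
            s_plus += 1      # system agrees with the human preference
        elif pos[better] > pos[worse]:
            s_minus += 1     # system inverts the human preference
    s_max = s_plus + s_minus  # total pairs carrying a human preference
    return 0.5 * (1 + (s_plus - s_minus) / s_max) if s_max else 1.0

perfect = r_norm(["a", "b", "c"], ["a", "b", "c"])  # identical orderings
swapped = r_norm(["a", "b", "c"], ["a", "c", "b"])  # one pair inverted
```

Identical orderings yield 1, and each inverted pair pulls the value proportionally toward 0, matching the interpretation given in the text.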

[Figure residue: floor plan sketch with rooms labeled MASTER BEDROOM and FAMILY, and browser controls labeled Index and Image (Figs 8 and 9).]