An Efficient and Robust Algorithm for Shape Indexing and Retrieval

Soma Biswas, Graduate Student Member, IEEE, Gaurav Aggarwal, Student Member, IEEE, and Rama Chellappa, Fellow, IEEE

Abstract—Many shape matching methods are either fast but too simplistic to give the desired performance, or promising in terms of performance but computationally demanding. In this paper, we present a very simple and efficient approach that not only performs almost as well as many state-of-the-art techniques but also scales up to large databases. In the proposed approach, each shape is indexed based on a variety of simple and easily computable features which are invariant to articulations, rigid transformations, etc. The features characterize pairwise geometric relationships between interest points on the shape. The fact that each shape is represented using a number of distributed features, instead of a single global feature that captures the shape in its entirety, provides robustness to the approach. Shapes in the database are ordered according to their similarity with the query shape, and similar shapes are retrieved using an efficient scheme which does not involve costly operations like shape-wise alignment or establishing correspondences. Depending on the application, the approach can be used directly for matching or as a first step for obtaining a short list of candidate shapes for more rigorous matching. We show that the features proposed for shape indexing can be used to perform the rigorous matching as well, to further improve the retrieval performance. To illustrate the computational and performance advantages of the proposed approach, extensive experiments have been performed on several challenging problems that involve matching shapes. We also highlight the effectiveness of the approach in performing robust and efficient shape matching in real images and videos for different applications like human pose estimation and activity classification.

Index Terms—Fast retrieval, indexing, shape matching.
I. INTRODUCTION
NUMEROUS applications of shape matching and recognition have made it a very important area of research in the field of computer vision (see Fig. 1).
Manuscript received September 13, 2009; revised February 11, 2010; accepted April 14, 2010. Date of publication May 20, 2010; date of current version July 16, 2010. This work was supported in part by UNISYS, in part by NSF-ITR Grant 03-25119, and in part by ONR MURI Grant N00014–08–1–0638. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. James Z. Wang. S. Biswas was with the Center for Automation Research and the Department of Electrical and Computer Engineering, University of Maryland, College Park, MD 20742 USA, and is now with the Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556 USA (e-mail:
[email protected]). G. Aggarwal was with the Center for Automation Research and Department of Computer Science, University of Maryland, College Park, MD 20742 USA, and is now with the Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556 USA (e-mail:
[email protected]). R. Chellappa is with the Center for Automation Research and the Department of Electrical and Computer Engineering, University of Maryland, College Park, MD 20742 USA (e-mail:
[email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TMM.2010.2050735
Fig. 1. A few applications that can benefit from robust and efficient shape matching. (a) Matching and retrieval of 2-D shapes [1], like trademark retrieval [2], leaf recognition [3], etc. (b) Activity classification [4]. (c) Gesture recognition. (d) Pose estimation in sports clips [5].
Character recognition, trademark logo retrieval, activity recognition, object recognition, and human pose estimation are a few of the challenging applications that can benefit from accurate and efficient shape matching techniques. Different applications require different representations, and hence different matching algorithms, to handle the large variations in shapes. Also, with recent advances in technology and the availability of different kinds of sensors, the amount of data to be handled has increased tremendously over the last few decades. So even though research in the area of shape matching has matured, the challenges involved in achieving high performance in terms of both accuracy and computational complexity continue to interest researchers. Shapes show a great deal of intra-class variation, including rotations, translations, articulations, missing portions, and other inexplicable deformations, which makes the problem of shape matching quite challenging. Errors in extracting shapes from input images or videos further add to the complexity. Matching shapes across complex deformations has been the main focus of most works in recent times. Many existing shape matching algorithms require computationally demanding matching schemes to be able to handle the aforementioned variations, making them less effective for large databases. On the other hand, much research has also been focused on efficient retrieval of shapes, but many of these approaches are not designed to handle complex deformations like articulations of part structures. In contrast, we propose an indexing system for fast and robust matching and retrieval of shapes across both rigid and nonrigid transformations. We envisage a shape matching system which can efficiently scale to large databases without compromising on the
retrieval performance obtained by state-of-the-art shape matching algorithms. We model a shape as a collection of landmark points arranged in a plane (2-D) or in 3-D space. In our approach, each shape is characterized by features that are used to index it to a table. The table is analogous to the inverted page table used to index web pages using words/phrases. Given a test shape, similar ones from a pre-indexed collection are determined based on its characterizing features. The computational overhead (of establishing point-wise correspondence) involved in the traditional way of matching the query with each shape in the dataset is thereby avoided. As we deal with shapes, the only information usually available is the underlying geometry. Appropriate features are chosen to encode this geometry as richly as possible, without compromising on robustness. Quite clearly, the set of useful features varies depending on the particular application at hand. For example, invariance to articulations of part structures is very important in applications like gait-based human identification, whereas the same invariance is not desired for applications like retrieval based on human pose. Our goal here is to develop a system that supports fast retrieval of shapes without needing any costly correspondence step during matching. To this end, we use (or propose) features that address most challenges faced by shape matching tasks, including invariance to object translation, rotation, scale, articulations, etc. In the proposed indexing framework, a given shape is represented using a collection of feature vectors, each characterizing a geometrical relationship between a pair of landmark points. The features should be easily computable for the matching algorithm to be efficient and able to scale up to large database sizes. For each landmark pair, depending on the application, all or a subset of the following geometrical characteristics are encoded in the corresponding feature vector.
1) Distance between the points. This can be the inner distance [3] or the standard Euclidean distance, depending on whether or not articulation-invariance is desired.
2) Relative angles between the line segment joining the two points and the tangents to the contour at the points.
3) Contour distance between the points (analogous to geodesic distance in the case of 3-D shapes).
4) Distances of the points from the center of mass. For applications requiring invariance to articulations, we propose an articulation-invariant center of mass, which is analogous to the standard center of mass with the added property of being invariant to articulations.
Clearly, more suitable (indexable) features can easily be added to this list to make the representation richer. The feature vectors are suitably quantized for indexing. The fact that the feature vectors depend only on a few points and are quantized provides the robustness required to generalize across large intra-class variations. As shown by the results, the matching speed and ability to generalize do not come at the cost of discriminability across shapes. Since all the desired characteristics of the shape matching algorithm, like invariance to rotation, articulation, etc., are incorporated in the feature vectors themselves, this representation allows the proposed system to use a very simple and efficient retrieval scheme. Given a test shape, the matching bins
in the index table are determined. A single parse through the matching bins returns the most similar shapes. This does not require any alignment or correspondences, making it extremely fast and scalable. Depending on the requirements of the application, the top matches returned by the single-parse retrieval algorithm may directly be used as the similar shapes, or there may be a need to further compare the query against the top few matches using a more rigorous algorithm to refine their ordering. Such a refinement stage will typically be more computationally expensive than the proposed retrieval algorithm, but it needs to be performed only for a few top matches instead of the whole shape database, which makes such a two-stage scheme computationally attractive. In this paper, we show how we can use the same set of features to do a more rigorous matching on the short-listed candidates returned by the indexing phase to further improve the matching performance. The proposed algorithm has been rigorously tested for several shape matching applications. We first evaluate the approach by providing performance comparisons with several existing methods using standard shape silhouette datasets like the MPEG7 shape dataset [1], the articulation dataset [3], the Kimia 1 and 2 datasets [6], [7], and the ETH-80 object database [8]. The computational advantage obtained using the proposed approach is also highlighted. In addition, we perform experiments on human pose estimation and activity classification on challenging datasets.

A. Organization of the Paper

The rest of the paper is organized as follows. Section II discusses some of the related work. Section III introduces the indexing framework proposed in the paper. Section IV describes the indexable shape representation. A detailed description of the indexing and retrieval algorithms is given in Section V. The details of the refinement algorithm to re-rank top matches returned by the indexing system are presented in Section VI. Section VII presents the results of extensive evaluations done to compare the proposed algorithm with others. Some real applications of shape matching are shown in Section VIII. The paper concludes with a summary and discussion. A preliminary version of this work was reported in [9].

II. PREVIOUS WORK

The problem of shape matching has been around for quite some time, probably due to its universality. Though significant advancements have been made, the demands on computational efficiency and accuracy continue to interest researchers. In this section, we discuss some of the previous efforts that are related to the approach proposed in this paper. The various approaches for shape matching in the literature have their respective advantages and limitations with respect to the kind of input they can handle, computational complexity, etc.

A. Related Work on Shape Matching

Shape context-based matching [2] has been the theme of several recent works [10]–[13] on shape matching. In the original version [2], each point is characterized by the spatial distribution of the other points relative to it. Similarity computation involves establishing correspondences using bipartite
graph matching and thin plate spline (TPS)-based alignment. The shape context framework has since been extended in various ways to suit different requirements of the shape matching problem. Mori and Malik [10] propose using statistics of the tangent vectors along with the point counts to perform object recognition in clutter. A figural continuity constraint has been incorporated to yield reliable correspondences in cluttered scenes [12]. Tu and Yuille [11] incorporate softassign [14] into the shape context framework for shape matching. After aligning the shapes using the correspondence given by shape context, Daliri and Torre [15] transform each contour into a string of symbols which is then matched using a modified edit distance. A recent extension by Ling and Jacobs [3] accounts for the movement of part structures by replacing the Euclidean distance in the classical version with the inner distance, which is robust to articulations. McNeill and Vijayakumar [16] propose the hierarchical Procrustes matching algorithm, which generalizes the idea of finding a point-to-point correspondence between two shapes to that of finding a segment-to-segment correspondence. In another recent work, Felzenszwalb and Schwartz [17] use a new hierarchical representation called the shape tree for two-dimensional objects that captures shape information at multiple levels of resolution. Peter et al. [18] represent point-set shapes as the square root of probability densities expanded in the wavelet basis and use a linear assignment solver to account for nonrigid transformations prior to matching. There is another body of work for capturing part structures in which shapes are represented using shock graphs [6], [19]. The shock graph grammar helps to reduce the shock graph representation to a unique rooted shock tree which is then matched using a tree matching algorithm. To handle shape deformations, Sebastian et al. [7] propose finding the optimal deformation path of shock graphs that brings the two graphs (shapes) into correspondence. Many of the approaches discussed above require finding correspondences between points/curve segments of two shapes, which usually requires computationally expensive methods. Readers are referred to several other interesting approaches for matching shapes [20]–[26].

B. Related Work on Efficient Matching and Indexing

Fast nearest neighbor search for finding closest points in metric spaces has a rich history [27]. Due to the tremendous increase in the amount of data that needs to be handled, indexing techniques are becoming increasingly popular for the development of fast retrieval algorithms for documents, images, etc. The indexing approach used in the paper is inspired by the work on fingerprint indexing using minutiae triangles as features [28]. Unlike classical geometric hashing [29], the triangle-based approach hashes a set of points based on local invariants (each depends only on three minutiae, though these need not be spatially local), which is more robust and leads to faster retrieval. For fast matching and retrieval of images, a vocabulary tree-based representation has recently been proposed by Nister and Stewenius [30]. Similar to their approach, our indexing system relies on an invariant and robust shape representation to make the retrieval process extremely fast. In [31], Mori et al. propose solutions to improve the computational efficiency
of shape context-based approaches. They show how pruning and vector quantization techniques can be utilized to make shape contexts useful for large databases. Another approach for fast shape matching is to reduce the shape matching problem to the comparison of probability distributions, which does not require pose registration, feature correspondence, or model fitting. Osada et al. [32] use shape distributions sampled from a shape function and measure global geometric properties of an object for fast matching of 3-D models. Ohbuchi et al. [33] use a joint 2-D histogram of distance and orientation of pairs of points for improved performance. Hamza and Krim [34] use a geodesic shape distribution that measures the global geodesic distance between two arbitrary points on the surface to better capture the (nonlinear) intrinsic geometric structure of the data. The idea of describing 3-D models using distances between pairs of points and/or their mutual orientations has also appeared in [35] and [36]. Apart from these, other approaches have also been proposed which focus on efficient shape matching [37]–[41]. Existing shape matching methods can also be classified based on the kind of input they require. Some methods require the shape to be represented as a closed contour [3], [24], while others are more flexible in the kind of input they can work with and just require a set of points as their input [2], [18]. Our approach falls in the first category, but has the advantage of being efficient while being able to handle complex deformations like articulations of part structures.

III. INDEXING FRAMEWORK—A GLANCE

In many of the existing approaches, a query needs to be compared with every shape in the dataset to return the most similar ones, and the comparisons often involve computationally demanding operations like registration, establishing correspondence, etc. Since for each query these costly operations have to be repeated for each shape in the database, the computational load can become prohibitively high as the size of the database increases. Our goal is to come up with a fast and efficient framework for shape indexing and retrieval that can perform robust shape matching. Fig. 2 illustrates a prototype of our shape indexing framework. In the proposed approach, a shape is represented using a set of indexable feature vectors which are appropriately mapped to a hash table. For a shape $S_i$, a bin $b$ in the hash table stores an entry $(S_i, n_{ib})$, where $n_{ib}$ is the number of feature vectors from shape $S_i$ that get hashed to bin $b$. This is repeated for each shape in the database and the hash table is populated. Thus, typically, each bin of the hash table has several 2-tuples corresponding to the different shapes. The quantization scheme determines how uniformly the entries are distributed across the hash table. For a query shape $Q$, the feature vectors are extracted and its hash table entries $(b_k, m_k)$ are determined as done for the case of the database shapes. Then a single parse through the set of matching bins, i.e., the bins $b_k$ for which the query has a 2-tuple, determines its similarity with all the shapes in the database. In such a retrieval scheme, the processing time depends only on the number of 2-tuples and the number of database entries in the matching bins. So the more uniformly distributed the hash table
Fig. 3. Inner distance and relative angles. The two human silhouettes on the left show the insensitivity of the inner distance to articulation of part structures.
Fig. 2. Prototype of the proposed shape indexing framework. Each shape in the database is indexed to a hash table using a set of indexable feature vectors extracted from the shape.
is, the lower the average time required to process a query. Typically, the processing time increases much more slowly than the database size. The details of the algorithm are described later in Section V.

IV. SHAPE REPRESENTATION

In this section, we describe suitable features that seamlessly integrate with the proposed indexing framework. To ensure that the single-pass retrieval algorithm directly returns the most similar shapes, the features, in addition to being indexable, should be invariant to different rigid and nonrigid transformations as required by the application at hand. The choice of features affects both the generalizability and discriminability of the approach. Here we use features that depend only on a few points on the shape and also take the global shape into account. The dependence on only a few points ensures robustness, while their relative configuration with respect to the global shape provides discriminability. The complexity of a typical matching algorithm depends on the complexity of the type of transformations that need to be handled, which in turn depends on the application. Articulation of part structures being one of the most difficult kinds of deformations addressed by several recent shape matching techniques, we describe representative features that are invariant to articulations in addition to rigid transformations.

A. Pairwise Geometrical Features

Following these guidelines, each shape is characterized by a set of feature vectors where each vector encodes pairwise geometrical relationships on the shape. Each vector consists of the following features that are robust to different deformations.
1) Inner Distance Between Two Points: The Euclidean distance between two interest points is invariant to rigid transformations of the shapes and is useful for applications where it is required to preserve articulation-dependent discriminability. But even small articulations can change the Euclidean distance significantly for several point pairs on the shape. Therefore, for applications requiring invariance to articulations, we use the inner distance (ID) [3], which is robust to articulations of part
Fig. 4. Contour distance. The shapes shown illustrate the insensitivity of the contour distance to length-preserving deformations.
structures. The inner distance between two points is the length of the shortest path between them within the silhouette of the shape. Fig. 3 (left) illustrates the difference between the inner distance and the standard Euclidean one. Computation of the inner distance involves forming a graph with the landmark points on the shape forming the nodes. Two nodes in this graph are connected if there is a straight line path between the corresponding points which is completely inside the shape contour. The corresponding edge weight is the Euclidean distance between the two. From this graph, any standard shortest path algorithm can be used to compute the inner distance for all the node pairs that are not directly connected.
2) Relative Angles: Relative angles (A1 and A2) encode the angular relationship between a pair of points. Since the absolute orientation of the line segment connecting the points is not invariant to rotations, we use the relative orientation of the connecting line segment with respect to the incident tangents at each end point. If the inner distance is used, this is the relative orientation of the first segment of the path corresponding to the inner distance (see Fig. 3, right).
3) Contour Distance: The contour distance (CD) is analogous to the geodesic distance for 3-D shapes and captures the relative positions of the two points with respect to the entire shape contour. The contour distance between two points for 2-D silhouettes is simply the length of the contour between the two points. The distance is robust to both articulations and contour length preserving deformations and complements the inner distance in characterizing the relative location of the point pair with respect to the entire shape. Fig. 4 shows the contour distance between two points of an object across several deformations.
4) Articulation-Invariant Center of Mass: The features described so far depend on the entire shape, but none of them captures much information about the relative placement of various point pairs in the shape. Though robust, such a representation may not be able to provide the desired level of discriminability. For matching across rigid transformations, the distances of the points and the line segment joining them from the center of mass can be used as additional features to encode their relative placement. Clearly, since the center of mass can change appreciably
TABLE I NUMBER OF QUANTIZATION BITS FOR THE USED FEATURES
Fig. 5. Articulation-invariant center of mass. Row 1: Original shapes. Row 2: Transformed shapes after MDS.
with articulations, these features are not invariant to articulations. We therefore propose an articulation-insensitive alternative to the traditional center of mass for cases where invariance to articulation is required. We first describe how the location of the articulation-invariant center of mass is determined, followed by a description of the features derived from it. Directly determining such a point is not easy. The proposed approach first transforms a given shape to an articulation-invariant space. All objects related by articulations of their part structures get transformed to the same shape in the new space. This essentially means that the distances between the transformed points are invariant to articulations. The transformation is done using multidimensional scaling (MDS) [42]. MDS essentially places the points in a new Euclidean space such that the inter-point distances are, in a collective manner, as close as possible to the given inner distances. We use classical MDS, as opposed to other more accurate but iterative algorithms, for efficiency. The transformation computation involves spectral decomposition of the inner product matrix $B$, which is related to the (squared) inner-distance matrix $D^{(2)}$ as follows:

$$B = -\frac{1}{2} J D^{(2)} J, \qquad J = I - \frac{1}{n}\mathbf{1}\mathbf{1}^{T} \qquad (1)$$

where $n$ is the number of landmark points and $J$ is the centering matrix. The matrix $B$ is symmetric, positive semidefinite and can be expressed as

$$B = V \Lambda V^{T} \qquad (2)$$

The required transformed coordinates $X$ in a $d$-dimensional output space can be obtained by

$$X = V_d \Lambda_d^{1/2} \qquad (3)$$

where $\Lambda_d$ contains the $d$ largest eigenvalues and $V_d$ the corresponding eigenvectors. Fig. 5 shows the result of performing MDS on a few shapes. As desired, the transformed shapes [Fig. 5 (second row)] look quite similar across articulations. Here $d$ is taken to be two for visualization. The approximation improves with the dimensionality of the output space. The desired articulation-invariant center of mass is the center of mass of the transformed shape. Given the articulation-invariant center of mass of a shape, we derive features which capture the relative positioning of the point pairs. For each point pair, the distances (D1, D2, D3) of the two points and the line segment joining them from the estimated center of mass are computed. This is done in the transformed space itself, as distances in the transformed space are insensitive to articulations.
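To make the construction above concrete, the following Python sketch (not part of the original paper) computes the inner-distance matrix from a visibility graph over the landmarks and then applies classical MDS as in (1)–(3) to obtain the articulation-invariant center of mass. It assumes NumPy/SciPy and a hypothetical predicate `inside_fn` that tests whether the straight segment between two landmarks lies entirely inside the silhouette.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

def inner_distance_matrix(points, inside_fn):
    """All-pairs inner distances between contour landmarks.
    points    : (n, 2) array of landmark coordinates.
    inside_fn : hypothetical predicate, True when the straight segment
                between two landmarks lies entirely inside the shape."""
    n = len(points)
    graph = np.full((n, n), np.inf)
    np.fill_diagonal(graph, 0.0)
    for i in range(n):
        for j in range(i + 1, n):
            if inside_fn(points[i], points[j]):
                d = np.linalg.norm(points[i] - points[j])
                graph[i, j] = graph[j, i] = d
    # Shortest paths on this visibility graph give the inner distances.
    return shortest_path(graph, method="FW")

def articulation_invariant_center(inner_dist, out_dim=2):
    """Classical MDS on the squared inner-distance matrix, following
    (1)-(3), then the center of mass of the transformed landmarks."""
    n = inner_dist.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n                 # centering matrix, eq. (1)
    B = -0.5 * J @ (inner_dist ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)                # spectral decomposition, eq. (2)
    top = np.argsort(eigvals)[::-1][:out_dim]
    X = eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))  # eq. (3)
    return X, X.mean(axis=0)                            # transformed shape and its center
```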
B. Bag of Features

Given a shape, the pairwise geometrical features are computed for each pair of landmark points on the shape. Here, each point pair is characterized by a seven-dimensional feature vector comprising the features described above. The distance-based features in the vector are made robust to variations in scale by normalizing each with its median. Note that here we provide a basic set of features that are robust to rigid transformations and articulations of part structures; the exact choice of the set of features may depend upon the application at hand. The collection of such feature vectors for all pairs of landmark points characterizes the shape. In all the experiments, we have used 100 landmark points sampled uniformly on the contour for each shape. The inner points of the shape boundary (if present) have not been considered.

V. INDEXING AND RETRIEVAL OF SHAPES

In this section, we describe in detail the shape indexing and retrieval algorithm using the proposed representation. Hashing the feature vectors of each shape to the index table requires discretization of the space of feature vectors. Here, we quantize each dimension of the vector independently using a suitably chosen number of levels for each. Suppose $f = (f_1, f_2, \ldots, f_7)$ denotes the seven-dimensional feature vector. If the number of quantization levels for feature $f_j$ is given by $2^{b_j}$, then $b_j$ bits are required to represent the feature. So each feature vector consisting of seven features is represented using $b = \sum_{j=1}^{7} b_j$ bits. There are $2^{b}$ possible combinations of the feature vectors, and hence, any vector can belong to one of the $2^{b}$ bins in the hash table. Though the appropriate number of bits assigned to each feature may vary depending on the application, Table I shows the typical number of bits assigned to each feature in our system. The quantization boundaries for each feature are chosen such that there is almost the same number of feature vectors in each bin. This is done for each of the seven features independently by using a set of training shapes which are representative of the database. In all our experiments, we use roughly 10% of the dataset as training shapes to determine these boundaries. In addition to being a basic requirement of an indexing system, quantization provides robustness to variations in the actual values of the features across different instances of the same shape.

A. Indexing

Fig. 2 illustrates the overall indexing procedure. The steps in the indexing are described below in detail.
1) For each shape in the database, landmark points are extracted from the shape contour. Though one can judiciously choose these points, we simply pick points uniformly on the shape contours.
2) For each pair of landmark points, features are computed as described in Section IV. This results in a collection of
Fig. 6. (Left) Retrieval algorithm. (Right) Post-retrieval rank refinement to improve accuracy.
feature vectors for each shape. If there are $n$ landmark points, we have $n(n-1)/2$ feature vectors.
3) Each feature vector is quantized using the proposed quantization scheme.
4) The quantized feature vectors are mapped on to the appropriate bins in the hash table. The $b$th bin contains 2-tuples of the form $(S_i, n_{ib})$, where $S_i$ is the $i$th shape in the database and $n_{ib}$ denotes the number of feature vectors of shape $S_i$ that hash to bin $b$.

B. Retrieval

Given a query shape, the aim is to retrieve the similar shapes in the database as efficiently as possible. Fig. 6 illustrates the retrieval phase using a flow chart. The different steps involved in the retrieval phase are enumerated below.
1) Feature vectors for the query shape are extracted in a manner similar to the one used for indexing.
2) Each vector is quantized using the same quantization steps as used for the shapes enrolled in the database.
3) Hashing each feature vector to the index table results in a list of matching bins $\{(b_k, m_k)\}$, where $m_k$ is the number of query feature vectors which hash to bin $b_k$. In general, the number of matching bins is much smaller than the total number of bins in the hash table.
4) The distance of the query with each shape in the database is initialized to zero.
5) Now we parse through the list of matching bins and, at every step, update the distance of the query $Q$ with each enrolled shape using the following distance metric:

$$d(Q, S_i) = \sum_{k} \frac{(m_k - n_{ik})^2}{m_k + n_{ik}} \qquad (4)$$

where the shape $S_i$ has an entry $n_{ik}$ in the $k$th matching bin. If there is no such entry for a shape in the bin, $n_{ik}$ is taken to be zero. The choice of distance metric is inspired by the standard $\chi^2$ statistic.
6) If, during parsing, the distance for any particular shape in the database exceeds a pre-specified threshold, then that shape is discarded from further computation.
7) At the end of the parse, we get a list of shapes from the database which are most similar to the query shape.

C. Computational Complexity

The computational complexity of the indexing phase depends on the complexity of feature extraction. For a shape with $n$ landmarks, the inner distance computation is of complexity $O(n^3)$. Computation of relative angles and contour distances takes $O(n^2)$. The complexity of calculating the articulation-invariant center of mass is $O(n^3)$, while deriving features based on it takes $O(n^2)$. Therefore, the complexity of indexing a shape is $O(n^3)$. Note that indexing can be done offline so that query processing time is not affected. To ensure fairness, all running times reported in the paper include the time spent in indexing.
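As an illustration of the indexing and single-parse retrieval described above, the following Python sketch populates a hash table with (shape, count) 2-tuples and updates the query-to-shape distances in one pass over the matching bins. The quantization boundaries, the mixed-radix packing of quantization levels into a bin id, and the exact chi-square-like form assumed for (4) are illustrative choices, not necessarily those of the paper's implementation.

```python
from collections import defaultdict
import numpy as np

def bin_id(feature_vec, boundaries):
    """Quantize each of the seven features independently and pack the
    resulting levels into a single hash-table bin id (mixed-radix packing).
    boundaries[j] is a sorted array of cut points for the j-th feature,
    learned from a small training subset of the database."""
    idx = 0
    for f, cuts in zip(feature_vec, boundaries):
        level = int(np.searchsorted(cuts, f))        # quantization level of this feature
        idx = idx * (len(cuts) + 1) + level
    return idx

def index_shapes(shape_features, boundaries):
    """shape_features: {shape_id: iterable of 7-D pairwise feature vectors}.
    Returns the hash table: bin id -> {shape_id: count of vectors in that bin},
    i.e., the (shape, count) 2-tuples of the indexing step."""
    table = defaultdict(lambda: defaultdict(int))
    for sid, vecs in shape_features.items():
        for v in vecs:
            table[bin_id(v, boundaries)][sid] += 1
    return table

def retrieve(query_vecs, table, boundaries, all_shape_ids):
    """Single parse through the matching bins using the chi-square-like
    update assumed for (4); returns shape ids sorted by distance."""
    query_bins = defaultdict(int)
    for v in query_vecs:
        query_bins[bin_id(v, boundaries)] += 1
    # A shape absent from every matching bin accumulates sum_k m_k,
    # which is the n_ik = 0 case of the assumed update.
    base = float(sum(query_bins.values()))
    dist = {sid: base for sid in all_shape_ids}
    for b, m in query_bins.items():
        for sid, n in table.get(b, {}).items():
            dist[sid] += (m - n) ** 2 / (m + n) - m
    return sorted(dist, key=dist.get)
```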
TABLE II NUMBER OF TOP MATCHES VERSUS ERROR RATE
As in the indexing phase, for a query shape with $n$ landmarks, feature extraction and hashing is $O(n^3)$. Hashing results in $M$ matching bins. Suppose each bin has $e$ entries, where $e \le N$ and $N$ is the number of shapes in the database; we then need to perform $O(Me)$ distance updates (4). This does not take into account the fact that many shapes are discarded during retrieval, which would further reduce the query processing time. It is difficult to put a bound on how large $M$ and $e$ can be. In the worst case, $M$ can be as large as $n(n-1)/2$ and $e$ as large as $N$, but that does not happen in practice. With suitable quantization, $e$ increases much more slowly than $N$. Moreover, if the elimination of dissimilar shapes during the retrieval process is taken into account, the complexity of the process depends on the number of database shapes which are somewhat similar to the query. These attributes make the system quite scalable.

VI. RANK REFINEMENT: RE-RANKING TOP MATCHES USING DETAILED MATCHING

As illustrated by the experimental results (Section VII), in most cases the proposed indexing and retrieval approach performs very well in terms of accuracy while being extremely fast and efficient. But depending on the application and the accuracy requirement, it can be followed by a detailed matching stage where the query shape is compared with a subset of the database shapes returned by the proposed indexing algorithm. Though in the indexing stage the shapes have been represented by a number of descriptors, including some global features, to get a rich representation, the information about the relative positioning of the point pairs cannot be fully captured due to the bag-of-features kind of distributed representation. The goal of the refinement stage is to re-rank the top matches returned by the first stage according to the global similarity. At this stage, we can potentially use any of the already available algorithms. Since for each query this matching needs to be done only for a very small subset of the entire database, the increase in computational overhead will be much smaller than using the same algorithm for matching the query with each and every shape in the database. In this paper, we propose such a detailed second matching stage based on the same feature vectors computed in the indexing stage. Before we describe the matching algorithm, we first investigate the usefulness of the proposed indexing/retrieval algorithm as a first step before a more rigorous matcher is used. We follow the pruning protocol used by Mori et al. [31] on the ETH-80 dataset, which consists of eight categories of objects with ten examples of each. Each example has 41 images from different viewpoints. As in [31], the gallery is composed of one randomly picked example from each category (all views), leading to 328 images in the gallery. The remaining 2952 images are used as queries. The reported error rate at rank "r" represents the average probability of not finding a correct match in the top "r" matches returned by the proposed indexing/retrieval system. The experiment is repeated 100 times for different random selections of the gallery. The proposed hashing approach gives an
average error rate of 5.28% for 40-fold pruning (top 8 ranks), which is much better than the performance reported in [31] (10% using representative shape contexts and 14% using shapemes). Table II shows the variation of error rates with respect to the number of top matches being considered. So we see that though, for a query shape, the best matching shape is not always the one with the highest similarity score, it comes within the top few matches, and so a more rigorous matcher has the potential to further improve the matching performance by appropriately re-ranking the top matches returned by the proposed indexing framework.

A. Dynamic Programming-Based Re-Ranking Algorithm

We make use of the same features as used for indexing to re-rank the top matches, thereby avoiding extra computational overhead for feature extraction. To this end, we propose a tighter representation of a shape by characterizing each landmark of the shape. Suppose each shape has $n$ landmarks ($n$ is 100 for all our experiments). Each landmark corresponds to $n-1$ pairwise feature vectors, each of which corresponds to a hashing bin id. In our algorithm, we create a histogram of these bin ids to characterize each shape landmark. The different steps of the proposed re-ranking algorithm (Fig. 6) are enumerated as follows.
1) Compute histograms for all landmarks of all database shapes as a one-time pre-processing step. Each shape is characterized using an ordered set of $n$ histograms corresponding to its $n$ landmarks.
2) Given a query, characterize its landmarks in a similar fashion.
3) Using this representation, compute the similarity of the query shape with each of the top matches returned by the proposed indexing/retrieval system using a dynamic programming-based approach (described below).
Suppose the landmark points on the contour of the query shape are denoted as $q_1, q_2, \ldots, q_n$ and those of a database shape as $p_1, p_2, \ldots, p_n$. If $\pi$ denotes the mapping between the two shapes such that the $i$th landmark of the query shape is matched to the $\pi(i)$th landmark of the database shape, the matching cost of the two shapes is given by

$$C(\pi) = \sum_{i=1}^{n} c\left(q_i, p_{\pi(i)}\right) \qquad (5)$$

The mapping $\pi$ should be chosen in such a way that it minimizes the matching cost given by (5). A penalty can be imposed if a landmark is left unmatched, but for all our experiments the penalty is taken to be zero. The cost $c(q_i, p_j)$ of matching the landmarks $q_i$ and $p_j$ is given by the distance between the histograms corresponding to the two landmark locations. Since the shape contours provide information about the ordering of the points $q_i$ and $p_j$, this can be used to restrict the mapping $\pi$ to preserve this order, thereby making it possible to use dynamic programming (DP) [43] to perform this matching.
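A minimal Python sketch of the re-ranking step is given below. It builds one histogram of hash-bin ids per landmark and aligns the two ordered landmark sequences with a simple order-preserving DP; the L1 histogram distance, the requirement that every query landmark be matched, and the omission of the cyclic starting-offset search are simplifying assumptions for illustration rather than details taken from the paper.

```python
import numpy as np

def landmark_histograms(pair_bin_ids, n_bins):
    """pair_bin_ids[i] lists the hash-bin ids of the n-1 pairwise feature
    vectors involving landmark i (ids assumed remapped to a compact range).
    Returns one normalized histogram per landmark."""
    H = np.zeros((len(pair_bin_ids), n_bins))
    for i, ids in enumerate(pair_bin_ids):
        for b in ids:
            H[i, b] += 1
    return H / np.maximum(H.sum(axis=1, keepdims=True), 1)

def dp_matching_cost(Hq, Hd, skip_penalty=0.0):
    """Order-preserving matching of query landmarks to database landmarks
    minimizing a cost of the form (5).  Every query landmark must be matched;
    database landmarks may be skipped (skip penalty zero in the experiments)."""
    n, m = len(Hq), len(Hd)
    C = np.abs(Hq[:, None, :] - Hd[None, :, :]).sum(axis=2)   # L1 histogram costs
    D = np.full((n + 1, m + 1), np.inf)
    D[0, :] = np.arange(m + 1) * skip_penalty
    for i in range(1, n + 1):
        for j in range(i, m + 1):                              # need >= i database landmarks
            D[i, j] = min(D[i, j - 1] + skip_penalty,          # skip database landmark j
                          D[i - 1, j - 1] + C[i - 1, j - 1])   # match landmark i with j
    return D[n, m]
```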
TABLE III PERFORMANCE COMPARISON ON MPEG7 DATASET. D_SC: SHAPE CONTEXT DISTANCE
Fig. 7. (Left) Example shapes from MPEG7 CE Shape 1 dataset [1]. (Right) Articulation database [3].
VII. EXPERIMENTS

In this section, we report the results of an empirical evaluation of the proposed system and compare it with many state-of-the-art matching algorithms on standard datasets. In addition, we highlight the computational advantages of our indexing approach and the usefulness of the proposed refinement stage in terms of improvement in accuracy. In the next section, we also perform experiments on human pose estimation and activity classification to further illustrate the usefulness of the proposed framework for real-world problems that involve large databases. In all the experiments, we take 100 uniformly sampled points on the shape contour as landmarks.

A. MPEG7 Shape Dataset

As our focus is to show the efficiency of the proposed system along with its accuracy, we first test it on the MPEG7 CE-Shape-1 [1] dataset, which is probably one of the largest benchmarks used for evaluating shape matching algorithms. The dataset consists of 1400 silhouettes with 20 images each for 70 different objects (see Fig. 7, left). The standard test for this dataset is the Bullseye test. It is a leave-one-out kind of test where the 40 most similar shapes are determined for every query shape. The final score is given by the ratio of the number of correct hits to the best possible number of hits (20 × 1400). Table III compares the performance and computation time of the proposed approach with many algorithms reported in the literature. In terms of accuracy, the algorithm (without refinement) performs quite well, though the performance is not exactly at par with some of the very recently published approaches. On the other hand, as can be seen from Table III, the proposed approach takes several orders of magnitude less time than other approaches. We also report results obtained by applying the proposed refinement step using the top 100 shapes retrieved by the hashing approach. As desired, the refinement step results in a significant improvement in performance. Each comparison in the refinement step takes around 0.07 s. Since this is done only for the top 100 matches, the overall computation time required is still smaller than that of the state-of-the-art approaches. The system runs on a regular desktop and is implemented in MATLAB.
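For reference, a small Python sketch of the Bullseye scoring described above is shown below; it assumes a precomputed matrix of pairwise shape distances and per-shape class labels (hypothetical inputs), and is not the MATLAB implementation used in the paper.

```python
import numpy as np

def bullseye_score(dist_matrix, labels, top_k=40, class_size=20):
    """MPEG-7 Bullseye test: for every query, count how many of its top_k
    most similar shapes (the query itself included) share its class label;
    the score is the ratio of total hits to the best possible number of
    hits (class_size hits per query)."""
    labels = np.asarray(labels)
    hits = 0
    for q in range(len(labels)):
        nearest = np.argsort(dist_matrix[q])[:top_k]
        hits += int(np.sum(labels[nearest] == labels[q]))
    return hits / (class_size * len(labels))
```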
The run-times reported for other algorithms are directly taken from the respective references and may vary slightly due to differences in machine configurations. References to some other papers which have reported results on this dataset can be found at http://knight.cis.temple.edu/shape/MPEG7/results.html. A recent method proposed by Yang et al. [26], which takes into account the influence of the other shapes while computing the similarity of a pair of shapes, has reported an accuracy of 93.32% on this dataset.
Performance Analysis With Respect to Variation in Quantization Scheme: We perform experiments on the MPEG7 dataset using different quantization levels for each of the seven features. We observe that reducing the number of levels for a particular feature by half leads to an average accuracy of 81.16%, which is less than 0.7% below the one obtained using the quantization suggested in Table I. On the other hand, doubling the quantization levels for a feature leads to an average accuracy of 79.44%.
Performance Analysis With Variation in Number of Landmark Points: We perform experiments on the MPEG7 dataset using a varying number of landmarks. The results show that the retrieval accuracy degrades gracefully as the number of landmarks is reduced from 100. With 75, 50, and 25 landmark locations, the accuracy is 81.77%, 79.73%, and 70.90%, respectively, compared to 81.8% for the 100 landmark points used in all our experiments.

B. Articulation Database

The features used in our approach were chosen so as to support articulation-invariant matching. Therefore, it is important to evaluate the performance of the system on a dataset which explicitly deals with large articulations. Here we use the articulation dataset introduced in [3], which consists of eight objects with five shapes each, as shown in Fig. 7 (right). We use the same test scheme as in [3]. For each shape, the four most similar shapes are selected and the number of correct hits at ranks 1, 2, 3, and 4 is calculated. Clearly, the best possible performance of any system is to get 40 correct matches at each of the four ranks. Table IV (left) summarizes the results obtained. The proposed approach compares favorably with other approaches. It is noteworthy that, unlike other approaches, the proposed hashing approach does not require any alignment or costly matching for computing similarity with each shape in the dataset. The accuracy improves when the proposed refinement step is used for matching, further signifying the efficacy of the proposed shape representation. Since the proposed set of features is meant to be insensitive to articulations, we perform an analysis of the features on the articulation dataset. For this analysis, we divide the features into
TABLE IV ARTICULATION DATASET: (LEFT) RETRIEVAL RESULT. (RIGHT) ANALYSIS OF THE VARIOUS FEATURES USED
TABLE V RETRIEVAL RESULTS ON KIMIA 1 (LEFT) AND KIMIA 2 (RIGHT) DATASETS
Fig. 8. Kimia database. (a) Kimia dataset 1 [6]. (b) Kimia dataset 2 [7].
three sets, namely, inner distance + relative angles, contour distance, and articulation-invariant center of mass (AICM)-based features. Table IV (right) summarizes the performance of these feature sets on the articulation dataset.

C. Kimia Datasets 1 and 2

Kimia dataset 1 [6] [see Fig. 8(a)] consists of 25 shapes from five categories. The experiment is run in a leave-one-out pattern. The performance is measured by accumulating the correct matches at ranks 1, 2, and 3. The best one can get at any rank is 25. Table V (left) compares the results obtained with other approaches. The proposed approach compares well with other approaches. Kimia dataset 2 [7] [see Fig. 8(b)] is a larger version of dataset 1. It consists of 99 silhouettes from nine categories. The performance is measured by examining the correct matches at the top 10 ranks for each query. The best one can get for each rank is 99. Table V (right) summarizes the results obtained. In addition to being extremely efficient, the proposed approach compares favorably with many existing algorithms.

D. ETH-80 Database

The ETH-80 database [8] contains a total of 80 objects, ten each from eight different categories (Fig. 9). Each object is represented by 41 images taken from viewpoints spaced equally
Fig. 9. Eight object categories of the ETH-80 database [8]. Each category contains ten objects with 41 views per object.
over the upper viewing hemisphere, resulting in a total of 3280 images. We follow the standard testing protocol for the database, which is leave-one-object-out cross-validation. Each image in the database is compared with all the images (all 41 views) from the other 79 objects, and if the correct category label is assigned, the recognition is considered successful. The recognition rate is averaged over all the objects. Table VI summarizes the results obtained. The approaches listed in the table use a single cue (either appearance or shape) for performing object recognition [8]. The best reported result on this dataset (to the best of our knowledge) is 93.02%, which is obtained using a decision trees-based approach [8] that combines the first seven approaches (i.e., combines multiple cues of shape, color, etc.) for better performance. We also report the accuracy obtained using the proposed refinement step on the top 5, top 10, and top 20 matches obtained from the efficient retrieval process. As desired, the refinement step improves the accuracy, making it even better than the one reported using multiple cues.
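A sketch of the leave-one-object-out protocol described above, assuming a precomputed matrix of pairwise image distances together with per-image object and category labels (hypothetical inputs), could look as follows in Python.

```python
import numpy as np

def leave_one_object_out_accuracy(dist_matrix, object_ids, category_ids):
    """ETH-80 protocol: each image is matched against all views of the other
    79 objects; recognition succeeds when the nearest neighbour belongs to
    the correct category.  The recognition rate is averaged over all images."""
    dist_matrix = np.asarray(dist_matrix, dtype=float)
    object_ids = np.asarray(object_ids)
    category_ids = np.asarray(category_ids)
    correct = 0
    for q in range(len(object_ids)):
        d = dist_matrix[q].copy()
        d[object_ids == object_ids[q]] = np.inf   # exclude every view of the query object
        nearest = int(np.argmin(d))
        correct += int(category_ids[nearest] == category_ids[q])
    return correct / len(object_ids)
```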
TABLE VI RECOGNITION RESULT ON THE ETH-80 DATASET. COMPARED ACCURACIES ARE FROM [8]
VIII. APPLICATIONS

Efficient shape matching and retrieval is useful for many practical applications. Here, we describe two such applications, namely human pose estimation and activity classification.

A. Human Pose Estimation

Data retrieval based on content, rather than on human annotation which might be absent or erroneous, has received much attention recently. The ability to automatically describe human activities in long video sequences is very useful for automatic video archiving, browsing, and retrieval. Though motion is a very important cue, human activities in videos can often be described by the body pose in still frames [5]. In our context, human pose estimation implies matching the corresponding human silhouettes in 2-D images based on their body posture, and not explicitly estimating the 3-D pose.
1) Evaluation Protocol: As the underlying pose space is continuous, exemplars cannot be easily classified into positive and negative samples. Here, we use the same evaluation protocol as followed by Tresadern and Reid [48]. If the body joint locations are known, then for each query image $q$, the sum of squared errors between corresponding joint center projections of the query image and each image $i$ in the database is calculated. Let this distance in the pose space be denoted by $D(q, i)$. The database poses are then ranked in order of similarity to the query as determined by the shape descriptor. Let the index of the closest training example be $i_1$ and the furthest be $i_N$, where $N$ is the number of images in the database. The curve $E(k)$, given by

$$E(k) = \frac{1}{k} \sum_{j=1}^{k} D(q, i_j) \qquad (6)$$

represents the mean distance of the $k$ highest ranking database examples to the query for $k = 1, \ldots, N$. Intuitively speaking, the function $E(k)$ determines how well the ranking obtained using the shape descriptors correlates with the one given by joint locations.
2) Experiments on MOCAP Data: We first evaluate the proposed shape indexing method using binary silhouettes of a human body model generated from motion capture data which contains information about the joint centers (http://mocap.cs.cmu.edu). Fig. 10 (left) shows a few examples of binary silhouettes. The training data consist of 1500 binary silhouettes of size 128 × 128 from different motions. The evaluation is performed on over 400 synthetically generated test silhouettes. The silhouettes generated from the synthetic data were automatically labeled with the image projections of the joint centers for evaluation. Fig. 10 (right) shows the normalized curve of $E(k)$ against $k$ for $k = 1, \ldots, N$, where $N$ is the
Fig. 10. (Left) Example silhouettes from the CMU MOCAP dataset. (Right) Evaluation of the proposed method for human pose estimation. Comparison with (a) Lipschitz embeddings (lipschitz) and (b) histogram of shape contexts (hists) is also shown.
Fig. 11. Sample frames from the figure skating data [5].
total number of training images. As mentioned earlier, the lower the curve, the better the performance. To illustrate the effectiveness of the proposed approach for human pose estimation, we compare the results with two different approaches, viz. Lipschitz embeddings [49] and histogram of shape contexts [50], that were recently evaluated for this task [48]. The comparison of these approaches with the proposed approach is shown in Fig. 10 (right). We see that the performance of the proposed approach compares favorably with other shape descriptors. The dash-dot curve indicates the best possible ranking, where distance in image space correlates perfectly with distance in pose space. Though the histogram of shape contexts-based approach gives similar performance, it is several times slower than the proposed indexing framework (985 s versus 393 s for the entire experiment).
3) Experiment on Figure Skating Data [5]: We also perform human pose estimation on a real figure skating dataset [5]. The videos are unconstrained and involve swift motion of the skater and real-world motion of the camera including pan, tilt, and zoom, making the dataset very challenging (Fig. 11). We first perform simple pre-processing of the raw video data to obtain the binary silhouettes of the skater. The foreground pixels are separated from the background by building color models for both, which is followed by median filtering to reject small isolated blobs. The extracted silhouettes are noisy and present quite a challenge for any shape matching algorithm. Since the pose space here is continuous, it is not straightforward to divide the data into separate classes and perform quantitative
Fig. 12. Visualization of similarity of the different poses of the skater using MDS. MDS places the input silhouettes in a new Euclidean space such that the inter-silhouette distances in the transformed space are as close as possible to the distances obtained using the proposed shape matching approach. We see that similar poses appear closer to each other, even after the dimensionality of the transformed space is reduced to two.
evaluation of the retrieval results. Here, we use MDS to analyze the effectiveness of the proposed method for representing the different poses of the skater. As described in Section IV, MDS places the input binary silhouettes in a new Euclidean space such that the inter-point distances (here each point represents an input silhouette) in the new space are as close as possible to the inter-silhouette distances obtained using the proposed shape matching approach. Fig. 12 shows the result of performing MDS on a subset of the figure skating data. Here the output space is taken to be two-dimensional for visualization purposes. As desired, similar poses appear closer to one another and different poses appear farther apart in the transformed space. We also perform a retrieval experiment to retrieve similar poses from the database. Fig. 13 shows the top 5 matches for a few query images (shown in the first column). In the figure, other than for the second query, the algorithm successfully returns images having a pose similar to that of the query. These examples show the ability of the proposed framework to effectively match complicated shapes using noisy silhouettes extracted from real data.
Fig. 13. Image retrieval based on pose. First column shows query image. Second to sixth columns show the top 5 matches.
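As a companion to the evaluation protocol of Section VIII-A, the following Python sketch computes the curve E(k) of (6), as reconstructed above, for a single query from the descriptor-induced ranking and the ground-truth pose distances; lower curves indicate better rankings. The input arrays are hypothetical precomputed quantities.

```python
import numpy as np

def pose_ranking_curve(shape_dist, pose_dist):
    """shape_dist : descriptor distances from one query to all database images.
    pose_dist  : ground-truth joint-location distances D(q, i) for the same query.
    Returns E(k), k = 1..N, the mean pose distance of the k highest-ranking
    examples under the descriptor's ranking."""
    order = np.argsort(shape_dist)                 # ranking induced by the shape descriptor
    ranked = np.asarray(pose_dist, dtype=float)[order]
    k = np.arange(1, len(ranked) + 1)
    return np.cumsum(ranked) / k
```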
B. Activity Classification

The goal of activity classification is to classify the content of human activity sequences in an unsupervised manner without any prior knowledge of the type of actions being performed. Many activity classification methods have addressed this task from a shape matching perspective [51]–[54]. Here, we present a very simple approach to show the usefulness of the proposed
indexing approach for the task of activity classification. In addition to analyzing the sequence of silhouettes to characterize the spatial information, we propose a novel temporal shape representation to capture the temporal characteristics of the observed activity. Note that any method which transforms the activity classification task into a shape matching problem can benefit
Fig. 14. Silhouettes (first column) and temporal shapes (second column) for a few activities as chosen by our algorithm.
TABLE VII ACTIVITY CLASSIFICATION PERFORMANCE OBTAINED FROM SILHOUETTES-BASED SPATIAL AND TEMPORAL CHARACTERIZATION. THE TWO NUMBERS IN EACH TABLE ENTRY SHOW THE PERFORMANCE OBTAINED USING THE PROPOSED SPATIAL AND TEMPORAL CHARACTERIZATIONS, RESPECTIVELY
from the computational efficiency provided by our framework, irrespective of the exact form of representation. The following discussion provides the details of the approach and the results of the experiments performed for its evaluation.
Spatial Characterization: Depending on the input video sequence, the foreground silhouettes are obtained using low-level image processing techniques. Temporal clustering is performed on these silhouettes to obtain a fixed number $K_1$ of clusters based on the pose. We use the distance transform to do the clustering, but the cluster representatives can also be taken as key frames or as any shape representation from the approaches which view activity classification as a shape matching problem. Temporal clustering results in $K_1$ representative silhouettes which provide the spatial characterization of the sequence of foreground silhouettes.
Temporal Characterization: The indexing approach presented is useful for efficient matching of shapes. In order to efficiently utilize the temporal information for activity classification, we transform it into another shape matching problem. An activity sequence can be represented using a 3-D space-time volume. The silhouettes are essentially slices of this volume taken at different instances along the temporal axis. In a similar manner, one can slice the 3-D space-time volume along one of the spatial axes (here the y-axis) to obtain 2-D space-time shapes which we call temporal shapes. Similar to the temporal clustering of the silhouettes, spatial clustering is performed on these temporal shapes to obtain $K_2$ key temporal shapes. Fig. 14 shows the landmark silhouettes and temporal shapes for a few activities. From the figure, we see that this representation seems to contain discriminative
information which can be utilized for classifying different activities. Each video sequence is thus represented with $K_1 + K_2$ 2-D shapes ($K_1$ silhouettes and $K_2$ temporal shapes). Note that these 2-D shapes are ordered (in time and space, respectively). Each shape is then indexed based on the computed features, resulting in $K_1 + K_2$ separate hash tables. During retrieval, each shape of the query video is used to retrieve similar shapes from the corresponding hash table in a manner similar to the one described in the previous sections. The similarity scores of the retrieved shapes are then fused in an additive manner to obtain the final similarity scores.
1) Experimental Evaluation: We evaluate the proposed approach on the activity dataset introduced in [54]. The dataset consists of 90 video sequences of nine different persons performing ten different activities, namely, run, walk, skip, jumping jack (or jack in short), jump forward on two legs (or jump in short), jump in place with two legs (pjump), gallop sideways (side), wave with two hands (wave2), wave with one hand (wave1), and bend. We follow a leave-one-out protocol as suggested in [54], i.e., for each query sequence, we remove the entire sequence from the database and compare it against the remaining 89 sequences. Table VII shows the performance obtained in this experiment using the proposed spatial and temporal characterization of activity sequences. The performance is measured by verifying whether the best match for each query sequence is from the same category or not. Clearly, the best performance possible is to get nine correct matches in all the diagonal entries (as there are nine instances per category that
The performance is comparable to that of the approach in [54], which computes features from the complete space-time volume for classification.

IX. SUMMARY AND DISCUSSION

We presented an efficient and robust approach for fast matching and retrieval of shapes. The following attributes of the approach contribute towards its robustness and hence the graceful degradation of performance in the presence of noise, outliers, and other deformations: 1) the pairwise geometric feature-based representation, 2) feature quantization, and 3) the invariance of the features to rigid transformations and articulations of part structures. A rich and robust feature representation is important even for the retrieval process; it allows robust matching with an extremely simple algorithm that does not require the correspondence matching used by most state-of-the-art techniques. In most existing techniques, the alignment process has to be repeated for every shape in the database at retrieval time, making them much slower than the proposed scheme. As dissimilar shapes are eliminated very early in our retrieval process, little effort is wasted in comparing a query to database shapes that are very different, which makes the system scalable. We also proposed a refinement stage to further highlight the usefulness of the proposed shape representation and indexing framework. The extensive experimental evaluations illustrate the effectiveness of the proposed approach. Due to the increase in the amount of data to be handled, most real-life applications require efficient algorithms that can scale up to large databases. The results obtained are quite promising and make a strong case for such an efficient indexing-based framework for shape matching.
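For readers who want a feel for why the indexing step avoids per-shape alignment, the sketch below caricatures the quantized-pairwise-feature voting summarized above. The single feature used here (the distance between two interest points, normalized by the shape's largest pairwise distance) and all function names are simplified placeholders; the paper's actual features are richer and articulation invariant.

```python
import numpy as np
from collections import defaultdict
from itertools import combinations

def pairwise_keys(points, n_bins=16):
    """Quantize a pairwise geometric feature into discrete hash keys
    (a deliberately simplified placeholder feature)."""
    pts = np.asarray(points, dtype=float)
    dists = [np.linalg.norm(pts[i] - pts[j])
             for i, j in combinations(range(len(pts)), 2)]
    d_max = max(dists) + 1e-8
    return [min(int(d / d_max * n_bins), n_bins - 1) for d in dists]

def build_index(db_shapes):
    """db_shapes: {shape_id: array of interest points}. Returns key -> {shape_id: count}."""
    index = defaultdict(lambda: defaultdict(int))
    for shape_id, points in db_shapes.items():
        for key in pairwise_keys(points):
            index[key][shape_id] += 1
    return index

def retrieve(query_points, index, top_k=10):
    """Vote-based retrieval: database shapes sharing many quantized pairwise
    features accumulate votes; no alignment or correspondence is computed."""
    votes = defaultdict(int)
    for key in pairwise_keys(query_points):
        for shape_id, count in index.get(key, {}).items():
            votes[shape_id] += count
    return sorted(votes.items(), key=lambda kv: -kv[1])[:top_k]
```

Because only database shapes that share at least one quantized feature with the query accumulate votes, very dissimilar shapes drop out of consideration almost for free, which is the intuition behind the scalability claims above.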
REFERENCES
[1] L. J. Latecki, R. Lakamper, and U. Eckhardt, "Shape descriptors for non-rigid shapes with a single closed contour," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2000, pp. 424–429.
[2] S. Belongie, J. Malik, and J. Puzicha, "Shape matching and object recognition using shape contexts," IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 4, pp. 509–522, Apr. 2002.
[3] H. Ling and D. W. Jacobs, "Shape classification using the inner-distance," IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 2, pp. 286–299, Feb. 2007.
[4] C. Rao, A. Yilmaz, and M. Shah, "View-invariant representation and recognition of actions," Int. J. Comput. Vis., vol. 50, no. 2, pp. 203–226, 2002.
[5] Y. Wang, H. Jiang, M. Drew, L. Ze-Nian, and G. Mori, "Unsupervised discovery of action classes," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2006, pp. 1654–1661.
[6] D. Sharvit, J. Chan, H. Tek, and B. B. Kimia, "Symmetry-based indexing of image databases," J. Vis. Commun. Image Represent., vol. 9, no. 4, pp. 366–380, 1998.
[7] T. B. Sebastian, P. N. Klein, and B. B. Kimia, "Recognition of shapes by editing their shock graphs," IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 5, pp. 550–571, May 2004.
[8] B. Leibe and B. Schiele, "Analyzing appearance and contour based methods for object categorization," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2003.
[9] S. Biswas, G. Aggarwal, and R. Chellappa, "Efficient indexing for articulation invariant shape matching and retrieval," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2007, pp. 1–8.
[10] G. Mori and J. Malik, "Recognizing objects in adversarial clutter: Breaking a visual captcha," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2003, pp. 134–141.
[11] Z. Tu and A. L. Yuille, "Shape matching and recognition: Using generative models and informative features," in Proc. Eur. Conf. Computer Vision, 2004, pp. 195–209.
[12] A. Thayananthan, B. Stenger, P. H. S. Torr, and R. Cipolla, "Shape context and chamfer matching in cluttered scenes," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2003, pp. 127–133.
[13] G. Mori, S. Belongie, and J. Malik, "Shape contexts enable efficient retrieval of similar shapes," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2001, pp. 723–730.
[14] H. Chui and A. Rangarajan, "A new point matching algorithm for non-rigid registration," Comput. Vis. Image Understand., vol. 89, pp. 114–141, 2003.
[15] M. Daliri and V. Torre, "Robust symbolic representation for shape recognition and retrieval," Pattern Recognit., vol. 41, no. 5, pp. 1799–1815, 2008.
[16] G. McNeill and S. Vijayakumar, "Hierarchical procrustes matching for shape retrieval," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2006, pp. 885–894.
[17] P. Felzenszwalb and J. Schwartz, "Hierarchical matching of deformable shapes," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2007, pp. 1–8.
[18] A. Peter, A. Rangarajan, and J. Ho, "Shape L'Âne Rouge: Sliding wavelets for indexing and retrieval," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2008, pp. 1–8.
[19] K. Siddiqi, A. Shokoufandeh, S. J. Dickinson, and S. W. Zucker, "Shock graphs and shape matching," Int. J. Comput. Vis., vol. 35, no. 1, pp. 13–32, 1999.
[20] N. Alajlan, M. Kamel, and G. Freeman, "Geometry-based image retrieval in binary image databases," IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, no. 6, pp. 1003–1013, Jun. 2008.
[21] C. Grigorescu and N. Petkov, "Distance sets for shape filters and shape recognition," IEEE Trans. Image Process., vol. 12, no. 7, pp. 729–739, Jul. 2003.
[22] J. Xie, P. Heng, and M. Shah, "Shape matching and modeling using skeletal context," Pattern Recognit., vol. 41, no. 5, pp. 1756–1767, 2008.
[23] E. Attalla and P. Siy, "Robust shape similarity retrieval based on contour segmentation polygonal multiresolution and elastic matching," Pattern Recognit., vol. 38, no. 12, pp. 2229–2241, 2005.
[24] T. Adamek and N. O'Connor, "A multiscale representation method for nonrigid shapes with a single closed contour," IEEE Trans. Circuits Syst. Video Technol., vol. 14, no. 5, pp. 742–753, May 2004.
[25] B. Super, "Retrieval from shape databases using chance probability functions and fixed correspondence," Int. J. Pattern Recognit. Artif. Intell., vol. 20, no. 8, pp. 1117–1138, 2006.
[26] X. Yang, S. Koknar-Tezel, and L. Latecki, "Locally constrained diffusion process on locally densified distance spaces with applications to shape retrieval," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2009, pp. 357–364.
[27] T. H. Cormen, C. E. Leiserson, and R. L. Rivest, Introduction to Algorithms. Cambridge, MA: MIT Press, 2001.
[28] R. S. Germain, A. Califano, and S. Colville, "Fingerprint matching using transformation parameter clustering," Comput. Sci. Eng., vol. 4, no. 4, pp. 42–49, 1997.
[29] Y. Lamdan and H. J. Wolfson, "Geometric hashing: A general and efficient model-based recognition scheme," in Proc. Int. Conf. Computer Vision, 1988, pp. 238–249.
[30] D. Nister and H. Stewenius, "Scalable recognition with a vocabulary tree," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2006, pp. 2161–2168.
[31] G. Mori, S. Belongie, and J. Malik, "Efficient shape matching using shape contexts," IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 11, pp. 1832–1837, Nov. 2005.
[32] R. Osada, T. Funkhouser, B. Chazelle, and D. Dobkin, "Shape distributions," ACM Trans. Graph., vol. 21, no. 4, pp. 807–832, 2002.
[33] R. Ohbuchi, T. Minamitani, and T. Takei, "Shape-similarity search of 3D models by using enhanced shape functions," in Proc. Theory and Practice of Computer Graphics, 2003, pp. 97–104.
[34] A. B. Hamza and H. Krim, "Geodesic object representation and recognition," in DGCI, LNCS 2886, 2003, pp. 378–387.
[35] C. Y. Ip, D. Lapadat, L. Sieger, and W. C. Regli, "Using shape distributions to compare solid models," in Proc. ACM Symp. Solid Modeling and Applications, 2002, pp. 273–280.
[36] Y. Liu, H. Zha, and H. Qin, "The generalized shape distributions for shape matching and analysis," in Proc. Int. Conf. Shape Modeling and Applications, 2002.
[37] J. Beis and D. Lowe, "Shape indexing using approximate nearest-neighbour search in high dimensional spaces," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 1997, pp. 984–989.
[38] I. Fudos and L. Palios, "An efficient shape-based approach to image retrieval," Pattern Recognit., vol. 23, no. 6, pp. 731–741, 2002.
[39] D. Rafiei and A. Mendelzon, "Efficient retrieval of similar shapes," Int. J. Very Large Data Bases, vol. 11, no. 1, pp. 17–27, 2002.
[40] S. Berretti, A. Del Bimbo, and P. Pala, "Retrieval by shape similarity with perceptual distance and effective indexing," IEEE Trans. Multimedia, vol. 2, no. 4, pp. 225–239, Dec. 2000.
[41] J. Wang, W. Chang, and R. Acharya, "Efficient and effective similar shape retrieval," in Proc. IEEE Int. Conf. Multimedia Computing and Systems, 1999.
[42] A. Elad and R. Kimmel, "On bending invariant signatures for surfaces," IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 10, pp. 1285–1295, Oct. 2003.
[43] E. Petrakis, A. Diplaros, and E. Milios, "Matching and retrieval of distorted and occluded shapes using dynamic programming," IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 11, pp. 1501–1516, Nov. 2002.
[44] F. Mokhtarian, F. Abbasi, and J. Kittler, "Efficient and robust retrieval by shape content through curvature scale space," in Proc. Image Databases and Multimedia Search, 1997, pp. 51–58.
[45] L. J. Latecki and R. Lakamper, "Shape similarity measure based on correspondence of visual parts," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 10, pp. 1185–1190, Oct. 2000.
[46] T. B. Sebastian, P. N. Klein, and B. B. Kimia, "On aligning curves," IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 1, pp. 116–125, Jan. 2003.
[47] Y. Gdalyahu and D. Weinshall, "Flexible syntactic matching of curves and its applications to automatic hierarchical classification of silhouettes," IEEE Trans. Pattern Anal. Mach. Intell., vol. 21, no. 12, pp. 1312–1328, Dec. 1999.
[48] P. Tresadern and I. Reid, "An evaluation of shape descriptors for image retrieval in human pose estimation," in Proc. British Machine Vision Conf., 2007.
[49] G. R. Hjaltason and H. Samet, "Properties of embedding methods for similarity searching in metric spaces," IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 5, pp. 530–549, May 2003.
[50] A. Agarwal and B. Triggs, "Recovering 3D human pose from monocular images," IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 1, pp. 44–58, Jan. 2006.
[51] S. Carlsson and J. Sullivan, "Action recognition by shape matching to key frames," in Proc. IEEE Comput. Soc. Workshop Models versus Exemplars in Computer Vision, 2001.
[52] A. F. Bobick and J. W. Davis, "The recognition of human movement using temporal templates," IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 3, pp. 257–267, Mar. 2001.
[53] A. Yilmaz and M. Shah, "Actions sketch: A novel action representation," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2005, pp. 984–989.
[54] L. Gorelick, M. Blank, E. Shechtman, M. Irani, and R. Basri, "Actions as space-time shapes," IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 12, pp. 2247–2253, Dec. 2007.
Soma Biswas (GS’09) received the B.E. degree in electrical engineering from Jadavpur University, Kolkata, India, in 2001, the M.Tech. degree from the Indian Institute of Technology, Kanpur, in 2004, and the Ph.D. degree in electrical and computer engineering from the University of Maryland, College Park, in 2009. She is currently working as a Research Assistant Professor at the University of Notre Dame. Her research interests are in signal, image, and video processing, computer vision, and pattern recognition.
Gaurav Aggarwal (S’02) received the B.Tech. degree in computer science and engineering from the Indian Institute of Technology, Madras, in 2002 and the M.S. and Ph.D. degrees in computer science from the University of Maryland, College Park, in 2004 and 2008, respectively. He is currently working as a Research Scientist with ObjectVideo, Reston, VA. His research interests are in image and video processing, computer vision, and pattern recognition.
Rama Chellappa (F’92) received the B.E. (Hons.) degree from the University of Madras, Madras, India, in 1975, the M.E. (Distinction) degree from the Indian Institute of Science, Bangalore, in 1977, and the M.S.E.E. and Ph.D. degrees in electrical engineering from Purdue University, West Lafayette, IN, in 1978 and 1981, respectively.

Since 1991, he has been a Professor of Electrical Engineering and an affiliate Professor of Computer Science at the University of Maryland, College Park. He is also affiliated with the Center for Automation Research (Director) and the Institute for Advanced Computer Studies (Permanent Member). In 2005, he was named a Minta Martin Professor of Engineering. Prior to joining the University of Maryland, he was an Assistant Professor (1981-1986), an Associate Professor (1986-1991), and Director of the Signal and Image Processing Institute (1988-1990) at the University of Southern California (USC), Los Angeles. Over the last 29 years, he has published numerous book chapters and peer-reviewed journal and conference papers. He has co-authored and edited books on MRFs, face and gait recognition, and collected works on image processing and analysis. His current research interests are face and gait analysis, markerless motion capture, 3-D modeling from video, image- and video-based recognition and exploitation, compressive sensing, and hyperspectral processing.

Prof. Chellappa served as an Associate Editor of four IEEE Transactions, as a Co-Editor-in-Chief of Graphical Models and Image Processing, and as the Editor-in-Chief of the IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE. He served as a member of the IEEE Signal Processing Society Board of Governors and as its Vice President of Awards and Membership. He is serving a two-year term as the President of the IEEE Biometrics Council. He has received several awards, including an NSF Presidential Young Investigator Award, four IBM Faculty Development Awards, an Excellence in Teaching Award from the School of Engineering at USC, and two paper awards from the International Association of Pattern Recognition. He received the Society, Technical Achievement, and Meritorious Service Awards from the IEEE Signal Processing Society. He also received the Technical Achievement and Meritorious Service Awards from the IEEE Computer Society. At the University of Maryland, he has been elected a Distinguished Faculty Research Fellow and a Distinguished Scholar-Teacher, and has received the Outstanding Faculty Research Award from the College of Engineering, an Outstanding Innovator Award from the Office of Technology Commercialization, and an Outstanding GEMSTONE Mentor Award. He is a Fellow of the International Association for Pattern Recognition and the Optical Society of America. He has served as a General and Technical Program Chair for several IEEE international and national conferences and workshops. He is a Golden Core Member of the IEEE Computer Society and served a two-year term as a Distinguished Lecturer of the IEEE Signal Processing Society.