Fingerprint Indexing Based on Minutia Cylinder-Code

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,

Short Papers

Raffaele Cappelli, Member, IEEE, Matteo Ferrara, and Davide Maltoni, Member, IEEE

Abstract: This paper proposes a new hash-based indexing method to speed up fingerprint identification in large databases. A Locality-Sensitive Hashing (LSH) scheme has been designed relying on Minutia Cylinder-Code (MCC), which proved to be very effective in mapping a minutiae-based representation (position/angle only) into a set of fixed-length transformation-invariant binary vectors. A novel search algorithm has been designed thanks to the derivation of a numerical approximation for the similarity between MCC vectors. Extensive experiments have been carried out to compare the proposed approach against 15 existing methods over all the benchmarks typically used for fingerprint indexing. In spite of the smaller set of features used (top performing methods usually combine more features), the new approach outperforms existing ones in almost all of the cases.

Index Terms: Fingerprints, identification, indexing, locality-sensitive hashing, minutia cylinder-code.

1

INTRODUCTION

FINGERPRINTS are undoubtedly one of the most studied biometric traits and the most widely used in civil and forensic recognition systems. Although state-of-the-art fingerprint comparison algorithms are fast and quite accurate [1], identifying an unknown fingerprint over a large database poses challenging problems in terms of both accuracy and efficiency. To reduce the total number of comparisons, fingerprint identification systems typically use prefiltering techniques. These techniques can be grouped into two main categories: 1) exclusive classification and 2) fingerprint indexing (or continuous classification) [1].

Exclusive classification techniques split the database into a fixed number of classes; during the identification phase, the searched fingerprint is compared only to fingerprints belonging to the same class [2]. The main drawback is that the number of classes is usually small and fingerprints are unevenly distributed among them: When a single fingerprint has to be searched in a large database, exclusive classification is often unable to narrow down the search sufficiently. This led some researchers to investigate retrieval systems that are not based on exclusive classes, but represent each fingerprint in a robust and stable manner so that, in the search phase, it is possible to select the most similar candidates according to such a representation.

In the last decade, several fingerprint indexing techniques have been proposed; most of the approaches can be roughly classified on the basis of the features used:

• Global features, e.g., average ridge-line frequency over the whole fingerprint: These features are usually combined with other features to further narrow down the search (see, for instance, [3]).
• Local ridge-line orientations [3], [4], [5], [6], [7], other features derived from the orientation image [8], [9], or local ridge-line frequencies [7]: Typically, a single numerical vector is obtained from these features through a similarity-preserving transformation so that similar fingerprints are mapped into close points (vectors) in the corresponding multidimensional space.
• Minutiae [10], [11], [12], [13]: Most indexing approaches based on minutiae derive robust geometric features from triplets of minutiae points and use hashing techniques to perform the search.
• Other features obtained from the fingerprint pattern, such as FingerCode [13] and SIFT features [14].

A few other approaches use matching scores instead of specific features to derive index keys [15], [16].

In this paper, we propose a novel fingerprint indexing technique based on the Minutia Cylinder-Code (MCC) representation [17]. Starting from the minimal fingerprint representation (only minutiae positions and angles) defined by the ISO/IEC 19794-2 standard [18], MCC encodes the neighborhood of each minutia into a fixed-length bit vector, which is invariant with respect to rotation and translation. The bit vectors are indexed by means of Locality-Sensitive Hashing (LSH) [19], [20], and fingerprints are retrieved using a novel effective search algorithm. Extensive experiments on public databases show that the proposed technique outperforms state-of-the-art indexing methods in almost all of the cases.

The rest of this paper is organized as follows: Section 2 introduces the MCC representation and Section 3 describes the new indexing approach. In Section 4, experiments on public databases compare the new approach with 15 indexing methods published in the literature. Finally, Section 5 draws some conclusions.

The authors are with DEIS-Università di Bologna, via Sacchi 3, Cesena (FC) 47521, Italy. E-mail: {raffaele.cappelli, matteo.ferrara, davide.maltoni}@unibo.it. Manuscript received 26 Feb. 2010; revised 18 June 2010; accepted 22 Oct. 2010; published online 13 Dec. 2010. Recommended for acceptance by S. Prabhakar. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TPAMI-2010-02-0127. Digital Object Identifier no. 10.1109/TPAMI.2010.228. 0162-8828/11/$26.00 © 2011 IEEE. Published by the IEEE Computer Society. Vol. 33, no. 5, May 2011.

2

MINUTIA CYLINDER-CODE

Let T be an ISO/IEC 19794-2 minutiae template [18]: Each minutia m is a triplet $m = \{x_m, y_m, \theta_m\}$, where $x_m$ and $y_m$ are the minutia location and $\theta_m$ is the minutia direction (in the range $[0, 2\pi)$). The MCC representation [17] associates a local structure to each minutia m: This structure encodes spatial and directional relationships between the minutia and its neighborhood, and can be conveniently represented as a cylinder whose base and height are related to the spatial and directional information, respectively (see Fig. 1). The cylinder is divided into sections: Each section corresponds to a directional difference in the range $[-\pi, \pi)$; sections are discretized into cells. During the creation of a cylinder, a numerical value is calculated for each cell by accumulating contributions from minutiae in a neighborhood of the projection of the cell center onto the cylinder base. The contribution of each minutia $m_t$ to a cell (of the cylinder corresponding to a given minutia m) depends both on

• spatial information (how close $m_t$ is to the center of the cell), and
• directional information (how similar the directional difference between $m_t$ and m is to the directional difference associated to the section where the cell lies).

In other words, the value of a cell represents the likelihood of finding minutiae that are close to the cell and whose directional difference with respect to m is similar to a given value. Fig. 2a shows the cylinder associated to a minutia with five minutiae in its neighborhood. Given a template T, MCC creates a cylinder set (CS), which contains the cylinders associated to all minutiae in T with a


Fig. 1. A graphical representation of the data structure associated to a minutia in the MCC representation. Each cell of the cylinder is a cuboid with $\Delta_S \times \Delta_S$ base and $\Delta_D$ height.

sufficient number of neighbors (see [17]). Each cylinder is a local data structure

• invariant for translation and rotation since 1) it only encodes distances and directional differences between minutiae and 2) its base is rotated according to the corresponding minutia direction (see [17]);
• robust against skin distortion (which is small at a local level) and against small feature extraction errors, thanks to the smoothed nature of the functions defining the contribution of each minutia (see [17]);
• with a fixed length given by the total number of cells in the cylinder.

Another advantage of the MCC representation is that the value of each cell can be stored as a bit with a negligible loss of accuracy (see [17]). Such a bit-based representation is particularly well suited for indexing and is taken as the basis for the new approach proposed in this paper.¹ Fig. 2b shows the bit-based version of the cylinder in Fig. 2a.

In the following, each cylinder is treated as a set of n binary cells. Let $v_m \in \{0, 1\}^n$ be the binary vector obtained by linearizing the cells of cylinder $C_m$ corresponding to a given minutia m. Hence, from a template T, a set of binary vectors V can be derived:

$$V = \{ v_m \mid v_m \text{ obtained from } C_m,\ C_m \in CS \}, \qquad (1)$$

where $v_m$ is the binary vector obtained from the cylinder of minutia m and CS is the cylinder set of template T.

3

THE INDEXING APPROACH

Given two templates $T_1$ and $T_2$, let $V_1$ and $V_2$ be the corresponding sets of n-dimensional binary vectors. Due to the invariant nature of the MCC features, a simple but effective similarity measure between $T_1$ and $T_2$ can be defined as follows:

$$\mathrm{hds}(T_1, T_2) = \frac{\sum_{v \in V_1} \max_{v_j \in V_2} \{ \mathrm{nhs}(v, v_j) \}}{|V_1|}, \qquad (2)$$

where $|V_1|$ denotes the cardinality of set $V_1$,

$$\mathrm{nhs}(a, b) = \left( 1 - \frac{d_H(a, b)}{n} \right)^p, \qquad (3)$$

and $d_H(a, b)$ is the Hamming distance. nhs is a similarity measure between two binary vectors normalized between zero and one, based on the Hamming distance; p is a parameter controlling the shape of the similarity function ($p > 0$); in particular, the higher p is, the quicker nhs drops to zero as $d_H$ increases. Equation (2) defines a matching score in the range $[0, 1]$ (0 means no similarity, 1 means maximum similarity), which is obtained by selecting, for each vector in $V_1$, the maximum normalized Hamming similarity (3) with vectors in $V_2$, and calculating the average of such values.

1. In the MCC bit-based representation proposed in [17], each cell is actually associated to two bits: one denoting the cell value and one the cell "validity." For simplicity, in the proposed indexing approach the cell validity is disregarded and all of the cells are considered valid.

Fig. 2. A graphical representation of a cylinder: (a) Lighter areas represent higher values; the corresponding minutia and its neighborhood are shown below the base of the cylinder. (b) The same cylinder using a bit-based representation (black = 0, white = 1).

Note that (2) is not necessarily the best choice to compare MCC templates: The reader should refer to [17] for a description of more effective general-purpose similarity measures. However, as will be explained in the following paragraphs, this particular measure is well suited for the derivation of a novel indexing technique.

According to the studies in [14], [19], [20], an effective method for indexing binary vectors using Hamming-based metrics is LSH. LSH is based on the simple idea that if two vectors are similar, then after a "projection" into a lower-dimensional subspace, they will remain similar. The projection of a binary vector v into a subspace with h dimensions ($h < n$) simply consists of selecting a subset of h bits from the n bits in v. More formally, let $H = \{i_1, i_2, \ldots, i_h\} \subseteq \{1, \ldots, n\}$: The projection of a given binary vector v on H is defined as $v[H] = [v_{i_1}, v_{i_2}, \ldots, v_{i_h}]$. The set of indices H defines a hash function $f_H : \{0, 1\}^n \to \mathbb{N}$ that maps a binary vector to the natural number whose binary representation is $v[H]$ (see Fig. 3, top right). In the LSH approach, $\ell$ hash functions are defined by randomly choosing $\ell$ subsets $H_1, H_2, \ldots, H_\ell$, and the index consists of $\ell$ hash tables $\mathbb{H}_1, \mathbb{H}_2, \ldots, \mathbb{H}_\ell$. Given a set of binary vectors to be indexed, each vector v is placed into bucket $f_{H_k}(v)$ of each hash table $\mathbb{H}_k$, for $k = 1, \ldots, \ell$.
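As an illustration of (2), (3), and the projection-based hash function, the following minimal Python sketch may help. It is not taken from the paper: the function names (`hamming`, `nhs`, `hds`, `lsh_hash`), the list-of-bits representation, and the value p = 2 used in the example are assumptions chosen for readability, and the bit order in `lsh_hash` (first selected bit as most significant) is one possible convention.

```python
from typing import Sequence

BitVector = Sequence[int]  # illustrative type: a sequence of 0/1 values


def hamming(a: BitVector, b: BitVector) -> int:
    """Hamming distance d_H(a, b) between two equal-length bit vectors."""
    return sum(x != y for x, y in zip(a, b))


def nhs(a: BitVector, b: BitVector, p: float) -> float:
    """Normalized Hamming similarity, following (3)."""
    return (1.0 - hamming(a, b) / len(a)) ** p


def hds(V1: Sequence[BitVector], V2: Sequence[BitVector], p: float) -> float:
    """Template similarity, following (2): average over V1 of the best match in V2."""
    return sum(max(nhs(v, vj, p) for vj in V2) for v in V1) / len(V1)


def lsh_hash(v: BitVector, H: Sequence[int]) -> int:
    """f_H: read the bits of v at the 1-based positions in H as a binary number.

    Assumption: the first selected bit is the most significant one.
    """
    value = 0
    for i in H:
        value = (value << 1) | v[i - 1]
    return value


a = [1, 0, 1, 1]
b = [1, 1, 1, 0]
similarity = nhs(a, b, p=2)             # d_H = 2 and n = 4
template_score = hds([a], [a, b], p=2)  # a finds itself in the second set
bucket = lsh_hash([1, 0, 1, 1, 0], H=[1, 3, 4])
```

Computing `hds` this way requires a Hamming distance for every pair of vectors, which is the cost that the collision-based estimate developed in this section is meant to avoid.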
To perform a similarity search, the hash functions are applied to the query vector and all the vectors in the corresponding buckets are retrieved as candidates (see Fig. 3). The candidates are finally ranked according to their Hamming distance. In practice, LSH allows us to drastically reduce the number of distances to be calculated by considering only those vectors that collide with the query vector under one or more of the hash functions. Moreover, if a certain degree of approximation is tolerated, the computation of distances can be completely avoided since the Hamming distance between two binary vectors can be estimated as follows [21]:

$$d_H(v, v_j) \cong (n - h) \left( 1 - \left( \frac{C_{\mathbb{F}}(v, v_j)}{\ell} \right)^{1/h} \right), \qquad (4)$$

where

$$C_{\mathbb{F}}(v, v_j) = \sum_{k=1}^{\ell} \delta\left[ f_{H_k}(v) - f_{H_k}(v_j) \right] \qquad (5)$$

is the number of hash functions under which the two vectors collide and

$$\delta[a] = \begin{cases} 1, & \text{if } a = 0, \\ 0, & \text{otherwise} \end{cases} \qquad (6)$$

is a unit impulse function. The approximation in (4) holds for moderately large values of n and under the hypothesis that $d_H(v, v_j) \leq n - h$, which can be reasonably assumed in this application.

Fig. 3. An example of LSH indexing with $n = 15$, $h = 3$, $\ell = 3$, $H_1 = \{i_2, i_7, i_9\}$, $H_2 = \{i_3, i_{10}, i_{12}\}$, $H_3 = \{i_1, i_5, i_{15}\}$. From top to bottom: 1) a database of 10 binary vectors ($v_1, \ldots, v_{10}$) and the corresponding values of each hash function, which identify the indices of the buckets hit; 2) the three hash tables with the contents of each bucket; 3) a searched vector ($v_S$) and the corresponding buckets (denoted by the three arrows). Both hash functions $f_{H_1}$ and $f_{H_2}$ return $v_1$ and $v_{10}$ (which are actually the most similar to $v_S$ according to the Hamming distance), while $f_{H_3}$ produces a false candidate.

Under the above hypothesis, given a database of fingerprint templates $DB = \{T_1, T_2, \ldots\}$ whose MCC-derived binary vectors have been indexed through LSH, consider a slightly modified version of (3), where the normalizing factor n is replaced by $n - h$:

$$\mathrm{nhs}(a, b) = \left( 1 - \frac{d_H(a, b)}{n - h} \right)^p. \qquad (7)$$

The matching score in (2) can now be estimated as follows by substituting (4) into (7):

$$\mathrm{hds}(T_1, T_2) \cong \frac{\sum_{v \in V_1} \left( \max_{v_j \in V_2} \{ C_{\mathbb{F}}(v, v_j) \} \right)^{p/h}}{|V_1| \cdot \ell^{p/h}}. \qquad (8)$$

Therefore, according to (8), it is possible to estimate the similarity between two templates by simply counting the number of collisions of each pair of binary vectors. Intuitively, if two vectors collide under many hash functions, then their normalized Hamming similarity is likely to be high, while if the number of collisions is small, then the two vectors are probably not very similar. Equation (8) allows us to design the indexing approach described in the following paragraphs. It is worth noting that the proposed approach is not a direct application of LSH to the fingerprint indexing task, but a novel algorithm based on (8) and on the peculiarities of the MCC representation.

3.1

Creating the Index

During the creation of the index (see Fig. 4), each binary vector $v_j$ of each template $T_i$ is given as input to all of the hash functions, and the pair $(i, j)$, which identifies both template $T_i$ and vector $v_j$ (corresponding to minutia $m_j$ of $T_i$), is stored in the corresponding buckets. Since binary vectors obtained with the MCC representation tend to be quite sparse (i.e., they have more 0s than 1s), buckets whose binary representations have only a few 1 bits are more likely to contain a large number of pairs and are therefore less selective. For this reason, a parameter ($minPC$) is used to discard buckets with a low number of 1 bits (population count).

Fig. 4. Algorithm for index creation (offline stage). Function popCount(b) calculates the population count of number b.
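The index creation step described above can be sketched in Python as follows. This is only an illustrative reading of the text, not the algorithm of Fig. 4 verbatim: the names `create_index`, `lsh_hash`, and `pop_count`, the dictionary-based hash tables, and the choice to keep a bucket when its population count is at least `min_pc` (rather than strictly greater) are assumptions.

```python
from collections import defaultdict


def pop_count(b: int) -> int:
    """Number of 1 bits in the binary representation of b."""
    return bin(b).count("1")


def lsh_hash(v, H):
    """Bucket index: bits of v at the 1-based positions in H, read as a binary number."""
    value = 0
    for i in H:
        value = (value << 1) | v[i - 1]
    return value


def create_index(templates, subsets, min_pc):
    """Build one hash table per index subset.

    templates: list of templates, each a list of binary vectors (lists of 0/1).
    subsets:   list of index subsets H_1..H_l (1-based positions).
    Returns a list of dicts mapping bucket -> list of (i, j) pairs.
    """
    tables = [defaultdict(list) for _ in subsets]
    for i, vectors in enumerate(templates):
        for j, v in enumerate(vectors):
            for table, H in zip(tables, subsets):
                bucket = lsh_hash(v, H)
                # Assumption: a bucket is kept when it has at least min_pc one-bits.
                if pop_count(bucket) >= min_pc:
                    table[bucket].append((i, j))
    return tables


subsets = [[1, 2, 3], [4, 5, 6]]
templates = [
    [[1, 1, 0, 0, 0, 1]],                      # template 0, one vector
    [[1, 1, 0, 1, 1, 0], [0, 0, 0, 1, 1, 1]],  # template 1, two vectors
]
tables = create_index(templates, subsets, min_pc=2)
```

In this toy example, the projections `001` and `000` are dropped because they contain fewer than two 1 bits, which mirrors the role of the population-count filter on sparse vectors.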

3.2

Searching

At retrieval time, the hash functions are applied to each binary vector of the searched template, and the number of collisions with database vectors is counted using an accumulator matrix (see Fig. 5). In this way, the most similar database templates according to (8) are efficiently determined and the candidate list is easily produced.
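A possible Python sketch of this collision-counting search is given below. It is a simplified illustration rather than the algorithm of Fig. 5: dictionaries stand in for the accumulator matrix, the geometric compatibility check discussed in this section is omitted, the hash tables are written by hand, and the names (`search`, `lsh_hash`, `pop_count`) and the parameter values are hypothetical.

```python
from collections import defaultdict


def pop_count(b: int) -> int:
    """Number of 1 bits in the binary representation of b."""
    return bin(b).count("1")


def lsh_hash(v, H):
    """Bucket index: bits of v at the 1-based positions in H, read as a binary number."""
    value = 0
    for i in H:
        value = (value << 1) | v[i - 1]
    return value


def search(query_vectors, tables, subsets, min_pc, p, max_c):
    """Rank database templates by the collision-based score in (8).

    tables[k] maps a bucket of hash function k to a list of (i, j) pairs,
    where i identifies a template and j one of its vectors.
    Returns up to max_c (template, score) pairs, best first.
    """
    h = len(subsets[0])
    ell = len(subsets)
    scores = defaultdict(float)
    for v in query_vectors:
        collisions = defaultdict(int)  # (i, j) -> number of colliding hash functions
        for table, H in zip(tables, subsets):
            bucket = lsh_hash(v, H)
            if pop_count(bucket) < min_pc:
                continue  # unselective bucket: skipped, as at indexing time
            for pair in table.get(bucket, ()):
                collisions[pair] += 1
        best = defaultdict(int)  # i -> max collisions over the vectors of template i
        for (i, _j), count in collisions.items():
            best[i] = max(best[i], count)
        for i, count in best.items():
            scores[i] += count ** (p / h)
    norm = len(query_vectors) * ell ** (p / h)
    ranked = sorted(((s / norm, i) for i, s in scores.items()), reverse=True)
    return [(i, s) for s, i in ranked[:max_c]]


subsets = [[1, 2, 3], [4, 5, 6]]
tables = [{6: [(0, 0), (1, 0)]}, {6: [(1, 0)], 7: [(1, 1)]}]
candidates = search([[1, 1, 0, 1, 1, 0]], tables, subsets, min_pc=2, p=3, max_c=10)
```

Here the query vector collides with vector 0 of template 1 under both hash functions and with template 0 under one only, so template 1 is ranked first; no Hamming distance is ever computed.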

Besides discarding buckets corresponding to a small number of 1 bits (see Section 3.1), to reduce the number of elements considered by the accumulator, only minutiae satisfying basic geometric constraints are considered. Function $Compatible(m, m_j)$ defines those constraints:

$$Compatible(m, m_j) = \begin{cases} \text{true}, & \text{if } d_\theta(m, m_j) \leq \delta_\theta \text{ and } d_{xy}(m, m_j) \leq \delta_{xy}, \\ \text{false}, & \text{otherwise}, \end{cases} \qquad (9)$$

where $d_\theta$ and $d_{xy}$ are the angular difference and the Euclidean distance between the two minutiae, respectively. Enforcing such constraints results in tolerating a maximum rotation $\delta_\theta$ and a maximum displacement $\delta_{xy}$ between the searched template and the database templates. Note that parameters $\delta_\theta$ and $\delta_{xy}$ can be adjusted for each specific search since they do not affect the indexing phase.

An estimation of the computational complexity of the proposed search algorithm is $O(|V| \cdot \ell \cdot b_s)$, where $b_s$ is the average number of entries per bucket (see Section 4.3 for example $b_s$ values obtained in the experimentation).

Fig. 5. Algorithm for searching a fingerprint (online stage).

4

EXPERIMENTAL RESULTS

This section describes the experiments carried out to evaluate the performance of the new indexing technique and compare it with the state of the art.

4.1

Data Sets and Performance Indicators

Most of the published methods for fingerprint indexing have been evaluated on the following data sets:

• NIST DB4: The NIST Special Database 4 [22], which contains images of rolled fingerprint impressions scanned from cards; the fingerprints are taken from 2,000 fingers uniformly distributed in the five fingerprint classes (arch, left loop, right loop, tented arch, and whorl); two different fingerprint instances (F and S) are present for each finger.
• NIST DB4 (Natural): A subset of NIST DB4 obtained by reducing the cardinality of the less frequent classes in nature, to resemble a natural distribution; the data set contains 1,204 fingerprint pairs.
• NIST DB14: The last 2,700 fingerprint pairs from NIST Special Database 14 [23]. This database contains images of rolled fingerprint impressions scanned from cards; the class distribution of DB14 resembles the fingerprint distribution in nature.
• FVC2000 DB2: The second FVC2000 database [24], which contains 800 fingerprints from 100 fingers (8 impressions per finger) captured using the capacitive scanner "TouchChip" by ST Microelectronics.
• FVC2000 DB3: The third FVC2000 database [24], which contains 800 fingerprints from 100 fingers (8 impressions per finger) captured using the optical scanner "DF-90" by Identicator Technology.
• FVC2002 DB1: The first FVC2002 database [25], containing 800 fingerprints from 100 fingers (8 impressions per finger) captured using the optical scanner "TouchView II" by Identix.

Table 1 reports indexing approaches for which published results on the above six data sets are available.

TABLE 1 The Six Data Sets Considered and Indexing Methods for Which Published Results Are Available

† The results of Germain et al. [10] on NIST DB4 and FVC2002 DB1 are reported in [11] and [26], respectively.
‡ The results in [12], [13], [27] were obtained using also the 80 fingerprints of training set "B" (see [24], [25]), resulting in 880 fingerprints from 110 fingers.

TABLE 2 Parameter Values Used in the Experimentation

Fig. 6. Indexing performance on NIST DB4.

Fig. 7. Indexing performance on NIST DB4 (natural).

Fig. 8. Indexing performance on NIST DB14.

The performance of fingerprint indexing approaches is typically evaluated by reporting the trade-off between error rate and penetration rate: This trade-off usually depends on a parameter such as the maximum number of candidates to be considered (the $maxC$ parameter in the proposed approach, see Fig. 5). The error rate is defined as the percentage of searched fingerprints that are not found; the penetration rate is the portion of the database that the system has to search on average. Note that $maxC$ determines the portion of the database to be explored in any case, including open-set scenarios [1]. Besides the above performance indicators, some authors also consider a retrieval scenario (incremental search [1]), where an ideal matching algorithm is used to stop the search as soon as the right candidate is retrieved. In such a scenario there are no retrieval errors since, in the worst case, the search can be extended to the whole database, and the average penetration rate is reported as a single performance indicator.
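To make these indicators concrete, the sketch below computes them from the rank at which the true mate of each searched fingerprint appears in the ordered candidate list. It rests on simplifying assumptions that are not stated in the paper: every candidate list is assumed to contain exactly `max_c` entries (so the penetration rate is `max_c` divided by the database size), ranks are 1-based, and the function names and example ranks are made up for illustration.

```python
def error_and_penetration(mate_ranks, db_size, max_c):
    """Error rate and penetration rate (both in percent) for a given max_c.

    mate_ranks[q] is the 1-based position of the true mate of query q in the
    full ranking of the database.
    """
    missed = sum(1 for rank in mate_ranks if rank > max_c)
    error_rate = 100.0 * missed / len(mate_ranks)
    # Assumption: exactly max_c candidates are examined for every query.
    penetration_rate = 100.0 * min(max_c, db_size) / db_size
    return error_rate, penetration_rate


def incremental_penetration(mate_ranks, db_size):
    """Average penetration rate (percent) when the search stops at the true mate."""
    return 100.0 * sum(mate_ranks) / (len(mate_ranks) * db_size)


ranks = [1, 3, 12, 2]  # hypothetical mate ranks for four queries
error_rate, penetration_rate = error_and_penetration(ranks, db_size=100, max_c=10)
average_penetration = incremental_penetration(ranks, db_size=100)
```

Sweeping `max_c` from 1 to the database size and plotting the resulting pairs would give a trade-off curve of the kind reported for each data set.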

4.2

Evaluation Setup and Parameters

The new indexing method has been evaluated on the six data sets described in the previous section. On each data set, a state-of-the-art minutiae extraction algorithm (developed by one of the best-performing FVC2006 [28] participants) has been used to create ISO/IEC 19794-2 templates for all fingerprints. On the three NIST data sets, a 50-pixel border has been removed from the images before extracting the minutiae: This helps to reduce the incidence of false minutiae (NIST images are scanned from cards and they often include handwritten notes, stamps, etc., near the borders).

MCC features have been derived from the ISO minutiae templates as described in [17] and summarized in Section 2: Cylinders with an eight-cell diameter and six sections have been created (resulting in a total of 312 cells, 52 per section). The same parameters reported in [17] have been used to create the cylinders, with the only exception of $\mu_\Psi$, which here has been set to $1/200$ instead of $1/100$. In fact, in the bit-based MCC implementation, this parameter is the threshold for a cell value to be one: the lower the threshold, the higher the probability of obtaining one-valued cells. According to some preliminary experiments, the new value is more appropriate for indexing since it allows a sufficient number of one-valued cells to be obtained without reducing the discriminability of the features.

The indexing approach described in Section 3 has been applied to the resulting bit vectors. Table 2 reports the parameter values used; all of the indexing parameters have been calibrated on a separate data set: the first 3,000 fingerprint pairs from NIST Special Database 14 [23]. The calibration procedure consisted of an exhaustive search over reasonable values for the various parameters. The LSH hash functions have been chosen randomly, but a constraint has been enforced to guarantee that cells are selected as evenly as possible. In particular, since the number of cells in a cylinder is 312 and 24 cells have to be selected for each of the 32 hash functions (for a total of $32 \times 24 = 768$ selections), the random selection resulted in 168 (of 312) cells selected in two functions and 144 (of 312) in three ($168 \times 2 + 144 \times 3 = 768$).
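One way to obtain such an even random selection is sketched below in Python. This is not the authors' procedure, which the text does not detail: it is a simple greedy scheme in which each new hash function takes the 24 cells that have been used least so far, with ties broken at random. The function name `choose_hash_subsets` and the fixed seed are illustrative choices.

```python
import random
from collections import Counter


def choose_hash_subsets(n_cells=312, h=24, n_funcs=32, seed=0):
    """Pick n_funcs subsets of h distinct cells so that cell usage stays balanced."""
    rng = random.Random(seed)
    usage = [0] * n_cells
    subsets = []
    for _ in range(n_funcs):
        # Rank cells by how often they were already used; break ties randomly.
        order = sorted(range(n_cells), key=lambda cell: (usage[cell], rng.random()))
        chosen = sorted(order[:h])
        for cell in chosen:
            usage[cell] += 1
        subsets.append(chosen)
    return subsets, usage


subsets, usage = choose_hash_subsets()
histogram = Counter(usage)  # how many cells are used twice, three times, ...
```

With 312 cells and 768 selections, any perfectly balanced assignment must use 168 cells twice and 144 cells three times, and the greedy scheme reaches exactly that split. Its randomness is more constrained than independent sampling, since consecutive functions tend to cover disjoint cells.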

4.3

Results

Figs. 6, 7, and 8 report the trade-off between penetration rate and error rate on the three NIST data sets. All of the results have been obtained by using the first impression for index creation and the second one for searching. The graphs clearly show that the proposed approach outperforms the other methods published in the literature. Note that on NIST DB4 Natural (see Fig. 7), two results are reported for Lee et al.: The former denotes the performance of the approach proposed in [7], the latter refers to a combination at measurement level with the classification approach described in [29].

Figs. 9, 10, and 11 report the trade-off between penetration rate and error rate on the three FVC data sets. All of the results have been obtained by using the first impression for index creation and the remaining seven for searching, except for Liang et al. [27] on FVC2002 DB1, whose results were obtained by randomly selecting three impressions for index creation and the remaining five for searching. Note that on FVC2000 DB2 (see Fig. 9), four results are shown for De Boer et al. [13]. In fact, De Boer et al. report indexing results using a fusion of three different techniques: 1) directional field (a continuous classification approach based on a feature vector derived from local ridge-line orientations), 2) finger code (the features proposed in [30]), and 3) minutiae triplets (the approach proposed in [10]). Results of the individual techniques and of their combination are all reported in Fig. 9. The three graphs clearly show that the proposed approach outperforms most of the other methods, in particular for low values of penetration rate. The only methods with comparable results (and, in some cases, slightly better for nonsmall penetration rates) are

• on FVC2000 DB2, the combined approach by De Boer et al.; however, it should be noted that this result was obtained not only by combining three different techniques, but also by manually correcting the core point in 13 percent of the fingerprints and discarding 1 percent of the fingerprints because no core point was found (see [13]);
• on FVC2002 DB1, Germain et al. (as reported in [26]), Feng and Cai, and Liang et al.; however, it is worth noting that, while the proposed approach uses exclusively minutiae (positions and angles), those three methods also adopt other features (ridge count in Germain et al., ridge skeleton near the minutiae in the other two); furthermore, the results by Liang et al. have been obtained with three impressions per fingerprint for index creation, instead of one.

TABLE 3 Average Penetration Rate for the Incremental Search Strategy: Best Results Are Highlighted with Bold Font

TABLE 4 NIST DB14: Execution Times of the Proposed Algorithm

TABLE 5 NIST DB14: Cylinder and Bucket Usage Statistics

Fig. 9. Indexing performance on FVC2000 DB2.

Fig. 11. Indexing performance on FVC2002 DB1.

Table 3 shows the average penetration rate for the incremental search strategy (see Section 4.1) on all data sets and compares it to other results reported in the literature: The proposed approach always achieves the best performance, except on FVC2000 DB2, where it is outperformed by the combined technique in [13].

To evaluate the indexer scalability, a further experiment on NIST DB14 has been carried out by searching the same fingerprints on a database of 24,000 fingerprints from [23] (all fingerprints available, excluding the 3,000 used for parameter calibration). The indexing accuracy was almost identical to that reported in Fig. 8. Table 4 reports average execution times in the two cases: 2,700 and 24,000 indexed fingerprints. The times refer to a multithreaded C# implementation with no particular optimizations, running on a 2.66 GHz Intel Quad Core CPU. The table shows that index creation and search are very fast: Searching for a fingerprint on NIST DB14 requires just 16 milliseconds with 2,700 fingerprints indexed, and 107 milliseconds with 24,000. These times are about two orders of magnitude lower than those of a one-to-many identification performed with the most efficient version of the MCC matching algorithm proposed in [17]. Table 5 shows some statistics about cylinder and bucket usage for the same experiments reported in Table 4.

Assuming a maximum of 255 minutiae per template and of $2^{24}$ templates, each index entry can be stored using four bytes and the index requires about 6.4 KB of memory ($\ell \times 51 \times 4$ bytes) per fingerprint, where 51 is the average number of cylinders indexed per fingerprint in Table 5. The memory size is larger than the MCC template size [17], but reasonable compared to the great speedup provided by the proposed approach, and it does not impede the development of large-scale applications. For instance, an AFIS with one million 10-fingerprint records could consist of 10 computational units, each requiring about 6 GB of RAM to index one million fingerprints.
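The memory figures above can be checked with a few lines of Python. The packing layout shown (24 bits for the template identifier followed by 8 bits for the minutia index) is an assumption: the text only states that four bytes per entry suffice. The value 32 is the number of hash functions given in Section 4.2, and the helper names are hypothetical.

```python
def pack_entry(template_id: int, minutia_id: int) -> bytes:
    """Pack an index entry (i, j) into four bytes: 24 bits + 8 bits (assumed layout)."""
    if not (0 <= template_id < 2**24 and 0 <= minutia_id < 2**8):
        raise ValueError("identifier out of range for a four-byte entry")
    return ((template_id << 8) | minutia_id).to_bytes(4, "big")


def unpack_entry(entry: bytes):
    """Recover (template_id, minutia_id) from a packed four-byte entry."""
    value = int.from_bytes(entry, "big")
    return value >> 8, value & 0xFF


ELL = 32            # number of hash functions (Section 4.2)
AVG_CYLINDERS = 51  # average cylinders indexed per fingerprint (Table 5)
ENTRY_BYTES = 4

bytes_per_fingerprint = ELL * AVG_CYLINDERS * ENTRY_BYTES    # one entry per table
kib_per_fingerprint = bytes_per_fingerprint / 1024
gib_per_million = bytes_per_fingerprint * 1_000_000 / 1024**3
```

The result is 6,528 bytes (about 6.4 KB) per fingerprint and roughly 6.1 GiB per million fingerprints, consistent with the figures quoted above. The estimate counts one entry per cylinder per hash table and therefore ignores both the buckets discarded by the population-count filter and any hash-table overhead.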


5

CONCLUSION

In this paper, a new fingerprint indexing approach has been proposed, based on LSH and the MCC representation. Unlike typical LSH-based approaches, the proposed algorithm does not require calculating any Hamming distance, resulting in an extremely fast execution. The indexing performance of the proposed algorithm is very good: Over six different data sets, the new method outperforms most of the existing ones. Only in a few cases are comparable results achieved by other methods, but it is worth noting that such techniques adopt other features besides minutiae, while the proposed algorithm is based only on the mandatory features of ISO/IEC 19794-2 [18] templates (minutiae positions and angles). The robustness of the proposed solution is confirmed by the fact that exactly the same parameter values (for both MCC and LSH) have been used across a set of heterogeneous data sets, including rolled, flat, offline-scanned, and live-scan fingerprints. Finally, the proposed approach seems to scale nicely with the database size since it achieves the same accuracy with an almost linear increase in search time when the indexed fingerprints grow from 2,700 to 24,000. We are confident that the proposed method could be further improved by combining the MCC representation with complementary features and by designing specific strategies to stop the search early during retrieval. Our future research will focus on these directions.

REFERENCES

[1] D. Maltoni, D. Maio, A.K. Jain, and S. Prabhakar, Handbook of Fingerprint Recognition, second ed. Springer, 2009.
[2] R. Cappelli and D. Maio, "The State of the Art in Fingerprint Classification," Automatic Fingerprint Recognition Systems, ch. 9, pp. 183-205, Springer, 2004.
[3] X. Jiang, M. Liu, and A.C. Kot, "Fingerprint Retrieval for Identification," IEEE Trans. Information Forensics and Security, vol. 1, pp. 532-542, 2006.
[4] A. Lumini, D. Maio, and D. Maltoni, "Continuous versus Exclusive Classification for Fingerprint Retrieval," Pattern Recognition Letters, vol. 18, pp. 1027-1034, 1997.
[5] R. Cappelli, D. Maio, and D. Maltoni, "A Multi-Classifier Approach to Fingerprint Classification," Pattern Analysis and Applications, vol. 5, pp. 136-144, 2002.
[6] R. Cappelli, A. Lumini, D. Maio, and D. Maltoni, "Fingerprint Classification by Directional Image Partitioning," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 21, no. 5, pp. 402-421, May 1999.
[7] S. Lee, Y. Kim, and G. Park, "A Feature Map Consisting of Orientation and Inter-Ridge Spacing for Fingerprint Retrieval," Proc. Audio and Video-Based Biometric Person Authentication, 2005.
[8] J. Li, W.Y. Yau, and H. Wang, "Fingerprint Indexing Based on Symmetrical Measurement," Proc. 18th Int'l Conf. Pattern Recognition, vol. 1, 2006.
[9] M. Liu, X. Jiang, and A.C. Kot, "Fingerprint Retrieval by Complex Filter Responses," Proc. 18th Int'l Conf. Pattern Recognition, vol. 1, 2006.
[10] R.S. Germain, A. Califano, and S. Colville, "Fingerprint Matching Using Transformation Parameter Clustering," IEEE Computational Science and Eng., vol. 4, no. 4, pp. 42-49, Oct.-Dec. 1997.
[11] B. Bhanu and X. Tan, "Fingerprint Indexing Based on Novel Features of Minutiae Triplets," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 25, no. 5, pp. 616-622, May 2003.
[12] X. Liang, T. Asano, and A. Bishnu, "Distorted Fingerprint Indexing Using Minutia Detail and Delaunay Triangle," Proc. Third Int'l Symp. Voronoi Diagrams in Science and Eng., 2006.
[13] J. De Boer, A.M. Bazen, and S.H. Gerez, "Indexing Fingerprint Databases Based on Multiple Features," Proc. Ann. Workshop Circuits, Systems and Signal Processing, pp. 300-306, 2001.
[14] X. Shuai, C. Zhang, and P. Hao, "Fingerprint Indexing Based on Composite Set of Reduced SIFT Features," Proc. Int'l Conf. Pattern Recognition, pp. 1-4, 2008.
[15] A. Gyaourova and A. Ross, "A Novel Coding Scheme for Indexing Fingerprint Patterns," Proc. Seventh Int'l Workshop Statistical Pattern Recognition, 2008.
[16] T. Maeda, M. Matsushita, and K. Sasakawa, "Identification Algorithm Using a Matching Score Matrix," IEICE Trans. Information and Systems, special issue on biometric person authentication, vol. 84, pp. 819-824, 2001.
[17] R. Cappelli, M. Ferrara, and D. Maltoni, "Minutia Cylinder-Code: A New Representation and Matching Technique for Fingerprint Recognition," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 32, no. 12, pp. 2128-2141, Dec. 2010.
[18] ISO/IEC 19794-2:2005, Information Technology - Biometric Data Interchange Formats - Part 2: Finger Minutiae Data, 2005.
[19] P. Indyk and R. Motwani, "Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality," Proc. 30th ACM Symp. Theory of Computing, pp. 604-613, 1998.
[20] A. Gionis, P. Indyk, and R. Motwani, "Similarity Search in High Dimensions via Hashing," Proc. 25th Int'l Conf. Very Large Data Bases, pp. 518-529, 1999.
[21] S. Mimaroglu and D. Simovici, "Approximate Computation of Object Distances by Locality-Sensitive Hashing," Proc. Int'l Conf. Data Mining, 2008.
[22] C.I. Watson and C.L. Wilson, "NIST Special Database 4, Fingerprint Database," US Nat'l Inst. of Standards and Technology (NIST), 1992.
[23] C.I. Watson, "NIST Special Database 14, Fingerprint Database," US Nat'l Inst. of Standards and Technology, 1993.
[24] D. Maio, D. Maltoni, R. Cappelli, J.L. Wayman, and A.K. Jain, "FVC2000: Fingerprint Verification Competition," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, no. 3, pp. 402-412, Mar. 2002.
[25] D. Maio, D. Maltoni, R. Cappelli, J.L. Wayman, and A.K. Jain, "FVC2002: Second Fingerprint Verification Competition," Proc. Int'l Conf. Pattern Recognition, vol. 16, 2002.
[26] J. Feng and A. Cai, "Fingerprint Indexing Using Ridge Invariants," Proc. Int'l Conf. Pattern Recognition, 2006.
[27] X. Liang, A. Bishnu, and T. Asano, "A Robust Fingerprint Indexing Scheme Using Minutia Neighborhood Structure and Low-Order Delaunay Triangles," IEEE Trans. Information Forensics and Security, vol. 2, no. 4, pp. 721-733, Dec. 2007.
[28] BioLab, FVC2006 Web Site, http://bias.csr.unibo.it/fvc2006/, Jan. 2009.
[29] G.T. Candela, P.J. Grother, C.I. Watson, R.A. Wilkinson, and C.L. Wilson, "PCASYS: A Pattern-Level Classification Automation System for Fingerprints," Technical Report 5647, Nat'l Inst. of Standards and Technology Internal Report (NISTIR), 1995.
[30] A.K. Jain, S. Prabhakar, and L. Hong, "A Multichannel Approach to Fingerprint Classification," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 21, no. 4, pp. 348-359, Apr. 1999.
