A Bayesian Approach for Modeling Sensor Influence on Quality, Liveness and Match Score Values in Fingerprint Verification

Ajita Rattani and Arun Ross
Dept. of Computer Science and Engineering, Michigan State University, USA
[email protected], [email protected]

Norman Poh
Dept. of Computing, University of Surrey, UK
[email protected]
Abstract—Recently, a number of studies in fingerprint verification have combined match scores with quality and liveness measures in order to thwart spoof attacks. However, these approaches do not explicitly account for the influence of the sensor on these variables. In this work, we propose a graphical model that accounts for the impact of the sensor on match scores, quality and liveness measures. The proposed graphical model is implemented using a Gaussian Mixture Model based Bayesian classifier. The effectiveness of the proposed model has been assessed on the LivDet11 fingerprint database using the Biometrika and Italdata sensors.
WIFS'2013, November 18-21, 2013, Guangzhou, China. ISBN 978-1-4673-5593-3. © 2013 IEEE.

I. INTRODUCTION

Recent research has highlighted the vulnerability of biometric systems to spoof attacks. A spoof attack occurs when an adversary mimics the biometric trait of another individual in order to circumvent the system. For instance, it has been shown that a person can fool a fingerprint system by using a finger-like object made of gelatin or play-doh that has the fingerprint ridges of another individual impressed on it [1].

In the context of fingerprints, liveness detection algorithms have been proposed as a counter-measure against spoof attacks. These algorithms attempt to discriminate live biometric samples from spoof (fake) artefacts by examining the textural, anatomical and/or physiological attributes of the finger [2], [3]. The output of these liveness detection algorithms is a single-valued numerical entity referred to as a liveness measure.

Liveness detection algorithms are not designed to operate in isolation; rather, they have to be integrated with the overall fingerprint recognition system. Accordingly, recent studies have combined match scores generated by a fingerprint matcher with liveness values [4], [5] as well as image quality values [6], in order to render a decision on the recognition process. Typically, a learning-based scheme is used in such a fusion framework [7]. For example, in [4] the authors combine fingerprint match scores with liveness measures using a Bayesian Belief Network. In [6], fingerprint match scores are
combined with quality and liveness measures using a density-based fusion framework. The work in [6] also established the benefits of incorporating both image quality and liveness measures in the fusion framework. In [5], face match scores are combined with liveness measures using logistic regression.

However, in the aforementioned schemes, the influence of the sensor on the 3 variables - match scores, liveness values and quality - has not been considered. Such a consideration is essential for several reasons: (a) the quality of an image is impacted by the sensor used; (b) most liveness measures are learning-based and are impacted by the sensor that was used to collect live and spoof training data; (c) understanding sensor influence can help in facilitating sensor interoperability [8] for fingerprint matchers and liveness detectors.

In order to address this issue, we propose a fusion framework based on graphical models where the influence of the sensor on match scores, liveness measures and quality values is accounted for. The proposed graphical model is based on the assumption that data from a set of fingerprint sensors are available during the training stage. However, the actual sensor identity is not known during the testing stage (in principle, data from the sensor used during testing does not have to be available during training).

The contributions of this work are as follows: (1) development of a graphical model for fusing match scores, liveness measures and quality values while accounting for sensor influence; (2) implementation of the proposed model using a Gaussian Mixture Model (GMM) based Bayesian classifier in designing a fingerprint verification system that is robust to zero-effort impostors as well as non-zero-effort spoof attacks; and (3) evaluation of the proposed model using fingerprint data from two different sensors in the LivDet 2011 fingerprint database.

This paper is organized as follows: Section II presents the proposed graphical model. Section III discusses the database and experimental protocol used in this work. Experimental results are reported and discussed in Section IV. Conclusions are drawn in Section V.
II. THE PROPOSED GRAPHICAL MODEL AND ITS IMPLEMENTATION USING GMM
Graphical Model
A graphical model is an effective tool to express the relationship between different variables [9], [10]. It is often depicted as a directed graph with nodes representing probabilities of a variable and arrows representing conditional probabilities (e.g., an edge from A to B defines P(B|A)).

We formulate the spoof-resilient fingerprint verification problem as follows. An input fingerprint sample has to be compared against a live template fingerprint sample. Liveness measures and quality values are extracted from both samples. Further, a match score is computed between the two samples. Let y ∈ R be the match score, qt ∈ R (qi ∈ R) represent the quality of the template (input) fingerprint sample (q = [qt, qi]), and lt ∈ R (li ∈ R) denote the liveness measure of the template (input) fingerprint sample (l = [lt, li]). The output is one of three classes: the genuine class, G, when the input is deemed to be a live sample with the same identity as that of the template; the impostor class, I, when the input is deemed to be a live sample whose identity is not the same as that of the template; and the spoof class, S, when the input is not deemed to be a live sample. Let k ∈ {G, I, S}. Further, let d signify one of N sensors used for training dataset acquisition. While the set of sensors used for training dataset acquisition is known, the actual sensor used to acquire biometric data during system deployment (or testing) does not have to be known.

Table I shows three graphical models and the conditional probabilities between match score (y), quality (q), liveness measure (l) and sensor information (d), conditioned on the class label (k), to represent a) a conventional classifier (Model A); b) a fusion framework based on [6] that combines l, q and y (Model B); and c) the proposed classifier that incorporates the influence of the sensor on l, q and y (Model C).
Next, we explain these graphical models and how they can be realized through a GMM-based Bayesian classifier [7].

a) Conventional classifier: Model A in Table I represents the conventional generative classifier that attempts to model the score (y) conditioned on the class label (k ∈ {G, I}), i.e., p(y|k). A conventional classifier assumes the attacker is simply a zero-effort impostor (k = I) and does not consider the possibility of concerted spoof attacks. It can be implemented using the log-likelihood ratio based test statistic as follows:

$$ y_{allr} = \log \frac{p(y|k=G)}{p(y|k=I)} \quad (1) $$
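As a toy illustration of (1), the class-conditional densities can be fitted to training scores and the ratio evaluated at a test score. The sketch below approximates each p(y|k) with a single Gaussian rather than a GMM, and the score values are hypothetical stand-ins rather than actual matcher outputs.

```python
import numpy as np
from scipy.stats import norm

def model_a_llr(y, genuine_scores, impostor_scores):
    """Log-likelihood ratio of Eq. (1); each class-conditional density
    p(y|k) is approximated here by a single Gaussian fit."""
    g = norm(genuine_scores.mean(), genuine_scores.std(ddof=1))
    i = norm(impostor_scores.mean(), impostor_scores.std(ddof=1))
    return g.logpdf(y) - i.logpdf(y)

# Hypothetical training scores (stand-ins for Bozorth3 match scores).
rng = np.random.default_rng(0)
gen = rng.normal(60.0, 10.0, 500)   # genuine scores cluster high
imp = rng.normal(20.0, 8.0, 500)    # impostor scores cluster low

llr = model_a_llr(55.0, gen, imp)   # positive values favour the genuine class
```

Thresholding the ratio at 0 corresponds to equal class priors; a different prior or cost structure shifts the threshold accordingly.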
b) Fusion framework against spoof attacks: Model B in Table I models the joint density of match scores, quality and liveness measures conditioned on class k. This model is based on the framework proposed in [6] for fingerprint verification against zero-effort impostor and spoof attacks. (In this work, the template is assumed to be that of a live fingerprint sample; its liveness value is nevertheless computed.) The joint distribution represented by model B takes the following
TABLE I: Three graphical models that describe the relationship between match scores (y), quality (q), liveness measures (l) and sensor information (d), conditioned on the class label (k).

Model A (conventional):           k → y ; p(y|k)
Model B (fusion of y, q, l):      k → y, q → y, l → y ; p(y|k) = p(y|k, q, l)
Model C (proposed, sensor-aware): k → y, d → y ; p(y|k) = p(y|k, d) ; d → q, d → l ; p(q|d), p(l|d) = p(q, l|d)
form:

$$ p(y, k, q, l) = p(y|k, q, l)\, p(q)\, p(l)\, P(k) \quad (2) $$
Model B can be realized using the log-likelihood ratio based test statistic as follows:

$$ y_{bllr} = \log \frac{p(y, q, l \,|\, k = G)}{p(y, q, l \,|\, k \neq G)} = \log \frac{p(y \,|\, k = G, q, l)}{p(y \,|\, k \neq G, q, l)} + \underbrace{\log \frac{p(q \,|\, k = G)}{p(q \,|\, k \neq G)}}_{=0} + \underbrace{\log \frac{p(l \,|\, k = G)}{p(l \,|\, k \neq G)}}_{=0} \quad (3) $$
where (k ≠ G) ∈ {I, S}. The underbraced terms in (3) will be zero because the quality (q) and liveness (l) measures are assumed to carry no discriminatory information for distinguishing between the genuine and impostor classes; the log-ratio for these terms therefore vanishes. Further, p(y|k, q, l) cannot be directly estimated using off-the-shelf algorithms, because the conditioning variables q and l in (3) are continuous. Alternatively, model B can be effectively realized via the joint density estimate of y, q, l for class k, i.e., p(y, q, l|k). This method was reported to be more effective than model A (the baseline) under zero-effort impostors as well as spoof attacks in [6], as will also be confirmed by our experiments here.
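A minimal sketch of this realization of model B: one GMM per class is fitted on joint observation vectors (y, qt, qi, lt, li) and scored via the log-likelihood ratio. It uses scikit-learn's GaussianMixture with a fixed number of components instead of the MML-based fitting algorithm of [11], and synthetic 5-D data in place of LivDet11 features; both are simplifying assumptions for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_class_gmms(x_by_class, n_components=2, seed=0):
    """Fit one GMM p(y, q, l | k) per class k in {G, I, S} (cf. Eq. 8)."""
    return {k: GaussianMixture(n_components, covariance_type='full',
                               random_state=seed).fit(x)
            for k, x in x_by_class.items()}

def model_b_llr(x, gmms):
    """Eq. (3) realized as log p(x|k=G) - log p(x|k!=G); the non-genuine
    density pools the impostor and spoof classes with equal priors."""
    log_g = gmms['G'].score_samples(x)
    log_non_g = np.logaddexp(gmms['I'].score_samples(x),
                             gmms['S'].score_samples(x)) - np.log(2)
    return log_g - log_non_g

# Synthetic 5-D observations (y, qt, qi, lt, li) standing in for real data.
rng = np.random.default_rng(1)
data = {'G': rng.normal(1.0, 0.3, (300, 5)),
        'I': rng.normal(-1.0, 0.3, (300, 5)),
        'S': rng.normal(-0.5, 0.3, (300, 5))}
gmms = fit_class_gmms(data)
llr_genuine = model_b_llr(np.full((1, 5), 1.0), gmms)[0]
```

In this sketch the spoof class sits between the genuine and impostor clusters, mirroring the intuition that spoofs produce genuine-like match scores but fake-like liveness measures.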
c) Proposed model: Model C in Table I is an extension of model B that additionally models the influence of the sensor. This model captures the dependency of the score (y) on the class label (k ∈ {G, I, S}) as well as the sensor (d). Further, the quality (q) and liveness measures (l) are categorized according to the sensor (d) used in the training dataset. The joint density represented by model C is as follows:

$$ p(y, k, q, l, d) = p(y|k, d)\, p(q, l|d)\, P(d)\, P(k) \quad (4) $$

Model C can be effectively realized by extending (3) as:

$$ y_{cllr} = \log \frac{\sum_d p(y, q, l \,|\, k = G, d)\, P(d|q)}{\sum_d p(y, q, l \,|\, k \neq G, d)\, P(d|q)} \quad (5) $$

P(d|q) in (5) can be estimated using the Bayes rule as:

$$ P(d|q) = \frac{p(q|d)\, P(d)}{p(q)} \quad (6) $$

There is a summation over the sensor (d) in (5) because the sensor to be used during testing (or system deployment) is not known in advance; its probability is instead inferred from the quality (q) of the operational data, i.e., P(d|q) in (5). Model C is based on the conjecture that match scores, quality and liveness measures are sensor dependent (d → {y, q, l} in model C, Table I(c)). Further, there exists no significant correlation between quality (q) and liveness (l) measures (absence of an arrow between q and l in model C). These conjectures will be validated by empirical evidence (see Section IV).

The densities p(y, q, l|k) in (3), p(y, q, l|k, d) in (5), and p(q|d) in (6) (used to estimate P(d|q) in (5)) are themselves estimated using the Gaussian Mixture Model (GMM). The GMM has been successfully used to estimate joint densities [11]. Let φN(x, μ, Σ) be the N-variate Gaussian density with mean vector μ and covariance matrix Σ, i.e.,

$$ \phi_N(x, \mu, \Sigma) = (2\pi)^{-N/2} |\Sigma|^{-1/2} \exp\!\left(-\tfrac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right) \quad (7) $$

The estimate of p(x|k) (where x is an observation vector, which is (y, q, l) in our case) for class k is obtained as a mixture of Gaussians:

$$ p(x|k) = \sum_{j=1}^{M_k} w_{k,j}\, \phi_N(x, \mu_{k,j}, \Sigma_{k,j}) \quad (8) $$

where Mk is the number of mixture components used to model the densities of class k, and wk,j is the weight assigned to the jth mixture component in p(x|k), with Σj wk,j = 1. Selection of the appropriate number of components is one of the most challenging issues in mixture density estimation. The GMM fitting algorithm proposed in [11] automatically estimates the appropriate number of components and the component parameters using an EM algorithm and the minimum message length criterion. Hence, the GMM fitting algorithm proposed in [11] was used in this study³.

III. DATABASE, PROTOCOL AND PERFORMANCE METRICS
The LivDet11 dataset was used to evaluate fingerprint liveness detection algorithms submitted to the Second International Competition on Fingerprint Liveness Detection (LivDet11) [12]. It consists of 1000 live and 1000 fake fingerprint images in the training set and the same number of images in the test set. All images collected using the Biometrika and Italdata sensors were used in this study⁴. The live images were obtained from 200 different fingers with 5 samples per finger for each set. The fake fingerprints were fabricated using the following five materials: gelatine, silicone, woodglue, ecoflex and latex. 200 fake fingerprints were fabricated per material (200 × 5 = 1000) from 20 fingers with ten samples per finger for each set (training and testing).

The NIST Bozorth3⁵ software was used for obtaining a match score between a pair of fingerprint images. The quality of live as well as fake fingerprint impressions was measured using the IQF freeware developed by MITRE⁶. The quality factor ranges between 0 and 100, with 0 being the lowest and 100 being the highest quality. Finally, fingerprint liveness was assessed using the liveness measure proposed by Nikam and Aggarwal [3], which is based on Local Binary Pattern (LBP) features. A two-class Support Vector Machine (SVM) (implemented using the LIBSVM package⁷) was trained using LBP features extracted from live and fake images in the training set. The output score (probability estimate) of the SVM was then used as a liveness measure. The Equal Error Rate (EER) of this liveness measure was evaluated to be 10.95% and 18.95% on the test partition of the LivDet11 database for the Biometrika and Italdata sensors, respectively.

Protocol and performance metrics: Following the LivDet2011 protocol described in [12], we used 1000 live and 1000 fake images to train the models; the remaining 1000 live and 1000 fake images were used to evaluate the performance of the models.
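To make the liveness-measure pipeline concrete, the sketch below re-implements a basic 8-neighbour LBP histogram and trains a two-class SVM whose "live" probability serves as the liveness measure l. This is an illustrative approximation only: the exact LBP variant of [3] and the LIBSVM settings used in the paper may differ, and the textures here are synthetic stand-ins for fingerprint images.

```python
import numpy as np
from sklearn.svm import SVC

def lbp_histogram(img):
    """Normalized 256-bin histogram of basic 8-neighbour LBP codes."""
    c = img[1:-1, 1:-1]
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(shifts):
        nb = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        code |= (nb >= c).astype(np.uint8) << bit   # set bit if neighbour >= centre
    hist = np.bincount(code.ravel(), minlength=256).astype(float)
    return hist / hist.sum()

# Train a two-class SVM on live vs. fake LBP histograms; its probability
# estimate for the "live" class serves as the liveness measure l.
rng = np.random.default_rng(2)
live = [lbp_histogram(rng.integers(0, 256, (32, 32))) for _ in range(40)]
fake = [lbp_histogram(rng.integers(0, 2, (32, 32)) * 255) for _ in range(40)]
svm = SVC(probability=True, random_state=0).fit(live + fake, [1] * 40 + [0] * 40)
liveness = svm.predict_proba([lbp_histogram(rng.integers(0, 256, (32, 32)))])[0, 1]
```

The probability output (via Platt scaling in scikit-learn, analogous to LIBSVM's probability estimates) lands in [0, 1], matching the role of the liveness measure in the fusion framework.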
Footnotes:
³ We used the MATLAB code available at http://www.lx.it.pt/~mtf/mixturecode.zip
⁴ Data from other sensors in the LivDet11 dataset could not be used due to few correspondences in the subject identity between live and fake images.
⁵ http://www.nist.gov/itl/iad/ig/nbis.cfm
⁶ http://www.mitre.org/tech/mtf/
⁷ http://www.csie.ntu.edu.tw/~cjlin/libsvm/

a) Learning-based fusion framework: The observation vector consists of a match score (y) together with a pair of quality values (q) and a pair of liveness measures (l) extracted from a pair of training images - the input and the template. This observation vector is mapped to one of three output classes: G, I, or S (4000 observation vectors were used for each class). This information, i.e., (y, l, q, k), is used to train the GMM-based Bayesian classifiers in model B (3) and model C (5).

b) Performance assessment metric: In a spoof-resilient fingerprint verification system, as defined in this work, the G
class indicates an "Accept" while the I and S classes indicate a "Reject". Therefore, we define the overall false acceptance rate (OFAR) as the proportion of impostor and spoof input samples that are incorrectly classified as being genuine. The Genuine Acceptance Rate (GAR) is defined as the proportion of genuine input samples that are correctly classified as such. The False Reject Rate (FRR), which is 1 - GAR, denotes the proportion of genuine input samples that are incorrectly classified as being an impostor or a spoof. The Overall Equal Error Rate (O-EER) is the rate at which OFAR is equal to FRR. Further, with respect to the impostor class, there is a false accept rate of impostor samples (IFAR); with respect to the spoof class, there is a false accept rate of spoof samples (SFAR). SFAR (IFAR) is calculated as the proportion of spoof (impostor) samples incorrectly classified as genuine.

IV. EXPERIMENTAL RESULTS
1) Assessment of model B under same- and cross-sensor operation: First, we evaluated the performance of model B under same- and cross-sensor conditions. Figure 1a shows the ROC curves for model B trained and tested using Biometrika (legend "Model-B Bio-Bio"), and for model B trained using Italdata and tested using Biometrika (legend "Model-B Ital-Bio"). A comparative assessment has been made with the baseline (legend "Model-A Bio-Bio") under zero-effort impostor and spoof attacks. Similarly, Figure 1b shows the performance of model B trained and tested using Italdata (legend "Model-B Ital-Ital"), and trained using Biometrika and tested using Italdata (legend "Model-B Bio-Ital").

It can be seen from these figures that model B, when trained and tested using the same sensor (legends "Model-B Bio-Bio" in Figure 1a and "Model-B Ital-Ital" in Figure 1b), can significantly enhance the performance of fingerprint verification under zero-effort impostor as well as spoof attacks - by 59.2% and 41.95% over the baseline (legends "Model-A Bio-Bio" and "Model-A Ital-Ital") for the Biometrika and Italdata sensors, respectively.

However, the performance of model B significantly degrades under cross-sensor operation. For instance, there is a 65.44% increase in the O-EER of model B trained using Italdata and tested using Biometrika (legend "Model-B Ital-Bio", Figure 1a) over that trained and tested using Biometrika (legend "Model-B Bio-Bio", Figure 1a). Similarly, there is a 58.98% increase in the O-EER of model B trained using Biometrika and tested using Italdata (legend "Model-B Bio-Ital", Figure 1b) over that trained and tested using Italdata (legend "Model-B Ital-Ital", Figure 1b). The degradation under cross-sensor operation arises because match scores, quality and liveness measures are impacted by the acquisition sensor, and this influence has not been accounted for.
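The O-EER figures compared above can be obtained by sweeping the accept threshold until OFAR equals FRR. The following is a generic sketch of such a sweep (not the paper's evaluation code), under the assumption that larger fused scores indicate the genuine class:

```python
import numpy as np

def overall_eer(genuine_scores, non_genuine_scores):
    """O-EER: sweep the accept threshold over the observed scores and
    return the operating point where OFAR is (approximately) equal to FRR."""
    genuine_scores = np.asarray(genuine_scores, dtype=float)
    non_genuine_scores = np.asarray(non_genuine_scores, dtype=float)
    best_gap, best_rate = 2.0, 1.0
    for t in np.sort(np.concatenate([genuine_scores, non_genuine_scores])):
        ofar = np.mean(non_genuine_scores >= t)   # impostors/spoofs accepted
        frr = np.mean(genuine_scores < t)         # genuine samples rejected
        if abs(ofar - frr) < best_gap:
            best_gap, best_rate = abs(ofar - frr), (ofar + frr) / 2
    return best_rate

# Hypothetical scores: well-separated (same-sensor-like) vs. heavily
# overlapping (cross-sensor-like) conditions.
eer_same = overall_eer([0.9, 0.8, 0.7, 0.6], [0.1, 0.2, 0.3, 0.4])
eer_cross = overall_eer([0.9, 0.8, 0.3, 0.2], [0.1, 0.25, 0.7, 0.6])
```

Perfectly separable scores give an O-EER of 0, while heavy overlap pushes it towards 0.5; relative O-EER increases like the 65.44% figure above are ratios of two such operating points.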
Fig. 1: ROC curves for the matching performance of model B under same- and cross-sensor operating conditions, tested using the (a) Biometrika and (b) Italdata sensors. A comparative assessment has been made with model A under zero-effort impostor and spoof attacks.

2) Validation of the conjectures for model C: Model C is based on the conjectures that 1) match scores (y), quality (q) and liveness measures (l) are influenced by the acquisition sensor used (see Section II(c)), 2) there exists no significant correlation between the quality (q) and liveness measures (l) of the fingerprint samples, and 3) quality measures offer discriminatory information to infer the probability of the sensor identity, i.e., P(d|q) in (5).

Figure 2 shows that match scores, quality and liveness measures are influenced by the acquisition sensor used. Further, Figure 3 shows the quality (IQF) and liveness value (LM) of live and fake fingerprint samples of a randomly selected user acquired using the Biometrika and Italdata sensors. A scatter plot of the liveness measure and quality of the fingerprint samples acquired using the Biometrika sensor (Figure 4) shows that no significant correlation (ρ = 0.25) exists between quality (q) and liveness measures (l). A similar observation was made for the Italdata sensor. Note that the correlation between q and l may
Fig. 2: Score distribution, boxplot of quality and likelihood of liveness measures for fingerprint images from the Biometrika and Italdata sensors. It can be seen that scores, quality and liveness measures are sensor dependent.
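The no-correlation conjecture illustrated in Fig. 4 can be checked with a plain Pearson coefficient over paired (quality, liveness) values. The data below are synthetic stand-ins with a weak built-in correlation, not the paper's measurements:

```python
import numpy as np

# Pearson correlation between paired quality and liveness values, as used
# to examine conjecture 2 (synthetic, weakly correlated data).
rng = np.random.default_rng(4)
quality = rng.normal(45.0, 8.0, 500)
liveness = 0.5 + 0.004 * (quality - 45.0) + rng.normal(0.0, 0.1, 500)
rho = np.corrcoef(quality, liveness)[0, 1]
```

A |ρ| around 0.25, as reported for the Biometrika sensor, is conventionally treated as weak, supporting the omission of a q-l edge in model C.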
Fig. 3: Quality (IQF) and liveness measure (LM) of (a) live as well as (b) fake fingerprint samples of a subject acquired using the Biometrika and Italdata sensors.

Fig. 4: Scatter plot between the liveness measure and quality of the fingerprint samples acquired using the Biometrika sensor. It can be seen that no significant correlation (ρ = 0.25) exists between the two variables.
vary depending upon the dataset and the choice of liveness and quality measurement algorithms. In such cases, model C may derive further advantage by modeling the correlation between the two variables. These figures validate the first two conjectures for the graphical model C.

Next, we tested the feasibility of estimating the posterior probability of the sensor given quality measures, P(d|q) (see (5)). In our context, this classifier was built from the density p(q|d) using the Bayes rule (see (6)). The equal error rate of the classifier (6) in inferring the actual sensor identity is 8.4% and 5.4% for the Biometrika and Italdata sensors, respectively. This suggests the effectiveness of quality (q) values in inferring the actual sensor identity,
TABLE II: SFAR, IFAR and O-EER of model B and model C for the Biometrika and Italdata sensors. In all cases, the decision threshold has been set at the O-EER point.

Model               SFAR[%]   IFAR[%]   O-EER[%]
Model-C Both-Bio    12.99     1.47      7.39
Model-B Both-Bio    15.96     5.95      11.10
Model-C Both-Ital   14.88     5.80      10.46
Model-B Both-Ital   25.30     1.60      13.61
therefore reinforcing the influence of the sensor on image quality.

3) Assessment of model C in a multi-sensor environment: In this section, we evaluate the performance of model C. A comparative assessment has been made with model B trained using information from both sensors and tested using each sensor individually.

Fig. 5: ROC curves for the performance of model C under operation with the Biometrika and Italdata sensors. A comparative assessment is made with model B when trained using images from both sensors.

In comparison to model C,
model B does not take sensor information into account when using training data from multiple sensors; nor does it infer the sensor identity during the testing stage. Figure 5 shows the performance of models B and C (when trained using data from both sensors) for the Biometrika and Italdata sensors. The O-EER of model C operating with the Biometrika (legend "Model-C Both-Bio") and Italdata sensors (legend "Model-C Both-Ital") is evaluated to be 7.39% and 10.46%, respectively. Further, model B, when trained using information from both sensors and tested on images from the Biometrika (legend "Model-B Both-Bio") and Italdata sensors (legend "Model-B Both-Ital"), resulted in a performance degradation of 33.4% and 23.14%, respectively, with respect to model C. This shows the advantage of model C over model B.

Further, Figures 5 and 1 indicate that the O-EER of model C (legends "Model-C Both-Bio" and "Model-C Both-Ital", Figure 5) is equivalent to the O-EER of model B trained and tested using the same sensor (legends "Model-B Bio-Bio" and "Model-B Ital-Ital", Figure 1). For instance, the O-EER of model B trained and tested using the Biometrika sensor (legend "Model-B Bio-Bio" in Figure 1) is 7.25%, while that of model C tested using Biometrika (legend "Model-C Both-Bio" in Figure 5) is 7.39%. These observations convey the effectiveness of model C in facilitating operation with multiple sensors during the deployment stage.

In Table II, we tabulate the SFAR, IFAR and O-EER of model B and model C when the decision threshold is set at the O-EER point. It can be seen that "Model-C Both-Bio" and "Model-C Both-Ital" obtain reduced SFAR, IFAR and O-EER compared to "Model-B Both-Bio" and "Model-B Both-Ital", respectively.
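A compact sketch of the model C scoring rule in (5)-(6): per-(class, sensor) GMMs supply p(y, q, l | k, d), per-sensor quality GMMs supply p(q|d), and the resulting sensor posterior weights the marginalization. Everything below (sensor names, means, priors) is a synthetic stand-in for the trained LivDet11 models, and scikit-learn's GaussianMixture replaces the MML-based fitting of [11].

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def model_c_llr(x, q, class_gmms, quality_gmms, priors):
    """Sensor-marginalized log-likelihood ratio of Eq. (5); the sensor
    posterior P(d|q) comes from the Bayes rule of Eq. (6)."""
    sensors = sorted(quality_gmms)
    # P(d|q) is proportional to p(q|d) P(d), normalized over training sensors.
    post = np.array([np.exp(quality_gmms[d].score_samples(q)[0]) * priors[d]
                     for d in sensors])
    post /= post.sum()
    num = sum(p * np.exp(class_gmms[('G', d)].score_samples(x)[0])
              for p, d in zip(post, sensors))
    # Pool impostor and spoof densities for the k != G hypothesis.
    den = sum(p * 0.5 * (np.exp(class_gmms[('I', d)].score_samples(x)[0]) +
                         np.exp(class_gmms[('S', d)].score_samples(x)[0]))
              for p, d in zip(post, sensors))
    return float(np.log(num / den))

# Synthetic stand-ins: 3-D (y, q, l) vectors per (class, sensor), with
# sensor-specific quality distributions so that P(d|q) is informative.
rng = np.random.default_rng(3)
means = {('G', 'Bio'): 1.0, ('I', 'Bio'): -1.0, ('S', 'Bio'): -0.6,
         ('G', 'Ital'): 1.2, ('I', 'Ital'): -0.8, ('S', 'Ital'): -0.4}
class_gmms = {kd: GaussianMixture(1, random_state=0).fit(rng.normal(m, 0.3, (200, 3)))
              for kd, m in means.items()}
quality_gmms = {'Bio': GaussianMixture(1, random_state=0).fit(rng.normal(60, 5, (200, 1))),
                'Ital': GaussianMixture(1, random_state=0).fit(rng.normal(40, 5, (200, 1)))}
priors = {'Bio': 0.5, 'Ital': 0.5}

s_genuine = model_c_llr(np.full((1, 3), 1.0), np.array([[58.0]]),
                        class_gmms, quality_gmms, priors)
```

With a quality of 58, the posterior concentrates on the "Bio" sensor, so the score is dominated by the Biometrika-specific densities; a quality near 40 would shift the weighting towards "Ital".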
V. CONCLUSION

Recent studies have addressed the security of fingerprint verification systems by combining match scores with quality values and liveness measures in a learning-based fusion framework. We advance the state of the art by modeling the sensor influence on these variables. This is realized through a graphical model that accounts for the impact of the sensor on match scores, quality and liveness measures. Experimental investigations on the LivDet11 fingerprint database indicate that a) existing learning-based fusion frameworks cannot operate effectively in a multi-sensor scenario, and b) the proposed graphical model can effectively operate in a multi-sensor environment, with performance comparable to that of a fusion framework trained and tested using images from the same sensor. The effectiveness of the proposed model can be further enhanced by improving the sensitivity of the underlying liveness measure and by incorporating user-specific score characteristics for spoof attacks [13].

ACKNOWLEDGMENT

Ross was supported by US NSF CAREER Award # IIS 0642554. Poh was partially supported by Biometrics Evaluation and Testing (BEAT), an EU FP7 project with grant no. 284989.

REFERENCES

[1] T. Matsumoto, H. Matsumoto, K. Yamada, and S. Hoshino, "Impact of artificial "gummy" fingers on fingerprint systems," in Proc. of SPIE Opt. Sec. Counterfeit Deterrence Tech. IV, 2002, pp. 275-289.
[2] A. Abhyankar and S. Schuckers, "Fingerprint liveness detection using local ridge frequencies and multiresolution texture analysis techniques," in Proc. of IEEE Intl. Conf. on Image Processing, Georgia, USA, 2006, pp. 321-324.
[3] S. Nikam and S. Aggarwal, "Local binary pattern and wavelet-based spoof fingerprint detection," Intl. Journal of Biometrics, vol. 1, no. 2, pp. 141-159, 2008.
[4] E. Marasco, Y. Ding, and A. Ross, "Combining match scores with liveness values in a fingerprint verification system," in Proc. of IEEE Intl. Conf. on Biometrics: Theory, Applications and Systems (BTAS), Washington, USA, 2012, pp. 1-8.
[5] I. Chingovska, A. Anjos, and S. Marcel, "Anti-spoofing in action: joint operation with a verification system," in Proc. of IEEE Intl. Workshop on Biometrics (CVPRW), USA, 2013, pp. 1-8.
[6] A. Rattani and N. Poh, "Biometric system design under zero and non-zero effort attacks," in Proc. of Intl. Conf. on Biometrics (ICB), Madrid, Spain, 2013.
[7] A. Ross, K. Nandakumar, and A. K. Jain, Handbook of Multibiometrics. Springer Verlag, 2006.
[8] A. Ross and A. K. Jain, "Biometric sensor interoperability: A case study in fingerprints," in Proc. of Intl. ECCV Workshop on Biometric Authentication (BioAW), Prague, Czech Republic, 2004, pp. 134-145.
[9] F. Jensen, An Introduction to Bayesian Networks. Springer, 1996.
[10] N. Poh, J. Kittler, and T. Bourlai, "Quality-based score normalization with device qualitative information for multimodal biometric fusion," IEEE Transactions on Systems, Man, and Cybernetics, Part A, vol. 40, no. 3, pp. 539-554, 2010.
[11] M. Figueiredo and A. Jain, "Unsupervised learning of finite mixture models," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 24, no. 3, pp. 381-396, 2002.
[12] D. Yambay, L. Ghiani, P. Denti, G. L. Marcialis, F. Roli, and S. Schuckers, "LivDet2011 - fingerprint liveness detection competition 2011," in Proc. of 5th Intl. Conf. on Biometrics (ICB), Delhi, India, 2012, pp. 208-215.
[13] A. Rattani, N. Poh, and A. Ross, "Analysis of user-specific score characteristics for spoof biometric attacks," in Proc. of IEEE Computer Society Conf. on Computer Vision and Pattern Recognition Workshops (CVPRW), Providence, USA, 2012, pp. 124-129.