Multibiometrics for Identity Authentication: Issues, Benefits and Challenges

Josef Kittler and Norman Poh
Centre for Vision, Speech and Signal Processing
University of Surrey, Guildford GU2 7XH, United Kingdom

Abstract— Multi biometric systems exploit different biometric traits, multiple samples and multiple algorithms to establish the identity of an individual. Compared with any single biometric system, they have the advantages of increasing population coverage, offering user choice, making biometric authentication systems more reliable and resilient to spoofing, and, most importantly, improving authentication performance. However, both the design and the deployment of multi biometric systems raise many issues. These include system architecture, fusion methodology, selection of component biometric experts based on their accuracy and diversity, measurement of their quality, reliability and competence, as well as overall system usability and economic viability. These issues are addressed and possible ways forward discussed.

(This work was supported partially by the prospective researcher fellowship PBEL2-114330 of the Swiss National Science Foundation, by the EU-funded Mobio project (www.mobioproject.org) grant IST-214324, and by the Engineering and Physical Sciences Research Council (EPSRC) Research Grant GR/S46543.)

I. INTRODUCTION

The term multi biometrics refers to the design of personal identity verification or recognition systems that base their decision on the opinions of more than one biometric expert (an algorithm that processes a biometric sample acquired from the user to determine or confirm his or her identity). Multiple opinions can be derived in a number of ways. First of all, they can be generated from the same biometric sample by distinct algorithms, which may differ in the manner the biometric data is preprocessed, in the features (representation) extracted from the preprocessed data, or in the choice of the algorithm performing the matching of the biometric sample to one or more user templates. Additional opinions can be generated by a repeated application of the same chain of processing to more than one biometric sample of the same modality. The range of opinions can be extended further by collecting more than one type of biometric from the user. Thus, one way or another, multi biometric systems provide different sources of information about the user which have to be integrated. Although more information should make the decision making more reliable, its existence raises the challenging problem of information fusion.

Information fusion can in principle be performed at the data, feature or decision level. Although there may be merits in fusing information at low levels, from the multi biometric system design point of view it is most appealing to focus on decision level fusion, as in this way the construction

{j.kittler,n.poh}@surrey.ac.uk

of biometric experts can be delegated to specialists in the respective biometric modalities to be integrated. The logical consequence of this argument is that fusion should be performed at the symbolic decision level, where each expert has already determined the user's most likely identity. Some form of voting would then be sufficient to resolve any conflicts of opinion among a given set of experts. However, it has been demonstrated that symbolic level fusion is not as effective as soft decision fusion, where the fusion process relates to the scores delivered by the experts for the respective hypotheses. In this paper we shall constrain our discussion to soft decision fusion, which is also receiving most of the attention in the multi biometric fusion literature.

The aim of this paper is to identify and discuss the key issues and challenges posed by decision level fusion. We start by presenting a brief survey of the recent literature on multi biometric expert fusion in Section II. Examples of some of the benefits that multi biometric systems offer are given in Section III. However, it will be argued that in order to optimize the benefits, the issues of multi biometric system design will have to be understood better so that more effective design methodologies can be developed. Section V lists and discusses some of the main challenges that will have to be met to build next generation biometric systems. These relate to fusion architectures, fusion strategies, fusion rules, the accuracy-diversity trade-off, and measures of expert confidence, competence and reliability. Before that, our discussion focuses on system reliability and the dilemma posed by the exploitation of biometric sample quality in multi biometric system fusion in Section IV. The open research issues are summarized in Section VI.

II. LITERATURE SURVEY

Combining several systems has been investigated in pattern recognition in general [1]; in applications related to audio-visual speech processing [2], [3], [4]; in speech recognition, where examples of methods are multi-band [5], multi-stream [6], [7] and front-end multi-feature [8] approaches and the union model [9]; in the form of ensembles [10]; in audio-visual person authentication [11]; and in multi-biometrics [12], [13], [14], [15], [16] (and references therein), among others. In fact, one of the earliest works addressing multimodal biometric fusion was reported in 1978 [17]. Biometric fusion thus has a history of some 30 years.
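The contrast between symbolic (decision-level) and soft (score-level) fusion drawn in the introduction can be illustrated with a toy sketch; the scores and the common threshold below are invented purely for illustration:

```python
import numpy as np

# Three hypothetical experts score the same claimed identity in [0, 1];
# each expert accepts when its score reaches the threshold 0.5.
scores = np.array([0.48, 0.45, 0.95])   # two weak rejects, one strong accept

# Symbolic (decision-level) fusion: majority vote over hard decisions.
votes = scores >= 0.5
majority_accept = votes.sum() > len(votes) / 2

# Soft (score-level) fusion: average the raw scores, then threshold once.
sum_rule_accept = scores.mean() >= 0.5

print(bool(majority_accept))  # False: the strong accept is outvoted
print(bool(sum_rule_accept))  # True: the score magnitude is preserved
```

The sum rule keeps the magnitude of the strongly confident expert, information that hard decisions discard before fusion.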

Recent advances in multi-biometrics have been focusing on quality-based fusion, e.g., [18], [19], [20], [21], [22], where the quality associated with the template as well as with the query biometric sample is taken into account in decision level fusion. For this purpose, a plethora of quality measures have recently been proposed in the literature for various biometric modalities, e.g., fingerprint [23], [24], iris [25], face [26], speech [27], signature [28], and classifier-dependent measures (confidence) [29], [30]. The proposed quality measures, in general, aim to quantify the degree of excellence or conformance of biometric samples to some predefined criteria known to influence system performance. For instance, for face biometrics, these assess image focus, contrast and face detection reliability. In quality-based fusion, the match scores of biometric samples of higher quality are given more weight when computing the final combined score. There are two ways quality measures can be incorporated into a fusion classifier, depending on their role: either as a control parameter or as evidence. In their primary role, quality measures are used to modify the way a fusion classifier is trained or tested, as suggested in the Bayesian-based classifier called "expert conciliation" [18], the reduced polynomial classifier [31], quality-controlled support vector machines [19], and quality-based fixed rule fusion [32]. In their secondary role, quality measures are often concatenated with the expert outputs and fed to a fusion classifier, as found in logistic regression [20] and the mixture of Gaussians Bayesian classifier [21]. Other notable work includes the use of Bayesian networks to capture the complex relationship between expert outputs and quality measures, e.g., Maurer and Baker's Bayesian network [33] and Poh et al.'s quality state-dependent fusion [34].
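As a minimal sketch of the secondary, "quality as evidence" role described above, the snippet below concatenates a quality measure with the match scores and trains a logistic regression fusion classifier, in the spirit of [20]; the synthetic data, dimensions and training schedule are our own assumptions, not those of the cited work:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic accesses: 2 expert match scores x and 1 quality measure q each.
# Genuine scores are shifted upwards more strongly when quality is good.
n = 500
y = rng.integers(0, 2, n)                 # 1 = genuine claim, 0 = impostor
q = rng.uniform(0, 1, n)                  # sample quality in [0, 1]
x = rng.normal(0, 0.3, (n, 2)) + np.outer(y * q, [1.0, 1.0])

# Quality used as evidence: concatenate q with the score vector x -> [x, q].
X = np.column_stack([np.ones(n), x, q])   # leading column is the bias term

# Fit logistic regression by plain gradient ascent on the log-likelihood.
w = np.zeros(X.shape[1])
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-X @ w))      # current posterior estimates
    w += 0.1 * X.T @ (y - p) / n

fused = 1.0 / (1.0 + np.exp(-X @ w))      # fused score = P(genuine | x, q)
acc = ((fused >= 0.5) == y).mean()
print(f"training accuracy: {acc:.2f}")
```

The fused output approximates the posterior probability of a genuine claim given both the scores and the quality evidence, which is exactly the quantity a verification threshold is applied to.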
The work in [34] takes into account an array of quality measures rather than representing quality as a scalar. By grouping the multi-faceted quality measures, a fusion strategy can then be devised for each cluster of quality values. Other suggestions include the use of quality measures to improve biometric device interoperability [35], [36]. Such an approach is commonly used in speaker verification [37], where different strategies are used for different microphone types. Last but not least, another promising direction in fusion is to consider a reliability estimate for each biometric modality. In [38], the estimated reliability of each biometric modality was used for combining symbolic-level decisions, whereas in [39], [40], [41], [30], score-level fusion was considered. However, in [39], [40], [41], the term "failure prediction" was used instead. Such information, derived solely from the expert outputs (instead of quality measures), has been demonstrated to be effective for single biometric modalities [39], for fusion across sensors for a single biometric modality [40], and across different machine learning techniques [41]. In [30], the notion of reliability was captured by the margin, a concept used in large-margin classifiers [42]. Exactly how the reliability is defined and estimated for each modality, and how it can be effectively used in fusion, are still open research issues.
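A margin-style reliability estimate of the kind alluded to above can be sketched as follows; the particular functional form, the scale parameter and the example scores are our own illustrative choices, not the definition used in [30]:

```python
import numpy as np

# An opinion is deemed more reliable the further its score lies from the
# decision threshold, since near-threshold scores flip under small noise.
def reliability(score, threshold=0.5, scale=0.1):
    return 1.0 - np.exp(-np.abs(score - threshold) / scale)

scores = np.array([0.52, 0.95, 0.10])   # e.g. face, voice, lips (hypothetical)
r = reliability(scores)                 # near-threshold face score -> low r

# Reliability-weighted score fusion: weight each opinion by its reliability.
fused = np.sum(r * scores) / np.sum(r)
print(r.round(3), round(float(fused), 3))
```

Here the borderline face score contributes little to the fused opinion, while the two confident (large-margin) opinions dominate.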

III. BENEFITS OF MULTIPLE BIOMETRIC EXPERT FUSION

The literature on biometrics abounds with examples of the benefits of multi biometric fusion. We shall highlight three examples illustrating different benefits in different contexts. The first case illustrates the potential of multimodal biometrics to provide improved performance compared with the individual component experts. The second case illustrates the benefit of using quality measures in fusion. Finally, the third example demonstrates the possibility of optimizing the cost of authentication for a given target performance by managing the choice of multi biometrics. Such a cost-based analysis enables one, for instance, to decide whether combining multiple biometrics is better than combining multiple samples of the same biometric (by reusing the same device), as the latter scenario is certainly less costly.

The first case study, detailed in [43], illustrates the merit of both multimodal and intramodal fusion and involves the fusion of face, voice and lip dynamics biometric modalities. The system, which used off-the-shelf conventional technologies, was evaluated on the XM2VTS database [44], producing the results obtained according to the Lausanne Experimental Protocol in Configuration I [44], as shown in Table I. Although the performance of the individual modalities was mediocre to poor, with the exception of one of the voice modalities, the fusion of these biometric experts by simple weighted averaging resulted in improved performance, as summarized in Table II. The results show that multimodal fusion has the potential to improve on the performance of the single best expert, even if some of the component systems achieve error rates an order of magnitude worse than the best expert. Interestingly, the combination of the three individually best experts is only marginally better than the best performing voice modality.
Looking in more detail at the face and voice modalities in the second row of Table II, we see that in the two-modal fusion experiment involving multiple algorithms for each modality, the weights assigned to the weaker algorithms are greater than those associated with the individually best algorithms for each modality. Apparently, the diversity offered by these weaker algorithms led to much better performance than the fusion of the two algorithms of highest accuracy. Thus, in combination with the other face and voice based verification algorithms, the fusion of the three modalities achieves a factor of five improvement over the individually best expert.

TABLE I
PERFORMANCE OF MODALITIES ON TEST SET (CONFIGURATION I)

Algorithm   threshold   FRR       FAR
Lips        0.50        14.00 %   12.67 %
Face 1      0.21         5.00 %    4.45 %
Face 2      0.50         6.00 %    8.12 %
Voice 1     0.50         7.00 %    1.42 %
Voice 2     0.50         0.00 %    1.48 %
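To make the weighted-averaging idea concrete, the following sketch fuses two synthetic experts and compares single-expert and fused FRR/FAR at a fixed threshold; the score distributions and weights are invented for illustration and bear no relation to the actual XM2VTS systems of Table I:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic genuine and impostor match scores for two hypothetical experts.
n = 1000
genuine = {"face": rng.normal(0.7, 0.15, n), "voice": rng.normal(0.8, 0.10, n)}
impostor = {"face": rng.normal(0.4, 0.15, n), "voice": rng.normal(0.3, 0.10, n)}
weights = {"face": 0.3, "voice": 0.7}   # fixed fusion weights (made up)

def fuse(samples):
    """Weighted-average score fusion across the experts."""
    return sum(weights[k] * samples[k] for k in weights)

def frr_far(thr):
    frr = np.mean(fuse(genuine) < thr)    # false rejection rate
    far = np.mean(fuse(impostor) >= thr)  # false acceptance rate
    return frr, far

for k in weights:                         # single-expert error rates
    print(k, np.mean(genuine[k] < 0.5), np.mean(impostor[k] >= 0.5))
print("fused", *frr_far(0.5))
```

At the common threshold the fused system's FRR and FAR are no worse than the weaker expert's, mirroring the effect summarized in Tables I and II.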

TABLE II
FUSION RESULTS (CONFIGURATION I)

Modalities             weights                        threshold   FRR      FAR
lips, face and voice   0.27, 0.23, 0.50               0.51        0.00 %   1.31 %
4 experts (no lips)    0.02, 0.06, 0.87, 0.05         0.50        0.00 %   0.52 %
all 5 experts          0.03, 0.01, 0.04, 0.89, 0.03   0.50        0.00 %   0.29 %

The second example demonstrates the benefits of using quality measures in fusion [20]. In the referenced study, logistic regression was used as a fusion classifier, producing an output that approximates the posterior probability of observing a vector of match scores (denoted x) whose elements correspond to the outputs of the baseline systems to be combined. Quality measures (denoted q) are then considered as additional input to the fusion classifier. The interaction of quality measures and match scores is explicitly modeled by feeding three variants of input to the fusion classifier: [x, q] (i.e., augmenting the observation x with q via concatenation), [x, x ⊗ q] (where ⊗ denotes a tensor product), and [x, q, x ⊗ q]. If there are Nq terms in q and Nx terms in x, the tensor product between q and x produces Nq × Nx elements, providing the fusion classifier with additional degrees of freedom to model the pair-wise products generated by the two vectors. The results of this architecture, applied to the XM2VTS database with standard and degraded data sets, are shown in Figure 1. There are six face systems and a speech system; as a result, the total number of possible combinations is 2^6 − 1 = 63. Each bar in the figure is a statistic measuring the relative difference between a fusion system using no quality measures and one of the three quality-based arrangements mentioned above. As can be observed, in all cases using quality measures reduces the verification error, in terms of Equal Error Rate, over the baseline fusion classifier (without quality measures) by as much as 40%. Such an improvement is possible especially when both the face and speech biometric modalities contain significantly different quality types. In the experimental setting, the speech was artificially corrupted by uniformly distributed additive noise of varying magnitude (from 0 dB to 20 dB), whereas the face data contains either well illuminated or side illuminated face images (see Figure 3(a) and (b)). Much research is still needed for the case where the noise types are not well defined or are unknown.

The third example casts the fusion problem as an optimization problem.
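For concreteness, the three input arrangements of the second example can be assembled as follows; the score and quality values are hypothetical, and only the construction of the feature vectors follows [20]:

```python
import numpy as np

# One access attempt: Nx = 3 expert match scores and Nq = 2 quality measures.
x = np.array([0.7, 0.4, 0.9])          # match scores (hypothetical)
q = np.array([0.8, 0.2])               # quality measures (hypothetical)

xq = np.outer(x, q).ravel()            # tensor product: Nx * Nq = 6 terms

arr_1 = np.concatenate([x, q])         # arrangement [x, q]
arr_2 = np.concatenate([x, xq])        # arrangement [x, x (x) q]
arr_3 = np.concatenate([x, q, xq])     # arrangement [x, q, x (x) q]
print(len(arr_1), len(arr_2), len(arr_3))   # 5 9 11
```

The pair-wise product terms let a linear fusion classifier weight each match score differently depending on the measured quality, at the cost of a quadratically larger input dimension.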
Since using more biometric modalities implies more hardware and software computation, possibly requiring additional authentication time and further inconvenience on the part of the user, it is reasonable to attribute an abstract cost to each added biometric device. The goal of optimization in this context can be formulated as finding the subset of candidate biometric systems that minimizes the overall cost of operation. Ideally, such a cost-sensitive optimization criterion should be robust to population mismatch, where one designs a fusion classifier on a development population of users and applies the resultant classifier to a target population of users. In both the development and the target data sets, the

Fig. 1. Relative change of a posteriori EER (%) of all possible combinations of system outputs implemented using logistic regression for the four arrangements [x], [x, q], [x, x⊗q] and [x, q, x⊗q], evaluated on the good and degraded XM2VTS face database according to the modified Lausanne Protocol Configuration I [20]. Each bar shows the distribution of 63 values corresponding to the 63 possible face and speech multimodal fusion tasks obtained by exhaustively matching all face and all speech systems in fusion. The first and third quartiles are depicted by a bounding box and the median value by a horizontal red bar. In arrangement [x], the quality information is not used. The mean absolute performance of the first bar is about 3 percent, whereas that of the remaining three quality-based fusion systems is about 2 percent.

users are completely different but may belong to the same demographics. Unfortunately, a commonly used empirical evaluation strategy such as cross-validation has been shown to be particularly sensitive to the above population mismatch [45]. As a possible solution, the Chernoff bound and its special case, the Bhattacharyya bound, were proposed as an alternative. A word of caution is that such a bound assumes that the class-conditional match scores of multimodal biometric systems are normally distributed. This weakness can be overcome by transforming the match scores so that they conform better to normal distributions, for instance using the Box-Cox transform [46]. In a separate study [47], considering a one-dimensional match score (involving only a single system output), it was shown that even if the class-conditional match scores of a biometric system do not conform to a normal distribution, the predicted performance in terms of equal error rate correlates very well with the empirically measured error (with correlation as high as 0.96). The example used here, i.e., [45], is a generalization of [47] to multimodal fusion. The robustness of a class-separability criterion with respect to deviation from the Gaussian assumption can be attributed to the fact that biases contributed by the positive and the negative class do not necessarily compound in classification; in fact, they may cancel each other. This is in contrast to the regression problem. An example of a cost-sensitive performance curve is shown in Figure 2. This experiment was conducted on the Biosecure DS2 quality-based fusion data set (available for download at http://face.ee.surrey.ac.uk/qfusion). In order to carry out the experiment, the match score database was divided into two partitions

of enrolled users, constituting the development and the evaluation sets of users, respectively. For each partition of users, different sets of individuals are used as zero-effort impostors. Two theoretical error criteria, namely the Chernoff and Bhattacharyya bounds, and two empirical measures of EER based on Bayesian classifiers, namely a Gaussian Mixture Model used as a density estimator (denoted GMM) and a Quadratic Discriminant Analysis classifier (QDA), were used. These four measures are applied to both the development and evaluation data sets, which consist of different genuine and impostor populations of users. For the theoretical error criteria, one only needs to fit the match score distributions using the respective (development or evaluation) data sets and estimate the error. However, for the empirical error on the development set (which applies to the GMM and QDA classifiers), a two-fold cross-validation procedure is employed. The average error of the two folds (in terms of Equal Error Rate) is used as an indicator of the error on the development set. The classifier trained on the entire development set is then used to assess the empirical error on the evaluation set. For the example fusion problem treated here (see Figure 2), the cost ranges from 1 (using a single system) to 4.5 (using all 8 systems). The entire search space therefore comprises 2^8 − 1 = 255 candidates. We plot a "rank-one" cost-sensitive performance curve (performance versus cost). Since the goal is to achieve minimum generalization error at minimum cost, a curve towards the lower left corner is the ideal target. The curve is called rank-one because only the generalization of the top recommended system (according to a criterion assessed on the development set) is shown. A "rank-two" cost-sensitive curve would be the minimum of the generalization errors of the top two candidates found on the development set. With a sufficiently high rank order, the performance curve approaches the oracle curve (the ideal curve, with error estimated on the test set). While the rank-one curve found with the Bhattacharyya criterion is satisfactory, the rank-three curve exhibits exactly the same characteristics as the oracle for QDA, and the rank-five curve for GMM. Comparatively, using the averaged cross-validated empirical error, a rank-six curve is needed to achieve the performance of the oracle for QDA, and more than rank ten is needed for GMM.

Fig. 2. Rank-one cost-sensitive performance curve using a Bayesian classifier with GMM as a density estimator, assessed on all 255 fusion candidates of the Biosecure DS2 fusion database. The legend "Bhattacharyya-GMM" refers to optimizing the Bhattacharyya criterion on a development set and measuring the performance of the GMM Bayesian classifier on the evaluation set. The remaining two curves ("GMM-GMM" and "Oracle-GMM") should be interpreted similarly. The experimental results using the QDA classifier are similar to this figure (not shown here).

IV. CONFIDENCE, COMPETENCE, RELIABILITY AND QUALITY

The benefits of multi biometrics illustrated in the previous section and in the literature reviewed in Section II are deemed to stem from the complementarity (diversity) of the component experts. It is generally true that some experts perform better than others, and the adopted fusion strategy should reflect the reliability of each opinion. However, the term reliability has many aspects which are often not distinguished; in consequence, any countermeasures adopted may not be as effective as expected. First of all, the key factor affecting reliability is expert accuracy. This terminology usually refers to the probability of the expert making a correct decision. It is often considered synonymous with expert confidence. Statistically speaking, however, confidence usually refers to the quality of an estimate. Thus even poor accuracy can be estimated with high confidence. The use of confidence as a measure of expert capability is therefore confusing, if not misleading. Neither accuracy nor confidence captures the notion of expert competence. By competence we understand the ability of an expert to make decisions. If a biometric sample used for decision making is an outlier, or some parts of the system fail in some way, the system should not attempt to deliver its opinion. Interestingly, a discriminative algorithm may deliver a very confident decision without registering that the test conditions were outside its terms of reference. Thus, to assess reliability, it is important to measure accuracy, confidence and competence. Ideally such measurements should be carried out dynamically, for each biometric test.

The current interest in biometric signal quality is motivated by the belief that such measurements could help to assess the reliability of expert opinions. Although biometric signal quality has been shown in Section III to provide scope for performance improvement, there are a number of issues associated with its application. First of all, quality measures are multi-faceted: some fourteen measures have been proposed to characterize (face) image quality [34]. If all these measures are treated as separate features augmenting the dimensionality of the score space, then its size may grow disproportionately, potentially leading to overtraining problems. The likelihood of poor generalization is high, especially in view of the fact that quality measures themselves do not convey discriminatory information. This is particularly aggravated by the small sample size problems plaguing multi biometric expert fusion. Second, signal quality is not an absolute concept. Suppose a biometric system is designed using a set of training data acquired with a web camera, but an operational test is conducted using images captured with a camera of much better quality.

Fig. 3. (a) Frontal and (b) side illumination; (c) out-of-plane head pose and (d) the corresponding head pose corrected to frontal using a 3D model.

From the point of view of the template, the better quality image will actually appear as a degradation. Thus quality assessment should be carried out in the context of a reference defined by the system design conditions. In fact the situation is even worse, as the existing approaches to quality based fusion do not take into account the properties of the biometric algorithms. For instance, if a system uses an algorithm that can correct for illumination or pose problems, as shown in Figure 3(a) vs (b) and (c) vs (d), then the supplied image quality information will be misleading and may adversely affect the performance of the system. On the other hand, for algorithms that cannot compensate for changes in illumination and pose, the quality information is likely to be crucial. Thus biometric sample quality assessment cannot be algorithm independent either. This raises the fundamental question of how biometric sample quality should be defined: whether it can be measured directly from the biometric sample, or whether it should be derived by the biometric algorithm itself using internal knowledge of its capabilities.

V. OTHER ISSUES AND CHALLENGES

Architectures. There is a huge space of different fusion architectures that has not been explored. The range of possible configurations encompassing serial, parallel and hybrid structures is immense. While the parallel fusion strategy is most commonly used in multimodal biometric fusion, there are additional advantages in exploring serial fusion, where the experts are considered one at a time. It offers the possibility of making reliable decisions with only a few experts, leaving only the difficult cases to be handled by the remaining experts.

Fusion strategies. An important consideration when adopting a fusion strategy is the statistical dependency among the expert outputs. For instance, in intramodal fusion, several experts may rely on the same biometric sample, so higher dependency is expected among the expert outputs. On the other hand, in a multimodal setting, the pool of experts is likely to be statistically independent. In [20], three types

of frameworks are proposed in order to solve a multimodal fusion problem involving intramodal experts. The first framework simply assumes independence, in which case the fusion classifier reduces to a Naive Bayes one. The second framework considers the dependency of experts in the intramodal setting (all observing the same biometric modality) while ignoring the dependency in the multimodal setting, hence realizing a two-stage fusion process. Finally, the third framework makes no assumption about the expert outputs.

Expert selection. Expert selection can be cast as a feature selection problem, as illustrated in [48]. However, directly applying such a technique to biometric authentication is difficult. In Section III, for instance, we saw that the optimal set of experts found using a development population of users may not be the best for the target users. This phenomenon, known as "Doddington's menagerie", relates to the fact that each expert is affected by differences in the ability of the users' biometric models to represent their respective identities. These diverse abilities were characterized by Doddington et al. [49] by associating different animal names, such as sheep and goats, with the users. Thus, a much more robust criterion, taking Doddington's menagerie into account, must be considered in expert selection. Another issue is raised by cost considerations: conciliating both the operational cost and performance into a single criterion proves to be a difficult task. A special case of expert selection, called dynamic expert selection, arises naturally in the serial fusion architecture. In dynamic expert selection, a fusion classifier may decide which expert is most informative to query even before the data is acquired. In the recent Multimodal Biometric benchmark evaluation organized by the EU-funded Biosecure project (a paper is under review, but some results are publicly available at http://biometrics.it-sudparis.eu/BMEC2007 in the "access control scenario" section), the dynamic fusion strategy proved very promising in achieving good performance while minimizing costs.

VI. SUMMARY AND CONCLUSIONS

Multi biometric systems offer scope for enhancing the reliability of automatic identity recognition and verification. However, the design of such systems poses challenging problems which current theories fail to elucidate. The main issues of multi biometric system design have been identified and discussed with a view to stimulating research activities that will lead to a better understanding of multi biometric expert fusion problems. In turn, advances in the underpinning theories should lead to the development of effective methodologies for the design of next generation biometric systems.

REFERENCES

[1] T.K. Ho, J.J. Hull, and S.N. Srihari, "Decision Combination in Multiple Classifier Systems," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 16, no. 1, pp. 66–75, January 1994.
[2] S. Lucey, Audio Visual Speech Processing, Ph.D. thesis, Queensland University of Technology, 2002.
[3] J. Luettin, Visual Speech and Speaker Recognition, Ph.D. thesis, Department of Computer Science, University of Sheffield, 1997.
[4] T. Chen and R. Rao, "Audio-Visual Integration in Multimodal Communications," Proc. IEEE, vol. 86, no. 5, pp. 837–852, 1998.

[5] C. Cerisara, Contribution de l'Approche Multi-Bande à la Reconnaissance Automatique de la Parole, Ph.D. thesis, Institut National Polytechnique de Lorraine, Nancy, France, 1999.
[6] S. Dupont, Étude et Développement de Nouveaux Paradigmes pour la Reconnaissance Robuste de la Parole, Ph.D. thesis, Laboratoire TCTS, Université de Mons, Belgium, 2000.
[7] A. Hagen, Robust Speech Recognition Based on Multi-Stream Processing, Ph.D. thesis, École Polytechnique Fédérale de Lausanne, Switzerland, 2001.
[8] M.L. Shire, Discriminant Training of Front-End and Acoustic Modeling Stages to Heterogeneous Acoustic Environments for Multi-Stream Automatic Speech Recognition, Ph.D. thesis, University of California, Berkeley, USA, 2001.
[9] J. Ming and F.J. Smith, "Speech Recognition with Unknown Partial Feature Corruption – a Review of the Union Model," Computer Speech and Language, vol. 17, pp. 287–305, 2003.
[10] G. Brown, Diversity in Neural Network Ensembles, Ph.D. thesis, School of Computer Science, University of Birmingham, 2003.
[11] C. Sanderson, Automatic Person Verification Using Speech and Face Information, Ph.D. thesis, Griffith University, Queensland, Australia, 2002.
[12] K. Nandakumar, "Integration of Multiple Cues in Biometric Systems," M.S. thesis, Michigan State University.
[13] N. Poh, Multi-system Biometric Authentication: Optimal Fusion and User-Specific Information, Ph.D. thesis, Swiss Federal Institute of Technology in Lausanne (École Polytechnique Fédérale de Lausanne), 2006.
[14] K. Kryszczuk, Classification with Class-independent Quality Information for Biometric Verification, Ph.D. thesis, Swiss Federal Institute of Technology in Lausanne (École Polytechnique Fédérale de Lausanne), 2007.
[15] J. Richiardi, Probabilistic Models for Multi-Classifier Biometric Authentication Using Quality Measures, Ph.D. thesis, Swiss Federal Institute of Technology in Lausanne (École Polytechnique Fédérale de Lausanne), 2007.
[16] A. Ross, K.
Nandakumar, and A.K. Jain, Handbook of Multibiometrics, Springer Verlag, 2006.
[17] A. Fejfar, "Combining Techniques to Improve Security in Automated Entry Control," in Carnahan Conf. on Crime Countermeasures, 1978, Mitre Corp. MTP-191.
[18] J. Bigun, J. Fierrez-Aguilar, J. Ortega-Garcia, and J. Gonzalez-Rodriguez, "Multimodal Biometric Authentication using Quality Signals in Mobile Communications," in 12th Int'l Conf. on Image Analysis and Processing, Mantova, 2003, pp. 2–13.
[19] J. Fierrez-Aguilar, J. Ortega-Garcia, J. Gonzalez-Rodriguez, and J. Bigun, "Kernel-Based Multimodal Biometric Verification Using Quality Signals," in Defense and Security Symposium, Workshop on Biometric Technology for Human Identification, Proc. of SPIE, 2004, vol. 5404, pp. 544–554.
[20] J. Kittler, N. Poh, O. Fatukasi, K. Messer, K. Kryszczuk, J. Richiardi, and A. Drygajlo, "Quality Dependent Fusion of Intramodal and Multimodal Biometric Experts," in Proc. of SPIE Defense and Security Symposium, Workshop on Biometric Technology for Human Identification, 2007, vol. 6539.
[21] K. Nandakumar, Y. Chen, S.C. Dass, and A.K. Jain, "Likelihood ratio based biometric score fusion," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 30, pp. 342–347, 2008.
[22] K. Kryszczuk and A. Drygajlo, "Credence estimation and error prediction in biometric identity verification," Signal Processing, vol. 88, pp. 916–925, 2008.
[23] H. Fronthaler, K. Kollreider, J. Bigun, J. Fierrez, F. Alonso-Fernandez, J. Ortega-Garcia, and J. Gonzalez-Rodriguez, "Fingerprint image-quality estimation and its application to multialgorithm verification," IEEE Trans. on Information Forensics and Security, vol. 3, pp. 331–338, 2008.
[24] Y. Chen, S.C. Dass, and A.K. Jain, "Fingerprint Quality Indices for Predicting Authentication Performance," in LNCS 3546, 5th Int'l Conf. Audio- and Video-Based Biometric Person Authentication (AVBPA 2005), New York, 2005, pp. 160–170.
[25] Y. Chen, S. Dass, and A.
Jain, “Localized iris image quality using 2-d wavelets,” in Proc. Int’l Conf. on Biometrics (ICB), Hong Kong, 2006, pp. 373–381. [26] X. Gao, R. Liu, S. Z. Li, and P. Zhang, “Standardization of face image sample quality,” in LNCS 4642, Proc. Int’l Conf. Biometrics (ICB’07), Seoul, 2007, pp. 242–251. [27] National Institute of Standards and Technology, “Nist speech quality assurance package 2.3 documentation,” .

[28] S. Muller and O. Henniger, "Evaluating the Biometric Sample Quality of Handwritten Signatures," in LNCS 3832, Proc. Int'l Conf. Biometrics (ICB 2006), 2006, pp. 407–414.
[29] S. Bengio, C. Marcel, S. Marcel, and J. Mariéthoz, "Confidence Measures for Multimodal Identity Verification," Information Fusion, vol. 3, no. 4, pp. 267–276, 2002.
[30] N. Poh and S. Bengio, "Improving Fusion with Margin-Derived Confidence in Biometric Authentication Tasks," in LNCS 3546, 5th Int'l Conf. Audio- and Video-Based Biometric Person Authentication (AVBPA 2005), New York, 2005, pp. 474–483.
[31] K.-A. Toh, W.-Y. Yau, E. Lim, L. Chen, and C.-H. Ng, "Fusion of Auxiliary Information for Multimodal Biometric Authentication," in LNCS 3072, Int'l Conf. on Biometric Authentication (ICBA), Hong Kong, 2004, pp. 678–685.
[32] O. Fatukasi, J. Kittler, and N. Poh, "Quality Controlled Multimodal Fusion of Biometric Experts," in 12th Iberoamerican Congress on Pattern Recognition (CIARP), Viña del Mar-Valparaiso, Chile, 2007, pp. 881–890.
[33] D. E. Maurer and J. P. Baker, "Fusing Multimodal Biometrics with Quality Estimates via a Bayesian Belief Network," Pattern Recognition, vol. 41, no. 3, pp. 821–832, 2007.
[34] N. Poh, G. Heusch, and J. Kittler, "On Combination of Face Authentication Experts by a Mixture of Quality Dependent Fusion Classifiers," in LNCS 4472, Multiple Classifier Systems (MCS), Prague, 2007, pp. 344–356.
[35] F. Alonso-Fernandez, J. Fierrez, D. Ramos, and J. Ortega-Garcia, "Dealing with Sensor Interoperability in Multi-Biometrics: The UPM Experience at the BioSecure Multimodal Evaluation 2007," in Proc. of SPIE Defense and Security Symposium, Workshop on Biometric Technology for Human Identification, 2008.
[36] N. Poh, T. Bourlai, and J. Kittler, "Improving Biometric Device Interoperability by Likelihood Ratio-Based Quality Dependent Score Normalization," in IEEE Conf. on Biometrics: Theory, Applications and Systems (BTAS), Washington, D.C., 2007, pp. 1–5.
[37] R. Auckenthaler, M. Carey, and H. Lloyd-Thomas, "Score Normalization for Text-Independent Speaker Verification Systems," Digital Signal Processing (DSP) Journal, vol. 10, pp. 42–54, 2000.
[38] K. Kryszczuk, J. Richiardi, P. Prodanov, and A. Drygajlo, "Reliability-Based Decision Fusion in Multimodal Biometric Verification Systems," EURASIP Journal on Advances in Signal Processing, vol. 2007, 2007.
[39] W. Li, X. Gao, and T. E. Boult, "Predicting Biometric System Failure," in Proc. IEEE Int'l Conf. on Computational Intelligence for Homeland Security and Personal Safety (CIHSPS 2005), 2005, pp. 57–64.
[40] B. Xie, T. Boult, V. Ramesh, and Y. Zhu, "Multi-Camera Face Recognition by Reliability-Based Selection," in Proc. IEEE Int'l Conf. on Computational Intelligence for Homeland Security and Personal Safety (CIHSPS 2006), 2006, pp. 18–23.
[41] T. P. Riopka and T. E. Boult, "Classification Enhancement via Biometric Pattern Perturbation," in AVBPA, 2005, pp. 850–859.
[42] A. J. Smola and P. J. Bartlett, Eds., Advances in Large Margin Classifiers, MIT Press, Cambridge, MA, 2000.
[43] U. R. Sanchez and J. Kittler, "Fusion of Talking Face Biometric Modalities for Personal Identity Verification," in IEEE Int'l Conf. Acoustics, Speech, and Signal Processing, 2006, vol. 5, pp. V–V.
[44] K. Messer, J. Matas, J. Kittler, J. Luettin, and G. Maitre, "XM2VTSDB: The Extended M2VTS Database," in Second Int'l Conf. Audio- and Video-based Biometric Person Authentication, 1999.
[45] N. Poh and J. Kittler, "On Using Error Bounds to Optimize Cost-Sensitive Multimodal Biometric Authentication," in Proc. 19th Int'l Conf. Pattern Recognition (ICPR), 2008.
[46] G. E. P. Box and D. R. Cox, "An Analysis of Transformations," Journal of the Royal Statistical Society, Series B, vol. 26, pp. 211–252, 1964.
[47] N. Poh and S. Bengio, "How Do Correlation and Variance of Base Classifiers Affect Fusion in Biometric Authentication Tasks?," IEEE Trans. Signal Processing, vol. 53, no. 11, pp. 4384–4396, 2005.
[48] N. Poh and S. Bengio, "Towards Predicting Optimal Subsets of Base-Experts in Biometric Authentication Task," in LNCS 3361, 1st Joint AMI/PASCAL/IM2/M4 Workshop on Multimodal Interaction and Related Machine Learning Algorithms (MLMI), Martigny, 2004, pp. 159–172.
[49] G. Doddington, W. Liggett, A. Martin, M. Przybocki, and D. Reynolds, "Sheep, Goats, Lambs and Wolves: A Statistical Analysis of Speaker Performance in the NIST 1998 Speaker Recognition Evaluation," in Int'l Conf. Spoken Language Processing (ICSLP), Sydney, 1998.
