
International Journal of Hybrid Intelligent Systems 8 (2011) 143–154 DOI 10.3233/HIS-2011-0135 IOS Press

Classifier ensembles and optimization techniques to improve the performance of cancellable fingerprint

Anne M.P. Canuto (a,b,*), Michael C. Fairhurst (b), Fernando Pintro (a), João C. Xavier Junior (c), Antonino Feitosa Neto (a) and Luis Marcos G. Gonçalves (c)

(a) Department of Informatics and Applied Mathematics, Federal University of RN, Natal, Brazil
(b) School of Engineering and Digital Arts, University of Kent, Canterbury, UK
(c) Computing and Automation Engineering Department, Federal University of RN, Natal, Brazil

(*) Corresponding author. E-mail: [email protected].

Abstract. The main aim of biometric-based identification systems is to automatically recognize individuals based on their physiological and/or behavioural characteristics, such as fingerprint, face and hand geometry, among others. These systems offer several advantages over traditional forms of identity protection. However, there are still some important aspects that need to be addressed in these systems. The main questions concern the security of biometric authentication systems, since it is important to ensure the integrity and public acceptance of these systems. In order to avoid the problems arising from compromised biometric templates, the concept of cancellable biometrics has recently been introduced. The idea is to transform a biometric trait into a new representation for enrolment and matching. Although cancellable biometrics were proposed to address privacy concerns, the concept raises new issues, since it makes the authentication problem more complex and difficult to solve. Thus, more effective authentication structures are needed to perform these tasks. In this paper, we investigate the use of ensemble systems in cancellable biometrics, using fingerprint-based identification to illustrate the possible benefits accruing. In order to increase the effectiveness of the proposed ensemble systems, three feature selection methods will be used to distribute the attributes among the individual classifiers of an ensemble. The main aim of this paper is to analyse the performance of such well-established structures on transformed biometric data to determine whether they have a positive effect on the performance of this complex and difficult task.

Keywords: Classifier ensembles, selection-based combination methods, confidence measures

1. Introduction

Establishing the identity of a person is a critical task in many areas of modern society and has led to an increasing interest in developing practical identity authentication systems. Conventional representations of an individual's identity, such as passwords and ID cards, are not sufficient for reliable identity determination because they can be easily misplaced, shared or stolen. In this context, biometric authentication, which is the science of establishing the identity of a person using his/her anatomical and behavioural traits, has become a potential solution of increasing interest [2,16]. Biometric traits have a number of desirable properties with respect to their use as an authentication token, namely reliability, convenience, universality and so forth, but their principal advantage is in binding data or activity unequivocally to an individual person. These characteristics have led to the widespread deployment of biometric authentication systems. However, there are still some issues that need to be addressed in these systems. The main issues concern the security of biometric authentication systems, since these systems need to ensure their integrity and public acceptance [10].


For biometrics-based authentication systems, security is even more important than for non-biometric systems, since a biometric is permanently associated with a user and cannot be revoked or cancelled if compromised. In this case, if a biometric identifier is compromised, it is lost forever, and possibly for every application where that biometric is used. Moreover, if the same biometric is used in multiple applications, a user can potentially be tracked from one application to the next by cross-matching biometric databases. To avert any potential security crisis, vulnerabilities of biometric systems must be identified and addressed systematically [2]. Cancellable biometrics are increasingly being applied to address such security issues [10]. This approach uses transformed or intentionally-distorted biometric data instead of the original biometric data for identification. Although the use of cancellable biometrics bridges the gap between the convenience of biometric authentication and its security vulnerabilities, there is a risk that using such transformed data will decrease the performance of the biometric-based system, since the level of complexity of the transformed biometric data is much higher than that of the original data. It is therefore important for a biometric system to achieve a good trade-off between discriminability and non-invertibility (high security) when using feature transformations in any biometric modality. As a contribution to this important topic, this paper will compare the performance of single classification methods as well as classifier ensembles, in both the original and the transformed biometric space. The biometric modality used for illustrative purposes in this paper is the fingerprint, and a modified version of a non-invertible transformation originally proposed in [15] is also presented in this analysis. In order to increase the effectiveness of the ensemble systems, three feature selection methods will be used to distribute the attributes among the individual classifiers of an ensemble, two of which are optimization-based algorithms while the remaining one is a simple random distribution method. The main aim of this comparison is to analyse the gain that the use of ensemble systems can bring with respect to the transformed biometric data. It is expected that this gain is higher than in the original space and, based on this, it is our aim to demonstrate that the use of ensemble systems is especially beneficial when applied in adverse situations. In [?], an initial investigation was carried out, analysing only the use of ensemble systems in cancellable fingerprint identification without the feature selection process.

In addition, we used a simpler version of a genetic algorithm in [23]. As these previous works have provided interesting results, this paper focuses on a more detailed analysis, using ensembles both with and without feature selection, where the selection is made by two well-known optimization techniques.

2. Problem description and related works

In order to offer security for biometric-based authentication systems, the biometric templates must be stored in a protected way. To achieve this, a scheme should possess the following four properties [16]. The first one is diversity, since the secure template must not allow cross-matching across databases, thereby ensuring the user's privacy. The second one is revocability, since it should be straightforward to revoke a compromised template and reissue a new one based on the same biometric data. The third one is security, since it must be computationally hard to obtain the original biometric template from the secure template; this property prevents an adversary from creating a physical spoof of the biometric trait from a stolen template. Finally, the biometric template protection scheme should not degrade the recognition performance of biometric systems (performance). There are several template protection methods proposed in the literature, broadly divided into two classes: biometric cryptosystems and feature transform approaches [10]. In the feature transform approach, for instance, a transformation function (f) is applied to the biometric template and only the transformed template is stored in the database. This function can be invertible (salting) or non-invertible. Here, we focus on the use of non-invertible transformation functions. Several transformation functions have been reported for different biometric modalities, such as face [3], signature [5,28,29], iris [12] and voice [20,30], among others. However, among all the available modalities, fingerprint is the one for which the largest number of template protection methods has been proposed in the literature [6,8,14,15,17,19,23,32,33]. Some of the first formal work in cancellable biometrics was reported in [19], which proposed three transforms (Cartesian, polar and functional transformations) to be used with fingerprint images.
The first two methods have the drawback of the boundary problem: if an original minutia point crosses a boundary of the sectors dividing the feature space, due to a minor deviation of image alignment or distortion of a fingerprint, then the transformed version of the minutia point is located far from its appropriate position. The third method deals with this issue by using local smoothing functions to distort the feature space. As a consequence of the work reported in [19], several other investigations have been carried out. For instance, in [8], the authors presented a technique for converting a fingerprint into a binary-string template based on minutiae triplets. This binary representation is transformed into an anonymous representation using a unique personal key. According to the authors, the proposed transformation is not only computationally infeasible to invert, but, in the case that the biometric representation is compromised, it can be redefined by simply assigning a different key. In a very recent work [15], the authors proposed a method to generate cancellable bit-strings (templates) from fingerprint minutiae. According to the authors, this method provides a simple means to generate cancellable templates without requiring pre-alignment of fingerprints (for more details, see Section 5). In summary, with respect to the four required properties described at the beginning of this section, non-invertible functions can offer high levels of security, since it is hard to recover the original biometric template. In addition, diversity and revocability can be achieved by using application-specific and user-specific transformation functions, respectively. However, the main potential drawback of non-invertible functions is related to the performance property, since it is hard to achieve a good trade-off between performance (discriminability) and security (non-invertibility). The transformation function should preserve the discriminability (similarity structure) of the feature set – that is, just as in the original feature space, features from the same user should have high similarity in the transformed space, and features from different users should be relatively dissimilar after transformation. On the other hand, the transformation should also be non-invertible – that is, given a transformed feature set, it should be hard for an adversary to obtain the original feature set (or a close approximation of it). This paper presents a way to ameliorate this problem, since it focuses on the use of a well-established recognition structure (the ensemble) in a fingerprint-protected identification system. The aim is to analyse the performance of these systems with transformed biometric data and investigate the benefits that this can bring to the recognition rate of the transformed data.


3. Ensemble of classifiers

Ensemble systems are composed of a set of individual classifiers, organized in a parallel way, that receive the input patterns and send their outputs to a combination method, which is responsible for providing the final output of the system. These systems exploit the idea that different classifiers can offer complementary information about the patterns to be classified, improving the effectiveness of the overall recognition process [4,13]. There are two main issues to consider in the design of an ensemble: the ensemble components and the combination methods that will be used. In relation to the first issue, the members of an ensemble are chosen and implemented. The correct choice of the set of individual classifiers is fundamental to the overall performance of an ensemble: the ideal situation would be a set of base classifiers with uncorrelated errors, which would then be combined in such a way as to minimize the effect of these failures. According to its structure, an ensemble can be built using two main approaches: heterogeneous and homogeneous. The first approach combines different types of classification algorithms as individual classifiers, whereas the second combines classification algorithms of the same type. Once a set of individual classifiers has been created, the next step is to choose an effective way of combining their outputs. The choice of the best combination method for an ensemble requires exhaustive testing; in other words, this choice is very important and difficult to make. There are a great number of combination methods reported in the literature [4,9,13] but, according to their functioning, they typically adopt one of three main strategies: fusion, selection and hybrid. In our investigation, five fusion-based combination methods are investigated in the ensemble systems: Sum, Majority Vote, Weighted Sum, k-NN (k-nearest neighbour) and Support Vector Machine (SVM).
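To make the fusion strategy concrete, the sketch below shows how two of the rules mentioned above (Sum and Majority Vote) can be applied to the posterior-probability outputs of a pool of base classifiers. This is a minimal illustration, not the authors' WEKA-based implementation; the scikit-learn classifiers used for the pool are placeholders, and the sketch assumes every member exposes class probabilities via predict_proba.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

def sum_rule(prob_stack):
    # Sum rule: add the class-probability vectors of all members and
    # pick the class with the largest accumulated support.
    return np.argmax(np.sum(prob_stack, axis=0), axis=1)

def majority_vote(prob_stack):
    # Majority vote: each member votes for its most probable class;
    # the most voted class wins (ties broken by the lowest class index).
    votes = np.argmax(prob_stack, axis=2)             # shape (members, samples)
    n_classes = prob_stack.shape[2]
    counts = np.apply_along_axis(
        lambda v: np.bincount(v, minlength=n_classes), 0, votes)
    return np.argmax(counts, axis=0)

# Placeholder pool of heterogeneous base classifiers (k-NN and DT here).
members = [KNeighborsClassifier(n_neighbors=1), DecisionTreeClassifier()]

def fit_ensemble(X_train, y_train):
    for m in members:
        m.fit(X_train, y_train)

def predict_ensemble(X_test, rule=sum_rule):
    probs = np.stack([m.predict_proba(X_test) for m in members])
    return rule(probs)
```

The trainable combiners mentioned later (Weighted Sum, k-NN and SVM as combiners) would instead be fitted on the members' outputs over a validation set, which is the essence of the stacking procedure used in Section 6.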


4. Optimization techniques

Optimization can be defined as the search for the optimal solution to a given problem: the idea is to find the optimal values that solve the problem, and the techniques used in this search are called optimization techniques. Several techniques have been proposed for such problems. Intuitively, it is possible to state that these techniques try to optimize the value of some objective function, subject to resource and/or other constraints such as legal, input, environmental and behavioural restrictions. However, there are some problems that cannot be tackled by classical methods, since these either impose unacceptably high computational overheads or are infeasible to solve. This drawback generates a demand for other types of algorithms, such as heuristic optimization approaches. In the next two subsections, two heuristic optimization techniques (genetic algorithms and ant colony optimization, respectively) are briefly described.

4.1. Genetic algorithm

Genetic algorithms (GAs) were first developed by Holland in the 1960s [23]. They are stochastic global optimization methods inspired by biological mechanisms such as evolution and heredity. They have been widely used in different tasks, such as the optimization of neural networks [24], ensemble classifiers [25] and multi-agent systems [26], among others. A genetic algorithm is a population-based method in which each possible solution to a problem is coded as a chromosome (individual); the set of these individuals is called the population. The initial population of a genetic algorithm can be chosen in several ways, the most common being random initialization. Once an initial population is created, the individuals of this population are assessed by means of a fitness function, which characterises the "goodness" of a chromosome as a solution to the optimization task. In other words, it indicates how close a chromosome is to the optimal solution, and it is the function to be optimized (minimized or maximized) by the genetic algorithm. Based on this fitness function, chromosomes are selected and genetic operators (mutation, crossover and so on) are applied to the selected chromosomes, forming new ones. The idea is that these chromosomes evolve, creating progressively better individuals until a global optimum is reached [23,24].
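The generic GA loop just described can be summarised in the following sketch (random binary initialization, fitness evaluation, tournament selection, one-point crossover, bit-flip mutation and elitism). It is a simplified illustration under assumed parameter values and assumes a fitness to be maximized; it is not the exact configuration used in the experiments reported later.

```python
import numpy as np

rng = np.random.default_rng(42)

def genetic_algorithm(fitness, chrom_len, pop_size=30, generations=100,
                      crossover_rate=0.9, mutation_rate=0.01):
    # Random binary initial population (the most common initialization).
    pop = rng.integers(0, 2, size=(pop_size, chrom_len))
    for _ in range(generations):
        scores = np.array([fitness(c) for c in pop])
        best = pop[np.argmax(scores)].copy()            # elitism

        def pick():
            # Binary tournament selection.
            i, j = rng.integers(0, pop_size, 2)
            return pop[i] if scores[i] >= scores[j] else pop[j]

        children = [best]
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            if rng.random() < crossover_rate:            # one-point crossover
                cut = rng.integers(1, chrom_len)
                child = np.concatenate([p1[:cut], p2[cut:]])
            else:
                child = p1.copy()
            flip = rng.random(chrom_len) < mutation_rate  # bit-flip mutation
            child = np.where(flip, 1 - child, child)
            children.append(child)
        pop = np.array(children)
    scores = np.array([fitness(c) for c in pop])
    return pop[np.argmax(scores)]
```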

4.2. Ant colony optimization

Ant colony optimization (ACO) was introduced as a nature-inspired meta-heuristic for the solution of hard combinatorial optimization (CO) problems [27]. ACO is a population-based meta-heuristic which distributes the search activities over so-called "ants". In other words, the activities are divided among agents with very simple basic capabilities which, to some extent, mimic the behaviour of real ants in the search for food. It is important to emphasize that ACO was not created as a simulation of ant colonies, but rather uses the metaphor of artificial ant colonies, applying it as an optimization tool. The idea of ACO is to model the problem as an environment in which a set of artificial ants move around. At the beginning of the process, as there is no information about which path to take from one point to another, the choice of path is completely random. As the process continues, if an ant has to choose among different paths, those which were heavily chosen by preceding ants (that is, those with a high trail level) have a higher probability of being chosen. Furthermore, high trail levels are synonymous with short paths.

4.3. Optimization techniques in ensemble systems

The use of ensemble systems can increase the processing time of a classification system, since they are more complex than single classifiers [13]. In this case, the use of ensembles has to be well justified in order to compensate for the increased complexity of these systems. In addition, as already mentioned, there is clearly no accuracy gain in a system that is composed of a set of identical individual classifiers; in other words, the individual classifiers should be diverse among themselves. The use of feature selection or data distribution methods in ensemble systems usually increases the diversity of the members of an ensemble. This is because the individual classifiers perform the same task (classification of the same input patterns) but are built using different subsets of features. In addition, these methods can reduce the dimensionality of the individual classifiers, reducing the overall complexity of the ensemble systems. In this sense, the use of feature selection methods can help to reduce the complexity of an ensemble system as well as to increase the diversity of its individual classifiers. Of the techniques used to select attributes for the individual classifiers, optimization techniques have been successfully applied: genetic algorithms [35,36] are the most commonly used, and ant colony optimization has been used for feature selection in single classifiers and has started to be used in ensemble systems [34]. In this paper, these two optimization techniques will be used to select attributes for ensemble systems.
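For contrast with the optimized selection, the sketch below illustrates the simplest of the three distribution schemes used in this work, the random distribution of attributes among ensemble members, and how each member is then trained only on its own columns. The subset size, the seed and the DecisionTreeClassifier used as the member type are illustrative assumptions, not values taken from the paper.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

def random_feature_distribution(n_features, n_classifiers, subset_size):
    # Simple random distribution: each member gets its own random subset,
    # so the members see different views of the same input patterns.
    return [rng.choice(n_features, size=subset_size, replace=False)
            for _ in range(n_classifiers)]

def fit_with_subsets(X, y, subsets, make_classifier=DecisionTreeClassifier):
    members = []
    for cols in subsets:
        clf = make_classifier()
        clf.fit(X[:, cols], y)   # each member trains only on its own columns
        members.append((clf, cols))
    return members
```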


5. The transformation function

In a fingerprint image, the most common features to extract are minutiae, which are specific points that can be captured from a finger image. There are two main types of minutiae, known as ridge endings and bifurcations. The number and locations of the minutiae vary from finger to finger in any particular person, and from person to person for any particular finger (for example, the thumb on the left hand). When a set of finger images is obtained from an individual, the number of minutiae is recorded for each finger. The transformation method which applies a non-invertible transformation to the raw fingerprint data is a modified version of the method proposed in [15], and it uses as its basis the minutiae captured from a fingerprint image. Suppose that the i-th minutia is represented as Mi = (xi, yi, σi, ti), where the elements represent the position of the minutia (xi and yi), its orientation (σi) and its type (ti), respectively. The original method can be described in the following steps:

1. Choose a minutia Mi to be the reference minutia.
2. Define a 3D array, Ai = (DX, DY, DZ), where DX = [1..WX], DY = [1..WY] and DZ = [1..WZ].
   2.1. The width (WX) and height (WY) of the array are twice the size of an input fingerprint image and the depth (WZ) is 2π.
   2.2. The width (CX), height (CY) and depth (CZ) of a cell are determined experimentally. The units of CX and CY are pixels, and the unit of CZ is the radian. All cells are initially set to 0.
3. Map the reference minutia into the 3D array, with the reference minutia in the centre of the array: Ai(dx, dy, dz) = 1, where dx = WX/2, dy = WY/2 and dz = 1. The other minutiae are rotated and translated to align the orientation of the reference minutia with the x-axis of the array: Ai(dx, dy, dz) = Ai(dx, dy, dz) + 1, where dx = A(Mj, Mi) and A is a function that aligns the j-th minutia based on the reference minutia (Mi); dy and dz are calculated in the same way, based on the corresponding axes.
4. Binarize the values of the cells of the 3D array as described in Eq. (1):

   Ai(dx, dy, dz) = 1, if Ai(dx, dy, dz) > 1
                    0, otherwise                                   (1)
5. A 1D bit-string (cancellable template) is generated and the order of the array is permuted. This permutation is based on the type of the reference minutia (ti) and the user's PIN.
6. If the number of reference minutiae has reached its maximum, stop. Otherwise, go to step 1.

In order to better understand the transformation function, we illustrate it with a simple example. Suppose we have three minutiae captured from a fingerprint image, represented as (x1, y1, σ1, t1), (x2, y2, σ2, t2) and (x3, y3, σ3, t3), respectively. In the first loop of the algorithm, minutia 1, (x1, y1, σ1, t1), is considered the reference minutia (step 1). We then place this reference minutia in the centre of the 3D array of dimensions (DX, DY, DZ). Minutiae 2 and 3 are then rotated and translated to align the orientation of the reference minutia with the x-axis of the array (step 3), placing these rotated minutiae in the 3D array. After the rotation and translation of the other minutiae, we binarize the 3D array, setting a cell to 1 if it contains more than one minutia and to 0 otherwise (step 4). Then, this 3D array is transformed into a 1D array (first we transform it into z two-dimensional arrays and then into z*y 1D arrays of size x, forming one 1D array of size x*y*z). This 1D array represents the first reference minutia, and its order is randomly permuted based on the type of this minutia (t1) and the user's PIN (step 5). This process is repeated two more times, for minutiae 2 and 3, and the resulting pattern is a binary array containing all three 1D arrays obtained by taking each of the three minutiae as the reference minutia.

According to step 5 of the algorithm, a 1D string is defined for each reference minutia. This brings two problems. The first concerns the definition of the exact number of reference minutiae to be used. In the dataset used in this paper, there are fingerprints with as few as 13 minutiae (minimum), while there are others with more than 80 (a maximum of 94). In this case, the choice of a small number means that some useful minutiae information might be lost; in contrast, the use of a large number means that fingerprint samples with few minutiae would have to be either disregarded or looped. The second problem is related to the large dimension of the input pattern, since a 3D array is created for each minutia. Suppose that the dimensions of the 3D array are WX = 13, WY = 13 and WZ = 5. This means a 1D array of dimension 845 (13 × 13 × 5) for each minutia. For a small number of minutiae, such as 10, the input patterns would have 8450 features for each fingerprint sample.
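A compact sketch of the per-minutia bit-string generation (steps 2–5) is given below. The alignment of the remaining minutiae with respect to the reference is reduced here to a plain rotation and translation, the out-of-range cells are wrapped with a modulo, the marking of the reference cell itself is omitted, and the key-driven permutation is simulated with a seeded shuffle; all of these are simplifying assumptions rather than the exact procedure of [15]. The image size (width, height) and the key value are also assumed.

```python
import numpy as np

def minutia_bit_string(minutiae, ref_idx, cell=(30, 30, np.pi / 3),
                       width=256, height=256, user_key=1234):
    """Generate the 1D bit-string for one reference minutia.

    minutiae: array of rows (x, y, orientation, type).
    """
    wx, wy, wz = 2 * width, 2 * height, 2 * np.pi
    nx, ny = int(wx / cell[0]), int(wy / cell[1])
    nz = int(round(wz / cell[2]))
    grid = np.zeros((nx, ny, nz), dtype=np.uint8)

    xr, yr, ar, tr = minutiae[ref_idx]
    for j, (x, y, a, _t) in enumerate(minutiae):
        if j == ref_idx:
            continue
        # Translate to the reference minutia and rotate so that the
        # reference orientation lies along the x-axis (step 3).
        dx, dy = x - xr, y - yr
        rx = dx * np.cos(-ar) - dy * np.sin(-ar) + wx / 2
        ry = dx * np.sin(-ar) + dy * np.cos(-ar) + wy / 2
        rz = (a - ar) % (2 * np.pi)
        grid[int(rx // cell[0]) % nx,
             int(ry // cell[1]) % ny,
             int(rz // cell[2]) % nz] += 1

    bits = (grid > 1).astype(np.uint8).ravel()          # step 4: Eq. (1)
    perm = np.random.default_rng(int(user_key) + int(tr)).permutation(bits.size)
    return bits[perm]                                    # step 5: key/type permutation
```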


In order to overcome these problems, a modified version of this transformation is used in our investigation. In this modified version, instead of creating one 3D array (and consequently one 1D array) for each minutia, a single 3D array is created for all reference minutiae, and the values of all corresponding 3D array cells are summed. In this sense, the resulting array is not binary, but represents the frequency of minutiae present in the corresponding 3D cell. However, it can easily be transformed into binary form through the use of a threshold (binarization procedure), which is the approach used here. For this modified version, the main modifications to the algorithm of the original transformation are in step 2, in which two 3D arrays have to be created: a general one and a specific one. The specific array stores the minutia information of the current reference minutia, and the general array stores the sum of the information across all reference minutiae. In addition, step 6 has to include the sum and binarization procedures. In this modified version, all minutiae are used as references and the complexity of the problem remains low. In addition, the modified version still uses a 1D binary string, as was proposed in the original version.
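The sketch below illustrates the modified transformation: the per-reference-minutia contributions are accumulated into a single frequency array, which is then binarized with a threshold (set to 4 in the experiments described in Section 6.1) and flattened into one 1D binary template. The image size, the modulo wrapping of cells and the direction of the threshold comparison are assumptions of this sketch, not details taken from the original description.

```python
import numpy as np

def modified_template(minutiae, cell=(30, 30, np.pi / 3),
                      width=256, height=256, threshold=4):
    # One "general" 3D array accumulates the aligned positions of every
    # minutia for every choice of reference minutia (modified steps 2 and 6).
    wx, wy, wz = 2 * width, 2 * height, 2 * np.pi
    nx, ny = int(wx / cell[0]), int(wy / cell[1])
    nz = int(round(wz / cell[2]))
    general = np.zeros((nx, ny, nz), dtype=np.int32)

    for i, (xr, yr, ar, _tr) in enumerate(minutiae):
        for j, (x, y, a, _t) in enumerate(minutiae):
            if j == i:
                continue
            dx, dy = x - xr, y - yr
            rx = dx * np.cos(-ar) - dy * np.sin(-ar) + wx / 2
            ry = dx * np.sin(-ar) + dy * np.cos(-ar) + wy / 2
            rz = (a - ar) % (2 * np.pi)
            general[int(rx // cell[0]) % nx,
                    int(ry // cell[1]) % ny,
                    int(rz // cell[2]) % nz] += 1

    # The frequency array is binarized with a threshold and flattened
    # into the final 1D binary template.
    return (general >= threshold).astype(np.uint8).ravel()
```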

6. Methods and materials

In order to carry out our investigation, the original and transformed fingerprint data are used by individual classifiers as well as by homogeneous and heterogeneous ensemble structures. Initial experiments showed that the modified transformation function has higher performance than the one proposed in [15]; because of these initial results, we decided to use only the modified function presented here. Three types of classification methods are used as individual classifiers for the ensemble systems: k-NN (nearest neighbour), C4.5 (referred to as DT – decision tree) and MLP (multi-layer perceptron) neural networks. The choice of these classifiers is made on the basis of the diversity of the classification criteria used by each method. In addition, these individual classifiers are combined using five common combination methods: Sum, Majority Voting, Weighted Sum, Support Vector Machine (SVM) and k-NN [13]. Three different ensemble sizes are analysed, corresponding to the inclusion of 3, 6 and 12 individual classifiers respectively. An initial investigation was made using ensemble systems with more than 12 individual classifiers; however, their performance was similar to that of the ensembles with 12 classifiers, and because of this we have chosen ensembles with the sizes above. For each system size, two different ensemble structures are used, the homogeneous (HO) and heterogeneous (HE) options. As there are several possibilities for each structure, we present the average of the accuracy delivered by all possibilities within the corresponding structure. In this case, the heterogeneous structures include ensembles with 2 and 3 different types of classifiers. When using two different classifier types, we used two-thirds of one type and one third of the other; this split was chosen because ensembles with 3 classifiers cannot be divided into two equal parts. When using three different classifier types, each type accounted for one third of the classifiers. We then averaged the accuracy of all these possible combinations. In contrast, the homogeneous ensembles (HO) represent the average values for the k-NN, MLP and DT ensembles. The individual classifiers and combination methods used in this study were taken from the WEKA machine learning package (http://www.cs.waikato.ac.nz/ml/weka). The ensemble systems were built using a standard stacking procedure to define the learning process for the individual classifiers and for the combination methods [13]. In order to obtain a better estimation of the accuracy rates, a 10-fold cross-validation method is applied to all ensembles (as well as to the individual classifiers); thus, all accuracy results presented in this paper refer to the mean over 10 different test sets. However, some of the combination methods are trainable (k-NN, SVM and Weighted Sum); in these cases, a validation set is used to train them. For the Weighted Sum method, the simple recognition rate (percentage of correctly classified patterns) over a validation set was used as the weights. For the parameter setting of the combination methods, we opted for the simplest version of these methods; therefore, we used k-NN with k = 1 and SVM with a polynomial kernel and c = 1. Furthermore, a statistical test is applied to compare the accuracy of the classification systems. We use the hypothesis test (t-test), which involves testing two learned hypotheses on identical test sets. In this investigation, we use the two-tailed t-test with a confidence level of 95% (α = 0.05) [7]. For both optimization techniques, an intra-classifier correlation criterion is used as the objective to guide the search process. This criterion defines the correlation within one classifier; in other words, it describes the correlation that might exist among the attributes of one classifier.
For each pair of attributes within one classifier, the correlation is calculated; the average correlation over all possible pairs is the correlation of that classifier (intra-classifier correlation), and the correlations of all classifiers are summed to give the overall intra-classifier correlation. The main aim of this criterion is to choose, for each classifier, attributes which are as uncorrelated as possible; the intention is to focus on the diversity of the classifiers separately. In this sense, the Pearson correlation is used as the correlation measure: Pearson's Product Moment Correlation Coefficient (PMCC) is a value which indicates the strength of the linear relationship between variables. Intra-classifier correlation is a measure which is independent of the classification method used in the ensemble system (a filter approach to feature selection); because of this, it can be used in both ensemble structures (homogeneous and heterogeneous). In order to calculate the intra-classifier correlation criterion, classifiers need to have two or more attributes, and this condition was included in both optimization techniques. Finally, solutions which choose the same subset of features for different classifiers are disregarded and a new one is created; this aims to avoid the choice of the same subset of attributes for all individual classifiers, which would decrease the diversity of the ensemble system.

6.1. Dataset

In this analysis we use the FVC2004 (Fingerprint Verification Competition) database [22], which consists of 800 fingerprint images from 100 fingers (classes). In the original data, minutiae were extracted using the NFIS2 (NIST Fingerprint Image Software 2) package [18]. In order to choose the minutiae to be used, a core detection procedure similar to the one in [11] was applied and the N closest minutiae to the core were picked; an initial analysis was carried out and N = 10 was shown to generate the best performance. In the transformation procedure, only the x and y coordinates, as well as the direction and the minutia type information, are used. As already mentioned, in the transformed space all minutiae are taken into consideration in the modified transformation function proposed in this paper. In addition, the cell sizes used in this investigation were similar to those used in [15], namely CX = CY = 30 and CZ = π/3. In the binarization process of the modified function, after some empirical analysis, we set the threshold to 4.
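To make the intra-classifier correlation criterion described above concrete, the sketch below scores a candidate assignment of feature subsets to the ensemble members: it averages the pairwise Pearson correlation within each member and sums these averages over all members. Taking the absolute value of the coefficient, and treating a lower score as better, are assumptions about details the text leaves open; the decode function mentioned in the comment is hypothetical.

```python
import numpy as np
from itertools import combinations

def intra_classifier_correlation(X, subsets):
    """Sum over classifiers of the mean pairwise Pearson correlation
    among the attributes assigned to that classifier.

    X: (samples, features) data matrix.
    subsets: one sequence of feature indices per ensemble member
             (each must contain at least two attributes).
    """
    total = 0.0
    for cols in subsets:
        corr = np.corrcoef(X[:, list(cols)], rowvar=False)   # Pearson (PMCC)
        pairs = [abs(corr[i, j]) for i, j in combinations(range(len(cols)), 2)]
        total += float(np.mean(pairs))                        # per-classifier average
    return total                                              # lower = less correlated

# A GA or ACO candidate would be scored through this function, e.g.
# fitness(candidate) = -intra_classifier_correlation(X_train, decode(candidate)),
# where decode is a hypothetical mapping from the search representation
# to the per-classifier feature subsets.
```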


6.2. The optimization techniques

In order to generate the initial population for the genetic algorithm, an initial pool with a pre-defined number of classifiers is used. In this paper, a population of 30 chromosomes is used, and a binary chromosome represents a possible solution to the problem. The size of the chromosome is L × N, where L represents the number of classifiers of the ensemble and N represents the number of attributes of the dataset. In this chromosome, the first N bits represent the feature subset for classifier C1, followed by the N bits for classifier C2, and so on. The genetic algorithm applies crossover and mutation operators, and it also keeps the best individual to pass to the next generation (elitism). Its stopping condition is a maximum number of iterations, which was different for each dataset (2500 for the original dataset and 1500 for the transformed one).

The ant colony optimization technique requires the problem to be represented as a graph, where the vertices (nodes) are the attributes of the dataset and the edges represent the trails to the next attributes. In this case, the search for the optimal subset of attributes can be represented by the trail of an ant in the graph, which ends when a certain number X of nodes (attributes) has been visited and a termination condition is met. In this paper, an initial analysis was carried out and a value of N/2 was found to be the best value for X, where N is the number of attributes of the dataset. The original ACO algorithm was applied, as in [22]. In our experimentation, a colony is composed of 30 ants, and each ant is responsible for building a possible solution at each iteration of the algorithm. After a defined number of iterations (termination condition), the solution which contains the highest intra-correlation value is chosen as the final solution.

7. Results and discussion

7.1. Ensembles with three base classifiers

In this section, we analyse the accuracy of the classification systems when using an ensemble size of 3, with and without feature selection. Table 1 shows the accuracy and standard deviation of the classification systems (individual classifiers – Ind, and ensemble systems combined with Sum, Weighted Sum – WS, Majority Voting – MV, k-NN and SVM). In all cases, homogeneous and heterogeneous structures are analysed. It is important to emphasize that, as we use different individual classifiers to form the homogeneous and heterogeneous ensembles, they have different accuracy values. In this investigation, we carried out two different statistical tests. The first one compared the ensemble systems with and without feature selection, on a two-by-two basis (comparing each version using feature selection with the version without feature selection); in Table 1, bold numbers represent the cases where the use of feature selection caused an increase in the accuracy of the ensemble systems, and bold and underlined numbers represent the cases in which the improvement was statistically significant. The second statistical test compared the ensemble systems on the original and transformed data, and the last two columns present the result of this test (+ represents a statistically significant improvement of the transformed data over the original); this test was also done on a two-by-two basis, always comparing the versions in the transformed space with the corresponding versions in the original space.

Table 1
Accuracy and standard deviation of the classification system using ensembles of size 3, divided into homogeneous and heterogeneous structures

                       Original                          Transformed                    t-test
               Hom             Het              Hom              Het             Hom   Het
Complete dataset
  Ind    66.89 ± 9.91    65.85 ± 8.89     77.2 ± 14.35     79.29 ± 5.43           +     +
  Sum    76.375 ± 7.9    80.27 ± 7.12     83.0 ± 11.73     84.09 ± 8.64           +     +
  MV     69.04 ± 12.47   69.71 ± 10.51    80.29 ± 11.74    82.63 ± 8.65           +     +
  WS     76.2 ± 5.32     80.3 ± 7.66      82.3 ± 6.7       84.54 ± 5.73           +     +
  k-NN   67.79 ± 9.69    76.7 ± 6.87      78.25 ± 14.74    81.5 ± 8.47            +     +
  SVM    72.63 ± 10.05   79.23 ± 7.24     75.83 ± 11.68    86.54 ± 5.98           −     +
Genetic Algorithm (GA)
  Ind    49.08 ± 8.41    49.84 ± 7.16     71.29 ± 8.02     71.37 ± 4.74           +     +
  Sum    66.29 ± 10.32   65.55 ± 5.81     80.08 ± 11.19    81.63 ± 5.38           +     +
  MV     52.63 ± 11.1    52.89 ± 9.26     75.54 ± 8.65     77.5 ± 3.72            +     +
  WS     59.54 ± 10.95   62.59 ± 8.73     80.88 ± 9.2      82.59 ± 3.62           +     +
  k-NN   59.13 ± 13.42   60.73 ± 8.23     79.38 ± 9.84     78.2 ± 5.19            +     +
  SVM    65.29 ± 12.19   67.68 ± 7.56     81.46 ± 9.25     82.77 ± 3.7            +     +
Ant Colony Optimization (ACO)
  Ind    67.58 ± 8.46    68.11 ± 6.82     76.94 ± 13.54    77.35 ± 5.94           +     +
  Sum    75.79 ± 8.35    77.82 ± 5.8      82.54 ± 10.74    85.18 ± 6.36           +     +
  MV     71.08 ± 9.93    72.27 ± 7.06     79.13 ± 12.05    82.45 ± 5.86           +     +
  WS     72.75 ± 10.07   76.54 ± 7.12     81.33 ± 12.57    86.66 ± 4.03           +     +
  k-NN   72.25 ± 10.02   73.93 ± 7.46     79.63 ± 12.48    83.02 ± 5.83           +     +
  SVM    73.88 ± 9.59    77.41 ± 6.24     81.17 ± 11.28    87.02 ± 2.17           +     +
Random distribution
  Ind    53.08 ± 12.11   53.7 ± 13.61     66.18 ± 11.67    66.17 ± 5.69           +     +
  Sum    64.13 ± 14.08   69.27 ± 10.3     80.01 ± 13.3     81.3 ± 5.4             +     +
  MV     55.13 ± 15.98   55.8 ± 18.29     73.92 ± 15.88    73.48 ± 7.81           +     +
  WS     70.13 ± 10.73   67.46 ± 15.62    80.58 ± 14       82.7 ± 3.4             +     +
  k-NN   63.83 ± 13.58   63.52 ± 16.57    77.67 ± 15.52    80.63 ± 5.23           +     +
  SVM    72.04 ± 11.41   69.8 ± 13.18     80.46 ± 13.07    81.86 ± 3.82           +     +
As can be seen from Table 1, the accuracy of the classification systems increased when moving from the original to the transformed fingerprint data. It is believed that this improvement in accuracy when using the transformed data occurs because the transformation function used all the minutiae extracted from the fingerprint. This shows that the use of the modified version of the transformation function had a positive effect on the accuracy of the classification systems. From a statistical point of view (columns 5 and 6 of Table 1), there were 47 statistically significant improvements (+) for ensembles of size 3 (out of 48); the only exception was for the complete dataset with homogeneous ensembles combined by SVM. When analysing the use of the feature selection methods (ACO, GA and random), it can be seen that they caused a decrease in the accuracy of the individual classifiers. In addition, there was an improvement in the accuracy levels (bold numbers in Table 1) of the ensemble systems in some cases (2 cases for GA, 10 cases for ACO and 0 cases for random distribution). When analysing these improvements from a statistical viewpoint (bold and underlined numbers), there were only three cases (2 for ACO and 1 for GA).

Table 2
Accuracy and standard deviation of the classification system using ensembles of size 6, divided into homogeneous and heterogeneous structures

                       Original                          Transformed                    t-test
               Hom             Het              Hom              Het             Hom   Het
Complete dataset
  Ind    68.16 ± 8.13    66.61 ± 3.52     76.28 ± 13.93    76.71 ± 7.47           +     +
  Sum    79.415 ± 6.61   83.13 ± 3.66     84.42 ± 10.36    90.07 ± 3.82           +     +
  MV     73.17 ± 8.36    73.77 ± 4.05     81.33 ± 10.04    89.07 ± 3.78           +     +
  WS     69.875 ± 8.2    82.025 ± 4.43    81.79 ± 10.33    90.04 ± 3.72           +     +
  k-NN   75.5 ± 7.63     80.48 ± 2.25     80.33 ± 12.06    87.09 ± 5.48           +     +
  SVM    78.21 ± 8.23    83.45 ± 1.6      79.13 ± 10.2     87.93 ± 3.19           +     +
Genetic Algorithm (GA)
  Ind    63.11 ± 8.7     63.24 ± 4.06     70.9 ± 8.84      71.02 ± 4.66           +     +
  Sum    78.79 ± 9.11    82.3 ± 1.55      89.0 ± 3.47      91.73 ± 0.89           +     +
  MV     76.29 ± 10.05   79.54 ± 3.69     87.42 ± 2.55     89.91 ± 0.79           +     +
  WS     77.08 ± 9.55    81.36 ± 1.75     89.01 ± 4.0      90.91 ± 1.37           +     +
  k-NN   77.63 ± 9.52    81.55 ± 2.49     84.79 ± 6.56     87.34 ± 2.48           +     +
  SVM    79.38 ± 8.63    83.21 ± 1.5      87.42 ± 3.59     89.27 ± 1.13           +     +
Ant Colony Optimization (ACO)
  Ind    67.11 ± 9.66    66.79 ± 4.4      75.89 ± 14.87    75.75 ± 7.13           +     +
  Sum    77.83 ± 8.33    81.45 ± 2.18     82.71 ± 9.22     90.07 ± 1.19           +     +
  MV     75.67 ± 9.07    78.34 ± 4.47     80.75 ± 9.46     88.75 ± 1.73           +     +
  WS     75.88 ± 9.69    80.07 ± 2.39     82.0 ± 9.24      89.77 ± 1.14           +     +
  k-NN   76.25 ± 9.49    80.43 ± 2.72     79.46 ± 11.23    87.54 ± 2.5            +     +
  SVM    78.33 ± 8.22    82.14 ± 1.26     82.13 ± 9.67     89.41 ± 0.68           +     +
Random distribution
  Ind    60.79 ± 7.84    60.4 ± 4.07      73.21 ± 14.81    72.74 ± 8.3            +     +
  Sum    75.96 ± 6.91    81.3 ± 3.27      85.71 ± 4.41     90.18 ± 2.29           +     +
  MV     75.04 ± 7.94    79.5 ± 4.99      84.79 ± 6.58     88.45 ± 3.31           +     +
  WS     75.54 ± 7.8     80.25 ± 4.33     84.13 ± 8.32     89.5 ± 3.71            +     +
  k-NN   74.96 ± 7.22    80.25 ± 4.04     82.25 ± 8.35     86.95 ± 3.65           +     +
  SVM    74.96 ± 7.03    82.07 ± 2.88     82.92 ± 8       87.7 ± 3.57             +     +

It is believed that the small number of individual classifiers is the main reason for this decrease in the accuracy level. It is expected that the use of feature selection causes a decrease in the performance of the individual classifiers, and when a small number of individual classifiers is used, the combination of these weaker classifiers was not sufficient to avoid a decrease in the performance of the ensemble systems. When comparing the accuracy of the different ensemble structures, it can be seen that the accuracy of the heterogeneous structures was higher than that of the homogeneous structures. In addition, the improvement in accuracy when moving from original to transformed data was higher for the heterogeneous structures (an average improvement of 11.5, considering the improvement of all ensemble systems) than for the homogeneous ones (an average improvement of 11.2).

7.2. Ensembles with six base classifiers

In this section, we analyse the accuracy of the classification systems when increasing the ensemble size to 6. Table 2 shows the accuracy and standard deviation of the classification systems.
From Table 2, it can be seen that the increase in the ensemble size caused a slight decrease in the accuracy of the individual classifiers (mainly for ACO) and an increase in the accuracy of the ensemble systems. This improvement happened in both ensemble structures, but it was higher for the homogeneous structures than for the heterogeneous ones. In addition, this improvement was noticeable in ensemble systems with all combination methods, but it was higher for k-NN and SVM than for the Sum and Voting strategies. Also from Table 2, it can be seen that the improvement achieved when using transformed data was higher than with ensemble systems of size 3, in all analysed cases. When applying the statistical test, statistically significant improvements were observed in all cases. Of the feature selection methods investigated, we can observe that there was an improvement in the accuracy levels (bold numbers in Table 2) of the ensemble systems in the majority of cases (16 cases for GA, 9 cases for ACO and 9 cases for random). When analysing these improvements from a statistical viewpoint (bold and underlined numbers), there were only 11 cases (8 for GA, 3 for ACO and 2 for random).

Table 3
Accuracy and standard deviation of the classification system using ensembles of size 12, divided into homogeneous and heterogeneous structures

                       Original                          Transformed                    t-test
               Hom             Het              Hom              Het             Hom   Het
Complete dataset
  Ind    64.96 ± 13.81   67.0 ± 5.51      73.87 ± 16.93    75.5 ± 8.15            +     +
  Sum    78.58 ± 6.06    84.16 ± 1.38     85.375 ± 9.26    90.55 ± 3.77           +     +
  MV     72 ± 9.98       78.95 ± 4.87     82.71 ± 9.63     90.63 ± 1.99           +     +
  WS     76.96 ± 7.1     83.89 ± 1.67     82.42 ± 10.78    89.57 ± 4.63           +     +
  k-NN   74.13 ± 7.54    81.86 ± 3.16     81.04 ± 9.14     88.14 ± 4.44           +     +
  SVM    77.58 ± 7.41    84.17 ± 1.37     81.73 ± 8.37     88.2 ± 1.37            +     +
Genetic Algorithm (GA)
  Ind    62.86 ± 7.61    63.09 ± 3.41     75.22 ± 10.41    75.24 ± 5.64           +     +
  Sum    80.03 ± 6.18    83.39 ± 1.04     93.13 ± 1.94     94.13 ± 0.85           +     +
  MV     78.58 ± 7.55    81.71 ± 2.93     92.58 ± 1.85     93.54 ± 1.04           +     +
  WS     77.96 ± 8.97    82.57 ± 1.81     92.96 ± 2.87     94.69 ± 0.82           +     +
  k-NN   81.38 ± 4.91    84.09 ± 0.83     87.67 ± 7.6      90.15 ± 1.65           +     +
  SVM    82.79 ± 3.37    84.98 ± 0.12     90.71 ± 2.8      92.04 ± 0.65           +     +
Ant Colony Optimization (ACO)
  Ind    67.0 ± 9.67     67.04 ± 4.07     76.39 ± 14.58    76.43 ± 6.98           +     +
  Sum    78.83 ± 8.23    82.23 ± 1.38     86.88 ± 5.73     90.38 ± 1.17           +     +
  MV     77.67 ± 8.71    80.5 ± 3.21      85.63 ± 6.74     90.02 ± 1.7            +     +
  WS     76.5 ± 8.52     81.14 ± 2.02     86.08 ± 7.78     91.05 ± 1.04           +     +
  k-NN   78.29 ± 7.15    81.29 ± 2.12     84.08 ± 8.06     88.61 ± 1.83           +     +
  SVM    80.0 ± 6.0      83.61 ± 1.56     85.63 ± 7.3      90.46 ± 1.07           +     +
Random distribution
  Ind    47.15 ± 7.04    46.94 ± 3.86     74.69 ± 15.8     74.53 ± 9.04           +     +
  Sum    79.25 ± 9.36    81.96 ± 4.59     90.33 ± 7.52     91.66 ± 3.74           +     +
  MV     77.33 ± 10.25   79.25 ± 5.94     88.46 ± 8.2      90.16 ± 3.65           +     +
  WS     77.71 ± 10.58   80.59 ± 5.27     87.38 ± 8.93     92.11 ± 3.29           +     +
  k-NN   74.06 ± 15.12   78.14 ± 8.56     86.13 ± 10.34    88.01 ± 4.45           +     +
  SVM    78.5 ± 11.16    81.46 ± 4.99     86.79 ± 8.32     91.14 ± 2.93           +     +

Unlike the ensembles with 3 classifiers, in this case the GA provided the highest accuracy levels and the highest number of statistically significant improvements.

7.3. Ensembles with twelve base classifiers

In this section, we analyse the accuracy of the classification systems when increasing the ensemble size to 12. Table 3 shows the accuracy and standard deviation of the classification systems. It can be seen from Table 3 that the increase in the ensemble size from 6 to 12 caused slight increases in accuracy in the majority of the cases (mainly for the ensemble systems with feature selection). In comparing the original and transformed datasets, once again the performance of the ensemble systems applied to the transformed dataset was much superior to that of those applied to the original data, mainly for the ensemble systems with feature selection. From the statistical point of view, statistically significant improvements were achieved in all analysed cases.
In addition, as in the previous sections, the improvement in accuracy when moving from original to transformed data was higher for the heterogeneous structures than for the homogeneous ones. Of the feature selection methods, we can observe that there was an improvement in the accuracy levels of the ensemble systems in almost all analysed cases (18 cases for GA, 13 cases for ACO and 9 cases for random). When analysing these improvements from a statistical viewpoint (bold and underlined numbers), there were 27 cases (17 for ACO, 10 for GA and 2 for random). Once again the GA provided the highest accuracy levels and the highest number of statistically significant improvements.

8. Final remarks

This paper has presented an analysis of a well-established recognition structure, the ensemble system, as a tool to enhance the performance of cancellable fingerprint biometric recognition. To this end, a modified version of a previously reported transformation function was proposed, and a commonly used fingerprint dataset was processed using this transformation. In addition, two optimization techniques were used in order to improve the effectiveness of the ensemble systems. Finally, the accuracy of individual classifiers as well as of heterogeneous and homogeneous ensemble structures was analysed with respect to both the original and the transformed data.

Through this analysis, it has been demonstrated that the use of our modified version of a non-invertible transformation function results in an increase in the accuracy of the classification systems. This gain was more marked for the ensemble systems than for the individual classifiers. Also, when increasing the size of the ensemble systems, it is noticeable that the improvement with respect to the transformed data increased; this shows that the use of a larger ensemble is a more beneficial option when working with a cancellable fingerprint-based biometric identification system.

In relation to the use of feature selection methods to increase the effectiveness of the ensemble systems, it can be observed that the use of simple distribution methods, such as random distribution, can lead to an increase in the accuracy of ensemble systems in some cases. However, more elaborate methods, such as optimization techniques, improve the accuracy of these systems even further. In addition, we could see that, as the number of individual classifiers increased, the performance of the ensemble systems with feature selection increased and surpassed the performance of the ensembles without feature selection. When using ensembles with 12 individual classifiers, for instance, the accuracy level of the feature-selection-based ensembles was higher than that of those without feature selection in almost all analysed cases.

In relation to the performance of the individual optimization techniques, it can be noted that ACO affected the accuracy of the ensembles with fewer individual classifiers (3 and 6) more positively. As the number of individual classifiers increased, the performance of the GA-based ensembles increased and they became the ensembles with the highest accuracy in most cases. In other words, based on this empirical analysis, it is possible to conclude that when using small ensembles (a small number of individual classifiers), the best option is ACO; however, for large ensembles, the best choice is GA.

Acknowledgement

This work was supported in part by CNPq, under process number 200.755/2009-9.

References

[1] J. Bringer, H. Chabanne and B. Kindarji, The best of both worlds: Applying secure sketches to cancellable biometrics, Science of Computer Programming 74(1–2) (2008), 43–51.
[2] R.M. Bolle, J.H. Connell and N.K. Ratha, Biometric perils and patches, Pattern Recognition 35(12) (2002), 2727–2738.
[3] T. Boult, Robust distance measures for face-recognition supporting revocable biometric tokens, in: Proc. 7th Int. Conf. on Automatic Face and Gesture Recognition, Southampton, UK, April 2006, pp. 560–566.
[4] A. Canuto, M. Abreu, L.M. Oliveira, J.C. Xavier Junior and A. Santos, Investigating the influence of the choice of the ensemble members in accuracy and diversity of selection-based and fusion-based methods for ensembles, Pattern Recognition Letters 28 (2007), 472–486.
[5] P. Campisi, E. Maiorana and A. Neri, On-line signature based authentication: Template security issues and countermeasures, in: Biometrics: Theory, Methods, and Applications, N.V. Boulgouris, K.N. Plataniotis and E. Micheli-Tzanakou, eds, Wiley/IEEE, 2008.
[6] S. Chikkerur, N.K. Ratha, H. Connell and R.M. Bolle, Generating registration-free cancelable fingerprint templates, in: Proc. BTAS 2008, pp. 1–6, 2008.
[7] J. Demsar, Statistical comparisons of classifiers over multiple data sets, Journal of Machine Learning Research 7 (2006), 1–30.
[8] F. Farooq, R.M. Bolle, T.-Y. Jea and N. Ratha, Anonymous and revocable fingerprint recognition, in: Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Minneapolis, June 2007.
[9] G. Giacinto and F. Roli, Dynamic classifier selection based on multiple classifier behaviour, Pattern Recognition 34(9) (2001), 1879–1881.
[10] A.K. Jain, K. Nandakumar and A. Nagar, Biometric template security, EURASIP Journal on Advances in Signal Processing, Special Issue on Biometrics, January 2008.
[11] N. Khan, M. Javed, N. Khattak and U. Chang, Optimization of core point detection in fingerprints, in: Proc. DICTA 2007, pp. 260–266.
[12] S. Kanade, D. Petrovska-Delacretaz and B. Dorizzi, Cancelable iris biometrics and using error correcting codes to reduce variability in biometric data, in: Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 120–127, 2009.
[13] L.I. Kuncheva, Combining Pattern Classifiers, Wiley, New Jersey, 2004.
[14] C. Lee, J. Choi, K. Toh, S. Lee and J. Kim, Alignment-free cancellable fingerprint templates based on local minutiae information, IEEE Transactions on Systems, Man, and Cybernetics – Part B 37 (2007), 980–992.
[15] C. Lee and J. Kim, Cancelable fingerprint templates using minutiae-based bit-strings, Journal of Network and Computer Applications (2010), doi:10.1016/j.jnca.2009.12.011.
[16] D.M. Maltoni, A.K. Jain and S. Prabhakar, Handbook of Fingerprint Recognition, Springer, Germany, 2003.
[17] A. Nagar and A.K. Jain, On the security of non-invertible fingerprint template transforms, in: Proc. IEEE Workshop on Information Forensics and Security (WIFS), London, December 2009.
[18] NIST Fingerprint Image Software 2 (NFIS2), user's guide, http://fingerprint.nist.gov/NBIS/index.html.
[19] N.K. Ratha, S. Chikkerur, J.H. Connell and R.M. Bolle, Generating cancelable fingerprint templates, IEEE Transactions on Pattern Analysis and Machine Intelligence 29(4) (2007), 561–572.
[20] W. Xu and M. Chang, Cancelable voiceprint template based on chaff-points-mixture method, in: Proc. Int. Conf. on Computational Intelligence and Security, pp. 263–266, 2008.
[21] B. Yang, C. Busch, M. Derawi, P. Bours and D. Gafurov, Geometric-aligned cancelable fingerprint templates, in: Proc. 15th Int. Conf. on Image Analysis and Processing, LNCS Vol. 5716, pp. 490–499, 2009.
[22] FVC2004, Fingerprint Verification Competition, http://bias.csr.unibo.it/fvc2004/.
[23] A. Canuto, M. Fairhurst, L. Santana, F. Pintro and A. Feitosa Neto, Ensemble-based methods for cancellable biometrics, in: Proc. ICANN 2010, LNCS 6352, pp. 411–414, 2010.
[24] J. Hua, W.D. Tembe and E.R. Dougherty, Performance of feature-selection methods in the classification of high-dimension data, Pattern Recognition 42(3) (2009), 409–424.
[25] D.F. Oliveira, A.M. Canuto and A. Campos, GNeurAge: An evolutionary agent-based system for classification tasks, in: Proc. 6th Int. Conf. on Hybrid Intelligent Systems, 2006.
[26] W. Zhong, J. Liu, M. Xue and L. Jiao, A multiagent genetic algorithm for global numerical optimization, IEEE Transactions on Systems, Man, and Cybernetics – Part B 34(2) (2004), 1128–1141.
[27] M. Dorigo, Optimization, learning and natural algorithms (in Italian), Ph.D. thesis, Dipartimento di Elettronica, Politecnico di Milano, Italy, 1992.
[28] L. Nanni, E. Maiorana, A. Lumini and P. Campisi, Combining local, regional and global matchers for a template protected on-line signature verification system, Expert Systems with Applications 37(5) (2010), 3676–3684.
[29] E. Maiorana, P. Campisi and A. Neri, Template protection for dynamic time warping based biometric signature authentication, in: Proc. 16th Int. Conf. on Digital Signal Processing (DSP), 2009.
[30] W. Xu, Q. He, Y. Li and T. Li, Cancelable voiceprint templates based on knowledge signatures, in: Proc. Int. Symposium on Electronic Commerce and Security, pp. 412–415, 2008.
[31] A. Nagar, K. Nandakumar and A.K. Jain, A hybrid biometric cryptosystem for securing fingerprint minutiae templates, Pattern Recognition Letters, 2010, in press.
[32] F. Quan, S. Fei, C. Anni and Z. Feifei, Cracking cancelable fingerprint template of Ratha, in: Proc. Int. Symposium on Computer Science and Computational Technology, pp. 572–575, 2008.
[33] K. Takahashi and S. Hirata, Generating provably secure cancelable fingerprint templates based on correlation-invariant random filtering, in: Proc. 3rd IEEE Int. Conf. on Biometrics: Theory, Applications and Systems, 2009.
[34] L. Santana, A. Canuto, L. Silva, F. Pintro and K. Vale, A comparative analysis of genetic algorithm and ant colony optimization to select attributes for a heterogeneous ensemble of classifiers, in: Proc. IEEE Congress on Evolutionary Computation (CEC), pp. 1–8, 2010.
[35] M. Lee et al., A two-step approach for feature selection and classifier ensemble construction in computer-aided diagnosis, in: Proc. IEEE Int. Symposium on Computer-Based Medical Systems, pp. 548–553, 2008.
[36] L. Oliveira, M. Morita and R. Sabourin, Feature selection for ensembles applied to handwriting recognition, International Journal of Document Analysis (2006), 262–279.
[37] A. Canuto, M. Fairhurst, L. Santana, F. Pintro and A. Feitosa Neto, Enhancing performance of cancellable fingerprint biometrics using classifier ensembles, in: Proc. SBRN 2010.