Author manuscript, published in "Advances in Biometrics : Third International Conference, ICB 2009, Alghero, Italy, June 2-5, 2009 (2009) 1290-1298" DOI : 10.1007/978-3-642-01793-3_130
Palm Vein Verification System based on SIFT matching Pierre-Olivier Ladoux1, Christophe Rosenberger2, Bernadette Dorizzi1 1 Institut TELECOM Télécom & Management SudParis 9 Rue Charles Fourier, 91011 Évry Cedex, France
[email protected] [email protected] 2
hal-00472805, version 1 - 13 Apr 2010
Laboratoire GREYC ENSICAEN – CNRS – Université de Caen Basse-Normandie 6 boulevard Maréchal Juin, 14000 Caen, France
[email protected]
Abstract We present in this communication a new biometric system based on hand veins acquired with an infrared imager. After the preprocessing stage and binarization, the vein image is characterized by specific patterns. One originality of the proposed system is the use of SIFT descriptors for the verification process. The developed method requires only a single image for the enrollment step, allowing a very fast verification. Experimental results on a database containing images of 24 individuals acquired over two sessions show the efficiency of the proposed method. Keywords: Hand vein modality, IR acquisition, normalization preprocessing, SIFT descriptors, key-point matching.
1 Introduction Hand vein is a biometric modality that seems promising as it is acquired in Near InfraRed (NIR) light, which implies that skin variations and dirtiness have less impact than in visible light [1]. Moreover, the hemoglobin flowing in the veins absorbs NIR light, which allows a good quality of acquisition of the hand veins. It is possible to use either the back of the hand or the palm. A recent study [2] using back-of-hand vein data, tested with 5 sessions per person and 50 persons, showed promising results. The main limitation of this database is the low resolution of the images (132x124 pixels). The first commercial products have been produced by Hitachi [3] for the back of the hand and Fujitsu [4] for the palm. They have been patented, but only little information is available on them. These companies claim a very low FRR at very low FAR on a huge database – close to 0% on 140,000 hands. Unfortunately, at this moment there is no public database allowing verification of these figures. In general, in the various papers in the literature, some preprocessing algorithms are applied after the acquisition phase, such as histogram equalization and low-pass filtering. Then, binarization is performed and, for verification using this image, two main streams can be found: global matching of the reference and test images through a pixel-to-pixel superposition [5], or after some frequency analysis as in Laplacian palm processing [8]. Another direction consists in performing local matching of specific feature points extracted from the reference and test images [6]. The matching algorithms in this last approach are similar to those used for fingerprint verification [7].
The approach developed in this paper falls into the last category. However, for the matching of feature points, we use the well-known SIFT algorithm [9] which, to our knowledge, has never been used so far for hand vein verification. This algorithm, developed for image matching, was proven to be very efficient for face verification [10]. We tested the proposed system on a small home database of videos of the hand palm of 24 persons, acquired in two sessions. We present in Section 2 the details of our approach, including a synopsis of the developed system, the acquisition device used for acquiring the palm database, as well as the preprocessing applied to these images. We also explain our binarization procedure. In the following subsection, the detection of the feature points and the SIFT procedure are briefly described. Because some translations and rotations are present between the two sessions, we also propose a post-processing allowing a better separation of genuine and impostor scores. Finally, Section 3 presents the experimental work, namely the protocols defined on each database and the results in terms of EER and ROC curves.
2 Description of the proposed system The overall scheme of the proposed method is shown in Figure 1.
Figure 1: Synopsis of the developed method (verification step) The system is standard in biometrics, with an enrollment step and a verification step. One important characteristic of our method is that only one image is needed for the enrollment step.
Moreover, a post-processing step is added after the matching phase in order to tackle translation and rotation problems.
2.1 Acquisition
Image acquisition is performed with 48 infrared LEDs and a CCD camera whose sensitive range lies between 400 nm and 900 nm; the wavelength of interest is around 850 nm. We added a support to this system in order to help persons position their hand and thereby limit translations and rotations. The hand is approximately 20 cm from the camera lens (see Figure 2).
Figure 2: NIR acquisition device At each acquisition, we recorded a short video providing a set of 30 greyscale images of size 640x320 pixels. In this way, we acquired the data of 24 persons in two sessions.
2.2 Pre-processing The first step of the pre-processing is the extraction of the region of interest (ROI). Due to the limited translation and rotation, this is eased; the ROI is cut down to 232x280 pixels. Then, a 5x5 box filter is applied to the ROI in order to reduce noise. After removing the high-frequency noise, we need to correct the brightness, which is not uniform. A 51x51 Gaussian low-pass filter is applied to the ROI in order to obtain the brightness image, which is considered as the low frequencies. The brightness is then subtracted from the original ROI. At this step, the contrast is still too low. We therefore apply a normalization method commonly used in fingerprint verification [7]. For each image I of size NxM, the mean and variance (denoted µ and σ² respectively) are computed. Equation (1) describes the normalization process applied to the image, with µd and σd² the desired values of the mean and variance. For each pixel, we modify its grey level with the following formula:

I'(x,y) = µd + √( σd² (I(x,y) − µ)² / σ² )   if I(x,y) > µ
I'(x,y) = µd − √( σd² (I(x,y) − µ)² / σ² )   otherwise        (1)
where I(x,y) corresponds to the grey level of the pixel located at (x,y) in the original image and I'(x,y) to that of the resulting image after pre-processing. Figure 3 shows an original palm image acquired with our sensor and the corresponding image after pre-processing. For our experiments, we empirically set µd to 128 and σd² to 40².
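As a minimal sketch, the normalization of Equation (1) can be written as follows in NumPy (the function name is illustrative, and the smoothing steps described above are assumed to have been applied already):

```python
import numpy as np

def normalize(img, mu_d=128.0, var_d=40.0 ** 2):
    """Mean/variance normalization of Eq. (1).

    Illustrative sketch: the 5x5 box filter and the brightness
    subtraction described in the text are assumed done beforehand.
    """
    img = img.astype(np.float64)
    mu, var = img.mean(), img.var()
    # per-pixel deviation from the mean, rescaled to the desired variance
    dev = np.sqrt(var_d * (img - mu) ** 2 / var)
    # pixels above the mean are pushed up, pixels below are pushed down
    return np.where(img > mu, mu_d + dev, mu_d - dev)
```

Since the transform is linear on each side of the mean, the output image has mean µd and variance σd² exactly.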
Figure 3: (left) NIR image of palm, (right) Image after pre-processing
2.3 Vein Pattern Extraction After noise reduction and contrast normalization, the quality of the image is improved, as can be seen in Figure 3. To obtain the vein pattern, it is necessary to extract the veins from the background. The grey level is low where the hemoglobin absorbs the NIR light. The chosen extraction algorithm is therefore a local thresholding depending on the mean value of the neighborhood of each pixel. Figure 4 shows the processing results for two images of the same individual. If somebody were asked to decide whether these images correspond to the same individual, he or she would probably try to find similar areas between the two images. This is the idea of the methodology defined in the next section.
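A sketch of such a mean-based local thresholding, using an integral image to compute the neighbourhood means efficiently (the window size is our choice for illustration, not a value from the paper):

```python
import numpy as np

def extract_veins(img, win=15):
    """Mark a pixel as vein (1) when its grey level is below the mean of
    its (2*win+1)^2 neighbourhood. Sketch only; `win` is illustrative."""
    img = img.astype(np.float64)
    h, w = img.shape
    k = 2 * win + 1
    # edge-replicate padding so border pixels have a full window
    pad = np.pad(img, win, mode='edge')
    # integral image (one extra leading row/column of zeros)
    ii = np.pad(pad, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    # sum over each k x k window in O(1) per pixel
    local_sum = (ii[k:k + h, k:k + w] - ii[:h, k:k + w]
                 - ii[k:k + h, :w] + ii[:h, :w])
    local_mean = local_sum / (k * k)
    return (img < local_mean).astype(np.uint8)  # 1 = vein, 0 = background
```

Dark vein pixels fall below their local mean and are kept, while uniform background regions (where the pixel equals its neighbourhood mean) are discarded.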
Figure 4: Examples of two binarized images corresponding to the same individual
2.4 Pattern definition We chose a local description of the vein image in order to facilitate the verification step. We used the SIFT descriptor, following a comparative study [11] showing that it is one of the most efficient local descriptors. The invariant descriptor of the SIFT algorithm described in [9] is applied locally on key-points and is based on the image gradients in a local neighborhood. The SIFT detector and descriptor are constructed from the Gaussian scale space of the source image. The algorithm also makes use of another scale space, called the difference of Gaussians (DoG), which can be considered as the scale derivative of the Gaussian scale space. Extracted key-points are defined as local extrema of the DoG scale space. The descriptor is created by sampling the magnitudes and orientations of the image gradients in a neighborhood of each key-point and building smoothed orientation histograms that capture the important aspects of the neighborhood. Each local descriptor is composed of a 4x4 array of histograms; to each cell of this array, an 8-orientation histogram is associated. A 128-element vector is thus built for each key-point. We used the implementation provided by Lowe [9]. As an illustration, we obtained an average of 800 detected key-points on the vein images at hand.
2.5 Matching similarity Each individual is described by a set of invariant features Y(I) = {ki = (si, xi, yi)}, i = 1:N(I), where si is the 128-element SIFT invariant descriptor computed near key-point ki, (xi, yi) its position in the original image I, and N(I) the number of detected key-points in image I. The verification problem for an individual, given the set Y(I) corresponding to his or her biometric model, is to measure the similarity with another set of key-points computed on the supposed vein image of the individual. We thus have to compute a similarity between two sets of points Y(I1) and Y(I2), using the following matching method, which is a modified version of a decision criterion first proposed by Lowe [9]: Given two points x ∈ Y(I1) and y ∈ Y(I2), we say that x is associated to y iff:

d(x,y) = min{z ∈ Y(I2)} d(x,z)   and   d(x,y) ≤ C·d(x,y')        (2)
where C is an arbitrary threshold, d(.,.) denotes the Euclidean distance between the SIFT descriptors, and y' denotes any point of Y(I2) whose distance to x is minimal but greater than d(x,y):

d(x,y') = min{z ∈ Y(I2), d(x,z) > d(x,y)} d(x,z)        (3)

In other words, x is associated to y if y is the closest point to x in Y(I2) according to the Euclidean distance between SIFT descriptors, and if the second smallest value of this distance, d(x,y'), is significantly greater than d(x,y). The significance of the required gap between d(x,y) and d(x,y') is encoded by the constant C. In the same way, we say that y ∈ Y(I2) is associated to x ∈ Y(I1) iff x is the closest point to y in Y(I1) according to the Euclidean distance between SIFT descriptors and if the second smallest value of this distance d(y,x') satisfies d(y,x)