TRANSFORMING A PATTERN IDENTIFIER INTO BIOMETRIC KEY GENERATORS

Yongdong Wu and Bo Qiu
Institute for Infocomm Research, A*STAR, Singapore
{wydong,qiubo}@i2r.a-star.edu.sg

ABSTRACT

Building on a popular pattern recognition method, this paper proposes two methods to generate a secret from an individual's biometric information, such as fingerprint feature points. The first method recovers a secret using the Chinese Remainder approach. The second adds chaff points modeled on general biometric feature points, and then recovers the secret by polynomial fitting. To guarantee integrity, a message authentication code (MAC) over all the points is stored in public along with the points. A user is confirmed only if the user can regenerate a secret that matches the stored MAC. To keep pace with gradual changes in the measured features, the secret and the points can be updated adaptively.

Keywords: Security, privacy, biometric key

1. INTRODUCTION

The rapid development of electronic transactions has stimulated a strong demand for cryptography and cryptographic systems. For example, an e-banking system employs a public key cryptosystem, which provides confidentiality and authenticity assuming that only the owner knows the private key. Because the private key and the corresponding public key are hard to remember, both are often stored on a medium such as a smartcard or a flash disk. This public-key method has the inherent weakness that the key is lost to its owner when the medium is damaged, lost or stolen. Since biometric data is available all the time and can be provided only by its owner, it would be very beneficial to generate the private key of a public key cryptosystem from it. The pattern recognition community offers many solutions ([1]-[5]) for entity identification by extracting individual biometric features.
Unfortunately, these solutions can only authenticate an identity; they cannot generate or recover a private key from the biometric parameters, because biometric features are stable only to a limited degree. In other words, they are acceptable only in an identification system where biometric templates are available.

To generate a private key from a biometric source, Tomko [6] employed an FFT with an FFT modulator to extract a bit-string. Tomko's method, however, is expensive and inconvenient. According to the keystroke experiments of Araujo et al. [7], keystroke duration and latency may be features of interest. Nonetheless, as the typing pattern is generally not stable and the dissimilarity between two persons is not large, only a few bits (10 bits or so) can be extracted from keyboard dynamics if the false acceptance rate is to remain reasonably low. Therefore, keystroke dynamics are inappropriate as the sole source of a cryptographic key, although they can be used to harden a password [8].

978-1-4244-7493-6/10/$26.00 ©2010 IEEE

Juels et al. [9] applied an error correcting code to obtain a stable secret for authenticating the user. Similarly, Hao et al. [10] generated a secret from a (binary) iris image with a Hadamard code and a Reed-Solomon code. Dodis et al. [11] formalized the scheme of [9] as a fuzzy extractor and instantiated it with three metrics: Hamming distance, set difference and edit distance. However, with edit distance, few bits can be extracted from the biometric data. Boyen [12] extended the work of Dodis et al. so that the biometric is reusable. However, these distance measures are unsatisfactory in most biometric applications, where Euclidean distance is the widely accepted measure (e.g., [13]-[16]). It is therefore very useful to support Euclidean distance directly, because the pattern recognition field has abundant results based on it.

Up to now, there are two methods of managing the biometric features to address the Euclidean distance problem: binary representation and chaff insertion. Unlike a pattern identifier, which has a template, both methods employ a sketch that helps key recovery while leaking little information.

Except for one-bit integers (i.e., binary values), two integers may have a large Euclidean distance even if their Hamming distance is small. Therefore, only binary features are directly suitable for secret recovery. Daugman et al. [17] employed multi-scale Gabor wavelets to generate a binary biometric code and then authenticated the claimant by Hamming-distance similarity. The paper reports extracting 173 bits from a 2048-bit iris code, and recovering a biometric key $K = E(2048, 173)$ with an error correcting code encoder $E(\cdot)$.
However, for other biometric information such as handwritten signatures or fingerprints, no transformation is known that guarantees a small Euclidean distance between two non-binary messages whenever their Hamming distance is small. Chang et al. [20] gave a method that produces a small sketch for applications involving mobile devices.

On the other hand, Juels et al. [21] proposed adding chaff points to the feature point set to create a sketch. Technically, given a sketch that initially contains the feature points, a chaff point is chosen uniformly at random. If the chaff point is not close to any point in the sketch, it is added to the sketch. Chaff points are generated iteratively until a sufficient number of points have been selected or it is impossible to add more. Some schemes [22, 23, 24] adopt this random method of creating the sketch, i.e., generating random points and then removing close points. The challenging issue in the generate-and-remove process is how to represent the sketch efficiently with as little entropy loss as possible.

ICME 2010

Feng et al. [25] proposed a three-step hybrid algorithm based on random projection, a discriminability-preserving (DP) transform, and a fuzzy commitment scheme. To evaluate the performance of their method and exploit face features, they used three publicly available face databases, namely FERET, CMU-PIE, and FRGC, and generated binary templates of more than 200 bits. To manage the high degree of noise and variability in biometric signals, Bui et al. [26] proposed fuzzy methods based on dirty paper coding, known as quantization index modulation.

This paper presents two methods for generating biometric keys. The first is based on Chinese Remainder decoding; it is simple in principle provided that the order of the biometric features is known. The second builds a sketch by generating chaff points based on the properties of the biometric. In comparison with previous schemes, the present scheme has fewer spurious chaff points that could be removed once the biometric model is estimated. Thus, the present scheme is more robust and efficient.

This paper is organized as follows. Section 2 introduces some basic concepts. The following sections transform the pattern authenticator of Section 2.1 into biometric key generators: Section 3 is based on Chinese Remainder decoding, while Section 4 is based on polynomial fitting. Section 5 concludes the paper.

2. PRELIMINARIES

2.1. A Pattern Authenticator

“Biometric” and its grammatical equivalents refer to some aspect of a person that can be recorded and/or measured.
Examples include fingerprints, voice, and behavior. A pattern authenticator is used to authenticate a person and involves two steps: registration and verification.

In the registration step, the required biometric data is acquired by a device (e.g., a fingerprint scanner), and a feature extraction procedure is then applied to the data to obtain the necessary features. Some features may be meaningful, and others may not. Assume the biometric feature has $d$ dimensions, denoted $X_1, X_2, \ldots, X_d$. To increase robustness, many samples are collected and the mean $\bar{x}_i$ of each feature is calculated. Assume $X_i$ is a random variable represented as $X_i = \bar{x}_i + \varepsilon_i$, where $\varepsilon_i$ is Gaussian noise. All the mean values $\{\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_d\}$ are stored in a database as a reference template, along with other information such as the owner's identity.

In the verification step, a new sample is captured, and a feature point $x = \{x_1, x_2, \ldots, x_d\}$ is produced from the sample with the same method as in the registration step. Among the many biometric authentication methods [18], this paper adopts a simple one: if

$$\| x_i - \bar{x}_i \| \le \tau_i, \quad i = 1, 2, \ldots, d, \tag{1}$$

where the $\tau_i$ are predefined parameters, the inspected person is confirmed.

2.2. Chinese Remainder with errors

Let $q_1, q_2, \ldots, q_d$ be pairwise relatively prime integers with $q_i < q_{i+1}$, and randomly select a positive integer $m < q_1 q_2 \cdots q_k$. Denote $Y = \{y_1, y_2, \ldots, y_d\}$, where $y_i = m \bmod q_i$. Then $m$ can be decoded from a set $\tilde{Y} = \{\tilde{y}_1, \tilde{y}_2, \ldots, \tilde{y}_t\}$, for some $t$, by list decoding [27] if $|\tilde{Y} \cap Y| \ge d - e$, i.e., if there are at most $e$ errors.

2.3. Reed-Solomon code based polynomial fitting with errors

Following [21], consider a $(k, d)$ Reed-Solomon code: each codeword consists of $d$ symbols and corresponds to a unique polynomial $m(\cdot)$ of degree less than $k$ over a finite field. For a data source $x = \{x_1, x_2, \ldots, x_d\}$, the codeword is $Y = \{y_i\} = \{m(x_i)\}$, $i = 1, 2, \ldots, d$. Suppose that $\tilde{Y} = \{\tilde{y}_1, \tilde{y}_2, \ldots, \tilde{y}_t\}$ is a corruption of the codeword $Y$, i.e., $|\tilde{Y} \cap Y| \ge d - e$. Guruswami and Sudan [28] presented an algorithm that decodes $\tilde{Y}$ to the original secret $m(\cdot)$.

3. THE FIRST KEY GENERATOR

Using biometric information to generate a secret, such as a key for encryption or similar purposes, involves three stages: gathering the biometric information, processing it, and generating the secret key. The first two stages are similar to those of the pattern authenticator introduced in Section 2.1. The present paper focuses on the third stage, key generation, which includes two processes: registration and recovery.

3.1. Key Registration

The key registration step extracts the biometric features and then builds a sketch for future recovery. Technically,

• Capture a user's biometric data $n$ times. Denote the values of the $i$th feature obtained from the $n$ samples by a feature extraction process as $x_{i1}, x_{i2}, \ldots, x_{in}$. The mean and variance of the $i$th feature are calculated as

$$\bar{x}_i = \frac{x_{i1} + x_{i2} + \cdots + x_{in}}{n}, \tag{2}$$

$$\sigma_i^2 = \frac{\sum_{j=1}^{n} (x_{ij} - \bar{x}_i)^2}{n}. \tag{3}$$

Denote the mean vector $\bar{\mathbf{x}} = (\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_d)$ and the variance vector $\sigma = (\sigma_1, \sigma_2, \ldots, \sigma_d)$;
• Select $d$ pairwise relatively prime integers $q_i$ with $q_i < q_{i+1}$ and $q_i$ close to $2\tau_i$, $i = 1, 2, \ldots, d$. Further select a random secret $m < q_1 q_2 \cdots q_k$ that is large enough¹. Moreover, calculate $y_i = m \bmod q_i$ and $r_i = \bar{x}_i - q_i y_i$;

• Generate the sketch $T = \{r_i, q_i\}$, $i = 1, 2, \ldots, d$, and the authentication token $V = H(m, T, 0)$, where $H(\cdot)$ is a one-way function. Deposit $T$ and $V$ into storage. The biometric key is $K = H(m, T, 1)$.
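The registration steps of Subsection 3.1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: SHA-256 stands in for the one-way function $H(\cdot)$, the moduli are taken to be increasing primes at or just above $2\tau_i$, and the helper names (`H`, `is_prime`, `choose_moduli`, `register`) are hypothetical.

```python
import hashlib
import secrets
from math import prod


def H(m, sketch, tag):
    """One-way function H(m, T, tag). SHA-256 is an assumed instantiation."""
    return hashlib.sha256(repr((m, sketch, tag)).encode()).hexdigest()


def is_prime(n):
    """Trial division; adequate for the small moduli of this sketch."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def choose_moduli(taus):
    """Pick strictly increasing primes q_i, each at or just above 2*tau_i."""
    moduli, last = [], 1
    for tau in taus:
        q = max(round(2 * tau), last + 1)
        while not is_prime(q):
            q += 1
        moduli.append(q)
        last = q
    return moduli


def register(means, taus, k):
    """Build the sketch T = {(r_i, q_i)}, the token V and the key K."""
    qs = choose_moduli(taus)
    # Random secret with 1 <= m < q_1 * q_2 * ... * q_k.
    m = secrets.randbelow(prod(qs[:k]) - 1) + 1
    ys = [m % q for q in qs]                          # y_i = m mod q_i
    rs = [x - q * y for x, q, y in zip(means, qs, ys)]  # r_i = mean_i - q_i*y_i
    sketch = tuple(zip(rs, qs))
    V = H(m, sketch, 0)   # stored publicly with the sketch
    K = H(m, sketch, 1)   # biometric key, never stored
    # m is returned only so the example can be inspected; it must not be stored.
    return sketch, V, K, m
```

Only `sketch` and `V` would be deposited in storage; distinct primes are one convenient way to obtain pairwise relatively prime moduli, but the paper does not require primality.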
3.2. Key recovery
When a user wishes to use his biometric key, he performs the key recovery process. Concretely,

• Capture a new sample and calculate the feature vector $\tilde{\mathbf{x}} = (\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_t)$ with the same method as in the pattern identifier system;

• Extract $\{r_i, q_i\}_{i=1}^{d}$ from the sketch $T$, and then recover the codeword symbols as

$$\tilde{y}_i = \lfloor (\tilde{x}_i - r_i)/q_i \rceil, \quad i = 1, 2, \ldots, t, \tag{4}$$

where $\lfloor \cdot \rceil$ is the integer rounding function. Afterwards, recover the secret $\tilde{m}$ from $\tilde{y}_1, \tilde{y}_2, \ldots, \tilde{y}_t$ with the method of Subsection 2.2;

• If $V = H(\tilde{m}, T, 0)$, output the biometric key $K = H(\tilde{m}, T, 1)$ for use.

3.3. Key recovery probability

For the genuine user to recover his key correctly, Eq. (4) must return the registered symbols. Indeed, if $|\tilde{x}_i - \bar{x}_i| < \tau_i \approx 0.5 q_i$, then

$$\tilde{y}_i = \lfloor (\tilde{x}_i - r_i)/q_i \rceil = \lfloor (\tilde{x}_i - \bar{x}_i + q_i y_i)/q_i \rceil = y_i + \lfloor (\tilde{x}_i - \bar{x}_i)/q_i \rceil = y_i, \quad i = 1, 2, \ldots, d. \tag{5}$$
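The rounding argument of Eq. (5) can be checked numerically with a small Python sketch. It is a simplification under stated assumptions: the toy moduli, secret and feature values are invented for illustration, and plain Chinese Remainder reconstruction replaces the error-tolerant list decoding of [27], so it only covers the case where every symbol is recovered correctly ($e = 0$).

```python
from math import prod


def recover_symbols(sample, sketch):
    """Eq. (4): round((x~_i - r_i) / q_i) for each feature."""
    return [round((x - r) / q) for x, (r, q) in zip(sample, sketch)]


def crt(residues, moduli):
    """Plain Chinese Remainder reconstruction (no error correction)."""
    M = prod(moduli)
    total = 0
    for y, q in zip(residues, moduli):
        Mi = M // q
        total += y * Mi * pow(Mi, -1, q)  # modular inverse, Python 3.8+
    return total % M


# Toy registration data (all values are illustrative assumptions).
qs = [5, 7, 11, 13]                  # pairwise relatively prime moduli
m = 23                               # secret, m < q_1 * q_2 (k = 2)
means = [10.3, 25.7, 40.1, 77.9]     # registered feature means
ys = [m % q for q in qs]
sketch = [(x - q * y, q) for x, q, y in zip(means, qs, ys)]

# A fresh sample whose every component lies within 0.5*q_i of its mean.
noisy = [10.9, 24.2, 41.5, 76.0]
symbols = recover_symbols(noisy, sketch)
m_recovered = crt(symbols, qs)
print(symbols == ys, m_recovered == m)
```

Because $d > k$, the residues are redundant; that redundancy is what the list decoder of Subsection 2.2 would exploit to tolerate up to $e$ wrong symbols, which this sketch does not attempt.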
By the Central Limit Theorem, one codeword symbol is recovered with probability

$$p_0 = P(|\tilde{x}_i - \bar{x}_i| < 0.5 q_i) \approx \frac{2}{\sqrt{2\pi}\,\sigma_i} \int_0^{0.5 q_i} e^{-\frac{x^2}{2\sigma_i^2}} \, dx.$$

Therefore, assuming that the key components are independent, the biometric key is recovered with probability

$$p_1 = \sum_{j=0}^{e} \binom{d}{j} p_0^{d-j} (1 - p_0)^j.$$

If the pattern identifier of Section 2.1 is available, $q_i$ can be selected based on the threshold values in Eq. (1). Fig. 1 illustrates the recovery probability. For example, if we select $q_i = 2\tau_i = 2.5\sigma_i$, $k = 12$, $d = 24$, $e = 8$, the recovery probability is 95.1%.
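The two probabilities above can be evaluated directly. The following Python sketch assumes zero-mean Gaussian noise with standard deviation $\sigma_i$, so that $p_0$ reduces to an error function; the function names are hypothetical.

```python
from math import comb, erf, sqrt


def symbol_recovery_prob(q_over_sigma):
    """p0 = P(|N(0, sigma^2)| < 0.5*q) = erf(0.5*q / (sigma*sqrt(2)))."""
    return erf(0.5 * q_over_sigma / sqrt(2))


def key_recovery_prob(p0, d, e):
    """p1 = sum_{j=0}^{e} C(d, j) * p0^(d-j) * (1-p0)^j (at most e bad symbols)."""
    return sum(comb(d, j) * p0 ** (d - j) * (1 - p0) ** j for j in range(e + 1))


p0 = symbol_recovery_prob(2.5)         # q_i = 2*tau_i = 2.5*sigma_i
p1 = key_recovery_prob(p0, d=24, e=8)
print(f"p0 = {p0:.4f}, p1 = {p1:.4f}")
```

With these parameters $p_0$ is roughly 0.79 and $p_1$ comes out at about 0.95, in line with the 95.1% quoted in the text.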
¹ This condition can be generalized to $m < q_1^{j_1} q_2^{j_2} \cdots q_k^{j_k}$ for some integers $j_1, j_2, \ldots, j_k$ if some $q_j$ is too small.
Fig. 1. Recovery probability vs. error tolerance when d = 24, k = 12. [Plot: recovery probability (vertical axis, 0 to 1) against error tolerance $\tau_i/\sigma_i$ (horizontal axis, 0 to 2.5), with curves for e = 6, e = 8 and e = 10.]

3.4. Key forgery probability

To forge a biometric key, an adversary has to solve at least $(d - e)$ equations of the form of Eq. (4). Theoretically, it is impossible to obtain $\tilde{y}_i$ without knowing a neighbor of the point $\bar{x}_i$. Thus, if the adversary does not know the biometric data, such as the fingerprint, she has to guess the key based on her knowledge of the biometric statistics. The probability of a successful attack is therefore determined by the entropy of the biometric features if the permissible number of errors is zero (i.e., $e = 0$). In other words, the present scheme is comparable to the pattern identifier in terms of security strength.

4. THE SECOND KEY GENERATOR

The method elaborated in Section 3 assumes that the order of the feature points is stable whenever a new sample is captured. However, this assumption does not always hold. This section presents a method that is robust against permutation of the points. To achieve high security, we add chaff points to the sketch as Juels et al. [21] did, but we adopt a new method that has advantages over the state of the art. In previous work, chaff points that do not lie within the biometric source can be removed from the sketch. Moreover, if the sketch is not secure against an oracle attack, an attacker can manipulate the sketch, monitor whether authentication succeeds, and thereby deduce the biometric structure.

4.1. Key Registration

In the registration stage, the applicant's biometric data is sampled several times and a biometric feature vector is extracted from the samples. Because the sampled value of any feature is random, the mean vector and the variance vector are used to generate the secret key so as to increase robustness. Specifically,
• Capture the applicant's biometric data a total of $n$ times. A feature extraction process obtains the feature vectors from the samples;

• Calculate the mean vector $\bar{\mathbf{x}} = (\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_d)$ and the variance vector $\sigma = (\sigma_1, \sigma_2, \ldots, \sigma_d)$ as in Eq. (2) and Eq. (3);

• Randomly select a secret polynomial $m(x)$ of degree $k$. For the $i$th feature $x_i$, calculate $y_i = m(\bar{x}_i)$, and put $(\bar{x}_i, y_i, \sigma_i)$ into a sketch $T$ as a point;

• Add random points to the sketch $T$ repeatedly. Technically, select chaff points from feature points that come from biometric sources, e.g., fingerprints of random participants. If a chaff point $x_j$ lies beyond the $2\sigma_i$ boundary of every point $\bar{x}_i \in T$, it is put into the sketch $T$ along with a random $y_j$ and $\sigma_j$; otherwise, the point is discarded. Store $T$ and $H(m, T, 0)$ into the storage. The secret key is $K = H(m, T, 1)$, where $m$ is shorthand for the coefficients of the polynomial $m(\cdot)$.
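The registration steps above can be sketched in Python as follows. Several choices here are assumptions made only for illustration and are not specified in the paper: feature means are taken to be integers already quantized into the prime field GF(65537), SHA-256 stands in for $H(\cdot)$, candidate chaff values arrive in a caller-supplied `chaff_pool` (standing in for features of random participants), and the final sketch is sorted so that genuine and chaff points are not distinguishable by position.

```python
import hashlib
import random

P = 65537  # assumed prime field size, for illustration only


def H(coeffs, sketch, tag):
    """One-way function H(m, T, tag). SHA-256 is an assumed instantiation."""
    return hashlib.sha256(repr((coeffs, sketch, tag)).encode()).hexdigest()


def poly_eval(coeffs, x):
    """Evaluate m(x) = c0 + c1*x + ... + ck*x^k over GF(P) by Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc


def register(means, sigmas, chaff_pool, k, rng):
    """Return (sketch T, token V, key K, secret coefficients)."""
    # Random secret polynomial of degree exactly k (nonzero leading coefficient).
    coeffs = tuple(rng.randrange(P) for _ in range(k)) + (rng.randrange(1, P),)
    # Genuine points (mean_i, m(mean_i), sigma_i).
    points = [(x, poly_eval(coeffs, x), s) for x, s in zip(means, sigmas)]
    for c in chaff_pool:
        # Keep a chaff point only if it lies beyond 2*sigma of every point
        # already in T; otherwise discard it.
        if all(abs(c - x) > 2 * s for x, _, s in points):
            points.append((c, rng.randrange(P), rng.choice(sigmas)))
    sketch = tuple(sorted(points))
    V = H(coeffs, sketch, 0)   # stored with the sketch
    K = H(coeffs, sketch, 1)   # secret key, never stored
    # coeffs is returned only so the example can be inspected.
    return sketch, V, K, coeffs
```

A possible call, with made-up feature values: `register([100, 220, 345, 470], [3, 4, 2, 5], [101, 160, 225, 300], k=2, rng=random.Random())`. In a real deployment the randomness should come from a cryptographically secure source rather than `random.Random`.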
4.2. Key recovery

After an individual has registered his biometric key, he can use it for his purposes. For example, he may decrypt a protected document with his biometric key. To carry out the decryption task, the decryption key is recovered as follows.

• Capture a new sample of the user, and extract the feature vector $\tilde{\mathbf{x}} = (\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_t)$;

• If there is an $\bar{x}_i$ in the template $T$ such that $|\tilde{x}_i - \bar{x}_i| < \tau_i$, extract the point $(\bar{x}_i, y_i, \sigma_i)$ from $T$; otherwise, discard $\tilde{x}_i$. The extracted points form a new set $T'$;

• If more than $d$ points are extracted from the new sample, select from $T'$ those points whose $\sigma_i/\tau_i$ are smallest. Reconstruct the polynomial $m(x)$ from the elements of $T'$.

The key recovery step includes an additional confirmation procedure, which is necessary to ensure that the claimant is not a forger. Only if the key is confirmed can the user apply it for decryption. Specifically, for each output $\tilde{m}(\cdot)$ of the (list) decoding algorithm in Subsection 2.3, calculate a MAC $\tilde{V} = H(\tilde{m}, T, 0)$; if $\tilde{V} = V$ holds, the recovered key is $K = H(\tilde{m}, T, 1)$.

4.3. Adaptive upgrade

Usually, a feature extraction procedure will not produce an identical feature vector each time, owing to distortion of the sample data. This distortion may result from a change in the individual's habit. To upgrade performance automatically when the registrant's biometric features change gradually, the public information in the database should be refreshed to keep up with the change. To prevent illegal upgrades, only a successful claimant can initiate the upgrade step. That is, if the user reconstructs the biometric key successfully with the method of Subsection 4.2 and the feature vector has changed to $\tilde{\mathbf{x}}$, an upgrade process is performed as follows:

• For each feature component $i \in [1, d]$ that exists in $T'$, the new variance is calculated as $\hat{\sigma}_i^2 \leftarrow \beta \sigma_i^2 + (1 - \beta)(\tilde{x}_i - \bar{x}_i)^2$, where $0.5 < \beta < 1$. Meanwhile, the mean is updated as $\hat{x}_i \leftarrow \alpha \bar{x}_i + (1 - \alpha)\tilde{x}_i$, where $0.5 < \alpha < 1$ is a predefined parameter, and $\hat{y}_i \leftarrow m(\hat{x}_i)$ is recalculated. Replace the item $\{\bar{x}_i, y_i, \sigma_i\}$ in $T$ with the new one $\{\hat{x}_i, \hat{y}_i, \hat{\sigma}_i\}$.
• The remaining steps are the same as those in Subsection 4.1.

As a result, although the old sketch $T$ is replaced with the new one, the old biometric key is preserved so as to enable backward compatibility.

5. CONCLUSIONS

The present methods generate biometric keys from biometric information while leaking little personal information. They can be applied in many fields, such as access control, authentication, and secret key management. Usually, the entropy of a biometric key is too small for it to serve as a cryptographic key on its own unless other means, such as multiple biometric sources, are employed.

6. REFERENCES

[1] William F. Lane, “Self-authenticating identification card with fingerprint identification,” US Pat. 5,623,552, 1997-04-22.
[2] Tsuhanus Chen, “Method and Apparatus employing audio and video data from an individual for authentication purposes,” US Pat. 5,761,329, 1998-06-02.
[3] Gordon H. Matthews, “Electronic audio communications system with voice authentication features,” US Pat. 4,761,807, 1988-08-02.
[4] George J. Tomko, “Method and Apparatus for Securely Handling a Personal Identification Number or Cryptographic Key Using Biometric Techniques,” US Pat. 5,712,912, 1998-01-27.
[5] David W. Osten, “Biometric, Personal, Authentication System,” EP 752,143B1, 1997-01-08.
[6] George J. Tomko, “Fingerprint Controlled Public Key Cryptographic System,” US Pat. 5,832,091, 1998-11-03.
[7] Livia C. F. Araujo, Luiz H. R. Sucupira Jr., Miguel G. Lizarraga, Lee L. Ling, and João B. T. Yabu-Uti, “User Authentication Through Typing Biometric Features,” IEEE Trans. Signal Proc., 53(2):851-855, 2005.
[8] F. Monrose, M.K. Reiter, and Susanne Wetzel, “Password Hardening Based on Keystroke Dynamics,” ACM CCS, pp. 73-82, 1999.
[9] A. Juels, M. Wattenberg, “A Fuzzy Commitment Scheme,” ACM Conf. Computer and Communications Security (CCS), pp. 28-36, 1999.
[10] Feng Hao, Ross Anderson, John Daugman, “Combining cryptography with biometric effectively,” http://www.cl.cam.ac.uk/users/jgd1000/biocrypto.pdf, July 2005.
[11] Y. Dodis, L. Reyzin, and A. Smith, “Fuzzy extractors: How to generate strong keys from biometric and other noisy data,” Eurocrypt’04, LNCS 3027, pp. 523-540, 2004.
[12] Xavier Boyen, “Reusable cryptographic fuzzy extractors,” ACM CCS, pp. 82-91, 2004.
[13] Xu-Hong Xiao, Graham Leedham, “Signature Verification by Neural Networks with Selective Attention,” Appl. Intell., 11(2):213-223, 1999.
[14] Claus Bahlmann and Hans Burkhardt, “The writer independent online handwriting recognition system frog on hand and cluster generative statistical dynamic time warping,” PAMI, 26(3):299-310, 2004.
[15] Andrew Senior, “A Combination Fingerprint Classifier,” PAMI, 23(10):1165-1174, 2001.
[16] M. K. Mihcak and R. Venkatesan, “New Iterative Geometric Methods for Robust Perceptual Image Hashing,” ACM CCS-DRM, 2001.
[17] J.G. Daugman, “High confidence visual recognition of persons by a test of statistical independence,” PAMI, 15(11):1148-1161, 1993.
[18] A.K. Jain, A. Ross, S. Prabhakar, “An introduction to biometric recognition,” IEEE Transactions on Circuits and Systems for Video Technology, 14(1):4-20, 2004.
[19] Yao-Jen Chang, Wende Zhang, Tsuhan Chen, “Biometric-based Cryptographic Key Generation,” ICME 2004.
[20] Ee-Chien Chang and Sujoy Roy, “Robust Extraction of Secret Bits from Minutiae,” International Conference on Biometric, pp. 750-759, 2007.
[21] Ari Juels and Madhu Sudan, “A Fuzzy Vault Scheme,” Designs, Codes and Cryptography, 38, 237-257, 2006.
[22] T.C. Clancy, N. Kiyavash, and D.J. Lin, “Secure smartcard-based fingerprint authentication,” ACM Workshop on Biometric Methods and Applications, pp. 45-52, 2003.
[23] Ee-Chien Chang, Ren Shen, Francis Weijian Teo, “Finding the original point set hidden among chaff,” ASIACCS, pp. 182-188, 2006.
[24] Ee-Chien Chang, Qiming Li, “Hiding Secret Points Amidst Chaff,” Eurocrypt, LNCS 4004, pp. 59-72, 2006.
[25] Yi C. Feng, Pong C. Yuen, and Anil K. Jain, “A Hybrid Approach for Generating Secure and Discriminating Face Template,” IEEE Transactions on Information Forensics and Security, 5(1):103-117, 2010.
[26] Francis Minhthang Bui, Karl Martin, Haiping Lu, Konstantinos N. Plataniotis, and Dimitrios Hatzinakos, “Fuzzy Key Binding Strategies Based on Quantization Index Modulation (QIM) for Biometric Encryption (BE) Applications,” IEEE Transactions on Information Forensics and Security, 5(1):118-132, 2010.
[27] O. Goldreich, D. Ron, M. Sudan, “Chinese remaindering with errors,” IEEE Transactions on Information Theory, 46(4):1330-1338, 2000.
[28] V. Guruswami, M. Sudan, “Improved decoding of Reed-Solomon and algebraic-geometry codes,” IEEE Transactions on Information Theory, 45(6):1757-1767, 1999.