Applying Speaker Verification to Certificate Revocation
Javier R. Saeta1, Javier Hernando2, Oscar Manso3, Manel Medina3
1 Biometric Technologies, S.L. Barcelona, Spain
2 TALP Research Center. Universitat Politècnica de Catalunya, Spain
3 SeMarket, S.A. Barcelona, Spain
[email protected]
Abstract
The increasing popularity and importance of electronic commerce is evident today. However, global electronic commerce will not develop its full potential unless trust is established. Digital certificates and electronic signatures help to increase confidence and security by providing authenticity. However, authenticity on its own is not enough to provide trust: a credible service needs to provide authenticity and validity at the same time. In this respect, traditional Public Key Infrastructure (PKI) services have a weak point: the inherent delay in cancelling a certificate once it has been lost or stolen. The CertiVeR Project intends to solve this problem and to strengthen security in commercial transactions. CertiVeR has adopted speaker verification (SV) to validate users’ identities. The performance of the SV system has been evaluated with a Spanish database that gathers fixed-line and mobile telephone sessions.
1. Introduction
In recent years, the Internet has become an important vehicle for commercial transactions. However, despite its current magnitude and scope, users are still reluctant to use it for most transactions. There is therefore still great potential for e-commerce to grow, but for this to happen users need to feel much safer when carrying out a commercial transaction over the Internet. To increase security, it is necessary to validate the identity of the parties involved in a transaction. A digital certificate identifies the user who signs a transaction. Digital certificates can be used to encrypt data, to produce an e-signature, or both. Electronic signatures provide authenticity, i.e. proof of ownership. However, authenticity on its own is not enough to provide trust. A credible service needs to provide authenticity and validity at the same time. By validity we mean proof that ownership of a certificate is valid at a specific time.
This means that if you are using digital certificates to sign sensitive information or high-value transactions, you need to be able to verify that the signature was valid at the time it was made, i.e. that the certificate used to sign had not been cancelled. The validation of digital certificates in real time is a task that can be accomplished by CertiVeR [1, 2]. CertiVeR is a consortium of European companies funded by the TEN-Telecom project under the auspices of the European Commission. The aim of CertiVeR is to offer a certificate revocation service, with the corresponding Online Certificate Status Protocol (OCSP) publication. OCSP is designed to validate the status of a certificate in real time. CertiVeR may also be in charge of managing the revocation, suspension or rehabilitation of certificates. The revocation or suspension of a certificate is necessary when it is lost or stolen. In such a case, one of the fastest and most widely available mechanisms for cancelling a certificate is a telephone call. However, this mechanism needs to be secured so that a speaker can only cancel her/his own certificates. To guarantee the speaker’s identity, CertiVeR uses speaker verification technologies, which authenticate the user who is making the revocation request. The maturity of speaker recognition technologies, their low intrusiveness and the possibility of remote validation in real time have led CertiVeR to adopt speaker verification for its revocation module. This paper presents the main features of the voice recognition system adopted within the CertiVeR project. The empirical results presented here come from experiments carried out on a multi-session telephone database in Spanish. We introduce CertiVeR services in Section 2, focusing on motivation and benefits. Section 3 explains how speaker verification is used in combination with certificates. Section 4 describes the experiments and analyses the results, and conclusions are presented in Section 5.
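To make the validity requirement described in this introduction concrete, the following minimal Python sketch checks whether a certificate was usable at the moment a signature was produced. The `CertificateRecord` type, its field names and the `was_valid_at` function are hypothetical names chosen for illustration; they are not part of CertiVeR or of any PKI library, and a real deployment would obtain the revocation status from an OCSP responder rather than from a local record.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class CertificateRecord:
    """Hypothetical, simplified view of a certificate's status history."""
    serial: str
    not_before: datetime                   # start of the validity period
    not_after: datetime                    # expiration date
    revoked_at: Optional[datetime] = None  # None if never revoked


def was_valid_at(cert: CertificateRecord, signing_time: datetime) -> bool:
    """Return True if the certificate was usable at signing_time."""
    inside_period = cert.not_before <= signing_time <= cert.not_after
    not_yet_revoked = cert.revoked_at is None or signing_time < cert.revoked_at
    return inside_period and not_yet_revoked


# Illustrative data only: a certificate revoked on 1 June 2004 at noon.
cert = CertificateRecord(
    serial="01",
    not_before=datetime(2004, 1, 1),
    not_after=datetime(2005, 1, 1),
    revoked_at=datetime(2004, 6, 1, 12, 0),
)
signed_before = was_valid_at(cert, datetime(2004, 5, 31))
signed_after = was_valid_at(cert, datetime(2004, 6, 2))
print(signed_before, signed_after)
```

A signature made before the revocation time is still considered valid, while one made afterwards is not; the usefulness of such a check depends on how quickly the revocation time becomes known to the verifier.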
2. CertiVeR services
2.1. PKI description
As pointed out above, users need a higher degree of security in their commercial transactions. To provide assurance about the source and integrity of a communication, it is convenient to deploy a robust PKI, which leads to the use of digital signatures. The electronic signature substitutes for the handwritten signature and allows the recipient of a digitally signed communication to determine whether it has changed after it was signed. The system runs on a public-private key pair previously created by the sender. At this point, we encounter the problem of ensuring the identity of the person who holds a key pair. A certification authority (CA) is a trusted third person or entity that certifies that the public key of a public-private key pair used to create digital signatures belongs to the subscriber. Once the identity of the subscriber is verified, the CA issues a certificate. Then, if the subscriber finds that the certificate is accurate, the certificate may be published in a repository, an electronic database of certificates accessible to anyone. If a private key is compromised or lost, the corresponding certificate has to be suspended or revoked. In the traditional model, the revoked certificate is then listed in the certificate revocation list (CRL), a file published by the CA containing a list of certificates that have been revoked before their expiration date.

Figure 1: CertiVeR’s architecture. [Diagram not reproduced. Labels recovered from the diagram: OCSP Responder, Certification Authority, OCSP Validation Request, Certificates Database, Speaker Recognition Revocation Module, Revocation Request.]

2.2. Motivation and benefits
CertiVeR (see its architecture in Figure 1) has its origin in the fact that the deployment of electronic signatures in e-commerce, and in any transaction with significant value attached, requires the verification of the signature policy, which includes the validation of all the certificates in the signer’s certification path. In most cases, this verification may be done on the basis of CRLs, which are published with a frequency ranging from once an hour to once a day. In some applications, such as financial ones, the latency between the time a certificate is revoked and the time the new CRL is released may make this mechanism unsuitable for checking the validity of a certificate. In applications where the time constraint is very important, such as the purchase of stocks or bidding in an auction, it is necessary to know the status of a certificate in real time using OCSP, which allows the status of a particular certificate to be requested without having to wait for the issuing CA to publish a new version of the CRL. CertiVeR also implies a faster validation of the identity of the user/customer, including some personal profile, with security and without loss of information privacy. The sharp rise in the use of digital signatures, and their legal value, give revocation and its associated services a central role. All PKI users must be able to revoke any compromised certificate instantaneously, and also to verify a certificate’s validity instantaneously. Services of this kind are well suited to any CA. By subcontracting OCSP-related services, a CA can give
to its clients a service of instant certificate verification and revocation. This service closes the gap between the revocation request time and the revocation publishing time, making it virtually non-existent. This is a very important feature, all the more so when the digital signature is used in B2B or financial markets. Through the use of the services offered, the following benefits can be expected:
• A substantial reduction in the delay in delivering the revocation information to end users.
• Greater security in the signature verification.
• Reduction of the cost for the creation of qualified CAs.

3. Speaker verification
Speaker verification has been adopted by CertiVeR to deal with the lack of security when accessing revocation services over a phone line. A user joins the system through the Internet by providing some personal details. At the end of this process, the user is given a password and a phone number with which to enroll, which enables certificate revocation by voice. The password is only used for the training period. Once the speaker model has been estimated, the user is able to verify her/his identity over the telephone line. In the test phase, if the verification is successful, the speaker can cancel the certificates. From the moment the user changes the status of a certificate, the CertiVeR OCSP Responder provides its current status through the Internet. The validation process consists of uttering a personal identification number (the login), which is different for every user and normally well known to the speaker, and repeating a 5-digit number, randomly generated each time, which we call the password. The inclusion of random numbers protects against replayed recordings. Speech and speaker verification are applied to both the login and the password. A demo of the service is available at the project website [1].
The a priori speaker-dependent (SD) threshold is estimated following two different methods. The first one uses only data from clients, together with score pruning [3] to remove non-representative log-likelihood ratio (LLR) scores and better estimate the threshold. In this method, the client mean estimate is adjusted by means of the client standard deviation estimate and $\alpha$, as follows:

$$\Theta_x = \hat{M}_X - \alpha\,\hat{\sigma}_X \qquad (1)$$

where $\hat{M}_X$ is the client scores mean, $\hat{\sigma}_X$ is the standard deviation of the client scores and $\alpha$ is a constant which has to be set experimentally on a development population. The second method proposed here to estimate the threshold uses data from clients and impostors [4,5] according to:

$$\Theta_x = \alpha\,\hat{M}_X + (1 - \alpha)\,\hat{M}_{\bar{X}} \qquad (2)$$

where $\hat{M}_X$ is the client scores mean, $\hat{M}_{\bar{X}}$ is the impostor scores mean and $\alpha$ is a constant, different for every equation and empirically determined.

4. Experiments
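The two a priori threshold rules referred to as (1) and (2) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the example scores and the values of α are made up, the score pruning step of [3] is omitted, the sample standard deviation is used, and the pairing of α with the client mean in the second rule follows the order in which the two means are introduced in the text.

```python
from statistics import mean, stdev


def threshold_sd1(client_scores, alpha):
    """Rule (1): client mean lowered by alpha times the client standard deviation.

    Assumption: the sample (n - 1) standard deviation is used.
    """
    return mean(client_scores) - alpha * stdev(client_scores)


def threshold_sd2(client_scores, impostor_scores, alpha):
    """Rule (2): weighted combination of the client and impostor score means.

    Assumption: alpha weights the client mean and (1 - alpha) the impostor mean.
    """
    return alpha * mean(client_scores) + (1 - alpha) * mean(impostor_scores)


# Made-up LLR scores, for illustration only.
client_scores = [2.0, 3.0, 4.0]
impostor_scores = [-3.0, -1.0, -2.0]

theta1 = threshold_sd1(client_scores, alpha=0.5)
theta2 = threshold_sd2(client_scores, impostor_scores, alpha=0.5)
print(theta1, theta2)
```

The first rule needs only the client's own enrollment scores, which is convenient when impostor data are hard to collect; the second places the threshold between the two score distributions and therefore requires impostor scores as well.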
4.1. Database
A Spanish database presented in [3] has been used to test the performance of the system, because the number of real tests obtained so far was not high enough to yield valid and statistically reliable data. The database belongs to the company Biometric Technologies, S.L. It has 184 speakers and has been specially designed for speaker recognition.
4.2. Experimental setup
Utterances are processed in 25 ms frames, Hamming windowed and pre-emphasized. The feature set is formed by 12th-order Mel-Frequency Cepstral Coefficients (MFCC) and the normalized log energy. Delta and delta-delta parameters are computed to form a 39-dimensional vector for each frame (13 static parameters plus their 13 delta and 13 delta-delta counterparts). Cepstral Mean Subtraction (CMS) is also applied. Left-to-right HMMs with 2 states per phoneme and 1 mixture per state are obtained for each digit. Client and world models have the same topology. Speaker verification is performed in combination with a speech recognizer for connected digits. During enrollment, utterances labelled as "no voice" are discarded. This selection ensures a minimum quality for the threshold setting. It is important to note that fixed-line and mobile telephone sessions are used interchangeably for training and testing, which increases the error rate. Two kinds of tests have been carried out with the database. The first one uses 8-digit utterances and the second one 4-digit utterances. The speech recognizer discards digits with a low probability and selects utterances which have exactly 8 digits or 4 digits respectively. Clients in our experiments are speakers with a minimum of 5 recorded sessions. This yields 100 clients, but two of them did not pass the speech recognizer test, which leaves 98 clients. We use 4 sessions of 8- and 4-digit utterances for enrollment and the remaining sessions to perform client tests. Speakers with more than one session and fewer than 5 sessions are
impostors. Both 8-digit and 4-digit utterances are employed for enrollment. We train each model with between 15 and 48 utterances.
4.3. Verification results
Experiments have been carried out with a database that includes fixed-line and mobile calls. The speaker decides whether to call from home, from a mobile phone, etc. The origin of each call (mobile, fixed-line, ...) is known and could be used in later analysis. Error rates are normally higher for mobile sessions. With this database we are closer to a real application, in which users expect to be verified every time regardless of the handset. The database does not contain 5-digit utterances, but 4-digit ones can be used instead; naturally, the error rates increase with 4-digit utterances. Results from our experiments, in terms of false acceptance (FA) and false rejection (FR) rates, are reported in the following table:

Threshold method   Test utterances   FA (%)   FR (%)
SD1                8-digit           3.49     3.55
SD2                8-digit           2.10     2.26
SD1                4-digit           6.73     6.29
SD2                4-digit           5.71     6.15

Table 1: Error rates with speaker-dependent thresholds

As we can see from Table 1, the speaker-dependent threshold method SD2 described in (2) performs better than the method SD1 presented in (1) for both 8- and 4-digit utterances. However, in certain cases, when it is difficult to obtain impostor data [6], the method in (1) can be more suitable. The error rates are significantly lower when we use 8-digit test utterances. In any case, a combination of both, as is the case for CertiVeR, would give us an improvement in global error rates. In our case, FR errors have an even greater impact than FA errors: the erroneous revocation of a certificate (the result of a false acceptance) does not have severe consequences, whereas a false rejection prevents a legitimate user from cancelling a compromised certificate.
4.4. User satisfaction
CertiVeR has just finished a survey about its validation services. The survey was distributed among a large number of companies and institutions, mainly in Europe but also elsewhere in the world, mostly related to the PKI environment. The tool provided in the CertiVeR demo was rated as very functional, user-friendly, very intuitive and easy to install. The response time was rated as optimal. The revocation service was considered slightly less functional than the validation one, but it was well accepted (a score of at least 4 on a scale from 0 to 5) by 80% of the users.
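Returning to the error rates of Table 1: FA and FR percentages follow the usual definitions, namely the share of impostor trials that are accepted and the share of client trials that are rejected. A minimal Python sketch of that computation is given below; the function name, the scores and the threshold are hypothetical, and the convention that a score equal to the threshold is accepted is an assumption, not something stated in the paper.

```python
def error_rates(client_scores, impostor_scores, threshold):
    """Return (FA %, FR %) for one speaker at the given threshold.

    Assumption: a trial is accepted when its score is >= threshold.
    """
    false_accepts = sum(1 for s in impostor_scores if s >= threshold)
    false_rejects = sum(1 for s in client_scores if s < threshold)
    fa = 100.0 * false_accepts / len(impostor_scores)
    fr = 100.0 * false_rejects / len(client_scores)
    return fa, fr


# Made-up scores, for illustration only.
client_scores = [2.1, 0.3, 1.7, 2.9]
impostor_scores = [-1.5, 0.8, -0.2, -2.4, -0.9]

fa, fr = error_rates(client_scores, impostor_scores, threshold=0.5)
print(f"FA = {fa:.1f}%, FR = {fr:.1f}%")
```

Raising the threshold lowers FA at the expense of FR, and vice versa, which is why the choice of α in the threshold rules matters for an application where false rejections are the costlier error.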
5. Conclusions
The growing importance of e-commerce demands more security if its advantages are to be fully realized. Users need to be confident in their commercial transactions. One of the greatest problems with digital certificates is the delay between the moment a certificate is revoked and the moment the list of revoked certificates is brought up to date. To solve this problem, CertiVeR offers a certificate revocation service in real time. Moreover, CertiVeR reduces the cost for certification authorities and increases security by using speaker verification to validate users’ identities. From the moment the user/speaker is registered in the system through the Internet, (s)he is able to enroll with a phone call. Once the voice profile has been loaded for the speaker, it is possible to access the revocation services. The performance of the speaker verification module has been evaluated in tests on a Spanish database which includes fixed-line and mobile phone sessions for every speaker. The composition of the database with respect to the handset is similar to that of the real system.
6. References
[1] The CertiVeR Project, http://www.certiver.com
[2] Medina, M., Manso, O., and López-Baena, A.J., "Certificate Status Publication: Economical factors", Ultimate Leading Edge International IT Conferences & Expos, Toronto, October 13-17, 2003.
[3] Saeta, J.R. and Hernando, J., "Automatic Estimation of A Priori Speaker Dependent Thresholds in Speaker Verification", Proc. 4th International Conference on Audio- and Video-based Biometric Person Authentication (AVBPA), Springer-Verlag, pp. 70-77, 2003.
[4] Pierrot, J.B., Lindberg, J., Koolwaaij, J., Hutter, H.P., Genoud, D., Blomberg, M., and Bimbot, F., "A Comparison of A Priori Threshold Setting Procedures for Speaker Verification in the CAVE Project", Proc. ICASSP’98, pp. 125-128.
[5] Lindberg, J., Koolwaaij, J., Hutter, H.P., Genoud, D., Pierrot, J.B., Blomberg, M., and Bimbot, F., "Techniques for A Priori Decision Threshold Estimation in Speaker Verification", Proc. RLA2C, Avignon 1998, pp. 89-92.
[6] Surendran, A.C. and Lee, C.H., "A Priori Threshold Selection for Fixed Vocabulary Speaker Verification Systems", Proc. ICSLP’00, vol. II, pp. 246-249.