Voice Interactive Personalized Security Protocol: Definition and Security Analysis

Dimitris Zisiadis, Spyros Kopsidas, Leandros Tassiulas
Computer Engineering and Telecommunications Department
University of Thessaly, 38221 Volos, Greece
{dimitris, spyros, leandros}@uth.gr

Abstract

Security is a major issue in VoIP communications over the Internet, especially in mobile environments. Voice Interactive Personalized Security (VIPSec) is a method for strengthening the security of Internet communications with biometric-based authentication, exploiting the nature of the application. During the establishment of the communication session, the end-peers exchange a challenge/signature token, the integrity of which is confirmed vocally when the voice communication starts. In the present work we define the protocol message set in detail and we present an extensive security analysis of the protocol operation. The method is appropriate for ensuring the security of a VoIP or collaboration application, guaranteeing the integrity of the session key exchanged at the beginning of the conversation. It requires minimal resources from the user handsets and no additional support from the network, so it is inherently scalable and readily deployable, as it only needs an appropriately enhanced secure handset.

1. Introduction

A large share of user communications is carried over open channels via public networks. The Internet is becoming one of the main transport networks both for everyday user communications and for more sophisticated business communications, such as teleconferencing and stock-market services. Trustworthy communication requires security properties such as confidentiality, authenticity, and integrity. Most of the widely used Internet applications for personal communications employ no security mechanism at all, while others implement mechanisms that have been shown to suffer from serious weaknesses. Security weaknesses of the Internet Protocol (IP) have been thoroughly studied [1][2][3]. Furthermore, it is fairly simple to mount a Man-in-the-Middle (MITM) attack on the Internet: despite the plethora of security protocols involved in IP communications, no security mechanism overcomes the problem. A severe case of a powerful MITM is the network operator, whether through unexpected and illegal actions or through compromise of its resources. Numerous incidents of this kind have been reported on the Internet and in the press over the past years.

In the present work we define the basic protocol elements (set of messages, message fields, data payload) and we perform a detailed security analysis of the Voice Interactive Personalized Security protocol. The protocol is suitable for public networks where end-to-end security is required by the users and there is no trust in the CA or in the network. The users are free to roam and use the network from different places using various devices (PDAs, IP phones, Internet PCs, etc.). We demonstrate that the proposed protocol is resistant to MITM attacks, and we examine its robustness further through an extensive security analysis.

This paper is organized as follows: the assumptions for the present work are given in Section 2; an overview of related work is provided in Section 3; in Section 4 we present the Voice Interactive Personalized Security (VIPSec) protocol for securing end-to-end voice communications and we define the protocol elements in detail; in Section 5 we present the security analysis of the protocol, arguing its resistance to attacks by examining its security fundamentals and through an exhaustive step-by-step analysis; a brief summary and concluding remarks are given in Section 6.

2. Assumptions

For this paper we make the following assumptions:

1. Scope: VoIP applications.
2. Security: the end-to-end security objective needs to be satisfied.
3. Public use: communications in public networks ("open" networks with a significant number of users).
4. User freedom: users are independent of location and device.
5. User identification: our protocol does not identify the end points. The communicating parties can physically identify each other, i.e. there exists a relationship between them.
6. Trust: there is no CA, or the CA is not trusted; the network is not trusted either.
7. Session lifetime: the objects, public keys and secrets span only a session's lifetime. Different sessions use distinct data and keys. As an option, the temporary storage of a symmetric key for a short period following a successful key exchange procedure is supported.
8. Devices: support for voice and/or video communications.

3. Related work

In practice a MITM attack can easily be mounted on the Internet. Through impersonation, the MITM intercepts the authentication and ciphering data exchanged over the compromised communications channel, thereby breaching the authentication and integrity of the session. While there are attempts to fortify the network against these attacks, to this day no public network or protocol manages to overcome this problem. PKI is a solution, but the CA administrator can compromise every communication that builds upon its trust. Many well-known software packages, such as Pretty Good Privacy [4], follow this approach. Pre-shared key systems are a solution in some cases [5], e.g. when the communicating parties belong to the same closed user group. Hybrid cryptographic systems [6], even the most advanced of them, are not in a position to eliminate this threat, since the attacker is still able to eavesdrop on and/or modify the exchanged data traffic.

The only way to have truly secure communications channels is by implementing end-to-end security down to the user level. By doing so, there is no way that an attacker using current technological means can breach channel security. End-to-end security can be provided by any kind of cryptographic scheme: symmetric, asymmetric or hybrid. There are still some critical issues, though, in implementing effective end-to-end security, such as key distribution, management and maintenance, as well as data authentication and identity management. The main disadvantage of the current end-to-end security mechanisms is that keys have to be exchanged out of band prior to the call. Out-of-band key exchange is difficult to implement in practice, and key management is extremely difficult even for closed user groups under common control (e.g. employees of a company).

Inbound (in-band) key exchange procedures have been proposed in the past, such as Peter Gutmann's Internet draft [7]. ZRTP [8] proposes the use of public key exchange in order to establish a shared secret with key continuity, where the two parties verify a short hash string that is derived from the public key. Zfone, which is based on ZRTP, is described in [9]. Our proposal is an alternative to ZRTP: it is also based on a user confirmation procedure, but it uses a completely different key exchange mechanism. Another theoretical approach, similar to Zfone but without key continuity, is the one proposed by Cagalj et al. [10].

4. VIPSec definition

Our proposal is to have a simple and easy to use "security handshake" procedure prior to the voice session, which effectively "clears" the communication channel by accomplishing user-controlled end-to-end encryption. A symmetric session key is exchanged inbound using any selected public-key cryptography scheme. The public keys are also exchanged inline. There is no need for a CA or for permanent keys.

4.1. VIPSec operation

When Alice and Bob need to communicate, the following procedure takes place before any data exchange is performed. A pair of asymmetric cryptography keys (Private/Common) is automatically generated per user per session. The users exchange challenge objects, each encrypted with the sender's session Private Key (Pi), resulting in the sender's User Session Signature (USSi). The object itself can be anything the user selects: a random number, a string, an audio or video file, etc. Next they exchange their session Public (Common) Keys (Ci), and the calling party also sends the Session Symmetric Key (SSK). At the end of this handshake procedure the communicating parties perform a user-based biometric test on the USSs, using either voice (1st biometric level) or video (2nd biometric level) confirmation of each other's identity and of the nature of the exchanged data. If this procedure produces a positive acknowledgement by both users, then their communication is secured and everything sent over the network is encrypted with the Symmetric Key (SSK) that was securely established during the first phase above. If not, the parties know that the communication channel and any data exchanged during the procedure are compromised.

The security of the method relies on the inability to mechanically impersonate an individual, due to the user-specific biometric attributes of the human voice and video as well as the user-customized profile of the exchanged information. The method is appropriate for ensuring the security of an encrypted voice or data exchange application, guaranteeing the integrity of the session key exchanged at the beginning of the communication. A more detailed presentation of VIPSec operation can be found in [11].

In Table 1 below we present the message sequence of the protocol, which is used for defining the protocol specifications in the next subsection and for performing the security analysis in Section 5.

Table 1. VIPSec messages

Seq#  Message                         Control Data Payload
1     HELLO (USS2)                    User2 Session Signature
2     HELLO (USS1)                    User1 Session Signature
3     SENDKEY (C2)                    User2 Common Key
4     SENDKEY (C1)                    User1 Common Key
5     SENDSYMMETRIC (Encrypted_SSK)   Session Symmetric Key as selected by User1, encrypted with User2 Common Key
6     LEVELS (Voice, Video)           Flag Voice and/or Video capabilities for User2
7     VERIFY (Voice/Video)            Selected verification level (Voice/Video)
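The message order of Table 1 can be sketched as a simple transcript check. The following Python fragment is an illustration only: the names EXPECTED_ORDER and first_violation are hypothetical, and the protocol definition does not prescribe how an implementation should react to an out-of-order message.

```python
# Message types in the order given by Table 1 (sequence numbers 1 to 7).
EXPECTED_ORDER = [
    "HELLO",          # 1: carries USS2
    "HELLO",          # 2: carries USS1
    "SENDKEY",        # 3: carries C2
    "SENDKEY",        # 4: carries C1
    "SENDSYMMETRIC",  # 5: carries Encrypted_SSK
    "LEVELS",         # 6: voice/video capability flags
    "VERIFY",         # 7: selected verification level
]


def first_violation(transcript):
    """Return the 1-based position of the first missing or out-of-order
    message in `transcript` (a list of message-type names), or None if the
    transcript matches Table 1 exactly."""
    for position, expected in enumerate(EXPECTED_ORDER, start=1):
        if position > len(transcript) or transcript[position - 1] != expected:
            return position
    if len(transcript) > len(EXPECTED_ORDER):
        return len(EXPECTED_ORDER) + 1  # unexpected trailing message
    return None
```

A receiver could, for example, abort the handshake as soon as first_violation returns a position; that reaction is an assumption of this sketch.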

4.2. VIPSec specifications

VIPSec's handshaking procedure is implemented over TCP connections, while the media stream is carried over encrypted RTP packets. The format of the VIPSec protocol messages is shown in Fig. 1 below.

Figure 1. VIPSec protocol: Message Format.

The first part of the message is the TYPE field, which holds the type of the message exchanged. This field contains a number between 1 and 5; the corresponding message types are shown in Table 2. The second part of the message is the LENGTH field, which holds the length of the CONTROL DATA field. The possible LENGTH values for the various message types are shown in Table 3.

Table 2. TYPE field

Value  Message Type
1      HELLO
2      SENDKEY
3      SENDSYMMETRIC
4      LEVELS
5      VERIFY

The third part of the message is the CONTROL DATA field, which holds the message control data payload.

Table 3. LENGTH field

Type  Length
1     dependent on selected key lengths
2     dependent on selected key lengths
3     dependent on selected key lengths
4     2
5     2
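A minimal Python sketch of the TYPE / LENGTH / CONTROL DATA framing is given below. The field widths are not stated in the text, so the sketch assumes a 1-byte TYPE and a 2-byte big-endian LENGTH counted in bytes; these widths, the unit of LENGTH, and the helper names pack_message and unpack_message are assumptions made for illustration.

```python
import struct

# ASSUMPTION: 1-byte TYPE followed by a 2-byte big-endian LENGTH (in bytes).
HEADER = struct.Struct("!BH")

# TYPE values as listed in Table 2.
MESSAGE_TYPES = {1: "HELLO", 2: "SENDKEY", 3: "SENDSYMMETRIC", 4: "LEVELS", 5: "VERIFY"}


def pack_message(msg_type, control_data):
    """Build one frame: TYPE, LENGTH of CONTROL DATA, then CONTROL DATA."""
    if msg_type not in MESSAGE_TYPES:
        raise ValueError("TYPE must be between 1 and 5")
    if len(control_data) > 0xFFFF:
        raise ValueError("CONTROL DATA too long for the assumed 2-byte LENGTH")
    return HEADER.pack(msg_type, len(control_data)) + control_data


def unpack_message(frame):
    """Parse one frame and return (message type name, CONTROL DATA bytes)."""
    if len(frame) < HEADER.size:
        raise ValueError("frame shorter than header")
    msg_type, length = HEADER.unpack_from(frame)
    if msg_type not in MESSAGE_TYPES:
        raise ValueError("unknown TYPE")
    control_data = frame[HEADER.size:HEADER.size + length]
    if len(control_data) != length:
        raise ValueError("truncated CONTROL DATA")
    return MESSAGE_TYPES[msg_type], control_data
```

Since the handshake runs over TCP, a receiver following this sketch would read the fixed-size header first and then exactly LENGTH further bytes.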

The possible CONTROL DATA values for the various message types are shown in Table 4 below.

Table 4. CONTROL DATA field

Type  Data
1     arbitrary
2     arbitrary
3     arbitrary
4     00: voice not supported, video not supported
      10: voice supported, video not supported
      11: voice supported, video supported
5     10: voice confirmation selected
      01: video confirmation selected
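The flag values of Table 4 for message types 4 (LEVELS) and 5 (VERIFY) can be sketched as lookup tables. In the Python fragment below the flags are represented as two-character strings, which is an assumption about the encoding; the function names are hypothetical, and the check that a selected level is actually supported is an added convenience, not something the text specifies.

```python
# Flag values exactly as listed in Table 4; other combinations are rejected.
LEVELS_FLAGS = {
    "00": {"voice": False, "video": False},
    "10": {"voice": True, "video": False},
    "11": {"voice": True, "video": True},
}
VERIFY_FLAGS = {"10": "voice", "01": "video"}


def decode_levels(flags):
    """Decode the CONTROL DATA of a LEVELS message into capabilities."""
    if flags not in LEVELS_FLAGS:
        raise ValueError("LEVELS value not listed in Table 4: %r" % flags)
    return dict(LEVELS_FLAGS[flags])


def decode_verify(flags):
    """Decode the CONTROL DATA of a VERIFY message into the selected level."""
    if flags not in VERIFY_FLAGS:
        raise ValueError("VERIFY value not listed in Table 4: %r" % flags)
    return VERIFY_FLAGS[flags]


def selection_supported(levels_flags, verify_flags):
    """ASSUMED check: is the selected level among the advertised capabilities?"""
    return decode_levels(levels_flags)[decode_verify(verify_flags)]
```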

After the message exchange Alice and Bob use the selected verification level to verify the USSs and then they proceed with normal voice communication encrypted with the symmetric key SSK.

5. VIPSec: security analysis

In this section we analyze the security of our protocol. We do so (a) through intuitive validation, (b) by presenting the security fundamentals implemented by the protocol, and (c) through an exhaustive step-by-step analysis.

5.1. Intuitive protocol validation

The protocol can be validated intuitively by focusing on its three phases, the objects that are exchanged and verified, and the timing of the events.

1. Phase 1: user-selected (biometric) objects are ciphered with the user's session Private Key, producing User Session Signatures (USS). Thus, users are bound to verify the selected object at the end of the protocol handshake.
2. Phase 2: authenticity and identity data are given out piece by piece. USSs are exchanged before any key exchange. Common keys are exchanged next and are used to decrypt the USSs and extract the original objects. The Session Symmetric Key is exchanged last, and it is considered safe to use if and only if the USSs are verified successfully.
3. Phase 3: a user-based verification procedure is used at the end. Users verify as humans, by using the voice or video media stream, the objects they sent to each other during Phase 1. This verification procedure is simple, yet powerful and effective, as it relies on the human attributes of the users, providing security that is unmatched at this level.

5.2. Asymmetric cryptography key features

Asymmetric or public key cryptography algorithms rely on public keys for the encryption of data and private keys for the decryption, as well as for the creation of digital signatures for user identification [19]. A basic rule in asymmetric cryptography is that it is computationally infeasible to determine the private key given knowledge of the algorithm and the public key or the digital signature. Public/Private key algorithms are used in many cryptosystems, including ciphers and authentication protocols. Asymmetric cryptography keys have the following properties:

1. Independence: although the keys are paired, it is not feasible to generate one from the other.
2. Non-inversion: encrypted data do not reveal the key used for the encryption.
3. Encryption: when data destined for a user are encrypted with the user's common key, only this user can decrypt them, with his private key.
4. Identification: when data are encrypted with a user's private key, they form a signature data object that verifies the identity of the sender. Anyone can decrypt these data with the user's common key.
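Properties 3 and 4 can be illustrated with textbook RSA on deliberately tiny primes. This is a toy sketch only: the protocol does not mandate RSA, keys of this size offer no security, and make_keypair and apply_key are hypothetical names. The call pow(e, -1, phi) assumes Python 3.8 or later.

```python
def make_keypair(p, q, e=17):
    """Toy textbook RSA: return (common key, private key) as (exponent, modulus)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # modular inverse of e (Python 3.8+)
    return (e, n), (d, n)


def apply_key(key, value):
    """Apply a key to an integer smaller than the modulus."""
    exponent, modulus = key
    return pow(value, exponent, modulus)


common_key, private_key = make_keypair(61, 53)

# Property 3 (encryption): data encrypted with the common key are
# recovered with the matching private key.
ciphertext = apply_key(common_key, 1234)
recovered = apply_key(private_key, ciphertext)

# Property 4 (identification): data encrypted with the private key can be
# opened by anyone holding the common key.
signature = apply_key(private_key, 42)
opened = apply_key(common_key, signature)
```

Here recovered equals the original value 1234 and opened equals 42, which mirrors the two directions in which VIPSec uses the session key pair.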

5.3. Bonding Pair

The User Session Signature (USS) of a user is encrypted with the Private Key of that user, and the user's Common key is used to decrypt the signature and reveal the user-selected data to be verified. We say that the USS and the Common key form a bonding pair.

Bonding Pair Properties

• USS reveals the sender's identity.
• USS carries one-time user-selected data.
• USS and Common key are paired: USSi is decrypted only with the relevant common key Ci. The data carried in the USS are selected by the user, and they are verified by the same user when the key exchange is completed.
• A bonding pair is considered valid (authentic) at the receiving party only after a successful voice or video verification procedure.
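The bonding pair idea can be sketched with the same kind of toy textbook RSA. In the Python fragment below, the challenge object is modelled as a small integer, verify_by_voice is a stand-in for the human voice/video check, and all names and numbers are hypothetical choices made for illustration; a real handset would use a proper cryptographic library and realistic key sizes.

```python
def make_keypair(p, q, e=17):
    """Toy textbook RSA key pair: (common key, private key)."""
    n = p * q
    return (e, n), (pow(e, -1, (p - 1) * (q - 1)), n)  # Python 3.8+


def make_uss(private_key, challenge):
    """USS = challenge object encrypted with the sender's private key."""
    d, n = private_key
    return pow(challenge, d, n)


def open_uss(common_key, uss):
    """Recover the challenge object from a USS with the paired common key."""
    e, n = common_key
    return pow(uss, e, n)


def verify_by_voice(spoken_by_sender, recovered_by_receiver):
    """Stand-in for the human check: the sender states the chosen object and
    the receiver compares it with what was extracted from the USS."""
    return spoken_by_sender == recovered_by_receiver


# Honest run: B's bonding pair (USS2, C2) reaches A unaltered.
c2, p2 = make_keypair(61, 53)
challenge_b = 1977
honest_ok = verify_by_voice(challenge_b, open_uss(c2, make_uss(p2, challenge_b)))

# Forged run: X substitutes its own bonding pair (USSX, CX).
cx, px = make_keypair(89, 97)
challenge_x = 2024
forged_ok = verify_by_voice(challenge_b, open_uss(cx, make_uss(px, challenge_x)))
```

In the honest run the check succeeds; in the forged run the pair is internally consistent, so A extracts an object without error, but it is X's object rather than B's and the final check fails.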

5.4. Novelty

The novelty of the protocol is that key verification is performed through the verification of user-selected data, in contrast to other approaches that use hash commitment schemes. Each user randomly selects some data to form his session signature, and these data are verified through a user-based biometric test using human-oriented attributes (voice or video).

5.5. Analysis

Below we list the facts that support the correctness of VIPSec and its resistance to attacks:

1. USS and Common key form a bonding pair.
2. USS exchange is performed prior to any key exchange; both parties have at their disposal a USS to verify at the end before they send the accompanying part of the bonding pair.
3. USS data are one-time user-selected data of arbitrary type (challenge), which are encrypted with the private key of the user; it is not feasible for any third party to guess the challenge data.
4. Elements are mutually exchanged step by step. The relevant USSs and session keys are exchanged in turns by both parties, piece by piece and not all together.
5. An authentic bonding pair that reaches its destination (i.e. it was not altered or faked) guarantees a successful verification procedure at the end, which in turn results in a successful symmetric key exchange.
6. If a user receives a bonding pair that is not authentic, then the user can safely assume that an attack is in progress and abort the procedure.
7. As a result of 5 and 6 above, the only way for a malicious middle man to compromise the channel is through impersonation, i.e. a fake bonding pair has to be created. This means that the MITM needs to guess the object to be exchanged. Due to the user-based verification procedure at the end of the process, the forgery of the bonding pair is exposed and the attack is noticed by the users.

5.6. Step by step exhaustive analysis

In order to intercept and interfere (I&I) in the protocol handshaking steps, knowledge of the user-selected data is required. We show this by examining, step by step, every possible I&I by an attacker on the protocol operation.

Possible I&I scenarios: there are three possible scenarios in which a malicious middle man X can attack by trying to intercept and interfere with communications between A and B:

1. X is in the middle, trying to alter protocol messages during the execution of the protocol with both parties.
2. X impersonates first B to A and then A to B. In this case, when A calls B, X responds as if it were B and executes the protocol with A; then X calls B as if it were A and executes a second protocol instance with B. Finally X bridges the two "half circuits" together.
3. X impersonates first A to B and then B to A. In this case, when A calls B, X "holds" the call, calls B as if it were A and executes the protocol with B; then X continues and completes the execution of a second protocol instance with A as if it were B. Finally X bridges the two "half circuits" together. This case is symmetrical to case 2 above.

First we consider I&I scenario 1. We examine the protocol execution between the two parties step by step and identify every possible I&I by X. In every case a dead end is reached (no I&I is feasible).

Step 1: User B sends USS2 to User A
Possible I&I: In order to create and forward a USS, an object has to be encrypted with a corresponding private key. The only feasible I&I in this step is therefore to produce and forward a fake USS (USSX) using an arbitrary object. This object has to be encrypted with a (fake) private key PX, which results in a fake bonding pair; this in turn results in an attack event noticed by the communicating parties at the end (item 7 of the analysis in Section 5.5 above).

Step 2: User A sends USS1 to User B
Possible I&I: Same as in Step 1.

Step 3: User B sends C2 to User A
Possible I&I:
- If USS2 was not faked in Step 1, then the bonding pair properties must be followed or else no pair is formed; as a consequence C2 has to be forwarded unchanged, and therefore there is no feasible I&I.
- If USS2 was faked (USSX was forwarded to User A in Step 1 instead of USS2), then the bonding pair properties must again be followed, so a fake common key CX has to be forwarded as well. The fake bonding pair will be accepted by A, but the extracted object that reaches A differs from the original sent out by B. This is revealed in the verification phase, where verification fails.

Step 4: User A sends C1 to User B
Possible I&I: Same as in Step 3.

Step 5: User A sends Encrypted_SSK to User B (symmetric key SSK is encrypted with C2)
Possible I&I: X has three options:
- The original encrypted symmetric key (Encrypted_SSK) is forwarded to B, who uses his private key P2 to decrypt it. This leaves both A and B with the same symmetric key SSK, which means that no I&I is possible. In this case X gets the encrypted form of the symmetric key but not the symmetric key itself.
- A different symmetric key (SSKX) is encrypted with C2 (eavesdropped in Step 3), resulting in Encrypted_SSKX. B uses his private key P2 to decrypt it and derives SSKX. In this case A and B end up with keys that do not match (A has SSK ≠ SSKX that B has), so symmetric cryptography between A and B fails. X knows SSKX but not SSK, so X is not able to use a different key for each party.
- A different symmetric key (SSKX) is encrypted with a fake common key (CX), resulting in Encrypted_SSKX. In this case B cannot decrypt it with his private key P2, which once more results in failure of the procedure.

This shows the resistance of our protocol in the first scenario. The remaining two scenarios are shown schematically in Fig. 2 below. In fact they are different views of the same principles. For the second scenario, X is impersonating B to A and has to send a bonding pair to A. To start, a USS has to be sent in Step 1. Thus X has to select an object that will be verified at the end as originating from user B. Since B selects arbitrary objects, it is not feasible for X to guess the object that B will choose later in time; therefore verification will fail at the end (as in Step 1 of scenario 1). For the third scenario the same argument holds with users A and B interchanged.
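The three options open to X in Step 5 can be traced with a toy model. The Python sketch below again uses textbook RSA on tiny primes as a stand-in for whatever asymmetric scheme is selected; the key values, the integer "symmetric keys" and the function step5 are all hypothetical and serve only to show which key each party ends up holding.

```python
def make_keypair(p, q, e=17):
    """Toy textbook RSA key pair: (common key, private key)."""
    n = p * q
    return (e, n), (pow(e, -1, (p - 1) * (q - 1)), n)  # Python 3.8+


def apply_key(key, value):
    exponent, modulus = key
    return pow(value, exponent, modulus)


C2, P2 = make_keypair(61, 53)  # User B's session key pair
CX, PX = make_keypair(89, 97)  # attacker X's own key pair

SSK = 1234   # symmetric key chosen by User A (toy value below B's modulus)
SSKX = 999   # symmetric key chosen by X


def step5(tamper):
    """Return (key held by A, key derived by B) for one choice made by X."""
    if tamper == "forward":            # X forwards Encrypted_SSK untouched
        on_wire = apply_key(C2, SSK)
    elif tamper == "replace_with_C2":  # X encrypts SSKX with the real C2
        on_wire = apply_key(C2, SSKX)
    elif tamper == "replace_with_CX":  # X encrypts SSKX with its fake CX
        on_wire = apply_key(CX, SSKX)
    else:
        raise ValueError("unknown tamper mode")
    return SSK, apply_key(P2, on_wire)  # B always decrypts with P2
```

With "forward" both parties hold SSK and X has seen only the ciphertext; with "replace_with_C2" A holds SSK while B holds SSKX, so the encrypted media stream cannot be decoded; with "replace_with_CX" B's decryption under P2 yields, in general, a value unrelated to either key.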

Figure 2. Possible attack scenarios.

6. Concluding remarks

The protocol proposed here is a user-oriented approach for ensuring end-to-end security on an IP communication channel. It relies entirely on the user, with no further assumptions (reliable ISPs, PKI, etc.). It uses a simple yet effective set of messages exchanged over the channel. It is easy to use because of the human-oriented attributes and procedures that are followed (personal data, voice and video). The security level is selected by the user through the challenge objects and the verification method. Key sizes are another user-selectable attribute, dependent on the desired security; the default values for normal public use are the minimum accepted levels, i.e. 128 bits for the symmetric key and 2048 bits for the asymmetric keys. Furthermore, typical end user terminals today easily meet its computational requirements, making it possible to use the protocol in any environment, from a PC connected to a wireline IP network to a handheld device in a wireless IP network.

It has been demonstrated that the protocol is resistant to man-in-the-middle attacks, the most sophisticated attacks today's public IP networks suffer from. The protocol is resistant not only to attacks known to compromise today's voice networks, land line or wireless, but also to hypothetical, more powerful attacks, such as an attack from the network itself. Given the relatively low complexity of implementing the protocol at the end terminals, its effectiveness and its user friendliness, we expect it to be an attractive candidate solution for ensuring end-to-end secure communications.

References

[1] Ofir Arkin, "Security Threats to IP Telephony-based Networks", ;login: the Magazine of USENIX and SAGE, Vol. 27, Iss. 6, Dec. 2002, pp. 30-36.
[2] Rohit Dhamankar, "Intrusion Prevention: The Future of VoIP Security", June 2005. Available: http://www.solunet.com/wp-upload/WP-VoIPSecurity.pdf
[3] Thomas J. Walsh, D. Richard Kuhn, "Challenges in Securing Voice over IP", IEEE Security and Privacy, Vol. 3, No. 3, May/Jun. 2005, pp. 44-49.
[4] http://www.pgp.com
[5] M. Badra, I. Hajjeh, "Key-Exchange Authentication Using Shared Secrets", Computer, Vol. 39, Iss. 3, March 2006.
[6] Amit Parnerkar, Dennis Guster, Jayantha Herath, "Secret key distribution protocol using public key cryptography", Journal of Computing Sciences in Colleges, Vol. 19, Iss. 1, Consortium for Computing Sciences in Colleges, Oct. 2003.
[7] Peter Gutmann, "Key Management through Key Continuity", Internet Draft. Available: http://www.ietf.org/internet-drafts/draft-gutmann-keycont-00.txt
[8] Phil Zimmermann, "ZRTP: Extensions to RTP for Diffie-Hellman Key Agreement for SRTP", Internet Draft. Available: http://tools.ietf.org/wg/avt/draft-zimmermann-avt-zrtp-04.txt
[9] Samuel Sotillo, "Zfone: A New Approach for Securing VoIP Communication". Available: http://www.infosecwriters.com/text_resources/pdf/Zfone_SSotillo.pdf
[10] M. Cagalj, S. Capkun, J. Hubaux, "Key agreement in peer-to-peer wireless networks", Proceedings of the IEEE, Vol. 94, Iss. 2, Feb. 2006, pp. 467-478.
[11] S. Kopsidas, D. Zisiadis, L. Tassiulas, "Voice Interactive Personalized Security (VoIPSec) protocol: Fortify Internet telephony by providing end-to-end security through inbound key exchange and biometric verification", in Proceedings of the First IEEE Workshop on Hot Topics in Web Systems and Technologies (HotWeb2006), Boston MA, Nov. 2006.