Using An E-Model Implementation to Evaluate Speech ... - CiteSeerX

Using An E-Model Implementation to Evaluate Speech Quality in Voice over 802.11b Networks with VPN/IPSec

Alexandre Passito, Edjair Mota, Regeane Aguiar, Leandro Carvalho, Eliezer Moura

Anderson Briglia, Ilias Biris

LabVoIP - Computer Science Department Federal University of Amazonas Av. Gen. Rodrigo Octavio, 3000, Coroado 69077-000 - Manaus - AM - Brazil Email: {apq,edjair,rba,leandrogalvao,eliezermoura}@dcc.ufam.edu.br

Abstract: In this paper we present the results of an experimental analysis of voice traffic over 802.11b networks protected by IP Security (IPSec) mechanisms. The main concern is the objective measurement of the impact of these mechanisms by means of our corrected ITU-T E-Model implementation, showing more precisely how the associated security overhead decreases speech quality and channel capacity.

I. INTRODUCTION

Voice over Internet Protocol (VoIP) has become a robust solution for voice communication over the Internet. Its success is largely due to the fast development of applications and solutions for multimedia traffic transmission and to the narrowing gap between PSTN and VoIP speech quality, despite the best-effort service offered by the Internet. Voice services are progressively being introduced in wireless networks, such as 802.11 WLANs, thus offering mobility to VoIP users. These networks have their own peculiarities which, together with real-time media transmission requirements, pose a great challenge to VoIP technology.

The success of VoIP over wireless links points to another challenge: how to offer security mechanisms for voice data while guaranteeing quality of service (QoS). This need arises because IP networks do not offer any inherent security mechanism and VoIP traffic can be the target of several known attacks [1].

This paper presents the results of an experimental analysis of speech quality in VoIP over an IEEE 802.11b network with VPN/IPSec implemented to guarantee traffic confidentiality. IPSec is a set of protocols developed by the Internet Engineering Task Force (IETF) to support the secure exchange of packets at the IP layer, and it has been widely deployed to implement Virtual Private Networks (VPNs). Two experimental environments were used, and IPSec was configured with combinations of several encryption and authentication algorithms.

10LE Lab - Nokia Institute of Technology, Rodovia Torquato Tapajós, 7200, Tarumã, 69048-660 - Manaus - AM - Brazil. Email: {anderson.briglia, ilias.biris}@indt.org.br

The analysis is based on a widely accepted standard for speech quality rating, the MOS (Mean Opinion Score) [2], by which users rate speech quality on a scale from 1 (poor quality) to 5 (excellent). To evaluate quality objectively and to quantify the impact of the security mechanisms offered by IPSec, we implemented a measurement tool based on the E-Model computational model, presented in [3], with the miscalculation corrections of [4] and [5]. The tool provides a single output factor that is converted to a MOS rating.

The paper is organized as follows. Section II presents related work. Section III briefly introduces security issues in 802.11b networks and how IPSec can meet the security needs of the standard. Section IV presents voice over 802.11b networks and the E-Model as a speech quality evaluation tool. Section V describes the implementation of the experimental environments. Section VI analyses the results. Finally, Section VII presents our conclusions.

II. RELATED WORKS

References [1] and [6] presented experimental results on voice transmission over IPSec, analysing the behavior of the encryption system under real-time traffic. The results showed that in wired networks there is a noticeable increase in transmission delay, due to a bottleneck in the cryptographic engine, and a decrease in effective bandwidth. Reference [7] presented the design and implementation of a WLAN with IPSec and investigated the performance of the UDP and TCP protocols. The authors concluded that there is a noticeable variation in packet loss rate and network throughput. Other related works [8][9] analysed IPSec overhead in 802.11b networks, showing that there is a high overhead on TCP and UDP traffic and that the higher the security level, the higher the system overhead. The capacity of an 802.11 network supporting VoIP traffic is evaluated in [10] and [11]. It was concluded that channel capacity depends strongly on the channel bandwidth, the voice codec packetization interval and the data traffic in the system. Reference [12] proposed a new mechanism to enhance quality of service in 802.11 networks, and in [13] VoIP is evaluated under special channel conditions.

III. SECURITY IN 802.11B NETWORKS AND VPN/IPSEC

There are many known threats in wireless networks [14][15]. The transmission medium in these networks offers a new opportunity for eavesdropping: because of the coverage area of a wireless system, a user sending a message through a wireless channel can never be sure which path the message will follow. Preventing unauthorized network access can hinder several forms of attack, such as information theft, DoS attacks, unauthorized location of users and jamming.

Among the proposals to implement security in WLANs are Wired Equivalent Privacy (WEP) and the implementation of VPNs [8]. The WEP protocol is part of the 802.11b standard and its function is to offer authentication and encryption at the link layer. Many flaws were reported in [16], [17] and [18], which render the protocol insecure.

Virtual Private Networks have become an important answer to the security flaws in the 802.11b standard. VPNs offer security by integrating authentication, encryption, access control and session management [19]. The virtual network is built over another network, often the Internet. A VPN provides a secure communication tunnel between users as if they were on the same network. VPNs use encryption techniques to prevent the interception and analysis of datagrams while they are in the public network. Several tunneling protocols have been proposed, such as PPTP (Point-to-Point Tunneling Protocol), L2TP (Layer Two Tunneling Protocol), L2F (Layer Two Forwarding) and the IETF IPSec (IP Security Protocol) [20]. IPSec comprises several protocols: Authentication Header (AH), Encapsulating Security Payload (ESP) and the Internet Key Exchange (IKE).
IPSec supports various encryption and authentication algorithms, and the major ones are implemented in the Linux kernel version 2.6 used for our experiments: AES-CBC, 3DES-CBC and HMAC-SHA1.

IV. VOIP OVER 802.11B NETWORKS

The transmission of voice over wireless links has been used for many years and is known as mobile telephony. Mobility is guaranteed by the coverage of the service provider's antennas. One type of wireless link that has been widely deployed, owing to its low cost and fast development, is the IEEE 802.11 network. Such a network can easily be set up to provide VoIP services with mobility to its users [10]. One possible scenario consists of one or more access points and several VoIP-enabled mobile stations that communicate with each other through those access points and with other networks through various types of gateways, e.g., Ethernet and Bluetooth gateways.

A. Evaluating speech quality with E-Model

To evaluate the impact of security algorithms on speech quality and channel capacity we used the E-Model computational model [3]. The E-Model is an objective method for evaluating speech quality in VoIP systems. Its resulting score is the transmission rating factor R, a scalar measure that ranges from 0 (poor) to 100 (excellent). R factor values below 60 are not recommended [5]. The R factor is obtained from the following expression:

$$R = R_0 - I_s - I_d - I_e + A \qquad (1)$$

where $R_0$ represents the basic signal-to-noise ratio; $I_s$ represents the combination of all impairments that occur more or less simultaneously with the voice signal; $I_d$ represents the impairments caused by delay; $I_e$ represents the impairments caused by low bit rate codecs; and $A$ is the advantage factor, which corresponds to the user's tolerance owing to the convenience of using a given technology. Equation (1) can be reduced to:

$$R = 93.4 - I_d(T_a) - I_e(\mathrm{codec}, \mathrm{loss}) \qquad (2)$$

where $I_d$ is a function of the absolute one-way delay $T_a$ and $I_e$ is, in short, a function of the codec type used and the packet loss rate. Reference [4] proposes corrections to the R to MOS conversion, to the $I_e$ factor calculation and to the calculation of the packet-loss probability in burst periods. These modifications were incorporated in our E-Model implementation.

The R factor can be converted to a MOS rating ranging from 1 (worst case) to 5 (excellent quality). After the R factor is calculated with (2), the conversion uses the following relation between the R factor and the MOS rating:

$$\mathrm{MOS} = \begin{cases} 1, & R < 6.5 \\ 1 + 0.035R + 7 \times 10^{-6}\, R (R - 60)(100 - R), & 6.5 \le R \le 100 \\ 4.5, & R > 100 \end{cases} \qquad (3)$$

Once the codec is known (and with it the behavior of $I_e$), we need to capture network statistics (delay and loss) and application statistics (dejitter buffer delay and codec used) in order to estimate speech quality by means of the R factor expression in (2).

V. VOIP AND IPSEC OVER 802.11B NETWORKS

The major motivation for using VoIP in 802.11b networks is support for mobility. The robustness of this solution, already achieved in wired networks, depends on the quality of the system security. Owing to the fragility of the WEP protocol, the use of VPNs with IPSec has become a widely deployed solution. IPSec can provide voice services with encryption techniques such as the AES standard, as well as authentication and key exchange, combining mobility with security.
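As a minimal sketch of equations (2) and (3) from Section IV-A, the Python fragment below computes the R factor from given impairment values and converts it to MOS. The function names and the sample $I_d$ and $I_e$ values are illustrative assumptions and are not taken from the authors' ANSI C tool; how $I_d$ and $I_e$ are derived from delay, codec and loss is left to the caller.

```python
def r_factor(i_d: float, i_e: float, base: float = 93.4) -> float:
    """Equation (2): R = 93.4 - Id - Ie, with Id and Ie supplied by the caller."""
    return base - i_d - i_e


def r_to_mos(r: float) -> float:
    """Equation (3): piecewise conversion from the R factor to a MOS rating."""
    if r < 6.5:
        return 1.0
    if r > 100.0:
        return 4.5
    return 1.0 + 0.035 * r + 7e-6 * r * (r - 60.0) * (100.0 - r)


if __name__ == "__main__":
    # Hypothetical impairment pairs (Id, Ie), chosen only to exercise the formulas.
    for i_d, i_e in [(0.0, 0.0), (5.0, 10.0), (20.0, 30.0)]:
        r = r_factor(i_d, i_e)
        print(f"Id={i_d:5.1f}  Ie={i_e:5.1f}  R={r:6.2f}  MOS={r_to_mos(r):.2f}")
```

With no delay or codec impairment, R equals 93.4 and the conversion gives a MOS of about 4.41, which is the ceiling that any security overhead can only lower.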

Nevertheless, real-time applications can be seriously affected by these mechanisms, despite their lower impact on other types of applications. Introducing a security layer can impose restrictions on VoIP applications, such as increased delay, packet loss and jitter, and decreased channel bandwidth.

A. The E-Model Measurement Tool

The measurement tool we developed is open source, written in ANSI C, and takes an input file (trace) containing delay, loss, codec type and frame duration information for each packet exchanged between two endpoints. The output is a file with $I_d$, $I_e$, delay, loss, R factor and MOS rating. The values can be plotted, giving a quick visual indication of how voice quality varies along the call. To capture the real delay and loss for each packet, the trace must be collected after the dejitter buffer and not at the network interface, since the buffer adds delay before packets can be played and late packets can be discarded in it. We used callgen323, an application from the OpenH323 Project [21], to generate traces with the needed information. The application library was tailored to fulfill all requirements of the E-Model computation. The trace file generated by the VoIP client was preprocessed by a script, which converted it to the structure recognized by the tool, before being passed as input.

B. Environment for the experiments

The experiments were carried out using mobile stations with IEEE 802.11b interfaces connected to an IEEE 802.11b wireless access point, both with 11 Mbps interfaces. The access point was connected to another computer on a wired Ethernet network.
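The reason given in Section V-A for collecting the trace after the dejitter buffer can be illustrated with a small sketch. It assumes a simplified fixed playout buffer: every packet is played a constant `playout_delay_ms` after it was sent, and packets that arrive later than that are discarded. The trace values, the buffer size and the function names are hypothetical and do not reproduce the callgen323 trace format.

```python
from typing import Optional, Sequence, Tuple


def network_stats(delays_ms: Sequence[Optional[float]]) -> Tuple[float, float]:
    """Mean delay and loss fraction as seen at the network interface.

    A value of None marks a packet that was lost in the network.
    """
    arrived = [d for d in delays_ms if d is not None]
    if not delays_ms or not arrived:
        raise ValueError("trace must contain at least one received packet")
    loss = (len(delays_ms) - len(arrived)) / len(delays_ms)
    return sum(arrived) / len(arrived), loss


def after_buffer_stats(delays_ms: Sequence[Optional[float]],
                       playout_delay_ms: float) -> Tuple[float, float]:
    """Delay and loss fraction as seen after a fixed dejitter buffer.

    Played packets all experience the playout delay; packets that are lost
    or arrive after the playout deadline count as loss.
    """
    if not delays_ms:
        raise ValueError("trace must not be empty")
    played = [d for d in delays_ms if d is not None and d <= playout_delay_ms]
    loss = (len(delays_ms) - len(played)) / len(delays_ms)
    return playout_delay_ms, loss


# Hypothetical one-way network delays in milliseconds (None = lost packet).
TRACE = [20.0, 25.0, None, 80.0, 22.0, 30.0, 21.0, 150.0, 24.0, 23.0]

if __name__ == "__main__":
    print("network interface:", network_stats(TRACE))
    print("after buffer     :", after_buffer_stats(TRACE, playout_delay_ms=60.0))
```

In this made-up trace only one packet is lost in the network, but two more arrive too late to be played, so the loss that matters for $I_e$ is three times the network loss, and the delay that matters for $I_d$ is the playout delay rather than the mean network delay.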

Fig. 1. Communication between mobile stations through the 802.11b access point.

The tests were deployed in two scenarios. Fig. 1 presents the first scenario where the communication between mobile stations is carried out by means of the access point. Fig. 2 shows the communication path between a mobile station and a station in an Ethernet network. The access point was configured to use IPSec instead of WEP as security mechanism.

Fig. 2. Communication between a mobile station and a station in a fixed network.

We used the IPSec implementation available in Linux kernel 2.6.8. Each station was configured with the desired security level and secret keys using the ipsec-tools [22] software. An H.323 VoIP client was used in each station. The tests were run by playing a pre-recorded audio file from one endpoint to the other. This file is a spoken report by NASA researchers and is suitable because it contains human speech, with pauses among regular voice activity; it contains no background music or noise, and its duration is above 5 min, like a common phone call. The voice extracted from the .wav file was coded by callgen323 with the G.711 µ-law codec, in frames of 30 ms per packet. The G.711 bit rate is 64 kbps, without headers. The RTP, UDP and IP headers add up to 40 bytes, so the bandwidth required by each voice channel was about 75 kbps.

C. Experiments execution

The experiments were divided into two parts. The first evaluated the quality of speech in a single call between the stations over IPSec. Around 100 consecutive calls of 5 min each were performed in both scenarios. At first, the calls were performed without any security mechanism. In subsequent trials, IPSec was enabled with the following configurations, employing widely used algorithms:

1) 3DES-CBC (192 bits) to encrypt and HMAC-SHA1 (160 bits) to authenticate.
2) AES-CBC (192 bits) to encrypt and HMAC-SHA1 (160 bits) to authenticate.

The second part of the experiment evaluated performance as a function of the number of simultaneous calls between the pairs. Callgen323 implements an option to generate simultaneous calls, but it does not guarantee that all calls will be established, because of channel bandwidth usage. Runs of 2, 4, 6, 8, ..., 24 simultaneous calls were performed, each with a duration of 5 min.

VI. NUMERICAL RESULTS

Fig. 3 presents the average MOS and standard deviation over 100 consecutive calls of 5 min for scenario 1.
Considering an average MOS above 3.5 as good speech quality [5], the results show that speech quality in the wireless network without security mechanisms obtained a better MOS value (4.15) than the network implementing a VPN with AES-CBC (3.70) or a VPN with the 3DES-CBC encryption algorithm (3.33). The standard deviation remains acceptable for the whole range of values, never exceeding 0.15.

Fig. 3. Average MOS value in scenario 1 for security combinations (without security, AES+SHA1, 3DES+SHA1).

Fig. 5. Average MOS as a function of the number of simultaneous calls for the VPN with 3DES.

Similar results are presented in Fig. 4 for scenario 2. The results show better speech quality for all three combinations. For the network without security the MOS was 4.27, the VPN with AES gave a MOS of 3.92, and for 3DES the MOS was 3.55.

Fig. 6. Average MOS as a function of the number of simultaneous calls for the VPN with AES.

Fig. 4. Average MOS value in scenario 2 for security combinations (without security, AES+SHA1, 3DES+SHA1).

In Fig. 5 we plot the average MOS score as a function of the number of simultaneous calls. This experiment was carried out only in scenario 1, to evaluate the channel capacity with security mechanisms. It can be observed that the average MOS decreases as the number of simultaneous calls increases. Further, the individual MOS scores fall in a wider range, indicating that the voice quality is not the same for all calls in the same stream. In Fig. 5, the initial MOS score for 2 simultaneous calls with 3DES was 3.50, and it decreases to 1.47 when 18 simultaneous calls are established. In Fig. 6, the average MOS score is plotted for the scenario using AES as the encryption algorithm. The average MOS value for 2 simultaneous calls is 3.89, and it decreases to 1.23 when 24 calls are established. It is worth emphasizing that callgen323 tries to establish the maximum number of calls, but because the two combinations cause different overheads, the average number of established calls differs.

The average MOS for 2 simultaneous calls without security mechanisms is 4.11, as plotted in Fig. 7. The MOS decreases to 1.56 with 24 calls established. As expected, the results showed better performance of the VoIP system without IPSec than with IPSec providing encryption and authentication.

A. Results Evaluation

With VPN/IPSec, the voice packet header can grow considerably because of the encryption and authentication headers. In our experiments, throughput was reduced to 40% because of the encryption performed before the packet was sent to the channel. This is due to the way IPSec treats voice packets, adding AH and ESP headers, and to the time spent in the crypto-engine: the lighter the computation of the algorithm, the higher the throughput achieved, which explains the difference between the implementations with AES and 3DES.

The time spent in the crypto-engine influences the calculation of the R factor in (2). $I_d$ is derived from all delays in the VoIP system. A higher delay in the crypto-engine increases the final value of $I_d$, which decreases the R score and leads to lower MOS values.

The overhead in packet headers introduced by IPSec influenced the calculation of $I_e$. This overhead affected the packet loss rate, resulting in higher values of the loss parameter in (2) and reducing the final MOS score. This packet loss analysis was conducted with network measurement tools, which helped identify the difference in packet loss rate between implementations with and without security.
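To make the header overhead concrete, the following sketch estimates per-packet size and per-call bandwidth for the G.711 stream described in Section V-B (64 kbps, 30 ms frames, 40 bytes of RTP, UDP and IP headers), with and without ESP. The ESP layout used here is an assumption, since the paper does not give the exact IPSec mode: ESP only, transport mode, an IV of one cipher block, CBC padding to a block boundary, and a 12-byte HMAC-SHA1-96 integrity check value. The 802.11 MAC and PHY overheads are ignored, and all function names are illustrative.

```python
import math

IP_HEADER = 20    # IPv4 header without options
UDP_HEADER = 8
RTP_HEADER = 12
ESP_HEADER = 8    # SPI (4 bytes) + sequence number (4 bytes)
ESP_TRAILER = 2   # pad length + next header fields
ICV_BYTES = 12    # assumed: HMAC-SHA1 truncated to 96 bits


def payload_bytes(codec_bps: int, frame_ms: int) -> int:
    """Voice bytes carried in one packet."""
    return codec_bps * frame_ms // 8000


def plain_packet_bytes(voice_bytes: int) -> int:
    """IP packet size without IPSec."""
    return IP_HEADER + UDP_HEADER + RTP_HEADER + voice_bytes


def esp_packet_bytes(voice_bytes: int, block_bytes: int) -> int:
    """IP packet size with ESP in transport mode (assumed layout)."""
    inner = UDP_HEADER + RTP_HEADER + voice_bytes
    # Encrypted part is padded so that data + trailer fill whole cipher blocks.
    padded = math.ceil((inner + ESP_TRAILER) / block_bytes) * block_bytes
    iv = block_bytes  # CBC initialisation vector, one block
    return IP_HEADER + ESP_HEADER + iv + padded + ICV_BYTES


def bandwidth_kbps(packet_bytes: int, frame_ms: int) -> float:
    """One packet every frame_ms milliseconds; bits per ms equals kbps."""
    return packet_bytes * 8 / frame_ms


if __name__ == "__main__":
    voice = payload_bytes(64000, 30)
    sizes = {
        "no security": plain_packet_bytes(voice),
        "ESP 3DES-CBC": esp_packet_bytes(voice, block_bytes=8),
        "ESP AES-CBC": esp_packet_bytes(voice, block_bytes=16),
    }
    for name, size in sizes.items():
        print(f"{name:13s} {size:4d} bytes  {bandwidth_kbps(size, 30):6.2f} kbps")
```

Under these assumptions the unprotected packet is 280 bytes (about 75 kbps, matching Section V-B), while the ESP packets are 312 bytes with 3DES and 328 bytes with AES. The AES packets are slightly larger, so byte overhead alone would not favour AES; this is consistent with the explanation above that the difference between the two algorithms comes mainly from crypto-engine time.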

Fig. 7. Average MOS as a function of the number of simultaneous calls without security mechanisms.

The difference between the average MOS in the two scenarios, which points to better performance in scenario 2, arises because in scenario 1 the stations compete for the transmission channel through the CSMA/CA protocol. In scenario 2, only one station competes for the channel, and the communication between the access point and the fixed station uses the Ethernet protocol over a 100 Mbps channel.

Analysing Fig. 5, Fig. 6 and Fig. 7, we can determine the real number of simultaneous calls that can be established. The results showed that with the 3DES combination we can establish up to six calls. For AES and for the configuration without security this number is, respectively, eight and ten calls. Reference [10] shows that for the G.711 codec with a frame size of 30 ms and BER = $10^{-4}$, the same characteristics as scenarios 1 and 2, the maximum number of simultaneous calls that can be established is nine, which is very close to the result obtained in our experiments. The worse results for 3DES and AES are due to the use of IPSec.

VII. CONCLUSION

The presented results showed that the E-Model computational model is a robust tool for evaluating the performance of a VoIP system, especially speech quality, when implemented with security mechanisms such as VPN/IPSec and in networks with poor channel conditions such as wireless channels. The combination of AES-CBC and HMAC-SHA1 obtained better average MOS scores and can be used in the planning of security aspects in 802.11b networks to increase the capacity of the VoIP system. Future work should investigate the implementation of an evaluation tool based not only on the H.323 project but also on SIP (Session Initiation Protocol). Another idea is the implementation of evaluating agents, which could be used to select more appropriate parameters for the network.

ACKNOWLEDGMENT

The authors would like to thank Enterasys Networks and Nokia Institute of Technology for their partial support. Alexandre Passito would like to thank, in memoriam, his grandfather for his support.

REFERENCES

[1] R. Barbieri, D. Bruschi and E. Rosti, Voice over IPSec: Analysis and Solutions. Proceedings of 18th Annual Computer Security Applications Conference (ACSAC). 2002.
[2] ITU-T Recommendation P.800, Methods for subjective determination of transmission quality. Nov. 1996.
[3] ITU-T Recommendation G.107, The E-model, a computational model for use in transmission planning. Mar. 2003.
[4] L. Carvalho, et al, An E-Model implementation for speech quality evaluation in VoIP systems. Proceedings of IX IEEE Symposium on Computers and Communications. Spain, 2005.
[5] L. Carvalho, An E-Model implementation for objective speech quality evaluation of VoIP Communication Networks. Master Thesis. Sep. 2004 (in Portuguese).
[6] A. Passito, et al, Performance evaluation of VoIP traffic using IPSecurity protocol. Proceedings of I Workshop on Computer Science and Information Systems. Florianopolis, Brazil. 2004 (in Portuguese).
[7] W. Qu and S. Srinivas, IPSec-based secure wireless virtual private network, Proceedings of IEEE MILCOM. 2002, pp. 1107-1112.
[8] P. Ditarso and B. Astuto, Evaluating the overhead introduced by IPSec and WEP in 802.11 networks, Proceedings of Security Workshop of XXI Brazilian Symposium on Computer Networks. 2003.
[9] G. Hadjichristofi, N. Davis and S. Midkiff, IPSec overhead in wireline and wireless networks for Web and email applications, Proceedings of IEEE International Performance Computing and Communications Conference. 2003, pp. 543-547.
[10] D. Hole and F. Tobagi, Capacity of an IEEE 802.11b wireless LAN supporting VoIP, Proceedings of IEEE International Conference on Communications. 2004, pp. 196-201.
[11] K. Medepalli, et al, Voice Capacity of IEEE 802.11b, 802.11a and 802.11g Wireless LANs, Proceedings of IEEE Globecom. 2004.
[12] D. Chen, et al, Supporting VoIP Traffic in IEEE 802.11 WLAN with Enhanced Medium Access Control for Quality of Service. 2002. http://www.research.avayalabs.com/techreport/ALR-2002-025paper.pdf
[13] O. Awoniyi and F. Tobagi, Effect of Fading on the Performance of VoIP in IEEE 802.11a WLANs, Proceedings of IEEE International Conference on Communications. 2004.
[14] U. Murphy, et al, Firewalls for Security in Wireless Networks, Proceedings of 31st Annual Hawaii International Conference on System Science. 1998.
[15] S. Russel, Wireless network security for users, Proceedings of IEEE International Conference on Information Technology: Coding and Computing. 2001.
[16] N. Borisov, I. Goldberg and D. Wagner, Intercepting Mobile Communications: The insecurity of 802.11. 2001. Hosted at http://www.isaac.cs.berkeley.edu/isac
[17] S. Fluhrer, I. Mantin and A. Shamir, Weaknesses in the key scheduling algorithm of RC4, Proceedings of Selected Areas in Cryptography. 2001.
[18] H. Boland, Security issues of the IEEE 802.11b wireless LAN, Proceedings of Canadian Conference on Electrical and Computer Engineering. 2004.
[19] W. Qu, Design and Implementation of a wireless VPN, Master Thesis, Dalhousie University. 2001.
[20] S. Kent and R. Atkinson, Security Architecture for the Internet Protocol, IETF RFC 2401. 1998. http://www.ietf.org/rfc/rfc2401.txt
[21] OpenH323 Project. http://www.openh323.org/
[22] IPSec Tools Project. http://ipsec-tools.sourceforge.net/
