VI International Telecommunications Symposium (ITS2006), September 3-6, 2006, Fortaleza-CE, Brazil


Voice over IP Quality of Service Using Active Queue Management

Vitalio Alfonso Reguera, Félix F. Álvarez Paliza, Evelio M. García Fernández and Walter Godoy, Jr.

Abstract—This paper presents a comprehensive study of the impact of active queue management (AQM) on Voice over Internet Protocol (VoIP) quality of service. One of the most representative AQM schemes is analyzed through extensive simulation, and its effect on the perceived quality of voice calls is evaluated. Different network scenarios, changing network loads and different scheme control parameters are investigated. Network impairments are related to user perception by means of well-known algorithmic models, expressing user satisfaction on the MOS scale. The main results show that the use of active queue management schemes such as adaptive random early detection (ARED) significantly improves the perceived quality of voice calls.

Index Terms—Queuing analysis, Internet, Protocols.

I. INTRODUCTION

The traditional queuing design of Internet routers relies on buffers to absorb traffic that cannot be forwarded immediately. In periods of congestion, when the buffers are full, the excess traffic is dropped (this kind of queue is known as a droptail queue), causing bursty packet loss. This condition triggers the congestion control mechanism of responsive network protocols such as the transmission control protocol (TCP), reducing transmission rates and eventually relieving the congestion. The presence of unresponsive traffic in the Internet, mostly carried by the User Datagram Protocol (UDP), produces an unfair bandwidth distribution among the flows sharing the same link. The aim of active queue management is to reduce queue lengths and oscillations while maintaining high link utilization with fair resource allocation. This seems very promising for applications constrained by delay and delay variation (jitter). To achieve these goals, AQM signals the presence of incipient congestion to traffic sources either by marking packets (e.g., by using explicit congestion notification (ECN) [1]) or by dropping packets (hereafter the term marking is used for both marking and dropping packets, unless otherwise specified). Several algorithms have been proposed since the original conception of random early detection (RED) [2]. This scheme adjusts the marking rate according to the average queue length. The principal differences among AQM proposals are the technique used to detect congestion and the control mechanism that stabilizes the queue length.

Manuscript received April 04, 2006. This work was supported in part by CAPES under grant CAPES/MES-CUBA 016/06. V. A. Reguera and F. A. Paliza are with the Electronics and Telecommunication Department, Central University of Las Villas, Santa Clara, Cuba (e-mail: {vitalio, fapaliza}@uclv.edu.cu). E. M. G. Fernández is with the Electrical Engineering Department, Federal University of Paraná, P. O. Box 19011, 81531-990, Curitiba, Brazil (e-mail: [email protected]). W. Godoy Jr. is with the Electronics Department, Federal Technological University of Paraná, Curitiba, Brazil (e-mail: [email protected]).

Voice quality in VoIP systems is severely impaired by network delay, jitter and bursty loss. The processing applied to voice signals by codecs and other transmission techniques also affects the user's perception. Voice packets encapsulated in RTP/UDP/IP pass through the network, experiencing different impairments. ITU-T Recommendation G.107 [3] describes an algorithm, called the E-model, for quantifying the effect of these impairments on transmission quality. The index provided by the E-model can be easily related to human perception of voice quality. From the Internet perspective, RFC 3611 [4] extends the RTP control protocol (RTCP) to incorporate a set of metrics for monitoring voice over IP calls.

The complex interaction between congestion control mechanisms and network behavior makes it analytically difficult to evaluate the impact of AQM on VoIP quality of service. In [5], a simplified approach is taken, assuming an unrealistic traffic model. Only one active queue management algorithm is considered and the results discourage its implementation. A framework for supporting emergency telecommunications service in IP telephony, proposed in [6], states that more advanced forms of AQM will be needed to deal with emergency traffic. Also, QoS guidelines from Cisco avoid the use of AQM in the presence of voice traffic; instead, they recommend class-based scheduling mechanisms. This work is motivated by the absence of a more complete study of how the quality of VoIP applications is influenced by the underlying network control mechanism, specifically active queue management.
A set of simulations involving one of the most representative AQM schemes is analyzed. Network metrics associated with test voice calls are mapped into a well-known subjective quality score defined in [7]. The results show that the use of active queue management mechanisms could improve the quality of service for voice over IP applications.

The rest of the paper is organized as follows. Section II describes the active queue management scheme used in the simulations. Section III outlines the techniques employed to improve and evaluate the quality of voice over IP calls. The experimental setup appears in Section IV. The impact of AQM on VoIP with packet drop and with ECN is discussed in Section V. In Section VI other network scenarios are analyzed. The main results obtained are summarized in Section VII.

II. ACTIVE QUEUE MANAGEMENT

Queue schemes other than droptail have received a lot of attention in recent years. Many AQM algorithms have been reported in the peer-reviewed literature. The experiments presented here are based on one of them, adaptive RED (ARED) [8]. No specific selection criterion was used, except for the existence of tested code within the simulation software described in Section IV. Note that our goal is not to make a comparative study of AQM techniques, but to shed some light on how router-based congestion control affects the quality of service of voice calls.

One of the hardest tasks in configuring active queue management is to establish the right values for the set of parameters that governs its behavior. Performance metrics are very sensitive to these parameters, whose effective values can change with the number of sources, the link capacity and the round-trip delay. In RED, for example, four parameters are left for network operators to specify: the minimum and maximum queue thresholds (minth and maxth), the maximum marking probability (maxp) and the queue averaging constant (wq). As pointed out in [8], variations in the values of these parameters dramatically affect performance. Moreover, it is practically impossible to find a set of parameter values that addresses the diversity of Internet scenarios.

Another element that has a significant impact on the performance of AQM is whether the queue is measured in packets or in bytes. As argued in [9], large packets are more likely to be marked when operating in byte mode. This means that applications generating small packets receive preferential treatment in byte mode.
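To make the role of these parameters concrete, the following minimal Python sketch implements the RED averaging and marking rule that this section states as (1) and (2), following the formulation in [2]. The numeric values (thresholds of 5 and 15 packets, maxp = 0.1, wq = 0.002) are illustrative assumptions, not the settings used in the paper. The byte-mode scaling by relative packet size is one common variant, assumed here only for illustration; the paper does not spell out the exact rule applied by its simulator.

```python
def update_average(avg_q, q, w_q):
    """EWMA of the queue length, Eq. (2): a_q <- (1 - w_q) * a_q + w_q * q."""
    return (1.0 - w_q) * avg_q + w_q * q


def marking_probability(avg_q, min_th, max_th, max_p):
    """Piecewise-linear RED marking probability, Eq. (1)."""
    if avg_q < min_th:
        return 0.0
    if avg_q >= max_th:
        return 1.0
    return max_p * (avg_q - min_th) / (max_th - min_th)


def byte_mode_probability(p, packet_bytes, mean_packet_bytes):
    """Assumed byte-mode variant: scale p by the relative packet size."""
    return min(1.0, p * packet_bytes / mean_packet_bytes)


# Hypothetical parameter values, chosen only to exercise the functions.
MIN_TH, MAX_TH, MAX_P, W_Q = 5.0, 15.0, 0.1, 0.002

avg = 10.0                       # current average queue length (packets)
avg = update_average(avg, q=20.0, w_q=W_Q)
p = marking_probability(avg, MIN_TH, MAX_TH, MAX_P)

# A small voice packet versus a large data packet, with a 500-byte mean size.
p_voice = byte_mode_probability(p, packet_bytes=120, mean_packet_bytes=500)
p_data = byte_mode_probability(p, packet_bytes=1000, mean_packet_bytes=500)
print(f"avg={avg:.3f} p={p:.4f} p_voice={p_voice:.4f} p_data={p_data:.4f}")
```

Under this assumed scaling, the small packet is marked with a lower probability than the large one for the same average queue length, which is the preferential treatment of small packets in byte mode noted in [9].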
In what follows, we briefly characterize the studied scheme. Random early detection was the first active queue management algorithm to be proposed. RED increases the congestion signal sent back to the traffic sources in proportion to the average queue size. The marking probability p(aq) is computed as

$$
p(a_q) =
\begin{cases}
0 & \text{if } a_q < \mathit{min}_{th} \\
1 & \text{if } a_q \ge \mathit{max}_{th} \\
\mathit{max}_p \, \dfrac{a_q - \mathit{min}_{th}}{\mathit{max}_{th} - \mathit{min}_{th}} & \text{otherwise}
\end{cases}
\tag{1}
$$

where aq is the exponentially weighted moving average (EWMA) of the queue length, which is updated at every packet arrival by sampling the actual queue size q, as

$$
a_q \leftarrow (1 - w_q)\, a_q + w_q\, q \tag{2}
$$

ARED is an enhanced version of RED with the ability to tune its parameters dynamically to adjust to current traffic conditions. According to its designers, the only parameter that must be given is the target queue size or, equivalently, the target queue delay. A recent study revealed that ARED operating in byte mode has a positive effect on web performance [9].

III. VOICE OVER IP AND QOS

One of the biggest challenges in next generation networks (NGN) is to offer a quality of service similar to that of the traditional telephone system. IP networks and their best-effort paradigm introduce severe impairments to real-time applications. In VoIP systems, delay, jitter and packet loss deteriorate the human perception of voice. Several techniques have been developed to reduce the impact of these factors on voice quality.

A. VoIP Systems and Impairments

In a normal VoIP communication, the voice is digitized and encoded using a standard codec (e.g., G.711, G.729, GSM, G.723). The frames produced by the codec are assembled into packets that travel through the network, suffering occasional drops and delays. At the listener side, received packets are disassembled and placed in a jitter buffer that smooths out delay variations due to network transmission. Finally, the voice signal is reconstructed from the digitized voice samples. As can be observed in Table I, codecs differ in the bit rate produced. The reduction in bandwidth consumption comes at the cost of decreased perceived voice quality. A further increase in efficiency can be achieved by suppressing the transmission of silence using voice activity detection (VAD). This results in an on-off traffic pattern composed of interleaved periods of constant-rate transmission and periods without packet emissions. Codecs can also introduce significant delays related to the time required for encoding.

TABLE I
CHARACTERISTICS OF STANDARD CODECS

Codec type   Reference   Rate (kbps)   Impairment factor
PCM          G.711       64            0
CS-ACELP     G.729       8             10
VSELP        GSM         5.6           23
ACELP        G.723       5.3           19

In addition to the encoding delay, the packetization delay, network delay, jitter buffer delay and decoding delay also contribute to the end-to-end delay. When the delay is greater than a certain value, communication lacks interactivity and users lose the ability to maintain a fluid conversation. Additionally, the combination of delay and echo produces an undesirable effect in a voice call [3]. Jitter, mostly due to oscillations of network queues, causes late packets that must be dropped at the receiver side. The jitter buffer provides a margin to absorb delay variations by scheduling the playout time a few milliseconds after the arrival time. Because of the dynamics of network delay, the playout time turns out to be difficult to set a priori. Even within a voice call, delays can change dramatically, causing excessive late packets. To overcome this problem, several algorithms for adaptive jitter buffering have been proposed [10-12]. An adaptive jitter buffer adjusts the playout time by estimating the current delay and its variation. These techniques can achieve a remarkable reduction in late-packet drops.

Bursty packet losses, mainly due to network congestion, severely affect voice quality. Also, the location of the burst within the voice call has an impact on the user's perception. A study conducted in [13] showed that a bursty loss at the end of the conversation has a more pernicious effect than one at the beginning; this is called the recency effect. The consequences of packet loss, whether produced by the network or by late arrival, can be mitigated by adding redundancy to voice packets. For example, with forward error correction (FEC) techniques, a lost packet is recovered from copies carried by subsequent packets if they arrive in time [14]. However, loss due to network congestion occurs in bursts, causing the loss of successive packets and reducing the efficiency of FEC schemes. A simpler receiver-based approach is the use of packet loss concealment (PLC) [14, 15]. This mechanism replaces lost packets by inferring the lost data from previously arrived packets. As with FEC, the beneficial effect of PLC vanishes when packet loss occurs in bursts. The combination of adaptive playout scheduling and loss concealment methods can bring a significant improvement in audio quality [11].

B. VoIP QoS Assessment

Mean opinion score (MOS) [7] has traditionally been used to measure the subjective perception of voice communication. MOS is given on a scale of 1 to 5, where a higher value corresponds to better quality. Since MOS is a subjective test that is difficult to carry out in practical situations, objective tests have been developed (e.g., PESQ [16] and PSQM [17]). ITU-T Recommendation G.107 describes the E-model, a computational algorithm that incorporates impairment factors present in modern transmission networks. The output of the E-model is a scalar quality rating value, R, which is computed as

$$
R = R_o - I_s - I_d - I_{e\text{-}eff} + A \tag{3}
$$

where Ro represents, in principle, the basic signal-to-noise ratio, Is is a combination of all impairments that occur more or less simultaneously with the voice signal, Id represents the impairments caused by delay, Ie-eff is the effective equipment impairment factor and A is the advantage factor. Table II shows the relation between R value, MOS and user satisfaction.

In [18], a Markovian model is used to incorporate the effects of bursty packet loss and recency into an early revision of the E-model (the current recommendation incorporates the bursty loss effect). Here, a Gilbert-Elliott model captures the bursty nature of packet loss. A period of time with a high loss rate is a "burst" and a period of time between bursts is a "gap". The loss rate boundary between burst and gap is set to $G_{min}^{-1}$, with a recommended value of 6.25% [4, 18]. It is known that transitions from gap to burst, and vice versa, do not affect the perceived quality suddenly [13], as illustrated in Fig. 1. This consideration is taken into account to compute the effective equipment impairment factor. RFC 3611 adds to RTCP a VoIP metrics report block that includes factors from [3] and [18]. Those metrics are intended for monitoring voice over IP calls.

Fig. 1. Effect of the transmission quality (packet loss rate) on user perceived quality of service. [Figure not reproduced: loss rate and perceived quality, each plotted against time.]

TABLE II
EQUIVALENCE BETWEEN R VALUES AND ESTIMATED MOS

User satisfaction                R factor (lower limit)   MOS (lower limit)
very satisfied                   90                       4.34
satisfied                        80                       4.03
some users dissatisfied          70                       3.60
many users dissatisfied          60                       3.10
nearly all users dissatisfied    50                       2.58
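The correspondence between R and MOS listed in Table II can be reproduced with the R-to-MOS conversion given in ITU-T Recommendation G.107 [3]. The sketch below is an illustration of that conversion only; it does not compute the impairment terms of (3), and the function name is a hypothetical choice.

```python
def r_to_mos(r):
    """Estimated MOS for a rating factor R, per the G.107 conversion formula."""
    if r <= 0.0:
        return 1.0
    if r >= 100.0:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7.0e-6


# Lower limits of the user satisfaction categories in Table II.
for r in (90, 80, 70, 60, 50):
    print(f"R = {r:3d} -> MOS = {r_to_mos(r):.3f}")
```

The printed values agree with the MOS column of Table II to within rounding in the second decimal place.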

We use the guidelines provided in [4] to assess the quality of service of VoIP calls and evaluate the impact of AQM on user perception.

IV. EXPERIMENT SETUP

To measure the impact of AQM on VoIP quality of service, extensive simulations were conducted using the network simulator version 2 (NS2) [19]. The network scenario consisted of a dumbbell topology and a mix of long-lived traffic (e.g., FTP) and short-lived traffic (e.g., web traffic), as well as reverse-path traffic (see Fig. 2). The dumbbell topology was selected because it has traditionally been used to evaluate AQM schemes [2, 8]. TCP connections for long-lived traffic were started progressively at the beginning of the simulation, together with web flows and reverse-path flows. The round-trip time (RTT) of the TCP connections was varied in the range of 20 to 400 ms. The TCP segment size was set to 1000 bytes. Unresponsive flows were also simulated by introducing UDP sessions. The active queue management scheme under study was deployed in routers with a queue size of 120 packets. The AQM parameters were set according to the guidelines provided in [8]. The target queue size was set to 20 packets. This represents an average queue delay of 40 ms, considering a mean packet size of 500 bytes.
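As a quick check of the 40 ms figure, and assuming the queue drains at the 2 Mbps rate that appears among the link labels of Fig. 2, the time needed to serve a queue of 20 packets of 500 bytes each is

$$
d_q = \frac{20 \times 500 \times 8\ \text{bit}}{2 \times 10^{6}\ \text{bit/s}} = 0.04\ \text{s} = 40\ \text{ms}.
$$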

Fig. 2. Network simulation topology. [Figure not reproduced. Labels recoverable from the figure: long-lived traffic nodes; short-lived traffic nodes; unresponsive traffic node; VoIP terminal; r1; "10 Mbps, variable delay"; "10 Mbps, 2 ms"; "2 Mbps"; "5 ms".]

TABLE III
REPORTED METRICS FOR TEST CALLS

                           Call No. 1             Call No. 2
Metric                  Droptail     ARED      Droptail     ARED
loss rate (%)              6.26      1.00         6.04      0.91
discard rate (%)           0.00      0.00         0.00      0.00
burst density (%)         30.95     19.57        31.06     17.62
gap density (%)            0.71      0.75         0.56      0.69
burst duration (ms)      142.33    106.82       143.00    119.47
gap duration (ms)        630.81   7723.91       650.57   8886.5
m2e delay (ms)           354.47    205.2        363.25    202.35
R factor                  50.24     85.30        50.05     85.95
MOSCQ                      2.59      4.21         2.58      4.23
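The burst and gap metrics reported in Table III can be cross-checked against the overall loss rate. The sketch below is a consistency check under simplifying assumptions, not the RFC 3611 computation itself: it assumes a constant packet rate, that bursts and gaps alternate, and that the reported durations are means, so the overall loss rate is approximated by the duration-weighted average of the burst and gap densities.

```python
def overall_loss_rate(burst_density, burst_ms, gap_density, gap_ms):
    """Duration-weighted average of burst and gap loss densities (all in %)."""
    weighted = burst_density * burst_ms + gap_density * gap_ms
    return weighted / (burst_ms + gap_ms)


# Values copied from Table III:
# (burst density %, burst duration ms, gap density %, gap duration ms, loss rate %)
TABLE_III = {
    "call 1, droptail": (30.95, 142.33, 0.71, 630.81, 6.26),
    "call 1, ARED": (19.57, 106.82, 0.75, 7723.91, 1.00),
    "call 2, droptail": (31.06, 143.00, 0.56, 650.57, 6.04),
    "call 2, ARED": (17.62, 119.47, 0.69, 8886.5, 0.91),
}

estimates = {}
for name, (bd, b_ms, gd, g_ms, reported) in TABLE_III.items():
    estimates[name] = overall_loss_rate(bd, b_ms, gd, g_ms)
    print(f"{name}: estimated {estimates[name]:.2f}%, reported {reported:.2f}%")
```

For all four columns the estimate lands within a few hundredths of a percentage point of the reported loss rate, which suggests the table entries are mutually consistent under these assumptions.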

Test voice calls were created using constant bit rate (CBR) traffic. UDP packets carrying 92 bytes of payload (80 for audio samples and 12 for the RTP header) were sent every 10 ms to simulate a G.711 codec. No VAD techniques were used. Calls lasted 3 minutes and were started sequentially, separated by inactivity periods of 10 seconds. At the receiver end, arriving packets were traced to a data file. To measure the quality of service, the trace file was processed after the simulation. We introduced a pseudo-adaptive jitter buffer, scheduling the packet playout time to be the mean delay of arrived packets plus four times the standard deviation of the delays. Late packets were discarded and counted as lost in the calculation of the effective equipment impairment factor. Because of the random location of gaps within the voice call, the recency effect was not taken into account. For the computation of the delay impairment factor, we assumed perfect echo cancellation. The reported metrics were calculated following the methodology provided in [4].
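The playout rule described above (mean delay plus four times the standard deviation) can be sketched as follows. This is a minimal illustration under assumptions the paper does not state: the statistics are taken over the whole trace of one call rather than over a sliding window, the population standard deviation is used, and the function names and sample delays are hypothetical.

```python
import statistics


def playout_deadline(delays_ms, k=4.0):
    """Playout delay: mean one-way delay plus k standard deviations."""
    return statistics.mean(delays_ms) + k * statistics.pstdev(delays_ms)


def apply_jitter_buffer(delays_ms, k=4.0):
    """Return the playout deadline and the percentage of late (discarded) packets."""
    deadline = playout_deadline(delays_ms, k)
    late = sum(1 for d in delays_ms if d > deadline)
    return deadline, 100.0 * late / len(delays_ms)


# Hypothetical trace: 99 packets delayed 50 ms and one outlier delayed 500 ms.
trace = [50.0] * 99 + [500.0]
deadline_ms, discard_rate = apply_jitter_buffer(trace)
print(f"playout deadline = {deadline_ms:.1f} ms, discard rate = {discard_rate:.2f}%")
```

Packets arriving after the deadline are counted as discarded, which corresponds to the discard rate metric of [4]; the deadline itself contributes to the mouth-to-ear delay.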

V. IMPACT OF ACTIVE QUEUE MANAGEMENT

A. Dropping packets

Since dropping is the natural way of signaling congestion back to traffic sources, we analyze this case first. Table III summarizes the reported metrics for two non-overlapping test calls, in the scenario of Fig. 2, using a droptail queue. Test calls were mixed with 50 long-lived TCP flows and web traffic. Reported metrics are defined as in [4], except for the m2e delay, which is the mean one-way delay from mouth to ear. The metrics show a similar behavior in all three cases. The achieved mean opinion score for conversational quality indicates that nearly all users are dissatisfied. Bursty packet loss and delay have almost the same impact on the deterioration of voice quality. The main contributors to delay are the queue delay and the jitter buffer delay. The high network load causes queue fluctuations near the tail, producing bursty packet loss, delay and delay variations. Fig. 3 shows the network delay for the voice packets of one test call. Observe that the delay is highly variable, which requires a large buffer to absorb jitter. On the other hand, high levels of queue occupancy result in full link utilization. If we reduce the queue delay and its variations while maintaining or reducing packet loss, we can expect a significant improvement in voice quality. In the rest of this section, we analyze the impact of active queue management on VoIP quality of service. Bearing in mind that byte-mode operation could be favorable for voice packets, we consider it first and defer the study of packet-mode operation to Section VI.

Fig. 3. Network delay (ms) for voice packets using a droptail queue.

Keeping the network configuration as above and enabling ARED in the routers of Fig. 2, we achieved a significant reduction in network delay. As illustrated in Fig. 4, ARED stabilized the queue delay very close to the target value of 40 ms. The mean network delay is 58.2 ms, with a standard deviation of 35.65 ms. Note that the network delay for packets that pass through the queue without waiting in Fig. 2 is 9.67 ms, so the mean queue delay is 48.53 ms. While the delay reduction is an expected result, the dramatic decrease in loss rate is a very surprising one (see Table III). It is caused by the combination of increased time between bursts (gap duration) and lower burst density. This means that ARED is able to stabilize the queue size around the desired target without imposing an aggressive dropping rate, at least for voice packets. The reduction of impairments raises the R factor, so the mean opinion score for conversational quality is above 4.20. This represents a user satisfaction similar to that of the conventional telephone system. The price to be paid is a decrease in the observed link utilization, which goes down to 95%. In a practical situation this effect could be mitigated by adjusting the target queue size to a reasonable value in view of the delay-utilization tradeoff.
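The 9.67 ms delay without queuing can be reproduced from the link labels of Fig. 2 under a few assumptions that the text does not state explicitly: the voice path is taken to cross a 10 Mbps, 2 ms access link, a 2 Mbps, 5 ms bottleneck and a second 10 Mbps, 2 ms access link, and each 92-byte RTP packet is assumed to carry 28 bytes of UDP and IP headers with no link-layer overhead. The sketch below is a plausibility check, not the authors' calculation.

```python
RTP_PACKET_BYTES = 92            # 80 bytes of audio + 12-byte RTP header (from the paper)
UDP_IP_HEADER_BYTES = 8 + 20     # assumption: UDP (8) and IPv4 (20) headers
PACKET_BYTES = RTP_PACKET_BYTES + UDP_IP_HEADER_BYTES

# Assumed voice path: (link rate in bit/s, propagation delay in ms).
PATH = [(10e6, 2.0), (2e6, 5.0), (10e6, 2.0)]


def transmission_ms(size_bytes, rate_bps):
    """Time to serialize one packet on a link, in milliseconds."""
    return 1000.0 * 8.0 * size_bytes / rate_bps


base_delay_ms = sum(transmission_ms(PACKET_BYTES, rate) + prop for rate, prop in PATH)

MEAN_NETWORK_DELAY_MS = 58.2     # reported mean network delay with ARED
mean_queue_delay_ms = MEAN_NETWORK_DELAY_MS - base_delay_ms

print(f"delay without queuing = {base_delay_ms:.2f} ms")
print(f"mean queue delay      = {mean_queue_delay_ms:.2f} ms")
```

Under these assumptions the delay without queuing comes to about 9.67 ms, and subtracting it from the 58.2 ms mean network delay gives the 48.53 ms mean queue delay quoted in the text.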

Fig. 4. Network delay (ms) for voice packets using Adaptive RED.

Further simulations reveal that ARED performs remarkably well under different network loads. Fig. 5 shows the MOS obtained when the number of long-lived TCP flows is varied from 10 to 100. In all cases, the MOS value is above 4.03, the lower limit for reaching an adequate user satisfaction.

Fig. 5. Mean opinion score for conversational quality with different network loads. Dropping packets. Simulation results with 95% confidence.

When the number of long-lived web flows is not greater than ten, the mean opinion score for conversational quality achieved using ARED agrees with that under heavy traffic load conditions. Additional experiments lowering the offered load below the bottleneck link capacity result in an adequate user satisfaction.

B. Using ECN

We now analyze the effect of active queue management on voice quality when ECN-capable packets are marked instead of dropped. In this case, an important matter is how the AQM scheme treats unresponsive traffic when a mark event occurs. The implementation used for ARED drops non-ECN-capable packets when needed, in contrast with other schemes that let them pass. This approach helps to prevent unresponsive traffic from starving other flows of bandwidth, but it could impose an excessive dropping rate on voice packets. On the other hand, when no action is taken on misbehaving flows, it is more difficult to control the queue length. Both aspects (drop rate and queue length) affect the perceived quality of voice.

Repeating the experiment, now with ECN, we obtain the results shown in Fig. 6. The mean opinion score when using ARED does not differ from that obtained with the drop regime (note that Fig. 6 is nearly identical to Fig. 5). Experimenting with 50 long-lived flows gives a mouth-to-ear delay of around 190 ms, which is lower than before, while the dropping rate remains almost the same. This means that the active queue management scheme wisely marks packets to control the sending rate of TCP sources. The observed utilization is 96%, slightly better than that reached when dropping packets.

Fig. 6. Mean opinion score for conversational quality with different network loads. Marking ECN capable packets. Simulation results with 95% confidence.

VI. OTHER SIMULATIONS

The simulations in Section V were carried out with AQM operating in byte mode. Here, we first repeat the simulations in packet mode. Results for one test call are shown in Table IV. When ARED is used in packet mode, we observe a slight decrease in the mean mouth-to-ear delay with practically no change in the dropping behavior. The link utilization remains at 95%.

Our last experiment was conducted mixing 30 TCP flows with unresponsive traffic. The traffic load of the misbehaving flows was set to a quarter of the bottleneck link capacity. For the droptail queue we found an exaggerated loss rate, producing a MOSCQ of 2.53. ARED significantly improves on the droptail score (see Table IV), achieving a mean opinion score of 3.95, with an observed link utilization of about 97%.

TABLE IV
OTHER REPORTED METRICS WITH ARED

Metric                 Packet Mode   Unresponsive Traffic
loss rate (%)                 1.02                   1.78
discard rate (%)              0.00                   0.00
burst density (%)            19.54                  19.14
gap density (%)               0.71                   0.84
burst duration (ms)         116.15                 138.81
gap duration (ms)          6554.81                2510.29
m2e delay (ms)              197.35                 231.72
R factor                     85.83                  78.05
MOSCQ                         4.22                   3.95

VII. CONCLUSION

We performed extensive simulations to assess the impact of active queue management on VoIP quality of service. Network impairments were related to user perception through well-known algorithmic models. One of the most representative AQM schemes, ARED, was studied. Our main results are:

• ARED operating in byte mode provides excellent treatment to voice packets even under heavy load conditions, offering a perceived voice quality similar to that of the conventional telephone system.
• When ECN is used, the results from ARED are comparable to those under the dropping regime.
• There is no variation in the perceived voice quality when operating in packet mode.

We conclude, based on simulation, that the use of active queue management schemes such as ARED significantly increases the perceived quality of voice for calls passing through congested links. Nevertheless, the complexity of global networks cannot be accurately captured by simulation models. Further work is needed to assess the impact of AQM on voice over IP QoS in real scenarios.

REFERENCES

[1] K. Ramakrishnan, S. Floyd, D. Black, The Addition of Explicit Congestion Notification (ECN) to IP, RFC 3168, September 2001.
[2] S. Floyd and V. Jacobson, “Random early detection gateways for congestion avoidance”, IEEE/ACM Trans. Networking, vol. 1, no. 4, August 1993, pp. 397–413.
[3] ITU-T Recommendation G.107, The E-model, a computational model for use in transmission planning, March 2005.
[4] T. Friedman, R. Caceres, A. Clark, RTP Control Protocol Extended Reports (RTCP XR), RFC 3611, November 2003.
[5] M. May, T. Bonald, and J.-C. Bolot, “Analytic evaluation of RED performance”, in Proc. IEEE INFOCOM, March 2000.
[6] K. Carlberg, I. Brown, C. Beard, Framework for Supporting Emergency Telecommunications Service (ETS) in IP Telephony, RFC 4190, November 2005.
[7] ITU-T Recommendation P.800, Methods for subjective determination of transmission quality, August 1996.
[8] S. Floyd, R. Gummadi, S. Shenker, Adaptive RED: An Algorithm for Increasing the Robustness of RED’s Active Queue Management, August 1, 2001. [Online]. Available: http://www.icir.org/floyd/papers/adaptiveRed.pdf
[9] L. Le, J. Aikat, K. Jeffay, F. D. Smith, “The Effects of Active Queue Management and Explicit Congestion Notification on Web Performance”, ACM SIGCOMM, 2003.
[10] R. Ramjee, J. Kurose, D. Towsley, and H. Schulzrinne, “Adaptive playout mechanisms for packetized audio applications in wide-area networks”, in Proc. IEEE INFOCOM ’94, vol. 2, June 1994, pp. 680–688.
[11] J. Rosenberg, L. Qiu, and H. Schulzrinne, “Integrating packet FEC into adaptive voice playout buffer algorithms on the internet”, in Proc. IEEE INFOCOM 2000, vol. 3, Tel Aviv, Israel, Mar. 2000, pp. 1705–1714.
[12] C. J. Sreenan, J.-C. Chen, P. Agrawal, and B. Narendran, “Delay reduction techniques for playout buffering”, IEEE Trans. Multimedia, vol. 2, June 2000, pp. 88–100.
[13] ITU-T Contribution COM12-D139, France Télécom R&D (Q14/12), Study of the relationship between instantaneous and overall subjective speech quality for time-varying quality speech sequences: influence of a recency effect, May 2000.
[14] C. Perkins, O. Hodson, and V. Hardman, “A survey of packet loss recovery techniques for streaming audio”, IEEE Network, vol. 12, no. 5, September 1998, pp. 40–48.
[15] B. W. Wah, X. Su, D. Lin, “A survey of error-concealment schemes for real-time audio and video transmissions over the Internet”, in Proc. Int. Symposium on Multimedia Software Engineering, IEEE, Taipei, Taiwan, December 2000, pp. 17–24.
[16] ITU-T Recommendation P.862, Perceptual Evaluation of Speech Quality (PESQ).
[17] ITU-T Recommendation P.861, Objective quality assessment of telephone band speech codecs (PSQM).
[18] ETSI TR 101 329-5, Quality of Service (QoS) measurement methodologies, January 2000.
[19] L. Breslau, D. Estrin, K. Fall, S. Floyd, J. Heidemann, A. Helmy, P. Huang, S. McCanne, K. Varadhan, Y. Xu, and H. Yu, “Advances in network simulation”, IEEE Computer, vol. 33, no. 5, May 2000, pp. 59–67.