Telecommun Syst DOI 10.1007/s11235-010-9356-5
Quality of experience (QoE) driven adaptation scheme for voice/video over IP
E. Jammeh · I. Mkwawa · A. Khan · M. Goudarzi · L. Sun · E. Ifeachor
© Springer Science+Business Media, LLC 2010
Abstract Network quality of service (NQoS) in IP networks is unpredictable and affects the quality of networked multimedia services. Adaptive voice and video schemes are therefore vital for providing voice over IP (VoIP) services with optimised quality of experience (QoE). Traditional adaptation schemes based on NQoS do not take perceived quality into consideration, even though the user is the best judge of quality. Additionally, uncertainties inherent in NQoS parameter measurements make such schemes difficult to design and their performance suboptimal. This paper presents a QoE-driven adaptation scheme for voice and video over IP that addresses this optimisation problem and provides optimal QoE for networked voice and video applications. The adaptive VoIP architecture was implemented and tested both in NS2 and in an Open IMS Core network, allowing extensive simulation and test-bed evaluation. Results show that the scheme responded optimally to available network bandwidth and congestion for both voice and video, optimised delivered QoE under different network conditions, and was friendly to TCP traffic.
Keywords VoIP · IMS · AMR · QoE · NQoS · Quality adaptation
E. Jammeh · I. Mkwawa · A. Khan · M. Goudarzi · L. Sun · E. Ifeachor
Centre for Signal Processing and Multimedia Communication, School of Computing, Communication and Electronics, University of Plymouth, Plymouth, PL4 8AA, UK
1 Introduction
The impact and market penetration of VoIP (voice and video calls) have influenced traditional telephony, as witnessed by the popularity of VoIP services (e.g. Skype [1]). Next-generation multimedia services are likely to be marketed on how well they meet users' QoE expectations, and their market success will depend on their ability to optimise perceived QoE. In addition, users' demand for VoIP quality and the amount of multimedia traffic have continued to increase. However, network quality of service (NQoS) generally cannot be guaranteed because of the unpredictable characteristics of IP networks, and this affects the QoE of networked multimedia services. Service and network providers therefore employ adaptive management systems that optimise quality by compensating for network impairments.
Traditional adaptation schemes are not QoE-driven but are based on NQoS parameters such as delay, jitter and packet loss [2, 3]. They do not take perceived quality into consideration, even though much research has gone into developing voice quality prediction models [4], which are yet to be extensively exploited in adaptive architectures. Furthermore, measurement uncertainties inherent in NQoS parameters make NQoS-driven adaptation schemes suboptimal.
In contrast with traditional NQoS-driven adaptation schemes, this paper proposes a QoE-driven adaptation scheme for optimising perceived voice/video quality, i.e. end users' quality of experience (QoE). The proposed end-to-end QoE-based approach can be used not only by service and application providers (e.g. in an application/content server or a voice/video client), but also by network providers to design and provide appropriate network devices and solutions (e.g. QoE-driven priority marking, admission policies, scheduling/queueing schemes, or QoE-driven network management systems). The QoE-driven adaptation scheme was implemented in NS2 and in an Open IMS Core network to allow extensive simulation and test-bed evaluation. The results show that the scheme was optimally responsive to available network bandwidth and congestion, that delivered QoE was optimised for different network conditions and available bandwidths, and that the scheme was friendly to TCP traffic.
The remainder of this paper is organised as follows. Section 2 presents related work and Sect. 3 presents the system architecture. The implementation of the adaptive scheme in NS2 and the simulation results are presented in Sect. 4. The test-bed implementation and evaluation results are presented in Sect. 5, and conclusions are given in Sect. 6.
2 Related work
Adaptive rate control is an important class of VoIP QoS adaptation techniques: it matches the transmission rate to the available network capacity, thereby minimising congestion impairments and optimising delivered voice quality for the available bandwidth. Different types of adaptive sources based on dynamic rate control have been proposed and used for multimedia streams [5–8]. Rate-adaptive VoIP and video applications have traditionally been based on network quality of service (NQoS) [9, 10], on perceived quality of service (PQoS) or on quality of experience (QoE). In [11], an end-to-end QoS-factor transmission control mechanism based on RTCP-XR [12] was proposed to manage speech quality in real time. The scheme has a scalability shortcoming because measuring QoS in a large infrastructure may not be practically feasible. Hybrid adaptive algorithms that adjust the transmission rate of a VoIP source according to both network parameters and perceptual quality, such as [13], have been proposed and shown to give better speech quality while improving bandwidth utilisation. Rate control mechanisms based only on network impairments, such as [14, 15], may not be sufficient to provide optimum QoS in terms of the voice quality perceived by the user. In [15], an adaptive transmission rate algorithm based on network measurements is proposed, which selects a transmission rate based on the network capacity detected from temporal congestion. However, this algorithm does not accurately consider the effects of end-to-end jitter buffer delay and can make wrong decisions when network delays increase without affecting the perceptual quality of the speech. The mechanism in [16] dynamically adjusts the encoding bit rate based on the feedback
information about network congestion from RTCP packets and on speech properties. In [17], a quality of service control scheme was proposed that combines rate adaptation (through the Adaptive Multi-Rate (AMR) codec) with priority marking of voice packets. One limitation of this approach is that it requires a global controller and a full-reference quality measurement model, which is not practical in current mobile or next-generation networks.
For video, among the various encoding parameters that play a significant role in QoE (e.g. send bitrate), the content dynamics (i.e. the spatial and temporal activity of the content) are critical for the final perceptual outcome. The inter-relationships between the video send bitrate, the activity of the content and QoE are not well understood and have been relatively little researched. In [18], the authors proposed content-based video adaptation in which a machine learning method extracts content features from compressed video streams. In [19, 20], the authors propose content-based adaptation using an optimum adaptation trajectory. Similarly, [21] proposes video adaptation based on a utility function obtained from content characteristics, whereas [22] proposes context-aware computing to adapt video content accessed by users with differing device capabilities. However, these works consider only the video content, without considering the impact of network conditions. This results in over-provisioning of bandwidth, as content providers usually send video at the highest send bitrate because the initial video quality requirement is not well understood. In [23], the authors propose an adaptive fuzzy rate control feedback algorithm based on packet loss rate and congestion notification from routers, but they did not consider the initial optimum encoding rate of the video.
In [24], a model based on dynamic bitrate control is proposed to estimate the subjective quality of video streaming. In [9], the authors proposed a bitrate control scheme based on congestion feedback over the Internet. In [25–27], the authors presented adaptation based on network state and congestion control over UMTS transport channels. The authors of [28] presented an adaptive bandwidth allocation scheme based on queue length and packet loss probability. A scheme based on packet dispersion instead of packet loss is presented in [29], using fuzzy rules in combination with a transcoder to adapt the video bitrate. Recent studies [9, 25] have proposed sender bitrate adaptation schemes over WLAN and UMTS networks. Both of these schemes control the sender bitrate from network statistics and are evaluated against network throughput. Most of these schemes do not take the video content into account, although the dynamics of the content are critical for the final perceptual outcome. In addition, most of them aim mainly to minimise end-to-end packet loss and/or delay. This optimisation is based on NQoS parameters, which are not directly linked to the end user's perceived quality or to QoE metrics.
Compared to these schemes, our proposed adaptation scheme is driven by users' QoE via the proposed prediction model, takes account of network congestion (via network statistics) and is evaluated against the perceived quality (QoE) of the user. Our results show QoE gains when adaptation is applied; the scheme is responsive to network congestion and fairly easy to implement in real time. It also takes the video content type into account and hence defines the initial send bitrate (SBR) requirement, minimising over-provisioning.
3 System architecture
Figure 1 shows a conceptual diagram of a QoE-driven adaptation scheme for a voice/video over IP system. Adaptation can take place at the sender side, e.g. by adapting the send bit rate and/or the packetisation interval (e.g. one or two speech frames per RTP packet), or at the receiver side, e.g. by automatically adapting the jitter buffer algorithm or jitter buffer size to achieve an optimised end-to-end QoE. At the top of the figure, an intrusive voice/video quality measurement block is used to measure voice/video quality under different network conditions (e.g. different packet loss, jitter and delay) or different application settings (e.g. different codec type, content type, sender bit rate, frame rate (for video), resolution (for video), packetisation interval, jitter buffer size and jitter buffer algorithm). The measurement is based on comparing the reference and the degraded voice/video signals. Perceptual Evaluation of Speech Quality (PESQ) [30] is used to measure voice quality in this paper. Ideally, Perceptual Evaluation of Video Quality (PEVQ) should be used to measure video quality, but as PEVQ is not available in the public domain, Peak Signal-to-Noise Ratio (PSNR) is used instead to prove the concept.
Fig. 1 Conceptual diagram of QoE-driven adaptation scheme
The intrusive voice/video quality measurements (PESQ/PSNR) are used at the training stage to derive a non-intrusive QoE prediction model and/or a rate-adaptive control mechanism, based on neural networks and/or statistical methods (e.g. non-linear regression). The derived QoE prediction model can predict voice/video quality (in terms of MOS) from network QoS parameters (e.g. packet loss, delay, jitter) and application-related parameters (e.g. codec type, content type, send bit rate and buffer size). The predicted QoE metrics, together with network QoS parameters, can be used in a QoE-driven control scheme to control the jitter buffer at the receiver side or the send bit rate at the sender side (as shown in Fig. 1). Feedback information can be sent through extended RTCP reports (RTCP XR) or SIP/SDP QoS reports. In the case of video over IP, video content classification (e.g. classifying video as fast, medium or slow movement) can be carried out on the raw video at the sender side. This information can be combined with feedback information from the receiver for rate-adaptive control or packetisation interval control.
In Fig. 1, the two most important components are the QoE prediction model and the adaptive control mechanisms, which are also the focus of this paper. The QoE prediction model is used to derive or learn the relationship between a QoE metric (e.g. the MOS score used in this paper) and network/application-related parameters (e.g. packet loss, delay, jitter, codec type, send bit rate and buffer size), as shown below.
$$\mathrm{MOS} = f(\text{loss}, \text{delay}, \text{jitter}, \text{codec type}, \text{send bit rate}, \ldots)$$
Table 1 Fitting parameters for Ie vs. packet loss for AMR codec (based on PESQ)
Para.    MR122    MR102    MR795    MR74     MR67     MR59     MR515    MR475
(kbps)   (12.2)   (10.2)   (7.95)   (7.4)    (6.7)    (5.9)    (5.15)   (4.75)
a        22.9     21.14    22.8     22.63    22.86    23.41    25.83    26.46
b        0.3054   0.362    0.2198   0.2113   0.1799   0.148    0.1002   0.0879
c        10.07    13.23    19.5     20.76    23.79    27.36    30.45    32.42
R²       0.9997   0.9999   0.9998   0.9999   0.9999   0.9999   0.9999   0.9998
In our previous work [31, 32], we developed voice/video quality prediction models using neural network and regression methods, taking some key network/application parameters into consideration. The models were derived using intrusive voice/video quality measurement algorithms (i.e. PESQ for voice and PSNR for video). For voice over IP, this paper uses the voice quality prediction model developed for the AMR codec (with 8 modes) in relation to packet loss rate and one-way delay. The model can be expressed by the following equations:
$$\mathrm{MOS} = \begin{cases} 1, & R \le 0 \\ 1 + 0.035R + R(R-60)(100-R) \times 7 \times 10^{-6}, & 0 < R < 100 \\ 4.5, & R \ge 100 \end{cases} \tag{1}$$
where R, the rating factor, is given by
$$R = R_0 - I_d - I_e \tag{2}$$
and the delay impairment Id is given by
$$I_d = 0.024d + 0.011(d - 177.3)\,H(d - 177.3) \tag{3}$$
$$H(x) = \begin{cases} 0, & x < 0 \\ 1, & x \ge 0 \end{cases} \tag{4}$$
where d is the one-way delay measurement and Ie is the equipment impairment given by
$$I_e = a \ln(1 + b\,p) + c \tag{5}$$
where p is the average packet loss rate and a, b and c are constants dependent on the AMR mode, as shown in Table 1. For some codecs (such as G.711, G.729 and G.723.1), it is also possible to use the equipment impairment Ie directly from the extended E-model [33], which can take burst packet loss, rather than only random packet loss, into account to provide a more accurate voice quality prediction. For random packet loss,
$$I_e = I_{e\text{-}c} + (95 - I_{e\text{-}c}) \frac{P_{pl}}{P_{pl} + B_{pl}} \tag{6}$$
where Ie−c is the equipment impairment (i.e. codec quality), Bpl is the packet loss robustness factor and Ppl is the packet loss rate in percent. For bursty packet loss,
$$I_e = I_{e\text{-}c} + (95 - I_{e\text{-}c}) \frac{P_{pl}}{P_{pl}/BurstR + B_{pl}} \tag{7}$$
where BurstR is the burst ratio, defined as the ratio of the average length of observed bursts in an arrival sequence to the average length of bursts expected under random loss. When packet loss is random, BurstR = 1; when packet loss is bursty, BurstR > 1. According to the E-model standard, (6) is accurate only for relatively small packet loss rates, below 3%. The values of Ie−c and Bpl proposed by the ITU for G.711 and G.723 were used in this paper (e.g. for G.711, Ie−c = 0 and Bpl = 4.3).
For video over IP, we developed video quality prediction models that predict the MOS score from network/application parameters such as sender bit rate (SBR), frame rate (FR) and packet error rate (PER) for three video content types: slow movement (SM), gentle walking (GW) and rapid movement (RM) video clips. The prediction is given by a rational model with a logarithmic function:
$$\mathrm{MOS}_v = \frac{a_1 + a_2\,\mathrm{FR} + a_3 \ln(\mathrm{SBR})}{1 + a_4\,\mathrm{PER} + a_5\,\mathrm{PER}^2} \tag{8}$$
The metric coefficients were obtained by a linear regression of the prediction model against our training set (MOS values obtained by objective evaluation using PSNR). The coefficients for all three content types are given in Table 2. In the following sections, we demonstrate how the above voice/video quality prediction models are applied to develop adaptation control mechanisms that optimise end-to-end QoE, both in NS-2 simulation and in a voice/video over IP test bed.
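As an illustration only, Eqs. (1) to (7) and the Table 1 coefficients can be combined into a small Python sketch of the voice quality prediction. The text does not give a value for R0, so the sketch assumes R0 = 93.2, a value often used in simplified E-model calculations; it also assumes that the packet loss rate p in (5) is in percent and the delay d in milliseconds. The function names are hypothetical.

```python
import math

# Table 1: (a, b, c) for Ie = a*ln(1 + b*p) + c, keyed by AMR mode.
AMR_IE_PARAMS = {
    "MR122": (22.9, 0.3054, 10.07),
    "MR102": (21.14, 0.362, 13.23),
    "MR795": (22.8, 0.2198, 19.5),
    "MR74": (22.63, 0.2113, 20.76),
    "MR67": (22.86, 0.1799, 23.79),
    "MR59": (23.41, 0.148, 27.36),
    "MR515": (25.83, 0.1002, 30.45),
    "MR475": (26.46, 0.0879, 32.42),
}

# ASSUMPTION: R0 is not given in the text; 93.2 is a commonly used default.
R0_ASSUMED = 93.2


def mos_from_r(r: float) -> float:
    """Eq. (1): map the rating factor R to a MOS score."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6


def delay_impairment(d_ms: float) -> float:
    """Eqs. (3)-(4): delay impairment Id for one-way delay d (ms assumed)."""
    h = 1.0 if d_ms - 177.3 >= 0 else 0.0  # Heaviside step H(d - 177.3)
    return 0.024 * d_ms + 0.011 * (d_ms - 177.3) * h


def ie_amr(mode: str, loss_pct: float) -> float:
    """Eq. (5): AMR equipment impairment; loss rate ASSUMED to be in percent."""
    a, b, c = AMR_IE_PARAMS[mode]
    return a * math.log(1 + b * loss_pct) + c


def ie_extended(ie_c: float, bpl: float, ppl: float, burst_r: float = 1.0) -> float:
    """Eqs. (6)-(7): extended E-model Ie; burst_r = 1 gives the random-loss Eq. (6)."""
    return ie_c + (95 - ie_c) * ppl / (ppl / burst_r + bpl)


def voice_mos(ie: float, d_ms: float, r0: float = R0_ASSUMED) -> float:
    """Eq. (2) R = R0 - Id - Ie, followed by Eq. (1)."""
    return mos_from_r(r0 - delay_impairment(d_ms) - ie)


# Example: AMR 12.2 kbps mode, 2% loss, 80 ms one-way delay.
print(round(voice_mos(ie_amr("MR122", 2.0), 80.0), 2))
```

Keeping Ie as a separate argument of voice_mos lets the same R-to-MOS path be reused with either the AMR fit of (5) or the extended E-model of (6)/(7).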
Fig. 2 Implemented VoIP architecture in NS2
Table 2 Coefficients of metric models for all content types
Coeff.   SM        GW        RM
a1       4.5796    3.4757    3.0946
a2       −0.0065   0.0022    −0.0065
a3       0.0573    0.0407    0.1464
a4       2.2073    2.4984    10.0437
a5       7.1773    −3.7433   0.6865
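Equation (8) with the Table 2 coefficients can likewise be sketched as a single Python function. This is an illustrative sketch: the name video_mos is hypothetical, PER is taken as a fraction (as in Sect. 4.2.1, where a PER of 0.1 corresponds to 10%), and SBR is assumed to be in kbps, which the text does not state.

```python
import math

# Table 2: (a1, a2, a3, a4, a5) of Eq. (8), keyed by content type.
VIDEO_COEFFS = {
    "SM": (4.5796, -0.0065, 0.0573, 2.2073, 7.1773),
    "GW": (3.4757, 0.0022, 0.0407, 2.4984, -3.7433),
    "RM": (3.0946, -0.0065, 0.1464, 10.0437, 0.6865),
}


def video_mos(content: str, fr: float, sbr_kbps: float, per: float) -> float:
    """Eq. (8): predicted video MOS.

    fr: frame rate (f/s); sbr_kbps: send bitrate (kbps is an ASSUMED unit);
    per: packet error rate as a fraction, e.g. 0.1 for 10%.
    """
    a1, a2, a3, a4, a5 = VIDEO_COEFFS[content]
    numerator = a1 + a2 * fr + a3 * math.log(sbr_kbps)
    denominator = 1 + a4 * per + a5 * per ** 2
    return numerator / denominator


# Example: gentle-walking content at 30 f/s and 512 kbps with 5% PER.
print(round(video_mos("GW", 30, 512, 0.05), 2))
```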
4 Simulation
4.1 Voice
The NS2 network simulator [34] lacks mature models for simulating VoIP quality adaptation, so part of this work was to enhance NS2 with modules that enable accurate simulation and performance evaluation of adaptive voice applications, as shown in Fig. 2. The VoIP source is modelled as a series of talkspurt/silence periods with configurable durations drawn from exponential distributions; talkspurts are sent to the encoder module to generate voice frames. Generated voice frames are sent to a packetisation module, which adds an RTP header to each frame before passing the packets to an NS2 agent for onward transmission across the simulated network to a receiver module. Received packets are depacketised and forwarded to a de-jitter (jitter buffer) module that minimises the short-term effects of network impairments. Packet loss is inadequate for signalling incipient congestion, and basing adaptation decisions on packet loss leads to suboptimal PQoS [35]. Delay, on the other hand, provides multi-bit information about the network state and has been shown to work better in network adaptation
schemes. The QoE-driven adaptation scheme takes a congestion level measure (Cl) and PQoS as inputs to the control module. Cl is computed from the minimum and average delay as
$$C_l = 1 - \frac{\mathrm{minOTT}}{\mathrm{avgOTT}} \tag{9}$$
where minOTT and avgOTT are the minimum and average one-way delay, respectively. The monitor module computes NQoS parameters for each packet received within a talkspurt or measurement epoch. The instantaneous voice quality (MOSi) is computed by the voice quality prediction model described in Sect. 3. The maximum achievable voice quality (MOSm) under the current codec setting (e.g. codec mode) is computed using the minimum network delay and zero packet loss, and reflects the voice quality achievable without any congestion. A short-term QoE degradation, Sd = MOSm − MOSi, and a long-term QoE degradation, Ld = MOSm − avgMOSi, due to network congestion are computed and sent as feedback to the source together with Cl. When the source receives this feedback, it runs the control algorithm (Algorithm 1 in Appendix A) to determine an appropriate AMR codec [36] mode.
4.1.1 Results
Figure 3 depicts the network setup for evaluating QoE-driven adaptation of VoIP streams across an IP network. The ‘dumbbell’ topology was used, and all side links were set to a propagation delay of 1 ms and configured to ensure that congestion occurs only at a configurable tight link. The router at the tight link is configured with a buffer size equal to the bandwidth-delay product to avoid packet loss caused by an undersized buffer. A drop-tail queueing policy was implemented at all nodes.
Fig. 3 Simulation setup
The ability of the scheme to maximise delivered QoE was tested by abruptly reducing the available bandwidth from 20 kbps to 14 kbps during a VoIP session. (After the addition of RTP headers, the sending bit rate of the AMR codec varies from 10 kbps in mode 1 to 17.5 kbps in mode 8.) The performance of the QoE-driven adaptation scheme was compared with that of a session without adaptation, and the result is shown in Fig. 4. The QoE-driven scheme quickly detected the change in available bandwidth and adapted the sending AMR mode accordingly, which resulted in optimised quality for the available bandwidth. The adaptive scheme provided an MOS gain of over 1 compared to no adaptation.
Fig. 4 Comparison of voice quality for 14 kbps for adaptive and non-adaptive VoIP
The bottleneck bandwidth was then set to different capacities from 4 kbps to 20 kbps, and the performance of the adaptive scheme was evaluated and compared with a non-adaptive VoIP session. The result of the comparison is shown in Fig. 5. Instead of suffering a sudden, drastic drop in quality as bandwidth is constricted, as happens without control, the adaptive VoIP session gracefully adapted the delivered quality to the available network bandwidth. For example, for a bottleneck bandwidth of 12 kbit/s, the adaptive VoIP session achieved an MOS score of 3.15, about twice the 1.5 obtained without any control. The delivered QoE of the adaptive session was always the maximum possible, which shows that the adaptive algorithm ensured that the maximum possible quality was always delivered for a given bandwidth, as shown in Fig. 5.
Fig. 5 Comparison of voice quality results for different bottleneck bandwidths
VoIP services are becoming very popular, and the volume of VoIP flows has increased over the past few years. This increase has raised concerns that VoIP flows may cause network instability or congestion collapse, because they do not generally use TCP. TCP flows constitute the majority of Internet traffic, and their congestion control mechanism has been responsible for preventing congestion collapse [37]. It has consequently been generally accepted that all adaptive networked multimedia applications should be friendly to TCP traffic: UDP, which is used by VoIP flows, does not perform any congestion control, so new protocols or services should be friendly to TCP flows in order to maintain network stability. TCP-friendliness means that a flow should consume bandwidth comparable to that of TCP flows under the same network conditions; when it shares a network segment with TCP flows, it should not starve them of bandwidth. The friendliness of the adaptive VoIP was tested by forcing it to coexist with FTP traffic between 100 and 200 seconds and between 300 and 400 seconds of a 500-second VoIP session. The control algorithm quickly detected the presence of FTP traffic and reduced the sending rate, which reduced PQoS; it also detected the absence of FTP traffic and increased the sending rate, which increased PQoS, as shown in Fig. 6. This behaviour shows that the adaptive VoIP flow is fair to TCP flows: rather than using all the bandwidth it needs to deliver maximum quality, the adaptive voice session shared the available bandwidth with the FTP traffic.
Fig. 6 Voice quality for an adaptive VoIP session coexisting with on-off FTP traffic
4.2 Video over IP
The video quality adaptation architecture is described in Sect. 3, and the simulation used the ‘dumbbell’ topology shown in Fig. 3. The simulation was carried out in NS2, enhanced with the open-source Evalvid framework [38]. There are two TCP background traffic sender nodes and an H263-encoded video source. Both links have a bandwidth of 10 Mbps and a latency of 1 ms. The maximum video packet size was 1024 bytes, and packets were delivered over a random uniform error model. The video sequences used in the simulation were encoded in CIF format (352 × 288) at a fixed frame rate of 30 f/s, and were classified in previous work as the Gentle Walking (GW) content type. A block diagram of the video over IP adaptation process is given in Fig. 1. A content feature extraction module extracts spatio-temporal content features from all incoming video [39]. A content classifier then applies cluster analysis [40] to these spatial and temporal features to classify the content into one of three categories (content types). The feedback mechanism analyses feedback information from the network and passes it to a video send bitrate adaptor via periodic RTCP reports. The video send bitrate adaptation module adjusts the rate of the transmitted video according to the information received from the content classifier (content-aware), the feedback mechanism (network-aware) and the delivered QoE (QoE-aware). The encoder then generates the adapted stream at the required quality, based on the decisions of the adaptive controller. Video streams are layer-encoded, and layers are added or dropped according to the content type and network conditions. The pseudo code of the video send bitrate adaptor is given as Algorithm 2 in Appendix B.
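Algorithm 2 itself is given in Appendix B and is not reproduced here. Purely as a hypothetical illustration of how the content type (content-aware), the measured PER (network-aware) and the predicted MOS from Eq. (8) (QoE-aware) could be combined, the Python sketch below picks the lowest send bitrate from a candidate ladder that fits an available-bandwidth estimate and still meets a target MOS; taking the lowest such rate reflects the stated aim of minimising over-provisioning. The bitrate ladder, the bandwidth estimate, the kbps unit and the selection rule are all assumptions, not the authors' method.

```python
import math
from typing import Sequence

# Table 2 coefficients of Eq. (8), repeated so this sketch is self-contained.
VIDEO_COEFFS = {
    "SM": (4.5796, -0.0065, 0.0573, 2.2073, 7.1773),
    "GW": (3.4757, 0.0022, 0.0407, 2.4984, -3.7433),
    "RM": (3.0946, -0.0065, 0.1464, 10.0437, 0.6865),
}


def video_mos(content: str, fr: float, sbr_kbps: float, per: float) -> float:
    """Eq. (8); SBR in kbps and PER as a fraction are ASSUMED conventions."""
    a1, a2, a3, a4, a5 = VIDEO_COEFFS[content]
    return (a1 + a2 * fr + a3 * math.log(sbr_kbps)) / (1 + a4 * per + a5 * per ** 2)


def choose_sbr(
    content: str,
    per: float,
    avail_kbps: float,
    target_mos: float,
    ladder: Sequence[float] = (64, 128, 256, 384, 512),  # HYPOTHETICAL rates
    fr: float = 30.0,
) -> float:
    """HYPOTHETICAL rule: lowest feasible SBR whose predicted MOS meets the target."""
    feasible = sorted(rate for rate in ladder if rate <= avail_kbps)
    if not feasible:
        return min(ladder)  # nothing fits: fall back to the lowest rate
    for rate in feasible:  # ascending, so the first hit minimises over-provisioning
        if video_mos(content, fr, rate, per) >= target_mos:
            return rate
    return feasible[-1]  # target unreachable: best predicted quality that still fits


print(choose_sbr("RM", per=0.0, avail_kbps=1000, target_mos=3.7))
```

Because a3 is positive for all three content types, predicted MOS rises with SBR at a fixed PER, so scanning the ladder in ascending order returns the cheapest rate that satisfies the target.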
4.2.1 Results
Figure 7 compares the adaptive and non-adaptive video in terms of delivered quality (MOS). Figure 7 shows a clear improvement in MOS after adaptation, especially at higher PERs. However, adaptation does not work very well for PERs beyond 0.1 (10%), as video quality degrades rapidly and it is very difficult for send bitrate adaptation techniques to greatly improve delivered video quality without being complemented by Forward Error Correction (FEC) techniques. This is because network congestion affects the video to such an extent that reducing the video send bitrate no longer reduces the loss due to congestion. In such a state, adding loss/error robustness schemes such as FEC to the adaptive scheme can improve the delivered video quality.
Fig. 7 Video quality adaptation
Figure 8 shows the friendliness of the adaptive video towards TCP traffic, tested by sending background FTP traffic during part of the video streaming session. The simulation lasted 10 seconds; FTP traffic started at 2 seconds and ended at 7 seconds. Figure 8 shows that the proposed scheme operates efficiently in mixed traffic conditions and that, with the proposed scheme, video streaming over a wired network behaves in a friendly manner towards a coexisting FTP flow.
Fig. 8 Video quality adaptation
5 The testbed
Fig. 9 The testbed
Figure 9 depicts the overall testbed built to perform VoIP quality adaptation, using an Open IMS Core for RTP session establishment and termination. Two open-source soft phones (IMS-Communicator clients) were used for end-to-end communication. The Linux kernel queueing discipline (qdisc) was used to control the bandwidth between the communicating clients, and voice and video quality adaptation experiments were performed separately. The Open IMS Core is an implementation of the IMS Call Session Control Functions (CSCFs, i.e. P-CSCF, S-CSCF and I-CSCF) and a Home Subscriber Server (HSS), which together form the core elements of all IMS architectures as currently specified by 3GPP, 3GPP2 and ETSI TISPAN. All components are based on open-source software and are used to exchange SIP messages, register users and set up/terminate multimedia sessions. IMS-Communicator is an open-source IMS client written in Java and implemented on top of the JAIN-SIP stack. The IMS-Communicator media stack uses the Java Media Framework (JMF) API and supports the following audio and video codecs: PCMU, G721, GSM, G723, DVI, PCMA, G722, G728, G729, H263, JPEG and H261. For demonstration purposes, this paper uses PCMU, G723 and H263 for VoIP adaptation. To carry out VoIP adaptation, the IMS-Communicator was extended with a Terminal Adaptation Module (TAM) consisting of VoIP monitoring and adaptation functions. The monitoring function periodically reports delivered PQoS (e.g. MOS score for
voice and video) and relevant terminal parameters (e.g. codec type and bit rate) to the communicating client via the Open IMS Core using SIP instant messages. The monitoring function is based on RTCP reports received periodically at 5-second intervals. The adaptation function carries out adaptation actions, which are executed by exchanging SIP re-INVITE or UPDATE requests between the communicating clients via the Open IMS Core. The adaptation actions may range from changing the jitter buffer size, buffer mode or buffer algorithm to selecting the codec type or codec mode. For demonstration purposes, the adaptation action used in this paper is switching between codec types.
5.1 Adaptation mechanism
5.1.1 Voice
A voice quality prediction model, as described in Sect. 3, was embedded into the TAM of the IMS-Communicator, and this model is responsible for monitoring the quality of the VoIP session in real time. The callee of the ongoing VoIP session monitors the PQoS using the equations in Sect. 3. If the voice quality drops below a predetermined MOS threshold for a predetermined duration, the callee sends an alarm to the caller using a SIP instant message (IM); the caller then runs the adaptation scheme to change the codec type (e.g. from G.711 to G.723), so that the appropriate codec is used to optimise PQoS. The caller then sends a re-INVITE request offering the newly chosen codec type to the callee, which replies with a 200 OK response, and the RTP session subsequently uses the new codec. The qdisc queueing discipline is used to adjust the bandwidth of the RTP session between the communicating clients. At the beginning of the experiment, the bandwidth is set to 128 kb/s, which is just enough to allow the G.711 codec to operate without congestion.
However, because of background traffic such as DNS, HTTP, SNMP, CUPS, SIP and ARP queries, the 128 kb/s bandwidth is eventually exceeded and some packets are lost, which degrades the VoIP quality. The packet loss rate and delay are retrieved from the RTCP report every 5 seconds and used by the quality prediction model to compute the delivered PQoS. If the PQoS drops to or below a predetermined MOS value (a MOS score of 3.5 in this paper) for a predetermined duration (until the next RTCP report is received), the callee sends an alarm to the caller using SIP IM, and the caller then runs the adaptation scheme, which selects a codec with higher compression. The use of a higher-compression codec reduces the bitrate of the voice session, which ensures that the combined voice
session and background traffic bandwidth does not exceed the available bandwidth and that virtually no packet loss occurs.
5.1.2 Video
The video quality prediction model given in (8) was used to compute video quality (MOS) for monitoring the PQoS. The video session signalling flow is similar to that for voice; the H263 video codec was used, and adaptation was based on changing the send bit rate (SBR) while keeping the frame rate fixed at 30 fps. During video session setup, the caller sends a SIP INVITE message offering the H263 codec at 512 kb/s and 30 fps to the callee. Assuming that both parties can communicate with this codec, the video session is set up and both users communicate using RTP. If the video quality drops below a predetermined MOS threshold for a predetermined duration, the callee sends a SIP IM to the caller, which runs its adaptation scheme to determine the best video send bitrate. The caller then sends a re-INVITE request offering the new SBR determined by the adaptation scheme, and the RTP video session then uses this new SBR. The qdisc queueing discipline was used to control the bandwidth of the RTP session. At the beginning of the experiment, the bandwidth was set to 512 kb/s, which was just enough to allow H263 at 512 kb/s to operate with the other background traffic. After a few minutes the network became congested, which degraded the video quality. The packet loss rate and delay were retrieved from the RTCP report every 5 seconds and used to predict the PQoS of the delivered video. Once the PQoS drops to or below a predetermined MOS value (a MOS score of 3.0 in this paper) for a predetermined duration (until the next RTCP report is received), the callee sends an alarm to the caller using SIP IM.
The caller then runs the adaptation scheme to determine the best SBR to use given the current PQoS. The newly determined SBR is then used to relieve the network congestion, so that virtually no packet loss is encountered and the delivered PQoS is maximised.
5.2 Experimental results
The experimental results show a gain in both voice and video quality from implementing the VoIP quality adaptation mechanism in the VoIP clients' devices (cf. Figs. 10 and 11). For voice, when a session using the G.711 codec experienced a drop in voice quality below a predetermined MOS value for a predetermined duration, the adaptation mechanism was triggered and hence a
gain in voice quality was experienced. It was also shown that switching from a higher to a lower send bitrate during network congestion clearly improves video quality.
Fig. 10 Voice quality adaptation
Fig. 11 Video quality adaptation
6 Conclusions
Adaptive VoIP schemes are traditionally driven by NQoS rather than by perceived user quality, even though the user is the
best judge of quality and an increasing number of accurate voice/video quality prediction models are available. This paper presented a QoE-driven voice/video adaptation scheme for the optimisation of perceived quality. The scheme was extensively tested through simulation in NS2 and on an Open IMS Core test bed. The results show that the scheme was responsive to the available network bandwidth and delivered the optimum quality for a given available bandwidth. The scheme was also shown to be friendly to TCP traffic.
Acknowledgement This research is supported by the EU FP7 ADAMANTIUM project (contract No. 214751).
Appendix A: Voice adaptation scheme
Algorithm 1 Voice Adaptation Mechanism
if (Cl ≤ 0.2) then
    Increase Mode;
else if (0.2 < Cl ≤ 0.5) then
    if (LD 0.5) then
        Decrease Mode;
end if
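The recoverable parts of Algorithm 1 can be illustrated with a hypothetical Python sketch that steps through the AMR narrowband modes (4.75 to 12.2 kb/s). It assumes that "Increase Mode" and "Decrease Mode" move one step up or down the AMR-NB mode list ordered by bitrate, and it takes Cl as a pre-computed input. Because the comparison in "(LD 0.5)" and any branch for Cl > 0.5 are not recoverable from the text, the sketch replaces the former with a caller-supplied boolean `ld_condition` and simply holds the mode in the latter case; both are placeholders, not the authors' rules.

```python
# AMR narrowband codec modes in kb/s, ordered from lowest to highest bitrate.
AMR_NB_MODES_KBPS = (4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2, 12.2)


def next_amr_mode(mode: int, cl: float, ld_condition: bool) -> int:
    """Hypothetical rendering of the recoverable branches of Algorithm 1.

    `mode` indexes AMR_NB_MODES_KBPS. `ld_condition` stands in for the
    "(LD ... 0.5)" test, whose comparison operator is not recoverable.
    """
    highest = len(AMR_NB_MODES_KBPS) - 1
    if cl <= 0.2:
        return min(mode + 1, highest)  # Increase Mode (capped at 12.2 kb/s)
    if cl <= 0.5:  # here 0.2 < cl <= 0.5
        if ld_condition:
            return max(mode - 1, 0)  # Decrease Mode (floored at 4.75 kb/s)
        return mode
    # Placeholder: the branch for Cl > 0.5 is not recoverable from the
    # text, so this sketch simply keeps the current mode.
    return mode


mode = next_amr_mode(6, cl=0.1, ld_condition=False)
print(mode, AMR_NB_MODES_KBPS[mode])  # 7 12.2
```

Stepping one mode at a time, rather than jumping straight to the lowest bitrate, is itself an assumption of this sketch; it limits abrupt changes in speech quality between RTCP intervals.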
Appendix B: Video adaptation scheme
Algorithm 2 Video Adaptation Mechanism
if (CT = GW) then
    if (SD ≤ 0.7) then SBR = 768 kbps
    else if (0.7 < SD ≤ 0.9) then SBR = 512 kbps
    else if (0.9 < SD ≤ 1.1) then SBR = 348 kbps
    else if (1.1 < SD ≤ 1.3) then SBR = 256 kbps
    else if (1.3 < SD ≤ 1.4) then SBR = 96 kbps
    else if (SD > 1.4) then SBR = 48 kbps
    end if
end if
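Algorithm 2 amounts to a step lookup from SD to a send bitrate, which the caller then offers in its re-INVITE. The Python sketch below is illustrative only: the thresholds and bitrates are copied from Algorithm 2, while representing `CT = GW` as a boolean flag, the function names, and the SDP media lines (H.263 static payload type 34 from RFC 3551, a `b=AS` bandwidth line in kb/s as defined in RFC 4566, and an example port of 5004) are assumptions rather than details given in the paper.

```python
from typing import Optional

# (upper bound on SD, send bitrate in kb/s), transcribed from Algorithm 2.
SBR_STEPS = ((0.7, 768), (0.9, 512), (1.1, 348), (1.3, 256), (1.4, 96))
SBR_FLOOR_KBPS = 48  # used when SD > 1.4


def select_sbr_kbps(sd: float, ct_is_gw: bool = True) -> Optional[int]:
    """Map SD to a send bitrate; None when CT != GW (not covered by Algorithm 2)."""
    if not ct_is_gw:
        return None
    for upper, sbr in SBR_STEPS:
        if sd <= upper:  # bounds are checked in ascending order
            return sbr
    return SBR_FLOOR_KBPS


def video_sdp_media(sbr_kbps: int, port: int = 5004, fps: int = 30) -> str:
    """Media-level SDP lines for a re-INVITE offer (illustrative values)."""
    lines = [
        f"m=video {port} RTP/AVP 34",  # 34 = static RTP payload type for H.263
        f"b=AS:{sbr_kbps}",            # application-specific bandwidth in kb/s
        "a=rtpmap:34 H263/90000",      # H.263 uses a 90 kHz RTP clock
        f"a=framerate:{fps}",          # frame rate held at 30 fps
    ]
    return "\r\n".join(lines) + "\r\n"


print(video_sdp_media(select_sbr_kbps(1.2)))
```

Checking the upper bounds in ascending order reproduces the half-open intervals of Algorithm 2 without repeating each lower bound.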
References
1. Skype. http://www.skype.com, continuously updated.
2. Razavi, R., Fleury, M., & Ghanbari, M. (2007). Fuzzy logic control of adaptive ARQ for video distribution over a Bluetooth wireless link. Adv. Multimedia, 2007(1), 8.
3. Jammeh, E. A., Fleury, M., & Ghanbari, M. (2008). Fuzzy logic congestion control of transcoded video streaming without packet loss feedback. IEEE Transactions on Circuits and Systems for Video Technology, 18(3), 387–393.
4. Sun, L., & Ifeachor, E. (2004). New models for perceived voice quality prediction and their applications in playout buffer optimisation for VoIP networks. In Proceedings of IEEE international conference on communications (IEEE ICC 2004) (pp. 1478–1483), Paris, France, June 2004.
5. Yin, N., & Hluchyj, M. G. (1991). A dynamic rate control mechanism for source coded traffic in a fast packet network. IEEE Journal on Selected Areas in Communications, 9, 1003–1012.
6. Shenker, S. (1995). Fundamental design issues for the future Internet. IEEE Journal on Selected Areas in Communications, 13, 1176–1188.
7. Bolot, J. C., & Vega-Garcia, A. (1996). Control mechanisms for packet audio in the Internet. In INFOCOM ’96, fifteenth annual joint conference of the IEEE computer societies. Networking the next generation (Vol. 1, pp. 232–239).
8. Perkins, C., Hodson, O., & Hardman, V. (1998). A survey of packet loss recovery techniques for streaming audio. IEEE Network, 12, 40–48.
9. Papadimitriou, P., & Tsaoussidis, V. (2007). A rate control scheme for adaptive video streaming over the Internet. In IEEE ICC.
10. Beritelli, F., Ruggeri, G., & Schembra, G. (2002). TCP-friendly transmission of voice over IP. In IEEE international conference on communications (Vol. 2, pp. 1204–1208).
11. Jinsul, K., Ho, H. S., Hyun-Woo, L., Won, R., & Minsoo, H. (2006). QoS-factor transmission control mechanism for voice over IP network based on RTCP-XR scheme. Consumer Electronics, 1–6.
12. Friedman, T., Caceres, R., & Clark, A. (2003). RTP control protocol extended reports (RTCP XR). RFC 3611.
13. Moura, N. T., Vianna, B. A., Albuquerque, C. V. N., Rebello, V. E. F., & Boeres, C. (2007). MOS-based rate adaption for VoIP sources. Communications, 628–633.
14. Rejaie, R., Handley, M., & Estrin, D. (1999). RAP: an end-to-end rate-based congestion control mechanism for realtime streams in the Internet. In INFOCOM ’99, eighteenth annual joint conference of the IEEE computer and communications societies (Vol. 3, pp. 1337–1345).
15. Barberis, A., Casetti, C., Martin, J. C. D., & Meo, M. (2001). A simulation study of adaptive voice communications on IP networks. Computer Communications, 24, 757–767.
16. Sabrina, F., & Valin, J. M. (2008). Adaptive rate control for aggregated VoIP traffic. In Global telecommunications conference: IEEE GLOBECOM (pp. 1–6).
17. Qiao, Z., Sun, L., Heilemann, N., & Ifeachor, E. (2004). A new method for VoIP quality of service control use combined adaptive sender rate and priority marking. In IEEE international conference on communications (Vol. 4, pp. 1473–1477).
18. Wang, Y., Schaar, M., & Loui, A. (2005). Classification-based multidimensional adaptation prediction for scalable video coding using subjective quality evaluation. IEEE Transactions on Circuits and Systems for Video Technology, 15(10).
19. Cranley, N., Murphy, L., & Perry, P. (2004). Lecture notes in computer science: Vol. 3271. Content-based adaptation of streamed multimedia (pp. 39–49). Berlin: Springer.
20. Koumaras, H., Kourtis, A., Martakos, D., & Lauterjung, J. (2007). Quantified PQoS assessment based on fast estimation of the spatial and temporal activity level. Journal of Multimedia Tools and Applications, 34(3).
21. Onur, O., & Alatan, A. (2007). Video adaptation based on content characteristics and hardware capabilities. In Second international workshop on semantic media adaptation and personalization. IEEE Computer Society Press.
22. Manzato, M., & Goularter, R. (2005). Live video adaptation: a context-aware approach. In ACM proceedings on the 11th Brazilian symposium on multimedia and the web.
23. Antoniou, P., Pitsillides, A., & Vassiliou, V. (2007). Adaptive feedback algorithm for Internet video streaming based on fuzzy rate control. In 12th IEEE ISCC’07.
24. Hayashi, T., Kawaguti, G., Okamoto, J., & Takahasi, A. (2006). Subjective quality estimation model for video streaming services with dynamic bit-rate control. IEICE Transactions on Communications, E89-B(2).
25. Alexiou, A., Bouras, C., & Igglesis, V. (2005). A decision feedback scheme for multimedia transmission over 3G mobile networks. In WOCN.
26. Garudadri, H., Chung, H., Srinivasamurthy, N., & Sagetong, P. (2007). Rate adaptation for video telephony in 3G networks. Packet Video.
27. Alexiou, A., Antonellis, D., & Bouras, C. (2007). Adaptive and reliable video transmission over UMTS for enhanced performance. International Journal of Communication Systems, 20, 65–81.
28. Kim, D., & Jun, K. (2006). Dynamic bandwidth allocation scheme for video streaming in wireless cellular networks. IEICE Transactions on Communications, E89-B(2).
29. Jammeh, E., Fleury, M., & Ghanbari, M. (2006). Non-packet-loss-based rate adaptive video over the Internet. Electronics Letters, 42(8), 492–494.
30. ITU (2001). Perceptual evaluation of speech quality (PESQ), an objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs. ITU-T Rec. P.862, February 2001.
31. Sun, L., & Ifeachor, E. (2006). Voice quality prediction models and their applications in VoIP networks. IEEE Transactions on Multimedia, 8, 809–820.
32. Khan, A., Sun, L., & Ifeachor, E. (2008). An ANFIS-based hybrid video quality prediction model for video streaming over wireless networks. In IEEE NGMAST, September 2008.
33. ITU (2000). The E-model, a computational model for use in transmission planning. ITU-T Rec. G.107, July 2000.
34. NS2. http://nsnam.isi.edu/nsnam/.
35. Wei, D. X., Jin, C., Low, S. H., & Hegde, S. (2004). FAST TCP: motivation, architecture, algorithms, performance. In INFOCOM.
36. Ekudden, E., Hagen, R., Johansson, I., & Svedberg, J. (1999). AMR speech coder. In Proc. IEEE workshop on speech coding (pp. 117–119), Porvoo, Finland, June 1999.
37. Yang, Y. R., & Lam, S. S. (2000). General AIMD congestion control. Technical report TR-2000-09. The University of Texas at Austin, May 2000.
38. Klaue, J., Rathke, B., & Wolisz, A. (2003). EvalVid – a framework for video transmission and quality evaluation. In Proc. of the 13th international conference on modelling techniques and tools for computer performance evaluation (pp. 255–272).
39. Khan, A., Sun, L., & Ifeachor, E. (2009). Content-based video quality prediction for MPEG4 video streaming over wireless networks. Journal of Multimedia, 5.
40. DeMartin, J. C. (2001). Source-driven packet marking for speech transmission over differentiated-services networks. In IEEE international conference on acoustics, speech, and signal processing (pp. 753–756).
E. Jammeh received a BEng (1st Class) degree in Electronics Systems Engineering, majoring in Telecommunications, from the University of Essex (UK) in 1998. He then worked for Gambia Telecommunications Company (Gamtel) as a senior engineer responsible for the company’s Standard B Earth Station, with responsibility for all technical issues, from installation and maintenance to the general management of the company's satellite communications equipment. He obtained a PhD in Telecommunications from the University of Essex in 2005. He then worked as senior research staff at the University of Essex on the optimisation of encoded video streams over IP networks, on the EPSRC-funded project “A Fuzzy-logic transcoded video stream controller” (EP/C538692/1). He joined the University of Plymouth in 2008, where he is currently working on the FP7 ADAMANTIUM project.
His research interests are in video coding, compressed video quality evaluation, multimedia networking, IPTV and VoIP, network measurement and performance evaluation tools, and fuzzy logic control. He is the author or co-author of over 20 scientific papers in international journals, peer-reviewed conferences and book chapters. His main area of expertise is multimedia networking, network measurement and performance evaluation.
I. Mkwawa received his PhD in Computing from the University of Bradford in 2004. He is currently working as a postdoctoral research fellow within the EU FP7 ADAMANTIUM project in the School of Computing and Mathematics at the University of Plymouth. He previously worked as a postdoctoral research fellow from July 2006 to May 2008 at the University of Bradford, Department of Computing, on the EU FP6 funded project “VITAL”. He also worked with Science Foundation Ireland as a postdoctoral research fellow at University College Dublin from May 2005 to June 2006, with the objective of predicting collective communication time in heterogeneous clusters of computers. He has published several papers in refereed journals, conference proceedings and book chapters. He has delivered several tutorials on performance modelling and evaluation of heterogeneous networks at various conferences and written several technical reports for the European Network of Excellence FP6 “Design and Engineering of the Next Generation Internet: Towards the Convergence of Multi-Service Heterogeneous Networks” (01/2004 to 12/2006) and the European Network of Excellence FP6 “Design and Engineering of the Future Generation Internet” (12/2006 to 05/2008). He is a member of the IEEE Computer and Communications Societies, and his interests include interconnection networks, VoIP prediction, performance modelling/evaluation, mobile computing, applications and wireless networks, next generation Internet, multimedia systems, parallel and distributed systems and grid computing.
A. Khan graduated in 1992 with a BEng (Hons) in Electrical and Electronic Engineering from the University of Glasgow. In 1993 she was awarded an MSc in Communication, Control and Digital Signal Processing from Strathclyde University. She then worked for British Telecommunications plc from 1993 to 2002 in a management capacity, developing various products and seeing them from inception through to launch. Products she was directly involved with included the first multimedia payphone, the introduction of the £3 phonecard and the development of e-commerce products for the Internet. She is currently working towards the PhD degree in the School of Computing, Communications and Electronics at the University of Plymouth.
Since 2008, she has been a Research Assistant in Perceived QoS Control for New and Emerging Multimedia Services (VoIP and IPTV) on the FP7 ADAMANTIUM project at the University of Plymouth. Her research interests include video quality of service over wireless networks, adaptation, perceptual modelling and content-based analysis.
M. Goudarzi received his MSc degree in Network Systems Engineering from the University of Plymouth in 2008. He is currently pursuing the PhD degree at the University of Plymouth, UK. He is a research assistant in the Signal Processing and Multimedia Communications (SPMC) department. His MSc dissertation focused on voice quality evaluation in 3G mobile networks. His current research interests include multimedia quality measurement, the relationship between voice and video quality in mobile/wireless networks, and the development of new perceptual quality models for voice, video and multimedia content in mobile and next generation networks.
L. Sun received her PhD degree in VoIP speech quality prediction from the University of Plymouth, UK, in 2004. She holds an MSc in Communication and Electronics System (1988) and a BEng in Telecommunications Engineering (1985) from the Institute of Communications Engineering, China. She is currently a Lecturer in Computer Networks in the School of Computing and Mathematics, University of Plymouth, UK. She has been involved in EU FP6/FP7 and industry-funded projects on multimedia communications and networking, and leads the Group’s research in this area. She has published over 50 papers in peer-refereed journals and conference proceedings. Her publications on VoIP speech quality have received more than 150 non-self citations by peer researchers. She is a reviewer for journals such as IEEE Transactions on Multimedia, IET Electronics Letters and IEEE Transactions on Speech and Audio Processing. She has served on the TPCs of a number of international conferences, including IEEE Globecom. Her main research interests include VoIP, QoS, QoE, QoS prediction and control for multimedia over packet, mobile and wireless networks, network performance measurement and characterisation, and multimedia quality management.
E. Ifeachor, a communications engineer, biomedical engineer and computer scientist, is Professor of Intelligent Electronics Systems and Head of Signal Processing & Multimedia Communications (SPMC) research. He is a graduate of Plymouth University and Imperial College, London, and has over 20 years of research experience in signal/information processing and computational intelligence techniques and their applications to real-world problems in communications, audio and biomedicine. Over the years, he has led many government and industry funded projects and published extensively in these areas (including co-authorship of 15 books/book chapters and over 160 papers).
His current research activities include the development of novel techniques for user-perceived quality of service prediction and control for real-time multimedia applications and services (e.g. voice and video over IP networks), grid computing and distributed systems; audio signal processing, biosignals analysis, objective evaluation of intelligent systems and ICT for health. He has served as chair of major events and tracks organised by the IET and IEEE. He was Head of School (1995–1999). He was the coordinator of a recently completed, 30-partner, €6.4m, FP6 project, BIOPATTERN, which was in the area of ICT for Health (2004–2008). Professor Ifeachor has an interest and a good track record in the transfer of ICT and related technologies into industry.