
Synchronisation of Internet Multimedia Streams: Some Issues and Solutions

Hugh Melvin, Liam Murphy

Abstract: Synchronised clocks and circuit switching are basic building blocks at the heart of the traditional PSTN; coupled with dumb terminals, they ensure that media synchronisation is not an issue. The same cannot be said of Internet multimedia, where delays are generally non-deterministic and terminals are much more complex. Previous work by the authors has shown that incorporating synchronised time into VoIP terminals can yield significant gains in voice quality. Related work by the authors has examined the extent to which a lack of synchronisation (or skew) both within and between terminals can affect VoIP quality, and has proposed and tested a high-level solution for skew detection/compensation. In this paper we present a number of more complex scenarios where the lack of clock synchronisation can impact on performance; these include PSTN/VoIP gateways, media mixers that combine media streams, and conferencing services. We describe a number of testbeds currently under development in which the use of synchronised time and the high-level skew detection/compensation approach will be evaluated as a means of dealing effectively with these scenarios.

Keywords: Media Synchronisation, Clock Skew, PSTN-VoIP Gateway, Media Mixers.

H. Melvin is a lecturer at the Department of Information Technology, National University of Ireland, Galway. Liam Murphy is a senior lecturer at the Department of Computer Science, University College Dublin and the director of UCD's Performance Engineering Laboratory Research Group.

I. INTRODUCTION

Delay-sensitive applications such as VoIP typically implement adaptive playout strategies to cope with network jitter. The authors in [1], [2] and [3] propose and successfully test a new hybrid playout algorithm based on synchronised time, provided via the Network Time Protocol (NTP) and RTP Control Protocol (RTCP) Sender Reports (SR). NTP is shown in [2] to adequately meet the level of synchronisation required by the hybrid algorithm. This delay-aware approach enables end-to-end delays to be determined on a per-packet basis and used to select an optimum playout strategy, i.e. adaptive or fixed. Adaptive approaches such as [4], [5] and [6], though useful where precise delay information is not available, will often result in unnecessary late packet loss.

VoIP terminals (such as PC-based soft-phones) typically contain a number of low-grade oscillator crystals. Among them are the system clock, which maintains system time (and can be disciplined by NTP), and, from a multimedia perspective, an audio clock, which sets the sample periods for recording/playback. See Fig. 1 for an overview. We show in [7] that, regardless of whether a receiver playout strategy examines trends in mouth-to-ear (M2E) delay (as with conventional delay-unaware approaches) or actual M2E delays (as with the hybrid approach), audio-system clock skew can introduce significant delay measurement inaccuracies. In addition, a mismatch between sender and receiver audio clocks can lead to buffer overflow/under-fill. In [7], we outline a high-level solution to such skew-related problems which, in common with the hybrid playout strategy, is based on the use of synchronised time and RTCP SR packets. This solution relates to the simple case of a unicast VoIP session.

For the foreseeable future, VoIP networks are unlikely to operate in isolation; rather, they will have to interoperate with conventional PSTN networks through the use of gateways. At a media transfer level, gateways have to perform two-way conversion between packet-based and circuit-based data. For example, from an IP-to-PSTN perspective, the gateway has to receive incoming IP packets, perform anti-jitter buffering (and perhaps Packet Loss Concealment), reconstruct the data stream and convert it into the required PSTN format, i.e. 64 kbps PCM. Example configurations are shown in Fig. 2, where a single gateway translates calls between the IP and PSTN networks, and in Fig. 3, where gateways exist at each end of an IP island. On the IP interface, such gateways have to contend with terminals from many concurrent sessions (Fig. 2) or with other gateways (Fig. 3), each potentially operating with different clock rates. The relative skew between the various clocks at each end thus needs to be resolved and compensated for to avoid the problems detailed in [7].

A second, more complex scenario, shown in Fig. 4, arises where a mixer is deployed to combine the audio streams from a number of separate sources. In this scenario, the mixer must not only detect and compensate for different clock rates between the various streams but also time-align the different streams. The latter will be necessary if there are significantly different delay characteristics between the various end-hosts and the mixer.
The issues with mixers also apply to a large degree to conferencing, in that concurrent multimedia streams must be synchronised. In these scenarios, the extent of skew needs to be resolved quickly and compensated for on the fly and, in the case of mixers and conferencing, the different media streams need to be time-aligned. The remainder of the paper is organised as follows. Section II summarises the authors' NTP-RTCP based solution to skew detection. Section III outlines how problems relating to clock skew and lack of synchronisation will impact on the operation of gateways, mixers and conference-call services. Section IV proposes an approach to resolving these problems on the fly, which is currently being implemented. Section V concludes the paper.
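The per-packet delay determination on which the delay-aware hybrid strategy relies can be illustrated briefly. Assuming NTP-synchronised system clocks, the most recent RTCP SR pairs an NTP time with an RTP timestamp, so any later RTP timestamp can be mapped onto the sender's wallclock and subtracted from the local arrival time. The Python sketch below is a minimal illustration, not the authors' implementation; the names `SenderReport`, `send_time` and `one_way_delay`, and the 8 kHz nominal rate, are hypothetical, and audio/system skew between SRs is ignored (which is exactly the source of inaccuracy discussed in [7]).

```python
from dataclasses import dataclass

RTP_MOD = 2 ** 32  # RTP timestamps are 32-bit unsigned and wrap around


@dataclass
class SenderReport:
    """Hypothetical container for the (NTP, RTP) pair carried in an RTCP SR."""
    ntp_seconds: float  # sender system (wallclock) time, converted to seconds
    rtp_timestamp: int  # RTP timestamp corresponding to the same instant


def rtp_delta(rtp_ts: int, ref_ts: int) -> int:
    """Signed difference rtp_ts - ref_ts, allowing for 32-bit wrap-around."""
    d = (rtp_ts - ref_ts) % RTP_MOD
    return d - RTP_MOD if d >= RTP_MOD // 2 else d


def send_time(sr: SenderReport, rtp_ts: int, clock_rate: float = 8000.0) -> float:
    """Sender wallclock time at which the sample rtp_ts was generated.

    Assumes the sender's audio clock runs at exactly its nominal rate.
    """
    return sr.ntp_seconds + rtp_delta(rtp_ts, sr.rtp_timestamp) / clock_rate


def one_way_delay(sr: SenderReport, rtp_ts: int, arrival_time: float,
                  clock_rate: float = 8000.0) -> float:
    """Per-packet delay in seconds; meaningful only with synchronised clocks."""
    return arrival_time - send_time(sr, rtp_ts, clock_rate)


# Packet sampled 160 samples (20 ms at 8 kHz) after the SR, arriving at 1000.065 s
sr = SenderReport(ntp_seconds=1000.0, rtp_timestamp=160_000)
print(round(one_way_delay(sr, 160_160, 1000.065), 3))  # 0.045
```

Without synchronised clocks, the same subtraction would include an unknown clock offset, which is why delay-unaware strategies can only track delay trends.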


Fig. 1. Audio/System Clocks in VoIP Session

Fig. 2. IP-PSTN Scenario

Fig. 3. PSTN-IP-PSTN Scenario

Fig. 4. Media Mixer

II. NTP/RTCP-BASED SKEW DETECTION

RTP timestamps enable a receiver to accurately reconstruct incoming packets for playout [8]. The timestamps are media-specific and relate to the sample number generated by the codec and thus (in the case of a sound card) to the audio clock speed. RTCP SR packets (if implemented correctly) include the system clock timestamp (in NTP format) indicating when the SR packet was generated, along with the corresponding RTP timestamp, which advances at the rate of the audio clock. Each sender periodically generates RTCP SR packets during the lifetime of a media session and sends them to each receiving host. For a unicast session, the interval between successive packets is of the order of seconds. If the system and audio clocks on a given host run at exactly the same rate, the interval between successive RTCP SR packets, as indicated by the increment in the RTP timestamp (converted to seconds at the nominal sampling rate) and the increment in the NTP timestamp, will be equal. Any difference indicates the skew between the audio and system clock rates. By accumulating the information from successive RTCP SR packets over a session, each receiver can quickly and precisely determine the relative skew between the sender's system and audio clocks. If, additionally, system clocks are synchronised via NTP, this value also represents the relative skew between the sender audio clock and the receiver system clock. With such information, receivers know precisely what correction needs to be applied to measured delays (actual or estimated) to avoid the gradual distortion described above.

Furthermore, by examining the RTCP SR packets it generates for transmission, the receiver can determine the relative skew between its own audio and system clocks. As the skew of both audio clocks relative to the receiver system clock is now known, the receiver can deduce the relative skew between the sender audio clock and its own audio clock, and can thus correct for the buffer overflow/under-fill problems described above. In summary, using information from both sets of RTCP packets, each receiver can quickly build a precise picture of all four clock rates and take appropriate compensatory action. Fig. 5 outlines a high-level flowchart of the skew detection and compensation method. In [7], the authors test this approach and present results that confirm its operation. In [9], the approach is compared with those of [10] and [11], and the comparison concludes that, once the overhead of implementing NTP has been accepted, it is both more robust and better suited to real-time deployment.

III. SYNCHRONISATION/SKEW ISSUES FOR GATEWAYS, MIXERS AND CONFERENCING

The previous section summarised the authors' NTP-RTCP based skew detection/compensation mechanism for an all-IP unicast session. In this section, we examine a number of more complex scenarios and assess the impact that skew and lack of synchronisation can have on performance.


Fig. 6. Mixer Timing Error

Fig. 5. Skew Detection Flowchart
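The skew detection summarised in Section II and outlined in Fig. 5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each SR is reduced to an (NTP seconds, RTP timestamp) pair with RTP timestamps already unwrapped; a host's audio clock rate relative to its system clock is taken as the least-squares slope of RTP against NTP time, divided by the nominal sampling rate; and all skews are expressed as dimensionless rate ratios. The function names are hypothetical.

```python
from typing import Sequence, Tuple


def ntp_to_seconds(msw: int, lsw: int) -> float:
    """Convert a 64-bit NTP timestamp (seconds, 2**-32 fraction) to seconds."""
    return msw + lsw / 2 ** 32


def audio_to_system_ratio(reports: Sequence[Tuple[float, int]],
                          nominal_rate: float) -> float:
    """Actual/nominal audio clock rate, measured against the host's system clock.

    reports: (ntp_seconds, rtp_timestamp) pairs from successive RTCP SRs.
    1.0 means audio and system clocks agree; 1.0001 means the audio clock
    runs 100 ppm fast relative to system time.
    """
    n = len(reports)
    if n < 2:
        raise ValueError("need at least two sender reports")
    xs = [t for t, _ in reports]
    ys = [float(r) for _, r in reports]
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    # Least-squares slope: RTP ticks per system-clock second
    return (sxy / sxx) / nominal_rate


def sender_to_receiver_audio_ratio(sender_reports: Sequence[Tuple[float, int]],
                                   receiver_reports: Sequence[Tuple[float, int]],
                                   nominal_rate: float) -> float:
    """Sender audio clock rate relative to the receiver's audio clock.

    Assumes both system clocks are NTP-synchronised, so both audio clocks are
    measured against the same time base; receiver_reports are the receiver's
    own outgoing SRs.
    """
    return (audio_to_system_ratio(sender_reports, nominal_rate)
            / audio_to_system_ratio(receiver_reports, nominal_rate))
```

For example, if the sender's nominally 8 kHz audio clock actually runs at 8008 Hz while the receiver's is exact, SRs sent every 5 s yield a sender/receiver ratio of about 1.001, i.e. the sender produces roughly one extra sample per thousand that the receiver must absorb. A regression over many SRs, rather than a difference between two, is one plausible way to reduce the effect of timestamp jitter.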

A. Gateways

Figures 2 and 3 outline two scenarios where IP and traditional PSTN networks interoperate through the use of gateways. Such scenarios are, and will continue to be, commonplace, and much work has focused on developing protocols to ensure seamless interoperability. These include the Media Gateway Control Protocol (MGCP) [12] and MEGACO/H.248 [13], the latter being a joint effort between the IETF and the ITU. These protocols are designed to ensure seamless operation at both the signalling and media transfer levels.

In reconstructing the data stream from incoming IP packets, a gateway needs to strike a balance between the size of the anti-jitter buffer (or playout delay) and total end-to-end delay, and also needs to detect and compensate for skew. It then needs to repeat this process concurrently for each incoming media stream. The choice of playout strategy for each incoming packet stream is thus critical: a large playout delay will result in very low late packet loss but high overall delay, whereas a small playout delay keeps delay low at the cost of more late packet loss. Each stream may be affected by very different delay characteristics (both fixed and variable), and thus very different strategies may be most appropriate. Similarly, the audio/system clocks associated with each stream may have very different relative skew rates, which affect both delay measurement and buffer performance. A gateway needs to consider and deal with all of these issues on the fly.

B. Media Mixers and Conferencing Services

Figure 4 outlines the scenario in which a mixer is used to combine multiple concurrent media streams from different sources. Mixers typically use the RTP timestamps of incoming packets to determine the elapsed duration of each stream and to mix samples from each stream. Differing audio clock rates between sources will therefore result in a cumulative timing error in the reconstruction and mixing of the streams within the mixer.
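To make the cumulative error concrete, the sketch below uses the exaggerated figures of the Fig. 6 example: both hosts nominally send a packet every 10 msec, but Host A's fast audio clock actually emits one every 7.5 msec. Like that example, it assumes identical delays and no jitter, and that the mixer pairs the k-th packet of each stream, as their RTP timestamps would suggest. The function names are hypothetical.

```python
from typing import List


def paired_generation_gap(n_packets: int, period_a_ms: float = 7.5,
                          period_b_ms: float = 10.0) -> List[float]:
    """Real-time gap (msec) between the two packets a naive mixer mixes together.

    Both streams carry the same nominal duration per packet, so the mixer pairs
    packet k of A with packet k of B, even though A generated packet k at
    k * period_a_ms and B at k * period_b_ms.
    """
    gaps = []
    for k in range(n_packets):
        gen_a = k * period_a_ms  # real time at which A generated its k-th packet
        gen_b = k * period_b_ms  # real time at which B generated its k-th packet
        gaps.append(gen_b - gen_a)
    return gaps


def packets_until_error_exceeds(threshold_ms: float, period_a_ms: float = 7.5,
                                period_b_ms: float = 10.0) -> int:
    """Index of the first mixed pair whose timing error exceeds threshold_ms."""
    per_packet = abs(period_b_ms - period_a_ms)  # error added by each packet
    if per_packet == 0:
        raise ValueError("no skew: the error never grows")
    return int(threshold_ms // per_packet) + 1
```

The error grows by 2.5 msec per packet, so after one second of Host B's audio (100 packets) the mixed samples are already 250 msec apart; without skew compensation the misalignment only ever increases.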
In the scenario where two streams are being mixed and their audio clock rates differ, the faster clock will generate and send packets at a faster pace; the mixer will thus wait for the associated packets from the slower clock and will then mix samples that do not refer to the same instant. An exaggerated example is shown in Fig. 6. Both Host A and Host B nominally generate packets every 10 msec, but Host A's audio clock is running fast such that it actually generates packets every 7.5 msec, whereas Host B's clock is correct. Fig. 6 shows packet arrivals at the mixer and outlines how the mixer matches up packets, resulting in a cumulative timing error. This error grows linearly as the session proceeds unless skew detection/compensation is implemented. This simplified example presumes that packets from both A and B experience identical delays to the mixer and no jitter. If, instead, A and B have very different delay characteristics to the mixer, the consequent playout strategies used to reconstruct each stream will differ, resulting in a lack of alignment or synchronisation between the reconstructed media streams even if no audio clock skew existed. In this case, the mixer needs to know exactly when each stream commenced in order to determine how much time has elapsed.

Many of the issues with media mixers also apply to conferencing services, in that different media streams must be synchronised. The conferencing facility has to deal both with different skew rates in end-host clocks and with realigning streams that may suffer from differing delay characteristics.

IV. PROPOSED SOLUTION TO SYNCHRONISATION/SKEW ISSUES

In this section, we describe our current work, which aims to resolve the problems relating to skew and lack of synchronisation raised above. The techniques described are currently being implemented to evaluate their effectiveness.

A. Gateways

As detailed above, gateways have to reconstruct packet-based data and convert it into PSTN format and vice versa. In the reconstruction of each incoming packet stream, the choice of playout strategy is critical.
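Because synchronised time lets a gateway measure the actual delay of each packet, the playout trade-off can be evaluated directly for each stream: with a fixed playout delay, any packet whose delay exceeds it arrives too late. The sketch below picks, per stream, the smallest fixed delay whose late-loss fraction stays within a target, returning `None` (i.e. fall back to an adaptive strategy) if that delay would exceed a budget. It is a simplified, hypothetical illustration of the trade-off, not the authors' hybrid algorithm [2] [3]; the function names, the 1% loss target and the 150 msec default budget are assumptions, and a real gateway would reduce the budget to allow for PSTN-side delay.

```python
import math
from typing import Optional, Sequence


def late_loss(delays_ms: Sequence[float], playout_delay_ms: float) -> float:
    """Fraction of packets that arrive after their scheduled playout time."""
    if not delays_ms:
        raise ValueError("no delay samples")
    late = sum(1 for d in delays_ms if d > playout_delay_ms)
    return late / len(delays_ms)


def choose_fixed_delay(delays_ms: Sequence[float], max_loss: float = 0.01,
                       budget_ms: float = 150.0) -> Optional[float]:
    """Smallest fixed playout delay (msec) with late loss <= max_loss.

    Returns None if that delay exceeds budget_ms, signalling that a fixed
    strategy is unsuitable for this stream.
    """
    if not delays_ms:
        raise ValueError("no delay samples")
    ordered = sorted(delays_ms)
    n = len(ordered)
    allowed_late = math.floor(max_loss * n)  # packets we may lose as late
    # Any smaller delay would make at least allowed_late + 1 packets late
    candidate = ordered[n - 1 - allowed_late]
    return candidate if candidate <= budget_ms else None
```

Run independently on each stream's measured delays, this naturally yields different fixed delays for streams with different delay characteristics, which is the situation a gateway faces.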
As outlined in previous work by the authors [2] [3], the use of synchronised time within the hybrid strategy enables a receiver to make an informed decision as to playout strategy and to apply a fixed-delay playout strategy whenever possible. This was shown to result in improved voice QoS. In a gateway scenario, this solution can easily be applied to the multiple concurrent streams, enabling the gateway to make the optimum choice of playout strategy for each stream. Presuming that the gateway has some knowledge of delays on the PSTN side of the call, it can apply an appropriate fixed-delay playout, bearing in mind both the delays on the IP side and the overall ITU-T mouth-to-ear (M2E) limit of approximately 150 msec [14]. Similarly, the use of synchronised time will facilitate the implementation of the NTP-RTCP based skew detection approach; the gateway can thus quickly resolve the relative skew rates associated with the various audio/system clocks and take appropriate compensatory action for each stream.

B. Media Mixers and Conferencing Services

Compared with gateways, media mixers have the added complexity of having to time-align and mix the different media streams once they have been reconstructed. The cumulative timing error due to different audio clock rates, described above with reference to Fig. 6, can be resolved using our high-level NTP-RTCP based approach. As RTCP SR packets from each host arrive, the mixer can resolve the relative skew rates between the mixer and source clocks. It can then compensate for each source (e.g. by adding/deleting samples) to ensure that the media samples that are mixed relate to the same instant. The use of synchronised time will also ensure that the mixer is aware of the sample generation time for each stream, facilitating correct time-alignment. However, the playout strategy applied to each stream is also critical. For example, as in the gateway scenario described above, the use of the hybrid playout strategy may result in different fixed playout delays being imposed, depending on the delay characteristics of each stream. In contrast to the gateway scenario, however, a mixer needs to align these reconstructed streams before mixing.
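Compensation "by adding/deleting samples" can take several forms. The sketch below shows perhaps the simplest, under assumed names: given the measured ratio of a source's audio clock rate to the mixer's audio clock rate (for instance from an NTP-RTCP skew estimate), it drops or repeats individual samples so that the source stream is mapped onto the mixer's timeline, carrying the fractional position across blocks. The class `SampleCompensator` is hypothetical; practical systems would more likely interpolate or use time-scale modification to avoid audible artefacts.

```python
from typing import List, Sequence


class SampleCompensator:
    """Hypothetical per-source skew compensator for a mixer.

    ratio = source audio clock rate / mixer audio clock rate. A fast source
    (ratio > 1) produces too many samples, so some are dropped; a slow source
    (ratio < 1) produces too few, so some are repeated.
    """

    def __init__(self, ratio: float) -> None:
        if ratio <= 0:
            raise ValueError("ratio must be positive")
        self.ratio = ratio
        self.pos = 0.0  # fractional read position, carried across blocks

    def process(self, block: Sequence[int]) -> List[int]:
        """Map one block of source samples onto the mixer clock (nearest sample)."""
        out: List[int] = []
        while self.pos < len(block):
            out.append(block[int(self.pos)])
            self.pos += self.ratio
        self.pos -= len(block)  # continue seamlessly with the next block
        return out
```

In the Fig. 6 example, Host A delivers 10 msec of nominal audio every 7.5 msec, so its ratio would be about 10/7.5, and roughly one sample in four would be dropped before mixing.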
Because a mixer must align the reconstructed streams before mixing, there is little point in independently determining and applying an optimum fixed playout delay for each stream. For example, if the hybrid strategy returned optimum fixed playout delays $T_A$ and $T_B$ for streams A and B respectively, where $T_A < T_B$, the mixer would in any event have to delay the reconstructed stream A by $T_B - T_A$ to ensure correct alignment. It therefore makes more sense to apply the highest fixed playout delay, $T_B$, to both streams, as this should result in even lower late packet loss for stream A. A similar approach can be applied to conferencing to ensure that the different streams remain synchronised and correctly aligned.

V. CONCLUSIONS AND FUTURE WORK

In the move towards Internet-based multimedia, many challenges have emerged. The limitations of the best-effort Internet have been well documented, and various initiatives have been developed to assist in the delivery of delay-sensitive data. The problems associated with end-points or terminals, however, have been less well identified or researched. Looking ahead, the traditional PSTN and Internet-based multimedia services such as VoIP will coexist and thus need to interoperate. In previous work, we introduced the hybrid playout strategy, which utilises synchronised time within endpoints to improve VoIP QoS; synchronised time also facilitates an NTP/RTCP-based skew detection/compensation technique for a simple unicast session. In this paper, we describe our current work, which examines a range of more complex scenarios involving gateways and mixers. We propose solutions to these scenarios based on the combination of synchronised time and skew detection. We are currently implementing these solutions within a number of testbeds in order to assess their potential for performance improvement.

REFERENCES

[1] H. Melvin and L. Murphy, “Time synchronization for VoIP Quality of Service,” IEEE Internet Computing, vol. 6, no. 3, pp. 57-63, May-June 2002.
[2] H. Melvin and L. Murphy, “An evaluation of the potential of synchronized time to improve VoIP quality,” IEEE International Conference on Communications (ICC 2003), Anchorage, May 2003.
[3] H. Melvin and L. Murphy, “An Evaluation of Delay-Aware Receiver Playout Strategies for VoIP Applications,” Proceedings of International Federation for Information Processing (IFIP Networking 2004), Athens, May 2004; published in LNCS, Springer-Verlag, 2004.
[4] R. Ramjee, J. Kurose, D. Towsley, and H. Schulzrinne, “Adaptive playout mechanisms for packetized audio applications in wide-area networks,” Proc. Conf. Computer Comm. (IEEE Infocom), IEEE CS Press, Los Alamitos, Calif., June 1994, pp. 680-688.
[5] S. Moon, J. Kurose, and D. Towsley, “Packet audio playout delay adjustment: performance bounds and algorithms,” ACM/Springer Multimedia Systems, vol. 6, pp. 17-28, Jan. 1998.
[6] Y. Liang, N. Farber, and B. Girod, “Adaptive playout scheduling using time-scale modification in packet voice communications,” Proc. ICASSP 2001.
[7] H. Melvin and L. Murphy, “An Integrated NTP-RTCP Solution to Audio Skew Detection and Compensation for VoIP Applications,” Proceedings of the IEEE Int'l Conference on Multimedia and Expo, Baltimore, July 2003.
[8] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, “RTP: A Transport Protocol for Real-time Applications,” IETF RFC 1889, Jan. 1996.
[9] H. Melvin, “The Use of Synchronised Time in Voice over IP (VoIP) Applications,” PhD Thesis, Dept. of Computer Science, UCD, Ireland, 2004.
[10] R. Akester and S. Hailes, “A New Audio Skew Detection and Correction Algorithm,” Proceedings of the IEEE Int'l Conference on Multimedia and Expo, Lausanne, Aug. 2002.
[11] O. Hodson, C. Perkins, and V. Hardman, “Skew Detection and Compensation for Internet Audio Applications,” Proceedings of the IEEE Int'l Conference on Multimedia and Expo, NY, July 2000.
[12] M. Arango, A. Dugan, I. Elliot, C. Huitema, and S. Pickett, “Media Gateway Control Protocol,” IETF RFC 2705, Jan. 1999.
[13] F. Cuervo, N. Greene, A. Rayhan, C. Huitema, B. Rosen, and J. Segers, “Megaco Protocol Version 1,” IETF RFC 3015, Nov. 2000.
[14] ITU-T Recommendation G.114, “One way transmission time,” ITU, May 2003.