Jul 17, 2002 - Synchronized Multicast Media Streaming employing Server-Client Coordinated. Adaptive Playout and Error Control. Jinyong Jo and JongWon ...
Synchronized Multicast Media Streaming employing Server-Client Coordinated Adaptive Playout and Error Control
Jinyong Jo and JongWon Kim Networked Media Lab., Department of Information and Communications, Kwang-Ju Institute of Science and Technology (K-JIST), Gwangju, 500-712, Korea.
Abstract A new inter-client synchronization framework employing a server-client coordinated adaptive playout and error control toward one-to-many (i.e., multicast) media streaming is discussed in this paper. The proposed adaptive playout mechanism controls the playout speed of audio and video by adopting the time-scale modification of audio. Based on the overall synchronization status as well as the buffer occupancy level, the playout speed of each client is manipulated within a perceptually tolerable range. Additionally, the server implicitly helps increasing the time available for retransmission while the clients perform an interactive error recovery with the assistance of adaptive playout control. By coordinating the playout speed of each client, the inter-client synchronization with respect to the target presentation time is smoothly achieved. The cumulative feedback for retransmission assisted by both playout speed manipulation at clients and implicit help of a streaming server is also conducted to increase the reliability. Furthermore, RTCP-compatible signaling between the server and group-clients is performed, where the exchange of controlling message is restricted. The network-simulator based simulations show that the proposed framework can reduce the playout discontinuity without degrading the media quality, and thus mitigate the client heterogeneity. Key words: Multicast media streaming, inter-client synchronization, source specific multicast, and adaptive playout control.
Email address: {jiny92, jongwon}@kjist.ac.kr (Jinyong Jo and JongWon Kim). URL: http://netmedia.kjist.ac.kr. (Jinyong Jo and JongWon Kim).
Preprint submitted to Computer Networks
17 July 2002
1
Introduction
With the enormous growth of the Internet, more and more applications are distributing media contents to worldwide users. The average size of media contents transferred over the Internet are exponentially increasing everyday. It can be partly supported by the upcoming broadband network infrastructure. However, to provide broadband media contents reliably and scalably, an efficient utilization and proper management of limited network bandwidth become increasingly important. As a viable option to save the precious bandwidth, multicast provides immense merits in designing multimedia applications. Especially for continuous streaming media (e.g., audio and video), it allow us to support a large set of clients simultaneously. The major problem with multicast is that, due to the manageability and scalability problems, the ubiquitous deployment of IP multicast on the Internet is expected only after several years to come [1]. Realizing an effective media streaming over the IP multicast faces lots of challenges. It requires feasible signaling/transport protocols, stable multicastenabled networks, and network-adaptive applications to distribute the broadband media in an acceptable quality [2]. For the network side, source specific multicast (SSM) model [3] alleviates several complications (e.g., by restricting application scenario) of conventional any source multicast (ASM), especially for one-to-many media streaming. For the protocol side, real-time transport protocol pair (RTP/RTCP [4]) can provide interoperable/monitored real-time transport channel over the IP networks. RTP/RTCP in itself does not guarantee quality of service (QoS) to streaming media applications but acts as a helper. Streaming media applications at the server and clients are responsible to adaptively take charge of network congestion/error/delay variations and system resource limitations. To overcome the network variations, streaming media applications are deploying error controls as well as rate (e.g., congestion and flow) controls. Reliable transmission of RTP/UDP packets needs error controls based on automatic repeat request (ARQ), forward error correction (FEC), or their hybrid. In the context of reliable multicast, building blocks are proposed based on the NACK (negative acknowledgement), tree-structured ACK, and FEC schemes [5]. It is also important to deploy an effective congestion control scheme to cope with the network congestion while achieving the TCP-friendliness [6, 7]. The playout of streaming media needs to be synchronized according to the associated timing information (e.g., timestamp per each packet). The goal of media synchronization is to reconstruct the timing relation of media contents at the clients. The noticeable disruption (due to buffer underflow or system overload) and/or discrepancy (due to playout timing skews between different media objects) in the playout significantly degrades the playback quality. 2
Presentation time difference
Server Client Group coordination
Client Network fluctuation Rate, Loss, Delay(jitter)
Client Inter-client synchronization System dynamic Momentary CPU overload and etc Client Application loss and Network loss
SSM-enabled IP network Client Client
Adaptive playout control Network loss Error recovery
Fig. 1. Multicast streaming problem scope and related issues.
These intra-client media synchronization including the lip synchronization has been considered as an important issue. To keep the synchronization, streaming media applications have to deal with the network delay jitter and loss that can cause the playout disruption. They also need to keep the smooth playout regardless of the exceptional events at the client system. In addition, we can expect to see different clients are playing different portion of media as shown in Fig. 1 (denoted as presentation time difference). This situation is happening since each client is connected to the streaming server through paths of different bandwidth, loss, and delay. The difference in the capability of client system is another reason. Thus, we need to address the playback synchronization issue in both intra- and inter-client aspects while properly controlling the buffers at the clients. That is, an adaptive playout is required to maintain the playout synchronized despite of the network fluctuations and system limitations [8–10]. Note that, compared to the multicast rate and error control issues, multicast playback synchronization has been rarely discussed. To cope with the above challenges and provide high-quality media streaming, an adaptive multicast streaming framework has to be established. Thus, in order to reduce the playback discontinuity and mitigate the heterogeneity, we propose to establish a synchronized multicast streaming framework, where the synchronized playout of all clients is adaptively managed. Note that synchronization issue becomes more important as the media streaming goes broadband and high-quality. In the proposed synchronized streaming framework, each client is required to keep synchronization despite of the exceptional system events as well as the network fluctuations. To assist each client for this challenge, we propose to adaptively control the playback speed of playout. By extending the audio/speech adaptive playout with time-scale modification in [11, 12] to audio/video, the playback speed of client can be varied. That is, based on the combined buffer (i.e., network and application buffers) oc3
cupancy level, the player at the client is adaptively expanding or contracting the playout within the range that it does not hurt the viewers perceptually. Thus, we can proactively conduct the adaptive playback not only to reduce the playback discontinuity but also to guarantee high-quality playback with flexible error controls 1 . In summary, the proposed streaming framework consists of 1) local playback adaptation (guided by a playback factor) based on the combined buffer occupancy with the error recovery support, 2) unicast RTCP feedback on the presentation time as well as the channel status, 3) inter-client synchronization with the aid from the server, and 4) cumulative NACK-based error recovery with the assistance of adaptive playout control. More specifically, each client locally controls the playback speed to prevent buffer overflow/underflow (subsequently to prevent playback discontinuity) and to assist the delayconstrained retransmission attempt if allowed. This local adaptation is then reviewed at the server by aggregating client feedbacks and the server will issue target presentation time to coordinate and synchronize all the clients. Cumulative NACK-based error recovery is also assisted by the adaptive playback to secure enough time for request and reply of retransmission. Furthermore, to reduce unnecessary feedbacks, whether to request retransmission or not is selectively determined. In this paper, we are interested in verifying and evaluating the effectiveness of adaptive playout for the proposed framework. We choose to evaluate the proposed framework by conducting simple yet extensive network simulations. Results show that the proposed framework can reduce the playback discontinuity without degrading the media quality while enhancing the inter-client synchronization by mitigating the client heterogeneity. At the same time, it can assist the retransmission-based error recovery with the adaptive playout and reduce bandwidth through the selective retransmission and cumulative feedbacks. The rest of this paper is organized as follows. the proposed one-to-many media streaming framework is introduced in Section 2. In Section 3, the core components of the framework are explained subsequently in the order of the adaptive playout with the audio time-scale modification, the local adaptation for the buffer, the playout for the intra-client synchronization, the server-aided interclient synchronization with feedback, the adaptive playout control for the error recovery, and the coordination of adaptive playout controls. Section 4 shows the simulation setup and the verification results of the proposed framework. After reviewing related works in Section 5, we conclude the paper in Section 6.
1
We are restricting our attention only to the adaptive playout and error control, setting aside the integration of congestion control to future works.
4
2
Synchronized Multicast Media Streaming Framework
Multicast has been extensively investigated to tackle the network bandwidth challenge. However, the global deployment of multicast is hardly accomplished at present because of the involved complexities and limited QoS capabilities. In this paper, we focus on a pragmatic solution to enhance feasibility of multicast media streaming service. That is, we attempt to propose the synchronized multicast media streaming framework that can mitigate the deployment complexity and enhance the service quality. The proposed framework will adopt the SSM model [13, 14], because it offers an off-the-shelf solution to overcome the deployment complexities of IP multicast. Furthermore, the RTP/RTCP over the SSM environment is adopted to adjust the protocols to the SSM [15]. Multicast RTP Multicast RTCP Unicast Feedback
SSM-enabled IP network
Frames played
Client 1
Server audio video
ARQ Network loss Client 2
Retransmission buffer
. . . Client i ARQ Client n
Client n compositor
Application loss
. . .
decoder (AD) demux
VD monitor scheduler
timer controller
Fig. 2. The proposed synchronized multicast media streaming framework.
Fig. 2 illustrates the overall framework of the proposed synchronized multicast media streaming, where the adaptive playout control for audio/video streams (e.g., MPEG-2 and MPEG-4) is deployed with the media delivery through the SSM-enabled IP network. Multiplexed audio/video stream (e.g., MPEG-2 program stream (PS) [16]) is transmitted to all the clients joined through a PIM-SM (protocol independent multicast - sparse mode) routing. It is assumed that a single-layer-encoded stream 2 is delivered to moderate number of clients (e.g., around 100) with receiver buffers. Under this kind of situation, one can expect clients to lose synchronization easily and play slightly different portion of the stream compared to the other clients. Note that we are assuming the lightly coupled synchronization, where the server and clients need to be 2
For the sake of simplicity, neither layered encoding nor multicast congestion (or flow) control is integrated at this stage.
5
synchronized within an allowed range 3 . Note also that this is in comparison to the tightly synchronized playout employing the clock synchronization by the network time protocol (NTP) [17] 4 . The desired inter-client synchronization is accomplished in this work by performing the playout adaptation with the audio time-scale modification [11, 18]. The synchronized playout is assisted with the feedback-based streaming control, which also can encompass the rate and error controls. The simple retransmission-based error recovery can be enhanced with the explicit help of playout control, which tries to expand the time available for retransmission by slowing down the playback temporarily. Server can also help the error recovery implicitly by relaxing the target presentation time to that of the slowest client to secure the time for retransmission. The client can not do a multicast feedback, since the neighboring routers for each client are configured by IGMPv3 (Internet Group Management Protocol, version 3) [19] protocol to filter a source in the SSM. Instead, each client uses a customized unicast RTCP receiver report (RR) while the server multicasts a RTCP sender report (SR) to deliver a control message [15]. The most important component of the proposed framework is the adaptive playout control, that helps us to minimize the playout discontinuity and to cope with the network fluctuations and system limitations. Previous efforts on the adaptive playout control have focused only on the media synchronization based on the presentation timing and they are confined to the unicast case. We, however, propose to deploy the adaptive playout mechanism in multicast environment and exploit its flexibility advantage. That is, we exploit the gain of adaptive playout for the inter-client synchronization. The inter-client synchronization coordinates the presentation timing differences among the clients in a multicast group. Inter-client synchronization is necessary to provide highquality media streaming service with the fairness among session participants. Synchronized playout of participating clients forms the support foundation on which a server easily deploy its network adaptation mechanisms (e.g., including flow and congestion controls). That is, the necessary server-client interaction for the rate, error, and synchronization can be better accomplished with the adaptive playout of each client. Also, feedback-based scheme with RTCP-compatible signaling is proposed to achieve the synchronization with simplicity. An effort on accommodating the RTP/RTCP to the SSM environment has been made in [15], which generally deals the RTCP extensions for the 3
The allowed range depends on the latency requirement of application and the capabilities of clients. 4 With the time synchronization and given presentation time, the inconsistency of the playout can be controlled. However, it may result in the increase of playback discontinuity, which becomes more evident when the performance of clients are not homogeneous and subject to diverse network fluctuations.
6
SSM. We extend it to RTCP-compatible signaling for the inter-client synchronization. We also propose to recover losses based on cumulative NACK-based retransmission request. Although the retransmission-based error recovery has shortcomings like long repair latency and limited multicast scalability, it still remains as an attractive candidate for the error recovery because of its simplicity and efficiency. In the multicast environment, retransmission request/reply may increase network traffics drastically unless multicast transmission is combined with the right feedback implosion control. Issues even become worse if the exchange of controlling message is restricted as in the SSM environment. To solve the scalability problems of retransmission-based error recovery for the SSM with unicast feedback [15], we propose to adopt selective retransmission request similar to [20–22] with cumulative NACKs to save feedback bandwidth. Furthermore, server-client coordinated playout control helps to salvage the time for retransmission. Note that we intentionally simplified the multicast model and restricted our attention only to one-to-many streaming applications in this paper. Thus, the proposed multicast media streaming framework can mitigate the deployment complexity and provide the enhanced streaming quality.
3
Adaptive Playout and Error Control
3.1 Adaptive Playout with Audio Time-scale Modification To provide high-quality video streaming and ease the client adaptation to the network fluctuations and the system limitations, the client-based adaptive playout is adopted. As shown in the magnified part of Fig. 2, the multiplexed stream is stacked at the receiver buffer of each client to wait the decoding and playback. The stream is de-multiplexed into the respective audio and video decoding buffers, where the presentation timestamp (PTS) located at the header of MPEG-2 PS is recorded into a scheduler. It also gets the current buffer occupancy level from a buffer monitor. Based on the timing and buffering status, the scheduler controls the adaptive playout by expanding/contracting the playout. The buffer monitor checks the sequence number of incoming packets and determines the loss from the network - it performs gap-based loss detection. The associated timer of the controller provides the required timing information and controls the retransmission. The main role of the adaptive playout control is to reduce the discontinuity incurred by packet over-/underflows and momentary CPU overload. An effective playout adaptation allows us to avoid excessive packet droppings at the application, conceal the network fluctuations and thus minimize the degradation of 7
perceptual media quality. With the audio time-scale modification [11, 18], we can significantly enhance the adaptation capability of the client at the cost of increased computation. To perform the adaptive playout with the time-scale modification, we need to know the allowed ratio within which a player can manipulate the playback speed without being detected by the user’s perception. Even though the allowed playout variation differs based on the type of audio (including the silence), we assume that playout variation up to 25% is usually unnoticeable 5 . In this paper, we define the playback factor (∆p ) to specify this ratio. Original
P1
P2
P1
P2 Waveform similarity
P1
Waveform similarity
P0 P1
Realignment P1
P2
P2 Time scale contraction
Time scale expansion Time scale modified
P2
Realignment
P1
P2
Fig. 3. Packet based SOLA in time-scale modification.
With the time-scale modification based on the Waveform Similarity OverLapAdd (WSOLA) algorithm [18], a player can vary the playout speed. In Fig. 3, the packet version of modified WSOLA [11] for the time-scale expansion and contraction is illustrated. For the time-scale expansion, a template segment which is longer than one pitch period from the original waveform is sampled and then similar segment is searched based on the waveform similarity. The found segments followed by the rest of the samples in the original packet is overlapped with the template segment and added to the time-scale modified packet. The time-scale contraction is performed in a similar way except that it narrows searching region within a packet distance to compress the output waveform. 3.2 Local Playout Adaptation Under the proposed framework, each client is performing the playout adaptation locally. The local playout is heavily tied with the buffer control. It adapts to control the combined buffer occupancy 6 guided by the playback factor. 5
The playback speed should be changed with caution to keep the playout consistent and smooth. 6 The size of receiver network buffer is assumed to be much larger than the summation of decoder and composition buffers. If this assumption becomes invalid, we need
8
pci Ps + ∆ p Ps Ps − ∆ p bS bH bR t
bL bi
Fig. 4. Desired operation of the proposed adaptive playout control.
In general, the receiver buffer size bs of each client is typically constrained by the hardware complexity and the application requirement on the delay and loss. When dealing with number of clients, it will be simple and much easier to coordinate the playbacks of whole client with larger size buffers. However, when it comes to the hardware cost, smaller size buffers with elegant buffer control are preferred. In this paper, we fix the buffer size of all clients as small as possible but large enough to service the worst case client with the largest round trip time (RTT) and loss. Then, we denote the current buffer occupancy level of client i at time t as bi (t) and specify two fixed thresholds for the receiver buffer as shown in Fig. 4. They are lower limit (bL ) and higher limit (bH ) thresholds. When bi (t) falls under bL or exceeds bH , there is high risk of buffer underflow or overflow, respectively. We introduce another implicit threshold, namely retransmission limit (bR ), to guide the retransmission-based error recovery. With this, the controller determines whether it request the retransmission of lost packet or not. We will cover this threshold later in the error recovery part. bi bS bH pci = Ps ⋅ (1 − ∆ p )
bR bL
pci = Ps ⋅ (1 + ∆ p ) t
0
ε ci 0 .5 ⋅ ε t 0
| ε ci − 0.5ε t | / ∆ p
t
| ε ci − 0.5ε t | / ∆ p
−0.5 ⋅ ε t
Fig. 5. Key roles of local playout adaptation (the effect of network/system fluctuations is ignored for illustration purpose only). to consider the virtual combined buffer that comprises all three buffers together.
9
As discussed above, the local playout adaptation attempts to manage bi (t) between bL and bH . By helping the buffer-level control within the range as shown in Fig. 5, the adaptive playout reduces the risk of buffer underflow/overflows. By keeping the playback speed of client i at t, pic (t) within acceptable range Ps · (1 − ∆p ) ∼ Ps · (1 + ∆p ) (i.e., 1 ± ∆p of its normal playback speed Ps ), we can prevent exceptional events that may result in unnecessary packet discard. Thus, with the adaptive playout based on the time-scale modification, we can manage the buffer level more flexibly. 3.3 Adaptive Playout for Intra-client Synchronization Another key role of local playout adaptation at each client is towards the synchronized playout (intra-client synchronization). If there exists noticeable discrepancy in the audio and its corresponding video (or temporary disruption of playout), it disturbs the audience a lot. In this paper, we assume that globally synchronized time reference is not available (i.e., lightly coupled synchronization). Thus, each client is responsible to manage the synchronization based on its own time reference, namely local virtual time. The time reference is guided by the PTS delivered with media packets. We also focus on the intra-media aspect of intra-client synchronization. Note that the intra-client synchronization issue is closely tied with the inter-client synchronization to be explained later. Tci , Vci
SR( n + 1)
SR( n) Ts ( n)
Tci (n + 1)
Tci Vci t
εt ε ci
Fig. 6. Server-aided client playout adaptation for both types of synchronization.
Fig. 6 depicts the proposed playout operation to maintain the playback close to a target presentation time. Initially, the virtual presentation time vci (t) starts concurrently with the start of each playout at tstart and the timer records the current presentation time of client i, Tci (tstart ). Note that each media packet contains the PTS. Using Tci (tstart ) as the reference, we can estimate the virtual presentation time at tstart + ∆t . Each client then compares its current presentation time Tci (t) to the virtual one vci (t). The target range is controlled by ²t , which is centered around the vci (t). Here, we denote by ²ic the amount of playout time discrepancy with respect to the vci (t). That is, ²ic means the time 10
difference between the virtual (vci (t)) and actual (Tci (t)) presentation time. When the ²ic falls in the target range, i.e., |²ic | < 0.5 · ²t , the playout adaptation is not necessary. When ²ic becomes greater than ²t /2 or less than −²t /2, each client attempts to adjust the current presentation time and consequently move the buffer level. Each player either fasten or loosen its playback speed to compensate the time discrepancy (|²ic − 0.5 · ²t |) until the Tci (t) goes into the target range.
3.4 Server-aided Inter-client Synchronization with Feedback Tc1 Tcn
Ts
Server
Tc2
Client 1 Adaptive Ctrl Group Sync Client 2
Ts Ts
Adaptive Ctrl Group . Sync
. .
Client n Multicast RTP Multicast RTCP Unicast Feedback
Adaptive Ctrl Group Sync
Fig. 7. Server-client signalling for inter-client synchronization.
Fig. 7 depicts the proposed operation of inter-client synchronization to maintain the playback of all clients close to a target presentation time. Remember that, in this paper, we are talking about lightly coupled synchronization, where the server and clients need to be synchronized within an allowed range. Thus, feedback-based synchronization is adopted to accomplish the required serverclient synchronization. RTCP-compatible signaling between the server and clients is performed to exchange controlling messages. Note that the exchange of controlling message needs to be restricted based on the RTCP bandwidth rule and, due to the employed SSM, each feedback is unicasted. The RTCP RR interval is determined based on the bandwidth of standard RTP and RTCP application-defined (APP) packet, which is customized to deliver the necessary timing information for the synchronization. When the server receives the compound RTCP RR packets, it gets the current presentation time of each client (Tci (t)) and estimates the RT T i 7 . They are aggregated to predict the 7
RT T i of each client is approximated at the sender utilizing the unicasted RTCP RR reports. RT T i = tiARR − tDLSR − tLSR , where tiARR is the arrival time of the RR packet at the server, tDLSR is the delay elapsed from receiving last SR at the client and tLSR is the time that has been copied from the server SR packet at the receiver.
11
target presentation time, Ts . Each client is then notified about this target presentation time when it receives the RTCP SR from the server. To determine the target Ts from the aggregated feedbacks from all the clients, the server needs to set an arbitration policy. It is important to adjust the fairness or performance of all clients in terms of synchronization 8 . For example, we can choose a simple averaging for the arbitration and thus Ts is calculated by averaging all Tci ’s. Or we can use other policies such as median, minimum, or maximum. The detailed calculation for Ts is as follows. Even though we are talking about gathering feedbacks from all clients as a response to the server SR, the feedback from each client is returning at arbitrary time point (or maybe lost during feedback). To compensate this, we maintain the delay dis for each client at the server, i.e., the delay after the feedback reception till the server SR. Also the feedback from each client takes different RT T i (to be precise, half of RT T i ) to reach the server. Finally, the target presentation time Ts is calculated as Ts = Farbitration {Tci , dis , RT T i |i = 1, . . . , Nc },
(1)
where Farbitration implies the selected arbitration policy, i stands for client i, Nc is an active group size, dis is the processing delay from the last RR to the current SR, respectively. With the calculated Ts , the server multicasts the compound RTCP packets containing Ts to all clients and each client adapts its playback speed. The previous Fig. 6 also illustrates the detailed behavior of the server-aided client adaptation with respect to the presentation time. Note that the proposed server-aided client adaptation is tightly coupled with the local playout adaptation. At this stage, we are using a simple approach to coordinate local and server-aided adaptations. Simply speaking, we are just replacing local virtual time with the guide from the server. Note however that it is very important to make the switch among the local and server-aided adaptations smooth so that the playout at each client is consistent. The detailed procedure is as follows. While each client tries to adapt the local presentation time Tci (t) to its virtual presentation time vci (t), it receives the target presentation time Ts (n) from the server. Upon the reception, each client goes through transient adaptive playout period by simply resetting its reference presentation time of virtual timer to the server-reported Ts (n). After the reset, the player goes into the adaptation state to re-adjust the presentation time within the allowed target range. 8
Note that this issue is tightly connected to the fairness of multicast congestion control. We also assume that clients with badly broken links are compelled to leave the multicast group so that it does not bias the arbitration.
12
3.5 Adaptive Playout Control for Error Recovery Retransmission-based error recovery is not directly applicable to the applications with strict delay constraint. The adaptive playout control however can reduce the repair latency effectively. That is, we can extend the role of local playout adaptation in order to secure the time required for retransmission request/reply. It enhances the chance of error recovery even in the circumstances that RTT is relatively small. Proposed cumulative NACK-based error recovery scheme is illustrated in Fig. 8. It is no use to request the retransmission if the buffer level bi (t) is less than the retransmission limit bR . bi
Up to n feedback cumulation is possible
bR + Acn (t ) (1 + ∆ p )
Retransmission request is impossible
b bR R (1 + ∆ p )
t
∆ p ⋅ bi (t )
effective bR
Ac
bR
bR (1 + ∆ p )
bi (t ) t
Fig. 8. Error recovery with the help of adaptive playout control.
Precise estimation of one-way delay from each client to the server is very important in setting the bR effectively. However, with the SSM and RTP/RTCP environment, it is rather complex to measure the RTT at the clients without the clock synchronization. By slowing down the playout speed, the adaptive playout can secure sufficient time for retransmission. Thus, the threshold for successful retransmission-based error recovery - the effective bR - can be bR lowered to (1+∆ . It thus enlarges the possible range of retransmission to p) bR i ≤ b (t) ≤ bH . (1+∆p ) Furthermore, the cumulative NACK can save the feedback bandwidth. When clients detect a packet loss based on a sequence gap, it checks the buffer occupancy level bi (t) to determine whether it needs to request retransmission right away or it can hold the request for a while to process multiple requests together. From Fig. 8, it is natural that bi (t) must be greater than bR to merge multiple requests together when there is no adaptive playout. To merge feedbacks of n consecutive packets in the bitmask field of packet header, we need the buffer level margin of Anc (t). That is, bi (t) has to be maintained over bR + Anc (t). Likewise, with the adaptive playout, the buffer margin to enable the cumulative feedback becomes (1 + ∆p ) · bi (t) − bR . When bi (t) lies between bR ∼ bR + Anc (t), we can hold the feedbacks by Ac = min[(1 + ∆p ) · bi (t) − (1+∆p ) 13
bR , Anc (t)] by slowing down the playout speed. Based on the buffer occupancy, the selective decision for retransmission is made. That is, if bi (t) is greater than the effective bR , the request becomes possible. After making the decision, each client feeds the cumulative NACK back to the server using the unicast feedback channel. When the server receives cumulative NACKs from the clients, it checks the requests to suppress duplicate retransmission requests. For example, if there are more than two NACKs requesting the same packet within a RTT, late requests are suppressed at the server. The requested packet which is unsuppressed by the server is then multicasted back to all clients. 3.6 Coordination of Adaptive Playout Controls Inter-client synchronization SYNC Receive
Ts
Local adaptation b(t ) ≤ bL
OR
b(t ) > bL
AND
Set Ts as a reference point
b(t ) ≥ bH
RISK b(t ) < bH
NORMAL Loss AND bR ≤ bi (t ) (1 + ∆ p )
No Loss OR bR > b i (t ) } {Loss AND (1 + ∆ p )
ARQ RISK 0
ARQ SYNC NORMAL
bL
RISK bH
Error recovery bS
Fig. 9. Proposed coordination among the adaptive playout states.
Fig. 9 describes the proposed operation of adaptive playout control for the local adaptation, the server-aided adaptation, and the cumulative NACKbased error recovery. The coordination is performed among the coordination states: NORMAL, RISK, SYNC, and ARQ. A coordination state transits to the other states based on the buffer occupancy, the losses from network and system, and/or the server-reported Ts . When the coordination state is under the NORMAL (i.e., it falls into the magnified portion of Fig. 5), it tries to adjust the current presentation time Tci (t) to the virtual presentation time vci (t). The coordination state goes into the RISK when the application is at the high risk of buffer under-/overflows, i.e., bi (t) ≤ bL or bi (t) ≥ bH . Since the RISK has the highest priority for attention to prevent the playback discontinuity, the player cannot escape from the RISK while the buffer occupancy level bi (t) is below bL or above bH . The ARQ state defines the assistance of adaptive playout control for the error recovery. Thus, the player does not go into the 14
bR state while bi (t) is less than (1+∆ or greater than bR + Anc (t) even though p) packets are lost. The inter-client synchronization is tightly coupled with the adaptive playout control which is performed on the NORMAL state. That is, the player goes into SYNC state when it receives the server-reported Ts . In the SYNC state, the virtual presentation time vci (t) is re-adjusted to the sender-reported Ts in the RTCP SR packet. The actual adjustment of Tci (t) to the sender-reported Ts is achieved when the player goes into the NORMAL state after escaping from the SYNC state.
4
Simulation Results
4.1 Simulation Setup
Streaming Server Session Member loss(%)/RTT(ms) 52
59
58
51
48
60
49
17
46
45
16
47
15
56 7%/350ms 53 54
13%/400ms
55
50
65
66
57
20
18
68
9%/500ms 8%/500ms
67
19
22
64
23
63
62
7
21
6%/350ms 10%/350ms
38
10%/450ms
5
8
12 3%/250ms
40
39 5%/120ms
42
43 44
13
6
4
61 71
26
14
74
25
75 9 30
70
41
4%/200ms
31
69
24
2
32
10
33
34
1
3 28
0 11
37
35
36 0%/50ms
27
78
76
77
29
6%/50ms
79
5%/50ms
4%/250ms
72 5%/200ms 73
82
81
80 1%/150ms
Fig. 10. Simulation topology.
Fig. 10 provides the detailed simulation topology by the network simulator (NS2) [23]. The SSM is provided by the PIM-SM multicast routing and each client feeds back its status through an unicast RTCP RR. There exists one server and 16 clients will join to the group from arbitrary locations. Table 1. Video traffic model. I frame(kbytes) B frame(kbytes)
P frame(kbytes)
Average frame size
50
21.6
10
Standard deviation
8
2
1
We simulate the MPEG-2 audio/video streaming scenario at average 5 Mbps with 30 f/s frame rate. This frame rate leads us to the temporal granularity 15
of 33 ms. The adopted group of picture (GOP) structure is for 15 frames consisting of one I-frame and 3 P-frames (i.e., IBBPBBPBBPBBPBB) and the bit rates of each frame is tabulated in Table 1. At this stage, the decoding and composition delay is ignored for the sake of simplicity. The timing accuracy of playing a frame is less than 10 ms in the simulations. All clients joining the multicast group have the same receiver buffer size (default size: 1 sec, unless specified else) and perform the same playout adaptation. Following thresholds are utilized for the receiver buffers. Lower limit bL bR is set to (1+∆ from the bottom, where the retransmission limit bR is based on p) the estimated RTT. Higher limit bH has margin equal to 100 ms from the top. We set the default playback factor (∆p ) to 0.25 (25%), which is somewhat an aggressive choice for the time-scale modification. For the Ts arbitration policy, selection of minimum value is employed as a default policy. Target synchronization discrepancy ²t with respect to the virtual presentation time is set to 100 ms. Note also that the playout adaptation needs to be done in terms of video frame. The presentation time discrepancy is thus converted to the number of video frames Nfi (based on the video frame rate Rf ). Thus, we are getting the number of frames to be adaptively played out with the time-scale modification, Npi , i.e., Npi = Nfi /∆p . For example, if we want to compensate 2 frames, then we needs to do the playout for 20 frames with ∆p = 0.1. In summary, the player of each client has to perform the playout adaptation up to Npi /Rf sec as long as the buffer level is properly controlled. The simulation initiates just after 0 sec by starting the multicast streaming from the server. Each client joins to the group with a gap of 11 sec (fixed). After joining and starting the playback, all the clients are subject to a randomized, momentary CPU overload. Currently the CPU overload is happening for less than 50 ms at full strength (i.e., no other computation is possible) and it is randomly occurring with a 10 sec exponential distribution. For the network, various combination of TCP and CBR traffics are traversing the whole network topology and causing network fluctuations between the server and clients. We select independent loss only on the receiver links as a loss scenario. The individual losses are occurred with an uniform distribution and several clients experience burst losses with 50 to 100 ms exponentially distributed burst length. The server has one second retransmission buffer. The average loss fraction is 6 %, some clients suffer from maximum loss at 13 % and some clients even do not experience any loss (0%). For the cumulative NACK-based error recovery, up to 16 NACKs are merged. With all these setups, we are using several measures for the synchronized streaming performance and two for the performance of cumulative NACK-based error recovery. They are 1) accumulated playout discontinuity per each client during streaming, 2) the maximum difference of playout among clients, 3) the playout speed variation of a client, 4) the consumed feedback bandwidth and 5) the success probability of loss recovery with retransmission. 16
4.2 Verification Results
Ti me di f f er ence( s)
To verify the performance of the proposed approach, three different playout cases are compared. ‘No playout adaptation’ means the case where the adaptive playout control for both local adaptation and error recovery is not performed and adaptive playout cases are divided into ‘Local adaptation’ and ‘Server-aided adaptation’. In both local and server-aided adaptation cases, the player request retransmission with the help of playout control. The request is performed individually (‘individual request’) or cumulatively (‘cumulative request’). 1.0 No playout adaptation 0.9 0.8 0.7
Local adaptation with individual request Local adaptation with cumulative request Server-aided adaptation with cumulative request
0.6 0.5
Server-aided adaptation with individual request
0.4 0.3 0.2 0.1 50
100
150
200
250
300
Discontinuity(s)
Fig. 11. Maximum playout time differences between the leading and trailing clients. 0.8
No playout adaptation
0.7 0.6 0.5
Local adaptation with individual request
0.4 0.3
Local adaptation with cumulative request
0.2
Server-aided adaptation with cumulative request
0.1
Server-aided adaptation with individual request
0.0 0
50
100
150
200
250
300
Time(s)
Fig. 12. Accumulated playout discontinuity averaged per each client.
17
Probability
First, the maximum playout time differences between the leading and trailing clients at each time t is shown in Fig. 11. Without the adaptive playout, the time difference increases up to maximum buffer occupancy level, i.e., bs . Packet overflows or intentional flushing is useful in reducing the gap, but it pays the playout discontinuity. Check the accumulated discontinuity per each case in Fig. 12. It illustrates 0.8 sec discontinuity is occurring to each client on the average without the playout adaptation. In comparison, the proposed playout adaptation can bound the gap more effectively. If the player adopts local adaptation only, there is no way to restore the inter-client synchronization since there is no guidance. Eventually it is subject to flushing of packets in the buffer. But with the help of server coordination, we can maintain and the gap throughout the whole playout. The accumulated playout discontinuity in Fig. 12 shows that the proposed playout adaptation with the server-aided synchronization efficiently reduces the playout discontinuity to half of ‘No playout adaptation’. 0.7 No playout adaptation 0.6
0.5
Local adaptation with individual request
0.4
Server-aided adaptation with individual request
Local adaptation with cumulative request
0.3
Server-aided adaptation with cumulative request
0.2
0.1
0.0 26
28
30
32
34
Frame rate(f/s)
Fig. 13. Playout speed variation (a client).
Fig. 13 depicts the playout speed variation of a client in the group. It is no doubt that the server-aided adaptation experiences more playout speed variation than the other cases at the expense of reduced discontinuity. The maximum buffer occupancy differences between the leading and trailing clients in the group is illustrated in Fig. 14. With the adaptive playout, the buffer level differences between the group members are significantly reduced and server-aided adaptation maintains slightly smaller difference over the local adaptation. Since the server selects the minimum value as Ts under the default arbitration policy, it uses the presentation time of the slowest client in the group as the target reference presentation time. The adaptation to Ts at the clients makes the distribution of buffer levels more concentrated. 18
Buffer occupancy difference(s)
0.8 No playout adaptation
0.7 0.6
Local adaptation with cumulative request
Local adaptation with individual request
0.5 Server-aided adaptation with individual request Server-aided adaptation with cumulative request
0.4 0.3 0.2 0.1 0.0 50
100
150
200
250
300
Time(s)
Fig. 14. Maximum buffer occupancy differences between the leading and trailing clients. Table 2. Impact of Ts arbitration policy on the performance. Ts selection
Discontinuity(s)
Playout speed variation(f/s)
Minimum
0.4812
2.8345
Median
0.5562
2.5396
Maximum
0.5812
2.3805
Average
0.4870
2.8291
The appropriate selection on Ts , arbitration policy, impacts the overall performance as shown in Table 2. The result shows that Ts derived from minimum Nc c arbitration policy (i.e., min(Tc1 + d1s + RT T 1 , · · · , · · · , TcNc + dN ) is s + RT T preferable to the others with respect to playback discontinuity. That is, if we choose Ts according to the slowest client in the group, it increases the time available for retransmission for all the other clients. This in turn contributes to the enhanced error recovery and hence results in the reduced discontinuity compared to the other arbitration policies. The averaged Ts selection also reduces the playout discontinuity and speed variation compared to the other policies. Fig. 15 illustrates the consumed bandwidth of RTCP RR and retransmission requests used for feedbacks to the server. Note that we assume the compounded RTCP RR with 120 bytes and the retransmission request packet (regardless of feedback cumulation) of with 12 bytes are used to feedback QoS information and retransmission request. Furthermore, three cases compared here are exposed to the same network losses but may experience different application losses based on the scenario. We can observe the bandwidth ef19
Bandwidth(b/s)
25000
No playout adaptation Local adaptation with individual request
20000
Server-aided adaptation with individual request
15000
Local adaptation with cumulative request 10000
Server-aided adaptation with cumulative request
5000
0
50
100
150
200
250
300
Time(s)
Pr obabi l i t y
Fig. 15. Required feedback bandwidth for retransmission request and receiver report. Server-aided adaptation with individual request 0.8
0.7 No playout adaptation Server-aided adaptation with cumulative request
0.6 Local adaptation with individual request
0.5
Local adaptation with cumulative request
0.4
0
50
100
150
200
250
300
Fig. 16. Average repair probability at time t in a group.
ficiency of cumulative NACK for retransmission request in Fig. 15. The cumulative NACK-based requests show the saving of the feedback bandwidth to a half compared with the individual requests. Fig. 16 illustrates the averaged repair probability of lost packets in each client. It is confirmed that the coordination of the server and clients, i.e., server-aided adaptation, increases the repair probability. Without the adaptive playout, the repair probability is highly fluctuating over the time. However, we note that the repair performance of the adaptive playout with the cumulative requests is worse than that of ‘No playout adaptation’. This result partly implies that the choice on the cumulation (i.e., n = 16) is rather aggressive since it makes the timely recovery to fail from time to time. The lack of the precision in converting buffer 20
level to time quantity and the feedback packet loss (since it can affect up to n retransmission) are partly responsible for this, too.
5
Related Works
Streaming media applications have to deal with network fluctuations (delay jitter and loss) that can cause the disruption in the playout. They also need to keep the playout synchronized regardless of the exceptional events at the client systems. With the synchronization, we typically refer to the reconstruction of temporal relation between different media objects. There are several types of synchronization such as intra-media, inter-media, and inter-client synchronization. Intra-media synchronization addresses the temporal relations among consecutive media contents (e.g., packets) of a single stream. Inter-media synchronization defines the temporal relationships between different types of media objects (e.g., lip synchronization for video and audio). Controlling the skew, i.e., the time difference between media streams, is a key issue to be solved. The permissable skew level is tightly coupled with the human perception. For intra-client synchronization, several reactive controls like discarding, skipping, shortening and extension of output duration, and virtual time contraction and expansion have been developed [24]. The playout adaptation is thus required to maintain the playout quality considering the network fluctuations and system limitations [8–10]. Previous works on the adaptive playout control mostly focus only on the intra-client synchronization issues. Multimedia player with an application-level CPU scheduler adapts the playout speed based on the buffer occupancy level, which is linked to the system loads [8]. It is however focused on the system capability variation than the network fluctuations. An adaptive stream synchronization protocols [9] is used to control the synchronized playout based on the buffer fullness. It supports the notion of master and slave streams to coordinate inter-client synchronization. However, it only covers the signaling for synchronization, and the specific mechanisms to make streams synchronized are kept open. As a promising solution candidate, the adaptive playout with the audio time-scale modification [11, 12] is proposed to synchronize the clients without hurting the playout quality (i.e., without any disruption). They allow us to perform the playout adaptation to gracefully conceal the network delay jitters and packet losses in the Internet. Especially, [12] addresses the adaptive media playout to relax the delay constraint of retransmission as well as minimize the buffer underflow. Based on both channel condition and buffer fullness, it considers delay and loss resiliency with the help of adaptive playout control. The inter-client synchronization is rarely discussed yet. It is important to achieve harmonized playout among the clients while maintaining fairness among 21
multicast group members. It should be done by effectively mitigating the network and system heterogeneity of the clients. [9] deals with the inter-client synchronization as well as the intra-client one by propagating the adapt messages. [25] and [26] introduce inter-client synchronization for live and stored media in multicast environments, respectively. In [25], a synchronization agent is used to exchange control packets. To adjust the presentation time, the delay (expressed in levels) at each client is reported. However, it lacks in considering the propagation delay and the feedback implosion due to short-term fluctuations in the network. For the stored media, [26] utilizes a multicast group disseminating control packets from a master to slave destinations. It requires message exchanges among session members, which is impossible in the SSM environment. Also, other works [27–29] consider the fairness among group members. However, they differ from the proposed work since we are focusing on the role of adaptive playout to the server-aided inter-client synchronization under the multicast streaming environment. The error control directly affects media quality, too. Since the Internet does not guarantee the QoS, a reliable transmission needs error controls including the ARQ, the FEC, and their hybrid. Each of them has its own merit and demerit, and works well in the unicast environment while remaining too challenging in the multicast environment. Especially, the error recovery by retransmission is appealing due to its simplicity and efficiency. However, the required feedback bandwidth and repair latency make the problem difficult. In the unicast environment, works in the IETF (Internet Engineering Task Force) are standardizing selective retransmission mechanisms as a part of RTP/RTCP [20–22]. They focus on modifying the retransmission-based error recovery. For example, [22] introduces a desired RTCP format for the selective retransmission over a wireless link. Cumulative NACK approach in [30] is utilizing multiple multicast channels to cope with the feedback implosion, where a TCP’s window-like scheme is used to merge the retransmission request. Note again that we are focusing the impact of adaptive playout to the multicast error recovery in this paper.
6
Conclusions
In this paper, we propose a framework for the synchronized multicast media streaming with the adaptive playout and error control, which is suitable for one-to-many media streaming with an improved flexibility. The adaptive playout controls the playout speed of audio and video by adopting the time-scale modification of audio based on the buffer occupancy. The main role of the adaptive playout control is to reduce the discontinuity incurred by packet overflow/underflows and momentary CPU overload. Furthermore, RTCP-compatible signaling between the server and clients is observed, where 22
the exchange of controlling message is restricted. By employing the serverclient coordinated adaptive playout control with feedback for the presentation time synchronization, it varies the playback speed within the perceptual limit and adapts the presentation time of each client to help the inter-client synchronization. In addition, each client performs retransmission-based error recovery with the assistance of playout control to compensate for long repair latency. The retransmission request with NACK-based cumulation can save the allocated bandwidth and prevent the feedback implosion. We maximize the role of adaptive playout control to help securing the time available for retransmission. Additionally, the server implicitly assists the error recovery while the clients perform interactive error recoveries. The network-simulator based simulations show that the proposed framework 1) reduces the playout discontinuity caused by the network fluctuations and the system limitations without degrading the media quality, 2) saves the feedback bandwidth induced by retransmission request, and 3) mitigates the client heterogeneity by employing the sever-client coordinated adaptive playout and error control. Thus, the potential of the proposed multicast streaming framework to enhance the multicast streaming quality is well verified.
References [1] K.C. Almeroth, “The evolution of multicast: from the MBone to interdomain multicast to Internet2 deployment,” IEEE Network, vol. 14, pp. 10–20, Jan. 2000. [2] C. Diot, B.N. Levine, B. Lyles, H. Kassem, and D. Balensiefe, “Deployment issues for the IP multicast service and architecture,” IEEE Network, vol. 14, pp. 88–98, Jan. 2000. [3] H. Holbrook and B. Cain, “Source-specific multicast for IP,” Internet Draft draft-ietf-ssm-arch-00.txt, IETF, Nov. 2001. [4] A. Schulzrinne, S. Casner, Frederderick, and V. Jacobson, “RTP: A transport protocol for real-time applications,” IETF RFC 1889, Jan. 1996. [5] V.O.K. Li and Z. Zhang, “Internet multicast routing and transport control protocols,” Proceedings of the IEEE, vol. 90, pp. 360–391, Mar. 2002. [6] S. Floyd and et. al., “Equation-based congestion control for unicast applications,” in Proc. ACM SIGCOMM, Aug. 2000, pp. 43–56. [7] J. Widmer and M. Handley, “Extending equation-based congestion control to multicast applications,” in Proc. ACM SIGCOMM, Aug. 2001. [8] J. Walpole and et. al., “A player for adaptive MPEG video streaming over the Internet,” in Proc. of the SPIE 26th Applied Imagery Pattern Recognition Workshop (AIPR), Oct. 1997.
23
[9] K. Rothermal and T. Helbig, “An adaptive stream synchronization protocol,” in Proc. of the Fifth International Workshop on Network and Operating System Support for Digital Audio and Video (NOSSDAV), Apr. 1995, vol. LNCS 1018, pp. 189–202. [10] B. Moon, J. Kurose, and D. Towsley, “Packet audio playout delay adjustment: performance bounds and algorithms,” ACM/Springer Multimedia Systems, vol. 5, no. 1, pp. 17–28, Jan. 1998. [11] F. Liu, J. Kim, and C.-C. J. Kuo, “Adaptive delay concealment for Internet voice applications with packet-based time-scale modification,” in Proc. IEEE ICASSP, May 2001. [12] E. Steinbach, N. F¨arber, and B. Girod, “Adaptive playout for low-latency video streaming,” in Proc. International Conference on Image Processing (ICIP), Oct. 2001, pp. 962–965. [13] S. Bhattacharyya and et. al., “An overview of source-specific multicast (SSM) deployment,” Internet Draft draft-ietf-ssm-overview-03.txt, IETF, Aug. 2001. [14] H. Holbrook and D. Cheriton, “IP multicast channels: EXPRESS support for large-scale single-source applications,” in Proc. ACM SIGCOMM, 1999, pp. 65–78. [15] J. Chesterfield and et. al., “RTCP extensions for single-source multicast sessions with unicast feedback,” Internet Draft draft-ietf-avt-rtcpssm-00.txt, IETF, Feb. 2002. [16] D. Hoffman and et. al., “RTP payload format for MPEG1/MPEG2 video,” IETF RFC 2250, Jan. 1998. [17] D. L. Mills, “Internet time synchronization: The network protocol,” IEEE Trans. on Communications, vol. 39, no. 10, pp. 1482–1493, Oct. 1991. [18] W. Verhelst and M. Roelands, “An overlap-add technique based on waveform similarity (WSOLA) for high quality time-scale modification of speech,” in Proc. IEEE ICASSP, Apr. 1993, pp. 554–557. [19] B. Cain, S. Deering, and A. Thyagarajan, “Internet group management protocol, version3,” Internet Draft draft-ietf-idmr-igmp-v3-01.txt, IETF, Feb. 1999. [20] J. Ott and et. al., “Extended RTP profile for RTCP-based feedback,” Internet Draft draft-ietf-avt-rtcp-feedback-02.txt, IETF, July 2001. [21] C. Burmeister and et. al., “Extended RTP profile for RTCP-based feedback results of the timing rule simulation,” Internet Draft draft-burmeister-avt-rtcpfeedback-sim-00.txt, IETF, Nov. 2001. [22] A. Miyazaki and et. al., “RTP payload formats to enable multiple selective retransmission,” Internet Draft draft-ietf-avt-rtp-selret-04.txt, IETF, Nov. 2001.
24
[23] K. Fall and editors K. Varadhan, “ns notes and documentation,” The VINT Project at LBL, Xerox PARC, UCB, and USC/ISI, Aug. 2001, http://www.isi.edu/nsnam/ns/. [24] Y. Ishibashi, S. Tasaka, and H. Ogawa, “A comparison of media synchronization quality among reactive control schemes,” in Proc. IEEE INFOCOM, Apr. 2001, vol. 1, pp. 77–84. [25] Y. Ishibashi and S. Tasaka, “A group synchronization mechanism for live media in multicast communications,” in Proc. IEEE GLOBECOM, Nov. 1997, vol. 2, pp. 746–752. [26] Y. Ishibashi, A. Tsuji, and S. Tasaka, “A group synchronization mechanism for stored media in multicast communications,” in Proc. IEEE INFOCOM, Apr. 1997, vol. 2, pp. 692–700. [27] J. Escobar, C. Partridge, and D. Deutsch, “Flow synchronization protocol,” IEEE/ACM Trans. on Networking, vol. 2, no. 2, pp. 111–121, Apr. 1994. [28] Y. Ishibashi, S. Tasaka, and Y. Tachibana, “Adaptive causality and media synchronization control for networked multimedia applications,” in Proc. IEEE ICC, 2001, vol. 3, pp. 952–958. [29] J. Jo and J. Kim, “Synchronized one-to-many media streaming with adaptive playout control,” in Proc. SPIE ITCOM: Multimedia Systems and Applications V, July 2002. [30] G. Poo, “A cumulative negative acknowledgment (CNAK) approach for scalable reliable multicast,” in Proc. Computer Communications and Networks, Oct. 2001, pp. 268–273.
25