Multimedia Tools and Applications, 14, 219–257, 2001
© 2001 Kluwer Academic Publishers. Manufactured in The Netherlands.

Adaptive versus Reservation-Based Synchronization Protocols—Analysis and Comparison

HUNG-SHIUN ALEX CHEN, LINTIAN QIAO, KLARA NAHRSTEDT
Department of Computer Science, University of Illinois at Urbana-Champaign
[email protected], [email protected], [email protected]

Abstract. With the expansion of distributed multimedia applications, such as video-phone, video-conference, and video-on-demand, synchronization among various media (time-dependent, time-independent) becomes an integral part of various protocols, mechanisms and services in the underlying computing and communication systems. Current systems allow and provide two different resource management environments in which synchronization must be considered: (1) best-effort resource management, and (2) reservation-based resource management with differentiation of service classes. Under these two resource management environments, our goal is to analyze and compare the design, implementation, and performance of synchronization protocols and services. Our approach to this complex analysis is inductive: we select a representative protocol from each group, and consider an adaptive synchronization protocol on top of best-effort resource management and a reservation-based synchronization protocol on top of reservation-based resource management. We believe that both protocols include a rich set of known synchronization algorithms and mechanisms; hence our resulting analysis and comparison show: (1) trade-offs/differences in design complexity of the synchronization protocols (space and time), (2) trade-offs/differences in implementation complexity of the synchronization protocols (space and time), and (3) the magnitude of performance changes.

Keywords: multimedia, synchronization, adaptive, reservation

1. Introduction

Synchronization between audio and video is becoming an integral part of distributed multimedia applications such as video-on-demand, video-conferencing, video-phone and others. Various protocols and mechanisms have been developed to meet the synchronization requirements in both local and distributed systems. Furthermore, new research and technology are starting to allow and provide computing and communication environments with at least two service classes along the end-to-end communication path for distributed multimedia applications. One service class is the best-effort class, which exists in traditional OS and network environments using FIFO queues and time-sharing to process and communicate their objects of interest (e.g., processes, packets). Examples of best-effort service are the time-sharing resource management in the UNIX OS and the best-effort IP protocol over an Ethernet network. The other class is the reservation-based class, which uses admission control, static priorities, traffic shaping and weighted fair queuing to process and communicate its objects of interest. An example of resource reservation is the RSVP/IP protocol stack with its ability to reserve network bandwidth. Another example is the ATM network, which provides
multiple service classes and allows for bandwidth reservation of CBR (constant bit rate) or VBR (variable bit rate) traffic. At the CPU and memory level, reservation-based service classes are only starting to emerge, and are not widely available in general-purpose OSs. We have such a solution [4, 5] and we use it in this paper together with the underlying ATM reservation approach to generate an end-to-end reservation-based environment. Without the CPU (OS) reservation, one gets reservation-based service only at the network level (using RSVP or ATM), which does not provide an end-to-end reservation-based solution. In the future we will see many end-to-end environments, but we believe that at least two will emerge in the near future: (1) the best-effort environment offering best-effort service, and (2) the reservation-based environment offering both service classes (best-effort and reservation-based). Under these two different environments, multimedia applications and their synchronization will need to be supported; therefore we believe that a careful analysis of synchronization protocols under best-effort and reservation-based systems will yield valuable lessons and trade-offs for future multimedia service and protocol design and implementation.

The objective of this paper is to present (1) trade-offs/differences in design complexity of the synchronization protocols (space and time); (2) trade-offs/differences in implementation complexity of the synchronization protocols (space and time); and (3) the magnitude of performance changes. In order to study the above stated issues, we designed and implemented a representative synchronization protocol in each environment (best-effort and reservation-based), applying the rich set of known synchronization algorithms and mechanisms, and analyzed them carefully.1 One protocol is the adaptive synchronization protocol, which works in the best-effort Internet/UNIX environment. The second protocol is the reservation-based synchronization protocol, which works on top of the QualMan environment, comprising a QoS-aware resource management with provision of reservations for CPU, memory and bandwidth resources [12]. The results of this paper show and experimentally validate the difficult spots and the intuitive properties of synchronization behavior in both environments.

The paper is organized as follows: Section 2 presents extensive related work in the area of synchronization, and Section 3 discusses the design of both protocols. In Section 4 we provide details of the implementation and results of each protocol in the considered environments. Section 5 evaluates and compares both protocols in terms of their trade-offs, complexity and performance magnitude. We conclude in Section 6.

2. Related work

A number of services, protocols and mechanisms have been developed to meet synchronization requirements in both local and distributed environments. In this section, we enumerate the related work without going into details. In the following sections we will discuss in depth those specifications and schemes we adopt, such as the time-axis-based specification and the master/slave scheme. We will also compare some reported protocols with our schemes. The major requirements for audio/video synchronization in multimedia systems are bounded jitter within a continuous stream, and minimal, acceptable skews among dependent streams. Steinmetz's experiments [25] measured audio-visual skews within an
analog environment that are perceived as "out-of-sync". A desired lip synchronization skew should be ≤80 ms. Between 80 ms and 160 ms the skew was found acceptable; however, a skew beyond 160 ms is perceived as annoying.

The synchronization requirements must be specified to the application, so that the application can invoke services, protocols and mechanisms for their provision. Known synchronization specifications are the time-axis-based specification, timed Petri nets, interval-based specifications, or time flow graphs ([9, 10, 17, 25, 27], etc.). From the synchronization specification, the internal system derives presentation schedulers and local synchronizers. Examples of systems which include this type of scheduler are the MODE [3], multimedia teleorchestra [10], continuous media I/O server [2] and ACME [2] systems. These systems rely on special-purpose stream schedulers/schemes to achieve local synchronization with low jitter and skew.

Within a distributed environment, protocol-based techniques are applied. Shepherd's scheme [26] suggests two different techniques for inter-stream synchronization: the synchronization marker (SM) for indication of synchronization points, and the synchronization channel. The synchronization markers are transmitted over a separate synchronization channel. Nicolaou's scheme [13] identifies two levels of data elements for defining synchronization points. The logical synchronization frames (LSF) are defined as the unit of synchronization for the control application, while physical synchronization frames (PSF) are for the communication subsystem. Li [10] presents a multimedia segment delivery scheme (SDS) for the simultaneous delivery of multimedia streams belonging to the same time interval. SDS performs the synchronization recovery at the multimedia receiver before the playback and employs the synchronization parameters to guarantee the control for the different types of data streams.

The protocol-based synchronization techniques must deal with system changes such as network delays. Escobar's flow synchronization protocol [7] takes into account the dynamic changes in network delays by monitoring the jitter and re-synchronizing at the receiver if necessary. However, it assumes the presence of a global clock in a distributed environment at all times. A similar approach is taken in Rothermel's adaptive synchronization protocol [18]. Another approach to re-synchronization is to send feedback. An example is Ramanathan's feedback technique [23]. The multimedia server provides distribution of video streams over B-ISDN to mediaphones, which are devices that have minimal capability to play back media and lack the sophistication to run any type of synchronization mechanism. The intra-stream and inter-stream synchronization are maintained by the mediaphones sending feedback units (lightweight messages) to the server. Bounded buffering and feedback from all devices are used for inter-media synchronization.

With the compression of video streams, the intra-stream synchronization must be controlled more closely. Rangan [20] discusses the continuity and synchronization issues in MPEG compressed video streams. There exist services/protocols for the support of video or audio communication, such as vat, nv, vic, VDP [6, 11, 21], and others. However, these protocols do not yet support lip synchronization. The protocols do, however, include intra-stream adaptation towards system changes by dropping information within a stream if system resources are not available.

3. Design of synchronization protocols

In this section we will present two different designs of synchronization schemes. The two synchronization schemes reflect the current environments for provision of end-to-end QoS guarantees. Two different environments for the provision of QoS guarantees have emerged:

• Best-effort environment with adaptive protocols. In the best-effort environment, the networks and OS maximize throughput and do not provide QoS guarantees. In this case, if applications desire QoS guarantees within a certain range on top of a best-effort environment, then these applications must adapt, or utilize adaptive schemes/protocols at the middleware level to assist them in the provision of QoS guarantees. An example of the best-effort environment is the Internet with UNIX OS support at the end-points. Various adaptive transmission protocols are deployed on top of this type of environment, such as VDP (Video Datagram Protocol) [6], vat (adaptive video/audio protocol) [8], and others.

• Reservation-based environment. In the reservation-based environment the network and OS use reservation, admission and enforcement algorithms to provide end-to-end QoS guarantees. In this case, the application does minimal work in QoS provision and relies heavily on the underlying QoS provision. An example of this kind of environment is the ATM network with QoS-aware resource management in the OS kernel (e.g., Mach OS) [19] or in the middleware using a real-time extension of an OS kernel (e.g., the QualMan system) [12]. Various experimental transmission protocols are deployed in this type of system, such as the native ATM transport protocol in QualMan [12].

In each of the above described environments we designed a representative synchronization protocol based on existing synchronization algorithms, with the goal of providing a careful analysis and comparison between them so that multimedia system designers get a good feel for the trade-offs when making a choice for a new distributed multimedia system. In Section 3.1, we present the design of our adaptive synchronization protocol in the best-effort environment. In Section 3.2 we discuss the design of the reservation-based synchronization protocol, which runs on top of the ATM high-speed network and the QoS-aware resource management, called QualMan [12], developed at the University of Illinois at Urbana-Champaign. QualMan allows the synchronization protocol to use resource servers such as the CPU server, memory server and communication server, which provide resource reservation, admission control and resource reservation enforcement.

3.1. Assumptions and lip synchronization specifications

3.1.1. Assumptions. In our adaptive distributed VOD system, we assume that (1) synchronization requirements are input during the recording process and the presentation schedule is predetermined, (2) there are always sufficient OS resources to deliver audio at its recording rate, and (3) the application can access services for Quality of Service (QoS) processing at the application layer of the communication architecture. In the adaptive synchronization
protocol, this would include probing and QoS negotiation services during the call setup phase, a QoS monitoring service at the client side with feedback to the server, and an adaptation service to modify the rate during the transmission phase at the client and server sides. In the reservation-based synchronization protocol, the application should have access to underlying system services, such as resource reservations of CPU, memory and communication resources during the call setup phase, and proper resource allocation enforcement according to the reservations during the transmission phase along the end-to-end path. In both systems, we do not assume globally synchronized clocks.

3.1.2. Lip synchronization specification. To address the lip synchronization problem within an adaptive application, there are two important observations to note. First, human perception of a changing sampling rate depends on the medium. According to the experimental results in [1], reducing the video frame rate from the normal rate of 30 fps to even 10 fps is acceptable, especially in a multitasking window-based environment. However, we cannot decrease audio streams by this factor, since the human ear is much more sensitive to the perceived medium quality than the human eye. The user would be annoyed by duplicated or delayed audio. Second, audio stream data demand much less system resources, such as CPU or network bandwidth, than video. Based on these observations, we set as a policy that no intentional adaptation of MPEG-1 compressed audio data will be considered in our scheme.

For the specification of the synchronization relation we use the time-axis-based specification. It is a simple but effective way to provide a good synchronization abstraction for media contents. All single-medium objects are attached to a time axis that presents an abstraction of real time [25]. During the recording phase, each audio and video LDU (Logical Data Unit) is time-stamped2 with its starting playout time within the stream. Due to the audio loss sensitivity, the audio stream is used as the time-axis divider and the master stream to control the playout. The video stream is the slave stream and is under the control of the audio stream. Audio plays at its original recording rate.
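As a concrete illustration of the time-axis specification, the following is our own C rendering, not code from the paper; all identifiers are assumptions:

    /* Time-axis-based LDU sketch: each LDU carries its starting playout time
       on the shared time axis; audio is the master stream, video the slave. */
    typedef enum { MEDIA_AUDIO, MEDIA_VIDEO } media_t;

    typedef struct {
        media_t        media;     /* audio (master) or video (slave) stream */
        unsigned long  ts_ms;     /* starting playout time on the time axis */
        unsigned long  size;      /* payload size in bytes                  */
        unsigned char *payload;   /* compressed MPEG-1 frame data           */
    } ldu_t;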

3.2. Adaptive synchronization

Many distributed multimedia applications are designed, implemented and used on top of general-purpose OS and network platforms (e.g., UNIX/Internet). When the underlying system cannot provide sufficient resources, adaptation must be introduced on top of the 'best-effort' environment. Because of this dynamically changing adaptive behavior, the synchronization between the audio and video streams has to be designed to accommodate it. In this section, we will discuss the adaptive services and their synchronization protocols.

3.2.1. Basic concepts. We also need to clarify two terms before we describe our design and protocols. (1) The recording rate is determined during the recording phase, when the audio and video are digitized and MPEG compressed. This rate is important because of the synchronization between audio and video streams. (2) The system rate is the rate that the system can actually provide. It is initially determined during the call establishment phase, where we use a probe-based algorithm [14]. This algorithm uses a probe—a continuous clip
from the beginning of the stream—to determine the achievable system rate at the client side. The server sends stream frames as fast as possible, and at the client a monitoring service observes the rate at which the received frames can be played. The probing is aborted either when a degradation point is reached, i.e., the rate decreases below a specified bound, or when the probing timeout expires. The client's measured playback rate during the probe time is used as the initial predicted system rate at the client and server sides for the transmission.

3.2.2. Video adaptation. When the video system rate differs from the video recording rate, the application must adapt. The video adaptation scheme utilizes knowledge about the MPEG-1 video structure, so that the application can access and process intra-frame structures of MPEG consisting of I, P, and B frames. We divide the video stream into groups, and each group consists of one or more MPEG-1 GROUPs (this procedure can easily be extended to other formats). With the monitoring of the system load (which is represented by the actual time to process and display one group of pictures), we can calculate the difference between the time required for displaying one group at the recording rate and the time required for displaying one group at the system rate (the system rate dynamically reflects the combined effects of network and OS status). If this difference is zero or positive, then the server waits for a certain time between sending out two consecutive frames, without dropping anything. If the difference is negative, we first try to drop some or all of the B frames. If this is not sufficient, we apply the same strategy to P frames and then I frames. This also implies a B, P, I priority in increasing order. The algorithm's details and its experimental validation in the UNIX/Internet environment are described in [14].

3.2.3. Lip synchronization specification. As described in Section 3.1.2, we set as a policy that no intentional adaptation of MPEG-1 compressed audio data will be considered in our scheme. When less system resources are available, only MPEG-1 compressed video data are to be dropped. The MPEG-1 system standard ISO/IEC 11172-1 [16] specifies synchronization relations between audio and video objects, but it cannot be used in our approach. The reason is that our adaptive scheme manipulates video streams at their frame level, and the interleaving and packetizing structure [20] of the MPEG-1 system standard would require that we introduce too much overhead to access individual video frames. Instead, we consider separate MPEG-1 audio and video standards and develop our own adaptive synchronization scheme to preserve the relationship between audio and video in the best-effort environment.

3.2.4. Adaptive synchronization scheme. Both the client and server are responsible for synchronization. To achieve synchronized behavior, the adaptive synchronization scheme incorporates client/server protocols during the call establishment phase and transmission phase, synchronization control on the server side, and synchronization control on the client side. Our scheme must preserve the synchronization relationship between audio and video LDUs even when the match-up between the audio and video LDUs dynamically changes due to the dropping of video frames. We call this problem the Dynamic Audio-Video Match-up Problem (figure 1).

Figure 1. Dynamic A-V match-up problem.

Client/server protocol. Our client/server protocol utilizes the adaptation scheme from Section 3.2.2 and extends it to support lip synchronization. The protocol consists of two parts: a negotiation protocol during call establishment, and an adaptation/renegotiation protocol during the transmission phase. During the call establishment, the client opens a control connection for negotiation and feedback information (see Table 1). In the next step, the directory of available movie clips (including both video and audio) at the server is sent to the client. The client selects a movie and sends the server a request to play it. The server receives the request and sends the client the relevant information about that movie, such as the number of I, P, B video frames in one group (nI, nP, nB), whether audio is available or not, the audio sample frequency (afreq), the audio layer type, and the video recording rate (ru). The following step of the call establishment phase sends probing audio/video data (e.g., the first group) at a maximal rate from the server to the client to determine the possible video system rate. As discussed in Section 3.2.1, the resulting system rate measured at the client side is communicated to the server side.

Table 1. Negotiation during call establishment.

    Server                                       Client
                                        <--      Open TCP Connection for Feedback Info
    Send File List                      -->      Receive File List
    Receive File Name                   <--      Send File Name
    Send ng, nI, nP, nB, afreq, ru      -->      Receive ng, nI, nP, nB, afreq, ru
                                        <--      Open UDP Connection for video and audio
    Send first group (QoS Probing starts) -->
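A minimal sketch of the client-side probe measurement performed after this exchange follows. This is our construction, not the paper's code; receive_frame() and play_frame() are placeholder stubs standing in for the UDP receiver and decoder, and the timeout value is illustrative.

    #include <stdio.h>
    #include <sys/time.h>

    /* Placeholder stubs so the sketch compiles stand-alone; in the real
       client these would read the UDP channel and drive the decoder. */
    static int  receive_frame(void *buf) { (void)buf; return 0; }
    static void play_frame(const void *buf) { (void)buf; }

    static double now_s(void)
    {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec + tv.tv_usec / 1e6;
    }

    /* Play the probe group as fast as it arrives and return frames/second.
       The caller aborts probing when the rate degrades below a bound or
       when the timeout expires; the result seeds the initial system rate. */
    static double probe_system_rate(double timeout_s)
    {
        unsigned char buf[64 * 1024];
        long frames = 0;
        double t0 = now_s();

        while (now_s() - t0 < timeout_s && receive_frame(buf)) {
            play_frame(buf);
            frames++;
        }
        double elapsed = now_s() - t0;
        return elapsed > 0.0 ? frames / elapsed : 0.0;
    }

    int main(void)
    {
        printf("initial system rate: %.2f fps\n", probe_system_rate(2.0));
        return 0;
    }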

Table 2. Adaptation service/renegotiation protocol for lip synchronization.

    Server                    Client
                              tw = ng/ru − ng/rs
    receive tw         <--    send tw
                              if tw < 0 then
                                if ((diff = ng/rs − tI × nI − tP × nP) ≥ 0)
                                  nB(allowed) = ⌊diff/tB⌋
                                  compute a pattern to display
                                  nB(allowed) B frames in the GROUP
    receive pattern    <--        send pattern
                                else if ((diff = ng/rs − tI × nI) ≥ 0)
                                  nP(allowed) = ⌊diff/tP⌋; drop all B frames
                                  compute a pattern to display
                                  nP(allowed) P frames in the GROUP
    receive pattern    <--        send pattern
                                else
                                  drop all B and P frames; display only every
                                  k-th I frame, with k derived from ng, nI, tI
                                  and rs (the pattern now involves ng/nI groups)
    receive pattern    <--        send pattern

During the transmission phase, the client monitors the network and OS load and calculates tw for a video group. tw is the difference between the time required for group display when the video recording rate (ru) is specified and the time required for group display when the video system rate (rs) is specified. If tw ≥ 0, i.e., the system has enough power/bandwidth to process the recording frame rate (ru), then the parameter tw is sent to the server. The server sends frames without dropping any of them and waits the specified time between sending out two consecutive frames. The waiting period is decided as shown in Table 2. If tw < 0, i.e., the recording rate (ru) is higher than the system can provide (rs), then adaptation is enforced by dropping some or all B frames first, then some or all P frames, and lastly I frames (see Table 2). The number of B or P or I frames which are allowed to be displayed is used to compute the drop pattern, and the pattern is sent to the server. The server uses the pattern to dynamically match up the appropriate video frames and audio frames and sends them to the client. Examples 1 and 2 in Table 3 demonstrate the pattern decision when a modification (e.g., of nB(allowed)) is requested.
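The renegotiation arithmetic above can be sketched in C as follows. This is a simplified rendering with our own names; tI, tP, tB are the measured average per-frame processing times, and the last case is reduced to its net effect of thinning I frames.

    #include <math.h>
    #include <stdio.h>

    /* Simplified client-side adaptation decision following Table 2. */
    typedef struct { int keep_I, keep_P, keep_B; } plan_t;

    plan_t decide(double ng, double ru, double rs,
                  int nI, int nP, int nB,
                  double tI, double tP, double tB)
    {
        plan_t p = { nI, nP, nB };
        double tw = ng / ru - ng / rs;    /* >= 0: system keeps up, just wait */
        if (tw >= 0.0)
            return p;

        double diff = ng / rs - tI * nI - tP * nP;
        if (diff >= 0.0) {                /* drop only some B frames          */
            p.keep_B = (int)floor(diff / tB);
            if (p.keep_B > nB) p.keep_B = nB;
        } else if ((diff = ng / rs - tI * nI) >= 0.0) {
            p.keep_B = 0;                 /* drop all B, keep some P frames   */
            p.keep_P = (int)floor(diff / tP);
            if (p.keep_P > nP) p.keep_P = nP;
        } else {
            p.keep_B = p.keep_P = 0;      /* drop all B and P, thin I frames  */
            p.keep_I = (int)floor((ng / rs) / tI);
            if (p.keep_I < 1) p.keep_I = 1;
            if (p.keep_I > nI) p.keep_I = nI;
        }
        return p;
    }

    int main(void)
    {
        /* Illustrative: 12-frame group recorded at 30 fps, system rate 20 fps */
        plan_t p = decide(12, 30, 20, 2, 4, 6, 0.050, 0.050, 0.060);
        printf("keep I=%d P=%d B=%d\n", p.keep_I, p.keep_P, p.keep_B);
        /* prints: keep I=2 P=4 B=5 (one B frame dropped per group) */
        return 0;
    }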

Table 3. Demonstration of drop pattern decision (examples).

    Example 1: nB = 6, nP = 4, nI = 2
      nB(allowed) = 6:   I B1 P B2 P B3 I B4 P B5 P B6
      nB(allowed) = 3:   I x  P B2 P x  I B4 P x  P B6

    Example 2: nB(allowed) = 0, nP = 9, nI = 3
      nP(allowed) = 9:   I P1 P2 P3 I P4 P5 P6 I P7 P8 P9
      nP(allowed) = 4:   I P1 P2 x  I P4 x  x  I P7 x  x
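The even spacing in these examples can be produced by a simple quota rule. The following sketch is our construction (it reproduces the pattern of Example 1): the j-th candidate frame is kept whenever floor(j · allowed / total) increases.

    #include <stdio.h>

    /* Evenly spread `allowed` kept frames over `total` candidates, as in
       Table 3. keep[j] = 1 means the (j+1)-th B (or P) frame in the group
       is displayed; 0 means it is dropped. */
    void drop_pattern(int total, int allowed, int keep[])
    {
        for (int j = 1; j <= total; j++)
            keep[j - 1] = (j * allowed / total) > ((j - 1) * allowed / total);
    }

    int main(void)
    {
        int keep[6];
        drop_pattern(6, 3, keep);        /* Example 1: nB = 6, nB(allowed) = 3 */
        for (int j = 0; j < 6; j++)
            printf("B%d:%s ", j + 1, keep[j] ? "keep" : "x");
        printf("\n");                    /* B1:x B2:keep B3:x B4:keep B5:x B6:keep */
        return 0;
    }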

Synchronization control at the server side. The server operates on separate MPEG-1 audio and video streams. The audio/video frames are indexed with time-stamps indicating the desired playout time within the stream on the time axis. The adaptive synchronization scheme provides the following control:

• Utilizing the client/server protocol, the server gets the video drop pattern within a group and determines which two video frames will be selected to be sent out consecutively.

• The server calculates the difference $TS^V_{next} - TS^V_{current}$ between the time stamps of those two video frames.

• The result is used to decide how much audio data will be matched up with the current video frame. The match-up process is done at the frame level (video frame vs. audio frames). Because of the different frame rates of the audio and video streams, one audio frame will almost always span two video frames (see figure 2). We can choose to let the overlapping audio frame match up with the first video frame (i.e., the current frame) for a number of reasons. Firstly, no matter how precisely synchronization is provided at the server side, the network/OS will almost always introduce jitter. Secondly, a skew of +80 ms or −80 ms is undetectable by the human eye and is considered "in-sync". This range is actually large enough to cover the mismatch of one audio frame.3

Figure 2. Audio match-up when audio data spans over two video frames.


• The server packetizes the audio/video frames into one multiplexed APDU (Application Protocol Data Unit). The multiplexed APDU consists of one video frame followed by its matched-up corresponding audio frames. The header of the APDU includes the time stamp, serving as a sequence number, as well as auxiliary information such as the audio/no-audio indicator and the starting and ending time stamps of the corresponding audio frames. The advantage of this approach is that the related synchronization information is delivered together on one communication channel, no additional synchronization channel is needed, and no additional delay is introduced between related audio/video LDUs. In general, if the underlying network/OS provides different levels of guaranteed QoS services, then the multiplexing approach is not the best solution, because the resulting QoS would have to match the requirements of all involved media (e.g., reliability is dominated by the most stringent medium), which might result in pessimistic resource allocation. However, in our UNIX and Internet best-effort environment we do not have various levels of underlying QoS-guaranteed services. The multiplexing approach can therefore be justified. The individual steps of the synchronization control at the server side are shown in figure 3.

Figure 3. Synchronization protocol at the server side.
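The multiplexed APDU described in the last bullet might be laid out as follows in C. This is a hedged sketch: the paper names the time stamp/sequence number, the audio/no-audio indicator, and the audio start/end time stamps; the remaining fields and all identifiers are our assumptions.

    #include <stdint.h>

    /* Sketch of the multiplexed APDU of Section 3.2.4: one video frame
       followed by its matched-up audio frames on a single UDP channel. */
    struct apdu_header {
        uint32_t video_ts;        /* video frame time stamp, doubles as seq. no. */
        uint8_t  has_audio;       /* audio / no-audio indicator                  */
        uint32_t audio_ts_start;  /* time stamp of first matched audio frame     */
        uint32_t audio_ts_end;    /* time stamp of last matched audio frame      */
        uint32_t video_len;       /* bytes of video payload that follow          */
        uint32_t audio_len;       /* bytes of audio payload after the video      */
    };
    /* Payload layout: [apdu_header][video frame][matched audio frames] */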

Synchronization control at the client side. At the client side the synchronization control can be described as follows:

    while (not end of playback) {
        QoS monitoring;                    /* collect system rate, processing time, etc. */
        read network and de-multiplexing;  /* receive data and put into audio/video buffers */
        estimate audio playback time;      /* see Audio Playback Estimation */
        retrieve and playback audio;
        estimate video waiting time;       /* see Synchronization Scheme during
                                              Audio/Video Playback */
        if (video waiting time >= 0)
            retrieve and playback video;
        else
            skip video;
        wait(video waiting time);
    }

This protocol is a simple and effective way to guarantee lip synchronization. There are two interesting situations which need closer explanation:

• Audio playback estimation. We need to decide how many audio data frames ($n_A$) have to be decoded and sent to the audio device between the playback of two consecutive video frames. The theoretical number is $n_A = \lceil (TS^V_{next} - TS^V_{current}) / t^A_{play} \rceil$, where $TS^V_{current}$ is the time-stamp of the current video frame (mapped to the local clock), $TS^V_{next}$ is the time-stamp of the next video frame, and $t^A_{play}$ is the playout time of one audio frame. We start at time $TS^A_0$ to decode and send $n_A$ audio frames to the audio device (device buffer). Let the finishing time of the 'audio decoding/sending to audio device' process for those $n_A$ audio frames be $TS^A_{end}$. Then $TS^V_{current} - TS^A_{end}$ is the time left to process and display the current video frame. However, due to UNIX/Internet non-deterministic behavior, we may end up with an audio gap, because the actual processing time of the current video frame $t^V_{current}$ may be longer than $TS^V_{current} - TS^A_{end}$ (see figure 4). The length of the audio gap is $t^V_{current} - (TS^V_{current} - TS^A_{end})$ (if > 0) (we assume that there is no delay between starting audio decode and starting audio play in the audio device, because that time difference is negligible). One way to solve the gap problem is to send $\Delta = \lceil (t^V_{max} - t^V_{min}) / t^A_{play} \rceil$ audio frames to the audio decoder in addition to $n_A$. Although this estimation is good enough most of the time, it can also fail. For example, this approach will fail when the system has a sudden big load increase and $t^V_{current}$ lasts much longer than $t^V_{max}$. In our experiments, we use an upper bound h > 1. The choice of h varies depending on the audio device buffer size and the audio stream type. In our experiments we use h = 20, which corresponds to about half a second of audio playout time.4 Note that we only limit the "step-ahead" amount to a total of h = 20; the h audio frames are not added every time. For example, if in the previous round video is late by about 5 audio-frame playing times, then only 5 more audio frames are added to $n_A$. If, in the previous round, video is not late, then no additional frames are added to $n_A$.

Figure 4. Audio gap issue.

• Synchronization scheme during audio/video playback. At the time $TS^A_{end}$, we use $t_I$, $t_P$, $t_B$ as the estimation (average times based on past values) of processing times for
I, P, and B frames, respectively. We must make a decision at the end of audio decoding whether the next video frame will be decoded or not, to achieve lip synchronization. The control decision is made as follows:

– If $t_B > TS^V_{current} - TS^A_{end} + 80\,ms$ (audio ahead of video), then skip that B frame. For a P-type frame, even if $t_P > TS^V_{current} - TS^A_{end} + 80\,ms$, we still decode and display it, unless it is the last one before the next I frame (this could happen because of a special frame pattern or because the adaptation intentionally dropped all B frames). For an I-type frame, even if $t_I > TS^V_{current} - TS^A_{end} + 80\,ms$, we decode and display it, unless only I frames are left downstream or the whole GROUP is late. After playing the late I or P frames, we try to skip B frames in the next steps to catch up. The trade-off is being "out-of-sync" for a short period of time while avoiding skipping a large number of frames, which would certainly cause visual disturbances and most likely introduce "out-of-sync" behavior when playing the next GROUP of frames.
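A sketch of this control decision in C follows. It is our simplification: the 80 ms in-sync margin and the estimates tI, tP, tB come from the text above, while the helper's structure and flags are assumed.

    /* Client-side decision at the end of audio decoding (time TS_A_end):
       decode the next video frame or skip it. Times are in ms. */
    enum ftype { F_I, F_P, F_B };

    int should_decode(enum ftype type, double t_est /* tI, tP or tB */,
                      double ts_v_current, double ts_a_end,
                      int last_P_before_I, int only_I_left, int group_late)
    {
        double slack = ts_v_current - ts_a_end + 80.0;  /* 80 ms margin */

        if (t_est <= slack)
            return 1;                      /* still in time: decode and display */
        switch (type) {
        case F_B: return 0;                /* audio ahead of video: skip B      */
        case F_P: return !last_P_before_I; /* play late P unless last before I  */
        case F_I: return !(only_I_left || group_late);
        }
        return 1;
    }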

For the waiting operation, we follow the concept of Restricted Blocking, which was introduced by Steinmetz [25]. While waiting for an audio segment to finish playback, the
system performs other actions, like reading the network (receiving more data from the server), de-multiplexing, and monitoring the system status.

3.2.5. Jitter and skew analysis. Because we are considering the "best-effort" environment, we cannot precisely monitor, predict, and control the system operation and the network behavior. Therefore, jitter could be introduced in many places along the end-to-end path of our adaptive VOD system. For example, at the server side, the data retrieval and sending times differ from frame to frame due to the size of the frame, the server load, etc. At the network level, the uncertainty of UDP protocol processing (network delay, packet receiving order, data loss rate, etc.) also introduces jitter. At the client side, the system load, the different decoding times for different types of frames, the monitoring process, and buffering all contribute to the jitter. It is neither possible nor necessary to measure jitter at each place of the end-to-end path. It is sufficient to define the overall jitter at the time of display (at the client side) as follows:

$$j(t) = \frac{1}{r_s(t)} - \frac{1}{r_u} \quad (1)$$

where the recording rate $r_u(t) = r_u$ is constant, because it is pre-specified for the playback process. With adaptive control theory, we can show that the jitter is bounded [15]. Furthermore, the synchronization skew s(t) between audio and video is also bounded. In fact, it can be shown [15] that

$$s(t) = j(t) = \frac{1}{r_s(t)} - \frac{1}{r_u} \quad (2)$$

Intuitively, the skew of the current frame can be expressed as the skew of the previously displayed frame plus the jitter that the system introduces by processing the current frame. Normally, the skew would accumulate and could get out of control. However, with our re-synchronization enforcement, the skew is wiped out at each re-synchronization point; therefore, the skew of the following frame is simply the jitter.
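For a feel of the magnitude (illustrative numbers, not the paper's measurements): with a recording rate of $r_u = 30$ fps and a momentary system rate of $r_s(t) = 25$ fps,

$$j(t) = s(t) = \frac{1}{r_s(t)} - \frac{1}{r_u} = \frac{1}{25} - \frac{1}{30} = 40\,\mathrm{ms} - 33.3\,\mathrm{ms} \approx 6.7\,\mathrm{ms},$$

well within the ±80 ms lip-sync range quoted in Section 2.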

3.3. Reservation-based synchronization

In this section, an end-to-end QoS-aware environment is used for the audio/video inter-stream synchronization. We discuss the VOD client and the VOD server which utilize this type of environment. The basic premise of the environment is that by bounding (1) the end-to-end delay of the coordinated streams and (2) the scheduling jitter, the inter-stream synchronization skews can be guaranteed. Our experiments show that with proper resource reservation, admission control, and efficient scheduling of resources such as CPU, memory and bandwidth, the proper synchronized temporal behavior of multimedia applications can be preserved. Both the client and server are responsible for synchronization. To achieve synchronized behavior, the reservation-based synchronization scheme integrates the client/server protocols with synchronization control at the server side and at the client side during the call establishment and transmission phases. A typical
video-on-demand application not only plays video and audio streams, but also provides capabilities for user interactions such as fast forward, rewind, and pause. Our synchronization protocol provides audio/video synchronization during the playout phase. It calculates and stores the synchronization information at regular time points to serve later user interactions. If the pause action is activated, the audio/video synchronization point is stored so that the streams can be restarted later. For fast forward and rewind operations, no inter-stream synchronization protocol is needed, because these operations operate on a single stream (video). The only important issue here is that once the operation stops, we need to recapture the nearest synchronization point and restart the synchronization protocol for playback. Buffering and network signaling are required to support the user interactions. This is, however, not within the scope of the paper and will not be discussed further. Our client/server protocol consists of three phases: negotiation phase, reservation phase, and transmission phase.

3.3.1. Negotiation phase. During the call establishment, the client opens a control connection for control and file information (see Table 4). In the next step, the directory of available movie clips (including both video and audio) at the server is sent to the client. The client selects a movie and sends the server a request to play it. The server receives the request and sends the client relevant information about the movie, such as whether audio is available or not, the video recording rate (ru), the maximum audio frame size ($fs^A_{max}$), the maximum video frame size ($fs^V_{max}$), and the audio sample frequency (afreq). According to this information, both the client and server calculate the required resources, such as CPU, memory, and network bandwidth, and reserve them from the resource servers embedded in the QualMan resource management. Note that our protocol descriptions use the notation T for a time point and t for a period of time.

Table 4. Information exchange during call establishment.

    Server                                   Client
                                     <--     Open TCP Connection for File Info
    Send File List                   -->     Receive File List
    Receive File Name                <--     Send File Name
    Send ru, fs^A_max, fs^V_max, afreq -->   Receive ru, fs^A_max, fs^V_max, afreq
        <-- Open ATM Connections for video and audio (network bandwidth reservation) -->
    CPU, memory reservation                  CPU, memory reservation

3.3.2. Reservation phase. The QualMan resource management includes a soft real-time CPU server for continuous media processing in the UNIX environment. The CPU server provides a scheme for real-time multimedia applications to reserve their CPU usage in the form of (period, CPU usage in percentage). A profile of the CPU usage percentage for various video/audio average frame sizes and frame rates is created offline using probing [12] and is used as the seed. In addition, QoS-aware memory allocation is introduced to minimize the frequency of page faults for continuous media and to provide predictable CPU execution times.

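To make the reservation step concrete, here is a minimal sketch with our own structure and field names, not QualMan's API; BW_MAX follows the definition in the next paragraph, and the profile value stands in for the probed seed.

    #include <stdio.h>

    /* Hypothetical reservation request in the (period, CPU%) form described
       above; all identifiers are ours. */
    struct reservation {
        double period_ms;   /* CPU period = one video frame period (1000/ru)  */
        double cpu_pct;     /* CPU share from the offline probing profile     */
        long   mem_bytes;   /* buffer memory for n_buf frames (Section 3.3.4) */
        double bw_bps;      /* BW_MAX = fs_max * ru * 8 (peak-rate reservation)*/
    };

    struct reservation make_request(double ru, long fs_max,
                                    int n_buf, double cpu_pct_profile)
    {
        struct reservation r;
        r.period_ms = 1000.0 / ru;                /* frame-based CPU period   */
        r.cpu_pct   = cpu_pct_profile;            /* profiled seed value      */
        r.mem_bytes = (long)n_buf * fs_max;       /* receiver buffer          */
        r.bw_bps    = (double)fs_max * ru * 8.0;  /* maximum bandwidth BW_MAX */
        return r;
    }

    int main(void)
    {
        /* Example: 30 fps, 10368-byte max frame, 3-frame buffer, 18% CPU */
        struct reservation r = make_request(30.0, 10368, 3, 18.0);
        printf("period %.1f ms, cpu %.0f%%, mem %ld B, bw %.0f bps\n",
               r.period_ms, r.cpu_pct, r.mem_bytes, r.bw_bps);
        return 0;
    }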

The network bandwidth is guaranteed with the provision of network bandwidth reservation support from the underlying network, such as the ATM high-speed network. By reserving the maximum bandwidth required for a connection, the network bandwidth guarantee is attained at the cost of under-utilizing the network. The maximum bandwidth (BW_MAX) includes sufficient bandwidth to accommodate the product of the maximum frame size of the stream and the recording frame rate of the stream.

3.3.3. Transmission phase. In the QualMan resource management, in order to achieve synchronization, we need to coordinate the CPU server and the communication server. The synchronization scheme negotiates a bandwidth contract with the communication server which corresponds to the frame-based shape enforced by the frame-rate controller. It means that every "k" ms one frame will be sent out (or received), and it implies that we need to provide frame-based CPU reservation5 and cooperation between the CPU server and the frame-rate controller in the communication server (see figure 5). Due to the process scheduling6 implemented in the CPU server, stream dependency can be preserved. Hence, the synchronization skew can be maintained as shown in figure 6(a), where one group of audio frames matches one video frame. In figure 5, we assumed that there is always sufficient CPU service time to process one frame during the frame period. However, this assumption might be violated if we do not have proper synchronization control at the startup time of a stream and the processing of one frame finishes across the frame period boundary, as shown in figure 6(b) and in more detail in figure 7.

Figure 5. Cooperation between the reservation-based CPU scheduler and the frame rate controller. The reservation-based CPU scheduler guarantees that there is sufficient CPU service time to process one frame during the frame period P, and the process gets CPU service time every P time units. For example, the receiver (or sender) gets its CPU service at T0, starts to decode and play (or transmit) a frame at T1, and finishes the processing of one frame at T2. On the other hand, the frame rate controller will block the frame playout (or transmission) for a period of time tbk, according to the negotiated bandwidth contract with the communication server, to ensure the negotiated traffic shape. This means that the receiver (or sender) gets another CPU service at T0′, but will not process frame i + 1 until T1′.


Figure 6. Startup time synchronization problem. Figure 6(a) shows the ideal case where one group of audio frames matches one video frame in each CPU period P, which is equal to the video frame period. Figure 6(b) shows the difference in startup times between the two streams; the skew is equal to $T^V_{start} - T^A_{start}$. The synchronization protocol solves the problem of startup synchronization such that the stream dependency can be preserved, as shown in figure 6(c).

Figure 7. Detailed view of the startup synchronization problem.

To coordinate the two streams, a synchronization facility, such as a semaphore, is used to synchronize them. Therefore, the stream dependency is preserved through coordination as shown in figure 6(c), where the audio stream pauses at T0 and the video stream pauses at T1. A centralized coordination management resumes the two streams in the next CPU period when it detects that both streams are ready to transmit or play their first frame.
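A minimal sketch of this startup coordination using POSIX threads and semaphores follows. It is our rendering of the described mechanism, not the QualMan implementation (compile with -lpthread).

    #include <pthread.h>
    #include <semaphore.h>
    #include <stdio.h>

    static sem_t ready;               /* signaled when a stream's first frame is ready */
    static sem_t go_audio, go_video;  /* released together by the coordinator          */

    static void *stream(void *arg)
    {
        sem_t *go = (sem_t *)arg;
        /* ... open device, prefetch first frame ... */
        sem_post(&ready);             /* "ready to play first frame"; then pause */
        sem_wait(go);                 /* blocked until the coordinator resumes us */
        /* ... per-period frame processing starts here, in the same CPU period */
        return NULL;
    }

    int main(void)
    {
        pthread_t a, v;
        sem_init(&ready, 0, 0);
        sem_init(&go_audio, 0, 0);
        sem_init(&go_video, 0, 0);
        pthread_create(&a, NULL, stream, &go_audio);
        pthread_create(&v, NULL, stream, &go_video);

        sem_wait(&ready);             /* wait until BOTH streams report ready */
        sem_wait(&ready);
        /* resume both at the start of the next CPU period, so neither stream
           starts mid-period and the skew of figure 6(b) is avoided */
        sem_post(&go_audio);
        sem_post(&go_video);

        pthread_join(a, NULL);
        pthread_join(v, NULL);
        puts("both streams started in the same period");
        return 0;
    }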


We do not discuss schedulers on the switches, because our focus is on the end-system design. The service discipline operating on the ATM switches may range from a simple FCFS (First Come First Serve) to a more complex one with an algorithmically defined priority-based service. [28] showed that even for a simple service discipline like FCFS, deterministic delay bounds can be obtained with provision of resource reservation from the underlying network, even when the sum of the peak rates of all connections is greater than the link speed, although the sum of the average rates of all connections has to be less than the link speed. We assume that there exists a maximum end-to-end delay bound in the underlying network, even if the bound is calculated for the "worst case". To achieve lip synchronization to the user's satisfaction in a reservation-based environment, we analyze in Section 3.3.4 the conditions under which the jitter within a stream and the synchronization skew between streams can be bounded.

3.3.4. Jitter and skew analysis. The reservation-based synchronization protocol needs to consider three types of delays and their impact on time and space complexity. The first type of delay is the variation of the frame inter-arrival time $\Delta T_{arrival}$, which might be caused by the jittery behavior of the sender, network, and receiver. $\Delta T_{arrival}$ has an impact on the size of the buffer space at the receiver side, and the buffer space allocation has an impact on the jitter and synchronization skew perceived by the user. The second type of delay is the jitter delay j(t) of the video frames. The jitter delay might be caused by the scheduling imprecision of display tasks between the buffer space and the display device. The third type of delay is the synchronization skew s(t) between the playout video and audio streams at the receiver side, caused by the scheduling imprecision of decomposition/playout tasks. We analyze each delay type below in more detail.

The audio/video playout system empties data from the receiver buffer at the recording frame rate to maintain a smooth playback. A data frame, APDU (Application Protocol Data Unit), is transmitted to the receiver buffer by sending out a number of TPDUs (Transport Protocol Data Units). The time spent along this path is $t_{apdu} + EED$, where $t_{apdu}$ is the processing time to send out all TPDUs comprising one APDU, and EED is the end-to-end delay for the last TPDU, as shown in figure 8. The resource (CPU, memory, and network bandwidth) reservation scheme and the frame rate controller guarantee the transmission of frames in a timely fashion (once every frame period). Hence, the frame period (P) is the upper bound of $t_{apdu}$. There exists a maximum end-to-end delay ($EED_{max}$) supported by the underlying network, as discussed in Section 3.3.2, and a minimum end-to-end delay ($EED_{min}$), i.e., the propagation delay. Figure 9 shows the transmission of the ith frame from the sender's buffer at $T^i_S$ and its arrival at the receiver's buffer at $T^i_R$. The variation of the frame arrival time, $\Delta T^i_{arrival}$, is calculated as the expected arrival time ($T^1_R + (i-1) \cdot P$) minus the actual arrival time ($T^i_R$):

$$\Delta T^i_{arrival} = T^1_R + (i-1) \cdot P - T^i_R = \left(T^1_S + T^1_{apdu} + EED^1 + (i-1) \cdot P\right) - \left(T^1_S + (i-1) \cdot P + T^i_{apdu} + EED^i\right) = \left(T^1_{apdu} - T^i_{apdu}\right) + \left(EED^1 - EED^i\right) \quad (3)$$


Figure 8. End-to-end APDU time.

Figure 9. Derivation of sending and receiving time-stamps.

Therefore, $\Delta T^i_{arrival}$ is bounded as follows:

$$\left|\Delta T^i_{arrival}\right| = \left|\left(T^1_{apdu} - T^i_{apdu}\right) + \left(EED^1 - EED^i\right)\right| \le \left|T^1_{apdu} - T^i_{apdu}\right| + \left|EED^1 - EED^i\right| \le P + (EED_{max} - EED_{min}) \quad (4)$$

If $\Delta T^i_{arrival}$ is negative, the ith frame arrives late, and thus it is necessary to prefetch frames to avoid buffer underflow. The number of prefetched frames is $|\lceil \Delta T^i_{arrival}/P \rceil|$. If $\Delta T^i_{arrival}$ is positive, the ith frame arrives early, and thus the frame needs to be stored in the buffer. The number of frames we need to store in the receiver's buffer due to the frame's early arrival is also $|\lceil \Delta T^i_{arrival}/P \rceil|$. However, $|\lceil \Delta T^i_{arrival}/P \rceil|$ is bounded by $n_{buf}$ ($= 1 + \lceil (EED_{max} - EED_{min})/P \rceil$), calculated from Eq. (4). That is, the receiver is guaranteed to be able to retrieve the frames from the buffer every period P without delay if $n_{buf}$ frames are prefetched and the buffer size is at least equal to $n_{buf} \cdot fs_{max}$.
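The buffer sizing can be computed directly from Eq. (4); the numbers below are illustrative, not measurements from the paper.

    #include <math.h>
    #include <stdio.h>

    /* Receiver buffer sizing: n_buf = 1 + ceil((EED_max - EED_min)/P). */
    int main(void)
    {
        double P = 33.3;          /* frame period in ms (about 30 fps)       */
        double eed_max = 60.0;    /* worst-case end-to-end delay bound in ms */
        double eed_min = 10.0;    /* propagation delay in ms                 */
        long fs_max = 10368;      /* maximum frame size in bytes             */

        int n_buf = 1 + (int)ceil((eed_max - eed_min) / P);
        printf("prefetch n_buf = %d frames, buffer >= %ld bytes\n",
               n_buf, n_buf * fs_max);   /* here: 3 frames, 31104 bytes */
        return 0;
    }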


Figure 10. Jitter accumulation during playout.

Because we measure the jitter at the end of the frame processing (at the presentation time), the jitter j(t) is computed as follows:

$$|j(t)| = \left|\left(T^1_{play} + t^1_{play} + (i-1) \cdot P\right) - \left(T^1_{play} + (i-1) \cdot P + t^i_{play}\right)\right| = \left|t^1_{play} - t^i_{play}\right| \le P \quad (5)$$

where $t^i_{play}$ is the playout time of the ith frame, $(T^1_{play} + t^1_{play} + (i-1) \cdot P)$ represents the expected playout time, and $(T^1_{play} + (i-1) \cdot P + t^i_{play})$ represents the actual playout time in interval i (see figure 10). The stream dependency is restored by using a synchronization mechanism to solve the startup time synchronization problem, as shown in figure 6. This approach implies that the synchronization skew, s(t), between the two streams is equal to the bounded jitter:7

$$s(t) = j(t) \le P \quad (6)$$

4. Implementation and benchmarking of synchronization protocols

We describe the implementation of the representative synchronization protocols in their respective environments. Note that both synchronization protocols are implemented at the application level in order to allow a fair performance analysis and comparison, and the generic measured QoS metric is the synchronization skew. We picked a couple of benchmark audio/video clips from an educational scenario that were easily accessible to us. These clips were used with both protocols. We decided on using clips of teachers giving lectures as benchmarks, because lip synchronization is very important in the playback of lectures. Especially, as we observed, for foreign students the lip synchronization is crucial when using taped and digitized lectures as auxiliary educational material.


In terms of background load, for the adaptive synchronization protocol we use the existing load on the Internet and shared UNIX stations, and we do not add any additional load, because there is already an existing load causing jitter and skew variations, as our results show. In the case of the reservation-based synchronization protocol, we add video transmission as an additional background load, because the ATM network is not heavily loaded. The OS load includes the existing load of the shared UNIX workstation and the additional video transmission load.

4.1. MPEG audio and video benchmarks

We have selected two representative movies to show our results. The MPEG-1 video stream is decoded using a software-based MPEG decoder based on Berkeley's MPEG player [22]. The decoded MPEG-1 audio stream is sent to the audio device, which then fully controls the playback of the audio data in its buffer (the audio device buffer, which is different from the client audio ring buffer). The measured metric of interest is the synchronization skew between audio and video at the client side during the playback phase. We measure the skew at the time before the video frame is sent to the display, as the time difference between audio and its corresponding video. The content of both videos is a teacher giving presentations. These video clips were taped during a Multimedia Systems class in Spring 1996. They were digitized, MPEG-1 compressed and stored in our VOD server. The video clips have a frame size of 320 × 240 pixels (W × H), and the audio streams are of type layer II, single channel, with a sample rate of 44.1 kHz (1152 samples/frame). See Table 5 for the details of the characteristics.8

Table 5. Characteristics of tested videos.

    Video Name               klara1.mpg        lect-iso.mpg
    Total Frames             3880              3978
    Recording Rate           7 f/s             10 f/s
    Pattern                  IBPBPBI...        IBBPBBI...
    Ave. I frame size        8775.1 Bytes      7471.5 Bytes
    Ave. P frame size        5019.8 Bytes      3673.3 Bytes
    Ave. B frame size        4359.5 Bytes      2774.8 Bytes
    Max frame size           10368 Bytes       8704 Bytes
    Total Size               20487496 Bytes    14773824 Bytes

    Audio Name               klara1.mp2        lect-iso.mp2
    Total Frames             21208             15236
    Total Playing Time       9′14″             6′38″
    Ave. Audio Frame Size    104.5 Bytes       104.5 Bytes

4.2. Adaptive synchronization

4.2.1. Implementation issues. In our adaptive synchronization scheme, the communication flow between the client and the server uses two communication channels. The first
one is a bidirectional TCP channel, which is used to transmit the QoS control information and the adaptation information, such as clip structure information, rate information, and the client's command to play. The second one is a unidirectional UDP channel, which is used to send the multiplexed audio and video data on a per-video-frame basis from the server to the client. In the client system, we have two ring buffers, one for video and one for audio. Buffering at the client serves to counteract the variations of network and OS delays. The buffers are utilized as follows. First, the buffers serve during the initial probing process, in which video/audio data is pre-fetched. Second, when the client receives multiplexed video/audio data from the server, it de-multiplexes the stream and puts the video and audio data into the individual buffers. The prevention of buffer overflow and underflow for video is provided by the adaptive algorithm. The data in the buffer is queued in time-stamp order. In doing so, we can manage out-of-order arriving data by inserting late data into the correct position, if it is not too late.

The system software structure (see figure 11) consists of two parts, at the client and the server. The software structure for the call establishment phase (upper part in figure 11) implements, on top of TCP/IP, the protocol in Table 1. The software structure for the transmission phase (lower part in figure 11) implements the transmission protocol on top of UDP/IP and the adaptation/renegotiation protocol on top of TCP/IP described in Table 2. The monitoring and adaptation services at the server and client side are tightly coupled in a tuning system and create a complex adaptive/feedback control loop to adapt against changes introduced by the user, by the network/OS, or between application resources available during the recording and playback phases [14].

Figure 11. System software structure.

4.2.2. Experimental setup. The underlying network for testing our adaptive synchronization scheme is our departmental network, a 10 Mbps Ethernet with about 200 machines attached. The tests are conducted in the late evening hours, when the network is lightly loaded. The adaptive synchronization protocol has been tested on various video clips. The first clip (klara1.mpg and klara1.mp2) is played in full color. The recording rate is 7 fps and the system rate on our platform is 6 fps. Therefore, about 50% of the B frames are dropped at the server by the adaptation algorithm. It is important to note that our adaptive scheme is scalable and works for various rate changes between the recording and the system rate (e.g., a change from a requested 20 fps to a system rate of 7 or 5 fps, depending on available resources). Extensive tests and results are shown in [14]. The second clip (lect-iso.mpg and lect-iso.mp2) is played in black and white. This clip provides us with a scenario in which the video recording rate is equal to the video system rate, because the underlying network/OS has enough resources to process the clip without intentionally dropping frames at the server.

4.2.3. Results and evaluation. The results of testing our adaptive synchronization scheme with the first clip (Test 1) are shown on the left side of figure 12. The average skew is 24.08 ms with a standard deviation of 49.05 ms. Note that the skew can be positive or negative. A positive skew describes the situation when audio is ahead of video; a negative skew represents the case when video is ahead of audio.

Figure 12. Skew of Test 1. The left-side figure shows the whole test with the first clip. The right-side figure shows the details of a short period when synchronization gets into the "out-of-sync" state.
When we magnify some data from the left side of figure 12, we can see the drop pattern in the area of strongest skew in more detail, as shown on the right side of figure 12. The
right side of figure 12 shows the playback of video between frame numbers 2440 and 2494.9 The diamond-shaped points indicate the B frames dropped by the server, and the square-shaped points (at 2481 and 2485) indicate the B frames dropped by the client according to our synchronization algorithm. Starting at the 2478th frame, an out-of-sync situation occurs (audio ahead of video). Because the 2478th frame is a P frame, we display it. The 2479th frame is a B frame which has already been dropped by the server, and the next one is a P frame which we must decode and display. The skew is therefore accumulated even more. It is not until the 2481st frame (and further the 2485th) that a B frame is available to be dropped and the skew can be reduced. The length of the out-of-sync period is 11 video frames and it lasts
about 2 seconds. This scenario happens because of a sudden (and short) system overload in which the system cannot process video frames in time. Our results also show that the algorithms for lip synchronization are effective and respond quickly to the dynamic status of the system. The results of the test with the second clip (Test 2), shown in figure 13, present the average skew of 48.26 ms and standard deviation of 27.69 ms. The standard deviation for the second test clip is much lower than for the first one. This was expected because the system is in a much more stable status during the second test. However, the average values of the second test is higher than in the first one. When comparing the shaded area between fig. 12 (left side) and fig. 13, the shaded area in fig.13 moved up. It emerges that in the second test audio is most of the time ahead of video, where in the first test the two cases: (1) audio ahead of video and (2) video ahead of audio oscillate. This behavior in both tests reflects our policy of preserving the recorded audio quality and adapting video quality during the playback. In the first test, due to the adaptation of video frames, some video frames are dropped, and the following video frames get ahead of audio. In the second test, no video frames are dropped, and because audio is the master stream and gets priority during the playback, it is most of the time ahead of video. Overall, 84.99% data in Test 1 and 87.2% data in Test 2 are in the user desired skew range of (−80, 80) ms. 14.06% data in Test 1, and 12.06% data in Test 2 are in the user acceptable skew range of (80,160) ms. 0.875% data in Test 1, and 0.0508% data in Test 2 are in range of (−160, −80) ms and 0.035% data in Test 1 are beyond the bound of 160 ms, and −160 ms. The skew frequency probabilities during the playback are shown in figure 14. 4.2.4. Discussion. There exist some protocols today suitable for distributed real-time video/audio applications, such as RT P and RTCP. RT P [24] is an end-to-end protocol based on UDP/IP to deliver the multimedia data and is primarily designed to satisfy the needs

Figure 14. Skew frequency occurrence. The left side shows the frequency occurrence for Test 1; the right side shows the occurrence results for Test 2.

A header containing a timestamp and a sequence number is added to each UDP packet. RTP also provides payload-type, framing, and source-identification functions. Its companion protocol RTCP is used for the control of RTP: RTCP monitors the quality of service and periodically distributes control packets containing QoS information to all session participants. Certainly, our adaptive algorithm could be implemented on top of RTP and RTCP. However, we choose not to do so because (1) RTP and RTCP cover more general cases of stream data delivery, such as when two or more audio streams are transmitted between session participants; in our case we focus on inter-stream synchronization between two different MPEG-1 media streams (audio and video). (2) With RTP, only intra-stream synchronization is done using timestamps; inter-stream synchronization still has to be done at the application level on top of RTP. (3) RTP and RTCP do not address how to do intra-stream and inter-stream adaptation in response to network bandwidth and computer load changes; they rely on application protocols to adapt, utilizing the implementation from the RTP/RTCP protocol stack. (4) RTCP sends out monitored QoS information periodically. This is not necessary in our case, because our algorithm sends back rate-change information only when the network or computer load changes.
Rothermel and Helbig's "An Adaptive Protocol for Synchronizing Media Streams" [18] is also of interest, and we compare it with our adaptive scheme; we refer to their protocol as the R&H Protocol. Both the R&H Protocol and our scheme use the master/slave model to achieve adaptive synchronization, and both schemes transmit control messages only when necessary (in our scheme, control information is exchanged when the network or computer load changes or when the user demands a lower playback frame rate, while the R&H scheme exchanges control messages when the network condition changes). However, several issues differentiate our scheme from the R&H Protocol. First, the R&H scheme considers the more general case of stream media and does not take the media structure into consideration. Our adaptive scheme targets specifically MPEG video and audio streams and explicitly incorporates the MPEG coding structure in the adaptive synchronization. Second, the R&H scheme allows the media delivery rate to be changed in any stream, while our scheme assumes that the audio stream should be transmitted and played back at its original recording rate. Third, the R&H scheme considers skipping video frames only at the client (sink) side, while our scheme allows dropping video frames at both the server and client sides. By skipping video frames at the server side, our scheme avoids unnecessary video data transmission and helps alleviate network congestion when the network bandwidth is limited. Fourth, video frame dropping is not a linear operation: the number of frames to be dropped depends on the frame type and its location in the MPEG sequence. For example, if an I frame is dropped, then all following P and B frames before the next I frame have to be dropped as well; if a B frame is dropped, there is no effect on other frames. A sketch of this drop-dependency rule is given below. This fact may cause larger skew in our case than in the R&H scheme, although the R&H scheme does not discuss the details of how to drop video frames. Also note that the results in this paper are based on real video/audio playback, while the results reported for the R&H scheme are based on simulation.
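To make the dependency rule concrete, the following is a minimal C sketch of drop propagation within one GOP. The GOP layout, the frame-type encoding, and the treatment of a dropped P frame (handled like a dropped I frame, as MPEG prediction suggests) are our illustrative assumptions, not the paper's actual server code.

#include <stdio.h>

typedef enum { I_FRAME, P_FRAME, B_FRAME } FrameType;

/* If frame `drop` is removed, mark every frame that can no longer be
   decoded: dropping a reference frame (I or P) invalidates all following
   P and B frames up to the next I frame; dropping a B frame affects
   nothing else, since no other frame is predicted from a B frame. */
static void propagate_drop(const FrameType gop[], int n, int drop, int lost[])
{
    lost[drop] = 1;
    if (gop[drop] == B_FRAME)
        return;
    for (int i = drop + 1; i < n && gop[i] != I_FRAME; i++)
        lost[i] = 1;
}

int main(void)
{
    FrameType gop[] = { I_FRAME, B_FRAME, P_FRAME, B_FRAME, P_FRAME, B_FRAME };
    int lost[6] = { 0 };
    propagate_drop(gop, 6, 2, lost);   /* drop the first P frame */
    for (int i = 0; i < 6; i++)
        printf("frame %d: %s\n", i, lost[i] ? "lost" : "decodable");
    return 0;
}

Dropping the P frame at position 2 in the IBPBPB pattern also invalidates the remaining B, P, and B frames of that GOP, which illustrates why the number of effectively lost frames is nonlinear in the number of dropped frames.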
Finally, we want to discuss the usage of buffers and their effects on adaptive synchronization. It is well known that the use of buffers improves the performance of video/audio playback; in particular, it decreases jitter and helps intra-stream synchronization.


In our scheme, buffers are used in the playback of both the audio and video streams. In practice, we divide the usage of buffers into the system level, such as the UDP/IP socket buffer space and the audio device buffer, and the application level, such as the audio and video buffers in our implementation. Increasing the buffer space for UDP/IP sockets, for example from 4 KB to 16 KB, greatly decreases packet loss and therefore the drop rate of video frames (a sketch of such a buffer adjustment follows below). However, this does not necessarily improve the skew measurements. As long as the system rate, including both the network data rate and the computer load, is lower than the recording rate, some video frames have to be dropped. Depending on the type and position of the dropped MPEG video frames, the skew can still exceed the acceptable 160 ms. Larger video buffers at the application level also do not help much with the skew if audio is behind video. In the case of fast-forward, reverse, and pause operations, we simply disable the audio playback; therefore, no action on synchronization is needed. The only problem is that after switching from normal playback to fast-forward, reverse, or pause, the audio playback might continue for as much as 1/2 second. This is because we pre-load 20 audio frames (about 1/2 second) into the audio device to avoid audio gaps. At the point of resuming normal playback, a delay of 1/2 second may be needed to load audio data.
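As an illustration of the system-level buffer adjustment mentioned above, here is a minimal C sketch using the standard setsockopt() call; the function name and the surrounding code are ours, and the actual player code may differ. Note that the kernel may round or cap the requested value depending on system limits.

#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Enlarge the per-socket UDP receive buffer. */
static int set_udp_recv_buffer(int sock, int bytes)
{
    if (setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) < 0) {
        perror("setsockopt(SO_RCVBUF)");
        return -1;
    }
    return 0;
}

int main(void)
{
    int sock = socket(AF_INET, SOCK_DGRAM, 0);
    if (sock < 0) { perror("socket"); return 1; }
    /* 4 KB -> 16 KB, as in the measurements discussed above */
    return set_udp_recv_buffer(sock, 16 * 1024) == 0 ? 0 : 1;
}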

4.3. Reservation-based synchronization

4.3.1. Implementation issues. In this section, we present the implementation of the QoS-aware VOD system for the provision of lip synchronization. There are three communication channels between the VOD client and the server. The first one is a TCP connection which is used to transmit QoS control information, such as the requested audio/video file name, the clip structure information, and the frame rate. Two separate ATM connections are dedicated to sending multimedia data on a per-frame basis from server to client, one for audio and the other for video. Bandwidth reservation in the ATM network is based on the peak rate through the CBR (Constant Bit Rate) service. This reservation scheme ensures bandwidth guarantees, although it causes some degree of network under-utilization. Better bandwidth utilization is attainable from the statistical multiplexing gain if the VBR (Variable Bit Rate) service is supported by the underlying ATM infrastructure. The buffer in the client system is allocated according to the bound calculated in Section 3.3.4 to smooth the variations of network and operating system delays. To coordinate the two streams' playout time of the first frame and solve the startup synchronization problem, we apply semaphores in the synchronization model. Figure 15 shows the implementation of the synchronization model. The VOD server and VOD client are parent processes which handle connection setup and teardown and coordinate their child processes using semaphores. The VOD server parent process forks two child processes and waits for their responses through semaphore signaling until both are ready to send out their first frame. The semaphore value is initialized to zero; after waking up from its blocking semaphore wait calls, the parent increases the semaphore value by two, so that both child processes wake up from their blocking calls once they get CPU execution time. At the client end, the system operates similarly to the server end, but the coordination between the two streams concerns the playout of the first frame. A minimal sketch of this startup handshake appears below.
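The following is a minimal sketch of this startup handshake using System V semaphores. The split into a "ready" and a "go" semaphore and all names are our illustrative simplification of the coordination shown in figure 15, not the actual VOD source.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/wait.h>

union semun { int val; };   /* minimal definition; some systems already provide it */

static void sem_change(int semid, int delta)
{
    struct sembuf op = { 0, (short)delta, 0 };
    if (semop(semid, &op, 1) < 0) { perror("semop"); exit(1); }
}

int main(void)
{
    /* "ready" counts prepared children; "go" releases them to start. */
    int ready = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    int go    = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    union semun zero = { 0 };
    semctl(ready, 0, SETVAL, zero);
    semctl(go, 0, SETVAL, zero);

    for (int i = 0; i < 2; i++) {            /* audio child, video child  */
        if (fork() == 0) {
            /* ... set up connection, prefetch the first frame ...       */
            sem_change(ready, +1);           /* report "ready" to parent */
            sem_change(go, -1);              /* block until released     */
            printf("child %d: sending first frame\n", i);
            _exit(0);
        }
    }
    sem_change(ready, -1);                   /* wait for both children   */
    sem_change(ready, -1);
    sem_change(go, +2);                      /* release both together    */
    wait(NULL); wait(NULL);
    return 0;
}

The key property is that neither child sends (or plays) its first frame until both have signaled readiness, so the two streams start within one scheduling decision of each other.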

Figure 15. Audio/video streams coordination software structure.

The system software structure (see figure 16) consists of two parts, at the client and at the server. The software structure for the call establishment phase (upper part of figure 16) is implemented on top of QualMan to reserve CPU, memory, and bandwidth resources. The server and client communicate and exchange information through TCP/IP connections. The software structure for the transmission phase (lower part of figure 16) transfers data through the high-speed ATM network, enforces the QoS on top of QualMan, and synchronizes the startup times of the audio and video streams using UNIX semaphores. We have implemented our VOD application with its clients and servers on Sun Ultra Sparc 1 workstations running the Solaris 2.5 OS with 64 MB of physical memory. The client and the server are connected via a 155 Mbps departmental ATM network. The experimental ATM network is lightly loaded, with an average network load of 11 Mbps.

4.3.2. Results and evaluation. The movie clips used in our testbed are the sources (klara1.mpg and klara1.mp2) described in Section 4.1. The first experiment runs without any background traffic. The load of the UNIX system at the VOD client and server is light; that is, the machines run only default processes (e.g., X windows, an http server), and no additional application of ours was started. The CPU server reserves 20% every 50 ms for the audio/video servers and clients. The memory server starts with 5 MB serving the audio/video client processes. Figure 17(a) illustrates the skew measurements at the client site. The result shows that the skew is not only within the desirable range for lip synchronization, (−80, 80) ms, but that most (99.3%) of the skew results are in the much tighter range of (−10, 10) ms, with an average skew of 3.96 ms and a standard deviation of 0.003 ms.

Figure 16. Reservation-based system software structure.

A positive skew value represents the case when audio is ahead of video, and a negative skew value the case when video is ahead of audio. The second experiment adds a video stream from server to client with no CPU and memory reservation on either the server or the client side. This additional video stream imposes not only network load as background traffic, but also OS load on both the server and client sides. Table 6 shows the characteristics of the background video traffic. The result of the second experiment, shown in figure 17(b), gives an average skew of 4.15 ms and a standard deviation of 0.003 ms; 99.1% of the skew values are within the range (−10, 10) ms. The result shows that our QoS-aware VOD application delivers QoS guarantees in the presence of network and OS loads. This is exactly what we expect from a system with resource reservations and performance guarantees. Table 7 shows the average skews in these experiments.

Table 6. Characteristics of the background video stream.

  Video name        4dice.mpg
  Total frames      5040
  Recording rate    20 f/s
  Pattern           IBBPBBI ...
  Max frame size    18009 bytes
  Total size        28475679 bytes
  Peak bandwidth    2881 Kbps

Figure 17. Reservation-based experimental results. Figure 17(a) shows the synchronization skew in the first experiment without cross traffic. Figure 17(b) shows the synchronization skew in the second experiment with cross traffic.


Table 7. Average skews and percentage of the skews within the [−10 ms, 10 ms] bound. "Without" means without cross-traffic and "With" means with cross-traffic.

                        Average skew (ms)       Percentage in [−10 ms, 10 ms]
                        Without      With       Without      With
  Reservation-based     3.96         4.15       99.3         99.1

Two synchronized processes do not make CPU service reservations independently. They specify the dependency in the CPU reservation, and the scheduling algorithm implemented in the CPU server preserves this dependency. Therefore, the audio and video processes are placed into neighboring time slots. Since the CPU server reserves 20% every 50 ms for the audio/video servers and clients, according to Eq. (6) in Section 3.3.4 the synchronization skew is bounded by the period, 50 ms, which is also confirmed by our experimental results. However, this bound is calculated for the worst case, in which the audio and video processes are placed at the two opposite ends of a CPU reservation cycle. In practice, with the dependency preserved by the CPU server, the synchronization skew is smaller, resulting in a range of (−10, 10) ms for most frames in our measurements.
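Restated as a bound (our paraphrase; the exact form of Eq. (6) is given in Section 3.3.4): with a reservation of 20% of every cycle of length T = 50 ms, the worst case places the audio and video slots at opposite ends of a cycle, while the dependency-preserving placement keeps them in neighboring slots:

\[
|\delta_{\text{skew}}| \le T = 50\ \text{ms} \quad \text{(worst case: slots at opposite ends of the cycle)},
\]
\[
|\delta_{\text{skew}}| \lesssim 0.2\,T = 10\ \text{ms} \quad \text{(neighboring slots)}.
\]

The second reading, one slot of 20% of the cycle, is our inference from the reservation parameters; it is consistent with the measured (−10, 10) ms range.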

5. Performance evaluation and comparison

In this section, we compare the two synchronization protocols presented in the previous sections, point out the differences between them, and discuss the trade-offs in design, implementation, and performance from various perspectives, such as message exchange complexity, overhead, resource requirements, audio-video synchronization skew, and data loss rate.

• Message Exchange:
– Call Establishment Phase:
* The message size during this phase is larger for the adaptive synchronization protocol (ASP) than for the reservation-based synchronization protocol (RSP) because ASP needs, besides information such as file names, also the detailed MPEG structure, such as the number of frames in a group (n_g) and the numbers of I, P, and B frames in a group (n_I, n_P, n_B). This additional information is necessary in ASP for frame dropping; RSP does not need it due to the reservation mechanism and the end-to-end guarantees.
* The number of messages sent between client and server is the same in both cases, as Tables 1 and 4 show. We need 5 messages to establish the control connection and the transfer connection.
* The call establishment time is longer for RSP than for ASP. The reason is as follows: RSP not only stores the exchanged state information, but also


requests admission control and reservation operations from the underlying QualMan resource management to complete the setup.
* Probing time is part of both protocols (ASP and RSP) during the setup phase because both protocols need initial state information for negotiation/re-negotiation and adaptation. This period lasts roughly 1/2 to 1 second and depends on the pattern of the MPEG stream. Note, however, that the two protocols probe for different information. ASP probes for the system rate to obtain the offset between the recording data rate and the system rate for the adaptation algorithm. RSP does not need the system rate because it reserves and works with the recording rate; instead, RSP probes for the task processing times needed by the schedulability admission control.
– Video/Audio Delivery/Transmission Phase: In terms of the number of control messages, ASP needs more messages than RSP because ASP needs feedback information to adjust the sending rate. The frequency of the control messages in ASP depends on the underlying network condition and OS load, which change dynamically. We define the synchronization interval as the time period from one rate-changing point to the next. Assuming that there are n synchronization intervals throughout the transmission phase and one feedback message is needed for each synchronization interval, the complexity in the number of control messages is O(n). RSP does not need feedback control information because it has reserved resources along the path, hence no adjustment of the sending rate is necessary.

• Header Overhead: Both schemes add a special header to the APDU (Application Protocol Data Unit); however, the overhead in both cases is very small and can be neglected. Here, we consider only the application-level overhead, because the overhead analysis of underlying layers such as IP or ATM is well known.
– Adaptive approach: Due to the use of the UDP/IP protocol to transmit the video/audio streams, packet loss and out-of-order packets have to be handled properly. First of all, the choice of packet size is limited by the socket buffer, which is normally 64 KB at maximum. Although choosing a packet size larger than 5 KB can increase the throughput, in our experience it also increases the data loss rate to above 40%, which is not desirable. On the other hand, choosing a packet size that is too small introduces higher overhead. We decided to make the APDU a variable size between 3 KB and 5 KB; this range corresponds to the average size of an MPEG-1 B frame, 1/2 of a P frame, and 1/5 to 1/4 of an I frame. With this APDU size selection, we avoid the overhead of APDU segmentation and reassembly. We use a 4-byte header in which the first byte carries a sequence number to preserve the order of segments and the last 3 bytes carry the frame number. For example, the header overhead for the clip klara1.mpg can be calculated as follows (see also the sketch after this list): in one GOP (IBPBPB), 2 APDUs are needed for the I frame (the average I frame size is 8775 bytes, hence 2 APDUs per I frame), 2 APDUs are needed for the P frames (the average P frame size is 5019 bytes, hence 1 APDU per P frame, and there are 2 P frames in the GOP), and 3 APDUs are needed for the B frames (the average B frame size is 4359 bytes, hence 1 APDU per


B frame, and there are 3 B frames in the GOP). The total header overhead is therefore (2 + 2 + 3) × 4 / (8775 + 5019 × 2 + 4359 × 3) = 0.088%. Tests show similar results for other video clips.
– Reservation approach: In this approach, we do not need sequence numbers for error control or re-ordering purposes because ATM is a connection-oriented network with a very low error rate and guaranteed ordering. Therefore, there is no header overhead.

• Data Loss Rate: ASP has a higher data loss rate than RSP for the following reasons: (a) the underlying network/transport protocol is unreliable; (b) for video, the dropping mechanism and policy are part of the adaptation control. RSP uses reliable connections with the reservation based on the maximum bandwidth requirement over the ATM network, hence no dropping mechanism is deployed.

• Synchronization control and its complexity: Both schemes assume that the audio stream is the master stream and serves as the time axis; the video stream is synchronized with the audio. However, in terms of synchronization control, the adaptive scheme is more complex than the reservation-based scheme.
– Server side: The adaptive approach needs to dynamically change the output rate of the video stream and selectively drop frames. It also needs to handle the Dynamic A-V Match-up Problem identified in Section 3. Assuming that there are n synchronization intervals throughout the transmission phase and that the server needs k steps to complete one control operation, the complexity at the server for ASP is O(kn). On the other hand, the important issue for the reservation-based approach is the tight control of the starting synchronization point; the complexity at the server for RSP is O(1), which is constant.
– Client side: The adaptive approach needs to monitor the network and local load conditions, report this information to the server, and dynamically handle the video-audio match-up at each re-synchronization point. On the other hand, the synchronization property in the reservation-based approach is guaranteed by the underlying resource reservation (ATM, CPU scheduling, memory reservation, etc.), hence no additional instructions are required.

• Synchronization Skew Comparison: From the previous section, we saw that the skew with the adaptive approach is within the acceptable range. About 85%–87% of the time, the skew is in the user-desired range of (−80 ms, 80 ms), and the remaining values are mostly within (−160 ms, 160 ms). Although this is achieved without special resource requirements, it is sometimes not good enough to provide higher-quality VOD services; depending on the application requirements, this approach may not always be acceptable. The reservation-based approach always provides very good synchronization, and the skew values are tightly bounded within [−10 ms, 10 ms]. Furthermore, the same lip synchronization performance is achieved under extra load on the client side, the server side, and the network.
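As referenced in the Header Overhead item above, the following minimal C sketch reproduces the overhead arithmetic; the averaged frame sizes come from the text, while the 5120-byte maximum APDU payload is our assumption for the "3 KB to 5 KB" range, and the function names are ours.

#include <stdio.h>

#define APDU_MAX_PAYLOAD 5120   /* bytes, assumed upper end of APDU size */
#define HEADER_BYTES     4      /* 1-byte sequence no. + 3-byte frame no. */

/* Integer ceiling division: number of APDUs needed for one frame. */
static int apdus_for_frame(int frame_bytes)
{
    return (frame_bytes + APDU_MAX_PAYLOAD - 1) / APDU_MAX_PAYLOAD;
}

int main(void)
{
    int gop[] = { 8775, 4359, 5019, 4359, 5019, 4359 };  /* IBPBPB averages */
    int apdus = 0, payload = 0;

    for (int i = 0; i < 6; i++) {
        apdus   += apdus_for_frame(gop[i]);
        payload += gop[i];
    }
    /* 7 APDUs * 4 bytes / 31890 bytes of payload = ~0.088% */
    printf("overhead = %.3f%%\n", 100.0 * apdus * HEADER_BYTES / payload);
    return 0;
}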


Table 8. Comparison of the adaptive and reservation-based schemes. In the table, n is the number of synchronization intervals (assuming one feedback message per synchronization interval) and k is a constant giving the number of steps to complete one control operation.

                                          Adaptive-based                           Reservation-based
  QoS                                     Best-effort                              Guaranteed
  Connections                             TCP/IP, UDP/IP                           TCP/IP, ATM
  Control message exchange at setup       Basic information                        Basic information
  Control message exchange at run time    O(n)                                     None
  Initial probing procedure               Testing at least one group of pictures   None
  Header overhead                         About 0.1% or less                       Same
  Frame loss                              Depends on system load                   None (ideally)
  Synchronization specification           Time-axis based                          Time-axis based
  Synchronization control (server)        Adaptive control, O(kn);                 Tight starting-point
                                          dynamic A-V match-up                     control, O(1)
  Synchronization control (client)        Monitoring, feedback,                    Based on underlying
                                          re-sync point control                    resource reservation
  Synchronization skew                    [−160 ms, 160 ms]                        [−10 ms, 10 ms]

Table 8 summarizes the comparison between the two approaches discussed above. The overall evaluation of both protocols leads to the following trade-offs:

• Design trade-offs:
– Connection Setup Phase:
* ASP has a simpler call establishment phase than RSP because ASP does not have to work with the underlying environment for proper resource allocation.
* Both protocols must probe, but for different parameters: in ASP, the probing measures the system video data rate, while in RSP the probing measures the task processing time.
* Due to its reservation capability, RSP can reserve two different QoS connections for the transmission of the audio and video streams, whereas ASP uses only one transmission connection and multiplexes the audio/video streams to minimize the impact of network jitter on the synchronization skew.
* RSP needs to resolve the reservation conflicts between the application processes and the frame-based communication server, whereas ASP does not deal with scheduling requests during connection setup.
* RSP needs to resolve the dependency in reservation when two dependent streams are scheduled, whereas ASP does not deal with this issue at the connection setup phase.


– Transmission Phase:
* ASP has a more complex transmission phase than RSP because for RSP all resources are available, whereas ASP needs to react to unpredictable resource availability.
* ASP must provide algorithms for dynamic audio/video match-up and for the avoidance of audio gaps, whereas RSP needs only to provide a correct startup mechanism for the synchronization algorithm; from then on the streams remain synchronized.
* ASP must have adaptive control capabilities at the server and client sides, connected by feedback, whereas RSP does not require these capabilities.
* ASP has a fairly elaborate system to obtain approximate resource state information and decide what to drop, whereas RSP does not need this service.

• Implementation Trade-offs:
– The implementation of ASP is more complex due to its adaptive algorithms. The adaptive scheme implementation includes the local probing service, the monitoring service, and the dropping service at the client side, and the probing and dropping services at the server side. In addition to the transmission protocol, ASP must include the feedback protocol implementation to react to resource unavailability. On the other hand, RSP is simple because it does not have to do monitoring or dropping, as the underlying reservation-based environment provides and guarantees the required resources.
– The implementation of the synchronized startup coordination, using synchronization primitives such as semaphores, is more complex in RSP than in ASP. For RSP it is very important to start with small jitter deviations: in the case of a wrong start, the errors propagate throughout the lifetime of the application because RSP does not have any adaptive correction mechanisms built in.
– ASP requires larger buffers at the client side than RSP, due to the need to accommodate larger resource fluctuations from the underlying environment.

• Result Trade-offs:
– ASP causes a larger loss of information within the streams than RSP due to the adaptive dropping mechanisms, possible audio gaps, and other adaptive mechanisms reacting to resource unavailability. In the worst case, all B or P frames can be lost with ASP, whereas with RSP the video stream is delivered with all its information.
– RSP might admit fewer applications than ASP. The reason is that RSP runs an application only if sufficient resources are available; otherwise the application request is rejected. With ASP, the application is always admitted (even in the case of resource limitations), but during processing it has to adapt to the currently available resources. In the case of a shortage, the application is queued, and data are buffered or dropped.
– RSP, once a request is accepted, always provides desirable skews, even in the case of additional computational and communication loads. On the other hand, although ASP always provides acceptable skews, an acceptable overall perceptual quality of the MPEG audio/video can be achieved only in a lightly loaded best-effort environment.


Note that if the additional computational and/or communication loads become very heavy, the adaptation mechanisms can degrade the multimedia application to an undesirable degree (e.g., by starting to drop all B frames, all P frames, and most of the I frames, the overall resulting video frame rate can drop to 1 frame per second or lower, which is unacceptable).

From the design, implementation, and performance analysis we learned the following. First, we were surprised by the simplicity of the design and implementation of the synchronization control when reservation and proper scheduling are enforced at the resource management level: one gets synchronization almost for free once a proper initial synchronization is done and proper scheduling is performed. Second, reservation-based schemes work very well for periodic tasks with constant processing times and CBR-type traffic; however, for highly bursty applications, reserving resources at the maximum peak rate leaves a large amount of resources unused, hence system utilization is low. This leads us to the conclusion that hybrid synchronization protocols will emerge: for highly variable bit rate traffic and bursty applications, the reservation of resources will be done for average resource usage, and when bursts arrive, the application or the underlying middleware will invoke its adaptation algorithms to compensate for the jitter and skew introduced by the bursts. A sketch of such a hybrid control loop is given below.
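Purely as an illustration of this hybrid idea (this sketch is our extrapolation, not a protocol from the paper), the control loop could reserve for the average rate and fall back to adaptive B-frame dropping during bursts; all names and values below are hypothetical, and the sensors are stubbed:

#include <stdio.h>

#define RESERVED_KBPS 1500.0     /* assumed average-rate reservation */
#define SKEW_BOUND_MS 80.0       /* user-desired lip-sync bound      */

/* Stubbed sensor/actuator state standing in for the real system. */
static double skew_ms   = 120.0; /* current audio-video skew         */
static double rate_kbps = 2000.0;/* currently offered stream rate    */

static void drop_next_b_frame(void) { skew_ms -= 33.3; /* ~1 frame */ }

int main(void)
{
    /* Within the reservation, the streams stay synchronized for free;
       a burst above the reserved rate triggers adaptive B-frame drops
       until the skew is back within the desired bound. */
    if (rate_kbps > RESERVED_KBPS) {
        while (skew_ms > SKEW_BOUND_MS) {
            drop_next_b_frame();
            printf("dropped a B frame, skew now %.1f ms\n", skew_ms);
        }
    }
    return 0;
}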

6. Conclusion

In this paper, we selected two representative synchronization protocols running on top of two different environments, and based on them we analyzed and compared the synchronization design, implementation, and performance. In summary, our analysis in this paper showed the following:

• The adaptive synchronization places a low demand on the underlying resource management, and its call establishment phase is simple and fast. However, this protocol needs to work very hard during the transmission phase in order to balance the load and adapt to load variations on the server, network, and client sides. This functional analysis implies that the design and implementation are complex, including local synchronization control services at the server and client sides as well as distributed control between them. The experimental results show oscillation of the synchronization skews within the desirable range (−80, 80) ms and the acceptable range (−160, 160) ms.

• The reservation-based synchronization places a high demand on the underlying resource management, because it requires that the resource management provide differentiation of service classes over the CPU and network resources as well as other shared resources. The call establishment phase is complex, hence it is slower than in adaptive schemes: the call establishment protocol must request and wait for admission and reservation, which means that admission control and reservation protocols need to run underneath to establish end-to-end QoS connections. However, once the QoS connections are established, the synchronization protocol is very simple. The only algorithmic issue of this protocol is to start the transmission/playout in a synchronized fashion and to coordinate the reservations between the audio and video processes and the communication server. Once this is achieved, the QoS-based resource management schedules the time-dependent streams (QoS connections)


in a synchronized fashion. This functional analysis is reflected in a simple protocol design and implementation. The achieved performance is an order of magnitude better than in the case of adaptive synchronization, and the skews are bounded within the range (−10, 10) ms.

Some of the results concerning design, implementation, and performance are intuitive; however, some results are not immediately obvious, hence we believe that our detailed analysis is valuable for many designers of current and future multimedia systems.

Acknowledgments

This work was supported by the Research Board, University of Illinois at Urbana-Champaign, under Agreement RES BRD 1-2-68115, and by the National Science Foundation Career Grant CCR-96-23867. We would like to thank the reviewers for their invaluable comments, which led to an improved version of this paper.

Notes

1. The objective was not to come up with new synchronization mechanisms/algorithms, but to take existing algorithms and do a thorough trade-off analysis in different environments. We could have taken existing protocols and placed them in different environments; however, many of the synchronization protocols are either experimental themselves or closed, hence one cannot easily study the trade-offs of design and implementation and the dependencies between design, implementation, and performance.
2. In our considerations, a time-stamp is a time-point within a playout interval.
3. For example, one layer-1 audio frame leads to a playout time t^A_play = 8.7 ms at 44.1 kHz, and a layer-2 audio frame is triple that (26.1 ms at 44.1 kHz).
4. In our tests, we use MPEG-1 layer II audio with a sample rate of 44.1 kHz and 1152 samples/frame. 20 audio frames correspond to about half a second of playout time and about 23040 samples. Our audio device has a buffer of at most 100000 samples and can therefore hold this playout time. Half a second of audio can cover a video playback rate as low as 2 f/s.
5. The CPU reservation can be made for arbitrary units of scheduling. For example, we could define two video frames as one scheduling unit and let the CPU server schedule the two video frames together every k ms.
6. The CPU server in QualMan schedules the real-time processes according to the Rate Monotonic (RM) algorithm.
7. We assume that there are always sufficient resources to deliver audio at its recording rate and that the audio stream is used as the time-axis divider and the master stream.
8. We have many video clips as a result of the class taping and we tested our algorithms on them. Because the video types are very similar, the results of the measured QoS metrics are similar and do not contribute new findings to our evaluation and comparison. This is why we picked two representative video clips.
9. Each vertical dotted line indicates the position of an I frame, and between two such lines lies one IBPBPB pattern. For example, frames 2476, 2477, 2478, 2479, 2480, and 2481 are I, B, P, B, P, and B frames, respectively.

References

1. R.T. Apteker, J.A. Fisher, V.S. Kisimov, and H. Neishlos, "Video acceptability and frame rate," IEEE Multimedia, Fall issue, pp. 32–40, 1995.
2. D.P. Anderson and G. Homsy, "A continuous media I/O server and its synchronization mechanism," IEEE Computer, Vol. 24, No. 10, pp. 51–57, 1991.


3. G. Blakowski, Development and Runtime Support for Distributed Multimedia Applications, Verlag Shaker, German edition, 1993.
4. H.-H. Chu and K. Nahrstedt, "Memory management for soft real-time multimedia applications," Technical Report, CS, University of Illinois, Urbana, IL, October 1997.
5. H.-H. Chu and K. Nahrstedt, "CPU service classes for multimedia applications," Technical Report UIUCDCS-R-98-2068, CS, University of Illinois, Urbana, IL, August 1998.
6. Z. Chen, S.-M. Tan, R.H. Campbell, and Y. Li, "Real time video and audio in the World Wide Web," in WWW 95, 1995.
7. J. Escobar, D. Deutsch, and C. Partridge, "Flow synchronization protocol," in Proc. of IEEE Globecom, Vol. 3, pp. 1381–1387, 1992.
8. V. Jacobson and S. McCanne, vat, Video Audio Tool, UNIX manual page, 1992.
9. T.D.C. Little, "A framework for synchronous delivery of time-dependent multimedia data," Multimedia Systems, Vol. 1, No. 2, pp. 87–94, 1993.
10. L. Li, A. Karmouch, and N. Georganas, "Multimedia tele-orchestra with independent sources: Part 2: Synchronization algorithms," Multimedia Systems, Vol. 1, No. 4, pp. 154–165, 1994.
11. G. Moon, New video and multimedia products, Internet electronic mail, February 1994, from the remconf&.es.net news group.
12. K. Nahrstedt, H. Chu, and S. Narayan, "QoS-aware resource management for distributed multimedia applications," IOS Journal on High-Speed Networking, to appear.
13. C. Nicolaou, "An architecture for real-time multimedia communication systems," IEEE JSAC, Vol. 8, pp. 391–400, 1990.
14. K. Nahrstedt and L. Qiao, "A tuning system for distributed multimedia applications," Technical Report UIUCDCS-R-96-1958, CS, University of Illinois, Urbana, IL, May 1996.
15. K. Nahrstedt and L. Qiao, "Stability and adaptation control for lip synchronization skews," Journal of Multimedia Systems, 1997, submitted.
16. International Standards Organization, Information Technology - Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to about 1.5 Mbit/s - Part 1: Systems, International Standard ISO/IEC IS 11172-1, 1993.
17. M.J. Perez-Luque and T.D.C. Little, "A temporal reference framework for multimedia synchronization," IEEE Journal on Selected Areas in Communications, Vol. 14, No. 1, pp. 36–51, 1996.
18. K. Rothermel and T. Helbig, "An adaptive protocol for synchronizing media streams," Multimedia Systems, Vol. 5, No. 5, pp. 324–336, 1997.
19. R. Rajkumar, K. Juvva, A. Molano, and S. Oikawa, "Resource kernels: A resource-centric approach to real-time systems," in Proc. of the SPIE/ACM Conference on Multimedia Computing and Networking, Jan. 1998.
20. P.V. Rangan, S.S. Kumar, and S. Rajan, "Continuity and synchronization in MPEG," IEEE Journal on Selected Areas in Communications, Vol. 14, No. 1, pp. 52–60, 1996.
21. S. Ramanathan, P.V. Rangan, and H.M. Vin, "Frame-induced packet discarding: An efficient strategy for video networking," in Proc. of 4th NOSSDAV, Lancaster, England, Nov. 1993.
22. L.A. Rowe and B.C. Smith, "Continuous media player," in Proc. of 3rd NOSSDAV, San Diego, CA, Nov. 1992.
23. P.V. Rangan, H.M. Vin, and S. Ramanathan, "Designing an on-demand multimedia service," IEEE Communications Magazine, Vol. 30, No. 7, pp. 56–65, 1992.
24. H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, "RTP: A transport protocol for real-time applications," Internet Draft of the IETF (working draft), July 18, 1994.
25. R. Steinmetz and K. Nahrstedt, Multimedia: Computing, Communications, and Applications, Prentice Hall, 1995.
26. M. Salmony and D. Shepherd, "Extending OSI to support synchronization required by multimedia applications," Computer Communication, Vol. 13, pp. 399–406, 1990.
27. T. Wahl and K. Rothermel, "Representing time in multimedia systems," in Proc. of Multimedia Computing and Systems, Boston, MA, pp. 538–543, May 1994.
28. H. Zhang and D. Ferrari, "Improving utilization for deterministic service in multimedia communication," in International Conference on Multimedia Computing and Systems, Boston, MA, May 1994.


Hung-Shiun Alex Chen received his B.S. degree in Computer Science and Information Engineering from National Taiwan University, Taipei, Taiwan, and his M.S. degree in Computer Science from Stanford University, Palo Alto, California. He is currently a Ph.D. candidate in the Computer Science Department, University of Illinois at Urbana-Champaign. His email address is chen5@uiuc.edu

Lintian Qiao received his B.S. degree in Computational Mathematics from Beijing University, Beijing, China, his M.S. degree in Computer Science from Marquette University, Milwaukee, Wisconsin, and his Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign. He is currently a software engineer at Lucent Technologies, Naperville, Illinois. His email address is [email protected]

Klara Nahrstedt (M’94) received her A.B., M.Sc. degrees in mathematics from the Humboldt University, Berlin, Germany, and Ph.D. in computer science from the University of Pennsylvania. She is an assistant professor at the University of Illinois at Urbana-Champaign, Computer Science Department where she does research on Quality of Service (QoS)-aware systems with emphasis on end-to-end resource management, routing and middleware issues for distributed multimedia systems. She is the coauthor of the widely used multimedia book ‘Multimedia: Computing, Communications and Applications’ published by Prentice Hall, and the recipient of the Early NSF Career Award and the Junior Xerox Award for Research Achievements. Her email address is: [email protected]
