Transport-oriented Service Models for Synchronization of Multimedia Data Streams



K. Ravindran

Department of Computing and Information Sciences Kansas State University Manhattan, KS 66506, USA. E-mail: [email protected]

Technical Report TR:93-8

January 1993 (revised in December 1994, August 1995)

Abstract

The paper focuses on temporal synchronization of various data streams in multimedia information (e.g., voice, video, graphics and text) exchanged between user entities distributed over a network. During delivery of such data at users for processing (i.e., data play-out), maintaining the required temporal association between points in the data across the various streams is necessary in the presence of transport delays through the network. This synchronization problem has two aspects in an application: i) Framing of data streams, which refers to identifying the various points in the data streams that are distinctly perceivable, and ii) Temporal presentation control, which refers to ordering the various points in the data streams over real-time as required. In our solution approach, the temporal axis of an application is segmented into intervals, where each interval is a unit of synchronization. Temporal presentation in a real-time interval involves play-out of the media data segments belonging to this interval, subject to the temporal dependency relationships among data and the timeliness requirements of data. Both the data framing and the temporal presentation control are application-specifiable. Based on this approach, the paper describes protocol techniques for handling data skew that may arise due to transport delays in the network and for enforcing temporal order on data segments during play-out. The application-specific enforcement of presentation control allows flexibility in generating play-out schedules of data and optimizes the usage of network resources to application needs.

Key words: Temporal intervals, media persistence effects, data causality, QOS specifications, data framing, data play-out protocols, delay controllable network channels.

** Part of this work is supported by US Army Research Office, under grant: DAAH04-95-1-0464.

1 Introduction

A multimedia application is structured as a set of user entities communicating with one another over a network (such as the Internet, LANs and wide-area ATM networks), with one or more entities generating information consisting of multiple data streams for consumption by other entities in real-time. In broadcast TV for example, video and audio sent from the TV station are presented to the display and voice systems in TV receivers. In a window-based workstation conferencing example, each user activity on a shared document displayed in the local workstation window generates multimedia data (say, graphic input from a menu via a mouse click to highlight a certain portion of the document, text input from a keyboard, and audio input from a voice phone to annotate an update on the document) that is to be presented at the windows of other workstations [1]. A multimedia communication system (MCS) collects the data of the various media generated by source entities, transports the data through the underlying network, and delivers the data at destination entities for consumption. During processing of multimedia data, maintaining the required temporal association between the data across the various streams is necessary in the presence of network-induced data transport delays. This is possible, from the MCS perspective, by segmenting the data streams into application-perceivable distinct units of data along the real-time axis and exercising temporal presentation control on these data units at users. See Figure 1. The temporal presentation involves extracting the timing relationship between the various data units collected at sources and determining the real-time intervals in which these data units may be delivered at destinations for processing. In the conference example, a user may need to see the text input after the mouse click within, say, 2-3 seconds (sec).
In the video telephony example, the human-perceptible data units, viz., a video screen image and the set of audio samples corresponding to the image, should be seen at about the same time, say, within 30-40 milliseconds (msec). Enforcing this type of ordering and timeliness in the processing of media data units along the real-time axis is controlled by application requirements. The above approach of temporal presentation control depicts a transport-level view of synchronizing the occurrence of multimedia data. To support this view, the MCS should embody the following components:

- Application-level specification of temporal presentation control requirements on multimedia data;
- Transport protocols that implement a set of network-level and end-system-level mechanisms to meet the data presentation requirements.

[Figure 1 appears here. It shows a multimedia source and a multimedia destination connected by the communication system, with MDU-1 through MDU-4 flowing between them; MDU-i denotes the collection of i-th data units, where "i" indicates the chronological sequence in which the data units are generated. Data in stream B is to be delivered after that in streams A and C. A data unit X that arrives k time units earlier than its scheduled delivery time T is buffered at the destination for duration k.]
Figure 1: Illustration of synchronization issues in multimedia data streams

The presentation control information is typically made available to the transport protocols in the form of quality of service (QOS) parameters that indicate the various temporal characteristics of data presentation. For instance, how long the delivery of a media data x at a user should be delayed with respect to that of another media data y is specifiable as a QOS parameter. The lower layers of the MCS are concerned with the movement of data from source to destination end-points through the network in real-time (say, x and y are transported using real-time protocols [2]). The end-system mechanisms deal with generating real-time play-out schedules that control the delivery times of the various data units arriving from the network at user entities for processing, in a manner that meets the specified QOS parameters (typically by buffering and sequencing the data across the transport-application layer interface). If, for instance, x is scheduled for delivery at time T but arrives at the destination end-point at time (T − k), it is buffered by the end-system protocol for a duration of k before delivery to the user. Refer to Figure 1. As can be seen, the QOS specification and the underlying play-out protocols form essential parts of an MCS. In light of the evolution of distributed multimedia application technologies (e.g., class-room teaching using visual, audio and graphics tools), the desirable characteristics of an MCS are:
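The end-system buffering rule just described can be sketched in a few lines (an illustrative sketch only; the function name and the use of milliseconds are ours, not from the paper):

```python
def buffering_delay(scheduled_delivery: float, arrival: float) -> float:
    """Duration for which an arriving data unit is held in the play-out
    buffer: a unit arriving k time units before its scheduled delivery
    time T is buffered for k; a late unit is delivered immediately."""
    return max(0.0, scheduled_delivery - arrival)

# Data unit X is scheduled for delivery at T = 100 ms but arrives at 92 ms,
# so it is buffered for k = 8 ms; a unit arriving at 103 ms is not buffered.
print(buffering_delay(100.0, 92.0))   # 8.0
print(buffering_delay(100.0, 103.0))  # 0.0
```

The `max` with zero captures the fact that the play-out protocol can only delay a data unit, never advance one that misses its schedule.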

- Extensibility and uniformity in supporting the diverse presentation requirements of applications;
- Non-dependence of the MCS architecture on specifics of the underlying network.

This paper provides a flexible and canonical model of an MCS with these characteristics. The model is based on application-specifiable segmentation of multimedia data, and on playing out data segments in a certain order and in a timely manner based on the temporal relationships between them. The paper provides methods to derive the QOS requirements from application-specified information and to parameterize the protocol procedures with this QOS. An important feature of the MCS model is the specifiability of real-time constraints for the data

presentation functions of applications using a temporal framework. This makes the model useful across a wide range of applications. The paper is organized as follows: Section 2 describes the architecture of the MCS. Sections 3-4 describe our transport-oriented model of media synchronization. Sections 5-6 discuss how temporal presentation requirements may be specified in the QOS and how end-to-end protocols for data play-out may be designed using the QOS information. Section 7 walks through the protocols with respect to a sample application. Section 8 relates our model to other existing work. Section 9 concludes the paper.

2 Architectural view of multimedia communication system

The MCS should allow placement of the media synchronization functions, viz., the derivation of QOS from temporal relationships among media data units and the parameterization of the underlying protocols with this QOS, as distinct architectural components. For instance, the generation of data play-out schedules from QOS specifications and the underlying protocol mechanisms needed to meet these schedules should be separated across a well-defined inter-layer interface. This `2-tier' structuring of temporal presentation functions allows the flexibility of supporting the requirements of a variety of applications, without increasing the complexity of application-level programming (see footnote 1). How the `2-tier' approach manifests itself in the MCS architecture is described below (see Figure 2; the architectural relationship to the OSI layers is also shown for comparison).

2.1 Channel abstraction layer

The data of each media in an application may be transported on a separate communication channel set up through the network. That the media transported on the various channels belong to a single application is not known to the network. So these channels are independent data paths, each possibly with different characteristics. For example, a video telephone has two channels, viz., one for (compressed) video data at 2-3 Mb/sec and the other for audio at 64 kb/sec. These channels may possibly take different paths through the network based on bandwidth availability. The MCS should synchronize the data received on the various channels for presentation to the user. Since the temporal aspects of media presentation at a destination can be affected by the delays suffered by data during transport through a channel, it is desirable that the synchronization protocol has external control over the delay behavior of the channels. The control has two aspects: i) the protocol specifying a quality of network service QOSnet to describe the extent of delay guarantees required of a channel, and ii) the network exercising appropriate internal resources (viz., allocation of storage buffers and communication bandwidth for data units) to enforce the delay guarantees.

Footnote 1: Here, `programming' refers to the structuring of an application in terms of canonical primitives provided by the MCS (rather than the writing of `application program code').

[Figure 2 appears here. It projects the multimedia system architecture onto the OSI architecture: the multimedia application maps to the OSI application layer; the MCS, comprising data generation and dissemination (QOStransp), temporal presentation of data (TPD), and segmentation and de-segmentation, maps to the presentation and session layers; the multimedia transport functions and network channels (QOSnet), realized as class-0 transport connections over a datagram network, send and receive packets of segments over the data link and physical layers. Control plane functions and data plane functions (enforcing media relationships plus data encoding) are shown, with control information flowing across the inter-layer interfaces.]

Figure 2: Functional projections of a multimedia communication system

The external behavior of the network assumed here is based on existing models of networks, such as the channel model of [3] and the `real-time communication channels' of [4], that allow specifying upper bounds on delays through the network. The delay parameters specified in QOSnet, denoted as D_Q, may control the actual delay behavior of the channel, denoted as D. In the strongest form, the MCS requires a deterministic channel that does not induce any variability in delays. Requirements weaker than this result in channels providing probabilistic delays, and range from deterministic to probabilistic guarantees of the upper bounds on delays. The delay constraints have a direct relationship to the extent of demand placed on the network internal resources. Typically, weaker delay guarantees expend less resources in the network due to the laxity possible in the allocation of channel buffers and bandwidth and/or the scheduling of the movement of data through channel paths.
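One concrete (if simplified) reading of the D_Q specification is a delay bound plus the strength of its guarantee, against which an observed delay trace can be checked. This is a sketch under our own field names, not the paper's notation:

```python
from dataclasses import dataclass

@dataclass
class QosNet:
    delay_bound: float       # D_Q: specified upper bound on the channel delay D
    deterministic: bool      # True: D <= D_Q must always hold
    confidence: float = 1.0  # required P(D <= D_Q) when probabilistic

def satisfies(spec: QosNet, observed_delays) -> bool:
    """Check a trace of observed channel delays D against the D_Q spec."""
    within = sum(d <= spec.delay_bound for d in observed_delays)
    fraction = within / len(observed_delays)
    return fraction == 1.0 if spec.deterministic else fraction >= spec.confidence

# A deterministic 40 msec bound tolerates no violation; a probabilistic
# bound with confidence 0.9 tolerates occasional excursions.
trace = [0.031, 0.038, 0.044, 0.035, 0.033, 0.030, 0.036, 0.032, 0.034, 0.037]
print(satisfies(QosNet(0.040, deterministic=True), trace))                   # False
print(satisfies(QosNet(0.040, deterministic=False, confidence=0.9), trace))  # True
```

The example mirrors the text's point that weaker (probabilistic) guarantees admit laxity that a deterministic channel does not.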

2.2 Temporal presentation layer

We do not favor the idea of the network synchronizing the various data streams (as is done in [2]), since it will reduce data transport flexibility and scalability because of the need to maintain extensive state information in the network about the various streams, particularly when the streams flow through different parts of the network. So our solution approach is for the upper layer functions of the MCS to enforce synchronization in an

application-specific manner using the services of the channel layer functions. Informally, a temporal interval depicts the unit of time perceptible to the application (e.g., the video frame display time in digital TV, which is 33 msec). A temporal interval is the unit of synchronization, with the end-systems perceiving the flow of real-time using `synchronized clocks' (real-time `clock synchronization' may be realized using a separate protocol [5]). Temporal presentation involves scheduling a play-out of data, i.e., delivering the data units of each interval for processing by user entities. The delivery constraints, viz., the allowed play-out sequences on data units and the acceptable levels of tolerance to real-time delays in play-out, are specified by the application. The synchronization requirement is specified in the form of a quality of transport service description (QOStransp), derivable from the temporal relationships among media data units. For example, a catalogue browsing service may require that the presentation of the text and graphics data of a catalogue page be spaced apart in time within an application-specified limit, say 2 sec. This requirement is weaker than that for digital TV, where a picture frame and the corresponding audio samples need to be delivered within a 33 msec interval. The QOStransp parameters should be able to capture such variations in synchronization requirements, and reflect them in the D_Q specifications and the generation of play-out schedules (see Sections 5-6).
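The differential-delay style of constraint mentioned above can be checked mechanically. A minimal sketch (function name and dictionary layout are ours):

```python
def within_spacing_limit(playout_starts: dict, limit: float) -> bool:
    """True if the play-out start times (in sec) of the media in one
    interval lie within the application-specified spacing limit of
    one another."""
    times = list(playout_starts.values())
    return max(times) - min(times) <= limit

# Catalogue browsing: text and graphics spaced within 2 sec -> acceptable.
print(within_spacing_limit({"text": 0.0, "graphics": 1.4}, 2.0))   # True
# Digital TV: a 50 msec audio skew violates the 33 msec interval.
print(within_spacing_limit({"video": 0.0, "audio": 0.05}, 0.033))  # False
```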

2.3 Interworking of channel layer and presentation layer

The specification of QOStransp and the protocol procedures parameterized by the specification are bound to a connection at the application interface to the MCS. The QOStransp, which originates from the application, is used by a synchronization protocol in the MCS in two ways:

- Mapping to QOSnet at the network interface below, to support the real-time movement of data units from sources through network channels (lower layer functions of the MCS, realized on the backbone network);

- Generating a play-out schedule over real-time for delivering the data units arriving through network channels at destinations (upper layer functions of the MCS).

A connection thus embodies channel functions that transport the various data streams in a multimedia information as per the QOSnet specifications generated by the synchronization protocol procedures, and presentation functions that exercise timing control on these data streams for delivery to application entities. In the catalogue browsing example, the acceptable differential delay between the text and graphics data channels and the acceptable end-to-end transfer delay on these channels may be specified as QOSnet. With

adequate network resource allocations to meet this specification, data units may arrive at a destination before their delivery deadlines, and may then be played out appropriately. Such an external control over channel behaviors (from the upper layer) allows an optimal use of network internal resources for the application. The isolation of the QOStransp specification from source-destination configurations is possible by employing a `send-and-receive' abstraction at the presentation level. Here, each source entity has a destination function that receives the data generated for local dissemination (e.g., audio samples from microphone to local speaker, video pixels from camera to local display), as if the data arrived over a network channel. The play-out scheduling control on data at the local destination is the same as that at a remote destination receiving the data through network transport mechanisms, such as specifying an upper bound on transport delay (see footnote 2).

2.4 Are existing communication architectures suitable for MCS?

Many existing communication system architectures that deal with real-time data transport [6] embody primarily the lower layer functions of the MCS, viz., channel bandwidth and buffer allocations. Even if we provide for the movement of different media data on separate `real-time channels', end-system mechanisms are needed to compose a play-out schedule for the data arriving on the various channels. Also, the degree of external control on the behavior of a channel is quite restrictive, in that only a limited set of transport attributes is specifiable for the channel. For example, specifying a data transfer rate and an upper bound on data delays is not possible for some high speed channels [7]; some `real-time channels' allow specifying the transfer rates and delays but not the variability in data delays [2]. We categorize these models as tackling only one dimension of issues, viz., transfer of large amounts of data from sources to destinations meeting certain transfer delay, reliability and rate requirements. It is difficult to realize multimedia synchronization protocols on top of such architectures, since the temporal presentation control of media data at users exposes another dimension of issues, requiring an explicit control over the network channel behavior (see footnote 3). Even if one were to build an MCS by customizing the network behavior and the synchronization protocols, the approach would be ad-hoc and non-extensible in light of the diversity in presentation requirements of the evolving multimedia applications.

Footnote 2: For local receive, the data may not be actually transferred over the network. It is merely the flow of presentation control information through the channel layer that supports the `send-and-receive' abstraction.

Footnote 3: We do not imply that existing network architectures are bad! What we stress here is that many architectures basically evolved to tackle only the issues arising in conventional data transfer (such as flow control, error recovery and real-time delivery), but they lack the mechanisms for upper layer entities to exercise a fine-granular control over the network delay behavior, which is desirable for multimedia data presentation control.


The limitations in existing models of network control thus motivate the need for a more comprehensive approach to the design and building of an MCS. The architecture proposed in [8] to support real-time multimedia communications exemplifies this approach, and is architecturally compatible with our `2-tier' structure.

2.5 OSI and `Internet' perspectives

In the OSI framework, synchronization of media data falls in the `session layer' (refer to Figure 2). However, the `OSI session protocols' in their current form are not suitable for media presentation control, since the facility to multiplex `OSI transport connections' through which various data streams may flow does not provide mechanisms to associate temporal relationships among the data. So studies are being done by researchers to augment the `OSI session protocols' by incorporating a temporal presentation function [9]. Recent trends in the `Internet' have been to relegate the implementation of synchronization schemes to the applications themselves rather than to the transport layer [10]. In comparison, our approach is a hybrid one, in that applications determine the data presentation requirements but the underlying protocol mechanisms themselves are relegated to the MCS. This reduces the complexity in the programming of applications without compromising the flexibility of being able to support diverse presentation requirements. Our approach is possible with an `object-oriented interface' between the applications and the MCS on one hand and between the MCS and the backbone network on the other hand. The design of the MCS requires, at the outset, application-level segmentation of media data, based on which a transport-oriented characterization of multimedia data can be formulated to aid the design of media synchronization protocols. The segmentation concept is described next.

3 Media data segmentation

A multimedia data stream needs to be segmented into application-specific units for end-to-end transport and synchronization (e.g., a video picture frame in digital TV, a voice clip in multimedia lecturing). The transport-level processing of a media data segment involves the generation and/or dissemination of the device-specific information elements carried in the segment at a pre-determined rate (e.g., a 1-byte audio sample every 125 microsec from/to a microphone/speaker). The processing of successive data segments appropriately spaced in time defines a play-out of the media data stream to the application.
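The device rates quoted above fix the segment timing by plain arithmetic; a small sketch (function name is ours):

```python
def segment_playout_duration(segment_bytes: int, device_bytes_per_sec: int) -> float:
    """Time over which a segment's information elements are consumed at
    the device's pre-determined rate."""
    return segment_bytes / device_bytes_per_sec

# An 8-bit codec at 8000 samples/sec consumes 8000 bytes/sec, so a
# 200-byte audio segment spans 25 msec; equivalently, one 1-byte sample
# is produced every 1/8000 sec = 125 microsec.
print(segment_playout_duration(200, 8000))  # 0.025
print(1 / 8000)                             # 0.000125
```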


[Figure 3 appears here. Legend: AS/AR denote the application-level source/destination and PS/PR the transport-level sender/receiver. The figure traces the packets (4 of them) of a media segment x_r from AS through PS, the network channel, PR and AR along the flow of time, marking the collection duration at the source starting at t'_r, the transport delay D through the network channel, the `holding delay' h_r, the play-out duration δ_r starting at t_r, and the persistence duration (δ_r + k_r).]

Figure 3: Delays incurred at various points of end-to-end data transport

3.1 End-to-end delays of media segments

See Figure 3. Suppose x_r represents a data segment in the r-th (r = 1, 2, ..., N) media, where a segment may be realized as a sequence of transport-level data packets or `ATM cells', depending on the underlying network (see footnote 4). At the source of x_r, the data packets are collected over a duration δ_r before transmission over a network channel; at a destination of x_r, the data packets arriving over a network channel are played out over a duration δ_r. As an example, a 200-byte data packet carrying an audio segment generated from an 8-bit codec operating at a rate of 8000 samples/sec is presented to the speaker device at a constant rate over a 25 msec interval (this relates to how long the aural cues are retained in the human listener). As another example, 500 ATM `cells' of 48 bytes each, constituting a 120 × 200 pixel video picture frame with 8-bit coding, are presented to the video display system at a constant rate over a 33 msec interval. Extending this example, a video clip consisting of 150000 ATM `cells' may be collected from a video storage over a 10 sec interval. The play-out/collection interval δ_r may also be viewed, from a media presentation perspective, as a minimum duration for which the effects of x_r persist (or linger on) in the application. If t'_r and t_r indicate the start times of the collection of media data segment x_r at the source and the play-out of x_r at a destination respectively, then t_r ≥ t'_r + δ_r + k_r + D. Where multiple media are involved, the play-out of x_r may be additionally delayed pending the play-out of one or more other media data segments. This `holding delay' h_r accounts for inter-media synchronization constraints on x_r, i.e., t_r ≥ t'_r + δ_r + k_r + D + h_r (this timing relation can be appropriately modified to reflect any parallelism between the collection/play-out of x_r and the transport of the packets of x_r through the network). Basically, the media segment x_r is said to occur at an entity over a duration (δ_r + k_r) that corresponds to how long the effects of x_r persist to cause `state changes' at the entity, with k_r (≥ 0) being a constant to account for this media persistence duration. In an `object-oriented programming' framework, (δ_r + k_r) may be viewed as the time to process a `message' carrying x_r by an `object' implementing the media end-point at a user entity.

Footnote 4: The notion of a data segment is similar to that of the `logical data unit' proposed in [8, 11, 12].
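The timing relation above is a lower bound on the play-out start time; a sketch using the paper's symbols (δ_r, k_r, D, h_r), with illustrative numbers in milliseconds:

```python
def earliest_playout_start(t_src, delta, k, D, h=0.0):
    """Lower bound t_r >= t'_r + delta_r + k_r + D + h_r on the play-out
    start time of segment x_r (h_r = 0 when there is no inter-media
    holding delay)."""
    return t_src + delta + k + D + h

# A segment collected from t'_r = 0 over delta_r = 25 msec, with k_r = 0,
# a channel delay D = 40 msec and a holding delay h_r = 10 msec:
print(earliest_playout_start(0, 25, 0, 40, 10))  # 75
```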

3.2 Sensory perception of media data

The user-level perception of time-varying data on a media can be guaranteed by ensuring that the effects of a data segment are seen: i) before the deadline imposed by its timeliness requirements, and ii) after the persistence effects of its previous data segment have already ceased. To elaborate (i) and (ii), we consider the data segments [x^i_r], i = 1, 2, ..., of the r-th media, where i indicates the chronological sequence in which the data segments are played out. The timeliness condition may be expressed in terms of the maximum allowed latency Δ^i_r (> 0) between the generation and dissemination of data segment x^i_r:

    t^i_r ≤ t'^i_r + δ^i_r + k_r + Δ^i_r,    (1)

where t'^1_r = 0 indicates the initial point along the temporal axis of the application. In an interactive audio conversation for example, Δ^i_r ≈ 60 msec. It is necessary that the transport delay through the network satisfy D(x^i_r) ≤ Δ^i_r for x^i_r to cause effects at the user entity (see footnote 5). The play-out time of data segment x^i_r should satisfy the following relation, which models the passage of real-time with intervals of duration large enough to contain the media persistence effects:

    t^i_r = t^{i-1}_r + δ^{i-1}_r + k_r + ρ^{i-1}_r,    (2)

where ρ^{i-1}_r ≥ 0 for i = 1, 2, .... Here, x^0_r is a NULL data segment that starts the r-th media sensing activity at t^0_r = 0, with δ^0_r = δ^1_r and ρ^0_r = Δ^1_r (i.e., δ^0_r and ρ^0_r are chosen to indicate the allowed latency of x^1_r). ρ^i_r may account for any `quiesce time' required after the sensing of x^i_r, to allow a minimum separation between the occurrences of successive data segments. The case of ρ^i_r being a constant ≥ 0 for all i means that every data segment is subject to the same degree of latency, which can be enforced deterministically. Other cases of ρ^i_r allow probabilistically enforced latency. See Figure 4. In an example of displaying text sequences in a

Footnote 5: In an `object-oriented' framework, a data segment x_r may be viewed as being associated with a `time-to-live' duration Δ_r that defines a context for `state changes' in the application. The elapse of Δ_r modifies the context such that x_r can no longer cause any `state changes'.


[Figure 4 appears here. It shows successive media data segments x^i_r, x^{i+1}_r, x^{i+2}_r along the flow of real-time at a destination entity, each played out over a duration δ^i_r followed by the media persistence time k_r and the quiesce time ρ^i_r for x^i_r.]

Figure 4: Timing components in presentation of successive media data segments (specification-oriented)

workstation window, δ^i_r may be 2 sec, k_r may be 1 sec, and ρ^i_r may vary, say, from 0 sec to 3 sec. In many cases, the data segment size does not change across the various intervals, and hence δ^i_r = δ^{i-1}_r. Also, ρ^i_r need not be the same as ρ^{i-1}_r (i.e., ρ_r may be chosen non-deterministically across different intervals in an execution), with ρ^i_r = ρ^{i-1}_r being a special case to model periodic data streams. From equations (1) and (2), the `quiesce time' and the allowed latency of data segments may be related as:

    t^{i-1}_r + ρ^{i-1}_r ≤ t'^i_r + Δ^i_r.    (3)

So a larger Δ_r implies a higher degree of tolerance to the transport delays D(x_r) incurred in the network. ρ^i_r = 0 for all i and k_r = 0 indicate that the persistence effects of a data segment do not last any longer than its play-out duration and that there is no time gap between the play-outs of successive data segments. This is the case with a continuous media, where application entities sense data on the media at every point along the real-time axis and the perception of continuously changing data on the media can be guaranteed if a segment occurs at the point when the persistence effect of its previous segment has just ceased (see footnote 6). In the example of 64 kbps digitized audio, 200-byte audio segments can occur at 25 msec intervals for continuous perception of audio. Overall, a media r is representable by a timed sequence of data segments [x^1_r, x^2_r, ...], with the inter-segment separation determined by the `quiesce time' ρ_r, the allowable latency Δ_r, and the persistence duration (δ_r + k_r).
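Equations (1)-(3) lend themselves to a direct sketch: generate play-out times by equation (2) and check each against the latency bound of equation (1). All times below are in msec, and index 0 holds the NULL segment's δ^0 and ρ^0 as in the text (the function names are ours):

```python
def playout_times(deltas, rhos, k, n):
    """Equation (2): t^i = t^{i-1} + delta^{i-1} + k + rho^{i-1}, with
    t^0 = 0, for i = 1..n. deltas[i] and rhos[i] hold delta^i_r and
    rho^i_r of segment x^i_r."""
    t = [0]
    for i in range(1, n + 1):
        t.append(t[i - 1] + deltas[i - 1] + k + rhos[i - 1])
    return t

def timely(t_i, t_src_i, delta_i, k, Delta_i):
    """Equation (1): t^i <= t'^i + delta^i + k + Delta^i."""
    return t_i <= t_src_i + delta_i + k + Delta_i

# Continuous 64 kbps audio: delta = 25 msec, rho = 0, k = 0, so segments
# occur back-to-back at 25 msec intervals.
print(playout_times([25, 25, 25, 25], [0, 0, 0, 0], 0, 3))  # [0, 25, 50, 75]
```

A non-zero `rho` opens gaps between segments, matching the quiesce-time discussion above.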

3.3 Temporal intervals

A temporal interval is the real-time duration for which the effects of all the related data segments in the various media can persist in the application. An interval basically indicates the granularity of real-time meaningful

Footnote 6: From a transport perspective, continuous media data appears as an `isochronous stream' of data segments. The maximum possible number of data segments in the `media pipeline' from source to a destination is then Δ_r/δ_r.


[Figure 5 appears here. It shows N data streams flowing from sources to destinations through the network, with the temporal axis segmented into intervals acting as `containers' of the data streams. The i-th interval (T_{i-1}, T_i), of duration d_i, contains the data segments x^i_1, x^i_2, ..., x^i_N forming the i-th data frame; holding delays h_1, h_2, ... and gaps in real-time are indicated, and the network-level data packets in each stream belonging to the various temporal intervals are shown in different shades.]
Figure 5: Temporal segmentation of media data

to the application, and hence constitutes the unit of segmentation of the real-time axis. See Figure 5. The i-th temporal interval (i = 1, 2, ...) is given by the pair (T_{i-1}, T_i), where T_i is a point on the real-time axis at which all related data segments belonging to the i-th interval, viz., {x^i_1, x^i_2, ..., x^i_N}, have occurred but not any of the segments belonging to the (i+1)-th interval. The duration of this interval is given by d_i = T_i − T_{i-1}. The segments in an interval, possibly generated by different sources, constitute a data frame. The application determines which segments occur in an interval and what their synchronization requirements are. In general, it is necessary that d_i ≥ (δ^i_r + k_r) for all r for correct temporal behavior of the application. Given the data segments x^i_s and x^i_r, we treat x^i_s as occurring before x^i_r if the play-out schedule requires that t^i_r > (t^i_s + δ^i_s + k_s); and x^i_r and x^i_s are simultaneous if both t^i_s ≤ t^i_r ≤ (t^i_s + δ^i_s + k_s) and t^i_r ≤ t^i_s ≤ (t^i_r + δ^i_r + k_r) are valid play-out schedules. Thus the ordering of data units in an interval subsumes simultaneity of occurrence, whereby a user cannot distinguish any specific sequence in which the data units have occurred. The framing characteristics, viz., the length of an interval and the temporal relationship between data segments in a frame, depend on the persistence duration of data and the allowed schedules in which the various data can be seen in the application. And the extent of simultaneity and ordering of data units that needs to be enforced for synchronized delivery of data streams is application-specific. In general, the synchronization constraints are weaker for an application that is less sensitive to real-time delays in the occurrence of data. We integrate the notion of sensory perception of media segments with temporal ordering among media segments to model the flow of real-time in a multimedia application (see footnote 7). This model, as described next, forms

Footnote 7: Without loss of generality, we assume that data is generated live by sources and is consumed live by destinations. From the


the basis for comprehensive specifications of temporal presentation control and the underlying protocols.
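The `before'/`simultaneous' distinction defined for segments in an interval reduces to comparing the later start time against the earlier segment's persistence window (δ + k); a sketch with illustrative numbers in msec (the function name is ours):

```python
def perceived_relation(t_s, delta_s, k_s, t_r):
    """For t_r >= t_s: x_s is seen before x_r when t_r exceeds the
    persistence window t_s + delta_s + k_s of x_s; otherwise the two
    segments are perceived as simultaneous."""
    if t_r > t_s + delta_s + k_s:
        return "before"
    return "simultaneous"

# x_s starts at 0 and persists for delta_s + k_s = 25 + 5 = 30 msec:
print(perceived_relation(0, 25, 5, 40))  # before
print(perceived_relation(0, 25, 5, 20))  # simultaneous
```

This captures the point in the text that simultaneity means the user cannot distinguish the actual sequence of occurrence.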

4 Temporal ordering of media data segments

In an `object-oriented' framework, media data segments and temporal intervals may depict `message flows' and `state changes' at the various media end-point entities in the application (e.g., display of a video picture frame and the visual cues it causes in a human viewer). Accordingly, the framework allows us to adapt the temporal modeling and structuring principles employed in distributed computing systems to the problem of temporal presentation control of multimedia data streams, as reflected in the discussions below.

4.1 Causality among data segments

An application may specify an ordering relation `≺' on data segments that maps onto a play-out timing schedule. We say `x^i_r occurs after x^i_s', denoted as `x^i_s ≺ x^i_r', to mean that x^i_r should occur after the application starts seeing the effects of x^i_s. This means that the relation t^i_r > t^i_s holds in every execution instance of the application. In this case, we say that `x^i_r causally depends on x^i_s' (or alternately, `x^i_s causally precedes x^i_r'). When neither x^i_s ≺ x^i_r nor x^i_r ≺ x^i_s is specifiable, x^i_s and x^i_r are said to be concurrent (denoted as ||{x^i_s, x^i_r}), whereby t^i_s > t^i_r, t^i_r > t^i_s and t^i_s = t^i_r are all valid play-out schedules on the data. From an application perspective, simultaneity in data delivery is thus a manifestation of concurrency among media segments. The `≺' relations on the various media data segments in the i-th temporal interval, combined with the timing relation (2), prescribe the interval [T_{i-1}, T_i]. See Figure 6. The set of causal dependencies among the data segments {x^i_r} over all r, indicated as R({x^i_r}), constitutes application-level information passed on to the underlying communication system as part of QOStransp, to determine a play-out schedule that provides the desired ordering on the occurrence of data segments. R({x^i_r}) manifests as constraints on the delivery of the data segments {x^i_r} for processing in the application within the real-time duration given by d_i. In comparison to Lamport's notion of causality among `messages in a distributed computation' [14], our notion of causality incorporates the flow of real-time into the definition of `messages', viz., media data segments, which is necessary for multimedia applications (Section 4.2 below further illustrates this).
In comparison to the notion of Δ-causality used in [15], which specifies an upper bound `Δ' on the real-time delay in delivering a media segment, our notion of causality, besides specifying an end-to-end latency Γ_r similar to `Δ', also incorporates ordering of data segments over real-time in an application-specific manner, and hence is at a higher level.

[Figure 6: Illustration of the causality relationship between media data. For a segment x_r^i of the ith media and a segment x_s^j of the jth media: if t_s^j − t_r^i > persistence duration of x_r^i, then x_r^i is seen before x_s^j; if t_s^j − t_r^i < persistence duration of x_r^i, then x_r^i and x_s^j occur simultaneously.]

(Footnote continued from previous page: ... MCS point of view, the assumption holds for stored media presentation as well (e.g., video-on-demand services) by including appropriate temporal information with a media data in storage (such as play-out times and durations) and playing back the data over the network as per the stored temporal information [13].)

4.2 Ordered delivery of data segments

The causality among media data, both within an interval and across different intervals, allows correct enforcing of sequenced and real-time delivery of the data segments of various media at a destination. For instance, an application consisting of a single media can be characterized by the causality relation among data segments across successive intervals, viz., x^{i-1} → x^i for the ith interval (i = 1, 2, ...), with the timing relation (2) arising therefrom. With multiple media in an application however, the causality among data segments of various media in the (i−1)th interval also needs to be taken into account, which takes the form R({x_r^{i-1}}_∀r) → x_s^i for s = 1, 2, ..., N. When all the media in an application are continuous (i.e., δ_r^i = 0, k_r = 0 ∀i, r), the size of a media data segment is so chosen that the play-out of this segment and that of the media segment with the largest play-out duration completely overlap one another, as given by the relation: size(x_s^i) = rate(x_s^i) × max({τ_r^i}_∀r) for some media s over the ith interval. In audio-video synchronization for example, the size of each audio segment may be increased to 264 bytes to allow continuous play-out over a 33 msec interval in overlap with a video segment. Such a characterization of continuous multimedia data is a degenerate case of concurrent occurrence of media data segments in each temporal interval, indicated as ∥{x_r^i}_∀r. Accordingly, the passage

of real-time in a continuous multimedia application may be given by extending the relation (2) in the form:

    T_i = T_{i-1} + max({τ_r^i}_{r=1,2,...,N}),                                        (4)

where max(...) depicts the simultaneous occurrence of the various media segments in the ith interval. For example, the `lip-sync' presentation requirement at TV viewers manifests as a simultaneous occurrence of the video and audio data segments over a 33 msec interval [16], indicated as ∥{AUDIO, VIDEO} with d_i = 33 msec. For non-continuous multimedia, the relation (2) is extended in the form

    T_i = T_{i-1} + max({ Σ_{j=p_1}^{p_2} (τ_j^i + k_j + δ_j^i) }_{∀p_1,p_2}),          (5)

where p_1, ..., p_2 (≤ N) refers to a distinct set of media data segments that have a sequential ordering and the max(...) is computed over various such sets, i.e., the concurrently occurring sets of data segments. In a multimedia window application for example, text and voice annotations to a document highlighting can be concurrent, occurring about 2 sec after the highlight and within about 1 sec separation of each other. Here, we have HIGH_LIGHT → ∥{TEXT, VOICE}, with d_i ≤ (τ_highlight + k_highlight + 2 + 1 + max({τ_text + k_text, τ_voice + k_voice})) sec. The right hand side of relation (5) indicates the maximum duration of distinctly perceivable effects of media segments, and can range between max({τ_r + k_r + δ_r}_{r=1,2,...,N}) when all segments are concurrent and Σ_{r=1,...,N} (τ_r + k_r + δ_r) when all segments are sequential.

The timing relations (4) and (5) arise from the causality among the multimedia segments {x_r^i}_∀r across the various intervals 1, 2, ..., i. And the causal order constraints, including simultaneity, are capturable by the `→' and/or `∥' relations contained in R({x_r^j}_{∀r, j=1,2,...,i}). So causality may be explicitly specified for multimedia data in terms of their persistence durations, with the data play-out schedules determined from these specifications.
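The temporal-axis advances in relations (4) and (5) can be sketched as follows. This is a sketch under the notation assumed in this rewrite: the triple (tau, k, delta) stands for the play-out duration, persistence extension and quiesce time of a segment, and the function names are ours.

```python
# Sketch of the temporal-axis advance of relations (4) and (5).

def advance_continuous(T_prev, taus):
    # Relation (4): T_i = T_{i-1} + max over media of the play-out durations,
    # reflecting simultaneous occurrence of all segments in the interval.
    return T_prev + max(taus)

def advance_noncontinuous(T_prev, chains):
    # Relation (5): each chain is a sequentially ordered set of segments,
    # given as a list of (tau, k, delta) triples; sequential segments within
    # a chain are summed, and concurrent chains are combined with max.
    return T_prev + max(sum(tau + k + delta for (tau, k, delta) in chain)
                        for chain in chains)

# Lip-sync example: audio and video segments over a 33 msec interval.
print(advance_continuous(0.0, [0.033, 0.033]))  # 0.033
```

A single sequential chain degenerates relation (5) to a plain sum, while one-segment chains degenerate it to the max of relation (4) extended with the k and δ terms.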

4.3 Multi-source flows

The various media data streams in a multimedia application are bound to a canonical set of temporal relationships, expressible as {(x_s → x_r)}_{x_r, x_s ∈ {x_1, x_2, ..., x_N}}, where the `→' relation is defined in terms of the timing parameters τ_s, k_s, Γ_s, δ_s. Since a specification of these timing parameters is not tied to how the data generation functions are distributed across source entities (i.e., sender-independent specification) and since `synchronized real-time clocks' are employed at source/destination entities, the QOS_transp consisting of `→' relations on media data is in turn specifiable independent of the source-destination configuration in the application.

The insulation of the QOS_transp specification methodology from the source-destination configuration is possible with the use of a `send-and-receive' play-out abstraction at each source entity. This, as we shall see in sections 5 and 6, allows uniformity in the QOS_transp specification constructs and in the underlying scheduling protocols for play-out of media data. Thus single-source media data and multi-source media data may be treated alike by the MCS, both from specification and scheduling points of view8. How the causality relations among media data, as specified in QOS_transp, may depict various levels of tolerance to data skew is a critical function of the MCS. This function is described next.

5 Specification of transport QOS

The specification of QOS_transp consists of causal dependency relations between data segments, viz., R({x_r^i}_∀r), expressible in terms of the various timing parameters τ_r, k_r, Γ_r, δ_r. Given that an application can tolerate a certain level of data skew, one needs a specification mechanism to express this tolerance as part of QOS_transp. A synchronization protocol consists of data segment handling procedures parameterizable by these specifications.

5.1 Specifying temporal ordering

Two types of ordering constraints are specifiable: i) how a data frame should be interspersed relative to other data frames (possibly from different sources), and ii) how segments within a frame are ordered. To allow specifying such constraints on data, a construct is provided in the form ((x_s, τ_s, k_s), Occurs_After(x_r), {l_del_s, u_del_s}) for use by applications, whereby the play-out of data segment x_s at a destination should be started after that of x_r is started, i.e., x_r → x_s. The parameters l_del_s and u_del_s indicate the minimum and maximum time separations respectively between the start of play-outs of x_r and x_s; so 0 ≤ l_del_s ≤ u_del_s. If (τ_r + k_r) < l_del_s, the effects of x_s are distinctly perceivable in time from those of x_r; if (τ_r + k_r) ≥ (u_del_s + τ_s + k_s), x_s fully overlaps in time with x_r; in other cases, a partial overlap of x_s with x_r is possible (note that τ_r, k_r are available to the synchronization protocol through the Occurs_After relation for x_r). See Figure 7-(a). When x_r and x_s belong to the same temporal interval, the duration of this interval d is given by

    max({l_del_s + τ_s + k_s, τ_r + k_r}) ≤ d ≤ max({u_del_s + τ_s + k_s, τ_r + k_r}).      (6)

8 Also, by separating the specification methodology from source-destination configurations, the temporal properties of multimedia application programs can be easily verified from their transport-oriented specifications in terms of inter-stream and intra-stream synchronization constraints.


[Figure 7: Illustration of Occurs_After specifications on media data segments. (a) Overlap in delivery of data segments x_r and x_s: depending on the separation y between the start of their play-outs (l_del_s < y < u_del_s) and the persistence durations τ_r + k_r and τ_s + k_s, the delivery of x_s may exhibit no overlap, partial overlap, or full overlap with x_r. (b) AND graph representing the ordering constraints among X1, X2, X3: each data segment is a `vertex', and each `Occurs_After' dependency is an `edge' labeled {l_del, u_del}, linking prev_sgmnt to next_sgmnt.]

The parameter [l_del_s, u_del_s] indicates the extent of variability in scheduling the play-out of x_s relative to x_r, and hence prescribes the `quiesce time' δ_r of x_r. When Occurs_After(NULL) is specified, x_s can be played out without any constraint, immediately upon arrival from the network channel. For data segments belonging to successive intervals of a single media, say s, we have ((x_s^i, τ_s^i, k_s), Occurs_After(x_s^{i-1}), {l_del_s^i, u_del_s^i}) for i = 1, 2, ..., where (τ_s^{i-1} + k_s) ≤ l_del_s^i, [l_del_s^i, u_del_s^i] prescribes the `quiesce time' δ_s^{i-1} of x_s^{i-1}, and x_s^0 is NULL (recall that δ_s is related to τ_s in the form: (t_s^{i-1} + δ_s^{i-1}) ≤ (t_{s'}^i + τ_s^i), with s' = s_1). Consider the example of conferencing, discussed in section 4.2 (with slightly modified timing parameters). The concurrent delivery of TEXT and VOICE may be realized by specifying

R({HIGH_LIGHT, TEXT, VOICE}) ≡ {((HIGH_LIGHT, 0.75, 0.25), Occurs_After(NULL), {0.2, 0.7}), ((TEXT, 1.0, 0.5), Occurs_After(HIGH_LIGHT), {1.5, 3.5}), ((VOICE, 1.5, 0.5), Occurs_After(HIGH_LIGHT), {1.5, 2.75})}, which generates the ordering HIGH_LIGHT → ∥{TEXT, VOICE} (the time durations are specified in sec). The play-out of {TEXT, VOICE} is separated by a minimum delay of 1.5 sec from the start of HIGH_LIGHT; the VOICE and TEXT should complete within [2.5, 3.75] sec and [2.0, 4.0] sec respectively after the HIGH_LIGHT completes. The duration d of the temporal interval containing these segments is 0.75 + 0.25 + max({2.5, 2.0}) ≤ d ≤ 0.75 + 0.25 + max({3.75, 4.0}), i.e., 3.5 ≤ d ≤ 5.0. The end-to-end latency allowed on the mouse data is Γ_highlight = [0.2, 0.7] sec. The Occurs_After is basically a programming notation to construct the ordering relation `→' explicitly in terms of the flow of real-time. The construct integrates causality with the flow of real-time through

an `object-oriented programming' framework that transforms the play-out of temporally related media data segments into application level `state changes'.
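The interval-duration arithmetic of the conferencing example (a two-dependent instance of relation (6)) can be checked with a small sketch. The helper name and tuple encodings are ours; the constants are those given in the specification above.

```python
# Reproducing the interval-duration bounds of the conferencing example:
# the interval must cover the parent's persistence and, for each dependent,
# its scheduling window plus its own persistence.

def interval_bounds(parent, children):
    """parent: (tau, k); children: list of (tau, k, l_del, u_del),
    with l_del/u_del relative to the start of the parent's play-out."""
    tau_r, k_r = parent
    lo = max([l + tau + k for (tau, k, l, u) in children] + [tau_r + k_r])
    hi = max([u + tau + k for (tau, k, l, u) in children] + [tau_r + k_r])
    return lo, hi

lo, hi = interval_bounds((0.75, 0.25),            # HIGH_LIGHT
                         [(1.0, 0.5, 1.5, 3.5),   # TEXT
                          (1.5, 0.5, 1.5, 2.75)]) # VOICE
print(lo, hi)  # 3.5 5.0
```

The result matches the bounds 3.5 ≤ d ≤ 5.0 derived in the text.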

5.2 Dependency graphs

An extended form of the construct allows a user to specify complex ordering relationships based on AND connectives on data, in the form ((x_s, τ_s, k_s), Occurs_After(x_r ∧ x_k), {l_del_s, u_del_s}), indicating that the play-out of x_s be started after the play-out of `x_r AND x_k' has been started, with l_del_s, u_del_s specified relative to max({t_r, t_k}). In a TV/videophone example, the simultaneous delivery of audio and video data segments may be specified as:

    R({VIDEO, AUDIO}) ≡ {((VIDEO_i, 0.033, 0.0), Occurs_After(VIDEO_{i-1} ∧ AUDIO_{i-1}), {0.033, 0.033}),
                         ((AUDIO_i, 0.033, 0.0), Occurs_After(VIDEO_{i-1} ∧ AUDIO_{i-1}), {0.033, 0.033})}
                                                                       for i = 1, 2, ....      (7)

Consider a user activity in an interval as represented by data segments {x_1, x_2, x_3}. An ordering relation, say ∥{(x_1 → x_2), x_3}, may be specified as follows (only the essential delay parameters are shown):

R({x_1, x_2, x_3, ...}) ≡ {((x_1, τ_1, k_1), Occurs_After(prev_sgmnt), {l_del_1, u_del_1}), ((x_2, τ_2, k_2), Occurs_After(x_1), {l_del_2, u_del_2}), ((x_3, τ_3, k_3), Occurs_After(prev_sgmnt), {l_del_3, u_del_3}), ((nxt_sgmnt, ...), Occurs_After(x_2 ∧ x_3), ...)}, where prev_sgmnt is the last segment in the previous frame and nxt_sgmnt is the first segment in the next frame. The interspersing of data segments generated by multiple user activities can be extended from the above structure; here, the various user entities possess knowledge of causality (in a decentralized sense) to determine the ordering constraints. The data dependencies may be represented by an AND graph with each data segment as a `vertex' and its dependency as an `edge' labeled by {l_del, u_del}. Figure 7-(b) gives the graph for the ordering relation ∥{x_3, (x_1 → x_2)} discussed earlier. For a typical number of media in an application (say, < 5), the temporal relationships are expressible in a closed form, and hence explicitly representable for use by the MCS9. See section 7 for a detailed description of a sample application.

9 It is possible that OR dependency relationships exist among data, such as the presentation of a voice data segment after a text or a graphic data segment.
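A minimal sketch of the AND-graph representation implied above, using the vertex names of the example; the dictionary encoding and the `ready' helper are our own illustrative choices.

```python
# AND graph for the ordering relation ||{x3, (x1 -> x2)}: each segment maps
# to the list of segments it depends on (AND semantics over the list).

deps = {
    "x1": ["prev_sgmnt"],
    "x2": ["x1"],
    "x3": ["prev_sgmnt"],
    "nxt_sgmnt": ["x2", "x3"],   # AND connective: Occurs_After(x2 ^ x3)
}

def ready(segment, delivered):
    """A segment may be scheduled once ALL segments it depends on
    have started their play-out (i.e., have been 'delivered')."""
    return all(d in delivered for d in deps.get(segment, []))

print(ready("x2", {"prev_sgmnt"}))        # False: x1 not yet delivered
print(ready("x3", {"prev_sgmnt"}))        # True: x3 is concurrent with x1 -> x2
print(ready("nxt_sgmnt", {"x2", "x3"}))   # True
```

In a fuller sketch each edge would additionally carry its {l_del, u_del} label to bound when, within the interval, a ready segment may actually start.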


5.3 Application level tolerance to data skew

Due to delay variability experienced on various network channels, data skew can occur in the ith temporal interval of an application when the relative arrival times of data segments {x_r^i}_{r=1,2,...,N} at a destination differ from their relative generation times at sources. In a shared document edit session for example, a user may send a mouse click to highlight the update on a document and, after a small wait, send a voice annotation to the update (i.e., HIGH_LIGHT → VOICE). The voice data may however experience less channel delay than the mouse data, and hence arrive at a participant workstation sooner; here, the voice data needs to be buffered pending the delivery of the mouse data. When a data skew is within the limits set by the inter-stream synchronization constraints specified in QOS_transp, it may not be noticeable to the application. A data skew may sometimes exceed the application-level tolerance limits (as set by the u_del parameter), manifesting as a glitch in the presentation of data frames to the application10. Recovery from a glitch may often depend on how long the glitch effects persist and how tolerant the application is to these effects. In video delivery for example, a glitch is observable as a blank image interspersed in a sequence of picture images on the screen. The effect of the blank image may persist in the mind of the human viewer for a few seconds. When the tolerance limits are exceeded, a recovery from data skew may manifest in activating an application specified `glitch handler'. In the TV example, the skew of a picture image may be handled by delivering the current audio segment along with a re-display of the previously delivered image, which may be acceptable due to the slowly varying nature of the visual and aural cues at the human viewer [17]. In general, various levels of tolerance to skew among media data are also specifiable in QOS_transp.

5.4 Asynchronism in media data delivery

The extent to which the delivery of a data segment is not constrained by the delivery of other data segments in a given interval is determined by the causal relationships between these segments. This in turn controls the extent of overlap (in time) in their delivery. Thus asynchronism in data delivery is a manifestation of the concurrency extractable from the ordering specifications on media data segments. In general, media data segments that are concurrent incur less `holding delay' during their play-out than in the case where these segments need to be strictly ordered. The application specifiable ordering of media data segments, as featured in our model, permits a high

10 With weakly synchronizable media data, the skew tolerance limits can be high (i.e., u_del is large).


degree of concurrency in the communication system. In the earlier example involving the data segments x_1, x_2, x_3 (refer to Figure 7-(b)), the temporal interval d is given by max({τ_3 + k_3 + l_del_3, l_del_1 + max({τ_1 + k_1, l_del_2 + τ_2 + k_2})}) ≤ d ≤ max({u_del_1 + max({u_del_2 + τ_2 + k_2, τ_1 + k_1}), u_del_3 + τ_3 + k_3}). When x_1, x_2, x_3 are to be delivered in that sequence, d' = u_del_1 + max({τ_1 + k_1, u_del_2 + max({τ_2 + k_2, u_del_3 + τ_3 + k_3})}) indicates the maximum time duration over which the delivery occurs; and (1 − d/d') is a normalized measure of the asynchronism (in the scale 0.0-1.0) possible during this delivery. Such a quantitative notion of concurrency manifests in the form of relaxing the constraints on media data play-out schedules and the D_Q bounds generated in the underlying protocols.
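The normalized asynchronism measure 1 − d/d' can be sketched as below. The numeric parameter values are illustrative, not from the paper, and t_i abbreviates the persistence duration τ_i + k_i of segment x_i.

```python
# Sketch of the asynchronism measure for the x1, x2, x3 example of Figure 7-(b).

def d_concurrent(p):
    # Maximum duration when the concurrency ||{x3, (x1 -> x2)} is exploited.
    return max(p["u1"] + max(p["u2"] + p["t2"], p["t1"]),
               p["u3"] + p["t3"])

def d_sequential(p):
    # Maximum duration d' when x1, x2, x3 are delivered strictly in sequence.
    return p["u1"] + max(p["t1"], p["u2"] + max(p["t2"], p["u3"] + p["t3"]))

# Illustrative values: t_i = tau_i + k_i (persistence), u_i = u_del_i.
p = {"t1": 1.0, "t2": 1.0, "t3": 1.0, "u1": 0.5, "u2": 0.5, "u3": 0.5}
d, d_seq = d_concurrent(p), d_sequential(p)
print(round(1 - d / d_seq, 2))  # 0.2, the normalized asynchronism in [0, 1]
```

Here delivering x_3 in parallel with the chain x_1 → x_2 shortens the worst-case delivery span from 2.5 to 2.0 time units, i.e., 20% asynchronism.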

5.5 Cross-channel scheduling

In certain applications, complex inter-media relationships concerning data skews may also exist. For instance, excessive data skews in one or more channels may affect the play-out schedules in other channels, even though skews in the latter may be within application-specified tolerance limits. Such cross-channel relationships can also be incorporated in the QOS_transp specification (using appropriate constructs). In the earlier example of conferencing, data skew with respect to HIGH_LIGHT may be acceptable in the VOICE or TEXT channels but not in both. With the specifications for VOICE and TEXT as l_del = 1.5 sec each and u_del = 2.75 sec and 3.5 sec respectively, a non-arrival of VOICE at the 2.75 sec mark after the delivery of HIGH_LIGHT may require the underlying protocol to re-schedule the delivery of TEXT immediately rather than at the 3.5 sec mark (say, by increasing the delivery priority of TEXT with respect to other segments waiting in end-system buffers). Such a scheduling needs to be incorporated in the application-supplied `glitch handler' that is invocable by the protocol during generation of play-out schedules. Where applications exhibit complex inter-media relationships concerning data skews (as in the above example), specific mechanisms for appropriate scheduling of media data may be superimposed on the base protocols that deal with data transport and delivery.
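The VOICE/TEXT re-scheduling rule described above can be sketched as a hypothetical glitch-handler fragment; the function name, channel names, and return strings are our own illustrative choices.

```python
# Hypothetical cross-channel re-scheduling: if VOICE has not arrived by its
# u_del mark (relative to HIGH_LIGHT delivery, taken as time 0), promote TEXT
# for immediate delivery instead of waiting until its own u_del mark.

def reschedule(now, arrivals, u_del):
    """arrivals: channel -> arrival time, or None if not yet arrived;
    u_del: channel -> latest scheduling mark for that channel."""
    if arrivals.get("voice") is None and now >= u_del["voice"]:
        return "deliver TEXT now"          # raise TEXT's delivery priority
    return "keep TEXT scheduled at u_del"

print(reschedule(2.75, {"voice": None, "text": 1.2},
                 {"voice": 2.75, "text": 3.5}))  # deliver TEXT now
```

The point of the sketch is that the trigger condition spans two channels, which is why it belongs in an application-supplied handler rather than in the per-channel base protocol.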

5.6 Hierarchical decomposition of specifications

The Occurs_After construct can be used to specify the presentation requirements at various levels of data granularity in a uniform manner. Accordingly, multiple instances of the synchronization protocol may be

[Figure 8: Hierarchical decomposition of synchronization specifications in multimedia document editing. At the higher level of data granularity, a text segment (persisting about 3.9 sec) and a mouse click segment (about 3.1 sec) are followed within 1-3.5 sec by a 2.0 sec voice clip segment; at the lower level of data granularity, the voice clip decomposes into a periodic sequence of segments containing audio samples, each lasting 25 msec (an inter-segment gap is shown for clarity; actually, the gap does not exist).]

active, with each instance parameterized by a separate synchronization specification, and these instances composed together to generate a play-out schedule of the data segments at different levels of granularity. Consider a multimedia document editing application that employs voice annotations for text inputs and mouse clicks. Here, the requirement may be that, say, the text and mouse data be delivered simultaneously on a display window within a 3-4 sec interval and the voice data be delivered to a speaker within 1-3.5 sec after the text and mouse click. The ordering specifications for this application may be as follows:

    Occurs_After((VOICE_CLIP, 1.5, 0.5), TEXT ∧ HIGH_LIGHT, {1.0, 3.5}),
    Occurs_After((TEXT, 2.5, 1.4), NULL, {3.0, m}) and
    Occurs_After((HIGH_LIGHT, 2.0, 1.1), NULL, {3.0, m}),                               (8)

where m (≥ 3.0) indicates a non-deterministic `quiesce time'. The voice clip lasting for 2.0 sec is viewed as a distinct data segment at the same level of granularity as the text and mouse data. At a lower granularity, the voice data segment in itself may be viewed as a continuous media data stream, being decomposable into a periodic sequence of audio samples, with each sample lasting for 25 msec. This specification may be given as Occurs_After((AUDIO_i, 0.025, 0.0), AUDIO_{i-1}, {0.025, 0.025}). See Figure 8 for an illustration. Such a hierarchical decomposition of media data exemplifies the generality and extensibility of our specification method. And this is possible with the notion of application-specific segmentation of media data11. We believe that the transport-oriented specification of media data dependencies in our model (in terms of causality) is more directly usable in an implementation of synchronization protocols than the higher level

11 The hierarchical decomposition of synchronization specifications may be useful in constructing programming models of complex multimedia applications, say, for verification purposes.


Petri-net based specifications employed in [18]. We now analyze how the ordering relations, as specified in QOS_transp, can be used by the underlying protocols to generate data play-out schedules.

6 Execution frameworks for synchronization protocols

The parameters specified in QOS_transp may be used to: i) generate a specification of the network delay behavior, and ii) generate a play-out schedule for data segments arriving through the network. These functions are incorporated as distinct elements in the protocols (see Figure 9).

6.1 Generating channel delay specifications

The synchronization protocol maps the Occurs_After relations on data segments to an acceptable delay variability on the channels carrying these data, specifiable as QOS_net, satisfying the timeliness constraint D_Q ≤ δ_r for the rth channel (refer to the timing relations (3) and (6)). Consider the QOS_transp specification given by (7) for a video application example. The parameters l_del_audio, u_del_audio = 33 msec and l_del_video, u_del_video = 33 msec indicate that δ_audio, δ_video = 33 msec. So the corresponding QOS_net may specify a delay bound D_Q = 33 msec for the channels carrying video and audio segments. Consider the earlier example of multimedia document editing, as given by the relations (8). Here, m is set at run-time based on how the synchronization protocol generates `quiesce times' for this application. If

θ_x is the time at which a media data segment x is generated relative to the start of the ith interval, QOS_net may specify the delay bounds D_Q as: (m − θ_text) and (m − θ_high_light) for the channels carrying text and mouse data respectively, and (m + 3.5 − θ_voice) for the channel carrying voice data, where θ_text, θ_high_light < 3.0 and θ_voice < 4.0. The above type of mapping from QOS_transp to a specification of D_Q in QOS_net is necessary to control the delay behavior of data channels in the network, and hence exploit the possible asynchronism in media data delivery12. In many cases, a closed form derivation of QOS_net directly from the Occurs_After relations specified in QOS_transp is possible. In some cases, it may not be possible to derive a closed form mapping of delay specifications. How QOS_net parameters may be generated in such cases needs further work.

12 Where the network delay D is persistently too high to meet the D_Q specification, the network may refuse to set up the channel for the rth media data. This may in turn cause an abort of the entire connection setup, based on how crucial the rth media is to the application (e.g., refusal to set up the video data channel may still allow the audio data channel to be set up for a videophone, in a degraded mode of operation).
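The D_Q derivation for the document-editing example can be sketched as follows. The symbol θ_x (generation offset of segment x within the interval) follows the notation assumed in this rewrite, and the function name and dictionary encoding are ours.

```python
# Sketch of the QOS_transp -> QOS_net mapping for the document-editing example:
# per-channel delay bounds D_Q derived from the Occurs_After relations (8).

def delay_bounds(m, theta):
    """m: run-time quiesce mark; theta: channel -> generation offset of the
    segment relative to the start of the interval. Text and mouse data must
    be deliverable by the m mark, voice data by the m + 3.5 mark."""
    return {
        "text": m - theta["text"],
        "high_light": m - theta["high_light"],
        "voice": m + 3.5 - theta["voice"],
    }

print(delay_bounds(3.0, {"text": 1.0, "high_light": 0.5, "voice": 2.0}))
```

A segment generated later in the interval (larger θ) leaves a tighter delay budget for its channel, which is exactly the tradeoff the mapping exposes to the network.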


[Figure 9: Functional elements of the synchronization protocol. Data channels and a synchronization channel from the network feed a packet arrival buffer; a media splitter separates the application data segments from the synchronization control segments carrying the framing parameters. A segment assembler fills the data segment buffer; the play-out scheduler, driven by a clock and by evaluation of the Occurs_After constraints (using the timing parameters τ_r, k_r, δ_r and a label buffer with insertion/removal of labels), presents media data to the end-point objects M1, M2, M3 for media 1, 2, 3 (implemented by devices) in the application entity, invoking the `glitch' handler as needed.]

6.2 Evaluation of ordering constraints

When a data segment x_j^i is received over a channel, the protocol buffers x_j^i, pending determination of whether x_j^i can be scheduled for play-out. When the ordering constraints of x_j^i are satisfied, it is placed in the input buffer of the jth media object for delivery to the application device. Processing of data by this object from its input buffer should ensure that when x_j^i is delivered, all other segments on which x_j^i has dependencies have already been delivered (by `delivery', we mean `starting the play-out'). To allow such processing, a data segment x_j^i has a unique label (e.g., sequence number). Evaluation of an ordering constraint involves parsing the AND graph rooted at the `vertex' x_j^i to match each descendent `vertex' with the labels of the various data already delivered in the current interval. When all descendent `vertices' of x_j^i, say segment x_r^i, have their data delivered (i.e., x_r^i → x_j^i), x_j^i is transferred to the input buffer of the jth media object and scheduled for delivery to the application in the interval [t_r^i + l_del_j^i, t_r^i + u_del_j^i], and the label of x_j^i (not the data part) is retained in another buffer. The information in this buffer may be garbage collected when the temporal position advances to the (i+1)th interval. With N < 5 and labels garbage collected at every interval, the size of the buffer needed to retain the labels of data already delivered will be small13.

13 Time-stamping of data segments at sources using `synchronized real-time clocks' is another technique that may be used in the protocol to determine the causal ordering. The ordering based on time-stamp values, though simpler than our AND-graph based approach, may lose some information on concurrency due to linearization of the partial order among segments that is inherent in the application. For instance, two concurrent segments x_r^i and x_s^i may be assigned time-stamp values TS_r and TS_s respectively such that TS_s > TS_r (say, x_s is generated after x_r in physical time). Here, the causal relationship inferred from TS_s and TS_r is x_r^i → x_s^i; so the play-out schedule enforced will be t_r^i < t_s^i. The potential loss of concurrency, though not incorrect, may reduce the asynchronism possible during data play-outs.
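The label-buffer bookkeeping of section 6.2 can be sketched as below; the class and method names are our own illustrative choices.

```python
# Sketch of label-buffer bookkeeping: labels of delivered segments are
# retained per interval and garbage collected when the temporal position
# advances to the next interval.

class LabelBuffer:
    def __init__(self, deps):
        self.deps = deps                 # AND graph: segment -> list of parents
        self.delivered = set()           # labels delivered in current interval

    def try_deliver(self, label):
        # Evaluate the ordering constraint: every descendent vertex of the
        # AND graph rooted at `label` must already have been delivered.
        if all(d in self.delivered for d in self.deps.get(label, [])):
            self.delivered.add(label)    # retain the label (not the data part)
            return True
        return False                     # keep buffered, pending dependencies

    def advance_interval(self):
        self.delivered.clear()           # garbage collect the labels

buf = LabelBuffer({"x2": ["x1"]})
print(buf.try_deliver("x2"))  # False: x1 not yet delivered
print(buf.try_deliver("x1"))  # True
print(buf.try_deliver("x2"))  # True
```

With a small N and per-interval garbage collection, the `delivered` set stays bounded by the few labels of the current interval, matching the paper's observation that the buffer remains small.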


6.3 Data play-out procedures

The delivery of data segments at a destination has two components: i) handling the data skews that may occur when the boundary conditions on delay behavior are enforced probabilistically by the network (such as in ` '-channels [3]), and ii) scheduling the delivery of data segments as per the ordering constraints expressed in the Occurs_After relations. Data segments are time-stamped at the source to allow the play-out scheduler at a destination to determine their time of generation. The estimated play-out time for a data segment x_r^i may be determined as:

    t_r^i(est) = min({max({t_r^{i-1}, t_s^i, ...}) + u_del_r^i, t_r^{i0} + τ_r^i + k_r + δ_r^i}),

where x_s^i, ... are causally preceding segments in the ith interval (if any), t_r^{i0} is determined from the time stamp carried by x_r^i, and the timing parameters are determined from the Occurs_After specifications (c.f. equation (3)). Let t be the time at which x_r^i becomes actually available at the destination entity with its ordering constraints satisfied, where (t_r^{i0} + τ_r^i + k_r) < t ≤ (t_r^{i0} + τ_r^i + k_r + D_Q(x_r^i)) and t > max({t_r^{i-1}, t_s^i, ...}). The data segment x_r^i may be played out as follows:

case 1. (t_r^{i-1} + τ_r^{i-1} + k_r) ≤ t ≤ t_r^i(est): The segment is immediately transferred to the device for play-out.

case 2. t < (t_r^{i-1} + τ_r^{i-1} + k_r): The segment is buffered for the duration (max({t_r^{i-1}, t_s^i, ...}) + l_del_r^i − t), and then transferred to the device for play-out.

In both cases, t_r^i is set to t_r^i(est). In case 1, there is some degree of laxity in scheduling the delivery of x_r^i. And this laxity may manifest in the form of: i) specifying probabilistic channel delays, as determined

from δ_r^{i-1}, with the attendant reduction in network resource usage (in comparison to specifying deterministic channel delays), and ii) reducing the buffering delays for play-out. When a data segment x_r^i misses its delivery deadline (i.e., t > t_r^i(est), which may arise due to D_Q specifying a probabilistic bound), all segments that causally depend on x_r^i (i.e., {x_s^i} for s = 1, ..., r−1, r+1, ..., N with x_r^i → x_s^i) cannot be scheduled for play-out without invoking the skew handler for x_r^i. The latter may determine the play-out schedules for these segments in an application-specific manner. In the earlier example of document editing

(c.f. specification (8)), the missing of the text segment may be handled by enabling the scheduling condition for play-out of the voice clip so that the occurrence of the mouse data suffices for the voice data to occur. A conservative choice of the parameter values in QOS_net (i.e., choosing a low value of D_Q, based on the l_del parameter, when a large jitter is tolerable in the application) may result in less data skew but put more demand on network internal resources than an optimistic choice (i.e., choosing D_Q close to or above the allowed limits, based on the u_del parameter). We have studied this tradeoff for different applications and network environments by simulation [19].
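The two play-out cases, together with the deadline-miss path into the glitch handler, can be sketched as follows; the parameter names follow the notation assumed in this rewrite, and the function and return values are our own encoding.

```python
# Sketch of the play-out decision of section 6.3: immediate transfer when the
# segment arrives after the previous segment's persistence ends and before its
# estimated deadline; buffering when it arrives too early; glitch handling
# when the deadline is missed.

def playout_action(t, t_prev, persist_prev, t_est, t_deps_max, l_del):
    """t: arrival time with ordering constraints satisfied;
    t_prev + persist_prev: end of the previous segment's persistence;
    t_est: estimated play-out deadline t_r^i(est);
    t_deps_max: max start time over causally preceding segments;
    returns (action, buffering delay)."""
    if t_prev + persist_prev <= t <= t_est:
        return ("play_now", 0.0)                       # case 1
    if t < t_prev + persist_prev:
        return ("buffer", t_deps_max + l_del - t)      # case 2
    return ("glitch_handler", None)                    # deadline missed

print(playout_action(t=2.0, t_prev=0.0, persist_prev=1.5,
                     t_est=3.0, t_deps_max=1.0, l_del=0.5))  # ('play_now', 0.0)
```

An early-arriving segment (case 2) is held until the l_del mark relative to its dependencies, while a late one falls through to the application-supplied skew/glitch handler.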

6.4 Relaxed play-outs of data

In many cases, the play-out schedules may allow overlapped delivery of various media data to application devices. The overlapped delivery may be either for concurrent data segments or in cases where the persistence duration of a data segment is larger than the l_del parameter associated with its causally dependent data segment. For instance, the play-outs of data x_1 and x_2 can fully overlap each other when R(x_1, x_2) ≡ ∥{x_1, x_2}. Consider the earlier example of multimedia document editing given by specification (8). Here, the voice clip: i) definitely overlaps the text since τ_text + k_text (= 3.9) > u_del_voice (= 3.5), and ii) possibly overlaps the mouse click since l_del_voice (= 1.0) < τ_high_light + k_high_light (= 3.1) < u_del_voice (= 3.5). When the audio data is scheduled based on l_del_voice (= 1.0), it fully overlaps the text and mouse data since τ_text + k_text − l_del_voice (= 2.9), τ_high_light + k_high_light − l_del_voice (= 2.1) > τ_voice + k_voice (= 2.0). When the audio data is scheduled based on u_del_voice (= 3.5), the overlap is at least partial with the text data since 0 < τ_text + k_text − u_del_voice (= 0.4) < τ_voice + k_voice (= 2.0), but no overlap with the mouse data is possible since τ_high_light + k_high_light − u_del_voice (= −0.4) < 0. The actual arrival time t of the voice data, where l_del_voice < t ≤ 3.5, determines the extent of overlap with the text and mouse data in this case. Since devices in end-systems are usually capable of operating in parallel and picking up data directly from play-out buffers (say, through `DMA access'), an overlapped play-out mechanism in the synchronization protocol can potentially be exploited by an end-system. It is also possible that the MCS is implemented on a multiprocessor, with each processor controlling a separate device. In this case, the potential for exploiting the parallelism in play-outs is higher.
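The overlap arithmetic of this example can be reproduced with a small sketch. The `overlap` helper is ours; the constants are those given in the text (persistence 3.9 sec for text, 3.1 sec for the mouse highlight, 2.0 sec for the voice clip).

```python
# Overlap between a parent segment and the voice clip, as a function of the
# schedule offset y of the voice clip relative to the parent's start.

def overlap(parent_persist, y, voice_persist):
    """Duration of overlap between the parent (persisting for parent_persist
    after its start) and the voice clip starting y after the parent's start."""
    return max(0.0, min(parent_persist - y, voice_persist))

# Scheduled at l_del_voice = 1.0: full overlap with both text and mouse data.
print(overlap(3.9, 1.0, 2.0), overlap(3.1, 1.0, 2.0))            # 2.0 2.0
# Scheduled at u_del_voice = 3.5: partial with text (0.4), none with mouse.
print(round(overlap(3.9, 3.5, 2.0), 2), overlap(3.1, 3.5, 2.0))  # 0.4 0.0
```

These values reproduce the full/partial/no-overlap cases derived in the text for the two scheduling extremes.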
The weakening of the delay bounds D_Q, made possible by the relaxed play-out schedules on data segments, can often result in reduced allocation of network resources for data channels. The extent of asynchronism

in protocol execution and the extent of optimization in network resource allocations that arise from exploiting the weaker synchronization constraints in applications need a quantitative analysis. This however is beyond the scope of this paper.

6.5 Transport of presentation control information

In each interval, the presentation and other control information (viz., skew handler references, data dependency graphs and timing specifications) in the application program are needed for framing of data. In some applications, these framing parameters may be generated by the source(s) afresh for each interval and transported on a synchronization channel alongside the data streams (e.g., multimedia conferencing), as suggested in [9]. Refer to Figure 9. This channel should have a low loss rate and low delays, since a loss and/or large delay of control information can result in dropping of data frames. In some other applications, the framing parameters may be generated locally at the destination for each interval, based on the QOS_transp specified at connection setup. In the TV example, the delay parameters can be locally generated based on the values negotiated at connection setup. Note that any control information necessary to `synchronize' real-time clocks at various entities should also be transported on a separate channel with low loss rate and delays. With our specification method treating single-source and multi-source multimedia data alike, the transport level flow of causality information across various sources can also be realized using the above procedures. Empirical studies on the synchronization protocols have been carried out for sample applications, which confirms the feasibility of our approach. An implementation of the protocols on a backbone network testbed is part of our further research plan.

7 Walk-through of protocol in sample application: Multi-player card game

We first describe the user-media interactivity in the application and the media synchronization requirements, and then the real-time play-out schedules of data generated by the synchronization protocol.

7.1 Mapping of game actions to media transport The various players participating in a card game sit in front of their workstations and interact with one another over a network. Each player has three windows displaying his/her own cards, the bets o ered by players and the cards played. The MCS implements a multicast channel mechanism that allows the `game 25

state' sent by a workstation to be received by all other workstations through the network. Each player takes turn in some pre-determined sequence to o er bets and play cards. A player typically looks at the bets o ered and cards played by other players before deciding his/her action. The turn to play wraps around among M ( 2) players, thereby forming a `logical ring' with each player passing the turn to his/her next player in the ring (this determines the causality among various user activities). The card display is by a graphics media (g card) and the bet o ering is by two media: a text display indicating the bet (t bet) and a voice annotation of the bet (v bet). See Figure 10-(a). Here, these media segments pertaining to a player action in a given round of play constitute a data frame, with

R({g_card, t_bet, v_bet}) ≡ g_card → ∥{t_bet, v_bet}.

Assuming that the play actions of various players go in a strict sequence, we have that

(g_card_i → ∥{t_bet_i, v_bet_i}) → (g_card_{(i+1) mod M} → ∥{t_bet_{(i+1) mod M}, v_bet_{(i+1) mod M}}),

where i is an integer indicating the position of a player in the ring.
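For illustration, the precedence relation above can be encoded as a dependency graph and a valid play-out order derived by topological sorting. This is a sketch under our own naming conventions; it does not reproduce the paper's protocol machinery:

```python
from graphlib import TopologicalSorter  # Python 3.9+

def round_dependencies(i: int, M: int) -> dict:
    """Edges of the precedence relation for players i and (i+1) mod M:
    each segment maps to the set of segments that must precede it."""
    nxt = (i + 1) % M
    return {
        f"t_bet_{i}": {f"g_card_{i}"},
        f"v_bet_{i}": {f"g_card_{i}"},
        # the next player's card display follows both bet annotations
        f"g_card_{nxt}": {f"t_bet_{i}", f"v_bet_{i}"},
        f"t_bet_{nxt}": {f"g_card_{nxt}"},
        f"v_bet_{nxt}": {f"g_card_{nxt}"},
    }

# One schedule consistent with the precedence relation for players 0 and 1.
order = list(TopologicalSorter(round_dependencies(0, 4)).static_order())
assert order.index("g_card_0") < order.index("t_bet_0") < order.index("g_card_1")
```

Any topological order of this graph is an admissible play-out sequence; the protocol's remaining freedom lies in the real-time offsets attached to each edge (Section 7.2).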

7.2 Presentation control on individual player actions

From the MCS perspective, the media are modeled as non-continuous data streams, with the data segments generated by each player associated with a temporal interval of non-deterministic time duration. Let us assume that the minimum separation between the actions of any two successive players along the real-time axis is 2 sec (we consider only the `reaction time' of players, not their `thinking time'); let the minimum and maximum separations between the display of cards and the betting by a player be 1.75 sec and 3.0 sec respectively. Then the QOS_transp component dealing with ordering of data segments appears as follows:

((g_card_i, 1.0, 0.5), Occurs_After(t_bet_{i-1} ∧ v_bet_{i-1}), {4.0, m'}),
((t_bet_i, 1.5, 0.5), Occurs_After(g_card_i), {1.75, 3.0}), and
((v_bet_i, 1.0, 0.5), Occurs_After(g_card_i), {1.75, 3.0}),

where m' (≥ 4.0) is set dynamically by the game control program (based on the `reaction time' of players)^14. The length of a temporal interval d since the start of the display of cards by a player is then given as

4.0 + max({(1.0+0.5), (1.75+1.5+0.5), (1.75+1.0+0.5)}) ≤ d ≤ m' + max({(1.0+0.5), (3.0+1.5+0.5), (3.0+1.0+0.5)}),

i.e., 7.75 ≤ d ≤ (5.0 + m').

^14 A larger value can be chosen for m' to reflect the `thinking time' of players. In this case, the game playing time extends to a larger duration.

[Figure 10: Illustration of data play-out schedules in the card game example. Panel (a) shows the workstation windows (own cards, cards played, bets) and the generation of the data segments g_card, t_bet and v_bet during a turn to play. Panel (b) shows a temporal interval containing the data of one player (i+1), with the offset windows {4.0, m'} and {1.75, 3.0} and an interval processing delay of, say, 0.5 sec. Panel (c) shows a temporal interval containing the data of more than one player ((k+1), ..., i).]

The various QOS_transp parameters are mapped to delay bounds for the data channels carrying graphics, text and audio (specified in QOS_net), as given by: D_Q(g_card) = (m' − Δ_{g_card}), D_Q(t_bet) = (m' + 3.0 − Δ_{t_bet}), and D_Q(v_bet) = (m' + 3.0 − Δ_{v_bet}). In addition, the QOS_transp parameters give a range of play-out schedules for the media data. See Figure 10-(b). A normalized measure of the asynchronism arising due to concurrent delivery of voice and text may be given by (1 − 7.75/11.5) (assuming m' = 4.0), which is 0.33.
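The interval-length bounds derived above can be recomputed mechanically. The short sketch below reproduces them from the per-segment (play-out, processing) durations and offset windows; the variable names are our own and not part of the protocol:

```python
# Per-segment (play-out duration, interval processing delay), in seconds.
segments = {"g_card": (1.0, 0.5), "t_bet": (1.5, 0.5), "v_bet": (1.0, 0.5)}
# (min, max) offsets of the bet annotations after g_card.
offsets = {"t_bet": (1.75, 3.0), "v_bet": (1.75, 3.0)}
G_CARD_MIN_OFFSET = 4.0  # minimum offset of g_card after t_bet and v_bet

def interval_bounds(m_prime: float):
    """Bounds on the interval length d since the start of card display;
    m_prime is the dynamically set maximum g_card offset."""
    spans_min = [sum(segments["g_card"])] + [
        offsets[s][0] + sum(segments[s]) for s in ("t_bet", "v_bet")]
    spans_max = [sum(segments["g_card"])] + [
        offsets[s][1] + sum(segments[s]) for s in ("t_bet", "v_bet")]
    return G_CARD_MIN_OFFSET + max(spans_min), m_prime + max(spans_max)

lo, hi = interval_bounds(m_prime=4.0)
assert (lo, hi) == (7.75, 9.0)  # i.e., 7.75 <= d <= 5.0 + m' with m' = 4.0
```

Varying m_prime shows how the `thinking time' of players (footnote 14) stretches only the upper bound of the interval.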

7.3 Interleaved presentation of multiple player actions

Consider a case where the action of the i-th player is based not necessarily on that of his/her immediately previous player (i − 1) mod M, but on that of some other previous player k, where k < (i − 1) mod M. In this case, the i-th player generates his/her action after seeing the action of the k-th player. Accordingly, the ordering relationship among the data segments from players (k + 1) mod M, ..., i may be given by

∥{ (g_card_j → ∥{t_bet_j, v_bet_j}) }_{j = (k+1) mod M, ..., i}.

A single temporal interval encompasses the occurrence of the data segments generated by these players. This means that the average length of an interval d does not depend on the number of players whose actions are concurrent during a given turn to play. See Figure 10-(c) for an illustration of the data delivery schedules. The concurrency among the data segments from players (k + 1) mod M, ..., i depicts a weaker constraint on their delivery at all the M player entities. This manifests as increased asynchronism in the delivery processing

of data when compared to the case of a fixed delivery sequence given by a rigid enforcement of the turn to play. A normalized measure of this asynchronism with, say, 2 concurrent player actions, may be given by (1 − 7.75/(2 × 11.5)), which is 0.66. The methods used to map the card game application to synchronization protocol procedures in the MCS can be useful for other application models as well.
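The weaker constraint can be made concrete with a small sketch: with no cross-player edges among the concurrently acting players, many interleavings of their segments satisfy the dependency graph, whereas a strictly sequenced round admits exactly one. The encoding below uses our own names and brute-force counting, purely for illustration:

```python
from itertools import permutations
from graphlib import TopologicalSorter

def concurrent_round(players):
    """Dependency graph for concurrently acting players: within each
    player j, g_card_j precedes t_bet_j and v_bet_j; no cross-player
    edges exist, so segments of different players may interleave."""
    deps = {}
    for j in players:
        deps[f"t_bet_{j}"] = {f"g_card_{j}"}
        deps[f"v_bet_{j}"] = {f"g_card_{j}"}
    return deps

deps = concurrent_round([3, 4])
order = list(TopologicalSorter(deps).static_order())  # one valid schedule

# Count all play-out orders consistent with the weaker constraint.
valid = sum(
    1 for p in permutations(order)
    if all(p.index(d) < p.index(s) for s, pre in deps.items() for d in pre)
)
assert valid == 80  # a rigidly enforced turn to play would admit just one
```

The count of admissible schedules grows combinatorially with the number of concurrent player actions, which is the freedom the synchronization protocol can trade for relaxed channel delay bounds.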

8 Related works

Some existing works suggest controlling the delay behavior of channels set up through the network, on top of which an appropriate synchronization protocol may be layered. For instance, [20] guarantees delay jitter-free channels in the network, so media synchronization may be implicitly achieved by providing `synchronized real-time clocks' in the transport layer and moving data streams over such channels. Though the synchronization protocol is simpler, the approach does not offer the flexibility of exploiting the weaker synchronization requirements inherent in many applications for optimal usage of network resources. For example, the text and graphics data in a catalogue browsing service are subject to the same degree of delay jitter control as, say, the video and audio data in video telephony, which has stronger synchronization requirements. Some works allow weaker levels of delay guarantees from the network [3, 4, 21] to provide more flexibility in this respect. But these works deal with application requirements at the level of the QOS_net specification, not with a canonical specification of media synchronization requirements at the application-transport layer interface, viz., QOS_transp. The work in [17] demarcates synchronization from transport and primarily deals with only the former by analyzing the presentation requirements of multimedia data. So these high level requirements need to be translated into a precise specification of QOS_transp for use by the MCS. Similarly, the multimedia communication model proposed in [1] identifies the temporal ordering issues; however, it does not deal with how the ordering can be enforced. The synchronization protocol model proposed in [22] identifies the various delay elements in the end-to-end data channels and (implicitly) accounts for these delays during play-out of data.

In comparison to these works, our paper directly deals with deriving a flexible specification of QOS_transp from application level synchronization requirements and using the QOS_transp in various components of the MCS (such as generating delay bounds for channels; see also [19]). The work closest to our paper is the `object-composition Petri net' model proposed in [18], which deals with

higher level requirements as to how temporal relationships among multimedia data streams can be specified and how these relationships may be enforced by a synchronization protocol to generate data play-out schedules. Likewise, the `partial order transport service' model proposed in [23] adopts a similar specification approach, but no concrete protocols for synchronization are dealt with. In comparison, our approach deals with a lower level transport-oriented specification of temporal relationships among data, viz., QOS_transp, in the form of data dependency graphs, which can be directly used by a synchronization protocol for data play-outs. For instance, how data delays induced by the network may be tackled by the synchronization protocol can be directly inferred from the QOS_transp. In this sense, our paper provides an alternative, flexible approach of extracting directly executable synchronization specifications from applications.

9 Conclusions

The paper presented a comprehensive model of a multimedia communication system (MCS) that synchronizes multimedia data streams during presentation to the application (e.g., digital TV, scientific visualization). The basic premise in our approach is that the user level component of the MCS takes the burden of synchronization, instead of the network. This introduces flexibility in the data transport protocols and allows optimizing the usage of network resources to application needs. The model can be used for both stored and live data presentation, and it treats single-source and multi-source multimedia data alike from a synchronization specification point of view. Basically, the temporal axis of an application is segmented into intervals, where each interval is a unit of synchronization and holds a data frame. Simultaneous data play-out involves delivering all data segments belonging to an interval within a certain real-time delay for processing. A data segment basically depicts an application-specific granularity of data, often realizable as a sequence of packets exchanged through the network. The notion of simultaneity and ordering of data segments is meaningful only to the application in that the latter determines: i) how far a data frame can be skewed in time in relation to other frames and still be acceptable, and ii) the sequence in which various segments in an interval can be presented. These temporal properties are specifiable in the form of a transport-oriented QOS description, which is mappable into a lower level QOS specification on the required delay behavior of the underlying network. Thus our model allows application characteristics to be mapped into a set of data delivery procedures composed in the form of a media synchronization protocol. A possible relaxation of the data delivery constraints and the network delay requirements, as allowed by the model, offers a potential for optimal usage of network

resources and flexibility in the design of transport systems. The specification method is itself independent of the source-destination configuration in applications, which allows easier construction of programming models of multimedia applications for analysis, verification and/or implementation. Using the above model, the paper described protocols for synchronized data delivery, meeting application specific real-time constraints in the presence of data transport delays in the network. The contribution of the paper is in formulating a canonical and unified model of temporal presentation control of multimedia data streams that allows different types of QOS specifications and synchronization protocols to be easily `plugged in' as MCS components. The model basically allows building a network-independent media synchronization function in an end-system. We believe that the model will shed useful insight into the evolving multimedia application technologies.

References

[1] W. F. Leung et al. A Software Architecture for Workstations Supporting Multimedia Conferencing in Packet Switching Networks. IEEE Journal on Selected Areas in Communications, Vol. 8, No. 3, pp. 380-390, April 1990.

[2] G. Anastasi, M. Conti and E. Gregori. TPR: A Transport Protocol for Real Time Services in a FDDI Environment. In International Workshop on Protocols for High Speed Networks, IFIP, 1991.

[3] B. Field and T. Znati. Alpha-Channel: A Network Level Abstraction to Support Real-time Communication. In Proc. 2nd International Workshop on Network and Operating System Support for Digital Audio and Video, Heidelberg (Germany), pp. 148-159, Nov. 1991.

[4] D. Ferrari and D. C. Verma. Real-time Communication in a Packet Switching Network. In 2nd International Workshop on Protocols for High Speed Networks, IFIP WG6.1/WG6.4, Palo Alto (CA), Nov. 1990.

[5] D. L. Mills. Precision Synchronization of Computer Network Clocks. Computer Communication Review, Vol. 24, No. 2, pp. 28-43, April 1994.

[6] G. J. M. Smit, P. M. Havinga and M. J. P. Smit. Rattlesnake: A Network for Real-time Multimedia Communications. Technical report, University of Twente, Netherlands.

[7] D. Clark et al. NETBLT: A High Throughput Transport Protocol. In Proc. Symp. on Communication Architectures and Protocols, ACM SIGCOMM, pp. 353-359, 1988.

[8] C. Nicolaou. An Architecture for Real Time Multimedia Communication Systems. IEEE Journal on Selected Areas in Communications, Vol. SAC-8, No. 3, pp. 391-400, April 1990.

[9] D. Shepherd and M. Salmony. Extending OSI to Support Synchronization Required by Multimedia Applications. Computer Communications, Butterworth-Heinemann, Vol. 13, No. 7, pp. 399-406, Sept. 1990.

[10] D. D. Clark and D. L. Tennenhouse. Architectural Considerations for a New Generation of Protocols. In Proc. Symp. on Communication Architectures and Protocols, ACM SIGCOMM, Philadelphia (PA), pp. 200-208, Sept. 1990.

[11] R. G. Herrtwich. Time Capsules: An Abstraction for Continuous Media Data. Journal of Real-time Systems, 1991.

[12] L. Li, A. Karmouch and N. D. Georganas. Synchronization in Real Time Multimedia Data Delivery. In Proc. Intl. Conf. on Communications, 1992.

[13] P. V. Rangan, S. Ramanathan and T. Kaeppner. Inter-Media Synchronization in Multimedia Retrieval over Communication Networks. Technical Report CS92-227, Dept. of Computer Science, UCSD, April 1992.

[14] L. Lamport. Time, Clocks, and the Ordering of Events in a Distributed System. Communications of the ACM, 1978.

[15] R. Yavatkar. MCP: A Protocol for Coordination and Temporal Synchronization in Multimedia Collaborative Applications. In Proc. Intl. Conf. on Distributed Computing Systems, IEEE-CS, Yokohama (Japan), pp. 606-613, June 1992.

[16] CCETT. Multimedia Synchronization. CCETT Int. Note, AFNOR ad-hoc group on AVI standardization, July 1988.

[17] R. Steinmetz. Synchronization Properties in Multimedia Systems. IEEE Journal on Selected Areas in Communications, Vol. SAC-8, No. 3, pp. 401-412, April 1990.

[18] T. D. C. Little and A. Ghafoor. Multimedia Synchronization Protocols for Broadband Integrated Services. IEEE Journal on Selected Areas in Communications, Vol. SAC-9, No. 9, Dec. 1991.

[19] K. Ravindran and V. Bansal. Delay Compensation Protocols for Synchronization of Multimedia Data Streams. IEEE Transactions on Knowledge and Data Engineering, Special Issue on Multimedia Information Systems, Vol. 5, No. 4, pp. 574-589, Aug. 1993.

[20] D. Ferrari. Design and Applications of a Delay Jitter Control Scheme for Packet-Switching Internetworks. In Proc. 2nd Workshop on Network and Operating System Support for Digital Audio and Video, Heidelberg (Germany), pp. 72-83, Nov. 1991.

[21] A. Campbell, G. Coulson, F. Garcia and D. Hutchison. A Continuous Media Transport and Orchestration Service. In Proc. Symp. on Communication Architectures and Protocols, ACM SIGCOMM, Baltimore (MD), pp. 99-110, Aug. 1992.

[22] J. Escobar, D. Deutsch and C. Partridge. Flow Synchronization Protocol. In Proc. IEEE GLOBECOM, pp. 1381-1387, 1992.

[23] P. D. Amer, C. Chassot, T. J. Connolly, M. Diaz and P. Conrad. Partial-Order Transport Service for Multimedia and Other Applications. IEEE/ACM Transactions on Networking, Vol. 2, No. 5, pp. 440-455, Oct. 1994.