DirectShow RTP Support for Adaptivity in Networked Multimedia Applications

Linda S. Cline, John Du, Bernie Keany, K. Lakshman, Christian Maciocco, David M. Putzolu (author names are in alphabetical order)
Intel Architecture Labs
2111 N.E. 25th Avenue, Hillsboro, OR 97124
Abstract

Execution of an interactive collaboration application or a distributed game involves many individual multimedia applications that concurrently generate and/or play back several audio and video streams. As the resource requirements of individual streams change and as streams and applications are started or later terminated, the amount of available resources may change dynamically. Networked MultiMedia (NetMM) applications must be prepared to adapt to these changes, taking advantage of the fact that they can deliver varying levels of service that are acceptable to users. This paper addresses the problem of adding network and host adaptive capability to the component-based DirectShow RTP. DirectShow is Microsoft's architecture for capture and presentation of multimedia data. DirectShow RTP is a framework that extends the DirectShow architecture, adding support for multimedia applications that stream their data across computer networks using the RTP protocol. The DirectShow RTP framework was designed to support a wide variety of multimedia streaming tasks in a highly extensible manner. DirectShow RTP is slated for distribution as part of Windows NT 5.0, thus assuring widespread availability. We have extended this framework by adding support for streaming applications which dynamically compensate for varying resource availability on the local host and on the computer networks being used to deliver multimedia data to and from these applications. Our extensions include components for capturing information relevant to making adaptation decisions, components that determine adaptation policies to follow based on this information, and methods of implementing these policies using capabilities already present in the base architecture. The lessons learned are useful both to applications that wish to use adaptation as well as to designers of frameworks used to build such applications.
1. Introduction

Networked MultiMedia (NetMM) applications are one of the most resource-intensive classes of software executed on personal computers and single-user workstations. Such applications have high requirements in terms of host processing power and network bandwidth consumption. These applications also often require that the resources necessary for execution be provided by the underlying operating system and computer networks within near-real-time constraints. Delay in access to any of the above resources can result in significant perceived degradation of the quality of the presentation by the user. As the resource requirements of individual NetMM applications change and as streams and applications are started or later terminated, the amount of available resources on the local host and over the network may change dramatically. Because both resource requirements and availability change dynamically at run time, NetMM applications must be prepared to adapt to these changes in a graceful fashion [10,11]. In this paper, we focus on two types of adaptation: network adaptation and host adaptation. Network adaptation is the ability of a streaming application to vary
the way it utilizes network resources so as to make the most efficient use of these resources in the face of such conditions as changing loss, jitter, and available bandwidth. Host adaptation is defined as the ability of applications to modify their behavior based on conditions present in the local host, including such factors as CPU utilization and available memory. The following examples are illustrative of situations where both network and host adaptation are useful:

Competition for resources among multiple streams. Consider an application that uses an audio stream, a high-bitrate video stream, and a bursty slide-show stream, where the audio stream is considered to be of the highest value to the user. If the quality of the audio stream suffers, the application may adapt the slide show or the video stream to utilize fewer system resources in order to satisfy the user's priorities. The application must also be able to detect competition for network resources and adapt the behavior of each stream it sends and the set of streams it is receiving appropriately.

Allowing presentations to be viewed by clients with a variety of available network and processor resources. In a video conference or collaboration session that involves a set of users with heterogeneous capabilities in terms of processor power and bandwidth availability, not all endpoints will be able to receive all streams. In this situation, all members of a session may be able to participate if hierarchically encoded video is used. This form of adaptation allows heterogeneity of participants, often with minimum resource requirements that are significantly relaxed in comparison to a presentation consisting of a single video stream with no hierarchical encoding.

Compensating for user changes in the relative importance of applications. In a single-user environment, the relative importance of one application over another varies from time to time. For example, when a user switches the application focus from watching a news broadcast to a compilation or a design task, operating systems such as Windows 95 typically cause the multimedia application to move to the background and execute at a lower priority. When the user switches back to the news broadcast to watch an event of interest (for example, the latest highlights from a baseball game), the operating system increases the execution priority of the application in response. NetMM applications should also respond to being placed in the background or moved to the foreground. Thus, NetMM applications switched to the background should reduce their network and CPU utilization, returning to full utilization when they are brought back to the foreground. This additional response, above and beyond the priority adjustments performed by the operating system, maximizes the allocation of resources to those tasks that need them most.

Researchers have proposed several methods of enabling adaptation in NetMM applications. These methods include different forms of adaptive source rate control [1,6], receiver-driven adaptation using layered video [7,9,15], and adaptation to host resources [8,11]. Even when protocols and mechanisms such as RSVP and ATM that provide QoS (Quality of Service) assurances are available to meet the demands of NetMM applications, researchers have explored the use of adaptation in the presence of resource constraints and changes in resource usage [8,12,16,17]. Given the positive effects of adaptation on application quality, we decided to explore how application writers can add adaptivity to applications without having to worry about the intricacies of adaptation. In particular, this paper examines how NetMM applications that adapt to changing network and host conditions can be supported by a component-based architecture for multimedia streaming. This paper discusses our contributions to this area, which consist of new exploration of adaptation in response to host resource constraints, construction of middleware to enable network and host adaptation in NetMM applications, and a set of lessons learned that are useful to any application writer or framework designer wishing to incorporate adaptivity.

We use Microsoft's DirectShow [10] as the basis for our architecture. DirectShow provides a modular, extensible system for implementing multimedia applications. Within the DirectShow architecture, we have added a framework for constructing NetMM applications by implementing the RTP protocol, which we term DirectShow RTP. The rest of this document is organized as follows. Section 2 presents the DirectShow architecture as well as DirectShow RTP. This discussion is followed by Section 3, which presents an analysis of various methods of adding adaptability to a component-based architecture such as DirectShow RTP. Section 4 describes our implementation of source-based adaptive control, followed by Section 5, which presents an implementation of receiver-driven adaptation using layered multicast. Section 6 provides an overall analysis of our experiences in implementing network and host adaptation for applications in a component-based streaming architecture.
Section 7 discusses areas where further work in adaptation and in frameworks for adaptation may yield additional improvements in NetMM applications. We conclude with a discussion of the benefits of adaptation when implemented in a component-based architecture.
2. DirectShow Architecture

2.1 Microsoft's DirectShow

The DirectShow architecture is used to provide the underlying multimedia streaming functionality for many applications. It is used for all aspects of multimedia computing, including capture, encode/decode, playback, and storage of both audio and video data. The DirectShow architecture uses four primary abstractions for manipulating multimedia data. These abstractions are termed filters, pins, media samples, and media types.

DirectShow filters are used to encapsulate one or more tasks related to multimedia streaming. Examples of DirectShow filters include the video capture filter, which is used to control a video camera device, outputting raw RGB video frames, and the H.261 codec filter, which is used to compress raw RGB video buffers into H.261 frames and vice versa. Similar filters such as the audio capture filter and the G.711 codec filter exist for audio streaming. Filters are also provided for playback of audio and video to local devices.

Figure 1: DirectShow filters connected in a filter graph (MPEG file source → MPEG decoder → video renderer).

In order to allow applications to combine several functions together in processing audio or video data, DirectShow uses filter graphs. A filter graph is an ordered set of filters that process multimedia data buffers in a serial manner. Figure 1 shows an example filter graph used for video playback, consisting of a file reader filter, an MPEG decoder filter, and a video renderer filter.

Filters are connected together to form filter graphs via pins. Pins have two primary duties in DirectShow. The first of these duties is negotiation of the media type and memory allocator to use for filter interconnection. Media type negotiation is the means whereby the media type (see below) that will govern the data exchanged between two filters is determined. Memory allocator negotiation is used to specify where memory used to contain multimedia buffers (also called media samples) will be allocated and what the characteristics of that memory will be (e.g., byte alignment, use of special regions of memory from memory-mapped devices, etc.). The second duty of pins in
DirectShow is to hide the means whereby data is exchanged between filters. Once a connection has been successfully negotiated, a filter simply receives and delivers media samples to its pins, which in turn implement the actual means whereby the samples are delivered to the next filter in a filter graph.

Media samples are DirectShow's abstraction for buffers containing multimedia data. In addition to the multimedia data they contain, media samples also contain start and end time stamps, which specify the life of the sample. These values are used by renderers to determine when a sample should be played (e.g., rendered) and to detect performance problems.

DirectShow media types specify the format of the data contained in the media samples exchanged between filters. Media types include several fields, the most important of which are the major and minor type fields. A major type is typically used to differentiate formats according to high-level semantic guidelines. MAJORTYPE_VIDEO and MAJORTYPE_AUDIO are two examples of major types. Minor types typically specify formatting differences; examples include MINORTYPE_AUDIO_G711A and MINORTYPE_AUDIO_G723. If the pins of two filters are able to find a media type common to both filters, then a connection is possible.

DirectShow allows definition of new filters, pins, and media types. Taking advantage of the built-in extensibility of DirectShow, we defined two new media types and several filters to enable support for network streaming of multimedia data via the RTP protocol in the DirectShow architecture. This new framework for NetMM is called DirectShow RTP [5].
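The matching logic at the heart of media type negotiation can be pictured with a small sketch. The types and helper below are simplified, hypothetical stand-ins for DirectShow's actual COM-based media type machinery; they illustrate only the acceptance test, not the real API.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for DirectShow's media type
// abstraction: a major type (high-level semantics) plus a minor type
// (exact format), as described above.
struct MediaType {
    std::string major;  // e.g., "RTP_MAJORTYPE_SINGLE_STREAM"
    std::string minor;  // e.g., "MINORTYPE_AUDIO_G711A"
};

// Connection succeeds if the output pin and input pin can find a
// media type in common; the first match governs the connection.
std::optional<MediaType> Negotiate(const std::vector<MediaType>& outTypes,
                                   const std::vector<MediaType>& inTypes) {
    for (const auto& o : outTypes)
        for (const auto& i : inTypes)
            if (o.major == i.major && o.minor == i.minor)
                return o;
    return std::nullopt;  // no common media type: connection fails
}
```

In DirectShow itself, pins exchange enumerated AM_MEDIA_TYPE structures during connection, but the acceptance test is conceptually the same.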
2.2 DirectShow RTP

DirectShow RTP defines a set of filters and media types that provide support for network multimedia streaming using the RTP protocol [14]. The filters defined are the RTP Source filter, the RTP Render filter, the RTP Demux filter, the RTP Receive Payload Handler (RPH) filter, and the RTP Send Payload Handler (SPH) filter. Using these five filters (along with standard codec and capture/render filters) it is possible to construct the data plane of applications which stream audio and video data across computer networks using the RTP protocol.

The RTP Source filter is used to receive RTP and RTCP packets from a single RTP session. These packets are delivered into a filter graph encapsulated in media samples. The media type advertised by this filter for outgoing connections consists of major type RTP_MAJORTYPE_MULTIPLE_STREAM and minor type RTP_MINORTYPE_PAYLOAD_ANY, both of which were defined as part of the DirectShow RTP
framework. This combination of media types indicates that this filter can produce a stream containing one or more RTP streams (SSRCs), which taken together may be of a single payload type or multiple payload types. This filter provides interfaces for indicating information to send in RTCP receiver reports to other machines and for specifying the network address and port number to use for receiving an RTP session.
Closely related to the RTP Source filter is the RTP Render filter. This filter accepts incoming connections consisting of media samples of major type RTP_MAJORTYPE_SINGLE_STREAM, with any minor type. The more restrictive major type selection is necessary in order to adhere to the guidelines specified in the AVP profile for a sender in an RTP stream with regard to payload type and SSRC values. This filter provides similar control interfaces to those found on the RTP Source filter.

The RTP Demux filter is used for de-multiplexing RTP packets that have been received from the RTP Source filter. This de-multiplexing is done according to the SSRC and payload type of each packet. This filter accepts incoming pin connections of major type RTP_MAJORTYPE_MULTIPLE_STREAM and minor type RTP_MINORTYPE_PAYLOAD_ANY. This filter advertises one or more outgoing pin connections of major type RTP_MAJORTYPE_SINGLE_STREAM with a minor type that is associated with the payload of the single stream being delivered on the pin in question. This filter provides interfaces for controlling how streams are demultiplexed and how individual streams are assigned to particular outgoing pins.

The RTP Receive Payload Handler (RPH) filter is used to transform RTP packets from a single source of a fixed payload type into their corresponding unpacketized (host native) form. Thus, one version of this filter takes RTP H.261 packets and produces H.261 compressed video frames. Versions of this filter have been written for many popular payload types including H.261, H.263, Indeo, G.711, G.723, and G.729, with generic versions available that support any audio or video payload. These generic audio and video RPH filters depacketize media samples according to the specifications of the RTP AVP profile [13]. This filter provides interfaces for specifying the amount of buffering to allocate for reassembly buffers and for the duration of time to wait before dropping (or forwarding) partially reconstructed frames in the event of packet loss.

Similar to the RTP RPH filter is the RTP Send Payload Handler (SPH) filter, which is responsible for segmenting media samples produced by video or audio compressor filters into RTP packets. This filter provides interfaces for specifying the maximum size of packets to produce (in order to allow for differing network MTUs) and for specifying the value to place in the PT field (to allow use of dynamic RTP PT values).

Figures 2 and 3 demonstrate how the filters defined in DirectShow RTP are used.
Figure 2 shows a filter graph used to capture local multimedia data and send it across a computer network using the RTP protocol. This filter graph consists of a video capture filter, which outputs raw video frames, followed by a codec filter, which compresses the frames. Once compressed, these frames are delivered to the RTP SPH filter, which fragments and packetizes them, producing RTP packets, which in turn are delivered to the RTP Render filter, which transmits these packets across a computer network.

Figure 2: Sending network data using DirectShow RTP (Video Capture → Codec → SPH → RTP Render).

Figure 3 shows a filter graph used to receive RTP packets containing a video stream and play back the stream. This graph consists of an RTP Source filter, which receives the packets, an RTP Demux filter, which categorizes them according to source and payload type, and an RTP RPH filter, which converts the RTP packets into compressed video frames. These filters are followed by a decoder filter, which decompresses the frames, and a video renderer filter, which displays the uncompressed frames.

Figure 3: Receiving data using DirectShow RTP (RTP Source → RTP Demux → RPH → Decoder → Video Renderer).

The most important aspects of the DirectShow RTP framework were the definition of the media types used to represent RTP streams and the implementation of five filters for receiving, manipulating, and sending RTP packets. The media type definitions provide a useful way of describing RTP streams in the DirectShow architecture, allowing future filters to be defined which add new methods of processing RTP streams. The filter implementations provided a set of components upon which NetMM applications could easily be built and
provided an infrastructure that could be used for performing further research in the area of NetMM.
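To make the component model concrete, the following sketch assembles the receive graph of Figure 3 using the standard DirectShow graph-builder interface. The RTP filter CLSIDs and the GetPin helper are assumptions for illustration; they are not published names from the framework.

```cpp
#include <dshow.h>  // DirectShow COM interfaces (IGraphBuilder, IBaseFilter)

// Hypothetical CLSIDs for the DirectShow RTP filters; the actual GUIDs
// are defined by the DirectShow RTP framework.
extern const CLSID CLSID_RTPSource, CLSID_RTPDemux, CLSID_RTPRPH;

// Assumed helper: returns the first unconnected pin with the given
// direction on a filter.
IPin* GetPin(IBaseFilter* filter, PIN_DIRECTION dir);

// Assemble the Figure 3 receive graph (error handling elided):
// RTP Source -> RTP Demux -> RPH, then let the graph builder attach
// a matching decoder and video renderer automatically.
void BuildReceiveGraph(IGraphBuilder* graph) {
    IBaseFilter *src = 0, *demux = 0, *rph = 0;
    CoCreateInstance(CLSID_RTPSource, 0, CLSCTX_INPROC_SERVER,
                     IID_IBaseFilter, (void**)&src);
    CoCreateInstance(CLSID_RTPDemux, 0, CLSCTX_INPROC_SERVER,
                     IID_IBaseFilter, (void**)&demux);
    CoCreateInstance(CLSID_RTPRPH, 0, CLSCTX_INPROC_SERVER,
                     IID_IBaseFilter, (void**)&rph);
    graph->AddFilter(src, L"RTP Source");
    graph->AddFilter(demux, L"RTP Demux");
    graph->AddFilter(rph, L"RTP RPH");
    // Media type and allocator negotiation occur inside each Connect.
    graph->Connect(GetPin(src, PINDIR_OUTPUT), GetPin(demux, PINDIR_INPUT));
    graph->Connect(GetPin(demux, PINDIR_OUTPUT), GetPin(rph, PINDIR_INPUT));
    graph->Render(GetPin(rph, PINDIR_OUTPUT));  // adds decoder + renderer
}
```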
3. Adaptation in DirectShow RTP

DirectShow allows a programmer to write multimedia applications without having to worry about the details of media processing [10]. DirectShow RTP extends this architecture to support NetMM applications that use the RTP protocol. In order to enable adaptivity for such applications, several changes to DirectShow were necessary. Components were added that measure and monitor the QoS of delivery and presentation of multimedia streams, determine the causes of any degradation that occurs in such streams under adverse conditions, initiate adaptation to these conditions, and determine whether the adaptation yielded the desired improvement in stream delivery and presentation. It was also necessary to develop a system for determining the relative priorities of the data streams of an application with respect to one another and with respect to other applications executing on the same host.

The DirectShow architecture supports a mechanism that allows a filter graph to adapt to adverse conditions affecting the stream it carries. This mechanism is based on the generation and interpretation of quality messages. Quality messages indicate conditions where data is being produced too rapidly (a "flood") or too slowly (a "famine") to be consumed by the filters of the graph. Quality messages travel from filter to filter in the opposite direction of the data flow in a filter graph. It is the responsibility of each filter to attempt to address the conditions indicated in a quality message before passing such messages further up the graph. To indicate the severity of a flood or famine, quality messages include a field that indicates the proportion by which an upgraph filter should change the rate at which it is delivering data.

In addition to the default behavior of delivering quality messages upgraph, it is also possible to configure filters to deliver these messages to other system components. This allows DirectShow to support two modes of adaptation. The default mode, which consists of passing quality messages upgraph, allows an in-stream method of adaptation. Re-routing such messages to an alternative receiver allows an adaptation controller to centralize adaptation control for multiple streams. These two methods of quality message delivery yield two different means of adding support for adaptation to the DirectShow RTP architecture.

3.1 The In-Stream Approach

In this approach (as shown in Figure 4), the filter that generates a quality control message sends the message to the next upgraph filter. Each filter in a graph may change its output or pass the message further up the graph.
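In code, in-stream adaptation revolves around DirectShow's IQualityControl interface. The sketch below shows the pattern a filter might follow on receiving a quality message: adjust its own rate if it can, otherwise forward the message upstream. AdjustRate and the canAdjust flag are hypothetical; the Quality structure and Notify call are the standard DirectShow ones.

```cpp
#include <dshow.h>  // Quality, IQualityControl, IBaseFilter, IPin

// Hypothetical rate-control hook standing in for whatever adjustment
// a particular filter supports (frame dropping, bitrate change, ...).
void AdjustRate(long proportion);

// In-stream handling of a quality message: adapt locally if possible,
// otherwise forward the message to the next upgraph filter via the
// upstream pin's IQualityControl interface.
HRESULT HandleQuality(IBaseFilter* self, Quality q,
                      IPin* upstreamPin, bool canAdjust) {
    // q.Proportion requests a rate change in units of 1/1000; e.g.,
    // 800 asks the filter to produce data at 80% of its current rate.
    if (canAdjust) {
        AdjustRate(q.Proportion);
        return S_OK;
    }
    IQualityControl* qc = 0;
    HRESULT hr = upstreamPin->QueryInterface(IID_IQualityControl,
                                             (void**)&qc);
    if (SUCCEEDED(hr)) {
        hr = qc->Notify(self, q);  // pass the message further upgraph
        qc->Release();
    }
    return hr;
}
```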
Figure 4: In-stream adaptation in DirectShow (Video Capture → Codec → Packetizer → RTP Render, with quality control messages flowing upstream, against the data flow).

This approach has the following consequences:
• Each filter monitors QoS and is able to provide performance indications.
• Each filter may include functions that interpret quality messages and react to them.
• The responsibility of adapting is distributed among components and the interaction is simple.

However, this approach suffers from certain drawbacks:

• Each stream adapts in a vacuum rather than cooperating with other streams.
• An application cannot cooperate with other applications to share available resources.
• It is difficult to implement policies such as "adapt capture settings before codec settings."
In-stream QoS control provides a simple, easy to implement mechanism for indicating and reacting to adverse conditions in a filter graph. However, this method of quality measurement is severely limited by the fact that it occurs in the context of each filter graph, rather than on an application- or system-wide basis.
3.2 Adaptation Controller Approach

To specify an adaptation controller (quality manager in DirectShow terminology) for a filter, an application sets a quality-message sink entry for each filter pin. If the sink entry is set, filters deliver quality messages to the sink rather than sending them upgraph. An adaptation controller may monitor a single filter, all the filters in a filter graph, all the filter graphs in an application, or even filter graphs executing in several different application contexts. Adaptation controllers use quality messages to make adaptivity decisions. Figure 5 illustrates an adaptation controller that controls two streams, where input for adaptation is taken from the two render filters and implementation of adaptation policies is done by modifying the behavior of the source (capture) and codec filters.
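A sketch of how an adaptation controller might register itself as the quality sink for an entire graph, using the standard DirectShow enumeration interfaces (the controller object itself, which must implement IQualityControl, is assumed):

```cpp
#include <dshow.h>

// Redirect all quality messages in a graph to `sink` (the adaptation
// controller, which must itself implement IQualityControl) instead of
// letting them travel upgraph.
void AttachQualitySink(IFilterGraph* graph, IQualityControl* sink) {
    IEnumFilters* filters = 0;
    if (FAILED(graph->EnumFilters(&filters))) return;
    IBaseFilter* f = 0;
    while (filters->Next(1, &f, 0) == S_OK) {
        IEnumPins* pins = 0;
        if (SUCCEEDED(f->EnumPins(&pins))) {
            IPin* p = 0;
            while (pins->Next(1, &p, 0) == S_OK) {
                IQualityControl* qc = 0;
                if (SUCCEEDED(p->QueryInterface(IID_IQualityControl,
                                                (void**)&qc))) {
                    qc->SetSink(sink);  // future messages come to us
                    qc->Release();
                }
                p->Release();
            }
            pins->Release();
        }
        f->Release();
    }
    filters->Release();
}
```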
An architecture that uses an adaptation controller poses several problems. One issue that must be addressed is determining how the adaptation controller should interact with applications. It is also necessary to examine how an adaptation controller can be written in a flexible fashion so that it can control different kinds of components. Finally, adaptation policies for reacting to adverse conditions in the network and the host computer must be determined. These policies must be applicable to both the single filter graph (media stream) case and the multiple filter graph case.

Figure 5: Adaptation control in DirectShow using an adaptation controller. Two send graphs (Video Capture → Codec → SPH → RTP Render and Audio Capture → Codec → SPH → RTP Render) deliver quality control messages to the adaptation controller, which issues control actions or quality messages back to the capture and codec filters.

Applications may be assigned to three different categories, depending on the type of interactions they have with the adaptation controller. These categories include applications that cooperate with the adaptation controller about the adaptation policies to use, applications that only interact with the adaptation controller during initialization, and applications that have no direct interactions with the adaptation controller.

Applications that cooperate with the adaptation controller load the controller as a dynamically linked library (DLL). After creating the filter graph, the application uses an API provided by the adaptation controller to set up the necessary connections that enable the adaptation controller to adapt the filter graph. The application configures exactly which filters can be adapted and what policies are used for adaptation.

For those applications where adding support for interaction with the adaptation controller is impractical, it is possible to add a minimal level of support in the form of initializing the adaptation controller. In this case, the application is only required to hand a pointer to every filter graph that it creates to the adaptation controller. The adaptation controller parses the graph to obtain the various filter interfaces and interacts with them in a manner that is transparent to the application.

Finally, for those applications where adaptation is desired but no changes can be made to the application, it is possible to add an adaptation proxy filter to the filter graphs used by such applications. This technique may be used with applications that take advantage of DirectShow's ability to store filter graphs in file form for later use. A proxy filter is inserted in these stored filter graphs without the application's knowledge. When the filter graph is instantiated, the proxy filter traverses the graph on the adaptation controller's behalf, delivering interfaces and quality messages from the other filters of the graph to the adaptation controller. In order to be of maximum utility to applications that use the DirectShow RTP architecture, our adaptation controllers support all three types of application interaction.

3.3 Manipulating Filters to Cause Adaptation
Two methods of filter manipulation by adaptation controllers are considered. In the first alternative, an adaptation controller does not take actions directly; instead it passes quality messages upstream along the appropriate filter graph. The adaptation controller does not have to be aware of the filters and their individual methods of control. However, we found that many filters (such as the standard DirectShow video capture filter and certain codecs) do not have the capability to react to quality messages. Other filters are not graceful in the way they react to quality messages, such as the MPEG decoder filter, which drops all P and B frames whenever it receives a flood message. The adaptation controller constructed for DirectShow RTP can simply send quality messages to a filter graph in the absence of means to directly manipulate the output of the filters in a graph. However, this method of causing adaptation is only used as a last resort due to the lack of fine-grained control, since the adaptation controller has no way of knowing what form the response to the quality message will take.

Many filters provide interfaces that allow the adaptation controller to directly control their output. In order to take advantage of this capability, an adaptation controller was constructed that is able to utilize these interfaces on certain filters in order to adapt application behavior. Directly controlling the output of each filter allows the adaptation controller to pick and choose which filter it wants to adapt and to what degree it wants it to adapt. This alternative allows the adaptation controller to control
adaptation in a much more graceful and precise fashion than the previous method, which relies on sending a quality message upgraph and hoping that some filter responds in the desired manner.

The adaptation controller engine we constructed provides support for both types of adaptation. This engine centralizes adaptation decisions for all multimedia streams in a process in a single component. This yields more effective adaptability than the default DirectShow behavior, which is to make adaptation decisions for each filter graph stream in an uncoordinated fashion. In the following sections we discuss two mechanisms for adaptability: source-based adaptive control and receiver-driven adaptation via layered multicast. This paper documents how these adaptation mechanisms were implemented in our component-based architecture as well as how these mechanisms were used to achieve both network and host adaptation.

4. Source-Based Adaptive Control

In adaptive source rate control (ASRC) [1] the sender responds to changes in network resources by varying filter parameters such as capture rate and codec compression settings. In our implementation, we extended ASRC to also adapt to changes detected in the availability of local resources. Due to the lack of satisfactory support for adaptation in many of the components provided in DirectShow, it was necessary to construct new components which met our requirements. Furthermore, it was necessary to construct an adaptation controller that could manipulate the interfaces of each of these components so as to implement ASRC.

4.1 Video Capture Filter Control

In order to support adaptation, a video capture filter must allow parameters such as the capture rate and resolution to be changed on the fly. A video capture filter must also generate media samples with sufficiently accurate timestamps that they can be used by the render filter to generate quality messages. Finally, it must respond to quality messages passed to it by downstream filters. As currently implemented, changing the frame rate or resolution of video capture on the video capture filter requires that capture be stopped, the capture parameters changed, and capture restarted. While changing the capture resolution results in a noticeable pause when restarting the filter, the effect of a frame rate change on the video is hardly noticeable when using a high-speed CPU such as a 200 MHz Pentium Pro processor. This idiosyncrasy in the video capture filter caused us to modify our adaptation policies so that they favor changing the frame rate rather than the resolution when adaptation is necessary.

Changing the capture rate has a significant impact on DirectShow sample time stamps. The DirectShow time stamp for a media sample is based on elapsed time values in video frames set by the capture driver:

Sample start time = Graph start time + Elapsed frame time in video frame

Render filters use the sample timestamps to determine whether media samples are arriving on time. When video capture is restarted, the elapsed time values in the video frames also restart at zero, and the resultant DirectShow time stamps appear old to the render filter. It was also observed that, over a period of a few minutes, the video render filter begins indicating that samples are late even when the system is only lightly loaded. This problem occurs because the timestamps generated by the video capture driver gradually begin to lag behind the wall clock time. The video capture filter generates these timestamps from each video frame header, which are assigned based on a clock source in the capture device. This clock source has undesirable resolution and accuracy characteristics, leading to an eventual disparity between it and the clock used for comparison in the video renderer, resulting in the false indication of sample tardiness. To solve these problems, we modified the capture filter to use the DirectShow reference clock (which is based on a more accurate wall clock source) to generate time stamps rather than the values found in the video frame header. This eliminates both the restart and the lag problems and results in the following equation for sample start timestamp assignment:

Sample start time = Graph start time + Elapsed wall clock time

In addition to requiring high accuracy clocks for timestamp generation, it was discovered that the buffer management policy used by the capture filter is critical to quality management. Whenever this filter runs out of buffers, it discards the oldest buffer and places the new video frame at the head of the buffer list. This means that the render filter gets new video frames with recent time stamp values rather than old, stale buffers. This causes the renderer to believe that all samples are being delivered on time, suggesting low CPU load and no need for quality messages or adaptation. One solution is for the capture filter to directly take corrective action when it runs out of buffers rather than relying on quality messages from downgraph to signal when to take such action. Instead, it was decided to drop new frames, allowing the renderer to detect that samples being delivered to it are late. To allow the video capture filter to adapt in the absence of an adaptation controller, we modified the capture filter's quality message notification function so that it would react
to these messages rather than ignoring them. A downstream filter (such as the video render filter) calls this function with a quality message, indicating that a change in the sample delivery rate is necessary. Changes to video capture in response to a quality message are performed in a separate thread to avoid performing significant processing in the context of the calling thread. This was necessary to avoid false detection of further tardiness by the renderer, which would occur if the renderer's thread were forced to perform excessive processing on behalf of the source filter.
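A minimal sketch of this decoupling is shown below. The queue and worker thread are illustrative assumptions about the filter's internals; the point is only that the renderer's thread does nothing more than enqueue the message.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>

struct QualityMsg { long proportion; };  // simplified stand-in

// The renderer's thread only enqueues the message; a worker thread
// applies the (potentially slow) capture reconfiguration. Shutdown
// handling is omitted for brevity.
class CaptureQualityWorker {
public:
    CaptureQualityWorker() : worker_(&CaptureQualityWorker::Run, this) {}

    // Called from the quality notification function; must stay cheap.
    void OnQualityMessage(QualityMsg m) {
        { std::lock_guard<std::mutex> lk(mu_); pending_.push_back(m); }
        cv_.notify_one();
    }

private:
    void Run() {
        for (;;) {
            std::unique_lock<std::mutex> lk(mu_);
            cv_.wait(lk, [this] { return !pending_.empty(); });
            QualityMsg m = pending_.front();
            pending_.pop_front();
            lk.unlock();
            ApplyFrameRateChange(m.proportion);  // stop/reconfigure/restart
        }
    }
    void ApplyFrameRateChange(long proportion);  // hypothetical

    std::deque<QualityMsg> pending_;
    std::mutex mu_;
    std::condition_variable cv_;
    std::thread worker_;  // declared last: other members init first
};
```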
4.2 Codec Support for Adaptation

The codec in the filter graph is a major consumer of processor resources, making it a primary candidate for adaptation. The codecs that we use for video provide an interface that controls output bitrate and video quality. The adaptation controller uses this interface to affect CPU and network bandwidth usage. We found these interfaces to be sufficient for our purposes in constructing an adaptation controller that directly interacts with individual filters.
4.3 Adaptive RTP Source and Render Filters

Allowing adaptation in response to network and host conditions required changes to the RTP render filter, which is responsible for transmitting RTP streams onto a network and monitoring RTCP reports. This filter must generate messages that reflect the feedback it receives via RTCP receiver reports from the hosts receiving the rendered stream. One way to accomplish this task is to allow the RTP render filter to interpret RTCP receiver reports. This interpretation would consist of converting the loss percentages indicated in RTCP reports into DirectShow quality message proportion values. A drawback to this approach is that it restricts the ability to change the interpretation of RTCP reports and implement various adaptation algorithms. Therefore, in order to add support for source-based network adaptation (adaptive source rate control [1]), it was also necessary for RTCP reports to be sent to the network adaptation controller. An adaptation controller can retrieve raw RTCP messages using a custom interface supported by the RTP render filter.

To allow host adaptation at the sender, the RTP render filter must generate quality messages that reflect the state of local resources such as the CPU. This task was easily accomplished by deriving the RTP render filter from the DirectShow video render base class. The base class supports quality message generation for audio and video streams.
4.4 Network Adaptation Controller

Network-based sender adaptation is most beneficial for point-to-point conferences where the available network bandwidth is not known when the call is placed. Multicast conferences can also use sender adaptation, but this penalizes hosts on high bandwidth networks. Therefore the best use of ASRC, when using multicast, is in intranet conferences where the available network capabilities are fairly homogeneous. In such environments, reducing sender bandwidth in response to delivery problems represents a lighter-weight form of adaptation that is more likely to produce the desired results than Receiver-driven Layered Multicast (RLM).

The algorithm used by our network adaptation controller is similar to that described by Busse et al. [1]. The adaptation controller allows the application to specify a loss threshold and minimum and maximum bandwidths. We extended the ASRC concept by adding the capability to prioritize between streams in an application through adaptation. Typically, this involves adapting the bandwidth of a low priority stream in the hope of reducing the loss seen by a high priority stream.

In our implementation of source-based network adaptation, three steps are followed: RTCP analysis, network state estimation, and bandwidth adjustment. Whenever a new RTCP report is received at a sender, the report is used to update the smoothed packet loss experienced by the receiver that originally sent it. We used α = 0.3 (the weight associated with the latest loss report from RTCP) in our low-pass filter when computing the smoothed packet loss. A record for each receiver is maintained for the duration of its participation in a presentation. RTCP Bye messages are used to remove receivers from the list of active receivers. Since loss of this message would result in inaccurate ratios being used for adjustment decisions, we plan to add a timestamp to each receiver record and evict receivers when RTCP reports have not arrived after a reasonable duration.

Based on the smoothed packet loss calculated for each receiver, a state of congested, loaded, or unloaded is assigned. A count of receivers in each state is maintained, and as each receiver changes from one state to another, the count of receivers in the previous state is decremented and the count of receivers in the new state is incremented. A receiver with a smoothed packet loss greater than 4% is deemed congested, less than 2% is considered unloaded, and in between is categorized as loaded. Using the count of receivers in each state, an adjustment decision is made to increase, decrease, or hold the current bitrate target. If greater than 10% of receivers are congested, we decrease the current network bitrate target
being used. If this condition does not hold but greater than 10% of receivers are loaded, we hold the bitrate target constant. Finally, if neither of these conditions holds true, we increase the network bitrate target. Senders adjust their sending bitrate using an additive increase, multiplicative decrease algorithm. Adjusting the bitrate is a two-step task, as we discovered that the codecs occasionally undershoot the target bitrate when high motion in the frame would otherwise cause the codec to greatly exceed it. In this case we increase the capture frame rate to get the codec to generate the bitrate we have targeted. We plan to enhance the network source adaptation controller to allow the user to trade off bitrate versus frame rate adjustments. A sketch of the controller's decision loop appears below.
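The sketch summarizes the three steps. The 4%/2% state thresholds, the 10% decision thresholds, and α = 0.3 come from the text above; the multiplicative decrease factor and additive increment are illustrative assumptions.

```cpp
#include <algorithm>
#include <map>

// The three ASRC steps: RTCP analysis (low-pass loss smoothing),
// network state estimation, and AIMD bitrate adjustment.
class AsrcController {
public:
    // Step 1: smooth each receiver's reported loss with alpha = 0.3.
    void OnRtcpReport(unsigned ssrc, double lossFraction) {
        double& s = smoothedLoss_[ssrc];
        s = kAlpha * lossFraction + (1.0 - kAlpha) * s;
    }

    // Steps 2 and 3: classify receivers and adjust the bitrate target.
    double AdjustBitrate(double current, double minBw, double maxBw) const {
        int congested = 0, loaded = 0, total = 0;
        for (const auto& kv : smoothedLoss_) {
            ++total;
            if (kv.second > 0.04)       ++congested;  // above 4% loss
            else if (kv.second >= 0.02) ++loaded;     // between 2% and 4%
        }
        if (total == 0) return current;
        if (congested > 0.10 * total)
            return std::max(minBw, current * 0.75);   // multiplicative decrease
        if (loaded > 0.10 * total)
            return current;                           // hold
        return std::min(maxBw, current + 10000.0);    // additive increase (bps)
    }

private:
    static constexpr double kAlpha = 0.3;        // weight of newest report
    std::map<unsigned, double> smoothedLoss_;    // per-receiver state
};
```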
4.5 Host Adaptation Controller

The host adaptation controller responds to CPU load changes by initiating adaptation using interfaces provided by the filters of the graphs it controls. This section discusses how the host adaptation controller provides adaptation for NetMM applications that send data. In Figure 2, we presented a typical filter graph for transmitting video streams, consisting of a video capture filter, a video codec, an RTP SPH, and the RTP render filter. In order to add adaptation to such a filter graph, the host adaptation controller configures the RTP render filter to deliver quality messages to the adaptation controller rather than upgraph. The proportion field of the quality messages delivered indicates the degree by which the data rate should be changed. These quality messages are translated into media parameters that are used in manipulating the video capture and codec filters. These parameters include codec bitrate, data capture rate, and data capture resolution. Changing the capture rate and resolution has significant impacts on CPU and network bandwidth consumption, while manipulating the codec bitrate offers a means of changing network bandwidth with little effect on CPU consumption. Combinations of these parameters are provided to account for user preferences during adaptation. For example, one policy that we have implemented follows the change path described below (a sketch appears after the list):

• If the codec bitrate is above its minimum value, reduce it towards the minimum.
• If the codec bitrate is at its minimum value and the frame rate is above its minimum value, then reduce the frame rate.
• If the frame rate has already been reduced to its minimum value, then reduce the frame resolution.
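A sketch of one degradation step under this policy is shown below. The StreamControls structure and the particular step sizes are illustrative assumptions; in our implementation the corresponding changes are made through the codec and capture filter control interfaces.

```cpp
#include <algorithm>

// Hypothetical snapshot of the controls exposed by the codec and
// capture filters for one stream.
struct StreamControls {
    double bitrate,   minBitrate;    // codec output bitrate (bps)
    double frameRate, minFrameRate;  // capture frame rate (fps)
    int    resolutionIndex;          // index into supported picture sizes
};

// One degradation step along the change path above.
void DegradeOneStep(StreamControls& c) {
    if (c.bitrate > c.minBitrate) {
        // Cheapest first: lowering the codec bitrate barely affects CPU.
        c.bitrate = std::max(c.minBitrate, c.bitrate * 0.8);
    } else if (c.frameRate > c.minFrameRate) {
        // Next, reduce the frame rate (cuts both CPU and bandwidth).
        c.frameRate = std::max(c.minFrameRate, c.frameRate - 5.0);
    } else if (c.resolutionIndex > 0) {
        // Last resort: a resolution change forces a media type change.
        --c.resolutionIndex;
    }
}
```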
Changing the frame rate and bitrate are easy tasks while the graph is running. However, changing the resolution is complicated because it changes the DirectShow media type used in negotiating filter connections. Media types are negotiated when filters are being connected to form a filter graph and are usually fixed. While DirectShow does allow media types to be changed while the filter graph is in the run state, many filters are not able to gracefully accommodate such changes. For such filters it is necessary to use other means to dynamically vary the resolution (and thus the media type) used.

Figure 6: The use of the H26x splitter at the receiver to support dynamic resolution changes at the sender (RTP Source → RTP Demux → Depacketizer → H26x Splitter → two H26x Decoder/Video Render chains).

The H.263 and H.261 codecs used belong to this class of filters that do not allow dynamic changes in picture resolution. In addition, the standard video render filters do not change the size of the display automatically in response to media type changes. To handle changes in resolution that are initiated by network video sources, it was necessary to write a video splitter that connects to a pair of decoders and video renderers. The splitter routes media samples according to their resolution so that changes in the resolution of the delivered data streams are not seen by the codecs and renderers. When the media type changes, the application must be notified so that it can display the output of the appropriate video render filter. Thus, for certain forms of adaptation (particularly those that result in changes to the user interface), it is necessary that an application be adaptation aware.
5. Receiver-Driven Adaptation Using Layered Multicast

The receiver controls that we have implemented rely on Receiver-driven Layered Multicast (RLM) [9] to achieve adaptation. This type of adaptation mechanism requires that each encoding layer be sent on a different multicast IP address (or unicast address and port pair). In RLM, layers are either added or dropped according to network conditions such as available bandwidth and detected loss. Our implementation of receiver adaptation also used RLM to adapt to host CPU load, since the number of layers being processed has a direct impact on system resources such as processor load and bus bandwidth.
To support RLM it was necessary to create new components and modify existing DirectShow components. A new variety of our adaptation controller, with support for the use of RLM in performing network and host adaptation, was constructed using C++ class hierarchy mechanisms. In this section, we present this version of the adaptation controller and the other changes to DirectShow needed to add support for receiver adaptation using RLM in a component-based architecture.

The DirectShow RTP Send and Receive Payload Handlers are capable of receiving encoded data and processing it for playback or network transmission for a single stream only. To support layered encodings, we needed to be able to split a single stream containing all layers of video data into separate streams, each containing a separate layer that is transmitted via a separate network connection. It was also necessary to be able to merge multiple layers into a single video stream for delivery to the codec. In order to provide this functionality we added a new type of filter to DirectShow RTP called a layered payload handler. Layered payload handlers (LPH) are specifically written for each payload type that supports hierarchical encoding. LPH filters come in two forms: Send LPH (SLPH) filters and Receive LPH (RLPH) filters.

SLPH filters implement the policies used for assigning each video frame to the appropriate stream when using layered multicast. These filters accept input from the codec, which produces a stream consisting of frames from all the layers of the stream. The streams produced by the SLPH filter consist of video frames that are then packetized by RTP SPH filters. No changes to the RTP SPH filters are necessary to do this, as each layer contains only legal frames of the particular payload type for the stream in question. These packets are then transmitted via the RTP render filter (also unchanged) on separate multicast streams.

Reception of each stream in a layered multicast is accomplished using the RTP source filter, which requires no modification to support this functionality. Once received, each stream is delivered to an RTP RPH filter via an RTP Demux filter. Each RPH filter reassembles the packets for a single stream into frames of the appropriate media type. These frames are then delivered to an RLPH filter, which is responsible for reordering and combining frames received on each layer into a unified stream of the appropriate media type, which is then delivered to the codec for decompression and rendering.
5.1 Poorman's Codec

When we began our work, we did not have access to any true hierarchical codecs such as are necessary for H.263+ or Indeo 5.0. As a result, we decided to implement a simple SLPH (poorman's encoder) and RLPH (poorman's decoder) filter set that allows temporal video scaling using a standard codec. The poorman's encoder demultiplexes video frames across its output pins, while the poorman's decoder multiplexes these frames into a single stream again. The poorman's encoder uses a weighted round-robin interleaving algorithm in assigning frames to pins, as sketched below. This simple algorithm is made possible by avoiding any inter-frame dependencies, which we accomplished by forcing the video codec to produce only key frames. While clearly not an optimal solution for layered video, we found this to be a very effective tool for doing research on hierarchical adaptation and for demonstrating the usefulness of hierarchical encodings in the absence of a true hierarchical codec.
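The weights in the sketch are illustrative assumptions; for example, weights of {1, 1, 2, 4} would let a receiver roughly double its frame rate with each additional layer it joins.

```cpp
#include <vector>

// Weighted round-robin assignment of key frames to output pins
// (layers). Weights must contain at least one positive entry.
class PoormansEncoder {
public:
    explicit PoormansEncoder(std::vector<int> weights)
        : weights_(std::move(weights)) {}

    // Returns the output pin (layer) index for the next video frame.
    int NextPin() {
        for (;;) {
            if (sent_ < weights_[pin_]) { ++sent_; return pin_; }
            sent_ = 0;
            pin_ = (pin_ + 1) % static_cast<int>(weights_.size());
        }
    }

private:
    std::vector<int> weights_;
    int pin_ = 0;   // current layer in the round-robin cycle
    int sent_ = 0;  // frames already assigned to this layer this turn
};
```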
Figure 7: Sending video with the poorman's codec (Video Capture → Poorman's Encoder → four Video Encoder → SPH → RTP Render chains).

Figure 8: Receiving video with the poorman's codec (four RTP Source → RTP Demux → RPH → Video Decoder chains feeding a Poorman's Decoder).

Figures 7 and 8 show the filter graphs used in implementing RLM using the poorman's codec. In demultiplexing video frames in a weighted round-robin fashion across multicast streams, it is important to retain time stamping information. This information is necessary to ensure that recombination of these streams by a receiver results in a single video stream with frames arranged in
correct temporal sequence. In our implementation, the RTP SPH filters ensure correct relative RTP timestamps across layers because they derive the RTP timestamp assigned to each packet from the DirectShow time stamp associated with the frame being packetized. On the receive side, the RPH filters leave the RTP timestamps in the DirectShow samples, allowing sorting and synchronization of frames to be carried out by the poorman's decoder. Once frames have been sorted and synchronized by the poorman's decoder, it is necessary to assign a DirectShow timestamp to each frame. This information is necessary for the video render filter to be able to correctly generate quality messages in response to host or network conditions that require adaptation. We determine the timestamps to be assigned using the following formulas:

Time_offset = FIRST_RTP_TIME - START_STREAM_TIME
Sample_start_time = CURRENT_RTP_TIME + Time_offset
Interval_time = PREVIOUS_RTP_TIME - CURRENT_RTP_TIME

Determining FIRST_RTP_TIME is accomplished by sorting the first few packets to arrive according to their RTP timestamps and taking the lowest resulting value. The number of packets that must be received before this sorting occurs (and thus determination of the FIRST_RTP_TIME value) is programmatically controllable. This mechanism is necessary to eliminate the effects of varying packet jitter and reordering, which occur both in the network and in the host operating systems of the sender and the receiver.
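A sketch of this mapping is shown below, with the warm-up window used to estimate FIRST_RTP_TIME. The window size is an assumption, the offset is computed with the sign convention that maps the first packet to the graph start time, and conversion between the RTP media clock and DirectShow's 100 ns reference time units is omitted.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Maps RTP timestamps back to DirectShow start times at the poorman's
// decoder. FIRST_RTP_TIME is estimated by sorting the first `window`
// packets; the window size is programmatically controllable.
struct TimestampMapper {
    int64_t streamStart = 0;      // START_STREAM_TIME (graph start)
    int64_t firstRtp = -1;        // FIRST_RTP_TIME, once determined
    std::vector<int64_t> warmup;  // RTP times seen during warm-up
    size_t window = 8;            // assumption: 8-packet warm-up

    // Returns the DirectShow start time for a frame, or -1 while the
    // warm-up window is still filling.
    int64_t Map(int64_t currentRtpTime) {
        if (firstRtp < 0) {
            warmup.push_back(currentRtpTime);
            if (warmup.size() < window) return -1;
            firstRtp = *std::min_element(warmup.begin(), warmup.end());
        }
        // Offset chosen so the first packet maps to the graph start.
        int64_t offset = streamStart - firstRtp;
        return currentRtpTime + offset;  // Sample_start_time
    }
};
```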
5.2 H263+/Indeo Layered Payload Handlers

The layered payload handlers were initially designed with two hierarchical video encodings in mind, Indeo and H.263+. These two video encodings have different layering structures, resulting in somewhat different functionality being implemented in each LPH. Indeo is banded: multiple bands of video are present in each encoded video picture. Each band can be transmitted on a separate stream, or all bands can be sent via a single stream. To separate the bands in a stream, a tool called a
band extractor is used. In our implementation the SLPH plays the role of the band extractor, receiving the composite frame from the encoder and splitting it into banded data which are then passed to RTP SPH filters for packetization and transmission via RTP streams. The bands are recombined into a single video frame by the RLPH (after an RTP RPH filter reassembles each band) and provided to the decoder. Indeo requires that a base layer be present, which may be received alone and must be received if any other layer is to be processed. Other layers depend upon each other in a serial fashion; that is, layer 1 depends on layer 0 (the base band), layer 2 depends on layer 1, and so on.

The layering structure in H.263+ is more complicated than Indeo's, with three different basic types of layering, or scalability, introduced [18]. These three types of scalability are temporal, spatial, and SNR (signal to noise ratio). Temporal scalability, which includes B (bi-directionally predicted) frames, provides a greater frame rate. Spatial scalability is a change in picture size, such as from QCIF (176x144) to CIF (352x288). SNR scalability is the encapsulation of data to correct coding errors introduced during the compression process, improving the picture as compared to the original uncompressed version. These three types of scalability can be combined in various ways to provide many different layers of video. However, each video frame is separate and belongs to one layer only. Two or more of these frames may be combined by the decoder to create a single picture for rendering. Unlike Indeo, layering in H.263+ is not arranged in an ordered fashion. While there is a base layer that is required, other enhancement layers may depend upon any other layer below them in the hierarchy of layers. The layered payload handlers are configured to be aware of the layering dependencies and use this information when directed to drop or add layers.

Figure 9: Sending video with the H263+ codec (Video Capture → H263+ Coder → H263+/Indeo SLPH → four SPH → RTP Render chains).

Figure 10: Receiving video with the H263+ merger (four RTP Source → RTP Demux → RPH chains → H263+/Indeo RLPH → H263+ Decoder → Video Render).

In addition to the complex layering relationships supported by H.263+, a specific ordering of frames is required when layers are recombined. This ordering, which is described in the H.263+ ITU recommendation, requires that B frames be provided to the decoder after both of the frames from which they are predicted. For
example, in a sequence of three frames, where frame #2 is bi-directionally predicted from frames #1 and #3, the ordering would be 1, 3, 2. This is also the order in which the encoder produces the frames, which results in the RLPH receiving the frames in a different order than the original sampling, or capture, order. Because the specification for RTP requires that RTP timestamps be based on the sampling instant, the payload handlers must reconstruct a sampling instant for the frames when generating the RTP timestamps used to transmit the data. It also means that when the frames are reordered on the receiving side, the RTP timestamps cannot be used to order them for delivery to the decoder. Reconstructing the timestamps is not a difficult problem, although it does result in an estimation of the sampling instant. However, ordering the frames for the decoder without the assistance of the RTP timestamp is a much harder problem to solve, since the number of B frames between any two reference frames can be variable. We solved this problem by sorting the B frames separately from the other frames, and using the reference frames as delimiters for the prediction of the B frames to determine the original encoder ordering. Because the expected number of B frames within any "prediction window" is variable, we added a configurable buffering timeout period. During this period, the Receive Layered Payload Handler will wait for incoming B frames before the B frames for that window are provided to the decoder. We can also dynamically adapt this timeout period based upon the number of B frames received after the timeout period has expired.

The adaptation manager uses the LPH filters to dynamically add and drop layers on the sending or receiving sides. By implementing most of the complexity of RLM in specialized SLPH and RLPH filters and reusing the RTP SPH and RTP RPH filters, it was possible to easily add support for RLM to our adaptation engine. Building such functionality in a monolithic architecture, instead of in the context of a component-based architecture such as DirectShow, would have added significant complexity to this task.
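The reordering logic can be sketched as follows; the configurable buffering timeout described above is omitted for brevity, and the Frame structure is a simplified stand-in for a DirectShow media sample.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct Frame { int64_t rtpTime; bool isB; };  // simplified media sample

// Reconstructs encoder (decoder-delivery) order: reference frames
// delimit a prediction window; B frames arriving within the window
// are held, sorted by timestamp, and released after the closing
// reference frame (e.g., 1, 3, 2 for a B frame predicted from 1 and 3).
class BFrameReorderer {
public:
    // Feed frames as they arrive; returns frames in decoder order.
    std::vector<Frame> Push(const Frame& f) {
        std::vector<Frame> out;
        if (f.isB) {
            heldB_.push_back(f);  // buffer until the window closes
            return out;
        }
        out.push_back(f);         // the closing reference frame first...
        std::sort(heldB_.begin(), heldB_.end(),
                  [](const Frame& a, const Frame& b) {
                      return a.rtpTime < b.rtpTime;
                  });
        out.insert(out.end(), heldB_.begin(), heldB_.end());  // ...then its Bs
        heldB_.clear();
        return out;
    }

private:
    std::vector<Frame> heldB_;  // B frames of the current window
};
```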
5.3 RLM Implementation in the Network Adaptation Controller

Our network adaptation controller component implements support for RLM as an adaptation mechanism. In order to facilitate experimentation, the RLM-aware adaptation controller offers a variety of parameters concerning adaptation policies that may be set by the application. Settable parameters include the loss threshold (the amount of loss that is tolerated before leaving a layer) and the Join/Leave message TTL (used to constrain Join/Leave messages to a subset of the network). In addition to these parameters, all timers associated with RLM are also available for application manipulation.
The RLM algorithm uses packet loss as input in deciding when to add or remove layers. Since this information is currently kept by the RTP RPH filter, we needed a way to propagate it to the network adaptation controller. In order to do this we modified the RTP RPH filter by adding a callback interface used to deliver packet loss information to the adaptation controller. The network adaptation controller receives the packet loss information from each RTP RPH and aggregates it across all layers. Our implementation of the network adaptation controller can function in either of two modes of operation. In one mode it takes direct action to compensate for adverse network conditions. Such action might include instructing the RTP Source filter to join or leave a multicast layer and informing the RLPH of the resulting action so that it knows whether or not to expect data on a particular layer. In the other mode, control of adaptation is handed over to the application. The application then implements adaptation decisions (e.g., by manipulating the RTP source filter and the RLPH) based on information provided by the network adaptation controller.
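A sketch of the first (direct action) mode is shown below. The smoothing weights, thresholds, and join delay are illustrative assumptions; JoinLayer and LeaveLayer stand for calls into the RTP Source filter's multicast join/leave interface and the corresponding RLPH notifications.

```cpp
// Direct-action mode of the RLM-aware network adaptation controller.
class RlmController {
public:
    // Invoked by the RTP RPH loss callback for each layer; loss is
    // aggregated across all layers before a decision is made.
    void OnPacketLoss(int layer, double lossFraction) {
        aggregateLoss_ = 0.9 * aggregateLoss_ + 0.1 * lossFraction;
        if (aggregateLoss_ > lossThreshold_ && activeLayers_ > 1) {
            LeaveLayer(--activeLayers_);     // shed the top layer
            stablePeriods_ = 0;
        } else if (aggregateLoss_ < addThreshold_ &&
                   ++stablePeriods_ >= joinDelay_) {
            stablePeriods_ = 0;
            JoinLayer(activeLayers_++);      // probe the next layer
        }
    }

private:
    void JoinLayer(int layer);   // multicast join via RTP Source filter
    void LeaveLayer(int layer);  // multicast leave via RTP Source filter

    double aggregateLoss_ = 0.0;
    double lossThreshold_ = 0.05;  // assumption: drop above 5% loss
    double addThreshold_  = 0.01;  // assumption: probe below 1% loss
    int    activeLayers_  = 1;     // the base layer is always joined
    int    stablePeriods_ = 0;     // consecutive low-loss callbacks
    int    joinDelay_     = 10;    // assumption: callbacks before a join
};
```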
5.4 Host-Based Receiver Adaptation Using the Host Adaptation Controller

The host adaptation controller allows a receiver of video streams to choose a set of layers that best suits host resource availability. Initialization of the host adaptation controller is performed either by the application (in the case of an adaptation-aware application) or by using a proxy adaptation filter (as described in Section 3.2). The host adaptation controller sets itself as the sink of DirectShow quality messages emanating from the video render filter. When these messages indicate a flood, the controller drops layers of video using the multicast group join/leave interface provided by the RTP source filter; when it receives famine messages, it adds layers. These actions are performed on a separate thread to avoid interfering with data flow.
6. Experience with Applications and Testing
In order to test our software infrastructure, we constructed a number of adaptive applications. These applications include an ActiveX control that allows display of RTP streams in HTML pages and a generic test application called xNetMM, which provides functionality equivalent to that of VIC and VAT. The ActiveX control's attributes can be set using various scripting languages in the HTML page. Using DirectShow RTP and software encoders/decoders, we were able to receive and decode up to three simultaneous H.263 CIF video streams, each at 20 FPS, and one audio stream, for an aggregate bandwidth of over 3 Mbit/s, on a machine with a 266 MHz Pentium II processor. Using a special DirectDraw video renderer
which allows the filters to write directly to video hardware memory, we are able to provide full-screen, high-quality Indeo video and audio (equivalent to MPEG-2 SDTV quality at 5 Mbit/s) at 50% CPU utilization.
We used these applications to test various scenarios using the test platform shown in Figure 11. The presence of a 33.6 Kbps modem allows us to test ASRC and RLM functionality with a wide range of available host bandwidths. In addition to using a modem to create a wide range of client bandwidths, we have written components for our architecture which drop RTP packets to simulate network loss. The behavior of these components is fully parameterized, giving us detailed control over the distribution of packet losses; a sketch of one possible parameterization appears below. Finally, the hosts in our test environment span a wide range of processor capability, from a machine with a 90 MHz Pentium processor to a machine with a 300 MHz Pentium II processor.
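The sketch assumes a two-state Gilbert model, which produces the bursty losses typical of congested networks; the model choice and names are illustrative rather than a description of our actual components.

```cpp
// Sketch of a parameterized loss-simulation component, assuming a
// two-state Gilbert model for burst losses; names are illustrative.
#include <random>

class GilbertLossModel {
public:
    GilbertLossModel(double pGoodToBad, double pBadToGood)
        : pGB_(pGoodToBad), pBG_(pBadToGood), rng_(std::random_device{}()) {}

    // Returns true if the next RTP packet should be dropped. The model
    // stays in the "bad" (lossy) state for geometrically distributed
    // runs, producing correlated rather than independent losses.
    bool ShouldDrop() {
        std::bernoulli_distribution flip(bad_ ? pBG_ : pGB_);
        if (flip(rng_)) bad_ = !bad_;
        return bad_;
    }

private:
    double pGB_, pBG_;
    bool bad_ = false;
    std::mt19937 rng_;
};
```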
To demonstrate host adaptation in the presence of multiple streaming applications, we created a video conferencing application using multiple instances of the ActiveX control in a Web page. The video streams in the conference use layered video, allowing receivers to adapt. Participants in the conference view and hear one another and watch video clips in a “movie window”. The movie plays at 20 FPS with 44.1 kHz Indeo audio. As the number of participants increases, the performance of the various audio and video streams suffers: the video frames appear very jerky and the audio crackles. When adaptation is turned on for the talking-head receive streams and the local outgoing stream, the receive streams drop down to the lowermost video layer (2 FPS) and the video sender reduces to 5 FPS. After the adaptation, the quality of the movie and its associated audio returns to acceptable levels. This scenario involves significant user interaction; in the next section, we address how we can minimize such user input.
Figure 11: Test platform. A 33.6 Kbps modem link and a router connect 10/100 Mbps Ethernet segments; the hosts include 90 MHz Pentium, 166 MHz Pentium, two 233 MHz Pentium II, 266 MHz Pentium II, and 300 MHz Pentium II machines.
7. Future Work
We plan to extend our adaptation infrastructure to include support for adaptation across multiple resources and applications. We also intend to modify our design to allow user-driven policies and priorities. Extending adaptation to be a system-wide activity, rather than a per-application response to adverse conditions, will further enhance the user experience beyond current implementations. By taking user preferences into account in such system-wide adaptation, information that is otherwise unavailable to an adaptation controller may be used. We believe that this combination of system-wide adaptation and user preferences will allow us to arrive at an optimal level of adaptation.
7.1 Adapting Across Multiple Resources
The metrics used for monitoring performance, the algorithm used to adapt, and the mechanisms that control adaptation behavior are all resource specific. However, we believe that the interface used to interact with an adaptation controller for a particular resource can be generalized. Given a standard interface, applications may instantiate and interact with specific adaptation controllers without requiring deep knowledge of adaptation.
Figure 12: Adapting across multiple resources. A root adaptation controller manages subordinate host and network adaptation controllers.
To allow an application to adapt across multiple resources, we plan to extend our architecture to use a hierarchical organization of adaptation controllers [8]. In this type of arrangement, an application interacts with a single root adaptation controller, which in turn manages other adaptation controllers on the application's behalf. For example, to adapt to both host and network resources, an application may instantiate a composite adaptation controller that makes use of a host adaptation controller and a network adaptation controller. When an adaptation controller is a leaf in the tree formed by such a hierarchy, it does not take control actions directly. Instead, it advises its parent of the need for adaptation when conditions necessitate it. The root of the tree then takes actions considering the current state of all of its children. We believe this hierarchical approach will provide a means of measuring the current quality of presentations and manipulating a wide variety of resources available to applications, so as to significantly improve the quality of the user experience.
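The following sketch illustrates the generalized controller interface and composite arrangement we envision; all names are illustrative, as this interface is planned rather than implemented.

```cpp
// Hypothetical sketch of a generalized adaptation controller interface
// and a composite (root) controller; names are assumptions, not an
// existing API.
#include <algorithm>
#include <memory>
#include <vector>

struct AdaptationAdvice {
    int direction;  // < 0: reduce resource usage; > 0: may increase
};

struct IAdaptationController {
    virtual ~IAdaptationController() = default;
    // Leaf controllers report advice rather than acting directly.
    virtual AdaptationAdvice Advise() = 0;
    // Only the root applies control actions.
    virtual void Apply(const AdaptationAdvice& a) = 0;
};

// The root consults every child and acts on the combined state.
class CompositeAdaptationController : public IAdaptationController {
public:
    void AddChild(std::unique_ptr<IAdaptationController> c) {
        children_.push_back(std::move(c));
    }
    AdaptationAdvice Advise() override {
        // Most conservative child wins: any child asking to back off
        // forces the composite to back off.
        AdaptationAdvice combined{1};
        for (auto& c : children_)
            combined.direction = std::min(combined.direction,
                                          c->Advise().direction);
        return combined;
    }
    void Apply(const AdaptationAdvice& a) override {
        for (auto& c : children_) c->Apply(a);
    }
private:
    std::vector<std::unique_ptr<IAdaptationController>> children_;
};
```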
7.2 User-Specified Adaptation Policies
One important area of further work is to give the user the ability to specify preferences, such as the relative priority of streams and applications. The user should also be able to specify policies for adaptation [8], perhaps with hints from applications and an adaptation controller. Allowing user input regarding which streams are most important is critical to arriving at the best possible user experience, as such information (does the user prefer that higher priority be given to the sports channel or to the financial channel?) is often unavailable by any other means. User preferences about exactly how to adapt are also very important, as one user may prefer a higher frame rate, while another may prefer increased resolution. Finally, hints from the adaptation controller as to which forms of adaptation are likely to succeed in implementing user preferences would be useful in helping the user make adaptation decisions.
In order to address these issues, we plan to create two entities: a policy tool and a policy controller. The policy tool provides a graphical user interface that allows a user to choose applications and to specify policies. These policies may apply to individual applications or have a system-wide scope. We envision a policy tool that allows policies to be specified prior to application execution and to be modified at runtime. The policy tool will also include a default set of policies to use in the absence of user input. Once a set of policies has been specified (and whenever the policies are dynamically changed), the policy tool will load them into the system policy controller. The policy controller is responsible for interacting with each adaptation controller and each application so as to implement the policies specified by the policy tool. The policy controller is also responsible for monitoring the state of each adaptation controller so as to give feedback to the policy tool, which may then interact with the user to update policy specifications.
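As a rough sketch of the kind of policy record the policy tool might load into the policy controller (all names are assumptions about this planned design, not an existing interface):

```cpp
// Illustrative sketch of a user-specified adaptation policy; all names
// are assumptions about a planned design.
#include <string>
#include <vector>

enum class AdaptPreference { PreferFrameRate, PreferResolution };

struct StreamPolicy {
    std::string streamName;       // e.g., "sports channel"
    int priority;                 // higher value = adapted last
    AdaptPreference preference;   // how to degrade when adaptation is needed
};

struct AdaptationPolicy {
    bool systemWide = false;            // per-application or system-wide scope
    std::vector<StreamPolicy> streams;  // relative priorities across streams
};

// The policy tool would load such records into the policy controller at
// startup and whenever the user edits policies at runtime, e.g.:
//   policyController.Load(policy);
```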
8. Conclusion
Using DirectShow as a basis, we have developed a framework for RTP streaming called DirectShow RTP. By taking advantage of the flexibility of DirectShow and the extensibility inherent in the design of the DirectShow RTP framework, we have implemented several methods of network adaptation and explored new directions in host adaptation. Extending streaming functionality to automatically adapt to both network and host conditions makes it possible to maximize the quality of multimedia
presentations while still using standards-based codecs and protocols. The component-based nature of the DirectShow RTP framework simplifies future work in the area of adaptation, since such a framework increases both the reuse and the maintainability of previously constructed components. The addition of adaptation capabilities to the DirectShow RTP framework was accomplished using the quality-monitoring mechanisms already present in the system. This allowed adaptation capabilities to be added to NetMM applications using this framework with little or no change to those applications. The addition of such capabilities to a component-based framework such as DirectShow RTP has resulted in a dramatic and immediate improvement in NetMM applications using this framework. This demonstrates the value both of adaptation capabilities and of frameworks that allow these capabilities to be easily and rapidly exploited.
Acknowledgments
Thanks to Don Ryan and Mike Clark at Microsoft Corporation, and to Don Newell, Dale Scotto, and numerous other members of Intel Architecture Labs for their insights and contributions to this project. DirectShow, Windows NT, and ActiveX are registered trademarks of Microsoft Corporation. Pentium and Pentium II are registered trademarks of Intel Corporation.
References
[1] Busse, I., Deffner, B., and Schulzrinne, H., “Dynamic QoS Control of Multimedia Applications Based on RTP”, Computer Communications, Jan. 1996.
[2] Campbell, A. T., Coulson, G., and Hutchison, D., “Supporting Adaptive Flows in a Quality of Service Architecture”, Multimedia Systems Journal, Special Issue on QoS Architecture, 1997.
[3] Clark, D., and Tennenhouse, D., “Architectural Considerations for a New Generation of Protocols”, Proceedings of ACM SIGCOMM ’90, Sept. 1990, pp. 201-208.
[4] Day, J. D., and Zimmermann, H., “The OSI Reference Model”, Proceedings of the IEEE, vol. 71, Dec. 1983, pp. 1334-1340.
[5] Du, J., Putzolu, D., Cline, L., Newell, D., Clark, M., and Ryan, D., “An Extensible Framework for RTP Based Multimedia Applications”, NOSSDAV Proceedings, May 1997, pp. 53-60.
[6] Hang, L., and El Zarki, M., “Adaptive Source Rate Control for Wireless Video Conferencing”, submitted for publication, 1997.
[7] Hoffman, D., and Speer, M., “Hierarchical Video Distribution over Internet-Style Networks”, Proceedings of the IEEE International Conference on Image Processing, Lausanne, Switzerland, Sept. 1996.
[8] Krishnamurthy, L., AQUA: An Adaptive Quality of Service Architecture for Distributed Multimedia Applications, Ph.D. thesis, University of Kentucky, May 1997.
[9] McCanne, S., Jacobson, V., and Vetterli, M., “Receiver-driven Layered Multicast”, ACM SIGCOMM ’96, Aug. 1996.
[10] Microsoft Corporation, DirectShow Online Documentation, 1997.
[11] Northcutt, D., and Kuerner, E., “System Support for Time-Critical Applications”, Computer Communications, 16(10):619-636, Oct. 1993.
[12] Parris, C., Zhang, H., and Ferrari, D., “Dynamic Management of Guaranteed Performance Multimedia Connections”, ACM Journal of Multimedia Systems, Apr. 1993.
[13] Schulzrinne, H., “RTP Profile for Audio and Video Conferences with Minimal Control”, RFC 1890, Jan. 1996.
[14] Schulzrinne, H., Casner, S., Frederick, R., and Jacobson, V., “RTP: A Transport Protocol for Real-Time Applications”, RFC 1889, Jan. 1996.
[15] Shacham, N., “Multipoint Communication by Hierarchically Encoded Data”, Proceedings of IEEE INFOCOM ’92, 1992, pp. 2107-2114.
[16] Zhang, H., and Knightly, E., “A New Approach to Support Delay-Sensitive Video in Packet-Switched Networks”, Proceedings of the 5th NOSSDAV, Apr. 1995, pp. 275-286.
[17] Zhang, L., Deering, S., Estrin, D., Shenker, S., and Zappala, D., “RSVP: A New Resource ReSerVation Protocol”, IEEE Network Magazine, 7(5), Sept. 1993.
[18] ITU-T Recommendation H.263+ Specification, work in progress.