Designing an Environment for Distributed Real-Time Collaboration

Mathias Johanson
Swedish Institute for Systems Development
[email protected]

April 1998

Abstract

This paper reports on experiences from designing and implementing a set of tools for distributed real-time collaboration. The focus is on audio/video-conferencing in broadband environments, where high quality audio and video is the primary concern. The paper also presents some performance measurements and concludes by describing some situations in which the system has been used.

1 Introduction

The quality of distributed collaborative work can be substantially improved with the use of specialized tools, such as audio/video-conferencing systems, application sharing and shared workspaces. The quality of audio and video in a conferencing system is critical for the participating parties to be able to communicate in an unconstrained and highly interactive fashion. Historically, the primary limiting factor for realizing high quality audio/video-conferencing systems has been the lack of network bandwidth. With the rapid progress in high-speed networking this situation is about to change. In order to study how distributed collaborative work can benefit from broadband communications, we have developed a set of tools that can take advantage of increasing network resources. A testbed broadband network connecting research organizations in Sweden has been used for testing these tools in "real-life" collaboration between researchers.

While computer supported cooperative work (CSCW) in low bandwidth environments, such as the Internet, has been rather well studied [9, 10, 11, 12], much remains to be done in the high bandwidth situation. A great deal of audio/video conferencing software is available commercially and in the public domain, but few of these tools scale well with respect to audio and video quality when the available network bandwidth increases. This is because they use audio and video compression schemes that are very destructive, in order to achieve the high compression ratios necessary for low bandwidth audio/video communication. When network bandwidth is no longer the limiting factor, other bottlenecks appear, such as video compression performance and CPU utilization.

2 The Design of an Audio/Video Conferencing Tool

Smile! is a new audio/video conferencing tool designed to take advantage of the high bandwidth available in broadband network environments. The primary design goals for Smile! include:

• High quality audio and video
• Ease of modification for testing purposes
• Ability to take full advantage of specialized hardware for video compression/decompression/rendering
• Integration of audio and video into one tool for ease of synchronization
• Minimizing end-to-end delays
• Support for multipoint conferencing through multicasting

Non-goals in the design of Smile! are:

• Support for a wide range of platforms
• Support for a wide range of video and audio coding formats
• Minimized bandwidth consumption


2.1 Network Transport

Smile! is based on IP-multicast [13], but can run over IP-unicast as well for point-to-point conferencing. The RTP/RTCP protocols [2] are used for segmenting audio and video frames into UDP datagrams. This makes Smile! interoperable with other applications using RTP/RTCP over UDP, such as vic [6].

RTP and RTCP are the draft Internet standard protocols for transmitting real-time data over packet networks. The Real-time Transport Protocol (RTP) defines a header that contains information such as sequence number, timestamp, source identifier and payload type. RTP does not provide any resource reservation facilities and does not guarantee quality of service. For each payload type, a specific document known as a profile specifies how that type of media should be fragmented into RTP packets. The RTP header includes the following fields:

• Payload Type (PT), 7 bits: identifies the type of media contained in the RTP packet.
• Sequence Number, 16 bits: a monotonically incremented sequence number that can be used by the receiver to detect packet loss or out-of-sequence packets. The initial value should be randomly generated and then incremented by one for each packet sent.
• Timestamp, 32 bits: the time of the sampling instant for the data contained in the packet. The time values should be derived from a clock that increments monotonically and linearly in time. The clock frequency is defined in the profile document of the encapsulated media. Wrap-around of timestamps must be detected by the application.
• Synchronizing Source (SSRC), 32 bits: uniquely identifies the sending source of the RTP packet.

In addition to the abovementioned fields, the RTP header also includes fields for the RTP version number, padding, header extension, a profile-specific marker bit and a contributing source list (for identifying translators and mixers that the RTP frame has traversed).

RTCP, the Real Time Control Protocol, allows monitoring of RTP data delivery and provides minimal control and identification functionality. RTCP packets corresponding to an RTP session using UDP port p should use UDP port p+1; RTP packets should use even UDP port numbers and, consequently, RTCP packets should use odd port numbers. RTCP packets are transmitted periodically to all members of a multicast group. The packets contain information to identify the RTP media streams, feedback on the quality of transmissions and minimal session control information. A number of different RTCP packet types are defined to perform these functions, including:

• Sender Report (SR): the transmitter of RTP data sends sender report packets with transmission statistics.
• Receiver Report (RR): receivers of RTP media streams send receiver report packets with reception statistics.
• Source Description (SDES): SDES packets identify a particular media stream. An SDES packet can contain several different items. The only mandatory item is the CNAME (canonical name) item, which contains a string of the format user@host identifying the sending source. Other SDES items include NAME (the real name of the source), EMAIL (the email address of the sending source), PHONE (a phone number associated with the sending source), etc.
• BYE: a BYE packet is sent to indicate that a source is leaving the session.

The RTP and RTCP protocols are described in [2].
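
To make the header layout described above concrete, the following C sketch parses the twelve-byte fixed RTP header from a received UDP datagram. It is an illustration of the field layout only, not code taken from Smile!; the struct and function names are ours.

    #include <stdint.h>
    #include <stddef.h>
    #include <string.h>
    #include <arpa/inet.h>   /* ntohs, ntohl */

    /* Fixed part of the RTP header, as described above. */
    typedef struct {
        unsigned version;        /* 2 bits                                  */
        int      padding;        /* 1 bit                                   */
        int      extension;      /* 1 bit                                   */
        unsigned csrc_count;     /* 4 bits, length of the contributing list */
        int      marker;         /* 1 bit, profile specific                 */
        unsigned payload_type;   /* 7 bits, identifies the media            */
        uint16_t sequence;       /* detects loss and reordering             */
        uint32_t timestamp;      /* media clock, e.g. 90 kHz for video      */
        uint32_t ssrc;           /* identifies the sending source           */
    } rtp_header;

    /* Returns 0 on success, -1 if the datagram is too short to be RTP. */
    static int parse_rtp(const uint8_t *buf, size_t len, rtp_header *h)
    {
        uint16_t seq;
        uint32_t ts, ssrc;

        if (len < 12)
            return -1;
        h->version      = buf[0] >> 6;
        h->padding      = (buf[0] >> 5) & 1;
        h->extension    = (buf[0] >> 4) & 1;
        h->csrc_count   = buf[0] & 0x0f;
        h->marker       = buf[1] >> 7;
        h->payload_type = buf[1] & 0x7f;
        memcpy(&seq,  buf + 2, 2);  h->sequence  = ntohs(seq);
        memcpy(&ts,   buf + 4, 4);  h->timestamp = ntohl(ts);
        memcpy(&ssrc, buf + 8, 4);  h->ssrc      = ntohl(ssrc);
        return 0;
    }

The media payload then follows the fixed header and any contributing source entries (four bytes per CSRC).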


2.2 User Interface

The graphical user interface of Smile! was built using the scripting language Tcl/Tk [14]. The primary motivation for this choice was to facilitate rapid prototyping and to make the GUI easy to modify and extend with new features. To make this possible, and to create reusable components for building general audio/video applications, Tcl/Tk was extended with new commands and widgets for audio and video. A snapshot of the user interface of Smile! is presented below:

Fig. 2.1 The Graphical User Interface of Smile!

2.3 The Tcl/Tk Audio/Video Extension

2.3.1 The Video Command and Widget

Tcl/Tk is a scripting language for building GUIs. More specifically, Tcl (Tool Command Language) is a simple interpreted programming language and Tk is a toolkit with widgets for common GUI components such as buttons, sliders and scrollbars. To make it possible to handle input and output of audio and video from Tcl, two new Tcl commands are introduced, audio and video, and the toolkit Tk is extended with a widget for displaying video. The new video command is described in Table 2.1 below.

command            description
video init         initializes the Tcl/Tk video extension
video <pathname>   creates a video widget with the name <pathname>
video ssrc         returns a list of SSRC ids of active video transmitters
video members      returns a list of CNAMEs of active video transmitters
video quit         cleans up the data structures used by the Tcl/Tk video extension

Table 2.1: Video commands


When a video widget has been created using the video command, a collection of widget-specific commands can be used to control the behaviour of the widget. In Tcl, when a widget is created, the pathname of the widget becomes a new Tcl command that can be invoked with different parameters to perform widget-specific operations. The widget-specific commands for a video widget are:

widget command     description
live               specifies that a video widget should get its input from a live video source
xfile <filename>   specifies that a video widget should get its input from the movie file <filename>
transmit           starts transmitting video from a live or xfile video widget
receive            starts receiving video from the network
stop               stops receiving or transmitting video
getpos             returns the current frame number for xfile widgets
getframes          returns the number of frames in a movie attached to an xfile widget
getstandard        returns the video standard in use for live video widgets
save <filename>    starts saving the video received in a receiver widget to the movie file <filename>

Table 2.2: Video widget commands

Below is a code example that illustrates the usage of the new Tk video widget. The following lines construct a live video window of dimensions 384x288 that is transmitted to the multicast address 224.4.4.4 using UDP port 4422:

    video init
    video .v
    .v configure -width 384 -height 288 -ip_addr 224.4.4.4 -socket_port 4422
    pack .v
    .v live
    .v transmit

A receiver video window that displays the first available video stream on multicast address 224.4.4.4, UDP port 4422, can be constructed like this:

    video .v2
    .v2 configure -ip_addr 224.4.4.4 -socket_port 4422
    pack .v2
    .v2 receive

If multiple video streams are available on the multicast address in question, a specific video stream can be bound to a receiver video widget, based on the SSRC (synchronizing source) or SDES (source description) of the sender. A list of currently available video sources can be obtained with the command video ssrc, returning a list of SSRCs, or video members, returning a list of CNAMEs, for the current multicast address. A video widget can then be bound to a specific video stream with the widget command .v2 configure -receiver_id N, where N is the index in the SSRC and CNAME lists of the requested video stream.


Many configuration options are available for video widgets, as outlined in the table below.

option          description
-width          specifies the width of the video widget in pixels
-height         specifies the height of the video widget in pixels
-fps            specifies the frame rate of the video widget, between 1 and 30 frames per second
-input          input video channel selector for live video widgets
-format         specifies the video input format for live widgets (Composite, S-Video, YUV, RGB)
-standard       specifies the video input signal standard for live widgets (NTSC, PAL, SECAM)
-qfactor        sets the JPEG quality factor, between 0 and 100
-ip_addr        the multicast or unicast IP address to connect to
-socket_port    the UDP port to use
-ttl            the time-to-live value for multicast communication
-frame          sets the current frame number when sending video files
-receiver_id    attaches a video widget to a specific media stream
-max_delay      specifies the maximum allowed delay for a video datagram

Table 2.3: Video widget configuration options

2.3.2 The Audio Command

In a manner similar to the usage of the video command, the audio command can be used to control the sampling, playback, reception and transmission of audio from within a Tcl script. Note that in contrast to the video case, there is no audio widget. This means that the audio system can only handle one audio device, but most workstations do not have more than one audio device anyway. All incoming audio streams are mixed and played back to the same audio device. The audio command is summarized in the following table.

command                    description
audio init                 initializes audio, opens the audio device for reading and writing
audio setup                sets default values for all audio parameters
audio quit                 cleans up audio structures, closes the audio device
audio volume <v>           sets playback volume to v, where v is an integer between 0 and 100
audio micgain <g>          sets microphone gain to g, where g is an integer between 0 and 100
audio delay <ms>           sets the maximum permissible delay for audio packets
audio threshold <t>        sets the silence sensor threshold to t
audio getmiclevel          returns the current level of the microphone
audio getnetlevel          returns the level of the last incoming audio packet
audio talkspurt <ms>       sets the minimum length of a talkspurt to <ms> milliseconds
audio frontbuf <ms>        indicates that <ms> milliseconds of audio before a talkspurt should be transmitted in order to avoid front clipping
audio endbuf <ms>          indicates that <ms> milliseconds of audio after a talkspurt should be transmitted in order to avoid end clipping
audio input <source>       selects an audio input (microphone, line in)
audio output <sink>        selects an audio output (headphones, internal speaker, line out)
audio sample_rate <rate>   sets the sample rate to <rate> samples per second
audio channels <n>         sets the number of audio channels to 1 (mono) or 2 (stereo)
audio ausize <ms>          sets the audio packet size to <ms> milliseconds
audio save <filename>      saves incoming audio to the file <filename>
audio monitoring           toggles local monitoring on and off

Table 2.4: Tcl audio extension


2.4 Media Encoding and Transmission

2.4.1 Video Coding and Transmission

The video compression scheme used in Smile! is Motion JPEG (M-JPEG), a video coding format based on the ITU-T still image compression format JPEG (Joint Photographic Experts Group). With this format each frame in a video sequence is encoded independently of the other frames in the sequence. This means that M-JPEG compression only reduces spatial redundancy, not temporal redundancy as formats like MPEG and H.261 do. Consequently M-JPEG requires more bandwidth than many formats that reduce redundancy in the time dimension. However, a number of advantages make M-JPEG interesting for high quality videoconferencing applications, including:

• availability of high performance hardware codecs
• low compression delay (no buffering necessary)
• potentially high image quality
• an image quality versus bandwidth tradeoff using the JPEG Q-factor

There are several different compression schemes defined by the JPEG standard, including lossless JPEG and progressive JPEG. The algorithm described here is called the baseline JPEG algorithm and is the compression scheme used in Smile!. The baseline JPEG compression algorithm is based on the discrete cosine transform (DCT), quantization and entropy coding. First of all, however, the original image samples are converted from RGB component space to YCrCb space through a linear transformation. The Cr and Cb components are then subsampled by a factor of two in the horizontal dimension. (This is 4:2:2 subsampling; in 4:2:0 subsampling the Cr and Cb components are subsampled by a factor of two in both the horizontal and vertical dimensions.) Each component of the image is then divided into blocks of 8-by-8 pixels and the discrete cosine transform is applied to each block. The effect of the DCT is that the low frequency coefficients are concentrated towards the upper left corner of the block, whereas the higher frequencies are concentrated towards the lower right. The elements are then quantized using a quantization table, that is, each element is divided by a coefficient found in a fixed table. Separate tables are kept for the luminance and the chrominance components. The quantization tables can be customized by the user to trade off image quality against compression ratio. (Note, however, that even if the quantization tables are chosen to be all ones, the JPEG baseline algorithm is still lossy because of the round-off errors introduced by the DCT.) The quantized blocks are then zig-zag encoded, that is, the elements are rearranged in a zig-zag order from the upper left to the lower right. This prepares for efficient run-length and Huffman coding, which is the final stage of the algorithm. Since both compression and decompression of JPEG images are very CPU-intensive tasks, Smile! relies heavily on specialized hardware codecs in order to achieve high frame rate video.

The quantization tables are not transmitted along with the image, but the Q-factor is, so that the tables can be reconstructed at the receiver by scaling a set of standard tables specified in Annex K of the JPEG specification [1]. Each JPEG image has to be fragmented to fit into UDP datagrams. In order for the receiver to be able to reconstruct the original images, a header, specified in [3], is prepended to each datagram. The header has a field for fragmentation offset that specifies the offset of the current packet in the JPEG image. There are also fields for the Q-factor, image width and image height.
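
As an illustration of how a receiver can rebuild the quantization tables from the transmitted Q-factor, the sketch below applies the scaling rule used by the IJG JPEG library, under which Q = 50 reproduces the Annex K tables unchanged (consistent with the measurements in Section 3). Whether Smile! uses exactly this rule is an assumption; the function name and the base table argument are ours.

    #include <stdint.h>

    /*
     * Rebuild a quantization table from a JPEG Q-factor (1..100) by scaling
     * a base table, e.g. the Annex K luminance or chrominance table [1].
     * Q = 50 leaves the base table unchanged, lower Q gives coarser
     * quantization, higher Q gives finer quantization.
     */
    static void scale_quant_table(const uint8_t base[64], int q, uint8_t out[64])
    {
        int scale, v, i;

        if (q < 1)   q = 1;
        if (q > 100) q = 100;
        scale = (q < 50) ? (5000 / q) : (200 - 2 * q);

        for (i = 0; i < 64; i++) {
            v = (base[i] * scale + 50) / 100;   /* rounded scaling */
            if (v < 1)   v = 1;                 /* keep entries in a valid range */
            if (v > 255) v = 255;
            out[i] = (uint8_t)v;
        }
    }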

2.4.2 Audio Coding and Transmission

The audio subsystem of Smile! uses uncompressed 16-bit linear PCM coding, since this gives uncompromised audio quality. The sampling rate is configurable, but defaults to 16 kHz, which is good enough for most situations. Although speech can be coded very efficiently using sophisticated coding techniques, this was never a design goal for Smile!, so for simplicity and quality the audio is transmitted uncompressed. The bandwidth required for uncompressed 16 bits/sample, 16 kHz mono audio is 256 kbit/s.
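
As a quick check of these figures, the small stand-alone C program below (our own illustration, not part of Smile!) works out both the audio bandwidth and the size of the 10 ms audio packets discussed below.

    #include <stdio.h>

    int main(void)
    {
        const int sample_rate = 16000;  /* samples per second          */
        const int bits        = 16;     /* bits per sample             */
        const int channels    = 1;      /* mono                        */
        const int packet_ms   = 10;     /* audio packet size in Smile! */

        int bits_per_second  = sample_rate * bits * channels;
        int bytes_per_packet = sample_rate / 1000 * packet_ms * bits / 8 * channels;

        printf("bandwidth:   %d kbit/s\n", bits_per_second / 1000);  /* 256 kbit/s */
        printf("packet size: %d bytes\n", bytes_per_packet);         /* 320 bytes  */
        return 0;
    }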


Keeping the audio packet size small is important in order to avoid delays. In Smile! the packet size for audio datagrams is 10 ms, that is, 320 bytes for 16 kHz, 16 bits/sample, mono audio. In a multiparty conference session each receiver must mix the incoming audio streams into a single stream that can be played back to the audio device. To make this possible, each incoming audio stream is put in a separate receiver buffer; at regular intervals these buffers are mixed into a playout buffer that is fed to the audio device. The mixing is performed by adding the sample values of the separate streams. A playout buffer of 100 ms length has been found to give good performance.

In order to avoid wasting bandwidth unnecessarily, a silence suppression algorithm can be used so that audio is not transmitted when a source is silent. A threshold value can be specified via the GUI, so that when the microphone level is below the threshold the source does not transmit audio. In Smile! this silence sensor works by taking the average of the absolute values of all the samples in a packet; if this value is below the configurable threshold the audio packet is not transmitted. A burst of audio samples transmitted after a silence period is called a talkspurt. The first packet of a talkspurt is identified by setting the marker bit in the RTP header. In order to avoid clipping the front of a talkspurt, a few samples ahead of the talkspurt should be transmitted. Similarly, when the silence sensor triggers after a non-silent period, some additional samples should be transmitted to avoid clipping the end of the audio chunk. Values of 20 ms and 40 ms respectively for the lengths of these front and end buffers have been found to yield good performance.
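
A minimal sketch of the silence sensor described above, assuming 16-bit signed mono samples; the function name and its exact interface are our own, not taken from the Smile! sources.

    #include <stdint.h>
    #include <stdlib.h>   /* abs */
    #include <stddef.h>   /* size_t */

    /*
     * The level of an audio packet is the average of the absolute sample
     * values.  Packets whose level falls below the configurable threshold
     * are suppressed, i.e. not transmitted.
     */
    static int packet_is_silent(const int16_t *samples, size_t n, int threshold)
    {
        int64_t sum = 0;
        size_t i;

        for (i = 0; i < n; i++)
            sum += abs(samples[i]);

        return n > 0 && (sum / (int64_t)n) < threshold;
    }

The front and end buffers (20 ms and 40 ms respectively) are then a matter of keeping a short history of recent packets to send when a talkspurt starts, and of continuing to send for a short while after the level drops below the threshold.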

2.4.3 Minimizing Delays

An important goal in the design of the Smile! system was to keep the end-to-end delays at a minimum. This is very important for highly interactive communication. In order to minimize delays, no buffering of audio or video takes place at the transmitters, and only as much buffering as is required for synchronization and audio stream mixing is permitted at the receivers. Studies show that audio delays greater than 400 ms severely compromise the interactivity of conversation [5].

The minimum audio and video delay is the propagation delay introduced by the network. For various reasons, such as routing delays and temporary network congestion, some packets are delayed more than others. The transmitter must timestamp all audio and video packets so that the receiver can discard packets that are delayed more than a threshold value. If the receiver does not discard old audio packets but plays them to the audio device, a cumulative drift will result. In Smile! the threshold is user configurable, but defaults to 100 ms. The RTP timestamp field is used to put a 90 kHz timestamp in every packet. The algorithm for calculating the delay of packet pi is, in pseudo-code:

    δr := atime(pi) – atime(pj)
    δx := ts(pi) – ts(pj)
    δ  := |δr – δx|
    if (δ > δmax) then
        discard(pi)
    else
        playout(pi)
    endif

where pi is the most recently received packet, pj is a previously received packet that was not discarded, atime(x) is a function returning the arrival time of packet x, ts(x) is a function returning the timestamp of packet x and δmax is the transmission delay threshold. The functions atime and ts return time values of equal resolution. This approximation of the end-to-end delay is recomputed periodically to accommodate network delay variations. Also, since the reference packet pj is chosen arbitrarily, the delay estimation could be poor if the delay of pj is not representative of the packet flow. Hence a new reference packet should be chosen for each successive recomputation. For increased robustness, a mean over a range of delay approximations using different reference packets could be computed.

2.4.4 Cross-Media Synchronization

If the threshold for maximum delay is kept relatively small (about 100 ms), the different media streams cannot get too far out of sync. The delay thresholds should be the same for different media streams originating from


the same source. The experience from using Smile! in a high bandwidth network environment with moderate delays (less than 10 ms) is that no other synchronization scheme is necessary. If transmission delays cannot be kept low, however, playback of one of the media streams must occasionally be delayed in the receiver. This situation is not handled properly in the Smile! system, and is beyond the scope of this paper.
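
For concreteness, the per-packet delay check from Section 2.4.3 can be sketched in C as follows. The sketch assumes that packet arrival times have already been converted to the same 90 kHz units as the RTP timestamps; the function and variable names are ours.

    #include <stdint.h>
    #include <stdlib.h>   /* llabs */

    /*
     * ref_ts / ref_arrival belong to a previously accepted reference packet
     * (pj in the pseudo-code), cur_ts / cur_arrival to the packet just
     * received (pi).  All values are in 90 kHz ticks, so a 100 ms threshold
     * corresponds to max_delay = 9000.  Returns nonzero if the packet
     * should be discarded.
     */
    static int too_late(uint32_t ref_ts, int64_t ref_arrival,
                        uint32_t cur_ts, int64_t cur_arrival,
                        int64_t max_delay)
    {
        int64_t d_arrival = cur_arrival - ref_arrival;
        int64_t d_ts      = (uint32_t)(cur_ts - ref_ts);  /* tolerates timestamp wrap */
        int64_t delta     = llabs(d_arrival - d_ts);

        return delta > max_delay;
    }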

2.5 Remote Camera Control

A tool to remotely control the camera of a video source is integrated into Smile!. The camera is connected to the computer via a serial line, and the VISCA protocol is used to communicate pan, tilt and zoom commands to the camera. A video switch can also be connected to the serial port to let the remote video receiver select the camera source. The camera control functions are implemented according to the client/server model. The server listens for requests from a client and accepts commands for panning left and right, tilting up and down, setting the pan/tilt speed, zooming in and out and selecting a video input. The control commands are sent to a camera or a video switch connected to the serial port. Multiple receivers can control the same camera source simultaneously, in which case the pan/tilt/zoom requests are handled sequentially.

2.6 Movie Files

In Smile! video sources can be either live (from a camera) or movie files. The movies must be in the Parallax M-JPEG format, as specified in [7]. Both the audio and video content of a movie file can be transmitted, or the movie can be transmitted with video only. The movie can be interactively controlled by the sender from a control panel with buttons for playing, pausing, rewinding and fast forwarding. A receiver can save an incoming video stream to a movie file in Parallax M-JPEG format.

2.7 Implementation Details

The Smile! system was implemented in C and Tcl/Tk for Unix systems. It currently runs on HP/HPUX with the Parallax XVideo board, on Sun/Solaris2/SunVideo and on Silicon Graphics O2/IRIX/mvp.

3 Measuring Performance

The performance tests presented below were conducted using two Silicon Graphics O2 computers with mvp video compression hardware, connected to a 100 Mbit/s Fast Ethernet network. There was no interfering traffic on the network at the time of the tests. Bandwidth utilization was measured using the netstat program and the frame rates were recorded by the Smile! system itself. The numbers given correspond to one transmitted video stream. Figure 3.1 shows the relationship between video frame rate and bandwidth consumption. The results are given for three image dimensions, 320x240, 480x360 and 640x480, and for two JPEG Q-factors, 50 and 75. Q-factor 50 corresponds to the standard quantization tables from Annex K of the JPEG specification [1].


[Chart: Video Bandwidth in Relation to Frame Rate. Bandwidth in Mbit/s (0–5) plotted against frame rate (5–25 fps) for 320x240, 480x360 and 640x480 video at Q-factors 50 and 75.]

Fig. 3.1 Bandwidth requirements for JPEG-video of different frame rates.

Figure 3.2 shows the relationship between video quality and bandwidth consumption. The test was performed for three different image dimensions, 320x240, 480x360 and 640x480, and for two different frame rates, 15 and 25 frames per second.

[Chart: Video Bandwidth in Relation to Q-factor. Bandwidth in Mbit/s (0–5) plotted against Q-factor (30–80) for 320x240, 480x360 and 640x480 video at 15 and 25 frames per second.]

Fig. 3.2 Video bandwidth in relation to JPEG Q-factor for different video configurations.


It is clear from Figure 3.1 that video bandwidth increases linearly with frame rate. This is to be expected for M-JPEG video, since the frames are coded independently, but need not be the case for inter-frame compression formats. Figure 3.2 shows that there is little point in choosing Q-factors below 60, since the bandwidth saving is marginal while image quality continues to degrade.

4 Use Cases

This section describes some of the contexts in which the Smile! system has been used.

4.1 Distributed Engineering

The DING project focuses on showing how geographically distributed engineers can collaborate in a construction and design project, using audio/video-conferencing, shared workspaces, CAD and virtual reality systems. The project was initiated in 1997 and funded by the Swedish National Board for Industrial and Technical Development, NUTEK. Participating parties are SISU (Swedish Institute for Systems Development), SICS (Swedish Institute of Computer Science) and Luleå University of Technology.

The Smile! system is used for audio/video conferencing and for transmission and playback of animated videos generated from CAD systems, as well as other videos such as real-life performance tests. In addition, 3D models from the CAD systems can be shared and interactively manipulated in a distributed virtual environment, using the VR system DIVE [15]. Shared workspaces are used to support document sharing between the cooperating parties. Below is a screen dump of one of the workstations involved in a demonstrator for the DING project, where an animated video showing a construction detail is being played back to all three participants.

Fig. 4.1 Snapshot from the demonstrator of the DING project


4.2 Telemedicine

The Smile! system has been tested in two different telemedical situations. The first test was performed in Oslo, Norway, where the video from a laparoscope was transmitted from one hospital (Rikshospitalet) to another (Ullevål sykehus). This enabled an expert in minimally invasive surgery to follow a surgical procedure performed in another location and to guide the performing surgeon throughout the operation. Video quality, with respect to image resolution as well as frame rate, is crucial in this type of application. Bandwidth was plentiful, and the application consumed approximately 12 Mbit/s. The backchannel was audio only, to maximize the performance of the laparoscopic video.

Another telemedical situation tested with the Smile! system is remote diagnosis of dermatological diseases. In this test an electronic dermatoscope (a specialized type of camera) was used, in addition to the regular video camera, to render very high quality video images of the dermatological manifestations. High resolution of the video images is considered more important than high frame rate in this type of application. Interactivity between the local and the remote doctor is also very important in this case, as is interactivity between the remote doctor and the patient.

5 Future Work

Smile! will continue to be used as a platform for experimenting with audiovisual communication in broadband network environments. We plan to implement layered media coding to support different levels of quality within a single conference. The idea, proposed by McCanne [16], is to code the audio and video signals in a hierarchical format and to transmit the different layers on a set of multicast addresses, so that receivers can join as many multicast groups as their network connections have bandwidth for. Thus, participants connected via low speed links can take part in a conference with low quality audio and video, whereas participants with high speed connections can send and receive higher quality audio and video. Furthermore, work is in progress on implementing IPv6 support in Smile!.

6 Summary

In this paper the development of a new high performance audio/video conferencing application was presented. The design of reusable audio and video components through an extension of the Tcl/Tk GUI scripting language was described, and the choice of network protocols and media encoding formats was discussed. Some performance measurements were then presented to illustrate the bandwidth requirements of the system, and finally some situations in which the system has been used were described.


References

[1] ITU-T Recommendation T.81, "Information technology – Digital Compression and Coding of Continuous-Tone Still Images – Requirements and Guidelines", September 1992.
[2] H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", RFC 1889, January 1996.
[3] L. Berc, W. Fenner, R. Frederick, S. McCanne, "RTP Payload Format for JPEG-compressed Video", RFC 2035, October 1996.
[4] H. Schulzrinne, "RTP Profile for Audio and Video Conferences with Minimal Control", RFC 1890, January 1996.
[5] P. T. Brady, "Effects of Transmission Delay on Conversational Behaviour in Echo-Free Telephone Systems", Bell System Technical Journal, pp. 115–134, January 1971.
[6] S. McCanne, V. Jacobson, "vic: A Flexible Framework for Packet Video", Communications of the ACM, 1995.
[7] "Video Development Environment, Reference Guide", Parallax Graphics, Appendix C, 1996.
[8] V. Bhaskaran, K. Konstantinides, "Image and Video Compression Standards: Algorithms and Architectures", Second Edition, Kluwer Academic Publishers, 1997.
[9] P. Parnes, "The mStar Environment – Scalable Distributed Teamwork using IP Multicast", September 1997.
[10] M. Ingvarsson, "Synchronous Collaboration Over the Internet", SISU Report 97:17, August 1997.
[11] V. Kumar, "Mbone: Interactive Multimedia On the Internet", Macmillan Publishing, November 1995.
[12] M. Macedonia, D. Brutzman, "MBone Provides Audio and Video Across the Internet", IEEE Computer, vol. 27, no. 4, pp. 30–36, April 1994.
[13] S. E. Deering, "Multicast Routing in a Datagram Internetwork", PhD thesis, Stanford University, December 1991.
[14] J. K. Ousterhout, "Tcl and the Tk Toolkit", Addison-Wesley, 1994.
[15] O. Hagsand, "Interactive MultiUser VEs in the DIVE System", IEEE Multimedia Magazine, vol. 3, no. 1, 1996.
[16] S. McCanne, "Scalable Compression and Transmission of Internet Multicast Video", PhD thesis, University of California, Berkeley, December 1996.