High Speed ATM Network Support for Video-Based Distributed Applications Rose P. Tsang and David H.C. Du Distributed Multimedia Center1 Computer Science Department University of Minnesota Minneapolis, MN 55455
Allalaghatta Pavan
Honeywell Technology Center Minneapolis, MN 55418
Abstract
Video traffic is expected to be the predominant type of multimedia traffic generated by distributed multimedia applications. In terms of bandwidth, real-time delivery and loss, it is also one of the most demanding on both the network and host systems. This paper presents the performance of JPEG (Joint Photographic Experts Group), MPEG-1 (Moving Picture Experts Group), and MPEG-2 coded video over a local Asynchronous Transfer Mode (ATM) network. TCP (Transmission Control Protocol) and UDP (User Datagram Protocol) are used as the transport protocols. The performance in terms of delay (jitter) and frame loss, as a function of load, is presented and discussed. The appropriateness of using TCP and UDP over an ATM network to transmit periodic bursty sources, such as coded video, is discussed. When the hosts and the network are stressed, the experimental data reveal that the burstiness of the variable bit rate coded video streams is a significant factor in the resulting performance degradation. Several traffic smoothing schemes are presented. Our results show that traffic smoothing yields a significantly decreased frame loss rate while maintaining acceptable jitter and loss bounds. On the basis of our experimental results, we also discuss requirements for system components, such as the network interface and switch, which are necessary to efficiently support video services.
Invited submission to ACM/Springer-Verlag Multimedia Systems Journal
Keywords: Asynchronous Transfer Mode (ATM), Distributed Multimedia Systems, High Speed Local Area Networks, traffic smoothing, Transmission Control Protocol (TCP), User Datagram Protocol (UDP), video transmission, Moving Picture Experts Group (MPEG), Joint Photographic Experts Group (JPEG).

1 The Distributed Multimedia Center (DMC) is sponsored by US WEST, Honeywell, IVI Publishing, Computing Devices International and Network Systems Corporation.
1 Introduction

Video conferencing, collaborative systems, distance learning, and VOD (video on demand) are all new applications based upon the efficient transmission of video, audio, graphical, and image-based data. These applications entail network support for multiple simultaneous video and audio streams as well as other traffic types with varying quality of service requirements. It is well known that conventional local area networks, such as Ethernet and FDDI, cannot provide the aggregate throughputs and meet the real-time requirements necessary for supporting many distributed multimedia applications. Hence, the past several years have seen the emergence of three digital switch-based high-speed network standards: the High Performance Parallel Interface (HIPPI) [12], Fiber Channel [2, 3], and the Asynchronous Transfer Mode (ATM) [5, 11, 14]. HIPPI is a standard aimed at supporting supercomputer applications, such as large-scale visualization and distributed numerical applications, which require relatively infrequent, very large and fast data transfers. Fiber Channel, a more general standard that extends HIPPI, is aimed at the same type of applications currently supported by HIPPI. ATM is the standard intended to support a wide spectrum of multimedia applications. It was developed to serve as the transport mechanism for the Broadband Integrated Services Digital Network (B-ISDN). Because of its international standardization efforts, it will very likely be available on a worldwide basis and may in the future be incorporated into the public telecommunications infrastructure.

The ATM standard [5, 11, 14] defines a fast packet-switched network where data is fragmented into fixed-size 53-byte cells. It defines the manner in which cells are switched and routed through network packet switches and links. The ATM standard is expected to serve as the transport mode for a wide spectrum of traffic types with varying performance requirements.
Using statistical sharing of network resources (e.g., bandwidth, processing buffers, etc.), it is expected to efficiently support multiple transport rates from multiple users with stringent requirements on loss, end-to-end delay, and cell-interarrival delay. Multimedia data types vary greatly in terms of the requirements they impose upon the network infrastructure. Continuous media, such as video and audio, must be delivered in a stream-like manner within real-time delay requirements; depending upon the encoding scheme and application, continuous media usually tolerate a certain degree of loss. Conventional computer traffic (e.g., file transfers), and applications such as medical imaging, require that data be delivered loss-free; delays, however, can be tolerated. In our study, we chose to focus on video traffic because it is expected to be the predominant type of multimedia traffic transmitted, and it is one of the most demanding in terms of bandwidth, delay, and processing requirements. The video quality expected by today's users requires very large network transmission speeds. Uncompressed rates for broadcast-quality NTSC video and studio-quality NTSC video are about 120 Mbits/sec and 216 Mbits/sec, respectively [14]. Fortunately, recent advances in compression techniques make the transmission of such high-quality video feasible. These compression techniques take advantage of the characteristics (or limitations) of the human visual system to achieve lossy, yet visually lossless, compressed images and video. The two main video compression standards are commonly known as Joint Photographic Experts Group (JPEG) and Moving Picture Experts Group (MPEG). JPEG is a still-image compression standard which
can produce visually lossless compression at ratios of up to 10 to 1. The MPEG motion video standard was designed to support full-motion video at compression ratios of up to 200 to 1. Both the JPEG and MPEG standards are described in Section 3.

Until recently, the lack of experimental work was primarily due to the scarcity of readily available high-speed components; several vendors now provide ATM equipment. Preliminary performance results for ATM networks have been reported in [9, 20, 13, 24, 25]. However, these reports include mainly throughput and delay studies. Little is known about the ability of ATM to support realistic application traffic patterns such as time-sensitive bursty traffic, e.g., coded variable bit rate video. The goals of our study were the following:
- We sought to measure the ability of an actual ATM local area network to support real-time variable bit rate sources, i.e., bursty periodic data. Real-time variable bit rate traffic, such as coded video, is considered one of the most difficult types of traffic to support. Although a small degree of loss may be tolerated, strict timing requirements on frame delivery must be met. Moreover, frame sizes may vary greatly even between consecutive frames. Previous work on the performance of packet switch networks supporting variable bit rate sources has used either analytical means or simulation models to predict performance [6, 8, 10]. However, accurate analytical models which capture the time-varying, correlated nature of the stochastic processes that model video streams are usually intractable [15]. It is also obvious that even simulation models cannot capture all (or even most) aspects of an actual distributed system. In a distributed system, there exist many components, hardware and software, whose complex interactions cannot be naturally captured or predicted by a fixed model.

- It is likely that video transmission will be one of the dominant media types involved in distributed multimedia systems. Most non-trivial video-based multimedia applications entail the network support of many multiplexed coded video streams. Given an ATM environment, with a known maximum achievable throughput and delay, how many typical coded video streams (e.g., MPEG-1, MPEG-2, JPEG) can be supported within reasonable loss and jitter bounds?

- The Transmission Control Protocol (TCP/IP) and User Datagram Protocol (UDP/IP) suite of protocols are widely used today and are very likely, during the initial deployment of ATM networks, to remain the dominant protocols used in local and wide area network computing applications. How well is each suited for transmitting multiplexed video streams? What are their performance tradeoffs?
Protocols which provide end-to-end flow control and retransmissions, such as TCP, have been considered unsuitable due to the time-sensitive nature of video traffic [5, 14]. Packets (frames) must be delivered within a bounded delay or else become meaningless (and are discarded). End-to-end flow control, such as TCP's sliding window protocol, as well as TCP's retransmission mechanism, has been expected to distort the timing relationship between successive video frames [5, 14], and thus produce video of unacceptable visual quality. Our experimental results were initially unexpected. Yet, after further experimentation (all described in this paper), they prove to provide an important basis for further study of real-time flow control methods.
Experiments conducted were based upon transmitting MPEG-1, MPEG-2 and JPEG coded video streams over a local ATM network consisting of Sun SPARC2 workstations and a Fore Systems ASX-100 ATM switch. A more detailed description of the environment is provided in Section 4. This paper is organized as follows. Section 2 discusses the main features of ATM. Section 3 discusses the JPEG and MPEG video compression standards as well as the characteristics of coded video trac. Section 4 presents the experimental results. Section 5 presents a discussion of network requirements for the support of video transmission. Section 6 provides our conclusions.
2 ATM: the Next Generation Network

ATM [4, 5, 11] is a standard developed by the international networking standards committee, the CCITT. It is a specification for the network layer protocol of future B-ISDNs. It resides above the physical layer and directly below the ATM Adaptation Layer (AAL) (see Figure 1). ATM is distinguished by the following characteristics:
- Connection-oriented service. ATM provides a virtual connection between any two physically dislocated processes which wish to communicate. All cells from the same call traverse the same physical path, or virtual connection. Virtual connections are specified by a virtual circuit identifier (VCI) and virtual path identifier (VPI), found in each cell header. The VPI and VCI are used for multiplexing, demultiplexing, and switching the cells through the network. ATM connection-oriented service has the potential to provide very low latency.

- High data transfer rates. ATM is independent of any particular physical layer, but is most commonly associated with the Synchronous Optical Network (SONET). SONET defines a standard set of optical interfaces for network transport. It is a hierarchy of optical signals that are multiples of a basic signal rate of 51.84 Mbits/sec called OC-1 (Optical Carrier Level 1). OC-3 (155.52 Mbits/sec) and OC-12 (622.08 Mbits/sec) have been designated as the customer access rates in B-ISDN. OC-3, 155 Mbits/sec, is the rate currently supported by first-generation ATM networks. Recall that the aggregate throughput of currently available high-speed shared-medium networks, such as FDDI, is 100 Mbits/sec. Since ATM is a switch-based network architecture, the aggregate throughput is usually several Gbits/sec. The Fore ASX-100 switch, used in our experiments, provides an aggregate throughput of 2.4 Gbits/sec. Each host on an OC-3 ATM network has access to a link speed of 155 Mbits/sec.

- Support for multiple classes of service. ATM was intended to support multiple classes of service, i.e., classes of traffic with varying quality of service parameters such as cell loss, delay, cell inter-arrival times, and data transfer rates. These parameters reflect the varying types of traffic ATM was intended to support, such as connection-oriented traffic types (e.g., video and audio), connectionless traffic types (e.g., file transfers), etc.
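To make the fixed-size cell format concrete, the following sketch (our illustration, not part of the paper's measurement code) estimates how many 53-byte cells a video frame occupies when carried over AAL5, assuming the standard 48-byte cell payload and 8-byte AAL5 trailer:

```python
import math

ATM_CELL_BYTES = 53       # 5-byte header + 48-byte payload
CELL_PAYLOAD_BYTES = 48
AAL5_TRAILER_BYTES = 8    # AAL5 appends an 8-byte trailer, then pads to a cell boundary

def cells_for_frame(frame_bytes: int) -> int:
    """Number of ATM cells needed to carry one frame over AAL5."""
    return math.ceil((frame_bytes + AAL5_TRAILER_BYTES) / CELL_PAYLOAD_BYTES)

def wire_overhead(frame_bytes: int) -> float:
    """Fraction of bytes on the wire that are not frame payload."""
    wire_bytes = cells_for_frame(frame_bytes) * ATM_CELL_BYTES
    return 1.0 - frame_bytes / wire_bytes
```

For example, the 5184-byte average MPEG-1 frame reported later in this paper occupies 109 cells, with roughly 10% of the bytes on the wire being cell header, trailer, and padding overhead.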
The purpose of the ATM adaptation layer (AAL) is to provide a link between the services required by higher network layers and the generic ATM cells used by the ATM layer. Five service classes are being standardized to provide these services. The CCITT recommendation for ATM specifies the following AAL protocols.

Figure 1: Protocol stack. Application software and its application interface sit over the Fore ATM API and socket interface; UDP and TCP run over IP; IP and the native API run over AAL 3/4 and AAL 5; the ATM layer sits over the TAXI and SONET physical layers.

- AAL Type 1: supports synchronous bit streams. It is suitable for applications such as traditional voice transmission.
- AAL Type 2: supports variable bit rate services with a required timing relationship between the source and destination, such as video and audio traffic.
- AAL Type 3/4: supports variable-length frames/packets via a connection-oriented or connectionless service without a required timing relationship.
- AAL Type 5: supports variable bit rate sources without a timing relation between the source and destination; it provides services similar to AAL Type 3/4. AAL Type 5 was defined to reduce the overhead found in AAL Type 3/4.
3 Coded Video

As mentioned previously, video transmission is extremely bandwidth intensive. Thus, sophisticated compression algorithms have been developed which are able to provide high, constant image quality video. In this section, we describe the JPEG and MPEG standards and their traffic characteristics.

JPEG is an international digital image compression standard for continuous-tone (multilevel) still images (grayscale and color). It defines a "baseline" lossy algorithm, plus optional extensions for progressive and hierarchical coding. The baseline compression algorithm is based upon the following: the pixel values of an image are divided into 8×8 blocks. Each block is transformed through a discrete cosine transform (DCT) function. The DCT is a relative of the Fourier transform and likewise produces a frequency map, here an 8×8 block of 64 frequency components. Each frequency component is then divided by a separate "quantization coefficient" (or quantization factor) and rounded. This is the fundamental
information-losing step. A quantization factor of 1 loses no information; larger quantization factors result in losing more information. The high frequencies are normally reduced much more than the lower frequencies. Most of the quantized high-frequency DCT coefficients do not need to be sent because they have nearly zero values. The remaining DCT coefficients are encoded using either Huffman or arithmetic coding. The top left graph of Figure 2 depicts a sequence of JPEG frames from a Starwars clip.

Figure 2: Frame sizes (in ATM cells) versus frame number for the Starwars clip. Top left graph: JPEG data stream. Top right graph: MPEG-1 data stream. Bottom graph: MPEG-2 data stream.

MPEG-1 defines a bit stream for compressed video and audio optimized to fit into a data rate of about 1.5 Mbits/sec. MPEG video compression is based upon exploiting temporal redundancy (moving image compression) as well as spatial redundancy. Moving image compression techniques predict motion from frame to frame in the temporal direction, and then use DCTs to organize the redundancy in the spatial dimension. An MPEG data stream usually consists of three types of coded frames. `I' frames, or intra-frames, are coded as still images (similar to JPEG). They do not rely on information from any previous frames. `P' frames, or predicted frames, are predicted from the most recently constructed `I' or `P' frame (from the point of
view of the decompressor). Each macroblock in a `P' frame can either come with a vector and difference DCT coefficients for a close match to the last `I' or `P' frame, or, if there does not exist an adequate match, it can be "intra" coded (as in `I' frames). `B' frames, or bidirectional frames, are predicted from the closest two `I' or `P' frames, one from the past and one from the future. A typical sequence of MPEG frames looks like: IBBPBBPBBPBBPBBPIBBP... `I' frames are usually much larger than `B' or `P' frames. Figure 2 depicts a sequence of MPEG-1 frames from a Starwars clip. Since there is an `I' frame every 16 frames, peaks occur every 16 frames. In MPEG, there are two parameters which may be varied by a user. The first is the quantization factor (similar to JPEG); the second is the interframe to intraframe ratio, i.e., the number of frames in a period divided by the number of `I' frames. These parameters are of interest because they allow the user to specify the visual quality, as well as to indirectly specify the bit rate characteristics of the MPEG stream.

MPEG-2 is the second phase of MPEG. It deals with the high-quality coding of possibly interlaced video, of either standard or High Definition Television (HDTV). A wide range of applications, bit rates, resolutions, signal qualities and services are addressed, including all forms of digital storage media, television, broadcasting and communications. The MPEG-2 "main profile" baseline is intended to be suitable for the largest number of initial applications in terms of functionality and cost constraints. It supports bit rates of 2 to 15 Mbits/sec over cable, satellite and other broadcast channels. The basic compression techniques used in MPEG-2 are similar to those used in MPEG-1.

Video content. Another factor which influences the bit rate of a video sequence is the content of the video.
High action scenes, scene changes, pans and zooms reduce the amount of compression which can be achieved; the compression algorithm cannot rely on data redundancy. Video sequences with few scene changes, such as video teleconferencing, generate relatively low, near-constant bit rate streams. It has been observed that video data generated from a live teleconferencing application produces near-constant JPEG data rates of about 1 Mbit/sec. Video sequences with complex spatial-temporal activity, such as sports sequences encoded using MPEG-2, require more than 5 or 6 Mbits/sec.
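Returning to the baseline JPEG algorithm described earlier, its fundamental lossy step can be sketched in a few lines. This is an illustrative pure-Python rendering of the idea (not the codec used in the experiments), showing that a larger quantization factor zeroes out more of the mostly-small high-frequency coefficients:

```python
import math

def dct2_8x8(block):
    """Naive 2-D DCT-II of an 8x8 block (illustrative, not optimized)."""
    N = 8
    def alpha(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out

def quantize(coeffs, q):
    """Uniform quantization: larger q discards more (mostly high-frequency) detail."""
    return [[round(c / q) for c in row] for row in coeffs]

# A smooth gradient block: most of its energy lands in low frequencies.
block = [[x + y for y in range(8)] for x in range(8)]
coeffs = dct2_8x8(block)
nonzero = lambda m: sum(1 for row in m for c in row if c != 0)
```

With q = 1 the gradient block keeps its DC term plus several low-frequency components; with q = 50 essentially only the DC term survives, mirroring how JPEG discards high-frequency detail first.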
4 Experiments

4.1 Testbed Environment

The testbed environment consists of the following components:
- Three Sun SPARC 2 workstations. Each Sun SPARC 2 host is equipped with a Fore Systems SBus ATM host interface SBA-200 as its ATM adapter card. The Series-200 host adapter [7] is Fore's second-generation interface. It uses an Intel i960 onboard processor. The i960 takes over most of the AAL and cell-related tasks, including the SAR functions for AAL 3/4 and AAL 5, as well as cell multiplexing. The SBA-200 allows the host to interface at the packet level, i.e., feeding lists of outgoing packets and incoming buffers to the i960. The i960 uses local memory to manage pointers to packets, and uses DMA (Direct Memory Access) to move cells out of and into host memory. Cells are never stored in adapter memory.
Figure 3: Experimental Testbed. Three SPARC 2 hosts (one fed by a VCR through an XVideo video card, one with a SCSI disk), each with an SBA-200 ATM card, connect over 100 Mbps TAXI links to a Fore ASX-100 switch, which in turn connects over a SONET OC-3c 155 Mbps link to an AT&T public ATM switch at USWest.
- A Fore Systems Forerunner ASX-100/8 ATM switch. The ASX-100 local ATM switch [4] is based on a 2.4 Gbits/sec switch fabric and a RISC control processor. The switch supports four network modules, with each module supporting up to 622 Mbits/sec. The ASX-100 supports Fore's SPANS signaling protocol with both the Series-100 and Series-200 adapters, and can establish either Switched Virtual Circuits (SVCs) or Permanent Virtual Circuits (PVCs).

- 100 Mbits/sec TAXI links interconnecting the Sun SPARC 2 hosts to the Fore Systems switch.

The video data consisted of the following:
- MPEG. The input video stream for the MPEG codec was a 3 minute 40 second sequence from the movie Star Wars [18]. The sequence was digitized from laser disc with a frame resolution (similar to NTSC broadcast quality) of 512×480 pixels. This particular Star Wars sequence was chosen because it contained a mix of high and low action scenes. The interframe to intraframe ratio was 16. The quantizer scale was 8. For these parameters, the image quality was judged to be good (and constant) through the entire sequence of frames. The coded video was captured at 24 frames/second.

- JPEG. The input video stream for the JPEG codec was also a sequence from the Star Wars movie. The movie was input from a VCR and processed by a video card from Parallax Graphics, Inc., inside the SPARC 2. The frame resolution was 512×480 pixels. The quantization factor was 400. The image quality was judged to range from fair to marginal. The coded video was captured at 15 frames/second.
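As a rough check on these capture parameters, average stream bit rates follow directly from mean frame size and frame rate. The helper below is ours; the frame sizes are the averages reported later in Section 4:

```python
def avg_bit_rate_mbps(avg_frame_bytes: int, fps: float) -> float:
    """Average stream bit rate in Mbits/sec from mean frame size and frame rate."""
    return avg_frame_bytes * 8 * fps / 1e6

# Average frame sizes reported later in Section 4:
mpeg1_rate = avg_bit_rate_mbps(5184, 24)   # ~1.0 Mbit/s, near MPEG-1's ~1.5 Mbit/s target
jpeg_rate = avg_bit_rate_mbps(10944, 15)   # ~1.3 Mbit/s at the lower 15 frame/sec capture rate
```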
Figure 4: Network multiplexing test configurations. (a) 2 senders to 1 receiver: streams from host 1 and host 2 incur possible packet loss at host processing, at each host's multiplexing stage, at switch multiplexing, and at the receiving host. (b) 1 sender to 1 receiver: host 1's streams pass through host processing, switch multiplexing, and the receiving host, while a second host retrieves throughput measurements.

It is important to note that the input sequence we chose contains high as well as low motion scenes. Hence the performance results cannot be compared to all types of video sequences. For instance, a video sequence generated from a video-conferencing application would contain, on average, smaller frames, since in most video-conferencing sequences there is little motion and few, if any, scene changes. Thus, better performance would be expected if such a sequence were used. We chose a clip with a high level of motion in order to stress-test the ATM network's ability to support bursty data. In our experiments, we used the following testing configurations:
- 1 sender to 1 receiver. In this configuration (see Figure 4 (b)), one of the SPARC workstations injected multiple video streams into the network. Multiple processes (one per video stream) simultaneously injected video streams into the network. The streams were multiplexed at the network interface of the sending host, transmitted through the Fore Systems ASX-100 ATM switch, and received by the single receiver. The other host was used to retrieve steady state throughput measurements from the Fore Systems ATM switch. The retrieval of these measurements did not in any way affect the testing performance. One could think of a video-on-demand server as a typical example of such a scenario. In a more general case, multiple video sessions may be `on' at different destinations, all deriving their video streams from a common server. In our case, however, all the video streams are directed to the same destination. In the future, it is conceivable that this configuration will be used for applications which require and use multiple video streams all received at the same destination (see below for examples).
- 2 senders to 1 receiver. In this configuration (see Figure 4 (a)), two SPARC workstations simultaneously injected multiple video streams into the network. Multiple processes were simultaneously invoked on both machines. Both senders were simultaneously sending compressed video data through the network. The streams were multiplexed at each sender's network interface, multiplexed through the Fore Systems ASX-100 ATM switch, and received by a single receiver. An example application which would exhibit this type of sending pattern is a video conferencing activity. As mentioned above, future applications can conceivably utilize dozens of video streams (provided there is enough processing capability at the receiver end-station). Below we provide two examples of such future applications.
- Security Monitoring: Present-day security monitoring systems involve dozens of video cameras placed at strategic locations in a building. Typically there are dozens of television monitors, one per camera, to monitor the activities inside a building. It is conceivable that the same application could be moved into the digital domain, with many video windows on display at a single user workstation. An ATM network inside the building can serve as the medium for transporting these multiple video streams to the security desk. The security personnel can, at their discretion, activate a subset of cameras and selectively view them at a user workstation, which will reduce their work area considerably and make it easier to perform the security operations. The same scenario can be developed for other monitoring applications spread across a wider geographic area. Possible applications include highway traffic monitoring, or process monitoring in an industrial plant.

- Live Broadcast / Movie Editing: Imagine a soccer game being televised live. Presently, a television network in charge of televising the game has several cameras placed in different parts of the field, at different heights and at different angles, to provide multiple perspectives of a given play. An on-site editing team uses several television monitors to select one view for broadcast. This entire process can be made far easier if all displays were received digitally at a single workstation, or a handful of workstations, equipped with powerful CPUs and large disk storage. With this facility, the editing team may be remotely located, and there would be no need to set up an entire work area for the television crew televising the game. Also, in movie editing, segments of the movie can be stored at a large disk server and brought into a workstation for the final editing.
Both applications listed above offer the advantage of moving the processing area away from the source of the video streams. Thus, a few remotely located agencies can be providers of such services. With advances in multimedia composition, content-based search and editing, and high-capability user interfaces, the above-mentioned tasks will reduce human labor dramatically. It is not our intent to cite the above example applications as the prime motivation for embarking on this study of aspects of ATM multiplexing. The examples serve to illustrate that a scenario involving dozens of video streams being sent to a single end-station is conceivable in the future.

The following terminology and performance metrics are used throughout the remainder of the paper.
- The average send time of a frame is the time from when the first packet of a frame is transmitted to when the first packet of the next consecutive frame is transmitted. When transmitting video frames at 24 frames per second, the target average send time is 1/24 second, or about 41.67 milliseconds.

- The received interval is the time between the receipt of an entire frame (the last packet in a frame) and the receipt of the next entire frame. This metric is used to compute jitter, the interarrival delay between consecutive frames.

- The fixed frame interval is the interval based upon the frame rate; e.g., 24 frames per second implies a fixed frame interval of 1/24 second, or about 41.67 milliseconds. In a jitter-free environment, all of the received intervals would be equal to the fixed frame interval.

- The steady state throughput is the throughput measured at the switch, averaged and reported over 2-3 second intervals. The peak steady state throughput is the peak observed (reported) steady state throughput.

- The percentage of lost frames is the percentage of frames lost during a single run, or averaged among several single runs. Any frame missing a packet is considered lost.
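The interval and jitter metrics above can be sketched as follows (a hypothetical helper for illustration, not the authors' instrumentation), computing received intervals from frame-completion timestamps and jitter as the deviation from the fixed frame interval:

```python
FIXED_FRAME_INTERVAL_MS = 1000.0 / 24  # ~41.67 ms at 24 frames/sec

def received_intervals(completion_times_ms):
    """Time between receipt of consecutive entire frames (last packet of each)."""
    return [b - a for a, b in zip(completion_times_ms, completion_times_ms[1:])]

def jitter(completion_times_ms):
    """Per-frame deviation of the received interval from the fixed frame interval."""
    return [iv - FIXED_FRAME_INTERVAL_MS for iv in received_intervals(completion_times_ms)]

# In a jitter-free run, every received interval equals the fixed frame interval,
# so every jitter value is zero:
ideal = [i * FIXED_FRAME_INTERVAL_MS for i in range(5)]
```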
Figure 4 depicts a conceptual view of the 2 sender to 1 receiver configuration and the 1 sender to 1 receiver configuration. Packet losses may occur at the sending host (during processing, or the multiplexing and transmission of multiple streams at the network interface), at the switch, and at the receiving host. Since the purpose of this study is to examine the performance of variable bit rate video on the network, we sought to avoid making the results dependent on the host processing speed of the sender(s) and receiver. Timing logs of packet transmissions and receipts were maintained at the sending and receiving hosts (see the paragraph below on data collection). The purpose of this study is to demonstrate the performance of multiplexing variable bit rate streams, specifically coded video, by stressing the network components: the host network interfaces and the switch. We sought to ensure that the losses and delays incurred reflect only the effect of multiplexing multiple coded video streams. The following paragraph describes how this was accomplished.
Data Collection. In both configurations depicted in Figure 4, the sender(s) and receiver maintained logs of packet departures and arrivals. The sending host recorded the number of packets sent and the average send time. If the sending host was transmitting N streams (at 24 frames per second) and the average send time was larger than 1/24 second (41.67 milliseconds), the sending host could not support the transmission of N video streams at the 24 frame per second rate; i.e., the sending host was overloaded, and hence was introducing jitter which was not created by the network components. In these cases, the data was not used in this study.
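The sender-side sanity check just described can be expressed directly (an illustrative sketch, not the authors' code):

```python
def sender_overloaded(avg_send_time_ms: float, fps: float = 24.0) -> bool:
    """True when the sending host could not sustain the target frame rate,
    meaning host-induced jitter would contaminate the network measurements
    and the run should be discarded."""
    return avg_send_time_ms > 1000.0 / fps
```

A run averaging 45 ms between frame starts at 24 frames/sec would be discarded; one averaging 41 ms would be kept.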
Throughputs: 1 sender to 1 receiver

  Type of throughput                                  | UDP                 | TCP
  ----------------------------------------------------+---------------------+--------------------
  Max throughput measured (in [9]) for large          | 48 Mbits/sec        | 35 Mbits/sec
    messages (approx. 200 KBytes)                     |                     |
  Peak steady state throughput for multiplexed        | 41 Mbits/sec        | 26 Mbits/sec
    MPEG-1 streams                                    |                     |
  Throughput for `acceptable' quality video           | 12 Mbits/sec        | 23 Mbits/sec
                                                      | (8 MPEG-1 streams)  | (20 MPEG-1 streams)
  Throughput for `acceptable' quality smoothed        | 20 Mbits/sec        | NA
    (Inter-Interval(4)) video                         | (16 MPEG-1 streams) |

Figure 5: Throughput measurements

On the receiving side, the receiving host receives the packets and demultiplexes them according to sender host id and port number. A log for each incoming video stream is kept. The number of missing frames is recorded; any frame missing a packet is considered lost. The time to receive each entire frame is recorded, as is the time between the receipt of an entire frame and the receipt of the next entire frame. For all our experiments, segments were `played' for 3 minutes and 40 seconds. The data was `played' by reading a file which consisted of an enumeration of frames and their corresponding byte sizes. To ensure accuracy, individual runs were executed multiple times. Where appropriate, replicated results were averaged and the standard deviation computed and reported.
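Receiver-side demultiplexing and frame-loss accounting of this kind can be sketched as follows (a hypothetical structure for illustration; the paper's receiver kept a separate log per stream):

```python
from collections import defaultdict

def percent_lost_frames(packets, expected_frames, packets_per_frame):
    """Per-stream percentage of lost frames. `packets` is an iterable of
    (stream_id, frame_no, packet_no) tuples seen at the receiver, and
    `packets_per_frame[(stream_id, frame_no)]` is how many packets each
    frame was sent as. A frame missing any packet counts as lost."""
    seen = defaultdict(set)
    for stream_id, frame_no, packet_no in packets:
        seen[(stream_id, frame_no)].add(packet_no)
    result = {}
    for sid in {s for s, _, _ in packets}:
        complete = sum(
            1
            for f in range(expected_frames)
            if len(seen[(sid, f)]) == packets_per_frame[(sid, f)]
        )
        result[sid] = 100.0 * (expected_frames - complete) / expected_frames
    return result
```

A stream that delivered both packets of frame 0 but dropped one packet of frame 1 would report 50% lost frames.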
Previous work. In [9], the throughputs for AAL5, AAL3/4, TCP/IP on ATM, and TCP/IP on Ethernet were measured. For AAL5, a maximum throughput of about 48 Mbits/sec was obtained for large messages (about 200 KBytes). For AAL3/4, a maximum throughput of about 40 Mbits/sec was obtained for large messages. TCP was observed to achieve a maximum throughput of about 35 Mbits/sec. The maximum Ethernet throughput was measured at 7.85 Mbits/sec. In terms of round-trip delay, [9] found the delays were best (least) for AAL5, then AAL3/4, TCP over ATM, and Ethernet, in that order.

Experiment #1: Throughput Measurements. During the course of performing our experiments, our steady state throughput measurements correlated with the measurements reported in [9]. As expected, while transmitting multiplexed MPEG-1 streams, we observed peak steady state throughputs slightly lower than those reported in [9]. In [9], throughputs were attained by using message sizes of approximately 200 KBytes. Using multiplexed MPEG-1 streams, we attained throughputs of 41 Mbits/sec for UDP on AAL5 ATM, and 26 Mbits/sec for TCP on AAL5 ATM. Figure 5 depicts our throughput measurements and those measured in [9]. This table also shows the relative ability of TCP and UDP to transport variable bit rate video (i.e., MPEG-1). Surprisingly, TCP is able to efficiently transport a large number of multiplexed MPEG-1 streams with `acceptable' visual quality at near peak steady state throughput rates. (Section 4.2 discusses `acceptable' quality video.) UDP is able to transport only 8 multiplexed
Figure 6: Experiment #1: JPEG, MPEG-1, and MPEG-2 (average percentage of lost frames vs. number of streams; Star Wars data, UDP/AAL5, 1 sender, 1 receiver).

MPEG-1 streams with `acceptable' quality at only 30% of the peak steady state throughput. Using a smoothing scheme, described in Section 4.3, UDP is able to support twice the number of MPEG-1 streams with `acceptable' visual quality. These results will be discussed in much greater detail below.
Experiment #2: JPEG, MPEG-1, MPEG-2. In this experiment, the test configuration used was the 1 sender to 1 receiver model. The protocol suite used was UDP over AAL5 ATM. The average (among all streams in the same run) percentage of frames lost as a function of the number of streams injected by the sending host is shown in Figure 6. JPEG, MPEG-1, and MPEG-2 sequences from the Star Wars movie are compared. Despite the poorer quality of the JPEG video sequence and its lower capture rate of 15 frames/sec (compared to 24 frames/sec for the MPEG-1 sequence), the percentage of lost frames for the JPEG coded sequence was still higher than that for the MPEG-1 coded sequence (see Figure 6). The higher loss rates of the JPEG streams can be attributed to the following. The average frame size in the JPEG coded sequence was 10944 bytes; the average frame size in the MPEG-1 coded sequence was 5184 bytes. The average `I' frame size in the MPEG-1 sequence was close to 11000 bytes (near the average frame size of the JPEG sequence). A sequence with a larger average frame size results in greater losses because (i) any cell or packet loss causes the entire frame to be lost (i.e., smaller frames are less likely to be lost than larger frames), and (ii) a stream with a higher bit rate causes more contention throughout its transmission, so its packets are more likely to be discarded. For the same reasons, due to the much larger frames in the MPEG-2 stream, the average percentage of lost frames for the MPEG-2 sequence was the greatest. The mean bit rate for the MPEG-1 sequence was 1.5 Mbits/sec; for the JPEG sequence, 2.3 Mbits/sec; and for the MPEG-2 sequence, 4.0 Mbits/sec. For the remainder of our experiments, we used the MPEG-1 sequence from the Star Wars movie. We chose the MPEG-1 sequence rather than the JPEG sequence because it is the most likely type of coded video to be used in the future; MPEG-1 provides higher compression ratios and lower mean bit rates than JPEG. The MPEG-2 stream's bit rate was too high (i.e., caused too many lost frames) to produce meaningful results in our particular environment. Thus, hereafter, the term MPEG implies MPEG-1.
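Point (i) above can be made concrete with a small calculation. Assuming, purely for illustration, that cells are lost independently with probability p, a frame of n cells survives only if all n of its cells do (the function name and loss rate are ours, not from the experiments):

```python
def frame_loss_prob(frame_bytes, p_cell_loss, cell_payload=48):
    """P(frame lost) = 1 - (1 - p)^n, where n is the number of ATM
    cells the frame occupies: any single lost cell discards the frame."""
    n = -(-frame_bytes // cell_payload)          # ceiling division
    return 1.0 - (1.0 - p_cell_loss) ** n

# With the average frame sizes above and an illustrative 0.1% cell
# loss rate, the larger JPEG frames are roughly twice as likely to be
# lost as the average MPEG-1 frame:
p_jpeg = frame_loss_prob(10944, 0.001)   # 228 cells per frame
p_mpeg = frame_loss_prob(5184, 0.001)    # 108 cells per frame
assert p_jpeg > p_mpeg
```

The same model also captures point (ii) indirectly: a higher bit rate stream offers more cells per second to a congested buffer, raising its effective p.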
4.2 Loss and Jitter Measurements

Physically, the network in our testbed is capable of providing 100 Mbits/sec at the physical TAXI interface. However, observed application-level throughput is lower due to overhead at the sending and receiving hosts and at the switch. Overhead is incurred by both hardware and software components. Hardware overhead is incurred by the host interface board, signal propagation delay, the bus architecture of the host, and the switch. Software overhead is incurred by interactions with the host's operating system, device driver, and higher-layer protocols. An example of software overhead is the protocol processing which occurs at both the sending and receiving hosts. Figure 1 depicts the layers of the protocol stack. A user application message must be processed at each layer of the stack, beginning at the application layer, until it is finally in a form suitable for physical transmission via the network medium. Each layer processes the user message by fragmenting/reassembling it and appending/stripping the appropriate headers, depending on whether the message is traversing down or up the protocol stack, respectively. As mentioned before, we used the TCP/IP and UDP/IP protocols as the transport mechanisms for the video streams.
UDP and TCP. UDP provides a connectionless datagram delivery service. It uses the underlying Internet Protocol (IP) to transport a message from one host to another, and it provides the same unreliable, connectionless datagram delivery semantics as IP. UDP does not maintain an end-to-end connection between the sending and receiving processes; it merely pushes datagrams out onto the network and accepts incoming datagrams from the network. It provides no guaranteed delivery, no in-order delivery, and no flow control mechanism. The UDP layer is responsible only for multiplexing (demultiplexing) among multiple sources (destinations) within one host. TCP is a connection-oriented protocol. It supports the reliable, sequenced, and unduplicated flow of data without record boundaries. TCP supports guaranteed delivery (no loss) and flow control by using a sliding window protocol with time-outs and retransmissions. TCP's greater capability, compared to UDP, is also its drawback: TCP requires more CPU processing and network bandwidth than UDP.

`Acceptable' visual quality. Continuous media traffic, such as coded video, has the real-time requirement that frames be displayed sequentially (continuously) with no prolonged delays
Figure 7: 1 sender to 1 receiver: frame loss vs. number of MPEG streams (UDP/AAL5).
Figure 8: 2 senders to 1 receiver: frame loss vs. number of MPEG streams (UDP/AAL5).
between any pair of consecutive frames; the interarrival time, or jitter, between frames must be bounded. When network congestion occurs, frames may be discarded (via buffer overflow), or, in the case of TCP, may be discarded and then re-transmitted. If a frame arrives late, it causes jerkiness in the visual medium; if a frame never arrives (is lost), its absence also causes jerkiness. In MPEG, which consists of `I', `P', and `B' frames, some frames are more important than others [19]. `I' frames are complete bit images; they must be received regularly in order to re-generate high quality images. `P' and `B' frames are used to `refresh' the current image. If a `P' or `B' frame is lost, the decoder may be able to `guess', or estimate, the lost frame until the next `I' frame arrives. `I' frames serve as a reference point for creating `B' and `P' frames. Hence, `I' frames are more important to maintaining high visual quality; their loss is much more apparent to a viewer. In our experiments, we define a loss of more than 10% to be visually noticeable to most viewers and hence unacceptable. The worst case occurs when all of the lost frames are `I' frames. At 24 frames per second, a 10% loss translates into a 21.6 frame per second frame rate. This is in the range of what is usually judged to be `acceptable' visual quality [19]. Recall that the Parallax Graphics video card is only capable of displaying 15 frames per second. Using TCP, packets which are discarded by the network are re-transmitted, so packets (and hence frames) are very rarely lost: packets are given up on only after a pre-set number of re-transmissions. This situation occurs very rarely, and we did not observe any loss in any of the TCP experiments.

We assumed that the receiving host could buffer up to 3 fixed frame intervals (or 3 frames). This is a reasonable assumption because `B' frames may reference either forward or backward frames; both the last and next `P' or `I' frames must be transmitted before the `B' frames may be sent. Thus the receiving host must be able to buffer a minimum of 3 frames [19]. Hence a frame received within 3 fixed frame intervals will not cause jitter. A frame which arrives outside of 3 fixed frame intervals is considered lost (undisplayable) and may contribute to the overall perceived jitter. Thus, similarly to what is acceptable for lost frames, we define an intolerable number of delays (i.e., losses) to be 10%; if more than 10% of the frames are delayed (beyond 3 fixed frame intervals), the visual quality is unacceptable.
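The acceptability criterion above amounts to a simple classification rule. A minimal sketch, with hypothetical names (this is not the actual measurement code):

```python
FRAME_INTERVAL = 1.0 / 24      # fixed frame interval at 24 frames/sec
BUFFER_INTERVALS = 3           # receiver buffers up to 3 frame intervals
MAX_UNDISPLAYABLE = 0.10       # more than 10% undisplayable = unacceptable

def acceptable(arrival_delays):
    """arrival_delays[i] is frame i's arrival time minus its scheduled
    display time; a frame later than 3 fixed frame intervals is
    considered lost (undisplayable)."""
    late = sum(d > BUFFER_INTERVALS * FRAME_INTERVAL for d in arrival_delays)
    return late / len(arrival_delays) <= MAX_UNDISPLAYABLE

# At 24 frames/sec, a 10% loss leaves a 0.9 * 24 = 21.6 frame/sec
# display rate, matching the figure quoted above.
```

A frame within the 3-interval buffering window counts as displayed on time; everything later is folded into the same 10% budget used for outright losses.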
Experiment #3: Frame loss vs. number of multiplexed video streams. This experiment was performed using UDP over ATM AAL5. UDP was of interest because of its higher throughput and lower latency compared to TCP, which is primarily due to TCP's flow control and retransmission mechanisms; this is especially of concern for real-time traffic such as video. Since UDP does not guarantee packet delivery, loss is a key performance measure. Loss was measured in terms of the percentage of frames lost: if any packet of a frame was discarded along the transmission path, the entire frame was considered lost. Figures 7 and 8 depict the percentages of lost frames as a function of the number of transmitted MPEG-1 streams. Each square in the graph corresponds to an individual multiplexed stream; all squares aligned in the same vertical line represent the percentage of lost frames of all streams from the same run. Figure 7 results from one sender host transmitting to one receiving host. Figure 8 results from two sender hosts transmitting to one receiving host. In this configuration (2 senders to 1 receiver), the number of video streams to be sent
Figure 9: UDP/AAL5 steady state throughput vs. time (1, 8, 12, and 20 streams).

is evenly split between the two sending hosts. For example, in Figure 7, `12 streams' implies 12 streams are multiplexed at the single sending host and sent to the single receiving host via the intermediate switch. In Figure 8, `12 streams' implies that there are 2 sending hosts, each transmitting 6 streams to the single receiver, which receives a total of 12 streams. Both figures show that as the number of streams increases beyond 8, performance degrades to a point that would be perceptible to a viewer (> 10% frame loss). An important observation is the large variation in the percentage of frames lost between streams in the same run. For example, in Figure 8, when 18 streams are sent to the receiving host (9 streams per sending host), the percentage of frames lost varies between 1.3% and 44.5% across streams. Figure 9 depicts the steady state throughput of four of the runs for the single sender to single receiver configuration. Two important observations follow from this figure. First, as the number of MPEG streams is increased, the burstiness of the multiplexed streams increases; this correlates intuitively with the large variation in frame loss percentages between multiple streams of the same run. Second, even though the peak multiplexed throughput may be well under the maximum sustained throughput, relatively large frame losses may occur. For example, the peak throughput for 20 streams is 30 Mbits/sec, while the maximum sustained throughput (derived from many multiplexed video sources) is 41 Mbits/sec; yet the percentage of lost frames for some of the 20 streams reached as high as 47%. The received intervals of the MPEG streams remained constant as the network was more heavily loaded with additional streams. Jitter was negligible in all runs, because UDP simply discards packets when contention occurs and does not perform recovery procedures for lost packets.
Figure 10: TCP jitter: percentage of frames delayed by [2, 4), [4, 6), and [6, infinity) fixed frame intervals, for 16, 20, 24, and 28 multiplexed streams (TCP/AAL5, MPEG-1, 1 sender, 1 receiver).
Experiment #4: Jitter vs. number of multiplexed streams. This experiment was performed using TCP on ATM AAL5. Figure 10 depicts the percentages of frames delayed by varying numbers of fixed frame intervals. Recall that loss is not a performance measure in this experiment, since TCP rarely loses packets. Packets which are delayed by between x and y - 1 fixed frame intervals are said to be delayed by [x, y), i.e., their received interval falls in [x, y). Assuming frame delays of at most 3 fixed frame intervals are tolerable (due to buffering), TCP can support up to 20 multiplexed streams with only 7% of the received intervals exceeding 3 fixed frame intervals. Note that TCP is able to support a far greater number of multiplexed streams (within reasonable inter-arrival delay constraints) than UDP (within reasonable frame loss rates). Also, for TCP the variation in jitter between individual multiplexed streams from the same run was found to be negligible; recall that for UDP the variation in frame loss rates between individual multiplexed streams from the same run was very high. Figure 11 depicts the steady state throughput observed at the switch. Note that the observed transmission time increases slightly as the number of streams (amount of data) increases. Observe that as the number of multiplexed streams increases to 20 (and 24 for TCP), the peak throughput for TCP streams (< 24 Mbits/sec) is lower than the peak throughput for 20 UDP streams (31 Mbits/sec). This can again be attributed to the flow control imposed by TCP's sliding window protocol. Recall that when heavily loaded with periodic bursty streams (e.g., video), we observed TCP reach a steady state peak throughput of 26 Mbits/sec. This experiment shows TCP capable of supporting up to 20 streams with reasonable performance at a steady state peak
Figure 11: TCP/AAL5 steady state throughput vs. observed time (1, 8, 20, and 24 streams).

throughput of nearly 24 Mbits/sec. Thus TCP is capable of providing efficient transport for variable bit rate video traffic.
4.3 Controlling burstiness

Previous work [22, 23], as well as our experimental results, indicates that bursty traffic sources contribute to the amount of delay and loss experienced by network traffic. As the overall burstiness of a stream increases, buffer overflows due to contention become more likely at the sending and receiving hosts and at the switch. Hence, it has been suggested [22, 23, 18, 17] that smoothing of bursty sources be performed at the individual source nodes before their traffic is admitted to the network. Most previous work on burst-level control or smoothing at sources has not examined smoothing in the context of delay-sensitive data such as coded video. A study [17] which determined an optimal smoothing algorithm for video traffic has appeared; however, that particular study was performed on videoconference sources. As mentioned before, videoconference applications consist mostly of low-motion scenes, and thus usually produce much less bursty and lower bit rate traffic than a general video source. That study, as well as others [18], is based upon simulation models or analytical techniques; as mentioned before, actual distributed systems consist of many components, hardware and software, whose complex interactions are very difficult to predict.

Smoothing is the process of buffering cells for a certain period and then transmitting the buffered data in a less bursty form. Smoothing video traffic may be performed in a variety of ways: on individual frames, within a fixed frame interval, or among several frames, spread over several fixed frame intervals. The former method will be referred to as intra-frame smoothing; the latter will be referred to as inter-frame smoothing. Video transmission has strict timing requirements on the interarrival time between adjacent frames; frames must arrive consecutively within a bounded delay, otherwise they are meaningless and must be discarded. Thus, if a video frame is to be smoothed by being spread over several frame intervals with other video frames, there must be a buffer in which frames are held until their display, and the number of frame intervals over which frames are spread must be limited to correspond to the buffer size. Also, taking the real-time requirements of video traffic into consideration, the number of buffered frames (the smoothing interval) must be such that the inter-arrival delay requirements of the frames are not violated. Thus there is a fundamental tradeoff between smoothing (for greater performance gains) and tolerating additional buffer and delay requirements.

Several schemes for smoothing traffic at the source have been proposed, including deterministic smoothing [22] and smoothing according to a uniform distribution [23]. We chose to examine deterministic smoothing because it is the simplest and hence seems most suitable for the real-time requirements of video transmission. In deterministic smoothing, cells are equispaced over one or more fixed frame intervals. In the case of MPEG video streams, since `I' frames are a major and predictable cause of bursty traffic, it is reasonable to buffer `I' frames. Considering that the MPEG-1 stream used in our experiments had an average `I' frame of approximately 207 ATM cells and an average non-`I' frame of approximately 55 ATM cells, it is reasonable to take the maximum number of fixed frame intervals a frame could be spread among to be approximately four. That is, in an MPEG stream consisting of ... BBPBBPBBPIBBPBBP ..., the four P and B frames (shown in boldface) surrounding each I frame would be buffered with their neighboring `I' frame.
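Using the average sizes just quoted (207 cells per `I' frame, 55 cells for other frames), spreading each `I' frame over four intervals roughly halves the peak per-interval cell load. The function below is only a sketch of the idea, not the experimental code:

```python
def smoothed_loads(frames, i):
    """frames: list of (kind, cells), one entry per fixed frame
    interval.  Each `I' frame's cells are spread evenly over i
    consecutive intervals (deterministic inter-interval smoothing);
    other frames are sent in their own interval."""
    loads = [0.0] * len(frames)
    for t, (kind, cells) in enumerate(frames):
        if kind == 'I':
            for k in range(i):
                loads[min(t + k, len(frames) - 1)] += cells / i
        else:
            loads[t] += cells
    return loads

# One `I'-frame period of the pattern ...BBPBBPBBPIBBPBBP...:
gop = [('B', 55), ('B', 55), ('P', 55), ('B', 55), ('B', 55), ('P', 55),
       ('B', 55), ('B', 55), ('P', 55), ('I', 207), ('B', 55), ('B', 55),
       ('P', 55), ('B', 55), ('B', 55), ('P', 55)]
peak_raw = max(c for _, c in gop)            # 207 cells in one interval
peak_smoothed = max(smoothed_loads(gop, 4))  # 55 + 207/4 = 106.75 cells
```

The total number of cells sent is unchanged; only their placement across intervals differs, which is the essence of the tradeoff between smoothing gain and added buffering delay.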
Experiment #5: Smoothing Schemes. This experiment used UDP over ATM AAL5. We chose UDP instead of TCP because (from Experiment #4) we observed that TCP is able to support video traffic with relatively good performance, i.e., no losses and low jitter, at near steady state peak throughput. In this experiment, we re-sent the data, again in burst periodic form, except that the data was `smoothed', or averaged, among fixed frame intervals according to the following schemes. The 2 sender to 1 receiver configuration was used.

Intra-Interval smoothing. In this scheme, `I' frames and their fixed frame interval were divided into two: half of an `I' frame is transmitted in the first half of the fixed frame interval, and the second half is transmitted in the second half of the fixed frame interval.

Inter-Interval smoothing(i). In this scheme, `I' frames are smoothed along several (i) fixed frame intervals. The same data is sent during the i intervals, but the data from the `I' frame is evenly distributed among the i fixed frame intervals.

Constant BP (Burst Periodic). In this scheme, the amount of data in every `I' frame period (i.e., the amount of data from the beginning of an `I' frame to the last bit of the last frame before the next consecutive `I' frame) was transmitted such that the same amount of data was sent in every fixed frame interval of the `I' frame period. Note that this smoothing scheme is not suitable for video traffic unless the
Figure 12: 1 sender to 1 receiver: Inter-Interval smoothing(4) (percentage of lost frames vs. number of streams; smoothed UDP/AAL5, MPEG-1).
Figure 13: 2 senders to 1 receiver: Inter-Interval smoothing(4) (percentage of lost frames vs. number of streams; smoothed UDP/AAL5, MPEG-1).
Comparison of Smoothing Schemes (percentage of lost frames per stream, 2 senders to 1 receiver):

                          20 streams                 16 streams                 8 streams
Smoothing Scheme     max   min   avg  std dev   max   min   avg  std dev   max   min   avg  std dev
No smoothing         55%   18%   37%   10.80    44%    2%   19%   12.33    11%    0%    4%    3.39
Intra-Interval       NA    NA    NA    NA       36%    4%   14%   10.13     7%    0%    3%    3.10
Inter-Interval(2)    45%   11%   29%    8.34    37%    3%   15%   10.90     8%    0%    3%    3.22
Inter-Interval(4)    26%    7%   14%    5.40    13%    1%    5%    4.09     3%    0%    2%    1.34
Constant BP          22%    6%   14%    5.18     7%    1%    2%    1.61     1%    0%    0%    0.50

Figure 14: Average frame loss comparisons: smoothing vs. non-smoothing.

interframe to intra-frame ratio is small (less than 3-4). The MPEG-1 stream we used in our experiments had an interframe to intra-frame ratio of 16, which implies that if Constant BP smoothing were used, the receiver would have to buffer 16 frames. Hence this scheme is feasible only if the interframe to intra-frame ratio is small (less than 3-4); it is presented here only for comparison purposes, to provide an optimal lower bound for the other smoothing schemes. Note that the smoothing schemes use a relatively crude, yet effective, form of deterministic smoothing. For instance, in the Intra-Interval scheme, smaller bursts were transmitted in smaller intervals of 0.0208 seconds, or 1/48th of a second, as opposed to 1/24th of a second. If bursts were sent out in less than 1/48th of a second, then in order not to overload the sending and receiving hosts (and distort the experimental results), the number of streams simultaneously transmitting data would have to be greatly reduced. Since our experiments were performed at the application level, the speed of the host processor in servicing interrupts prevented us from implementing true deterministic smoothing (all packets equispaced along an interval). We implemented a variation of deterministic smoothing which provided smoothing for large bursts (`I' frames) but still produced a bursty (albeit less bursty) periodic traffic pattern. Note that the proposed smoothing schemes serve only to show the potential performance gain resulting from smoothing; more effective smoothing schemes must be implemented in hardware or possibly at lower levels of the protocol stack. Figures 12 and 13 depict the percentage of frame losses for the 1 sender to 1 receiver and 2 sender to 1 receiver configurations, respectively, using Inter-Interval smoothing(4).
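The relative aggressiveness of the schemes can be illustrated by the peak number of cells a single stream places into any 1/48-second slot over one `I'-frame period. This is an idealized back-of-envelope model, not the measured behavior; unsmoothed, a frame is assumed to go out in the first half of its interval:

```python
I_CELLS, OTHER_CELLS, GOP = 207, 55, 16   # avg sizes and period length

def half_slot_loads(scheme, i_pos=9):
    """Cells per half fixed-frame-interval (1/48 s) slot over one
    `I'-frame period under each smoothing scheme."""
    if scheme == 'constant_bp':
        total = I_CELLS + (GOP - 1) * OTHER_CELLS
        return [total / (2 * GOP)] * (2 * GOP)   # perfectly even
    slots = [0.0] * (2 * GOP)
    for t in range(GOP):
        cells = I_CELLS if t == i_pos else OTHER_CELLS
        if scheme == 'intra' and cells == I_CELLS:
            slots[2 * t] += cells / 2       # `I' frame split across the
            slots[2 * t + 1] += cells / 2   # two halves of its interval
        else:
            slots[2 * t] += cells           # burst in the first half-slot
    return slots

peaks = {s: max(half_slot_loads(s)) for s in ('none', 'intra', 'constant_bp')}
# peaks decrease monotonically: none (207) > intra (103.5) > constant_bp (32.25)
```

The monotonically falling peaks mirror the falling loss averages and standard deviations in Figure 14, with Constant BP as the smoothest (and most buffer-hungry) extreme.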
Figure 14 shows the results of the various smoothing schemes in terms of the average percentage of frames lost and the standard deviation of frame loss between individual streams of the same run. The `no smoothing' case (from Experiment #3) is shown in contrast to the smoothing schemes. The Intra-Interval, Inter-Interval(2), Inter-Interval(4), and Constant BP schemes provide increasing degrees of smoothing. Of the Intra-Interval and Inter-Interval(2) schemes, Inter-Interval(2) performs slightly better. The Inter-Interval(4) scheme shows the greatest performance gain. For 20 streams (10 per host), Inter-Interval(4) smoothing resulted in an average MPEG stream frame loss of 14% (max = 26%, min = 7%), whereas no smoothing resulted in an average frame loss of 37% (max = 55%, min = 18%). For 16 streams (8 per host), Inter-Interval(4) smoothing resulted in an average frame loss of 5% (max = 13%, min = 1%), whereas no smoothing resulted in an average frame loss of 19% (max = 44%, min = 2%). For 8 streams (4 per host), Inter-Interval(4) smoothing resulted in an average frame loss of 2% (max = 3%, min = 0%), whereas no smoothing resulted in an average frame loss of 4% (max = 11%, min = 0%). Note that the variability (standard deviation) in the percentage of frames lost for individual streams multiplexed in the same run also decreased substantially as smoothing increased.
5 Discussion of Results and Network Requirements

In this section, using the experimental results presented in Section 4, we discuss requirements for the efficient transmission of variable bit rate sources such as coded video. Two areas are discussed: requirements for ATM switch architectures and network interfaces, and the necessity for a general flow control policy.
Flow control. At the time of this study, the components in our environment, the Fore Systems ASX-100 ATM switch and the SBA-200 ATM adapter, provided no direct support for flow control. We have shown in this study that some type of flow control is a necessity when transmitting bursty variable bit rate sources. Our results showed that a bursty stream produces a high percentage of losses even if its peak throughput is far less than the maximum sustainable throughput. For example, Figures 7 and 8 show that 16 multiplexed MPEG-1 streams produce losses for individual streams of up to 29% (1 sender, 1 receiver) and 44% (2 senders, 1 receiver), even though their peak throughput is half of the maximum sustained throughput of 41 Mbits/sec. Using TCP, for the same number of multiplexed streams (16), only 7% of the frames encountered delays of more than 3 fixed frame intervals (see Figure 10). TCP's sliding window protocol had the effect of smoothing the burstiness as the number of multiplexed streams increased, while still delivering packets within their deadlines. Most current host interface and switch architectures are designed to produce the highest possible throughputs and lowest delays. They usually achieve these goals by using greedy methods, such as injecting data as rapidly as possible, thus contributing to bursty traffic conditions and the consequent performance degradation. While these components may achieve high performance in isolation, they are not always conducive to high performance in an environment of interconnected, inter-dependent components. Flow control, whether based upon pacing (rate-based) or credit-based mechanisms, must be supported by the underlying hardware components.
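Rate-based pacing of the kind argued for here can be sketched in a few lines. The helper below is hypothetical and the rates are illustrative; it simply assigns transmit times rather than bursting at link speed:

```python
def pace(packet_sizes, rate_bps, start=0.0):
    """Assign each packet a transmit time so that the source never
    exceeds rate_bps, instead of injecting the whole burst at once."""
    t, schedule = start, []
    for size in packet_sizes:
        schedule.append(t)
        t += size * 8 / rate_bps      # serialization time at paced rate
    return schedule

# Pacing an average `I' frame (207 cells of 53 bytes) at 10 Mbits/sec
# spreads the burst over roughly 8.8 ms, rather than emitting it
# back-to-back at the 100 Mbits/sec link rate:
sched = pace([53] * 207, 10e6)
```

A credit-based scheme would achieve a similar effect by letting the receiver or switch meter out permission to send, rather than fixing a rate at the source.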
Sizing of switch output buffers. In our experiments, we consistently observed (in most runs) that the 2 sender to 1 receiver configuration performed worse (in terms of packet loss) than the 1 sender to 1 receiver configuration (see Figures 7 and 8). In the 2 sender to 1 receiver configuration, the load was spread evenly between the two sending hosts, so each sender carried only half the load of the equivalent 1 sender to 1 receiver configuration. The only network component which was more heavily loaded was the switch, which had to accept data from two ports and multiplex it onto the outgoing link leading to the single receiving host. Since the multiple video streams were invoked serially per host (but are transmitted simultaneously for the overall duration of the video clip), in the 1 sender to 1 receiver configuration the `I' frames from the multiple streams were initially staggered. In the 2 sender to 1 receiver configuration, since only half of the streams were staggered together, there were most likely more `I' frames simultaneously contending for the switch buffers. The Fore Systems ASX-100 switch has output buffers which hold only 256 ATM cells, or 12288 bytes. The average-size `I' frame from the MPEG clip and the average-size frame from the JPEG clip each consisted of approximately 228 ATM cells, and the larger MPEG `I' frames exceeded 400 cells. Considering these frame sizes, which are typical for MPEG and JPEG coded streams, it is likely that under loaded conditions buffer overflow occurred with some regularity. Clearly, ATM switches which are expected to support coded video or other traffic types with large frame sizes must provide significantly larger output buffers for the contending output streams.
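The arithmetic behind this sizing argument is worth making explicit (ATM cells carry a 48-byte payload; the helper name is ours):

```python
CELL_PAYLOAD = 48        # payload bytes per 53-byte ATM cell
BUF_CELLS = 256          # ASX-100 output buffer, per the text above

def cells_needed(frame_bytes):
    """Number of ATM cells a frame of the given size occupies."""
    return -(-frame_bytes // CELL_PAYLOAD)    # ceiling division

# 256 cells of 48-byte payload is exactly the 12288 bytes cited:
assert BUF_CELLS * CELL_PAYLOAD == 12288

# An average JPEG frame (10944 bytes) occupies 228 cells, so a single
# frame nearly fills the buffer, and two `I'-sized frames contending
# for the same output port cannot both fit:
assert cells_needed(10944) == 228
assert 2 * 228 > BUF_CELLS
```

This is exactly the situation in the 2 sender to 1 receiver configuration, where unstaggered `I' frames from the two input ports meet at one output buffer.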
Hardware support for bursty periodic sources. ATM interfaces and switches which convey a substantial amount of video traffic may provide circuitry which supports the periodicity of the sources. The number of frames per second, 30 for NTSC, is a constant which could be hard-coded into the circuitry. For sources, such as MPEG video, which produce unusually large frames (`I' frames) at known intervals, circuitry could enforce the even spacing of multiple MPEG streams such that `collisions' between these large frames are minimized. Video on demand, a video-intensive service, is an example of an application which would benefit from hardware specially designed to handle video traffic.
6 Conclusion

This paper provides insight into network support for a typical multimedia traffic type, coded video. The results show the performance an application which generates bursty video traffic can expect from an existing high speed local ATM network platform. Conclusions which can be drawn from the experiments include:
- Not only did TCP perform better than UDP, but TCP was able to efficiently support many multiplexed video streams at near peak steady state throughput rates. This was an unexpected result. Due to the real-time nature of video traffic, a best-effort or guaranteed-service protocol has usually been suggested as the most appropriate type of protocol for video traffic [12, 23, 19]. Protocols which provide end-to-end flow control and re-transmissions, such as TCP, have been considered unsuitable because it has usually been thought that the delays caused would result in packets missing their bounded delay requirements and hence becoming meaningless (discarded) [19]. From our experiments, we observed TCP's sliding window protocol to have the effect of smoothing traffic burstiness, while still delivering packets within their deadlines. When heavily loaded with periodic bursty streams (e.g., video), we observed TCP reach a steady state peak throughput of 26 Mbits/sec, and TCP transmitted multiplexed MPEG video streams with adequate performance at near steady state peak throughputs (approximately 23 Mbits/sec). Between a single sender, the Fore Systems ATM switch, and a single receiver, TCP could support 20 MPEG streams (at close to peak throughput rates) with delays within acceptable bounds. Very little variation in jitter between individual multiplexed streams was observed.

- UDP over ATM AAL5 is suitable as a transport mechanism for video only if the network is lightly loaded with video traffic. When 8 or more MPEG streams are multiplexed, unacceptable frame losses occur (greater than 10%); the peak throughput for 8 streams is close to 12 Mbits/sec. When transmitting periodic bursty streams, we observed UDP attain a steady state peak throughput of 41 Mbits/sec. The variation in frame loss percentages between multiplexed streams was high (see Figures 7 and 8).

- Controlling burstiness results in significantly fewer packet losses. This implies fewer re-transmissions for transport protocols which guarantee reliability through re-transmission, such as TCP, as well as fewer losses for unreliable transport protocols such as UDP.

- Losses are significant when transmitting traffic types with large pre-specified (non-changeable) frame sizes, like coded video. If any packet of a frame is lost, the entire frame must be discarded; a relatively small percentage of lost ATM cells may translate into a relatively large frame loss percentage for data traffic (such as coded video) which must use large frame sizes.
Areas of future study include video transmission in a wide area ATM network. In the wide area, the much longer propagation delay is expected to affect TCP's ability to efficiently support bursty periodic multiplexed streams; the behavior of TCP and UDP in a wide area ATM network must be studied. Other areas of future study, based upon our current work, include appropriate buffer sizing for ATM switches, flow control for bursty periodic sources, and hardware support for bursty periodic sources. All are discussed in Section 5.
References

[1] Agrawal, M., Guha, A., Pavan, A., "A Real-Time Multimedia Network Architecture for Time-Critical Applications", Proceedings of the IEEE Workshop on the Role of Real-Time in Multimedia/Interactive Computing, Durham, NC, November 1993.

[2] Andersen, T.M., Cornelius, R.S., "High Performance Switching with Fibre Channel", IEEE Proceedings of CompCon, pages 261-264, 1992.

[3] ANSI X3T9.3, Fibre Channel - Physical and Signalling Interface (FC-PH), 4.2 edition, November 1993.

[4] Biagioni, E., Cooper, E., Sansom, R., "Designing a Practical ATM LAN", IEEE Network, pages 32-39, March 1993.

[5] Boudec, J., "The Asynchronous Transfer Mode: A Tutorial", Computer Networks and ISDN Systems, Vol. 24, pp. 279-309, 1992.

[6] Cohen, D., Heyman, D., "A Simulation Study of Video Teleconferencing Traffic in ATM Networks", Proceedings of IEEE INFOCOM 1993.

[7] Fore Systems, Inc., ForeRunner SBA-200 ATM SBus Adapter User's Manual, 1993.

[8] Friesen, V., Wong, J., "The Effect of Multiplexing, Switching and Other Factors on the Performance of Broadband Networks", Proceedings of IEEE INFOCOM 1993.

[9] Guha, A., Pavan, A., Liu, J., Steeves, T., "Supporting Real-Time and Multimedia Applications on the Mercuri ATM Testbed", to appear in the IEEE Journal on Selected Areas in Communications - special issue on ATM LANs, 1995.

[10] Heffes, H., Lucantoni, D., "A Markov Modulated Characterization of Packetized Voice and Data Traffic and Related Statistical Multiplexer Performance", IEEE Journal on Selected Areas in Communications, Vol. 4, No. 6, September 1986.

[11] Kawarasaki, M., Jabbari, B., "B-ISDN Architecture and Protocol", IEEE Journal on Selected Areas in Communications, Vol. 9, No. 9, pp. 1405-1415, December 1991.

[12] Kung, H.T., "Gigabit Local Area Networks: A Systems Perspective", IEEE Communications, April 1992.

[13] Lin, M., Hsieh, J., Du, D., Thomas, J., MacDonald, J., "Distributed Network Computing Over Local ATM Networks", to appear in IEEE Journal on Selected Areas in Communications: Special Issue on ATM LANs, early 1995.

[14] Lyles, J., Swinehart, D., "The Emerging Gigabit Environment and the Role of Local ATM", IEEE Communications Magazine, April 1992.

[15] Maglaris, B., Anastassiou, D., Sen, P., Karlsson, G., Robbins, J., "Performance Models of Statistical Multiplexing in Packet Video Communications", IEEE Transactions on Communications, Vol. 36, No. 7, July 1988.

[16] Norros, I., Roberts, J., Simonian, A., Virtamo, J., "The Superposition of Variable Bit Rate Sources in an ATM Multiplexer", IEEE Journal on Selected Areas in Communications, Vol. 9, No. 3, April 1991.

[17] Ott, T., Tabatabai, A., Lakshman, T.V., "A Scheme for Smoothing Delay Sensitive Traffic Offered by ATM Networks", IEEE Proceedings of INFOCOM 1992.

[18] Pancha, P., El Zarki, M., "MPEG Coding for Variable Bit Rate Video Transmission", IEEE Communications Magazine, May 1994.

[19] Partridge, C., Gigabit Networking, Addison-Wesley Professional Computing Series.

[20] Pavan, A., Guha, A., Liu, J., Midani, M., Pugaczewski, J., "Experimental Evaluation of Real-Time Support on the Mercuri Wide Area ATM Testbed", in review, March 1995.

[21] Schoch, J., Hupp, J., "Measured Performance of an Ethernet Local Network", Communications of the ACM, December 1980.

[22] Shroff, N., Schwartz, M., "Video Modeling within Networks using Deterministic Smoothing at the Source", IEEE Proceedings of INFOCOM 1994, pp. 342-349.

[23] Skelly, P., Dixit, S., Schwartz, M., "A Histogram-Based Model for Video Traffic Behavior in an ATM Network Node with an Application to Congestion Control", IEEE Proceedings of INFOCOM 1992.

[24] Thekkath, C.A., Levy, H.M., Lazowska, E.D., "Efficient Support for Multicomputing on ATM Networks", Technical Report TR 93-04-03, Department of Computer Science and Engineering, University of Washington, April 1993.

[25] Wolman, A., Voelker, G., Thekkath, C.A., "Latency Analysis of TCP on an ATM Network", Technical Report TR 93-03-03, Department of Computer Science and Engineering, University of Washington, March 1993.