Network Friendly Video Streaming via Adaptive LMS Bandwidth Control
Yon Jun Chung, Jong Won Kim and C.-C. Jay Kuo
Integrated Media Systems Center, Department of Electrical Engineering-Systems
University of Southern California, Los Angeles, California 90089-2564
ABSTRACT
In this research, we examine the problem of real-time video streaming over the Internet by introducing an adaptive least-mean-squares (LMS) bandwidth controller that adjusts the amount of video data uploaded to the network so that packet loss can be minimized in the face of network congestion. The adaptive LMS bandwidth controller, which resides at the client end, sends a feedback signal to the server regarding the available bandwidth that can be supported by the network at a specified packet loss rate. The available bandwidth is continuously updated to track the ever-changing network conditions. Simulation results are provided to demonstrate the superior performance of the proposed LMS bandwidth controller.
Keywords: video streaming, real time streaming protocol, least mean squares algorithm, feedback channel, available bandwidth.
1. INTRODUCTION
The real-time transport protocol (RTP) [1] and the real time streaming protocol (RTSP) [2] have been developed to support real time video transmission over the Internet. They are usually built upon a transport-level protocol known as the user datagram protocol (UDP). UDP offers no delivery guarantee, so some packets will inevitably be dropped. In contrast with the transmission control protocol (TCP), UDP lacks the structure to retransmit lost packets between the server and the client. It is also unlikely that the Internet will, in the near future, have another protocol that can retransmit lost packets fast enough to meet the strict time constraint required by full-duplex video transmission, e.g. videophone. In the current Internet infrastructure, there is just one packet delivery class, and packet dropping is the main instrument for congestion relief. The number of UDP packets lost during transmission is largely a function of network traffic: as network traffic increases, so does the number of packets lost. Consequently, the number of dropped packets indicates the likely congestion status of the network.
Since packet loss is a fact of life in real time Internet applications, real time Internet video should cope with dropped packets. Two different approaches are commonly taken. One approach is to minimize the effect of packet loss by constructing an error resilient bit stream [6,7] and applying error detection and concealment techniques that reconstruct the data lost due to dropped packets [8]. The other approach is to decrease the network load in the hope of minimizing the possibility of packet loss in the immediate future. Under this framework, the transmitter performs adaptive packet transmission continuously for the purposes of packet loss minimization and congestion avoidance.
While the rate control algorithm implemented in the streaming server can alter the bit stream size of the underlying video, a separate mechanism is needed to determine the amount of bit stream alteration. Up to now, the most popular congestion avoidance scheme has been the multiple decrease/single increase scheme [4], in which the load is decreased with a multiple-step size and increased with a single-step size. Most previous work has addressed the impact of lost packets through only one of the above two approaches [5]. In this work, we focus on one specific type of Internet video streaming situation where there is just one receiver for every transmitter. Thus, the packet loss minimization can be coordinated jointly between the receiver and the transmitter via a feedback channel. Here, we employ RTSP, which has a control connection that handles the command communication between the transmitter (or the server) and the receiver (or the client).
We extend the functionality of this control connection to serve as a feedback channel between the server and the client. In particular, we present a point-to-point real-time Internet video transmission system that continually adapts its bit stream size to accommodate the network traffic condition. The presented solution has two components working in unison: an adaptive least-mean-squares (LMS) bandwidth controller and a real-time H.263+ rate controller. Only the LMS bandwidth controller is addressed in this paper. Several papers discuss the rate control issue of H.263+; interested readers are referred to the references in [10].
This paper is organized as follows. An overview of packet loss handling schemes associated with RTSP video is given in Section 2. The main idea of the LMS bandwidth controller and its implementation are discussed in detail in Section 3. Experimental results are presented in Section 4 to demonstrate the superior performance of the proposed LMS bandwidth controller. Finally, concluding remarks and future work are given in Section 5.
2. PACKET LOSS HANDLING IN RTSP VIDEO
2.1. RTSP Video via Modem Connection
While an entire video file can be retrieved error-free via retransmission of dropped packets, as adopted in the transmission control protocol (TCP), the significant download time limits this approach to video of a short duration (say, only a few minutes). To overcome this difficulty, streaming video over the Internet has become quite prevalent. The characteristic that separates streaming Internet video from other types of network video is that the client plays the video while it is still receiving packets from the network. Streaming video can take the form of either precompressed video (e.g. a video-on-demand server) or real-time captured and coded video (e.g. video conferencing and video phones). The web page given in [3] provides a good overview of commercially available streaming video products.
Most streaming video applications have employed a combination of RTP and UDP as their packet protocol. Selecting the RTP/UDP packet protocol over TCP is a reasonable tradeoff for the following reasons. First, the time associated with retransmitting a dropped TCP packet adds too much delay for the retransmitted packet to be of use in video. Second, TCP has no notion of time, which RTP supplies through the timestamp in its header. To respond to this need, the Audio/Video Transport Working Group of the Internet Engineering Task Force (IETF), which standardized RTP [1], has developed several profiles for audio/video transmission along with several payload formats for H.261/H.263/H.263+, JPEG, MPEG, etc. Having forsaken packet retransmission, Internet video applications must contend with dropped packets and the data contained within. In this work, we employ the real-time streaming protocol (RTSP) as the streaming level protocol, which has a control connection that handles the command communication between the transmitter (or the server) and the receiver (or the client).
In addition, we extend the functionality of this control connection to serve as a feedback channel between the server and the client.
There are several Internet video streaming implementations available today. Unfortunately, the performance of these integrated systems is at the mercy of network traffic. With modem Internet access in particular, their performance ranges from marginal in a low traffic condition to complete breakdown in a heavy traffic condition. The objective of our current research is to implement a video streaming system that can continually adapt to changing network conditions in a strict environment, as explained below. The block diagram of our current prototype system, which was implemented utilizing the Internet modem connection of the USC modem pool, is shown in Fig. 1. Most real-time video streaming has a strict restriction on tolerable delay. One consequence is that the computational complexity of each component in the system has to be low, so that the cumulative computational cost of these components is sufficiently small for the system to run in real time. We have put considerable effort into making this prototype system satisfy the real-time video streaming requirement over an Internet modem connection with a speed of less than 44 Kbps. For a detailed description of this implementation, we refer to our previous work [5].
2.2. Error Control Approach to Packet Loss
Figure 1. The block diagram for a prototype system for real time video streaming via the Internet modem connection.
Some general assumptions about Internet behavior can serve as a guideline for the proposed approach. As mentioned before, some packets are bound to be dropped by the network in a real time video environment. Thus, once the client detects a lost packet, it can make a two-fold response. First, the client should localize the portion of the picture corresponding to the lost packet and conceal the error through a certain error concealment technique. Second, if there exists a feedback channel from the client to the server, as in our case, the client can notify the server of the occurrence of the lost packet and its corresponding location in the picture.
Let us first address the error concealment issue briefly. The packet loss should be accepted as unavoidable, and the picture data lost in the dropped packet should be restored with some error concealment technique prior to playback. If a large portion of the video picture is missing due to packet loss, error concealment methods are of very limited use. In the practical context of RTSP video employing the H.263+ compression standard, the data forfeited in a single packet loss rarely constitute an entire frame. It is more common that horizontal slices or macroblocks (MBs) are missing. Reconstruction algorithms in both the spatial and temporal domains have been proposed to restore the missing MBs [8]. In H.263+, lost packets manifest as missing horizontal strips, or groups of blocks (GOBs). The most prevalent concealment techniques chosen for quick execution include repetition of the missing GOBs from the previous picture and spatial interpolation. While fast, the replication method can result in jagged and abrupt edge discontinuities. The spatial interpolation scheme makes concealed GOBs blurred and of a lower resolution in comparison with their surrounding GOBs. An alternative concealment technique, adopted in our implementation, is based on the interpolation of DCT coefficients; it offers an improved resolution and yet has a complexity low enough to run in real time. Our implementation is a modified version of the work presented in [8], where the DCT coefficients of the missing MBs are interpolated according to a system of linear operations that decides to what degree the DCT coefficients of the four neighboring MBs contribute toward the reconstruction.
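As a rough illustration of the simplest concealment option mentioned above (repetition of missing GOBs from the previous picture), the following Python sketch copies the pixel rows of each lost GOB from the previous decoded picture. It is not the DCT-interpolation method adopted in this work. The function name, the list-of-rows picture representation, and the 16-row GOB height (one macroblock row, as in QCIF) are assumptions made only for the example.

```python
GOB_HEIGHT = 16  # assumed: one macroblock row (16 pixel rows) per GOB, as in QCIF


def conceal_by_repetition(current, previous, missing_gobs, gob_height=GOB_HEIGHT):
    """Return a copy of `current` with the rows of each missing GOB taken from `previous`.

    Pictures are lists of pixel rows (lists of ints); this layout is hypothetical.
    """
    if len(current) != len(previous):
        raise ValueError("pictures must have the same number of rows")
    repaired = [row[:] for row in current]  # do not modify the caller's picture
    for gob in missing_gobs:
        start = gob * gob_height
        stop = min(start + gob_height, len(repaired))
        for y in range(start, stop):
            repaired[y] = previous[y][:]
    return repaired


# Toy picture with two GOBs: GOB 1 of the current picture was lost.
previous = [[1] * 4 for _ in range(32)]
current = [[0] * 4 for _ in range(32)]
repaired = conceal_by_repetition(current, previous, missing_gobs=[1])
```

Rows 16 to 31 of `repaired` come from `previous`, while rows 0 to 15 keep the correctly received data. The abrupt seam between the two regions is the kind of edge discontinuity described above.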
However, as good as these concealment methods are, the reconstructed slices are still of a lower resolution than the original ones. The fact that motion estimation in H.263+ is performed on adjacent pictures exacerbates the resolution deterioration, since the adjacent picture dependency propagates the reconstruction error to subsequent pictures. A better solution would be to insert an intra-coded I picture immediately following the occurrence of a packet loss. However, this solution is not a robust one either. The size of an intra-coded I picture tends to be, on average, about an order of magnitude larger than that of an inter-coded P picture. If an intra-coded I picture is sent as a means of correcting a problem resulting from lost packets, it is unlikely that such a large picture will arrive loss-free. It might be more advisable to transmit in the intra-coded format only the locations corresponding to the MBs missing due to lost packets, and the remainder in the inter-coded P picture format, which consumes less bandwidth. Based on error tracking through the feedback, this adaptive intra-refresh (AIR) technique can achieve the desired refreshing effect without burdening the data connection much. It uses only twice (rather than ten times) the number of bits of a pure inter-coded P picture, and still achieves the desired result of removing the reconstruction error propagation. To realize the AIR technique as a means of video quality improvement, all that remains is for the client to notify the server of the locations of the missing MBs. This can be done utilizing the control connection in RTSP as a feedback channel. The AIR method of employing the hybrid I/P picture should be bandwidth economical. Furthermore, this solution is especially well suited for H.263+, since H.263+ does not have a repeating picture cycle structure such as the group of pictures (GOP) and, consequently, the pictures are encoded using only the previous picture as the reference frame. Since every picture depends only on its previous picture, the visual gain from removing the persistence of reconstruction errors caused by missing MBs should be very pronounced.
In addition to the AIR technique, another error resilience technique named reference picture selection (RPS) is proposed in H.263+. This method is somewhat complicated but may be more effective in handling packet loss. It restricts and synchronizes the transition to the next reference frame (the reference for motion prediction), allowing it only after the reconstruction of that reference frame has been completed without error at the client side. The bit usage of intra-coded MBs can thus be avoided at the cost of extra motion search.
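The server-side bookkeeping behind the AIR idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the "intra"/"inter" labels, and the 99-macroblock frame (the QCIF macroblock count) are hypothetical choices, and the actual encoder integration is not shown.

```python
def air_coding_modes(num_macroblocks, lost_mb_indices):
    """Choose a coding mode per macroblock for the next picture.

    Macroblocks reported lost by the client are intra-coded to stop error
    propagation; all others stay inter-coded to save bits.
    """
    lost = set(lost_mb_indices)
    out_of_range = [i for i in lost if not 0 <= i < num_macroblocks]
    if out_of_range:
        raise ValueError(f"macroblock indices out of range: {sorted(out_of_range)}")
    return ["intra" if i in lost else "inter" for i in range(num_macroblocks)]


# Hypothetical feedback: the client reports three lost macroblocks in a 99-MB frame.
modes = air_coding_modes(99, [11, 12, 13])
```

Only the three reported macroblocks are refreshed, which is why the hybrid picture costs far fewer bits than a full intra-coded picture.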
2.3. Error Avoidance Approach to Packet Loss
Lost packets are a fact of life for real time Internet applications, and real time Internet video should cope with dropped packets. Besides the error control approach described above, we focus in this paper on the other facet of the lost packet problem, namely error avoidance. Our objective is to decrease the network load in the hope of minimizing the possibility of packet loss in the immediate future. In this manner, the transmitter can continuously perform adaptive packet transmission for the purpose of packet loss minimization and congestion avoidance. While rate control in the streaming server can alter the bit stream size, a separate entity should dictate the amount of bit stream alteration. Up to the present, the most popular congestion avoidance scheme has been the multiple decrease/single increase scheme [4]. However, with the support of a feedback channel from the receiver to the server, the receiver can signal the state of the network traffic back to the transmitter. Of course, feedback schemes are suitable primarily for interactive, individual point-to-point communications. Furthermore, they are highly dependent on the round-trip delay: an increase in the round-trip delay inevitably leads to a less efficient and eventually useless feedback system. However, an application such as an Internet videophone represents a situation where there is just one receiver for every transmitter. Thus, the packet loss minimization can be coordinated jointly between the receiver and the transmitter via a feedback channel. Similar to the client, the server (or the H.263+ encoder inside the server) should monitor the status reported by the client and modulate the size of the bit stream placed on the network accordingly. The encoder can vary either the quantization step size or the frame rate to modulate the size of the encoded bit stream.
The server can infer the network delay from the packet inter-arrival times, and this delay commonly varies by as much as an order of magnitude during the same transmission. This implies that the encoder in the server should be able to continuously reduce the encoded bit stream size by as much as tenfold. Ideally, this reduction would be based on thresholds derived from the rate-distortion curve as a function of packet loss rate vs. video quality. More details on how to adjust the size of the bit stream are given in Section 3.2.
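For comparison with the feedback-based approach, the multiple decrease/single increase rule of [4] can be sketched as follows. This is a minimal illustration that assumes a multiplicative decrease and an additive increase; the function name, the step size, the decrease factor, and the rate floor are hypothetical values, not taken from [4] or from this work.

```python
def aimd_step(rate_bps, congested,
              increase_bps=10_000.0,   # hypothetical single-step increase
              decrease_factor=0.5,     # hypothetical multiplicative decrease
              floor_bps=8_000.0):      # hypothetical minimum sending rate
    """Return the next sending rate under a decrease-fast, increase-slow rule."""
    if congested:
        return max(rate_bps * decrease_factor, floor_bps)
    return rate_bps + increase_bps


rate = 100_000.0
rate_after_loss = aimd_step(rate, congested=True)    # halved
rate_after_quiet = aimd_step(rate, congested=False)  # one step up
```

Note that this rule reacts only to a binary congestion signal, whereas a feedback controller at the client can scale its correction by how far the measured loss rate is from the acceptable one.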
3. NETWORK FRIENDLY VIDEO STREAMING
3.1. Adaptive LMS Bandwidth Controller
Some UDP packet loss is accepted as unavoidable. However, if the client can instruct the server to send packets of a certain size at a certain time interval, then the number of packets dropped can be reduced to a tolerable level. A feedback channel controller residing on the client side can perform this task. Most controllers need a model of the system they are trying to control. Unfortunately, the Internet's intrinsic characteristics have defied attempts to model it adequately. First of all, it is a non-centralized distributed network with best-effort delivery. The myriad of network configurations and setups that make up the Internet constitute a heterogeneous environment. Second, the Internet's recent doubling in size every six months only exacerbates the difficulty of modeling it. Thus, as a way around this modeling problem, we selected a class of model-less controllers called adaptive LMS controllers. While the response time of an adaptive LMS controller is slower than that of its model-based counterpart, its robustness makes it a suitable candidate to control a nonlinear and non-stationary system such as the Internet.
The adaptive LMS control is performed based only on output observations. In our case, the observations are the sequence numbers and inter-arrival times of the packets arriving at the client sockets. From the size and the transmission time of the successfully transmitted real-time transport protocol (RTP) packets, which are built upon UDP, network traffic conditions can be inferred. By examining the sequence numbers of the arriving packets, we can surmise which packets were lost and hence the packet loss rate. Likewise, we can observe the inter-arrival time and the payload size of the arriving packets to estimate the bandwidth the Internet is capable of sustaining at that particular instant.
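The two client-side observations described above can be sketched as follows. This is an illustrative Python fragment, not the implementation used in this work: the `Packet` record and both function names are hypothetical, the loss estimate ignores sequence-number wraparound and reordering, and the bandwidth estimate uses only the last two arrivals.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    seq: int            # RTP sequence number
    arrival_s: float    # arrival time at the client socket, in seconds
    payload_bytes: int  # payload size in bytes


def loss_rate(packets):
    """Fraction of sequence numbers missing between the first and last seen."""
    if not packets:
        return 0.0
    seqs = [p.seq for p in packets]
    expected = max(seqs) - min(seqs) + 1
    received = len(set(seqs))
    return (expected - received) / expected


def instantaneous_bandwidth(prev, curr):
    """Payload bits of the latest packet divided by the gap since the previous one."""
    gap = curr.arrival_s - prev.arrival_s
    if gap <= 0:
        raise ValueError("arrival times must be strictly increasing")
    return curr.payload_bytes * 8 / gap


# Hypothetical trace: packet 3 never arrives; 512-byte packets spaced 7.5 ms apart.
received = [
    Packet(1, 0.0000, 512),
    Packet(2, 0.0075, 512),
    Packet(4, 0.0225, 512),
    Packet(5, 0.0300, 512),
]
loss = loss_rate(received)
bandwidth_bps = instantaneous_bandwidth(received[-2], received[-1])
```

In this trace one of five expected packets is missing, and the last two arrivals correspond to roughly 0.55 Mbps of payload.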
Figure 2. The conceptual system block diagram for the adaptive LMS bandwidth controller.
Based on its observations, the adaptive LMS controller at the client side, shown in Fig. 2, instructs the server to transmit the bit stream according to the desired packet size and inter-arrival time. In principle, an LMS controller has to be designed to operate on these two observation variables. However, as a simplified approach, we adopted in this work a single replacement variable, which quantifies the ratio of the packet size to the time interval for its transmission. We call this variable the instantaneous available bandwidth and denote it as ABW. The actual implementation form of the LMS control can be described as
$$ABW_{k+1} = ABW_k + 2\mu\,(\epsilon_{TH} - \epsilon_k)\,\frac{pkt}{\tau}, \qquad (1)$$
where $k$ denotes a time index according to some internal control clock, which has a resolution comparable to that of the packet transmission clock, $\epsilon_k$ is the packet loss rate at time $k$, $\epsilon_{TH}$ is the threshold for the acceptable (or desired) packet loss rate, $pkt$ is the size of the last successfully transmitted packet, $\tau$ is the inter-arrival time between the last two packets, and $\mu$ is the adjustment control parameter determined through empirical tuning. The ratio $pkt/\tau$ is roughly equivalent to the available bandwidth that the client is currently experiencing. Thus, if packet loss is detected (i.e., the packet loss rate $\epsilon_k$ increases beyond the acceptable loss rate $\epsilon_{TH}$), the controller reduces the available bandwidth $ABW_{k+1}$ for the next instant.
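A single step of the update rule in Eq. (1) can be written out as the short Python sketch below. The function and argument names are hypothetical, the numeric values (including the adjustment parameter) are illustrative rather than the tuned values of this work, and the clamp at zero is an added safeguard that is not part of Eq. (1).

```python
def lms_update(abw_bps, loss_rate, loss_threshold, pkt_bits, inter_arrival_s, mu):
    """One step of Eq. (1): move ABW by 2*mu*(threshold - loss) times pkt/inter-arrival."""
    if inter_arrival_s <= 0:
        raise ValueError("inter-arrival time must be positive")
    observed_bw = pkt_bits / inter_arrival_s  # the ratio of packet size to inter-arrival time
    new_abw = abw_bps + 2.0 * mu * (loss_threshold - loss_rate) * observed_bw
    return max(new_abw, 0.0)  # added safeguard: never request a negative bandwidth


abw = 550_000.0  # current available-bandwidth estimate in bits per second
common = dict(loss_threshold=0.02, pkt_bits=4096, inter_arrival_s=0.0075, mu=0.5)

lossy = lms_update(abw, loss_rate=0.10, **common)   # loss above threshold: ABW shrinks
clean = lms_update(abw, loss_rate=0.00, **common)   # loss below threshold: ABW grows
steady = lms_update(abw, loss_rate=0.02, **common)  # loss at threshold: ABW unchanged
```

Because the correction is proportional to the gap between the measured and acceptable loss rates, the estimate backs off sharply under heavy loss and creeps upward when the loss rate falls below the threshold.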
3.2. Real Time Rate Control for Bandwidth Conforming Server
There are three known methods for reducing the size of an H.263 bitstream: coarser quantization, prefiltering (scalable, spatial), and frame dropping [9]. This ordering also coincides with the degree of impact. For example, reducing the number of quantization levels, while being the easiest, has the least effect on bitstream size and video quality. Analogously, frame dropping shrinks the bitstream the most but is at the same time the least visually pleasing.
In H.263+, the number of quantization levels can be adjusted from as many as 256 to as few as 8 (25 levels are commonly used). The quantization step of every frame type (I, P and B) can be adjusted. The advantage of quantization level adjustment is that its effect on video quality is not too drastic. Its disadvantage lies in its limited ability to reduce the bitstream size. If one starts with a compressed sequence with 26 quantization levels (110 Kbps), then the bitstream can be reduced at most to around 40 Kbps (with 12 quantization levels) before the video quality deteriorates beyond an acceptable level. That is a reduction of only 2.75 to 1.
On the other hand, prefiltering can be construed as a form of resolution reduction. In essence, the resolution is reduced prior to compression. The advantage of this approach is that scalability is introduced. The uncompressed raw video frames can be retained according to SNR or spatial scalability (the only difference between the two is that, in the latter, the low resolution video is a fraction of the size of the high resolution video). In this way, the resolution of the resulting video at the client side is roughly known prior to compression.
Last, frame dropping yields the most drastic bitstream reduction and is also the scheme most detectable by human eyes. Frame dropping is nothing more than temporal prefiltering. In order to use frame dropping, both the encoder and the decoder should have multitasking threads activated to accommodate the changing temporal reference field in the picture header. A better approach is to perform rate control at the frame level according to the feedback from the controller, and then to encode the frames into a format where a packet has no dependency on its adjacent packets. In this work, we adopt a fast rate controller for the H.263+ video codec given in [10], which is suitable for real-time applications.
While most rate control algorithms examine bit allocation at the macroblock (MB) level, this method looks at bit allocation from the frame level perspective. Although performing bit allocation at the broader frame level is more difficult than at the limited MB level, it does yield greater control over temporal quality. This scheme allows us to estimate the optimal rate for the current frame according to a specific cost function that includes the bandwidth, all with low computational complexity and no time delay.
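The quantization figures quoted earlier in this section can be checked against the tenfold reduction mentioned in Section 2.3 with a few lines of arithmetic. The Python fragment below is only a worked check; the variable and function names are hypothetical.

```python
def reduction_ratio(start_kbps, end_kbps):
    """How many times smaller the bitstream becomes."""
    return start_kbps / end_kbps


# 110 Kbps at 26 quantization levels down to about 40 Kbps at 12 levels.
quantization_ratio = reduction_ratio(110.0, 40.0)

# Section 2.3 asks for a reduction of as much as tenfold.
required_ratio = 10.0
quantization_alone_suffices = quantization_ratio >= required_ratio
```

The ratio of 2.75 falls well short of ten, which suggests why quantization adjustment alone is not enough and why the frame rate is also varied by the rate controller.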
4. EXPERIMENTAL RESULTS
To measure the performance of the network friendly transmission control via adaptive LMS bandwidth control, the following simulation was conducted. In order to generate a reproducible network environment for easy comparison, we chose a discrete event simulator targeted at networking research, i.e. the network simulator (NS) [12]. NS is the output of a joint research project spearheaded by ISI (Information Sciences Institute, a USC research lab) and LBNL (Lawrence Berkeley National Lab). Due to the sheer scope of the networks to be simulated, NS (now in version 2) is still in its relative infancy. However, since it provides substantial support for the simulation of TCP, routing, and multicast protocols, it is sufficient to serve our needs. It has also quickly become the tool of choice for network researchers.
Figure 3. Node configuration of the adaptive LMS controller simulation in NS, where the thin dotted lines denote all other traffic and the thick dotted line denotes the traffic assigned for the comparison.
The node configuration for the NS simulation of the proposed adaptive LMS controller is depicted in Fig. 3. There are initially 6 nodes (marked by circles), and the traffic on the link from node2 to node3 is bounded by the limit of 2.7 Mbps. There are four 0.55 Mbps connections, which begin at four different times:
at 0.231 sec, 512 byte packets every 7.5 ms (node0 to node5);
at 0.486 sec, 1024 byte packets every 15 ms (node1 to node4);
at 0.607 sec, 256 byte packets every 3.75 ms (node4 to node0);
at 1.45 sec, 128 byte packets every 1.875 ms (node5 to node1).
[Figure 4 plots, each showing bits per second (x 10^6) versus seconds: (a) NS Reproducible Test Bed Network Conditions; (b) Non LMS CBR Loaded onto the Test Bed @ 1 second; (c) LMS Controlled Source Loaded onto the Test Bed @ 1 second; (d) All Three Plots Seen Together.]
Figure 4. Bandwidth usage for transmission and bandwidth lost by packet loss: (a) before the fifth traffic, (b) with the fifth traffic without LMS, (c) with the fifth traffic adjusted by the LMS bandwidth control, and (d) all plots together for comparison. (In (b)-(d), the bottom lines correspond to the bandwidth lost by the packet losses.)
The bandwidth usage of these four transmissions before applying the main fifth traffic (marked by the thick dotted line) is shown in Fig. 4(a); the bandwidth usage has some margin to the 2.7 Mbps limit over the whole period. Our objective is to start a fifth traffic at 1 sec, with 512 byte packets loaded every 7.5 ms from node6 to node7 (thick dotted line of Fig. 3), once with LMS control and once with just a constant bit rate, and to compare the performance of the two. The results for the constant bit rate case are shown in Fig. 4(b), where packet losses begin almost immediately; once again, the lower graph is the amount of data lost due to dropped packets. In comparison, if we start loading 512 byte packets at 1 sec according to the LMS feedback information, the result is as shown in Fig. 4(c). In this case, the LMS controller starts off at 0.55 Mbps. When it encounters a loss at around 1.45 seconds, it reduces its output to a steady state of around 0.24 Mbps. Thus, the corresponding loss is not as severe as in the unmodulated RTP case shown in Fig. 4(b). Finally, all traces are put together for comparison in Fig. 4(d). The working visual interface generated by the network animator (NAM) of NS is shown in Fig. 5. With the help of NAM, we can observe the dynamic behavior of the above network traces, which cannot be reproduced on paper. It is nevertheless easy to follow the packet loss instances and compare the performance with and without LMS control.
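The nominal rates of the connections used in this simulation can be verified with simple arithmetic. The Python fragment below is a worked check only: it counts payload bits, ignores header overhead, and assumes purely for illustration that every flow crosses the 2.7 Mbps link from node2 to node3. All names in it are hypothetical.

```python
LINK_LIMIT_BPS = 2.7e6  # limit on the node2-to-node3 link


def cbr_rate_bps(packet_bytes, interval_ms):
    """Payload bit rate of a constant bit rate source."""
    return packet_bytes * 8 / (interval_ms / 1000.0)


# (packet size in bytes, sending interval in ms) for the four background connections.
background = [(512, 7.5), (1024, 15.0), (256, 3.75), (128, 1.875)]
rates = [cbr_rate_bps(size, interval) for size, interval in background]

four_flow_load = sum(rates)                                # background only
five_flow_load = four_flow_load + cbr_rate_bps(512, 7.5)   # plus the fifth traffic
```

Each connection works out to about 0.546 Mbps, so the four background connections together (about 2.18 Mbps) stay under the limit, while adding an unthrottled fifth one (about 2.73 Mbps) exceeds it. This is in line with the loss that the LMS-controlled source encounters once the last background connection starts at 1.45 seconds.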
5. CONCLUSION AND FUTURE WORK
A coordinated approach between the server and the client for dealing with lost packets, from the network transmission viewpoint, was proposed in this paper. As network conditions vary, so does the packet loss rate, which in effect changes the bandwidth available to Internet applications. While previous efforts, including error concealment and rate control, have been presented as solutions to this problem, they are executed at only one end of the server-client pair. Our focus was on the design of a video streaming system that can harmonize these solutions. As a partial result of this effort, we showed the possibility of network friendly video streaming via adaptive LMS bandwidth control based on feedback. As a possible extension of this work, a coordinated combination with error control techniques such as error resilient coding and concealment is desirable. In doing this, special emphasis will be placed on the harmonization of component algorithms at both ends of the server-client pair in order to maximize the combined gain of the individual algorithms via the feedback channel. In other words, the feedback channel would enable the server and the client to work in tandem and continuously adjust to changing traffic conditions.
REFERENCES
1. H. Schulzrinne, S. Casner, R. Frederick and V. Jacobson, RTP: A Transport Protocol for Real-Time Applications, RFC 1889, Internet Engineering Task Force, Audio-Video Transport Working Group, Jan. 1996.
2. H. Schulzrinne, A. Rao and R. Lanphier, Real Time Streaming Protocol (RTSP), RFC 2326, Internet Engineering Task Force, Multiparty Multimedia Session Control Working Group, May 1998.
3. J. Hunter, V. Witana and M. Antoniades, "A review of video streaming over the Internet," http://www.dstc.edu.au/RDU/staff/jane-hunter/video-streaming.html, Project SuperNOVA, Distributed Systems Technology Centre, Australia.
4. D. Chiu and R. Jain, "Analysis of the increase/decrease algorithms of congestion avoidance in computer networks," Computer Networks and ISDN Systems, vol. 17, pp. 1-14, June 1989.
5. Y. Chung, J. Kuo, "Non-disruptive RTSP video over the Internet using a modem connection," SPIE Visual Communication and Image Processing, San Jose, Jan. 1998.
6. E. Steinbach, N. Färber and B. Girod, "Standard compatible extension of H.263 for robust video transmission in mobile environments," IEEE Trans. on Circuits and Systems for Video Technology, vol. 7, no. 6, pp. 872-881, Dec. 1997.
7. N. Färber and B. Girod, "Robust H.263 compatible video transmission for mobile access to video servers," in Proc. of ICIP '97, vol. 2, pp. 73-76, Santa Barbara, Oct. 1997.
8. J. W. Park, J. W. Kim and S. U. Lee, "DCT coefficient recovery based error concealment technique and application to the MPEG-2 bit stream error," IEEE Trans. on Circuits and Systems for Video Technology, vol. 7, no. 6, pp. 845-854, Dec. 1997.
9. M. Willebeek-LeMair and Z. Shae, "Videoconferencing over packet-based networks," IEEE Journal of Selected Areas in Commun., vol. 15, pp. 1101-1114, Aug. 1997.
10. H. Song and J. Kuo, "Rate control for low bit rate video via variable frame rates and hybrid DCT/wavelet I-frame coding," submitted to IEEE Trans. on Circuits and Systems for Video Technology, April 1998.
11. T. Chiang and Y. Zhang, "A new rate control scheme using quadratic rate distortion model," IEEE Trans. on Circuits and Systems for Video Technology, vol. 7, no. 1, pp. 246-250, Feb. 1997.
12. UCB/LBNL/VINT Network Simulator - ns (version 2), http://www-mash.cs.berkeley.edu/ns/.
Figure 5. Snapshot of the network animator (NAM): (a) with the fifth traffic without LMS, and (b) with the fifth traffic adjusted by the LMS. (The two bottom boxes starting with triangular marks stand for the bandwidth used and packets dropped from node 2 to node 3, respectively.)