SCALABLE MULTIMEDIA STREAMING MODEL AND TRANSMISSION SCHEME FOR VBR-ENCODED VIDEOS

Md. H. Kabir
Eric G. Manning
Gholamali C. Shoja
PANDA Research Lab, Department of Computer Science, University of Victoria, Victoria BC, Canada
[email protected],
[email protected],
[email protected]
ABSTRACT
Streaming audio/video content over the Internet requires large network bandwidth as well as timely delivery and playback of the media data. Media files are huge, requiring compression techniques that produce traffic bursts. An entire audio/video media file cannot be cached, due to the intellectual property rights concerns of the content owner, for security reasons, and because of its large size. Large network latency results in a long start-up delay, and jitter causes frequent unwanted pauses in playback. For these reasons, a multimedia streaming service is hard to scale. In this paper, we propose a proxy-based streaming model and transmission scheme that deals with the above issues efficiently for stored, non-interactive media data. Our model scales up cost-effectively with an increasing number of streams and clients. We use prefix buffers at the proxy to assist in the smoothing operation. In addition, smoothing buffers are used to eliminate unwanted pauses in playback, to enable many clients to share the same server stream, and to reduce the peak bandwidth requirement on the server-proxy-client path. Simulation results are presented to demonstrate the cost effectiveness of the proposed streaming scheme.
KEY WORDS
Distributed software systems and applications, multimedia systems, scalable streaming, proxy servers, VBR-encoded videos, prefix and smoothing buffers

1. Introduction
Streaming audio and video over a network requires timely delivery, such that the streamed media data can be processed as a steady and continuous stream at the receiver end. In general, compressed audio/video files, such as MPEG-2 [1] files, are sent from the server in a steady stream and decompressed in real time by the media player before playback. Each compression or encoding technique has a mean bit rate; for example, the mean bit rate of MPEG-2 is 4 Mbps. The encoding technique for streaming can be either constant-bit-rate (CBR) or variable-bit-rate (VBR). CBR encoding enforces a constant bit rate over a group of pictures (GOP) regardless of the complexity of the video interval [2]; for a CBR-encoded stream, the channel bit rate is equal to the encoding rate. A video of a static image, like a bowl of fruit, needs a much lower bit rate than a video of a hockey game. CBR encoding therefore reduces the degrees of freedom of the encoding task, and it reduces the quality of action-filled video intervals such as a wrestling match. For these reasons, VBR encoding is preferred to CBR encoding. In VBR encoding the quantization scale is simply kept at a constant value [2] and no rate buffer is placed at the output of the encoder; the bit rate over a GOP therefore varies with the complexity of the video interval. VBR-encoded streams have frequent traffic bursts, so a high peak bandwidth is needed to transmit VBR-encoded video.

The available network bandwidth of a streaming server is not enough to serve thousands of clients who want to watch the latest popular movies at the same time. The server's limited disk bandwidth, memory size, and processing speed also constrain its ability to serve many clients concurrently. Network latency causes large start-up delays, while network jitter inserts pauses in the playback. The frequent traffic bursts of VBR-encoded videos also insert frequent pauses in the playback. Moreover, it is not possible to cache entire streaming media files at a streaming proxy, for two reasons. First, streaming media files are typically very large. Second, for business and security reasons, the owners of the streaming media content do not want the proxy or the client to cache the whole media file. All of the above issues make a streaming service over the Internet very hard to scale economically.

In this paper, we present a proxy-based scalable streaming model and transmission scheme for VBR-encoded stored videos. Our transmission scheme also works with CBR-encoded videos. We use prefix buffers at the proxy to store only the first few frames, or prefixes, of popular videos in order to reduce start-up delay and to assist in smoothing out jitter and traffic-burst effects. We also use smoothing buffers at the proxy to enable many clients to share the same server stream and to smooth out jitter and traffic bursts between the server and the proxy.
The main contributions of our work are as follows:

a) Smoothing measures between the server and the proxy reduce the peak bandwidth requirement between them while the server is transmitting a VBR-encoded video.

b) Sharing the same server stream among many clients through a smoothing buffer makes the transmission scheme scalable for both CBR- and VBR-encoded videos.

c) Client requirements are kept minimal, i.e., a client with only a single channel and a small buffer can economically receive a stream using our transmission scheme.

The rest of the paper is organized as follows. In Section 2 we describe related work and explain why our work is different and more scalable. In Section 3, our streaming model and transmission scheme are presented. Experimental results showing the performance of our model, and a comparison with other models, are given in Section 4. Section 5 concludes the paper by summarizing the benefits of our model and elaborating on our future research plans for media streaming.

2. Related Research Work
Sen et al. [3] use a proxy with prefix and work-ahead smoothing buffers to help mask network latency and jitter. The proxy stores the prefixes of popular videos. After intercepting a client's request, the proxy immediately starts transmitting the prefix to the client and simultaneously sends a request for the suffix to the origin server. The proxy buffers the incoming suffix from the server in the smoothing buffer and transmits the suffix from the smoothing buffer when it finishes the prefix transmission. Prefix caching thus hides the network delay and helps in performing work-ahead smoothing to absorb network jitter. The authors also use the prefix and smoothing buffers to perform online smoothing, which reduces the peak bandwidth requirement only between the proxy and the client, not between the server and the proxy. The proxy computes a smoothed proxy-to-client transmission schedule, consisting of multiple runs of constant-bit-rate transmission, using the prefix, smoothing, and playback buffers. The scheme does not allow sharing of the smoothing buffer: each client request needs a separate smoothing buffer at the proxy, which requires a huge amount of buffer space to stream many videos to a large number of clients. The proxy uses neither the prefix buffer nor the smoothing buffer to enable many clients to share the same suffix from the server, i.e., the transmission scheme is not scalable. No smoothing measure is taken between the server and the proxy, so the peak bandwidth requirement between them remains high and the bandwidth allocation process at the server remains complicated. Moreover, the proxy has to absorb large traffic bursts from the server, which again requires a larger work-ahead smoothing buffer at the proxy for each video and for each client. In contrast, we use the smoothing buffer to enable many clients to share the same suffix from the server, which makes our transmission scheme scalable, and we do work-ahead smoothing between the server and the proxy to reduce the peak bandwidth requirement for VBR traffic between them. All of this makes our work more scalable and cost-effective than that of [3].

Wang et al. [4] use a proxy with only prefix buffers to batch (SBatch), patch (UPatch, MPatch), or merge (MMerge) several client requests for the same stream, in order to share a suffix stream from the server while reducing the effect of network latency. Their approach can handle only constant-bit-rate (CBR) encoded videos; since no smoothing buffer is used, variable-bit-rate (VBR) encoded videos cannot be smoothly streamed by their methods. It also cannot handle network jitter, since no work-ahead smoothing is done. All of their transmission schemes assume just-in-time arrival of streaming data at the proxy, an assumption that cannot be guaranteed in the Internet; in fact, just-in-time arrival is a costly proposition in a network even when it is feasible. Their transmission schemes also need more than one channel between the proxy and the client, a requirement that may not be economically feasible in many cases, such as current Digital Subscriber Loops (DSLs). We use smoothing buffers instead of prefix buffers to enable many clients to share the same suffix stream, so our transmission scheme is effective and scalable for both VBR and CBR traffic. The use of a smoothing buffer and work-ahead smoothing makes our scheme capable of handling both network jitter and VBR traffic bursts. Our transmission scheme does not need multiple channels between a proxy and a client. These are significant improvements over the work done in [4].
3. Proposed Model and Transmission Scheme

3.1 Model

We propose a streaming model in which a server streams multiple videos to multiple clients through one or more proxies. We assume the server and the proxies are connected via the Internet, with varying degrees of QoS guarantee and with only unicast enabled. Best-effort IP service is not suitable for our model: we need guaranteed bandwidth and a limit on the maximum allowable delay and jitter between the server and the proxy, i.e., a QoS guarantee in the core network. We also assume that the path between the proxy and the client uses a link with a high QoS guarantee, such as DSL, cable, Gigabit Ethernet (GigE), or a Passive Optical Network (PON). These assumptions are realistic for currently deployed and soon-to-be-deployed technology in North America and the North Pacific. Figure 1 shows a simplified diagram of our model. Our aim is to make the transmission scheme highly scalable by sharing a server-to-proxy stream among as many clients as possible and by reducing the peak bandwidth requirement on the server-proxy-client path for VBR-encoded stored videos.
[Figure 1 shows the streaming server connected through the Internet (core network) to proxies serving xDSL and cable clients. Each proxy holds a circular smoothing buffer for the video suffix, with one write pointer and per-client read pointers, and delivers the prefix and the suffix to clients 1 and 2.]
Figure 1. System model for streaming

3.2 Unicast Suffix Merging (UniSMerge) Transmission Scenario

This is a proposal for a new streaming scheme, aimed at addressing the shortcomings of the existing work noted above. Each proxy stores the prefix, of length B_prefix seconds, of a video in advance, to overcome network latency and to help in work-ahead smoothing. Work-ahead smoothing is done to overcome jitter and the traffic bursts of VBR-encoded videos, and to reduce the peak bandwidth requirement between the server and the proxy. We assume that the proxy does work-ahead smoothing over a window of W seconds for a video. The proxy intercepts a client's request, starts streaming the prefix immediately, computes the suffix length, and sends a suffix transmission request to the origin server. It caches the incoming suffix in the smoothing buffer W as it arrives, and starts streaming from W when the prefix streaming is finished. We assume W is a circular buffer with only one write pointer but many read pointers. Both types of pointers wrap around: they start over from the beginning when they reach the end. The proxy uses the write pointer to write the incoming suffix from the origin server and uses a read pointer to read the cached suffix when it needs to stream the suffix to a client. As shown in Figure 2, when a new request arrives for the same video, the proxy immediately starts prefix streaming, and it does not send a new suffix request to the origin server if the new request can read the suffix from W. This is possible if the first byte of the suffix is still available in W at the time the new request finishes its prefix streaming. In that case, two suffix transmissions from the server are merged into one. A minimal sketch of such a shared circular buffer is given below.
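The following minimal Java sketch illustrates this circular buffer: one write pointer advanced by the incoming server stream, and one wrap-around read pointer per client. The class and method names are our own illustration, not part of any existing system; a real proxy would also have to handle arrival jitter and partial writes.

```java
/**
 * Minimal sketch of the shared circular smoothing buffer W:
 * one write pointer (server stream) and many wrap-around
 * read pointers (one per client). Illustrative only.
 */
public class SmoothingBuffer {
    private final byte[] buf;   // W seconds of suffix data, in bytes
    private long written = 0;   // total bytes written by the server stream

    public SmoothingBuffer(int capacityBytes) {
        this.buf = new byte[capacityBytes];
    }

    /** Write incoming suffix bytes; the write pointer wraps around. */
    public void write(byte[] data) {
        for (byte b : data) {
            buf[(int) (written % buf.length)] = b;
            written++;
        }
    }

    /** A per-client read pointer into the shared buffer. */
    public class Reader {
        private long readPos;   // absolute byte offset into the suffix

        public Reader(long startOffset) { this.readPos = startOffset; }

        /**
         * A client can keep sharing this buffer only while its next byte
         * has not yet been overwritten -- the byte-level analogue of the
         * timing condition derived in Section 3.4.
         */
        public boolean isValid() {
            return written - readPos <= buf.length;
        }

        /** Read one byte for this client, wrapping around. */
        public int read() {
            if (readPos >= written || !isValid()) return -1; // underrun or overrun
            return buf[(int) (readPos++ % buf.length)] & 0xFF;
        }
    }
}
```

A later client simply obtains a new Reader positioned at the first byte of the suffix; no second server stream is opened as long as that byte is still in the buffer.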
[Figure 2 is a timeline from time 0 with marks at 2d_min, t, B_prefix, t + B_prefix, and 2d_min + W, showing how the prefix and suffix transmissions of clients 1 and 2 share the smoothing buffer.]

Figure 2. Sharing a prefix and a smoothing buffer

We have introduced this new idea of using a circular buffer for work-ahead smoothing as well as for merging several suffix transmissions from the server into one transmission. The use of a circular buffer at the proxy allows us to merge several server streams (of the same movie) even over unicast channels, avoiding the need for costly multicast facilities. It also allows the proxy to start the suffix transmission to a later client right after finishing that client's prefix transmission, i.e., one channel between the proxy and a client is always sufficient, which is a considerable cost saving in many cases. Further, a client does not need a very large buffer space, since it never buffers the suffix while playing the prefix.

3.3 The Size of the Prefix Buffer

We use a prefix buffer to mitigate network latency, jitter, and VBR traffic bursts; in this section we compute the size of the prefix buffer based on these factors. We assume the maximum and minimum server-to-proxy propagation delays to be $d_{max}^{s,p}$ and $d_{min}^{s,p}$ seconds respectively, i.e., the maximum server-to-proxy jitter is $\Delta_{max}^{s,p} = d_{max}^{s,p} - d_{min}^{s,p}$. To start streaming at the proxy immediately, after overcoming this propagation delay and jitter over the round trip, the proxy has to pre-fetch video data into the prefix buffer. For this reason, the size of the prefix buffer at the proxy should be

$B_{prefix} = 2d_{min}^{s,p} + 2\Delta_{max}^{s,p} = 2d_{max}^{s,p}$   (1)
We also use the prefix buffer at the proxy to assist in work-ahead smoothing, so that the server can transmit the variable-bit-rate suffix data at a constant bit rate with a reduced peak, i.e., at the mean bit rate of b bits per second. For this reason, we need to pre-fetch additional data into the prefix buffer. To compute the size of this additional data, we assume that the size of the jth frame of a video is $f_j$ bits and that there are n frames in total in the video. We can compute the mean frame size as

$f_{mean} = \frac{1}{n}\sum_{j=1}^{n} f_j = \frac{b}{frame\_rate}$

and find the minimum P such that

$\forall j \in \{1, \ldots, n\}: \sum_{k=1}^{j} f_k \le j \times f_{mean} + P$   (2)

If we pre-fetch P bits of video data into the prefix buffer, the server's mean-bit-rate suffix transmission will never cause the video data at the proxy to fall behind schedule, and the user will not experience any unexpected pause in playback. We call these P bits of video data the proxy work-ahead pre-fetch data; a sketch of this computation follows.
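The minimum P of expression (2) can be found in a single pass over the frame-size trace: it is the largest amount by which the cumulative frame sizes ever exceed the mean-rate schedule. The following Java sketch shows one way to compute it; the method name and the assumption that frame sizes are given in bits are ours.

```java
/**
 * Compute the minimum work-ahead pre-fetch P (in bits) of expression (2):
 * the largest deficit of the mean-rate schedule j * fMean against the
 * actual cumulative frame sizes. Sketch only.
 */
public static long minWorkAheadPrefetch(long[] frameBits) {
    long total = 0;
    for (long f : frameBits) total += f;
    double fMean = (double) total / frameBits.length;  // mean frame size

    double cumulative = 0;
    double p = 0;
    for (int j = 1; j <= frameBits.length; j++) {
        cumulative += frameBits[j - 1];
        // P must cover the worst-case excess of the trace over the schedule
        p = Math.max(p, cumulative - j * fMean);
    }
    return (long) Math.ceil(p);  // smallest P so that (2) holds for all j
}
```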
After adding these P/b seconds, the size of the prefix buffer becomes

$B_{prefix} = 2d_{min}^{s,p} + 2\Delta_{max}^{s,p} + \frac{P}{b} = 2d_{max}^{s,p} + \frac{P}{b}$   (3)

Using expression (3), we allocate a prefix buffer of $B_{prefix}$ seconds for this video at the proxy.

3.4 The Size of the Smoothing Buffer

We use the smoothing buffer to smooth out jitter and VBR traffic bursts, as well as to enable many clients to share the same suffix. In this section, we compute the size of the smoothing buffer based on these factors. In Figure 2, client 1 requests a video at time 0. The proxy starts the prefix transmission at time 0 for client 1 and requests the suffix transmission from the server. The earliest possible time at which the proxy can receive the suffix data from the server is $2d_{min}^{s,p}$ (shown as 2d_min in Figure 2). The proxy writes the incoming suffix into the smoothing buffer, which can hold W seconds of suffix data without overwriting. Therefore, the proxy does not need to overwrite suffix data in the smoothing buffer until time $2d_{min}^{s,p} + W$. The proxy needs to start the suffix transmission from the smoothing buffer to client 1 at time $B_{prefix}$. To guarantee that the proxy does not overwrite the suffix data before it has been transmitted to client 1, the following relation must hold:

$B_{prefix} \le 2d_{min}^{s,p} + W$, i.e., $W \ge B_{prefix} - 2d_{min}^{s,p} = 2\Delta_{max}^{s,p} + \frac{P}{b}$   (4)

Expression (4) shows that the proxy needs a minimum smoothing buffer for each client in order to smooth out jitter and VBR traffic bursts, even when clients do not share a suffix and smoothing buffer. We define this minimum as the work-ahead smoothing buffer, $B_{work-ahead}$, i.e.,

$B_{work-ahead} = 2\Delta_{max}^{s,p} + \frac{P}{b}$   (5)

In Figure 2, client 2 requests the same video at time t. The proxy may stream the video data from the same prefix buffer and smoothing buffer for client 2. The proxy needs to start the suffix transmission to client 2 at time $t + B_{prefix}$. To guarantee that the proxy does not overwrite the suffix data before it has been transmitted to client 2, the following relation must hold:

$t + B_{prefix} \le 2d_{min}^{s,p} + W$   (6)

From expressions (6), (5), and (3) we can derive that

$t \le 2d_{min}^{s,p} + W - B_{prefix} = W - B_{work-ahead}$   (7)

From expression (7), we can say that when a smoothing buffer of W seconds is allocated for a video, all requests within the next interval T can share the same smoothing buffer, i.e., the same server-side suffix transmission, where

$T = W - B_{work-ahead}$   (8)

The larger T is, the more requests can share the same suffix and smoothing buffer. We define this T-second portion of the smoothing buffer as the sharing smoothing buffer. Therefore, we must allocate a smoothing buffer W greater than $B_{work-ahead}$ at the proxy for a particular video in order to share its suffix among many clients.

3.5 The Minimum Start-up Delay at the Client

To compute the minimum start-up delay at the client, we assume $d_p^{p,c}$ is the proxy-to-client propagation delay, with negligible jitter, i.e., $d_p^{p,c}$ is constant. DSL supports jitter-free transmission from the proxy to the client, so this assumption is quite acceptable with DSL. We define $d_f$ as the player's minimum playback delay: the time to accumulate the necessary number of frames in a playback buffer to decode the frames and to smooth out the peaks and troughs in the VBR-encoded video data. From expression (2), we observe that the client can start playback after accumulating P bits of video data, i.e., $d_f = \frac{P}{b}$. We need to allocate a buffer of $d_f$ seconds, or P bits, at each client for this particular video. The client's start-up delay can then be computed as

$d_s = 2d_p^{p,c} + d_f$   (9)
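As a concrete check, the following sketch evaluates expressions (3), (5), (8), and (9) using the parameter values adopted in Section 4, and reproduces the buffer sizes reported there. The class and variable names are ours.

```java
/** Evaluate buffer sizes and delays, expressions (3), (5), (8), (9). Sketch only. */
public class BufferSizing {
    public static void main(String[] args) {
        double dMaxSP = 0.5;         // max server-to-proxy delay, seconds (Section 4)
        double dMinSP = 0.1;         // min server-to-proxy delay, seconds (Section 4)
        double dPC    = 0.1;         // proxy-to-client delay, seconds (Section 4)
        double b      = 0.77e6;      // mean bit rate, bits/second (Section 4)
        double pBits  = 15.56e6 * 8; // work-ahead pre-fetch P from (2), bits

        double jitter     = dMaxSP - dMinSP;        // max server-to-proxy jitter
        double bPrefix    = 2 * dMaxSP + pBits / b; // (3) prefix buffer, seconds
        double bWorkAhead = 2 * jitter + pBits / b; // (5) work-ahead buffer, seconds
        double w          = 180.0;                  // chosen smoothing buffer, seconds
        double t          = w - bWorkAhead;         // (8) sharing window T, seconds
        double dS         = 2 * dPC + pBits / b;    // (9) client start-up delay, seconds

        System.out.printf("B_prefix=%.2fs B_workAhead=%.2fs T=%.2fs d_s=%.2fs%n",
                          bPrefix, bWorkAhead, t, dS);
    }
}
```

Running this prints B_prefix = 162.66 s, B_workAhead = 162.46 s, and T = 17.54 s, matching the values used in the experiments.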
3.6 The Average Cost Function

To evaluate our transmission scheme, we compute its average cost as a function of the access rate and the smoothing buffer size. The smoothing buffer is dynamically allocated and deallocated by the proxy based on demand. The proxy allocates one smoothing buffer to a video when a request for that video arrives and there was no request for that video within the immediately preceding interval T. If a new request arrives within the next interval T, it shares the first smoothing buffer with the first client; otherwise, a new smoothing buffer is allocated for the new client. The proxy may therefore allocate several smoothing buffers for the same video. We assume that the arrival of requests for a video is a Poisson process. If $\lambda_i$ is the request arrival rate of video i, then by the Poisson distribution the expected number of requests for video i in the interval [0, T] is $\lambda_i T$. This does not include the first request, which triggers the smoothing buffer allocation event. Therefore, the total number of requests that share a single smoothing buffer, or a single suffix stream, of video i becomes $1 + \lambda_i T = 1 + \lambda_i (W_i - B_{i,work-ahead})$.
We assume that $L_i$ seconds is the length of video i, $c_{st}$ is the cost of transmitting one bit of video data from the server to the proxy, $c_{pt}$ is the cost of transmitting one bit of video data from the proxy to the client, and $c_m$ is the cost of buffering one bit of video data. The average cost of this video for a client is then

$C(\lambda_i, W_i) = c_{st}\,\frac{(L_i - B_{i,prefix})\,b_i}{1 + \lambda_i (W_i - B_{i,work-ahead})} + c_{pt}\,L_i\,b_i + c_m\,\frac{W_i\,b_i}{1 + \lambda_i (W_i - B_{i,work-ahead})} + c_m\,\frac{B_{i,prefix}\,b_i}{\lambda_i} + c_m\,P_i$   (10)

In expression (10), the first term is the per-client suffix transmission cost from the server to the proxy, and the third term is the per-client smoothing buffer cost at the proxy, where the suffix is $L_i - B_{i,prefix}$ seconds long, the smoothing buffer is $W_i$ seconds long, and both are shared by $1 + \lambda_i (W_i - B_{i,work-ahead})$ clients. The second term is the whole-video transmission cost from the proxy to a client. The fourth term is the per-client prefix buffer cost at the proxy, where the prefix buffer is $B_{i,prefix}$ seconds long and is shared by $\lambda_i$ clients on average. The last term is the cost of the $P_i / b_i$ seconds of playback buffer at the client.
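Expression (10) transcribes directly into code. The following Java sketch, with our own parameter names, evaluates the average per-client cost:

```java
/** Average per-client cost of video i, expression (10). Sketch only. */
public static double averageCost(
        double lambda,      // request arrival rate of the video (req/s)
        double w,           // smoothing buffer length W_i (s)
        double length,      // video length L_i (s)
        double bPrefix,     // prefix buffer B_{i,prefix} (s)
        double bWorkAhead,  // work-ahead smoothing buffer B_{i,work-ahead} (s)
        double bitRate,     // mean bit rate b_i (bits/s)
        double pBits,       // playback pre-fetch P_i (bits)
        double cSt, double cPt, double cM) {  // per-bit unit costs
    double sharers = 1 + lambda * (w - bWorkAhead);  // clients sharing one suffix
    return cSt * (length - bPrefix) * bitRate / sharers  // suffix transmission
         + cPt * length * bitRate                        // proxy-to-client stream
         + cM * w * bitRate / sharers                    // shared smoothing buffer
         + cM * bPrefix * bitRate / lambda               // shared prefix buffer
         + cM * pBits;                                   // client playback buffer
}
```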
4. Experimental Results

We have written a Java simulation program to analyze the performance of our UniSMerge transmission scheme and to compare it with the Online Smoothing technique [3] and the SBatch technique [4]. In each run of the simulation, we assume that 100 videos are being served through the proxy and that the aggregate access rate for these videos is λ requests/second. For simplicity, we assume that all videos are of equal length, have equal mean bit rate, and that all unit costs are equal, i.e., $c_{st} = c_{pt} = c_m = c$, for a particular run. We use the video parameters of a high-quality MPEG-4 encoded movie, Jurassic Park I, whose video trace is taken from [6]. The length of this video is 1 hour, i.e., there are 90002 frames at a frame rate of 25 frames/sec. The mean and peak bit rates of the video are 0.77 Mbps and 3.3 Mbps respectively, and the standard deviation of the frame size is 2259.05 bytes. Using formula (2), we find that the required size of the work-ahead pre-fetch data P is 15.56 MB, equivalent to 161.66 seconds; i.e., the size of the playback buffer at each client is 161.66 seconds. We assume the maximum propagation delay between the server and the proxy is 500 ms, the minimum propagation delay between the server and the proxy is 100 ms, and the propagation delay between the proxy and a client is 100 ms. From expressions (3) and (5), the sizes of our prefix and work-ahead smoothing buffers are 162.66 seconds and 162.46 seconds respectively. We use a 3-minute (180-second) smoothing buffer, which is equivalent to a 17.325 MB buffer and is larger than the work-ahead smoothing buffer. This choice ensures that the size of the sharing smoothing buffer is greater than zero, namely 17.54 seconds, which is necessary to ensure resource sharing among many clients and hence to make our transmission scheme scalable.

Videos are ranked according to their popularity. Bestavros et al. [5] observed that the distribution of web requests follows a Zipf-like distribution:

$\rho = \frac{k}{i^{\alpha}}$, where $k = \frac{1}{\sum_{i=1}^{N} 1/i^{\alpha}}$,

when the web objects are ranked according to their popularity. We use the same distribution for the videos: we draw the access probability $\rho_i$ for video i from a Zipf-like distribution with parameter α = 0.271, and multiplying the aggregate access rate λ by $\rho_i$ gives the access rate $\lambda_i$ for video i.

[Figure 3 plots average cost (y-axis) against smoothing buffer length in seconds (x-axis, 0-250) for the UniSMerge, Online Smoothing, and SBatch schemes.]
Figure 3. Average Cost vs. Smoothing Buffer Length
The smoothing buffer facilitates resource sharing in our UniSMerge scheme; the prefix buffer plays the same role in SBatch. Therefore, the average costs against different smoothing buffer lengths in UniSMerge are analogous to the average costs against different prefix buffer lengths in SBatch. Figure 3 plots the average cost against the smoothing buffer length under the UniSMerge and Online Smoothing [3] transmission schemes, and against the prefix buffer length under the SBatch [4] transmission scheme. It shows that the average cost always remains very high in Online Smoothing and increases with the smoothing buffer length, because a larger buffer increases cost without facilitating resource sharing among more clients. The SBatch curve shows that the average cost decreases very sharply at first as the prefix buffer length increases, but beyond a certain point the cost starts to increase and keeps increasing. The average cost in our UniSMerge transmission scheme, by contrast, always decreases as the length of the smoothing buffer increases, and it also decreases very sharply at the beginning; we can therefore always add more smoothing buffer in our scheme to reduce the cost. Figure 3 also shows that the cost of our UniSMerge transmission scheme is the lowest of the three schemes most of the time.
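The per-video access rates $\lambda_i$ used in these experiments follow from the Zipf-like distribution above. A small Java helper of our own making derives them from the aggregate rate:

```java
/** Derive per-video access rates from a Zipf-like popularity distribution. */
public static double[] accessRates(int nVideos, double alpha, double aggregateLambda) {
    double k = 0;
    for (int i = 1; i <= nVideos; i++) k += 1.0 / Math.pow(i, alpha);
    k = 1.0 / k;                                  // normalization constant

    double[] lambdas = new double[nVideos];
    for (int i = 1; i <= nVideos; i++) {
        double rho = k / Math.pow(i, alpha);      // access probability of video i
        lambdas[i - 1] = aggregateLambda * rho;   // lambda_i = lambda * rho_i
    }
    return lambdas;
}
```

For example, accessRates(100, 0.271, 10.0) yields the $\lambda_i$ values for 100 videos at an aggregate rate of 10 requests/second.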
[Figure 4 plots average cost (y-axis) against aggregate request rate in requests/sec (x-axis, 0-1000) for the UniSMerge, Online Smoothing, and SBatch schemes.]

Figure 4. Average Cost vs. Aggregate Request Rate

Figure 4 plots the average cost against the aggregate access rate for the videos under UniSMerge, Online Smoothing [3], and SBatch [4]. The cost remains highest, and constant with the request rate, in Online Smoothing, since it shares neither the suffix transmission from the server nor the smoothing buffers at the proxy. Like SBatch, our UniSMerge transmission scheme yields lower cost as the request rate increases; however, the UniSMerge cost is always less than that of SBatch and Online Smoothing. As the aggregate access rate increases, more clients share resources such as the suffix data and the prefix and smoothing buffers in our UniSMerge transmission scheme, i.e., it scales the streaming service more economically.

5. Conclusion

In our streaming model, a proxy enables many clients to share a single server stream, a prefix buffer, and a smoothing buffer. This makes the streaming highly scalable, increases the overall throughput, and reduces the load on the server while keeping the average cost low. It scales the service economically: as the number of clients grows, the cost per client goes down. It requires neither multiple channels between the proxy and the client nor an infinite buffer at the client to allow a client to share a server stream. A proxy always uses a work-ahead smoothing buffer, which eliminates the requirement of just-in-time arrival of video data at the proxy, and the smoothing buffer allows it to transmit VBR-encoded videos smoothly. The use of pre-fetched data in the prefix buffer for work-ahead smoothing eliminates the need for either large network bandwidth or a complicated online smoothing technique to transmit VBR-encoded video.

An optimal bandwidth and buffer allocation algorithm is essential when these resources are limited; our current research aims to find such an algorithm. VCR functionality is an important feature that a streaming solution should provide, and we will provide it in our future research. Multicast-enabled links are often available between the proxy and the clients; in future research, we will also make our transmission scheme more economical by taking advantage of such links.

6. References
[1] MPEG-2, http://www.mpeg.org.
[2] I. Dalgic and F. Tobagi, Constant quality video encoding, Proc. IEEE ICC'95, Seattle, Washington, 1995, 1-7.
[3] S. Sen, J. Rexford, and D. Towsley, Proxy prefix caching for multimedia streams, Proc. IEEE INFOCOM, New York, 1999, 1310-1319.
[4] B. Wang, S. Sen, M. Adler, and D. Towsley, Optimal proxy cache allocation for efficient streaming media distribution, Proc. IEEE INFOCOM, 2002, 366-374.
[5] A. Bestavros, C.R. Cunha, and M.E. Crovella, Characteristics of WWW client-based traces, Technical Report BU-CS-95-010, Computer Science Department, Boston University, July 18, 1995.
[6] Video trace of MPEG-4 encoded Jurassic Park I, http://www-tkn.ee.tu-berlin.de/research/trace/ltvt.html.
[7] M. Reisslein, F. Hartanto, and K.W. Ross, Interactive video streaming with proxy servers, Proc. First International Workshop on Intelligent Multimedia Computing and Networking (IMMCN), Atlantic City, NJ, 2000, 588-591.