IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 7, NO. 1, FEBRUARY 2005
Layer-Encoded Video in Scalable Adaptive Streaming
Michael Zink, Jens Schmitt, and Ralf Steinmetz, Fellow, IEEE
Abstract—Combining the concepts of caching and transmission control protocol (TCP)-friendly streaming of layer-encoded video bears the problem that such videos might not be cached at full quality. In this work, we therefore focus on the scheduling of retransmissions of missing segments of a cached video in a manner that allows clients to receive the content in an improved quality. In a first step, we conducted subjective assessments of variations in layer-encoded video with the goal of validating existing quality metrics, including our own, which are based on certain assumptions. A statistical analysis of the subjective assessment validates these assumptions. We also show that the frequently used peak signal-to-noise ratio (PSNR) is not an appropriate metric for variations in layer-encoded video. With the insight from the subjective assessment, we develop heuristics for retransmission scheduling and demonstrate their applicability by conducting a series of simulations.
I. INTRODUCTION

A. Motivation
THE CHALLENGES of providing true video-on-demand (TVoD) [1] in the Internet are manifold and require the orchestration of different technologies. For example, the distribution and caching of video content and the adaptation of streaming mechanisms to the current network situation and user preferences are still under investigation. Existing work on TVoD has shown caches to be extremely important with respect to scalability, from the network's as well as from the video servers' perspective [2]. Scalability, of course, is an important issue if a TVoD system is to be used in the global Internet. Yet, simply reusing concepts from traditional Internet Web caching is not sufficient to suit the special needs of video content since, for example, popularity life cycles can be very different [3]. In addition to scalability, it is very important for an Internet TVoD system to take the "social" rules implied by the transmission control protocol's (TCP's) cooperative resource management model into account, i.e., to be adaptive in the face of (incipient) network congestion.
Manuscript received November 1, 2002; revised March 31, 2003. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Ryoichi Komiya.
M. Zink was with the Multimedia Communications Lab, Faculty of Electrical Engineering and Information Technology, Darmstadt University of Technology, D-64283 Darmstadt, Germany. He is now with the Department of Computer Science, University of Massachusetts, Amherst, MA 01003 USA (e-mail: [email protected]).
J. Schmitt was with the Multimedia Communications Lab, Faculty of Electrical Engineering and Information Technology, Darmstadt University of Technology, D-64283 Darmstadt, Germany. He is now with the Distributed Computer Systems Lab, Computer Science Department, University of Kaiserslautern, 67653 Kaiserslautern, Germany (e-mail: [email protected]).
R. Steinmetz is with the Multimedia Communications Lab, Faculty of Electrical Engineering and Information Technology, Darmstadt University of Technology, D-64283 Darmstadt, Germany (e-mail: [email protected]).
Digital Object Identifier 10.1109/TMM.2004.840595
Therefore, the streaming mechanisms of an Internet TVoD system need to incorporate end-to-end congestion control mechanisms to prevent unfairness toward TCP-based traffic and to increase the overall utilization of the network. As it is debatable whether QoS mechanisms will ever be used in the global Internet, e.g., in the form of RSVP/IntServ [4], we do not assume these but build upon the current best-effort service model of the Internet, which is based on closed-loop control exerted by TCP-like congestion control. Yet, since video transmissions need to be paced at their "natural" rate, adaptiveness can only be integrated into streaming mechanisms in the form of quality degradation and not by delaying the transfer, as is possible with elastic traffic such as FTP transfers. An elegant way of introducing adaptiveness into streaming is to use layer-encoded video [5], as it allows dropping segments (the transfer units) of the video in a controlled way. Thus, it overcomes the inelastic characteristics of traditional encoding formats like MPEG-1 or H.261. However, while the combination of caching and adaptive streaming promises a scalable and TCP-friendly TVoD system, it also creates new design challenges. One drawback of adaptive transmissions is the introduction of variations in the number of transmitted layers during a streaming session. These variations affect both the end-users' perceived quality and the quality of the cached video and, thus, the acceptance of a service based on such technology. Recent work that has focused on reducing these layer variations, either by employing intelligent buffering techniques at the client [6]–[8] or proxy caches [9]–[11] in the distribution network, made various assumptions about the perceived quality of video with a time-varying number of layers. To the best of our knowledge, these assumptions have not been verified by subjective assessment so far. The lack of an in-depth analysis of quality metrics for variations in layer-encoded video motivated us to conduct an empirical experiment based on subjective assessment to obtain results that can be used in classifying the perceived quality of such videos.

B. Outline

In Section II, we briefly introduce the basic components of our overall approach toward scalable adaptive video streaming in the Internet. Section III reviews previous work on retransmission scheduling for layer-encoded video and subjective assessment of video quality. The test environment and the subjective test method used for the experiment to determine the subjective impression of variations in layer-encoded video are described and discussed in Section IV. We present the results of the experiment and compare these results with an objective metric, the peak signal-to-noise ratio (PSNR).
Fig. 1. System scalability.
Fig. 2. Initial cached video quality.
Fig. 3. Quality criteria.
In Section V, we focus on the particular problem of how to schedule retransmissions, to a cache, of segments from missing layers of a video. We devise an objective quality metric for variations in layer-encoded video and validate its appropriateness by applying this metric to the sequences from Section IV. We briefly demonstrate that the retransmission of missing segments is a complex problem, which is why we devise a number of heuristic algorithms. These heuristics are compared with each other by simulations. Finally, conclusions are drawn in Section VI.

II. SCALABLE ADAPTIVE STREAMING

A. System Scalability—Caching

Let us briefly describe our video caching architecture. As caching method, we employ so-called write-through caching (a term adopted from memory hierarchies): a requested stream is either forwarded through the proxy cache, or it is streamed via a multicast group which clients and proxy caches join if the cache replacement strategy decides to store the requested video on the proxy cache. Subsequent clients can then be served from the proxy cache (see Fig. 1). This technique incurs a lower overall network load in a TVoD system than a method where the video is transported to the cache in a separate stream using a reliable transmission protocol (e.g., TCP). On the other hand, write-through caching requires a reliable multicast protocol to recover from packet losses. The design and implementation of such a protocol, called Loss Collection RTP (LC-RTP), presented in [12], fits particularly well into a TVoD architecture that employs write-through caching. A minimal sketch of the proxy's request handling under this scheme is given below.
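The following sketch illustrates the write-through decision at the proxy. The class, function names, and print placeholders are our own illustrative assumptions; the actual system relies on LC-RTP for reliable multicast and on a real cache replacement strategy.

#include <cstdio>
#include <set>
#include <string>

// Minimal sketch of write-through caching at the proxy (illustrative).
class Proxy {
    std::set<std::string> cached_;

public:
    // Placeholder for the cache replacement strategy's decision.
    bool shouldCache(const std::string&) const { return true; }

    void handleRequest(const std::string& video) {
        if (cached_.count(video)) {
            // Subsequent clients are served directly from the cache.
            std::printf("serving %s from cache\n", video.c_str());
        } else if (shouldCache(video)) {
            // Client and proxy join the same multicast group; the proxy
            // stores the stream while it is delivered (write-through).
            std::printf("joining multicast for %s and storing\n",
                        video.c_str());
            cached_.insert(video);
        } else {
            // Otherwise the stream is simply forwarded through the proxy.
            std::printf("forwarding %s uncached\n", video.c_str());
        }
    }
};

int main() {
    Proxy p;
    p.handleRequest("farm.mpg"); // first request: multicast + store
    p.handleRequest("farm.mpg"); // second request: served from cache
    return 0;
}

The key point the sketch illustrates is that the cached copy is created as a by-product of serving the first client, so no separate cache-fill transfer is needed.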
B. Content Scalability—Layer-Encoded Video

Hierarchically layer-encoded video is a suitable method to allow for this quality adaptation. There are other alternatives, such as adaptive encoding or switching between different encodings, but these are less attractive for caching purposes since they do not possess the subset relationship of layered encoding and, thus, lead to transmissions that are difficult to cache. Fig. 2 illustrates how a layer-encoded video might be stored in a cache after its initial (congestion-controlled) transmission. Obviously, the cached copy of the video exhibits a potentially large number of missing segments from different layers. The exact shape of a cached video is a function of the congestion control mechanism being used. There have been several proposals on how to achieve TCP-friendly congestion control for layer-encoded video transmissions, e.g., [13], [14], or [15]. Given that they are already very effective, we build on these proposals. Our emphasis is a complementary issue: the scheduling of retransmissions of missing segments of a cached video in a manner that allows clients to perceive the content in an improved quality.

III. RELATED WORK

Since our work is influenced by two research areas, retransmission scheduling and video quality, we consider them separately.

A. Retransmission Scheduling

Nelakuditi et al. [7] state that a good quality metric for layer-encoded video should capture the amount of detail per frame as well as its uniformity across frames; i.e., if we compare the sequences of layers in a video shown in Fig. 3, the quality of (a2) would be better than that of (a1), which, according to their assumption, also holds for (b2) and (b1). Their quality metric is based on the principle of giving a higher weight to lower layers and to longer runs of continuous frames in a layer. The metric presented by Rejaie et al. [9] is almost identical to the one advocated in [7]. Completeness and continuity are the two parameters incorporated in this quality metric. Completeness of a layer is defined as the ratio of the layer size transmitted to its original (complete) size. Continuity is the metric that covers the "gaps" in a layer; it is defined as the average number of segments between two consecutive layer breaks (i.e., gaps). In contrast to the other metrics presented here, this metric is a per-layer metric. A sketch of both quantities follows.
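For illustration, a hedged sketch of the two per-layer quantities as we read the definitions in [9], for a layer represented as a presence bitmap over fixed-size segments; the representation and function names are our own.

#include <vector>

// Completeness of a layer: transmitted size over original size; with
// fixed-size segments this is the fraction of segments present.
double completeness(const std::vector<bool>& layer) {
    if (layer.empty()) return 0.0;
    int present = 0;
    for (bool s : layer) present += s ? 1 : 0;
    return static_cast<double>(present) / layer.size();
}

// Continuity of a layer: average number of segments between two
// consecutive layer breaks (gaps), i.e., the mean length of the
// uninterrupted runs of present segments.
double continuity(const std::vector<bool>& layer) {
    int runs = 0, total = 0, run = 0;
    for (bool s : layer) {
        if (s) ++run;
        else if (run > 0) { ++runs; total += run; run = 0; }
    }
    if (run > 0) { ++runs; total += run; }
    return runs > 0 ? static_cast<double>(total) / runs : 0.0;
}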
In previous work [11], we also made assumptions about the spectrum for layer-encoded video, postulating, similarly to [7], that it should be based on the frequency of variations and the amplitude of variations. In contrast to our approach, [7] does not include caches in the streaming architecture but only tries to optimize the streaming of layer-encoded video between the server and a specific client.

B. Video Quality

The ITU Recommendation BT.500-10 [16] has been used as a basis for subjective assessment of encoders for digital video formats, in particular for MPEG-2 [17], [18] and MPEG-4 [19]. The focus of interest for all these subjective assessment experiments was the quality of different coding and compression mechanisms. Our work, in contrast, is concerned with the quality degradation caused by variations in layer-encoded video. The work presented in [20] is also concerned with layer-encoded video and presents the results of an empirical evaluation of four hierarchical video encoding schemes. The focus of that investigation is on the comparison between the different layered coding schemes and not on the human perception of layer variations. In [21], a subjective quality assessment was carried out in which the influence of the frame rate on the perceived quality was investigated. In contrast to our work, elasticity in the stream was achieved by frame rate variation and not by the application of a layer-encoded video format. The effects of bit errors on the quality of MPEG-4 video were explored in [22] by subjective viewing measurements, but effects caused by layer variations were not examined. Lavington et al. [23] used an H.263+ two-layer video format in their trial. This is probably the work closest to that presented here, although they were rather interested in the quality assessment of longer sequences (e.g., 25 min). As opposed to presenting identical pregenerated sequences to the test candidates, videos were streamed via an IP network to the clients, and the quality was influenced in a fairly uncontrolled way by competing data originating from a traffic generator.

IV. SUBJECTIVE IMPRESSION OF VARIATIONS IN LAYER-ENCODED VIDEO

One goal of our research is to investigate whether general assumptions made about quality metrics for variations in layer-encoded video (see Section III-A) can be verified by subjective assessment. These results are then used for the development of the retransmission scheduling algorithms presented in Section V. It is difficult, in general, to validate all possible scenarios covered by the assumptions presented in Section III-A with an experiment based on subjective assessment. We therefore decided to focus on basic scenarios that have the potential to answer the most fundamental questions.

A. Test Environment

Scalable MPEG (SPEG) [24] is a simple modification of MPEG-1 which introduces scalability. In addition to the possibility of dropping complete frames (temporal scalability), which is already supported by MPEG-1 video, SNR scalability is introduced through layered quantization of discrete cosine transform (DCT) data [24].
We chose the SPEG format for the following reasons. SPEG is designed for an adaptive video-on-demand (VoD) approach, i.e., the data rate streamed to the client should be controlled by feedback from the network (e.g., congestion control information). In addition, the developers of SPEG also implemented a function that reconverts SPEG to MPEG-1, allowing the use of standard MPEG-1 players, e.g., the Windows Media Player. The subjective assessment method is widely accepted for determining perceived image and video quality. Research performed under the ITU-R led to the development of a standard for such test methods [16]. The direct comparison of two impaired videos is our primary goal, and since this is best represented by the stimulus-comparison (SC) method, we decided to use this method in our test. Finally, we created a small application² that allows an automated execution of the tests. Detailed information about the test environment is given in [25].

B. Experiment

We investigated isolated effects individually to keep the duration of a test session reasonably short while still permitting conclusions about the general assumptions discussed above. We are thus rather interested in observing the quality ranking for isolated effects like frequency variations (as shown in sequences (b1) and (b2) in Fig. 3) than for combined effects (as shown in Fig. 2). During the whole experiment, 15 different assessments were performed. We differentiated between two types of tests: one in which the number of segments used by a pair of sequences is equal and one in which the number differs. Since only the results of the first type of assessment are relevant for retransmission scheduling (Section V), we do not present the results of the second type in this article. Those results and a detailed description of the test procedure can be found in [25]. Each candidate performed 15 different assessments, and each single test lasted 33 seconds. All tests were executed according to the SC assessment method. We chose three video sequences for this experiment which are frequently used for subjective assessment [26]. The order of the 15 video sequences was changed randomly from candidate to candidate, as proposed in the ITU-R BT.500-10 standard [16]. After some initial questions (age, gender, profession), three assessments were executed as a warm-up phase. This was designed to avoid any distraction due to the content of the video sequences (as reported by Aldridge et al. [17]). The experiment was performed with 94 test candidates (62 males and 32 females) between the ages of 14 and 64, of whom 81 had previous experience with watching video on a computer. Figs. 5–10 show the layer pattern of each single sequence that was used in the experiment. The three warm-up tests, in which a first sequence consisting of four layers is compared with a second consisting of only one layer, and the tests in which the total number of segments differs, are not presented here. It should be mentioned that the single layers are not equal in size (contrary to the presentation of their shapes); the size of a layer depends on its level and, thus, segments of different layers have different sizes.

²A downloadable version of the test can be found at http://www.kom.tu-darmstadt.de/video-assessment/
Fig. 4. Average and 95% confidence interval for the different tests of the experiment and the comparison values (table) that are shown on the y-axis.
Fig. 5. Farm1.
Fig. 6. Farm2.
Fig. 7. M&C1.
Fig. 8. M&C2.
Fig. 9. M&C3.
Fig. 10. Tennis1.
Preliminary experiments showed that equal layer sizes are not appropriate to make layer changes perceivable; as there exist layered schemes that produce layers with sizes similar to ours [27], [28], we consider this a realistic assumption. Since segments from different layers are not equal in size, the amount of data and, therefore, the peak signal-to-noise ratio (PSNR) of the compared sequences differ.

C. Results
We now present the results of the experiment described in Section IV-B. Given the statistical nature of the data gathered, the results presented cannot prove an assumption but only make it more or less likely. The statistical results of the subjective assessment are presented in Fig. 4. For each of the nine tests, we calculated the average and the 95% confidence interval. We also provide objective data in terms of the average PSNR per sequence. The average PSNR was obtained by comparing the original MPEG-1 sequence with the degraded sequence on a per-frame basis. This yields 250 single PSNR values per sequence, which were used to calculate the average PSNR.
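For reference, the per-frame PSNR underlying these averages follows the standard definition PSNR = 10 log10(255² / MSE) for 8-bit samples. A minimal sketch follows; the frame representation and function names are illustrative, not the authors' tool.

#include <cmath>
#include <cstdint>
#include <vector>

using Frame = std::vector<uint8_t>; // one luminance frame, 8-bit samples

// Per-frame PSNR in dB: 10 * log10(255^2 / MSE). Assumes both frames
// have the same dimensions.
double psnr(const Frame& orig, const Frame& degraded) {
    double mse = 0.0;
    for (size_t i = 0; i < orig.size(); ++i) {
        const double d = double(orig[i]) - double(degraded[i]);
        mse += d * d;
    }
    mse /= orig.size();
    if (mse == 0.0) return INFINITY; // identical frames
    return 10.0 * std::log10(255.0 * 255.0 / mse);
}

// Average PSNR of a sequence: the mean of the per-frame values
// (250 frames per sequence in the experiment).
double averagePsnr(const std::vector<Frame>& orig,
                   const std::vector<Frame>& degraded) {
    double sum = 0.0;
    for (size_t f = 0; f < orig.size(); ++f)
        sum += psnr(orig[f], degraded[f]);
    return sum / orig.size();
}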
In the following, we discuss each assessment in detail.
1) Farm1–Amplitude: In this assessment, the stepwise decrease was rated slightly better than one single but higher decrease. The result tends to justify the assumptions about the amplitude of a layer change (as described in Section III-A).
2) Farm2–Frequency: The result of this test shows an even higher likelihood that the second sequence has a better perceived quality than in the case of Farm1. It tends to confirm the assumption that the frequency of layer changes influences the perceived quality since, on average, test candidates ranked the quality of the sequence with fewer layer changes higher.
TABLE I COMPARISON BETWEEN SPECTRUM, SUBJECTIVE, AND OBJECTIVE QUALITY (SAME AMOUNT OF SEGMENTS)
3) M&C1–Closing the Gap: This test tries to answer the question: would it be better to close a gap in a layer on a higher or on a lower level? The majority of the test candidates decided that filling the gap on a lower level results in a better quality than the opposite. This result tends to affirm our assumptions made for retransmission scheduling in Section V.
4) M&C2–Constancy: With an even higher significance than in the preceding tests, the candidates considered the sequence with no layer changes as having a better quality. One may judge this a trivial and unnecessary test, but from our point of view the result is not that obvious, since (g1) starts with a higher number of layers. The outcome of this test implies that it might be better, in terms of perceived quality, to transmit fewer but a constant number of layers.
5) M&C3–Constancy at a Higher Level: This test examined whether an increase in the overall level (in this case, in comparison to M&C2) has an influence on the perceived quality. Comparing the results of both tests (M&C2 and M&C3) shows no significant change in the test candidates' assessment: 80% of the test candidates judged the second sequences [(g2) and (h2)] to be of higher quality (values 1–3 in Fig. 4) in both cases.
6) Tennis1–All is Well That Ends Well: The result of this test shows the tendency that increasing the number of layers at the end leads to a higher perceived quality. The result has a remarkably strong statistical significance (the highest bias of all tests). We leave as future work tests of longer duration and executed in a different order [first (k2), then (k1)], which should show how the memory effect [17] of the candidates influenced this test.

D. Subjective Versus Objective Quality

Since the determination of an objective quality metric can be performed with much less effort than a subjective assessment, the result of this investigation may provide hints as to whether the determination of the average PSNR is sufficient to define the quality of a video sequence. Note that, since the relation between subjective and objective quality is not the focus of the investigation presented in this article, this can only be seen as a by-product and would certainly need further investigation. (The PSNR values for each sequence are given in Figs. 5–10.) A comparison of the results from the subjective assessment and the objective quality expressed by the PSNR is given in Table I.
The results of the subjective assessments do not agree with what one would expect from the PSNR values, with the exception of one test (see Fig. 6). For the tests Farm1, M&C1, M&C2, M&C3, and Tennis1, the second sequence was always assessed as having the better quality by the test candidates, while the PSNR value is always higher for the first sequence. Therefore, the results obtained for the tests in which the sum of segments was equal for each sequence do not indicate a positive correlation between the two quality metrics. From the results of our tests, we see a strong tendency that, in the case of layer-encoded video, the quality of a sequence is not well represented by the average PSNR. In Section V, we will see that this is an important result for the creation of an objective quality metric.

V. RETRANSMISSION SCHEDULING

A. Basics

Missing segments of the cached video should be retransmitted to enable a higher-quality service from the proxy cache to its clients. The most interesting issue here is how to schedule the retransmissions, i.e., in which order to retransmit missing segments in order to achieve certain quality goals for the cached video content. A further design issue is when to schedule retransmissions. In contrast to [7], we focus on retransmission into caches, since those retransmissions can be beneficial for more than one client, and caches are a necessary element of Scalable Adaptive Streaming in any case. We call this mechanism retransmission because an entire layer was not transmitted for a certain interval due to congestion on the intermediate link, but would have been transmitted if no congestion had occurred.
1) Retransmission Time: There are three possible occasions on which to perform retransmissions. All three have the goal of improving the quality of the cached content as soon as possible after the decision to start the retransmission has been made.
• Directly after the initial streaming process: the cache starts requesting missing segments without waiting for further requests for a certain video. This allows the cache to offer the highest possible quality to requests that arrive during the retransmission phase.
• During subsequent requests: the proxy cache serves subsequent requests but, simultaneously, also orders missing segments from the origin server. This alternative inherits the advantage of write-through caching that bandwidth between the proxy cache and the origin server is used only if a client request is directly related to it.
• During requests for different content from the server: in [29], we presented a technique that allows the transmission of requested segments (of an already cached video) in addition to video data that is streamed from the server to the proxy cache. With this approach, the proxy cache can decide for which of the cached videos retransmissions should be performed.
All three occasions can be supported by retransmission scheduling, since it only determines the order in which missing segments should be sent to the cache.
2) Scheduling Goals: The rationale for making an effort to schedule retransmissions in an intelligent way is that the presentation quality for users that are served from the proxy cache can be enhanced. Therefore, we need to be explicit about what constitutes a quality enhancement, i.e., we need a goal for retransmission scheduling algorithms to strive for. All algorithms we investigate use as much bandwidth as is available between origin server and proxy cache to retransmit missing segments, although, as we showed in Section IV-C, it is not necessarily the case that using all available bandwidth improves the quality of the cached video. From the results obtained in Section IV, we conclude that a retransmission scheduling algorithm that tries to avoid or even decrease quality variations for a cached video can be considered superior to others which do not take this into account. The negative effect of quality variations has two dimensions, the frequency of variations and the amplitude of variations, and the goal of retransmission scheduling should be to minimize both. In order to state the scheduling goal more formally, let us define some terms: let z_t denote the number of layers in time slot t (t = 1, ..., T), and let s_t indicate a step in time slot t, i.e., s_t = 1 if z_t ≠ z_{t-1} and s_t = 0 otherwise (with z_0 := z_1). We assume here, without loss of generality, a slotted time interval with slots corresponding to the transmission time of a single (fixed-size) segment. We can now introduce what we call the spectrum S of a cached layer-encoded video:
S = \sum_{t=1}^{T} s_t (z_t - \bar{z})^2, \qquad \bar{z} = \frac{1}{T} \sum_{t=1}^{T} z_t.   (1)
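As a cross-check of the definition, a minimal computation of the spectrum from the per-slot layer counts z_t; the function and the example values are our own illustration, not from the paper.

#include <cstdio>
#include <vector>

// Spectrum following (1): squared deviations from the mean quality
// level, counted only in slots where a step (layer change) occurs.
double spectrum(const std::vector<int>& z) {
    if (z.empty()) return 0.0;
    double mean = 0.0;
    for (int zt : z) mean += zt;
    mean /= z.size();

    double s = 0.0;
    for (size_t t = 1; t < z.size(); ++t)
        if (z[t] != z[t - 1])                   // s_t = 1 only at a step
            s += (z[t] - mean) * (z[t] - mean); // squared amplitude
    return s;
}

int main() {
    const std::vector<int> constant(10, 3);
    const std::vector<int> varying = {3, 4, 3, 4, 3, 4, 3, 4, 3, 4};
    std::printf("constant: %.2f  varying: %.2f\n",
                spectrum(constant), spectrum(varying)); // 0.00 vs. 2.25
    return 0;
}

A constant-quality video has spectrum zero, while frequent steps away from the mean increase it, matching the scheduling goal of minimizing S.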
The retransmission scheduling goal for a video can be stated as the minimization of its spectrum S. The amplitude is captured by the differences between the quality levels z_t and the average quality level \bar{z}, where larger amplitudes are given a higher weight due to the squaring of these differences. The frequency of variations is captured by s_t: only those differences that correspond to a step in the cached layer-encoded video are taken into account.

B. Comparison With Subjective Assessment and PSNR

In Table I, we compare the spectrum of the shapes presented in Section IV-C with the results from the subjective assessment and the PSNR for each sequence. With the exception of the Farm1 (Fig. 5) and Tennis1 (Fig. 10) tests, there is consistency between the subjective quality and the spectrum. In contrast to the subjective results, the PSNRs of the sequences
are only consistent in one of the six cases (test Farm2). This argues for our hypothesis that the spectrum is more suitable as an objective quality metric than the PSNR. It must be mentioned, though, that the spectrum does not capture time dependencies very well, as can be seen in the case of the Tennis1 test (Fig. 10). With the knowledge that the spectrum is a suitable metric for the quality of layer-encoded video, we decided to use it to rate and compare the retransmission scheduling algorithms presented in the following sections.

C. Algorithms for Retransmission Scheduling

We first discuss the problem complexity by looking at optimal retransmission scheduling. Since the determination of optimal retransmission schedules is computationally infeasible, or at least highly intensive, we then present heuristic schemes.
1) Optimal Retransmission Scheduling: A formulation of optimal retransmission scheduling as a mathematical program is given in Fig. 11. Here, the overall available retransmission capacity is modeled as an estimate.

Fig. 11. Optimal retransmission scheduling model.

We observe that optimal retransmission scheduling is a discrete nonlinear stochastic optimization problem. As such, it is, to the best of our knowledge, analytically intractable. It is structurally similar to the quadratic assignment problem, which is known to be NP-complete [30]. Thus, given reasonable restrictions on computing power, an exhaustive search is computationally infeasible for reasonable numbers of time slots.
2) Heuristics for Retransmission Scheduling: The goal of our retransmission scheduling heuristics is to improve the overall quality of the cached video, not only the quality for the current viewer (in contrast to approaches like [9]). We can support all three cases listed in Section V-A1, while [9] is suited for the second case only. In [11], we have shown that a specialized version of our heuristics (for the second case presented in Section V-A1) can, under certain conditions, be superior to a window-based approach such as [9].
3) Unrestricted Priority-Based Heuristics: Heuristics that aim to improve the overall quality of a cached video can take an unrestricted look at all missing segments when making requests for retransmissions from the origin server. Note that, during the retransmission phase, our algorithms send periodic retransmission requests to the server to ensure that the server obtains an up-to-date schedule of retransmissions based on the modified
shape of the cached video due to already retransmitted and received segments. In the following, we describe three heuristics of the more general class of unrestricted priority-based retransmission scheduling algorithms.
4) Unrestricted Lowest Layer First (U-LLF): This is the simplest of the algorithms. It scans the whole video for missing segments and schedules these segments according to their layer level. Fig. 12 gives an example of the U-LLF heuristic; the numbers on the segments define the order in which the segments should be sent from the server.

Fig. 12. U-LLF operation.

5) Unrestricted Shortest Gap Lowest Layer First (U-SG-LLF): Considering the definition of the spectrum in Section V-A2 and our scheduling goal of minimizing the spectrum, we observe that there are, in principle, two ways to decrease the spectrum of a video: to increase the lowest quality levels, or to close gaps in the video, i.e., to reduce the number of time slots t for which s_t is nonzero. The latter is not captured by simply using layer levels as priorities. Therefore, in contrast to U-LLF, we now use a prioritization of the missing segments that also takes the closing of gaps into account. We do so by first sorting the segments according to the length of the gap they belong to and then using their layer levels for further sorting (see Fig. 13).

Fig. 13. U-SG-LLF operation.

6) Unrestricted Lowest Layer Shortest Gap First (U-LL-SGF): Since it is by no means clear which sorting criterion (i.e., gap length or layer level) should be given precedence, we also tried the variant where missing segments are first sorted by their layer level and then sorted further by gap length. Comparing Figs. 14 and 13 shows that this heuristic can result in a different retransmission schedule (a compact sketch of both sort orders follows).

Fig. 14. U-LL-SGF operation.
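The two prioritizations can be stated compactly as sort orders over the missing segments. The following is a minimal sketch under our reading of the heuristics; the record layout and the tie-breaking by playback order are our own assumptions.

#include <algorithm>
#include <vector>

// Illustrative record for one missing segment: the layer it belongs to,
// its time slot, and the length of the gap (run of missing segments in
// that layer) that contains it.
struct Missing {
    int layer;   // 0 = lowest (most important) layer
    int slot;    // time slot of the segment
    int gapLen;  // length of the enclosing gap in this layer
};

// U-SG-LLF: shortest gap first, then lowest layer, then playback order.
void sortUSGLLF(std::vector<Missing>& m) {
    std::sort(m.begin(), m.end(), [](const Missing& a, const Missing& b) {
        if (a.gapLen != b.gapLen) return a.gapLen < b.gapLen;
        if (a.layer != b.layer) return a.layer < b.layer;
        return a.slot < b.slot;
    });
}

// U-LL-SGF: lowest layer first, then shortest gap, then playback order.
void sortULLSGF(std::vector<Missing>& m) {
    std::sort(m.begin(), m.end(), [](const Missing& a, const Missing& b) {
        if (a.layer != b.layer) return a.layer < b.layer;
        if (a.gapLen != b.gapLen) return a.gapLen < b.gapLen;
        return a.slot < b.slot;
    });
}

U-LLF corresponds to dropping the gap-length criterion altogether, i.e., sorting by layer level (and playback order) only.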
D. Simulations

In order to compare the different retransmission scheduling algorithms from the previous section and to investigate their dependency upon different parameters, we performed a number of experiments based on a custom simulation environment (implemented in C++). For each simulation, an instance of a layer-encoded video on the proxy cache is randomly generated. We modeled such a layer-encoded video instance as a simple finite birth-death process, since it results from a congestion-controlled video transmission, which restricts state transitions to direct neighbor states. The state space is the set of possible quality levels (numbers of layers), and the birth and death rates are chosen to be equal for all states, resulting in a mean length of three time units for periods with a stable quality level³. We use a discrete simulation time where one unit of time corresponds to the transmission time of a single segment. In Fig. 15, an example video instance generated in this way is given.

Fig. 15. Randomly generated layer-encoded video on the cache.

During the simulations, the spectrum (as defined in Section V-A2) of the cached video instances is continuously calculated, and the different algorithms' performance is compared for given parameters such as the available bandwidth. We assume a constantly available bandwidth, i.e., b(t) = b for all t, where b is the overall retransmission capacity for the video. This is certainly a simplifying assumption, but one that does not affect the analysis of the algorithms. We first performed a series of 1000 simulations with all three retransmission scheduling algorithms from Section V-C, where all parameters were chosen to be identical. This large sample ensured that the 95% confidence interval lengths for the spectrum values were less than 0.5% of the absolute spectrum values for all heuristics. The results for the evolution of the spectrum values for the different algorithms are shown in Fig. 16. These results indicate that there is a significant gain with respect to the spectrum and, therefore, the perceived quality of the cached video for the unrestricted retransmission algorithms. Of all the algorithms, U-SG-LLF performed best, decreasing the spectrum by an average of 186 and 88 in comparison with U-LLF and U-LL-SGF, respectively. The results from Table I show that even a small spectrum decrease can improve the perceived quality. Thus, we assume that an improvement of the spectrum as achieved by U-SG-LLF leads to a significantly better perceived quality than U-LL-SGF or, especially, U-LLF.

In the following, we investigate the heuristics' dependencies on certain parameters. For all of these simulations, we used only the U-SG-LLF heuristic, since it showed the best performance of all heuristics in the experiment of the preceding section.

³The parameter choice is rather arbitrary. However, simulations with other values showed no significant impact on our results.
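For illustration, a sketch of how such a random instance could be generated. The exact birth and death rates are not given in the paper; the probabilities below are assumptions chosen so that the total step probability per slot is 1/3, giving stable periods of mean length three time units.

#include <random>
#include <vector>

// Generate a random layer-encoded video instance as a finite birth-death
// process: in each time slot the quality level either stays or moves to a
// direct neighbor state. Equal (assumed) birth/death probabilities of 1/6
// give a total step probability of 1/3 per slot.
std::vector<int> generateInstance(int slots, int maxLayers,
                                  unsigned seed = 42) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> u(0.0, 1.0);
    std::vector<int> z(slots);
    int level = maxLayers / 2; // arbitrary starting state
    for (int t = 0; t < slots; ++t) {
        const double r = u(rng);
        if (r < 1.0 / 6.0 && level < maxLayers) ++level; // birth
        else if (r < 1.0 / 3.0 && level > 0) --level;    // death
        z[t] = level;
    }
    return z;
}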
Fig. 16. Average spectrum of 1000 simulation runs for each heuristic (ten layers, retransmission bandwidth = 1).
Fig. 17. Different number of layers.
Fig. 18. Different amounts of available retransmission bandwidth (ten layers).
1) Number of Layers: For this simulation, we varied the number of layers per cached video to be five, ten, or 20. Increasing the number of layers in this case does not increase the maximum bandwidth of the layer-encoded video; instead, the bandwidth of each single layer is decreased. For each of these three alternatives, we ran 1000 simulations and again calculated the average of the spectra over time. As Fig. 17 shows, the spectrum converges for each of the three alternatives. Yet, the higher the number of layers, the higher the average spectrum. This is intuitive: the more fine-grained the layered encoding, the more variations may be introduced during a congestion-controlled transmission, and the harder it is for the retransmission scheduling to smooth these variations out. The final average spectra for the ten- and 20-layer cases are higher because the number of segments that could be retransmitted is lower due to a reduced relative bandwidth. This simulation shows that one retransmission phase might not always be sufficient to obtain full quality for the cached content.
2) Available Retransmission Bandwidth: In the next set of simulations, the effect of different amounts of available retransmission bandwidth on the performance of U-SG-LLF was investigated. Not surprisingly, the spectrum converges faster with a higher available retransmission bandwidth (as shown in Fig. 18). The very similar spectrum curves for the two highest bandwidth settings result from the retransmission bandwidth being sufficient in both cases to retransmit all missing parts of the cached video; thus, the video is stored on the cache in full quality.

VI. CONCLUSIONS
Recent work has shown that layer-encoded video is a technique that supports adaptive streaming well. High scalability for VoD in the Internet can be achieved by a distributed caching architecture. Our Scalable Adaptive Streaming approach combines caching and adaptive streaming and promises a scalable, TCP-friendly TVoD system. In this article, we focused on the problem of how to deal with retransmissions of missing segments of a cached layer-encoded video in order to meet users' demand to watch high-quality video with relatively little quality variation.

We conducted a subjective assessment of variations in layer-encoded video with the goal of assessing the appropriateness of existing quality metrics. This investigation mostly validates the assumptions that were made in relation to layer variations and the perceived quality of a video:
• The frequency of variations should be kept as small as possible.
• If a variation cannot be avoided, its amplitude should be kept as small as possible.
We showed that adding information at different locations can have a substantial effect on the perceived quality. This knowledge was applied to the development of our heuristics for retransmission scheduling. We saw that the perceived quality of a layer-encoded video is more likely to improve if the lowest quality level is increased and gaps in lower layers are filled. The comparison of the spectrum with the subjective assessment and the PSNR strongly supports our claim that the spectrum is a more suitable objective quality metric than the PSNR. Another interesting outcome of the experiment is that the results obtained can additionally be used to refine caching replacement policies that operate on a layer level [5], as well as layered multicast transmission schemes that try to offer heterogeneous services to different subscribers, as, for example, the receiver-driven layered multicast (RLM) [27] scheme and its derivatives.

Based on the subjective assessment, we developed and compared different retransmission scheduling algorithms from the general class of unrestricted priority-based heuristics to tackle the problem of retransmitting missing segments of a layer-encoded video into caches. Our simulation results clearly show that these algorithms improve the quality of cached video. The insights gathered from
the simulative experiments encourage future work which integrates adaptive streaming by layered video encodings into our VoD system. This will allow the verification of our simulation results for the retransmission scheduling algorithms in realistic scenarios. Furthermore, from the algorithmic perspective, we will investigate how the decision to cache a certain video and its retransmissions can be integrated with each other, and their respective effects on each other.

ACKNOWLEDGMENT

The authors would like to thank R. Tunk and O. Künzel for their support in conducting the assessment, C. "B." Krasic for his support on SPEG, the test candidates for taking the time to perform the assessment, RTL Television for providing the video sequences, and the anonymous reviewers for their valuable comments.

REFERENCES

[1] T. D. Little and D. Venkatesh, "Prospects for interactive video-on-demand," IEEE Multimedia, vol. 1, no. 3, pp. 14–25, May 1994.
[2] C. Griwodz, "Wide-area true video-on-demand by a decentralized cache-based distribution infrastructure," Ph.D. dissertation, Darmstadt Univ. Technol., Darmstadt, Germany, 2000.
[3] C. Griwodz, M. Bär, and L. C. Wolf, "Long-term movie popularity in video-on-demand systems," in Proc. ACM Multimedia'97, Nov. 1997, pp. 340–357.
[4] R. Braden, D. Clark, and S. Shenker, "RFC 1633–Integrated services in the internet architecture: An overview," Informational RFC, Jun. 1994.
[5] J.-Y. Lee, T.-H. Kim, and S.-J. Ko, "Motion prediction based on temporal layering for layered video coding," in Proc. ITC-CSCC'98, Jul. 1998, pp. 245–248.
[6] D. Saparilla and K. W. Ross, "Optimal streaming of layered video," in Proc. IEEE INFOCOM'00, Tel-Aviv, Israel, Mar. 2000, pp. 737–746.
[7] S. Nelakuditi, R. R. Harinath, E. Kusmierek, and Z.-L. Zhang, "Providing smoother quality layered video stream," in Proc. 10th Int. Workshop on Network and Operating System Support for Digital Audio and Video (NOSSDAV), Raleigh, NC, Jun. 2000.
[8] R. Rejaie, M. Handley, and D. Estrin, "Quality adaptation for congestion controlled video playback over the internet," in Proc. ACM SIGCOMM'99, New York, Aug. 1999, pp. 189–200.
[9] R. Rejaie, H. Yu, M. Handley, and D. Estrin, "Multimedia proxy caching for quality adaptive streaming applications in the internet," in Proc. IEEE INFOCOM'00, Tel-Aviv, Israel, Mar. 2000, pp. 980–989.
[10] R. Rejaie and J. Kangasharju, "Mocha: A quality adaptive multimedia proxy cache for internet streaming," in Proc. 11th Int. Workshop on Network and Operating System Support for Digital Audio and Video (NOSSDAV'01), Port Jefferson, NY, Jun. 2001, pp. 3–10.
[11] M. Zink, J. Schmitt, and R. Steinmetz, "Retransmission scheduling in layered video caches," in Proc. Int. Conf. Communications (ICC'02), New York, Apr. 2002.
[12] M. Zink, C. Griwodz, A. Jonas, and R. Steinmetz, "LC-RTP (loss collection RTP): Reliability for video caching in the internet," in Proc. 7th Int. Conf. Parallel and Distributed Systems: Workshops, Jul. 2000, pp. 281–286.
[13] R. Rejaie, M. Handley, and D. Estrin, "RAP: An end-to-end rate-based congestion control mechanism for realtime streams in the internet," in Proc. IEEE INFOCOM'99, New York, Mar. 1999, pp. 395–399.
[14] S. Floyd, M. Handley, J. Padhye, and J. Widmer, "Equation-based congestion control for unicast applications," in Proc. ACM SIGCOMM'00, Stockholm, Sweden, Aug. 2000, pp. 43–56.
[15] W.-T. Tan and A. Zakhor, "Real-time internet video using error resilient scalable compression and TCP-friendly transport protocol," IEEE Trans. Multimedia, vol. 1, no. 2, pp. 172–186, Jun. 1999.
[16] Methodology for the Subjective Assessment of the Quality of Television Pictures, Int. Std. ITU-R BT.500-10, 2000.
[17] R. Aldridge, J. Davidoff, M. Ghanbari, D. Hands, and D. Pearson, "Measurement of scene-dependent quality variations in digitally coded television pictures," Proc. Inst. Elect. Eng., Vis., Image, Signal Process., vol. 142, no. 3, pp. 149–154, 1995.
[18] R. Aldridge, D. Hands, D. Pearson, and N. Lodge, "Continuous quality assessment of digitally-coded television pictures," Proc. Inst. Elect. Eng., Vis., Image, Signal Process., vol. 145, no. 2, pp. 116–123, 1998.
[19] F. Pereira and T. Alpert, "MPEG-4 video subjective test procedures and results," IEEE Trans. Circuits Syst. Video Technol., vol. 7, no. 1, pp. 32–51, Feb. 1997.
[20] C. Kuhmünch and C. Schremmer, "Empirical evaluation of layered video coding schemes," in Proc. IEEE Int. Conf. Image Processing (ICIP), Thessaloniki, Greece, Oct. 2001, pp. 1013–1016.
[21] T. Hayashi, S. Yamasaki, N. Morita, H. Aida, M. Takeichi, and N. Doi, "Effects of IP packet loss and picture frame reduction on MPEG1 subjective quality," in Proc. 3rd Workshop on Multimedia Signal Processing, Copenhagen, Denmark, Sep. 1999, pp. 515–520.
[22] S. Gringeri, R. Egorov, K. Shuaib, A. Lewis, and B. Basch, "Robust compression and transmission of MPEG-4 video," in Proc. ACM Multimedia Conf., Orlando, FL, Oct. 1999, pp. 113–120.
[23] S. Lavington, N. Dewhurst, and M. Ghanbari, "The performance of layered video over an IP network," Signal Process.: Image Commun., vol. 16, no. 8, pp. 785–794, 2001.
[24] C. Krasic and J. Walpole, "Priority-progress streaming for quality-adaptive multimedia," in Proc. ACM Multimedia Doctoral Symp., Ottawa, ON, Canada, Oct. 2001.
[25] M. Zink, O. Künzel, J. Schmitt, and R. Steinmetz, "Subjective impression of variations in layer encoded videos," in Proc. 11th Int. Workshop on Quality of Service (IWQoS 2003), Monterey, CA, Jun. 2003.
[26] R. Neff and A. Zakhor, "Matching pursuit video coding–Part I: Dictionary approximation," IEEE Trans. Circuits Syst. Video Technol., vol. 12, no. 1, pp. 13–26, Jan. 2002.
[27] J. Hartung, A. Jacquin, J. Pawlyk, and K. Shipley, "A real-time scalable software video codec for collaborative applications over packet networks," in Proc. ACM Multimedia Conf., Bristol, U.K., Sep. 1998, pp. 419–426.
[28] L. Vicisano, L. Rizzo, and J. Crowcroft, "TCP-like congestion control for layered multicast data transfer," in Proc. IEEE INFOCOM'98, Mar. 1998, pp. 996–1003.
[29] M. Zink, C. Griwodz, J. Schmitt, and R. Steinmetz, "Exploiting the fair share to smoothly transport layered encoded video into proxy caches," in Proc. SPIE/ACM Conf. Multimedia Computing and Networking (MMCN), San Jose, CA, Jan. 2002, pp. 61–72.
[30] M. Garey and D. Johnson, Computers and Intractability. San Francisco, CA: W. H. Freeman, 1979.
Michael Zink received the Diploma (M.Sc.) degree in 1997 and the Dr.-Ing. (Ph.D.) degree in 2003, both from Darmstadt University of Technology, Germany. His thesis was on “Scalable Internet Video-on-Demand Systems.” He is currently a Postdoctoral Fellow in the Computer Science Department, University of Massachusetts, Amherst. Previously, he was a Researcher at the Multimedia Communications Lab, Darmstadt University of Technology. He works in the fields of sensor networks and distribution networks for high bandwidth data. Further research interests are in wide-area multimedia distribution for wired and wireless environments and network protocols. He is one of the developers of the KOMSSYS streaming platform. From 1997 to 1998, he was a Guest Researcher at the National Institute of Standards and Technology (NIST), Gaithersburg, MD, where he developed an MPLS testbed.
Jens Schmitt received the Dipl. (Master's) degree in Joint Business and Computer Sciences from the University of Mannheim, Germany, in 1996. In 1994, during a stay at the University of Wales, Swansea, U.K., he also received the European Master of Business Sciences degree. In 2000, he received the Dr.-Ing. (Ph.D.) degree from the Darmstadt University of Technology, Germany. His thesis was on "Heterogeneous Network Quality of Service Systems." He is a Professor in the Computer Science Department, University of Kaiserslautern, Germany, where he heads the Distributed Computer Systems Lab (DISCO). Previously, he was Research Group Leader of the Multimedia Distribution and Networking group in the Multimedia Communications Lab (KOM), Darmstadt University of Technology. He works in the fields of QoS provisioning in distributed systems, in particular in heterogeneous network scenarios, QoS for mobile communications, and scalable distribution of multimedia content with an emphasis on high-availability systems. Further research interests are in network traffic modelling, real-time scheduling, and evolutionary algorithms.
Ralf Steinmetz (S'83–M'86–SM'93–F'99) worked for over nine years in industrial research and development of distributed multimedia systems and applications. Since 1996, he has been head of the Multimedia Communications Lab at Darmstadt University of Technology, Germany. From 1997 to 2001, he directed the Fraunhofer (former GMD) Integrated Publishing Systems Institute IPSI, Darmstadt. In 1999, he founded the Hessian Telemedia Technology Competence Center (httc e.V.). His thematic focus in research and teaching is on multimedia communications, with his vision of real "seamless multimedia communications." He has over 200 refereed publications and became an ICCC Governor in 1999. Prof. Steinmetz is a Fellow of the ACM (2002).