An Evaluation of Bitrate Adaptation Methods for HTTP Live Streaming

IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS


Truong Cong Thang, Member, IEEE, Hung T. Le, Student Member, IEEE, Anh T. Pham, Senior Member, IEEE, and Yong Man Ro, Senior Member, IEEE

Abstract—HTTP streaming has become a cost-effective means for multimedia delivery nowadays. For adaptivity to networks and terminals, a provider should generate multiple representations of an original video as well as the related metadata. Recently, there have been various adaptation methods to support adaptive HTTP streaming. In this paper, we investigate typical adaptation methods in the context of live video streaming. We first discuss the trade-off among typical adaptation methods. The evaluation and comparison are then carried out not only in terms of bitrate and buffer behaviors but also in terms of the perceptual impact on end users. It is found that the perceptual impact depends not only on the adaptation method but also on the content itself. We also show that the preparation of representation sets may affect the behaviors of some adaptation methods.

Index Terms—HTTP streaming, adaptivity, DASH.

Manuscript received ...; approved for publication by ...
Truong Cong Thang, Hung T. Le, and Anh T. Pham are with the University of Aizu, Aizu-Wakamatsu, Japan, email: {thang, m5171120, pham}@uaizu.ac.jp. Yong Man Ro is with Korea Advanced Institute of Science and Technology, Korea, email: [email protected].

I. INTRODUCTION

Thanks to the abundance of Web platforms and broadband connections, HTTP streaming has become cost-effective in delivering multimedia content [1][2]. Besides, due to the heterogeneity of today's communication networks, adaptivity is the most important requirement for any streaming client. In particular, TCP, the underlying layer of HTTP, is notorious for its throughput fluctuations [3]. For adaptivity to networks and terminal capabilities, an HTTP streaming provider should generate multiple representations (or versions) of an original video as well as the metadata that contains the characteristics of the representations (such as bitrate, resolution, etc.) [1]. Based on the metadata and the status of terminals/networks, the client decides which media parts to download and when.

To cope with throughput fluctuations in video streaming, a client should buffer some amount of video data before it can start playing [2][4]. Obviously, if the amount of buffered data is large, the client can better cope with future fluctuations. However, this initial buffering delay (sometimes up to tens of seconds) may negatively affect the quality of experience (especially for live streaming) [4]. For on-demand streaming, a good and simple strategy is to use a somewhat low bitrate, a fast start-up, and especially a very large buffer size (e.g. Apple's approach [5][6]). For live streaming, even if one tries to use a low bitrate, the amount of buffered media is still limited because 1) the initial buffering should not be long and

2) the client can download only the segments that have been generated (from a live video source). It should be noted that, in the following, the terms "buffer size", "initial buffering", and "buffer level" are all measured in time.

In this paper, we focus on the case of live streaming, where bitrate adaptation methods are crucial to support a smooth presentation while maintaining a small buffer size. Recently, various adaptation methods have been proposed (e.g. [2][6]–[10]). However, these methods are evaluated in very different settings, and no prior work has discussed them in the same context. In this paper, we evaluate some typical adaptation methods in terms of bitrate, buffer, and perceptual distortion. We also evaluate how different sets of representations affect the behaviors of adaptation methods. It should be noted that, in previous studies, the set of representations was selected in ad-hoc manners (e.g. equally-spaced bitrates [2][8], heuristically-selected bitrates [5][6]). To the best of our knowledge, there have been only some evaluations of commercial clients [5][6][10], and these studies mostly focus on the on-demand streaming scenario.

The paper is organized as follows. In Section II, we present an overview of the adaptation problem and quality metrics for evaluation. Section III provides a brief review and classification of existing adaptation methods, together with the experimental setup. In Section IV, the performance of different adaptation methods is evaluated in terms of bitrate/buffer behaviors and perceptual distortion. In Section V, we investigate how different sets of representations affect the operations of the adaptation methods. Finally, discussions and conclusions are provided in Section VI and Section VII.

II. BITRATE ADAPTATION FOR VIDEO STREAMING

A. Problem Description

To enable interoperability in the industry, a new standard called Dynamic Adaptive Streaming over HTTP (DASH) has been developed by MPEG [11][12]. In MPEG DASH's terminology, content may be composed of one or more content components (e.g. video, audio). A long content item can be divided into one or more temporal chapters (or periods). Representations having some common characteristics (e.g. the same content component) are grouped into an adaptation set. Further, each representation is divided into media segments. In most cases, for each request from the client, the server sends one segment, so media are delivered by a sequence of HTTP request-response transactions.

HTTP streaming can be applied to both on-demand streaming and live streaming. The main difference between these


two cases is the generation time of media segments. In the on-demand case, all media segments are generated well in advance. In contrast, in the live case, a media segment is generated only after the corresponding content interval is fully captured. So, in live streaming, the time distance between the requests of two consecutive segments is approximately the duration of the first segment. That is, if segments have the same duration of ds seconds, the distance between requests will be ds as well.

The term "initial buffering" in the following means the length (in seconds) of media needed in the buffer before playout can start, and "target buffer level" means the buffer level that the client tries to maintain throughout a given session. In live streaming, the "initial buffering" is equal to the "target buffer level" and the "buffer size". Obviously, the length of buffered media cannot be higher than the target level.

A general description of the video bitrate adaptation problem has been provided in our previous work [13]. Adaptation methods can be classified by different factors, such as the number of streams and the number of users. In this paper, we focus on adaptation methods for the single-stream case. In this part, we present the adaptation problem, highlighting the relationship of throughput, bitrate, and buffer level. We first have the following definitions (Fig. 1) [14]:
- Arrival curve: the accumulated data size received by the client at a given time instant.
- Playout curve: the accumulated data size consumed by the client at a given time instant.


Fig. 1. Illustration of arrival curve, playout curve, and the trellis of possible bitrates. The horizontal distance between the arrival curve and playout curve is the buffer level. Each segment interval is equal to a segment duration.
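As a concrete illustration (our own sketch, not the paper's implementation), the per-segment buffer bookkeeping implied by Fig. 1 can be written as follows; all function names and values are illustrative.

```python
# Sketch of the buffer-level evolution implied by Fig. 1:
# each downloaded segment adds seg_dur seconds of media, while the
# download itself consumes (size / throughput) seconds of playout.

def simulate_buffer(bitrates_kbps, throughputs_kbps, seg_dur=2.0, init_buffer=20.0):
    """Return buffer levels (in seconds) after each segment download.

    bitrates_kbps[i]    -- encoded bitrate chosen for segment i
    throughputs_kbps[i] -- average throughput while downloading segment i
    """
    buffer_level = init_buffer
    levels = []
    for rate, thr in zip(bitrates_kbps, throughputs_kbps):
        download_time = rate * seg_dur / thr     # segment size / throughput
        # seg_dur seconds of media arrive; download_time seconds are played out
        buffer_level = max(0.0, buffer_level + seg_dur - download_time)
        levels.append(buffer_level)
    return levels

# If throughput equals the chosen bitrate, the buffer level stays flat:
print(simulate_buffer([1000, 1000], [1000, 1000]))  # -> [20.0, 20.0]
# If the bitrate is twice the throughput, each segment drains seg_dur seconds:
print(simulate_buffer([2000], [1000]))              # -> [18.0]
```

This is exactly the "average arrival rate vs. playout rate" relationship of the trellis: predicting the levels for a candidate bitrate path amounts to running this loop with the corresponding throughput estimates.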

The playout curve can be represented by linear sections, of which the slopes are the bitrates of the media segments. Each segment interval in Fig. 1 is equal to a segment duration. Due to the fluctuations of instant throughput, the arrival curve actually contains non-linear sections. However, for simplicity, we just consider the points right after receiving a media segment. The arrival curve is then composed of linear sections connecting these points. Suppose that the client starts receiving data at t_r0 and starts consuming data from the receiving buffer at t_p0. The initial buffering is then (t_p0 − t_r0) seconds. Essentially, the horizontal distance between the arrival curve and the playout curve is the buffer level in time of the client. If the average arrival rate (or throughput) is equal to the playout rate, the buffer level will be stable.

Obviously, the appropriate bitrate to be requested (and then played) for each segment interval i should depend on the throughput and buffer level at that time. Suppose that the client has at most K bitrate (or segment) options for each segment interval i. If bitrate R_i^k (0 ≤ k < K) is selected, the arrival curve is extended by a line section whose slope is the average throughput and whose vertical length is the data size of that segment. Also, the playout curve is extended by a line section whose slope is R_i^k. So, the decision problem can be represented as a trellis containing all possible paths of bitrates for a number of segment intervals. Any path with sections of decreasing slopes corresponds to a series of decreasing bitrates to be considered. With this representation, we can predict the buffer level in the near future according to the segments' bitrates and throughput estimates. If the current buffer level is large enough, a good path can be found to meet different criteria, e.g. the buffer level must stay above a minimum threshold and/or the bitrate variance of the selected path must be the smallest. However, if the current buffer level is very low, the client should aggressively switch to a bitrate that is at least smaller than the expected throughput.

B. Perceptual Evaluation of Video Adaptation

In this paper, the evaluation of adaptation methods is based not only on bitrate and buffer variations, but also on the perceptual impact on end users. As discussed in [13], there are different types of perceptual quality metrics for video adaptation. Usually, MOS (Mean Opinion Score) is used to evaluate the overall quality of a whole content item or session [15]. This metric is used in [16] to improve the total utility of multiple users and in [2] for audiovisual content streaming.
Because the connection bandwidth fluctuates strongly, the video bitrate (and thus quality) should be adjusted quickly. In this context, it is important to know how changes of bitrate (and quality) impact end users' perception. For that purpose, another type of perceptual quality metric, the Just Noticeable Difference (JND) [17][18], is used in this study. Theoretically, the concept of JND indicates the minimum perceptual change/difference between two stimuli that can be detected by a human being. Suppose that x_i and x_j are the physical intensities of stimulus i and stimulus j. When a person perceives stimulus i, the "perceptual intensity" Ψ(x_i) is the actual intensity judged by that person. When intensities x_i and x_j are sufficiently far apart, a person can observe a "perceptual difference" between stimuli i and j, which is

J = Ψ(x_i) − Ψ(x_j).   (1)

It should be noted that, in video evaluation, the intensity of a stimulus is in fact its perceptual quality. Now let us denote by p the probability that a perceptual difference between two stimuli is detected. According to Thurstone's "law of comparative judgment" [17], the probability that J is correctly identified is:

p = C[J / √2]   (2)
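As an illustration (our own, not from the paper), Eq. (2) can be inverted numerically to convert a forced-choice detection probability into JND units. Normalizing so that p = 0.75 maps to exactly one JND unit follows the definition in [19]; that normalization, and the function name, are our assumptions.

```python
from statistics import NormalDist

def jnd_units(p):
    """Perceptual difference (in JND units) from forced-choice probability p.

    Inverts Eq. (2), p = C[J / sqrt(2)], then normalizes so that p = 0.75
    maps to exactly 1 JND unit (the 75% convention of [19]). This
    normalization is an illustrative choice, not taken from the paper.
    """
    raw = 2 ** 0.5 * NormalDist().inv_cdf(p)      # J recovered from Eq. (2)
    unit = 2 ** 0.5 * NormalDist().inv_cdf(0.75)  # J at the 75% point
    return raw / unit

print(round(jnd_units(0.75), 3))  # -> 1.0
```

At p = 0.5 (pure guessing) the function returns 0, and it grows monotonically as the two stimuli become easier to tell apart.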


where C[.] is the cumulative normal distribution function. With this relationship, obviously when J becomes larger, p increases toward the maximum value of 1. Usually, the perceptual difference is identified by forced-choice subjective tests, in which observers are repeatedly shown a pair of stimuli and asked to select the one with higher (or lower) quality. The above formulation suggests that the probability p obtained from subjective tests can be used to derive the corresponding JND value. By definition [19], J equal to one JND unit corresponds to a probability p equal to 75%. That is, if 75% of the selections are for the better stimulus and 25% for the other one in a forced-choice test, the perceptual difference between the two stimuli is one JND unit. As JND is an experimental threshold, it should be noted that a difference of 1 JND unit in practice is "essentially undistinguishable", and even a difference of 3 JND units is still "not obviously different" [19].

In [17] we have evaluated the perceptual distortion (in JND units) of down-scaled video versions. Fig. 2 shows the perceptual distortion w.r.t. the normalized bitrate for different video sequences. For each sequence, the normalized bitrate of a version is obtained by dividing its bitrate by the bitrate of the highest-quality version. We can see that the perceptual distortion increases mostly in the low bitrate ranges. Depending on the sequence, the original video bitrate can be reduced by 50% without causing much perceptual difference (less than 1.5 JND units). From Fig. 2, we can generally classify Harbour and Mobile into the slow-motion group and Football and Soccer into the fast-motion group. In the following sections, the Football and Harbour videos will be used as our test videos.

Fig. 2. Perceptual distortion vs. normalized bitrate. The distortion is strongly dependent on the content [17].

III. OVERVIEW OF ADAPTATION METHODS AND EXPERIMENTAL SETUP

In this section, we classify typical adaptation methods and discuss their key features. Adaptation methods can be roughly divided into 1) a throughput-based group and 2) a buffer-based group. This is a rough division because the buffer level is strongly affected by the throughput as discussed above. Also, the throughput can be used directly in buffer-based methods to switch the video bitrate, so the buffer-based group could also be considered a hybrid group. Finally, based on the discussion, five typical methods are selected for evaluation.

A. Throughput-based methods

The throughput-based methods decide the bitrate based on the estimated throughput only. The key differences between these methods are the ways to 1) estimate and 2) use the throughput. In the following, the segment throughput [2] is computed as the ratio of a given segment's data size to the delivery duration of that segment. The delivery duration is from the instant of sending the HTTP request to the instant of receiving the last byte of the HTTP response containing the segment. In the simplest way, the measured segment throughput T(i) of the last segment interval i (called the instant throughput) is used as the throughput estimate T^e(i+1) of the next segment interval i+1 [2], i.e.

T^e(i+1) = T(i).   (3)

However, due to short-term fluctuations, the throughput estimate obtained in this way is highly fluctuating. A popular solution to cope with short-term fluctuations is to use a "smoothed" throughput measure T_s(i) [2][10]:

T^e(i+1) = T_s(i) = (1 − δ) × T_s(i−1) + δ × T(i)   (4)

where T_s(i) is the smoothed throughput measure of segment interval i and δ is a weighting value. However, the disadvantage of the smoothed throughput is that it causes a late reaction of the client to a large throughput decrease, which in turn must be handled by having a large buffer. In [2], we propose a method that has the advantages of both the instant and smoothed throughput measures. Further, the throughput estimate can be computed from sampled throughput values and RTTs as in [20]. A more detailed discussion of these throughput estimates can be found in [2]. In addition, a throughput estimate can be obtained by other means, e.g. based on probing or on stored data (a lookup table) [21].

Once the client obtains the throughput estimate, the bitrate can be decided in various ways. A simple solution is to use a safety margin µ to compute the bitrate R^e(i+1) for the next segment interval i+1 [2][7]:

R^e(i+1) = (1 − µ) × T^e(i+1)   (5)

where µ usually takes a small value in the range [0, 0.5]. In this way, the smoothness of the bitrate depends on the smoothness of the throughput estimate. If (3) and (5) are used to decide the bitrate, we refer to the adaptation method as the instant throughput based (ITB) method; if (4) and (5) are used, the adaptation method is called the smoothed throughput based (STB) method. In [8], the bitrate is controlled by a TCP-like mechanism where a measure proportional to the instant throughput is used as the key input. When the throughput goes down, the bitrate is immediately reduced; when the throughput goes up, the bitrate is slowly increased by a complex checking mechanism. Due to this fact, we refer to this method as the conservative throughput based (CTB) method. In [22], fuzzy logic control is employed, where a set of linguistic rules is built to support the bitrate decision. The problems with this method are that 1) it is difficult to define a set of linguistic rules and 2) the


TABLE I
COMPARISON OF DIFFERENT BUFFER-BASED ADAPTATION METHODS

Buffer level | Method of [9] | Method of [10] | Method of [6]
B3 ∼ Bmax | Switching up to the next higher bitrate if this bitrate is below the previous instant throughput | Switching up to the next higher bitrate if this bitrate is below the previous smoothed throughput | Bitrate ≤ previous instant throughput times an up-scaling factor
B2 ∼ B3 | Maintaining the bitrate | Maintaining the bitrate if the throughput goes down | Bitrate ≤ previous instant throughput
B1 ∼ B2 | Switching down to the next lower bitrate if the existing bitrate is higher than the previous instant throughput | Switching to the next lower (higher) bitrate if the existing bitrate is higher (lower) than the previous instant throughput | Bitrate ≤ previous instant throughput times 0.5
0 ∼ B1 | Jumping to the lowest bitrate | Jumping to the lowest bitrate | Bitrate ≤ previous instant throughput times 0.3
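As an illustrative sketch (our own, with hypothetical names and threshold values), the rules in the first column of Table I (the method of [9]) can be expressed as a simple threshold function over the available representation bitrates.

```python
# Sketch of the buffer-threshold rules of [9] (first column of Table I).
# 'rates' is the sorted list of available representation bitrates (kbps);
# thresholds B1 < B2 < B3 are in seconds. Names and values are illustrative.

def tbb_next_bitrate(rates, cur, buffer_level, last_throughput,
                     B1=5.0, B2=10.0, B3=20.0):
    i = rates.index(cur)
    if buffer_level >= B3:
        # switch up one step if the next higher bitrate is below the
        # previous instant throughput
        if i + 1 < len(rates) and rates[i + 1] < last_throughput:
            return rates[i + 1]
        return cur
    if buffer_level >= B2:
        return cur                      # maintain the current bitrate
    if buffer_level >= B1:
        # switch down one step if the current bitrate exceeds the throughput
        if cur > last_throughput and i > 0:
            return rates[i - 1]
        return cur
    return rates[0]                     # buffer critical: lowest bitrate

rates = [200, 400, 600, 800, 1000]
print(tbb_next_bitrate(rates, 600, 21.0, 900))  # switch up  -> 800
print(tbb_next_bitrate(rates, 600, 12.0, 300))  # maintain   -> 600
print(tbb_next_bitrate(rates, 600, 7.0, 300))   # step down  -> 400
print(tbb_next_bitrate(rates, 600, 3.0, 300))   # lowest     -> 200
```

The one-step-at-a-time switching in the middle ranges is what makes the bitrate of [9] smooth; only when the buffer drops below B1 does the client jump directly to the lowest representation.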

resulting bitrate, as seen in the experiment results of [22], fluctuates unnecessarily even though the instant throughput is very stable. In our opinion, methods to be considered for evaluation should not have 1) a bitrate that fluctuates in an unnecessary manner or 2) many control parameters or rules for correcting/tuning client behaviors.

B. Buffer-based methods

The buffer-based methods decide the bitrate mainly based on the buffer characteristics. As mentioned, these methods may take the throughput into account as well. In [9][10][6], the buffer is divided into multiple ranges, and different actions are applied when the buffer level stays in different ranges, as summarized in Table I. Here B1, B2, B3, Bmax (0 < B1 < B2 < B3 < Bmax) are the buffer thresholds. Note that the specific values of the buffer thresholds depend on the adaptation method. In our preliminary evaluation, [9] and [10] are similar and can maintain a smooth bitrate in on-demand streaming. The method of [9] seems more stable because the criterion to maintain the bitrate depends on a range of buffer levels (defined by thresholds B2 and B3), rather than on the smoothed throughput measure of [10]. With the method of [6], if the buffer level is about 35% ∼ 50% of the maximum level, the throughput estimate is the same as the previous throughput. If the buffer level drops into lower ranges, the throughput estimate is equal to the previous throughput multiplied by a down-scaling factor. So, this method is actually more aggressive than the instant throughput based method, where the throughput estimate is simply equal to the previous segment throughput.

Based on the trellis discussed in Section II, we proposed in [14] a heuristic method to build a path of N sections with roughly equal bitrate changes (or slope differences on the trellis) for N future segment intervals. As mentioned in Section II, the buffer levels corresponding to a selected path can be estimated using the representation of Fig. 1. On the selected path, the buffer level in our method is allowed to decrease by at most a half of the target buffer level. The main goal is to avoid a large bitrate change when the throughput goes down while still guaranteeing that the buffer never underflows. If the buffer level is currently low, the aggressiveness of the method (which is the inverse of N) is increased. Specifically, the number of sections N in the path is determined as follows:

N = [R_buf × Ω + 1]   (6)

with

R_buf = (B_cur − B_min) / (B_max − B_min) if B_cur > B_min, and R_buf = 0 if B_cur ≤ B_min   (7)

where [x] is the integer part of x; B_min, B_max, and B_cur are the minimum buffer threshold, the buffer size, and the current buffer level; and Ω is an empirical value that controls the aggressiveness of the method. Currently, we set B_min = 0.5 × B_max and Ω = 6. The throughput estimate is obtained by the throughput estimation method of [2], which is stable under small fluctuations while responding quickly to sudden changes of the instant throughput. At any segment interval among these N intervals, if the throughput change is predicted to be more than 10%, a new path is built from that point with the new B_cur and throughput estimate. As described in Section II-A, this method considers the buffer level in the near future, so it will be referred to as the "future buffer based" (FBB) method.

In [23], the buffer level deviation and the instant throughput are employed as inputs of a proportional-integral (PI) controller for bitrate adaptation. This method is somewhat similar to [6] in that a deviation-derived factor is multiplied with the throughput to decide the bitrate. The smoothness of this method is achieved by delaying the switching-up operation. However, the problem of this method is that the PI controller has some tuning parameters which are not easy to set.

C. Selected Adaptation Methods and Experimental Setup

Because showing the results for all existing methods would make the comparison very complicated, we present only some typical methods that span the problem space of bitrate adaptation in live streaming, i.e. from the most aggressive to the most conservative. The following five methods are considered in our evaluation.
- Instant throughput based (ITB) method, which decides the bitrate based on the instant throughput with a small safety margin (Eqs. (3) and (5)) as described in Section III-A. This method is also called the aggressive method in [2][7].
- Smoothed throughput based (STB) method, which decides the bitrate based on the smoothed throughput measure with a small safety margin (Eqs. (4) and (5) with δ = 0.2).
- Conservative throughput based (CTB) method, which is the method of [8].
- Thresholded buffer based (TBB) method, which is the method of [9].
- Future buffer based (FBB) method, which is the method of [14] described above.
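The throughput-based decision rules (Eqs. (3)–(5)) and the FBB path-length rule (Eqs. (6)–(7)) can be sketched as follows. This is our own illustrative code, not the evaluated implementation; the quantization of the estimate to the nearest available representation is an assumption, and all names are hypothetical.

```python
# Illustrative sketch of Eqs. (3)-(7). All names/values are our own.

def smoothed_throughput(prev_smoothed, measured, delta=0.2):
    """Eq. (4): exponentially weighted throughput estimate (kbps)."""
    return (1 - delta) * prev_smoothed + delta * measured

def decide_bitrate(rates, estimate, margin=0.05):
    """Eq. (5): highest representation not exceeding (1 - margin) * estimate."""
    budget = (1 - margin) * estimate
    candidates = [r for r in sorted(rates) if r <= budget]
    return candidates[-1] if candidates else min(rates)

def fbb_sections(b_cur, b_max, omega=6):
    """Eqs. (6)-(7): number of path sections N, with Bmin = 0.5 * Bmax."""
    b_min = 0.5 * b_max
    r_buf = (b_cur - b_min) / (b_max - b_min) if b_cur > b_min else 0.0
    return int(r_buf * omega + 1)       # [x] = integer part

rates = [200, 400, 600, 800, 1000]
# ITB uses the last instant throughput directly (Eq. (3)):
print(decide_bitrate(rates, 850))                             # -> 800
# STB smooths first (Eq. (4)); a drop from 1000 to 500 is damped to 900:
print(decide_bitrate(rates, smoothed_throughput(1000, 500)))  # -> 800
# A full buffer allows a long, smooth path; a low buffer forces N = 1:
print(fbb_sections(20.0, 20.0), fbb_sections(8.0, 20.0))      # -> 7 1
```

Note how the FBB rule degenerates to single-segment (ITB-like) decisions when B_cur ≤ B_min, which is exactly the aggressive behavior reported for small buffer sizes in Section IV.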


In this study, all the experiments with the above adaptation methods use the same experimental setup, as follows. The test-bed of our evaluation is similar to that of [2], which includes a client, an IP network, and a Web server. The server is an Apache Web server version 2.2.21 running on Ubuntu 11.10 (with default TCP CUBIC). For persistent connections, the server's timeout is set to 100s and MaxRequest to 0 (i.e. unlimited). The client is implemented in Java and runs on a Windows 7 Professional notebook with a 2.0GHz Core2Duo CPU and 2GB RAM. The DummyNet tool [24] is installed at the client to simulate network characteristics. The RTT value of DummyNet is set to 40ms. As random fluctuations of throughput would make it difficult to compare the results of different experiment runs, the loss rate of DummyNet is set to 0% for clear results. Here, we assume that the actual bandwidth trace used in the experiments already contains the fluctuations caused by packet loss. The safety margin µ (in Eq. (5)) used with the throughput estimate is 0.05.

Using the FFmpeg tool [25], video representations are encoded in constant bitrate (CBR) mode by the main-profile AVC (Advanced Video Coding) encoder. Each encoded video has a frame rate of 30fps and CIF resolution. All media segments have the same duration of 2s. As for the set of representations, we have two main contexts. In context #1, the representation bitrates are equally spaced by 200kbps, with the highest video bitrate being 3000kbps. In context #2, the representations are selected such that they are equally spaced on the JND axis. Context #1 is used in Section IV and context #2 in Section V.

IV. EVALUATION WITH REPRESENTATIONS OF EQUALLY-SPACED BITRATES

In this section, we carry out experiments to evaluate the adaptation methods mentioned above using context #1. We compare the methods first using a simple simulated bandwidth and then using an actual bandwidth trace. Note that, in the following figures, a bandwidth curve represents the theoretical capacity of the link at a given time; a throughput curve represents the segment throughput [2], measured after receiving a segment; and a bitrate curve represents the bitrates of the received segments provided by an adaptation method.

A. Simple Bandwidth Case

In bitrate adaptation for live streaming, one of the most important cases is when the throughput drops suddenly. As seen in the bandwidth trace used later, this event is common in wireless networks. Because the length of buffered media is limited, the client should react quickly to avoid buffer underflow. This is different from the throughput-increasing case, where the client can choose to switch up the bitrate quickly or gradually without worrying about the buffer. In this part, we compare different adaptation methods when the bandwidth suddenly drops as in Fig. 3(a). In fact, this is a special case where the behaviors of the different methods can be clearly explained. The results of this case are obtained under context #1 with a buffer size of 20s. The Harbour video is employed in this experiment. As this clip is very short (about 10s), it is looped until the end of a session. For the TBB method, the buffer thresholds (B1, B2, B3) are (5s, 10s, 20s). For live streaming, the client cannot have a buffer level that is higher than the target level or the initial buffering level. So, the optimal level and high level (B3) of TBB [9] are both set to 20s. For the STB method, Eq. (4) with δ = 0.2 is used to compute the smoothed throughput.

Figs. 3(a), 3(b), and 3(c) show the resulting bitrate curves, buffer level curves, and playout curves of the methods. From these figures, we can see the following behaviors.
- ITB method: this method obviously changes the bitrate aggressively according to the instant throughput, and thus the buffer is very stable.
- STB method: this method provides a gradual decrease of both bitrate and buffer level. The responsiveness of this method can be controlled by the value of δ. Note that this method will cause buffer underflow if the buffer size is reduced to 10s or less (as seen in Fig. 3(b)).
- CTB method: after a short delay, this method also reduces the bitrate quickly according to the dropped bandwidth. However, as shown later, it is very slow in jumping to a higher bitrate.
- FBB method: this method also reduces the bitrate gradually while guaranteeing that the buffer is not violated.
- TBB method: the behavior of this method can be understood by looking at its buffer. When the buffer level is still within 10s ∼ 20s, the client maintains the current bitrate. When the buffer level goes below 10s, the client consecutively switches to the next lower bitrate. Finally, when the buffer level is lower than 5s, the client switches to the lowest bitrate. This behavior can be found in [9][10] at the points of sudden bandwidth drop. Obviously, the bitrate of this method will decrease faster if the thresholds are increased and/or the bitrates of the representations are better spaced.

From the above example, we can see the trade-off between bitrate smoothness and buffer stability: the more quickly the bitrate is reduced, the lower the variation of the buffer level will be. In fact, all the other methods have a similar trade-off.

B. Complex Bandwidth Case

In this part, we compare the adaptation methods using a real bandwidth trace (Fig. 4) [6]. We consider three buffer sizes: 20s, 10s, and 6s. The settings of the buffer thresholds (B1, B2, B3) for the TBB method are (5s, 10s, 20s), (4s, 7s, 10s), and (3s, 4.5s, 6s) for buffer sizes of 20s, 10s, and 6s, respectively. Because sudden bitrate changes (also called switches) are negative to users, we will analyze the relative bitrate changes between consecutive segments. For the buffer, the deviation from the target buffer level will be described. Again, the Harbour video is employed in this experiment. For the sake of clarity, a throughput curve, rather than the complex bandwidth trace, is shown in the following figures.

All the selected methods, except the ITB method, do not quickly increase the bitrate when the bandwidth/throughput

IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS

6

Fig. 3. Behaviors of bitrate (a), buffer (b), and playout curve (c) of different adaptation methods under a drop of bandwidth


Fig. 4. Bandwidth trace employed in the evaluation [6]

goes up. Thus, we modify the ITB method so that the bitrate is switched up gradually (i.e. from a lower bitrate to the next higher bitrate) in the throughput-increasing case. This provides fairness in comparing the ITB method with the other methods.

First, we set the buffer size to 20s. Fig. 5(a) shows the bitrate curves of the three throughput-based methods, namely ITB, STB, and CTB. These three methods are independent of the buffer size, so they are not shown again in the figures for smaller buffer sizes. Fig. 5(b) shows the bitrate curves of the TBB and FBB methods. Note that the ITB method is included in this figure (and some later figures) for reference purposes. The buffer levels of all methods are shown in Fig. 5(c). It can be seen that the ITB method results in the most fluctuating bitrate curve. However, compared to the other methods, the buffer of this method has the smallest variation. The STB method provides a slightly smoother bitrate curve with larger buffer level variations, which will cause buffer underflow when the buffer size is 6s. The CTB method interestingly shows its conservativeness by maintaining a low and stable bitrate in most intervals. As seen in Fig. 5(b), the TBB method tries to maintain a bitrate as high as the buffer allows, which is contrary to the CTB method. The effectiveness of this strategy depends on whether the throughput will go up soon (e.g. during 300s ∼ 350s). In other intervals (e.g. 190s ∼ 240s) the throughput keeps going down, so the client has to switch suddenly to the lowest bitrate. The bitrate curve of the FBB method is smoother than those of the TBB and ITB methods.

Fig. 5. Adaptation results of different adaptation methods when the buffer size is 20s: (a) adapted bitrate of throughput-based methods; (b) adapted bitrate of buffer-based methods; (c) resulting buffer level

Figs. 6(a) and 6(b) depict the client behaviors when the buffer size is 10s. The bitrate curve of the FBB method looks more fluctuating than in Fig. 5(b), but it is still smoother than the curve of the ITB method. For the TBB method, the bitrate from 190s to 240s decreases more smoothly than before.

IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS


However, during 320s ∼ 350s, there is a new sudden bitrate drop. This suggests that 1) the results of the TBB method depend on the buffer size and 2) selecting good thresholds is a difficult task. We can see in Fig. 6(b) that the buffer levels of the FBB and ITB methods sometimes drop quickly (around 80s and 370s). This is due to the fact that any change of bandwidth is only detected after a time delay (usually a segment duration). So, if 1) the segment being requested or downloaded has a high bitrate and 2) the bandwidth suddenly drops at that time, it may take a long time to fully receive that segment. Obviously, after receiving that segment, the buffer level is significantly reduced. A solution to this problem is left for future work.
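The cost of this delayed detection can be quantified with a toy calculation: if the bandwidth collapses while a high-bitrate segment is in flight, the remaining bytes trickle in at the new rate while the buffer drains in real time. All numbers below are made up for illustration.

```python
# Time to finish downloading an in-flight segment when the bandwidth
# drops mid-download, and the resulting net change of the buffer level.

def download_time(segment_kbits, bw_before, bw_after, drop_at_s):
    """Seconds to fetch the segment if bandwidth drops at t = drop_at_s."""
    if segment_kbits / bw_before <= drop_at_s:  # finished before the drop
        return segment_kbits / bw_before
    remaining = segment_kbits - bw_before * drop_at_s
    return drop_at_s + remaining / bw_after

SD = 2.0                   # segment duration (s)
seg_kbits = 3000 * SD      # a 2-second segment encoded at 3000 kbps
t = download_time(seg_kbits, bw_before=3000, bw_after=500, drop_at_s=1.0)
net_buffer_change = SD - t  # 2s of media gained, t seconds played out meanwhile
```

Here the client only learns of the drop after 7s, by which time the buffer has lost a net 5s of media, which is the kind of sudden dip visible in the buffer curves.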

(a) Adapted bitrate

(b) Resulting buffer level

Fig. 6. Adaptation results of different adaptation methods when the buffer size is 10s

(a) Adapted bitrate

(b) Resulting buffer level

Fig. 7. Adaptation results of different adaptation methods when the buffer size is 6s

Fig. 8. Variations of buffer level due to throughput fluctuations (data size vs. time under high and low throughput; the buffer levels BLi−1 and BLi are sampled after segment receptions)
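The dynamics sketched in Fig. 8 can be written as a simple recurrence: the buffer level is sampled right after each segment is fully received, having drained for the download time t_i = (bitrate_i × SD) / throughput_i and then gained one segment duration SD. A minimal model of this (ignoring playout stalls beyond clamping at zero):

```python
# Buffer level sampled after each fully received segment, as in Fig. 8.

SD = 2.0  # segment duration (s)

def buffer_levels(bitrates_kbps, throughputs_kbps, start_level=4.0):
    levels, level = [], start_level
    for r, tp in zip(bitrates_kbps, throughputs_kbps):
        t = r * SD / tp                    # download time of this segment
        level = max(0.0, level - t) + SD   # drain while downloading, then add SD
        levels.append(level)
    return levels

# The second segment sees a higher throughput, so it arrives early and
# BL_2 ends up above BL_1 -- the effect described in the text.
lv = buffer_levels([1000, 1000], [1000, 2000])
```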

Fig. 7 shows the case with buffer size equal to 6s. It can be seen that the advantage of the TBB method is not present with this small buffer size. There are more points of sudden bitrate drop, and the performances of the ITB and FBB methods are close. This is because, when the buffer size is small, the FBB method becomes as aggressive as the ITB method. In the above figures, the buffer level is sometimes higher than the target level. This is because we measure the buffer level after fully receiving a segment. As shown in Fig. 8, if the throughput of that segment turns out to be higher than the previous one's, the segment will arrive earlier than expected, so the buffer level at that time (BLi in Fig. 8) is higher than the previous buffer level (i.e. BLi−1). Some statistics of the tested methods are provided in Table II. The first four data rows show the statistics of the bitrate, while the next three rows show those of bitrate switches. Each switch is represented by the non-zero bitrate difference between two consecutive segments. The maximum switch

means the largest bitrate difference between any two consecutive segments over the whole session, and the average of switches is the average value of all switches over the whole session. The last row gives the standard deviation (STD) of the buffer level, where the ITB method has the smallest value. We can see that the statistics provide more concrete information about the behaviors of the methods. Specifically, the CTB method has the lowest average and STD of bitrates due to its conservativeness. As the FBB method always tries not to switch bitrate suddenly, its minimum bitrate (600kbps) is the highest, while its STD of bitrates and average of switches are only higher than those of the STB method. As expected, the TBB and CTB methods have smaller numbers of switches; however, their averages of switches and maximum switches are much higher than those of the other methods due to the sudden bitrate drops. In the context of these experiments, we can make the following observations. 1) The TBB method is suitable for on-demand streaming, but not for live streaming, due to frequent sudden drops of bitrate and the difficulty for service providers of deciding buffer thresholds (especially with small buffer sizes)


TABLE II
STATISTICS OF DIFFERENT ADAPTATION METHODS USING CONTEXT #1. EXCEPT THE NUMBER OF SWITCHES AND STD OF BUFFER LEVEL, THE UNIT OF OTHER PARAMETERS IS KBPS.

                          Throughput-based       Buffer size=20s   Buffer size=10s   Buffer size=6s
Statistics                ITB    STB    CTB      TBB     FBB       TBB     FBB       TBB     FBB
Average of bitrates       1581   1621   1331     1798    1852      1687    1769      1664    1744
STD of bitrates           841    788    551      1103    773       1012    842       1019    885
Maximum bitrate           3000   3000   2600     3000    3000      3000    3000      3000    3000
Minimum bitrate           200    400    200      200     600       200     400       200     400
Maximum switch            2000   600    2000     2600    1200      2600    2000      2800    2000
Number of switches        147    100    40       44      79        72      74        79      69
Average of switches       268    222    330      341     243       278     257       334     310
STD of buf. level (s)     0.91   2.91   1.57     5.00    3.69      2.21    1.81      1.17    1.24
before any viewing session. 2) If a small number of switches and a small deviation of bitrates are the most important criteria, the CTB method should be selected. The traffic provided by this method is the least bursty. 3) When a small buffer size is the most important, the ITB and FBB methods are the most suitable. The ITB method is simple but has a very fluctuating bitrate curve, while the FBB method is complex but has a smoother bitrate curve. However, the smaller the buffer size, the closer the performances of the ITB and FBB methods become. 4) When a large buffer size is allowed, the STB method provides the smoothest bitrate changes (with the lowest average of switches and the lowest maximum switch). Another advantage of this method is its simple implementation. 5) If the minimum bitrate provided by a method must be as high as possible, the FBB method should be used. It should be noted that the above suggestions are valid only in the given context. As shown later in Section V, when the number of representations is smaller and the bitrates are not equally spaced, some methods may behave differently.
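For reference, the switch statistics reported in Table II can be computed directly from a per-segment bitrate trace; the trace below is made up for illustration.

```python
# A switch is any non-zero bitrate difference between consecutive segments.

def switch_stats(bitrates):
    diffs = [abs(b - a) for a, b in zip(bitrates, bitrates[1:]) if a != b]
    return {
        "number_of_switches": len(diffs),
        "maximum_switch": max(diffs, default=0),
        "average_of_switches": sum(diffs) / len(diffs) if diffs else 0.0,
    }

stats = switch_stats([600, 800, 800, 400, 1000])  # kbps, illustrative trace
```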

C. Perceptual Impact Evaluation

In the previous part, we evaluated typical methods through the behaviors of video bitrate and buffer level. In this part, we further investigate the perceptual impacts of these methods using the JND metric. The perceptual distortions are obtained by mapping video bitrates into distortion values. We especially focus on the relative distortion change at each bitrate switch (i.e. the distortion difference between the two related segments). Two different cases, one using the Football video and the other using the Harbour video, are studied. Similar to the previous parts, each clip is looped until the end of an experiment run. It should be noted that the representation set of context #1 is still used in this part.

For the Football video, Figs. 9(a) and 9(b) show the distortion values and the relative distortion changes provided by each adaptation method when the buffer size is 20s. We can see that the distortions of the different methods are rather close. Even though the CTB method uses much lower bitrates, its distortion is not much different from that of the other methods (within around 1-2 JND units). The curves of relative distortion change show that the distortion changes are usually less than 3 JND units. Among the methods, the TBB method causes more significant distortion changes than the others. This impact is more severe when the buffer size is smaller; an example of this phenomenon is provided in Fig. 9(c) for a buffer size of 10s. The statistics of distortions, covering all buffer sizes, are shown in Table III. It is confirmed that TBB has the largest average and standard deviation of distortions. Though the CTB method has a low bitrate curve, its minimum distortion is only 0.25, meaning that if the highest bitrate is smartly selected, one can save a lot of bandwidth resources without sacrificing much quality.

(a) Perceptual distortion when buffer size is 20s

(b) Distortion change when buffer size is 20s

(c) Distortion change when buffer size is 10s

Fig. 9. Results of perceptual distortion (a) and distortion change (b)-(c) of different adaptation methods using context #1 and Football video

The lowest average of distortions (1.58) and the lowest maximum distortion (6.61) are provided by the FBB method at a buffer size of 20s. Note that, in terms of bitrate, the average bitrate of the FBB method is just a little higher than that of the other methods. The FBB method also has the smallest


TABLE III
STATISTICS OF PERCEPTUAL DISTORTIONS OF DIFFERENT ADAPTATION METHODS USING CONTEXT #1 AND FOOTBALL VIDEO. EXCEPT THE NUMBER OF CHANGES, THE UNIT OF OTHER PARAMETERS IS JND UNIT.

                          Throughput-based        Buffer size=20s   Buffer size=10s   Buffer size=6s
Statistics                ITB     STB     CTB     TBB     FBB       TBB     FBB       TBB     FBB
Average of distortions    2.78    2.31    2.83    3.65    1.58      3.37    2.09      3.34    2.39
STD of distortions        3.05    2.59    2.52    5.25    1.77      4.59    2.46      4.18    2.90
Maximum distortion        14.34   9.23    14.34   14.34   6.61      14.34   9.23      14.34   9.23
Minimum distortion        0.00    0.00    0.25    0.00    0.00      0.00    0.00      0.00    0.00
Maximum change            7.72    2.62    7.72    14.21   2.34      14.21   8.73      14.34   8.48
Number of changes         136     102     40      44      79        75      78        83      74
Average of changes        1.21    0.64    1.76    1.92    0.50      1.66    0.79      1.93    1.06
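The distortion figures above are obtained by pushing the per-segment bitrate trace through a bitrate-distortion mapping. The curve below is a hypothetical monotone stand-in for the measured JND-bitrate relationship of Fig. 2; only the procedure is the point.

```python
import math

def distortion(bitrate_kbps, top=3000.0, scale=5.0):
    """Hypothetical distortion in JND units: 0 at the top bitrate,
    growing as the bitrate decreases."""
    return scale * math.log(top / bitrate_kbps)

def distortion_trace(bitrates_kbps):
    """Map a bitrate trace to distortions and per-switch distortion changes."""
    d = [distortion(b) for b in bitrates_kbps]
    changes = [abs(y - x) for x, y in zip(d, d[1:]) if x != y]
    return d, changes

d, changes = distortion_trace([3000, 1500, 1500, 3000])
```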

(a) Perceptual distortion


deviation of distortion (1.77), while the smallest deviation of bitrates is provided by the CTB method. However, these advantages of the FBB method become less significant for smaller buffer sizes. Among these methods, STB and FBB have the smoothest transitions: their maximum changes are just 2.62 and 2.34, and their averages of changes are 0.64 and 0.50, respectively.

For the Harbour video, Figs. 10(a) and 10(b) show the distortion values and the relative distortion changes, also at a buffer size of 20s. It can be seen that, in this case, the users would perceive very low distortions compared to the case of the Football video. There are few points of large distortion, and the maximum change of distortion is only about 4.5 JND units. Table IV provides the corresponding statistics for the Harbour video. The data of the table confirm the low distortions obtained by all methods and with all buffer sizes.

From the above discussion, we can make the following notes:
- Given an adaptation method, the distortion perceived by the users depends on the content.
- For video of slow motion, the distortions of different methods are very close. In this case, the selection of adaptation methods will depend chiefly on the bitrate and buffer behaviors.
- The FBB method provides good results in terms of perceptual distortion. However, when the buffer size is reduced, its advantage becomes less significant.

V. EVALUATION WITH JND-DERIVED SETS OF REPRESENTATIONS

In the previous sections, the bitrates of representations are equally spaced by a fixed step size (200kbps). In this section, using the knowledge of JND-bitrate relationship (Fig. 2), we investigate how a different set of representations influences the results of the adaptation methods. Based on the results of Fig. 2, the bitrates which are roughly spaced by 1.5 JND units are selected to build a representation set for this section. Again, the Football and Harbour videos are used. The list of

(b) Distortion change

Fig. 10. Results of perceptual distortion (a) and distortion change (b) of different adaptation methods using context #1 and Harbour video. The buffer size is 20s.

TABLE IV
STATISTICS OF PERCEPTUAL DISTORTIONS OF DIFFERENT ADAPTATION METHODS USING CONTEXT #1 AND HARBOUR VIDEO. EXCEPT THE NUMBER OF CHANGES, THE UNIT OF OTHER PARAMETERS IS JND UNIT.

                          Throughput-based        Buffer size=20s   Buffer size=10s   Buffer size=6s
Statistics                ITB     STB     CTB     TBB     FBB       TBB     FBB       TBB     FBB
Average of distortions    0.66    0.55    0.68    1.07    0.41      0.95    0.52      0.86    0.57
STD of distortions        0.66    0.44    0.44    1.66    0.29      1.39    0.46      1.21    0.55
Maximum distortion        4.58    2.09    4.58    4.58    1.08      4.58    2.09      4.58    2.09
Minimum distortion        0.00    0.00    0.14    0.00    0.00      0.00    0.00      0.00    0.00
Maximum change            3.50    1.01    3.50    4.51    0.42      4.51    1.81      4.58    1.67
Number of changes         136     102     40      44      79        75      78        83      74
Average of changes        0.25    0.15    0.38    0.61    0.10      0.51    0.16      0.57    0.24

bitrates (in kbps) is 3000, 1495, 1038, 773, 640, 550, 427, 322, 260, 222 for the Football video and 1500, 482, 306, 202, 144, 108, 87 for the Harbour video. The other settings are the same as before. Compared to context #1, an obvious advantage of context #2 is that it requires less storage space for the representations. Fig. 11 shows the results for the Football video with a buffer size of 20s. Now, most bitrate curves look smoother than previously, and there are fewer sudden drops of bitrate. In this context, the TBB method can better avoid switching suddenly to the lowest bitrate. This is because the representation bitrates are more widely spaced in the high bitrate range, thus enabling faster bitrate reduction. Especially, the CTB method now can achieve the highest


bitrate. The reason is that, between 1495kbps and 3000kbps, there are no intermediate bitrates in this context, so the method jumps directly to the highest bitrate after some delay. In contrast, in the previous context #1 (which is similar to the context used in [9] for the CTB method), there are many close bitrates, and so the client does not have enough time to reach the highest bitrate before the bandwidth/throughput goes down again. The statistics of the methods with the Football video are provided in Table V. The data confirm that the bitrate curves are now smoother (with much smaller numbers of switches, generally higher minimum bitrates, and lower STD of bitrates). The statistics of the STB method show that it still has the smoothest bitrate curve (with the lowest average of switches and the lowest maximum switch). For the CTB method, a higher average bitrate and a higher STD of bitrates imply that the behavior of this method is not as conservative as before. That is, note #2 of Section IV-B is not valid in this context.
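The construction of such a JND-spaced representation set can be sketched as follows, using a hypothetical logarithmic distortion model in place of the measured JND-bitrate curves of Fig. 2 (so the resulting bitrates differ from the paper's lists).

```python
import math

SCALE = 5.0  # steepness of the hypothetical JND-bitrate model

def distortion(bitrate, top=3000.0):
    """Hypothetical distortion in JND units: 0 at the top bitrate."""
    return SCALE * math.log(top / bitrate)

def jnd_ladder(top=3000.0, floor=200.0, step=1.5):
    """Representations spaced by `step` JND, from `top` down to `floor`."""
    ladder, d = [int(top)], 0.0
    while True:
        d += step
        b = top * math.exp(-d / SCALE)  # invert the model: JND -> bitrate
        if b < floor:
            break
        ladder.append(round(b))
    return ladder

ladder = jnd_ladder()
```

Note how the equal perceptual steps translate into wide bitrate gaps near the top of the ladder and narrow gaps near the bottom, which is exactly the property that enables faster bitrate reduction at the high bitrate range.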


TABLE V
STATISTICS OF DIFFERENT ADAPTATION METHODS USING CONTEXT #2 AND FOOTBALL VIDEO. EXCEPT THE NUMBER OF SWITCHES AND STD OF BUFFER LEVEL, THE UNIT OF OTHER PARAMETERS IS KBPS.

                          Throughput-based       Buffer size=20s   Buffer size=10s   Buffer size=6s
Statistics                ITB    STB    CTB      TBB     FBB       TBB     FBB       TBB     FBB
Average of bitrates       1450   1426   1606     1676    1621      1544    1573      1645    1546
STD of bitrates           805    718    951      895     807       954     818       953     822
Maximum bitrate           3000   3000   3000     3000    3000      3000    3000      3000    3000
Minimum bitrate           322    427    322      640     550       222     427       322     322
Maximum switch            2360   1505   1505     1505    1505      2778    1505      2360    2360
Number of switches        75     40     38       17      30        37      33        35      38
Average of switches       417    404    616      644     679       472     515       705     656
STD of buf. level (s)     1.05   1.90   1.62     4.77    3.00      1.86    1.93      1.24    1.22

(a) Adapted bitrate

(b) Distortion change

Fig. 11. Adaptation results of different adaptation methods using context #2 and Football video. The buffer size is 20s.

TABLE VI
STATISTICS OF PERCEPTUAL DISTORTIONS OF DIFFERENT ADAPTATION METHODS USING CONTEXT #2 AND FOOTBALL VIDEO. EXCEPT THE NUMBER OF CHANGES, THE UNIT OF OTHER PARAMETERS IS JND UNIT.

                          Throughput-based        Buffer size=20s   Buffer size=10s   Buffer size=6s
Statistics                ITB     STB     CTB     TBB     FBB       TBB     FBB       TBB     FBB
Average of distortions    2.83    2.65    2.75    2.27    2.12      3.36    2.35      2.63    2.51
STD of distortions        2.57    2.25    2.78    2.24    1.91      4.08    2.19      2.72    2.46
Maximum distortion        10.50   9.00    10.50   6.00    7.50      13.50   9.00      10.50   10.50
Minimum distortion        0.00    0.00    0.00    0.00    0.00      0.00    0.00      0.00    0.00
Maximum change            6.00    1.50    6.00    1.50    1.50      13.50   7.50      6.00    7.50
Number of changes         75      40      38      17      30        37      33        35      38
Average of changes        1.78    1.50    2.05    1.50    1.50      2.35    1.82      2.06    2.05

With a buffer size of 20s, the TBB method would be the best, with the highest values of average bitrate and minimum bitrate (640kbps), the smallest maximum switch (1505kbps), and the least number of switches (17). This behavior is much better than that in Table II. However, this is not the case anymore when the buffer size is 10s or 6s, where all the parameters become much worse. This again suggests that it is difficult to adjust the thresholds of the TBB method. In terms of perceptual distortion (Table VI), the most important point is that the average distortions of all methods are nearly the same as the results of Section IV-C, while the variations of distortions become smaller (with generally lower STD of distortions, lower maximum change, and lower number of changes). So, with context #2, we have not only better storage usage but also better perceptual quality. Fig. 12 shows the results for the Harbour video. With this

video, the representation bitrates are very low, so the bitrate curves are much more stable than those in context #1. Tables VII and VIII for the Harbour video show that, in this case, all methods (except TBB) have similar performances, which are better than those in context #1. Another important advantage of using JND-derived representation bitrates is buffer stability. Fig. 13 depicts, for the buffer size of 20s, the cumulative distribution function (CDF) of the buffer level when the representation bitrates are (a) equally spaced by 200kbps, (b) JND-derived for Football, and (c) JND-derived for Harbour. Fig. 14 shows similar results for the buffer size of 10s. We can see that, from the left to the right figures, the buffer level tends to have higher values. That means the buffer is more stable in context #2. The same behavior can be found with the buffer size of 6s.

VI. DISCUSSIONS

From the above experiments, we can see some interesting findings with the existing adaptation methods. First, the


(a) Adapted bitrate

(b) Distortion change

Fig. 12. Adaptation results of different adaptation methods using context #2 and Harbour video. The buffer size is 20s.

TABLE VII
STATISTICS OF DIFFERENT ADAPTATION METHODS USING CONTEXT #2 AND HARBOUR VIDEO. EXCEPT THE NUMBER OF SWITCHES AND STD OF BUFFER LEVEL, THE UNIT OF ALL PARAMETERS IS KBPS.

                          Throughput-based       Buffer size=20s   Buffer size=10s   Buffer size=6s
Statistics                ITB    STB    CTB      TBB     FBB       TBB     FBB       TBB     FBB
Average of bitrates       1099   1082   1174     1246    1195      1165    1156      1110    1141
STD of bitrates           503    503    476      442     468       528     486       535     491
Maximum bitrate           1500   1500   1500     1500    1500      1500    1500      1500    1500
Minimum bitrate           306    306    482      482     482       62      306       62      306
Maximum switch            1018   1018   1018     1018    1018      1438    1018      1438    1018
Number of switches        24     11     11       4       8         16      10        24      16
Average of switches       844    865    1018     1018    1018      360     850       509     913
STD of buf. level (s)     0.60   1.16   1.72     2.96    2.25      1.68    1.63      0.85    1.18

TABLE VIII
STATISTICS OF PERCEPTUAL DISTORTIONS OF DIFFERENT ADAPTATION METHODS USING CONTEXT #2 AND HARBOUR VIDEO. EXCEPT THE NUMBER OF CHANGES, THE UNIT OF ALL PARAMETERS IS JND UNIT.

                          Throughput-based        Buffer size=20s   Buffer size=10s   Buffer size=6s
Statistics                ITB     STB     CTB     TBB     FBB       TBB     FBB       TBB     FBB
Average of distortions    0.62    0.62    0.48    0.38    0.45      1.07    0.53      1.01    0.55
STD of distortions        0.80    0.76    0.70    0.65    0.69      2.51    0.76      2.13    0.77
Maximum distortion        3.00    3.00    1.50    1.50    1.50      10.50   3.00      10.50   3.00
Minimum distortion        0.00    0.00    0.00    0.00    0.00      0.00    0.00      0.00    0.00
Maximum change            1.50    1.50    1.50    1.50    1.50      10.50   1.50      10.50   1.50
Number of changes         24      11      11      4       8         16      10        24      16
Average of changes        1.50    1.50    1.50    1.50    1.50      2.63    1.50      2.25    1.50

changed behaviors of some methods (e.g. CTB and TBB) suggest that these methods have been developed with a certain type of representation set in mind. So, on the one hand, it is important to prepare a representation set which is suitable for the adaptation method in use. On the other hand, it would be better to have adaptation methods that work well with any type of representation set. Second, there are various trade-offs among the adaptation methods. No method is claimed to be the best here, because each method has its advantages and disadvantages, and these may depend on the given context. This suggests that multiple methods could be employed by a client. For example, when the throughput variations are small, the STB method could be used; and when the throughput drops quickly, the

Fig. 13. Cumulative distribution functions of buffer level, when buffer size is 20s, for equally spaced bitrates (a), for JND-derived bitrates of Football (b), for JND-derived bitrates of Harbour (c). From figures (a) to (c), the buffer level tends to be more stable.
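The curves in Figs. 13 and 14 are empirical CDFs of the buffer level sampled over a session; a minimal sketch of how such a curve is obtained from the samples (the values below are made up):

```python
# Empirical cumulative distribution function of sampled buffer levels.

def empirical_cdf(samples):
    """Sorted values and cumulative fractions, ready for step plotting."""
    xs = sorted(samples)
    n = len(xs)
    ys = [(i + 1) / n for i in range(n)]
    return xs, ys

levels = [4.2, 6.0, 5.1, 6.0, 3.8]  # buffer levels (s) sampled per segment
xs, ys = empirical_cdf(levels)
frac_at_or_below_5s = sum(1 for v in levels if v <= 5.0) / len(levels)
```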

FBB or ITB methods could be applied depending on the current buffer level. This strategy has been implemented in [2], where the advantages of both the ITB and STB methods are exploited by dynamically changing δ of Eq. (4). Also, when the network entities should work conservatively to avoid possible congestion, the CTB method could be turned on.

The evaluation results show that the knowledge of JND gives insights into the time-varying nature of perceptual quality in adaptive streaming. One use of JND could be to provide hints for selecting bitrates. For example, if the trellis of Fig. 1 were enhanced with JND values, the client could select a path of bitrates that provides a gradual change of perceptual distortion, rather than a gradual change of bitrate. Different trellis-based algorithms can be devised to find optimal paths [26]. As mentioned in [17], another important use of JND is to build a good set of representations, which has an acceptable perceptual difference between representations and enables efficiency in storage and bandwidth usage. In our opinion, finding an optimal set of representations for different contexts is still an open and interesting question.

As seen in Section V, an interesting benefit of JND is that a JND-derived set of representations usually results in smoother adaptation in terms of both perceptual quality and buffer level. The reason is that, in such a set, the representations have a gradual change in quality while the bitrate can be reduced more quickly in the high bitrate range. An early and large reduction of bitrate in response to throughput drops helps avoid severe drops of buffer level.

In this study, the JND measures for the test videos are known in advance. To obtain JND measures for live video sources, a potential solution is to use a machine learning approach as in [27]. Specifically, the video clips of a training data set can be divided, using some content features, into a number of classes. Video clips of the same class will share a representative bitrate-distortion function. Any live video clip will first be classified and then attributed the bitrate-distortion function of its class.

As discussed in [4], the smallest buffer size for live video streaming is expected to be 2 segment durations. In the current settings of our evaluation experiments, the smallest buffer size contains 3 media segments (6s). If the buffer size is reduced to contain only 2 media segments (4s), it is obvious that no buffer-based methods could be applied. As for the throughput-based methods, the CDFs of buffer level in Figs. 13 and 14 reveal that only in a few experiment runs (using the Harbour video and the ITB method) is the buffer level variation well within 4s. That means, if the buffer size is reduced to 2 segment durations, no adaptation method in this study could guarantee a continuous session without buffer underflows.

Fig. 14. Cumulative distribution functions of buffer level, when buffer size is 10s, for equally spaced bitrates (a), for JND-derived bitrates of Football (b), for JND-derived bitrates of Harbour (c). From figures (a) to (c), the buffer level tends to be more stable.

VII. CONCLUSIONS

In the new trend of HTTP streaming, various adaptation methods have been developed. In this paper, we have investigated typical adaptation methods in the context of live video streaming. The evaluation and comparison were carried out not only in terms of bitrate and buffer behaviors but also in terms of the perceptual impact on end users. Through the experiment results, we have shown various trade-offs among the adaptation methods. It was found that the perceptual impact depends not only on the adaptation method but also on the content itself. Also, the selection of the representation set for certain adaptation methods should be carefully considered. For future work, we will focus on distortion-based adaptation methods that take advantage of the JND-bitrate relationship.

ACKNOWLEDGMENT

The authors would like to sincerely thank Prof. Christian Timmerer of Klagenfurt University for providing the bandwidth trace used in this study. The authors are also grateful to the anonymous reviewers for their constructive comments that greatly helped improve the content of the paper.

REFERENCES

[1] A. C. Begen, T. Akgul, and M. Baugher, “Watching video over the web: Part 2: Applications, standardization, and open issues,” IEEE Internet Computing, vol. 15, no. 3, pp. 59–63, Apr. 2011.
[2] T. C. Thang, Q.-D. Ho, J.-W. Kang, and A. T. Pham, “Adaptive streaming of audiovisual content using MPEG DASH,” IEEE Trans. on Consumer Electronics, vol. 58, no. 1, pp. 78–85, Feb. 2012.
[3] S. Tullimas, T. Nguyen, R. Edgecomb, and S.-C. Cheung, “Multimedia streaming using multiple TCP connections,” ACM Trans. Multimedia Comput. Commun. Appl. (ACM TOMCCAP), vol. 4, no. 2, pp. 1–20, May 2008.
[4] T. Lohmar, T. Einarsson, P. Frojdh, F. Gabin, and M. Kampmann, “Dynamic adaptive HTTP streaming of live content,” in Proc. World of Wireless, Mobile and Multimedia Networks (WoWMoM), Jun. 2011.
[5] H. Riiser, H. S. Bergsaker, P. Vigmostad, C. Griwodz, and P. Halvorsen, “A comparison of quality scheduling in commercial adaptive HTTP streaming solutions on a 3G network,” in Proc. ACM Workshop on Mobile Video (MoVid), pp. 25–30, NC, USA, Feb. 2012.
[6] C. Müller, S. Lederer, and C. Timmerer, “An evaluation of dynamic adaptive streaming over HTTP in vehicular environments,” in Proc. of the 4th Workshop on Mobile Video (MoVid), pp. 37–42, NC, USA, Feb. 2012.
[7] L. R. Romero, “A dynamic adaptive HTTP streaming video service for Google Android,” M.S. thesis, Royal Institute of Technology (KTH), Oct. 2011.
[8] C. Liu, I. Bouazizi, and M. Gabbouj, “Rate adaptation for adaptive HTTP streaming,” in Proc. of the Second Annual ACM Conference on Multimedia Systems (ACM MMSys 2011), California, Feb. 2011.
[9] K. Miller, E. Quacchio, G. Gennari, and A.
Wolisz, “Adaptation algorithm for adaptive streaming over HTTP,” in Proc. 19th International Packet Video Workshop (PV 2012), pp. 173–178, May 2012.
[10] S. Akhshabi, S. Narayanaswamy, A. C. Begen, and C. Dovrolis, “An experimental evaluation of rate-adaptive video players over HTTP,” Signal Processing: Image Commun., vol. 27, no. 4, pp. 271–287, Apr. 2012.
[11] T. Stockhammer, “Dynamic adaptive streaming over HTTP – standards and design principles,” in Proc. of the Second Annual ACM Conference on Multimedia Systems (ACM MMSys 2011), pp. 133–144, NC, USA, Feb. 2011.


[12] ISO/IEC IS 23009-1, “Information technology – Dynamic adaptive streaming over HTTP (DASH) – Part 1: Media presentation description and segment formats,” 2012.
[13] T. C. Thang, J.-G. Kim, J. W. Kang, and J.-J. Yoo, “SVC adaptation: Standard tools and supporting methods,” Signal Processing: Image Commun., vol. 24, no. 3, pp. 214–228, Mar. 2009.
[14] H. T. Le, D. V. Nguyen, N. P. Ngoc, A. T. Pham, and T. C. Thang, “Buffer-based bitrate adaptation for adaptive HTTP streaming,” in Proc. of IEEE ATC 2013, Ho Chi Minh City, Vietnam, Oct. 2013.
[15] ITU, “Methods for subjective determination of transmission quality (ITU-T Rec. P.800),” International Telecommunication Union, Aug. 1996.
[16] A. El Essaili, D. Schroeder, D. Staehle, M. Shehada, W. Kellerer, and E. Steinbach, “Quality-of-experience driven adaptive HTTP media delivery,” in Proc. 2013 IEEE International Conference on Communications (ICC 2013), Budapest, Hungary, Jun. 2013.
[17] T. C. Thang, H. X. Nguyen, A. T. Pham, and N. P. Ngoc, “Perceptual difference evaluation of video alternatives in adaptive streaming,” in Proc. 2012 Fourth Int’l Conf. on Communications and Electronics (ICCE 2012), Hue, Vietnam, Aug. 2012.
[18] A. B. Watson and L. Kreslake, “Measurement of visual impairment scales for digital video,” Proc. SPIE, vol. 4299, pp. 79–89, 2001.
[19] J. Janssen, T. Coppens, and D. D. Vleeschauwer, “Quality assessment of video streaming in the broadband era,” in Proc. Workshop on Advanced Concepts for Intelligent Vision Systems (ACIVS 2002), pp. 38–45, Sep. 2002.
[20] T. C. Thang, H. T. Le, H. X. Nguyen, A. T. Pham, J. W. Kang, and Y. M. Ro, “Adaptive video streaming over HTTP with dynamic resource estimation,” IEEE/KICS Journal of Communications and Networks, (in press).
[21] K. Evensen, A. Petlund, H. Riiser, P. Vigmostad, D. Kaspar, C. Griwodz, and P. Halvorsen, “Mobile video streaming using location-based network prediction and transparent handover,” in Proc. 21st International Workshop on Network and Operating Systems Support for Digital Audio and Video (NOSSDAV 2011), pp. 21–26, Jun. 2011.
[22] P. Xiong, J. Shen, Q. Wang, D. Jayasinghe, J. Li, and C. Pu, “NBS: A network-bandwidth-aware streaming version switcher for mobile streaming applications under fuzzy logic control,” in Proc. 2012 IEEE First International Conference on Mobile Services (MS 2012), Honolulu, Hawaii, USA, Jun. 2012.
[23] G. Tian and Y. Liu, “Towards agile and smooth video adaptation in dynamic HTTP streaming,” in Proc. of the 8th International Conference on Emerging Networking Experiments and Technologies (CoNEXT 2012), pp. 109–120, Dec. 2012.
[24] L. Rizzo, “Dummynet: A simple approach to the evaluation of network protocols,” SIGCOMM Comput. Commun. Rev., vol. 27, no. 1, pp. 31–41, Jan. 1997.
[25] FFmpeg tool. [Online]. Available: http://ffmpeg.org/download.html
[26] T. C. Thang, J. W. Kang, J.-J. Yoo, and Y. M. Ro, “Optimal multilayer adaptation of SVC video over heterogeneous environments,” Adv. MultiMedia, vol. 2008, Article ID 739192, 8 pages, 2008.
[27] Y. Wang, J.-G. Kim, S.-F. Chang, and H.-M. Kim, “Utility-based video adaptation for universal multimedia access (UMA) and content-based utility function prediction for real-time video transcoding,” IEEE Transactions on Multimedia, vol. 9, no. 2, pp. 213–220, Feb. 2007.

Truong Cong Thang received the B.E. degree from Hanoi University of Technology, Vietnam, in 1997 and the Ph.D. degree from KAIST, Korea, in 2006. From 1997 to 2000, he worked as an engineer at Vietnam Post & Telecom (VNPT). From 2006 to 2011, he was a Member of Research Staff at the Electronics and Telecommunications Research Institute (ETRI), Korea. He was also an active member of the Korean delegation to standardization meetings of ISO/IEC and ITU-T from 2002 to 2011. Since 2011, he has been an Associate Professor at the University of Aizu, Japan. His research interests include multimedia networking, image/video processing, content adaptation, IPTV, and MPEG/ITU standards.


Hung T. Le received the B.E. degree from Hanoi University of Technology, Vietnam, in 2012. Since 2012, he has been a research assistant at the Computer Communications Laboratory, the University of Aizu. His major research interests are multimedia networking, video adaptation, and QoE support. He is a student member of IEEE.

Anh T. Pham received the B.E. and M.E. degrees, both in Electronics Engineering, from the Hanoi University of Technology, Vietnam, in 1997 and 2000, respectively, and the Ph.D. degree in Information and Mathematical Sciences from Saitama University, Japan, in 2005. From 1998 to 2002, he was with NTT Corp. in Vietnam. Since April 2005, he has been on the faculty of the University of Aizu, where he is currently an associate professor at the Computer Communications Laboratory, School of Computer Science & Engineering. His present research interests are in the areas of computer networking, optical communications, and spread-spectrum techniques. Dr. Pham received a Japanese government scholarship (Monbukagakusho) for his Ph.D. study and a Vietnamese government scholarship for his undergraduate study. He is a senior member of IEEE and a member of IEICE and OSA.

Yong Man Ro received the B.S. degree in Electronics Engineering from Yonsei University, Seoul, in 1985, and the M.S. and Ph.D. degrees in Electrical Engineering from KAIST in 1987 and 1992, respectively. He was a researcher at Columbia University and a visiting researcher at the University of California, Irvine, and KAIST. In 1996, he was a research fellow at the Department of Electrical Engineering and Computer Sciences, University of California, Berkeley. In 2007, he was a visiting professor at the Department of Electrical and Computer Engineering, University of Toronto. He is currently a full professor in the Department of Electrical Engineering at KAIST, where he directs the IVY (Image/Video sYstem) Lab. He has been an active participant in MPEG standardization efforts since 1997; he has presented 78 contributions, and more than 10 technologies developed by his research lab were selected for inclusion in the MPEG-4, MPEG-7, MPEG-21, and MPEG-A standards. He is the author or co-author of more than 300 international research papers, 67 international standardization documents, and more than 50 patents. His research interests include image/video processing, color face recognition, image/video indexing, multimedia security, medical imaging, pattern recognition, and functional imaging. Dr. Ro is a senior member of IEEE as well as a member of ISMRM and SPIE. He received the Young Investigator Finalist Award of ISMRM in 1992 and the Year's Scientist Award (Korea) in 2003. He served as an associate editor for IEEE Signal Processing Letters (2010–2013).
