J. Vis. Commun. Image R. 21 (2010) 129–138
Understanding Internet Video sharing site workload: A view from data center design

Xiaozhu Kang a, Hui Zhang b,*, Guofei Jiang b, Haifeng Chen b, Xiaoqiao Meng c, Kenji Yoshihira b

a Columbia University, New York, NY 10027, USA
b NEC Laboratories America, Princeton Campus, 4 Independence Way, Suite 200, Princeton, NJ 08540, USA
c IBM T.J. Watson Research Center, Hawthorne, NY 10532, USA
Article info

Article history: Received 2 December 2008; Accepted 29 June 2009; Available online 5 July 2009

Keywords: Measurement; Data center design; Media system; Online video; Workload management; Capacity planning; SLAs; Queueing model; Quality of service
Abstract

Internet Video sharing sites, led by YouTube, have been gaining popularity at a dazzling speed, which also brings massive workload to their service data centers. In this paper we analyze Yahoo! Video, the second largest U.S. video sharing site, to understand the nature of this unprecedented massive workload as well as its impact on online video data center design. We crawled the Yahoo! Video web site for 46 days. The measurement data allows us to understand the workload characteristics at different time scales (minutes, hours, days, weeks), and we discover interesting statistical properties on both the static and temporal dimensions of the workload, including file duration and popularity distributions, arrival rate dynamics and predictability, and workload stationarity and burstiness. Complemented with queueing-theoretic techniques, we further extend our understanding of the measurement data with a virtual design of the workload and capacity management components of a data center assuming the same workload as measured, which reveals key results regarding the impact of workload arrival distribution, Service Level Agreements (SLAs), and workload scheduling schemes on the design and operation of such large-scale video distribution systems.

© 2009 Elsevier Inc. All rights reserved.
1. Introduction

Internet Video sharing web sites such as YouTube [2] have attracted millions of users at a dazzling speed during the past few years. In July 2007, it was reported that Americans viewed more than 9 billion video streams online, and 3 out of 4 U.S. Internet users streamed video online; the online video market keeps growing fast, with the number of videos viewed jumping 8.6% from May to July 2007, in just two months [4,3]. Massive workload accompanies those web sites along with their business success. For example, in July 2007, YouTube, the largest Internet Video sharing web site, had 79 million daily video views; Yahoo! Video [1], ranked second among U.S. online video properties, delivered 390 million video views in the same month [4]. How those companies design and operate their data centers to deliver the service is unknown to the outside, perhaps because the related techniques are deemed a key part of their core competency over competitors [18]. What is known publicly is that even YouTube is notorious for its unstable service, despite spending several million dollars monthly on bandwidth; as the saying goes, "You know it's Friday when YouTube is slow" [6].
* Corresponding author.
E-mail addresses: [email protected] (X. Kang), [email protected] (H. Zhang), [email protected] (G. Jiang), [email protected] (H. Chen), [email protected] (X. Meng), [email protected] (K. Yoshihira).

1047-3203/$ - see front matter © 2009 Elsevier Inc. All rights reserved. doi:10.1016/j.jvcir.2009.06.007
In order to understand the nature of such unprecedented massive workload and its impact on online video data center design, we analyze Yahoo! Video, the second largest U.S. video sharing site, in this paper. The main contribution of our work is an extensive trace-driven analysis of Yahoo! Video workload dynamics. We crawled the Yahoo! Video site for a 46-day period and collected the workload statistics at a frequency of 30 min. This allows us to understand the workload characteristics at different time scales (minutes, hours, days, weeks), and to discover very interesting properties on both the static and temporal dimensions of the workload characterization, including file popularity distribution and evolution, arrival rate dynamics and predictability, and workload burstiness. Complemented with queueing-theoretic techniques, we further extend our understanding of the measurement data with a virtual design of the workload and capacity management components of a data center assuming the same massive workload, which reveals key results regarding the impact of workload arrival distribution, Service Level Agreements (SLAs), and workload scheduling schemes on the design and operations of such large-scale video distribution systems. The highlights of our work can be summarized as follows:

- Compared with other Internet Video sharing site workload measurements [9,12,10], we analyze the workload at a finer granularity (30 min rather than 1 day).
- We give a comprehensive workload characterization for the second largest online video site in the U.S.
- We observe that the measured massive workload can be clearly separated into two contrasting components: a predictable, well-behaving workload and a bursty, non-predictable trespassing workload. We point out the nature of each component and explain its implication for workload management.
- We give a quantitative analysis of the impact of workload arrival distributions (exponential and heavy-tailed), Service Level Agreements (SLAs) (average and tail-distribution performance guarantees), and workload scheduling schemes (random dispatching and Least Workload Left) on the dynamic management efficiency of a virtual Yahoo! Video data center.

As the popularity of Internet Video sharing sites grows, and as consumer access bandwidth increases, the sheer volume of data exchanged for online video service has the potential to severely strain the resources of both centralized servers and edge networks. Understanding Internet Video sharing site workload will aid in network management, capacity planning, and the design of new systems. While no new streaming technologies are proposed in this paper, our measurement study leads to guidelines for new multimedia stream architecture/platform design (see discussions in Sections 4.3 and 5.5).

The rest of the paper is organized as follows. In Section 2 we present the background on Internet Video sharing services and the related work on Internet Video workload measurement. Section 3 describes the Yahoo! Video web site and the method we used for the workload data collection. The analysis of the Yahoo! Video measurement data is presented in Section 4, categorized into static and temporal properties. Section 5 presents a set of queueing-model based analysis results on a virtual VoD data center design, assuming the same measured workload. We conclude this paper in Section 6 with a discussion of future work.
2. Background and related work

2.1. Internet Video sharing

Internet Video sharing sites, including YouTube, Yahoo! Video, MySpace, and Clipshack, bring online video services into a new era. On those web sites, users can upload videos quickly, and files are automatically converted to a uniform, easily playable format. The videos are then tagged with keywords or phrases for content-based categorization and searching. Users can conveniently share their videos with friends by emailing the links or embedding them in web pages; ratings and comments on videos also bring new social aspects into the video sharing process. Tagging, social networking, and the abundance of user-generated content are the characteristics of these Internet Video sharing services as a Web 2.0 application [12].

Adobe's Flash Video (FLV) format is used by most Internet video sharing sites, including YouTube and Yahoo! Video. This reduces file size significantly and also allows most platforms and browsers to view the content with just a Flash player. The video delivery service usually implements progressive download technology, which enables playback of a partially downloaded video file but does not support the on-the-fly user interactions facilitated by traditional on-demand streaming, such as fast forward and rewind, or video rate adaptation. Progressive download works with web servers and delivers content over normal HTTP connections. Therefore, there is no rate control on the playback of a video; it is sent at the maximum rate that the video server and user can sustain.

The delivery infrastructure of an Internet Video sharing site is comprised of many servers, potentially including some from (one
or more) Content Distribution Networks (CDNs). As reported in [12], the video workload served by CDNs is a tiny portion of the total workload on YouTube. This is not surprising considering the financial cost of CDN service, and those Internet Video sharing sites mostly rely on their own data centers for service delivery. Understanding the massive workload on those sites and finding the architectural principles for designing an efficient data center to deliver the service motivate our work in this paper.

2.2. Related work

Measurement studies on Internet Video sharing services have focused on YouTube, the leader in this area. For example, [9] collected data on the video files of two categories and studied the corresponding statistical properties, including the popularity life-cycle of videos, the relationship between requests and video age, and the level of content aliasing and illegal content in the YouTube system; the workload on the "most viewed" category of the YouTube site was measured in [12] based on the traffic from a campus network, examining usage patterns, file properties, popularity, referencing characteristics, and transfer behaviors of YouTube; [10] crawled the YouTube web site and collected information on about 2.7 million YouTube video files, and showed noticeably different statistics on YouTube videos compared to traditional streaming videos, such as length, access pattern, and active life span. Our work complements these daily-based or even coarser measurement studies, and distinguishes itself from the previous studies by providing richer workload dynamics information through measurement data collected at a frequency of 30 min.

There are also many workload measurement studies on traditional online video systems. For example, [7] presented a workload analysis of two media servers located at two large universities in the U.S.; [21] used two long-term traces of streaming media services hosted within HP to develop a streaming media workload generator; [13] studied the media workload collected from a large number of commercial web sites hosted by a major ISP and that from a large group of home users via a cable company; [25] presented file reference characteristics and user behavior analysis of a production VoD system in China called the Powerinfo system; more recently, [16] analyzed a 9-month trace of MSN Video, Microsoft's VoD service, and studied the user behavior and video popularity distribution. While sharing some similarity, the workload characterization of Internet Video sharing sites is quite different from that of traditional online video systems, in both video file statistics and access patterns, as reported in the previous studies and also in this paper.

Caching and peer-assisted streaming have been suggested to improve distribution system performance for Internet Video services [16,12,9]. While we believe that a centralized data center will remain the key component of a video distribution infrastructure in practice, our study does consider the implications of the measured workload characteristics for the effectiveness of those two technologies.
3. Yahoo! video workload measurement

3.1. Yahoo! video

Yahoo! Video [1] is ranked the second U.S. online video web site in terms of total number of video views, just after YouTube; during July 2007, it delivered a total of 390 million video streams to 35.325 million unique U.S. viewers and contributed 4.3% of the U.S. online video traffic [4].

On the Yahoo! Video site, all videos are classified into 16 categories: Animals and Pets, Art and Animation, Autos, Commercials,
Entertainment, Family and Kids, Food and Drink, Funny, Instructional, Movies and Shorts, Music, News, Sports and Games, Technology and Products, Travel, People and Vlogs. Each video is assigned a unique ID (an integer), and has the following information posted on its web page: title, number of views (so far), video duration, average rating, number of ratings, added (uploaded) time, source (video producer), content description, tags (keywords), and comments.

One feature that the Yahoo! Video site offers, and YouTube does not, is real-time update of the number-of-views information, which is critical for the workload dynamics study in our work. To verify it, we did an experiment with two machines, one located in Los Angeles, California (enl.usc.edu) and the other in Princeton, New Jersey (mesh12.nec-labs.com). In the experiment, we clicked on one arbitrary video through one machine, waited for 5 s, and checked its views information through the other machine. We repeated the experiment switching the machines, selecting videos with different popularity, and at different times of day, and consistently observed accurate updates of the views information. For YouTube, we observed that the information update is slow and unpredictable, mostly at a daily frequency; this also explains why the previous related work on YouTube was based only on daily measurement data.

3.2. Data collection

We crawled all 16 categories on the Yahoo! Video site for 46 days (from July 17 to August 31, 2007), and the data was collected every 30 min. This measurement rate was chosen as a tradeoff between analysis requirements and resource constraints. Due to the massive scale of the Yahoo! Video site, we limited the data collection to the first 10 pages of each category. Since each page contains 10 video objects, each measurement collects dynamic workload information for 1600 video files in total. Throughout the whole collection period, we recorded 9986 unique videos and a total of 32,064,496 video views. This translates into a daily video request rate of 697,064, and gives approximately 5.54% coverage of the total Yahoo! Video workload in July 2007, based on [4].

Before starting the analysis, preprocessing of the raw data is necessary. First, we found that the first 100 videos from each category are not always the most popular ones, because some videos turned out to be new files without VIEWS information; this means the collected data was not biased toward only popular files and therefore can be a reasonable representation of the overall workload. Because of that, we had to insert "0 VIEWS" for the new files manually. Second, due to unknown server abnormalities, some videos occasionally have their VIEWS set to a number smaller than (sometimes equal to 0) the number seen in a previous measurement, which is obviously wrong; we therefore estimate the correct VIEWS with a simple heuristic. Fortunately such events were rare, accounting for only 0.72% of the total data. After this preprocessing, we can proceed to the analysis.
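To make the preprocessing step concrete, the following is a minimal sketch of the kind of cleaning heuristic described above; the function names and the choice of carrying the last valid cumulative count forward are our own illustration, not the exact procedure used in the paper.

```python
# Illustrative sketch of the trace-cleaning heuristic described above.
# Assumptions (ours, not from the paper): each video's crawl history is a list of
# cumulative VIEWS samples taken every 30 minutes; missing counts for new files
# are treated as 0; a sample that drops below the previous valid sample is
# treated as a server glitch and replaced by carrying the last valid value forward.
from typing import List, Optional


def clean_views_series(raw: List[Optional[int]]) -> List[int]:
    """Repair a cumulative view-count series so that it is non-decreasing."""
    cleaned: List[int] = []
    last_valid = 0
    for sample in raw:
        if sample is None:          # new file without VIEWS information -> insert 0
            sample = 0 if not cleaned else last_valid
        if sample < last_valid:     # decreasing cumulative count: server abnormality
            sample = last_valid     # simple estimate: carry the last valid count forward
        cleaned.append(sample)
        last_valid = sample
    return cleaned


def views_per_interval(cleaned: List[int]) -> List[int]:
    """Convert cumulative counts into per-30-min request counts."""
    return [b - a for a, b in zip(cleaned, cleaned[1:])]


if __name__ == "__main__":
    raw = [None, 120, 119, 300, 0, 450]      # hypothetical crawl samples
    series = clean_views_series(raw)
    print(series)                             # [0, 120, 120, 300, 300, 450]
    print(views_per_interval(series))         # [120, 0, 180, 0, 150]
```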
4. Workload statistics

In this section, we present our statistical analysis from the viewpoint of workload management. In the measurement data, the four attributes utilized are category, video ID, number of views, and video duration. From them we can learn the basic workload properties, including workload composition in terms of content (categories), video request arrival rate, file popularity, and workload distribution in terms of service time, as well as a set of derived properties concerning workload stationarity, predictability, and burstiness. As our raw measurement data reflects the aggregated workload at a small time scale of 30 min, we further process the data to learn about
the workload properties at time scales of 1 h, 1 day, and 1 week; this gives us another dimension in the statistical analysis. Similar to [21], we partition the studied workload properties into two categories:

- Static properties: category-based workload decomposition, video duration, file popularity.
- Temporal properties: request size (service time) distribution stationarity, arrival rate predictability, and burstiness.

4.1. Static properties

4.1.1. Category-based workload decomposition

Table 1 summarizes the number and percentage of video views in each of the 16 Yahoo! Video categories. The distribution is not very skewed: the most popular category is People and Vlogs, attracting 14.76% of the views; the second is News, at 13.1%; and the least popular is Animal and Pets, at 2.2%. This diversity reflects the interest diversity of Yahoo! users, which might be explained by Yahoo!'s positioning of the site as a general web portal.

4.1.2. Video duration

We recorded 9986 unique videos in total, and the video duration distribution is shown in Fig. 1. The video durations range from 2 to 7518 s. Among them, 76.3% are shorter than 5 min, 91.82% are shorter than 10 min, and 97.66% are shorter than 25 min. The mean video duration is 283.46 s, and the median duration is 159 s. This is close to the statistics on YouTube videos reported in [10], although YouTube videos seem to have even shorter durations; the difference might be due to the different video upload policies on the two sites: YouTube imposes a 10-min limit on regular users, while Yahoo! Video allows regular users to upload videos of up to 100 MB (more than 40 min at a 300 Kb/s bit rate).

On those video sharing sites, the video bit rate is usually around 300 Kb/s with the FLV format; therefore short video durations of less than 5 min mean small video sizes of roughly 10 MB or less. If file popularity were also skewed (e.g., following Zipf), then a video server with a few GB of memory could easily stream most requests without accessing its disks, thereby maximizing its capacity in terms of the number of concurrent connections it can support. Next, we investigate the file popularity in the measured workload.

4.1.3. File popularity

File popularity is defined as the distribution of stream requests over video files during a measurement interval.
Table 1
Breakdown of video views by category.

Category                     Number of views    Percentage
Animal and Pets              723,297            2.2
Art and Animation            970,815            2.96
Auto                         2,142,053          6.54
Commercials                  950,477            2.9
Entertainment                2,440,490          7.45
Family and Kids              2,058,516          6.29
Food and Drink               3,596,069          10.98
Funny                        1,311,612          4
Instructional                2,263,844          6.9
Movie and Shorts             1,261,668          3.85
Music                        2,590,677          7.9
News                         4,291,686          13.1
Sports and Games             1,203,403          3.67
Technology and Products      1,215,315          3.71
Travel                       895,644            2.74
People and Vlogs             4,832,385          14.76
Fig. 1. Video duration distribution.

Fig. 3. Fitting for the file popularity distribution at the 1-week time scale (request probability vs. file rank; one-week popularity against exponential, power-law, and power-law-with-exponential-cutoff fits).
Fig. 2 shows typical file popularity distributions at four time scales. As we can see, the distributions at different time scales are quite similar, which implies that the request distribution over files may have strong stationarity; we further validate this hypothesis in Section 4.2.1. The popularity data also follow the widely used Pareto principle (or 80-20 rule): the top 20% most popular videos account for nearly 80.1% of the total views, while the remaining 80% of the videos account for the rest of the workload. In Fig. 3, we pick the 1-week distribution and perform goodness-of-fit tests with several distribution models. It turns out that Zipf with an exponential cutoff fits best and well (with Zipf exponent 0.8545), as shown in the figure. Interestingly, [9] reported the same fitting result in their YouTube video popularity study; that paper gives a thorough discussion of the nature of Zipf with a truncated tail, its potential causes, and its implications for video sharing efficiency, which we do not repeat here.
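As an illustration of this fitting step, the sketch below fits a Zipf law with an exponential cutoff, p(r) ~ r^(-a) e^(-r/b), to a rank-ordered popularity vector by least squares in log space; the synthetic data, the use of scipy.optimize.curve_fit, and the log-space objective are our assumptions, not the exact fitting procedure of the paper.

```python
# Minimal sketch (our illustration): fit a Zipf law with exponential cutoff,
#   p(r) ~ C * r**(-a) * exp(-r / b),
# to a rank-ordered request-count vector in log space.
import numpy as np
from scipy.optimize import curve_fit

def log_zipf_cutoff(rank, log_c, a, b):
    # log of C * rank**(-a) * exp(-rank / b)
    return log_c - a * np.log(rank) - rank / b

# Hypothetical weekly view counts per file, sorted by rank (1 = most popular).
rng = np.random.default_rng(0)
ranks = np.arange(1, 2001, dtype=float)
true_counts = 1e6 * ranks ** -0.85 * np.exp(-ranks / 900.0)
counts = true_counts * rng.lognormal(0.0, 0.2, size=ranks.size)

params, _ = curve_fit(log_zipf_cutoff, ranks, np.log(counts),
                      p0=(np.log(counts[0]), 1.0, 500.0))
log_c, a, b = params
print(f"estimated Zipf exponent a = {a:.3f}, cutoff scale b = {b:.1f}")
```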
4.1.4. Correlation between file popularity and duration

Related to the video duration distribution and popularity are questions involving user preference, namely the relationship between video length and popularity. Intuitively, there should be some correlation between the two, and the scatter plot of the data collected from July 25th to August 31st (Fig. 4) confirms the hypothesis. In the scatter plot, each dot represents the number of views of a video of the corresponding duration. The figure shows a clearly larger density for shorter videos than for longer ones, and the most popular files are mostly quite short. Therefore, Figs. 1 and 4 together tell us that, in addition to the tendency for uploaded video files to be short, users also prefer short videos.

4.2. Temporal properties

4.2.1. Request size distribution stationarity

The request size distribution is defined as the distribution of stream requests over video durations during a measurement interval. Its stationarity property has a strong correlation with file access locality. Understanding it is also important in data center performance management, as size-based workload scheduling schemes have shown outstanding performance for provisioning general web services [15,26].
Fig. 2. File popularity distribution at different time scales (request probability vs. file rank, for the 30-min, 1-h, 1-day, and 1-week scales).

Fig. 4. File popularity and duration relationship.
Now we check the stationarity of the request size distribution in terms of service time. In this study, the service time of a video request is defined as the duration of the requested video; the actual service time should be the playback time, but our measurement data (i.e., the Yahoo! Video site interface) did not contain such detailed information on the workload, so we chose video duration as an approximation. We use the histogram intersection distance to measure the change between two request size distributions at different time points. The histogram intersection distance is defined as [20]:

d^h_{1,2} = 1 - \sum_{m=0}^{M} \min(p_1[m], p_2[m]),

where p_1, p_2 are the two distribution histograms and m indexes the bins. In studying the stationarity of the job arrival process, the histogram intersection distance is a simple metric of load stability. When the job size distributions are the same during two consecutive time intervals, d^h_{1,2} is equal to 0, and the system will experience the same traffic intensity given that the arrival rates are the same; when the jobs are very different from one time interval to another (e.g., requests shift from a wide distribution over diverse videos to a small set of hot clips), d^h_{1,2} approaches 1 and the system should expect a large load variation from one time interval to the next.

In the study, we generated the histogram of a workload distribution with the duration range [2, 7518] (in seconds) and M = 751 equal-width bins; for each time scale, we calculated the pair-wise histogram intersection distance of two adjacent data points along the time axis. Fig. 5 shows the CDFs of the histogram intersection distance distribution for three time scales. We can see that at the 30-min and 1-h scales, the histogram distance is very small most of the time: for example, 80% of the time it is no more than 0.1, and 90% of the time it is no more than 0.15. But from day to day, the difference between request size distributions is obvious. Therefore, for short-term capacity planning the current workload distribution is a good indication for the next time period, and dynamic provisioning only needs to focus on request arrival rate dynamics. However, if we want to carry out capacity planning on a daily basis, both arrival rate dynamics and request size distribution dynamics need to be taken into account.

Fig. 6. Arrival rate evolution at 30-min time scale.
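The following is a small sketch of how this distance can be computed from two lists of request durations; the binning parameters mirror the ones stated above, while the function and variable names, as well as the synthetic input data, are our own illustration.

```python
# Sketch: histogram intersection distance between two request-size distributions.
# Bin setup follows the text above (durations in [2, 7518] s, 751 equal-width bins);
# everything else (names, synthetic data) is our illustration.
import numpy as np

DURATION_RANGE = (2.0, 7518.0)
NUM_BINS = 751

def size_histogram(durations_s: np.ndarray) -> np.ndarray:
    """Normalized histogram of requested video durations."""
    hist, _ = np.histogram(durations_s, bins=NUM_BINS, range=DURATION_RANGE)
    return hist / max(hist.sum(), 1)

def histogram_intersection_distance(p1: np.ndarray, p2: np.ndarray) -> float:
    """d = 1 - sum_m min(p1[m], p2[m]); 0 = identical distributions, ~1 = disjoint."""
    return 1.0 - float(np.minimum(p1, p2).sum())

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    interval_a = rng.exponential(280.0, size=5000)   # hypothetical request durations (s)
    interval_b = rng.exponential(300.0, size=5000)
    d = histogram_intersection_distance(size_histogram(interval_a),
                                        size_histogram(interval_b))
    print(f"histogram intersection distance = {d:.3f}")
```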
4.2.2. Arrival rate predictability

We plot the arrival rates at the four time scales in Figs. 6-9. The X-axis shows the time discretized at the given time scale, and the Y-axis shows the average arrival rate during a time interval. For example, in Fig. 6, a data point at (x, y) shows that the average arrival rate is y at time point x with an interval of 30 min.
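A small sketch of how the coarser time-scale series can be derived from the 30-min counts (the aggregation helper, the synthetic series, and the max/mean summary are our assumptions, not the authors' processing scripts):

```python
# Sketch (our illustration): aggregate the 30-min request counts to coarser time
# scales and compute the max/mean ratio per scale (the over-provisioning factor
# discussed later in this section).
import numpy as np

def aggregate(counts_30min: np.ndarray, intervals_per_bucket: int) -> np.ndarray:
    n = (len(counts_30min) // intervals_per_bucket) * intervals_per_bucket
    return counts_30min[:n].reshape(-1, intervals_per_bucket).sum(axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    # Hypothetical 46 days of 30-min request counts: daily cycle plus rare spikes.
    t = np.arange(46 * 48)
    base = 40_000 + 15_000 * np.sin(2 * np.pi * t / 48)
    spikes = 150_000 * (rng.random(t.size) < 0.01)
    counts = rng.poisson(base) + spikes.astype(int)

    for label, k in [("30 min", 1), ("1 h", 2), ("1 day", 48), ("1 week", 48 * 7)]:
        series = aggregate(counts, k)
        print(f"{label:>6}: over-provisioning factor = "
              f"{series.max() / series.mean():.2f}")
```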
Fig. 5. Histogram intersection distance distribution (CDF of the distance for the 30-min, 1-h, and 1-day data).

Fig. 7. Arrival rate evolution at 1-h time scale.

Fig. 8. Arrival rate evolution at 1-day time scale.
Fig. 9. Arrival rate evolution at 1-week time scale.

Fig. 10. Workload autocorrelation coefficient (autocorrelation vs. lag k).
Fig. 11. Prediction error with a simple heuristic (CDF of |est. - real|/real).

The first metric we look at is the over-provisioning factor, defined as the ratio of the maximal arrival rate to the average arrival rate over the whole measurement period. This is one value a data center operator takes into consideration in planning his or her system. At a time scale of 30 min the over-provisioning factor is 13.2, and at 1 h it is 12.9; only at 1 day or 1 week does the aggregation effect reduce the factor to a number less than 2. The reason is the transient and short-lived bursty loads: as shown in Fig. 6, a bursty load typically lasted no more than 1 h, and the load spikes were dispersed widely along the time axis. Therefore, while the maximal arrival rate at the hourly scale caught the transient peaks, the maximal daily arrival rate is much lower than 24 times the hourly maximum, as it is the sum of a few hourly load spikes and many hourly regular loads. Clearly, the cost could be high if an operator over-provisioned the data center based on the over-provisioning factor at a small time scale.

On workload evolution, the 1-week graph shows a clear increasing trend. The 1-day graph shows significant dynamics with irregular spikes. When we look at the graphs at 30 min and 1 h, those spikes turn out to be separate bursts each lasting for a short time. If we removed those spikes from Figs. 6 or 7, the remaining data points would be quite regular and predictable. We calculated the autocorrelation coefficient of the arrival rates at the 30-min time scale, and from Fig. 10 we can see that the workload is highly correlated in the short term. Fig. 11 shows the CDF of the prediction error of a simple heuristic that uses the arrival rate of the last interval as the prediction of the next interval's arrival rate. We can see that the workload can be predicted well except for the spikes: 82% of the time the prediction error is no more than 10%, 90% of the time no more than 17%, and 95% of the time no more than 28%. We also used Fourier analysis to discover possible periodicity in the workload dynamics after removing these spikes. As shown in Fig. 12, the maximum value in the spectrum indicates that the period is 1 day. With such strong periodic components, known statistical prediction approaches like that in [8] can be applied to further improve the accuracy of capacity planning.
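The prediction heuristic and the periodicity check can be sketched as follows; the code is our illustration of the last-value predictor and the FFT step, not the authors' scripts, and the printed error quantiles depend entirely on the input series.

```python
# Sketch (our illustration): last-value prediction error and FFT-based periodicity
# for a series of 30-min request counts.
import numpy as np

def last_value_prediction_errors(counts: np.ndarray) -> np.ndarray:
    """Relative error |est - real| / real when predicting each interval
    with the previous interval's arrival rate."""
    est, real = counts[:-1].astype(float), counts[1:].astype(float)
    return np.abs(est - real) / np.maximum(real, 1.0)

def dominant_period_in_intervals(counts: np.ndarray) -> float:
    """Period (in number of 30-min intervals) of the strongest non-DC frequency."""
    x = counts - counts.mean()
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0)      # cycles per 30-min interval
    k = 1 + np.argmax(power[1:])                # skip the DC component
    return 1.0 / freqs[k]

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    t = np.arange(46 * 48)
    counts = rng.poisson(40_000 + 15_000 * np.sin(2 * np.pi * t / 48))

    err = last_value_prediction_errors(counts)
    for q in (0.82, 0.90, 0.95):
        print(f"P{int(q*100)} prediction error: {np.quantile(err, q):.3f}")
    print(f"dominant period ~ {dominant_period_in_intervals(counts):.1f} intervals "
          f"(48 = one day)")
```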
Fig. 12. Periodicity of the workload (power vs. period, in units of 30 min).

4.2.3. Burstiness

While we cannot predict these unexpected spikes in the workload, it is necessary to understand the nature of the burstiness and to find an efficient way to handle a bursty event once it happens.
A comparison of the request (popularity) distribution during one spike interval with that in the normal interval right before it is shown in Fig. 13. We can see that the workload can be viewed as two parts: a base workload similar to the workload in the previous normal period, and an extra workload that is due to several very popular files.
Fig. 13. Popularity comparison of a 30-min stable interval and a bursty interval (number of requests within 30 min vs. file rank).

4.3. Discussion

Here we discuss the implications of the observed workload characteristics for the design of a high-performance and efficient video distribution data center.

The characteristics of video duration and file popularity suggest that commodity PCs can work as high-performance video servers for those video sharing sites. Assuming disk I/O is the performance bottleneck, traditional streaming servers have special hardware configurations with complicated disk access scheduling schemes. But a media server is CPU-bound if it can serve streams from its memory [11], and the latter is supported by our video duration and file popularity statistics. The trend toward multi-core CPU architectures for commodity PCs makes them even more powerful and fit for this new video service.

The temporal properties of the measured workload reveal two contrasting components: a "well-behaving" workload component that has strong variability, strong autocorrelation, and a strong periodic component; and a "trespassing" workload component that is bursty and unpredictable. For a data center operator facing the measured workload, the good news is that the "well-behaving" workload component, which is present all the time, is ideal for dynamic provisioning. In the rare times when the "trespassing" workload comes, the positive news is that it has extremely high content locality, so the operator can make the best of caching/CDNs, or simply provision extra servers using the best-case capacity estimation for memory-based streaming (usually 2.5 to 3 times higher than the capacity estimation for disk-based streaming [11]).

5. Workload and capacity management: a virtual design

Over-provisioning is the most typical practice for workload and capacity management of Internet applications. It simplifies management, but with an obvious cost in resource demand, which includes not only hardware but also power/energy consumption and building infrastructure. Given the observations on the large over-provisioning factor and good workload predictability in Section 4, we would like to further investigate a few other factors that affect the efficiency of dynamic provisioning: arrival rate distribution, workload scheduling schemes, and Service Level Agreements (SLAs).

In the following, we use the workload statistics from the 30-min measurement to simulate the requests on a virtual Yahoo! Video data center, and carry out dynamic provisioning at an interval of 30 min. Queueing-theoretic techniques are used to estimate the resource (server) demand with different instance combinations of the three factors mentioned above, and we give a quantitative analysis of those factors with a set of numerical examples. Based on the results, we try to learn the implications of those factors for designing the workload and capacity management components of this virtual system.

5.1. Methodology
5.1.1. Non-homogeneous arrival model

Because our measured data does not contain detailed information on individual requests, inter-arrival time distributions cannot be inferred directly. In the analysis, a non-homogeneous arrival process with homogeneous intervals [19] is used to model the individual request arrivals. In the model, time is divided into 30-min intervals; each interval has its own expected arrival rate according to the measurement data; within each interval, arrivals follow a constant-rate stochastic process with a certain inter-arrival time distribution. That is, we used the measurement data to fit the mean of the request inter-arrival time, and generated synthetic load to simulate different scenarios for the variance of the request inter-arrival time. Specifically, we test the two typical Internet traffic inter-arrival time distributions, exponential and heavy-tailed (e.g., Pareto, P[X > x] = (x_0/x)^a with x >= x_0, a > 1), as suggested in [19].

5.1.2. Queueing-model based capacity planning

We model a single video server as a group of virtual servers with First Come First Served (FCFS) queues. The virtual server number corresponds to the physical server's capacity, which is defined as the maximum number of concurrent streams the server can deliver without losing stream quality [11]. Two factors can affect the server's real-time capacity: file popularity and stream bit rate. Our study of the popularity stationarity, together with the reported almost-fixed bit rate (concentrated around 330 Kb/s [10,12]) in video sharing services, indicates that reliable server capacity estimation within the provisioning interval is possible. In the analysis, the capacity of a video server is set to 300, based on the streaming server capacity measurement results in [11]. In this way, we model the video service center as a distributed system with many FCFS servers.

5.1.3. Workload scheduling schemes

Assuming each incoming job must be dispatched immediately via a gateway to one of the video servers, choosing an appropriate scheduling scheme for the dispatcher is a key design decision. This is a general problem of mapping requests to service agents and has been studied extensively in management science; we will not discuss the optimality of scheduling schemes here. Instead, we want to learn the benefit of dynamic provisioning under any dispatching scheme in such a large-scale data center. We pick two well-known schemes to study: random dispatching, as shown in Fig. 14, which does not make use of any information about the servers and simply sends each incoming job to one of the s servers uniformly with probability 1/s; and the Least Workload Left (LWL) scheme, as shown in Fig. 15, which tries to achieve load balancing among servers by using per-server workload information and assigns each job to the server with the least workload left at the arrival instant. One reason we pick these two schemes is that they represent two extremes in management overhead: random dispatching is stateless and therefore robust, while LWL needs up-to-date information on all servers and is therefore sensitive to system changes.
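To make the methodology concrete, below is a minimal simulation sketch (our own illustration, not the authors' simulator) that generates one 30-min interval of arrivals with either exponential or Pareto inter-arrival times of the same mean, and dispatches them to FCFS servers using either random dispatching or Least Workload Left. Each physical server is abstracted here by its total unfinished work, drained as a fluid at the 300-concurrent-stream capacity quoted above; the paper's finer model of 300 FCFS virtual servers per physical server is not reproduced.

```python
# Minimal sketch (our illustration) of the arrival model and the two dispatching
# schemes of Section 5.1. Each physical server is abstracted by its total
# unfinished work ("workload left"), drained as a fluid at 300 seconds of work
# per second; all parameter values below are stand-ins.
import random

CAPACITY = 300.0      # concurrent streams per physical server, from [11]

def interarrival_sampler(kind: str, mean: float):
    """Exponential or Pareto inter-arrival times with the same mean."""
    if kind == "exp":
        return lambda: random.expovariate(1.0 / mean)
    alpha = 2.3713                        # shape used in the Section 5.2 example
    x0 = mean * (alpha - 1.0) / alpha     # scale chosen so that E[X] = mean
    return lambda: x0 * (1.0 - random.random()) ** (-1.0 / alpha)

def simulate_interval(kind: str, rate_per_s: float, num_servers: int,
                      scheme: str, mean_size_s: float = 163.7,
                      horizon_s: float = 1800.0) -> float:
    """Return max/mean backlog across servers after one 30-min interval."""
    sample = interarrival_sampler(kind, 1.0 / rate_per_s)
    backlog = [0.0] * num_servers         # remaining work per server (seconds)
    t = sample()
    while t < horizon_s:
        size = random.expovariate(1.0 / mean_size_s)    # stand-in job size
        if scheme == "random":
            i = random.randrange(num_servers)
        else:                              # "lwl": least workload left
            i = min(range(num_servers), key=backlog.__getitem__)
        backlog[i] += size
        dt = sample()
        backlog = [max(0.0, b - CAPACITY * dt) for b in backlog]
        t += dt
    mean_backlog = sum(backlog) / num_servers
    return max(backlog) / (mean_backlog + 1e-9)

if __name__ == "__main__":
    random.seed(4)
    for kind in ("exp", "pareto"):
        for scheme in ("random", "lwl"):
            r = simulate_interval(kind, rate_per_s=15.0, num_servers=10,
                                  scheme=scheme)
            print(f"{kind:>6} arrivals, {scheme:>6} dispatch: "
                  f"max/mean backlog = {r:.2f}")
```

Under LWL the backlog should stay nearly balanced across servers, while random dispatching typically shows a larger max-to-mean ratio; this is the intuition behind the comparison in Section 5.4.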
Clearly, random dispatching is preferred in a data center design if its performance does not lag too far behind that of LWL. The measurement data give us a good numerical example with which to check this tradeoff, and since both schemes are numerically tractable, they make it feasible to understand the impact of workload characteristics on resource demand quantitatively.

If we assume the job arrival process is Poisson, then under random dispatching each server is an identical M/G/1 queue because of the Poisson decomposition property; for the LWL scheme, Theorem 2 of [14] shows that the performance of a system with LWL dispatching is the same as that of an M/G/s system, i.e., it works as if one queue were maintained at the dispatcher and every server fetched the next job from it whenever it finishes the previous one. There are queueing-theory results available for approximating the performance measure, in terms of the waiting time W, of the M/G/s queue, and thus also of the LWL scheme. Specifically, for an M/G/s queue with arrival rate λ, job size distribution with kth moments m_k, traffic intensity ρ = λ m_1 / s, and s servers, we use the following formulas from [23] to calculate the expected performance:

P[W(M/G/s) > 0] \approx P[W(M/M/s) > 0] = \frac{(s\rho)^s / (s!\,(1-\rho))}{\sum_{k=0}^{s-1} (s\rho)^k / k! + (s\rho)^s / (s!\,(1-\rho))}

E[W(M/G/s)] \approx \frac{m_2}{2 m_1 s (1-\rho)} \, P[W(M/G/s) > 0]

P[W(M/G/s) > t] \approx P[W(M/G/s) > 0] \, e^{-2 s m_1 (1-\rho) t / m_2}

5.1.4. Service level agreements

Service Level Agreements (SLAs) for a service define the Quality of Service (QoS) guarantees for that service based on the cost model and the anticipated level of requests from the service customers. In this paper, we focus on the relationship between the QoS guarantees and the required resources. For video stream services, we consider two QoS metrics: the stream quality of an accepted connection, and the waiting time of a video request in the queue before it is accepted for streaming. Assuming enough network bandwidth, QoS on stream quality within the data center can be guaranteed through admission control based on server capacity. For the waiting time W, we consider two types of SLAs:

- Maximal average waiting time, defined as E[W] <= x, x > 0. For example, the SLA could be that the average waiting time is no more than 5 s, i.e., x = 5.
- Bound on the tail of the waiting time distribution, defined as P[W > x] < y with x > 0, y < 1. For example, the SLA could be that 90% of the requests experience no more than 5 s of delay, i.e., x = 5 and y = 10%.

In the rest of the paper, we abbreviate the SLA on the maximal average waiting time as W_avg, and that on the bound of the waiting-time tail as W_tail.

Fig. 14. Distributed system with FCFS servers and the random dispatching scheme.

Fig. 15. Distributed system with FCFS servers and the least-workload-left dispatching scheme.

5.2. Non-exponential to exponential: the need for workload shaping

First, we want to understand the effect of the job inter-arrival distribution on the performance measure of the system, and thus also on the resource demand. According to Kingman's upper bound on the delay of a GI/GI/1 queue, the arrival rate λ a server can support is at least [24]:

\lambda \ge \frac{1}{m_1 + \frac{\sigma_a^2 + \sigma_b^2}{2(d - m_1)}}    (1)

where d is the mean waiting time and σ_a², σ_b² are the variances of the inter-arrival and service times, respectively. Thus, as the variance of the inter-arrival time σ_a² increases, more servers are required to achieve the SLA; conversely, it is intuitive that we can save servers by reducing the variance of the job inter-arrival times.

We take the measurement data of August 1st between 00:00 am and 00:30 am to extract the workload parameters λ = 1285.103, m_1 = 163.7021, m_2 = 54905.8. If we set the SLA W_avg as E[W] <= 1.5 m_1, then we need 1190 physical servers for Poisson arrivals. If we instead assume the inter-arrival distribution is Pareto, P[X > x] = (0.00045/x)^{2.3713} (the parameters are picked such that the expected inter-arrival time is the same as in the Poisson case and the variance is finite), then we need 1440 physical servers, a 21% higher resource demand.

We suggest introducing a workload shaping component, as proposed in [5], before the job dispatcher: not only can it help save servers by transforming a heavy-tailed inter-arrival (or any other) distribution into a light-tailed exponential one, it also enables more accurate capacity planning by allowing the use of the queueing results for the M/G/s system. As pointed out in [22], the best approximations known theoretically for the waiting time distribution of the GI/GI/s model (corresponding to the heavy-tailed arrival scenario) are less reliable, because they have not yet been studied sufficiently and evidently depend more critically on the distribution beyond the first two moments of the inter-arrival distribution.

5.3. From average to tailed performance: the impact of SLAs on resource demand

Next, we want to understand the impact of SLA requirements on the resource demand. Taking the same measurement data of August 1st from 00:00 am to 00:30 am and considering the random dispatching scheme, we vary the parameter x in the SLA W_avg from 0.1 m_1 to 3 m_1; the resource demand as a function of x is shown in Fig. 16.
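The following sketch shows how resource-demand curves of this kind can be computed from the approximations above. The stable iterative Erlang-B recursion, the linear search over the number of physical servers, and the reuse of the August 1st parameter values are our own illustration rather than the authors' code, and the absolute numbers such a sketch produces need not match Table 2, since the exact units of λ and other provisioning details are not fully specified here.

```python
# Sketch (our illustration): servers needed under random dispatching, using the
# M/G/s approximations above. Each physical server is an M/G/300 queue fed a
# 1/N share of the total arrival rate; we search for the smallest N meeting the SLA.
import math

S_VIRTUAL = 300            # concurrent streams per physical server, from [11]

def erlang_c(s: int, offered_load: float) -> float:
    """P[W > 0] for M/M/s with offered load a = lambda * m1 (requires a < s)."""
    # Numerically stable Erlang-B recursion, then convert B -> C.
    b = 1.0
    for k in range(1, s + 1):
        b = offered_load * b / (k + offered_load * b)
    rho = offered_load / s
    return b / (1.0 - rho * (1.0 - b))

def mean_wait(lam: float, m1: float, m2: float, s: int = S_VIRTUAL) -> float:
    """E[W(M/G/s)] ~ m2 / (2 m1 s (1 - rho)) * P[W > 0]."""
    a = lam * m1
    if a >= s:
        return math.inf
    rho = a / s
    return m2 / (2.0 * m1 * s * (1.0 - rho)) * erlang_c(s, a)

def tail_wait(lam: float, m1: float, m2: float, t: float, s: int = S_VIRTUAL) -> float:
    """P[W(M/G/s) > t] ~ P[W > 0] * exp(-2 s m1 (1 - rho) t / m2)."""
    a = lam * m1
    if a >= s:
        return 1.0
    rho = a / s
    return erlang_c(s, a) * math.exp(-2.0 * s * m1 * (1.0 - rho) * t / m2)

def servers_needed(total_lam: float, m1: float, m2: float, sla) -> int:
    """Smallest N such that each of N servers (arrival rate total_lam / N) meets sla."""
    n = max(1, math.ceil(total_lam * m1 / S_VIRTUAL))   # minimum for stability
    while not sla(total_lam / n, m1, m2):
        n += 1
    return n

if __name__ == "__main__":
    # Parameter values quoted in Section 5.2 for Aug 1st, 00:00-00:30 (units as given there).
    lam, m1, m2 = 1285.103, 163.7021, 54905.8
    w_avg = lambda l, a, b: mean_wait(l, a, b) <= 1.5 * m1
    w_tail = lambda l, a, b: tail_wait(l, a, b, t=m1) < 0.3
    print("random dispatching, W_avg SLA :", servers_needed(lam, m1, m2, w_avg))
    print("random dispatching, W_tail SLA:", servers_needed(lam, m1, m2, w_tail))
```

For the LWL scheme, the same helpers can be reused by modeling the whole data center as a single M/G/s queue with s = 300·N virtual servers, following the Theorem 2 equivalence cited above.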
Table 2
Performance comparison of the two dispatching schemes, in number of servers.

Dispatching scheme    E[W] <= 1.5 m_1    P[W > m_1] < 0.3
Random                1190               2330
LWL                   710                730
Fig. 16. Resource demand (servers needed) as a function of x in the W_avg SLA.
When the SLA requirement is not strict (x > m_1), the resource demand remains almost flat; once the requirement passes the cut-point around m_1, the demand increases dramatically as the requirement becomes stricter. A data center operator should take this function into consideration when deciding the cost model in the SLA. For the SLA W_tail, we fix the parameter x = m_1 and vary the other parameter y; the results are shown in Fig. 17. We make a similar observation as for W_avg, and note that for this type of SLA the resource demand is even more sensitive to the SLA parameters.

5.4. Is random dispatching good enough?

Taking the measurement data of August 1st from 00:00 am to 00:30 am, we numerically calculated the resource demands of the random and LWL dispatching schemes with Poisson arrivals and the SLA P[W > m_1] < 0.3 or E[W] <= 1.5 m_1. The results are shown in Table 2. LWL performs significantly better than the random scheme, especially for the W_tail SLA.

We further extended the simulation to one week. The data for the week of August 13th to August 20th was used as input, and Fig. 18 shows the dynamic provisioning results with the SLA requirement W_avg: E[W] <= 1.5 m_1. During that week, on average 54.9% of the servers could be saved with the LWL scheme compared to the random dispatching scheme.
Fig. 18. Resource demand within one week: W_avg case (servers needed over time, LWL vs. random).
Using the same data and setting the SLA requirement to W_tail: P[W > m_1] < 0.3, we repeated the simulation; Fig. 19 shows the results. On average, 69.9% of the servers could be saved with the LWL scheme compared to the random dispatching scheme.

5.5. Discussion

As we can see, making use of server load information and implementing dispatching schemes beyond the naive random scheme can reduce the resource demand of such a large data center dramatically, especially under strict SLAs. Depending on the job size distribution, the best dispatching scheme may differ, as discussed in [14].

One factor we did not account for in the capacity planning analysis is the incomplete video sessions observed in streaming workloads [25,21]: a significant portion of clients do not finish playing an entire video.
Fig. 17. Resource demand (servers needed) as a function of y in the W_tail SLA.

Fig. 19. Resource demand within one week: W_tail case (servers needed over time, LWL vs. random).
Our measurement data, lacking individual view-session information, missed such behavior. We note that our analysis therefore gives an upper bound on the resource demand, assuming no incomplete video sessions; our analytical tools can take such information into account and may yield even lower resource demand.

6. Conclusions

In this paper, we presented a measurement study of a large Internet video sharing site, Yahoo! Video. With the clear goal of facilitating data center design, this paper gives a comprehensive workload characterization and proposes a set of guidelines for workload and capacity management in a large-scale video distribution system.

The success of YouTube brings online video into a new era, and its success pushes another wave of building large online video sites on the Internet [17]. We believe the results in this paper will give insights for the design and operations management of those new Internet service data centers.

References

[1] Yahoo! video. Available from: .
[2] Youtube. Available from: .
[3] ComScore Video Metrix report: 3 out of 4 U.S. Internet Users Streamed Video Online in May. Available from: , March 2007.
[4] ComScore Video Metrix report: U.S. Viewers Watched an Average of 3 hours of Online Video in July. Available from: , July 2007.
[5] D. Abendroth, U. Killat, Intelligent shaping: well shaped throughout the entire network? in: Proceedings of INFOCOM 2002, 2002.
[6] M. Abundo, Youtube tracks outages on status blog. Available from: , June 2007.
[7] J.M. Almeida, J. Krueger, D.L. Eager, M.K. Vernon, Analysis of educational media server workloads, in: NOSSDAV'01: Proceedings of the 11th International Workshop on Network and Operating Systems Support for Digital Audio and Video, ACM, New York, NY, USA, 2001, pp. 21-30.
[8] N. Bobroff, A. Kochut, K. Beaty, Dynamic placement of virtual machines for managing SLA violations, in: IM '07: 10th IFIP/IEEE International Symposium on Integrated Network Management, IEEE, Munich, Germany, 2007, pp. 119-128.
[9] M. Cha, H. Kwak, P. Rodriguez, Y.-Y. Ahn, S. Moon, I Tube, YouTube, everybody tubes: analyzing the World’s largest user generated content video system, in: ACM Proceedings of Internet Measurement Conference, San Diego, CA, USA, October 2007. [10] X. Cheng, C. Dale, J. Liu, Understanding the Characteristics of Internet Short Video Sharing: YouTube as a Case Study. ArXiv e-prints, 707, July 2007. [11] L. Cherkasova, L. Staley, Measuring the capacity of a streaming media server in a utility data center environment, in: MULTIMEDIA ’02: Proceedings of the tenth ACM International Conference on Multimedia, ACM, New York, NY, USA, 2002, pp. 299–302. [12] P. Gill, M. Arlitt, Z. Li, A. Mahanti, YouTube traffic characterization: a view from the edge, in: ACM Proceedings of Internet Measurement Conference, San Diego, CA, USA, October 2007. [13] L. Guo, S. Chen, Z. Xiao, X. Zhang, Analysis of multimedia workloads with implications for Internet streaming, in: WWW’05: Proceedings of the 14th International Conference on World Wide Web, ACM, New York, NY, USA, 2005, pp. 519–528. [14] M. Harchol-Balter, M.E. Crovella, C.D. Murta, On choosing a task assignment policy for a distributed server system, J. Parallel Distributed Comput. 59 (2) (1999) 204–228. [15] M. Harchol-Balter, B. Schroeder, N. Bansal, M. Agrawal, Size-based scheduling to improve web performance, ACM Trans. Comput. Syst. 21 (2) (2003) 207– 233. [16] C. Huang, J. Li, K.W. Ross, Can internet video-on-demand be profitable? in: SIGCOMM’07: Proceedings of the 2007 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, ACM, New York, NY, USA, 2007, pp. 133–144. [17] D. Kawamoto, NBC, News Corp. Push New Web rival to YouTube, ZDNet News, March 2007. [18] T. O’Reilly, What is web 2.0: Design patterns and business models for the next generation of software. Available from: , September 2005. [19] V. Paxson, S. Floyd, Wide-area traffic: the failure of poisson modeling, SIGCOMM Comput. Commun. Rev. 24 (4) (1994) 257–268. [20] M.J. Swain, D.H. Ballard, Color indexing, Int. J. Comput. Vis. 7 (1) (1991) 11–32. [21] W. Tang, Y. Fu, L. Cherkasova, A. Vahdat, Modeling and generating realistic streaming server workloads, in: Computer Networks: The International Journal of Computer and Telecommunications Networking Archive, January 2007, pp. 336–356. [22] W. Whitt, Approximations for the GI/G/m queue, Prod. Operations Manag. 2 (2) (1993) 114–161. [23] W. Whitt, Partitioning customers into service groups, Manag. Sci. 45 (11) (1999) 1579–1592. [24] R.W. Wolff, Stochastic Modeling and Theory of Queues, Prentice Hall, 1989. [25] H. Yu, D. Zheng, B.Y. Zhao, W. Zheng, Understanding user behavior in largescale video-on-demand systems, in: Proceedings of the 2006 EuroSys Conference, Leuven, Belgium, April 2006. [26] Q. Zhang, W. Sun, Workload-aware load balancing for clustered web servers, IEEE Trans. Parallel Distrib. Syst. 16 (3) (2005) 219–233.