Building Cloud-ready Video Transcoding System for Content Delivery Networks (CDNs)

Zhenyun Zhuang∗ and Chun Guo†

∗College of Computing, Georgia Institute of Technology, Atlanta, GA 30332, USA. Email: [email protected]
†Ying-Da-Ji Technologies, Southern District of Hi-Tech Park, Shenzhen 518057, China

Abstract—Video streaming traffic, both VoD (Video on Demand) and Live, is exploding. Businesses and individuals increasingly rely on video streaming to attract customers/users and for other purposes. Given the vast number of video stream formats (e.g., MP4, FLV) and transmission protocols (e.g., HTTP, RTMP, RTSP) needed to support varying types of playback terminals (particularly mobile devices such as iPhone/iPad and Android phones), video content providers often need to transcode videos to multiple formats in order to stream to different types of users. Being time-sensitive and requiring high bandwidth, video streaming exerts high pressure on underlying delivery networks. Content Delivery Network (CDN) providers can help their customers quickly and reliably distribute stream contents to end users. In addition to distributing video streams, CDN providers typically allow their customers to perform video transcoding on CDN platforms. With the high volume of video streams and the bursty transcoding workload, CDN providers are eager to deploy elastic and optimized cloud-based transcoding platforms. In this paper, we design and build such a transcoding platform.

I. INTRODUCTION

Video streaming traffic, both VoD (Video on Demand) and Live, is exploding. Video traffic has been estimated to account for more than 50% of Internet traffic in the United States, and it is likely to be the No. 1 Internet traffic type globally. Many businesses and people are relying on video streams to attract customers/users and for other purposes. Mobile users, equipped with various types of mobile devices such as smartphones, are increasingly adopting video streams for entertainment and social life. Given the vast number of video formats (e.g., MP4, FLV) and transmission protocols (e.g., HTTP, RTMP, RTSP) for supporting varying types of playback terminals (e.g., mobile devices like iPhone/iPad and Android phones), video content providers often need to transcode videos to multiple formats for streaming to more users.

Being time-sensitive and requiring high bandwidth, Internet video streaming exerts high pressure on the underlying content delivery networks. Facing the requirements of heavy bandwidth and strict timeliness, more and more video content providers (e.g., companies and end users) turn to Content Delivery Networks (CDNs) [1], [2] for a more effective delivery and streaming solution. CDN providers can help their customers quickly stream their video contents to end users while maintaining improved streaming quality. In addition to streaming videos, CDN providers typically allow their customers to perform video transcoding on their CDN platforms.

Video transcoding is extremely computation-intensive. With the high volume of online video streams and the bursty transcoding load, CDN providers have a strong need to deploy an elastic and optimized cloud-based transcoding platform. With the unique characteristics of streaming videos on CDNs and the challenging requirements of transcoding operations on today’s Internet, a transcoding platform has to consider multiple factors. For example, videos are increasingly being transcoded to multiple bit-rates to accommodate the dynamic nature of networks. In other words, streaming videos on CDNs is oftentimes coupled with transcoding systems. In fact, most CDN providers have implemented their own transcoding systems to meet such requirements. In this paper, we design and build such a transcoding platform, which harnesses the power of cloud computing and content delivery networks.

In this work, we consider using cloud-based techniques to address the aforementioned issues effectively. Specifically, we built a cloud-ready enterprise transcoding system which works seamlessly with CDN infrastructure. Referred to as C3 (Cloud-ready transCoding with CDNs), the system consists of the following components for efficient transcoding on CDNs: (i) Ingesting Cloud, which provides low response time and increased capacity for customers who upload videos for transcoding; and (ii) Transcoding Cloud, which dynamically allocates/de-allocates transcoding nodes to accommodate bursty workloads and chooses the optimal transcoding node to perform transcoding.

In the following, we first provide some background information in Section II and then motivate our design of C3 in Section III. We then present the detailed design of C3 in Section IV. We perform prototype-based evaluation and show the results in Section V. We also present related work in Section VI and conclude the work in Section VII.

II. BACKGROUND

We now provide some background information about CDN, video streaming, transcoding, as well as cloud computing.

CDN. A typical CDN infrastructure consists of Ingesting Servers (for accepting customer contents), Origin Servers (for serving edge servers) and Edge Servers (for serving end users directly), forming a layered structure. Depending on the scale of a CDN, the number of servers varies from several to hundreds or even thousands.

Fig. 1. Cloud-ready transcoding system for CDNs (components shown: Ingesting Cloud with Ingest Servers, Transcoding Cloud with a Transcoding Manager and Transcoding Workers, and Delivery Cloud with Delivery Servers; Video Uploaders ingest videos and Video Downloaders view them).

CDN-assisted video streaming. Video streaming has recently become popular, and it is increasingly being delivered to web users with CDNs. CDN providers have enhanced and dedicated networks to support video stream transmission. They often have a fleet of streaming servers distributed across the globe so that end users worldwide can receive better service from a closer streaming server. Before a VoD video can be viewed by end users, the carefully formatted video first needs to be copied to such streaming servers for streaming.

Video transcoding. Though some customers may prepare streaming-ready videos and streams by themselves, other customers do not have the transcoding capability and thus rely on CDN providers to transcode their original videos. Given an original video, customers often need to convert the video into multiple output videos, each with a different format and screen size. Video transcoding is very computing-intensive due to the complex encoding/decoding process.

Today’s video playback devices, particularly mobile ones, are highly diversified. In addition to conventional desktops and laptops, mobile devices such as tablets (e.g., iPad) and smart phones (e.g., iPhone and Android phones) are quickly gaining popularity. Each of these devices has a particular screen size (i.e., pixels), prefers a different video format (e.g., MP4), and supports only certain transmission protocols (e.g., HTTP, RTMP). Moreover, a recent streaming advancement for better playback experience is multi-level streaming, where multiple versions (each with a different bit-rate and quality) of the same video are provided, and the users’ devices intelligently play the most suitable version depending on their available bandwidth or device capabilities. All these properties require the transcoding service to convert videos to multiple formats. It is not uncommon for a video to be transcoded to more than 100 individual output videos in today’s typical usage scenarios.
Cloud computing. Cloud computing provides elastic (or resizable) compute capability. Amazon EC2 [3], for instance, allows customers to dynamically allocate and de-allocate compute resources to accommodate their computing requirements. Cloud computing best fits computing jobs that have bursty requirements. By leveraging the power of cloud computing, a service provider can better adapt to changing needs for better performance and reduced cost.

III. MOTIVATION

The intensive computation requirements of transcoding systems, coupled with unique features of underlying CDNs, give rise to various challenges. First, as the first step towards delivery, videos/streams need to be uploaded/ingested to CDNs before being transcoded. With multiple available ingesting servers typically provided by CDNs, there is a need to select the optimal ingesting server.

Second, CDNs (and customers) may handle large volumes of transcoding jobs. For instance, YouTube transcodes hundreds of thousands of videos each day. Also, based on our study of the incoming transcoding load, jobs arrive in bursts. Given such characteristics, the transcoding system needs to be scalable. It is desirable to build an elastic transcoding system that can automatically adjust its capacity according to the job properties. With such requirements, there are questions of how to scale the system with multiple transcoding nodes and how to choose a transcoding node for a particular job.

Third, CDNs typically allow customers to do region-based provisioning. In other words, the customers can control which regions (e.g., US, Asia) the video can be delivered to. Because of this, the physical locations of the chosen transcoding servers can lead to different bandwidth and time costs.

Given such challenges, we believe an enterprise-level transcoding system that is coupled with CDN infrastructure is urgently needed. In the following, we further elaborate on several technical aspects that motivate our design.

A. Video/stream Ingesting

Video/stream ingesting consists of both VoD video uploading and Live stream ingesting. Customers have to ingest their videos/streams to CDN ingesting servers before the videos can be transcoded and streamed.
Though CDNs have traditionally been designed to deliver videos to end users from closer CDN servers, the ingesting component has attracted less attention and thus is less optimized. CDN providers typically have multiple servers in different locations and with different capabilities (i.e., bandwidth, CPU, load, etc.). However, for most CDN providers, only a single ingesting server is used to serve a particular customer. Due to the potentially large video sizes, using only a single server could result in a very long ingesting time. It is desirable to allow multiple ingesting servers to work in tandem to accept customers’ videos/streams.

B. Diversified Transcoding Jobs

After videos/streams are ingested into the CDN, the transcoding system takes control and converts them to various video/streaming formats. Enterprise transcoding systems typically have multiple transcoding nodes, and some form of load balancing is desired. It is important to note that a simple balancing algorithm that randomly assigns jobs to transcoding nodes will not work well if the incoming transcoding jobs vary in the compute resources they require.

We collected a VoD transcoding trace from a CDN provider. The trace consists of 251 raw videos. For privacy reasons, we re-scaled the job submission time. We analyzed the sizes and playback durations of these videos and plot the results in Figure 2. In Figure 2(a) we plot the file sizes (log scale) for all the 251 videos, and in Figure 2(b) we plot the playback durations (log scale). As we see, the videos have very diversified values on these two properties. Specifically, though most videos are smaller than 100 MB, some videos are larger than 6 GB.

Fig. 2. Video Diversity (File Size and Playback Duration): (a) File Size (log scale) versus File ID; (b) Playback Duration (log scale) versus File ID.

These results suggest that a simple scheduling (balancing) algorithm does not work well. A desired balancing algorithm needs to be able to meter the workload associated with each transcoding job and has to consider at least the following factors: workload and node capabilities. The capabilities can be easily captured by, for example, CPU benchmarking, and the workload needs to consider the special characteristics of CDN delivery.
Specifically, depending on the customer, a video can be converted to multiple output formats (e.g., Silverlight and Flash), each of which requires a different level of processing power. Moreover, multi-bitrate transcoding as required by dynamic streaming solutions further complicates the workload metering. Since customers may have various levels of multi-bitrate transcoding, the metering also has to consider this information.

With CDN-assisted video streaming, a transcoding balancing algorithm needs to consider another factor: moving videos to target streaming servers. As CDN operators typically charge a customer based on the physical locations and the number of the streaming servers provisioned, customers typically only want to provision a subset of all available streaming servers on a CDN system to cover certain regions. For VoD streaming, after transcoding, the converted videos need to be moved to provisioned streaming servers as determined by the customer profiles. For live streaming, the transcoded streams need to be continuously transmitted to the provisioned live streaming servers. Because of this, the choice of transcoding node affects two further things: bandwidth cost and moving latency.

C. Bursty Transcoding Load

Videos/streams that are going to be transcoded may arrive in bursts. In other words, during some periods only a few videos arrive, while during other periods many more do. The bursty behavior can be caused by multiple reasons such as certain video-related events (e.g., the Olympics). In Figure 4 we plot the distribution of the arrival (creation) times of the 251 videos. We see that there are some busy periods during which many more videos arrive. With such bursty behavior, a transcoding system needs to be able to dynamically allocate/de-allocate transcoding nodes for the sake of both cost and job processing time. With such a system, at idle time, some transcoding servers can be turned off to save cost, while at busy time, more transcoding servers should be turned on to reduce the processing time.

Fig. 4. Bursty Behavior of Transcoding Jobs (number of videos versus creation time).

Fig. 3. Software Components

IV. DESIGN

We now present the design of C3. We first give a high-level design overview, followed by the software architecture and the two major components of C3.

A. Design Overview

The design rationale of C3 is to tightly couple the transcoding process with the workflow of the CDN infrastructure to gain maximum performance benefit. Such a tightly coupled design is reflected in the major components of C3: Ingesting Cloud and Transcoding Cloud.

With C3, the typical life cycle of a customer-provided VoD video is as follows. First, the video is moved into the CDN infrastructure by being uploaded to CDN ingesting servers. We refer to the C3 component that handles this process as Ingesting Cloud. For live streams, the ingesting process is similar; but instead of being uploaded as video files, live streams are continuously ingested into the ingesting servers. For simplicity, in the following presentation, we use VoD examples to illustrate the designs, unless we explicitly mention live videos.

Second, after a video is uploaded into CDN servers, C3 invokes the second component, Transcoding Cloud, to perform the transcoding based on the customer’s profile. A customer’s profile contains information including the video output formats, output screen size, number of multi-bitrate levels, provisioned target streaming servers, and so on.

B. Software Architecture

The software architecture of C3 is shown in Figure 3. Customer videos are uploaded to an optimal ingesting server, which is determined by a specialized DNS component. Often referred to as GeoDNS, the specialized DNS component is a standard component in CDN infrastructure. GeoDNS answers a DNS request by providing a server that is geographically closer to the requesting client. GeoDNS ideally will return the ingesting server that has the following properties: (1) having the highest uploading bandwidth from the customer video source; (2) having the lowest response time; and (3) not being overloaded.

To deal with the bursty behavior of incoming transcoding tasks, C3 can dynamically spin up/down transcoding servers to handle the transcoding more smoothly. Specifically, when the ongoing transcoding load is heavy, more servers are added to the transcoding server pool; otherwise, with a light transcoding load, fewer servers are kept in the pool. Since each transcoding job has different properties including playback duration and number of output formats, C3 also balances the jobs assigned to each transcoding server to avoid overloading any server and to minimize transcoding time. The ongoing state of the transcoding cloud is recorded by the transcoding state component. Finally, based on customers’ provisioning state (i.e., how many and which video streaming servers need to have which output videos), the output videos are moved to the corresponding streaming servers.

Fig. 5. C3 Ingesting Cloud (a video is split into parts, shown as Part I and Part II, which are ingested separately and reassembled by the Ingesting Video Aggregator).

C. C3 Components

We now focus on the major components of C3: Ingesting Cloud and Transcoding Cloud.

1) Ingesting Cloud: The ingesting cloud component of C3 relies on the GeoDNS service, but it is designed to use multiple ingesting servers at the same time. Ingesting Cloud consists of two successive steps: determining the ingesting server set and uploading videos.

First, based on the currently available ingesting servers and the size of the videos, Ingesting Cloud chooses a set of ingesting servers that will accept the incoming videos.
Clearly, the larger the ingesting server set, the less time is taken to ingest the videos. The ingesting server set should be big enough to meet the predefined response time constraint. Specifically, if the total video size is $V$, it takes time $T_1$ for one server to upload video data of size $V_1$, and the predefined response time constraint is $T_c$, then each server uploads at rate $V_1 / T_1$, so $N$ servers can ingest $N V_1 T_c / T_1$ within $T_c$. The selected number of ingesting servers $N$ should therefore satisfy $N \geq \frac{V T_1}{V_1 T_c}$.
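As an illustration of the sizing rule above, the following minimal Python sketch computes the smallest integer number of ingesting servers that satisfies the bound. The function name `min_ingest_servers` and the example numbers are hypothetical and are not taken from the C3 prototype. The sketch also assumes that all servers share the same measured upload rate and that splitting adds no overhead, which the evaluation suggests is optimistic for larger server sets.

```python
import math


def min_ingest_servers(total_size, sample_size, sample_time, deadline):
    """Smallest integer N with N >= (V * T1) / (V1 * Tc).

    total_size  -- V, total size of the videos to ingest
    sample_size -- V1, amount of data one server uploads in sample_time
    sample_time -- T1, time one server needs to upload sample_size
    deadline    -- Tc, predefined response time constraint
    Sizes must share one unit and times must share one unit.
    """
    if min(total_size, sample_size, sample_time, deadline) <= 0:
        raise ValueError("all arguments must be positive")
    bound = (total_size * sample_time) / (sample_size * deadline)
    # Assumption: at least one ingesting server is always used.
    return max(1, math.ceil(bound))


if __name__ == "__main__":
    # Hypothetical numbers: 200 MB in total, one server uploads 10 MB in
    # 3 s, and the constraint is 25 s, so the bound is 600 / 250 = 2.4.
    print(min_ingest_servers(200, 10, 3, 25))
```

Because the bound is rounded up, the result is a lower limit on the server set; a deployment that observes splitting and aggregation overhead may want to add headroom on top of it.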

Second, the incoming videos are uploaded to the ingesting servers. If an incoming video is so big that ingesting it into a single server would take an unacceptable amount of time, it is split into pieces and each piece is ingested separately. All the video pieces that belong to the same video are pieced together by an ingesting aggregator inside the ingesting cloud, as shown in Figure 5. Specifically, if the maximum video size that fulfills the ingesting time constraint is $V_m$, and the video size is $V$, then $\lceil V / V_m \rceil$ ingesting servers are needed.

2) Transcoding Cloud: One of the key features of C3 is that the number of transcoding servers that are actively transcoding is dynamically adjusted to the current transcoding load. Assuming each server has a transcoding capability of $R_1$, and the current transcoding load requirement is $R_c$, then at least $\lceil R_c / R_1 \rceil$ transcoding servers should be spun up. If the servers have different capabilities (e.g., CPU frequency), then the number of servers can be decided by taking the specific capabilities into account. The more challenging part, however, is how to determine the transcoding load. Since transcoding workloads oftentimes exhibit bursty behavior, an appropriate prediction algorithm is needed to accommodate sudden increases/decreases of transcoding load. Since prediction algorithms are covered by many research works, we do not elaborate on this part here.

Given a set of available active transcoding servers and an incoming video that waits to be transcoded, C3 needs to decide which transcoding server to assign this video to. C3 always assigns an incoming job to the server with the lowest transcoding load. The transcoding load itself can be measured in multiple ways. A naive measurement would be to only consider the number of currently queued videos on each server. Due to different video sizes and transcoding requirements, such a naive measurement does not suffice.
A more accurate measurement is to also consider the video size. In C3, the load is determined by three factors: video size, video playback duration, and number of output formats. In addition, it is desirable to assign a transcoding task to the transcoding server that is closer to the provisioned target streaming servers if all other factors are the same. Doing so has two benefits: less transmission time and lower bandwidth cost. In other words, if the transcoding servers all have similar loads, then C3 assigns the job to the server that is closest to the provisioned data centers.
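The Transcoding Cloud logic described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the prototype's implementation. The text names the three load factors but does not say how they are combined, so the weighted formula in `job_load` is hypothetical. The `distance` field, the `tolerance` parameter, and all class and function names are likewise assumptions introduced for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Job:
    size_mb: float      # video size
    duration_s: float   # playback duration
    num_outputs: int    # number of output formats


@dataclass
class Server:
    name: str
    queued_load: float = 0.0
    # Hypothetical measure of how far the server is from the customer's
    # provisioned target streaming servers (smaller means closer).
    distance: float = 0.0


def servers_needed(required_load, per_server_capability):
    """At least ceil(Rc / R1) servers; assume one server is always active."""
    return max(1, math.ceil(required_load / per_server_capability))


def job_load(job, w_size=1.0, w_duration=1.0):
    """Hypothetical load metric built from the three factors in the text."""
    per_output = w_size * job.size_mb + w_duration * job.duration_s
    return per_output * job.num_outputs


def assign(job, servers, tolerance=0.0):
    """Pick the least-loaded server; break near-ties by proximity."""
    if not servers:
        raise ValueError("no active transcoding servers")
    lowest = min(s.queued_load for s in servers)
    # Servers whose load is within `tolerance` of the minimum count as
    # similarly loaded, so the closest of them gets the job.
    candidates = [s for s in servers if s.queued_load - lowest <= tolerance]
    chosen = min(candidates, key=lambda s: s.distance)
    chosen.queued_load += job_load(job)
    return chosen
```

For example, with servers `a` and `b` both at load 100 and `b` closer to the provisioned streaming servers, `assign` returns `b`, while a more heavily loaded but even closer server `c` is not considered. Raising `tolerance` widens what counts as a similar load and therefore gives proximity more influence.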

V. EVALUATION

We built a prototype with Microsoft Expression Encoder 4 (EE4) [4]. EE4 can encode videos for various types of devices and Silverlight videos for the web with customizable templates (i.e., presets). The core of the prototype consists of a transcoding manager and a pool of transcoding servers, which periodically send heart-beat messages to the manager. After obtaining the complete source videos, the transcoding manager examines the currently active transcoding servers. Based on a particular job-assignment algorithm, the transcoding manager determines the least loaded server and assigns the job to it.

The prototype testbed involves 7 ingesting servers, 4 transcoding servers, and 6 target streaming servers. The servers are located in seven different cities. For ease of evaluation, we consider a VoD scenario to quantify the savings on various performance metrics as experienced by both end viewers and CDN providers.

A. Ingesting Cloud

We first evaluate the performance of the Ingesting Cloud. Starting from a single ingesting server, we gradually add more ingesting servers into the cloud. We use a 200 MB video file and upload it to the ingesting servers. As shown in Figure 6, the ingesting time decreases as more ingesting servers are used. Since ingesting time is one of the critical factors that affect the performance experienced by the customer, allocating more ingesting servers improves ingesting performance. Specifically, simply increasing the number of ingesting servers from 1 to 2 reduces the ingesting time by more than 35% (from 53.65 to 34.69). However, as shown in the figure, increasing the number of servers beyond 5 yields little further improvement. This is due to the overhead associated with video splitting (on the sending side) and aggregating (on the receiving side).

Fig. 6. Impact of Ingesting Cloud. Ingesting time by number of ingest servers: 1 server, 53.65; 2 servers, 34.69; 3 servers, 29.91; 4 servers, 24.32; 5 servers, 19.06; 6 servers, 18.22; 7 servers, 17.56.

B. Transcoding Cloud

In evaluating the Transcoding Cloud, we use the video trace described in Section III. The transcoding cloud always has at least one transcoding server actively serving customers. Depending on the transcoding workload, C3 may spin up more transcoding servers to handle the jobs. We focus on evaluating the impact of the job-assignment algorithm that assigns transcoding jobs to the active servers.

We consider four job-assignment algorithms in total. The first one is the no-balancing scenario, where incoming jobs are randomly assigned to the transcoding servers, irrespective of the servers’ current workloads. The second algorithm, which we term job-based, is based on the number of jobs currently queued on the transcoding servers, and a new job is always assigned to the server with the lowest load. The third algorithm, preset-based, is based on the number of presets (i.e., the number of output videos), and the server with the fewest queued presets is chosen to handle a new job. Finally, the last algorithm is based on playback duration; it considers not only the number of presets, but also the playback duration for each preset. The last algorithm works at the finest granularity, and we name it playback-based.

For all the 251 videos, we plot the transcoding time for all four algorithms in Figure 7. As we see, though many videos see similar transcoding times under all the algorithms, for other videos the fourth algorithm (i.e., playback-based) achieves much lower transcoding time than the other three. Specifically, the average transcoding time is 1574 seconds for no-balancing, 1556 seconds for job-based, 1211 seconds for preset-based, and 979 seconds for playback-based. The playback-based algorithm thus has about 38% less transcoding time than the no-balancing algorithm ((1574 − 979)/1574 ≈ 37.8%).

Fig. 7. Transcoding Time. Time in seconds (log scale) versus Video ID for the four schemes: No Balancing, Balancing based on jobs, Balancing based on presets, and Balancing based on playback duration.

VI. RELATED WORK

CDNs have been deployed by various providers to expedite web access [1], [2]. The techniques used by CDNs for delivering conventional web contents are explained in related writings [5], [6]. Despite the popularity and pervasiveness of CDNs, many research problems and system-building challenges persist for optimizing CDN infrastructure and addressing challenges associated with supporting various types of applications such as video streaming.

Various aspects of video transcoding and streaming have been studied and analyzed [7]–[10]. The unique properties of video streaming, coupled with CDN-assisted delivery, justify a specialized design of CDN infrastructure that specifically serves video streaming and saves CDN transit cost. To the best of our knowledge, this work is the first to consider and address the problem of optimizing transcoding and streaming with CDNs.

Cloud computing, with the promise of better supporting various applications, has attracted attention from both the research and industry communities [11], [12]. There are opportunities to consider and address the challenges associated with multimedia in the context of cloud computing [13], [14].
Encoding.com [15], the world’s largest video encoding service, provides video transcoding service for more than 1500 companies. Unlike these relevant works, our work is the first to consider optimizing video transcoding and streaming by harnessing the power of the cloud computing model and CDN infrastructure.

VII. CONCLUSION

In this work, we propose a cloud-based video transcoding system for CDNs. The design considers the characteristics of CDNs and transcoding requirements and is able to dynamically adjust the transcoding tasks to reduce operation cost. We also build a prototype for evaluating the proposed design.

REFERENCES

[1] “Akamai technologies,” http://www.akamai.com/.
[2] “Level 3 communications, llc.” http://www.level3.com/.
[3] “Amazon elastic compute cloud (amazon ec2),” Amazon, http://aws.amazon.com/ec2.
[4] “Expression encoder 4,” http://www.microsoft.com/expression/products/encoder4_overview.aspx.
[5] K. Park, W. W. (editors), H. Kung, and C. Wu, “Content networks: Taxonomy and new approaches,” 2002.
[6] D. C. Verma, S. Calo, and K. Amiri, “Policy based management of content distribution networks,” IEEE Network Magazine, vol. 16, pp. 34–39, 2002.
[7] N. Björk and C. Christopoulos, “Video transcoding for universal multimedia access,” in Proceedings of the 2000 ACM workshops on Multimedia, ser. MULTIMEDIA, 2000.
[8] Z. Zhuang and C. Guo, “Optimizing cdn infrastructure for live streaming with constrained server chaining,” in Proceedings of the 2011 IEEE 9th International Symposium on Parallel and Distributed Processing with Applications, ser. ISPA ’11, 2011.
[9] K. Sripanidkulchai, B. Maggs, and H. Zhang, “An analysis of live streaming workloads on the internet,” in Proceedings of the 4th ACM SIGCOMM conference on Internet measurement, ser. IMC ’04, 2004.
[10] J. He, A. Chaintreau, and C. Diot, “A performance evaluation of scalable live video streaming with nano data centers,” Comput. Netw., vol. 53, pp. 153–167, February 2009.
[11] L. Ramakrishnan, K. R. Jackson, S. Canon, S. Cholia, and J. Shalf, “Defining future platform requirements for e-science clouds,” in Proceedings of the 1st ACM symposium on Cloud computing, ser. SoCC, 2010.
[12] T. Karagiannis, C. Gkantsidis, D. Narayanan, and A. Rowstron, “Hermes: clustering users in large-scale e-mail services,” in Proceedings of the 1st ACM symposium on Cloud computing, ser. SoCC, 2010.
[13] W. Zhu, C. Luo, W. Jianfeng, and L. Shipeng, “Multimedia cloud computing,” IEEE Signal Processing Magazine, vol. 28, pp. 59–69, May 2011.
[14] E. Korotich and N. Samaan, “A novel architecture for efficient management of multimedia-service clouds,” in Proceedings of the GLOBECOM Workshops (GC Wkshps), 2011 IEEE, 2011.
[15] “Encoding.com,” http://www.encoding.com/.
