Optimization of Transcoding in Delivery of Multimedia Content
Daniel Hofman
Faculty of Electrical Engineering and Computing
University of Zagreb, Croatia
Abstract—Multimedia content on the Internet has become increasingly important during the last decade, and user-generated videos have become very popular. Every minute, 24 hours of video are uploaded to YouTube, and every day 2 billion videos are viewed [1]. All these videos need to be converted into an appropriate format for their viewers. Viewers watch videos on diverse devices, from small mobile phones and smartphones to laptops, desktop computers, and high-definition displays. This diversity requires many different versions of each video to be available to viewers. Real-time (transactional) transcoding is proposed as a solution to this problem. In addition, transcoding must be optimized for multi-core processors to speed up the transcoding process.
Keywords—transcoding; multi-core; transactional transcoding; streaming; multimedia; long tail
I. INTRODUCTION
The amount of video content on the Internet has grown rapidly during the last decade. Video services such as YouTube have enabled users to create their own content and upload it to online video services. In 2007, more than 500,000 pieces of user-generated content were uploaded daily to those services [2]. This has created a huge amount of multimedia content that must be available to users on the Internet, and all of it has to be transformed into the right format for each user. For example, a user on a desktop computer may want to watch a high-definition video, while a user watching on a mobile phone with a small screen needs a video optimized for a low-resolution display and limited network bandwidth. This means that every video must be transcoded into the right resolution, frame rate, bit rate, and codec.
II. OVERVIEW OF THE INTERNET MULTIMEDIA SYSTEM
A multimedia system starts with an original video, recorded by a camera or created on a computer, that is encoded into a compressed format. The video is then uploaded to an Internet server, where the transcoding process begins. When a user requests the video, the server transcodes it into the desired format and sends it to the user. Transcoding produces the desired resolution, bit rate, frame rate, and codec, and performs any other specific processing, such as adding a logo or a commercial to the video. This process requires high computational power and high data throughput. During transmission, the available network bandwidth is limited and transmission errors can occur. Some techniques [3] allow the video to be recovered from errors that occur during transmission or on the end-user side. End users have many different kinds of devices, which, depending on the device, are limited in processing power, display resolution, available codecs, and so on. Transcoding is used in various multimedia systems, such as conferencing, telemedicine, education, and military systems. In these systems, audio and video need to be delivered smoothly, without interruptions, and the quality of experience must meet the level prescribed for that system.
Figure 1. Overview of the multimedia system on the Internet with main characteristics of every part
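The per-request step described above, choosing a resolution, codec, and bit rate that fit one particular device, can be sketched as follows. This is a minimal illustration under stated assumptions: the `DeviceProfile` and `TargetFormat` types, the codec names, and the rule of using at most 80% of the reported bandwidth are hypothetical choices made for this example, not part of any system described in this paper.

```python
from dataclasses import dataclass

@dataclass
class DeviceProfile:
    # Hypothetical capabilities reported by the end-user device.
    max_width: int
    max_height: int
    supported_codecs: list[str]  # in the device's order of preference
    bandwidth_kbps: int

@dataclass
class TargetFormat:
    width: int
    height: int
    codec: str
    bitrate_kbps: int

def choose_target(src_width: int, src_height: int, src_bitrate_kbps: int,
                  server_codecs: set[str], device: DeviceProfile,
                  bandwidth_share: float = 0.8) -> TargetFormat:
    # Fit the source into the display, keeping the aspect ratio and never upscaling.
    scale = min(1.0, device.max_width / src_width, device.max_height / src_height)
    # Round down to even dimensions, which encoders commonly require for 4:2:0 video.
    width = int(src_width * scale) // 2 * 2
    height = int(src_height * scale) // 2 * 2
    # First codec the device prefers that the server can also encode.
    codec = next((c for c in device.supported_codecs if c in server_codecs), None)
    if codec is None:
        raise ValueError("no codec supported by both server and device")
    # Leave headroom below the reported bandwidth and never exceed the source bit rate.
    bitrate = min(src_bitrate_kbps, int(device.bandwidth_kbps * bandwidth_share))
    return TargetFormat(width, height, codec, bitrate)
```

For instance, a 1920x1080, 8000 kbit/s source requested by a hypothetical phone reporting a 480x320 display, H.264 support, and 1000 kbit/s of bandwidth would be transcoded to 480x270 at 800 kbit/s.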
III. QUALITY OF EXPERIENCE
Quality of experience (QoE) is used to measure the real efficiency of transcoding algorithms, and of video coding algorithms in general. User experience depends on both the video and the audio quality of the streamed content. Depending on the content type (news shows, sports, movies, etc.), people react differently to the balance between video and audio quality. For example, lower video quality is acceptable in a news show if the audio quality is good, whereas lower audio quality with good video may be acceptable for some sports events [12]. Delay is another important factor that must be planned carefully so that it does not degrade QoE. After requesting a stream from a server and declaring its capabilities, the end user has to wait for the transcoding process to begin and for the content to be transmitted to the device. Perceptual tricks, such as displaying the channel logo, can make the wait seem shorter, although the actual delay is unchanged [8]. Transcoding algorithms should be able to provide a lower-bandwidth video when the available network bandwidth degrades. Instead of stopping, the video then continues to play at a lower quality. Unfortunately, this approach does not help if the bit error rate or jitter is high. QoE has been studied for standard devices such as desktop computers, video consoles, and televisions, but further research is needed to understand how mobile devices differ.
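Falling back to a lower-bandwidth version when throughput degrades can be sketched as choosing the highest rung of a bit-rate ladder that fits a smoothed bandwidth estimate. This is a hypothetical illustration: the ladder values, the smoothing weight `alpha`, and the safety `margin` are assumptions, and the sketch does nothing about a high bit error rate or jitter, which this approach cannot fix.

```python
# Hypothetical bit-rate ladder in kbit/s, lowest to highest.
LADDER_KBPS = [300, 700, 1500, 3000, 6000]

class BitrateSelector:
    """Picks a ladder rung from a smoothed throughput estimate (illustrative only)."""

    def __init__(self, ladder=LADDER_KBPS, alpha=0.3, margin=0.8):
        self.ladder = sorted(ladder)
        self.alpha = alpha    # weight given to the newest throughput sample
        self.margin = margin  # fraction of the estimate we are willing to use
        self.estimate = None

    def observe(self, throughput_kbps: float) -> None:
        # Exponentially weighted moving average smooths out short bandwidth spikes.
        if self.estimate is None:
            self.estimate = throughput_kbps
        else:
            self.estimate = self.alpha * throughput_kbps + (1 - self.alpha) * self.estimate

    def select(self) -> int:
        if self.estimate is None:
            return self.ladder[0]  # start conservatively before any measurement
        budget = self.estimate * self.margin
        # Highest rung that fits the budget; otherwise fall back to the lowest rung,
        # so playback continues at reduced quality instead of stalling.
        fitting = [rung for rung in self.ladder if rung <= budget]
        return fitting[-1] if fitting else self.ladder[0]
```

For example, after throughput samples of 5000, 1000, and 500 kbit/s the selector steps down from 3000 to 1500 kbit/s rather than stopping playback; the moving average deliberately delays the reaction to a single bad sample.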
IV. LONG TAIL
As more and more users spend a significant amount of time watching multimedia content on the Internet, the transcoding process becomes more important. People not only watch videos; they also upload a huge number of videos every day.
Some multimedia content is watched frequently, while other content is watched rarely and only by a few people. This effect is called the “long tail”. The term “the long tail” was introduced by Chris Anderson in Wired magazine in 2004 and was later popularized further in his book [4].
In the long-tail graph (Figure 2), the X axis shows the percentage of video content and the Y axis shows the number of requests for those videos. A small number of videos are viewed frequently and have many total views. Moving to the right, the number of views falls. The number of videos with a small view count is much greater than the number of videos with a high view count, so this long tail (the yellow area in the graph) takes a large share of the total number of video views.
Figure 2. Long tail (X axis: % of video content; Y axis: video views; the head and the tail of the curve are marked)
V. TRANSCODING IN GENERAL
Transcoding is the process of decoding a video from one format, usually into an uncompressed format, and encoding it into the desired format. For example, during transcoding an MPEG-2 video would be decoded into raw (uncompressed) frames and then encoded into H.264. This is a computationally intensive process for the server processor [5]. The most intensive part of transcoding is motion estimation, because it has to find how blocks of the picture move from one frame to the next.
A. Types of transcoding
Video services have traditionally used pretranscoding: they transcoded the original video in advance and stored it in one or more formats that were later streamed to end users. Recently, the number of different stream formats requested by users has been growing with the number of new devices that have different capabilities (in terms of resolution, processing power, and network bandwidth). This trend increases the number of different files needed for every single piece of multimedia content. More transcoded files require more processing power for transcoding and more storage space for storing them. Even though storage is regarded as a low-cost resource, this increase in space is significant and calls for new ways of transcoding. IDC proposes a different approach called transactional transcoding [9]. In transactional transcoding, multimedia content is transcoded only when there is a need for it. The content is kept only in its original format, or in a format that is more suitable for transcoding. Transcoding starts when a request arrives from an end user, who can specify the format of the multimedia content he or she needs.
B. List of coding algorithms
The most widely used video coding algorithms are the MPEG-4 Part 2 codec, the H.264/MPEG-4 AVC codec, and Microsoft codecs (WMV and MS MPEG-4v3). The evolution of video compression standards is shown in Table I.
TABLE I.
HISTORY OF VIDEO COMPRESSION STANDARDS [14]
Year | Standard | Publisher | Popular Implementations
1984 | H.120 | ITU-T |
1990 | H.261 | ITU-T | Videoconferencing, Video-telephony
1993 | MPEG-1 Part 2 | ISO, IEC | Video-CD
1995 | H.262/MPEG-2 Part 2 | ISO, IEC, ITU-T | DVD Video, Blu-ray, Digital Video Broadcasting, SVCD
1996 | H.263 | ITU-T | Videoconferencing, Videotelephony, Video on Mobile Phones (3GP)
1999 | MPEG-4 Part 2 | ISO, IEC | Video on Internet (DivX, Xvid)
2003 | H.264/MPEG-4 AVC | ISO, IEC, ITU-T | Blu-ray, Digital Video Broadcasting, iPod Video, HD DVD
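The contrast drawn in subsection A between pretranscoding and transactional transcoding can be sketched as two storage strategies. Both classes are hypothetical: `transcode` stands in for a real encoder, formats are plain strings, and the small cache of recently produced versions is an assumption added here. The paper itself only states that the original is kept and that transcoding starts when a request arrives.

```python
from collections import OrderedDict
from typing import Callable

# Hypothetical encoder interface: (original bytes, target format name) -> encoded bytes.
Transcoder = Callable[[bytes, str], bytes]

class PretranscodingStore:
    """Encodes every uploaded video into every supported format up front."""

    def __init__(self, formats: list[str], transcode: Transcoder):
        self.formats = formats
        self.transcode = transcode
        self.files: dict[tuple[str, str], bytes] = {}

    def upload(self, video_id: str, original: bytes) -> None:
        # CPU time and storage grow with len(self.formats), even for rarely watched videos.
        for fmt in self.formats:
            self.files[(video_id, fmt)] = self.transcode(original, fmt)

    def request(self, video_id: str, fmt: str) -> bytes:
        return self.files[(video_id, fmt)]  # raises KeyError for a format never prepared


class TransactionalStore:
    """Keeps only the original and transcodes when a viewer requests a format."""

    def __init__(self, transcode: Transcoder, cache_size: int = 100):
        self.transcode = transcode
        self.originals: dict[str, bytes] = {}
        self.cache: OrderedDict = OrderedDict()  # (video_id, fmt) -> bytes, in LRU order
        self.cache_size = cache_size

    def upload(self, video_id: str, original: bytes) -> None:
        self.originals[video_id] = original  # no transcoding at upload time

    def request(self, video_id: str, fmt: str) -> bytes:
        key = (video_id, fmt)
        if key in self.cache:
            self.cache.move_to_end(key)  # mark as most recently used
            return self.cache[key]
        result = self.transcode(self.originals[video_id], fmt)
        self.cache[key] = result
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # evict the least recently used version
        return result
```

The design reflects the trade-off described in the text: `PretranscodingStore` pays storage and CPU for every format at upload time, whereas `TransactionalStore` pays only when a version is actually requested, which suits rarely watched videos.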
C. Cloud transcoding
Some sites on the Internet provide video transcoding services in the cloud. A user can upload a file and, after transcoding, have the transcoded version uploaded to a desired server. This kind of transcoding cannot be used for transactional transcoding, because only whole files are transcoded and the transcoding is not done in real time. Nevertheless, optimization techniques for parallelized transcoding can be used in clouds with some modifications, because clouds consist of many powerful servers with multi-core or many-core processors.
D. Optimization of transcoding
Transcoding algorithms can run on multi-core and many-core processors, but to use the full potential of these processors the algorithms need to be optimized. The optimization must change the way computation is done so that all of the processing power distributed over the processor cores is used. The data flow between memory (HDD and RAM) and the processors also needs to be optimized. To use the entire processing power of a server, the GPU can be used as an additional fast processing unit. During transcoding, the video is encoded into the desired format. Some parts of the encoding process can be parallelized easily, for example the compression of a single video frame that does not depend on other frames. Motion compensation, on the other hand, is harder to parallelize: the previously compressed frame first has to be decoded so that it can be compared with the current frame and motion vectors can be calculated. This need for decoding creates a data dependency, which influences how transcoding can be parallelized across processors with different memory systems. Parallelizing the transcoding process yields more processing capacity, which can be used either to transcode more files or to produce smaller transcoded files.
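One common way to use several cores despite this dependency is to split the video into segments that each start with an intra-coded frame, so that no segment references another, and to encode the segments in parallel while keeping the frames inside each segment sequential. The sketch below assumes that segmentation; `encode_segment` is a hypothetical stand-in for a real encoder that only records which reference each frame would use. Threads are used for simplicity, which only helps if the real encoder releases Python's global interpreter lock (as native encoder libraries typically do); CPU-bound pure-Python work would need processes instead.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Optional

Frame = tuple[int, str, Optional[int]]  # (frame index, frame type, reference frame index)

def split_into_segments(num_frames: int, gop_size: int) -> list[range]:
    # Every segment starts with an intra (I) frame, so no segment references another.
    return [range(start, min(start + gop_size, num_frames))
            for start in range(0, num_frames, gop_size)]

def encode_segment(frames: range) -> list[Frame]:
    # Hypothetical encoder stand-in. Inside a segment each predicted (P) frame needs
    # the previously reconstructed frame, so this loop has to stay sequential.
    encoded: list[Frame] = []
    reference: Optional[int] = None
    for index in frames:
        frame_type = "I" if reference is None else "P"
        encoded.append((index, frame_type, reference))
        reference = index  # the reconstructed frame becomes the next reference
    return encoded

def parallel_encode(num_frames: int, gop_size: int, workers: int = 4) -> list[Frame]:
    segments = split_into_segments(num_frames, gop_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in segment order, so they can be concatenated directly.
        results = list(pool.map(encode_segment, segments))
    return [frame for segment in results for frame in segment]
```

With `parallel_encode(10, 4, workers=2)` the frames come back in order as I, P, P, P, I, P, P, P, I, P. Each segment boundary costs an extra intra frame, which is the compression price paid for independent units of work.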
Transcoded files will be smaller if more complex algorithms are used that spend more processing power and, for motion estimation, search not only a small area around each block but a larger neighboring area. This extended search increases the chance of finding better matches and thereby reduces the size of the transcoded file. Optimization for transcoding on parallel processors needs to cover every step of encoding, not just the most intensive part such as motion estimation. This is important because encoding steps depend on each other: the results of one step are used as input to one or more other steps. For example, suppose that step B depends on the result of step A, and that step A runs on the GPU while step B runs on the CPU. The results of step A must then be transferred to the CPU before step B can begin. This transfer could become a bottleneck for the system, so it would be better to execute steps A and B on the same processor [6].
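The extended search described above can be illustrated with exhaustive (full-search) block matching that uses the sum of absolute differences (SAD) as the matching cost. This is a minimal sketch, not the algorithm of any particular encoder: frames are plain lists of luma values, the block size and search ranges are arbitrary, and `make_shifted_frames` builds a hypothetical synthetic test case.

```python
Frame = list[list[int]]  # luma samples, indexed as frame[y][x]

def sad(cur: Frame, ref: Frame, bx: int, by: int, dx: int, dy: int, block: int) -> int:
    # Sum of absolute differences between the current block at (bx, by)
    # and the reference block displaced by the candidate vector (dx, dy).
    return sum(abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
               for y in range(block) for x in range(block))

def full_search(cur: Frame, ref: Frame, bx: int, by: int,
                block: int = 4, search_range: int = 2) -> tuple[int, int, int]:
    """Exhaustive block matching over every vector within +/- search_range."""
    height, width = len(ref), len(ref[0])
    best = (0, 0, sad(cur, ref, bx, by, 0, 0, block))
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidate blocks that would fall outside the reference frame.
            if not (0 <= bx + dx and bx + dx + block <= width
                    and 0 <= by + dy and by + dy + block <= height):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, block)
            if cost < best[2]:
                best = (dx, dy, cost)
    return best  # (motion vector x, motion vector y, SAD)

def make_shifted_frames(size: int, shift: int) -> tuple[Frame, Frame]:
    # Synthetic frames (size <= 100): every reference sample is unique, and the
    # current frame shows the reference content moved `shift` pixels to the left.
    ref = [[x + 100 * y for x in range(size)] for y in range(size)]
    cur = [[ref[y][x + shift] if x + shift < size else 0 for x in range(size)]
           for y in range(size)]
    return cur, ref
```

With `cur, ref = make_shifted_frames(24, 5)`, `full_search(cur, ref, 8, 8, search_range=2)` can reach at most the vector (2, 0) and returns a SAD of 48, while `search_range=6` finds the exact vector (5, 0) with a SAD of 0. The number of candidates grows as (2r + 1)^2, from 25 to 169 here, which is the extra processing power that the better match costs.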
VI. STREAMING MODEL OF COMPUTATION
The computational power of parallel processors can be exploited by using the streaming model of computation. The main idea of this streaming processing model is to provide an efficient implementation of transcoding on multiprocessor architectures [10]. Programs are expressed as a set of operations on input data. The data are processed by processing elements and transported among them through communication channels. The streaming model of computation could provide a scalable way to describe image and video processing easily, without going into the details of the target architecture.
VII. TECHNIQUES OF LOWERING THE BANDWIDTH
Several techniques can be used to lower the size of the transcoded stream. They are useful when video has to be transcoded without changing the coding algorithm [7].
A. Reducing bits with fixed resolution
If the frame rate and resolution are to stay the same as in the original video while the stream size is lowered, the following approaches can be used. The quantization step can be increased during coding, which decreases the number of nonzero quantized coefficients. Alternatively, some of the higher-frequency AC coefficients can be discarded. This is possible because most of the energy of an image is concentrated in the lower frequency band.
B. Spatial resolution reduction
These techniques lower the number of pixels in a picture, either by removing pixels or by using different types of approximation. When the number of pixels changes, the motion vectors no longer match those of the original picture and have to be corrected. Another approach is to transcode only the part of the picture that is the region of interest. The region of interest is defined using meta information contained in the original picture [11].
C. Temporal resolution reduction
By lowering the frame rate, the transcoded video automatically becomes smaller.
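Frame-rate reduction by keeping only every k-th frame can be sketched together with a simple correction of the motion vectors, so that each kept frame points to the previous kept frame instead of to a removed one. The sketch assumes a single motion vector per frame, relative to the immediately preceding frame, and simply adds the vectors along the chain of removed frames (an approach known in the transcoding literature as telescopic vector composition). Real codecs have one vector per block, and composing them requires following each vector into the removed frame, where it may straddle several blocks; the list-of-tuples representation here is hypothetical.

```python
MotionVector = tuple[int, int]

def drop_frames(vectors: list[MotionVector], keep_every: int) -> dict[int, MotionVector]:
    """
    vectors[i] is the (hypothetical) motion vector of frame i relative to frame i - 1;
    vectors[0] is unused because frame 0 is intra coded.
    Keeps every keep_every-th frame and returns, for each kept frame after the first,
    a vector pointing to the previous kept frame.
    """
    kept = list(range(0, len(vectors), keep_every))
    corrected: dict[int, MotionVector] = {}
    for prev, cur in zip(kept, kept[1:]):
        # Add up the vectors of all frames between the two kept frames.
        dx = sum(vectors[i][0] for i in range(prev + 1, cur + 1))
        dy = sum(vectors[i][1] for i in range(prev + 1, cur + 1))
        corrected[cur] = (dx, dy)
    return corrected
```

For example, with per-frame vectors (1, 0), (2, 1), (1, 1), (0, 2) for frames 1 to 4 and `keep_every=2`, frame 2 gets the vector (3, 1) and frame 4 gets (1, 3), both now pointing to the previous kept frame.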
Some end-user devices are limited in computational power and require a lower bit rate regardless of how much network bandwidth they have. Frames cannot simply be dropped, however: the motion vectors also need to be corrected, because some of them will point to frames that no longer exist.
D. Multiple and single layer transcoding
Some codecs allow video to be coded into several layers. Each layer contains information about the video, and the more layers are available, the better the video quality. A decoder can generate a video from just the single base layer, but if another layer is available, the video will be of higher quality.
VIII. TRENDS
The number of mobile devices with Internet access is expected to surpass 1 billion by 2013 [13]. Compared with the 2.2 billion devices that will be using the Internet in general, this is a 45% share. Since these mobile devices will have different capabilities (in terms of processing power, display resolution, and network bandwidth), more scalable video streams will be needed. Streams will need to be transcoded into the right format for every device, based on the information provided by the device. If conditions change during streaming (network bandwidth, screen resolution), the stream will need to be optimized for the new situation. Mobile devices will also bring a new dimension to advertising. Most mobile devices have GPS or can at least be approximately located, which enables location-based services on mobile devices. Location-based ads can be presented to viewers of videos. These ads need to be transcoded together with the streaming video, all in real time, which further amplifies the need for transactional transcoding.
IX. CONCLUSION
Multimedia content is taking a primary role on the Internet. In the early Internet, text with a small number of images was the leading element; in this decade, the primary role is being taken by multimedia content. This progress has been enabled by higher network speeds and better processing power in end-user devices. In addition, the development and spread of mobile devices capable of playing multimedia have opened new approaches to the video experience. All this, together with the vast number of user-generated videos uploaded and watched every day, is driving research in the field of transcoding. To keep up with the increasing number of user-generated videos and the diversity of devices used for watching them, providers will need to change the way they transcode multimedia. Some videos might still be streamed in the old way, with pretranscoding, but the majority will need to be streamed using the novel approach of transactional transcoding. Transactional transcoding will allow video to be optimized for each end user to achieve the best possible quality of experience. It will also provide means for additional monetization, such as delivering ads targeted using location-based services.
Research should be done in every part of the multimedia delivery system: from optimizing transcoding for multiprocessors, through transmitting data over networks with less data loss, to the device and the end user, to maximize the quality of multimedia with the given resources.
REFERENCES
[1] “YouTube Fact Sheet”, http://www.youtube.com/t/fact_sheet/
[2] G. Ireland and L. Ward, “Transcoding Internet and Mobile Video: Solutions for the Long Tail”, IDC, 2007.
[3] L. Superiori, O. Nemethova, and M. Rupp, “An H.264/AVC Error Detection Algorithm Based on Syntax Analysis”, In: A. M. A. Ahmad and I. K. Ibrahim, “Multimedia Transcoding in Mobile and Wireless Networks”, Information Science Reference, London, 2009, pp. 215-234.
[4] C. Anderson, “The Long Tail: Why the Future of Business Is Selling Less of More”, Hyperion, 2006.
[5] H. Kalva, A. Vetro, and H. Sun, “Performance Optimization of an MPEG-2 to MPEG-4 Video Transcoder”, MERL, 2003.
[6] M. D. McCool, “Transcoding video with parallel programming on multicore processors”, RapidMind, 2008.
[7] I. Ahmad, X. Wei, Y. Sun, and Y. Zhang, “Video Transcoding: An Overview of Various Techniques and Research Issues”, IEEE Transactions on Multimedia, vol. 7, no. 5, Oct. 2005, pp. 793-804.
[8] N. Roma and L. Sousa, “Insertion of irregular-shaped logos in the compressed DCT domain”, In: Proc. 14th Int. Conf. Digital Signal Processing (DSP 2002), vol. 1, Jul. 2002, pp. 125-128.
[9] G. Ireland, “Transactional Transcoding: Enabling New Models of Video Distribution, Consumption, and Monetization”, IDC, 2009.
[10] J. Knezović, “Streaming Model Architecture for Image and Video Processing”, Zagreb, 2009.
[11] M. Žagar, M. Kovač, J. Knezović, H. Mlinarić, and D. Hofman, “3D Object Classification and Segmentation Methods”, In: M. Mrak, M. Grgić, and M. Kunt, “High-Quality Visual Experience: Creation, Processing and Interactivity of High-Resolution and High-Dimensional Video Signals (Signals and Communication Technology)”, Springer, 2010, pp. 331-347.
[12] H. Knoche and M. A. Sasse, “Getting the Big Picture on Small Screens: Quality of Experience in Mobile TV”, In: A. M. A. Ahmad and I. K. Ibrahim, “Multimedia Transcoding in Mobile and Wireless Networks”, Information Science Reference, London, 2009, pp. 31-46.
[13] A. Gonsalves, “1 Billion Mobile Internet Devices Seen By 2013”, InformationWeek, 2009, http://www.informationweek.com/news/internet/webdev/showArticle.jhtml?articleID=222001329/
[14] “Video compression”, http://en.wikipedia.org/wiki/Video_compression/