The 23rd International Technical Conference on Circuits/Systems, Computers and Communications (ITC-CSCC 2008)
Adaptive MPEG-4 Video Streaming Over IP Networks

Ru Zhou (1) and Kyung-sik Jang (2)
(1), (2) Department of Electrical and Electronics, Korea University of Technology and Education, 330-708 GaJeon-Ri, ByeongCheon-Myeon, ChounAn-City, ChungNam-Province, Korea
E-mail: [email protected], [email protected]

Abstract: As network researchers prefer to evaluate the effectiveness of the network before deploying protocols in real networks, video traces, which give the sizes of the individual video frames in a video sequence, have emerged as convenient video characterizations for networking studies. This paper proposes a system named VSS (Video Streaming Simulation) for comprehensive delivered-video quality evaluation using traffic traces in an RTP/UDP/IP network simulation environment.
1. Introduction
There is an increasing demand for streaming video applications over both the fixed Internet and wireless IP networks. MPEG-4 has recently become the compression standard targeted at streaming multimedia services from low to high bit rates. The dynamic nature of best-effort networks, in terms of fluctuating bandwidth and time-varying delays, makes it challenging for applications to provide good-quality streaming. As network researchers prefer to evaluate the effectiveness of the network before deploying protocols in real networks, video traces, which give the sizes of the individual video frames in a video sequence, have emerged as convenient video characterizations for networking studies. Because RTP [1]/UDP/IP has become the de facto standard for the transmission of multimedia data over the Internet, we concentrate on transmission over RTP/UDP/IP. Since, up to now, no tool-set has been publicly available to perform a comprehensive delivered-video quality evaluation using these traffic traces in an RTP/UDP/IP network simulation environment, we propose a system for simulating MPEG-4 [2] video transmission over RTP/UDP/IP. In brief, researchers who utilize our proposed QoS assessment framework VSS will benefit when verifying their designs regarding video transmission over a simulated network.
2. Related Works
There are three different ways to evaluate video transmission: real bit streams, traffic traces, and video-traffic models.

Real bit streams
The real bit-streams method uses the actual output of video encoding for video transmission evaluation. RealTracer [3], a set of tools for measuring the performance of RealVideo, is an example of how this works. One advantage of this method is that it allows the quality of the video to be visually evaluated. Network-level metrics, such as bandwidth, frame rate, and jitter, can also be obtained. However, this kind of tool focuses mainly on real networks. This may prevent networking people from evaluating their proposed protocols in a timely manner, because network researchers commonly use simulation tools such as ns2 [6] to verify the effectiveness of their designs before deploying the protocols in real networks.

Using traffic traces
A video-traffic trace is an abstraction of the real video stream. It typically gives the frame number, frame type (I, P, B), and frame size in a text file to describe the characteristics of real video traffic. An example of a video trace file is shown in Figure 2.1. The trace library in [4] provides many kinds of video-traffic traces, such as H.264, MPEG, or MDC traces. The advantage of using traffic traces is that one does not need to worry about copyright issues, because traffic traces do not contain the actual video information. Nevertheless, for a simulation study, usually only network-level metrics can be obtained. In the case of evaluating video transmission, network-level metrics may be insufficient to rate the quality perceived by an end user. Take the loss rate as an example: relatively low loss rates do not necessarily mean that the delivered video quality is good. A 3% packet loss could translate into a 30% frame-error probability, and the loss of an I-frame causes the other frames in the same GOP to become useless.

Figure 2.1 Example of video trace file

Using video-traffic models
A video model captures the properties of real video bit streams in a mathematical way. This method is typically developed based on the statistical properties of a set of video-trace samples of real video traffic. Transform expand sample (TES) is an example of this kind of methodology for generating data that closely match any set of given observations of a time series. The developed model can be used for mathematical analysis of networks, but it lacks the possibility of visualizing a transmitted video.

In our proposed system VSS, we use traffic traces for the evaluation of video transmission. Our system is mainly based on EvalVid [5] and NS2 [6]. We describe the details of our system and show some experimental results in Section 3, and conclude the paper in Section 4.
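To make the trace abstraction concrete, the short sketch below (in Python; the file name and exact column layout are illustrative assumptions, since the public trace libraries differ slightly in format) parses such a frame-level trace and tallies the bytes per frame type:

```python
# Minimal sketch: parse a frame-level video trace (frame number, frame type, frame size).
# The column layout and file name are assumptions for illustration only.
from collections import defaultdict

def read_frame_trace(path):
    frames = []
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) < 3 or parts[0].startswith("#"):
                continue  # skip comments and malformed lines
            number, ftype, size = int(parts[0]), parts[1], int(parts[2])
            frames.append((number, ftype, size))
    return frames

if __name__ == "__main__":
    frames = read_frame_trace("Video_1.trace")   # hypothetical file name
    bytes_per_type = defaultdict(int)
    for _, ftype, size in frames:
        bytes_per_type[ftype] += size
    print(dict(bytes_per_type))                  # e.g. bytes carried by I, P, B frames
```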
3. Architecture of VSS
In this section we introduce our proposed system VSS. Figure 3.1 shows the architecture of VSS (Video Streaming Simulation). It consists of three main steps: 1) video trace file generation, 2) NS2 simulation, and 3) video streaming evaluation.
The target of the first step is to generate video trace files. In order to make the video transmission adaptive, in this step we generate trace files at N different quantization scales. The second step is the network simulation. It plays the role of a black box in our system: researchers can change the topology inside it as they want. In our work, we use the RTP/UDP/IP network structure. In this step we generate a sender trace file and a receiver trace file to record the packet-level details of the simulation. Finally, by using the trace files generated during the simulation and the encoded videos from step one, we reconstruct the received video. After decoding the reconstructed video, the transmission can be evaluated by calculating the Peak Signal-to-Noise Ratio (PSNR) or other metrics, comparing the raw input video with the decoded received video. The details of each step are described in the following. Our system runs on Windows XP with Cygwin and NS2.
Figure 3.1 Architecture of VSS

3.1 Video Trace Files Generation
The target of this part is to generate the video trace files, which describe the details of the compressed video and are used as the input of the next step, the NS2 [6] simulation. This part includes the MPEG-4 Encoder and the Video Trace File Generator.
The input of this part is a raw video sequence generated by a video source, either in YUV QCIF format (176x144 pixels) or in YUV CIF format (352x288 pixels); these two formats are commonly used in network-related studies. The output of this part is a set of video trace files. Our goal is to have a real-time video rate controller inside the network simulator without having to do the media encoding there, which is why the media encoding must take place before the network simulation. Unlike most related works, which encode the source raw data at only one quality level, we generate many different video trace files representing different video qualities in order to turn the system into an adaptive sender. The online rate controller in the NS2 simulation can then adaptively switch to another trace file to simulate another encoding quality during the network simulation. A scalable encoder is therefore needed to encode the raw video source into many compressed videos at different scales. In MPEG-4 [2], the valid quantizer scale values range from 1 to 31, with 1 producing the best quality and the highest bit rate and 31 producing the worst quality and the lowest bit rate. As shown in Figure 3.1, Video_1.m4v, Video_2.m4v, ..., Video_N.m4v are generated using the MPEG-4 encoder; each of them represents a compressed video file at one quantization scale. The MPEG standard uses the frame types intra-coded (I), inter-coded (P), and bidirectionally coded (B), organized into so-called groups of pictures (GOPs). P frames are inter-coded with reference to the preceding I or P frame, while B frames reference both the preceding and succeeding I or P frames. Because of this feature of the MPEG standard, the real-time rate controller in our system only considers changing scales at the start of a new GOP. To keep this simple to control, we use the MPEG-4 encoder with a fixed GOP size, so the rate controller always finds an I frame as the first frame after a trace-file switch. The synchronized GOP boundaries ensure a refresh of the motion prediction, and all P and B frames in that GOP are based on that I frame. The switching process is shown in Figure 3.1.1.
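As a minimal sketch of this GOP-aligned switching rule (the fixed GOP size of 30 frames and the function names are assumptions, not part of the original system), a controller would defer a requested scale change until the next frame starts a new GOP:

```python
# Sketch: apply a requested quantization-scale change only at GOP boundaries.
# GOP_SIZE is an illustrative assumption; MPEG-4 scales range from 1 to 31.
GOP_SIZE = 30

def effective_scale(frame_index, current_scale, requested_scale):
    """Return the scale to use for this frame.

    The switch to requested_scale is deferred until frame_index starts a new
    GOP, so the first frame after a trace-file switch is always an I frame.
    """
    if frame_index % GOP_SIZE == 0:
        return requested_scale
    return current_scale

# Example: a switch to scale 10 requested at frame 42 takes effect at frame 60,
# the first frame of the next GOP.
scale = 4
for i in range(40, 65):
    scale = effective_scale(i, scale, 10)
print(scale)  # 10
```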
Figure 3.1.1 Switching Quantization Scale

The video trace file follows the format described in Section 2. However, this is not sufficient as an input to NS2, because we also have to consider the timestamp of each frame/packet in NS2. For this, the Video Traffic Trace Generator is used to produce a trace file containing the timestamp and size of each packet, as shown in Figure 3.1.2.

Figure 3.1.2 Example of video trace file
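A minimal sketch of such a trace generator (the 30 fps frame rate, the 1000-byte payload limit, and the output layout are assumptions for illustration) could look like this:

```python
# Sketch: turn a frame-level trace (number, type, size) into a timestamped,
# packetized trace usable as NS2 traffic input. Frame rate and packet size
# limit are assumptions.
FRAME_INTERVAL = 1.0 / 30      # seconds between frames (30 fps assumed)
MAX_PAYLOAD = 1000             # bytes per packet (assumed RTP payload limit)

def write_ns2_trace(frames, out_path):
    with open(out_path, "w") as out:
        for number, ftype, size in frames:
            timestamp = number * FRAME_INTERVAL
            remaining = size
            while remaining > 0:           # split large frames into packets
                payload = min(remaining, MAX_PAYLOAD)
                out.write(f"{timestamp:.6f} {payload} {ftype}\n")
                remaining -= payload

# Usage, reusing the frame-trace parser sketched in Section 2:
# write_ns2_trace(read_frame_trace("Video_1.trace"), "Video_1.ns2")
```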
3.2 NS2 Simulation
This part plays the role of a black box in the whole system: network researchers can set up any topology they want in order to simulate and evaluate the video transmission. NS2 has become very popular, and our framework is also based on it. NS2 [6] is a tool commonly used in network research. It is an object-oriented, discrete-event-driven network simulator developed at the University of California, Berkeley, and written in C++ and OTcl. It covers a very large number of applications, protocols, network types, network elements, and traffic models. Although RTP/UDP/IP is becoming the standard for multimedia streaming, the RTP/RTCP implementation in NS2 is very weak: there is no actual feedback in the original RTP/RTCP, so adaptive feedback rate controllers such as TFRC [7] cannot be simulated. Based on the work in [8], we extended RTP/RTCP in NS2 and tested several congestion control algorithms for RTP. As TFRC is becoming increasingly popular for multimedia streaming because of its stable performance, the targets of the extended RTP/RTCP include:
1) Add the feedback (Sender Report and Receiver Report) to the RTP/RTCP parts to transmit the QoS parameters in a timely manner.
2) Add a TFRC [7] strategy on top of RTP to test its performance and compare it with other rate controllers.
3) Add an extended Trace File class as the interface between step one and step two.
4) Generate the sender trace file and the receiver trace file to record the transmission details.
An example of the sender trace file is shown in Figure 3.1.3. It gives the sending time, packet sequence number, protocol name, packet size (in bytes), quantization scale, and frame type. The video generator we chose works at the granularity of a GOP, so the quantization scale within one GOP is fixed in the figure. The receiver trace file has a similar format, but when it finds that sequence numbers are not consecutive it marks the missing packets as lost. By using these two files, metrics related to the network topology, such as packet loss, jitter, and frame loss, can be calculated directly.

Figure 3.1.3 Sending Trace File in NS2
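For reference, a rate controller of this kind evaluates the TFRC throughput equation of RFC 3448 [7] from the RTCP-reported loss event rate and round-trip time. The sketch below states that equation; the numeric example values are illustrative only:

```python
# Sketch of the TFRC throughput equation (RFC 3448):
#   X = s / ( R*sqrt(2*b*p/3) + t_RTO * (3*sqrt(3*b*p/8)) * p * (1 + 32*p^2) )
# s: packet size (bytes), R: round-trip time (s), p: loss event rate (> 0),
# b: packets acknowledged per ACK (1 here), t_RTO: retransmission timeout (~4R).
from math import sqrt

def tfrc_rate(s, R, p, b=1.0):
    t_RTO = 4.0 * R
    denom = (R * sqrt(2.0 * b * p / 3.0)
             + t_RTO * (3.0 * sqrt(3.0 * b * p / 8.0)) * p * (1.0 + 32.0 * p * p))
    return s / denom   # allowed sending rate in bytes per second

# Illustrative values: 1000-byte packets, 100 ms RTT, 1% loss event rate.
print(tfrc_rate(1000, 0.1, 0.01))
```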
Through our experiments with RTP transmission in NS2, we conclude that TFRC performs much more stably than AIMD congestion control algorithms, with no need to adjust its metric values for different network topologies.

3.3 Video Streaming Evaluation
The tools in this part include the video generator, the MPEG-4 decoder, the Error Concealment Generator, and the PSNR and MOS calculation. As shown in Figure 3.1, the purpose of this step is to evaluate the process described above. With the NS2 sender and receiver trace files, the RTP-traffic trace file, and the compressed video files from step one, the following evaluation metrics can be obtained:
- packet/frame loss rate and jitter
- the reconstructed received video
- PSNR (between the original video and the reconstructed received video)
- MOS (Mean Opinion Score)
The packet/frame loss rate and jitter can be obtained directly from the NS2 trace files. We should point out that, in multimedia transmission, the frame loss rate plays a more important role in the evaluation than the packet loss rate.
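A minimal sketch of this calculation (the column layout of the trace files is an assumption loosely modeled on Figure 3.1.3, and the file names are hypothetical) might look as follows:

```python
# Sketch: compute packet loss rate and mean inter-arrival jitter from the
# sender and receiver trace files. Columns (time, sequence number, size)
# are an assumption for illustration.
def load_trace(path):
    records = {}
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) < 3:
                continue
            t, seq = float(parts[0]), int(parts[1])
            records[seq] = t
    return records

def loss_and_jitter(sender_path, receiver_path):
    sent, received = load_trace(sender_path), load_trace(receiver_path)
    loss_rate = 1.0 - len(received) / len(sent)
    # Jitter: mean absolute variation of one-way delay between consecutive packets.
    delays = [received[s] - sent[s] for s in sorted(received) if s in sent]
    jitter = sum(abs(d2 - d1) for d1, d2 in zip(delays, delays[1:])) / max(len(delays) - 1, 1)
    return loss_rate, jitter

# Example: loss, jitter = loss_and_jitter("sender.trace", "receiver.trace")
```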
The main part of this step is the reconstruction of the received video with the help of the generated trace files. Because the TFRC rate controller (or another rate controller) changes the quantization scale, the actual video is a mixture of many quantization scales. The reconstruction can be regarded as a process of copying the original compressed video files packet by packet, omitting the packets that were lost during the network simulation. The Error Concealment Generator then simply inserts missing frames by copying the previous frame, so that the sent and received videos consist of the same number of frames. At this point the full video transmission process is finished, and the reconstructed encoded video can be played with the VLC player; some blur can be seen during playback because of the congestion that occurred in the network simulation. Using the reconstructed video and the original video, we can evaluate the whole process. The most widespread method is the calculation of the peak signal-to-noise ratio (PSNR) image by image. It is a derivative of the well-known signal-to-noise ratio (SNR), which compares the signal energy to the error energy. The PSNR compares the maximum possible signal energy to the noise energy, which has been shown to correlate better with subjective quality perception than the conventional SNR. The following formula defines the PSNR between the luminance component Y of the source image S and the destination image D; Figure 3.3.1 shows an example of a PSNR curve obtained by comparing the original video and the reconstructed video with this formula.
$$\mathrm{PSNR}(n)_{dB} = 20\,\log_{10}\!\left(\frac{V_{peak}}{\sqrt{\dfrac{1}{N_{col}\,N_{row}}\displaystyle\sum_{i=0}^{N_{col}}\sum_{j=0}^{N_{row}}\left[Y_S(n,i,j)-Y_D(n,i,j)\right]^2}}\right), \qquad V_{peak}=2^{k}-1$$

where k is the number of bits per pixel of the luminance component.
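A minimal sketch of this per-frame luminance PSNR for 8-bit QCIF YUV 4:2:0 sequences (the file handling, frame size, and file names are assumptions) could be:

```python
# Sketch: frame-by-frame luminance PSNR between source and reconstructed
# YUV 4:2:0 files. 8-bit samples are assumed, so V_peak = 2**8 - 1 = 255.
import math
import numpy as np

WIDTH, HEIGHT = 176, 144                 # QCIF
FRAME_BYTES = WIDTH * HEIGHT * 3 // 2    # Y plane plus subsampled U and V planes

def psnr_per_frame(src_path, dst_path):
    v_peak = 255.0
    values = []
    with open(src_path, "rb") as fs, open(dst_path, "rb") as fd:
        while True:
            s = fs.read(FRAME_BYTES)
            d = fd.read(FRAME_BYTES)
            if len(s) < FRAME_BYTES or len(d) < FRAME_BYTES:
                break
            # Only the luminance (Y) plane enters the PSNR formula above.
            ys = np.frombuffer(s[:WIDTH * HEIGHT], dtype=np.uint8).astype(np.float64)
            yd = np.frombuffer(d[:WIDTH * HEIGHT], dtype=np.uint8).astype(np.float64)
            mse = np.mean((ys - yd) ** 2)
            values.append(float("inf") if mse == 0 else 20 * math.log10(v_peak / math.sqrt(mse)))
    return values

# Example: psnr_values = psnr_per_frame("source.yuv", "received.yuv")
```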
Figure 3.3.1 Example of the PSNR curve

Given the PSNR values, another metric, the MOS (Mean Opinion Score), can be calculated. This method has the advantage of showing the distortion caused by the network at a glance. The conversion from PSNR to MOS is given in the table below.

Table 3.3.1 PSNR to MOS conversion
PSNR [dB]    MOS
> 37         5 (Excellent)
31 - 37      4 (Good)
25 - 31      3 (Fair)
20 - 25      2 (Poor)
< 20         1 (Bad)

You can also define any other metrics you want in order to evaluate the whole process; the metrics in this part are simply the ones we are concerned with.
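A minimal sketch of this conversion, following Table 3.3.1 (the handling of values exactly on the interval boundaries is an assumption):

```python
# Sketch: map a per-frame PSNR value (dB) to the MOS class of Table 3.3.1.
def psnr_to_mos(psnr_db):
    if psnr_db > 37:
        return 5   # Excellent
    if psnr_db > 31:
        return 4   # Good
    if psnr_db > 25:
        return 3   # Fair
    if psnr_db > 20:
        return 2   # Poor
    return 1       # Bad

# Example: average MOS over a sequence of per-frame PSNR values.
# mos = sum(psnr_to_mos(p) for p in psnr_values) / len(psnr_values)
```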
4. Conclusion and Future Work
In this paper we proposed a novel system for real-time video streaming simulation, VSS. VSS simulates rate-adaptive video streaming; using it, one can evaluate a transmission by calculating the PSNR and MOS values between the original video source and the received video. The system is mainly designed for MPEG-4 or H.264, because they provide a large range of quantization scales. Our system uses video trace files as the input of the network simulator to simulate the transmission of real video, and it adapts the simulated sending rate by switching between trace files generated over the range of encoding quantization scales. We use an extended RTP/RTCP implementation in the network simulation for the video trace transmission, and the performance of TFRC proved to be good when congestion occurs. One of the main reasons we propose this system is convenience for network researchers: they can define their own network protocols or topologies and test the transmission process, so VSS can be used as an early test tool for new protocols or network algorithms. Nowadays more and more applications are concerned with wireless networks; researchers who design new network algorithms can first test their work using VSS and, if the simulation results are good, test it in a real wireless network. Testing our system over wireless networks is also part of our future work.

References
[1] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, "RTP: A transport protocol for real-time applications," IETF RFC 3550, 2003.
[2] ISO/IEC 14496-2, Information technology - Part 2: Visual, 1999.
[3] RealTracer, http://perform.wpi.edu/real-tracer/
[4] Video trace library, http://trace.eas.asu.edu
[5] J. Klaue, "EvalVid - A framework for video transmission and quality evaluation," 13th International Conference on Modelling Techniques and Tools for Computer Performance Evaluation, USA, Sep. 2003.
[6] The Network Simulator ns-2, http://www.isi.edu/nsnam/ns/
[7] M. Handley, S. Floyd, J. Padhye, and J. Widmer, "TCP friendly rate control (TFRC): Protocol specification," IETF RFC 3448, 2003.
[8] C. Bouras and G. Kioumourtzis, "Extending the functionality of RTP/RTCP implementation in NS2 to support TCP friendly congestion control," SIMUTools, France, 2008.