QoS guarantees based on end-to-end resource reservation for real-time video communications

Kentarou FUKUDA, Naoki WAKAMIYA, Masayuki MURATA and Hideo MIYAHARA
Department of Informatics and Mathematical Science, Graduate School of Engineering Science, Osaka University
1–3 Machikaneyama, Toyonaka, Osaka 560–8531, Japan
E-mail: [email protected]–u.ac.jp

A distributed multimedia system requires Quality of Service (QoS) guarantees in each entity within the system. The underlying transport network has to guarantee the network-level QoS, e.g., transfer delay, delay jitter and loss ratio. The video server has to have a mechanism to transmit the requested video stream in a real-time fashion, and the client must take care of continuous and high-quality video presentation to users. To provide guaranteed QoS to users, sufficient system resources, i.e., network resource (bandwidth) and CPU resource (processor cycles), must be reserved. When the network or CPU load is high, the multimedia system has to adapt itself to the available resources. In this paper, we first investigate the relationships among the video quality and the amount of CPU/network resources required to provide real-time video communication. Then, we introduce a QoS control method which enables effective resource usage and maximizes the provided video quality.

1. Introduction

With dramatic improvements in computing power, network bandwidth and video data compression techniques, there has been much advancement in distributed multimedia systems. Such systems require Quality of Service (QoS) guarantees in each entity within the system to perform effective and meaningful presentations [1,2]. As an example, let us consider a live broadcast application where video streams coded by the MPEG-2 (Moving Picture Experts Group) video coding algorithm [3] are distributed in a real-time fashion. In such an application, the MPEG video is transferred from a video server to a number of clients over multicast connections. The received video stream is decompressed by an MPEG-2 decoder and displayed on a computer display or a monitor. To provide users with video streams of their preferred quality, the video server, the clients and the network each take an important role in QoS provisioning.
The video server should have a mechanism to emit the requested video stream in a real-time fashion, and to communicate with clients interactively. The client is responsible for continuous and high-quality video presentation to a user. It is also necessary to provide the user with a QoS control mechanism to reflect his or her preference on the video quality.

* This work was partly supported by the Research for the Future Program of the Japan Society for the Promotion of Science under the project "Integrated Network Architecture for Advanced Multimedia Application Systems", a Grant-in-Aid for Scientific Research (A) 09305028 and a Grant-in-Aid for Encouragement of Young Scientists 10750277 from the Ministry of Education, Science, Sports and Culture of Japan.

The underlying transport network has to guarantee the network-level QoS. For the network-level QoS, resource reservation based protocols such as the CBR (Constant Bit Rate) service class in ATM (Asynchronous Transfer Mode) [4] can provide a hard guarantee. In the case of ATM, the CBR service class is standardized to offer deterministic QoS guarantees, where "deterministic" means that the worst-case QoS can be strictly guaranteed. In the CBR service class, this is accomplished by allocating a fixed bandwidth to a connection according to a pre-described traffic descriptor, the PCR (Peak Cell Rate). As long as the cell emission rate is kept under the PCR, no cell loss occurs and high-speed cell transmission can be achieved [5]. Other resource reservation based protocols such as RSVP [6,7] can also be employed in our study. That is, we assume that the network can provide an ability to guarantee the negotiated bandwidth. However, we should note that RSVP provides only a signalling mechanism, and the underlying networks have to have a bandwidth management mechanism [8–10]. To encode/decode MPEG-2 video in a real-time fashion, the server and client should guarantee the CPU-level QoS, e.g., processing delay and deadline violation ratio. For this, sufficient CPU resource (i.e., processor cycles) should be reserved with an appropriate scheduling mechanism at both end systems. Recently, to provide such a CPU-level QoS guarantee, much research has been devoted to real-time operating systems [11–14]. These real-time OSs employ real-time scheduling algorithms which consider the priority and/or the deadline of each task. By using a real-time OS, MPEG-2 video is expected to be encoded/decoded in a real-time fashion. If real-time OSs and resource reservation based networks are employed together, real-time and high-quality video distribution can be expected.
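The deterministic guarantee of a reservation based service class such as ATM CBR can be illustrated with a minimal admission-control sketch (this is an illustration of the general principle, not a mechanism from the paper): a link admits a new connection only if the sum of reserved peak rates stays within its capacity, so the worst case remains guaranteed for every admitted connection.

```python
# Illustrative sketch of peak-rate (PCR-style) admission control.
# Class and parameter names are our own, not from the paper or any standard.

class Link:
    def __init__(self, capacity_mbps):
        self.capacity = capacity_mbps
        self.reserved = 0.0

    def admit(self, pcr_mbps):
        """Reserve `pcr_mbps` for a new connection if it still fits."""
        if self.reserved + pcr_mbps <= self.capacity:
            self.reserved += pcr_mbps
            return True
        return False

link = Link(capacity_mbps=155.0)   # e.g. an OC-3 class link
print(link.admit(6.0))             # a 6 Mbps video stream is admitted
```

Because admission is decided against the declared peak rate, an admitted sender that keeps its emission rate under its reservation can never be affected by other traffic.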
However, to provide QoS guarantees in an effective way, a sufficient amount of resources should be reserved and dedicated to the video distribution service. Then, a similar problem arises in both the network and the end systems. In resource reservation based networks such as the ATM CBR service class, a connection setup is performed by allocating bandwidth to the connection. This means that the required bandwidth must be known a priori or be estimated adequately at connection setup time. In our previous study [2], we proposed a QoS mapping method which enables the multimedia system to predict the required bandwidth from the QoS parameters of MPEG video. Using this method, the bandwidth required for the video transfer can be immediately estimated. In the rest of this paper, we assume that the bandwidth required for an MPEG video stream can be estimated accurately by using some suitable prediction method such as ours. When the bandwidth reservation succeeds, real-time video transfer can be achieved as long as the server and client have enough CPU resources, or sufficient CPU resources are reserved or allocated to the application. In real-time OSs, as in reservation based networks, the amount of CPU resource to reserve (i.e., the processor cycles) must also be known a priori or estimated at reservation setup time. In [15], the authors propose predictors to estimate the number of CPU cycles required to decode MPEG compressed video with a software decoder. With these predictors, the CPU cycles needed to decode the following frame or packet of an MPEG video stream can be estimated. However, it is difficult to dynamically reserve the CPU resource for each frame/packet, and the video quality will be degraded during the session if a reservation is rejected due to heavy CPU load. Furthermore, the authors did not take into account the CPU cycles needed to encode MPEG video at the server, nor the bandwidth to transfer the coded video stream.
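The session-setup discipline argued for above can be sketched as follows. All names and numbers are hypothetical illustrations, not the paper's implementation: a session is admitted only when the estimated bandwidth and the estimated CPU cycles at both end systems can all be reserved in advance, at setup time rather than per frame or packet.

```python
# Hypothetical sketch of end-to-end admission: reserve network bandwidth,
# server CPU cycles and client CPU cycles together, or refuse the session.

def setup_session(bw_mbps, server_cycles, client_cycles,
                  reserve_bw, reserve_server_cpu, reserve_client_cpu):
    """Return True only if every resource along the path is reserved."""
    if not reserve_bw(bw_mbps):
        return False           # network rejects: no hard QoS is possible
    if not reserve_server_cpu(server_cycles):
        return False           # server cannot encode in real time
    if not reserve_client_cpu(client_cycles):
        return False           # client cannot decode in real time
    return True

# Example: a 6 Mbps stream needing 4.5e10 server and 1.5e9 client cycles/sec,
# checked against illustrative capacities passed in as simple predicates.
ok = setup_session(6.0, 4.5e10, 1.5e9,
                   reserve_bw=lambda b: b <= 10.0,
                   reserve_server_cpu=lambda c: c <= 5.0e10,
                   reserve_client_cpu=lambda c: c <= 2.0e9)
print(ok)  # True
```

A single rejection among the three resources is enough to refuse the session, which avoids the mid-session quality degradation that per-frame reservation can suffer.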
There have been many investigations on real-time video transfer systems [16–19] and real-time OSs [12–14]. However, those works consider the network resource and the CPU resource separately, although there exists a strong relationship between them. For example, if the network resource is abundant and the application can freely occupy much bandwidth, the server and the clients are liberated from complicated encoding/decoding tasks which require much CPU resource. Conversely, the required bandwidth becomes small when the end systems can sustain highly effective, but heavy, coding tasks. Thus, we have to consider the availability of both the network and CPU resources. In other words, there must exist a balance between the two resources that maximizes the user's QoS under the resource constraints. In this paper, from an analysis of actual MPEG-2 encoding/decoding, we first investigate the relationships among the video quality and the CPU/network resources needed to provide a real-time video presentation. Then, we propose a QoS control method which maximizes the video quality considering the availability of both the CPU and the network resources. With our method, a high-quality real-time video distribution can be provided.

This paper is organized as follows. In Section 2, we briefly summarize the relationship between the video quality and the QoS parameters of MPEG-2 video. In Section 3, we investigate the relationships among the video quality and the CPU/network resources required to provide a real-time video presentation. We then propose a QoS control method which maximizes the video quality within the restrictions on the CPU and the network resources. Further, we investigate the relationship among the coding parameters and the end-to-end delay in Section 4. In Section 5, we explain how the results presented in Section 3 can be applied to a multimedia communication architecture for providing QoS guarantees. We conclude our paper in Section 6.

2. QoS parameters for MPEG-2 video

There are three QoS parameters which specify the MPEG-2 video quality: the spatial, SNR (Signal to Noise Ratio), and temporal resolutions [2,3]. We briefly summarize them below. The spatial resolution is specified in terms of the number of pixels of pictures.
As the preferred spatial resolution, users may specify 640x480 pixels, 320x240 pixels or 160x120 pixels. When a user receives a video stream of 640x480 pixels, the user can enjoy detailed and high-quality video contents. When the received video is 160x120 pixels, on the other hand, the user suffers from coarse and rough quality when it is enlarged on a TV monitor, or from a smaller and degraded picture on a computer monitor. However, the network bandwidth and the CPU resource required to provide 160x120 video are certainly smaller than for 640x480 video. A change in the spatial resolution affects the required CPU/network resources considerably. At the same time, it causes a significant change in the perceived video quality [2]. Considering the fact that users prefer a stable video quality, we assume that a user first chooses the spatial resolution (e.g., 640x480 pixels) and the system keeps this resolution during the session. The SNR resolution is described by the quantization scale of the MPEG-2 coding algorithm. Quantization in the MPEG-2 coding algorithm is performed by applying a specific quantization scale to each macroblock of 16x16 pixels. The macroblock is further divided into four blocks of 8x8 pixels, and the blocks are quantized with the designated quantization scale of the macroblock. When a larger quantization scale is applied, the quality of the decoded block becomes poorer, which leads to degraded SNR values. However, the coded block size becomes smaller, which has a positive effect from the viewpoint of effective resource usage within the network. The temporal resolution is related to the number of frames per second (fps). An MPEG-2 video stream consists of three types of frames: I (Intra), P (Predictive) and B (Bi-directionally predictive). The repeated sequence of pictures beginning with an I picture is called a GoP (Group of Pictures), and is used as the indexing point of random access for video control functions (fast forward, reverse, etc.).
The frame rate of video stream can be regulated by means of a frame dropping technique [2].
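Frame dropping can be sketched concretely (this is our simplified illustration, not the exact technique of [2]): because no other picture is predicted from a B picture, B pictures can be discarded without breaking the decodability of the remaining stream, and the frame rate shrinks in proportion to the pictures kept.

```python
# Sketch: reduce the temporal resolution by dropping all B pictures from a
# GoP pattern. B pictures are never reference pictures, so the I and P
# pictures that remain still decode correctly.

def drop_b_frames(gop_pattern, fps):
    kept = [p for p in gop_pattern if p != 'B']
    new_fps = fps * len(kept) / len(gop_pattern)
    return ''.join(kept), new_fps

print(drop_b_frames("IBBPBB", 30))  # ('IP', 10.0): 30 fps falls to 10 fps
```

A GoP structure with many B pictures therefore offers fine-grained frame-rate regulation, at the cost of the extra coding effort and delay discussed in Section 3.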
3. Relationship among video quality and required resources

In this section, based on observations of coded MPEG video streams, we investigate the relationships among the video quality and the CPU/network resources required to achieve a real-time video presentation. We discuss the QoS parameters of MPEG-2 video described in Section 2 in relation to the required bandwidth and CPU cycles at the end systems. We employ three video sequences, "Scenery", "Animation" and "Music". Each sequence consists of 300 frames of pictures captured from a laser disk. For the QoS parameters, we use 640x480 pixels, 320x240 pixels and 160x120 pixels as the spatial resolution. The maximum frame rate is 30 frames per second, and the GoP structures are I, IP, IB, IPPPPP, IBBPBB or IBPBPB. The quantization scale is chosen from a range of 4 (highest SNR) to 40 (lowest). In this paper, we employ "mpeg2encode" and "mpeg2decode" [20] as the MPEG-2 software codec. We modify them to perform VBR (Variable Bit Rate) video coding where a single quantization scale is used throughout the sequence. To measure the number of CPU cycles required to perform a real-time video presentation, we employ "SpeedShop" on IRIX 6.4.

3.1. Relationship among video quality and required resources at the server side

In this subsection, we investigate the relationships among the video quality and the amount of network and server CPU resources for a real-time video presentation. Figs. 1 and 2 show results for the 640x480 pixels video sequences "Scenery" and "Music", respectively. In the figures, the required resources for each GoP structure are depicted with lines. Each point on a line corresponds to a quantization scale, i.e., 4, 8, 12, 16, 20, 24, 32 or 40. Although not shown in the figures, video streams of an identical quantization scale have almost the same video quality in terms of SNR regardless of the GoP structure.
We should note that the rate smoothing technique [21] is applied to the video streams on a GoP basis, and the required bandwidth and CPU cycles in the figures show the smoothed peak rate throughout the entire video stream. An example of rate smoothing is shown in Fig. 4. We first examine the relationships among the GoP structure of a video stream and the required CPU/network resources. From Figs. 1 and 2, we can observe that encoding a video sequence with P and/or B pictures requires much CPU resource. When the preferred quantization degree is statically specified, this coding strategy is effective in reducing the required bandwidth. However, the motion compensation technique indispensable for coding these pictures requires a large number of processor cycles. The motion compensation task for a P picture is larger than that for a B picture, because a P picture refers to a more distant picture than a B picture, and a larger search window is employed in P pictures to achieve effective motion compensation. Furthermore, when the GoP structure consists of only I and P pictures, the P pictures employ Dual-Prime prediction to improve the efficiency of prediction, which requires much CPU resource. On the other hand, B pictures require less CPU resource than P pictures, but they introduce an extra delay which will be discussed later. Comparing Fig. 1 with Fig. 3, where the spatial resolution is 320x240 pixels, we can also observe that the number of required CPU cycles for 640x480 pixels video is about four times larger than that for 320x240 pixels video. Although not shown in the figures, the number of required CPU cycles for 160x120 pixels video is roughly four times smaller than that for 320x240 pixels video. Next, we investigate the effects of the quantization scale. As shown in Figs. 1 through 3, the required bandwidth decreases with an increasing quantization scale. However, the quantization scale does not affect the required amount of CPU resource so much.
[Figure 1. Required resources at the server (Scenery, 640x480): required bandwidth (Mbps) versus required CPU resource (1.0e+10 cycles/sec) for the GoP structures I, IP, IB, IPPPPP, IBPBPB, IBBPBB and IBBBBB.]

[Figure 2. Required resources at the server (Music, 640x480): required bandwidth (Mbps) versus required CPU resource (1.0e+10 cycles/sec) for the same GoP structures.]

[Figure 3. Required resources at the server (Scenery, 320x240): required bandwidth (Mbps) versus required CPU resource (1.0e+10 cycles/sec) for the same GoP structures.]

[Figure 4. Smoothing example: the per-picture rate of an I B B P B B sequence over time, and the allocated bandwidth after GoP-basis smoothing.]

[Figure 5. Required resources at the client (Scenery, 640x480): required bandwidth (Mbps) versus required CPU resource (1.0e+9 cycles/sec) for the same GoP structures.]

[Figure 6. Required resources at the client (Animation, 640x480): required bandwidth (Mbps) versus required CPU resource (1.0e+9 cycles/sec) for the same GoP structures.]
From the above observations, the required server CPU resource C_s, in terms of processor cycles, can be estimated as a function of the spatial resolution (x × y pixels) as:

C_s = a_G · x · y,  (1)

where a_G is a constant dependent on the GoP structure G. Because a_G differs among video sequences, it is difficult to estimate it exactly. However, the system will be able to provide the QoS guarantee by using a conservative estimation for a_G.
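As a numerical check of the linear estimate in Eq. (1), the sketch below uses an illustrative constant (a_G = 150 cycles per pixel per second is our assumption, not a measured value from the paper) and confirms the four-fold scaling between the 640x480 and 320x240 resolutions observed in the figures.

```python
# Server CPU cycles estimated as a GoP-dependent constant times the pixel
# count, per Eq. (1). The value of a_g below is illustrative only.

def server_cpu_cycles(a_g, width, height):
    return a_g * width * height

c_640 = server_cpu_cycles(a_g=150.0, width=640, height=480)   # cycles/sec
c_320 = server_cpu_cycles(a_g=150.0, width=320, height=240)
print(c_640 / c_320)   # 4.0: quartering the pixel count quarters the load
```

Choosing a conservative (large) a_G makes the estimate an upper bound, which is what a reservation based system needs to keep its guarantee.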
In our previous study [2], we proposed a QoS mapping method which enables us to predict the required bandwidth from the QoS parameters in terms of the spatial resolution (x × y pixels), the SNR resolution (quantization scale q), and the temporal resolution (f fps). We modify our QoS mapping method considering the rate smoothing and obtain: