Optimal video coding for bit-rate switching applications: a game-theoretic approach

Stefania Colonnese, Gianpiero Panci, Stefano Rinauro, and Gaetano Scarano
Dip. INFOCOM, Università “La Sapienza” di Roma, via Eudossiana 18, 00184 Roma, Italy
e-mail: {colonnese,panci,rinauro,scarano}@infocom.uniroma1.it

Abstract

In this work¹ we discuss a game-theoretic approach to bitstream switching in video coding. Fast and bit-saving video bitstream switching is an important issue in video communication systems over time-varying channels. The most recent video coding standard, namely H.264, supports seamless switching among bitstreams coded at different bit-rates by means of suitably coded frames, named Switching Pictures (SP). Since the rate-distortion characteristics of switching frames differ from those of I and P frames, their location affects both the bit-rate and the quality of the coded sequence. In this work, we address the optimization of the SP frame locations under an assigned bit-rate budget. To this aim we resort to a game-theoretic approach, and we show that the optimal solution is met when the SP coding mode is assigned to the frame with the smallest innovation. Experimental results show the advantage, in terms of both rate and distortion, achieved by the optimized switching frame insertion with respect to basic H.264 coding.

I. Introduction

Fast and bit-saving video bitstream switching is an important issue in video communication over channels that vary because of mobility and/or handover in heterogeneous networks. The most recent video coding standard, namely ITU-T Rec. H.264, introduces two new frame types, SP-frames and SI-frames, that can be perfectly reconstructed even when different reference frames are used for their prediction [1]. This property allows seamless bitstream switching at a lower cost than using I-frames to provide random access. Theoretical and empirical rate-distortion curves of SI and SP frames have been provided in [2]. Since the rate-distortion curves of SI and SP frames differ from those of usual I and P frames, it is worth investigating how the switching frames can be introduced under suitable optimality criteria.

Although rate-distortion optimization is widely discussed in the video coding literature [3], [4], it is still of great interest, and novel results have recently been obtained by resorting to a game-theory-based technique for optimizing the bit-rate control in video coding [5]. In [5], the authors optimize the perceptual quality of the decoded sequence while guaranteeing “fairness” in bit allocation among macroblocks. Since the whole frame is an entity perceived by viewers, macroblocks represent players that compete cooperatively under the global objective of achieving the best quality within the given bit constraint. Following the optimization approach in [5], here we formulate the problem of video coding for bitstream switching by representing the frames of a sequence as players and the overall sequence quality as the objective function. The strategy of each player is the choice of the coding mode and of the allocated bits. Experimental results are provided to show the performance of the optimized coding strategy in a video streaming environment allowing bitstream switching.

¹ This work is partially supported by the Italian National project Wireless 8O2.16 Multi-antenna mEsh Networks (WOMEN) under grant number 2005093248.

II. Bitstream switching using SP frames

The H.264 video coding standard introduces two special syntactic structures, named Switching Intra (SI) and Switching Predicted (SP) frames, that allow drift-free switching between different coded bitstreams. The switching pictures provide access points to the coded bitstream, so that during the communication the bit-rate can be dynamically changed to adapt to the network conditions by resorting to pre-coded bitstreams.

Each switching frame has a primary and a secondary representation. The primary representation is sent along each bitstream, and it provides the virtual access point to the bitstream for users coming from another bitstream. The secondary representation is sent only at the switching phase, and it allows decoding exactly the same frame as the primary representation while using different reference frames. In other words, the primary and secondary coded frames are two alternative representations of the same frame, differing only in the prediction step.

An example of SP frame application is shown in Fig. 1, representing two bitstreams and the corresponding decoded frames. Frames 1 to 3 are decoded from the first coded bitstream, while frames 5 to 7 are decoded from the second bitstream. Frame 4 is decoded using its secondary representation, transmitted only when bitstream switching is performed, and prediction from the first bitstream. Thanks to the Switching Picture features, this frame 4 is identical to the frame 4 decoded using its primary representation and prediction from the second bitstream. The adoption of SP frames is particularly useful in streaming applications, since it provides virtual access points to each bitstream at a lower cost than I frames.

Fig. 1. Primary and secondary SP frames.

III. Game theoretic approach to SP allocation

In this Section, we formulate the video coding problem in the framework of game-theoretic optimization; in detail, following the approach in [5], we recast the problem of frame type selection and bit allocation in terms of strategy selection in a bargaining game. To cope with a practical coding setting, the maximum number of allocated switching frames should be fixed on the basis of the desired degree of accessibility. Hence, without loss of generality, we can assume that the overall video sequence to be coded is divided into shorter sequences of $N$ frames. In each of these sequences, only one frame out of $N$ is a switching one, and it is chosen according to the optimization method described in the following.

The players of the game are the $N$ frames of the sequence. The strategy of each player is its coding type, denoted by a binary value $b_i$ equal to zero for P frames and to one for SP frames, and the number of bits allocated to it, denoted by $r_i$. Given $N$ frames of a sequence at a frame rate $f_0$, the strategies are constrained by $\sum_{i=0}^{N-1} r_i \le RN/f_0$, $R$ (bits/s) being the target bit-rate, and by $\sum_{i=0}^{N-1} b_i = 1$. The utility $u_i$ of each player, representing its preference, is its visual quality after decoding. Furthermore, each player is characterized by an initial utility $d_i$, that is, the minimal visual quality that must be guaranteed, and by a corresponding minimum number of allocated bits $r_i^0$ required to guarantee the quality $d_i$. The tuple $\langle u_0, \dots, u_{N-1}, d_0, \dots, d_{N-1} \rangle$ represents the game setting. The Nash Bargaining Solution (NBS) is the unique solution that satisfies a set of axioms (efficiency, linearity, independence of irrelevant alternatives and symmetry) [6], and it can be found as the solution of the maximization problem [7]

\[ \max \prod_{i=0}^{N-1} \bigl( u_i(b_i, r_i) - d_i \bigr) \tag{1} \]

under the constraints $r_i > r_i^0$, $\sum_{i=0}^{N-1} r_i \le RN/f_0$. Each factor in (1) can be seen as a quality improvement with respect to the minimal initial quality $d_i$.

Here, extending the approach in [5], we define the visual quality of a frame as

\[ u_i(b_i, r_i) = \frac{r_i}{K(b_i)\, m_i^{\alpha}} \tag{2} \]

$m_i$ being the standard deviation of the innovation process between frame $i$ and frame $i-1$ ($\alpha = 0.8$), and where the function $K(b_i)$ accounts for the different coding efficiency of normal frames and switching frames, i.e. $K(1) > K(0)$, since by (2) a switching frame needs more bits than a P frame to reach the same quality. The main difference between the model in (2) and that in [5] is that in the latter the quality is straightforwardly related to the macroblock quantization parameter by $u(MB) \approx 12/Q_{MB}^2$. In our case such a simple relation does not hold, and a possible reformulation is addressed in the following.

There are $2^N$ possible $N$-tuples $b_0, \dots, b_{N-1}$. For an assigned set of coding modes $b_0, \dots, b_{N-1}$, the NBS for the game can be shown to be

\[ r_i = K(b_i)\, m_i^{\alpha}\, d_i + \frac{R}{f_0} - \frac{1}{N} \sum_{n=0}^{N-1} K(b_n)\, m_n^{\alpha}\, d_n \tag{3} \]

If the initial quality is uniform over the sequence, i.e. $d_i = d_0$, $i = 1, \dots, N-1$, since $u_i$ is a concave injective function of $r_i$, it is easily shown that the gain in (1) is maximized by choosing the switching coding mode $b_i = 1$ for the frame with the minimum standard deviation of the innovation process:

\[ b_i = 1, \qquad i = \arg\min_{n = 0, \dots, N-1} m_n^{\alpha} \]

Indeed, substituting (3) in (1), the product depends on the position of the SP frame only through the sum $\sum_{n} K(b_n)\, m_n^{\alpha}$, which is smallest when the larger factor $K(1)$ multiplies the smallest $m_n^{\alpha}$. This solution corresponds to the intuitive choice of assigning the less efficient coding mode to the frame with the smallest innovation.
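The allocation in (3) and the resulting choice of the SP frame can be checked numerically. The following is a minimal sketch and not the authors' implementation: the function names, the values `k_p = 1.0` and `k_sp = 1.5` (chosen so that the SP frame is the less efficient mode), and the sample innovation values are all illustrative assumptions. It computes the rates of (3) for a given SP position and evaluates the product in (1), so that every SP position can be compared by brute force.

```python
from math import prod

ALPHA = 0.8  # exponent of the innovation standard deviation, as in Eq. (2)


def nbs_rates(m, sp_index, d, budget, k_p=1.0, k_sp=1.5, alpha=ALPHA):
    """Rates of Eq. (3) for one window of frames.

    m        : innovation standard deviations m_i (one per frame)
    sp_index : index of the single frame coded as SP (b_i = 1)
    d        : minimal qualities d_i
    budget   : total bits for the window, i.e. R * N / f0
    k_p, k_sp: hypothetical values of K(0) and K(1), assumed k_sp > k_p
    """
    n = len(m)
    # c_i = K(b_i) * m_i**alpha, the cost of one unit of quality for frame i
    c = [(k_sp if i == sp_index else k_p) * m_i ** alpha
         for i, m_i in enumerate(m)]
    # Bits left after every frame has reached d_i, shared equally among frames
    surplus = (budget - sum(ci * di for ci, di in zip(c, d))) / n
    rates = [ci * di + surplus for ci, di in zip(c, d)]
    return rates, c


def nash_product(m, sp_index, d, budget, **kwargs):
    """Objective of Eq. (1) evaluated at the rates of Eq. (3)."""
    rates, c = nbs_rates(m, sp_index, d, budget, **kwargs)
    return prod(r / ci - di for r, ci, di in zip(rates, c, d))


def best_sp_index(m):
    """SP position given by the rule above: the frame with minimum innovation."""
    return min(range(len(m)), key=lambda i: m[i])


if __name__ == "__main__":
    m = [4.0, 2.5, 6.0, 3.0]           # illustrative innovation values
    d = [1.0] * len(m)                 # uniform minimal quality
    budget = 48000 * len(m) / 10       # R = 48 kb/s, f0 = 10 frames/s
    brute_force = max(range(len(m)),
                      key=lambda s: nash_product(m, s, d, budget))
    print(brute_force, best_sp_index(m))
```

Under these assumptions the brute-force search and the minimum-innovation rule select the same frame, and the rates returned by `nbs_rates` sum to the window budget.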

IV. Optimization algorithm

The coding optimization algorithm is applied by first partitioning the overall sequence into shorter sequences of equal length $N$, in each of which one and only one SP frame is introduced. Then, the optimization is performed on each sequence in successive steps.

First, the standard deviation of the innovation process of the $i$-th frame is estimated as the standard deviation of the motion-compensation residual. Generally speaking, the motion-compensation residual computed during the coding process depends on the compression ratio and differs from that evaluated on the original sequence. However, we observe that the relative perturbation of the variance of the residual is negligible over a large range of compression ratios. Moreover, in a realistic streaming system the primary SP frames should be placed at the same temporal index in all the coded versions of the same sequence. Hence, in the following we evaluate the standard deviation of the motion-compensation residual on the frames of the original sequence. Once the standard deviation has been evaluated, the switching coding mode $b_i = 1$ is assigned to the frame with the minimum standard deviation of the innovation process.

After the choice of the coding mode of each frame, the preliminary assignment of the initial rates $r_i^0$ is performed, based on the initial assignment of the qualities $d_i = d_0$, $i = 1, \dots, N-1$. Recent investigations on the theoretical and experimental rate-distortion performance of SP and P frames have highlighted that a given level of distortion is achieved at a higher rate for SP frames than for P frames. Hence, to equalize the initial quality, a larger initial bit budget $r_i^0$ is assigned to the SP frame; namely, in the simulations, the frame bit budget $r_i^0$, $i = 0, \dots, N-1$, is calculated by a preliminary coarse-quantization coding with two different quantization parameters, one for SP frames and the other for P frames. The $r_i$, $i = 0, \dots, N-1$, are then straightforwardly evaluated using (3).

Given the frame bit budget $r_i$, the corresponding quantization parameter $QP_i$ can be evaluated as the minimum value compatible with the assigned value $r_i$. However, we have investigated a generalization of the relation $r_i = K m_i^{\alpha} / Q^2$, which appears in [5] at the macroblock level. Interestingly enough, at the frame level the relation can be generalized as $r_i = K(b_i)\, m_i^{\alpha} / Q^P$, with $P > 2$, yielding an optimized quantization parameter $Q_i = [K(b_i)\, m_i^{\alpha} / r_i]^{1/P}$. In the simulations, we found a heuristic value of $P = 7$, which accounts for the circumstance that, at the frame level, the rate diminishes faster with the quantization level than at the macroblock level.

Fig. 2. Relationship between the bit budget $r_i$ and $m_i^{\alpha}$ in P frames (left) and SP frames (right).
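The two mechanical steps of the algorithm, the per-window choice of the SP frame and the mapping from a bit budget to a quantization parameter, can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names are hypothetical, the constant `k` passed to `quant_param` stands for an unspecified value of K(b_i) that would have to be fitted to the codec, and the rounding up to an integer is one possible way to obtain a usable quantization parameter from the real-valued Q_i.

```python
import math


def select_sp_frames(m, window=12):
    """Index of the SP frame in each consecutive window of `window` frames.

    m is the list of innovation standard deviations measured on the
    original sequence; in each window the frame with the smallest value
    is coded as SP (b_i = 1), all the others as P (b_i = 0).
    """
    picks = []
    for start in range(0, len(m), window):
        chunk = m[start:start + window]
        local_best = min(range(len(chunk)), key=chunk.__getitem__)
        picks.append(start + local_best)
    return picks


def quant_param(rate, m_i, k, alpha=0.8, p=7.0):
    """Integer quantization parameter from Q_i = (K * m_i**alpha / r_i)**(1/P).

    rate : bit budget r_i assigned to the frame
    m_i  : innovation standard deviation of the frame
    k    : hypothetical value of K(b_i) for the frame's coding mode
    The real-valued Q_i is rounded up so that the budget is not exceeded.
    """
    q_real = (k * m_i ** alpha / rate) ** (1.0 / p)
    return math.ceil(q_real)


if __name__ == "__main__":
    innovation = [5.0, 3.0, 4.0, 2.0, 6.0, 1.0, 7.0]
    print(select_sp_frames(innovation, window=3))   # one SP index per window
    print(quant_param(rate=1000.0, m_i=1.0, k=1e9))
```

Because the exponent 1/P is small for P = 7, the resulting quantization parameter changes slowly with the bit budget: doubling `rate` in the example lowers it by only one step.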

V. Experimental results

In this Section, we show some experimental results of the optimized method analyzed here, obtained using the H.264 codec [8] on the test sequence Foreman in QCIF format. Fig. 2 shows the experimental relationship observed between the bit budget $r_i$ and $m_i^{\alpha}$ on P frames and SP frames, respectively; although random oscillations are present, a definite linear trend is observed, as assumed in Eq. (2).

As mentioned before, the optimization is performed by partitioning the sequence into groups of $N$ frames. In the following, we assume $N = 12$ (one SP frame every 1200 ms), which represents a good compromise between accessibility and compression efficiency. Fig. 3 shows the standard deviation of the innovation process $m_i$ evaluated on the original sequence. For every window of $N$ frames, the switching coding mode $b_i = 1$ is assigned to the frame with minimum standard deviation. Then, the optimization algorithm is applied by evaluating the bit budget and the quantization parameter for each frame to be coded. Since the optimized quantization parameter is real-valued, it has been approximated with

\[ \hat{Q}_i = \mathrm{ceil}\bigl\{ [K(b_i)\, m_i^{\alpha} / r_i]^{1/P} \bigr\} \]

Optimized coding has been performed at the reference bit-rates of 48 kb/s and 70 kb/s, and by dynamically varying the bit-rate by switching between the two video bitstreams. For the sake of comparison, we have also evaluated the performance of a suboptimal coding algorithm with fixed SP periodicity; the quantization parameters on P and SP frames of the suboptimal coding algorithm have been set to the mode of the optimized quantization parameters on P and SP frames, respectively. To quantitatively evaluate the quality of the 8-bit/pixel coded luminance frame $\hat{x}[i,j]$ with respect to the original luminance frame $x[i,j]$, we refer to the Peak Signal-to-Noise Ratio (PSNR)

\[ \mathrm{PSNR} \stackrel{\mathrm{def}}{=} \frac{176 \times 144 \cdot 255^2}{\sum_{i,j} \bigl( x[i,j] - \hat{x}[i,j] \bigr)^2} \]

reported in dB, i.e. as $10 \log_{10}$ of this ratio.

In Fig. 4, we show the bits per frame produced by the optimal and suboptimal coding algorithms at bit-rates of 45.7 kb/s and 46.9 kb/s, respectively, while Fig. 5 shows the corresponding curves at bit-rates of 70.27 kb/s and 70.80 kb/s. In both cases a gain in the instantaneous bit-rate for the optimal algorithm can be observed. Moreover, in spite of the gain in terms of bit-rate, there are no losses in terms of average PSNR, which in both cases is slightly higher for the optimal algorithm: the optimized algorithm reaches a 0.5 dB gain in both cases. In particular, we observe the very different distribution of the bit budget assigned to the SP frames, which is much more equalized with the optimal coding algorithm. This results in a better utilization of the output buffer and in more equalized transmission delays, conditions that are quite important in a realistic communication scenario.

Finally, the advantage of the optimal algorithm has been assessed on the secondary representation of SP frames, as shown in Fig. 6, which reports the bits required when switching between optimized and between non-optimized bitstreams. Since the cost of a secondary frame may be significant, in particular when switching to a video stream at a higher bit-rate, the optimization reduces the average frame cost also during the transient from one rate to another.
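The quality measure used in these experiments can be computed as in the following sketch. It is an illustration rather than the evaluation code used for the figures: the function name is hypothetical, frames are assumed to be given as lists of rows of 8-bit luminance values, and the frame size is taken from the data (176 × 144 samples for QCIF) instead of being hard-coded.

```python
import math


def psnr_db(original, decoded, peak=255):
    """PSNR in dB between two equally sized luminance frames.

    original, decoded : lists of rows of integer sample values
    peak              : maximum sample value (255 for 8 bits per pixel)
    The ratio is count * peak**2 / sum of squared errors, where count is
    the number of samples in the frame (176 * 144 for QCIF).
    """
    squared_error = 0
    count = 0
    for row_o, row_d in zip(original, decoded):
        for a, b in zip(row_o, row_d):
            squared_error += (a - b) ** 2
            count += 1
    if squared_error == 0:
        return math.inf  # identical frames: no distortion
    return 10.0 * math.log10(count * peak ** 2 / squared_error)


if __name__ == "__main__":
    # Synthetic QCIF frames: every decoded sample is off by one grey level
    reference = [[100] * 176 for _ in range(144)]
    degraded = [[101] * 176 for _ in range(144)]
    print(round(psnr_db(reference, degraded), 2))
```

With a unit error on every sample the mean squared error is one, so the value printed is 20·log10(255) rounded to two decimals; averaging `psnr_db` over the frames of a sequence gives the average PSNR compared in the text.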

VI. Conclusion

In this work, we discuss a game-theoretic approach to bitstream switching in video coding, which is an important issue in video communication over time-varying channels. The recent H.264 video coding standard supports drift-free switching among bitstreams coded at different bit-rates by means of suitable syntactic structures, named Switching Pictures. Since the rate-distortion characteristics of switching frames differ from those of I and P frames, their location affects both the bit-rate and the quality of the coded sequence. Here, we adopt a game-theoretic approach to the rate-distortion optimization of Switching Picture coding, optimizing both the coding mode and the assigned rate of the coded frames. Experimental results show the gain in terms of both rate and distortion achievable using the optimized SP coding.

Fig. 3. Standard deviation of the innovation process $m_i$ of the original sequence, and optimal (black) and suboptimal (gray) SP frame locations.

Fig. 6. Bit budget of secondary SP frames, 48 kb/s and 70 kb/s up and down switching, optimal (black) and suboptimal (gray) algorithm.

Acknowledgment

The authors would like to thank Fabrizio Cassandra for providing simulations.

References

[1] M. Karczewicz, R. Kurceren, “The SP- and SI-Frames Design for H.264/AVC,” IEEE Trans. on Circuits and Systems for Video Technology, Vol. 13, June 2003.
[2] E. Setton, B. Girod, “Rate-Distortion Analysis and Streaming of SP and SI Frames,” IEEE Trans. on Circuits and Systems for Video Technology, Vol. 16, June 2006.
[3] P.A. Chou, Z. Miao, “Rate-Distortion Optimized Streaming of Packetized Media,” IEEE Trans. on Multimedia, Vol. 8, April 2006.
[4] T. Wiegand, B. Girod, “Lagrange Multiplier Selection in Hybrid Video Coder Control,” IEEE International Conference on Image Processing, Thessaloniki, Greece, October 2001.
[5] I. Ahmad, J. Luo, “On Using Game Theory to Optimize the Rate Control in Video Coding,” IEEE Trans. on Circuits and Systems for Video Technology, Vol. 16, February 2006.
[6] J. Nash, “Two-Person Cooperative Games,” Econometrica, Vol. 21, January 1953.
[7] A. Stefanescu, M.W. Stefanescu, “The Arbitrated Solution for Multi-Objective Convex Programming,” Rev. Roum. Math. Pure Applicat., Vol. 29, 1984.
[8] H.264/AVC Codec Software Archive [Online], “ftp://ftp-imtcfiles.org/jvt-experts/reference software”

Fig. 4. Bit/frame of the sequence Foreman coded at 48 kb/s, 10 frames/s, using the optimal (black) and suboptimal (gray) algorithm.

Fig. 5. Bit/frame of the sequence Foreman coded at 70 kb/s, 10 frames/s, using the optimal (black) and suboptimal (gray) algorithm.