ERROR RESILIENT MODE DECISION IN SCALABLE VIDEO CODING

Yi Guo¹, Ye-Kui Wang², and Houqiang Li¹
¹ University of Science and Technology of China, Hefei, China
² Multimedia Technologies Laboratory, Nokia Research Center, Tampere, Finland

1424404819/06/$20.00 ©2006 IEEE. ICIP 2006

ABSTRACT
Error resilient macroblock mode decision has been extensively investigated in the literature for single-layer video coding, where it is also called intra refresh. In this paper, we present a loss-aware rate-distortion optimized macroblock mode decision algorithm for scalable video coding, wherein more macroblock coding modes than intra and inter are involved. Thanks to its good performance, the proposed method has been adopted into the Joint Scalable Video Model by the Joint Video Team.

Index Terms: Video coding, scalable video coding, error resilience, macroblock mode decision

1. INTRODUCTION
The problem of error resilient macroblock (MB) mode selection for single-layer coding has been addressed extensively in the literature [1-3]. In [3], a loss-aware MB rate-distortion mode decision algorithm for single-layer coding, based on an error propagation map, is presented. It was reported to perform better than the loss-aware rate-distortion optimized mode decision algorithm [4] in the H.264/AVC Joint Model (JM). In addition, the algorithm has a much lower computational complexity because, unlike the one in the JM, it does not need to run a number of decoding processes.

The scalable extension of H.264/AVC, known as Scalable Video Coding or SVC [5], has recently been the main focus of the Joint Video Team (JVT). In SVC, the traditional MB coding modes of single-layer coding are used together with new MB modes that use inter-layer prediction. As in single-layer coding, the MB mode selection in SVC also affects the error resilience performance of the encoded bitstream. In this paper, we extend the single-layer mode decision method in [3] to multi-layer coding and implement it in the Joint Scalable Video Model (JSVM), the SVC reference software. Thanks to its good performance, the method has been adopted by the JVT into the JSVM.

2. ERROR RESILIENT SVC MODE DECISION
The MB mode selection in SVC is decided according to the following steps:
1) Loop over all the candidate modes, and for each candidate mode, estimate the distortion of the reconstructed MB resulting from possible packet loss, and the coding rate (e.g. the number of bits for representing the MB).
2) Calculate each mode's cost as given by the following equation, and choose the mode that gives the smallest cost.

$$C = D + \lambda R \tag{1}$$

In (1), $C$ denotes the cost, $D$ denotes the estimated distortion, $R$ denotes the estimated coding rate, and $\lambda$ is the Lagrange multiplier. To make the proposed mode decision method easier to follow, a single-layer method similar to that in [3] is presented first.

2.1. Single-layer Method
Assume that the loss rate is $p_l$. The overall distortion of the $m$th MB in the $n$th picture with the candidate coding option $o$ is represented by:

$$D(n,m,o) = (1-p_l)\big(D_s(n,m,o) + D_{ep\_ref}(n,m,o)\big) + p_l D_{ec}(n,m) \tag{2}$$

where $D_s(n,m,o)$ and $D_{ep\_ref}(n,m,o)$ denote the source coding distortion and the error propagation distortion, respectively, and $D_{ec}(n,m)$ denotes the error concealment distortion in case the MB is lost. $D_{ec}(n,m)$ is independent of the MB's coding mode. The source coding distortion $D_s(n,m,o)$ is the distortion between the original signal and the error-free reconstructed signal. For the calculation of the error propagation distortion $D_{ep\_ref}(n,m,o)$, a distortion map $D_{ep}$ is defined for each picture on a block basis (e.g. 4x4 luma samples). Given the distortion map, $D_{ep\_ref}(n,m,o)$ is calculated as:

$$D_{ep\_ref}(n,m,o) = \sum_{k=1}^{K} D_{ep\_ref}(n,m,k,o) = \sum_{k=1}^{K}\sum_{l=1}^{4} w_l D_{ep}(n_l,m_l,k_l,o_l) \tag{3}$$

where $K$ is the number of blocks in one MB, and $D_{ep\_ref}(n,m,k,o)$ denotes the error propagation distortion of the $k$th block in the current MB. $D_{ep\_ref}(n,m,k,o)$ is calculated as the weighted average of the error propagation distortions $\{D_{ep}(n_l,m_l,k_l,o_l)\}$ of the blocks $\{k_l\}$ that are referenced by the current block. The weight $w_l$ of each reference block is proportional to the area that is used for reference.
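As a rough illustration of steps 1) and 2) together with the distortion model of (2) and (3), the following Python sketch evaluates the cost (1) for a set of candidate modes. The `Candidate` container, the function names, and the convention that an intra candidate carries zero propagated distortion are assumptions made for this example; they are not taken from the JSVM.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple


@dataclass
class Candidate:
    """Per-mode quantities an encoder would measure (hypothetical container)."""
    d_source: float   # D_s: original vs. error-free reconstruction
    d_ep_ref: float   # D_ep_ref: propagated distortion, summed over the MB's blocks
    rate_bits: float  # R: estimated number of bits for the MB


def block_ep_ref(refs: Sequence[Tuple[float, float]]) -> float:
    """Inner sum of (3) for one block.

    refs holds (overlap_area, d_ep) pairs for the referenced blocks;
    the weights w_l are the areas normalised so that they sum to 1.
    """
    total_area = sum(area for area, _ in refs)
    if total_area == 0:
        return 0.0
    return sum(area / total_area * d_ep for area, d_ep in refs)


def expected_distortion(c: Candidate, d_ec: float, p_loss: float) -> float:
    """Loss-aware distortion of (2)."""
    return (1.0 - p_loss) * (c.d_source + c.d_ep_ref) + p_loss * d_ec


def choose_mode(cands: Dict[str, Candidate], d_ec: float,
                p_loss: float, lam: float) -> str:
    """Steps 1) and 2): compute C = D + lambda * R per mode, keep the cheapest."""
    costs = {name: expected_distortion(c, d_ec, p_loss) + lam * c.rate_bits
             for name, c in cands.items()}
    return min(costs, key=costs.get)


# A block whose reference area straddles four blocks of the distortion map.
ep_block = block_ep_ref([(9, 40.0), (3, 10.0), (3, 20.0), (1, 0.0)])

# Heavy propagated distortion in the reference makes intra cheaper overall.
modes = {"inter": Candidate(100.0, 300.0, 50.0),
         "intra": Candidate(110.0, 0.0, 200.0)}
best = choose_mode(modes, d_ec=400.0, p_loss=0.1, lam=0.5)
```

With a clean reference (zero propagated distortion for the inter candidate) the same call would pick the inter mode, which is the behaviour the loss-aware cost is meant to produce.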
The distortion map with the optimal coding mode $o^*$ is defined as follows. For an inter coded block wherein bi-prediction is not used, i.e., only one reference picture is used,

$$D_{ep}(n,m,k) = (1-p_l) D_{ep\_ref}(n,m,k,o^*) + p_l\big(D_{ec\_rec}(n,m,k,o^*) + D_{ec\_ep}(n,m,k)\big) \tag{4}$$

where $D_{ec\_rec}(n,m,k,o^*)$ is the distortion between the error-concealed block and the reconstructed block, and $D_{ec\_ep}(n,m,k)$ is the distortion due to error concealment and the error propagation distortion in the reference picture that is used for error concealment. Equation (3) is used to calculate $D_{ec\_ep}(n,m,k)$ assuming that the error concealment method is known, i.e., $D_{ec\_ep}(n,m,k)$ is calculated as the weighted average of the error propagation distortion of the blocks that are used for concealing the current block, and the weight $w_l$ of each reference block is proportional to the area that is used for error concealment.

For an inter coded block wherein bi-prediction is used, i.e., two reference pictures are used,

$$D_{ep}(n,m,k) = w_{r0} \times \Big((1-p_l) D_{ep\_ref\_r0}(n,m,k,o^*) + p_l\big(D_{ec\_rec}(n,m,k,o^*) + D_{ec\_ep}(n,m,k)\big)\Big) + w_{r1} \times \Big((1-p_l) D_{ep\_ref\_r1}(n,m,k,o^*) + p_l\big(D_{ec\_rec}(n,m,k,o^*) + D_{ec\_ep}(n,m,k)\big)\Big) \tag{5}$$

where $w_{r0}$ and $w_{r1}$ are the weights of the two reference pictures used for bi-prediction.

For an intra coded block, no error propagation distortion is transmitted, and only the error concealment distortion is considered:

$$D_{ep}(n,m,k) = p_l\big(D_{ec\_rec}(n,m,k,o^*) + D_{ec\_ep}(n,m,k)\big) \tag{6}$$

According to [6], the error-free Lagrange multiplier is represented by

$$\lambda_{ef} = -\frac{dD_s}{dR} \tag{7}$$

However, when transmission errors exist, a different Lagrange multiplier may be needed. Combining (1) and (2), we get

$$C = (1-p_l)\big(D_s(n,m,o) + D_{ep\_ref}(n,m,o)\big) + p_l D_{ec}(n,m) + \lambda R \tag{8}$$

Setting the derivative of $C$ with respect to $R$ to zero, we get

$$\lambda = -(1-p_l)\frac{dD_s(n,m,o)}{dR} = (1-p_l)\lambda_{ef} \tag{9}$$

Consequently, (1) becomes

$$C = (1-p_l)\big(D_s(n,m,o) + D_{ep\_ref}(n,m,o)\big) + p_l D_{ec}(n,m) + (1-p_l)\lambda_{ef} R \tag{10}$$

Since $D_{ec}(n,m)$ is independent of the coding mode, it can be removed. After $D_{ec}(n,m)$ is removed, the common coefficient $(1-p_l)$ can also be removed, which finally results in

$$C = D_s(n,m,o) + D_{ep\_ref}(n,m,o) + \lambda_{ef} R \tag{11}$$

2.2. Multi-layer Method
In scalable coding with multiple layers, the MB mode decision for the base layer pictures is exactly the same as the single-layer method. For a slice in an enhancement layer picture, if no inter-layer prediction is used, the single-layer method is used, with the loss rate being that of the current layer. Otherwise (if inter-layer prediction is used), the distortion estimation and the Lagrange multiplier selection processes are as presented below.

Let the current layer containing the current MB be $l_n$, the lower layer containing the collocated MB used for inter-layer prediction by the current MB be $l_{n-1}$, the further lower layer containing the MB used for inter-layer prediction of the collocated MB in $l_{n-1}$ be $l_{n-2}$, …, and the lowest layer containing an inter-layer dependent block for the current MB be $l_0$, and let the loss rates be $p_{l,n}, p_{l,n-1}, \dots, p_{l,0}$, respectively. For a current slice that may use inter-layer prediction, it is assumed that any contained MB would be decoded only if the MB and all the dependent lower-layer blocks are received; otherwise the slice is concealed. For a slice that does not use inter-layer prediction, a contained MB would be decoded as long as it is received.

The overall distortion of the $m$th MB in the $n$th picture in layer $l_n$ with the candidate coding option $o$ is represented by:

$$D(n,m,o) = \Big(\prod_{i=0}^{n}(1-p_{l,i})\Big)\big(D_s(n,m,o) + D_{ep\_ref}(n,m,o)\big) + \Big(1-\prod_{i=0}^{n}(1-p_{l,i})\Big) D_{ec}(n,m) \tag{12}$$

where $D_s(n,m,o)$ and $D_{ec}(n,m)$ are calculated the same as in the single-layer method. Given the distortion map of the reference picture in the same layer or in the lower layer (for inter-layer texture prediction), $D_{ep\_ref}(n,m,o)$ is calculated using (3). The distortion map is derived as presented below.

When the current layer is of a higher spatial resolution, the distortion map of the lower layer $l_{n-1}$ is first upsampled. For example, if the resolution is changed by a factor of 2 for both the width and the height, then each value in the distortion map is simply upsampled to a 2 by 2 block of identical values.

a) MB modes using inter-layer intra texture prediction
Inter-layer intra texture prediction uses the reconstructed lower layer MB as the prediction for the current MB in the current layer. In JSVM, this coding mode is called the Intra_Base MB mode. In this mode, distortion can be propagated from the lower layer. The distortion map of the $k$th block in the current MB is then as in (13). Note that herein $D_{ep\_ref}(n,m,k,o^*)$ is the distortion map of the $k$th block in the collocated MB in the lower layer $l_{n-1}$.
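The multi-layer distortion of (12) and the distortion-map upsampling described for Section 2.2 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the function names and the list-of-lists map representation are hypothetical, and the loss rates are passed as a plain sequence ordered from the lowest dependent layer to the current layer.

```python
from math import prod
from typing import List, Sequence


def decode_probability(layer_loss_rates: Sequence[float]) -> float:
    """Product of (1 - p_{l,i}) over the current layer and every lower
    layer it depends on, i.e. the probability that the MB is decodable."""
    return prod(1.0 - p for p in layer_loss_rates)


def multilayer_distortion(d_source: float, d_ep_ref: float, d_ec: float,
                          layer_loss_rates: Sequence[float]) -> float:
    """Overall distortion of (12) for one MB and one candidate mode."""
    q = decode_probability(layer_loss_rates)
    return q * (d_source + d_ep_ref) + (1.0 - q) * d_ec


def upsample_map(dist_map: List[List[float]], factor: int = 2) -> List[List[float]]:
    """Replicate every distortion value into a factor x factor block."""
    out: List[List[float]] = []
    for row in dist_map:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


# Base layer lost 3% of the time, enhancement layer 10% of the time.
q = decode_probability([0.03, 0.10])
d = multilayer_distortion(100.0, 20.0, 400.0, [0.03, 0.10])
up = upsample_map([[1.0, 2.0], [3.0, 4.0]])
```

Passing a single loss rate reduces `multilayer_distortion` to the single-layer expression (2), which matches the statement that slices without inter-layer prediction use the single-layer method.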
$$D_{ep}(n,m,k) = \Big(\prod_{i=0}^{n}(1-p_{l,i})\Big) D_{ep\_ref}(n,m,k,o^*) + \Big(1-\prod_{i=0}^{n}(1-p_{l,i})\Big)\big(D_{ec\_rec}(n,m,k,o^*) + D_{ec\_ep}(n,m,k)\big) \tag{13}$$

$D_{ec\_rec}(n,m,k,o^*)$ and $D_{ec\_ep}(n,m,k)$ are calculated the same as in the single-layer method.

b) MB modes using inter-layer motion prediction
In JSVM, two MB modes employ inter-layer motion prediction, the base layer mode and the quarter pel refinement mode, wherein the motion vector field, reference indices and MB partitioning of the lower layer are used for decoding the corresponding MB in the current layer. The inter prediction process still uses the reference pictures in the same layer. For a block that uses inter-layer motion prediction and does not use bi-prediction, the distortion map of the $k$th block in the current MB is

$$D_{ep}(n,m,k) = \Big(\prod_{i=0}^{n}(1-p_{l,i})\Big) D_{ep\_ref}(n,m,k,o^*) + \Big(1-\prod_{i=0}^{n}(1-p_{l,i})\Big)\big(D_{ec\_rec}(n,m,k,o^*) + D_{ec\_ep}(n,m,k)\big) \tag{14}$$

For a block that uses inter-layer motion prediction and also uses bi-prediction, the distortion map of the $k$th block in the current MB is

$$D_{ep}(n,m,k) = w_{r0} \times \Big(\Big(\prod_{i=0}^{n}(1-p_{l,i})\Big) D_{ep\_ref\_r0}(n,m,k,o^*) + \Big(1-\prod_{i=0}^{n}(1-p_{l,i})\Big)\big(D_{ec\_rec}(n,m,k,o^*) + D_{ec\_ep}(n,m,k)\big)\Big) + w_{r1} \times \Big(\Big(\prod_{i=0}^{n}(1-p_{l,i})\Big) D_{ep\_ref\_r1}(n,m,k,o^*) + \Big(1-\prod_{i=0}^{n}(1-p_{l,i})\Big)\big(D_{ec\_rec}(n,m,k,o^*) + D_{ec\_ep}(n,m,k)\big)\Big) \tag{15}$$

Note that herein $D_{ep\_ref}(n,m,k,o^*)$ is the distortion map of the $k$th block in the collocated MB in the reference picture in the same layer $l_n$. $D_{ec\_rec}(n,m,k,o^*)$ and $D_{ec\_ep}(n,m,k)$ are calculated the same as in the single-layer method.

c) MB modes using inter-layer residual prediction
In inter-layer residual prediction, the residual signal of the lower layer is used for predicting the residual of the current layer. The difference between the residual of the current layer and the residual of the lower layer is coded. If the residual of the lower layer is received, then there is no error propagation due to residual prediction. Therefore, (14) and (15) are used to derive the distortion map for an MB mode using inter-layer residual prediction.

d) MB modes not using inter-layer prediction
For an inter-coded block, (14) and (15) are used to generate the distortion map, while for an intra-coded block:

$$D_{ep}(n,m,k) = \Big(1-\prod_{i=0}^{n}(1-p_{l,i})\Big)\big(D_{ec\_rec}(n,m,k,o^*) + D_{ec\_ep}(n,m,k)\big) \tag{16}$$

Combining (1) and (12), we get

$$C = \Big(\prod_{i=0}^{n}(1-p_{l,i})\Big)\big(D_s(n,m,o) + D_{ep\_ref}(n,m,o)\big) + \Big(1-\prod_{i=0}^{n}(1-p_{l,i})\Big) D_{ec}(n,m) + \lambda R \tag{17}$$

Setting the derivative of $C$ with respect to $R$ to zero, we get

$$\lambda = -\Big(\prod_{i=0}^{n}(1-p_{l,i})\Big)\frac{dD_s(n,m,o)}{dR} = \Big(\prod_{i=0}^{n}(1-p_{l,i})\Big)\lambda_{ef} \tag{18}$$

Consequently, (1) becomes

$$C = \Big(\prod_{i=0}^{n}(1-p_{l,i})\Big)\big(D_s(n,m,o) + D_{ep\_ref}(n,m,o)\big) + \Big(1-\prod_{i=0}^{n}(1-p_{l,i})\Big) D_{ec}(n,m) + \Big(\prod_{i=0}^{n}(1-p_{l,i})\Big)\lambda_{ef} R \tag{19}$$

Herein, $D_{ec}(n,m)$ may depend on the coding mode, since the MB may be concealed even if it is received, and the decoder may utilize the known coding mode to apply a better error concealment method. Therefore, the $D_{ec}(n,m)$ term should be retained. Consequently, the coefficient $\prod_{i=0}^{n}(1-p_{l,i})$, which is not common to all the terms, should also be retained.

3. SIMULATION RESULTS
To demonstrate the error resilience performance of the proposed mode decision algorithm, we also implemented random intra refresh (RIR) and circular intra refresh (CIR) in the JSVM. Assume that the number of intra-refreshed MBs in a frame is $m$. In RIR, $m$ MBs are randomly selected to be intra-coded for each non-intra frame. For CIR, the positions of the MBs forced to be intra-coded are 1 to $m$ in the first non-intra frame, $m+1$ to $2m$ in the second non-intra frame, and so on. After all MBs have been refreshed in CIR, the refresh pattern is repeated.

The bitstreams generated by the proposed method, RIR, CIR, and no error resilience (i.e. the default mode decision algorithm targeting maximum compression efficiency) were decoded after loss simulation using the loss simulator in [7]. The loss simulator discards coded slices according to the loss patterns for a target loss rate. The proposed method (denoted as “Proposed”) was optimized for the given target loss rate. The refresh ratios used for RIR and CIR are 2, 3 and 4 times the target loss rate. For a refresh ratio of 2, if the target loss rate is PLR and the total number of MBs in the current layer is $M$, then the number of MBs that are forced to be intra coded is $2 \times PLR \times M$.

The common conditions for SVC error resilience simulations [8] were followed as much as possible, with the following differences:
1) Because coding of multiple slices per picture was not yet supported in the JSVM, each picture was coded as one slice and assumed to be encapsulated in one Real-time Transport Protocol (RTP) packet.
2) The quantization parameter was adjusted such that the base layer is roughly 64 kbps and the enhancement layer (including the base layer) is roughly 128 kbps.
3) It was assumed that the first frame is not lost.

JSVM version 3.0 was used. The IPPP coding pattern was used for the two encoded QCIF@15Hz coarse-grain scalable (CGS) layers. A lost or undecodable slice was concealed by frame-copy from the latest reference picture in the same layer.
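The RIR and CIR baselines described in this section can be sketched in Python as below. This is a minimal sketch, not the JSVM implementation: the function names are hypothetical, MB positions are 0-based here rather than 1-based, and both the rounding of the refresh count and the wrap-around when the MB count is not a multiple of the refresh count are assumptions the text does not specify.

```python
import random
from typing import List, Optional


def num_refresh_mbs(refresh_ratio: float, plr: float, total_mbs: int) -> int:
    """Forced-intra MBs per frame: refresh_ratio * PLR * M.
    Rounding to the nearest integer is an assumption."""
    return round(refresh_ratio * plr * total_mbs)


def cir_positions(frame_idx: int, m: int, total_mbs: int) -> List[int]:
    """0-based MB indices refreshed in the frame_idx-th non-intra frame
    (frame_idx starts at 0). The pattern wraps around once every MB has
    been refreshed; wrapping inside a frame is an assumption."""
    start = (frame_idx * m) % total_mbs
    return [(start + j) % total_mbs for j in range(m)]


def rir_positions(m: int, total_mbs: int,
                  rng: Optional[random.Random] = None) -> List[int]:
    """m distinct MB indices drawn uniformly at random for one frame."""
    rng = rng or random.Random()
    return sorted(rng.sample(range(total_mbs), m))


# QCIF has 11 x 9 = 99 MBs; refresh ratio 2 at a 5% target loss rate.
m = num_refresh_mbs(2, 0.05, 99)
first = cir_positions(0, m, 99)
second = cir_positions(1, m, 99)
picked = rir_positions(m, 99, random.Random(0))
```

Seeding the generator, as in the last line, keeps an RIR run reproducible across encodings.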
The curves of average luma PSNR versus $p_{l,1}$ (i.e. the loss rate of the enhancement layer) for the “Foreman” and “Paris” sequences are shown in Figs. 1-2. The loss rate of the base layer, $p_{l,0}$, is fixed as shown in each figure. The two loss rates were used for both encoding and loss simulation. As can be seen from the results, the proposed method outperforms all the other methods at almost all the simulated loss rates. Note that all the curves converge at the point of zero loss, hence that point is not shown in the figures. More simulation results can be found in [9].

Fig. 1. Simulation results of “Foreman” ($p_{l,0}$ = 3%)
Fig. 2. Simulation results of “Paris” ($p_{l,0}$ = 3%)

4. CONCLUSIONS
A loss-aware rate-distortion optimized macroblock mode decision algorithm for scalable video coding was presented. An error propagation distortion value is estimated and stored for each block, and then used in the distortion estimation of blocks in subsequent pictures. Selection of the Lagrange multiplier used to combine the distortion and the rate is also discussed. Possibly different loss rates for different scalable layers are considered. Simulation results show that the proposed method has good error resilience.

5. REFERENCES
[1] R. Zhang, S. L. Regunathan and K. Rose, “Video coding with optimal inter/intra-mode switching for packet-loss resilience,” IEEE Journal on Selected Areas in Communications, vol. 18, no. 6, pp. 966-976, Jun. 2000.
[2] Z. He, J. Cai and C. W. Chen, “Joint source channel rate-distortion analysis for adaptive mode selection and rate control in wireless video coding,” IEEE Trans. CSVT, vol. 12, no. 6, pp. 511-523, Jun. 2002.
[3] Y. Zhang, W. Gao, H. Sun, Q. Huang and Y. Lu, “Error resilience video coding in H.264 encoder with potential distortion tracking,” IEEE Int’l Conference on Image Processing, vol. 1, pp. 163-166, Oct. 2004.
[4] T. Stockhammer, D. Kontopodis and T. Wiegand, “Rate-distortion optimization for JVT/H.26L coding in packet loss environment,” Packet Video Workshop 2002, Pittsburgh, PA, USA, Apr. 2002.
[5] Joint Video Team, “Joint Scalable Video Model – JSVM6: Joint Draft 6 with proposed changes,” JVT-S202, Mar. 2006.
[6] T. Wiegand and B. Girod, “Lagrangian multiplier selection in hybrid video coder control,” IEEE Int’l Conference on Image Processing, vol. 3, pp. 542-545, Oct. 2001.
[7] Y. Guo, H. Li and Y.-K. Wang, “SVC/AVC loss simulator donation,” JVT-Q069, Oct. 2005.
[8] Y.-K. Wang, S. Wenger and M. M. Hannuksela, “Common conditions for SVC error resilience testing,” JVT-P206, Jul. 2005.
[9] Y. Guo, Y.-K. Wang and H. Li, “Error resilient mode decision in scalable video coding,” JVT-R057r1, Jan. 2006.