Pyramid model for macroblock inter/intra mode selection in robust video coding
S.K. Im and A.J. Pearmain

A model is presented to efficiently select inter or intra coding mode for macroblocks at an anticipated packet loss rate. This model considers only a few of the most probable combinations of decoder errors at the encoder side and offers a low complexity algorithm for inter/intra selection to balance compression efficiency with error resilience.

Introduction: In transmission of inter-frame coded video through unreliable channels, the impact of errors in a single frame can propagate to several following frames. If feedback from the decoder to the encoder is available, this error can be limited by either selecting another inter-prediction reference frame or coding the affected macroblocks (MBs) independently in intra mode [1]. However, there are several scenarios where feedback is neither available nor applicable; in these, forced intra updates can be applied for an anticipated packet loss rate [2]. Since intra coding is expensive in terms of compression efficiency, a careful inter/intra mode decision must be applied. In this regard, the best solution appears to be rate-distortion optimised macroblock mode decision where the distortion is calculated based on the expected pixel values at the decoder [2, 3]. A statistical model is suggested in [3], which involves a recursive calculation of expected pixel values. In [2] several decoders are executed at the encoder, each with a random packet loss model, and the average pixel values are calculated to determine the anticipated distortion. Although the method in [2] gives a realistic estimation of the pixel distortion and so outperforms the method in [3], it adds considerable complexity to the encoder. In this Letter, we introduce a low complexity model in which only a few of the most probable combinations of decoder errors are considered and the distortion of each is weighted in proportion to its probability of occurrence. Simulation results confirm that this model is less complex and provides the same error resilience as the method in [2] with about 70% fewer decoders.

Pyramid error model: After transmission of N coded video packets, without feedback the encoder has no knowledge of whether each of the past N packets was successfully transmitted. In this case, $2^N$ different combinations of past transmission history can exist. To calculate the anticipated inter-prediction distortion for encoding the current frame, one should consider all these combinations and find the overall average. Of course this is not possible, and in [2] a certain number of combinations are randomly selected. The larger this number, the more accurate the distortion estimation will be, at the cost of a longer processing time and a larger memory requirement for the encoder. We note that certain combinations of past transmission history have a higher statistical probability and also that many of them have more or less the same influence on picture distortion. Based on this, we can consider only the most probable and most influential combinations, with a weight proportional to their probability.

To confine the details of the proposed model to a limited space and to keep the presentation simple, let us first make the following assumptions (nevertheless the model can be easily extended to more complex scenarios). We assume that a coded video consists of one error-free I-frame at the beginning followed by several P-frames, each predicted from its previous frame and encapsulated in one packet with an unconditional loss probability of P. Fig. 1 shows a simplified schematic of our pyramid temporal error model. Frame 0 is an intra frame and is assumed always received. The rest of the frames can be either received or lost as shown in the Figure. For example, after transmission of frame-2 there are four possible combinations or 'temporal chains' of (frame-1/frame-2) ∈ {(received/received), (received/lost), (lost/received), (lost/lost)} with associated probabilities of {$(1-P)^2$, $(1-P)P$, $P(1-P)$, $P^2$}. Since the probability of the lost/lost case is $P^2$, which is negligible compared with the other chains, we eliminate this chain (shown in grey in Fig. 1) at this stage. Therefore, three chains are left and redistributed at frame-3. Again, certain chains have a $P^2$ component in their probability and hence are removed. So, the redistribution of the remaining chains at each frame adds one new chain to the model. To confine the complexity of the model, only a maximum of C chains are analysed at each stage. In the simplified example of Fig. 1, C = 4, therefore only three out of the four chains considered are redistributed from frame-3 to frame-4.
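The chain bookkeeping described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names `pyramid_chains` and `chain_probability` are hypothetical, each chain is represented simply by the index of its single lost frame (`None` for the error-free chain), and it is assumed that the chain dropped at the cap is the one with the oldest loss.

```python
from typing import List, Optional


def pyramid_chains(num_frames: int, max_chains: int) -> List[Optional[int]]:
    """Chains kept after `num_frames` P-frames (hypothetical helper).

    Each chain is identified by the index of its single lost frame,
    or None for the error-free chain. Chains with two or more losses
    (probability of order P^2) are never created.
    """
    if max_chains < 2:
        raise ValueError("need the error-free chain plus at least one lossy chain")
    chains: List[Optional[int]] = [None]  # frame 0 (intra) is always received
    for n in range(1, num_frames + 1):
        if len(chains) == max_chains:
            # Assumption: the chain with the oldest loss is not redistributed
            # and is treated as merged into the error-free chain.
            oldest = min(c for c in chains if c is not None)
            chains.remove(oldest)
        # Only the error-free chain may branch into "frame n lost";
        # every existing chain continues with "frame n received".
        chains.append(n)
    return chains


def chain_probability(num_frames: int, lost_frame: Optional[int], p: float) -> float:
    """Probability of one exact loss pattern over `num_frames` packets."""
    losses = 0 if lost_frame is None else 1
    return (p ** losses) * ((1.0 - p) ** (num_frames - losses))


if __name__ == "__main__":
    print(pyramid_chains(2, 4))  # three chains survive after frame-2
    print(pyramid_chains(4, 4))  # capped at C = 4 chains
    print(chain_probability(2, None, 0.1), chain_probability(2, 1, 0.1))
```

With `max_chains=4`, three of the four chains present after frame-3 are carried into frame-4, which matches the example of Fig. 1.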
Fig. 1 Error propagation pyramid model with C = 4 (C represents the maximum number of analysed temporal chains)
We note that the impairment caused by transmission errors decays over time, and so the top chain considered in frame-3 (highlighted with a bold border) is not redistributed further and is merged with the bottom chain, which is the totally error-free chain. This assumption may appear unrealistic in this simplified example but as C tends towards larger values it will be acceptable, especially as intra updates also stop the error propagation.

Distortion estimation and mode selection: Given the above model, at each frame encoding stage there is one error-free reference frame and C-1 distorted versions of this reference frame. These distorted versions are created by assuming the pyramid frame-loss model and then employing an error concealment algorithm, preferably the same one as in the decoder, for the lost frames. The error-free reference frame has an occurrence probability of $(1-P)^{C-1}$ and each of its distorted versions has a probability of $P(1-P)^{C-2}$. If the sum of absolute distortion (SSD) of the reconstructed pixels (i.e. prediction plus residual data) using the error-free reference is $\mathrm{SSD}_0$, and the distortion of the reconstructed data using the distorted versions of the reference frame is $\mathrm{SSD}_i$ ($i = 1$ to $C-1$), the expected overall SSD will be:

$$\mathrm{SSD} = (1-P)^{C-1}\,\mathrm{SSD}_0 + P(1-P)^{C-2} \sum_{i=1}^{C-1} \mathrm{SSD}_i \qquad (1)$$
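As a rough illustration of equation (1), the weighting can be written as below. The function name `expected_ssd` and the numeric inputs are hypothetical; the per-chain distortions are assumed to have been measured already for one macroblock and one candidate mode.

```python
from typing import Sequence


def expected_ssd(ssd_error_free: float, ssd_distorted: Sequence[float], p: float) -> float:
    """Expected distortion per equation (1) (hypothetical helper).

    ssd_error_free: SSD_0, measured with the error-free reference.
    ssd_distorted:  SSD_1 .. SSD_{C-1}, one per distorted reference.
    p:              anticipated packet loss probability P.
    """
    c = len(ssd_distorted) + 1  # C = error-free chain + distorted chains
    if c < 2:
        raise ValueError("at least one distorted reference is required")
    weight_error_free = (1.0 - p) ** (c - 1)
    weight_distorted = p * (1.0 - p) ** (c - 2)
    return weight_error_free * ssd_error_free + weight_distorted * sum(ssd_distorted)


if __name__ == "__main__":
    # Illustrative numbers only: C = 4 and P = 0.1.
    print(expected_ssd(100.0, [200.0, 300.0, 400.0], 0.1))
```

For these made-up inputs the weights are 0.729 for the error-free chain and 0.081 for each distorted chain, so the result is 72.9 + 0.081 × 900 = 145.8.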
Note that each $\mathrm{SSD}_i$ represents the difference between the reconstructed pixels and the original pixels, and the motion vectors and modes for all $\mathrm{SSD}_i$ calculations are identical. The overall SSD is then given to the Lagrangian rate-distortion optimiser for computation of the cost:

$$J = \mathrm{SSD} + \lambda R \qquad (2)$$

where R is the number of bits for the selected macroblock mode and $\lambda$ is the Lagrangian multiplier as detailed in [2] and [4]. Given a nonzero packet loss rate with the pyramid model, the intra mode normally has less distortion than the inter mode since it is independent of the distorted reference frames, but it requires more bits. The optimiser decides which mode has the minimum cost and should be selected.

Simulation results: The proposed method has been incorporated into the H.264 reference software and the results are compared with the method in [2]. Fig. 2 shows the average decoded PSNR of 100 simulation runs for the Foreman QCIF test sequence encoded at 100 kbit/s and 15 frames/s. Here we assumed that the encoder had knowledge of the expected packet loss rate. For the method of [2] we conducted the experiments with C = 7 and 30, and for the proposed method C = 7. It can be seen that for the proposed method there is no need to increase C (which directly represents the complexity of the algorithm) in order to achieve the same performance as the method in [2] with C = 30. That means the proposed method can reduce the encoder complexity by about 70%, both in terms of operation time and memory usage. Fig. 3 shows the same results as for Fig. 2 but this time the encoder was not aware of the expected packet loss rate. In these situations, a medium loss rate is considered for the error resilience optimisation (here we selected 5%). Again, the proposed pyramid model with only seven decoders gives similar results to the method in [2] with 30 decoders.
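The cost comparison of equation (2) can be sketched as follows. This is a minimal illustration under stated assumptions, not the reference software's mode decision: `rd_cost` and `select_mode` are hypothetical names, and the distortion, rate and multiplier values are invented purely to show how the choice shifts with the multiplier.

```python
from typing import Dict, Tuple


def rd_cost(ssd: float, rate_bits: float, lam: float) -> float:
    """Lagrangian cost J = SSD + lambda * R of equation (2)."""
    return ssd + lam * rate_bits


def select_mode(candidates: Dict[str, Tuple[float, float]], lam: float) -> str:
    """Return the mode name with the minimum cost (hypothetical helper).

    candidates maps a mode name to (expected SSD, rate in bits).
    """
    return min(candidates, key=lambda mode: rd_cost(*candidates[mode], lam))


if __name__ == "__main__":
    # Illustrative values: inter has a higher expected distortion because it
    # depends on possibly distorted references, while intra needs more bits.
    modes = {"inter": (145.8, 40.0), "intra": (100.0, 120.0)}
    print(select_mode(modes, lam=0.5))  # intra: 160.0 beats 165.8
    print(select_mode(modes, lam=1.0))  # inter: 185.8 beats 220.0
```

A larger multiplier penalises rate more heavily, so the cheaper inter mode wins; a smaller one favours the lower-distortion intra mode.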
ELECTRONICS LETTERS 24th May 2007 Vol. 43 No. 11
Conclusion: For error resilient mode selection, a pyramid model is presented which limits the number of decoder chains for decoded pixel distortion estimation at the encoder. It has been shown that this model is less complex and has similar performance to the conventional algorithm while requiring about 70% fewer decoders.

© The Institution of Engineering and Technology 2007
26 December 2006
Electronics Letters online no: 20073931
doi: 10.1049/el:20073931

S.K. Im (MPI-QMUL Information Systems Research Centre, Rua de Luis Gonzaga Gomes, Macao)
A.J. Pearmain (Electronic Engineering Department, Queen Mary University of London, Mile End Road, London E1 4NS, United Kingdom)

References
1 Ghanbari, M.: 'Standard codecs: image compression to advanced video coding' (IEE, London, 2003)
2 Stockhammer, T., Hannuksela, M.M., and Wiegand, T.: 'H.264/AVC in wireless environments', IEEE Trans. Circuits Syst. Video Technol., 2003, 13, (7), pp. 657–673
3 Zhang, R., Regunathan, S.L., and Rose, K.: 'Video coding with optimal inter/intra-mode switching for packet loss resilience', IEEE J. Sel. Areas Commun., 2000, 18, (6), pp. 966–976
4 Wiegand, T., Schwarz, H., Joch, A., Kossentini, F., and Sullivan, G.J.: 'Rate-constrained coder control and comparison of video coding standards', IEEE Trans. Circuits Syst. Video Technol., 2003, 13, (7), pp. 688–703

Fig. 2 Average PSNR against packet loss rate (PLR) (encoder assumed to know expected PLR)

Fig. 3 Average PSNR against packet loss rate (PLR) (encoder optimised for PLR = 5% since it has no knowledge of network conditions)