Upfront Intra-Refresh Decision for Low-Complexity Wireless Video Telephony
Yi J. Liang, Khaled El-Maleh, and Sharath Manjunath
Qualcomm CDMA Technologies, San Diego, CA 92122, USA
Email: {yliang,khaled,sharath}@qualcomm.com

Abstract— An adaptive intra-refresh (IR) technique is proposed for low-complexity video encoding on resource-constrained wireless platforms. The IR decision is made upfront without requiring any pre-encoding, which significantly reduces the complexity and power consumption in real-time communication. To allow an upfront mode decision, a novel closed-form solution is provided that requires no algorithmic iteration or exhaustive search. The IR scheme adapts to multiple factors, including the content texture information, the frame-to-frame pixel-value variation, and the estimated channel loss probability. The macroblock-based refresh can be performed using various patterns (e.g., cyclic or random). Experimental results demonstrate superior performance of the adaptive scheme over fixed schemes at all channel loss rates.

I. INTRODUCTION

Transmission of video over wireless networks can be unreliable due to channel losses. Errors resulting from channel losses can adversely impact the quality of the video presented to the user. In particular, channel errors can impair not only the quality of the current frame, but also that of subsequent inter-coded (P) frames that depend on the current frame through motion estimation and compensation. To limit the propagation of channel-induced errors from one frame to another, the encoder typically employs intra-refresh (IR) techniques. To meet the stringent target bitrate and to avoid high rate fluctuations in video telephony applications, intra-coded (I) frames cannot be employed frequently. Instead, macroblock (MB) intra-refresh is considered a more feasible approach in such applications. Intra-coded MBs help to improve error resilience, but usually require more bits than inter-coded MBs.

The core of an IR technique is to determine the coding mode (a.k.a. mode selection) for each macroblock. In previous literature, various schemes have been proposed for intra/inter mode selection, usually based on seeking optimality using certain rate-distortion models [1], [2], [3]. Most notably, in [2], a statistical simulation of the video decoder is used to estimate the channel distortion at the encoder side, with the aim of minimizing the Lagrangian cost that includes both rate and distortion. In [1], an iterative model is used to estimate channel distortion, so that an optimal IR scheme can be found to minimize the total distortion.

With the introduction of ITU-T H.264 [4], the efficiency of video compression has reached a new level, but at the cost of significantly higher coding complexity. More intra and inter modes are supported than ever before, and optimal mode selection usually involves exhaustive searches that are computationally intensive. For real-time applications such as video telephony on handheld devices, such high complexity is usually prohibitive given the constraints in processing capability, power, etc. For these reasons, much recent work has focused on reducing the coding complexity without notable degradation in coding quality or efficiency [5], [6], [7]. Because the IR decision is part of the mode decision process, narrowing down the coding modes in the first place, before actual encoding, becomes critical for advanced video coding on resource-constrained devices, and is the focus of this work.

The major contribution of this work is the upfront IR mode selection, made before any actual encoding is performed, which can significantly reduce the complexity and power consumption of encoding on light devices in the real world. The decision can eliminate the need for intensive motion estimation once an intra-mode is selected. This requires a closed-form solution that is not found in previous literature. The closed-form solution has to be obtained without requiring any output from actual encoding, such as the number of bits or the measured distortion. This allows encoding on mobile devices under stringent real-time constraints with satisfactory performance.

This paper is structured as follows: Section II starts with an introduction of the IR process, with Subsection II-A on how to determine the IR rate, and Subsection II-B on how to actually determine the mode for a particular MB. Experimental results are presented in Section III to demonstrate the performance of the proposed scheme.

II. UPFRONT INTRA-REFRESH DECISION

The mode decision process in a typical hybrid video encoder is illustrated in Fig. 1.
In this work, for video encoding on light wireless platforms, intra-refresh is determined upfront, before motion estimation (ME) and spatial estimation (SE), to reduce the complexity and save processing power. The intra-refresh decision unit takes various inputs, including parameters from the source content as well as the estimated channel condition, and then determines whether an MB should be intra-refreshed for enhanced error resilience. When an MB is determined to be intra-refreshed, the intensive ME process is skipped to save processing power. The MB may still be subject to spatial estimation and spatial mode decision, depending on the codec type. When an MB is determined not to be intra-refreshed, the normal ME/SE processes are performed, and an optimal mode is selected that can be either an intra- or an inter-mode.

Fig. 1. Mode decision process with upfront intra-refresh decision in a typical H.264 encoder.

The intra-refresh decision unit shown in Fig. 1 is divided into two sub-functions. An intra-refresh (IR) rate is first determined, followed by an actual decision on whether the MB should be intra-refreshed according to a refresh pattern. The IR rate applies either to an entire video frame or to a particular MB. In a per-frame-basis IR implementation, the IR rate, denoted by $\beta$ in this work, is the percentage of the MBs that are intra-coded. In a frame consisting of $M$ MBs, $\beta \cdot M$ MBs will be intra-coded. In a per-MB-basis IR implementation, the IR rate is the probability of a particular MB being intra-coded.¹ The final IR decision depends on the IR rate, as well as on the refresh pattern described in Subsection II-B.

¹ Due to space limitations, only the details and results of the per-frame-basis implementation are presented in the rest of the paper unless otherwise specified.

A. Determining the Intra-Refresh Rate

Macroblock intra-refresh achieves higher error resilience over lossy channels, at the cost of coding efficiency. A higher intra-refresh rate brings the benefit of stronger error resilience, but more intra-coded MBs require more bits. Determining the optimal rate for intra-refresh is critical for improving the tradeoff between error-resilient capability and compression efficiency. In real-time wireless video communication such as video telephony, content is encoded on the fly and is required to meet a certain target bitrate depending on the channel condition. The problem can hence be formulated as follows: given a certain channel condition and a target source bitrate $R_s$, find the optimal intra-refresh rate such that the total distortion $D$ after decoding is minimized. The total distortion can be broken into the source distortion $D_s$, resulting from lossy coding, and the channel distortion $D_c$, resulting from data loss during transmission.

In past literature, various rate-distortion models with sufficient accuracy have been presented to estimate the source and channel distortions. For instance, in [1], given a particular source rate $R_s$, the source distortion at an IR rate of $\beta$ is

$$D_s = D_s(\text{all inter}) + \beta(1 - \lambda + \lambda\beta) \cdot [D_s(\text{all intra}) - D_s(\text{all inter})], \tag{1}$$

where $\lambda$ is a sequence-dependent parameter. In the above, the source distortion is modeled as an increasing quadratic function of the IR rate $\beta$, which also depends on the source distortions when all MBs are inter-coded and intra-coded, respectively. The channel distortion $D_c$ of Frame $n$ is modeled in [1] as

$$D_c(n) = [(1 - \beta)(1 - p)b + p] \cdot D_c(n - 1) + pa \cdot F_d(n, n - 1), \tag{2}$$

where $p$ is the channel loss probability and $F_d(n, n - 1)$ is the MSE difference between Frames $n$ and $n - 1$. The constants $a$ and $b$ characterize the strength of the encoder's loop filter and the motion randomness of the video scene, respectively. In [1], the source models are demonstrated to provide accurate results. However, to find the optimal IR rate $\beta$, the source distortions for all MBs being inter-coded and for all MBs being intra-coded have to be obtained, as in (1), which requires encoding in both modes first. For lack of a closed-form solution, the optimal IR rate $\beta$ then has to be found by searching for the value that minimizes the total distortion $D = D_s + D_c$, which introduces high complexity. In addition, three sequence-dependent parameters ($\lambda$, $a$, and $b$) have to be determined to estimate the distortions in (1) and (2), which requires analysis of the sequence. Such computations result in extra encoding delay that can be prohibitive in real-time, interactive video telephony.

In real-time video telephony, the nature of "live coding" and the constraints in processing power prohibit performing multiple upfront encodings or exhaustive searches to achieve an optimal solution. A closed-form solution is highly desirable so that the IR rate can be determined upfront, without performing any pre-encoding.

In this work, we propose a simplified distortion model that allows upfront intra-refresh decisions. Ideally, the intra-refresh decision should depend only on the video content itself and the estimated channel condition, without actual encoding. For this reason, we model the source distortion based on the variance of the content, instead of the distortion resulting from lossy encoding. Using $V(n)$ to represent the variance of Frame $n$, we model the source distortion as

$$D_s(n) = e + d \cdot V(n) \cdot \beta, \tag{3}$$

where a linear model, determined by the two constants $e$ and $d$, is used to simplify the following steps. In video telephony applications, the optimal IR rate is typically below 30% to meet the stringent target bitrate. At this relatively low IR rate, experiments show that the linear model is sufficiently accurate while remaining simple.

For bitrate-constrained transmission channels, intra-frames are used infrequently to avoid high bitrate fluctuations. Error concealment by intra-frames is very limited once the coding process is in a steady state; the concealment is mainly contributed by the intra-refreshed MBs. For this reason, we are more interested in a stationary formulation of the channel distortion. An asymptotic solution for the channel distortion can be obtained from (2) for a stationary process [1]:

$$D_c(n) = \frac{a}{1 - b + b\beta} \cdot \frac{p}{1 - p} \cdot F(n, n - 1), \tag{4}$$

where $D_c(n)$ is the averaged channel distortion for Frame $n$ when sequence coding is in a stationary state, and $F(n, n - 1)$ is the average of the frame difference $F_d(n, n - 1)$. Using the stationary solution for simplicity, we express the total distortion of Frame $n$ as

$$D(n) = D_s(n) + D_c(n). \tag{5}$$
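To make the shape of the simplified model concrete, the following Python sketch evaluates (3)-(5) and locates the minimizing IR rate by a plain grid search. The function names and every numeric input (variance, frame difference, loss probability, and the constants a, b, d, e) are hypothetical values chosen for illustration; none of them come from the experiments. The search itself is only a visual aid for the curve, and is exactly the kind of per-frame cost that an upfront decision aims to avoid.

```python
def source_distortion(beta, variance, e, d):
    """Linear source-distortion model of (3): Ds = e + d * V * beta."""
    return e + d * variance * beta


def channel_distortion(beta, frame_diff, p, a, b):
    """Stationary channel-distortion model of (4)."""
    return (a / (1.0 - b + b * beta)) * (p / (1.0 - p)) * frame_diff


def total_distortion(beta, variance, frame_diff, p, a, b, d, e):
    """Total distortion of (5): D = Ds + Dc."""
    return (source_distortion(beta, variance, e, d)
            + channel_distortion(beta, frame_diff, p, a, b))


def grid_search_beta(steps=1000, **params):
    """Brute-force minimizer over beta in [0, 1]; for illustration only."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda beta: total_distortion(beta, **params))


# Hypothetical inputs: none of these values are taken from the paper.
PARAMS = dict(variance=400.0, frame_diff=60.0, p=0.04,
              a=0.9, b=0.8, d=0.05, e=10.0)

best = grid_search_beta(**PARAMS)
print(f"beta minimizing D on the grid: {best:.3f}")
```

For these assumed inputs the search settles at an IR rate of 0.125, comfortably inside the sub-30% range mentioned above. With b below 1 the denominator in (4) stays positive over the whole interval, so the curve is convex and has a single minimum.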

To find the optimal IR rate $\beta$, we take the derivative of (5) with respect to $\beta$ and set it to zero:

$$\frac{\partial D(n)}{\partial \beta} = 0. \tag{6}$$

A closed-form solution for the optimal IR rate $\beta$ is then found to be

$$\beta = \sqrt{\frac{c}{b} \cdot \frac{F(n, n - 1)}{V(n)} \cdot \frac{p}{1 - p}} + \frac{b - 1}{b}, \tag{7}$$

where $c = a/d$ is a constant. Intuitively, in (7), texture information, frame-to-frame variation, and estimated channel loss probability are considered to dynamically adjust the IR rate for incoming video frames. A higher channel loss rate and a higher frame-to-frame variation tend to correlate with a higher error rate or quality degradation due to error propagation, and therefore favor intra-coding for enhanced error resilience. On the other hand, a higher texture variance generally indicates more complex video content, and tends to correlate with a higher intra-coding cost.

The IR scheme is adaptive to changes in channel condition and video content. In general, the IR rate increases when the estimated channel loss probability increases or when the frame-to-frame variation increases. The IR rate decreases, however, when the variance of the video content increases. The IR rate varies as these input quantities vary, which reflects the adaptive nature of the technique. By considering both the channel and source content parameters, the proposed adaptive approach is able to achieve a better tradeoff between error resilience and coding efficiency.

The closed-form solution in (7) is highly feasible and desirable for an upfront IR implementation that requires very low complexity. The proposed IR scheme leverages this closed-form solution and does not require any pre-encoding. In a per-frame-basis implementation, the IR rate is expressed as a fixed percentage of the MBs within the current frame that have to be intra-coded. In a per-MB-basis implementation, the IR rate may be expressed as the statistical probability that a particular MB is to be intra-coded, in which case $V(n)$ in (7) is measured for an MB instead of a frame, and $F(n, n - 1)$ is measured for co-located MBs from frame to frame.

In practice, the loss probability $p$ in (7) can be obtained in a variety of ways. For example, the estimated channel loss probability may be determined based on receiver feedback from a remote terminal, e.g., via H.245 signaling in an H.323-based video telephony system, or via the Real-Time Transport Control Protocol (RTCP) in the form of quality of service (QoS) feedback. As a simple alternative, the channel loss probability can be estimated at the encoder side using loss statistics of the bitstream received from the remote terminal, assuming symmetric channel conditions in the transmit and receive directions. Channel condition estimates may be obtained periodically. The two parameters ($c$ and $b$) in (7) are obtained through experiments with a limited number of sequences, and then applied to generic video sequences.

B. Intra-Refresh Pattern

In a frame-level implementation, after the IR rate and the number of MBs to be intra-coded are determined for a given P frame, the next step is to decide which MBs are to be intra-refreshed.
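As a rough illustration of this frame-level flow, the sketch below evaluates the closed form in (7), converts the resulting rate into a number of MBs for a QCIF frame (99 macroblocks), and picks that many MB positions at random. The function names, the numeric inputs, the clamping of the rate to [0, 1], the rounding rule, and the fixed random seed are all assumptions made for this example rather than details of the described platform.

```python
import math
import random


def optimal_ir_rate(variance, frame_diff, p, b, c):
    """Closed-form IR rate of (7); the clamp to [0, 1] is an added safeguard."""
    if not 0.0 <= p < 1.0:
        raise ValueError("loss probability p must lie in [0, 1)")
    if variance <= 0.0 or b <= 0.0:
        raise ValueError("variance and b must be positive")
    root = math.sqrt((c / b) * (frame_diff / variance) * (p / (1.0 - p)))
    beta = root + (b - 1.0) / b
    return min(1.0, max(0.0, beta))


def mbs_to_refresh(beta, num_mbs):
    """Per-frame MB budget beta * M, rounded to an integer (assumed rule)."""
    return min(num_mbs, max(0, round(beta * num_mbs)))


def pick_random_mbs(count, num_mbs, rng):
    """Choose `count` distinct MB indices uniformly at random."""
    return sorted(rng.sample(range(num_mbs), count))


QCIF_MBS = 99  # 176x144 luma samples in 16x16 macroblocks: 11 x 9

# Hypothetical inputs; c = a/d with a = 0.9 and d = 0.05 assumed.
beta = optimal_ir_rate(variance=400.0, frame_diff=60.0, p=0.04, b=0.8, c=18.0)
count = mbs_to_refresh(beta, QCIF_MBS)
refresh = pick_random_mbs(count, QCIF_MBS, random.Random(0))
print(f"beta = {beta:.3f}, refreshing {count} of {QCIF_MBS} MBs")
```

The whole computation is one square root and a handful of multiplications per frame, with no encoder output involved. At a loss probability of zero the expression in (7) becomes negative whenever b is below 1, which is why the sketch clamps the result to zero in that case.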
The MBs to refresh can be selected in the following ways:

1) Cyclic (CIR): refreshing a consecutive block of MBs at one time;
2) Adaptive (AIR): selecting MBs with the lowest variance in a frame;
3) Hybrid (HIR): a combination of 1) and 2) to avoid refreshing the same co-located area in consecutive frames;
4) Random (RIR): selecting MBs randomly.

Among these schemes, CIR is the simplest to implement but may result in visual artifacts of periodic refresh. CIR and AIR combined, referred to as HIR, will largely eliminate the artifacts and prevent refreshing the same co-located area from frame to frame. RIR is also a good candidate, offering relative simplicity and satisfactory performance.

III. EXPERIMENTAL RESULTS

We experiment with the proposed adaptive IR scheme on our real-world video telephony product platform. We compare its performance with that of IR schemes using fixed IR rates, which were used on the platform before the adaptive scheme was proposed. All the schemes under evaluation use simple upfront IR decisions without requiring any pre-encoding, which is mandatory on our low-complexity hardware/software platform in practice. Note that on our real-time hardware/software platform, we are not able to compare with the schemes proposed in [1] or [2], since their algorithmic properties and complexity prohibit them from being implemented.

We use a 3GPP-compliant MPEG-4 video codec, and have tested sequences including Foreman and Mother-Daughter in QCIF resolution, representing high and moderate motion, respectively. In each sequence, 150 frames are coded, with only the first frame being an I-frame, at a default frame rate of 15 fps. Rate control is used to keep the data rate at approximately 48 kbps, which is typical for video telephony over UMTS channels. The RIR refresh pattern is used. Packets are discarded according to simulated channel conditions at loss rates of 0, 2%, 4%, and 6%.

Fig. 2 shows the PSNR performance for the Foreman sequence. With the adaptive IR scheme, a gain of 0.8 dB is observed at zero channel loss rate compared with the fixed scheme using a 10% IR rate. At 6% channel loss rate, the gain is 2 dB compared with the fixed scheme using a 5% IR rate. In summary, the adaptive IR scheme outperforms the fixed schemes at all channel loss rates. Similar results are observed in Fig. 3 for the Mother-Daughter sequence tested under the same experimental conditions.

[Figure: PSNR (dB) versus channel loss rate (%), from 0 to 6%, for four schemes: No IR, Fixed 5% IR, Fixed 10% IR, and Content adaptive IR.]
Fig. 2. Performance of different IR schemes. Foreman sequence.

[Figure: PSNR (dB) versus channel loss rate (%), from 0 to 6%, for four schemes: No IR, Fixed 5% IR, Fixed 10% IR, and Content adaptive IR.]
Fig. 3. Performance of different IR schemes. Mother-Daughter sequence.

IV. CONCLUSIONS

We propose a novel intra-refresh scheme with an upfront MB mode decision that significantly reduces the complexity and power consumption during real-time video communication. The scheme considers both channel condition and video content to decide adaptively whether a macroblock should be intra-coded, using different refresh patterns. The proposed method can operate either at the frame level or at the MB level. Experimental results demonstrate good rate-distortion performance for different video sequences at all channel loss rates from 0 to 6 percent.

ACKNOWLEDGMENT

The authors would like to thank James Feng at UCSD for his help on the experiments.

REFERENCES

[1] Zhihai He, Jianfei Cai, and Chang Wen Chen, “Joint source channel rate-distortion analysis for adaptive mode selection and rate control in wireless video coding,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 12, no. 6, pp. 511–523, June 2002.
[2] R. Zhang, S.L. Regunathan, and K. Rose, “Video coding with optimal inter/intra-mode switching for packet loss resilience,” IEEE Journal on Selected Areas in Communications, vol. 18, no. 6, pp. 966–976, June 2000.
[3] J.Y. Liao and J.D. Villasenor, “Adaptive intra update for video coding over noisy channels,” in Proceedings IEEE International Conference on Image Processing, Lausanne, Switzerland, Sept. 1996, vol. 3, pp. 763–6.
[4] ITU-T Recommendation H.264, Advanced video coding (AVC) for generic audiovisual services, May 2003.
[5] Y. J. Liang and K. El-Maleh, “Low-complexity Intra/Inter mode-decision for H.264/AVC video coder,” in Proc. International Symposium on Intelligent Multimedia, Video and Speech Processing (ISIMP), Hong Kong, Oct. 2004.
[6] F. Pan et al., “Fast Intra mode decision algorithm for H.264/AVC video coding,” in Proc. of the IEEE International Conference on Image Processing (ICIP), Singapore, Oct. 2004.
[7] J. Støttrup-Andersen, S. Forchhammer, and S.M. Aghito, “Rate-distortion-complexity optimization of fast motion estimation in H.264/MPEG-4 AVC,” in Proc. of the IEEE International Conference on Image Processing (ICIP), Singapore, Oct. 2004.
