Low-Complexity Wyner-Ziv Video Coding Based on Robust Media Hashing+
Li-Wei Kang and Chun-Shien Lu*
Institute of Information Science, Academia Sinica
Taipei, Taiwan, ROC
Email: {lwkang, lcs}@iis.sinica.edu.tw

Abstract—To meet the requirements of distributed video coding in resource-limited sensor networks, Wyner-Ziv video coding builds on the Wyner-Ziv theorem for source coding with side information available at the decoder, which states that a system with an intraframe encoder and an interframe decoder can achieve coding efficiency comparable to that of a conventional video codec. Most existing Wyner-Ziv video coding systems have a light encoder and a heavy decoder. In this paper, a new content-aware, media-hash-based Wyner-Ziv video codec with both a light encoder and a light decoder is proposed. The key idea is that the significant differences between a video frame and its reference frame are efficiently extracted and used for frame recovery based on robust image hashing, without the need to perform motion estimation. The particular contribution of our method is its low complexity at both the encoder and the decoder. Simulation results demonstrate the coding efficiency achievable by our method, particularly for videos with small and medium motion.
Keywords—Wyner-Ziv video coding; distributed video coding; media hash; video sensor network; wireless video communication
Topic area—multimedia processing.
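As a rough illustration of how comparing hashes can flag the regions that changed between a frame and its reference frame without any motion estimation, the sketch below computes a per-block hash for both frames and reports the blocks whose hash bits differ, in terms of Hamming distance, by more than a threshold. The hash itself (sign bits of a few low-frequency AC coefficients of each 8×8 DCT block), the block size, the threshold, and the names `block_hash` and `changed_blocks` are all assumptions made for illustration; this is not the paper's hashing scheme.

```python
import numpy as np

BLOCK = 8  # block size in pixels (assumption)

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis vector."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

D = dct_matrix(BLOCK)
# Hypothetical choice: low-frequency AC positions whose signs form the hash.
HASH_POSITIONS = [(0, 1), (1, 0), (1, 1), (0, 2), (2, 0), (2, 1)]

def block_hash(block: np.ndarray) -> np.ndarray:
    """Toy robust hash: sign bits of selected low-frequency AC DCT coefficients."""
    coeffs = D @ block @ D.T  # separable 2-D DCT of the block
    return np.array([coeffs[p] >= 0 for p in HASH_POSITIONS], dtype=np.uint8)

def changed_blocks(cur: np.ndarray, ref: np.ndarray, threshold: int = 2) -> list:
    """Return (row, col) indices of blocks whose hash Hamming distance exceeds threshold."""
    rows, cols = cur.shape[0] // BLOCK, cur.shape[1] // BLOCK
    flagged = []
    for by in range(rows):
        for bx in range(cols):
            win = (slice(by * BLOCK, (by + 1) * BLOCK), slice(bx * BLOCK, (bx + 1) * BLOCK))
            dist = int(np.count_nonzero(block_hash(cur[win]) != block_hash(ref[win])))
            if dist > threshold:
                flagged.append((by, bx))
    return flagged

rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(32, 32)).astype(float)
cur = ref.copy()
cur[0:8, 0:8] += 10.0                        # uniform brightness shift only
cur[8:16, 16:24] = 255.0 - cur[8:16, 16:24]  # inverted block: all AC signs flip
print(changed_blocks(cur, ref))  # [(1, 2)]
```

Because the DC coefficient is excluded, this toy hash ignores a uniform brightness shift, so only the structurally altered block is flagged; a practical robust hash would be designed and thresholded far more carefully.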

I. INTRODUCTION
A. Background
Conventional video compression standards, such as H.264/AVC [1], usually perform motion estimation between successive frames for interframe predictive coding, so the encoder is typically 5 to 10 times more complex than the decoder [2]. However, with emerging applications such as wireless video sensor networks and wireless mobile video communication, the current video coding paradigm is insufficient once new requirements, such as the limited computational capability and memory of low-power video devices, are taken into account. These applications call for a new video codec with a low-complexity encoder. To meet this requirement, distributed video coding based on the Wyner-Ziv theorem [3] has recently emerged as a new video coding paradigm, in which individual frames are encoded independently (intraframe coding) but decoded conditionally (interframe decoding) [2], [4]-[8]. Compared with conventional video coding, distributed video coding shifts part of the computational burden (e.g., motion estimation) from the encoder to the decoder, resulting in a codec with a light encoder and a possibly heavy decoder.
_______________________________________________________________
+This work was supported in part by the National Science Council, Republic of China, under Grant NSC 95-2422-H-001-008.
*Corresponding author

0-7803-9752-5/06/$20.00 ©2006 IEEE


B. Related Work
A general framework of a Wyner-Ziv video codec is shown in Fig. 1. At the encoder, an input video sequence is divided into key frames and Wyner-Ziv frames. Each key frame (K) is encoded with a conventional intraframe encoder, while each Wyner-Ziv frame (W) is encoded with a distributed encoder that generates Wyner-Ziv bits. For example, in [5]-[7], the distributed encoder consists of the discrete cosine transform (DCT) and scalar quantization, followed by a turbo encoder that generates parity bits, part of which are then transmitted as the Wyner-Ziv bits. Optionally, the encoder can also transmit extra information to the decoder. For example, in [6], the encoder transmits hash bits consisting of a small subset of the quantized DCT coefficients of selected blocks in the Wyner-Ziv frames. At the decoder, each key frame is decoded with the conventional intraframe decoder, and each Wyner-Ziv frame is decoded with the distributed decoder assisted by side information. Side information can be generated from previously decoded frames and/or the extra information transmitted by the encoder. For example, in [5], the side information for a Wyner-Ziv frame is generated by interpolating previously decoded frames. In [6], for each block in a Wyner-Ziv frame, the hash bits transmitted by the encoder are used to find the best-matching block in the previously decoded frame by means of motion-compensated extrapolation, and this block serves as the side information. The generated side information Ŵ of a Wyner-Ziv frame W is an estimate of W and is used by the decoder to decode the Wyner-Ziv bits. In [5]-[7], if the received Wyner-Ziv bits (a subset of the parity bits) are insufficient for decoding, the decoder can request additional parity bits from the encoder via a feedback channel.
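A minimal sketch of this data flow, under simplifying assumptions: every second frame is a key frame (a GOP size of 2), the side information Ŵ is formed by plainly averaging the two neighboring decoded key frames (a simple stand-in for interpolation, not the method of [5]), and quantization is applied in the pixel domain rather than to DCT coefficients. The turbo encoder and feedback channel are omitted; instead, the hypothetical helper `mismatches` counts how many quantization indices of W and Ŵ disagree, which roughly corresponds to the errors the parity bits must correct. All function names here are illustrative.

```python
import numpy as np

def frame_types(num_frames: int, gop: int = 2) -> list:
    """Label each frame 'K' (key) or 'W' (Wyner-Ziv); every gop-th frame is a key frame."""
    return ["K" if i % gop == 0 else "W" for i in range(num_frames)]

def side_information(prev_key: np.ndarray, next_key: np.ndarray) -> np.ndarray:
    """Estimate W-hat by averaging the two neighboring decoded key frames."""
    return (prev_key.astype(float) + next_key.astype(float)) / 2.0

def quantize(x: np.ndarray, step: float) -> np.ndarray:
    """Uniform scalar quantization: map each value to the index of its nearest level."""
    return np.floor(x / step + 0.5).astype(int)

def mismatches(w: np.ndarray, w_hat: np.ndarray, step: float) -> int:
    """Count quantization indices where the side information disagrees with W.
    These disagreements are what the transmitted parity bits must correct."""
    return int(np.count_nonzero(quantize(w, step) != quantize(w_hat, step)))

k0 = np.arange(64, dtype=float).reshape(8, 8)  # decoded key frame 0
k2 = k0 + 4.0                                  # decoded key frame 2
w1 = k0 + 2.0                                  # Wyner-Ziv frame 1 (original)
w1[0, 0] += 10.0                               # a local change averaging cannot predict

w_hat = side_information(k0, k2)
print(frame_types(5))              # ['K', 'W', 'K', 'W', 'K']
print(mismatches(w1, w_hat, 4.0))  # 1
```

The closer Ŵ is to W, the fewer mismatches remain, and the fewer parity bits the decoder has to request over the feedback channel.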
The above-mentioned Wyner-Ziv coding systems have a light encoder and a heavy decoder, because the encoder performs only intraframe coding while the decoder performs complex interframe decoding operations (e.g., motion compensation). Such a Wyner-Ziv codec suits scenarios in which the decoder has high computational capability for complex interframe decoding. For example, a video sensor network may contain thousands of low-complexity encoder devices (video sensors) but only one or a few high-complexity decoder devices [2]. However, if both the encoder and the decoder must have low complexity (e.g., in wireless video communication between a pair of mobile camera phones), an alternative

terms of low Hamming distance. The perceptual similarity among the three images can be deduced as PSNR(Ŵ, I)