IEEE TRANSACTIONS ON BROADCASTING, VOL. 58, NO. 2, JUNE 2012


Flexible Complexity Control Solution for Transform Domain Wyner-Ziv Video Coding Xiem HoangVan and Byeungwoo Jeon, Senior Member, IEEE

Abstract—Most Wyner-Ziv (WZ) video coding solutions in the literature focus on improving the coding efficiency. Recently, a few papers have addressed problems related to the complexity distribution of WZ video coding by sharing the motion estimation process between the encoder and decoder. However, these methods turn out to significantly increase the computational complexity of the encoding process, due to the presence of motion estimation at the encoder. This is less appealing in WZ coding applications, since their primary requirement is a low encoding complexity. To address this problem, we propose a different approach to complexity control based on the adaptive selection between WZ and intra coding for the WZ frames. This solution is more flexible than the previous ones in that it can be implemented either at the encoder or at the decoder. The proposed intra mode selection algorithm exploits the spatial and temporal coherency in transform domain Wyner-Ziv video coding (TDWZ) in order to achieve complexity control between the WZ encoder and decoder. The experimental results illustrate that the proposed algorithm not only effectively distributes the computational complexity over the encoder and decoder, but also retains the low-complexity feature at the encoder. This should make it attractive for a large number of real video applications of the WZ video coding paradigm. Moreover, the coding efficiency of the conventional TDWZ codec without intra mode decision is improved by up to 2 dB by the proposed intra mode selection algorithm.

Index Terms—Complexity control, computational complexity, intra mode decision, video coding, video communication, Wyner-Ziv video coding.

I. INTRODUCTION

VIDEO compression has been steadily evolving in the past 20 years or so, with significant technological breakthroughs, the wide deployment of multimedia services and applications, and an unsurpassed overall impact on our society. Video applications are almost ubiquitous in many ICT (Information and Communication Technology) services, such as digital television, DVDs, smart phones, and mobile and internet streaming, to name just a few. In retrospect, the “video

Manuscript received September 04, 2011; revised January 02, 2012; accepted January 18, 2012. Date of publication March 22, 2012; date of current version May 18, 2012. This paper was presented in part at IEEE BMSB 2011, Nürnberg, Germany, June 2011. This work was supported by the National Research Foundation of Korea (NRF) under Grant 2011-001-7578 funded by the Korea Government (MEST). X. HoangVan was with the Digital Media Lab, Sungkyunkwan University, Suwon, Korea. He is now with the Instituto de Telecomunicações, Instituto Superior Técnico, Lisbon, Portugal (e-mail: [email protected]). B. Jeon is with the Digital Media Lab, Sungkyunkwan University, Suwon, Korea (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TBC.2012.2187611

compression adventure” has been providing an additional compression gain of around 50% every 5 years or so and has led to the establishment of a set of video compression standards. It has been a steady technological evolution based on the predictive/hybrid video coding architecture, which has been continually enriched by new or enhanced coding tools to provide higher compression for the same quality, typically at the cost of higher encoding complexity. A key concept of current predictive video compression [1] is to remove the (temporal and spatial) redundancies by predicting the picture (or pixels) and transmitting only the prediction error with respect to the reference picture (or pixels), thus reducing the bandwidth and storage space needed for compressed video data. In conventional video coding, the video content is mainly compressed on the encoder side; therefore, a rather high encoding complexity is one of its inevitable characteristics. This type of video coding architecture is well suited to applications such as broadcasting [2], which can afford a high encoding complexity, but not to recent emerging applications where low encoding complexity is a must. With the wide deployment of mobile and wireless networks [3], there is a growing number of applications which do not fit the typical down-link model, but rather require an up-link model where many senders deliver data to a central receiver. Examples of such applications are wireless digital video cameras, low-power video sensor networks, and video surveillance cameras. Video recording by mobile phones can also be placed in this category. Typically, these emerging applications require light encoding or, at least, a flexible distribution of the computational complexity over the encoder and decoder. They also prefer robustness to packet losses, high compression efficiency and, in many cases, low latency/delay [9].
There is also a growing need for multi-view video encoding [4] by mobile terminals, which the conventional scheme based on joint encoding cannot provide. Note that in the 1970s, the conventional joint video coding problem was revisited in the light of an information-theoretic result known as the Slepian-Wolf Theorem [5]. This theorem addresses the case where two statistically dependent signals, X and Y, although correlated, are independently encoded. Surprisingly, the theorem says that the minimum rate required to encode the two (correlated) sources independently can be made the same as the minimum rate for joint encoding, with an arbitrarily small probability of error, under the assumption that the two sources having certain statistical characteristics are jointly decoded. This is a very interesting result in the context of the emerging challenges mentioned above, since it opens the door to a new coding paradigm where, at least in theory, separate encoding does not induce any loss of compression efficiency

0018-9316/$31.00 © 2012 IEEE


when compared to the traditional joint encoding paradigm. While the Slepian-Wolf Theorem deals with lossless coding, which is not the most relevant case for practical video coding solutions, the so-called Wyner-Ziv Theorem developed in 1976 [6] by A. Wyner and J. Ziv states that independent encoding can also be made no worse than joint encoding in the case of lossy coding. A coding scheme based on these two theorems is generally referred to as a distributed coding solution and, when it is applied to video content, it is called distributed video coding (DVC) [7], [8]. In contrast to conventional video coding techniques, such as MPEG-x and H.264/AVC [1], DVC can provide unprecedented functional benefits which are very important for many new emerging applications requiring features such as: i) flexible allocation of the video coding complexity over the encoder and decoder; ii) improved error resilience [9]; iii) codec-independent scalability; and iv) friendliness to exploiting multi-view correlation [10]. The coding efficiency of DVC is still in need of improvement, especially in the case of high motion video sequences, and this has spurred much research. For example, to improve the coding efficiency, some works studied the generation of better side information [11]–[15], more accurate correlation noise estimation [16]–[18], or more effective reconstruction processes [19]. In addition, when the coding efficiency of TDWZ (Transform Domain Wyner-Ziv Video Coding) is worse than that of H.264/AVC intra coding (e.g., in high motion regions), the intra mode has been selected to cope with this problem [20]–[25]. This coding tool can be executed at either the encoder [20]–[24] or the decoder [25]. Besides the coding efficiency improvement, another open research issue in DVC is its complexity-control capability. In contrast to the conventional video coding architecture, the motion estimation and rate control processes are shifted from the encoder towards the decoder in DVC schemes.
This can make the decoder extremely complex and thus precludes its usage in a large number of real video coding applications. Since more and more applications are emerging in which the encoder and decoder are equipped in the same device, the flexible distribution of the complexity between the encoder and decoder is of the utmost importance. To deal with this emerging need, three approaches to the complexity distribution in DVC have been proposed so far, as described below.

Decoding complexity reduction: In this approach, reducing the decoding complexity is the main issue. In DVC, the motion estimation and rate control tasks are shifted from the encoder to the decoder, which results in an extremely high decoding complexity, as analysed in [26]. Several decoding complexity reduction solutions have been proposed. In [27], for example, a novel LDPC decoder architecture using the “forced convergence” approach is presented. It uses a thresholding rule to identify those nodes that already have a strong “confidence” in their probability of being in the state of 0 or 1. These nodes are deactivated and their respective messages are no longer updated, resulting in a reduction of the decoding complexity. Meanwhile, in [30], [31], an early stopping criterion is proposed to reduce the decoding complexity. Although these solutions successfully reduce the decoding complexity with only a slight loss of coding efficiency, they do not provide a sufficiently powerful complexity control capability with which the user can easily distribute the complexity over the encoder and decoder according to the various demands of power-resource constraints.

Sharing motion estimation: In this second approach, the motion estimation is performed at both the encoder and decoder [33], [35], [36]. This has two purposes: the first is to improve the quality of the side information by receiving accurate motion vectors derived from the encoder, and the second is to reduce the complexity on the decoder side in accordance with its restricted power resources. However, this solution still has two drawbacks. Firstly, since it moves part or all of the motion estimation task back to the encoder, it may cause a significant increase in the encoding complexity, which should be avoided from the DVC point of view. Moreover, because only the generation of the side information is influenced by this solution, the reduction in the decoding complexity is quite limited. Therefore, this solution is not really effective in terms of reducing the decoding complexity.

Hybrid predictive-DVC codec: In this solution, a combination of conventional predictive video coding and DVC is proposed. Several coding modes with different complexities have been introduced. In the so-called Hybrid-spatial mode [37], the macroblocks in a WZ frame are grouped into two subsets, S1 and S2 (using a checkerboard pattern), and motion estimation is performed at the encoder for one of them. That is, at the encoder, the techniques developed for conventional predictive video coding are directly applied to one set of macroblocks while, for the other set, WZ encoding is performed. In the Hybrid-subsample mode [38], the motion search space for each macroblock is restricted at the encoder, instead of restricting the number of macroblocks for which encoder-side motion estimation is performed.
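As a concrete illustration of the Hybrid-spatial mode [37], the checkerboard grouping of macroblocks into the two subsets S1 and S2 can be sketched as follows. This is a minimal sketch: the function name and the QCIF macroblock grid size are our assumptions, not code from [37].

```python
# Assign the macroblocks of a WZ frame to two subsets, S1 and S2,
# in a checkerboard pattern (Hybrid-spatial mode sketch). S1 would
# receive encoder-side motion estimation; S2 would be WZ-encoded.
def checkerboard_subsets(mb_rows, mb_cols):
    s1, s2 = [], []
    for r in range(mb_rows):
        for c in range(mb_cols):
            # Alternate assignment by the parity of the grid position.
            (s1 if (r + c) % 2 == 0 else s2).append((r, c))
    return s1, s2

# QCIF luma (176x144) has an 11x9 grid of 16x16 macroblocks.
s1, s2 = checkerboard_subsets(9, 11)
print(len(s1), len(s2))  # 50 49
```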
These two modes, in addition to the DVC and predictive coding modes, provide the user with a flexible way to satisfy the power-resource requirements. To address the drawbacks of the existing solutions, this paper proposes a more flexible complexity control solution, in which the user can easily set the control mechanism either at the encoder or at the decoder, depending on the purpose. The key concept of the proposed complexity control is the selection between intra and WZ coding. Therefore, an intra mode selection algorithm, which effectively exploits the temporal and spatial coherencies, is introduced first. Moreover, instead of controlling the computational complexity only on the encoder side [29], the user can also control the complexity on the decoder side by using the available decoding resources. This opens the door to many new video coding applications in which the complexity control needs to be easier and more flexible. The rest of this paper is organized as follows. In Section II, transform domain Wyner-Ziv video coding is briefly described, along with a review of related works. The proposed intra mode selection algorithm and complexity control solutions, which are embedded in TDWZ, are presented in Section III. The experimental results are shown and discussed in Section IV. Finally, in Section V, some concluding remarks and suggestions for future work are made.

II. TRANSFORM-DOMAIN WYNER-ZIV VIDEO CODING

In Transform Domain Wyner-Ziv video coding [28], the input sequence is separated into two parts: the key and Wyner-Ziv


Fig. 1. Transform domain Wyner-Ziv video coding.

Fig. 2. RD performance comparison between TDWZ [28], H.263 (intra), and H.264/AVC intra (main profile).

frames. The key frames are typically compressed using conventional intra coding such as H.263 or H.264/AVC intra, owing to its low encoding complexity, and the WZ frames are coded using WZ coding (see Fig. 1). The transform coefficients are computed using the Discrete Cosine Transform (DCT), thanks to its good energy compaction property. Note that if a transform is not used, the codec is labeled Pixel Domain Wyner-Ziv (PDWZ). Since TDWZ is the more popular architecture and typically shows better rate-distortion (RD) performance than PDWZ [28], the TDWZ architecture is used in the experiments to evaluate the efficiency of the proposed complexity control solution. The main feature that gives the Wyner-Ziv encoder its very low computational complexity is the shift of the motion estimation task from the encoder to the decoder. In conventional video coding such as H.264/AVC, the motion estimation process, which demands extremely high computational complexity, is placed at the encoder to exploit the redundancy in the temporal direction. In contrast, TDWZ performs the motion estimation at the decoder in order to generate the side information that exploits the correlation between the key and WZ frames. After estimating the positions of the erroneous bits in the side information at the TDWZ decoder, by referring to a channel noise model (CNM) [16] assuming a Laplacian distribution of the noise (difference) between the original WZ frame and the side information frame, parity bits to correct the errors are requested from the encoder over a feedback channel. The motion estimation and the large number of iterations in the LDPC decoder are the main sources of decoding complexity in the Wyner-Ziv decoder.

III. PROPOSED INTRA MODE SELECTION ALGORITHM AND COMPLEXITY CONTROL SOLUTION

A. Motivations

Although TDWZ is attractive in many video applications [10], its limited coding efficiency and excessive decoding complexity have restricted its wide deployment. Fig. 2 illustrates

algorithm, this paper proposes a simple measurement for evaluating the correlation among the blocks in the temporal and spatial directions. A method of selecting the proper coding mode is derived by directly comparing these two measurements. Assume that we have three consecutive frames F(t-1), F(t), and F(t+1), where F(t-1) and F(t+1) are the closest temporal neighbors of frame F(t). To evaluate the temporal coherency of F(t) with the others, the sum of absolute differences (SAD) between the co-located blocks of F(t-1) and F(t+1) is computed as in (1):

SAD_T = Σ_b Σ_(i,j) | F(t-1)(i, j) − F(t+1)(i, j) |    (1)

Fig. 3. Computational complexity comparison between TDWZ [28] and H.264/AVC intra coding (main profile).
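The frame-level temporal SAD of (1) can be sketched in a few lines of NumPy. This is an illustrative sketch under our reading of the garbled equation: the SAD is taken between co-located pixels of the two key frames surrounding the WZ frame, and the synthetic 8-bit QCIF frames are assumptions for the example only.

```python
import numpy as np

def temporal_sad(f_prev, f_next):
    # Temporal coherence per eq. (1): sum of absolute differences between
    # co-located pixels of the two key frames that surround the WZ frame,
    # accumulated over all blocks, i.e. over the whole frame.
    return int(np.abs(f_prev.astype(np.int64) - f_next.astype(np.int64)).sum())

rng = np.random.default_rng(0)
f_prev = rng.integers(0, 256, (144, 176))  # synthetic QCIF luma, assumed 8-bit
f_next = f_prev.copy()
print(temporal_sad(f_prev, f_next))  # 0 for identical key frames
```

A small SAD_T indicates strong temporal coherence, i.e. the surrounding key frames are good predictors of the WZ frame.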

an RD performance comparison between the TDWZ architecture (Fig. 1), H.263 (intra), and H.264/AVC intra coding (main profile) [34] using four sequences: Foreman, Hall monitor, Coastguard, and Soccer. It shows that for sequences with slow to moderate motion, such as Hall monitor or Coastguard, TDWZ is clearly better than H.264/AVC intra in terms of coding efficiency. However, for high motion sequences such as Soccer, the opposite is true. This motivated us to consider the possibility that an appropriate combination of TDWZ and H.264/AVC intra might provide a better video coding scheme for both complexity control and coding efficiency. Although a lot of effort has been made to find better algorithms for intra coding mode selection [20]–[25], problems remain related to the dependency on thresholds [20]–[23] or parameters [24], [25]: the user needs to find the best pre-defined threshold or parameter before executing the coding mode selection algorithm. This makes the existing algorithms difficult to use in real video compression applications, since the user has no clue how to set the thresholds or parameters. Moreover, to evaluate the computational complexity of the H.264/AVC intra [34] and TDWZ [28] paradigms, the processing time of 149 frames of the Foreman sequence is compared in Fig. 3 (the quantization matrix for WZ frame coding and the quantization parameter for key frame coding are set as in [32]). The TDWZ decoding complexity is far higher than that of intra coding, while the difference in encoding complexity between TDWZ and intra coding is moderate. Therefore, with a proper selection between the TDWZ and intra coding modes, we can easily control the decoding complexity while keeping the encoder complexity as low as possible. This is highly desirable for real video applications under power-resource constraints.

B. Temporal and Spatial Coherence Measurements

In principle, WZ video coding exploits the correlation between the pixels in the temporal direction to generate the side information, while H.264/AVC intra coding utilizes the correlation between the pixels only in the spatial direction. Therefore, if a frame has more coherence in the spatial than in the temporal direction, it is logical to expect the frame to be better encoded as intra than as WZ. For the intra mode selection

Notations:
(i, j): pixel index; b: block index; F: frame; T: temporal.
The summation in (1) is taken over the given frame. The spatial coherency of the current frame is determined based on the SAD between a block and its neighboring blocks, using one of the following equations:

SAD_S = Σ_b [ SAD(I, A) + SAD(I, B) + SAD(I, C) ]    (2)

SAD_S = Σ_b med*( SAD(I, A), SAD(I, B), …, SAD(I, H) )    (3)

SAD_S = Σ_b mean( SAD(I, A), SAD(I, B), …, SAD(I, H) )    (4)

where:
med*: modified median function; mean: mean function; S: spatial.
A, B, C, …, H are the neighboring blocks of the current block I in Fig. 4, and SAD(I, X) denotes the sum of absolute pixel differences between block I and a neighboring block X. Note that, in (3), to calculate the modified median value for an even number of input arguments, an ordered list of the input elements is first created and the arithmetic mean of the two middlemost elements in the ordered list is taken as the modified median value. For example, suppose we are calculating med*(1, 6, 2, 8, 7, 2). The ordered list of the input arguments is {1, 2, 2, 6, 7, 8} and its two middlemost terms are {2, 6}; thus, the modified median value is their arithmetic mean, (2 + 6)/2 = 4. Since the pixels in the current block are predicted from those in the decoded neighboring blocks [1] in H.264/AVC intra coding, the SAD values in (2)–(4) can measure (in the opposite


coding mode decision algorithm in TDWZ for the purpose of obtaining the best coding efficiency). Therefore, compared to the previous works [20]–[25], which addressed only the intra coding mode decision issue, the proposed method has the advantage of being able to perform mode selection without using any thresholds or parameters.

D. Proposed TDWZ With Encoder-Based Complexity Control

Fig. 4. Block-coherence in temporal and spatial directions.
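The spatial coherence measurements of (2)–(4), including the modified median med*, can be sketched as follows. This is a hedged sketch: the exact aggregation forms in (2)–(4) are our reading of the garbled equations, the block size is an arbitrary assumption, and the mapping of offsets to the neighbor labels A–H of Fig. 4 is illustrative.

```python
import numpy as np

def modified_median(values):
    # med* in (3): for an even number of inputs, the arithmetic mean of the
    # two middlemost elements of the ordered list; the usual median otherwise.
    v = sorted(values)
    n, mid = len(values), len(values) // 2
    return v[mid] if n % 2 else (v[mid - 1] + v[mid]) / 2

def block_sad(a, b):
    # SAD(I, X): sum of absolute pixel differences between two blocks.
    return int(np.abs(a.astype(np.int64) - b.astype(np.int64)).sum())

def spatial_sad(frame, bs=4, mode="abc"):
    # Frame-level spatial coherence sketch. "abc" keeps only the left,
    # top-left and top neighbors as in eq. (2); "median"/"mean" combine all
    # available neighbors A..H with med* (eq. (3)) or the mean (eq. (4)).
    h, w = frame.shape
    total = 0.0
    for r in range(0, h, bs):
        for c in range(0, w, bs):
            cur = frame[r:r + bs, c:c + bs]
            offsets = ([(0, -bs), (-bs, -bs), (-bs, 0)] if mode == "abc" else
                       [(dr, dc) for dr in (-bs, 0, bs) for dc in (-bs, 0, bs)
                        if (dr, dc) != (0, 0)])
            sads = [block_sad(cur, frame[r + dr:r + dr + bs, c + dc:c + dc + bs])
                    for dr, dc in offsets
                    if 0 <= r + dr and r + dr + bs <= h
                    and 0 <= c + dc and c + dc + bs <= w]
            if not sads:
                continue  # corner block with no causal neighbors
            if mode == "abc":
                total += sum(sads)
            elif mode == "median":
                total += modified_median(sads)
            else:
                total += sum(sads) / len(sads)
    return total

print(modified_median([1, 6, 2, 8, 7, 2]))  # 4.0, as in the paper's example
```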

sense) the correlation of the pixels in the spatial direction. Among these three, the best one is chosen heuristically in the experiments.

C. Proposed Intra Mode Selection and Complexity Control Solution

A comparison between the spatial and temporal coherency is implicitly made by comparing the temporal and spatial SAD values. Moreover, for flexible complexity control, we would like to distribute the complexity over the encoder and decoder by taking account of the percentage of intra coding mode, based on the following inequality:

SAD_S < γ · SAD_T    (5)

Note that a frame satisfying inequality (5) is considered to be spatially more coherent, and thus it is better to encode it in intra mode. With various gamma factors γ, an appropriate percentage of intra coding mode can be chosen, which makes flexible complexity control possible. The spatial and temporal coherencies can easily be computed using (1) and (2) (or (3), (4)). Let the ratio of the spatial to temporal coherencies measured in SAD for frame t be ρ_t:

ρ_t = SAD_S(t) / SAD_T(t)    (6)

Thus, the percentage (%) of WZ frames which are converted to intra frames by the proposed decision algorithm can be calculated using the following equation:

P = (100 / N_WZ) · Σ_t 1(ρ_t < γ)    (7)

where:
N_WZ: number of WZ frames; t: frame index; 1(·): indicator function.
Intuitively, as the gamma value increases, the percentage of intra mode increases and, therefore, the decoding complexity decreases (and the encoding complexity increases). This provides an easy way to control the complexity of DVC. Note that the gamma factor used here is employed to solve the complexity control problem. The gamma factor can be ignored (that is, γ = 1 can be used) if the user is not interested in complexity control (that is, if the user only wants to employ the intra

This sub-section describes how to apply the complexity control mechanism to TDWZ on the encoder side. The input to the encoder is a video sequence which is separated into WZ and key frames. Both the WZ and key frames are exploited to decide the intra frame coding mode and control the complexity of TDWZ (Fig. 5). The encoding and decoding processes of the WZ frames can be described as follows. For each WZ frame, the coding mode is first decided based on the key frames, which are used in calculating the temporal coherency, the WZ frame itself, which is used in calculating the spatial coherency, and a gamma factor, which adjusts the number of intra frame modes, i.e., the degree of distribution of the computational complexity between the encoder and decoder. The gamma factor can be pre-set or controlled online by the user at the encoder to meet the complexity constraints or power resources at the encoder and decoder. When the power capacity of the decoder decreases, the user can increase the gamma factor until the decoding computational complexity satisfies the power capacity of the decoder. This is very useful for real-time applications. After selecting the proper coding mode, the proposed encoder performs intra encoding of those WZ frames having higher coherency in the spatial direction than in the temporal direction. The mode information is transmitted to the decoder so that the WZ frames can be reconstructed correctly. At the decoder, after receiving the mode information, the complexity control component automatically selects the intra decoding process to reconstruct the WZ frames signaled by the mode information.

E. Proposed TDWZ With Decoder-Based Complexity Control

On the decoder side, first of all, the H.264/AVC intra decoder is executed to reconstruct the decoded key frames.
Therefore, when the complexity control mechanism is performed at the decoder, only the decoded key frames are exploited to adjust the complexity. The TDWZ architecture with the proposed decoder-based complexity control solution is shown in Fig. 6, and its encoding and decoding processes are described as follows. The key frame coding is performed at the encoder, and the decoded key frames are used as the inputs to the proposed complexity control block. Here, to calculate the degree of coherency in the temporal direction, we directly refer to these decoded key frames. However, due to the absence of the original WZ frames at the decoder, we create an approximate version of each WZ frame by averaging the two decoded key frames, in order to compute the coherence of the WZ frames in the spatial direction. At the decoder, the user can control the computational complexity of the TDWZ codec by adjusting the gamma factor. The mode information is sent back to the encoder and, at the encoder, the WZ frames are processed by either WZ encoding or intra encoding


Fig. 5. Proposed TDWZ with encoder-based complexity control.

Fig. 6. Proposed TDWZ with decoder-based complexity control.

based on the mode information which is received from the decoder.
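The decoder-side approximation step described in Section III-E, averaging the two decoded key frames to stand in for the missing original WZ frame, can be sketched as follows. The synthetic frame contents and the integer averaging are our assumptions for illustration only.

```python
import numpy as np

def approximate_wz_frame(key_prev, key_next):
    # Decoder-side stand-in for the missing original WZ frame: the average
    # of the two decoded key frames, as described in Section III-E.
    # Widening to uint16 avoids 8-bit overflow in the sum.
    avg = (key_prev.astype(np.uint16) + key_next.astype(np.uint16)) // 2
    return avg.astype(np.uint8)

rng = np.random.default_rng(1)
key_prev = rng.integers(0, 256, (144, 176), dtype=np.uint8)
key_next = rng.integers(0, 256, (144, 176), dtype=np.uint8)
approx = approximate_wz_frame(key_prev, key_next)
# Spatial coherence would then be measured on `approx`, temporal coherence
# on the decoded key frames, and the gamma decision applied as at the encoder.
print(approx.shape, approx.dtype)
```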

TABLE I QUANTIZATION PARAMETER SETTING FOR KEY FRAME

IV. EXPERIMENTAL RESULTS AND DISCUSSION

In order to evaluate and verify the efficiency of the proposed intra mode selection algorithm and complexity control solution, the following three figures of merit are measured: i) the accuracy of the proposed intra mode selection algorithm, ii) the coding efficiency of the proposed intra mode selection algorithm with variable gamma factors, and iii) the efficiency of the proposed complexity control solution for TDWZ. The test sequences are Foreman, Coastguard, and Soccer; these sequences represent a variety of contents, which is important for obtaining representative and meaningful results. All 149 frames of each sequence are used at the encoder with a group of pictures (GOP) structure of 2. The spatial and temporal resolutions are QCIF and 15 Hz, respectively, which means 7.5 Hz for the WZ frames. The same quantization matrix and quantization parameter are used as in the test conditions of the DISCOVER codec [32], and they are shown in Table I. Here, four RD points are examined to test the coding efficiency. The reference software JM 17.2 [34] is used for processing the key frames, and the WZ frames when the intra mode is chosen.

A. Mode Selection Accuracy

The mode selection accuracy is compared graphically. A comparison between SAD_S (SAD—Spatial), SAD_T (SAD—Temporal), and the bitrates used by the WZ and intra

Qm: quantization matrix number for WZ frame

modes, respectively, is given in Fig. 7 for TDWZ under the proposed encoder-based complexity control. It is clear that the proposed method shows high accuracy in terms of coding mode selection, since the portion of the frames having the lower SAD value between the temporal and spatial directions coincides very well with that having the lower bitrate of the corresponding WZ and intra modes (see Fig. 7). By using (6) and (7), the percentage of WZ frames which are converted to intra frames when applying the proposed decision algorithm is also observed to be 20.3%, 4.1%, and 83.4% for the Foreman, Coastguard, and Soccer sequences, respectively. In order to choose the best representative equation among (2)–(4) for the spatial coherence measurement, the coding efficiency is compared, as shown in Table II. Here, BDPSNR and BDRATE [39] are also computed as a summary over the four RD test points. Table II shows that (2) is the best measurement for the spatial coherency, because the BDRATE values of (3) and (4) compared to the anchor (i.e., (2)) are in most cases positive, while BDPSNR is negative.
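The BDPSNR summary used in Table II follows the Bjøntegaard measurement of [39]: fit a third-order polynomial through the RD points of each codec over log-rate, integrate both fits over the overlapping rate range, and average the gap. A common sketch of that computation is shown below; the sample RD points are invented for illustration only.

```python
import numpy as np

def bd_psnr(rate_a, psnr_a, rate_b, psnr_b):
    # Bjontegaard delta-PSNR [39] (sketch): fit PSNR as a cubic in
    # log10(rate) for anchor A and competitor B, integrate both fits over
    # the overlapping rate range, and return the average PSNR gap of B
    # over A (positive means B is better than the anchor).
    la, lb = np.log10(rate_a), np.log10(rate_b)
    pa = np.polyfit(la, psnr_a, 3)
    pb = np.polyfit(lb, psnr_b, 3)
    lo, hi = max(la.min(), lb.min()), min(la.max(), lb.max())
    ia = np.polyval(np.polyint(pa), hi) - np.polyval(np.polyint(pa), lo)
    ib = np.polyval(np.polyint(pb), hi) - np.polyval(np.polyint(pb), lo)
    return (ib - ia) / (hi - lo)

# Invented four-point RD curves, mirroring the four RD test points.
rates = np.array([100.0, 200.0, 400.0, 800.0])  # kbps, assumed
psnr = np.array([30.0, 33.0, 36.0, 39.0])       # dB, assumed
print(round(bd_psnr(rates, psnr, rates, psnr + 1.0), 3))  # 1.0: uniform 1 dB gain
```

BDRATE is computed analogously by fitting log-rate as a function of PSNR and converting the average log-rate gap to a percentage.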


Fig. 7. SAD and Rate Comparison (under the proposed encoder-based complexity control).

Note that a positive BDPSNR or a negative BDRATE indicates that the anchor performs worse than its competitor [39]. Therefore, to measure the spatial coherence, only the neighboring blocks to the left and above the current block (A, B, and C), as indicated in (2), should be chosen. The other surrounding blocks, D, E, …, H, can be ignored, since they have little impact on the spatial coherence measurement.

B. Coding Efficiency of the Proposed Intra Mode Selection Algorithm

To verify the coding efficiency of the proposed intra mode selection algorithm, two experiments are performed. First of all, Figs. 8 and 9 show a comparison of the RD performance for the following three settings: (i) the SKKU codec [29]: this video codec is developed based on the TDWZ structure [28]; the side information is generated as in [11], and an LDPCA [40] code is

used to correct the error bits in the side information; (ii) TDWZ [28] with the proposed encoder-based complexity control: this codec is developed based on the structure in Fig. 5, where the complexity control is performed on the encoder side and the gamma factor is set to values in {0.5, 1.0, 1.5, 2.0}; (iii) TDWZ [28] with the proposed decoder-based complexity control: this codec is created based on the structure in Fig. 6, where the complexity control is performed on the decoder side. An interesting conclusion derived from these experiments is that the highest coding efficiency is achieved with the proposed schemes when the gamma factor is set to 1. This means that by directly comparing the spatial coherency to the temporal coherency, we can choose the best coding mode fairly well for the WZ frames. In the second experiment, a comparison between the two proposed coding schemes (encoder-based and decoder-based)


TABLE II CODING EFFICIENCY COMPARISON AMONG DIFFERENT SPATIAL COHERENCY MEASUREMENTS

Negative BDPSNR means the anchor is better; positive BDRATE means the anchor is better.

Fig. 8. TDWZ with encoder-based complexity control.

when the gamma factor is set to 1, with the SKKU codec [29] and the DISCOVER codec [32], is shown in Fig. 10. For all sequences and a variety of rate constraints, the proposed coding schemes always outperform the SKKU and DISCOVER codecs. This confirms the efficiency of the proposed intra mode selection algorithm. Note that a video sequence with higher motion obtains a higher coding gain when the proposed additional intra coding mode decision is used. The coding gain reaches up to 2 dB with the proposed method when compared with the TDWZ codec without the intra frame mode decision, notably for the Soccer sequence. However, for sequences with slow motion, such as the Coastguard (see Fig. 10) or Hall monitor (not shown) sequences, the proposed method shows coding performance similar to the conventional TDWZ without the intra mode decision. This is because, in a slow motion sequence, the temporal coherence is much stronger than the spatial one; therefore, the WZ

coding mode is dominantly chosen even when the proposed intra decision method is applied. This can also be expected from Fig. 2. Note that, in the case of the Hall monitor sequence, not a single frame was selected as an intra mode frame by the proposed mode selection; therefore, there was no coding performance difference, and its result is omitted from Fig. 10. Besides, as shown in Fig. 10, the coding efficiency afforded by TDWZ with the proposed encoder-based complexity control is similar to that obtained with the proposed decoder-based complexity control. Therefore, the proposed intra mode selection algorithm is suitable for both the encoder and decoder.

C. Computational Complexity

In this sub-section, the performance of the computational complexity control is examined. Here, the processing time is used to compare the computational complexity at the encoder and

HOANGVAN AND JEON: COMPLEXITY CONTROL SOLUTION FOR TRANSFORM DOMAIN WZ VIDEO CODING

217

Fig. 9. TDWZ with decoder-based complexity control.

Fig. 10. RD performance for QCIF video sequences at 15Hz and for GOP size of 2.

decoder. Fig. 11 shows the percentage of processing time spent in the three encoding processes, namely, mode decision, WZ coding, and intra coding for key frames and for WZ frames which are selected as intra mode by the proposed decision algorithm. The mode decision process, which is expected to have a low computational complexity, determines the coding mode for the WZ frames. In the WZ coding process, the gamma factor is adjusted to control the complexity of TDWZ; therefore, the number of WZ frames coded in the WZ mode depends mainly on this gamma factor. Note that the gamma factor is selected by the user: it is not estimated from the video data, but chosen at the user's discretion depending on the desired extent of complexity distribution between the encoder and decoder. For example, if the user wants a very low encoding complexity, the gamma is set to a low value; in this case, most WZ frames will be encoded using the WZ mode instead of the intra mode. On the contrary, when the gamma is set to a high value, most WZ frames will choose the intra mode, so the encoding complexity increases while the decoder complexity decreases in return. Therefore, by setting the gamma value appropriately, the complexity can be gradually shifted from the decoder toward the encoder, or vice versa, to meet a particular usage scenario depending on the user's demand for computational complexity distribution. The intra and key frame coding process includes the processing time spent in coding the key frames and the WZ frames which are selected for the intra coding mode. Compared to the other processing parts on the encoder side, the processing time of the mode decision is acceptable: it takes about 3% to 4% of the overall encoding processing time.
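The gamma-weighted decision described above can be pictured with a toy sketch. The spatial and temporal coherency measures below (mean absolute gradient, mean absolute frame difference) and the threshold form are illustrative stand-ins, not the paper's exact metrics:

```python
import numpy as np

def choose_mode(frame, prev_frame, gamma):
    """Toy per-frame mode decision between WZ and intra coding.

    Stand-in coherency measures (NOT the paper's exact metrics):
    spatial cost = mean absolute gradient, temporal cost = mean
    absolute difference from the previous frame. A larger user-chosen
    gamma biases more frames toward intra mode (shifting complexity to
    the encoder); gamma = 0 keeps every frame in the WZ mode.
    """
    f = frame.astype(np.float64)
    p = prev_frame.astype(np.float64)
    # Strong spatial coherency (small gradients) makes intra coding cheap.
    spatial_cost = (np.mean(np.abs(np.diff(f, axis=0)))
                    + np.mean(np.abs(np.diff(f, axis=1))))
    # Strong temporal coherency (small frame difference) favors WZ coding,
    # since the decoder can then generate good side information.
    temporal_cost = np.mean(np.abs(f - p))
    return "intra" if gamma * temporal_cost > spatial_cost else "WZ"

# A static scene stays in WZ mode for any gamma; a large temporal
# change on a smooth frame is pushed to intra mode once gamma > 0.
smooth = np.tile(np.arange(16.0), (16, 1))
```

With a rule of this shape, slow-motion content (low temporal cost) keeps the WZ mode regardless of gamma, consistent with the Hall monitor observation above.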


IEEE TRANSACTIONS ON BROADCASTING, VOL. 58, NO. 2, JUNE 2012

Fig. 11. Comparison of the percentage processing times by the processing components in the encoder (Foreman).

Fig. 12. Computational complexity in TDWZ with variable gamma values (Foreman, QCIF @15Hz).
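The gamma sweep in Fig. 12 can be mimicked with a deliberately simple load model; all per-frame cost constants below are invented for illustration, and only the direction of the trade-off reflects the measurements:

```python
def complexity_split(n_frames, intra_fraction,
                     enc_wz=1.0, dec_wz=20.0, enc_intra=5.0, dec_intra=1.0):
    """Illustrative encoder/decoder load model (cost units are made up).

    WZ-coded frames are cheap to encode but costly to decode (iterative
    channel decoding at the decoder); intra-coded frames are the
    opposite. Raising gamma raises the fraction of intra-coded WZ
    frames, moving complexity from the decoder to the encoder.
    """
    n_intra = round(n_frames * intra_fraction)
    n_wz = n_frames - n_intra
    return (n_wz * enc_wz + n_intra * enc_intra,   # encoder load
            n_wz * dec_wz + n_intra * dec_intra)   # decoder load

# gamma = 0 corresponds to intra_fraction 0 (all WZ) and gamma = 7 to
# intra_fraction 1 (all intra), the two extremes shown in Fig. 12:
all_wz = complexity_split(100, 0.0)
all_intra = complexity_split(100, 1.0)
```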

The key frame coding and the WZ frames with intra coding take up most of the processing time at the encoder. In Fig. 12, the decoding and encoding times (in seconds) for the Foreman sequence with varying gamma factors are examined to evaluate the efficiency of the proposed complexity control algorithms. Obviously, when gamma is set to 0, all WZ frames are coded in the WZ mode only; that is, no WZ frame is converted into an intra frame. Therefore, the case with gamma equal to 0 corresponds to the decoding and encoding times of the reference WZ coding. When gamma is set to 7, all WZ frames are coded in the intra mode only, so this case corresponds to the decoding/encoding times of full H.264/AVC intra coding. It is clear that as the gamma value is increased, the decoding complexity is reduced and the encoding complexity is increased under both

schemes of encoder complexity control (Fig. 12(a), (b)) and decoder complexity control (Fig. 12(c), (d)). This mechanism is particularly useful when the computing resources at the decoder are insufficient for proper processing, since the user can easily change the complexity balancing option to adapt to this resource constraint.

V. CONCLUSION

This paper proposed a novel complexity control solution for TDWZ based on an adaptive selection between intra and WZ coding. An effective intra mode selection algorithm is first introduced and used for complexity control. The complexity control mechanism can be performed at either the encoder or the decoder, which makes the proposed solution more flexible than those in previous works. The experimental results confirmed the


effectiveness of the proposed solution in three respects: i) the accuracy of the intra mode selection algorithm, ii) the coding performance of the proposed schemes, and iii) the complexity control capability. The experiments showed that the proposed solution is effective not only in distributing the complexity of TDWZ, but also in retaining a low encoding complexity. This is an essential feature for real video coding applications. In future work, not only the intra and WZ coding modes, but also the predictive and hybrid predictive-distributed video coding modes will be investigated and exploited in the TDWZ architecture to adapt it for different purposes.

REFERENCES

[1] T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, “Overview of the H.264/AVC video coding standard,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 560–576, Jul. 2003.
[2] P. Nasiopoulos and R. K. Ward, “Effective multi-program broadcasting of pre-recorded video using VBR MPEG-2 coding,” IEEE Trans. Broadcast., vol. 48, no. 3, pp. 207–214, Sep. 2002.
[3] C. Xu, E. Fallon, Y. Qiao, L. Zhong, and G.-M. Muntean, “Performance evaluation of multimedia content distribution over multi-homed wireless networks,” IEEE Trans. Broadcast., vol. 57, no. 2, pp. 204–215, Jun. 2011.
[4] X. Guo, Y. Lu, F. Wu, D. Zhao, and W. Gao, “Wyner-Ziv based multiview video coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 6, pp. 713–724, Jun. 2008.
[5] D. Slepian and J. K. Wolf, “Noiseless coding of correlated information sources,” IEEE Trans. Inf. Theory, vol. IT-19, pp. 471–480, Jul. 1973.
[6] A. D. Wyner and J. Ziv, “The rate-distortion function for source coding with side information at the decoder,” IEEE Trans. Inf. Theory, vol. IT-22, no. 1, pp. 1–10, Jan. 1976.
[7] B. Girod, A. Aaron, S. Rane, and D. Rebollo-Monedero, “Distributed video coding,” Proc. IEEE, vol. 93, no. 1, pp. 71–83, Jan. 2005 (invited paper).
[8] R. Puri, A. Majumdar, and K. Ramchandran, “PRISM: A video coding paradigm with motion estimation at decoder,” IEEE Trans. Image Process., vol. 16, no. 10, pp. 2436–2448, Oct. 2008.
[9] Z. Xue, K. K. Loo, J. Cosmas, M. Tun, L. Feng, and P.-Y. Yip, “Error-resilience scheme for wavelet video codec using automatic ROI detection and Wyner-Ziv coding over packet erasure channel,” IEEE Trans. Broadcast., vol. 56, no. 4, pp. 481–493, Dec. 2010.
[10] F. Pereira, L. Torres, C. Guillemot, T. Ebrahimi, R. Leonardi, and S. Klomp, “Distributed video coding: Selecting the most promising application scenarios,” Signal Process.: Image Commun., vol. 23, pp. 339–352, 2008.
[11] J. Ascenso, C. Brites, and F. Pereira, “Improving frame interpolation with spatial motion smoothing for pixel domain distributed video coding,” in Proc. 5th EURASIP Conf. Speech Image Process., Multimedia Commun. Serv., Smolenice, Slovak Republic, Jul. 2005.
[12] M. B. Badem, M. Mrak, and W. A. C. Fernando, “Side information refinement using motion estimation in DC domain for transform-based distributed video coding,” IET Electron. Lett., vol. 44, no. 16, pp. 965–966, Jul. 2008.
[13] R. Martins, C. Brites, J. Ascenso, and F. Pereira, “Refining side information for improved transform domain Wyner-Ziv video coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 19, no. 9, pp. 1327–1341, Sep. 2009.
[14] B. Ko, H. J. Shim, and B. Jeon, “Wyner-Ziv coding with spatio-temporal refinement based on successive turbo decoding,” in Proc. IEEE Int. Conf. Multimedia Expo (ICME), Jun. 2008, pp. 785–788.
[15] X. HoangVan, J. Park, and B. Jeon, “A flexible side information generation scheme using adaptive search range and overlapped block motion compensation,” in Proc. 5th Int. Conf. Ubiquitous Inf. Manag. Commun., Feb. 2011, no. 46.
[16] C. Brites and F. Pereira, “Correlation noise modeling for efficient pixel and transform domain Wyner-Ziv video coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 9, pp. 1177–1190, Sep. 2008.
[17] J. Skorupa, J. Slowack, S. Mys, P. Lambert, and R. V. Walle, “Accurate correlation modeling for transform domain Wyner-Ziv video coding,” in Proc. Pacific-Rim Conf. Multimedia (PCM), Dec. 2008, pp. 1–10.


[18] G. R. Esmaili and P. C. Cosman, “Wyner-Ziv video coding with classified correlation noise estimation and key frame coding mode selection,” IEEE Trans. Image Process., vol. 20, no. 9, pp. 2463–2474, Sep. 2011.
[19] Y. Vatis, S. Klomp, and J. Ostermann, “Enhanced reconstruction of the quantized transform coefficients for Wyner-Ziv coding,” in Proc. IEEE Int. Conf. Multimedia Expo, Aug. 2007, pp. 172–175.
[20] A. A. Trapanese, M. Tagliasacchi, S. Tubaro, J. Ascenso, C. Brites, and F. Pereira, “Embedding a block-based intra mode in frame-based pixel domain Wyner-Ziv video coding,” presented at the Int. Workshop Very Low Bitrate Video Coding, Sep. 2005.
[21] M. Tagliasacchi, A. Trapanese, S. Tubaro, J. Ascenso, C. Brites, and F. Pereira, “Intra mode decision based on spatio-temporal cues in pixel domain Wyner-Ziv video coding,” in Proc. IEEE ICASSP, Toulouse, France, May 2006, vol. 2, pp. 57–60.
[22] D.-C. Tsai, C.-M. Lee, and W.-N. Lie, “Dynamic key block decision with spatio-temporal analysis for Wyner-Ziv video coding,” in Proc. IEEE Int. Conf. Image Process., Nov. 2007, pp. 425–428.
[23] L. Liu, D.-K. He, A. Jagmohan, L. Lu, and E. J. Delp, “A low-complexity iterative mode selection algorithm for Wyner-Ziv video compression,” in Proc. IEEE Int. Conf. Image Process., Oct. 2008, pp. 1136–1139.
[24] J. Ascenso and F. Pereira, “Low complexity intra mode selection for efficient distributed video coding,” in Proc. IEEE Int. Conf. Multimedia Expo, Jul. 2009, pp. 101–104.
[25] S. Mys, J. Slowack, J. Skorupa, N. Deligiannis, P. Lambert, A. Munteanu, and R. V. Walle, “Decoder-driven mode decision in a block-based distributed video codec,” Multimedia Tools Appl., Jan. 2011.
[26] C. Brites, J. Ascenso, J. Q. Pedro, and F. Pereira, “Evaluating a feedback channel based transform domain Wyner-Ziv video codec,” Signal Process.: Image Commun., vol. 23, no. 4, pp. 269–297, Apr. 2008.
[27] E. Zimmermann, G. Fettweis, P. Pattisapu, and P. K. Bora, “Reduced complexity LDPC decoding using forced convergence,” presented at the Int. Symp. Wireless Personal Multimedia Commun., 2004.
[28] A. Aaron, S. Rane, E. Setton, and B. Girod, “Transform domain Wyner-Ziv codec for video,” in Proc. SPIE Vis. Commun. Image Process., Jan. 2004, pp. 520–528.
[29] X. HoangVan, J. Park, and B. Jeon, “Flexible complexity control based on intra frame mode decision for distributed video coding,” in Proc. IEEE Int. Symp. Broadband Multimedia Syst. Broadcast., Jun. 2011, pp. 1–5.
[30] W. A. Blad and L. O. Gustafsson, “An early decision decoding algorithm for LDPC codes using dynamic thresholds,” in Proc. Eur. Conf. Circuit Theory Des., Aug. 2005, pp. III/285–III/288.
[31] J. Ascenso and F. Pereira, “Complexity efficient stopping criterion for LDPC based distributed video coding,” presented at the 5th Int. ICST Mobile Multimedia Commun. Conf., 2009.
[32] “DISCOVER codec evaluation,” [Online]. Available: http://www.img.lx.it.pt/~discover/test_conditions.html
[33] T. Clerckx, A. Munteanu, J. Cornelis, and P. Schelkens, “Distributed video coding with shared encoder/decoder complexity,” in Proc. IEEE Int. Conf. Image Process., Nov. 2007, pp. 417–420.
[34] “H.264/AVC reference software,” [Online]. Available: http://iphome.hhi.de/suehring/tml/
[35] H. Chen and E. Steinbach, “Flexible distribution of computational complexity between the encoder and the decoder in distributed video coding,” in Proc. IEEE Int. Conf. Multimedia Expo, Hannover, Jun. 2008, pp. 801–804.
[36] C. Kim, M. Kim, and D. Suh, “Distributed video coding encoder/decoder complexity sharing method by phase motion estimation algorithm,” in Proc. IEEE Pacific-Rim Conf. Commun., Comput. Signal Process., 2009, pp. 849–853.
[37] S. Mys, J. Slowack, J. Skorupa, P. Lambert, and R. Van de Walle, “Dynamic complexity coding: Combining predictive and distributed video coding,” presented at the Picture Coding Symp. (PCS), Nov. 2007.
[38] J. Slowack, J. Skorupa, S. Mys, P. Lambert, C. Grecos, and R. Walle, “Flexible distribution of complexity by hybrid predictive-distributed video coding,” Signal Process.: Image Commun., vol. 25, no. 2, pp. 94–110, 2010.
[39] G. Bjontegaard, “Calculation of average PSNR differences between RD curves,” ITU-T Video Coding Experts Group (VCEG), document VCEG-M33, Apr. 2001.
[40] D. Varodayan, A. Aaron, and B. Girod, “Rate-adaptive codes for distributed source coding,” EURASIP Signal Process. J., Special Issue Distributed Source Coding, vol. 86, no. 11, pp. 3123–3130, Nov. 2006.


Xiem HoangVan received the B.Sc. degree from Hanoi University of Science and Technology, Vietnam, in 2009, and the M.Sc. degree from Sungkyunkwan University, Korea, in 2011, both in electrical engineering. He is currently working toward the Ph.D. degree in electrical and computer engineering at the Instituto Superior Técnico, Portugal. His research interests include video compression standards and distributed video coding. He is a member of ACM and KSOBE.

Byeungwoo Jeon (S’88–M’95–SM’10) received the B.S. degree (magna cum laude) in 1985 and the M.S. degree in 1987, both in electronics engineering, from Seoul National University, Seoul, Korea, and the Ph.D. degree in electrical engineering from Purdue University, West Lafayette, IN, in 1992. From 1993 to 1997, he was with the Signal Processing Laboratory, Samsung Electronics, Korea, where he conducted research and development on video compression algorithms, the design of digital broadcasting satellite receivers, and other MPEG-related research for multimedia applications. Since September 1997, he has been with the faculty of the School of Information and Communication Engineering, Sungkyunkwan University, Korea, where he is currently a Professor. He served as Project Manager of Digital TV and Broadcasting in the Korean Ministry of Information and Communications from March 2004 to February 2006, where he supervised all digital TV-related R&D in Korea. He has authored many papers in the areas of video compression, pre/post processing, and pattern recognition. His research interests include multimedia signal processing, video compression, statistical pattern recognition, and remote sensing. Dr. Jeon is a member of Tau Beta Pi and Eta Kappa Nu. He is also a member of SPIE, IEEK, KICS, and KSOBE. He was a recipient of the 2005 IEEK Haedong Paper Award from the Signal Processing Society, Korea.
