IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 16, NO. 1, JANUARY 2006
Kalman Filtering Based Rate-Constrained Motion Estimation for Very Low Bit Rate Video Coding Chung-Ming Kuo, Shu-Chiang Chung, and Po-Yi Shih
Abstract—The rate-constrained (R-D) motion estimation techniques have been presented to improve the conventional block-matching algorithm by using a joint rate and distortion criterion. This paper presents two motion estimation algorithms using Kalman filter to further enhance the performance of the conventional R-D motion estimation at a relative low computational cost. The Kalman filter exploits the correlation of block motion to achieve higher precision of motion estimation and compensation. In the first algorithm, the Kalman filter is utilized as a postprocessing to raise the motion compensation accuracy of the conventional R-D motion estimation. In the second algorithm, the Kalman filter is embedded into the optimization process of R-D motion estimation by defining a new R-D criterion. It further improves the rate-distortion performance significantly. Index Terms—Kalman filter, motion model, R-D motion estimation.
I. INTRODUCTION
Manuscript received August 3, 2003; revised January 9, 2004. This work was supported by National Science Council of Taiwan, R.O.C., under Grant NSC 92-2213-E-214-044, and in part by I-Shou University under Grant ISU-93-07-03. This paper was recommended by Associate Editor H. Sun. C.-M. Kuo and S.-C. Chung are with the Department of Information Engineering, I-Shou University, Kaohsiung 840, Taiwan, R.O.C. (e-mail: [email protected]; [email protected]). P.-Y. Shih is with VIA Technologies Inc., Hsin-Chu 300, Taiwan, R.O.C. (e-mail: [email protected]). Digital Object Identifier 10.1109/TCSVT.2005.857287

MOTION estimation plays an important role in video coding systems [1]–[7], such as H.26x and MPEG-x, and contributes significantly to bit rate reduction. Among the various motion estimation approaches, the block-matching algorithm (BMA) is the most popular one due to its simplicity and reasonable performance. In BMA, an image frame is divided into nonoverlapping rectangular blocks with equal or variable block sizes, and all pixels in each block are assumed to have the same motion. The motion vector (MV) of a block is estimated by searching for its best match within a search window in the previous frame. The distortion between the current block and each candidate block is employed as the matching criterion. The resulting MV is used to generate a motion compensated prediction block. The motion compensated prediction difference blocks (called residue blocks) and the MVs are encoded and sent to the decoder. In high-quality applications, the bit rate for MVs is much less than that for residues; thus, the bit rate for MVs can be neglected in MV estimation. However, in low or very low bit rate applications such as videoconference and videophone, the percentage of MV bit rate increases as the overall rate budget decreases. Thus, the coding of MVs takes up a significant portion of the bandwidth [9]. In very low bit rate compression, the motion compensation must therefore consider the assigned MV rate simultaneously. Thus, a joint rate and distortion (R-D) optimal motion estimation has been developed to achieve the trade-off between MV coding and residue coding [8]–[16]. In [13], a globally optimal R-D motion estimation scheme is developed. The scheme achieves a significant improvement in performance, but it employs the Viterbi algorithm for optimization, which is very complicated and results in a significant time delay. In [14], a locally optimal R-D motion estimation criterion was presented. It effectively reduces the complexity at the cost of performance degradation.

In this paper, we propose two Kalman filter-based methods to improve the conventional R-D motion estimation, which are referred to as the enhanced algorithm and the embedded algorithm, respectively. In the enhanced algorithm, the Kalman filter is employed as a postprocessing of the MV, which extends the integer-pixel accuracy of the MV to fractional-pixel accuracy, thus enhancing the performance of motion compensation. Because the Kalman filter exists in both the encoder and the decoder, the method achieves higher compensation quality without increasing the bit rate for MVs. In the embedded algorithm, the Kalman filter is applied directly during the optimization process of motion estimation. Since the R-D motion estimation considers compensation error (distortion) and bit rate simultaneously, applying the Kalman filter reduces the distortion and thus lowers the cost function. Therefore, the new algorithm can improve distortion and bit rate simultaneously. Moreover, the rate constraint used in this paper is a general criterion for motion estimation. Thus, the approaches we propose can be combined with existing advanced motion estimation algorithms such as overlapped block motion compensation (OBMC) [17], [18], and those recommended in H.264 or MPEG-4 AVC [19], [20]. The paper is organized as follows.
A brief review of the related works is presented in Section II. In Section III, we derive the enhanced R-D motion estimation based on the Kalman filter. In Section IV, we describe how to embed the Kalman filter into the R-D optimization process of motion estimation. Simulation results are presented in Section V and, finally, the conclusions are given in Section VI.

II. REVIEW OF RELATED WORKS

A. Kalman Filter

The Kalman filtering algorithm estimates the states of a system from noisy measurements [21], [22]. There are two major features in a Kalman filter. One is that its mathematical formulation is described in terms of a state-space representation, and the other is that its solution is computed recursively. It consists of two consecutive stages: prediction and updating. We summarize the Kalman filter algorithm as follows.

Predicted equation
$$x(k+1) = \Phi(k)\,x(k) + \Gamma(k)\,w(k) \qquad (1)$$
Measurement equation
$$z(k) = H(k)\,x(k) + v(k) \qquad (2)$$
where $x(k)$ and $z(k)$ are the state and measurement vectors at time $k$, and $\Phi(k)$, $H(k)$, and $\Gamma(k)$ are the state transition, measurement, and driving matrices, respectively. Generally, we assume that $w(k)$ and $v(k)$ are white Gaussian with $w(k) \sim N(0, Q(k))$, $v(k) \sim N(0, R(k))$, and $E[w(k)\,v^T(l)] = 0$ for all $k$, $l$. Let $E[x(0)] = \hat{x}(0)$ and $E\{[x(0)-\hat{x}(0)][x(0)-\hat{x}(0)]^T\} = P(0)$ be the initial conditions.

Prediction: see (3) and (4). Updating: see (5)–(7).

$P(k)$ is the error covariance matrix that is associated with the state estimate $\hat{x}(k)$. It is defined as
$$P(k) = E\{[x(k)-\hat{x}(k)][x(k)-\hat{x}(k)]^T\} \qquad (8)$$
This matrix provides a statistical measure of the uncertainty in $\hat{x}(k)$. The superscripts "$-$" and "$+$" denote "before" and "after" measurement, respectively.
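As a minimal sketch of the two-stage recursion summarized above, the following Python class runs one prediction and one update for a scalar state. The class name `ScalarKalman`, its field names, and the default values are illustrative assumptions rather than anything specified in the paper; the scalar form simply replaces each matrix of the state-space model by a number.

```python
from dataclasses import dataclass


@dataclass
class ScalarKalman:
    """Scalar Kalman filter; all names and defaults are illustrative assumptions."""
    phi: float = 1.0    # state transition coefficient
    gamma: float = 1.0  # driving coefficient
    h: float = 1.0      # measurement coefficient
    q: float = 1.0      # variance of the driving noise
    r: float = 1.0      # variance of the measurement noise
    x: float = 0.0      # current state estimate ("after" measurement)
    p: float = 1.0      # current error covariance ("after" measurement)

    def predict(self):
        """Return the 'before measurement' state and error covariance."""
        x_pred = self.phi * self.x
        p_pred = self.phi * self.p * self.phi + self.gamma * self.q * self.gamma
        return x_pred, p_pred

    def update(self, z):
        """Fold measurement z into the estimate and return the new state."""
        x_pred, p_pred = self.predict()
        gain = p_pred * self.h / (self.h * p_pred * self.h + self.r)
        self.x = x_pred + gain * (z - self.h * x_pred)
        self.p = p_pred - gain * self.h * p_pred
        return self.x


kf = ScalarKalman()
# The estimate lands between the prediction (0.0) and the measurement (3.0).
print(kf.update(3.0))
```

The gain depends only on the covariances and noise variances, not on the measurement itself, which is what keeps the recursion cheap when the model parameters are fixed.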
Fig. 1. Generic hybrid video coding system.
B. R-D Motion Estimation

In conventional motion estimation, a major consideration is to reduce the motion compensated prediction error such that the coding rate for the prediction error can be reduced. This is appropriate for high-rate applications because the bit rate for MVs is only a very small part of the overall transmission rate. However, in low or very low bit rate situations, the bit rate for MVs is a significant part of the available rate budget. For this reason, it should be taken into account in the process of motion estimation, and the criterion of motion estimation must be modified accordingly. In 1994, Girod addressed this problem first. He proposed a theoretical framework for rate-constrained motion estimation, and a new region-based motion estimation scheme [12]. In motion compensated hybrid coding, the bit rate can be divided into the displacement vector field, the prediction error, and additional side information. Very accurate motion compensation is not the key to better picture quality at low or very low bit rates. So, the problem of optimally allocating a limited rate budget to the displacement vector field and the motion-compensated prediction error is addressed. In 1998, Chen and Willson revisited this point [13] and analyzed the issue thoroughly. They explained a new estimation criterion in detail, and proposed a new rate-constrained motion estimation for general video coding systems. The performance of video compression depends not only on motion compensation but also on the rate budget, which includes the bit rate for MVs and the bit rate for the prediction error. Therefore, the optimal solution can be searched for throughout the convex hull of all possible R-D pairs by minimizing the total Lagrangian cost function given by (9), which also involves the quantization parameters of the blocks. This approach, however, is computationally intensive, involving a joint optimization between motion estimation/compensation and prediction residual coding schemes. By (9), we see that the discrete cosine transform (DCT) and quantization operations must be performed on an MV candidate basis in order to obtain the distortion and the rate. The significant computations make the scheme unacceptable for most practical implementations, whether in software or hardware. Thus, they
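State prediction
$$\hat{x}^-(k) = \Phi(k-1)\,\hat{x}^+(k-1) \qquad (3)$$
Prediction-error covariance
$$P^-(k) = \Phi(k-1)\,P^+(k-1)\,\Phi^T(k-1) + \Gamma(k-1)\,Q(k-1)\,\Gamma^T(k-1) \qquad (4)$$
State updating
$$\hat{x}^+(k) = \hat{x}^-(k) + K(k)\,[z(k) - H(k)\,\hat{x}^-(k)] \qquad (5)$$
Updating-error covariance
$$P^+(k) = P^-(k) - K(k)\,H(k)\,P^-(k) \qquad (6)$$
Kalman gain matrix
$$K(k) = P^-(k)\,H^T(k)\,[H(k)\,P^-(k)\,H^T(k) + R(k)]^{-1} \qquad (7)$$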
Fig. 2. Block diagram of the proposed enhanced R-D motion estimation algorithm.
simplify (9) by only considering the motion estimation error and the bit rate for MVs. Assume a frame is partitioned into $N$ blocks. Let $v_i$ be the MV estimated for block $i$. Then the motion field of a frame is described by the $N$-tuple vector $V = (v_1, v_2, \ldots, v_N)$. The joint R-D optimization can be interpreted as finding an MV field that minimizes the distortion under a given rate constraint, which can be formulated by the Lagrange multiplier method as follows:
$$\min_{V} \sum_{i=1}^{N} \left[ D_i(v_i) + \lambda R_i(v_i) \right] \qquad (10)$$
where $\lambda$ is the Lagrange multiplier, and $D_i$ and $R_i$ are the motion-compensated distortion and the number of bits associated with the MV of block $i$, respectively. In most video coding standards, the MVs of blocks are differentially coded using a Huffman code. Thus, the blocks are coded dependently. However, this simplification has two evident defects: 1) it is still too complex and 2) the performance is degraded. In the same year, Coban and Mersereau proposed a different scheme for the R-D optimization problem [14]. They regard (10) as a principle for the globally optimal solution of the R-D problem, but one that is difficult to implement. They supposed that, if each block is coded independently, the solution of (10) can be reduced to minimizing the Lagrangian cost function of each block, i.e.,
$$\min_{v_i} J_i(v_i) = D_i(v_i) + \lambda R_i(v_i), \quad i = 1, \ldots, N \qquad (11)$$
In order to simplify the problem, although the MVs are coded differentially, the blocks are treated as if they were coded independently. This leads to a locally optimal, globally suboptimal solution. In this way, the framework of R-D optimal motion estimation is close to that of conventional motion estimation. Although it saves computation by ignoring the relation between blocks, it reduces the overall performance.

III. ENHANCED R-D MOTION ESTIMATION USING KALMAN FILTER

The R-D motion estimation often yields smooth MV fields, as compared with conventional BMAs [13], [14]. In other words, the resulting MVs are highly correlated. In this work, we try to fully exploit the correlation of MVs by using the Kalman filter. This is motivated by our previous works [23], [24], in which
Fig. 3. Block diagram of the proposed embedded R-D motion estimation algorithm.
the Kalman filter is combined with the conventional BMAs to improve the estimation accuracy of MVs. A generic hybrid video coding system [1] is depicted in Fig. 1. Fig. 2 shows the block diagram of the proposed motion estimation technique, which consists of two cascaded stages: measurement of MV and Kalman filtering. We employ an R-D fast search scheme [9]–[11], [13]–[15] to obtain the measured MV. Then we model the MVs and generate the predicted MV utilizing the inter-block correlation. Based on the measured and predicted MVs, a Kalman filter is applied to obtain an optimal estimate of the MV. For the sake of simplicity in implementation, we employ the first-order autoregressive (AR) model to characterize the MV correlation. The MV of the block at location $(m,n)$ of the $i$-th frame is denoted by $v_i(m,n) = [v_{i,x}(m,n), v_{i,y}(m,n)]^T$, and its two components in the horizontal and vertical directions are modeled as
$$v_{i,x}(m,n) = a_1\,v_{i,x}(m,n-1) + w_{i,x}(m,n) \qquad (12)$$
$$v_{i,y}(m,n) = b_1\,v_{i,y}(m,n-1) + w_{i,y}(m,n) \qquad (13)$$
where $w_{i,x}(m,n)$ and $w_{i,y}(m,n)$ represent the model error components. In order to derive the state-space representation, the time indexes $k$ and $k-1$ are used to represent the current block location $(m,n)$ and the left-neighbor block location $(m,n-1)$, respectively. Consequently, the state-space representation of (12) and (13) is
$$\begin{bmatrix} v_x(k) \\ v_y(k) \end{bmatrix} = \begin{bmatrix} a_1 & 0 \\ 0 & b_1 \end{bmatrix} \begin{bmatrix} v_x(k-1) \\ v_y(k-1) \end{bmatrix} + \begin{bmatrix} w_x(k) \\ w_y(k) \end{bmatrix} \qquad (14)$$
or
$$v(k) = \Phi\,v(k-1) + \Gamma\,w(k) \qquad (15)$$
where we let $v(k) = [v_x(k), v_y(k)]^T$, $w(k) = [w_x(k), w_y(k)]^T$, $\Phi = \mathrm{diag}(a_1, b_1)$, and $\Gamma = I$. The error components $w_x(k)$ and $w_y(k)$ are assumed to be Gaussian with zero mean and the same variance $q$. The measurement equations for the horizontal and vertical directions are expressed by
$$\begin{bmatrix} z_x(k) \\ z_y(k) \end{bmatrix} = \begin{bmatrix} v_x(k) \\ v_y(k) \end{bmatrix} + \begin{bmatrix} n_x(k) \\ n_y(k) \end{bmatrix} \qquad (16)$$
where $n_x(k)$ and $n_y(k)$ denote two measurement error components with the same variance $r$. In general, the model error and measurement error may be colored noises. We can model each colored noise by a low-order difference equation that is excited by white Gaussian noise, and augment the original state-space representation with the states associated with the colored noise models. Finally, we apply the recursive filter to the augmented system. However, the procedure requires considerable computational complexity and is not
TABLE I COMPARISONS OF COMPRESSION PERFORMANCE, IN TERMS OF PSNR, OVERALL BIT RATE, AND MV BIT RATE FOR VARIOUS MOTION ESTIMATION ALGORITHMS USING THE CIF-CLAIRE 100 FRAMES UNDER 15 FRAMES/S
TABLE II COMPARISONS OF COMPRESSION PERFORMANCE, IN TERMS OF PSNR, OVERALL BIT RATE, AND MV BIT RATE FOR VARIOUS MOTION ESTIMATION ALGORITHMS USING THE CIF-SALESMAN 100 FRAMES UNDER 15 FRAMES/S
TABLE III COMPARISONS OF COMPRESSION PERFORMANCE, IN TERMS OF PSNR, OVERALL BIT RATE, AND MV BIT RATE FOR VARIOUS MOTION ESTIMATION ALGORITHMS USING THE QCIF-FOREMAN 100 FRAMES UNDER 10 FRAMES/S
suitable for our application. Moreover, in this paper, the measurements are obtained by the R-D fast search algorithm [14], in which the blocks are processed independently. Thus, we can assume that the measurement errors are independent. For simplicity but without loss of generality, the prediction error and measurement error are assumed to be zero-mean Gaussian, each with the same variance for both components (denoted $q$ for the model error and $r$ for the measurement error). In the above equations, the measurement matrix is constant, and the state transition matrix can be estimated by the least squares method. Since the motion field for low bit rate applications is rather smooth, we assume that the filter parameters take fixed values.
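The text states that the state transition matrix can be estimated by least squares but does not give the estimator. As a hedged illustration, for a first-order AR model one standard choice is the coefficient that minimizes the summed squared one-step prediction error along a row of blocks. The function name `fit_ar1` and the fallback value of 1.0 for an all-zero row are assumptions made for this sketch only.

```python
def fit_ar1(values):
    """Least-squares AR(1) coefficient a minimizing sum (v[k] - a * v[k-1])**2.

    `values` is one MV component along a row of blocks (left to right).
    Setting the derivative of the squared error to zero gives
    a = sum(v[k] * v[k-1]) / sum(v[k-1] ** 2).
    """
    num = sum(cur * prev for prev, cur in zip(values, values[1:]))
    den = sum(prev * prev for prev in values[:-1])
    if den == 0:
        # Assumption: with no motion to fit, fall back to "MV repeats its neighbor".
        return 1.0
    return num / den


# A smooth motion field, where each MV component equals its left neighbor.
print(fit_ar1([3, 3, 3, 3]))
```

For the smooth fields typical of low bit rate sequences, the fitted coefficient stays close to one, which is consistent with treating it as a fixed value.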
The proposed algorithm is summarized as follows.
Step 1) Measure MV. Measure the MV of a moving block, $z(k)$, by any R-D search algorithm [9]–[11], [13]–[15]. Encode the MV by the H.263 Huffman table [6], [7].
Step 2) Kalman filtering.
a) The predicted MV is obtained by $\hat{v}^-(k) = \Phi\,\hat{v}^+(k-1)$.
b) Calculate the prediction-error covariance by $P^-(k) = \Phi\,P^+(k-1)\,\Phi^T + \Gamma\,Q\,\Gamma^T$.
TABLE IV COMPARISONS OF COMPRESSION PERFORMANCE, IN TERMS OF PSNR, OVERALL BIT RATE, AND MV BIT RATE FOR VARIOUS MOTION ESTIMATION ALGORITHMS USING THE QCIF-MOTHER & DAUGHTER 120 FRAMES UNDER 10 FRAMES/S
TABLE V COMPARISONS OF COMPRESSION PERFORMANCE, IN TERMS OF PSNR, OVERALL BIT RATE, AND MV BIT RATE FOR VARIOUS MOTION ESTIMATION ALGORITHMS USING THE QCIF-MOTHER & DAUGHTER 120 FRAMES UNDER 10 FRAMES/S
TABLE VI COMPARISONS OF COMPRESSION PERFORMANCE, IN TERMS OF PSNR, OVERALL BIT RATE, AND MV BIT RATE FOR VARIOUS MOTION ESTIMATION ALGORITHMS USING THE QCIF-MOTHER & DAUGHTER 120 FRAMES UNDER 10 FRAMES/S
c) Obtain the Kalman gain by $K(k) = P^-(k)\,H^T\,[H\,P^-(k)\,H^T + R]^{-1}$.
d) The MV estimate is updated by $\hat{v}^+(k) = \hat{v}^-(k) + K(k)\,[z(k) - H\,\hat{v}^-(k)]$. This is the final estimate output.
e) Calculate the filtering-error covariance by $P^+(k) = P^-(k) - K(k)\,H\,P^-(k)$.
Step 3) Go to Step 1 for the next block.

In the above algorithm, the optimal estimate $\hat{v}^+(k)$ is usually real-valued, thus yielding a fractional-pixel accuracy estimate. The conventional BMA can also obtain a fractional-pixel MV by increasing the resolution with interpolation and matching higher-resolution data on the new sampling grid. However, this not only increases the computational complexity significantly, but also raises the overhead bit rate for MVs. In contrast, the computational overhead required by the proposed method is much lower than that of the conventional BMA with fractional-pixel matching. In addition, using the same Kalman filter as in the encoder, the decoder can estimate the fractional part of the MV from the received integer MV. In summary, the new method achieves fractional-pixel performance with the same bit rate for MVs as an integer-search BMA, at the cost of a small increase of computational load at the decoder. The detailed analysis of computational complexity is given in Section V. Furthermore, because the Kalman filter is independent of the motion estimation, it can be combined with any existing R-D motion estimation scheme to improve performance.

TABLE VII COMPARISONS OF COMPRESSION PERFORMANCE, IN TERMS OF PSNR, OVERALL BIT RATE, AND MV BIT RATE FOR VARIOUS MOTION ESTIMATION ALGORITHMS USING THE QCIF-CARPHONE 120 FRAMES UNDER 10 FRAMES/S
TABLE VIII COMPARISONS OF COMPRESSION PERFORMANCE, IN TERMS OF PSNR, OVERALL BIT RATE, AND MV BIT RATE FOR VARIOUS MOTION ESTIMATION ALGORITHMS USING THE QCIF-CARPHONE 120 FRAMES UNDER 10 FRAMES/S
TABLE IX EXTRA COMPUTATION REQUIRED BY KALMAN FILTERING FOR EACH ALGORITHM

IV. KALMAN FILTER EMBEDDED R-D MOTION ESTIMATION

The main feature of the above enhanced scheme is to obtain fractional-pixel accuracy of the MV by estimation instead of actual searching. Hence, no extra bit rate is needed for the fractional part of the MV. However, because the enhanced algorithm does not involve the estimation process of the MV, the obtained MV is not optimum from the viewpoint of distortion. To address the problem, we develop a new R-D motion estimation, in which the Kalman filter is included in the estimation process of the R-D scheme. We refer to it as Kalman filter embedded R-D motion estimation and describe the details in the following. The cost function of Kalman filter embedded R-D motion estimation can be formulated as
$$J_i = D_i(\hat{v}_i) + \lambda R_i(v_i) \qquad (17)$$
where $v_i$ is the integer-point MV of block $i$, $\hat{v}_i$ is its Kalman-filtered estimate, $D_i(\hat{v}_i)$ is the distortion of Kalman filter-based motion compensation, $R_i(v_i)$ is the number of bits for the MV, and $\lambda$ is the Lagrange multiplier. The distortion is obtained by Kalman filtering the integer-point MV and using the resulting floating-point MV to generate the motion compensated prediction. In this case, the MV is represented in integer-point form, but it can generate motion compensation with fractional-pixel accuracy. Therefore, the assigned bit rate for the MV is not affected by the Kalman filtering, but the total cost function is reduced due to the increased accuracy of the compensation. Fig. 3 is the block diagram of the embedded algorithm. For simplicity, we select (11) as the criterion for motion estimation.
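The embedded criterion can be sketched as a search loop in which every integer candidate is first passed through the Kalman update, the distortion is evaluated at the filtered (fractional) MV, and the rate is charged for the integer MV only. In the sketch below, the function `embedded_rd_search` and the two toy stand-ins `toy_distortion` and `toy_mv_bits` are hypothetical; in particular, the bit count is not the H.263 Huffman table, and a single scalar gain shared by both components is an assumption.

```python
def embedded_rd_search(candidates, predicted_mv, gain, distortion, mv_bits, lam):
    """Return (cost, integer_mv, filtered_mv) minimizing D(filtered) + lam * R(integer).

    candidates   : iterable of integer MVs (x, y) inside the search range
    predicted_mv : Kalman-predicted MV for this block, (x, y)
    gain         : Kalman gain for this block (scalar, shared by both components)
    distortion   : callable giving the compensation error for a fractional MV
    mv_bits      : callable giving the bits needed to send an integer MV
    """
    best = None
    for z in candidates:
        # Kalman update with the candidate acting as the measurement.
        v_hat = tuple(p + gain * (zc - p) for zc, p in zip(z, predicted_mv))
        # Distortion uses the filtered MV; rate uses the transmitted integer MV.
        cost = distortion(v_hat) + lam * mv_bits(z)
        if best is None or cost < best[0]:
            best = (cost, z, v_hat)
    return best


# Toy stand-ins (assumptions): true motion (1.4, 0.0); 1 bit plus 1 bit per unit of |MV|.
def toy_distortion(v):
    return 100.0 * ((v[0] - 1.4) ** 2 + (v[1] - 0.0) ** 2)


def toy_mv_bits(z):
    return 1 + abs(z[0]) + abs(z[1])


cands = [(0, 0), (1, 0), (2, 0)]
cost, z, v_hat = embedded_rd_search(cands, (1.0, 0.0), 0.5, toy_distortion, toy_mv_bits, lam=20)
print(z, v_hat)
```

Because the Kalman gain depends only on the covariances, it can be computed once per block and reused for every candidate, so the per-candidate overhead is limited to the state update itself.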
The Kalman filter embedded R-D motion estimation algorithm is summarized as follows.
Step 1) Kalman filter-based motion estimation.
a) Select a location in the search range and denote it as a candidate measurement of the MV, $z(k)$.
b) Apply the Kalman filter to $z(k)$ using the procedure of Step 2 in the previous section. Then we obtain an optimal estimate of the MV, $\hat{v}^+(k)$, which has fractional accuracy. Calculate the distortion according to $\hat{v}^+(k)$.
Step 2) Calculate the bit rate of the MV according to the H.263 Huffman table [6], [7]. Notice that the transmitted MV is $z(k)$, which is an integer; thus the required bit rate of the MV is not affected by the Kalman filter.
Step 3) Using (17), calculate the cost function. If the best match is found, go to Step 4; otherwise, go back to Step 1 to select the next location for estimation.
Step 4) Go to Step 1 for the next block.

In the enhanced algorithm, the Kalman filter is not applied during the block searching. It is only used to enhance the performance once the MV has been obtained by R-D motion estimation. Therefore, the Kalman filter can be viewed as a post processing of
Fig. 4. Comparisons of PSNR performance using the QCIF-Foreman sequence, 120 frames at 10 frames/s, fixed coding bit rate at 2000 bits. Block size = 8 × 8, search range = [-15, 16].
motion estimation. However, in the embedded algorithm, the Kalman filter is applied at every block search by employing the joint R-D criterion. Thus, it can be considered as a new R-D motion estimation approach. Since it includes the Kalman filter in the optimization process, the embedded method performs better than the enhanced version at the cost of computational complexity.

V. SIMULATION RESULTS

The performance of the proposed R-D motion estimation with Kalman filter (RD-Kalman) was evaluated using a set of standard image sequences including Foreman, Mother and Daughter, Carphone, Salesman, and Claire. All sequences are with CIF (352 × 288) or QCIF (176 × 144) resolution and a frame rate of 10 Hz. Since the RD-Kalman motion estimation has fractional-pel accuracy, the results are compared with the conventional RD algorithm and the MSE-optimal scheme with both integer and half-pixel accuracy. A block size of 16 × 16 and a search range of 64 × 64 were chosen for the CIF format, and a block size of 16 × 16 (or 8 × 8) and a search range of 31 × 31 for the QCIF format. The conventional RD and RD-Kalman adopted the same motion estimation strategy as that in [13]. Specifically, for the current block, the MVs of the left-neighbor block and up-neighbor block, and the MV obtained with the MSE criterion, were selected as the predicted search centers, and then a small search of 3 × 3 was performed. For the KF-based motion estimation, the model coefficients, model error variance, measurement error variance, initial error covariance, and initial state were chosen experimentally. It is evident from [19] that the estimated MVs are real values rather than integers. The displaced pixels may not be on the sampling grid. Therefore,
the well-known bilinear interpolation is adopted to generate a motion compensated prediction frame. A Huffman codebook adopted from the H.263 standard was used in the coding of 2-D differentially coded MVs. The various algorithms were compared in terms of R-D performance. The common peak signal-to-noise ratio (PSNR) measure defined in the following was selected to evaluate distortion performance:
$$\mathrm{PSNR} = 10 \log_{10} \frac{255^2}{\mathrm{MSE}} \qquad (18)$$
Moreover, rate performance was evaluated by the number of bits required to encode an image frame or a motion field. The Lagrange multiplier $\lambda$, which controls the overall performance in the R-D sense, is a very important parameter. Generally, an iterative method is needed to determine the value of $\lambda$. However, it is computationally very expensive. As pointed out in [14], for typical video coding applications $\lambda$ is insensitive to different frames of a video sequence; thus a constant $\lambda$ of 20 is adopted in our simulations. The simulation was carried out by incorporating various motion estimation algorithms into an H.263-based motion compensated DCT video coding system. To be fair in the comparisons, we fixed the overall coding bit rate at 4000 bits per frame for CIF-Claire and CIF-Salesman. For the QCIF format, two block sizes are tested for each sequence, and each is assigned a different bit rate per frame. The preset bit rates are 2000 bits (8 × 8) and 1600 bits (16 × 16) for Foreman, 1400 bits and 1000 bits for Mother & Daughter, and 2000 bits and 1400 bits for Carphone, respectively. The averaged results for 100 frames of the CIF format sequences are summarized in Tables I and II. The Kalman-based R-D motion estimation approach outperformed the MSE-optimal and conventional RD algorithms in terms of PSNR. Since
Fig. 5. Comparisons of PSNR performance using the QCIF-Foreman sequence, 120 frames at 10 frames/s, fixed coding bit rate at 2000 bits. Block size = 16 × 16, search range = [-31, 32].
Fig. 6. Comparisons of PSNR using the QCIF-Mother & Daughter sequence, 200 frames at 10 frames/s, fixed coding bit rate at 2000 bits. Block size = 8 × 8, search range = [-15, 16].
the Kalman filter has fractional-pel accuracy at the rate of integer MVs, it achieves significant PSNR improvement, as expected. When the integer-based Kalman filter is compared to the motion estimation methods with half-pixel accuracy, it still achieves better PSNR, but less significantly. We found that the Kalman filter with half-pixel accuracy performs only slightly better than that with integer-pixel accuracy. This may be due to the limitation of bilinear interpolation; i.e., the accuracy improvement saturates when too many interpolations are performed. The performance may be further enhanced with the
advanced interpolation filters [25], [26]. However, it is not a major issue in our paper. At the same bit rate level and integer-pixel accuracy, the enhanced algorithm achieved an average of 1.23 dB gain over MSE-optimal and 0.34 dB gain over the conventional RD. The embedded version achieved an average of 1.77 dB gain over MSE-optimal, and 0.88 dB gain over the conventional RD. Note that the new methods have lower bit rate. Tables III–VIII summarize the average results for QCIF format sequences. For both block sizes of 16 × 16 and 8 × 8, the Kalman filter-based
Fig. 7. Comparisons of PSNR performance using the QCIF-Mother and Daughter sequence, 200 frames at 10 frames/s, fixed coding bit rate at 1400 bits. Block size = 16 × 16, search range = [-31, 32].
Fig. 8. Comparisons of PSNR performance using the QCIF-Carphone sequence, 120 frames at 10 frames/s, fixed coding bit rate at 2000 bits. Block size = 8 × 8, search range = [-15, 16].
R-D motion estimation approaches achieve significant PSNR improvement. Particularly, the embedded Kalman R-D algorithm achieves the best performance due to its ability in reduction of MV rate as well as the compensation distortion. Figs. 4–9 compare the MSE-Optimal, conventional R-D, enhanced Kalman R-D and embedded Kalman R-D schemes with both integer and half pixel accuracy in terms of PSNR with approximately fixed bit rate for each sequence, respectively. Figs. 10–12 compare these algorithms in terms of bit rate with approximately fixed PSNR for each sequence, respectively. As expected, the results indicate that the proposed schemes achieve better R-D performance.
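The PSNR figures quoted in this section follow the measure in (18). A small sketch of that computation is given below, assuming 8-bit samples (peak value 255) and frames supplied as flat sequences of pixel values; the function name `psnr` and its `peak` parameter are illustrative.

```python
import math


def psnr(original, reconstructed, peak=255):
    """PSNR in dB between two equally sized frames given as flat pixel sequences."""
    if len(original) != len(reconstructed):
        raise ValueError("frames must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(peak ** 2 / mse)


# A uniform error of 25.5 grey levels on every pixel.
print(psnr([0.0, 0.0, 0.0], [25.5, 25.5, 25.5]))
```

Since the measure is logarithmic in the MSE, the sub-decibel gains reported between the R-D schemes correspond to modest but consistent reductions in mean squared error.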
The MV fields generated by various algorithms are shown in Figs. 13–17, respectively. The test sequences contain mainly small rotation and camera panning. As expected, the proposed algorithm produces smoother motion fields because of the filtering effect of the Kalman filter.

Analysis of Computational Complexity: Consider the case that the block size is $N \times N$, the maximum displacement is $p$, and the matching criterion is the mean absolute difference (MAD), i.e.,
Fig. 9. Comparisons of PSNR performance using the QCIF-Carphone sequence, 120 frames at 10 frames/s, fixed coding bit rate at 1400 bits. Block size = 16 × 16, search range = [-31, 32].
Fig. 10. Comparisons of bit rate performance using the CIF-Salesman sequence, 120 frames at 10 frames/s, fixed average PSNR full search at 39.93 dB, half pixel search at 40.01 dB, half pixel RD at 39.92 dB, half pixel RD with KF(En) at 39.95 dB, half pixel RD with KF(Em) at 40.05 dB, RD-Optimal at 39.88 dB, RD with KF(En) at 39.95 dB, and RD with KF(Em) at 39.98 dB.
$$\mathrm{MAD}(d_x, d_y) = \frac{1}{N^2} \sum_{m=1}^{N} \sum_{n=1}^{N} \left| f_k(m,n) - f_{k-1}(m+d_x, n+d_y) \right|$$
where $f_k(m,n)$ is the pixel intensity at location $(m,n)$ of the $N \times N$ block in the current frame $k$, and $f_{k-1}(m+d_x, n+d_y)$ is the pixel intensity at the location with the displacement $(d_x, d_y)$ in the previous frame $k-1$. The full search algorithm (FSA) requires $(2p+1)^2$ search locations for a maximum displacement $p$. Each search location corresponds to one MAD computation, which consists of additions and absolute operations over all pixels of the block. Therefore, the total computation is the cost of one MAD computation multiplied by the number of search locations. In general, Kalman filtering is computationally expensive. However, in our application, the computational complexity is relatively small because the calculation of Kalman filtering can be significantly simplified. We evaluate the required calculations for Kalman filtering as follows. The Kalman filtering contains two stages: prediction and updating. In our application, the state transition matrix $\Phi$, driving matrix $\Gamma$, and measurement matrix $H$ are all 2 × 2 identity matrices. The covariances of the model and measurement errors are 2 × 2 diagonal matrices with the constant values $q$ and $r$, respectively. For simplicity, only additions and multiplications are counted. Prediction:
Fig. 11. Comparisons of bit rate performance using the QCIF-Mother and Daughter sequence, 200 frames at 10 frames/s, fixed average PSNR full search at 38.80 dB, half pixel search at 38.82 dB, half pixel RD at 38.75 dB, half pixel RD with KF(En) at 38.81 dB, half pixel RD with KF(Em) at 38.89 dB, RD-Optimal at 38.77 dB, RD with KF(En) at 38.83 dB, and RD with KF(Em) at 38.87 dB.
Fig. 12. Comparisons of bit rate performance using the QCIF-Carphone sequence, 120 frames at 10 frames/s, fixed average PSNR Full Search at 37.23 dB, Half Pixel Search at 37.32 dB, Half Pixel RD at 37.20 dB, Half Pixel RD with KF(En) at 37.29 dB, Half Pixel RD with KF(Em) at 37.37 dB, RD-Optimal at 37.14 dB, RD with KF(En) at 37.28 dB, and RD with KF(Em) at 37.30 dB.
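Since the state transition matrix $\Phi$ is the identity, the state prediction reduces to $\hat{v}^-(k) = \hat{v}^+(k-1)$, so no calculation is needed. $P^+(k-1)$ is the error covariance matrix, which is a 2 × 2 diagonal matrix. The expansion of $P^-(k)$ is $P^-(k) = \Phi\,P^+(k-1)\,\Phi^T + \Gamma\,Q\,\Gamma^T$. $Q$ is a 2 × 2 diagonal matrix with the constant value $q$ on its diagonal and $\Gamma$ is the identity, so $\Gamma\,Q\,\Gamma^T$ can be expressed as $qI$. Thus we have $P^-(k) = P^+(k-1) + qI$. It contains only two additions.
Updating: With the measurement matrix $H = I$ and the measurement error covariance $R = rI$, the Kalman gain becomes $K(k) = P^-(k)\,[P^-(k) + rI]^{-1}$, which is diagonal. It contains only additions and multiplications (here, division is regarded as multiplication). The state updating becomes $\hat{v}^+(k) = \hat{v}^-(k) + K(k)\,[z(k) - \hat{v}^-(k)]$. Obviously the calculation contains only additions and multiplications.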
Fig. 13. (a) Motion field estimated by the conventional MSE-Optimal scheme on the CIF-Claire sequence frame 15. The PSNR quality is 39.87 dB and it requires 1980 bits to encode using the H.263 Huffman codebook. (b) Motion field estimated by the RD-Optimal scheme proposed by Michael C. Chen on the CIF-Claire sequence frame 15. The PSNR quality is 39.26 dB and it requires 1378 bits to encode using the H.263 Huffman codebook. (c) Motion field estimated by the R-D Optimal with Enhanced Algorithm on the CIF-Claire sequence frame 15. The PSNR quality is 39.45 dB and it requires 1378 bits to encode using the H.263 Huffman codebook. (d) Motion field estimated by the R-D Optimal with Embedded Algorithm on the CIF-Claire sequence frame 15. The PSNR quality is 39.94 dB and it requires 1030 bits to encode using the H.263 Huffman codebook.
Finally, we consider the calculation of the updating-error covariance $P^+(k)$.
Fig. 14. (a) Motion field estimated by the conventional MSE-Optimal scheme on the CIF- Salesman sequence frame 07. The PSNR quality is 36.32 dB and it requires 1246 bits to encode using the H.263 Huffman codebook. (b) Motion field estimated by the Half Pixel with RD-Optimal scheme on the CIF- Salesman sequence frame 07. The PSNR quality is 36.11 dB and it requires 1095 bits to encode using the H.263 Huffman codebook. (c) Motion field estimated by the Half Pixel RD with Enhanced Algorithm scheme on the CIF-Salesman sequence frame 07. The PSNR quality is 36.24 dB and it requires 1095 bits to encode using the H.263 Huffman codebook. (d) Motion field estimated by the Half Pixel RD with Embedded Algorithm scheme on the CIF-Salesman sequence frame 07. The PSNR quality is 36.47 dB and it requires 1016 bits to encode using the H.263 Huffman codebook.
Obviously, since the measurement matrix is the identity, the calculation of $P^+(k)$ can be simplified as $P^+(k) = P^-(k) - K(k)\,P^-(k)$.
16
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 16, NO. 1, JANUARY 2006
Fig. 15. (a) Motion field estimated by the conventional Half Pixel scheme on the QCIF-Foreman sequence frame 204. The PSNR quality is 34.56 dB and it requires 1230 bits to encode using the H.263 Huffman codebook. (b) Motion field estimated by the Half Pixel with RD-Optimal scheme on the QCIF-Foreman sequence frame 204. The PSNR quality is 34.15 dB and it requires 1158 bits to encode using the H.263 Huffman codebook. (c) Motion field estimated by the Half Pixel RD with Enhanced Algorithm scheme on the QCIF-Foreman sequence frame 204. The PSNR quality is 34.27 dB and it requires 1158 bits to encode using the H.263 Huffman codebook. (d) Motion field estimated by the Half Pixel RD with Embedded Algorithm scheme on the QCIF-Foreman sequence frame 204. The PSNR quality is 34.66 dB and it requires 889 bits to encode using the H.263 Huffman codebook.
. The calculation contains .
and
Fig. 16. (a) Motion field estimated by the conventional Half Pixel scheme on the QCIF-Mother and Daughter sequence frame 28. The PSNR quality is 34.83 dB and it requires 1476 bits to encode using the H.263 Huffman codebook. (b) Motion field estimated by the Half Pixel with RD-Optimal scheme on the QCIF-Mother and Daughter sequence frame 28. The PSNR quality is 34.52 dB and it requires 1112 bits to encode using the H.263 Huffman codebook. (c) Motion field estimated by the Half Pixel RD with Enhanced Algorithm scheme on the QCIF-Mother and Daughter sequence frame 28. The PSNR quality is 34.67 dB and it requires 1120 bits to encode using the H.263 Huffman codebook. (d) Motion field estimated by the Half Pixel RD with Embedded Algorithm scheme on the QCIF-Mother and Daughter sequence frame 28. The PSNR quality is 34.95 dB and it requires 868 bits to encode using the H.263 Huffman codebook.
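To make the numbers reported in the captions of Figs. 13–17 easier to compare, they can be tabulated relative to panel (a) of each figure. The Python sketch below is only a reading aid and is not part of the proposed algorithms; the names `RESULTS`, `LABELS`, and `compare` are hypothetical, and the values are copied as printed in the captions (one frame per sequence, so they should not be read as sequence averages).

```python
# (PSNR in dB, MV bits) for panels (a)-(d), copied from the captions of Figs. 13-17.
RESULTS = {
    "Fig. 13 Claire": [(39.87, 1980), (39.26, 1378), (39.45, 1378), (39.94, 1030)],
    "Fig. 14 Salesman": [(36.32, 1246), (36.11, 1095), (36.24, 1095), (36.47, 1016)],
    "Fig. 15 Foreman": [(34.56, 1230), (34.15, 1158), (34.27, 1158), (34.66, 889)],
    "Fig. 16 Mother and Daughter": [(34.83, 1476), (34.52, 1112), (34.67, 1120), (34.95, 868)],
    "Fig. 17 Carphone": [(34.50, 1302), (34.07, 1132), (34.21, 1132), (34.63, 830)],
}

# Panel (a) is the baseline in every figure; these labels are shorthand for the captions.
LABELS = ["(b) RD-optimal", "(c) enhanced", "(d) embedded"]


def compare(panels):
    """Return (PSNR change in dB, MV-bit saving in %) of panels (b)-(d) versus panel (a)."""
    base_psnr, base_bits = panels[0]
    rows = []
    for psnr, bits in panels[1:]:
        delta_psnr = round(psnr - base_psnr, 2)
        saving_pct = round(100.0 * (base_bits - bits) / base_bits, 1)
        rows.append((delta_psnr, saving_pct))
    return rows


if __name__ == "__main__":
    for name, panels in RESULTS.items():
        print(name)
        for label, (delta_psnr, saving_pct) in zip(LABELS, compare(panels)):
            print(f"  {label:15s} dPSNR = {delta_psnr:+.2f} dB, MV bits saved = {saving_pct:.1f}%")
```

For the Claire frame of Fig. 13, for instance, this gives a reduction of about 48% in MV bits for the embedded algorithm at a PSNR 0.07 dB above that of panel (a).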
Since the computation above combines both components simultaneously, and we assume that the two components use the same Kalman filter, the actual computation is only half of the above analysis. For the embedded algorithm, the distortion function must be calculated at each search location; thus, the interpolation is necessary. For each pixel, the bilinear interpolation requires  and , so a block with size  needs  and .

The computational complexity of the enhanced algorithm differs from that of the embedded algorithm, since the filtering operation is performed once per block for the former but once for each search location for the latter. The decoder contains the same Kalman filter; it therefore performs one Kalman filtering operation after the MVs are received, regardless of whether the enhanced or the embedded algorithm is used. The extra computational load required by the proposed algorithms is summarized in Table IX, which indicates that the extra computation introduced by the proposed method is small.

Fig. 17. (a) Motion field estimated by the conventional Half Pixel scheme on the QCIF-Carphone sequence frame 60. The PSNR quality is 34.50 dB and it requires 1302 bits to encode using the H.263 Huffman codebook. (b) Motion field estimated by the Half Pixel with RD-Optimal scheme on the QCIF-Carphone sequence frame 60. The PSNR quality is 34.07 dB and it requires 1132 bits to encode using the H.263 Huffman codebook. (c) Motion field estimated by the Half Pixel RD with Enhanced Algorithm scheme on the QCIF-Carphone sequence frame 60. The PSNR quality is 34.21 dB and it requires 1132 bits to encode using the H.263 Huffman codebook. (d) Motion field estimated by the Half Pixel RD with Embedded Algorithm scheme on the QCIF-Carphone sequence frame 60. The PSNR quality is 34.63 dB and it requires 830 bits to encode using the H.263 Huffman codebook.

VI. CONCLUSION
In this paper, we have presented two efficient Kalman filter-based R-D motion estimation algorithms in which a simple 1-D Kalman filter is applied to improve the performance of conventional R-D motion estimation. Since equivalent Kalman filters are used in both the encoder and the decoder, no extra MV information needs to be sent to the decoder. The new algorithms achieve a significant PSNR gain with only a slight increase in complexity. The enhanced algorithm is a postprocessing step and can be easily combined with any conventional R-D motion estimation scheme. The embedded algorithm is a new R-D motion estimation algorithm that exploits the correlation of block motion more effectively. In the future, we will develop an adaptive scheme to further improve the performance of motion compensation. In addition, combinations with variable block sizes and OBMC searching, as well as the investigation of robust transmission, are interesting issues for future study.

ACKNOWLEDGMENT
The authors would like to thank the anonymous reviewers for their valuable comments and suggestions.

REFERENCES
[1] J. R. Jain and A. K. Jain, “Displacement measurement and its application in interframe image coding,” IEEE Trans. Commun., vol. 29, no. 12, pp. 1799–1808, Dec. 1981.
[2] T. Koga, K. Iinuma, A. Hirano, Y. Iijima, and T. Ishiguro, “Motion-compensated interframe coding for video conferencing,” in Proc. NTC81, New Orleans, LA, 1981, pp. C9.6.1–9.6.5.
[3] R. Srinivasan and K. Rao, “Predictive coding based on efficient motion estimation,” IEEE Trans. Commun., vol. 33, no. 8, pp. 888–896, Aug. 1985.
[4] “MPEG-4 Visual Fixed Draft International Standard,” ISO/IEC 14496-2, 1998.
[5] “MPEG-4 Video Verification Model Version 18.0,” MPEG Video Group, ISO/IEC JTC1/SC29/WG11 N3908, 2001.
[6] “Video Coding for Low Bitrate Communication,” ITU Telecom. Standardization sector of ITU, ITU-T Recommendation H.263, 1996.
[7] “Video Coding for Low Bitrate Communication,” ITU Telecom. Standardization sector of ITU, Draft ITU-T Rec. H.263 Version 2, 1997.
[8] H. Li, A. Lundmark, and R. Forchheimer, “Image sequence coding at very low bitrates: A review,” IEEE Trans. Image Process., vol. 3, no. 9, pp. 589–609, Sep. 1994.
[9] T. Wiegand, M. Lightstone, D. Mukherjee, T. G. Campbell, and S. K. Mitra, “Rate-distortion optimized mode selection for very low bit rate video coding and the emerging H.263 standard,” IEEE Trans. Circuits Syst. Video Technol., vol. 6, no. 2, pp. 182–190, Apr. 1996.
[10] F. Kossentini, Y.-W. Lee, M. J. T. Smith, and R. K. Ward, “Predictive RD optimized motion estimation for very low bit rate video coding,” IEEE J. Sel. Areas Commun., vol. 15, no. 6, pp. 1752–1763, Dec. 1997.
[11] D. T. Hoang, P. M. Long, and J. S. Vitter, “Efficient cost measure for motion estimation at low bit rate,” IEEE Trans. Circuits Syst. Video Technol., vol. 8, no. 5, pp. 488–500, Aug. 1998.
[12] B. Girod, “Rate-constrained motion estimation,” Proc. SPIE Visual Commun. Image Process., vol. 2308, pp. 1026–1034, Nov. 1994.
[13] M. C. Chen and A. N. Willson, “Rate-distortion optimal motion estimation algorithm for motion-compensated transform video coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 8, no. 3, pp. 147–158, Apr. 1998.
[14] M. Z. Coban and R. M. Mersereau, “A fast exhaustive search algorithm for rate-constrained motion estimation,” IEEE Trans. Image Process., vol. 7, no. 5, pp. 769–773, May 1998.
[15] J. C. H. Ju, Y. K. Chen, and S. Y. Kung, “A fast rate-optimized motion estimation algorithm for low-bit rate video coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 9, no. 7, pp. 994–1002, Oct. 1999.
[16] Y. Yang and S. S. Hemami, “Generalized rate-distortion optimization for motion-compensated video coders,” IEEE Trans. Circuits Syst. Video Technol., vol. 10, no. 6, pp. 942–955, Sep. 2000.
[17] M. T. Orchard and G. J. Sullivan, “Overlapped block motion compensation: An estimation-theoretic approach,” IEEE Trans. Image Process., vol. 3, no. 5, pp. 693–699, Sep. 1994.
[18] J. K. Su and R. M. Mersereau, “Motion estimation methods for overlapped block motion compensation,” IEEE Trans. Image Process., vol. 9, no. 6, pp. 1509–1521, Sep. 2000.
[19] “Draft ITU-T Recommendation and Final Draft International Standard of Joint Video Specification (ITU-T Rec. H.264/ISO/IEC 14496-10 AVC),” Joint Video Team (JVT), Doc. JVT-G050, 2003.
[20] T. Wiegand, G. Sullivan, G. Bjøntegaard, and A. Luthra, “Overview of the H.264/AVC video coding standard,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 560–576, Jul. 2003.
[21] C. K. Chui and G. Chen, Kalman Filtering With Real-Time Applications, ser. Springer Series in Information Sciences. New York: Springer, 1987, vol. 17.
[22] M. S. Grewal and A. P. Andrews, Kalman Filtering Theory and Practice. Englewood Cliffs, NJ: Prentice-Hall, 1993.
[23] C. M. Kuo, C. H. Hsieh, Y. D. Jou, H. C. Lin, and P. C. Lu, “Motion estimation for video compression using Kalman filtering,” IEEE Trans. Broadcast., vol. 42, no. 2, pp. 110–116, Jun. 1996.
[24] C. M. Kuo, C. H. Hsieh, H. C. Lin, and P. C. Lu, “Motion estimation algorithm with Kalman filter,” Electron. Lett., vol. 30, no. 7, pp. 1204–1206, Jul. 1994.
[25] T. Wedi and H. G. Musmann, “Motion- and aliasing-compensated prediction for hybrid video coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 577–586, Jul. 2003.
[26] T. Wedi, “Adaptive interpolation filter for motion compensated prediction,” in Proc. 2002 IEEE Int. Conf. Image Process., vol. 2, Sep. 22–25, 2002, pp. II-509–II-512.
Chung-Ming Kuo received the B.S. degree from the Chinese Naval Academy, Kaohsiung, Taiwan, R.O.C., in 1982, and the M.S. and Ph.D. degrees from Chung Cheng Institute of Technology, Taiwan, R.O.C., in 1988 and 1994, respectively, all in electrical engineering. From 1988 to 1991, he was an Instructor in the Department of Electrical Engineering, Chinese Naval Academy, where he became an Associate Professor in January 1995. From 2000 to 2003, he was an Associate Professor in the Department of Information Engineering, I-Shou University, Kaohsiung, Taiwan, R.O.C., and became a Professor in February 2004. His research interests include video compression, image/video retrieval, multimedia signal processing, and optimal estimation.
Shu-Chiang Chung received the B.S. and M.S. degrees in system engineering from Chung Cheng Institute of Technology, Taiwan, R.O.C., in 1983 and 1994, respectively. He is currently working toward the Ph.D. degree at I-Shou University, Kaohsiung, Taiwan, R.O.C. From 1994 to 1999, he was an Instructor at the Chinese Naval Academy, Kaohsiung, Taiwan, R.O.C. In November 1999, he joined the Naval Shipbuilding and Development Center as a Researcher with the Department of Information Systems. His research interests include video coding, image processing, digital signal processing, and optimal estimation.
Po-Yi Shih received the B.S. degree in 2001 and the M.S. degree in information engineering in 2003, both from I-Shou University, Kaohsiung, Taiwan, R.O.C. In 2003, he joined VIA Technologies, Inc., Hsin-Chu, Taiwan, R.O.C., where he is involved in the research and design of video encoder and decoder systems. His current research interests include video coding, image processing, and optimal estimation.