Entropy criterion for optimal bit allocation between motion and prediction error information

F. Moscheni, F. Dufaux and H. Nicolas
Signal Processing Laboratory
Swiss Federal Institute of Technology
CH-1015 Lausanne, Switzerland
Phone: (+41 21) 693 26 01, Fax: (+41 21) 693 46 60
Abstract
Motion estimation and compensation techniques are widely used in video coding. This paper addresses the trade-off between the motion information and the prediction error information. Under some realistic hypotheses, the transmission cost of these two components can be estimated. From these estimates we obtain a criterion that controls the motion estimation process so as to optimize its performance. As a particular application, the criterion is applied to the split procedure of an adaptive multigrid block matching technique. Simulation results are presented, showing the significant improvements brought by the method.
1 Introduction

In recent years, there has been a growing interest in technologies involving digital image sequences, such as high definition television (HDTV) and video conferencing. Much research is currently devoted to reducing the transmission cost of the video signal, that is to say its bit rate. Video coding schemes using motion estimation and compensation techniques have been developed and have shown their efficiency [1]. In these schemes, the motion information and the prediction error, also called the displaced frame difference (DFD), are transmitted instead of the image itself. Clearly, a very precise motion estimation leads on the one hand to a very low DFD energy and on the other hand to a high motion information overhead. Conversely, a coarse motion estimation entails a low motion information overhead and a high DFD energy. Nevertheless, the fact that the final bit rate is a trade-off between the motion and the DFD information has not been addressed in the literature. This paper introduces an approach in which the optimal bit allocation between the motion and the DFD components is obtained. Consequently, introducing this method into any classical coding scheme leads to a significant improvement of its performance.

This optimization procedure requires an estimate of the transmission cost of both components. By the very nature of the prediction process, the DFD tends to have a characteristic distribution that can be modeled by a Laplacian probability density function (PDF). Moreover, the low correlation of the DFD pixel values allows the DFD to be treated as a 0th order Markov process. Hence, an analytical expression for the entropy of the DFD, namely its cost, has been derived. The cost of the motion information, on the other hand, is in most cases straightforward and computationally cheap to estimate. Therefore the optimal bit allocation is reached by minimizing the sum of these transmission costs, which defines the so-called entropy criterion.

In Sec. 2, the entropy criterion is developed. As an application, Sec. 3 considers an adaptive multigrid block matching algorithm [2] in which the splitting procedure is controlled by the entropy criterion. Experimental results are shown in Sec. 4. Finally, a conclusion and further research directions are presented in Sec. 5.
2 Entropy criterion

2.1 Entropy criterion
In coding schemes based on motion compensation, the total bit rate $R$ is a trade-off between the amount of motion parameters, $R_{\mathrm{motion}}$, and the DFD information, $R_{\mathrm{DFD}}$. To obtain a global optimization of the coding scheme, the total bit rate

$$ R = R_{\mathrm{motion}} + R_{\mathrm{DFD}} \tag{1} $$

is introduced. The difficulty lies in estimating these two terms. Assuming entropy coding, the DFD information is

$$ R_{\mathrm{DFD}} = n_{\mathrm{DFD}} \, H_{\mathrm{DFD}} , \tag{2} $$

where $n_{\mathrm{DFD}}$ is the number of DFD pixels and $H_{\mathrm{DFD}}$ their entropy per pixel, in general the entropy of order $n_{\mathrm{DFD}} - 1$. As we shall see in Sec. 2.2, an analytical expression for $H_{\mathrm{DFD}}$ can be derived under some realistic hypotheses. $R_{\mathrm{motion}}$, for its part, corresponds to the whole motion information: it includes the motion vectors (e.g. translation, rotation, ...) as well as the segmentation, if any. For example, the transmission cost of the motion vectors can be evaluated simply by computing their entropy.

In most motion estimation algorithms, parameters have to be set to control the estimation process. For instance, these parameters could be a threshold in a segmentation procedure, the block size in a block matching technique, the precision of the motion vectors (e.g. 1 or 1/2 pixel accuracy), or the choice of the motion model (e.g. translational, linear, ...). In this paper, the problem of the threshold in a split procedure is considered. These different parameter optimization problems can be solved by the following entropy criterion:

$$ R_{\mathrm{motion}}(\vec{n}_2) + R_{\mathrm{DFD}}(\vec{n}_2) < R_{\mathrm{motion}}(\vec{n}_1) + R_{\mathrm{DFD}}(\vec{n}_1) \;\Rightarrow\; \vec{n}_2 \text{ is accepted and } \vec{n}_1 \text{ rejected,} \tag{3} $$

where $\vec{n}_1$ and $\vec{n}_2$ correspond to two parameter states of the algorithm.
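As a minimal sketch of how Eq. (3) might be evaluated, assume (as suggested above) that $R_{\mathrm{motion}}$ is approximated by the number of motion vectors times their zeroth-order empirical entropy, and that the segmentation cost is ignored. Comparing two parameter states then takes only a few lines of Python. The function names are hypothetical, and $R_{\mathrm{DFD}}$ would be supplied by Eq. (2).

```python
import math
from collections import Counter


def empirical_entropy_bits(symbols):
    """Zeroth-order empirical entropy of a sequence of hashable symbols, in bits/symbol."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def motion_rate_bits(vectors):
    """Hypothetical R_motion estimate: number of vectors times their empirical entropy.

    The segmentation cost is ignored here (assumption)."""
    if not vectors:
        return 0.0
    return len(vectors) * empirical_entropy_bits(vectors)


def accept_state_2(r_motion_1, r_dfd_1, r_motion_2, r_dfd_2):
    """Eq. (3): accept parameter state 2 (reject state 1) if it lowers R = R_motion + R_DFD."""
    return r_motion_2 + r_dfd_2 < r_motion_1 + r_dfd_1


# Four vectors, two of them identical: 1.5 bits/vector, 6 bits in total.
vecs = [(0, 0), (0, 0), (1, 0), (0, 1)]
print(empirical_entropy_bits(vecs), motion_rate_bits(vecs))
```

A zeroth-order entropy is the simplest choice. A real coder that exploits correlation between neighboring vectors would need a different rate estimate.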
2.2 Analytical expression for the entropy of the displaced frame difference
In this section, the DFD resulting from motion compensation is modeled in both the continuous and the uniformly quantized cases. From this model, an analytical formula for the DFD entropy is derived.
2.2.1 Continuous case

In the continuous case, the PDF of the DFD can be modeled by a continuous Laplacian distribution. Assuming stationarity and representing the process by the random variable $X(t)$, its PDF is given by [3]

$$ p(X = x) = \frac{1}{\sqrt{2}\,\sigma} \exp\!\left( -\frac{\sqrt{2}\,|x|}{\sigma} \right) , \tag{4} $$

where $x$ is a realization of the random variable $X$ and $\sigma$ its standard deviation. Moreover, the low correlation of the DFD pixel values allows $X$ to be treated as a 0th order Markov process. Its energy $E$ and entropy $H$ are then, respectively,

$$ E = \int_{-\infty}^{\infty} x^2 \, p(x) \, dx = \sigma^2 \tag{5} $$

and

$$ H = -\int_{-\infty}^{\infty} p(x) \log_2 p(x) \, dx = \log_2\!\left( \sqrt{2}\, e \, \sigma \right) . \tag{6} $$
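Eqs. (5) and (6) can be checked numerically from the density of Eq. (4) alone. The sketch below integrates $x^2 p(x)$ and $-p(x)\log_2 p(x)$ with a midpoint rule on a truncated range and compares the results with $\sigma^2$ and $\log_2(\sqrt{2}\,e\,\sigma)$. The helper names, the truncation width and the number of steps are illustrative choices, not part of the original method.

```python
import math


def laplacian_pdf(x, sigma):
    """Eq. (4): zero-mean Laplacian density with variance sigma**2."""
    return math.exp(-math.sqrt(2.0) * abs(x) / sigma) / (math.sqrt(2.0) * sigma)


def laplacian_entropy_bits(sigma):
    """Eq. (6): differential entropy log2(sqrt(2) * e * sigma), in bits."""
    return math.log2(math.sqrt(2.0) * math.e * sigma)


def numeric_energy_entropy(sigma, half_width=60.0, steps=200_000):
    """Midpoint-rule approximations of the integrals in Eqs. (5) and (6).

    The integrals are truncated to [-half_width*sigma, half_width*sigma]; the
    Laplacian tails beyond that range are negligible."""
    a = half_width * sigma
    dx = 2.0 * a / steps
    energy = entropy = 0.0
    for i in range(steps):
        x = -a + (i + 0.5) * dx  # midpoints never hit the kink at x = 0
        p = laplacian_pdf(x, sigma)
        energy += x * x * p * dx
        if p > 0.0:
            entropy -= p * math.log2(p) * dx
    return energy, entropy


sigma = 2.0
print(numeric_energy_entropy(sigma), (sigma ** 2, laplacian_entropy_bits(sigma)))
```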
2.2.2 Uniform quantization case

In a lossy compression scheme, higher compression ratios are obtained by a coarser quantization of the DFD pixel values. For a uniform quantization of the Laplacian distribution with quantization step size $Q$, the PDF is given by

$$ p(i) = \begin{cases} \displaystyle\int_{-Q/2}^{Q/2} \frac{1}{\sqrt{2}\,\sigma} \exp\!\left( -\frac{\sqrt{2}\,|nQ + x|}{\sigma} \right) dx & \text{if } i = nQ, \; n = 0, \pm 1, \pm 2, \ldots \\ 0 & \text{otherwise,} \end{cases} \tag{7} $$

namely

$$ p(0) = 1 - \exp\!\left( -\frac{\alpha}{2} \right) \tag{8} $$

$$ p(nQ) = \sinh\!\left( \frac{\alpha}{2} \right) \exp(-\alpha |n|) , \qquad |n| > 0 , \tag{9} $$

where $\alpha = \sqrt{2}\,Q/\sigma$. The energy and the entropy of this distribution are given by (see Appendix A)

$$ E = \sum_{n=-\infty}^{\infty} (nQ)^2 \, p(nQ) = 2Q^2 \sinh\!\left( \frac{\alpha}{2} \right) \frac{e^{\alpha} (e^{\alpha} + 1)}{(e^{\alpha} - 1)^3} \tag{10} $$

and

$$ H = -\sum_{n=-\infty}^{\infty} p(nQ) \log_2 p(nQ) = -p(0) \log_2 p(0) - 2 \sinh\!\left( \frac{\alpha}{2} \right) \left[ \frac{\log_2 \sinh(\alpha/2)}{e^{\alpha} - 1} - \frac{\alpha \, e^{\alpha}}{\ln 2 \, (e^{\alpha} - 1)^2} \right] . \tag{11} $$

In the entropy criterion (see Eq. (3)), the entropy of the DFD must be evaluated in order to estimate the bit rate required to code the DFD pixels. Given the model of its PDF, this entropy can be obtained from the energy of the DFD. Because of the complexity of Eqs. (10) and (11), $H$ cannot be expressed analytically as a function of $E$. However, since $E$ increases monotonically with $\sigma$ for a fixed $Q$, the standard deviation can be computed numerically by inverting Eq. (10), and the resulting value introduced in Eq. (11) gives the entropy $H$ corresponding to the energy $E$. Therefore, by computing the energy of the DFD pixels (straightforward if a block matching algorithm is used), the entropy can be derived, and this value is introduced in the entropy criterion. The validity of this approach has been demonstrated in [?].
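The energy-to-entropy procedure described above can be sketched as follows. This illustrates the idea under stated assumptions and is not the authors' implementation. Eq. (10) is inverted by bisection on $\log\sigma$, which is valid because $E$ increases monotonically with $\sigma$ for a fixed $Q$. The bracketing bounds are ad hoc choices, and all function names are hypothetical.

```python
import math


def _alpha(sigma, q):
    """alpha = sqrt(2) * Q / sigma, as defined after Eq. (9)."""
    return math.sqrt(2.0) * q / sigma


def quantized_energy(sigma, q):
    """Eq. (10): energy of the uniformly quantized Laplacian."""
    a = _alpha(sigma, q)
    ea = math.exp(a)
    return 2.0 * q * q * math.sinh(a / 2.0) * ea * (ea + 1.0) / (ea - 1.0) ** 3


def quantized_entropy_bits(sigma, q):
    """Eq. (11): entropy of the uniformly quantized Laplacian, in bits/pixel."""
    a = _alpha(sigma, q)
    ea = math.exp(a)
    p0 = 1.0 - math.exp(-a / 2.0)          # Eq. (8)
    s = math.sinh(a / 2.0)                 # p(nQ) = s * exp(-a|n|), Eq. (9)
    bracket = math.log2(s) / (ea - 1.0) - a * ea / (math.log(2.0) * (ea - 1.0) ** 2)
    return -p0 * math.log2(p0) - 2.0 * s * bracket


def sigma_from_energy(e, q, iters=200):
    """Invert Eq. (10) by bisection on log(sigma); e must be positive.

    The lower bound (alpha = 50) has negligible energy and keeps math.exp from
    overflowing; the upper bound comfortably exceeds the target energy."""
    lo = math.sqrt(2.0) * q / 50.0
    hi = 10.0 * (math.sqrt(e) + q)
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if quantized_energy(mid, q) < e:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


def dfd_entropy_from_energy(e, q):
    """Sec. 2.2.2 procedure: energy -> sigma (Eq. (10)) -> entropy (Eq. (11))."""
    return quantized_entropy_bits(sigma_from_energy(e, q), q)


# Estimated bits/pixel for a DFD block with energy 30 and step size Q = 8.
print(dfd_entropy_from_energy(30.0, 8.0))
```

Bisection is slower than a Newton iteration but needs no derivative of Eq. (10) and cannot diverge once the root is bracketed.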
3 Application of the entropy criterion to adaptive multigrid block matching

To illustrate the efficiency of the entropy criterion, we apply it to control the split procedure of an adaptive multigrid block matching motion estimation technique. A brief review of this technique is given in this section; more details can be found in [2]. In Sec. 4, coding results obtained with this motion estimation technique and the entropy criterion are presented.

The adaptive multigrid block matching technique generates a locally varying block size. This coarse segmentation is carried out by a quad-tree decomposition. The motion vectors are first estimated on a fixed, large-size grid. Blocks for which the accuracy of the obtained motion vector is not sufficient are split, and the corresponding motion vectors are refined on a finer grid until a satisfactory accuracy is achieved or a minimum block size is reached. This technique yields more accurate motion vectors in highly detailed areas and fewer motion parameters in uniform areas. However, the criterion that decides whether to split a block significantly influences the overall performance of the motion estimation procedure. A simple and widely used criterion [2, 4, 5] is the following:
If the mean square error (MSE) (or possibly another error measure) of the motion-compensated block is above a preset threshold $T$, the block is split.

$$ \mathrm{MSE}_{\mathrm{nosplit}} > T \;\Rightarrow\; \text{split} \tag{12} $$
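A similar criterion, which yields slightly better performance, is:

If the MSE of the motion-compensated block is above $T$ and if the split decreases the MSE, then the block is split.

$$ \left. \begin{array}{l} \mathrm{MSE}_{\mathrm{nosplit}} > T \\ \left( \mathrm{MSE}_{\mathrm{nosplit}} - \mathrm{MSE}_{\mathrm{split}} \right) > 0 \end{array} \right\} \;\Rightarrow\; \text{split} \tag{13} $$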
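A minimal sketch of the quad-tree split procedure driven by the threshold rules (12) and (13). It assumes full-search block matching with the MSE, a ±4-pixel search range, and an independent search for each sub-block. The refinement scheme of [2], which refines vectors on a finer grid, is not reproduced. All names are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def match_block(cur, ref, y, x, size, search=4):
    """Full-search block matching for the size x size block of `cur` at (y, x).

    Returns (mse, (dy, dx)); the vector points to ref[y+dy : y+dy+size, x+dx : x+dx+size]."""
    block = cur[y:y + size, x:x + size].astype(np.float64)
    h, w = ref.shape
    best = (float("inf"), (0, 0))
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            yy, xx = y + dy, x + dx
            if yy < 0 or xx < 0 or yy + size > h or xx + size > w:
                continue  # candidate falls outside the reference frame
            cand = ref[yy:yy + size, xx:xx + size].astype(np.float64)
            mse = float(np.mean((block - cand) ** 2))
            if mse < best[0]:
                best = (mse, (dy, dx))
    return best


def rule_eq12(mse_nosplit, mse_split, t):
    """Eq. (12): split when the unsplit MSE exceeds the threshold."""
    return mse_nosplit > t


def rule_eq13(mse_nosplit, mse_split, t):
    """Eq. (13): additionally require that splitting lowers the MSE."""
    return mse_nosplit > t and (mse_nosplit - mse_split) > 0


def multigrid(cur, ref, y, x, size, rule, t, min_size=4):
    """Quad-tree refinement; returns the leaf blocks as (y, x, size, vector)."""
    mse, vec = match_block(cur, ref, y, x, size)
    half = size // 2
    if half >= min_size:
        quads = [(y + oy, x + ox) for oy in (0, half) for ox in (0, half)]
        children = [match_block(cur, ref, qy, qx, half) for qy, qx in quads]
        mse_split = sum(c[0] for c in children) / 4.0  # four equal-size quadrants
        if rule(mse, mse_split, t):
            leaves = []
            for qy, qx in quads:  # children are searched again here (simplicity over speed)
                leaves += multigrid(cur, ref, qy, qx, half, rule, t, min_size)
            return leaves
    return [(y, x, size, vec)]


rng = np.random.default_rng(0)
base = rng.integers(0, 256, size=(40, 40))
ref, cur = base[:32, :32], base[1:33, 2:34]  # cur is ref shifted by (1, 2) pixels
print(multigrid(cur, ref, 0, 0, 16, rule_eq13, t=10.0))  # -> [(0, 0, 16, (1, 2))]
```

Because `cur` is an exact shift of `ref`, the 16×16 block is predicted perfectly and is not split. With a negative threshold, `rule_eq12` splits every block down to `min_size`, whereas `rule_eq13` still refuses any split that does not lower the MSE.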
However, neither of these criteria guarantees that the extra cost of sending more motion parameters is worth the gain obtained by decreasing the DFD energy. By applying the entropy criterion described in Sec. 2.1, the split can be controlled so as to reach the optimal bit allocation between motion parameters and DFD information. The entropy criterion (see Eq. (3)) can be written as follows:
If the extra cost of sending additional motion parameters is worth the gain obtained on the DFD side, then the block is split.

To compare this criterion with the two previously mentioned ones, the constraint that the MSE of the motion-compensated block is above $T$ can be added:

$$ \left. \begin{array}{l} \mathrm{MSE}_{\mathrm{nosplit}} > T \\ n \left( H_{\mathrm{DFD},\mathrm{nosplit}} - H_{\mathrm{DFD},\mathrm{split}} \right) > 4 H_{\vec{v},\mathrm{split}} - H_{\vec{v},\mathrm{nosplit}} \end{array} \right\} \;\Rightarrow\; \text{split} \tag{14} $$

where $n$ is the number of pixels in the block, $H_{\mathrm{DFD},\mathrm{split}}$ and $H_{\mathrm{DFD},\mathrm{nosplit}}$ are the entropies of the DFD pixels with and without split, respectively, and $H_{\vec{v},\mathrm{split}}$ and $H_{\vec{v},\mathrm{nosplit}}$ are the entropies of the motion vectors with and without split, respectively.
In this algorithm, the amount of information needed to transmit the segmentation, i.e. the quad-tree, is negligible. Therefore, the extra cost comes only from the increased number of motion vectors. The factor 4 arises because, when a block is split, four motion vectors are transmitted for it (quad-tree segmentation) instead of one. The entropy of the DFD pixels is evaluated through Eq. (11), while the entropy of the motion vectors is estimated from the vectors already available. This criterion should therefore perform optimally for $T = 0$, when all the blocks are considered for segmentation.
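Eq. (14) reduces to a per-block comparison of bit counts. The sketch below assumes that the DFD entropies have already been obtained through Eq. (11) and the motion-vector entropies from the vectors already available. The function name and the numbers in the usage lines are hypothetical and serve only to show both outcomes.

```python
def entropy_split_rule(n, mse_nosplit, t, h_dfd_nosplit, h_dfd_split, h_v_nosplit, h_v_split):
    """Eq. (14): split if MSE_nosplit > T and the DFD bits saved exceed the extra vector bits.

    n        -- number of pixels in the block
    h_dfd_*  -- DFD entropy in bits/pixel without / with split (e.g. from Eq. (11))
    h_v_*    -- motion-vector entropy in bits/vector without / with split
    """
    dfd_bits_saved = n * (h_dfd_nosplit - h_dfd_split)
    # Quad-tree split: four vectors replace one; the quad-tree itself is treated as free.
    extra_motion_bits = 4.0 * h_v_split - h_v_nosplit
    return mse_nosplit > t and dfd_bits_saved > extra_motion_bits


# 16x16 block (n = 256), 5 bits per vector in both cases: a split must save more than 15 bits.
print(entropy_split_rule(256, 40.0, 0.0, 2.10, 2.05, 5.0, 5.0))  # saves about 12.8 bits -> False
print(entropy_split_rule(256, 40.0, 0.0, 2.10, 2.00, 5.0, 5.0))  # saves about 25.6 bits -> True
```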
4 Experimental results

In this section, experimental results obtained with a motion-compensated video coding scheme are presented to demonstrate the efficiency of the entropy criterion. Figure 1 shows the block diagram of the encoder.

[Figure 1 (diagram not reproduced). Blocks: Input sequence, Transf., Quant., Inv. Quant., Inv. Transf., EC (entropy coding), Channel, Motion Estim., Motion Comp., Frame Buffer.]
Figure 1: The encoder block diagram.

The first frame is intraframe coded using a transform-based technique, in our particular case a wavelet transform as described in [6]. The following frames are motion-compensated predicted and interframe coded. In accordance with the 0th order Markov process modeling of the DFD, the latter is simply quantized and entropy coded (without transform). The motion estimation is performed by the adaptive multigrid block matching technique described in Sec. 3. The transform coefficients, the DFD coefficients and the motion vectors are entropy coded by an adaptive arithmetic coder [7].

Simulations have been carried out on the luminance component of the sequences "Table Tennis" and "Flower Garden" in CIF format (288 × 352 pixels, 8 bits/pixel, 25 frames/s). Figures 2 and 3 compare the different splitting criteria discussed in Sec. 3. Results are expressed in terms of bit rate versus threshold, and bit rate versus peak signal-to-noise ratio (PSNR).

[Figure 2 (plots not reproduced). Panels: bit rate (Mb/s) vs. threshold T; bit rate (Mb/s) vs. PSNR (dB). Curves: MSE_nosplit > T; MSE_nosplit > T and (MSE_nosplit - MSE_split) > 0; MSE_nosplit > T and entropy criterion.]
Figure 2: Split criterion comparison: bit rate vs. threshold and bit rate vs. PSNR for the sequence "Table Tennis".

[Figure 3 (plots not reproduced). Same panels and curves as Figure 2.]
Figure 3: Split criterion comparison: bit rate vs. threshold and bit rate vs. PSNR for the sequence "Flower Garden".

The two criteria defined by Eqs. (12) and (13) show a characteristic behavior. On the one hand, for a small threshold value, too many blocks are split, resulting in a high motion information overhead compared to a small gain in the bit rate needed to transmit the DFD. On the other hand, for a high threshold value, too few blocks are split, reducing the efficiency of the adaptive multigrid block matching motion estimation technique. Between these two extremes, there is an optimal threshold value which gives the best performance. However, the arbitrary nature of this optimum does not allow it to be determined in advance. It could only be found through repeated trials, which is unworkable in practice.

The entropy criterion, Eq. (14), shows a very different behavior. As expected, the performance is optimal for $T = 0$, when all the blocks are candidates for segmentation. Naturally, the performance degrades as the threshold increases, since fewer blocks are considered for splitting. Compared with the two other criteria at $T = 0$, the optimum achieved with the entropy criterion corresponds to a bit rate gain of 18% to 25% for the sequence "Table Tennis" and 10% to 15% for the sequence "Flower Garden". Even though this gain decreases significantly when the comparison
with the two other criteria is made at their respective optimal threshold value (100 < T < 200), the entropy criterion still performs better. Moreover, it must be highlighted that the entropy criterion avoids the problem of determining the threshold value, as it always performs optimally for T = 0. It therefore leads to a workable algorithm for practical applications.
5 Conclusion

In this paper, we propose an entropy criterion which allows an optimal bit allocation between motion and DFD information. It is based on realistic hypotheses about the DFD statistics and on simple computations. Thanks to this criterion, the parameters which control the motion estimation process are automatically set to their best value on-line. As a particular application, the difficulty of setting the threshold in the split procedure of an adaptive multigrid block matching algorithm has been removed. The entropy criterion therefore leads to a workable algorithm and improved performance. Further research will deal with higher order modeling of the DFD statistics as well as the application of the entropy criterion to other motion estimation techniques.
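Appendix A: Energy and entropy of a quantized Laplacian probability density function

The energy $E$ of the uniformly quantized Laplacian PDF is given by (the $n = 0$ term vanishes because of the factor $n^2$)

$$ E = \sum_{n=-\infty}^{\infty} (nQ)^2 \, p(nQ) = Q^2 \sinh\!\left( \frac{\alpha}{2} \right) \sum_{n=-\infty}^{\infty} n^2 \exp(-\alpha |n|) = 2Q^2 \sinh\!\left( \frac{\alpha}{2} \right) \sum_{n=1}^{\infty} n^2 \exp(-\alpha n) . $$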
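Using d'Alembert's ratio test [8] for series with positive terms,

$$ \sum_{n=1}^{\infty} u_n , \quad u_n > 0 : \qquad \lim_{n \to \infty} \frac{u_{n+1}}{u_n} < 1 \;\Rightarrow\; \text{the series converges,} $$

it is easy to show that the series converges: with $u_n = n^2 \exp(-\alpha n)$, the ratio $u_{n+1}/u_n = \left( \frac{n+1}{n} \right)^2 e^{-\alpha}$ tends to $e^{-\alpha} < 1$, since $\alpha > 0$. We also have

$$ \sum_{n=1}^{\infty} n^2 \exp(-\alpha n) = \frac{\partial^2}{\partial \alpha^2} \sum_{n=1}^{\infty} \exp(-\alpha n) $$

with the geometric series

$$ \sum_{n=1}^{\infty} \exp(-\alpha n) = \frac{1}{e^{\alpha} - 1} . $$

Therefore, we obtain

$$ E = 2Q^2 \sinh\!\left( \frac{\alpha}{2} \right) \frac{e^{\alpha} (e^{\alpha} + 1)}{(e^{\alpha} - 1)^3} . $$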
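The series identities used in this derivation, and in the entropy derivation that follows, can be checked numerically. The sketch compares the closed forms of $S_k = \sum_{n \ge 1} n^k e^{-\alpha n}$ for $k = 0, 1, 2$ with direct partial sums. The function names and the number of terms are arbitrary choices.

```python
import math


def closed_forms(alpha):
    """Closed forms used in Appendix A for S_k = sum_{n>=1} n**k * exp(-alpha*n)."""
    ea = math.exp(alpha)
    s0 = 1.0 / (ea - 1.0)                    # geometric series
    s1 = ea / (ea - 1.0) ** 2                # -dS_0/dalpha
    s2 = ea * (ea + 1.0) / (ea - 1.0) ** 3   # d^2 S_0/dalpha^2
    return s0, s1, s2


def partial_sums(alpha, terms=5000):
    """Direct partial sums of the same three series."""
    return tuple(
        sum(n ** k * math.exp(-alpha * n) for n in range(1, terms + 1)) for k in (0, 1, 2)
    )


for alpha in (0.3, 1.0, 2.5):
    print(alpha, closed_forms(alpha), partial_sums(alpha))
```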
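The derivation of the entropy $H$ of the quantized Laplacian PDF is similar:

$$ \begin{aligned} H &= -\sum_{n=-\infty}^{\infty} p(nQ) \log_2 p(nQ) \\ &= -p(0) \log_2 p(0) - \left( \sum_{n=-\infty}^{-1} + \sum_{n=1}^{\infty} \right) \sinh\!\left( \frac{\alpha}{2} \right) \exp(-\alpha |n|) \log_2\!\left[ \sinh\!\left( \frac{\alpha}{2} \right) \exp(-\alpha |n|) \right] \\ &= -p(0) \log_2 p(0) - 2 \sinh\!\left( \frac{\alpha}{2} \right) \left\{ \log_2 \sinh\!\left( \frac{\alpha}{2} \right) \sum_{n=1}^{\infty} \exp(-\alpha n) + \frac{\alpha}{\ln 2} \sum_{n=1}^{\infty} (-n) \exp(-\alpha n) \right\} . \end{aligned} $$

It is easy to prove that both series converge using d'Alembert's ratio test. We also have

$$ \sum_{n=1}^{\infty} (-n) \exp(-\alpha n) = \frac{\partial}{\partial \alpha} \sum_{n=1}^{\infty} \exp(-\alpha n) = \frac{-e^{\alpha}}{(e^{\alpha} - 1)^2} . $$

The expression for the entropy $H$ follows directly:

$$ H = -p(0) \log_2 p(0) - 2 \sinh\!\left( \frac{\alpha}{2} \right) \left[ \frac{\log_2 \sinh(\alpha/2)}{e^{\alpha} - 1} - \frac{\alpha \, e^{\alpha}}{\ln 2 \, (e^{\alpha} - 1)^2} \right] . $$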
References

[1] H.G. Musmann, P. Pirsch, and H.J. Grallert. Advances in picture coding. Proc. IEEE, vol. 73, pp. 523-548, April 1985.
[2] F. Dufaux and M. Kunt. Multigrid block matching motion estimation with an adaptive local mesh refinement. In SPIE Proc. Visual Communications and Image Processing '92, vol. 1818, pp. 97-109, Boston, MA, November 1992.
[3] A. Papoulis. Probability, Random Variables, and Stochastic Processes. McGraw-Hill, New York, 1965.
[4] C. Labit and H. Nicolas. Compact motion representation based on global features for semantic image sequence coding. In SPIE Proc. Visual Communications and Image Processing '91, vol. 1605, pp. 697-709, Boston, MA, November 1991.
[5] M.H. Chan, Y.B. Yu, and A.G. Constantinides. Variable size block matching motion compensation with applications to video coding. IEE Proc., vol. 137, no. 4, pp. 205-212, August 1990.
[6] T. Ebrahimi. Perceptually Derived Localized Linear Operators: Application to Image Sequence Compression. PhD thesis, Swiss Federal Institute of Technology, Lausanne, Switzerland, 1992.
[7] I.H. Witten, R.M. Neal, and J.G. Cleary. Arithmetic coding for data compression. Commun. of the ACM, vol. 30, no. 6, pp. 520-540, June 1987.
[8] G. Arfken. Mathematical Methods for Physicists. Academic Press, 1970.