SPATIALLY ADAPTIVE INTEGER LIFTING WITH NO SIDE INFORMATION FOR LOSSLESS VIDEO CODING

G. Charith K. Abhayaratne
Centre for Mathematics and Computer Science (CWI), Kruislaan 413, P.O. Box 94079, 1090 GB Amsterdam, The Netherlands.
Email: [email protected]

ABSTRACT
A spatially adaptive integer lifting scheme is presented that chooses the prediction and update functions independently of each other by considering the variance normalised auto correlation of the input data. Since the adaptation of the prediction and update lifting steps is based on the members of the prediction and update templates respectively, no side information regarding the choice of the prediction or update function needs to be sent to the inverse transform in order to achieve perfect reconstruction. This type of adaptive decomposition scheme can be used in lossless coding of signals with highly spatially varying correlation behaviour. We demonstrate that the lossless coding of motion compensated prediction residual frames in lossless video coding benefits more from such an adaptive wavelet transform than from wavelet transforms with fixed basis functions.

1. INTRODUCTION

Wavelet transforms are widely considered the best transform for lossy image coding. With the introduction of the lifting scheme [1, 2, 3], in which integer inputs can be mapped to integers, wavelet transforms have been used in lossless image coding as well. With integer lifting schemes, quality and spatial scalability features can be added to lossless image coding algorithms. In lifting schemes, a filter bank operation, which is the most widely used realisation of the wavelet transform, is split into a finite sequence of simple filtering steps, the lifting steps. This corresponds to the factorisation of the polyphase matrix of the filter bank into elementary matrices. Lifting has been used in constructing both orthogonal and biorthogonal wavelets. The idea of lifting originated as a method of building second generation wavelets, where the wavelets are not necessarily translates and dilates of a mother wavelet as in the first generation wavelets. Therefore, the lifting steps can be used to construct better signal representations based on signal characteristics.

This work was completed while the author was with the Signal and Image Processing Group (SIPG), Dept. of Electronic and Electrical Engineering, University of Bath, BATH BA2 7AY, United Kingdom.

Fig. 1. Lifting Block Diagram. (X is split into Xe and Xo; the P and U steps produce the low pass (LP) output s and the high pass (HP) output d.)

1.1. The Lifting Scheme

The lifting scheme, as shown in Fig. 1, is summarised here in order to introduce the notation used in this paper. The first step of lifting is splitting the original sequence (X) into two sub sequences containing the odd indexed samples (Xo) and the even indexed samples (Xe).

$$X_o:\ d_i \leftarrow x_{2i+1} \qquad (1)$$
$$X_e:\ s_i \leftarrow x_{2i} \qquad (2)$$
Then the lifting steps, dual lifting (P) and primal lifting (U), are performed on these two sequences. In dual lifting, which is also called the prediction lifting step, the odd indexed samples are predicted using a predictor P( ) based on the neighbouring even indexed samples, and the prediction errors (details) are recorded in place of the original sample values.

$$d_i \leftarrow d_i - \lfloor P(s_A) \rceil, \quad \text{where } A = (i - \lceil N/2 \rceil + 1, \ldots, i + \lfloor N/2 \rfloor) \qquad (3)$$
The operator $\lfloor x \rceil$ rounds x to the nearest integer. This guarantees integer outputs for integer inputs. N is the number of dual vanishing moments, which sets the smoothness of the P function. In the U lifting step, the even samples are replaced with smoothed values using the update operator U( ) on the previously computed details. The U( ) operator is designed to maintain the correct running average of the original sequence, in order to avoid aliasing.

$$s_i \leftarrow s_i + \lfloor U(d_B) \rceil, \quad \text{where } B = (i - \lfloor \tilde{N}/2 \rfloor, \ldots, i + \lceil \tilde{N}/2 \rceil - 1) \qquad (4)$$
The operator preserves the first $\tilde{N}$ moments in the s sequence. The lazy wavelet is lifted to a transform with the required properties (number of vanishing moments in the analysis and synthesis filters, as in the filter bank approach) by applying the P and U lifting pair of operations one or more times. Finally, the output streams are normalised using the normalising factor k, the integer realisation of which can be found in [2]:

$$d_i \leftarrow d_i \times 1/k \qquad (5)$$
$$s_i \leftarrow s_i \times k \qquad (6)$$
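As a minimal sketch of Eqs. (1) to (4), the following assumes N = Ñ = 2 (linear interpolation in both lifting steps), repetition of the border sample at the edges, and round-half-up as the rounding operator. These choices and the function names `forward_22` and `inverse_22` are illustrative assumptions, not the paper's implementation; the normalisation of Eqs. (5) and (6) is omitted.

```python
def forward_22(x):
    """One level of (2,2) integer lifting on an even-length list of ints."""
    s = x[0::2]  # even indexed samples, Eq. (2)
    d = x[1::2]  # odd indexed samples, Eq. (1)
    n = len(d)
    # Prediction, Eq. (3) with N = 2: rounded mean of the two neighbouring
    # even samples; (a + b + 1) // 2 equals floor((a + b) / 2 + 1/2).
    d = [d[i] - ((s[i] + s[min(i + 1, n - 1)] + 1) // 2) for i in range(n)]
    # Update, Eq. (4) with N~ = 2: rounded quarter of the sum of the two
    # neighbouring details, i.e. half of their interpolated value.
    s = [s[i] + ((d[max(i - 1, 0)] + d[i] + 2) // 4) for i in range(n)]
    return s, d


def inverse_22(s, d):
    """Undo forward_22 by running the same steps in reverse order."""
    n = len(d)
    # The details are unchanged by the update step, so it can be subtracted.
    s = [s[i] - ((d[max(i - 1, 0)] + d[i] + 2) // 4) for i in range(n)]
    # With the even samples restored, the same prediction can be added back.
    d = [d[i] + ((s[i] + s[min(i + 1, n - 1)] + 1) // 2) for i in range(n)]
    x = [0] * (2 * n)
    x[0::2] = s
    x[1::2] = d
    return x


signal = [10, 12, 14, 13, 9, 8, 8, 20]
low, high = forward_22(signal)
restored = inverse_22(low, high)
```

Because the inverse recomputes exactly the same rounded quantities from data it already holds, `restored` equals `signal` regardless of the rounding rule chosen, which is the property the integer lifting framework relies on.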
1.2. Adaptive Lifting

Recently, a nonlinear approach for adaptive lifting that can be used in still image coding has been developed using adaptive switching between different predictors based on the local edginess of the image [4, 5, 6]. In these examples, the update lifting step is performed prior to the prediction step, so that the preservation of the moving average in the low pass signal is not affected by the adaptive prediction step. With these methods, side information regarding the adaptive predictor has to be sent to the inverse transform. Another adaptive lifting scheme, which uses an adaptive update lifting step followed by a fixed predictor (update-first lifting), has been proposed in [7]. Since the update step, which is usually an averaging process, is performed first in the above methods, these schemes in their original form are not capable of mapping integers to integers in the updated channels. Therefore, such methods cannot be used in lossless coding. Further, in the adaptive update-first case, the concept of vanishing and preserving moments cannot be properly interpreted, as the final prediction depends on the adaptive update process.
In this paper, we present a spatially adaptive lifting scheme that uses interpolating functions as P( ) and U( ). Since it follows the classical framework of lifting, it can map integers to integers with the rounding operations. The variance normalised auto correlation of the prediction or update template, whose members are available a priori in both the forward and the inverse transforms, is used as the selection criterion for the best interpolating function in the prediction and update lifting steps respectively. This enables perfect reconstruction without sending any side information about the choice of the P( ) and U( ) functions to the synthesis transform. The rest of this paper is organised as follows. In section 2 we present the novel spatially adaptive lifting technique. The use of this technique in lossless video coding is demonstrated in section 3, followed by the concluding remarks in section 4.
2. SPATIALLY ADAPTIVE LIFTING

We denote the generic lifting transforms as $(N, \tilde{N})$, where $N, \tilde{N} = 0, 1, 2, \ldots$ are the numbers of vanishing and preserving moments in the P( ) and U( ) lifting steps, respectively. In interpolating wavelets, P lifting can be regarded as interpolating the s channel to obtain the points missing due to the previous sub sampling, with the d channel recording the error between the interpolated value and the corresponding original value. Similarly, in U lifting, these interpolation errors in the d channel are interpolated and a half of the interpolated value is added to the corresponding element in the s channel. The spatially adaptive scheme locally chooses the interpolator that best fits the data to be interpolated in the P and U steps.
2.1. Signal Interpolation

Traditionally, interpolation is mostly associated with image resampling applications, where a discrete image is interpolated to a continuous image, which is then sampled back to the discrete case [8]. Interpolation is mainly concerned with fitting a continuous function to the discrete points of a digital signal. The most common interpolation functions are the nearest neighbour, linear and cubic functions, which correspond to the polynomials with one, two and four vanishing moments, respectively, used in the P and U lifting steps. It is well known that a signal can be reconstructed from its samples if the signal is band limited and the sampling is done at a frequency higher than the Nyquist rate. However, most real-life signals, and thereby their sub sampled channels, cannot be considered band limited. The down sampling process can also be considered as replicating the frequency spectrum at the multiples of $2\omega_s$, where $\omega_s$ is the original sampling frequency. The interpolation process removes those replicas of the spectrum. According to the Wiener-Khintchine theorem, the spectrum of a finite energy signal can be obtained as the Fourier transform of the auto correlation sequence of the signal. This suggests that the auto correlation sequence can be used to determine the interpolation criteria in the spatial domain.
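The Wiener-Khintchine relation invoked above can be checked numerically on a short sequence. The sketch below is only an illustration under stated assumptions: it uses the circular (periodic) auto correlation and a direct DFT, and the names `dft` and `circular_autocorrelation` as well as the sample values are hypothetical.

```python
import cmath


def dft(seq):
    """Direct discrete Fourier transform of a short sequence."""
    n = len(seq)
    return [
        sum(seq[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]


def circular_autocorrelation(x):
    """Unnormalised circular auto correlation for lags 0 .. len(x) - 1."""
    n = len(x)
    return [sum(x[m] * x[(m + lag) % n] for m in range(n)) for lag in range(n)]


x = [3, 1, -2, 5, 0, 4]
# Energy spectrum computed directly from the signal.
energy_spectrum = [abs(c) ** 2 for c in dft(x)]
# The same spectrum computed from the auto correlation sequence.
from_autocorr = [c.real for c in dft(circular_autocorrelation(x))]
```

Up to floating point error the two lists agree, so a statistic derived from the auto correlation (such as the unit lag coefficient used in the next section) carries information about the spectrum that the interpolator should preserve.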
2.2. Spatially adaptive interpolation

The nearest neighbour and the linear interpolation functions use two successive points in the down sampled stream to determine the point equidistant from those two points. The use of two successive points corresponds to the unit lag auto correlation of the down sampled signal. In this approach, the local point is interpolated with the interpolator that, after the interpolation process, maintains the unit lag normalised auto correlation of the down sampled signal at the local point of interest. The selection criterion is derived as follows. Let X and Y be the values of the two points between which a value is to be interpolated, labelled such that $|Y| \geq |X|$; X and Y are not necessarily on the left and the right sides respectively. Since the normalised auto correlation is considered in this analysis, the two values 1 and $X/Y$, where $-1 \leq X/Y \leq 1$ and $Y \neq 0$, are used as the two values to be interpolated. We define the variance normalised unit lag auto correlation coefficient $r(1)_{X,Y}$ of the two values to be interpolated as

$$r(1)_{X,Y} = \frac{2(X/Y)}{1 + (X/Y)^2} \qquad (7)$$

The variance normalised unit lag auto correlation $r(1)_{X,Z,Y}$ after interpolating X and Y with Z is

$$r(1)_{X,Z,Y} = \frac{3}{2}\,\frac{Z + Z(X/Y)}{1 + (X/Y)^2 + Z^2} \qquad (8)$$

The interpolated value Z is $\frac{1}{2}(1 + X/Y)$ for the linear interpolation case. For the nearest neighbour case, Z is either $X/Y$, which is the lower absolute value, or 1, which is the higher absolute value. The nearest neighbour interpolation used in this work differs from the prediction and update methods used in the (1,1) transform, where the left side value in the P lifting and the right side value in the U lifting are always chosen irrespective of the value of the other neighbouring point. Fig. 2 shows the unit lag normalised auto correlation computed using the two points with values 1 and $X/Y$ for the down sampled signal, and the corresponding unit lag normalised auto correlation using the three points $(1, Z, X/Y)$, for values of $X/Y$ in the range $-1 \leq X/Y \leq 1$ in increments of 0.01. The absolute differences between the unit lag normalised auto correlation of each of the three interpolated signals and that of the two points to be interpolated are shown in Fig. 3. The difference resulting from zero padded interpolation, which corresponds to the lazy wavelet, i.e. no prediction or update process, is also shown in the figure. It can be seen from the plots that for $0 < X/Y < 1/\sqrt{2}$ the nearest lower neighbour interpolation provides the closest auto correlation match, and for $1/\sqrt{2} \leq X/Y \leq 1$ the linear interpolator provides the closest match. The values $X/Y < 0$ correspond to X and Y with different signs and to negative correlation. It is clear from Fig. 3.a that the nearest lower neighbour interpolation produces the closest match in this region. However, due to the sign difference of the neighbours, and thereby the negative auto correlation, a zero padded interpolation is used in this region. The same treatment is given when X = Y = 0.

Fig. 2. Resulting auto-correlation values for different interpolators. (Normalised auto-correlation coefficient against the X/Y ratio, from −1 to 1, for the original 2-point correlations and for the (X+Y)/2, Y and X predictions.)

Fig. 3. Resulting auto-correlation difference for different interpolators. (Error in the normalised auto-correlation coefficient against the X/Y ratio for the 0, (X+Y)/2, Y and X predictions. Panel 3.a: $-1 \leq X/Y \leq 1$. Panel 3.b: $0 \leq X/Y \leq 1$, with X/Y = 1/sqrt(2) marked.)

2.3. Extension to cubic interpolation

Cubic interpolation uses four points to find the value at the mid point (the two nearest neighbours on either side), whereas the above analysis for the nearest neighbour and the linear interpolation considered only two points. The weights in cubic interpolation for the points at positions i−1, i, i+1 and i+2 are $\{-\frac{1}{16}, \frac{9}{16}, \frac{9}{16}, -\frac{1}{16}\}$ respectively. This can be interpreted as a linear interpolation of two points at $i + \frac{1}{8}$ and $i + \frac{7}{8}$, the values of which are computed by extrapolating from the values at i−1 and i, and from the values at i+1 and i+2, with weights 9 and −1 on the nearer and the farther point respectively, as shown in Fig. 4. Indeed, with $a = (9x(i) - x(i-1))/8$ and $b = (9x(i+1) - x(i+2))/8$, the mid point value $(a+b)/2 = (-x(i-1) + 9x(i) + 9x(i+1) - x(i+2))/16$ reproduces the cubic weights. The values a and b (Fig. 4) at positions $i + \frac{1}{8}$ and $i + \frac{7}{8}$ can be considered as X and Y in the previous analysis for linear / nearest neighbour interpolation. The use of cubic interpolation can then be determined by considering the ratio $X/Y$. The same procedure can be used to extend this algorithm to other higher order interpolations as well.

Fig. 4. Two point interpretation of the cubic interpolation. (Samples x(i−1), x(i), x(i+1), x(i+2) with weights −1, 9, 9, −1; $a = (9x(i) - x(i-1))/8$, $b = (9x(i+1) - x(i+2))/8$, and the value at $x(i + \frac{1}{2})$ is $(a+b)/2$.)

2.4. The algorithm

The adaptive interpolation function selection is summarised in Fig. 5. The same algorithm can be used in both the P and U lifting steps. In P lifting, the interpolated value is used to predict the members of the d channel, whereas in U lifting a half of the interpolated value is added to the corresponding members of the s channel to update them with the running average.

3. A LOSSLESS VIDEO CODING APPLICATION
In any given frame of a video sequence, there are regions with different amounts of motion. Regions with low motion content produce smooth regions of low valued residuals due to accurate motion prediction. As a result, highly decorrelated regions can be seen in the residual frames. Regions with high motion cause high valued residuals, decorrelated only to a certain extent, due to inaccurate prediction. Therefore, the amount of local motion present in a frame gives rise to highly and weakly decorrelated regions in a motion compensated prediction residual field. A priori knowledge of such regions can be used to choose a suitable wavelet to transform the residual field. The spatially adaptive selection of interpolating functions, and thereby the spatially adaptive lifting scheme presented in this paper, is capable of adaptively detecting such regions and varying the number of vanishing / preserving moments in the prediction and update polynomials accordingly.

The performance of the spatially adaptive lifting scheme $(N, \tilde{N})$, where $N$ and $\tilde{N}$ can be 0, 1, 2 or 4 depending on the local statistics, in the lossless coding of inter frames is compared with that of the wavelets with fixed vanishing moments: (0,0), (1,1), (2,2) and (4,4). The 2D transforms in the above cases were obtained as separable row and column realisations of the 1D transforms. The average weighted entropy values for the Y components (8 bpp) of the inter frames of four different sequences are compared in Table 1. The adaptive scheme $(N, \tilde{N})$ achieves the lowest entropy averaged over the four sequences (3.858 bpp) and the lowest entropy for the Mobile sequence; for Claire, Kiel and Unicycle it is within about 0.03 bpp of the best fixed transform.
Sequence    (0,0)   (1,1)   (2,2)   (4,4)   (N,Ñ)
Claire      2.174   2.204   2.160   2.208   2.164
Mobile      4.524   4.531   4.594   4.625   4.469
Kiel        4.499   4.428   4.439   4.452   4.439
Unicycle    4.494   4.383   4.326   4.338   4.359
Average     3.923   3.886   3.880   3.906   3.858

Table 1. Average zero order entropy (in bpp) comparison for adaptive lifting
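For readers who wish to reproduce figures of this kind, a zero order entropy measure can be sketched as follows. The paper does not spell out how the subband entropies are weighted, so weighting by the number of coefficients in each subband is an assumption here, as are the function names `zero_order_entropy` and `weighted_entropy`.

```python
import math
from collections import Counter


def zero_order_entropy(samples):
    """Zero order (memoryless) entropy of a symbol sequence, in bits per sample."""
    counts = Counter(samples)
    total = len(samples)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def weighted_entropy(subbands):
    """Mean of the subband entropies, weighted by subband size (assumed weighting)."""
    total = sum(len(band) for band in subbands)
    return sum(len(band) * zero_order_entropy(band) for band in subbands) / total


# Toy example: one subband with two equally likely values, one constant subband.
example = weighted_entropy([[0, 0, 1, 1], [3, 3, 3, 3]])
```

In this toy example the first subband costs 1 bit per sample and the second costs none, so the weighted figure is 0.5 bpp; the values in Table 1 would be obtained by applying such a measure to the transform coefficients of each inter frame and averaging over frames.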
4. CONCLUSIONS

A spatially adaptive lifting scheme that maps integers to integers was designed, based on the classical lifting framework and the adaptive selection of the interpolating polynomials according to the variance normalised unit lag auto correlation coefficients of the prediction and update templates. This makes it possible to choose different predictors and update filters, independently of each other, at a given point in the signal.
X = x(i); Y = x(i+1);
if ((X == 0) AND (Y == 0))
    {"Use No Interpolation"};
else {
    if (sign(X) == sign(Y)) {
        if (abs(X) >= abs(Y)) {ratio = abs(Y)/abs(X)};
        else                  {ratio = abs(X)/abs(Y)};
        if (ratio < 1/sqrt(2)) {
            if (abs(X) >= abs(Y)) {"Nearest Neighbour Int. with Y"};
            else                  {"Nearest Neighbour Int. with X"};
        }
        else {
            a = (9*x(i) - x(i-1))/8;
            b = (9*x(i+1) - x(i+2))/8;
            if (abs(a) >= abs(b)) {ratio_c = abs(b)/abs(a)};
            else                  {ratio_c = abs(a)/abs(b)};
            if (ratio < ratio_c) {"Linear Interpolation"};
            else                 {"Cubic Interpolation"};
        }
    }
    else {"Use No Interpolation"}
}
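The listing above can be turned into a small runnable sketch. The Python below follows the same decision order; the function names, the returned labels, the value 0 for the zero padded case and the guard against a = b = 0 are assumptions added for illustration. The two helper functions evaluate Eqs. (7) and (8) so that the $1/\sqrt{2}$ threshold can be checked numerically. Rounding of the returned value (and halving in the U step) is left to the caller.

```python
import math


def r_two_point(t):
    """Eq. (7): unit lag auto correlation of the pair (1, t), with t = X/Y."""
    return 2 * t / (1 + t * t)


def r_three_point(t, z):
    """Eq. (8): unit lag auto correlation of the triple (1, z, t)."""
    return 1.5 * (z + z * t) / (1 + t * t + z * z)


def match_error(t, z):
    """Absolute auto correlation mismatch caused by interpolating with z."""
    return abs(r_three_point(t, z) - r_two_point(t))


def sign(v):
    return (v > 0) - (v < 0)


def select_interpolator(x, i):
    """Choose the interpolator for the point between x[i] and x[i+1].

    Requires 1 <= i <= len(x) - 3 so that the cubic template exists.
    Returns (label, unrounded interpolated value).
    """
    X, Y = x[i], x[i + 1]
    if (X == 0 and Y == 0) or sign(X) != sign(Y):
        return "none", 0  # zero padded interpolation
    ratio = min(abs(X), abs(Y)) / max(abs(X), abs(Y))
    if ratio < 1 / math.sqrt(2):
        # Nearest neighbour with the lower magnitude neighbour.
        return "nearest", X if abs(X) < abs(Y) else Y
    a = (9 * x[i] - x[i - 1]) / 8
    b = (9 * x[i + 1] - x[i + 2]) / 8
    largest = max(abs(a), abs(b))
    # Assumption: treat a = b = 0 as ratio_c = 0 to avoid dividing by zero.
    ratio_c = min(abs(a), abs(b)) / largest if largest else 0.0
    if ratio < ratio_c:
        return "linear", (X + Y) / 2
    return "cubic", (a + b) / 2


choice = select_interpolator([0, 8, 10, 18], 1)
```

Since the function reads only template samples that the inverse transform also has at that point, the decoder can repeat the same call and arrive at the same choice, which is why no side information is needed.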
Fig. 5. AL-2 Summary.

The algorithm was first derived for the linear and the nearest neighbour interpolations, and it was subsequently shown how to extend it to higher order interpolations. In the case of the nearest neighbour interpolation, the interpolated value is chosen from either the left or the right neighbour, whereas in the corresponding (1,1) transform (Haar wavelet) it is always the left neighbour in the P lifting step and the right neighbour in the U lifting step. A zero padded interpolator is also considered for the case where the input signal is highly decorrelated. The main advantage is that both adaptivity and integer mapping can be achieved with no overhead coding cost for sending information about the interpolator used at different spatial points. Finally, our experimental results show that lossless video coding can benefit from employing such adaptive decomposition schemes in the lossless coding of inter frames, which contain highly varying spatial correlations according to the motion content of the frame and the accuracy of the motion compensation process.

5. REFERENCES

[1] R. Calderbank, I. Daubechies, W. Sweldens, and B.-L. Yeo, “Wavelet transforms that map integers to integers,” Applied and Computational Harmonic Analysis, vol. 5, no. 3, pp. 332–369, 1998.
[2] I. Daubechies and W. Sweldens, “Factoring wavelet transforms into lifting steps,” J. Fourier Anal. Appl., vol. 4, no. 3, pp. 245–267, 1998.
[3] W. Sweldens, “The lifting scheme: A new philosophy in biorthogonal wavelet constructions,” in Wavelet Applications in Signal and Image Processing III, A. F. Laine and M. Unser, Eds., 1995, Proc. SPIE 2569, pp. 68–79.
[4] R. Claypoole, G. Davies, W. Sweldens, and R. Baraniuk, “Non-linear wavelet transforms for image coding using lifting,” in 31st Asilomar Conference on Signals, Systems and Computers, IEEE Comp. Soc., Los Alamitos, CA, 1998, vol. 1, pp. 662–667.
[5] R. Claypoole, R. Baraniuk, and R. Novak, “Adaptive wavelet transform via lifting,” in International Conference on Acoustics, Speech and Signal Processing, Piscataway, NJ, 1998, vol. 3, pp. 1513–1516.
[6] R. Claypoole, G. Davies, W. Sweldens, and R. Baraniuk, “Lifting for non-linear image processing,” in Wavelet Applications in Signal and Image Processing VII, 1999, Proc. SPIE 3813, pp. 372–383.
[7] G. Piella and H.J.A.M. Heijmans, “Adaptive lifting schemes with perfect reconstruction,” IEEE Transactions on Signal Processing, vol. 50, no. 7, pp. 1620–1630, July 2002.
[8] J.A. Parker, R.V. Kenyon, and D.E. Troxel, “Comparison of interpolating methods for image resampling,” IEEE Transactions on Medical Imaging, vol. MI-2, no. 1, pp. 31–39, March 1983.