HIERARCHICAL MOTION COMPENSATION WITH SPATIAL AND LUMINANCE TRANSFORMATIONS
Nuno M. M. Rodrigues†‡, Vítor M. M. da Silva‡ and Sérgio M. M. de Faria†‡
†Escola Superior de Tecnologia e Gestão, Instituto Politécnico de Leiria, Portugal
‡Instituto de Telecomunicações, Dep. Eng. Electrotécnica - Univ. de Coimbra, Portugal
ABSTRACT
process is achieved through the use of a spatial mapping (usually affine, perspective or bilinear), which provides a more general approach to motion estimation and enables complex motion to be tracked more efficiently [3]. Among these geometric transforms, bilinear transformations [4] have been shown to be the most efficient. Using this technique, a square block is transformed into a quadrilateral by mapping the vertices of the original block onto a set of new vertices. Generally, the use of geometric transforms for MC allows a reduction in the bit rate for the same quality of the reconstructed sequence, or a gain in quality for the same bit rate [4, 5]. Nevertheless, non-uniform illumination changes in the spatial and temporal domains, masking between objects and uncovered background are still difficult to compensate using spatial transformations, because they represent changes in the pixels' luminance rather than a simple spatial deformation. In order to overcome this problem, we propose the use of transformations in the luminance domain, associated with spatial transforms and a hierarchical block size segmentation [6].
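As an illustration of the bilinear mapping of a square block onto a quadrilateral, the following minimal Python sketch interpolates the displacement of every pixel from the displacements of the four block vertices. The function name `bilinear_map`, the vertex ordering and the choice of placing the vertices at the corner pixel centres are assumptions made for this example, not details taken from the paper; sampling the reference frame at the resulting non-integer positions is left out.

```python
def bilinear_map(size, corners):
    """Return, for each pixel (x, y) of a size x size block, its mapped position.

    corners: displacements (dx, dy) of the top-left, top-right, bottom-left
    and bottom-right vertices (hypothetical ordering chosen for this sketch).
    """
    tl, tr, bl, br = corners
    n = size - 1  # vertices are assumed to sit on the corner pixel centres
    mapped = []
    for y in range(size):
        v = y / n  # vertical weight, 0 at the top row and 1 at the bottom row
        row = []
        for x in range(size):
            u = x / n  # horizontal weight, 0 at the left and 1 at the right
            w_tl, w_tr = (1 - u) * (1 - v), u * (1 - v)
            w_bl, w_br = (1 - u) * v, u * v
            dx = w_tl * tl[0] + w_tr * tr[0] + w_bl * bl[0] + w_br * br[0]
            dy = w_tl * tl[1] + w_tr * tr[1] + w_bl * bl[1] + w_br * br[1]
            row.append((x + dx, y + dy))
        mapped.append(row)
    return mapped


# Stretch an 8 x 8 block horizontally: the right-hand vertices move 2 pixels.
positions = bilinear_map(8, [(0, 0), (2, 0), (0, 0), (2, 0)])
```

When the four vertex displacements are equal, the mapping reduces to the single translation used by BMA, which is why BMA can be seen as a special case of this transform.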
In this paper we present a new method for motion compensation, which combines transformations in the spatial and luminance domains. In most standardised video compression algorithms, motion is assumed to be purely translational, as in the traditional block matching algorithm (BMA). This assumption is not always correct: in most scenes, namely head and shoulders, object motion includes rotations and zooms, among other effects, which cannot be represented as translations. In order to compensate these complex motion features more accurately, we applied geometric transformations (BMGT) for motion compensation, as described in our previous work. Nevertheless, spatial transformations alone are unable to compensate situations like uncovered background, masking between objects or changes in lighting conditions. We therefore introduced an additional transformation in the luminance domain (BMGTI), which has proven appropriate to overcome such problems. Experimental results have shown that BMGTI exceeds the performance of BMGT by about 2 dB, and achieves better results than the global brightness compensation (GBC) technique. This method is implemented using a spatial hierarchical block structure, which allows video encoding at low bit rates and a reduction of the computational complexity.
2. TRANSFORMATIONS IN THE LUMINANCE DOMAIN
In order to perform the motion estimation, we obtain a block of pixels, Q, of the current frame, from a block of pixels, P, in the reference frame. Hence, we transform the luminance of each pixel of the reference block using two scalar factors: a scaling factor c and an adding factor b, determined such that:

$$q_{ij} = p_{ij} \times c + b \qquad (1)$$
1. INTRODUCTION
Motion compensation (MC) is one of the most important techniques used in traditional hybrid video compression schemes. This technique allows the exploitation of similarities between successive frames of an image sequence, substantially improving the efficiency of the signal compression. Among the MC techniques, the block matching algorithm (BMA) [1, 2] is widely used and has been adopted in most video coding standards. This method assumes that the motion of every pixel inside an image block is similar and can be represented by a translation, and thus by a single linear motion vector. These assumptions are not always correct, as in most video sequences, namely head and shoulders, the camera is normally close to the object and object motion generally consists of rotations, zooms and other complex effects. Thus, the traditional method may lead to a loss of coding efficiency, due to its inability to compensate more complex motion. Increasing the pixel accuracy leads to better results, but does not solve the problem. Geometric transforms must therefore be used to properly compensate the objects' motion and thus improve the MC efficiency. When using these techniques, blocks of pixels are deformed appropriately to best model their complex motion. The estimation
0-7803-6725-1/01/$10.00 ©2001 IEEE
for every pixel $p_{ij}$ and $q_{ij}$ of blocks P and Q, respectively. For blocks with M x N pixels, the estimation of the transformation factors gives rise to an over-determined system with M x N equations and two unknowns. This system is solved using a least squares approximation, such that the values of the compensation factors, c and b, are determined in order to minimise the cumulative quadratic error for the M x N equations of a block, given by:

$$e = \sum_{i=1}^{M} \sum_{j=1}^{N} \left[ q_{ij} - (p_{ij} \times c + b) \right]^2 \qquad (2)$$
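The minimisation of the quadratic error in (2) is an ordinary linear least-squares fit, and can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `luminance_factors` is hypothetical, and the fallback used for a perfectly flat reference block (where c is not determined by the data) is an assumption of this sketch.

```python
def luminance_factors(P, Q):
    """Least-squares fit of q ~ c * p + b over two equally sized blocks.

    P is the reference block and Q the current block, both given as lists
    of rows of luminance values.
    """
    p = [value for row in P for value in row]
    q = [value for row in Q for value in row]
    n = len(p)  # number of equations, M x N
    sum_p, sum_q = sum(p), sum(q)
    sum_pp = sum(x * x for x in p)
    sum_pq = sum(x * y for x, y in zip(p, q))
    denominator = n * sum_pp - sum_p * sum_p
    if denominator == 0:
        # Flat reference block: any c fits equally well, so keep c = 1 and
        # let b absorb the mean difference (an assumption of this sketch).
        return 1.0, (sum_q - sum_p) / n
    c = (n * sum_pq - sum_p * sum_q) / denominator
    b = (sum_q - c * sum_p) / n
    return c, b


# A current block that is exactly 2 * reference + 5 gives c = 2, b = 5.
reference = [[10, 20], [30, 40]]
current = [[25, 45], [65, 85]]
c, b = luminance_factors(reference, current)
```

The closed form comes from setting the partial derivatives of e with respect to c and b to zero, which yields two linear equations in the two unknowns.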
The solution of this system, using a pseudo-inverse matrix, is a set of two algebraic expressions that allow the compensation factors to be determined directly from the pixels of both image blocks, Q and P:
4. using the quantised compensation factors, apply the luminance transformation to the reference block, $B_{ref}^{T_i}(x, y)$, resulting in a new block $B_{ref,l}^{T_i}(x, y)$;
5. determine the mean absolute error (MAE) between $B_{ref,l}^{T_i}(x, y)$ and the original block $B_{orig}(x, y)$;
6. choose the bilinear mapping and luminance compensation factors that minimise the MAE for this block of the reference image.

In order to evaluate the performance of this technique, different video sequences were encoded using MC only. The first frame is intraframe coded and is assumed to be available at the receiver. The video encoder uses a hierarchical structure of block sizes, ranging from 32 x 32 to 8 x 8 pixels. The type of MC technique used for each block is chosen considering the reconstruction quality and the associated overhead bits. Block segmentation restrictions are used to control the generated bit rate, which is kept as close as possible to a chosen target value.
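$$b = \frac{\sum_{i=1}^{M}\sum_{j=1}^{N} q_{ij} \, \sum_{i=1}^{M}\sum_{j=1}^{N} p_{ij}^2 - \sum_{i=1}^{M}\sum_{j=1}^{N} p_{ij} \, \sum_{i=1}^{M}\sum_{j=1}^{N} p_{ij} q_{ij}}{MN \sum_{i=1}^{M}\sum_{j=1}^{N} p_{ij}^2 - \left( \sum_{i=1}^{M}\sum_{j=1}^{N} p_{ij} \right)^2} \qquad (3)$$

$$c = \frac{MN \sum_{i=1}^{M}\sum_{j=1}^{N} p_{ij} q_{ij} - \sum_{i=1}^{M}\sum_{j=1}^{N} p_{ij} \, \sum_{i=1}^{M}\sum_{j=1}^{N} q_{ij}}{MN \sum_{i=1}^{M}\sum_{j=1}^{N} p_{ij}^2 - \left( \sum_{i=1}^{M}\sum_{j=1}^{N} p_{ij} \right)^2} \qquad (4)$$

Using these compensation factors, and depending on the pixel values of the blocks P and Q, the transformation given by equation (1) can yield an improvement in the mean square error (MSE).

3. MOTION COMPENSATION WITH LUMINANCE TRANSFORMATIONS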
4. EXPERIMENTAL RESULTS
The luminance transformation described in the previous section has been shown to be appropriate to compensate changes in pixel luminance, like masking between objects, uncovered background and non-uniform intensity changes, which were difficult to compensate with spatial transformations only. However, in order to compensate spatial transformations like translations, rotations, zooms and other common phenomena of natural video sequences, techniques like BMA or BMGT must be used. The combination of both transformations, in the spatial and luminance domains, resulted in two new techniques, named BMAI and BMGTI, respectively BMA and BMGT Improved with luminance transformations. The BMAI technique is a combination of a full search BMA with luminance compensation. It uses a translational motion vector plus the two luminance compensation factors, c and b, as discussed in the previous section. As these factors are real numbers, they must be quantised for efficient coding and transmission. Such quantisation may introduce an error in the decoded block after MC, which can be minimised if the quantisation is performed prior to the evaluation of the MSE during the estimation of the MC parameters. In order to further improve the MC accuracy, the use of luminance compensation with BMGT (BMGTI) was also implemented. In this case, MC is performed using four motion vectors plus two luminance compensation factors. These motion vectors are applied to the corners of an image block and define the bilinear transform used in the MC. The MC parameters are determined according to the following steps:
Different combinations of these four MC methods were used: conventional BMA, BMAI, motion estimation with geometric transforms (BMGT) and BMGTI. These tests allowed us to compare the relative results of each technique in terms of the objective quality (PSNR) of the reconstructed sequence. Several image sequences were used, especially head and shoulders, due to their importance in low bit rate video coding applications.
Fig. 1. Sergio sequence used in some of the experimental tests.
In figure 2, we show the reconstructed image quality (peak signal to noise ratio - PSNR) for 72 images of the Sergio sequence (CIF), using an initial block size of 32 x 32 pixels and four combinations of the studied MC methods. In this sequence (figure 1) a head moves 180 degrees from left to right, opening and closing mouth and eyes. This introduces a strong rotational motion, as well as a large component of uncovered background from the eyes, the mouth and the left side of the face (initially hidden). As can be seen from figure 2, the use of transformations in the luminance domain improves the final quality of the reconstructed sequence. The quality gain is about 2 dB for tests using BMGTI, over those using only spatial transforms (BMA+BMGT), and above 1 dB when BMA+BMAI is used. This shows that the new transformations are able to compensate a larger number of distortions, improving the motion compensation efficiency despite the larger overhead.
1. for each block of the reference image, $B_{ref}(x, y)$, corresponding to the block to be estimated, $B_{orig}(x, y)$, search all possible bilinear transformations by moving the four vertices of the block inside a search window, using an orthogonal search algorithm and bilinear interpolation [7];
2. let $T_i$ be one of the bilinear mappings defined in the previous search and $B_{ref}^{T_i}(x, y)$ the block that results from applying that bilinear mapping to the block $B_{ref}(x, y)$. Using $B_{ref}^{T_i}(x, y)$ and $B_{orig}(x, y)$, determine the luminance compensation factors, using equations (3) and (4);
3. quantise c and b, using fixed quantisation steps, $Q_c$ and $Q_b$, respectively;
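The estimation loop (fit the luminance factors for each candidate mapping, quantise them, apply them to the warped reference block and keep the candidate with the lowest MAE) can be sketched as follows. This is an illustrative Python sketch under several assumptions: the warped candidate blocks are taken as already computed by the bilinear search, the helper names, the quantisation steps 0.125 and 1.0, and the rounding and clipping of the transformed pixels to the 8-bit range are all choices made for the example, not values from the paper.

```python
def fit_factors(P, Q):
    """Least-squares c and b for q ~ c * p + b (flat blocks fall back to c = 1)."""
    p = [v for row in P for v in row]
    q = [v for row in Q for v in row]
    n, sp, sq = len(p), sum(p), sum(q)
    spp = sum(x * x for x in p)
    spq = sum(x * y for x, y in zip(p, q))
    den = n * spp - sp * sp
    if den == 0:
        return 1.0, (sq - sp) / n
    c = (n * spq - sp * sq) / den
    return c, (sq - c * sp) / n


def quantise(value, step):
    """Uniform quantisation to the nearest multiple of step."""
    return round(value / step) * step


def apply_luminance(block, c, b):
    """Apply q = c * p + b, then round and clip to 0..255 (assumed 8-bit data)."""
    return [[min(255, max(0, round(c * v + b))) for v in row] for row in block]


def mae(A, B):
    """Mean absolute error between two equally sized blocks."""
    count = sum(len(row) for row in A)
    total = sum(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))
    return total / count


def best_candidate(candidates, original, q_c, q_b):
    """candidates: iterable of (mapping_label, warped_reference_block)."""
    best = None
    for mapping, warped in candidates:
        c, b = fit_factors(warped, original)
        cq, bq = quantise(c, q_c), quantise(b, q_b)  # quantise before measuring
        error = mae(apply_luminance(warped, cq, bq), original)
        if best is None or error < best[0]:
            best = (error, mapping, cq, bq)
    return best


original = [[25, 45], [65, 85]]
candidates = [("T0", [[10, 20], [30, 40]]), ("T1", [[10, 40], [20, 30]])]
result = best_candidate(candidates, original, q_c=0.125, q_b=1.0)
```

Measuring the error with the quantised factors, rather than the exact ones, means the encoder selects the candidate that will actually decode best.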
[Plot: Sergio sequence coded with 32 x 32 blocks]
a severe decrease in the PSNR of the decoded frames. When luminance transformations are used, the encoder is able to compensate the introduction of this new object in the scene and maintain the quality of the reconstructed image. Several other tests were performed to investigate the influence of other coding parameters on the efficiency of the studied methods, such as block sizes, bit rates and quantisation steps for the luminance compensation factors.
[Plot: Sergio sequence coded with BMA+BMAI+BMGT+BMGTI; horizontal axis: Frame No.]
Fig. 2. Sequence Sergio (12.5 Hz) coded at 40 kbps.
The improvement of BMAI over BMGT, represented by the first two curves of figure 2, can be explained by an improvement in the MC efficiency of BMAI over the use of simple geometric transforms and by the reduction of MC overhead bits associated with BMAI coded blocks. This reduction of the overhead allows an increase in the number of segmented blocks for the same bit rate, which improves the reconstruction quality. When we compare both types of luminance transforms, we observe that BMGTI performs better than BMAI, in spite of the larger overhead associated with it. When we combine all the coding methods, better results are achieved, as each block is encoded with the transformation that most accurately represents the image changes. In each case, the transformation most suitable for each block is selected based on the relationship between the available bit rate and the reconstruction quality, measured by the MAE of the reconstructed block.
Fig. 4. Variation of the PSNR with the initial block size.
The hierarchical block structure used in this encoder allows the segmentation of a large block into four smaller blocks, down to a minimum size of 8 x 8 pixels. The decision to segment a block is based on the bit rate available to encode the present frame. In figure 4 we show some results for the Sergio sequence, when the initial block size is set to 32 x 32, 16 x 16 and 8 x 8 pixels. From this figure, we can see that small initial blocks tend to increase the quality of the decoded sequence, but at the cost of a higher bit rate. The segmentation process allows the encoder to use large blocks in areas with uniform motion. This frees extra bits for more detailed areas, which can be encoded with small blocks, increasing the overall quality of the decoded frame.
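A quadtree segmentation driven by a bit budget can be sketched as below. The text only states that the decision to segment depends on the bit rate available for the frame, so the greedy rule used here (always split the block with the largest compensation error while the budget allows), the fixed cost per coded block and the names `segment` and `error_of` are assumptions made purely for illustration.

```python
import heapq


def segment(initial_blocks, error_of, bits_per_block, budget, min_size=8):
    """Greedy quadtree segmentation under a bit budget (illustrative rule).

    initial_blocks: list of (x, y, size) tuples.
    error_of: callable giving the MC error of a block (hypothetical helper).
    Returns the final list of blocks and the number of bits spent.
    """
    used = bits_per_block * len(initial_blocks)
    # Max-heap on the error, implemented by negating the key.
    heap = [(-error_of(block), block) for block in initial_blocks]
    heapq.heapify(heap)
    leaves = []
    while heap:
        _, (x, y, size) = heapq.heappop(heap)
        half = size // 2
        extra = 3 * bits_per_block  # four children replace one parent
        if half < min_size or used + extra > budget:
            leaves.append((x, y, size))  # keep this block unsplit
            continue
        used += extra
        for child in ((x, y, half), (x + half, y, half),
                      (x, y + half, half), (x + half, y + half, half)):
            heapq.heappush(heap, (-error_of(child), child))
    return sorted(leaves), used


# Toy error model: only blocks anchored at the origin are poorly predicted.
def toy_error(block):
    x, y, size = block
    return size if (x, y) == (0, 0) else 0


blocks, bits = segment([(0, 0, 32)], toy_error, bits_per_block=10, budget=70)
```

With this toy error model the budget is spent refining the top-left corner down to 8 x 8 blocks, while the well-predicted areas stay as 16 x 16 blocks.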
[Plot: Mother and Daughter sequence, QCIF, coded with 16 x 16 blocks]
Fig. 3. Sequence Mother and Daughter (15 Hz).
In figure 3 we present the results for the Mother and Daughter (MAD) sequence (QCIF), coded with an initial block size of 16 x 16 pixels. We can see the improvement achieved by the new transforms in the final decoded sequence, namely from around frame 56, where the lady in the scene makes a sudden move and her arm shows up. The MC for this new object is particularly inefficient for those methods that use only spatial transformations, so we notice
Fig. 5. Variation of the PSNR with the bit rate.
In figure 5, the average PSNR for the 72 decoded frames of the Sergio sequence is depicted for different bit rates, using an initial block size of 32 x 32. These curves show that, when the luminance transformation is used, the quality of the decoded images is improved over the whole range of bit rates.
its hierarchical structure is important to adjust the size of image blocks to the scene characteristics. However, this scheme significantly increases the computational complexity.
5. COMPARISON WITH OTHER LUMINANCE TRANSFORMATION METHODS
6. CONCLUSIONS
In this work, luminance transformations were combined with spatial transforms to obtain a very efficient motion compensation method. While spatial transforms are suitable to properly compensate complex motion, luminance transformations enable the compensation of block mismatches resulting from uncovered background, masking between objects and changes in lighting conditions. These results are achieved due to the efficiency of the optimised luminance factors, associated with the spatial bilinear transformations. Although these additional factors tend to increase the bit rate, the use of a hierarchical block structure (32 x 32 to 8 x 8 pixels) allows for low bit rate video coding. Experimental results showed that these techniques perform far better than spatial transforms only, increasing the quality of the reconstructed sequence by as much as 2 dB (PSNR). Also, as can be seen from the MAD sequence tests, this method is able to compensate situations that cause severe reconstruction errors when only spatial transforms are used. Comparison of the proposed method with the GBC method showed that the use of different luminance compensation factors for each image block (BMGTI) can significantly improve the final quality of the reconstructed image, when MC alone is used for coding video sequences.
We have also compared our results with a method that uses luminance transformations to compensate the global variation of the image brightness [8], called global brightness compensation (GBC). Unlike the method proposed in this paper, GBC performs a global compensation of the luminance, using one set of parameters to estimate the changes of the entire image. An image is divided into blocks of a fixed size of 16 x 16 pixels, and a pair of luminance compensation factors, plus a translational motion vector, are determined for each block, using a full search algorithm similar to BMAI. The compensation factors used for the entire image are the most frequent among those determined for all the blocks. An additional bit is transmitted for each block, to indicate whether the luminance compensation is used in that particular block. This bit is set if the transformation improves the reconstruction quality of the block. Figure 6 shows some results of coding the Sergio sequence with three different methods (none of them transmitting the MC prediction error image): the GBC method; BMA+BMAI using blocks of a fixed size of 16 x 16 pixels and the same quantisation steps proposed in [8]; and finally the combination of the four methods presented in this paper, using hierarchical MC with an initial block size of 32 x 32 pixels. The comparison between the first and the second curves shows the gain of using different compensation factors for each block instead of only one pair of compensation factors for the entire image.
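The selection rule described for GBC (take the most frequent pair of per-block factors as the global pair, then signal with one bit per block whether the compensation helps) can be sketched in Python as follows. This is only an illustration of the rule as summarised above, not the implementation from [8]; the function names and the example values are hypothetical.

```python
from collections import Counter


def global_factors(block_factors):
    """Pick the most frequent quantised (c, b) pair among all blocks."""
    return Counter(block_factors).most_common(1)[0][0]


def compensation_flags(errors_without, errors_with):
    """One bit per block: set when the global compensation lowers the error."""
    return [with_gbc < without
            for without, with_gbc in zip(errors_without, errors_with)]


# Quantised factor pairs estimated for four blocks (made-up values).
per_block = [(1.0, 0), (1.25, 4), (1.25, 4), (1.0, 2)]
pair = global_factors(per_block)

# Block errors without and with the global pair applied (made-up values).
flags = compensation_flags([5.0, 3.0], [4.0, 3.5])
```

Because a single pair serves the whole image, GBC spends far fewer bits on luminance parameters than a per-block scheme, at the cost of not tracking local changes.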
7. REFERENCES
[1] A. Puri, H. M. Hang, and D. L. Schilling, "An efficient block-matching algorithm for motion compensated coding," Proceedings of the ICASSP 87 Conference, pp. 25.4.1-25.4.4, 1987.
[2] A. N. Netravali and J. D. Robbins, "Motion-compensated television coding: Part I," Bell System Technical Journal, vol. 58, no. 3, pp. 631-670, March 1979.
[3] H. Li, A. Lundmark, and R. Forchheimer, "Image sequence coding at very low bitrates: A review," IEEE Transactions on Image Processing, vol. 3, no. 5, pp. 589-609, September 1994.
[4] M. Ghanbari, S. de Faria, I. N. Goh, and K. T. Tan, "Motion compensation for very low bit-rate video," Signal Processing: Image Communication, vol. 7, pp. 567-580, 1995.
[5] Y. Nakaya and H. Harashima, "Motion compensation based on spatial transformations," IEEE Transactions on Circuits and Systems for Video Technology, vol. 4, no. 3, pp. 339-356, June 1994.
[6] P. Strobach, "Quadtree-structured interframe coding of HDTV sequences," Proc. SPIE Internat. Conf. Visual Commun. Image Process., pp. 812-820, November 1988.
[7] M. Ghanbari, S. de Faria, I. N. Goh, and K. T. Tan, "Motion compensation for very low bit-rate video," Signal Processing: Image Communication, vol. 7, pp. 567-580, 1995.
[8] K. Kamikura, H. Watanabe, H. Jozawa, H. Kotera, and S. Ichinose, "Global brightness-variation compensation for video coding," IEEE Transactions on Circuits and Systems for Video Technology, vol. 8, no. 8, pp. 988-1000, December 1998.
[Plot: Sergio sequence. Legend: GBC (38.2 kbps); non-hierarchical BMA+BMAI (43.5 kbps); BMA+BMAI+BMGT+BMGTI (38.1 kbps). Horizontal axis: Frame No.]
Fig. 6. Comparison of the results of several MC methods with luminance transformations.
From figure 6 we can see that the use of different luminance compensation factors for each block coded with BMAI achieves better results than the use of a global luminance compensation. Nevertheless, this implies using a larger number of bits, as more MC parameters have to be transmitted. The best results are achieved by combining the four methods discussed in this work, with large initial blocks and quadtree segmentation. This method allows the most appropriate transform to be used to compensate each individual block. Furthermore,