REDUCTION OF CODING ARTIFACTS IN LOW-BIT-RATE VIDEO CODING

Robert L. Stevenson
Laboratory for Image and Signal Processing
Department of Electrical Engineering
University of Notre Dame
Notre Dame, IN 46556
E-Mail: [email protected]

ABSTRACT

The compression of digital video data has many applications in the transmission and storage of video sequences. For moderate compression ratios there are many techniques which can provide satisfactory performance; for high compression ratios, however, typical compression techniques produce noticeable artifacts in the reconstructed video. This paper proposes a technique for the post-processing of motion-compensated compressed video data. The technique utilizes a stochastic regularization approach which can be realized with a simple and fast iterative computational algorithm. The approach has been applied to the post-processing of color video sequences and yields good results.
1. INTRODUCTION

Source coding of video data has been a very active area of research for many years. The goal, of course, is to reduce the number of bits needed to represent a video sequence while making as few perceptible changes to the data as possible. Many algorithms have been developed which can successfully compress a video sequence to a certain rate with almost no perceptible effects. A problem arises, however, as we try to push these compression techniques beyond their target rate. For high compression ratios most algorithms start to generate artifacts which severely degrade the perceived quality of the video sequence. The type of artifact generated depends on the compression technique. For motion-compensated block-encoded video sequences, the most noticeable artifacts are generally the discontinuities present at block boundaries, due to both the transform coding of inter-frame coded images and the motion compensation of intra-frame coded images. This paper extends a technique which we originally proposed for post-processing still image data to post-processing video data. It is based on a stochastic framework, where probabilistic models are used both for the noise introduced by the coding and for a "good" image. The restored video sequence is the MAP estimate based on these models. Most previous work in this area has addressed only the post-processing of still image data, and extending these previous ideas is not always straightforward. (This work was supported by the Intel Corporation.)
Previous techniques which have tried to address this issue have various problems which limit their ability to produce high quality image estimates. Some techniques propose changes in the way the image is coded [4, 2]; this, however, reduces the efficiency of the source coder and thus reduces the compression ratio. Linear estimators [10], while removing some artifacts, usually degrade edge information in the original image. Several techniques try to overcome this smoothing of the edges by first estimating the edge information in the compressed image data [3, 5] or by estimating edge information during an iterative smoothing procedure [8, 9]. This, however, is a very difficult task for very high compression ratios, where the actual edge information is somewhat scrambled. This paper will first describe a generic model for video compression. For the purpose of the reconstruction algorithm, this model is descriptive enough to cover many compression techniques, such as subband coding, vector quantization, DPCM, and various hybrid techniques which combine some of these methods. It also describes the effects of the motion compensation used in many video compression techniques. A decompression algorithm is then described based on a previously proposed image model [6, 7], and the computational algorithm is briefly described. Experimental results are shown for video data compressed using Intel's Indeo compressor. The reconstructed image sequences of this new method show a reduction in many of the most noticeable artifacts and thus allow higher compression ratios.
2. DECOMPRESSION ALGORITHM

To decompress the compressed video representation, a MAP technique is proposed. Let the compressed video data be represented by $y$ while the decompressed full-resolution video sequence is represented by $z$. For MAP estimation, the decompressed video data estimate $\hat{z}$ is given by

$$\hat{z} = \arg\max_{z} L(z|y) \quad (1)$$

where $L(\cdot)$ is the log-likelihood function $L(\cdot) = \log \Pr(\cdot)$. Using Bayes' rule,

$$\hat{z} = \arg\max_{z} \left\{ \log \frac{\Pr(y|z)\Pr(z)}{\Pr(y)} \right\} \quad (2)$$

$$= \arg\max_{z} \left\{ \log \Pr(y|z) + \log \Pr(z) \right\}. \quad (3)$$
The conditional probability $\Pr(y|z)$ is based on the video compression method, while the prior probability $\Pr(z)$ is based on the stochastic image model [6, 7].
2.1. Video compression model
In a transform coding compression technique, a unitary transformation $H$ is applied to an original video frame $x$. The compressed representation $y$ is obtained by applying a quantization $Q$ to the transform coefficients,

$$y = Q[Hx]. \quad (4)$$
In video compression, the quantizer $Q$ often includes a linear difference operation with the previous frame; without loss of generality this can be included as part of the nonlinear operator $Q$. Quantization partitions the transform coefficient space and maps all points in a partition cell to a representative reconstruction point, usually taken as the centroid of the cell. The indices of these cells are transmitted in the compressed representation $y$. In the standard video decompression method, the reconstructed video is given by

$$\hat{z} = H^{-1} Q^{-1}[y]. \quad (5)$$

The inverse quantization maps the indices to the reconstruction points. Quantization may be viewed as a many-to-one operation; that is, many video sequences map into the same compressed representation. The operation of the quantizer is assumed to be noise free: a given video sequence $z$ will be compressed to the same compressed representation $y$ every time. The conditional probability for the noise-free quantizer can be described by

$$\Pr(y|z) = \begin{cases} 1, & y = Q[Hz], \\ 0, & y \neq Q[Hz]. \end{cases} \quad (6)$$
Therefore the MAP estimate of (3) can be written as a minimization constrained to the space $Z = \{z : y = Q[Hz]\}$,

$$\hat{z} = \arg\min_{z \in Z} \{ -\log \Pr(z) \}. \quad (7)$$
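To make the model concrete, the following sketch (an illustration, not part of the paper) instantiates (4)-(7) for a toy coder in which $H$ is a blockwise 8x8 orthonormal DCT and $Q$ is a uniform scalar quantizer. The function names, block size, and step size are all assumptions chosen for illustration.

```python
import numpy as np
from scipy.fft import dctn, idctn

Q_STEP = 32   # illustrative uniform quantizer step size
BLOCK = 8     # illustrative block size

def compress(x):
    """Toy version of y = Q[Hx]: blockwise orthonormal DCT followed
    by uniform scalar quantization; the cell indices form y."""
    y = np.empty(x.shape, dtype=np.int64)
    for i in range(0, x.shape[0], BLOCK):
        for j in range(0, x.shape[1], BLOCK):
            coeff = dctn(x[i:i+BLOCK, j:j+BLOCK], norm='ortho')
            y[i:i+BLOCK, j:j+BLOCK] = np.round(coeff / Q_STEP).astype(np.int64)
    return y

def decompress(y):
    """Standard reconstruction (5), z = H^{-1} Q^{-1}[y]: map each
    cell index back to its reconstruction point, then invert the DCT."""
    z = np.empty(y.shape, dtype=np.float64)
    for i in range(0, y.shape[0], BLOCK):
        for j in range(0, y.shape[1], BLOCK):
            z[i:i+BLOCK, j:j+BLOCK] = idctn(
                y[i:i+BLOCK, j:j+BLOCK] * Q_STEP, norm='ortho')
    return z

def in_constraint_space(z, y):
    """Membership test for Z = {z : y = Q[Hz]} from (7): z is feasible
    exactly when it re-compresses to the observed representation."""
    return np.array_equal(compress(z), y)
```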
2.2. Image model
For a model of a "good" image (i.e., $\Pr(z)$) a non-Gaussian Markov random field (MRF) model is used [6, 7]. This model has been shown to successfully model both the smooth regions and the discontinuities present in images. In this research a special form of the MRF is used which has this very desirable property. This model is characterized by a special form of the Gibbs distribution,

$$\Pr(x) = \frac{1}{Z} \exp\left\{ -\frac{1}{\lambda} \sum_{c \in C} \rho_T(d_c^t x) \right\} \quad (8)$$

where $\lambda$ is a scalar constant that is greater than zero, $d_c$ is a collection of linear operators, and the function $\rho_T(\cdot)$ is shown in Figure 1 and is given by

$$\rho_T(u) = \begin{cases} u^2, & |u| \le T, \\ T^2 + 2T(|u| - T), & |u| > T. \end{cases} \quad (9)$$
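As a quick numerical companion to (9) (not from the paper), the Huber minimax function and its derivative, which the gradient descent of Section 2.3 will need, can be written as follows; the names are hypothetical.

```python
import numpy as np

def huber(u, T):
    """Huber minimax function rho_T of (9): quadratic for |u| <= T,
    linear (slope 2T) beyond, so large differences are penalized
    less severely than under a pure quadratic."""
    return np.where(np.abs(u) <= T, u**2, T**2 + 2*T*(np.abs(u) - T))

def huber_prime(u, T):
    """Derivative rho_T': 2u inside the quadratic region,
    constant magnitude 2T outside it."""
    return np.where(np.abs(u) <= T, 2*u, 2*T*np.sign(u))
```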
Figure 1: The Huber minimax function $\rho_T(x)$.

Since $\rho_T(\cdot)$ is convex, this particular form of the MRF results in a convex optimization problem when used in the MAP estimation formulation (3). Therefore, such MAP estimates will be unique, stable, and can be computed efficiently. The function $\rho_T(\cdot)$ is known as the Huber minimax function [1], and for that reason this statistical model is called the Huber-Markov random field (HMRF) model. For this distribution the linear operators $d_c$ provide the mechanism for incorporating what is considered consistent most of the time, while the function $\rho_T(\cdot)$ is the mechanism for allowing some inconsistency. The parameter $T$ controls the amount of inconsistency allowed: $\rho_T(\cdot)$ reduces the importance of the consistency measure when its value exceeds the threshold $T$. For the measure of consistency, the fact that the difference between a pixel and its local neighbors should be small is used; that is, there should be little local variation in the image. For this assumption, an appropriate set of consistency measures is
$$\{d_c^t z\}_{c \in C} = \{z_{m,n} - z_{k,l}\}_{k,l \in N_{m,n},\; 1 \le m,n \le N}, \quad (10)$$

where $N_{m,n}$ consists of the eight nearest neighbors of the pixel located at $(m, n)$ and $N$ is the dimension of the image. Across discontinuities this measure is large, but the relative importance of the measure at such a point is reduced by the use of the Huber function. The MAP estimate can now be written as
$$\hat{z} = \arg\min_{z \in Z} \sum_{c \in C} V_c(z) \quad (11)$$

$$= \arg\min_{z \in Z} \sum_{1 \le m,n \le N} \; \sum_{k,l \in N_{m,n}} \rho_T(z_{m,n} - z_{k,l}). \quad (12)$$
As a result of the choice of image model [6, 7], this is a convex (but not quadratic) constrained optimization problem which can be solved using iterative techniques.
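For concreteness, a direct (unoptimized) evaluation of the objective in (12) might look as follows. It reuses the hypothetical `huber` from the sketch after (9), and wraps at the image border via `np.roll` purely for brevity; a real implementation would handle boundaries explicitly.

```python
import numpy as np

# offsets to the eight nearest neighbors of a pixel
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
             (0, 1), (1, -1), (1, 0), (1, 1)]

def objective(z, T):
    """Sum of rho_T(z[m,n] - z[k,l]) over the eight nearest neighbors
    of every pixel, as in (12); each clique is counted once per
    ordered pair, matching the double sum."""
    total = 0.0
    for dm, dn in NEIGHBORS:
        diff = z - np.roll(np.roll(z, dm, axis=0), dn, axis=1)
        total += huber(diff, T).sum()
    return total
```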
2.3. Reconstruction algorithm
An iterative approach is used to find $\hat{z}$ in the constrained minimization of (12). An initial estimate $z^0$ is improved by successive iterations until the difference between $z^k$ and $z^{k+1}$ falls below a given threshold. The rate of convergence of the iteration is affected by the choice of the initial estimate: a better initial estimate results in faster convergence.
The initial estimate used here is formed by the standard decompression,

$$z^0 = H^{-1} Q^{-1}[y]. \quad (13)$$

Given the estimate at the $k$-th iteration, $z^k$, the gradient descent method is used to find the estimate at the next iteration, $z^{k+1}$. The gradient of $\sum_{c \in C} \rho_T(d_c^t z)$ is used to find the steepest direction $b(z^k)$ towards the minimum,

$$b(z^k) = \sum_{c \in C} \rho_T'(d_c^t z^k)\, d_c. \quad (14)$$
The size of the step $\alpha^k$ is chosen as

$$\alpha^k = \frac{(b^k)^t\, b^k}{(b^k)^t \left( \sum_{c \in C} \rho_T''(d_c^t z^k)\, d_c d_c^t \right) b^k}. \quad (15)$$

Since the updated estimate $w^{k+1}$,

$$w^{k+1} = z^k + \alpha^k b(z^k), \quad (16)$$
may fall outside the constraint space $Z$, $w^{k+1}$ is projected onto $Z$ to give the image estimate at the $(k+1)$-th iteration,

$$z^{k+1} = P_Z(w^{k+1}). \quad (17)$$

$P_Z$ depends both on the original compressed image $y$ and on the quantization $Q$ which was used to produce it. In projecting the image $w^{k+1}$ onto the constraint space $Z$, we are finding the point $z^{k+1} \in Z$ for which $\|z^{k+1} - w^{k+1}\|$ is a minimum. If $w^{k+1} \in Z$, then $z^{k+1} = w^{k+1}$ and $\|z^{k+1} - w^{k+1}\| = 0$. Since $H$ is unitary,

$$\|Hz^{k+1} - Hw^{k+1}\| = \|z^{k+1} - w^{k+1}\| \quad (18)$$
and the projection can be carried out in the transform domain. For intra-frame encoded images the motion-compensated block is subtracted off before the projection operator is applied; it is then added back to the projected image point to obtain $z^{k+1}$. Projection operators in the transform domain can be devised for both scalar and vector quantizers.
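Tying the pieces together, here is one possible rendering of the gradient-projection iteration (13)-(17) for the toy block-DCT coder sketched in Section 2.1; it reuses the hypothetical `huber_prime`, `decompress`, `NEIGHBORS`, `BLOCK`, and `Q_STEP` from the earlier sketches. The exact step size of (15) is replaced here by a small fixed step for brevity, so this is a sketch of the algorithm's structure, not the paper's implementation.

```python
import numpy as np
from scipy.fft import dctn, idctn

def gradient_direction(z, T):
    """Steepest direction b(z) of (14): minus the gradient of the
    Huber objective (12). Each unordered clique appears twice in the
    double sum of (12), hence the factor of 2."""
    b = np.zeros_like(z)
    for dm, dn in NEIGHBORS:
        diff = z - np.roll(np.roll(z, dm, axis=0), dn, axis=1)
        b -= 2.0 * huber_prime(diff, T)
    return b

def project_Z(w, y):
    """P_Z of (17) for the toy coder: since H is unitary, project in
    the transform domain by clipping each DCT coefficient back into
    the quantization cell [(y - 1/2)q, (y + 1/2)q] indexed by y."""
    z = np.empty_like(w)
    for i in range(0, w.shape[0], BLOCK):
        for j in range(0, w.shape[1], BLOCK):
            c = dctn(w[i:i+BLOCK, j:j+BLOCK], norm='ortho')
            lo = (y[i:i+BLOCK, j:j+BLOCK] - 0.5) * Q_STEP
            hi = (y[i:i+BLOCK, j:j+BLOCK] + 0.5) * Q_STEP
            z[i:i+BLOCK, j:j+BLOCK] = idctn(np.clip(c, lo, hi),
                                            norm='ortho')
    return z

def map_decode(y, T=10.0, step=0.05, iters=7):
    """Gradient-projection iteration: z^0 from (13), then repeat
    (14), (16), (17) for a fixed number of iterations."""
    z = decompress(y)                              # eq. (13)
    for _ in range(iters):
        w = z + step * gradient_direction(z, T)    # eqs. (14), (16)
        z = project_Z(w, y)                        # eq. (17)
    return z
```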
3. EXAMPLES

Figure 2a shows a single intra-coded frame from a teleconferencing test sequence after it has been compressed with Intel's MRV compression standard. The coding parameters were set so that the original 320 x 240 sequence at 10 frames per second is compressed to 90 kb/s (a compression ratio of 200 to 1). Coding artifacts are very noticeable at this rate; most noticeable are the blocking effects of the coding algorithm. Figure 2b shows the result of the post-processing described in this paper. Only seven iterations of the iterative procedure were executed in order to reduce the coding artifacts. Notice that the blocking effects have been completely removed in the post-processed image. This can be seen most easily in the face and background regions.
4. CONCLUSION

The problem of video decompression has been cast as an ill-posed inverse problem, and a stochastic regularization technique has been used to form a well-posed reconstruction algorithm. A statistical model for the image was produced which incorporates the convex Huber minimax function. The use of the Huber minimax function $\rho_T(\cdot)$ helps to maintain the discontinuities from the original image, which produces high resolution edge boundaries. Since $\rho_T(\cdot)$ is convex, the resulting multidimensional minimization problem is a constrained convex optimization problem, and efficient computational algorithms can be used in the minimization. The proposed video decompression algorithm produces reconstructed video sequences which greatly reduce the noticeable artifacts that exist using standard techniques.

5. REFERENCES

[1] P. J. Huber, Robust Statistics, New York: John Wiley & Sons, 1981.
[2] K. N. Ngan, D. W. Lin, and M. L. Liou, "Enhancement of Image Quality for Low Bit Rate Video Coding," IEEE Transactions on Circuits and Systems, Vol. 38, No. 10, October 1991, pp. 1221-1225.
[3] B. Ramamurthi and A. Gersho, "Nonlinear Space-Variant Post-processing of Block Coded Images," IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-34, No. 5, October 1986, pp. 1258-1268.
[4] H. C. Reeve and J. S. Lim, "Reduction of Blocking Effects in Image Coding," Optical Engineering, Vol. 23, No. 1, January/February 1984, pp. 34-37.
[5] K. Sauer, "Enhancement of Low Bit-Rate Coded Images Using Edge Detection and Estimation," Computer Vision, Graphics, and Image Processing: Graphical Models and Image Processing, Vol. 53, No. 1, January 1991, pp. 52-62.
[6] R. R. Schultz and R. L. Stevenson, "Improved Definition Image Expansion," Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, San Francisco, CA, March 23-26, 1992.
[7] R. L. Stevenson and S. M. Schweizer, "Nonlinear Filtering Structure for Image Smoothing in Mixed-Noise Environments," Journal of Mathematical Imaging and Vision, Vol. 2, 1992, pp. 137-154.
[8] Y. Yang, N. P. Galatsanos, and A. K. Katsaggelos, "Regularized Reconstruction to Reduce Blocking Artifacts of Block Discrete Cosine Transform Compressed Images," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 3, No. 6, December 1993, pp. 421-432.
[9] Y. Yang, N. P. Galatsanos, and A. K. Katsaggelos, "Projection-Based Spatially Adaptive Reconstruction of Block-Transform Compressed Images," IEEE Transactions on Image Processing, Vol. 4, No. 7, July 1995, pp. 896-908.
[10] A. Zakhor, "Iterative Procedures for Reduction of Blocking Effects in Transform Image Coding," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 2, No. 1, March 1992, pp. 91-95.
Figure 2: (a) MRV compressed intra-frame coded image (200:1); (b) post-processed image.