ADAPTIVE IMAGE QUANTIZATION USING TOTAL VARIATION CLASSIFICATION

C. H. Choi, Oscar C. Au
Hong Kong University of Science and Technology
Clear Water Bay, Kowloon, Hong Kong

Abstract

Most perceptual quantization algorithms use some kind of activity measure, such as variance, in determining the quantization step size for local macroblocks in MPEG coding. In this paper, we use directional total variation as a new kind of activity measure to develop a novel adaptive quantization algorithm. The scheme is contrasted with a method proposed by Puri and Aravind. Simulation results suggest that the proposed scheme can achieve similar image quality at a considerably lower bitrate.

1 Introduction

Multimedia is a fast evolving and emerging technology. With the cooperative efforts of different companies, researchers and standards bodies, various standards for teleconferencing, video telephony, and digital storage media are in their final stages or have been finalized. However, standards like MPEG have left many parts unspecified, which allows for flexible solutions in the coder design. MPEG itself is an audio-video coding scheme for digital transmission and storage. Its development was divided into several phases: MPEG-I has been finalized, and MPEG-II, which supports interlaced pictures, is still in the drafting stage.

The video coding portion of MPEG uses an interpolative-predictive scheme and has three frame types: the intracoded frame (I-frame), the predictive coded frame (P-frame) and the bidirectional interpolative coded frame (B-frame). Each P-frame is divided into 16x16 macroblocks which are motion estimated from an anchor frame, which may be an I-frame or another P-frame. Similarly, macroblocks of a B-frame are motion estimated from two anchor frames, of which one is a past frame and the other a future frame. The motion vectors obtained in motion estimation are sent and used in the reconstruction of the images at the decoder. Each P-frame and B-frame after motion estimation leaves a displaced frame difference (DFD), which is the difference between the macroblocks and the corresponding estimated blocks from the anchor frames. The I-frames and the DFDs of the B- and P-frames are then coded blockwise using the 8x8 discrete cosine transform (DCT). The DCT coefficients in each 8x8 block are quantized uniformly, scanned in a zigzag manner, and coded by combined run-length and Huffman coding to form the MPEG bitstream.

MPEG is an asymmetric coding scheme, with decoding much simpler than encoding. In the encoder, the most time consuming portion is the motion estimation. The conventional exhaustive search is the best motion estimation scheme but is also very computationally intensive. Many fast methods have been proposed to alleviate the problem, such as the logarithmic search, the three-step search, and the subsampling search. Another unspecified part under active research is the assignment of the extra quantization step size (Qp) of the macroblocks. In any macroblock, each DCT coefficient is quantized by

c^q_{ij} = \frac{8\,c_{ij}}{Q_{ij}\,Q_p}

where c_{ij} and c^q_{ij} are the original and quantized DCT coefficients respectively, Q_{ij} is the quantization step from a quantization table, and Q_p is the extra quantization step. There are two suggested quantization tables in MPEG, one for the I-frames and one for the B-frames and P-frames. Typically, the quantization tables are designed according to the sensitivity of the human eye to each DCT coefficient in order to remove subjective redundancy. This paper is not concerned with the design of the quantization table, but rather with the assignment of Qp. The Qp is a design parameter that provides control over the overall bitrate as well as the local quality of the coded images. The two goals conflict with each other: to reduce the overall bitrate, one tends to use a large Qp, while to achieve high local image quality, one needs to use a small Qp. In order to limit the bitrate and at the same time avoid severe degradation of image quality, an adaptive assignment of Qp is crucial.
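
As a rough illustration of this quantization step, the following Python sketch applies the above formula to one 8x8 block of DCT coefficients. It is a minimal sketch only: the flat quantization matrix and the rounding behaviour are placeholders for illustration, not the exact MPEG-specified tables or arithmetic.

import numpy as np

def quantize_block(dct_block, q_matrix, qp):
    """Quantize an 8x8 block of DCT coefficients.

    Implements c_q[i, j] = 8 * c[i, j] / (Q[i, j] * Qp), with simple
    rounding to the nearest integer (the exact MPEG rounding rule differs).
    """
    return np.round(8.0 * dct_block / (q_matrix * qp)).astype(int)

def dequantize_block(quantized, q_matrix, qp):
    """Approximate reconstruction at the decoder: c ~ c_q * Q * Qp / 8."""
    return quantized * q_matrix * qp / 8.0

# Example with a flat placeholder quantization matrix (Q_ij = 16).
q_matrix = np.full((8, 8), 16.0)
dct_block = np.random.randn(8, 8) * 100.0
coarse = quantize_block(dct_block, q_matrix, qp=8)   # large Qp: fewer bits
fine = quantize_block(dct_block, q_matrix, qp=2)     # small Qp: better local quality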

Subjective quantization schemes that take the characteristics of the human visual system into account have been studied by many researchers [1-7]. Typically, some kind of local image activity measure is constructed and Qp is determined according to the activity measure. The human eye is sensitive to distortions in low activity regions but insensitive to distortions in high activity regions. Thus more bits are usually allocated to the low activity regions and vice versa. Such a masking effect is exploited by Puri and Aravind [1], who derived a comprehensive MPEG coding scheme using adaptive perceptual quantization. They use variance as their activity measure. In this paper a novel adaptive perceptual quantization method that uses total variation classification to obtain Qp is introduced. The next section discusses the methodology of finding the total variation, the classification of the total variation into scores, and the determination of Qp according to the scores. Section 3 presents some results, a comparison with Puri's approach, and a discussion of the two approaches. Section 4 concludes with some possible future research directions.

2 Methodology

The total variation Tv(X) of a sequence X = x_1, ..., x_n is defined as

Tv(X) = \sum_{i=1}^{n-1} |x_i - x_{i+1}|

Generally, the total variation reflects the activity within X. In this paper, Tv is used as a measure of local activity within images. According to the MPEG standard, each 16x16 macroblock of a frame, irrespective of the frame type, is the smallest coding unit for the quantization step size Qp. In the proposed scheme, to determine the Qp for a macroblock, the horizontal and vertical total variations of the macroblock are evaluated. The total variation values are then classified into scores, which are summed. The score sums are used to select Qp from a predefined table. It is assumed that there is a target overall bitrate for each frame, which may be the result of some buffer control algorithm.

Note that the horizontal and vertical total variations in the P-frames and B-frames are obtained from the original images, instead of the DFDs. This is because our decision on Qp is based on the local activity level of the actual image and not the difference.

The activity of each macroblock can be estimated by the total variation calculated along the horizontal and vertical directions. Let X = [x_{ij}] with i, j = 1..16 be the 16x16 macroblock. The horizontal total variation (HTv) and vertical total variation (VTv) are defined as

HTv_i = \sum_{j=1}^{N-1} |x_{i,j} - x_{i,j+1}|

VTv_j = \sum_{i=1}^{N-1} |x_{i,j} - x_{i+1,j}|

where N is 16. Thus, a total of 32 directional total variation values are computed. The 16 HTv values calculated along the horizontal direction measure the horizontal activity, while the 16 VTv values measure the activity in the vertical direction. The subscripts give the corresponding row number and column number within the macroblock. A large Tv indicates a high level of activity within a line and vice versa. With 32 directional total variation values, the local image activity in a macroblock can be assessed significantly better than with a single figure such as the variance.

The HTv and VTv values are further classified into scores, HS and VS respectively, each ranging from 1 to 3:

HS_i = f(HTv_i), \quad VS_j = f(VTv_j)

f(x) = \begin{cases} 3, & \text{if } x > c_1 \\ 2, & \text{if } c_2 \le x \le c_1 \\ 1, & \text{if } x < c_2 \end{cases}

The rationale here is that the total variation value in a line segment gives an indication of the local activity level in that direction. The score reflects the complexity of the line (a row or column of 16 pixels). A low score of 1 suggests a low activity line segment, while a medium score of 2 or a high score of 3 may correspond to a line segment in an edge region, a multi-edge region or a high activity texture region. If most of the scores are medium or high, the macroblock is probably in an edge or texture region. Similarly, if most scores are low, the macroblock is probably in a low activity region. With the 32 total variation scores, the activity of the macroblock can be determined more accurately. The HS and VS values are summed separately to give the total horizontal score THS and the total vertical score TVS respectively:

THS = \sum_{i=1}^{N} HS_i

TVS = \sum_{j=1}^{N} VS_j

where N = 16. These two scores give an estimate of the macroblock type, which is listed in Table 1.

Table 1: Estimated macroblock activity type from the total horizontal score (THS) and total vertical score (TVS).

Activity type      THS           TVS
Low activity       Low           Low
Vertical edge      Medium/High   Low
Horizontal edge    Low           Medium/High
High activity      High          High
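
The following Python sketch shows how the 32 directional total variation values, their scores, and the two score sums can be computed for a 16x16 macroblock. The thresholds c1 and c2 are not specified in the paper, so the values used here are purely illustrative, as is the simplified Table 1 style lookup at the end.

import numpy as np

C1, C2 = 400, 150   # illustrative thresholds only; the paper does not give values

def tv_score(tv, c1=C1, c2=C2):
    """Map a directional total variation value to a score of 1, 2 or 3."""
    if tv > c1:
        return 3
    if tv >= c2:
        return 2
    return 1

def macroblock_scores(mb):
    """Compute THS and TVS for a 16x16 macroblock (2-D array of pixels)."""
    mb = np.asarray(mb, dtype=float)
    htv = np.sum(np.abs(np.diff(mb, axis=1)), axis=1)   # 16 horizontal total variations
    vtv = np.sum(np.abs(np.diff(mb, axis=0)), axis=0)   # 16 vertical total variations
    ths = sum(tv_score(v) for v in htv)                 # total horizontal score, 16..48
    tvs = sum(tv_score(v) for v in vtv)                 # total vertical score, 16..48
    return ths, tvs

def classify(ths, tvs, low=24):
    """Rough Table 1 style lookup; the cut-off separating Low from Medium/High is illustrative."""
    if ths <= low and tvs <= low:
        return "low activity"
    if ths > low and tvs <= low:
        return "vertical edge"
    if ths <= low and tvs > low:
        return "horizontal edge"
    return "high activity"

mb = np.random.randint(0, 256, size=(16, 16))
ths, tvs = macroblock_scores(mb)
print(ths, tvs, classify(ths, tvs))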

Based on human perceptual properties, low activity regions and edge regions need more bits, while high-activity texture regions can have fewer bits due to the masking effect. The two scores are used to select Qp from a set of Qp tables. Different Qp tables are used for different frame types. Each table is two dimensional, with the row and column indices being THS and TVS respectively, and the table entries being the Qp values to be used. The table is symmetric about the diagonal. Tables generated by the following equation are found to be useful:

Q_p = Q_{p,min} + \Delta \left( \frac{THS \times TVS - 256}{48 \times 48 - 256} \right)^n

where Q_{p,min} is the minimum Qp value to be used, \Delta is the difference between the maximum and minimum Qp values, and the exponent n controls the shape of the mapping. Notice that the expression in the brackets takes values between 0 and 1.

Four such Qp tables are generated: one for I-frames, two for P-frames, and one for B-frames. Since I-frames are referenced as anchor frames by the other predictive and interpolative frames, they should be coded with better image quality even at the expense of more bits. Thus the Q_{p,min} and \Delta used are 3 and 18. Each P-frame is estimated from a previous anchor frame and is also used as the anchor frame for other predictive and interpolative frames, so it should not be too coarsely quantized. Two Qp tables are built to cater for P-frames of different complexity. Inside a P-frame, each macroblock of the original image, after motion estimation, leaves a residue macroblock which records the difference between the macroblock of the current frame and that of the anchor frame. It is termed the displaced frame difference macroblock (DFD macroblock). If the mean absolute difference (MAD) of the DFD macroblock is low, it can be concluded that the current macroblock closely resembles the corresponding block in the anchor frame and thus fewer bits (larger Qp) are needed. On the other hand, if the MAD is high, the current macroblock is probably in a fast changing or moving scene and more bits (smaller Qp) are required. As a result, we build two tables to account for these situations. For the fast moving P-frames, we use 9 and 15 for Q_{p,min} and \Delta. For the other P-frames, Q_{p,min} and \Delta are 12 and 15. B-frames are not referenced by any other frames and thus can be coarsely quantized. Consequently the B-frame Qp table contains much higher Qp values than those of the P-frames; Q_{p,min} and \Delta are chosen to be 17 and 14 respectively.

Notice that, in the proposed scheme, TVS and THS are estimated from the original frame. This contrasts with Puri's approach, which uses the difference frame instead. The original frame should be used because the difference frame data do not necessarily carry information about the scene complexity; our rationale here is to assign Qp based on the complexity of the frame itself. Besides, the activity of the B-frames is not taken into account in Puri's method, unlike in the proposed scheme. In Puri's method, the Qp of each slice of macroblocks is assigned based on the average Qp of the macroblocks in the corresponding slice in the anchor frames. In the proposed scheme, the total variation values of the B-frames are also used to determine their Qp.
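
A minimal Python sketch of how such Qp tables might be generated and selected is given below. The (Qp,min, Delta) pairs follow the values stated above; the exponent n and the MAD threshold that separates fast moving P-frames from the others are not given in the paper, so the values used here are assumptions for illustration only.

import numpy as np

def make_qp_table(qp_min, delta, n=1.0):
    """Build a Qp table indexed by (THS, TVS), each ranging from 16 to 48, using
    Qp = qp_min + delta * ((THS*TVS - 256) / (48*48 - 256))**n."""
    scores = np.arange(16, 49)
    ths, tvs = np.meshgrid(scores, scores, indexing="ij")
    frac = (ths * tvs - 256) / (48 * 48 - 256)   # lies between 0 and 1
    return np.round(qp_min + delta * frac ** n).astype(int)

# n = 1.0 and MAD_THRESHOLD are assumptions; the (qp_min, delta) pairs are from the text.
QP_TABLES = {
    "I": make_qp_table(3, 18),
    "P_fast": make_qp_table(9, 15),
    "P_slow": make_qp_table(12, 15),
    "B": make_qp_table(17, 14),
}
MAD_THRESHOLD = 8.0   # illustrative only

def select_qp(frame_type, ths, tvs, frame_mad=None):
    """Look up Qp for a macroblock given its frame type and score sums."""
    if frame_type == "P":
        key = "P_fast" if frame_mad is not None and frame_mad > MAD_THRESHOLD else "P_slow"
    else:
        key = frame_type
    return QP_TABLES[key][ths - 16, tvs - 16]

In this sketch the fast/slow decision is made once per P-frame from its MAD; the paper does not detail the exact rule used to label a P-frame as fast moving.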

3 Results

A 208-frame "football" sequence is encoded using the proposed perceptual quantization scheme. Only the luminance component of the sequence is considered. The sequence contains translational motion, zooming out, and both slow and fast panning. In MPEG terms, a group of pictures (GOP) contains 15 frames, with four P-frames and two B-frames between every two anchor frames. This is the same structure as specified in Puri's method. A simple buffer control scheme is simulated, which starts to increment Qp when the buffer is more than 80% full and decrement it when the buffer is less than 20% full. The buffer size is 32768 bytes.

In the proposed scheme, the amount of computation for a macroblock is 510 additions and 512 comparisons. In Puri's approach, there are 510 additions, 257 multiplications and 2 divisions. Thus the two methods have similar computational complexity.

Fig. 1 contains four consecutive frames encoded by simple uniform quantization, which is equivalent to the proposed scheme with constant entries in the Qp tables. Among the four frames, the upper left is an I-frame and the lower right is a P-frame; the other two are B-frames. The overall bitrate is 0.39 bits per pixel (bpp). There are many blocking artifacts throughout each frame. Fig. 2 contains the same four frames encoded by Puri's method. The overall bitrate is 0.41 bpp, which does not include the overhead to code the Qp. Compared with Fig. 1, it can easily be seen that perceptual quantization schemes can achieve significantly higher image quality than simple uniform quantization. Fig. 3 contains the same frames encoded by the proposed scheme. The overall bitrate is 0.36 bpp, which includes the overhead to code the Qp. Compared with Fig. 2, these frames have effectively identical image quality to those coded by Puri's method, but at a considerably lower bitrate.

The bitrate per frame for the two schemes is plotted against the frame number in Fig. 4. It can be seen that the major bit saving of the proposed scheme comes from the P-frames. The mean square error (MSE) of the two schemes is shown in Fig. 5. As expected, the MSE of the proposed scheme is lower in P-frames, such as frames 4, 7, and 10, than in B-frames, such as frames 2, 3, and 5, with the lowest values in the I-frames, such as frames 1 and 16. On the other hand, Puri's method has similar MSE in both the I-frames and P-frames because the same Qp table is used for the two frame types. This may be the reason why significantly more bits are used in the P-frames in Puri's method than in the proposed scheme. Because the two frame types have different generic characteristics, using different Qp tables for them can achieve additional bit saving.

The MSE of the proposed scheme is considerably higher than that of Puri's method, though the perceptual image quality of the two schemes is virtually the same. This suggests that the proposed scheme is more successful in masking the errors or distortions in the high activity areas of the images. Variance, the activity measure used in Puri's method, measures the average variation about the mean pixel value within a macroblock. It makes no distinction between texture regions and edge regions. On the other hand, total variation, the activity measure used in the proposed scheme, provides better discrimination in this respect due to its directional property.
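
For completeness, a toy version of the buffer control rule described above is sketched below, assuming the adjustment is applied once per coded frame; the paper does not specify the step size or the adjustment granularity, so both are assumptions here.

BUFFER_SIZE = 32768   # bytes, as stated above

def adjust_qp_offset(qp_offset, buffer_fullness, step=1):
    """Increment the global Qp offset when the buffer is over 80% full and
    decrement it when the buffer is under 20% full (step size is assumed)."""
    if buffer_fullness > 0.8 * BUFFER_SIZE:
        return qp_offset + step
    if buffer_fullness < 0.2 * BUFFER_SIZE:
        return max(0, qp_offset - step)
    return qp_offset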

4 Conclusion

A novel adaptive image quantization scheme using total variation classification is proposed in this paper. By using directional total variation as the activity measure and summing the resulting scores, the proposed scheme determines the quantization step Qp from a set of Qp tables. Different Qp tables are designed for different frame types. From the simulation, the proposed scheme is found to give significantly better image quality and a lower bitrate than simple uniform quantization. The proposed scheme is also contrasted with another perceptual quantization scheme by Puri and Aravind; it is found to have essentially the same image quality as their method while achieving a lower bitrate.

5 Acknowledgement

This work is funded by UPGC Research Infrastructure Grant 92/93.

References

[1] A. Puri, R. Aravind, "Motion-Compensated Video Coding with Adaptive Perceptual Quantization", IEEE Trans. on Circuits and Systems for Video Technology, Vol. 1, pp. 351-361, Dec. 1991.
[2] W.H. Chen, C.H. Smith, "Adaptive Coding of Monochrome and Color Images", IEEE Trans. on Communications, Vol. COM-25, pp. 1285-1292, Nov. 1977.
[3] B. Chitprasert, K.R. Rao, "Human Visual Weighted Progressive Image Transmission", IEEE Trans. on Communications, Vol. COM-38, pp. 1040-1044, Jul. 1990.
[4] M.G. Perkins, T. Lookabaugh, "A Psychophysically Justified Bit Allocation Algorithm for Subband Image Coding Systems", Proceedings of ICASSP, Vol. 3, pp. 1815-1818, 1989.
[5] K.N. Ngan, K.S. Leong, H. Singh, "Adaptive Cosine Transform Coding of Images in Perceptual Domain", IEEE Trans. on Acoustics, Speech and Signal Processing, Vol. 37, pp. 1743-1750, Nov. 1989.
[6] D.L. McLaren, D.T. Nguyen, "Removal of Subjective Redundancy from DCT-coded Images", IEE Proceedings-I, Vol. 138, pp. 345-350, Oct. 1991.
[7] C.A. Gonzales, E. Viscito, "Motion Video Adaptive Quantization in the Transform Domain", IEEE Trans. on Circuits and Systems for Video Technology, Vol. 1, pp. 374-378, Dec. 1991.