A SCENE ADAPTIVE BITRATE CONTROL METHOD IN MPEG VIDEO CODING

Myeong-jin Lee, Soon-kak Kwon, and Jae-kyoon Kim
Department of Electrical Engineering, KAIST
373-1 Kusong-dong, Yusong-gu, Taejon, Korea

ABSTRACT

This paper presents a simple bitrate control method that prevents abrupt quality degradation after a scene change. After a scene change, quality degrades because temporal prediction between the pictures before and after the change is poor. We predict the coding complexity of a picture from its spatial variance before the DCT and its spectral flatness measure (SFM). Using the predicted coding complexity, we show that the rate-distortion relation of an image can be approximated by an exponential function. When the scene changes, the picture target bits are adjusted so as to minimize the total distortion in the GOP, using the rate-distortion relation of each P-picture. Because a bit shortage can still occur, the proposed method extends the current GOP into the next one. The algorithm can easily be applied to existing MPEG codecs and to real-time applications. Compared with the MPEG-2 TM5 rate control algorithm, the proposed algorithm shows a PSNR gain of 0.5 to 2.5 dB and smaller quality fluctuation after a scene change.

Keywords: video coding, scene change, rate control, bit allocation, rate-distortion relation

1 Introduction

A compression technique is needed to transmit the huge amount of video data through a bandwidth-constrained channel. Hybrid DPCM/DCT coding, which is widely used, removes temporal redundancy by motion compensation between neighboring pictures and spatial redundancy by the DCT, and produces the bitstream through quantization and variable-length coding. In hybrid DPCM/DCT coding such as MPEG[1], the number of coded bits is regulated through the quantization step size. Because quantization is the lossy step of the coder, it is closely related to the quality of the reconstructed images. An MPEG video encoder therefore has to prevent buffer overflow and underflow and, at the same time, keep the quality of the reconstructed images high.

When the scene changes, the statistical characteristics of the current picture are very different from those of the previous picture, so the statistics of the previous picture cannot be used as they are in feedback rate control. A scene adaptive bitrate control method is therefore needed to prevent an abrupt quality change, and its propagation to the remaining pictures in the GOP, caused by incorrect bit estimation. The main cause of the problem is a bit shortage: a scene-changed P-picture requires many more bits than a stationary P-picture. Solving this bit shortage requires additional bit allocation. Coding the scene-changed P-picture as an I-picture[3], estimating its target with a model[4], or extending the current GOP into the next[3,4] somewhat lessens the bit shortage problem. When the coding complexities of the two picture sequences differ greatly, however, the additional bit amount is more or less than what is actually needed. This results in quality fluctuation and a long transient time, because the complexity update for the incoming pictures is incorrect. Buffer stability is not guaranteed either.

We present a bitrate control method based on the MPEG-2 TM5[1] rate control method. It copes with the quality degradation and fluctuation that scene changes cause in existing statistical rate control methods. The proposed method consists of two steps. In the first step, we detect scene changes using the picture coding complexity, defined by the spatial domain variance and the spectral flatness measure (SFM) of the previous picture. From the calculated coding complexity, we predict the rate-distortion relation of the picture. In the second step, we allocate appropriate bits to the scene-changed picture and update the coding complexities. Through this bit adjustment and complexity update, the sum of distortion within the scene-changed GOP is minimized and the quality degradation after the scene change is reduced. The performance of the proposed algorithm is compared with that of TM5.

The paper is organized as follows. Section 2 presents the method for predicting the rate-distortion relation of a scene-changed picture. Section 3 proposes a new scene adaptive bit allocation method. Section 4 compares simulation results of the proposed control method with those of TM5. Section 5 concludes the paper.

2 Rate-Distortion Relation of Image

In this section we review the rate-distortion relation in image coding and present a method to predict the rate-distortion relation of a frame to be coded. The rate-distortion relation should be considered in bit allocation, because it tells us how much of the resource is needed to achieve a given level of distortion. However, it is difficult to apply in real-time applications because of its heavy computational cost. Many rate control methods therefore allocate bits by defining a coding complexity for the image[1,6]. Especially when the coding complexity changes greatly because prediction from the previous picture is poor, as at a scene change, bits must be allocated from an accurately predicted coding complexity.

The rate-distortion function R(D) is the minimum average mutual information needed to transmit information within distortion D between a transmitter and a receiver, which in image coding correspond to the raw image and the reconstructed image. A discrete stationary Gaussian process $\{x(n)\}$ has been used in many references to model the rate-distortion relation under the mean square distortion criterion. In [7], when the discrete stationary Gaussian process $\{x(n);\ n = 0, 1, \ldots\}$ has the spectral density function of equation (1), the rate-distortion relation can be represented by equations (2) and (3) using the parameter $\theta$.

$$\Phi(\omega) = \sum_{k=-\infty}^{\infty} \rho_k \, e^{-jk\omega} \tag{1}$$

$$D_\theta = \frac{1}{2\pi} \int_{-\pi}^{\pi} \min\left[\theta, \Phi(\omega)\right] d\omega \tag{2}$$

$$R(D_\theta) = \frac{1}{4\pi} \int_{-\pi}^{\pi} \max\left[0, \log_2 \frac{\Phi(\omega)}{\theta}\right] d\omega \tag{3}$$

Here $\rho_k$ is the k-th element of the source correlation matrix, and the parameter $\theta$ ranges over $0 \le \theta \le \sup_\omega \Phi(\omega)$; the R(D) curve is traced out as $\theta$ varies over this range. The rate-distortion relation of an image source can therefore be predicted with a discrete stationary Gaussian process model. If we assume that the image source can be modeled as a two-dimensional signal that is separable into horizontal and vertical directions, the one-dimensional characteristics of equations (1), (2) and (3) can be used without correction. The rate-distortion function under the MSE criterion for a source with variance $\sigma_x^2$ is defined as follows.

$$D(R) = \gamma_x^2 \sigma_x^2 \, 2^{-2R}, \qquad 0 \le \gamma_x^2 \le 1 \tag{4}$$

Figure 1: Observed Rate-Distortion Relation for the "Table Tennis" Sequence ((a) I-picture, (b) P-picture; distortion (MSE) versus rate R (bits), observed R-D points and predicted R-D curve)
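The link between the parametric pair (2)-(3) and the closed form (4) can be checked numerically. The sketch below assumes a first-order Gauss-Markov (AR(1)) source with $\rho_k = \rho^{|k|}$ and unit variance, a textbook example rather than a model stated in this paper; the helper names `ar1_spectrum`, `rd_point` and `sfm` are hypothetical. For $\theta$ below the minimum of $\Phi(\omega)$, equation (2) gives $D = \theta$ and equation (3) reduces to equation (4), with $\gamma_x^2$ equal to the ratio of the geometric to the arithmetic mean of $\Phi(\omega)$.

```python
import math


def ar1_spectrum(w, rho, var=1.0):
    """PSD of equation (1) for the assumed correlation rho_k = var * rho**|k|."""
    return var * (1.0 - rho ** 2) / (1.0 - 2.0 * rho * math.cos(w) + rho ** 2)


def _grid(n):
    """Midpoint nodes on [-pi, pi] and the node spacing."""
    dw = 2.0 * math.pi / n
    return [-math.pi + (i + 0.5) * dw for i in range(n)], dw


def rd_point(theta, psd, n=4096):
    """Evaluate equations (2) and (3) for one value of the parameter theta."""
    ws, dw = _grid(n)
    d = sum(min(theta, psd(w)) for w in ws) * dw / (2.0 * math.pi)
    r = sum(max(0.0, math.log2(psd(w) / theta)) for w in ws) * dw / (4.0 * math.pi)
    return d, r


def sfm(psd, n=4096):
    """Spectral flatness: geometric mean of the PSD over its arithmetic mean."""
    ws, _ = _grid(n)
    vals = [psd(w) for w in ws]
    geo = math.exp(sum(math.log(v) for v in vals) / n)
    return geo / (sum(vals) / n)


rho = 0.9


def psd(w):
    # Unit-variance AR(1) spectrum, so sigma_x^2 = 1 and gamma_x^2 = SFM.
    return ar1_spectrum(w, rho)


gamma2 = sfm(psd)   # close to 1 - rho**2 for this model
theta = 0.02        # below min PSD = (1 - rho) / (1 + rho)
D, R = rd_point(theta, psd)
print(D, R, gamma2 * 2.0 ** (-2.0 * R))  # D and the value from equation (4) agree
```

For $\theta$ above the spectral minimum, part of the spectrum is not coded at all, and equation (4) then holds only approximately (it underestimates the distortion).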

$\gamma_x^2$ in equation (4) is the spectral flatness measure (SFM), which measures the redundancy of the information source as characterized by the shape of its power spectral density function. For the same amount of resource, a Gaussian source with more memory (a smaller SFM) is coded with lower distortion. In hybrid DPCM/DCT coding schemes such as MPEG, motion compensation and the DCT remove the temporal and spatial redundancies, respectively. If we assume that each picture has a Gaussian distribution, an intra-coded I-picture has memory because of spatial redundancy. For P-pictures, motion compensation removes the temporal redundancy of most macroblocks and leaves the inter-frame error signal, but memory still exists because of spatial redundancy. Because the error signal contains only the information that changed from the previous frame, its spatial redundancy is smaller than before motion compensation. The rate-distortion relation of the image signal before the DCT can be approximated by that of a Gaussian source with an SFM of 0.05 to 0.4 for I-pictures and 0.25 to 0.95 for P-pictures. Thus, we can predict the rate-distortion relation from the spatial variance and the SFM of the previous picture and use it for bit allocation. Figure 1 shows the rate-distortion relations of an I- and a P-picture of the "Table Tennis" sequence, obtained by varying the quantization step from 2 to 62 in steps of 2. The dotted lines show the rate-distortion curve of a Gaussian source with the spatial domain variance and SFM of the same picture. For the I-picture, the predicted R-D curve matches the measured curve well at small quantization step sizes but deviates somewhat at large ones. For the P-picture, the match at large quantization step sizes is better than for the I-picture. The best-matching region corresponds to the range in which pictures of each mode are actually coded. Thus, we can use the predicted R-D curve for coding complexity prediction and bit allocation.
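The paper does not spell out how the SFM of a picture is measured. As one possible sketch, assuming a separable first-order Gauss-Markov model (so the SFM is $(1-\rho_h^2)(1-\rho_v^2)$, with $\rho_h$ and $\rho_v$ the horizontal and vertical lag-1 correlation coefficients), the effective variance $\gamma^2\sigma^2$ and the prediction of equation (4) could be computed from pixel data as follows. The function names are hypothetical.

```python
import math


def effective_variance(pixels):
    """Return (variance, SFM estimate, effective variance) for a 2-D list of samples.

    Assumption (not from the paper): separable AR(1) model, so
    SFM = (1 - rho_h**2) * (1 - rho_v**2). Needs at least 2 rows and 2 columns.
    """
    h, w = len(pixels), len(pixels[0])
    n = h * w
    mean = sum(sum(row) for row in pixels) / n
    x = [[p - mean for p in row] for row in pixels]
    var = sum(v * v for row in x for v in row) / n
    if var == 0.0:
        return 0.0, 1.0, 0.0  # flat block: nothing to code
    # Lag-1 autocovariances along rows (horizontal) and columns (vertical).
    cov_h = sum(x[i][j] * x[i][j + 1] for i in range(h) for j in range(w - 1)) / (h * (w - 1))
    cov_v = sum(x[i][j] * x[i + 1][j] for i in range(h - 1) for j in range(w)) / ((h - 1) * w)
    # Clamp the correlation coefficients so the SFM stays strictly positive.
    rho_h = max(-0.999, min(0.999, cov_h / var))
    rho_v = max(-0.999, min(0.999, cov_v / var))
    sfm = (1.0 - rho_h ** 2) * (1.0 - rho_v ** 2)
    return var, sfm, sfm * var


def predicted_distortion(eff_var, bits_per_pel):
    """Equation (4): D(R) = gamma^2 * sigma^2 * 2^(-2R)."""
    return eff_var * 2.0 ** (-2.0 * bits_per_pel)


def bits_for_distortion(eff_var, target_mse):
    """Inverse of equation (4): R = 0.5 * log2(gamma^2 sigma^2 / D), floored at 0."""
    return max(0.0, 0.5 * math.log2(eff_var / target_mse))


ramp = [[i + j for j in range(16)] for i in range(16)]  # smooth 16x16 test block
var, gamma2, eff = effective_variance(ramp)
print(var, gamma2, eff, predicted_distortion(eff, 0.5))
```

A smooth block yields a small SFM and hence an effective variance well below its spatial variance, which is the behavior equation (4) relies on.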

3 Bitrate Control Method for Scene Change

Rate control based on statistical feedback cannot handle a scene change efficiently. Additional bit allocation methods[3,4] have been proposed to solve this problem, but they do not consider the change in coding complexity, i.e. the amount of scene change. When the scene changes between pictures of different complexity, allocating bits as in the stationary case can result in a bit shortage or an excessive bit allocation that affects the remaining pictures in the GOP. Because the target bits of the current P-picture are calculated assuming a large correlation with the previous P-picture, quality degrades as the number of intra-coded macroblocks increases. In this section, we propose a scene adaptive rate control method that solves the scene change problem using the rate-distortion relation.

3.1 Bit Allocation for Scene Change

For a poorly predicted picture, bit allocation using statistical feedback[1,6] allocates bits according to the complexity of the previous picture. However, the complexity of the poorly predicted picture differs from that of the previous one. Since equation (4) gives the rate-distortion relation from the spatial domain variance and the SFM, the ratio of the spatial domain variances of the previous picture ($\sigma_p^2$) and the current picture ($\sigma_0^2$) can serve as a measure of the change in coding complexity. As mentioned in the previous section, the image source can be assumed to be a Gaussian process with spatial memory. Because the amount of memory differs between picture coding modes, the ratio of effective variances, defined as the product of the spatial variance and the SFM, between the previous and current pictures is a better measure of the coding complexity change. Using this information about the change in coding complexity, we derive a bit allocation equation for the scene-changed picture that minimizes the distortion in a GOP. To simplify the problem, we assume that the correlation between pictures is the same throughout the GOP except for the scene-changed picture. The target bits of the pictures in a GOP are assigned by TM5 rate control, and the proposed additional bit allocation is applied only to the pictures after the scene change in the GOP. Figure 2 shows the procedure for adjusting the target bits of the scene-changed picture by $\Delta T$. In TM5 rate control, the target bits of the scene-changed picture are calculated from the statistics of the previous pictures. $\gamma_p^2\sigma_p^2$, $T_p$ and $D_p$ denote the effective spatial variance, target bits and distortion of the stationary P-pictures, respectively. $\gamma_0^2\sigma_0^2$ and $D_0$ denote the effective spatial variance of the scene-changed picture and the distortion that results from coding it with the target $T_p$. Generally, $\gamma_p^2\sigma_p^2$ is smaller than $\gamma_0^2\sigma_0^2$, so the distortion increases by $(D_0 - D_p)$.
Even when the scene changes from high to low complexity, the prediction error signal may carry more information than the simple picture itself. After the bit adjustment, the distortion of the scene-changed picture decreases from $D_0$ to $D_0'$. Under the assumption that the temporal correlation is the same except for the scene-changed picture, the P-pictures after the scene change have equal coding complexities. The additionally allocated bits $\Delta T$ are therefore taken equally from these P-pictures, reducing each of their targets to $T_p - \Delta T_j$ with $\Delta T_j = \Delta T/N_p$ and increasing their distortion from $D_p$ to $D_p'$, where $N_p$ is the number of P-pictures remaining in the GOP. The amount of target bit adjustment $\Delta T$ strongly affects the subsequent pictures. [3,4] present bit adjustment methods for scene changes, but they do not consider the change in coding complexity effectively. If $\Delta T$ is too small, the targets of the following pictures decrease only slightly, but the large distortion of the scene-changed picture propagates through the current GOP. If $\Delta T$ is too large, the distortion of the scene-changed picture is small, but the distortion of the following pictures increases because the remaining bit budget of the current GOP is short. A trade-off is therefore required between a bit adjustment that reflects the coding complexity of the scene-changed picture and a sufficient bit budget for the following pictures. We make this trade-off with an MSE cost function of the reconstructed pictures to calculate $\Delta T$. When the target bits are adjusted by $\Delta T$ for the scene-changed picture and by $\Delta T_j$ for the j-th P-picture after the scene change, giving the targets $R_0$ and $R_j$ respectively, the sum of distortion and the constraint are as follows. Equation (6) states that the total bitrate of a GOP is constant.

$$D_{sum} = \sum_{j=1}^{N_p} D_p' + D_0' = \sum_{j=1}^{N_p} \gamma_p^2 \sigma_p^2 \, 2^{-2R_j} + \gamma_0^2 \sigma_0^2 \, 2^{-2R_0} \tag{5}$$

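Figure 2: Target Bit Adjustment after Scene Change (distortion (MSE) versus R (bits/pel) for effective variances $\gamma_0^2\sigma_0^2$ and $\gamma_p^2\sigma_p^2$; GOP I P ... P I with the scene-changed picture SC followed by $N_p$ P-pictures; distortions $D_0$, $D_0'$, $D_p$, $D_p'$; targets $T_p$, $T_p + \Delta T$ (= $R_0$) and $T_p - \Delta T/N_p$ (= $R_j$))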

$$\sum_{j=0}^{N_p} R_j = R \tag{6}$$

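Under the constraint of equation (6), the $R_j$ that minimize equation (5) are obtained with the Lagrange multiplier method.

$$R_0 = \frac{R}{N_p + 1} + \frac{N_p}{2(N_p + 1)} \log_2 \frac{\gamma_0^2 \sigma_0^2}{\gamma_p^2 \sigma_p^2} \tag{7}$$

$$R_j = \frac{R}{N_p + 1} + \frac{1}{2(N_p + 1)} \log_2 \frac{\gamma_p^2 \sigma_p^2}{\gamma_0^2 \sigma_0^2}, \qquad j = 1, 2, \ldots, N_p \tag{8}$$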

Equation (7) minimizes the distortion of the P-pictures after a scene change, including the scene-changed picture itself. If the above assumption holds after a scene change, the equation can be applied repeatedly. Because the bit adjustment only determines the current picture's target under the assumption that the incoming pictures have equal coding complexities, it can easily be incorporated into general rate control methods. Equation (8) shows that the bits additionally allocated to the scene-changed picture are taken equally from the remaining P-pictures.
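Equations (7) and (8) translate directly into code. In the sketch below the function names are hypothetical, rates are in bits/pel as in Figure 2, and `total_rate` is the budget $R$ of equation (6) for the scene-changed picture plus the $N_p$ P-pictures that follow. The usage lines compare the result with an even split, using the distortion of equation (5); the numbers are illustrative only.

```python
import math


def scene_change_allocation(total_rate, n_p, eff_var_sc, eff_var_p):
    """Equations (7) and (8): split total_rate (bits/pel, summed over the
    scene-changed picture and the n_p remaining P-pictures) so that the
    total distortion of equation (5) is minimal."""
    base = total_rate / (n_p + 1)                    # even share R / (N_p + 1)
    log_ratio = math.log2(eff_var_sc / eff_var_p)    # complexity change
    r0 = base + n_p / (2.0 * (n_p + 1)) * log_ratio  # equation (7)
    rj = base - log_ratio / (2.0 * (n_p + 1))        # equation (8), same for every j
    return r0, rj


def total_distortion(r0, rj, n_p, eff_var_sc, eff_var_p):
    """Equation (5) with the model D = gamma^2 sigma^2 2^(-2R) of equation (4)."""
    return eff_var_sc * 2.0 ** (-2.0 * r0) + n_p * eff_var_p * 2.0 ** (-2.0 * rj)


# Illustrative numbers: 10 P-pictures left, effective variance 8x larger after the cut.
n_p, total = 10, 3.3
r0, rj = scene_change_allocation(total, n_p, 400.0, 50.0)
print(f"R0 = {r0:.3f}, Rj = {rj:.3f}")  # R0 = 1.664, Rj = 0.164
even = total / (n_p + 1)
print(total_distortion(r0, rj, n_p, 400.0, 50.0),
      total_distortion(even, even, n_p, 400.0, 50.0))
```

At the optimum every picture ends up with the same modeled distortion, which is what the Lagrange condition requires. When the effective variances are equal, both rates fall back to the even share; when complexity drops after the cut, the log ratio is negative and the scene-changed picture gets fewer bits. For large complexity ratios $R_j$ can become very small or negative, which is the bit shortage that GOP extension addresses.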

3.2 Bitrate Control using the Bit Adjustment

The target bits of each picture are based on the TM5 rate control method, and the target bit adjustment is applied only to poorly predicted pictures such as scene changes. Figure 3 shows the system architecture of the proposed method. It is a hybrid DPCM/DCT coder with additional blocks that calculate the picture complexity. Using the bit adjustment equation of the previous section minimizes the sum of distortion in a GOP. However, because the number of bits per GOP is fixed, the target bits are limited, and the picture quality is poorer than that of the same picture in the stationary case. This bit budget shortage can be solved by extending the current GOP into the next [3,4]. GOP extension means changing the coding mode of the next I-picture to P-picture. Although this method cannot keep a constant bitrate (CBR) within the current GOP, the channel fluctuation can be absorbed by the encoder buffer so that CBR is preserved over the combined GOP. The threshold for GOP extension is obtained from simulation, and two cases can be considered: the coding complexity decreases, or it increases. If the coding complexity decreases, the additional bits can be negative and no bit shortage occurs. Because the additional bits are positive in most cases, however, we can use the bit threshold as in equation (9) for GOP extension and bit allocation. Conventional methods use the signal difference between neighboring pictures to detect scene changes. However, because this comparison is already performed when the macroblock coding type is determined, the number of intra-coded macroblocks is a simple measure for detecting scene changes. We define a scene change as the case in which more than 60 percent of all macroblocks are intra-coded. GOP extension is applied when the ratio of the effective spatial variances is greater than one. In TM5, the coding complexity of a picture is defined as the product of the number of coded bits S and the average quantization step Q, and it is used to allocate bits to the next picture. If the coding complexity of the scene-changed picture were defined in the same way, however, it would be inflated, and the next P-picture would get a larger target than needed. From the assumption that the temporal correlation is the same except for the scene-changed picture, which means that the P-pictures in a GOP have equal coding complexities, we update the coding complexity of the scene-changed P-picture as in equation (9).
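The scene-change test and the GOP extension rule stated above can be sketched as follows. Only the two stated rules are encoded (more than 60 percent intra-coded macroblocks, and an effective-variance ratio greater than one); the bit threshold obtained from simulation is not given in the text and is left out. All names are hypothetical, and taking the ratio as current over previous is our assumption.

```python
def is_scene_change(n_intra_mb, n_total_mb, threshold_pct=60):
    """Scene change: more than threshold_pct percent of macroblocks are intra-coded.
    Integer arithmetic avoids floating-point rounding at the boundary."""
    return 100 * n_intra_mb > threshold_pct * n_total_mb


def extend_gop(n_intra_mb, n_total_mb, eff_var_current, eff_var_previous):
    """Extend the GOP only for a scene change whose effective-variance ratio
    (current / previous, assumed positive) is greater than one."""
    return (is_scene_change(n_intra_mb, n_total_mb)
            and eff_var_current > eff_var_previous)


def apply_gop_extension(picture_types, extend):
    """GOP extension: code the next planned I-picture as a P-picture."""
    types = list(picture_types)
    if extend:
        for k, t in enumerate(types):
            if t == "I":
                types[k] = "P"
                break
    return types


# SIF at 352x240 has 22 x 15 = 330 macroblocks per picture.
print(is_scene_change(220, 330), extend_gop(220, 330, 900.0, 300.0))  # True True
print(apply_gop_extension(["P", "P", "I", "P"], True))  # ['P', 'P', 'P', 'P']
```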

$$X_P = \left(S - \text{adjusted bits}\right) \times Q \tag{9}$$

Equation (9) updates the coding complexity of the P-picture after a scene change. When the first I-picture after a scene change is coded, it is difficult to allocate suitable bits because the current coding complexity is still that of the I-picture before the scene change. Because the I-picture coding complexity is not updated, excessive bit allocation or a bit shortage may occur and degrade the remaining pictures in the GOP. Since a scene-changed picture contains many intra-coded macroblocks, we can calculate a partial coding complexity from the coded bits and the average quantization step of the intra-coded macroblocks, and scale it linearly to the entire picture as in equation (10).

$$X_I^{SC} = \left(\sum_{j=1}^{N_{IMB}} \mathrm{MBBits}_j\right) \times \frac{\sum_{j=1}^{N_{IMB}} Q_j}{N_{IMB}} \times \frac{N_{Total}}{N_{IMB}} \tag{10}$$

$N_{IMB}$ and $N_{Total}$ are the numbers of intra-coded macroblocks and of all macroblocks, respectively, and $\mathrm{MBBits}_j$ and $Q_j$ are the coded bits and the quantization step of the j-th macroblock. Bits can be allocated to macroblocks either equally, as an average over all macroblocks, or in proportion to the local complexity of each macroblock. However, it is difficult to estimate the coded bits from the coding complexity and to produce an exact target for each macroblock, because the quantization step is determined from the virtual buffer fullness. We therefore allocate the target bits averaged over the entire picture to each macroblock. In this study we did not consider the buffer status. In [8], buffer safety is considered after target bit allocation by adjusting the target according to the current buffer state and the buffer state predicted after the current picture; at the cost of some increase in distortion, this made scene adaptive rate control with stable buffer operation possible. Buffer control should be considered together with bit allocation. In [5], buffer control is combined with bit allocation by adding a cost term to the original distortion function. Because this cost term measures the deviation from the desired buffer level, efficient buffer operation is possible for stationary sequences. However, nonstationary input such as a scene change cannot be predicted, so buffer control cannot be enforced in this way.
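Equations (9) and (10) reduce to a few lines. The sketch assumes per-macroblock lists of coded bits, quantization steps and intra flags for the scene-changed picture; `adjusted_bits` stands for the bits added by the scene-change adjustment ($\Delta T$ in Section 3.1), which is our reading of "adjusted bits" in equation (9). All function names are hypothetical.

```python
def tm5_complexity(coded_bits, avg_q):
    """TM5 picture complexity X = S * Q."""
    return coded_bits * avg_q


def updated_p_complexity(coded_bits, adjusted_bits, avg_q):
    """Equation (9): remove the extra bits given to the scene-changed P-picture,
    so the complexity again reflects a stationary P-picture."""
    return tm5_complexity(coded_bits - adjusted_bits, avg_q)


def scene_change_i_complexity(mb_bits, mb_q, mb_is_intra):
    """Equation (10): complexity of the intra-coded macroblocks of a
    scene-changed picture, scaled linearly to all N_Total macroblocks."""
    n_total = len(mb_bits)
    intra = [(b, q) for b, q, is_intra in zip(mb_bits, mb_q, mb_is_intra) if is_intra]
    n_imb = len(intra)
    if n_imb == 0:
        raise ValueError("no intra-coded macroblocks to estimate from")
    bits = sum(b for b, _ in intra)          # sum of MBBits_j over intra MBs
    avg_q = sum(q for _, q in intra) / n_imb  # average Q_j over intra MBs
    return bits * avg_q * n_total / n_imb


print(updated_p_complexity(120000, 40000, 10))
print(scene_change_i_complexity([1000, 2000, 500, 500], [10, 20, 8, 8],
                                [True, True, False, False]))
```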

4 Simulation Results

In this section we compare the performance of the proposed rate control method with that of TM5 rate control. We used the SIF-format "Table Tennis" sequence and two edited sequences composed of "Mobile & Calendar" and "Flower Garden". Scene changes occur at the 68th and 98th pictures in Table Tennis and at the 51st picture in the edited sequences. The target bitrate is 1.152 Mbps and the motion search range is 7.5 pel/frame. The GOP size and the reference picture distance are 12 and 1, respectively. TM5 uses adaptive quantization to improve the subjective visual quality; because this step lowers the PSNR, we did not use it, for a fair comparison. As performance measures, we use the average PSNR over 1 GOP and the variance of PSNR over 2 GOPs after a scene change, to capture the effects of both the additional bit allocation and the complexity update. The variance is taken over 2 GOPs because the complexity update of the I-picture after a scene change takes about 1 GOP, so pictures are coded with incorrect target bits for up to 2 GOPs after the scene change. Figures 4 to 6 show the generated bits and PSNR for Table Tennis and for the sequences whose complexity decreases and increases after the scene change. With TM5, the reconstructed picture quality shows abrupt degradation and fluctuation, and it takes at least one GOP interval to recover. With the proposed rate control method, additional bits are allocated to the scene-changed pictures, resulting in less degradation and smaller fluctuation of the reconstructed picture quality. Table 1 summarizes the simulation results. A buffer overflow problem still remains, however. Although there are buffer regulation methods that adjust the target according to the buffer status[2,8], buffer control should be considered together with bit allocation, as mentioned in the previous section, and this remains for further study.
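The two performance measures can be computed from a per-frame PSNR trace as sketched below. The text does not state whether the 1-GOP and 2-GOP windows start at the scene-changed picture, or whether the population or sample variance is used; the sketch assumes windows starting at the scene change, a population variance and 8-bit samples for PSNR. The function names are hypothetical.

```python
import math


def psnr(mse, peak=255.0):
    """PSNR in dB for a given mean squared error, assuming 8-bit samples."""
    return 10.0 * math.log10(peak * peak / mse)


def scene_change_metrics(psnr_trace, sc_index, gop_size=12):
    """Average PSNR over 1 GOP and PSNR variance over 2 GOPs, both windows
    starting at the scene-changed picture (an assumption)."""
    one_gop = psnr_trace[sc_index:sc_index + gop_size]
    two_gops = psnr_trace[sc_index:sc_index + 2 * gop_size]
    if len(two_gops) < 2 * gop_size:
        raise ValueError("trace too short after the scene change")
    avg = sum(one_gop) / len(one_gop)
    mean2 = sum(two_gops) / len(two_gops)
    var = sum((p - mean2) ** 2 for p in two_gops) / len(two_gops)  # population variance
    return avg, var


# Synthetic trace: steady 35 dB, then alternating 30/32 dB after a cut at frame 60.
trace = [35.0] * 60 + [30.0, 32.0] * 12
print(scene_change_metrics(trace, 60))  # (31.0, 1.0)
```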
Table 1: Comparison of the PSNR after Reconstruction

Test sequence            Average PSNR over 1 GOP (dB)    Variance of PSNR over 2 GOPs
                         Proposed    TM5                 Proposed    TM5
Table Tennis (1st)       36.1        34.4                1.39        3.50
Table Tennis (2nd)       35.1        32.7                0.58        3.52
Decreasing complexity    26.9        26.6                0.39        0.49
Increasing complexity    24.0        23.3                0.25        0.96

5 Conclusion

In this paper we investigated the rate control problem of hybrid DPCM/DCT video encoders using the DCT and variable-length coding, in particular the scene change problem in existing statistical feedback rate control methods. To solve it, we proposed a scene adaptive rate control method that minimizes the total distortion of the pictures in a GOP after a scene change, using a rate-distortion relation based on both the spatial domain variance and the SFM. This increases the hardware complexity of the MPEG video encoder. In the simulations, the method gives a PSNR gain of 0.5 to 2.5 dB over 1 GOP after a scene change and smaller quality fluctuation than TM5. We expect the proposed bit allocation method to be applicable also to other cases of poor temporal prediction, such as zooming, fast motion and uncovered background. Future work should address more accurate prediction of the rate-distortion relation using the effective variance of the image source, and bit allocation that takes the encoder buffer into account.

Figure 3: System Configuration of the Proposed Method (hybrid DPCM/DCT encoder with DCT, Q, VLC, inverse quantization, inverse DCT, frame memories for original and coded pictures, motion estimation, MC and buffer, plus added blocks to calculate the variance, make the mode decision based on the original frame, determine the target bits and adjust the target bits; K is the number of macroblocks in a frame and (n,i) is the i-th macroblock of the n-th frame)

6 REFERENCES

[1] ISO/IEC JTC1/SC29/WG11 and ITU-TS SG15 EG for ATM video coding, "MPEG-2 Video Test Model 5," April 1993.
[2] Joel Zdepski, Dipankar Raychaudhuri, and Kuriacose Joseph, "Statistically Based Buffer Control Policies for Constant Rate Transmission of Compressed Digital Video," IEEE Trans. on Commun., Vol. 39, No. 6, pp. 947-957, June 1991.
[3] Limin Wang, "Bit Rate Control for Hybrid DPCM/DCT Video Codec," IEEE Trans. on Circuits and Systems for Video Tech., Vol. 4, No. 5, pp. 509-517, Oct. 1994.
[4] Seong Hwan Jang and Seop Hyeong Park, "An Adaptive Rate Control Algorithm for DPCM/DCT Hybrid Video Codec Adopting Bi-directional Prediction," SPIE Visual Communications and Image Processing Conference, Vol. 2094, pp. 1237-1248, 1993.
[5] Liang-Jin Lin, Antonio Ortega, and C.-C. Jay Kuo, "Gradient-based buffer control technique for MPEG," SPIE Visual Communications and Image Processing Conference, Vol. 2501, pp. 1502-1513, 1995.
[6] Eric Viscito and Cesar Gonzales, "A Video Compression Algorithm with Adaptive Bit Allocation and Quantization," SPIE Visual Communications and Image Processing Conference, Vol. 1605, pp. 205-216, 1991.
[7] N. S. Jayant and Peter Noll, "Digital Coding of Waveforms," Prentice Hall, 1984.
[8] Myeong-jin Lee, "An Effective Bitrate Control Method for Scene Change in MPEG Video Coding," KAIST Master Thesis, 1996.

Figure 4: Simulation Results for "Table Tennis" Sequence (generated bits and PSNR (dB) per frame for the proposed method and TM5, frames 0 to 140, with the two scene changes marked)

Figure 5: Simulation Results for Decreasing Picture Complexity (generated bits and PSNR (dB) per frame for the proposed method and TM5, frames 30 to 100, with the scene change between the "Mobile & Calendar" and "Flower Garden" segments marked)

Figure 6: Simulation Results for Increasing Picture Complexity (generated bits and PSNR (dB) per frame for the proposed method and TM5, frames 30 to 100, with the scene change between the "Flower Garden" and "Mobile & Calendar" segments marked)
