IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 18, NO. 5, MAY 2008


Quality Enhancement for Motion JPEG Using Temporal Redundancies

Dũng Trung Võ, Student Member, IEEE, and Truong Quang Nguyen, Fellow, IEEE

Abstract—The paper proposes a pixel-based post-processing algorithm to enhance the quality of motion JPEG (MJPEG) by exploiting the temporal redundancies of the decoded frames. The technique permits reconstruction of the high-frequency coefficients lost during quantization, thereby reducing ringing artifacts. Based on the linearization of the quantization function, the error between the estimated and original coefficients is analyzed for both ideal and real video sequences. Blocking artifact reduction is verified by a reduction in the variance of this coefficient error. The condition on motion vectors required for quality improvement is derived from these errors. The algorithm is also extended to find the optimal filter for a general estimation scheme based on an arbitrary number of frames. Improvements in visual quality and peak signal-to-noise ratio using both integer and subpixel motion vectors are verified by simulations on video sequences.

Index Terms—Blocking artifact, discrete cosine transform (DCT), motion JPEG (MJPEG), quantization error, ringing artifact.

I. INTRODUCTION

Motion JPEG (MJPEG) compresses each frame of a video sequence separately in JPEG format. Compared to MPEG, MJPEG attains a lower compression level but is also less computationally complex. It is widely used for nonlinear editing, which requires easy access to any frame. Another application of MJPEG is medical imaging, which requires high-quality images and error resilience. The disadvantage of MJPEG is that it does not exploit the temporal redundancies of successive frames to achieve higher compression; consequently, MJPEG has a higher bit rate than MPEG at the same quality. Quality enhancement for MJPEG has until now focused on improving the quality of a single JPEG frame. Typical quality problems for JPEG are blocking and ringing artifacts, especially at low bit-rate compression. Blocking artifacts occur when each frame is processed independently in separate blocks with coarse quantization of the discrete cosine transform (DCT) coefficients. Ringing artifacts, which are similar to the Gibbs phenomenon [1] and appear around sharp edges, are the result of high frequencies lost during quantization.

Manuscript received October 15, 2006; revised March 22, 2007 and September 12, 2007. This work is supported in part by Texas Instruments. This paper was recommended by Associate Editor J. Boyce. The authors are with the Department of Electrical and Computer Engineering, University of California at San Diego, La Jolla, CA 92093-0407 USA (e-mail: [email protected]; [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TCSVT.2008.918807

One method to deal with blocking artifacts is to use the lapped orthogonal transform (LOT) [2], which increases the dependence between adjacent blocks. As the LOT-based approach is incompatible with the JPEG standard, many other approaches consider spatial- and DCT-domain post-processing techniques. These include lowpass filtering [3], adaptive median filtering [4], and nonlinear spatially variant filtering [5], which are applied to remove the high frequencies caused by sharp edges between adjacent blocks. Other pixel-based methods are projection onto convex sets (POCS), constrained least squares (CLS) [6], and the maximum a posteriori probability (MAP) approach [7], all of which require many iterations with high computational load. Xiong et al. [8] treated the discontinuities between adjacent blocks as quantization noise and used overcomplete wavelet representations to remove this noise; edge information is preserved by exploiting cross-scale correlation among the wavelet coefficients. The algorithm yields better visual quality of the decoded image than the POCS and MAP methods at lower complexity. The DCT-based methods adjust the quantized DCT coefficients to reduce quantization error. Tien and Hang [9] assumed that the quantization errors are strongly correlated with the quantized coefficients in high-contrast areas. Their algorithm first compares the DCT coefficients to pretrained quantized-coefficient representatives to find the best match, then adds the corresponding quantized error pattern to reconstruct the original DCT coefficient. This method requires a large predefined set of quantized-coefficient representatives and provides only a slight peak signal-to-noise ratio (PSNR) gain. In another approach, Jeon and Jeong [10] defined a block boundary discontinuity measure and compensated selected DCT coefficients to minimize this discontinuity.
To restore the stationarity of the original image, Nosratinia [11] averaged the original decoded image with 15 displaced versions of it, calculated by shifting, compressing, decompressing, and translating back the original decoded image. Assuming that neighboring DCT coefficients at the same frequency change little within a small region, Chen et al. [12] applied an adaptively weighted lowpass filter to the transform coefficients of shifted blocks. The window size is determined by the block activity, which is characterized by human visual system (HVS) sensitivity at different frequencies. A new type of shifted block spanning any two adjacent blocks was introduced by Liu and Bovik [13], who also defined a blind measurement of the local visibility of blocking artifacts. Based on this visibility and the primary edges of the image, the block edges are divided into three categories, each processed by a correspondingly effective method.

1051-8215/$25.00 © 2008 IEEE

To reduce ringing artifacts, Hu et al. [14] used a Sobel operator for edge detection and then applied a simple lowpass filter to pixels near the detected edges. Using a similar process, Kong et al. [15] established an edge map with smooth, edge, and texture blocks by considering the variance of a 3 × 3 window centered on the pixel of interest. Only edge blocks are processed with an adaptive filter to remove the ringing artifacts close to the edges. Oguz et al. [16] detected the ringing artifact areas most prominent to the HVS using binary morphological operators; a gray-level morphological nonlinear smoothing filter is then applied to these regions. Although they lessen ringing artifacts, these methods do not solve the problem completely because the high-frequency components of the resulting images are not reconstructed.

All of the above blocking and ringing artifact reduction techniques apply to still images and are therefore also suitable for frame-by-frame enhancement of MJPEG. Due to the strong correlation between successive frames in video sequences, however, information from neighboring frames can be used to reduce the quantization error resulting from DCT coefficient truncation in each frame. This paper proposes a novel method of using the previous and future frames to enhance the quality of the current frame. Motion vectors between the decoded frames are found and used to estimate displacements of the current frame, which are then averaged to yield the final reconstructed frame. This is equivalent to a motion compensated temporal filter (MCTF) for aligned blocks. MCTF has previously been used in prefilter denoising [17], [18] and in scalable video coding (SVC) [19], [20].

The paper is organized as follows. Section II formulates the translational relation in the DCT domain and develops an algorithm to find a displacement of one block. The enhancement process for the case of pure translational video sequences is described in detail in Section III, which also analyzes the error between the reconstructed and original blocks. Extensions to real video sequences are discussed in Section IV. Section V generalizes the enhancement process to an arbitrary number of referenced frames and designs an optimal filter providing the minimal error. Section VI presents simulation results and comparisons to other approaches. Finally, Section VII concludes the paper and discusses future directions related to this work.

II. TRANSLATIONAL RELATION OF DCT COEFFICIENTS

This section derives the relation in the DCT domain for shifted blocks of different frames. In MJPEG, each frame is processed in separate 8 × 8 blocks. Assume that block x̂ matches a portion of image x starting at pixel (m, n), possibly located among four adjacent blocks as in Fig. 1, i.e.,

x̂(i, j) = x(m + i, n + j),  for 0 ≤ i, j ≤ 7.  (1)

Fig. 1. Translation between blocks of image x̂ and x.

The DCT transform (Type II) of x̂ can be obtained by [21]

(2)

where the weighting terms are defined piecewise: one value if the index condition holds and zero otherwise. In (2), if x is replaced by its DCT coefficients, the DCT coefficients of x̂ can be calculated from the DCT coefficients of the four blocks of image x or, more generally, from a function of them, as shown in Appendix I:

(3)

After the DCT transform, the quantization process truncates the coefficients to obtain the output, with the DCT coefficient error

(4)

where

(5)

and Q is the quantization step size matrix. The same quantization process is applied to the coefficients of x to obtain their quantized values. Because of the nonlinear characteristic of the quantization function, this error prevents the quantized DCT coefficients of the original block and of the shifted blocks from satisfying (3). Linearizing the quantization function as shown in Fig. 2, a displacement of the quantized block can be estimated based on

(6)

Instead of having zero values due to truncation, high-frequency DCT coefficients are recreated using (6). This equation is equivalent to a simple translation in the pixel domain:

(7)

In summary, if the motion vectors between blocks are known, a translational relation can be established as in (7) to obtain a displacement of the current block. The next section presents a method for using this displacement to reduce the blocking and ringing artifacts.
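As a concrete sketch of this translational relation, the snippet below (all names are mine, not the paper's) builds an orthonormal 8 × 8 DCT-II matrix and verifies that the DCT of a block starting at pixel (m, n) inside a 2 × 2 group of blocks can be assembled directly from the DCTs of the four covering blocks through fixed pre/post matrices, in the spirit of (3):

```python
import numpy as np

N = 8

def dct_matrix(N=8):
    # Orthonormal DCT-II basis: D @ x @ D.T is the 2-D DCT of block x.
    k = np.arange(N)
    D = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * N))
    D *= np.sqrt(2.0 / N)
    D[0, :] *= np.sqrt(0.5)
    return D

D = dct_matrix(N)

def shift_operators(m, n):
    # Selection matrices so that the shifted block is
    # A@x00@B + A@x01@C + E@x10@B + E@x11@C in the pixel domain.
    A = np.zeros((N, N)); E = np.zeros((N, N))
    B = np.zeros((N, N)); C = np.zeros((N, N))
    for i in range(N - m):
        A[i, m + i] = 1.0          # rows taken from the upper blocks
    for i in range(N - m, N):
        E[i, m + i - N] = 1.0      # rows taken from the lower blocks
    for j in range(N - n):
        B[n + j, j] = 1.0          # columns taken from the left blocks
    for j in range(N - n, N):
        C[n + j - N, j] = 1.0      # columns taken from the right blocks
    return A, E, B, C

rng = np.random.default_rng(0)
x = rng.integers(0, 256, size=(2 * N, 2 * N)).astype(float)  # four adjacent 8x8 blocks
m, n = 3, 5
blocks = {(r, c): x[r*N:(r+1)*N, c*N:(c+1)*N] for r in (0, 1) for c in (0, 1)}
coeffs = {key: D @ blk @ D.T for key, blk in blocks.items()}  # DCTs of the four blocks

A, E, B, C = shift_operators(m, n)
# DCT-domain translation: because D is orthonormal, the pixel-domain selection
# matrices transform once and then act directly on the DCT coefficients.
X_hat = (D @ A @ D.T) @ coeffs[(0, 0)] @ (D @ B @ D.T) \
      + (D @ A @ D.T) @ coeffs[(0, 1)] @ (D @ C @ D.T) \
      + (D @ E @ D.T) @ coeffs[(1, 0)] @ (D @ B @ D.T) \
      + (D @ E @ D.T) @ coeffs[(1, 1)] @ (D @ C @ D.T)

x_hat_direct = x[m:m + N, n:n + N]   # the shifted block, extracted in the pixel domain
assert np.allclose(X_hat, D @ x_hat_direct @ D.T)
```

The equality holds exactly for unquantized coefficients; the paper's point is that applying the same combination to quantized coefficients, via the linearized quantizer, recreates the lost high frequencies.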

Fig. 2. Original and linearized quantization functions.

III. QUALITY ENHANCEMENT USING TEMPORAL REDUNDANCIES

Fig. 3. Block diagram of the enhancement algorithm.

Fig. 3 shows the block diagram of the enhancement algorithm. The IDCT blocks transform the DCT coefficients back to the pixel domain, and the motion estimation (ME) blocks find the motion vectors between blocks of the frames. Assume that the present encoded block is shifted from one block of the previous frame by (m_b, n_b) pixels and from another block of the future frame by (m_f, n_f) pixels; the backward estimated version and the forward estimated version of the compressed block are then calculated by

(8)

(9)

The matching blocks in the previous and future frames may be any shifted blocks, not just the encoded blocks. Consequently, an averaging scheme is used to obtain the final processed block:

(10)

This is equivalent to a temporal filter applied to aligned blocks of different frames.

Since the DCT is an orthonormal transform, the mean squared error (MSE) between the estimated and original blocks can be calculated from their DCT coefficients. The error between the original and estimated DCT coefficients is defined by

(11)

where

(12)

(13)

(14)

The total error comes from the quantization error of the current block and the error of using the past and future displacements of the current block. At a specific frequency,

(15)

where the terms on the right-hand side involve the quantized DCT coefficients of the four covering blocks, weighted as calculated in Appendix II. Equation (15) shows that the backward error is caused by the quantization errors of these blocks weighted by the corresponding translation functions; an equivalent expression for the forward error is found by replacing the backward motion vector and blocks with the forward ones. The variance of the error is calculated under the assumption that the quantization error is white and independent for DCT coefficients within the same block, across different blocks, and across different frames. It is well known from [22] that the quantization error is uniformly distributed with zero mean and variance

(16)

Consequently, from (11), (15), and (16),

(17)
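The uniform error model behind (16) and the variance reduction behind the three-tap average of (10) can be checked numerically. The sketch below is my own illustration, not the paper's code: it quantizes random coefficients with step q, confirms the error is roughly uniform with variance near q²/12, and shows that averaging three independently quantized observations of the same coefficient (emulating the independence assumption across frames via subtractive dither) cuts the error variance by about a factor of three.

```python
import numpy as np

rng = np.random.default_rng(1)
q = 16.0                                   # quantization step for one DCT frequency
X = rng.uniform(-512, 512, size=200_000)   # original coefficients at that frequency

def quantize(X, q):
    # JPEG-style scalar quantization followed by dequantization, as in (4)-(5).
    return q * np.round(X / q)

err = X - quantize(X, q)
assert np.abs(err).max() <= q / 2 + 1e-9   # error bounded by half a step
# Uniform error model of (16): zero mean, variance q^2 / 12.
assert abs(err.mean()) < 0.1
assert abs(err.var() - q**2 / 12) / (q**2 / 12) < 0.02

# Averaging three independently quantized observations of the same coefficient
# (current block plus two displaced versions) shrinks the error variance ~3x,
# mirroring the MSE reduction of the (1/3, 1/3, 1/3) temporal filter in (10).
# Subtractive dither decorrelates the three quantizers for this demonstration.
offsets = rng.uniform(-q / 2, q / 2, size=(3, X.size))
avg = np.mean([quantize(X + d, q) - d for d in offsets], axis=0)
err_avg = X - avg
assert err_avg.var() < 0.4 * err.var()
```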

The MSE between the reconstructed and original blocks can be calculated by

(18)

To obtain improvement, the MSE of the proposed approach must be less than the MSE between the compressed and original blocks. This leads to the following condition:

(19)

In this section, an algorithm to enhance the quality of MJPEG was presented and analyzed. To achieve MSE enhancement, the motion vectors must fulfill condition (19), which will be examined in the detailed simulations of Section VI.

IV. QUALITY ENHANCEMENT FOR REAL VIDEO SEQUENCES

In real video sequences, a pure translational relation as in (1) is rarely satisfied, so a more suitable relation between blocks needs to be considered. Assume that the present encoded block is shifted from one block of the previous frame by (m_b, n_b) pixels with a backward difference block and from another block of the future frame by (m_f, n_f) pixels with a forward difference block. The backward estimated version, the forward estimated version, and the final processed block are still calculated by (8)–(10) as in the ideal case. The error between the original and estimated DCT coefficients is

(20)

where the additional terms are the DCT coefficients of the backward and forward difference blocks, respectively, and

(21)

(22)

(23)

Similarly, the error terms are analyzed as in the case of pure translation. With the assumption that the difference blocks have zero mean and are independent of the quantization errors of the blocks,

(24)

The error variance increases in the real case, and the MSE becomes

(25)

To obtain improvement, the upper limit of the MSE should satisfy the same condition as in (19). This is equivalent to

(26)

Distinct from conventional MCTF, the proposed temporal filter is adaptive across blocks: it is applied only to blocks whose backward and forward motion vectors satisfy (26). The other blocks are not processed because they already have better quality than the output of the temporal filter.

V. OPTIMAL ADAPTIVE FILTER FOR AN ARBITRARY NUMBER OF REFERENCED FRAMES

This section considers a more general approach which uses several previous and future frames to enhance the current frame. Assume that the processed block of the present frame is shifted from a block of each referenced frame by a known motion vector, with indices running over the previous and the future frames. The estimated versions of the current frame are calculated by

(27)

The final processed block is obtained by applying a temporal filter to these versions, expressed as

(28)

where, for synchronization,

(29)

To ensure that the processed block has the same average value as the original, the filter coefficients have to satisfy

(30)

Similarly to (11), the error between the estimated and original DCT coefficients is

(31)

Under the same assumption as in Section III, the MSE can be obtained by

(32)

where

(33)

The optimal coefficient values which give the minimal MSE are found under condition (30) by using the Lagrangian function

(34)

To find the optimal solution, the partial derivative with respect to each variable is set to zero. The solution of this equation system gives the optimal coefficients of the adaptive filter

(35)

and the corresponding minimal MSE is

(36)

In real video sequences, the filter length should be chosen based on the variance of the difference between the referenced and current frames. With low variance, a high-order filter can be used to reduce the MSE with little additional error in motion estimation. From (35), the filter coefficients depend on a function of the motion vectors and therefore differ between blocks with different motion vectors. This complexity prevents the optimal filter from being used in practice.

VI. SIMULATION RESULTS

A. Motion Vectors for the Enhancement Process

As mentioned above, only those motion vectors which satisfy condition (19) provide enhancement in image quality. The MSE between the reconstructed and original blocks is calculated as in (25) for all cases of backward motion vectors from (0, 0) to (7, 7) and forward motion vectors from (−7, −7) to (0, 0). These MSE values are shown in Fig. 4, where the two pairs of variables (m_b, n_b) and (m_f, n_f) are combined on the x-axis and y-axis, respectively, in order to plot the MSE in 3-D. The MSE increases for larger motion vectors, and the maximum MSE is 12 470.

Fig. 4. MSE for motion vectors (m_b, n_b) = (0, 0) : (7, 7) and (m_f, n_f) = (−7, −7) : (0, 0).

With the standard quantization step size matrix, the MSE between the quantized and original blocks is constant and equal to

(37)

which is higher than the maximum MSE between the reconstructed and original blocks. This means that all motion vectors will give quality improvement.
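The constrained minimization of Section V can be sketched numerically. Under the whiteness assumptions of Section III, minimizing the output error variance aᵀ diag(s) a subject to the unity-sum condition (30) has the standard Lagrangian closed form a_i ∝ 1/s_i. The per-version variances `s` below are placeholders of mine for the motion-vector-dependent terms in (33); this is a sketch of the diagonal case, not the paper's exact (35).

```python
import numpy as np

def optimal_weights(variances):
    """Minimize Var(sum a_i e_i) = sum a_i^2 s_i for independent errors e_i,
    subject to sum(a_i) = 1 (condition (30)).

    Setting d/da_i [sum a_j^2 s_j + lam * (1 - sum a_j)] = 0 gives
    a_i = lam / (2 s_i); the constraint fixes lam, so a_i is proportional
    to 1 / s_i (inverse-variance weighting).
    """
    inv = 1.0 / np.asarray(variances, dtype=float)
    return inv / inv.sum()

# Equal error variances reduce to the uniform (1/3, 1/3, 1/3) filter of (10).
w = optimal_weights([1.0, 1.0, 1.0])
assert np.allclose(w, [1/3, 1/3, 1/3])

# A noisier reference version receives a smaller weight.
w = optimal_weights([1.0, 4.0, 1.0])
assert np.isclose(w.sum(), 1.0) and w[1] < w[0]

# The minimal variance never exceeds that of uniform averaging.
s = np.array([1.0, 4.0, 1.0])
uniform_var = np.sum((1/3) ** 2 * s)
opt_var = np.sum(optimal_weights(s) ** 2 * s)
assert opt_var <= uniform_var + 1e-12
```

This also illustrates why the paper falls back to the uniform filter in practice: the weights change with every block's motion vectors, while the uniform filter loses little when the error variances are comparable.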


For a real video sequence, if (26) is satisfied for all motion vectors,

(38)

This condition is replaced by an approximate condition in the simulation:

(39)

The best backward and forward matching blocks which have a sum of squared differences less than 51 894 are considered a valid pair of displacements for improvement.

B. Enhancement in the Ideal Case

The simulation is applied to a single frame of different video sequences. Previous and future frames are purely translated from the present frame according to

(40)

(41)

Motion vectors are determined by a block matching method with integer-pixel accuracy in the pixel domain. The search window size is 23 × 23. Quantization error prevents correct detection of all motion vectors. For each block in the current frame, the best matching blocks with the least sum of squared differences in the previous and future frames become the backward and forward displacements. They are averaged with the original block to yield the final reconstructed block as in (10). All motion vectors are valid for enhancement, as discussed in Section VI-A. The proposed algorithm is applied only to the luminance component because the chrominance components are downsampled by 2 in both directions and quantized with a larger quantization matrix.

The proposed optimal filtering method is also simulated with one previous frame and one future frame, with the filter coefficients resulting from (35). For comparison, the spatial deblocking methods using postfilters proposed by Chen et al. [12] and Liu et al. [13] are implemented. These methods are applied to both the luminance and chrominance components of the compressed images. They assume that the DCT coefficients of shifted blocks change only slightly and apply a filter to these DCT coefficients. Another simulated method is the 5/3 MCTF proposed by Chen et al. [20]. This MCTF requires 5 successive frames and is used for scalable video coding. The two additional previous and future frames are respectively simulated by

(42)

(43)
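The simulation procedure — full-search block matching by sum of squared differences, a validity test as in (39), then three-frame averaging as in (10) — can be sketched as follows. Function names, block size, and the synthetic frames are mine (and the 15 × 15 search window here is smaller than the paper's 23 × 23 for brevity); the 51 894 threshold is the paper's.

```python
import numpy as np

B = 8                 # block size
R = 7                 # search radius -> (2*R + 1) x (2*R + 1) window
SSD_THRESH = 51894    # validity threshold from (39)

def match_block(block, frame, top, left):
    """Full-search integer-pixel block matching by sum of squared differences."""
    best, best_pos = None, None
    H, W = frame.shape
    for dy in range(-R, R + 1):
        for dx in range(-R, R + 1):
            y, x = top + dy, left + dx
            if 0 <= y <= H - B and 0 <= x <= W - B:
                cand = frame[y:y + B, x:x + B]
                ssd = float(np.sum((block - cand) ** 2))
                if best is None or ssd < best:
                    best, best_pos = ssd, (y, x)
    return best, best_pos

def enhance_block(cur, prev, nxt, top, left):
    """Average the current block with its backward/forward displacements
    when both matches pass the SSD validity test; otherwise leave it as is."""
    block = cur[top:top + B, left:left + B]
    ssd_b, pos_b = match_block(block, prev, top, left)
    ssd_f, pos_f = match_block(block, nxt, top, left)
    if ssd_b < SSD_THRESH and ssd_f < SSD_THRESH:
        disp_b = prev[pos_b[0]:pos_b[0] + B, pos_b[1]:pos_b[1] + B]
        disp_f = nxt[pos_f[0]:pos_f[0] + B, pos_f[1]:pos_f[1] + B]
        return (block + disp_b + disp_f) / 3.0
    return block

# Synthetic check: neighbors are pure translations of the current frame,
# so matching should recover the shifts and averaging should be lossless.
rng = np.random.default_rng(2)
cur = rng.integers(0, 256, size=(48, 48)).astype(float)
prev = np.roll(cur, (2, 1), axis=(0, 1))    # previous frame shifted by (2, 1)
nxt = np.roll(cur, (-1, 3), axis=(0, 1))    # future frame shifted by (-1, 3)
out = enhance_block(cur, prev, nxt, 16, 16)
assert np.allclose(out, cur[16:24, 16:24])
```

With quantized frames the matched blocks are no longer identical, and the averaging is what reduces the quantization error, subject to the SSD threshold rejecting blocks with invalid motion.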

For the case of compression using the standard quantization step size matrix Q, the results are shown in Fig. 5. The frame enhanced with the proposed method in Fig. 5(f) has reduced ringing artifacts near sharp edges compared with the unprocessed frame in Fig. 5(b). The relation in (6) permits replacement of the lost high-frequency DCT coefficients and helps reduce ringing artifacts. Compared to the enhanced frames [Chen method in Fig. 5(c), Liu method in Fig. 5(d), and 5/3 MCTF method in Fig. 5(e)], the proposed enhancement method has higher PSNR and better reduction of ringing artifacts. The Chen and Liu methods also blur edges in the frame.

The results obtained using a larger quantization step size matrix 4Q are shown in Fig. 6. The compressed frame is more affected by blocking artifacts, as can be seen in Fig. 6(b). Because high quantization error causes inaccurate motion estimation, the Chen method is applied to remove blocking artifacts and increase the quality prior to implementing the proposed method. Since the Chen and Liu methods are spatial-domain algorithms, the proposed method can be combined with them for greater improvement. The same procedure is applied for the 5/3 MCTF method in this simulation. As shown in Fig. 6(c), the Chen method effectively removes the blocking edges, but some artifacts remain, especially around the mouth and nose. In Fig. 6(d), the Liu method reduces the blocking artifacts but also introduces artifacts at the middle of two blocks in some areas. The resulting image using the proposed method in Fig. 6(f) reduces the artifacts of Fig. 6(c) and has better quality than that in Fig. 6(e), which uses the 5/3 MCTF method.

The PSNR values in Table I show that at high bit-rate compression (Q), the proposed method yields higher PSNR than the other methods. The PSNR of the reconstructed frame using the proposed method is improved by 0.58 dB to 1.78 dB over that of the compressed frame for different video sequences. At higher compression levels (4Q), the proposed method provides a 0.56 dB to 0.93 dB PSNR improvement. Although only 3 frames are used, the proposed method still achieves better PSNR than the 5/3 MCTF method, which uses 5 frames. Using the optimal filter gives a slight PSNR improvement compared to using the conventional (1/3, 1/3, 1/3) filter. From Table I, the improvements in the City and Bus sequences are greater than those of the Foreman and Mobile sequences since they have more high-frequency details.

C. Enhancement in a Real Video Sequence

The algorithm is applied to the City sequence. The matching blocks from the previous and future frames are considered valid displacements only if their sum of squared differences is less than 51 894. These displacements are formed by (8) and (9). The reconstructed block is the average of the current block and its displacements. If there are no valid displacements, the reconstructed block is the current block itself. The motion vectors are found to quarter-pixel accuracy with a full search over a 23 × 23 window. As demonstrated in (24), the algorithm for a real video sequence does not provide as much enhancement as in the ideal case since the relation in real video sequences is not pure translation. For motion vectors with integer-pixel accuracy, 89.23%



of the displacements over the whole sequence satisfy the condition in (39). A magnified portion of the original, compressed, and enhanced frames is displayed in Fig. 7, for which quantization matrix Q is used. The results show that the algorithm is effective for a real sequence. When displayed as a still frame, the enhanced frames in Fig. 7(f)–(h) have fewer ringing artifacts than the compressed frame in Fig. 7(b) and have better quality than those using the Chen, Liu, or 5/3 MCTF methods. More accurate motion vectors give greater improvement: the improvement in PSNR over the compressed frame is 0.48, 1.00, and 1.21 dB for the respective cases of integer-, half-, and quarter-pixel ME accuracy. When displayed as a video sequence, the enhanced videos also have less mosquito noise than the MJPEG video, especially for the case of the 4Q quantization matrix.

Fig. 8 summarizes the improvement for different frames of the City sequence. This graph verifies the robustness of the algorithm over the whole sequence. The proposed method outperforms the 5/3 MCTF, Chen, and Liu methods.

Fig. 5. Quality enhancement for the ideal case—compressed with Q. (a) Sixth frame of the Mobile sequence. (b) Compressed—Q (25.58 dB). (c) Chen method (25.30 dB). (d) Liu method (25.44 dB). (e) 5/3 MCTF method (25.97 dB). (f) Proposed method (26.74 dB).

Fig. 6. Quality enhancement for the ideal case—compressed with 4Q. (a) Sixth frame of the Foreman sequence. (b) Compressed—4Q (27.93 dB). (c) Chen method (28.47 dB). (d) Liu method (28.37 dB). (e) 5/3 MCTF method (28.62 dB). (f) Proposed method (28.67 dB).


TABLE I PSNR ENHANCEMENT IN DECIBELS FOR IDEAL SEQUENCES
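The PSNR figures reported throughout (Table I, Figs. 5–10) follow the usual definition for 8-bit frames; a minimal helper of my own for reference:

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two frames of equal size."""
    mse = np.mean((np.asarray(ref, float) - np.asarray(test, float)) ** 2)
    if mse == 0:
        return float('inf')                # identical frames
    return 10.0 * np.log10(peak ** 2 / mse)

ref = np.zeros((8, 8))
noisy = ref + 1.0                          # MSE = 1
assert abs(psnr(ref, noisy) - 20 * np.log10(255)) < 1e-9   # about 48.13 dB
```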

Fig. 7. Quality enhancement for the City sequence. (a) Original frame. (b) Compressed—Q (31.12 dB). (c) Chen method (30.93 dB). (d) Liu method (30.72 dB). (e) 5/3 MCTF (31.15 dB). (f) Integer-pixel ME (31.60 dB). (g) 1/2-pixel ME (32.12 dB). (h) 1/4-pixel ME (32.33 dB).

Fig. 8. PSNR improvement for City frames compressed with quantization matrix Q.

As there is only a small enhancement in PSNR when using quarter-pixel over half-pixel accuracy, half-pixel ME is sufficient for typical applications, balancing complexity requirements while still providing a PSNR improvement of around 1 dB.

For practical implementation, further simulations are run for cases using the previous enhanced frame instead of the previous decoded frame, for cases using only the previous frames to deal with the causality problem, and for cases using more than 3 frames as in Fig. 3. The following scenarios are simulated for detailed comparison.
• Scenario 1: conventional, using the previous decoded frame and the future frame.
• Scenario 2: using the previous enhanced frame and the future frame.
• Scenario 3: using only the two previous decoded frames.
• Scenario 4: using only the two previous frames, with the previous frame enhanced.
• Scenario 5: using five frames.
The PSNR for the different scenarios is shown in Figs. 9 and 10 for integer-pixel and half-pixel motion estimation accuracy, respectively. The average improvement in PSNR from the 5th frame to the 50th frame for these scenarios with different motion estimation accuracies is shown in Table II. For integer-pixel ME accuracy, using the previous enhanced frame gives higher PSNR than using the previous decoded frame: the average PSNR improvement is 0.5629 dB for scenario 2 and 0.4159 dB for scenario 1. This


Fig. 9. PSNR for different options with integer pixel ME accuracy.

Fig. 10. PSNR for different options with half pixel ME accuracy.

TABLE II COMPARISON IN PSNR IMPROVEMENT FOR DIFFERENT SCENARIOS

is also true when using only the previous frames, where the average PSNR improvement is 0.3411 and 0.3402 dB for scenarios 4 and 3, respectively. Using only previous frames gives a lower average PSNR than using both previous and future frames. This can be explained by the lower correlation of the second previous frame to the current frame compared with the future frame. Using more frames, such as the five frames in scenario 5, gives better improvement, but the running time is longer due to the additional motion estimation. Increasing the number of frames is not beneficial for non-panning video sequences.

For half-pixel ME accuracy, there is no significant difference in PSNR between using the previous enhanced and the previous decoded frame, either for the pair of scenarios 1 and 2 (with average PSNR improvements of 0.9421 dB and 0.9545 dB, respectively) or for the pair of scenarios 3 and 4 (with average PSNR improvements of 0.9070 dB and 0.8361 dB, respectively). The benefit of interpolating the referenced frame outweighs the benefit of using a better referenced frame in this case. Using only previous frames still gives less improvement than using both previous and future frames, and using more frames gives higher PSNR than the other scenarios. As mentioned above, half-pixel ME accuracy is used to balance PSNR improvement and complexity. In that case, using the previous decoded frames makes the algorithm less complex while maintaining the same PSNR improvement as using the previous enhanced frames. This also permits running the motion estimations between the previous decoded frames and the current frame in parallel, instead of waiting until the previous enhanced frames are available. To deal with the causality problem, instead of using one previous and one future frame as in scenario 1, two previous decoded frames as in scenario 3 can be used without a remarkable loss of quality improvement.


The proposed method needs only motion estimation and temporal filtering. It is less complex and faster than the 5/3 MCTF method. Compared to the Liu method, which includes edge detection and applies a different process to each classified block, and the Chen method, which needs forward and inverse DCTs, the proposed method is less complex, but it requires a longer running time due to motion estimation. All image and video results can be found at http://videoprocessing.ucsd.edu/~dungvo/MJPEG.html.

VII. CONCLUSION

A pixel-based post-processing algorithm for MJPEG frame quality improvement was proposed. The enhancement is verified in both PSNR and visual quality. Since the proposed method uses inter-frame correlation, it can be combined with other spatial-based methods. The proposed approach requires block matching motion estimation, which is available in MPEG encoders. Future work will extend the results to more general motion models between blocks, such as an affine model for zooming and rotation. The chrominance components can be upsampled by 2 before being processed by the estimation algorithm for further enhancement. The algorithm can also be extended to H.264/AVC, which exploits temporal redundancies to obtain a higher compression level. For this standard, the proposed algorithm can use the accurate motion vectors available at the decoder instead of having to estimate the motion vectors from the compressed frames.

APPENDIX I
TRANSLATIONAL RELATION

(44)

where

(45)

(46)

and

(47)

APPENDIX II
EQUATIONS FOR THE TERMS IN (15)

(48)

(49)

(50)

and

(51)

where

(52)

(53)

and

(54)

REFERENCES
[1] A. J. Jerri, The Gibbs Phenomenon in Fourier Analysis, Splines and Wavelet Approximations. Dordrecht, The Netherlands: Kluwer, 1998.
[2] H. S. Malvar and D. H. Staelin, "The LOT: Transform coding without blocking effects," IEEE Trans. Acoust., Speech, Signal Process., vol. 37, no. 4, pp. 553–559, Apr. 1989.
[3] T. Jarske, P. Haavisto, and I. Defee, "Post filtering methods for reducing blocking effects from coded images," IEEE Trans. Consum. Electron., vol. 40, no. 3, pp. 521–526, Aug. 1994.
[4] Y. F. Hsu and Y. C. Chen, "A new adaptive separate median filter for removing blocking effects," IEEE Trans. Consum. Electron., vol. 39, no. 3, pp. 510–513, Aug. 1993.
[5] B. Ramamurthi and A. Gersho, "Nonlinear space-variant postprocessing of block coded images," IEEE Trans. Acoust., Speech, Signal Process., vol. 34, no. 5, pp. 1258–1268, Oct. 1986.
[6] Y. Yang, N. P. Galatsanos, and A. K. Katsaggelos, "Regularized reconstruction to reduce blocking artifacts of block discrete cosine transform compressed images," IEEE Trans. Circuits Syst. Video Technol., vol. 3, no. 6, pp. 421–432, Dec. 1993.
[7] R. L. Stevenson, "Reduction of coding artifacts in transform image coding," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Apr. 1993, vol. 5, pp. 401–404.
[8] Z. Xiong, M. T. Orchard, and Y. Q. Zhang, "A deblocking algorithm for JPEG compressed images using overcomplete wavelet representations," IEEE Trans. Circuits Syst. Video Technol., vol. 7, no. 2, pp. 433–437, Apr. 1997.
[9] C. N. Tien and H. M. Hang, "Transform-domain postprocessing of DCT-coded images," in Proc. SPIE, Visual Commun. Image Process., Nov. 1993, vol. 2094, pp. 1627–1638.
[10] B. Jeon and J. Jeong, "Blocking artifacts reduction in image compression with block boundary discontinuity criterion," IEEE Trans. Circuits Syst. Video Technol., vol. 8, no. 3, pp. 345–357, Jun. 1998.
[11] A. Nosratinia, "Embedded post-processing for enhancement of compressed images," in Proc. IEEE Data Compression Conf. (DCC), Mar. 1999, pp. 62–71.
[12] T. Chen, H. R. Wu, and B. Qiu, "Adaptive postfiltering of transform coefficients for the reduction of blocking artifacts," IEEE Trans. Circuits Syst. Video Technol., vol. 11, no. 5, pp. 594–602, May 2001.
[13] S. Liu and A. C. Bovik, "Efficient DCT-domain blind measurement and reduction of blocking artifacts," IEEE Trans. Circuits Syst. Video Technol., vol. 12, no. 12, pp. 1139–1149, Dec. 2002.
[14] J. Hu, N. Sinaceur, F. Li, K. W. Tam, and Z. Fan, "Removal of blocking and ringing artifacts in transform coded images," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Apr. 1997, vol. 4, pp. 2565–2568.
[15] H. S. Kong, A. Vetro, and H. Sun, "Edge map guided adaptive postfilter for blocking and ringing artifacts removal," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), May 2004, vol. 3, pp. 929–932.
[16] S. H. Oguz, Y. H. Hu, and T. Q. Nguyen, "Image coding ringing artifact reduction using morphological post-filtering," in Proc. IEEE Int. Workshop Multimedia Signal Process., Dec. 1998, pp. 628–633.
[17] K. J. Boo and N. K. Bose, "A motion-compensated spatio-temporal filter for image sequences with signal-dependent noise," IEEE Trans. Circuits Syst. Video Technol., vol. 8, no. 3, pp. 287–298, Jun. 1998.

619

[18] B. C. Song and K. W. Chun, “Motion-compensated temporal filtering for denoising in video encoder,” Electron. Lett., vol. 40, no. 13, pp. 802–804, Jun. 2004. [19] J. R. Ohm, “Temporal domain subband video coding with motion compensation,” in Proc. IEEE Int. Conf. Acoustics, Speech Signal Process. (ICASSP), Mar. 1992, vol. 3, pp. 229–232. [20] C. Y. Chen, C. T. Huang, Y. H. Chen, S. Y. Chien, and L. G. Chen, “System analysis of VLSI architecture for 5/3 and 1/3 motion-compensated temporal filtering,” IEEE Trans. Image Process., vol. 54, no. 10, pp. 4004–4014, Oct. 2006. [21] K. R. Rao and P. Yip, Discrete Cosine Transforms: Algorithms, Advantages, Applications. New York: Academic, 1990. [22] B. Widrow, I. Kolla, and M. C. Liu, “Statistical theory of quantization,” IEEE Trans. Instrum. Meas., vol. 45, no. 2, pp. 353–361, Apr. 1996. [23] J. R. Ohm, “Advanced packet-video coding based on layered VQ and SBC techniques,” IEEE Trans. Circuits Syst. Video Technol., vol. 3, no. 3, pp. 208–221, Jun. 1993. [24] U. V. Koc and K. J. R. Liu, “DCT-based motion estimation,” IEEE Trans. Image Process., vol. 7, no. 7, pp. 948–965, Jul. 1998. [25] U. V. Koc and K. J. R. Liu, “Interpolation-free subpixel motion estimation techniques in DCT domain,” IEEE Trans. Circuits Syst. Video Technol., vol. 8, no. 4, pp. 460–487, Aug. 1998.

Dũng Trung Võ (S'07) received the B.S. degree in electrical and electronics engineering and the M.S. degree in electronics engineering from Ho Chi Minh City University of Technology, Ho Chi Minh City, Vietnam, in 2002 and 2004, respectively. Since 2005, he has been working toward the Ph.D. degree at the University of California, San Diego, La Jolla, as a Fellow of the Vietnam Education Foundation (VEF). He has been a member of the teaching staff of Ho Chi Minh City University of Technology since 2002. From June 2007 to September 2007, he was an intern at Mitsubishi Electric Research Laboratories (MERL), Cambridge, MA. His research interests include super-resolution and algorithms and applications for image and video enhancement. Mr. Võ received second prize in the Student Scientific Research Competition, hosted by the Ministry of Education and Training of Vietnam, and second prize in the VIFOTEC Contest for Students of the Vietnam Foundation of Technological Creativity, both in 2002.

Truong Quang Nguyen (F'06) is currently a Professor in the Department of Electrical and Computer Engineering, University of California, San Diego, La Jolla. His research interests are video processing algorithms and their efficient implementation. He is the coauthor (with Prof. Gilbert Strang) of a popular textbook, Wavelets and Filter Banks (Wellesley-Cambridge, 1997), and the author of several MATLAB-based toolboxes on image compression, electrocardiogram compression, and filter bank design. Prof. Nguyen received the IEEE TRANSACTIONS ON SIGNAL PROCESSING Paper Award (Image and Multidimensional Processing area) for the paper he co-wrote with Prof. P. P. Vaidyanathan on linear-phase perfect-reconstruction filter banks (1992). He received the National Science Foundation CAREER Award in 1995 and is currently the Series Editor (Digital Signal Processing) for Academic Press. He served as Associate Editor for the IEEE TRANSACTIONS ON SIGNAL PROCESSING from 1994 to 1996, for the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-II: ANALOG AND DIGITAL SIGNAL PROCESSING from 1996 to 1997, and for the IEEE TRANSACTIONS ON IMAGE PROCESSING from 2004 to 2005.

