Memory Efficient Progressive Rate-Distortion Algorithm for JPEG 2000

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 15, NO. 1, JANUARY 2005


Memory Efficient Progressive Rate-Distortion Algorithm for JPEG 2000 Taekon Kim, Member, IEEE, Hyun Mun Kim, Senior Member, IEEE, Ping-Sing Tsai, Member, IEEE, and Tinku Acharya, Senior Member, IEEE

Abstract—A novel rate-distortion optimization algorithm for JPEG 2000 is proposed. In the JPEG 2000 standard, the encoder generates a number of independent compressed bit streams, and it is the rate-control algorithm that selects truncation points for these bit streams so as to minimize distortion for a target bit rate. In this letter, we propose a computationally efficient rate-distortion optimization algorithm that generates the truncation points of the compressed bit streams during the encoding procedure, thereby avoiding unnecessary coding of bit streams that would be discarded at the given bit rate. The algorithm keeps the memory buffer for the compressed bit streams strictly bounded according to the given bit rate, and the required buffer size can be estimated before encoding even starts. It also avoids unnecessary encoding of some parts of the image. Hence, it is memory efficient and supports progressive encoding. In low bit rate applications the proposed algorithm greatly reduces encoder complexity and buffer requirements.

Index Terms—Image compression, JPEG 2000, rate-distortion optimization.

I. INTRODUCTION

THE GROWING demand for multimedia communications in a variety of applications mandates the incorporation of a number of desirable properties in current and future image coding algorithms. New techniques should exhibit cutting-edge performance while providing desirable functionalities including progressive transmission, scalability, region-of-interest coding, random access, error resilience, and so forth. Many of these requirements are not met by current discrete cosine transform (DCT)-based image coders such as baseline JPEG [1]. Recently, a new standard for the compression of still images, the JPEG 2000 standard, has been developed by the International Organization for Standardization (ISO) to meet these requirements [2]. The JPEG 2000 standard aims at creating a versatile coding system for various types of still images that provides excellent image quality, both objectively and subjectively, at low bit rate along with the aforementioned functionalities. In fact, most state-of-the-art still-image compression technologies were integrated into the JPEG 2000 standard.

Although the basic encoding modules of JPEG 2000, such as wavelet transformation, quantization, bit-plane coding, and binary arithmetic coding, are clearly specified, some implementation issues are left to individual developers. Among these, rate-distortion optimization plays a key role in JPEG 2000 implementation. The JPEG 2000 encoder generates a number of independent bit streams; accordingly, a rate-distortion optimization algorithm should provide truncation points for these bit streams in an optimal way to minimize distortion for a target bit rate. After the image is completely compressed, a rate-distortion optimization algorithm is applied once at the end using all the rate and rate-distortion slope information of each coding unit. This is the so-called post-compression rate-distortion (PCRD) algorithm. A PCRD algorithm encodes all coding units completely and keeps the entire compressed bit streams regardless of the given bit rate; in other words, the encoder has to retain all the compressed bit streams, which may be impractical in some applications. In fact, JPEG 2000 delivers superior image quality compared with current standards at very low bit rate, so images may frequently be encoded at low bit rate. This makes the PCRD algorithm inefficient.

In this letter, a memory efficient progressive rate-distortion (MEPRD) optimization algorithm is proposed. It encodes a coding unit and performs rate-distortion optimization at the same time as necessary. It can therefore avoid encoding some unnecessary coding units, and it keeps the memory requirement for the compressed bit streams strictly bounded. Moreover, the required memory buffer size can be estimated before the encoding process even starts, and it is very close to the target rate.

This letter is organized as follows. Section II reviews JPEG 2000 and its rate-distortion optimization. In Section III, we implement a PCRD algorithm and propose the MEPRD algorithm. Section IV shows the simulation results of both algorithms. Section V concludes the letter.

II. BACKGROUND

The algorithms in this letter are designed for the JPEG 2000 encoder and are based on rate-distortion optimization theory. Before giving the details of PCRD and MEPRD, we first briefly describe the JPEG 2000 encoding procedure and general rate-distortion optimization for JPEG 2000.

A. JPEG 2000

Manuscript received May 6, 2002; revised November 10, 2003. T. Kim is with the Digital Media R&D Center, Samsung Electronics Co., Ltd., Kyungki 442-742, Korea (e-mail: [email protected]). H. M. Kim is with the Samsung Advanced Institute of Technology, Kihung, Kyungki 440-600, Korea (e-mail: [email protected]). P. S. Tsai is with the Department of Computer Science, University of Texas–Pan American, Edinburg, TX 78541 USA (e-mail: [email protected]). T. Acharya is with Avisere Inc., Tucson, AZ 85711 USA (e-mail: [email protected]). Digital Object Identifier 10.1109/TCSVT.2004.839970

1051-8215/$20.00 © 2005 IEEE

Fig. 1. (a) Block diagram of the JPEG 2000 encoder algorithm. (b) Dataflow.

The functional block diagram of the JPEG 2000 algorithm is shown in Fig. 1(a). In the JPEG 2000 encoder the image components can be divided into rectangular tiles, as shown in the dataflow diagram in Fig. 1(b); this makes the coder suitable for very large images. DC level shifting is performed on these tile components, followed by either an irreversible or a reversible component transformation, which helps improve compression performance. Each component of a tile is independently transformed by the discrete wavelet transform [2], [3]. In JPEG 2000, the 9/7 irreversible wavelet transform is recommended for lossy compression and the 5/3 reversible lifting-based wavelet transform for lossless compression. For lossy compression, uniform scalar quantization with a deadzone at the origin is applied to the samples of the subbands in the wavelet domain; the quantization step size can be determined by the dynamic range of the samples in a subband. After quantization, each subband is divided into nonoverlapping rectangular blocks called precincts, and each precinct is further divided into code blocks. The code block is the basic unit of entropy coding and is encoded independently; its size is typically 32 × 32 or 64 × 64. Entropy encoding in JPEG 2000 consists of fractional bit-plane coding (FBPC) [4], [5] and binary arithmetic coding (BAC); their combination is also referred to as Tier 1 coding in the standard. FBPC has three passes in each bit plane: the significance propagation pass, the magnitude refinement pass, and the cleanup pass. Each pass generates context models and the corresponding binary data, from which BAC produces the compressed bit stream. Thus each code block has an independent bit stream. These independent bit streams of all the code blocks are combined into a single bit stream by Tier 2 coding based on the result of rate-distortion optimization. An efficient rate-distortion algorithm provides possible truncation points of the bit streams in an optimal way to minimize distortion for any given target bit rate. Tier 2 coding multiplexes the independent bit streams generated in Tier 1 coding to compose the final compressed output bit stream and adds header information that indicates the ordering of the coded blocks and their coding passes in an efficient manner.

B. General Rate-Distortion Optimization for JPEG 2000

Rate-distortion optimization in JPEG 2000 can be considered a budget-constrained allocation problem that finds the truncation points giving the minimum distortion for a given budget [6]. In JPEG 2000 the rate and distortion of each coding unit (code block) can be measured independently [2], [4], [5]. Let $\mathcal{B}$ denote the set of all code blocks that represent an image in the wavelet transform domain. Since each code block $B_i \in \mathcal{B}$ is encoded independently, we can assume that the relevant distortion metric is additive,¹ i.e.,

$$D = \sum_{B_i \in \mathcal{B}} D_i^{n_i} \qquad (1)$$

where $D$ represents the overall image distortion and $D_i^{n_i}$ denotes the distortion of code block $B_i$ generated by the truncation point $n_i$.

¹Mean squared error (MSE) and weighted mean squared error (WMSE) satisfy this additive property; MSE is used in our experiments.
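The additive model of (1), together with the corresponding per-block accumulated rates, can be made concrete with a small Python sketch. The per-block (rate, distortion) tables below are hypothetical numbers for illustration only, not measurements from the paper.

```python
# Sketch of the additive rate-distortion model: each code block i has a
# list of candidate truncation points, and rd[block][n] gives the
# cumulative (rate in bytes, MSE distortion) after truncation point n.
# All names and numbers here are hypothetical.

rd = {
    "block0": [(0, 900.0), (120, 400.0), (260, 150.0), (410, 40.0)],
    "block1": [(0, 500.0), (90, 260.0), (200, 90.0)],
}

def overall(truncation):
    """Overall distortion D and rate R for chosen truncation points n_i."""
    D = sum(rd[b][n][1] for b, n in truncation.items())  # additive distortion
    R = sum(rd[b][n][0] for b, n in truncation.items())  # additive rate
    return D, R

D, R = overall({"block0": 2, "block1": 1})
print(D, R)  # prints: 410.0 350
```

Rate-distortion optimization then amounts to choosing the dictionary of truncation points that minimizes `D` subject to `R` not exceeding the budget.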



Fig. 2. Rate-distortion curves of JPEG2000.

The overall bit rate $R$ of the truncated bit streams included in the final bit stream is given by

$$R = \sum_{B_i \in \mathcal{B}} R_i^{n_i} \le R_{\max} \qquad (2)$$

where $R_{\max}$ is the given bit budget and $R_i^{n_i}$ is the accumulated bit rate up to the truncation point $n_i$ of code block $B_i$. The rate-distortion optimization problem is then the optimal selection of the truncation points $n_i$ that minimizes the distortion. This well-known problem can be solved using a Lagrange multiplier: any set of truncation points $\{n_i^\lambda\}$ that minimizes

$$\sum_{B_i \in \mathcal{B}} \left( D_i^{n_i} + \lambda R_i^{n_i} \right) \qquad (3)$$

for some $\lambda$ is optimal in the sense that the distortion $D^\lambda$ cannot be reduced without increasing the overall rate $R^\lambda$, where $D^\lambda$ is the distortion generated by $\{n_i^\lambda\}$. The value of $\lambda$ must be adjusted until the rate satisfies $R^\lambda \le R_{\max}$; finding the correct value of $\lambda$ can be done using a bisection search or alternative methods [6].

If the sets of candidate truncation points of the code blocks are finite, the problem can be simplified further. Let $N_i$ represent the set of possible truncation points of code block $B_i$, with $n_i \in N_i$ a truncation point of that block. Let $R_i^1, R_i^2, \ldots$ be an enumeration of these truncation points and let $S_i^j = \Delta D_i^j / \Delta R_i^j$ be the corresponding rate-distortion slopes, where $\Delta R_i^j = R_i^j - R_i^{j-1}$ and $\Delta D_i^j = D_i^{j-1} - D_i^j$. Then (3) is simplified to finding $\lambda$ among a finite, discrete set of rate-distortion slopes.

Although the number of possible truncation points of each code block in JPEG 2000 could be quite large, in this letter we restrict these points to the ends of coding passes, for the following reasons. First, the rate-distortion curve between two adjacent coding passes of a code block is almost a straight line, because within a coding pass each encoded coefficient is supposed to make the same contribution to image quality; Fig. 2(a) shows the linear interpolation of the rate-distortion curve of a code block. Second, in practice the rate-distortion curve is not necessarily convex;² Fig. 2(b) shows an example that adjusts the convexity of the rate-distortion curve. Third, this assumption makes the otherwise complicated rate-distortion optimization feasible: there is a finite number of slopes $S_i^j$, so rate-distortion optimization becomes a sorting or searching problem over these slopes.

²For JPEG 2000 this often occurs due to the different statistics of the three types of coding passes.

III. NEW ALGORITHM

The set of possible truncation points $N_i$, the rate increments $\Delta R_i^j$, and the slopes $S_i^j$ are calculated when code block $B_i$ is encoded, and they are stored along with the compressed bit stream. The set $N_i$ is given by the number of bit-plane coding passes of the code block, because the end of each coding pass in the compressed bit stream is used as a truncation candidate point. The distortion-rate slopes of a code block are assumed to be decreasing; when this condition is not met, the slopes are adjusted to restore the convexity of the rate-distortion curve. In our experiments, such slopes are replaced by accumulated slopes.³ We first describe the PCRD algorithm and then the proposed MEPRD algorithm.

A. PCRD Algorithm

In the PCRD algorithm, after all code blocks are completely compressed, rate-distortion optimization is applied once with all the rate and slope information of each code block. The search for the optimal $\lambda$ and $\{n_i\}$ proceeds in a straightforward manner. Note that this rate-distortion optimization can be interpreted as a sorting problem over the distortion-rate slopes,⁴ so there are many ways to find the optimal $\lambda$ and $\{n_i\}$ for a given bit rate. The fast algorithm implemented in this experiment is as follows.

PCRD Algorithm With Sorting

1) Sort all slopes $S_i^j$, over all coding passes of all code blocks, in descending order.

2) Accumulate the corresponding rate increments $\Delta R_i^j$ following the sorting order until the accumulated rate reaches the budget: find the maximum $k$ such that the accumulated rate through the $k$th sorted slope satisfies $R_{acc} \le R_{\max}$, and let $\lambda = S_k$.

³When $S_i^n < S_i^{n+1}$, the accumulated slope $S = (\Delta D^n + \Delta D^{n+1})/(\Delta R^n + \Delta R^{n+1})$ is calculated.

⁴The complexity of sorting algorithms may increase linearly with image size; in fact, the bisection method can be used for a large image.
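A minimal Python sketch of this PCRD search (sorting variant) under our reading of the steps above might look as follows. The slope-merging helper implements the accumulated-slope adjustment of footnote 3; all names and data structures are illustrative assumptions, not the authors' implementation.

```python
# PCRD sketch: sort all coding-pass slopes in descending order, then
# accumulate rate increments until the budget R_max is reached.
# Purely illustrative; not the authors' implementation.

def convex_slopes(passes):
    """Merge coding passes into accumulated slopes (footnote 3) so the
    per-block slope sequence is strictly decreasing.
    passes: list of (dR, dD) increments per coding pass (dR > 0 assumed)."""
    stack = []  # entries: [dD, dR, last merged pass index]
    for n, (dR, dD) in enumerate(passes):
        cur = [dD, dR, n]
        # a later pass whose slope >= the previous one violates convexity
        while stack and cur[0] / cur[1] >= stack[-1][0] / stack[-1][1]:
            prev = stack.pop()
            cur = [cur[0] + prev[0], cur[1] + prev[1], n]
        stack.append(cur)
    return [(dD / dR, dR, n) for dD, dR, n in stack]

def pcrd(blocks, R_max):
    """blocks: {name: [(dR, dD), ...]} per-pass increments.
    Returns ({name: last kept pass index, or -1}, accumulated rate)."""
    cands = []
    for b, passes in blocks.items():
        for slope, dR, n in convex_slopes(passes):
            cands.append((slope, dR, b, n))
    cands.sort(key=lambda c: -c[0])      # step 1: descending slopes
    trunc = {b: -1 for b in blocks}
    R_acc = 0
    for slope, dR, b, n in cands:        # step 2: accumulate until budget
        if R_acc + dR > R_max:
            break
        R_acc += dR
        trunc[b] = n                     # last included pass per block
    return trunc, R_acc

blocks = {"a": [(100, 500.0), (80, 200.0), (60, 50.0)],
          "b": [(90, 300.0), (70, 100.0)]}
print(pcrd(blocks, 300))  # prints: ({'a': 1, 'b': 0}, 270)
```

Note that the greedy cut at the budget boundary is a simplification; the text's step of picking the maximal $k$ with $R_{acc} \le R_{\max}$ behaves the same way when the slope list is strictly decreasing.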



TABLE I
EXPERIMENTAL RESULTS FOR PCRD: CPU TIME (T) IN SECONDS (FOR BUBBLE SORT, INSERTION SORT, AND BISECTION METHOD), REQUIRED BUFFER SIZE TO HOLD THE BITSTREAM (M), AND PSNR

3) Determine the optimal truncation points for each code block: for each block $B_i$, the maximum $n_i$ whose slope satisfies $S_i^{n_i} \ge \lambda$ is set as the truncation point.

In the above algorithm, the sorting of the slopes is the major component that dictates performance, and the choice of sorting algorithm has a major impact on computation time. Several sorting algorithms are available in the literature [7]. We have implemented two simple methods, bubble sort and insertion sort. However, the same result can be obtained by using a bisection method to find the corresponding $\lambda$, and experimental results show that it requires much less computation time than the traditional sorting methods. The results are shown in Table I.

The main advantage of a PCRD algorithm is its simplicity, because the algorithm is applied just once. However, at low bit rate a PCRD algorithm may be very inefficient in terms of both encoding effort and memory requirement: it encodes all code blocks completely and keeps the compressed bit streams entirely, regardless of the given bit rate. Since JPEG 2000 gives superior image quality compared with current standards at very low bit rate, a PCRD algorithm may not be suitable for low bit rate applications. Hence we propose the memory efficient progressive rate-distortion algorithm below to solve these problems.

B. MEPRD Algorithm

The MEPRD algorithm encodes one coding pass of a code block at a time and executes rate-distortion optimization as necessary (after encoding one coding pass of a code block, rate-distortion optimization occurs if needed). Thus MEPRD can avoid encoding coding passes of a code block that have relatively small distortion-rate slopes. Its memory buffer requirement for the compressed bit streams is managed strictly according to the given bit rate, and the required buffer size can be estimated before the encoding process even starts. In the following algorithm we use two variables, Total_rate and Given_rate, to denote the total rate accumulated by the algorithm in the intermediate steps and the desired bit rate, respectively.

MEPRD Algorithm

1. Initialize Total_rate to 0.

2. Consider one code block at a time. For each code block $B_i$, do the following steps:

2.1 Encode one coding pass $n_i \in N_i$ of the current code block $B_i$ at a time; calculate the rate increment $\Delta R_i^{n_i}$ and the slope $S_i^{n_i}$; add $\Delta R_i^{n_i}$ to Total_rate; and save the encoded bit stream in the memory buffer (where $N_i$ is the set of possible truncation points of code block $B_i$).

2.2 While Total_rate is greater than Given_rate, repeat the following steps:

2.2.1 Find the lowest slope among the coding passes of the encoded code blocks.

2.2.2 Truncate the bit stream corresponding to that slope and reduce Total_rate by its rate increment.

2.2.3 If the truncated block is the current code block $B_i$, terminate the coding process for the current code block.
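The MEPRD loop above can be sketched in Python with a min-heap standing in for "find the lowest slope." Tier 1 pass encoding is abstracted away as precomputed (ΔR, ΔD) increments, so this is an illustrative model of the control flow, not the authors' encoder; per-block slopes are assumed decreasing, as in the text.

```python
# MEPRD sketch: encode one coding pass at a time; whenever Total_rate
# exceeds Given_rate, repeatedly truncate the encoded stream with the
# lowest distortion-rate slope (steps 2.2.1-2.2.3).  Illustrative only.
import heapq

def meprd(blocks, given_rate):
    """blocks: {name: [(dR, dD), ...]} pass increments, slopes assumed
    decreasing within each block.
    Returns ({name: list of kept pass indices}, Total_rate)."""
    heap = []                      # min-heap of (slope, name, pass, dR)
    kept = {b: [] for b in blocks}
    total = 0                                      # step 1
    for b, passes in blocks.items():               # step 2
        stop_block = False
        for n, (dR, dD) in enumerate(passes):      # step 2.1: one pass
            heapq.heappush(heap, (dD / dR, b, n, dR))
            kept[b].append(n)
            total += dR
            while total > given_rate:              # step 2.2
                s, tb, tn, tr = heapq.heappop(heap)  # 2.2.1 lowest slope
                kept[tb].remove(tn)                # 2.2.2 truncate it
                total -= tr
                if tb == b:                        # 2.2.3 current block done
                    stop_block = True
            if stop_block:
                break                # remaining passes are never encoded
    return kept, total

blocks = {"a": [(100, 500.0), (80, 200.0)], "b": [(90, 300.0)]}
print(meprd(blocks, 200))  # prints: ({'a': [0], 'b': [0]}, 190)
```

Because slopes decrease within a block, the globally lowest slope in the heap always belongs to the tail pass of some block, so popping it corresponds to truncating that block's bit stream.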


Fig. 3. Flow chart of MEPRD.

The MEPRD algorithm performs almost the same procedures as the PCRD algorithm as long as Total_rate does not exceed Given_rate; the only difference is that MEPRD calculates Total_rate and compares it to Given_rate after encoding each coding pass. When Total_rate exceeds Given_rate, MEPRD performs the rate-distortion optimization procedures and makes Total_rate less than or equal to Given_rate. In these procedures, unnecessary encoded bit streams and their rate/slope information are removed from the memory buffer. If the bit stream of the currently encoded code block is truncated, the remaining coding passes of this code block are not even encoded, because the slopes of the coding passes of a code block are assumed to be decreasing.⁵ Thus MEPRD is involved in both the Tier 1 and Tier 2 encoding phases in order to generate the compressed bit stream with reduced computational and memory buffer requirements, compared with the PCRD scheme, which is applied in the Tier 2 encoding phase only. Fig. 3 shows the flowchart of the MEPRD algorithm; the loop ends when the block index exceeds the total number of code blocks, and the rate-distortion algorithm then stops.

Because MEPRD truncates bit streams of the encoded code blocks that are not necessary for the final bit stream at the target bit rate, it is easy to see that MEPRD keeps the memory buffer size for the compressed bit streams almost constant. This memory buffer size can be easily estimated and is almost the same as that of the final bit stream. Moreover, MEPRD does not need the rate and slope information of the truncated bit streams. When the target bit rate is less than the size of the completely compressed bit stream, MEPRD may not encode all the coding passes of the code blocks, which can save encoding computation significantly. However, the MEPRD algorithm depends greatly on the target bit rate: as the target bit rate increases, MEPRD needs more encoding time/memory and more rate-distortion operations, and its advantages over PCRD shrink. In fact, MEPRD truncation can increase computational complexity considerably, so when the available memory buffer is much larger than that needed for the compressed bit streams at the given rate, truncation may be performed only intermittently or not at all. The JPEG 2000 software Kakadu [5], [8], for example, provides functions that scan and truncate the memory buffer to a certain maximum size only periodically, where the period is determined on the basis of the image size.

To find the optimal $\lambda$ and $\{n_i\}$ at any point of the encoding, appropriate information must be stored, i.e., the rates and slopes of the coding passes of the code blocks encoded so far. Instead of raw rates and slopes, an ordered list and slope information are used in MEPRD, and this overhead may increase almost linearly with image size. To minimize the amount of such information, a method to find a suboptimal $\lambda$ is suggested in [5]; a method that estimates a suboptimal $\lambda$ from statistical data over a set of images has also been proposed [8].

⁵As discussed in Section II, the slopes of the coding passes of a code block may not be in descending order. So, encoding two or more additional coding passes before determining the slope of a coding pass can give a better rate-distortion result.

IV. EXPERIMENTAL RESULTS

In principle, the PCRD and MEPRD algorithms should give the same rate-distortion results; in other words, both algorithms should generate identical compressed bit streams when the same target rate for an image is applied.
However, as we discussed, the slopes of the coding passes of a code block may not be in strictly descending order, so identical rate-distortion results cannot be guaranteed. In fact, PCRD provides the optimal rate-distortion results while MEPRD provides suboptimal ones. The experimental results show that PCRD yields better PSNR than MEPRD at most data rates, as expected.⁶ However, the PSNR gain is not significant, and it disappears as the data rate grows.

Fig. 4 shows the set of test images used in our experiment. All images are of size 640 × 480, and three levels of wavelet decomposition with the 5/3 lifting-based DWT were used to encode them. A code block size of 32 × 32 is chosen in all experiments. Various target bit rates (from 0.1 bpp to lossless) are chosen to illustrate the differences between the two rate-control algorithms.

Table I shows the experimental results using PCRD with three different implementation methods, i.e., bubble sort, insertion sort, and the bisection method. As expected, all three implementations produce the same bit stream. We have measured the CPU time (T) for the PCRD algorithm using all three implementation methods, as shown in Table I, and also the CPU time for the MEPRD algorithm, as shown in Table II.

⁶In all experiments we used our own JPEG 2000 codec, so the PSNR results may differ from those of the JPEG 2000 VMs.


Fig. 4. Images used in our experiment. (a) ARMY. (b) CATHEDRAL. (c) CHIMP. (d) TOWN. All images are of size 640 × 480.

TABLE II EXPERIMENTAL RESULTS FOR THE PROPOSED MEPRD METHOD

We have also measured the memory buffer requirements (M) to hold the compressed bit stream for PCRD and MEPRD,⁷ as shown in Tables I and II, respectively. The memory requirements measured in these experiments do not include the memory space for the rate and slope information of an image, because this space depends greatly on the code block size and the truncation points of a code block. In PCRD, the memory requirement (M) for an image is independent of the given target bit rate, as shown in Table I; this is simply because the entire compressed bit stream must be held before PCRD is applied. In the lossless case no sorting algorithm need be applied, and the execution (CPU) time of lossless compression is less than those of the other rates; obviously, the execution times of all three implementations are the same in the lossless case.

Table I shows that the sorting method implemented in the PCRD algorithm plays an important role in the coding time of the JPEG 2000 encoder. Bubble sort is well known for its high computational complexity of O(n²); as shown in Table I, the bubble sort implementation always requires much more CPU time than insertion sort or the bisection method. The bisection method is slightly faster than the insertion sort implementation.

⁷We measured the CPU time and memory buffer requirement for both the MEPRD algorithm and MEPRD without truncation.
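The claim that MEPRD's buffer can be budgeted from the target bit rate alone can be sketched as a back-of-the-envelope calculation. The margin term for the coding pass currently in flight is an assumed placeholder, since the paper only notes that it depends on the target rate and the code block size.

```python
# Rough pre-encoding estimate of the MEPRD bit-stream buffer: about the
# final stream size at the target rate, plus a margin for the coding
# pass currently being encoded.  The margin value is an illustrative
# assumption, not a figure from the paper.

def estimate_buffer_bytes(width, height, target_bpp, pass_margin_bytes=4096):
    stream_bytes = width * height * target_bpp / 8.0  # target-rate stream size
    return int(stream_bytes) + pass_margin_bytes

# e.g., a 640 x 480 test image at 0.25 bpp:
print(estimate_buffer_bytes(640, 480, 0.25))  # prints: 13696
```

By contrast, no such pre-encoding bound exists for PCRD, whose buffer must hold the full compressed stream of the image regardless of the target rate.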

However, insertion sort is easier to implement than the bisection method. Table II shows the experimental results for the proposed MEPRD method. As we can see from the table, the required memory buffer size for the compressed bit streams of MEPRD depends on the target bit rate, while PCRD always needs the same memory size regardless of the target bit rate; in fact, the memory requirement of PCRD depends on the image itself and is hard to estimate until encoding is complete. The memory requirement of MEPRD, in contrast, can be estimated from the given bit rate plus some marginal memory needed to keep the bit stream of the coding pass being encoded; the maximum marginal memory depends on the target bit rate and the code block size. Note that the CPU time (T) increases with the target bit rate, reaches a peak around 0.5 bpp, and then decreases as the rate increases further. This can be interpreted as follows: when the bit rate is low, MEPRD encodes only a limited number of code blocks and coding passes, and truncation of unnecessary bit streams does not occur often. As the bit rate increases, the encoding and truncation procedures get heavier and the CPU time increases. As the bit rate approaches lossless compression, encoding time increases while the truncation procedure decreases



significantly. In the lossless case no truncation occurs; however, MEPRD still needs to calculate Total_rate and compare it to Given_rate, so MEPRD requires a little more time than PCRD does, but the difference is negligible, as shown in Tables I and II. As expected, MEPRD without truncation requires much less CPU time than both PCRD and MEPRD over the entire range of data rates, and its memory buffer requirement lies between those of PCRD and MEPRD; MEPRD truncation consumes a lot of CPU time in our implementation.

Fig. 5 shows the comparison of average memory requirement between PCRD and MEPRD, and Fig. 6 shows the comparison of average execution (CPU) time between PCRD (bisection method) and MEPRD. It is clear from these two figures that in low bit rate applications the proposed MEPRD method has the advantage both in memory requirement and in execution time.

Fig. 5. Comparison of average memory required between PCRD and MEPRD with and without truncation.

Fig. 6. Comparison of average CPU time over the set of test images between PCRD (bisection method) and MEPRD with and without truncation.

V. CONCLUSION

A novel memory efficient progressive rate-distortion optimization algorithm for JPEG 2000 has been proposed. The proposed MEPRD algorithm truncates parts of the bit streams of the encoded code blocks by applying the rate-distortion algorithm multiple times as necessary, so it keeps the memory buffer for the compressed bit streams almost constant. Moreover, it avoids encoding some coding passes of code blocks that make relatively small contributions to the final image quality. At low bit rate, this algorithm has great advantages over the existing PCRD algorithm. Since one of the major applications targeted by JPEG 2000 standardization is excellent image quality at low bit rate, the proposed algorithm can greatly reduce encoder complexity and buffer requirements in low bit rate applications. However, as the target bit rate increases, the MEPRD algorithm needs more encoding time/memory and rate-distortion operations, and its advantages over PCRD get smaller.

REFERENCES

[1] W. B. Pennebaker and J. L. Mitchell, JPEG: Still Image Data Compression Standard. New York: Van Nostrand Reinhold, 1993.
[2] JPEG 2000 Part I: Final Committee Draft (ISO/IEC FCD15444-1), ISO/IEC JTC1/SC29/WG1 N11855, 2000.
[3] M. Antonini, M. Barlaud, P. Mathieu, and I. Daubechies, "Image coding using wavelet transform," IEEE Trans. Image Process., vol. 1, no. 4, pp. 205–220, Apr. 1992.
[4] D. Taubman, "High performance scalable image compression with EBCOT," IEEE Trans. Image Process., vol. 9, no. 7, pp. 1158–1170, Jul. 2000.
[5] D. Taubman and M. Marcellin, JPEG2000: Image Compression Fundamentals, Standards and Practice. Norwell, MA: Kluwer, 2002.
[6] A. Ortega and K. Ramchandran, "Rate-distortion methods for image and video compression," IEEE Signal Process. Mag., vol. 15, no. 6, pp. 23–50, Nov. 1998.
[7] D. E. Knuth, The Art of Computer Programming, Volume 3: Sorting and Searching. Reading, MA: Addison-Wesley, 1998.
[8] D. Taubman, "Software architectures for JPEG2000," in Proc. IEEE Int. Conf. Digital Signal Processing, vol. 1, 2002, pp. 197–200.
