Quality-biased Rate Allocation for Compound Image Coding with ...

Quality-biased Rate Allocation for Compound Image Coding with Block Classification

Dong Liu1, Wenpeng Ding2, Yuwen He3, Feng Wu3

1 Dept. of Electronic Science and Technology, 2 Dept. of Computer Science, University of Science and Technology of China, Hefei 230026, China. Email: {liud, wpding3}@mail.ustc.edu.cn
3 Microsoft Research Asia, Beijing 100080, China. Email: [email protected], [email protected]

Abstract - In this paper, we propose a novel rate allocation method for compound image coding that uses a Quality-biased Rate-Distortion Optimization (QRDO) technique to enhance visual quality. The compound image is divided into 16×16 blocks, which are classified into four types: smooth, text/graphics, continuous tone and hybrid. Four coding approaches, each designed for the statistical features of one block type, are introduced to compress them, and a rate-distortion tradeoff with the QRDO technique decides which approach is used for each block. The quality of “text” regions is specially emphasized during this process by weighting blocks according to their types, and the benefit is verified by experimental results.

I. INTRODUCTION

Compound images are images that contain both palettized regions, i.e. text or graphics, and continuous tone regions, i.e. photographs or natural-like textures. For example, hybrid documents, slides, web pages and content captured from a screen are all typical “compound” images. In many real-time remote collaboration and interactive applications, compound images need to be compressed and transmitted. When compressing compound images, it is widely believed that the palettized regions and continuous tone regions should be coded with different approaches because of their different statistical characteristics. This view has led to many compression methods, in which segmentation of text is always adopted. In [1, 2, 3], for document image compression, multilayer segmentation is used and text is kept in one of the layers; [4] chooses block-based segmentation, in which blocks are classified into several types to indicate different regions in the document image. These represent the two categories of compound image coding, i.e. layer-based and block-based. Since the latter is much simpler, it is widely used in recent real-time transmission systems for computer-generated images or videos, such as [5, 6], which have been in great demand and under active development in recent years.

In constant bit-rate coding and real-time streaming, bit-rate control, or rate allocation, is a basic problem. It is the basis for future screen video compression and transmission. Many rate allocation methods exist for general image coding, but they are not well suited to compound images. Compound images raise a different requirement, because human vision tends to be much more sensitive to the quality of text and other sharp edges than to the quality of photographs or textures. Text is also often more important because it carries high-level semantic information; at the very least, the text should remain recognizable.

In this paper, an approach to address this problem is proposed. We also choose block-based segmentation: first the image is divided into 16×16 blocks, which are then classified into four types. Then we propose an effective rate allocation algorithm using the Quality-biased Rate-Distortion Optimization (QRDO) technique, in which we decide which coding approach should be used for each block, namely select the best mode for each block, and set a quality parameter (QP) for each approach according to the given rate constraint. The distortion of “text” blocks is weighted more heavily in this rate-distortion tradeoff to ensure high text quality. The whole coding framework is shown in Figure 1.

This paper is organized as follows: Section II describes the segmentation algorithm and coding approaches in detail, and Section III introduces our rate allocation algorithm with weighted distortion. Section IV gives some experimental results and Section V concludes our work.

Figure 1. Framework of block-based compound image compression

This work was done when the authors were with Microsoft Research Asia.

0-7803-9390-2/06/$20.00 ©2006 IEEE


ISCAS 2006

II. BLOCK-BASED SEGMENTATION AND CODERS DESIGN

Block-based segmentation has been studied by many researchers. One of the simplest versions can be found in [6], where the number of unique colors in a block is counted and compared with a threshold to classify the block as text/graphics or picture. To find text on pictures, a further refinement extracts shape primitives from picture blocks. In [4], a color histogram is used: modes corresponding to peaks in the histogram are found, and the block type is decided by thresholding the number of modes. Another approach is given in [5], where the distribution of gradients is used to calculate an entropy that classifies the blocks.

In our system, we propose a novel block classification algorithm that separates the image into four types of blocks: smooth blocks that contain only one dominant color, text/graphics blocks that consist mostly of a few (fewer than four) colors, continuous tone (cont-tone) blocks that contain many colors and few edges, and hybrid blocks that contain many colors with considerable edges. Hybrid blocks indicate regions in which both palettized and continuous tone components can be found, e.g. text written on a photograph, or richly textured graphics. These blocks are not handled well by either the text/graphics coder or the cont-tone coder: the former would achieve a very poor compression ratio and the latter would greatly decrease the visual quality due to ringing effects. A similar consideration can be found in [6], but there the thresholding of the number of colors is too simple while the refinement that extracts shape primitives is too complicated.

We use the grayscale-gradient distribution of a block for classification. If the block contains text or graphics, there must be continuous “objects” in it that are easily differentiated from each other, so peaks will be obvious in the low-gradient partition (indicating continuous objects) and in the high-gradient partition (indicating edges between different objects). If the block contains no text or graphics, the distribution will be concentrated in the medium-gradient partition (indicating continuous tone regions). For convenience, we divide the grayscale-gradient distribution into three partitions, i.e. the low-gradient, medium-gradient and high-gradient grayscale histograms. The differences between the four block types can be seen in Figure 2.

Our algorithm can be summarized as follows. First we compute the low-gradient and high-gradient histograms of each block by checking all pixels and comparing them with their 8-neighbors. Then a series of tests is applied to the histograms. If both the low-gradient and high-gradient counts are less than a pre-defined threshold, the block is labeled cont-tone. If the color range of the low-gradient histogram is small enough, the block is labeled smooth, and the dominant color is recorded for coding. Otherwise we find peaks in both the low-gradient histogram and the high-gradient histogram, and regard the range of colors corresponding to a peak as a mode; if a few (fewer than four) modes include most pixels, the block is labeled text/graphics.

Figure 2. Typical grayscale histograms of blocks in four types: smooth, text/graphics, cont-tone and hybrid
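The classification procedure of this section, including its final test on the high-gradient count (hybrid if the count is large enough, cont-tone otherwise), can be sketched roughly as follows. This is only an illustration under stated assumptions: every threshold value, the greedy peak picking in `covered_by_modes`, and all function names are hypothetical and are not taken from the paper, which does not give these details.

```python
from collections import Counter

# Every threshold here is an illustrative assumption, not a value from the paper.
LOW_GRAD = 4            # gradient <= LOW_GRAD: low-gradient pixel
HIGH_GRAD = 64          # gradient >= HIGH_GRAD: high-gradient pixel
MIN_FRACTION = 0.2      # threshold on the low/high-gradient pixel fractions
SMOOTH_RANGE = 4        # largest gray-level range allowed in a smooth block
MODE_WIDTH = 8          # gray levels within +/- MODE_WIDTH of a peak form a mode
MAX_MODES = 3           # "fewer than four" modes
MODE_COVERAGE = 0.9     # fraction of pixels the modes must include
HYBRID_FRACTION = 0.1   # high-gradient fraction that makes a block hybrid


def gradient(block, y, x):
    """Largest absolute difference between a pixel and its 8-neighbors."""
    height, width = len(block), len(block[0])
    best = 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            ny, nx = y + dy, x + dx
            if (dy or dx) and 0 <= ny < height and 0 <= nx < width:
                best = max(best, abs(block[y][x] - block[ny][nx]))
    return best


def covered_by_modes(hist, max_modes):
    """Greedy peak picking (an assumption); returns the pixels the modes include."""
    remaining = dict(hist)
    covered = 0
    for _ in range(max_modes):
        if not remaining:
            break
        peak = max(remaining, key=remaining.get)
        members = [g for g in remaining if abs(g - peak) <= MODE_WIDTH]
        covered += sum(remaining.pop(g) for g in members)
    return covered


def classify_block(block):
    """Label a grayscale block given as a list of rows of ints in 0..255."""
    n = len(block) * len(block[0])
    low_hist, high_hist = Counter(), Counter()
    for y, row in enumerate(block):
        for x, gray in enumerate(row):
            g = gradient(block, y, x)
            if g <= LOW_GRAD:
                low_hist[gray] += 1
            elif g >= HIGH_GRAD:
                high_hist[gray] += 1
    low, high = sum(low_hist.values()), sum(high_hist.values())

    # Tests applied in the order given in the text.
    if low < MIN_FRACTION * n and high < MIN_FRACTION * n:
        return "cont-tone"
    if low_hist and max(low_hist) - min(low_hist) <= SMOOTH_RANGE:
        return "smooth"
    if covered_by_modes(low_hist + high_hist, MAX_MODES) >= MODE_COVERAGE * n:
        return "text/graphics"
    if high >= HYBRID_FRACTION * n:
        return "hybrid"
    return "cont-tone"
```

Note that, as written, a block with only thin strokes on a flat background would pass the smooth test, because its low-gradient pixels all share one color. A practical version would probably need an extra check on the high-gradient count; the text does not say how this case is handled.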

Finally, if the high-gradient count is large enough, the block is labeled hybrid; otherwise the block is labeled cont-tone.

Corresponding to the four types of blocks, we design four coders. The smooth coder just records the dominant color of the block. The text coder is near-lossless: the colors of the same mode are all replaced with the most frequent color in that mode, and colors that belong to no mode are recorded independently. The cont-tone coder, similar to JPEG, is based on the DCT. The hybrid coder is a one-level Haar wavelet transform followed by entropy coding. Because of the short basis functions of the Haar wavelet, the hybrid coder is efficient in removing the ringing artifacts around edges that are always annoying when using the DCT. For the same reason, however, its coding performance is poor compared with the DCT on images with few edges.

III. RATE ALLOCATION WITH MODE SELECTION

A. Mode Selection

We now have four coders and must decide the proper coder for each block under a specified rate constraint. This problem is similar to “mode selection” in traditional video compression. Cheng et al. [7] gave a mode selection approach for block-based document compression, but they did not address the rate allocation problem, and their method does not consider the quality of text. In our system, we propose the QRDO technique to guarantee the quality of text in compound images. The problem of optimal rate allocation is formulated as

$$\min \sum_i \alpha_i D_{i,x(i)}, \quad \text{subject to} \quad \sum_i R_{i,x(i)} \le R_0, \qquad x(i) \in \{\text{smooth}, \text{text/graphics}, \text{cont-tone}, \text{hybrid}\} \qquad (1)$$

where $i$ indicates the $i$-th block, $\alpha_i$ is a quality-weight factor, $x(i)$ is the selected coding mode, $D_{i,x(i)}$ and $R_{i,x(i)}$ are the distortion and rate of block $i$ coded in mode $x(i)$, and $R_0$ is the rate constraint. Here we evaluate $\alpha_i$ as

$$\alpha_i = \begin{cases} C_{hg} / T_{hg} & \text{for text/graphics/hybrid blocks} \\ 1.0 & \text{for smooth/cont-tone blocks} \end{cases} \qquad (2)$$

where $C_{hg}$ is the high-gradient count of the block and $T_{hg}$ is a threshold. Note that this factor is obtained during the segmentation process and stays the same for each block regardless of the mode selection. In our system $T_{hg}$ is set to 0.1 to ensure that for most “text” blocks the factor $\alpha_i$ is larger than 1.0. Using Lagrangian optimization, for each block we have to minimize the cost function

$$J_i = \alpha_i D_{i,x(i)} + \lambda R_{i,x(i)} \qquad (3)$$

to choose the best mode $x^*(i)$. The remaining problem is how to determine the value of $\lambda$ that satisfies the given rate constraint.
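To make the per-block decision in (2) and (3) concrete, here is a minimal sketch. It assumes that the high-gradient count is expressed as a fraction of the block's pixels (suggested, but not stated, by the threshold value 0.1), and that the distortion and rate of each candidate mode have already been measured. The function names and the numbers in the usage lines are hypothetical.

```python
T_HG = 0.1  # threshold value quoted in the text


def quality_weight(block_type, c_hg, t_hg=T_HG):
    """Equation (2); c_hg is assumed to be a fraction of the block's pixels."""
    if block_type in ("text/graphics", "hybrid"):
        return c_hg / t_hg
    return 1.0


def select_mode(candidates, alpha, lam):
    """Equation (3): pick the mode that minimizes J = alpha * D + lam * R.

    candidates maps a mode name to its (distortion, rate) pair.
    """
    def cost(mode):
        distortion, rate = candidates[mode]
        return alpha * distortion + lam * rate
    return min(candidates, key=cost)


# Made-up measurements for one block: mode -> (distortion, rate in bits).
candidates = {
    "text/graphics": (2.0, 1200.0),
    "cont-tone": (40.0, 300.0),
    "hybrid": (15.0, 700.0),
}
unweighted_choice = select_mode(candidates, alpha=1.0, lam=0.05)
weighted_choice = select_mode(
    candidates, alpha=quality_weight("text/graphics", 0.3), lam=0.05)
```

With these made-up numbers the unweighted cost favors the hybrid coder, while the weight of about 3 moves the choice to the near-lossless text coder, which is the bias toward text quality that QRDO is meant to produce.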

B. Rate Allocation Method

Rate-Distortion (R-D) optimized mode selection can achieve the optimal coding performance subject to a specified rate constraint. First, however, the parameter $\lambda$ in (3) must be set, and evaluating $\lambda$ exactly for a given rate constraint $R_0$ can be computationally expensive. We introduce an approximate algorithm to determine the Lagrangian parameter $\lambda$. First we evaluate the quality parameters for the cont-tone coder and the hybrid coder according to the rate constraint. Then we use the slopes of the R-D curves of these two flexible coders to estimate $\lambda$ and make the mode selection. After that the total rate may not satisfy the constraint, so we change the quality parameters, estimate a new $\lambda$, and again select the mode for each block. These operations are iterated until the rate constraint is satisfied and the minimum distortion found has not changed for a period (more than 10 iterations). This algorithm is shown in Figure 3.

C. Modeling and Quality Parameter Setting

Modeling greatly simplifies the computations for the rate-distortion tradeoff and has been well studied in rate-control work for video coding. A recent result [8] shows that an exponential approximation is proper for both rate and distortion, that is

$$D(Q) \approx A Q^{\alpha}, \quad R(Q) \approx B Q^{\beta} \qquad (4)$$

where $Q$ is the quantization step, and $A$, $B$, $\alpha > 0$, $\beta < 0$ are parameters that depend on the characteristics of the block. According to the QRDO technique, we replace the distortion with the weighted distortion, whose weight in (2) depends on the high-gradient count $C_{hg}$ of the block, i.e.

$$D_i' = \alpha_i D_i \qquad (5)$$

Here $\alpha_i$ is the quality-weight factor defined in (2). We then use least-squares (minimum-MSE) fitting for both the hybrid coder and the cont-tone coder to solve for the parameters in (4). Following the constant-slope policy, we have

$$\lambda = -\frac{A_h \alpha_h}{B_h \beta_h} Q_h^{\alpha_h - \beta_h} = -\frac{A_c \alpha_c}{B_c \beta_c} Q_c^{\alpha_c - \beta_c} \qquad (6)$$

which gives the Lagrangian parameter $\lambda$ for mode selection. Here the subscripts $h$ and $c$ indicate hybrid and cont-tone blocks. Each side of (6) is the negative slope $-dD/dR = -(dD/dQ)/(dR/dQ)$ of the model (4) for one coder. In addition, we have

$$N_h R_h + N_c R_c \le R_0' \qquad (7)$$

where $N_h$ and $N_c$ are the numbers of hybrid and cont-tone blocks, $R_h$ and $R_c$ are the average rates of hybrid and cont-tone blocks, and $R_0'$ is the rate constraint for the hybrid and cont-tone blocks, which equals $R_0$ minus the rate consumed by the smooth and text/graphics blocks. We assume that the total rate constraint is high enough that $R_0'$ is reasonable. By combining (4), (6) and (7), $Q_h$ and $Q_c$ can be easily solved, and (6) then also gives an estimated value of the Lagrangian parameter $\lambda$ for mode selection.

IV. EXPERIMENTAL RESULTS

First we verify the exponential approximation by plotting the average rate and average distortion versus the quantization step for both hybrid and cont-tone blocks in logarithmic coordinates, as shown in Figure 4. Clearly, log D and log R are both nearly linear in log Q for both hybrid and cont-tone blocks, which verifies the relations described in (4). Note that here the distortion is already weighted.

Figure 4. Average distortion and rate of hybrid blocks and cont-tone blocks versus quantization scale (in logarithmic coordinates). Curves: Hybrid D-Q, Hybrid R-Q, Cont-tone D-Q, Cont-tone R-Q.
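Because the relations in (4) are straight lines in logarithmic coordinates, their parameters can be estimated with an ordinary linear regression on (log Q, log D) or (log Q, log R). The sketch below assumes that the fit is done in the log domain, which minimizes the squared error of the logarithms rather than of the values themselves; the paper does not say which variant it uses, and the function name is hypothetical.

```python
import math


def fit_power_law(q_values, y_values):
    """Fit y ~ coef * Q**exponent by least squares on (log Q, log y)."""
    xs = [math.log(q) for q in q_values]
    ys = [math.log(y) for y in y_values]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    exponent = sxy / sxx                      # slope of the log-log line
    coef = math.exp(mean_y - exponent * mean_x)  # intercept, back in linear scale
    return coef, exponent


# Synthetic samples that follow (4) exactly, used only to exercise the fit.
steps = [2, 4, 8, 16, 32]
distortions = [0.5 * q ** 1.8 for q in steps]
rates = [3.0 * q ** -0.9 for q in steps]
A, alpha = fit_power_law(steps, distortions)
B, beta = fit_power_law(steps, rates)
```

On measured data the fitted exponent for distortion should come out positive and the one for rate negative, matching the sign conditions stated with (4).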

Figure 3. Rate allocation algorithm
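The quality-parameter step of this algorithm combines (4), (6) and (7). One possible numerical way to do that, offered here only as a sketch, is to invert (6) to get each quantization step as a function of λ and then bisect on λ until the rate budget in (7) is met with equality. The bisection, its bracket, and all names and numbers below are assumptions, not the paper's procedure.

```python
import math


def q_from_lambda(lam, A, a, B, b):
    """Invert one side of (6): lam = -(A*a)/(B*b) * Q**(a - b), with a > 0 > b."""
    return (lam * (-B * b) / (A * a)) ** (1.0 / (a - b))


def total_rate(lam, coders):
    """Left side of (7); coders is a list of (block_count, (A, a, B, b))."""
    return sum(n * B * q_from_lambda(lam, A, a, B, b) ** b
               for n, (A, a, B, b) in coders)


def solve_quality_parameters(hybrid, cont_tone, n_h, n_c, r0, steps=200):
    """Return (lam, Q_h, Q_c) that meet the budget r0 with a common slope."""
    coders = [(n_h, hybrid), (n_c, cont_tone)]
    lo, hi = 1e-12, 1e12  # assumed bracket: rate too high at lo, within budget at hi
    for _ in range(steps):
        mid = math.sqrt(lo * hi)  # bisection in the log domain
        if total_rate(mid, coders) > r0:
            lo = mid  # over budget: a larger lam gives coarser quantization
        else:
            hi = mid
    lam = hi  # hi always satisfies the budget
    return lam, q_from_lambda(lam, *hybrid), q_from_lambda(lam, *cont_tone)


# Hypothetical fitted parameters (A, alpha, B, beta) and block counts.
hybrid_model = (0.6, 1.7, 900.0, -0.8)
cont_tone_model = (0.4, 1.9, 700.0, -1.0)
lam, q_h, q_c = solve_quality_parameters(
    hybrid_model, cont_tone_model, n_h=100, n_c=300, r0=50000.0)
```

The total rate falls monotonically as λ grows, because each Q increases with λ and each modeled rate decreases with Q, which is what makes a simple bisection sufficient here.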

Figure 5 shows the evolution of rate and distortion during the rate allocation iterations. As expected, the rate always stays close to the specified rate, and the distortion gradually decreases to a minimum. The distortion sometimes fluctuates, however, so we terminate the iterations when the minimum distortion found so far has not changed for a period.

Figure 5. Evolution of rate and distortion during rate allocation iterations. Horizontal axis: iterations; vertical axes: rate (bpp) and distortion; curves: Rate, Specified Rate, Distortion.

Figure 6 shows the curves of PSNR versus rate for several compound images using different coding methods. “No RDO” means that the mode selection is skipped and the quality parameters for the cont-tone and hybrid coders are the same, “QRDO” means that the quality-biased mode selection and rate allocation are both performed, and “GRDO” means generic RDO, in which no special requirement on the quality of text is considered, i.e. all quality-weight factors $\alpha_i$ are set to 1.0. Generic RDO can improve PSNR by about 0.5 to 2 dB, or even more, whereas quality-biased RDO often decreases PSNR a little. This is because we replace the distortion with the weighted distortion in the R-D tradeoff, so QRDO is not “optimized” in the sense of average distortion. However, as shown in Figure 7, the QRDO method outperforms the other methods in visual quality, and in particular it enhances the quality of text. We also give the results of DjVu [1] and JPEG.

Figure 6. PSNR versus rate for several compound images with different coding methods: No RDO, QRDO and GRDO. Horizontal axis: rate (bpp); vertical axis: PSNR (dB).

Figure 7. Visual quality comparison with compression ratio: raw image, DjVu (22.25:1), JPEG (22.11:1), No RDO (22.29:1), GRDO (22.19:1) and QRDO (22.18:1)

V. CONCLUSION

We present an effective rate allocation approach for compound image coding with block-based segmentation and QRDO-based mode selection. The segmentation algorithm identifies not only text/graphics and picture blocks, but also hybrid blocks. The rate allocation method gives an approximate estimate of the parameters for mode selection, which proves to be fast and efficient. During the mode selection and QP setting, the distortion is replaced with a weighted distortion, which yields the Quality-biased RDO technique that enhances the quality of text. The algorithm is efficient in balancing visual quality and coding performance, and it outperforms the generic RDO method in visual quality.

REFERENCES

[1] Léon Bottou, Patrick Haffner, Paul G. Howard, Patrice Simard, Yoshua Bengio and Yann LeCun, “High Quality Document Image Compression with DjVu”, Journal of Electronic Imaging, vol.7, no.3, pp.410-425, July 1998
[2] Patrice Y. Simard, Henrique S. Malvar, James Rinker, and Erin Renshaw, “A Foreground/Background Separation Algorithm for Image Compression”, DCC ’2004, pp.498-507, Mar. 2004
[3] B.-F. Wu, C.-C. Chiu and Y.-L. Chen, “Algorithms for compressing compound document images with large text/background overlap”, IEE Proceedings - Vision, Image, and Signal Processing, vol.151, no.6, pp.453-459, Dec. 2004
[4] Xin Li and Shawmin Lei, “Block-based Segmentation and Adaptive Coding for Visually Lossless Compression of Scanned Documents”, ICIP ’2001, vol.3, pp.450-453
[5] Amir Said, “Compression of Compound Images and Video for Enabling Rich Media in Embedded Systems”, SPIE Proc. vol. 5308, pp. 69-82, Jan. 2004
[6] Tony Lin and Pengwei Hao, “Compound Image Compression for Real-Time Computer Screen Image Transmission”, IEEE Trans. on Image Processing, vol.14, no.8, pp.993-1005, Aug. 2005
[7] Hui Cheng and Charles A. Bouman, “Document compression using rate-distortion optimized segmentation”, Journal of Electronic Imaging, vol.10, no.2, pp.460-474, Apr. 2001
[8] Nejat Kamaci, Yucel Altunbasak, and Russell M. Mersereau, “Frame Bit Allocation for the H.264/AVC Video Coder Via Cauchy-Density-Based Rate and Distortion Models”, IEEE Trans. on Circuits and Systems for Video Technology, vol.15, no.8, pp.994-1006, Aug. 2005
