Submitted to the IEEE Transactions on Image Processing in February 2001.


Down-Sampling for Better Transform Compression

Alfred M. Bruckstein (1), Michael Elad (2), Ron Kimmel (3)



Abstract. The most popular lossy image compression method used on the Internet is the JPEG standard. JPEG's good compression performance and low computational and memory complexity make it an attractive method for natural image compression. Nevertheless, as we go to low bit rates that imply lower quality, JPEG introduces disturbing artifacts. It is known that at low bit rates a down-sampled image, when JPEG compressed, visually beats the high-resolution image compressed via JPEG with the same number of bits. Motivated by this idea, we show how down-sampling an image to a low resolution, then using JPEG at the lower resolution, and subsequently interpolating the result to the original resolution can improve the overall PSNR performance of the compression process. We give an analytical model and a numerical analysis of the down-sampling, compression, and up-sampling process that makes explicit the possible quality/compression trade-offs. We show that the image autocorrelation can provide a good estimate of the down-sampling factor that achieves optimal performance. Given a specific budget of bits, we determine the down-sampling factor necessary to get the best possible recovered image in terms of PSNR.

keywords: JPEG compression, image down–sampling, bit allocation, quantization.

1. Computer Science Department, Technion, Haifa 32000, Israel, email: [email protected].
2. Computer Science Department - SCCM Program, Stanford University, Stanford, CA 94305, USA, email: [email protected].
3. Computer Science Department, Technion, Haifa 32000, Israel, email: [email protected].


I. INTRODUCTION

The most popular lossy image compression method used on the Internet is the JPEG standard [1]. Figure 1 presents a basic block diagram of the JPEG encoder. JPEG uses the Discrete Cosine Transform (DCT) on image blocks of size 8 x 8 pixels. The fact that JPEG operates on small blocks is motivated by both computational/memory considerations and the need to account for the non-stationarity of the image. A quality measure determines the (uniform) quantization steps for each of the 64 DCT coefficients. The quantized coefficients of each block are then zigzag-scanned into one vector that goes through run-length coding of the zero sequences, thereby clustering long runs of insignificant, low-energy coefficients into short and compact descriptors. Finally, the run-length sequence is fed to an entropy coder, which can be a Huffman coding algorithm with either a known dictionary or a dictionary extracted from the specific statistics of the given image. A different alternative supported by the standard is arithmetic coding.

Fig. 1. JPEG block diagram: slice into 8-by-8 blocks, Discrete Cosine Transform, uniform quantization, zigzag scan, entropy coding, producing the bit stream '01001010 ...'.
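To make the per-block pipeline of Fig. 1 concrete, here is a minimal Python sketch of the steps up to the zigzag scan (entropy coding omitted). The quantization table below is a simple stand-in, not the standard's table, and the function names are illustrative assumptions rather than part of the JPEG specification.

```python
import numpy as np
from scipy.fft import dctn, idctn

def zigzag_indices(n=8):
    """Return the (row, col) visiting order of the classic zigzag scan."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

def encode_block(block, qtable):
    """DCT-transform, uniformly quantize, and zigzag-scan one 8x8 block."""
    coeffs = dctn(block.astype(np.float64) - 128.0, norm="ortho")
    quantized = np.round(coeffs / qtable).astype(np.int32)
    return [quantized[r, c] for r, c in zigzag_indices()]

def decode_block(zz, qtable):
    """Undo the zigzag scan and the quantization, then apply the inverse DCT."""
    quantized = np.zeros((8, 8))
    for value, (r, c) in zip(zz, zigzag_indices()):
        quantized[r, c] = value
    return idctn(quantized * qtable, norm="ortho") + 128.0

if __name__ == "__main__":
    x = np.arange(8)
    block = 32 + 12 * np.add.outer(x, x)                  # a smooth test block
    qtable = 8.0 + 4.0 * np.add.outer(x, x)               # stand-in quantization table
    zz = encode_block(block, qtable)
    # for a smooth block most trailing coefficients quantize to zero,
    # which is what the run-length stage exploits
    print("zigzag tail:", zz[-20:])
    print("max reconstruction error:", np.abs(decode_block(zz, qtable) - block).max())
```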

JPEG's good middle- and high-rate compression performance and low computational and memory complexity make it an attractive method for natural image compression. Nevertheless, as we go to low bit rates that imply lower quality, the JPEG compression algorithm introduces disturbing blocking artifacts. It appears that at low bit rates a down-sampled image, when JPEG compressed and later interpolated, visually beats the high-resolution image compressed directly via JPEG using the same number of bits.
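This effect is easy to reproduce. Below is a minimal Python sketch (using Pillow and NumPy rather than the Matlab 6.1 setup used for the paper's experiments) that compares direct JPEG coding against down-sampling, JPEG coding at the lower resolution, and bicubic up-sampling. The file name, scale factor, and quality values are illustrative assumptions and should be tuned so that both alternatives use roughly the same number of bytes.

```python
import io
import numpy as np
from PIL import Image

def psnr(a, b):
    """Peak signal-to-noise ratio between two 8-bit images."""
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)

def jpeg_bytes(img, quality):
    """JPEG-encode a PIL image and return the compressed byte stream."""
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    return buf.getvalue()

def direct_jpeg(img, quality):
    data = jpeg_bytes(img, quality)
    return Image.open(io.BytesIO(data)), len(data)

def downsample_jpeg(img, scale, quality):
    w, h = img.size
    small = img.resize((int(w * scale), int(h * scale)), Image.LANCZOS)  # anti-aliased
    data = jpeg_bytes(small, quality)
    decoded = Image.open(io.BytesIO(data)).resize((w, h), Image.BICUBIC)
    return decoded, len(data)

if __name__ == "__main__":
    original = Image.open("lena.png").convert("L")   # hypothetical test image
    ref = np.asarray(original)
    direct, n1 = direct_jpeg(original, quality=10)            # low bit rate
    scaled, n2 = downsample_jpeg(original, scale=0.5, quality=35)
    print("direct :", n1, "bytes, PSNR =", psnr(ref, np.asarray(direct)))
    print("scaled :", n2, "bytes, PSNR =", psnr(ref, np.asarray(scaled)))
```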


Whereas this property is known to some in the industry (see for example [9]), it was never explicitly proposed nor treated in the scientific literature. One might argue, however, that the hierarchical JPEG algorithm implicitly uses this idea when low bit-rate compression is considered [1]. Let us first establish this interesting property through a simple experiment, testing the compression-decompression performance both by visual inspection and by quantitative mean-square-error comparisons. The experimental result displayed in Figure 2 shows that, indeed, both visually and in terms of the Mean Square Error (or PSNR), one obtains better results using down-sampling, compression, and interpolation after the decompression. Two comments are in order at this stage: (i) throughout this paper, all experiments are done using Matlab v.6.1; thus, simple IJG-JPEG is used with fixed quantization tables, and control over the compression is achieved via the Quality parameter; and (ii) throughout this paper, all experiments applying down-sampling use an anti-aliasing pre-filter, as Matlab 6.1 suggests, through its standard image resizing function.

Let us explain this behavior from an intuitive perspective. Assume that for a given image we use blocks of 8 x 8 pixels in the coding procedure. As we allocate too few bits (say 4 bits per block on average), only the DC coefficients are coded, and the resulting image after decompression consists of essentially constant-valued blocks. Such an image will clearly exhibit strong blocking artifacts. If instead the image is down-sampled by a factor of 2, the coder is now effectively working with blocks that cover 16 x 16 pixels of the original image, and it has an average budget of 16 bits to code the coefficients of each such block. Thus, some bits will be allocated to higher-order DCT coefficients as well, and the blocks will exhibit more detail. Moreover, as we up-scale the image at the decoding stage we add another improving ingredient to the process, since the interpolation further blurs the blocking effects. Thus, the down-sampling approach is expected to give better outcomes, both visually and quantitatively. In this paper we propose an analytical explanation of the above phenomenon,


Fig. 2. Original image (left), JPEG compressed-decompressed image (middle), and down-sampled, JPEG compressed-decompressed, and up-sampled image (right). In both the 'Lena' and the 'Barbara' examples the two alternatives use the same bit budget, and the down-sampled alternative attains the lower MSE.

along with a practical algorithm to automatically choose the optimal down-sampling factor for best PSNR. Following the method outlined in [4], we derive an analytical model of the compression-decompression reconstruction error as a function of the memory budget (i.e., the total number of bits), the (statistical) characteristics of the image, and the down-sampling factor. We show that a simplistic second-order statistical model provides good estimates of the down-sampling factors that achieve optimal performance.

This paper is organized as follows. Sections II, III, and IV present the analytic model and explore its


theoretical implications. In Section II we start the analysis by developing a model that describes the compression-decompression error, based on the quantization error and on the assumption that the image is a realization of a Markov random field. Section III then introduces the impact of bit allocation, so as to relate the expected error to the given bit budget. In Section IV we first establish several important parameters used by the model, and then use the obtained formulation to graphically describe the trade-offs between the total bit budget, the expected error, and the coding block size. Section V describes an experimental setup that validates the proposed model and its applicability for choosing the best down-sampling factor for a given image with a given bit budget. Finally, Section VI ends the paper with some concluding remarks.

II. ANALYSIS OF A CONTINUOUS "JPEG-STYLE" IMAGE REPRESENTATION MODEL

In this section we start building a theoretical model for analyzing the expected reconstruction error when doing compression-decompression, as a function of the total bit budget, the characteristics of the image, and the down-sampling factor. Our model considers the image over a continuous domain rather than a discrete one, in order to simplify the derivation. The steps we follow are:

1. We derive the expected compression-decompression mean-square-error for a general image representation. Slicing of the image domain into N by M blocks is assumed.
2. We use the fact that the coding is done in the transform domain using an orthonormal basis to derive the error induced by truncation alone.
3. We extend the calculation to account for the quantization error of the non-truncated coefficients.
4. We specialize the image transform to the DCT basis.
5. We introduce an approximation for the quantization error as a function of the allocated bits.
6. We explore several possible bit-allocation policies and introduce the overall bit budget as a parameter into our model.


At the end of this process we obtain an expression for the expected error as a function of the bit budget, the scaling factor, and the image characteristics. This function eventually allows us to determine the optimal down-sampling factor in JPEG-like image coding.

A. Compression-Decompression Expected Error

Assume we are given images on the unit square $[0,1] \times [0,1]$, i.e., realizations $\{f_I(x,y)\}$ of a 2D random process $f_I(x,y) : [0,1] \times [0,1] \to \mathbb{R}$, with second-order statistics given by

$$E[f_I(x,y)] = 0, \quad \text{and} \quad R(x, y;\, x+\tau_x,\, y+\tau_y) = \sigma_0^2\, e^{-\alpha_x |\tau_x|}\, e^{-\alpha_y |\tau_y|}. \qquad (1)$$

Note that here we assume that the image is stationary. This is a marked deviation from the real-life scenario, and this assumption is made mainly to simplify our analysis. Nevertheless, as we shall hereafter see, the obtained model succeeds in predicting the down-sampling effect on compression-decompression performance.

We assume that the image domain $[0,1] \times [0,1]$ is sliced into $N \cdot M$ regions of the form

$$\Omega_{ij} := \left[\frac{i-1}{N}, \frac{i}{N}\right] \times \left[\frac{j-1}{M}, \frac{j}{M}\right], \quad \text{for } i = 1, 2, \ldots, N; \; j = 1, 2, \ldots, M.$$

Assume that, due to our coding of the original image $f_I(x,y)$, we obtain the compressed-decompressed result $\hat{f}_I(x,y)$, which is an approximation of the original image. We can measure the error in approximating $f_I(x,y)$ by $\hat{f}_I(x,y)$ as follows:

$$\varepsilon_I^2 = \int_0^1\!\!\int_0^1 \big(f_I(x,y) - \hat{f}_I(x,y)\big)^2\, dx\, dy = \sum_{i=1}^{N}\sum_{j=1}^{M} \int\!\!\int_{\Omega_{ij}} \big(f_I - \hat{f}_I\big)^2\, dx\, dy = \sum_{i,j} \mathrm{Area}(\Omega_{ij}) \cdot \frac{\int\!\!\int_{\Omega_{ij}} \big(f_I - \hat{f}_I\big)^2\, dx\, dy}{\mathrm{Area}(\Omega_{ij})} = \frac{1}{N M} \sum_{i,j} \mathrm{MSE}_I(\Omega_{ij}), \qquad (2)$$


where we define

$$\mathrm{MSE}_I(\Omega_{ij}) := \frac{\int\!\!\int_{\Omega_{ij}} \big(f_I(x,y) - \hat{f}_I(x,y)\big)^2\, dx\, dy}{\mathrm{Area}(\Omega_{ij})} = N M \int\!\!\int_{\Omega_{ij}} \big(f_I(x,y) - \hat{f}_I(x,y)\big)^2\, dx\, dy. \qquad (3)$$

We shall, of course, be interested in the expected mean square error of the digitization, i.e.,

$$E\big[\varepsilon_I^2\big] = E\left[\frac{1}{N M}\sum_{i,j} \mathrm{MSE}_I(\Omega_{ij})\right] = \frac{1}{N M}\sum_{i=1}^{N}\sum_{j=1}^{M} E\big[\mathrm{MSE}_I(\Omega_{ij})\big]. \qquad (4)$$

Note that the assumed wide-sense stationarity of the image process implies that the expression $E[\mathrm{MSE}_I(\Omega_{ij})]$ is independent of $(i,j)$, i.e., we have the same expected mean square error over each slice of the image. Thus we can write

$$E\big[\varepsilon_I^2\big] = E\big[\mathrm{MSE}_I(\Omega_{11})\big] = N M \cdot E\left[\int\!\!\int_{\Omega_{11}} \big(f_I(x,y) - \hat{f}_I(x,y)\big)^2\, dx\, dy\right]. \qquad (5)$$

Up to now we considered the quality measure used to evaluate the approximation of $f_I(x,y)$ in the digitization process. We shall next consider the set of basis functions needed for representing $f_I(x,y)$ over each slice.

B. Bases for Representing $f_I(x,y)$ over Slices

In order to represent the image over each slice $\Omega_{ij}$, we have to choose an orthonormal basis of functions. Denote this basis by $\{\phi_{kl}(x,y)\}_{k,l = 0,1,2,\ldots}$. We must have

$$\int\!\!\int_{\Omega_{ij}} \phi_{kl}(x,y)\, \phi_{k'l'}(x,y)\, dx\, dy = \delta_{kk'}\,\delta_{ll'} = \begin{cases} 1 & \text{if } (k,l) = (k',l'), \\ 0 & \text{otherwise.} \end{cases}$$

If $\{\phi_{kl}\}$ is indeed an orthonormal basis then we can write

$$f_I(x,y) = \sum_{k=0}^{\infty}\sum_{l=0}^{\infty} \big\langle f_I(x,y),\, \phi_{kl}(x,y)\big\rangle\, \phi_{kl}(x,y), \qquad (6)$$


as a representation of $f_I(x,y)$ over $\Omega_{ij}$ in terms of an infinite set of coefficients

$$c_{kl} := \big\langle f_I(x,y),\, \phi_{kl}(x,y)\big\rangle = \int\!\!\int_{\Omega_{ij}} f_I(x,y)\, \phi_{kl}(x,y)\, dx\, dy. \qquad (7)$$

Suppose now that we approximate $f_I(x,y)$ over $\Omega_{ij}$ by using only a finite set $S$ of the orthonormal functions $\{\phi_{kl}(x,y)\}$, i.e., consider

$$\hat{f}_I(x,y) = \sum_{(k,l) \in S} \big\langle f_I(x,y),\, \phi_{kl}(x,y)\big\rangle\, \phi_{kl}(x,y). \qquad (8)$$

The optimal coefficients in the approximation above turn out to be the corresponding $c_{kl}$'s from the infinite representation. The mean square error of this approximation, over $\Omega_{11}$ say, will be

$$\mathrm{MSE}_I(\Omega_{11}) = N M \int\!\!\int_{\Omega_{11}} \big[f_I(x,y) - \hat{f}_I(x,y)\big]^2\, dx\, dy = N M \left[\int\!\!\int_{\Omega_{11}} f_I^2\, dx\, dy - 2\int\!\!\int_{\Omega_{11}} f_I\, \hat{f}_I\, dx\, dy + \int\!\!\int_{\Omega_{11}} \hat{f}_I^2\, dx\, dy\right]. \qquad (9)$$

Hence, using the orthonormality of the basis,

$$\mathrm{MSE}_I(\Omega_{11}) = N M \left[\int\!\!\int_{\Omega_{11}} f_I^2(x,y)\, dx\, dy - \sum_{(k,l) \in S} \big\langle f_I(x,y),\, \phi_{kl}(x,y)\big\rangle^2\right]. \qquad (10)$$

Now the expected $\mathrm{MSE}_I(\Omega_{11})$ will be

$$E\big[\mathrm{MSE}_I(\Omega_{11})\big] = N M \left[\int\!\!\int_{\Omega_{11}} E\big[f_I^2(x,y)\big]\, dx\, dy - \sum_{(k,l) \in S} E\Big[\big\langle f_I(x,y),\, \phi_{kl}(x,y)\big\rangle^2\Big]\right]. \qquad (11)$$

Hence,

$$E\big[\varepsilon_I^2\big] = N M\left[\frac{\sigma_0^2}{N M} - \sum_{(k,l) \in S} E\big[c_{kl}^2\big]\right] = \sigma_0^2 - N M \sum_{(k,l) \in S} E\big[c_{kl}^2\big]. \qquad (12)$$


C. The Effect of Quantizing the Expansion Coefficients $c_{kl}$

Suppose that in the approximation

$$\hat{f}_I(x,y) = \sum_{(k,l) \in S} c_{kl}\, \phi_{kl}(x,y) \qquad (13)$$

we can only use a finite number of bits in representing the coefficients $\{c_{kl}\}$, which take values in $\mathbb{R}$. If $c_{kl}$ is represented/encoded with $b_{kl}$ bits, we shall be able to describe it only via $c_{kl}^Q$, which takes on $2^{b_{kl}}$ values, i.e., $c_{kl}^Q = Q[c_{kl}] \in$ a set of $2^{b_{kl}}$ representation levels. The error in representing $c_{kl}$ in this way is $e_{kl}^2 = (c_{kl} - c_{kl}^Q)^2$. Let us now see how the quantization errors affect $\mathrm{MSE}_I(\Omega_{11})$. We have

$$\mathrm{MSE}_I^Q(\Omega_{11}) = N M \int\!\!\int_{\Omega_{11}} \big[f_I(x,y) - \hat{f}_I^Q(x,y)\big]^2\, dx\, dy, \qquad (14)$$

where $\hat{f}_I^Q(x,y) = \sum_{(k,l) \in S} c_{kl}^Q\, \phi_{kl}(x,y)$. Now

$$\mathrm{MSE}_I^Q(\Omega_{11}) = N M \int\!\!\int_{\Omega_{11}} \big[f_I - \hat{f}_I + \hat{f}_I - \hat{f}_I^Q\big]^2\, dx\, dy = \mathrm{MSE}_I(\Omega_{11}) + N M \sum_{(k,l) \in S} \big(c_{kl} - c_{kl}^Q\big)^2,$$

where the cross term vanishes: $\hat{f}_I - \hat{f}_I^Q = \sum_{(k,l) \in S} (c_{kl} - c_{kl}^Q)\, \phi_{kl}$ lies in the span of the retained basis functions, while $f_I - \hat{f}_I$ is orthogonal to every $\phi_{kl}$ with $(k,l) \in S$.

The expected $\mathrm{MSE}_I^Q(\Omega_{11})$ is therefore given by

$$E\big[\mathrm{MSE}_I^Q(\Omega_{11})\big] = E\big[\mathrm{MSE}_I(\Omega_{11})\big] + N M \sum_{(k,l) \in S} E\big[e_{kl}^2\big] = \sigma_0^2 - N M \sum_{(k,l) \in S} E\big[c_{kl}^2\big] + N M \sum_{(k,l) \in S} E\Big[\big(c_{kl} - c_{kl}^Q\big)^2\Big]. \qquad (15)$$

Hence, in order to evaluate $E[(\varepsilon_I^Q)^2]$ for a particular representation, in which the image is sliced into $N M$ pieces, over each piece we use a subset $S$ of the possible basis functions (i.e., $S \subset \{(k,l) \mid k, l = 0, 1, \ldots\}$), and we quantize the coefficients with $b_{kl}$ bits, we have to evaluate

$$\sigma_0^2 \;-\; N M \sum_{(k,l) \in S} \big\{\text{variance of } c_{kl}\big\} \;+\; N M \sum_{(k,l) \in S} \big\{\text{expected error in quantizing } c_{kl} \text{ with } b_{kl} \text{ bits}\big\}.$$
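The last term is later approximated as $E[e_{kl}^2] \approx \gamma\, 2^{-2 b_{kl}}\, E[c_{kl}^2]$, with $\gamma$ a small constant (Section IV takes it between 1 and 3). This exponential decay is easy to confirm numerically; the sketch below uses a plain uniform quantizer on a Gaussian surrogate coefficient, which is our own choice for illustration and not the quantizer assumed by the paper.

```python
import numpy as np

def uniform_quantize(x, bits, lo=-3.0, hi=3.0):
    """Uniform scalar quantizer with 2**bits levels on [lo, hi] (bin centers)."""
    levels = 2 ** bits
    step = (hi - lo) / levels
    idx = np.clip(np.floor((x - lo) / step), 0, levels - 1)
    return lo + (idx + 0.5) * step

rng = np.random.default_rng(0)
c = rng.normal(0.0, 1.0, size=200_000)            # surrogate transform coefficient
for bits in range(1, 9):
    e2 = np.mean((c - uniform_quantize(c, bits)) ** 2)
    gamma = e2 * 2.0 ** (2 * bits) / np.var(c)    # implied gamma in e2 = gamma*2^(-2b)*var
    print(bits, round(e2, 6), round(gamma, 3))
```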

D. An Important Particular Case: Markov Process with Separable Cosine Bases

We now return to the assumption that the statistics of $\{f_I(x,y)\}$ are given by Equation (1), namely,

$$E[f_I(x,y)] = 0 \quad \text{and} \quad E\big[f_I(x,y)\, f_I(x+\tau_x, y+\tau_y)\big] = \sigma_0^2\, e^{-\alpha_x |\tau_x|}\, e^{-\alpha_y |\tau_y|},$$

and we choose a separable cosine basis for the slices, i.e., over $\Omega_{11} = [0, 1/N] \times [0, 1/M]$ we set $\phi_{kl}(x,y) = \phi_k(x)\, \phi_l(y)$, where

$$\phi_k(x) = \sqrt{N (2 - \delta_k)}\, \cos(k \pi N x), \quad k = 0, 1, 2, \ldots, \qquad \phi_l(y) = \sqrt{M (2 - \delta_l)}\, \cos(l \pi M y), \quad l = 0, 1, 2, \ldots,$$

with $\delta_k = 1$ for $k = 0$ and $\delta_k = 0$ otherwise. This choice of the DCT basis is motivated by our desire to model the JPEG behavior. As is well known [6], the DCT offers a good approximation of the KLT when the image is modelled as a 2D random Markov field with a very high correlation factor.


To compute $E[\varepsilon_I^2]$ for this case we need to evaluate the variances of the coefficients $c_{kl}$, defined as

$$c_{kl} := \int\!\!\int_{\Omega_{11}} f_I(x,y)\, \phi_k(x)\, \phi_l(y)\, dx\, dy. \qquad (16)$$

We have

$$E\big[c_{kl}^2\big] = E\left[\int\!\!\int_{\Omega_{11}}\!\int\!\!\int_{\Omega_{11}} f_I(x,y)\, f_I(\xi,\eta)\, \phi_k(x)\,\phi_l(y)\,\phi_k(\xi)\,\phi_l(\eta)\, dx\, dy\, d\xi\, d\eta\right] = \int\!\!\int_{\Omega_{11}}\!\int\!\!\int_{\Omega_{11}} \sigma_0^2\, e^{-\alpha_x |x-\xi|}\, e^{-\alpha_y |y-\eta|}\; N(2-\delta_k)\cos(k\pi N x)\cos(k\pi N \xi)\; M(2-\delta_l)\cos(l\pi M y)\cos(l\pi M \eta)\, dx\, dy\, d\xi\, d\eta. \qquad (17)$$

Therefore, by separating the integrations we obtain

$$E\big[c_{kl}^2\big] = \sigma_0^2\,(2-\delta_k)(2-\delta_l) \left[N\!\int_0^{1/N}\!\!\int_0^{1/N} e^{-\alpha_x |x-\xi|}\cos(k\pi N x)\cos(k\pi N \xi)\, dx\, d\xi\right] \left[M\!\int_0^{1/M}\!\!\int_0^{1/M} e^{-\alpha_y |y-\eta|}\cos(l\pi M y)\cos(l\pi M \eta)\, dy\, d\eta\right]. \qquad (18)$$

Changing variables of integration to

$$\tilde{x} = N x,\; \tilde{\xi} = N \xi \in [0,1], \qquad \tilde{y} = M y,\; \tilde{\eta} = M \eta \in [0,1],$$

yields

$$E\big[c_{kl}^2\big] = \frac{\sigma_0^2}{N M}\,(2-\delta_k)(2-\delta_l)\; I\!\left(\frac{\alpha_x}{N}, k\right) I\!\left(\frac{\alpha_y}{M}, l\right), \quad \text{where} \quad I(\alpha, k) := \int_0^1\!\!\int_0^1 e^{-\alpha |u - v|}\cos(k\pi u)\cos(k\pi v)\, du\, dv.$$
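The double integral $I(\alpha, k)$ has a closed form, but it is also easy to evaluate numerically. The short sketch below (an illustration, not the authors' code; the parameter values in the example run are assumptions) tabulates the normalized variances $(2-\delta_k)(2-\delta_l)\, I(\alpha_x/N, k)\, I(\alpha_y/M, l)$ that drive the model.

```python
import numpy as np

def I(alpha, k, n=400):
    """Midpoint-rule approximation of the double integral of
    exp(-alpha*|u-v|) * cos(k*pi*u) * cos(k*pi*v) over [0,1]^2."""
    u = (np.arange(n) + 0.5) / n
    U, V = np.meshgrid(u, u, indexing="ij")
    kern = np.exp(-alpha * np.abs(U - V)) * np.cos(k * np.pi * U) * np.cos(k * np.pi * V)
    return kern.sum() / n ** 2

def coeff_variances(alpha_x, alpha_y, N, M, kmax=8):
    """Normalized variances N*M*E[c_kl^2]/sigma0^2 for k, l = 0 .. kmax-1."""
    ix = np.array([I(alpha_x / N, k) for k in range(kmax)])
    iy = np.array([I(alpha_y / M, l) for l in range(kmax)])
    wx = np.where(np.arange(kmax) == 0, 1.0, 2.0)   # (2 - delta_k)
    wy = np.where(np.arange(kmax) == 0, 1.0, 2.0)
    return np.outer(wx * ix, wy * iy)

if __name__ == "__main__":
    V = coeff_variances(alpha_x=10.0, alpha_y=10.0, N=64, M=64)
    print(np.round(V[:4, :4], 4))   # energy concentrates in the low frequencies
```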

[Table of relative weights for the DCT coefficients, emphasizing the low frequencies; the individual entries are not legible in this copy.] Values of $b_{kl}$ were chosen to be inversely proportional to these values.

C. Soft Bit Allocation with Cost Functions for Error and Bit Usage

We could also consider cost functions of the form

$$K_{\mathrm{MSE}}\!\left(E\big[(\varepsilon_I^Q)^2\big]\right) + K_{\mathrm{bits}}\!\left(N M \cdot \text{Bits/slice}\right),$$

where $K_{\mathrm{MSE}}$ and $K_{\mathrm{bits}}$ are cost functions chosen according to the task at hand, and ask for the bit allocation that minimizes the joint functional, in the spirit of [5].

IV. THE THEORETICAL PREDICTIONS OF THE MODEL

In the previous sections we proposed a model for the compression error as a function of the image statistics ($\sigma_0^2$, $\alpha_x$, $\alpha_y$), the given total bit budget $B_{\mathrm{total}}$, and the number of slicings $N$ and $M$. Here, we fix these parameters according to the behavior of natural images and typical compression setups, and study the behavior of the theoretical model.

Assume we have a gray-scale image of size $512 \times 512$ with 8 bits/pixel as our original image. JPEG considers $8 \times 8$ slices of this image and produces, by digitizing the DCT transform coefficients with a predetermined quantization table, an approximate representation of these $8 \times 8$ slices. We would like to explain the observation that down-sampling the original image, prior to applying JPEG compression to the smaller image, produces, with the same bit usage, a better representation of the original image.

Suppose the original image is regarded as the 'continuous' image defined over the unit square $[0,1] \times [0,1]$, as we have done in the theoretical analysis. Then, the pixel width of a $512 \times 512$ image will be $1/512$.

We shall assume that the original image is a realization of a zero-mean 2D stationary random process with autocorrelation of the form

$$R\big(|i - i'|, |j - j'|\big) = \sigma_0^2\, \rho_x^{|i - i'|}\, \rho_y^{|j - j'|},$$

with $\rho_x$ and $\rho_y$ close to one, as is usually done (see [6]). From a single image, $\sigma_0^2$ can be estimated via the expression

$$\hat{\sigma}_0^2 = \frac{1}{512^2}\sum_{i=1}^{512}\sum_{j=1}^{512}\big(f(i,j) - 127.5\big)^2 \approx 5461,$$

assuming an equalized histogram. If we consider that $\rho_x^{|i - i'|}$ corresponds, in the continuous domain with pixel width $1/512$, to $e^{-\alpha_x |x - x'|}$, i.e., $\rho_x = e^{-\alpha_x / 512}$, we can obtain an estimate for $\alpha_x$. This provides $\alpha_x = -512\,\ln \rho_x$, so the typical $\rho$ values considered confine $\alpha_x$ (and likewise $\alpha_y$) to a fairly narrow range of moderate values.

The total number of bits for the image representation will range from about $0.05$ bpp upward; hence, $B_{\mathrm{total}}$ starts at roughly $512 \times 512 \times 0.05 \approx 13{,}000$ bits for a $512 \times 512$ original image. Therefore, in the theoretical evaluations we shall take $\alpha_x$ and $\alpha_y$ in the narrow range dictated above, $\sigma_0^2 \approx 5000$ for 256-gray-level images, and a total bit usage between 10,000 and 200,000 bits. The symmetric $x$- and $y$-axis slicings considered will be $N, M = 1, 2, \ldots$, where we assume that $\alpha_x = \alpha_y$ and $N = M$. Then we shall evaluate (see Equation (26))

$$E\big[(\varepsilon_I^Q)^2\big] = \sigma_0^2\left[1 - \sum_{(k,l) \in S} (2-\delta_k)(2-\delta_l)\; I\!\left(\frac{\alpha_x}{N}, k\right) I\!\left(\frac{\alpha_y}{M}, l\right)\left(1 - \gamma\, 2^{-2 b_{kl}}\right)\right],$$

with the $b_{kl}$'s provided by the optimal level allocation

$$2^{b_{kl}} \propto \sqrt{(2-\delta_k)(2-\delta_l)\; I\!\left(\frac{\alpha_x}{N}, k\right) I\!\left(\frac{\alpha_y}{M}, l\right)},$$

normalized to the available budget. Practically, the optimal level allocation should be computed as the maximum of 1 and the expression above, a measure that automatically prevents the allocation of negative numbers of bits. Obviously, this step must be followed by a re-normalization of the bit allocation in order to comply with the bit-budget constraint. The parameter $\gamma$ will be taken from 1 to 3, whereas $S$ will be $\{(k,l) \mid k + l \le 7;\; k, l = 0, 1, \ldots, 7\}$, simulating the standard JPEG approach of relatively precise encoding of only about $|S| = 36$ transform coefficients, emphasizing the low-frequency range.

&

with bit usage as a parameter.

Figures 3 and 4 demonstrate the approximated error as a function of the number of slicings for various total number of bits. Figure 3 displays the predictions of the theoretical model in conjunction with optimal level allocation while Figure 4 uses the JPEG style rigid relative bit allocation. In both figures

118

the left side shows the results of restricting the number of bits or quantization levels to integers, while the right side shows the results allowing fractional bit and level allocation. These figures show that for every given total number of bits there is an optimal slicing parameter

&

indicating the optimal down–sampling factor. For example, if we focus on the bottom right graph (rigid bit allocation with possible fractions), if 50Kbits are used, the optimal & that an image of size ?+ 

 +? 

W  W c ed

S

   

W  5 W  c

  

should be sliced to blocks of size c 5

move to a budget of only 10Kbits, optimal & c ed

is found to be ù . This implies

is found to be



. As we

and the block size to work with becomes

. Since JPEG works with fixed block size of

¾æ

, the first case of 50Kbits calls

for a down-sampling by a factor of  , and the second case with 10Kbits requires a down-scaling factor of ù Y ? . Note that integer bit allocation causes in both cases non-smooth behavior. Also in Figure 3 it appears that the minimum points are local ones and the error tends to decrease again as

&

increases. This

phenomenon can be explained by the fact that we used an approximation of the quantization error which fails to predict the true error for a small number of bits at large down-scaling factors. Finally, we should note that the parameter

÷

was chosen differently between the two allocation policies. Within the range

(ø R*iù!, , we empirically set a value that matched the true JPEG behavior. The overall qualitative behavior however was quite similar for all the range of ÷ ’s. Figure 5 shows the theoretical prediction of PSNR versus bits per pixel curves for typical ?+  images with different down–sampling factors (different values of is

  &fPQ?+ 

&

 +? 

, where the down–sampling factor

). One may observe that the curve intersections occur at similar locations as those of the

experiments with real images shown in the next section. Also, it appears that even though the allocation policies are different, the results are very similar. An interesting phenomenon is observed in these graphs: For down-sampling factors smaller than



an

1

Fig. 3. Theoretical prediction of the normalized MSE, based on optimal bit allocation, versus the number of slicings $M = N$, with the total bit usage (10 Kbits to 190 Kbits) as a parameter, for the typical parameter values described above.

Fig. 4. Theoretical prediction of the normalized MSE, based on the rigid relative (JPEG-style) bit allocation, versus the number of slicings $M = N$, with the total bit usage as a parameter, for the same typical parameter values.

120

allocating too many bits to a specific resolution layer, instead of also allocating bits to the next resolution layer. Theoretical Prediction 512x512

22

Theoretical Prediction 512x512

22 Scale=1

21

Scale=1

21

Scale=0.875

20

Scale=0.75

20

Scale=0.75

19

Scale=0.625

19

Scale=0.625

Scale=0.5

18

PSNR

PSNR

Scale=0.875

17

Scale=0.375

16

Scale=0.5

18

17

Scale=0.375

16 Scale=0.25

Scale=0.25

15

15

14

14

13

0

0.5

1

bpp

1.5

2

13

2.5

Theoretical JPEG Prediction 512x512

21

0

0.5

22

Scale=1

1

bpp

1.5

2

2.5

Theoretical JPEG Prediction 512x512

Scale=0.875 Scale=1

20

Scale=0.75

21

19

Scale=0.625

20

Scale=0.75

19

Scale=0.625

Scale=0.875

PSNR

17

Scale=0.375

16

PSNR

Scale=0.5

18

Scale=0.5

18

17

Scale=0.375

Scale=0.25 15

16

14

15

13

14

Scale=0.25

12

0

0.5

1

bpp

1.5

2

2.5

13

0

0.5

1

bpp

1.5

2

2.5

Fig. 5. Optimal (top) and rigid relative (bottom) bit allocation based prediction of PSNR versus bits per pixel with image down–sampling as a parameter. Again, we used the typical values hni allocation case, and kli



   , and koi

for the optimal bit

for the JPEG-style case.

V. COMPRESSION RESULTS OF NATURAL AND SYNTHETIC IMAGES

To verify the validity of the analytic model and to design a system for image trans-coding, we can generate synthetic images whose autocorrelation is similar to that of a given image. Then we can plot the PSNR/bpp JPEG graphs for all JPEG qualities, one graph for each given down-sampling ratio. The


statistical model is considered valid if the behavior is similar for the natural image and the synthesized one.

A. Image Synthesis

Assume that for an image
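A synthetic image with the separable exponential autocorrelation of Equation (1) can be generated, for example, by AR(1) filtering of white noise along each axis. The sketch below illustrates that idea; the parameter values are assumptions taken from the typical ranges above, and this is not necessarily the exact synthesis procedure used by the authors.

```python
import numpy as np
from scipy.signal import lfilter

def synth_field(shape, rho_x, rho_y, sigma, seed=0):
    """Zero-mean Gaussian field with (approximately) separable autocorrelation
    sigma^2 * rho_x**|di| * rho_y**|dj|, via AR(1) filtering along each axis."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(shape)
    f = lfilter([np.sqrt(1 - rho_x ** 2)], [1.0, -rho_x], w, axis=0)
    f = lfilter([np.sqrt(1 - rho_y ** 2)], [1.0, -rho_y], f, axis=1)
    return sigma * f

if __name__ == "__main__":
    img = synth_field((512, 512), rho_x=0.98, rho_y=0.98, sigma=np.sqrt(5000.0))
    pix = np.clip(img + 127.5, 0, 255).astype(np.uint8)   # shift to an 8-bit range
    # empirical lag-1 correlation should be close to rho_x
    print(np.corrcoef(img[:-1, :].ravel(), img[1:, :].ravel())[0, 1])
```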
