Submitted to the IEEE Transactions on Image Processing in February 2001.
Down–Sampling for Better Transform Compression

Alfred M. Bruckstein (1), Michael Elad (2), and Ron Kimmel (3)
Abstract: The most popular lossy image compression method used on the Internet is the JPEG standard. JPEG's good compression performance and low computational and memory complexity make it an attractive method for natural image compression. Nevertheless, as we go to low bit rates that imply lower quality, JPEG introduces disturbing artifacts. It is known that, at low bit rates, a down-sampled image that is JPEG compressed visually beats the high-resolution image compressed via JPEG to the same number of bits. Motivated by this observation, we show how down-sampling an image to a low resolution, applying JPEG at the lower resolution, and subsequently interpolating the result back to the original resolution can improve the overall PSNR performance of the compression process. We give an analytical model and a numerical analysis of the down-sampling, compression, and up-sampling process that make explicit the possible quality/compression trade-offs. We show that the image autocorrelation can provide a good estimate for establishing the down-sampling factor that achieves optimal performance. Given a specific budget of bits, we determine the down-sampling factor necessary to obtain the best possible recovered image in terms of PSNR.
keywords: JPEG compression, image down–sampling, bit allocation, quantization.
1. Computer Science Department, Technion, Haifa 32000, Israel, email: [email protected].
2. Computer Science Department - SCCM Program, Stanford University, Stanford, CA 94305, USA, email: [email protected].
3. Computer Science Department, Technion, Haifa 32000, Israel, email: [email protected].
I. INTRODUCTION

The most popular lossy image compression method used on the Internet is the JPEG standard [1]. Figure 1 presents a basic block diagram of the JPEG encoder. JPEG uses the Discrete Cosine Transform (DCT) on image blocks of size $8 \times 8$ pixels. The fact that JPEG operates on small blocks is motivated by both computational/memory considerations and the need to account for the non-stationarity of the image. A quality measure determines the (uniform) quantization steps for each of the 64 DCT coefficients. The 64 quantized coefficients of each block are then zigzag-scanned into one vector that goes through run-length coding of the zero sequences, thereby clustering long runs of insignificant low-energy coefficients into short and compact descriptors. Finally, the run-length sequence is fed to an entropy coder, which can be a Huffman coding algorithm with either a known dictionary or a dictionary extracted from the specific statistics of the given image. A different alternative supported by the standard is arithmetic coding.
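For concreteness, the following Python sketch mimics the per-block encoder stages just described ($8 \times 8$ DCT, uniform quantization against a table, zigzag scan). It is only an illustration of the structure: it is not the IJG-JPEG implementation used later in the paper, the quality scaling is simplified, and the demo block is synthetic.

```python
import numpy as np

n = 8
k = np.arange(n).reshape(-1, 1)
x = np.arange(n).reshape(1, -1)
DCT = np.sqrt(2.0 / n) * np.cos((2 * x + 1) * k * np.pi / (2 * n))
DCT[0, :] /= np.sqrt(2.0)                       # orthonormal 8-point DCT-II matrix

# Standard JPEG luminance quantization table; a "quality" knob is modeled here
# simply as a global scale of this table.
Q = np.array([[16,11,10,16,24,40,51,61],[12,12,14,19,26,58,60,55],
              [14,13,16,24,40,57,69,56],[14,17,22,29,51,87,80,62],
              [18,22,37,56,68,109,103,77],[24,35,55,64,81,104,113,92],
              [49,64,78,87,103,121,120,101],[72,92,95,98,112,100,103,99]], float)

# Zigzag order: scan anti-diagonals, alternating direction.
ZIGZAG = sorted(((i, j) for i in range(n) for j in range(n)),
                key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else -p[0]))

def encode_block(block, q_scale=1.0):
    """One 8x8 block -> zigzag-ordered list of quantized DCT coefficients."""
    coeff = DCT @ (block - 128.0) @ DCT.T       # level shift and 2D DCT
    quant = np.round(coeff / (Q * q_scale))     # uniform quantization
    return [int(quant[i, j]) for (i, j) in ZIGZAG]

demo = np.clip(np.random.default_rng(0).normal(128, 40, (n, n)), 0, 255)
print(encode_block(demo, q_scale=2.0))          # most high-frequency entries quantize to zero
```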
Fig. 1. JPEG block diagram: slice into $8 \times 8$ blocks, Discrete Cosine Transform, uniform quantization, zigzag scan, and entropy coding, producing the output bit stream ('01001010 ...').
JPEG's good compression performance at middle and high rates and its low computational and memory complexity make it an attractive method for natural image compression. Nevertheless, as we go to low bit rates that imply lower quality, the JPEG compression algorithm introduces disturbing blocking artifacts. It appears that, at low bit rates, a down-sampled image that is JPEG compressed and later interpolated visually beats the high-resolution image compressed directly via JPEG using the same number of bits.
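The comparison behind this claim (described in detail below, where Matlab 6.1 and IJG-JPEG are used) can be reproduced approximately with the following Python/Pillow sketch; the file name and the quality settings are illustrative placeholders, not values from the paper.

```python
import io
import numpy as np
from PIL import Image

def jpeg_roundtrip(img, quality):
    """JPEG-compress a PIL image and return (decoded float array, compressed size in bits)."""
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    bits = 8 * buf.getbuffer().nbytes
    buf.seek(0)
    return np.asarray(Image.open(buf), dtype=np.float64), bits

def psnr(a, b):
    return 10 * np.log10(255.0 ** 2 / np.mean((a - b) ** 2))

original = Image.open("lena.png").convert("L")      # hypothetical 512x512 input
ref = np.asarray(original, dtype=np.float64)
w, h = original.size

# Route 1: direct JPEG at a very low quality setting.
direct, direct_bits = jpeg_roundtrip(original, quality=10)

# Route 2: down-sample by 2 (anti-aliasing is built into the resampler),
# JPEG at a higher quality that fits a similar budget, then up-sample back.
small = original.resize((w // 2, h // 2), Image.LANCZOS)
dec_small, down_bits = jpeg_roundtrip(small, quality=50)
up = np.asarray(Image.fromarray(dec_small.astype(np.uint8)).resize((w, h), Image.LANCZOS),
                dtype=np.float64)

print("direct : %6d bits  PSNR %.2f dB" % (direct_bits, psnr(ref, direct)))
print("down/up: %6d bits  PSNR %.2f dB" % (down_bits, psnr(ref, up)))
```

In practice one would search over the quality parameter of the second route until its bit count matches that of the first, so that the PSNR comparison is made at equal rate.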
Whereas this property is known to some in the industry (see for example [9]), it was never explicitly proposed nor treated in the scientific literature. One might argue, however, that the hierarchical JPEG algorithm implicitly uses this idea when low bit-rate compression is considered [1].

Let us first establish this interesting property through a simple experiment, testing the compression-decompression performance both by visual inspection and by quantitative mean-square-error comparisons. The experimental result displayed in Figure 2 shows that indeed, both visually and in terms of the Mean Square Error (or PSNR), one obtains better results using down-sampling, compression, and interpolation after the decompression. Two comments are in order at this stage: (i) throughout this paper, all experiments are done using Matlab v.6.1; thus, simple IJG-JPEG is used with fixed quantization tables, and control over the compression is achieved via the Quality parameter; and (ii) throughout this paper, all experiments applying down-sampling use an anti-aliasing pre-filter, as Matlab 6.1 suggests, through its standard image resizing function.

Let us explain this behavior from an intuitive perspective. Assume that for a given image we use blocks of $8 \times 8$ pixels in the coding procedure. If we allocate too few bits (say 4 bits per block on average), only the DC coefficients are coded and the resulting image after decompression consists of essentially constant-valued blocks. Such an image will clearly exhibit strong blocking artifacts. If instead the image is down-sampled by a factor of 2, each $8 \times 8$ block of the coder now effectively covers $16 \times 16$ pixels of the original image, and the same total budget gives an average of 16 bits per block to code the coefficients. Thus, some bits will be allocated to higher-order DCT coefficients as well, and the blocks will exhibit more detail. Moreover, as we up-scale the image at the decoding stage we add another improving ingredient to the process, since interpolation further blurs the blocking effects. Thus, the down-sampling approach is expected to yield results that are better both visually and quantitatively, as the simple accounting below illustrates.
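A few lines of arithmetic make the budget argument concrete (the 4-bits-per-block figure is the one used in the text; the rest follows from it):

```python
# Same total budget, spread over fewer (larger-area) blocks after down-sampling.
pixels = 512 * 512
total_bits = (pixels // 64) * 4            # 4 bits per 8x8 block on average
for factor in (1, 2):
    blocks = (pixels // factor**2) // 64   # number of 8x8 blocks after down-sampling
    print("factor %d: %5d blocks, %4.1f bits per block" % (factor, blocks, total_bits / blocks))
# factor 1:  4096 blocks,  4.0 bits per block
# factor 2:  1024 blocks, 16.0 bits per block
```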
Fig. 2. Original image (left), JPEG compressed-decompressed image (middle), and down-sampled, JPEG compressed-decompressed, and up-sampled image (right). For each of the 'Lena' and 'Barbara' test images, the two compressed alternatives use the same number of bits per pixel.
In this paper we propose an analytical explanation of the above phenomenon, along with a practical algorithm to automatically choose the optimal down-sampling factor for best PSNR. Following the method outlined in [4], we derive an analytical model of the compression-decompression reconstruction error as a function of the memory budget (i.e., the total number of bits), the (statistical) characteristics of the image, and the down-sampling factor. We show that a simplistic second-order statistical model provides good estimates of the down-sampling factor that achieves optimal performance.

This paper is organized as follows. Sections II, III, and IV present the analytic model and explore its
theoretical implications. In Section II we start the analysis by developing a model that describes the compression-decompression error based on the quantization error and the assumption that the image is a realization of a Markov random field. Section III then introduces the impact of bit allocation so as to relate the expected error to the given bit budget. In Section IV we first establish several important parameters used by the model, and then use the obtained formulation in order to graphically describe the trade-offs between the total bit budget, the expected error, and the coding block size. Section V describes an experimental setup that validates the proposed model and its applicability for choosing the best down-sampling factor for a given image with a given bit budget. Finally, Section VI ends the paper with some concluding remarks.

II. ANALYSIS OF A CONTINUOUS "JPEG-STYLE" IMAGE REPRESENTATION MODEL
In this section we start building a theoretical model for analyzing the expected reconstruction error of compression-decompression as a function of the total bit budget, the characteristics of the image, and the down-sampling factor. Our model considers the image over a continuous domain rather than a discrete one, in order to simplify the derivation. The steps we follow are:
1. We derive the expected compression-decompression mean square error for a general image representation. Slicing of the image domain into $N \times M$ blocks is assumed.
2. We use the fact that the coding is done in the transform domain using an orthonormal basis to derive the error induced by truncation only.
3. We extend the calculation to account for the quantization error of the non-truncated coefficients.
4. We specialize the image transform to the DCT basis.
5. We introduce an approximation for the quantization error as a function of the allocated bits.
6. We explore several possible bit-allocation policies and introduce the overall bit budget as a parameter into our model.
At the end of this process we obtain an expression for the expected error as a function of the bit budget, the scaling factor, and the image characteristics. This function eventually allows us to determine the optimal down-sampling factor in JPEG-like image coding.

A. Compression-Decompression Expected Error

Assume we are given images on the unit square $[0,1] \times [0,1]$, realizations $f_c(x,y)$ of a 2D random process $\{f_c(x,y) : [0,1] \times [0,1] \rightarrow \mathbb{R}\}$, with second-order statistics given by
$$E[f_c(x,y)] = 0, \qquad E[f_c(x,y)\, f_c(x+\tau_x, y+\tau_y)] = R(\tau_x,\tau_y) = \sigma^2 e^{-\lambda_x|\tau_x|} e^{-\lambda_y|\tau_y|}. \qquad (1)$$
Note that here we assume that the image is stationary. This is a marked deviation from the real-life scenario, and the assumption is made mainly to simplify our analysis. Nevertheless, as we shall see hereafter, the obtained model succeeds in predicting the effect of down-sampling on compression-decompression performance.
We assume that the image domain $[0,1] \times [0,1]$ is sliced into $N M$ regions of the form
$$\Omega_{ij} = \left[\frac{i-1}{N}, \frac{i}{N}\right] \times \left[\frac{j-1}{M}, \frac{j}{M}\right], \qquad i = 1,2,\ldots,N, \quad j = 1,2,\ldots,M.$$
Assume that due to our coding of the original image $f_c(x,y)$ we obtain the compressed-decompressed result $\hat{f}_c(x,y)$, which is an approximation of the original image. We can measure the error in approximating $f_c(x,y)$ by $\hat{f}_c(x,y)$ as follows:
$$\varepsilon_c^2 = \int_0^1\!\!\int_0^1 \left(f_c(x,y) - \hat{f}_c(x,y)\right)^2 dx\, dy = \sum_{i=1}^{N}\sum_{j=1}^{M} \int\!\!\int_{\Omega_{ij}} \left(f_c(x,y) - \hat{f}_c(x,y)\right)^2 dx\, dy$$
$$= \sum_{i=1}^{N}\sum_{j=1}^{M} \text{Area}(\Omega_{ij}) \cdot \frac{1}{\text{Area}(\Omega_{ij})} \int\!\!\int_{\Omega_{ij}} \left(f_c(x,y) - \hat{f}_c(x,y)\right)^2 dx\, dy = \frac{1}{NM} \sum_{i=1}^{N}\sum_{j=1}^{M} MSE(\Omega_{ij}), \qquad (2)$$
where we define
$$MSE(\Omega_{ij}) := \frac{1}{\text{Area}(\Omega_{ij})} \int\!\!\int_{\Omega_{ij}} \left(f_c(x,y) - \hat{f}_c(x,y)\right)^2 dx\, dy = NM \int\!\!\int_{\Omega_{ij}} \left(f_c(x,y) - \hat{f}_c(x,y)\right)^2 dx\, dy. \qquad (3)$$
We shall, of course, be interested in the expected mean square error of the digitization, i.e.,
$$E\left[\varepsilon_c^2\right] = \frac{1}{NM} \sum_{i=1}^{N}\sum_{j=1}^{M} E\left[MSE(\Omega_{ij})\right] = \frac{1}{NM} \sum_{i=1}^{N}\sum_{j=1}^{M} E\left[ NM \int\!\!\int_{\Omega_{ij}} \left(f_c(x,y) - \hat{f}_c(x,y)\right)^2 dx\, dy \right]. \qquad (4)$$
Note that the assumed wide-sense stationarity of the image process results in the fact that the expression $E[MSE(\Omega_{ij})]$ is independent of $(i,j)$, i.e., we have the same expected mean square error over each slice of the image. Thus we can write
$$E\left[\varepsilon_c^2\right] = \frac{1}{NM} \cdot NM \cdot E\left[MSE(\Omega_{11})\right] = E\left[MSE(\Omega_{11})\right] = E\left[ NM \int\!\!\int_{\Omega_{11}} \left(f_c(x,y) - \hat{f}_c(x,y)\right)^2 dx\, dy \right]. \qquad (5)$$
Up to now we considered the quality measure used to evaluate the approximation of $f_c(x,y)$ in the digitization process. We shall next consider the set of basis functions needed for representing $f_c(x,y)$ over each slice.

B. Bases for representing $f_c(x,y)$ over slices
In order to represent the image over each slice $\Omega_{ij}$, we have to choose an orthonormal basis of functions. Denote this basis by $\{\phi_{kl}(x,y)\}_{k,l=0}^{\infty}$. We must have
$$\int\!\!\int_{\Omega_{ij}} \phi_{kl}(x,y)\, \phi_{k'l'}(x,y)\, dx\, dy = \begin{cases} 1 & \text{if } (k,l) = (k',l') \\ 0 & \text{otherwise.} \end{cases} \qquad (6)$$
If $\{\phi_{kl}\}$ is indeed an orthonormal basis then we can write
$$f_c(x,y) = \sum_{k=0}^{\infty}\sum_{l=0}^{\infty} \langle f_c(x,y), \phi_{kl}(x,y)\rangle\, \phi_{kl}(x,y),$$
as a representation of $f_c(x,y)$ over $\Omega_{ij}$ in terms of an infinite set of coefficients
$$c_{kl} := \langle f_c(x,y), \phi_{kl}(x,y)\rangle = \int\!\!\int_{\Omega_{ij}} f_c(x,y)\, \phi_{kl}(x,y)\, dx\, dy. \qquad (7)$$
Suppose now that we approximate $f_c(x,y)$ over $\Omega_{ij}$ by using only a finite set $S$ of the orthonormal functions $\{\phi_{kl}(x,y)\}$, i.e., consider
$$\hat{f}_c(x,y) = \sum_{(k,l)\in S} \langle f_c(x,y), \phi_{kl}(x,y)\rangle\, \phi_{kl}(x,y). \qquad (8)$$
The optimal coefficients in the approximation above turn out to be the corresponding $c_{kl}$'s from the infinite representation. The mean square error of this approximation, over $\Omega_{11}$ say, will be
$$MSE(\Omega_{11}) = NM \int\!\!\int_{\Omega_{11}} \left[f_c(x,y) - \hat{f}_c(x,y)\right]^2 dx\, dy = NM \left[ \int\!\!\int_{\Omega_{11}} f_c^2\, dx\, dy - 2 \int\!\!\int_{\Omega_{11}} f_c \hat{f}_c\, dx\, dy + \int\!\!\int_{\Omega_{11}} \hat{f}_c^2\, dx\, dy \right]. \qquad (9)$$
Hence,
$$MSE(\Omega_{11}) = NM \left[ \int\!\!\int_{\Omega_{11}} f_c^2(x,y)\, dx\, dy - 2 \sum_{(k,l)\in S} \langle f_c, \phi_{kl}\rangle \int\!\!\int_{\Omega_{11}} f_c(x,y)\, \phi_{kl}(x,y)\, dx\, dy + \sum_{(k,l)\in S} \langle f_c, \phi_{kl}\rangle^2 \right]$$
$$= NM \left[ \int\!\!\int_{\Omega_{11}} f_c^2(x,y)\, dx\, dy - \sum_{(k,l)\in S} \langle f_c, \phi_{kl}\rangle^2 \right]. \qquad (10)$$
Now the expected $MSE(\Omega_{11})$ will be:
$$E\left[MSE(\Omega_{11})\right] = NM \left[ \int\!\!\int_{\Omega_{11}} E\left[f_c^2(x,y)\right] dx\, dy - \sum_{(k,l)\in S} E\left[\langle f_c, \phi_{kl}\rangle^2\right] \right]. \qquad (11)$$
Hence:
$$E\left[\varepsilon_c^2\right] = NM \cdot \frac{\sigma^2}{NM} - NM \sum_{(k,l)\in S} E\left[c_{kl}^2\right] = \sigma^2 - NM \sum_{(k,l)\in S} E\left[c_{kl}^2\right]. \qquad (12)$$
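The key step above, that the truncation error equals the energy of the discarded coefficients, is easy to check numerically in the discrete analogue of this setting (an $8 \times 8$ orthonormal DCT); the low-frequency set $S$ used below is just an example.

```python
import numpy as np

n = 8
k = np.arange(n).reshape(-1, 1)
x = np.arange(n).reshape(1, -1)
C = np.sqrt(2.0 / n) * np.cos((2 * x + 1) * k * np.pi / (2 * n))
C[0, :] /= np.sqrt(2.0)                            # orthonormal DCT-II matrix

block = np.random.default_rng(0).normal(size=(n, n))
coeff = C @ block @ C.T                            # transform coefficients

keep = np.add.outer(np.arange(n), np.arange(n)) <= 3   # the retained set S
approx = C.T @ (coeff * keep) @ C                  # reconstruction from S only

err_direct = np.sum((block - approx) ** 2)
err_dropped = np.sum((coeff * ~keep) ** 2)         # energy of the dropped coefficients
print(err_direct, err_dropped)                     # the two numbers agree
```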
C. The effect of quantization of the expansion coefficients $c_{kl}$

Suppose that in the approximation
$$\hat{f}_c(x,y) = \sum_{(k,l)\in S} c_{kl}\, \phi_{kl}(x,y) \qquad (13)$$
we can only use a finite number of bits in representing the coefficients $\{c_{kl}\}$, which take values in $\mathbb{R}$. If $c_{kl}$ is represented/encoded with $b_{kl}$ bits, we shall be able to describe it only via $c_{kl}^Q$, which takes on $2^{b_{kl}}$ values, i.e., $c_{kl}^Q = Q(c_{kl})$ belongs to a set of $2^{b_{kl}}$ representation levels. The error in representing $c_{kl}$ in this way is $\varepsilon_{kl}^2 = (c_{kl} - c_{kl}^Q)^2$. Let us now see how the quantization errors affect $MSE(\Omega_{11})$. We have
$$MSE^Q(\Omega_{11}) = NM \int\!\!\int_{\Omega_{11}} \left[f_c(x,y) - \hat{f}_c^Q(x,y)\right]^2 dx\, dy, \qquad (14)$$
where $\hat{f}_c^Q(x,y) = \sum_{(k,l)\in S} c_{kl}^Q\, \phi_{kl}(x,y)$. Now
$$MSE^Q(\Omega_{11}) = NM \int\!\!\int_{\Omega_{11}} \left[f_c - \hat{f}_c + \hat{f}_c - \hat{f}_c^Q\right]^2 dx\, dy = MSE(\Omega_{11}) + 2NM \int\!\!\int_{\Omega_{11}} \left[f_c - \hat{f}_c\right]\left[\hat{f}_c - \hat{f}_c^Q\right] dx\, dy + NM \int\!\!\int_{\Omega_{11}} \left[\hat{f}_c - \hat{f}_c^Q\right]^2 dx\, dy.$$
Since $\hat{f}_c - \hat{f}_c^Q = \sum_{(k,l)\in S} (c_{kl} - c_{kl}^Q)\,\phi_{kl}$ and, over $\Omega_{11}$, $\langle f_c - \hat{f}_c, \phi_{kl}\rangle = c_{kl} - c_{kl} = 0$ for every $(k,l)\in S$, the cross term vanishes, and the orthonormality of the basis gives
$$MSE^Q(\Omega_{11}) = MSE(\Omega_{11}) + NM \sum_{(k,l)\in S} (c_{kl} - c_{kl}^Q)^2.$$
The expected $MSE^Q(\Omega_{11})$ is therefore given by:
$$E\left[MSE^Q(\Omega_{11})\right] = E\left[MSE(\Omega_{11})\right] + NM \sum_{(k,l)\in S} E\left[\varepsilon_{kl}^2\right] = \sigma^2 - NM \sum_{(k,l)\in S} E\left[c_{kl}^2\right] + NM \sum_{(k,l)\in S} E\left[(c_{kl} - c_{kl}^Q)^2\right]. \qquad (15)$$
Hence, in order to evaluate $E[(\varepsilon_c^Q)^2]$ in a particular representation, in which the image is sliced into $NM$ pieces, over each piece we use a subset $S$ of the possible basis functions (i.e., $S \subset \{(k,l) : k,l = 0,1,\ldots\}$), and we quantize the coefficients with $b_{kl}$ bits, we have to evaluate
$$\sigma^2 \;-\; NM \sum_{(k,l)\in S} \{\text{variance of } c_{kl}\} \;+\; NM \sum_{(k,l)\in S} \{\text{expected error in quantizing } c_{kl} \text{ with } b_{kl} \text{ bits}\}.$$
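Again in the discrete $8 \times 8$ analogue, one can verify numerically that, thanks to orthonormality, uniform quantization of the retained coefficients simply adds its own error energy on top of the truncation error; the step size below is an arbitrary illustration.

```python
import numpy as np

n = 8
k = np.arange(n).reshape(-1, 1)
x = np.arange(n).reshape(1, -1)
C = np.sqrt(2.0 / n) * np.cos((2 * x + 1) * k * np.pi / (2 * n))
C[0, :] /= np.sqrt(2.0)                                # orthonormal DCT-II matrix

block = np.random.default_rng(1).normal(size=(n, n))
coeff = C @ block @ C.T
keep = np.add.outer(np.arange(n), np.arange(n)) <= 3   # retained set S

step = 0.25                                            # uniform quantizer step
quant = np.round(coeff / step) * step
recon = C.T @ (quant * keep) @ C                       # decode from quantized, retained coefficients

total = np.sum((block - recon) ** 2)
truncation = np.sum((coeff * ~keep) ** 2)
quantization = np.sum(((coeff - quant) * keep) ** 2)
print(total, truncation + quantization)                # the two totals agree
```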
D. An important particular case: Markov process with separable cosine bases

We now return to the assumption that the statistics of $f_c(x,y)$ are given by Equation (1), namely
$$E[f_c(x,y)] = 0, \qquad E[f_c(x,y)\, f_c(x+\tau_x, y+\tau_y)] = \sigma^2 e^{-\lambda_x|\tau_x|} e^{-\lambda_y|\tau_y|},$$
and we choose a separable cosine basis for the slices, i.e., over $\Omega_{11} = [0, 1/N] \times [0, 1/M]$ we take $\phi_{kl}(x,y) = \phi_k(x)\, \phi_l(y)$, where
$$\phi_k(x) = \sqrt{N(2-\delta_{k0})}\, \cos(k \pi N x), \quad k = 0,1,2,\ldots, \qquad \phi_l(y) = \sqrt{M(2-\delta_{l0})}\, \cos(l \pi M y), \quad l = 0,1,2,\ldots$$
This choice of the DCT basis is motivated by our desire to model the JPEG behavior. As is well known [6], the DCT offers a good approximation of the KLT if the image is modelled as a 2D random Markov field with a very high correlation factor.
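As a quick check that the scaling in these basis functions is the right one for slices of width $1/N$, the following sketch evaluates the inner products of $\phi_k$ numerically (the value of $N$ and the grid size are arbitrary):

```python
import numpy as np

N = 4                                        # number of slicings along x (illustrative)
m = 20000                                    # midpoint quadrature cells on [0, 1/N]
xs = (np.arange(m) + 0.5) / (m * N)
dx = 1.0 / (m * N)

def phi(k):
    return np.sqrt(N * (2.0 - (k == 0))) * np.cos(k * np.pi * N * xs)

G = np.array([[np.sum(phi(k) * phi(l)) * dx for l in range(5)] for k in range(5)])
print(np.round(G, 4))                        # approximately the 5x5 identity matrix
```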
To compute $E[\varepsilon_c^2]$ for this case we need to evaluate the variances of the coefficients
$$c_{kl} := \int\!\!\int_{\Omega_{11}} f_c(x,y)\, \phi_k(x)\, \phi_l(y)\, dx\, dy. \qquad (16)$$
We have
$$E\left[c_{kl}^2\right] = E\left[ \int\!\!\int_{\Omega_{11}}\int\!\!\int_{\Omega_{11}} f_c(x,y) f_c(s,t)\, \phi_k(x)\phi_l(y)\phi_k(s)\phi_l(t)\, dx\, dy\, ds\, dt \right]$$
$$= \int\!\!\int_{\Omega_{11}}\int\!\!\int_{\Omega_{11}} \sigma^2 e^{-\lambda_x|x-s|} e^{-\lambda_y|y-t|}\, N(2-\delta_{k0})\cos(k\pi N x)\cos(k\pi N s)\, M(2-\delta_{l0})\cos(l\pi M y)\cos(l\pi M t)\, dx\, dy\, ds\, dt. \qquad (17)$$
Therefore, by separating the integrations we obtain
$$E\left[c_{kl}^2\right] = \sigma^2 (2-\delta_{k0})(2-\delta_{l0}) \int_0^{1/N}\!\!\int_0^{1/N} e^{-\lambda_x|x-s|}\cos(k\pi N x)\cos(k\pi N s)\, N\, dx\, ds \cdot \int_0^{1/M}\!\!\int_0^{1/M} e^{-\lambda_y|y-t|}\cos(l\pi M y)\cos(l\pi M t)\, M\, dy\, dt. \qquad (18)$$
Changing the variables of integration to
$$\tilde{x} = Nx \in [0,1], \quad \tilde{s} = Ns \in [0,1], \quad \tilde{y} = My \in [0,1], \quad \tilde{t} = Mt \in [0,1]$$
yields
$$E\left[c_{kl}^2\right] = \frac{\sigma^2}{NM}\, (2-\delta_{k0})(2-\delta_{l0}) \int_0^{1}\!\!\int_0^{1} e^{-\frac{\lambda_x}{N}|\tilde{x}-\tilde{s}|}\cos(k\pi \tilde{x})\cos(k\pi \tilde{s})\, d\tilde{x}\, d\tilde{s} \cdot \int_0^{1}\!\!\int_0^{1} e^{-\frac{\lambda_y}{M}|\tilde{y}-\tilde{t}|}\cos(l\pi \tilde{y})\cos(l\pi \tilde{t})\, d\tilde{y}\, d\tilde{t}.$$
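The two double integrals in the last expression depend only on $\lambda_x/N$ and $\lambda_y/M$ and can be evaluated numerically; a rough midpoint-rule sketch (with illustrative parameter values, not the paper's) is:

```python
import numpy as np

def cos_integral(a, k, m=1000):
    """Approximate I(a, k) = int_0^1 int_0^1 exp(-a|x-s|) cos(k pi x) cos(k pi s) dx ds."""
    t = (np.arange(m) + 0.5) / m                       # midpoint grid on [0, 1]
    x, s = np.meshgrid(t, t, indexing="ij")
    vals = np.exp(-a * np.abs(x - s)) * np.cos(k * np.pi * x) * np.cos(k * np.pi * s)
    return vals.mean()                                 # mean over the unit square = integral

sigma2, lam, N, M = 1.0, 40.0, 8, 8                    # illustrative values
for (k, l) in [(0, 0), (0, 1), (1, 1), (3, 3), (7, 7)]:
    var = (sigma2 / (N * M) * (2 - (k == 0)) * (2 - (l == 0))
           * cos_integral(lam / N, k) * cos_integral(lam / M, l))
    print(k, l, var)                                   # variances decay with frequency
```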
The relative sizes of the $b_{kl}$ across the different DCT coefficients were chosen to be inversely proportional to the corresponding entries of the standard JPEG quantization table.
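A sketch of this rigid, JPEG-style relative allocation rule follows; the use of the standard JPEG luminance quantization table here is our assumption for illustration, and the per-block budget is arbitrary.

```python
import numpy as np

# Standard JPEG luminance quantization table (assumed here as the reference table).
Q = np.array([[16,11,10,16,24,40,51,61],[12,12,14,19,26,58,60,55],
              [14,13,16,24,40,57,69,56],[14,17,22,29,51,87,80,62],
              [18,22,37,56,68,109,103,77],[24,35,55,64,81,104,113,92],
              [49,64,78,87,103,121,120,101],[72,92,95,98,112,100,103,99]], dtype=float)

def rigid_allocation(bits_per_block):
    """Bits b_kl proportional to 1/Q_kl, rescaled to meet the per-block budget."""
    w = 1.0 / Q
    return bits_per_block * w / w.sum()

b = rigid_allocation(bits_per_block=64.0)
print(np.round(b, 2))        # most of the budget lands in the low-frequency corner
```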
C. Soft bit allocation with cost functions for error and bit usage

We could also consider cost functions of the form
$$K_1\!\left( E\left[MSE^Q(\Omega_{11})\right] \right) + K_2\!\left( NM \cdot \text{Bits/slice} \right),$$
where $K_1$ and $K_2$ are cost functions chosen according to the task at hand, and ask for the bit allocation that minimizes the joint functional, in the spirit of [5].

IV. THE THEORETICAL PREDICTIONS OF THE MODEL
In the previous sections we proposed a model for the compression error as a function of the image statistics ($\sigma^2$, $\lambda_x$, $\lambda_y$), the given total bit budget $B_{total}$, and the numbers of slicings $N$ and $M$. Here, we fix these parameters according to the behavior of natural images and typical compression setups and study the behavior of the theoretical model.
Assume we have a gray-scale image of size $512 \times 512$ with 8 bits/pixel as our original image. JPEG considers $8 \times 8$ slices of this image and produces, by digitizing the DCT transform coefficients with a predetermined quantization table, an approximate representation of these $64 \times 64 = 4096$ slices. We would like to explain the observation that down-sampling the original image, prior to applying JPEG compression to the smaller image, produces, with the same bit usage, a better representation of the original image.

Suppose the original image is regarded as the 'continuous' image defined over the unit square $[0,1] \times [0,1]$, as we have done in the theoretical analysis. Then, the pixel width of a $512 \times 512$ image will be $1/512$. We shall assume that the original image is a realization of a zero-mean 2D stationary random process with autocorrelation of the form $R(|i-i'|,|j-j'|) = \sigma^2 \rho_1^{|i-i'|} \rho_2^{|j-j'|}$, with $\rho_1$ and $\rho_2$ in the range $[0.9, 0.95]$, as is usually done (see [6]). From a single image, $\sigma^2$ can be estimated via the empirical variance of the pixel values about their mean; assuming an equalized histogram, this estimate is close to the variance of a uniform distribution over the 256 gray levels. If we consider that the per-pixel correlation $\rho^{|i-i'|}$ corresponds to $e^{-\lambda_x|\tau_x|}$ with $\tau_x = |i-i'|/512$, we can obtain an estimate for $\lambda_x$ from $\rho = e^{-\lambda_x/512}$,
i.e., $\lambda_x = -512 \ln \rho$. For $\rho \in [0.9, 0.95]$ this provides $\lambda_x$ roughly in the range $[26, 54]$ (and similarly for $\lambda_y$). The total number of bits for the image representation will range from a few hundredths of a bit per pixel up to somewhat below one bit per pixel; hence, $B_{total}$ will be between roughly $10{,}000$ and $200{,}000$ bits for $512 \times 512$ original images. Therefore, in the theoretical evaluations we shall take $\lambda_x$ and $\lambda_y$ in the range found above, $\sigma^2$ as estimated above for 256-gray-level images, and a total bit usage between $10{,}000$ and $200{,}000$ bits. The symmetric $x$- and $y$-axis slicings considered will be $N = M = 1, 2, \ldots$, where we assume that $\lambda_x = \lambda_y$. Then we shall evaluate the expected error $E[(\varepsilon_c^Q)^2]$ of Equation (26), with the $b_{kl}$'s provided by the optimal level allocation. Practically, the optimal level allocation $Q_{kl}$ should be replaced by $\max(1, Q_{kl})$, a measure that automatically prevents the allocation of negative numbers of bits. Obviously, this step must be followed by a re-normalization of the bit allocation in order to comply with the bit-budget constraint. The parameter $\gamma$ is taken from 1 to 3, whereas $S$ will be $\{(k,l) : k + l \le 7,\ k,l = 0,1,\ldots,7\}$, simulating the standard JPEG approach, which codes the transform coefficients of each block while emphasizing the low-frequency range via the precise encoding of only about half of the coefficients.
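A sketch of the parameter estimation implied by the relations above (lag-1 correlation $\rho$ and $\lambda = -512 \ln \rho$, plus the empirical variance) is given below; the input array is a stand-in for a real $512 \times 512$ gray-scale image.

```python
import numpy as np

def estimate_params(img):
    """Estimate sigma^2, (rho_x, rho_y) and (lambda_x, lambda_y) from a square gray image."""
    f = img.astype(np.float64) - img.mean()
    sigma2 = np.mean(f ** 2)
    rho_x = np.mean(f[:, :-1] * f[:, 1:]) / sigma2     # lag-1 horizontal correlation
    rho_y = np.mean(f[:-1, :] * f[1:, :]) / sigma2     # lag-1 vertical correlation
    n = img.shape[0]
    lam_x = -n * np.log(rho_x)                         # from rho = exp(-lambda / n)
    lam_y = -n * np.log(rho_y)
    return sigma2, (rho_x, rho_y), (lam_x, lam_y)

demo = np.random.default_rng(2).normal(size=(512, 512)).cumsum(axis=0).cumsum(axis=1)
print(estimate_params(demo))                           # replace `demo` with a real image
```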
Using the above-described parameter ranges, we plot the predictions of the analytical model for the expected mean square error as a function of the number of slicings $N$, with the bit usage as a parameter. Figures 3 and 4 show the approximated error as a function of the number of slicings for various total numbers of bits. Figure 3 displays the predictions of the theoretical model in conjunction with optimal level allocation, while Figure 4 uses the JPEG-style rigid relative bit allocation. In both figures
the left side shows the results of restricting the number of bits or quantization levels to integers, while the right side shows the results allowing fractional bit and level allocation. These figures show that for every given total number of bits there is an optimal slicing parameter $N$, indicating the optimal down-sampling factor. For example, if we focus on the bottom-right graph (rigid bit allocation with possible fractions): when 50Kbits are used, the optimal $N$ implies that a $512 \times 512$ image should be sliced into blocks larger than $8 \times 8$ pixels of the original image, and as we move to a budget of only 10Kbits the optimal $N$ becomes smaller still, so the block size to work with grows further. Since JPEG works with a fixed block size of $8 \times 8$, the first case of 50Kbits calls for a moderate down-sampling factor, and the second case of 10Kbits requires an even stronger down-scaling. Note that integer bit allocation causes non-smooth behavior in both cases. Also, in Figure 3 it appears that the minimum points are local ones and the error tends to decrease again as $N$ increases. This
increases. This
phenomenon can be explained by the fact that we used an approximation of the quantization error which fails to predict the true error for a small number of bits at large down-scaling factors. Finally, we should note that the parameter
÷
was chosen differently between the two allocation policies. Within the range
(ø R*iù!, , we empirically set a value that matched the true JPEG behavior. The overall qualitative behavior however was quite similar for all the range of ÷ ’s. Figure 5 shows the theoretical prediction of PSNR versus bits per pixel curves for typical ?+ images with different down–sampling factors (different values of is
&fPQ?+
&
+?
, where the down–sampling factor
). One may observe that the curve intersections occur at similar locations as those of the
experiments with real images shown in the next section. Also, it appears that even though the allocation policies are different, the results are very similar. An interesting phenomenon is observed in these graphs: For down-sampling factors smaller than
an
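The bookkeeping that links the slicing parameter $N$ of the model to block sizes and down-sampling factors for a $512 \times 512$ image is summarized by this small sketch:

```python
# Block side in original-image pixels is 512/N; with JPEG's fixed 8x8 blocks this
# corresponds to a down-sampling factor of 8/(512/N) = N/64 (1.0 means no down-sampling).
for N in (16, 24, 32, 48, 64):
    block = 512.0 / N
    scale = 8.0 / block
    print("N = %2d   block %5.1f px   scale %.3f" % (N, block, scale))
```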
Fig. 3. Theoretical prediction of the (normalized) MSE based on optimal bit allocation, versus the number of slicings $N$, with the total bit usage (10Kbits up to 190Kbits) as a parameter, using typical values of $\lambda$ and $\gamma$.
Fig. 4. Rigid relative bit allocation based prediction of the (normalized) MSE versus the number of slicings $N$, with the total bit usage (10Kbits up to 190Kbits) as a parameter, using typical values of $\lambda$ and $\gamma$.
This phenomenon is quite expected, since the down-sampling of the image introduces an unrecoverable loss. Thus, even an infinite number of bits cannot recover this induced error, and this loss determines the obtained saturation height. The reason this flattening happens at lower bit rates for smaller factors is that the smaller the image, the smaller the number of bits that already yields a near-perfect transmission. In terms of a quality-scalable coder, this effect parallels the attempt to work with an image pyramid representation while
allocating too many bits to a specific resolution layer, instead of also allocating bits to the next resolution layer.
Fig. 5. Optimal (top) and rigid relative (bottom) bit allocation based predictions of PSNR versus bits per pixel for $512 \times 512$ images, with the image down-sampling factor as a parameter (Scale = 1, 0.875, 0.75, 0.625, 0.5, 0.375, 0.25). Again, we used typical values of $\lambda$, with $\gamma$ set separately for the optimal bit allocation case and for the JPEG-style case.
V. COMPRESSION RESULTS OF NATURAL AND SYNTHETIC IMAGES

To verify the validity of the analytic model and to design a system for image trans-coding, we can generate synthetic images whose autocorrelation is similar to that of a given image. Then we can plot the PSNR/bpp JPEG graphs for all JPEG qualities, one graph for each given down-sampling ratio. The statistical model is considered valid if the behavior is similar for the natural image and the synthesized one.
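One standard way to synthesize an image with the separable exponential autocorrelation $\sigma^2 \rho^{|\Delta i|}\rho^{|\Delta j|}$ assumed by the model is to filter white noise with a first-order autoregressive recursion along each axis; the sketch below is such a construction (the paper's own synthesis procedure may differ), with illustrative parameter values.

```python
import numpy as np

def ar1_filter(a, rho, axis):
    """Apply x[k] = rho * x[k-1] + sqrt(1 - rho^2) * a[k] along the given axis."""
    a = np.moveaxis(a, axis, 0).copy()
    c = np.sqrt(1.0 - rho ** 2)
    for k in range(1, a.shape[0]):
        a[k] = rho * a[k - 1] + c * a[k]
    return np.moveaxis(a, 0, axis)

def synth_markov(n, rho, sigma, seed=0):
    """Zero-mean field with autocovariance ~ sigma^2 * rho^|di| * rho^|dj|."""
    w = np.random.default_rng(seed).normal(scale=sigma, size=(n, n))
    return ar1_filter(ar1_filter(w, rho, axis=0), rho, axis=1)

img = synth_markov(512, rho=0.95, sigma=50.0)
print(img.std())          # approximately sigma, up to boundary transients
```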
A. Image Synthesis

Assume that for an image