
IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 39, NO. 6, JUNE 1991


Quadtree-Structured Recursive Plane Decomposition Coding of Images

Peter Strobach, Member, IEEE

Abstract-The approximation of two-dimensional highly correlated grey value functions can be performed using a linear model of the type f(x, y) = a + bx + cy. The set of plane parameters (PP’s) [a, b, c] can be determined in the least squares sense for, say, a block of size N × N pixels. Starting with a block size of 2 × 2 pixels, it can be shown that the PP’s obey a recursive law such that the PP’s of a 2N × 2N block can be computed recursively when only the PP’s of the four adjacent subblocks of size N × N in the lower decomposition level are known. This concept of “recursive plane decomposition” (RPD) is embedded in a quadtree data structure to obtain a new variable block size image coding algorithm that offers a high performance at a low computational cost. Extensive comparisons to other state-of-the-art image coding algorithms are reported in the paper. These comparisons indicate that the quadtree-RPD coding algorithm presented in this paper performs very closely to the results obtained by other important schemes, such as vector quantization and subband coding, in the range of 0.5-1.5 b per pixel. This is achieved at a coder complexity as low as 3 multiplications/pixel and 8 additions/pixel, which compares favorably with the complexity of traditional methods.

Manuscript received May 24, 1988; revised June 8, 1990. This work was presented in part at the IEEE International Conference on Acoustics, Speech, and Signal Processing, Glasgow, Scotland, May 1989. The author is with Siemens AG, Zentralabteilung Forschung und Entwicklung, ZFE IS INF, Forschung für Informatik und Software, D-8000 München 83, Germany. IEEE Log Number 9143805.

I. INTRODUCTION

Current algorithms for natural grey value image coding [1], [2] generally employ one of three techniques: 1) transform coding (TC) [3]-[5], 2) vector quantization (VQ) [6]-[8], or 3) subband decomposition [9], [10]. It is a common experience that most natural images can be divided into regions of high and regions of low detail. Traditional implementations of coding schemes such as [11] do not adapt the block size to the space-varying characteristics of natural images, but they break the image into blocks of a fixed size prior to processing. The block size in TC is usually too large to handle partial high-detail regions satisfactorily. “Ringing” effects are a well-known problem of fixed block size TC and various attempts have been made to cope with this difficulty [12]. On the other hand, the block size in conventional VQ coding is too small to take advantage of large homogeneous and unstructured regions in an image. Since high detail regions are often intrinsically less compressible than regions of low detail, it can be very attractive to employ variable block size image coding schemes which offer the potential of a better variation of the number of bits spent per unit area, according to the local detail.

During the past several years, a great interest has grown around variable block size image coding algorithms. Many of these recently developed concepts attempt to combine classical schemes such as TC [13], VQ [14], or even a combination of TC and VQ [15] with spatial segmentation techniques. In TC, variable block size coding can be employed in the original domain [13], or even in the transform (coefficient) domain [16]. The gapless nesting of blocks of a different size in the two-dimensional space requires some methodology. For this purpose, most variable block size image coding algorithms use the quadtree data structure [17]-[21] or simple modifications thereof [22].

A closer look at the principle of variable block size coding reveals that there are two major problems associated with variable block size image coding algorithms. First is the problem of how to incorporate a meaningful adaptivity in the segmentation process, and second is the problem of order recursivity of the underlying signal representation or processing schemes for increasing (bottom-up decomposition) or decreasing (top-down decomposition) block sizes. For example, the Hadamard transform possesses the sometimes important property that the coefficients of a block of size 2N × 2N can be computed order recursively and efficiently when only the coefficients of the four adjacent subblocks of size N × N are known [23]. A similar order recursive law exists in the case of the discrete cosine transform (DCT) [24]. These order recursive laws facilitate a much more efficient computation of coefficients of increasing block sizes when compared to the simple nonrecursive case, where the coefficients must be computed completely anew for each block size.
Since adaptive variable block size image coding algorithms frequently involve a hypothesis test which requires the coefficients of all block sizes as input, such efficient order recursive construction properties play a central role in the area of variable block size image coding. But there is one more aspect that deserves some attention. As soon as an adaptive variable block size segmentation process reaches large block sizes, this automatically indicates that the corresponding part of the image is of low detail, and hence the high frequency coefficients of a transform applied to these blocks are close to zero or less significant and can therefore be omitted. Nevertheless, conventional transform coding schemes waste a lot

1053-587X/91/0600-1380$01.00 © 1991 IEEE

STROBACH: QUADTREE-STRUCTURED RPD CODING OF IMAGES

of computations to calculate just such insignificant coefficients. Clearly more appropriate to the problem of signal representation in adaptive variable block size image coding schemes would be a transform with a number of coefficients per block which is fixed, and not equal to the number of pixels in the block as with conventional orthogonal transforms. The key step to the derivation of such “fixed number of coefficients” transforms is the approximation of the luminance signal by a parametric function in blocks of a different size. A special case treated in this paper is the two-dimensional linear function f(x, y) = a + bx + cy. The set [a, b, c] is called the “plane parameters” (PP’s), where [a] is the mean parameter and [b, c] form the plane gradient. Clearly, the PP’s have an important interpretation as the coefficients of just such a transform with a number of coefficients which is fixed, and independent of the block size. This piecewise linear model can also be discussed in context with the previous work on nonorthogonal basis functions by Green and Bass [25] and Paul and Koch [26]. These early papers, however, have not treated the second important problem, namely, the fast order recursive computation of PP’s of increasing block sizes. When the PP’s are determined in the least squares sense, then it can be shown that these quantities obey a recursive law that allows their efficient recursive bottom-up computation. Recursive laws of approximation polynomials in the area of picture representation were discussed by Eden et al. [27]. Leonardi and Kunt presented an application of a least squares polynomial transform where they varied the order of the polynomial according to the local detail [28]. We have combined the piecewise linear model with the quadtree data structure to obtain a new variable block size image coding algorithm named the quadtree recursive plane decomposition (quadtree-RPD) image coder.
The paper discusses all important aspects of this coding concept, such as parameter quantization, rate distortion characteristics, a methodology for automatic computation of optimal merging thresholds in dependence of the data characteristics, and entropy coding of the PP’s. Throughout the extensive simulations, the described quadtree-RPD image coder showed good coding results. The signal-to-noise ratios of quadtree-RPD coded test images were compared to the results of other sophisticated schemes reported in the literature. These comparisons are presented in the paper; they indicate that the quadtree-RPD coder performs almost identically to most other coding schemes based on much more sophisticated processing schemes. In particular, it was found that the coder’s performance is very close to that of some recently developed variable block size VQ algorithms. One tentative explanation of this fact can be that indeed many VQ codebooks for natural image coding contain a large number of vectors which just exhibit approximately the shape of our “planes” underlying the piecewise linear model. Hence, if the LBG design [29] of a codebook is interpreted as the design of a multidimensional Lloyd-Max quantizer [30], [31], then



the principle of RPD coding may be loosely interpreted as a multidimensional entropy coded nonuniform quantizer. When the plane gradient (parameters b and c) is forced to zero, one obtains a simplified method termed “quadtree-structured regular mean decomposition coding” [32]-[34]. This method has been applied with great success to the problem of coding heavily decorrelated motion compensated frame-to-frame difference images appearing in the field of image sequence (video) coding.

The paper is organized as follows. Section II gives a short review on regular decomposition and quadtrees, as far as it will be required in this paper. Section III presents the RPD approach. Two different linear models will be discussed. These two models may be distinguished by the “center of plane rotation.” In the unquantized case, i.e., when the PP’s are represented with infinite precision, both models are equivalent. But in the quantized case, i.e., when the PP’s are represented with a finite number of steps, it is seen that among all possible linear models, the model where the center of plane rotation is located exactly in the center of a block will give the best rate distortion characteristics. Section IV discusses several important optimization subproblems such as optimal quantization of the PP’s. Small block sizes correspond to regions of high detail, hence allowing a fairly coarse quantization. At increasing block sizes, the mean parameter [a] must be quantized at successively smaller step sizes. Another question is how mean and plane gradient quantizer step sizes should be related to each other so that the quantization of each parameter will induce the same portion of quantization error energy to the overall reconstruction error (balanced quantization). The optimal ratio of quantization step sizes of mean and gradient parameters is crucial to the overall performance of the quadtree-RPD algorithm. Section IV shows how this interesting problem can be solved analytically.
Another interesting question in the optimization of the quadtree-RPD coder was the incorporation of “adaptivity” and the optimal choice of the decomposition thresholds. For this purpose, an algorithm that computes the rate distortion characteristics of the quadtree-RPD coder at a predetermined quantization and for real data has been developed. This procedure is useful for the determination of the optimal quantizer sets at different compression ratios. The procedure is also useful for the determination of optimal decomposition thresholds. Diagrams were obtained where, for a given image and quantization, the optimal decomposition thresholds are plotted as a function of the desired compression. When the decomposition thresholds are chosen as suggested, the true coder will produce an output bit rate that varies within a range of no more than 2% around the desired bit rate. The reconstruction of the image from PP’s is the third problem addressed in Section IV. Clearly, the reconstruction can be posed as another optimization problem where a smooth function must be reconstructed from mean and gradient information available at a nonuniform (quadtree-structured) sampling grid. The simplest way is to expand the signal by the linear planes. It is shown that an appropriate higher order interpolative reconstruction of the luminance function generally yields better results both in terms of signal-to-noise ratio and in the subjective image quality. Section V finally presents some experimental results. These results are compared to other known coding schemes. A discussion in Section VI concludes this paper.

II. THE QUADTREE SEGMENTATION STRUCTURE

In the following, we give a review of some basic ideas of regular decomposition and quadtrees, as far as it will be required in this context. Quadtrees are a powerful technique for describing two-dimensional regions. The basic relationships between a region and its quadtree representation are available in the papers of Klinger and Dyer [18], Hunter and Steiglitz [19], Fu and Mui [20], and in the detailed survey paper of Samet [17]. When regular decomposition is used in describing an image, the image is usually presegmented in blocks of a fixed size. Each block can be subdivided into four smaller units, called subblocks. After each subdividing operation, the size of the resulting subblocks is a quarter of the predecessor. The subdividing operation can be repeated recursively many times until there is no further subdividing needed or the smallest possible block size is reached. Each subdividing operation is guided by a hypothesis test. In this test, a decision is made whether four adjacent subblocks are homogeneous in the property of interest. If the test is positive, the block can be represented by a single parameter set of the underlying image model (in our case, the PP’s of the linear model for the luminance function). If the test fails, the four subblocks are generated and the actual region is represented by four independent parameter sets of the four subblocks. The subdivide operation constitutes the so-called top-down construction of a quadtree.
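The top-down construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`subdivide`, `range_test`, `leaves`) are hypothetical, and the homogeneity test used here is a simple grey value range test standing in for the hypothesis test, which the paper bases on plane parameters rather than on the pixel range.

```python
def subdivide(img, x0, y0, size, is_homogeneous, min_size=1):
    """Top-down quadtree: a leaf is (x0, y0, size), a node is a list of 4 subtrees."""
    if size <= min_size or is_homogeneous(img, x0, y0, size):
        return (x0, y0, size)
    h = size // 2
    # Subblocks in Z-scan order: upper left, upper right, lower left, lower right.
    return [subdivide(img, x0 + dx, y0 + dy, h, is_homogeneous, min_size)
            for dy in (0, h) for dx in (0, h)]


def range_test(threshold):
    """Assumed stand-in for the hypothesis test: small grey value range = homogeneous."""
    def test(img, x0, y0, size):
        vals = [img[y][x]
                for y in range(y0, y0 + size)
                for x in range(x0, x0 + size)]
        return max(vals) - min(vals) <= threshold
    return test


def leaves(tree):
    """Flatten the quadtree into its list of leaf blocks."""
    if isinstance(tree, tuple):
        return [tree]
    return [leaf for sub in tree for leaf in leaves(sub)]


image = [
    [10, 10, 10, 10],
    [10, 10, 10, 10],
    [10, 10, 50, 90],
    [10, 10, 90, 50],
]
tree = subdivide(image, 0, 0, 4, range_test(5))
print(leaves(tree))
```

In this toy image only the lower right quadrant is of high detail, so it alone is split down to single pixels while the other three quadrants remain 2 × 2 leaves.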
Inversely, the merge operation corresponds to the bottom-up construction of a quadtree. In the bottom-up construction of a quadtree, we start with a small initial block size and proceed to larger block sizes through any type of hypothesis test where four adjacent subblocks are tested to see if they can be merged into a larger block which has four times the size of the predecessors. Clearly, the bottom-up construction of quadtrees allows the utilization of order recursive signal models or processing schemes. The top-down construction of a quadtree was used in quadtree predictive coding [35]. For an illustration of both procedures, see Fig. 2. Fig. 1 explains the principles of quadtree-based regular decomposition and its description; (a) is the segmented region, (b) is the corresponding quadtree, and (c) shows the generation of a variable-length structure code.

III. RECURSIVE PLANE DECOMPOSITION CODING

The application of the linear function

$$f(x, y) = a + bx + cy \tag{1}$$

Fig. 1. Quadtree-based regular decomposition and its description. (a) Segmented region. (b) Corresponding quadtree. (c) Generation of a variable wordlength quadtree code (QC).

Fig. 2. (a) Top-down construction of a quadtree based on the subdivide operation. (b) Bottom-up construction of a quadtree based on the merge operation.

has a long history in image processing. The early work of Yakimovsky [36] employed the model (1) for boundary and object detection in real-world images. This work was later extended by Nagel and Rekers [37]. The usefulness of the approximation (1) can be understood from the general observation that large areas in most natural images can indeed be viewed as more or less oblique planes of different size. This statement is justified by the fact that large parts of LBG-designed codebooks [29] in the area of image coding based on vector quantization exhibit the shape of oblique planes and can therefore be closely represented by the linear model (1). Moreover, the space-varying characteristics of natural images suggest quite


naturally the application of an adaptive segmentation structure, in which the model (1) must be embedded. In our case, this will be a quadtree segmentation structure which has the advantage of a very efficient structural description.

A. Two-Dimensional Linear Models

Consider the two-dimensional grey value function s(x, y) and its piecewise linear approximation $\hat{s}(x, y)$ given by

$$\hat{s}(x, y) = a + bx + cy \tag{2}$$

where [a] is a mean parameter and [b, c] form a plane gradient. We define the corresponding approximation error e(x, y) as

$$e(x, y) = s(x, y) - \hat{s}(x, y); \qquad 0 \le x, y \le N - 1. \tag{3}$$

Fig. 3. Z-scan: a convention for subblock labeling.

The evaluation of the approximation error e(x, y) in the two-dimensional space requires the serialization of the data in a block of size N × N pixels. Without loss of generality, we henceforth assume that among other possible schemes, such a serialization (or one-dimensional ordering) of pixels is performed along a Z-scan as shown in Fig. 3. The Z-scan of Fig. 3 exhibits the important property that it is recursive, i.e., it leads to a simple recursive construction of address sequences when the block size increases, and it ensures that the so-scanned data can always be partitioned in a particular sense as required in the further considerations. When proceeding to larger block sizes, four adjacent subblocks are always treated as “superblocks” addressed according to the Z-scan of Fig. 3. This statement is illustrated in Fig. 4 for the example of a block of 4 × 4 pixels. The Z-scan is completely determined by the scan matrix $S_N$. For the example N = 4, we may write

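$$S_4^T = \begin{bmatrix} 1&1&1&1&1&1&1&1&1&1&1&1&1&1&1&1 \\ 0&1&0&1&2&3&2&3&0&1&0&1&2&3&2&3 \\ 0&0&1&1&0&0&1&1&2&2&3&3&2&2&3&3 \end{bmatrix} \tag{4}$$

and, in general,

$$S_N = [\, i_N,\; x_N,\; y_N \,] \tag{5}$$

where $x_N$ and $y_N$ are the coordinate vectors

$$x_N = [\, x_N(0),\; x_N(1),\; \ldots,\; x_N(N^2 - 1) \,]^T \tag{6a}$$

$$y_N = [\, y_N(0),\; y_N(1),\; \ldots,\; y_N(N^2 - 1) \,]^T \tag{6b}$$

and $i_N$ is the unity vector

$$i_N = [\, 1,\; 1,\; \ldots,\; 1 \,]^T. \tag{7}$$

Fig. 4. Sixteen pixels in a 4 × 4 block addressed according to a Z-scan.

The coordinate vectors $x_N$ and $y_N$ can be generated automatically, for arbitrary N, according to the following rules:

$$x_N(j) = \sum_{n=0}^{\log_2 N} 2^n \, \mathrm{MOD}_2[\mathrm{DIV}_k(j)], \qquad k = 2^{2n}, \qquad 0 \le j \le N^2 - 1 \tag{8a}$$

$$y_N(j) = \sum_{n=0}^{\log_2 N} 2^n \, \mathrm{MOD}_2[\mathrm{DIV}_l(j)], \qquad l = 2^{(2n+1)}, \qquad 0 \le j \le N^2 - 1 \tag{8b}$$

where $\mathrm{DIV}_k(x)$ is the truncated quotient x/k and $\mathrm{MOD}_k(x)$ is the remainder of x/k. Next, one may discover some interesting properties of the scan matrix $S_N$. For a block size of N = 2, we have

$$S_2 = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 1 & 1 \end{bmatrix}. \tag{9}$$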
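As a quick check of the rules (8a), (8b), the following Python sketch generates the coordinate vectors for a given block size. The function name `z_scan_coordinates` is an illustrative choice, and N is assumed to be a power of two.

```python
def z_scan_coordinates(n):
    """Coordinate vectors x_N, y_N of an n x n block per (8a), (8b); n a power of two."""
    levels = n.bit_length() - 1  # log2(n)
    xs, ys = [], []
    for j in range(n * n):
        # DIV_k(j) is integer division by k, MOD_2 is the remainder modulo 2.
        x = sum(2 ** m * ((j // 2 ** (2 * m)) % 2) for m in range(levels + 1))
        y = sum(2 ** m * ((j // 2 ** (2 * m + 1)) % 2) for m in range(levels + 1))
        xs.append(x)
        ys.append(y)
    return xs, ys


x4, y4 = z_scan_coordinates(4)
print(x4)
print(y4)
```

For N = 4 the two lists reproduce the second and third rows of the transposed scan matrix in (4). The term with n = log2 N never contributes, since j < N² makes the corresponding truncated quotient zero.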


According to the definition of the coordinate vectors (8a), (8b), and in consideration of the scan matrix (4) of block size N = 4, it turns out that the scan matrix obeys a recursive construction property such that the scan matrix $S_{2N}$ may be constructed from $S_N$ plus some correction terms as follows:

$$S_{2N} = \begin{bmatrix} S_N \\ S_N + N\,[\,0,\; i_N,\; 0\,] \\ S_N + N\,[\,0,\; 0,\; i_N\,] \\ S_N + N\,[\,0,\; i_N,\; i_N\,] \end{bmatrix}. \tag{10}$$
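The doubling rule (10) can be verified numerically. The sketch below builds the scan matrix by repeated doubling, starting from the single row [1, 0, 0] of a one-pixel block, and compares it with a direct construction that de-interleaves the bits of the scan index. Both function names are hypothetical, and the one-pixel starting point is an assumption made here for convenience.

```python
def scan_matrix_recursive(n):
    """Rows [1, x, y] in Z-scan order, built by the doubling rule (10)."""
    rows = [(1, 0, 0)]  # assumed starting point: a single pixel at the origin
    size = 1
    while size < n:
        # Offsets of subblocks 0..3: none, +N in x, +N in y, +N in both.
        offsets = [(0, 0), (size, 0), (0, size), (size, size)]
        rows = [(1, x + dx, y + dy) for dx, dy in offsets for (_, x, y) in rows]
        size *= 2
    return rows


def scan_matrix_direct(n):
    """Same matrix obtained by de-interleaving the bits of the scan index j."""
    rows = []
    for j in range(n * n):
        x = y = bit = 0
        k = j
        while k:
            x |= (k & 1) << bit         # even bits of j form x
            y |= ((k >> 1) & 1) << bit  # odd bits of j form y
            k >>= 2
            bit += 1
        rows.append((1, x, y))
    return rows


print(scan_matrix_recursive(2))
print(scan_matrix_recursive(8) == scan_matrix_direct(8))
```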

A closer look at the definition of the scan matrix $S_N$ (5), together with the definition of the address vectors $x_N$ and $y_N$, reveals that the linear model discussed so far presents only a special case of a two-dimensional linear model where the origin of the underlying coordinate system, or the “center of plane rotation,” is located in the upper left corner of the respective block. Among several other possibilities, one may consider a second case, where the origin of the coordinate system is located exactly in the center of the block. Both cases are illustrated in Fig. 5. The location of the origin of the coordinate system is unimportant in the case of unquantized PP’s. The situation changes as soon as the plane gradient (parameters $b_N$ and $c_N$) is quantized at a finite number of steps. Here, it turns out that case 2 will be the most favorable model since the portion of quantization error energy, which is induced to each pixel in the block due to quantization of the plane gradient, is a quadratic function of the distance of the pixel from the origin. Hence the model of case 2, which performs the closest “grouping” of pixels around the origin (see again Fig. 5), will generally give the best coding results. We are interested in deriving the structure of the scan matrix $S'_N$ which constitutes the address sequence of case 2. It can be shown that $S'_{2N}$ obeys a similar recursive construction property as $S_{2N}$ (10). In particular, one may deduce that

$$S'_{2N} = \begin{bmatrix} S'_N + \tfrac{N}{2}\,[\,0,\; -i_N,\; -i_N\,] \\ S'_N + \tfrac{N}{2}\,[\,0,\; +i_N,\; -i_N\,] \\ S'_N + \tfrac{N}{2}\,[\,0,\; -i_N,\; +i_N\,] \\ S'_N + \tfrac{N}{2}\,[\,0,\; +i_N,\; +i_N\,] \end{bmatrix}. \tag{11}$$

Fig. 5. Two linear models distinguished by the location of the origin of the underlying coordinate system. Example shown for block size N = 4. (a) Case 1. Center of plane rotation located in the upper left-hand corner of the block. (b) Case 2. Center of plane rotation located in the center of the block.

Furthermore, one readily finds that the scan matrices for the two cases are related to each other via the expressions

$$S'_N = [\, i_N,\; x_N - c_N,\; y_N - c_N \,] \tag{12a}$$

$$S_N = [\, i_N,\; x_N,\; y_N \,] \tag{12b}$$

where $c_N$ is a constant “address offset” vector of the form

$$c_N = \left[ \frac{N-1}{2},\; \frac{N-1}{2},\; \ldots,\; \frac{N-1}{2} \right]^T. \tag{13}$$

B. Plane Parameter Normal Equations

For the purpose of a least squares computation of the PP’s, one introduces the error vector $e_N$ which describes the approximation error in a block of size N × N pixels by simply evaluating expression (3) for all pixel coordinates inside the N × N block, where we make use of the scan matrix $S_N$ to obtain

$$e_N = s_N - S_N p_N. \tag{14}$$

The vectors $s_N$ and $e_N$ contain the signal and error values ordered according to the Z-scan determined by the scan matrix $S_N$ (5). An analogous expression can be found for case 2 determined by the scan matrix $S'_N$. The set of PP’s $[a_N, b_N, c_N]$ forms the plane parameter vector

$$p_N = [\, a_N,\; b_N,\; c_N \,]^T \tag{15}$$

which is assumed constant over the entire N × N block. The approximation error energy $E_N$ in a block of size N × N pixels is then given by the quadratic functional

$$E_N = e_N^T e_N = s_N^T s_N - 2\, s_N^T S_N p_N + p_N^T S_N^T S_N p_N \tag{16}$$

and a minimization of $E_N$ can be achieved via setting the gradient of the approximation error energy to zero, yielding

$$S_N^T S_N \, p_N = S_N^T s_N. \tag{17}$$

According to (17), the least squares estimate of the plane parameter vector $p_N$, denoted by $\hat{p}_N$, is given by the expression

$$\hat{p}_N = (S_N^T S_N)^{-1} S_N^T s_N. \tag{18}$$

In order to reduce the total amount of arithmetic operations, one can be interested in a recursive computation of plane parameter vectors of increasing block sizes of a decomposition. In particular, we shall consider the problem of computing the plane parameter vector $p_{2N}$ corresponding to a block of size 2N × 2N pixels recursively from the plane parameter vectors $\{p_N^{(0)}, p_N^{(1)}, p_N^{(2)}, p_N^{(3)}\}$ of the four adjacent subblocks of size N × N, where we returned to the simplified notation $p_N$ to denote the true least squares estimate $\hat{p}_N$. Only case 2 will be considered in the following. Analogous recursions can be found for case 1 (see [38] for details). According to (18), the least squares computation of plane parameter vectors essentially requires the evaluation of two factors. The first is the “inverse scan covariance matrix” $T'_N$ of fixed dimension 3:

$$T'_N = (S_N^{\prime T} S'_N)^{-1}. \tag{19}$$

A simple inspection of (11) reveals that in case 2 this quantity is a diagonal matrix which solely depends on the block size N:

$$T'_N = \frac{1}{N^2} \, \mathrm{diag}\!\left( 1,\; \frac{12}{N^2 - 1},\; \frac{12}{N^2 - 1} \right). \tag{20}$$

Next, we consider the second factor in the least squares solution of the parameter vector $p_N$, namely, the product of the transposed scan matrix $S_N^{\prime T}$ times the data vector $s_N$. Introducing the intermediate parameter vector $q_N$

$$q_N = S_N^{\prime T} s_N \tag{21}$$

one may exploit the partitioning scheme (11) of the scan matrix $S'_{2N}$ to find the recursive relationship

$$q_{2N} = \sum_{m=0}^{3} \left( q_N^{(m)} + r_N^{(m)} \right) \tag{22}$$

where $\{r_N^{(m)},\ 0 \le m \le 3\}$ are “update vectors” which solely depend on the means $\{M_N^{(m)},\ 0 \le m \le 3\}$ of the four adjacent subblocks of size N × N according to

$$r_N^{(0)} = \frac{N}{2} \, [\, 0,\; -M_N^{(0)},\; -M_N^{(0)} \,]^T \tag{23a}$$

$$r_N^{(1)} = \frac{N}{2} \, [\, 0,\; +M_N^{(1)},\; -M_N^{(1)} \,]^T \tag{23b}$$

$$r_N^{(2)} = \frac{N}{2} \, [\, 0,\; -M_N^{(2)},\; +M_N^{(2)} \,]^T \tag{23c}$$

$$r_N^{(3)} = \frac{N}{2} \, [\, 0,\; +M_N^{(3)},\; +M_N^{(3)} \,]^T. \tag{23d}$$

Clearly, the subblock means obey a recursive law in that

$$M_{2N} = \sum_{m=0}^{3} M_N^{(m)} \tag{24}$$

and hence the recursive computation of the intermediate solution vector $q_{2N}$ from the intermediate solution vectors $\{q_N^{(0)}, q_N^{(1)}, q_N^{(2)}, q_N^{(3)}\}$ of the four adjacent subblocks can be led back to a simple mean recursion. Table I summarizes this algorithm for the unconditional recursive bottom-up plane decomposition of a two-dimensional signal.

D. Normalized Recursions

In fixed-point applications, it is sometimes of interest to work with normalized algorithms. The RPD algorithm, as introduced in Table I, is an unnormalized algorithm. The magnitude of the elements of $T'_N$ decreases rapidly with increasing block size N while the magnitude of $M_N$ increases with increasing block size. A block size normalized (N-normalized) version of the unconditional RPD algorithm can be easily deduced from Table I, if we only substitute the following variables:

$$\bar{M}_N = \frac{1}{N^2} M_N \tag{25a}$$

$$\bar{T}'_N = N^2 \, T'_N \tag{25b}$$

$$\bar{M}_N \rightarrow M_N \tag{26a}$$

$$\bar{T}'_N \rightarrow T'_N. \tag{26b}$$
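The recursion (22)-(24) lends itself to a compact numerical check. The following Python sketch is not the algorithm of Table I verbatim: it is a recursive (rather than level-by-level) formulation under the assumptions that the block is a square list of rows with a power-of-two side, that the subblock quantity M is the unnormalized pixel sum (the first element of q), and that a single pixel starts the recursion with q = [s, 0, 0]. All function names are illustrative.

```python
def inverse_scan_covariance(n):
    """Diagonal of T'_N = (S'_N^T S'_N)^-1 for the centred model (case 2), n >= 2."""
    g = 12.0 / (n * n * (n * n - 1))
    return (1.0 / (n * n), g, g)


def q_direct(block):
    """q_N = S'_N^T s_N evaluated pixel by pixel with centred coordinates."""
    n = len(block)
    c = (n - 1) / 2.0
    q = [0.0, 0.0, 0.0]
    for y in range(n):
        for x in range(n):
            s = block[y][x]
            q[0] += s
            q[1] += (x - c) * s
            q[2] += (y - c) * s
    return q


def q_recursive(block):
    """q_2N from the four subblock vectors plus update vectors, as in (22), (23)."""
    n = len(block)
    if n == 1:
        return [float(block[0][0]), 0.0, 0.0]
    h = n // 2  # subblock size N
    signs = [(-1, -1), (1, -1), (-1, 1), (1, 1)]  # subblocks 0..3 in Z-scan order
    subs = [[row[x0:x0 + h] for row in block[y0:y0 + h]]
            for y0 in (0, h) for x0 in (0, h)]
    q = [0.0, 0.0, 0.0]
    for (sx, sy), sub in zip(signs, subs):
        qm = q_recursive(sub)
        m = qm[0]  # unnormalized subblock mean M_N^(m), i.e. the pixel sum
        q[0] += qm[0]
        q[1] += qm[1] + sx * (h / 2.0) * m
        q[2] += qm[2] + sy * (h / 2.0) * m
    return q


def plane_parameters(block):
    """p_N = T'_N q_N, exploiting that T'_N is diagonal."""
    t = inverse_scan_covariance(len(block))
    return [ti * qi for ti, qi in zip(t, q_recursive(block))]


block = [[3 + 2 * x - y for x in range(4)] for y in range(4)]
print(q_recursive(block), q_direct(block))
print(plane_parameters(block))  # approximately [4.5, 2.0, -1.0] in centred coordinates
```

The test block is an exact plane, so the recovered gradient equals the generating slopes, while the mean parameter is the plane value at the block center rather than at the upper left corner.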

TABLE I
UNCONDITIONAL RECURSIVE BOTTOM-UP PLANE DECOMPOSITION OF THE SIGNAL {s(x, y)}. THE TRANSFORM MATRIX $T'_{2N} = (S_{2N}^{\prime T} S'_{2N})^{-1}$ IS A DATA-INDEPENDENT DIAGONAL MATRIX DEFINED BY (20)

For each 2 × 2 block do:
    Initialize: $M_1^{(0)} = s(x, y)$; $M_1^{(1)} = s(x + 1, y)$; $M_1^{(2)} = s(x, y + 1)$; $M_1^{(3)} = s(x + 1, y + 1)$
    Compute $q_2$ from (22)-(24)
    $p_2 = T'_2 q_2$
    Save: $q_2$.
For N = 2, 4, 8, . . . do:
    For each 2N × 2N block do:
        Compute $q_{2N}$ from (22)-(24)
        $p_{2N} = T'_{2N} q_{2N}$
        Save: $q_{2N}$.

E. Adaptivity

The RPD algorithm of Table I provides the unconditional recursive plane decomposition of the pixel information {s(x, y)}. It is clear that regions with low detail can be approximated with a sufficient accuracy at high levels of the decomposition (large blocks). But regions of high detail might require very low levels of the decomposition depending on the desired accuracy of approximation, or the acceptable distortion. Hence, some adaptivity must be incorporated in the RPD algorithm of Table I, in order to achieve a meaningful variation of the block size, according to the local detail. In this paper, we discuss two different approaches for a local distortion measure. The first is a minimum variance error criterion and for the second, we introduce a statistical maximum-norm criterion as an error measure.

1) Minimum Variance Detector: Consider again an image block of 2N × 2N pixels. We wish to establish a rule for deciding whether the four subblocks labeled 0, 1, 2, and 3 are homogeneous in terms of their plane parameters. This requires a hypothesis test of the following form:

Hypothesis $H_0$: The four subblocks are homogeneous in terms of their plane parameters. They can be merged to a single block of 2N × 2N pixels.

Hypothesis $H_1$: The four subblocks perform differently in terms of their plane parameters. They must be described by four different plane parameter sets.

To aid in the hypothesis test, we first require the approximation error energy of the entire block assuming first the “no-jump” hypothesis $H_0$. Assuming that the test is performed on the basis of the already quantized plane parameter vector $p_N^Q$, we may obtain from (14)

$$e'_N = s_N - S'_N \, p_N^Q. \tag{27}$$

The approximation error energy is then determined by

$$E'_N = e_N^{\prime T} e'_N = u_N - p_N^{Q\,T} \left[ 2 q_N - S_N^{\prime T} S'_N \, p_N^Q \right] \tag{28}$$

where $u_N = s_N^T s_N$ is the signal energy with the property

$$u_{2N} = \sum_{m=0}^{3} u_N^{(m)}. \tag{29}$$

When the considered 2N × 2N block covers a region of high detail, the model underlying hypothesis $H_1$ is assumed to converge much better than the model corresponding to hypothesis $H_0$. Thus, in the presence of high detail, the inequality

$$\sum_{m=0}^{3} E_N^{\prime (m)} \le E'_{2N} \tag{30}$$

holds, where the left and right sides of expression (30) represent the approximation error energies resulting from hypothesis $H_1$ and $H_0$, respectively. Now, from inequality (30), the minimum variance decision rule is obtained by introducing the threshold $\tau_N$ as follows:

$$E'_{2N} - \sum_{m=0}^{3} E_N^{\prime (m)} \;\; \underset{H_0\ (\text{merge})}{\overset{H_1}{\gtrless}} \;\; \tau_N. \tag{31}$$
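The decision rule (31) may be illustrated with a small Python sketch. It makes several simplifying assumptions that are not taken from the paper: plane parameters are left unquantized, q and u are evaluated directly from the pixels instead of recursively, the diagonal of $S_N^{\prime T} S'_N$ is taken as the inverse of the diagonal form of (20), and a tie at the threshold is resolved in favor of merging. The names `fit` and `should_merge` are hypothetical.

```python
def fit(block):
    """Return (p, E) for the centred plane fit of a square block, using (28)."""
    n = len(block)
    c = (n - 1) / 2.0
    # Diagonal of S'_N^T S'_N for case 2 (inverse of the diagonal of T'_N).
    d = (n * n, n * n * (n * n - 1) / 12.0, n * n * (n * n - 1) / 12.0)
    u = 0.0                 # signal energy u_N = s^T s
    q = [0.0, 0.0, 0.0]     # q_N = S'^T s
    for y, row in enumerate(block):
        for x, s in enumerate(row):
            u += s * s
            q[0] += s
            q[1] += (x - c) * s
            q[2] += (y - c) * s
    p = [qi / di for qi, di in zip(q, d)]
    energy = u - sum(pi * (2 * qi - di * pi) for pi, qi, di in zip(p, q, d))
    return p, energy


def should_merge(block, tau):
    """True selects H0 (merge): differential error energy does not exceed tau."""
    n = len(block)
    h = n // 2
    subs = [[row[x0:x0 + h] for row in block[y0:y0 + h]]
            for y0 in (0, h) for x0 in (0, h)]
    e_merged = fit(block)[1]
    e_split = sum(fit(sub)[1] for sub in subs)
    return e_merged - e_split <= tau


plane = [[x + 2 * y for x in range(4)] for y in range(4)]
edge = [[0, 0, 100, 100] for _ in range(4)]
print(should_merge(plane, 1.0), should_merge(edge, 1.0))
```

The smooth plane is merged because a single parameter set already describes it without error, whereas the vertical step edge produces a large differential error energy and is left as four subblocks.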

We may interpret the left-hand-side expression of (31) as the “differential increase” in error energy when four subblocks are merged to a larger block. As will be shown in the next section, this expression can therefore be exploited for deriving a global optimization strategy for the underlying quadtree segmentation structure.

2) Statistical Maximum-Norm Criterion: As a second decision rule, a maximum-norm criterion is introduced which considers the approximation error directly, rather than evaluating the approximation error energy. Such a criterion is justified from the desire to keep a certain amount of approximation error pixels absolutely smaller in magnitude than a certain predefined bound. To achieve this, we simply calculate the approximation error for each pixel inside the 2N × 2N block assuming hypothesis $H_0$. In those cases where the magnitude of the error exceeds a certain threshold $\tau_d$, a counter is incremented. After the entire block has been examined, the counter is compared with the total number of pixels in the block. In the case where the relative error count exceeds a certain value, the decision falls according to hypothesis $H_1$ and the decomposition is stopped at the next lower level (block size N × N). Otherwise, the level is increased by one and the procedure is repeated in turn. In the sequel, we give a summary of this maximum-norm criterion. Note that the error count procedure makes this decision criterion quite robust against pixel noise. We must note, however, that this procedure requires many more computations than the previously discussed minimum variance error criterion because of the need for an explicit computation of all pixel errors in each block at each level of the decomposition. (See Table II.)
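The error count procedure just described can be sketched as follows. This is a minimal illustration, assuming that the plane parameters are given in the same coordinate system that is used to index the block (origin at the upper left pixel) and that a relative count equal to the threshold still leads to a merge. The names `max_norm_merge`, `tau_d`, and `z_d` are illustrative.

```python
def max_norm_merge(block, p, tau_d, z_d):
    """Statistical maximum-norm rule: True selects H0 (merge)."""
    n = len(block)          # block side, i.e. 2N in the notation of the text
    a, b, c = p
    errcount = 0
    for y in range(n):
        for x in range(n):
            e = block[y][x] - a - b * x - c * y   # pixel approximation error
            if abs(e) > tau_d:
                errcount += 1
    # Relative error count compared with the threshold z_d (z_d <= 1).
    return errcount / (n * n) <= z_d


block = [[x + 2 * y for x in range(4)] for y in range(4)]
block[1][2] += 50  # a single noisy pixel
print(max_norm_merge(block, (0.0, 1.0, 2.0), tau_d=3, z_d=0.1))
print(max_norm_merge(block, (0.0, 1.0, 2.0), tau_d=3, z_d=0.05))
```

With a tolerated relative error count of 0.1 the single outlier (1 of 16 pixels) does not prevent merging, which reflects the robustness against pixel noise noted above; tightening the count threshold to 0.05 rejects the merge.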

F. Operations Count

Concluding this section, we summarize the operations count for the quadtree-RPD coder. The operations count includes the operations for the mean subtraction in the basic blocks (preprocessing), the operations count for the bottom-up recursive plane decomposition which is determined by the RPD algorithm listed in Table I, and finally the computation of the minimum variance decision rule according to (28) and (29).


Mean subtraction:
    ADD/pixel = 2.

RPD (Algorithm Table I):
    ADD/pixel = 15[1/4 + 1/16 + 1/64 + · · ·]
    MUL/pixel = 2[1/4 + 1/16 + 1/64 + 1/256 + · · ·].

Minimum variance decision:
    ADD/pixel = 9[1/4 + 1/16 + 1/64 + · · ·]
    MUL/pixel = 1 + 5[1/4 + 1/16 + 1/64 + 1/256 + · · ·].

The terms 1/4, 1/16, 1/64, · · · represent the contribution of block sizes N = 2, 4, 8, · · · to the operations count. This also reveals that higher levels of the decomposition contribute with a decreasing amount of computations to the overall operations count. For a principal block size of N = 16, as used in the experiments reported in Section V, one estimates a total operations count of roughly 3 multiplications per pixel and 8 additions per pixel including mean subtraction preprocessing.

TABLE II
STATISTICAL MAXIMUM-NORM DECISION RULE

Set:
    $\tau_d$: Threshold for pixel error.
    $z_d$: Relative error count threshold.
errcount = 0
For 0 ≤ x, y ≤ 2N − 1 do:
    $e(x, y) = s(x, y) - a_{2N} - b_{2N} x - c_{2N} y$
    if $|e(x, y)| > \tau_d$ then: errcount = errcount + 1
Decide:
    $$\frac{\text{errcount}}{4N^2} \;\; \underset{H_0}{\overset{H_1}{\gtrless}} \;\; z_d; \qquad z_d \le 1.$$

IV. OPTIMIZATION

When applied in image data compression, the RPD technique gives rise to a number of additional optimization subproblems. This section suggests solutions to the three major problems, namely, the problem of quantizer optimization, the problem of computing the rate-distortion characteristics and the optimal merging thresholds for a given quantizer set, and finally the problem of reconstruction of the decoded image from the quantized mean and gradient information available at a nonuniform (quadtree-structured) sampling grid.

A. Quantizer Optimization

One of the crucial points in RPD coding is the optimal choice of quantizer step sizes. There are two basic questions in the quantizer design. The first is the relationship between mean quantizer step size and gradient quantizer step size, which are dependent on the block size, and the second is the choice of the absolute values of mean or gradient quantizer step sizes at increasing values of the block size N. A meaningful relationship between the mean and gradient quantizer step sizes at block size N can be derived analytically.

Fig. 6. Evaluation of plane gradient quantization error at different pixel locations. Example shown for the N = 8 problem. Only the pixels marked by ○ and △ must be evaluated due to principal symmetry.

For this purpose, consider the average contribution of quantization error induced by quantization of the gradient parameters [b, c] in a block of size N × N pixels. Without loss of generality, case 2 (linear model with the origin located in the center of the block) is considered. Clearly, the contribution of gradient quantization error to each pixel inside the block is proportional to the distance of the pixel from the origin. The problem has four principal axes of symmetry formed by the axes of the coordinate system and the 45° rotated coordinate axes. The situation is illustrated in Fig. 6. Without loss of generality, we assume that ○ and △ are members of the nonnegative sets of address vectors

$$\{\circ : x \ge 0,\ y \ge 0,\ y > x\} \tag{32a}$$

$$\{\triangle : x \ge 0,\ y \ge 0,\ y = x\}. \tag{32b}$$

Next, suppose that the step size of the uniform scalar quantizer of the plane gradient parameters [b, c] is $2\delta_g$. When this step size is small compared to the gradient parameter variance, one can assume that the quantization error of the gradient parameters is uniformly distributed with probability density function

$$p(\delta) = \begin{cases} \dfrac{1}{2\delta_g} & \text{for } -\delta_g \le \delta \le \delta_g \\ 0 & \text{otherwise.} \end{cases} \tag{33}$$

The quantization errors $\delta_b$ and $\delta_c$ of the gradient parameters [b, c] contribute to the pixel error $\epsilon(x, y)$ at location {x, y} according to a simplified linear model of the form

$$\epsilon(x, y) = x\delta_b + y\delta_c. \tag{34}$$

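Note that the quantities $x\delta_b$ and $y\delta_c$ are again uniformly distributed, where

$$p(x\delta_b) = \begin{cases} \dfrac{1}{2x\delta_g} & \text{for } -x\delta_g \le x\delta_b \le x\delta_g \\ 0 & \text{otherwise} \end{cases} \tag{35}$$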


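and an analogous distribution exists in the case of $y\delta_c$. The probability density function of $\epsilon(x, y)$ is then obtained as the convolution of $p(x\delta_b)$ and $p(y\delta_c)$

$$p(\epsilon(x, y)) = p(x\delta_b) * p(y\delta_c) = \begin{cases} p_1(\epsilon) & \text{for } 0 \le |\epsilon| \le (y - x)\delta_g \\ p_2(\epsilon) & \text{for } (y - x)\delta_g \le |\epsilon| \le (y + x)\delta_g \\ 0 & \text{for } |\epsilon| > (y + x)\delta_g \end{cases} \tag{36}$$

with the pixel coordinates x and y restricted to the half-quarterplane {x, y: y ≥ x, y > 0, x > 0}, as shown in Fig. 6. The quantities $p_1(\epsilon)$ and $p_2(\epsilon)$ determine the constant and the linear part of the symmetric probability density function of $\epsilon(x, y)$ according to

$$p_1(\epsilon) = \frac{1}{2y\delta_g} \tag{37a}$$

$$p_2(\epsilon) = \frac{1}{4xy\delta_g}\left[x + y - \frac{|\epsilon|}{\delta_g}\right]. \tag{37b}$$

Then the distortion generated at pixel location {x, y} due to gradient quantization is expressed as

$$E(x, y, \delta_g) = 2\int_{0}^{(y-x)\delta_g} \epsilon^2 p_1(\epsilon)\,d\epsilon + 2\int_{(y-x)\delta_g}^{(y+x)\delta_g} \epsilon^2 p_2(\epsilon)\,d\epsilon = \delta_g^2 K(x, y) \tag{38}$$

where

$$K(x, y) = \frac{(y - x)^3}{3y} + \frac{1}{2xy}\left[\frac{(y + x)^4}{12} + \frac{(y - x)^4}{4} - \frac{(y + x)(y - x)^3}{3}\right]. \tag{39}$$

The average distortion in a block of N × N pixels is computed by simply evaluating the partial distortions induced at pixel locations ○ (32a) and △ (32b). This can be accomplished by the following recurrence relations:

$$\begin{aligned} &\text{for } x = 0.5, 1.5, 2.5, \ldots, N/2 - 1.5: \\ &\qquad \text{for } y = x + 1, x + 2, \ldots, N/2 - 0.5: \quad E_1 = E_1 + K(x, y) \\ &\text{for } x = 0.5, 1.5, 2.5, \ldots, N/2 - 0.5: \quad E_2 = E_2 + K(x, x) \end{aligned} \tag{40}$$

which allow the determination of the average distortion $\bar{E}_g$ caused by plane gradient quantization as follows:

$$\bar{E}_g = \frac{\delta_g^2}{N^2}\left(8E_1 + 4E_2\right). \tag{41}$$

On the other hand, it can be readily verified that the average distortion $\bar{E}_m$ induced by mean parameter quantization can be expressed in terms of the mean parameter quantization step size $2\delta_m$ as

$$\bar{E}_m = \frac{\delta_m^2}{3}. \tag{42}$$

Hence the ratio of the gradient and mean quantizer step sizes for a given block size N can be posed as a function of the distortion ratio $\bar{E}_g/\bar{E}_m$

$$\frac{\delta_g}{\delta_m} = \sqrt{\frac{N^2}{3\,(8E_1 + 4E_2)}\,\frac{\bar{E}_g}{\bar{E}_m}} \tag{43}$$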

where $E_1$ and $E_2$ are functions of the block size N, according to (40). Table III lists some explicit values of the quantizer step size ratio for different values of N, as suggested by the described procedure, under the assumption that each component of the plane parameter vector contributes the same portion of quantization error to the overall distortion, so that $\bar{E}_g = 2\bar{E}_m$ (balanced quantization).

Once the optimal quantizer step size ratios are known, it remains to determine the absolute values of the mean quantizer step size for all block sizes appearing in the decomposition. The gradient quantizer step sizes are then completely determined by the step size ratios of Table III. For large blocks (e.g., N ≥ 8), the choice of the mean quantizer step size is mainly an experimental procedure and relies on a subjective evaluation of the resulting picture quality. The variation of the quantizer step size in the large blocks has little influence on the overall rate. The main contribution to the overall rate comes from the 2 × 2 and 4 × 4 blocks. For these block sizes, the quantization step size has been optimized by an automatic procedure that calculates the rate-distortion characteristics of the quadtree-RPD coder for a given quantization. This procedure will be described in the next subsection. The rate-distortion curves are computed for a number of different choices of the mean parameter quantization step sizes in the 2 × 2 and 4 × 4 blocks. Using these curves, the optimal quantizer at a desired compression can be determined. In general, the optimization shows that the parameters of small blocks must be quantized more coarsely than those of larger blocks. This result is also reasonable in terms of the subjective image quality, since small block sizes correspond to high-detail regions in the image where errors are less visible.
Some appropriately designed quantizer sets will be presented later in the paper, and the corresponding rate-distortion characteristics of the entire quadtree-RPD will be discussed.
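The step size ratios of Table III can be reproduced numerically. The following minimal sketch accumulates the partial distortions over the ○ and △ pixel sets in the manner of the recurrences (40). It rests on two assumptions that are stated here rather than taken from the text: the per-pixel distortion is the second moment of the trapezoidal density obtained by convolving the two uniform densities, and the mean-quantization distortion is $\delta_m^2/3$, the variance of an error uniformly distributed on $[-\delta_m, \delta_m]$. The function names are illustrative only.

```python
import math


def k_factor(x, y):
    """Second moment of the pixel error at (x, y), y >= x > 0, per unit delta_g**2.

    Assumption: the error density is the trapezoid obtained by convolving two
    uniform densities of half-widths x*delta_g and y*delta_g.
    """
    flat = (y - x) ** 3 / (3.0 * y)  # constant part of the density
    ramp = ((y + x) ** 4 / 12.0 + (y - x) ** 4 / 4.0
            - (y + x) * (y - x) ** 3 / 3.0) / (2.0 * x * y)  # linear flanks
    return flat + ramp


def step_size_ratio(n, distortion_ratio=2.0):
    """Return delta_g / delta_m for an n x n block (n even, origin at block center)."""
    coords = [i + 0.5 for i in range(n // 2)]  # pixel centers 0.5, 1.5, ...
    e1 = sum(k_factor(x, y) for x in coords for y in coords if y > x)  # off-diagonal
    e2 = sum(k_factor(x, x) for x in coords)                          # diagonal
    # 8 symmetric copies of each off-diagonal pixel, 4 of each diagonal pixel.
    eg_per_dg2 = (8.0 * e1 + 4.0 * e2) / (n * n)
    em_per_dm2 = 1.0 / 3.0  # assumed: uniform error on [-delta_m, delta_m]
    return math.sqrt(distortion_ratio * em_per_dm2 / eg_per_dg2)


if __name__ == "__main__":
    for n in (2, 4, 8, 16):
        print(n, round(step_size_ratio(n), 3))
```

Under these assumptions the trapezoid moment collapses to $(x^2 + y^2)/3$, so the ratio has the closed form $\sqrt{6r/(N^2 - 1)}$ for a distortion ratio r. For r = 2 this gives approximately 2.0, 0.894, 0.436, and 0.217 for N = 2, 4, 8, 16, which agrees closely with Table III.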

TABLE III
QUANTIZER STEP SIZE RATIOS AS A FUNCTION OF THE BLOCK SIZE N FOR A DISTORTION RATIO OF $\bar{E}_g/\bar{E}_m = 2$

N      $\delta_g/\delta_m$
2      2.0
4      0.894
8      0.436
16     0.216

B. Rate-Distortion Characteristics and Threshold Optimization

The decision rules for the merging of four subblocks, as described in the previous section, are basically local tests of the image data inside a block of 2N × 2N pixels. Nevertheless, in the case of the minimum variance decision rule (31), one can devise a procedure that allows the optimization of the merging thresholds $\{\tau_N, N = 2, 4, 8, \ldots\}$ involved in the test (31) so that, when the optimized thresholds are used, the coding result will be optimal in that, for a given number of merged subblocks, the overall increase $\Delta E^2$ in approximation error is minimal. Clearly, such an optimization strategy makes it necessary to find, at a given state of the decomposition, the particular node which, in the entire picture, is the node with the smallest increase $\Delta E^2$ in approximation error energy,

$$\Delta E^2 = E_{2N}^2 - \sum_{m=0}^{3} E_N^{2\,(m)} \tag{44}$$

where $\Delta E^2$ is recognized as precisely the left-hand-side expression of (31). Clearly, each time an additional node is merged, this will not only result in an increase in the overall approximation error energy, but also in some decrease of the overall rate. This decrease in rate can be formulated as the differential entropy $\Delta R$

$$\Delta R = \begin{cases} R_{2N} - \sum_{m=0}^{3} R_N^{(m)} \ \text{[bit]} & \text{for } N = 2 \\ R_{2N} - \sum_{m=0}^{3} R_N^{(m)} - 4 \ \text{[bit]} & \text{otherwise} \end{cases} \tag{45}$$

where

$$R_N = p(a_N)\log_2[p(a_N)] + p(b_N)\log_2[p(b_N)] + p(c_N)\log_2[p(c_N)] \tag{46}$$

and the p(·)'s are the probabilities of the quantized plane parameters of block size N × N. It remains to note that one has to take into account that each merging operation cuts the quadtree code by 4 b (see again (45)), except in the lowest decomposition level. This way, one can define a "steepest descent merge" as the sequence of merging of nodes of increasing value of ρ in the rate-distortion diagram, where

$$\rho = \Delta E^2 / \Delta R. \tag{47}$$

Obviously, at a certain stage of the decomposition the best decision is to merge the particular node which, among all possible nodes, offers the smallest relative increase in distortion, i.e., the node associated with the minimum value of ρ (steepest descent). In practice, one can assume that $\Delta R$ is approximately constant. In this case, it turns out that the true steepest descent merge can be approximated quite closely by only seeking the node that minimizes $\Delta E^2$. Such approximate steepest descent merging diagrams (rate-distortion curves) are displayed in Fig. 7 for the commonly used 512 × 512 × 8 b test pictures Lena and Boats. The diagrams have been computed for several different quantizer sets. Only the results for the quantizer sets Q8, Q12, Q16, and Q24 are explicitly plotted in the diagrams of Fig. 7. The step sizes of these quantizer sets are listed in Table IV. Each curve represents the MSE performance of the RPD coder for a particular quantizer set. When the plane parameters are entropy coded, the true coder behaves very closely to the performance predicted by these curves. When Huffman coding was used, we observed an increase of roughly 1.5% of the true rate compared to the estimated rate through the redundancy of the Huffman code. But other forms of entropy coding, such as arithmetic coding [39], could be used as well.

Fig. 7. Overall rate-distortion characteristics of the quadtree-RPD coder (MSE versus rate in bit/pixel) including bits for quadtree structure code and PCM-coded 16 × 16 block means when only a simple reconstruction by the linear planes is assumed. Parameter is the differential increase $\Delta E^2$ in approximation error energy. Curves shown for the quantizers Q8, Q12, Q16, and Q24, according to Table IV. (a) Rate-distortion characteristics for test picture Lena. Results of other coding techniques (see Table VII) marked for comparison. See also the discussion in Section V. (b) Rate-distortion characteristics for test picture Boats.

Some more discussion about the rate-distortion curves of Fig. 7 seems to be in order. The curves start at an initial
state of the decomposition where the entire picture is represented by 2 × 2 blocks. When these blocks are successively merged to larger blocks, we move left in the rate-distortion diagram. This is not surprising since the 2 × 2 parameters are more coarsely quantized than the parameters of the 4 × 4 blocks. Therefore, merging in the homogeneous regions even reduces the overall distortion. When a sufficient number of 2 × 2 blocks has been merged, the coder approaches a state where blocks with more detail must be merged. At this state of the merging process, there exists a number of blocks where the merging of 2 × 2 blocks to larger 4 × 4 blocks does not induce any additional error. The worse approximation by the 4 × 4 blocks is just compensated by the coarser quantization of parameters in the 2 × 2 blocks. Here, the rate-distortion curve drops down until more detailed blocks are reached where merging induces an additional error. Subsequently, the curve turns to the right. Clearly, the described effect occurs not only at the transition of 2 × 2 to 4 × 4 blocks, but also at the higher levels of the decomposition. Therefore, the rate-distortion curve of the quadtree-RPD coder drops down several times until a final smooth descent is reached. The characteristic of the rate-distortion curve depends on the particular quantizer set involved. The rate-distortion characteristic is therefore important for evaluating the effect of the different quantizer sets. In particular, it turned out that a strategy which simply cuts the mean quantization step size by a factor of one half when the block size increases by a factor of two is generally near optimal.

TABLE IV
UNIFORM QUANTIZER SETS Q8-Q24 FOR MEAN AND GRADIENT QUANTIZATION AS USED IN THE EXPERIMENTS

Quantizer   Step Size   N = 2   N = 4   N = 8   N = 16
Q8          2δ_m         8.0     4.0     2.0     1.0
            2δ_g        16.0     3.55    0.85    0.25
Q12         2δ_m        12.0     6.0     3.0     1.0
            2δ_g        24.0     5.25    1.30    0.25
Q16         2δ_m        16.0     8.0     4.0     2.0
            2δ_g        32.0     7.15    1.75    0.45
Q24         2δ_m        24.0    12.0     6.0     3.0
            2δ_g        48.0    10.95    2.65    0.65
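To make the role of the step sizes in Table IV concrete, the sketch below applies a uniform scalar quantizer to one plane parameter vector [a, b, c]. The step sizes are those listed for Q8. The midtread rounding convention, the function names, and the sample values are assumptions made for illustration and are not taken from the coder described here.

```python
import math

# Step sizes (2*delta_m, 2*delta_g) of quantizer set Q8, indexed by block size N.
Q8 = {2: (8.0, 16.0), 4: (4.0, 3.55), 8: (2.0, 0.85), 16: (1.0, 0.25)}


def quantize(value, step):
    """Uniform midtread quantizer (assumed convention): return (index, reconstruction)."""
    index = math.floor(value / step + 0.5)
    return index, index * step


def quantize_plane(params, n, quantizer_set=Q8):
    """Quantize [a, b, c]: mean a with the mean step, gradients b and c with the gradient step."""
    mean_step, grad_step = quantizer_set[n]
    a, b, c = params
    return [quantize(a, mean_step), quantize(b, grad_step), quantize(c, grad_step)]


if __name__ == "__main__":
    # Hypothetical plane parameters of a 4 x 4 block.
    print(quantize_plane((10.3, 1.2, -0.4), 4))
```

With this convention the reconstruction error of each parameter never exceeds half the step size, i.e., $\delta_m$ for the mean and $\delta_g$ for the gradients.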
When the differential increase in approximation energy $\{\Delta E_N^2, N = 2, 4, 8, \ldots\}$ for each block size N is monitored during the successive merging process, one obtains the progress of the corresponding optimal thresholds $\{\tau_N, N = 2, 4, 8, \ldots\}$ simply by setting $\tau_N = \Delta E_N^2$, which causes the true coder based on the minimum variance decision rule (31) to perform identically to the optimization process. This way, one obtains the optimal thresholds for each block size as a function of the desired rate. Examples of the optimal thresholds versus bit rate are shown in Fig. 8.

Fig. 8. Optimal merging thresholds versus desired bit rate for the minimum variance decision rule. $\tau_N$ is the threshold used in the decision whether four subblocks of size N × N pixels can be merged. (a) Thresholds shown for test picture Lena and quantizer Q8. (b) Thresholds shown for test picture Boats and quantizer Q12.

Clearly, when the thresholds are chosen as suggested by these curves, the true coder will produce exactly the same quadtree structures as produced by the optimization process itself. This surprising statement is easily verified by noting that the merging process is completely determined by only three conditions: 1) the bottom-up direction, 2) the quadtree nesting law, which allows a node to change into a leaf only when this node already has four leaves, and 3) the comparison of the associated increase in approximation error energy with a prescribed threshold according to condition (31). Based on these rules, it can be shown that the result of a decomposition depends solely on the merging threshold and is therefore invariant to the specific order in which the nodes are successively merged.
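The approximate steepest descent merge can be sketched with a priority queue. The example below assumes that the increase $\Delta E^2$ for merging the four children of every quadtree node has been precomputed and stored in a dictionary keyed by (level, row, column), with level 1 denoting the parents of the smallest blocks. This data layout and the function name are hypothetical; the sketch only illustrates the bottom-up direction and the nesting law, not the coder's actual implementation.

```python
import heapq


def steepest_descent_merge(cost, max_merges):
    """Merge nodes in order of increasing cost, bottom-up.

    cost maps (level, row, col) -> increase in error energy when the four
    children of that node are merged. A node becomes a candidate only after
    all four of its children are leaves (quadtree nesting law).
    """
    merged = set()
    # Level-1 nodes sit directly above the smallest blocks, so they start as candidates.
    heap = [(c, node) for node, c in cost.items() if node[0] == 1]
    heapq.heapify(heap)
    order = []
    while heap and len(order) < max_merges:
        c, node = heapq.heappop(heap)
        merged.add(node)
        order.append((node, c))
        level, row, col = node
        parent = (level + 1, row // 2, col // 2)
        if parent in cost:
            siblings = [(level, 2 * parent[1] + dr, 2 * parent[2] + dc)
                        for dr in (0, 1) for dc in (0, 1)]
            # The parent may be merged only once all four children are leaves.
            if all(s in merged for s in siblings):
                heapq.heappush(heap, (cost[parent], parent))
    return order


if __name__ == "__main__":
    # Toy costs for one block with two levels above the smallest blocks.
    toy_cost = {(1, 0, 0): 1.0, (1, 0, 1): 5.0, (1, 1, 0): 2.0, (1, 1, 1): 3.0,
                (2, 0, 0): 0.5}
    for node, c in steepest_descent_merge(toy_cost, max_merges=5):
        print(node, c)
```

Note that the level-2 node in the toy example has the smallest cost but is merged last, because the nesting law admits it only after its four children have been merged.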

C. Reconstruction from Nonuniformly Sampled Mean and Gradient Information

In the decoder of an RPD-based scheme, the decoded picture is available in terms of the quantized mean and gradient information sampled at a nonuniform (quadtree-structured) grid. The simplest way is to reconstruct the image by the linear plane approximation (2). Clearly, in a next step, one could connect adjacent blocks by appropriate interpolation techniques. Such postprocessing takes advantage of the a priori knowledge that gradients seldom change abruptly at the boundaries of larger blocks. In this section, we describe a simple cubic interpolation scheme which is conceptually well suited to RPD and helps in making block boundaries less visible when very low bit-rate coding is an issue.

Fig. 9. Boundary region between two blocks.

Consider the boundary region of width 2 pixels (one-dimensional case here). The approximation planes in the blocks left and right of the boundary region are uniquely determined by their boundary pixel value and the first-order derivative of the approximation function. See Fig. 9. Now assume that the luminance function in the boundary region satisfies a cubic law

$$s(x) = ax^3 + bx^2 + cx + d \tag{48}$$

with the first-order derivative

$$\partial s(x)/\partial x = 3ax^2 + 2bx + c. \tag{49}$$
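A minimal sketch of how the cubic (48) can be fitted across a boundary region, assuming the known boundary pixels sit at x = 0 and x = 3 (as drawn in Fig. 9) and the two interior pixels at x = 1 and x = 2 are to be filled in. The function and argument names are illustrative; the coefficients follow from matching the value and the slope of the cubic at both ends.

```python
def cubic_boundary_fill(s0, ds0, s3, ds3):
    """Return (s(1), s(2)) of the cubic matching value and slope at x = 0 and x = 3."""
    c = ds0                                  # slope at x = 0
    d = s0                                   # value at x = 0
    b = (s3 - ds3 - 2.0 * c - d) / 3.0       # from s(3) - s'(3) = 3b + 2c + d
    a = (ds3 - c - 6.0 * b) / 27.0           # from s'(3) = 27a + 6b + c

    def s(x):
        return ((a * x + b) * x + c) * x + d  # Horner evaluation of the cubic

    return s(1.0), s(2.0)


if __name__ == "__main__":
    # Two flat blocks of luminance 10 and 40: the fill is a smooth step between them.
    print(cubic_boundary_fill(10.0, 0.0, 40.0, 0.0))
    # Both blocks on the same plane s = 2x + 5: the fill stays on that plane.
    print(cubic_boundary_fill(5.0, 2.0, 11.0, 2.0))
```

When both neighboring blocks lie on the same plane, the cubic and quadratic coefficients vanish and the interpolation reduces to the plane itself, which is the behavior one would want in homogeneous regions.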

The cubic law is conceptually well suited to RPD since it is uniquely determined by the zero and first-order derivatives at the boundaries of the interpolating curve in a certain interval. Since the pixel values and the first-order derivatives are explicitly known in RPD coded blocks, we can derive the parameters of (48) by evaluating the expressions

$$\partial s(x)/\partial x\,|_{x=0} = c \tag{50a}$$

$$\partial s(x)/\partial x\,|_{x=3} = 27a + 6b + c \tag{50b}$$

$$s(0) = d \tag{50c}$$

$$s(3) = 27a + 9b + 3c + d \tag{50d}$$

which gives

$$b = [s(3) - \partial s(x)/\partial x\,|_{x=3} - 2c - d]/3 \tag{51a}$$

$$a = [\partial s(x)/\partial x\,|_{x=3} - c - 6b]/27. \tag{51b}$$

Fig. 10. Control bit mask for interpolation. Boundaries of 1 pixel width are not interpolated. (a) Bit mask after decoding (prior to interpolation). (b) Bits set by horizontal interpolation run. (c) Bits set by final vertical interpolation run.

The interpolation defined by (48), (50a, c), and (51a, b) can be carried out in a separable procedure. During the decoding process, we generate a bit mask which marks the boundary regions (see Fig. 10). After the decoding process has been completed, the interpolation process operates first in the horizontal direction. When a boundary region of width 2 pixels is found, two boundary pixels are interpolated and the corresponding control bits in the bit mask are set. In the subsequent vertical procedure, the algorithm performs identically, but uses the information provided by the horizontal run. Alternatively, one could start with the vertical interpolation followed by the horizontal procedure. This gives virtually the same result. Fig.
10 illustrates the control procedure underlying this interpolation algorithm. The next section shows coding results for the two types of reconstruction: the simple reconstruction from linear planes (2), and the results from an extended version where the block boundaries were smoothed by interpolation using the cubic polynomial (48).

V. EXPERIMENTAL RESULTS AND COMPARISONS

The quadtree-RPD coder has been tested for a set of commonly used images. This section reports the results obtained for the monochrome test pictures Lena and Boats, both of size 512 × 512 pixels with a resolution of 8 b. The two test pictures are rather different in their characteristics. This allows a better evaluation of the coding technique. In the second part of this section, the results obtained by the quadtree-RPD coder are compared with a set of representative coding schemes which have been recently described in the literature.

A. Experimental Results

In the presented quadtree-RPD coder, the picture is divided into blocks of 16 × 16 pixels prior to processing. These blocks are treated as independent subimages which can be further subdivided, where the smallest possible block size is N = 2. Prior to processing, the mean of the 16 × 16 subimage is calculated and subtracted from the subimage. An 8-b PCM codeword is used to encode the 16 × 16 block mean. In the described coder, the total information for a 16 × 16 subimage can therefore be divided into three components: the 8-b PCM codeword for the mean, the quadtree structure code, and a string of Huffman codewords for the PP's. The signal in the blocks of different sizes is represented by plane parameter vectors with three components (mean and gradient parameters). These components are independently Huffman coded. Two independent Huffman code tables are related to each block size. The number of codewords varies with the block size (see Tables V and VI for details).

TABLE V
CODER PARAMETERS AND RESULTS FOR LENA AT 0.5 b/PIXEL. CODING RESULTS SHOWN FOR QUANTIZER Q8

N     Number of   Number of Codewords   Number of Codewords   Total Number of   Total Number of
      Blocks      MEAN                  GRADIENT              Bits for MEAN     Bits for GRADIENT
2     3052        45                    19                    14338             18696
4     5101        69                    32                    25057             32731
8     2030        59                    36                     9128             12059
16     150         2                    14                      150               854

Total number of bits for mean and gradient information:    113013
Total number of bits for quadtree structure code:           10384
Total number of bits for 16 × 16 block mean PCM coded:        8192
Total:                                                      131589 b = 0.502 b/p.
MSE (not interpolated): 38.33     PSNR (not interpolated): 32.29 dB
MSE (interpolated):     37.05     PSNR (interpolated):     32.44 dB

TABLE VI
CODER PARAMETERS AND RESULTS FOR BOATS AT 0.5 b/PIXEL. CODING RESULTS SHOWN FOR QUANTIZER Q12

N     Number of   Number of Codewords   Number of Codewords   Total Number of   Total Number of
      Blocks      MEAN                  GRADIENT              Bits for MEAN     Bits for GRADIENT
2     4376        20                     8                    16016             18128
4     8454        33                    18                    31869             36854
8     1705        28                    12                     3462              3989
16       1         1                     2                        1                 2

Total number of bits for mean and gradient information:    110321
Total number of bits for quadtree structure code:           14664
Total number of bits for 16 × 16 block mean PCM coded:        8192
Total:                                                      133177 b = 0.508 b/p.
MSE (not interpolated): 25.72     PSNR (not interpolated): 34.03 dB
MSE (interpolated):     23.76     PSNR (interpolated):     34.37 dB
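The entries of Tables V and VI can be cross-checked with a few lines of arithmetic: the blocks of all sizes should tile the 512 × 512 image, and the three bit totals should add up to the reported rate. The sketch below uses only numbers quoted in the tables; the variable and function names are illustrative.

```python
PIXELS = 512 * 512

# Per block size N: (number of blocks, bits for MEAN, bits for GRADIENT).
LENA_Q8 = {2: (3052, 14338, 18696), 4: (5101, 25057, 32731),
           8: (2030, 9128, 12059), 16: (150, 150, 854)}
BOATS_Q12 = {2: (4376, 16016, 18128), 4: (8454, 31869, 36854),
             8: (1705, 3462, 3989), 16: (1, 1, 2)}


def summarize(table, quadtree_bits, pcm_bits=8192):
    """Return (pixels covered, parameter bits, total bits, rate in bit/pixel)."""
    covered = sum(count * n * n for n, (count, _, _) in table.items())
    param_bits = sum(mean + grad for _, mean, grad in table.values())
    total = param_bits + quadtree_bits + pcm_bits
    return covered, param_bits, total, total / PIXELS


if __name__ == "__main__":
    print(summarize(LENA_Q8, quadtree_bits=10384))
    print(summarize(BOATS_Q12, quadtree_bits=14664))
```

The default of 8192 b for the PCM-coded block means corresponds to one 8-b codeword for each of the 32 × 32 subimages of size 16 × 16.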

The coding results for Lena at 0.5 b/pixel are summarized in Table V, and Fig. 11 shows the corresponding pictures. The coding results for Boats at 0.5 b/pixel are summarized in Table VI. Fig. 12 shows the corresponding pictures. Fig. 13 finally shows enlarged versions of the original pictures and the coded pictures. Throughout the tests we used the minimum variance decision rule (31), where the thresholds were chosen as suggested by the threshold curves shown in Fig. 8. In comparison to the maximum-norm criterion, one finds that the minimum variance decision rule is a little more sensitive to variation of the thresholds. On the other hand, when the thresholds are chosen in an optimal way, one obtains the lowest mean-square error. The maximum-norm makes the quadtree-RPD coder more robust against changes in the picture characteristics. Fig. 7 reveals that in the case of Lena, the quantizer Q8 gives the best result at 0.5 b/pixel, whereas in the case of the test picture Boats, Q12 is the preferable quantizer at the same rate. All important coder parameters and results from the experiments are listed in Tables V and VI. Both the mean-square error (MSE) and the peak-to-peak signal-to-noise ratio (PSNR)

$$\text{PSNR} = 10 \log_{10} \frac{255^2}{\text{MSE}} \quad \text{[dB]} \tag{52}$$

are shown.
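As a quick check of (52), the sketch below converts the MSE values reported in Tables V and VI into PSNR, assuming a base-10 logarithm and a peak value of 255 for 8-b pictures. The function name is illustrative.

```python
import math


def psnr_db(mse, peak=255.0):
    """Peak signal-to-noise ratio in dB for 8-b images (peak value 255)."""
    return 10.0 * math.log10(peak * peak / mse)


if __name__ == "__main__":
    for label, mse in [("Lena, planes only", 38.33), ("Lena, interpolated", 37.05),
                       ("Boats, planes only", 25.72), ("Boats, interpolated", 23.76)]:
        print(label, round(psnr_db(mse), 2))
```

The computed values agree with the tabulated PSNR figures to within about 0.01 dB.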


Fig. 11. (a) Original test image Lena (512 × 512 × 8 b). (b) Coded image Lena at 0.5 b/pixel without interpolation. (c) Coded image Lena at 0.5 b/pixel with cubic interpolation. (d) Quadtree segmentation structure. Block size ranges from N = 2 to N = 16. (e) Control bit mask for interpolation. Black raster marks boundary regions. (f) Boundary regions of 1 pixel width: not interpolated.

Throughout all of the tests, we used the linear model referred to as case 2, namely, the model where the origin of the coordinate system was assumed to be in the center of a block. Using this model, we obtained much improved pictures when compared to earlier tests based on case 1 [38]. The improvement in subjective image quality was much more noticeable than the pure MSE improvement. Another improvement is achieved by incorporating the cubic interpolation of neighbor blocks. The interpolated version of Lena coded at 0.5 b/pixel can hardly be distinguished from the original by the nonspecialist. Blocking artifacts are not visible, and the edges are reproduced accurately. A comparison of the enlarged original (Fig. 13(a)) and the enlarged coded picture with interpolation (Fig. 13(c)) reveals that much of the noise in the original picture has been removed during the coding process. This is clearly a consequence of the inherent assumption in RPD coding that homogeneous areas are to be approximated by flat planes. It seemed that the coder is not very sensitive to pixel noise. On the other hand, the highly adaptive scheme allows the accurate representation of fine detail in the pictures.

Similar results were found in the case of the test picture Boats. In that case, one observes some artifacts around sharp luminance discontinuities in the image reconstructed from linear planes only. See the enlarged picture, Fig. 13(e). Here, it was interesting to see that the interpolation had much more effect than in the case of the simpler picture Lena. The remaining blocking artifacts around sharp luminance discontinuities were much reduced by the application of cubic interpolation, as seen in Fig. 13(f). But the fine detail in the picture is still reproduced quite satisfactorily, as becomes apparent from a comparison of Fig. 13(d) (original Boats) and Fig. 13(f) (coded image Boats with interpolation).

Fig. 12. (a) Original test image Boats (512 × 512 × 8 b). (b) Coded image Boats at 0.5 b/pixel without interpolation. (c) Coded image Boats at 0.5 b/pixel with cubic interpolation. (d) Quadtree segmentation structure. Block size ranges from N = 2 to N = 16. (e) Control bit mask for interpolation. Black raster marks boundary regions. (f) Boundary regions of 1 pixel width: not interpolated.

B. Comparisons

The comparison of different coding techniques is generally a difficult task. In the following, we compare the MSE values of several representative coding techniques with the quadtree-RPD coder of this paper. For this purpose, the coding techniques are listed in Table VII, including the reference, the name of the first author, the coding technique, and the results taken from the papers. These results are shown for Lena (512 × 512) with 8-b resolution, which is the most commonly used test picture in the field. Results for the more difficult picture Boats are seldom reported, and therefore the comparisons had to be restricted to Lena only. In order to enable a better comparison with the quadtree-RPD coder, the results reported in Table VII were additionally marked as points in the rate-distortion diagram of the quadtree-RPD coder as shown in Fig. 7(a). Note that in this diagram, the curves represent the result obtainable by the quadtree-RPD coder when only a simple reconstruction of the image by the linear planes (without additional interpolation) is used.

Fig. 13. (a) Original test image Lena (enlarged). (b) Coded Lena without interpolation (Fig. 11(b) enlarged). (c) Coded Lena with cubic interpolation (Fig. 11(c) enlarged). (d) Original test image Boats (enlarged). (e) Coded Boats without interpolation (Fig. 12(b) enlarged). (f) Coded Boats with cubic interpolation (Fig. 12(c) enlarged).

VI. DISCUSSION

A new variable block size image coding algorithm based on the quadtree data structure and a linear model of the luminance function inside the blocks of a variable size has been presented. The new method consistently employs recursive processing schemes which reduce the complexity to only 3 multiplications and 8 additions per pixel, including mean subtraction, quadtree-structured recursive plane decomposition, and hypothesis testing. Although the underlying model of the luminance function is surprisingly simple, the proposed quadtree-RPD coder compares well with a number of much more sophisticated variable block size image coding algorithms proposed in the literature. Especially the fact that the new coder performs almost identically to some recently introduced variable block size image coding algorithms based on vector quantization [14], [43], [45] seems instructive. One possible explanation for this surprising similarity in the MSE performance could be given by the observation that, indeed, large areas in natural images are represented by plane-like patterns in most VQ codebooks. For these reasons, the linear plane assumption turned out to be much less restrictive than expected.

We have discussed all important aspects of quadtree-RPD coding, such as optimal parameter quantization, rate-distortion characteristics, decomposition threshold optimization, and interpolative reconstruction. In its present implementation, the quadtree-RPD coder seems to be an appropriate coding technique in the range of 0.5-1.5 b/pixel. From the experimental results, it seems that in the case of natural images, the centroids of the multidimensional quantization lattice underlying the quadtree-RPD method are close to the centroids of the quantization lattice obtained from a vector quantizer design. This conclusion, on the other hand, suggests that the quadtree-RPD method can possibly be extended to obtain a new type of fast search procedure for vector quantization: one first applies an adaptive quadtree recursive plane decomposition on a given image. The plane parameters are then used as initial addresses in a structured codebook.

TABLE VII
SEVERAL IMAGE CODING ALGORITHMS AND THEIR MEAN-SQUARE ERROR PERFORMANCE. RESULTS SHOWN FOR TEST PICTURE LENA (512 × 512 × 8 b)

Name of First Author   Coding Technique          Mean-Square Error (MSE)   Bit Rate (bit/pixel)
V. Ramamoorthy         VQ hum. vis. sys.         83.0                      1.67
T. Kim                 SMVQ                      65.0                      0.25
                       OMVQ                      29.0                      0.5
D. J. Vaisey           VQ-TC                     68.1                      0.35
R. A. Cohen            SBPVQ                     36.6                      0.45
Y.-S. Ho               MSHVQ                     18.1                      1.01
                       MSHVQ                     52.2                      0.36
E. A. Riskin           PTSVQ                     37.2                      0.51
                       PTSVQ                     52.6                      0.31
D. LeGall              SB arithm. cod.           22.7                      0.7
R. J. Safranek         SBC-VQ                    36.6                      0.5
M. Todd                Pyramid-interpol.         64.3                      0.29
W. A. Pearlman         DCT                       12.0                      0.71
G. W. Wornell          TC-FML                    65.0                      0.5
Y.-S. Ho               CT-VQ                     51.1                      0.3
J. W. Kim              DCT-CVQ                   32.2                      0.44
P. Strobach            quadtree-RPD interpol.    37.1                      0.5

ACKNOWLEDGMENT

The author wishes to thank M. Schielein for his help during the preparation of the experimental results.

REFERENCES [I] A. K. Jain, “Image data compression: A review,” Proc. IEEE, vol. 69, no. 3, pp. 349-389, Mar. 1981. [2] A. K. Jain, “Advances in mathematical models for image processing,” Proc. IEEE, vol. 69, no. 5 , pp. 502-528, May 1981. [3] W. K. Pratt, Image Transmission Techniques. New York: Academic, 1979. [4] N. Ahmed, T. Natarjan, and K. R. Rao, “Discrete cosine transform,” IEEE Trans. Comput., vol. C-23, pp. 90-93, Jan. 1974. [ 5 ] A. K. Jain, “A sinusoidal family of unitary transforms,” IEEE Trans. Pattern Analysis Mach. Intell., vol. PAMI-1, pp. 356-365, Oct. 1979. [6] R. M. Gray, “Vector quantization,” IEEE ASSP Mag., Apr. 1984. [7] R. L. Baker, “Vector quantization of digital images,” Ph.D. dissertation, Stanford University, Stanford, CA, June 1984. [8] A. Gersho and B. Ramamurthi, “Image coding using vector quantization,” in Proc. Int. Con$ ASSP, Apr. 1982, pp. 428-431. [9] M. Vetterli, “Multidimensional subband coding: Some theory and algorithms,” Signal Processing, vol. 6, pp. 97-112, Apr. 1984. [IO] J. W. Woods and S . D. O’Neil, “Subband coding of images,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-34, no. 5 , pp. 1278-1288, Oct. 1986. [I I ] W. H. Chen and W. K. Pratt, “Scene adaptive coder,” IEEE Trans. Commun., vol. COM-32, no. 3, pp. 225-232, 1984. [12] P. M. Farrelle and A. K. Jain, “Recursive block coding-A new approach to transform coding,” IEEE Trans. Commun., vol. COM-34, no. 2, pp. 161-179, Feb. 1986. [ 131 C.-T. Chen, “Adaptive transform coding via quadtree-based variable block size DCT,” in Proc. Int. Con$ ASSP (Glasgow, Scotland), May 1989, pp. 1854-1857. 1141 E. A. Riskin, E. M. Daly, and R. M. Gray, “Pruned tree-structured vector quantization in image coding,” in Proc. Int. Con$ ASSP (Glasgow, Scotland), May 1989, pp. 1735-1738. 1151 D. J. Vaisey and A. Gersho, “Variable block size image coding,” in Proc. Int. Con$ ASSP (Dallas, TX), Apr. 1987, pp. 1051-1054. [I61 A. Baskurt and R. 
Goutte, “Encoding the location of spectral coefficients using quadtrees in transform image compression,” in Proc. Inc. Conj ASSP (Glasgow, Scotland), May 1989, pp. 1842-1845. [I71 H. Samet, “The quadtree and related hierarchical data structures,’’ ACM Computing Surveys, vol. 16, no. 2, pp. 188-216, June 1984. 1181 A. Klinger and C. R. Dyer, “Experiments on picture representation using regular decomposition,” Comput. Graphics Image Processing, vol. 5, pp. 68-105, 1976. 1191 G. M. Hunter and K. Steiglitz, “Operations on images using quadtrees,” IEEE Trans. Pattern Analysis Mach. Intell., vol. PAMI-I, pp. 145-153, 1979. [20] K. S. Fu and J. K. Mui, “A survey on image segmentation,” Part. Recognition, vol. 13, pp. 3-16, 1981.


[21] P. Haberäcker and R. Thiemann, “Klassifizierung von Siedlungen in digitalisierten Luftbildern, die als Quad-Tree-Strukturen codiert sind,” in Informatik Fachberichte, W. Brauer, Ed., DAGM-OAGM Symposium Mustererkennung. Berlin, Germany: Springer-Verlag, Oct. 1984, pp. 49-54.
[22] B. Hammer, A. v. Brandt, and M. Schielein, “Hierarchical encoding of image sequences using multistage vector quantization,” in Proc. Int. Conf. ASSP (Dallas, TX), Apr. 1987, pp. 1055-1058.
[23] D. F. Elliot and K. R. Rao, Fast Transforms, Analyses, Algorithms, Applications. London: Academic, 1982.
[24] H. S. Hou, “A fast recursive algorithm for computing the discrete cosine transform,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-35, no. 10, pp. 1455-1461, Oct. 1987.
[25] D. N. Green and S. C. Bass, “Representing periodic waveforms with nonorthogonal basis functions,” IEEE Trans. Circuits Syst., vol. CAS-31, no. 6, pp. 518-534, June 1984.
[26] C. R. Paul and R. W. Koch, “On the piecewise linear basis function and piecewise linear signal expansion,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-22, pp. 263-268, Aug. 1974.
[27] M. Eden, M. Unser, and R. Leonardi, “Polynomial representation of pictures,” Signal Processing, vol. 10, pp. 385-393, 1986.
[28] R. Leonardi and M. Kunt, “Adaptive split for image coding,” in Proc. IASTED Int. Symp. Appl. Signal Proc. Digital Filtering (Paris, France), June 1985, pp. 220-223.
[29] Y. Linde, A. Buzo, and R. M. Gray, “An algorithm for vector quantizer design,” IEEE Trans. Commun., vol. COM-28, pp. 84-95, Jan. 1980.
[30] S. P. Lloyd, “Least squares quantization in PCM,” Bell Laboratories, 1957, unpublished memorandum.
[31] J. Max, “Quantizing for minimum distortion,” IRE Trans. Inform. Theory, vol. IT-6, pp. 7-12, 1960.
[32] P. Strobach, D. Schutt, and W. Tengler, “Space-variant regular decomposition quadtrees in adaptive interframe coding,” in Proc. Int. Conf. ASSP (New York, NY), Apr. 1988, pap. M7.8, pp. 1096-1099.
[33] P. Strobach, “Tree-structured scene adaptive coder,” IEEE Trans. Commun., to be published.
[34] P. Strobach, “Quadtree-structured interframe coding of HDTV sequences,” in Proc. SPIE Cambridge Symp. Visual Commun. Image Processing ’88 (Cambridge, MA), Nov. 1988, pp. 812-820.
[35] R. Wilson, “Quadtree predictive coding-a new class of image data compression algorithms,” in Proc. Int. Conf. ASSP (San Diego, CA), Mar. 1984, pap. 29.3.1.
[36] Y. Yakimovsky, “Boundary and object detection in real-world images,” J. Ass. Comput. Mach., vol. 13, pp. 599-618, 1976.
[37] H.-H. Nagel and G. Rekers, “Moving object masks based on an improved likelihood test,” in Proc. Int. Conf. Patt. Recognition (Munchen, Germany), 1982, pp. 1140-1142.
[38] P. Strobach, “Image coding based on quadtree-structured recursive least squares approximation,” in Proc. Int. Conf. ASSP (Glasgow, Scotland), May 1989, pp. 1961-1964.
[39] G. G. Langdon and J. Rissanen, “Compression of black-white images with arithmetic coding,” IEEE Trans. Commun., vol. COM-29, pp. 858-867, June 1981.
[40] V. Ramamoorthy and N. S. Jayant, “High quality image coding with a model-testing vector quantizer and a human visual system model,” in Proc. IEEE Int. Conf. ASSP (New York, NY), Apr. 1988, pp. 1164-1167.
[41] T. Kim, “New finite state vector quantizers for images,” in Proc. Int. Conf. ASSP (New York, NY), Apr. 1988, pp. 1180-1183.
[42] R. A. Cohen and J. W. Woods, “Sliding block entropy coding of images,” in Proc. Int. Conf. ASSP (Glasgow, Scotland), May 1989, pp. 1731-1734.


[43] Y.-S. Ho and A. Gersho, “Variable-rate multistage vector quantization for image coding,” in Proc. IEEE Int. Conf. ASSP (New York, NY), Apr. 1988, pp. 1156-1159.
[44] D. LeGall and A. Tabatabai, “Subband coding of digital images using symmetric short-kernel filters and arithmetic coding techniques,” in Proc. IEEE Int. Conf. ASSP (New York, NY), Apr. 1988, pp. 761-764.
[45] R. J. Safranek, K. MacCay, N. S. Jayant, and T. Kim, “Image coding based on selective quantization of the reconstruction noise in the dominant subband,” in Proc. IEEE Int. Conf. ASSP (New York, NY), Apr. 1988, pp. 765-768.
[46] M. Todd and R. Wilson, “An anisotropic multiresolution image data compression algorithm,” in Proc. Int. Conf. ASSP (Glasgow, Scotland), May 1989, pp. 1969-1972.
[47] W. A. Pearlman, “Variable block rate and blockwise spectral adaptation in cosine transform image coding,” in Proc. Int. Conf. ASSP (New York, NY), Apr. 1988, pp. 773-776.
[48] G. W. Wornell and D. H. Staelin, “Transform image coding with a new family of models,” in Proc. IEEE Int. Conf. ASSP (New York, NY), Apr. 1988, pp. 777-780.
[49] Y.-S. Ho and A. Gersho, “Classified transform coding using vector quantization,” in Proc. IEEE Int. Conf. ASSP (Glasgow, Scotland), May 1989, pp. 1890-1893.
[50] J. W. Kim and S. U. Lee, “Discrete cosine transform-classified VQ technique for image coding,” in Proc. IEEE Int. Conf. ASSP (Glasgow, Scotland), May 1989, pp. 1831-1834.

Peter Strobach (M’86) was born in Passau, West Germany, on February 6, 1955. He received the Engineer’s degree in electrical engineering from Fachhochschule Regensburg in 1978, the Diplom-Ingenieur degree in statistics and computer science from the Technical University of Munich, Munich, Germany, in 1983, and the Dr.-Ing. (Ph.D.) degree from Bundeswehr University Munich, Munich, Germany, in 1985. From October 1976 to February 1977, and during the summer of 1977, he was a visiting scholar at CERN Nuclear Research Laboratory, Geneva, Switzerland. From 1978 to 1981, he was temporarily with Messerschmitt-Boelkow-Blohm Research Center, Munich, where he worked on aircraft radar systems. In 1981, he held a Friedrich Ebert Scholarship. From 1983 to 1985, he was a Research Assistant at the Mathematics and Computer Science Institute, Bundeswehr University, Munich, where he worked on architectures for adaptive signal processing and the development of numerically robust estimation algorithms for finite arithmetic applications. He has been a Lecturer for adaptive signal processing algorithms at the University of Erlangen-Nuernberg, Germany, and was also an invited Guest Lecturer at the University of Tallinn, Estonia, U.S.S.R. His current research interests include the application of combinatorial optimization methods to adaptive filtering and artificial neural network learning problems, applications such as biomedical signal processing and pattern recognition, and VLSI architectures and parallel processing. Since May 1986, he has been with Siemens AG, Zentrale Forschung und Entwicklung (Information Systems Laboratory) in Munich. Dr. Strobach is a member of the IEEE Signal Processing Society. He has written over 30 technical papers and is the author of the book Linear Prediction Theory: A Mathematical Basis for Adaptive Systems (Springer Series in Information Sciences, vol. 21, 1990). He received the 1988 ITG Paper Prize Award.
