 IMAGE CODING USING VECTOR-EMBEDDED KARHUNEN-LOEVE TRANSFORM

Toshihisa Tanaka† and Yukihiko Yamashita‡

†Department of Electrical and Electronic Engineering

‡Department of International Development Engineering

Tokyo Institute of Technology, Meguro-ku, Tokyo 152-8552, Japan


ABSTRACT
In this paper, the theory and design of a new class of orthogonal transforms are presented. The novel transform is derived from a correlation matrix in which an arbitrary orthonormal system is embedded. By embedding an empirically designed orthonormal system, we obtain a transform that not only represents intuitive features but also possesses statistical properties like those of the KLT. Our main motivation is the application to block-based adaptive transform coding. We show a design example of the transform that adapts to orientational features such as edges and lines, and using this transform we perform orientation adaptive coding. Experimental results show that image coding using the transform is effective in terms of both the rate-distortion criterion and subjective quality.

1. INTRODUCTION
Among all unitary transforms, the Karhunen-Loeve transform (KLT) [1],[2],[3] is the one that concentrates the most energy into the first few coefficients. Because the KLT is data dependent, obtaining the KLT for each signal is, in general, a nontrivial computational task. Instead, several standards for speech, image, and video coding use the discrete cosine transform (DCT) [4], whose basis is fixed. Note that the DCT can be regarded as a special case of the KLT, since it is the limit of the KLT of a first-order Markov process as the correlation coefficient ρ approaches 1 (ρ → 1). The KLT derived from a fixed correlation model, however, is not optimal for a non-stationary signal, such as an image block containing strong edges or lines. For such blocks, the high-frequency components lost in quantization cause visible distortions known as ringing artifacts. It is usually difficult to design a transform that can represent such non-stationary signals. An adaptive transform is one way to address this problem: for each input signal, a suitable rule selects an appropriate transform (Figure 2). A number of methods for designing adaptive transforms have been proposed [5],[6]. However, most of them are based on the statistical features of signals; that is, these transforms are appropriate only for stationary signals. To solve this problem, we attempt to represent a class of non-stationary signals by a transform containing an empirically designed orthonormal system.

In this paper, we propose a procedure to design a novel orthogonal transform called the vector-embedded Karhunen-Loeve transform (VEKLT), which is derived from the correlation matrix in which an arbitrary orthonormal system is embedded. This transform possesses an energy packing property like that of the conventional KLT and, simultaneously, can represent intuitive features with the embedded vectors. In addition to the formulation of the VEKLT, we design an orientation adaptive VEKLT that can represent edges and lines. Although block-transform-based orientation adaptive coding has been proposed in [5], we adopt a direct approach in which the vectors that express directional features are determined first.

The organization of this paper is as follows. In Section 2, we briefly review the Karhunen-Loeve transform (KLT). In Section 3, we formulate the proposed VEKLT. In Section 4, we present a design example of the VEKLT for orientation adaptive transforms of images, in which the directionality of a local region is considered. In Section 5, we describe the details of our proposed coder. In Section 6, we show experimental results of image compression. Finally, Section 7 concludes the paper.

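1.1. Notation
In terms of notation, the following conventions are adopted. $\mathbb{R}^N$ denotes the N-dimensional Euclidean space. For $\mathbf{f}, \mathbf{g} \in \mathbb{R}^N$, $\langle \mathbf{f}, \mathbf{g} \rangle$ denotes the inner product of $\mathbf{f}$ with $\mathbf{g}$, and $\|\mathbf{f}\|$ denotes the norm of $\mathbf{f}$. $E_f$ denotes the ensemble average over $\mathbf{f}$. $\mathbf{A}^T$ denotes the transpose of a matrix $\mathbf{A}$, and $\mathrm{rank}(\mathbf{A})$ denotes the rank of $\mathbf{A}$.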

2. KARHUNEN-LOEVE TRANSFORM: A REVIEW
Let us review the well-known Karhunen-Loeve transform (KLT). Suppose we create N-dimensional vectors from a given image by taking blocks of N consecutive samples. Let $\mathbf{f} = [f(0), \ldots, f(N-1)]^T$ be a vector of the original data samples in $\mathbb{R}^N$. The correlation matrix $\mathbf{R}$ with respect to $\mathbf{f}$ is given by

$$\mathbf{R} = E_f[\mathbf{f}\mathbf{f}^T], \tag{1}$$

where we assume $\mathrm{rank}(\mathbf{R}) = N$. Since $\mathbf{R}$ is real, symmetric, and (being a full-rank correlation matrix) positive definite, there exist eigenvalues [7] $\lambda_0 \ge \lambda_1 \ge \cdots \ge \lambda_{N-1} > 0$ and corresponding eigenvectors $\mathbf{u}_0, \ldots, \mathbf{u}_{N-1}$ such that $\{\mathbf{u}_i\}_{i=0}^{N-1}$ is an orthonormal basis of $\mathbb{R}^N$. The KLT matrix $\mathbf{U}$ is the matrix of order N whose rows are the transposed basis vectors $\mathbf{u}_i^T$, namely,

$$\mathbf{U} = [\mathbf{u}_0\ \mathbf{u}_1\ \cdots\ \mathbf{u}_{N-1}]^T. \tag{2}$$

The transformed vector $\mathbf{g} = [g(0), \ldots, g(N-1)]^T$ is obtained as

$$\mathbf{g} = \mathbf{U}\mathbf{f}. \tag{3}$$
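Equations (1) to (3) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the ensemble average $E_f$ is replaced by a sample mean over training vectors, the toy data are random walks rather than image samples, and the function name `klt_matrix` is hypothetical.

```python
import numpy as np


def klt_matrix(samples: np.ndarray) -> tuple:
    """KLT for data vectors stored as the rows of `samples` (shape: count x N).

    Returns (U, lam): the rows of U are the eigenvectors u_i^T of R sorted by
    decreasing eigenvalue, as in (2), and lam holds those eigenvalues.
    """
    # Sample estimate of R = E_f[f f^T] in (1).
    R = samples.T @ samples / samples.shape[0]
    lam, vecs = np.linalg.eigh(R)          # ascending eigenvalues, orthonormal columns
    order = np.argsort(lam)[::-1]          # reorder to lambda_0 >= ... >= lambda_{N-1}
    return vecs[:, order].T, lam[order]


rng = np.random.default_rng(0)
N = 8
# Toy correlated data (random walks), standing in for blocks of N consecutive pixels.
samples = np.cumsum(rng.standard_normal((5000, N)), axis=1)
U, lam = klt_matrix(samples)

f = samples[0]
g = U @ f          # forward transform, eq. (3)
f_back = U.T @ g   # U is orthogonal, so U^T inverts the transform
```

`np.linalg.eigh` is used because $\mathbf{R}$ is symmetric; it returns eigenvalues in ascending order, hence the explicit reordering.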

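Now, we have the following approximation result.

Proposition 1 [1],[3]: The KLT is the best transform in terms of packing most of the energy into the first few transform coefficients; that is, for every $M \le N$ it minimizes, among all orthogonal transforms,

$$E_f\|\mathbf{f} - \hat{\mathbf{f}}\|^2, \quad \text{where } \hat{\mathbf{f}} = \sum_{i=0}^{M-1} g(i)\,\mathbf{u}_i = \sum_{i=0}^{M-1} \langle \mathbf{u}_i, \mathbf{f} \rangle\,\mathbf{u}_i \text{ and } M \le N. \tag{4}$$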
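For the KLT itself, the error in (4) has a simple closed form, which makes the optimality statement concrete. The following short derivation is added for illustration; it uses only (1), (2), the expansion $\mathbf{f} = \sum_{i=0}^{N-1} \langle \mathbf{u}_i, \mathbf{f} \rangle \mathbf{u}_i$, the orthonormality of the $\mathbf{u}_i$, and $\lambda_i$ as the eigenvalue of $\mathbf{R}$ belonging to $\mathbf{u}_i$:

```latex
\begin{aligned}
E_f\|\mathbf{f}-\hat{\mathbf{f}}\|^2
  &= E_f\Big\|\sum_{i=M}^{N-1}\langle \mathbf{u}_i,\mathbf{f}\rangle\,\mathbf{u}_i\Big\|^2
   = \sum_{i=M}^{N-1} E_f\big[\langle \mathbf{u}_i,\mathbf{f}\rangle^2\big] \\
  &= \sum_{i=M}^{N-1} \mathbf{u}_i^T E_f[\mathbf{f}\mathbf{f}^T]\,\mathbf{u}_i
   = \sum_{i=M}^{N-1} \mathbf{u}_i^T \mathbf{R}\,\mathbf{u}_i
   = \sum_{i=M}^{N-1} \lambda_i .
\end{aligned}
```

The truncation error of the KLT is therefore the sum of the $N - M$ smallest eigenvalues of $\mathbf{R}$.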

The proof is omitted. In general, the KLT is considered impractical because it depends on the input signal. Therefore, one usually defines an appropriate correlation matrix, such as the first-order Markov model $(\mathbf{R})_{ij} = \rho^{|i-j|}$, where ρ is the correlation coefficient between adjacent pixels, and derives a fixed, suboptimal KLT from $\mathbf{R}$. For typical natural images, adjacent pixels are strongly correlated ($0.9 < \rho < 1$). It has been shown that the KLT $\mathbf{U}$ tends to the discrete cosine transform (DCT) as $\rho \to 1$ [4],[8],[9].
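The limit $\rho \to 1$ can be checked numerically. The sketch below, which assumes NumPy and the orthonormal DCT-II basis, builds the first-order Markov correlation matrix, computes its KLT, and measures how closely each KLT basis vector matches the DCT basis vector of the same index (up to sign). The helper names are illustrative, not taken from the paper.

```python
import numpy as np


def markov_correlation(N: int, rho: float) -> np.ndarray:
    """First-order Markov model: (R)_ij = rho^|i-j|."""
    idx = np.arange(N)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def dct_matrix(N: int) -> np.ndarray:
    """Orthonormal DCT-II matrix; row k is the k-th basis vector."""
    k = np.arange(N)[:, None]
    n = np.arange(N)[None, :]
    C = np.sqrt(2.0 / N) * np.cos((2 * n + 1) * k * np.pi / (2 * N))
    C[0, :] /= np.sqrt(2.0)                 # dc row scaled to unit norm
    return C


def klt_from_correlation(R: np.ndarray) -> np.ndarray:
    """Rows are eigenvectors of R, sorted by decreasing eigenvalue."""
    eigvals, eigvecs = np.linalg.eigh(R)
    return eigvecs[:, np.argsort(eigvals)[::-1]].T


N = 8
D = dct_matrix(N)
for rho in (0.5, 0.95, 0.999):
    U = klt_from_correlation(markov_correlation(N, rho))
    # |<u_k, d_k>| = 1 means the k-th KLT and DCT basis vectors coincide up to sign.
    match = np.abs(np.sum(U * D, axis=1))
    print(f"rho={rho}: min_k |<u_k, d_k>| = {match.min():.4f}")
```

For any fixed ρ, the KLT still packs at least as much energy as the DCT into the first M coefficients, which is Proposition 1 applied to this model.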

3. VECTOR-EMBEDDED KARHUNEN-LOEVE TRANSFORM
Although natural images are commonly modeled as a Markov process [10], there may be many signals for which the KLT derived from a fixed $\mathbf{R}$ (including the DCT) is inappropriate. In a two-dimensional setting, for example, a part of an image containing edges, lines, or textures is such a signal. Designing a transform that can represent such non-stationary signals is generally difficult. One way to address this problem is the adaptive transform approach: each input signal is classified into one of several categories and expanded using the transform associated with that category. Conventional adaptive transform approaches determine the category (and the corresponding transform) by statistical modeling [6],[5]. However, since these approaches assume that signals are stationary, the difficulty of representing non-stationary signals remains. Unlike these works, we attempt to expand some non-stationary signals in an orthonormal basis that contains some vectors representing intuitive features and others that have a statistical advantage like the KLT.

In this section, we present the vector-embedded Karhunen-Loeve transform (VEKLT) and its derivation. Suppose we create an orthonormal system $\{\mathbf{w}_i\}_{i=0}^{L-1}$ in $\mathbb{R}^N$, where $L < N$. Our purpose is to derive, in a KLT-like manner, an orthonormal basis that contains the orthonormal system $\{\mathbf{w}_i\}_{i=0}^{L-1}$. We call the subspace spanned by $\{\mathbf{w}_i\}_{i=0}^{L-1}$ the principal subspace and denote it by $\mathcal{W}$. First, we introduce the matrix $\mathbf{W}$ defined by

$$\mathbf{W} = \sum_{i=0}^{L-1} \mathbf{w}_i \mathbf{w}_i^T. \tag{5}$$

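Note that $\mathbf{W}$ is the orthogonal projection matrix onto the subspace spanned by $\{\mathbf{w}_i\}_{i=0}^{L-1}$, since $\mathbf{W}^2 = \mathbf{W}$ and $\mathbf{W}^T = \mathbf{W}$. Now, we define the vector $\mathbf{g}$ by

$$\mathbf{g} = \mathbf{f} - \mathbf{W}\mathbf{f}. \tag{6}$$

The correlation matrix $\mathbf{Q}$ with respect to $\mathbf{g}$ is given by

$$\begin{aligned}
\mathbf{Q} = E_g[\mathbf{g}\mathbf{g}^T] &= E_f[(\mathbf{f} - \mathbf{W}\mathbf{f})(\mathbf{f}^T - \mathbf{f}^T\mathbf{W}^T)] \\
&= E_f[\mathbf{f}\mathbf{f}^T] - E_f[\mathbf{f}\mathbf{f}^T]\mathbf{W}^T - \mathbf{W}E_f[\mathbf{f}\mathbf{f}^T] + \mathbf{W}E_f[\mathbf{f}\mathbf{f}^T]\mathbf{W}^T \\
&= \mathbf{R} - \mathbf{R}\mathbf{W} - \mathbf{W}\mathbf{R} + \mathbf{W}\mathbf{R}\mathbf{W},
\end{aligned} \tag{7}$$

where (1) and $\mathbf{W}^T = \mathbf{W}$ are used. This implies that $\mathbf{Q}$ is determined only by $\mathbf{R}$ and $\{\mathbf{w}_i\}_{i=0}^{L-1}$. We get the following result.

Proposition 2: The correlation matrix $\mathbf{Q}$ given in (7) has $N - L$ positive eigenvalues $\mu_0 \ge \cdots \ge \mu_{N-L-1} > 0$, and there exist corresponding eigenvectors $\mathbf{v}_0, \ldots, \mathbf{v}_{N-L-1}$ such that $\{\mathbf{v}_i\}_{i=0}^{N-L-1}$ is an orthonormal system.

Proof: We give a brief proof. From (6), the vector $\mathbf{g}$ belongs to the orthogonal complement of the principal subspace $\mathcal{W}$, so $\mathbf{Q}\mathbf{w}_i = \mathbf{0}$ for every i. Since $\mathbf{R}$ has full rank, it is verified that $\mathrm{rank}(\mathbf{Q}) = N - L$. In addition, since $\mathbf{Q}$ is symmetric and positive semidefinite, the proposition holds. □

Consequently, each $\mathbf{v}_i$ lies in the range of $\mathbf{Q}$, which is contained in the orthogonal complement of $\mathcal{W}$, so the set $\{\mathbf{w}_i\}_{i=0}^{L-1} \cup \{\mathbf{v}_i\}_{i=0}^{N-L-1}$ is an orthonormal basis for the signal space $\mathbb{R}^N$. We can organize an orthogonal transform matrix $\mathbf{V}$ such that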
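Because $\mathbf{W}$ is a symmetric projection, (7) factors neatly, which makes the rank claim of Proposition 2 explicit. A short derivation, added for illustration, where $\mathbf{I}$ is the N × N identity, $\mathcal{W}$ is the principal subspace, and $\mathbf{R}$ is positive definite (a full-rank correlation matrix):

```latex
\begin{aligned}
\mathbf{Q} &= \mathbf{R}-\mathbf{R}\mathbf{W}-\mathbf{W}\mathbf{R}+\mathbf{W}\mathbf{R}\mathbf{W}
            = (\mathbf{I}-\mathbf{W})\,\mathbf{R}\,(\mathbf{I}-\mathbf{W}),\\
\mathbf{x}^T\mathbf{Q}\,\mathbf{x} &= \big((\mathbf{I}-\mathbf{W})\mathbf{x}\big)^T\mathbf{R}\,\big((\mathbf{I}-\mathbf{W})\mathbf{x}\big) \ge 0,
  \quad\text{with equality iff } (\mathbf{I}-\mathbf{W})\mathbf{x} = \mathbf{0},\ \text{i.e. } \mathbf{x}\in\mathcal{W},\\
&\Rightarrow\ \ker\mathbf{Q} = \mathcal{W},\qquad \mathrm{rank}(\mathbf{Q}) = N - L .
\end{aligned}
```

Since $\mathbf{Q}$ is positive semidefinite, $\mathbf{x}^T\mathbf{Q}\mathbf{x} = 0$ holds exactly when $\mathbf{Q}\mathbf{x} = \mathbf{0}$, which gives the kernel in the last line.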

$$\mathbf{V} = [\mathbf{w}_0\ \cdots\ \mathbf{w}_{L-1}\ \mathbf{v}_0\ \cdots\ \mathbf{v}_{N-L-1}]^T. \tag{8}$$

Note that the order of the basis vectors in $\mathbf{V}$, apart from the order among the $\mathbf{v}_i$, can be changed so that the transform $\mathbf{V}$ packs most of the energy into the lower-index coefficients.
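Equations (5) to (8) translate directly into a few lines of linear algebra. The following is a minimal NumPy sketch, not the authors' implementation: the embedded system is passed as the rows of an L × N array `w_rows` (a hypothetical name), and the rows of $\mathbf{V}$ are sorted by their energy under $\mathbf{R}$, which is one possible way to realize the reordering described above. The example inputs (a Markov-model $\mathbf{R}$, a dc vector and a step "edge" vector) are illustrative only.

```python
import numpy as np


def veklt(R: np.ndarray, w_rows: np.ndarray, sort_by_energy: bool = True) -> np.ndarray:
    """Vector-embedded KLT matrix V (N x N).

    R      : correlation matrix (N x N), assumed full rank.
    w_rows : the embedded orthonormal system w_0, ..., w_{L-1} as rows (L x N).
    """
    N, L = R.shape[0], w_rows.shape[0]
    W = w_rows.T @ w_rows                    # projection onto the principal subspace, eq. (5)
    Q = R - R @ W - W @ R + W @ R @ W        # correlation of g = f - W f, eq. (7)
    mu, vecs = np.linalg.eigh(Q)             # ascending eigenvalues
    keep = np.argsort(mu)[::-1][: N - L]     # indices of the N - L positive eigenvalues
    V = np.vstack([w_rows, vecs[:, keep].T]) # [w_0 ... w_{L-1} v_0 ... v_{N-L-1}]^T, eq. (8)
    if sort_by_energy:
        # Energy of coefficient k is V[k] R V[k]^T. For each v_i it equals mu_i, so a stable
        # sort leaves the v_i in eigenvalue order (up to ties) and only interleaves the w_i.
        energy = np.einsum("kn,nm,km->k", V, R, V)
        V = V[np.argsort(-energy, kind="stable")]
    return V


# Illustrative example: Markov-model R and two hand-made embedded vectors.
N, L = 8, 2
idx = np.arange(N)
R = 0.95 ** np.abs(idx[:, None] - idx[None, :])
w_rows = np.zeros((L, N))
w_rows[0] = 1 / np.sqrt(N)                     # dc vector
w_rows[1, : N // 2] = 1 / np.sqrt(N)           # step "edge" vector, orthogonal to dc
w_rows[1, N // 2 :] = -1 / np.sqrt(N)
V = veklt(R, w_rows)
```

The energy identity in the comment follows from $\mathbf{v}_i = (\mathbf{I}-\mathbf{W})\mathbf{v}_i$, so $\mathbf{v}_i^T\mathbf{R}\mathbf{v}_i = \mathbf{v}_i^T\mathbf{Q}\mathbf{v}_i = \mu_i$.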

4. A DESIGN EXAMPLE
Let us provide an example of the VEKLT. The VEKLT can be applied to a non-stationary region of an image, that is, a block containing strong edges or lines. As a design example, we describe a procedure for designing adaptive transforms that take the orientation of a local region into account using the VEKLT. We assume that an input image is partitioned into I × I blocks. In two-dimensional notation, the samples in each block are expressed as f(m, n), m, n = 0, ..., I − 1. However, transforms can be designed and performed in a one-dimensional manner if the two-dimensional data f(m, n) are rearranged in lexicographical order [11], that is,

$$\mathbf{f} = [\underbrace{f(0,0), \ldots, f(0,I-1)}_{I},\ \underbrace{f(1,0), \ldots, f(1,I-1)}_{I},\ \ldots,\ \underbrace{f(I-1,0), \ldots, f(I-1,I-1)}_{I}]^T. \tag{9}$$
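Partitioning an image into I × I blocks and rearranging each block as in (9) is a reshape in NumPy. A minimal sketch, assuming that the first index m of f(m, n) is the row within the block; the function name is hypothetical.

```python
import numpy as np


def image_to_block_vectors(image: np.ndarray, I: int = 8) -> np.ndarray:
    """Split an image into non-overlapping I x I blocks and return one row per block.

    Within a block, f(m, n) = block[m, n] (m = row, assumed), and each row is laid out
    as in (9): f(0,0), ..., f(0,I-1), f(1,0), ..., f(I-1,I-1).
    Blocks are listed in raster order (left to right, top to bottom).
    """
    height, width = image.shape
    if height % I or width % I:
        raise ValueError("image height and width must be multiples of I")
    blocks = image.reshape(height // I, I, width // I, I).swapaxes(1, 2)
    return blocks.reshape(-1, I * I)


image = np.arange(16 * 16, dtype=float).reshape(16, 16)   # toy 16 x 16 "image"
vectors = image_to_block_vectors(image, I=8)              # four 64-dimensional vectors
```

Each row of `vectors` is one N-dimensional vector with $N = I^2$, ready for the transforms of Sections 2 and 3.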

[Figure 1 panels: basis vectors No. 1 to No. 4 shown as 8 × 8 surface plots, before (left) and after (right) "Rotation and Orthonormalization".]

Figure 1: The process of obtaining an orthonormal system for a principal subspace

To describe the elements of a vector $\mathbf{f}$, we use a one-dimensional notation $f(n)$, $n = 0, \ldots, N-1$, such that $f(m + nI) = f(m, n)$ and $N = I^2$.

4.1. Design of a principal subspace

To construct the VEKLT as an orientation adaptive transform, we first need to design an orthonormal basis for the principal subspace $\mathcal{W}$. As stated above, we design vectors adapted to local orientation, such as edges and lines, and construct the VEKLT involving these vectors as described in Section 3. We design the l-th lowest basis vector from a DCT basis vector in the following way. Consider the subset $\{\mathbf{d}_l\}_{l=0}^{L-1}$ of the DCT basis [8] such that

$$d_l(m + nI) = \frac{\sqrt{2}}{I}\, C_l \cos\frac{(2m+1)l\pi}{2I}, \quad m, n = 0, \ldots, I-1,\ l = 0, \ldots, L-1, \qquad C_l = \begin{cases} 1/\sqrt{2}, & l = 0,\\ 1, & \text{otherwise}, \end{cases} \tag{10}$$

where we use the one-dimensional notation $d_l(m + nI) = d_l(m, n)$. To rotate the coordinates by the orientation θ, we extend $d_l$ from a discrete function to a continuous one. We then obtain the rotated version $\{\tilde{\mathbf{d}}_l\}_{l=0}^{L-1}$ such that

$$\tilde{d}_l(m + nI) = d_l(m'(\theta), n'(\theta)), \tag{11}$$

where

$$\begin{bmatrix} m'(\theta) \\ n'(\theta) \end{bmatrix} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} m - I/2 \\ n - I/2 \end{bmatrix} + \begin{bmatrix} I/2 \\ I/2 \end{bmatrix}, \quad m, n = 0, \ldots, I-1. \tag{12}$$

Obviously, the dc vector $\mathbf{d}_0$ is invariant, that is, $\tilde{\mathbf{d}}_0 = \mathbf{d}_0$, because it is constant. Since the vector set $\{\tilde{\mathbf{d}}_l\}_{l=0}^{L-1}$ may not be an orthonormal system, it has to be orthonormalized in order to construct an orthogonal transform. Finally, the Gram-Schmidt process [7] is applied to $\{\tilde{\mathbf{d}}_l\}_{l=0}^{L-1}$, and we choose the orthonormalized vectors as an orthonormal basis $\{\mathbf{w}_i\}_{i=0}^{L-1}$ of the principal subspace $\mathcal{W}$. Figure 1 shows the process of constructing the orthonormal system $\{\mathbf{w}_i\}_{i=0}^{L-1}$ for θ = π/15 as an example. The left and right figures show $\{\mathbf{d}_l\}_{l=0}^{3}$ and $\{\mathbf{w}_i\}_{i=0}^{3}$, respectively.
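Equations (10) to (12) and the Gram-Schmidt step can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' code: the continuous extension of $d_l$ is simply (10) evaluated at non-integer coordinates, samples are laid out at position m + nI as in (10) and (11), and the function names are hypothetical.

```python
import numpy as np


def dct_profile(x: np.ndarray, l: int, I: int) -> np.ndarray:
    """Continuous extension of d_l in (10); it depends only on the first coordinate."""
    C = 1 / np.sqrt(2) if l == 0 else 1.0
    return np.sqrt(2) / I * C * np.cos((2 * x + 1) * l * np.pi / (2 * I))


def gram_schmidt(rows: np.ndarray) -> np.ndarray:
    """Modified Gram-Schmidt on the rows of `rows` (assumed linearly independent)."""
    basis = []
    for r in rows.astype(float):
        for q in basis:
            r = r - (q @ r) * q
        basis.append(r / np.linalg.norm(r))
    return np.array(basis)


def principal_subspace_basis(theta: float, L: int, I: int = 8) -> np.ndarray:
    """Rows are w_0, ..., w_{L-1}: rotated DCT vectors (11)-(12), orthonormalized."""
    m, n = np.meshgrid(np.arange(I), np.arange(I), indexing="ij")
    # m'(theta) from (12); n'(theta) is not needed because d_l does not depend on n.
    m_rot = np.cos(theta) * (m - I / 2) + np.sin(theta) * (n - I / 2) + I / 2
    # order="F" puts sample (m, n) at position m + nI, the index used in (10)-(11).
    d_rot = np.array([dct_profile(m_rot, l, I).flatten(order="F") for l in range(L)])
    return gram_schmidt(d_rot)


w = principal_subspace_basis(np.pi / 15, L=4)   # the theta = pi/15 case of Figure 1
```

At θ = 0 the rotation is the identity and the function returns the unrotated DCT vectors, while for any θ the first row stays equal to the dc vector, matching $\tilde{\mathbf{d}}_0 = \mathbf{d}_0$.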

4.2. Directional correlation model
It is necessary to choose the correlation matrix in which the vector set $\{\mathbf{w}_i\}_{i=0}^{L-1}$ will be embedded. We adopt the directional correlation model¹ $\mathbf{R}_\theta$ proposed by Bjøntegaard [5], defined by

$$(\mathbf{R}_\theta)_{i,j} = \alpha^{|d'_x(\theta)|} \cdot \beta^{|d'_y(\theta)|}, \quad i, j = 0, \ldots, N-1, \tag{13}$$

where α and β are correlation coefficients and $d'_x(\theta)$ and $d'_y(\theta)$ are defined as

$$\begin{bmatrix} d'_x(\theta) \\ d'_y(\theta) \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} d_x \\ d_y \end{bmatrix}, \tag{14}$$

where $d_x$ and $d_y$ are the distances between pixels i and j in the horizontal and vertical directions, respectively.

Remarks: Although we chose the directional correlation model to construct the VEKLT, any correlation matrix is valid. More appropriate models may exist; we leave this problem for future work.
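The directional model (13) and (14) can be built as a dense N × N matrix. This is a minimal sketch under stated assumptions: pixel i has one-dimensional index m + nI, m is taken as the horizontal and n as the vertical coordinate, and $d_x$, $d_y$ are signed coordinate differences, so their rotated versions enter (13) through absolute values. The function name is hypothetical.

```python
import numpy as np


def directional_correlation(theta: float, alpha: float, beta: float, I: int = 8) -> np.ndarray:
    """Directional correlation matrix R_theta of (13)-(14) for an I x I block.

    Pixel i = m + nI; m is taken as the horizontal and n as the vertical coordinate
    (an assumption). d_x, d_y are signed differences, so (13) uses their magnitudes.
    """
    m, n = np.meshgrid(np.arange(I), np.arange(I), indexing="ij")
    x = m.flatten(order="F").astype(float)
    y = n.flatten(order="F").astype(float)
    dx = x[:, None] - x[None, :]
    dy = y[:, None] - y[None, :]
    dx_rot = np.cos(theta) * dx - np.sin(theta) * dy      # eq. (14)
    dy_rot = np.sin(theta) * dx + np.cos(theta) * dy
    return alpha ** np.abs(dx_rot) * beta ** np.abs(dy_rot)   # eq. (13)


# Correlation coefficients as in the experiments (0.95 and 0.75); theta chosen for illustration.
R_theta = directional_correlation(np.pi / 8, alpha=0.95, beta=0.75)
```

At θ = 0 the model separates into a Kronecker product of two one-dimensional Markov matrices, one per direction. The resulting $\mathbf{R}_\theta$ can be passed, together with the rotated DCT vectors of Section 4.1, to the VEKLT construction of Section 3.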

5. ORIENTATION ADAPTIVE CODING
The VEKLT designed in Section 4 is applied to orientation adaptive coding. Because the main issue of this study is the transform step, we adopt a simple algorithm similar to baseline JPEG [12]. The image coding algorithm is summarized as follows:
1. The whole image is partitioned into square blocks.
2. Each block is transformed by a transform chosen from the K VEKLTs and the DCT according to the rule described in Section 5.1.
3. Side information that indicates the orientation of each block is run-length and Huffman coded as a header.
4. The transform coefficients are quantized and encoded. In the code assignment step, the same Huffman codebooks as in baseline JPEG are used.

The structure of this coder is shown in Figure 2, where $V_K$ denotes the DCT and $V_k$, k = 0, ..., K − 1, denotes the orientation adaptive VEKLT with respect to the direction $\theta_k$ given in Section 5.1.

[Figure 2 block diagram: the input vector f enters the Transform Selector, which chooses one of V_0, V_1, ..., V_{K-1}, V_K and emits side information; the transformed vector g is passed to the encoder.]
Figure 2: Structure of adaptive transform coding

¹ In his paper, the representation for $\mathbf{R}_\theta$ is slightly different.

Table 1: Energy compaction results for the 512 × 512 Lena image, showing the number of transform coefficients used for reconstruction and the corresponding mean square error (MSE)

| # of coef. | OA-VEKLT | DCT | DCM-KLT |
|---|---|---|---|
| 4 | 55.16 | 131.99 | 64.03 |
| 8 | 35.52 | 126.09 | 35.93 |
| 16 | 20.28 | 42.35 | 21.00 |

5.1. Transform selection
Suppose that the orientation θ, with −90° ≤ θ < 90°, is quantized into K levels, that is, $\theta_k = -90° + (180°/K)\,k$, k = 0, ..., K − 1. We constrain K to be a power of two for an efficient bit assignment of orientation information, as we will see later. One of these K VEKLTs or the DCT is selected to transform each block, using the following simple rule:
1. For the original input vector f of each block, if the variance Var(f) is below a threshold τ, that is, Var(f) < τ, the DCT is selected.
2. If Var(f) ≥ τ, the transform that concentrates the most energy in the first L transform coefficients is selected from among the K VEKLTs and the DCT.

5.2. Coding side information
In addition to the quantized coefficients, side information indicating the transform selected for each block must be transmitted or stored in our adaptive coding. In our experiment, the DCT is the most frequently selected transform, so the side information can be compressed: we build a Huffman codebook for the run lengths of DCT blocks and allocate $\log_2 K$ bits to each block whose transform is not the DCT. We also adopt an end-of-header (EOH) symbol, which indicates that all remaining blocks in the image are DCT blocks.

6. EXPERIMENTAL RESULTS
We divide an image into 8 × 8-pixel blocks (I = 8) and generate 32 different VEKLTs (K = 32) with correlation coefficients α = 0.95 and β = 0.75. For the transform selector, the threshold τ = 300 is used.

[Figure 3 plot: PSNR [dB] versus Rate [bpp] for the OA-VEKLT, DCT, and DCM-KLT coders.]

Figure 3: The rate-distortion performance for the 512 × 512 Lena image

The transform and coding performance of the new VEKLT is evaluated through energy compaction, image coding, and subjective comparisons. The coders compared are:
- the OA-VEKLT: orientation adaptive coding in which the orientation adaptive VEKLTs are used selectively, as stated previously;
- the DCT: JPEG-like coding in which all blocks are transformed with the DCT only, with no side information;
- the DCM-KLT: the same coding method as the OA-VEKLT, but using the KLTs derived from the directional correlation model given by (13).

The image used for the experiments is "Lena", a standard, well-known 512 × 512 8-bit gray-scale test image.

First, we perform an energy compaction test. In this test, the first j transform coefficients of each block are retained and the last 64 − j coefficients are set to zero. The transform for each block is selected as described in Section 5.1. From the truncated set of transform coefficients, we reconstruct an approximation of the original vector of each block. Table 1 compares the three coders in the mean square error (MSE) sense. This experiment does not account for the side information needed to reconstruct the coded image, so the relevant comparison is between the DCM-KLT and the OA-VEKLT. Our method concentrates more energy in fewer coefficients: its MSE is lower than that of the DCM-KLT for 4, 8, and 16 retained coefficients.

Second, coding results are shown in Figure 3. For a fair comparison, we employ a uniform quantizer with the same step size for every coefficient. Despite the side information, the rate-distortion performance is competitive with that of the DCT-only method, and at low bit rates our proposed method outperforms the DCT.

Finally, Figures 4 and 5 show the decoded images at 0.25 bpp. The visual quality of the OA-VEKLT image is better, especially around edges and lines, and the blocking effects are reduced.

Figure 4: A decoded image at 0.25 bpp with the DCT
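The energy compaction test and the uniform step quantizer described above can be sketched as follows, together with the usual 8-bit PSNR used on the vertical axis of Figure 3. This is a minimal NumPy illustration with hypothetical function names: `truncation_mse` applies one orthogonal transform to every block, whereas the experiments use the transform selected for each block, and the midtread (round-to-nearest) quantizer is an assumption, since the paper does not specify the reconstruction levels.

```python
import numpy as np


def truncation_mse(blocks: np.ndarray, V: np.ndarray, j: int) -> float:
    """Energy compaction test: keep the first j coefficients of each block, zero the rest,
    reconstruct with V^T, and return the mean square error per pixel.

    blocks: one block vector per row (count x N); V: orthogonal transform (N x N).
    """
    coeffs = blocks @ V.T           # row b holds V f_b
    coeffs[:, j:] = 0.0             # discard the last N - j coefficients
    recon = coeffs @ V              # row b holds V^T (truncated coefficients)
    return float(np.mean((blocks - recon) ** 2))


def uniform_quantize(coeffs: np.ndarray, step: float) -> np.ndarray:
    """Uniform quantizer with one step size for every coefficient (midtread, assumed)."""
    return step * np.round(coeffs / step)


def psnr(original: np.ndarray, decoded: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; peak = 255 for 8-bit gray-scale images."""
    mse = np.mean((original.astype(float) - decoded.astype(float)) ** 2)
    return float(10.0 * np.log10(peak ** 2 / mse))
```

With the 8 × 8 blocks of Section 6, `truncation_mse` would be evaluated at j = 4, 8, and 16, the coefficient counts listed in Table 1.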

7. CONCLUSIONS
This paper has presented a novel transform, the vector-embedded Karhunen-Loeve transform (VEKLT), which has an energy packing property like that of the KLT and, simultaneously, can represent intuitive features with the embedded vectors. As an application to block-transform-based image coding, we designed a VEKLT that represents the local orientational features of an image. At low bit rates, the resulting coder produces considerably cleaner edges than the DCT-based approach, in which orientation is not considered. Moreover, we showed that the orientation adaptive VEKLT-based coder concentrates more energy in the first few coefficients than a conventional orientation adaptive transform. Although the VEKLT is useful for adaptive transform design, its arithmetic complexity and the blocking artifacts remain issues. These problems are left for future research.

Figure 5: A decoded image at 0.25 bpp with the OA-VEKLT

8. REFERENCES
[1] P. A. Wintz, "Transform picture coding," Proc. IEEE, vol. 60, pp. 809–820, July 1972.
[2] Y. Yamashita and H. Ogawa, "Relative Karhunen-Loeve transform," IEEE Trans. Signal Processing, vol. 44, pp. 371–378, Feb. 1996.
[3] H. Ogawa, "Karhunen-Loeve subspace," in Proc. 12th International Conf. on Pattern Recognition, vol. 2, (The Hague, Netherlands), pp. 75–78, 1992.
[4] N. Ahmed, T. Natarajan, and K. R. Rao, "Discrete cosine transform," IEEE Trans. Computers, vol. C-23, pp. 90–93, Jan. 1974.
[5] G. Bjøntegaard, "A novel method for compressing images using discrete directional transforms," in Proc. SPIE Visual Commun. and Image Processing '88, vol. 1001, pp. 840–846, 1988.
[6] G. W. Wornell and D. H. Staelin, "Transform image coding with a new family of models," in Proc. IEEE Int. Conf. on Acoust., Speech, and Signal Proc., pp. 777–780, 1988.
[7] S. J. Leon, Linear Algebra with Applications. Englewood Cliffs, NJ: Prentice Hall, 1994.
[8] K. R. Rao and P. Yip, Discrete Cosine Transform: Algorithms, Advantages, Applications. New York, NY: Academic Press, 1990.
[9] R. J. Clarke, "Relation between the Karhunen-Loeve and cosine transforms," IEE Proc. Pt. F, vol. 128, pp. 359–360, Nov. 1981.
[10] W. K. Pratt, Digital Image Processing. New York, NY: J. Wiley, 1991.
[11] F. A. Kamangar and K. R. Rao, "Fast algorithms for the 2-D discrete cosine transform," IEEE Trans. Computers, vol. C-31, pp. 899–906, Sept. 1982.
[12] W. B. Pennebaker and J. L. Mitchell, JPEG Still Image Data Compression Standard. New York, NY: Van Nostrand Reinhold, 1992.