Perceptually Tuned Subband Coder with Region of Interest

Ee-Leng Tan, Woon-Seng Gan, Yong-Kim Chong
School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore
{etanel, ewsgan, eykchong}@ntu.edu.sg

Meng-Tong Wong
Texas Instruments Singapore, Singapore
[email protected]

Abstract—At high compression ratios, artifacts such as blurring and blocking are common in JPEG images. This paper discusses a perceptually tuned subband coder based on the subband discrete cosine transform (SBDCT). At the same bit rate, images compressed by the proposed coder are visually better than JPEG images. Simple schemes for computation and distortion scalability were introduced for this coder in [5]. Here, an updated implementation of [5] with a lower memory requirement and region of interest (ROI) coding is discussed.

Keywords—subband discrete cosine transform, JPEG

I. INTRODUCTION

Lossless compression techniques typically achieve a compression ratio of 3:1 [1]. Hence, losslessly compressed images require a significant amount of time for acquisition, storage, and transmission. Lossy compression techniques can be used to reduce this delay, but highly compressed images tend to exhibit artifacts such as blocking and blurring. At the same level of compression, JPEG 2000 [2] images do not exhibit the same degree of degradation as JPEG [3] images, and their contents can be easily recognized even at high compression. However, the computational cost of JPEG 2000 is much higher than that of other image compression standards [4]. JPEG is still the most commonly used compression standard and is widely employed in imaging applications. Therefore, this paper focuses on enhancing the JPEG standard.

Our perceptually tuned subband coder described in [5] forms the basis of this work. At the heart of the coder, the subband discrete cosine transform (SBDCT) is used instead of the discrete cosine transform (DCT). Jung et al. [6] have shown that, at low bit rates, the SBDCT produces images with fewer blocking artifacts than the DCT. In [7], we derived a fast SBDCT algorithm and introduced four modes of operation of the SBDCT to support different computational costs.

Our strategy is to exploit the human visual system (HVS) with the just noticeable distortion (JND) measure. A simplified JND model from [8] is adopted in our SBDCT coder. Visibility thresholds derived from the JND profile are used to locally adapt the quantization table with a multiplier. This is supported by ISO/IEC DIS 10918-3, which specifies an extension to JPEG. Consequently, we refer to this coder as the scalable perceptual image coder (SPIC). For the same bit rate, results indicate that grayscale and color images from SPIC are of higher perceived visual quality than JPEG images.

For applications such as medical imaging and surveillance, it is particularly useful to treat different regions of an image with different priorities in order of interpretability. Regions of high importance are marked as regions of interest (ROIs) and are coded at higher bit rates than the non-ROIs. An image with high perceived quality in specific regions is then obtained without prohibitive bandwidth. In this paper, an updated implementation of [5] with a lower memory requirement and ROI coding is discussed.

The rest of this paper is organized as follows. SPIC and its updated implementation are introduced in Section II. Simulation results are presented in Section III. Finally, our conclusions are given in Section IV.

1-4244-0983-7/07/$25.00 ©2007 IEEE

II. PERCEPTUALLY TUNED SBDCT CODER

An overview of SPIC is shown in Fig. 1. The blocks in the top row of Fig. 1 form the SBDCT coding scheme. The JND computation is given in the second row: the JND profile of the input image is computed and subsequently decomposed into subband JND profiles. The subband JND profiles determine which subbands of the input image are processed by the subband transform block. The last row contains the quantization and entropy coding blocks. The quantization matrix used in SPIC is derived from the subband JND profiles.

A. Subband Discrete Cosine Transform

The DCT of an N × N sequence [9] is given as

$$S(u,v) = \alpha_u \alpha_v \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} s(x,y) \cos\left(\frac{(2x+1)u\pi}{2N}\right) \cos\left(\frac{(2y+1)v\pi}{2N}\right), \quad 0 \le u, v < N, \tag{1}$$

ICICS 2007

where

$$\alpha_k = \begin{cases} \sqrt{1/N}, & k = 0, \\ \sqrt{2/N}, & k = 1, 2, \ldots, N-1. \end{cases} \tag{2}$$

[Figure: block diagram with the blocks Input Image, Divide to N × N sub-images, SBDCT (Subband Decomposition, Subband Transform), JND Profile, JND Profile Decomposition, Subband Selection, Quantization, Entropy Encoding, and Compressed Image.]
Fig. 1. Overview of SPIC.

The fullband SBDCT algorithm is mathematically equivalent to (1). We denote $C_{a/b}$ and $S_{a/b}$ as $\cos(a\pi/b)$ and $\sin(a\pi/b)$, respectively. The fullband SBDCT is then expressed as

$$
S(u,v) = \alpha_u \alpha_v \left\{
\begin{aligned}
& C_{u/2N}\, C_{v/2N} \sum_{x=0}^{N'-1} \sum_{y=0}^{N'-1} s_{LL}(x,y)\, C_{x'u/N}\, C_{y'v/N} \\
&+ C_{u/2N}\, S_{v/2N} \sum_{x=0}^{N'-1} \sum_{y=0}^{N'-1} s_{LH}(x,y)\, C_{x'u/N}\, S_{y'v/N} \\
&+ S_{u/2N}\, C_{v/2N} \sum_{x=0}^{N'-1} \sum_{y=0}^{N'-1} s_{HL}(x,y)\, S_{x'u/N}\, C_{y'v/N} \\
&+ S_{u/2N}\, S_{v/2N} \sum_{x=0}^{N'-1} \sum_{y=0}^{N'-1} s_{HH}(x,y)\, S_{x'u/N}\, S_{y'v/N}
\end{aligned}
\right\}, \quad 0 \le u, v < N, \tag{3}
$$

where $N' = N/2$, $x' = 2x + 1$ and $y' = 2y + 1$. $s_{LL}(x,y)$, $s_{LH}(x,y)$, $s_{HL}(x,y)$, and $s_{HH}(x,y)$ are the low-low, low-high, high-low and high-high subbands of the N × N sequence, respectively. These subbands are expressed as

$$
\begin{aligned}
s_{LL}(x,y) &= \sum_{i=0}^{1} \sum_{j=0}^{1} s(2x+i,\, 2y+j), \\
s_{LH}(x,y) &= \sum_{i=0}^{1} \sum_{j=0}^{1} (-1)^{j}\, s(2x+i,\, 2y+j), \\
s_{HL}(x,y) &= \sum_{i=0}^{1} \sum_{j=0}^{1} (-1)^{i}\, s(2x+i,\, 2y+j), \\
s_{HH}(x,y) &= \sum_{i=0}^{1} \sum_{j=0}^{1} (-1)^{i+j}\, s(2x+i,\, 2y+j), \quad 0 \le x, y < N'.
\end{aligned} \tag{4}
$$

Based on the symmetrical properties of the SBDCT algorithm [6], we have derived the fast SBDCT algorithm and presented the results in [7]. The computational cost of the fast SBDCT algorithm for an 8 × 8 sub-image is shown in Table I. Generally, most of the energy content of images resides in $s_{LL}$, so a reasonable approximation of (3) can be obtained with only $s_{LL}$.

TABLE I. COMPUTATIONAL COST FOR FAST SBDCT (8 × 8 SUB-IMAGES)

No. of subbands    No. of multiplications    No. of additions
1                  24                        72
2                  52                        168
3                  80                        304
4 (Fullband)       112                       400

B. Just Noticeable Distortion

If the noise level in an image is just noticeable, a visually perfect image can be obtained with the lowest possible number of bits [10]. The JND profile adopted for this work is based on a simplified inter-relevance model of the spatial masking effect and background luminance [8].

The salient difference between this work and [5] is the use of sub-images instead of the whole image. The advantages of this approach are threefold. Firstly, the memory requirement for storing the JND profile is greatly reduced. Secondly, selection of the subbands in (3) is performed at the sub-image level. For most images, the statistics are not stationary throughout the image, so different combinations of subbands should be used for different sub-images. This results in a lower computational cost with respect to the whole image. Thirdly, the weighting factors become invariant to the image size because sub-images are used. The weighting factors can then be precomputed to save computation.

Let the fullband JND profile of an N × N sub-image be $JND_{fb}$, described as

$$JND_{fb}(x,y) = \max\{ f_1(x,y),\, f_2(x,y) \}, \tag{5}$$

where

$$
\begin{aligned}
f_1(x,y) &= mg(x,y)\, \alpha(x,y) + \beta(x,y), \\
\alpha(x,y) &= 0.0001\, bg(x,y) + 0.115, \\
f_2(x,y) &= \begin{cases} T_0 \left( 1 - \sqrt{\dfrac{bg(x,y)}{127}} \right) + 3, & bg(x,y) \le 127, \\ \gamma \left( bg(x,y) - 127 \right) + 3, & bg(x,y) > 127, \end{cases} \\
\beta(x,y) &= \lambda - 0.01\, bg(x,y), \quad 0 \le x, y < N.
\end{aligned} \tag{6}
$$

bg(x,y) and mg(x,y) are the average background luminance and the maximum weighted average luminance difference, respectively. $T_0$, $\gamma$ and $\lambda$ are derived from experiments [8] and are found to be 17, 3/128 and 1/2, respectively. Subband JND profiles are derived from the fullband JND profile with a set of weighting factors [5], [8]. These weighting factors are referred to as $wf_0$, $wf_1$, $wf_2$ and $wf_3$. For our work, we have found $wf_0$, $wf_1$, $wf_2$ and $wf_3$ to be 0.1301, 0.2618, 0.1825 and 0.4256, respectively. Let the subband JND profiles be $JND_k$. These subband JND profiles are obtained by

$$JND_k(x,y) = wf_k \sum_{i=0}^{1} \sum_{j=0}^{1} JND_{fb}^{2}(i + 2x,\, j + 2y), \quad 0 \le k < 4,\ 0 \le x, y < N'. \tag{7}$$

[Figure: 2 × 2 grid. Top row: Subband 0, Subband 1. Bottom row: Subband 2, Subband 3.]
Fig. 2. Numbering scheme of subbands. For example, sLL, sHL, sLH, and sHH are referred to interchangeably as subbands s0, s1, s2, and s3, respectively.

Fig. 3. Regions of SLL, panels (a) and (b). Lighter and darker gray regions represent D1 and D2, respectively.
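The JND computation in (5) to (7) can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: it assumes that `bg` and `mg` have already been computed for the sub-image (their computation follows [8] and is not reproduced here), and the function names `jnd_fullband` and `jnd_subbands` are hypothetical. The constants are the values quoted in the text.

```python
import numpy as np

# Constants quoted in the text (derived from experiments in [8]).
T0, GAMMA, LAMBDA = 17.0, 3.0 / 128.0, 0.5
# Weighting factors wf0..wf3 quoted in the text.
WF = (0.1301, 0.2618, 0.1825, 0.4256)


def jnd_fullband(bg, mg):
    """Eqs. (5)-(6): fullband JND from precomputed bg and mg arrays."""
    bg = np.asarray(bg, dtype=float)
    mg = np.asarray(mg, dtype=float)
    alpha = 0.0001 * bg + 0.115
    beta = LAMBDA - 0.01 * bg
    f1 = mg * alpha + beta
    # Two branches of f2; clip keeps the square root defined for any input.
    dark = T0 * (1.0 - np.sqrt(np.clip(bg, 0.0, None) / 127.0)) + 3.0
    bright = GAMMA * (bg - 127.0) + 3.0
    f2 = np.where(bg <= 127.0, dark, bright)
    return np.maximum(f1, f2)


def jnd_subbands(jnd_fb, wf=WF):
    """Eq. (7): one (N/2 x N/2) subband JND profile per weighting factor."""
    sq = np.asarray(jnd_fb, dtype=float) ** 2
    # Sum of squared fullband JND values over each 2 x 2 neighbourhood.
    pooled = sq[0::2, 0::2] + sq[0::2, 1::2] + sq[1::2, 0::2] + sq[1::2, 1::2]
    return [w * pooled for w in wf]
```

Because the four weighting factors sum to one, the four subband profiles together carry the same total as the squared fullband profile pooled over each 2 × 2 neighbourhood.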

These subband JND profiles are used to evaluate the significance of the subbands in (4). The numbering scheme of the subbands and subband JND profiles is given in Fig. 2. Only subbands with higher energy than the corresponding subband JND profiles are processed by the subband transform block. Let the energies of the subbands and subband JND profiles be eSUB and eJND, respectively. They are computed as

$$
e_{SUB}(k) = \sum_{x=0}^{N'-1} \sum_{y=0}^{N'-1} s_k^{2}(x,y), \qquad
e_{JND}(k) = \sum_{x=0}^{N'-1} \sum_{y=0}^{N'-1} JND_k(x,y), \quad 0 \le k < 4. \tag{8}
$$

The last row of Fig. 1 denotes the quantization and entropy coding blocks. The quantization table is derived with a modified approach from [8]. Let $\Gamma$ be a multiplier to the quantization table of an N × N sub-image. This multiplier is computed as

$$\Gamma = r / n, \tag{9}$$

where

$$
r = \sum_{x=0}^{N'-1} \sum_{y=0}^{N'-1} JND_k(x,y) - \sum_{(u,v) \in A} S_{LL}^{2}(u,v) - \sum_{(u,v) \in B} S_{LL}^{2}(u,v), \qquad
n = 63 - \sum_{(u,v) \in A} 1 - \sum_{(u,v) \in B} 1. \tag{10}
$$

Let the transform of $s_{LL}(x,y)$ be $S_{LL}(u,v)$. The two sets of transform coefficients, A and B, are given as

$$
\begin{aligned}
A &= \{ (u,v) : |S_{LL}(u,v)| < 0.5,\ (u,v) \in D_1 \}, \\
B &= \{ (u,v) : |S_{LL}(u,v)| < 1,\ (u,v) \in D_2 \},
\end{aligned} \tag{11}
$$

where D1 and D2 are depicted in Fig. 3.

Fig. 4. Scaled JND profile of the “Bike” image. (a) JND profile of image, (b) scaled version of the JND profile, and (c) ROI mask used. Dark and white regions indicate the non-ROI and ROI, respectively.
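The subband selection rule built on (4) and (8) can be illustrated with a short NumPy sketch. It is a minimal sketch under stated assumptions rather than the authors' code: the function names `subbands` and `select_subbands` are hypothetical, the sub-image side N is assumed even, and the subband JND profiles of (7) are passed in as a list of four arrays.

```python
import numpy as np


def subbands(s):
    """Eq. (4): split an N x N sub-image into [sLL, sHL, sLH, sHH] (Fig. 2 order)."""
    s = np.asarray(s, dtype=float)
    a, b = s[0::2, 0::2], s[0::2, 1::2]   # s(2x, 2y),     s(2x, 2y+1)
    c, d = s[1::2, 0::2], s[1::2, 1::2]   # s(2x+1, 2y),   s(2x+1, 2y+1)
    s_ll = a + b + c + d
    s_lh = a - b + c - d                  # sign (-1)^j
    s_hl = a + b - c - d                  # sign (-1)^i
    s_hh = a - b - c + d                  # sign (-1)^(i+j)
    return [s_ll, s_hl, s_lh, s_hh]


def select_subbands(s, jnd_k):
    """Eq. (8): indices k whose subband energy exceeds the energy of JND_k."""
    e_sub = [float(np.sum(sk ** 2)) for sk in subbands(s)]
    e_jnd = [float(np.sum(jk)) for jk in jnd_k]
    return [k for k in range(4) if e_sub[k] > e_jnd[k]]
```

For a flat 8 × 8 sub-image only subband 0 carries energy, so only s0 would be passed to the subband transform block; a sub-image with strong diagonal detail would select subband 3 as well.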

C. Region of Interest

To incorporate an ROI mask into the JND profile, we adopt a simple approach. A scaling factor is used to scale up the values of the JND profile that belong to the non-ROIs given by the ROI mask. An example is shown in Fig. 4. Most values of the JND profile within the ROIs are then lower than those in the non-ROIs. Consequently, most of the ROIs are expected to be processed by the fullband SBDCT, and most of the non-ROIs by the SBDCT with sLL only. This scheme yields perceptually good images in the ROIs only. It also translates to a lower computational cost per image, as only the ROIs are processed at the higher computational cost. Assuming all ROIs are processed with the fullband SBDCT and all non-ROIs with the SBDCT with sLL only, the computational cost per 8 × 8 block is 112 multiplications and 400 additions for the ROIs, and 24 multiplications and 74 additions for the non-ROIs.

A similar ROI scheme can be implemented in the baseline coder by manipulating its quantization table. However, the computational cost with respect to the whole image is then higher than for SPIC. Moreover, SPIC images are of superior visual quality compared to those from the baseline coder at the same bit rate.
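The ROI scheme described above amounts to a masked scaling of the JND profile. The sketch below shows one way this could look in NumPy. The function names, the boolean mask convention (True inside the ROI) and the centred rectangular ROI helper are assumptions made for illustration; the text does not specify how the mask is represented or which scaling factors were used.

```python
import numpy as np


def centered_roi_mask(shape, roi_shape):
    """Hypothetical helper: boolean mask, True inside a centred rectangular ROI."""
    mask = np.zeros(shape, dtype=bool)
    top = (shape[0] - roi_shape[0]) // 2
    left = (shape[1] - roi_shape[1]) // 2
    mask[top:top + roi_shape[0], left:left + roi_shape[1]] = True
    return mask


def scale_jnd_for_roi(jnd_fb, roi_mask, scale):
    """Scale up JND values in the non-ROI; ROI values are left unchanged."""
    jnd_fb = np.asarray(jnd_fb, dtype=float)
    roi = np.asarray(roi_mask, dtype=bool)
    return np.where(roi, jnd_fb, scale * jnd_fb)
```

For example, `centered_roi_mask((512, 512), (250, 250))` gives a mask with a 250 × 250 ROI in the middle of a 512 × 512 image. A larger `scale` raises the JND values in the non-ROI further, which makes it more likely that non-ROI sub-images are coded with sLL only.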

Fig. 5. Coding errors of the “Plane” image coded at 0.6 bpp by SPIC and JPEG. (a) Reference “Plane” image, (b) JND profile of the “Plane” image, (c) coding error of the SPIC image, most of which is concentrated in the region indicated by the JND profile, and (d) coding error of the JPEG image, which is relatively significant over the whole image.

III. SIMULATION RESULTS

The JND profile is scaled to vary the bit rate of coders with the JND measure. In [8], the scaled JND profile and the scaling factor are referred to as the minimum noticeable distortion (MND) and the distortion index, respectively. To vary the bit rate of the baseline coder [9], the quantization table is multiplied by a quality factor [11].

A set of results obtained with the “Lena” image is tabulated in Table II. In addition to PSNR, the peak signal-to-perceptual-noise ratio (PSPNR) [8] is also included in Table II.

TABLE II. PSNR & PSPNR MEASUREMENTS OF THE BASELINE AND SBDCT BASED CODERS

                  Proposed coder           Original coder [5]       Baseline coder
Bitrate (bpp)     PSNR (dB)  PSPNR (dB)    PSNR (dB)  PSPNR (dB)    PSNR (dB)  PSPNR (dB)
0.5               36.49      46.39         35.99      45.53         36.07      43.24
0.4               34.68      42.53         33.86      40.59         34.12      40.12
0.3               32.20      38.32         31.96      37.29         32.47      37.64

TABLE III. PSNR MEASUREMENT OF ROI AND NON-ROI

Bitrate (bpp)     PSNR for ROI (dB)     PSNR for Non-ROI (dB)
1                 45.85                 42.91
0.5               45.85                 32.36
0.25              37.36                 25.49

Fig. 6. ROI coding with the “Pepper” and “Bike” images (512 x 512 pixels). (a) Original “Pepper” image, (b) “Pepper” image with a ROI of 250 x 250 pixels defined in the middle of the image. The PSNR measurement for the ROI and non-ROI is 45.8536 dB and 21.8889 dB, respectively, (c) original “Bike” image, and (d) “Bike” image with two ROIs defined for the color chart and clock.

Let the original and reconstructed images be $s(x,y)$ and $\hat{s}(x,y)$, respectively, and let the height and width of an image be H and W, respectively. PSPNR is then given as

$$
\mathrm{PSPNR} = 20 \log_{10} \frac{255}{\sqrt{\dfrac{1}{HW} \displaystyle\sum_{x=0}^{H-1} \sum_{y=0}^{W-1} \left[\, |s(x,y) - \hat{s}(x,y)| - JND_{fb}(x,y) \right]^{2} \delta(x,y)}}, \tag{12}
$$

where

$$
\delta(x,y) = \begin{cases} 1, & \text{if } |s(x,y) - \hat{s}(x,y)| > JND_{fb}(x,y), \\ 0, & \text{if } |s(x,y) - \hat{s}(x,y)| \le JND_{fb}(x,y), \end{cases} \quad 0 \le x < H,\ 0 \le y < W. \tag{13}
$$
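The PSPNR measure in (12) and (13) can be computed directly from the two images and the fullband JND profile. The following NumPy sketch is illustrative only: the function name `pspnr` and the choice to return infinity when no error exceeds the JND profile are assumptions, not details given in the text.

```python
import numpy as np


def pspnr(s, s_hat, jnd_fb, peak=255.0):
    """Eqs. (12)-(13): peak signal-to-perceptual-noise ratio in dB."""
    s = np.asarray(s, dtype=float)
    s_hat = np.asarray(s_hat, dtype=float)
    jnd_fb = np.asarray(jnd_fb, dtype=float)
    err = np.abs(s - s_hat)
    # delta(x, y) in (13): only errors above the JND threshold contribute,
    # and only by the amount they exceed it.
    excess = np.where(err > jnd_fb, err - jnd_fb, 0.0)
    mse = float(np.mean(excess ** 2))
    if mse == 0.0:
        return float("inf")   # assumption: no perceptible error at all
    return 20.0 * np.log10(peak / np.sqrt(mse))
```

With a JND profile of all zeros the expression reduces to the usual PSNR, which makes the relation between the two columns of Table II easier to see: PSPNR ignores the part of the coding error that lies below the visibility threshold.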

Since PSPNR accounts for the JND profile of the original image, it provides more insight into the perceived visual quality than PSNR. Several observations can be made from Table II. The proposed coder offers the highest perceived visual quality at all bit rates. Our explanation is given with an example in Fig. 5. Most of the coding error in the “Plane” image compressed by the proposed coder is found in the imperceptible regions indicated by its JND profile. Conversely, the coding error in the JPEG image is spread across the whole image. Thus, visually good images are expected from our proposed coder.

Examples of ROI coded images are shown in Fig. 6. Various possible bit allocations for ROI coding are shown in Table III. The series of PSNR measurements is obtained by using different scaling factors to scale the values of the JND profile in the non-ROIs.

IV. CONCLUSIONS

This paper presents an SBDCT coder with the JND measure. Computation of the JND profile is performed on sub-images. This approach drastically reduces the memory requirement of the JND computation compared to our prior work. In addition, an ROI coding scheme is introduced with the proposed coder. Future work can include adaptation of the JND measure to incorporate various application-specific requirements.

REFERENCES

[1] M. J. Weinberger, G. Seroussi, and G. Sapiro, “The LOCO-I lossless image compression algorithm: Principles and standardization into JPEG-LS,” IEEE Trans. Image Process., vol. 9, no. 8, pp. 1309-1324, Aug. 2000.
[2] C. Christopoulos, A. Skodras, and T. Ebrahimi, “The JPEG2000 still image coding system: An overview,” IEEE Trans. Consumer Electronics, pp. 1103-1127, Aug. 2000.
[3] G. K. Wallace, “The JPEG still picture compression standard,” IEEE Trans. Consumer Electronics, pp. xviii-xxxiv, Feb. 1992.
[4] D. Santa-Cruz and T. Ebrahimi, “A study of JPEG 2000 still image coding versus other standards,” in Proc. X European Signal Processing Conf., vol. 2, pp. 673-676, Sept. 2000.
[5] E. L. Tan, W. S. Gan, and M. T. Wong, “Perceptually lossless coder based on just noticeable distortion profile with subband DCT,” IEEE Int’l Symp. Consumer Electronics, pp. 253-257, June 2005.
[6] S. H. Jung, S. K. Mitra, and D. Mukherjee, “Subband DCT: definition, analysis, and applications,” IEEE Trans. Circuits Syst. Video Technol., vol. 6, no. 3, pp. 273-286, June 1996.
[7] E. L. Tan, W. S. Gan, and M. T. Wong, “Practical implementation of subband DCT based coder,” IEEE Int’l Symp. Intell. Signal Process. and Commun., pp. 485-488, Dec. 2005.
[8] C. H. Chou and Y. C. Li, “A perceptually tuned subband image coder based on the measure of just-noticeable distortion profile,” IEEE Trans. Circuits Syst. Video Technol., vol. 5, no. 6, pp. 467-476, Dec. 1995.
[9] Recommendation ITU-T T.81, Information Technology - Digital Compression and Coding of Continuous-Tone Still Images - Requirements and Guidelines, Int'l Telecommunications Union, 1992.
[10] N. Jayant, J. Johnston, and R. Safranek, “Signal compression based on models of human perception,” Proc. IEEE, vol. 81, no. 10, pp. 1385-1422, Oct. 1993.
[11] V. Bhaskaran and K. Konstantinides, Image and Video Compression Standards, Kluwer Academic Publishers, 1997.
