This full text paper was peer reviewed at the direction of IEEE Communications Society subject matter experts for publication in the IEEE ICC 2011 proceedings

Content-based Image Compression for Arbitrary-resolution Display Devices

Chenwei Deng, Member, IEEE, Weisi Lin, Senior Member, IEEE, and Jianfei Cai, Senior Member, IEEE

Abstract—The work presented in this paper addresses the increasing demand for visual signal delivery to terminals with arbitrary resolutions (as in mobile multimedia communication and cloud computing) for universal access and presentation, without extra computational burden at the receiving end. Scalable image compression and transmission are essential, and to be effective and meaningful, they have to be content-based. The existing coding methods cannot support content-based spatial scalability with high compression. In this paper, the principle of seam carving (SC) is incorporated into a wavelet codec. After a multilevel discrete wavelet transform (DWT), SC is performed in the low-frequency subband. Different from the conventional wavelet-based coding schemes, DWT coefficients here are encoded and transmitted according to the energy map of the resultant seams. At the decoder side, the end user has the ultimate choice of scalability without the need to examine the visual content; an image with arbitrary aspect ratio can be reconstructed in a content-aware manner based upon the encoded information. Simulation results show that the resized images preserve important content while achieving high coding efficiency in transmission.

Index Terms—Image compression, Content-aware, Spatial-scalability, Discrete wavelet transform (DWT), Seam carving (SC), SPIHT.

Manuscript received September 20, 2010; revised February 11, 2011. This work was supported by MoE AcRF Tier 2, Singapore, Grant Number: T208B1218. The authors are with the School of Computer Engineering (SCE), Nanyang Technological University, Singapore, 639798. (Email: {cwdeng, wslin, asjfcai}@ntu.edu.sg)

I. INTRODUCTION

During the past few years, image compression has been extensively studied. Among the existing image codecs, the most commonly used ones are mainly based upon the discrete wavelet transform (DWT), such as JPEG2000 [1] and SPIHT [2]. Generally speaking, such codecs can achieve high coding efficiency in terms of rate-distortion (R-D) performance. However, spatial scalability is not fully supported: only dyadic resizing can be achieved, and it does not consider the original image content (as a result, the quality of important visual content may be severely degraded in the re-targeted images). Nowadays, as the sizes of client devices (laptops, PDAs, mobile phones, etc.) continue to diversify, the existing coding schemes cannot be directly applied, and an additional image re-targeting process (e.g., down-sampling, cropping, warping [3] and seam carving (SC) [4], [5]) is needed at the receiving end. However, for resource-limited end-user devices, it is not always possible or economical to perform sophisticated content-aware image resizing. Therefore, content-based spatial-scalable image compression for arbitrary resolution is becoming one of the emergent challenges for universal access [6], i.e., one can access any information over any network from anywhere through any type of display device.

Anh et al. [7] first proposed a content-aware multi-size image compression scheme. The basic idea is that, by using the significance map based upon SC, an image is decomposed into two components: ROI (region of interest) and non-ROI. The ROI is encoded by the SPIHT codec and its size is not expected to be altered; in the non-ROI, the pixels are grouped as a sequence of seams, i.e., connected paths of low-energy pixels crossing the non-ROI region from top to bottom (a vertical seam), or from left to right (a horizontal seam). The seam information is encoded by an adaptive arithmetic coding algorithm, and when resizing is needed during image decoding, the seams with low energy are deleted. Experimental results in [7] showed that the coding efficiency is far below that of the wavelet-based SPIHT codec: a 2.68 bpp (bits per pixel) SPIHT-coded image achieved the same peak signal-to-noise ratio (i.e., PSNR = 40.5 dB) as a 5.85 bpp seam-coded image. In addition, the method is not able to adapt an image to a display smaller than the ROI size. Furthermore, since the ROI and non-ROI are encoded using different coding schemes, severe block artifacts occur on the boundaries of the ROI and non-ROI regions.

To address these three limitations of the scalable coding method in [7], the advantages of SC (i.e., content-aware image resizing) and wavelet-based coding (i.e., high R-D performance) are combined in this paper, resulting in a novel content-based spatial-scalable compression scheme. Different from the scheme in [7], the original image is considered as a whole and not divided into two components, while SC is performed in the low-frequency subband. The coding process is guided by the seam energy map.
The SPIHT-coded bitstream and the side information of the resultant seams are transmitted to the decoder. In this way, we can reconstruct the content-aware image with arbitrary aspect ratio.

II. PROPOSED CODEC DESCRIPTION

Seam carving (SC) [4], [5] is an emerging image resizing paradigm. For SC, the importance of pixels is defined by an energy function and, based upon this function, the image size can be changed by gracefully carving out or inserting pixels in different parts of the image. In our work, SC is applied to improve the spatial scalability of the conventional wavelet-based SPIHT scheme. The block diagram of the proposed codec is shown in Fig. 1, which involves two major aspects: the seam energy map and seam-based SPIHT coding. The main difference between the proposed codec and the conventional

978-1-61284-231-8/11/$26.00 ©2011 IEEE


Fig. 1. The block diagram of the proposed image codec. (Blocks shown: original image, DWT, low-frequency subband, high-frequency subband, energy map, seam carving, side information, SPIHT-coded bitstream, SPIHT decoder, side decoder, IDWT, seam insertion, resized image.)

SPIHT lies in the coding order of DWT coefficients. In the proposed scheme, the spatial orientation trees (SOT) [2] (as shown in Fig. 2(c)) are encoded and transmitted according to the seam energy map. At the decoder side, the coefficients are reconstructed in energy descending order, and if one would like to obtain an image with an arbitrary size, seam insertion is performed (as will be discussed in Subsection II.D).

Fig. 2. Wavelet coefficients coding: (a) the conventional scanning order in low-frequency subband; (b) seam energy guided scanning order in low-frequency subband; (c) the spatial orientation tree (SOT) used in SPIHT (2 scales being used for illustration purpose here).

A. Seam Energy Map

Assume that for an $N \times M$ image, $L$-scale wavelet decomposition is performed. Following the framework in Fig. 1, the $J \times K$ ($J = N/2^L$, $K = M/2^L$) coefficients in the low-frequency subband are grouped into vertical and horizontal seams, i.e., 8-connected paths from top (or left) to bottom (or right). Let $s_i$ denote the $i$-th component of a seam, and then $s_i^x$ and $s_i^y$ represent its vertical and horizontal aspects, respectively. Mathematically, a vertical seam is defined as [4]:

$$S^x = \{s_i^x\}_{i=1}^{J} = \{x(i), i\}_{i=1}^{J}, \quad \forall i,\ |x(i) - x(i-1)| \le 1 \tag{1}$$

where $x$ is a mapping $x : [1, \cdots, J] \rightarrow [1, \cdots, K]$. Similarly, if $y$ is a mapping $y : [1, \cdots, K] \rightarrow [1, \cdots, J]$, then a horizontal seam is

$$S^y = \{s_j^y\}_{j=1}^{K} = \{y(j), j\}_{j=1}^{K}, \quad \forall j,\ |y(j) - y(j-1)| \le 1 \tag{2}$$

The DWT coefficients of a vertical seam will therefore be $\{\omega_{LL}(s_i)\}_{i=1}^{J}$, where $\omega_{LL}(\cdot)$ denotes the low-frequency subband. We look for the optimal seam $S^*$ that satisfies:

$$S^* = \arg\min_{S} \sum_{i=1}^{J} E[\omega_{LL}(s_i)] \tag{3}$$

where $E[\cdot]$ denotes the seam energy, which is a measure of visual importance. Due to the property of SOT in the SPIHT codec, $E[\cdot]$ is defined as follows:

$$E = \alpha_L E_L + \alpha_H E_H \tag{4}$$

where $\alpha_L$ and $\alpha_H$ are weighting factors with $\alpha_L + \alpha_H = 1$, while $E_L$ and $E_H$ represent the energy contributed by the low-frequency subband and the high-frequency subbands, respectively; $E_L$ and $E_H$ are defined as:

$$E_L = \left| \frac{\partial}{\partial x}\,\omega_{LL} \right| + \left| \frac{\partial}{\partial y}\,\omega_{LL} \right| \tag{5}$$

$$E_H = \sum_{i=1}^{L} \beta_i \left( \frac{\left| \omega_{LH}^{i} + \omega_{HL}^{i} + \omega_{HH}^{i} \right|}{4^{L-i}} \right) \tag{6}$$

where $\omega_{LH}^{i}$, $\omega_{HL}^{i}$ and $\omega_{HH}^{i}$ are the respective high-frequency coefficients of the corresponding SOT in level $i$, and $\beta_i$ denotes the tuning parameter. Combining Eqs. 1 to 6, the optimal seam with minimized energy can be found. The resultant seams are then sorted in energy descending order. The seam energy map is used to control the scanning and coding order of the SOT, which is discussed in the next subsection.
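The seam search of Eqs. 1, 3 and 5 can be illustrated with a small dynamic-programming sketch. It is only a sketch under stated assumptions, not the codec's implementation: it uses the low-frequency term $E_L$ alone (i.e., it assumes $\alpha_L = 1$, $\alpha_H = 0$), approximates the partial derivatives by forward differences with border replication, and works on a plain list-of-lists subband. The function names `gradient_energy` and `min_vertical_seam` are hypothetical.

```python
def gradient_energy(band):
    """E_L as in Eq. (5): |d/dx| + |d/dy|, using forward differences.

    Assumption: the border is handled by replicating the last row/column,
    so the difference there is zero.
    """
    rows, cols = len(band), len(band[0])
    energy = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            right = band[r][min(c + 1, cols - 1)]
            down = band[min(r + 1, rows - 1)][c]
            energy[r][c] = abs(right - band[r][c]) + abs(down - band[r][c])
    return energy


def min_vertical_seam(energy):
    """Return (total energy, [x(1), ..., x(J)]) of the least-energy vertical seam.

    The seam is 8-connected: consecutive column indices differ by at most 1,
    which is the constraint of Eq. (1). The minimisation is that of Eq. (3).
    """
    rows, cols = len(energy), len(energy[0])
    cost = [row[:] for row in energy]
    # Forward pass: cheapest cumulative cost of reaching each coefficient.
    for r in range(1, rows):
        for c in range(cols):
            lo, hi = max(c - 1, 0), min(c + 1, cols - 1)
            cost[r][c] += min(cost[r - 1][lo:hi + 1])
    # Backward pass: start at the cheapest bottom entry and walk upwards.
    c = min(range(cols), key=lambda k: cost[rows - 1][k])
    total = cost[rows - 1][c]
    seam = [c]
    for r in range(rows - 1, 0, -1):
        lo, hi = max(c - 1, 0), min(c + 1, cols - 1)
        c = min(range(lo, hi + 1), key=lambda k: cost[r - 1][k])
        seam.append(c)
    seam.reverse()
    return total, seam


if __name__ == "__main__":
    toy_energy = [[1, 4, 3],
                  [5, 2, 6],
                  [7, 8, 1]]
    print(min_vertical_seam(toy_energy))
```

A horizontal seam (Eq. 2) can be obtained by applying the same routine to the transposed subband. Repeating the search and sorting the resulting seams by energy would give the ordering that the text calls the seam energy map.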

B. Content-based SPIHT Coding

In the conventional SPIHT scheme, it is well known that the DWT coefficients are sorted as SOT [2], and all the root nodes of the trees are located in the low-frequency subband. Generally speaking, the trees are scanned in a zig-zag order (as shown in Fig. 2(a)), and the coding process is performed from the most significant bitplane (MSB) to the least significant bitplane (LSB). Let the SOT and the encoded bitstream be denoted as $T_i$ and $\{T_1^{MSB}, T_2^{MSB}, \cdots, T_1^{MSB-1}, T_2^{MSB-1}, \cdots, T_1^{LSB}, T_2^{LSB}, \cdots\}$, respectively. One can see that the bitstream is grouped in a fully embedded manner and can be progressively transmitted to the decoder side. Therefore, quality scalability can be perfectly achieved. However, it is difficult for the SPIHT codec to achieve spatial scalability. In addition, the importance of the different SOTs has not been considered, and therefore, SPIHT cannot support content-aware spatial-scalable coding.

To address this limitation, we propose a seam energy based SPIHT framework. As mentioned in Subsection II.A, for each root node $\omega_{LL}(s_i)$ (i.e., each DWT coefficient in the low-frequency subband), the corresponding seam energy $E$ (as in Eq. 4) is calculated and all the root nodes of SOTs are sorted according to the seam energy map. In the proposed scheme, the SOTs are scanned in energy descending order, i.e., the most important tree is encoded first. It should be mentioned that the resultant bitstream is grouped and transmitted in the following order: $\{T_{E_1}^{MSB}, T_{E_1}^{MSB-1}, \cdots, T_{E_1}^{LSB}, T_{E_2}^{MSB}, T_{E_2}^{MSB-1}, \cdots, T_{E_2}^{LSB}, \cdots\}$, where $E_1 > E_2$. For simplicity, the vertical and horizontal seams are processed alternately, as shown in Fig. 2(b). At the decoder side, the required SOTs corresponding to the horizontal and vertical seams may not be received simultaneously due to the seam order. In such a case, redundant seams have to be received in order to satisfy the required aspect ratio. Note that such redundancy must be deleted before image reconstruction.

C. Side Information

Since the SOTs are encoded based on the seam energy map, the side information (i.e., the positions of the root nodes) has to be sent for the reconstruction of the trees. Similar to the method in [7], the adaptive arithmetic coding (AAC) scheme is applied to the side information. It should be noted that for a vertical (or horizontal) seam, the positions of a seam are encoded from top (or left) to bottom (or right), and only x-coordinates (or y-coordinates) are required to identify the position. The coding of the coefficient positions in the low-frequency subband falls into three categories: for the first coefficient of the seam with the most energy, the coordinate value is encoded; for the first coefficient of each other seam, the difference from that of the previous seam is encoded; for the other coefficients, one of the three possible values {−1, 0, 1} is encoded. We would like to point out that, due to the alternating seam transmission order, one more bit is needed to indicate the status of the first transmitted seam (i.e., vertical or horizontal seam). For the first pair of vertical and horizontal seams, $J$ and $K$ (i.e., the size of the low-frequency subband) positions are encoded; then, for each new pair of seams, the number of positions to be encoded is reduced by one.

D. Spatial Scalability

As discussed in Subsections II.B and II.C, the SPIHT-coded bitstream is transmitted in a content-aware manner. By using the side information and the bitstream, the reconstructed image with a different resolution can be generated. However, due to the DWT architecture, a vertical or horizontal seam in the low-frequency subband represents $2^L$ (following the notation in Subsection II.A, $L$ is the highest wavelet decomposition scale) neighboring columns or rows of pixels in the original image. That is, if one vertical or horizontal seam is deleted, $2^L$ neighboring columns or rows of the original image are deleted accordingly. In this case, the retargeted size must be an integer multiple of $2^L$.

In order to obtain the decoded image with an arbitrary resolution, seam insertion is needed at the decoder side. An arbitrary image size can be represented by $(n \cdot 2^L - I_r) \times (m \cdot 2^L - I_c)$, where $n$ and $m$ are integers, and $I_r$ and $I_c$ are integers smaller than $2^L$. The adaptation process is as follows: a $2^L(n-1) \times 2^L(m-1)$ image is first generated; the side information of the last received vertical and horizontal seam is then saved; the $2^L$ corresponding columns and rows of the pixels in the $2^L(n-1) \times 2^L(m-1)$ image are enlarged to the required size by bilinearly interpolating $[2^L(2^L - I_c)(n-1) + (2^L - I_r)(m \cdot 2^L - I_c)]$ or $[2^L(2^L - I_r)(m-1) + (2^L - I_c)(n \cdot 2^L - I_r)]$ pixels.

III. SIMULATIONS AND ANALYSIS

In this section, we report the experiments conducted to evaluate the spatial scalability and compression performance of the proposed seam energy based SPIHT codec (denoted as Seam-SPIHT), which is compared with the following three schemes: the ROI-based lossy image codec in [7] (denoted as ROI-Seam); the state-of-the-art JPEG2000 with ROI-based tools [1] (denoted as ROI-JPEG2000); the conventional SPIHT followed by uniform scaling (denoted as SPIHT-Scale). Four 672 × 672 test images, wave, forest, grave and butterfly, were used for the objective and subjective verification.

For the ROI-Seam in [7], the original wave, forest, grave and butterfly were first decomposed into a ROI (i.e., a key image portion) and non-ROI (i.e., a sequence of seams). The resolutions of the ROIs are 352 × 432, 264 × 336, 398 × 484 and 494 × 526, respectively. The ROIs were encoded losslessly using SPIHT; the pixel value and the position information of each seam were quantized and encoded using the AAC. The overall PSNR of the seams was set as 35 dB. At the decoder side, the seams corresponding to an image with a size close to but smaller than the targeted size were received, and seam insertion was performed to generate the images with the required aspect ratio.

For the ROI-JPEG2000, DWT was performed for the whole image, and we defined a ROI for each test image. The resolutions of the ROIs are the same as those of [7], i.e., 352 × 432, 264 × 336, 398 × 484 and 494 × 526 for wave, forest, grave and butterfly, respectively. The ROIs were encoded losslessly while the PSNR for the non-ROI region was set as 35 dB. Since each codeblock is encoded independently, only the bitstream corresponding to the needed subbands has to be sent to the decoder. If the required size is larger than that of the ROI, we cropped the decoded image (with the nearest dyadic resolution larger than the ROI) to the targeted size and the ROI remained at the original size; otherwise, only the scaled-down version of the ROI can be reconstructed at the decoder side.

For the SPIHT-Scale framework, the DWT coefficients were losslessly processed by the SPIHT codec [2]. As mentioned in Subsection II.B, the coefficients are scanned using SOT, and therefore, no matter what the required size is, the whole bitstream has to be received at the decoder side, and one can reconstruct the images with a dyadic aspect ratio. In the experiments, we first decoded the images with the nearest dyadic resolution and then uniformly scaled them to the required size.

For the proposed Seam-SPIHT, 2-level DWT was performed for the whole original image (note that the DWT was only performed for the ROI in the ROI-Seam [7]). Due to the 2-level DWT architecture, if the required resolution is an integer multiple of $2^2$ (= 4), seam insertion is not needed; otherwise, fewer than 4 columns or rows of pixels are interpolated at the decoder side. The DWT coefficients and the side information (i.e., positions of seams) were encoded losslessly using the SPIHT and AAC, respectively.
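Two pieces of arithmetic in the setup above can be made concrete with a short sketch. The first helper writes a target dimension as $n \cdot 2^L - I$ (the representation of Subsection II.D), so that $I = 0$ means no seam insertion is needed. The second expresses what "the whole bitstream has to be received" implies for SPIHT-Scale: the received bits per displayed pixel grow with the inverse of the displayed area. Both function names are hypothetical, and the second assumes that the bitrate is counted as total received bits divided by the number of pixels of the target image.

```python
def decompose_target(size, levels=2):
    """Write size = n * 2**levels - i with 0 <= i < 2**levels; return (n, i)."""
    step = 2 ** levels
    n = -(-size // step)  # ceiling division on integers
    return n, n * step - size


def received_bpp(full_bpp, full_size, target_size):
    """Bits received per displayed pixel when the full bitstream is always sent.

    Assumption: bpp = (total received bits) / (pixels of the target image).
    """
    full_w, full_h = full_size
    w, h = target_size
    return full_bpp * (full_w * full_h) / (w * h)


if __name__ == "__main__":
    # Test resolutions used in the experiments, with 2-level DWT (step of 4).
    for dim in (241, 320, 392, 462, 432, 492, 672):
        print(dim, decompose_target(dim))
    # Lossless SPIHT bitrate of "wave" at 672 x 672 is 4.43 bpp (Table I).
    for target in ((241, 320), (392, 462), (432, 492)):
        print(target, round(received_bpp(4.43, (672, 672), target), 2))
```

For example, 241 = 61 · 4 − 3 and 462 = 116 · 4 − 2, so those dimensions require interpolation at the decoder, whereas 320, 392, 432, 492 and 672 are multiples of 4. Under the stated assumption, the 4.43 bpp full-size bitrate of wave scales to about 25.94, 11.05 and 9.41 bpp at the three reduced resolutions, which agrees with the SPIHT-Scale entries for wave in Table I.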


TABLE I
COMPARISON OF COMPRESSION PERFORMANCE IN TERMS OF BPP (RECEIVED BITS PER PIXEL INCLUDING THE OVERHEAD INFORMATION). THE WHOLE IMAGES WERE LOSSLESSLY ENCODED BY THE SPIHT-SCALE AND PROPOSED SEAM-SPIHT; FOR THE ROI-JPEG2000 AND ROI-SEAM [7], THE ROI REGION WAS LOSSLESSLY ENCODED AND THE PSNR OF THE NON-ROI REGION WAS SET AS 35 dB.

                            Bitrate (bpp)
Test image  Resolution  SPIHT-Scale  ROI-JPEG2000  ROI-Seam [7]  Proposed Seam-SPIHT
wave        241 × 320   25.94        5.12          5.29          4.83
wave        392 × 462   11.05        8.83          5.75          4.64
wave        432 × 492   9.41         7.52          5.91          4.55
wave        672 × 672   4.43         3.54          6.18          4.51
forest      241 × 320   25.0         4.9           5.25          4.78
forest      392 × 462   10.65        8.35          5.89          4.51
forest      432 × 492   9.07         7.12          6.14          4.45
forest      672 × 672   4.27         3.35          6.67          4.36
grave       241 × 320   23.19        5.16          5.42          4.65
grave       392 × 462   9.87         8.70          5.97          4.42
grave       432 × 492   8.41         7.42          6.21          4.21
grave       672 × 672   3.96         3.49          6.46          4.05
butterfly   241 × 320   28.58        6.63          5.95          5.39
butterfly   392 × 462   12.17        11.30         6.52          5.22
butterfly   432 × 492   10.37        9.62          6.78          5.06
butterfly   672 × 672   4.88         4.53          7.09          4.95

TABLE II
COMPARISON OF SIDE INFORMATION OVERHEAD BETWEEN TWO SEAM-BASED IMAGE CODECS (ROI-SEAM [7] AND SEAM-SPIHT).

            Bitrate (bpp)
Test image  ROI-Seam  Proposed Seam-SPIHT
wave        0.88      0.08
forest      1.17      0.09
grave       1.05      0.09
butterfly   0.69      0.07

Table I compares the compression efficiency of the four codecs mentioned above. One can see that ROI-JPEG2000 performs the best at the original image size. For the SPIHT-Scale scheme, all the coded bits are received for the reconstruction of images of any size, and this is similar to the reconstruction of the 432 × 492 and 392 × 462 images using the ROI-JPEG2000 scheme. Therefore, in the low-resolution cases, the average number of bits used for decoding each pixel increases. It should be noted that for the reconstruction of the 241 × 320 image, the coded bits in the first DWT level (i.e., 336 × 336) are not needed in the ROI-JPEG2000. The ROI-Seam method performs worse than the SPIHT-Scale and ROI-JPEG2000 at the original image size, since the efficiency of AAC (used for seam coding) is far below that of the SPIHT and JPEG2000 codecs; when adapted to a smaller size, only part of the seam-coded bitstream is needed, and in such cases the performance can be better than that of original image reconstruction. In the proposed Seam-SPIHT, since the whole image is coded by SPIHT with additional side information and not all the bits need to be received for resizing, the highest efficiency is achieved. As can be seen from Table I, the proposed Seam-SPIHT results in the lowest bitrates at all three reduced resolutions, and this is remarkable since the non-ROI region was coded with loss (PSNR being 35 dB) by ROI-JPEG2000 and ROI-Seam [7] while the whole image was losslessly encoded by the proposed Seam-SPIHT.

Table II reports the overhead of side information for 672 × 672 image reconstruction. We can see that much less overhead was needed for the Seam-SPIHT in comparison with the ROI-Seam. The reason is that for Seam-SPIHT, seam carving (SC) was performed in the low-frequency subband, and the number of DWT coefficients there is much smaller than the number of original image pixels (1/16 for 2-level DWT).

Apart from the objective evaluation, the visual quality of the decoded images is shown in Fig. 3. It can be seen

that the important region (i.e., the “sportsman”) was severely distorted in SPIHT-Scale. In the ROI-JPEG2000, the ROI and non-ROI were encoded with different bitrates; therefore, block artifacts occur on the boundaries of the ROI and non-ROI regions; in addition, due to the direct cropping, the “water” portion was almost removed in the 432 × 492 and 392 × 462 decoded images. In the ROI-Seam, the key image (i.e., ROI) and seams (i.e., non-ROI) were encoded by different codecs, so block artifacts also occur on the boundaries of the key image and the seams; furthermore, due to the seam insertion in the bottom part, the sportsman is not in the same position as in the original image. In the ROI-JPEG2000 and ROI-Seam, the distortion (caused by scaling) is more visible in the 241 × 320 decoded image. In comparison with the existing relevant codecs, the proposed Seam-SPIHT yields better subjective quality after decoding, because it needs neither ROI segmentation nor different coders for different image regions.

IV. CONCLUSIONS

We have provided a solution for delivering images to diversifying end-user display terminals with arbitrary resolutions, maintaining visually important content as far as possible and without extra computational burden from the decoding perspective. It essentially results in a single bitstream to be


Fig. 3. Comparison of subjective performance. Columns: (a) 672 × 672; (b) 432 × 492; (c) 392 × 462; (d) 241 × 320; (e) amplified portion of the decoded 672 × 672 images. Rows: first row, SPIHT-Scale; second row, ROI-JPEG2000; third row, ROI-Seam; fourth row, proposed Seam-SPIHT.

decoded for any image resolution requirement at the decoding end, preserves the most significant signal content, keeps the decoder’s complexity low, and therefore addresses the need for arbitrary display sizes (increasingly meaningful in mobile multimedia communication, cloud computing and so on).

In this paper, the use of the seam carving (SC) technique in spatial-scalable image coding has been studied and a novel image codec has been developed for content-based adaptation. The proposed codec (Seam-SPIHT) has its roots primarily in the DWT-based SPIHT scheme, with a different coefficient coding order. In Seam-SPIHT, a seam energy map is generated using a wavelet-based energy function. According to the resultant map, the coefficients are grouped as seam-based spatial orientation trees (SOT), which are encoded in energy descending order, and the side information is also sent to indicate the positions of the trees. In this way, one can reconstruct an image of arbitrary size in a content-aware manner. Experimental results show that the re-targeted images generated by the proposed codec preserve important and sensitive image content (i.e., in a content-aware manner), while achieving better compression performance compared with other relevant coding schemes.

REFERENCES

[1] Information technology-JPEG2000 image coding system-Part 1: core coding system, IEEE Std. ISO/IEC 15444-1, Sep. 2004.
[2] A. Said and W. A. Pearlman, “A new, fast and efficient image codec based on set partitioning in hierarchical trees,” IEEE Trans. Circuits Syst. Video Technol., vol. 6, no. 3, pp. 243–250, Jun. 1996.
[3] Y. S. Wang, C. L. Tai, O. Sorkine, and T. Y. Lee, “Optimized scale-and-stretch for image resizing,” ACM Trans. Graphics, vol. 27, no. 5, pp. 118–125, Dec. 2008.
[4] S. Avidan and A. Shamir, “Seam carving for content-aware image resizing,” ACM Trans. Graphics, vol. 26, no. 3, pp. 10–19, Jul. 2007.
[5] A. Shamir and O. Sorkine, “Visual media retargeting,” in Proc. ACM SIGGRAPH ASIA’09, Dec. 2009, pp. 11–25.
[6] B. Caldwell, M. Cooper, L. G. Reid, and G. Vanderheiden. (2008) Web Content Accessibility Guidelines (WCAG) 2.0. [Online]. Available: http://www.w3.org/TR/2008/REC-WCAG20-20081211
[7] N. T. N. Anh, W. X. Yang, and J. F. Cai, “Seam carving extension: a compression perspective,” in Proc. ACM Conf. Multimedia’09, 2009, pp. 825–828.
