Seam Carving Extension: a Compression Perspective

Nguyen Thi Nhat Anh, Wenxian Yang, Jianfei Cai
School of Computer Engineering, Nanyang Technological University, Singapore

{ngnanh, wxyang, asjfcai}@ntu.edu.sg

ABSTRACT

There is an increasing demand for image compression that adapts to different display sizes. However, existing spatial scalable coding only supports dyadic resolutions and is not content-aware. In this paper, we apply the recently developed image resizing algorithm, seam carving, to content-aware multi-size image compression. Our proposed codec encodes an image into a content-aware progressive bitstream that can be decoded at an arbitrary display resolution. In addition, seam insertion is incorporated into the proposed framework to improve performance in low bitrate image transmission applications. To the best of our knowledge, this is the first applicable content-aware multi-size image coding work in the literature.

Categories and Subject Descriptors I.4.2 [Image Processing and Computer Vision]: Compression (Coding)—Approximate methods; I.4.9 [Image Processing and Computer Vision]: Applications

Keywords multi-size, content-based, image compression, image resizing

1. INTRODUCTION

Nowadays, numerous consumer multimedia devices with different display resolutions share the same vast multimedia content on the Internet. Image and video compression have been extensively studied for storage and transmission applications. With advanced coding techniques including SPIHT [3] and JPEG 2000, image compression algorithms are considered mature. However, research in multi-size image compression has been limited to spatial scalable coding, which can only support dyadic resolutions. Existing spatial-scalable image compression schemes, e.g., JPEG 2000, are generally based on subband decompositions, which intrinsically provide a multiresolution representation of the data. The constraint is that they can only provide dyadic spatial resolutions. To tailor the image to a device whose display resolution lies between the available scales, a larger image has to be transmitted and then clipped or scaled. In addition, conventional spatial scalable image coding downsizes the full image without considering the image content, so the quality is degraded homogeneously over the entire image.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. MM'09, October 19–24, 2009, Beijing, China. Copyright 2009 ACM 978-1-60558-608-3/09/10 ...$10.00.

The increasing demand for image compression adaptive to different display sizes and the limitations of existing spatial-scalable image coding lead us to the idea of content-based multi-size image compression. Fortunately, the recent content-aware image resizing algorithm, seam carving [1], facilitates the realization of this idea. Seam carving supports arbitrary image resizing within a certain range. It uses an energy function to define the importance of pixels, and a seam is a connected path of low-energy pixels crossing the image from top to bottom (a vertical seam) or from left to right (a horizontal seam). During image size reduction, the seams with low energy are removed from the image. In this way, seam carving preserves the important content of the image.

In this paper, we propose to apply seam carving to content-aware multi-size image compression. The basic idea is to use seam carving to decompose an image into two components: a key/base image and a sequence of seams. The two components are encoded separately and composed into a progressive bitstream. Depending on the device display resolution, only the key image and the needed seams are transmitted and decoded. Such multi-size compression is well suited to resource-limited devices, which cannot afford to run seam carving on the spot. In addition, seam insertion is incorporated into the proposed framework to improve performance in low bitrate image transmission applications.
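To make the notion of a seam concrete, the following is a minimal sketch of finding and removing one minimum-energy vertical seam by dynamic programming, in the spirit of [1]. The function names are hypothetical, and the energy map is assumed to be precomputed (for example from gradient magnitude); this is an illustration, not the implementation used in this paper.

```python
def min_vertical_seam(energy):
    """Return, for each row, the x-coordinate of a minimum-energy vertical seam.

    `energy` is a list of rows (assumed precomputed, e.g. gradient magnitude).
    """
    h, w = len(energy), len(energy[0])
    # cost[y][x]: minimum cumulative energy of a connected path ending at (x, y)
    cost = [list(energy[0])]
    for y in range(1, h):
        prev = cost[-1]
        row = []
        for x in range(w):
            lo, hi = max(x - 1, 0), min(x + 1, w - 1)
            row.append(energy[y][x] + min(prev[lo:hi + 1]))
        cost.append(row)
    # Backtrack from the cheapest pixel in the bottom row.
    x = min(range(w), key=lambda i: cost[-1][i])
    seam = [x]
    for y in range(h - 1, 0, -1):
        lo, hi = max(x - 1, 0), min(x + 1, w - 1)
        x = min(range(lo, hi + 1), key=lambda i: cost[y - 1][i])
        seam.append(x)
    seam.reverse()
    return seam


def remove_vertical_seam(image, seam):
    """Delete one pixel per row, reducing the image width by one."""
    return [row[:x] + row[x + 1:] for row, x in zip(image, seam)]
```

A horizontal seam can be handled with the same routine applied to the transposed image.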

2. MULTI-SIZE IMAGE COMPRESSION

Fig. 1 shows the block diagram of the proposed seam carving based codec for multi-size image compression. In particular, an image is divided into two parts: a key/base image and additional seams. The key image covers the region of interest (ROI), which is not expected to change during resizing, while the seams facilitate arbitrary resizing within a certain resolution range. At the encoder side, we adopt SPIHT [3], a high-performance progressive image coder based on the wavelet transform, to encode the key image. Note that any image codec can be adopted to encode the key image. The side information, including the position and the color of the seams, is coded using our proposed seam codec. At the decoder side, depending on the required size of the reconstructed image, as many seams as necessary are decoded and added back to the key image.

Figure 1: The block diagram of the proposed codec.

We would like to point out that the order in which seams are added back during decoding is the reverse of the order in which they are extracted in the encoder. Therefore, seams must be encoded and transmitted in reversed order, i.e., the last seam extracted is the first one in the encoded bitstream. As a result, the seam encoding process starts only after the seam extraction process has finished. Furthermore, horizontal and vertical seams are removed from the image alternately and coded into a progressive bitstream.

The reconstructed image size at the decoder side can be arbitrary. Depending on the required image size, the decoder selects the number of horizontal and vertical seams to be added back to the key image. Because of the order of the seams in the transmitted bitstream, the required numbers of horizontal and vertical seams may not be reached simultaneously. In such a case, we keep adding back seams until both the horizontal and the vertical size of the image are satisfied, and save the position information of the redundant seams. After that, we simply remove the redundant seams to achieve the required size.
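The decoder-side seam selection described above can be sketched as a simple counting procedure. This is a minimal illustration under stated assumptions: the bitstream is modeled as a list of seam types ("V" or "H") in transmission order, and the function name `plan_decoding` is hypothetical.

```python
def plan_decoding(order, need_v, need_h):
    """Decide how many seams to decode to reach the target size.

    `order` lists seam types ('V' or 'H') in bitstream order, i.e. the
    reverse of the extraction order. Returns a tuple
    (seams_to_decode, redundant_vertical, redundant_horizontal).
    """
    got_v = got_h = 0
    decoded = 0
    for kind in order:
        if got_v >= need_v and got_h >= need_h:
            break  # both dimensions are already satisfied
        if kind == "V":
            got_v += 1
        else:
            got_h += 1
        decoded += 1
    if got_v < need_v or got_h < need_h:
        raise ValueError("bitstream does not contain enough seams")
    # Surplus seams are added back first and removed again afterwards.
    return decoded, got_v - need_v, got_h - need_h
```

For example, with strictly alternating seams and a target that needs three vertical seams but only one horizontal seam, five seams are decoded and one horizontal seam is redundant. Here `extraction_order[::-1]` would give the bitstream order from a hypothetical encoder-side list `extraction_order`.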

2.1 Seam Compression

First we discuss the method used to encode the seam information losslessly. We adopt adaptive arithmetic coding to code the seam information. At the beginning of each seam, one bit indicates whether it is a row or a column seam, followed by the information of each pixel in the seam, indicating its position and color. A seam is a connected path of low-energy pixels crossing the image from top to bottom (a vertical seam) or from left to right (a horizontal seam). Therefore, if we code the pixels within each seam in order from top to bottom (for a vertical seam) or from left to right (for a horizontal seam), only the x-coordinate (for a vertical seam) or the y-coordinate (for a horizontal seam) is required to identify the position of each pixel.

Consider the encoding of a vertical seam. For the first pixel in a vertical seam, depending on whether it is the first vertical seam coded or not, we code the x-coordinate of the pixel as it is or code its difference from the first pixel of the previous vertical seam. Assuming that both the horizontal and vertical resolutions of the image are limited to $2^N$ pixels, the value coded lies within the range $[-2^N + 1, 2^N - 1]$, which includes $2^{N+1} - 1$ possible values. Thus, we use an adaptive arithmetic coding model with $2^{N+1} - 1$ symbols to encode the position information of the first pixel. For any pixel that is not the first pixel in the seam, we code the difference between the x-coordinates of this pixel and the previous pixel in the seam, which has three possible values, $\{-1, 0, 1\}$. Therefore, we use a 3-symbol adaptive arithmetic coding model for this case. The color information of each pixel is coded in the same way as the position information, except that we use a $(2^9 - 1)$-symbol adaptive arithmetic coding model, since the color difference lies within the range $[-255, 255]$. The diagram for encoding the position and color information of a seam is shown in Fig. 2.

Figure 2: The diagram for lossless seam coding. (a) Encoding position. (b) Encoding color.

Although our target is to render images with good visual quality, it is not necessary to encode images losslessly because of psychovisual redundancies. Moreover, lossy compression provides much better coding efficiency. Therefore, we further extend our lossless seam codec by applying a quantizer to the color differences before encoding them with arithmetic coding. Considering that most of the color differences have small values but their range is quite large, i.e., $[-255, 255]$, non-uniform quantization is adopted to maximize the signal-to-noise ratio. To avoid the need to transmit the quantization codebook, a fixed codebook is chosen empirically. Note that lossy compression of the key image is very simple since it is encoded by the progressive SPIHT coder. In our work, we simply truncate the SPIHT bitstream to 1 bpp, at which the visual quality of the reconstructed images is usually as good as that of the original ones. Certainly, the just-noticeable difference (JND) thresholds studied in the area of visual quality assessment can be applied here.
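The position symbols described for lossless seam coding can be sketched as follows. This minimal example only produces and inverts the symbols that would be fed to the adaptive arithmetic coding models; the arithmetic coder itself is omitted, and all function names are hypothetical.

```python
def seam_to_symbols(xs, prev_first_x=None):
    """Map the x-coordinates of a vertical seam (top to bottom) to symbols.

    Returns (first, steps): `first` is the absolute x of the first pixel, or
    its difference from the first pixel of the previous vertical seam when
    `prev_first_x` is given; `steps` holds one value in {-1, 0, 1} per
    remaining pixel.
    """
    first = xs[0] if prev_first_x is None else xs[0] - prev_first_x
    steps = [b - a for a, b in zip(xs, xs[1:])]
    if any(s not in (-1, 0, 1) for s in steps):
        raise ValueError("seam is not connected")
    return first, steps


def symbols_to_seam(first, steps, prev_first_x=None):
    """Inverse of seam_to_symbols: rebuild the x-coordinates."""
    x = first if prev_first_x is None else first + prev_first_x
    xs = [x]
    for s in steps:
        x += s
        xs.append(x)
    return xs


def first_pixel_alphabet_size(n_bits):
    """Number of symbols needed for differences in [-2^N + 1, 2^N - 1]."""
    return 2 ** (n_bits + 1) - 1
```

With N = 8 the first-pixel model needs 511 symbols, which matches the size of the color-difference alphabet for the range [-255, 255].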

2.2 Key Image Extraction

To extract the key image, we could directly use the seam carving algorithm [1], which uses the gradient magnitude as the weight to calculate seam energy. Although it produces satisfactory results for many images, the selected seams often pass through important objects. Thus, we incorporate the significance map introduced in [4] into the energy function of seam carving. The significance map is the product of the gradient magnitude and the saliency measure [2]. The gradient field helps to preserve structurally important image features, while the saliency map helps to preserve semantically important objects and smooth out noise. Therefore, the significance map assigns higher weight to the important objects of the image, which steers the selected seams away from them.

A critical problem is how much information needs to be retained in the key image, in other words, how to determine the size of the key image. Our strategy is to first calculate the ROI containing the structurally and semantically important information of the image, and then set the key image to at least cover the ROI. In particular, the significance map mentioned above is first thresholded. Then, the largest connected component in the thresholded significance map is regarded as the ROI. We calculate the ROI bounding box, whose size is denoted by $w_o \times h_o$, and set the size of the key image to $1.5 w_o \times 1.5 h_o$ to cover the ROI and part of the background.

In addition, our framework supports a user-interactive mode, where the user can draw a bounding box or even a mask for the parts of the image that he/she wants to preserve. Generally, the significance map can delineate the key parts of the image, while the user-interactive mode is useful for special purposes where the part of the image that the user wants to protect cannot be highlighted by the automatic method. In the user-interactive mode, the coordinates of the bounding box or the binary mask need to be encoded with the image. The number of bits required to encode this information is very small and can thus be neglected.
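The automatic ROI step (threshold, largest connected component, bounding box, 1.5x key image) can be sketched as below. This is an illustration under assumptions not stated in the text: 4-connectivity, a caller-supplied threshold, rounding up, and clamping the key image to the original image size. The function names are hypothetical.

```python
import math
from collections import deque


def roi_bounding_box(sig, threshold):
    """Bounding box (x, y, w, h) of the largest connected component of
    pixels whose significance is >= threshold, or None if there is none.
    4-connectivity is an assumption of this sketch."""
    h, w = len(sig), len(sig[0])
    seen = [[False] * w for _ in range(h)]
    best = []
    for y0 in range(h):
        for x0 in range(w):
            if seen[y0][x0] or sig[y0][x0] < threshold:
                continue
            comp, queue = [], deque([(x0, y0)])
            seen[y0][x0] = True
            while queue:  # breadth-first flood fill of one component
                x, y = queue.popleft()
                comp.append((x, y))
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < w and 0 <= ny < h and not seen[ny][nx]
                            and sig[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if len(comp) > len(best):
                best = comp
    if not best:
        return None
    xs = [x for x, _ in best]
    ys = [y for _, y in best]
    return min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1


def key_image_size(w_o, h_o, img_w, img_h, scale=1.5):
    """Key image size: scale * ROI box, clamped to the image (assumption)."""
    return (min(img_w, math.ceil(scale * w_o)),
            min(img_h, math.ceil(scale * h_o)))
```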

2.3 Seam Insertion at the Decoder Side

Although our proposed lossy multi-size image compression codec significantly reduces the storage size compared with the original seam carving without compression, its coding performance is still far below that of state-of-the-art non-multi-size codecs such as SPIHT. This makes our codec uncompetitive for low bitrate image transmission applications. To overcome this limitation, we propose to utilize the seam insertion technique introduced in the seam carving framework [1] for low bitrate transmission. The basic idea is to transmit an image smaller than the target size to the receiver and then use seam insertion to expand the small image to the desired resolution.

In particular, to enlarge an image by one column (row), we select the vertical (horizontal) seam s with minimum energy and insert one seam beside s by averaging the pixels on s with their left (top) neighbors, as suggested in [1]. However, repeating this process will most likely create a stretching artifact, because the same seam is chosen every time. Therefore, to enlarge an image by k columns (rows), we find the k vertical (horizontal) seams with the least energy, then interpolate and insert seams between them and their left (top) neighbors in order.

Seam insertion clearly has its limits. Using seam insertion to recover a large amount of image content from a small region will typically produce artifacts. Thus, in order to produce a natural and visually pleasing image at the receiver side, we limit the expansion ratio to an upper bound of 2, which is selected empirically. Based on this limit, we can determine the value range for the transmitted image size. Specifically, let $w_o \times h_o$ denote the size of the ROI bounding box, which is found by thresholding the significance map as discussed in the previous subsection, and let $w_r \times h_r$ and $w_t \times h_t$ denote the target image size and the transmitted image size, respectively. The size of the background region in the transmitted image is $(w_t - w_o)$ and $(h_t - h_o)$ in the horizontal and vertical directions, respectively. During seam insertion, only the background region is used to recover the image to the desired size $w_r \times h_r$. Thus, according to the expansion ratio limit of 2, we have

$$2(w_t - w_o) \ge w_r - w_t, \qquad 2(h_t - h_o) \ge h_r - h_t \tag{1}$$
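Rearranging Eq. (1) gives $3 w_t \ge w_r + 2 w_o$, i.e., $w_t \ge (w_r + 2 w_o)/3$, and likewise for the height. The following minimal sketch computes the smallest integer transmitted size that satisfies the constraint; the function name and the ROI size used in the usage note are hypothetical.

```python
def min_transmitted_size(w_r, h_r, w_o, h_o):
    """Smallest integer (w_t, h_t) satisfying Eq. (1).

    All arguments are integers: target size (w_r, h_r) and ROI bounding
    box size (w_o, h_o). Uses integer ceiling division.
    """
    w_t = -(-(w_r + 2 * w_o) // 3)  # ceil((w_r + 2*w_o) / 3)
    h_t = -(-(h_r + 2 * h_o) // 3)
    # The transmitted image must still contain the ROI.
    return max(w_t, w_o), max(h_t, h_o)
```

For instance, for a 512 × 552 target and an assumed ROI box of 234 × 288, the bound evaluates to 327 × 376; any transmitted size between this bound and the target size respects the expansion limit.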

3. EXPERIMENTAL RESULTS

To evaluate the performance of our codec, we apply it to encode different images and reconstruct the images at different resolutions. Table 1 shows the coding results of applying our codec to the color image wave with original size 672 × 672. The key image has resolution 352 × 432 and is coded with SPIHT at 1 bpp, generating a bitstream of 18.5 kbytes (kB). The seams are encoded in both lossless and lossy ways. We observe that with lossy seam coding, the number of bits used to encode the seams is reduced to about half of that of lossless seam coding, with unnoticeable degradation in image quality: the peak signal-to-noise ratio (PSNR) over all the seams is 40.5 dB.

Without a content-based multi-size image codec, to tailor an image to a device whose display resolution is smaller than the size of the image, a larger image has to be transmitted and then clipped or scaled without considering the image content. Therefore, to demonstrate the effectiveness of our codec at various resolutions, we compare the reconstructed images coded with our codec with the downsampled versions of the decoded image coded with SPIHT. The results are shown in the first two rows of Fig. 3. It can be seen that our reconstructed images preserve the ROI well and thus achieve better visual quality than the downsampled SPIHT decoded images.

Considering low bitrate transmission, our codec has some advantages over SPIHT at low target resolutions. For example, if the target resolution is 392 × 462, our codec needs to transmit a bitstream of 52.5 kB, while SPIHT needs to transmit the entire image with a bitstream size of 55.1 kB and then downsample the decoded image to the target resolution. On the other hand, the bandwidth consumption of our proposed codec is much larger than that of SPIHT when the target resolution is large. Thus, seam insertion at the decoder side is incorporated to further improve the performance. The third row of Fig. 3 shows the reconstructed images using our codec with seam insertion. We can see that the visual quality of the reconstructed images with seam insertion is about as good as that without seam insertion, while the former consumes much less bandwidth. For example, if the target resolution is 512 × 552, with seam insertion we only transmit a 432 × 492 image with a bitstream size of 80.76 kB, while without seam insertion we transmit a bitstream of 162.48 kB.
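The bitrate figures reported alongside the bitstream sizes in Table 1 can be related by a short calculation. The sketch below assumes that 1 kB denotes 1024 bytes, which is an inference from the reported numbers rather than something stated in the text; the function name is hypothetical.

```python
def bitrate_bpp(size_kb, width, height, bytes_per_kb=1024):
    """Bits per pixel for a bitstream of `size_kb` kB at the given
    resolution. bytes_per_kb=1024 is an assumption of this sketch."""
    return size_kb * bytes_per_kb * 8 / (width * height)


# Lossy bitstream sizes and resolutions as reported in Table 1.
for size_kb, (w, h) in [(18.5, (352, 432)), (52.5, (392, 462)),
                        (80.76, (432, 492)), (162.48, (512, 552))]:
    print(f"{w} x {h}: {bitrate_bpp(size_kb, w, h):.2f} bpp")

# Share of the bitstream still sent for a 512 x 552 target when a
# 432 x 492 image is transmitted and expanded by seam insertion.
fraction_sent = 80.76 / 162.48
```

Under this assumption the computed values agree with the reported bitrates to within rounding, and seam insertion roughly halves the transmitted data for the 512 × 552 target.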

Figure 3: Comparison of our reconstructed wave images with the downsampled SPIHT decoded wave images. Panels: (a) 672 × 672, (b) 512 × 552, (c) 432 × 492, (d) 392 × 462, (e) 352 × 432. First row: downsampled SPIHT decoded images, where the original image is coded by SPIHT at 1 bpp (bitstream size: 55.1 kB) at full resolution. Second row: reconstructed images using the proposed lossy seam coding. Third row: reconstructed images using the proposed lossy seam coding with seam insertion at the decoder side, where the reconstructed image at a specific resolution is generated by decoding its closest smaller-resolution image and expanding it to the target resolution.

Table 1: The results of encoding the 672 × 672 wave image using our proposed codec with a key image size of 352 × 432.

Reconstructed    Lossless coding                 Lossy coding
image size       Size (kB)    Bitrate (bpp)      Size (kB)    Bitrate (bpp)
352 × 432        18.5         1                  18.5         1
392 × 462        80.8         3.65               52.5         2.37
432 × 492        144.46       5.57               80.76        3.11
512 × 552        279.72       8.10               162.48       4.71
672 × 672        569.84       10.34              322.5        5.85

4. CONCLUSIONS AND DISCUSSIONS

The main contribution of this paper is a multi-size image compression codec based on seam carving, which encodes an image into a content-aware progressive bitstream that can be decoded at an arbitrary display resolution. To the best of our knowledge, this is the first applicable content-aware multi-size image coding framework in the literature.

However, the proposed multi-size image codec still has some limitations. First, the proposed seam coding is quite simple and not as efficient as the coding of the key image. Other efficient coding techniques, such as transform-domain approaches, can be applied to improve the coding efficiency. Second, seam carving cannot scale the important content, which limits the range of resizing. The latest image resizing algorithm [4] allows regions of high importance to scale uniformly and regions with homogeneous content to be distorted. It would be interesting to adopt this resizing method in the design of multi-size image compression.

5. REFERENCES

[1] S. Avidan and A. Shamir. Seam carving for content-aware image resizing. ACM Trans. on Graphics, 26(3), July 2007.
[2] L. Itti, C. Koch, and E. Niebur. A model of saliency-based visual attention for rapid scene analysis. IEEE Trans. on PAMI, 20(11):1254–1259, 1998.
[3] A. Said and W. A. Pearlman. A new, fast, and efficient image codec based on set partitioning in hierarchical trees. IEEE Trans. on CSVT, 6(3):243–250, June 1996.
[4] Y.-S. Wang, C.-L. Tai, O. Sorkine, and T.-Y. Lee. Optimized scale-and-stretch for image resizing. ACM Trans. on Graphics, 27(5), December 2008.