Image Retargeting Using a Bandelet-Based Similarity Measure
Aldo Maalouf, Member, IEEE, Mohamed-Chaker Larabi, Senior Member, IEEE
XLIM-SIC Laboratory, UMR CNRS 6172, University of Poitiers
{maalouf, larabi}@sic.univ-poitiers.fr

Abstract—Media content retargeting aims to adapt images and videos to displays of larger or smaller sizes. In this work, we propose a bandelet-based image retargeting algorithm for summarizing image data into smaller sizes. First, we define a multi-scale bandelet-based perceptual similarity measure that quantifies the geometric and perceptual similarity between two images at different bandelet scales. Two images are said to be geometrically similar if they have approximately the same geometric flow and quadtree structure. After determining the geometric similarity, a perceptual similarity measure based on the properties of the human visual system is defined to assess the perceptual difference between the original image and the retargeted one. The problem of image retargeting is then cast as a geometric optimization problem based on these bandelet-based geometric and perceptual similarity measures: for an image S, we search for a retargeted image T that contains as much of the geometric and perceptual information of S as possible and, consequently, preserves visual coherence. The proposed retargeting algorithm outperforms state-of-the-art methods in terms of the visual quality of the retargeted image.

Index Terms—Image resizing, retargeting, bandelet transform.

I. INTRODUCTION

Image resizing/retargeting is a technology that shows a reduced version of a large image on a small display, such as that of a portable device like a digital camera or a mobile phone, or that shows a large collection of image thumbnails on an ordinary display such as a computer monitor. It provides a convenient way to quickly access the content of a large image. Traditional image resizing techniques, such as scaling and cropping, can change the size of images and videos easily, but the results are often unsatisfactory. Since all parts of the image are treated equally, it is impossible to preserve certain important areas, which may therefore become unacceptably degraded at the lower resolution. Moreover, changing the aspect ratio distorts the image content, which is generally undesirable. To solve this problem, content-aware retargeting algorithms have been developed. These algorithms try to change the image size while preserving the most important information in it. Setlur et al. [8] defined a non-photorealistic algorithm, a tradeoff between image resizing and image cropping, to adapt images to small displays while preserving the important features of the image. They achieved this by extracting regions of interest (ROI) and then pasting them back onto their corresponding positions in the downsized background. Regions are found by

978-1-4244-4296-6/10/$25.00 ©2010 IEEE


performing image segmentation, which requires several parameters. A problem with this approach is that state-of-the-art segmentation algorithms are unreliable, which compromises the robustness of the system. Another image retargeting algorithm that is also a tradeoff between image resizing and image cropping was proposed by Liu et al. in [3] and [4]. The method consists of finding the region of interest (ROI) of the image and constructing a fisheye-view warp that applies a linear scaling function in each dimension of the image. Basically, in this method the information in the ROI is preserved and the rest of the image is warped. The disadvantage of this method is that, as in the method of Setlur et al. [8], segmentation algorithms are sometimes unreliable, which leads to degradations in the retargeted image. Another approach for the summarization of visual data for images and videos is proposed in [1]. The authors propose a measure to quantify how good a visual summary is. This measure can be applied to compare two images or two video sequences of different sizes. It is useful for driving an objective function within an optimization process, for generating good visual summaries, and for quantitatively comparing and evaluating visual summaries produced by different methods. Another approach was developed by Avidan and Shamir [7]. Rather than cropping, they adjust the image size by adding or removing seams. A seam is defined as an optimal 8-connected path of pixels between opposing margins of the image, found using an image energy function. They use dynamic programming to find seams that pass through unimportant areas, in order to preserve the important ones to the greatest extent. The results are impressive, but because the method relies on dynamic programming, its performance is low on large images. Recently, a structure-preserving image resizing technique was proposed by Wang et al. [9].
They first downsample the original image using bilinear interpolation. Then, they introduce structure constraints derived from line detection into the resizing procedure to preserve the image structures. In this work, the problem of image retargeting is addressed from a new angle. Instead of defining a seam or searching for ROIs and then cropping, we pose it as a multi-scale geometric and perceptual similarity optimization. That is, we define a function that measures the geometric similarity between the original image S and a set of summarized images Q. The retargeted image is said to be a "good visual summary" of S if it contains as much as possible of the geometric and perceptual

ICASSP 2010

features of the original image. The geometric features are characterized by using the bandelet bases [6]. From Q we select the retargeted image T ∈ Q that maximizes the geometric and perceptual similarity function. The remainder of the paper is organized as follows: in section 2 a review of the bandelet transform is given. In section 3 the retargeting technique is described. Section 4 is devoted to experimental results and section 5 draws some concluding remarks.

II. BANDELET TRANSFORM

We present here only a brief review of the bandelet transform; the reader can refer to [6] for a fully detailed description. Bandelets are defined as anisotropic wavelets that are warped along the geometric flow, a vector field indicating the local direction of regularity along edges. The dictionary of bandelet frames is constructed using a dyadic square segmentation and parameterized geometric flows. For image surfaces, the geometry is not a collection of discontinuities, but rather areas of high curvature. The bandelet transform recasts these areas of high curvature into an optimal estimation of the regularity direction. Figure 1 shows an example of bandelets along the geometric flow in the direction of edges. In real applications, the geometry is estimated by searching for the regularity flow and then for a polynomial to describe that flow.

Fig. 1. An illustration of bandelets with geometric flows in the direction of the edge. The support of the wavelets is deformed along the geometric flows in order to exploit the edge regularity.

Implementation of the Bandelet Transform

The classical tensor wavelet transform of an image I is the decomposition of the latter on an orthogonal basis formed by the translation and dilation of three mother wavelets for the horizontal, vertical and diagonal directions. Once the wavelet transform is computed, the quadtree is obtained by dividing the image into dyadic squares of variable sizes (refer to [6] for more information on computing the quadtree). For each square in the quadtree, the optimal geometric direction is computed by minimizing a Lagrangian [6]. Then the wavelet coefficients are projected along the optimal direction [6]. Finally, a 1D discrete wavelet transform is applied to the projected coefficients. Figure 2 shows, after the bandelet decomposition, the quadtree and a zoom on the orientation of the linear flow in each dyadic square. Notice that the quadtree segmentation performs very well in the areas corresponding to edges.

Fig. 2. Quadtree segmentation of the Lenna image.

The geometry encoded by the bandelet transform will be used in the definition of a multi-scale geometric similarity measure. The role of this similarity measure is to determine the geometric similarity between the original image S and a set of resized images Q, so as to determine which image T ∈ Q is most similar to S. In other words, T should contain as many of the geometric features of S as possible. This idea is discussed in detail in the following section.

III. BANDELET-BASED IMAGE RETARGETING

In our retargeting scheme, we consider two images S (the image to be retargeted) and T (the retargeted one) to be similar if T has approximately the same geometric structure and perceptual features. We begin by defining the similarity measure; then we present our retargeting algorithm.

A. Geometric similarity measure

Our geometric similarity measure determines the geometric resemblance between the bandelet bands of T and S. After performing the bandelet transform, T and S contain an ordered set of dyadic squares C^1, C^2, ..., C^K (where K is the total number of dyadic squares in the image). The similarity between the dyadic squares of S and T is studied as follows. Let C^{k_1} be a dyadic square in S and C^{k_2} another dyadic square in T. C^{k_1} is said to be geometrically similar to C^{k_2} if C^{k_2} has the same geometric characteristics, i.e. the same side length l and the same optimal geometric direction θ, according to the following equation:

d(C^{k_1}, C^{k_2}) = \min_{C^{k_2} \subset T} \left| \left( \theta_{C^{k_2}} - \theta_{C^{k_1}} \right) \left( l_{C^{k_2}} - l_{C^{k_1}} \right) \right|   (1)
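To make the matching in (1) concrete, the following sketch pairs a dyadic square of S with its closest square in T, representing each square only by its optimal direction θ and its side length l. This is an illustrative toy under those assumptions, not the authors' implementation; all names are hypothetical.

```python
def d(sq1, sq2):
    """Geometric distance of Eq. (1): product of the direction
    difference and the side-length difference of two dyadic squares."""
    theta1, l1 = sq1
    theta2, l2 = sq2
    return abs((theta2 - theta1) * (l2 - l1))

def match_square(sq_s, squares_t):
    """Minimiser over the squares of T in Eq. (1): the square of T
    geometrically most similar to a given square of S."""
    return min(squares_t, key=lambda sq_t: d(sq_s, sq_t))

# toy squares: (optimal geometric direction in radians, dyadic side length)
S_squares = [(0.10, 8), (1.20, 16)]
T_squares = [(0.12, 8), (0.90, 32)]
best = match_square(S_squares[0], T_squares)  # -> (0.12, 8)
```

Note that, taken literally, (1) vanishes whenever the two squares share either the side length or the direction, since the two differences are multiplied.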

The optimal geometric direction θ is obtained by the bandelet transform (one optimal direction per dyadic square). Equation (1) associates to each dyadic square in T similar squares in S. Let N be the number of squares in T having similar dyadic squares in S. The geometric similarity between S and T can therefore be defined by:

d_{geometry} = \frac{1}{N} \sum_{C^{k_2} \in T,\; C^{k_1} \in V(C^{k_2})} d(C^{k_1}, C^{k_2})   (2)

where V(C^{k_2}) is the set of squares in S similar to C^{k_2} ∈ T. We define the perceptual similarity measure in the next subsection.

B. Perceptual similarity measure

First, we have to find the sensitivity coefficients of both images, i.e., the most attractive bandelet coefficients as seen by

an observer, so that we can measure the perceptual similarity using only a few coefficients. We begin by performing contrast sensitivity function (CSF) filtering. The CSF describes the variation of visual sensitivity as a function of spatial frequency and orientation: human eyes have different sensitivities to signals of different frequencies. So, to make full use of the bandelet coefficients and orientations at different bandelet scales, we use the CSF to equalize the sensitivity of the human visual system across frequencies. In particular, we use the model proposed by Daly [2]. The CSF is applied to all bandelet bands at a given scale j as follows:

\hat{B}_j = B_j \cdot N_{CSF}   (3)

where B_j is the bandelet coefficient and N_{CSF} is the value of the 2D CSF defined by Daly [2]. After this normalization of the frequencies at the different bands and scales, we apply sensitivity thresholding to find the sensitivity coefficients, using the thresholds proposed by Daly [2]. In this model, the visibility threshold elevation T_j(x, y) at site (x, y) and scale j is given by:

T_j(x, y) = \left[ 1 + \left( k_1 \left( k_2 \left| \hat{B}_j(x, y) \right| \right)^f \right)^b \right]^{1/b}   (4)

where k_1 and k_2 determine the pivot point of the curve, and the parameter b determines how closely the curve follows the asymptote in the transition region. In the initial work of Daly, a value for the learning slope is chosen depending on the cortex subband; ideally, this value should depend on the uncertainty of the signal masking. f is the slope of the masking-contrast curve; its value lies between 0.65 and 1.0, with k_1 = 6^{-7/3}, k_2 = 6^{10/3} and b = 4 [2]. Typically, a similarity measure between two features (one from the original image and one from the retargeted image) corresponds to the difference between the two features, normalized by the magnitude of the feature from the original image, as in the following equation:

d_{perceptual} = \frac{1}{M} \sum_{i=1}^{M} \left( \hat{B}_{j,S} - \hat{B}_{j,T} \right)^2   (5)

where M is the total number of coefficients remaining after the sensitivity thresholding. The overall similarity measure between S and T is therefore defined by:

D(S, T) = d_{geometry} + d_{perceptual}   (6)

Typically, we search for the retargeted image T that minimizes (6), i.e.

T = \arg\min_{T} D(S, T)   (7)
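The perceptual pipeline of equations (3)–(6) can be sketched as follows. The Daly constants are those quoted in the text, but the choice f = 0.8 is an assumption (the paper only bounds f between 0.65 and 1.0), and a real implementation would use Daly's full 2D CSF rather than a scalar weight per band.

```python
import numpy as np

# Daly constants quoted in the text; f = 0.8 is an assumed value in [0.65, 1.0]
K1, K2, B, F = 6.0 ** (-7.0 / 3.0), 6.0 ** (10.0 / 3.0), 4.0, 0.8

def csf_normalize(band, n_csf):
    """Eq. (3): weight the bandelet coefficients of a band by its CSF value."""
    return band * n_csf

def visibility_threshold(b_hat):
    """Eq. (4): Daly's visibility threshold elevation for a normalized coefficient."""
    return (1.0 + (K1 * (K2 * np.abs(b_hat)) ** F) ** B) ** (1.0 / B)

def perceptual_distance(b_s, b_t):
    """Eq. (5): mean squared difference over the M retained coefficients."""
    b_s, b_t = np.asarray(b_s, float), np.asarray(b_t, float)
    return float(np.mean((b_s - b_t) ** 2))

def overall_distance(d_geometry, d_perceptual):
    """Eq. (6): D(S, T) = d_geometry + d_perceptual."""
    return d_geometry + d_perceptual
```

In this reading, one would keep in each band only the coefficients whose magnitude exceeds their visibility threshold before evaluating (5).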

This is explained in detail in the next subsection.

C. Retargeting algorithm

Let V(C^{k_2}) be the set of dyadic squares in S that are similar to a dyadic square C^{k_2} ∈ T according to (1). Similarly,


Fig. 3. Gradual retargeting algorithm.

we can use (1) to find the set of dyadic squares C^{k_2} in T that are geometrically similar to a square C^{k_1} ∈ S; we denote this set by W(C^{k_1}). Let \hat{b}^1_1, ..., \hat{b}^1_n be the pixels (the bandelet coefficients without sensitivity thresholding) in C^{k_1} having the same positions as \hat{d}_1, ..., \hat{d}_n in C^{k_2}. We measure the contribution of each pixel \hat{d}_1, ..., \hat{d}_n to equation (5) by:

E_1 = \frac{1}{N} \sum_{C^{k_1} \in V(C^{k_2}),\; C^{k_1} = \arg\min_{C^{k_2} \in T} d(C^{k_1}, C^{k_2})} \; \sum_{i=1}^{n} \left( \hat{d}_i - \hat{b}^1_i \right)^2   (8)

Similarly, we can compute the same error for the pixels \hat{b}^2_1, ..., \hat{b}^2_n of the squares in W(C^{k_1}):

E_2 = \frac{1}{P} \sum_{C^{k_2} \in W(C^{k_1}),\; C^{k_2} = \arg\min_{C^{k_1} \in S} d(C^{k_1}, C^{k_2})} \; \sum_{i=1}^{n} \left( \hat{d}_i - \hat{b}^2_i \right)^2   (9)

where P = card(W(C^{k_1})). A bidirectional error between S and T can therefore be defined by:

E = E_1 + E_2   (10)

To find the optimal retargeted image that minimizes equation (6), we proceed as in [1]: we first find an update rule, then iterate to find the optimal T that minimizes (7). To define the update rule, we have to find the bandelet coefficient that minimizes (10). To this end, E is differentiated with respect to the unknown bandelet coefficient d_i and equated to zero, which leads to the following update rule:

d_i = \frac{P \hat{b}^1_i + N \hat{b}^2_i}{P + N}   (11)
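As a sanity check on the derivation, the sketch below evaluates the per-coefficient bidirectional error of (8)–(10) and verifies that the closed form (11) is its minimizer; the scalar setting and the numeric values are illustrative.

```python
def bidirectional_error(d, b1, b2, N, P):
    """Per-coefficient contribution to E = E1 + E2 (Eqs. 8-10)."""
    return (d - b1) ** 2 / N + (d - b2) ** 2 / P

def update_coefficient(b1, b2, N, P):
    """Eq. (11): the coefficient value minimising the bidirectional error."""
    return (P * b1 + N * b2) / (P + N)

# the closed form beats nearby perturbed values
d_star = update_coefficient(2.0, 4.0, N=3, P=1)  # -> 3.5
e_star = bidirectional_error(d_star, 2.0, 4.0, 3, 1)
assert all(e_star <= bidirectional_error(d_star + eps, 2.0, 4.0, 3, 1)
           for eps in (-0.1, 0.1))
```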

Finally, the retargeting algorithm can be outlined as follows:
1) Compute the bandelet transform of the source image S.
2) Find the initial guess of T by gradually scaling down S, and compute the bandelet transform of each intermediate T.
3) Find V(C^{k_2}) and W(C^{k_1}) using (1) for each target T.
4) Choose the target T that minimizes (6).
These steps are illustrated in figure 3. A series of intermediate targets T_0, T_1, ..., T_K is obtained from S by gradual scaling down. Each target T_k is initialized as a scaled-down version of S; iterative refinement is then performed on T_k by using the update rule (11).
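The four steps above can be sketched as a gradual scale-and-refine loop. For readability, "images" here are 1-D signals, scaling is plain linear interpolation, and `refine` is a crude stand-in for the bandelet-domain refinement of rule (11); all names and simplifications are illustrative assumptions, not the authors' code.

```python
import numpy as np

def scale_down(x, size):
    """Stand-in for image scaling: linear resampling of a 1-D signal."""
    return np.interp(np.linspace(0, 1, size), np.linspace(0, 1, len(x)), x)

def refine(S, T, n_iters=3):
    """Stand-in for the iterative refinement with update rule (11):
    pull the target toward the correspondingly resampled source."""
    ref = scale_down(S, len(T))
    for _ in range(n_iters):
        T = 0.5 * (T + ref)
    return T

def retarget(S, target_size, n_steps=4):
    """Gradual retargeting (Section III-C): build intermediate targets
    T_1, ..., T_K by progressive scaling, refining each one."""
    sizes = np.linspace(len(S), target_size, n_steps + 1).astype(int)[1:]
    T = np.asarray(S, dtype=float)
    for size in sizes:
        T = scale_down(T, size)  # step 2: initial guess by gradual scaling
        T = refine(S, T)         # iterative refinement of each intermediate T_k
    return T

S = np.sin(np.linspace(0.0, np.pi, 64))
T = retarget(S, 32)  # retargeted to half width
```

The gradual schedule matters: each intermediate target starts close to the previous one, so the refinement only has to correct a small change at every step.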

IV. EXPERIMENTAL RESULTS

In this section, we validate the proposed method through several experiments assessing quality and efficiency. Figures 4 and 5 show the results obtained by our method, by the recently proposed method of Wang et al. [9], and by the method of Avidan and Shamir [7]. Images 4(a) (300 × 400) and 5(a) (500 × 333) have been resized by half. Visually, our method outperforms the other two methods on the arc de triomphe image and has the same visual quality as the method of Wang et al. on the couple image. For a more objective quality assessment, we used the two reduced-reference image quality metrics proposed in [10] and [5]. Both metrics score the quality of a retargeted image using partial information from the original image, taking into account the properties of the human visual system. The scores are shown in Figs. 4 and 5 (S1 for the metric of Wang et al. [10] and S2 for the metric of Maalouf et al. [5]). Our method achieved better quality scores than the other two methods. Figures 6(a) and 6(b) show the scores obtained with the metric proposed in [10] for different sizes of the retargeted image. Our method achieved better scores than the approach of Avidan and Shamir and slightly better scores than the method of Wang et al., which confirms the visual assessment above.


Fig. 5. (a) Original couple image and results obtained by the method proposed in: (b) [7] (S1 = 0.34, S2 = 0.29), (c) [9] (S1 = 0.91, S2 = 0.90) and (d) our method (S1 = 0.93, S2 = 0.92)

Fig. 6. Quality scores [10] vs. percentage of gradual scaling (e.g. 50 corresponds to 50% of the original image size): '- -' our method, '-o' the method of Wang et al., '-' the method of Avidan et al. and '-□' the method of Simakov et al. [1]. (a) arc de triomphe image; (b) couple image.

Fig. 4. (a) Original arc de triomphe image and results obtained by the method proposed in: (b) [7] (S1 = 0.45, S2 = 0.38), (c) [9] (S1 = 0.80, S2 = 0.75) and (d) our method (S1 = 0.9, S2 = 0.88).

V. CONCLUSION

In this paper, we proposed an efficient image retargeting algorithm that resizes images to smaller sizes while preserving the global geometric and perceptual features of the image. In the proposed algorithm, we first defined geometric and perceptual similarity measures, both based on the bandelet transform. These similarity measures are then used to find a target image that contains as many of the geometric and perceptual features of the original image as possible. The experimental results demonstrate the improved image quality and efficiency of the proposed resizing algorithm.

REFERENCES

[1] D. Simakov, Y. Caspi, E. Shechtman, and M. Irani, "Summarizing visual data using bidirectional similarity," IEEE Conference on Computer Vision and Pattern Recognition (2008), 1-8.
[2] S. Daly, "The visible differences predictor: an algorithm for the assessment of image fidelity," Proc. SPIE 1666 (1992), 2-15.
[3] F. Liu and M. Gleicher, "Automatic image retargeting with fisheye-view warping," UIST '05: Proceedings of the 18th Annual ACM Symposium on User Interface Software and Technology (2005), 153-162.
[4] F. Liu and M. Gleicher, "Video retargeting: automating pan and scan," MULTIMEDIA '06: Proceedings of the 14th Annual ACM International Conference on Multimedia (2006), 241-250.
[5] A. Maalouf, M.-C. Larabi, and C. Fernandez-Maloigne, "A grouplet-based reduced reference image quality assessment," IEEE First International Workshop on Quality of Multimedia Experience (QoMEX) (2009).
[6] G. Peyré and S. Mallat, "Surface compression with geometric bandelets," ACM Transactions on Graphics 24 (2005), no. 3.
[7] S. Avidan and A. Shamir, "Seam carving for content-aware image resizing," ACM Transactions on Graphics (SIGGRAPH) 26 (2007), no. 3, 10-18.
[8] V. Setlur, S. Takagi, R. Raskar, M. Gleicher, and B. Gooch, "Automatic image retargeting," Proceedings of the 4th International Conference on Mobile and Ubiquitous Multimedia (2005), 59-68.
[9] S.-F. Wang and S.-H. Lai, "Fast structure-preserving image retargeting," IEEE International Conference on Acoustics, Speech and Signal Processing (2009), 1049-1052.
[10] Z. Wang and E. P. Simoncelli, "Reduced-reference image quality assessment using a wavelet-domain natural image statistic model," Human Vision and Electronic Imaging X, Proc. SPIE 5666 (2005).
