Vis Comput (2011) 27:853–860 DOI 10.1007/s00371-011-0559-x
ORIGINAL ARTICLE
Saliency-driven scaling optimization for image retargeting
Dong Wang · Guiqing Li · Weijia Jia · Xiaonan Luo
Published online: 25 March 2011 © Springer-Verlag 2011
Abstract This paper proposes a saliency-weighted scaling factor energy for image retargeting. Considering that salient objects should be scaled with larger scaling factors than nonsalient regions, we define a quadratic energy that relates the scaling factor of a local region to its saliency. The quadratic energy is the weighted sum of the squared scaling factors, where the weight of each scaling factor is inversely proportional to the corresponding saliency. Furthermore, a triangle similarity quadratic energy is introduced to prevent salient regions from being distorted. Compared to previous methods, our approach not only preserves the shapes of salient objects and the integrity of the whole image well, but also allocates more resolution to salient objects in the target image, even when the aspect ratio is unchanged.

Keywords Image retargeting · Scaling factor energy · Visual saliency · Mesh deformation

D. Wang · W. Jia
City University of Hong Kong, Hong Kong, China

G. Li
South China University of Technology, Guangzhou, China

X. Luo
Sun Yat-Sen University, Guangzhou, China

1 Introduction
With the popularization of digital cameras, digital images have become ubiquitous, and many kinds of image optimization techniques have emerged in response. Among these, image retargeting aims to display a picture on devices with different resolutions and aspect ratios [1] while preserving the shape and size of salient objects. The same need frequently arises in the layout of web pages with multimedia content. Linear interpolation often fails because it introduces serious distortion or shrinkage of salient objects [2]. A related topic is highlighting objects of interest in a photo by enlarging them while keeping the image size unchanged and its integrity undamaged [3]. Another related topic is thumbnail generation, which summarizes the salient objects of a picture in a small image that serves as an icon or abstract in a web page [4].

By examining previous image resizing approaches, we find that most of them concentrate on techniques for preserving the shape of salient objects. The resolution these methods give to salient regions is generally not high enough, leaving room for further improvement. Although some methods try to explicitly control the size of salient objects, their experimental results are not yet competitive. Moreover, when the size of the target image remains unchanged or is changed uniformly along both the vertical and horizontal directions, some (mainly mesh-based) methods degenerate to simple linear interpolation scaling.

We present a novel method that attempts to ensure that, during retargeting, the scaling factor in salient regions is larger than in nonsalient regions, so that more resolution is allocated to salient objects in the target image. The main contribution of the paper is therefore casting image retargeting as the minimization of a saliency-weighted scaling factor energy. This usually lends more pixels to salient objects in the target image even when the source image is scaled equally along the vertical and horizontal directions (including the case in which the image size remains unchanged).
Furthermore, multilevel scaling for different salient objects can also be achieved by assigning saliency values to them independently.
2 Related work
A great amount of work on image retargeting has been reported in the literature. Roughly speaking, image resizing methods can be classified into two categories. The first class consists of discrete approaches, which generally adjust the image size by inserting or removing pixels one by one or region by region [5–8]. The second class generally views the problem as the constrained deformation of a mesh overlaying the source image, with texture mapping in the deformed mesh space [9–16]. Because both mesh deformation and texture mapping are continuous transformations, we call this kind of method continuous. In addition, there are hybrid methods, which generally consist of an optimized sequence of resizing operators drawn from both classes [17–19]. Finally, as an extension, video retargeting has also been explored extensively (see [20–22] for examples). Due to space limitations, we only briefly discuss continuous image retargeting methods, which are closely related to our approach. For more details, one can refer to the surveys [23, 24].

Mesh-based methods provide globally continuous solutions to image retargeting. Gal et al. [9] presented a Laplacian-editing-based texturing method that allows an image to be warped in a feature-aware manner by constraining user-specified features to comply with a similarity transformation. Wang et al. [10] resized images by a scale-and-stretch warping technique that minimizes the distortion energy of mesh quads plus the bending energy of the mesh grid lines. Zhang et al. [11] defined a unified similarity transformation constraint for each local salient region in order to preserve the shape of salient objects. Karni et al. [12] addressed image resizing as an application of their energy-based image deformation, in which each point is associated with a set of transformations. Guo et al. [13] presented a method based on mesh parameterization that can exaggerate salient objects by explicitly specifying their scaling factors, which are not directly related to object saliency; their approach results in a nonlinear optimization problem. Jin et al. [16] extended the idea in [12] to an arbitrary triangulated mesh over an image and exploited a similar energy to constrain the deformation of the mesh by considering scaling transformations. These approaches share a common feature: they use deformation energies of different forms to prevent the target mesh from shape distortion and, therefore, usually degenerate to linear interpolation scaling if the retargeted image has the same aspect ratio as the source. The proposed method is also based on mesh deformation, but it directly optimizes a weighted scaling factor energy which encodes the rule that the higher the saliency, the larger the corresponding scaling factor. This allows our approach to avoid the above artifact easily.

3 Scaling factor driven optimization
We first introduce our notation before describing the approach in detail. Given a source image $\mathcal{I}_s$ of size $m_s \times n_s$, we want to adjust it to a target image $\mathcal{I}_t$ of size $m_t \times n_t$. Like [10], we create a planar quadrilateral mesh on the image space, denoted as $M = \langle V, E, F \rangle$, by uniformly placing $m \times n$ grid lines over $\mathcal{I}_s$, where $V$ is the vertex set, $E$ the edge set, and $F$ the face set. Further, denote the edge set of $f \in F$ by $E(f)$, the coordinates of $v \in V$ by $(v(x), v(y))$, and the edge with endpoints $u$ and $v$ by $(u, v)$. Our goal is to deform $M$ to $M' = \langle V', E', F' \rangle$ according to the content of $\mathcal{I}_s$, where $E' = E$ and $F' = F$. For $v \in V$, we always use $v'$ to denote its counterpart in $V'$. The boundaries of $M'$ should be registered with the target image $\mathcal{I}_t$, which is finally generated by applying routine texture mapping to $M'$. Boundary alignment is achieved by the following constraints on the boundary vertices of $M'$: (i) $v'(x) = v(x)\, n_t / n_s$ for all vertices $v \in V$ on the vertical boundaries; (ii) $v'(y) = v(y)\, m_t / m_s$ for all vertices $v \in V$ on the horizontal boundaries.

As a preprocessing step, we assign each face $f \in F$ a saliency value $\omega(f)$ characterizing the visual saliency of the corresponding image region. The saliency map $I$, which assigns a saliency value to each pixel, can be evaluated by any saliency computation method; our implementation employs the graph-based saliency detector described in [25]. The saliency $\omega(f)$ of face $f$ is then defined as the sum of the saliency values of all pixels in $f$. Since $\omega(f)$ will be used as a denominator in the scaling factor energy, we normalize all values of $\omega(f)$ to the range $[0.001, 1]$. Let $\bar{\omega}$ be the average of all face saliency values; we reset $\omega(f) = 1$ if $\omega(f)$ is above $\bar{\omega}$. In our implementation, we also allow the user to interactively specify the saliency values of the faces covering an object. This is useful when the user wants to shrink some objects to set off other, enlarged ones.
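The per-face saliency assignment described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code: the pixel saliency map is given as a list of rows (from any detector, e.g. [25]); grid-line positions are rounded to pixel boundaries; and the mapping into $[0.001, 1]$ is assumed to be linear min-max scaling, which the text does not specify. The name `face_saliency` is hypothetical.

```python
from typing import List

Grid = List[List[float]]


def face_saliency(sal: Grid, m: int, n: int) -> Grid:
    """Hypothetical helper: per-face saliency omega(f) on a uniform grid.

    `sal` is a pixel saliency map (list of rows). Placing m horizontal and
    n vertical grid lines yields (m - 1) x (n - 1) faces; each face sums the
    saliency of the pixels it covers.
    """
    rows, cols = len(sal), len(sal[0])
    # Pixel coordinates of the grid lines (assumed: rounded to integers).
    ys = [round(i * rows / (m - 1)) for i in range(m)]
    xs = [round(j * cols / (n - 1)) for j in range(n)]
    w = [[sum(sal[y][x]
              for y in range(ys[i], ys[i + 1])
              for x in range(xs[j], xs[j + 1]))
          for j in range(n - 1)]
         for i in range(m - 1)]
    # Assumed: linear min-max mapping into [0.001, 1] so that omega(f) > 0.
    flat = [v for row in w for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0
    w = [[0.001 + 0.999 * (v - lo) / span for v in row] for row in w]
    # Faces above the average saliency are reset to 1 (fully salient).
    mean = sum(v for row in w for v in row) / len(flat)
    return [[1.0 if v > mean else v for v in row] for row in w]
```

For a 4 × 4 map whose top-left quadrant is 1 and whose other pixels are 0, with m = n = 3 (a 2 × 2 grid of faces), this returns `[[1.0, 0.001], [0.001, 0.001]]`.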
Let $s(f) = (s_x(f), s_y(f))$ denote the scaling factors along the X and Y directions, respectively, for transforming $f \in F$ to $f' \in F'$. Concretely, for $(u, v) \in E(f)$ and $(u', v') \in E(f')$, we define $s(f)$ by

$$u'(x) - v'(x) = s_x(f)\,\bigl(u(x) - v(x)\bigr), \qquad u'(y) - v'(y) = s_y(f)\,\bigl(u(y) - v(y)\bigr). \tag{1}$$

Section 3.1 presents in detail how to define the scaling energy for deforming the mesh nonuniformly. Section 3.2 introduces some additional energies to improve the quality of the mesh. Section 3.3 describes the total energy used by the proposed method.

3.1 Saliency-driven scaling energy
Our basic idea comes from a simple observation: the more salient a region is, the larger the scaling factors of its faces should be. Specifically, we assume that
the scaling factor of a face is linearly proportional to its saliency, namely,

$$\frac{1}{\omega(f)}\, s(f) = \frac{1}{\omega(g)}\, s(g), \qquad \forall f, g \in F. \tag{2}$$
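The one-row special case worked out below can be checked numerically. The sketch assumes a mesh with a single row of n − 1 faces of equal width n_s/(n − 1) and s_y(f) = 1, so the target width forces the sum of s_x(f) to equal (n − 1) n_t/n_s; stationarity of the Lagrangian then makes s_x(f) proportional to ω(f). The function names are hypothetical and the code only illustrates this idealized case, not the full mesh optimization.

```python
from typing import List


def row_scaling_factors(omega: List[float], n_s: float, n_t: float) -> List[float]:
    """Minimize sum(s^2 / w) subject to sum(s) * n_s / (n - 1) == n_t.

    omega holds the saliency of the n - 1 faces of a one-row mesh.
    Stationarity, 2 s / w + lambda = 0, gives s = c * w; the width
    constraint then fixes the constant c.
    """
    k = len(omega)                 # k = n - 1 faces
    total = k * n_t / n_s          # required sum of s_x(f)
    c = total / sum(omega)
    return [c * w for w in omega]


def scaling_energy(s: List[float], omega: List[float]) -> float:
    """The weighted energy sum_f s_x(f)^2 / omega(f)."""
    return sum(si * si / w for si, w in zip(s, omega))
```

With ω = (1, 0.25, 0.25, 0.5) and the width halved (n_s = 400, n_t = 200), the result is (1, 0.25, 0.25, 0.5): the most salient face keeps its size while the others shrink, and the energy (2.0) is lower than for uniform scaling by 0.5 (2.75). With n_t = n_s, the same saliency values give (2, 0.5, 0.5, 1), so the salient face is enlarged even though the image width is unchanged.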
Without loss of generality, consider a simple case in which mesh $M$ has only one row of faces ($m = 2$) and $s_y(f) = 1$ holds for all $f \in F$. Since the initial mesh is generated uniformly, all faces have the same size and each face has width $n_s/(n-1)$, so we have

$$\sum_{f \in F} s_x(f)\, \frac{n_s}{n-1} = n_t.$$

This yields

$$\sum_{f \in F} s_x(f) = (n-1)\,\frac{n_t}{n_s}. \tag{3}$$

In this case, (2) reduces to

$$\frac{1}{\omega(f)}\, s_x(f) = \frac{1}{\omega(g)}\, s_x(g), \qquad \forall f, g \in F. \tag{4}$$

We claim that (4) is equivalent to minimizing

$$\sum_{f \in F} \frac{s_x^2(f)}{\omega(f)} \tag{5}$$

under the constraint (3), whose right-hand side is a constant. To show this, we employ the Lagrange multiplier method for constrained optimization:

$$\Phi(\ldots, s_x(f), \ldots, \lambda) = \sum_{f \in F} \frac{s_x^2(f)}{\omega(f)} + \lambda \Bigl( \sum_{f \in F} s_x(f) - (n-1)\,\frac{n_t}{n_s} \Bigr).$$

Setting $\nabla \Phi = 0$ gives $2 s_x(f)/\omega(f) + \lambda = 0$ for every $f \in F$, so $s_x(f)/\omega(f) = -\lambda/2$ is the same for all faces, which is exactly (4). A similar conclusion holds for meshes comprising a single column of faces. This motivates us to replace (2) with the minimization of the following energy, which achieves a similar effect:

$$\xi_{sc} = \sum_{f \in F} \frac{\|s(f)\|^2}{\omega(f)}. \tag{6}$$

It should be pointed out that (2) usually cannot be satisfied for general meshes, because it overconstrains them. Therefore, the equivalence between (2) and the minimization of (6) no longer holds. For convenience in combining other constraints, we substitute the vertex coordinates of $M$ and $M'$ for the scaling factors in (6) according to (1). To guarantee that salient objects are scaled uniformly, we derive the relationship between the scaling factors and the vertices of $M$ and $M'$ separately for salient regions and for other areas. For salient regions, i.e., faces with saliency $\omega(f) = 1$, we follow [21] and enforce all faces to share the same scaling factor $s_I$, that is, $s_x(f) = s_y(f) = s_I$. Nonsalient faces usually do not follow uniform scaling and therefore have independent scaling factors. The expressions of the scaling factors for the two cases are given in the Appendix.

3.2 Additional energy
Face similarity In this part, the vertex subscripts of a face are understood modulo 3 or 4, depending on whether a triangle or a quad face is involved. For a triangle $t = (v_0, v_1, v_2)$, let $\theta_i = \angle v_{i-1} v_i v_{i+1}$. Since the edge vector $v_i - v_{i+1}$ is obtained from $v_{i+2} - v_{i+1}$ by a scaling and a rotation, we can easily derive

$$v_i = (U - \Lambda_i)\, v_{i+1} + \Lambda_i\, v_{i+2}, \qquad i = 0, 1, 2,$$

where

$$U = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad \Lambda_i = \frac{\|v_i - v_{i+1}\|}{\|v_{i+2} - v_{i+1}\|} \begin{pmatrix} \cos\theta_{i+1} & -\sin\theta_{i+1} \\ \sin\theta_{i+1} & \cos\theta_{i+1} \end{pmatrix}.$$

For the deformation from $t$ to $t' = (v'_0, v'_1, v'_2)$, we define the following energy to measure its shape distortion:

$$\psi(t, t') = \sum_{i=0}^{2} \bigl\| v'_i - (U - \Lambda_i)\, v'_{i+1} - \Lambda_i\, v'_{i+2} \bigr\|^2. \tag{7}$$

Now, let $f = (v_0, v_1, v_2, v_3)$ be a quad face of $M$ and $f' = (v'_0, v'_1, v'_2, v'_3)$ its deformed version in $M'$. Their similarity is measured by the deformation of the four triangles $t_i = (v_i, v_{i+1}, v_{i+2})$, $i = 0, 1, 2, 3$:

$$\psi(f, f') = \sum_{i=0}^{3} \psi(t_i, t'_i). \tag{8}$$

To give salient regions higher similarity, we introduce the following weighted similarity constraint:

$$\xi_{sm} = \sum_{f \in F} \omega(f)\, \psi(f, f'). \tag{9}$$
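Energies (7) and (8) can be evaluated compactly by representing 2D points as complex numbers x + iy, an implementation choice assumed here rather than taken from the paper: multiplying by the complex ratio (v_i − v_{i+1})/(v_{i+2} − v_{i+1}) applies exactly the scaled rotation Λ_i, with the sign of the angle handled automatically. The function names are hypothetical; this is a sketch for evaluating the energy, not the solver.

```python
from typing import List, Sequence


def rest_lambdas(tri: Sequence[complex]) -> List[complex]:
    """Lambda_i of the rest triangle as complex numbers r * exp(i * theta).

    Multiplying by this number applies the scaled rotation Lambda_i, so
    v_i = v_{i+1} + Lambda_i (v_{i+2} - v_{i+1}), which is the relation
    v_i = (U - Lambda_i) v_{i+1} + Lambda_i v_{i+2}.
    """
    return [(tri[i] - tri[(i + 1) % 3]) / (tri[(i + 2) % 3] - tri[(i + 1) % 3])
            for i in range(3)]


def triangle_energy(rest: Sequence[complex], deformed: Sequence[complex]) -> float:
    """psi(t, t') of Eq. (7); vanishes when t' is a rotated, uniformly
    scaled and translated copy of t."""
    lam = rest_lambdas(rest)
    e = 0.0
    for i in range(3):
        a, b, c = deformed[i], deformed[(i + 1) % 3], deformed[(i + 2) % 3]
        e += abs(a - b - lam[i] * (c - b)) ** 2
    return e


def quad_energy(rest: Sequence[complex], deformed: Sequence[complex]) -> float:
    """psi(f, f') of Eq. (8): sum over triangles t_i = (v_i, v_{i+1}, v_{i+2})."""
    return sum(
        triangle_energy([rest[(i + k) % 4] for k in range(3)],
                        [deformed[(i + k) % 4] for k in range(3)])
        for i in range(4))
```

A rotated, uniformly scaled and translated copy of the unit square gives a numerically zero energy, whereas stretching the square horizontally by a factor of 2 gives a positive energy; weighting each face's value by ω(f) and summing gives (9).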
Fig. 1 Equal resizing along two directions: (a) original image of size m × n and its saliency; (b) retargeted image with the height reduced first and then the width; (c) retargeted image in the reverse order with respect to (b); (d) retargeted image along both directions simultaneously; (e) our result. The results (b–d) are produced by OSS [10]. All retargeted images are of size m/2 × n/2

Mesh regularity As salient and nonsalient regions are transformed with different scaling factors according to their saliency, edges may deviate severely from their original orientation after deformation. This can be alleviated by minimizing the orientation deviation, which we regard as improving mesh regularity. Since our initial mesh contains only horizontal and vertical edges, we simply penalize the deviation by

$$\xi_{ed} = \sum_{(u, v) \in E} \delta(u, v), \tag{10}$$

where

$$\delta(u, v) = \bigl(u'(x) - v'(x)\bigr)^2 \bigl(u(y) - v(y)\bigr) + \bigl(u'(y) - v'(y)\bigr)^2 \bigl(u(x) - v(x)\bigr).$$

For a vertical edge, $u(x) = v(x)$, so only the first term remains and it penalizes the horizontal offset between the deformed endpoints; horizontal edges are handled symmetrically by the second term. It should be pointed out that minimizing (10) serves to prevent the edges from bending [10].

Line pattern preservation Assume $l$ is a line segment that intersects $k$ mesh edges $\{(u_i, v_i)\}_{i=0}^{k-1} \subset E$ at points $p_i$. We can write each $p_i$ as a combination of the edge endpoints $u_i$ and $v_i$:

$$p_i = \lambda_i u_i + (1 - \lambda_i)\, v_i, \qquad 0 \le \lambda_i \le 1, \quad i = 0, 1, \ldots, k-1.$$

We assume that the corresponding deformed points satisfy the same combination, that is, $p'_i = \lambda_i u'_i + (1 - \lambda_i)\, v'_i$. Since the deformed segment $l'$ usually cannot keep the slope of $l$, we follow [21] and define the collinearity energy

$$\phi(l) = \sum_{i=0}^{k-1} \bigl( a_l\, p'_i(x) + b_l\, p'_i(y) + c_l \bigr)^2, \tag{11}$$

where $a_l$, $b_l$, $c_l$ are unknown; they must be estimated in advance as described in Sect. 3.3 and are normalized so that $a_l^2 + b_l^2 + c_l^2 = 1$. Denoting the set of line segments by $L$, all line pattern preservation constraints can be expressed as

$$\xi_{ln} = \sum_{l \in L} \phi(l). \tag{12}$$

3.3 The total optimization energy
So far, we have discussed four kinds of energy for optimizing the constrained mesh deformation. We define their weighted sum as our final objective function:

$$\xi(M') = \alpha\, \xi_{sc} + \beta\, \xi_{sm} + \gamma\, \xi_{ed} + \eta\, \xi_{ln}. \tag{13}$$
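The second stage of the two-stage scheme described next needs the coefficients $a_l$, $b_l$, $c_l$ of (11) for each detected segment, obtained by fitting a line to the deformed intersection points $p'_i$. The text does not say how the line is fitted; the sketch below assumes a total least-squares fit (the principal axis of the points) and then rescales the coefficients to $a_l^2 + b_l^2 + c_l^2 = 1$ as stated for (11). `fit_line` and `collinearity_energy` are hypothetical names.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def fit_line(points: List[Point]) -> Tuple[float, float, float]:
    """Assumed total-least-squares fit of a*x + b*y + c = 0.

    The line passes through the centroid along the direction of largest
    scatter; (a, b) is its unit normal. The result is then rescaled so that
    a^2 + b^2 + c^2 = 1, the normalization given for Eq. (11).
    """
    k = len(points)
    mx = sum(p[0] for p in points) / k
    my = sum(p[1] for p in points) / k
    sxx = sum((p[0] - mx) ** 2 for p in points)
    syy = sum((p[1] - my) ** 2 for p in points)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    # Angle of the principal direction of the 2x2 scatter matrix.
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    a, b = -math.sin(angle), math.cos(angle)   # unit normal to that direction
    c = -(a * mx + b * my)                     # line passes through centroid
    norm = math.sqrt(a * a + b * b + c * c)    # >= 1, never zero
    return a / norm, b / norm, c / norm


def collinearity_energy(points: List[Point],
                        coeffs: Tuple[float, float, float]) -> float:
    """phi(l) of Eq. (11) for the deformed intersection points p'_i."""
    a, b, c = coeffs
    return sum((a * x + b * y + c) ** 2 for x, y in points)
```

Points lying exactly on y = x + 1 give a numerically zero energy with their fitted coefficients, and moving one point off the line makes it positive. With the alternative normalization a_l² + b_l² = 1, φ(l) would equal the sum of squared point-to-line distances; the sketch follows the normalization given in the text.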
Note that (13) is not a quadratic function because of the line pattern preservation term, so minimizing it leads to a nonlinear optimization problem. Instead of solving the nonlinear optimization directly, we adopt a two-stage scheme. In the first stage, we perform mesh deformation without line pattern preservation, namely with $\eta = 0$; the problem then reduces to a quadratic optimization. In the second stage, we first find the new positions of the intersections in $M'$ and then perform a line fitting operation to calculate the coefficients $a_l$, $b_l$, $c_l$ (see (11)) for each detected line segment. Substituting these coefficient values into (13) again yields a least squares problem. This stage can be repeated until the result is satisfactory.

4 Implementation and results
This section presents some examples to show the new features of our approach. The parameter values $\alpha = 0.1$, $\beta = 1$, $\gamma = 10$, and $\eta = 2$ are used in all cases. Figure 1 presents an example of uniformly retargeting an image (Fig. 1(a)) to half size. Figure 1(b) is produced by
Fig. 2 Comparisons with seam carving and OSS methods: (a) the original image and its saliency; (b) seam carving; (c) linear scaling; (d) OSS [10]; (e) our method
Fig. 3 Comparisons with OSS and NSO algorithms: (a) original image and its saliency; (b) OSS [10]; (c) NSO [16]; (d) our method
performing the optimized scale-and-stretch method (OSS) [10] twice, first reducing the height of the original image by half and then the width, while Fig. 1(c) is generated in the reverse order. Both require computing the saliency map of the intermediate image in addition to that of the original image, and the meshes clearly show that the shapes of the fish are deformed. Figure 1(d) is generated by applying the OSS operation once; in this case, the OSS method degenerates to linear interpolation scaling. Figure 1(e) is our result. Although performing the OSS method multiple times may avoid uniform scaling and sometimes prevent salient objects from shrinking (Fig. 1(b)), our approach tends to preserve the shapes of salient objects and the structure of the original image better.

Figure 2 gives two examples comparing our method with typical discrete and continuous retargeting methods. Figure 2(a) shows the original images of size m × n, and Fig. 2(b–e) the target images of size 2m/3 × n/2. Figure 2(b–d) are the results created by seam carving [5], linear scaling, and OSS [10], respectively. It can be observed that our approach (Fig. 2(e)) best preserves the shapes of salient objects in these examples. Figure 3 shows two more examples comparing our algorithm with mesh-based (continuous) methods: OSS
[10] and nonhomogeneous scaling optimization (NSO) [16]. Compared with our method (Fig. 3(d)), NSO (Fig. 3(c)) tends to suppress the vertical extent of the fish, while OSS (Fig. 3(b)) usually shrinks salient objects. Figure 3(a) depicts the original images of size m × n and their saliency maps; the target images in these examples are of size m/2 × 2n/3. To further compare our algorithm with OSS [10], we select 6 images and resize each of them to five target sizes with both methods. For each target image, we estimate the ratio between the area of the salient objects in the target image and that in the corresponding original image; the 6 ratios for each target size are then averaged for each method. Figure 4 shows that our algorithm outperforms OSS for all target sizes. We also note that the ratio is about 0.25 when the source images are uniformly scaled to half size by OSS [10], as expected for linear scaling with both dimensions halved. We also present an example comparing our method with the mesh parameterization based approach (MP) [13] in
Fig. 4 The average ratio of the area of salient objects in the target images to that in the corresponding original images. The horizontal axis is the size of the target images and the vertical axis is the corresponding ratio. The blue line shows the statistics for the method in [10] and the pink line those for our approach
Fig. 5, in which the original image of size 544 × 800 is resized to 800 × 800. Our method appears to give a better visual result. Enlarging salient objects while simultaneously suppressing others is useful in photo composition [3]. Our method can achieve this goal automatically without losing information. Here we give two examples. The lamps at the top of Fig. 6(a) are too small, and our algorithm magnifies them as salient objects, as shown at the top of Fig. 6(b). The flowers at the bottom are exaggerated by interactively assigning them a larger saliency value than other salient regions.

Limitation In most cases, our method produces the expected results, as shown in Fig. 4, and it performs well in preventing salient objects from shrinking. However, as the scaling factor of a mesh face is closely tied to its saliency, retargeting may sometimes overenlarge salient objects that are small relative to the whole source image. Figure 7 shows such a case, in which the trees in the target image are larger than those in the original image. Although one can alleviate the artifact by reducing the saliency value of salient faces or adjusting the blending parameters, precise control is difficult.

Fig. 5 Comparisons with OSS and MP methods: (a) original image; (b) OSS [10]; (c) MP [13]; (d) our method

Fig. 6 Object enlargement: (a) original image; (b) retargeted image

Fig. 7 Failure case: (a) original image; (b) target image

5 Conclusions
We present a novel image retargeting approach in which the scaling energy makes the local scaling factor of a quad proportional to its saliency and the triangle similarity energy constrains the distortion of salient regions. Compared to previous continuous image retargeting methods, our method allocates more pixels to salient objects even when the image size remains unchanged or is scaled by the same ratio along the vertical and horizontal directions. As future work, it would be valuable to set the blending parameters of the energy terms adaptively according to image saliency, so that the salient objects in a target image are large enough but not over-magnified.

Acknowledgement We thank the anonymous reviewers for their valuable comments and Dr. Hong-bo Fu for helpful discussion. Thanks also go to Jonathan Harel for providing source code for saliency detection; Yu-shuen Wang for the executable program for producing OSS results; Ligang Liu and Yong Jin for generating NSO results; and Yan-wen Guo for generating parametrization-based results. Wei-jia Jia is supported by the Research Grants Council of the Hong Kong SAR, China (CityU 114908, 114609). Guiqing Li is supported by NSFC (60973084), NSF of Guangdong (9151064101000106), and the Fundamental Research Funds for the Central Universities (2009 zz0016). Xiao-nan Luo's research is supported by the Joint Funds of NSFC-Guangdong (U0835004, U0935004).

Appendix: The calculation of scaling factors
In salient regions, combining (1), we derive the following linear system:

$$u' - v' = s_I\,(u - v), \qquad \forall (u, v) \in E(f),\ f \in F \wedge \omega(f) = 1.$$

This is an overdetermined system with the single variable $s_I$, and its least squares solution is

$$s_I = \frac{\sum_{f \in F,\, \omega(f) = 1}\ \sum_{(u, v) \in E(f)} \langle u' - v',\, u - v \rangle}{\sum_{f \in F,\, \omega(f) = 1}\ \sum_{(u, v) \in E(f)} \|u - v\|^2},$$
where $\langle \cdot, \cdot \rangle$ denotes the dot product. In nonsalient regions, the scaling factors in (1) are independent, that is,

$$s_x(f) = \frac{\sum_{(u, v) \in E(f)} \bigl(u(x) - v(x)\bigr)\bigl(u'(x) - v'(x)\bigr)}{\sum_{(u, v) \in E(f)} \bigl(u(x) - v(x)\bigr)^2}, \qquad s_y(f) = \frac{\sum_{(u, v) \in E(f)} \bigl(u(y) - v(y)\bigr)\bigl(u'(y) - v'(y)\bigr)}{\sum_{(u, v) \in E(f)} \bigl(u(y) - v(y)\bigr)^2}.$$

References
1. Setlur, V., Lechner, T., Nienhaus, M., Gooch, M.: Retargeting images and video for preserving information saliency. Computer Graphics and Applications 27(5), 80–88 (2007)
2. Shamir, A., Avidan, S.: Seam carving for media retargeting. Communications of the ACM 52(1), 77–85 (2009)
3. Liu, L., Chen, R., Wolf, L., Cohen-Or, D.: Optimizing photo composition. Computer Graphics Forum (Proceedings of Eurographics) 29(2), 469–478 (2010)
4. Lam, H., Baudisch, P.: Summary thumbnails: readable overviews for small screen web browsers. In: CHI 2005, pp. 681–690
5. Avidan, S., Shamir, A.: Seam carving for content-aware image retargeting. ACM Trans. Graph. 26(3), 267–276 (2007)
6. Achanta, R., Süsstrunk, S.: Saliency detection for content-aware image resizing. In: ICIP 2009, pp. 1001–1004
7. Stephan, K., Johannes, K., Hendrik, L., Wolfgang, E.: FSCAV: fast seam carving for size adaptation of videos. In: ACM Multimedia 2009, pp. 321–330
8. Li, Z., Ishwar, P., Konrad, J.: Video condensation by ribbon carving. IEEE Transactions on Image Processing 18(11), 2572–2583 (2009)
9. Gal, R., Sorkine, O., Cohen-Or, D.: Feature aware texturing. In: Proceedings of Eurographics Symposium on Rendering, pp. 297–303 (2006)
10. Wang, Y., Tai, C., Sorkine, O., Lee, T.: Optimized scale and stretch for image resizing. ACM Trans. Graph. 27(5), 1–8 (2008)
11. Zhang, G., Cheng, M., Hu, S., Martin, R.: A shape-preserving approach to image resizing. Computer Graphics Forum 28(7), 1897–1906 (2009)
12. Karni, Z., Freedman, D., Gotsman, C.: Energy-based image deformation. Computer Graphics Forum 28(5), 1257–1268 (2009)
13. Guo, Y., Liu, F., Shi, J., Zhou, Z., Gleicher, M.: Image retargeting using mesh parametrization. IEEE Trans. Multimedia 11(4), 856–867 (2009)
14. Ren, T., Liu, Y., Wu, G.: Image retargeting using multi-map constrained region warping. In: Proceedings of the Seventeenth ACM International Conference on Multimedia, pp. 853–856 (2009)
15. Pavić, D., Kobbelt, L.: Two-colored pixels. Computer Graphics Forum (Proceedings of Eurographics) 29(2), 743–752 (2010)
16. Jin, Y., Liu, L., Wu, Q.: Nonhomogeneous scaling optimization for realtime image resizing. The Visual Computer 26(6), 769–778 (2010)
17. Rubinstein, M., Shamir, A., Avidan, S.: Multi-operator media retargeting. ACM Trans. Graph. 28(3), 23:1–11 (2009)
18. Dong, W., Paul, J.: Adaptive content-aware image resizing. Computer Graphics Forum 28(3), 1–10 (2009)
19. Dong, W., Zhou, N., Paul, J., Zhang, X.: Optimized image resizing using seam carving and scaling. ACM Trans. Graph. 28(5), 125:1–10 (2009)
20. Rubinstein, M., Shamir, A., Avidan, S.: Improved seam carving for video retargeting. ACM Trans. Graph. 27(3), 1–9 (2008)
21. Philipp, K., Manuel, L., Alexander, H.: A system for retargeting of streaming video. ACM Trans. Graph. 28(5), 126:1–10 (2009)
22. Wang, Y., Lin, H., Sorkine, O., Lee, T.: Motion-based video retargeting with optimized crop-and-warp. ACM Trans. Graph. 29(4), 90:1–9 (2010)
23. Shamir, A., Sorkine, O.: Visual media retargeting. In: Proceedings of ACM SIGGRAPH Asia Courses (2009)
24. Vaquero, D., Turk, M., Pulli, K., Tico, M., Gelfand, N.: A survey of image retargeting techniques. In: Applications of Digital Image Processing XXXIII, Proc. SPIE (2010)
25. Harel, J., Koch, C., Perona, P.: Graph-based visual saliency. Advances in Neural Information Processing Systems 19, 545–552 (2007)

Dong Wang is a Ph.D. candidate in the Department of Computer Science, City University of Hong Kong, China. Her research interests include video and image processing and computer graphics.
Guiqing Li is a Professor in the School of Computer Science and Engineering, South China University of Technology, China. His research interests include computational geometry, computer graphics, and CAD/CAM applications.
Weijia Jia is a Professor in the Department of Computer Science, City University of Hong Kong, China. His research interests include mobile computing and video processing.
Xiaonan Luo is a Professor in the School of Information Science and Technology and Director of the Computer Application Institute, Sun Yat-sen University, China. His research interests include mobile computing, computer graphics, and CAD.