Fast Multi-Operator Image Resizing and Evaluation

Dong WM, Bao GB, Zhang XP et al. Fast multi-operator image resizing and evaluation. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 27(1): 121–134 Jan. 2012. DOI 10.1007/s11390-012-1211-6

Wei-Ming Dong1,2 (董未名), Member, CCF, ACM, IEEE, Guan-Bo Bao1 (鲍冠伯), Xiao-Peng Zhang1 (张晓鹏), Member, ACM, and Jean-Claude Paul2

1 Sino-French Laboratory for Computer Science, Automation and Applied Mathematics/National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China

2 National Institute for Research in Computer Science and Control, Domaine de Voluceau Rocquencourt Le Chesnay 78153, France

E-mail: {Weiming.Dong, Guanbo.Bao, Xiaopeng.Zhang}@ia.ac.cn; [email protected]

Received April 13, 2011; revised April 21, 2011.

Abstract Current multi-operator image resizing methods succeed in generating impressive results by using an image similarity measure to guide the resizing process. An optimal operation path is found in the resizing space. However, their slow resizing speed, caused by the inefficient computation strategy of bidirectional patch matching, becomes a drawback in practical use. In this paper, we present a novel method to address this problem. By combining seam carving with scaling and cropping, our method can realize content-aware image resizing very fast. We define cost functions combining image energy and dominant color descriptor for all the operators to evaluate the damage to both local image content and global visual effect. Therefore our algorithm can automatically find an optimal sequence of operations to resize the image by using dynamic programming or a greedy algorithm. We also extend our algorithm to indirect image resizing, which can protect the aspect ratio of the dominant object in an image.

Keywords image resizing, multi-operator, operator cost, indirect resizing

1 Introduction

With the rapid growth of display device diversity and versatility today, new demands are placed on digital media. Image resizing, as one of the most useful and widely used techniques in relevant areas, has accordingly been greatly improved. Recently, content-aware methods such as seam carving, non-homogeneous warping and patch transform were proposed as supplements to the traditional content-oblivious methods such as scaling and cropping. A content-aware resizing operator preserves prominent objects according to an importance map. Different measures can be combined to determine the pixel significance, including gradients, saliency and entropy. High-level cues such as face detectors, texture detectors and motion detectors can also be employed. In general, the resizing quality largely depends on the image itself: single-operator methods might work well for some target sizes but not for others.

Multi-operator methods[1-2] are more effective techniques for content-aware image resizing. Techniques of this kind can combine different operators in an optimal manner instead of searching for a “best” operator that will work on all images. An image similarity measure based on dynamic time warping (DTW)[1] or image Euclidean distance (IMED)[2] is used to optimize the search for the best resizing result. However, the computation of patch matching in DTW and IMED becomes the main efficiency bottleneck. Moreover, because the optimization computation grows exponentially with the number of operators, the resizing process slows further when more operators are employed (2∼10 minutes for 2 operators, 10∼20 minutes for 3 or 4 operators). This drawback makes the techniques difficult to apply in interactive usage.

To combine several operators there is a need to compare the operation damage to the current image and evaluate different resizing results. The users’ preference is always the most important measure. Hence, we need to analyze the users’ preference of using different operators. Given the evaluation measure, we also need a scheme that integrates the users’ feedback into the resizing process.

Regular Paper This work is supported by the National Natural Science Foundation of China (NSFC) under Grant Nos. 60872120, 60902078, 61172104, the Natural Science Foundation of Beijing under Grant No. 4112061, the Scientific Research Foundation for the Returned Overseas Chinese Scholars of State Education Ministry of China, the French System@tic Paris-Region (CSDL Project), and the National Agency for Research of French (ANR)-NSFC under Grant No. 60911130368. ©2012 Springer Science + Business Media, LLC & Science Press, China


J. Comput. Sci. & Technol., Jan. 2012, Vol.27, No.1

Fig.1. Original image is resized using seam carving (SC), scaling (SL), cropping (CR), our multi-operator algorithm and the mean result of the user study. (a) Original. (b) By SC. (c) By SL. (d) By CR. (e) By ours. (f) User mean. The multi-operator algorithm combines seam carving, scaling and cropping to optimize our new similarity measure. We conduct a user study to let users select their favorite images. In the result, 36% of the users choose our result, 44% choose the user mean result and the other 10% choose the cropping result. Our result still gets high support although it does not agree with the users’ preference. More cropping operations are used in (f) than in (e).

In this paper, we introduce a new operator cost-based approach that performs multi-operator image resizing, as well as other methods, in a direct manner. Seam carving, scaling and cropping are combined to resize the given images. Image energy and dominant color descriptor (DCD) are combined to formulate the operator cost functions. We conduct user studies to analyze the users’ compromise tendency when an original image has to be damaged by an operator. A coefficient is employed to revise the operator costs so that the users’ preference is reflected in the results. Our approach is fast, straightforward to implement, and can generate results comparable to the barycenter of users’ choices (Fig.1).

We propose cost functions to evaluate how much damage each operator will cause to the current image. An objective function is also formulated to optimize the resizing process. A novel criterion is defined for operator selection at each resizing step. It can stochastically choose an operator based on the statistical analysis of the operator costs. The best path (i.e., sequence of operators) is found by minimizing the objective function according to the cost functions. We show how to encode the new criterion into dynamic programming. Moreover, a new optimization algorithm is proposed, which dramatically increases the speed of multi-operator resizing without damaging the visual quality. We also provide a semi-automatic tool to help the user select the best result from several images according to his own visual preference. In addition, we illustrate the extension of our algorithm for indirect resizing in order to preserve the aspect ratios of the dominant objects. It is worth noting that our multi-operator algorithm can support several types of similarity measures as well as different resizing operators.

In summary, our specific contributions are:
• analysis of the users’ preference on using seam carving, scaling and cropping, through user studies;
• a new operator cost function combining image energy and DCD for evaluating the information loss in the image being resized, which is fast and easy to implement;
• a novel stochastic method for operator selection at each resizing stage;
• a new and fast quantitative global similarity measure between the source and target images;
• an interactive multi-operator image resizing framework which tightly integrates the users’ real visual preference;
• extension to indirect resizing for shape preservation.

2 Related Work

Image resizing is crucial for displaying visual media at different resolutions and aspect ratios. Traditional methods work by uniformly scaling the image to a target size without considering the image content. These methods equally propagate the distortion throughout the entire image and noticeably squeeze prominent objects. To overcome this shortcoming, many approaches attempted to remove the unimportant information from the image periphery[3-6] . Another way is to use a face detector[7] and a saliency measure[8-10] . The image is cropped to fit the target aspect ratio and then uniformly resized by traditional interpolation. More sophisticated cropping approaches usually require human intervention to create an optimal window for the most appropriate portion of the scene. These methods work well for some special applications such as surveillance[11] , but tend to fail in general photo editing applications. Besides, prominent objects may be removed by cropping methods especially when the output resolution is significantly lower than the input resolution.

Wei-Ming Dong et al.: Fast Multi-Operator Image Resizing and Evaluation

Recently, seam carving (SC) methods have been proposed to retain important contents while reducing or removing other image areas[12-13]. These techniques reduce or expand uniform regions scattered throughout the image, by removing or duplicating monotonic pixel-wide low-energy seams. SC produces impressive results, but may deform important content, especially structural objects, when the homogeneous information in the required spatial direction runs out. Moreover, the visual effect coming from the global composition of the image may also be damaged in the output, because such techniques only preserve the “important” objects.

Continuous resizing methods have been demonstrated by using image warping. To minimize the resulting distortion, the local regions are squeezed or stretched by globally optimizing warping functions. The earlier work of Gal et al.[14] warped an image according to the user specifications, preserving the shape of masked regions. Wolf et al.[15] automatically transformed an existing video to fit the dimensions of an arbitrary display based on local importance detectors. Zhang Y F et al.[16] employed shrinkability maps and a random walk model to accelerate the scaling process and decrease the storage requirements. Wang et al.[17] presented a “scale-and-stretch” warping method. The method iteratively updates a warped image that matches optimal local scaling factors. However, since the distortion is distributed in all spatial directions, some objects may be excessively distorted, damaging the global spatial structure of the original image. Guo et al.[18] constructed a mesh image representation that is consistent with the underlying image structures. The limitation is that the emphasis on the relative scale of a salient object will inevitably distort its nearby objects. Krähenbühl et al.[19] presented a video retargeting framework which combines key frame-based constraint editing with numerous automatic algorithms for video analysis.
The limitation of this method is that in some cases the warping may fall back to linear scaling when the warp does not have sufficient degrees of freedom to compress regions without violating feature constraints. Wang et al.[20] preprocessed the full video to keep the distortion constant across the optical flow which results in improved temporal coherence for complex motion. Kim et al.[21] proposed a divide-and-conquer approach to media retargeting based on Fourier analysis. Zhang G X et al.[22] presented a shape-preserving approach to ensure that the new shapes of prominent objects are geometrically similar to their original shapes both locally and globally. The limitation is that the method cannot guarantee to strictly preserve edges. Huang et al.[23] presented a novel framework for preserving the global structure in


images and vector art. The accuracy of this method relies on robust structure detection methods. Wu et al.[24] developed symmetry-summarization to catch and summarize repetitive structural contents in an image when there is little overlapping.

Patch-based methods are also presented for image retargeting or image summarization. Cho et al.[25] chose patch arrangements that fit well together to change the size of an image. The main drawback of this method is that it cannot preserve the completeness of the image. Pritch et al.[26] represented operations such as image retargeting, object removal, or object rearrangement as an optimal graph labeling and used graph cut to solve this problem. Barnes et al.[27] presented a new randomized algorithm for quickly finding approximate nearest-neighbor matches between image patches. Interactive tools for image resizing and reshuffling are also presented.

When one operator does not perform well, it is natural to extend it to a multi-operator. Rubinstein et al.[1] presented an image resizing algorithm (denoted as Multi-Op) to combine different operators in an optimal manner. Bi-cubic scaling, cropping and seam carving are used together in the process. They proposed DTW as a similarity measure between images to compare and evaluate the resizing results. Dong et al.[2] resized an image by performing seam carving and scaling coherently. An image distance measure based on IMED and DCD is defined for quantifying the quality of a resizing result. Similar global measures are also defined, such as bi-directional similarity[28] and inverse texture synthesis[29]. All these methods are based on patch matching: two images I and T are considered similar if all patches of I can find similar patches in T, and vice versa.

3 User Study

The content of an image will be damaged to some extent in the resized result, no matter how well the resizing operator is designed. Previous quantitative image similarity measures only automatically compare the inner contents of the images, without taking into account people’s real visual tendency. As shown in Fig.2, the quantitatively optimal result is not visually better than our result and the ground truth. We attribute this to people’s sensitivity to discontinuities of object boundaries.

We conduct a user study to test the above supposition. We build a local program to separately record every user’s operation data. We first present users with an original image in one window, and in the other three windows, resized images that are retargeted in one or


Fig.2. Comparison of Multi-Op[1] , our method and user mean result. (a) Original. (b) By Multi-Op. (c) By our method. (d) User mean result. All methods use two operators (SC and SL) to resize the original image. In our user study, 64% of the users choose (d) as their favorite image, 32% choose (c), and only 4% pick (b).

two dimensions to a fixed size (the change in size was randomly chosen between 30% and 60% of the original size). The resized images in the three windows are generated by using CR, SC and SL separately. We specifically choose images that contain either structure or content that presents difficulties for the three methods. Most of the images are classic images used in previous work. The visual appearance of every original image is severely damaged in the results. Users are asked to select the image they dislike least from the results. The user study data are then gathered and analyzed. Fig.3(a) summarizes the results of this experiment with fifty participants of different age, gender, and education background. As can be seen from the chart, in most cases, the users would compromise on the damage caused by CR or SL rather than SC. It means that the


users can more easily accept content loss (caused by CR) or shape deformation (caused by SL) than the destruction of local object structures. This is very important information which helps us to develop the operator cost functions. We should stop using the SC operator in time, or even in advance, when it is believed to have begun to destroy the local structures of the image, and change the operations to CR or SL. Usually the energy of the SC operator is smaller than that of SL and CR due to its optimized scheme, so we will add a factor to the cost function to adjust the SC cost value. The factor value will be adaptively set according to the image features.

In the second experiment, we evaluated the users’ viewing preferences between CR and SL. This information is important for operator selection in the resizing algorithm when SC will not be used any more. We use more examples in order to get accurate data. The original images are randomly reduced to 70% to 90% in one or two dimensions. We use larger resized images than in the first user study in order to be sure that the image contents are not destroyed too much. In fact the main reason is that this time we need to acquire the users’ like-most information instead of the dislike-least information of the first study. Fig.3(b) shows that between CR and SL the users do not show a clear tendency, but we can still notice that in many more examples the users prefer CR to SL.

Besides the above two user studies, we also conducted another two interesting experiments by letting the participants choose their dislike-least or like-most resized images without giving the original images. The

Fig.3. User study of fifty participants clearly indicates the people’s acceptance tendency of damaged images. (a) Comparing the users’ compromise tendency of using CR, SC and SL to damage the image contents. Only in two examples there are more users who compromise to SC than CR and SL. And in many cases the users only choose CR and SL. (b) Comparing the users’ viewing preference of using CR and SL to resize the images.


results show that the users’ preferences are even clearer than in the experiments with the original images.

4 Operator Cost

The effectiveness of using an energy-based measure to estimate pixel importance has been demonstrated in previous work[12,17]. In our algorithm, we also use image energy to detect the loss of prominent information during the resizing process. We define the pixel energy as

$$e(I) = \sqrt{\left(\frac{\partial}{\partial x} I\right)^2 + \left(\frac{\partial}{\partial y} I\right)^2}, \tag{1}$$

which is the L2 norm of the gradient. Smaller values mean less importance. A saliency map can also be integrated to determine the attractiveness of a region, if necessary. We use the method in [10] to extract the saliency map.

The energy can indicate the presence of local structures. However, the global visual effects are also important. In many applications such as content-based image retrieval, color features are commonly used to represent the global information of images, which are relatively independent of the viewing angle, translation, and rotation of the objects and regions of interest. In our algorithm, we use the dominant color descriptor (DCD) to describe the global information of the original image. A DCD specifies a small number of dominant color values and their statistical properties: distribution and variance[30]. The structure of a DCD, $F$, is defined as

$$F_I = \{p_j, c_j, v_j\}, \quad j = 1, 2, \ldots, N_{DCD}, \tag{2}$$

where $N_{DCD}$ is the number of dominant colors ($N_{DCD} = 16$ in our experiments), $p_j$ is the percentage of pixels in the image corresponding to the $j$-th dominant color, $v_j$ is a vector representing the $j$-th dominant color, and $c_j$ is the variation of the dominant color values of the pixels around $v_j$. In our algorithm,


we use Dong et al.’s method[2] to extract DCD. A new measure of DCD difference is proposed, which can accurately evaluate the damage to the global visual effect. The energy term and DCD term are combined to formulate our operator cost.

5 Fast Image Resizing

During the resizing process, an operator $O$ is employed to reduce or enlarge an image either in its width or in its height. In this paper, we use seam carving (SC), scaling (SL) and cropping (CR) as the resizing operators. Each time one pixel is removed from or added to the width or height of the image. In our algorithm, cropping is used only for reducing image size. We remove the side with lower cost separately for width and height. For scaling, we always perform a scale-by-$k$ each time rather than applying one-pixel scaling $k$ times. Fig.4(b) shows the gradient and dominant color of the image in Fig.4(a). These two parts are both important in evaluating the operator cost. Formally, let $I$ be an original image and $O = \{O_1, \ldots, O_n\}$ be a collection of $n$ operators. We define the cost function to be

$$C(O_l(I)) = (1.0 - \omega_{dcd}) \cdot E(s_l) + \omega_{dcd} \cdot D(O_l(I)), \tag{3}$$

where $s_l$ is the operation field of the operator $O_l$ on the image $I$ currently being resized, and $O_l(I)$ is the resized image after performing operator $O_l$. $E(s_l)$ is the energy information which measures the damage to the local object structures, and $D(O_l(I))$ evaluates the global visual information loss by employing DCD. We describe in detail how to calculate $E(s_l)$ and $D(O_l(I))$ in Subsections 5.1 and 5.2.

5.1 Energy Information Cost

The energy information $E(s)$ is calculated by considering the image energy:

$$E(s) = \frac{1}{N_s} \sum_{i=1}^{N_s} e(s_i) + \max_{1 \le i \le N_s} e(s_i), \tag{4}$$

Fig.4. We integrate the energy and the dominant color into a function as the operator cost. The energy part combines an average and a max operators. (a) Original image. (b) 1) Gradient, 2) Dominant color. (c) By [17]. (d) By ours. (e) User mean. (f) No MAX item. The images in (c), (d) and (e) respectively get 28%, 32% and 40% support during our user study. Our algorithm takes 4.5 seconds to resize the image.


where $s_i$ represents one pixel in the operation field $s$, and $N_s = \|s\|$ is the number of pixels in $s$. For SC, $s$ is the seam which is proposed to be removed or inserted. For SL, we set $s$ to be the whole image because all the pixels will be affected by the SL operator. Suppose we are reducing the width of the image and the height is $h$; we then use the average of the top $h$ maximum energy values as the max operator value of SL. For CR, $s$ only needs to count boundary pixels. The max operator is necessary because resizing visual artifacts are often affected by a small number of deformed elements. Fig.4 illustrates the resizing results of using Scale-and-Stretch[17], our method and user mean. We can see the importance of the max operator in preserving the shapes and boundaries of the prominent objects.

As discussed in Section 3, the damage of local structures caused by SC is the most unacceptable artifact for users. Therefore, we add an additional term to revise the SC energy in (4) so that the algorithm can stop using the SC operator in time. In our experiments, we find that the resizing quality of using pure SC operation is closely related to the richness of the original image. As suggested in [31], the richness of the image can be roughly characterized by the amount of image edges. We detect the edge number $n_s$ by the Sobel detector and $n_c$ by the Canny detector. The edge number $n_e$ of an image is

$$n_e = \kappa \cdot n_s \cdot g\left(-\frac{n_s - 5000}{500}\right) + n_c \cdot g\left(\frac{n_s - 5000}{500}\right),$$

where $g(x) = \frac{1}{1 + \exp(-x)}$ is a sigmoid function and $\kappa = 8$ is a constant parameter to make the edge numbers of Sobel and Canny comparable. We denote $r = \frac{n_e}{\|I\|}$ as the richness ratio of the original image, where $\|I\|$ is the number of the pixels. We calculate the relative standard deviation $r_{rsd}$ of the edge numbers of all the image patches. We separately calculate three $r_{rsd}$ values by using 8 × 8, 16 × 16 and 32 × 32 patches and use the average value as the final result. We tested more than 500 images downloaded from Flickr.
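The quantities defined so far can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation: the forward-difference gradient with a replicated border, the list-of-rows image layout, and the function names `pixel_energy`, `field_cost` and `blended_edge_count` are all assumptions made for the example. The edge counts are passed in as plain numbers, so no particular Sobel or Canny implementation is implied.

```python
import math

def pixel_energy(img):
    """Gradient-magnitude energy of (1) for a grayscale image (list of rows).

    Assumption: forward differences with a replicated border; the paper does
    not state which discretization is used.
    """
    h, w = len(img), len(img[0])
    energy = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx = img[y][min(x + 1, w - 1)] - img[y][x]
            dy = img[min(y + 1, h - 1)][x] - img[y][x]
            energy[y][x] = math.hypot(dx, dy)
    return energy

def field_cost(energies):
    """E(s) of (4): mean energy plus maximum energy over the operation field."""
    return sum(energies) / len(energies) + max(energies)

def sigmoid(x):
    """Numerically stable logistic function g(x)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    ex = math.exp(x)
    return ex / (1.0 + ex)

def blended_edge_count(n_sobel, n_canny, kappa=8.0):
    """Edge number n_e: sigmoid blend of the Sobel and Canny edge counts."""
    z = (n_sobel - 5000) / 500.0
    return kappa * n_sobel * sigmoid(-z) + n_canny * sigmoid(z)

# Toy 3x3 image with a vertical edge between columns 1 and 2.
img = [[0, 0, 10],
       [0, 0, 10],
       [0, 0, 10]]
e = pixel_energy(img)
seam = [e[y][1] for y in range(3)]  # vertical seam through column 1
print(field_cost(seam))             # 20.0
```

In the toy image every pixel of the seam has energy 10, so the mean and the max both equal 10 and the seam cost is 20. The richness ratio would then follow as `blended_edge_count(n_s, n_c) / (width * height)`.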
The images are resized in one dimension to a fixed size from 30% to 60% of the original size, by using pure SC operation. We find that for most examples, the SC operator can get an acceptable result when the image richness value is from 0.055 to 0.075 and, at the same time, usually $r_{rsd} > 1.3$. As shown in Fig.5, the image

Fig.5. From left to right, the richness ratio and relative standard deviation (r, rrsd ) of the images are: (0.069, 1.391), (0.139, 0.764) and (0.099, 1.101).


on the left can get good resized results by using SC only. It means that to let SC work well, the image should not have too many details in its objects ($r$ is small) or should have a large, relatively homogeneous background ($r_{rsd}$ is big). We define a revised coefficient as

$$\alpha = \begin{cases} 1.0 + \max(0.075/r,\; r/0.055) + (1.3 - r_{rsd}), & \text{if } 0.055 < r < 0.075, \\ 1.0 + \max(r/0.075,\; 0.055/r) + (1.3 - r_{rsd}), & \text{otherwise}. \end{cases} \tag{5}$$

Finally we reformulate the energy cost of SC as $E_{SC} = E_{SC} \cdot \alpha$.

5.2 DCD Information Cost

We utilize DCD to evaluate the damage to the global visual effect of the image. We first extract the DCD of the original image using the method in [2], as formulated in (2). Then, for all pixels $s_i$ labeled with $v_j$, we define

$$f_j = \mathrm{Confidence}(v_j) = \frac{\sum_{s_i \in I_{v_j}} d(I(s_i) - v_j)}{\text{No. Pixels Labeled}(v_j)}, \tag{6}$$

where $I_{v_j}$ is the image domain defined by the pixels $s_i$ that are closest to the dominant color $v_j$, and $d$ is an L2 norm operator. The confidence map returns a weighted measure between the variance within the cluster and the number of pixels in the cluster. The lower its value is, the more reliable the estimate $v_j$ is[32]. Then we use the dominant colors $v_j$ in $F_I$ as the cluster set to extract the DCD of image $O_l(I)$, written $F_{O_l(I)}$ as in (11). The operator cost is calculated by considering the difference between $F_I$ and $F_{O_l(I)}$:

$$D(O_l(I)) = \sqrt{\sum_{j=1}^{N_{DCD}} \left[(v_j - v_j^l)^2 + (p_j - p_j^l)^2\right] \cdot (1.0 - f_j)}. \tag{7}$$

Apparently, (7) reflects the DCD information loss after performing an operator $O_l$, which is used in our algorithm to evaluate the damage to the global visual effect. In (3), we use the DCD term as a soft constraint by setting $\omega_{dcd} = 0.2$. Fig.6 shows the importance of the DCD term. The algorithm cannot obtain a good result by using only the image energy. DCD contains not only the color value information, but also the percentage of pixels in the image corresponding to each dominant color. The term $f_j$ in (7) is used to emphasize the importance of the dominant colors which are more reliable in their clusters. We can see that DCD helps to achieve a nice balance between several visually-important objects, e.g., the flower and the leaf in Fig.6(e).
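As a rough illustration of (7), the DCD difference can be computed as below. This is a sketch under stated assumptions rather than the authors' code: colors are taken to be tuples normalized to [0, 1], $(v_j - v_j^l)^2$ is read as a squared Euclidean distance, the confidences $f_j$ are assumed to lie in [0, 1] so that the weights $1 - f_j$ are non-negative, and the name `dcd_cost` is hypothetical.

```python
import math

def dcd_cost(colors_src, pcts_src, colors_dst, pcts_dst, confidence):
    """D(O_l(I)) of (7) for two DCDs that share the same cluster ordering.

    colors_*   : dominant colors v_j and v_j^l as tuples in [0, 1] (assumed)
    pcts_*     : pixel percentages p_j and p_j^l as fractions in [0, 1]
    confidence : f_j of (6), assumed normalized to [0, 1]
    """
    total = 0.0
    for v, p, v_l, p_l, f in zip(colors_src, pcts_src,
                                 colors_dst, pcts_dst, confidence):
        color_sq = sum((a - b) ** 2 for a, b in zip(v, v_l))
        total += (color_sq + (p - p_l) ** 2) * (1.0 - f)
    return math.sqrt(total)

# Two dominant colors; the operator shifts 30% of the pixels from blue to red.
colors = [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
d = dcd_cost(colors, [0.5, 0.5], colors, [0.8, 0.2], [0.0, 0.0])
print(round(d, 4))  # 0.4243
```

An operator that leaves both the dominant colors and their percentages unchanged yields a cost of zero, which matches the role of (7) as a measure of global information loss.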


Fig.6. The effect of DCD. (a) Original image. (b) 1) Gradient 2) Dominant color. (c) By SC. (d) Without DCD. (e) By ours. (f) Result of [2]. We can see that the image in (e) which integrates DCD preserves the global visual effect better.

5.3 Operator Selection

To combine the energy part $E(s)$ and the DCD part $D(O_l(I))$ into the operator cost function, we need to transform the two values to be compatible. We first use a method similar to Gibbs sampling[33] to convert the energy and DCD difference values into probability values:

$$P_E^l = \frac{1}{Z_E^l} \exp\left(-\frac{E(s_l)}{t}\right), \qquad P_{DCD}^l = \frac{1}{Z_{DCD}^l} \exp\left(-\frac{D(O_l(I))}{t}\right), \tag{8}$$

where $Z_E^l = \sum_{l=1}^{n} \exp\left(-\frac{E(s_l)}{t}\right)$ and $Z_{DCD}^l = \sum_{l=1}^{n} \exp\left(-\frac{D(O_l(I))}{t}\right)$ are the partition functions of Gibbs sampling, and $t = 15$ is a constant in our algorithm. Then the cost function of (3) is converted to

$$P(O_l(I)) = (1.0 - \omega_{dcd}) \cdot P_E^l + \omega_{dcd} \cdot P_{DCD}^l, \tag{9}$$

where $P$ is the probability of using operator $O_l$ at the current resizing step. We will use this value in the following optimization process:

$$l^* = \arg\min_{1 \le l \le n} P(\langle \sigma_l \cup O_l \rangle (I)). \tag{10}$$
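The conversion in (8) and (9) is a softmax over negated costs, and a minimal sketch makes the normalization explicit. The function names and the example cost values below are hypothetical; only $t = 15$ and $\omega_{dcd} = 0.2$ come from the text. Note that under (8) a lower cost maps to a higher probability.

```python
import math

def gibbs_probabilities(costs, t=15.0):
    """Convert per-operator costs into probabilities as in (8)."""
    weights = [math.exp(-c / t) for c in costs]
    z = sum(weights)  # partition function
    return [w / z for w in weights]

def operator_probabilities(energy_costs, dcd_costs, w_dcd=0.2, t=15.0):
    """Blend the energy and DCD probabilities as in (9)."""
    p_e = gibbs_probabilities(energy_costs, t)
    p_d = gibbs_probabilities(dcd_costs, t)
    return [(1.0 - w_dcd) * pe + w_dcd * pd for pe, pd in zip(p_e, p_d)]

# Hypothetical energy and DCD costs for three operators (e.g. SC, SL, CR).
probs = operator_probabilities([30.0, 45.0, 60.0], [0.1, 0.3, 0.2])
print([round(p, 3) for p in probs])
```

Because both inputs of (9) are normalized and the weights sum to one, the blended values again form a probability distribution over the operators.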

The DCD of the resized image $O_l(I)$ is

$$F_{O_l(I)} = \{p_j^l, c_j^l, v_j^l\}, \quad j = 1, 2, \ldots, N_{DCD}. \tag{11}$$

5.4 Optimization

We adopt the dynamic programming scheme used in [1] to optimize the search for the best mixed operator sequence. The DTW-based similarity measure is replaced by our Energy-DCD-based operator cost. In our algorithm, we also assume that the ratio of operators in a sequence is more important than their order in the sequence. We represent a sequence by $(q_1, \ldots, q_n)$ where $q_l$ denotes the total number of times of applying operator $O_l$. For a representative formulation, given a source image $I$, the problem is to seek a target image $T$ generated by a sequence that minimizes

$$\sum_{l=1}^{n} \sum_{j=1}^{q_l} P(O_l(I_j^l)), \quad I_1^1 = I. \tag{12}$$

Note that (12) is only an approximate description of the optimization problem because we ignore the operator order in the equation. A dynamic programming table is used to store the optimal cost and optimal sequence $\sigma(q_1, \ldots, q_n)$ including the order of applying all the operators. In the dynamic programming process, to fill the entry $(q_1, \ldots, q_n)$, we need to examine all its predecessor sequences $\sigma_l = \sigma(q_1, \ldots, q_l - 1, \ldots, q_n)$, $1 \le l \le n$. Following the mechanism in [1], the operator $O_l$ is appended to sequence $\sigma_l$ to get the new operator sequence $\langle \sigma_l \cup O_l \rangle$. Then we can apply this new sequence to the original image and choose the best one:

$$l^* = \arg\min_{1 \le l \le n} C(\langle \sigma_l \cup O_l \rangle (I)). \tag{13}$$
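The table-filling rule around (13) can be illustrated with a toy model. The sketch below is not the authors' implementation: the image is replaced by an abstract state, `apply_op` and `cost` are hypothetical callbacks supplied by the caller, and the damage model (seam carving gets more expensive each time it is used, scaling has constant cost) is invented purely so that the example runs.

```python
from itertools import product

def best_sequence(initial, operators, total_steps, apply_op, cost):
    """Fill a table indexed by operator counts (q_1, ..., q_n).

    Each entry keeps the cheapest sequence reaching it, chosen among its
    predecessors (one count decremented) in the spirit of (13).
    """
    n = len(operators)
    table = {(0,) * n: ([], initial)}  # counts -> (sequence, state)
    for steps in range(1, total_steps + 1):
        for counts in product(range(steps + 1), repeat=n):
            if sum(counts) != steps:
                continue
            candidates = []
            for l in range(n):
                if counts[l] == 0:
                    continue
                pred = counts[:l] + (counts[l] - 1,) + counts[l + 1:]
                seq, state = table[pred]
                new_state = apply_op(state, operators[l])
                candidates.append((cost(new_state), seq + [operators[l]], new_state))
            _, seq, state = min(candidates, key=lambda c: c[0])
            table[counts] = (seq, state)
    finals = [entry for counts, entry in table.items() if sum(counts) == total_steps]
    return min(finals, key=lambda entry: cost(entry[1]))

# Toy state: (accumulated damage, number of SC operations applied so far).
def apply_op(state, op):
    damage, sc_used = state
    if op == "SC":
        return (damage + 1 + 2 * sc_used, sc_used + 1)  # SC cost grows with use
    return (damage + 4, sc_used)                        # SL has constant cost

sequence, final_state = best_sequence((0, 0), ["SC", "SL"], 4, apply_op,
                                      cost=lambda s: s[0])
print(sorted(sequence), final_state[0])  # ['SC', 'SC', 'SL', 'SL'] 12
```

In this toy model, using SC $q$ times out of four steps costs $q^2 + 4(4 - q)$, which is minimal at $q = 2$, so the table search stops using SC once it becomes more damaging than SL. The number of table entries grows as $m^n$ with the size change $m$ and the number of operators $n$.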

In practice, as in the discrete search schemes of [1] and [2], we also sample the search space at a coarser rate than 1 pixel, applying each operator five times between stages.

5.5 Further Acceleration

Suppose we want to reduce the width $w$ of input image $I$ by $m$ pixels. The time and space complexities of using dynamic programming are $O(m^n)$, which is polynomial in the amount of size change but exponential in the number of operators to be used. One possible method of accelerating the resizing process is to use regular paths[1-2]. It means that the order of operators is fixed ahead of time, which can be denoted as $\langle k_1 \times O_{l_1}, \ldots, k_n \times O_{l_n} \rangle$, where $\sum_{j=1}^{n} k_j = m$ (see Fig.7(g)). This allows us to find the optimal result using exhaustive search in $O(m^{n-1})$, which is polynomial in the size change $m$ but exponential in the number of operators $n$. Since we usually use a small number of


Fig.7. Comparison of the results using different methods. (a) Original image. (b) Resizing results of Multi-Op[1] (mixed path). (c) (regular path, ⟨−90SC, −60SL⟩). (d) Our results (mixed path, using (13)). (e) (mixed path, using stochastic operator selection). (f) (direct mixed path, using stochastic operator selection). (g) (regular path, using stochastic operator selection, ⟨−85SC, −65SL⟩). The optimal operator sequence of (e) is ⟨−10SC, −10SL, −5SC, −5SL, −20SC, −10SL, −10SC, −45SL, −15SC, −20SL⟩. (h) SC. (i) SL.

$n$ (three or four operators), the search is feasible by sampling $m$ in discrete steps. In fact, we found that a simpler method still works well for most examples. Assume we need to reduce the image width by $m$; this can be achieved simply by selecting $m$ operators from $O$. For each step (narrowing the image width by 1 pixel), we simply pick the operator which causes the minimum cost, or use the stochastic method in Subsection 5.4 to select one. This direct mixed path method is not guaranteed to find the globally optimal result, but it is good enough for many examples. The time and space complexities of the algorithm are $O(mn)$, which is linear in both the amount of size change and the number of operators. Our greedy scheme runs much faster than dynamic programming or regular path search.

During the experiments, we noticed that for some examples a carefully designed random selection scheme tends to achieve better results than the standard min function in (13), especially when the richness of the example is high. After calculating the operator probabilities by (9), we first calculate the standard deviation $\mathrm{std}(P)$ of $P$. If $\mathrm{std}(P) < 0.03$, we calculate the cumulative probability $P'_l$ for each operator:

$$P'_1 = 0, \qquad P'_{l+1} = \sum_{i=1}^{l} P_i, \qquad l = 1, \ldots, n. \tag{14}$$
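As a rough illustration of the per-step selection described above, the following Python sketch builds the cumulative probabilities of (14) and picks one operator per removed pixel column. It is a minimal sketch under stated assumptions, not the authors' implementation: the names `cumulative_probabilities`, `select_operator`, `direct_mixed_path` and the callback `operator_probabilities` (standing in for (9)) are hypothetical, the population standard deviation is assumed for $\mathrm{std}(P)$, and the fallback to the most probable operator when $\mathrm{std}(P) \geqslant 0.03$ is an assumption, since the text does not state what happens in that case. The draw of $\alpha$ over the cumulative probabilities follows the selection rule given in the next paragraph.

```python
import bisect
import random
import statistics


def cumulative_probabilities(probs):
    """Return [P'_1, ..., P'_{n+1}] with P'_1 = 0 and P'_{l+1} = P_1 + ... + P_l."""
    cum = [0.0]
    for p in probs:
        cum.append(cum[-1] + p)
    return cum


def select_operator(probs, alpha=None, std_threshold=0.03, rng=random):
    """Pick a 0-based operator index for one resizing step."""
    # Assumption: population standard deviation; the text only says std(P).
    if statistics.pstdev(probs) >= std_threshold:
        # Assumption: when the probabilities are clearly separated,
        # fall back to the most probable operator.
        return max(range(len(probs)), key=probs.__getitem__)
    cum = cumulative_probabilities(probs)
    if alpha is None:
        # rng.random() lies in [0, 1), so alpha lies in (0, cum[-1]].
        alpha = cum[-1] * (1.0 - rng.random())
    # bisect_left gives the first index i with cum[i] >= alpha,
    # so operator i - 1 satisfies cum[i - 1] < alpha <= cum[i].
    return bisect.bisect_left(cum, alpha) - 1


def direct_mixed_path(operator_probabilities, m, rng=random):
    """Greedy loop: one operator choice per removed pixel column, O(m * n) overall."""
    path = []
    for step in range(m):
        # Hypothetical callback standing in for the probabilities of (9).
        probs = operator_probabilities(step)
        path.append(select_operator(probs, rng=rng))
    return path
```

Passing `alpha` explicitly makes the selection deterministic, which is convenient for checking the interval logic; in normal use it is left as `None` so that a fresh random number is drawn at every step.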

Finally, a random real number $\alpha$ is generated in $(0, P'_{n+1}]$ and the $l$-th operator $O_l$ is selected such that $P'_l < \alpha \leqslant P'_{l+1}$. In this paper, our results except Fig.7(d) and Fig.11 are all generated using the stochastic operator selection technique. Specifically, in our algorithm the stochastic scheme is based on a statistical analysis of the operator costs. This method can smooth the deviations during the operator cost calculations. In fact, the mathematically optimal results are not always consistent with the users’ preference; this is also the reason why our stochastic scheme can increase the quality of the results.

5.6 User Interaction

As discussed in Section 3, users’ viewing preferences are always the best standard for evaluating resizing results. In our framework, we develop a simple but very efficient method to generate several candidates from which a user can select according to his or her own preference. We first generate a result using our multi-operator algorithm and obtain the number of SC operations $N_{SC}$. Then we separately fix the SC number to $0.6 \cdot N_{SC}$, $0.8 \cdot N_{SC}$ and $1.2 \cdot N_{SC}$ to generate another three results, which together with the pure SC, CR and SL results form the final candidates. More results can be generated with the same scheme if necessary. Fig.8 shows an example of our interactive

Fig.8. Example of our interactive framework. (a) Original (©CITV). (b) By SC. (c)∼(f) Several candidate results automatically generated by our system. From (c) to (f), the SC operation numbers are: 36, 48, 60, 72.


framework. The image in Fig.8(e) is the initial candidate generated by our algorithm. Our system can then generate as many candidates as needed according to the SC number of the first result.

5.7 Indirect Resizing

The preservation of object aspect ratios is an important problem in image resizing. In our algorithm, we use a simple indirect resizing method to achieve this objective. As shown in Fig.9, we indicate a bounding box and a rough boundary in the original image for the dominant object. Assume the sizes of the original image and the bounding box are $w \times h$ and $w' \times h'$. We first resize the image to the target size with our multi-operator algorithm. The width of the bounding box $w''$ in this image is calculated as:

$$w'' = (w' - N_{SC_{w'}}) \cdot \frac{(w - N_{SC_w} - N_{CR_w}) - N_{SL_w}}{w - N_{SC_w} - N_{CR_w}}, \tag{15}$$

where $N_{SC_w}$ is the total SC operation number, $N_{SC_{w'}}$ is the number of the seams which went through the object boundary, $N_{CR_w}$ is the CR operation number, and $N_{SL_w}$ is the SL operation number. We can see that in Fig.8(e), the aspect ratio of the object has been changed. The SC number $N_{SC_h}$ for indirect resizing is calculated by the following equations:

$$\begin{cases} \dfrac{w''}{h''} = \dfrac{w'}{h'}, \\[2mm] h' \cdot \dfrac{h}{h + N_{SC_h}} = h'', \end{cases} \tag{16}$$

where $h''$ is the height of the bounding box after indirect resizing. We use the optimized SC method in [2] to increase the height of the directly resized image to $h + N_{SC_h}$ and then resize back directly by SL. Fig.9(d) shows our indirect resizing result. The aspect ratio of the bounding box $w''/h''$ is the same as the ratio $w'/h'$ in the original image. We can see that the global shape of the bunny is nicely preserved. Our method works well for many examples, especially when the shape of the object is relatively regular.
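The arithmetic behind (15) and (16) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the numeric values in the usage note are hypothetical, and the closed form for the seam count is obtained here by eliminating $h''$ from the two equations of (16), which gives $N_{SC_h} = h \cdot (w'/w'' - 1)$. In practice the result would have to be rounded to a whole number of seams, which is an assumption the text does not spell out.

```python
def bounding_box_width(w, w_box, n_sc_w, n_sc_box, n_cr_w, n_sl_w):
    """Width w'' of the bounding box after direct resizing, following (15)."""
    # Image width after seam carving and cropping, before scaling is applied.
    width_before_scaling = w - n_sc_w - n_cr_w
    # Seams through the box shrink it first; scaling then shrinks it proportionally.
    return (w_box - n_sc_box) * (width_before_scaling - n_sl_w) / width_before_scaling


def seams_to_insert(h, w_box, w_box_resized):
    """Solve (16) for N_SC_h, the number of seams to insert in height."""
    # From (16): h'' = h' * h / (h + N) and w'' / h'' = w' / h',
    # hence h / (h + N) = w'' / w' and N = h * (w' / w'' - 1).
    return h * (w_box / w_box_resized - 1.0)


def bounding_box_height(h, h_box, n_sc_h):
    """Height h'' of the bounding box after enlarging to h + N and scaling back to h."""
    return h_box * h / (h + n_sc_h)
```

For instance, with hypothetical values $w = 500$, $h = 333$, $w' = 200$, $h' = 150$, $N_{SC_w} = 100$, $N_{SC_{w'}} = 20$, $N_{CR_w} = 50$ and $N_{SL_w} = 100$, calling `bounding_box_width(500, 200, 100, 20, 50, 100)` and feeding the result to `seams_to_insert(333, 200, ...)` yields a seam count for which `bounding_box_height` restores the original ratio $w'/h'$.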

6 Results and Discussions

Our fast multi-operator framework can greatly accelerate the optimization process by using image energy and DCD information loss instead of bi-directional patch matching, without damaging the visual quality of the results. To resize a 500 × 333 image to half size with 2∼4 operators, [1] takes 120∼1200 seconds and [2] takes 40∼180 seconds, while our regular or mixed path searching algorithm uses only 10∼40 seconds. Using the direct mixed path described in Subsection 5.5 yields a further acceleration of approximately 4∼5 times, which means 4∼10 s per image. All results in this paper were calculated on a 2.53 GHz dual-core PC with 2 GB memory. This performance can generally meet the requirements of online interactive use on personal computers.

Our algorithm works very well for images with high-richness content, especially when the prominent objects occupy a large part of the original image (Fig.4). Fig.10 shows a special example in which the snowflakes add much noise to the image. This makes it difficult for patch-matching-based methods to get very good results, because such methods are very sensitive to the local shape of the objects and the noise affects the accuracy of the patch matching operation. The SC & SL method almost falls back to pure scaling in Fig.10(e). Our result in Fig.10(f) achieves the best visual appearance and the highest support from users. Here we can also see the importance of integrating the cropping operator.

In Figs. 2, 7 and 11, we compare our image resizing results with those of CR, SC, SL and Multi-Op. Our algorithm generates results visually similar to those of Multi-Op, but much faster. Our algorithm takes advantage of both the discrete SC and the continuous global scaling methods. Compared with optimized warping methods such as [16] and [17], our method is better at preserving the global visual effect and the relative spatial aspect among the local objects. At the same time, the warping methods can protect well the aspect ratio of separate objects

Fig.9. Comparison of our indirect resizing result with warping[19] and user mean result. (a) Original (©Blender Foundation). (b) By warping. (c) By our direct. (d) By our indirect. (e) User indirect. In our user study, the percentages of votes obtained are: (b) 14%, (c) 10%, (d) 40%, (e) 36%.


Fig.10. Comparison of resizing results using various image resizing methods. (a) Original image. (b) By seam carving. (c) By scaling. (d) By cropping. (e) By SC & SL. (f) By ours. The SC & SL result is generated by the method in [2]. In our user study, the percentages of votes obtained are: (b) 4%, (c) 14%, (d) 0%, (e) 16%, (f) 66%.

Fig.11. Comparison of our results (mixed path, three operators) with those of SC, SL, CR and Multi-Op[1]. (a) Original image. (b) By SC. (c) By SL. (d) By CR. (e) By Multi-Op. (f) By ours. Our results are optimized by using dynamic programming in 23 seconds (glasses example) and 20 seconds (boy and snowman example). In our user study, for the “Glass” example, the votes are: (b) 0%, (c) 6%, (d) 22%, (e) 38%, (f) 34%; for the “Snow Man” example, the votes are: (b) 0%, (c) 0%, (d) 52%, (e) 24%, (f) 24%. Users may prefer the simple cropping results because cropping does not change any inner content of the objects.

which are labeled by the importance map. For SC results, notice the discontinuities in the buildings and bridge arches, which are due to pixels being excessively removed. Compared with optimized warping, our method can better preserve the spatial proportions of the objects. As shown in the “San Francisco Heart” example, our results avoid the tree shape distortion found in the warping result and preserve the relative aspect among the objects. In the “St Angelo Castle” example, our method maintains the relative aspect ratio between the castle and the bridge. In this example, the cropping operator also avoids the over-distortion of the bridge in our results. The castle, which is important in the original image, gets too small in the warping result. In addition, these two examples also demonstrate the advantages of multi-operator

methods that integrate the cropping operator in shape and relative ratio preservation. However, warping methods still achieve better results on objects that extend to almost the full width/height of the image, such as the bridge in Fig.12. In Fig.13, we show the effect of using different DCD weights in our operator cost function. We can see that the DCD term is very useful in preserving the global visual effect of the image when a relatively homogeneous background occupies a very large part of the original image. The vote result tells us that in some cases keeping the homogeneous background is important to satisfy the users’ viewing preference. Due to the different image similarity measure definition and the acceleration-aimed approximation, the practical resizing operations appear to be different as well


Fig.12. Comparison of our results with those of the optimized warping method[17] , the user mean results and our indirect resizing results. (a) Original image. (b) By warping. (c) By ours. (d) User mean. (e) By our indirect. In our user study, for “San Francisco Heart” example, the votes are: (b) 18%, (c) 30%, (d) 28%, (e) 24%; for “St Angelo Castle” example, the votes are: (b) 8%, (c) 32%, (d) 38%, (e) 20%. The users do not always appreciate the indirect results due to the damage to the global spatial structures.

Fig.13. Comparison of different DCD weights in the operator cost function. (a) Original image. (b) By seam carving. (c) ωdcd = 0.2. (d) ωdcd = 0.5. (e) ωdcd = 0.8. We use two operators (SC and SL) to resize the image from 333 × 500 to 333 × 250. The users’ votes for the results are: (b) 4%, (c) 6%, (d) 45%, (e) 55%.

between our algorithm and previous studies. For example, in Fig.14, our algorithm performed more scaling than Multi-Op and [2]. In Fig.15, our algorithm performed more seam insertion than Multi-Op. However, the results show that the resizing quality of our

high-speed algorithm is equivalent to that of previous studies that employ computationally more complicated similarity measures. In Figs. 1, 4, 9 and 12, we can see that in many cases the automatic results are not the same as the user mean results. However, user studies show that the users also did not always agree on the “correct” result. One result may still get higher support from users even if it is not quantitatively optimal. The cropping results in Figs. 1 and 11 still get some support because the cropping operator does not change any inner content of the original. The contents in the cropped window always keep the same information as the original image. The cropping results of the “boy and snowman” example in Fig.11 get the highest vote percentage. We think this is due to the best preservation of the shape of both the boy and the snow man, although some boundary information is lost. In fact, as formulated in our energy cost, the cropping operator is easily triggered because in most cases there is little prominent content near the boundaries of an image. Using as many cropping operations as possible is consistent with the conclusions we reached during the user studies of Section 3.

Fig.14. 2D retargeting results. (a) Original image. (b) By Multi-Op. (c) By [2]. (d) By ours. The Multi-Op result is generated by applying the SSD-based image distance measure of Simakov et al.[28]. All three results are generated by combining SC and SL. In our user study, the votes are: (b) 26%, (c) 34%, (d) 40%. Our result better preserves the aspect ratio of the monitor.


Fig.15. Image enlarging. Our algorithm takes 12 seconds to find the best regular path using two operators (SC and SL). (a) Original image. (b) By SC[13] . (c) By Multi-Op[1] . (d) By ours.

Fig.16. An example where our algorithm fails. (a) Original. (b) By SC. (c) By our direct. (d) By our indirect. (e) By warping. (f) By shift-map. None of the operators (SC, SL and CR) in our framework can narrow the distance between the two birds without destroying the shape of the birds or the continuity of the wire.

7 Limitations

The operators used in our framework can work well for many images. Unfortunately, using only these three operators is not always enough. As shown in Fig.16, our algorithm cannot generate a good result even with the indirect scheme. The shift-map method[26] works better than the other algorithms when the background is highly structural or there are many small individual prominent objects in the original image. Similar to the non-homogeneous warping method, our indirect resizing algorithm will also enlarge the relatively homogeneous background and decrease the size of the dominant objects. This will somewhat damage the global spatial structure of the image. The cost function of our algorithm is based on the evaluation of the image energy and dominant colors. This may not work well for catching the artifacts of some global structures. We use a revision item formulated by combining the image richness ratio and relative standard deviation of the patch richness. This may cause excessive use of scaling or cropping in the result. For those examples, our user-interaction scheme described in Subsection 5.6 could well solve the problem. As shown in some examples, our automatic results do not always agree with the users’ preference. This problem makes our user-interaction system more important. Furthermore, the limitations imposed by the specific methods (seam carving, cropping, scaling) will

also be overcome by our solution.

8 Conclusions and Future Work

We presented a novel technique for fast and interactive multi-operator image resizing. We defined an operator cost combining image energy and DCD to find the optimal paths in resizing space, given a global objective function that evaluates the information loss during the resizing process. We employ a method similar to Gibbs sampling to make the energy and DCD difference values comparable. A stochastic algorithm was proposed to select the operators based on their selection probabilities. In practical experiments, we have observed that this scheme usually works better for high-richness images than directly choosing the operators with minimum cost. Combined with a further acceleration scheme termed direct mixed path, our algorithm can realize an acceleration of 30∼200 times compared to state-of-the-art methods such as [1] and [2] while maintaining an equivalent resizing quality. We developed an intuitive user interaction scheme to let the user choose the favorite result from a set of automatically generated candidates. In addition, we described an interactive indirect image resizing algorithm. In many examples, the algorithm can precisely preserve the aspect ratio of the dominant object. This technique partly fixes the artifacts of our multi-operator algorithm in shape preservation. In addition to the arbitrary resizing of images, our


method has potential use in video resizing. It would be necessary to consider continuity between adjacent frames, especially when there are substantial differences in their contents. Our approach has been tested on a large number of images, most of which are difficult for single resizing operators. We have noticed that in many cases our algorithm can obtain similar and sometimes even superior results to previous works. This discovery inspires us to carry out not only qualitative but also quantitative analysis of image similarity measures in future work. Integration of non-homogeneous warping and shift-map techniques into our framework is also a challenging problem.

Acknowledgements We thank Dr. Ning Zhou at Sony Corporation for the inspiring discussion of this work. We thank the following Flickr members for sharing their images on the internet: Mandychu543 (girl and snowman), seri* (lotus), Stuck in Customs (boy and pumpkin), yocca (snowing temple), Gerald Goh (boat). The Eiffel Tower, San Francisco Heart, St Angelo Castle images and results are borrowed from [17]. The car, sail boats, glasses, boy and snowman, desk, volleyball images and results are borrowed from [1]. The two birds image and results are borrowed from [26]. The blue bird image is captured from the movie “The King of Milu Deer”.

References
[1] Rubinstein M, Shamir A, Avidan S. Multi-operator media retargeting. ACM Trans. Graph., 2009, 28(3), Article No. 23.
[2] Dong W, Zhou N, Paul J C, Zhang X. Optimized image resizing using seam carving and scaling. ACM Trans. Graph., 2009, 28(5), Article No. 125.
[3] Chen L, Xie X, Fan X, Ma W, Zhang H, Zhou H. A visual attention model for adapting images on small displays. ACM Multimedia Systems Journal, 2003, 9(4): 353-364.
[4] Liu H, Xie X, Ma W Y, Zhang H J. Automatic browsing of large pictures on mobile devices. In Proc. the 11th MULTIMEDIA, Nov. 2003, pp.148-155.
[5] Suh B, Ling H, Bederson B B, Jacobs D W. Automatic thumbnail cropping and its effectiveness. In Proc. the 16th UIST, Nov. 2003, pp.95-104.
[6] Santella A, Agrawala M, DeCarlo D, Salesin D, Cohen M. Gaze-based interaction for semi-automatic photo cropping. In Proc. CHI, April 2006, pp.771-780.
[7] Viola P, Jones M J. Robust real-time face detection. Int. J. Comput. Vision, 2004, 57(2): 137-154.
[8] Itti L, Koch C, Niebur E. A model of saliency-based visual attention for rapid scene analysis. IEEE Trans. Pattern Analysis and Machine Intelligence, 1998, 20(11): 1254-1259.
[9] DeCarlo D, Santella A. Stylization and abstraction of photographs. ACM Trans. Graph., 2002, 21(3): 769-776.
[10] Walther D, Koch C. Modeling attention to salient proto-objects. Neural Networks, 2006, 19(9): 1395-1407.
[11] El-Alfy H, Jacobs D, Davis L. Multi-scale video cropping. In Proc. the 15th MULTIMEDIA, Sept. 2007, pp.97-106.
[12] Avidan S, Shamir A. Seam carving for content-aware image resizing. ACM Trans. Graph., 2007, 26(3), Article No. 10.
[13] Rubinstein M, Shamir A, Avidan S. Improved seam carving for video retargeting. ACM Trans. Graph., 2008, 27(3), Article No. 16.
[14] Gal R, Sorkine O, Cohen-Or D. Feature-aware texturing. In Proc. Eurographics Symposium on Rendering, June 2006, pp.297-303.
[15] Wolf L, Guttmann M, Cohen-Or D. Non-homogeneous content-driven video-retargeting. In Proc. the 11th ICCV, Oct. 2007, pp.1-6.
[16] Zhang Y F, Hu S M, Martin R R. Shrinkability maps for content-aware video resizing. Computer Graphics Forum, 2008, 27(7): 1797-1804.
[17] Wang Y S, Tai C L, Sorkine O, Lee T Y. Optimized scale-and-stretch for image resizing. ACM Trans. Graph., 2008, 27(5), Article No. 118.
[18] Guo Y, Liu F, Shi J, Zhou Z H, Gleicher M. Image retargeting using mesh parametrization. IEEE Trans. Multi., 2009, 11(5): 856-867.
[19] Krähenbühl P, Lang M, Hornung A, Gross M. A system for retargeting of streaming video. ACM Trans. Graph., 2009, 28(5), Article No. 126.
[20] Wang Y S, Fu H, Sorkine O, Lee T Y, Seidel H P. Motion-aware temporal coherence for video resizing. ACM Trans. Graph., 2009, 28(5), Article No. 127.
[21] Kim J S, Kim J H, Kim C S. Adaptive image and video retargeting technique based on Fourier analysis. In Proc. CVPR, June 2009, pp.1730-1737.
[22] Zhang G X, Cheng M M, Hu S M, Martin R R. A shape-preserving approach to image resizing. Computer Graphics Forum, 2009, 28(7): 1897-1906.
[23] Huang Q X, Mech R, Carr N. Optimizing structure preserving embedded deformation for resizing images and vector art. Computer Graphics Forum, 2009, 28(7): 1887-1896.
[24] Wu H, Wang Y S, Feng K C, Wong T T, Lee T Y, Heng P A. Resizing by symmetry-summarization. ACM Trans. Graph., 2010, 29(6), Article No. 159.
[25] Cho T S, Butman M, Avidan S, Freeman W T. The patch transform and its applications to image editing. In Proc. CVPR, June 2008.
[26] Pritch Y, Kav-Venaki E, Peleg S. Shift-map image editing. In Proc. the 12th ICCV, Sept. 29-Oct. 2, 2009, pp.151-158.
[27] Barnes C, Shechtman E, Finkelstein A, Goldman D B. PatchMatch: A randomized correspondence algorithm for structural image editing. ACM Trans. Graph., 2009, 28(3), Article No. 24.
[28] Simakov D, Caspi Y, Shechtman E, Irani M. Summarizing visual data using bidirectional similarity. In Proc. CVPR, June 2008.
[29] Wei L Y, Han J, Zhou K et al. Inverse texture synthesis. ACM Trans. Graph., 2008, 27(3), Article No. 52.
[30] Manjunath B, Salembier P, Sikora T. Introduction to MPEG-7: Multimedia Content Description Interface. Chichester: Wiley, 2002.
[31] Tao L, Yuan L, Sun J. SkyFinder: Attribute-based sky image search. ACM Trans. Graph., 2009, 28(3), Article No. 68.
[32] Ilea D, Whelan P. CTex-an adaptive unsupervised segmentation algorithm based on color-texture coherence. IEEE Transactions on Image Processing, 2008, 17(10): 1926-1939.
[33] Casella G, George E I. Explaining the Gibbs sampler. The American Statistician, 1992, 46(3): 167-174.


Wei-Ming Dong is an associate professor in the Sino-French Laboratory for Computer Science, Automation and Applied Mathematics (LIAMA) and National Laboratory of Pattern Recognition (NLPR) at Institute of Automation, Chinese Academy of Sciences (CAS). He received his B.Sc. and M.Sc. degrees in computer science in 2001 and 2004, both from Tsinghua University, China. He received his Ph.D. degree in computer science from the University of Henri Poincaré Nancy 1, France, in 2007. During his Ph.D. study, he worked as a research assistant at the French National Institute for Research in Computer Science and Control (INRIA) from April 2004 to June 2007. His research interests include image synthesis and realistic rendering. He is a member of CCF, ACM and IEEE.

Guan-Bo Bao received his M.S. degree in computer science from Institute of Software, CAS in 2008. He is currently a Ph.D. candidate in LIAMA at Institute of Automation, CAS. His research interests include image-based rendering, image analysis and synthesis.

Xiao-Peng Zhang received his M.S. degree in mathematics from Northwest University in 1987, and the Ph.D. degree in computer science from Institute of Software, CAS, in 1999. He is a professor in LIAMA/NLPR at Institute of Automation, CAS. His main research interests are computer graphics and pattern recognition. Dr. Zhang was invited as a foreign specialist for forest visualization in INRIA from September 2002 to August 2004. He is also a professor of Graduate University of CAS. He received the National Scientific and Technological Progress Prize (Second Class) in 2004. His research work is supported by projects from the National Natural Science Foundation of China, National High-Tech Research and Development 863 Program of China, and the French National Research Agency. He is a member of ACM.

Jean-Claude Paul is director of INRIA and a professor at Tsinghua University, Beijing, China. He received his Ph.D. degree in mathematics from the University of Paris and graduated in architecture design from the French National School of Fine Arts (ENSBA) in 1976. In 1995, he obtained the Academie des Sciences Prize and the Academie des Beaux Arts Prize, for both his artistic and scientific work. His research interests include realistic rendering, geometry processing and curves and surfaces theory.
