Journal of the Chinese Institute of Engineers, Vol. 32, No. 3, pp. 427-433 (2009)
Short Paper
INTERACTIVE OBJECT SEGMENTATION IN DIGITAL IMAGES BY FOREGROUND/BACKGROUND REGION CLASSIFICATION WITH HIERARCHICAL QUEUES
Jhing-Fa Wang*, Han-Jen Hsu, and Jyun-Sian Lee
ABSTRACT

In this paper, we present an interactive object segmentation method for digital images with which a user can easily separate a target object from the background. After the user draws foreground and background markers, the user interface shows the target object in its original colors and the background in a specific color. Compared with the tedious steps required by Adobe Photoshop, the user can approximate the contour of the target object interactively and with more efficiency. Our algorithm is based on Foreground/Background region classification, which compares the similarity of color information. First, the user interface processes the input image by watershed segmentation to produce the segmented regions. Then, some unlabeled regions are assigned as foreground regions or background regions by marker drawing. The remaining unlabeled regions are processed by Foreground/Background region classification. In our implementation, we also introduce hierarchical queues to store the unlabeled regions during region classification. The target object is segmented after Foreground/Background region classification. In our experiments, the proposed algorithm provides output with high accuracy and low user effort.

Key Words: user interface, object segmentation, color image segmentation.
I. INTRODUCTION

For object-based applications, we need to address how to find the contour of an entire object in digital images; object segmentation in images and video has therefore become an important research topic. This paper addresses "interactive object segmentation" in digital images, which separates an image into two parts, foreground and background, with user interaction. Interactive object segmentation identifies the foreground object and separates it from the background, allowing the user to cut the target object and then paste it onto another image by means of a small amount of input from the user. With our image user interface, a user with no experience in image processing can complete object segmentation confidently.

*Corresponding author. (Tel: 886-6-2746867; Fax: 886-6-2761693; E-mail: [email protected]) The authors are with the Department of Electrical Engineering, National Cheng Kung University, No. 1, Ta-Hsueh Rd., Tainan 701, Taiwan, R.O.C.

In current commercial products, the Magic Wand of Adobe Photoshop (Adobe, 2002) is a widely used image editing tool, but its operation is tedious: completing an object segmentation requires many repeated refinement steps. To reduce the tedium of delineating an entire object pixel by pixel, several techniques have been presented (Mortensen, 1999; Gleicher, 1995; Perez, 2001; Sun, 2004; Rother, 2004; Li, 2004) to simplify the operation. The related techniques comprise boundary-based methods and region-based methods. Boundary-based methods (Mortensen, 1999; Gleicher, 1995; Perez, 2001) try to approximate the pre-assigned curves to the
contour of the object. These methods include intelligent scissors (Mortensen, 1999), image snapping (Gleicher, 1995), and Jetstream (Perez, 2001). The user needs to draw curves which enclose the whole object; however, these methods still require too much effort, because the user must draw the curves explicitly near the contour of the object. To improve on such pixel-based operation, several region-based methods (Sun, 2004; Rother, 2004; Li, 2004) have been proposed. They increase accuracy without demanding as much user effort as boundary-based methods. Sun et al. proposed "Poisson Matting" (Sun, 2004) to preserve the fine features of an object, such as hair and feathers; the selected object can be cut out from the original image and pasted onto another target image. Rother et al. proposed "GrabCut" (Rother, 2004) to achieve foreground segmentation using iterated graph cuts; the target object is segmented by dragging a rectangle loosely around it. Another method, "Lazy snapping" (Li, 2004), extracts the object from user-drawn foreground and background curves. In this paper, we employ hierarchical queues to achieve a more regular implementation than that of "Lazy snapping" without sacrificing quality.

In summary, we need to find the contour of a target object, which divides the image into only two parts: foreground and background. As defined in previous papers (Rother, 2004; Li, 2004), object segmentation is considered a binary labeling problem. The primary concerns of this work are twofold: to obtain an accurate result with little user effort, and to preserve the detailed information of the object. The user draws two kinds of curves on the input image, green curves as the foreground marker and yellow curves as the background marker; the target object is then easily obtained from the output image.

The organization of this paper is as follows. Section II describes our proposed image user interface and algorithm in detail. Section III gives experimental results and comparisons as subjective measurements. Finally, the conclusions of this work are provided.

II. PROPOSED INTERACTIVE OBJECT SEGMENTATION ALGORITHM

First, the input image is pre-processed to reduce additive noise, and edge detection and watershed segmentation are applied to produce the segmented regions. Then, the user draws on the image with foreground markers (green curves) and background markers (yellow curves). Once the user finishes marker drawing, the two sets of marked regions are assigned the foreground label (F) and the background label (B), respectively; the other, non-marked, regions are defined as unlabeled regions. The unlabeled regions are then classified into foreground or background to generate the final image. The steps of the proposed algorithm are detailed below.

1. Noise Reduction

The pre-processing step removes noise which may affect the output result. We use a median filter and a mean filter; the median filter is applied first so that impulse noise is not averaged into the image by the mean filter. The size of both filter masks is 3 × 3, as in the sketch below.
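As an illustration of this step, the following is a minimal C++ sketch of 3 × 3 median-then-mean filtering, assuming a single-channel, row-major image; the function names and the border handling (borders left unfiltered) are our own, not taken from the paper.

```cpp
#include <algorithm>
#include <vector>

// Illustrative sketch: 3x3 median filter followed by a 3x3 mean filter on
// one channel. The image is a row-major byte vector; borders are left
// unfiltered for brevity.
using Image = std::vector<unsigned char>;

Image median3x3(const Image& in, int w, int h) {
    Image out = in;
    for (int y = 1; y < h - 1; ++y)
        for (int x = 1; x < w - 1; ++x) {
            unsigned char win[9]; int k = 0;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    win[k++] = in[(y + dy) * w + (x + dx)];
            std::nth_element(win, win + 4, win + 9);  // median of the 9 samples
            out[y * w + x] = win[4];
        }
    return out;
}

Image mean3x3(const Image& in, int w, int h) {
    Image out = in;
    for (int y = 1; y < h - 1; ++y)
        for (int x = 1; x < w - 1; ++x) {
            int sum = 0;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    sum += in[(y + dy) * w + (x + dx)];
            out[y * w + x] = static_cast<unsigned char>(sum / 9);
        }
    return out;
}

// Median first, so impulse noise never enters the averaging stage.
Image denoise(const Image& in, int w, int h) {
    return mean3x3(median3x3(in, w, h), w, h);
}
```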
2. Edge Detection and Watershed Segmentation

After noise reduction, edge detection and watershed segmentation are applied to divide the noise-reduced image into a large number of regions. To account for both luminance and color variation near the boundary of an object, we adopt a simple and efficient method (Gao, 2001) which incorporates luminance and color in L*a*b* color space. The gradient value is computed from erosion and dilation. Let $f(x, y)$ denote an input signal and $M_{n \times n}$ a flat structuring element of size n × n; erosion and dilation by the flat structuring element are defined as follows.

Erosion:
$$\varepsilon_n(f)(x, y) = \min\{ f(x + x_0, y + y_0) : (x_0, y_0) \in M_{n \times n} \} \quad (1)$$

Dilation:

$$\delta_n(f)(x, y) = \max\{ f(x - x_0, y - y_0) : (x_0, y_0) \in M_{n \times n} \} \quad (2)$$

Then, the morphological gradient is given as shown in Eq. (3).

Morphological gradient:

$$g(f)(x, y) = \delta_n(f)(x, y) - \varepsilon_n(f)(x, y) \quad (3)$$
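The C++ sketch below implements Eqs. (1)-(3) for a single channel, assuming a flat n × n structuring element with n odd and clamped image borders; it is illustrative only and not the authors' implementation.

```cpp
#include <algorithm>
#include <vector>

// Sketch of Eqs. (1)-(3): grayscale erosion, dilation, and morphological
// gradient with a flat n x n structuring element (n odd). Borders are
// clamped; the image is row-major.
using Channel = std::vector<float>;

Channel morphGradient(const Channel& f, int w, int h, int n) {
    const int r = n / 2;
    Channel g(w * h);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            float lo = f[y * w + x], hi = lo;
            for (int dy = -r; dy <= r; ++dy)
                for (int dx = -r; dx <= r; ++dx) {
                    int yy = std::clamp(y + dy, 0, h - 1);
                    int xx = std::clamp(x + dx, 0, w - 1);
                    lo = std::min(lo, f[yy * w + xx]);  // erosion, Eq. (1)
                    hi = std::max(hi, f[yy * w + xx]);  // dilation, Eq. (2)
                }
            g[y * w + x] = hi - lo;  // morphological gradient, Eq. (3)
        }
    return g;
}
```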
As mentioned above, we want a gradient value which incorporates both luminance information and color information. Let $g_Y(x, y)$ denote the gradient value obtained from luminance information and $g_C(x, y)$ the gradient value obtained from color information. $g_Y(x, y)$ is calculated on the Y component in YCbCr color space using Eq. (3), and $g_C(x, y)$ is calculated as follows.
$$g_C(x, y) = \sqrt{ (g_{L^*}(x, y))^2 + (g_{a^*}(x, y))^2 + (g_{b^*}(x, y))^2 } \quad (4)$$

where $g_{L^*}(x, y)$, $g_{a^*}(x, y)$, and $g_{b^*}(x, y)$ are calculated on the L*, a*, and b* components in L*a*b* color space using Eq. (3), respectively. Finally, the incorporated gradient value $g_i(x, y)$ is given as in Eq. (5).

$$g_i(x, y) = \max( g_Y(x, y), g_C(x, y) ) \quad (5)$$
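A short sketch of Eqs. (4)-(5) follows; it assumes the per-channel gradients were produced by the morphGradient() sketch above and all share one size. The function name is our own.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Sketch of Eqs. (4)-(5): fuse the luminance gradient gY (Y channel of
// YCbCr) with the color gradient gC built from the L*, a*, b* channel
// gradients.
std::vector<float> fuseGradients(const std::vector<float>& gY,
                                 const std::vector<float>& gL,
                                 const std::vector<float>& gA,
                                 const std::vector<float>& gB) {
    std::vector<float> gi(gY.size());
    for (std::size_t i = 0; i < gY.size(); ++i) {
        // Eq. (4): Euclidean magnitude over the three color-channel gradients.
        float gC = std::sqrt(gL[i] * gL[i] + gA[i] * gA[i] + gB[i] * gB[i]);
        gi[i] = std::max(gY[i], gC);  // Eq. (5)
    }
    return gi;
}
```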
We have used D65 as the CIE L*a*b* reference white point; thus the constants $X_n$, $Y_n$, and $Z_n$ are equal to 0.9504, 1.0, and 1.0889, respectively (Kasson, 1992).

After edge detection, we use watershed segmentation (Gonzalez, 1992) to divide the image into many regions. The basic concept of watershed segmentation is that an image is regarded as a three-dimensional topographic chart whose dimensions are the horizontal coordinate, the vertical coordinate, and the gray level of the gradient image. In a traditional immersion watershed algorithm, the procedure is analogous to water flooding from smaller gradient values to larger gradient values. When the water reaches a pixel p with gradient value h, there are three possible situations during flooding: (1) there is no basin around p, i.e., the gradient values of the neighbors of p within a 3 × 3 block are all greater than or equal to h, so we set p as a new basin; (2) there is exactly one basin adjacent to p, so we include p in that basin; (3) more than one basin is adjacent to p, so we build a "dam" on p. After flooding up to the maximal gradient value, we obtain the segmented regions bounded by the dams.
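A condensed C++ sketch of this flooding process is given below. It visits pixels in increasing gradient order and applies the three cases directly; the plateau handling of a full immersion watershed (e.g., Vincent-Soille) is omitted, so treat this as a schematic of the idea rather than the authors' implementation.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Simplified immersion watershed over a quantized gradient image. Pixels
// are visited in increasing gradient order; each pixel is resolved by the
// three cases described in the text.
constexpr int UNVISITED = -2, DAM = -1;

std::vector<int> watershed(const std::vector<int>& grad, int w, int h) {
    std::vector<int> label(w * h, UNVISITED);
    std::vector<int> order(w * h);
    std::iota(order.begin(), order.end(), 0);
    std::stable_sort(order.begin(), order.end(),
                     [&](int a, int b) { return grad[a] < grad[b]; });

    int nextBasin = 0;
    const int dx[8] = {-1, 0, 1, -1, 1, -1, 0, 1};
    const int dy[8] = {-1, -1, -1, 0, 0, 1, 1, 1};
    for (int p : order) {
        int x = p % w, y = p / w;
        int basin = UNVISITED;   // basin seen among already-flooded neighbors
        bool conflict = false;   // true when two different basins meet at p
        for (int k = 0; k < 8; ++k) {
            int xx = x + dx[k], yy = y + dy[k];
            if (xx < 0 || xx >= w || yy < 0 || yy >= h) continue;
            int q = label[yy * w + xx];
            if (q < 0) continue;             // unvisited pixel or dam
            if (basin == UNVISITED) basin = q;
            else if (basin != q) conflict = true;
        }
        if (basin == UNVISITED) label[p] = nextBasin++;  // case 1: new basin
        else if (!conflict)     label[p] = basin;        // case 2: join basin
        else                    label[p] = DAM;          // case 3: build a dam
    }
    return label;
}
```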
3. Marker Drawing of Foreground/Background

All regions are unlabeled before this step. The user now draws foreground and background markers on the image; drawing more markers on the foreground and background leads to a more exact result. The foreground markers are drawn in green and the background markers in yellow. Once the user marks the image, the corresponding regions are labeled with the foreground label (F) or the background label (B), respectively; the user-marked regions thus become labeled regions, while the non-marked regions remain unlabeled. The remaining non-marked regions are processed by Foreground/Background region classification in the next step.

4. Foreground/Background Region Classification

Foreground/Background region classification is the most important step in our algorithm, since it determines the quality of the output. The principle of this step is to compare the similarity of color information. The remaining unlabeled regions, which are not user-marked foreground or background regions, are labeled in this step: regions belonging to the object receive the foreground label (F) and the remaining regions the background label (B).

In our implementation, we adopt hierarchical queues. Regions with the same index number are put into the same queue; for instance, three green balls in the same queue denote three regions with the same index number. This data structure was originally used for image segmentation (Gao, 2001), but the mechanism also suits the Foreground/Background region classification of our proposed algorithm.

We then construct the region adjacency graph (RAG), which represents the relation between each region and its neighbors. The RAG is defined as an undirected graph G = (V, E), where V = {1, 2, 3, ..., k} is the set of graph nodes, k is the number of regions obtained from watershed segmentation, and E ⊂ V × V is the set of graph edges. Each region is represented by a graph node, and a graph edge (x, y) exists if the two graph nodes x and y are adjacent. The regional color distance of two adjacent regions is defined as follows.

$$RCD(x, y) = \| C(x) - C(y) \| \quad (6)$$

where C(·) denotes the mean color vector of a region in L*a*b* color space, x ∈ V, y ∈ V, and (x, y) ∈ E.
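A minimal sketch of building the RAG and evaluating Eq. (6) is shown below, assuming the watershed label map has been post-processed so that every pixel (including former dam pixels) carries a region index in 0..k-1; the struct and function names are ours.

```cpp
#include <cmath>
#include <set>
#include <vector>

// Sketch of the RAG and the regional color distance of Eq. (6). `label`
// holds one region index per pixel (dam pixels assumed already merged into
// a neighboring region); `lab` holds one L*a*b* triple per pixel.
struct Color { double L, a, b; };

struct RAG {
    std::vector<Color> mean;         // C(.): mean L*a*b* color per region
    std::vector<std::set<int>> adj;  // graph edges E
};

RAG buildRAG(const std::vector<int>& label, const std::vector<Color>& lab,
             int w, int h, int numRegions) {
    RAG g;
    g.mean.assign(numRegions, Color{0, 0, 0});
    g.adj.assign(numRegions, std::set<int>{});
    std::vector<int> count(numRegions, 0);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            int p = y * w + x, r = label[p];
            g.mean[r].L += lab[p].L; g.mean[r].a += lab[p].a;
            g.mean[r].b += lab[p].b; ++count[r];
            // 4-connected region transitions become graph edges.
            if (x + 1 < w && label[p + 1] != r) {
                g.adj[r].insert(label[p + 1]); g.adj[label[p + 1]].insert(r);
            }
            if (y + 1 < h && label[p + w] != r) {
                g.adj[r].insert(label[p + w]); g.adj[label[p + w]].insert(r);
            }
        }
    for (int r = 0; r < numRegions; ++r)
        if (count[r]) {
            g.mean[r].L /= count[r]; g.mean[r].a /= count[r];
            g.mean[r].b /= count[r];
        }
    return g;
}

// Eq. (6): Euclidean distance between mean L*a*b* colors.
double RCD(const RAG& g, int x, int y) {
    double dL = g.mean[x].L - g.mean[y].L, da = g.mean[x].a - g.mean[y].a,
           db = g.mean[x].b - g.mean[y].b;
    return std::sqrt(dL * dL + da * da + db * db);
}
```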
The procedure is composed of two steps: initialization of the hierarchical queues and flooding. The generic notation of Foreground/Background region classification is shown in Fig. 1. In the initialization step, the blue region is denoted as A, and the regional color distance between region A and its adjacent labeled regions decides the index number of region A in the hierarchical queues. In the flooding step, the blue region is denoted as R, and the regional color distance between region R and its adjacent labeled regions decides what label region R should receive. The detailed descriptions of these two steps, and a sketch of the queue structure they share, are given below.

Fig. 1 The generic notation diagram of Foreground/Background region classification. (Legend: region X; region boundary; the non-labeled neighboring regions of X; the labeled neighboring regions of X; the nodes and edges between region X and its labeled neighboring regions.)
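Before detailing the two steps, here is a compact C++ sketch of the hierarchical queue structure, including the queue-suppression rule described after Eq. (9); the class and method names are our own interpretation, not the paper's code.

```cpp
#include <algorithm>
#include <queue>
#include <vector>

// Hierarchical queues: one FIFO queue per integer index (quantized regional
// color distance). A queue found empty during extraction is suppressed and
// never receives insertions again; later arrivals fall through to the
// lowest live queue at or above their index.
class HierarchicalQueues {
public:
    explicit HierarchicalQueues(int levels)
        : q_(levels), suppressed_(levels, false) {}

    // Insert a region at `index`; suppressed queues are skipped upward.
    void push(int index, int region) {
        int i = std::max(0, std::min(index, (int)q_.size() - 1));
        while (i < (int)q_.size() - 1 && suppressed_[i]) ++i;
        q_[i].push(region);
    }

    // Extract from the lowest non-empty queue; empty queues passed along
    // the way are suppressed.
    bool pop(int& region) {
        for (std::size_t i = 0; i < q_.size(); ++i) {
            if (q_[i].empty()) { suppressed_[i] = true; continue; }
            region = q_[i].front();
            q_[i].pop();
            return true;
        }
        return false;
    }

private:
    std::vector<std::queue<int>> q_;
    std::vector<bool> suppressed_;
};
```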
Initialization of the hierarchical queues: The adjacent unlabeled regions of the user-marked regions are inserted into the hierarchical queues in sequence, lowest index number first. Each adjacent unlabeled region A of the user-marked regions is inserted according to its index number q, computed as in Eq. (7):

$$q = \left\lfloor \min_{i = 1, \dots, m} RCD(R_i, A) + 0.5 \right\rfloor \quad (7)$$

where $(R_i, A) \in E$ and m is the number of labeled regions adjacent to A.

Flooding step: After initialization of the hierarchical queues, we start to extract unlabeled regions from the queue with the lowest index number, and the hierarchical queues are processed until all of the queues are empty. Each region extracted from the hierarchical queues is compared for similarity with its adjacent labeled regions; the labeled regions include both the user-marked regions and the regions already labeled in this step. After extracting a region R from the hierarchical queues, we find the labeled region adjacent to R which is most similar to R, as in Eq. (8):

$$R^* = \arg\min_{LR \in N(R)} RCD(LR, R), \quad R^* \in N(R) \quad (8)$$

where R* is the region most similar to R and N(R) denotes the set of labeled regions adjacent to R. Then, R is assigned the same label as R*. In addition, the unlabeled regions B adjacent to region R which are still outside the hierarchical queues are inserted into the q-th queue, as in Eq. (9):

$$t = \left\lfloor \min_{j = 1, \dots, n} RCD(R_j, B) + 0.5 \right\rfloor, \qquad q = \begin{cases} t, & \text{if } t \ge z \\ z, & \text{otherwise} \end{cases} \quad (9)$$

where z is the index of R, each $R_j$ is labeled, $(R_j, B) \in E$, and n is the number of labeled regions adjacent to B. During region classification, once the queue with the same index number as R becomes empty, it is suppressed and receives no further insertions; if a later region would be inserted into a suppressed queue, it is put into the lowest un-suppressed queue instead. Finally, when all regions are labeled (the hierarchical queues are empty), the wanted object has been segmented from the input image.
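Putting the pieces together, a schematic C++ version of the two steps follows. It reuses the RAG, RCD(), and HierarchicalQueues sketches above; the queue's fallthrough past suppressed levels plays the role of Eq. (9)'s choice of max(t, z). Names and types are our own assumptions.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Sketch of the whole classification pass (Eqs. (7)-(9)). `labelOf[r]` is
// F, B, or UNLABELED after marker drawing; `levels` bounds the queue index.
enum Label { UNLABELED = 0, F = 1, B = 2 };

void classify(const RAG& g, std::vector<Label>& labelOf, int levels) {
    HierarchicalQueues hq(levels);
    std::vector<bool> queued(g.adj.size(), false);

    // Rounded distance to the nearest labeled neighbor, as in Eqs. (7)/(9).
    auto indexOf = [&](int r) {
        double best = 1e30;
        for (int nb : g.adj[r])
            if (labelOf[nb] != UNLABELED) best = std::min(best, RCD(g, r, nb));
        return (int)std::floor(best + 0.5);
    };

    // Initialization, Eq. (7): unlabeled neighbors of marked regions.
    for (std::size_t r = 0; r < g.adj.size(); ++r)
        if (labelOf[r] == UNLABELED)
            for (int nb : g.adj[r])
                if (labelOf[nb] != UNLABELED) {
                    hq.push(indexOf((int)r), (int)r);
                    queued[r] = true;
                    break;
                }

    // Flooding: label each extracted region after its most similar labeled
    // neighbor (Eq. (8)), then enqueue its unqueued unlabeled neighbors.
    int r;
    while (hq.pop(r)) {
        if (labelOf[r] != UNLABELED) continue;
        int best = -1; double bestD = 1e30;
        for (int nb : g.adj[r])
            if (labelOf[nb] != UNLABELED && RCD(g, r, nb) < bestD) {
                bestD = RCD(g, r, nb); best = nb;  // R*, Eq. (8)
            }
        if (best >= 0) labelOf[r] = labelOf[best];
        for (int nb : g.adj[r])
            if (labelOf[nb] == UNLABELED && !queued[nb]) {
                hq.push(indexOf(nb), nb);          // Eq. (9)
                queued[nb] = true;
            }
    }
}
```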
5. Boundary Editing by Marker Drawing and Magnetic Lasso

After the previous steps, some errors may still exist around ambiguous and low-contrast edge boundaries. We finish the boundary editing by marker drawing near those boundaries. Some errors, however, are not easily resolved this way, so we apply the Magnetic Lasso of Adobe Photoshop to achieve high accuracy in the segmented results. In our implementation, boundary editing by marker drawing is restricted to the bounding box to reduce the processing time.

III. EXPERIMENTAL RESULTS

To verify the performance of our proposed algorithm, we use our own captured images, Kodak test images, and test images from other papers. The experimental results show that our proposed algorithm can produce excellent output. A test image, parrot, chosen from the Kodak test image package, is shown in Fig. 2(a); after our proposed algorithm, the red parrot is segmented from the input image. In another example, Fig. 2(b), the lighthouse is extracted from the image. The test image from (Drori, 2003) shows the elephant segmented from the background, as shown in Fig. 2(c). The test image from (Jia, 2004) shows three chairs on the grass, and the red chair is chosen as the target object to be separated from the grass: we draw two curves in green as foreground and two curves in yellow as background, and the middle red chair is segmented from the image as shown in Fig. 2(d). For our own captured picture, the left person is segmented with little effort, as shown in Fig. 2(e). The house and the cross against similar backgrounds are also segmented, as shown in Fig. 2(f) and Fig. 2(g), respectively.

Fig. 2 More experimental results. The left image of each pair shows the image with marker drawing; the right image shows the final result after our proposed algorithm.
We also provide the number of foreground and background markers for each image; these, the number of boundary editing operations, and the computation time of each step are given in Table 1.

Table 1. Computation time of each step of the proposed object segmentation algorithm

Image no.  F/B/boundary editing*  Image resolution  Noise reduction  Edge detection  Watershed segmentation  Region classification
a          3/1/4                  768 × 512         1.047 sec        4.501 sec       1.953 sec               0.078 sec
b          2/2/13                 768 × 512         1.031 sec        4.594 sec       1.984 sec               0.031 sec
c          7/4/15                 352 × 211         0.188 sec        0.890 sec       0.375 sec               0.001 sec
d          2/2/4                  352 × 234         0.187 sec        0.953 sec       0.407 sec               0.016 sec
e          3/2/2                  352 × 264         0.234 sec        1.078 sec       0.469 sec               0.001 sec
f          2/2/4                  600 × 400         0.516 sec        2.609 sec       1.141 sec               0.001 sec
g          4/2/6                  450 × 600         1.016 sec        7.765 sec       1.797 sec               0.015 sec

* The numbers in the second column denote the number of foreground markers, the number of background markers, and the number of boundary editing markers, respectively.
Most of the computation time is spent on edge detection and watershed segmentation; thanks to this pre-segmentation, the final result is obtained immediately after the marker drawing from the user. The simulation environment is a Pentium 1.7 GHz with 1 GB of RAM, and the algorithm is implemented in C++.

In order to evaluate the quality of our algorithm, we generated the ground truth images using the Magic Wand and Magnetic Lasso of Adobe Photoshop; the user is advised to employ the Magic Wand as the major tool and the Magnetic Lasso as an auxiliary tool. The processing times of our method are compared with those of Photoshop for each image, as shown in Table 2(a); our algorithm used only about 60% of the time Photoshop used. We also compare the output results with the ground truth images to count the error pixels in each image; the resulting error rate of each image is shown in Table 2(b).
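The paper does not give an explicit formula for the error rate, but a plausible reading is the share of pixels whose foreground/background label disagrees with the ground truth, as in this hypothetical sketch:

```cpp
#include <vector>

// Assumed error-rate measure for Table 2(b): percentage of pixels whose
// binary F/B label differs from the Photoshop-derived ground truth. This
// formula is our assumption, not stated in the paper.
double errorRate(const std::vector<bool>& result,
                 const std::vector<bool>& groundTruth) {
    std::size_t wrong = 0;
    for (std::size_t i = 0; i < result.size(); ++i)
        if (result[i] != groundTruth[i]) ++wrong;
    return 100.0 * wrong / result.size();  // percentage
}
```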
Table 2. Comparison of our proposed algorithm and Adobe Photoshop. (a) Comparative processing time for each image (bar chart: percentage vs. images A-G, split into object editing and marker drawing). (b) The error rate of each image (bar chart: percentage vs. images A-G). [Bar charts not reproduced.]
Compared to "Lazy snapping" (Li, 2004), our algorithm processes the input image without the K-means algorithm: it directly uses the color information and gradient information to extract the target object, and the hierarchical queues keep the computational cost low. The L*a*b* color space is also designed to approximate the human visual system.

IV. CONCLUSIONS

In this work, we propose an algorithm for interactive object segmentation by Foreground/Background region classification with hierarchical queues. The unlabeled regions produced by watershed segmentation are classified into foreground and background to accelerate the object segmentation of a digital image, and the hierarchical queues are implemented to increase efficiency and regularity. The image user interface is easy to use; even general users without experience in image processing can acquire the wanted object easily. The proposed algorithm can produce good
results in real-time with pre-segmentation. In our experimental results, even an object on a similar background can be segmented easily by our proposed algorithm.

NOMENCLATURE

f(x, y)       an input signal
M_{n×n}       a flat structuring element of size n × n
g_Y(x, y)     the gradient value obtained from luminance information
g_C(x, y)     the gradient value obtained from color information
g_i(x, y)     the incorporated gradient value
F             foreground label
B             background label
RAG           the region adjacency graph
C(·)          the mean color vector of a region in L*a*b* color space

REFERENCES
Adobe, 2002, Adobe Photoshop User Guide, Adobe Systems Inc., San Jose, CA, USA.

Drori, I., Cohen-Or, D., and Yeshurun, H., 2003, "Fragment-Based Image Completion," ACM Transactions on Graphics, Vol. 22, No. 3, pp. 303-312.

Gao, H., Siu, W. C., and Hou, C. H., 2001, "Improved Techniques for Automatic Image Segmentation," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 11, No. 12, pp. 1273-1280.

Gleicher, M., 1995, "Image Snapping," Proceedings of the 22nd Annual Conference on Computer Graphics and Interactive Techniques, New York, NY, USA, pp. 183-190.

Gonzalez, R., and Woods, R., 1992, Digital Image Processing, Addison-Wesley, Boston, MA, USA.
Jia, J., and Tang, C. K., 2004, "Inference of Segmented Color and Texture Description by Tensor Voting," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 26, No. 6, pp. 771-786.

Kasson, J. M., and Plouffe, W., 1992, "An Analysis of Selected Computer Interchange Color Spaces," ACM Transactions on Graphics, Vol. 11, No. 4, pp. 373-405.

Li, Y., Sun, J., Jia, J., Tang, C. K., and Shum, H. Y., 2004, "Lazy Snapping," ACM Transactions on Graphics, Vol. 23, No. 3, pp. 303-308.

Mortensen, E. N., and Barrett, W. A., 1999, "Toboggan-Based Intelligent Scissors with a Four-Parameter Edge Model," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Fort Collins, CO, USA, Vol. 2, pp. 455-458.
Perez, P., Blake, A., and Gangnet, M., 2001, "JetStream: Probabilistic Contour Extraction with Particles," Proceedings of the IEEE International Conference on Computer Vision, Vancouver, BC, Canada, Vol. 2, pp. 524-531.

Rother, C., Kolmogorov, V., and Blake, A., 2004, "GrabCut: Interactive Foreground Segmentation Using Iterated Graph Cuts," ACM Transactions on Graphics, Vol. 23, No. 3, pp. 309-314.

Sun, J., Jia, J., Tang, C. K., and Shum, H. Y., 2004, "Poisson Matting," ACM Transactions on Graphics, Vol. 23, No. 3, pp. 315-321.

Manuscript Received: July 04, 2007
Revision Received: Sep. 07, 2008
Accepted: Oct. 07, 2008