Multimedia Tools and Applications, DOI 10.1007/s11042-010-0678-9
Image matting through a Web browser

Yen-Chun Lin & Hsiang-An Wang & Yi-Fang Hsieh
© Springer Science+Business Media, LLC 2010
Abstract Image matting is the process of extracting a foreground object from an image. This paper presents NIM 2.0, an interactive, Web-based image matting tool. NIM is the first image matting tool accessible through a Web browser; its algorithm has been improved over the first version to make it faster. We describe how NIM is used, why it works, its architecture, and experimental results. NIM can extract foregrounds with thin, thread-like shapes. It begins to process inputs as soon as the user starts to paint with a brush roughly along the boundary between the foreground and the background. While painting, the user can stop anywhere and change the brush width as needed to achieve good matting quality. The quality of the foreground extracted by NIM is usually at least as good as that produced by the other two online tools and Photoshop. NIM is fast: the time required to complete matting is limited essentially by the speed of brush movement. Several variations of our algorithm are also discussed and evaluated experimentally.

Keywords Alpha matte · Foreground extraction · Image editing · Image matting · Multimedia · Web-based
Y.-C. Lin (*) · H.-A. Wang · Y.-F. Hsieh
Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, No. 43 Keelung Rd., Sec. 4, Taipei 106, Taiwan
e-mail: [email protected]
H.-A. Wang e-mail: [email protected]
Y.-F. Hsieh e-mail: [email protected]

1 Introduction

Blogs with multimedia capability, particularly images, have been emerging, and more and more blogs provide Web-based tools for digital image editing. However, the editing capabilities of current online image editors have been inadequate; this is especially true for image matting.
Image matting extracts a foreground object from an image based on user input. Users need a good Web-based matting tool to extract objects with elaborate details. After extracting a foreground object, one can use it for other purposes, such as pasting it onto another image to create a new composite.

Two approaches to image matting have been introduced: natural image matting from a single image, and matting from multiple images. Natural image matting [6, 9, 15, 17, 19] assumes that the color of a pixel I of an image is a linear combination of a foreground color F and a background color B:

$$I = \alpha F + (1 - \alpha) B,$$

where α is the foreground opacity of I and 0 ≤ α ≤ 1. The foreground opacities of all the pixels in the image constitute an opacity image, or alpha matte. Natural image matting needs to obtain the alpha matte of a given image; once the alpha matte is obtained, the foreground object is readily obtained. Natural image matting usually requires the user to provide a trimap, which partitions the original image into three regions, as shown in Fig. 1: definitely foreground (in white), definitely background (in black), and unknown region (in gray). For any pixel in the unknown region, we need to decide its F and B values before its α can be decided. This is done by utilizing some assumptions on F and B to predict their values for each pixel in the unknown region. The value of α of every pixel can then be computed to extract the foreground.

In the Corel KnockOut algorithm [5], F and B are assumed to be smooth, so that the alpha of pixels in the unknown region can be computed as a weighted average of foreground and background pixels. Bayesian matting [6] formulates the computation of alpha in a well-defined Bayesian framework before solving it. Poisson matting [15] obtains the alpha matte by solving Poisson equations with the matte gradient field. The method consists of two steps: global Poisson matting and local Poisson matting. In global Poisson matting, the matte is obtained from the input image by solving Poisson equations using a trimap supplied by the user. In local Poisson matting, the user applies local manipulations to the matte gradient field to improve the result. A closed-form solution has also been presented for obtaining the alpha matte; this approach requires the assumption of local smoothness of F and B [9].

Matting from multiple images solves the problem using two different images. Blue screen matting [14] uses a photograph in which the subject is against a constant-colored background, so that the foreground can be extracted. Difference matting [11] needs one image with the foreground and another without it; the difference between the two images is used to obtain the alpha matte. Flash matting [16] uses two photos, one taken with flash and the other without, to obtain the alpha matte.
Fig. 1 An image (left) and its corresponding trimap (right)
For a comprehensive review of matting algorithms and systems, the reader is referred to [18].

We have implemented an online natural image matting tool, called NTUST Image Matting (NIM). Through a Web browser, NIM can extract thin, thread-like shapes of the foreground without requiring the user to specify the exact shape of the foreground. Unlike some other matting approaches [9, 15], NIM need not assume smoothness of the foreground or background colors. NIM is the first Web-based image matting application. The implementation of a Web-based image matting program is very different from, and more complicated than, that of a stand-alone version. Making a stand-alone image matting program, such as the image matting tool of Adobe Photoshop [13], operational through a Web browser would require a nearly complete rewrite. The added undertaking is unavoidable because a very different environment and technology are involved.

After the publication of an earlier version of NIM [10], we have improved some of its internal algorithms to make it faster. In this paper, which revises the earlier version almost completely, we have updated and added figures and data for the comparison of NIM with other foreground extraction tools. In addition, the architecture of NIM is described in the entirely new Section 2.4, and a new Section 4 discusses several alternative ways to implement NIM.

Although we are not aware of any other Web-based image matting tool, two online cutout applications do exist [3, 8]. Unlike matting tools, cutout algorithms do not compute α when extracting the foreground. Other related research includes foreground and background segmentation of scenes, especially video scenes taken with a moving camera [2].

The rest of this paper is organized as follows. Section 2 introduces NIM, including how it is operated, its basics, and its architecture. Section 3 compares the performance of NIM with that of three other foreground extraction tools. Section 4 presents several alternative implementations of NIM and their pros and cons. Section 5 concludes this paper and gives future research directions.
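As an aside for implementers, the following minimal sketch illustrates the compositing relation $I = \alpha F + (1 - \alpha)B$ introduced in this section: given an alpha matte, an extracted foreground can be pasted onto a new background. This is only an illustration of the general relation, not part of NIM; the use of NumPy, the array shapes, and the [0, 1] value ranges are assumptions.

```python
import numpy as np

def composite(foreground, background, alpha):
    """Composite a foreground onto a background using an alpha matte.

    foreground, background: float arrays of shape (H, W, 3), values in [0, 1]
    alpha: float array of shape (H, W), values in [0, 1]
    Returns the composite image I = alpha * F + (1 - alpha) * B.
    """
    a = alpha[..., np.newaxis]            # broadcast alpha over the color channels
    return a * foreground + (1.0 - a) * background
```

For example, composite(extracted_fg, new_bg, matte) yields a new image in which the extracted object appears over the new background.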
2 NIM

2.1 Using NIM

To introduce NIM, we first describe how the user can use it to extract the foreground object of the image shown in Fig. 2a. The user uses an input device such as the mouse as a brush to paint roughly along the boundary between the foreground and the background. Figure 2b shows that the brush starts painting clockwise from the southwestern edge of the foreground. This figure also shows that to the left of the brush trail is the background (in black), to the right is the foreground (in white), and the gray area between the foreground and the background is the unknown region. The alpha values of all pixels in the unknown region should be computed. Figure 2c shows that painting and computing are nearly half done. Figure 2d shows the end of matting. The user can then obtain the foreground and use it, for example, by pasting it onto another image to create some special effect.

Two features of NIM should be noted here. First, the user can stop anywhere while painting and change the brush width as needed, for example, when the edge of the foreground is not smooth. This allows NIM to extract the foreground with better quality than would be possible without this feature. Second, NIM begins to process inputs immediately after the user has started to paint. This feature is useful because a careful user usually does not move the mouse too quickly when painting.
Fig. 2 An image in the process of matting: a original image, b painting just started, c painting almost half-done, d end of matting
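The paper does not spell out the exact procedure by which a brush stroke is turned into the three regions described above. The sketch below is therefore only a plausible illustration, under the assumptions that the stroke is sampled as a list of pixel coordinates, that the brush band itself becomes the unknown region, and that the pixels just beyond one edge of the band are labeled background and those just beyond the other edge foreground; all function and label names are hypothetical.

```python
import math
import numpy as np

UNKNOWN, BACKGROUND, FOREGROUND = 0, 1, 2   # hypothetical region labels

def mark(labels, x, y, value):
    """Set a label at a (possibly fractional) position, clamped to the image."""
    h, w = labels.shape
    xi, yi = int(round(x)), int(round(y))
    if 0 <= yi < h and 0 <= xi < w:
        labels[yi, xi] = value

def label_stroke(labels, stroke, brush_width):
    """Mark the brush band as unknown and the two band edges as background/foreground."""
    half = brush_width / 2.0
    for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
        dx, dy = x1 - x0, y1 - y0
        length = math.hypot(dx, dy)
        if length == 0.0:
            continue
        # unit normal to the movement direction; which side is the painter's "left"
        # depends on whether the y axis points up or down in the image convention
        nx, ny = -dy / length, dx / length
        for t in np.linspace(0.0, 1.0, int(length) + 2):
            cx, cy = x0 + t * dx, y0 + t * dy
            for s in np.linspace(-half, half, int(brush_width) + 2):
                mark(labels, cx + s * nx, cy + s * ny, UNKNOWN)        # brush band: unknown
            mark(labels, cx + (half + 1) * nx, cy + (half + 1) * ny, BACKGROUND)  # one edge
            mark(labels, cx - (half + 1) * nx, cy - (half + 1) * ny, FOREGROUND)  # other edge
    return labels
```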
2.2 NIM in execution

As already mentioned, while the user is painting, NIM can compute the alpha of the pixels painted by the brush. After NIM has finished computing the alpha of the known inputs, it checks for new inputs to process; whenever the brush moves, NIM has new inputs to process. The movement of the NIM brush is considered to be in one of eight directions: north, northeast, east, southeast, south, southwest, west, or northwest.

Figure 3 gives more detail of how NIM works. The arrow indicates the movement of the brush. Pixels to the left of the brush belong to the background and pixels to the right belong to the foreground. The area painted by the brush is divided into four regions: (1) definitely foreground, (2) definitely background, (3) unknown, and (4) decided. The definitely foreground region is on the right edge of the path of the brush, and the definitely background region is on the left edge. A pixel originally in the unknown region becomes one in the decided region once its alpha value is computed.

2.3 Computing alpha

A Web-based image matting program is inherently much more difficult to implement, much more complicated, and much slower than a stand-alone image matting program. Thus, a major concern is to choose an algorithm that is as fast as possible. To compute α, we use the global Poisson matting equation [15]
$$\Delta \alpha = \operatorname{div}\!\left(\frac{\nabla I}{F - B}\right), \qquad (1)$$

where $\Delta = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}$ is the Laplacian operator, $\operatorname{div}$ is the divergence operator, and $\nabla = \left(\frac{\partial}{\partial x}, \frac{\partial}{\partial y}\right)$ is the gradient operator. Recall that F and B are the foreground and background colors of a pixel, as mentioned in Section 1. If F and B can be decided, then α can be obtained by solving Eq. 1.
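Solving Eq. 1 numerically requires a discrete version of its right-hand side. The sketch below shows one common finite-difference discretization of $\operatorname{div}(\nabla I / (F - B))$ on a pixel grid, assuming grayscale arrays I, F, and B of the same shape; the exact scheme used by NIM is not spelled out here, so this is an assumption for illustration only.

```python
import numpy as np

def divergence_term(I, F, B, eps=1e-6):
    """Finite-difference approximation of div(grad(I) / (F - B)) on a pixel grid."""
    gy, gx = np.gradient(I.astype(float))              # image gradient (y axis first, then x)
    denom = (F - B).astype(float)
    denom = np.where(np.abs(denom) < eps, eps, denom)  # guard against F == B
    vy, vx = gy / denom, gx / denom
    dvy_dy = np.gradient(vy, axis=0)                   # divergence of the vector field (vy, vx)
    dvx_dx = np.gradient(vx, axis=1)
    return dvy_dy + dvx_dx
```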
Fig. 3 Four regions in the area painted by the brush
Figure 4 helps explain how NIM obtains F and B. In this figure, the pixel with oblique lines at the northwestern corner of the unknown region is a target pixel, for which we want to decide F and B. To compute F, we first find the foreground window, which is the smallest square that contains at least one pixel of the definitely foreground region and has the target pixel at its center. The window is shown as an 11×11 square with a solid line on each side, and it contains four definitely foreground pixels. Suppose that the foreground window contains m definitely foreground pixels with grayscales $f_i$ for $1 \le i \le m$, and that the distance between the target pixel and the definitely foreground pixel with grayscale $f_i$ is $d_i$. Then, we have

$$F = \frac{\sum_{i=1}^{m} \frac{f_i}{d_i^{2}}}{\sum_{i=1}^{m} \frac{1}{d_i^{2}}}$$
To compute B, we first find the background window, which is the smallest square that contains at least one pixel of the definitely background region and has the target pixel at its center. It is shown as a 3×3 square with a dotted line on each side, and it contains four definitely background pixels. Suppose that the background window contains n definitely background pixels with grayscales $b_j$ for $1 \le j \le n$, and that the distance between the target pixel and the definitely background pixel with grayscale $b_j$ is $d_j$. Then,

$$B = \frac{\sum_{j=1}^{n} \frac{b_j}{d_j^{2}}}{\sum_{j=1}^{n} \frac{1}{d_j^{2}}}$$

Fig. 4 Target pixel and its foreground and background windows for deciding its α
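A minimal sketch of this inverse-squared-distance weighted estimate is given below; the same routine can serve for both F (averaging over definitely foreground pixels) and B (averaging over definitely background pixels). The window-growing loop, the function name, and the grayscale representation are assumptions made for illustration.

```python
import numpy as np

def estimate_color(gray, known_mask, ty, tx, max_radius=50):
    """Estimate F or B at the target pixel (ty, tx) as the inverse-squared-distance
    weighted average of known pixels (definitely foreground or definitely background)
    inside the smallest centered square window that contains at least one of them."""
    h, w = gray.shape
    for r in range(1, max_radius + 1):
        y0, y1 = max(ty - r, 0), min(ty + r + 1, h)
        x0, x1 = max(tx - r, 0), min(tx + r + 1, w)
        ys, xs = np.nonzero(known_mask[y0:y1, x0:x1])
        if len(ys) == 0:
            continue                                   # grow the window until a known pixel appears
        ys, xs = ys + y0, xs + x0
        d2 = (ys - ty) ** 2 + (xs - tx) ** 2           # squared distances to the target pixel
        weights = 1.0 / d2                             # target pixel is unknown, so d2 > 0
        return float(np.sum(weights * gray[ys, xs]) / np.sum(weights))
    return None                                        # no known pixel found within max_radius
```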
Subsequently, we use Eq. 1 and the Gauss-Seidel iteration method [4] to obtain α. First, Eq. 1 is rewritten as

$$\alpha(x+1, y) + \alpha(x-1, y) + \alpha(x, y+1) + \alpha(x, y-1) - 4\,\alpha(x, y) = \operatorname{div}\!\left(\frac{\nabla I}{F - B}\right),$$

that is,

$$\alpha(x, y) = \frac{1}{4}\left[\alpha(x+1, y) + \alpha(x-1, y) + \alpha(x, y+1) + \alpha(x, y-1) - \operatorname{div}\!\left(\frac{\nabla I}{F - B}\right)\right].$$
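A minimal sketch of how Gauss-Seidel sweeps based on this per-pixel update could be organized is given below. It assumes the divergence term has already been evaluated per pixel (for example with a routine such as divergence_term above) and that the unknown pixels are given as a list of coordinates away from the image border; the loop structure, names, and convergence test are assumptions for illustration, not the exact NIM implementation.

```python
def gauss_seidel_alpha(alpha, div_term, unknown_pixels, iterations=200, tol=1e-4):
    """Gauss-Seidel iteration for the discrete Poisson equation on alpha.

    alpha: 2-D array initialized with 1 in the definitely foreground region, 0 in the
           definitely background region, and an initial guess (e.g. 0.5) elsewhere.
    div_term: per-pixel values of div(grad(I) / (F - B)).
    unknown_pixels: list of (y, x) coordinates whose alpha must be solved for.
    """
    for _ in range(iterations):
        max_change = 0.0
        for y, x in unknown_pixels:
            new_val = 0.25 * (alpha[y, x + 1] + alpha[y, x - 1] +
                              alpha[y + 1, x] + alpha[y - 1, x] -
                              div_term[y, x])
            max_change = max(max_change, abs(new_val - alpha[y, x]))
            alpha[y, x] = new_val          # updated values are reused immediately (Gauss-Seidel)
        if max_change < tol:
            break
    return alpha
```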
The Gauss-Seidel iteration process is then applied to compute α. For any pixel, if α ≥ 0.95, it is deemed a foreground pixel with α = 1. If α ≤ 0.05, it is considered a background pixel with α = 0. If 0.05