Robot Painter: From Object to Trajectory

Miti Ruchanurucks, Shunsuke Kudoh, *Koichi Ogawara, Takaaki Shiratori, and Katsushi Ikeuchi
The University of Tokyo, Japan, *Kyushu University, Japan
{miti,kudoh,siratori,ki}@cvl.iis.u-tokyo.ac.jp, *[email protected]

Abstract – This paper presents visual perception techniques developed for high-level manipulator planning that allow a robot to reproduce the procedure involved in human painting. First, we propose a technique for 3D object segmentation that works well even when the precision of the cameras is inadequate. Second, we apply a simple yet powerful fast color perception model that shows similarity to human perception; the method outperforms many existing interactive color perception algorithms. Third, we generate a global orientation map using a radial basis function. Finally, we use the derived foreground, color segments, and orientation map to produce a visual feedback drawing. Our main contributions are the 3D object segmentation and color perception schemes.

Index Terms – High-level planning, object segmentation, color perception, orientation map.

I. INTRODUCTION

Recently, many areas of research on humanoid robots have been studied, such as motion control, man-machine interfaces, artificial intelligence (AI), and so on. Among them, many research projects have tried to create artist robots, with the common objective of exploring new sensing, artificial intelligence, and manipulation techniques. ISAC [1], for example, is a robot that can track an artist's hand trajectory and mimic it. AARON [2] is a unique robot that creates a piece of art by itself, based on adapting a geometrical model of a subject. [3] considers the possibility of a tele-operated drawing robot that can track markers on the human hand online to generate brush trajectories. DrawBot [4] focuses on force feedback to draw a simple preprogrammed shape, whereas [5] considers both visual and force feedback along with grasping technology. Other than artist robots, there are more examples of using robots for specific tasks in order to explore high-level sensing and planning methods. For example, [6] focuses on reasoning techniques for a cook robot, and [7] designs depth perception in a robot that replicates human eye movements.

Fig. 1 Humanoid robot painter: Dot-chan.

This paper explores new visual perception techniques, namely object segmentation, color perception, and orientation mapping, and then shows how to apply these methods to high-level manipulation of the painter robot shown in Fig. 1. Human painters often concentrate on specific areas of a painting, and the details in each area differ. This fundamental observation about the actions of a human painter can be applied to a robot to make the generated paintings seem more meaningful. Hence, object segmentation should be used to extract the subject area in a 3D image.

In object tracking and segmentation, works like [8] focus on object detection and [9] tries to follow an object, but such methods cannot return the object boundary correctly; hence, this paper focuses on how to segment the object area correctly, based on stereo image techniques and a 2D object segmentation method. Recent approaches to 2D object segmentation attempt to extract the foreground using color information (for example, Bayes matting [10], an improvement on the algorithms presented in [11], [12], and [13]), contour information (for example, Snakes [14][15] and Intelligent Scissors [16]), or both types of data (Graph Cut [17] and our Comprehensive Iterative Foreground Segmentation (CIFS) [18]). Combining both color and edge information seems to offer the most promise. The difference is that whereas [17] treats local and global color similarity as an objective function, [18] iteratively optimizes an objective function of local and global color similarity with a constraint of edge strength. For multi-dimensional images, [19] shows how to apply [17] to segment 3D images, like medical images. [20] and [21] consider videos as 3D objects and use [17] to segment the foreground. [22] is a good example that exploits color and laser range data as input to a method like CIFS to extract an object and then perform alignment to generate a 3D model. [23] uses special video sensors to capture and extract the foreground. Among these methods, the drawbacks are that, first, [19], [20], [21], and [22] all require human interaction to roughly mark the foreground as input for the system. Second, [19], [22], and [23] need special, and expensive, hardware. This paper focuses, first, on how to exploit normal stereo cameras to roughly extract the object automatically, based on 3D background subtraction and other vision techniques, and then use CIFS to segment the object area correctly. This differs from our former work [24], which targets only 2D segmentation. If doing so in 3D automatically is possible, it will greatly benefit the vision field.

The robot tries to understand the color distribution of the object in order to select the best set of colors to use. Conventionally, color reduction uses a splitting-based algorithm, such as median-cut [25], or a clustering-based algorithm, such as local K-means [26], to restrict the perception of color in a scene. Normally, such methods face the problem that they tend to ignore colors with a small number of pixels. We earlier solved this problem by incorporating two clustering methods [24]; our fast algorithm is shown in [27]. Another, more complex, approach makes use of a neural network for better classification results, for example, the Neural Gas [28]. Nevertheless, exploitation of the neural net is time consuming and difficult to implement. A fourth paradigm tries to enhance the quality of the output image by considering spatial characteristics, for example, dithered color quantization [29]; however, this consumes more time than is acceptable from an interactive standpoint. The final scheme is region-based color segmentation, such as [30]; however, this approach often cannot reduce the number of colors effectively and is instead applied in the field of region segmentation. In this paper, since the robot is required to select colors as quickly and accurately as possible, we use the fast color perception method of [27]. We also show that the behaviors of clustering methods differ between the RGB and CIELAB color spaces. Tips for K-means clustering are also presented.

In order to draw brush strokes meaningfully, the robot then senses the orientation of the object. Since orientation data are usually noisy, only the boundary's gradient is considered. To interpolate the orientation over the whole object, the global orientation method of [31], which exploits a radial basis function [32] to generate an orientation field similar in style to Van Gogh, is applied.

Finally, based on the color segments and the planned orientation map, the robot performs a visual feedback drawing. First, it detects a brush and grabs it using cameras and force sensors. Second, the position of the brush tip is calculated using principal component analysis (PCA). Third, it draws and compares the canvas with the picture produced by the stereo cameras. For color filling, a simulation is done on a laptop with a 1.8 GHz CPU. The overall time consumed by the visual feedback simulation is very low, indicating that it would not be too great a load for the robot to adopt such an algorithm. These key problems of object segmentation, color perception, and orientation mapping are what our method directly addresses. In Section 2, our novel 3D object segmentation is explained. Section 3 explains the color perception method. Section 4 demonstrates multi-scale painting using an orientation map along with brush detection by cameras and force sensors. Finally, Section 5 contains our conclusions.

II. OBJECT SEGMENTATION

In order to extract the subject area, the object is placed in front of the robot. The robot then automatically extracts the foreground. The question is: how does the robot accomplish that? First, instead of letting the robot track a human interaction as a guide, as in [1] or [33], conventional 3D techniques and CIFS are exploited in this paper so that the robot can perform the whole process by itself.

Attempts to extract objects using 3D information have relied on high-quality hardware to obtain adequate information and image resolution, for example, the medical capturing device of [17], the laser range finder of [22], or the special stereo cameras of [23]. At present, one challenge in this field is whether it is possible to extract such information using normal stereo cameras.

A. Preliminary 3D data extraction

Using normal cameras, for example, the 280x200-pixel cameras in the robot's head in Fig. 1, the foreground extracted after disparity-map computation and 3D background subtraction is usually, if not always, noisy, as shown in Fig. 2.

(a)

(b)

Fig. 2 Conventional foreground segmentation method. (a) An image captured by a camera attached to the robot in Fig.1, (b) Disparity map and background subtraction.

We then notice that interactive foreground segmentation can be used to solve this problem, as described in [19], [20], [21], and [22], which require a user to roughly mark foreground and background; the correct foreground can then be extracted using their algorithms. In contrast with such methods, instead of letting a user mark foreground and background, this paper directly exploits the extracted foreground shown in Fig. 2 (b). As the data is noisy, dilations and erosions are performed on the foreground and background. The number of dilations and erosions is fixed for each calibration. This automatically marked image is shown in Fig. 6 (a).

B. Foreground segmentation

As mentioned earlier, a segmentation method that exploits both local and global color similarity is preferable, such as Graph Cut, which optimizes the combination. Since the original scheme was developed [17], [34], there have been various improvements. [35] improves the Graph Cut method by allowing interactive estimation and incomplete labeling, both of which reduce the amount of user interaction needed. However, Graph Cut is generally incapable of extracting small or thin objects, in contrast to the matting method. [35] proposed to cut before matting by performing a graph cut and then erosion to generate unknown areas for the matting algorithm. However, the number of erosions is ad hoc, which is one problem [36] tries to address by using belief propagation for matte estimation. The result is quite clean even when the boundary between foreground and background is not smooth. However, [36] shares a similar weakness to normal matting when the foreground/background colors are too ambiguous. Another problem with Graph Cut is the high degree of user interaction required in some cases of multiple object segmentation.

There are methods that optimize an objective function of color similarity with a constraint of edge strength, for example, [22] and its successor CIFS [18]. The edge constraint in CIFS is used as a state indicator to perform multiple cuts and matting automatically, as shown in Fig. 4 and Fig. 5, respectively. Furthermore, the CIFS algorithm enables segmentation to exploit both local and global data independently and systematically. By considering local similarity separately, prior to global similarity, the requirement that foreground and background data distributions be well separated, a limitation inherent in many previous works, is eliminated. A detailed explanation of the algorithm and more segmentation results can be found in [18].

(a)

(b)

Fig. 3 CIFS notion: optimizing local and global color similarity with edge constraint. (a) Input. (b) First iteration.

(a)

(b)

Fig. 4 CIFS result. (a) Normal mode, (b) Multiple cuts mode.

(a)

(b)

Fig. 5 CIFS for matting. (a) Input, (b) Composite image.

We then use the rough 3D foreground derived earlier as an input for CIFS. From Fig. 6, it can be seen that the object is segmented much better than the preliminary result in Fig. 2 (b). The CIFS consumes less than 0.1 second for a 280x200 image on a laptop with a 1.8GHz CPU.
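As a concrete illustration, the following is a minimal sketch of how the noisy foreground mask of Fig. 2 (b) could be turned into the automatically marked input of Fig. 6 (a) using fixed numbers of erosions and dilations. It assumes OpenCV-style morphology; the function name, label values, and iteration counts are illustrative choices of this sketch, not the exact implementation used on the robot.

```python
import cv2
import numpy as np

def make_cifs_input(rough_fg_mask, n_erode=5, n_dilate=5):
    """Turn the noisy binary foreground mask from disparity-based background
    subtraction into a three-label marking image: sure foreground, sure
    background, and unknown.  The erosion/dilation counts are fixed per
    camera calibration, as described in the text."""
    kernel = np.ones((3, 3), np.uint8)
    mask = (rough_fg_mask > 0).astype(np.uint8)

    # Eroding keeps only pixels that are almost certainly object.
    sure_fg = cv2.erode(mask, kernel, iterations=n_erode)
    # Dilating and inverting keeps pixels that are almost certainly background.
    sure_bg = 1 - cv2.dilate(mask, kernel, iterations=n_dilate)

    marking = np.full(mask.shape, 128, np.uint8)  # 128 = unknown region
    marking[sure_fg == 1] = 255                   # 255 = foreground seed
    marking[sure_bg == 1] = 0                     # 0   = background seed
    return marking
```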

(a)

(b)

Fig. 6 Object segmentation using 3D data as an input for CIFS. (a) Input from dilations and erosions on Fig.2 (b), (b) Object segmentation result.

III. COLOR PERCEPTION

After the object boundary is derived, the robot must represent its color using an appropriate set of colors. This section attacks the problem of whether it is possible for a robot to perceive colors in an image the same way a human does. The question falls into the domain of color reduction. Usually, digital color images consist of up to 16 million different colors in a 24-bit color space. However, in many applications, such as compression, presentation, and transmission, it is preferable to use as few colors as possible. Color reduction is a process that transforms a full color image into an image with a smaller number of colors by grouping similar colors and replacing them with a representative color. Many segmentation methods apply splitting [25], clustering [26], a neural network [28], spatial characteristics [29], or a region-based algorithm [30] to restrict the perception of color in a scene. However, none of these methods reflect a human's color perception correctly, as can be seen in Fig. 7.

This paper uses a color reduction scheme based on the clustering approaches of [24] and [27]. [24], hereafter referred to as MDC+K, incorporates two clustering methods, maximum distance clustering (MDC) and K-means. It shows that MDC followed by an iteration of K-means in RGB color space yields a better result than K-means alone, producing high-quality, well-contrasted output images even when the number of colors is reduced to very few. [27], hereafter referred to as sub-MDC+K, proposes sub-optimal MDC to solve the speed problem of MDC+K. Both methods work well in both RGB and CIELAB color spaces; however, the convergence behaviors in these two color spaces differ, as will be shown here. Furthermore, CIELAB provides better color discrimination power. Another objective of this paper is an algorithm that anyone can easily implement; this is achieved because our improvement is based on the well-known and simple MDC and K-means algorithms. For benchmarking, we use Photoshop's perceptual palette generation [37], the region-based color segmentation of [30], and the clustering-based method of normal K-means.

A. MDC+K

Maximum distance clustering (MDC), considered a weak clustering method, is initially used to specify the highest contrasted colors.

The set of maximum contrast colors is then used to initialize K-means clustering. K-means clustering is robust and widely used in various fields of research. Its drawback is that it tends to ignore colors with a small number of pixels and join them to a neighboring larger cluster, which lessens the contrast in the output image. Hence, MDC+K is a method that weights the contrasted colors obtained by MDC against the statistically calculated colors obtained by K-means.

To capture the maximum contrast of colors in an image, MDC is applied to the image in the RGB color space. The first iteration starts by sampling a pixel from the input image and then identifying the color with the largest Euclidean distance from the sampled pixel's color. For all consecutive iterations, the criterion for a new color is (1):

d_{max,k+1} = \max_{i,j} \min_{k} \| c_k - p_{i,j} \|    (1)

where d_{max,k+1} is the maximum distance of cluster k+1, k is the number of existing clusters, (i,j) is the coordinate of a pixel, c_k is an existing cluster's color, and p_{i,j} represents the color at each coordinate and is a candidate for a new cluster. Since the colors are used to initialize K-means clustering, the algorithm continues until the number of clusters equals the desired number of colors or there is no other candidate color. In other words, the method requires running k iteration searches over the image to find k initial cluster means.

The derived highest contrasted colors are used as initial means for K-means clustering. Then, each new piece of data is used to compute the new mean of the closest cluster, determined by (2):

\min_{k} \| c_k - p_{i,j} \|    (2)

In order to prevent K-means from dominating the clustering process, it must not be run until converged. Practically, running K-means for only one iteration gives the best result.

(a)

(b)

(c)

Finally, each pixel of the image is rendered with the closest derived cluster mean.

B. Sub-MDC

Since it is preferable that color reduction returns the output image as fast as possible, MDC is the bottleneck of MDC+K: it consumes a considerable amount of time when the number of colors is high (see Table 1), because searching for one maximum distance cluster requires a full iteration search over the image. So sub-MDC [27] is used in this paper. The algorithm starts similarly, by identifying the color with the largest Euclidean distance from a sampled pixel's color. The difference is that every new cluster color is derived within a second, single iteration search, using the following criterion. First, calculate the minimum Euclidean distance of a pixel from the existing cluster(s) using (3):

d_{min,i,j} = \min_{k} \| c_k - p_{i,j} \|    (3)

If d_{min,i,j} > d_{max,k} for any cluster k, replace that existing cluster k with p_{i,j}. Else, if d_{min,i,j} ≠ 0 and d_{min,i,j} ≤ d_{max,k} for all clusters, generate a new cluster with value equal to p_{i,j}, provided the present number of clusters is still lower than the desired number.

Using this new algorithm, running sub-optimal MDC consumes approximately the same time as running one iteration of K-means. In other words, running sub-MDC+K consumes time approximately equal to two iterations of K-means. Moreover, we discover that running K-means with the same input order in every iteration results in a lower-contrast image, so K-means here is performed forward over the image coordinates for one iteration and backward for the next, and vice versa.
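To make the procedure concrete, the following is a compact sketch of sub-MDC initialization followed by one forward and one backward K-means pass, written directly from the description and equations (1)-(3). The sequential (incremental-mean) K-means update, the seeding with the first sampled pixel, and the variable names are assumptions of this sketch, not the reference implementation of [27].

```python
import numpy as np

def sub_mdc(pixels, n_colors):
    """Sub-optimal maximum distance clustering: a single pass over the pixels.
    pixels: (N, 3) float array in the chosen color space.  Seeding with the
    first sampled pixel (d_max = 0) is a simplification of this sketch."""
    centers = [pixels[0].astype(float)]
    d_max = [0.0]                       # d_max[k]: distance at which cluster k was created
    for p in pixels[1:]:
        d = np.linalg.norm(np.asarray(centers) - p, axis=1)
        d_min = d.min()                 # eq. (3): distance to the nearest existing center
        weaker = [k for k, dm in enumerate(d_max) if d_min > dm]
        if weaker:
            # p shows more contrast than center k did at creation: replace it.
            k = weaker[0]
            centers[k], d_max[k] = p.astype(float), d_min
        elif d_min > 0 and len(centers) < n_colors:
            centers.append(p.astype(float))
            d_max.append(d_min)
    return np.asarray(centers)

def one_kmeans_pass(pixels, centers, reverse=False):
    """One sequential K-means pass (forward or backward): each pixel updates
    the running mean of its closest center, cf. eq. (2)."""
    centers = centers.astype(float).copy()
    counts = np.ones(len(centers))
    order = pixels[::-1] if reverse else pixels
    for p in order:
        k = int(np.argmin(np.linalg.norm(centers - p, axis=1)))
        counts[k] += 1
        centers[k] += (p - centers[k]) / counts[k]   # incremental mean update
    return centers

# Usage as in the text: sub-MDC initialization, then one forward and one
# backward K-means pass over the image coordinates.
# pixels = image.reshape(-1, 3).astype(np.float64)
# centers = sub_mdc(pixels, 16)
# centers = one_kmeans_pass(pixels, centers)
# centers = one_kmeans_pass(pixels, centers, reverse=True)
```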

(d)

(e)

(f)

Fig. 7 Comparison of color reduction tools when performing color reduction to 16 colors in RGB color space. (a) Original, (b) Watershed, (c) Photoshop, (d) 15 iterations of K-means forward and backward, (e) MDC+K, (f) Sub-MDC+K.

Method      | Time complexity    | 16 colors | 64 colors
K-means     | O(N·k·i)           | 2.33      | 7.64
MDC+K       | O(N·k²) + O(N·k)   | 1.21      | 15.83
Sub-MDC+K   | O(N·k) + O(N·k)    | 0.29      | 1.03

Table 1 Algorithm time comparison. N is the image size (here 640x480), k is the number of colors, and i is the number of iterations until convergence (here 15). Times are in seconds, averaged over 20 images on a laptop with a 1.8 GHz CPU.

C. Behaviors in CIELAB color space

Unlike RGB, CIELAB is designed to be more perceptually linear than other color spaces, meaning that a change of the same amount in color value should result in a change of about the same visual importance. It might therefore be assumed that the comparison result of the former section should hold even when the color space is changed from RGB to CIELAB. Surprisingly, it does not.

This paper shows that, using CIELAB, two powerful yet easy-to-implement color reduction methods can be achieved. The best choice is sub-MDC+K; the second is running K-means for two iterations, forward and backward. We then apply this finding to the foreground derived in Section 2, using sub-MDC+K in CIELAB color space. The color-reduced image is shown in Fig. 9.
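A minimal sketch of running the same clustering in CIELAB is given below; it assumes scikit-image's rgb2lab/lab2rgb conversions (any correct sRGB-to-CIELAB conversion would do) and reuses the sub_mdc and one_kmeans_pass functions sketched after Table 1.

```python
import numpy as np
from skimage.color import rgb2lab, lab2rgb

def reduce_colors_lab(image_rgb, n_colors):
    """Run the sub-MDC+K clustering of the previous section in CIELAB and
    render each pixel with its nearest cluster color.
    image_rgb: float array in [0, 1] of shape (H, W, 3)."""
    lab = rgb2lab(image_rgb)
    pixels = lab.reshape(-1, 3)
    centers = sub_mdc(pixels, n_colors)                 # sketched above
    centers = one_kmeans_pass(pixels, centers)
    centers = one_kmeans_pass(pixels, centers, reverse=True)
    # Assign every pixel to its closest center (Euclidean distance in Lab).
    # A simple, memory-heavy assignment kept here for clarity.
    d = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    out_lab = centers[labels].reshape(lab.shape)
    return lab2rgb(out_lab)
```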

(a)

(b)

(c)

Fig. 8 Comparison of clustering methods when performing color reduction to 16 colors in CIELAB color space. (a) 2 iterations of K-means forward and backward, (b) 15 iterations of K-means forward and backward, (c) Sub-MDC+K.

Method                            | MSE (RGB) | MSE (CIELAB)
Photoshop                         | 86.121    | -
K-means, 1 iteration              | 168.581   | 115.223
K-means, 2 iterations             | 131.606   | 90.499
K-means, 2 iterations fwd+bwd     | 110.640   | 67.156
K-means, 15 iterations            | 95.913    | 70.699
K-means, 15 iterations fwd+bwd    | 86.895    | 60.762
MDC+K                             | 84.769    | 62.345
Sub-MDC+K                         | 87.626    | 63.718

Table 2 Mean square error (MSE) from the original images, measured in CIELAB color space; the RGB and CIELAB columns indicate the color space in which the reduction was performed. Average of 20 images reduced to 16 colors.

It can be seen from Table 2 that, first, in RGB color space, the error of Photoshop's perceptual color palette [37] is as low as that of K-means with a high number of forward and backward iterations, as well as that of sub-MDC+K [27]. However, it would be a misinterpretation to conclude that [37], K-means, and [27] are comparable when used in RGB color space: [37] cannot preserve contrast in the scene well, and K-means is slow and cannot preserve colors with a small number of pixels, as can be seen in Fig. 7. On the other hand, [27] preserves the contrast in an image very well in a reasonable time. Second, although more iterations of K-means reduce the error, the contrast sometimes gets worse when the number of iterations is too high; in practice, the number of iterations is also limited by the interactive requirement. Third, running K-means forward and backward reduces the error considerably. Finally, CIELAB clearly offers higher quality in color reduction than RGB. In CIELAB, running only two iterations of K-means, forward and backward, is comparable to sub-MDC+K.
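For reference, the MSE metric of Table 2 can be computed along the following lines; averaging the squared per-channel differences in CIELAB is an assumption of this sketch, as the paper does not spell out the exact averaging.

```python
import numpy as np
from skimage.color import rgb2lab

def mse_cielab(original_rgb, reduced_rgb):
    """Mean square error between two images, averaged over the
    per-channel squared differences in CIELAB."""
    diff = rgb2lab(original_rgb) - rgb2lab(reduced_rgb)
    return float(np.mean(diff ** 2))
```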

(b)

Fig. 9 Comparison of color reduction tools when performing color reduction to 4 colors using sub-MDC+K. (a) Using RGB color space, (b) Using CIELAB color space.

IV. VISUAL FEEDBACK DRAWING

After the foreground and its appropriate set of colors are derived, a method that mimics human painting style is used. For the local information used to guide brush strokes, [38] uses the gradient, which is more robust to texture, whereas [39] uses image moments, which are more robust to area orientation. However, such methods do not produce good brush strokes because the gradient or moment that a robot preserves can be noisy. Recent methods that consider the global gradient include [31], which selects only strong gradients and applies a radial basis function (RBF) to interpolate the gradient over other areas. This seems to match an artist's perception; in fact, the results shown in [31] resemble Van Gogh's style. This paper modifies the method of [31], similarly to our former work [24], to generate practical end-effector trajectories for the robot. For low-level manipulation, please refer to [5].

A. Brush strokes generation

Whereas [24] uses texture to guide brush strokes, we observe that the gradient of the object may or may not generate good trajectories. Hence, this paper considers only the gradient information of the boundary between foreground and background. The robot then uses a linear basis RBF [32] as a window to interpolate the orientation field, weighting each sample by (4):

w = Grad_x^2 + Grad_y^2    (4)

In other words, the orientation image is not calculated fully globally, since doing so would make the orientation of every pixel depend on the whole image's orientation, bending straight lines; global computation also consumes a great deal of time. Instead, the robot uses a fixed-size RBF mask to calculate the orientation. An area that has no orientation information uses the orientation of the surrounding area, obtained by increasing the size of the RBF mask.

The result is shown in Fig. 10. Note that Fig. 10 (b) is generated only so that the reader can view the result of the RBF; in practice, we need to interpolate the orientation only at the brush position.

(a)

(b)

Fig. 11 Geometry edge drawn by robot. (a) Input processed by algorithm of [5], and (b) Output.

(a)

(b)

Fig. 10 Orientation of an image scaled from [−π/2, π/2] to [0, 255]. (a) Original orientation of the boundary, (b) Orientation after RBF is applied.
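A minimal sketch of the windowed orientation interpolation is given below. It weights boundary gradients by equation (4) and by a linear radial window inside a fixed-size mask, interpolating only at the query (brush) position as described above. The decaying linear window, the doubled-angle averaging, and the mask size are implementation choices of this sketch rather than details taken from [31] or [32].

```python
import numpy as np

def interpolate_orientation(x, y, boundary_pts, grad_x, grad_y, mask_radius=40):
    """Estimate the orientation at brush position (x, y) from boundary
    gradients inside a fixed-size window, weighted by gradient strength
    (eq. 4) and by a linear radial window.
    boundary_pts: (M, 2) pixel coordinates of the boundary.
    grad_x, grad_y: (M,) image gradients at those boundary pixels."""
    d = np.linalg.norm(boundary_pts - np.array([x, y], dtype=float), axis=1)
    in_window = d < mask_radius
    if not np.any(in_window):
        return None                      # caller enlarges the RBF mask, as in the text
    w_strength = grad_x[in_window] ** 2 + grad_y[in_window] ** 2   # eq. (4)
    w_radial = 1.0 - d[in_window] / mask_radius                    # linear radial window
    w = w_strength * w_radial
    # Average as doubled-angle vectors so that theta and theta + pi
    # are treated as the same orientation.
    theta = np.arctan2(grad_y[in_window], grad_x[in_window])
    c = np.sum(w * np.cos(2.0 * theta))
    s = np.sum(w * np.sin(2.0 * theta))
    return 0.5 * np.arctan2(s, c)        # orientation in (-pi/2, pi/2]
```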

B. Visual feedback drawing: plan, act, and sense

First, the robot detects a brush using the stereo cameras mounted in its head, as can be seen in Fig. 1. Whereas former grasping work like [40] focuses on localization of an object and path planning, and [41] focuses on learning from observation, the low-level control of this work considers localization and grasping using multiple sensors: cameras and force sensors in the robot's hand. After the robot picks up the brush, in order to calibrate the position of the brush tip, principal component analysis (PCA) is applied to a real-time captured image to calculate the axis of the brush in its hand. Along this axis, the pre-computed distance between the lower edge of the handle and the tip is added to the lowest detected position of the brush handle; this is established as the brush tip position.

The robot then scans each color segment derived in Section 3 to find an area that contains a number of pixels larger than (5):

th_region = k_region × r^2    (5)

where r is the pre-computed radius of the brush and k_region is a constant that must be lower than π, depending on drawing style. If such an area is found, the robot starts to draw. To check whether the brush tip touches the canvas, the force sensor is used together with the brush tip position detected in real time. For the next movement, the robot looks along the normal direction of the orientation derived earlier and counts all yet-to-be-drawn pixels of the same color. If the number of pixels is higher than the threshold of (5), the robot decides to move the brush; each stroke is considered finished when the area found in the normal orientation direction is smaller than (5). After one color is painted, the next color is selected.

At the present time, due to insufficient drawing equipment and other resources, the artworks the robot has created are limited to drawing of the geometry edge, as can be seen in Fig. 11. For color filling, a simulation is run on a laptop with k_region set to 0.5, and the result is shown in Fig. 12.
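The two geometric computations in this loop can be sketched as follows: the brush axis from PCA on the detected brush pixels, the tip position offset along that axis, and the area threshold of equation (5). The image-coordinate convention ("lowest" taken as the largest row index) and the helper names are assumptions of this sketch.

```python
import numpy as np

def brush_axis(brush_pixels):
    """Principal axis of the brush from its detected 2D pixel coordinates:
    the eigenvector of the covariance matrix with the largest eigenvalue (PCA)."""
    pts = np.asarray(brush_pixels, dtype=float)
    mean = pts.mean(axis=0)
    cov = np.cov((pts - mean).T)
    eigvals, eigvecs = np.linalg.eigh(cov)
    return mean, eigvecs[:, np.argmax(eigvals)]      # unit vector along the handle

def brush_tip(brush_pixels, handle_to_tip_dist):
    """Tip estimate: from the lowest detected handle point, move the
    pre-computed handle-to-tip distance along the principal axis.
    'Lowest' is taken here as the largest row index (image y grows downward)."""
    _, axis = brush_axis(brush_pixels)
    pts = np.asarray(brush_pixels, dtype=float)
    lowest = pts[np.argmax(pts[:, 1])]
    if axis[1] < 0:                                  # orient the axis downward
        axis = -axis
    return lowest + handle_to_tip_dist * axis

def worth_drawing(n_pixels, brush_radius, k_region=0.5):
    """Eq. (5): draw a region only if it holds more pixels than
    th_region = k_region * r**2, with k_region < pi."""
    return n_pixels > k_region * brush_radius ** 2
```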

(a)

(b)

Fig. 12 Visual feedback painting. (a) Trajectories, (b) Area filling result.

This type of visual feedback simulation typically consumes less than 0.1 second for a 280x200 image on a laptop with a 1.8 GHz CPU; thus, it would impose negligible load on the real robot platform.

V. CONCLUSION

This paper focuses on finding new robot vision technologies for object segmentation, color perception, and orientation mapping, and applies them to accomplish visual feedback drawing by a robot. To extract the subject area, this paper shows how to exploit normal stereo cameras to roughly extract the object automatically using a disparity map and 3D background subtraction, and then uses our Comprehensive Iterative Foreground Segmentation (CIFS) to extract the object area correctly. The 3D background subtraction is usually noisy, so dilations and erosions are required. CIFS then exploits the optimization of local and global color similarity with an edge constraint to extract the boundary correctly. This is a promising paradigm that can be applied to a wide variety of 3D segmentation and detection tasks.

To obtain color perception similar to that of a human, color reduction based on maximum distance clustering and K-means clustering is used for the robot to sense the color distribution in a scene. Within a reasonable time, the method outperforms many existing color reduction algorithms in terms of color perception. A fast yet delicate global orientation perception is proposed using a radial basis function. Visual feedback painting then uses the derived color segments and orientation map to guide brush strokes. Cameras on the robot's head and force sensors in its hand are used to locate and grasp the brush as well as to check whether the brush contacts the canvas.

ACKNOWLEDGMENT

This work is supported by the Japan Science and Technology Corporation (JST) under the CREST project.

REFERENCES

[1] A. Srikaew, M. E. Cambron, S. Northrup, R. A. Peters II, M. Wilkes, and K. Kawamura, "Humanoid Drawing Robot," IASTED International Conference on Robotics and Manufacturing, 1998.
[2] P. Monaghan, "An Art Professor Uses Artificial Intelligence to Create a Computer That Draws and Paints," The Chronicle of Higher Education, 1997.
[3] http://techhouse.brown.edu/~neel/drawing_telerobot/
[4] http://robix.com/drawbot.htm
[5] S. Kudoh, K. Ogawara, M. Ruchanurucks, and K. Ikeuchi, "Painting Robot with Multi-Fingered Hands and Stereo Vision," IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems, 2006.
[6] F. Gravot, A. Haneda, K. Okada, and M. Inaba, "Cooking for Humanoid Robot, a Task that Needs Symbolic and Geometric Reasonings," ICRA, 2006.
[7] F. Santini and M. Rucci, "Depth Perception in an Anthropomorphic Robot that Replicates Human Eye Movements," ICRA, 2006.
[8] D. R. Thompson and D. Wettergreen, "Multi-object Detection in Natural Scenes with Multiple-view EM Clustering," IROS, 2005.
[9] M. Marrón, J. C. García, M. A. Sotelo, D. Fernández, and D. Pizarro, "XPFCP: An Extended Particle Filter for Tracking Multiple and Dynamic Objects in Complex Environment," IROS, 2005.
[10] Y. Y. Chuang, B. Curless, D. Salesin, and R. Szeliski, "A Bayesian approach to digital matting," IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2001.
[11] Y. Mishima, "Soft edge chroma-key generation based upon hexoctahedral color space," U.S. Patent 5,355,174, 1993.
[12] A. Berman, P. Vlahos, and A. Dadourian, "Comprehensive method for removing from an image the background surrounding a selected object," U.S. Patent 6,134,345, 2000.
[13] M. A. Ruzon and C. Tomasi, "Alpha estimation in natural images," IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 18–25, 2000.
[14] M. Kass, A. Witkin, and D. Terzopoulos, "Snakes: Active contour models," IEEE International Conference on Computer Vision, pp. 261–268, 1987.
[15] J. Hug, C. Brechbuhler, and G. Szekely, "Tamed Snake: A Particle System for Robust Semi-Automatic Segmentation," Proc. Intl. Conf. on Medical Image Computing and Computer-Assisted Intervention, pp. 106–115, 1999.
[16] E. Mortensen and W. Barrett, "Intelligent scissors for image composition," ACM SIGGRAPH, 1995.
[17] Y. Boykov and M. P. Jolly, "Interactive Graph Cuts for Optimal Boundary and Region Segmentation of Objects in N-D Images," IEEE International Conference on Computer Vision, 2001.
[18] M. Ruchanurucks, K. Ogawara, and K. Ikeuchi, "Comprehensive Iterative Foreground Segmentation," IEEE Trans. on Pattern Analysis and Machine Intelligence (submitted).
[19] Y. Boykov and V. Kolmogorov, "An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 26, 2004.
[20] J. Wang, P. Bhat, R. A. Colburn, M. Agrawala, and M. F. Cohen, "Interactive video cutout," ACM SIGGRAPH, 2005.
[21] Y. Li, J. Sun, and H. Y. Shum, "Video object cut and paste," ACM SIGGRAPH, 2005.
[22] M. Ruchanurucks, K. Ogawara, and K. Ikeuchi, "Neural Network Based Foreground Segmentation with an Application to Multi-Sensor 3D Modeling," IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems, 2006.
[23] M. McGuire, W. Matusik, H. Pfister, J. F. Hughes, and F. Durand, "Defocus video matting," ACM SIGGRAPH, 2005.
[24] M. Ruchanurucks, S. Kudoh, K. Ogawara, T. Shiratori, and K. Ikeuchi, "Humanoid robot painter: visual perception and high-level planning," IEEE International Conference on Robotics and Automation, 2007.
[25] P. Heckbert, "Color image quantization for frame buffer display," Comput. Graph., vol. 16, pp. 297–307, 1982.

[26] O. Verevka, "The local K-means algorithm for color image quantization," M.Sc. dissertation, Univ. Alberta, Canada, 1995.
[27] M. Ruchanurucks, "Clustering based color reduction: Improvement and tips," Meeting on Image Recognition and Understanding, 2007.
[28] A. Atsalakis and N. Papamarkos, "Color reduction and estimation of the number of dominant colors by using a self-growing and self-organized neural gas," Engineering Applications of Artificial Intelligence 19, pp. 769–786, 2006.
[29] J. M. Buhmann, D. W. Fellner, M. Held, J. Ketterer, and J. Puzicha, "Dithered color quantization," Proc. EUROGRAPHICS, vol. 17, no. 3, pp. 219–231, 1998.
[30] L. Vincent and P. Soille, "Watersheds in Digital Spaces: An Efficient Algorithm Based on Immersion Simulations," IEEE Trans. on Pattern Analysis and Machine Intelligence 13, 1991.
[31] J. Hays and I. Essa, "Image and Video Based Painterly Animation," International Symposium on Non-Photorealistic Animation and Rendering, 2004.
[32] A. G. Bors, "Introduction of Radial Basis Function (RBF) Networks," Online Symposium for Electronics Engineers, vol. 1 of DSP Algorithms: Multimedia.
[33] C. Breazeal, C. Kidd, A. L. Thomaz, G. Hoffman, and M. Berlin, "Effects of Nonverbal Communication on Efficiency and Robustness in Human-Robot Teamwork," IROS, 2005.
[34] D. Greig, B. Porteous, and A. Seheult, "Exact MAP Estimation for Binary Images," J. Roy. Stat. Soc. B 51, pp. 271–279, 1989.
[35] C. Rother, V. Kolmogorov, and A. Blake, "GrabCut – Interactive Foreground Extraction Using Iterated Graph Cut," ACM SIGGRAPH, 2004.
[36] J. Wang and M. F. Cohen, "An Iterative Method Approach for Unified Image Segmentation and Matting," IEEE International Conference on Computer Vision, 2005.
[37] Adobe Photoshop 7.
[38] A. Hertzmann, "Painterly Rendering with Curved Brush Strokes of Multiple Sizes," ACM SIGGRAPH, 1998.
[39] M. Shiraishi and Y. Yamaguchi, "An Algorithm for Automatic Painterly Rendering Based on Local Source Image Approximation," International Symposium on Non-Photorealistic Animation and Rendering, 2000.
[40] Y. Hirano, K. Kitahama, and S. Yoshizawa, "Image-Based Object Recognition and Dexterous Hand/Arm Motion Planning Using RRTs for Grasping in Cluttered Scene," IROS, 2005.
[41] M. Hueser, T. Baier, and J. Zhang, "Learning Demonstrated Grasping Skills by Stereoscopic Tracking of Human Hand Recognition," ICRA, 2006.