Automatic Generation of Robot Program Code: Learning from Perceptual Data

M. Yeasin
Department of Electrical Engg., Bangladesh Institute of Technology Chittagong, Chittagong-4349, Bangladesh
(on study leave at IIT, Bombay)

S. Chaudhuri
Department of Electrical Engg., Indian Institute of Technology, Powai, Bombay-76, India

Abstract

We propose a novel approach to programming a robot by demonstrating the task multiple times in front of a vision system. We integrate human dexterity with sensory data using computer vision techniques on a single platform. A simultaneous feature detection and tracking framework is used to track various features (fingertips and the wrist joint). A Kalman filter performs the tracking by predicting the tentative feature location, and an HOS-based data clustering algorithm extracts the feature. Color information of the features is used for establishing correspondences. The fast, efficient and robust vision system thus developed processes a binocular video sequence to obtain the trajectories and the orientation information of the end-effector. The concept of a trajectory bundle is introduced to avoid singularities and to obtain an optimal path.

I. Introduction

Most of today's robots lack the ability to perform satisfactorily when the object to be manipulated exhibits a high degree of variability [1]. Developing a flexible robotic manipulation system that tackles these uncertainties and adapts to a frequently changing working environment with minimum operator intervention is of much practical importance. Severe difficulties arise in designing such a system, whereas human beings have the capability to process such information intelligently and the skill to act on it. They can perform these tasks with ease, flexibility and, of course, with sufficient reliability. Ideally, one wants a system that can respond as a human being would. In many practical systems, such as a flexible manufacturing system or a robot handling system, reconfigurability of the robot is quite crucial. Unfortunately, no robot programming language is capable of meeting this mundane requirement. Hence, it is of practical interest for some specific applications (viz., automatic robot programming and human-computer interaction) to tap the natural intelligence and dexterity of human beings into the system.

The motivation behind programming a robot by demonstration is simple and compelling: if a user knows how to perform a task, that should be sufficient to create a program to replicate the task. This research work investigates various issues that arise in practically realizing this idea. Teaching by guiding [2] is a simple method in which an operator directs the robot's end-effector, by means of a teach panel, to all the locations where it has to grip or manipulate an object. This method is simple but error prone, tiring, less portable and possibly risky. Nevertheless, this simple example discloses some of the characteristics frequently found in learning algorithms. A logical extension of this paradigm is programming a robot by visual demonstration (PRVD). It is widely believed (for example, [2], [3], [4], [5], [6], [7], [8]) that integration of perceptual information with human skill can help in developing a flexible autonomous platform for programming a robot. A robot programming language is a means by which a programmer can express the intended operation of a robot and associated activities [9]. However, the level of abstraction at which one must express robotic and sensory operations does not fulfill one's expectations. To overcome these disadvantages, we suggest a vision-based system for programming a robot.

The subject of this paper is not the entire system, which remains to be fully integrated. Instead, our endeavor is to develop a fast, efficient and robust vision system capable of extracting sufficient statistics to program a robot from perceptual data. We introduce a new notion of a nonsingular optimal path by defining a trajectory bundle. We believe this will simplify code generation and help the robot to perform the task with the desired level of accuracy.

The organization of the paper is as follows. Section II describes the vision system for automatic code generation. Section III reports the design and implementation issues of the proposed vision system. Section IV describes experimental results to show the efficacy of the proposed system. Finally, Section V concludes the paper.

II. Proposed Vision System

The essential components of the proposed system are shown in Figure 1. The system is composed of five major blocks, namely i) data acquisition, ii) vision, iii) trajectory reconstruction, iv) task description and v) command generation modules. The data acquisition module captures the training data, and the vision module extracts sufficient statistics from it to generate automatic commands for the robot controller. The trajectory reconstruction module constructs a smooth and optimal path from the information (for example, the trajectory of the human hand in motion) extracted by the vision system. The task description module breaks the trajectory into sub-tasks to facilitate command generation using any available robot programming language.
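To make the data flow concrete, the following skeleton sketches the five modules as plain Python stubs. All names and signatures here are hypothetical; they merely mirror the blocks of Figure 1, not an actual implementation.

```python
from dataclasses import dataclass
from typing import Dict, List
import numpy as np

@dataclass
class Demonstration:
    """One demonstration: per-feature 3-D trajectories, each a (T, 3) array."""
    trajectories: Dict[str, np.ndarray]

def vision_module(stereo_frames) -> Demonstration:
    """Extract marker trajectories from a binocular sequence (Section III)."""
    raise NotImplementedError

def trajectory_reconstruction(demos: List[Demonstration]) -> np.ndarray:
    """Coalesce demonstrations into a bundle and return one smooth path."""
    raise NotImplementedError

def task_description(path: np.ndarray) -> list:
    """Segment the path into sub-tasks at velocity break points."""
    raise NotImplementedError

def command_generation(subtasks: list) -> List[str]:
    """Map sub-tasks to commands such as OPEN, MOVE, MOVES, CURVE."""
    raise NotImplementedError
```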

Given the configuration space of the robot, it is possible to check the trajectory information obtained from demonstrations for workspace singularities, if any. The inverse kinematic solution can be used to translate the Cartesian coordinates of the trajectory thus obtained into the corresponding robot joint coordinates [11]. However, there may be singularities in the path so defined, which will force the robot to deviate from the trajectory (the deviation may not be collision free) and may also cause the robot to fail to perform its task. Such a path is optimal neither in the human nor in the robot manipulator sense. This problem can be easily circumvented by defining a "trajectory bundle" instead.
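As an illustration of the singularity check mentioned above, the sketch below flags trajectory points whose manipulator Jacobian is near-singular. Here `fk` is an assumed robot-specific forward-kinematics routine, and the finite-difference Jacobian is our stand-in for an analytic one.

```python
import numpy as np

def numerical_jacobian(fk, q, eps=1e-6):
    """Finite-difference Jacobian of the forward kinematics fk at joint vector q."""
    p0 = fk(q)
    J = np.zeros((p0.size, q.size))
    for k in range(q.size):
        dq = np.zeros_like(q)
        dq[k] = eps
        J[:, k] = (fk(q + dq) - p0) / eps
    return J

def singular_points(fk, joint_path, tol=1e-3):
    """Indices of path points where the manipulator is near a singularity."""
    bad = []
    for i, q in enumerate(joint_path):
        J = numerical_jacobian(fk, np.asarray(q, dtype=float))
        # A vanishing smallest singular value means a motion direction is lost.
        if np.linalg.svd(J, compute_uv=False)[-1] < tol:
            bad.append(i)
    return bad
```

For a six-DOF arm, `fk` would map a 6-vector of joint angles to the end-effector position, and `joint_path` would be the inverse-kinematics solution of the demonstrated trajectory.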

[Figure 1 here: block diagram showing the human workspace with objects to be manipulated and a binocular camera setup feeding a data acquisition system and standard workstation; the vision module (frame digitizer, stereo port, segmentation, feature detection and tracking); and the trajectory reconstruction, task description and command generation modules driving the robot controller, the manipulator (joints, end-effector, gripper) and the robot workspace with the objects to be manipulated.]

Fig. 1. Block diagram of the proposed system.

In this approach a human operator demonstrates the particular task that needs to be automated. The operator performs the task multiple times in front of a binocular data acquisition system. The operator provides the intelligence in choosing the hand (end-effector) trajectory, the grasping strategy, the hand-object interaction through directly grasping the object, and the subsequent action performed on the object. This alleviates the need for path planning, grasp synthesis, object recognition and task specification. The vision system we present in this paper captures the trajectory (position and orientation) information from the training data. The stereo vision system provides the world-coordinate trajectory traversed by the human operator while demonstrating the task. The trajectory information thus obtained can be transformed to the coordinates of the robot working space using the standard technique [10].
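This coordinate transfer amounts to applying a calibrated rigid-body transform. A minimal sketch follows, assuming the 4x4 homogeneous matrix `T_robot_world` is available from a separate calibration step (the name is ours):

```python
import numpy as np

def to_robot_frame(points_world, T_robot_world):
    """Map an (N, 3) trajectory from world coordinates to robot base coordinates."""
    pts = np.asarray(points_world, dtype=float)
    homog = np.hstack([pts, np.ones((pts.shape[0], 1))])  # (N, 4) homogeneous points
    return (homog @ T_robot_world.T)[:, :3]
```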

Fig. 2. Illustration of the concept of a trajectory bundle for a two-finger end-effector. The projection from 3-D to 2-D is shown.

The transformed trajectories obtained from multiple demonstrations are coalesced to construct a "trajectory bundle". The trajectory bundle is thus defined as the 3-D envelope (given by the convex hull) of all trajectories of a particular end-effector over the multiple demonstrations of the task. Figure 2 shows the conceptual outline of the trajectory bundles for the thumb and the index finger (applicable to a two-finger gripper). It may be noted that the exact shape of the bundle depends on the dexterity of the human operator. The trajectory selection module selects the best trajectory from the bundle thus constructed, based on some optimality criterion. The criterion must guarantee a smooth as well as a properly interpolated path in addition to being optimal. A sketch of the bundle construction is given below; the smooth path itself is obtained with the vector spline technique of Section II-A.
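A minimal sketch of the bundle construction, using scipy's convex hull as the 3-D envelope; the function names are ours, and the optimality criterion is deferred to the spline of Section II-A.

```python
import numpy as np
from scipy.spatial import ConvexHull

def trajectory_bundle(demos):
    """Envelope of all demonstrated trajectories for one feature.

    demos: list of (T_i, 3) arrays, one per demonstration.
    Returns the convex hull of the pooled 3-D points.
    """
    cloud = np.vstack(demos)
    return ConvexHull(cloud)

def inside_bundle(hull, p, tol=1e-9):
    """True if point p lies within the bundle (all hull half-spaces satisfied)."""
    # hull.equations rows are [a, b, c, d] with a*x + b*y + c*z + d <= 0 inside.
    return bool(np.all(hull.equations[:, :3] @ np.asarray(p) + hull.equations[:, 3] <= tol))
```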

A. Generation of Smooth Path

The idea of reconstructing an optimal, nonsingular trajectory from the demonstrations stems from the fact that the trajectory generated by the human hand, which has more than 30 DOFs, has to be replicated by a robot manipulator having only six DOFs. Hence, it is crucial to impose some constraints on the trajectory traversed by the human operator if the robot is to replicate the task. While tracking a trajectory, a robotic system usually encounters two kinds of singularities. The first category involves a point which is on or outside the boundary of the robot's workspace. The second type lies within the workspace but can be bypassed under suitable conditions. Moreover, the trajectory generated by the human operator is arbitrary in nature and may not be smooth. For this reason we must seek a trajectory in the class of smooth curves. This can be achieved using a vector spline [12] that minimizes the following cost function:

$$
d(\lambda) = \sum_{j=1}^{N} \sum_{i=1}^{N_j} \left[ \frac{\left( y_i^j - Y(x_i^j) \right)^2}{N_j} + \frac{\left( z_i^j - Z(x_i^j) \right)^2}{N_j} \right] + \lambda \int \left[ \left( \frac{d^2 Z}{dx^2} \right)^2 + \left( \frac{d^2 Y}{dx^2} \right)^2 \right] dx, \qquad (1)
$$

where $(x_i, y_i, z_i)$ denotes the 3-D coordinates of a point on the training trajectory in the robot coordinate system and $\lambda$ is a regularization parameter which controls the degree of smoothness. The superscript $j$ is the running index of the particular demonstration, $N$ stands for the total number of demonstrations and $N_j$ is the number of measurement points in the $j$th demonstration. The estimated path is given by the coordinates $(x, Y(x), Z(x))$ for all $x$. It may be noted that the $x$-coordinate has been used as the independent variable. If there is any singularity in the robot space, it must be enforced as a constraint while minimizing equation (1) [13], [14].
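As an illustration, the following sketch smooths pooled demonstration data with scipy's univariate smoothing spline standing in for the vector spline of [12]; Y(x) and Z(x) are fitted independently here, with the smoothing factor `s` playing the role of $\lambda$. Treat it as a sketch under those assumptions, not the exact estimator of equation (1).

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

def smooth_path(demos, s=1.0, n_out=200):
    """Fit smooth Y(x), Z(x) to pooled demonstration points.

    demos: list of (N_j, 3) arrays of (x, y, z) samples, one per demonstration.
    s: smoothing factor (larger => smoother), analogous to lambda in eq. (1).
    """
    pts = np.vstack(demos)
    order = np.argsort(pts[:, 0])             # x is the independent variable
    x, y, z = pts[order].T
    x, idx = np.unique(x, return_index=True)  # spline needs strictly increasing x
    Y = UnivariateSpline(x, y[idx], s=s)
    Z = UnivariateSpline(x, z[idx], s=s)
    xs = np.linspace(x.min(), x.max(), n_out)
    return np.column_stack([xs, Y(xs), Z(xs)])
```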

B. Temporal Segmentation and Command Generation

It is essential to map the trajectory obtained using the technique described in Section II-A to appropriate commands for the robot. Ideally, one would like to transfer the smooth trajectory obtained from the demonstrations as it is, and leave it to the controller to track it in space and time. However, the interface between the user and the robot is a controller programming language, which unfortunately cannot track such a trajectory. This problem can be circumvented by segmenting the trajectory into meaningful sub-tasks. This can be achieved from the velocity profile of the trajectory using the technique adopted in [7]. From the velocity profile we can define high and low velocity nodes on the path to be followed. Finally, the different segments of the trajectory can be merged to form the complete instruction list for the robot controller. The segmented path can be used as an input parameter to plan the command sequence. Individual commands like OPEN (open the gripper), ROTATE, MOVE, MOVES (move along a straight line) and CURVE (cubic Bezier spline) can be linked with segments to generate automatic commands for the robot.
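A sketch of this segmentation and command mapping follows; the velocity threshold and the straight-versus-curved rule are illustrative assumptions of ours, not the exact technique of [7].

```python
import numpy as np

def segment_by_velocity(path, dt=1.0, low=0.2):
    """Split an (N, 3) path at low-speed nodes (candidate sub-task boundaries)."""
    speed = np.linalg.norm(np.diff(path, axis=0), axis=1) / dt
    nodes = np.where(speed < low * speed.max())[0]
    cuts = sorted(set([0] + [int(i) for i in nodes] + [len(path) - 1]))
    return [path[a:b + 1] for a, b in zip(cuts[:-1], cuts[1:]) if b > a]

def to_commands(segments):
    """Map each segment to a robot-language command (illustrative rule only)."""
    prog = ["OPEN"]                        # assume the gripper starts open
    for seg in segments:
        # Nearly straight segments become MOVES; curved ones become CURVE.
        chord = np.linalg.norm(seg[-1] - seg[0])
        arc = np.linalg.norm(np.diff(seg, axis=0), axis=1).sum()
        prog.append("MOVES" if arc < 1.05 * chord else "CURVE")
    return prog
```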

III. Design of the Experiment

It is extremely difficult to achieve complete automatic programming of a robot in real time using existing techniques. One must ensure that the following demands are met:
- The system must be able to provide a precise nonsingular, collision-free trajectory to replicate the task.
- The system must be able to split the image sequence into meaningful segments that correspond to separate human assembly tasks, i.e., perfect temporal segmentation of the image sequence.
Both of these criteria can be met using the natural dexterity and skill of human beings. To obtain an accurate result, we attach circular markers of different colors to the various fingertips and hand joints. We use the centroids of the blobs thus generated in the image plane as our features for tracking; a sketch of the centroid extraction is given below. A typical color encoding scheme for the different fingertips and the wrist joint is shown in Figure 3.
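As referenced above, a sketch of the centroid extraction, where `mask` is assumed to be the binary segmentation output for one color marker:

```python
import numpy as np
from scipy import ndimage

def marker_centroids(mask, min_area=10):
    """Centroids (row, col) of connected blobs in a binary marker mask."""
    labels, n = ndimage.label(mask)
    cents = []
    for lab in range(1, n + 1):
        if (labels == lab).sum() >= min_area:          # ignore speckle blobs
            cents.append(ndimage.center_of_mass(mask, labels, lab))
    return np.array(cents)
```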

[Figure 3 here: left and right camera views of the hand, with red, white, blue, green and yellow markers on the fingertips and the wrist.]

Fig. 3. Color coding for establishing feature correspondences between the left and right images. The little finger is not used for a four-finger gripper.

We use a higher order statistics (HOS) based data clustering algorithm to detect the color markers with improved accuracy. The algorithm is based on a series expansion of the multivariate probability density function in terms of the Gaussian function $N(\mu, R)$ and Hermite polynomials, where $\mu$ and $R$ represent the usual first and second order statistics of the data $X$ (the RGB information at a pixel), respectively. The HOS-based similarity measure (for details see [15]) is given by

$$
-\log \left\{ N(\mu, R) \left[ 1 + \sum_{n=3}^{m} E\!\left[ H_n^T\!\left( R^{-\frac{1}{2}} (X - \mu) \right) \right] H_n\!\left( R^{-\frac{1}{2}} (X - \mu) \right) \right] \right\}. \qquad (2)
$$

Here $H_n$ is the $n$th-order Hermite polynomial, $m$ is the order of the expansion and $E$ is the expectation operator. A one-dimensional illustration of this measure is sketched below; based on it, we then present the data clustering algorithm.
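The one-dimensional sketch uses the probabilists' Hermite polynomials from numpy; the multivariate form in [15] replaces the scalars with whitened vectors, and the 1/n! scaling of the expansion coefficients is our assumption about how the series is normalized.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermeval
from math import factorial, log, sqrt, pi

def hos_similarity(x, samples, m=4):
    """Negative log of a Gram-Charlier density estimate at x (cf. eq. (2)).

    samples: 1-D cluster data used to learn mu, sigma and the moments E[He_n].
    """
    mu, sigma = samples.mean(), samples.std()
    u = (x - mu) / sigma
    us = (samples - mu) / sigma
    gauss = np.exp(-0.5 * u * u) / (sigma * sqrt(2 * pi))
    corr = 1.0
    for n in range(3, m + 1):
        coef = np.zeros(n + 1)
        coef[n] = 1.0                                   # selects He_n
        c_n = hermeval(us, coef).mean() / factorial(n)  # E[He_n] / n!
        corr += c_n * hermeval(u, coef)
    return -log(max(gauss * corr, 1e-300))              # guard against negatives
```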

HOS-based data clustering algorithm:

Step I: Obtain c initial pattern centers using Euclidean distances on the data set {X}. Divide the data set into c clusters by assigning each data sample to the nearest pattern center.
Step II: Initialize the joint moments (second order and onwards, up to order m) of all c clusters.
Step III: Recompute the pattern centers to be the centroids of the current data partitions.
Step IV: Re-assign the data partitions into c classes using the HOS-based similarity measure defined in equation (2). If the data partitions remain unchanged, or if the maximum number of inner-loop iterations is exceeded, go to Step V; otherwise, return to Step III.
Step V: Re-compute the higher order moments of all c clusters from their respective data partitions.
Step VI: Recompute the data partitions with the HOS-based measure using the currently learned statistics. If the data partitions remain unchanged, or if the maximum number of outer-loop (Steps III-VI) iterations has been exceeded, proceed to Step VII; else go to Step III.
Step VII: Return the current set of c pattern centers and the joint moments of each cluster.

The inner loop performs k-means data clustering, whereas the outer loop learns the necessary statistics from the data set; a sketch of the inner loop is given below. Applying this HOS-based k-means clustering technique detects the individual markers in each frame. However, there is no need to process the entire image to detect the color markers.
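In the sketch of the inner loop, the HOS measure is abstracted into a caller-supplied `dissimilarity(x, members)` so the code stays independent of the details of equation (2); the name and the convergence defaults are ours.

```python
import numpy as np

def hos_kmeans(X, c, dissimilarity, max_inner=50, seed=None):
    """Inner loop (Steps I, III, IV): alternate centroid update and re-assignment.

    X: (N, d) data. dissimilarity(x, members) should implement the HOS-based
    measure of eq. (2); the initial split (Step I) is plain Euclidean.
    """
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=c, replace=False)]          # Step I
    assign = np.argmin(((X[:, None, :] - centers) ** 2).sum(-1), axis=1)
    for _ in range(max_inner):
        centers = np.array([X[assign == k].mean(axis=0)             # Step III
                            if np.any(assign == k) else centers[k]
                            for k in range(c)])
        new = np.array([np.argmin([dissimilarity(x, X[assign == k])
                                   if np.any(assign == k) else np.inf
                                   for k in range(c)])
                        for x in X])                                # Step IV
        if np.array_equal(new, assign):                             # converged
            break
        assign = new
    return centers, assign
```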

We use an iterated extended Kalman filter (IEKF) [16] to reduce the search space and to track the features simultaneously. A 3-D translational model is used for the state update [17]. The algorithm can be written as follows:

Initialization

Step I: Take the first images from the stereo sequences $I_l(x, y, t)$ and $I_r(x, y, t)$. Segment the images using the proposed data clustering algorithm and extract the features of interest.
Step II: Establish both temporal and stereo correspondences based on color and the knowledge of the placement of the markers.
Step III: Initialize the trajectory for every fingertip and the wrist joint.

Simultaneous feature detection and tracking

Step IV: Predict the location of the ith feature in the (k+1)th frame based on information up to the kth frame. Crop windows of appropriate dimensions from both $I_l(x, y, k+1)$ and $I_r(x, y, k+1)$ at the predicted locations.
Step V: Segment only the cropped images using the proposed algorithm and extract the feature points of interest. Establish temporal and stereo correspondences using color.
Step VI: Check for occlusion (absence of the feature) and retain the predicted point in case of occlusion.
Step VII: Resolve the ambiguity, in case multiple blobs are detected within the window, using the path coherence measure [18].
Step VIII: Repeat Steps IV-VII for every feature point.
Step IX: Repeat Steps IV-VIII for the entire sequence.
Step X: Repeat Steps I-IX for each demonstration.
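A condensed sketch of the per-feature loop (Steps IV-VI) for one camera, with a plain constant-velocity Kalman filter standing in for the IEKF of [16] and a hypothetical `segment_window` callable wrapping the clustering step:

```python
import numpy as np

def track_feature(frames, x0, segment_window, win=15, dt=1.0):
    """Track one marker: predict, crop a window, re-detect, update.

    frames: iterable of images; x0: initial (row, col) of the marker.
    State is (r, c, vr, vc) with a constant-velocity model (IEKF stand-in).
    segment_window(patch) should return the blob centroid or None if occluded.
    """
    F = np.eye(4); F[0, 2] = F[1, 3] = dt            # state transition
    H = np.eye(2, 4)                                  # only position is observed
    P, Q, R = np.eye(4) * 10.0, np.eye(4) * 0.1, np.eye(2) * 2.0
    s = np.array([x0[0], x0[1], 0.0, 0.0])
    path = [tuple(x0)]
    for img in frames:
        s, P = F @ s, F @ P @ F.T + Q                 # Step IV: predict
        r0, c0 = int(s[0]) - win, int(s[1]) - win     # crop window at prediction
        patch = img[max(r0, 0):r0 + 2 * win, max(c0, 0):c0 + 2 * win]
        z = segment_window(patch)                     # Step V: detect in window
        if z is None:                                 # Step VI: occlusion
            path.append((s[0], s[1]))                 # keep the predicted point
            continue
        z = np.array([z[0] + max(r0, 0), z[1] + max(c0, 0)])  # image coordinates
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)          # Kalman gain
        s = s + K @ (z - H @ s)
        P = (np.eye(4) - K @ H) @ P
        path.append((s[0], s[1]))
    return np.array(path)
```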

The 3-D trajectory information for the various fingertips and the wrist joint from every demonstration is used to construct the trajectory bundle for each individual feature.

IV. Experimental Results

We experimented on real, binocular, color image sequences to extract the 3-D trajectory information of human hand motion. The ultimate goal is to derive sufficient statistics from the perceptual data. The existing Euclidean and Mahalanobis clustering algorithms and the proposed HOS-based data clustering algorithm were used for color image segmentation, and their performances were compared. We selected c (the number of clusters) to be 4 in our experiments. Figure 4 shows a representative frame of a training image sequence and the corresponding segmentation results (only grey-level images are shown) for a two-finger grasping sequence. Clearly, the color markers on the fingertips are best resolved by the proposed segmentation scheme.

Fig. 5. Sample images from the left and right stereo sequences demonstrating the execution of a task.

[Figure 4 here: four panels, (a) a frame from the sequence and (b)-(d) its grey-level segmentations.]

Fig. 4. (a) First frame of the image sequence. (b, c, d) Segmentation results using the Euclidean, the Mahalanobis and the HOS-based clustering algorithms, respectively.

The second task is to track the feature points and to meaningfully interpret the trajectory traversed by the fingertips and the wrist joint while demonstrating the task. Figure 5 shows a representative frame (from both the left and right cameras) used in our experiments for finding the trajectory information with the proposed algorithm. The sequence was captured by demonstrating the task only once in front of the vision system. Figure 6 shows the trajectories traversed by the different fingertips (the perspective projection on the image plane is displayed for clarity). This result is very significant in terms of capturing the human strategy and skill. It is evident in Figure 6 that the trajectory traversed by each fingertip has a finite number of motion break points. These, in some sense, reflect the strategy chosen by the human operator while demonstrating the task. The motion break points were computed from the velocity profile of the trajectory.

[Figure 6 here: image-plane plot (x vs. y coordinates) of the trajectories of the thumb, index, middle and ring fingertips. Marked nodes correspond to the start of hand motion, the start of preshaping of the hand (low-speed node), the start of approaching the object (high-speed node), the start of pregrasping the object (low-speed node) and the start of fine motion adjustment (extremely low-speed node).]

Fig. 6. Trajectories of the different fingertips obtained from the training data set for a single demonstration.

[Figure 7 here: image-plane plot (x vs. y feature coordinates) of the trajectory bundle for the thumb, with the optimal smooth trajectory drawn inside the bundle.]

Fig. 7. The computed trajectory bundle for the thumb, obtained from multiple training data sets. The optimal smooth path derived from the trajectory bundle is shown as a bold line inside the bundle.


[Figure 8 here: image-plane plot of the trajectory bundle for the index finger, with the optimal smooth trajectory drawn inside the bundle.]

Fig. 8. The trajectory bundle computed similarly for the index finger from the same training data set. The optimal smooth path is derived as above and presented in a similar fashion.

This information is useful for meaningfully segmenting the task into the sub-tasks performed by the operator. We also experimented with multiple demonstrations to circumvent the problems related to a single demonstration. Figures 7 and 8 show the trajectory bundles for the thumb and the index finger, respectively, recovered from the training data. Since a trajectory bundle defines a volume in 3-D, we show its projection on the image plane for illustration. As we artificially induce features on the operator's hand to obtain consistent features to track, our calculation of depth is quite accurate. The chosen smooth trajectory (the solution to equation (1)) is also displayed in the figures. The orientation of the end-effector can also be obtained from the training data [17]. It may be noted that the proposed tracking algorithm handles limited occlusion, but may fail to track the points in the case of large non-uniformity in the temporal sampling of the perceptual data, a common problem with commercial grade frame grabbers.

V. Conclusion

In this paper we present a simple, efficient and robust vision-based system which provides a nonsingular, optimal trajectory for the robot manipulator. The vision system proposed in this paper is quite fast, as it simultaneously predicts and detects the features to be tracked over successive frames and needs to process only a small neighborhood of the predicted points. The use of color information simplifies the task of establishing feature correspondences in the binocular video sequence. We introduce the notion of a 'trajectory bundle' to alleviate the problem of encountering singularities in the trajectory. We believe that an optimal, nonsingular trajectory so computed may help the robot to perform the task with the desired level of accuracy. The accurate trajectory information avoids object recognition (of the object to be manipulated), as an explicit description of the object is not very crucial to performing the task. We use human intelligence to choose the task strategy and the task execution point, which automatically ensures a collision-free path. We have discussed the essential steps of command generation by integrating sensory data and human intelligence.

References

[1] U. Lima and G. N. Saridis, "Learning optimal robotic tasks", IEEE Expert, pp. 38-45, April 1996.
[2] K. Ikeuchi and T. Suehiro, "Towards an assembly plan from observation, Part I: Task recognition with polyhedral objects", IEEE Trans. on Robotics and Automation, vol. 10, pp. 360-385, Oct. 1994.
[3] Y. Kuniyoshi, H. Inoue, and M. Inaba, "Learning by watching: Extracting reusable task knowledge from visual observation of human performance", IEEE Trans. on Robotics and Automation, vol. 10, no. 3, pp. 799-822, June 1994.
[4] T. Sato and S. Hirai, "Language aided robotic teleoperation for advanced teleoperation", IEEE Trans. on Robotics and Automation, vol. 3, no. 5, pp. 476-481, Oct. 1987.
[5] Y. F. Li and M. H. Lee, "Applying vision guidance in robotic food handling", IEEE Robotics and Automation Magazine, pp. 4-12, March 1996.
[6] D. Angluin and C. Smith, "Inductive inference: Theory and methods", ACM Computing Surveys, vol. 15, no. 3, pp. 237-269, 1983.
[7] S. B. Kang and K. Ikeuchi, "Towards automatic robot instruction from perception - recognizing a grasp from observation", IEEE Trans. on Robotics and Automation, vol. 9, no. 2, pp. 432-443, April 1993.
[8] M. Yeasin and S. Chaudhuri, "Automatic robot programming by visual demonstration of task execution", in Proc. Intl. Conf. on Advanced Robotics, Monterey, CA, June 1997.
[9] R. A. Volz, "Report of the robot programming working group: NATO workshop on robot programming languages", IEEE Trans. on Robotics and Automation, vol. 4, no. 1, pp. 87-89, Feb. 1988.
[10] R. M. Murray, Z. Li, and S. S. Sastry, A Mathematical Introduction to Robotic Manipulation, CRC Press, Florida, 1993.
[11] R. J. Vaccaro and S. D. Hill, "A joint-space command generator for Cartesian control of robotic manipulators", IEEE Trans. on Robotics and Automation, vol. 4, no. 1, pp. 71-77, Feb. 1988.
[12] J. A. Fessler, "Nonparametric fixed-interval smoothing with vector splines", IEEE Trans. on Signal Processing, vol. 39, no. 4, pp. 852-859, 1991.
[13] A. Ude, "Trajectory generation from noisy positions of object features for teaching robot paths", Robotics and Autonomous Systems, vol. 11, no. 2, pp. 113-127, 1993.
[14] H. Friedrich and R. Dillmann, "Robot programming based on single demonstration and user intentions", in Proc. 3rd European Conference on Learning Robots, 1995.
[15] A. N. Rajagopalan, M. Yeasin, and S. Chaudhuri, "HOS-based data clustering", submitted to TENCON '97, Queensland University of Technology, Australia.
[16] T. J. Broida and R. Chellappa, "Estimation of motion parameters from noisy images", IEEE Trans. on Pattern Analysis and Machine Intelligence, pp. 90-99, 1986.
[17] M. Yeasin, Visual Analysis of Human Motion: Some Applications in Bio-medicine and Dexterous Programming, Ph.D. thesis (under preparation), Dept. of Electrical Engg., IIT Bombay, India.
[18] I. K. Sethi and R. Jain, "Finding trajectories of feature points in a monocular image sequence", IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 9, no. 1, pp. 56-73, 1987.
