Visual Primitives for Imitation Learning
Ajo Fod, Maja J. Matarić, and Odest Chadwicke Jenkins
University of Southern California, Los Angeles CA, USA
WWW home page: http://www-robotics.usc.edu/~agents
Abstract. We present a new method for compactly representing human movement to aid imitation learning. The paper describes a method to represent movement in terms of a linear superposition of simpler movements termed primitives. If control strategies can be developed that transform easily as the contributions of these primitives to the movement vary, learning can be performed in the space of the primitives. Data from an experiment on human imitation were used to test the theory that movement can be represented as a superposition of components. PCA performed on segments from these data gives a set of basis vectors. Clustering in the space of projections of segments onto the eigenvectors yields the often-used movements. Finally, the movement obtained by expanding the cluster points in terms of the eigenvectors is used as a sequence of via points to control Adonis, our humanoid test bed.
1
Introduction
Programming and interacting with robots, especially humanoid ones, is a complex problem. Using learning to address this problem is a popular approach, but the high dimensionality of humanoid control makes the process prohibitively slow. Imitation, the process by which the robot learns new movement patterns and skills by observing a human demonstrator, is a promising alternative. The ability to imitate enables a robot to greatly reduce the space of possible trajectories to a subset that approximates the observed demonstration. Trial-and-error refinement is still likely to be required, but in a greatly reduced space. We have developed a model for learning by imitation [5, 29], inspired by neuroscience evidence for motor primitives and mirror neurons, structures that directly link the visual and motor systems in the context of complete generalized movements. The ability to mimic and imitate is based on this mapping mechanism. It also serves as a basis for more complex and less direct forms of skill acquisition. Thus, primitives are the fundamental building blocks of our motor control and imitation models. They also shape perception, as they directly couple visual inputs to the motor system. Biological primitives are structures that organize the underlying mechanisms of movement, including spinal fields [1] and central pattern generators [4]. In a computational sense, primitives can be viewed as a basis set of motor programs that are sufficient, through combination operators, for generating entire movement repertoires. Several properties of human movement patterns are known, including smoothness [23, 22, 24], interjoint torque constraints [25], and the "2/3 power law" relating speed to curvature [26], but it is not clear how one would go about creating motor primitives from such knowledge alone. Determining an effective basis set of primitives is a difficult problem. Our previous work has explored hand-coded primitives [6].
In this paper we describe a method of automatically generating a set of movement primitives from human movement data. We hypothesize that the trajectory followed by the arm during movement is composed of segments in which a set of principal components are active. To find these segments, we first segment the movement data and then run principal component analysis on the resulting segments. The eigenvectors corresponding to a few of the highest eigenvalues give us a basis set for a subspace. The projection of the segment vector onto this subspace contains most of the information about the segment. By clustering in this subspace we obtain a set of points that correspond to a set of primitives. Any given segment from human movement data can then be represented by one of these points. Subsequently we present a measure of the performance of this method based on the mean squared deviation of the reconstructed movement from the original movement. Finally, we demonstrate the movements on a 20-DOF dynamic humanoid simulation. The rest of the paper is organized as follows. In Section 2, we place our work in the context of current research. A brief description of our model of imitation is provided in Section 3. The data collection and the Adonis simulator are described in Section 4. The methods used to analyze the data are presented in Section 5. Section 6 presents an error metric for evaluating the performance of the resulting reconstruction. In Section 7, these results are discussed, and a summary is provided in Section 8.
2
Motivation
Movement primitives are behaviors that accomplish complete goal-directed actions. While we are aiming at a unified view of perceptual-motor primitives, a simpler first approach to the problem is to divide primitives into two types: perceptual and motor. Motor primitives capture the dynamics of human movement; they prescribe the torques on the joints as a function of state and sensory feedback. Perceptual primitives are a way of compactly describing information about a movement, including the variation of the joint angles over time, the nature of a grasp, etc., from the visual perspective. One example of motor primitives is found in the functioning of the Central Nervous System (CNS) in vertebrates. The CNS must transform information about a small number of variables into a large number of signals to many muscles. Such an exact transformation might not have a solution, and how the nervous system computes these impulses is a topic of research. Bizzi et al. [1] motivate the concept of Convergent Force Fields (CFFs). Their experiments show that inter-neuronal activity imposes a specific balance of muscle activation. These synergies lead to a finite number of force patterns, which they call CFFs. The CNS uses such synergies to recruit a group of muscles simultaneously to perform a given task. Other studies and theories about neural primitives [2, 3, 27] suggest that there are ways to classify humanoid movement data into primitives. Schaal et al. [8, 9] discuss a set of primitives for generating control commands for any given motion by modifying trajectories appropriately or generating entirely new trajectories from previously learned knowledge. In certain cases it is possible to apply scaling laws in time and space to derive the desired trajectory from a set of approximate trajectories. These elementary behaviors consist of two types of motor primitives, namely discrete and rhythmic movements.
Though this is a good method to control a humanoid robot, it is not obvious how the robot would generalize from the input space of visual data. The rest of this paper considers perceptual primitives as a method of generalizing from visual inputs. Much of our visual experience is devoted to watching humans manipulate their environment. Intelligent robots are expected to work in societies composed of humans and robots, so every robot needs to recognize other agents' actions and react to them dynamically. Two factors make humanoid movement amenable to machine understanding. First, it is performed by humanoid agents, which prescribes the structural relationships between the joints. Second, it is composed of movements that can be grouped into actions like picking up, grasping, throwing, running, and walking, each defined by a particular causal sequence of joint movements. These joint movements, which can also be referred to as primitives, are used repeatedly, with small variations, whenever required. Running temporal segmentation of tasks into meaningful units becomes crucial when a continuous flow of actions is available. As a step toward learning reusable task knowledge and reproducing the equivalent action sequence with a robot manipulator in a natural environment, we need to be able to observe segmented movement and generalize or classify it into primitives or action units. The movement is temporally segmented by changes in the nature of contact or a significant drop in movement velocities. Brand [11] discusses a method to gist a video of action sequences. His gister reasons about changes in motions and contact relations between participants in an action. Usually only the agent (typically a part of an arm) and any objects under its control are segmented. Jenkins et al. [20] discuss a method of segmenting visual data manually into primitives.
The work here was partly motivated by the results obtained in classification and recognition of eigenfaces [14] and by work on PCA of certain kinds of movement by Sanger [16]. The work on eigenfaces shows that PCA can be used to distinguish between faces when one would ordinarily expect the differences to be of a nonlinear nature. Sanger has also used PCA on a seemingly nonlinear domain. This work extends the analysis performed by Sanger to the domain of continuous motion, bringing us to a position where a continuous stream of movement data can be analyzed.
3
Our Model of Imitation
Our model [5] of imitation consists of five main components in three layers, as shown in Figure 1. A detailed description is provided in [20]. The Perception layer [28] consists of two modules. The tracking module tracks various parameters in the input data, such as the joint angles or contact with objects in the environment. The attention selection mechanism picks parts of the data for further study. The Learning layer consists of the Classification and Learning modules. The Learning module detects patterns in the input and remembers the sequences that occur relatively often, together with the corresponding motor actions required to perform the movement. The Classification module classifies input data into the available primitives. The Action layer provides the control algorithms for achieving a given task. Motor primitives encode the activation of actuators corresponding to each primitive that would result in the desired movement.
Fig. 1. The general model of imitation
In this paper, we discuss a set of methods that implement the Learning layer and the attention selection mechanism. Our segmentation routine corresponds to the Attention Selection module. Principal Component Analysis and Clustering routines correspond to the Learning module. The Reconstruction routine corresponds to the Matching and Classification module.
4
Resources
4.1 Data
Fig. 2. Location of FastTrak sensors on the subject's arm
Data used for the experiment were collected from an experiment described by Pomplun and Matarić [21]. For this experiment, 11 subjects watched and imitated 20 short videos (stimuli) of arm movements. Each stimulus shows a 3 to 5 second long sequence of arm movements. Stimuli were presented on a CRT terminal. Arm movements were tracked with the FastTrak motion tracking system. Position sensors were fastened to the subject's right arm at 4 points, as shown in Figure 2: 1. the center of the upper arm, 2. the center of the lower arm, 3. above the wrist, 4. on the phalanx of the middle finger. The position sensors were a few centimeters in size. Elastic bands were used to hold the sensors in position. Data from this experiment are recorded in files that contain the 3-D coordinate location of each sensor and a time stamp for each measurement. The position of the sensors was measured every 34 ms with an accuracy of about 2 mm.

4.2 Adonis - the test bed

Our tests and simulations were conducted on Adonis [6, 30]. It is a rigid-body simulation of a human torso with static legs. It consists of eight rigid links connected by revolute joints of one and three Degrees of Freedom (DOF). The simulator has a total of 20 DOFs. Mass and moment-of-inertia information is generated from the geometry of the body parts and human body density estimates. Equations of motion are calculated using a commercial solver, SD/Fast [17]. The simulation acts under gravity and accepts other external forces from the environment. No collision detection, with itself or with the environment, is used in the version described here.
Fig. 3. Adonis’ graphical display
The simulation is stable and the static ground alleviates the need for explicit balance control. We use the joint-space PD-servo approach in our display of the primitives. Based on the target arm configuration, the torques for all joint angles were calculated as a function of the angular position and velocity errors by a PD-servo controller:

\tau = k (\theta_{desired} - \theta_{actual}) + k_d (\dot{\theta}_{desired} - \dot{\theta}_{actual})    (1)

Here k is the stiffness of the joint, k_d the damping coefficient, \theta_{desired} and \dot{\theta}_{desired} are the desired angles and velocities for the joints, and \theta_{actual} and \dot{\theta}_{actual} are the actual joint angles and velocities. We display movements on Adonis by determining a sequence of target joint angle configurations. When its joint angle configuration reaches a prescribed vicinity of one target configuration in the sequence, Adonis reads in the next angle configuration and tries to reach it. If the set points fall on a smooth trajectory, the resulting motion on Adonis is a natural-looking movement.
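As a sketch, Eq. (1) driving a single unit-inertia joint through a sequence of via points might look as follows; the gains, time step, tolerance, and via points are invented for illustration and are not the values used on Adonis:

```python
import numpy as np

# Invented values for illustration; the paper does not list the gains used.
K_P = 50.0        # joint stiffness k
K_D = 5.0         # damping coefficient k_d
DT = 0.01         # integration time step (s)
TOLERANCE = 0.05  # "prescribed vicinity" of a target configuration (rad)

def pd_torque(theta, theta_dot, theta_des, theta_dot_des=0.0):
    """Eq. (1): tau = k (theta_des - theta) + k_d (theta_dot_des - theta_dot)."""
    return K_P * (theta_des - theta) + K_D * (theta_dot_des - theta_dot)

def track_via_points(via_points, theta0=0.0, max_steps=2000):
    """Drive a unit-inertia joint through a sequence of target angles."""
    theta, theta_dot = theta0, 0.0
    trajectory = [theta]
    for target in via_points:
        for _ in range(max_steps):
            tau = pd_torque(theta, theta_dot, target)
            theta_dot += tau * DT            # unit inertia: theta_ddot = tau
            theta += theta_dot * DT
            trajectory.append(theta)
            if abs(theta - target) < TOLERANCE:
                break                        # move on to the next via point
    return np.array(trajectory)

traj = track_via_points([0.5, 1.0, 0.2])
```

As in the simulator, the set point is advanced as soon as the joint enters the prescribed vicinity of the current target, so a smooth via-point sequence produces a smooth motion.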
5
Methods
The following is a brief description of the methods used. Inverse kinematics transforms the 3-D coordinates of the joints into a joint-angle representation. Filtering removes incidental random zero-velocity points caused by noise. Segmenting slices the movement data so that finite-dimensional vectors can be extracted for PCA. The PCA results in a set of eigenvectors, a few of which represent almost all the variance in the data. Movement segments can be projected onto this subset of eigenvectors, which captures most of the information in the movement. The K-means clustering algorithm is then used to group the projections of the segments onto the few eigenvectors to obtain clusters that represent often-used movements. Detailed descriptions of each step follow. These steps are summarized in Figure 4.
[Figure 4: flowchart. Training data -> Filter -> Joint angles -> Segment -> Segments -> PCA -> Eigenvectors; Segments -> Project -> Low-D representation -> Cluster -> Clusters. Test data -> Filter -> Joint angles -> Segment -> Segments -> Project -> Low-D representation -> Classify -> Approximate cluster -> Reconstruct.]
Fig. 4. Components of our system: The thin lines represent the flow of data in the learning phase. The dark lines correspond to the flow of data in the testing phase. The dotted lines represent the use of previously computed data.
5.1 Inverse Kinematics

Raw data are in the form of marker positions as a function of time in 3-D Cartesian space. Due to the constraints imposed on the joints, all the data in the input files can be compacted into fewer variables than required by a 3-D representation. For example, since the elbow lies on the surface of a sphere centered at the shoulder, we need only two degrees of freedom to completely describe the elbow when the position of the shoulder is known. Similarly, the wrist can lie only on the surface of a sphere centered at the elbow. Thus, all the variability of one arm can be represented by 4 variables, henceforth referred to simply as angles. The Euler angle representation we use describes the position of the arm in terms of a sequence of rotations about one of the principal axes. Each rotation transforms the coordinate frame and produces a new coordinate frame in which the next transformation is performed. The first transformation R_z rotates the coordinate frame through an angle γ about the z-axis. Subsequently, the rotations R_y about the y-axis and R_z determine the configuration of the ball-and-socket joint at the shoulder. If the rotation of the elbow is specified, the position of the elbow and the wrist relative to the arm is completely determined.

5.2 Filtering

Every 34 ms the FastTrak system reads the 3-D coordinates of each of the sensors placed on the arm. These data are noisy. The noise consists of two distinct parts. The low-frequency portion of the noise, or drift, which is primarily caused by the movement of the subject's shoulder, is assumed to affect all the points on the arm uniformly. Making the shoulder the origin of our coordinate frame eliminated this part of the noise. The high-frequency component of the noise can be attributed to elastic vibrations in the fasteners used to attach the markers to the arm. Passing the signal through a Butterworth filter of order 10 with a normalized cutoff frequency of 0.25 reduced this part of the noise considerably. After this transformation, the angles are a smooth function of time.

5.3 Segmenting

Segmentation of the data was the most challenging aspect of this problem. It is necessary to divide the data into smaller segments so that they can be studied together and common features across segments can be recorded. We would like segments to have the following properties. 1. Segmentation needs to be consistent, i.e., identical across different repetitions of the action. For this, we need to define a set of features which could be used to detect the transition from one segment to another. 2. We need to ensure that we detect a few features that are consistent across segments. 3. Segments should partition the input space so that the original movement can be reconstructed from them. With these properties, it would be possible to make a statement about the similarity between movement sequences or to distinguish between them. Brand et al. [11, 12] have shown how video input can be used to segment continuous visual input. Theirs is an event-based segmentation model: segments are introduced when a certain feature or contact configuration changes. Yasuhiro et al. [15] have introduced a via-point method of segmenting handwriting, speech, and other temporal functions. We consider a few simpler segmentation routines. The evolution of the angles with time can be plotted in a 4-D space as a parametric curve. The radius of curvature of this curve is defined as the radius of the largest circle in the plane of the curve that is tangential to the curve. Curvature has the advantage that it is a scalar function of time. When the radius is low, it could imply a change of desired state and hence can be used to separate segments. Unfortunately, the calculation of the radius of curvature of a trajectory becomes intractable as the number of dimensions of the data increases. The choice of a threshold on the curvature at which a segment boundary should be placed is also difficult.
Fig. 5. Segmenting using zero-velocity crossings. Variables are plotted along the y-axis against time on the x-axis. The four subplots are of the four variables that we analyze. Each curved line above the movement plots represents a segment.
The segmentation routines we use are based on the angular velocity in the different DOFs. The angular velocity reverses after moving in one direction for a while; whenever this happens, the point is labeled a Zero Velocity Crossing (ZVC). Each ZVC is associated with one DOF. All movement in any DOF occurs between ZVCs. If this movement is significant, it can be recorded as a segment. ZVCs could flank a segment where almost no movement occurs; ideally, these segments should be discarded, since they contain no movement information other than that the angle needs to be held steady. Spurious ZVCs could also be introduced by noise. Figure 5 shows the result of segmentation performed on a single movement. Significant movement is assumed to occur when the change in the angle is above a certain threshold. The short curves above the continuous curves are drawn wherever such segments are recorded. We are looking for properties common to all DOFs. Fortunately, the ZVCs in different DOFs seem to coincide in many of the movement samples most of the time. Because of this special property of certain parts of the movement, we decided to snip the data stream at points where more than one DOF has a ZVC, separated by less than 300 ms (the SEG-1 routine). To avoid including random segments, whose properties cannot be easily determined, we keep only those segments whose two ends have this property. Since we drop a few segments, we are unable to reconstruct every movement completely. Within the segments between these points, the movement has certain common properties: the end points are likely to involve little movement, and there is a directed effort to get to a new configuration of joint angles in the interim. It is also very likely that the limb moves in a nearly constant direction between these points. The direction reversals are likely to be points where intentions and control strategies change. In our first segmentation strategy, we segment between direction reversals to find recurring patterns of movement and extract primitives. The second segmentation routine (SEG-2) is based on the sum of squares of the angular velocities, z.
z = \dot{\theta}_1^2 + \dot{\theta}_2^2 + \dot{\theta}_3^2 + \dot{\theta}_4^2    (2)

where \dot{\theta}_i is the angular velocity of the ith angle. Figure 6 shows the variation of z with time. We segment whenever z falls below a constant threshold. The data consisted of repetitions of some movement sequence; we varied this constant until we obtained at least as many segments in each movement as the number of repetitions of the action. The resulting segments had the common property that z was high within each segment. Because of this method of determining the constant, this segmentation is reliable. Since the movement is partitioned into segments, we can reconstruct the original movement data. Thus this method seems to be a promising segmentation strategy.
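The filtering and SEG-2 steps can be sketched as follows. The joint-angle data, the threshold, and the simple zero-phase moving average (standing in for the paper's order-10 Butterworth filter with normalized cutoff 0.25) are all illustrative:

```python
import numpy as np

# Synthetic stand-in for the four joint-angle streams sampled every 34 ms.
rng = np.random.default_rng(0)
t = np.arange(0.0, 10.0, 0.034)
angles = np.stack([np.sin(np.pi * (k + 1) / 4.0 * t) for k in range(4)])
angles += 0.01 * rng.standard_normal(angles.shape)    # high-frequency noise

# Section 5.2: low-pass filtering.  A zero-phase moving average stands in
# here for the order-10 Butterworth filter used in the paper.
window = np.hamming(9)
window /= window.sum()
smooth = np.stack([np.convolve(row, window, mode="same") for row in angles])

# SEG-2 (Eq. 2): z = sum of squared angular velocities; cut wherever z drops
# below a threshold.  The threshold is illustrative; the paper tunes it until
# each movement yields at least as many segments as repetitions of the action.
vel = np.gradient(smooth, 0.034, axis=1)
z = np.sum(vel ** 2, axis=0)
threshold = 0.5 * z.mean()

low = z < threshold
boundaries = np.flatnonzero(np.diff(low.astype(int))) + 1
segments = [seg for seg in np.split(np.arange(len(z)), boundaries)
            if not low[seg[0]]]        # keep only the high-energy stretches
```

Because the low-energy stretches between segments are also retained (just not kept as primitives), the cut points partition the stream, which is what allows SEG-2 reconstructions to cover the whole movement.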
Fig. 6. A plot of the variation of the sum of squares of velocities along the y-axis as a function of the time along the x-axis.
5.4 Principal Component Analysis

We are given segments of varying lengths with the common property that they lie between ZVCs. In order to perform PCA, we need to convert each such movement into a vector. In each DOF, the movement between the ZVCs is interpolated to a length of 100 elements, which can be represented in vector form. The vectors so obtained for each of the DOFs are concatenated to form a vector s consisting of 400 elements. This is effectively a 400-D representation of the movement within each segment. All data in the input files are divided into segments by the above procedure. If there are N segments in the input data, the mean is determined as

\bar{s} = \frac{1}{N} \sum_{i=1}^{N} s_i    (3)
The covariance matrix K is given by

K = \frac{1}{N} \sum_{i=1}^{N} (s_i - \bar{s})(s_i - \bar{s})^T    (4)
The principal components of the vectors are the eigenvectors of the matrix K. There are d such eigenvectors of dimension d, where d is the dimension of the vector s. Figure 7 shows the magnitudes of the most significant few of the 400 eigenvalues obtained by PCA, arranged in increasing order.
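The PCA steps above can be sketched with NumPy. The segment matrix here is a synthetic low-rank stand-in for the real 400-element segment vectors, and the retained dimension f is illustrative:

```python
import numpy as np

# Illustrative stand-in: N segment vectors, each 4 joint angles resampled to
# 100 samples and concatenated into a 400-element vector s (Section 5.4).
# The real vectors come from the segmented FastTrak data.
rng = np.random.default_rng(1)
N, D = 60, 400
S = rng.standard_normal((N, 5)) @ rng.standard_normal((5, D))  # low-rank data

mean = S.mean(axis=0)                 # Eq. (3): mean segment vector
centered = S - mean
K = centered.T @ centered / N         # Eq. (4): covariance matrix

eigvals, eigvecs = np.linalg.eigh(K)  # symmetric K -> real eigenpairs
order = np.argsort(eigvals)[::-1]     # sort by decreasing eigenvalue
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

f = 5                                 # keep the f most significant components
E_f = eigvecs[:, :f]
P = centered @ E_f                    # projections p = E_f^T s, one per segment

explained = eigvals[:f].sum() / eigvals.sum()   # fraction of variance kept
```

Since the synthetic data have rank 5, the top five eigenvalues capture essentially all of the variance; on the real movement data the same ratio guides the choice of f.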
[Figure 7: bar plot of eigenvalue magnitudes; y-axis scale ×10^6.]
Fig. 7. The magnitudes of the last 11 eigenvalues.
Most of the variance of the distribution is captured by the few eigenvectors (say f vectors) that have the highest eigenvalues. Thus, we can describe the vector s in terms of its projection along these f eigenvectors (E_f), in other words, an f-dimensional vector p:

p = E_f^T s    (5)
This results in an f-dimensional representation of each segment. The first four of these eigenvectors are shown in Figure 8. The number of eigenvectors chosen for further calculations determines both the accuracy and the time taken to search for movement primitives: if more eigenvectors are used, the representation has less error, but the clustering space is larger.

5.5 Clustering

A set of points of type p in the f-D space is obtained by projecting all the s_i vectors as shown above. The algorithm clusters the points in this space into k clusters, where k is a predetermined number. Each of the k clusters represents a primitive. The algorithm is initialized with a set of k estimates of the cluster vectors. In every iteration, each point is grouped into the cluster of the vector it is closest to, and each vector is then recalculated as the centroid of the points belonging to its cluster. This method is known as the k-means algorithm. The clustering method takes as input the number of clusters required. The choice of this number has to be based on the error that can be tolerated in approximating a movement by a cluster. If the tolerable error is large, we can do well with few clusters. If we increase the number of clusters for more accuracy, the error is reduced but the time to compute the clusters increases. The memory required also increases with the number of clusters used. Fortunately, the clustering algorithm is used only in the learning phase, so it is run only once. It may therefore be acceptable to trade clustering time for the accuracy resulting from a larger number of clusters.
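The clustering step can be sketched with a minimal k-means implementation; the toy projections and the deterministic initialization are illustrative, not from the paper:

```python
import numpy as np

def kmeans(points, k, iters=100):
    """Plain k-means on the f-dimensional projections p (Section 5.5)."""
    # deterministic initialization: k evenly spaced data points
    centers = points[np.linspace(0, len(points) - 1, k, dtype=int)]
    for _ in range(iters):
        # assign each projection to the nearest cluster vector
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # recompute each cluster vector as the centroid of its members
        new_centers = np.array(
            [points[labels == j].mean(axis=0) if np.any(labels == j)
             else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# Toy projections: three well-separated groups in a 3-D eigenspace.
rng = np.random.default_rng(2)
pts = np.concatenate([rng.normal(c, 0.1, (30, 3)) for c in (-3.0, 0.0, 3.0)])
centers, labels = kmeans(pts, k=3)
```

Each returned center plays the role of one primitive: a representative point in eigenspace that can later be expanded back into a movement segment.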
[Figure 8: four panels, one per eigenvector, each containing four subplots of an eigenvector component over the 100 normalized time samples.]
Fig. 8. The first 4 eigenvectors. Each figure shows one eigenvector. Each subplot shows the variation of an angle with time in each of the eigenvectors.
5.6 Reconstruction

The vector p mentioned above can be used to reconstruct the original movement segment. The vector p is essentially the projection of the movement segment onto the eigenvector space; to reconstruct, we only need to expand it in terms of this set of basis vectors:

s = E_f p    (6)

This vector s is in the same space as the original set of vectors. It can now be split into 4 parts of length 100 each, which accomplishes the decomposition into the movements of the 4 joint angles over time. This is the movement on a normalized time scale; we interpolate it to recover the original movement. In the next section we describe how reconstruction was performed and evaluated.
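A sketch of this reconstruction, with a random orthonormal basis standing in for the learned eigenvectors and a made-up segment length:

```python
import numpy as np

# E_f, the cluster point p, and the segment length are illustrative
# placeholders; in the paper E_f holds the learned eigenvectors and p
# is a cluster point found by k-means.
rng = np.random.default_rng(3)
f = 30
E_f, _ = np.linalg.qr(rng.standard_normal((400, f)))  # orthonormal basis stand-in
p = rng.standard_normal(f)                            # cluster point in eigenspace

s = E_f @ p                  # Eq. (6): s = E_f p, a 400-element segment vector
joints = s.reshape(4, 100)   # 4 joint-angle trajectories, 100 samples each

# stretch each normalized trajectory back to the segment's true length
true_len = 137               # hypothetical segment length in 34 ms samples
x_norm = np.linspace(0.0, 1.0, 100)
x_true = np.linspace(0.0, 1.0, true_len)
movement = np.stack([np.interp(x_true, x_norm, j) for j in joints])
```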
6
Results and Evaluation
One of the most commonly used error metrics for linear systems is the Mean Squared Error (MSE). This measure is attractive to us for the following reasons: 1. It penalizes large deviations from the original more severely than small ones. 2. The metric is additive: error in one part of the reconstruction will not cancel the error in another part. 3. Finally, it is mathematically tractable, as shown in this paper. Representing segments in terms of eigenvectors allows us to make a statement about the error in the subsequent reconstruction. If the reconstructed vector s_r corresponds to the original vector s, then the squared error
Fig. 9. A couple of clusters. Each figure shows a different cluster. Each subplot shows the variation of an angle plotted against time.
in this reconstruction is given by

\epsilon = (s - s_r)^T (s - s_r)    (7)

If we tolerate an average reconstruction error of \delta degrees at each sample point in the vector s, the required bound on the expectation of \epsilon, also known as the Mean Squared Error (MSE), is given by

E(\epsilon) \le N \delta^2    (8)
The Karhunen-Loève expansion of the original vector in terms of the eigenvectors is

s = \sum_{i=1}^{N} p_i e_i    (9)

where

p_i = s^T e_i    (10)
The expansion of the reconstructed vector in terms of the eigenvectors is

s_r = \sum_{i=1}^{f} p_{r,i} e_i    (11)
where r is the cluster used for the reconstruction and p_{r,i} are the coordinates of the cluster point. The difference between the original and the reconstructed vectors can now be expressed as

s - s_r = \sum_{i=1}^{f} (p_i - p_{r,i}) e_i + \sum_{i=f+1}^{N} p_i e_i    (12)
The first term in the reconstruction error is the clustering error, while the second is the projection error. Since the eigenvectors form an orthonormal basis, the MSE is the sum of the MSEs of the projections along the individual eigenvectors. The MSE due to the dropped eigenvectors is

E(\epsilon)_p = \sum_{i=f+1}^{N} \lambda_i    (13)

where \lambda_i is the eigenvalue associated with eigenvector e_i.
Fig. 10. Reconstruction of a segment. Each subplot shows the variation of one angle with time. The dotted line shows the reconstruction.
Fig. 11. Reconstruction of an entire movement composed of many segments. Each subplot shows one angle as a function of time. The dotted line shows the reconstructed segment.
Number of      MSE_p using    MSE_p using
eigenvectors   SEG-1          SEG-2
10             1.22 x 10^7    6.47 x 10^7
20             1.05 x 10^5    1.48 x 10^6
30             2.82 x 10^3    7.64 x 10^4
40             1.92 x 10^2    5.39 x 10^3
Table 1. MSE in projection using the SEG-1 and SEG-2 routines
where the eigenvectors f + 1 through N were dropped. The error in projection is shown as a function of the number of eigenvectors kept for reconstruction in Table 1. We use thirty eigenvectors (f = 30) in our projection of the input vector. The error in clustering is orthogonal to the projection error and is given by

E(\epsilon)_c = E\left( \sum_{i=1}^{f} (a_i - p_{r,i})^2 \right)    (14)

where f is the number of dimensions in which clustering is done, a_i is the projection of the data point along the ith eigenvector, and p_{r,i} is the projection of the cluster vector along the same. This is also the average squared distance from the representative vector when clustering is done. A tabulation of the MSE introduced by clustering as a function of the number of clusters used is provided in Table 2.

Number of   MSE_c using   MSE_c using
clusters    SEG-1         SEG-2
10          1.45 x 10^6   6.12 x 10^5
20          3.05 x 10^5   3.52 x 10^5
30          2.34 x 10^5   2.84 x 10^5
100         --            9.63 x 10^4
170         --            7.54 x 10^4
Table 2. MSE in clustering using the SEG-1 and SEG-2 segmentation routines
Since the error in clustering is orthogonal to the error in dropping the smaller eigenvalues, the total error introduced by the approximation steps is given by

E(\epsilon) = E(\epsilon)_p + E(\epsilon)_c    (15)

A tabulation of this total error as a function of the number of clusters used is given in Table 3. The total error is calculated assuming that 30 eigenvectors were used in the expansion.

No. of     Total Error   Error       Total Error   Error
clusters   SEG-1         (degrees)   SEG-2         (degrees)
10         1.45 x 10^6   60.28       6.88 x 10^5   41.42
20         3.05 x 10^5   26.68       4.28 x 10^5   32.71
30         2.34 x 10^5   24.43       3.50 x 10^5   29.50
100        --            --          1.633 x 10^5  20.22
170        --            --          1.514 x 10^5  19.42
Table 3. Angular error corresponding to reconstruction with 30 eigenvectors using the SEG-1 and SEG-2 segmentation routines
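The orthogonality argument behind Eq. (15) can be checked numerically. All quantities here are synthetic: a random orthonormal basis stands in for the eigenvectors, and a perturbed coordinate vector stands in for the cluster point:

```python
import numpy as np

# Synthetic check of Eq. (15): with orthonormal eigenvectors, the squared
# reconstruction error splits exactly into a clustering term (first f
# coordinates) and a projection term (dropped coordinates).
rng = np.random.default_rng(4)
N, f = 40, 10
E, _ = np.linalg.qr(rng.standard_normal((N, N)))   # orthonormal basis

s = E @ rng.standard_normal(N)                     # original segment vector
p = E.T @ s                                        # coordinates of s in the basis
p_r = p[:f] + 0.1 * rng.standard_normal(f)         # cluster point approximating p[:f]

s_r = E[:, :f] @ p_r                               # reconstruction, Eq. (6)
total = np.sum((s - s_r) ** 2)                     # epsilon, Eq. (7)
clustering = np.sum((p[:f] - p_r) ** 2)            # clustering term, Eq. (14)
projection = np.sum(p[f:] ** 2)                    # projection term, Eq. (13)
```

Because the basis is orthonormal, `total` equals `clustering + projection` up to floating-point error, mirroring the decomposition used to build Table 3.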
Figure 10 contains a reproduction of one segment from the input data. The dotted lines show the movement reconstructed from one of the cluster points; the solid lines show the original movement.
7
Discussion
The procedure discussed above is suitable for generating movement primitives. Parts of any visible movement can be segmented and classified as belonging to a certain cluster. Control algorithms can now be developed within these clusters for the manipulator. Simple modifications like time or space scaling applied to these control algorithms are expected to lead to control strategies for other movements that belong to these clusters. There is a way to measure the error in the representation of the trajectory that can be calculated based on the model parameters. By changing the model parameters, we can guarantee a higher accuracy in representation or achieve greater generalization. Higher accuracy can be obtained by either increasing the number of eigenvectors that are included in the representation or by increasing the number of cluster points. There is a tradeoff between the processing time, memory and the accuracy in choosing the number of clusters.
The segmentation routine SEG-1 described in Section 5.3 has an unpleasant side effect: it does not partition the original movement. Considerable overlap arises when two joint angles cross their zeros simultaneously at one point in time and the other two cross theirs at a different point. For some actions, the zero crossings may fail to coincide for a large part of the movement, either because some of the zeros are missed by the zero-marking procedure or because the zeros simply fail to align. These alignment problems arise when the movement being analyzed is a circle or some other complex movement. Though SEG-1 explains the formation of primitives within well-defined segments, it cannot reconstruct reliably outside those segments, because the segmentation does not partition the movement space. It might be possible to improve the segmentation for reaches, but that would not solve the problems caused by potential misalignment of the zeros themselves. It might also be possible to devise a different segmentation scheme for the parts that are not amenable to this form of segmentation; this would essentially divide the space of human movement into a class of reaches and a class of non-reaches. SEG-2 is an alternative to SEG-1 that partitions the movement data into complete, non-overlapping segments. As a result, such segments can be joined together to reconstruct the original movement. With SEG-2, the magnitude of the eigenvalues does not decline as fast as with SEG-1, so the number of eigenvectors required to reconstruct the original segment is larger, as shown in Tables 2 and 3. In our implementation, Adonis is programmed to target a point in its joint space; when it has reached a prescribed vicinity of this point, its set point is reset based on input from a file.
This can be thought of as setting a region around each point into which the trajectory must fall at each successive interval, which has the potential of overconstraining the trajectory rather than simply achieving the goal. We believe the tolerance should be a function of the point on the trajectory and of the importance attached to adhering to the prescribed trajectory. One potential application of this research is in complex robots such as the P3 and the proposed Robonaut [18, 19], which have many degrees of freedom. The large number of interdependent variables makes designing a control strategy for such robots a formidable task; for example, when the arm moves, it affects the rest of the body. Using observations of human movement, we can find a set of relationships between the different degrees of freedom, as our eigenvectors do. This helps the controller choose a control strategy in a low-dimensional space. In his paper on imitation learning, Schaal [9] suggests a learning mechanism in which a few movement primitives compete for a demonstrated behavior. He uses an array of primitives for robot control, each with an associated controller; learning is performed by adjusting both the controller and the primitives. Our primitive set could be used either in place of such a learning algorithm or to bootstrap it. Adonis was programmed to perform some of these commonly used movements, which looked like generic strokes. For any movement segmented into strokes as described earlier, control algorithms could be generated that interpolate between control strategies for similar strokes to obtain a control strategy for the arm.
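The set-point mechanism described above for Adonis, driving toward a target and advancing to the next via point once within a prescribed vicinity, can be sketched as follows. This is a toy first-order joint model, not the actual Adonis controller; the tolerance and gain values are made up:

```python
import numpy as np

def follow_via_points(start, via_points, tol=0.05, gain=0.2, max_steps=10000):
    """Advance through via points, resetting the set point once the current
    target is reached to within `tol` (joint-space distance)."""
    q = np.asarray(start, dtype=float)
    path = [q.copy()]
    for target in via_points:
        target = np.asarray(target, dtype=float)
        for _ in range(max_steps):
            if np.linalg.norm(target - q) < tol:
                break                         # within the vicinity: next set point
            q = q + gain * (target - q)       # toy proportional step toward target
            path.append(q.copy())
    return np.array(path)

# Two hypothetical joint angles tracing three via points.
path = follow_via_points([0.0, 0.0],
                         [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
```

A fixed `tol`, as here, is exactly the overconstraining concern raised above: making the tolerance depend on the point along the trajectory would relax it where precise tracking does not matter.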
8 Summary
We have presented a method that can be used to generate a finite number of fast control strategies for humanoid robots based on data collected from human movement. At a high level of processing, we can now project the desired movement onto a space of lower dimensionality than the input space. This allows a faster choice of control algorithms: we need only store a small number of control strategies and interpolate for all generic movements. The eigenvectors in the above case are the superimposable basis functions that we call primitives. Clustering in the space of the projections of the movement vectors onto the eigenvector space reveals the often-used, or repeated, movements. The steps of the process are summarized as follows. Raw data in the form of 3D locations of the arm joints are converted into an Euler-angle representation of the joint angles using inverse kinematics. The signal is then filtered to obtain smoother curves, so that the segmentation step is not confused by zeros introduced by noise in the signal. PCA is then performed on all the segments to obtain the eigen-movements, or primitives. The K-means clustering algorithm then clusters the points in the space of the projections of the vectors along the eigenvectors corresponding to the highest eigenvalues. To obtain the commonly used movements as the evolution of Euler angles in time, the reconstruction step is used, and the resulting movement is converted into a quaternion representation for display on Adonis. A measure of the error in the above representation was also derived. The described process is one of several lines of research related to our work on primitives-based motor control and imitation. We are interested in deriving primitives directly from human movement data, and continue to explore refinements and variations toward solutions to this problem.
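The summarized pipeline (filter, segment, PCA, cluster, reconstruct) can be sketched end to end. Everything here is synthetic: a fixed-length segmentation stands in for SEG-1/SEG-2, a moving average for the filtering step, and nearest-representative assignment for K-means:

```python
import numpy as np

rng = np.random.default_rng(3)

# 1. Synthetic joint-angle signal (Euler angles for 4 DOFs) with noise.
t = np.linspace(0, 20, 2000)
signal = np.stack([np.sin(t * f) for f in (1.0, 1.3, 0.7, 2.1)], axis=1)
signal += rng.normal(scale=0.05, size=signal.shape)

# 2. Smooth with a moving average so noise-induced zeros do not mislead
#    segmentation (a stand-in for the paper's filtering step).
kernel = np.ones(15) / 15
smooth = np.stack([np.convolve(signal[:, j], kernel, mode='same')
                   for j in range(4)], axis=1)

# 3. Fixed-length segmentation as a stand-in for SEG-1/SEG-2.
seg_len = 100
segments = smooth[:2000 - 2000 % seg_len].reshape(-1, seg_len * 4)

# 4. PCA on the segments; project onto the top m eigenvectors.
mean = segments.mean(axis=0)
U, S, Vt = np.linalg.svd(segments - mean, full_matrices=False)
m = 5
proj = (segments - mean) @ Vt[:m].T

# 5. Assign each segment to the nearest of k random representatives
#    (standing in for K-means cluster centers).
k = 3
centers = proj[rng.choice(len(proj), size=k, replace=False)]
labels = np.argmin(((proj[:, None] - centers[None]) ** 2).sum(-1), axis=1)

# 6. Reconstruct a "commonly used movement" from a cluster center as the
#    evolution of joint angles in time.
primitive = (mean + centers[0] @ Vt[:m]).reshape(seg_len, 4)
```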
Acknowledgments The research described here was supported by the NSF Career Grant to M. Matarić. The data used for this analysis were gathered by M. Matarić and M. Pomplun in a joint interdisciplinary project conducted at the National Institutes of Health Resource for the Study of Neural Models of Behavior, at the University of Rochester.
References
1. Bizzi, E., Giszter, S. F., Loeb, E., Mussa-Ivaldi, F. A., Saltiel, P.: 'Modular organization of motor behavior in the frog's spinal cord', Trends Neurosci. 18, 442-446 (1995).
2. Mussa-Ivaldi, F. A.: 'Nonlinear Force Fields: A Distributed System of Control Primitives for Representing and Learning Movements'.
3. Giszter, S. F., Mussa-Ivaldi, F. A., Bizzi, E.: 'Convergent Force Fields Organized in the Frog's Spinal Cord', Journal of Neuroscience 13(2), 467-491 (1993).
4. Stein, P. S. G.: 'Neurons, Networks, and Motor Behavior', The MIT Press, Cambridge, Massachusetts (1997).
5. Matarić, M. J.: 'Sensory-Motor Primitives as a Basis for Imitation: Linking Perception to Action and Biology to Robotics', in C. Nehaniv & K. Dautenhahn (eds), 'Imitation in Animals and Artifacts', The MIT Press (2000).
6. Matarić, M. J., Zordan, V. B.: 'Making Complex Articulated Agents Dance: An Analysis of Control Methods Drawn from Robotics, Animation, and Biology', Autonomous Agents and Multi-Agent Systems 2, 23-43 (1999).
7. Matarić, M. J.: 'Learning Motor Skills by Imitation', Proceedings, AAAI Spring Symposium Toward Physical Interaction and Manipulation, Stanford University, Mar 21-23 (1994).
8. Schaal, S., Kotosaka, S., Sternad, D.: 'Programmable Pattern Generators', Proceedings, 3rd International Conference on Computational Intelligence in Neuroscience, Research Triangle Park, NC, 48-51 (1998).
9. Schaal, S.: 'Is Imitation Learning the Route to Humanoid Robots?', Trends in Cognitive Sciences 3(6), 233-242 (1999).
10. Kuniyoshi, Y., Inaba, M., Inoue, H.: 'Seeing, Understanding and Doing Human Task', Proceedings of the 1992 IEEE International Conference on Robotics and Automation, Nice, France (1992).
11. Brand, M.: 'Understanding Manipulation in Video', 2nd International Conference on Face and Gesture Recognition, Killington, VT (1996).
12. Brand, M., Oliver, N., Pentland, A.: 'Coupled Hidden Markov Models for Complex Action Recognition', Proceedings, CVPR, IEEE Press, 994-999 (1997).
13. Brand, M.: 'Learning Concise Models of Human Activity from Ambient Video'.
14. Turk, M., Pentland, A.: 'Eigenfaces for Recognition'.
15. Wada, Y., Koike, Y., Vatikiotis-Bateson, E., Kawato, M.: 'A computational theory for movement pattern recognition based on optimal movement pattern generation', Biol. Cybern. 73, 15-25 (1995).
16. Sanger, T. D.: 'Human Arm Movements Described by a Low-Dimensional Superposition of Principal Components', Journal of Neuroscience 20(3), 1066-1072 (2000).
17. Hollars, M., Rosenthal, D., Sherman, M.: SD/Fast User's Manual, Technical Report, Symbolic Dynamics, Inc. (1991).
18. 'NASA Anthropomorphic Projects', http://www.androidworld.com/prod41.htm
19. 'Robonaut', http://vesuvius.jsc.nasa.gov/er er/html/robonaut/robonaut.html
20. Jenkins, O. C., Matarić, M. J., Weber, S.: 'Primitive-Based Movement Classification for Humanoid Imitation', submitted to Humanoids (2000).
21. Pomplun, M., Matarić, M. J.: 'Evaluation Metrics and Results of Human Arm Movement Imitation', submitted to Humanoids (2000).
22. Flash, T., Hogan, N.: 'The coordination of arm movements: an experimentally confirmed mathematical model', J. Neurosci. 5, 1688-1703 (1985).
23. Hogan, N.: 'An organizing principle for a class of voluntary movements', J. Neurosci. 4, 2745-2754 (1984).
24. Uno, Y., Kawato, M., Suzuki, R.: 'Formation and control of optimal trajectory in human multijoint arm movement', Biol. Cybern. 61, 89-101 (1989).
25. Gottlieb, G. L., Song, Q., Hong, D.-A., Almeida, G. L., Corcos, D.: 'Coordinating movement at two joints: a principle of linear covariance', J. Neurophysiol. 75, 1760-1764 (1996).
26. Viviani, P., Terzuolo, C.: 'Trajectory determines movement dynamics', Neuroscience 7, 431-437 (1982).
27. Iacoboni, M., Woods, R. P., Brass, M., Bekkering, H., Mazziotta, J. C., Rizzolatti, G.: 'Cortical Mechanisms of Human Imitation', Science 286, 2526-2528 (1999).
28. Matarić, M. J., Pomplun, M.: 'What do People Look at When Watching Human Movement?', Technical Report CS-97-194, Brandeis University (1997).
29. Matarić, M. J.: 'Learning Motor Skills by Imitation', in 'Proceedings, AAAI Spring Symposium Toward Physical Interaction and Manipulation', Stanford University (1994).
30. Matarić, M. J., Zordan, V. B., Mason, Z.: 'Movement Control Methods for Complex, Dynamically Simulated Agents: Adonis Dances the Macarena', in 'Autonomous Agents', ACM Press, Minneapolis/St. Paul, MN, 317-324 (1998).
31. Weber, S., Jenkins, O. C., Matarić, M. J.: 'Imitation Using Perceptual and Motor Primitives', in 'Proceedings, The Fourth International Conference on Autonomous Agents (Agents 2000)', Barcelona, Spain.