Università degli Studi di Genova
Istituto Italiano di Tecnologia - ADVR
Statistical Learning of Task Modulated Human Movements Through Demonstration
Tohid Alizadeh
Supervisor: Dr. Sylvain Calinon
A thesis submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy
Doctoral School of Life and Humanoid Technologies, Università degli Studi di Genova, and Department of Advanced Robotics (ADVR), Istituto Italiano di Tecnologia
Genova - Italy, April 2014
Abstract

Integrating robots into domestic environments is one of the challenging issues in robotics research. Several aspects contribute to this challenge, one of which is the difficulty of interacting with and programming robots. Manually programming a robot is a complex and tedious job. One way to make robot programming easier is to use programming by demonstration techniques: it is much more convenient to simply show the robot what to do and let it learn and imitate the required behaviour automatically. In this way, the end user does not need to be an expert in robotics or programming. This thesis considers the problem of teaching a robot to learn a task. There are several ways to approach such a problem; the approaches investigated and developed in this thesis are mainly statistical. Special attention is given to tasks in which the movement is modulated by a set of task parameters, which allows the developed approaches to encompass a larger family of tasks. The task-parameterized Gaussian mixture model (TP-GMM) is eventually proposed as a solution to this problem. The proposed approach has better generalization properties than other similar methods. Even though TP-GMM offers a suitable solution, it still has its own limitations. One of the main limitations tackled in this thesis is the lack of information in the task parameters. Going one step further, the case of partially observable task parameters is investigated and the proposed approach is modified to take the availability of the task parameters into account. This increases the generalization capability of the approach, making it even more suitable for practical scenarios in which not all the needed information about the task parameters is available at all times. The proposed approaches are evaluated in real-world experiments with robotic platforms. Dynamic movement primitives (DMP) have been used extensively in the field of learning by imitation. Mixture models are proposed here to act as the core of dynamic movement primitives. By using mixture models instead of the conventional LWR approach for learning a DMP, several advantages are achieved, including a reduced complexity of the model.
Key-words: Robot programming by demonstration (RPbD), learning from demonstration (LfD), human-robot interaction (HRI), learning by imitation, Gaussian mixture model (GMM), Gaussian mixture regression (GMR), hidden Markov model (HMM), dynamical systems (DS), dynamic movement primitives (DMP), parametric hidden Markov model (PHMM), task-parameterized GMM (TP-GMM), partially observable task parameters.
Acknowledgments

I would like to acknowledge all those who supported me and provided guidance during the work on this thesis. In particular, I would like to express my gratitude to my supervisor, Dr. Sylvain Calinon, who gave me the opportunity to work in his research group. He provided an exciting working environment with many opportunities to develop new ideas, learn new skills and meet interesting people, and shared his strong support and friendship. Without any doubt, it was his presence, support, patience and enthusiasm that inspired me all the way through my PhD studies. I am very thankful to Professor Darwin G. Caldwell, the director of the Advanced Robotics (ADVR) department, who provided the necessary conditions for the research activities in ADVR. I would like to thank the secretaries and administrative assistants of our department, Floriana, Silvia, Valentina, Simona(s), and Monica, for their valuable support in the organization of many activities. I would also like to thank all my colleagues who turned these years at ADVR, and especially in the Learning and Interaction Lab, into a pleasant time. Special thanks go to Milad, Reza, Davide, Leonel, Antonio, Danilo, Nawid, Petar and all the others in the Learning and Interaction Lab. I would like to express my deep gratitude to all my friends at IIT, including Mohamad, Arash, Ali, Reza, Mahmood and the other friends whose presence made this period in Genova more enjoyable. The financial support of the University of Genova and the Italian Institute of Technology is also gratefully acknowledged. I would like to thank my family: my parents and my siblings receive my greatest gratitude and love for their support throughout my life. Even though we were far from each other, I have always felt their presence in my life during the past years. Last but not least, I would like to thank my wife, Soheila, for all her love, for her support, and for being my best friend. Her presence made life in Genova more graceful.
Contents

1 Introduction
  1.1 Scope and Impacts
  1.2 Motivations and Objectives
  1.3 Contributions of the thesis
  1.4 Thesis Outline
  1.5 Publication note

2 Concepts and Backgrounds (state of the art)
  2.1 Problem statement
  2.2 Available approaches
      2.2.1 Style machines
      2.2.2 Parametric DMP
      2.2.3 Parametric DMP-GPR using a trajectory database
      2.2.4 Multi-streams approaches
      2.2.5 Parametric Hidden Markov model

3 Task-parameterized Gaussian Mixture Model (TP-GMM)
  3.1 Proposed approach
  3.2 Experiment: Rolling out a pizza dough using a robotic manipulator
      3.2.1 Experimental setup
      3.2.2 Experimental results
  3.3 Combining task-parameterized GMM with dynamical systems
      3.3.1 Experiment: Transferring skills to a humanoid robot
            3.3.1.1 Experimental setup
            3.3.1.2 Time-based and time-invariant movements
            3.3.1.3 Experimental results
  3.4 Discussion

4 Task-parameterized GMM with partially observable frames of reference
  4.1 Proposed approach
  4.2 Robotic dust sweeping experiment
      4.2.1 Experimental setup
      4.2.2 Experimental results
  4.3 Summary and conclusion

5 DMP learned by GMR
  5.1 Dynamic movement primitives (DMP)
  5.2 Proposed approach for learning DMP parameters
  5.3 Proposed approach
      5.3.1 Discrete movement
      5.3.2 Periodic movement
  5.4 Experiment
      5.4.1 Comparison for equal number of parameters stored in the models
  5.5 Summary and conclusion

6 Concluding Remarks
  6.1 Discussions and conclusions
  6.2 Future Perspectives

A Publications of the Author
B Properties of Gaussian distributions
Bibliography

List of Figures

2.1 A general overview of the PbD problem
2.2 A general overview of the task-parameterized PbD problem
2.3 A simple left-to-right HMM
2.4 Schematic of a stylistic HMM
3.1 Illustration of task-parameterized movement based on the proposed approach
3.2 Illustration of the task to the robot by kinesthetically moving it
3.3 Snapshots of a typical reproduction result
3.4 Demonstrations and reproductions in the same situations (top view of the worktop)
3.5 Reproductions in new situations (new task parameters)
3.6 Comparison of different approaches
3.7 Illustration example of movement learning through the superposition of virtual springs
3.8 Information flow of the overall process
3.9 Full-body compliant humanoid robot COMAN developed at IIT
3.10 Results for the clapping task
3.11 Physical perturbation during reproduction of the learned hand clapping
3.12 Results for the tracking task
3.13 Adaptive uni/bimanual tracking behaviours learned from a single demonstration
3.14 Results for the sweeping the floor task
4.1 Snapshots of a demonstration of the dust sweeping task
4.2 Snapshots of a reproduction attempt
4.3 Demonstration data observed from different frames of reference and the obtained model
4.4 Demonstrations, reproductions and importance of frames
4.5 Reproductions for the same set of task parameters with missing frames
4.6 Reproductions for a new set of task parameters
4.7 Reproductions for a new set of task parameters with missing frames
4.8 Average precision pattern for reproductions in different situations
4.9 Reproduction when only the reference frame is available
5.1 Reproduction of periodic signals with WGMR for GMM1 and GMM2
5.2 Movement trajectories
5.3 Sequences of the reaching movement
5.4 Sequences of the walking movement
5.5 Sequences of the swing-bat movement
5.6 Sequences of the clapping movement
5.7 RMS errors in function of the number of basis functions (components)
5.8 Number of parameters to be stored in function of the number of basis functions
5.9 RMS errors in function of the number of parameters to be stored
B.1 Properties of Gaussian distributions

List of Tables

2.1 Overall process of parametric DMP-GPR using a trajectory database
2.2 Overall process of the multi-streams approach
2.3 Overall process of the standard PGMM approach
3.1 Overall process of the proposed PGMM approach
3.2 Overall process of the adapted multi-streams approach
3.3 Overall process of the adapted standard PGMM approach
3.4 Overall process of adapted GPR with trajectory models database
4.1 Overall process of the proposed task-parameterized GMM with missing frames
5.1 Movements from the Human Motion Database
Chapter 1
Introduction

Robots have been used for several decades in factories and industrial settings to perform different kinds of well-defined tasks within well-structured environments. Most of these tasks are repetitive and also difficult for humans to perform, and industrial robots have been successfully used in that way, pre-programmed by robotics experts and operated by trained personnel. However, with the rapid development of compliant actuators and the decreasing cost of robots due to new technological advances, the presence of robots within society is becoming more common¹, giving more opportunities to non-expert people to interact with robots. As a result, robotic applications are evolving toward the service domain. In this field of applications, the robots are usually in direct contact with each other and also with humans (it is undesirable and impossible to put a security fence around a robot working in a home or office environment). It is foreseeable that in a few years there will be many robots employed by inexperienced users to perform different kinds of tasks in different working environments. In order to increase the efficiency of using a robot in the service domain, the procedure of programming it should be made easier, so that non-expert people can use robots without a great amount of effort. In other words, it is not feasible, in terms of time and cost, to take the robot to an expert and ask for it to be reprogrammed whenever a new task needs to be done (nor to call an expert to come on site and reprogram the robot). Nowadays, there are many different applications for robots, and since it is not possible to have a general-purpose robot programmed to tackle all of these applications by predicting all the
¹ To give an idea, it is worth knowing that by the end of 2012 around 3 million service robots for personal/domestic use, including household, entertainment and leisure robots, had been sold worldwide, 20% more than in 2011, whereas, according to predictions, this number will be about 22 million for the period 2013-2016 (International Federation of Robotics (IFR), 2013).
possible situations, the end users should be provided with proper tools so that they can easily program robots to perform specific tasks. Since not all end users are robotics experts with good programming skills, such tools should be simple enough to be used by a wide range of people with different levels of expertise, and programming the robot should be done in a natural and intuitive way. On the one hand, the number of tasks that a robot is intended to perform in an industrial environment is limited, and the robot might even have been constructed with those tasks in mind, while the number of tasks that a robot needs to perform in the service domain is not limited at all. On the other hand, in industrial environments robots are subject to limited interactions with humans who are experts in working with the robot, know what the risks are and what to do in case of problems, while robots in the service domain interact with various people with different levels of expertise, who might not have enough knowledge and information about how the robot technically works. Therefore, the conventional ways of employing robots are not enough, and there is a need to make robot programming accessible to non-expert users. Of course, such an approach should consider several issues at the same time, including safety, simplicity of use, the range of applications for which it could be used, etc. One approach toward this goal is robot learning from demonstration (LfD), or robot programming by demonstration (PbD), also known as imitation learning, which has attracted a lot of research interest during recent years, see Billard et al. [2008]. It is a way of transferring skills from humans to robots that is much easier and more user-friendly than the classical approaches to programming a robot. PbD is inspired by the way in which knowledge is transferred between human beings while performing a task. Imitation is a natural way for people to learn new skills, see Lieberman [2002]. In PbD, the end user teaches the robot by performing the task for the robot several times (in slightly different situations) and lets the robot learn the task (or imitate the human teacher). There is no need to be an expert in computer programming or in robotics. Such methods develop robot controllers by extracting the main features of the task from the observations of a teacher's performance, and they are expected to extend and adapt the learnt task to novel situations and changes in the environment (generalizing the task by observing several demonstrations). In other words, PbD approaches build models that are based on the examples or demonstrations performed by a teacher. Learning by imitation is usually conducted at two different levels of abstraction: the symbolic level, Muench et al. [1994], and the trajectory level, Ude [1993]. At the symbolic level, the task is described by a series of primitives that are pre-determined or extracted from pre-defined rules. At the trajectory level, on the other hand, the task is described by continuous variables representing different time-
dependent configuration properties. The symbolic level uses a higher level of abstraction than the trajectory level. In this thesis we will mainly focus on processing the task at the trajectory level. This change in the way of programming robots can be compared to what has happened in the field of personal computers: going back in history, only experts had access to the sophisticated and expensive computers and were able to write code and use them, whereas nowadays almost everybody has access to affordable computers and knows how to use them for different purposes (or at least it is very easy to learn how), even without knowing anything about programming or about how the different parts of a computer are connected to and communicate with each other.
1.1 Scope and Impacts
To learn from demonstrations, the robots must have the ability to obtain information while a teacher executes the task. There are several ways of providing demonstrations, including kinesthetic teaching (physically holding the robot and moving it to perform the task), tele-operation, visual tracking of a human performing the task with vision systems, and the use of exoskeletons. Demonstrations have the following two interesting features: firstly, they are an intuitive means of communication for humans, who already use them to teach other people, and secondly, the dataset focuses on the areas of the state space which are actually encountered during task execution, see Argall et al. [2009]. Research on LfD is influenced by a variety of fields, including control theory, artificial intelligence, statistics, psychology, ethology, and neuro-physiology, see Billing and Hellström [2010]. The main questions in learning by imitation are what, when, how and who to imitate, see Lieberman [2002]; Billing and Hellström [2010]; Schaal et al. [2003]. We will focus on the what-to-imitate and how-to-imitate parts, which deal with the ways of encoding the performed skill by extracting the important features of the task. It should also be noted that imitating a task is not simply recording and replaying movements. Instead, the learned task should be robust to changes in the environment during the reproductions, and to imperfect executions, see Atkeson and Schaal [1997b].
1.2 Motivations and Objectives
The motivation for the work described in this thesis is to develop new PbD approaches and improve the existing ones by addressing some of the open problems in the field. The advantages of using PbD
approaches to teach a task to a robot are as follows, according to Billard et al. [2008]:

1. It reduces the complexity of the search space for learning, by providing good and bad examples of the performed task. The learning process is therefore enhanced and accelerated.

2. It is a natural way of interacting with a machine, since it offers an implicit means of teaching the task to the robot rather than explicitly programming it.

3. Studying and modelling the coupling of perception and action helps to understand the mechanisms by which the self-organization of perception and action can arise during development. The reciprocal interaction of perception and action can explain how competence in motor control can be grounded in a rich structure of perceptual variables and, vice versa, how the processes of perception can develop as a means to create successful actions.

4. The movement reproduced by the robot is similar to human movements and is therefore implicitly safe, since it is predictable for humans.

Several approaches have been proposed to tackle the robot programming by demonstration problem, using different tools from diverse areas such as statistics, machine learning, control theory and combinations of them. However, the different lines of work that have been pursued can be categorized into two main streams: Reinforcement Learning (RL), see Waltz and Fu [1965]; Mendel and McLaren [1970]; Sutton and Barto [1998]; Peters et al. [2003]; Yoshikai et al. [2004], which uses self-experimentation, with the robot performing the task itself, evaluating its performance and trying to improve it over time in a trial-and-error sense; and Imitation Learning (IL), see Lozano-Perez [1983]; Segre and DeJong [1985]; Kuniyoshi et al. [1989]; Muench et al. [1994]; Delson and West [1996]; Schaal [1999]; Sato et al. [2003]; Nicolescu and Mataric [2003]; Jansen and Belpaeme [2006]; Ogawara et al. [2003]; Ekvall and Kragic [2006]; Pardowitz et al. [2007], in which several demonstrations are provided by the teacher and the robot extracts the main features of the task and learns how to perform it in new scenarios. Both approaches are founded on the learning mechanisms of biological systems. As another alternative, it has been shown by several researchers that RL can be used jointly with IL. In this case, the demonstrations are used to initialize the self-experimentation, as presented in Atkeson and Schaal [1997a]; Bentivegna et al. [2004]. In other words, the robot can build its skills upon the provided demonstrations and then improve them (possibly beyond the skills of the teacher) using RL. One of the main desirable properties of learning by demonstration approaches is their generalization to new conditions. If the robot manages to successfully repeat a demonstrated behaviour under
different conditions than during the demonstration, we say that the robot is able to generalize the demonstrated behaviour, see Billing and Hellström [2010]. This means that the learning approach should have the capability to produce a good behaviour in cases in which there are some changes, for example in the working environment or in the specifications of the task, such as a change of target point. Some of the available PbD approaches have good performance in terms of generalization (for a limited range of changes); however, in order to obtain better generalization results, a new family of approaches has emerged during recent years. In these approaches, the model learned for the task takes into account some variables which are task-specific. This is due to the fact that most movements depend on task parameters, such as the positions of objects or the locations of targets, that can locally influence the shape, amplitude, direction or timing of the movements. In other words, for each individual demonstration, the movement depends on a set of parameters and is modulated by these parameters. Several terms have been used to refer to the task parameters, such as style variables in Brand and Hertzmann [2000], query points in Ude et al. [2010] and frames of reference in Calinon et al. [2010a]. Recently, many approaches have been proposed to tackle this problem, using different tools and usually by extending methods originally developed for the conventional case. The approaches are mainly based on HMMs (Brand and Hertzmann [2000]; Herzog et al. [2008]; Krüeger et al. [2010]), dynamic movement primitives (Forte et al. [2012]; Stulp et al. [2013]; Matsubara et al. [2010, 2011]; Ude et al. [2010]), reinforcement learning (Kober et al. [2012]; Silva et al. [2012]), Gaussian mixture regression (Calinon et al. [2007, 2010a]) and dynamical systems combined with Gaussian mixture regression and Gaussian process regression (Kronander et al. [2011]). The above-mentioned approaches have been applied to learn different robotic tasks such as playing mini-golf, Kronander et al. [2011], dart throwing, Kober et al. [2012], playing table tennis, Kober et al. [2012]; Matsubara et al. [2010], reaching movements, Forte et al. [2012]; Ude et al. [2010], grasping, Herzog et al. [2008], drumming, Ude et al. [2010], obstacle avoidance, Matsubara et al. [2011]; Stulp et al. [2013], action recognition and synthesis, Krüeger et al. [2010], and object moving tasks, Calinon et al. [2007]. In one family of approaches, the HMM is extended into a parametric HMM (PHMM) such that the centres and covariance matrices of the observation Gaussian density functions are modulated by the task parameters, Brand and Hertzmann [2000]; Herzog et al. [2008]; Krüeger et al. [2010]. In another family of approaches, DMPs are used as the main learning tool. In Matsubara et al. [2010, 2011], the formulation of the external modulating force is adapted to contain the task parameters in an implicit way (the task parameters are not provided directly, but rather extracted from the demonstration data using principal component analysis). In Ude et al. [2010]; Forte et al. [2012], DMP is combined
with GPR in order to build a mapping between the parameters of the DMP and the task parameters, using a trajectory database. In Stulp et al. [2013], the modulating external force of the DMP is adapted in such a way that the task parameters are passed as input variables (by increasing the dimensionality of the modulating force). An approach based on reinforcement learning, presented in Kober et al. [2012], builds a map between task parameters and DMP meta-parameters based on cost-regularized kernel regression. In another reinforcement learning based approach, Silva et al. [2012], a combination of ISOMAP and Support Vector Machines (SVM) is used to train a family of independent non-linear regression models mapping task parameters to policy parameters; the SVM is used to choose a proper non-linear regression algorithm. An interesting aspect of this approach is that different regressors are learned for different sections of the task-parameter space, and therefore the problem of averaging across multiple task parameters that achieve the same task is avoided. For instance, reaching around an obstacle will succeed by going around it to the left or to the right, but the average of these two motions will not succeed. In Calinon et al. [2007], an approach is proposed that relies on replicating the recorded movement by projecting it into several frames of reference, and learning separate GMMs for each frame of reference. GMR was then used to retrieve a generalized motion associated with each frame of reference in the form of a series of Gaussians. A metric of imitation performance was determined (with weights locally reflecting the importance of the different frames), which was then analytically derived to estimate optimal trajectories during reproduction. Later on, in Calinon and Billard [2009], it was shown that a trade-off between constraints in joint space and task space could be locally estimated through products of Gaussians, and that this estimation actually corresponds to the derivation of the same metric of imitation performance. A combination of dynamical systems (DS) and GMR/GPR is used to learn task-parameterized motion in Kronander et al. [2011]. The approach is divided into two parts: in the first part, a DS is used to model a hitting motion, and in the second part, GMR/GPR is used to find a good mapping between the hitting parameters and the hitting locations. Most of the above-mentioned approaches allow the system to learn one general model from several demonstrations performed in different scenarios (for the same kind of task). Compared to approaches that do not consider task parameters, they have better generalization properties and can be used to learn tasks with a higher variability (some of them can also be used to learn tasks that do not depend on task parameters, in which case the task parameters can be assumed to be constant at all times, for all the demonstrations and reproduction attempts). Even though the generalization capability of the learning approaches is improved in this way, there are still several issues to be considered. One issue, which is considered in this thesis, is the absence of
some of the task parameters during some demonstrations and/or reproductions. The above-mentioned approaches have not considered this issue, and therefore their methods cannot be used in their current form to deal with this problem. The approaches developed in this thesis are inspired by a family of approaches which rely on the encoding of trajectories in several candidate frames of reference, see Alissandrakis et al. [2006]; Calinon et al. [2007]; Cederborg et al. [2010]; Muehlig et al. [2009]; Schneider and Ertel [2010]. The core of these approaches relies on projecting the same training set into different frames of reference and training separate models in these frames before merging the learned models in the reproduction phase. In the proposed approach, however, only one model is built for all the demonstrations, and the projection of the demonstrations into the different frames of reference is performed inside the model.
1.3 Contributions of the thesis
This thesis addresses some of the challenges in learning from demonstration, mainly the task-parameter dependency of the motions, the absence of some of the task parameters during demonstrations and/or reproductions, and the learning of DMP parameters with a probabilistic approach. As a first step, the task-parameterized Gaussian mixture model is proposed to take into account the variability of the task due to the task parameters. The proposed approach has specific advantages over the available approaches, such as modulating the covariances as well as the centres of the Gaussian components with the task parameters. It is evaluated in real-world robotic experiments, such as robotic dust sweeping with a Barrett WAM robot. Since the availability of all task parameters at all times cannot be guaranteed, the proposed approach is then modified and adapted to the cases in which some of the task parameters are missing during the demonstration and/or reproduction phases; this issue has been overlooked so far in existing task-parameterized approaches. Another contribution of this thesis concerns the exploitation of Gaussian mixture regression to learn the parameters of DMPs. In the proposed approach, both the parameters of the local models and their corresponding weights are estimated from the demonstrated data using Gaussian mixture regression.
1.4 Thesis Outline
The problem is described in Section 2.1, giving the main requirements and the inputs and outputs of the learning system. An overview of the available approaches related to programming by demonstration
considering the task-parameter dependence is given in Section 2.2. Approaches using different statistical, machine learning and control theory tools, including dynamical systems, Gaussian processes, Gaussian mixtures, hidden Markov models, etc., are investigated. Namely, time-series style machines, or the stylistic hidden Markov model (alternatively, the style-specific HMM), are reviewed in Section 2.2.1. In Section 2.2.2, an approach based on dynamic movement primitives (DMP) is described. Another approach using DMPs, based on a database of trajectories, is presented in Section 2.2.3. The parametric hidden Markov model (PHMM), as an approach to learn task-dependent movements, is presented in Section 2.2.5. Chapter 3 introduces the task-parameterized Gaussian mixture model (TP-GMM), which is developed to improve the generalization capability over previously proposed task-parameterized approaches in situations where the task can be described in terms of a movement acting in several coordinate systems. In the proposed approach, the task parameters are considered as candidate frames of reference, and each demonstration is associated with a set of frames of reference which can differ from one demonstration to another and can also vary during a given demonstration. The model is built considering both the recorded data and the frames of reference. In this approach, both the centres and the covariance matrices of the Gaussian components are parameterized by the task parameters. Later, in the reproduction phase, new trajectories are estimated with the learned model for a new set of frames of reference. An extension of the proposed model to the cases in which some of the task parameters are not available during the demonstrations and/or reproductions is provided in Chapter 4. This approach can be useful for learning tasks in which the number of task parameters is not fixed and might change during the demonstrations and/or reproductions. This encompasses a variety of tasks, including for example via-point passing, dust sweeping and glass filling. As a proof of concept, it is applied to learn a dust sweeping task with a robotic manipulator. In Chapter 5, an approach is proposed to learn the parameters of dynamic movement primitives (DMP) using Gaussian mixture regression (GMR). The proposed approach has the ability to learn the parameters of the local models and their corresponding weights at the same time, and it also has appealing properties such as taking into account the correlation between the different variables of the task. Finally, the concluding remarks and future work are presented in Chapter 6. The thesis also includes two appendices: Appendix A summarizes the publications of the author, and Appendix B summarizes some of the properties of Gaussian distributions that are exploited in this thesis.
1.5 Publication note
Most of the material presented in Chapters 3, 4 and 5 has been published or is under review in peer-reviewed conference proceedings. The source code developed for the proposed approaches, together with the videos of the robotic experiments, is available online at the following address:
http://programming-by-demonstration.org/
Chapter 2
Concepts and Backgrounds (state of the art)

This chapter provides the state of the art and a review of the existing approaches for programming by demonstration. First, the problem of learning by imitation is presented, and then some of the state-of-the-art approaches considering task parameters are presented in more detail.
2.1 Problem statement
Generally speaking, the programming by demonstration problem can be considered as a three-stage procedure, namely demonstration, model learning and reproduction. Obviously, demonstration is the first step to be performed, because without it there is no information about what is to be imitated. After the demonstrations are provided, a model should be learned from this information in order to have a good approximation of the system generating the movements. Finally, using the learned model, reproductions can be performed, possibly in new situations. This general overview is depicted in Fig. 2.1. Usually, the demonstrations are stated in terms of several variables changing over time, such as
Figure 2.1: A general overview of the PbD problem
Figure 2.2: A general overview of the task-parameterized PbD problem
the Cartesian position of the hand, the positions of different joints, and so on. In the case of task-parameterized movements, however, the task parameters should also be recorded together with the demonstrations, and they must likewise be provided when performing the reproductions. Therefore, the learning problem can be summarized as illustrated in Fig. 2.2. In the next section, an overview of some of the available approaches for learning task-parameterized movements is given.
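To make the inputs and outputs of this learning problem concrete, the following minimal Python sketch shows one possible way of organizing the recorded data. It is illustrative only: the names (FramePose, Demonstration, learn_model, reproduce) are not part of the thesis code, and the concrete representation of the task parameters (here, object poses) is an assumption.

```python
# Illustrative sketch of the data handled in task-parameterized PbD (Fig. 2.2).
from dataclasses import dataclass
import numpy as np


@dataclass
class FramePose:
    origin: np.ndarray      # position of a landmark/object, e.g. shape (3,)
    rotation: np.ndarray    # orientation matrix, e.g. shape (3, 3)


@dataclass
class Demonstration:
    data: np.ndarray                  # trajectory samples, shape (T, D)
    task_params: list[FramePose]      # task parameters observed during this demonstration


def learn_model(demos: list[Demonstration]):
    ...  # fit a model from all demonstrations and their task parameters


def reproduce(model, new_task_params: list[FramePose]):
    ...  # generate a trajectory for a new situation described by new task parameters
```

Whatever the concrete representation, the key point summarized in Fig. 2.2 is that the task parameters recorded with each demonstration must also be supplied at reproduction time.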
2.2 Available approaches
In this section, some of the available approaches that can be used in the context of task-parameterized PbD are presented.
2.2.1 Style machines
Brand and Hertzmann [2000] have approached the problem of stylistic motion synthesis by learning motion patterns from a highly varied set of motion capture sequences. Each sequence may have a distinct choreography, performed in a distinct style. Their approach is based on the hidden Markov model (HMM). The HMM, a statistical Markov model, has been used widely in different areas of scientific research such as pattern recognition, language analysis and speech processing. It is a generative model and a finite state machine extended in a probabilistic manner. It can also be considered as a generalization of a mixture model in which the transitions between the different components of the model are governed by a Markov process. An HMM consists of a finite set of states s_t, and the transitions between these states are governed by a probabilistic transition matrix representing transition probabilities. Each state of the HMM is associated with an output probability function. At each time step, the observer can only see the output of the HMM, y_t, without knowing which state has generated it (which is why it is called a
Figure 2.3: A simple left to right HMM
hidden Markov model). See Rabiner and Juang [1986] and Rabiner [1989] for more details about HMMs. As an illustrative example, a left-right HMM is shown in Fig. 2.3. In this example, a_ij and b_i(x_t) refer to the transition probability between the i-th and j-th states and the output probability distribution of the i-th state, respectively. In the case of continuous HMMs, the output probability distributions are usually considered to be multivariate Gaussian distributions N(µ_i, Σ_i). In this case, the parameters of the HMM are the transition probabilities between the different states (A_T), the centres (µ_i) and covariance matrices (Σ_i) of the output Gaussians, and the initial probability of being in each state at time 0 (ϖ_i). Therefore, an HMM composed of K states can be described as

\lambda = \{\varpi_k, A_T, \mu_k, \Sigma_k\}_{k=1}^{K}.    (2.1)
The output variable of the HMM is considered as a continuous-valued time series X. The likelihood of a path X = {x_1, x_2, ..., x_T} with respect to a particular sequence of hidden states S = {s(1), s(2), ..., s(T)} is the probability of each point on the path with respect to the current hidden state (∏_{t=1}^{T} p_{s(t)}(x_t)), times the probability of the state sequence itself, which is the product of all its state transitions (P_{s(1)} ∏_{t=2}^{T} P_{s(t-1)→s(t)}). When this is summed over all possible hidden state sequences, one obtains the likelihood of the path with respect to the entire HMM:

P(X|\lambda) = \sum_{S \in S^T} P_{s(1)}\, p_{s(1)}(x_1) \prod_{t=2}^{T} P_{s(t-1)\to s(t)}\, p_{s(t)}(x_t).    (2.2)
The parameters of the HMM are estimated according to the output data observed at each time step, using for example the alternating Expectation step (computing a distribution over the hidden states) and Maximization step (computing locally optimal parameter values with respect to that distribution) of the EM algorithm (see Rabiner [1989] for more details). The Expectation step contains a dynamic programming recursion for Eq. (2.2) that saves the trouble of summing over the exponential number of state sequences in S^T:

P(X|\lambda) = \sum_i \alpha_{T,i},    (2.3)

\alpha_{t,i} = p_i(x_t) \sum_j \alpha_{t-1,j} P_{j \to i}, \qquad \alpha_{1,i} = P_i\, p_i(x_1).    (2.4)

α is called the forward variable; a similar recursion gives the backward variable β:

\beta_{t,i} = \sum_j \beta_{t+1,j}\, p_j(x_{t+1})\, P_{i \to j}, \qquad \beta_{T,i} = 1.    (2.5)

In the E-step the variables α and β are used to calculate the expected sufficient statistics w = {C, γ} that form the basis of the new parameter estimates. These statistics count the expected number of times the HMM transitioned from one state to another,

C_{j \to i} = \sum_{t=2}^{T} \alpha_{t-1,j}\, P_{j \to i}\, p_i(x_t)\, \beta_{t,i} \,/\, P(X|\lambda),    (2.6)

and the probability that the HMM was in hidden state s_i when observing datapoint x_t,

\gamma_{t,i} = \alpha_{t,i}\,\beta_{t,i} \,/\, \textstyle\sum_i \alpha_{t,i}\,\beta_{t,i}.    (2.7)

These statistics are optimal with respect to all the information in the entire sequence and in the model, due to the forward and backward recursions. In the M-step, one calculates maximum likelihood parameter estimates, which are simply normalizations of w:

\hat{P}_{i \to j} = C_{i \to j} \,/\, \textstyle\sum_j C_{i \to j},    (2.8)

\hat{\mu}_i = \textstyle\sum_t \gamma_{t,i}\, x_t \,/\, \sum_t \gamma_{t,i},    (2.9)

\hat{\Sigma}_i = \textstyle\sum_t \gamma_{t,i}\, (x_t - \hat{\mu}_i)(x_t - \hat{\mu}_i)^{\top} \,/\, \sum_t \gamma_{t,i}.    (2.10)
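As an illustration of how Eqs. (2.3)-(2.10) fit together, the following Python sketch performs one EM iteration for a Gaussian-output HMM. It is a minimal example without numerical scaling (so it can underflow on long sequences), it is not the implementation used in the cited work, and the function and variable names are illustrative.

```python
# One EM iteration for a Gaussian-output HMM, following Eqs. (2.3)-(2.10).
import numpy as np
from scipy.stats import multivariate_normal


def em_step(X, P_init, P_trans, mu, sigma):
    """X: (T, D) observations; P_init: (K,); P_trans[j, i] = P_{j->i}; mu: (K, D); sigma: (K, D, D)."""
    T, D = X.shape
    K = len(P_init)
    # Output probabilities p_i(x_t)
    p = np.array([[multivariate_normal.pdf(X[t], mu[i], sigma[i])
                   for i in range(K)] for t in range(T)])           # (T, K)
    # Forward recursion, Eq. (2.4); path likelihood, Eq. (2.3)
    alpha = np.zeros((T, K))
    alpha[0] = P_init * p[0]
    for t in range(1, T):
        alpha[t] = p[t] * (alpha[t - 1] @ P_trans)
    likelihood = alpha[-1].sum()
    # Backward recursion, Eq. (2.5)
    beta = np.zeros((T, K))
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = P_trans @ (p[t + 1] * beta[t + 1])
    # Expected statistics, Eqs. (2.6)-(2.7)
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    C = np.zeros((K, K))
    for t in range(1, T):
        C += np.outer(alpha[t - 1], p[t] * beta[t]) * P_trans / likelihood
    # M-step, Eqs. (2.8)-(2.10): transitions, means and covariances are re-estimated
    P_trans_new = C / C.sum(axis=1, keepdims=True)
    mu_new = (gamma.T @ X) / gamma.sum(axis=0)[:, None]
    sigma_new = np.zeros((K, D, D))
    for i in range(K):
        diff = X - mu_new[i]
        sigma_new[i] = (gamma[:, i, None] * diff).T @ diff / gamma[:, i].sum()
    return P_trans_new, mu_new, sigma_new, likelihood
```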
After training, Eqs. (2.9) and (2.10) can be used to remap the model to any synchronized time series. Based on the HMM, Brand and Hertzmann [2000] have proposed an approach for stylistic human movement analysis and synthesis. Their proposed model, called stylistic HMM (SHMM) or style machine, is an HMM whose parameters are functionally dependent on a style variable v. A schematic view of the SHMM is shown in Fig. 2.4.
Figure 2.4: Schematic of a stylistic HMM.
In other words, the mean vectors and covariance matrices of the output Gaussian distributions of the HMM are modulated by v. This modulation is performed by the matrices U_i and W_i as follows:

b_i(x_t) = \mathcal{N}(\mu_i + U_i v,\; \Sigma_i + W_i v).    (2.11)
{U_i, W_i} are the dominant eigenvectors obtained in the post-training PCA of the style-specific models, and v can be estimated from data and/or varied by the user. An entropy maximization approach is used to obtain an estimate of the model parameters. The optimal Gaussian parameter settings for minimizing cross entropy with respect to the datapoints x_i and a reference Gaussian parameterized by mean µ• and covariance Σ• are

\hat{\mu} = \Big(\sum_{i}^{N} x_i + Z' \mu^{\bullet}\Big) \Big/ (N + Z'),    (2.12)

\hat{\Sigma} = \Big(\sum_{i}^{N} (x_i - \hat{\mu})(x_i - \hat{\mu})^{\top} + Z'\big((\mu^{\bullet})(\mu^{\bullet})^{\top} + \Sigma^{\bullet}\big)\Big) \Big/ (N + Z + Z').    (2.13)

The factors Z and Z' vary the strength of the entropy and cross-entropy priors in annealing. For more details, see Brand and Hertzmann [2000]. One specific property of the SHMM is that the style parameter does not need to be provided as an input to the system: the SHMM automatically derives the style parameter and its dimension. The SHMM was introduced for the analysis and synthesis of human movements; however, it can be used for learning task-parameterized human motions as well.
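A minimal sketch of the style modulation of Eq. (2.11) is given below. It assumes U_i is stored as a D×J matrix and W_i as a D×D×J tensor so that U_i v and W_i v are a vector and a matrix, respectively; keeping the modulated covariance positive definite is assumed to be handled by the training procedure described above. The names are illustrative, not the authors' code.

```python
# Style-modulated output probability of state i, Eq. (2.11).
import numpy as np
from scipy.stats import multivariate_normal


def output_prob(x_t, v, mu_i, sigma_i, U_i, W_i):
    """x_t: (D,) observation; v: (J,) style variable; U_i: (D, J); W_i: (D, D, J)."""
    mu_mod = mu_i + U_i @ v          # style-shifted mean
    sigma_mod = sigma_i + W_i @ v    # style-shifted covariance: (D, D, J) @ (J,) -> (D, D)
    return multivariate_normal.pdf(x_t, mu_mod, sigma_mod)
```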
2.2.2 Parametric DMP
Dynamic movement primitives (DMP) have been developed as a powerful tool for learning by imitation. They are originally inspired by the idea that human movements are composed of a limited number of movement primitives, and that humans perform different movements by combining these primitives over time. The DMP is a dynamical systems approach to the learning problem: a second-order dynamical system acting as a virtual spring-damper, perturbed by a non-linear force profile used to modify the shape of the movement [Ijspeert et al., 2013]. For a discrete movement, the system is described as

\tau^2 \ddot{y} = \alpha_z\big(\beta_z(g - y) - \tau\dot{y}\big) + f(x),    (2.14)

\tau \dot{x} = -\alpha_x x,    (2.15)

f(x) = \frac{\sum_{i=1}^{N} \Psi_i(x)\, w_i}{\sum_{i=1}^{N} \Psi_i(x)}\; x, \qquad \Psi_i(x) = \exp\!\Big(-\frac{1}{2\sigma_i^2}(x - c_i)^2\Big),    (2.16)

where y, ẏ and ÿ are the position, velocity and acceleration of the system, g defines the goal and τ the duration of the movement. α_z, β_z and α_x are hyper-parameters of the model. (2.14) is known as the transformation system, (2.15) the canonical system with phase variable x, and (2.16) the learned force profile. This force profile is represented as a weighted sum of constant weights w_i, activated by radial basis functions Ψ_i(x). For periodic movements, the DMP is similarly defined as

\tau^2 \ddot{y} = \alpha_z\big(\beta_z(g - y) - \tau\dot{y}\big) + f(\phi, r),    (2.17)

\tau \dot{\phi} = 1,    (2.18)

f(\phi, r) = \frac{\sum_{i=1}^{N} \Psi_i(\phi)\, w_i}{\sum_{i=1}^{N} \Psi_i(\phi)}\; r, \qquad \Psi_i(\phi) = \exp\big(\gamma_i(\cos(\phi - c_i) - 1)\big),

where φ is the phase angle of an oscillator and r is the amplitude of the oscillation. τ and r are used to generalize the movement to different speeds (periods of oscillation) and amplitudes.

The standard learning procedure for a DMP (here for discrete movements) consists of predefining Ψ_i(x) and estimating w_i with weighted least squares. After setting the hyper-parameters α_z, β_z, α_x, as well as N (number of basis functions), c_i (centres of the basis functions) and σ_i (widths of the basis functions), the learning problem consists of minimizing, for each component i ∈ {1, ..., N}, the cost function

J_i = \sum_{t=1}^{P} \Psi_i(t)\,\big(f_{target}(t) - w_i\, x(t)\big)^2,
resulting in the weighted least squares solution w_i = (X^⊤ Γ_i X)^{-1} X^⊤ Γ_i F, in which

X = \big[x(1), \dots, x(P)\big]^{\top}, \quad \Gamma_i = \mathrm{diag}\big(\Psi_i(1), \dots, \Psi_i(P)\big), \quad F = \big[f_{target}(1), \dots, f_{target}(P)\big]^{\top},    (2.19)
where P is the number of datapoints, and f_target(t) is the force profile calculated at each time step from the recorded movement as

f_{target}(t) = \tau^2 \ddot{y}_{demo}(t) - \alpha_z\big(\beta_z(g - y_{demo}(t)) - \tau\dot{y}_{demo}(t)\big),

where y_demo(t), ẏ_demo(t) and ÿ_demo(t) are the position, velocity and acceleration of the system at time step t.

In order to make the DMP model task-dependent, Matsubara et al. [2011, 2012] proposed to model the attractor function f(x) by a multifactor function. An additional control variable s is introduced as f(x, s), so that f(x, s) with a given value of s represents the attractor landscape encoding a specific demonstration:

f(x, s) = \sum_{j=1}^{J} s_j\, b_j(x; w_j) = s^{\top} b(x; W),    (2.20)
where s = [s_1 ··· s_J]^⊤ is a style variable (task parameter), b(x; W) = [b_1(x; w_1) ··· b_J(x; w_J)]^⊤ are the basis functions, and W = [w_1 ··· w_J]^⊤ is a parameter matrix. J is the dimension of the style parameter. The authors called the resulting model parametric DMP (PDMP). An algorithm with the following four steps is proposed to learn the parameters of the model from several demonstrations:

1. Alignment of the demonstrations. Assuming M demonstrations {y^m_demo(t^m_c)}_{m=1}^{M}, with t^m_c = c∆t, c = 1, ..., C^m, the duration of each demonstration is given as T^m = C^m ∆t. After selecting a nominal trajectory indexed by n ∈ {1, ..., M}, the other trajectories are time-scaled by the ratio T^n/T^m so that all trajectories are represented as vectors of the same size, y^m_demo ∈ R^{C^n × 1} for all m. Therefore, for all of the demonstrations we have y^{1:M}_demo = {y^1_demo, ..., y^M_demo}.

2. Calculation of the target forces f^m_target from the demonstration data y^m_demo, separately for each demonstration m, by numerically integrating the attractor dynamics (as in the case of DMPs). We then have f^{1:M}_target = {f^1_target, ..., f^M_target}.

3. Extraction of basis targets f^{1:J}_basis = {f^1_basis, ..., f^J_basis} from all targets f^{1:M}_target, so that f^m_target is approximately represented as f^m_target ≈ Σ_{j=1}^{J} s^m_j f^j_basis, where J ≤ M. The basis targets and corresponding style parameters for all demonstrations can be extracted by a matrix factorization with singular value decomposition (SVD). The factorial representation of the M × C matrix F_all_target = [f^1_target ··· f^M_target]^⊤ using SVD is obtained as

F_{all\_target} = U \Sigma V^{\top} \approx S\, F_{basis}.    (2.21)

The style parameter matrix S = [s^1 ··· s^M]^⊤ ∈ R^{M×J} is defined as the first J (≤ M) columns of U, and the basis target matrix F_basis = [f^1_basis ··· f^J_basis]^⊤ ∈ R^{J×C} as the first J rows of ΣV^⊤. The dimension J can be determined from the singular value spectrum (e.g., select J so that Σ_{j=1}^{J} δ_j / Σ_{m=1}^{M} δ_m > 0.9, where δ_j is the j-th diagonal element of Σ ordered in decreasing order).

4. Learning W using a supervised learning approach; Gaussian process regression is used for this step. This is done separately for each j ∈ {1, ..., J} as

w_j^{*} \leftarrow \arg\min_{w_j} \sum_{c=1}^{C^n} \big\| f_{basis}^{j}(t_c) - b_j(x; w_j) \big\|^2.    (2.22)

The optimal values w*_j constitute the task-parameterized external force f(x, s; W*), which is parameterized by the style variable s. The style parameters in this model can be considered to be simply the task parameters or, alternatively, a function of the task parameters. In order to perform a reproduction, the value of the new style parameter s* should be given. Then, using the already learnt weights W*, the external force is calculated and plugged into the main formulation of the DMP, yielding the values of position, velocity and acceleration at each time step. The approach is used in Matsubara et al. [2011] and Matsubara et al. [2012] to learn point-to-point reaching movements with different heights of an obstacle, as well as stylistic arm movements of a walking behaviour, from several demonstrations. It is shown that the approach works well in both interpolation and extrapolation of the style variable.
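The SVD factorization of step 3, Eq. (2.21), can be illustrated with the following short Python sketch (not the authors' code). F_all_target is assumed to be the M × C matrix of time-aligned force targets, one demonstration per row, and the 0.9 energy threshold follows the selection rule quoted above.

```python
# Extraction of style parameters S and basis targets F_basis via SVD, Eq. (2.21).
import numpy as np


def extract_style(F_all_target, energy=0.9):
    """F_all_target: (M, C) matrix of time-aligned force targets."""
    U, d, Vt = np.linalg.svd(F_all_target, full_matrices=False)
    # Choose the number of style dimensions J from the singular-value spectrum
    ratio = np.cumsum(d) / np.sum(d)
    J = int(np.searchsorted(ratio, energy) + 1)
    S = U[:, :J]                        # style parameters, one row s_m per demonstration
    F_basis = np.diag(d[:J]) @ Vt[:J]   # basis targets, shape (J, C)
    return S, F_basis, J


# Usage: F_all_target ≈ S @ F_basis; a new style vector s_new then gives a new
# force profile s_new @ F_basis to be plugged into the DMP.
```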
2.2.3 Parametric DMP-GPR using a trajectory database
Another approach which is based on DMPs is proposed by Ude et al. [2010] and Forte et al. [2012]. However, instead of reformulating the modulating external force of DMP to be a function of the task parameters, in these works, the authors first learn a DMP model for each demonstration separately and then build a database of trajectories containing the task parameters and the corresponding learnt
Table 2.1: Overall process of parametric DMP-GPR using a trajectory database.

1. Task demonstration
   - Set P (number of candidate frames)
   - for m ← 1 to M (for each demonstration)
       - Collect task parameters q_m
       - for n ← 1 to T_m (for each time step)
           - Collect datapoint {ξ_{m,n}}
       - end
   - end

2. Model fitting
   - Set N (number of basis functions in the DMP) and initialize the other parameters
   - for m ← 1 to M (for each demonstration)
       - Fit a DMP to ξ_m with WLS: Θ_m = {w_m, g_m, τ_m}
       - Associate query points q_m with the corresponding DMP parameters Θ_m
   - end
   - Pre-compute [K(Q, Q) + σ² I]^{-1} Θ

3. Reproduction
   - Collect/select a new query point Q_d
   - Use GPR to estimate ȳ* (using Eq. 2.23)
   - Build a DMP using the new parameters ȳ*
   - for n ← 1 to T (for each reproduction step)
       - Use the DMP formulation to estimate the values of the trajectory at each time step
   - end
DMP parameters. This results in a significant data reduction. A set of demonstrations ξ = {ξ_m}_{m=1}^{M} is recorded together with their corresponding task parameters (query points) q = {q_m ∈ R^q}_{m=1}^{M}. The demonstration data are recorded in the form of trajectories with position, velocity and acceleration, ξ_m ≡ {y_{mj}, ẏ_{mj}, ÿ_{mj} ∈ R^{DOF}}, measured at times t_{mj}, in which j = 1, ..., T_m and T_m is the number of samples in the m-th demonstration. It is assumed that t_{m1} = 0. DOF denotes the number of degrees of freedom encoded by the example trajectories. The goal is to find a mapping between the task parameters and the movements, G : q → ξ. However, in general, G is not a function. For example, in the case of reaching movements, there are many different ways to reach towards a desired destination. However, we can impose an additional constraint
that synthetic reaching trajectories should be similar to the example reaching trajectories: the closer the desired query point q is to the example query point q_j, the more similar the generated trajectory ξ should be to the trajectory ξ_j associated with query point q_j. With this additional constraint, G(q; ξ_1, ..., ξ_M) becomes a function that can be learned. As a first step, each trajectory should be encoded as a DMP. To do so, a linear regression tool is used and the initial raw trajectory data ξ_m are converted into a DMP, i.e. ξ_m(w_m, g_m, τ_m), in which w_m ∈ R^{N×DOF} are the weights of the DMPs for all degrees of freedom with N basis functions, g_m ∈ R^{DOF} are the final configurations of the example trajectories, and τ_m ∈ R are the time durations of the example trajectories. This results in a significant data reduction. At the next step, Gaussian process regression (GPR) is applied to estimate the function G. GPR predicts the output values of G together with their confidence intervals. Furthermore, GPR exhibits good generalization performance, and the predictive distribution can be used to measure the uncertainty of the estimated function. The GPR is implemented as

g(q) \sim \mathcal{GP}\big(m(q), k(q, q')\big),    (2.23)
where m(q) = E[g(q)] is the mean function and k(q, q') = E[(g(q) - m(q))(g(q') - m(q'))] is the covariance function of the process. A set of noisy observations is considered as {q_m, y_m | m = 1, ..., M}, in which y_m = g(q_m) + ε, ε ∼ N(0, σ_n²). Here y_m is a vector containing the parameters of the associated DMP (y_m = [τ_m, g_m, w_m]^⊤). It can be assumed that m(q) = 0 by subtracting the mean from the training data; then, given a set of query points g(q*), the joint distribution of all outputs is given by

\begin{bmatrix} y \\ y^{*} \end{bmatrix} \sim \mathcal{N}\!\left( 0,\; \begin{bmatrix} K(Q, Q) + \sigma_n^2 I & K(Q, Q^{*}) \\ K(Q^{*}, Q) & K(Q^{*}, Q^{*}) \end{bmatrix} \right),    (2.24)
where Q, Q*, y, and y* respectively collect all inputs and outputs, and K(·, ·) are the associated joint covariance matrices calculated according to Eq. (2.23). The expected value ȳ* associated with the new query points q* is given by (Rasmussen and Williams [2006])

\bar{y}^{*} = E[y^{*} \,|\, Q, y, Q^{*}] = K(Q^{*}, Q)\,[K(Q, Q) + \sigma_n^2 I]^{-1}\, y,    (2.25)

and the covariance of the prediction is given as

\mathrm{cov}(y^{*}) = K(Q^{*}, Q^{*}) - K(Q^{*}, Q)\,[K(Q, Q) + \sigma_n^2 I]^{-1}\, K(Q, Q^{*}).    (2.26)
The employed kernel function can, for example, be defined as

K(q, q') = \sigma_f^2 \exp\!\Big(-\frac{1}{2} \sum_{i=1}^{q} \frac{(q_i - q'_i)^2}{l_i^2}\Big),    (2.27)
which results in a Bayesian regression model with an infinite number of basis functions; here q denotes the dimension of the query-point space. When the values of ȳ* are calculated, they are used as the parameters of a DMP, and the values of the new trajectory are then calculated accordingly. A summary of the approach is presented in Table 2.1. In order to choose suitable values for the parameters of the GPR (σ_n, σ_f, l_i), the authors used tools such as maximization of the marginal likelihood and leave-one-out cross-validation. The proposed approach has been used to learn reaching and grasping movements by a humanoid robot and a robotic manipulator from human demonstrations. The experimental results presented in Ude et al. [2010] and Forte et al. [2012] show that it has good interpolation capabilities.
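The following Python sketch illustrates the GPR mapping of Eqs. (2.25)-(2.27) from query points to stacked DMP parameters. It assumes a zero mean function and pre-selected hyper-parameters (σ_f, σ_n and the length scales); it is an illustrative sketch, not the implementation of Ude et al. [2010] or Forte et al. [2012].

```python
# GPR from query points to DMP parameters, Eqs. (2.25)-(2.27).
import numpy as np


def sq_exp_kernel(Q1, Q2, sigma_f, lengths):
    """Squared-exponential kernel with one length scale per query dimension, Eq. (2.27)."""
    diff = (Q1[:, None, :] - Q2[None, :, :]) / lengths
    return sigma_f**2 * np.exp(-0.5 * np.sum(diff**2, axis=2))


def gpr_predict(Q, Y, Q_star, sigma_f, sigma_n, lengths):
    """Q: (M, q) training query points; Y: (M, P) stacked DMP parameters [tau, g, w] per row."""
    K = sq_exp_kernel(Q, Q, sigma_f, lengths) + sigma_n**2 * np.eye(len(Q))
    K_star = sq_exp_kernel(Q_star, Q, sigma_f, lengths)
    alpha = np.linalg.solve(K, Y)                    # [K(Q,Q) + sigma_n^2 I]^{-1} y
    mean = K_star @ alpha                            # Eq. (2.25): expected DMP parameters
    cov = (sq_exp_kernel(Q_star, Q_star, sigma_f, lengths)
           - K_star @ np.linalg.solve(K, K_star.T))  # Eq. (2.26): predictive covariance
    return mean, cov
```

Each row of the predicted mean is then unpacked into (τ, g, w) and used to build the DMP that generates the new trajectory, as summarized in Table 2.1.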
2.2.4 Multi-streams approaches
One family of approaches is based on the concept of projecting the original data of the demonstrations into different coordinate systems. In Calinon et al. [2007], an approach is proposed that relies on replicating the recorded movement by projecting it into several frames of reference, and learning separate GMMs for each frame of reference. Gaussian mixture regression (GMR) was then used to retrieve a generalized motion associated with each frame of reference in the form of a series of Gaussians. In Dong and Williams [2012], an approach based on a probabilistic flow tube (PFT) representation is proposed, in which the model is parameterized by the relevant variables of the motion. After collecting the demonstrations, the so-called relevant motion variables are extracted. The motion variables are the important features or relations in the demonstration, and are a kind of task parameter. Afterwards, in order to learn the model and reproduce new movements, the set of motion variables is determined for the new environment or initial states. Then, a subset of the demonstrations for which the motion variables are similar to the new ones is selected. The selected demonstration sequences are projected into the new frames of reference (by scaling, translation, and rotation), so that the values of the motion variables match the values of the new motion variables. Finally, a probabilistic flow tube is generated for these data and the new reproduction is constructed. In another approach employing the idea of projecting movements into different frames of reference, Mühlig et al. [2012] have tackled the PbD problem by constructing a layered structure. The main learning layer of the approach consists of three sublayers. In the first step, the demonstrations are segmented based on some defined criteria. In the second step, for each segment of the overall movement, a feasible task space is chosen from a pool of task spaces, based on the consistency of the representation of the demonstration in the corresponding task space. In other words, a consistency score is calculated for the projection of the object movements into each task space in the task space pool, and then the one with the highest
score is chosen. All the task spaces in the task space pool are predefined. After choosing the best task space, the demonstration data are projected into it and dynamic time warping is applied to temporally normalize all the demonstrations. Finally, a GMM is used as a probabilistic tool to model the demonstrated movements.

The core strategy in Calinon et al. [2010a] consists of observing the same movement from different landmarks or coordinate systems, and training separate models in these frames. The learned models are then merged during reproduction to generate a generalized version of the movement. The learning approach is based on HMM and Gaussian mixture regression (GMR). The position and velocity of the end-effector ($x$ and $\dot{x}$) are recorded for each demonstration in the frame of reference of the torso of the humanoid robot, forming a dataset in which $x = \{x_t\}_{t=1}^{T_m}$ and $\dot{x} = \{\dot{x}_t\}_{t=1}^{T_m}$. Afterwards, these data are projected separately into the different frames of reference based on each landmark $n$ with position $o^{(n)}$ and orientation matrix $R^{(n)}$ through $x^{(n)} = R^{(n)\top}(x - o^{(n)})$ and $\dot{x}^{(n)} = R^{(n)\top}\dot{x}$. Then, for each landmark, the joint probability of the two variables, $\mathcal{P}(x^{(n)}, \dot{x}^{(n)})$, is modelled using an HMM, which is learned with the Baum-Welch algorithm. The parameters of the HMM are $\{\Pi^{(n)}, A^{(n)}, \mu^{(n)}, \Sigma^{(n)}\}$, in which $\mu^{(n)}$ and $\Sigma^{(n)}$ represent the means and covariances of the output Gaussian components. For the $i$-th component, $\mu_i^{(n)}$ and $\Sigma_i^{(n)}$ can be written in the following format:

\mu_i^{(n)} = \begin{bmatrix} \mu_i^{x^{(n)}} \\ \mu_i^{\dot{x}^{(n)}} \end{bmatrix}, \quad \Sigma_i^{(n)} = \begin{bmatrix} \Sigma_i^{x^{(n)}} & \Sigma_i^{x^{(n)}\dot{x}^{(n)}} \\ \Sigma_i^{\dot{x}^{(n)}x^{(n)}} & \Sigma_i^{\dot{x}^{(n)}} \end{bmatrix}.   (2.28)
In order to retrieve the velocity given the position information $x^{(n)}$ for each landmark, GMR is used to estimate the mean value $\hat{\dot{x}}^{(n)}$ and the covariance $\hat{\Sigma}^{\dot{x}^{(n)}}$ of the velocity using the conditional probability rules:

\dot{x}^{(n)} \,|\, x^{(n)} \sim \mathcal{N}\big(\hat{\dot{x}}^{(n)}, \hat{\Sigma}^{\dot{x}^{(n)}}\big),   (2.29)

where

\hat{\dot{x}}^{(n)} = \sum_{i=1}^{K} h_i(x^{(n)}) \Big[ \mu_i^{\dot{x}^{(n)}} + \Sigma_i^{\dot{x}^{(n)}x^{(n)}} \big(\Sigma_i^{x^{(n)}}\big)^{-1} \big(x^{(n)} - \mu_i^{x^{(n)}}\big) \Big]   (2.30)

and

\hat{\Sigma}^{\dot{x}^{(n)}} = \sum_{i=1}^{K} h_i^2(x^{(n)}) \Big[ \Sigma_i^{\dot{x}^{(n)}} - \Sigma_i^{\dot{x}^{(n)}x^{(n)}} \big(\Sigma_i^{x^{(n)}}\big)^{-1} \Sigma_i^{x^{(n)}\dot{x}^{(n)}} \Big],   (2.31)
in which the weights are determined as

h_i(x_t^{(n)}) = \frac{\Big(\sum_{j=1}^{K} h_j(x_{t-1}^{(n)})\, a_{ji}\Big)\, \mathcal{N}\big(x_t^{(n)} \,\big|\, \mu_i^{x^{(n)}}, \Sigma_i^{x^{(n)}}\big)}{\sum_{k=1}^{K} \Big[\Big(\sum_{j=1}^{K} h_j(x_{t-1}^{(n)})\, a_{jk}\Big)\, \mathcal{N}\big(x_t^{(n)} \,\big|\, \mu_k^{x^{(n)}}, \Sigma_k^{x^{(n)}}\big)\Big]},   (2.32)

where $h_i(x^{(n)})$ represents the HMM forward variable and

h_i(x_1^{(n)}) = \frac{\pi_i\, \mathcal{N}\big(x_1^{(n)} \,\big|\, \mu_i^{x^{(n)}}, \Sigma_i^{x^{(n)}}\big)}{\sum_{k=1}^{K} \pi_k\, \mathcal{N}\big(x_1^{(n)} \,\big|\, \mu_k^{x^{(n)}}, \Sigma_k^{x^{(n)}}\big)}.   (2.33)
During the reproduction, for new positions $o'^{(n)}$ and orientations $R'^{(n)}$ of the landmarks, the generalized position $\hat{x}^{(n)}$ and velocity $\hat{\dot{x}}^{(n)}$ of the end-effector, estimated using GMR for the different landmarks, are projected back into the frame of reference attached to the torso of the humanoid robot as $\hat{x}'^{(n)} = R'^{(n)}\hat{x}^{(n)} + o'^{(n)}$ and $\hat{\dot{x}}'^{(n)} = R'^{(n)}\hat{\dot{x}}^{(n)}$. The associated covariance matrices are transformed through the linear transformation property of Gaussian distributions as $\hat{\Sigma}'^{x^{(n)}} = R'^{(n)} \hat{\Sigma}^{x^{(n)}} R'^{(n)\top}$ and $\hat{\Sigma}'^{\dot{x}^{(n)}} = R'^{(n)} \hat{\Sigma}^{\dot{x}^{(n)}} R'^{(n)\top}$. Finally, the Gaussian product is used to obtain the estimated values of position and velocity in the frame of reference of the torso of the humanoid as

\mathcal{N}\big(\hat{\dot{x}}', \hat{\Sigma}'^{\dot{x}}\big) = \prod_{n=1}^{N} \mathcal{N}\big(\hat{\dot{x}}'^{(n)}, \hat{\Sigma}'^{\dot{x}^{(n)}}\big),   (2.34)

and

\mathcal{N}\big(\hat{x}', \hat{\Sigma}'^{x}\big) = \prod_{n=1}^{N} \mathcal{N}\big(\hat{x}'^{(n)}, \hat{\Sigma}'^{x^{(n)}}\big).   (2.35)
Table 2.2 summarizes the multi-streams approach. The approach was used to teach several robots different tasks, such as hitting a ball with a table tennis racket, feeding another robot with a spoon (as already mentioned) and performing a simple periodic motion. One of the performed experiments consists of feeding a robotic doll (Robota) using a humanoid robot (HOAP-3). The humanoid robot brings a spoon towards a plate to take mashed potato and then moves the spoon towards the mouth of Robota, with an appropriate path, in order to feed it. The positions and orientations of the plate and the mouth of Robota (forming the task parameters) are considered to change from one demonstration to the other.
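The fusion step of Eqs. (2.34)-(2.35) is simply a product of Gaussians. Below is a small numerical sketch of that operation, assuming the per-landmark estimates have already been projected back into the torso frame; the function name and values are illustrative only.

```python
import numpy as np

def gaussian_product(means, covs):
    """Product of N Gaussians: precision-weighted fusion as in Eqs. (2.34)-(2.35)."""
    precisions = [np.linalg.inv(S) for S in covs]
    cov = np.linalg.inv(sum(precisions))
    mean = cov @ sum(P @ m for P, m in zip(precisions, means))
    return mean, cov

# Hypothetical usage with two landmark estimates expressed in the torso frame
m1, S1 = np.array([0.10, 0.20]), np.diag([0.01, 0.04])
m2, S2 = np.array([0.15, 0.10]), np.diag([0.02, 0.02])
mean, cov = gaussian_product([m1, m2], [S1, S2])
```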
2.2.5 Parametric Hidden Markov model
The parametric hidden Markov model (PHMM) was originally introduced for recognition and prediction of gestures in Wilson and Bobick [1999], and extended in robotics to movement generation in Krüger et al. [2010] and Yamazaki et al. [2005]. We will refer to a parametric Gaussian mixture model (PGMM) when the transition and initial state probabilities are not taken into account in the likelihood estimation. In PGMM, the centres of the Gaussian components are parameterized by the task parameters, and the modulation is done through a linear relationship. Yamazaki et al. [2005] have incorporated multiple regression hidden semi-Markov models (HSMM) for the analysis and synthesis of human movements with different styles. In multiple regression HSMM, the mean of the output Gaussian distributions, as well as the mean of the duration distribution at each state, are parameterized by some factors, such as task parameters. An approach called factor adaptive training is used to learn the parameters of the HSMM. The approach was used to control walking movements in accordance with changes of factors such as walking pace and stride length, and it provided smooth and realistic human motions. Similarly, in Krüger et al. [2010], a PHMM is incorporated which assumes that the observation densities of a standard HMM, $b_i(x) = \mathcal{N}(x|\mu_i, \Sigma_i)$, are functions of the task parameters $\phi$: $b_i^{\phi}(x) = \mathcal{N}(x|\mu_i^{\phi}, \Sigma_i)$. In other words, the mean of each observation density is a function of the task parameter, $\mu_i^{\phi} = f_i(\phi)$. The dimension of the task parameter $\phi$ is determined by the task. A linear function is used to model the effect of the task parameter on the mean as

\mu_i = \bar{\mu}_i + W_i \phi,   (2.36)
where the matrices $W_i$ describe the linear variation produced by $\phi$ from the average trajectory $\bar{\mu}_i$. For learning the PHMM parameters, an extended version of the Baum-Welch approach is used, and the number of PHMM states was chosen through cross-validation. The approach was used to learn reaching movements and movements composed of grasping an object and moving it into a specific hole position. Since the approach proposed in this thesis is also based on the parameterization of GMMs, the current PGMM will be referred to as the standard PGMM from now on. The overall process is summarized in Table 2.3.
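As a brief illustration of Eq. (2.36), the sketch below modulates the Gaussian centres of a PGMM with a task parameter through the linear map $\mu_i = \bar{\mu}_i + W_i \phi$; the arrays are invented placeholders, not values from the cited experiments.

```python
import numpy as np

def pgmm_centres(mu_bar, W, phi):
    """Task-modulated centres, Eq. (2.36): mu_i = mu_bar_i + W_i @ phi."""
    return np.array([mu_bar[i] + W[i] @ phi for i in range(len(mu_bar))])

# Hypothetical model with K = 3 components, 2-D output, 2-D task parameter
mu_bar = np.zeros((3, 2))
W = np.random.randn(3, 2, 2)
phi = np.array([0.5, -0.2])
mu = pgmm_centres(mu_bar, W, phi)   # centres shifted according to phi
```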
Table 2.2: Overall process of the multi-streams approach.

1. Task demonstration ($M$ demonstrations)
   - Set $P$ (number of landmarks or candidate frames of reference)
   - for $n \leftarrow 1$ to $N$, with $N = \sum_{m=1}^{M} T_m$ (for each step)
     - Collect datapoint $[x_n; \dot{x}_n]$
     - Collect task parameters $\{o^{(j)}, R^{(j)}\}_{j=1}^{P}$
   - end
2. Model fitting
   - Set $K$ (number of components for the GMMs)
   - for $j \leftarrow 1$ to $P$ (for each candidate frame)
     - Project the dataset into the $j$-th candidate frame, $[x_n^{(j)}; \dot{x}_n^{(j)}]$
     - Initialize model parameters $\{\pi_{i,j}, \mu_{i,j}, \Sigma_{i,j}\}_{i=1}^{K}$
     - Refine the model with EM and the projected dataset
   - end
3. Reproduction
   - for $n \leftarrow 1$ to $T$ (for each reproduction step)
     - Collect/select $x_n$ and $\{o^{(j)}, R^{(j)}\}_{j=1}^{P}$
     - for $j \leftarrow 1$ to $P$ (for each candidate frame)
       - Obtain $x_n^{(j)}$ by projecting $x_n$ into the $j$-th frame
       - Use GMR to estimate $\mathcal{N}(\hat{\dot{x}}'^{(j)}_n, \hat{\Sigma}'^{\dot{x}^{(j)}}_n)$
     - end
     - Use the Gaussian product to retrieve $\mathcal{N}(\hat{\dot{x}}'_n, \hat{\Sigma}'^{\dot{x}}_n)$
   - end
Table 2.3: Overall process of the standard PGMM approach.

1. Task demonstration ($M$ demonstrations)
   - Set $P$ (number of candidate frames)
   - for $n \leftarrow 1$ to $N$, with $N = \sum_{m=1}^{M} T_m$ (for each step)
     - Collect datapoint $\xi_n$
     - Collect task parameters $\{\phi_{n,j}\}_{j=1}^{P}$
   - end
2. Model fitting
   - Set $K$ (number of components in the GMM)
   - Initialize model parameters $\{\pi_i, W_i, \bar{\mu}_i, \Sigma_i\}_{i=1}^{K}$
   - Refine the model with the modified EM algorithm
3. Reproduction
   - Set the input $I$ and output $O$ variables
   - for $n \leftarrow 1$ to $T$ (for each reproduction step)
     - Collect/select $\xi_n^{I}$ and $\{\phi_{n,j}\}_{j=1}^{P}$ (concatenated in a vector $Q_d$)
     - Estimate temporary GMM parameters $\{\mu_{n,i}\}_{i=1}^{K}$ modelling $\xi_n^{I}$ and $\xi_n^{O}$ as $\xi_n^{I}, \xi_n^{O} \sim \sum_{i=1}^{K} \pi_i \mathcal{N}(\mu_{n,i}, \Sigma_i)$
     - Use GMR to retrieve $\xi_n^{O}$ as $\xi_n^{O} \,|\, \xi_n^{I} \sim \mathcal{N}(\hat{\mu}_n, \hat{\Sigma}_n)$
   - end
Chapter 3
Task-parameterized Gaussian Mixture Model (TP-GMM)

In this chapter, the proposed task-parameterized GMM approach is described in detail, at first as a stand-alone approach. Afterwards, the combination of the proposed approach with dynamical systems is presented. Experimental results are presented for both scenarios.
3.1
Proposed approach
The proposed approach shares connections with standard PGMM models, but modulates both the centres and the covariances of the Gaussians. The learning problem is set as maximizing the log-likelihood of the observations in different candidate frames, under the constraint that these observations are generated by the same source. Namely, each frame $j$ observes the same training datapoint $\xi_n$ from its own perspective through a local projection. Similarly to the estimation of the parameters of a standard GMM, deriving this constrained optimization problem results in an expectation-maximization (EM) algorithm that is guaranteed to improve the likelihood of the model at each iteration. In the proposed approach, only task parameters that can be described as a coordinate system are taken into account. A frame of reference is defined here as a coordinate system represented by a position $b$ (origin of the observer) and a set of basis vectors $\{e_1, e_2, \cdots\}$ forming a transformation matrix $A = [e_1\; e_2\; \cdots]$. The frames need to have the same number of rows but can have varying numbers of columns (e.g., to consider constraints in both configuration space and task space). Only square frames of reference defined by orthogonal coordinate systems in task space are considered in this thesis. The coordinate system can include time as a coordinate, or any other variable relevant for the task. An observed movement can be projected into the different candidate frames of reference.
Figure 3.1: Illustration of a task-parameterized movement based on the proposed approach. The task consists of reaching frame 2 from frame 1 with a desired approach angle, which is representative of skills such as reaching an object, inserting a peg in a hole, or moving a car from one parking spot to another. $w_i$ shows the relative importance of the different frames.
The proposed model can adapt the motion in real-time with respect to the current position/orientation of the frames. We assume that each demonstration $m \in \{1, \cdots, M\}$ contains $T_m$ datapoints, forming a dataset $\{\xi_n\}_{n=1}^{N}$ with $N = \sum_{m=1}^{M} T_m$, where each datapoint $\xi_n$ is associated with a set of task parameters $\{A_{n,j}, b_{n,j}\}_{j=1}^{P}$, representing the $P$ candidate frames of reference. The datapoint can for example be $\xi_n = [t_n; x_n]$, in which $t$ represents time and $x$ represents position in Cartesian space; however, it can encompass any kind of variables. The parameters of the proposed model are $\{\pi_i, Z_{i,j}^{\mu}, Z_{i,j}^{\Sigma}\}$, representing respectively the mixing coefficients, centres and covariance matrices for each frame $j$ and mixture component $i$. At iteration $n$, the resulting centre $\mu_{n,i}$ and covariance matrix $\Sigma_{n,i}$ of each component $i$ correspond to products of linearly transformed Gaussians

\mathcal{N}(\mu_{n,i}, \Sigma_{n,i}) = \prod_{j=1}^{P} \mathcal{N}\big(A_{n,j} Z_{i,j}^{\mu} + b_{n,j},\; A_{n,j} Z_{i,j}^{\Sigma} A_{n,j}^{\top}\big),   (3.1)
computed as

\Sigma_{n,i} = \Big(\sum_{j=1}^{P} \big(A_{n,j} Z_{i,j}^{\Sigma} A_{n,j}^{\top}\big)^{-1}\Big)^{-1},   (3.2)

\mu_{n,i} = \Sigma_{n,i} \sum_{j=1}^{P} \big(A_{n,j} Z_{i,j}^{\Sigma} A_{n,j}^{\top}\big)^{-1} \big(A_{n,j} Z_{i,j}^{\mu} + b_{n,j}\big).   (3.3)
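A compact numerical sketch of Eqs. (3.2)-(3.3) is given below: for each component, the frame-local Gaussians are mapped through their task parameters and fused by a product of Gaussians. It is an illustrative implementation under the notation of this chapter, not the exact code used in the experiments.

```python
import numpy as np

def tpgmm_component(A_list, b_list, Z_mu, Z_sigma):
    """Resulting centre/covariance of one component, Eqs. (3.2)-(3.3).

    A_list, b_list: task parameters {A_j, b_j} of the P candidate frames.
    Z_mu[j], Z_sigma[j]: frame-local centre and covariance of the component.
    """
    precision, weighted_mean = 0, 0
    for A, b, mu, sigma in zip(A_list, b_list, Z_mu, Z_sigma):
        S = A @ sigma @ A.T            # linearly transformed covariance
        P = np.linalg.inv(S)
        precision = precision + P
        weighted_mean = weighted_mean + P @ (A @ mu + b)
    Sigma_n = np.linalg.inv(precision)
    mu_n = Sigma_n @ weighted_mean
    return mu_n, Sigma_n
```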
As an illustrative example, a typical task is shown in Fig. 3.1. Several demonstrations with different positions and orientations of frame 2 are plotted in Fig. 3.1(a). In (b)-(c), the same movements observed from the point of view of frames 1 and 2 are shown. (d) shows the reproduction process. The model parameters $Z_{i,j}^{\mu}$ and $Z_{i,j}^{\Sigma}$ are projected into the new positions and orientations of frames 1 and 2, and each set of Gaussians in their candidate frames is multiplied to form a resulting Gaussian. These resulting Gaussians form a trajectory (or tube) that is used to move the system from frame 1 to frame 2. The covariance provides information about the variability of the movement.

The parameters of the model are iteratively estimated with the following EM procedure.

E-step:

\gamma_{n,i} = \frac{\pi_i\, \mathcal{N}(\xi_n \,|\, \mu_{n,i}, \Sigma_{n,i})}{\sum_{k=1}^{K} \pi_k\, \mathcal{N}(\xi_n \,|\, \mu_{n,k}, \Sigma_{n,k})}.   (3.4)
M-step:

\pi_i = \frac{\sum_{n=1}^{N} \gamma_{n,i}}{N}, \qquad Z_{i,j}^{\mu} = \frac{\sum_{n=1}^{N} \gamma_{n,i}\, A_{n,j}^{-1} [\xi_n - b_{n,j}]}{\sum_{n=1}^{N} \gamma_{n,i}},   (3.5)

Z_{i,j}^{\Sigma} = \frac{\sum_{n=1}^{N} \gamma_{n,i}\, A_{n,j}^{-1} [\xi_n - \tilde{\mu}_{n,i,j}] [\xi_n - \tilde{\mu}_{n,i,j}]^{\top} A_{n,j}^{-\top}}{\sum_{n=1}^{N} \gamma_{n,i}},   (3.6)

with

\tilde{\mu}_{n,i,j} = A_{n,j} Z_{i,j}^{\mu} + b_{n,j}.   (3.7)
Note that, in contrast to the multi-streams approach, a single EM process is used to iteratively refine the model parameters, where the E-step considers the influence of the different frames for clustering the data in a common frame; see Eqs. (3.2) and (3.3) for the computation of $\mu_{n,i}$ and $\Sigma_{n,i}$ in the E-step. The model parameters are initialized with a k-means procedure, modified similarly to the EM algorithm.
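For illustration, one iteration of the EM procedure in Eqs. (3.4)-(3.7) can be sketched as follows. It reuses the hypothetical `tpgmm_component` helper from the previous sketch together with SciPy's Gaussian density, and is meant only as a reading aid, not as the exact implementation used in this thesis.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_step(xi, A, b, pi, Z_mu, Z_sigma):
    """One EM iteration for TP-GMM (Eqs. 3.4-3.7).

    xi: (N, D) datapoints; A: (N, P, D, D); b: (N, P, D).
    pi: (K,); Z_mu: (K, P, D); Z_sigma: (K, P, D, D).
    """
    N, K, P = len(xi), len(pi), A.shape[1]
    # E-step: responsibilities based on the temporary (frame-fused) Gaussians
    gamma = np.zeros((N, K))
    for n in range(N):
        for i in range(K):
            mu, sigma = tpgmm_component(A[n], b[n], Z_mu[i], Z_sigma[i])
            gamma[n, i] = pi[i] * multivariate_normal.pdf(xi[n], mu, sigma)
    gamma /= gamma.sum(axis=1, keepdims=True)
    # M-step: update priors and frame-local centres/covariances
    for i in range(K):
        w = gamma[:, i]
        pi[i] = w.mean()
        for j in range(P):
            loc = np.array([np.linalg.inv(A[n, j]) @ (xi[n] - b[n, j]) for n in range(N)])
            Z_mu[i, j] = (w[:, None] * loc).sum(0) / w.sum()
            diff = loc - Z_mu[i, j]
            Z_sigma[i, j] = (w[:, None, None] * np.einsum('nd,ne->nde', diff, diff)).sum(0) / w.sum()
    return pi, Z_mu, Z_sigma
```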
During the reproduction phase, the input $I$ and output $O$ variables should be specified in the beginning. Then, at each time step $n$, the input data $\xi_n^{I}$ together with the task parameters $\{A_{n,j}, b_{n,j}\}_{j=1}^{P}$ are collected. Using this information and the learned model parameters, the temporary GMM parameters $\{\mu_{n,i}, \Sigma_{n,i}\}_{i=1}^{K}$ are estimated, using Eqs. (3.2) and (3.3), to obtain the joint distribution between $\xi_n^{I}$ and $\xi_n^{O}$ as

\xi_n^{I}, \xi_n^{O} \sim \sum_{i=1}^{K} \pi_i\, \mathcal{N}(\mu_{n,i}, \Sigma_{n,i}).   (3.8)

Then GMR is used to retrieve $\xi_n^{O}$ given $\xi_n^{I}$ as

\xi_n^{O} \,|\, \xi_n^{I} \sim \mathcal{N}(\hat{\mu}_n, \hat{\Sigma}_n).   (3.9)
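A compact sketch of the GMR conditioning in Eq. (3.9) is given below; it conditions the temporary GMM on the input dimensions (e.g. time) and returns the expected output. The index lists and function name are illustrative assumptions.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmr(pi, mus, sigmas, x_in, in_idx, out_idx):
    """Condition a GMM on the input dimensions (Eq. 3.9) and return E[out | in]."""
    K = len(pi)
    h = np.array([pi[i] * multivariate_normal.pdf(x_in, mus[i][in_idx],
                  sigmas[i][np.ix_(in_idx, in_idx)]) for i in range(K)])
    h /= h.sum()
    out = np.zeros(len(out_idx))
    for i in range(K):
        S_oi = sigmas[i][np.ix_(out_idx, in_idx)]
        S_ii = sigmas[i][np.ix_(in_idx, in_idx)]
        out += h[i] * (mus[i][out_idx] + S_oi @ np.linalg.solve(S_ii, x_in - mus[i][in_idx]))
    return out
```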
Table 3.1: Overall process of the proposed PGMM approach.

1. Task demonstration ($M$ demonstrations)
   - Set $P$ (number of candidate frames)
   - for $n \leftarrow 1$ to $N$, with $N = \sum_{m=1}^{M} T_m$ (for each step)
     - Collect datapoint $\xi_n$
     - Collect task parameters $\{A_{n,j}, b_{n,j}\}_{j=1}^{P}$
   - end
2. Model fitting
   - Set $K$ (number of components in the GMM)
   - Initialize model parameters $\{\pi_i, \{Z_{i,j}^{\mu}, Z_{i,j}^{\Sigma}\}_{j=1}^{P}\}_{i=1}^{K}$
   - Fit a model to $\xi$ with the modified EM algorithm in Eqs. (3.4)-(3.7)
3. Reproduction
   - Set the input $I$ and output $O$ variables
   - for $n \leftarrow 1$ to $T$ (for each reproduction step)
     - Collect/select $\xi_n^{I}$ and $\{A_{n,j}, b_{n,j}\}_{j=1}^{P}$
     - Use Eqs. (3.2) and (3.3) to estimate temporary GMM parameters $\{\mu_{n,i}, \Sigma_{n,i}\}_{i=1}^{K}$ modelling $\xi_n^{I}$ and $\xi_n^{O}$ as $\xi_n^{I}, \xi_n^{O} \sim \sum_{i=1}^{K} \pi_i \mathcal{N}(\mu_{n,i}, \Sigma_{n,i})$
     - Use GMR to retrieve $\xi_n^{O}$ as $\xi_n^{O} \,|\, \xi_n^{I} \sim \mathcal{N}(\hat{\mu}_n, \hat{\Sigma}_n)$
   - end

Table 3.1 describes the overall process.¹ Model selection is compatible with the techniques employed in standard GMM (Bayesian information criterion, Dirichlet processes, etc.).

¹ In step 3), the temporary GMM parameters do not need to be re-estimated if $\{A_{n,j}, b_{n,j}\}_{j=1}^{P}$ do not change over time.
Figure 3.2: Demonstrating the task to the robot by kinesthetically moving it. The aim of the task is to move the rolling pin towards the center of the dough, then move it forward and backward in the direction of the smaller diameter in order to make the dough circular.
3.2
Experiment: Rolling out a Pizza dough using a robotic manipulator
To get an idea of the performance of the proposed approach, its results are compared here with the results obtained with three other approaches, namely the multi-streams approach, the standard PGMM, and GPR with a trajectory models database. The mentioned approaches are adapted to use the same task parameters as in the proposed approach. In the case of the standard PGMM, the task parameters $\{A_{n,j}, b_{n,j}\}_{j=1}^{P}$ are concatenated to form a vector $Q_d$, which modulates the centres through a linear relationship as

\mu_{n,i} = \tilde{Z}_i [Q_d, 1]^{\top},   (3.10)
with model parameters $\tilde{Z}_i$; see Wilson and Bobick [1999] for details. Note that only the centres of the Gaussians are parameterized in the standard PGMM, and the covariances are estimated as constant matrices $\Sigma_i$ with the standard EM procedure for GMM. Namely, if the candidate frames do not move during the movement, $\{\mu_{n,i}, \Sigma_{n,i}\}_{i=1}^{K}$ can be evaluated only once at the beginning of the movement to reduce computation.
Table 3.2: Overall process of the adapted multi-streams approach.

1. Task demonstration (same as in Table 3.1)
2. Model fitting
   - for $j \leftarrow 1$ to $P$ (for each candidate frame)
     - Set $K_j$ (number of components for frame $j$)
     - Fit a GMM $\{\pi_{i,j}, \mu_{i,j}, \Sigma_{i,j}\}_{i=1}^{K_j}$ to $\{A_{n,j}^{-1}[\xi_n - b_{n,j}]\}_{n=1}^{N}$ with EM
   - end
3. Reproduction
   - Set the input $I$ and output $O$ elements
   - for $n \leftarrow 1$ to $T$ (for each reproduction step)
     - Collect/select $x_n$ and $\{A_{n,j}, b_{n,j}\}_{j=1}^{P}$
     - for $j \leftarrow 1$ to $P$ (for each candidate frame)
       - Use GMR to estimate $\mathcal{N}(\hat{\mu}^{O}_{n,j}, \hat{\Sigma}^{O}_{n,j})$
       - Project the resulting Gaussian with $A_{n,j}$, $b_{n,j}$
     - end
     - Multiply the projected Gaussians to retrieve $\xi_n^{O}$
   - end
In the case of the GPR with trajectory models database approach, for each single demonstration $m$, the associated dataset is $\xi_m$ and the task parameters $\{A_{m,j}, b_{m,j}\}_{j=1}^{P}$ are considered to be constant all along the demonstration. These task parameters are concatenated to form a vector $q_m$. For each demonstration, a GMM is fitted and the model parameters (output variables) are recorded together with the associated task parameters (query points). This information is then used to retrieve new model parameters from the new task parameters by means of GPR. More specifically, for a new query point, after centring the training data, the joint distribution of the demonstrated and new outputs can be estimated as

\begin{bmatrix} \Theta \\ \Theta_d \end{bmatrix} = \mathcal{N}\left(0, \begin{bmatrix} K(Q, Q) + \sigma^2 I & K(Q, Q_d) \\ K(Q_d, Q) & K(Q_d, Q_d) \end{bmatrix}\right),   (3.11)

where $Q$ is the concatenation of the query points $q_m$ and $\Theta$ is the concatenation of the outputs $\theta_m$. Squared exponential covariance functions $K$ are considered, with hyper-parameters optimized for the specific extrapolation requirements of the experiment.
Table 3.3: Overall process of the adapted standard PGMM approach.

1. Task demonstration (same as in Table 3.1)
2. Model fitting
   - Set $K$ (number of components in the GMM)
   - Fit a PGMM $\{\pi_i, \tilde{Z}_i, \Sigma_i\}_{i=1}^{K}$ to $\xi$ with the EM algorithm of Wilson and Bobick [1999]
3. Reproduction
   - Set the input $I$ and output $O$ variables
   - for $n \leftarrow 1$ to $T$ (for each reproduction step)
     - Collect/select $\xi_n^{I}$ and $\{A_{n,j}, b_{n,j}\}_{j=1}^{P}$ (concatenated in a vector $Q_d$)
     - Use Eq. (3.10) to estimate temporary GMM parameters $\{\mu_{n,i}, \Sigma_{n,i}\}_{i=1}^{K}$ modelling $\xi_n^{I}$ and $\xi_n^{O}$ as $\xi_n^{I}, \xi_n^{O} \sim \sum_{i=1}^{K} \pi_i \mathcal{N}(\mu_{n,i}, \Sigma_{n,i})$
     - Use GMR to retrieve $\xi_n^{O}$ as $\xi_n^{O} \,|\, \xi_n^{I} \sim \mathcal{N}(\hat{\mu}_n, \hat{\Sigma}_n)$
   - end

The expected outputs $\hat{\Theta}$ associated with the new query points $Q_d$ are given by

\hat{\Theta} = K(Q_d, Q)\,[K(Q, Q) + \sigma^2 I]^{-1}\, \Theta,   (3.12)

with the covariance of the prediction given by

\hat{\Sigma}^{\Theta} = K(Q_d, Q_d) - K(Q_d, Q)\,[K(Q, Q) + \sigma^2 I]^{-1} K(Q, Q_d).   (3.13)
The above formulation is independent of the trajectory model parameterization employed. Here, it is implemented as a GMM, and GMR is used to regenerate new trajectories; see Calinon et al. [2007]. The adapted algorithms used for the experiment are shown in Tables 3.2, 3.3 and 3.4 for the multi-streams, the standard PGMM and the GPR with trajectory models database approaches, respectively. Note that in the GPR with trajectory models database approach, a GMM is used instead of a DMP for modelling each demonstration.
3.2.1
Experimental setup
We consider the movement of rolling out a pizza dough demonstrated by kinesthetic teaching, see Fig. 3.2. Such a task is used as an example of movements modulated locally by the position and orientation of an object (the dough). The user simulates a movement that would increase the dough surface to make the pizza circular.²

² In a real pizza-making process, other elements would come into play, such as friction, softness of the dough, flour on the worktop, etc. In our experiment, the dough is pre-shaped in an elliptical form and no force is applied to it.

The task is to move the rolling pin from an initial pose in the
robot workspace to the center of the dough, then move the rolling pin back and forth in a direction following the minor axis of the dough shape, and finally lift the rolling pin to let the vision system track the dough again. The experiment is implemented on a Barrett WAM torque-controlled 7-DOF manipulator with a rolling pin mounted at the end-effector. The robot is gravity-compensated during demonstrations and reproductions, with a wrench command to keep the orientation of the end-effector perpendicular to the worktop. The resulting trajectories are tracked by a virtual spring-damper system, whose attractor and stiffness matrix are respectively defined by the centres of the retrieved Gaussians in Eq. (3.9) and the associated precision matrices (inverses of covariances). Forces acting as gravity compensation are superposed to the tracking forces, resulting in a safe controller for the user, who can exploit the redundancy of the robot and the redundancy of the task during demonstration and reproduction; see Calinon et al. [2010b] for details of the controller. For example, when sharing the workspace with the robot, the user can, during reproduction, change the position of the robot's elbow to have more space.

The input variable of the model is time, $t$, and the output variable is the Cartesian position of the rolling pin, $x$. Two candidate frames are considered ($P = 2$): the fixed robot frame (useful for the first part of the motion) and a frame defined by the dough location and shape extracted by a camera (based on color information). By assuming that the duration of the movement is not modulated by the task parameters, we have

b_{n,1} = 0, \quad A_{n,1} = I, \quad \text{and} \quad b_{n,2} = \begin{bmatrix} 0 \\ p_n \end{bmatrix}, \quad A_{n,2} = \begin{bmatrix} 1 & 0 \\ 0 & R_n \end{bmatrix},

with $p_n$ and $R_n$ the position and orientation of the dough at time step $n$ ($A_{n,j} \in \mathbb{R}^{4\times4}$, $I$ is the identity matrix, $0$ are zero vectors of appropriate sizes, and $R_n$ is a direction cosine matrix). The motions are described by three variables ($D = 3$) representing the position of the rolling pin, with an orientation actively maintained to follow the movement direction. Four demonstrations with different positions and orientations of the dough are provided to the robot by kinesthetic teaching. Models with $K = 6$ components are considered in the experiment (selected empirically).
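As a small illustration of how the task parameters above can be assembled in practice, the snippet below builds $\{A_{n,j}, b_{n,j}\}$ for the two candidate frames from the tracked dough position and orientation. It is a sketch under the conventions of this section (time as first dimension, 3-D position), with made-up numerical values.

```python
import numpy as np

def dough_frame(p, R):
    """Task parameters of the dough frame: b = [0; p], A = blkdiag(1, R)."""
    A = np.eye(4)
    A[1:, 1:] = R          # orientation of the dough (direction cosine matrix)
    b = np.zeros(4)
    b[1:] = p              # position of the dough; the time offset stays 0
    return A, b

# Frame 1: fixed robot frame; Frame 2: dough frame from the vision system
A1, b1 = np.eye(4), np.zeros(4)
p_n = np.array([0.55, -0.10, 0.02])          # hypothetical dough centre
theta = np.deg2rad(30.0)                     # hypothetical dough orientation
R_n = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta),  np.cos(theta), 0.0],
                [0.0,            0.0,           1.0]])
A2, b2 = dough_frame(p_n, R_n)
```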
Figure 3.3: Snapshots of a typical reproduction result performed in a new situation that has not been demonstrated to the robot.
3.2.2
Experimental results
Fig. 3.3 presents snapshots of the reproduction results in new situations.³ Fig. 3.4 presents the demonstrations and the reproduction results. For the same situations, all models produce smooth trajectories passing over the dough. Namely, the movement starts with a path in the robot frame that progressively moves towards the dough and, when reaching it, moves the rolling pin back and forth in a direction parallel to the minor axis of the dough. Fig. 3.5 presents interpolation and extrapolation results. All approaches show good interpolation capability (first column of graphs). The proposed approach and the multi-streams approach stand out in terms of extrapolation (last three columns of graphs), both being capable of adapting the movement outside the region covered by the demonstrations (depicted by '+' signs). Fig. 3.6 provides quantitative comparisons with four criteria, highlighting the overall advantages of the proposed approach. The first two graphs show that, for the same situations, the proposed model can regenerate movements closely matching the demonstrations, with the best likelihood fit over the other approaches. The third graph shows the computation time improvement over the multi-streams approach, which requires more Gaussian product operations. The last graph shows that the proposed approach has on average the best extrapolation capability, with more consistent results than the other approaches.

³ The most representative reproduction attempts are depicted here, since the results can randomly differ with the k-means initialization.
Figure 3.4: Demonstrations and reproductions in the same situations (top-view of the worktop). Initial positions are plotted with points, and doughs are plotted with ellipses.
These results are consistent with our expectation that task-parameterized models extracting the local structure of the task can improve generalization and speed up the retrieval process.
3.3
Combining task-parameterized GMM with dynamical systems
Dynamical systems (DS) have been used as a powerful tool to provide robots with the flexibility to solve multiple task constraints under various sources of perturbation. Using DS, the robot can swiftly react to continuous perturbations; see Schaal et al. [2007], Hoffmann et al. [2009] and S. M. Khansari-Zadeh [2011]. DS also simplify the coexistence with the robot's other controllers by sharing a similar state-space representation of the movement (e.g., with stiffness and damping parameters conventional
Figure 3.5: Reproductions in new situations (new task parameters). The ’+’ signs depict the dough positions used to train the model.
to both fields). In particular, we are interested here in the subset of dynamical systems that can simulate virtual mass-spring-damper systems, which encapsulate the physical notions of inertia, stiffness and damping that can be modified on-the-fly to modulate the characteristics of the movement in the vicinity of humans. Such systems may also facilitate the inclusion of active impedance control strategies into actuated systems with elements of passive compliance. Here, we combine the proposed task-parameterized GMM with dynamical systems in order to benefit from both approaches. The overall approach can be illustrated as follows. The motion of the robot is assumed to be driven by a set of virtual springs connected to a set of candidate objects or body parts of the robot (e.g. end-effectors). The learning problem consists of estimating when and where to activate these springs. This can be learned from demonstrations by exploiting the invariant
Figure 3.6: Comparisons by considering the root-mean-square error between the demonstrated and reproduced trajectories (for the same situations as the demonstrations), the likelihood of the model retrieved after regression, the computation time for each reproduction step and a simulated surface elongation score for the reproductions with new task parameters. This score estimates the extrapolation capability by using the datapoints on the surface of the dough to measure the cumulated distance along the minor axis minus the cumulated distance along the major axis. This score reflects the requirement of spreading the dough in a desired direction. The error bars show standard deviations over the sets of datapoints and trials.
characteristics of the task (the parts of the movement that are similar across the multiple demonstrations). Consistent demonstrations will result in stronger springs, while irrelevant connections will vanish. A set of candidate frames of reference (represented as coordinate systems) is predefined by the experimenter. This set remains valid for a wide range of applications (e.g. the hands of the robot are relevant for most manipulation skills). The role of the robot is to autonomously figure out which frames of reference matter along the task, and in which way the movement should be modulated with respect to these frames. The robot can also learn that a frame is not relevant for the task. However, predefining too many candidate frames may require the user to provide a large number of demonstrations to obtain statistically relevant information, which would conflict with the aspiration of the approach to transfer skills in a user-friendly manner.

In Calinon et al. [2009], the movement of a robot's end-effector is represented as a virtual mass driven by a weighted superposition of spring-damper systems. By setting $x$, $\dot{x}$ and $\ddot{x}$ as the positions, velocities
Figure 3.7: Illustrative example of movement learning through the superposition of virtual springs. The left graph depicts an iteration step in the middle of the movement. The pink arrow represents the resulting spring force pointing to the current virtual attractor $\hat{y}$. The right graph shows the stochastic movement generation process that preserves the smoothness and variability of the demonstrations.
and accelerations of the end-effector, the movement is described as

\ddot{x} = \sum_{i=1}^{K} h_i\, K_i^{P} [\mu_i^{x} - x] - \kappa^{V} \dot{x},   (3.14)

where $K_i^{P}$, $\kappa^{V}$ and $\mu_i^{x}$ are respectively the full stiffness matrix, damping term and attractor point of the $i$-th virtual spring, and $K$ is the number of spring-damper systems. The connection of this model with dynamic movement primitives (DMP) is discussed in Calinon et al. [2010b], Hoffmann et al. [2009] and Calinon et al. [2009]. A noticeable difference is that the non-linear force modulating the movement in the original DMP formulation is now expressed as additional sets of virtual springs, adding local corrective terms that can swiftly react to perturbations during reproductions. More generically, the above system can be written as⁴

\ddot{x} = \hat{K}^{P} [\hat{y} - x] - \kappa^{V} \dot{x},   (3.15)

where $\hat{y}$ denotes the evolution/path of the virtual attractor. The learning problem is formulated here as the estimation of the path of $\hat{y}$ and changing stiffness $\hat{K}^{P}$ that will pull the end-effector to follow the behaviours demonstrated by the user.

⁴ Here the equivalence is found by setting $\hat{y} = \sum_i h_i \mu_i^{x}$ and $\hat{K}^{P} = \sum_i h_i K_i^{P}$.

By first assuming that the movement is driven by Eq. (3.15)
with a diagonal stiffness matrix with gain $\kappa^{P}$, and after homogeneous rescaling of the human demonstrations to match the robot/human size ratio, the collected datapoints $x$, $\dot{x}$ and $\ddot{x}$ are transformed into a variable $y = \ddot{x}\,\frac{1}{\kappa^{P}} + \dot{x}\,\frac{\kappa^{V}}{\kappa^{P}} + x$, corresponding to the estimation of the position of the virtual attractor for each datapoint.⁵

In order to present the approach didactically, we will for now assume that time $t$ is the driving mechanism of the movement. We will show later that other mechanisms can be used here (e.g. a time-independent autonomous system). In this context, a Gaussian mixture model (GMM) is used to encode the joint probability distribution $\mathcal{P}(t, y)$. EM is used to estimate the priors (mixing coefficients) $\pi_i$, centres $\mu_i = \begin{bmatrix} \mu_i^{t} \\ \mu_i^{y} \end{bmatrix}$ and covariances $\Sigma_i = \begin{bmatrix} \Sigma_i^{t} & \Sigma_i^{ty} \\ \Sigma_i^{yt} & \Sigma_i^{y} \end{bmatrix}$ of the GMM. Several model selection approaches exist to estimate the number of Gaussians in a GMM (see e.g. Calinon et al. [2009]). GMR is then used to estimate, at each time iteration, the conditional probability $\mathcal{P}(y|t)$, estimated in the form of a new Gaussian distribution $\mathcal{N}(\hat{\mu}^{y}, \hat{\Sigma}^{y})$; see Eqs. (2.29)-(2.33) for the details of GMR. As already discussed, GMR can retrieve control commands in real-time, independently of the number of datapoints in the training set. Indeed, the data retrieval problem in GMR is considered as a joint probability estimation problem. The joint distribution of the data is first approximated by a mixture of Gaussians with EM. An estimate of the outputs can then be computed for each new input in the form of another mixture of Gaussians, by exploiting various properties of Gaussians (linear transformations, products and conditional probability). This output provides additional information about the variation and coordination of the movement variables (local shape of the movement). In GMR, there is no distinction between the input and output variables when learning the model. Namely, any subset of input-output dimensions can be selected, and expectations on the remaining dimensions can be computed in real-time. It corresponds to a convex sum of linear approximations (with weights varying non-linearly); see Ghahramani and Jordan [1994]. In terms of computation, learning the model depends linearly on the number of datapoints, while the prediction is independent of this number, which makes the approach an interesting alternative to kernel-based regression methods such as Gaussian process regression (GPR), whose processing time grows with the size of the dataset. $\hat{\mu}^{y}$ and $\hat{\Sigma}^{y}$ provide estimates of the attractor point and its variability in the form of a covariance matrix, at time step $t$. The changing stiffness profile can be estimated as being inversely proportional to
⁵ An illustrative analogy of the problem consists of estimating the trajectory of a boat pulling a water-skier such that the skier follows a desired path, with the rope acting as a spring of given stiffness and damping.
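A minimal sketch of the datapoint transformation described above (before the footnote) is given below: each demonstrated position, velocity and acceleration sample is mapped to the corresponding virtual attractor position $y$. The gains and trajectory are illustrative values, not those used in the experiments.

```python
import numpy as np

def attractor_path(x, xd, xdd, kP=100.0, kV=20.0):
    """Transform demonstrated x, x_dot, x_ddot into the virtual attractor variable
    y = x_ddot / kP + x_dot * kV / kP + x (diagonal stiffness gain kP, damping kV)."""
    return xdd / kP + xd * kV / kP + x

# Hypothetical demonstration of shape (T, 3): position, velocity, acceleration
T = 200
t = np.linspace(0.0, 2.0, T)
x = np.stack([np.sin(t), np.cos(t), 0.1 * t], axis=1)
xd = np.gradient(x, t, axis=0)
xdd = np.gradient(xd, t, axis=0)
y = attractor_path(x, xd, xdd)   # attractor positions used as training data for the GMM
```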
Figure 3.8: Information flow of the overall process. In the demonstration phase, movements are recorded as a set of datapoints $\{x_n, \dot{x}_n, \ddot{x}_n\}$ together with the task parameters $\{A_{n,j}, b_{n,j}\}_{j=1}^{P}$, representing coordinate systems. In the learning phase, the model parameters $\{\pi_i, Z_{i,j}^{\mu}, Z_{i,j}^{\Sigma}\}$ are estimated from the demonstrations using the EM algorithm. In the reproduction phase, for new observations of the task parameters $\{A_{n,j}, b_{n,j}\}$, the system estimates on-the-fly the movement commands $\ddot{x}_n$ for each iteration $n$.
the variation in the movement. The details are presented in Calinon et al. [2010b]. Fig. 3.7 illustrates the overall mechanism of the approach.

The representation of the dynamical systems parameters of Eq. (3.15) in the form of a standard GMM has several advantages. It is exploited here to encapsulate dynamical systems learning into a standard Gaussian mixture problem, which allows its extension to models encoding movements with respect to multiple landmarks in the robot's environment. We now consider the case in which the virtual spring-damper systems can act in various candidate frames of reference. Here, instead of passing the position of the end-effector $x_n$ as the datapoints of the task-parameterized GMM, $y_n$ is used to build the dataset, $\xi_n = [t_n; y_n]$. Afterwards, the parameters of the task-parameterized GMM are learned using the EM algorithm. Then, during the reproduction phase, given the new task parameters and the time step, the new virtual attractor point is estimated and used to calculate the new acceleration command. The overall workflow of the process is illustrated in Fig. 3.8.
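To close the loop, the sketch below shows one reproduction iteration under the combination just described: the temporary GMM is built from the current task parameters, conditioned on time via GMR to obtain the attractor $\hat{y}$, and the acceleration command of Eq. (3.15) is computed with a constant diagonal gain for simplicity. It reuses the hypothetical `tpgmm_component` and `gmr` helpers sketched earlier and is not the exact controller of Calinon et al. [2010b].

```python
import numpy as np

def reproduction_step(t, x, x_dot, model, A_list, b_list, kP=100.0, kV=20.0):
    """One control iteration: estimate the attractor via TP-GMM + GMR, then Eq. (3.15)."""
    pi, Z_mu, Z_sigma = model                  # learned TP-GMM parameters
    mus, sigmas = [], []
    for i in range(len(pi)):                   # temporary GMM for the current task parameters
        mu_i, S_i = tpgmm_component(A_list, b_list, Z_mu[i], Z_sigma[i])
        mus.append(mu_i)
        sigmas.append(S_i)
    D = len(x)
    in_idx, out_idx = [0], list(range(1, D + 1))   # time in, attractor position out
    y_hat = gmr(pi, mus, sigmas, np.array([t]), in_idx, out_idx)
    x_ddot = kP * (y_hat - x) - kV * x_dot         # acceleration command, Eq. (3.15)
    return x_ddot
```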
3.3.1 Experiment: Transferring skills to a humanoid robot

3.3.1.1 Experimental setup
The compliant humanoid robot (COMAN) is used in this experiment. It has been designed to explore how compliance can be exploited for safer human-robot interaction, reduced energy consumption, and faster learning capabilities; see Tsagarakis et al. [2011] for more details about this robot.
Figure 3.9: Full-body compliant humanoid robot COMAN developed at IIT.
Fig. 3.9 shows the joints of the robot endowed with passive compliance (series elastic actuators). For each arm of the robot, three candidate frames ($P = 3$) are considered: a frame attached to a wooden box object ($p = 1$), the robot's upper torso ($p = 2$), and the robot's other hand ($p = 3$). The positions of the objects and of the robot are tracked with a marker-based NaturalPoint OptiTrack motion capture system. It is composed of 12 cameras tracking the position and orientation of predefined landmarks at a rate of 30 frames/second with a position tracking accuracy below 10 mm. Two additional sets of markers are used in the demonstration phase, placed on the back of the demonstrator's hands. During reproduction, the robot's upper torso and the box are tracked by the vision system. The position of the hands is determined by proprioception (from the motor encoders and forward kinematics from the upper torso's tracked frame). The legs and torso are controlled to let the robot stand by continuously reacting to perturbations with a stabilization control scheme adapted from Li et al. [2012], exploiting the intrinsic and controlled compliance of the robot. The arms are controlled with an admittance controller to let the user physically interact with the robot by grasping and moving its arms. In total, 8 degrees of freedom (DOFs) are used to reproduce the learned skills (namely, the two arms), while the rest of the body DOFs are used to react to perturbation
and maintain balance. Three examples of movement behaviours are presented, which will be labelled the clapping, tracking and floor-sweeping tasks. The first two are reproduced on the robot. The floor-sweeping task is reproduced in simulation for practical reasons, with the future goal of reproducing the task on the real platform when COMAN will be endowed with wrists and/or hands to hold a broom. The examples are chosen to be didactic and simple enough to be visualized and analysed after learning. Each of these behaviours could have been preprogrammed separately with a few lines of code, but here the aim is to provide a model that can be used to transfer various tasks without modification of the main program. The transfer of tasks is thus not limited to this specific set of examples. For videos of the experiments, refer to http://programming-by-demonstration.org/.

3.3.1.2 Time-based and time-invariant movements
The approach is tested with bimanual gestures, which is a challenging learning problem in humanoids; see Gribovskaya and Billard [2008] and Adorno et al. [2010]. Let us denote the positions at iteration $n$ of the left and right hands as $x_n^{L}$ and $x_n^{R}$, and represent their orientations with rotation matrices $R_n^{L}$ and $R_n^{R}$. For time-based movements, a bimanual motion is retrieved by estimating $\mathcal{P}(y^{L}|t)$ and $\mathcal{P}(y^{R}|t)$ from the Gaussians

\mathcal{N}(\mu_{n,i}^{L}, \Sigma_{n,i}^{L}) = \prod_{j=1}^{P} \mathcal{N}\big(A_{n,j}^{L} Z_{i,j}^{L\mu} + b_{n,j}^{L},\; A_{n,j}^{L} Z_{i,j}^{L\Sigma} (A_{n,j}^{L})^{\top}\big),   (3.16)

with

b_{n,P}^{L} = \begin{bmatrix} 0 \\ x_n^{R} \end{bmatrix}, \quad A_{n,P}^{L} = \begin{bmatrix} I & 0^{\top} \\ 0 & R_n^{R} \end{bmatrix},   (3.17)

and

\mathcal{N}(\mu_{n,i}^{R}, \Sigma_{n,i}^{R}) = \prod_{j=1}^{P} \mathcal{N}\big(A_{n,j}^{R} Z_{i,j}^{R\mu} + b_{n,j}^{R},\; A_{n,j}^{R} Z_{i,j}^{R\Sigma} (A_{n,j}^{R})^{\top}\big),   (3.18)

with

b_{n,P}^{R} = \begin{bmatrix} 0 \\ x_n^{L} \end{bmatrix}, \quad A_{n,P}^{R} = \begin{bmatrix} I & 0^{\top} \\ 0 & R_n^{L} \end{bmatrix},   (3.19)
where I = 1 for the special case of driving the motion through activation weights based on time (namely, scalar input for the regression process), and 0 are vectors of zero elements with corresponding lengths. Namely bimanual coordination is achieved by considering the other hand as a candidate frame for the reproduction of the skill. The movement of the two hands will thus be coupled for parts of the movement in which regular patterns have been observed between the two hands. The strength of the coupling constraint will be automatically adapted with respect to the variations observed in the
Figure 3.10: Results for the clapping task. The movement of the object and the robot's two end-effectors are represented as a trail of changing colors (black, blue and red are used for the object, left hand and right hand, respectively). The timeline graphs show the relative importance of the different candidate frames, estimated as a single variable to facilitate its visualization (but not used in the algorithm). It is computed as the ratio of the precision matrix determinant for a given frame with respect to the other frames, $h_i |\hat{\Sigma}_i^{y\,-1}| \,/\, \sum_m h_m |\hat{\Sigma}_m^{y\,-1}|$.
task. Any other type of input can be used to drive the movement. We consider as an example activation weights changing with respect to the position $o$ of an object in 3D Cartesian space. Namely, the bimanual motion is retrieved by estimating in real-time $\mathcal{P}(y^{L}|o)$ and $\mathcal{P}(y^{R}|o)$.
3.3.1.3
Experimental results
K = 4, K = 3 and K = 6 are used for the clapping, tracking and sweeping tasks to initialize the models. For each task, the box is moved in the vicinity of the robot, from its right-hand side to its left-hand side by following an ’S’-shaped trajectory. The frame of reference used to present the results is shown in Fig. 3.9.
Figure 3.11: Physical perturbation during reproduction of the learned hand clapping. While grabbing and moving the right hand of the robot, the robot uses the motors of its legs and torso to keep balance, and the motors of its left hand to adapt the clapping motion to the perturbation
Fig. 3.10 presents the results of the hand-clapping task. The time graphs show that the important aspect of the task is to keep the motion of the hands coordinated (the respective hand frame is extracted as the most important). The robot does not react to the motion of the box (a candidate frame irrelevant for the task). If the user grasps one hand of the robot and moves it to a new position, the robot reacts by adapting the movement of the other hand (the forces applied to the robot's right hand are represented with green arrows). Fig. 3.11 shows additional results for the clapping task, where the user perturbs the robot by grasping one hand and moving it sideways. The robot reacts to the perturbation by reconfiguring its legs and torso to keep balance. Fig. 3.12 presents the results of the box tracking task. The user demonstrated that the box should be reached with the right hand if it is on the right-hand side, with the left hand if it is on the left-hand side and with both hands if it is in the middle (by pointing at the box with the forearms). After observing a single continuous demonstration showing these three behaviours in random order, and training the model (10 seconds in our experiment), the robot is able to smoothly switch in real-time from one-hand to two-hand tracking, while progressively bringing back the unused hand to a natural pose. The regions of activation of these behaviours are depicted by three ellipsoids representing the resulting Gaussians. We see in the timeline graphs that in the first part of the movement (when the object is on the right-hand side of the robot), the right hand points at it (the object frame is relevant), while the left arm is in a comfortable pose (the robot's frame is relevant). In the middle of the movement, when the box is in the center, the right hand continues to track the box and the left hand moves from the comfortable pose to the left side of the box. In this case, the two relationships hand-hand and hand-box are detected to be important for the task (both the object frame and the hand frame are equally important). Finally, when
Figure 3.12: Results for the tracking task. The movement of the object and the robot’s two endeffectors are represented as a trail of changing colors (Refer to Fig. 3.10 for more details).
the box progressively goes toward the left-hand side of the robot, the right hand moves back to a comfortable pose and the left hand tracks the object. In this case, for the left hand the object frame is important, and for the right hand the robot's frame is important. For both hands, the hand frame is not relevant. Fig. 3.14 presents the results of the floor-sweeping experiment. Here, the timeline graphs are separated into different directions in order to show what the system learned. The movements in the demonstrations showed more variability in the horizontal plane than in the vertical direction. Indeed, the aim of the task is to sweep the floor, which can be done at several places on the floor; however, the broom needs to stay in contact with the floor (consistent movement in the vertical direction). The system correctly extracts that the movement of the two hands requires bimanual coordination, and that the task can be generalized to different positions in the robot's frame, as long as the vertical constraints are satisfied. Namely, the robot's frame is detected as being relevant only for the vertical direction.⁶ Similarly to the clapping task, the position of the box is correctly detected as being irrelevant for the task.
⁶ Since the robot does not make walking steps in this experiment, the fixed world frame and the robot's frame remain mostly aligned.
Figure 3.13: Adaptive uni/bimanual tracking behaviours learned from a single demonstration. The robot autonomously acquired hands switching behaviours and changes of coordination patterns that are modulated with respect to the position of the box.
Figure 3.14: Results for the floor-sweeping task. The movement of the object and the robot's two end-effectors are represented as a trail of changing colors (see Fig. 3.10 for more details).
3.4
Discussion
The proposed model circumvents the shortcoming of standard PGMM/PHMM that only the centres are influenced by the task parameters. The results show that adapting the covariance matrices to external task parameters is crucial when modelling continuous movements. The covariances represent the principal directions of the movement, the local synergies among the variables, and the variations between demonstrations (e.g., to determine which parts of the motion need to be reproduced precisely and which parts can be reproduced more loosely). A common practice to mitigate this problem in standard PGMM/PHMM is to increase the number of Gaussians, which has the effect of reducing the relevance of the covariance information in the modelling of the movement. It however degrades the parsimony of the model and comes at the expense of losing information about the local shape of the movement. By increasing the number of Gaussians, each covariance will provide only narrow and less usable information instead of the important local neighbourhood characteristics, synergies and shapes of the movement.

The proposed approach is well suited for problems in which the task parameters can be represented in the form of coordinate systems. However, for task parameters that cannot be represented in such a way, the standard PGMM/PHMM remains a better modelling choice because it does not require the separation of the task parameters into an offset value (position $b$ of the observer) and basis vectors forming a matrix $A$. The standard PGMM/PHMM can also tackle the reverse problem of estimating the task parameters from observations in a more straightforward manner. Indeed, in the proposed approach, the parameters representing the coordinate systems are combined in such a way that the estimation of the task parameters from observed data becomes more complex. This issue will require further investigation ($b$ can easily be estimated in closed form, but $A$ requires an iterative procedure).

The proposed model and the multi-streams approach share the common perspective of probabilistically representing the local importance of different coordinate systems. While this is done as separate batch learning processes in multi-streams GMR (projection, EM learning, back-projection and recombination with Gaussian products), the proposed approach permits us to formulate the different steps directly in the EM procedure, resulting in a mixture of Gaussian products organizing the different Gaussian components in a principled manner, and speeding up the retrieval process. We showed that the proposed approach has several advantages. It improves on PGMM/PHMM adaptability by parameterizing not only the centres of the Gaussians but also the covariances. It is faster and more consistent than multi-streams GMR by relying on a single EM process rather than learning separate models. It shows extrapolation capabilities that go beyond approaches relying on interpola-
tion principles such as GPR. This generalization capability is crucial for scaling up imitation learning to skills of increasing complexity, for which it can be difficult to guarantee that the demonstrations cover a sufficient range of situations. On the other hand, we showed that the proposed approach can be combined with other useful tools such as dynamical systems. This opens the road for new developments combining the versatility of dynamical systems and the robustness of statistical approaches. It can be applied to dynamic movement primitives, either by considering the force components in DMP as virtual springs or by keeping the original formulation. The activation weights mechanism in DMP is characterized by a second dynamical system $\tau\dot{s} = -\alpha s$ acting as a decay term, which can directly be used without further change in the proposed approach by learning the joint distribution $\mathcal{P}(s, y)$ from demonstrations and estimating $\mathcal{P}(y|s)$ using GMR in the reproductions. This issue will be addressed in the next chapters in more detail.

Future perspectives are multiple. The frames of reference considered in the experiment were defined by Cartesian coordinate systems. The approach, however, does not require this assumption, and the candidate frames do not need to have the same dimensions. Non-square $A$ matrices could also be considered, which could for example be exploited to learn skills requiring joint space and task space coordinates. Also, the current work focused on acquiring movements. We will study in future work whether such models could similarly learn reflex behaviours that are shaped by task parameters (including force signals), which would be relevant for robots facing continuous sources of perturbation.
Table 3.4: Overall process of the adapted GPR with trajectory models database.

1. Task demonstration
   - Set $P$ (number of candidate frames)
   - for $m \leftarrow 1$ to $M$ (for each demonstration)
     - Collect task parameters $\{A_{m,j}, b_{m,j}\}_{j=1}^{P}$
     - for $n \leftarrow 1$ to $T_m$ (for each time step)
       - Collect datapoint $\xi_{m,n}$
     - end
   - end
2. Model fitting
   - Set $K$ (number of components in each GMM)
   - for $m \leftarrow 1$ to $M$ (for each demonstration)
     - Fit a GMM $\theta_m = \{\pi_{i,m}, \mu_{i,m}, \Sigma_{i,m}\}_{i=1}^{K}$ to $\xi_m$ with EM
     - Associate query points $q_m = \{A_{m,j}, b_{m,j}\}_{j=1}^{P}$
   - end
   - Pre-compute $[K(Q, Q) + \sigma^2 I]^{-1} \Theta$ in Eqs. (3.11)-(3.12)
3. Reproduction
   - Set the input $I$ and output $O$ elements
   - Collect/select the new query point $Q_d = \{A_j, b_j\}_{j=1}^{P}$
   - Estimate the resulting GMM using Eq. (3.12)
   - for $n \leftarrow 1$ to $T$ (for each reproduction step)
     - Collect/select $\xi_n^{I}$
     - Retrieve $\xi_n^{O}$ through GMR as $\xi_n^{O} \,|\, \xi_n^{I} \sim \mathcal{N}(\hat{\mu}^{O}_n, \hat{\Sigma}^{O}_n)$
   - end
Chapter 4
Task-parameterized GMM with partially observable frames of reference

In the previous chapters, the problem of robot programming by demonstration was investigated and generalized towards task-parameterized approaches using the proposed method. It should be noted, however, that in the state-of-the-art approaches the number of task parameters (query points or style variables) used for the demonstrations and reproductions is considered to be constant, and all the task parameters are assumed to be always available. This can be a limiting factor for the generalization of tasks in real-world situations. For example, in some situations the information regarding one or even more task parameters could be missing in the reproduction phase. It could also be the case that different demonstrations have different numbers of available task parameters. The absence of task parameters could also be caused by unpredicted faults in the sensory data; for example, an occlusion can occur for one of the cameras used for gathering information about the task parameters. This problem is studied with an approach based on the task-parameterized Gaussian mixture model. A dust-sweeping task using a broom is considered to illustrate the applicability of the proposed model. In addition to the position of the broom, its orientation information is taken into account as an input variable to the learning model. In this task, the position and orientation of the dust areas and the dust pan are tracked using a conventional camera and image processing. Task parameters are constructed according to this information. For each set of task parameters (some of them can be unavailable), the task is performed by the demonstrator and the trajectory data are recorded. Then, the proposed approach is applied to learn a model of the recorded trajectories modulated by the task parameters.
The obtained model is used to re-generate new trajectories for new task parameters, by considering various cases when some of them are missing.
4.1
Proposed approach
The proposed approach is based on the task-parameterized GMM, and therefore each task parameter is considered as a candidate frame of reference, which has a position $b$ and a transformation matrix $A$. The original task-parameterized GMM is formulated by considering that all task parameters are available. Here, instead, we study the case where only some frames are available, which can change during both demonstrations and reproductions. When cleaning a table, we make use of the available visual information to form a path that allows us to move a broom through a set of area surfaces to clean. If the same task is done in the dark and only some parts of the table are visible, we modify this path according to the missing information. In complete darkness, an average path covering the table would be produced. We aim to achieve similar behaviours with our model. Therefore, we do the same in the demonstration and reproduction phases: when a task parameter is not accessible, it is simply ignored, and the model is estimated, or movements are retrieved, based on the available task parameters and the corresponding data.

We assume that each demonstration $m \in \{1, \ldots, M\}$ contains $T_m$ datapoints, forming a dataset of $N$ datapoints $\{\xi_n\}_{n=1}^{N}$ with $N = \sum_{m=1}^{M} T_m$. Each datapoint $\xi_n = [t_n, x_n] \in \mathbb{R}^{D+1}$ (e.g. $D = 4$: Cartesian position and orientation angle, augmented with the time stamp $t_n$) is associated with the observed task parameters $\{A_{n,j}, b_{n,j}\}_{j \in V_n}$, which represent the available candidate frames of reference (at most $P$), with offset position vectors $b_{n,j}$ and linear transformation matrices $A_{n,j}$. $V_n$ is the set of indices corresponding to the accessible frames of reference at time step $n$. The parameters of the task-parameterized GMM are $\{\pi_i, Z_{i,j}^{\mu}, Z_{i,j}^{\Sigma}\}$, representing respectively the mixing coefficients, centres and covariance matrices for each frame $j$ and mixture component $i$. At time step $n$, during both the learning and reproduction phases, the resulting centre $\mu_{n,i}$ and covariance matrix $\Sigma_{n,i}$ of each component $i$ correspond to products of linearly transformed Gaussians

\mathcal{N}(\mu_{n,i}, \Sigma_{n,i}) \sim \prod_{j \in V_n} \mathcal{N}\big(A_{n,j} Z_{i,j}^{\mu} + b_{n,j},\; A_{n,j} Z_{i,j}^{\Sigma} A_{n,j}^{\top}\big),   (4.1)

computed as
\Sigma_{n,i} = \Big(\sum_{j \in V_n} \big(A_{n,j} Z_{i,j}^{\Sigma} A_{n,j}^{\top}\big)^{-1}\Big)^{-1},   (4.2)

\mu_{n,i} = \Sigma_{n,i} \sum_{j \in V_n} \big(A_{n,j} Z_{i,j}^{\Sigma} A_{n,j}^{\top}\big)^{-1} \big(A_{n,j} Z_{i,j}^{\mu} + b_{n,j}\big).   (4.3)
The difference between Eqs. (4.1)-(4.3) and the original TP-GMM formulation (Eqs. (3.1)-(3.3)) lies in the indices used for the summation and multiplication. In the original formulation, the indices include all frames of reference, while here they include only the available ones. The parameters of the model are iteratively estimated with the following EM procedure.

E-step:

\gamma_{n,i} = \frac{\pi_i\, \mathcal{N}(\xi_n \,|\, \mu_{n,i}, \Sigma_{n,i})}{\sum_{k=1}^{K} \pi_k\, \mathcal{N}(\xi_n \,|\, \mu_{n,k}, \Sigma_{n,k})}.   (4.4)
M-step:

\pi_i = \frac{\sum_{n=1}^{N} \gamma_{n,i}}{N}, \qquad Z_{i,j}^{\mu} = \frac{\sum_{n \in W_j} \gamma_{n,i}\, A_{n,j}^{-1} [\xi_n - b_{n,j}]}{\sum_{n \in W_j} \gamma_{n,i}},   (4.5)

Z_{i,j}^{\Sigma} = \frac{\sum_{n \in W_j} \gamma_{n,i}\, A_{n,j}^{-1} [\xi_n - \tilde{\mu}_{n,i,j}] [\xi_n - \tilde{\mu}_{n,i,j}]^{\top} A_{n,j}^{-\top}}{\sum_{n \in W_j} \gamma_{n,i}},   (4.6)

with $\tilde{\mu}_{n,i,j} = A_{n,j} Z_{i,j}^{\mu} + b_{n,j}$. The set $W_j$ contains the time step indices at which frame $j$
is available, $W_j = \{n = 1 \cdots N \,|\, j \in V_n\}$. Compared to the original EM formulation for TP-GMM (Eqs. (3.5)-(3.6)), the indices include only the datapoints for which the frame of reference is available during the demonstration, instead of encompassing all the datapoints for all the frames of reference. The model parameters are initialized by clustering the data into equal time intervals. After the model is learnt using the provided demonstration data, it can be used to produce new trajectories, given a set of new task parameters. Similarly to TP-GMM, as a first step, the input and output variables, $\xi^{I}$ and $\xi^{O}$, should be specified (here, we consider time as the input variable, but in general, any variable or set of variables could be considered as the input of the system). Then, for each time step, the set of accessible task parameters is provided as $\{A_{n,j}, b_{n,j}\}_{j \in V_n}$. Afterwards, a GMM is built for
the input and output variables at time step n, based on the learnt task-parameterized GMM parameters as:
\mathcal{P}(\xi^{I}_{n}, \xi^{O}_{n}) = \sum_{i=1}^{K} \pi_i \, \mathcal{N}(\xi^{I}_{n}, \xi^{O}_{n} \,|\, \mu_{n,i}, \Sigma_{n,i}),
in which K is the number of Gaussian components, and µ_{n,i} and Σ_{n,i} are computed using Eqs. (4.2)-(4.3). Then Gaussian mixture regression (GMR) is applied to estimate the value of the output variable ξ^O_n given the value of the input variable ξ^I_n, using Eq. (3.9). The estimated mean \hat{\mu}_n is then the best estimate of ξ^O_n at this time step. We can see from Eqs. (4.2)-(4.3) that, when a task parameter is missing, the corresponding Z^µ_{i,j} and Z^Σ_{i,j} have no effect on the calculation of µ_{n,i} and Σ_{n,i}.
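As an illustration of this reproduction step, the following sketch builds the temporary GMM from the available frames and conditions the output on the time input. It reuses the gaussian_product helper sketched above, and the model dictionary (keys pi, Zmu, Zsigma, K, D) is a hypothetical container, not the thesis code.

    import numpy as np

    def norm_pdf(x, mu, var):
        # univariate Gaussian density, used for the time input
        return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

    def reproduce_step(model, A_n, b_n, avail_n, t_n):
        # Build {pi_i, mu_{n,i}, Sigma_{n,i}} with the available frames, then
        # estimate xi^O_n by Gaussian mixture regression with time as input.
        in_idx = 0
        out_idx = np.arange(1, model['D'])
        weights, cond_means = [], []
        for i in range(model['K']):
            mu_ni, sig_ni = gaussian_product(model['Zmu'][i], model['Zsigma'][i],
                                             A_n, b_n, avail_n)
            w = model['pi'][i] * norm_pdf(t_n, mu_ni[in_idx], sig_ni[in_idx, in_idx])
            cond = mu_ni[out_idx] + sig_ni[out_idx, in_idx] / sig_ni[in_idx, in_idx] \
                   * (t_n - mu_ni[in_idx])
            weights.append(w)
            cond_means.append(cond)
        weights = np.array(weights) / np.sum(weights)
        return sum(w * m for w, m in zip(weights, cond_means))   # estimate of xi^O_n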
Model selection is compatible with the techniques applied for the conventional GMM. A summary of the proposed approach is provided in Table 4.1. In the next section, the proposed approach is demonstrated with a robotic dust sweeping experiment.
4.2 Robotic dust sweeping experiment
In a real-world scenario, a robotic manipulator (Barrett WAM) is considered to sweep dust areas on a table. The aim is to sweep a set of dust spots with a broom on a table and bring the dust into a dust pan. For this task, the shape of the movement is modulated by the positions and orientations of the dust spots and the dust pan. The task requires moving the broom attached to the end-effector of the robot from an initial pose in the robot workspace towards a first marker representing a dust spot, then towards a second marker representing a second dust spot, and finally towards the dust pan, with suitable orientation. For the dust spots, the orientation of the broom should be in the direction of the biggest elongation. For the dust pan, on the other hand, the orientation of the broom should be in the direction of the smallest elongation. In this way the dust pieces are collected inside the dust pan. The two dust pieces and the dust pan form the task parameters. In this experiment two dust areas are considered and therefore there are 4 frames of reference as task parameters (P = 4):
1. The robot frame (which is fixed all the time),
2. The dust pan frame,
3. The dust piece #1 frame,
4. The dust piece #2 frame.
Figure 4.1: Snapshots of a demonstration of the dust sweeping task through kinesthetic teaching with a gravity-compensated robot.
The task parameters are assumed to be constant during the movement for each demonstration, meaning that the initial positions and orientations of the dust pieces and dust pan are captured and used as the corresponding task parameters all along the demonstration.
4.2.1 Experimental setup
The experiment is implemented on a Barrett WAM torque-controlled 7-DOF manipulator. The robot is gravity-compensated during demonstrations and reproductions. The orientation of the end-effector is kept perpendicular to the worktop, with the orientation along the vertical axis determined by the learning approach. The demonstrated trajectories are tracked by a virtual spring-damper system. Forces acting as gravity compensation are superposed to the tracking forces, resulting in a controller that is safe for the user, who can exploit the redundancy of the robot and the redundancy of the task during demonstration and reproduction; see Calinon et al. [2010b] for details of the controller. The information regarding the position, size and orientation of the dust pieces and the dust pan is tracked by a camera using color-based image processing and a homography transformation to unwrap
the camera image. Task parameters for each frame are then built using this information. Fig. 4.1 illustrates the experimental setup during a typical demonstration. All demonstrations start from nearly the same position, with the same orientation of the dust pan. The motions are described by 4 variables (D = 4), representing the Cartesian position of the broom (x_1, x_2, x_3) and its horizontal orientation (which is calculated from the quaternion information provided by the robot). Therefore, we have

b_{n,1} = 0, \quad A_{n,1} = I, \qquad b_{n,j} = \begin{bmatrix} 0 \\ p_{n,j} \\ \alpha_{n,j} \end{bmatrix}, \quad A_{n,j} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & R_{n,j} & 0 \\ 0 & 0 & 1 \end{bmatrix},

in which p_{n,j}, α_{n,j} and R_{n,j} refer respectively to the Cartesian position, the horizontal angle and the orientation (as a rotation matrix) of the dust pan and the two dust pieces at time step n. 0 denotes a zero vector or matrix of appropriate size and I is the identity matrix. In total, 6 demonstrations are provided with different locations of the dust pieces and dust pan. Models with K = 6 components are considered in the experiment (selected empirically).
4.2.2 Experimental results
Snapshots of the reproduction of the task by the robot after learning are presented in Fig. 4.2. Fig. 4.3 presents the demonstration data observed from the point of view of frames 1 to 4 (in 2D, on the horizontal plane). The model obtained after learning is also depicted in terms of Gaussian components with mean values Z^µ_{i,j} and covariance matrices Z^Σ_{i,j} for each one of the frames of reference.
As can be seen from the graphs, from the point of view of the 1st frame, the trajectories are close to each other at the beginning of the movements, and therefore the corresponding Gaussian component has a smaller covariance matrix than the other components. In the case of the 2nd frame, since all the movements end at this frame, the trajectories are close to each other at the end of the movements, and therefore the corresponding Gaussian component has a small covariance matrix, while at the beginning of the movements the trajectories are diverse and the corresponding Gaussian component has a larger covariance matrix. The 3rd and the 4th frames of reference have small Gaussian components in the middle of the movement and bigger ones at the beginning and the end of the movement. In Fig. 4.4, samples of the demonstrations are depicted in the first row (in 2D, from the top view).
Figure 4.2: Snapshots of a reproduction attempt of the dust sweeping task after model learning.
In the second row of Fig. 4.4, reproductions for the same set of task parameters as the demonstrations are presented, in which all the frames of reference are available. The orientation of the dust broom is shown by segments along the trajectories. The reproduced data follow reasonably well the trajectory and the orientation of the dust broom in all cases. In the third row of Fig. 4.4, the importance of the frames during the reproduction is shown. The importance F_{n,j} of frame j at time step n is computed as the ratio of the precision matrix determinant for a given frame with respect to the other frames at time step n, defined as
F_{n,j} = \frac{|\Sigma^{-1}_{n,j}|}{\sum_{k=1}^{P} |\Sigma^{-1}_{n,k}|},    (4.7)

where

\Sigma_{n,j} = \sum_{i=1}^{K} \phi_i(\xi^{I}_{n}) \left( \Sigma^{O}_{n,i,j} - \Sigma^{OI}_{n,i,j} (\Sigma^{I}_{n,i,j})^{-1} \Sigma^{IO}_{n,i,j} \right),    (4.8)

and

\Sigma_{n,i,j} = A_{n,j} Z^{\Sigma}_{i,j} A_{n,j}^{\top}.    (4.9)
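A minimal sketch of Eq. (4.7), assuming the per-frame output covariances Σ_{n,j} have already been computed (the names sigma_frames and frame_importance are hypothetical):

    import numpy as np

    def frame_importance(sigma_frames):
        # Eq. (4.7): determinant of the precision matrix of each frame,
        # normalized over all candidate frames at this time step.
        dets = np.array([np.linalg.det(np.linalg.inv(S)) for S in sigma_frames])
        return dets / dets.sum()

    # toy example with two 2x2 covariances: the tighter frame dominates
    print(frame_importance([np.diag([0.01, 0.01]), np.diag([1.0, 1.0])]))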
These results show that the first frame is the most important in the beginning of the movement,
then gradually the fourth (1st dust area), the third (2nd dust area) and the second (dust pan). In Fig. 4.5, reproductions in the case of missing frames are presented. In the first row, reproductions for the same set of task parameters are presented in which one of the frames of reference is missing. Only the available task parameters are shown in the image. The importance of the frames is shown below the reproduction trajectories (in the second row). Note that the importance of the missing frame is always 0, and the importance pattern for the available frames is similar to the normal case. We can see that the role of the missing frame is shared by the neighbouring frames. According to these results, the approach tackles the absence of task parameters in a natural way by reproducing the movement with the available task parameters. For the present task parameters, the task is replicated properly (the dust piece is collected in the dust pan at the end of the reproduction). Similar reproduction results are illustrated in Figs. 4.6 and 4.7, this time for new sets of task parameters. It can be observed from these results that the proposed approach performs well for the new task parameters, demonstrating the generalization capability of the approach.
[Figure 4.3 panels: (a) observed from the 1st frame; (b) observed from the 2nd frame; (c) observed from the 3rd frame; (d) observed from the 4th frame. Axes: x_1 (m), x_2 (m).]
Figure 4.3: Demonstration data observed from different frames of reference and the obtained model after learning (the movements start from the black squares).
Reproductions in the case of new task parameters in which one of the frames is missing are depicted in Fig. 4.7.
Figure 4.4: Top: samples of the demonstrations; Middle: the corresponding reproductions; Bottom: the importance of the frames during the reproduction, after model learning in the presence of all the frames (task parameters). Task parameters are shown as frames of reference (black, blue, red and green correspond to the first, second, third and fourth frame of reference, respectively).
We can see that the approach is able to tackle the absence of a task parameter and at the same time generalize to new situations. In Fig. 4.8, a measure of precision is shown for reproductions in different situations. This measure is calculated as the inverse of the determinant of the covariance estimated by GMR at each time step. Fig. 4.8 (a) presents the precision pattern for the reproductions in which all the frames of reference are available and are the same as in the demonstrations, averaged over 6 reproductions. Fig. 4.8 (b) gives the same information for the new sets of task parameters (all accessible), averaged over 4 reproductions. The precision pattern in the case of new task parameters follows that of the reproductions with the same task parameters as the demonstrations. The two main peaks in both graphs correspond to the time intervals in which the trajectory passes close to the 4th and the 3rd frames of reference, respectively. In Fig. 4.8 (c), the precision pattern for
the case in which the task parameters are the same as in the demonstrations, but some of them are missing, is presented. A similar graph is presented in Fig. 4.8 (d), but for new task parameters. It can be seen from these graphs that when a task parameter is not accessible, the precision of the reproduction does not follow the normal behaviour around the corresponding time interval, and the movement is performed less precisely in that region. This behaviour is predictable, since when the task parameter is missing, there is no need to be precise during the time intervals corresponding to that frame of reference. In the extreme case where only the reference frame is available (namely, without vision tracking), the approach is still able to perform an appropriate average movement, as depicted in Fig. 4.9. The retrieved movement looks similar to what humans would do in the same situation, averaging over previous experiences with a smooth movement.
Figure 4.5: Top: samples of the reproductions for the same set of parameters as the demonstrations in the case of missing frames. Bottom: corresponding importance of the frames during the reproduction.
4.3 Summary and conclusion
In this chapter, an extension of the task-parameterized GMM was presented to handle partially observable task parameters, both in the demonstration and the reproduction phases. This is achieved by building the model and retrieving the output variable based only on the task parameters available at each time step. The proposed extension is simple and effective, and its performance is demonstrated using a robotic dust sweeping task.
Figure 4.6: Top: reproductions for a new set of task parameters. Bottom: the corresponding importance of the frames during the reproduction, in the presence of all frames. Task parameters of the demonstrations are depicted using smaller and thinner frames of reference.
The approach that we propose is well suited for problems in which the task parameters can be represented in the form of coordinate systems, and where it is desirable to obtain a reasonable movement when a task parameter is missing. The approach retrieves a confidence measure on the estimated path, which could be exploited in future work to automatically determine whether the movement can be reproduced when some frames are missing (e.g., partial occlusions), and to wait for the occlusion to disappear if the measure of confidence on the retrieved movement is too low. The proposed approach considers only the cases in which the number of task parameters in the reproduction is smaller than or equal to the number of task parameters in the demonstration. In other words, each task parameter should be present at least once during the demonstrations. In order to deal with an excess of task parameters, other approaches should be developed to automatically discard redundant and irrelevant frames from larger sets of candidate frames. Another way of dealing with missing task parameters, which could be investigated in future work, is to use a strategy similar to the one employed in matrix and higher-order tensor completion algorithms Signoretto et al. [2011]. Investigating the similar case of missing task parameters using other learning approaches could be another line of future work to consider.
Figure 4.7: Top: reproductions for the new set of parameters in the case of missing frames. Bottom: the corresponding importance of the frames during the reproduction. Only the available task parameters are shown in the first row. Task parameters used for the demonstrations are depicted using smaller and thinner frames of reference.
Table 4.1: Overall process of the proposed task-parameterized GMM with missing frames.

1. Task demonstration
   - Determine P (maximum number of frames)
   - Perform the following loop during demonstration:
     - Record data {ξ_n}_{n=1}^{N}
     - Determine the available task parameters and store them as the index sets V_n and W_j
     - Record the available task parameters {A_{n,j}, b_{n,j}}_{j∈V_n}
2. Model fitting
   - Determine K (number of components of the GMM)
   - Initialize the model parameters {π_i, {Z^µ_{i,j}, Z^Σ_{i,j}}_{j=1}^{P}}_{i=1}^{K}
   - Estimate the model parameters with EM (Eqs. (4.4)-(4.6))
3. Reproduction
   - Determine which are the input I and output O variables
   - Determine the available task parameters, V_n
   - Perform the following computation iteratively:
     - Collect/select ξ^I_n and {A_{n,j}, b_{n,j}}_{j∈V_n}
     - Estimate the temporary GMM parameters {π_i, µ_{n,i}, Σ_{n,i}}_{i=1}^{K} modelling the joint distribution of ξ^I_n and ξ^O_n as P(ξ^I_n, ξ^O_n) = ∑_{i=1}^{K} π_i N(ξ^I_n, ξ^O_n | µ_{n,i}, Σ_{n,i})
     - Use GMR to retrieve ξ^O_n as P(ξ^O_n | ξ^I_n) = N(ξ^O_n | \hat{\mu}_n, \hat{\Sigma}_n) (Eq. (3.9))
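The bookkeeping of the index sets in step 1 of Table 4.1 can be sketched as follows (a hypothetical recording loop; observed_frames stands for whatever the tracking system returns at each time step and is not part of the thesis implementation):

    # Hypothetical bookkeeping of V_n and W_j while recording a demonstration.
    P = 4                              # maximum number of candidate frames
    V = []                             # V_n: frames available at each time step
    W = {j: [] for j in range(P)}      # W_j: time steps at which frame j was seen

    def record_step(n, observed_frames):
        # observed_frames: dict {j: (A_nj, b_nj)} from the tracker (hypothetical)
        V.append(sorted(observed_frames.keys()))
        for j in observed_frames:
            W[j].append(n)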
Figure 4.8: Average precision pattern for reproductions in different situations. In (c) and (d) the missing frame is 3rd frame (blue), 4th frame (thick red line), and both 3rd and 4th frames (thick green line). Refer to text for more description.
Figure 4.9: Reproduction (black, thick line) for the case in which only the reference frame is available. For comparison demonstrations are plotted using thinner (blue) lines.
Chapter 5
DMP learned by GMR

Ample evidence from biology suggests that voluntary actions are composed of simpler elements, often called motor primitives, that are combined either simultaneously or serially in time, see Mussa-Ivaldi [1992]; Flash and Hochner [2005]; Schaal et al. [2007]; Bizzi et al. [2008]; Hogan and Sternad [2012]. A widespread model for motor-primitive learning in robots is dynamic movement primitives (DMP) Ijspeert et al. [2003, 2013]. One key reason for the prevalence of the model is its ability to cope with the perturbations of real-world robot applications, and to couple dynamical systems with a formulation that leaves room for further modifications and improvements. In this chapter, we aim at exploiting mixture modeling techniques with global learning rules for DMP parameter estimation. The combination of statistics and dynamical systems allows one to encapsulate variability and correlation information in the model, which allows us to foster further links to control strategies such as the uncontrolled manifold and the minimum intervention principle. Scholz and Schöner expounded the notion of uncontrolled manifolds to show the exploitation of variability as an evidence of skill Scholz and Schöner [1999]; Latash et al. [2002]. Todorov and Jordan introduced the minimum intervention principle Todorov and Jordan [2002] as a general property of optimal feedback controllers, with the idea that deviations from the average trajectory are corrected only when they interfere with task performance. In the context of human skill acquisition studies, Sternad et al. proposed that a preferential distribution of performance variability along do-not-matter directions suggests skillful control by the central nervous system Sternad et al. [2010, 2011]. With a broader perspective, Kelso suggested that synergies constitute Nature's way of handling information in systems of enormous complexity, where a perturbation to any part of the synergy is immediately compensated by remotely linked elements to preserve the functional integrity of the organism Scott Kelso [2009]. Correlations and variations thus appear to be crucial elements in representing such signals.
GMR has mostly been exploited in two manners to encode and retrieve movements:

1) as time-indexed trajectories with ξ = [t, y]^⊤, by learning P(t, y) with a GMM, and retrieving P(y|t) with GMR for each time step to reproduce a trajectory Calinon et al. [2007]. The retrieved trajectory presents useful smoothness properties (infinitely differentiable);

2) as an autonomous system with ξ = [y, ẏ]^⊤, by learning P(y, ẏ) with a GMM, with y and ẏ representing the position and velocity of the system (either in task space or joint space), and by retrieving iteratively during reproduction a series of velocity commands by estimating P(ẏ|y) with GMR, see Calinon et al. [2010a]; S. M. Khansari-Zadeh [2011].

It is proposed in this chapter to use GMR as a probabilistic formulation of DMP. A new extension of the regression process (wrapped GMR) will also be introduced to encode both discrete and periodic signals.
5.1 Dynamic movement primitives (DMP)
Dynamic movement primitives (DMP) is a second order dynamical system, acting as a virtual spring-damper perturbed by a non-linear force profile used to modify the shape of the movement Ijspeert et al. [2013]. For a discrete movement, the system is described as¹

\tau^2 \ddot{y} = \alpha_z (\beta_z (g - y) - \tau \dot{y}) + f(x),    (5.1)

\tau \dot{x} = -\alpha_x x,    (5.2)

f(x) = \frac{\sum_{i=1}^{N} \Psi_i(x) w_i}{\sum_{i=1}^{N} \Psi_i(x)} \, x, \quad \text{with} \quad \Psi_i(x) = \exp\!\left( -\frac{1}{2\sigma_i^2} (x - c_i)^2 \right),    (5.3)
where y, ẏ and ÿ are the position, velocity and acceleration of the system, respectively. g defines the goal and τ represents the duration of the movement. α_z, β_z and α_x are hyper-parameters of the model, usually tuned in a way to make the system critically damped. Eq. (5.1) is known as the transformation system, Eq. (5.2) as the canonical system with phase variable x, and Eq. (5.3) as the learned force profile. This force profile is represented as a weighted sum of weights w_i, activated by radial basis functions Ψ_i(x).
¹ The standard notation of DMP is in one dimension, but for the purpose of the thesis, a matrix and vector formulation is adopted.
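For illustration, a minimal Euler integration of the discrete system of Eqs. (5.1)-(5.2) could look as follows; the hyper-parameter values and the forcing callback are placeholders chosen only for the sketch, not values used in the thesis experiments.

    import numpy as np

    def integrate_dmp(y0, g, forcing, tau=1.0, alpha_z=25.0, beta_z=6.25,
                      alpha_x=8.0, dt=0.01, steps=200):
        # Euler integration of the transformation and canonical systems.
        y = np.array(y0, dtype=float)
        dy = np.zeros_like(y)
        x = 1.0
        traj = []
        for _ in range(steps):
            ddy = (alpha_z * (beta_z * (g - y) - tau * dy) + forcing(x)) / tau**2
            dy = dy + ddy * dt            # Eq. (5.1)
            y = y + dy * dt
            x = x + (-alpha_x * x / tau) * dt   # Eq. (5.2)
            traj.append(y.copy())
        return np.array(traj)

    # example: a zero forcing term gives a plain spring-damper reaching the goal g
    traj = integrate_dmp(y0=[0.0], g=np.array([1.0]), forcing=lambda x: 0.0)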
For a periodic movement, DMP is similarly defined as

\tau^2 \ddot{y} = \alpha_z (\beta_z (g - y) - \tau \dot{y}) + f(\phi, r),    (5.4)

\tau \dot{\phi} = 1,    (5.5)

f(\phi, r) = \frac{\sum_{i=1}^{N} \Psi_i(\phi) w_i}{\sum_{i=1}^{N} \Psi_i(\phi)} \, r, \quad \text{with} \quad \Psi_i(\phi) = \exp\!\big( \gamma_i (\cos(\phi - c_i) - 1) \big),

where φ is the phase angle of an oscillator and r is the amplitude of the oscillation. τ and r are used to generalize the movement to different speeds (periods of oscillation) and amplitudes. The standard learning procedure in DMP (for example, for discrete movements) consists of predefining Ψ_i(x), and estimating w_i with weighted least squares. After setting the hyper-parameters α_z, β_z, α_x and N (number of basis functions), c_i (centers of the basis functions) and σ_i (widths of the basis functions), the learning problem consists of minimizing for each basis function i ∈ {1, . . . , N} the cost function

J_i = \sum_{t=1}^{P} \Psi_i(t) \big( f_{\text{target}}(t) - w_i \, x(t) \big)^2,
resulting in a weighted least squares solution

w_i = (X^{\top} \Gamma_i X)^{-1} X^{\top} \Gamma_i F,    (5.6)

in which

X = \begin{bmatrix} x(1) \\ \vdots \\ x(P) \end{bmatrix}, \quad \Gamma_i = \begin{bmatrix} \Psi_i(1) & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \Psi_i(P) \end{bmatrix}, \quad F = \begin{bmatrix} f_{\text{target}}(1) \\ \vdots \\ f_{\text{target}}(P) \end{bmatrix},
where P is the number of data points, and f_target(t) is the force profile calculated at each time step according to the recorded movement as

f_{\text{target}}(t) = \tau^2 \ddot{y}_{\text{demo}}(t) - \alpha_z \big( \beta_z (g - y_{\text{demo}}(t)) - \tau \dot{y}_{\text{demo}}(t) \big),    (5.7)

where y_demo(t), ẏ_demo(t) and ÿ_demo(t) are the position, velocity and acceleration of the system at time step t, respectively.
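A compact sketch of this weighted least squares fit for one demonstrated scalar trajectory is given below; the function names and default hyper-parameter values are illustrative assumptions only.

    import numpy as np

    def target_force(y, dy, ddy, g, tau, alpha_z=25.0, beta_z=6.25):
        # Eq. (5.7): force profile reconstructed from the demonstration
        return tau**2 * ddy - alpha_z * (beta_z * (g - y) - tau * dy)

    def wls_weights(x, f_target, centers, widths):
        # Eq. (5.6): one weight per basis function, with Gaussian activations
        w = np.zeros(len(centers))
        for i, (c, s) in enumerate(zip(centers, widths)):
            psi = np.exp(-0.5 * (x - c)**2 / s**2)   # Psi_i evaluated on the phase
            w[i] = np.sum(psi * x * f_target) / np.sum(psi * x * x)
        return w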
5.2 Proposed approach for learning DMP parameters
In contrast to other regression methods such as locally weighted regression (LWR) Schaal and Atkeson [1998], locally weighted projection regression (LWPR) Vijayakumar et al. [2005], or Gaussian process regression (GPR) Nguyen-Tuong and Peters [2008]; Grimes et al. [2006], GMR does not
model the regression function directly, but models a joint probability density function of the data. It then derives the regression function from the joint density model.
5.3 Proposed approach
We propose to encapsulate the probabilistic relation between the variables composing the DMP in a Gaussian mixture model (GMM). Recall that the parameters {π_i, µ_i, Σ_i}_{i=1}^{N} of a GMM respectively represent the mixing coefficients (priors), centers and covariance matrices. The covariance matrices can be constrained to various forms. The best-known forms are circular, diagonal and full covariances, but there exists a full range of possible constraints on the covariances, with dedicated EM procedures, including mixtures of factor analyzers McLachlan et al. [2003] or mixtures of probabilistic principal component analysers Tipping and Bishop [1999] to locally reduce the dimensionality of the problem, and parsimonious GMM Bouveyron and Brunet-Saumard [2012]; Mcnicholas and Murphy [2008]; Baek et al. [2010] to share common synergies among the Gaussians.
5.3.1 Discrete movement
For a dataset composed of the phase variable and forcing term as [x, f_target]^⊤, the parameters of a GMM with diagonal covariances can be set as

\text{GMM}_1: \quad \mu_i = \begin{bmatrix} \mu^{x}_{i} \\ \mu^{f}_{i} \end{bmatrix}, \quad \Sigma_i = \begin{bmatrix} \Sigma^{x}_{i} & 0 \\ 0 & \mathrm{diag}(\sigma^{f}_{i}) \end{bmatrix}.    (5.8)
The estimation of f(x) in Eq. (5.3) corresponds to the estimation of P(f|x) with GMR, with w_i = µ^f_i, and the radial basis functions Ψ_i(x) replaced by normal distributions N(x | µ^x_i, Σ^x_i).² Namely, the external forcing term is calculated as

f(x) = \sum_{i=1}^{N} \gamma_i(x) \, w_i,    (5.9)

with

\gamma_i(x) = \frac{\pi_i \mathcal{N}(x | \mu^{x}_{i}, \Sigma^{x}_{i})}{\sum_{k=1}^{N} \pi_k \mathcal{N}(x | \mu^{x}_{k}, \Sigma^{x}_{k})}.
² Namely, the exponentials are rescaled by the determinants of the variances in the case of Gaussian basis functions. If the mixing coefficients and variances are the same for all components, the two formulations are equivalent.
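A minimal sketch of Eq. (5.9) for a scalar forcing term follows; priors, mu_x, var_x and mu_f stand for the GMM_1 parameters, and the example values are arbitrary.

    import numpy as np

    def forcing_gmr1(x, priors, mu_x, var_x, mu_f):
        # activations: Gaussian densities replacing the radial basis functions
        act = priors * np.exp(-0.5 * (x - mu_x)**2 / var_x) / np.sqrt(2*np.pi*var_x)
        gamma = act / act.sum()          # gamma_i(x)
        return np.sum(gamma * mu_f)      # Eq. (5.9): f(x) = sum_i gamma_i(x) w_i

    # example with three hypothetical components
    f = forcing_gmr1(0.5, priors=np.ones(3)/3, mu_x=np.array([0.9, 0.5, 0.1]),
                     var_x=np.full(3, 0.05), mu_f=np.array([2.0, -1.0, 0.0]))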
By using diagonal covariances as in Eq. (5.9), GMR is equivalent to the weighted least squares estimate commonly used in DMP. But with an EM procedure, the GMM encoding allows us to simultaneously estimate the hyper-parameters c_i = µ^x_i and σ_i = Σ^x_i, thus also optimizing the placement, spread and overlap of the basis functions while learning the forcing term profile. Moreover, this representation permits the encoding of more complex relationships between x and f by setting the GMM parameters as

\text{GMM}_2: \quad \mu_i = \begin{bmatrix} \mu^{x}_{i} \\ \mu^{f}_{i} \end{bmatrix}, \quad \Sigma_i = \begin{bmatrix} \Sigma^{x}_{i} & \Sigma^{xf}_{i} \\ \Sigma^{fx}_{i} & \mathrm{diag}(\sigma^{f}_{i}) \end{bmatrix},    (5.10)
thus extending the regression estimate in (5.9) to a weighted least-squares estimate of degree 1 (instead of degree 0), given by

w_i(x) = \mu^{f}_{i} + \Sigma^{fx}_{i} (\Sigma^{x}_{i})^{-1} (x - \mu^{x}_{i}).    (5.11)

5.3.2 Periodic movement
For a periodic motion, P(f|φ) can similarly be estimated by GMR by taking into account the periodicity of the movement in the estimation of P(φ, f) as a GMM. The wrapped GMM (WGMM) can be used to tackle such circular statistics problems, see Mardia and Jupp [2009]; Agiomyrgiannakis and Stylianou [2009]. The first step is to define a tiling (or lattice) that will be used to compute probabilities in regions adjacent to the range of the periodic variables. This tiling is represented by a series of offset vectors {v_m}_{m=1}^{M} determining the neighbouring regions where likelihood estimation should take place. An intermediary variable u_j is used to define the tiling for each dimension, taking a null value for non-periodic dimensions, and a positive integer value for periodic dimensions. This integer determines the adjacent regions where the likelihood will be evaluated. The tiling for dimension j is represented as a vector of offsets o_j with dimension O_j = 1 + 2u_j, defined by o_j = [−u_j, −u_j + 1, . . . , 0, . . . , u_j − 1, u_j] r_j, where r_j represents the period of dimension j (2π in our case). For mixture problems, since the eigenvalues of the covariance of each component will be lower than the range 2π, this integer can be set to 1 or 2 without losing precision on the estimate. With the above notation, the complete tiling is defined by M = ∏_{j=1}^{D} O_j vectors v_m denoting a point on the lattice formed by the offset vectors o_j. For example, for a dataset of 3 dimensions with u = [1, 0, 0]^⊤ (the first dimension denotes the phase variable and one adjacent region is considered in the calculation of probabilities), we have o_1 = [−1, 0, 1] r_1, o_2 = [0] r_2, o_3 = [0] r_3 (with dimensions O_1 = 3, O_2 = 1, O_3 = 1 and M = O_1 O_2 O_3 = 3).
Figure 5.1: Reproduction of periodic signals with WGMR for GMM_1 and GMM_2. The demonstration and reproduction are respectively represented with solid and dotted lines.
The vectors {v_m}_{m=1}^{3} will then be v_1 = [−r_1, 0, 0]^⊤, v_2 = 0, and v_3 = [r_1, 0, 0]^⊤. For a multivariate dataset of D dimensions, we define a WGMM of parameters Θ = {π_i, µ_i, Σ_i}_{i=1}^{N} with N Gaussian components as
\mathcal{P}(\xi_n | \Theta) = \sum_{i=1}^{N} \pi_i \, \mathcal{N}_W(\xi_n | \mu_i, \Sigma_i), \quad \text{with} \quad \mathcal{N}_W(\xi_n | \mu_i, \Sigma_i) = \sum_{m=1}^{M} \mathcal{N}(\xi_n - v_m | \mu_i, \Sigma_i).    (5.12)
Eq. (5.12) consists of replicating the Gaussian function on the lattice, see Fig. 5.1. It collapses to a standard GMM by setting u = 0 (dataset with only non-periodic variables), yielding M = 1 and
v_1 = 0. The expectation-maximization process to estimate Θ is described as

E-step:

\gamma_{n,m,i} = \frac{\pi_i \mathcal{N}(\xi_n - v_m | \mu_i, \Sigma_i)}{\sum_{k=1}^{N} \pi_k \mathcal{N}(\xi_n - v_m | \mu_k, \Sigma_k)}.

M-step:

\pi_i = \frac{\sum_{n} \sum_{m} \gamma_{n,m,i}}{P}, \qquad \mu_i = \frac{\sum_{n} \sum_{m} \gamma_{n,m,i} (\xi_n - v_m)}{\sum_{n} \sum_{m} \gamma_{n,m,i}},

\Sigma_i = \frac{\sum_{n} \sum_{m} \gamma_{n,m,i} (\xi_n - \mu_i - v_m)(\xi_n - \mu_i - v_m)^{\top}}{\sum_{n} \sum_{m} \gamma_{n,m,i}}.
The tiling process can be computed very efficiently by transforming the loops in the equations above into matrix operations, which speeds up the computation time by several orders of magnitude. After estimation of the model parameters, a wrapped GMR (WGMR) algorithm can be defined with

\hat{\xi}^{O}_{n} = \sum_{i=1}^{N} \sum_{m=1}^{M} \gamma^{I}_{n,m,i} \left[ \mu^{O}_{i} + \Sigma^{OI}_{i} (\Sigma^{I}_{i})^{-1} (\xi^{I}_{n} - v^{I}_{m} - \mu^{I}_{i}) \right],    (5.13)

where \gamma^{I}_{n,m,i} and v^{I}_{m} are based on the input variable dimensions, namely

\gamma^{I}_{n,m,i} = \frac{\pi_i \mathcal{N}(\xi^{I}_{n} - v^{I}_{m} | \mu^{I}_{i}, \Sigma^{I}_{i})}{\sum_{k=1}^{N} \pi_k \mathcal{N}(\xi^{I}_{n} - v^{I}_{m} | \mu^{I}_{k}, \Sigma^{I}_{k})}.

For a periodic movement modeled by DMP, we define the input and output partitions as (for GMM_2)
\xi_n = \begin{bmatrix} \phi_n \\ f_n \end{bmatrix}, \quad \mu_i = \begin{bmatrix} \mu^{\phi}_{i} \\ \mu^{f}_{i} \end{bmatrix}, \quad \Sigma_i = \begin{bmatrix} \Sigma^{\phi}_{i} & \Sigma^{\phi f}_{i} \\ \Sigma^{f \phi}_{i} & \mathrm{diag}(\sigma^{f}_{i}) \end{bmatrix}.

Similarly to the discrete motion, the dataset is composed of the phase variable and external force as [φ, f_target]^⊤. The forcing term data are periodic with respect to the phase variable (modulo 2π), which is used as the input variable in WGMR to estimate the external forcing term, see Fig. 5.1.
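The wrapped density of Eq. (5.12) amounts to summing ordinary Gaussian densities over the lattice; a minimal sketch for one component is given below. The lattice in the example corresponds to a 2D dataset whose first dimension is a phase angle with u = [1, 0], an assumption made only for this illustration.

    import numpy as np

    def wrapped_gauss_pdf(xi, mu, sigma, offsets):
        # N_W of Eq. (5.12): replicate the Gaussian at every offset v_m
        d = xi.shape[0]
        norm = np.sqrt((2*np.pi)**d * np.linalg.det(sigma))
        total = 0.0
        for v in offsets:
            diff = xi - v - mu
            total += np.exp(-0.5 * diff @ np.linalg.solve(sigma, diff)) / norm
        return total

    # three offset vectors: v_1 = [-2*pi, 0], v_2 = 0, v_3 = [2*pi, 0]
    offsets = [np.array([-2*np.pi, 0.0]), np.zeros(2), np.array([2*np.pi, 0.0])]
    p = wrapped_gauss_pdf(np.array([0.1, 0.5]), np.zeros(2), np.eye(2), offsets)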
Table 5.1: Movements from the Human Motion Database Guerra-Filho and Biswas [2012] used in the experiment.

Movement              | Reaching¹ | Swing bat² | Walking³         | Clapping⁴
Number of dimensions  | 3         | 12         | 12               | 6
Number of datapoints  | 150       | 130        | 137 (per period) | 170 (per period)
Duration (sec.)       | 1.25      | 1.08       | 3.42             | 7.08
Number of periods     | -         | -          | 3                | 5

¹ HMD/Generalization/position/reach/reach0000220
² HMD/Cross-Validation/Subject 011/swing bat
³ HMD/Generalization/Position/Walk/walk0002
⁴ HMD/Splicing/Individual/clap
5.4 Experiment
The proposed approach is evaluated with two discrete and two periodic movements from the Human Motion Database (HMD) of the University of Texas at Arlington, described in Guerra-Filho and Biswas [2012]:
• Reaching an object, by considering the position of the right hand;
• Swing bat movement, by considering the positions of the right shoulder, elbow, wrist and hand;
• Walking movement, by considering the positions of the right and left ankles and toes;
• Clapping movement, by considering the positions of the right and left hands.
Table 5.1 provides a summary of the data used in the experiment, and Fig. 5.2 shows the motion trajectories. Sequences of the movements are presented in Figs. 5.3-5.6 for the reaching, swing bat, walking and clapping movements, respectively. For clarity, the trajectories of only some of the markers are depicted for each of the movements. Models with 2-20 basis functions are considered. τ is set equal to the duration of the movement. α_z and β_z are fixed so that the system is critically damped. GMR_1 and GMR_2 respectively refer to regression with diagonal (5.9) and full covariance matrices (5.10). The Matlab code provided in Ijspeert et al. [2013] is augmented with the proposed GMR training option.
Figure 5.2: Movement trajectories. For more than 3 dimensions, partial data are shown to facilitate readability (the displayed variables are indicated in the legends).
The root mean square (RMS) error between the demonstrated and reproduced trajectories is used to compare the learning approaches. Fig. 5.7 presents the results as a function of the number of basis functions. For a given number of basis functions, the GMR strategy outperforms the WLS approach, especially when the number of basis functions is small. As expected, GMR_2 outperforms GMR_1, since the synergy patterns in the movement can be encapsulated in the covariance structure. However, for a given number of basis functions, GMR_2 requires storing more parameters than GMR_1. The next section addresses this issue to provide fair comparisons.
5.4.1 Comparison for equal number of parameters stored in the models
For a given number of basis functions, more parameters are stored in GMR_1 and GMR_2 than in WLS, see Fig. 5.8. For a fair comparison, the performance is here evaluated for an equal number of parameters to be stored by the models. For N basis functions and D dimensions, the numbers of parameters to be stored by WLS, GMR_1 and GMR_2 are given by N(2 + D), N(3 + D) − 1 and N(3 + 2D) − 1, respectively.³ Therefore, for a given number of parameters to be stored, the different approaches will result in a different number of basis functions.
³ Only the signal retrieved by GMR is considered here (not the full distribution).
Figure 5.3: Sequences of the reaching movement. The trajectories of the two markers on the right hand are depicted for clarity.
Fig. 5.9 presents the results as a function of the number of parameters to be stored. For a given number of parameters, we can see that GMR_2 outperforms the other approaches in most cases.
5.5 Summary and conclusion
GMR presents an interesting alternative to the representation and estimation of parameters in dynamic movement primitives. The approach was tested with several discrete and periodic trajectories of different dimensions and characteristics. After extending the GMR formulation to periodic movements, we showed that EM could be used in both discrete and periodic settings to learn the basis functions of
the DMP from the data, decreasing the RMS reconstruction error of the model. By using GMR with full covariance matrices (GMR_2), we then showed that the covariance information between the phase variable and the multidimensional force profile could also be encapsulated in the model, meaning that couplings between different DOFs could also be taken into account. The proposed GMR approach to DMP encoding opens up a host of new roads that we started to explore, such as exploiting the variability of multiple demonstrations to regulate stiffness (e.g., to determine the directions in which parts of the movement need to be reproduced precisely Calinon et al. [2014]), or to generate movements with a variability that follows the essential characteristics of the task (e.g., by considering full matrices also in the output part of the covariances). The proposed encoding as a standard mixture model allows the creation of robust mechanisms to study and adapt gestures to new situations Calinon et al. [2012a, 2013], and to estimate the number of required basis functions Bruno et al. [2013]. Finally, it creates promising links to subspace clustering and spectral learning approaches Shi et al. [2009]; Hsu and Kakade [2013].
Figure 5.4: Sequences of the walking movement. The trajectories of the two markers on the left foot are depicted for clarity.
Figure 5.5: Sequences of the swing bat movement. The trajectories of the two markers on the right hand are depicted for clarity.
Figure 5.6: Sequences of the clapping movement. The trajectories of the two markers on the left and right arms are depicted for clarity.
Figure 5.7: RMS errors as a function of the number of basis functions (components).
Figure 5.8: Number of parameters to be stored as a function of the number of basis functions (components), shown for 12-dimensional and 3-dimensional movements.
Figure 5.9: RMS errors as a function of the number of parameters to be stored.
Chapter 6
Concluding Remarks

6.1 Discussions and conclusions
In this thesis, new approaches were proposed to transfer skills from humans to robots in a safe and efficient way. Most tasks are modulated by some task parameters, such as the position of an object. We consider the category of motions in which the tasks are modulated by such task parameters or style variables. For the purposes of this thesis, only tasks are taken into account for which the task parameters can be expressed as coordinate systems (frames of reference). The task-parameterized Gaussian mixture model (TP-GMM) is a suitable solution for learning this kind of task. TP-GMM was successfully used to transfer several skills to different robotic platforms and yielded satisfactory results. It has good generalization properties compared to the state-of-the-art methods. As an extension of the original TP-GMM approach, the case in which task parameters are partially observable during the demonstration and/or reproduction was investigated as the main contribution of the thesis. In other words, the approach was modified to be applicable to cases in which the information regarding some of the task parameters is missing. The idea is to consider only the task parameters that are accessible during the demonstration and reproduction, so that this observability is taken into account in the learning of the model. Whenever a task parameter is not present during the demonstration, its corresponding data are not considered in the calculation of the model parameters. A similar procedure is followed for the reproduction part. Prior to starting the reproduction, the observability of the frames of reference is determined, and if any of them are not accessible, their effect on the calculation of the estimate of the output variable is simply ignored. The proposed approach is very simple and effective. It has been tested in a dust sweeping scenario, in which the number of dust areas to be cleaned differs between the demonstration and the reproduction. In other words, some of the frames of reference are not observable. The approach presented a natural
behaviour in the experiment. In the case in which only the robot frame of reference is available, the robot performs an average movement. The second contribution of this thesis is related to the study of dynamic movement primitives using mixture models. The mixture model is used to replace the conventional locally weighted regression (LWR) approach for fitting a DMP to the given trajectory data. In this way, a GMM is used to model the relation between the phase variable and the external force of the DMP model. By introducing a mixture model as the core of the DMP, the problem of pre-determining the number of basis functions and their corresponding parameters (centres and widths) can be circumvented. Compared to the conventional LWR approach, mixture models need a smaller number of basis functions in order to generate similar behaviour in terms of the RMS error of the trajectory data. There are also other advantages of using mixture models, such as their suitability for handling multiple demonstrations and for applying optimal control strategies such as the minimal intervention principle, see Calinon et al. [2014].
6.2 Future Perspectives
This thesis opens roads for further issues to be investigated and potential improvements to be made for each of the proposed approaches. For the task-parameterized GMM approach, the following issues could be considered as future research problems:
• In the proposed approach, only those tasks are taken into account for which the task parameters can be expressed as coordinate frames. This can be a limiting factor for the broader family of tasks for which the task parameters cannot easily be expressed in this specific way.
• The current formulation could be modified in order to make it computationally more efficient. Parallel computation techniques could be investigated to optimize the computations.
• The reverse problem of identifying the task parameters b, A given a demonstration is not straightforward with the proposed approach and needs further investigation.
• The frames of reference considered in the experiments were defined by Cartesian coordinate systems. The approach, however, does not require this assumption to be made, and the candidate frames do not need to have the same dimensions. Non-square A matrices could also be considered, which could for example be exploited to learn skills requiring joint space and task space coordinates. Also, this thesis focused on acquiring trajectories. A direction for the future
work is to study whether such models could similarly learn reflex behaviours that are shaped by task parameters (including force signals), which would be relevant for robots faced with continuous sources of perturbation.
In the case of learning by demonstration with partially observable task parameters, the following could be lines of future research:
• The model could be modified in such a way as to consider different kinds of behaviour when different frames are missing. For example, in the dust sweeping example, when the frame corresponding to one of the dust areas is missing, it is expected that the task should still be performed, while when both of the dust areas or the dust pan are missing, it would be desirable for the system not to do anything and stay put (assuming that there is nothing to sweep). This change in the behaviour of the system should, of course, be derived from the demonstrations that are provided by the user.
• Other ways of handling missing frames could be studied as well. For example, one solution could be to replace each of the missing frames with a random position for the frame and a large transformation matrix (setting large values for the eigenvalues), or, alternatively, to replace it with the average of the corresponding frames from different demonstrations. Other approaches such as matrix and higher-order tensor completion methods could be considered as well.
• The case in which a higher number of task parameters is present during the reproduction than during the demonstration could also be investigated. The current approach does not take this into account; however, this could be treated by combining the approach with other tools such as reinforcement learning techniques. By doing so, the approach would become even more powerful in terms of generalization.
• Other types of representations could also be investigated to encode tasks and skills. In other words, other PbD approaches could be extended to cope with the partially observable task parameters problem. It would be interesting to compare the performance of different methods using various measures of performance such as generalizability, computation time, and so on.
Finally, the potential future works for the DMP learned with mixture models approach are as follows:
• The performance of the proposed approach should be investigated in the case of multiple demonstrations. In this way, the stiffness pattern of the movement could be extracted and the performance could be evaluated and compared to the conventional approaches.
• Several motor control approaches such as the minimal intervention principle could be investigated with the proposed approach. The performance of the method could be compared to the other approaches for the case of disturbance rejection, for example.
Appendix A
Publications of the Author

This appendix lists the publications of the author during his doctoral studies. Each publication is briefly described, with comments on the connections to this thesis. The publications are listed according to the date of their publication.
Peer-reviewed proceedings

Workshop paper: "Teaching of bimanual skills in a compliant humanoid robot", 5th International Workshop on Human-Friendly Robotics (HFR), Brussels, Belgium, 18-19 October, 2012 Calinon et al. [2012b].
Excerpt: A learning by imitation approach is proposed based on a superposition of virtual springdamper systems to drive a humanoid robot’s movement. The novelty of the method relies on a statistical description of the springs attractor points acting in different candidate frames of reference. The proposed approach combines the practical convenience of employing dynamical systems in the humanoid’s versatile environment with the generality and rigor of statistical machine learning. The robot exploits local variability information extracted from multiple demonstrations to determine which frames are relevant for the task, and how the movement should be modulated with respect to these frames. The proposed learning approach is achieved by predefining a set of candidate frames of reference, such as objects in the environment or relevant body parts such as the end-effectors of the robot. The role of the robot is to autonomously figure out which frames of reference matter along the task, and in which way the movement should be modulated with respect to these different frames. Bimanual coordination is achieved by considering the other hand as a candidate frame of reference for the re87
production of the skill. The movement of the two hands are thus naturally coupled for parts of the movement in which regular patterns have been observed between the two hands. The strength of the coupling constraint is automatically adapted with respect to the variations observed throughout the task.
Conference paper: "Statistical dynamical systems for skills acquisition in humanoids", IEEE International Conference on Humanoid Robots (Humanoids), Osaka, Japan, 28-30 November, 2012 Calinon et al. [2012a].
Abstract: Learning by imitation in humanoids is challenging due to the unpredictable environments these robots have to face during reproduction. Two sets of tools are relevant for this purpose: 1) probabilistic machine learning methods that can extract and exploit the regularities and important features of the task; and 2) dynamical systems that can cope with perturbation in real-time without having to replan the whole movement. We present a learning by imitation approach combining the two benefits. It is based on a superposition of virtual spring-damper systems to drive a humanoid robot's movement. The method relies on a statistical description of the springs attractor points acting in different candidate frames of reference. It extends dynamic movement primitives models by formulating the dynamical systems parameters estimation problem as a Gaussian mixture regression problem with projection in different coordinate systems. The robot exploits local variability information extracted from multiple demonstrations of movements to determine which frames are relevant for the task, and how the movement should be modulated with respect to these frames. The approach is tested on the new prototype of the COMAN compliant humanoid with time-based and time-invariant movements, including bimanual coordination skills.
Conference paper: "On improving the extrapolation capability of task-parameterized movement models", IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Tokyo, Japan, 3-7 November, 2013 (CoTeSys cognitive robotics best paper award finalist) Calinon et al. [2013].
Abstract: Gestures are characterized by intermediary or final landmarks (real or virtual) in task space or joint space that can change during the course of the motion, and that are described by varying accuracy and correlation constraints. Generalizing these trajectories in robot learning by imitation is challenging, because of the small number of demonstrations provided by the user. We present an approach to statistically encode movements in a task-parameterized mixture model, and derive an
expectation-maximization (EM) algorithm to train it. The model automatically extracts the relevance of candidate coordinate systems during the task, and exploits this information during reproduction to adapt the movement in real-time to changing position and orientation of landmarks or objects. The approach is tested with a robotic arm learning to roll out a pizza dough. It is compared to three categories of task-parameterized models: 1) Gaussian process regression (GPR) with a trajectory models database; 2) Multi-streams approach with models trained in several frames of reference; and 3) Parametric Gaussian mixture model (PGMM) modulating the Gaussian centers with the task parameters. We show that the extrapolation capability of the proposed approach outperforms existing methods, by extracting the local structures of the task instead of relying on interpolation principles.
Conference paper: T. Alizadeh, S. Calinon, and D. G. Caldwell, "Learning from Demonstrations with Partially Observable Task Parameters", IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, China, May 31-June 5, 2014 Alizadeh et al. [2014].
Abstract: Robot learning from demonstrations requires the robot to learn and adapt movements to new situations, often characterized by position and orientation of objects or landmarks in the robot’s environment. In the task-parameterized Gaussian mixture model (TP-GMM) framework, the movements are considered to be modulated with respect to a set of candidate frames of reference (coordinate systems) attached to a set of objects in the robot workspace. Following a similar approach, this paper addresses the problem of having missing candidate frames during the demonstrations and reproductions, which can happen in various situations such as visual occlusion, sensor unavailability, or tasks with a variable number of descriptive features. We study this problem with a dust sweeping task in which the robot requires to consider a variable amount of dust areas to clean for each reproduction trial.
Appendix B
Properties of Gaussian distributions

This appendix provides the formulations for the linear combination, product and conditional probability of Gaussian distributions, which are used in the thesis, as depicted in Fig. B.1.
Figure B.1: Properties of Gaussian distributions exploited in the proposed approaches. From left to right: product, linear combination, and conditional probability (reproduced from Calinon [2009]).
Linear combinations of Gaussian distributions

Given two variables x_1 and x_2 that are distributed according to multivariate Gaussian densities with the following parameters
x_1 ∼ N(µ_1, Σ_1) and x_2 ∼ N(µ_2, Σ_2),
then the variable x_3 = C_1 x_1 + C_2 x_2 + d describes a linear combination, with C_1, C_2 being transformation matrices and d a shift vector, and x_3 is still distributed according to a multivariate Gaussian distribution
x_3 = C_1 x_1 + C_2 x_2 + d ∼ N(µ, Σ), with parameters

\mu = C_1 \mu_1 + C_2 \mu_2 + d, \qquad \Sigma = C_1 \Sigma_1 C_1^{\top} + C_2 \Sigma_2 C_2^{\top}.
Conditional probability properties of Gaussians

Let x ∼ N(µ, Σ) be defined by

x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \quad \mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \quad \Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}.

The conditional probability P(x_2 | x_1) is defined by
P(x_2 | x_1) ∼ N(µ_c, Σ_c), with

\mu_c = \mu_2 + \Sigma_{21} (\Sigma_{11})^{-1} (x_1 - \mu_1), \qquad \Sigma_c = \Sigma_{22} - \Sigma_{21} (\Sigma_{11})^{-1} \Sigma_{12}.
Product of Gaussians

Given two variables x_1 and x_2 that are distributed according to multivariate Gaussian densities with the following parameters
x_1 ∼ N(µ_1, Σ_1) and x_2 ∼ N(µ_2, Σ_2),

then the product of these two Gaussian distributions is defined by
c \, \mathcal{N}(\mu_P, \Sigma_P) = \mathcal{N}(\mu_1, \Sigma_1) \cdot \mathcal{N}(\mu_2, \Sigma_2), \quad \text{with}

c = \mathcal{N}(\mu_1 | \mu_2, \Sigma_1 + \Sigma_2),

\Sigma_P = \left( \Sigma_1^{-1} + \Sigma_2^{-1} \right)^{-1},

\mu_P = \Sigma_P \left( \Sigma_1^{-1} \mu_1 + \Sigma_2^{-1} \mu_2 \right).
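As a quick check of the product property above, the parameters µ_P and Σ_P can be computed as follows (a minimal NumPy sketch with arbitrary example values):

    import numpy as np

    def gaussian_product_params(mu1, sigma1, mu2, sigma2):
        # Sigma_P = (Sigma_1^-1 + Sigma_2^-1)^-1, mu_P = Sigma_P (L1 mu1 + L2 mu2)
        lam1, lam2 = np.linalg.inv(sigma1), np.linalg.inv(sigma2)
        sigma_p = np.linalg.inv(lam1 + lam2)
        mu_p = sigma_p @ (lam1 @ mu1 + lam2 @ mu2)
        return mu_p, sigma_p

    mu_p, sigma_p = gaussian_product_params(np.zeros(2), np.eye(2),
                                            np.ones(2), 2 * np.eye(2))
    print(mu_p, sigma_p)   # the product centre lies between the two means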
Bibliography Adorno, B. V., Fraisse, P., and Druon, S. (2010). Dual position control strategies using the cooperative dual task-space framework. In Intelligent Robots and Systems (IROS), 2010 IEEE/RSJ International Conference on, pages 3955–3960. Agiomyrgiannakis, Y. and Stylianou, Y. (2009). Wrapped gaussian mixture models for modeling and high-rate quantization of phase data of speech. Audio, Speech, and Language Processing, IEEE Transactions on, 17(4):775–786. Alissandrakis, A., Nehaniv, C. L., and Dautenhahn, K. (2006). Action, state and effect metrics for robot imitation. In IEEE Intl Symposium on Robot and Human Interactive Communication (RoMan), pages 232–237, Hatfield, UK. Alizadeh, T., Calinon, S., and Caldwell, D. G. (2014). Learning from demonstrations with partially observable task parameters. In Proc. IEEE Intl Conf. on Robotics and Automation (ICRA), pages 3309–3314, Hong Kong, China. Argall, B. D., Chernova, S., Veloso, M., and Browning, B. (2009). A survey of robot learning from demonstration. Robot. Auton. Syst., 57(5):469–483. Atkeson, C. G. and Schaal, S. (1997a). Learning tasks from a single demonstration. In Proc. IEEE/RSJ Intl Conf. on Robotics and Automation (ICRA), volume 2, pages 1706–1712. Atkeson, C. G. and Schaal, S. (1997b). Robot learning from demonstration. In ICML, volume 97, pages 12–20. Baek, J., McLachlan, G. J., and Flack, L. K. (2010). Mixtures of factor analyzers with common factor loadings: Applications to the clustering and visualization of high-dimensional data. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 32(7):1298–1309. Bentivegna, D. C., Atkeson, C. G., and Cheng, G. (2004). Learning tasks from observation and practice. Robotics and Autonomous Systems, 47(2-3):163–169. 95
Billard, A., Calinon, S., Dillmann, R., and Schaal, S. (2008). Robot programming by demonstration. In Siciliano, B. and Khatib, O., editors, Handbook of Robotics, pages 1371–1394. Springer, Secaucus, NJ, USA. Billing, E. A. and Hellstr¨om, T. (2010). A formalism for learning from demonstration. Journal of Behevioral Robotics, 1(1):1–13. Bizzi, E., Cheung, V., d’Avella, A., Saltiel, P., and Tresch, M. (2008). Combining modules for movement. Brain Research Reviews, 57(1):125–133. Bouveyron, C. and Brunet-Saumard, C. (2012). Model-based clustering of high-dimensional data: A review. Computational Statistics & Data Analysis. Brand, M. and Hertzmann, A. (2000). Style machines. In Proc. ACM Intl Conf. on Computer graphics and Interactive Techniques (SIGGRAPH), pages 183–192, New Orleans, Louisiana, USA. Bruno, D., Calinon, S., and Caldwell, D. G. (2013). Bayesian nonparametric multi-optima policy search in reinforcement learning. In AAAI Conference on Artificial Intelligence, Bellevue, Washington, USA. Calinon, S. (2009). Robot Programming by Demonstration: A Probabilistic Approach. EPFL/CRC Press. EPFL Press ISBN 978-2-940222-31-5, CRC Press ISBN 978-1-4398-0867-2. Calinon, S., Alizadeh, T., and Caldwell, D. G. (2013). On improving the extrapolation capability of task-parameterized movement models. In Proc. IEEE/RSJ Intl Conf. on Intelligent Robots and Systems (IROS), Tokyo, Japan. Calinon, S. and Billard, A. (2009). Statistical learning by imitation of competing constraints in joint space and task space. Advanced Robotics, 23(15):2059–2076. Calinon, S., Bruno, D., and Caldwell, D. G. (2014). A task-parameterized probabilistic model with minimal intervention control. In Proc. IEEE Intl Conf. on Robotics and Automation (ICRA), Hong Kong, China. Calinon, S., D’halluin, F., Caldwell, D. G., and Billard, A. G. (2009). Handling of multiple constraints and motion alternatives in a robot programming by demonstration framework. In Proc. IEEE-RAS Intl Conf. on Humanoid Robots (Humanoids), pages 582–588, Paris, France. Calinon, S., D’halluin, F., Sauser, E. L., Caldwell, D. G., and Billard, A. G. (2010a). Learning and reproduction of gestures by imitation: An approach based on hidden Markov model and Gaussian mixture regression. IEEE Robotics and Automation Magazine, 17(2):44–54.
Calinon, S., Guenter, F., and Billard, A. (2007). On learning, representing and generalizing a task in a humanoid robot. IEEE Trans. on Systems, Man and Cybernetics, Part B, 37(2):286–298.
Calinon, S., Li, Z., Alizadeh, T., Tsagarakis, N. G., and Caldwell, D. G. (2012a). Statistical dynamical systems for skills acquisition in humanoids. In Proc. IEEE Intl Conf. on Humanoid Robots (Humanoids), pages 323–329, Osaka, Japan.
Calinon, S., Li, Z., Alizadeh, T., Tsagarakis, N. G., and Caldwell, D. G. (2012b). Teaching of bimanual skills in a compliant humanoid robot. In Intl Workshop on Human-Friendly Robotics (HFR).
Calinon, S., Sardellitti, I., and Caldwell, D. G. (2010b). Learning-based control strategy for safe human-robot interaction exploiting task and robot redundancies. In Proc. IEEE/RSJ Intl Conf. on Intelligent Robots and Systems (IROS), pages 249–254, Taipei, Taiwan.
Cederborg, T., Ming, L., Baranes, A., and Oudeyer, P.-Y. (2010). Incremental local online Gaussian mixture regression for imitation learning of multiple tasks. In Proc. IEEE/RSJ Intl Conf. on Intelligent Robots and Systems (IROS), Taipei, Taiwan.
Delson, N. and West, H. (1996). Robot programming by human demonstration: Adaptation and inconsistency in constrained motion. In Proc. IEEE Intl Conf. on Robotics and Automation (ICRA), pages 30–36, Minneapolis, MN, USA.
Dong, S. and Williams, B. (2012). Learning and recognition of hybrid manipulation motions in variable environments using probabilistic flow tubes. International Journal of Social Robotics, 4(4):357–368.
Ekvall, S. and Kragic, D. (2006). Learning task models from multiple human demonstrations. In Proc. IEEE Intl Symposium on Robot and Human Interactive Communication (RO-MAN), pages 358–363, Hatfield, UK.
Flash, T. and Hochner, B. (2005). Motor primitives in vertebrates and invertebrates. Current Opinion in Neurobiology, 15(6):660–666.
Forte, D., Gams, A., Morimoto, J., and Ude, A. (2012). On-line motion synthesis and adaptation using a trajectory database. Robotics and Autonomous Systems, 60(10):1327–1339.
Ghahramani, Z. and Jordan, M. I. (1994). Supervised learning from incomplete data via an EM approach. In Cowan, J. D., Tesauro, G., and Alspector, J., editors, Advances in Neural Information Processing Systems, volume 6, pages 120–127. Morgan Kaufmann Publishers, Inc.
Gribovskaya, E. and Billard, A. (2008). Combining dynamical systems control and programming by demonstration for teaching discrete bimanual coordination tasks to a humanoid robot. In Proc. ACM/IEEE Intl Conf. on Human-Robot Interaction (HRI).
Grimes, D. B., Chalodhorn, R., and Rao, R. P. N. (2006). Dynamic imitation in a humanoid robot through nonparametric probabilistic inference. In Proc. Robotics: Science and Systems (RSS), pages 1–8.
Guerra-Filho, G. and Biswas, A. (2012). The human motion database: A cognitive and parametric sampling of human motion. Image and Vision Computing, 30(3):251–261.
Herzog, D., Krüger, V., and Grest, D. (2008). Parametric hidden Markov models for recognition and synthesis of movements. In British Machine Vision Conference, pages 163–172.
Hoffmann, H., Pastor, P., Park, D. H., and Schaal, S. (2009). Biologically-inspired dynamical systems for movement generation: automatic real-time goal adaptation and obstacle avoidance. In Proc. IEEE Intl Conf. on Robotics and Automation (ICRA), pages 2587–2592.
Hogan, N. and Sternad, D. (2012). Dynamic primitives of motor behavior. Biological Cybernetics, 106(11-12):727–739.
Hsu, D. and Kakade, S. M. (2013). Learning mixtures of spherical Gaussians: moment methods and spectral decompositions. In Proceedings of the 4th Conference on Innovations in Theoretical Computer Science, pages 11–20. ACM.
Ijspeert, A. J., Nakanishi, J., Hoffmann, H., Pastor, P., and Schaal, S. (2013). Dynamical movement primitives: Learning attractor models for motor behaviors. Neural Computation, 25(2):328–373.
Ijspeert, A. J., Nakanishi, J., and Schaal, S. (2003). Learning control policies for movement imitation and movement recognition. In Neural Information Processing Systems (NIPS), volume 15, pages 1547–1554.
Jansen, B. and Belpaeme, T. (2006). A computational model of intention reading in imitation. Robotics and Autonomous Systems, 54(5):394–402.
Kober, J., Wilhelm, A., Oztop, E., and Peters, J. (2012). Reinforcement learning to adjust parametrized motor primitives to new situations. Autonomous Robots, 33(4):361–379.
Kronander, K., Khansari-Zadeh, M. S., and Billard, A. (2011). Learning to control planar hitting motions in a minigolf-like task. In Proc. IEEE/RSJ Intl Conf. on Intelligent Robots and Systems (IROS), pages 710–717.
Krüger, V., Herzog, D. L., Baby, S., Ude, A., and Kragic, D. (2010). Learning actions from observations: Primitive-based modeling and grammar. IEEE Robotics and Automation Magazine, 17(2):30–43.
Kuniyoshi, Y., Inaba, M., and Inoue, H. (1989). Teaching by showing: Generating robot programs by visual observation of human performance. In Proc. Intl Symposium of Industrial Robots, pages 119–126, Tokyo, Japan.
Latash, M. L., Scholz, J. P., and Schöner, G. (2002). Motor control strategies revealed in the structure of motor variability. Exercise and Sport Sciences Reviews, 30(1):26–31.
Li, Z., Vanderborght, B., Tsagarakis, N. G., Colasanto, L., and Caldwell, D. G. (2012). Stabilization for the compliant humanoid robot COMAN exploiting intrinsic and controlled compliance. In Proc. IEEE Intl Conf. on Robotics and Automation (ICRA), pages 2000–2006.
Lieberman, H. (2002). Art imitates life: Programming by example as an imitation game. In Nehaniv, C. and Dautenhahn, K., editors, Imitation in Animals and Artifacts, pages 157–169. MIT Press.
Lozano-Perez, T. (1983). Robot programming. Proceedings of the IEEE, 71(7):821–841.
Mardia, K. V. and Jupp, P. E. (2009). Directional Statistics, volume 494. Wiley.
Matsubara, T., Hyon, S.-H., and Morimoto, J. (2010). Learning stylistic dynamic movement primitives from multiple demonstrations. In Proc. IEEE/RSJ Intl Conf. on Intelligent Robots and Systems (IROS), pages 1277–1283.
Matsubara, T., Hyon, S.-H., and Morimoto, J. (2011). Learning parametric dynamic movement primitives from multiple demonstrations. Neural Networks, 24(5):493–500.
Matsubara, T., Hyon, S.-H., and Morimoto, J. (2012). Real-time stylistic prediction for whole-body human motions. Neural Networks, 25:191–199.
McLachlan, G. J., Peel, D., and Bean, R. (2003). Modelling high-dimensional data by mixtures of factor analyzers. Computational Statistics & Data Analysis, 41(3):379–388.
McNicholas, P. D. and Murphy, T. B. (2008). Parsimonious Gaussian mixture models. Statistics and Computing, 18(3):285–296.
Mendel, J. and McLaren, R. (1970). Reinforcement-learning control and pattern recognition systems. Mathematics in Science and Engineering, 66:287–318.
Muehlig, M., Gienger, M., Hellbach, S., Steil, J. J., and Goerick, C. (2009). Task-level imitation learning using variance-based movement optimization. In Proc. IEEE Intl Conf. on Robotics and Automation (ICRA), pages 1635–1642.
Muench, S., Kreuziger, J., Kaiser, M., and Dillmann, R. (1994). Robot programming by demonstration (RPD) - Using machine learning and user interaction methods for the development of easy and comfortable robot programming systems. In Proc. Intl Symposium on Industrial Robots (ISIR), pages 685–693, Hannover, Germany.
Mühlig, M., Gienger, M., and Steil, J. J. (2012). Interactive imitation learning of object movement skills. Autonomous Robots, 32(2):97–114.
Mussa-Ivaldi, F. A. (1992). From basis functions to basis fields: vector field approximation from sparse data. Biological Cybernetics, 67(6):479–489.
Nguyen-Tuong, D. and Peters, J. (2008). Local Gaussian process regression for real-time model-based robot control. In Proc. IEEE/RSJ Intl Conf. on Intelligent Robots and Systems (IROS), pages 380–385.
Nicolescu, M. N. and Mataric, M. J. (2003). Natural methods for robot task learning: Instructive demonstrations, generalization and practice. In Proc. Intl Joint Conf. on Autonomous Agents and Multiagent Systems (AAMAS), pages 241–248, Melbourne, Australia.
Ogawara, K., Takamatsu, J., Kimura, H., and Ikeuchi, K. (2003). Extraction of essential interactions through multiple observations of human demonstrations. IEEE Trans. on Industrial Electronics, 50(4):667–675.
Pardowitz, M., Zoellner, R., Knoop, S., and Dillmann, R. (2007). Incremental learning of tasks from user demonstrations, past experiences and vocal comments. IEEE Trans. on Systems, Man and Cybernetics, Part B, 37(2):322–332.
Peters, J., Vijayakumar, S., and Schaal, S. (2003). Reinforcement learning for humanoid robotics. In Proc. IEEE Intl Conf. on Humanoid Robots (Humanoids), Karlsruhe, Germany.
Rabiner, L. and Juang, B. (1986). An introduction to hidden Markov models. IEEE ASSP Magazine, 3(1):4–16.
Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257–285.
Rasmussen, C. E. and Williams, C. K. I. (2006). Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA, USA.
Khansari-Zadeh, S. M. and Billard, A. (2011). Learning stable nonlinear dynamical systems with Gaussian mixture models. IEEE Transactions on Robotics, 27(5):943–957.
Sato, T., Genda, Y., Kubotera, H., Mori, T., and Harada, T. (2003). Robot imitation of human motion based on qualitative description from multiple measurement of human and environmental data. In Proc. IEEE/RSJ Intl Conf. on Intelligent Robots and Systems (IROS), volume 3, pages 2377–2384.
Schaal, S. (1999). Is imitation learning the route to humanoid robots? Trends in Cognitive Sciences, 3(6):233–242.
Schaal, S. and Atkeson, C. G. (1998). Constructive incremental learning from only local information. Neural Computation, 10(8):2047–2084.
Schaal, S., Ijspeert, A., and Billard, A. (2003). Computational approaches to motor learning by imitation. Philosophical Transactions of the Royal Society of London: Series B, Biological Sciences, 358(1431):537–547.
Schaal, S., Mohajerian, P., and Ijspeert, A. J. (2007). Dynamics systems vs. optimal control: a unifying view. Progress in Brain Research, 165:425–445.
Schneider, M. and Ertel, W. (2010). Robot learning by demonstration with local Gaussian process regression. In Proc. IEEE/RSJ Intl Conf. on Intelligent Robots and Systems (IROS), pages 255–260.
Scholz, J. P. and Schöner, G. (1999). The uncontrolled manifold concept: identifying control variables for a functional task. Experimental Brain Research, 126(3):289–306.
Scott Kelso, J. A. (2009). Synergies: Atoms of brain and behavior. Progress in Motor Control, 629:83–91.
Segre, A. B. and DeJong, G. (1985). Explanation-based manipulator learning: Acquisition of planning ability through observation. In Proc. IEEE Intl Conf. on Robotics and Automation (ICRA), pages 555–560, St. Louis, MO, USA.
Shi, T., Belkin, M., and Yu, B. (2009). Data spectroscopy: Eigenspaces of convolution operators and clustering. The Annals of Statistics, pages 3960–3984.
Signoretto, M., Van de Plas, R., De Moor, B., and Suykens, J. A. (2011). Tensor versus matrix completion: a comparison with application to spectral data. IEEE Signal Processing Letters, 18(7):403–406.
Silva, B. D., Konidaris, G., and Barto, A. G. (2012). Learning parameterized skills. In Proceedings of the 29th International Conference on Machine Learning (ICML-12), pages 1679–1686.
Sternad, D., Abe, M. O., Hu, X., and Müller, H. (2011). Neuromotor noise, error tolerance and velocity-dependent costs in skilled performance. PLoS Computational Biology, 7(9):e1002159.
Sternad, D., Park, S.-W., Mueller, H., and Hogan, N. (2010). Coordinate dependence of variability analysis. PLoS Computational Biology, 6(4):1–16.
Stulp, F., Raiola, G., Hoarau, A., Ivaldi, S., and Sigaud, O. (2013). Learning compact parameterized skills with a single regression. In Proc. IEEE Intl Conf. on Humanoid Robots (Humanoids), pages 1–7, Atlanta, USA.
Sutton, R. S. and Barto, A. G. (1998). Reinforcement Learning: An Introduction. Adaptive Computation and Machine Learning. MIT Press, Cambridge, MA, USA.
Tipping, M. E. and Bishop, C. M. (1999). Mixtures of probabilistic principal component analyzers. Neural Computation, 11(2):443–482.
Todorov, E. and Jordan, M. I. (2002). Optimal feedback control as a theory of motor coordination. Nature Neuroscience, 5:1226–1235.
Tsagarakis, N. G., Li, Z., Saglia, J., and Caldwell, D. G. (2011). The design of the lower body of the compliant humanoid robot cCub. In Proc. IEEE Intl Conf. on Robotics and Automation (ICRA), pages 2035–2040.
Ude, A. (1993). Trajectory generation from noisy positions of object features for teaching robot paths. Robotics and Autonomous Systems, 11(2):113–127.
Ude, A., Gams, A., Asfour, T., and Morimoto, J. (2010). Task-specific generalization of discrete and periodic dynamic movement primitives. IEEE Transactions on Robotics, 26(5):800–815.
Vijayakumar, S., D'souza, A., and Schaal, S. (2005). Incremental online learning in high dimensions. Neural Computation, 17(12):2602–2634.
Waltz, M. and Fu, K. (1965). A heuristic approach to reinforcement learning control systems. IEEE Transactions on Automatic Control, 10(4):390–398.
Wilson, A. D. and Bobick, A. F. (1999). Parametric hidden Markov models for gesture recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(9):884–900.
Yamazaki, T., Niwase, N., Yamagishi, J., and Kobayashi, T. (2005). Human walking motion synthesis based on multiple regression hidden semi-Markov model. In Proc. Intl Conf. on Cyberworlds, pages 445–452.
Yoshikai, T., Otake, N., Mizuuchi, I., Inaba, M., and Inoue, H. (2004). Development of an imitation behavior in humanoid Kenta with reinforcement learning algorithm based on the attention during imitation. In Proc. IEEE/RSJ Intl Conf. on Intelligent Robots and Systems (IROS), pages 1192–1197, Sendai, Japan.