Learning Stable Task Sequences from Demonstration with Linear Parameter Varying Systems and Hidden Markov Models Jos´e R. Medina and Aude Billard

Abstract: The problem of acquiring multiple tasks from demonstration is typically divided into two sequential processes: (1) the segmentation or identification of different subgoals/subtasks and (2) a separate learning process that parameterizes a control policy for each subtask. As a result, segmentation criteria typically neglect the characteristics of control policies and rely instead on simplified models. This paper aims for a single model capable of learning sequences of complex time-independent control policies that provide robust and stable behavior. To this end, we first present a novel and efficient approach to learn goal-oriented time-independent motion models by estimating both attractor and dynamic behavior from data, guaranteeing stability using linear parameter varying (LPV) systems. This method enables learning complex task sequences with hidden Markov models (HMMs), where each state/subtask is given by a stable LPV system and where transitions are most likely around the corresponding attractor. We study the dynamics of the HMM-LPV model and propose a motion generation method that guarantees the stability of task sequences. We validate our approach on two sets of demonstrated human motions. Keywords: Dynamical systems, Stability, Linear Parameter Varying Systems

1 Introduction

Learning from Demonstration (LfD) has become one of the most effective tools to enable robot autonomy. The majority of work in this field has been devoted to encoding demonstrations of a single task into a controller that synthesizes accurate, robust and generalizable robotic behavior. In this context, Dynamical Systems (DS) have become the most popular method due to their ability to provide behavioral guarantees. Specifically, if defined properly, DSs ensure convergence to a pre-specified attractor or goal (stability). Recently, many researchers have focused on extending these models from a single task to more complex sequences of tasks. A typical approach to this problem is a two-step process: (1) a segmentation algorithm is first applied to acquire potential subtask demonstrations and (2) a stable DS is learned from the resulting segments assuming that end-points determine the attractors. However, segmentation algorithms rely on oversimplified linear models that neglect the properties of the DS considered. In addition, assuming that the end-point of a potential segment is an attractor might not hold, e.g. when data does not exhibit convergence (a bad segmentation point). A unified way of representing multiple sequential control policies is to model the currently active subtask with a discrete latent variable, as in a Hidden Markov Model (HMM): the hidden state indicates the current subtask and each subtask is given by a different parameterization of a DS. This way, subtasks are learned jointly without explicitly assuming segmentation points. Following this idea, HMMs have been successfully applied with simple Gaussian [1], linear Gaussian [2] or potentially non-stable dynamical system policies [3]. However, learning HMMs with stable complex nonlinear subtasks becomes more challenging. In fact, the standard learning procedure for stable goal-oriented motions prevents learning such a unified model, as it requires knowing the attractor a priori.
In this work, we first present a novel method to learn asymptotically stable goal-oriented motions capable of estimating both the dynamic behavior and the attractor/goal from demonstrations. We parameterize motions as linear parameter varying (LPV) systems by means of Gaussian Mixture Models (GMMs). The most likely parameters are learned by applying the Expectation-Maximization (EM) algorithm, relying only on convex optimization programs. To represent complex task sequences, we formulate an HMM where each state/subtask is given by a stable LPV system and where transitions between subtasks are most likely around the corresponding attractor, which guarantees convergence of the full sequence. We validate this approach in a simulated experiment and on a set of recorded human motions.

1st Conference on Robot Learning (CoRL 2017), Mountain View, United States.

Related work: Learning autonomous DSs has been a recurrent problem since the early works of [4] and [5]. More recently, the SEDS approach [6] ensures asymptotic stability to a previously known attractor and is also applicable to second-order systems [7] and incremental learning settings [8]. All aforementioned methods assume that the attractor is known a priori and is not part of the estimation process. In [9], the attractors of Dynamic Motor Primitives are estimated based on a previously trained library assuming simplified linear models. In contrast, the learning method proposed in this work estimates the attractor and dynamic behavior only from data in a time-independent manner, with no phase variable and with no prior knowledge about the underlying primitive. A body of work has been devoted to learning task sequences. While some early works exploit mixtures of autonomous DSs to synthesize complex behavior [10], most algorithms rely on an initial segmentation step followed by a model learning phase. In [1], a segmentation algorithm provides potential primitives, which are learned and clustered by means of HMMs, and their sequence is determined by a graph. Other segmentation algorithms for manipulation rely on contact information [11, 12].
A more informed segmentation that considers changes of potential constraints is presented in [13], where hybrid position/force controllers are later fitted to the resulting segments. Although the sequencing policy is typically represented by a graph, more sophisticated conditions can be extracted from data. In [14], a set of pre- and post-conditions is identified, thereby learning a Petri net. Other works rely on linear policies [2] or directional normal distributions [15] as state emissions of HMMs [16] that consider state-dependent termination policies [17] for each state. The resulting model is later used to train movement primitives for each subtask. An interesting alternative is proposed in [18], where, given a set of potential segmentation points, the most probable ones are chosen depending on the model later used for control. In contrast to previous work, our proposed approach directly fits an HMM where each state is given by a stable LPV system. The capability of estimating the attractor from data enables a solution that considers all subtasks jointly given an initial guess.

The rest of this paper is organized as follows. Section 2 presents and evaluates the LPV model. Section 3 introduces and validates the HMM where each subtask is given by a stable LPV model. A discussion and concluding remarks are given in Section 4.

2 Learning asymptotically stable LPV systems with unknown attractor

Autonomous DSs represent motions as ξ̇ = f(ξ), where ξ ∈ R^n represents the system's state and f : R^n → R^n. A suitable option to encode nonlinear systems is given by a mixture of C local linear DSs

\dot{\xi} = -\sum_{c=1}^{C} h_c(\alpha)\,(A_c \xi + b_c), \qquad (1)

where −A_c ∈ R^{n×n} is the dynamics matrix, b_c the bias and α ∈ R^m is a measurable external parameter that defines the mixing coefficients h_c(α), with \sum_{c=1}^{C} h_c(\alpha) = 1 and h_c(α) > 0. In this case, if

A_c + A_c^{T} \succ 0, \quad \xi^* = -A_c^{-1} b_c \qquad \forall c = 1 \cdots C, \qquad (2)

the system asymptotically converges to the unique attractor ξ* [6]. This is a specific instance of a stable LPV system (if α = ξ it is denoted a quasi-LPV system) [19] and its parameterization can be estimated from data if the attractor is known a priori using GMMs [20] or fuzzy models [21, 22, 23]. However, when the attractor is unknown, the estimation of ξ* becomes a challenging problem due to the nonconvexity arising from (2). Instead of considering (1), the main idea behind the method presented in this section is to avoid this dependency by considering its inverse, i.e. the mixture of local inverse linear dynamical systems

\xi = \sum_{c=1}^{C} h_c(\alpha)\,(\xi^* - A_c^{-1}\dot{\xi}) = \xi^* - \sum_{c=1}^{C} h_c(\alpha)\, A_c^{-1}\dot{\xi}.

This alternative representation simplifies the identification of LPV and quasi-LPV systems when the attractor ξ* is unknown, as it becomes a bias in the model.

Problem formulation: Given a set of M i.i.d. samples {ξ_i, ξ̇_i, α_i}_{i=1}^{M}, the problem considered in this section is the estimation of the dynamic behavior represented in the observations, assuming that it is globally asymptotically stable and has a single unknown attractor ξ*.
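To make the mixture (1) and the shared-attractor condition (2) concrete, the following sketch simulates a two-component quasi-LPV system. It is illustrative Python/NumPy, not the paper's MATLAB implementation; the matrices, attractor and mixing function are all invented for the example, with the biases chosen so that condition (2) holds by construction.

```python
import numpy as np

# Shared attractor and two components with A_c + A_c^T > 0 (here symmetric PD).
xi_star = np.array([1.0, -2.0])
A = [np.array([[2.0, 0.5], [0.5, 1.0]]),
     np.array([[1.0, -0.3], [-0.3, 3.0]])]
b = [-Ac @ xi_star for Ac in A]            # bias implied by the shared attractor

def h(xi):
    """Toy state-dependent mixing weights: positive and summing to one."""
    w = np.array([np.exp(-np.linalg.norm(xi)), 1.0])
    return w / w.sum()

def f(xi):
    """Mixture dynamics of Eq. (1)."""
    hc = h(xi)
    return -sum(hc[c] * (A[c] @ xi + b[c]) for c in range(2))

# Forward-Euler rollout from a distant initial state.
xi = np.array([8.0, 6.0])
for _ in range(5000):
    xi = xi + 0.01 * f(xi)

print(np.linalg.norm(xi - xi_star))  # close to 0: the state converged to xi*
```

Because every component shares the attractor and has a positive definite symmetric part, any convex combination of the components is again stable, which is the intuition behind Proposition 1 below.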

2.1 Modeling asymptotically stable LPV systems with GMMs

The model presented in this section assumes that the joint distribution of α and the observed dynamic behavior, represented by ξ̇, ξ, is given by a GMM. Specifically, observations are distributed as

P\!\left(\begin{bmatrix}\alpha \\ \xi \\ \dot{\xi}\end{bmatrix};\theta\right) = \sum_{c=1}^{C} P(c;\theta)\, P\!\left(\begin{bmatrix}\alpha \\ \xi \\ \dot{\xi}\end{bmatrix}\Big|\, c;\theta\right), \qquad (3)

where P(c ; θ) = π_c represents the prior, and the conditional probability density function of each component is

P\!\left(\begin{bmatrix}\alpha \\ \xi \\ \dot{\xi}\end{bmatrix}\Big|\, c;\theta\right) = P(\dot{\xi}\,|\,\xi,c;\theta)\, P(\alpha\,|\,c;\theta), \quad P(\dot{\xi}\,|\,\xi,c;\theta) = \mathcal{N}\!\left(-A_c(\xi - \xi^*), \Sigma_{c,\dot{\xi}}\right), \quad P(\alpha\,|\,c;\theta) = \mathcal{N}(\mu_{c,\alpha}, \Sigma_{c,\alpha}), \qquad (4)

with dynamics matrix A_c, estimation noise Σ_{c,ξ̇}, and where μ_{c,α}, Σ_{c,α} denote the mean and covariance of α respectively. Note that all mixture components share the same attractor ξ* by design. The parameter set is given by θ = {ξ*, θ_1, ..., θ_C}, where θ_c = {π_c, μ_{c,α}, Σ_{c,α}, A_c, Σ_{c,ξ̇}}. For the sake of simplicity, in the following we omit the dependency on θ. For a standard LPV system, i.e. α ≠ ξ, equation (4) accordingly assumes that α is independent of the system's state and its derivative. In the case of a quasi-LPV system, i.e. α = ξ, expression (4) reduces to the joint density P([ξ; ξ̇] | c) = P(ξ̇ | ξ, c) P(ξ | c), as in the SEDS approach [6].

For later convenience, we also define the inverse linear dynamics of each component, which, under the asymptotic stability constraint, are

P(\xi\,|\,\dot{\xi},c) = \mathcal{N}\!\left(\xi^* - A_c^{-1}\dot{\xi}, \Sigma_{c,\xi}\right), \qquad (5)

and, again, all components share the same attractor ξ*.

2.2 Expected dynamics of the GMM

Given an observed pair {α, ξ}, the dynamic behavior specified by the conditional distribution of ξ̇ from (3) is also Gaussian, with mean

E\!\left[P(\dot{\xi}\,|\,\xi,\alpha)\right] = \sum_{c=1}^{C} h_c(\alpha)\left(-A_c(\xi - \xi^*)\right), \qquad h_c(\alpha) = \frac{\pi_c\,\mathcal{N}(\alpha\,|\,\mu_{c,\alpha},\Sigma_{c,\alpha})}{\sum_{j=1}^{C} \pi_j\,\mathcal{N}(\alpha\,|\,\mu_{j,\alpha},\Sigma_{j,\alpha})}. \qquad (6)

Note that h_c(α) ≥ 0 and \sum_{c=1}^{C} h_c(\alpha) = 1. Sufficient conditions for convergence of (6) are given in the following proposition.

Proposition 1 (Asymptotic stability [6]). Let ε ∈ R be a small positive constant. Dynamics (6) globally asymptotically converge to the attractor ξ* if

A_c + A_c^{T} \succeq \epsilon I \qquad \forall c = 1 \cdots C. \qquad (7)
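The mixing coefficients h_c(α) in (6) are ordinary GMM posterior responsibilities over α. A minimal sketch, with invented priors, means and covariances:

```python
import numpy as np

# Priors pi_c and Gaussian parameters of alpha for two components (made up).
pi = np.array([0.4, 0.6])
mu = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
Sigma = [np.eye(2), 2.0 * np.eye(2)]

def gauss_pdf(x, m, S):
    """Multivariate normal density N(x | m, S)."""
    d = x - m
    norm = np.sqrt((2 * np.pi) ** len(x) * np.linalg.det(S))
    return np.exp(-0.5 * d @ np.linalg.solve(S, d)) / norm

def h(alpha):
    """Mixing coefficients of Eq. (6): nonnegative and summing to one."""
    w = np.array([pi[c] * gauss_pdf(alpha, mu[c], Sigma[c]) for c in range(2)])
    return w / w.sum()

hc = h(np.array([0.2, -0.1]))
print(hc, hc.sum())   # weights strongly favor component 0 near its mean; sum is 1
```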

2.3 Learning asymptotically stable LPV systems with the EM algorithm

Given a set of observations, the LPV parameters θ that maximize the likelihood are estimated by means of the expectation-maximization (EM) algorithm [24]. To ensure convergence of (6), we constrain the maximization problem with the set of convex constraints (7), yielding

\arg\max_{\theta} \sum_{i=1}^{M} \log P\!\left(\begin{bmatrix}\alpha_i \\ \xi_i \\ \dot{\xi}_i\end{bmatrix};\theta\right) \quad \text{s.t.} \quad A_c + A_c^{T} \succeq \epsilon I \qquad \forall c = 1 \cdots C. \qquad (8)

The EM algorithm aims for a locally optimal solution of this problem by maximizing a lower bound of the log-likelihood in an iterative process. Let C_i be a latent variable representing the component membership of the i-th observation. The expectation step (E-step) computes the distribution of C_i for the current parameters θ, yielding the responsibility of each component

P(C_i = c) = \frac{P(c)\, P\!\left(\begin{bmatrix}\alpha_i \\ \xi_i \\ \dot{\xi}_i\end{bmatrix}\Big|\, c\right)}{\sum_{k=1}^{C} P(k)\, P\!\left(\begin{bmatrix}\alpha_i \\ \xi_i \\ \dot{\xi}_i\end{bmatrix}\Big|\, k\right)}. \qquad (9)

Leveraging this distribution, the maximization step (M-step) computes the optimal parameters by solving the now simplified optimization problem

\hat{\theta} = \arg\max_{\theta} \sum_{i=1}^{M}\sum_{c=1}^{C} P(C_i = c)\, \log\!\left(P(c)\, P(\dot{\xi}_i\,|\,\xi_i,c)\, P(\alpha_i\,|\,c)\right) \quad \text{s.t.} \quad A_c + A_c^{T} \succeq \epsilon I \qquad \forall c = 1 \cdots C. \qquad (10)
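As a concrete illustration of the E-step (9) for a quasi-LPV model (α = ξ), the sketch below combines, per component, the prior, the forward-dynamics likelihood N(ξ̇ | −A_c(ξ − ξ*), Σ_{c,ξ̇}) and the likelihood of the state. All parameter values are made up for the example.

```python
import numpy as np

# Two invented components sharing the attractor at the origin.
xi_star = np.array([0.0, 0.0])
A = [np.eye(2), 2.0 * np.eye(2)]
pi = np.array([0.5, 0.5])
mu_alpha = [np.array([-1.0, 0.0]), np.array([1.0, 0.0])]

def gauss_pdf(x, m, S):
    d = x - m
    norm = np.sqrt((2 * np.pi) ** len(x) * np.linalg.det(S))
    return np.exp(-0.5 * d @ np.linalg.solve(S, d)) / norm

def responsibilities(xi, xi_dot):
    """E-step of Eq. (9): posterior over components for one sample."""
    w = np.array([
        pi[c]
        * gauss_pdf(xi_dot, -A[c] @ (xi - xi_star), 0.1 * np.eye(2))  # dynamics term
        * gauss_pdf(xi, mu_alpha[c], np.eye(2))                       # alpha (= state) term
        for c in range(2)
    ])
    return w / w.sum()

r = responsibilities(np.array([-1.0, 0.0]), np.array([1.0, 0.0]))
print(r)   # component 0 explains this sample best
```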

The optimal {π̂_c, μ̂_{c,α}, Σ̂_{c,α}} are computed in closed form, see [24]. To estimate the attractor ξ̂* and avoid nonconvexity, we approximate (10) considering the inverse dynamics (5), yielding the surrogate problem

\hat{\theta} = \arg\max_{\theta} \sum_{i=1}^{M}\sum_{c=1}^{C} P(C_i = c)\, \log\!\left(P(c)\, P(\xi_i\,|\,\dot{\xi}_i,c)\, P(\alpha_i\,|\,c)\right) \quad \text{s.t.} \quad A_c^{-1} + A_c^{-T} \succeq \epsilon_{inv} I \qquad \forall c = 1 \cdots C, \qquad (11)

with ε_inv > 0. The stationary point for the estimation noise covariance, in terms of the constrained optimal inverse dynamic parameters ξ̂*, Â_c^{-1}, is given by

\hat{\Sigma}_{c,\xi} = \frac{\sum_{i=1}^{M} P(C_i = c)\,(\xi_i - \hat{\xi}^* + \hat{A}_c^{-1}\dot{\xi}_i)(\xi_i - \hat{\xi}^* + \hat{A}_c^{-1}\dot{\xi}_i)^{T}}{\sum_{i=1}^{M} P(C_i = c)}.

Substituting this expression into (11), neglecting constant terms and assuming a diagonal covariance matrix, the constrained maximization step for ξ̂*, Â_{1..C}^{-1} simplifies to the convex quadratic program

\hat{\xi}^*, \hat{A}_{1..C}^{-1} = \arg\max_{\xi^*, A_{1..C}^{-1}} \sum_{c=1}^{C} -\frac{1}{2}\,\mathrm{tr}\!\left(\hat{\Sigma}_{c,\xi}\right) \sum_{i=1}^{M} P(C_i = c) \quad \text{s.t.} \quad A_c^{-1} + A_c^{-T} \succeq \epsilon_{inv} I \qquad \forall c = 1 \cdots C. \qquad (12)

Although Â_c^{-1} is a valid estimate of the inverse of the forward dynamics matrix A_c of each component, a direct optimization of (10) considering the resulting ξ̂* after solving (12) yields more accurate results. This requires an additional convex optimization step for Â_{1..C} and Σ̂_{c,ξ̇}, similar to (12) but considering the forward dynamics (4). Given an initial θ, the EM algorithm iteratively applies the E-step (9) followed by the M-steps for {π̂_c, μ̂_{c,α}, Σ̂_{c,α}} from (10), the attractor ξ̂* from (12), and the linear dynamics matrices Â_{1..C} and estimation noise Σ̂_{c,ξ̇}, until convergence.
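The reason the surrogate problem is tractable can be seen in a stripped-down version: with the inverse model ξ = ξ* − A^{-1}ξ̇, the unknown attractor enters linearly as a bias. The sketch below uses synthetic data and a single component, and omits the definiteness constraint (which the paper enforces with SeDuMi/YALMIP); under those simplifications the fit reduces to ordinary least squares over (A^{-1}, ξ*).

```python
import numpy as np

rng = np.random.default_rng(1)
A_true = np.array([[2.0, 0.4], [0.4, 1.5]])          # A + A^T > 0
xi_star_true = np.array([1.0, -1.0])

# Synthetic samples of xi_dot = -A (xi - xi*), with small noise on xi.
xi = rng.normal(size=(500, 2)) * 3.0
xi_dot = -(xi - xi_star_true) @ A_true.T
xi_noisy = xi + 0.01 * rng.normal(size=xi.shape)

# Regress xi on [-xi_dot, 1]: the coefficients give A^{-1} and the bias xi*.
X = np.hstack([-xi_dot, np.ones((len(xi), 1))])
W, *_ = np.linalg.lstsq(X, xi_noisy, rcond=None)     # W stacks (A^{-1})^T and xi*^T
A_inv_hat, xi_star_hat = W[:2].T, W[2]

print(xi_star_hat)   # close to the true attractor [1, -1]
```

In the full method, the same linear-in-parameters structure is kept, but the least-squares objective is weighted by the responsibilities P(C_i = c) and solved jointly for all components under the semidefinite constraint of (12).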

Figure 1: Results for the LASA handwriting dataset for SEDS and LPV-EM with and without knowing the attractor ξ*. The left and center plots show the prediction RMSE and the training time respectively for all four conditions. The right plot shows the attractor RMSE for the conditions with unknown attractor.

Figure 2: Streamlines and attractors of four recordings of the LASA handwriting dataset for SEDS (first row) and LPV-EM (second row) with unknown attractor ξ* and C = 7 components. Red dots show the training samples and the blue marker indicates the estimated attractor.

2.4 Evaluation

We implemented our method in MATLAB using the SeDuMi [25] solver and the YALMIP [26] interface to solve (12). We compare our LPV-EM method with α = ξ to the SEDS method minimizing the mean squared error, using the LASA handwriting dataset and the code provided with [6], which consists of 24 demonstrated two-dimensional trajectories of handwritten letters. We consider four conditions: SEDS and LPV-EM, both with and without known attractor ξ*. All conditions are evaluated between 1 and 25 components. For the EM algorithm, we initialize ξ* with the average of the observed states and the rest of the parameters with the initial clustering given by k-means. We set ε_inv = 0.5 and ε = 10^{-6}. We evaluate our approach in terms of the root mean squared error (RMSE) of the estimated velocity, computed as \frac{1}{M}\sum_{i=1}^{M} \|\dot{\xi}_i - \hat{\dot{\xi}}_i\|, the RMSE of the estimation of the attractor ξ*, and the computation time.

In terms of prediction RMSE, as shown in Fig. 1, when the attractor is known SEDS outperforms all other conditions, as it directly minimizes the MSE. LPV-EM maximizes the whole likelihood including the distribution of α (which will be necessary in the next section), resulting in slightly less accurate predictions. When the attractor is unknown, LPV-EM clearly outperforms the SEDS variant, and the prediction performance of LPV-EM with and without knowing the attractor is almost identical. In terms of computation time, both LPV-EM conditions are significantly faster. Concerning the RMSE of the estimation of the attractor, the LPV-EM condition yields significantly better results. As illustrated in Fig. 2, when the attractor is unknown, the SEDS estimation might fall into local minima, while the LPV-EM variant always achieves consistent results due to the convexity of (12). In summary, the LPV-EM model is a fast alternative to SEDS when the attractor is known and the only reliable option when the attractor is unknown, as shown in https://youtu.be/ojAhun_1_uQ.
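For reference, the velocity error metric used above is the mean Euclidean norm over samples; a toy computation with made-up demonstrated and predicted velocities:

```python
import numpy as np

# Demonstrated vs. predicted velocities (illustrative values, in mm/s).
xi_dot = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
xi_dot_hat = np.array([[1.1, 0.0], [0.0, 1.8], [0.9, 1.0]])

# (1/M) * sum_i ||xi_dot_i - xi_dot_hat_i||
err = np.mean(np.linalg.norm(xi_dot - xi_dot_hat, axis=1))
print(err)   # mean per-sample velocity error
```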

3 Learning Task Sequences with HMMs and stable LPV systems

In this section, to represent task sequences, we formulate an HMM where each subtask is given by an asymptotically stable quasi-LPV system. In addition, following recent work [2, 27], transitions between latent states (subtasks) depend on observations by means of a termination policy. We then study the dynamics of the HMM-LPV model and propose a learning and motion generation method that guarantees stability of the full sequence for left-to-right and periodic models.

Problem formulation: Given a set of demonstrated trajectories \{\Xi_d = \{\xi_i, \dot{\xi}_i, \alpha_i\}_{i=1}^{T_d}\}_{d=1}^{D}, the problem considered in this section is the acquisition of a dynamic model of the task, assuming that it consists of a sequence of S goal-oriented, globally asymptotically stable subtasks.

3.1 The HMM-LPV model

Let s_i ∈ {1, 2, ..., S} be a discrete latent variable representing the subtask at time step i and b_i ∈ {0, 1} a binary termination variable that represents the event of finishing (b_i = 1) or not finishing (b_i = 0) subtask s_i. The HMM-LPV model assumes that observed trajectories are distributed as

P(\Xi;\lambda) = \sum_{s_1=1}^{S} P(s_1;\lambda_\pi)\, P\!\left(\begin{bmatrix}\xi_1 \\ \dot{\xi}_1\end{bmatrix}\Big|\, s_1;\lambda_e\right) \prod_{i=2}^{T} \sum_{s_i=1}^{S} \sum_{s_{i-1}=1}^{S} \sum_{b_{i-1}=0}^{1} P(s_i\,|\,s_{i-1},b_{i-1};\lambda_a)\, P(b_{i-1}\,|\,s_{i-1},\xi_{i-1};\lambda_b)\, P\!\left(\begin{bmatrix}\xi_i \\ \dot{\xi}_i\end{bmatrix}\Big|\, s_i;\lambda_e\right), \qquad (13)

with parameter set λ = {λ_π, λ_a, λ_b, λ_e} and where

• initial subtask probabilities are P(s ; λ_π) = π_s, i.e. λ_π = {π_s} for 1 ≤ s ≤ S.
• subtask transition parameters λ_a = {a_jk} for 1 ≤ j, k ≤ S, with j ≠ k, are such that in case of no termination P(j | j, b = 0 ; λ_a) = 1, and in case of termination P(j | k, b = 1 ; λ_a) = a_jk and P(j | j, b = 1 ; λ_a) = 0. As we only consider left-to-right or periodic topologies, if j precedes k then a_jk = 1.
• emission probabilities are given by quasi-LPV systems, i.e. λ_e = {θ_1, ..., θ_S} and P([ξ; ξ̇] | s ; λ_e) = P([ξ; ξ̇] ; θ_s) as in (3), where θ_s = {ξ*_s, θ_{s,1}, ..., θ_{s,C}} and θ_{s,c} = {π_{s,c}, μ_{s,c,α}, Σ_{s,c,α}, A_{s,c}, Σ_{s,c,ξ̇}}.
• termination probabilities are given by a ξ-dependent Bernoulli distribution

P(b = 1\,|\,s, \xi;\lambda_b) = \exp\{-(\xi - \mu_{b,s})^{T} \Sigma_{b,s}^{-1} (\xi - \mu_{b,s})\}, \qquad (14)

with parameters λ_b = {μ_{b,1}, Σ_{b,1}, ..., μ_{b,S}, Σ_{b,S}}. From the assumed distribution, P(b = 0 | s, ξ ; λ_b) = 1 − P(b = 1 | s, ξ ; λ_b).
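The termination policy (14) is a radial-basis function whose value is exactly 1 at μ_{b,s} and decays with the Mahalanobis distance from it. A minimal sketch with invented parameters:

```python
import numpy as np

# Invented termination-policy parameters for one subtask.
mu_b = np.array([1.0, -1.0])
Sigma_b = np.diag([0.2, 0.2])

def p_terminate(xi):
    """Eq. (14): RBF Bernoulli success probability of ending the subtask."""
    d = xi - mu_b
    return np.exp(-d @ np.linalg.solve(Sigma_b, d))

print(p_terminate(mu_b))                  # exactly 1.0 at the center
print(p_terminate(np.array([3.0, 2.0])))  # vanishingly small far from it
```

Placing μ_{b,s} at the subtask attractor ξ*_s is what later makes condition (18) of Proposition 2 satisfiable.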

3.2 Expected dynamics of the HMM-LPV model

At time step i, given the samples {ξ_t}_{t=1}^{i}, the expected dynamics are

\dot{\xi}_{hmm} = E\!\left[P(\dot{\xi}_i\,|\,\{\xi_t\}_{t=1}^{i};\lambda)\right] = \sum_{s=1}^{S} \tilde{h}_{s,i+1}(\xi_i)\, E\!\left[P(\dot{\xi}_i\,|\,\xi_i;\theta_s)\right], \qquad (15)

where E[P(ξ̇_i | ξ_i ; θ_s)] is given by (6) and h̃_{s,i+1}(ξ_i) = P(s_i | {ξ_t}_{t=1}^{i} ; λ) is the forward variable, computed as

\tilde{h}_{s,i+1}(\xi_i) = \sum_{s_i=1}^{S} \sum_{b_i=0}^{1} \tilde{h}_{s_i,i}(\xi_{i-1})\, P(s\,|\,s_i,b_i;\lambda_a)\, P(b_i\,|\,s_i,\xi_i;\lambda_b)\, P(\xi_i\,|\,s_i;\lambda_e) \qquad (16)

and normalized such that \sum_{s=1}^{S} \tilde{h}_{s,i+1}(\xi_i) = 1. In this case, we only preserve the stability properties of the s-th subtask if h̃_{s,i}(ξ_i) = 1, when we recover dynamics (6). To ensure convergence of the whole sequence, we define s_curr as the current subtask and s_next as the next subtask, with dynamics

s_{curr,i+1} = \begin{cases} s_{next,i} & \text{if } \tilde{h}_{s_{next}}(\xi_i) = 1 \\ s_{curr,i} & \text{otherwise} \end{cases} \qquad s_{next,i+1} = \begin{cases} s_{next,i}+1 & \text{if } \tilde{h}_{s_{next}}(\xi_i) = 1 \\ s_{next,i} & \text{otherwise.} \end{cases} \qquad (17)

Conditions for the stability of transitions from s_curr to s_next are given in the following proposition.

Proposition 2 (Stability of transitions). Let s_curr, s_next be the current and the next subtask respectively, with dynamics (17), and let the conditional forward probability be computed considering these two states only, i.e. S = {s_curr, s_next} in (16). Any trajectory that reaches the attractor ξ*_{s_curr} will converge to the subtask distribution h̃_{s_curr,i}(ξ_i) = 0, h̃_{s_next,i}(ξ_i) = 1 if

P(b = 1\,|\,s_{curr}, \xi^*_{s_{curr}};\lambda_b) = 1. \qquad (18)

Figure 3: Resulting subtasks with sequence (a)-(b)-(c)-(d) and simulated trajectories for the HMM-LPV model with 4 subtasks and 7 GMM components each, for two exemplary sets of human demonstrations captured with a mouse. Black trajectories represent training samples and the pink asterisks indicate the initial positions. For each example, the panels depict the attractors, streamlines, responsibility ξ-Gaussians and termination functions of the subtask parameters, while the last row shows simulated trajectories (green solid lines) generated following (19), with (left) and without (right) random perturbations every second.

Proof. Evaluating (16) for S = {s_curr, s_next} at ξ_i = ξ*_{s_curr} and with (18) yields h̃_{s_next,i}(ξ_i) = 1. □

To guarantee that ξ*_{s_curr} is reached, we modify the original expected dynamics (15) by adding a stabilizing input [28] such that

\dot{\xi} = \dot{\xi}_{hmm} + \dot{\xi}_{corr}, \qquad \dot{\xi}_{corr} = \begin{cases} 0 & \text{if } l^{T}\dot{\xi}_{hmm} > 0 \\ -\dfrac{l^{T}\dot{\xi}_{hmm}}{\|l\|^{2}}\, l + \epsilon_{corr}\, l & \text{otherwise,} \end{cases} \qquad (19)

where l = (ξ*_{s_curr} − α) and ε_corr > 0. If Proposition 2 is fulfilled by every subtask, a left-to-right model following (19) is guaranteed to converge to the attractor of the last subtask, while a periodic model exhibits stable discrete limit cycle dynamics. Note that Proposition 2 provides very conservative conditions and transitions usually converge without intervention of the corrective input.
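The corrective input can be sketched as follows for the quasi-LPV case (α = ξ): when the nominal HMM velocity makes no progress towards the attractor of the current subtask, its component along l is removed and a small push ε_corr·l is added. The ||l||² normalization assumes the standard orthogonal projection; the extracted equation is ambiguous on this point, so treat it as an illustrative reading.

```python
import numpy as np

def stabilize(xi_dot_hmm, xi, xi_star_curr, eps_corr=1.0):
    """Sketch of the stabilizing input of Eq. (19)."""
    l = xi_star_curr - xi
    if l @ xi_dot_hmm > 0:
        return xi_dot_hmm                              # already making progress
    # Remove the non-progressing component along l and push towards the attractor.
    corr = -(l @ xi_dot_hmm) / (l @ l) * l + eps_corr * l
    return xi_dot_hmm + corr

# Nominal velocity orthogonal to the attractor direction triggers a correction.
xi_dot = stabilize(np.array([0.0, 1.0]), np.array([1.0, 0.0]), np.array([0.0, 0.0]))
l = np.array([-1.0, 0.0])
print(l @ xi_dot > 0)   # True: the corrected velocity points towards the attractor
```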

3.3 Learning an HMM-LPV model with the Baum-Welch algorithm

To estimate the parameters from demonstrations, we apply the Baum-Welch algorithm [29], the EM algorithm for HMMs, considering the conditions from Propositions 1 and 2 as

\arg\max_{\lambda} \sum_{d=1}^{D} \log P(\Xi_d;\lambda) \qquad (20)

\text{s.t.} \quad P(b = 1\,|\,s, \xi^*_s;\lambda_b) = 1, \quad A_{s,c} + A_{s,c}^{T} \succ 0 \qquad \forall s = 1 \cdots S,\ \forall c = 1 \cdots C. \qquad (21)

The E-step and the M-steps for λ_π, λ_a are detailed in [29] and [27]. The M-step for the emission probabilities λ_e is similar to (12), differing only in the responsibilities computed in the E-step. The M-step for the termination probabilities λ_b with the RBF function (14) yields a problem structure similar to logistic regression [27], but with an ellipsoid boundary function and therefore a nonconvex objective. However, constraint (21) couples these two problems together and, intuitively, the maximization of λ_e drives solutions for λ_b towards regions where the dynamics converge. Given an initial λ, the optimal parameters are computed by applying the E- and M-steps iteratively until convergence.

3.4 Validation

We implemented our approach in MATLAB using the FMINSDP solver [30] to solve the joint maximization of λ_b and λ_e from (20). We initialize the model parameters with k-means. With this initial clustering we apply the M-step of LPV-EM to initialize λ_e, while the covariance of the RBF termination function is initially set to the variance of the corresponding cluster. In our experiments, we set the positive constants to ε_corr = 1 in (19), and ε_inv = 0.5, ε = 10^{-6} in (12). We first illustrate the capability of our model on two exemplary sets of 2-dimensional human motions captured with a mouse. As shown in Fig. 3, the HMM-LPV model is able to extract meaningful

Figure 4: (left) Schematic of the repetitive task reproduced through kinaesthetic demonstrations. (right) Resulting subtasks with cyclic subtask sequence (b)-(c)-(a) and simulated trajectories for the HMM-LPV model with 3 subtasks with 5 GMM components each for the kinaesthetic teaching task. (a),(b) and (c) depict the parameters of each subtask. The bottom right plot shows the simulated trajectory generated following (19).

subtask dynamics and termination policies. Extracted attractors typically lie at the end of the direction of motion but, in some cases, e.g. Example 1 (a) or Example 2 (a), they are placed further away, as the corresponding data does not exhibit convergence. In both examples, trajectories generated without perturbations yielded no corrections from the stabilizing input. In simulations with disturbances, the time-independent state-feedback nature of the model allows for immediate replanning, and trajectories converge to the last attractor. To validate our approach in a more realistic setting, we run our approach on the motions obtained during a kinaesthetic teaching session. The task consists of a repetitive process with three phases: reach, peel and retract, while a second arm rotates the zucchini after each peel. We only modeled the motions of the peeling arm and, to capture the expected behavior, we train our model with a periodic structure and 3 states. The resulting subtasks are depicted in Fig. 4. The periodic topology of the model successfully captures the structure of the observed motions, and the reach-peel-retract phases are represented by subtasks (b)-(c)-(a) respectively. Also, the simulated trajectory shows the state-dependent limit cycle behavior captured by the model, similar to the demonstrations. In summary, the HMM-LPV model is a suitable method to represent complex dynamic behavior with multiple attractors from data. The resulting dynamic policies are stable, time-independent and insensitive to perturbations, as shown in the accompanying video https://youtu.be/92Xg3rFH8ag.

4 Conclusions and Outlook

In this paper, we first presented a method to learn asymptotically stable motions from demonstrations based on the formulation of an LPV system as a GMM. The main novelty lies in the proposed LPV-EM algorithm, which robustly learns both the attractor and the dynamics from data. We believe that the LPV-EM algorithm has potential applications in other domains that require nonlinear system identification, and that it also has interesting extensions, such as the addition of linear dependencies or constraints on the attractor, for which the proposed M-step remains convex. This could be applied to interaction settings, where the state of the interacting agent shifts the robot's attractor. The capability of the LPV-EM algorithm to estimate the attractor enabled the formulation of the HMM-LPV model, a dynamic model for complex task sequences where each subtask is given by a stable LPV system and transitions between subtasks are guaranteed by means of a state-dependent termination policy. All subtasks are learned jointly without strict segmentation bounds, and the way subtasks transition is also learned from data. Although our validation shows promising results, performance strongly depends on the initialization values. In our evaluation, k-means provided satisfactory results, but more complex experiments would require more sophisticated initialization methods. An interesting direction for future work is the application of our model to incremental and nonparametric settings [31], which would partially mitigate the initialization issue. Another interesting direction is the application of the HMM-LPV model as a stable closed-loop policy for reinforcement learning [32].

Acknowledgments We gratefully acknowledge the funding provided by the European Commission (Horizon 2020 Framework Programme, H2020-ICT-643950) for the SecondHands project. Thanks to Wissam Bejjani for his great help with the implementation and to Sina Miravazzi for the fruitful discussions. Thanks to Nadia Figueroa for providing the data of the zucchini experiment.

References

[1] D. Kulić, C. Ott, D. Lee, J. Ishikawa, and Y. Nakamura. Incremental learning of full body motion primitives and their sequencing through human motion observation. The International Journal of Robotics Research, 31(3):330–345, 2012.
[2] O. Kroemer, C. Daniel, G. Neumann, H. van Hoof, and J. Peters. Towards learning hierarchical skills for multi-phase manipulation tasks. In 2015 IEEE International Conference on Robotics and Automation (ICRA), pages 1503–1510. IEEE, 2015.
[3] S. Calinon, F. D'Halluin, E. Sauser, D. Caldwell, and A. Billard. Learning and Reproduction of Gestures by Imitation. IEEE Robot. Autom. Mag., 17(2):44–54, 2010.
[4] B. A. Pearlmutter. Learning state space trajectories in recurrent neural networks. Neural Computation, 1(2):263–269, 1989.
[5] J. Tani. Model-based learning for mobile robot navigation from the dynamical systems perspective. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 26(3):421–436, 1996.
[6] S. M. Khansari-Zadeh and A. Billard. Learning Stable Non-Linear Dynamical Systems with Gaussian Mixture Models. IEEE Transactions on Robotics, 2011.
[7] S. S. M. Salehian, M. Khoramshahi, and A. Billard. A Dynamical System Approach for Softly Catching a Flying Object: Theory and Experiment. IEEE TRO, 32(2):462–471, 2016.
[8] K. Kronander, M. Khansari, and A. Billard. Incremental motion learning with locally modulated dynamical systems. Robotics and Autonomous Systems, 70:52–62, 2015.
[9] S. Manschitz, J. Kober, M. Gienger, and J. Peters. Learning Movement Primitive Attractor Goals and Sequential Skills from Kinesthetic Demonstrations. Robotics and Autonomous Systems, 74, Part A:97–107, 2015.
[10] J. Tani and S. Nolfi. Learning to perceive the world as articulated: an approach for hierarchical learning in sensory-motor systems. Neural Networks, 12(7):1131–1141, 1999.
[11] J. Kober, M. Gienger, and J. J. Steil. Learning movement primitives for force interaction tasks. In Robotics and Automation (ICRA), 2015 IEEE International Conference on, pages 3192–3199, 2015.
[12] Z. Su, O. Kroemer, G. E. Loeb, G. S. Sukhatme, and S. Schaal. Learning to Switch Between Sensorimotor Primitives Using Multimodal Haptic Signals. In International Conference on Simulation of Adaptive Behavior, pages 170–182. Springer International Publishing, 2016.
[13] A. Ureche, K. Umezawa, Y. Nakamura, and A. Billard. Task Parameterization Using Continuous Constraints Extracted From Human Demonstrations. Robotics, IEEE Transactions on, 31(6):1458–1471, Dec 2015. ISSN 1552-3098. doi:10.1109/TRO.2015.2495003.
[14] G. Chang and D. Kulić. Robot task error recovery using Petri nets learned from demonstration. In Advanced Robotics (ICAR), 2013 16th International Conference on, pages 1–6. IEEE, 2013.
[15] S. Manschitz, M. Gienger, J. Kober, and J. Peters. Probabilistic decomposition of sequential force interaction tasks into Movement Primitives. In Intelligent Robots and Systems (IROS), 2016 IEEE/RSJ International Conference on, pages 3920–3927. IEEE, 2016.
[16] O. Kroemer, H. van Hoof, G. Neumann, and J. Peters. Learning to predict phases of manipulation tasks as hidden states. In 2014 IEEE International Conference on Robotics and Automation (ICRA), pages 4009–4014. IEEE, 2014.
[17] S. Manschitz, J. Kober, M. Gienger, and J. Peters. Probabilistic progress prediction and sequencing of concurrent movement primitives. In Intelligent Robots and Systems (IROS), 2015 IEEE/RSJ International Conference on, pages 449–455. IEEE, 2015.
[18] R. Lioutikov, G. Neumann, G. Maeda, and J. Peters. Probabilistic segmentation applied to an assembly task. In Humanoid Robots (Humanoids), 2015 IEEE-RAS 15th International Conference on, pages 533–540. IEEE, 2015.
[19] B. Bamieh and L. Giarre. Identification of linear parameter varying models. International Journal of Robust and Nonlinear Control, 12(9):841–853, 2002.
[20] X. Jin, B. Huang, and D. S. Shook. Multiple model LPV approach to nonlinear process identification with EM algorithm. Journal of Process Control, 21(1):182–193, 2011.
[21] G. Feng. A survey on analysis and design of model-based fuzzy control systems. IEEE Transactions on Fuzzy Systems, 14(5):676–697, 2006.
[22] S.-G. Cao, N. W. Rees, and G. Feng. Analysis and design for a class of complex control systems part I: fuzzy modelling and identification. Automatica, 33(6):1017–1028, 1997.
[23] M.-T. Gan, M. Hanmandlu, and A. H. Tan. From a Gaussian mixture model to additive fuzzy systems. IEEE Transactions on Fuzzy Systems, 13(3):303–316, 2005.
[24] J. Bilmes. A gentle tutorial on the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models. Technical report, U.C. Berkeley, 1997.
[25] J. F. Sturm. Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones. Optimization Methods and Software, 11(1-4):625–653, 1999.
[26] J. Löfberg. YALMIP: A Toolbox for Modeling and Optimization in MATLAB. In Proceedings of the CACSD Conference, Taipei, Taiwan, 2004.
[27] C. Daniel, H. van Hoof, J. Peters, and G. Neumann. Probabilistic inference for determining options in reinforcement learning. Machine Learning, 104(2):337–357, 2016. ISSN 1573-0565.
[28] S. M. Khansari-Zadeh and A. Billard. Learning Control Lyapunov Function to Ensure Stability of Dynamical System-based Robot Reaching Motions. Robotics and Autonomous Systems, 62(6):752–765, 2014.
[29] L. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE, 77(2):257–286, 1989.
[30] C. Thore. FMINSDP, a code for solving optimization problems with matrix inequality constraints, 2013.
[31] I. Havoutis, A. Tanwani, and S. Calinon. Online incremental learning of manipulation tasks for semi-autonomous teleoperation. In IEEE/RSJ IROS, 2016.
[32] J. Rey, K. Kronander, F. Farshidian, J. Buchli, and A. Billard. Learning motions from demonstrations and rewards with time-invariant dynamical systems based policies. Autonomous Robots, pages 1–20, 2017.
