Nonlinear Infinite Horizon Model Predictive Control

0 downloads 0 Views 1MB Size Report
metric trajectories for control of nonlinear systems. We use ... that enable a swing-up of the pendulum and stabilization around its upper equilibrium. A testing.
Jan Carius

Nonlinear Innite Horizon Model Predictive Control with Parametric Trajectories Semester Thesis

Institute for Dynamic Systems and Control Swiss Federal Institute of Technology (ETH) Zurich

Supervision

Michael Mühlebach Prof. Dr. Raaello D'Andrea

December 2015

IDSC-RD-MMu-09

Contents Zusammenfassung

iii

Abstract

v

Nomenclature 1 2

3

Introduction

1

Mathematical Background and Tools

3

2.1 2.2 2.3

3.5

5

Basis Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Laguerre-Gauss Quadrature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Generating Trajectories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Problem derivation

3.1 3.2 3.3

3.4

4

vii

General Formulation . . . . . . . . . . . . . . . . . . . Sequential Quadratic Programming . . . . . . . . . . . Reduction to a Quadratic Program . . . . . . . . . . . 3.3.1 Cost . . . . . . . . . . . . . . . . . . . . . . . . 3.3.2 System Dynamics . . . . . . . . . . . . . . . . . 3.3.3 Constraint Sampling . . . . . . . . . . . . . . . Simple Actuated Pendulum . . . . . . . . . . . . . . . 3.4.1 Dynamics . . . . . . . . . . . . . . . . . . . . . 3.4.2 Simulation through Polynomial Approximation 3.4.3 MPC on the Simple Pendulum . . . . . . . . . 3.4.4 Trajectory Generation . . . . . . . . . . . . . . Pendulum on a Cart . . . . . . . . . . . . . . . . . . . 3.5.1 Dynamics . . . . . . . . . . . . . . . . . . . . . 3.5.2 Initial Conditions . . . . . . . . . . . . . . . . . 3.5.3 Cost . . . . . . . . . . . . . . . . . . . . . . . . 3.5.4 State and Input Constraints . . . . . . . . . . . 3.5.5 Trajectory Generation . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . .

Performance in Simulation

4.1 4.2

Simulation Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Implementation

5.1 5.2 5.3 5.4

System Overview . . . . . . . Communication . . . . . . . . Filtering and Control . . . . . 5.3.1 LQR Controller . . . . 5.3.2 Friction Compensation Building and Deployment . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

i

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

3 4 5

7

7 7 8 8 9 9 9 9 10 12 14 15 15 17 17 18 19 21

21 21

27

27 27 29 29 29 30

Zusammenfassung Diese Arbeit zeigt, wie Model Predictive Control (MPC) mit unendlichem Zeithorizont mit parametrisierten Trajektorien kombiniert werden kann, um nichtlineare Systeme zu regeln. Wir benutzen exponentiell abfallende Ansatzfunktionen, um die Zeittrajektorien von Zustands- und Eingangsgröÿen eines Systems zu approximieren. Eine variationelle Formulierung erlaub uns, Kosten und Nebenbedingungen des Optimierungsproblems in Abhängigkeit der Parameter darzustellen. Die Optimierung wird über diese Parameter betrieben mithilfe von sequentiellen quadratischen Programmen. Eine vollständige MATLAB Implementation dieser MPC Routine wird entwickelt. Ergebnisse werden in Simulationen getestet anhand eines invertierten Pendels auf einem Schlitten. Wir zeigen, dass dieser Regler Trajektorien ndet, die ein Aufschwingen ermöglichen und das Pendel um den oberen Gleichgewichtspunkt stabilisieren. Eine Testumgebung mit einem realen System wird vorbereitet für weitere Forschung an diesem Regelschema.

iii

Abstract This work shows how innite horizon model predictive control (MPC) can be combined with parametric trajectories for control of nonlinear systems. We use exponentially decaying basis functions to approximate state and input trajectories in time. A variational formulation allows trajectory cost and constraints of the optimization problem to be expressed in terms of parameters describing the trajectories. Optimization is done over those parameters via sequential quadratic programming. A full MATLAB implementation of this MPC routine is developed. Results are tested in simulation on an inverted pendulum on a cart. We show that the controller nds trajectories that enable a swing-up of the pendulum and stabilization around its upper equilibrium. A testing environment with a real system is prepared for further research on this control scheme.

v

Nomenclature Symbols Unless noted otherwise, scalars are denoted in light (a), vectors in bold (a) and matrices in uppercase bold A. A tilde on top of a symbolic quantity (a ˜) represents its parametric approximation with basis functions.

g ⊗ δij diag([d1 , d2 , . . . ]) Ii Li (t) τi (t) λ N

Acceleration due to gravity Kronecker product Kronecker delta Diagonal matrix with elements di Identity matrix of dimension i by i Laguerre polynomial of order i Exponentially decaying Laguerre polynomial of order i Decay parameter Highest polynomial order of Laguerre polynomials

τ (t) η ξ u nqo qi , q u , r

Vector τ (t) = [τ1 (t), τ2 (t), . . . , τN (t)] Vector of trajectory parameters State of the system Input to the system Quadrature order Scalar cost weights

>

Acronyms and Abbreviations ETH MPC IC DoF LQR

Eidgenössische Technische Hochschule Model Predictive Control Initial Condition Degree of Freedom Linear Quadratic Regulator

vii

[m/s2 ] [−] [−] [−] [−] [−] [−] [−] [−] [−] [−] [−] [−] [−] [−]

Chapter 1

Introduction Model predictive control provides a systematic way for controlling complex systems under input and state constraints. Owing to the optimization procedure, there is usually a trade-o between computational tractability and prediction horizon because each time step introduces a further dimension in the optimization variables. The approach taken here avoids this specic tradeo by relying on parametric trajectories. Rather, we make a compromise between computation times and prediction accuracy, but without limiting the prediction time. This allows an innite horizon formulation where optimization still only happens over a manageable number of parameters. This work explores how the resulting optimization problem can be solved in a nonlinear way by using sequential quadratic programming. This is useful because nonlinear systems could be controlled that way with a single controller in dierent operating conditions. This report begins by introducing the mathematical background and tools (Chapter 2) that will be used in later parts of this work. Chapter 3 covers the derivation of the problem formulation, rst in a generic case and subsequently for two mechanical systems. The derived algorithms are then tested in simulation and results are presented in Chapter 4. The last Chapter deals with implementation details on a real system.

1

2

Chapter 2

Mathematical Background and Tools 2.1

Basis Functions

The key idea in this alternative MPC formulation is to use a limited set of parameters to describe time trajectories. A number of dierent basis functions could be used to approximate such trajectories in open time intervals. Popular choices are for example trigonometric functions or polynomials. In this project, we use an exponentially decaying polynomial family. A member τi (t) (indexed by subscript i) is dened by

√ τi (t) :=

(2.1)

2λ Li (2λt) exp(−λt) ,

where Li is the ith Laguerre polynomial. Stacking those members from order zero up to i = N yields the vector quantity τ (t) ∈ R(N +1)×1 . It comes with a number of properties that justify the choice of functions. In this work, we make use of a simple analytic derivative formula and mutual orthogonality as denoted by the equations (2.2)

τ˙ (t) = Mλ τ (t) , Z∞ hτi , τj i := τi (t)τj (t) dt = δij ,

(2.3)

0

where δij denotes the Kronecker delta. The matrix Mλ is a lower diagonal matrix with −λ on the diagonal and −2λ below the diagonal, i.e.,   −λ 0 ··· ··· 0  −2λ −λ 0 ··· 0     .. ..  .. ..  ∈ R(N +1)×(N +1) . . . . . −2λ Mλ =  (2.4)    .  . . . .. .. ..  .. 0  −2λ · · · · · · −2λ −λ Furthermore, the following two results shall be pointed out separately because they will be exploited to simplify expressions in subsequent parts:

Z∞

(2.5)

τ (t)τ > (t) dt = I(N +1) ,

0

Z∞

>

(2.6)

τ (t)τ˙ > (t) dt = Mλ .

0

3

4

2.2. Laguerre-Gauss Quadrature

2.2

Laguerre-Gauss Quadrature

When linearizing the optimization problem, one often encounters an integral of the form

Z∞

(2.7)

h(t)τi (t) dt , 0

with a generic scalar function h(t). By expanding the denition for τi (t) given in (2.1), one obtains by variable substitution

Z∞

Z∞ h(t)τi (t) dt =

0

0

Z∞   r √ s 2 (s:=λt) h(t) 2λ Li (2λt) exp(−λt) dt = h Li (2s) exp(−s) ds . (2.8) λ λ | {z } 0 ˆ =:h(s)

ˆ as indicated, one obtains an integral that is suitable for the By dening this new function h Laguerre-Gauss quadrature rule. In particular, we can approximate (2.7) with the quadrature rule Z∞ h(t)τi (t) dt ≈

nqo X

ˆ i) . wi h(s

(2.9)

i=1

0

The integer nqo is the quadrature order. Choosing the order is usually a compromise between computation time and accuracy, both scaling positively with the number of function evaluations. The evaluation points si are the roots of the Laguerre polynomial Lnqo and the weights wi are given by

wi =

(nqo +

si 2 1) (L

nqo +1 (si ))

2

.

(2.10)

Quadrature Performance

The quadrature rule gives exact results for polynomials up to order 2nqo − 1. Unfortunately, the functions that need to be integrated in our procedure contain sinusoidal terms, which have an innite polynomial order in theory. In order to assess the performance of the quadrature rule, we compare it to a variable step integral solver (MATLAB integral) and trapezoid integration. The test function that we want to integrate is

ˆ exp(−t) = t2 sin(tk exp(−t)) exp(−t) , h(t)

(2.11)

with varying exponent k . The original function h in this case would be a sinusoid and its argument the product of a polynomial and an exponential. This is the typical case which we will encounter later. The rst factor t2 corresponds to the polynomial term from L and may also have higher exponents. For illustration purposes, we x it to a squared term. Figure 2.1 plots the integrand for exponent values 3 and 8. One can clearly observe that as k increases, the function starts to oscillate vigorously. Therefore it will also become more and more dicult to correctly approximate the integral. Error plots of the integration can be seen in Figure 2.2. The left portion of the gure is a cross section of the 3D plot on the right hand side at quadrature order 20. To allow for a logarithmic scale, the relative errors are taken in absolute terms. One can clearly observe that Laguerre-Gauss quadrature performs very well for low exponents and starts to break down when k increases above 5 or 6. On the right hand side we see that increasing the quadrature order only marginally improves the result at the cost of much longer computation times. Furthermore, we also compare to the trapezoid rule as another technique for numerical integral approximation. We use 200 uniformly spaced nodes starting at t = 0 and ending at t = 10 s. Note that those are signicantly more

Chapter 2. Mathematical Background and Tools

5

0.6 k=3 k=8

ˆ exp(−t) Integrand h(t)

0.4 0.2 0 −0.2 −0.4 −0.6

0

2

4

6

8 t

10

12

14

16

ˆ exp(−t) (see equation (2.11)) for assessing the quadrature Figure 2.1: Plot of the integrand h(t) performance. Two dierent exponents k are shown.

function evaluations as Laguerre-Gauss quadrature needs. At the nishing time, the exponential term acquires a value of the order 10−5 while the polynomial has a magnitude of 102 . Hence the integrand should have died o suciently. Looking at Figure 2.1, one can conrm that the contributions after time t = 10 will be very small compared to the former ones. One can also observe that Laguerre-Gauss quadrature performs signicantly better at low k while requiring fewer function evaluations. We conclude that whenever Laguerre-Gauss quadrature gets used, a simple numerical experiment should be conducted to verify that the calculation is within an acceptable tolerance. When applying MPC in subsequent chapters, quadrature appears to perform well.

2.3

Generating Trajectories

To start the MPC algorithm o, a initial trajectory needs to be generated. How this is done depends on the problem at hand. The goal is to calculate a set of parameters that describe trajectories that are as close to the optimal solution as possible.

Extracting Parameters

Given that we have computed a trajectory, it must be modeled by parameters which shall be stacked in a column vector η . The elements of this vector describe the linear combination of polynomials τi (t) that approximate a time trajectory. In the following, the one-dimensional case is considered rst. Essentially, a linear combination of the exponentially decaying polynomials τ (t) need to be tted to the computed trajectory f (t) with minimal squared error. It follows that the

6

2.3. Generating Trajectories

101

relative error (signed)

100 relative error (unsigned)

20

Trapezoid Lag.-Gauss

10−1 10−2 −3

10

20 10 0 0 −20

−4

10

−10 10

10−5

20

5

−20

10−6

2

4

6

8

10

exponent k

exponent k

0 quadrature order

Figure 2.2: Error plots of integral approximations. Left plot compares the trapezoid rule and Laguerre-Gauss quadrature (order 20) to MATLAB's integral function. Right shows the relative errors of Laguerre-Gauss quadrature for various quadrature orders and exponents w.r.t. integral. Note the logarithmic error scale in the rst plot. optimal parameters are given by

Z∞ ηopt = arg min

f (t) − τ > (t)η

2

dt

η

0

Z∞ = arg min

 f 2 (t) − 2f (t)τ > (t)η + η > τ (t)τ > (t)η dt

η 0

Z∞ = arg min

 f 2 (t) − 2f (t)τ > (t)η dt + η > η

η 0

Z∞ =

f (t)τ (t) dt .

(2.12)

0

In the second to last step, we exploited that the element-wise products of τ integrate to the identity matrix (see (2.5)).

Chapter 3

Problem derivation This chapter is concerned with deriving the governing equations of the optimization problem. In general, the MPC formulation requires the denition of a cost, which is minimized over a given horizon. The control routine entails solving the optimization problem to nd an input sequence or trajectory that minimizes the cost. The following generic formulation will be adapted and substantiated in the subsequent sections (3.4 and 3.5) to suit the individual physical systems.

3.1

General Formulation

In order to make the optimization computationally tractable, the time trajectories of the system's states and the input are parametrized by a nite number of independent parameters. The approximated trajectories are denoted by a tilde symbol. The running cost L(ξ, u) is a function of the system's state ξ and the input u1 at any given time. System dynamics are to be considered in a variational fashion because the trajectories will not be able to fulll them exactly due to the limited degrees of freedom. The system dynamics are assumed to be of the form

ξ˙ = f (ξ, u) .

(3.1)

Initial conditions ξ(0) = ξ0 are to be enforced exactly and the system may have input and state constraints. Since state constraints are likely not to apply to all states, introduce the projection matrix K which selects the relevant states. Therefore we are looking for parameters ηopt which fullls

Z∞ min η

˜ u L(ξ(t), ˜(t)) dt

(3.2)

0

 R∞   ˙  > ˜ ˜ ˜  δ ξ ξ − f ( ξ, u ˜ ) dt = 0 ∀ δ ξ˜    0 ˜ = ξ0 s.t. ξ(0)   u ˜(t) ∈ [umin , umax ]    K ξ˜ ≤ ξmin/max

3.2

(dynamics) (IC) (input constraint) (state constraints)

(3.3)

Sequential Quadratic Programming

The nonlinear optimization problem (3.2, 3.2) is solved through sequential quadratic programming. In essence, the following steps get executed: 1 Input u is a scalar quantity here because we only consider systems with a single input. The derivation can be done in a similar way with multiple inputs.

7

8

3.3. Reduction to a Quadratic Program

1. Linearize all constraints. 2. Approximate the cost by a quadratic one. 3. Solve the resulting quadratic program. 4. Use the new solution for steps 1 and 2 again and continue until convergence. The following description provides a more detailed description of what happens. First, given an initial guess of the solution η0 , the system is approximated by a quadratic program (i.e., quadratic cost) with linear equality and inequality constraints around that solution. This is essentially an optimization problem in the new variable ηˆ = η − η0 . A solver subsequently returns the incremental improvement ηˆ. Adding that to the original guess, we obtain a new solution. This process is repeated until the improvement is suciently small. The condition used here is

||η(k+1) − η(k) || < tol , ||η(k) ||

(3.4)

with some tolerance tol and k being the iteration index. There are several drawbacks, some of which will also be encountered later: ˆ Linearizing around a point far from the optimum may render the problem infeasible. This is especially an issue when starting the algorithm with the very rst guess. ˆ The sequential quadratic programming routine may fall into a local minimum of the global nonlinear problem. This is the case even if all the individual quadratic programs return the global optimum of their respective sub-problem. The quadratic resp. linear approximations become misleading in this case and the algorithm may get stuck there. ˆ Any inequality constraints are dicult to introduce with this formulation because it is dicult to ensure that they are satised for all time instants.2 Time Shifting

At low sampling frequencies, it may make sense to shift the solution η from the last time step forward by the sampling period to have a better starting point in the next iteration. Starting from τ˙ (t) = Mλ τ (t), one easily nds

τ (t + ∆t) = exp(Mλ ∆t)τ (t) ,

(3.5)

where exp(.) is the matrix exponential in this case. Therefore the updated trajectory parameters can be computed via   > ηupdated = In ⊗ exp (Mλ ∆t) η , (3.6) where n is the order of the system.

3.3 3.3.1

Reduction to a Quadratic Program Cost

R∞ ˜ u ˜(t)) dt is a complicated nonlinear function in η . In the general case, the cost C(η) = 0 L(ξ(t), For a quadratic program, this needs to be approximated by the second order Taylor approximation. Considering a generic scalar function h(x) with vector argument x, its second order Taylor 2 Note that in the discrete MPC formulation, one also only fullls the state constraints at the sampling instants. What happens in between does not get captured.

Chapter 3. Problem derivation

9

approximation around x0 reads

1 h(x) ≈ h(x0 ) + g > (x0 )(x − x0 ) + (x − x0 )> H(x0 )(x − x0 ) , 2  > ∂h(x0 ) ∂ ∂h(x0 ) with g > (x0 ) = , H(x0 ) = . ∂x ∂x ∂x

(3.7)

The quantities g and H dene the quadratic cost and need to be passed to the quadratic program solver. 3.3.2

System Dynamics

The system dynamics usually reduce to an equation of the from F (η) = 0 with a nonlinear function (η0 ) (η − η0 ) and introducing ηˆ = η − η0 , the constraint F . After linearizing F (η) ≈ F (η0 ) + ∂F∂η reads

∂F (η0 ) ηˆ = −F (η0 ) . ∂η 3.3.3

(3.8)

Constraint Sampling

Inequality constraints such as the input saturation or state limits are dicult to enforce in this formulation. Therefore those are only sampled at discrete times ti , each time instant constitutes one equation per constraint. The time instants ti are the solutions of

τ (0)> τ (t) = 0 , ⇔

N X

Lk (2λt) = 0 .

(3.9) (3.10)

k=0

Additionally, we add t0 = 0 as a further sampling point. Results show, that those sampling times ensure a feasible solution for all times in most of the cases. Note on Partial Linearization

All of the above linearization was done with respect to the full η vector. There is also the possibility to linearize only with respect to those parts of η that actually introduce nonlinearities. Analytically, this yields the same results but there are dierences in implementation and computation. We observed for instance that there are (rare) cases where one method converges while the other does not. However as expected, if both converge they also yield the same result up to machine precision.

3.4

Simple Actuated Pendulum

First, the system of an actuated pendulum is considered. A schematic drawing and coordinate conventions can be seen in Figure 3.1. The input torque u acts directly on the frictionless joint. 3.4.1

Dynamics

The equation of motion of this single DoF mechanical system can be directly derived from rst principles and reads

Θ0 ϕ¨ = mg sin(ϕ) − u .

(3.11)

The moment of inertia Θ0 is calculated with respect to the pivot point. For simplicity, we consider the pendulum as a point mass m on a beam with negligible mass, hence Θ0 = ml2 . We introduce

10

3.4. Simple Actuated Pendulum

m l

ϕ

g

u

Figure 3.1: Schematic of the actuated pendulum. the generalized momentum p = Θ0 ϕ˙ and the system state ξ = system (n := dim(ξ) = 2) of rst order dierential equations,



ξ˙ =

ϕ˙ p˙



 = f (ξ, u) =

Θ0−1 p mgl sin(ϕ) − u



ϕ

p

>



to rewrite (3.11) as a

(3.12)

.

By writing the system in this form, we allow more freedom in fullling the kinematic constraint, because the states p and ϕ are approximated independently. If the second order system were to be considered directly, the only coordinate would be ϕ with its rst and second derivatives and the kinematics would be enforced exactly. 3.4.2

Simulation through Polynomial Approximation

As a preliminary step, we rst show how a variational approach can be taken to simulate the system evolution. Note that no input (u ≡ 0) is assumed for this part, however also nontrivial inputs of known time evolution could easily be treated in the same manner. The function f is therefore only taking one argument in this context. We approximate the state time trajectories by exponentially decaying polynomials

ϕ → ϕ˜ = τ (t)> ηϕ ,

(3.13)

p → p˜ = τ (t)> ηp .

Stacking the parameters into a single vector η = compactly expressed by



ηϕ>

ηp>

>

, the state and its variation can be

˜ = (I2 ⊗ τ (t))> η , ξ(t) ˜ = (I2 ⊗ τ (t))> δη . δ ξ(t)

(3.14) (3.15)

To nd suitable parameters η for the trajectory, dynamics and initial conditions are enforced in a variational fashion according to Galerkin's method. This essentially reduces to projecting the dynamics and initial conditions on the variation of the solution. With the above quantities we can write

Z∞ 

 ˜ > (ξ(t) ˜˙ − f (ξ(t))) ˜ ˜ > (ξ(0) ˜ − ξ0 ) = 0 δ ξ(t) dt + δ ξ(0)

˜ . ∀δ ξ(t)

(3.16)

0

 > Initial conditions are captured by ξ0 = ϕ0 p0 . Note that they are only enforced through ˜ the variation δ ξ(0) and not strictly. This will be changed in subsequent section when MPC is introduced. By inserting the approximations into above equation, one arrives at conditions for η which ensure that the system dynamics and initial conditions are fullled at least approximately. The quality of the approximation depends on the degrees of freedom available through the basis

Chapter 3. Problem derivation

11

ϕ [rad]

ϕ [rad]

ODE solver Approximation

π

2π π 0

0 −π

−π 0

1

2

3

4

0

5

1

4

5

4

5

2 ϕ˙ [rad/s]

5 ϕ˙ [rad/s]

3

Time [s]

Time [s]

0 −5 −10

2

0 −2

0

1

2

3

Time [s]

4

5

0

1

2

3

Time [s]

Figure 3.2: Left side shows the pendulum swinging around ϕ = π , right side corresponds to an oscillation around ϕ = 0 (inverted gravity). functions τ (t). The resulting equation     1 0 1 I2 ⊗ Mλ> η − ⊗ I(N +1) η 0 0 Θ0     > ! Z∞  0 1 − ⊗ τ (t) mgl sin ⊗ τ (t) η dt 1 0 0  + I2 ⊗ τ (0)τ (0)> η − (I2 ⊗ τ (0)) ξ0 = 0 ,

(3.17)

is an algebraic expression in η and can be solved numerically, for example with a damped newton method. The derivative can easily be calculated symbolically for solvers that rely on gradient information. The integral in (3.17) (and its derivative w.r.t. η ) can be computed using LaguerreGauss quadrature (see Section 2.2). Results

The plots in Figure 3.2 show how above procedure manages to simulate the system dynamics. It is evident that an oscillation around zero can be much better approximated. This is to be expected because the exponential decay term in τ (t) forces the trajectory towards zero. The simulated system is the pendulum with m = l = Θ0 = 1. The approximation uses Laguerre Polynomials up to order N = 7 and a decay rate of λ = 5. All integrals were solved with LaguerreGauss Quadrature of order 17. As one may expect, the polynomial approximation improves with increasing polynomial order. This intuitively makes sense, because there are more DoFs available to approximate the true trajectory. However, there are also more oscillations with higher orders which lead to a jittering eect in MPC. One may potentially improve the approximation by splitting the time interval into two pieces with dierent decay rates λ. Furthermore, convergence of the damped newton method appears to be sensitive to initial conditions. Therefore, a number of dierent initial guesses for η are required and tested sequentially until a working one is found.

12

3.4. Simple Actuated Pendulum

3.4.3

MPC on the Simple Pendulum

Considering the simple actuated pendulum (Figure 3.1) again, we would like to explore how model predictive control performs. The pendulum is to be stabilized at the upper equilibrium point, which motivates the running cost q2 r L(ϕ, p, u) = (− cos(ϕ) + 1)q1 + p2 + u2 . (3.18) 2 2 The rst term penalizes deviations from the equilibrium position in terms of height of the pendulum endpoint. The other two terms help in avoiding high angular speeds and large control commands. The coecients q1 , q2 , r are weighting factors and should be tuned for good performance. We want nd trajectories, that minimize this cost function while fullling the dynamics, initial conditions and input saturation. The search is not done over all possible trajectories, but only over those that are attainable by our basis functions. This greatly simplies optimization, because it leads to a nite dimensional parameter search space instead of a function space. As long as the order N of the polynomials is large enough, the approximation holds relatively well. We introduce approximations

ϕ → ϕ˜ = τ (t)> ηϕ ,

p → p˜ = τ (t)> ηp ,

u→u ˜ = τ (t)> ηu .

(3.19)

For convenient notation and mathematical treatment, we stack all parameters to one unied pa>  ∈ R3(N +1) . The state and input can be extracted by rameter vector η = ηϕ> ηp> ηu>      ϕ˜ 1 0 0 ξ˜ = = ⊗ τ (t)> η , (3.20) p˜ 0 1 0    0 0 1 ⊗ τ (t)> η . u ˜= (3.21) With this formalism, the problem reduces to the form introduced in (3.2) and (3.3). Those equations constitute an optimization problem in η . The exact formulations and conversions to a quadratic program with linear constraints are dealt with in the following sections. Note that the system dynamics and initial conditions yield equality constraints and the input saturation inequality constraints. Cost

The total cost C is the running cost (3.18) integrated over all times. The quadratic costs in p and u yield quadratic terms in η after simplication:

Z∞ C(η) :=

Z∞ − cos

L(ϕ, ˜ p˜, u ˜) dt = 0



1

0

0



   ⊗ τ (t)> η + 1 q1

0

  2 r    2 q2  0 1 0 ⊗ τ (t)> η + 0 0 1 ⊗ τ (t)> η dt + 2 2 Z∞      1 0 0 ⊗ τ (t)> η + 1 q1 dt = − cos 0

 0 1 >  0 + η 2 0 |

0 q2 0

  0 0  ⊗ I(N +1)  η . r {z }

(3.22)

=:H1

For a quadratic cost approximation, we use second order tailor expansion as shown in (3.7). The rst derivative (as required by the quadratic program) of the cost reads

∂C(η) = ∂η

Z∞ q1 sin 0



1

0

0



   1 ⊗ τ (t)> η

0

0



 > ⊗ τ (t)> dt + (H1 η) ,

(3.23)

Chapter 3. Problem derivation

13

and the second derivative (quadratic term) is

∂ ∂C(η) ∂η ∂η

Z∞

>

=



q1 cos

1

0

0



   ⊗ τ (t)> η diag( 1

0

  0 ) ⊗ τ (t)τ (t)> dt + H1 .

0

(3.24) Dynamic Constraint

The dynamic constraint equation for this problem is obtained by inserting (3.20) and (3.21) into the dynamics equation in (3.3). In particular, this results in    Z∞ 1 0 δη >  0 1  ⊗ τ (t) · 0 0 0      1 0 0 Θ−1 ˜ 0 p ⊗ τ (t)> Mλ> η − dt = 0 . (3.25) 0 1 0 mgl sin (ϕ) ˜ +u ˜

         1 0 0 ⊗ τ (t)> η , p˜ = 0 1 0 ⊗ τ (t)> η and u 0 0 1 ⊗ τ (t)> η where ϕ˜ = ˜= have not yet been substituted for readability. After some algebra and simplications exploiting the choice of basis functions we arrive at       1 0 0 0 1 0  0 1 0  ⊗ Mλ>  η −  0 0 0  ⊗ I(N +1)  Θ−1 η 0 0 0 0 0 0    Z∞ 0     1 0 0 ⊗ τ (t)> η dt −  1  ⊗ τ (t) mgl sin 0 0    0 0 0 (3.26) −  0 0 1  ⊗ I(N +1)  η = 0 . 0 0 0 That equation shall now be referred to as F (η) = 0 for improved readability. Since we want to perform quadratic programming, linearization is necessary because only constraints linear in η are (η0 ) permissible. We use a rst order taylor expansion F (η) ≈ F (η0 ) + ∂F∂η (η − η0 ) around η0 . The Jacobian in this particular case is given by       0 1 0 1 0 0 ∂F (η0 )  0 1 0  ⊗ Mλ>  −  0 0 0  ⊗ I(N +1)  Θ−1 = ∂η 0 0 0 0 0 0    ∞ Z 0        1 0 0 ⊗ τ (t)> η0 1 0 0 ⊗ τ (t)> dt −  1  ⊗ τ (t) mgl cos 0 0    0 0 0 −  0 0 1  ⊗ I(N +1)  . (3.27) 0 0 0 Initial Conditions

Initial conditions given as ξ0 =



1 0

0 1

0 0



>

⊗ τ (0)



ϕ(0)

 η = ξ0 .

p(0)

>

directly yield a linear constraint (3.28)

14

3.4. Simple Actuated Pendulum

Since we want to solve the optimization in terms of ηˆ, this needs to be rearranged to       1 0 0 1 0 0 > > ⊗ τ (0) ηˆ = ξ0 − ⊗ τ (0) η0 . 0 1 0 0 1 0

(3.29)

This constraint can be stacked with the dynamic constraint from above to one big equality constraint, to be supplied to the quadratic program solver. Input Constraint

Input constraints have the form umin ≤ u(t) ≤ umax and should be fullled for all times t. This is not trivial to enforce, so instead the constraints are sampled at discrete sampling times ti (see Section 3.3.3 for details). Hence the inequality constraints for lower and upper limit     >  0 0 1  ⊗ τ (ti )>  η ≤ umax , (3.30) 0 0 1 ⊗ τ (ti ) η ≤ −umin , − are simply stacked for all sampling times. Those are already linear, so to make them suitable for our solver we simply need to express them in terms of ηˆ, which gives        > >  0 0 1  ⊗ τ (ti )>  ηˆ ≤ umax − 0 0 1 ⊗ τ (ti ) > η 0 , (3.31) 0 0 1 ⊗ τ (ti ) ηˆ ≤ −umin + 0 0 1 ⊗ τ (ti ) η0 . − 3.4.4

Trajectory Generation

Sequential quadratic programming relies on linearization. Therefore, it is necessary to come up with near-optimal starting values to linearize around, such that the approximated constraints resemble the real system as close as possible. This is especially a problem when the algorithm gets initialized, because no prior parameter vector η (e.g., from the last time step) is available. The MPC algorithm does for instance not converge to a swing-up solution when the pendulum rests in its stable equilibrium and the initialization values of η are all zero. This is because it falls into a local minimum from which it cannot recover and the problem becomes infeasible. In this case, we resort to trajectory generation. We use a dierent method to generate a trajectory that the MPC algorithm could follow and start optimizing from. A simple energy controller shows sucient performance in our case. A short derivation is provided in the following. The total potential and kinetic energy stored in the system is

H=

1 Θ0 ϕ˙ 2 + mgl cos(ϕ) . 2

(3.32)

The system is conservative except for the external torque. Hence we can use the input u to supply energy to or remove energy from the system, governed by H˙ = −uϕ˙ . Energy control means, that we want the system to attain a desired energy level H0 = mgl that is equal to the energy of the goal state (resting at the upright equilibrium). Hence we choose the control law

1 u = ϕ(H ˙ − H0 )k = ϕ(Θ ˙ 0 ϕ˙ 2 + mgl cos(ϕ) − H0 )k , 2

(3.33)

with gain k such that

H˙ = −uϕ˙ = −ϕ˙ 2 (H − H0 )k .

(3.34)

This controller is not capable of stabilizing the system at its upper equilibrium. However this is not required because we are only interested in the swing-up trajectory. If stabilization is desired additionally, it can be achieved by catching the pendulum near its equilibrium with a simple PD controller.

Chapter 3. Problem derivation

15

ϕ l

x

M

m g

u

Figure 3.3: Schematic of the actuated pendulum. Table 3.1: Parameters of the cart-pendulum system. Description Symbol Value Unit Eective cart mass M 1.73 kg Pendulum point mass m 0.175 kg Pendulum arm length l 0.28 m

3.5

Pendulum on a Cart

In the following, we consider the mechanical system of an inverted pendulum on a trolley. A schematic drawing including coordinate conventions can be found in Figure 3.3. 3.5.1

Dynamics

Similar to the previous section, the system dynamics are characterized rst. The mechanical system has 2 DoF and therefore n = 4 states. Physical parameter values are given in Table 3.5.1. The input to the system is the resultant force acting on the cart. For example by using Lagrange's equations, one obtains the following equations of motion in the minimal coordinates x and ϕ:         0 x 0 0 1 0 x˙ 1 0 0 0    ϕ˙   0 0 0 1   ϕ    0 1 0 0 0  (3.35) +  =   2  0 0 M + m ml cos(ϕ)   x ¨   0 0 0 0   x˙   u + mlϕ˙ sin(ϕ)  mgl sin(ϕ) 0 0 0 0 ϕ˙ ϕ¨ 0 0 ml cos(ϕ) ml2 For better readability, the time dependence is not explicitly stated, but it should be noted that all variables other than the physical parameters vary in time. We dene the mass matrix M and the generalized momentum p as   M + m ml cos(ϕ) M (ϕ) := , (3.36) ml cos(ϕ) ml2     p1 x˙ p := = M (ϕ) . (3.37) p2 ϕ˙ This allows simplifying the equations of motion     x M −1 (ϕ)p  ϕ  d  =  =: f (ξ, u) , u ξ˙ = (3.38) dt  p1  mgl sin(ϕ) − mlϕ˙ x˙ sin(ϕ) p2  > where ξ := x ϕ p1 p2 denes the new state vector. Now we introduce the approximation of time trajectories via a parametric representation of state and input,    ξ˜ = I5 ⊗ τ > (t) η . (3.39) u ˜

16

3.5. Pendulum on a Cart

This formulation is equivalent to (3.20), (3.21) concatenated with four states and one input. The system dynamics are formulated according to the Galerkin method

Z∞

  ! ˙ ˜u δ ξ˜> ξ˜ − f (ξ, ˜) = 0

∀ δξ .

(3.40)

0

Note that the zero vector, 0, is a column vector of appropriate length with only zero entries. By expressing the approximated state trajectory ξ˜ and its derivatives as a function of η ,    I4 0 ⊗ τ > (t) η , ξ˜ = (3.41)    I4 0 ⊗ τ > (t) δη , δ ξ˜ = (3.42)    ˙˜ > > I4 0 ⊗ τ (t)Mλ η , ξ= (3.43) we obtain the system dynamics in the form of equality constraints for η :    0>    0>     >  0 0 0 0 1  ⊗ I(N +1)  η diag([1, 1, 1, 1, 0]) ⊗ Mλ η −        0> 0>    1 0 0   Z∞  0 1 0     M −1 (ϕ) ˜ p˜ !  0 0 0  ⊗ τ (t) dt = 0 . −   mgl sin(ϕ)   ˙x ˙ sin(ϕ) ˜ − ml ϕ ˜ ˜ ˜   0 0 1  0 0 0 0

(3.44)

The product ϕ˜˙ x ˜˙ can also be expressed as a function of the system's state (which was not inserted above for readability):   1 0 1 ϕ˜˙ x ˜˙ = p˜> M −1 (ϕ) ˜ M −1 (ϕ) ˜ p˜ . (3.45) 1 0 2 Note that all state variables in (3.44) (i.e., all variables that have a tilde above them) are linear functions in η , namely    0 0 1 0 0 > p˜ = ⊗ τ (t) η , (3.46) 0 0 0 1 0    0 1 0 0 0 ⊗ τ > (t) η . ϕ˜ = (3.47) At this stage, we have moved all explicit time dependence to the decaying polynomials τ (t). Linearized Dynamics

The dynamics (3.44) with (3.45), (3.46) and (3.47) inserted are of the form F (η) = 0 with a nonlinear function F . In a quadratic program, we are only permitted to have linear constraints, therefore this function must be linearized around a given setpoint η0 . With ηˆ = η − η0 the linearization reads

F (η) ≈ F (η0 ) +

∂F (η0 ) ! (η − η0 ) = 0 , ∂η ∂F (η0 ) ⇔ ηˆ = −F (η0 ) . ∂η

(3.48)

The Jacobian matrix on the left hand side of (3.48) can be computed analytically with a computer algebra software package. Again an integral remains which is evaluated by Laguerre-Gauss quadrature.

Chapter 3. Problem derivation

17

Taking the derivative of (3.44) with respect to η is not trivial because of the complicated dependence on η . Here, we use the Symbolic Math Toolbox from MATLAB. The dicult part is the second factor of the integrand. The current implementation takes the symbolic derivative of that factor, then converts it into a MATLAB function that can be evaluated. This function denition can potentially be used in other code for speed improvements. The quadrature rule used to solve the integral is only applied afterwards and relies on evaluations of the derivative at the support points. 3.5.2

Initial Conditions

We want to enforce the initial conditions exactly by introducing them as an equality constraint in the optimization procedure. The corresponding equation reads   x0     ϕ0 !  ˜ =   , I4 0 ⊗ τ > (0) (ηˆ − η0 ) =  ξ(0) (3.49)  x˙ 0 | {z }  M (ϕ ) 0 η ϕ˙ 0 which can easily be solved for ηˆ. During runtime, the values for x0 , ϕ0 , x˙ 0 , ϕ˙ 0 are obtained by direct measurement from the system and possibly a state observer and η0 is known from initialization or the previous optimization step. 3.5.3

Cost

The cost is the central tuning knob for the MPC algorithm. We penalize deviations from the nominal operating point, which corresponds to the cart resting at zero displacement with the pendulum in its upright equilibrium at zero angular velocity. All states are penalized quadratically, except the angle due to its 2π periodicity. Hence we choose to penalize the height deviation of the pendulum end point. Additionally, the input is penalized quadratically. The total cost C therefore reads   x ˜ ∞ Z     p˜1  1  + q2 (1 − cos(ϕ))  x ˜ p˜1 p˜2 u ˜ diag q1 q3 q4 r ˜ dt C= 2 | {z }  p˜2  0 =:Q u ˜       1 0 0 0 1 0 0 0 0   0 0 0 0  Z∞       1 >  0 1 0 0  ⊗ τ (t) Q  0 0 1 0 0  ⊗ τ > (t) η = η          0 0 0 1 0 2   0 0 1 0  0 0 0 0 0 1 0 0 0 1     0 1 0 0 0 ⊗ τ > (t) η dt . + q2 1 − cos (3.50) When dening the above constant matrices kronecker product  1 0  0 0 Acost :=   0 0 0 0

to be Acost and using a simplication for the matrix-

0 1 0 0

0 0 1 0

 0 0   , 0  1

(3.51)

> > > > > (A> cost ⊗ τ )Q(Acost ⊗ τ ) = (Acost ⊗ τ )(Q ⊗ 1)(Acost ⊗ τ ) = (Acost QAcost ⊗ τ τ ) , Z∞ > > H1 := (A> cost QAcost ⊗ τ τ ) dt = (Acost QAcost ⊗ I(N +1) ) , (3.52) 0

18

3.5. Pendulum on a Cart

the cost can be simplied to

1 C = η > H1 η + 2

Z∞ q2 1 − cos



0

1

0

0

0



  ⊗ τ > (t) η dt .

(3.53)

0

Quadratic Cost Expansion

The above cost is not suitable for a quadratic program, instead we must make it quadratic. This is done by linearization, see Section 3.3.1 for details. The derivatives can be found analytically and are given by     0    1  Z∞    >     ∂C(η) >  0  ⊗ τ (t) 0 1 0 0 0 ⊗ τ > (t) dt , η = (H1 η) + q2 sin        ∂η    0  0 0 (3.54)     0    1  Z∞    >    0  ⊗ τ (t) diag ([0, 1, 0, 0, 0]) ⊗ τ (t)τ > (t) dt . η H(η) = H1 + q2 cos         0  0 0 (3.55) Regularization

In some cases, it might be useful to also penalize the rate of change of the input (u˙ ). This means the term

Z∞ 0

 1 2 1 u˙ qu dt = qu η > diag( 0 2 2 |

0

0

  0 1 ) ⊗ Mλ Mλ> η {z }

(3.56)

=:Hu

needs to be added to the total cost. Essentially, Hu gets added to H1 in equation (3.53). 3.5.4

State and Input Constraints

This system has to obey two kinds of inequality constraints, one is the input saturation and the second is the limited distance that the cart may move. This is expressed as

xmin ≤ x(t) ≤ xmax ,

(3.57)

umin ≤ u(t) ≤ umax .

(3.58)

Introducing parametrization via basis functions, the above read      −1 0 0 0 0 −xmin > ⊗ τ (t) η≤ , 1 0 0 0 0 xmax      0 0 0 0 −1 −umin ⊗ τ (t)> η ≤ . 0 0 0 0 1 umax

(3.59) (3.60)

Similar to the procedure in case of the simple pendulum (Section 3.4.3) these constraints are sampled and stacked together for the dierent sampling times ti . In addition, they are formulated in terms of ηˆ = η − η0 .

Chapter 3. Problem derivation

3.5.5

19

Trajectory Generation

Several methods (controllers) have been tested for generating initial swingup trajectories. They are presented in the following section. Given dierent input constraints and starting congurations, they may perform better or worse. The major diculty is that the mass of the cart is relatively large compared to the pendulum, and has therefore comparably large inertia. An optimal control approach with the Minimum Principle has also been tested, however the resulting boundary value problem turned out to be dicult to solve and was discarded due to time constraints of this project. There is however still code available. Energy Control

Energy control is not quite as simple as is was in the simple pendulum case, because the system has two degrees of freedom in which energy can be stored. It is therefore not sucient anymore to guide the system to a certain energy level, because it may all be stored in the cart. In the following, a control law is developed which takes care of that. The potential and kinetic energy of the system and its partial derivative w.r.t. p1 may be written as

> 1 H(p, ϕ) = mgl cos(ϕ) + p> M (ϕ)−1 p , 2 ∂H 2(−p1 + p2 cos(ϕ)) . = ∂p1 −3 + cos(2ϕ)

(3.61) (3.62)

Furthermore, dene the Lyapunov function

V =

1 2 1 p + (H − H0 )2 α ≥ 0 . 2 1 2

(3.63)

The tuning parameter α > 0 balances how much energy goes into the pendulum or the cart motion respectively. H0 = mgl is the desired energy level at the upper equilibrium. By construction, the Lyapunov function is positive semidenite. Its total time derivative simplies to   dV ∂H = p1 + α(H − H0 ) u. (3.64) dt ∂p1 1 The rst term arises directly from the system dynamics dp dt = u. The second term contains only a partial derivative w.r.t. p1 because the sum over all other partials vanishes, since the system is conservative. Now by choosing the input u to be   ∂H u = −γ p1 + α(H − H0 ) , (3.65) ∂p1

we ensure that the derivative dV dt ≤ 0 is non-positive. Therefore the system should converge to a state corresponding to V = 0 which means that the energy H approaches H0 while at the same time the cart momentum vanishes. The dicult part of this control strategy is choosing a suitable α. This choice naturally depends on the physical parameters of the system. Bang-Bang Energy Control

Åström and Furuta [1] propose the cart acceleration

x ¨ = sat {k(Epend − E0 ) sign(ϕ˙ cos(ϕ))} .

(3.66)

for a simple way to inject energy into the system. The operator sat in this context means that the result is clipped to the allowable maximal acceleration and E is the kinetic energy of the rotatory motion only.

20

3.5. Pendulum on a Cart

Knowing the system model, we can solve for the input that is needed to achieve this acceleration. However, this approach shows a rather poor performance in simulation and is not suitable for reducing the cart motion to a minimum. The drawbacks may be due to the dierent constraints. While Åström and Furuta assume the cart acceleration is constrained, in our case it is the force on the cart. This is not a one-to-one correspondence because of the pendulum dynamics. Furthermore, it is unclear how to choose a suitable gain k . Heuristic Approach

A very simple but surprisingly eective way to swing up the pendulum is by supplying it with sucient energy and then hoping that an LQR controller can catch it at the top. Here, we simply used    5π , k · sign(ϕ) ˙ if ϕ ∈ 3π 4 , 4 (3.67) u= 0 else . to inject energy into a system with gain k to be tuned. Resonance Swingup

Another option to swing up the pendulum from its resting position is to excite its natural frequency. The Pendulum-Cart system linearized around the stationary state at ϕ = π, x = 0, reads         x ¨ 0 0 x u M + m −ml + = . (3.68) ϕ¨ 0 mgl ϕ−π 0 −ml ml2 {z } {z } | | =:A

=:C

The roots of the characteristic polynomial det(Aν 2 + C) are the eigenvalues νi of the system. As expected, two of them are zero because the cart's motion is unconstrained. The other two are a purely complex conjugated pair, indicating an undamped oscillation at radial frequency r (M + m)g . (3.69) ω= Ml By stimulating the system at this frequency, a strong swingup response can be achieved. LQR Control

When applying one of the previous control laws, one hopefully comes close enough to the unstable equilibrium point such that an LQR controller can catch and stabilize the system. This is necessary in order to get a smooth and converging trajectory for the MPC routine. The linearized system dynamics ξ˙ = Aξ + Bu are fully captured by the matrices     1 −1 0 0 0 M Ml −1 M +m   0  0  0  Ml M ml2  , A= B= (3.70)  0   1  . 0 0 0 0 0 mgl 0 0 Those matrices together with suitable cost weighting are then used to solve for the optimal feedback gain matrix, e.g., via MATLAB lqr command. The condition for switching to LQR control is simple: At every time step, the force that would be applied by the LQR controller is calculated. Once this falls below the input constraint, the LQR control law is used. In all simulations, the LQR controller successfully caught the system, i.e., it never violated input constraints once the switch occurred.

Chapter 4

Performance in Simulation 4.1

Simulation Setup

The performance of the MPC routine was evaluated in simulation. The following enumeration describes the procedure: 1. First, a trajectory generator was called that takes the system at the user-specied initial conditions (usually at rest at the stable equilibrium). It simulates the system subject to one of the controllers mentioned in the trajectory generation Sections 3.4.4 resp. 3.5.5. The integration is performed with the ordinary dierential equation solver ode45 in MATLAB. 2. If the convergence of the generated trajectory is not towards ϕ = 0, but another multiple of 2π , the entire ϕ(t) trajectory and initial condition ϕ0 get shifted by that value. This is necessary because the exponentially decaying polynomials are only reasonable to use when there is convergence to zero. Finally, the parameter η gets tted to the trajectory according to Section 2.3. 3. This parameter vector is the initial guess for the MPC routine. The total cost and system constraints are approximated by quadratic, resp. linear functions in the parameter and the corresponding matrices are passed to MATLAB's quadratic program solver quadprog. The result of this optimization is the minimizer ηˆ. Finally ηˆ gets added to the original guess around which we linearized. This step is repeated until convergence, usually after less than ten iterations. 4. Once the sequential quadratic programming routine converges to the solution η , the part corresponding to the input (see equations (3.21) resp. (3.39)) is used to extract u(t) and applied to the system for the time span of one sampling interval. The system is again simulated with ode45. 5. At the end of the current sampling period, the system's states are extracted and set as new initial conditions for the next time step. Optionally, one may perform time shifting as introduced in 3.21 for forward propagation of the parameters. The current parameter vector η (time shifted or not) then gets used as a new starting point for the next time step, i.e., continue at step 3.

4.2

Results

The following results and plots are taken from the Pendulum-Cart system. This is the more complicated and sophisticated system. Similar results are obtained for the simple pendulum. 21

x [m]

u [Nm]

22

4.2. Results

200 100 0 −100

0

1

2

3

4

5

6

7

8

9

10

0

1

2

3

4

5

6

7

8

9

10

0

0.5

1

1.5

2

2.5

3

3.5

4

4.5

5

0

1

2

3

4

5

6

7

8

9

10

0

1

2

3

4

5

6

7

8

9

10

2 0 −2

x˙ [m/s]

10 0 −10 −20 4 2 0 −2 −4

ϕ˙ [rad/s]

ϕ [rad]

Generated Approx

40 20 0 −20

Time [s] Figure 4.1: Generated trajectory with resonance swingup from an initial condition of ϕ0 = π . Approximation by basis functions displayed as dashed line. Trajectory Generation

Figure 4.1 shows the generated trajectory (solid line) along with its approximation (dashed) by basis functions of order N = 5 with decay λ = 7. It was produced with the a resonance swingup controller. One can observe that the trajectory is not really feasible, because the cart displacement is much larger than the rail length (1 m) and the input magnitude of the approximation also exceeds the generated input by far at the beginning. Nonetheless, the extracted parameters are good enough to give the MPC routine a rst starting point. Another interesting point is that the approximation of ϕ˙ appears to be very poor. This can be explained by the fact that the trajectories that we approximate are p1 (t) and p2 (t) and not the raw speeds x, ˙ ϕ˙ . Hence by back-transforming the time series to the latter representation, errors may add up in an unfortunate manner. This does, however, not aect the performance of the MPC, because there, only generalized momenta are used. In general one should note that the generation of an initial guess for η0 is not a trivial task and several approaches may need to be tested out for a given initial condition. This is, of course, not very robust and one might look into a more sophisticated solution which works every time and comes up with a feasible and close-to-optimal solution. MPC Routine

Figure 4.2 shows an exemplary plot of a successful MPC swing-up. The blue dots are the time trajectories extracted from η . They are discontinuous because each sampling interval a new solution

100 50 0 −50 −100

ϕ [rad]

4 2 0 −2 −4 4 2 0 −2

ϕ˙ [rad/s]

0

0.5

1

1.5

23

2

2.5

3

3.5

4 2 0 −2 −4

x˙ [m/s]

X [m]

Input [N]

Chapter 4. Performance in Simulation

10 5 0 −5

4

4.5

5

Predicted True Evolution

0

0.5

1

1.5

2

2.5

3

3.5

4

4.5

5

0

0.5

1

1.5

2

2.5

3

3.5

4

4.5

5

0

0.5

1

1.5

2

2.5

3

3.5

4

4.5

5

0

0.5

1

1.5

2

2.5

3

3.5

4

4.5

5

Time [s] Figure 4.2: Example of a successful swing-up of the pendulum with MPC at a frequency of only 5 Hz. Parameters are λ = 5 and N = 5.

24

4.2. Results

gets calculated and no continuity is enforced. Red lines on the other hand are the simulations from the ode45 solver in MATLAB, given the inputs from the model predictive controller. It can be seen that the MPC predicts the evolution correctly because the blue dots are coincident with the red line in most cases. However the long-term prediction (plotted at certain points in thin black lines) are not very accurate, owing to the nonlinear model and the decaying trajectories. The following notes contain further observations made during simulation experiments: ˆ Tighter input constraints make it harder for the algorithm to nd a solution until nally the problem becomes (seemingly) unfeasible. Increasing the polynomial order N tends to alleviate that, but at the cost of longer computation times. It was not clear whether changes in λ have a predictable eect. ˆ There is an inherent tradeo between prediction horizon and accuracy. Large λ (e.g., greater than 8) make the approximation very accurate at the beginning, because all the weight is shifted to this period when tting the curves. However the solution quickly decays towards zero, which usually does not resemble the convergence speed of the physical system. Therefore the forecast period is very limited. Smaller lambda on the other hand allow longer prediction times at the cost of accuracy. On the good side however, this means that the approximation of the trajectory cost is relatively accurate and therefore optimization should produce a good result. ˆ Closely related to the previous point is the problem of infeasibility. If the algorithm falls into a local minimum, it may happen at the next iteration that the problem becomes infeasible. Though physically this is not true (we know there is certainly a way to reach the top position without violating any constraint), it may appear so to the algorithm because we have linearized around a solution that is not physical at all. Therefore the constraints may not t the system at all. In some cases, this can be avoided by increasing λ, i.e., reduce the prediction horizon. Trajectories decay faster towards zero then and do not violate constraints anymore. ˆ Due to the decay of τ (t) towards zero, there is a bias towards convergence at an angle of ϕ = 0. This may, however, not be the optimal thing to do and one can imagine several scenarios where convergence towards another multiple of 2π would be better. This remains an unsolved problem. When using trajectory generation, a partial workaround is possible by shifting the initial condition for ϕ passed to the MPC routine by a multiple of 2π . In the case of disturbances this approach breaks down because trajectory generation is (likely to be) sub-optimal and would be required to run online at each sampling step. ˆ A peculiar problem arises from the use of polynomial basis functions, resulting in jittering of the solution. If the MPC routine runs at a reasonable sampling frequency (e.g., above 10 Hz), only a very small portion of the input actually gets applied to the system each time. Due to the nature of polynomials, they tend to oscillate around the function to be approximated (see Figure 4.1), especially at the beginning of each interval. Since we only apply this very rst portion, the input is strongly governed by the oscillations rather than the real solution. In Figure 4.3 one can see that there is an intention to swing the pendulum to the left rst and then swing it up towards ϕ = 0. 
However due to the input jittering, the position remains stuck around ϕ = π .

ϕ˙ [rad/s]

x˙ [m/s]

ϕ [rad]

x [m]

Input [N]

Chapter 4. Performance in Simulation

100 50 0 −50 −100

0

0.5

1

1.5

25

2

2.5

3

3.5

4 2 0 −2 −4

4

4.5

5

Predicted True Evolution

0

0.5

1

1.5

2

2.5

3

3.5

4

4.5

5

0

0.5

1

1.5

2

2.5

3

3.5

4

4.5

5

0

0.5

1

1.5

2

2.5

3

3.5

4

4.5

5

0

0.5

1

1.5

2

2.5

3

3.5

4

4.5

5

4 2 0 −2 −4 1 0.5 0 −0.5 2 0 −2 −4

Time [s] Figure 4.3: Example of Jittering at a sampling frequency of 10 Hz. Blue dots is the result of the MPC routine. Red curve is the actual time evolution of the system, black lines are the future predictions of the MPC algorithm at various time points.

26

4.2. Results

Chapter 5

Implementation 5.1

System Overview

The described MPC procedure shall be tested on a real system available at the IDSC lab. It consists of a motor which drives a cog-wheel. A belt transfers the torque from the cog-wheel to the cart. The input to the system is an analogue voltage signal. An amplier sets the motor voltage accordingly. Measurements are obtained from the motor encoder (corresponds to the cart position) and from an encoder at the pendulum pivot point, measuring the pendulum angle. There exists a ready-assembled micro controller setup for the inverted pendulum system. It can be programmed in a C# .NET Microframework. Since the functionality is very limited, we decided to use the micro controller only as a bridge to forward data from the system encoders to the computer and vice versa. All signicant computations are executed on the computer directly. In the following, we describe how communication between a laptop and the microcontroller is handled and how the system is controlled. The owchart in Figure 5.1 captures the most important aspects. Note that the MPC routine has not yet been implemented on that level, due to time constraints. For preliminary testing, a stabilizing LQR controller was used.

5.2

Communication

Assuming almost all computation is done on a computer, there are only a small number of values to transmit: The micro controller sends two oating point variables (cart position and pendulum angle) to the computer and the latter responds with a single force command which is also a oating point number. Currently, the protocol is implemented in a very naive and fragile way to save transmission times. On the architectures used, each oating point number has the size of 4 bytes. Those bytes are simply split up and transmitted one by one. Note that this is not possible with an arbitrary system combination and needs to be tested on compiler and hardware changes. To determine the end of a transmission, an additional byte of value 0xFF is appended to the data. This is also not safe and may lead to misinterpretation in rare cases. With a baudrate of 115200 bps, the transmission of 9 bytes (two oating point numbers and the delimiting byte), the theoretical transmission time can be calculated. The serial communication uses one start and one stop bit and no parity. Therefore each data byte needs 10 bits when transmitted. Hence the time reads

t_{9\,\mathrm{bytes}} = \frac{9\ \mathrm{bytes}\cdot 10\ \mathrm{bit/byte}}{115200\ \mathrm{bit/s}} \approx 0.78\ \mathrm{ms}\,. \qquad (5.1)
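To make the framing concrete, the following C++ sketch shows one possible way to pack the two measurements on the sending side and to parse them on the receiving side. The function names and buffer handling are illustrative and do not correspond to the actual COMPort implementation; the sketch assumes, as discussed above, that both ends share the same 4-byte float representation and byte order.

#include <cstdint>
#include <cstring>
#include <vector>

// Append the raw bytes of a 4-byte float to the outgoing buffer.
static void appendFloat(std::vector<uint8_t>& buf, float value)
{
    uint8_t raw[sizeof(float)];
    std::memcpy(raw, &value, sizeof(float));      // assumes identical float layout on both ends
    buf.insert(buf.end(), raw, raw + sizeof(float));
}

// Build one frame: cart position, pendulum angle, then the 0xFF delimiter.
std::vector<uint8_t> encodeMeasurements(float cartPosition, float pendulumAngle)
{
    std::vector<uint8_t> frame;
    appendFloat(frame, cartPosition);
    appendFloat(frame, pendulumAngle);
    frame.push_back(0xFF);                        // end-of-transmission marker (see text)
    return frame;
}

// Parse a complete 9-byte frame; returns false if the delimiter is missing.
// Note that a payload byte equal to 0xFF can still confuse the framing,
// which is the weakness mentioned above.
bool decodeMeasurements(const uint8_t frame[9], float& cartPosition, float& pendulumAngle)
{
    if (frame[8] != 0xFF) return false;
    std::memcpy(&cartPosition, frame, sizeof(float));
    std::memcpy(&pendulumAngle, frame + sizeof(float), sizeof(float));
    return true;
}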

It turns out that the bottleneck of the communication is not the bandwidth but the latency, which currently causes a delay of one sample and should be addressed in further work on this project. Occasionally the computer catches up with the microcontroller and applies the undelayed input for one time step before falling behind again. Computation times are therefore not the issue; the limiting factor is the time it takes to query, read from, and write to the serial port.

Figure 5.1: Schematic flowchart of the control loop. Solid arrows show the execution sequence, dashed arrows represent data transfer. (Micro-controller side: read encoders and process signals, send cart position and pendulum angle, wait until a reading is available, set motor commands, sleep. Computer side: read UART, check whether a reading is available, calculate the control input, send the control command.)

5.3 Filtering and Control

For a preliminary inspection of the control performance, a simple LQR state feedback controller is implemented on the computer. A first-order discrete low-pass filter runs on the microcontroller to smooth the encoder values before they are transmitted to the computer. The difference equation reads

y_\text{filt}[t_k] = (1 - a)\, y_\text{filt}[t_{k-1}] + a\, y_\text{measured}[t_k]\,, \qquad a = \frac{T_s}{T_s + T_f}\,, \qquad (5.2)

with sampling time T_s and filter time constant T_f. The square brackets indicate the discrete nature of the variables. In the current implementation, the sampling time is chosen according to the execution frequency and the filter time constant by analyzing the frequency spectrum of the data. Numerically, this yields T_s = 1/50 s and T_f = 0.02 s, which happen to coincide at the sampling rate of 50 Hz. The full state vector is then obtained by simple numerical differentiation of the filtered positions (with knowledge of the sampling frequency). The LQR gains for the linearized system have been found with MATLAB; a tuning script is available. Experiments showed that the combined delay of the communication (typically one sample) and the filter (which, due to its causal formulation, inherently introduces some delay) destabilizes the system under a controller that did stabilize it when running directly on the microcontroller. Since the MPC formulation also relies on full knowledge of the state, it may be helpful to implement a state observer, or at least to find a suitable filter that provides reliable state derivatives without introducing too much delay.
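A minimal C++ sketch of the filter (5.2) combined with the finite-difference differentiation described above might look as follows. The class name and the handling of the very first sample are illustrative choices, not the actual implementation; the commented constants restate the values quoted in the text.

// First-order discrete low-pass filter (5.2) plus finite-difference derivative.
class FilteredSignal {
public:
    FilteredSignal(double Ts, double Tf) : a_(Ts / (Ts + Tf)), Ts_(Ts) {}

    // Feed one raw encoder sample; returns the filtered value.
    double update(double measured)
    {
        const double filtered = initialized_ ? (1.0 - a_) * previous_ + a_ * measured
                                             : measured;   // first call: no history yet
        derivative_  = initialized_ ? (filtered - previous_) / Ts_ : 0.0;
        previous_    = filtered;
        initialized_ = true;
        return filtered;
    }

    double derivative() const { return derivative_; }

private:
    double a_;                    // a = Ts / (Ts + Tf), see (5.2)
    double Ts_;
    double previous_    = 0.0;
    double derivative_  = 0.0;
    bool   initialized_ = false;
};

// Values used in the current implementation: Ts = 1/50 s, Tf = 0.02 s, e.g.
// FilteredSignal cartPositionFilter(1.0 / 50.0, 0.02);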

5.3.1 LQR Controller

The design of an LQR controller is straightforward with software packages like MATLAB. The pendulum-cart system is linearized around its upper equilibrium point (x = ẋ = ϕ = ϕ̇ = 0). For simplicity, we use those states directly (rather than considering generalized momenta). The linearized system dynamics read

\dot{\zeta} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & M+m & ml \\ 0 & 0 & ml & ml^2 \end{bmatrix}^{-1} \left( \begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & mgl & 0 & 0 \end{bmatrix} \zeta + \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix} u \right), \qquad (5.3)

with \zeta = \begin{bmatrix} x & \varphi & \dot{x} & \dot{\varphi} \end{bmatrix}^\top.
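For illustration, the continuous-time state-space matrices implied by (5.3) can be assembled numerically, e.g., with the Eigen library, as in the sketch below. The function name and the numerical values of M, m, l, and g are placeholders and not the parameters of the actual test stand; the LQR gain itself is assumed to be computed offline (for instance with MATLAB's lqr command), as mentioned in the previous section.

#include <Eigen/Dense>

// Assemble A and B of the linearized dynamics (5.3): zeta_dot = A*zeta + B*u,
// with zeta = [x, phi, x_dot, phi_dot]^T.
void linearizedCartPendulum(double M, double m, double l, double g,
                            Eigen::Matrix4d& A, Eigen::Vector4d& B)
{
    Eigen::Matrix4d E;      // mass matrix inverted on the left-hand side of (5.3)
    E << 1, 0, 0,     0,
         0, 1, 0,     0,
         0, 0, M + m, m * l,
         0, 0, m * l, m * l * l;

    Eigen::Matrix4d A0;
    A0 << 0, 0,         1, 0,
          0, 0,         0, 1,
          0, 0,         0, 0,
          0, m * g * l, 0, 0;

    Eigen::Vector4d B0(0.0, 0.0, 1.0, 0.0);

    A = E.inverse() * A0;   // acceptable for a 4x4 system assembled once offline
    B = E.inverse() * B0;
}

// Example call with placeholder parameters (not identified values):
// Eigen::Matrix4d A; Eigen::Vector4d B;
// linearizedCartPendulum(0.5, 0.2, 0.3, 9.81, A, B);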

5.3.2 Friction Compensation

In all of the above, it was assumed that the control input u is the force on the cart. However, the physical system only allows setting a motor voltage, so the resultant force cannot be set directly. An initial approach was to calculate the force on the cart theoretically from first principles, using parameters such as the motor resistance, the torque and speed constants, and the cog-wheel radius. However, experiments with this approach resulted in very poor performance of the LQR controller. The main observation was that friction is not negligible; in particular, stick-slip effects at low speeds make control difficult. Additionally, it is hard to verify the correctness of the calculations, since the microcontroller's analogue commands are sent to an amplifier whose properties were not available at the time of the experiments. For those reasons, a simple system identification was performed, similar to what was done by Preiswerk [2]. It is assumed that the resultant force on the cart reads

F_\text{res} = \alpha_1 u_\text{in} \underbrace{-\, \alpha_2 \dot{x} - \alpha_3\, \mathrm{sign}(\dot{x})}_{\text{friction}}\,. \qquad (5.4)

Table 5.1: Parameters of the Friction Model (5.4)

Parameter   Value   Unit
α1          0.055   N
α2          25      Ns/m
α3          10      N

Essentially, the friction consists of a constant value and a part proportional to the cart speed. The parameters {α1, α2, α3} are identified through the following procedure:

1. The system is assumed to rest at its stable equilibrium with no force acting on the cart.

2. A constant input value u_in is applied to the system for a duration of 0.5 s; afterwards the input is reset to zero and remains there. The step height has arbitrary units when specified in the code. After several conversions, those values essentially translate to voltage; the corresponding conversion factor α1 is one of the parameters to be identified.

3. The time evolution of the cart motion during that input sequence is recorded and plotted.

4. The same control input is reproduced in simulation. Comparing the recorded and simulated time trajectories allows for parameter identification.

We repeat this procedure for different input step magnitudes. It turns out that this model underestimates the friction at low speeds and overestimates it at high speeds; the simulation results are plotted against the displacement sensor readings in Figure 5.2, where this under- resp. overestimation is clearly visible. For our purpose, low speeds were given higher importance, since most of the control action happens at very low cart speeds. The identified parameter values are listed in Table 5.1. We conclude that the presented friction model is far from optimal; it is advisable to perform a proper system identification or to fit a more flexible model to the system in order to improve control performance. Once the model has been fixed, the control action can be reconstructed directly from (5.4). The resulting force is the theoretical control input assumed in the previous chapters and therefore corresponds to the desired force. Hence, once a controller has returned the desired force F_desired to be applied to the cart, the actual input value that needs to be set on the microcontroller is

u_\text{in} = \frac{F_\text{desired} + \alpha_2 \dot{x} + \alpha_3\, \mathrm{sign}(\dot{x})}{\alpha_1}\,. \qquad (5.5)
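A direct translation of (5.5) into code could look like the following sketch. The function name is illustrative and not part of the existing implementation; the parameter values are those of Table 5.1.

// Friction model parameters from Table 5.1.
constexpr double alpha1 = 0.055;  // N per (arbitrary) input unit
constexpr double alpha2 = 25.0;   // Ns/m
constexpr double alpha3 = 10.0;   // N

// Convert the desired cart force [N] into the raw input value for the
// microcontroller, compensating friction according to (5.5).
double frictionCompensatedInput(double desiredForce, double cartVelocity)
{
    const double signV = (cartVelocity > 0.0) ? 1.0 : (cartVelocity < 0.0 ? -1.0 : 0.0);
    return (desiredForce + alpha2 * cartVelocity + alpha3 * signV) / alpha1;
}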

5.4 Building and Deployment

Computer Side

The executable on the computer side has been built with MinGW on a Windows 7 platform using the Eclipse Luna IDE. The console commands to build the binaries can be found in Listing 5.1; specific paths and compiler options may need to be adjusted. In the future, constructing a makefile may also prove helpful in achieving portability.

Figure 5.2: Plots showing how the friction model simulation compares with recorded data. Three different input step magnitudes are plotted (input step = 200, 300, and 500; cart position x [m] over time [s], measurement vs. simulation). The units are arbitrary and only get defined through the parameters. The simulated friction model assumes the parameters given in Table 5.1.


Listing 5.1: Build commands

g++ "-IC:\\path\\to\\Mosek\\7\\tools\\platform\\win32x86\\h" -O0 -g3 -Wall -c -fmessage-length=0 -std=c++11 -o "src\\CartMPC.o" "..\\src\\CartMPC.cpp"
g++ "-IC:\\path\\to\\Mosek\\7\\tools\\platform\\win32x86\\h" -O0 -g3 -Wall -c -fmessage-length=0 -std=c++11 -o "src\\PendOnCart.o" "..\\src\\PendOnCart.cpp"
g++ "-IC:\\path\\to\\Mosek\\7\\tools\\platform\\win32x86\\h" -O0 -g3 -Wall -c -fmessage-length=0 -std=c++11 -o "src\\LaguerreLib.o" "..\\src\\LaguerreLib.cpp"
g++ "-IC:\\path\\to\\Mosek\\7\\tools\\platform\\win32x86\\h" -O0 -g3 -Wall -c -fmessage-length=0 -std=c++11 -o "src\\COMPort.o" "..\\src\\COMPort.cpp"
g++ -static-libgcc -static-libstdc++ -o CartMPC.exe "src\\PendOnCart.o" "src\\LaguerreLib.o" "src\\CartMPC.o" "src\\COMPort.o"

Micro-controller Side

For installation instructions, the corresponding document from the digital control course at IDSC was followed. Here, Microsoft Visual C# Express 2010 is used. The micro-controller can be flashed and debugged via a USB connection. Serial communication happens through a second USB connection and is emulated as a COM port in Windows.

Bibliography

[1] K. J. Åström and K. Furuta. Swinging up a pendulum by energy control. Automatica, 36:287–295, 2000.

[2] Pascal Preiswerk. Simulation-based LQR-trees: Implementation and verification on the inverted pendulum system. Semester Project, January 2010.

Institute for Dynamic Systems and Control
Prof. Dr. R. D'Andrea, Prof. Dr. L. Guzzella

Title of work:
Nonlinear Infinite Horizon Model Predictive Control with Parametric Trajectories

Thesis type and date:
Semester Thesis, December 2015

Supervision:
Michael Mühlebach
Prof. Dr. Raffaello D'Andrea

Student:
Name: Jan Carius
E-mail: [email protected]
Legi-Nr.: 11-936-242
Semester: 8

Statement regarding plagiarism:
By signing this statement, I affirm that I have read and signed the Declaration of Originality, independently produced this paper, and adhered to the general practice of source citation in this subject-area.

Declaration of Originality:
http://www.ethz.ch/faculty/exams/plagiarism/confirmation_en.pdf

Zurich, 14. 1. 2016