Numerical schemes for Hamilton-Jacobi equations, control problems and games M. Falcone (SAPIENZA, Roma) & H. Zidani (ENSTA, Paris)
SADCO Spring School "Applied and Numerical Optimal Control", April 23-27, 2012, Paris – Lecture 1/3
OUTLINE OF THE COURSE:
• Lecture 1: Approximation of optimal control problems via DP
• Lecture 2: Efficient methods and perspectives
• Lecture 3: Approximation of reachable sets and state constrained problems
OUTLINE OF THIS LECTURE
• Control problems via Dynamic Programming
• The minimum time problem
• Pursuit-evasion games
• Numerical experiments
• Other classical problems
• Some references
The model problem
Let us consider the nonlinear system dynamics

ẏ(t) = f(y(t), a(t), b(t)),   t > 0,
y(0) = x,   (D)

where

y(t) ∈ R^N is the state,
a(·) ∈ A is the control of player 1 (player a),
A = admissible control functions of player 1 = { a : [0, +∞[ → A, measurable }
(e.g. A = piecewise constant functions with values in A),
The model problem
b(·) ∈ B is the control of player 2 (player b),
B = { b : [0, +∞[ → B, measurable },
A, B ⊂ R^M are given compact sets.

Assume f is continuous and

|f(x, a, b) − f(y, a, b)| ≤ L |x − y|   ∀x, y ∈ R^N, a ∈ A, b ∈ B.
Then, for every given pair of controls a(·) ∈ A, b(·) ∈ B, there is a unique trajectory of (D), denoted by y_x(t; a, b) (Carathéodory).
Payoff
The payoff of the game is

t_x(a(·), b(·)) = min{ t : y_x(t; a, b) ∈ T } ≤ +∞,

where T ⊆ R^N is a given closed target.
Goal of the game Player a wants to minimize the payoff, he is called the pursuer, whereas Player b wants to maximize the payoff, he is called the evader.
Example: Minimum time problem
This is a simple example with just one player:

ẏ = a,   A = { a ∈ R^N : |a| = 1 },   y(0) = x.
Then, t_x(a*) is equal to the length of the optimal trajectory joining x and the point y_x(t_x(a*)), thus

t_x(a*) = min_{a∈A} t_x(a) = dist(x, T ),
and any optimal trajectory is a straight line!
Example: Pursuit-Evasion Games
We have two players, each one controlling his own dynamics:

ẏ₁ = f₁(y₁, a),   ẏ₂ = f₂(y₂, b),   y_i ∈ R^{N/2}, i = 1, 2.   (PEG)

The target is

T_δ ≡ { |y₁ − y₂| ≤ δ },  δ > 0,   or   T₀ ≡ { (y₁, y₂) : y₁ = y₂ }.
Then, t_x(a(·), b(·)) is the capture time corresponding to the strategies a(·) and b(·).
Dynamic Programming for 1-Player
In this section we assume B = { b̄ }, so the system can be rewritten as

ẏ = f(y, a),   t > 0,   y(0) = x.

Define the value function

T(x) ≡ inf_{a(·)∈A} t_x(a).
T(·) is the minimum-time function; it is the best possible outcome of the game for player a, as a function of the initial position x of the system.
Reachable set
DEFINITION

R ≡ { x ∈ R^N : T(x) < +∞ },

i.e. the set of starting points from which it is possible to reach the target.

WARNING  The reachable set depends on the target and on the dynamics in a rather complicated way. It is NOT given in our problem.
Dynamic Programming for 1-Player
LEMMA  For all x ∈ R \ T and 0 ≤ t < T(x),

T(x) = inf_{a(·)∈A} { t + T(y_x(t; a)) }.   (DPP)
Sketch of the Proof  The inequality "≤" follows from the intuitive fact that, for every a(·), T(x) ≤ t + T(y_x(t; a)).
Sketch of the proof
The proof of the opposite inequality "≥" is based on the fact that equality holds if a(·) is optimal for x. For any ε > 0 we can find a minimizing control a_ε such that

T(x) + ε ≥ t + T(y_x(t; a_ε));

then split the trajectory and pass to the limit as ε → 0.
Sketch of the proof
To prove rigorously the above inequalities the following two properties of A are crucial:
1. a(·) ∈ A ⇒ ∀s ∈ R the function t ↦ a(t + s) is in A;
2. (Concatenation) a₁, a₂ ∈ A ⇒ a(·) ∈ A for every s > 0, where

a(t) ≡ a₁(t) if t ≤ s,   a(t) ≡ a₂(t) if t > s.
WARNING: concatenation is a crucial property
Note that the DP Principle works for A = { piecewise constant functions into A } but not for A = { continuous functions into A }, because joining together two continuous controls we are not guaranteed that the resulting control is continuous.
Getting the Bellman equation
Let us derive the Hamilton-Jacobi-Bellman equation from the DP Principle. Rewrite (DPP) as

T(x) − inf_{a(·)} T(y_x(t; a)) = t

and divide by t > 0:

sup_{a(·)} { [T(x) − T(y_x(t; a))] / t } = 1,   for t < T(x).

We want to pass to the limit as t → 0⁺.
Bellman equation
Assume T is differentiable at x and that the limit as t → 0⁺ commutes with sup_{a(·)}. Then, if ẏ_x(0; a) exists,

sup_{a(·)∈A} { −∇T(x) · ẏ_x(0; a) } = 1.

Then, setting a₀ = lim_{t→0⁺} a(t), we get

sup_{a₀∈A} { −∇T(x) · f(x, a₀) } = 1.   (HJB)

This is the Hamilton-Jacobi-Bellman partial differential equation, a first-order nonlinear PDE.
Bellman equation
Let us define the Hamiltonian

H₁(x, p) := max_{a∈A} { −p · f(x, a) } − 1;

then we can rewrite (HJB) in short as

H₁(x, ∇T(x)) = 0   in R \ T.

A natural boundary condition on ∂T is

T(x) = 0   for x ∈ ∂T.
T verifies the HJB equation
PROPOSITION  If T(·) is C¹ in a neighborhood of x ∈ R \ T, then T(·) satisfies (HJB) at x.

Sketch of the proof  We first prove the inequality "≤". Fix ā(t) ≡ a₀ for all t, and set y_x(t) = y_x(t; ā). (DPP) gives

T(x) − T(y_x(t)) ≤ t,   ∀ 0 ≤ t < T(x).

We divide by t > 0 and let t → 0⁺ to get −∇T(x) · ẏ_x(0) ≤ 1, where ẏ_x(0) = f(x, a₀) (since ā(t) ≡ a₀).
T verifies the HJB equation
Then

−∇T(x) · f(x, a₀) ≤ 1   ∀a₀ ∈ A,

and we get

max_{a∈A} { −∇T(x) · f(x, a) } ≤ 1.
Next we prove the inequality “≥”.
T verifies the HJB equation
Fix ε > 0. For all t ∈ ]0, T(x)[, by (DPP) there exists α ∈ A such that

T(x) ≥ t + T(y_x(t; α)) − εt.

Then

1 − ε ≤ [T(x) − T(y_x(t; α))] / t
      = −(1/t) ∫_0^t (∂/∂s) T(y_x(s; α)) ds
      = −(1/t) ∫_0^t ∇T(y_x(s)) · ẏ_x(s) ds
      = −(1/t) ∫_0^t ∇T(x) · f(x, α(s)) ds + o(1),

and as t → 0⁺ we get

1 − ε ≤ −∇T(x) · f(x, a₀)   for some a₀ ∈ A.
T verifies the HJB equation
Letting ε → 0⁺ we get

sup_{a∈A} { −∇T(x) · f(x, a) } ≥ 1.

We have proved that if T is regular then it satisfies the Bellman equation pointwise in the reachable set R.
Is T regular?
The answer is NO, even in simple cases. Let us go back to Example 1, where T(x) = dist(x, T ). Note that T is not differentiable at x if there exist two distinct points of minimal distance.

EXAMPLE  Let us take N = 1, f(x, a) = a, A = B(0, 1) and choose

T = ]−∞, −1] ∪ [1, +∞[.

Then T(x) = 1 − |x|, which is not differentiable at x = 0.
a.e. solutions
Note that in this example the Bellman equation is the eikonal equation

|Du(x)| = 1,   (1)

which has infinitely many a.e. solutions even when we fix the values on the boundary ∂T: u(−1) = u(1) = 0.
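To make the non-uniqueness concrete, here is a small sketch (our own construction, not part of the lecture) of a whole family of sawtooth functions solving |u'| = 1 a.e. with u(±1) = 0; n = 1 gives the distance function 1 − |x|:

```python
import numpy as np

# For each n >= 1, u_n is piecewise linear with slope +-1 and n "teeth",
# u_n(-1) = u_n(1) = 0 and |u_n'(x)| = 1 a.e. on (-1, 1).
def sawtooth(x, n):
    half = 1.0 / n                                    # half-period of one tooth
    return half - np.abs((x + 1.0) % (2.0 * half) - half)
```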
Continuity of T
Also the continuity of T is, in general, not guaranteed.
Take the previous example and set A = [−1, 0]; then we have

T(1) = 0,   lim_{x→1⁻} T(x) = 2.   (2)
However, the continuity of T ( · ) is equivalent to the property of Small-Time Local Controllability (STLC) around T .
STLC property
DEFINITION  Assume ∂T smooth. For every x ∈ ∂T there exists â ∈ A such that

f(x, â) · η(x) < 0,   (STLC)

where η(x) is the exterior normal to T at x.
The STLC guarantees that R is an open subset of R^N and that

lim_{x→x₀} T(x) = +∞,   ∀x₀ ∈ ∂R.
Weak solutions
We want to interpret the HJB equation in a "weak sense" so that T(·) is a "solution" (non-classical), unique under suitable boundary conditions. Let's go back to the proof of our proposition. We proved that

1. T(x) − T(y_x(t)) ≤ t, ∀t small and T ∈ C¹ ⇒ H(x, ∇T(x)) ≤ 0;
2. T(x) − T(y_x(t)) ≥ t(1 − ε), ∀t, ε small and T ∈ C¹ ⇒ H(x, ∇T(x)) ≥ 0.
Weak solutions
Main Idea:  If φ ∈ C¹ and T − φ has a maximum at x, then

T(x) − φ(x) ≥ T(y_x(t)) − φ(y_x(t))   ∀t,

thus

φ(x) − φ(y_x(t)) ≤ T(x) − T(y_x(t)) ≤ t,

so we can replace T by φ in the proof of the proposition and get

H(x, ∇φ(x)) ≤ 0.
Weak solutions
Similarly, if φ ∈ C¹ and T − φ has a minimum at x, then

T(x) − φ(x) ≤ T(y_x(t)) − φ(y_x(t))   ∀t,

thus

φ(x) − φ(y_x(t)) ≥ T(x) − T(y_x(t)) ≥ t(1 − ε),

and by the proof of the proposition

H(x, ∇φ(x)) ≥ 0.

Thus, the classical proof can be fixed also for T ∉ C¹(R), just replacing T with a "test function" φ ∈ C¹(R).
Viscosity solutions
DEFINITION (Crandall-Evans-Lions, 1985)
Let F : R^N × R × R^N → R be continuous, Ω ⊆ R^N open. We say that u ∈ C(Ω) is a viscosity subsolution of

F(x, u, ∇u) = 0   in Ω

if, for every φ ∈ C¹ and every local maximum point x of u − φ,

F(x, u(x), ∇φ(x)) ≤ 0.

It is a viscosity supersolution if, for every φ ∈ C¹ and every local minimum point x of u − φ,

F(x, u(x), ∇φ(x)) ≥ 0.

A viscosity solution is both a sub- and a supersolution.
Viscosity solutions
THEOREM If R \ T is open and T ( · ) is continuous, then T ( · ) is a viscosity solution of the Hamilton-Jacobi-Bellman equation (HJB). The proof is the argument before the definition!
Uniqueness
Next we want to prove the uniqueness of the solution for the Dirichlet boundary value problem

u + H(x, ∇u) = 0   in Ω,
u = g   on ∂Ω,   (BVP)

under assumptions satisfied by the Hamiltonian H₁. T(·) can be recovered from the solution of a boundary value problem as follows.
Uniqueness
We rescale T by the Kružkov change of variable

V(x) := 1 − e^{−T(x)}   if T(x) < +∞ (i.e. x ∈ R),
V(x) := 1   if T(x) = +∞ (x ∉ R),

so that

V(x) = inf_{a(·)∈A} J(x, a),   where J(x, a) := ∫_0^{t_x(a)} e^{−t} dt.
Uniqueness
Note that, by definition,

∇V(x) = e^{−T(x)} ∇T(x)   and   1 − V(x) = e^{−T(x)},

which implies

∇T(x) = ∇V(x) / (1 − V(x)).

Then, substituting into the equation for the minimum time, we get the new equation for V:

V(x) + max_{a∈A} [ −f(x, a) · ∇V(x) ] = 1.
Solving the free boundary problem
From V we can reconstruct T and R by

T(x) = −log(1 − V(x)),   R = { x : V(x) < 1 }.
This is quite important to solve the free boundary problem as well as for the numerical approximation. In fact, V takes values in [0,1].
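As a quick illustration, here is a minimal sketch (our own helper, not part of the lecture) of this reconstruction from grid values of V:

```python
import numpy as np

# Recover T = -log(1 - V) where V < 1 and the reachable-set mask R = {V < 1};
# V is a float array of Kruzkov values in [0, 1].
def reconstruct(V, tol=1e-12):
    reachable = V < 1.0 - tol
    T = np.full_like(V, np.inf)                 # T = +inf outside R
    T[reachable] = -np.log(1.0 - V[reachable])
    return T, reachable
```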
Uniqueness
Moreover, by the DP Principle, V satisfies

V + max_{a∈A} { −∇V · f(x, a) − 1 } = 0   in R^N \ T,   (BVP-B)
V = 0   on ∂T,

which is a special case of (BVP), with

H(x, u, p) = H₁(x, u, p) ≡ u + max_{a∈A} { −f(x, a) · p − 1 },   Ω = T^c ≡ R^N \ T.
Uniqueness
LEMMA  The Minimum Time Hamiltonian H₁ satisfies the "structure condition"

|H(x, p) − H(y, q)| ≤ K(1 + |x|) |p − q| + |q| L |x − y|   (SH)

for any x, y, p, q.

THEOREM (Crandall-Lions, 1984)  Assume H satisfies (SH), u, w ∈ BUC(Ω̄), u subsolution and w supersolution of v + H(x, ∇v) = 0 in Ω (open), u ≤ w on ∂Ω. Then u ≤ w in Ω.
Dynamic Programming for 2-Players
What is the value function for the 2-players game?
WARNING  It is not

inf_{a∈A} sup_{b∈B} J(x, a, b),

because a would choose his control function with the information of the whole future response of player b to any control function a(·).
Nonanticipating Strategies
A more reasonable information pattern can be modeled by means of the notion of nonanticipating strategies (Varaiya, Roxin, Elliott-Kalton):

for the Pursuer
Δ ≡ { α : B → A | b(t) = b̃(t) ∀t ≤ t′ ⇒ α[b](t) = α[b̃](t) ∀t ≤ t′ },

for the Evader
Γ ≡ { β : A → B | a(t) = ã(t) ∀t ≤ t′ ⇒ β[a](t) = β[ã](t) ∀t ≤ t′ }.
Lower Value of a game
Now we can define the lower value of the game

T(x) ≡ inf_{α∈Δ} sup_{b∈B} t_x(α[b], b),   or   V(x) ≡ inf_{α∈Δ} sup_{b∈B} J(x, α[b], b),

where the payoff is

J(x, a, b) = ∫_0^{t_x(a,b)} e^{−t} dt.
Value of a game
Similarly, the upper value of the game is

T̃(x) := sup_{β∈Γ} inf_{a∈A} t_x(a, β[a]),   or   Ṽ(x) := sup_{β∈Γ} inf_{a∈A} J(x, a, β[a]).

DEFINITION  We say that the game has a value if the upper and lower values coincide, i.e. if T = T̃ or V = Ṽ.
DP Principle for 2 Players
Lemma  For all 0 ≤ t < T(x),

T(x) = inf_{α∈Δ} sup_{b∈B} { t + T(y_x(t; α[b], b)) },   ∀x ∈ R \ T,

and

V(x) = inf_{α∈Δ} sup_{b∈B} { ∫_0^t e^{−s} ds + e^{−t} V(y_x(t; α[b], b)) },   ∀x ∈ T^c.

The proof is similar to the 1-player case but more technical, due to the essential use of nonanticipating strategies.
Isaacs equation
Isaacs' Lower Hamiltonian:

H(x, p) := min_{b∈B} max_{a∈A} { −p · f(x, a, b) } − 1.

The upper values T̃ and Ṽ satisfy a similar DP Principle.

Isaacs' Upper Hamiltonian:

H̃(x, p) := max_{a∈A} min_{b∈B} { −p · f(x, a, b) } − 1.
Isaacs equation
THEOREM (Evans-Souganidis, 1984)

1. If R \ T is open and T(·) is continuous, then T(·) is a viscosity solution of

H(x, ∇T) = 0   in R \ T.   (HJI-L)

2. If V(·) is continuous, then it is a viscosity solution of

V + H(x, ∇V) = 0   in T^c.
Numerical approximation
We will describe a method to construct approximation schemes for the Isaacs equation which keeps the main information of the game/control problem.
This approach leads to the numerical approximation of a first order PDE by a discretization of the original control problem and by a discrete DP principle.
Naturally, one can also choose to construct directly an approximation scheme for the Isaacs equation based on the discretization of the PDE, e.g. using a Finite Difference (FD) or Finite Volume (FV) scheme.
Features of the DP scheme
• The schemes have a natural interpretation which comes from the Discrete Dynamic Programming Principle
• Approximate feedback controls can be obtained without extra computations on the nodes. Once the value function is computed, we easily obtain approximate optimal trajectories.
• Natural extensions to high-order methods.
Discretization: 1-Player
By applying the Kružkov change of variable v(x) = 1 − e^{−T(x)} we rewrite the equation in the new variable:

v(x) + sup_{a∈A} [ −f(x, a) · ∇v(x) ] = 1,   (HJ)
v(x) = 0   on T,
v(x) = 1   on ∂R.

As seen, we can drop the second boundary condition.
Time discretization
Time step h = Δt > 0. We look at the system at the discrete times t_j = jh, j ∈ N.

Discrete dynamical system:

x_{j+1} = x_j + h f(x_j, a_j),   x₀ = x.

We define the time-discrete reachable set

R_h ≡ { x ∈ R^N : ∃{a_j} and j ∈ N such that x_j ∈ T }.
Discrete Minimum Time Function
Let us define, for every control sequence {a_j}, the number of steps to hit the target

n_h({a_j}, x) = +∞   if x ∉ R_h,
n_h({a_j}, x) = min{ j ∈ N : x_j ∈ T }   if x ∈ R_h,

and the minimal number of steps to hit the target

N_h(x) = min_{{a_j}} n_h({a_j}, x).

The discrete analogue of the minimum time function is h N_h(x).
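In code, n_h is just the first index at which the Euler trajectory enters the target; a minimal sketch (helper names are ours):

```python
# Number of steps along x_{j+1} = x_j + h f(x_j, a_j) until the target is hit,
# for a given finite control sequence; +inf if the target is never reached.
def n_h(x, controls, f, h, in_target):
    y = x
    for j, a in enumerate(controls):
        if in_target(y):
            return j                    # first index j with x_j in T
        y = y + h * f(y, a)
    return float("inf")
```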
The discrete Bellman equation
We change the variable introducing

v_h(x) = 1 − e^{−h N_h(x)}.

Note that 0 ≤ v_h ≤ 1. By the Discrete Dynamic Programming Principle we get

v_h(x) = S[v_h](x)   on R_h \ T,   (HJh)

where

S[v_h](x) ≡ min_{a∈A} [ e^{−h} v_h(x + h f(x, a)) ] + 1 − e^{−h},

with the boundary condition

v_h(x) = 0   on T.   (BC)
Characterization of vh
Since x ∈ R^N \ R_h implies x + h f(x, a) ∈ R^N \ R_h, we can extend v_h to R^N by setting

v_h(x) = 1   on R^N \ R_h,

getting rid of the boundary condition on the discrete reachable set.

THEOREM  v_h is the unique bounded solution of (HJh)-(BC).
Local Controllability
Assumptions on T :
• (i) T ≡ { x : g_i(x) ≤ 0, ∀i = 1, …, M }, where g_i ∈ C²(R^N) and |∇g_i(x)| > 0 for any x such that g_i(x) = 0.
• (ii) ∀x ∈ T ∃a ∈ A such that g_i(x) = 0 implies f(x, a) · ∇g_i(x) < 0.
Bounds
Let

T_δ ≡ ∂T + δB,   d(x) ≡ dist(x, ∂T ).

Lemma  Under our assumptions on f and local controllability, there exist positive constants h̄, δ, δ′ such that

h N_h(x) ≤ C d(x) + h,   ∀h < h̄, ∀x ∈ T_δ,

and

T(x) ≤ c d(x),   ∀x ∈ T_{δ′}.
Convergence
THEOREM  Let the assumptions of the previous Lemma be satisfied and let T be compact with nonempty interior. Then, for h → 0⁺,

v_h → v   locally uniformly in R^N,
h N_h → T   locally uniformly in R.
Sketch of the proof
Define the semilimits

v̲(x) = lim inf_{h→0⁺, y→x} v_h(y),   v̄(x) = lim sup_{h→0⁺, y→x} v_h(y).

1. v̄ is a viscosity subsolution of (HJ);
2. v̲ is a viscosity supersolution of (HJ).

Since v_h(x) ≤ C d(x) + h, this implies

|v̲| ≤ C d(x),   |v̄| ≤ C d(x).
Sketch of the proof
Then

v̲ = v̄ = 0   on ∂T,

and by a comparison theorem for sub- and supersolutions (Barles-Perthame, 1987)

v̲ = v̄ = v   on R^N,

and v is continuous on ∂T.
Error estimate
Let us assume Q is a compact subset of R where the following condition holds:

∃C₀ > 0 : ∀x ∈ Q there is a time-optimal control with total variation less than C₀ bringing the system to T.   (BV)

THEOREM  Let the assumptions of the previous theorem hold true and let Q be a compact subset of R where (BV) holds. Then there exist h̄, C > 0 such that

|T(x) − h N_h(x)| ≤ C h,   ∀x ∈ Q, ∀h ≤ h̄.
Sketch of the Proof
1. Our assumptions imply that T is continuous on ∂T and V is continuous in R^N.
2. h N_h(x) ≤ d(x) + h, ∀x ∈ T_δ.
3. T(x) ≤ C d(x).

This implies T(x) − h N_h(x) ≤ C h. Finally, (BV) implies h N_h(x) − T(x) ≤ C h.
First order scheme
COROLLARY  Under the same hypotheses there exist two positive constants h̄ and C such that

|v(x) − v_h(x)| ≤ C h,   ∀x ∈ Q, h ≤ h̄.   (E)

This means that the rate of convergence of the approximation scheme is 1.
Discretization in space
We build a triangulation of a rectangle Q in R², Q ⊃ T:

x_i = nodes of the grid,
L = number of nodes,
I_T ≡ { i ∈ N : x_i ∈ T },
I_out ≡ { i ∈ N : x_i + h f(x_i, a) ∉ Q, ∀a },
I_in ≡ { i ∈ N : x_i + h f(x_i, a) ∈ Q },
k ≡ max diameter of the cells (or triangles).
Fully discrete scheme
We want to solve

v(x_i) = min_{a∈A} [ β v(x_i + h f(x_i, a)) ] + 1 − β,   ∀i ∈ I_in,
v(x_i) = 0,   ∀i ∈ I_T,
v(x_i) = 1,   ∀i ∈ I_out,

(with β = e^{−h}) in the space of piecewise affine functions (P1 finite elements)

W^k ≡ { w : Q → [0, 1] : w ∈ C(Q) and ∇w = const in every S_j }.
Linear interpolation
For any i ∈ I_in, since x_i + h f(x_i, a) ∈ Q, there exists a vector of coefficients λ_{ij}(a) with

0 ≤ λ_{ij}(a) ≤ 1,   Σ_{j=1}^L λ_{ij}(a) = 1,

such that

x_i + h f(x_i, a) = Σ_{j=1}^L λ_{ij}(a) x_j.
Linear interpolation
So we can write the piecewise affine interpolation as

I[v](x_i + h f(x_i, a)) = Σ_{j=1}^L λ_{ij}(a) v(x_j).
The coefficients λij (a) are the local coordinates with respect to the triangulation. They can be computed solving a linear system.
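A sketch of this computation in 2D (function names and the triangle-by-triangle setting are our illustrative choices):

```python
import numpy as np

# Local (barycentric) coordinates of z = x_i + h f(x_i, a) with respect to the
# triangle with vertices p0, p1, p2, from a 3x3 linear system.
def barycentric(z, p0, p1, p2):
    """Solve lam0*p0 + lam1*p1 + lam2*p2 = z with lam0 + lam1 + lam2 = 1."""
    M = np.array([[p0[0], p1[0], p2[0]],
                  [p0[1], p1[1], p2[1]],
                  [1.0,   1.0,   1.0]])
    return np.linalg.solve(M, np.array([z[0], z[1], 1.0]))

def interp_P1(z, vertices, values):
    """Piecewise affine interpolation I[v](z) = sum_j lambda_j v(x_j)."""
    lam = barycentric(z, *vertices)   # lam >= 0 componentwise iff z is inside
    return lam @ values
```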
The fixed point problem
The fully discrete problem can be written as

U = S[U],   U ∈ R^L,

where the operator S : R^L → R^L is defined componentwise as

[S(U)]_i ≡ min_{a∈A} [ e^{−h} Λ_i(a) U ] + 1 − e^{−h},   ∀i ∈ I_in,
[S(U)]_i ≡ 0,   ∀i ∈ I_T,
[S(U)]_i ≡ 1,   ∀i ∈ I_out,

with Λ_i(a) ≡ (λ_{i1}(a), …, λ_{iL}(a)). Moreover, S : [0, 1]^L → [0, 1]^L has a unique fixed point.
S properties
THEOREM  S : [0, 1]^L → [0, 1]^L and

‖S(U) − S(V)‖_∞ ≤ β ‖U − V‖_∞.

Sketch of the proof  S is monotone, i.e. U ≤ V ⇒ S(U) ≤ S(V). Then, for any U ∈ [0, 1]^L,

1 − β = S_i(0) ≤ S_i(U) ≤ S_i(1) = 1,   ∀i ∈ I_in,

where 1 ≡ (1, 1, …, 1). This implies S : [0, 1]^L → [0, 1]^L.
S is a contraction
For any i ∈ I_in,

S_i(U) − S_i(V) ≤ β Λ_i(â)(U − V);

since ‖Λ_i(a)‖ ≤ 1 for all a ∈ A, this implies

‖S(U) − S(V)‖_∞ ≤ β ‖U − V‖_∞.
Monotone convergence
We choose U⁰ ∈ [0, 1]^L with

U_i⁰ = 0 ∀i ∈ I_T,   U_i⁰ = 1 elsewhere,

so that U⁰ ∈ U⁺ ≡ { U ∈ [0, 1]^L : U ≥ S(U) }.
Monotone convergence
By the monotonicity of S, the sequence U^{n+1} ≡ S(U^n) started at U⁰ is monotone decreasing, and U^n ↘ U* by the fixed point argument. Monotonicity allows one to accelerate convergence.
Back to the P-E game
Using the same change of variable v(x) = 1 − e^{−T(x)} we can set the Isaacs equation in R^N, obtaining

v(x) + min_{b∈B} max_{a∈A} [ −f(x, a, b) · ∇v(x) ] = 1   in R^N \ T,   (I)
v(x) = 0   for x ∈ ∂T.
The discretization in time and space leads to a fully discrete scheme:

w(x_i) = max_{b} min_{a} [ e^{−h} I₁[w](x_i + h f(x_i, a, b)) ] + 1 − e^{−h}   for i ∈ I_in,
w(x_i) = 1   for i ∈ I_out2,
w(x_i) = 0   for i ∈ I_T ∪ I_out1,
Fully discrete scheme for games
where

I_in = { i : x_i ∈ Q \ T },
I_T = { i : x_i ∈ T ∩ Q },
I_out1 = { i : x_i ∉ Q2 },
I_out2 = { i : x_i ∉ Q2 \ Q },
Q = Q1 ∩ Q2.
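A sketch of the resulting node update with discretized control sets (interp stands for any P1 interpolation of w, e.g. as sketched above; names are ours):

```python
import numpy as np

# One node update of the discrete Isaacs scheme:
# w(x_i) = max_b min_a [ e^{-h} I1[w](x_i + h f(x_i, a, b)) ] + 1 - e^{-h}.
def isaacs_update(xi, interp, f, A, B, h):
    beta = np.exp(-h)
    return max(
        min(beta * interp(xi + h * f(xi, a, b)) for a in A)
        for b in B
    ) + 1.0 - beta
```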
Scheme properties for games
THEOREM  The operator S maps [0, 1]^L into [0, 1]^L; moreover U ≤ V ⇒ S(U) ≤ S(V), and S is a contraction map.

Let U* be the unique fixed point; we define

w(x_i) = U_i*   ∀i,   w(x) = Σ_j λ_{ij}(a, b) w(x_j).
Convergence for games
Naturally, w depends on the discretization steps h and k.

THEOREM  Let T be the closure of an open set with Lipschitz boundary, "diam Q → +∞" and v continuous. Then

w_{h,k} → v   on compact sets of R^N,   for h → 0⁺ and k/h → 0⁺.
Convergence: discontinuous value
Let w_n^ε be the sequence generated by the numerical scheme with target T_ε = { x : d(x, T ) ≤ ε }.

THEOREM  For all x there exists the limit

w(x) = lim_{ε→0⁺, n→+∞, n≥n(ε)} w_n^ε(x),

and it coincides with the lower value V of the game with target T, i.e. w = V. The convergence is uniform on every compact set where V is continuous.
Error estimates
Assume for simplicity L_f ≤ 1 and v Lipschitz continuous. Then

‖w_{h,k} − v‖_∞ ≤ C h^{1/2} (1 + k/h)².
Synthesis of Feedback Controls
The algorithm also computes an approximate optimal control at each point of the grid. However, from w we can also compute an approximate optimal feedback at each point of Q, i.e. we can define the feedback map

F : Q → A,   x ↦ F(x) = a_x^k,

where a_x^k is the argmin of

φ_k(x, a) ≡ e^{−h} w(x + h f(x, a)) + 1 − e^{−h}.

Note that φ_k(x, ·) has a minimum over A (compact), but the minimum point may not be unique.
Feedback selection
We want to construct a selection, e.g. take a strictly convex ψ. We define

A_x^k = { a* ∈ A : φ_k(x, a*) = min_{a∈A} φ_k(x, a) }.

The feedback selection is done by solving

arg min_{a ∈ A_x^k} ψ(a).
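A minimal sketch of this selection with a discretized control set, taking ψ(a) = |a|² as the strictly convex tie-breaker (all names are ours):

```python
import numpy as np

# Feedback at x: minimize phi_k(x, a) over a discretized A, then break ties
# among near-minimizers (the set A_x^k) with psi(a) = |a|^2.
def feedback(x, w_interp, f, A_disc, h, tol=1e-12):
    beta = np.exp(-h)
    phi = np.array([beta * w_interp(x + h * f(x, a)) + 1.0 - beta
                    for a in A_disc])
    ties = np.flatnonzero(phi <= phi.min() + tol)
    best = min(ties, key=lambda i: np.dot(A_disc[i], A_disc[i]))
    return A_disc[best]
```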
Discrete optimal trajectories
To compute the discrete optimal trajectories, we define the piecewise constant (in time) control

a^k(s) = a^k_{y_{n,h}},   s ∈ [nh, (n+1)h[,

where y_{n,h} is the state of the Euler scheme with step h. This can be done also with high-order methods for ODEs and a different time step. Error estimates of the approximation of feedbacks and optimal trajectories in L¹ are available for control problems (F. 2001).
Feedback controls for games
The algorithm computes an approximate optimal control couple (a*, b*) at each point of the grid. From w we can also compute an approximate optimal feedback at every point of Q:

(a*, b*) ≡ arg minmax { e^{−h} w(x + h f(x, a, b)) + 1 − e^{−h} }.

In case of multiple solutions we can select a unique couple, e.g. minimizing two convex functionals. We can also introduce an inertia criterion to stabilize the trajectories, i.e. if at step n + 1 the set of optimal couples contains (a*_n, b*_n), we keep it.
The Tag-Chase Game
Dynamics:

f_P(y, a, b) = v_P a,   v_P = 2,
f_E(y, a, b) = v_E b,   v_E = 1.

Admissible control sets: A = B = B(0, 1).

Relative coordinates:

x̃ = (x_E − x_P) cos θ − (y_E − y_P) sin θ,
ỹ = (x_E − x_P) sin θ + (y_E − y_P) cos θ.
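The change to relative coordinates is a plain 2D rotation of the displacement E − P by the pursuer's heading θ; a minimal sketch (function name ours):

```python
import numpy as np

# Rotate the displacement E - P through the angle theta (standard 2D rotation).
def relative_coordinates(P, E, theta):
    dx, dy = E[0] - P[0], E[1] - P[1]
    c, s = np.cos(theta), np.sin(theta)
    return np.array([dx * c - dy * s, dx * s + dy * c])
```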
The Tag-Chase Game
[Figure: relative coordinate system for the Tag-Chase game: positions (x_P, y_P) and (x_E, y_E), velocities v_P and v_E, and the rotated axes (x̃, ỹ).]
Optimal trajectories
[Figure: computed optimal trajectories, Test 3: P = (0.3, 0.3), E = (0.6, −0.3); axes x1, x2 ∈ [−1, 1].]
The Sector Game
Dynamics:

f_P(y, a, b) = v_P a,   v_P = 2,
f_E(y, a, b) = v_E b,   v_E = 1.

Admissible control sets:

A = B(0, 1),   B = B(0, 1) \ S,   S = { (ρ cos θ, ρ sin θ) : θ ∈ (θ₁, θ₂), |ρ| ≤ 1 }.
Value Function
[Figure: value function v(x1, x2) for Test 4 on [−1, 1]².]
Optimal Trajectories
[Figure: optimal trajectories, Test 4: P = (−0.5, 0.8), E = (−0.5, 0.0); axes x1, x2.]
The Homicidal chauffeur
Dynamics:

ẋ_P = v_P sin θ,
ẏ_P = v_P cos θ,
ẋ_E = v_E sin b,
ẏ_E = v_E cos b,
θ̇ = (v_P / R) a.
The Homicidal chauffeur
[Figure: the Homicidal Chauffeur geometry: pursuer P with speed v_P and turning radius R, evader E with speed v_E and control b.]
Value Function
[Figure: value function v(x1, x2) for Test 5 on [−1, 1]².]
Optimal Trajectories
[Figure: optimal trajectories, Test 5: P = (−0.1, −0.3), E = (0.1, 0.3); axes x1, x2 ∈ [−1, 1].]
Optimal Trajectories
[Figure: optimal trajectories, Test 5: P = (0.0, 0.2), E = (0.0, −0.2); axes x1, x2 ∈ [−1, 1].]
Optimal Trajectories (Merz Thesis)
Optimal Trajectories (computed)
[Figure: computed optimal trajectories for Test 5, to be compared with the solution in Merz's thesis; axes x1, x2.]
Other classical control problems
The finite horizon problem
Dynamics
ẏ(t) = f(y(t), a(t)),   t > 0,   y(t₀) = x,   (D)

where y ∈ R^N, α(·) ∈ A ≡ { α : [0, +∞[ → A, measurable }, A ⊂ R^M compact.

Cost:

J_x(α) ≡ ∫_{t₀}^{t_f} l(y(s), α(s)) e^{−λs} ds + ψ(y(t_f)),   λ > 0.
Value function
v(x, t₀) ≡ inf_{α∈A} J_{(x,t₀)}(α).

HJB equation:

−v_t(x, t₀) + λ v(x, t₀) + max_{a∈A} { −f(x, a) · ∇_x v(x, t₀) − l(x, a) } = 0,
v(x, t_f) = ψ(x).
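A minimal backward semi-Lagrangian sweep for this problem (taking λ = 0, f(x, a) = a, l(x, a) = x², ψ = 0 purely for illustration):

```python
import numpy as np

# Backward marching: v^n(x) = min_a [ h*l(x,a) + v^{n+1}(x + h f(x,a)) ],
# starting from the terminal datum v(., t_f) = psi = 0.
x = np.linspace(-2.0, 2.0, 401)
A = np.linspace(-1.0, 1.0, 21)
h, n_steps = 0.05, 40                    # horizon t_f - t_0 = n_steps * h
v = np.zeros_like(x)
for _ in range(n_steps):
    v = np.min([h * x**2 + np.interp(x + h * a, x, v) for a in A], axis=0)
```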
Other classical control problems
The infinite horizon problem
Dynamics:

ẏ(t) = f(y(t), a(t)),   t > 0,   y(0) = x,   (D)

where y ∈ R^N, α(·) ∈ A ≡ { α : [0, +∞[ → A, measurable }, A ⊂ R^M compact.

Cost:

J_x(α) ≡ ∫_0^∞ l(y(s), α(s)) e^{−λs} ds,   λ > 0.
Value function
v(x) ≡ inf_{α∈A} J_x(α).

HJB equation:

λ v(x) + max_{a∈A} { −f(x, a) · ∇v(x) − l(x, a) } = 0.
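A minimal value-iteration sketch for the discounted problem, using the classical semi-Lagrangian discretization v(x) = min_a [(1 − λh) v(x + h f(x, a)) + h l(x, a)] (the 1D data are our illustrative choices):

```python
import numpy as np

# Fixed-point iteration; the update is a contraction with factor (1 - lam*h).
lam, h = 1.0, 0.05
x = np.linspace(-2.0, 2.0, 401)
A = np.linspace(-1.0, 1.0, 21)
v = np.zeros_like(x)
for _ in range(500):
    v = np.min([(1.0 - lam * h) * np.interp(x + h * a, x, v) + h * x**2
                for a in A], axis=0)
```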
Other classical control problems
The infinite horizon problem with state constraints
Dynamics
ẏ(t) = f(y(t), a(t)),   t > 0,   y(0) = x,   (D)

where y ∈ R^N. Now we require that y_x(t) ∈ Ω ⊂ R^N for all t.

Admissible controls:

α(·) ∈ A_x ≡ { α : [0, +∞[ → A measurable, such that y_x(t) ∈ Ω for all t },   A ⊂ R^M compact.

Cost:

J_x(α) ≡ ∫_0^∞ l(y(s), α(s)) e^{−λs} ds,   λ > 0.
Value function
v(x) ≡ inf_{α∈A_x} J_x(α).

Under a compatibility condition of the type

∀x ∈ ∂Ω ∃ â_x ∈ A : f(x, â_x) · η(x) < 0,

HJB equation (Soner):

λ v(x) + max_{a∈A} { −f(x, a) · ∇v(x) − l(x, a) } ≥ 0   in Ω̄,
λ v(x) + max_{a∈A} { −f(x, a) · ∇v(x) − l(x, a) } ≤ 0   in Ω.

This means that the value function is a supersolution up to the boundary and a subsolution inside Ω.
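One natural discrete counterpart restricts, at each node, the minimization to controls whose Euler step stays inside Ω; a sketch under that assumption (all names ours):

```python
import numpy as np

# Constrained semi-Lagrangian node update: only controls keeping the step in
# Omega are admissible; returns +inf if no admissible control exists at x.
def constrained_update(x, v_interp, f, A_disc, h, lam, l, in_Omega):
    vals = [(1.0 - lam * h) * v_interp(x + h * f(x, a)) + h * l(x, a)
            for a in A_disc if in_Omega(x + h * f(x, a))]
    return min(vals) if vals else np.inf
```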
The HJPACK Library

HJPACK is a public domain library for Hamilton-Jacobi equations. It includes:

• finite difference schemes and semi-Lagrangian schemes in R and R²
• applications to control problems and front propagation
• fast-marching schemes
• a graphical interface and a user's guide

You can get it at www.caspur.it/hjpack.
Basic references
GENERAL THEORY

G. Barles, Solutions de viscosité des équations de Hamilton-Jacobi, Springer-Verlag, 1998.

A very readable introduction to HJ equations is also contained in the book

L.C. Evans, Partial Differential Equations, American Mathematical Society, 1999.
Basic References
DETERMINISTIC CONTROL PROBLEMS AND GAMES
M. Bardi, I. Capuzzo Dolcetta, Optimal Control and Viscosity Solutions of Hamilton-Jacobi-Bellman Equations, Birkhäuser, 1997.
A.I. Subbotin, Generalized Solutions of First-Order PDEs, Birkhäuser, Boston, 1995.
M. Falcone, R. Ferretti, Semi-Lagrangian Approximation Schemes for Linear and Hamilton-Jacobi Equations, SIAM book, in preparation.
STOCHASTIC CONTROL PROBLEMS
W.H. Fleming, H.M. Soner, Controlled Markov Processes and Viscosity Solutions, Springer-Verlag, 1998.
F. Silva, An introduction to stochastic control, notes of the PhD course "Optimal Control", Rome, March 2012.