Numerical schemes for Hamilton-Jacobi equations, control problems and games M. Falcone (SAPIENZA, Roma) & H. Zidani (ENSTA, Paris)
SADCO Spring School "Applied and Numerical Optimal Control", April 23-27, 2012, Paris – Lecture 1/3
OUTLINE OF THE COURSE:
• Lecture 1: Approximation of optimal control problems via DP
• Lecture 2: Efficient methods and perspectives
• Lecture 3: Approximation of reachable sets and state constrained problems
OUTLINE OF THIS LECTURE
• Control problems via Dynamic Programming
• The minimum time problem
• Pursuit-evasion games
• Numerical experiments
• Other classical problems
• Some references
The model problem
Let us consider the nonlinear system dynamics

ẏ(t) = f(y(t), a(t), b(t)),   t > 0,
y(0) = x,   (D)

where

y(t) ∈ R^N is the state,
a(·) ∈ A is the control of player 1 (player a),
A = admissible control functions of player 1 = { a : [0, +∞[ → A, measurable }
(e.g. A = piecewise constant functions with values in A),
The model problem
b(·) ∈ B is the control of player 2 (player b),
B = { b : [0, +∞[ → B, measurable },
A, B ⊂ R^M are given compact sets.

Assume f is continuous and

|f(x, a, b) − f(y, a, b)| ≤ L |x − y|   ∀x, y ∈ R^N, a ∈ A, b ∈ B.
Then, for every given pair of controls a(·) ∈ A, b(·) ∈ B, there is a unique trajectory of (D), denoted by y_x(t; a, b) (Carathéodory).
Payoff
The payoff of the game is

t_x(a(·), b(·)) = min{ t : y_x(t; a, b) ∈ T } ≤ +∞,

where T ⊆ R^N is a given closed target.
Goal of the game Player a wants to minimize the payoff, he is called the pursuer, whereas Player b wants to maximize the payoff, he is called the evader.
Example: Minimum time problem
This is a simple example with just one player:

ẏ = a,   A = { a ∈ R^N : |a| = 1 },   y(0) = x.
Then, t_x(a*) is equal to the length of the optimal trajectory joining x and the point y_x(t_x(a*)), thus

t_x(a*) = min_{a∈A} t_x(a) = dist(x, T ),
and any optimal trajectory is a straight line!
Example: Pursuit-Evasion Games
We have two players, each one controlling his own dynamics:

ẏ₁ = f₁(y₁, a),   ẏ₂ = f₂(y₂, b),   y_i ∈ R^{N/2}, i = 1, 2.   (PEG)

The target is

T_δ ≡ { |y₁ − y₂| ≤ δ },  δ > 0,   or   T₀ ≡ { (y₁, y₂) : y₁ = y₂ }.
Then, t_x(a(·), b(·)) is the capture time corresponding to the strategies a(·) and b(·).
Dynamic Programming for 1-Player
In this section we assume B = { b̄ }, so the system can be rewritten as

ẏ = f(y, a),   t > 0,   y(0) = x.

Define the value function

T(x) ≡ inf_{a(·)∈A} t_x(a).
T(·) is the minimum-time function; it is the best possible outcome of the game for player a, as a function of the initial position x of the system.
Reachable set
DEFINITION

R ≡ { x ∈ R^N : T(x) < +∞ },

i.e. the set of starting points from which it is possible to reach the target.

WARNING  The reachable set depends on the target and on the dynamics in a rather complicated way. It is NOT given in our problem.
Dynamic Programming for 1-Player
LEMMA  For all x ∈ R \ T and 0 ≤ t < T(x),

T(x) = inf_{a(·)∈A} { t + T(y_x(t; a)) }.   (DPP)
Sketch of the Proof  The inequality "≤" follows from the intuitive fact that, for every a(·), T(x) ≤ t + T(y_x(t; a)).
Sketch of the proof
The proof of the opposite inequality "≥" is based on the fact that equality holds if a(·) is optimal for x. For any ε > 0 we can find a minimizing control a_ε such that

T(x) + ε ≥ t + T(y_x(t; a_ε));

then split the trajectory and pass to the limit as ε → 0.
Sketch of the proof
To prove rigorously the above inequalities the following two properties of A are crucial:
1. a(·) ∈ A ⇒ ∀s ∈ R the function t ↦ a(t + s) is in A;
2. (Concatenation) a₁, a₂ ∈ A ⇒ a(·) ∈ A for every s > 0, where

a(t) ≡ a₁(t) if t ≤ s,   a(t) ≡ a₂(t) if t > s.
WARNING: concatenation is a crucial property
Note that the DP Principle works for A = { piecewise constant functions into A } but not for A = { continuous functions into A }, because joining together two continuous controls we are not guaranteed that the resulting control is continuous.
Getting the Bellman equation
Let us derive the Hamilton-Jacobi-Bellman equation from the DP Principle. Rewrite (DPP) as

T(x) − inf_{a(·)} T(y_x(t; a)) = t

and divide by t > 0:

sup_{a(·)} { [T(x) − T(y_x(t; a))] / t } = 1,   for t < T(x).

We want to pass to the limit as t → 0⁺.
Bellman equation
Assume T is differentiable at x and that the limit as t → 0⁺ commutes with sup_{a(·)}. Then, if ẏ_x(0; a) exists,

sup_{a(·)∈A} { −∇T(x) · ẏ_x(0; a) } = 1.

Then, setting a₀ = lim_{t→0⁺} a(t), we get

sup_{a₀∈A} { −∇T(x) · f(x, a₀) } = 1.   (HJB)

This is the Hamilton-Jacobi-Bellman partial differential equation, a first-order nonlinear PDE.
Bellman equation
Let us define the Hamiltonian

H₁(x, p) := max_{a∈A} { −p · f(x, a) } − 1;

then we can rewrite (HJB) in short as

H₁(x, ∇T(x)) = 0   in R \ T.

A natural boundary condition on ∂T is

T(x) = 0   for x ∈ ∂T.
T verifies the HJB equation
PROPOSITION  If T(·) is C¹ in a neighborhood of x ∈ R \ T, then T(·) satisfies (HJB) at x.

Sketch of the proof  We first prove the inequality "≤". Fix ā(t) ≡ a₀ for all t, and set y_x(t) = y_x(t; ā). (DPP) gives

T(x) − T(y_x(t)) ≤ t,   ∀ 0 ≤ t < T(x).

We divide by t > 0 and let t → 0⁺ to get −∇T(x) · ẏ_x(0) ≤ 1, where ẏ_x(0) = f(x, a₀) (since ā(t) ≡ a₀).
T verifies the HJB equation
Then

−∇T(x) · f(x, a₀) ≤ 1   ∀a₀ ∈ A,

and we get

max_{a∈A} { −∇T(x) · f(x, a) } ≤ 1.
Next we prove the inequality “≥”.
T verifies the HJB equation
Fix ε > 0. For all t ∈ ]0, T(x)[, by (DPP) there exists α ∈ A such that

T(x) ≥ t + T(y_x(t; α)) − εt.

Then

1 − ε ≤ [T(x) − T(y_x(t; α))] / t
      = −(1/t) ∫_0^t (∂/∂s) T(y_x(s; α)) ds
      = −(1/t) ∫_0^t ∇T(y_x(s)) · ẏ_x(s) ds
      = −(1/t) ∫_0^t ∇T(x) · f(x, α(s)) ds + o(1),

and as t → 0⁺ we get

1 − ε ≤ −∇T(x) · f(x, a₀)   for some a₀ ∈ A.
T verifies the HJB equation
Letting ε → 0⁺ we get

sup_{a∈A} { −∇T(x) · f(x, a) } ≥ 1.

We have proved that if T is regular then it satisfies the Bellman equation pointwise in the reachable set R.
Is T regular?
The answer is NO, even in simple cases. Let us go back to Example 1, where T(x) = dist(x, T ). Note that T is not differentiable at x if there exist two distinct points of minimal distance.

EXAMPLE  Let us take N = 1, f(x, a) = a, A = B(0, 1) and choose

T = ]−∞, −1] ∪ [1, +∞[.

Then T(x) = 1 − |x|, which is not differentiable at x = 0.
a.e. solutions
Note that in this example the Bellman equation is the eikonal equation

|Du(x)| = 1,   (1)

which has infinitely many a.e. solutions even when we fix the values on the boundary ∂T: u(−1) = u(1) = 0.
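To make the non-uniqueness concrete, here is a small sketch (our own construction, not part of the lecture) of a whole family of sawtooth functions solving |u'| = 1 a.e. with u(±1) = 0; n = 1 gives the distance function 1 − |x|:

```python
import numpy as np

# For each n >= 1, u_n is piecewise linear with slope +-1 and n "teeth",
# u_n(-1) = u_n(1) = 0 and |u_n'(x)| = 1 a.e. on (-1, 1).
def sawtooth(x, n):
    half = 1.0 / n                                    # half-period of one tooth
    return half - np.abs((x + 1.0) % (2.0 * half) - half)
```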
Continuity of T
Also the continuity of T is, in general, not guaranteed.
Take the previous example and set A = [−1, 0]; then we have

T(1) = 0,   lim_{x→1⁻} T(x) = 2.   (2)
However, the continuity of T ( · ) is equivalent to the property of Small-Time Local Controllability (STLC) around T .
STLC property
DEFINITION  Assume ∂T smooth. For every x ∈ ∂T there exists â ∈ A such that

f(x, â) · η(x) < 0,   (STLC)

where η(x) is the exterior normal to T at x.
The STLC guarantees that R is an open subset of R^N and that

lim_{x→x₀} T(x) = +∞,   ∀x₀ ∈ ∂R.
Weak solutions
We want to interpret the HJB equation in a "weak sense" so that T(·) is a "solution" (non-classical), unique under suitable boundary conditions. Let's go back to the proof of our proposition. We proved that

1. T(x) − T(y_x(t)) ≤ t, ∀t small and T ∈ C¹ ⇒ H(x, ∇T(x)) ≤ 0;
2. T(x) − T(y_x(t)) ≥ t(1 − ε), ∀t, ε small and T ∈ C¹ ⇒ H(x, ∇T(x)) ≥ 0.
Weak solutions
Main Idea:  If φ ∈ C¹ and T − φ has a maximum at x, then

T(x) − φ(x) ≥ T(y_x(t)) − φ(y_x(t))   ∀t,

thus

φ(x) − φ(y_x(t)) ≤ T(x) − T(y_x(t)) ≤ t,

so we can replace T by φ in the proof of the proposition and get

H(x, ∇φ(x)) ≤ 0.
Weak solutions
Similarly, if φ ∈ C¹ and T − φ has a minimum at x, then

T(x) − φ(x) ≤ T(y_x(t)) − φ(y_x(t))   ∀t,

thus

φ(x) − φ(y_x(t)) ≥ T(x) − T(y_x(t)) ≥ t(1 − ε),

and by the proof of the proposition

H(x, ∇φ(x)) ≥ 0.

Thus, the classical proof can be fixed also for T ∉ C¹(R), just replacing T with a "test function" φ ∈ C¹(R).
Viscosity solutions
DEFINITION (Crandall-Evans-Lions, 1985)
Let F : R^N × R × R^N → R be continuous, Ω ⊆ R^N open. We say that u ∈ C(Ω) is a viscosity subsolution of

F(x, u, ∇u) = 0   in Ω

if, for every φ ∈ C¹ and every local maximum point x of u − φ,

F(x, u(x), ∇φ(x)) ≤ 0.

It is a viscosity supersolution if, for every φ ∈ C¹ and every local minimum point x of u − φ,

F(x, u(x), ∇φ(x)) ≥ 0.

A viscosity solution is both a sub- and a supersolution.
Viscosity solutions
THEOREM If R \ T is open and T ( · ) is continuous, then T ( · ) is a viscosity solution of the Hamilton-Jacobi-Bellman equation (HJB). The proof is the argument before the definition!
Uniqueness
Next we want to prove the uniqueness of the solution for the Dirichlet boundary value problem

u + H(x, ∇u) = 0   in Ω,
u = g   on ∂Ω,   (BVP)

under assumptions satisfied by the Hamiltonian H₁. T(·) can be recovered from the solution of a boundary value problem as follows.
Uniqueness
We rescale T by the Kružkov change of variable

V(x) := 1 − e^{−T(x)}   if T(x) < +∞ (i.e. x ∈ R),
V(x) := 1   if T(x) = +∞ (x ∉ R),

so that

V(x) = inf_{a(·)∈A} J(x, a),   where J(x, a) := ∫_0^{t_x(a)} e^{−t} dt.
Uniqueness
Note that, by definition,

∇V(x) = e^{−T(x)} ∇T(x)   and   1 − V(x) = e^{−T(x)},

which implies

∇T(x) = ∇V(x) / (1 − V(x)).

Then, substituting into the equation for the minimum time, we get the new equation for V:

V(x) + max_{a∈A} [ −f(x, a) · ∇V(x) ] = 1.
Solving the free boundary problem
From V we can reconstruct T and R by

T(x) = −log(1 − V(x)),   R = { x : V(x) < 1 }.
This is quite important to solve the free boundary problem as well as for the numerical approximation. In fact, V takes values in [0,1].
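As a quick illustration, here is a minimal sketch (our own helper, not part of the lecture) of this reconstruction from grid values of V:

```python
import numpy as np

# Recover T = -log(1 - V) where V < 1 and the reachable-set mask R = {V < 1};
# V is a float array of Kruzkov values in [0, 1].
def reconstruct(V, tol=1e-12):
    reachable = V < 1.0 - tol
    T = np.full_like(V, np.inf)                 # T = +inf outside R
    T[reachable] = -np.log(1.0 - V[reachable])
    return T, reachable
```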
Uniqueness
Moreover, by the DP Principle, V satisfies

V + max_{a∈A} { −∇V · f(x, a) − 1 } = 0   in R^N \ T,   (BVP-B)
V = 0   on ∂T,

which is a special case of (BVP), with

H(x, u, p) = H₁(x, u, p) ≡ u + max_{a∈A} { −f(x, a) · p − 1 },   Ω = T^c ≡ R^N \ T.
Uniqueness
LEMMA  The Minimum Time Hamiltonian H₁ satisfies the "structure condition"

|H(x, p) − H(y, q)| ≤ K(1 + |x|) |p − q| + |q| L |x − y|   (SH)

for any x, y, p, q.

THEOREM (Crandall-Lions, 1984)  Assume H satisfies (SH), u, w ∈ BUC(Ω̄), u subsolution and w supersolution of v + H(x, ∇v) = 0 in Ω (open), u ≤ w on ∂Ω. Then u ≤ w in Ω.
Dynamic Programming for 2-Players
What is the value function for the 2-players game?
WARNING  It is not

inf_{a∈A} sup_{b∈B} J(x, a, b),

because a would choose his control function with the information of the whole future response of player b to any control function a(·).
Nonanticipating Strategies
A more reasonable information pattern can be modeled by means of the notion of nonanticipating strategies (Varaiya, Roxin, Elliott-Kalton):

for the Pursuer
Δ ≡ { α : B → A | b(t) = b̃(t) ∀t ≤ t′ ⇒ α[b](t) = α[b̃](t) ∀t ≤ t′ },

for the Evader
Γ ≡ { β : A → B | a(t) = ã(t) ∀t ≤ t′ ⇒ β[a](t) = β[ã](t) ∀t ≤ t′ }.
Lower Value of a game
Now we can define the lower value of the game

T(x) ≡ inf_{α∈Δ} sup_{b∈B} t_x(α[b], b),   or   V(x) ≡ inf_{α∈Δ} sup_{b∈B} J(x, α[b], b),

where the payoff is

J(x, a, b) = ∫_0^{t_x(a,b)} e^{−t} dt.
Value of a game
Similarly, the upper value of the game is

T̃(x) := sup_{β∈Γ} inf_{a∈A} t_x(a, β[a]),   or   Ṽ(x) := sup_{β∈Γ} inf_{a∈A} J(x, a, β[a]).

DEFINITION  We say that the game has a value if the upper and lower values coincide, i.e. if T = T̃ or V = Ṽ.
DP Principle for 2 Players
Lemma  For all 0 ≤ t < T(x),

T(x) = inf_{α∈Δ} sup_{b∈B} { t + T(y_x(t; α[b], b)) },   ∀x ∈ R \ T,

and

V(x) = inf_{α∈Δ} sup_{b∈B} { ∫_0^t e^{−s} ds + e^{−t} V(y_x(t; α[b], b)) },   ∀x ∈ T^c.

The proof is similar to the 1-player case but more technical, due to the essential use of nonanticipating strategies.
Isaacs equation
Isaacs' Lower Hamiltonian:

H(x, p) := min_{b∈B} max_{a∈A} { −p · f(x, a, b) } − 1.

The upper values T̃ and Ṽ satisfy a similar DP Principle.

Isaacs' Upper Hamiltonian:

H̃(x, p) := max_{a∈A} min_{b∈B} { −p · f(x, a, b) } − 1.
Isaacs equation
THEOREM (Evans-Souganidis, 1984)

1. If R \ T is open and T(·) is continuous, then T(·) is a viscosity solution of

H(x, ∇T) = 0   in R \ T.   (HJI-L)

2. If V(·) is continuous, then it is a viscosity solution of

V + H(x, ∇V) = 0   in T^c.
Numerical approximation
We will describe a method to construct approximation schemes for the Isaacs equation which keeps the main information of the game/control problem.
This approach leads to the numerical approximation of a first order PDE by a discretization of the original control problem and by a discrete DP principle.
Naturally, one can also choose to construct directly an approximation scheme for the Isaacs equation based on the discretization of the PDE, e.g. using a Finite Difference (FD) or Finite Volume (FV) scheme.
Features of the DP scheme
• The schemes have a natural interpretation which comes from the Discrete Dynamic Programming Principle
• Approximate feedback controls can be obtained without extra computations on the nodes. Once the value function is computed, we easily obtain approximate optimal trajectories.
• Natural extensions to high-order methods.
Discretization: 1-Player
By applying the Kružkov change of variable v(x) = 1 − e^{−T(x)} we rewrite the equation in the new variable:

v(x) + sup_{a∈A} [ −f(x, a) · ∇v(x) ] = 1,   (HJ)
v(x) = 0   on T,
v(x) = 1   on ∂R.

As seen, we can drop the second boundary condition.
Time discretization
Time step h = Δt > 0. We look at the system at the discrete times t_j = jh, j ∈ N.

Discrete dynamical system:

x_{j+1} = x_j + h f(x_j, a_j),   x₀ = x.

We define the time-discrete reachable set

R_h ≡ { x ∈ R^N : ∃{a_j} and j ∈ N such that x_j ∈ T }.
Discrete Minimum Time Function
Let us define, for every control sequence {a_j}, the number of steps to hit the target

n_h({a_j}, x) = +∞   if x ∉ R_h,
n_h({a_j}, x) = min{ j ∈ N : x_j ∈ T }   if x ∈ R_h,

and the minimal number of steps to hit the target

N_h(x) = min_{{a_j}} n_h({a_j}, x).

The discrete analogue of the minimum time function is h N_h(x).
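In code, n_h is just the first index at which the Euler trajectory enters the target; a minimal sketch (helper names are ours):

```python
# Number of steps along x_{j+1} = x_j + h f(x_j, a_j) until the target is hit,
# for a given finite control sequence; +inf if the target is never reached.
def n_h(x, controls, f, h, in_target):
    y = x
    for j, a in enumerate(controls):
        if in_target(y):
            return j                    # first index j with x_j in T
        y = y + h * f(y, a)
    return float("inf")
```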
The discrete Bellman equation
We change the variable introducing

v_h(x) = 1 − e^{−h N_h(x)}.

Note that 0 ≤ v_h ≤ 1. By the Discrete Dynamic Programming Principle we get

v_h(x) = S[v_h](x)   on R_h \ T,   (HJh)

where

S[v_h](x) ≡ min_{a∈A} [ e^{−h} v_h(x + h f(x, a)) ] + 1 − e^{−h},

with the boundary condition

v_h(x) = 0   on T.   (BC)
Characterization of vh
Since x ∈ R^N \ R_h implies x + h f(x, a) ∈ R^N \ R_h, we can extend v_h to R^N by setting

v_h(x) = 1   on R^N \ R_h,

getting rid of the boundary condition on the discrete reachable set.

THEOREM  v_h is the unique bounded solution of (HJh)-(BC).
Local Controllability
Assumptions on T :
• (i) T ≡ { x : g_i(x) ≤ 0, ∀i = 1, …, M }, where g_i ∈ C²(R^N) and |∇g_i(x)| > 0 for any x such that g_i(x) = 0.
• (ii) ∀x ∈ T ∃a ∈ A such that g_i(x) = 0 implies f(x, a) · ∇g_i(x) < 0.
Bounds
Let

T_δ ≡ ∂T + δB,   d(x) ≡ dist(x, ∂T ).

Lemma  Under our assumptions on f and local controllability, there exist positive constants h̄, δ, δ′ such that

h N_h(x) ≤ C d(x) + h,   ∀h < h̄, ∀x ∈ T_δ,

and

T(x) ≤ c d(x),   ∀x ∈ T_{δ′}.
Convergence
THEOREM  Let the assumptions of the previous Lemma be satisfied and let T be compact with nonempty interior. Then, for h → 0⁺,

v_h → v   locally uniformly in R^N,
h N_h → T   locally uniformly in R.
Sketch of the proof
Define the semilimits

v̲(x) = lim inf_{h→0⁺, y→x} v_h(y),   v̄(x) = lim sup_{h→0⁺, y→x} v_h(y).

1. v̄ is a viscosity subsolution of (HJ);
2. v̲ is a viscosity supersolution of (HJ).

Since v_h(x) ≤ C d(x) + h, this implies

|v̲| ≤ C d(x),   |v̄| ≤ C d(x).
Sketch of the proof
Then

v̲ = v̄ = 0   on ∂T,

and by a comparison theorem for sub- and supersolutions (Barles-Perthame, 1987)

v̲ = v̄ = v   on R^N,

and v is continuous on ∂T.
Error estimate
Let us assume Q is a compact subset of R where the following condition holds:

∃C₀ > 0 : ∀x ∈ Q there is a time-optimal control with total variation less than C₀ bringing the system to T.   (BV)

THEOREM  Let the assumptions of the previous theorem hold true and let Q be a compact subset of R where (BV) holds. Then there exist h̄, C > 0 such that

|T(x) − h N_h(x)| ≤ C h,   ∀x ∈ Q, ∀h ≤ h̄.
Sketch of the Proof
1. Our assumptions imply that T is continuous on ∂T and V is continuous in R^N.
2. h N_h(x) ≤ d(x) + h, ∀x ∈ T_δ.
3. T(x) ≤ C d(x).

This implies T(x) − h N_h(x) ≤ C h. Finally, (BV) implies h N_h(x) − T(x) ≤ C h.
First order scheme
COROLLARY  Under the same hypotheses there exist two positive constants h̄ and C such that

|v(x) − v_h(x)| ≤ C h,   ∀x ∈ Q, h ≤ h̄.   (E)

This means that the rate of convergence of the approximation scheme is 1.
Discretization in space
We build a triangulation of a rectangle Q in R², Q ⊃ T:

x_i = nodes of the grid,
L = number of nodes,
I_T ≡ { i ∈ N : x_i ∈ T },
I_out ≡ { i ∈ N : x_i + h f(x_i, a) ∉ Q, ∀a },
I_in ≡ { i ∈ N : x_i + h f(x_i, a) ∈ Q },
k ≡ max diameter of the cells (or triangles).
Fully discrete scheme
We want to solve

v(x_i) = min_{a∈A} [ β v(x_i + h f(x_i, a)) ] + 1 − β,   ∀i ∈ I_in,
v(x_i) = 0,   ∀i ∈ I_T,
v(x_i) = 1,   ∀i ∈ I_out,

(with β = e^{−h}) in the space of piecewise affine functions (P1 finite elements)

W^k ≡ { w : Q → [0, 1] : w ∈ C(Q) and ∇w = const in every S_j }.
Linear interpolation
For any i ∈ I_in, since x_i + h f(x_i, a) ∈ Q, there exists a vector of coefficients λ_{ij}(a) with

0 ≤ λ_{ij}(a) ≤ 1,   Σ_{j=1}^L λ_{ij}(a) = 1,

such that

x_i + h f(x_i, a) = Σ_{j=1}^L λ_{ij}(a) x_j.
Linear interpolation
So we can write the piecewise affine interpolation as

I[v](x_i + h f(x_i, a)) = Σ_{j=1}^L λ_{ij}(a) v(x_j).
The coefficients λij (a) are the local coordinates with respect to the triangulation. They can be computed solving a linear system.
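A sketch of this computation in 2D (function names and the triangle-by-triangle setting are our illustrative choices):

```python
import numpy as np

# Local (barycentric) coordinates of z = x_i + h f(x_i, a) with respect to the
# triangle with vertices p0, p1, p2, from a 3x3 linear system.
def barycentric(z, p0, p1, p2):
    """Solve lam0*p0 + lam1*p1 + lam2*p2 = z with lam0 + lam1 + lam2 = 1."""
    M = np.array([[p0[0], p1[0], p2[0]],
                  [p0[1], p1[1], p2[1]],
                  [1.0,   1.0,   1.0]])
    return np.linalg.solve(M, np.array([z[0], z[1], 1.0]))

def interp_P1(z, vertices, values):
    """Piecewise affine interpolation I[v](z) = sum_j lambda_j v(x_j)."""
    lam = barycentric(z, *vertices)   # lam >= 0 componentwise iff z is inside
    return lam @ values
```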
The fixed point problem
The fully discrete problem can be written as

U = S[U],   U ∈ R^L,

where the operator S : R^L → R^L is defined componentwise as

[S(U)]_i ≡ min_{a∈A} [ e^{−h} Λ_i(a) U ] + 1 − e^{−h},   ∀i ∈ I_in,
[S(U)]_i ≡ 0,   ∀i ∈ I_T,
[S(U)]_i ≡ 1,   ∀i ∈ I_out,

with Λ_i(a) ≡ (λ_{i1}(a), …, λ_{iL}(a)). Moreover, S : [0, 1]^L → [0, 1]^L has a unique fixed point.
S properties
THEOREM  S : [0, 1]^L → [0, 1]^L and

‖S(U) − S(V)‖_∞ ≤ β ‖U − V‖_∞.

Sketch of the proof  S is monotone, i.e. U ≤ V ⇒ S(U) ≤ S(V). Then, for any U ∈ [0, 1]^L,

1 − β = S_i(0) ≤ S_i(U) ≤ S_i(1) = 1,   ∀i ∈ I_in,

where 1 ≡ (1, 1, …, 1). This implies S : [0, 1]^L → [0, 1]^L.
S is a contraction
For any i ∈ I_in,

S_i(U) − S_i(V) ≤ β Λ_i(â)(U − V);

since ‖Λ_i(a)‖ ≤ 1 for all a ∈ A, this implies

‖S(U) − S(V)‖_∞ ≤ β ‖U − V‖_∞.
Monotone convergence
We choose U⁰ ∈ [0, 1]^L with

U_i⁰ = 0 ∀i ∈ I_T,   U_i⁰ = 1 elsewhere,

so that U⁰ ∈ U⁺ ≡ { U ∈ [0, 1]^L : U ≥ S(U) }.
Monotone convergence
By the monotonicity of S, the sequence U^{n+1} ≡ S(U^n) started at U⁰ is monotone decreasing, and U^n ↘ U* by the fixed point argument. Monotonicity allows one to accelerate convergence.
Back to the P-E game
Using the same change of variable v(x) = 1 − e^{−T(x)} we can set the Isaacs equation in R^N, obtaining

v(x) + min_{b∈B} max_{a∈A} [ −f(x, a, b) · ∇v(x) ] = 1   in R^N \ T,   (I)
v(x) = 0   for x ∈ ∂T.
The discretization in time and space leads to a fully discrete scheme:

w(x_i) = max_{b} min_{a} [ e^{−h} I₁[w](x_i + h f(x_i, a, b)) ] + 1 − e^{−h}   for i ∈ I_in,
w(x_i) = 1   for i ∈ I_out2,
w(x_i) = 0   for i ∈ I_T ∪ I_out1,
Fully discrete scheme for games
where

I_in = { i : x_i ∈ Q \ T },
I_T = { i : x_i ∈ T ∩ Q },
I_out1 = { i : x_i ∉ Q2 },
I_out2 = { i : x_i ∉ Q2 \ Q },
Q = Q1 ∩ Q2.
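A sketch of the resulting node update with discretized control sets (interp stands for any P1 interpolation of w, e.g. as sketched above; names are ours):

```python
import numpy as np

# One node update of the discrete Isaacs scheme:
# w(x_i) = max_b min_a [ e^{-h} I1[w](x_i + h f(x_i, a, b)) ] + 1 - e^{-h}.
def isaacs_update(xi, interp, f, A, B, h):
    beta = np.exp(-h)
    return max(
        min(beta * interp(xi + h * f(xi, a, b)) for a in A)
        for b in B
    ) + 1.0 - beta
```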
Scheme properties for games
THEOREM  The operator S maps [0, 1]^L into [0, 1]^L; moreover U ≤ V ⇒ S(U) ≤ S(V), and S is a contraction map.

Let U* be the unique fixed point; we define

w(x_i) = U_i*   ∀i,   w(x) = Σ_j λ_{ij}(a, b) w(x_j).
Convergence for games
Naturally, w depends on the discretization steps h and k.

THEOREM  Let T be the closure of an open set with Lipschitz boundary, "diam Q → +∞" and v continuous. Then

w_{h,k} → v   on compact sets of R^N,   for h → 0⁺ and k/h → 0⁺.
Convergence: discontinuous value
Let w_n^ε be the sequence generated by the numerical scheme with target T_ε = { x : d(x, T ) ≤ ε }.

THEOREM  For all x there exists the limit

w(x) = lim_{ε→0⁺, n→+∞, n≥n(ε)} w_n^ε(x),

and it coincides with the lower value V of the game with target T, i.e. w = V. The convergence is uniform on every compact set where V is continuous.
Error estimates
Assume for simplicity L_f ≤ 1 and v Lipschitz continuous. Then

‖w_{h,k} − v‖_∞ ≤ C h^{1/2} (1 + k/h)².
Synthesis of Feedback Controls
The algorithm also computes an approximate optimal control at each point of the grid. However, from w we can also compute an approximate optimal feedback at each point of Q, i.e. we can define the feedback map

F : Q → A,   x ↦ F(x) = a_x^k,

where a_x^k is the argmin of

φ_k(x, a) ≡ e^{−h} w(x + h f(x, a)) + 1 − e^{−h}.

Note that φ_k(x, ·) has a minimum over A (compact), but the minimum point may not be unique.
Feedback selection
We want to construct a selection, e.g. take a strictly convex ψ. We define

A_x^k = { a* ∈ A : φ_k(x, a*) = min_{a∈A} φ_k(x, a) }.

The feedback selection is done by solving

arg min_{a ∈ A_x^k} ψ(a).
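A minimal sketch of this selection with a discretized control set, taking ψ(a) = |a|² as the strictly convex tie-breaker (all names are ours):

```python
import numpy as np

# Feedback at x: minimize phi_k(x, a) over a discretized A, then break ties
# among near-minimizers (the set A_x^k) with psi(a) = |a|^2.
def feedback(x, w_interp, f, A_disc, h, tol=1e-12):
    beta = np.exp(-h)
    phi = np.array([beta * w_interp(x + h * f(x, a)) + 1.0 - beta
                    for a in A_disc])
    ties = np.flatnonzero(phi <= phi.min() + tol)
    best = min(ties, key=lambda i: np.dot(A_disc[i], A_disc[i]))
    return A_disc[best]
```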
Discrete optimal trajectories
To compute the discrete optimal trajectories, we define the piecewise constant (in time) control

a^k(s) = a^k_{y_{n,h}},   s ∈ [nh, (n+1)h[,

where y_{n,h} is the state of the Euler scheme with step h. This can be done also with high-order methods for ODEs and a different time step. Error estimates of the approximation of feedbacks and optimal trajectories in L¹ are available for control problems (F. 2001).
Feedback controls for games
The algorithm computes an approximate optimal control couple (a*, b*) at each point of the grid. From w we can also compute an approximate optimal feedback at every point of Q:

(a*, b*) ≡ arg minmax { e^{−h} w(x + h f(x, a, b)) + 1 − e^{−h} }.

In case of multiple solutions we can select a unique couple, e.g. minimizing two convex functionals. We can also introduce an inertia criterion to stabilize the trajectories, i.e. if at step n + 1 the set of optimal couples contains (a*_n, b*_n), we keep it.
The Tag-Chase Game
Dynamics:

f_P(y, a, b) = v_P a,   v_P = 2,
f_E(y, a, b) = v_E b,   v_E = 1.

Admissible control sets: A = B = B(0, 1).

Relative coordinates:

x̃ = (x_E − x_P) cos θ − (y_E − y_P) sin θ,
ỹ = (x_E − x_P) sin θ + (y_E − y_P) cos θ.
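The change to relative coordinates is a plain 2D rotation of the displacement E − P by the pursuer's heading θ; a minimal sketch (function name ours):

```python
import numpy as np

# Rotate the displacement E - P through the angle theta (standard 2D rotation).
def relative_coordinates(P, E, theta):
    dx, dy = E[0] - P[0], E[1] - P[1]
    c, s = np.cos(theta), np.sin(theta)
    return np.array([dx * c - dy * s, dx * s + dy * c])
```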
The Tag-Chase Game
[Figure: relative coordinate system for the Tag-Chase game: positions (x_P, y_P) and (x_E, y_E), velocities v_P and v_E, and the rotated axes (x̃, ỹ).]
Optimal trajectories
[Figure: computed optimal trajectories, Test 3: P = (0.3, 0.3), E = (0.6, −0.3); axes x1, x2 ∈ [−1, 1].]
The Sector Game
Dynamics:

f_P(y, a, b) = v_P a,   v_P = 2,
f_E(y, a, b) = v_E b,   v_E = 1.

Admissible control sets:

A = B(0, 1),   B = B(0, 1) \ S,   S = { (ρ cos θ, ρ sin θ) : θ ∈ (θ₁, θ₂), |ρ| ≤ 1 }.
Value Function
[Figure: value function v(x1, x2) for Test 4 on [−1, 1]².]
Optimal Trajectories
[Figure: optimal trajectories, Test 4: P = (−0.5, 0.8), E = (−0.5, 0.0); axes x1, x2.]
The Homicidal chauffeur
Dynamics:

ẋ_P = v_P sin θ,
ẏ_P = v_P cos θ,
ẋ_E = v_E sin b,
ẏ_E = v_E cos b,
θ̇ = (v_P / R) a.
The Homicidal chauffeur
[Figure: the Homicidal Chauffeur geometry: pursuer P with speed v_P and turning radius R, evader E with speed v_E and control b.]
Value Function
[Figure: value function v(x1, x2) for Test 5 on [−1, 1]².]
Optimal Trajectories
[Figure: optimal trajectories, Test 5: P = (−0.1, −0.3), E = (0.1, 0.3); axes x1, x2 ∈ [−1, 1].]
Optimal Trajectories
[Figure: optimal trajectories, Test 5: P = (0.0, 0.2), E = (0.0, −0.2); axes x1, x2 ∈ [−1, 1].]
Optimal Trajectories (Merz Thesis)
Optimal Trajectories (computed)
[Figure: computed optimal trajectories for Test 5, to be compared with the solution in Merz's thesis; axes x1, x2.]
Other classical control problems
The finite horizon problem
Dynamics
ẏ(t) = f(y(t), a(t)),   t > 0,   y(t₀) = x,   (D)

where y ∈ R^N, α(·) ∈ A ≡ { α : [0, +∞[ → A, measurable }, A ⊂ R^M compact.

Cost:

J_x(α) ≡ ∫_{t₀}^{t_f} l(y(s), α(s)) e^{−λs} ds + ψ(y(t_f)),   λ > 0.
Value function
v(x, t₀) ≡ inf_{α∈A} J_{(x,t₀)}(α).

HJB equation:

−v_t(x, t₀) + λ v(x, t₀) + max_{a∈A} { −f(x, a) · ∇_x v(x, t₀) − l(x, a) } = 0,
v(x, t_f) = ψ(x).
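A minimal backward semi-Lagrangian sweep for this problem (taking λ = 0, f(x, a) = a, l(x, a) = x², ψ = 0 purely for illustration):

```python
import numpy as np

# Backward marching: v^n(x) = min_a [ h*l(x,a) + v^{n+1}(x + h f(x,a)) ],
# starting from the terminal datum v(., t_f) = psi = 0.
x = np.linspace(-2.0, 2.0, 401)
A = np.linspace(-1.0, 1.0, 21)
h, n_steps = 0.05, 40                    # horizon t_f - t_0 = n_steps * h
v = np.zeros_like(x)
for _ in range(n_steps):
    v = np.min([h * x**2 + np.interp(x + h * a, x, v) for a in A], axis=0)
```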
Other classical control problems
The infinite horizon problem
Dynamics:

ẏ(t) = f(y(t), a(t)),   t > 0,   y(0) = x,   (D)

where y ∈ R^N, α(·) ∈ A ≡ { α : [0, +∞[ → A, measurable }, A ⊂ R^M compact.

Cost:

J_x(α) ≡ ∫_0^∞ l(y(s), α(s)) e^{−λs} ds,   λ > 0.
Value function
v(x) ≡ inf_{α∈A} J_x(α).

HJB equation:

λ v(x) + max_{a∈A} { −f(x, a) · ∇v(x) − l(x, a) } = 0.
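A minimal value-iteration sketch for the discounted problem, using the classical semi-Lagrangian discretization v(x) = min_a [(1 − λh) v(x + h f(x, a)) + h l(x, a)] (the 1D data are our illustrative choices):

```python
import numpy as np

# Fixed-point iteration; the update is a contraction with factor (1 - lam*h).
lam, h = 1.0, 0.05
x = np.linspace(-2.0, 2.0, 401)
A = np.linspace(-1.0, 1.0, 21)
v = np.zeros_like(x)
for _ in range(500):
    v = np.min([(1.0 - lam * h) * np.interp(x + h * a, x, v) + h * x**2
                for a in A], axis=0)
```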
Other classical control problems
The infinite horizon problem with state constraints
Dynamics
ẏ(t) = f(y(t), a(t)),   t > 0,   y(0) = x,   (D)

where y ∈ R^N. Now we require that y_x(t) ∈ Ω ⊂ R^N for all t.

Admissible controls:

α(·) ∈ A_x ≡ { α : [0, +∞[ → A measurable, such that y_x(t) ∈ Ω for all t },   A ⊂ R^M compact.

Cost:

J_x(α) ≡ ∫_0^∞ l(y(s), α(s)) e^{−λs} ds,   λ > 0.
Value function
v(x) ≡ inf_{α∈A_x} J_x(α).

Under a compatibility condition of the type

∀x ∈ ∂Ω ∃ â_x ∈ A : f(x, â_x) · η(x) < 0,

HJB equation (Soner):

λ v(x) + max_{a∈A} { −f(x, a) · ∇v(x) − l(x, a) } ≥ 0   in Ω̄,
λ v(x) + max_{a∈A} { −f(x, a) · ∇v(x) − l(x, a) } ≤ 0   in Ω.

This means that the value function is a supersolution up to the boundary and a subsolution inside Ω.
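One natural discrete counterpart restricts, at each node, the minimization to controls whose Euler step stays inside Ω; a sketch under that assumption (all names ours):

```python
import numpy as np

# Constrained semi-Lagrangian node update: only controls keeping the step in
# Omega are admissible; returns +inf if no admissible control exists at x.
def constrained_update(x, v_interp, f, A_disc, h, lam, l, in_Omega):
    vals = [(1.0 - lam * h) * v_interp(x + h * f(x, a)) + h * l(x, a)
            for a in A_disc if in_Omega(x + h * f(x, a))]
    return min(vals) if vals else np.inf
```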
The HJPACK Library

HJPACK is a public domain library for Hamilton-Jacobi equations. It includes:

• finite difference schemes and semi-Lagrangian schemes in R and R²
• applications to control problems and front propagation
• fast-marching schemes
• a graphical interface and a user's guide

You can get it at www.caspur.it/hjpack.
Basic references
GENERAL THEORY

G. Barles, Solutions de viscosité des équations de Hamilton-Jacobi, Springer-Verlag, 1998.

A very readable introduction to HJ equations is also contained in the book

L.C. Evans, Partial Differential Equations, American Mathematical Society, 1999.
Basic References
DETERMINISTIC CONTROL PROBLEMS AND GAMES
M. Bardi, I. Capuzzo Dolcetta, Optimal Control and Viscosity Solutions of Hamilton-Jacobi-Bellman Equations, Birkhäuser, 1997.
A.I. Subbotin, Generalized Solutions of First-Order PDEs, Birkhäuser, Boston, 1995.
M. Falcone, R. Ferretti, Semi-Lagrangian Approximation Schemes for Linear and Hamilton-Jacobi Equations, SIAM book, in preparation.
STOCHASTIC CONTROL PROBLEMS
W.H. Fleming, H.M. Soner, Controlled Markov Processes and Viscosity Solutions, Springer-Verlag, 1998.
F. Silva, An introduction to stochastic control, notes of the PhD course "Optimal Control", Rome, March 2012.