Cite peer-reviewed, updated versions:

Lantoine, G. and Russell, R. P., "A Hybrid Differential Dynamic Programming Algorithm for Constrained Optimal Control Problems, Part 1: Theory," Journal of Optimization Theory and Applications, Vol. 154, No. 2, 2012, pp. 382-417, DOI 10.1007/s10957-012-0039-0.

Lantoine, G. and Russell, R. P., "A Hybrid Differential Dynamic Programming Algorithm for Constrained Optimal Control Problems, Part 2: Application," Journal of Optimization Theory and Applications, Vol. 154, No. 2, 2012, pp. 418-442, DOI 10.1007/s10957-012-0038-1.

A Hybrid Differential Dynamic Programming Algorithm for Robust Low-Thrust Optimization

Gregory Lantoine∗ and Ryan P. Russell†
Georgia Institute of Technology, Atlanta, Georgia, 30318, USA

Low-thrust propulsion is increasingly considered for future space missions, but optimization of the resulting trajectories is very challenging. To solve such complex problems, differential dynamic programming is a proven technique based on Bellman's Principle of Optimality and successive minimization of quadratic approximations. In this paper, we build upon existing optimization strategies to present an alternative hybrid variant of differential dynamic programming for robust low-thrust optimization. It uses first- and second-order state transition matrices to take advantage of an efficient discretization scheme and to obtain the partial derivatives needed to perform the minimization. Unlike the traditional formulation, the state transition approach provides valuable constraint sensitivities and, furthermore, is naturally amenable to parallel computation. The method also includes a smoothing strategy to improve robustness of convergence when starting far from the optimum, as well as the capability to handle both soft and hard constraints efficiently. Procedures to drastically reduce the computational cost are mentioned. Preliminary numerical results are presented and compared to existing algorithms to illustrate the performance and accuracy of our approach.

Nomenclature

∆ = Trust region radius
∆max = Maximum trust region radius
∆min = Minimum trust region radius
ǫ = Step reduction parameter
η = Energy penalty parameter
λ = Lagrange multiplier vector
λmin = Minimum eigenvalue
L = Lagrangian
x̄ = Nominal state vector: x̄ ∈ ℜ^{n_x}
Φ1 = First-order state transition matrix
Φ2 = Second-order state transition matrix
π = Control law
ρ = Cost reduction ratio
ϕ = Terminal loss function
F̃ = Augmented transition function
g̃ = Hard equality and active inequality constraint vector: g̃ : ℜ^{n_x} × ℜ^{n_u} → ℜ^m
g̃c = Vector of current values of active constraints: g̃c : ℜ^{n_x} × ℜ^{n_u} → ℜ^m
c = Penalty parameter
F = Transition function vector: F : ℜ^{n_x} × ℜ^{n_u} → ℜ^{n_x}
f = Equations of motion for the state: f : ℜ^{n_x} × ℜ^{n_u} → ℜ^{n_x}
g = Equality constraint vector: g : ℜ^{n_x} × ℜ^{n_u} → ℜ^{n_e}
H = Hamiltonian

∗ PhD Candidate, School of Aerospace Engineering, 270 Ferst Dr., AIAA Member, [email protected].
† Assistant Professor, School of Aerospace Engineering, 270 Ferst Dr., AIAA Member, [email protected].


Presented as Paper AIAA-2008-6615 at the AIAA/AAS Astrodynamics Specialist Conference, Honolulu, Hawaii, August 18-21, 2008.

h = Inequality constraint vector: h : ℜ^{n_x} × ℜ^{n_u} → ℜ^{n_i}
I = Identity matrix
J = Cost function
J_k = Cost-to-go function
L = Loss function
l = Accumulated loss function
P = Total multiplier penalty function: P = \sum_{i=1}^{n_e} P_1^i + \sum_{j=1}^{n_i} P_2^j
P_1 = Multiplier penalty function associated with equality constraints
P_2 = Multiplier penalty function associated with inequality constraints
t = Time
u = Control vector: u ∈ ℜ^{n_u}
X = Augmented state vector: X ∈ ℜ^{n_x + n_u}
x = State vector: x ∈ ℜ^{n_x}
ALM = Augmented Lagrangian Method
DDP = Differential Dynamic Programming
HC = Hard Constraints
HDDP = Hybrid Differential Dynamic Programming
KKT = Karush-Kuhn-Tucker
SC = Soft Constraints
STM = State Transition Matrix (used for both first and second order)

Subscripts
k = Node number
m = Dimension of active constraints
N = Total number of segments
n_e = Number of equality constraints
n_i = Number of inequality constraints
n_u = Dimension of control vector
n_x = Dimension of state vector

Conventions
∆t = Time step between two nodes
δx = Small increment of x
ẋ = Total derivative of x with respect to time: ẋ = dx/dt
A • B = When A is a matrix and B is a second-order tensor: (A • B)_{ijk} = \sum_p A(i,p) B(p,j,k). When A is an array: (A • B)_{ij} = \sum_p A(p) B(p,i,j)
J* = Control-free cost where the control is replaced by the state-dependent control law: J*(x) = J(x, π(x))
J_q = First partial derivative of J with respect to the dummy vector variable q (columnwise convention): J_q = \nabla_q J = [\partial J/\partial q_1 \; \cdots \; \partial J/\partial q_N]^T
J_qq = Second partial derivative of J with respect to the dummy vector variable q: (J_{qq})_{ij} = \partial^2 J / \partial q_i \partial q_j

Superscripts
i = ith component of a vector
T = Transpose

I. Introduction

Low-thrust propulsion systems are gaining consideration for future space missions because propellant is used more efficiently, allowing reduced spacecraft mass and/or increased payload capacity. The Deep Space 1 mission successfully implemented this technology to achieve a comet encounter,1 and the Dawn mission is currently using electric propulsion to transfer to the two most massive asteroids in the solar system,

Vesta and Ceres.2 Many other past, current, and planned missions are taking advantage of low-thrust technology.3 However, the optimization of the resulting trajectories is a very challenging and time-consuming task, since such problems are characterized by highly nonlinear dynamics, control bounds, state constraints, large design spaces with discrete system parameters, and multiple local optima. The method presented in this paper is local, but its rapid convergence allows many trials of initial guesses, which facilitates the search for a global optimum.

In the literature, numerous approaches have been reported to solve this problem.4,5 Most of them fall into two distinct categories: indirect and direct methods. Indirect methods are based on necessary optimality conditions derived from the Pontryagin Maximum Principle.6 The original problem is then reduced to a two-point boundary value problem, solved via shooting, relaxation, collocation, or gradient descent. But these methods depend strongly on the accuracy of the initial guess, and introduce extra variables, the so-called co-states, which are not physically intuitive. On the other hand, direct methods consist of the direct minimization of the objective function by discretizing the control variables and using nonlinear programming techniques.7 These methods are more robust and flexible, primarily because the necessary conditions do not have to be derived for each problem. However, the parameterization leads to a large number of variables, especially when the thrust has to be operated over long periods.

One idea that combines the advantages of both approaches is provided by Differential Dynamic Programming (DDP).8 The method is based on Bellman's Principle of Optimality of dynamic programming and successive backward quadratic expansions of the objective function. Quadratic programming is then used on each resulting quadratic subproblem to find control increments that improve the trajectory locally. The states and objective function are then calculated forward using the new control law, and the process is repeated until convergence. DDP has second-order convergence if sufficiently close to the optimal trajectory, and appears to be numerically more efficient than Newton's method.9 Like direct methods, DDP is known to be robust to poor initial guesses since it also includes a parameterization of the control variables. However, it is not as sensitive to the resulting high-dimensional problem, because DDP transforms this large problem into a succession of low-dimensional subproblems. In addition, there is also a strong connection with indirect methods. For instance, first-order DDP integrates the same equations as those from the calculus of variations and finds control increments to decrease the Hamiltonian at each iteration.10,11 In second-order DDP, Jacobson performs strong control variations to globally minimize the Hamiltonian for simple problems. Therefore, DDP is not blind to the necessary conditions of optimality, which makes the minimization process more robust.

DDP is a tool of choice for problems that can be translated into a dynamic optimization formulation.12 We will see in the next section how to convert low-thrust problems into a form suitable for DDP. However, although DDP has been increasingly used in favor of more traditional control theory algorithms, it is only effective for smooth unconstrained problems; otherwise it may converge slowly or may not converge at all.
Unfortunately, low-thrust problems typically include constraints (control bounds and terminal constraints in particular) and are highly non-smooth because they generally involve bang-bang controls. Accordingly, this paper discusses modifications and improvements to the standard DDP algorithm. By combining DDP with some well-proven mathematical programming techniques, we obtain an efficient Hybrid Differential Dynamic Programming (HDDP) algorithm for low-thrust trajectory optimization, with an emphasis on robustness and flexibility. The following subsections outline the areas of improvement addressed in this study.

A. Efficient discretization scheme

DDP is primarily used in discrete time problems. To handle continuous time problems, most previous authors use an Euler scheme to approximate the dynamics,8 resulting in a loss of accuracy. In this paper, we follow the recent work of Whiffen13 and Patel et al.14 to keep the exact dynamics during the optimization process using multiple stages of fixed control. In particular, Whiffen developed the SDC algorithms, which are considered state-of-the-art for DDP-based methods. The multi-stage SDC formulation has been successfully implemented in the Mystic software to solve complex low-thrust problems.15 For example, Mystic is currently being used at the Jet Propulsion Laboratory to design and navigate the elaborate trajectory of the aforementioned Dawn spacecraft. But contrary to the SDC algorithm, which relies on Riccati-like equations

integrated backward to get the derivatives, we propose to use the first-order and second-order state transition matrices to generate the required partials. We will show in later sections that the STM-based formulation enjoys several benefits such as improved constraint handling and natural parallelization.

B. Improvement of convergence properties

DDP is based on successive quadratic approximation methods. But minimizing a quadratic is only directly possible when the computed Hessian is positive definite, which may not (and in practice will not) be the case for general nonlinear dynamics and initial guesses far from optimality. DDP may not converge at all in that situation. The traditional way to remedy this issue is to shift the eigenvalues of the Hessian to ensure positivity, using one of the many trust region or shifting variants.13,16,17 But a major drawback of the shifting strategy is that the exact Hessian of the problem is no longer used, and therefore the rate of convergence is likely reduced. We therefore selected another, related approach that relies on smoothing and stabilizing the problem by adding a perturbed quadratic energy term in the objective function. This extra term yields more regular controls and tends to make the exact Hessian of this auxiliary problem positive definite. Unlike standard smoothing methods that reduce the perturbed term by continuation, our approach allows it to be specified dynamically (rather than statically in advance) to guarantee positivity and control the increment step to ensure second-order expansions are correct.

C. Constrained optimization

Modifications of the DDP algorithm for constrained problems have been proposed by many authors. They can be grouped into two main categories. The most popular way is to add a penalty function to the objective to convert the constrained problem into an unconstrained problem.13,15 The second approach is to use active-set quadratic programming methods to perform the minimization while satisfying constraints to the first or second order.14,18

Although quite successful, from a mission analysis point of view, previous methods suffer from a common deficiency. Low-thrust problems involve hard and soft constraints, and previous approaches do not differentiate between them. Hard constraints (HC) are constraints that must not be violated. They consist principally of technical limitations of the spacecraft (e.g., maximum thrust, maximum heating). On the other hand, soft constraints (SC) can be satisfied within a specified tolerance. A typical example of a soft constraint in preliminary design is a maximum miss distance allowed for a rendezvous problem.

In our algorithm, hard and soft constraints are handled differently to better reflect user requirements. Constrained quadratic programming is used to guarantee adhesion to active hard constraints. Soft constraints are modeled using an augmented Lagrangian approach, where constraint penalty terms are added to the Lagrangian. Then the unconstrained DDP approach can be used to solve the resulting unconstrained minimization problem. This procedure for handling mixed hard/soft constraints has been previously adopted by Lin et al.19 We will see in a later section how we can take advantage of the STM-based approach to improve the efficiency of this procedure.

This paper is organized as follows. First, we present our general model algorithm. Next, we describe different strategies to improve convergence properties, followed by a procedure to account for mixed hard/soft constraints. A summary of the different steps of the full algorithm is then presented, including a section dealing with practical speed-improving implementations of the algorithm. Finally, we show representative numerical results.

II. Problem Statement

A. Discretization

Consistent with the management period formulation of SDC,13 the trajectory is subdivided into N segments with constant thrusting (see figure 1). This is a reasonable model since the thrust cannot change too frequently in reality. This also reduces the computational requirements because the number of control variables is reduced.


Figure 1. Discretization of low-thrust problem.

B. Dynamics

The mapping between the states that form the segment boundaries is defined on each segment by:

x_{k+1} = F_k(x_k, u_k)    (1)

In our formulation for a continuous problem,

F_k = x_k + \int_{t_k}^{t_{k+1}} f(x, u_k, t) \, dt    (2)

Note this is an extension of the discrete DDP formulation of Mayne20 and Jacobson,8 in which F_k = x_k + f(x_k, u_k, t_k) \Delta t, which corresponds to an Euler scheme.
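For illustration, the segment transition of Eq. 2 can be evaluated with any standard integrator. The following minimal sketch (Python with SciPy; the dynamics function f and the tolerances are placeholder assumptions, not part of the original implementation) propagates one segment with the control held constant:

```python
# Minimal sketch of the transition function F_k of Eq. 2, assuming
# generic equations of motion f(t, x, u) supplied by the user.
import numpy as np
from scipy.integrate import solve_ivp

def transition(f, x_k, u_k, t_k, t_k1, rtol=1e-12, atol=1e-12):
    """Propagate the state from t_k to t_k+1 with the control held
    constant over the segment: x_{k+1} = F_k(x_k, u_k)."""
    sol = solve_ivp(lambda t, x: f(t, x, u_k), (t_k, t_k1), x_k,
                    rtol=rtol, atol=atol)
    return sol.y[:, -1]

# By contrast, the Euler scheme of Mayne/Jacobson replaces the integral
# by a single step, x_{k+1} = x_k + f(x_k, u_k, t_k) * dt, losing accuracy.
```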

C. Objective function

The objective function to be minimized is defined by:

J = \sum_{k=0}^{N-1} L_k(x_k, u_k) + \varphi(x_N)    (3)

Note that an accumulated loss function over the whole segment, \int_{t_k}^{t_{k+1}} l(x, u_k) \, dt, can be represented classically as well by adding an extra state variable. A typical example is the fuel consumption of the spacecraft, where l is equal to the thrust.

D. Constraints

For each node k = 0...N, constraints are written in the general form g_k and h_k to account for path, control, and terminal constraints. To simplify the notation, we add a dummy control at the last node, u_N = 0.

g_k(x_k, u_k) = 0, \quad h_k(x_k, u_k) \leq 0    (4)

E. Summary: Dynamic optimization formulation

The optimal control problem we are considering is reduced to the following form:

\min_{u_0, ..., u_{N-1}} \; J = \sum_{k=0}^{N-1} L_k(x_k, u_k) + \varphi(x_N)

subject to

x_{k+1} = F_k(x_k, u_k)   for k = 0...N-1
g_k(x_k, u_k) = 0         for k = 0...N
h_k(x_k, u_k) \leq 0      for k = 0...N    (5)

This formulation has a dynamic structure that is well adapted to applying Bellman's Principle of Optimality, which describes the process of solving problems where one needs to find the best decisions one after another (see figure 2). An approach for the solution using DDP is discussed in detail in the following sections.

Figure 2. Dynamic structure of optimal control low-thrust problem.

III. STM-based DDP algorithm formulation

DDP procedures for unconstrained discrete-time control problems were initially introduced by Mayne,20 Jacobson and Mayne,8 Gershwin and Jacobson,21 and Dyer and McReynolds,22 and further developed by many other authors. Yakowitz gives a survey of the many different versions of DDP.23 This section represents part of this continuing effort to investigate improved implementations of DDP. After reviewing the basics of DDP, we introduce the State Transition Matrix approach to obtain the partial derivatives needed by DDP.

A. Bellman Principle of Optimality

The fundamental foundation of DDP is Bellman's Principle of Optimality.24 It essentially states that, on an optimal solution, no matter where we start, the remaining trajectory must be optimal. Therefore, instead of considering the total cost over the whole trajectory of Eq. 3, dynamic programming techniques consider the cost-to-go function:

J_k(x_k, u_k, ..., u_{N-1}) = \sum_{i=k}^{N-1} L_i(x_i, u_i) + \varphi(x_N)    (6)

Since the search for the optimal control at each segment is independent of the initial states and controls used before the segment considered, the goal is to seek the minimum of this cost-to-go for each k = 0...N-1, which is obtained through a control law π_k. Therefore J_k^* depends on the current state x_k only:

J_k^*(x_k) = \min_{u_k, ..., u_{N-1}} J_k(x_k, u_k, ..., u_{N-1}) = J_k(x_k, \pi_k(x_k), ..., \pi_{N-1}(x_{N-1}))    (7)

According to the Principle of Optimality, we can perform a recursion by decomposing this optimization problem into that of the current segment plus that for the rest of the cost-to-go:

J_k^*(x_k) = \min_{u_k} \left[ L_k(x_k, u_k) + \min_{u_{k+1}, ..., u_{N-1}} J_{k+1}(x_{k+1}, u_{k+1}, ..., u_{N-1}) \right]    (8)

Using Eq. 7 we can substitute the cost-to-go of the next segments:

J_k^*(x_k) = \min_{u_k} \left[ L_k(x_k, u_k) + J_{k+1}^*(x_{k+1}) \right] = \min_{u_k} [J_k(x_k)]    (9)

Eq. 9 is the fundamental recursive equation that is the basis for all dynamic programming techniques. If we suppose that the minimization has been performed at segments k...N-1, then J_k^*(x_k) can be interpreted

as the expected cost if the system is initialized with state x_k at time step k, and governed by the controls obtained to minimize the remaining time steps. The well-known Hamilton-Jacobi-Bellman partial differential equation can be derived from Eq. 9 by differentiating it with respect to time. However, classical dynamic programming performs the minimization step of Eq. 9 by discretizing both states and controls, which leads to huge storage requirements (the "curse of dimensionality"). To overcome this issue, differential dynamic programming sacrifices globality with local quadratic approximations around the current trajectory at each segment, which drastically reduces the dimension of the search space. In our approach, the coefficients of the quadratic approximations are derived with the help of the first-order and second-order state transition matrices. This formulation comes naturally from Bellman's Principle of Optimality and offers several advantages to be explained throughout the next sections.

B. Local quadratic expansion

Being a second-order method, DDP relies on quadratic Taylor series expansions about the nominal solution in terms of state and control deviations δx_k = x_k - x̄_k and δu_k = u_k - ū_k. Expanding the cost-to-go function J_k(x̄_k + δx_k, ū_k + δu_k) of segment k, we get:

\delta J_k \approx J_{x,k}^T \delta x_k + J_{u,k}^T \delta u_k + \frac{1}{2} \delta x_k^T J_{xx,k} \delta x_k + \frac{1}{2} \delta u_k^T J_{uu,k} \delta u_k + \delta x_k^T J_{xu,k} \delta u_k    (10)

The goal is to find the coefficients of this Taylor series expansion to be able to minimize this expression with respect to δu_k. This is achievable by marching backwards and mapping the partials from one segment to another using the state transition matrices. Indeed, if the minimization has been performed at the segments upstream, then the partials J_{x,k+1}^* and J_{xx,k+1}^* are known. Therefore we can expand the terms of the current cost-to-go J_k = L_k + J_{k+1}^* and match with those of Eq. 10:

\delta L_k \approx L_{x,k}^T \delta x_k + L_{u,k}^T \delta u_k + \frac{1}{2} \delta x_k^T L_{xx,k} \delta x_k + \frac{1}{2} \delta u_k^T L_{uu,k} \delta u_k + \delta x_k^T L_{xu,k} \delta u_k    (11a)

\delta J_{k+1}^* \approx J_{x,k+1}^{*T} \delta x_{k+1} + \frac{1}{2} \delta x_{k+1}^T J_{xx,k+1}^* \delta x_{k+1}    (11b)

All partials of Eq. 11a and Eq. 11b are known. However, to be able to equate coefficients, we need to express δx_{k+1} as a function of δx_k and δu_k. Using Eq. 1, we can do a quadratic expansion of the transition function to obtain the desired relationship:

\delta x_{k+1} \approx F_{x,k} \delta x_k + F_{u,k} \delta u_k + \frac{1}{2} \delta x_k^T \bullet F_{xx,k} \delta x_k + \frac{1}{2} \delta u_k^T \bullet F_{uu,k} \delta u_k + \delta x_k^T \bullet F_{xu,k} \delta u_k    (12)

To get a more compact expression for clarity, we define the augmented state X_k^T = [x_k^T \; u_k^T] and the augmented transition function \tilde{F}_k^T = [F_k^T \; 0_{n_u}] (since \dot{u}_k = 0). By definition of the first-order and second-order state transition matrices, Eq. 12 simplifies to:

\delta X_{k+1} \approx \tilde{F}_{X,k} \delta X_k + \frac{1}{2} \delta X_k^T \bullet \tilde{F}_{XX,k} \delta X_k = \Phi_k^1 \delta X_k + \frac{1}{2} \delta X_k^T \bullet \Phi_k^2 \delta X_k    (13)

State transition matrices are useful tools for our problem since they can map the perturbations in the spacecraft state from one time to another (see figure 3). The methodology presented here to propagate perturbations with high-order state transition matrices is not new. For instance, Majji et al.25 and Park et al.26 use them to implement very accurate filters for orbital propagation under uncertainty. The state transition matrices are computed from the following differential equations:

Figure 3. Perturbation mapping.

\dot{\Phi}_k^1 = f_X \Phi_k^1    (14a)

\dot{\Phi}_k^2 = f_X \bullet \Phi_k^2 + \Phi_k^{1T} \bullet f_{XX} \bullet \Phi_k^1    (14b)

subject to the initial conditions \Phi_k^1(t_k) = I_{n_x + n_u} and \Phi_k^2(t_k) = 0_{n_x + n_u}.
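As an illustration of Eq. 14a and Eq. 14b, the sketch below integrates the augmented state together with both STMs over one segment (Python/NumPy/SciPy; the user-supplied dynamics f, Jacobian fX, and second-derivative tensor fXX of the augmented state are assumptions, as are the solver tolerances):

```python
# Minimal sketch of the STM propagation of Eqs. 14a/14b over one segment.
import numpy as np
from scipy.integrate import solve_ivp

def propagate_stms(f, fX, fXX, X0, t_span, n):
    """Integrate the augmented state X (size n = nx + nu, with udot = 0),
    the first-order STM Phi1 (n x n), and the second-order STM Phi2
    (n x n x n) from t_span[0] to t_span[1]."""
    def rhs(t, y):
        X = y[:n]
        P1 = y[n:n + n * n].reshape(n, n)
        P2 = y[n + n * n:].reshape(n, n, n)
        A = fX(t, X)    # Jacobian of the augmented dynamics
        T = fXX(t, X)   # second-derivative tensor of the augmented dynamics
        dP1 = A @ P1                                       # Eq. 14a
        dP2 = (np.einsum('ia,ajk->ijk', A, P2)             # f_X . Phi2
               + np.einsum('iab,aj,bk->ijk', T, P1, P1))   # Phi1^T . f_XX . Phi1
        return np.concatenate([f(t, X), dP1.ravel(), dP2.ravel()])

    y0 = np.concatenate([X0, np.eye(n).ravel(), np.zeros(n ** 3)])
    sol = solve_ivp(rhs, t_span, y0, rtol=1e-10, atol=1e-10)
    yf = sol.y[:, -1]
    return (yf[:n], yf[n:n + n * n].reshape(n, n),
            yf[n + n * n:].reshape(n, n, n))
```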


Combining Eq. 11a, Eq. 11b, and Eq. 13, and matching Taylor coefficients of the variation of J_k = L_k + J_{k+1}^* with those of Eq. 10, we get the needed partials:

\begin{bmatrix} J_{x,k} \\ J_{u,k} \end{bmatrix}^T = \begin{bmatrix} L_{x,k} \\ L_{u,k} \end{bmatrix}^T + \begin{bmatrix} J_{x,k+1}^* \\ 0_{n_u} \end{bmatrix}^T \Phi_k^1    (15a)

\begin{bmatrix} J_{xx,k} & J_{xu,k} \\ J_{ux,k} & J_{uu,k} \end{bmatrix} = \begin{bmatrix} L_{xx,k} & L_{xu,k} \\ L_{ux,k} & L_{uu,k} \end{bmatrix} + \begin{bmatrix} J_{x,k+1}^* \\ 0_{n_u} \end{bmatrix} \bullet \Phi_k^2 + \Phi_k^{1T} \begin{bmatrix} J_{xx,k+1}^* & 0_{n_x \times n_u} \\ 0_{n_u \times n_x} & 0_{n_u \times n_u} \end{bmatrix} \Phi_k^1    (15b)
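A minimal sketch of this mapping (NumPy; the variable names and data layout are illustrative assumptions) could read:

```python
# Minimal sketch of the partials mapping of Eqs. 15a/15b, assuming the
# augmented-state partitioning X = [x; u] with nx states and nu controls.
import numpy as np

def map_partials(LX, LXX, Jx_next, Jxx_next, Phi1, Phi2, nx, nu):
    """Map the upstream partials (J*_x,k+1, J*_xx,k+1) through the STMs
    into the augmented partials J_X,k and J_XX,k."""
    # Augmented upstream gradient [J*_x,k+1 ; 0_nu]
    JX_next = np.concatenate([Jx_next, np.zeros(nu)])
    JX = LX + Phi1.T @ JX_next                          # Eq. 15a
    # Upstream Hessian, nonzero only in the state-state block
    JXX_next = np.zeros((nx + nu, nx + nu))
    JXX_next[:nx, :nx] = Jxx_next
    JXX = (LXX + np.einsum('p,pjk->jk', JX_next, Phi2)  # J*_X,k+1 . Phi2
           + Phi1.T @ JXX_next @ Phi1)                  # Eq. 15b
    return JX, JXX  # slice: JX[:nx] = J_x,k, JX[nx:] = J_u,k, etc.
```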

C. Connection with Pontryagin Maximum Principle

Since the sensitivity of J with respect to x is generally the same as the co-state of x,10 the discrete Hamiltonian of node k is then defined by:20,21

H_k = L_k + J_{x,k+1}^{*T} F_k    (16)

We can express the partials of the cost-to-go as functions of the partials of H_k. First, the STMs are partitioned according to the parts relative to the states and the controls. For instance, the first-order STM is partitioned the following way:

\Phi^1 = \begin{bmatrix} \Phi_x^1 & \Phi_u^1 \\ 0_{n_u \times n_x} & 0_{n_u \times n_u} \end{bmatrix}    (17)

The same principle applies for the second-order STM. We can now express the cost-to-go partials in terms of the submatrices generated:

J_{x,k}^T = L_{x,k}^T + J_{x,k+1}^{*T} \Phi_{x,k}^1 = H_{x,k}^T    (18a)

J_{u,k}^T = L_{u,k}^T + J_{x,k+1}^{*T} \Phi_{u,k}^1 = H_{u,k}^T    (18b)

J_{xx,k} = L_{xx,k} + J_{x,k+1}^{*T} \bullet \Phi_{xx,k}^2 + \Phi_{x,k}^{1T} J_{xx,k+1}^* \Phi_{x,k}^1 = H_{xx,k} + \Phi_{x,k}^{1T} J_{xx,k+1}^* \Phi_{x,k}^1    (18c)

J_{uu,k} = L_{uu,k} + J_{x,k+1}^{*T} \bullet \Phi_{uu,k}^2 + \Phi_{u,k}^{1T} J_{xx,k+1}^* \Phi_{u,k}^1 = H_{uu,k} + \Phi_{u,k}^{1T} J_{xx,k+1}^* \Phi_{u,k}^1    (18d)

J_{ux,k} = L_{ux,k} + J_{x,k+1}^{*T} \bullet \Phi_{ux,k}^2 + \Phi_{u,k}^{1T} J_{xx,k+1}^* \Phi_{x,k}^1 = H_{ux,k} + \Phi_{u,k}^{1T} J_{xx,k+1}^* \Phi_{x,k}^1    (18e)

Eq. 18a and Eq. 18b show that the first-order derivatives of the current cost-to-go and those of the Hamiltonian are identical. Therefore, minimizing J_k amounts to minimizing H, and the final optimal solution found by DDP is then guaranteed to satisfy the Pontryagin Maximum Principle. In the case of DDP, the minimization is performed using weak variations of the controls (necessary to keep the second-order approximations accurate, as we will see in the next section), in contrast to many indirect methods that use strong variations.

Figure 4. Comparison of classical (H_u = L_u + J_x f_u) and STM-based (H_u = L_u + J_x \Phi_u) discretization schemes.

Also, one advantage of our discrete formulation is that H at one node accounts for the effect of the controls over the entire corresponding segment through the sensitivities provided by the STMs. Most previous discrete or continuous formulations minimize H at one point only,8,24 which is less efficient and requires more mesh points to optimize at the same resolution, as shown in figure 4. However, a fine grid is still necessary for areas with rapidly varying optimal controls, since constant controls do not capture the optimal solution well in that case. In the rest of the paper, we will keep using the partials with respect to the cost-to-go function. If necessary, the Hamiltonian partial counterparts are derived easily using Eq. 18a to Eq. 18e.

D. Control law

Now that we know the coefficients of the Taylor series, we can minimize Eq. 10 with respect to δu_k. Making the gradient vanish, we get the usual control law:

\delta u_k = -J_{uu,k}^{-1} J_{u,k} - J_{uu,k}^{-1} J_{ux,k} \, \delta x_k    (19)

However, the resulting δu_k might violate hard constraints, or J_{uu,k} might not be positive definite; in the latter case δu_k is unlikely to be a descent direction. Therefore, to ensure that our algorithm can handle general situations, we will not take this specific control law. All we assume is that the control law is affine with respect to the states:

\delta u_k = A_k + B_k \, \delta x_k    (20)

If the problem is unconstrained and J_{uu,k} is positive definite (a very rare case in low-thrust problems without the smoothing strategy explained in Section IV), then:

A_k = -J_{uu,k}^{-1} J_{u,k}, \quad B_k = -J_{uu,k}^{-1} J_{ux,k}    (21)

Otherwise, the matrices A_k and B_k have to be modified to obtain the right control law. This will be the subject of the next two main sections.

E. Derivation of state-only quadratic coefficients and expected reduction

After replacing the controls in Eq. 10 with the corresponding state-dependent control law and noting that the square matrix is symmetric, we can deduce the expected cost reduction and the state-only quadratic coefficients at segment k:a

ER_k = ER_{k+1} + J_{u,k}^T A_k + \frac{1}{2} A_k^T J_{uu,k} A_k    (22a)

J_{x,k}^{*T} = J_{x,k}^T + J_{u,k}^T B_k + A_k^T J_{uu,k} B_k + A_k^T J_{ux,k}    (22b)

J_{xx,k}^* = J_{xx,k} + B_k^T J_{uu,k} B_k + B_k^T J_{ux,k} + J_{ux,k}^T B_k    (22c)

The initial conditions of those coefficients are obtained from the terminal cost function, while the expected reduction is set to zero at the beginning of the procedure:

ER_N = 0    (23a)

J_{x,N}^* = \varphi_x    (23b)

J_{xx,N}^* = \varphi_{xx}    (23c)

The general procedure outlined in this section to obtain the required partial derivatives is summarized in figure 5. This step is generally called a backward sweep in standard DDP papers. Note that the computation of the STMs is performed forward, alongside the integration of the trajectory. Therefore, contrary to most DDP approaches as well as the SDC algorithm,13 no integration is needed in our backward sweep.

a No terms in δx are present in ER since δx is zero on the reference trajectory.

Figure 5. General procedure to generate required derivatives.
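To make the sweep concrete, here is a minimal sketch of the loop of figure 5 for the unconstrained case with J_uu,k positive definite (Python/NumPy; it builds on the hypothetical map_partials helper sketched above, and all names are illustrative):

```python
# Minimal sketch of the backward sweep: Eq. 23 initial conditions,
# Eq. 15 mapping, Eq. 21 control law, and Eq. 22 recursions.
import numpy as np

def backward_sweep(segments, phi_x, phi_xx, nx, nu):
    """segments[k] holds (L_X, L_XX, Phi1, Phi2) for segment k."""
    Jx, Jxx, ER = phi_x, phi_xx, 0.0           # Eq. 23
    laws = [None] * len(segments)
    for k in reversed(range(len(segments))):
        LX, LXX, Phi1, Phi2 = segments[k]
        JX, JXX = map_partials(LX, LXX, Jx, Jxx, Phi1, Phi2, nx, nu)
        Ju, Jxx_k = JX[nx:], JXX[:nx, :nx]
        Juu, Jux = JXX[nx:, nx:], JXX[nx:, :nx]
        A = -np.linalg.solve(Juu, Ju)           # Eq. 21 (feedforward)
        B = -np.linalg.solve(Juu, Jux)          # Eq. 21 (feedback)
        laws[k] = (A, B)
        ER += Ju @ A + 0.5 * A @ Juu @ A                     # Eq. 22a
        Jx = JX[:nx] + B.T @ Ju + B.T @ Juu @ A + Jux.T @ A  # Eq. 22b
        Jxx = Jxx_k + B.T @ Juu @ B + B.T @ Jux + Jux.T @ B  # Eq. 22c
    return laws, ER
```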

IV. Improvement of convergence properties

Like most second-order methods, one complication with DDP is the reliance on the Newton step -J_{uu,k}^{-1} J_{u,k} for the minimization of the quadratic subproblems at each segment. However, a descent direction is guaranteed to be obtained only if J_{uu,k} is positive definite, which may not (and likely will not) be the case in practice. Another issue is the necessity to limit the magnitude of the variations δx_k and δu_k to ensure that the second-order truncations of the Taylor series are reliable. Our approach intends to solve both issues by using an innovative adaptive smoothing strategy that simultaneously enforces positive definite Hessians and limits the increment magnitudes within a certain radius.

A. Adaptive smoothing strategy

Our approach is inspired by the smoothing method of Jacobson et al.27 and Bullock,28 who modify the original problem by adding to the objective function a quadratic penalty term on the control variations of each segment:

\tilde{L}_k = L_k + \frac{\eta_k}{2} \|\delta u_k\|^2    (24)

The effect on the Hessian, shown in Eq. 25, is similar to that of a shifting strategy, i.e., a multiple of identity is added to obtain a modified positive definite Hessian. Note that there is no effect on the gradient, because δu_k = 0 on the nominal solution.

\tilde{L}_{uu,k} = L_{uu,k} + \eta_k I_{n_u \times n_u}    (25)

Contrary to Jacobson, who uses for all controls a single η specified in advance to follow a monotone decreasing sequence, our η_k is dynamically adjusted so that the resulting step falls within the current trust radius ∆. For that, we compute the minimum eigenvalue λ_min of the Hessian J_{uu,k} of the original problem and choose η_k = max(0, λ* - λ_min), where λ* = \|J_{u,k}\| / ∆, to ensure that the Hessian \tilde{J}_{uu,k} of the modified problem has no eigenvalues smaller than λ*. This way the equation \tilde{J}_{uu,k} \delta u_k = -J_{u,k} is guaranteed to have a solution with \|\delta u_k\| \leq \|J_{u,k}\| / \lambda^* = ∆. We emphasize that our trust region method is not iterative, and the eigenvalue calculation is fast due to the typically low dimension of the control vector u. The advantage of this approach over a classical shifting strategy is that we are modifying the problem to use its exact Hessian, instead of modifying the Hessian of the original problem, which would make the minimization less efficient.
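A minimal sketch of this shift (NumPy; names are illustrative) is:

```python
# Minimal sketch of the adaptive smoothing shift: eta_k is chosen so the
# smallest eigenvalue of the modified Hessian is lambda* = ||J_u|| / Delta.
import numpy as np

def smoothed_step(Juu, Ju, Delta):
    lam_star = np.linalg.norm(Ju) / Delta
    lam_min = np.linalg.eigvalsh(Juu).min()      # fast: dim(u) is small
    eta = max(0.0, lam_star - lam_min)
    Juu_mod = Juu + eta * np.eye(Juu.shape[0])   # Eq. 25
    # Newton step on the modified problem; ||du|| <= ||J_u||/lambda* = Delta
    du = -np.linalg.solve(Juu_mod, Ju)
    return Juu_mod, du
```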

B. Trust Region Update

It is necessary to have a procedure to quantify the quality of the second-order approximations. One would like to be able to define the region in which the quadratic truncations are reliable. Following Rodriguez et al.,29 Whiffen,13 and other general nonlinear programming techniques, a test at the end of each full iteration is therefore performed based on the cost reduction ratio ρ defined by:

\rho = (J_{new} - J) / ER_0    (26)

If ρ ≤ 0, there is no reduction, the approximation is bad, the trust region radius ∆ has to be reduced, and the iterate is rejected. If ρ > 0, the trust region update depends on how close ρ is to zero or one, and on whether the control iterate is on the boundary of the trust region. There is no general rule for the trust region update; it is essentially based on heuristics and experience. The procedure used in the algorithm is the same as that of Rodriguez et al.:29

\Delta_{k,new} = \begin{cases} \max(0.25\,\Delta_k, \Delta_{min}) & \text{if } \rho \leq 0.25, \\ \Delta_k & \text{if } 0.25 < \rho \leq 0.75, \\ \Delta_k & \text{if } \rho > 0.75 \text{ and } \|\delta u_k\| < \Delta_k, \\ \min(2\,\Delta_k, \Delta_{max}) & \text{if } \rho > 0.75 \text{ and } \|\delta u_k\| = \Delta_k. \end{cases}    (27)

Note that if the iteration was not successful, we reject the step and redo the backward sweep with the reduced radius, without recomputing the STMs.
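For reference, the update rule of Eq. 27 translates directly into code (a minimal sketch; the thresholds are those quoted above):

```python
# Minimal sketch of the trust-region update of Eq. 27 (Rodriguez et al.).
def update_radius(rho, step_norm, Delta, Delta_min, Delta_max):
    if rho <= 0.25:                           # poor agreement: shrink
        return max(0.25 * Delta, Delta_min)
    if rho > 0.75 and step_norm >= Delta:     # good agreement on boundary: grow
        return min(2.0 * Delta, Delta_max)
    return Delta                              # otherwise keep the radius
```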

V. Constrained Differential Dynamic Programming

Low-thrust problems always involve a certain number of constraints, from limitations on the controls to terminal state constraints necessary to achieve mission objectives. But the methodology described in Section III is designed for unconstrained problems only, so it has to be modified to account for constraints. One common, and perhaps the simplest, solution method uses penalty functions to transform the problem to an unconstrained form.13,30–33 While this technique has proven successful under many circumstances, penalty functions are known to be accompanied by slow convergence rates.34 Here we explore alternative methods in search of improved convergence (despite the added complexity). To make the HDDP algorithm more flexible and obtain tailored trajectories, we propose a methodology that can handle and differentiate between hard and soft constraints. This gives more degrees of freedom to ease convergence while satisfying the constraints accurately. The two corresponding computational procedures are presented below.

A. Hard constraints handling by Fletcher's quadratic programming

Some constraints must not be violated. Examples include the maximum allowed thrust of a spacecraft engine, the minimum distance to the sun, minimum flyby distances, or an automated docking maneuver. Therefore, the control law δu_k = A_k + B_k δx_k has to be modified so that it can only cause changes along the active hard constraints. A simple procedure is proposed by Yakowitz.18,35 Active constraints are linearized and a constrained quadratic programming technique based on Fletcher's work36 is applied.

The problem to be solved is very similar to the one of Eq. 5, except that we are now considering only constraints of the form g̃_k(x_k, u_k) = 0 at each segment. g̃_k is of dimension m_k and includes the equality constraints and the current active inequality constraints that must be satisfied. The set of active inequality constraints comes from the preceding iteration. We assume here that all constraints are independent and m_k < n_u (if m_k ≥ n_u, the reader is encouraged to relax some constraints with the method of the next subsection). Control-independent terminal constraints g_N(x_N) = 0 can be expressed in this form by replacing x_N using Eq. 1 to get g̃_N = g_N(F(x_{N-1}, u_{N-1})).

As in Section III, a new control law must be found by marching backwards and solving the successive minimization subproblems that arise. As usual, we perform a quadratic approximation of J_k while the constraints g̃_k are linearized. If necessary, J_{uu,k} should be handled using the techniques of Section IV.

\min_{\delta u_k} \; \delta J_k = J_{x,k}^T \delta x_k + J_{u,k}^T \delta u_k + \frac{1}{2} \delta x_k^T J_{xx,k} \delta x_k + \frac{1}{2} \delta u_k^T J_{uu,k} \delta u_k + \delta x_k^T J_{xu,k} \delta u_k
\text{subject to } \tilde{g}_{u,k}^T \delta u_k + \tilde{g}_{x,k}^T \delta x_k + \tilde{g}_c = 0    (28)

Fletcher36 presents a good algorithm for this problem by solving the Karush-Kuhn-Tucker (KKT) conditions. The Lagrangian of the system is introduced:

\mathcal{L} = J_{x,k}^T \delta x_k + J_{u,k}^T \delta u_k + \frac{1}{2} \delta x_k^T J_{xx,k} \delta x_k + \frac{1}{2} \delta u_k^T J_{uu,k} \delta u_k + \delta x_k^T J_{xu,k} \delta u_k + \lambda^T (\tilde{g}_{u,k}^T \delta u_k + \tilde{g}_{x,k}^T \delta x_k + \tilde{g}_c)    (29)

Making the gradient of Eq. 29 vanish with respect to δu_k and λ leads to the following system:

\begin{bmatrix} J_{uu,k} & \tilde{g}_{u,k} \\ \tilde{g}_{u,k}^T & 0 \end{bmatrix} \begin{bmatrix} \delta u_k \\ \lambda \end{bmatrix} = \begin{bmatrix} -J_{u,k} - J_{xu,k}^T \, \delta x_k \\ -\tilde{g}_c - \tilde{g}_{x,k}^T \, \delta x_k \end{bmatrix}    (30)

   Ak = −KJu,k − GT gec     T T  − GT gex,k Bk = −K T Jxu,k     λ∗ = −GJ + (e −1 T gu,k Juu,k geu,k )−1 gec u,k where −1 T T T   ν = −GJxu,k + (e gu,k Juu,k geu,k )−1 gex,k     −1 −1 T T  G = (e gu,k Juu,k geu,k )−1 geu,k Juu,k     K = J −1 (I − ge G) u,k uu,k nu

(31) (32)

(33)

The KKT conditions require the Lagrange multipliers of the active inequality constraints to be positive.37 Therefore the signs of the updated components of the multipliers are tested on the nominal trajectory (i.e. for δxk = 0). If some components are not positive, corresponding constraints are dropped from the set of active constraints, and Fletcher provides an economical algorithm to modify matrices G and K accordingly.36 Suppose for simplicity that it is the mth equation that must be dropped: h

i G(m, :)T G(m, :)Juu,k G Gnew , 0 = G − G(m, :)Juu,k G(m, :)T Knew = K +

G(m, :)T G(m, :) G(m, :)Juu,k G(m, :)T

(34) (35)

In addition, if u_k + A_k violates an inactive constraint, A_k is modified by reducing the current trust-region radius ∆_k until the constraint is satisfied (using a dichotomy procedure, for instance). Also, the control law of Eq. 31 guarantees only that the constraints are met to the first order. During the forward run, precautions must therefore be taken to ensure nonlinear constraints are still satisfied (by reducing the trust-region radius, for instance). For highly nonlinear constraints, it might be more efficient to employ the next method with a small tolerance. In future work we intend to implement the algorithm of Patel and Scheeres,14 who derive a quadratic control law to meet the constraints to the second order.

Finally, note that this approach requires starting with a solution that is exactly (or close to being) feasible. In fact, the algorithm stops when a solution is strongly infeasible and no locally feasible point can be found. Consequently, a preliminary run to minimize only constraint violations might be necessary to guarantee feasibility. The method described next could also be used at first. In practice, the HC method is particularly useful for linear constraints, such as maximum thrust or minimum orbit radius, where nonlinearity is not an issue and feasible initial guesses are easy to achieve.
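As an illustration, the KKT system of Eq. 30 can also be solved directly as one block linear system, which is equivalent to the partitioned-inverse formulas of Eqs. 31-33. A minimal sketch on the nominal trajectory (δx_k = 0), with illustrative names:

```python
# Minimal sketch: direct block solve of the KKT system of Eq. 30 at dx = 0.
import numpy as np

def constrained_step(Juu, Ju, gu, gc):
    """Juu: nu x nu Hessian, gu: nu x m active-constraint Jacobian g~_u,
    gc: current values of the m active constraints g~_c."""
    nu, m = gu.shape
    KKT = np.block([[Juu, gu], [gu.T, np.zeros((m, m))]])
    rhs = np.concatenate([-Ju, -gc])
    sol = np.linalg.solve(KKT, rhs)
    du, lam = sol[:nu], sol[nu:]   # feedforward A_k and multipliers lambda*
    return du, lam
```

The partitioned-inverse form of Eq. 33 is preferable inside the sweep because it also yields the feedback matrices B_k and ν; the block solve above is shown only to make the structure of Eq. 30 explicit.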

B. Soft constraints handling by Augmented Lagrangian

Some constraints are especially difficult to make feasible and, further, have a wide range of acceptable violation. It is also sometimes beneficial to violate hard constraints on the way to convergence, to more easily obtain a feasible trajectory that hits the target first. In these cases, we relax the "soft" constraints based on the Augmented Lagrangian Method (ALM). This method was proposed in the nonlinear programming area by Hestenes38 and Powell39 to overcome the main difficulties of traditional penalty function methods related to ill-conditioning and slow rate of convergence. ALM introduces dual multiplier variables associated with each constraint and transforms the constrained problem into an unconstrained one, which DDP can solve effectively. This technique is also well-suited for our STM-based algorithm because partial derivatives with respect to the multipliers can be calculated "for free," without integrating a new set of equations.

1. Formulation of the problem

For relaxing constraints at segment k, the augmented Lagrangian loss function is introduced as:

\tilde{L}_k = L_k + \sum_{i=1}^{n_{e_k}} P_1(g_k^i, \lambda_k^i, c_k^i) + \sum_{j=1}^{n_{i_k}} P_2(h_k^j, \lambda_k^{n_{e_k}+j}, c_k^{n_{e_k}+j}) = L_k + P_k(g_k, h_k, \lambda_k, c_k)    (36a)

\tilde{\varphi} = \varphi + P_N(g_N, h_N, \lambda_N, c_N)    (36b)

The choice of P_1 and P_2 is known to have a dramatic effect on robustness and convergence rate.40 For equality constraints, the classical multiplier penalty function is used:

P_1(g, \lambda, c) = \lambda g + c g^2    (37)

The choice of the function for inequality constraints is not as simple. DDP is a second-order method that requires functions to be twice differentiable; therefore we cannot extend the previous penalty function to inequality constraints (the second derivative would not be continuous). Chang et al.41 and Ruxton42 suggest a second-order continuous multiplier function that shows encouraging results with DDP:

P_2(h, \lambda, c) = \begin{cases} \lambda h (1 + c h) & \text{if } h > 0, \\ \lambda h / (1 - c h) & \text{if } h \leq 0. \end{cases}    (38)
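A minimal sketch of the two penalty functions (NumPy; vectorizing over constraint values is an implementation choice for illustration, not part of the original formulation):

```python
# Minimal sketch of the multiplier penalty functions of Eqs. 37 and 38.
import numpy as np

def P1(g, lam, c):
    """Equality-constraint multiplier penalty: lambda*g + c*g^2 (Eq. 37)."""
    return lam * g + c * g**2

def P2(h, lam, c):
    """Inequality-constraint penalty of Chang et al. (Eq. 38); both branches
    match in value, slope, and curvature at h = 0, so it is twice
    differentiable as required by DDP."""
    return np.where(h > 0, lam * h * (1 + c * h), lam * h / (1 - c * h))
```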

In the primal-dual framework, the optimal control problem of Eq. 5 is recast as the following minimax problem. We omit theoretical justifications for the conciseness of the paper; interested readers may refer to the book of Bertsekas for detailed theory.34

\max_{\lambda_0, ..., \lambda_N} \; \min_{u_0, ..., u_{N-1}} \; \sum_{k=0}^{N-1} \tilde{L}_k(x_k, u_k) + \tilde{\varphi}(x_N)
\text{subject to } x_{k+1} = F_k(x_k, u_k) \text{ for } k = 0...N-1    (39)

This minimax problem is solved in two successive stages. In the first stage, λ_k is kept constant and the procedure of Section III is applied to perform the minimization with respect to u, without regard to the value of the constraints. In the second, an increment δλ_k is calculated (without requiring another calculation of the STMs) to move towards feasibility. The methodology applied for this second step is described in detail below. Since the procedure is common for any segment, consider for simplicity only p constraints at an arbitrary segment q.

2. Generation of multiplier partial derivatives

At segment k = q (constraints do not contribute for k > q), sensitivities of the cost with respect to the multipliers come from the added multiplier penalty function:

J_{\lambda,k} = P_{\lambda,k}, \quad J_{\lambda\lambda,k} = P_{\lambda\lambda,k}, \quad J_{x\lambda,k} = P_{x\lambda,k}, \quad J_{u\lambda,k} = P_{u\lambda,k}    (40)

At segments k < q, mapping of the control-free partials (see the next subsection on how to generate those) from segment k+1 to segment k provides the desired derivatives. Since the multipliers do not appear in the equations of motion, derivatives with respect to the multipliers only are straightforward:

J_{\lambda,k} = J_{\lambda,k+1}^*, \quad J_{\lambda\lambda,k} = J_{\lambda\lambda,k+1}^*    (41)

Cross derivatives are determined using the definition of the first-order STM and the chain rule. This is one advantage of the STM-based formulation: only a chain rule through the STM is calculated, without having to integrate more equations. Note this method can be generalized to get the partial derivatives of any function dependent on the augmented state at a particular time.

\begin{bmatrix} J_{x\lambda,k} \\ J_{u\lambda,k} \end{bmatrix} = J_{X\lambda,k} = \left( \frac{\partial X_{k+1}}{\partial X_k} \right)^T J_{X\lambda,k+1}^* = \Phi_k^{1T} \begin{bmatrix} J_{x\lambda,k+1}^* \\ 0_{n_u} \end{bmatrix}    (42)

3. Augmented control law and generation of control-free partials

To obtain the control law, we still do a quadratic approximation of the current cost-to-go, but we add some terms to account for the partials with respect to the Lagrange multipliers:

\delta J_k \approx J_{x,k}^T \delta x_k + J_{u,k}^T \delta u_k + \frac{1}{2} \delta x_k^T J_{xx,k} \delta x_k + \frac{1}{2} \delta u_k^T J_{uu,k} \delta u_k + \delta x_k^T J_{xu,k} \delta u_k + J_{\lambda,k}^T \delta\lambda_q + \frac{1}{2} \delta\lambda_q^T J_{\lambda\lambda,k} \delta\lambda_q + \delta x_k^T J_{x\lambda,k} \delta\lambda_q + \delta u_k^T J_{u\lambda,k} \delta\lambda_q    (43)

Making the gradient of this expression vanish with respect to δu_k, we get a control law of the form:

\delta u_k = A_k + B_k \, \delta x_k + C_{k,q} \, \delta\lambda_q    (44)

A_k and B_k are still given by Eq. 21 or, in the case when hard constraints are present, Eq. 33. C_{k,q} has similar expressions:

C_{k,q} = \begin{cases} -J_{uu,k}^{-1} J_{u\lambda,k} & \text{in case there is no hard constraint,} \\ -K J_{u\lambda,k} & \text{otherwise,} \end{cases}    (45)

where K is defined by Eq. 33. Replacing the control law in Eq. 43, we obtain extra expressions for the control-free partials with respect to the multipliers, in addition to Eq. 22b and Eq. 22c:

J_{\lambda,k}^{*T} = J_{\lambda,k}^T + J_{u,k}^T C_{k,q} + A_k^T J_{uu,k} C_{k,q} + A_k^T J_{u\lambda,k}    (46a)

J_{\lambda\lambda,k}^* = J_{\lambda\lambda,k} + C_{k,q}^T J_{uu,k} C_{k,q} + C_{k,q}^T J_{u\lambda,k} + J_{u\lambda,k}^T C_{k,q}    (46b)

J_{x\lambda,k}^* = J_{x\lambda,k} + B_k^T J_{uu,k} C_{k,q} + B_k^T J_{u\lambda,k} + J_{ux,k}^T C_{k,q}    (46c)

4. Multipliers increment

The last two procedures are repeated recursively in a backward sweep until the first segment is reached. The result is the final derivatives of the total cost with respect to the multipliers:

J_\lambda = J_{\lambda,0}^*, \quad J_{\lambda\lambda} = J_{\lambda\lambda,0}^*    (47)

Jacobson8 proves that the resulting J_{\lambda\lambda} should be negative definite under mild conditions. The optimal increment to maximize J is then \delta\lambda_q = -J_{\lambda\lambda}^{-1} J_\lambda. To limit the step magnitude and justify the quadratic expansions, this expression is modified using a step reduction parameter ǫ:

\delta\lambda_q = -\epsilon J_{\lambda\lambda}^{-1} J_\lambda \quad \text{with } \epsilon_{min} \leq \epsilon \leq 1    (48)

Plugging this step into the quadratic expansion of J with respect to λ, \delta J = ER + J_\lambda^T \delta\lambda + \frac{1}{2} \delta\lambda^T J_{\lambda\lambda} \delta\lambda, the updated expected reduction (note that it can now be an augmentation due to the negativity of J_{\lambda\lambda}) is easily derived:

ER = ER + (\epsilon^2/2 - \epsilon) J_\lambda^T J_{\lambda\lambda}^{-1} J_\lambda    (49)
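A minimal sketch of this update (NumPy; names are illustrative):

```python
# Minimal sketch of the multiplier update of Eqs. 48-49.
import numpy as np

def multiplier_step(J_lam, J_lamlam, ER, eps):
    """eps in [eps_min, 1] is the step reduction parameter."""
    newton = np.linalg.solve(J_lamlam, J_lam)
    d_lam = -eps * newton                            # Eq. 48
    ER_new = ER + (eps**2 / 2 - eps) * (J_lam @ newton)  # Eq. 49
    return d_lam, ER_new
```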

5. Improvement check and update of penalty parameters

First, to ensure the Taylor series expansions are accurate, actual and expected reductions must be compared. In addition, the KKT conditions should be satisfied more accurately: equality constraints should be reduced at each iteration, and for inequality constraints the criterion is a decrease in the quantity λ_k h_k, since it should tend to zero at the optimal KKT solution. To summarize, the corresponding criteria for a successful iteration are the following:

J_{new} - J \geq \frac{1}{2} ER    (50a)

|g_{k,new}^i| < |g_k^i| \quad \text{for } i = 1...n_{e_k}, \; k = 0...N    (50b)

|\lambda_{k,new}^{n_{e_k}+j} h_{k,new}^{n_{e_k}+j}| < |\lambda_k^{n_{e_k}+j} h_k^{n_{e_k}+j}| \quad \text{for } j = 1...n_{i_k}, \; k = 0...N    (50c)

If the above criteria are not satisfied, ǫ is reduced by half and the process is repeated. If ǫ reaches a certain threshold ǫ_min, no improvement can be obtained. In that case, in the current version of the algorithm, the penalty parameters are increased by a certain factor to give more weight to the constraints. Future work intends to update the penalty parameters at each iteration in a way that guarantees enough reduction towards feasibility. In fact, thanks to the STM approach, partial derivatives of each constraint taken individually can be obtained with limited computational effort using the same procedure as in Eq. 42. The expected reduction for each constraint could therefore be computed for a given penalty parameter, which could then be adjusted to change the expected reduction in order to meet a specified degree of infeasibility. This would give more physical sense to the penalty update than using an arbitrary monotonically increasing sequence.

VI. Summary of the HDDP algorithm

Step 0. Initialization
For each constraint, decide whether it is hard or soft. Select initial controls u_k for k = 0...N-1, initial Lagrange multipliers, initial penalty parameters, initial radius, and convergence thresholds. Calculate the trajectory, initial objective, and constraint values. Contrary to indirect methods, note that the algorithm is not hypersensitive to the initial Lagrange multiplier values, and simple guesses are sufficient in general. This statement also holds for initial control guesses.

Step 1. Computation of first-order and second-order STMs
Evaluate Φ1_k(t_{k+1}) and Φ2_k(t_{k+1}) for k = 0...N-1 in forward time. This is the most computationally intensive step of the algorithm. If available, multi-core computers or clusters can be used to perform this step in parallel (see next section).

Figure 6. Algorithm Flow Chart. Double lines indicate numerical integration.

Step 2. Backward Sweep with fixed multipliers
Perform the recursive mapping of control and state cost derivatives. Deduce the control law δu = A + Bδx (where A and B are given by Eq. 21, or Eq. 33 when hard constraints are present), and the corresponding expected reduction ER.

Step 3. Objective function improvement test
If ER is less than the convergence tolerance and constraint tolerances are met, the solution is converged. Otherwise, integrate the trajectory with the new control law. Update the trust region based on the trust region ratio ρ. If ρ < 0, GOTO step 2. If soft constraints are within tolerances, GOTO step 1.

Step 4. Backward Sweep with free multipliers
Perform the recursive mapping of multiplier cost derivatives. Deduce the augmented control law δu = A + Bδx + Cδλ (A and B have the same expressions as in step 2; C is given by Eq. 45), the cost derivatives with respect to the multipliers J_λ and J_λλ, and the corresponding expected reduction ER.

Step 5. Multipliers increment
Set δλ = -ǫ J_λλ^{-1} J_λ.

Step 6. Constraints improvement test
Integrate the trajectory with the new control law. IF |J_new - J| > ½(ǫ²/2 - ǫ)ER AND ǫ ≥ ǫ_min, set ǫ = ǫ/2 and GOTO step 5. ELSEIF ǫ < ǫ_min, GOTO step 7. OTHERWISE GOTO step 8.

Step 7. Penalty parameter updates
Set c_new = αc and GOTO step 2. α is an arbitrary positive number. Experience tells us that α = 2 is a good value for most problems. More elaborate update schemes can be implemented here.

Step 8. Controls and multipliers update
Set x_k = x_{k,new}, u_k = u_{k,new}, λ = λ_new, J = J_new, and c_new = c + γ. GOTO step 1. γ is another positive tuning parameter to slightly increase the penalty parameters at each iteration. This is optional since it is not required in ALM methods. We took γ = 10, but we believe any small value would work as well.

VII. Improvement of efficiency

It has been shown that the introduction of state transition matrices to compute the required partial derivatives provides several advantages. Nevertheless, their high computational cost, due to the necessity to integrate a large set of equations at each segment, poses an important problem for the efficiency of our algorithm. A problem with n states generally requires n² and n³ (or n(n² + n)/2 if the symmetry of the second-order STM is taken into account) additional equations to be integrated for the first- and second-order STMs, respectively. Below we mention possible approaches to significantly enhance the computational efficiency of our algorithm.

A. Parallelization of STM computations

Once the trajectory is integrated, the STMs at each segment can be computed independently from each other. The STM calculations can therefore be executed in parallel on a multicore machine or even a cluster to dramatically reduce the computation time (see figure 7). This is a major advantage over classical formulations, where the derivatives are interconnected and cannot be computed independently.

Figure 7. Parallelization of STM computations.
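As an illustration, a minimal sketch of this parallel evaluation (Python multiprocessing; it reuses the hypothetical propagate_stms routine sketched in Section III and assumes the supplied callables are picklable, e.g., defined at module level):

```python
# Minimal sketch of the parallel STM evaluation of figure 7. Each segment
# is independent once the nominal trajectory (X_k, t_k) has been stored.
from functools import partial
from multiprocessing import Pool

def stms_for_segment(k, f, fX, fXX, X_nodes, t_nodes, n):
    return propagate_stms(f, fX, fXX, X_nodes[k],
                          (t_nodes[k], t_nodes[k + 1]), n)

def all_stms_parallel(f, fX, fXX, X_nodes, t_nodes, n, workers=4):
    job = partial(stms_for_segment, f=f, fX=fX, fXX=fXX,
                  X_nodes=X_nodes, t_nodes=t_nodes, n=n)
    with Pool(workers) as pool:
        return pool.map(job, range(len(t_nodes) - 1))
```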

B. Adaptive mesh refinement

Low-thrust optimal control is inherently discontinuous, with a bang-bang structure. Since the location of the switching points is not known in advance, a fine equally-spaced mesh is required to obtain an accurate solution if the mesh is kept fixed during the optimization process. To use a coarser mesh and reduce the computational cost, one can employ an internal mesh optimization strategy that automatically increases the resolution where the control undergoes large variations in magnitude.43 This leads to an algorithm that is able to properly describe the optimal control discontinuities by creating a mesh that has nodes concentrated around the switching points.


C. Analytic State Transition Matrices

State transition matrices can be derived analytically for some problems.44 It is known that low-thrust optimization software utilizing analytic STMs enjoys impressive speed advantages compared to integrated counterparts.45,46 Our approach offers the possibility to use those analytic STMs instead of the integrated ones, which similarly enables tremendous computational time savings. This promising topic is not developed here but will be the subject of a future paper.

VIII. Numerical results

Two example problems are presented to test the performance of our algorithm and to demonstrate the capabilities of the different techniques selected.

A. One Dimensional Landing Problem

We start with a simple dynamical problem. The objective is to minimize fuel during the vertical descent of a lander with a fixed final time. Despite its apparent simplicity, this is a good benchmark problem because the equations of motion are linear with respect to the control, so the thrust will exhibit a bang-bang structure, which is hard to tackle with traditional DDP. The cost function to be minimized is J = -m(t_f), and the dynamics of the problem are governed by:

\dot{h} = v
\dot{v} = -g + T/m
\dot{m} = -T/(g_0 I_{sp})

The initial conditions h(0) = h_0, v(0) = v_0, and m(0) = m_0 are fixed, and the constraints are set to be:

h(t_f) = 0, \quad v(t_f) = 0, \quad 0 \leq T \leq T_{max}

Table 1. Normalized data of the problem

Parameter | Value
h_0       | 1.0
v_0       | -0.783
m_0       | 1.0
h_f       | 0.0
v_f       | 0.0
t_f       | 1.397
g_0 I_sp  | 2.349
T_max     | 1.227
g         | 1.0
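For reproducibility, a minimal sketch of the benchmark dynamics (Python/SciPy, using the normalized data of table 1; reading the product g_0·I_sp = 2.349 as the exhaust velocity, and the single-switch coast-then-burn control structure, are assumptions made only for illustration):

```python
# Minimal sketch of the landing benchmark dynamics (normalized units).
import numpy as np
from scipy.integrate import solve_ivp

g, ve, Tmax, tf = 1.0, 2.349, 1.227, 1.397   # ve assumed to be g0*Isp

def eom(t, y, thrust):
    h, v, m = y
    return [v, -g + thrust / m, -thrust / ve]   # hdot, vdot, mdot

def simulate(switch_time):
    """Coast, then burn at Tmax: a bang-bang profile with one switch."""
    T = lambda t: 0.0 if t < switch_time else Tmax
    sol = solve_ivp(lambda t, y: eom(t, y, T(t)), (0.0, tf),
                    [1.0, -0.783, 1.0], rtol=1e-10, atol=1e-10,
                    max_step=1e-3)   # small steps to resolve the switch
    return sol.y[:, -1]              # (h, v, m) at t_f
```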

Terminal constraints are considered to be soft, but control bounds are hard constraints. Numerical data are shown in table 1 and are taken from Ref. 47. Results are compared with two existing solvers, DIDO and BNDSCO. DIDO is considered a state-of-the-art direct solver; it discretizes states and controls using a pseudo-spectral method in order to efficiently fit globally orthogonal polynomials to the discrete data over the entire time span.47 However, like all collocation methods, it does not exactly capture the dynamics. On the other hand, BNDSCO is a multiple shooting indirect method that relies on optimal control theory to effectively solve boundary-value problems, yielding accurate solutions without discretization.48

Figure 8. Resulting solution with fixed mesh (thrust, normalized force, versus time, TU; HDDP, DIDO, and BNDSCO).

Figure 9. Resulting solution with adaptive mesh (thrust, normalized force, versus time, TU; HDDP, DIDO, and BNDSCO).

First, a 10-node equally-spaced mesh is given to DDP, and a 10-node mesh is given to DIDO with automatic spacing. The number of nodes is kept fixed during the optimization process for both algorithms. We can see in figure 8 that the DDP solution obtained is much closer to the exact solution than the DIDO one with the same number of nodes. The DDP solution differs from the exact one in only one segment, corresponding to the segment where the bang occurs. The mesh is therefore not fine enough to capture the optimal switching time. In figure 9, we associate our algorithm with an adaptive mesh strategy, while a finer grid of 30 nodes is given to DIDO. The resulting DDP solution then becomes almost exact with a resulting mesh of only 15 nodes, whereas DIDO still only approximates the solution despite using more nodes.

Figure 10. Evolution of constraint violation and final mass (norm of constraints and final mass versus number of iterations, for the penalty and augmented Lagrangian methods).

Also, it is interesting to check the efficiency of the Augmented Lagrangian method compared to the classical penalty method. Figure 10 presents the evolution of the constraint violation versus the number of iterations. It is clear that ALM leads to a much faster reduction of constraint violations. This is because it forces violations to be reduced at each iteration, which is not the case in the penalty method. The resulting reduction in the number of iterations is an important asset, especially for complex problems where each iteration is computationally expensive.

B. LEO-to-GEO orbital transfer

Finally, we examine a more complicated problem: the minimum-fuel optimization of a low-thrust circle-to-circle planar orbital transfer between LEO and GEO orbits. Numerical data used for the transfer are given in table 2. A fixed equally-spaced mesh of 50 nodes is used. Results are compared with the indirect solver T3D, dedicated to orbital transfers.49 Since it is an indirect method that does not discretize the controls, it gives "exact" locally optimal solutions. Figure 11 compares the thrust structure of the DDP solution with that of T3D. Despite the complexity of the structure, with multiple bangs, we can see that they agree quite closely. Except for a very fine bang at the beginning that is not captured, all the bangs are reproduced accurately. Future work intends to implement the adaptive mesh approach from the previous example to reproduce the switching structure more accurately.

Table 2. Data of the orbital transfer

Parameter            | Value
Initial orbit radius | 20000 km
Final orbit radius   | 42164 km
Maximum thrust       | 30 N
Isp                  | 31.5 s
Time of Flight       | 164160 s
Initial mass         | 2000 kg

[Figure 11: Time evolution of thrust. Thrust (N) vs. time (s) for T3D and DDP.]

[Figure 12: Planar trajectory of the transfer. y (m) vs. x (m).]


IX. Conclusion

In this paper, a new second-order algorithm based on Differential Dynamic Programming is proposed to solve challenging low-thrust trajectory optimization problems. The hybrid method builds upon several generations of successful, well-tested DDP and general nonlinear programming algorithms. A discretization is performed intelligently to reduce the number of control variables while retaining good accuracy. The present algorithm makes full use of the structure of the resulting discrete-time optimal control problem by mapping the required derivatives recursively through the first-order and second-order state transition matrices, in the main spirit of dynamic programming. Convergence properties are improved, and preliminary results demonstrate quadratic convergence even far from the optimal solution.

Constraints are included using two different procedures: an active-set constrained quadratic programming method for hard constraints (preferably linear), and an Augmented Lagrangian method for soft constraints. For the latter case, our STM-based approach is effective because no additional integrations are needed. The possible disadvantage of the additional CPU time per iteration to compute the STMs can also be outweighed by several benefits, such as the exploitation of the inherent parallel structure of our algorithm and the improved constraint handling. Further, the main computational effort, involving integrations of the trajectory and sensitivities, is decoupled from the main logic of the algorithm, making it modular and simpler to generalize and experiment with.

We implement and test the algorithm on one simple dynamics problem and one representative space trajectory problem. In both cases, we find robust convergence and improvements when compared to existing approaches. Additional work is needed to fully test the many properties of the algorithm across a wide spectrum of problems. Example complex problems of interest include gravity assists, planetary capture, multiple gravitating bodies, non-spherical potentials, and multi-moon tours.
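As a concrete picture of the STM machinery summarized above, the sketch below illustrates one way to obtain the first-order STM of a single segment by integrating the variational equation alongside the state; the second-order STM satisfies an analogous, tensor-valued equation and is omitted for brevity. Function names and tolerances are illustrative assumptions, not the implementation used in this paper.

```python
import numpy as np
from scipy.integrate import solve_ivp

def propagate_segment(f, jac, x0, t0, t1):
    """Integrate the state and its first-order STM over one segment.

    f(t, x)   : equations of motion (returns dx/dt)
    jac(t, x) : Jacobian df/dx
    Uses the variational equation dPhi/dt = (df/dx) Phi, Phi(t0) = I.
    """
    n = x0.size

    def rhs(t, z):
        x, Phi = z[:n], z[n:].reshape(n, n)
        return np.concatenate([f(t, x), (jac(t, x) @ Phi).ravel()])

    z0 = np.concatenate([x0, np.eye(n).ravel()])
    sol = solve_ivp(rhs, (t0, t1), z0, rtol=1e-10, atol=1e-10)
    xf = sol.y[:n, -1]
    Phi = sol.y[n:, -1].reshape(n, n)
    return xf, Phi

# Sensitivities chain across segments by composition, e.g.
#   Phi(t2, t0) = Phi(t2, t1) @ Phi(t1, t0),
# and each segment's integration is independent of the others,
# which is what makes the approach naturally parallelizable.
```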

Acknowledgments

The authors thank Greg Whiffen for his valuable insight, feedback, and general introductions to DDP-based methods.

References

1. Rayman, M. D., Varghese, P., and Livesay, L. L., "Results from the Deep Space 1 technology validation mission," Acta Astronautica, Vol. 47, No. 2, July 2000, pp. 475–487.
2. Rayman, M. D., Fraschetti, T. C., Raymond, C. A., and Russell, C. T., "Dawn: A mission in development for exploration of main belt," Acta Astronautica, Vol. 58, No. 11, June 2006, pp. 605–616.
3. "The Vision for Space Exploration," NASA, Publ. NP-2004-01-334-HQ, Feb. 2004.
4. von Stryk, O. and Bulirsch, R., "Direct and indirect methods for trajectory optimization," Annals of Operations Research, Vol. 37, No. 1, Dec. 1992, pp. 357–373.
5. Betts, J. T., "Survey of Numerical Methods for Trajectory Optimization," Journal of Guidance, Control, and Dynamics, Vol. 21, No. 2, 1998, pp. 193–207.
6. Kirk, D. E., Optimal Control Theory - An Introduction, Prentice-Hall Networks Series, Prentice-Hall Inc., Englewood Cliffs, NJ, 1970.
7. Betts, J. T., "Practical Methods for Optimal Control using Nonlinear Programming," Applied Mechanics Reviews, Vol. 55, No. 4, July 2002, p. 1368.
8. Jacobson, D. H. and Mayne, D. Q., Differential Dynamic Programming, Elsevier Scientific, New York, NY, 1970.
9. Liao, L. Z. and Shoemaker, C. A., "Advantages of differential dynamic programming over Newton's method for discrete-time optimal control problems," Technical Report, Cornell University, 1993.
10. Dreyfus, S. E., Dynamic Programming and the Calculus of Variations, Academic Press, New York, NY, 1965.
11. Bryson, A. E., Dynamic Optimization, Addison Wesley, Menlo Park, CA, 1999.
12. Liao, L. Z., "Optimal Control Approach for Large Scale Unconstrained Optimization Problems," 1995.
13. Whiffen, G. J., "Static/Dynamic Control for Optimizing a Useful Objective," U.S. Patent No. 6,496,741, Dec. 2002.
14. Patel, P. and Scheeres, D. J., "A Non-Linear Optimization Algorithm," AAS Paper 08-116, Jan. 2008.
15. Whiffen, G. J. and Sims, J., "Application of a Novel Optimal Control Algorithm to Low-Thrust Trajectory Optimization," AAS Paper 01-209, Feb. 2001.
16. Coleman, T. F. and Liao, A., "An efficient trust region method for unconstrained discrete-time optimal control problems," Computational Optimization and Applications, Vol. 4, No. 1, Jan. 1995, pp. 47–66.
17. Liao, L. Z. and Shoemaker, C. A., "Convergence in unconstrained discrete-time differential dynamic programming," IEEE Transactions on Automatic Control, Vol. 36, No. 6, June 1991, pp. 692–706.
18. Yakowitz, S. J., "The stagewise Kuhn-Tucker condition and differential dynamic programming," IEEE Transactions on Automatic Control, Vol. 31, No. 1, 1986, pp. 25–30.
19. Lin, T. C. and Arora, J. S., "Differential dynamic programming for constrained optimal control," Computational Mechanics, Vol. 9, No. 1, 1991, pp. 27–40.
20. Mayne, D. Q., "A Second-Order Gradient Method for Determining Optimal Control of Non-Linear Discrete Time Systems," International Journal of Control, Vol. 3, 1966, pp. 85–95.
21. Gershwin, S. and Jacobson, D. H., "A discrete-time differential dynamic programming algorithm with application to optimal orbit transfer," AIAA Journal, Vol. 8, 1970, pp. 1616–1626.
22. Dyer, P. and McReynolds, S., The Computational Theory of Optimal Control, Academic Press, New York, NY, 1970.
23. Yakowitz, S. J., "Algorithms and Computational Techniques in Differential Dynamic Programming," Control and Dynamical Systems: Advances in Theory and Applications, Vol. 31, Academic Press, New York, NY, 1989, pp. 75–91.
24. Bellman, R. and Dreyfus, S. E., Applied Dynamic Programming, Princeton University Press, Princeton, NJ, 1962.
25. Majji, M., Turner, J. D., and Junkins, J. L., "High Order Methods for Estimation of Dynamic Systems Part 1: Theory," AAS/AIAA Spaceflight Mechanics Meeting, Galveston, TX. To be published in Advances in Astronautical Sciences.
26. Park, R. S. and Scheeres, D. J., "Nonlinear Semi-Analytic Methods for Trajectory Estimation," Journal of Guidance, Control, and Dynamics, Vol. 30, No. 6, 2007, pp. 1668–1676.
27. Jacobson, D. H., Gershwin, S., and Lele, M., "Computation of optimal singular controls," IEEE Transactions on Automatic Control, Vol. 15, No. 1, Feb. 1970, pp. 67–73.
28. Bullock, T. E., Computation of Optimal Controls by a Method Based on Second Variations, Ph.D. Thesis, Department of Aeronautics and Astronautics, Stanford University, Palo Alto, CA, 1966.
29. Rodriguez, J. F., Renaud, J. E., and Watson, L. T., "Trust region augmented Lagrangian methods for sequential response surface approximation and optimization," Journal of Mechanical Design, Vol. 120, No. 1, 1998, pp. 58–66.
30. Whiffen, G. J., "Piecewise Continuous Control of Groundwater Remediation," U.S. Patent No. 5,813,798, Sept. 1998.
31. Whiffen, G. J. and Shoemaker, C. A., "Nonlinear Weighted Feedback Control of Groundwater Remediation Under Uncertainty," Water Resources Research, Vol. 29, No. 9, Sept. 1993, pp. 3277–3289.
32. Chang, L. C., Shoemaker, C. A., and Liu, P. L.-F., "Optimal Time-Varying Pumping Rates for Groundwater Remediation: Application of a Constrained Optimal Control Algorithm," Water Resources Research, Vol. 28, No. 12, 1992, pp. 3157–3173.
33. Li, G. and Mays, L. W., "Differential Dynamic Programming for Estuarine Management," Journal of Water Resources Planning and Management, Vol. 121, No. 6, Nov. 1995, pp. 455–462.
34. Bertsekas, D. P., Constrained Optimization and Lagrange Multiplier Methods, Academic Press, 1982.
35. Murray, D. M. and Yakowitz, S. J., "Constrained Differential Dynamic Programming and Its Application to Multireservoir Control," Water Resources Research, Vol. 15, No. 5, 1979, pp. 1017–1027.
36. Fletcher, R., Practical Methods of Optimization, Vol. 2, Wiley, New York, NY, 1981.
37. Kuhn, H. W. and Tucker, A. W., "Nonlinear programming," Proceedings of the 2nd Berkeley Symposium, University of California Press, Berkeley, CA, 1951, pp. 481–492.
38. Hestenes, M. R., "Multiplier and Gradient Methods," Journal of Optimization Theory and Applications, Vol. 4, 1969, pp. 303–320.
39. Powell, M. J. D., "A Method for Nonlinear Constraints in Minimization Problems," Optimization, edited by R. Fletcher, Academic Press, London and New York, 1969.
40. Birgin, E. G., Castillo, R. A., and Martinez, J. M., "Numerical Comparison of Augmented Lagrangian Algorithms for Nonconvex Problems," Computational Optimization and Applications, Vol. 31, No. 1, May 2005, pp. 31–55.
41. Chang, S. C., Chen, C. H., Fong, I. K., and Luh, P. B., "Hydroelectric generation scheduling with an effective differential dynamic programming algorithm," IEEE Transactions on Power Systems, Vol. 5, No. 3, Aug. 1990, pp. 737–743.
42. Ruxton, D. J. W., "Differential dynamic programming applied to continuous optimal control problems with state variable inequality constraints," Dynamics and Control, Vol. 3, No. 2, April 1993, pp. 175–185.
43. Jain, S., Multiresolution Strategies for the Numerical Solution of Optimal Control Problems, Ph.D. Thesis, School of Aerospace Engineering, Georgia Institute of Technology, Atlanta, GA, 2008.
44. Arsenault, J. L., Ford, K. C., and Koskela, P. E., "Orbit Determination Using Analytic Partial Derivatives of Perturbed Motion," AIAA Journal, Vol. 8, 1970, pp. 4–12.
45. Sims, J. A., Finlayson, P., Rinderle, E., Vavrina, M., and Kowalkowski, T., "Implementation of a Low-Thrust Trajectory Optimization Algorithm for Preliminary Design," AIAA Paper 2006-674, AAS/AIAA Astrodynamics Specialist Conference and Exhibit, Keystone, CO, Aug. 2006.
46. Russell, R. P. and Ocampo, C. A., "Optimization of a Broad Class of Ephemeris Model Earth-Mars Cyclers," Journal of Guidance, Control, and Dynamics, Vol. 29, No. 2, 2006, pp. 354–367.
47. Ross, I. M., "User's Manual for DIDO (Ver. PR.13): A MATLAB Application Package for Solving Optimal Control Problems," Technical Report 04-01.0, Naval Postgraduate School, Monterey, CA, Feb. 2004.
48. Oberle, H. J. and Grimm, W., "BNDSCO - A Program for the Numerical Solution of Optimal Control Problems," Internal Report No. 515-89/22, Institute for Flight Systems Dynamics, DLR, Oberpfaffenhofen, Germany, 1989.
49. Dargent, T. and Martinot, V., "An Integrated Tool for Low Thrust Optimal Control Orbit Transfers in Interplanetary Trajectories," Proceedings of the 18th International Symposium on Space Flight Dynamics, German Space Operations Center of DLR and European Space Operations Centre of ESA, Munich, Germany, Oct. 2004, p. 143.

