Cite published version: Lantoine, G., Russell, R. P., "A Hybrid Differential Dynamic Programming Algorithm for Constrained Optimal Control Problems, Part 2: Application," Journal of Optimization Theory and Applications, Vol. 154, No. 2, 2012, pp. 418-442, DOI: 10.1007/s10957-012-0038-1

A Hybrid Differential Dynamic Programming Algorithm for Constrained Optimal Control Problems. Part 2: Application*

Gregory Lantoine† and Ryan P. Russell‡

Abstract: In the first part of this paper series, a new solver, called HDDP, is presented for solving constrained, nonlinear optimal control problems. In the present paper, the algorithm is extended to include practical safeguards to enhance robustness, and four illustrative examples are used to evaluate the main algorithm and some variants. The experiments involve both academic and applied problems to show that HDDP is capable of solving a wide class of constrained, nonlinear optimization problems. First, the algorithm is verified to converge in a single iteration on a simple multi-phase quadratic problem with trivial dynamics. Successively more complicated constrained optimal control problems are then solved, demonstrating robust solutions to problems with as many as 7 states, 25 phases, 258 stages, 458 constraints, and 924 total control variables. The competitiveness of HDDP with respect to general-purpose, state-of-the-art NLP solvers is also demonstrated.

Key Words: Optimal control problems, differential dynamic programming, nonlinear large-scale problem

AMS Classification: 49L20 - Dynamic programming method

* Acknowledgements: This work was partially supported by Thales Alenia Space, and the authors thank Thierry Dargent for support and collaborations.
† Corresponding author, PhD Candidate, Georgia Institute of Technology, School of Aerospace Engineering, 270 Ferst Dr., Atlanta, Georgia, 30318, USA, [email protected].
‡ Assistant Professor, The University of Texas at Austin, Department of Aerospace Engineering and Engineering Mechanics, 1 University Station C0600, Austin, TX 78712-0235, USA, [email protected].
1 Introduction
Constrained, nonlinear optimal control problems are a major subject of interest and are useful in many fields [1]. The first part of this paper series presents the theoretical foundation of a new algorithm, called HDDP, developed to solve this type of problem. HDDP is a variant of the classical Differential Dynamic Programming technique [2] and relies on successive quadratic expansions of the cost, propagation and constraint functions. HDDP includes several standard nonlinear programming techniques (augmented Lagrangian, trust region, active set) to facilitate the inclusion of constraints in the formulation and to increase robustness. In addition, HDDP is based on a state transition matrix formulation which allows for a decoupling of the dynamics from the optimization, as opposed to other modern DDP variants [3, 4].
The purpose of this second part of the series is to numerically evaluate HDDP by solving a variety of optimal control problems. In addition, several algorithmic extensions of HDDP are introduced and some conclusions on their respective merits are drawn. Comparisons are reported with SNOPT [5] and IPOPT [6], two popular, state-of-the-art, general-purpose solvers.
The paper is organized as follows. Preliminaries are given first, where the problem formulation and an overview of the test cases are discussed. Secondly, some practical algorithmic extensions of HDDP, including several heuristics for safeguarding and improving the overall method, are provided. Then, the implementation of HDDP is validated using a simple linear quadratic test problem. In addition, an Earth-Mars rendezvous transfer problem is investigated to confirm the relationship between HDDP and the Pontryagin minimum principle (see section 4 of Part 1). The three algorithms are also compared in this example. Next, SNOPT and HDDP are used to solve a multi-revolution orbital transfer, and the scalability of both algorithms is studied as the problem size increases. Finally, beyond the linear quadratic example, the multi-phase capability of HDDP is tested on a complex multi-asteroid tour problem.
2 Preliminaries

2.1 Problem Formulation
In this two-part paper series, we consider multi-phase problems of the following generic form. Given a set of $M$ phases divided into $N_i$ stages per phase, minimize the objective function

$$J = \sum_{i=1}^{M} \left[ \sum_{j=1}^{N_i} L_{i,j}(x_{i,j}, u_{i,j}, w_i) + \varphi_i(x_{i,N_i+1}, w_i, x_{i+1,1}, w_{i+1}) \right], \qquad (1)$$

with respect to $u_{i,j}$ and $w_i$ for $i = 1\ldots M$, $j = 1\ldots N_i$, subject to the dynamical equations

$$x_{i,1} = \Gamma_i(w_i), \qquad (2)$$

$$x_{i,j+1} = F_{i,j}(x_{i,j}, u_{i,j}, w_i), \qquad (3)$$

the stage constraints

$$g_{i,j}(x_{i,j}, u_{i,j}, w_i) \le 0, \qquad (4)$$

the phase constraints

$$\psi_i(x_{i,N_i+1}, w_i, x_{i+1,1}, w_{i+1}) = 0, \qquad (5)$$

and the control bounds

$$u^L_{i,j} \le u_{i,j} \le u^U_{i,j}, \qquad w^L_i \le w_i \le w^U_i, \qquad (6)$$
where $N_i$ is the number of stages of the $i$th phase, $x_{i,j} \in \mathbb{R}^{n_{x,i}}$ are the states of dimension $n_{x,i}$ at phase $i$ and stage $j$, $u_{i,j} \in \mathbb{R}^{n_{u,i}}$ are dynamic controls of dimension $n_{u,i}$ at phase $i$ and stage $j$, $w_i \in \mathbb{R}^{n_{w,i}}$ are static controls (or parameters) of dimension $n_{w,i}$ associated with phase $i$, $\Gamma_i : \mathbb{R}^{n_{w,i}} \to \mathbb{R}^{n_{x,i}}$ are the initial functions of each phase, $F_{i,j} : \mathbb{R}^{n_{x,i}} \times \mathbb{R}^{n_{u,i}} \times \mathbb{R}^{n_{w,i}} \to \mathbb{R}^{n_{x,i}}$ are the transition functions that propagate the states across each stage, $L_{i,j} : \mathbb{R}^{n_{x,i}} \times \mathbb{R}^{n_{u,i}} \times \mathbb{R}^{n_{w,i}} \to \mathbb{R}$ are the stage cost functions, $\varphi_i : \mathbb{R}^{n_{x,i}} \times \mathbb{R}^{n_{w,i}} \times \mathbb{R}^{n_{x,i+1}} \times \mathbb{R}^{n_{w,i+1}} \to \mathbb{R}$ are the phase cost functions, $g_{i,j} : \mathbb{R}^{n_{x,i}} \times \mathbb{R}^{n_{u,i}} \times \mathbb{R}^{n_{w,i}} \to \mathbb{R}^{n_{g,i}}$ are the stage constraints, and $\psi_i : \mathbb{R}^{n_{x,i}} \times \mathbb{R}^{n_{w,i}} \times \mathbb{R}^{n_{x,i+1}} \times \mathbb{R}^{n_{w,i+1}} \to \mathbb{R}^{n_{\psi,i}}$ are the (boundary) phase constraints. Note that problems with general inequality phase constraints $\psi_i(x_{i,N_i+1}, w_i, x_{i+1,1}, w_{i+1}) \le 0$ can be reformulated in the above form by introducing slack variables. By convention, $i + 1 = 1$ for $i = M$. We suppose that all the functions are at least twice continuously differentiable, and that their first- and second-order derivatives are available (and possibly expensive to evaluate).
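To make the indexing of (1)-(6) concrete, the following minimal Python sketch rolls the states forward through the transition functions and accumulates the objective and the phase-constraint residuals. The `Phase` container and the function handles are illustrative assumptions on our part, not the HDDP data structures.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Phase:
    """Illustrative container for one phase of the generic problem (1)-(6)."""
    N: int                  # number of stages N_i
    Gamma: Callable         # initial function Gamma_i(w_i)
    F: List[Callable]       # transition functions F_{i,j}(x, u, w), one per stage
    L: List[Callable]       # stage cost functions L_{i,j}(x, u, w), one per stage
    phi: Callable           # phase cost phi_i(x_f, w_i, x_next, w_next)
    psi: Callable           # phase constraints psi_i(x_f, w_i, x_next, w_next)

def evaluate(phases: List[Phase], u, w):
    """Roll out the dynamics (2)-(3) and accumulate the objective (1) and the
    phase-constraint residuals (5). u[i][j] and w[i] hold the control variables."""
    M = len(phases)
    J, x_finals, psi_res = 0.0, [], []
    for i, ph in enumerate(phases):
        x = ph.Gamma(w[i])                      # x_{i,1} = Gamma_i(w_i)
        for j in range(ph.N):
            J += ph.L[j](x, u[i][j], w[i])      # stage cost L_{i,j}
            x = ph.F[j](x, u[i][j], w[i])       # x_{i,j+1} = F_{i,j}(x_{i,j}, u_{i,j}, w_i)
        x_finals.append(x)                      # x_{i,N_i+1}
    for i, ph in enumerate(phases):
        k = (i + 1) % M                         # convention: i + 1 = 1 when i = M
        x_next = phases[k].Gamma(w[k])          # x_{i+1,1} from (2)
        J += ph.phi(x_finals[i], w[i], x_next, w[k])
        psi_res.append(ph.psi(x_finals[i], w[i], x_next, w[k]))
    return J, psi_res
```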
Note that, contrary to HDDP, the optimal control problem formulation of (1) cannot be solved directly with the SNOPT and IPOPT solvers. An additional, cumbersome step is necessary to convert the problem of (1) into a more generic NLP problem and to construct the first-order Jacobian (and possibly the second-order Hessian) of the total objective and of the constraints with respect to all the control variables of the problem. In HDDP, on the other hand, these functional derivatives are treated internally to yield the descent direction at each stage (see Part 1), so no further step is needed. An interface has therefore been developed to greatly facilitate the use of the NLP solvers for the multi-phase optimal control problems we consider. An outline of the implementation of SNOPT and IPOPT to solve the problem formulation of (1) is given in [7] and is beyond the scope of this paper.
2.2 Overview of the Test Cases
As summarized in Table 1, this paper includes numerical test problems of various sizes and difficulty, representing both academic and real-world examples. This diversity is important for a valid assessment of the robustness, performance and capabilities of the different algorithms. Note that all examples (except the linear quadratic problem) use spacecraft dynamics.
Table 1: Characteristics of the test problems.

| Test Problem | # of variables | # of constraints | Number of phases M | Number of stages |
|---|---|---|---|---|
| Linear Quadratic | 36 | 12 | 2 | 10 |
| Earth-Mars | 120 | 46 | 1 | 40 |
| Multi-Revolution | 120 - 900 | 42 - 302 | 1 | 20 - 300 |
| GTOC4 | 924 | 458 | 25 | 258 |
All numerical tests are performed on an Intel Core 2 Duo (2.4 GHz) workstation under Windows XP 32-bit, using the default runtime options of the Intel Visual Fortran compiler (v.11.0.066). In addition, it is not practical to consider for each problem all possible option combinations of HDDP, SNOPT, and IPOPT. Therefore, the following specific settings are used for each solver:
• HDDP (see the first part of the paper series for the definition of each constant): $\epsilon_{opt} = 10^{-7}$, $\epsilon_{feas} = 10^{-5}$, $\Delta_0 = 0.01$, $\sigma_0 = 0.001$, $\kappa = 0.25$, $\epsilon_1 = 0.01$, $k_\sigma = 1.1$, $\epsilon_{SVD} = 10^{-8}$.
• SNOPT: step limit = $10^{-3}$, major feasibility tolerance = $10^{-5}$, major optimality tolerance = $10^{-7}$.
• IPOPT: tol = $10^{-5}$, nlp_scaling_method = none, mu_strategy = adaptive, linear_solver = mumps. If exact second-order derivatives are used, hessian_approximation = exact and step limit = 0.01^a; otherwise hessian_approximation = limited-memory and step limit = $10^{-3}$.

^a This option does not exist in the original IPOPT package, which could bias the results, so we modified the source code to be able to specify the step between two successive iterates. In addition, larger steps are allowed when exact second-order derivatives are used, since the estimated optimal step is expected to be more accurate in that case.
3 Algorithmic Improvements and Options within HDDP
A complete algorithm requires many features to achieve robustness and efficiency. The first part of the paper series presented the theoretical aspects and the main techniques at the core of HDDP. Below, we mention possible practical, additional approaches to enhance the computational robustness and efficiency of HDDP.
3.1 Safeguarding
We recall that at each iteration, the next control iterates are found by applying the following control laws:
$$\delta u_k = A_k + B_k\,\delta x_k + C_k\,\delta w + D_k\,\delta\lambda, \qquad (7a)$$

$$\delta\lambda^+ = A_{\lambda^+} + C_{\lambda^+}\,\delta w^+, \qquad (7b)$$

$$\delta w^+ = A_{w^+} + B_{w^+}\,\delta x^- + C_{w^+}\,\delta w^- + D_{w^+}\,\delta\lambda^-. \qquad (7c)$$
As described in Section 3.3.1 of Part 1, a trust region method is used to restrict the step of the control iterates to ensure that the second-order expansions stay valid. However, only the non-feedback terms $A_k$, $A_{\lambda^+}$, and $A_{w^+}$ of the different control laws are affected by the trust region procedure. The other feedback terms are not rigorously restricted since they depend on the current state and parameter deviations, which are not known a priori in the backward sweep. One way to better control the step length is to take the deviations of the previous forward sweep iterations and use them as a guess to estimate the magnitudes of the feedback terms.
The coefficient matrix terms are then truncated if their associated predicted steps are greater than some fraction of the non-feedback terms. For instance, the feedback matrix $B_k$ in (7a) is reset as:

$$B_k \leftarrow \frac{\eta_1 \|A_k\|}{\max\left(\eta_1 \|A_k\|,\ \|B_k\,\delta x_{prev}\|\right)}\, B_k, \qquad (8)$$
where η1 is a parameter set by the user (η1 = 10 in our implementation). In addition, to prevent occasional divergence, we zero the feedback matrices that have become suspiciously large.
Finally, for the forward run, we have found that step 4 of the algorithm (see Section 3.4.4 of Part 1) for computing the successor nominal policy must be modified. The modification is motivated by the fact that the successor policies, after accounting for all feedback terms, may 'step outside' the quadratic region despite the safeguarding heuristics of the backward sweep. To overcome this problem, the new iterate during the forward sweep is set according to an extra safeguarding rule:

$$\delta u_k \leftarrow \frac{\eta_2 \Delta}{\max\left(\eta_2 \Delta,\ \|\delta u_k\|\right)}\, \delta u_k, \qquad (9)$$
where η2 is another parameter set by the user. Note that adjusting the forward run in a way that violates the feedback law from the backward sweep can lead to discrepancies in the expected and actual reduction. It follows that this strategy must be used as an extreme safeguard, so a high value of η2 is recommended (η2 = 1000 in our implementation) to avoid this case as much as possible. Nevertheless, these safeguarding techniques are included by default in the standard HDDP algorithm to ensure appropriate robustness.
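The two safeguards amount to simple rescalings. The hedged Python sketch below mirrors (8) and (9) with the η values quoted in the text; the function and variable names are ours, and this is a sketch rather than the actual HDDP implementation.

```python
import numpy as np

def truncate_feedback(B_k, A_k, dx_prev, eta1=10.0):
    """Safeguard (8): shrink the feedback matrix B_k if its predicted step
    (based on the previous forward-sweep deviation dx_prev) exceeds a
    multiple eta1 of the non-feedback step A_k."""
    scale = eta1 * np.linalg.norm(A_k)
    predicted = np.linalg.norm(B_k @ dx_prev)
    return scale / max(scale, predicted) * B_k

def clip_forward_step(du_k, trust_radius, eta2=1000.0):
    """Safeguard (9): during the forward sweep, clip the full control update
    (feedforward plus feedback) to at most eta2 times the trust-region radius."""
    bound = eta2 * trust_radius
    return bound / max(bound, np.linalg.norm(du_k)) * du_k
```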
3.2 Trust Region Scaling
We recall that one HDDP iteration requires solving a succession of quadratic subproblems. As noted in Section 3.3.1 of Part 1, each subproblem is formulated as an elliptical trust region problem:

$$\min_{\delta u_k}\ J_{u,k}\,\delta u_k + \frac{1}{2}\,\delta u_k^T J_{uu,k}\,\delta u_k, \quad \text{such that } \|D\,\delta u_k\| \le \Delta, \qquad (10)$$
where $\Delta$ is the current trust region radius, and $D$ is a positive definite diagonal scaling matrix that determines the geometrical shape of the trust region. From our experience, we find that the overall optimization algorithm usually behaves differently for different scaling matrices, particularly if the terms in the matrices differ by orders of magnitude. Unfortunately, for nonlinear optimization problems, it is not clear how to determine the scaling matrix $D$ to obtain good efficiency and robustness (primarily because scaling is more of an art than a science). In fact, the sensitivities of the cost-to-go function with respect to changes in the variables might vary drastically from one iteration to another due to nonlinearities. In our implementation, we offer two alternatives.
The first possibility is to fix the scaling matrix for all iterations. The matrix is set by the user at the start of the optimization (identity matrix by default). The scaling matrix can be determined at the beginning by independently estimating the quadratic region of validity for each variable. To that end, we find the maximum change of each variable that keeps the predicted reduction coinciding with the actual reduction of the nonlinear cost function. This strategy is the default algorithmic option in HDDP.
The second alternative is to reset the scaling matrix at each iteration so that the eigenvalues of the scaled Hessian $D^{-1} J_{uu} D^{-T}$ have a more balanced distribution, which can be viewed as a means of preconditioning the subproblem. In our implementation, we use a simple diagonal Hessian preconditioning:

$$D_{ii} = \sqrt{\max\left(\left|J_{uu}^{ii}\right|,\ \epsilon\right)}, \qquad (11)$$

where $\epsilon$ is the relative machine precision. In our experience, while this strategy works well in some cases, fixing the scaling matrix at the beginning seems to be more robust, especially if the user has some knowledge about the sensitivities of the problem.
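As an illustration of this second alternative, a minimal sketch of the diagonal scaling (11) is given below; `eps` stands for the relative machine precision and the function name is our own.

```python
import numpy as np

def diagonal_trust_region_scaling(Juu):
    """Diagonal preconditioning (11): D_ii = sqrt(max(|Juu_ii|, eps)), so that
    the scaled Hessian inv(D) @ Juu @ inv(D).T has a more balanced spectrum."""
    eps = np.finfo(float).eps
    d = np.sqrt(np.maximum(np.abs(np.diag(Juu)), eps))
    return np.diag(d)
```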
3.3 Treatment of Control Bounds
In the standard HDDP algorithm presented in Part 1, control bounds are treated using a trust-region range-space method. One drawback of this method is that the trust region computation is performed first, and then the resulting shifted Hessian is reduced to account for the constraints. This may lead to numerical difficulties since the trust region step may underestimate the size of the components along the constraints. This undesirable side-effect is especially pronounced when some control bounds are active. For instance, in Figure 1, the left-hand side shows a situation where the unconstrained trust region step is mainly along a direction that violates a control bound. The contribution of the unconstrained variable is much smaller than that of the fixed variable, and thus numerically swamped. On the right-hand side, the corresponding feasible direction left after reduction of the Hessian is artificially small and not representative of a full trust region step.

Figure 1: Negative effect of bounds on trust region step estimations.
To avoid this shortcoming, we use a different method to account specifically for control bounds. First, as before, we compute an unconstrained trust region step $\delta u_k^*$ to estimate the set of active bound constraints. Secondly, the control variables that lie on or outside their bounds are assigned a non-feedback $\delta u_k$ that places them directly on their bounds, and the corresponding feedback matrices are zeroed. Next, the Hessian $J_{uu,k}$ and gradient $J_{u,k}$ are reduced to remove the rows and columns that correspond to the fixed control variables. A second trust region problem is then solved with the reduced Hessian and gradient^b. The full size of the trust region is thus guaranteed to be used on the free control variables. Note that this technique is a special case of null-space methods that construct a reduced Hessian $Z^T J_{uu,k} Z$ and a reduced gradient $Z^T J_{u,k}$, where $Z$ is a full-rank matrix that spans the null space of the active linearized constraints (in other words, $\tilde{g}_{u,k} Z = 0$). Null-space methods are successfully implemented in state-of-the-art NLP solvers [5, 8]. Future work will therefore aim to generalize the outlined procedure to all nonlinear stage constraints.

^b If nonlinear stage constraints are present, they are handled with the range-space method described in the previous subsection.
However, this method to enforce control bounds is more computationally intensive because two trust region computations are necessary. Another idea for the treatment of the control bound constraints is to use the affine scaling interior-point method introduced by Coleman and Li [9]. Interior-point approaches are attractive for problems with a large number of active bounds since the active set does not need to be estimated. In this method, the scaling matrix $D$ of the trust region technique (see (10)) is a diagonal matrix whose diagonal elements are determined by the distance of the control iterates to the bounds and by the direction of the gradient:
$$D_{pp} = \begin{cases}
\dfrac{1}{\sqrt{[u^U_k - u_k]_p}}, & \text{if } [J_{u,k}]_p < 0 \text{ and } [u^U_k]_p < \infty,\\[2mm]
1, & \text{if } [J_{u,k}]_p < 0 \text{ and } [u^U_k]_p = \infty,\\[2mm]
\dfrac{1}{\sqrt{[u_k - u^L_k]_p}}, & \text{if } [J_{u,k}]_p \ge 0 \text{ and } [u^L_k]_p > -\infty,\\[2mm]
1, & \text{if } [J_{u,k}]_p \ge 0 \text{ and } [u^L_k]_p = -\infty.
\end{cases} \qquad (12)$$
In general, the null-space and interior-point methods are both effective for the treatment of bounds. Finally, note that both approaches require starting with a solution that strictly satisfies the bound constraints. It might therefore be necessary to modify the user-provided initial point so that infeasible control components are projected onto the boundary. The range-space method described in the first paper could also be used initially.
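For reference, the affine-scaling matrix of (12) can be sketched as follows; the loop form is for clarity only and assumes, as noted above, that the iterate lies strictly inside its bounds. The function name is ours.

```python
import numpy as np

def coleman_li_scaling(u, Ju, u_low, u_up):
    """Affine-scaling diagonal matrix of (12): each entry depends on the sign of
    the gradient and on the distance of u to the nearest relevant (finite) bound.
    Assumes u is strictly inside its bounds, as required by the method."""
    d = np.ones_like(u)
    for p in range(len(u)):
        if Ju[p] < 0.0:
            if np.isfinite(u_up[p]):
                d[p] = 1.0 / np.sqrt(u_up[p] - u[p])
        else:
            if np.isfinite(u_low[p]):
                d[p] = 1.0 / np.sqrt(u[p] - u_low[p])
    return np.diag(d)
```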
3.4 Filtering Method for Accepting New Iterates
To accept an iterate, an extra case can be distinguished compared to the algorithm presented in Part 1. First, if the predicted-versus-actual reduction ratio $\rho$ belongs to the interval $[1 - \epsilon_1\epsilon_2,\ 1 + \epsilon_1\epsilon_2]$, where $\epsilon_2 \gg 1$, the approximations are not as accurate, but we do not simply throw away the trial iterate. Instead, we give it another chance by testing whether it can be accepted by a filter criterion. The filter concept originates from the observation that the solution of the optimal control problem consists of the two competing aims of minimizing the cost function and minimizing the constraint violations. Hence it can be seen as a bi-objective problem. Fletcher and Leyffer [10] propose the use of a Pareto-based filtering method to treat this problem. A filter $\mathcal{F}$ is a list of pairs $(h, f)$ such that no pair dominates any other. A pair $(h_1, f_1)$ is said to dominate another pair $(h_2, f_2)$ if and only if both $f_1 \le f_2$ and $h_1 \le h_2$. In our case, the pair corresponds to the cost and infeasibility values:
$$h = \frac{1}{M}\sum_{i=1}^{M}\left[\sum_{j=1}^{N_i} L_{i,j}(x_{i,j}, u_{i,j}, w_i) + \varphi_i(x_{i,N_i+1}, w_i, x_{i+1,1}, w_{i+1})\right], \qquad (13a)$$

$$f = \sqrt{\frac{1}{M}\sum_{i=1}^{M} \left\|\psi_i(x_{i,N_i+1}, w_i, x_{i+1,1}, w_{i+1})\right\|^2}. \qquad (13b)$$
A natural requirement for a new iterate is that it should not be dominated by previous iterates. Hence, when $h_{new} < h_k$ or $f_{new} < f_k$ for all $(h_k, f_k) \in \mathcal{F}$, we accept the new iterate and add it to the filter. All entries that are dominated by the new iterate are removed from the filter. The advantage of the filter method in our algorithm is to increase the opportunity for iterates to be accepted, which is likely to accelerate convergence. Note that the definitions in (13a) and (13b) for optimality and feasibility play a significant role in successful Pareto filter implementations. As an example, the true performance index $J$ includes the augmented Lagrangian, whose extra term is not included in the $f$ definition. Therefore, excessively large penalty weights can overwhelmingly favor feasibility, while the optimality moves in the dominated direction. In such cases, we simply rely on the successful iterates that satisfy the more conservative condition that $\rho$ is within the interval $[1 - \epsilon_1,\ 1 + \epsilon_1]$.
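A minimal sketch of the Pareto filter test is shown below; the filter is stored as a plain list of (h, f) pairs and the function names are illustrative.

```python
def dominates(pair_a, pair_b):
    """Pair a = (h_a, f_a) dominates pair b iff h_a <= h_b and f_a <= f_b."""
    return pair_a[0] <= pair_b[0] and pair_a[1] <= pair_b[1]

def filter_accept(filter_list, h_new, f_new):
    """Accept the trial iterate if it is not dominated by any filter entry, i.e.
    h_new < h_k or f_new < f_k for all (h_k, f_k) in the filter. On acceptance,
    entries dominated by the new pair are removed and the new pair is added."""
    candidate = (h_new, f_new)
    if any(dominates(entry, candidate) for entry in filter_list):
        return False, filter_list
    kept = [entry for entry in filter_list if not dominates(candidate, entry)]
    kept.append(candidate)
    return True, kept
```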
3.5 Parallelization of STM Computations
Once the trajectory is integrated, the STMs of each segment can be computed independently from each other. The STM calculations can therefore be executed in parallel on a multicore machine or even a cluster to dramatically reduce the computation time (see Figure 2). This is a major advantage over classical (Riccati-like) formulations, where the derivatives are interconnected and cannot be computed independently. This strategy is not tested in this paper and is left as future work.

Figure 2: Parallelization of STM computations.
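As a rough illustration of the idea (untested here, like the strategy itself), the segment-wise STM evaluations could be dispatched to a worker pool; `compute_segment_stm` is an assumed user-supplied routine returning the first- and second-order STMs of one segment.

```python
from concurrent.futures import ProcessPoolExecutor

def compute_all_stms(segment_states, compute_segment_stm, n_workers=4):
    """Compute the first- and second-order STMs of every segment in parallel.
    compute_segment_stm(x_k) is an assumed function returning (Phi1_k, Phi2_k)
    for the segment starting at state x_k; the segments are independent once
    the nominal trajectory has been integrated."""
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(compute_segment_stm, segment_states))
```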
3.6 Adaptive Mesh Refinement
Many optimal control problems are inherently discontinuous with bang-bang control structures. Since the locations of the switching points are unknown in advance, a fine equally-spaced mesh is required to obtain an accurate solution if the mesh is kept fixed during the optimization process. To use a coarser mesh and reduce the computational cost, HDDP could be implemented with an internal mesh optimization strategy that automatically increases the resolution when the control undergoes large variations in magnitude [11]. Such a refinement can properly describe the optimal control discontinuities by creating a mesh that has nodes concentrated around the switching points. This refinement is not considered in this paper and is left as future work.
3.7 Analytic State Transition Matrices
State transition matrices can be derived analytically for some problems [12]. It is known that spacecraft trajectory optimization software utilizing analytic STMs enjoys impressive speed advantages compared to integrated counterparts [13, 14]. Our HDDP framework offers the possibility to use these analytic STMs, which similarly enables tremendous computational time savings. This promising topic is considered in [15], and analytic STMs are used extensively in the following examples to speed up computations.
4 Validation of HDDP: Linear Quadratic Problem
The first part of the paper series outlined the theory and the mathematical equations that govern the HDDP algorithm. As a preliminary validation check, we propose to solve a linear system with a quadratic performance index and linear constraints. Powell proved that methods based on augmented Lagrangian functions should converge exactly in one iteration for this kind of problem [16]. This follows from the fact that the augmented cost function remains quadratic when linear constraints are included (they are only multiplied by the Lagrange multiplier).
To test the complete algorithm, we consider a simple multi-phase targeting problem with 2 phases ($M = 2$) and 5 stages per phase ($N_1 = N_2 = 5$). The states are governed by the controls only, and the transition functions $F_{i,j}$ (see (3)) acting on each stage are given by:
$$x_{i,j+1} = F_{i,j}(x_{i,j}, u_{i,j}) = \begin{bmatrix} r_{i,j+1} \\ v_{i,j+1} \end{bmatrix} = \begin{bmatrix} r_{i,j} + v_{i,j} \\ v_{i,j} + u_{i,j} \end{bmatrix} \quad \text{for } i = 1\ldots 2,\ j = 1\ldots 5. \qquad (14)$$
The states are the position and velocity, and the controls are directly related to the acceleration. At each stage, the following quadratic cost function $L_{i,j}$ is considered:

$$L_{i,j} = \|u_{i,j}\|^2 \quad \text{for } i = 1\ldots 2,\ j = 1\ldots 5. \qquad (15)$$
The phase constraints $\psi_1$ between the two phases enforce the continuity of the states:

$$\psi_1(x_{1,6}, x_{2,1}) = x_{2,1} - x_{1,6} = 0. \qquad (16)$$
The final constraint $\psi_2$ targets an arbitrary point in space:

$$\psi_2(x_{2,6}) = r_{2,6} - [1, -1, 0]^T = 0. \qquad (17)$$
The initial states of the first phase are fixed: $x_{1,1} = [1, 1, 1, 1, 1, 1]$. The initial guesses of the controls and the first states of the second phase are simply zero: $x_{2,1} = [0, 0, 0, 0, 0, 0]$ and $u_{i,j} = [0, 0, 0]$ for $i = 1\ldots 2$, $j = 1\ldots 5$.
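For concreteness, the short sketch below propagates one phase of this test case with the transition function (14) and accumulates the quadratic stage costs (15); the function and variable names are ours.

```python
import numpy as np

def rollout_phase(r0, v0, controls):
    """Propagate one phase of the linear quadratic problem with (14) and
    accumulate the quadratic stage costs (15). controls has shape (5, 3)."""
    r, v, cost = np.array(r0, float), np.array(v0, float), 0.0
    for u in controls:
        cost += float(np.dot(u, u))      # L_{i,j} = ||u_{i,j}||^2
        r, v = r + v, v + u              # r_{j+1} = r_j + v_j,  v_{j+1} = v_j + u_j
    return r, v, cost

# Example: phase 1 starts from the fixed initial state of the problem with zero controls.
r_end, v_end, J1 = rollout_phase([1, 1, 1], [1, 1, 1], np.zeros((5, 3)))
```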
Figure 3: Controls (left) and states (right) of the optimal solution.
Figure 3 shows the converged solution obtained by HDDP. As expected, HDDP converges to the optimal solution in one iteration (when all the safeguards are fully relaxed). Experiments with other initial conditions and target constraints are consistent, yielding single iteration optimal solutions.
5 Earth-Mars Rendezvous Transfer
An example problem for a classic Earth-Mars rendezvous transfer is presented to compare the solvers and point out the coupling between HDDP and indirect methods. We maximize the final mass of the spacecraft, and the time of flight is fixed and equal to 348.79 days. The spacecraft has a 0.5 N thruster with a specific impulse $I_{sp}$ of 2000 s. The initial mass of the spacecraft is 1000 kg. Planets are considered massless, so only the gravitational force of the Sun is taken into account. As a consequence, we use only one phase to describe the trajectory: $M = 1$. Along the trajectory, the spacecraft state vector is defined by 7 variables: position vector, velocity vector and mass,

$$x = (r, v, m). \qquad (18)$$
We consider a launch date on April 10th, 2007. The corresponding states of the Earth at this date are obtained with the JPL ephemerides DE405: $r_0 = [-140699693, -51614428, 980]$ km, $v_0 = [9.774596, -28.07828, 4.337725\times 10^{-4}]$ km/s. The terminal constraints impose a rendezvous with Mars:

$$\psi_f = \begin{bmatrix} r_f - r_M(t_f) \\ v_f - v_M(t_f) \end{bmatrix}. \qquad (19)$$

From the JPL ephemerides DE405, the targeted states are: $r_M(t_f) = [-172682023, 176959469, 7948912]$ km, $v_M(t_f) = [-16.427384, -14.860506, 9.21486\times 10^{-2}]$ km/s.
The low-thrust spacecraft trajectory is approximated as a series of impulsive $\Delta V$'s connected by coast arcs. From classical two-body mechanics, these arcs are computed analytically using a standard Kepler solver through the "f and g" procedure presented by Bate et al. [17]. The mapping between the states can therefore be defined analytically on each stage with the following closed-form transition function $F_k$:

$$x_{k+1} = \begin{bmatrix} r_{k+1} \\ v_{k+1} \\ m_{k+1} \end{bmatrix} = F_k(x_k, \Delta v_k) = \begin{bmatrix} f\, r_k + g\,(v_k + \Delta v_k) \\ \dot{f}\, r_k + \dot{g}\,(v_k + \Delta v_k) \\ m_k \exp\!\left(-\dfrac{\|\Delta v_k\|}{g_0 I_{sp}}\right) \end{bmatrix}, \qquad (20)$$

where $f$ and $g$ are the Lagrange coefficients. The mass discontinuity, due to the impulse, is obtained from the rocket equation. The corresponding first- and second-order STMs are also computed analytically [18, 15]. A fixed equally-spaced mesh of 40 stages is used, which corresponds to 40 impulses separated by 8.7 days. The initial guess of the controls is zero.
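For illustration, the sketch below implements a stage transition of the form (20): a universal-variable Kepler solve in the spirit of Bate et al. [17] supplies the Lagrange coefficients, the impulse is added to the velocity, and the mass is updated with the rocket equation. The function names, the simple Newton iteration, and the units (km, km/s, s, kg) are our own assumptions; the analytic STMs used by HDDP are not reproduced here.

```python
import numpy as np

def stumpff_C(z):
    if z > 0:  return (1.0 - np.cos(np.sqrt(z))) / z
    if z < 0:  return (np.cosh(np.sqrt(-z)) - 1.0) / (-z)
    return 0.5

def stumpff_S(z):
    if z > 0:
        s = np.sqrt(z);  return (s - np.sin(s)) / s**3
    if z < 0:
        s = np.sqrt(-z); return (np.sinh(s) - s) / s**3
    return 1.0 / 6.0

def lagrange_fg(r0, v0, dt, mu):
    """Universal-variable Kepler solve: returns the Lagrange coefficients
    f, g, fdot, gdot for a ballistic coast of duration dt."""
    r0n = np.linalg.norm(r0)
    vr0 = np.dot(r0, v0) / r0n
    alpha = 2.0 / r0n - np.dot(v0, v0) / mu     # reciprocal of the semi-major axis
    chi = np.sqrt(mu) * abs(alpha) * dt         # initial guess for the universal anomaly
    for _ in range(60):                         # Newton iteration on the universal Kepler equation
        z = alpha * chi**2
        C, S = stumpff_C(z), stumpff_S(z)
        F = (r0n * vr0 / np.sqrt(mu) * chi**2 * C
             + (1.0 - alpha * r0n) * chi**3 * S + r0n * chi - np.sqrt(mu) * dt)
        dF = (r0n * vr0 / np.sqrt(mu) * chi * (1.0 - z * S)
              + (1.0 - alpha * r0n) * chi**2 * C + r0n)
        step = F / dF
        chi -= step
        if abs(step) < 1e-10:
            break
    z = alpha * chi**2
    C, S = stumpff_C(z), stumpff_S(z)
    f = 1.0 - chi**2 / r0n * C
    g = dt - chi**3 / np.sqrt(mu) * S
    rn = np.linalg.norm(f * r0 + g * v0)
    fdot = np.sqrt(mu) / (rn * r0n) * (z * S - 1.0) * chi
    gdot = 1.0 - chi**2 / rn * C
    return f, g, fdot, gdot

def stage_transition(r_k, v_k, m_k, dv_k, dt, mu, isp=2000.0, g0=9.80665e-3):
    """Stage transition in the form of (20): impulsive dv, analytic coast,
    rocket-equation mass update (g0 in km/s^2 for km-based units)."""
    v_plus = v_k + dv_k
    f, g, fdot, gdot = lagrange_fg(r_k, v_plus, dt, mu)
    r_next = f * r_k + g * v_plus
    v_next = fdot * r_k + gdot * v_plus
    m_next = m_k * np.exp(-np.linalg.norm(dv_k) / (g0 * isp))
    return r_next, v_next, m_next
```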
The problem is solved with HDDP, SNOPT and IPOPT. Note that, for IPOPT, two different cases are considered: 1) first- and second-order derivatives provided, and 2) first-order derivatives only provided. Regarding HDDP, we consider the following basic variants:

• standard: the algorithm as described in Part 1 of the paper series.
• unsafe: no safeguards are used.
• scaling: an automatic trust region scaling is used, with components of the scaling matrix updated at each iteration (see Subsection 3.2).
• reduc: the Hessian is reduced to account for control bounds before solving the trust region subproblem (see Subsection 3.3).
• inter: the affine scaling interior-point method is used to treat control bounds (see Subsection 3.3).
• filter: the filtering method is considered for the acceptance or rejection of an iterate (see Subsection 3.4).

The results of the optimizations are given in Table 2. Figures 5, 6 and 7 show the thrust profiles of the optimal solution from each solver. The trajectory of the final optimal solution is given in Figure 4. The values of the Lagrange multipliers of the final constraints are given in Table 3 to test the similarity between HDDP and the NLP solvers.
Figure 4: Optimal Earth-Mars Rendezvous trajectory.

Figure 5: Thrust profile from SNOPT.
Figure 6: Thrust profile from IPOPT (First-order version).

Figure 7: Thrust profile from HDDP (standard version).

Table 2: Comparison of results from different solvers.

| Solver | mf (kg) | # of function calls | # of derivative function calls | CPU Time (s) |
|---|---|---|---|---|
| SNOPT | 598.66 | 439 | 439 | 10 |
| IPOPT 1 | 598.66 | 1821 | 1816 | 90 |
| IPOPT 2 | 598.66 | 304 | 249 | 762 |
| HDDP standard | 598.66 | 1419 | 1360 | 69 |
| HDDP unsafe | FAILED | - | - | - |
| HDDP scaling | 598.66 | 5498 | 4123 | 203 |
| HDDP reduc | 598.66 | 1239 | 1193 | 60 |
| HDDP inter | 598.66 | 3520 | 3018 | 145 |
| HDDP filter | 598.66 | 1329 | 1262 | 64 |
We can see that Figure 5, Figure 6 and Figure 7 show good agreement, so all solvers find the same solution with the same thrust profile. SNOPT is by far the fastest solver. Interestingly, second-order IPOPT requires the fewest number of iterations, but its overall CPU time is the largest. This comes from the fact that the computation and construction of the second-order Hessian of the problem is very expensive. The standard HDDP solver compares reasonably well for this problem. However, HDDP is more intended for large-scale problems, and a more suited example is provided in the next section. As expected, there is a substantial decrease in the number of function calls when the Hessian is reduced before solving the trust-region subproblem (reduc variant). On the other hand, the interior-point method (inter variant) performs the worst out of all the methods. That poor performance may be partly explained by the fact that the scaling matrix of the trust region procedure is likely to change from one iteration to another, which may interfere with the update of the trust region radius. The filter variant appears to decrease the computational time, although not dramatically. This improved performance may be explained by the aggressive nature of this strategy, which tends to accept iterates more frequently. For this problem, the scaling variant deteriorates the performance compared with the standard algorithm. This result illustrates the difficulty in designing automatic scaling procedures. Finally, we can see that the safeguarding techniques are crucial (as expected) since the algorithm does not converge without them. Note in Table 3 that the values of the Lagrange multipliers from HDDP roughly match those of SNOPT and IPOPT, which indicates that this NLP-like feature of HDDP is working well, and the resulting multipliers could be used as efficient guesses for pure direct methods.

Table 3: Comparison of the Lagrange multipliers of the constraints.

| Solver | Lagrange Multipliers |
|---|---|
| SNOPT^c | [0.4804, -1.2011, -0.2510, 0.1151, 1.9604, 0.1265] |
| IPOPT 1 | [0.4802, -1.1941, -0.2492, 0.1173, 1.9472, 0.1255] |
| IPOPT 2 | [0.4810, -1.2037, -0.2511, 0.1145, 1.9643, 0.1262] |
| HDDP standard | [0.5095, -1.2700, -0.2665, 0.1178, 2.0701, 0.13404] |

^c We point out that the signs of the Lagrange multipliers from SNOPT are switched to account for the different conventions.

In addition, we test the validity of the claim of Section 4 of the first part of the paper series regarding the correspondence between the initial values of the co-states and the initial values of $J_x$ (the sensitivities of the performance index with respect to the states) in HDDP. We find that $J_{x,0} = [-0.96759, -1.32018, -8.8556\times 10^{-2}, -0.64969, -1.56202, 0.37153, 6.47488\times 10^{-2}]$. When the problem is solved using an indirect method, we have $\lambda_0 = [-0.87165, -1.14978, -8.75855\times 10^{-2}, -0.54003, -1.40597, 0.33121, -0.52092]$. The HDDP and indirect values are clearly related. The discrepancies likely result from the discretization and the use of approximated dynamics.
In fact, Figure 8 shows the average errors in the initial co-states for the same problem solved with varying numbers of stages. Clearly, the errors decrease rapidly as the number of discretization points increases. The HDDP values for N = 40 are then given as initial guesses to the indirect optimization procedure. It is found that the indirect algorithm converges in only a few iterations. This ease of convergence demonstrates that the HDDP solution can be used as a good initial guess for an indirect formulation.

Figure 8: Co-state error (relative to indirect values) as a function of the number of discretization points.

Figure 9: Distribution of the number of iterations corresponding to 100 random initial guesses.
Finally, the robustness of HDDP is tested by generating 100 random initial guesses. For each stage, assuming uniform distributions, the magnitude and angles of the starting control guesses are randomly selected in the intervals $[0, T_{max}]$ and $[0, 2\pi]$, respectively. It is found that HDDP is able to converge to the same optimal solution for all initial guesses. This result shows that the radius of convergence of HDDP is very large for this problem. The distribution of the number of iterations of the runs is plotted in Figure 9. The mean is 1817 iterations and the standard deviation is 403 iterations. Surprisingly, the number of iterations corresponding to the zero initial guess (see Table 2) is less than the mean number. This observation can be explained by the fact that the non-zero random initial guesses can go significantly beyond the Mars orbit, especially if the random thrust magnitudes are high.
6 Multi-Revolution Orbital Transfer
This example is a more complicated spacecraft trajectory problem concerning the minimum-fuel optimization of a low-thrust orbital transfer from the Earth to a circular orbit. Again, we use only one phase to describe the trajectory: $M = 1$. The $I_{sp}$ is assumed to be constant and equal to 2000 s. The initial states (position, velocity, mass) are the same as in the previous example. The objective is to maximize the final mass. The analytical Kepler model described in the previous example (see (20)) is chosen to propagate the stages. Final constraints enforce that the spacecraft reach a final circular orbit with radius $a_{target} = 1.95$ AU. The square of the eccentricity is used in the second constraint to obtain continuous derivatives:
$$\psi_f = \begin{bmatrix} a_f - a_{target} \\ e_f^2 \end{bmatrix}, \qquad (21)$$

where $a_f$ and $e_f$ can be expressed as functions of the final states:

$$a_f = \frac{1}{2/\|r_f\| - \|v_f\|^2/\mu}, \qquad (22a)$$

$$e_f = \frac{\left(\|v_f\|^2 - \mu/\|r_f\|\right) r_f - \left(r_f^T v_f\right) v_f}{\mu}. \qquad (22b)$$
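A small sketch of the terminal constraint evaluation (21)-(22) is given below; the helper name is ours and, as in the formulation, the second component is the squared norm of the eccentricity vector.

```python
import numpy as np

def terminal_constraints(r_f, v_f, mu, a_target):
    """Evaluate psi_f of (21) from the final position/velocity using (22):
    a_f from the vis-viva relation, e_f from the eccentricity vector."""
    rn = np.linalg.norm(r_f)
    vn2 = np.dot(v_f, v_f)
    a_f = 1.0 / (2.0 / rn - vn2 / mu)                              # (22a)
    e_vec = ((vn2 - mu / rn) * r_f - np.dot(r_f, v_f) * v_f) / mu  # (22b)
    return np.array([a_f - a_target, np.dot(e_vec, e_vec)])        # [a_f - a_target, e_f^2]
```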
To study the influence of the number of revolutions on the optimization process, this problem is solved several times for increasing times of flight. The maximum thrust allowed and the number of stages are modified accordingly so that the problem stays accurate and feasible. The problem is intentionally designed such that the time allowed exceeds that from the minimum time solution. Therefore, an unknown number of coast/thrust arcs will appear in the solution. Such problems with multiple (on the order of ten) bang-bang control switches are known to be particularly challenging for direct optimal control methods.
• Case 1 (≈ 2 revs): TOF = 1165.65 days, N = 40, $T_{max}$ = 0.2 N.
• Case 2 (≈ 5 revs): TOF = 2325.30 days, N = 80, $T_{max}$ = 0.14 N.
• Case 3 (≈ 9 revs): TOF = 4650.60 days, N = 160, $T_{max}$ = 0.05 N.
• Case 4 (≈ 17 revs): TOF = 8719.88 days, N = 300, $T_{max}$ = 0.015 N.

Table 4: Comparison results between HDDP and SNOPT for multi-rev transfers.

| Case | Solver | mf (kg) | # of coast arcs | # of function calls | # of derivative function calls | CPU time (s) |
|---|---|---|---|---|---|---|
| Case 1 | HDDP | 655.65 | 4 | 510 | 452 | 42 |
| Case 1 | SNOPT | 655.63 | 4 | 12638 | 12638 | 315 |
| Case 2 | HDDP | 655.65 | 7 | 1562 | 1026 | 171 |
| Case 2 | SNOPT | 654.35 | 5 | 9431 | 9431 | 832 |
| Case 3 | HDDP | 654.75 | 10 | 2875 | 2048 | 524 |
| Case 3 | SNOPT | 651.43 | 9 | 10321 | 10321 | 2981 |
| Case 4 | HDDP | 651.70 | 15 | 6060 | 4879 | 1689 |
| Case 4 | SNOPT | FAILED | - | - | - | - |
Figure 10: Cost per iteration as a function of time of flight for HDDP and SNOPT.

The solver IPOPT cannot be used to solve these large-scale problems due to memory limitations. The large memory requirements of our implementation of IPOPT are due to the accommodation of the second-order sensitivity calculations. It follows that only the solvers HDDP (standard variant) and SNOPT are used for the optimization. In order to evaluate the robustness, the initial guess of the thrust controls is set to zero for all stages^d. This initial guess is very poor, as the resulting trajectory never leaves the Earth's vicinity. This example is, therefore, particularly challenging to solve, especially for long flight times. Table 4 summarizes the results for the different cases. We can see that HDDP is able to converge in all cases, while SNOPT fails when the time of flight (hence the number of variables) becomes large. These results point out that many-revolution problems become difficult to converge even with the sparse capabilities of SNOPT. We point out here that, for this problem, HDDP is significantly faster in total compute time due to the reduced number of iterations, despite the extra cost of requiring second-order derivatives. In addition, for cases 2 and 3, HDDP and SNOPT appear to converge on different local minima. Figure 10 shows the cost per iteration of HDDP and SNOPT, and demonstrates that SNOPT does indeed suffer from the 'curse of dimensionality' to a greater extent than HDDP. The computational cost of SNOPT increases exponentially (arguably the rate may be considered super-linear due to the sparsity of the problem), while that of HDDP increases only linearly. For a small number of variables, SNOPT is faster per iteration than HDDP since exact second-order derivatives are not computed in SNOPT. However, for a large number of variables, SNOPT becomes slower per iteration than HDDP because SNOPT does not take advantage of the time structure of the problem.

^d In practice, the thrust magnitudes are set to a very small value so that sensitivities with respect to the angles do not vanish.
Figure 11: Trajectory of the case 4 transfer (from HDDP).

Figure 12: In-plane azimuthal thrust angle history of the case 4 transfer (from HDDP).
Figure 13: Thrust profile of case 4 from HDDP (left) and T3D (right).
Figure 14: Evolution of the constraints and associated Lagrange multipliers λ1 and λ2 during optimization: semi-major axis constraint (left) and eccentricity constraint (right).
Details on the solution of case 4 found by HDDP are given in Figure 11 to Figure 14. The trajectory involves nearly 17 revolutions. Figure 14 shows that the radius constraint and associated Lagrange multiplier are approximately converged after about 1/4 of the iterations. During the remaining iterations, the solution slowly improves the eccentricity constraint. In addition, in Figure 13 the results are compared with the indirect solver T3D dedicated to orbital transfers [19]. Since T3D is an indirect method that does not discretize the controls, it gives "exact" locally optimal solutions. The solution produced by T3D is therefore considered as the benchmark solution. The right plot of Figure 13 shows the thrust structure of the T3D solution. Despite the complexity of the structure with multiple exact bang-bang switches, we can see that the T3D and HDDP solutions agree very closely. Note that convergence for T3D was very difficult for this challenging multi-rev problem, requiring a great deal of user intervention.
7 GTOC4 Multi-Phase Optimization
GTOC4 is the fourth edition of the Global Trajectory Optimization Competition (GTOC), first initiated in 2005 by the Advanced Concepts Team of the European Space Agency. GTOC problems are traditionally global low-thrust trajectory optimization problems that seek to find the best thrust profile and sequence of asteroids according to some performance index. GTOC editions are therefore a challenging benchmark for optimization algorithms. In the GTOC4 problem, the spacecraft is constrained to fly by a maximum number of asteroids (from a given list) and then rendezvous with a last asteroid. The primary performance index to be maximized is the number of visited asteroids; when two solutions have the same number of visited asteroids, the secondary performance index is the final mass of the spacecraft. A local optimizer is therefore required to optimize a given sequence of asteroids.
In this problem, the trajectory can be readily broken into several portions connected by the flybys at the asteroids. GTOC4 is therefore a good test case for the multi-phase formulation of HDDP. The spacecraft has a constant specific impulse $I_{sp}$ of 3000 s and its maximum thrust is 0.2 N^e. The initial mass of the spacecraft is 1500 kg and its dry mass is 500 kg. The spacecraft must launch from Earth with a departure excess velocity no greater than 4.0 km/s in magnitude, but with an unconstrained direction. The year of launch must be between 2015 and 2025, and the time of flight for the whole trajectory must not exceed 10 years.

^e The maximum thrust is 0.135 N in the original GTOC4 problem. The authors raise the maximum thrust value because the GTOC4 problem is not feasible with the original maximum thrust value when the analytical Kepler model is used to approximate the low-thrust trajectory.

This problem is defined to be in the same form as the generic formulation presented in (1). We now define all the functions and variables of this formulation. First, the variables are defined in the same way as in the last two examples. A spherical representation of the thrust vector controls is used. The initial function $\Gamma_i$ is defined as:
$$\Gamma_i = \begin{bmatrix} r_{ast,i}(t_{0,i}) \\ v_{ast,i}(t_{0,i}) + V_{\infty,i} \\ m_{0,i} \end{bmatrix}, \qquad (23)$$

where $r_{ast,i}(t_{0,i})$ and $v_{ast,i}(t_{0,i})$ are the position and velocity of the $i$th asteroid of the sequence at the starting time $t_{0,i}$ of phase $i$. Given the definition of the GTOC4 problem and the continuity conditions between the masses and the times of successive phases, the phase constraints have the following form:

$$\psi_i = \begin{bmatrix} r_{f,i} - r_{ast,i+1}(t_{f,i}) \\ t_{f,i} - t_{0,i+1} \\ m_{f,i} - m_{0,i+1} \end{bmatrix} \ \text{for } i = 1\ldots M-1, \qquad
\psi_i = \begin{bmatrix} r_{f,i} - r_{ast,i+1}(t_{f,i}) \\ v_{f,i} - v_{ast,i+1}(t_{f,i}) \end{bmatrix} \ \text{for } i = M. \qquad (24)$$
Again, the analytical Kepler model is used to propagate the spacecraft across each stage, and the thrusts are approximated as impulsive velocity increments at the end of each stage. The trajectory obtained can then be refined using a numerical constant thrust model, but this extra step is not shown here. The initial guess comes from a promising ballistic Lambert solution that gives the asteroid sequence and initial values for all the static parameters wi = [V∞,i , m0,i , t0,i , tf,i ] of each phase. The orbital elements and associated epoch times of the asteroids of the sequence are given in Table 5. The thrust on each stage is set to zero.
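To illustrate the multi-phase wiring, the hedged sketch below evaluates the initial function (23) and the linkage constraints (24) for one phase; the asteroid ephemeris values are passed in directly and all names are illustrative.

```python
import numpy as np

def gamma_i(ast_r, ast_v, v_inf, m0):
    """Initial function (23): start at the asteroid position, add the excess
    velocity V_inf to the asteroid velocity, and attach the phase's initial mass."""
    return np.concatenate([ast_r, ast_v + v_inf, [m0]])

def psi_i_intermediate(x_f, tf_i, t0_next, m0_next, next_ast_r):
    """Intermediate phase constraints of (24): position flyby of the next asteroid,
    plus time and mass continuity with the following phase."""
    r_f, m_f = x_f[0:3], x_f[6]
    return np.concatenate([r_f - next_ast_r, [tf_i - t0_next], [m_f - m0_next]])

def psi_i_final(x_f, next_ast_r, next_ast_v):
    """Last-phase constraints of (24): full position and velocity rendezvous."""
    r_f, v_f = x_f[0:3], x_f[3:6]
    return np.concatenate([r_f - next_ast_r, v_f - next_ast_v])
```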
Table 6 compares the characteristics of the results from HDDP (standard variant) and SNOPT. We can see that HDDP and SNOPT converge to nearly identical but nevertheless different final masses. In addition, for this example, HDDP takes more iterations and is slower than SNOPT. Interestingly, in both cases, the spacecraft uses only about half of the propellant available. Figure 15 depicts two-dimensional and three-dimensional trajectory views of the resulting solution optimized by HDDP. The optimal static parameters of each phase are given in Table 7. Figure 16 shows the resulting thrust and inclination histories. The spacecraft inclination remains low and varies little throughout the trajectory, while the inclinations of the intercepted asteroids vary significantly (see Table 5), which suggests the solution is efficient, as it is well known that changing inclination is fuel expensive. Note that the problem was formulated with 25 phases since this trajectory has 24 asteroid flybys and 1 asteroid rendezvous. This example therefore demonstrates the multi-phase capability of HDDP.

Table 5: Orbital Elements of the bodies encountered in the GTOC4 trajectory.
| Body # | Epoch (MJD) | Semi-major axis a (AU) | Eccentricity e | Inclination i (deg) | Longitude of Ascending Node LAN (deg) | Argument of Periapsis w (deg) | Mean Anomaly MA (deg) |
|---|---|---|---|---|---|---|---|
| 0^f | 54000 | 0.99998804953 | 1.67168116E-2 | 0.885435307E-3 | 175.4064769 | 287.6157754 | 257.6068370 |
| 1 | 54800 | 9.3017131191E-1 | 1.6769455838E-1 | 8.9335359602E-1 | 14.822384375 | 131.38493398 | 275.70393807 |
| 2 | 54800 | 1.084255941 | 3.155808232E-1 | 7.850170754 | 95.26367740 | 264.6332999 | 4.356061282 |
| 3 | 54800 | 1.7552828368 | 5.7945771228E-1 | 6.5141899261 | 13.045124964 | 270.61856651 | 155.74454312 |
| 4 | 54800 | 1.3800997657 | 2.7580784273E-1 | 2.6606667520E-1 | 96.339403680 | 101.42094303 | 229.92816483 |
| 5 | 54800 | 1.7075464883 | 5.2695554262E-1 | 4.2213705005 | 44.554300450 | 87.662123588 | 280.43305520 |
| 6 | 54800 | 1.0006640627 | 6.3230497939E-1 | 2.6484263889 | 19.209151230 | 200.25315258 | 106.72858289 |
| 7 | 54800 | 1.5911507659 | 3.4753312629E-1 | 3.7576988738E-1 | 74.065001060 | 10.415406459 | 169.90158505 |
| 8 | 54800 | 8.6572958591E-1 | 2.3794231521E-1 | 18.696815694 | 302.11000003 | 233.44411915 | 262.07506808 |
| 9 | 54800 | 1.6714664872 | 6.1129594066E-1 | 4.6618217515 | 263.39841256 | 84.928649085 | 272.63494057 |
| 10 | 54800 | 1.3160154668 | 2.1492182204E-1 | 2.7420643449 | 175.90424294 | 353.47104336 | 168.00794806 |
| 11 | 54800 | 9.5081078800E-1 | 3.0065719387E-1 | 1.4145702318 | 93.498333536 | 110.24580112 | 267.39908757 |
| 12 | 54800 | 2.0350529986 | 5.0272870853E-1 | 1.7759725806 | 44.755065897 | 144.09991810 | 218.24463021 |
| 13 | 54800 | 1.2388916078 | 3.7055387373E-1 | 21.681828028 | 73.115325356 | 105.53090047 | 132.76357397 |
| 14 | 54800 | 1.2152271331 | 5.6461074502E-1 | 1.7232805609 | 104.16370212 | 356.45495764 | 183.47161359 |
| 15 | 54800 | 1.0611146623 | 3.0767442711E-1 | 5.6219229406 | 269.68129154 | 80.383012719 | 312.78301349 |
| 16 | 54800 | 9.2123263041E-1 | 3.6297077952E-1 | 1.5474324643 | 347.20714860 | 57.688377044 | 302.98512819 |
| 17 | 54800 | 2.0515997162 | 6.6534478064E-1 | 6.1718765590 | 79.806648798 | 84.811200730 | 115.91149094 |
| 18 | 54800 | 1.2664655353 | 9.2674837663E-1 | 23.703765923 | 39.717681807 | 149.42286711 | 268.01737324 |
| 19 | 54800 | 8.9557654855E-1 | 4.9544188148E-1 | 11.561952262 | 162.89527752 | 139.57717229 | 26.143357706 |
| 20 | 54800 | 9.2467395906E-1 | 2.9779807731E-1 | 3.7631635262 | 203.55546271 | 253.44738625 | 238.74232395 |
| 21 | 54800 | 7.2358966214E-1 | 4.1051576901E-1 | 8.9805388805 | 231.65246288 | 355.50277050 | 121.10107758 |
| 22 | 54800 | 1.0047449862 | 2.9343421704E-1 | 5.2415677063 | 25.948442789 | 280.91259530 | 133.78127639 |
| 23 | 54800 | 7.5828217967E-1 | 3.5895682728E-1 | 33.432860441 | 281.89275262 | 201.48128492 | 275.33499146 |
| 24 | 54800 | 1.7057098943 | 6.8990451045E-1 | 8.7448312990 | 34.400999084 | 99.314851116 | 240.06977412 |
| 25 | 54800 | 1.0327257593 | 6.8786392762E-2 | 2.6459755979E-1 | 21.101512017 | 300.73089876 | 96.412302864 |

^f Body 0 corresponds to the Earth.
Table 6: Comparison results between HDDP and SNOPT for the GTOC4 problem.

| Solver | mf (kg) | # of function calls | # of derivative function calls | CPU time (s) |
|---|---|---|---|---|
| HDDP | 926.2642 | 796 | 586 | 489 |
| SNOPT | 931.1934 | 301 | 301 | 83 |
Table 7: Optimal static parameters for each phase of the GTOC4 trajectory.

| Body 1 # | Body 2 # | V∞ (km/s) | m0 (kg) | t0 (MJD) | tf (MJD) |
|---|---|---|---|---|---|
| 0 | 1 | [0.6056280844E5, 0.6087346971E5, 0.1166261180] | 0.150000000E4 | 0.6056280844E5 | 0.6087346971E5 |
| 1 | 2 | [0.6087346971E5, 0.6105433915E5, 0.1503211339E1] | 0.145702336E4 | 0.6087346971E5 | 0.6105433915E5 |
| 2 | 3 | [0.6105433915E5, 0.6117228617E5, 0.5732629605] | 0.1457023213E4 | 0.6105433915E5 | 0.6117228617E5 |
| 3 | 4 | [0.6117228617E5, 0.6147368910E5, -0.1442777185E2] | 0.1454588435E4 | 0.6117228617E5 | 0.6147368910E5 |
| 4 | 5 | [0.6147368910E5, 0.6155457131E5, -0.3442815132E1] | 0.1388058079E4 | 0.6147368910E5 | 0.6155457131E5 |
| 5 | 6 | [0.6155457131E5, 0.6168479608E5, 0.1285209291E2] | 0.1378572898E4 | 0.6155457131E5 | 0.6168479608E5 |
| 6 | 7 | [0.6168479608E5, 0.6181659422E5, -0.1713959727E2] | 0.1354366965E4 | 0.6168479608E5 | 0.6181659422E5 |
| 7 | 8 | [0.6181659422E5, 0.6202268662E5, 0.5676274520E1] | 0.1346964060E4 | 0.6181659422E5 | 0.6202268662E5 |
| 8 | 9 | [0.6202268662E5, 0.6214606799E5, -0.1229533875] | 0.1333671408E4 | 0.6202268662E5 | 0.6214606799E5 |
| 9 | 10 | [0.6214606799E5, 0.6229888436E5, -0.3339891717E1] | 0.1307106049E4 | 0.6214606799E5 | 0.6229888436E5 |
| 10 | 11 | [0.6229888436E5, 0.6240361798E5, 0.3149314215E1] | 0.1281408461E4 | 0.6229888436E5 | 0.6240361798E5 |
| 11 | 12 | [0.6240361798E5, 0.6260213851E5, -0.4926779996E1] | 0.1281408331E4 | 0.6240361798E5 | 0.6260213851E5 |
| 12 | 13 | [0.6260213851E5, 0.6272707632E5, -0.3888349194E1] | 0.1257511109E4 | 0.6260213851E5 | 0.6272707632E5 |
| 13 | 14 | [0.6272707632E5, 0.6281958835E5, 0.4641538756E1] | 0.1236617638E4 | 0.6272707632E5 | 0.6281958835E5 |
| 14 | 15 | [0.6281958835E5, 0.6291570320E5, 0.1944523210E2] | 0.1224535301E4 | 0.6281958835E5 | 0.6291570320E5 |
| 15 | 16 | [0.6291570320E5, 0.6301755513E5, 0.1423925345E-1] | 0.1173846117E4 | 0.6291570320E5 | 0.6301755513E5 |
| 16 | 17 | [0.6301755513E5, 0.6308978742E5, 0.8867154830E1] | 0.1138044522E4 | 0.6301755513E5 | 0.6308978742E5 |
| 17 | 18 | [0.6308978742E5, 0.6322878110E5, 0.7004391395E1] | 0.1095698752E4 | 0.6308978742E5 | 0.6322878110E5 |
| 18 | 19 | [0.6322878110E5, 0.6336470037E5, 0.1461061798E2] | 0.1068972677E4 | 0.6322878110E5 | 0.6336470037E5 |
| 19 | 20 | [0.6336470037E5, 0.6344120838E5, -0.1187284667E2] | 0.1045183636E4 | 0.6336470037E5 | 0.6344120838E5 |
| 20 | 21 | [0.6344120838E5, 0.6362006008E5, 0.9087521608E1] | 0.1036215390E4 | 0.6344120838E5 | 0.6362006008E5 |
| 21 | 22 | [0.6362006008E5, 0.6376812849E5, -0.2886452763E1] | 0.9980220982E3 | 0.6362006008E5 | 0.6376812849E5 |
| 22 | 23 | [0.6376812849E5, 0.6386880148E5, -0.6331117778E1] | 0.9688235582E3 | 0.6376812849E5 | 0.6386880148E5 |
| 23 | 24 | [0.6386880148E5, 0.6397596122E5, 0.8571439780E1] | 0.9450834093E3 | 0.6386880148E5 | 0.6397596122E5 |
| 24 | 25 | [0.6397596122E5, 0.6421530844E5, 0.1740526569E2] | 0.9262642097E3 | 0.6397596122E5 | 0.6421530844E5 |
Figure 15: GTOC4 trajectory (Earth=blue, flybys=green, rendezvous=red) from HDDP: two-dimensional top view (left) and three-dimensional view (right).
Figure 16: GTOC4 Thrust History (left) and Inclination History (right) from HDDP.
8 Conclusion
This paper series introduces HDDP, a new algorithm intended for the solution of complex optimal control problems. In this second part, we test HDDP on four optimal control problems of varying levels of difficulty. In all cases, we find robust convergence and competitive performance when compared to some existing state-of-the-art NLP solvers. As expected from its formulation, HDDP seems particularly efficient for large-scale problems, where the number of stages substantially exceeds the state and control dimensions. The results also indicate that HDDP has the ability to determine accurate estimates of the adjoint variables. In addition, several algorithmic variants of HDDP are analyzed, and it is found that adding a filtering method and reducing the Hessian to handle control bounds tend to be beneficial. Overall, the standard algorithmic version of the solver described in the first part appears to be reliable and acceptably efficient.
While the practical results of HDDP are very encouraging, there is certainly room for improvement since it is a relatively new algorithm. In particular, the performance of HDDP might be strongly affected by the choice of parameters throughout the optimization process (trust region, penalty update, acceptance criterion...). Our conclusions are restricted to the reported choice of parameters, and some aspects of the algorithm can possibly be improved by varying these features. Another possible area worthy of further investigation is the approximation of the Hessians via a quasi-Newton approach, in case second-order information is not available or too expensive. In addition, the choice of the trust region algorithm used to solve each quadratic subproblem can impact the computational speed and robustness of HDDP, so it might be worthwhile to compare different trust region variants (Levenberg-Marquardt, dog-leg...). Finally, since the examples in this paper are focused on space trajectory problems, it would be useful to investigate the potential of HDDP in different engineering applications, like robotics or chemical engineering.
In summary, the current study and results of the preliminary testing offer the hope that HDDP will prove useful in the increasingly important area of constrained, nonlinear optimal control.
References

1. A. Chinchuluun, P.M. Pardalos, R. Enkhbat, and I. Tseveendorj. Optimization and Optimal Control: Theory and Applications. Volume 39 of Springer Optimization and Its Applications. Springer, 2010.
2. D. H. Jacobson and D. Q. Mayne. Differential Dynamic Programming. Elsevier Scientific, New York, N.Y., 1970.
3. G. J. Whiffen. Static/dynamic control for optimizing a useful objective. Patent No. 6496741, December 2002.
4. C. Colombo, M. Vasile, and G. Radice. Optimal low-thrust trajectories to asteroids through an algorithm based on differential dynamic programming. Celestial Mechanics and Dynamical Astronomy, 105(1):75-112, 2009.
5. P. E. Gill, W. Murray, and M. A. Saunders. SNOPT: An SQP algorithm for large-scale constrained optimization. SIAM Journal on Optimization, 12(4):979-1006, 2002.
6. A. Wachter and L. T. Biegler. On the implementation of a primal-dual interior point filter line search algorithm for large-scale nonlinear programming. Mathematical Programming, 106(1):25-57, 2006.
7. G. Lantoine. A Methodology for Robust Optimization of Low-Thrust Trajectories in Multibody Environments. PhD thesis, School of Aerospace Engineering, Georgia Institute of Technology, Atlanta, GA, 2010.
8. R. H. Byrd, J. Nocedal, and R. A. Waltz. KNITRO: An integrated package for nonlinear optimization. In Large Scale Nonlinear Optimization, pages 35-59. Springer Verlag, 2006.
9. T. F. Coleman and Y. Li. An interior trust region approach for nonlinear minimization subject to bounds. SIAM Journal on Optimization, 6(2):418-445, 1996.
10. R. Fletcher and S. Leyffer. Nonlinear programming without a penalty function. Numerical Analysis Report NA/195, Department of Mathematics, University of Dundee, Scotland, 1997.
11. S. Jain. Multiresolution Strategies for the Numerical Solution of Optimal Control Problems. PhD thesis, School of Aerospace Engineering, Georgia Institute of Technology, Atlanta, GA, 2008.
12. J. L. Arsenault, K. C. Ford, and P. E. Koskela. Orbit determination using analytic partial derivatives of perturbed motion. AIAA Journal, 8:4-12, 1970.
13. J. A. Sims, P. Finlayson, E. Rinderle, M. Vavrina, and T. Kowalkowski. Implementation of a low-thrust trajectory optimization algorithm for preliminary design. Paper AIAA-2006-674, AAS/AIAA Astrodynamics Specialist Conference and Exhibit, Keystone, CO, August 2006.
14. R. P. Russell and C. A. Ocampo. Optimization of a broad class of ephemeris model Earth-Mars cyclers. Journal of Guidance, Control, and Dynamics, 29(2):354-367, 2006.
15. G. Lantoine and R. P. Russell. A fast second-order algorithm for preliminary design of low-thrust trajectories. Paper IAC-08-C1.2.5, 59th International Astronautical Congress, Glasgow, Scotland, Sep 29 - Oct 3, 2008.
16. M. J. D. Powell. Algorithms for nonlinear constraints that use Lagrangian functions. Mathematical Programming, 14:224-248, 1978.
17. R. Bate, D. Mueller, and J. White. Fundamentals of Astrodynamics. Dover Publications, New York, 1971.
18. E. T. Pitkin. Second transition partial derivatives via universal variables. Journal of the Astronautical Sciences, 13:204, January 1966.
19. T. Dargent and V. Martinot. An integrated tool for low thrust optimal control orbit transfers in interplanetary trajectories. In Proceedings of the 18th International Symposium on Space Flight Dynamics, page 143, Munich, Germany, October 2004. German Space Operations Center of DLR and European Space Operations Centre of ESA.