UNIVERSITÀ DEGLI STUDI DI NAPOLI “FEDERICO II”
SCUOLA POLITECNICA E DELLE SCIENZE DI BASE
DIPARTIMENTO DI INGEGNERIA INDUSTRIALE
TESI DI LAUREA MAGISTRALE IN INGEGNERIA AEROSPAZIALE
IMPLEMENTATION OF ADVANCED DIFFERENTIATION METHODS FOR OPTIMAL TRAJECTORY COMPUTATION
RELATORI: PROF. MICHELE GRASSI, ING. MARCO SAGLIANO
CANDIDATO: VINCENZO D'ONOFRIO (M53/363)
ANNO ACCADEMICO 2014/2015
To my family, for having been a constant source of material and moral support, a point of reference and a source of values throughout this long and demanding journey.

To Clara, my life companion, for all the emotions we have shared, for the bond between us that has grown over these years, and for every obstacle faced and overcome together.
Abstract

Nowadays the new, increased capabilities of CPUs have constantly encouraged researchers and engineers towards the investigation of numerical optimization as an analysis and synthesis tool to generate optimal trajectories and the controls to track them. In particular, one of the most promising techniques is represented by direct methods. Among these, Pseudospectral Methods are gaining widespread acceptance for their straightforward implementation and some useful properties, such as the "spectral" (exponential) convergence observable in the case of smooth problems. Direct methods use gradient-based techniques and require the computation of the derivatives of the objective function and of the constraints of the problem under analysis. The accuracy of these derivatives has a strong impact on the computational efficiency and reliability of the solutions. Therefore, the quality of the results and the computation time are strongly affected by the Jacobian matrix describing the discrete, transcribed Optimal Control Problem (OCP), that is, the resulting Nonlinear Programming Problem (NLP). From this perspective, the core of this thesis provides the reader with a thorough knowledge of several differentiation methods, starting from the analysis of the most basic approaches, which estimate derivatives by means of finite difference approximations, up to the analysis of advanced differentiation schemes, such as the complex-step derivative approach and the dual-step derivative method. These methods are here implemented in SPARTAN (Shefex-3 Pseudospectral Algorithm for Reentry Trajectory ANalysis), a tool developed by DLR (Deutsches Zentrum für Luft- und Raumfahrt) which implements the global Flipped Radau Pseudospectral Method (FRPM), in order to solve several well-known literature examples of OCPs. Results in terms of accuracy and CPU time are thoroughly inspected. Furthermore, the problem of the differentiation of real-time signals is discussed with the aim of examining, on tutorial examples, robust differentiators/observers based on the sliding mode control technique.
Sommario

Oggigiorno, le nuove e accresciute capacità delle CPU incoraggiano ricercatori ed ingegneri verso lo studio di tecniche di ottimizzazione numerica come strumento di analisi e sintesi, in modo da calcolare traiettorie ottimali e i controlli necessari a generarle. Una delle tecniche che si preannunciano più promettenti è rappresentata dai metodi diretti. Tra questi ultimi i metodi pseudospettrali stanno guadagnando ampio consenso grazie alla loro chiara implementazione, e ad alcune loro utili proprietà, come la convergenza "spettrale" (esponenziale) osservabile nel caso di problemi "smooth". I metodi diretti utilizzano tecniche basate su gradienti, e richiedono dunque il calcolo delle derivate della funzione di costo e dei vincoli che descrivono il problema in esame. L'accuratezza di queste derivate ha un grosso impatto sull'efficienza computazionale e sull'affidabilità delle soluzioni. Per questo motivo, la qualità dei risultati e la potenza computazionale necessaria a generarli sono fortemente influenzate dalla matrice Jacobiana, la quale descrive il problema di programmazione non-lineare (Nonlinear Programming Problem, NLP) ottenuto dalla discretizzazione e trasformazione del problema di controllo ottimo (Optimal Control Problem, OCP). In quest'ottica, la parte centrale della tesi fornisce al lettore una conoscenza dettagliata di un'ampia gamma di metodi di differenziazione, partendo dall'analisi degli approcci di base, i quali calcolano le derivate attraverso l'approssimazione alle differenze finite, fino all'analisi degli schemi di differenziazione avanzati, come ad esempio l'approccio mediante complex-step ed il metodo derivativo dual-step. Nella tesi, questi metodi sono implementati in SPARTAN (Shefex-3 Pseudospectral Algorithm for Reentry Trajectory ANalysis), un algoritmo elaborato presso il DLR (Deutsches Zentrum für Luft- und Raumfahrt), e che implementa il "Flipped Radau Pseudospectral Method" (FRPM), al fine di risolvere alcuni esempi di OCP ben documentati in letteratura. I risultati in termini di accuratezza e costo computazionale sono esaminati in modo dettagliato. Inoltre, nella tesi si analizza il problema della differenziazione di segnali dati in tempo reale con l'obiettivo di esaminare, attraverso alcuni esempi, differenziatori robusti basati sulla tecnica di controllo sliding mode.
Acknowledgements

I would like to express my gratitude to my supervisor Prof. Michele Grassi. Thank you for pointing me to the opportunity to be part of DLR, and for the useful comments, remarks and diligence throughout the learning process of this master thesis. I would like to express my deep appreciation to Eng. Marco Sagliano. It has been an honour to work with you; thank you for helping me develop the thesis, for your advice and for acting as a mentor to me. Special thanks go to the members of the GNC Department of DLR's Institute of Space Systems in Bremen. Thank you for the time spent together and for helping me settle into life in Bremen. The most special gratitude goes to my partner, Clara. She gave me her unconditional support and love along this long path. I am very privileged to have someone like you. Last but not least, I owe more than thanks to my family for their financial support and encouragement throughout my life.
Contents

List of Figures
List of Tables

1 Introduction
  1.1 State of the Art
  1.2 Motivation and goals
  1.3 Structure of the thesis

2 Finite Difference Traditional Schemes
  2.1 Backward and Forward Differences Traditional Schemes
  2.2 Central Difference Traditional Schemes
    2.2.1 3-points stencil central difference scheme
    2.2.2 5-points stencil central difference scheme
    2.2.3 7-points and K-points stencil central difference schemes
    2.2.4 Numerical Examples
  2.3 Conclusions

3 Advanced Differentiation Scheme: the Complex-Step Derivative Approach
  3.1 The Complex-Step Derivative Approximation
  3.2 Numerical examples
  3.3 Conclusions

4 Advanced Differentiation Scheme: the Dual-Step Derivative Approach
  4.1 The Dual-Step Derivative Approach
  4.2 Numerical Examples
  4.3 Conclusions

5 Generation of Reference Data
  5.1 Definition of Gradient and Jacobian
  5.2 Numerical Examples
    5.2.1 Space Shuttle Reentry Problem
    5.2.2 Orbit Raising Problem
    5.2.3 Hang Glider Problem
  5.3 Generation of Reference Jacobians
  5.4 Conclusions

6 Jacobian Matrix Generation with Numerical Differentiations - Analysis of Accuracy and CPU Time
  6.1 Jacobian Matrix Generation with Central Difference Traditional Schemes
    6.1.1 Space Shuttle Reentry Problem
    6.1.2 Orbit Raising Problem
    6.1.3 Hang Glider Problem
  6.2 Jacobian Matrix Generation with Complex-Step Derivative Approach
    6.2.1 Space Shuttle Reentry Problem
    6.2.2 Orbit Raising Problem
    6.2.3 Hang Glider Problem
  6.3 Jacobian Matrix Generation with Dual-Step Derivative Approach
    6.3.1 Space Shuttle Reentry Problem
    6.3.2 Orbit Raising Problem
    6.3.3 Hang Glider Problem
  6.4 CPU Time Analysis
    6.4.1 Space Shuttle Reentry Problem
    6.4.2 Orbit Raising Problem
    6.4.3 Hang Glider Problem
  6.5 Analysis of CPU Time vs. Increasing Size of the Problem
  6.6 Conclusions

7 Use of the Advanced Differentiation Schemes for Optimal Control Problems
  7.1 General Formulation of an Optimal Control Problem
  7.2 SPARTAN
  7.3 Hybrid Jacobian Computation
  7.4 Numerical Example
    7.4.1 Space Shuttle Reentry Problem
    7.4.2 Orbit Raising Problem
    7.4.3 Hang Glider Problem
  7.5 Conclusions

8 Further Tools: Robust Differentiation via Sliding Mode Technique
  8.1 Sliding Mode Technique
    8.1.1 Theory of Sliding Mode Control
    8.1.2 Example
  8.2 Sliding Mode Robust Differentiators
    8.2.1 Fifth-Order Differentiator
    8.2.2 Second Order Nonlinear System Observer
  8.3 Conclusions

9 Conclusions
  9.1 Lesson Learned
  9.2 Future Developments

A Dual and Hyper-Dual Numbers
  A.1 Introduction
  A.2 Dual Numbers
    A.2.1 Definition
    A.2.2 Properties
    A.2.3 Algebraic operations
    A.2.4 Defining functions
    A.2.5 Implementation
  A.3 Hyper-Dual Numbers
    A.3.1 Defining Algebraic Operations
    A.3.2 Hyper-Dual Numbers for Exact Derivative Calculations
    A.3.3 Numerical Examples
    A.3.4 Implementation

Bibliography
List of Figures

2.1 Function f(x) = sin(x).
2.2 Analytical and numerical derivatives of the function f(x) = sin(x), h = 1·10^-2.
2.3 Errors comparison, f(x) = sin(x) and h = 1·10^-2.
2.4 Errors comparison, f(x) = sin(x) and varying h.
2.5 Errors comparison, 3-points stencil scheme, function f(x) = sin(x).
2.6 Errors comparison, 5-points stencil scheme, function f(x) = sin(x).
2.7 Errors comparison, 7-points stencil scheme, function f(x) = sin(x).
2.8 Function f(x) = 1/x.
2.9 Analytical and numerical derivatives of the function f(x) = 1/x, h = 1·10^-2.
2.10 Errors comparison, f(x) = 1/x and h = 1·10^-2.
2.11 Errors comparison, f(x) = 1/x and varying h.
2.12 Function f(x) = 1/x^2.
2.13 Analytical and numerical derivatives of the function f(x) = 1/x^2, h = 1·10^-2.
2.14 Errors comparison, f(x) = 1/x^2 and h = 1·10^-2.
2.15 Errors comparison, f(x) = 1/x^2 and varying h.
2.16 Function f(x) = A sin(ωt)e^-λt.
2.17 Analytical and numerical derivatives of the function f(x) = A sin(ωt)e^-λt, h = 1·10^-2.
2.18 Errors comparison, f(x) = A sin(ωt)e^-λt and h = 1·10^-2.
2.19 Errors comparison, f(x) = A sin(ωt)e^-λt and varying h.
2.20 Function f(x) = A sin^2(ωt) cos(ωt^2).
2.21 Analytical and numerical derivatives of the function f(x) = A sin^2(ωt) cos(ωt^2), h = 1·10^-2.
2.22 Errors comparison, f(x) = A sin^2(ωt) cos(ωt^2) and h = 5·10^-4.
2.23 Errors comparison, f(x) = A sin^2(ωt) cos(ωt^2) and varying h.
2.24 Function f(x) = e^x / sqrt(sin^3(x) + cos^3(x)).
2.25 Analytical and numerical derivatives of the function f(x) = e^x / sqrt(sin^3(x) + cos^3(x)), h = 1·10^-2.
2.26 Errors comparison, f(x) = e^x / sqrt(sin^3(x) + cos^3(x)) and h = 1·10^-3.
2.27 Errors comparison, f(x) = e^x / sqrt(sin^3(x) + cos^3(x)) and varying h.
3.1 Relative error in the sensitivity estimates, function f(x) = sin(x).
3.2 Minimum-Errors comparison, function f(x) = sin(x).
3.3 Relative error in the sensitivity estimates, function f(x) = 1/x.
3.4 Minimum-Errors comparison, function f(x) = 1/x.
3.5 Relative error in the first derivative, function f(x) = 1/x^2.
3.6 Minimum-Errors comparison, function f(x) = 1/x^2.
3.7 Relative error in the first derivative, f(x) = A sin(ωt)e^-λt.
3.8 Minimum-Errors comparison, f(x) = A sin(ωt)e^-λt.
3.9 Relative error in the first derivative, f(x) = A sin^2(ωt) cos(ωt^2).
3.10 Minimum-Errors comparison, f(x) = A sin^2(ωt) cos(ωt^2).
3.11 Relative error in the first derivative, f(x) = e^x / sqrt(sin^3(x) + cos^3(x)).
3.12 Relative error in the first derivative [4].
3.13 Minimum-Errors comparison, f(x) = e^x / sqrt(sin^3(x) + cos^3(x)).
4.1 Relative error in the first derivative, function f(x) = sin(x).
4.2 Relative error in the first derivative, function f(x) = 1/x.
4.3 Relative error in the first derivative, function f(x) = 1/x^2.
4.4 Relative error in the first derivative, f(x) = A sin(ωt)e^-λt.
4.5 Relative error in the first derivative, f(x) = A sin^2(ωt) cos(ωt^2).
4.6 Relative error in the first derivative, f(x) = e^x / sqrt(sin^3(x) + cos^3(x)).
4.7 Relative error in the first derivative [13].
5.1 Jacobian Matrix Sparsity Patterns for the Space Shuttle Problem.
5.2 Jacobian Matrix Sparsity Patterns for the Orbit Raising Problem.
5.3 Jacobian Matrix Sparsity Patterns for the Hang Glider Problem.
6.1 Maximum error in the Jacobian Matrix for the Space Shuttle Reentry Problem.
6.2 Maximum error in the Jacobian Matrix for the Orbit Raising Problem.
6.3 Maximum error in the Jacobian Matrix for the Hang Glider Problem.
6.4 Maximum error in the Jacobian Matrix for the Space Shuttle Reentry Problem.
6.5 Maximum error in the Jacobian Matrix for the Orbit Raising Problem.
6.6 Maximum error in the Jacobian Matrix for the Hang Glider Problem.
6.7 Maximum error in the Jacobian Matrix for the Space Shuttle Reentry Problem.
6.8 Maximum error in the Jacobian Matrix for the Orbit Raising Problem.
6.9 Maximum error in the Jacobian Matrix for the Orbit Raising Problem.
6.10 CPU Time Required for the Space Shuttle Reentry Problem.
6.11 CPU Time Required for the Orbit Raising Problem.
6.12 CPU Time Required for the Hang Glider Problem.
6.13 CPU Time Required for the Space Shuttle Reentry Problem.
6.14 CPU Time Required for the Orbit Raising Problem.
6.15 CPU Time Required for the Hang Glider Problem.
7.1 Legendre Polynomials of order 5.
7.2 States Evolution for the Space Shuttle Reentry Problem.
7.3 Controls Evolution for the Space Shuttle Reentry Problem.
7.4 Heat Rate Evolution for the Space Shuttle Reentry Problem.
7.5 Discrepancy between optimized and propagated solutions for the Space Shuttle Reentry Problem.
7.6 Shuttle reentry - state and control variables. Results reported in [1].
7.7 Space Shuttle Reentry Problem - Groundtrack of Trajectory Optimizing Final Crossrange.
7.8 Spectral Convergence for the Space Shuttle Reentry Problem.
7.9 States Evolution for the Orbit Raising Problem.
7.10 Control Evolution for the Orbit Raising Problem.
7.11 Orbit Raising Problem - Trajectory Optimizing Final Orbit Energy (LU = Unitary Length).
7.12 Discrepancy between optimized and propagated solutions for the Orbit Raising Problem.
7.13 Orbit Raising - state and control variables. Results reported in [9].
7.14 Spectral Convergence for the Orbit Raising Problem.
7.15 States Evolution for the Hang Glider Problem.
7.16 Control Evolution for the Hang Glider Problem.
7.17 Discrepancy between optimized and propagated solutions for the Hang Glider Problem.
7.18 Hang Glider - state and control variables. Results reported in [1].
7.19 Spectral Convergence for the Hang Glider Problem.
8.1 Sliding Variable and Sliding Mode Control.
8.2 Asymptotic Convergence and State Trajectory for f(x1, x2, t) = sin(2t) and u(x1, x2) = -c x2 - ρ sign(σ).
8.3 Fifth-Order Differentiator, without noise.
8.4 Fifth-Order Differentiator Errors, without noise.
8.5 Fifth-Order Differentiator, with noise.
8.6 Fifth-Order Differentiator Errors, with noise.
8.7 True and Estimated State Variables, without noise.
8.8 True and Estimated State Variables, with noise.
8.9 Comparison between True, Estimated and Measured Position.
A.1 Accuracies of several derivative calculation methods as a function of the step size for the function f(x) = A sin(ωt)e^-λt.
A.2 Accuracies of several derivative calculation methods as a function of the step size for the function f(x) = e^x / sqrt(sin^3(x) + cos^3(x)).
A.3 Accuracies of several derivative calculation methods as a function of the step size for the function f(x) = e^x / sqrt(sin^3(x) + cos^3(x)), [15].
List of Tables

6.1 Accuracy and step size comparison for the Space Shuttle Problem.
6.2 Accuracy and step size comparison for the Orbit Raising Problem.
6.3 Accuracy and step size comparison for the Hang Glider Problem.
7.1 Accuracy and CPU Time comparison for the Space Shuttle Problem (SNOPT).
7.2 Accuracy and CPU Time comparison for the Space Shuttle Problem (IPOPT).
7.3 Accuracy and CPU Time comparison for the Orbit Raising Problem (SNOPT).
7.4 Accuracy and CPU Time comparison for the Orbit Raising Problem (IPOPT).
7.5 Accuracy and CPU Time comparison for the Hang Glider Problem (SNOPT).
7.6 Accuracy and CPU Time comparison for the Hang Glider Problem (IPOPT).
Chapter 1

Introduction

1.1 State of the Art
Nowadays the new, increased capabilities of CPUs have constantly encouraged researchers and engineers towards the investigation of numerical optimization as an analysis and synthesis tool to generate optimal trajectories and the controls to track them. Optimal control is defined in [8] as the subject where it is desired to determine the inputs to a dynamical system that optimize (i.e., minimize or maximize) a specified performance index while satisfying any constraints on the motion of the system. Because of the complexity of most applications, Optimal Control Problems (OCPs) can no longer be solved analytically and, consequently, they are most often solved numerically. Reference [1] concentrates on practical numerical methods for solving the OCP. Numerical methods for solving OCPs are divided into two major classes: indirect methods and direct methods. Indirect methods are based on the Pontryagin Maximum Principle, which leads to a multiple-point boundary-value problem. Direct methods, instead, consist in the discretization of the OCP, transcribing it into a nonlinear optimization problem, or NonLinear Programming Problem (NLP). It is seen in [8] that indirect methods and direct methods emanate from two different philosophies. Indeed, the indirect approach solves the problem indirectly by converting the OCP to a boundary-value problem and, as a result, the optimal solution is found by solving a system of differential equations that satisfies endpoint and/or interior point conditions. On the other hand, using a direct approach, the optimal solution is found by transcribing an infinite-dimensional optimization problem to a finite-dimensional optimization problem. As a consequence, researchers who focus on indirect methods are more interested in differential equations theory, while researchers who focus on direct methods are largely occupied with optimization techniques. From this perspective, SPARTAN (Shefex-3 Pseudospectral Algorithm for Reentry Trajectory ANalysis) is an optimal control package developed by DLR (Deutsches Zentrum für Luft- und Raumfahrt). It is the reference tool for the development of the entry guidance for the SHEFEX-3 (Sharp Edge Flight Experiment) mission and has been validated with several well-known literature examples. SPARTAN implements the global Flipped Radau Pseudospectral Methods (FRPMs) to solve constrained or unconstrained OCPs, which can have a fixed or variable final time. Pseudospectral Methods [5] represent a particular area of interest in the frame of the wider class of direct methods. The basic idea behind these methods is, as in the other direct methods, to collocate the differential equations, the cost function and the constraints (if any) in a finite number of points so as to treat them as a set of nonlinear algebraic constraints. In this way, the continuous OCP is reduced to a discrete NLP problem having finite dimensions, which can be solved with one of the well-known available software packages, e.g. SNOPT or IPOPT. SPARTAN has a highly exploited Jacobian structure, as well as routines for automatic linear/nonlinear scaling and auto-validation using the Runge-Kutta 45 scheme. In more detail, SPARTAN exploits the general structure of the Jacobian associated with the NLP problem deriving from the application of the FRPM, which results in a hybrid computation. Indeed, the Jacobian matrix is expressed as a sum of three different contributions. Two of them (the Pseudospectral and the Theoretical contributions) are exact; the third term (the Numerical contribution), instead, is not exact and is numerically computed using the complex-step derivative technique, which is proved to be subject to truncation errors [4, 8].
1.2 Motivation and goals

Four computational issues that arise in the numerical solution of an OCP are:

- consistent approximations for the solution of differential equations;
- the scaling of the OCP;
- exploitation of sparsity in the NLP;
- computation of derivatives of the objective and constraint functions.
Indeed, an inconsistent approximation of the differential equation can lead to either nonconvergence or convergence of the optimization problem to a poor solution. Scaling and exploitation of sparsity in the NLP are discussed for SPARTAN in [5, 10], and they greatly affect both computational efficiency and convergence of the NLP. Finally, the manner in which derivatives are computed is of great importance because accurate derivatives can strongly improve both computational efficiency and reliability. Therefore, differential information is an important ingredient in all optimization algorithms, and consequently it is well worth analysing thoroughly the methods for computing these quantities. From this perspective, the aim of this thesis is to provide the reader with a thorough knowledge of several differentiation methods, starting from the study of the most basic approaches, which estimate derivatives by means of finite difference approximations, up to the analysis of advanced differentiation schemes, such as the complex-step derivative approach and the dual-step derivative method. These methods will be compared in terms of accuracy and computation time, and they will be implemented in SPARTAN to assess the effects of the use of the pseudospectral methods in combination with each of these differentiation methods.
1.3 Structure of the thesis
The work presented in this thesis is organized as follows. In Chapter 2 the most common methods for the finite difference approximation of a first-order derivative are discussed. The central difference schemes are treated in detail, and are implemented to compute the first-order derivative of six different test functions having increasing complexity. The effects, in terms of accuracy, of selecting different values of the perturbation h are discussed. Chapter 3 presents the complex-step derivative approach and its application to the six test functions already defined. The results are compared with the ones achieved using the central difference traditional schemes. Chapter 4 describes the dual-step derivative method and its implementation. The error in the first derivative calculation is compared with the error for the central difference schemes and the complex-step approximation. In Chapter 5, the analytical structures of the Gradient vector and the Jacobian matrix are analysed to generate a set of reference data for each of the following problems: the Space Shuttle Reentry Problem [1], the Orbit Raising Problem [5], and the Hang Glider Problem [1]. The reference data is used to compare the exact analytical Jacobian with the numerical one, presented in Chapter 6, and computed using the numerical differentiations previously defined. In Chapter 7, the differentiation schemes are used to solve optimal control problems. A general formulation of an OCP is shown, then the differentiation schemes are implemented in SPARTAN. Three numerical examples of OCPs are studied: the maximization of the final crossrange in the space shuttle reentry trajectory, the maximization of the final specific energy in an orbit raising problem, and the maximization of the final range of a hang glider in the presence of a specific updraft. Each of the three examples is solved using two different off-the-shelf, well-known NLP solvers: SNOPT and IPOPT. The results obtained using the different differentiation schemes are thoroughly inspected in terms of accuracy and CPU time. In Chapter 8, we deal with the problem of the differentiation of signals given in real time, with the aim of designing a robust differentiator based on the sliding mode technique. Two sliding mode robust differentiators are examined on tutorial examples and simulated in Simulink.
Chapter 2

Finite Difference Traditional Schemes

Overview

Direct methods for optimal control use gradient-based techniques for solving the NLP (nonlinear optimization problem, or Nonlinear Programming Problem). Gradient methods for solving NLPs require the computation of the derivatives of the objective function and constraints, and the accuracy of these derivatives has a strong impact on the computational efficiency and reliability of the solutions. The most obvious way to compute the derivatives of the objective function and constraints is by analytical differentiation. This approach is appealing because analytical derivatives are exact and generally result in faster optimization but, in many cases, it is impractical to compute them. For this reason, the aim of the following discussion is to employ alternative means to obtain the necessary gradients. The most basic way to estimate derivatives is by finite difference approximation. In this chapter the most common methods for the finite difference approximation of a derivative are discussed. The principle of the finite difference methods consists in approximating the differential operator by a discrete differential operator. Considering a generic one-dimensional function f(x) defined in the domain D = [0, X], it is possible to identify N grid points

x_i = ih,    i = 0, 1, ..., N − 1,

where h is the mesh size. The finite difference schemes calculate derivatives by approximating them by linear combinations of function values at the grid points. The simplest schemes using this approach are the backward and forward schemes.
2.1 Backward and Forward Differences Traditional Schemes
The two basic methods for the finite difference approximation of a derivative are backward differencing and forward differencing. They are obtained considering the Taylor series expansion about the point x_i ∈ D:

f(x_i − h) = f(x_i) − f'(x_i) h + (1/2) f''(x_i) h^2 + ...    (2.1)

f(x_i + h) = f(x_i) + f'(x_i) h + (1/2) f''(x_i) h^2 + ...    (2.2)

Focusing on the forward difference scheme, dividing equation (2.2) through by h yields [1]

f'(x_i) = [f(x_i + h) − f(x_i)] / h − (h/2) f''(x_i) − ... = [f(x_i + h) − f(x_i)] / h + O(h)    (2.3)

If the terms of order O(h) are ignored, one obtains the forward difference approximation (2.4). Applying the same procedure to equation (2.1), one obtains the backward difference approximation (2.5) [1]:

f'(x_i) = [f(x_i + h) − f(x_i)] / h + O(h)    Forward Difference    (2.4)

f'(x_i) = [f(x_i) − f(x_i − h)] / h + O(h)    Backward Difference    (2.5)

where h is the perturbation around x_i. The choice of h directly affects the accuracy of the backward and forward difference schemes. The error between the numerical solution and the exact analytical one is called truncation error, because it reflects the fact that only a finite part of the Taylor series is used in the approximation. In both cases, using either a forward or a backward scheme, the truncation error is O(h), and for this reason we will refer to these schemes as first-order approximations. In addition, since on a digital computer the difference approximation must be evaluated using finite-precision arithmetic, there is a second source of error, the round-off error, which depends on the accuracy of the evaluation of the function we are dealing with. To compute more accurate numerical derivatives, it is worth analysing the central difference schemes.
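As a rough illustration of these two first-order schemes and of the trade-off between truncation and round-off error, the following minimal Python sketch (the language and the function names are chosen here purely for illustration and are not part of the thesis software) applies equations (2.4) and (2.5) to f(x) = sin(x), whose exact derivative is cos(x):

```python
import numpy as np

def forward_diff(f, x, h):
    # Forward difference, Eq. (2.4): truncation error O(h)
    return (f(x + h) - f(x)) / h

def backward_diff(f, x, h):
    # Backward difference, Eq. (2.5): truncation error O(h)
    return (f(x) - f(x - h)) / h

x = 1.0
exact = np.cos(x)
for h in (1e-1, 1e-2, 1e-4, 1e-8, 1e-12):
    err_f = abs(forward_diff(np.sin, x, h) - exact)
    err_b = abs(backward_diff(np.sin, x, h) - exact)
    print(f"h = {h:8.0e}   forward error = {err_f:.3e}   backward error = {err_b:.3e}")
```

The error first decreases roughly linearly with h (truncation-dominated regime) and then grows again for very small h, when the finite-precision round-off error becomes dominant.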
2.2 Central Difference Traditional Schemes

If the function f(x) can be evaluated at values that lie to the left and right of x_i, then the central difference schemes will involve abscissas that are chosen symmetrically on both sides of x_i ∈ D. In the following discussion six different functions will be examined using different central difference schemes:

- 3-points stencil central difference scheme;
- 5-points stencil central difference scheme;
- 7-points stencil central difference scheme;
- K-points stencil central difference scheme.
Errors between exact analytic derivatives and numerical ones will be calculated considering different values of the perturbation h.
2.2.1 3-points stencil central difference scheme
Assuming that f ∈ C^3 in D and that (x_i + h/2) and (x_i − h/2) ∈ D, then

f(x_i − h/2) = f(x_i) − f'(x_i) (h/2) + (1/2) f''(x_i) (h/2)^2 + ...    (2.6)

f(x_i + h/2) = f(x_i) + f'(x_i) (h/2) + (1/2) f''(x_i) (h/2)^2 + ...    (2.7)

and if we combine equations (2.7) and (2.6) we get the 3-points stencil central difference formula (2.8) [2], the truncation error (2.9) [2] and the round-off error (2.10) [1]:

f'(x_i) = [f(x_i + h/2) − f(x_i − h/2)] / h    (2.8)

η_T = h^2 |f'''(x)| / 24    (2.9)

η_R = 2/h    (2.10)
The 3-points stencil central difference scheme error is O(h^2), which means that it is a second-order approximation, and it provides more accurate results than the backward and forward traditional schemes.
2.2.2 5-points stencil central difference scheme

Assuming that f ∈ C^5 in D and that (x_i + h), (x_i − h), (x_i + h/2) and (x_i − h/2) ∈ D, it is possible to derive the 5-points stencil central difference formula (2.11) [2], the truncation error (2.12) [2] and the round-off error (2.13) [1]:

f'(x_i) = [−f(x_i + h) + 8 f(x_i + h/2) − 8 f(x_i − h/2) + f(x_i − h)] / (6h)    (2.11)

η_T = (h/2)^4 |f^(5)(x)| / 30    (2.12)

η_R = 3/h    (2.13)

It is a fourth-order approximation, meaning that the truncation error term is of the order O(h^4). Comparing formulas (2.8) and (2.11), it is possible to observe that the truncation error O(h^4) of the fourth-order formula will go to zero faster than the truncation error O(h^2) of the second-order formula. This will have a strong impact on the choice of h, as will be seen in the next sections.
2.2.3 7-points and K-points stencil central difference schemes

To derive the 7-points stencil central difference formula it is convenient to derive a formula for a generic number of points K. Up to now the numerical derivative of the function f(x) at any point x_i has been computed by approximating f(x) with a polynomial in the neighbourhood of x_i. Considering N equidistant points around x_i, with N an odd number,

f(x_k) = f_k,    x_k = x_i + kh,    k = −(N−1)/2, ..., (N−1)/2,    (2.14)

and assuming that the points (x_k, f_k) are interpolated by a polynomial of (N−1)th degree

P_{N−1}(x) = Σ_{j=0}^{N−1} a_j x^j,    (2.15)

where the coefficients a_j are found as the solution of the system of linear equations P_{N−1}(x_k) = f_k, the derivative f'(x_i) can be approximated by the derivative of the constructed interpolating polynomial

f'(x_i) ≈ P'_{N−1}(x_i).    (2.16)

It can be seen that the generic expression of the central difference scheme has an anti-symmetric structure, and in general a difference of Nth order can be written as [3]

f'(x_i) ≈ (1/h) Σ_{k=1}^{(N−1)/2} a_k (f_k − f_{−k}).    (2.17)

Starting from (2.17), it is possible to derive the formula for N = 7, which represents the 7-points stencil central difference formula [3]:

f'(x_i) = [−f(x_i − 3h) + 9 f(x_i − 2h) − 45 f(x_i − h) + 45 f(x_i + h) − 9 f(x_i + 2h) + f(x_i + 3h)] / (60h).    (2.18)
−f (xi − 3h) + 9f (xi − 2h) − 45f (xi − h) 60h 45f (xi + h) − 9f (xi + 2h) + f (xi + 3h) + (2.18) 60h
Numerical Examples
In the following subsection the schemes described above are applied to compute the first derivative of some test functions. Let f (x) = sin(x).
The following figures illustrate the function trend (Figure 2.1), the analytical and numerical derivatives of the function (Figure 2.2) and the comparison between errors related to the three different schemes, at first considering a constant h (Figure 2.3), and then varying h (Figure 2.4). The errors are computed using the analytical result as the reference: η = |f' − f'_ref| / |f'_ref|. As shown in Figure 2.2, considering h = 1·10^-2, the more points the stencil is composed of, the more accurate the numerical derivative is, so it is convenient to use a 7-points stencil central difference scheme. This fact can also be observed in Figure 2.3, in which the error decreases as the number of the points of the stencil increases. However, this is not generally true. Indeed, Figure 2.4 shows that, if we reduce the size of the perturbation h, the accuracy of the central difference schemes which involve more points increases until h is such that the error |η| is minimum (meaning η_T = η_R). If h is further reduced, the accuracy of the more complex stencil schemes decreases because of the increase of the relative round-off error, which becomes dominant. In addition, their computational load will be heavier due to the increase of the number of points where the function must be evaluated. So, in this case, below a certain value of h, it is not convenient to use a more complex stencil.
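The behaviour just described can be reproduced with a short script. The sketch below (illustrative Python, not the tooling used for the figures of this chapter) sweeps the step size for the 3-points scheme of equation (2.8) applied to f(x) = sin(x) and reports the perturbation that minimizes the relative error, i.e. the region where the truncation error η_T and the round-off error η_R balance each other:

```python
import numpy as np

def central_3pt(f, x, h):
    # Eq. (2.8): 3-points stencil central difference
    return (f(x + h/2) - f(x - h/2)) / h

x = 1.0
exact = np.cos(x)
steps = np.logspace(-1, -16, 200)   # perturbations from 1e-1 down to 1e-16
errors = np.array([abs(central_3pt(np.sin, x, h) - exact) / abs(exact) for h in steps])

best = errors.argmin()
print(f"minimum relative error {errors[best]:.2e} reached at h = {steps[best]:.2e}")
# For steps smaller than this value the error grows again:
# the round-off contribution dominates the truncation term.
```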
Figure 2.1: Function f (x) = sin(x).
Considering the 3-points stencil scheme, it is interesting to compare the trend of the error |η| between the analytical and numerical derivative of the function with the trend of the truncation error η_T and the round-off error η_R. Figure 2.5 illustrates that the minimum of |η| occurs, approximately, at the intersection between η_T and η_R (meaning at the value of h which corresponds to the minimum value of the sum η_R + η_T), as it is reasonable to expect. The same analysis is repeated considering the 5-points stencil scheme (Figure 2.6) and the 7-points stencil scheme (Figure 2.7). From the qualitative point of view, the results are the same.
Figure 2.2: Analytical and numerical derivatives of the function f (x) = sin(x), h = 1 · 10−2 .
Figure 2.3: Errors comparison, f (x) = sin(x) and h = 1 · 10−2 .
Figure 2.4: Errors comparison, f (x) = sin(x) and varying h.
Figure 2.5: Errors comparison, 3-points stencil scheme, function f (x) = sin(x).
Figure 2.6: Errors comparison, 5-points stencil scheme, function f (x) = sin(x).
Figure 2.7: Errors comparison, 7-points stencil scheme, function f (x) = sin(x).
Let f(x) = 1/x.
The following figures illustrate the function trend (Figure 2.8), the analytical and numerical derivatives of the function (Figure 2.9) and the comparison between errors related to the three different schemes at first considering a constant h (Figure 2.10) and then varying h (Figure 2.11).
Figure 2.8: Function f(x) = 1/x.

For this case as well, considering h = 1·10^-2, Figures 2.9 and 2.10 show that the 7-points stencil scheme is more accurate than the 5-points stencil scheme, and the 5-points stencil scheme appears to be more accurate than the 3-points stencil scheme. Figure 2.11 illustrates that, if we reduce the size of the perturbation h, the accuracy of the central difference schemes which involve more points increases until h*, defined as the value of h such that |η_T| = |η_R|. If h is further reduced, the accuracy of the dense stencil schemes decreases because the round-off error becomes dominant. As in the previous case, their computational load will be heavier due to the increase of the number of points where the function must be evaluated. So, in this case, it is not convenient to use a dense stencil for h < h*.

Figure 2.9: Analytical and numerical derivatives of the function f(x) = 1/x, h = 1·10^-2.
Figure 2.10: Errors comparison, f(x) = 1/x and h = 1·10^-2.
Figure 2.11: Errors comparison, f(x) = 1/x and varying h.

Let f(x) = 1/x^2.
In the following figures the function trend (Figure 2.12), the analytical and numerical derivatives of the function (Figure 2.13) and the comparison between errors related to the three different schemes, at first considering a constant h (Figure 2.14) and then varying h (Figure 2.15), are shown. Here again, considering h = 1·10^-2, Figure 2.13 and Figure 2.14 show that the more complex the stencil is, the more accurate the numerical derivative is. Figure 2.15 illustrates that, if we reduce the size of the perturbation h, the accuracy of the central difference schemes which involve more points increases until h*, defined as the value of h such that |η_T| = |η_R|. If h is further reduced, the accuracy of the dense stencil schemes decreases because the round-off error becomes dominant. In addition, their computational load will be heavier due to the increase of the number of points where the function must be evaluated. So, in this case, it is not convenient to use a dense stencil for h < h*.

Figure 2.12: Function f(x) = 1/x^2.
Figure 2.13: Analytical and numerical derivatives of the function f(x) = 1/x^2, h = 1·10^-2.
Figure 2.14: Errors comparison, f(x) = 1/x^2 and h = 1·10^-2.
Figure 2.15: Errors comparison, f(x) = 1/x^2 and varying h.
Now it is interesting to analyse how the central difference schemes work in the presence of more complicated functions. Let f(x) = A sin(ωt)e^-λt.
The following figures show the function trend (Figure 2.16), the analytical and numerical derivatives of the function (Figure 2.17) and the comparison between errors related to the three different schemes at first considering a constant h (Figure 2.18) and then varying h (Figure 2.19).
Figure 2.16: Function f (x) = A sin(ωt)e−λt .
From the qualitative point of view, even in the presence of a more complicated function, the results of the sensitivity analysis are the same (Figures 2.17, 2.18 and 2.19). Indeed, Figure 2.19 illustrates that, here again, if we reduce the size of the perturbation h, the accuracy of the central difference schemes which involve more points increases until h*, defined as the value of h such that |η_T| = |η_R|. If h is further reduced, the accuracy of the more complex stencil schemes decreases because of the increase of the round-off error and, in addition, their computational load will be heavier due to the increase of the number of points where the function must be evaluated. So, in this case, it is not convenient to use a more complex stencil for h < h*.

Figure 2.17: Analytical and numerical derivatives of the function f(x) = A sin(ωt)e^-λt, h = 1·10^-2.
Figure 2.18: Errors comparison, f(x) = A sin(ωt)e^-λt and h = 1·10^-2.
Figure 2.19: Errors comparison, f (x) = A sin(ωt)e−λt and varying h.
Let f(x) = A sin^2(ωt) cos(ωt^2).

In the following figures the function trend (Figure 2.20), the analytical and numerical derivatives of the function (Figure 2.21) and the comparison between errors related to the three different schemes, at first considering a constant h (Figure 2.22) and then varying h (Figure 2.23), are illustrated. Considering h = 5·10^-4, Figure 2.21 shows that the denser the stencil is, the more accurate the central difference scheme is. Furthermore, Figure 2.22 illustrates that, here again, if we reduce the size of the perturbation h, the accuracy of the central difference schemes which involve more points increases until h*, defined as the value of h such that |η_T| = |η_R|. If h is further reduced, the accuracy of the dense stencil schemes decreases because the round-off error becomes dominant. In addition, their computational load will be heavier due to the increase of the number of points where the function must be evaluated.
Figure 2.20: Function f (x) = A sin2 (ωt) cos(ωt2 ).
Figure 2.21: Analytical and numerical derivatives of the function f (x) = A sin2 (ωt) cos(ωt2 ), h = 1 · 10−2 .
Figure 2.22: Errors comparison, f (x) = A sin2 (ωt) cos(ωt2 ) and h = 5 · 10−4 .
Figure 2.23: Errors comparison,f (x) = A sin2 (ωt) cos(ωt2 ) and varying h.
The last example has been selected from the literature [4], and confirms the results of the analysis performed so far. Let f(x) = e^x / sqrt(sin^3(x) + cos^3(x)).

The following figures show the function trend (Figure 2.24), the analytical and numerical derivatives of the function (Figure 2.25) and the comparison between errors related to the three different schemes, at first considering a constant h (Figure 2.26) and then varying h (Figure 2.27). As seen in the previous cases, here again, considering h = 1·10^-2, Figures 2.25 and 2.26 show that the more points the stencil is composed of, the more accurate the numerical derivative is, so it is convenient to use a 7-points stencil central difference scheme, meaning that the error between the analytical and numerical derivative decreases as the number of the points of the stencil increases. However, this is not generally true. Indeed, Figure 2.27 shows that, if we reduce the size of the perturbation h, the accuracy of the central difference schemes which involve more points increases until h*, defined as the value of h such that |η_T| = |η_R|. If h is further reduced, the accuracy of the more complex stencil schemes decreases because of the increase of the round-off error, which becomes dominant. In addition, their computational load will be heavier due to the increase of the number of points where the function must be evaluated. So, in this case, it is not convenient to use a more complex stencil for h < h*.

Figure 2.24: Function f(x) = e^x / sqrt(sin^3(x) + cos^3(x)).
Figure 2.25: Analytical and numerical derivatives of the function f(x) = e^x / sqrt(sin^3(x) + cos^3(x)), h = 1·10^-2.
Figure 2.26: Errors comparison, f(x) = e^x / sqrt(sin^3(x) + cos^3(x)) and h = 1·10^-3.
Figure 2.27: Errors comparison, f(x) = e^x / sqrt(sin^3(x) + cos^3(x)) and varying h.
2.3 Conclusions

In this chapter the traditional finite difference schemes have been analysed. We focused on the central difference schemes, which appear to be more accurate than the backward and forward difference schemes. Numerical examples of the different stencils (3-points, 5-points and 7-points) on six different functions having increasing complexity have been discussed, showing the effects of selecting different values of the perturbation h. The fundamental result is the following: to improve the accuracy of the central difference schemes it is necessary to reduce the truncation error, due to the higher-order terms in the Taylor series, by reducing h. However, making h too small can lead to subtraction errors due to the finite precision used by computers to store numbers. Indeed, it is not desirable to choose h too small, otherwise the round-off error becomes dominant.

In addition, the last three examples also show that, as it is reasonable to expect, the error between the analytical and the numerical derivatives becomes higher when more complex functions need to be differentiated. This is a consequence of the combination of several errors that of course reduce the overall accuracy of the numerical schemes we have investigated so far. In conclusion, considering a specific function, it is possible to study how the error between analytic and numerical derivatives varies with respect to h in order to choose the central difference scheme which minimizes the error for a certain value of the perturbation h.
Chapter 3

Advanced Differentiation Scheme: the Complex-Step Derivative Approach

Overview

In this chapter the complex-step derivative approximation and its application to six test functions are presented. As seen in Chapter 2, the easiest way to estimate numerical derivatives is by finite difference approximation and, in particular, the central difference schemes appear to be the most accurate ones. These schemes can be derived by truncating a Taylor series expanded about a point x. When estimating derivatives using finite difference formulas we are faced with the problem of selecting the size of the perturbation h so that it minimizes the error between the analytic and numerical derivative. Indeed, it is necessary to choose a small h to minimize the truncation error, while avoiding a perturbation so small that the round-off error becomes dominant. To improve the accuracy of the numerical derivatives, the complex-step derivative approximation is defined so that it is not subject to subtractive cancellation errors. This is a great advantage over the finite difference operations, as will be seen in the next sections. In the following sections the complex-step method is defined and then tested on six different functions. The results are compared with the ones achieved in Chapter 2 using the finite difference approximation.
3.1 The Complex-Step Derivative Approximation
In this section it is shown that a first-derivative estimate for real functions can be obtained using complex calculus. Consider a function, f = u + iv, of the complex variable z = x + iy. If f is analytic, that is, if it is differentiable in the complex plane, the Cauchy-Riemann equations apply:

∂u/∂x = ∂v/∂y,    (3.1)

∂u/∂y = −∂v/∂x.    (3.2)

Using the definition of derivative in the right-hand side of the first Cauchy-Riemann equation it is possible to write

∂u/∂x = lim_{h→0} [v(x + i(y + h)) − v(x + iy)] / h,    (3.3)

where h is a real number. Since the functions we are interested in are originally real functions of real variables, y = 0, u(x) = f(x) and v(x) = 0. Equation (3.3) can be rewritten as

∂f/∂x = lim_{h→0} Im[f(x + ih)] / h.    (3.4)

For a small discrete h, this can be approximated by

∂f/∂x ≈ Im[f(x + ih)] / h.    (3.5)

Equation (3.5) is called the complex-step derivative approximation [4]. As can be seen, this estimate is not subject to subtractive cancellation errors, because it does not involve a difference operation, as happens, instead, in the finite difference approximations. In other words, the complex-step derivative approximation is not affected by round-off error, and this constitutes a huge advantage over the finite difference approximations, allowing us to choose h as small as possible to reduce the truncation error without worrying about the round-off error. To determine the error involved in this approximation it is possible to expand f as a Taylor series about a real point x but now, rather than using a real step h, a pure imaginary step ih is used:

f(x + ih) = f(x) + ih f'(x) − h^2 f''(x)/2! − ih^3 f'''(x)/3! + ....    (3.6)

Taking the imaginary parts of both sides of this Taylor expansion (3.6) and dividing by h yields [4]

f'(x) = Im[f(x + ih)] / h + O(h^2).    (3.7)

As a consequence, the approximation is of order O(h^2). The second-order errors can be reduced by ensuring that h is sufficiently small and, since the complex-step approximation does not involve a difference operation, it is possible to choose an extremely small size of the perturbation h without losing accuracy. The only drawback is the need to have an analytical function: the method cannot be applied, for instance, to look-up tables.
Numerical examples
In the following section the complex-step method is tested in presence of six different functions having increasing complexity and the results are compared with the ones obtained in Chapter 2 using the finite difference schemes. Let f (x) = sin(x).
Figure 3.1 shows the relative error η in the errors estimates given by the central difference and the complex-step methods using the analyt0 0 ical result as the reference; η = |f 0 − fref |/|fref |. It illustrates that the central difference estimates initially converge quadratically to the exact result but, when the step h is reduced below a value of about 10−2 for the 7-points stencil scheme, 10−3 for the 5-points stencil scheme and 10−5 for the 3-points stencil scheme, round-off error becomes dominant and the resulting estimates are not reliable. Indeed, η diverges or tends to 1 meaning that the finite difference estimates yield zero because h is so small that no difference exists in the output. The complex-step derivative approximation converges quadratically with decreasing step size because of the decrease of the truncation error. The estimate is practically insensitive to small size of the perturbation h and for any h below a value of about 10−8 it achieves the accuracy of the function evaluation. Figure 3.2 illustrates the comparison of the minimum-error of the central difference derivative estimate and the minimum-error of the complexstep derivative approximation. The complex-step approximation appears to be, approximately, two orders of magnitude more accurate than the central difference scheme.
30
Chapter 3. Advanced Differentiation Scheme: the Complex-Step Derivative Approach
Figure 3.1: Relative error in the sensitivity estimates, function f (x) = sin(x).
Figure 3.2: Minimum-Errors comparison, function f (x) = sin(x).
31
Chapter 3. Advanced Differentiation Scheme: the Complex-Step Derivative Approach
Let f (x) = x1 .
Figure 3.3 shows the relative error η in the errors estimates given by the central difference and the complex-step methods. It illustrates that, here again, the central difference estimates initially converge quadratically to the exact result but, when the step h is reduced below a value of about 10−2 for the 7-points stencil scheme, 10−3 for the 5-points stencil scheme and 10−5 for the 3-points stencil scheme, round-off error becomes dominant and the resulting estimates are not reliable. Indeed, η diverges or tends to 1 meaning that the finite difference estimates yield zero because h is so small that no difference exists in the output. The complex-step derivative approximation converges quadratically with decreasing step size because of the decrease of the truncation error. The estimate is practically insensitive to small size of the perturbation h and for any h below a value of about 10−8 it achieves the accuracy of the function evaluation. Figure 3.4 illustrates the comparison of the minimum-error of the central difference derivative estimate and the minimum-error of the complexstep derivative approximation. Here too, the complex-step approximation appears to be, approximately, two orders of magnitude more accurate than the central difference scheme.
Figure 3.3: Relative error in the sensitivity estimates, function f(x) = 1/x.
Figure 3.4: Minimum-Errors comparison, function f(x) = 1/x.
Let f(x) = 1/x².
Figure 3.5 shows the relative error η in the derivative estimates given by the central difference and the complex-step methods. It illustrates that, here again, the central difference estimates initially converge quadratically to the exact result but, when the step h is reduced below a value of about 10^-2 for the 7-points stencil scheme, 10^-3 for the 5-points stencil scheme and 10^-5 for the 3-points stencil scheme, round-off error becomes dominant and the resulting estimates are not reliable. Indeed, η diverges or tends to 1, meaning that the finite difference estimates yield zero because h is so small that no difference exists in the output. The complex-step derivative approximation converges quadratically with decreasing step size because of the decrease of the truncation error. The estimate is practically insensitive to small sizes of the perturbation h and, for any h below a value of about 10^-8, it achieves the accuracy of the function evaluation. Figure 3.6 illustrates the comparison of the minimum error of the central difference derivative estimate and the minimum error of the complex-step derivative approximation. Here too, the complex-step approximation appears to be, approximately, two orders of magnitude more accurate than the central difference scheme.
Figure 3.5: Relative error in the first derivative, function f(x) = 1/x².

Figure 3.6: Minimum-Errors comparison, function f(x) = 1/x².
Let f (x) = A sin(ωt)e−λt .
Figure 3.7 shows the relative error η in the derivative estimates given by the central difference and the complex-step methods. It illustrates that, here again, the central difference estimates initially converge quadratically to the exact result but, when the step h is reduced below a value of about 10^-2 for the 7-points stencil scheme, 10^-3 for the 5-points stencil scheme and 10^-5 for the 3-points stencil scheme, round-off error becomes dominant and the resulting estimates are not reliable. Indeed, η diverges or tends to 1, meaning that the finite difference estimates yield zero because h is so small that no difference exists in the output. The complex-step derivative approximation converges quadratically with decreasing step size because of the decrease of the truncation error. The estimate is practically insensitive to small sizes of the perturbation h and, for any h below a value of about 10^-8, it achieves the accuracy of the function evaluation. Figure 3.8 illustrates the comparison of the minimum error of the central difference derivative estimate and the minimum error of the complex-step derivative approximation. Here too, the complex-step approximation appears to be, approximately, two orders of magnitude more accurate than the central difference scheme.
Figure 3.7: Relative error in the first derivative, f (x) = A sin(ωt)e−λt .
Figure 3.8: Minimum-Errors comparison, f (x) = A sin(ωt)e−λt .
Let f(x) = A sin²(ωt) cos(ωt²).
Figure 3.9 shows the relative error η in the derivative estimates given by the central difference and the complex-step methods. Here again, the central difference estimates initially converge quadratically to the exact result but, when the step h is reduced below a value of about 10^-2 for the 7-points stencil scheme, 10^-3 for the 5-points stencil scheme and 10^-5 for the 3-points stencil scheme, round-off error becomes dominant and the resulting estimates are not reliable. Indeed, η diverges or tends to 1, meaning that the finite difference estimates yield zero because h is so small that no difference exists in the output. The complex-step derivative approximation converges quadratically with decreasing step size because of the decrease of the truncation error. The estimate is practically insensitive to small sizes of the perturbation h and, for any h below a value of about 10^-8, it achieves the accuracy of the function evaluation. Figure 3.10 illustrates the comparison of the minimum error of the central difference derivative estimate and the minimum error of the complex-step derivative approximation. The complex-step approximation appears to be, approximately, three orders of magnitude more accurate than the central difference scheme.
Figure 3.9: Relative error in the first derivative, f(x) = A sin²(ωt) cos(ωt²).

Figure 3.10: Minimum-Errors comparison, f(x) = A sin²(ωt) cos(ωt²).
Let f(x) = e^x / √(sin³(x) + cos³(x)).
Figure 3.11 shows the relative error η in the derivative estimates given by the central difference and the complex-step methods. Here again, the central difference estimates initially converge quadratically to the exact result but, when the step h is reduced below a value of about 10^-2 for the 7-points stencil scheme, 10^-3 for the 5-points stencil scheme and 10^-5 for the 3-points stencil scheme, round-off error becomes dominant and the resulting estimates are not reliable. Since this example is taken from [4], the results can be compared directly: Figures 3.11 and 3.12 show that they are consistent. Indeed, η diverges or tends to 1, meaning that the finite difference estimates yield zero because h is so small that no difference exists in the output. The complex-step derivative approximation converges quadratically with decreasing step size because of the decrease of the truncation error. The estimate is practically insensitive to small sizes of the perturbation h and, for any h below a value of about 10^-8, it achieves the accuracy of the function evaluation. Figure 3.13 illustrates the comparison of the minimum error of the central difference derivative estimate and the minimum error of the complex-step derivative approximation. The complex-step approximation appears to be, approximately, three orders of magnitude more accurate than the central difference scheme.
3.3 Conclusions
In this chapter the complex-step derivative approximation has been analysed. The complex-step approximation provides greater accuracy than the finite difference formulas for first derivatives by eliminating the subtraction error. Indeed, like the finite differences, the complex-step derivative approximation is affected by truncation error, but it does not suffer from the round-off error introduced by the difference operation. f'(x) is the leading term of the imaginary part of f(x + ih), so h can be made small enough that the truncation error is effectively zero without worrying about the round-off error. The only disadvantage is the need to have an analytical function, meaning that, for instance, it is not possible to apply this approximation to look-up tables. The complex-step derivative approximation has been tested on six different functions and the results have been compared with the ones shown in Chapter 2. The value of h that minimizes the error between analytical and numerical derivatives is, in the case of the complex-step approximation,
Figure 3.11: Relative error in the first derivative, f(x) = e^x / √(sin³(x) + cos³(x)).
Figure 3.12: Relative error in the first derivative [4].
Figure 3.13: Minimum-Errors comparison, f(x) = e^x / √(sin³(x) + cos³(x)).
less than the machine epsilon (ε = 2.2204 · 10−16 ). For this reason, to be sure to get reliable results, a value of h equal to twice the machine epsilon has been selected.
Chapter 4

Advanced Differentiation Scheme: the Dual-Step Derivative Approach

Overview

In this chapter the dual-step derivative approach and its application to six test functions are presented. As said in the previous chapters, derivatives are often approximated using finite difference schemes. These approximations are subject to truncation error, associated with the higher order terms of the Taylor series that are ignored when forming the approximation, and to round-off error, which is a result of performing these calculations on a computer with finite precision. The complex-step derivative approximation is more accurate than the finite difference schemes, the greater accuracy being obtained by eliminating the subtractive cancellation error. To improve the accuracy of the numerical derivatives further, the dual-step approach uses dual numbers, see Appendix [A], and the derivatives calculated using these new numbers are exact, without any truncation error and without subtraction errors. This is a great advantage, in terms of the step size, over the complex-step derivative approximation. In the following sections the dual-step method for the first derivative calculation is defined and tested on six different functions. The error in the first derivative calculation is then compared with the error for the central difference schemes and the complex-step approximation.
4.1 The Dual-Step Derivative Approach
In this section the dual-step formula for the first derivative calculation is shown. Consider the Taylor series of a function f(x) for x ∈ R:

f(x + a) = f(x) + a f'(x) + (a²/2!) f''(x) + (a³/3!) f'''(x) + ...    (4.1)

If we assume that the perturbation a is the dual part of the dual number (x + a₁ε), that is

a = a₁ε,   with ε² = 0 and ε ≠ 0,    (4.2)

so that a² = 0, a³ = 0, ..., the Taylor series (4.1) truncates exactly at the first-derivative term, yielding the properties of the approximation that we are seeking:

f(x + a) = f(x) + a₁ε f'(x).    (4.3)

So, to get f'(x) it is necessary simply to read off the dual (ε) component and divide by a₁, yielding the dual-step first derivative formula:

f'(x) = Dual[f(x + a)]/a₁.    (4.4)
Since the dual-step derivative approximation does not involve a difference operation and no terms of the Taylor series are ignored, this formula is subject neither to truncation error, nor to round-off error. There is no need to make the step size small and the simplest choice is a1 = 1, which eliminates the need to divide by the step size. The main disadvantage of the dual-step approach is related to the computational cost. Working with dual numbers requires additional computational work. Indeed, adding two dual numbers is equivalent to 2 real additions. Multiplying two dual numbers is equivalent to 3 real multiplications and 2 real additions, as shown in the Appendix [A]. Therefore a dual-function evaluation should take about 2 to 5 times the runtime of a real-function evaluation.
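The mechanics of formula (4.4) are usually hidden behind operator overloading. The classdef below (file Dual.m) is only a minimal sketch covering the few operations used in the usage example that follows; the class actually employed in this work supports the complete set of arithmetic operators and elementary functions.

% Minimal dual-number class: d = re + du*eps, with eps^2 = 0 (file Dual.m).
classdef Dual
    properties
        re   % real part, f(x)
        du   % dual part, carries the first derivative
    end
    methods
        function d = Dual(re, du)
            if nargin == 0, re = 0; du = 0; end
            d.re = re;  d.du = du;
        end
        function c = plus(a, b)
            [a, b] = Dual.promote(a, b);
            c = Dual(a.re + b.re, a.du + b.du);
        end
        function c = mtimes(a, b)
            [a, b] = Dual.promote(a, b);
            % product rule: (a + a'eps)(b + b'eps) = a*b + (a'*b + a*b')*eps
            c = Dual(a.re*b.re, a.du*b.re + a.re*b.du);
        end
        function c = sin(a)
            c = Dual(sin(a.re), cos(a.re)*a.du);
        end
        function c = exp(a)
            c = Dual(exp(a.re), exp(a.re)*a.du);
        end
    end
    methods (Static)
        function [a, b] = promote(a, b)
            % lift plain numbers to dual numbers with a zero dual part
            if ~isa(a, 'Dual'), a = Dual(a, 0); end
            if ~isa(b, 'Dual'), b = Dual(b, 0); end
        end
    end
end

With such a class, the dual-step derivative of f(x) = sin(x) at an arbitrary point is obtained by seeding the dual part with a₁ = 1 and reading off the dual component, exactly as in Eq. (4.4):

x    = Dual(0.7, 1);   % perturbed variable, dual part seeded with a1 = 1
fx   = sin(x);         % f evaluated on dual numbers
dfdx = fx.du;          % Dual[f(x + a)]/a1 = cos(0.7), exact to machine precision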
4.2 Numerical Examples
In this section the dual-step method is tested on six functions of increasing complexity, and the results are compared with the ones obtained using the finite difference schemes and the complex-step method.
Let f (x) = sin(x).
Figure 4.1 illustrates, as a function of the step size h, the relative error η in the derivative estimates given by the central difference, the complex-step and the dual-step methods, using the analytical result as the reference. The relative error is defined as η = |f' − f'_ref|/|f'_ref|. As the step size decreases, the error decreases according to the order of the truncation error of the method. However, after a certain value of h, the error for the central difference approximations begins to grow, while the error for the complex-step approximation continues to decrease until it reaches, and remains at, machine zero (the machine epsilon). This shows the effect of subtractive cancellation errors, which affects the finite difference approximations but not the first-derivative complex-step approximation, as seen in Chapters 2 and 3. The error of the dual-step approximation, which is not subject to truncation error or round-off error, is machine zero regardless of the selected step size.
Figure 4.1: Relative error in the first derivative, function f (x) = sin(x).
Let f(x) = 1/x.
Figure 4.2 illustrates, as a function of the step size h, the relative error η in the derivative estimates given by the central difference, the complex-step and the dual-step methods, using the analytical result as the reference. The relative error is defined as η = |f' − f'_ref|/|f'_ref|. As the step size decreases, the error decreases according to the order of the truncation error of the method. However, after a certain value of h, the error for the central difference approximations begins to grow, while the error for the complex-step approximation continues to decrease until it reaches, and remains at, machine zero (the machine epsilon). This shows the effect of subtractive cancellation errors, which affects the finite difference approximations but not the first-derivative complex-step approximation. Here again, the error of the dual-step approximation, which is not subject to truncation error or round-off error, is machine zero regardless of the selected step size.
Figure 4.2: Relative error in the first derivative, function f(x) = 1/x.
Let f(x) = 1/x².
Figure 4.3 illustrates, as a function of the step size h, the relative error η in the derivative estimates given by the central difference, the complex-step and the dual-step methods, using the analytical result as the reference. The relative error is defined as η = |f' − f'_ref|/|f'_ref|. As the step size decreases, the error decreases according to the order of the truncation error of the method. However, after a certain value of h, the error for the central difference approximations begins to grow, while the error for the complex-step approximation continues to decrease until it reaches, and remains at, machine zero (the machine epsilon). This shows the effect of subtractive cancellation errors, which affects the finite difference approximations but not the first-derivative complex-step approximation. For this case as well, the error of the dual-step approximation, which is not subject to truncation error or round-off error, is machine zero regardless of the selected step size.
Figure 4.3: Relative error in the first derivative, function f(x) = 1/x².
Let f (x) = A sin(ωt)e−λt .
Figure 4.4 illustrates, as a function of the step size h, the relative error η in the derivative estimates given by the central difference, the complex-step and the dual-step methods, using the analytical result as the reference. The relative error is defined as η = |f' − f'_ref|/|f'_ref|. As the step size decreases, the error decreases according to the order of the truncation error of the method. However, after a certain value of h, the error for the central difference approximations begins to grow, while the error for the complex-step approximation continues to decrease until it reaches, and remains at, machine zero (the machine epsilon). This shows the effect of subtractive cancellation errors, which affects the finite difference approximations but not the first-derivative complex-step approximation. For this case as well, the error of the dual-step approximation, which is not subject to truncation error or round-off error, is machine zero regardless of the selected step size.
Figure 4.4: Relative error in the first derivative, f (x) = A sin(ωt)e−λt .
Let f(x) = A sin²(ωt) cos(ωt²).
Figure 4.5 illustrates, as a function of the step size h, the relative error η in the derivative estimates given by the central difference, the complex-step and the dual-step methods, using the analytical result as the reference. The relative error is defined as η = |f' − f'_ref|/|f'_ref|. As the step size decreases, the error decreases according to the order of the truncation error of the method. However, after a certain value of h, the error for the central difference approximations begins to grow, while the error for the complex-step approximation continues to decrease until it reaches, and remains at, machine zero (the machine epsilon). This shows the effect of subtractive cancellation errors, which affects the finite difference approximations but not the first-derivative complex-step approximation. For this case as well, the error of the dual-step approximation, which is not subject to truncation error or round-off error, is machine zero regardless of the selected step size.
Figure 4.5: Relative error in the first derivative, f(x) = A sin²(ωt) cos(ωt²).
Let f(x) = e^x / √(sin³(x) + cos³(x)).
Figure 4.6 illustrates, as a function of the step size h, the relative error η in the derivative estimates given by the central difference, the complex-step and the dual-step methods, using the analytical result as the reference. As the step size decreases, the error decreases according to the order of the truncation error of the method. However, after a certain value of h, the error for the central difference approximations begins to grow, while the error for the complex-step approximation continues to decrease until it reaches, and remains at, machine zero (the machine epsilon). This shows the effect of subtractive cancellation errors, which affects the finite difference approximations but not the first-derivative complex-step approximation. For this case as well, the error of the dual-step approximation, which is not subject to truncation error or round-off error, is machine zero regardless of the selected step size. Since this example is taken from [13], the results can be compared directly: Figures 4.6 and 4.7 show that they are consistent.
Figure 4.6: Relative error in the first derivative, f(x) = e^x / √(sin³(x) + cos³(x)).
Figure 4.7: Relative error in the first derivative [13].
4.3 Conclusions
In this chapter the dual-step approach for the computation of the first derivatives has been introduced. This approach provides greater accuracy than the finite difference formulas and the complex-step derivative approximation. Indeed, the dual-step approach is subject neither to truncation error nor to round-off error and, as a consequence, the error in the first derivative estimate is machine zero regardless of the selected step size. This is a great advantage over the complex-step derivative approximation because, using the dual-step approach, there is no need to select the step size as small as possible. The disadvantage is the computational cost, due to the fact that working with dual numbers requires additional computational work. In addition, like the complex-step approximation, the dual-step approach requires an analytical function. The dual-step approach has been tested on six different functions. The relative errors have been calculated using the analytical derivative as reference and then compared with the ones computed in Chapters 2 and 3 using, respectively, the finite difference schemes and the complex-step approximation.
Chapter 5

Generation of Reference Data

Overview

In this chapter the analytical structure of the Gradient vector and the Jacobian matrix is analysed to generate a set of reference data for each of the following problems: the Space Shuttle Reentry Problem [1], the Orbit Raising Problem [5] and the Hang Glider Problem [1]. These reference data are useful to compare the exact analytical Jacobian with the numerical one, computed using different numerical differentiations, as we will see in the next chapters. In the first section of this chapter the analytical definitions of the Gradient vector and the Jacobian matrix of a generic function are given. Then, the formulation of the three aforementioned problems is described and the analytical Jacobian is generated for each of them. In the last section the Jacobian matrix sparsity pattern for each of the three problems is shown.
5.1 Definition of Gradient and Jacobian
Given a generic scalar-valued function of n variables, f(x) with x ∈ Rⁿ, if f(x₁, x₂, ..., xₙ) is differentiable, its gradient is the vector whose components are the n partial derivatives of f with respect to the n variables xᵢ. The gradient of the function f is denoted by ∇f, where ∇ denotes the vector differential operator:

∇f = ( ∂f/∂x₁ , ∂f/∂x₂ , ... , ∂f/∂xₙ ).    (5.1)

The gradient can be considered as a generalization to n dimensions of the usual concept of derivative of a function of several variables.
If we now consider a generic function F(x) : x ∈ Rⁿ → Rᵐ, the Jacobian matrix of this vector-valued function is the matrix of all the n first-order partial derivatives of the m real-valued functions F₁(x₁, ..., xₙ), ..., Fₘ(x₁, ..., xₙ), which can be organized in an m-by-n matrix as follows:

J = [ ∂F₁/∂x₁  ∂F₁/∂x₂  ...  ∂F₁/∂xₙ
      ∂F₂/∂x₁  ∂F₂/∂x₂  ...  ∂F₂/∂xₙ
      ...      ...      ...  ...
      ∂Fₘ/∂x₁  ∂Fₘ/∂x₂  ...  ∂Fₘ/∂xₙ ]    (5.2)
The Jacobian can be considered as the generalization of the gradient for vector-valued functions of several variables. Indeed, in the case m = 1 the Jacobian matrix has a single row, and may be identified with a vector, which is the gradient. In the case m = n the Jacobian matrix is a square matrix and its determinant is a function of (x₁, x₂, ..., xₙ) called the Jacobian determinant of F. Let us now consider the general structure of the Jacobian matrix associated to a dynamical system of m equations. In the most general case, considering ns states x = (x₁, ..., x_ns)ᵀ, nc controls u = (u₁, ..., u_nc)ᵀ and a generic time t, the Jacobian matrix will have the following dimension and structure:

dim(J) = [m × (1 + ns + nc)]    (5.3)

J = [ ∂F₁/∂t  ∂F₁/∂x₁  ...  ∂F₁/∂x_ns  ∂F₁/∂u₁  ...  ∂F₁/∂u_nc
      ∂F₂/∂t  ∂F₂/∂x₁  ...  ∂F₂/∂x_ns  ∂F₂/∂u₁  ...  ∂F₂/∂u_nc
      ...     ...      ...  ...        ...      ...  ...
      ∂Fₘ/∂t  ∂Fₘ/∂x₁  ...  ∂Fₘ/∂x_ns  ∂Fₘ/∂u₁  ...  ∂Fₘ/∂u_nc ]    (5.4)
Considering the three problems we will analyse in the next sections, the number m will be equal to the number of state variables ns; for a generic optimal control problem, however, we would have to account for additional equations, namely the cost function and the nc constraint equations, so that m would be equal to ns + nc + 1.
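Before specializing to the three problems, it is worth noting that a Jacobian with the structure (5.4) can also be filled numerically, one column per perturbed input. The sketch below uses the complex-step rule of Chapter 3; the handle rhs(t, x, u) and the step h are generic placeholders.

% Numerical Jacobian J = dF/d[t, x, u] of a dynamical system F(t, x, u),
% filled one column at a time with the complex-step approximation.
function J = complex_step_jacobian(rhs, t, x, u, h)
    z  = [t; x(:); u(:)];                 % stacked inputs: time, states, controls
    ns = numel(x);
    m  = numel(rhs(t, x, u));             % number of equations
    J  = zeros(m, numel(z));
    for j = 1:numel(z)
        zp      = z;
        zp(j)   = zp(j) + 1i*h;           % perturb the j-th input only
        Fp      = rhs(zp(1), zp(2:1+ns), zp(2+ns:end));
        J(:, j) = imag(Fp)/h;             % column j of the Jacobian
    end
end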
5.2 Numerical Examples
In this section, three problems are formulated and, for each of them, we will generate the analytical Jacobian matrix based on the definition given above.
5.2.1 Space Shuttle Reentry Problem
The problem of the construction of the reentry trajectory for the space shuttle is a classic example of an optimal control problem and it is of significant practical interest. The motion of the vehicle is defined by the following set of differential algebraic equations [1]:

ḣ = v sin(γ),    (5.5)
φ̇ = v sin(ψ) cos(γ) / (r cos(θ)),    (5.6)
θ̇ = (v/r) cos(γ) cos(ψ),    (5.7)
v̇ = −D/m − g sin(γ),    (5.8)
γ̇ = (L/(m v)) cos(β) + cos(γ) (v/r − g/v),    (5.9)
ψ̇ = L sin(β) / (m v cos(γ)) + (v/(r cos(θ))) cos(γ) sin(ψ) sin(θ),    (5.10)

where the aerodynamic and atmospheric forces on the vehicle are specified by the following quantities (English units) [1]:

L = ½ cL S ρ v²,                    D = ½ cD S ρ v²,
g = µ/r²  (µ = 0.1407 · 10^17),     r = Re + h  (Re = 20902900),
ρ = ρ0 exp(−h/hr)  (ρ0 = 0.002378), hr = 23800,
cL = a0 + a1 α,                     cD = b0 + b1 α + b2 α²,
a0 = −0.20704,  a1 = 0.029244,      b0 = 0.07854,  b1 = −0.6159 · 10^-2,  b2 = 0.6214 · 10^-3,
S = 2690.
The state variables are x = (h, φ, θ, v, γ, ψ)ᵀ, where h is the altitude (ft), φ is the longitude (rad), θ is the latitude (rad), v is the velocity (ft/sec), γ is the flight path angle (rad) and ψ is the azimuth (rad), whereas the control variables are u = (α, β)ᵀ, where α is the angle of attack (rad) and β is the bank angle (rad). So, considering the formula (5.4), the analytical Jacobian matrix associated to the Space Shuttle Reentry problem is the following:

J = [ 0  0    0  0    J15  J16  0    0    0
      0  J22  0  J24  J25  J26  J27  0    0
      0  J32  0  0    J35  J36  J37  0    0
      0  J42  0  0    J45  J46  0    J48  0
      0  J52  0  0    J55  J56  0    J58  J59
      0  J62  0  J64  J65  J66  J67  J68  J69 ]    (5.11)
The non-zero elements of J are the following (with r = Re + h):

J15 = sin(γ),
J16 = v cos(γ),
J22 = −v cos(γ) sin(ψ) / ((Re + h)² cos(θ)),
J24 = v cos(γ) sin(ψ) sin(θ) / ((Re + h) cos²(θ)),
J25 = cos(γ) sin(ψ) / ((Re + h) cos(θ)),
J26 = −v sin(γ) sin(ψ) / ((Re + h) cos(θ)),
J27 = v cos(γ) cos(ψ) / ((Re + h) cos(θ)),
J32 = −v cos(γ) cos(ψ) / (Re + h)²,
J35 = cos(γ) cos(ψ) / (Re + h),
J36 = −v sin(γ) cos(ψ) / (Re + h),
J37 = −v cos(γ) sin(ψ) / (Re + h),
J42 = S ρ cD v² / (2 m hr) + 2 µ sin(γ) / (Re + h)³,
J45 = −S ρ cD v / m,
J46 = −µ cos(γ) / (Re + h)²,
J48 = −S ρ v² (b1 + 2 b2 α) / (2 m),
J52 = −S ρ v cos(β) cL / (2 m hr) + cos(γ) (2µ − v² r) / (v r³),
J55 = S ρ cos(β) cL / (2 m) + cos(γ) (v² r + µ) / (r² v²),
J56 = −sin(γ) (v² r − µ) / (r² v),
J58 = S ρ v cos(β) a1 / (2 m),
J59 = −S ρ v cL sin(β) / (2 m),
J62 = −S ρ v sin(β) cL / (2 m hr cos(γ)) − v cos(γ) sin(ψ) tan(θ) / r²,
J64 = v cos(γ) sin(ψ) / (r cos²(θ)),
J65 = S ρ sin(β) cL / (2 m cos(γ)) + cos(γ) sin(ψ) tan(θ) / r,
J66 = S ρ v sin(β) cL sin(γ) / (2 m cos²(γ)) − v sin(γ) sin(ψ) tan(θ) / r,
J67 = v cos(γ) cos(ψ) tan(θ) / r,
J68 = S ρ v sin(β) a1 / (2 m cos(γ)),
J69 = S ρ v cos(β) cL / (2 m cos(γ)).
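The dynamics (5.5)-(5.10) translate directly into a right-hand-side function. The sketch below gathers the data listed above in one routine; the vehicle mass is not part of the data reported here and is therefore inserted only as a placeholder value, and the function is written so that it can also be evaluated with complex (or dual) arguments.

% Right-hand side of the Space Shuttle reentry dynamics, Eqs. (5.5)-(5.10).
% x = [h; phi; theta; v; gamma; psi], u = [alpha; beta], angles in rad.
function xdot = shuttle_rhs(~, x, u)
    mu   = 0.1407e17;  Re = 20902900;  S = 2690;
    rho0 = 0.002378;   hr = 23800;
    a0 = -0.20704;     a1 = 0.029244;
    b0 = 0.07854;      b1 = -0.6159e-2;  b2 = 0.6214e-3;
    m  = 6309.43;                    % vehicle mass (slug): placeholder, not given above

    h = x(1);  theta = x(3);  v = x(4);  gamma = x(5);  psi = x(6);
    alpha = u(1);  beta = u(2);

    r   = Re + h;
    g   = mu/r^2;
    rho = rho0*exp(-h/hr);
    cL  = a0 + a1*alpha;
    cD  = b0 + b1*alpha + b2*alpha^2;
    L   = 0.5*cL*S*rho*v^2;
    D   = 0.5*cD*S*rho*v^2;

    xdot = [ v*sin(gamma);                                          % (5.5)
             v*sin(psi)*cos(gamma)/(r*cos(theta));                  % (5.6)
             v*cos(gamma)*cos(psi)/r;                               % (5.7)
            -D/m - g*sin(gamma);                                    % (5.8)
             L*cos(beta)/(m*v) + cos(gamma)*(v/r - g/v);            % (5.9)
             L*sin(beta)/(m*v*cos(gamma)) ...
               + v*cos(gamma)*sin(psi)*sin(theta)/(r*cos(theta)) ]; % (5.10)
end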
5.2.2 Orbit Raising Problem
This problem has been proposed more than once in the literature [5] and deals with the maximization of the specific energy of a low-thrust spacecraft orbit transfer, in a given fixed time. It can be expressed considering an orbit subject to the following dynamics (expressed in canonical units) [5]:

dr/dt = Vr,    (5.12)
dϕ/dt = Vt/r,    (5.13)
dVr/dt = Vt²/r − µ/r² + T sin(δ),    (5.14)
dVt/dt = −Vr Vt/r + T cos(δ),    (5.15)
where Vr and Vt are, respectively, the radial and the tangential speed, r is the radius, ϕ is the true anomaly, δ is the thrust angle, i.e. the angle between the direction of the thrust and the tangential velocity, T is the specific force (the thrust acceleration), assumed to be constant and equal to 0.01, and µ is the normalized gravitational parameter. The state variables are x = (r, ϕ, Vr, Vt)ᵀ, whereas the control variable is u = δ. So, here again, considering the formula (5.4), the analytical Jacobian matrix of the Orbit Raising problem is the following:

J = [ 0  0    0  J14  0    0
      0  J22  0  0    J25  0
      0  J32  0  0    J35  J36
      0  J42  0  J44  J45  J46 ]    (5.16)

The non-zero elements of J are the following:

J14 = 1,
J22 = −Vt/r²,
J25 = 1/r,
J32 = −Vt²/r² + 2µ/r³,
J35 = 2 Vt/r,
J36 = T cos(δ),
J42 = Vr Vt/r²,
J44 = −Vt/r,
J45 = −Vr/r,
J46 = −T sin(δ).
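Entries such as J36 = T cos(δ) are easy to spot-check numerically before they are used as reference data; the sketch below compares the analytical value with a complex-step estimate at an arbitrary sample point (µ is set to 1, as the parameter is normalized).

% Spot-check of J36 = T*cos(delta) for Eq. (5.14) against a complex-step estimate.
T  = 0.01;  mu = 1;                         % T as given above; mu = 1 assumed (canonical units)
r  = 1.2;  Vt = 0.9;  delta = 0.3;          % arbitrary sample point
f3 = @(d) Vt^2/r - mu/r^2 + T*sin(d);       % dVr/dt seen as a function of delta only
h  = 1e-30;
J36_num = imag(f3(delta + 1i*h))/h;         % complex-step derivative w.r.t. delta
J36_an  = T*cos(delta);                     % analytical entry
fprintf('J36: analytical %.16e, complex-step %.16e\n', J36_an, J36_num);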
5.2.3 Hang Glider Problem
This problem deals with the range maximization of a hang glider in the presence of a specified thermal updraft. The state equations which describe the planar motion of the hang glider are [1]:

ẋ = vx,    (5.17)
ẏ = vy,    (5.18)
v̇x = (1/m) (−L sin(η) − D cos(η)),    (5.19)
v̇y = (1/m) (L cos(η) − D sin(η) − W),    (5.20)

where

L = ½ cL S ρ vr²,              D = ½ cD S ρ vr²,
sin(η) = Vy/vr,                cos(η) = vx/vr,
vr = √(vx² + Vy²),             Vy = vy − ua(x),
ua(x) = uM (1 − X) exp(−X),    X = (x/R − 2.5)²,

and with the quadratic drag polar cD(cL) = c0 + k cL². The state variables are x = (x, y, vx, vy)ᵀ, where x is the horizontal distance, y is the altitude, and vx and vy are, respectively, the horizontal and the vertical velocity. The control variable is u = cL, the aerodynamic lift coefficient. Considering the formula (5.4), the analytical Jacobian matrix representing the Hang Glider problem is the following:

J = [ 0  0    0  J14  0    0
      0  0    0  0    J25  0
      0  J32  0  J34  J35  J36
      0  J42  0  J44  J45  J46 ]    (5.21)
The non-zero elements of J are the following, where u'a denotes the derivative of the updraft, u'a = dua/dx = −(2/R)(x/R − 2.5)[ua(x) + uM exp(−X)]:

J14 = 1,
J25 = 1,
J32 = (S ρ u'a / (2 m)) [ (Vy/vr)(cL Vy + cD vx) + cL vr ],
J34 = −(S ρ / (2 m)) [ (vx/vr)(cL Vy + cD vx) + cD vr ],
J35 = −(S ρ / (2 m)) [ (Vy/vr)(cL Vy + cD vx) + cL vr ],
J36 = −(S ρ vr / (2 m)) (Vy + 2 k cL vx),
J42 = (S ρ u'a / (2 m)) [ cD vr − (Vy/vr)(cL vx − cD Vy) ],
J44 = (S ρ / (2 m)) [ (vx/vr)(cL vx − cD Vy) + cL vr ],
J45 = (S ρ / (2 m)) [ (Vy/vr)(cL vx − cD Vy) − cD vr ],
J46 = (S ρ vr / (2 m)) (vx − 2 k cL Vy).

5.3 Generation of Reference Jacobians

In this section we will generate a reference Jacobian matrix for each of the three problems which have been formulated in the previous sections. To do that, it is necessary to define a reference solution for each problem. The reference solution is computed using SPARTAN, a tool developed by DLR based on the use of the Flipped Radau Pseudospectral Method, which gives, for each of the aforementioned problems, the following outputs:

- a vector t with nt components, which are the nt times (t1, ..., ti, ..., tnt) at which the solution is calculated;

- a matrix X, whose dimension is equal to [ns × nt], containing the values of the ns state variables evaluated at the nt times at which the solution is calculated;

- a matrix U, whose dimension is equal to [nc × nt], containing the values of the nc control variables evaluated at the nt times at which the solution
is calculated.

As a consequence, the reference Jacobian matrix corresponding to this reference solution will have the following dimension and structure:

dim(J) = [ns · nt] × [nt · (1 + ns + nc)]    (5.22)

J = [ J1   O    O    ...  O
      O    J2   O    ...  O
      O    O    J3   ...  O
      ...  ...  ...  ...  ...
      O    O    O    ...  Jnt ]    (5.23)

where each off-diagonal block O is a zero matrix of dimension [m × (1 + ns + nc)].
The generic submatrix Ji has the dimension and structure shown in formula (5.4), but now each component of the matrix is calculated at time t = ti:

dim(Ji) = [m × (1 + ns + nc)]    (5.24)

Ji = [ ∂F₁/∂t|ti  ∂F₁/∂x₁|ti  ...  ∂F₁/∂x_ns|ti  ∂F₁/∂u₁|ti  ...  ∂F₁/∂u_nc|ti
       ∂F₂/∂t|ti  ∂F₂/∂x₁|ti  ...  ∂F₂/∂x_ns|ti  ∂F₂/∂u₁|ti  ...  ∂F₂/∂u_nc|ti
       ...        ...         ...  ...           ...         ...  ...
       ∂Fₘ/∂t|ti  ∂Fₘ/∂x₁|ti  ...  ∂Fₘ/∂x_ns|ti  ∂Fₘ/∂u₁|ti  ...  ∂Fₘ/∂u_nc|ti ]    (5.25)
So the reference Jacobian matrix is a sparse matrix and its pattern depends on the problem we are analysing. In the following figures the Jacobian matrix sparsity patterns for the Space Shuttle Reentry problem (Figure 5.1), the Orbit Raising problem (Figure 5.2) and the Hang Glider problem (Figure 5.3) are illustrated. To have a better visualization of the pattern, all the figures showing the Jacobian structure are associated to the solutions obtained using 3 nodes, meaning nt = 3.
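The block-diagonal structure (5.23) is straightforward to assemble once the per-node blocks are available; the sketch below does so from the SPARTAN outputs described above, with jacobian_at_node standing in for the analytical expressions of Section 5.2, and visualizes the sparsity pattern.

% Assembly of the reference Jacobian (5.23) from the per-node blocks (5.25).
% t, X, U are the SPARTAN outputs (nt times, [ns x nt] states, [nc x nt] controls);
% jacobian_at_node is a placeholder returning the [m x (1+ns+nc)] block Ji.
nt     = numel(t);
blocks = cell(1, nt);
for i = 1:nt
    blocks{i} = jacobian_at_node(t(i), X(:, i), U(:, i));
end
J = blkdiag(blocks{:});     % zero off-diagonal blocks, as in Eq. (5.23)
spy(sparse(J));             % sparsity pattern, cf. Figures 5.1-5.3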
Figure 5.1: Jacobian Matrix Sparsity Patterns for the Space Shuttle Problem.
Figure 5.2: Jacobian Matrix Sparsity Patterns for the Orbit Raising Problem.
Figure 5.3: Jacobian Matrix Sparsity Patterns for the Hang Glider Problem.
5.4 Conclusions
In this chapter the Gradient vector and the Jacobian matrix have been defined. Three different problems (the Space Shuttle Reentry Problem, the Orbit Raising Problem and the Hang Glider Problem) have been formulated in order to generate a reference Jacobian matrix for each of them, that is, a Jacobian matrix built from analytical derivatives. The reference Jacobian matrices have been generated using reference solutions which have been calculated with SPARTAN, a tool developed by DLR based on the use of the FRPM. The resulting reference Jacobian matrices are characterized by a sparsity pattern whose structure depends on the problem that we are analysing, as it is reasonable to expect. These reference data will be useful to validate the numerical differentiation approaches we will introduce in the next chapters.
Chapter 6

Jacobian Matrix Generation with Numerical Differentiations - Analysis of Accuracy and CPU Time

Overview

In this chapter the numerical differentiation schemes analysed in Chapters 2, 3 and 4 are employed to generate the numerical Jacobian matrix for three different problems: the Space Shuttle Reentry Problem [1], the Orbit Raising Problem [5] and the Hang Glider Problem [1]. These numerical Jacobian matrices are then compared with the analytical ones computed in the previous chapter, and the results in terms of accuracy and CPU time are illustrated. In the first section of this chapter we focus our attention on the traditional central difference schemes. For each of the three aforementioned problems the Jacobian matrix is generated using the 3-points, 5-points and 7-points stencil central difference schemes. Then the accuracy of these schemes is analysed using the analytical Jacobian matrix as reference. In the second section the Jacobian matrix, for each of the three problems, is generated using the complex-step derivative approximation. Then the accuracy of this scheme is analysed and the results are compared with the ones achieved using the central difference schemes. In the third section the same analysis is repeated, for each of the three problems, considering the Jacobian matrix generated with the dual-step derivative
approach. In the last sections we focus our attention on the CPU time to have a measure of the different computational power required for each technique.
6.1 Jacobian Matrix Generation with Central Difference Traditional Schemes
In this section the 3-points stencil central difference scheme, the 5-points stencil central difference scheme and the 7-points central difference scheme are employed to generate the Jacobian matrix for three different problems: the Space Shuttle Reentry Problem, the Orbit Raising Problem and the Hang Glider Problem.
6.1.1 Space Shuttle Reentry Problem
Figure 6.1 illustrates, as a function of the step size h, the error η in the Jacobian matrix given by the 3-points, 5-points and 7-points stencil central difference schemes. The error is computed using the exact analytical Jacobian as reference: η = ||Jnum − Jan||∞. As shown in Figure 6.1, as the step size decreases, the maximum error decreases according to the order of the truncation error. However, this is not generally true. Indeed, if we reduce the size of the perturbation h, the accuracy of the central difference schemes which involve more points increases until h is equal to a given h* for which η is minimum. If h is further reduced, the accuracy of the denser stencil schemes decreases because of the increase of the relative round-off error, which becomes dominant. In this case, below a certain value of h, it is not convenient to compute the Jacobian matrix using dense stencil schemes. The minimum value of the error in the computation of the Jacobian matrix η* and the corresponding value of the perturbation h* are summarized, for each central difference scheme, in Table 6.1. In terms of accuracy, to reduce η* it is convenient to use a central difference scheme which involves more points and, at the same time, to select a value of the perturbation which is not too small.
Figure 6.1: Maximum error in the Jacobian Matrix for the Space Shuttle Reentry Problem.
Numerical Differentiation Scheme      η*              h*
3-points stencil CD scheme            3.1 · 10^-8     1.8 · 10^-6
5-points stencil CD scheme            3.9 · 10^-10    3.7 · 10^-4
7-points stencil CD scheme            3.1 · 10^-11    4.8 · 10^-3

Table 6.1: Accuracy and step size comparison for the Space Shuttle Problem.
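The entries of Table 6.1 follow from a sweep over the step size, recording the minimum of η; a compact sketch is given below, where central_difference_jacobian and analytical_jacobian are placeholder names for the routines used in this chapter, and (t0, x0, u0) is one point of the reference solution.

% Sweep of the step size h and extraction of eta* and h* (cf. Table 6.1).
hs  = logspace(0, -12, 60);
eta = zeros(size(hs));
Jan = analytical_jacobian(t0, x0, u0);                        % reference Jacobian
for k = 1:numel(hs)
    Jnum   = central_difference_jacobian(@shuttle_rhs, t0, x0, u0, hs(k));
    eta(k) = norm(Jnum - Jan, inf);                           % infinity-norm error
end
[eta_star, k_star] = min(eta);
h_star = hs(k_star);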
6.1.2 Orbit Raising Problem
The analysis is now repeated for the Orbit Raising Problem. Figure 6.2 illustrates, as a function of the step size h, the error in the Jacobian matrix η = ||Jnum − Jan||∞ given by the 3-points, 5-points and 7-points stencil central difference schemes. Here again, as the step size decreases, the error decreases according to the order of the truncation error. However, this is not generally true. Indeed, if we reduce the size of the perturbation h, the accuracy of the central difference schemes which involve more points increases until h is equal to a given h* for which η is minimum. If h is further reduced, the accuracy of the denser stencil schemes decreases because of the increase of the relative round-off error, which becomes dominant. In this case, below a certain value of h, it is not convenient to compute the Jacobian matrix using dense stencil schemes. The minimum value of the error in the Jacobian matrix η* and the corresponding value of the perturbation h* are summarized, for each central difference scheme, in Table 6.2.
Figure 6.2: Maximum error in the Jacobian Matrix for the Orbit Raising Problem.
In terms of accuracy, to reduce η* it is convenient to use a central difference scheme which involves more points
and, at the same time, to select a value of the perturbation which is not too small.

Numerical Differentiation Scheme      η*              h*
3-points stencil CD scheme            3.8 · 10^-11    3.6 · 10^-6
5-points stencil CD scheme            7.0 · 10^-13    4.4 · 10^-4
7-points stencil CD scheme            1.1 · 10^-13    2.7 · 10^-3

Table 6.2: Accuracy and step size comparison for the Orbit Raising Problem.
6.1.3 Hang Glider Problem
Here the analysis is repeated for the Hang Glider Problem. Figure 6.3 illustrates, as a function of the step size h, the error in the Jacobian matrix η = ||Jnum − Jan||∞ given by the 3-points, 5-points and 7-points stencil central difference schemes. Also for the Hang Glider Problem, as the step size decreases, the error decreases according to the order of the truncation error. However, this is not generally true. Indeed, if we reduce the size of the perturbation h, the accuracy of the central difference schemes which involve more points increases until h is equal to a given h* for which η is minimum. If h is further reduced, the accuracy of the denser stencil schemes decreases because of the increase of the relative round-off error, which becomes dominant. In this case, below a certain value of h, it is not convenient to compute the Jacobian matrix using dense stencil schemes. The minimum value of the error in the Jacobian matrix η* and the corresponding value of the perturbation h* are summarized, for each central difference scheme, in Table 6.3. In terms of accuracy, to reduce η* it is convenient to use a central difference scheme which involves more points and, at the same time, to select a value of the perturbation which is not too small. It is interesting to underline the following result: if we compare the results summarized in Tables 6.1, 6.2 and 6.3 we can point out that the simpler the equations which describe the dynamics of the problem are, the better the
accuracy of each central difference scheme is, and the larger the value of the step size which minimizes the error in the Jacobian matrix. These results are consistent with the ones achieved in Chapter 2.
Figure 6.3: Maximum error in the Jacobian Matrix for the Hang Glider Problem.
Numerical Differentiation Scheme      η*              h*
3-points stencil CD scheme            5.9 · 10^-11    7.4 · 10^-5
5-points stencil CD scheme            7.8 · 10^-13    7.7 · 10^-3
7-points stencil CD scheme            1.4 · 10^-13    4.9 · 10^-2

Table 6.3: Accuracy and step size comparison for the Hang Glider Problem.
In addition, the results suggest that:

- in case of use of the 3-points stencil central difference scheme, the step size should be selected in the range 1.8 · 10^-6 < h < 7.4 · 10^-5;
- in case of use of the 5-points stencil central difference scheme, the step size should be selected in the range 3.7 · 10^-4 < h < 7.7 · 10^-3;

- in case of use of the 7-points stencil central difference scheme, the step size should be selected in the range 2.7 · 10^-3 < h < 4.9 · 10^-2.
6.2 Jacobian Matrix Generation with Complex-Step Derivative Approach
In this section the complex-step derivative approximation is applied to generate the Jacobian matrix for three different problems: the Space Shuttle Reentry Problem, the Orbit Raising Problem and the Hang Glider Problem.
6.2.1 Space Shuttle Reentry Problem
Figure 6.4 shows, as a function of the step size h, the error in the Jacobian matrix η = ||Jnum − Jan||∞ given by the central difference schemes and the complex-step method. The figure illustrates that the central difference estimates initially converge
Figure 6.4: Maximum error in the Jacobian Matrix for the Space Shuttle Reentry Problem.
to the exact result but, when the size of the perturbation h is reduced below a value which is specific for each stencil, round-off error becomes dominant and the resulting estimates are not reliable. The complex-step derivative approximation, instead, converges with decreasing step size and the estimate is practically insensitive to small sizes of the perturbation h. This trend is due to the fact that, as shown in Chapter 3, the complex-step approximation is affected by truncation error but does not suffer from the round-off error introduced by the difference operation. This is a great advantage over the central difference schemes, which are subject to both truncation and round-off errors. The computation of the Jacobian matrix with the complex-step approximation appears to be more accurate than the one with the central difference schemes. Indeed, in this case, the minimum value of the error in the Jacobian matrix is η* = 7.4 · 10^-12 and the corresponding value of the perturbation is h* = 7 · 10^-9. These results can be compared with the ones summarized in Table 6.1 to assess the better accuracy of the Jacobian matrix generated with the complex-step approximation.
6.2.2 Orbit Raising Problem
The analysis is now repeated for the Orbit Raising Problem. Figure 6.5 shows, as a function of the step size h, the error in the Jacobian matrix η = ||Jnum − Jan||∞ given by the central difference schemes and the complex-step method. The figure shows that, here again, the central difference estimates initially converge to the exact result but, when the size of the perturbation h is reduced below a value which is specific for each stencil, round-off error becomes dominant and the resulting estimates are not reliable. The complex-step derivative approximation, instead, converges with decreasing step size and the estimate is practically insensitive to small sizes of the perturbation h. The computation of the Jacobian matrix with the complex-step approximation appears to be more accurate than the one with the central difference schemes. Indeed, in this case, the minimum value of the maximum error in the Jacobian matrix is η* = 4.4 · 10^-16 and the corresponding value of the perturbation is h* = 1.9 · 10^-9. These results can be compared with the ones summarized in Table 6.2 to assess the better accuracy of the Jacobian matrix generated with the complex-step approximation.
6.2.3 Hang Glider Problem
Here the analysis is repeated for the Hang Glider Problem. Figure 6.6 illustrates, as a function of the step size h, the error in the Jacobian matrix
Figure 6.5: Maximum error in the Jacobian Matrix for the Orbit Raising Problem.
η = ||Jnum − Jan||∞ given by the central difference schemes and the complex-step method. The figure shows that, here too, the central difference estimates initially converge to the exact result but, when the size of the perturbation h is reduced below a value which is specific for each stencil, round-off error becomes dominant and the resulting estimates are not reliable. The complex-step derivative approximation, instead, converges with decreasing step size and the estimate is practically insensitive to small sizes of the perturbation h. The computation of the Jacobian matrix with the complex-step approximation appears to be more accurate than the one with the central difference schemes. Indeed, in this case, the minimum value of the maximum error in the Jacobian matrix is η* = 4.5 · 10^-15 and the corresponding value of the perturbation is h* = 2.1 · 10^-9. These results can be compared with the ones summarized in Table 6.3 to assess the better accuracy of the Jacobian matrix generated with the complex-step approximation.
Figure 6.6: Maximum error in the Jacobian Matrix for the Hang Glider Problem.
6.3 Jacobian Matrix Generation with Dual-Step Derivative Approach
In this section the dual-step derivative approach is employed to generate the Jacobian matrix for three different problems: the Space Shuttle Reentry Problem, the Orbit Raising Problem and the Hang Glider Problem.
6.3.1 Space Shuttle Reentry Problem
Figure 6.7 shows, as a function of the step size h, the error η in the Jacobian matrix given by the central difference schemes, the complex-step approximation and the dual-step approach. The error is computed using the exact analytical Jacobian as reference: η = ||Jnum − Jan||∞. The figure illustrates that, as the step size decreases, the error decreases according to the order of the truncation error of the method. However, after a certain value of h, the error for the central difference approximations begins to grow, while the error for the complex-step approximation continues to decrease until it reaches, and remains at, a minimum value. The error of the dual-step approach, which is not subject to truncation error or round-off error (see Chapter 4), is around the minimum value of the error of the
complex-step, regardless of the selected step size. This means that it is convenient to select a size of the perturbation h = 1, so that the division in (4.4) is avoided when computing each term of the Jacobian matrix while, at the same time, no accuracy is lost. The computation of the Jacobian matrix with the dual-step approach appears to be more accurate than the one with either the central difference schemes or the complex-step approximation. Indeed, even if the minimum value of the error in the Jacobian matrix, η* = 7.4 · 10^-12, is comparable with the one obtained with the complex-step approximation, with the dual-step approach the number of exact derivatives in the Jacobian matrix is increased. In addition, the value of the perturbation h* which minimizes the error η can be selected equal to 1.
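With h = 1 a Jacobian entry (or a whole column) is obtained simply by seeding the dual part of the perturbed variable; a minimal sketch, reusing the Dual class of Chapter 4 on Eq. (5.5) with arbitrary sample values, is the following.

% Entry J15 = d(hdot)/dv of Eq. (5.5), hdot = v*sin(gamma), with the dual step and a1 = 1.
v     = Dual(25600, 1);    % perturbed variable: dual part seeded with 1 (sample value)
gamma = Dual(-0.02, 0);    % all other variables carry a zero dual part (sample value)
hdot  = v*sin(gamma);      % evaluate the equation on dual numbers
J15   = hdot.du;           % exact entry, equal to sin(gamma) at the sample point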
Figure 6.7: Maximum error in the Jacobian Matrix for the Space Shuttle Reentry Problem.
6.3.2 Orbit Raising Problem
Figure 6.8 shows, as a function of the step size h, the error in the Jacobian matrix η = ||Jnum − Jan||∞ given by the central difference schemes, the complex-step method and the dual-step approach. Here again, the figure illustrates that, as the step size decreases, the error decreases according to the order of the truncation error of the method.
However, after a certain value of h, the error for the central difference approximations begins to grow, while the error for the complex-step approximation continues to decrease until it reaches, and remains at, a minimum value. The error of the dual-step approach, which is not subject to truncation error or round-off error (see Chapter 4), is around the minimum value of the error of the complex-step, regardless of the selected step size. This means that, here too, it is convenient to select a size of the perturbation h = 1. The computation of the Jacobian matrix with the dual-step approach appears to be more accurate than the one with either the central difference schemes or the complex-step approximation. Indeed, in this case the minimum value of the error in the Jacobian matrix is η* = 3.3 · 10^-16. In addition, with the use of the dual-step approach, the number of exact derivatives in the Jacobian matrix is increased and the value of the perturbation h* which minimizes the error η can be selected equal to 1.
Figure 6.8: Maximum error in the Jacobian Matrix for the Orbit Raising Problem.
6.3.3 Hang Glider Problem
The analysis is now repeated for the Hang Glider Problem. Figure 6.9 shows, as a function of the step size h, the error in the Jacobian matrix
η = ||Jnum − Jan||∞ given by the central difference schemes, the complex-step method and the dual-step approach. Here too, the figure illustrates that, as the step size decreases, the error decreases according to the order of the truncation error of the method. However, after a certain value of h, the error for the central difference approximations begins to grow, while the error for the complex-step approximation continues to decrease until it reaches, and remains at, a minimum value. The error of the dual-step approach, which is not subject to truncation error or round-off error (see Chapter 4), is around the minimum value of the error of the complex-step, regardless of the selected step size. This means that it is convenient to select a size of the perturbation h = 1, so that the division in (4.4) is avoided when computing each term of the Jacobian matrix while, at the same time, no accuracy is lost. The computation of the Jacobian matrix with the dual-step approach appears
Figure 6.9: Maximum error in the Jacobian Matrix for the Hang Glider Problem.
to be more accurate than the one with either the central difference schemes or the complex-step approximation. Indeed, even if the minimum value of the error in the Jacobian matrix, η* = 4.0 · 10^-15, is comparable with the one obtained with the complex-step approximation, with the dual-step approach the number of exact derivatives in the Jacobian matrix
is increased. In addition, the value of the perturbation h* which minimizes the error in the Jacobian matrix can be selected equal to 1.
6.4 CPU Time Analysis
In this section we consider the aforementioned problems and, for each of them, we compare the CPU time required to compute the Jacobian matrix using the numerical differentiation schemes proposed.
6.4.1 Space Shuttle Reentry Problem
Figure 6.10 shows, as a function of the step size h, the CPU time required for the computation of the Jacobian matrix with the central difference schemes, the complex-step approximation and the dual-step approach. The figure illustrates that, concerning the central difference scheme, the more complex the stencil is, the higher the time required to compute the Jacobian matrix is, as it is reasonable to expect.
Figure 6.10: CPU Time Required for the Space Shuttle Reentry Problem.
The CPU time required to compute the Jacobian matrix with the dual-step approach is higher than the one associated to the complex-step approximation because of the additional computational work associated to the use of
the dual numbers. However, the values are comparable and the dual-step approach is preferred due to the better accuracy. The CPU time associated to the use of either the 7-points or the 5-points stencil central difference schemes is higher than the one required when the dual-step approach is employed. This is caused by the complexity of the equations which describe the problem. Indeed, in this case, the multiple evaluation of the functions required by the central difference schemes is, for this problem, the major contribution to the CPU load, and its cost is higher than the effort associated to the use of the dual-number class.
6.4.2 Orbit Raising Problem
Figure 6.11 shows, as a function of the step size h, the CPU time required for the computation of the Jacobian matrix with the central difference schemes, the complex-step approximation and the dual-step approach. The figure shows that, here again, concerning the central difference scheme, the more complex the stencil is, the higher the time required to compute the Jacobian matrix is.
Figure 6.11: CPU Time Required for the Orbit Raising Problem.
The CPU time required to compute the Jacobian matrix with the dual-step approach is higher than the one associated to the use of the other approximations. This is due to the fact that the use of the dual-step approach
requires the implementation of a new MATLAB class which allows a real-valued function to be converted to operate on dual numbers. When the equations which describe the dynamics of the problem are not complicated, the need to call the MATLAB class and the additional computational work associated to the use of the dual numbers require more CPU time than the other differentiation methods.
6.4.3 Hang Glider Problem
Figure 6.12 shows, as a function of the step size h, the CPU time required for the computation of the Jacobian matrix with the central difference schemes, the complex-step approximation and the dual-step approach. The figure illustrates that, here too, concerning the central difference scheme, the more complex the stencil is, the higher the time required to compute the Jacobian matrix is. Here again, the CPU time required to compute the Jacobian matrix with the dual-step approach is higher than the one associated to the use of the other methods. Indeed, also in this case, the need to call the new MATLAB class and the additional computational work associated to the use of the dual numbers need more CPU time than the one required to implement either the central difference schemes or the complex-step approximation.
Figure 6.12: CPU Time Required for the Hang Glider Problem.
6.5 Analysis of CPU Time vs. Increasing Size of the Problem
In this section a comparison of the CPU time required to compute the Jacobian matrix is performed while varying the size of the problem. In particular, in the following figures, for each of the three problems considered, the CPU time is calculated for an increasing number of nodes n. Figures 6.13, 6.14 and 6.15 show that, for each of the three problems, if n increases from 20 up to 120, the CPU time increases as well, as expected. The CPU time has been calculated selecting, for each of the five methods applied to compute the Jacobian matrix, the corresponding value of the step size h* which minimizes the error. In particular, for the dual-step approach a step size equal to one has been chosen.
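The curves of Figures 6.13-6.15 follow the same simple measurement pattern: for each number of nodes the Jacobian is rebuilt and the elapsed time recorded. The sketch below uses placeholder names (spartan_reference_solution, build_jacobian) for the routines involved.

% CPU time versus number of nodes n for one differentiation scheme.
nodes = 20:20:120;
cpu   = zeros(size(nodes));
for k = 1:numel(nodes)
    [t, X, U] = spartan_reference_solution(nodes(k));   % placeholder call
    tic;
    J = build_jacobian(t, X, U, 1);                     % h = 1, e.g. for the dual-step scheme
    cpu(k) = toc;
end
plot(nodes, cpu, '-o');
xlabel('number of nodes n'); ylabel('CPU time [s]');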
Figure 6.13: CPU Time Required for the Space Shuttle Reentry Problem.
Figure 6.14: CPU Time Required for the Orbit Raising Problem.
Figure 6.15: CPU Time Required for the Hang Glider Problem.
6.6 Conclusions
In this chapter the Jacobian matrices for three different problems (the Space Shuttle Reentry Problem, the Orbit Raising Problem and the Hang Glider Problem) have been generated using different numerical differentiation schemes. The results have been compared in terms of accuracy and CPU time. The dual-step approach has proved to be the most accurate differentiation method for the computation of the Jacobian matrix. Each term of the Jacobian matrix calculated using this approach is subject neither to truncation error nor to round-off error. In addition, using the dual-step approach there is no need to make the step size small, because the best accuracy is achieved regardless of the selected step size and the simplest choice is h = 1, which eliminates the need to divide by the step size. This is an advantage over the use of the central difference schemes and the complex-step approximation. Indeed, the use of either the central difference schemes or the complex-step approximation has proved to be less accurate and, in addition, their accuracy is strongly influenced by the selection of the optimal step size. The optimal step size for these methods is not known a priori and it requires a trade-off between the truncation and round-off errors, as well as a substantial effort and knowledge of the analytical derivative. In terms of CPU time, the time required for the computation of the Jacobian matrix with the dual-step approach is not the smallest one and it depends on the non-linearity and complexity of the equations which describe the dynamics of the problem. Indeed, the use of the dual-step approach requires additional computational work associated to the use of the dual numbers, which are implemented in MATLAB as a new class of numbers. The CPU time is also influenced by the number of nodes and, for each of the three problems analysed, an increase of the number of nodes causes an increase of the CPU time as well, as expected. To conclude, the trade-off between accuracy and CPU power suggests that the dual-step differentiation is a valid alternative to the other well-known numerical differentiation methods, in case the problem is analytically defined (i.e., no look-up tables are part of the data).
Chapter 7
Use of the Advanced Differentiation Schemes for Optimal Control Problems
Overview
In this chapter the differentiation schemes defined in the previous chapters are used to solve optimal control problems. In the first section we formulate a general optimal control problem and focus our attention on the numerical approaches which can be used to solve it. In the following sections SPARTAN, an algorithm developed by DLR and based on the use of the Flipped Radau Pseudospectral Method, is presented, focusing on the computation of the Jacobian matrix associated with the NLP problem. Three numerical examples of optimal control problems are studied: the maximization of the final crossrange in the Space Shuttle reentry trajectory, the maximization of the final specific energy in an orbit raising problem, and the maximization of the final range of a hang glider in the presence of a specified updraft. Each of the three examples is solved using five different differentiation schemes (the 3-point stencil central difference scheme, the 5-point stencil central difference scheme, the 7-point stencil central difference scheme, the complex-step approach and the dual-step method) and two different off-the-shelf, well-known NLP solvers (SNOPT and IPOPT). The results are compared in terms of accuracy and CPU time to highlight the main advantages and drawbacks related to the use of the pseudospectral methods in combination with the dual numbers and with the other differentiation schemes.
7.1 General Formulation of an Optimal Control Problem
Optimal control is a subject where it is desired to determine the inputs to a dynamical system that optimize (i.e., minimize or maximize) a specified performance index while satisfying any constraints on the motion. Indeed, the term optimal control problem refers to a problem where the inputs to the system are themselves functions or static parameters, and it is desired to determine the particular input function and trajectory that optimize a given performance index or objective function. An optimal control problem is posed formally as follows, [8]. Determine the state (equivalently, the trajectory or path) x(t) ∈ R^n, the control u(t) ∈ R^m, the vector of static parameters p ∈ R^q, the initial time t0 ∈ R, and the terminal time tf ∈ R (where t ∈ [t0, tf] is the independent variable) that optimize the cost function

J[x(t), u(t), t; p]   (7.1)

subject to the dynamic constraints (i.e., differential equation constraints)

ẋ(t) = f[x(t), u(t), t; p],   (7.2)

the path constraints

Cmin ≤ C[x(t), u(t), t; p] ≤ Cmax,   (7.3)

and the boundary conditions

Φmin ≤ Φ[x(t), u(t), t; p] ≤ Φmax.   (7.4)

The objective function (7.1), in a Bolza formulation of the OCP, can be expressed as follows

J = φ[x(t0), t0, x(tf), tf; p] + ∫ from t0 to tf of L[x(t), u(t), t; p] dτ   (7.5)
where φ is called the Mayer term and L is called the Lagrange integrand. The differential equations (7.2) describe the dynamics of the system, while the objective (7.1) is the performance index, which can be considered as a measure of the “quality” of the trajectory. When it is desired to minimize the performance index, a lower value of J is preferred; conversely, when it is desired to maximize the performance index, a higher value of J is preferred. With the exception of simple problems (i.e., some special weakly nonlinear
low-dimensional systems), optimal control problems must be solved numerically. The need for solving optimal control problems numerically has given rise to a wide range of numerical approaches. These numerical approaches are divided into two major categories: indirect methods and direct methods, [8]. The indirect methods are based on the calculus of variations, which is used to determine the first-order optimality conditions of the original optimal control problem given in equations (7.1)-(7.4). Unlike ordinary calculus (where the objective is to determine points that optimize a function), the calculus of variations is the subject of determining functions that optimize a functional. A functional is a function from a vector space into its underlying scalar field and, commonly, the vector space is a space of functions; thus the functional is sometimes considered as a function of a function. The indirect approach leads to a multiple-point boundary-value problem that is solved to determine candidate optimal trajectories called extremals. Each of the computed extremals is then examined to see if it is a local minimum, maximum, or a saddle point. Of the locally optimizing solutions, the particular extremal with the lowest cost is chosen. So, the indirect approach solves the problem indirectly by converting the optimal control problem to a boundary-value problem and, as a result, the optimal solution is found by solving a system of differential equations that satisfies endpoint and/or interior point conditions. Direct methods are fundamentally different from indirect methods. In a direct method, the state and/or control of the optimal control problem is discretized in some manner and the problem is transcribed to a nonlinear optimization problem or nonlinear programming problem (NLP). Indeed, an NLP problem is characterized by a finite set of state and control variables, while optimal control problems can involve continuous functions. Therefore, it is convenient to view the optimal control problem as an infinite-dimensional extension of an NLP problem. The general nonlinear programming problem can be stated as follows [1]: find the n-vector x^T = (x1, ..., xn) to minimize the scalar objective function

F(x)   (7.6)

subject to the m constraints

cL ≤ c(x) ≤ cU   (7.7)

(equality constraints can be imposed by setting cL = cU) and the simple bounds

xL ≤ x ≤ xU.   (7.8)
Once the optimal control problem is transcribed to an NLP problem, the NLP can be solved using well-known optimization techniques, [1]. In conclusion, in a direct method the optimal solution is found by transcribing the infinite-dimensional (continuous) optimization problem to a finite-dimensional optimization problem. In particular, one of the most promising techniques is represented by direct collocation methods and, among these, pseudospectral methods are gaining popularity for their straightforward implementation and some interesting properties which are associated with their use [5].
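As a purely illustrative MATLAB sketch of the NLP form (7.6)-(7.8) (the objective, constraint and bounds below are arbitrary example data, and fmincon, which requires the Optimization Toolbox, is used here only as a generic NLP solver, whereas SPARTAN relies on SNOPT or IPOPT):

% Illustrative NLP in the form (7.6)-(7.8): minimize F(x) subject to
% cL <= c(x) <= cU and xL <= x <= xU (arbitrary example data).
F  = @(x) x(1)^2 + x(2)^2;                        % objective function F(x)
xL = [-2; -2];  xU = [2; 2];                      % simple bounds
% Single constraint 1 <= x1 + x2 <= 3, rewritten as inequalities c(x) <= 0.
nonlcon = @(x) deal([1 - (x(1)+x(2)); (x(1)+x(2)) - 3], []);
x0 = [1; 1];                                      % initial guess
x  = fmincon(F, x0, [], [], [], [], xL, xU, nonlcon);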
7.2 SPARTAN
SPARTAN (Shefex-3 Pseudospectral Algorithm for Reentry Trajectory ANalysis) is an optimal control package developed by the DLR. It has already been used in the literature [11, 12], it is the reference tool for the development of the entry guidance for the SHEFEX-3 (Sharp Edge Flight Experiment) mission, and it has been validated with several well-known literature examples. SPARTAN implements the global Flipped Radau Pseudospectral Method (FRPM) to solve constrained or unconstrained optimal control problems, which can have a fixed or variable final time. It belongs to the class of direct methods and has a highly exploited Jacobian structure, as well as routines for automatic linear/nonlinear scaling and auto-validation using the Runge-Kutta 45 scheme. The basic idea of the FRPM is, as in the other direct methods, to collocate the differential equations, the cost function and the constraints in a finite number of points in order to treat them as a set of nonlinear algebraic equations. In this way, the continuous OCP is reduced to a discrete NLP problem which can be solved with one of the well-known available software packages, e.g. SNOPT or IPOPT. In detail, considering the structure of the classical Bolza Optimal Control Problem (7.1)-(7.5) we want to solve, SPARTAN proposes a transcription of the OCP as an NLP based on the choice of some “trial” functions to represent the continuous variables

x(ti) ≅ Xi,   i ∈ [0, N]   (7.9)
u(tj) ≅ Uj,   j ∈ [1, N].   (7.10)
In other words, the continuous states and controls can be substituted with polynomials which interpolate the values in the nodes

x(t) ≅ Σ from i=0 to N of Xi Pi(t)   (7.11)
u(t) ≅ Σ from i=1 to N of Ui Pi(t)   (7.12)

where

Pi(t) = Π from j=0, j≠i, to N of (t − tj)/(ti − tj)   (7.13)

and the tj are the roots of linear combinations of the Legendre polynomials (Figure 7.1) Pn(t) and Pn−1(t). The difference in the indexing in (7.9), (7.10) and in their discrete representations is due to the distinction between discretization and collocation. While the discretization includes (in the FRPM) the initial point, the collocation does not. Hence, the controls will be approximated with a polynomial having a lower order, and the NLP problem will not provide the initial values for the controls. These can be, in some cases, part of the initial set of known inputs; otherwise they can be extrapolated from the generated polynomial interpolating the N values of the controls in the collocation nodes, [5]. In this way the entire information related to the states and the controls is enclosed in their nodal values (a small numerical sketch of this interpolation is given below). Of course, the bounds valid for the continuous form will also be applied to the discrete representation of the functions. In particular, it has been shown that for SPARTAN, as well as for all the pseudospectral methods, the following properties are valid:

- “Spectral” convergence in the case of smooth problems;
- The Runge phenomenon is avoided;
- Sparse structure of the associated NLP problem;
- Differential equations become algebraic constraints evaluated in the collocation points.
In the next section we will focus our attention on the structure of the Jacobian associated with the NLP problem deriving from the transcription implemented by SPARTAN. Indeed, experience shows that, while for simple systems a more detailed analysis of the Jacobian can be avoided, in complex problems like atmospheric reentry a solid knowledge of its structure is very helpful and significantly increases the speed of computation and, in some cases, the quality of the results.
Figure 7.1: Legendre Polynomials of order 5.
7.3 Hybrid Jacobian Computation
In the most general case, considering ns states, nc controls, ng constraints, n collocation points and an unknown final time, the Jacobian associated with the transcription of an autonomous system of equations will be expressed as a matrix having the following dimension, [5]:

dim(J) = [n · (ns + ng) + 1] × [(n + 1) · ns + n · nc + 1].   (7.14)
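As a worked example of (7.14), assuming the values of the Shuttle case of Section 7.4.1 (ns = 6 states, nc = 2 controls, ng = 1 path constraint, n = 100 collocation points):

dim(J) = [100 · (6 + 1) + 1] × [(100 + 1) · 6 + 100 · 2 + 1] = 701 × 807.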
SPARTAN exploits the general structure of the Jacobian associated with the NLP problem deriving from the application of the FRPM. Indeed, to take full advantage of the intrinsic sparsity associated with the use of pseudospectral methods and of the theoretical knowledge contained in the definition of the discrete operator D, the Jacobian is expressed as the sum of three different contributions:

J = J_PseudoSpectral + J_Numerical + J_Theoretical.   (7.15)
For a deeper analysis of the structure of each term of the Jacobian see [5]. The PseudoSpectral and the Theoretical terms are exact. So far, the Numerical Jacobian has been computed in SPARTAN using the complex-step derivative method, so it is not exact, because the complex-step derivative approach is affected by truncation errors. Now, by using the dual-step derivative method it is possible to have an exact Numerical Jacobian, because the dual-step differentiation scheme is subject neither to truncation error nor to round-off error, as demonstrated in the previous chapters.
7.4 Numerical Examples
7.4.1 Space Shuttle Reentry Problem
The construction of the reentry trajectory for the Space Shuttle is a classic example of an OCP. The motion of the vehicle is described by the set of differential algebraic equations defined in Chapter 5.2.1, equations (4.5)-(4.10). The reentry trajectory begins at an altitude where the aerodynamic forces are quite small, with the following initial conditions:

h0 = 260000 ft,   v0 = 25600 ft/s,
φ0 = 0°,          γ0 = −1°,
θ0 = 0°,          ψ0 = 90°.

The final point on the reentry trajectory occurs at the unknown final time tf. The goal is to choose the control variables α(t) and β(t) so that the final cross-range is maximized, which is equivalent to maximizing the final latitude θ(tf). So, the cost function can be defined as follows:

J = θ(tf).   (7.16)
In this case the Jacobian has all the three components and its pattern is shown in [5]. The OCP has been implemented and solved with SPARTAN using different differentiation schemes: the 3-point stencil central difference scheme, the 5-point stencil central difference scheme, the 7-point stencil central difference scheme, the complex-step derivative approach and the dual-step derivative method. Furthermore, the solution is computed with an upper bound on the aerodynamic heating of 70 BTU/ft²/s. Figures 7.2, 7.3 and 7.4 illustrate the time histories of the states, the controls and the constraints which are associated with the solution obtained using the dual-step derivative method and a number of nodes equal to 100. Figure 7.5 shows the discrepancy between the SPARTAN and the propagated (using the Runge-Kutta 45 scheme) solutions. Since this example is taken from [1], we can compare the results. In Figure 7.6 the time histories for the states and the controls are shown as a solid line for the unconstrained solution and as a dotted line for the constrained solution (the one implemented and solved by SPARTAN). The comparison of Figures 7.2-7.4 and 7.6 shows the consistency of the results. Tables 7.1 and 7.2 summarize the results obtained with SPARTAN using the five different differentiation schemes. In the first table SNOPT (Sparse Nonlinear OPTimizer) is used to solve the NLP problem, while in the second table IPOPT (Interior Point OPTimizer) is employed in SPARTAN.
Figure 7.2: States Evolution for the Space Shuttle Reentry Problem.
Figure 7.3: Controls Evolution for the Space Shuttle Reentry Problem.
Figure 7.4: Heat Rate Evolution for the Space Shuttle Reentry Problem.
Figure 7.5: Discrepancy between optimized and propagated solutions for the Space Shuttle Reentry Problem.
Figure 7.6: Shuttle reentry - state and control variables. Results reported in [1]
Figure 7.7: Space Shuttle Reentry Problem - Groundtrack of Trajectory Optimizing Final Crossrange.
NLP solver: SNOPT

100 Nodes
  Method        Mean Error   Max Error   Iter.   CPU (sec)
  CD3           8.647        60.47       13      21.38
  CD5           8.558        60.35       13      16.21
  CD7           8.571        60.45       13      17.39
  Compl.-Step   8.612        60.44       13      15.20
  Dual-Step     8.533        60.45       13      15.37

200 Nodes
  Method        Mean Error   Max Error   Iter.   CPU (sec)
  CD3           1.367        13.23        2      62.18
  CD5           1.363        13.44       10      83.79
  CD7           1.357        13.28       10      82.27
  Compl.-Step   1.383        13.15        3      63.40
  Dual-Step     1.348        13.05       10      80.87

300 Nodes
  Method        Mean Error   Max Error   Iter.   CPU (sec)
  CD3           0.4963       4.378        3      248.29
  CD5           0.5106       4.380        3      267.60
  CD7           0.5171       4.408        3      268.38
  Compl.-Step   0.5337       4.441        3      249.23
  Dual-Step     0.4902       4.350        3      270.06

Table 7.1: Accuracy and CPU Time comparison for the Space Shuttle Problem (SNOPT).
NLP solver: IPOPT

100 Nodes
  Method        Mean Error   Max Error   Iter.   CPU (sec)
  CD3           6.287        62.78       483     472.1
  CD5           6.306        62.80       767     675.8
  CD7           6.280        62.87       687     628.85
  Compl.-Step   6.526        62.84       439     313.6
  Dual-Step     6.285        62.76       574     518.7

200 Nodes
  Method        Mean Error   Max Error   Iter.   CPU (sec)
  CD3           7.402        26.52       3094    1.2·10^4
  CD5           7.547        27.01       6635    2.7·10^4
  CD7           6.867        24.63       1971    8.1·10^3
  Compl.-Step   6.983        25.24       3044    1.2·10^4
  Dual-Step     6.932        21.93       3493    1.3·10^4

300 Nodes
  Method        Mean Error   Max Error   Iter.   CPU (sec)
  CD3           13.917       48.33       1706    2.8·10^4
  CD5           13.911       48.32       1168    1.9·10^4
  CD7           13.920       48.31       1448    2.5·10^4
  Compl.-Step   13.910       48.33       4177    7.2·10^4
  Dual-Step     13.857       47.14       2425    4.1·10^4

Table 7.2: Accuracy and CPU Time comparison for the Space Shuttle Problem (IPOPT).
Figure 7.8 shows the trend of the mean error (in logarithmic scale) between the solutions computed with SPARTAN and the propagated solutions as a function of the number of nodes. For each of the five differentiation schemes we have a “spectral” (exponential) convergence of the solution. The dual-step related plot appears to be much smoother than the complex-step one.
Figure 7.8: Spectral Convergence for the Space Shuttle Reentry Problem.
7.4.2 Orbit Raising Problem
This problem is taken from [5, 9] and deals with the maximization of the total specific energy of a low-thrust spacecraft orbit transfer in a given, fixed time. The orbit is subject to the dynamics expressed by the equations defined in Chapter 5.2.2. The goal is to maximize the total specific energy at the final time, considering that the rocket engine provides a constant thrust acceleration to the spacecraft. Thus, the cost function can be defined as follows:

J = −1/r(tf) + (1/2) [Vr²(tf) + Vt²(tf)].   (7.17)

Since the final time is known, the Jacobian here will only consist of the pseudospectral and numerical contributions [5]. Figures 7.9, 7.10 and 7.12 illustrate the states, the controls and the discrepancy between the optimized and propagated solutions. These results are obtained using the dual-step derivative approach, with a number of nodes equal to 110. Figure 7.11 shows the trajectory optimizing the final orbit energy. As can be seen, the optimal trajectory is a multirevolution spiral away from the attracting body [9], which has its center of mass located at the origin. Since this example is solved in [9], we can compare the results in Figures 7.9, 7.10 and 7.13.
Figure 7.9: States Evolution for the Orbit Raising Problem.
Figure 7.10: Control Evolution for the Orbit Raising Problem.
Figure 7.11: Orbit Raising Problem - Trajectory Optimizing Final Orbit Energy (LU=Unitary Length).
Figure 7.12: Discrepancy between optimized and propagated solutions for the Orbit Raising Problem.
Figure 7.13: Orbit Raising - state and control variables. Results reported in [9].
Tables 7.3 and 7.4 summarize the results obtained with SPARTAN using the five different differentiation schemes and two different NLP solvers.

NLP solver: SNOPT

100 Nodes
  Method        Mean Error (·10^-5)   Max Error (·10^-4)   Iter.   CPU (sec)
  CD3           1.482                 3.392                35      12.94
  CD5           1.481                 3.333                35      12.11
  CD7           1.481                 3.334                35      12.16
  Compl.-Step   1.482                 3.333                35      12.19
  Dual-Step     1.480                 3.333                35      15.37

200 Nodes
  Method        Mean Error (·10^-5)   Max Error (·10^-4)   Iter.   CPU (sec)
  CD3           5.081                 3.233                28      61.54
  CD5           1.791                 1.131                28      64.99
  CD7           1.791                 1.314                28      64.71
  Compl.-Step   1.792                 1.132                28      62.70
  Dual-Step     1.791                 1.132                28      67.29

300 Nodes
  Method        Mean Error (·10^-5)   Max Error (·10^-4)   Iter.   CPU (sec)
  CD3           3.283                 3.621                10      18.88
  CD5           1.306                 3.401                10      16.15
  CD7           1.307                 3.406                10      16.36
  Compl.-Step   1.307                 3.406                10      15.76
  Dual-Step     1.021                 3.422                10      21.65

Table 7.3: Accuracy and CPU Time comparison for the Orbit Raising Problem (SNOPT).
NLP solver: IPOPT

100 Nodes
  Method        Mean Error   Max Error   Iter.   CPU (sec)
  CD3           0.0019       0.0118      87      47.83
  CD5           0.0019       0.0118      72      40.65
  CD7           0.0019       0.0118      86      46.99
  Compl.-Step   0.0019       0.0118      110     58.13
  Dual-Step     0.0019       0.0118      75      42.0

200 Nodes
  Method        Mean Error   Max Error   Iter.   CPU (sec)
  CD3           0.0037       0.0225      90      333.5
  CD5           0.0037       0.0225      87      317.7
  CD7           0.0037       0.0225      108     388.8
  Compl.-Step   0.0037       0.0225      101     360.8
  Dual-Step     0.0037       0.0225      114     394.4

300 Nodes
  Method        Mean Error   Max Error   Iter.   CPU (sec)
  CD3           0.0054       0.0330      184     1.61·10^3
  CD5           0.0054       0.0330      103     934.9
  CD7           0.0054       0.0330      126     1.09·10^3
  Compl.-Step   0.0054       0.0330      125     1.13·10^3
  Dual-Step     0.0054       0.0330      175     1.56·10^3

Table 7.4: Accuracy and CPU Time comparison for the Orbit Raising Problem (IPOPT).
Figure 7.14 shows the trend of the mean error (in logarithmic scale) between the solutions computed with SPARTAN and the propagated solutions as a function of the number of nodes.
Figure 7.14: Spectral Convergence for the Orbit Raising Problem.
7.4.3 Hang Glider Problem
This problem is taken from [1]; it describes the optimal control of a hang glider in the presence of a specified thermal updraft. The state equations which describe the planar motion of the hang glider are the ones defined in Chapter 5.2.3. The final time tf is free and the final range x(tf) has to be maximized. Therefore, the cost function can be defined as follows

J = x(tf).   (7.18)

The lift coefficient is bounded,

0 ≤ cL ≤ 1.4,   (7.19)

and the following boundary conditions are imposed:

x(0) = 0 m,                     x(tf): free,
y(0) = 1000 m,                  y(tf) = 900 m,
vx(0) = 13.227567 m/s,          vx(tf) = 13.227567 m/s,
vy(0) = −1.28750052 m/s,        vy(tf) = −1.28750052 m/s.
Figures 7.15, 7.16 and 7.17 illustrate the states, the controls and the discrepancy between the optimized and propagated solutions. These results are obtained using the dual-step derivative method, with a number of nodes equal to 100. Since this example is solved in [1], we can compare the results. The comparison of Figures 7.15, 7.16 and 7.18 shows the consistency of the results. In more detail, in Figure 7.18 the dashed line is the initial guess, which has been computed using linear interpolation between the boundary conditions, with x(tf) = 1250 and cL(0) = cL(tf) = 1. Tables 7.5 and 7.6 summarize the results obtained with SPARTAN using the five different differentiation schemes. In the first table SNOPT is used to solve the NLP problem, while in the second table IPOPT is employed in SPARTAN as the NLP solver. Furthermore, Figure 7.19 shows the trend of the mean error (in logarithmic scale) between the optimized and propagated solutions as a function of the number of nodes. For each of the five differentiation schemes we have a “spectral” (exponential) convergence of the solution, as expected from the use of pseudospectral methods.
Figure 7.15: States Evolution for the Hang Glider Problem.
Figure 7.16: Control Evolution for the Hang Glider Problem.
Figure 7.17: Discrepancy between optimized and propagated solutions for the Hang Glider Problem.
Figure 7.18: Hang Glider - state and control variables. Results reported in [1]
NLP solver: SNOPT

100 Nodes
  Method        Mean Error   Max Error   Iter.   CPU (sec)
  CD3           0.0068       0.0821      28      22.65
  CD5           0.0068       0.0816      29      21.87
  CD7           0.0068       0.0812      26      21.68
  Compl.-Step   0.0068       0.0823      26      20.64
  Dual-Step     0.0068       0.0823      26      20.57

200 Nodes
  Method        Mean Error   Max Error   Iter.   CPU (sec)
  CD3           0.0010       0.0524      34      70.59
  CD5           0.0010       0.0526      34      85.61
  CD7           0.0012       0.0518      23      58.71
  Compl.-Step   0.0010       0.0524      31      63.87
  Dual-Step     0.0010       0.0524      31      65.20

300 Nodes
  Method        Mean Error (·10^-4)   Max Error   Iter.   CPU (sec)
  CD3           5.267                 0.0102      23      279.87
  CD5           5.127                 0.0095      29      266.83
  CD7           5.862                 0.0103      29      257.05
  Compl.-Step   4.519                 0.0100      29      271.11
  Dual-Step     4.519                 0.0100      29      262.21

Table 7.5: Accuracy and CPU Time comparison for the Hang Glider Problem (SNOPT).
NLP solver: IPOPT

100 Nodes
  Method        Mean Error   Max Error   Iter.   CPU (sec)
  CD3           0.0070       0.0817      80      42.32
  CD5           0.0070       0.0817      89      39.65
  CD7           0.0070       0.0817      89      38.14
  Compl.-Step   0.0070       0.0817      89      40.42
  Dual-Step     0.0070       0.0817      89      38.47

200 Nodes
  Method        Mean Error   Max Error   Iter.   CPU (sec)
  CD3           0.0015       0.0521      133     344.5
  CD5           0.0015       0.0521      115     332.6
  CD7           0.0015       0.0521      91      359.23
  Compl.-Step   0.0015       0.0521      116     343.6
  Dual-Step     0.0015       0.0521      117     338.1

300 Nodes
  Method        Mean Error   Max Error   Iter.   CPU (sec)
  CD3           0.0011       0.0099      136     1.59·10^3
  CD5           0.0011       0.0099      125     1.65·10^3
  CD7           0.0011       0.0099      144     1.32·10^3
  Compl.-Step   0.0011       0.0099      178     1.31·10^3
  Dual-Step     0.0011       0.0099      137     578.36

Table 7.6: Accuracy and CPU Time comparison for the Hang Glider Problem (IPOPT).
Figure 7.19: Spectral Convergence for the Hang Glider Problem.
7.5 Conclusions
In this chapter the general formulation of an optimal control problem and the main numerical approaches used to solve it have been briefly summarized. We have focused on SPARTAN and on the structure of the Jacobian matrix which describes the discrete, transcribed OCP, that is, the resulting NLP. Then, the advanced differentiation schemes defined in the previous chapters have been implemented in SPARTAN to solve three different examples of optimal control problems. Each of the three OCPs has been solved using two different NLP solvers (SNOPT and IPOPT). The results in terms of accuracy and CPU time show that the effects of the use of the dual-step derivative method, as well as of the other schemes, in combination with the pseudospectral methods are strongly influenced by the nonlinear behaviour of the equations which describe the problem, by the number of nodes used to discretize the problem under analysis, and by the NLP solver which has been selected. Overall, among the NLP solvers, SNOPT has been demonstrated to provide better accuracy than IPOPT and, in addition, the computational power required for the greater accuracy is far less than the one required by IPOPT. Furthermore, considering SNOPT, the dual-step method provides better accuracy than the other differentiation schemes as the number of nodes increases, and the improvements in accuracy are paid in terms of an increasing CPU time. On the other hand, considering IPOPT, there are no significant differences in terms of accuracy when different differentiation schemes are implemented in SPARTAN, but these schemes have different effects in terms of CPU time. In some cases, the dual-step method in combination with IPOPT provides significant savings in CPU time, which leads to a faster optimization. In addition, the trend of the mean error between the SPARTAN and the propagated (using the Runge-Kutta 45 scheme) solutions as a function of the number of nodes has been analysed for each of the three problems, to verify the “spectral” convergence of the solution. In conclusion, it is not possible to define a priori the most convenient differentiation method to be implemented in SPARTAN. Therefore a trade-off between the desired quality of the results (e.g., hypersensitive problems) and the CPU time (e.g., trajectory database generation) can be found according to the specific number of nodes, to the NLP solver used and to the behaviour of the equations which describe the problem under analysis. However, the dual-step method has been demonstrated to be a valid alternative to the other traditional, well-known differentiation schemes, and it is worth being considered as a valid method to solve the OCP with SPARTAN.
Chapter 8
Further Tools: Robust Differentiation via Sliding Mode Technique
Overview
In the previous chapters, different differentiation schemes have been analysed to define the method that provides the best accuracy in the computation of the derivatives and of the Jacobian matrix. This analysis is of great importance considering that gradient methods for solving NLPs require the computation of the derivatives of the objective function and constraints. Therefore, the accuracy of the derivatives has a strong impact on the computational efficiency and reliability of the solution. In this chapter we deal with the problem of the differentiation of signals given in real time, with the aim to design a robust differentiator. Real-time differentiation is an old and well-studied problem, and the main difficulty is the obvious sensitivity of differentiation to input noise. Given an input signal consisting of a noise and an unknown base signal, the goal of a robust differentiator is to find real-time robust estimations of the derivatives of the base signal which are exact in the absence of measurement noise. Combining differentiation exactness with robustness with respect to possible measurement errors and input noise is a challenging task, and one particular approach to robust differentiator design is the so-called robust differentiation via sliding mode technique. In the first section the main concepts of the theory of sliding mode control are briefly summarized to underline the potential advantages of using
discontinuous, switching control laws. Furthermore, the use of sliding surfaces for control design is examined at a tutorial level. In the second section we focus our attention on the design of a robust differentiator based on the sliding mode technique. Two sliding mode robust differentiators are examined on tutorial examples with simulation plots.
8.1 Sliding Mode Technique
8.1.1 Theory of Sliding Mode Control
In the formulation of any control problem, there will typically be discrepancies between the actual plant and the mathematical model adopted to design the controller. These discrepancies usually derive from plant parameter uncertainties and unmodelled dynamics. Moreover, known or unknown external disturbances can deteriorate the performance of the system. The need to design control systems which are able to provide the required performance levels in a real environment, despite the presence of these plant mismatches, has led to an intense interest in the development of the so-called robust control methods. The goal of robust design is to retain assurance of system performance in spite of model inaccuracies and disturbances. Indeed, when we design a control system, the mathematical model, based on numerous assumptions, incorporates two important problems that are often encountered: a disturbance signal which is added to the control input to the plant, and noise that is added to the sensor output. A robust control system exhibits the desired performance despite these plant uncertainties. One particular approach to robust controller design is the so-called sliding mode control technique. Let us briefly summarize the basic idea of a sliding mode. Sliding mode control is a particular type of variable structure control system (VSC). These systems are characterized by a set of feedback control laws and a decision rule, known as the switching function. Consider the dynamic system, [6]:

x^(n) = f(x, t) + b(x, t) u(t) + d(t)   (8.1)

where:
- u(t) is the scalar control input;
- x is the scalar output of interest;
- x = [x, ẋ, ..., x^(n−1)]^T is the state;
- f(x, t) is not exactly known; it is assumed to be a continuous function in x, in general nonlinear, but the extent of the imprecision on f(x, t) is upper bounded by a known continuous function of x and t;
- b(x, t) is a control gain, not exactly known, but of known sign, bounded by known, continuous functions of x and t, and assumed to be a continuous function of x;
- the disturbance d(t) is unknown but bounded in absolute value by a known continuous function of time.
The control problem is to get the state x to track a specific state xref = [xd, ẋd, ..., xd^(n−1)]^T in the presence of model imprecision on f(x, t) and b(x, t) and of disturbances d(t). Defining the tracking error vector x̃ := x − xref = [x̃, x̃˙, ..., x̃^(n−1)], we must assume

x̃|t=0 = 0   (8.2)

to guarantee that the control problem provides the aforementioned performances using a finite control u. It is possible to define a time-varying sliding surface s(t) in the state-space R^n as s(x̃; t) = 0, with

s(x̃; t) := (d/dt + λ)^(n−1) x̃   (8.3)

where λ is a positive constant. For instance, if n = 2 we have s = x̃˙ + λx̃. Given the initial condition (8.2), the problem of tracking x ≡ xref is equivalent to that of remaining on the surface s(t) for all t > 0. Indeed, s ≡ 0 represents a linear differential equation whose unique solution is x̃ ≡ 0, given the initial condition (8.2). Therefore, the problem of reaching the n-dimensional vector xref can be reduced to the problem of driving the scalar sliding variable s to zero. This is an advantage because, if we compare equations (8.1) and (8.3), we have a reduced-order compensated dynamics. A sufficient condition for such a positive invariance of s(t) is to choose the control law u so that, outside of s(t),

(1/2) d/dt [s²(x; t)] ≤ −η|s|   (8.4)

where η is a positive constant. Relation (8.4) is equivalent to

s ṡ ≤ −η|s|   ⟺   ṡ sign(s) ≤ −η.   (8.5)
Indeed, the sign function has the important property that s · sign(s) = |s|. Inequality (8.4), often termed either the sliding condition or the reachability condition, constrains trajectories to point towards the sliding surface s(t) and to remain on it thereafter. The idea behind conditions (8.3) and (8.4) is to pick a well-behaved function of the tracking error, s, according to (8.3), and then select the feedback control law u in (8.1) such that s² satisfies equation (8.4) despite the presence of model imprecision and of disturbances, [6]. The control u that drives the state variables to the sliding surface s in a finite time, and keeps them on the surface thereafter in the presence of bounded disturbances, is called a sliding mode controller, and an ideal sliding mode is said to be taking place in the system. Control laws that satisfy (8.4) are discontinuous across the sliding surface. In more detail, sliding mode control is usually a high-frequency switching control, with a switching frequency which is finite due to the discrete-time nature of the computer simulation. In practice, this high-frequency switching control causes control chattering, i.e., a finite-frequency “zig-zag” motion in the sliding mode. In an ideal sliding mode the switching frequency is supposed to approach infinity and the amplitude of the “zig-zag” motion tends to zero.
8.1.2 Example
The main advantages of the sliding mode control, including robustness, finite-time convergence, and reduced-order compensated dynamics, are demonstrated on a tutorial example taken from [7]. In the example, the single-dimensional motion of a unit mass is considered. Introducing variables for the position and the velocity, x1 = x and x2 = ẋ1, a state-variable description is the following:

ẋ1 = x2
ẋ2 = u + f(x1, x2, t),   (8.6)

where u is the control force, and the disturbance term f(x1, x2, t), which may include dry and viscous friction as well as any other unknown resistance forces, is assumed to be bounded, i.e., |f(x1, x2, t)| ≤ L > 0. The problem is to design a feedback control law u = u(x1, x2) that drives the mass to the origin asymptotically (t → ∞ ⇒ x1, x2 → 0). First, we introduce a new variable, called the sliding variable, in the state space of the system (8.6):

σ = σ(x1, x2) = x2 + c x1,   c > 0.   (8.7)
Figure 8.1: Sliding Variable and Sliding Mode Control
To achieve asymptotic convergence of the state variables x1, x2 to zero in the presence of bounded disturbances, we have to drive the variable σ to zero in a finite time by means of the control u. The equation σ = x2 + cx1 = 0, c > 0, corresponds to a straight line in the state space of the system (8.6) and is the so-called sliding surface. The sliding mode control u = u(x1, x2) that drives the state variables x1, x2 to the sliding surface in finite time, and keeps them on the surface thereafter in the presence of the bounded disturbance f(x1, x2, t), is suggested in [7] as the following:

u = u(x1, x2) = −c x2 − ρ sign(σ),   (8.8)

where ρ is a given control gain. The results of the simulation of system (8.6) with the sliding mode control law (8.8), the initial conditions x1(0) = 1, x2(0) = −2, the control gain ρ = 2, the parameter c = 1.5, and the disturbance f(x1, x2, t) = sin(2t) are presented. Figure 8.1 illustrates the finite-time convergence of the sliding variable to zero and the sliding mode control. As previously said, the sliding mode control is a high-frequency switching control, and the zoom in the figure shows a finite-amplitude, finite-frequency “zig-zag” motion in the sliding mode due to the discrete-time nature of the computer simulation when the sign function is implemented. Figure 8.2 shows the asymptotic convergence of
Figure 8.2: Asymptotic Convergence and State Trajectory for f (x1 , x2 , t) = sin(2t) and u(x1 , x2 ) = −cx2 − ρsign(σ).
the state variables to zero and the state trajectory, in the presence of the external bounded disturbance f (x1 , x2 , t) = sin(2t) and of the sliding mode control u(x1 , x2 ) = −cx2 − ρsign(σ). It is possible to identify a reaching phase, when the state trajectory is driven towards the sliding surface, and a sliding phase, when the state trajectory is moving along the sliding surface towards the origin. So far, we have assumed that all the state variables are measured (available). In many cases, the full state is not available. As a consequence, the definition of sliding surface as given in (8.3) is not adequate in this case. By means of some modifications in the algorithm, it is possible to define a sliding mode observer which can be treated as a differentiator, since the variable it estimates is a derivative of the measured variable.
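A minimal MATLAB sketch of this simulation follows (this is not the thesis code; the forward Euler step and the simulation horizon are illustrative assumptions, while the gains and the disturbance are those quoted above).

% Minimal sketch: sliding mode control of a unit mass, system (8.6) with law (8.8).
c   = 1.5;              % sliding surface slope
rho = 2;                % control gain
dt  = 1e-3;             % integration step (illustrative)
T   = 10;               % simulation time (illustrative)
N   = round(T/dt);
x1 = zeros(1,N); x2 = zeros(1,N); sigma = zeros(1,N); u = zeros(1,N);
x1(1) = 1;  x2(1) = -2;                      % initial conditions
for k = 1:N-1
    t        = (k-1)*dt;
    sigma(k) = x2(k) + c*x1(k);              % sliding variable (8.7)
    u(k)     = -c*x2(k) - rho*sign(sigma(k));% sliding mode control (8.8)
    f        = sin(2*t);                     % bounded disturbance
    x1(k+1)  = x1(k) + dt*x2(k);             % forward Euler integration of (8.6)
    x2(k+1)  = x2(k) + dt*(u(k) + f);
end
plot((0:N-1)*dt, [x1; x2]); grid on
legend({'x_1','x_2'}); xlabel('t [s]')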
8.2 Sliding Mode Robust Differentiators
Construction of a special differentiator may often be avoided. For example, if the signal satisfies a certain differential equation or is an output of some known dynamic system, the derivative of the given signal may be calculated as a derivative with respect to some known dynamic system. Thus, the
problem is reduced to the well-known observation and filtering problems. In other cases the construction of a differentiator is inevitable. However, the ideal differentiator cannot be realized. Indeed, together with the basic signal, it would also have to differentiate any small noise, which always exists and may have a large derivative. If nothing is known about the structure of the signal except some differential inequalities, the sliding mode technique is used. In this section two robust differentiators based on the sliding mode technique are examined at a tutorial level.
8.2.1 Fifth-Order Differentiator
Let the input signal f(t) be a function defined on [0, ∞) consisting of a bounded Lebesgue-measurable noise with unknown features and of an unknown base signal f0(t), whose 6th derivative has a known Lipschitz constant L > 0. The problem of finding real-time robust estimations of ḟ0(t), f̈0(t), ..., f0^(5)(t) which are exact in the absence of measurement noises is solved by the fifth-order differentiator defined in [7]:

ż0 = ν0,   ν0 = −8 L^(1/6) |z0 − f(t)|^(5/6) sign(z0 − f(t)) + z1
ż1 = ν1,   ν1 = −5 L^(1/5) |z1 − ν0|^(4/5) sign(z1 − ν0) + z2
ż2 = ν2,   ν2 = −3 L^(1/4) |z2 − ν1|^(3/4) sign(z2 − ν1) + z3
ż3 = ν3,   ν3 = −2 L^(1/3) |z3 − ν2|^(2/3) sign(z3 − ν2) + z4
ż4 = ν4,   ν4 = −1.5 L^(1/2) |z4 − ν3|^(1/2) sign(z4 − ν3) + z5
ż5 = −1.1 L sign(z5 − ν4),      |f^(6)(t)| ≤ L.
The fifth-order differentiator is applied, with L = 1, to differentiate the function f(t) = sin(0.5t) + cos(0.5t). The initial values of the differentiator are taken equal to zero. Convergence of the differentiator is demonstrated in Figures 8.3, 8.4, 8.5, and 8.6. Figure 8.3 illustrates the comparison between the analytical derivatives of the base signal f(t) = sin(0.5t) + cos(0.5t) and the outputs of the fifth-order differentiator, in the absence of measurement noises. Figure 8.4 shows the errors, which are evaluated using the analytical derivatives as reference. Figures 8.5 and 8.6 illustrate the results of the fifth-order differentiator and the corresponding errors in the presence of a normally distributed random noise with mean 0.01 and standard deviation 0.1. In practice, the fifth-order differentiator provides accurate estimations of the derivatives of the base signal f(t) = sin(0.5t) + cos(0.5t) after a reasonably short transient, in spite of the presence of measurement noises.
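A minimal MATLAB sketch (not the thesis code) of how the differentiator equations above can be integrated is the following; the forward Euler step, the horizon and the noise-free input are illustrative assumptions.

% Minimal sketch: fifth-order sliding mode differentiator applied to
% f(t) = sin(0.5 t) + cos(0.5 t), with L = 1 and zero initial values.
L  = 1;                         % bound on |f0^(6)(t)|
dt = 1e-4;                      % integration step (illustrative)
T  = 20;  N = round(T/dt);      % simulation horizon (illustrative)
z  = zeros(6,1);                % differentiator states z0..z5
Z  = zeros(6,N);                % stored estimates
for k = 1:N
    t  = (k-1)*dt;
    f  = sin(0.5*t) + cos(0.5*t);                          % input signal (no noise)
    v0 = -8  *L^(1/6)*abs(z(1)-f )^(5/6)*sign(z(1)-f ) + z(2);
    v1 = -5  *L^(1/5)*abs(z(2)-v0)^(4/5)*sign(z(2)-v0) + z(3);
    v2 = -3  *L^(1/4)*abs(z(3)-v1)^(3/4)*sign(z(3)-v1) + z(4);
    v3 = -2  *L^(1/3)*abs(z(4)-v2)^(2/3)*sign(z(4)-v2) + z(5);
    v4 = -1.5*L^(1/2)*abs(z(5)-v3)^(1/2)*sign(z(5)-v3) + z(6);
    v5 = -1.1*L*sign(z(6)-v4);
    z  = z + dt*[v0; v1; v2; v3; v4; v5];                   % forward Euler step
    Z(:,k) = z;                  % z(2) tracks f0', z(3) tracks f0'', and so on
end
plot((0:N-1)*dt, Z(2,:)); grid on                           % estimated first derivative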
Figure 8.3: Fifth-Order Differentiator, without noise.
Figure 8.4: Fifth-Order Differentiator Errors, without noise.
Figure 8.5: Fifth-Order Differentiator, with noise.
Figure 8.6: Fifth-Order Differentiator Errors, with noise.
The transient time is a function of the gain L, and it can be shortened according to the performance requirements.
8.2.2 Second-Order Nonlinear System Observer
This numerical example is taken from [6] and is proposed here again with some modifications. Let us consider a second-order nonlinear system, consisting of a mass connected to a nonlinear spring in the presence of dynamic and static friction:

ẋ1 = x2
ẋ2 = −κ x1³ − f(x1, x2) + u
z  = x1 + ν

where ν is the measurement noise, κ is a constant nonlinear spring coefficient, and f(x1, x2) = Fs x1 + Fd x2 represents dynamic and static friction. For this system the sliding mode observer suggested in [6] is:

dx̂1/dt = −α1 z̃ + x̂2 + k1 sign(z̃)
dx̂2/dt = −α2 z̃ − κ̂ x̂1³ − f̂(x̂1, x̂2) + u + k2 sign(z̃)
z̃ = x̂1 − z.

The numerical values used in the simulations are

κ = 1.0,   Fs = 1.0,   Fd = 0.75,   (8.9)

while the estimated values used in the observer are

κ̂ = 1.0,   F̂s = 1.25,   F̂d = 1.00,   α1 = 3.8,   α2 = 7.2.

The true system is excited by a sinusoidal input u = sin(t) and the initial conditions are x1(0) = 0.0 and x2(0) = 0.5, with the estimated initial conditions x̂1(0) = 0.0 and x̂2(0) = 0.2.
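A minimal MATLAB sketch of this simulation in the noise-free case follows (not the thesis code; the forward Euler step and the horizon are illustrative assumptions, and the gains k1 = 0.2 and k2 = 0.5 are the values quoted below for the noise-free run).

% Minimal sketch: true system and sliding mode observer, noise-free case (nu = 0).
kappa = 1.0;  Fs = 1.0;  Fd = 0.75;         % true parameters
kap_h = 1.0;  Fs_h = 1.25;  Fd_h = 1.00;    % estimated parameters used by the observer
a1 = 3.8;  a2 = 7.2;  k1 = 0.2;  k2 = 0.5;  % observer gains
dt = 1e-3;  T = 20;  N = round(T/dt);       % integration setup (illustrative)
x  = [0.0; 0.5];                            % true initial state
xh = [0.0; 0.2];                            % estimated initial state
X = zeros(2,N);  Xh = zeros(2,N);
for k = 1:N
    t  = (k-1)*dt;
    u  = sin(t);                            % sinusoidal excitation
    z  = x(1);                              % measurement (nu = 0)
    zt = xh(1) - z;                         % output estimation error
    dx  = [x(2); -kappa*x(1)^3 - (Fs*x(1) + Fd*x(2)) + u];        % true dynamics
    dxh = [-a1*zt + xh(2) + k1*sign(zt);                          % observer dynamics
           -a2*zt - kap_h*xh(1)^3 - (Fs_h*xh(1) + Fd_h*xh(2)) + u + k2*sign(zt)];
    x  = x  + dt*dx;                        % forward Euler step
    xh = xh + dt*dxh;
    X(:,k) = x;  Xh(:,k) = xh;
end
plot((0:N-1)*dt, X(2,:), (0:N-1)*dt, Xh(2,:)); grid on
legend({'x_2 (true)','x_2 (estimated)'})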
Figure 8.7: True and Estimated State Variables, without noise.
The simulation results, in the absence of measurement noise (ν = 0), are shown in Figure 8.7. In this case, values of k1 and k2 equal to 0.2 and 0.5, respectively, have been used. The figure shows that the state variables estimated by the observer converge to the true states after a short transient phase. Figures 8.8 and 8.9 illustrate the results of the simulation in the presence of a normally distributed random noise ν with mean 0.01 and standard deviation 0.1. In this case, values of k1 and k2 equal to 0.25 and 0.65, respectively, have been used. The figures show the convergence of the estimated states to the true states and, in particular, Figure 8.9 illustrates that the position estimated by the observer is much closer to the true position than the measured one. These simulations show that, in spite of the parameter mismatch, the sliding mode observer provides adequate performance.
8.3 Conclusions
In this chapter the sliding mode technique and robust differentiation via sliding mode have been analysed. The main concepts of sliding mode control for tracking problems, as well as state and input observation, have been briefly summarized.
Figure 8.8: True and Estimated State Variables, with noise.
Figure 8.9: Comparison between True, Estimated and Measured Position.
The main advantages of the sliding mode control, such as the robustness, the finite-time convergence, and the reduced-order compensated dynamics, have been demonstrated on a tutorial example. The need for the employment of the sliding mode technique to construct a robust observer/differentiator has been justified, and two robust differentiators based on the sliding mode technique have been analysed. Robust differentiation via sliding mode control has been proved to be able to provide accurate, real-time estimations of the derivatives of a base signal when nothing is known about its structure except some differential inequalities, in spite of the presence of measurement noises.
Chapter 9
Conclusions
9.1 Lessons Learned
This thesis has been focused on analysing advanced differentiation schemes and implementing them in SPARTAN, an algorithm developed by DLR to solve OCPs. One of the most important computational issues that arise in the numerical solution of OCPs is the computation of the derivatives of the objective and constraint functions. Therefore, the aim of this thesis has been to define and test advanced differentiation schemes to assess how they perform, in terms of accuracy and CPU time, when implemented in combination with pseudospectral methods. The central difference schemes have been proved to be more accurate than the backward and forward difference schemes. To improve the accuracy of the central difference schemes it is necessary to reduce the truncation error by reducing the value of the step h. However, making h too small is not desirable, otherwise the round-off error becomes dominant. Therefore, the optimal step size for these methods is not known a priori, and it requires a trade-off between the truncation and round-off errors as well as a substantial effort and knowledge of the analytical derivative. The complex-step derivative approach has been proved to provide greater accuracy than the finite difference formulas, for the first derivatives, by eliminating the round-off error. Therefore, the step size h can be made small enough that the truncation error is effectively zero, without worrying about the subtraction error. The dual numbers have been presented as a tool for differentiation, not only for optimal control, but also as a tool that can be used with existing codes. Indeed, the dual numbers have been implemented as a new class of numbers, using operator overloading, in MATLAB. The new class allows a real-valued
analysis code to be easily converted to operate on dual numbers by just changing the variable type declarations; the structure of the code remains unchanged. The dual-step approach has been proved to be able to provide exact first-order derivatives. Indeed, the dual-step method is subject neither to the truncation error nor to the round-off error and, as a consequence, the error in the first derivative estimate is machine zero regardless of the selected step size. The disadvantage is the computational cost, due to the fact that working with the dual numbers requires additional computational work. In addition, like the complex-step approximation, the dual-step method requires an analytically defined function. The dual-step approach is the most accurate method for the computation of the Jacobian matrix, even if the improvements in accuracy are paid in terms of computational power. However, the trade-off between accuracy and CPU power suggests that the dual-step differentiation is a valid alternative to the other well-known numerical differentiation schemes, in case the problem is analytically defined (i.e., no look-up tables are part of the data). Hyper-dual numbers have been presented as a higher-dimensional extension of dual numbers. The hyper-dual class, in MATLAB, can be used to compute exact first and second derivatives to form Gradients and Hessians which are exact. Focusing on SPARTAN, two NLP solvers (SNOPT and IPOPT) have been tested in combination with the aforementioned differentiation schemes. Overall, SNOPT is preferable for the greater accuracy and for the smaller CPU time required with respect to IPOPT, while, among the differentiation schemes, a trade-off between the desired quality of the results and the CPU time will suggest the most suitable scheme. Indeed, the results in terms of accuracy and CPU time show that the effects of the use of the dual-step method, as well as of the other schemes, in combination with the pseudospectral methods are strongly influenced by the nonlinear behaviour of the equations which describe the problem and by the number of nodes used to discretize the problem under analysis. In addition, the use of the dual differentiation method has given the possibility to prove the spectral convergence for problems which are very different from each other. Furthermore, the thesis focused on the sliding mode technique and on robust differentiation via sliding mode. The main advantages of the sliding mode control, such as the robustness and the finite-time convergence, have been taken into account to design and test a robust differentiator/observer. Robust differentiation via sliding mode control provides accurate, real-time estimations of the derivatives of a base signal when nothing is known about its structure except some differential inequalities, in spite of the presence
of measurement noises.
9.2 Future Developments
The future developments can be summarized in three points:
- Extension of the dual numbers class into a dual quaternion class, to have a stable and efficient representation of the 6-DOF motion.
- Extension of the hyper-dual class to work with the Hessian matrix.
- Development of nonlinear adaptive controllers based on robust differentiation via the sliding mode technique.
Appendix A
Dual and Hyper-Dual Numbers
A.1 Introduction
This appendix provides the necessary background that is required for the computation of the derivatives of a generic function using either the dual numbers or the hyper-dual numbers. For further details see [16].
A.2 Dual Numbers
A.2.1 Definition
In linear algebra, the dual numbers extend the real numbers by adjoining one new element ε with the property ε² = 0 (ε is nilpotent). The collection of dual numbers forms a particular two-dimensional commutative associative algebra over the real numbers. Every dual number has the form

z = a + bε   (A.1)

with a and b uniquely determined real numbers; in particular,

a = real(z)   (real part),
b = dual(z)   (dual part).

Dual numbers extend the real numbers in a similar way to the complex numbers. Indeed, as for the dual numbers, the complex numbers adjoin a new element i, for which i² = −1, and every complex number has the form z = a + bi, where a and b are real numbers. The above definition (A.1) relies on the idea that ε² = 0 with ε ≠ 0. This
may not be mathematically possible; ε² = 0 may require ε = 0. For this reason, these numbers will also be called “fake numbers”, as a reference to their similarity with imaginary numbers and to acknowledge that this type of number may not formally exist. Using matrices, dual numbers can also be represented as

ε = [ 0  1 ;  0  0 ],      z = a + bε = [ a  b ;  0  a ].

It is easy to see that the matrix form satisfies all the properties of the dual numbers.
A.2.2 Properties
To implement the dual numbers, operations on these numbers should be properly defined. Indeed, given three dual numbers a, b and c, it is possible to demonstrate that the following properties hold:
- Additive associativity: (a + b) + c = a + (b + c).
- Additive commutativity: a + b = b + a.
- Additive identity: 0 + a = a + 0 = a.
- Additive inverse: a + (−a) = (−a) + a = 0.
- Multiplicative associativity: (a · b) · c = a · (b · c).
- Multiplicative identity: 1 · a = a · 1 = a.
- Multiplicative inverse: a · a⁻¹ = a⁻¹ · a = 1.
- Left and right distributivity: a · (b + c) = (a · b) + (a · c) and (b + c) · a = (b · a) + (c · a).
A.2.3 Algebraic operations
Given two numbers of this type, a = a1 + a2ε and b = b1 + b2ε, addition and multiplication can be defined as follows.
- Addition: a + b = (a1 + b1) + (a2 + b2)ε.
- Multiplication: a · b = (a1 · b1) + (a1 · b2 + a2 · b1)ε.
Using the definition for multiplication, the multiplicative inverse can be defined as
- Multiplicative inverse: 1/a = 1/a1 − (a2/a1²)ε.
The multiplicative inverse is therefore defined for all numbers of this type with a non-zero real part a1. Division can be defined as follows.
- Division: a/b = a · (1/b) = a1/b1 + (a2/b1 − a1 b2/b1²)ε.
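As a short worked example of these rules (the numbers are chosen arbitrarily):

(2 + 3ε) · (4 + 5ε) = (2 · 4) + (2 · 5 + 3 · 4)ε = 8 + 22ε,
1/(2 + 3ε) = 1/2 − (3/4)ε,
(2 + 3ε)/(4 + 5ε) = 2/4 + (3/4 − 2 · 5/16)ε = 0.5 + 0.125ε.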
An important property of the dual numbers is related to the definition of their norm, norm(a). The norm should be related only to the real part of these numbers, norm(a) = √(a1²). This is a useful property to compare numbers of this type using inequalities. For instance, consider a < b; this is equivalent to norm(a) < norm(b), or a1 < b1. This definition of the norm has the property that norm(a · b) = norm(a) · norm(b). It is moreover possible to define the conjugate of a dual number. Indeed, given the norm, it is possible to write norm(a) = √(a1²) = √(a · aconj), so that
- Conjugate: aconj = a1 − a2ε.
A.2.4 Defining functions
With addition and multiplication defined in a consistent way, functions can be defined using Maclaurin series.

sin(x) Example. The Maclaurin series of the function sin(x) for x ∈ R is the following:

sin(x) = x − x³/6 + x⁵/120 − x⁷/5040 + ... + (−1)ⁿ/(2n+1)! x^(2n+1).   (A.2)

For x = x1 + x2ε, the terms x³, x⁵, etc. can be calculated using the rule for multiplication given in Section A.2.3:

x³ = x1³ + (3 x1² x2)ε = x1³ (1 + 3 x2/x1 ε),   (A.3)
x⁵ = x1⁵ + (5 x1⁴ x2)ε = x1⁵ (1 + 5 x2/x1 ε).   (A.4)

These results are then added according to the Maclaurin series, so that

sin(x) = [x1 − x1³/6 + x1⁵/120 − x1⁷/5040 + ...] + x2 [1 − x1²/2 + x1⁴/24 − x1⁶/720 + ...] ε.   (A.5)

Recognizing that

cos(x) = 1 − x²/2 + x⁴/24 − x⁶/720 + ... + (−1)ⁿ/(2n)! x^(2n),   (A.6)

the expression (A.5) can be simplified to give [13]

sin(x) = sin(x1) + x2 · cos(x1) ε.   (A.7)
ln(x) Example. The Maclaurin series of the function ln(1 + x) for x ∈ R is the following:

ln(1 + x) = x − x²/2 + x³/3 − x⁴/4 + ... + (−1)^(n+1)/n x^n.   (A.8)

For x = x1 + x2ε, using the above expressions for the terms x³, x⁵, etc. yields

ln(1 + x) = [x1 − x1²/2 + x1³/3 − x1⁴/4 + ...] + x2 [1 − x1 + x1² − x1³ + ...] ε,   (A.9)

where 1 − x1 + x1² − x1³ + ... = 1/(1 + x1), so that [13]

ln(1 + x) = ln(1 + x1) + x2/(1 + x1) ε,   (A.10)
ln(x) = ln(x1) + x2/x1 ε.   (A.11)
Function f(x). The above results can also be derived from the Taylor series for a general function f(x):

f(x + z) = f(x) + z f'(x) + z² f''(x)/2! + z³ f'''(x)/3! + ...   (A.12)

where z is a dual number, so that

z = bε,   (A.13)
z² = 0,   (A.14)
z³ = 0.   (A.15)

These terms are then added according to the Taylor series [13], so that

f(x + bε) = f(x) + b f'(x) ε.   (A.16)
It is implicit that each function extended in the dual plane “hides” its derivative in its dual part. For this reason it is possible to state that the dual-step approach can be considered as belonging to the class of the Automatic Differentiation Methods as well.
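As an illustrative usage sketch (assuming the Dual class listed in Section A.2.5 is on the MATLAB path and that, besides the operators reported there, it also overloads sin as derived in (A.7); the test function and point are arbitrary choices):

% Derivative of f(x) = x^3 + sin(x) at x0 via the dual-step approach.
x0 = 2.0;
xd = Dual(x0, 1);            % seed the dual part with 1 (step size h = 1)
fd = xd^3 + sin(xd);         % overloaded operators propagate the dual part
f_val   = getvalue(fd);      % f(x0)
f_deriv = getderiv(fd);      % f'(x0) = 3*x0^2 + cos(x0), exact to machine precision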
A.2.5 Implementation
The dual numbers have been implemented as a new class of numbers, using operator overloading, in MATLAB. This section contains their implementation, which is based on Ref. [17], with some minor modifications. The new class includes definitions for the standard algebraic operations, logical comparison operations, and other more general functions such as the exponential, the sine and so on. This class definition file allows a real-valued analysis code to be easily converted to operate on dual numbers by just changing the variable type declarations; the structure of the code remains unchanged.

classdef Dual
    properties
        x = 0;
        d = 0;
    end
    methods
        % Constructor
        function obj = Dual(x,d)
            if size(x,1) ~= size(d,1) || size(x,2) ~= size(d,2)
                error('DUAL:constructor','X and D are different size')
            else
                obj.x = x;
                obj.d = d;
            end
        end
        % Getters
        function v = getvalue(a)
            v = a.x;
        end
        function d = getderiv(a)
            d = a.d;
        end
        % Indexing
        function B = subsref(A,S)
            switch S.type
                case '()'
                    idx = S.subs;
                    switch length(idx)
                        case 1
                            B = Dual(A.x(idx{1}),A.d(idx{1}));
                        case 2
                            B = Dual(A.x(idx{1},idx{2}), A.d(idx{1},idx{2}));
                        otherwise
                            error('Dual:subsref','Arrays with more than 2 dims not supported')
                    end
                case '.'
                    switch S.subs
                        case 'x'
                            B = A.x;
                        case 'd'
                            B = A.d;
                        otherwise
                            error('Dual:subsref','Field %s does not exist',S.subs)
                    end
                otherwise
                    error('Dual:subsref','Indexing with {} is not supported')
            end
        end
        function A = subsasgn(A,S,B)
            switch S.type
                case '()'
                    idx = S.subs;
                otherwise
                    error('Dual:subsasgn','Assignment with {} and . not supported')
            end
            if ~isdual(B)
                B = mkdual(B);
            end
            switch length(idx)
                case 1
                    A.x(idx{1}) = B.x;
                    A.d(idx{1}) = B.d;
                case 2
                    A.x(idx{1},idx{2}) = B.x;
                    A.d(idx{1},idx{2}) = B.d;
                otherwise
                    error('Dual:subsref','Arrays with more than 2 dims not supported')
            end
        end
        % Concatenation operators
        function A = horzcat(varargin)
            for k = 1:length(varargin)
                tmp = varargin{k};
                xs{k} = tmp.x;
                ds{k} = tmp.d;
            end
            A = Dual(horzcat(xs{:}), horzcat(ds{:}));
        end
        function A = vertcat(varargin)
            for k = 1:length(varargin)
                tmp = varargin{k};
                xs{k} = tmp.x;
123
Appendix A. Dual and Hyper-Dual Numbers
ds{k} = tmp.d; end A = Dual(vertcat(xs{:}), vertcat(ds{:})); end % Plotting functions function plot(X,varargin) if length(varargin) < 1 Y = X; X = 1:length(X.x); elseif isdual(X) && isdual(varargin{1}) Y = varargin{1}; varargin = varargin(2:end); elseif isdual(X) Y = X; X = 1:length(X); elseif isdual(varargin{1}) Y = varargin{1}; varargin = varargin(2:end); end if isdual(X) plot(X.x,[Y.x(:) Y.d(:)],varargin{:}) else plot(X,[Y.x(:) Y.d(:)],varargin{:}) end grid on legend({'Function','Derivative'}) end % Comparison operators function res = eq(a,b) if isdual(a) && isdual(b) res = a.x == b.x; elseif isdual(a) res = a.x == b; elseif isdual(b) res = a == b.x; end end function res = neq(a,b) if isdual(a) && isdual(b) res = a.x ~= b.x; elseif isdual(a) res = a.x ~= b; elseif isdual(b) res = a ~= b.x; end end function res = lt(a,b) if isdual(a) && isdual(b) res = a.x < b.x;
124
Appendix A. Dual and Hyper-Dual Numbers
elseif isdual(a) res = a.x < b; elseif isdual(b) res = a < b.x; end end function res = le(a,b) if isdual(a) && isdual(b) res = a.x b.x; end end function res = ge(a,b) if isdual(a) && isdual(b) res = a.x >= b.x; elseif isdual(a) res = a.x >= b; elseif isdual(b) res = a >= b.x; end end function res = isnan(a) res = isnan(a.x); end function res = isinf(a) res = isinf(a.x); end function res = isfinite(a) res = isfinite(a.x); end % Unary operators function obj = uplus(a) obj = a; end function obj = uminus(a) obj = Dual(-a.x, -a.d); end
125
Appendix A. Dual and Hyper-Dual Numbers
function obj = transpose(a) obj = Dual(transpose(a.x), transpose(a.d)); end function obj = ctranspose(a) obj = Dual(ctranspose(a.x), ctranspose(a.d)); end function obj = reshape(a,ns) obj = Dual(reshape(a.x,ns), reshape(a.d,ns)); end % Binary arithmetic operators function obj = plus(a,b) if isdual(a) && isdual(b) obj = Dual(a.x + b.x, a.d + b.d); elseif isdual(a) obj = Dual(a.x + b, a.d); elseif isdual(b) obj = Dual(a + b.x, b.d); end end function obj = minus(a,b) if isdual(a) && isdual(b) obj = Dual(a.x - b.x, a.d - b.d); elseif isdual(a) obj = Dual(a.x - b, a.d); elseif isdual(b) obj = Dual(a - b.x, -b.d); end end function obj = times(a,b) if isdual(a) && isdual(b) obj = Dual(a.x .* b.x, a.x .* b.d + a.d .* b.x); elseif isdual(a) obj = Dual(a.x .* b, a.d .* b); elseif isdual(b) obj = Dual(a .* b.x, a .* b.d); end end function obj = mtimes(a,b) % Matrix multiplication for dual numbers is elementwise obj = times(a,b); end function obj = rdivide(a,b) if isdual(a) && isdual(b) xpart = a.x ./ b.x; dpart = (a.d .* b.x - a.x .* b.d) ./ (b.x .* b.x); obj = Dual(xpart,dpart); elseif isdual(a) obj = Dual(a.x ./ b, a.d ./ b); elseif isdual(b)
126
Appendix A. Dual and Hyper-Dual Numbers
obj = Dual(a ./ b.x, -(a .* b.d) ./ (b.x .* b.x)); end end function obj = mrdivide(a,b) % All division is elementwise obj = rdivide(a,b); end function obj = power(a,b) % n is assumed to be a real value (not a dual) if isdual(a) && isdual(b) error('Dual:power','Power is not defined for a and b both dual') elseif isdual(a) obj = Dual(power(a.x,b), b .* a.d .* power(a.x,b-1)); elseif isdual(b) ab = power(a,b.x); obj = Dual(ab, b.d .* log(a) .* ab); end end function obj = mpower(a,n) % Elementwise power obj = power(a,n); end % Miscellaneous math functions function obj = sqrt(a) rr=a.x; rr(rr==0)=eps; obj = Dual(sqrt(a.x), a.d ./ (2 * sqrt(rr))); end function obj = abs(a) obj = Dual(abs(a.x), a.d .* sign(a.x)); end function obj = sign(a) z = a.x == 0; x = sign(a.x); d = a.d .* ones(size(a.d)); d(z) = NaN; obj = Dual(x,d); end function obj = pow2(a) obj = Dual(pow2(a.x), a.d .* log(2) .* pow2(a.x)); end function obj = erf(a) disp('Reached here') ds = 2/sqrt(pi) * exp(-(a.x).ˆ2); obj = Dual(erf(a.x), a.d .* ds); end function obj = erfc(a) disp('Reached here') ds = -2/sqrt(pi) * exp(-(a.x).ˆ2);
127
Appendix A. Dual and Hyper-Dual Numbers
obj = Dual(erfc(a.x), a.d .* ds); end function obj = erfcx(a) ds = 2 * a.x .* exp((a.x).ˆ2) .* erfc(a.x) - 2/sqrt(pi); obj = Dual(erfcx(a.x), a.d .* ds); end % Exponential and logarithm function obj = exp(a) obj = Dual(exp(a.x), a.d .* exp(a.x)); end function obj = log(a) obj = Dual(log(a.x), a.d ./ a.x); end % Trigonometric functions function obj = sin(a) obj = Dual(sin(a.x), a.d .* cos(a.x)); end function obj = cos(a) obj = Dual(cos(a.x), -a.d .* sin(a.x)); end function obj = tan(a) obj = Dual(tan(a.x), a.d .* sec(a.x).ˆ2); end function obj = asin(a) obj = Dual(asin(a.x), a.d ./ sqrt(1-(a.x).ˆ2)); end function obj = acos(a) obj = Dual(acos(a.x), -a.d ./ sqrt(1-(a.x).ˆ2)); end function obj = atan(a) obj = Dual(atan(a.x), 1 ./ (1 + (a.x).ˆ2)); end % Hyperbolic trig functions function obj = sinh(a) obj = Dual(sinh(a.x), a.d .* cosh(a.x)); end function obj = cosh(a) obj = Dual(cosh(a.x), a.d .* sinh(a.x)); end function obj = tanh(a) obj = Dual(tanh(a.x), a.d .* sech(a.x).ˆ2); end function obj = asinh(a) obj = Dual(asinh(a.x), 1 ./ sqrt((a.x).ˆ2 + 1)); end function obj = acosh(a) obj = Dual(acosh(a.x), 1 ./ sqrt((a.x).ˆ2 - 1)); end function obj = atanh(a)
128
Appendix A. Dual and Hyper-Dual Numbers
obj = Dual(atanh(a.x), 1./ (1 - (a.x).ˆ2)); end end end
The function isdual.m is the following:

function b = isdual(a)
    % ISDUAL Return true if a is of class Dual, else return false
    b = strcmp(class(a),'Dual');
end
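As an example of use, the following sketch evaluates the test function of section A.3.3, $f(x) = e^x/\sqrt{\sin^3(x)+\cos^3(x)}$, with a dual argument: the same real-valued expression returns both the function value and its exact first derivative. The evaluation point 1.5 is arbitrary.

f  = @(x) exp(x) ./ sqrt(sin(x).^3 + cos(x).^3);   % unchanged real-valued code
xd = Dual(1.5, 1);                                 % seed the derivative direction
fd = f(xd);                                        % dual-step evaluation
fx  = getvalue(fd);                                % f(1.5)
dfx = getderiv(fd);                                % exact f'(1.5)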
A.3 Hyper-Dual Numbers
Hyper-dual numbers are a higher dimensional extension of dual numbers, in a similar way as the quaternions are a higher dimensional extension of the ordinary complex numbers. A hyper-dual number is of the form
x = x_1 + x_2\epsilon_1 + x_3\epsilon_2 + x_4\epsilon_1\epsilon_2.   (A.17)
It has one real part and three non-real parts with the following properties:
\epsilon_1^2 = \epsilon_2^2 = (\epsilon_1\epsilon_2)^2 = 0,   (A.18)
where
\epsilon_1 \neq \epsilon_2 \neq \epsilon_1\epsilon_2 \neq 0,   (A.19)
or, in other words,
\epsilon_1 = \sqrt{0} \neq 0,   (A.20)
\epsilon_2 = \sqrt{0} \neq 0,   (A.21)
\epsilon_1\epsilon_2 = \sqrt{0} \neq 0.   (A.22)
The properties of these numbers are exactly the same as those of the dual numbers (see section A.2.2).
A.3.1 Defining Algebraic Operations
Given two numbers of this type, $a = a_1 + a_2\epsilon_1 + a_3\epsilon_2 + a_4\epsilon_1\epsilon_2$ and $b = b_1 + b_2\epsilon_1 + b_3\epsilon_2 + b_4\epsilon_1\epsilon_2$, addition and multiplication can be defined as follows.
Addition:
a + b = (a_1 + b_1) + (a_2 + b_2)\epsilon_1 + (a_3 + b_3)\epsilon_2 + (a_4 + b_4)\epsilon_1\epsilon_2.
Multiplication:
ab = (a_1 b_1) + (a_1 b_2 + a_2 b_1)\epsilon_1 + (a_1 b_3 + a_3 b_1)\epsilon_2 + (a_1 b_4 + a_2 b_3 + a_3 b_2 + a_4 b_1)\epsilon_1\epsilon_2.
Using the definition of multiplication, the multiplicative inverse can be defined as follows.
Multiplicative inverse:
\frac{1}{a} = \frac{1}{a_1} - \frac{a_2}{a_1^2}\epsilon_1 - \frac{a_3}{a_1^2}\epsilon_2 + \left(-\frac{a_4}{a_1^2} + \frac{2 a_2 a_3}{a_1^3}\right)\epsilon_1\epsilon_2.
The multiplicative inverse is therefore defined for all numbers of this type with a non-zero real part $a_1$. Division can then be defined as follows.
Division:
\frac{a}{b} = \frac{a_1}{b_1} + \left(\frac{a_2}{b_1} - \frac{a_1 b_2}{b_1^2}\right)\epsilon_1 + \left(\frac{a_3}{b_1} - \frac{a_1 b_3}{b_1^2}\right)\epsilon_2 + \left(\frac{a_4}{b_1} - \frac{a_2 b_3}{b_1^2} - \frac{a_3 b_2}{b_1^2} + a_1\left(-\frac{b_4}{b_1^2} + \frac{2 b_2 b_3}{b_1^3}\right)\right)\epsilon_1\epsilon_2.
An important property of the hyper-dual numbers is related to the definition of their norm, norm(a). The norm is defined just like for the dual numbers, and it is related only to the real part of these numbers, $\mathrm{norm}(a) = \sqrt{a_1^2}$. This is a useful property for comparing numbers of this type using inequalities: for instance, $a < b$ is equivalent to $\mathrm{norm}(a) < \mathrm{norm}(b)$, that is, $a_1 < b_1$. This definition of the norm has the property that $\mathrm{norm}(a \cdot b) = \mathrm{norm}(a) \cdot \mathrm{norm}(b)$. It is moreover possible to define the conjugate of a hyper-dual number. Indeed, given the norm, it is possible to write $\mathrm{norm}(a) = \sqrt{a_1^2} = \sqrt{a \cdot a^{\mathrm{conj}}}$, so that
Conjugate:
a^{\mathrm{conj}} = a_1 - a_2\epsilon_1 - a_3\epsilon_2 + \left(\frac{2 a_2 a_3}{a_1} - a_4\right)\epsilon_1\epsilon_2.
With the addition and multiplication operators defined in a consistent way, functions can be defined as already done for the dual numbers (see section A.2.4).
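The multiplicative inverse formula can be checked numerically with a few lines of MATLAB; the component values below are arbitrary, and the product $a \cdot (1/a)$ must return 1 with vanishing non-real parts.

% a = a1 + a2*eps1 + a3*eps2 + a4*eps1*eps2, with arbitrary components
a1 = 2; a2 = 0.3; a3 = -0.5; a4 = 0.7;
% components of 1/a according to the formula above
i1 = 1/a1; i2 = -a2/a1^2; i3 = -a3/a1^2; i4 = -a4/a1^2 + 2*a2*a3/a1^3;
% multiply a by 1/a with the multiplication rule above
p1 = a1*i1;                            % real part: equals 1
p2 = a1*i2 + a2*i1;                    % eps1 part: vanishes
p3 = a1*i3 + a3*i1;                    % eps2 part: vanishes
p4 = a1*i4 + a2*i3 + a3*i2 + a4*i1;    % eps1*eps2 part: vanishes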
A.3.2 Hyper-Dual Numbers for Exact Derivative Calculations
Hyper-dual numbers can be used to compute exact first and second derivatives, to form gradients and Hessians for optimization methods [14]. Considering the form of a hyper-dual number (A.17), the definition (A.18) of $\epsilon_1$ and $\epsilon_2$ implies that the Taylor series of a function with a hyper-dual step truncates exactly at the second-derivative term:
f(x + h_1\epsilon_1 + h_2\epsilon_2 + 0\epsilon_1\epsilon_2) = f(x) + h_1 f'(x)\epsilon_1 + h_2 f'(x)\epsilon_2 + h_1 h_2 f''(x)\epsilon_1\epsilon_2.   (A.23)
The higher order terms are all zero by the definition $\epsilon_1^2 = \epsilon_2^2 = (\epsilon_1\epsilon_2)^2 = 0$, so there is no truncation error. The first and second derivatives are the leading terms of the non-real parts: if $f'(x)$ is desired, one simply looks at the $\epsilon_1$ or $\epsilon_2$ part and divides by the appropriate step, and if $f''(x)$ is desired, one looks at the $\epsilon_1\epsilon_2$ part:
f'(x) = \frac{\epsilon_1\,\mathrm{part}[f(x + h_1\epsilon_1 + h_2\epsilon_2 + 0\epsilon_1\epsilon_2)]}{h_1}   (A.24)
f'(x) = \frac{\epsilon_2\,\mathrm{part}[f(x + h_1\epsilon_1 + h_2\epsilon_2 + 0\epsilon_1\epsilon_2)]}{h_2}   (A.25)
f''(x) = \frac{\epsilon_1\epsilon_2\,\mathrm{part}[f(x + h_1\epsilon_1 + h_2\epsilon_2 + 0\epsilon_1\epsilon_2)]}{h_1 h_2}.   (A.26)
The derivative calculations are, moreover, not subject to subtractive cancellation error, so the use of hyper-dual numbers results in first and second derivative calculations that are exact, regardless of the step size. The real part returns the original function evaluated at the real argument Re(x), and it is mathematically impossible for the derivative calculations to affect the real part. Indeed, the use of this new number system to compute the first and second derivatives involves converting a real-valued function evaluation to operate on these alternative types of numbers. The derivatives are then computed by adding a perturbation to the non-real parts and evaluating the modified function. The mathematics of these alternative types of numbers are such that, when operations are carried out on the real part of the number, derivative information for those operations is formed and stored in the non-real parts of the number. At every stage of the function evaluation, the non-real parts of the number contain derivative information with respect to the input. This process must be repeated for every input variable for which derivative information is desired [14]. Following the above discussion, methods for computing exact higher derivatives can be created by using more non-real parts. For instance, to produce nth derivatives, nth order hyper-dual numbers would be used; these have $n$ components $\epsilon_1, \epsilon_2, \dots, \epsilon_n$ and all of their combinations. If only the first derivatives are needed, first order hyper-dual numbers would be used: the dual numbers.
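A minimal sketch of this procedure, using the HyperDual MATLAB class listed in section A.3.4 and the simple function f(x) = x e^x (chosen here only for illustration), is the following.

h1 = 1; h2 = 1;                  % step sizes: arbitrary, since there is no truncation error
x0 = 0.8;                        % illustrative evaluation point
xh = HyperDual(x0, h1, h2, 0);   % perturbation of the non-real parts, eq. (A.23)
fh = xh .* exp(xh);              % f evaluated with a hyper-dual argument
f0  = fh.x;                      % f(x0)
df  = fh.d1 ./ h1;               % f'(x0)  = (1 + x0)*exp(x0), eq. (A.24)
d2f = fh.d3 ./ (h1*h2);          % f''(x0) = (2 + x0)*exp(x0), eq. (A.26)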
A.3.3 Numerical Examples
The behavior of the second-derivative calculation methods is demonstrated using a simple analytic function, $f(t) = A\sin(\omega t)e^{-\lambda t}$. Figure A.1(b) shows the relative error of the second-derivative calculation methods as a function of the step size $h$. The relative error is defined as $\eta = |f'' - f''_{\mathrm{ref}}|/|f''_{\mathrm{ref}}|$. As the step size is initially decreased, the error of the central difference and complex-step approximations decreases according to the order of the truncation error of the method. For the second derivatives, the complex-step approximation is subject to subtractive cancellation error, as are the central difference approximations (the formula for the second-derivative complex-step approximation is available in Ref. [15]). The subtractive cancellation error begins to dominate the overall error as the step size is further reduced. The error of the hyper-dual number calculations is machine zero, regardless of the step size, because the hyper-dual number approach is subject neither to truncation error nor to round-off error. Comparing the error in the first-derivative calculations, Figure A.1(a), with the one in the second-derivative calculations, Figure A.1(b), we can point out that, unfortunately, the optimal step size for accurate second-derivative calculations with the central difference and complex-step approximations is usually not the same as the optimal step size for the first derivatives. The optimal step size for these methods requires a trade-off between the truncation and round-off errors; it is not known a priori, it may require knowledge of the true derivative value, and it changes depending on the function, the point where the derivative is desired, and the independent variable under consideration. These problems do not affect the hyper-dual number calculations of the first and second derivatives, as shown in Figure A.1.
Figure A.2 illustrates the accuracies of several derivative calculation methods as a function of the step size for the function $f(x) = e^x/\sqrt{\sin^3(x) + \cos^3(x)}$. Since this example is taken from [15], we can compare the results: the comparison reported in Figures A.2 and A.3 shows their consistency. In conclusion, the comparison of the first- and second-derivative approximations shows the difficulty of computing accurate second derivatives using traditional methods; to compute exact second derivatives it is necessary to employ the hyper-dual numbers.
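The following sketch reproduces the second-derivative comparison at a single step size, using the HyperDual class of section A.3.4; the values of A, ω, λ, the evaluation point and the step are illustrative only.

A = 1; omega = 2; lambda = 0.5;               % assumed parameter values
t0 = 1.0; h = 1e-4;                           % illustrative evaluation point and step
f = @(t) A*sin(omega*t).*exp(-lambda*t);
% analytical second derivative, used as reference
fpp_ref = A*exp(-lambda*t0)*((lambda^2 - omega^2)*sin(omega*t0) - 2*lambda*omega*cos(omega*t0));
% central-difference second derivative: truncation and round-off errors
fpp_cd = (f(t0+h) - 2*f(t0) + f(t0-h))/h^2;
% hyper-dual second derivative: exact regardless of the step size
th = HyperDual(t0, 1, 1, 0);
fh = A .* sin(omega .* th) .* exp(-lambda .* th);
fpp_hd = fh.d3;
eta_cd = abs(fpp_cd - fpp_ref)/abs(fpp_ref);  % relative error, central differences
eta_hd = abs(fpp_hd - fpp_ref)/abs(fpp_ref);  % relative error, hyper-dual (machine zero)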
Figure A.1: Accuracies of several derivative calculation methods as a function of the step size for the function $f(t) = A\sin(\omega t)e^{-\lambda t}$: (a) first derivative, (b) second derivative.
Figure A.2: Accuracies of several derivative calculation methods as a function of the step size for the function $f(x) = e^x/\sqrt{\sin^3(x) + \cos^3(x)}$.
Figure A.3: Accuracies of several derivative calculation methods as a function of the step size for the function $f(x) = e^x/\sqrt{\sin^3(x) + \cos^3(x)}$, from [15].
A.3.4 Implementation
The hyper-dual numbers have been implemented in MATLAB as a new class of numbers, using operator overloading. This section contains their implementation, which is based on the dual numbers class of section A.2.5. The new class includes definitions for the standard algebraic operations, logical comparison operations, and other more general functions such as the exponential, the sine and so on. This class definition file allows a real-valued analysis code to be easily converted to operate on hyper-dual numbers by simply changing the variable type declarations; the structure of the code remains unchanged.

classdef HyperDual
    properties
        x = 0;
        d1 = 0;
        d2 = 0;
        d3 = 0;
    end
    methods
        % Constructor
        function obj = HyperDual(x,d1,d2,d3)
            if size(x,1) ~= size(d1,1) || size(x,2) ~= size(d1,2) ...
                    || size(x,1) ~= size(d2,1) || size(x,2) ~= size(d2,2) ...
                    || size(x,1) ~= size(d3,1) || size(x,2) ~= size(d3,2)
                error('DUAL:constructor','X and D are different size')
            else
                obj.x = x;
                obj.d1 = d1;
                obj.d2 = d2;
                obj.d3 = d3;
            end
        end
        % Getters
        function v = getvalue_h(a)
            v = a.x;
        end
        function d1 = getderiv_1(a)
            d1 = a.d1;
        end
        function d2 = getderiv_2(a)
            d2 = a.d2;
        end
        function d3 = getderiv_3(a)
            d3 = a.d3;
        end
        % Indexing
        function B = subsref(A,S)
            switch S.type
                case '()'
                    idx = S.subs;
                    switch length(idx)
                        case 1
                            B = HyperDual(A.x(idx{1}), A.d1(idx{1}),...
                                A.d2(idx{1}), A.d3(idx{1}));
                        case 2
                            B = HyperDual(A.x(idx{1},idx{2}), A.d1(idx{1},idx{2}),...
                                A.d2(idx{1},idx{2}), A.d3(idx{1},idx{2}));
                        otherwise
                            error('HyperD:subsref','Arrays with more than 2 dims not supported')
                    end
                case '.'
                    switch S.subs
                        case 'x'
                            B = A.x;
                        case 'd1'
                            B = A.d1;
                        case 'd2'
                            B = A.d2;
                        case 'd3'
                            B = A.d3;
                        otherwise
                            error('Dual:subsref','Field %s does not exist',S.subs)
                    end
                otherwise
                    error('Dual:subsref','Indexing with {} is not supported')
            end
        end
        function A = subsasgn(A,S,B)
            switch S.type
                case '()'
                    idx = S.subs;
                otherwise
                    error('Dual:subsasgn','Assignment with {} and . not supported')
            end
            if ~isdual(B)
                B = mkdual(B);
            end
            switch length(idx)
                case 1
                    A.x(idx{1}) = B.x;
                    A.d1(idx{1}) = B.d1;
                    A.d2(idx{1}) = B.d2;
                    A.d3(idx{1}) = B.d3;
                case 2
                    A.x(idx{1},idx{2}) = B.x;
                    A.d1(idx{1},idx{2}) = B.d1;
                    A.d2(idx{1},idx{2}) = B.d2;
                    A.d3(idx{1},idx{2}) = B.d3;
                otherwise
                    error('Dual:subsref','Arrays with more than 2 dims not supported')
            end
        end
        % Comparison operators
        function res = eq(a,b)
            if ishyperdual(a) && ishyperdual(b)
                res = a.x == b.x;
            elseif ishyperdual(a)
                res = a.x == b;
            elseif ishyperdual(b)
                res = a == b.x;
            end
        end
        function res = ne(a,b)
            if ishyperdual(a) && ishyperdual(b)
                res = a.x ~= b.x;
            elseif ishyperdual(a)
                res = a.x ~= b;
            elseif ishyperdual(b)
                res = a ~= b.x;
            end
        end
        function res = lt(a,b)
            if ishyperdual(a) && ishyperdual(b)
                res = a.x < b.x;
            elseif ishyperdual(a)
                res = a.x < b;
            elseif ishyperdual(b)
                res = a < b.x;
            end
        end
        function res = le(a,b)
            if ishyperdual(a) && ishyperdual(b)
                res = a.x <= b.x;
            elseif ishyperdual(a)
                res = a.x <= b;
            elseif ishyperdual(b)
                res = a <= b.x;
            end
        end
        function res = gt(a,b)
            if ishyperdual(a) && ishyperdual(b)
                res = a.x > b.x;
            elseif ishyperdual(a)
                res = a.x > b;
            elseif ishyperdual(b)
                res = a > b.x;
            end
        end
        function res = ge(a,b)
            if ishyperdual(a) && ishyperdual(b)
                res = a.x >= b.x;
            elseif ishyperdual(a)
                res = a.x >= b;
            elseif ishyperdual(b)
                res = a >= b.x;
            end
        end
        function res = isnan(a)
            res = isnan(a.x);
        end
        function res = isinf(a)
            res = isinf(a.x);
        end
        function res = isfinite(a)
            res = isfinite(a.x);
        end
        % Unary operators
        function obj = uplus(a)
            obj = a;
        end
        function obj = uminus(a)
            obj = HyperDual(-a.x, -a.d1, -a.d2, -a.d3);
        end
        function obj = transpose(a)
            obj = HyperDual(transpose(a.x), transpose(a.d1),...
                transpose(a.d2), transpose(a.d3));
        end
        function obj = ctranspose(a)
            obj = HyperDual(ctranspose(a.x), ctranspose(a.d1),...
                ctranspose(a.d2), ctranspose(a.d3));
        end
        function obj = reshape(a,ns)
            obj = HyperDual(reshape(a.x,ns), reshape(a.d1,ns),...
                reshape(a.d2,ns), reshape(a.d3,ns));
        end
        % Binary arithmetic operators
        function obj = plus(a,b)
            if ishyperdual(a) && ishyperdual(b)
                obj = HyperDual(a.x + b.x, a.d1 + b.d1, a.d2 + b.d2, a.d3 + b.d3);
            elseif ishyperdual(a)
                obj = HyperDual(a.x + b, a.d1, a.d2, a.d3);
            elseif ishyperdual(b)
                obj = HyperDual(a + b.x, b.d1, b.d2, b.d3);
            end
        end
        function obj = minus(a,b)
            if ishyperdual(a) && ishyperdual(b)
                obj = HyperDual(a.x - b.x, a.d1 - b.d1, a.d2 - b.d2, a.d3 - b.d3);
            elseif ishyperdual(a)
                obj = HyperDual(a.x - b, a.d1, a.d2, a.d3);
            elseif ishyperdual(b)
                obj = HyperDual(a - b.x, -b.d1, -b.d2, -b.d3);
            end
        end
        function obj = times(a,b)
            if ishyperdual(a) && ishyperdual(b)
                obj = HyperDual(a.x .* b.x, (a.x .* b.d1)+(a.d1 .* b.x),...
                    (a.x .* b.d2)+(a.d2 .* b.x), (a.x .* b.d3)+(a.d1 .* b.d2)+...
                    (a.d2 .* b.d1)+(a.d3 .* b.x));
            elseif ishyperdual(a)
                obj = HyperDual(a.x .* b, a.d1 .* b, a.d2 .* b, a.d3 .* b);
            elseif ishyperdual(b)
                obj = HyperDual(a .* b.x, a .* b.d1, a .* b.d2, a .* b.d3);
            end
        end
        function obj = mtimes(a,b)
            % Matrix multiplication for hyper-dual numbers is elementwise
            obj = times(a,b);
        end
        function obj = rdivide(a,b)
            if ishyperdual(a) && ishyperdual(b)
                xpart = a.x ./ b.x;
                d1part = (a.d1 ./ b.x) - ((a.x .* b.d1) ./ (b.x .* b.x));
                d2part = (a.d2 ./ b.x) - ((a.x .* b.d2) ./ (b.x .* b.x));
                d3part = (a.d3 ./ b.x) - ((a.d1 .* b.d2) ./ (b.x .* b.x)) -...
                    ((a.d2 .* b.d1) ./ (b.x .* b.x)) + (a.x .* (-(b.d3...
                    ./ (b.x .* b.x)) + ((2 .* b.d1 .* b.d2) ./ (b.x .* b.x .* b.x))));
                obj = HyperDual(xpart, d1part, d2part, d3part);
            elseif ishyperdual(a)
                xpart = a.x ./ b;
                d1part = a.d1 ./ b;
                d2part = a.d2 ./ b;
                d3part = a.d3 ./ b;
                obj = HyperDual(xpart, d1part, d2part, d3part);
            elseif ishyperdual(b)
                xpart = a ./ b.x;
                d1part = -((a .* b.d1) ./ (b.x .* b.x));
                d2part = -((a .* b.d2) ./ (b.x .* b.x));
                d3part = a .* (-(b.d3 ./ (b.x .* b.x)) + ((2 .* b.d1 .* b.d2)...
                    ./ (b.x .* b.x .* b.x)));
                obj = HyperDual(xpart, d1part, d2part, d3part);
            end
        end
        function obj = mrdivide(a,b)
            % All division is elementwise
            obj = rdivide(a,b);
        end
        function obj = power(a,b)
            % b is assumed to be a real value (not a hyper-dual)
            tol = 1e-15;
            if ishyperdual(a) && ishyperdual(b)
                error('Dual:power','Power is not defined for a and b both dual')
            elseif ishyperdual(a)
                obj = HyperDual(power(a.x,b), b .* a.d1 .* power(a.x-tol,b-1),...
                    b .* a.d2 .* power(a.x+tol,b-1),...
                    (b .* a.d3 .* power(a.x+tol,b-1))+...
                    (b .* (b-1) .* a.d1 .* a.d2 .* power(a.x+tol,b-2)));
            elseif ishyperdual(b)
                error('Power with dual index, not implemented yet')
            end
        end
        function obj = mpower(a,n)
            % Elementwise power
            obj = power(a,n);
        end
        % Miscellaneous math functions
        function obj = sqrt(a)
            obj = power(a,0.5);
        end
        function obj = abs(a)
            obj = HyperDual(abs(a.x), a.d1 .* sign(a.x), a.d2 .* sign(a.x),...
                a.d3 .* sign(a.x));
        end
        % Exponential and logarithm
        function obj = exp(a)
            obj = exp(a.x) .* HyperDual(ones(1,length(a.d1)), a.d1,...
                a.d2, a.d3 + a.d1 .* a.d2);
        end
        function obj = log(a)
            obj = HyperDual(log(a.x), a.d1 ./ a.x,...
                a.d2 ./ a.x, a.d3 ./ a.x - (a.d1 .* a.d2) ./ (a.x.^2));
        end
        % Trigonometric functions
        function obj = sin(a)
            obj = HyperDual(sin(a.x), a.d1 .* cos(a.x), a.d2 .* cos(a.x),...
                (a.d3 .* cos(a.x)) - (a.d1 .* a.d2 .* sin(a.x)));
        end
        function obj = cos(a)
            obj = HyperDual(cos(a.x), -a.d1 .* sin(a.x), -a.d2 .* sin(a.x),...
                (-a.d3 .* sin(a.x)) - (a.d1 .* a.d2 .* cos(a.x)));
        end
        function obj = tan(a)
            obj = HyperDual(tan(a.x), a.d1 .* sec(a.x).^2, a.d2 .* sec(a.x).^2,...
                a.d3 .* sec(a.x).^2 + a.d1 .* a.d2 .* (2 .* tan(a.x) .* sec(a.x).^2));
        end
        function obj = asin(a)
            obj = HyperDual(asin(a.x), a.d1 ./ sqrt(1-(a.x.^2)),...
                a.d2 ./ sqrt(1-(a.x.^2)), a.d3 ./ sqrt(1-(a.x.^2))...
                + a.d1 .* a.d2 .* (a.x .* (1-(a.x.^2)).^(-3/2)));
        end
        function obj = atan(a)
            obj = HyperDual(atan(a.x), a.d1 ./ (1+(a.x).^2), a.d2 ./ (1+(a.x).^2),...
                a.d3 ./ (1+(a.x).^2) + a.d1 .* a.d2 .*...
                (-2 .* a.x ./ (1+a.x.^2).^2));
        end
        % Hyperbolic trig functions
        function obj = sinh(a)
            obj = (exp(a)-exp(-a)) ./ 2;
        end
        function obj = cosh(a)
            obj = (exp(a)+exp(-a)) ./ 2;
        end
        function obj = tanh(a)
            obj = (exp(a)-exp(-a)) ./ (exp(a)+exp(-a));
        end
        function obj = asinh(a)
            obj = log(a + sqrt(a.^2+1));
        end
        function obj = acosh(a)
            obj = log(a + sqrt(a.^2-1));
        end
        function obj = atanh(a)
            obj = log((sqrt(1-a.^2)) ./ (1-a));
        end
    end
end
The function ishyperdual.m is the following:

function b = ishyperdual(a)
    % ISHYPERDUAL Return true if a is of class HyperDual, else return false
    b = strcmp(class(a),'HyperDual');
end
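As a usage example, the following sketch applies the class to the test function of Figure A.2, $f(x) = e^x/\sqrt{\sin^3(x)+\cos^3(x)}$; the evaluation point is arbitrary, and the fields x, d1 and d3 are accessed through the overloaded subsref defined above.

f  = @(x) exp(x) ./ sqrt(sin(x).^3 + cos(x).^3);   % same code used for real arguments
xh = HyperDual(1.5, 1, 1, 0);                      % unit steps: exact for any step size
fh = f(xh);
fx   = fh.x;     % f(1.5)
dfx  = fh.d1;    % exact first derivative, eq. (A.24) with h1 = 1
d2fx = fh.d3;    % exact second derivative, eq. (A.26) with h1 = h2 = 1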
Bibliography
[1] John T. Betts: Practical Methods for Optimal Control and Estimation Using Nonlinear Programming, SIAM - Society for Industrial and Applied Mathematics, Philadelphia, second edition, 2010.
[2] John H. Mathews and Kurtis K. Fink: Numerical Methods Using Matlab, Pearson, New Jersey, fourth edition, 2004.
[3] http://www.holoborodko.com/pavel/numerical-methods/numerical-derivative/central-differences/
[4] Joaquim R. R. A. Martins, Peter Sturdza and Juan J. Alonso: The Complex-Step Derivative Approximation, ACM Transactions on Mathematical Software, Vol. 29, No. 3, September 2003.
[5] M. Sagliano, S. Theil: Hybrid Jacobian Computation for Fast Optimal Trajectories Generation, AIAA Guidance, Navigation, and Control (GNC) Conference, August 19-22, 2013, Boston, MA.
[6] J.-J. E. Slotine, J. K. Hedrick, E. A. Misawa: On Sliding Observers for Nonlinear Systems, Journal of Dynamic Systems, Measurement, and Control, Vol. 109, September 1987, p. 245.
[7] Y. Shtessel, C. Edwards, L. Fridman, A. Levant: Sliding Mode Control and Observation, Springer, New York, 2014.
[8] A. V. Rao: A Survey of Numerical Methods for Optimal Control, AAS/AIAA Astrodynamics Specialist Conference, AAS Paper 09-334.
[9] A. L. Herman and B. A. Conway: Direct Optimization Using Collocation Based on High-Order Gauss-Lobatto Quadrature Rules, Journal of Guidance, Control, and Dynamics, Vol. 19, No. 3, May-June 1996.
[10] M. Sagliano: Performance Analysis of Linear and Nonlinear Techniques for Automatic Scaling of Discretized Control Problems, Operations Research Letters, Vol. 42, Issue 3, May 2014, pp. 213-216.
[11] Sagliano, M., Samaan, M., Theil, S., Mooij, E.: SHEFEX-3 Optimal Feedback Entry Guidance, AIAA SPACE 2014 Conference and Exposition, AIAA 2014-4208, San Diego, CA, 2014, doi:10.2514/6.2014-4208.
[12] Arslantas, Y. E., Oehlschlägel, T., Sagliano, M., Theil, S., Braxmaier, C.: Safe Landing Area Determination for a Moon Lander by Reachability Analysis, 17th International Conference on Hybrid Systems: Computation and Control (HSCC), Berlin, Germany, 2014.
[13] Jeffrey A. Fike: Numerically Exact Derivative Calculation Using Fake Numbers, Stanford University, Department of Aeronautics and Astronautics, April 9, 2008.
[14] Jeffrey A. Fike, S. Jongsma, Juan J. Alonso, Edwin van der Weide: Optimization with Gradient and Hessian Information Calculated Using Hyper-Dual Numbers, AIAA Applied Aerodynamics Conference, 27-30 June 2011, Honolulu, Hawaii.
[15] Jeffrey A. Fike, Juan J. Alonso: The Development of Hyper-Dual Numbers for Exact Second-Derivative Calculations, 49th AIAA Aerospace Sciences Meeting, 4-7 January 2011, Orlando, Florida.
[16] W. B. Vasantha Kandasamy, Florentin Smarandache: Dual Numbers, Zip Publishing, Ohio, 2012.
[17] https://gist.github.com/chris-taylor/2005955