(Preprint) AAS 12-634
ORTHOGONAL POLYNOMIAL APPROXIMATION IN HIGHER DIMENSIONS: APPLICATIONS IN ASTRODYNAMICS

John L. Junkins (Fellow AAS, Distinguished Professor, Texas A&M University, Aerospace Engineering Dept, 722 H.R. Bright Building, 3141 TAMU, College Station, TX 77843-3141, [email protected], 979-845-3912), Ahmad Bani Younes (Graduate Research Assistant, Texas A&M University, Aerospace Engineering Dept, 701 H.R. Bright Building, 3141 TAMU, College Station, TX 77843-3141, [email protected], 979-845-3912), and Xiaoli Bai (Scientist, Optimal Synthesis, Inc., 95 1st St, Suite 240, Los Altos, CA 94022, [email protected], 650-559-8585)

Abstract. In this paper, we unify classical results from function approximation theory and consider their utility in astrodynamics. Least square approximation, using the classical Chebyshev polynomials as basis functions, is reviewed for discrete samples of the to-be-approximated function. We extend the method to n dimensions in a novel way, through the use of array algebra and Kronecker operations. Approximation of test functions illustrates the method and provides insight into the errors of approximating functions, as well as the associated errors arising when the approximations are differentiated or integrated. Two sets of applications are considered that are challenges in astrodynamics. The first application problem is local approximation of high degree and order geopotential models, replacing the global spherical harmonic series by a family of locally precise orthogonal polynomial approximations for efficient computation. A method is introduced which adapts the approximation degree radially, consistent with the fact that the highest degree approximations (~10⁻⁹ m/sec² maximum acceleration errors) are required near the Earth's surface, whereas lower degree approximations suffice as the radius increases to several Earth radii. We show that up to four orders of magnitude speedup in computing state of the art gravitational acceleration is easily feasible, with efficiency optimized using radial adaptation. The second class of problems includes orbit propagation and the solution of associated boundary value problems. The Chebyshev-Picard path approximation methods are shown to be well-suited to solving both initial value and two-point boundary value problems, and we show over one order of magnitude speedup. Furthermore, the algorithms are parallel-structured, so they are immediately suited for parallel implementation with additional speedups. Used in conjunction with orthogonal FEM gravity approximations, truly revolutionary speedups in orbit propagation can be achieved without accuracy loss.
INTRODUCTION AND PAPER ORGANIZATION

We first review classical discrete polynomial approximation results for one and two dimensions and introduce a convenient array algebra means to extend the one dimensional orthogonality results to higher dimensions in a way that avoids the curse of dimensionality and establishes the
results needed for computation. Several simple examples are provided to enable the efficacy and utility of the methodology to be appreciated heuristically. Second, we consider the Chebyshev-Picard method for solving initial and two-point boundary value problems, following the developments in Bai's recent dissertation [2]. Third, we consider the problem of piecewise approximation of high degree and order gravitational potential models, for use in orbit propagation. We also consider the fusion of the gravitational field models and the Chebyshev-Picard method for solving the corresponding boundary value problems. Since each portion of this paper represents a significant branch of the literature, we choose to discuss the relevant references in the context of the corresponding section of this paper.

ORTHOGONAL APPROXIMATION WITH ONE INDEPENDENT VARIABLE

There are several treatments of discrete approximation using Chebyshev polynomials [2-10]. Among the more comprehensive of these are the texts [3,5]. In [6], orthogonal approximation is placed in a broader context of multi-resolution approximation via linear and nonlinear input/output maps. In Appendix A, we summarize the most relevant aspects of approximation using Chebyshev polynomials that we utilize in this paper. Let us first set the context by considering the approximation of a single-valued function of one independent variable

$$g(x), \qquad x_{\min} \le x \le x_{\max} \tag{1}$$

To put the problem in a non-dimensional framework, we first introduce a new independent variable $\xi$ such that $-1 \le \xi \le 1$. It is easy to verify the forward and inverse transformations:
$$\xi = -1 + \frac{2(x - x_{\min})}{x_{\max} - x_{\min}}, \qquad x = x_{\min} + \frac{(\xi + 1)(x_{\max} - x_{\min})}{2} \tag{2}$$
Substituting the second of Eqs (2) into Eq (1), we see that we wish to approximate the function

$$f(\xi) \equiv g(x(\xi)) = g\!\left(x_{\min} + \frac{(\xi + 1)(x_{\max} - x_{\min})}{2}\right) \tag{3}$$

In the case of general basis functions, we seek to approximate $f(\xi)$ as a linear combination of a prescribed set of N+1 linearly independent basis functions $\{\phi_0(\xi), \phi_1(\xi), \dots, \phi_N(\xi)\}$ as

$$f(\xi) \approx \sum_{n=0}^{N} a_n \phi_n(\xi) \tag{4}$$

For the case of discrete measurement samples, we introduce a set of sample points (nodes) $\{\xi_0, \xi_1, \dots, \xi_M;\ M \ge N\}$; the residual approximation error at each measurement node is

$$r_j = f(\xi_j) - \sum_{n=0}^{N} a_n \phi_n(\xi_j); \qquad j = 0, 1, \dots, M \tag{5}$$

or, in vector-matrix notation,

$$\mathbf{r} = \mathbf{f} - \Phi \mathbf{a} \tag{6}$$

where

$$\mathbf{f} = \begin{Bmatrix} f(\xi_0) \\ f(\xi_1) \\ \vdots \\ f(\xi_M) \end{Bmatrix}, \qquad
\Phi = \begin{bmatrix} \phi_0(\xi_0) & \phi_1(\xi_0) & \cdots & \phi_N(\xi_0) \\ \phi_0(\xi_1) & \phi_1(\xi_1) & \cdots & \phi_N(\xi_1) \\ \vdots & & & \vdots \\ \phi_0(\xi_M) & \phi_1(\xi_M) & \cdots & \phi_N(\xi_M) \end{bmatrix}, \qquad
\mathbf{a} = \begin{Bmatrix} a_0 \\ a_1 \\ \vdots \\ a_N \end{Bmatrix} \tag{7}$$
The method of least squares seeks the coefficient vector $\mathbf{a}$ that minimizes the weighted sum of squares of the residuals

$$J = \frac{1}{2}(\mathbf{f} - \Phi\mathbf{a})^T W (\mathbf{f} - \Phi\mathbf{a}); \qquad W = W^T \ \text{(positive definite weight matrix)} \tag{8}$$
It follows [1] that the least square minimization solution for a leads to the normal equations
$$\mathbf{a} = (\Phi^T W \Phi)^{-1} \Phi^T W \mathbf{f} \tag{9}$$

Restricting W to be diagonal hereinafter, and choosing a special class of orthogonal basis functions, $\Phi^T W \Phi$ can be rendered a diagonal matrix so that the matrix inverse in Eq (9) is trivial. For this case we obtain $(\Phi^T W \Phi)^{-1} = \mathrm{diag}\{1/(\Phi^T W \Phi)_{ii}\} = \mathrm{diag}\{1/m_{00},\ 1/m_{11},\ \dots,\ 1/m_{NN}\}$. The typical element of $\Phi^T W \Phi$ is a discrete inner product denoted $m_{\alpha\beta} = \langle \phi_\alpha, \phi_\beta \rangle$, and the condition that $\Phi^T W \Phi$ be a diagonal matrix directly gives rise to the orthogonality conditions, requiring the inner product of the typical pair of orthogonal basis functions to obey

$$m_{\alpha\beta} = \langle \phi_\alpha(\xi), \phi_\beta(\xi) \rangle \equiv \sum_{j=0}^{M} W_j\, \phi_\alpha(\xi_j)\, \phi_\beta(\xi_j) = \begin{cases} 0, & \text{for } \alpha \ne \beta \\ c_\alpha > 0, & \text{for } \alpha = \beta \end{cases} \tag{10}$$
The orthogonality conditions depend jointly on the basis functions, the set of node locations, and the weight matrix (more generally, $W = W^T$ may be fully populated). For the case that the above orthogonality conditions are satisfied, the explicit solution for the coefficients of Eq (9) is given by the independent/uncoupled ratios of inner products

$$a_\alpha = \frac{\langle \phi_\alpha(\xi), f(\xi) \rangle}{\langle \phi_\alpha(\xi), \phi_\alpha(\xi) \rangle} = \frac{\sum_{j=0}^{M} W_j\, \phi_\alpha(\xi_j)\, f(\xi_j)}{\sum_{j=0}^{M} W_j\, \phi_\alpha^2(\xi_j)} = \frac{1}{c_\alpha} \sum_{j=0}^{M} W_j\, \phi_\alpha(\xi_j)\, f(\xi_j), \qquad \alpha = 0, 1, 2, \dots, N \tag{11}$$
An important special case arises when we make a specific choice of orthogonal basis functions, namely $\{\phi_0(\xi), \phi_1(\xi), \dots, \phi_N(\xi)\} = \{T_0(\xi), T_1(\xi), \dots, T_N(\xi)\}$, i.e., we choose the classical Chebyshev polynomials, as discussed in references [2-6] and Appendix A, as the basis functions, and we also choose the M+1 cosine sample points (also known [2-6] as the CGL nodes, in honor of Chebyshev-Gauss-Lobatto):

$$\xi_j = \cos(j\pi/M), \qquad j = 0, 1, 2, \dots, M \tag{12}$$

Consistent with the classical orthogonality conditions for Chebyshev polynomials, we adopt the weight matrix $W = \mathrm{diag}\{1/2, 1, 1, \dots, 1, 1, 1/2\}$. Upon substituting the sample points of Eq (12) and the chosen weight matrix, it is easy to verify that the orthogonality conditions of Eqs (10) are satisfied and the least square coefficients of Eqs (11) are specifically

$$a_\alpha = \frac{1}{c_\alpha} \sum_{j=0}^{M} W_j\, T_\alpha(\xi_j) f(\xi_j) = \frac{1}{c_\alpha}\left\{ \tfrac{1}{2} T_\alpha(\xi_0) f(\xi_0) + T_\alpha(\xi_1) f(\xi_1) + \cdots + T_\alpha(\xi_{M-1}) f(\xi_{M-1}) + \tfrac{1}{2} T_\alpha(\xi_M) f(\xi_M) \right\} \tag{13}$$

where the denominators $c_\alpha$ in Eq (13) are the positive constants

$$c_\alpha = \sum_{j=0}^{M} W_j\, T_\alpha^2(\xi_j) = \left\{ \tfrac{1}{2} T_\alpha^2(\xi_0) + T_\alpha^2(\xi_1) + \cdots + T_\alpha^2(\xi_{M-1}) + \tfrac{1}{2} T_\alpha^2(\xi_M) \right\}, \qquad \alpha = 0, 1, 2, \dots, N \tag{14}$$
or, more explicitly, it can be verified that these inner products reduce to

$$\begin{aligned}
c_0 &= \langle T_0(\xi), T_0(\xi) \rangle = M \\
c_\alpha &= \langle T_\alpha(\xi), T_\alpha(\xi) \rangle = M/2, \qquad \alpha = 1, 2, \dots, N-1 \\
c_N &= \langle T_N(\xi), T_N(\xi) \rangle = M, \quad \text{if } M = N \ \text{(interpolation case)} \\
c_N &= \langle T_N(\xi), T_N(\xi) \rangle = M/2, \quad \text{if } M > N \ \text{(least squares case)}
\end{aligned} \tag{15}$$

so that the final coefficients are computed directly from the discrete inner products as

$$\begin{aligned}
a_0 &= \frac{\langle T_0(\xi), f(\xi) \rangle}{\langle T_0(\xi), T_0(\xi) \rangle} = \frac{1}{M}\left\{ \tfrac{1}{2} T_0(\xi_0) f(\xi_0) + T_0(\xi_1) f(\xi_1) + \cdots + T_0(\xi_{M-1}) f(\xi_{M-1}) + \tfrac{1}{2} T_0(\xi_M) f(\xi_M) \right\} \\
a_\alpha &= \frac{\langle T_\alpha(\xi), f(\xi) \rangle}{\langle T_\alpha(\xi), T_\alpha(\xi) \rangle} = \frac{2}{M}\left\{ \tfrac{1}{2} T_\alpha(\xi_0) f(\xi_0) + \cdots + T_\alpha(\xi_{M-1}) f(\xi_{M-1}) + \tfrac{1}{2} T_\alpha(\xi_M) f(\xi_M) \right\}, \qquad \alpha = 1, 2, \dots, N-1 \\
a_N &= \frac{\langle T_N(\xi), f(\xi) \rangle}{\langle T_N(\xi), T_N(\xi) \rangle} = \frac{1}{c_N}\left\{ \tfrac{1}{2} T_N(\xi_0) f(\xi_0) + \cdots + T_N(\xi_{M-1}) f(\xi_{M-1}) + \tfrac{1}{2} T_N(\xi_M) f(\xi_M) \right\}, \qquad c_N = \begin{cases} M, & M = N \\ M/2, & M > N \end{cases}
\end{aligned} \tag{16}$$
Note that the coefficients of Eq (16) are computed independently of each other, and the absolute value of each coefficient bounds the maximum contribution of that term; this gives a convenient means for obtaining efficient and accurate truncated approximations, as well as insight for adapting the order of the approximation. If a vector-matrix form is desired for the least squares solution for the coefficients, we can re-arrange Eqs (16) in the form
$$\mathbf{a} = A \mathbf{f} \tag{17}$$

where the Chebyshev least square operator matrix is

$$A = \frac{1}{M}\begin{bmatrix}
T_0(\xi_0)/2 & T_0(\xi_1) & \cdots & T_0(\xi_{M-1}) & T_0(\xi_M)/2 \\
T_1(\xi_0) & 2T_1(\xi_1) & \cdots & 2T_1(\xi_{M-1}) & T_1(\xi_M) \\
\vdots & & & & \vdots \\
T_{N-1}(\xi_0) & 2T_{N-1}(\xi_1) & \cdots & 2T_{N-1}(\xi_{M-1}) & T_{N-1}(\xi_M) \\
T_N(\xi_0) & 2T_N(\xi_1) & \cdots & 2T_N(\xi_{M-1}) & T_N(\xi_M)
\end{bmatrix} \tag{18}$$

We mention that the cosine nodes of Eq. (12) locate all M−1 interior extrema of the Chebyshev polynomial $T_M(\xi)$, as well as the two end points of the approximation interval. Also note that the extrema of these polynomials, and therefore the sample points, cluster near ±1 as the degree increases. See Figure A.1 of Appendix A. The particular weight matrix $W = \mathrm{diag}\{1/2, 1, 1, \dots, 1, 1, 1/2\}$ is consistent with the classical Chebyshev polynomials satisfying the orthogonality conditions of Eq (10). The choice of an identity weight matrix, for example, together with the Gram-Schmidt process [6], gives rise to a related set of orthogonal polynomials. The approximation properties of the Chebyshev polynomials are well-researched and a substantial literature exists related to this choice; therefore we adopt the slight modification of the identity weight matrix. Observe that the unit weights apply to all interior maxima and minima, whereas the ½ weights apply to the two end points. We also mention that Bai's recent dissertation and some of the related literature are consistent with the above, but care must be taken in reading these references due to a factor of ½ applied to the zeroth and/or the Nth terms of the summations, depending on whether M > N or M = N. In particular, the second of Eqs (16) is frequently used to compute all N+1 coefficients $a_\alpha$, and this is then compensated by introducing a ½ factor into the zeroth and Nth terms of Eq (4). This somewhat unusual inner product notation in the literature is eliminated by the notations above. While the competing conventions and inner product definitions (which implicitly incorporate the weights and $c_\alpha$ terms) are not wrong, we believe the above formulation leads to a logical path for generalizing the classical weighted least square formulations to the analogous developments for approximating functions of n variables, as we show below.
The first few Chebyshev polynomials and the recurrence relationship are given by

$$T_0(\xi) = 1, \quad T_1(\xi) = \xi, \quad T_2(\xi) = 2\xi^2 - 1, \quad T_3(\xi) = 4\xi^3 - 3\xi, \ \dots, \quad T_{k+1}(\xi) = 2\xi\, T_k(\xi) - T_{k-1}(\xi) \tag{19}$$

The first few Chebyshev polynomials are plotted in Appendix A. As M and N become large, the Chebyshev polynomials constitute a complete set of basis functions; therefore, theoretically, a linear combination of these basis functions can represent a continuous function $f(\xi)$ on the interval $-1 \le \xi \le 1$ to arbitrary precision, given sufficiently high M and N in Eq (4). Some functions "submit" to accurate approximation with small M and N, while in other cases larger M and N are required to achieve a small approximation error. Note that the absence of a matrix inverse allows great flexibility and efficiency, analogous to other orthogonal approximation techniques, such as Fourier series. The above Chebyshev polynomial formulation is known to be relatively immune to the so-called Runge phenomenon, wherein the approximation errors near the ends of the data support at ±1 can become unacceptably large. The dense sampling near the ends of the approximation interval associated with Eq. (12) implicitly reduces the errors near the boundary. Also, the fact that no numerical matrix inversion is required for orthogonal polynomials means that the approximation can be robustly computed at any desired or required order. These advantages are best illustrated by numerical examples. Prior to considering these examples, let us compare the location of the nodes (sample points) of Eq. (12) to the most elementary alternative of uniformly spaced samples, $\{\xi_i = -1 + 2(i/M),\ i = 0, 1, 2, \dots, M\}$; see Figure 1, which shows the cosine sample point density of Eq (12) versus the uniform sample density for M = 2, 3, 4, and 20 sample points. Note the clustering near the ends of the interval and the increasing sparseness nearer the center of the interval.
Figure 1 Cosine Nodes
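The computations in Eqs (12)-(16) and the recurrence of Eq (19) are simple to implement. The following minimal Python/NumPy sketch (ours, for illustration only; the paper's computations were done in MATLAB) computes the discrete least square Chebyshev coefficients at the CGL nodes. The test function and the (N, M) values below are arbitrary illustrative choices:

```python
import numpy as np

def chebyshev_basis(xi, N):
    """Evaluate T_0(xi), ..., T_N(xi) via the recurrence of Eq (19)."""
    T = np.ones((N + 1, xi.size))
    if N >= 1:
        T[1] = xi
    for k in range(1, N):
        T[k + 1] = 2.0 * xi * T[k] - T[k - 1]
    return T

def chebyshev_lsq_coeffs(f, N, M):
    """Least square coefficients a_0..a_N from M+1 CGL samples, Eqs (12)-(16)."""
    xi = np.cos(np.arange(M + 1) * np.pi / M)    # cosine (CGL) nodes, Eq (12)
    w = np.ones(M + 1); w[0] = w[-1] = 0.5       # diagonal of the weight matrix W
    a = (chebyshev_basis(xi, N) * w) @ f(xi)     # inner products <T_alpha, f>
    c = np.full(N + 1, M / 2.0); c[0] = M        # <T_alpha, T_alpha>, Eq (15)
    if M == N:
        c[N] = M                                 # interpolation case
    return a / c

# sanity check on a smooth function: the error falls rapidly toward machine zero
f = lambda x: np.exp(x) * np.sin(5.0 * x)
a = chebyshev_lsq_coeffs(f, N=30, M=60)
xg = np.linspace(-1.0, 1.0, 1001)
print(np.max(np.abs(chebyshev_basis(xg, 30).T @ a - f(xg))))
```

Because the coefficients are uncoupled, truncating the series or raising N requires no refactorization; only additional inner products are needed.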
NUMERICAL EXAMPLES ON ILLUSTRATIVE TEST FUNCTIONS (of one variable)

To appreciate the above developments, let us approximate the test function (Test Function 1)

$$f(\xi) = \frac{\xi}{2} + \frac{(\xi + \tfrac{1}{10})\sin(5\xi + 1)}{1 + \xi^2 \sin^2(\xi - \tfrac{1}{2})} \tag{20}$$
and use Eq (12) to generate measurements with either M = 300 or M = N, with N swept. The true function is shown in Figure 2, and Figures 3-6 show several approximations. For reference, in addition to the orthogonal Chebyshev approximation using the basis functions $\{T_0(\xi), T_1(\xi), \dots, T_N(\xi)\}$ and the cosine nodes, we also show least square approximations with the power series polynomial basis functions $\{1, \xi, \xi^2, \dots, \xi^N\}$ of the same (M, N) and uniform nodes. The Runge phenomenon is evident in Figure 5 (note the large boundary errors for the power series approximation versus the more uniform Chebyshev approximation). Furthermore, we see in Figure 6 that for the case of high degree approximation using Chebyshev polynomials, we can approximate Test Function 1 with a residual error approaching machine zero. All computations were done using MATLAB® with 16-digit (double precision) floating point arithmetic. Figure 7 shows the maximum errors that result from least square approximation when M = 300 measurement nodes are used, for the Chebyshev and power series polynomial approximations of Test Function 1. As is evident, the uniform convergence of the Chebyshev approximation again approaches machine precision by N = 50, with the maximum error decreasing about one order of magnitude each time the degree N is increased by 3. On the other hand, the convergence slope is much less for the power series approximation.

…$(M_x > N_x)$; for the square $M_x = N_x$ case, the last row is $[\,T_{N_x}(\xi_0)/2 \quad T_{N_x}(\xi_1) \quad \cdots \quad T_{N_x}(\xi_{M_x-1}) \quad T_{N_x}(\xi_{M_x})/2\,]$.
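Returning to Test Function 1: assuming the signs in the reconstruction of Eq (20) above, the degree sweep just described can be reproduced with the helpers chebyshev_basis and chebyshev_lsq_coeffs from the earlier sketch:

```python
import numpy as np

# Test Function 1, Eq (20); the signs are a best reading of the original text
f1 = lambda x: x / 2 + (x + 0.1) * np.sin(5 * x + 1) / (1 + x**2 * np.sin(x - 0.5)**2)

xg = np.linspace(-1.0, 1.0, 2001)
for N in (10, 20, 40, 80):                       # degree swept, M = 300 fixed
    a = chebyshev_lsq_coeffs(f1, N=N, M=300)
    err = np.max(np.abs(chebyshev_basis(xg, N).T @ a - f1(xg)))
    print(N, err)                                # maximum error drops steadily with N
```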
We mention, without going through the details, that this formulation readily extends to ndimensions, for example for approximating a function in three dimensions, Eq. (46) are simply
13
C (TW )1 TW Cx C y Cz , W Wx Wy Wz ,
f W 1/2 f
(48)
where C z has the same form as Eq (47) and the coefficients are given by Eq (33) as before. We illustrate in the figures below what cosine sampling looks in 1, 2, and 3 dimensional spaces.
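A minimal numerical illustration of this Kronecker structure in two dimensions follows (our sketch; cheb_ls_operator is the A matrix of Eq (18), and the test function is an arbitrary smooth choice). It verifies that the large Kronecker-product operator and the equivalent small-matrix form $A_x F A_y^T$ produce identical coefficients, which is the computational payoff proven in Appendix B:

```python
import numpy as np
from numpy.polynomial.chebyshev import chebvander

def cheb_ls_operator(N, M):
    """Chebyshev least square operator A of Eq (18): a = A f at the CGL nodes."""
    xi = np.cos(np.arange(M + 1) * np.pi / M)
    w = np.ones(M + 1); w[0] = w[-1] = 0.5
    T = chebvander(xi, N).T                      # rows are T_0..T_N at the nodes
    c = np.full(N + 1, M / 2.0); c[0] = M
    if M == N:
        c[N] = M
    return (T * w) / c[:, None]

Nx = Ny = Mx = My = 20
Ax, Ay = cheb_ls_operator(Nx, Mx), cheb_ls_operator(Ny, My)
xi = np.cos(np.arange(Mx + 1) * np.pi / Mx)
eta = np.cos(np.arange(My + 1) * np.pi / My)
F = np.cos(3 * xi)[:, None] * np.sin(2 * eta)[None, :]   # samples f(xi_i, eta_j)
a_kron = np.kron(Ax, Ay) @ F.ravel()             # large Kronecker-product operator
a_mat = (Ax @ F @ Ay.T).ravel()                  # equivalent small-matrix form
print(np.max(np.abs(a_kron - a_mat)))            # agreement to machine precision
```

The small-matrix form never assembles the large operator, which is the practical route in higher dimensions.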
NUMERICAL EXAMPLES ON ILLUSTRATIVE TEST FUNCTIONS (of two variables)

To construct some two dimensional test cases that relate closely to the above one dimensional examples, we define

$$G(x) = \frac{x}{2} + \frac{(x + \tfrac{1}{10})\sin(5x + 1)}{1 + x^2 \sin^2(x - \tfrac{1}{2})}$$

Test Function 2: $f(\xi, \eta) = G(\xi)\,G(\eta)$   (49)
Below, Figure 9 illustrates the cosine nodal distribution in one, two, and three dimensional spaces; the generalization to a hypercube is straightforward. We now consider several cases analogous to the one dimensional case, but we omit detailed discussions so that we can get to the most important results in this paper, namely applying these ideas to important problems in astrodynamics. Experiments similar to those for Test Function 1 are performed, using Eq (12) to generate measurements with either M = 80 or M = N, with N swept. The true function is shown in Figure 10, and Figures 11 and 12 show several approximations. The Runge phenomenon evident in Figure 5 (note the large boundary errors for the power series approximation versus the more uniform Chebyshev approximation, whose error is concentrated in the center) can be expected to generalize to higher dimensioned cases. We can approximate Test Function 2 with a residual error approaching machine zero. All computations were done using MATLAB® with 16-digit (double precision) floating point arithmetic.
Figure 9 Multidimensional Cosine Meshes for Discrete Orthogonality of Chebyshev Polynomials in n Dimensional Approximation

Figure 10 Test Function 2, $f(\xi, \eta) = G(\xi)\,G(\eta)$
Fig 11a Chebyshev Approximation (M=N=5)
Fig 11b Power Series Approximation (M=N=5)
Fig 11c Chebyshev Approximation (M=N=10)
Fig 11d Power Series Approximation (M=N=10)
Fig 11e Chebyshev Approximations (M=N=30)
Fig 11f Chebyshev Approximations (M=N=30)
Figure 11 Approximation of Test Function 2
Fig 12a Chebyshev Error (M = N = 10)
Fig 12b Power Series Error (M = N = 10)
Fig 12c Chebyshev Error (M = N = 30)
Fig 12d Power Series Error (M = N = 30)
Fig 12e Chebyshev Maximum Approx. Error (M = 80)
Fig 12f Power Series Maximum Approx. Error (M = 80)

Figure 12 Approximation Error of Test Function 2
Figure 11 shows the approximation results for the Chebyshev and power series polynomial approximations of Test Function 2 for M = N = 5, 10, 30. The power series experienced large Runge errors near the boundary and "died" altogether due to ill-conditioning around N ~ 15. Figures 12a-d show the approximation error for the Chebyshev and power series polynomial approximations of Test Function 2 for M = N = 10, 30. Note for low degree approximation that the power series works fairly well in the center of the interval, but encounters large errors near the boundary (see Figures 12b, 12d). Figures 12e and 12f show the maximum errors that result from least square approximation when M = 80 measurement nodes are used, for the Chebyshev and power series polynomial approximations of Test Function 2. The Chebyshev approximations converged to 8 digit accuracy around N = 20, and ~15 digit accuracy (essentially a machine zero approximation error) is obtained around N = 50. The uniform convergence of the Chebyshev approximation again approaches machine precision by N = 50, with the maximum error decreasing about one order of magnitude each time the degree N is increased by 3. On the other hand, the convergence slope is much less for the power series approximation.

CONVERGENCE OF CHEBYSHEV-PICARD ITERATION

For N > 40, the maximum eigenvalue of $C_x C_\alpha$ remains approximately constant at 0.054. This gives rise to the maximum interval length

$$(t_f - t_0)_{\max} = \frac{2/c}{\lambda_{\max}(C_x C_\alpha)} \approx \frac{37}{c}$$

While this condition guarantees convergence of the Picard iterations for a fixed N, it does not guarantee that N is sufficiently high to give an accurate approximation of the solution. It is fortunate, as is evident in Figure 23, that convergence does not degrade for large N; put another way, N > 40 can be adjusted to achieve high solution precision without affecting the rate of Picard iteration convergence. Thus, most fundamentally, we are guaranteed a significant finite time interval over which Picard iteration will converge. For longer time intervals, this suggests "patching" converged sub-arcs together. The remarkable asymptotic behavior of the maximum eigenvalue of $C_x C_\alpha$ is related to an even more unusual behavior of the locus of eigenvalues of $C_x C_\alpha$ with increasing N. As is evident in Figures 21-22, we show the eigenvalue locus results, which are new and are reported here for the first time. For N up to about 100, the right-most eigenvalues migrate toward distinct positions on a circular locus, while the remaining eigenvalues of $C_x C_\alpha$, including all "new" ones that appear as N and the $(N+1)\times(N+1)$ dimension of $C_x C_\alpha$ increase, cluster on the circular locus near the origin. Based on this root locus study, we hypothesize that as $N \to \infty$, an infinite number of eigenvalues cluster with vanishing separation tangent to the circle at the origin. Thus the new eigenvalues for N > 40 cluster ever nearer the origin, while the larger eigenvalues converge to distinct positions on the right-most part of the circular locus in Figure 22. The right-most pair's convergence to their position with increasing N is responsible for the asymptotic behavior in Figure 23. We have studied these remarkable loci carefully and confirmed this behavior, but explaining the cause of these beautiful loci has proven elusive to this point; we will continue the quest in future research. In addition to gaining fundamental insights on the eigenstructure, we seek to learn how these characteristics depend upon the choice of the Chebyshev polynomials and the CGL non-uniform nodes versus, for example, Legendre polynomials with uniformly spaced nodes. For convergence insight into the case of 2nd order differential equations, we consider
$$\frac{d^2 x(t)}{dt^2} = -c\,x(t)$$

We find the velocity update equation is similar to Eq (18), whereas the position update equation is

$$X_{i+1} = -c\left(\frac{t_f - t_0}{2}\right)^{2} C_x C_\alpha C_x C_\alpha\, X_i + \frac{t_f - t_0}{2}\, C_x C_\alpha\, v_0 + C_x\, x_0 \tag{76}$$

We found that the maximum eigenvalue of $C_x C_\alpha C_x C_\alpha$ decreases from 0.038 to 0.003 as N increases from 10 to 40 and, analogous to the case for the first order system, for all N > 40 the maximum $C_x C_\alpha C_x C_\alpha$ eigenvalues asymptotically approach about 0.003. Thus the convergence condition for Picard iteration (MCPI) is approximately $c(t_f - t_0)^2 < 4/(0.003) \approx 1333$; we mention the significant truth that this represents a two order of magnitude increase over the size of the classical estimate of the convergence region $\{c(t_f - t_0)^2 < 12\}$ [2].
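The operators $C_x$ and $C_\alpha$ are defined in a portion of the development not reproduced above; their product plays the role of a node-to-node integration operator on [−1, 1]. The following sketch builds such an operator from NumPy's Chebyshev utilities and examines its spectral radius. Scale conventions here may differ from the authors' definitions, so the printed values illustrate the asymptotic trend rather than reproducing the exact 0.054 figure:

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def node_integration_operator(N):
    """Matrix taking samples g(xi_j) at the CGL nodes to samples of the
    running integral from -1 to xi_j, via fit -> integrate -> evaluate."""
    xi = np.cos(np.arange(N + 1) * np.pi / N)
    S = np.zeros((N + 1, N + 1))
    for j in range(N + 1):
        e = np.zeros(N + 1); e[j] = 1.0          # unit sample vector
        S[:, j] = C.chebval(xi, C.chebint(C.chebfit(xi, e, N), lbnd=-1.0))
    return S

for N in (10, 20, 40, 80, 160):
    rho = np.max(np.abs(np.linalg.eigvals(node_integration_operator(N))))
    print(N, rho)    # the spectral radius settles toward a constant as N grows
```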
Although this linear analysis tells us that the Chebyshev-Picard iteration algorithm only converges on a finite interval, we can anticipate using a piecewise approach over longer intervals to solve a significant family of IVPs over an arbitrarily large time domain. The initial conditions on each subsequent segment are the final state values from the previous segment. This may sound similar to the concept of step size control used in forward integration methods such as Runge-Kutta or analytical continuation methods. However, the step size used by MCPI methods is typically a much larger finite interval than the steps used by typical numerical methods, as will be shown in the examples. Furthermore, compared with the forward integration methods, in which the integration errors typically increase with time in a secular, unstable fashion, we anticipate that better stability/accuracy can be achieved using MCPI methods because, qualitatively, the largest errors from MCPI methods usually appear in the middle of the interval and the smallest errors are at the ends, where adjacent (successive) segments are joined. The fundamental reason for this special characteristic of MCPI methods is the chosen Chebyshev basis functions and CGL nodes, which are denser at the boundaries and sparser in the middle.
Fig 21 Eigenvalue locus of $C_x C_\alpha$ (N = 10 to 90)
Fig 22 Eigenvalue locus of $C_x C_\alpha$ (N = 100 to 900)
Fig 23 Maximum eigenvalue of $C_x C_\alpha$ versus polynomial order N
Numerical Examples

All computations underlying this paper (other than the parallel computations) were done on a conventional PC. The settings of the computer and the development environment used are the following:
- Intel(R) Pentium(R) D CPU, 3.4 GHz, 2.0 GB of RAM
- Windows XP Operating System
- MATLAB R2009b
- NVIDIA GeForce 9400GT Graphics Card
- Microsoft Visual Studio 2005

Example 1: A First Order Nonlinear System

Consider the dynamic equation
$$\frac{dy}{dt} = f(t, y) = \epsilon \cos(t + y), \qquad t_0 = 0, \quad t_f = 256\pi, \quad y(t_0) = 1, \quad \epsilon = 0.001 \tag{77}$$
Fukushima [50] suggested this problem, which has an analytical solution, as a benchmark with a known truth for conducting convergence and accuracy studies. We first compare the results in solving an IVP for Eq (77) by using MCPI methods implemented in MATLAB and by using the "garden variety" solver ODE45, a Runge-Kutta 4(5) method implemented in MATLAB. For the more significant nonlinear problems below, we use more sophisticated (and efficient) integrators as the basis for comparison. The results are shown for this first example in Figure 24. The following conclusions can be drawn immediately:
1) For this tuning, the MCPI solutions have about one order of magnitude better accuracy than the ODE45 solutions, and the CPU time using ODE45 is about 60 times longer than that of the MCPI methods. We further note that orders (N) up to several thousand are feasible with MCPI without numerical difficulty, owing to the orthogonality properties (no matrix inverses or other linear algebra ill-conditioning opportunities to lose significant digits) and, especially, highly efficient recursions based on simple inner products. The optimal order is typically much less, but obviously high order approximation in numerical integration now takes on a new meaning.

2) As we will show later, these solutions have not taken advantage of the fact that the long intervals can be subdivided, which will reduce the order of the required polynomial approximations; thus more speedup can be achieved by the MCPI methods when implemented on a serial machine. Furthermore, if a parallel computation environment is available, more speedup will be obtained because the uncoupled function evaluations and matrix operations can be distributed to different processors.

3) While the errors from the reference ODE45 solution have a typical secular increase, which is a pattern common to all forward integration methods, note that the errors using the MCPI method have their maximum values near the middle of the interval and their smallest values at the boundaries. To graphical precision on a log scale, there is negligible secular error growth in this example. The fundamental reason is the CGL nodes, which are dense at the boundaries and sparse in the middle. Notice this special feature makes MCPI methods more attractive than the forward integration methods in reducing the global errors for long time integrations, for which the solutions in different segments have to be patched together at the terminal points of each solution interval, where the errors are typically smallest.

4) We note convergence can be obtained up to some problem dependent maximum final time. For linear problems, this maximum can be determined. For nonlinear problems, approximation or adaptive tuning is required. The interval for practical convergence is greater than 256π (~128 oscillation periods).
Figure 24 Integration errors and CPU time for Example 1 (MCPI CPU = 0.054 s; ODE45 CPU = 3.17 s)

For qualitative purposes, we note that linearizing the dynamic equation leads to the approximate equation $dy/dt \approx \epsilon\cos(t) - \epsilon\sin(t)\,y$, so the linear (in y) coefficient is bounded by ±ε. Even though it is not rigorous, we can estimate from the above analysis of the analogous constant coefficient linear system that convergence might be expected if $H < 2/(\epsilon\,\lambda_{\max}(C_x C_\alpha))$. Thus, with the chosen polynomial order N > 100, the idealized convergence analysis suggests that H should be less than $2/((0.001)(0.05)) \approx 40000$. These approximations are typically useful for starting estimates. In this case we verified that excellent convergence for the nonlinear system was actually achieved if $H \le 800 \cdot 2\pi \approx 5026.5$. Perhaps the most striking feature of this example is that very high precision can be achieved via MCPI over long time periods including many main period oscillations of a nonlinear system, whereas many time steps per period are required by all known step-by-step integrators to achieve comparable precision.
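The essence of the serial update can be conveyed in a few lines of Python/NumPy. This is a simplified sketch built on NumPy's Chebyshev fit/integrate/evaluate utilities rather than the $C_x$, $C_\alpha$ matrix recursions of the actual MCPI algorithm, and the segment length and order below are illustrative tuning choices for Eq (77):

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def picard_chebyshev_ivp(f, t0, tf, y0, N, tol=1e-12, max_iter=200):
    """Solve dy/dt = f(t, y) on [t0, tf] by Picard iteration,
    y <- y0 + int f(t, y) dt, with the path held as values at CGL nodes."""
    xi = np.cos(np.arange(N + 1) * np.pi / N)   # CGL nodes on [-1, 1]
    t = t0 + (xi + 1.0) * (tf - t0) / 2.0       # mapped sample times
    y = np.full(N + 1, float(y0))               # constant starting path
    for _ in range(max_iter):
        g = f(t, y) * (tf - t0) / 2.0           # dt/dxi scaling of the RHS
        coef = C.chebint(C.chebfit(xi, g, N), lbnd=-1.0)
        y_new = y0 + C.chebval(xi, coef)        # Picard update at the nodes
        if np.max(np.abs(y_new - y)) < tol:
            return t, y_new
        y = y_new
    return t, y

eps = 0.001
f = lambda t, y: eps * np.cos(t + y)            # Eq (77)
t, y = picard_chebyshev_ivp(f, 0.0, 16 * np.pi, 1.0, N=60)   # one 16*pi segment
```

Longer spans are handled by patching such segments, passing the final state of one segment as the initial condition of the next, as discussed above.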
Example 2: A Second Order Nonlinear System

The following second order differential equation has the same analytical solution as the above first order example, and allows us to conduct convergence studies for integrators designed for second order systems:
$$\frac{d^2 y}{dt^2} = f(t, y) = -\sin(t + y) - \frac{1}{2}\cos(2t + 2y), \qquad t_0 = 0, \quad t_f = 256\pi, \quad y(t_0) = 1, \quad \dot{y}(t_0) = 1 \tag{78}$$

Among the many convergent possibilities, we have tuned the second order MCPI method to use a Chebyshev polynomial of order 130 to approximate the solution over an interval length of 16π (8 periods of the unperturbed oscillations), and found convergent solutions on the sixteen segments of 16π duration that are patched together to generate the final solution. At the starting iteration, all the positions and velocities at the N+1 CGL nodes are simply chosen as the straight line ensuing from the initial position and the initial velocity; thus a very poor starting guess is provided for the MCPI methods, so the timing results will be very conservative. To provide a more meaningful comparison of relative efficiency, we adopt the 12th order Runge-Kutta-Nystrom algorithm RKN12(10) with adaptive step size control. The errors of MCPI and RKN12(10) are shown in Figures 25 and 26. We can see that, with slightly better accuracy achieved by the MCPI solution, MCPI also obtained a speedup of about 32 relative to RKN12(10). This speedup is (in spite of the fact that the number of function evaluations was not substantially reduced) a consequence of the recursive vector-matrix nature of the MCPI algorithm. As the force model becomes more complicated, this speedup advantage on a serial machine can be expected to be smaller. The RKN12(10) algorithm calls the function evaluation routine 14,974 times, whereas in total MCPI takes 113 iterations, which leads to 113×(130+1) = 14,803 function evaluations (remarkably, almost the same number in this case). Thus on a serial machine, even with a very poor starting solution estimate, MCPI requires essentially the same number of function evaluations as does RKN12(10). However, it is vitally important to recognize that the MCPI acceleration evaluations are independent, since the entire path approximation is available at once on each iteration. In an ideal parallel environment where we can distribute the function evaluations at the N+1 CGL nodes onto N+1 processors, the theoretical speedup factor is 131, and that limit can be expected to be approached if 131 or more cores are available, since little shared memory is involved. Additionally, comparing the computational time and accuracy of this tuned second order MCPI with the previous first order MCPI using one segment, we see the benefit of using the second order formulation, and also the potential for even better accuracy and more speedup when careful tuning is applied to the MCPI methods. Figures 25 and 26 show that the errors are in the 11th significant figure for both solutions, although the MCPI solution has about ¼ the error norm of the RKN12(10) solution. The speedup achieved on a serial processor was 32; the theoretical speedup on a parallel processor with over 130 cores is two additional orders of magnitude for this problem. Impressive potential exists, if these results for "toy" idealized problems extend
to the problems of orbit mechanics. In the results presented below, the test cases to date indicate that these speedups are typical for the more nonlinear problems of central practical interest.
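The second order formulation can be sketched by cascading two integrations per iteration, mirroring the structure of Eq (76). Again this is a scalar illustration using NumPy utilities, not the authors' matrix implementation, and it uses the same straight-line starting guess described above:

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def picard_chebyshev_ivp2(a, t0, tf, x0, v0, N, tol=1e-12, max_iter=500):
    """Second order Picard iteration: v <- v0 + int a dt, x <- x0 + int v dt."""
    xi = np.cos(np.arange(N + 1) * np.pi / N)    # CGL nodes
    t = t0 + (xi + 1.0) * (tf - t0) / 2.0
    s = (tf - t0) / 2.0                          # dt/dxi for the time mapping
    x = x0 + v0 * (t - t0)                       # straight-line starting path
    v = np.full(N + 1, float(v0))
    for _ in range(max_iter):
        cv = C.chebint(C.chebfit(xi, a(t, x) * s, N), lbnd=-1.0)
        v = v0 + C.chebval(xi, cv)               # velocity update
        cx = C.chebint(C.chebfit(xi, v * s, N), lbnd=-1.0)
        x_new = x0 + C.chebval(xi, cx)           # position update
        if np.max(np.abs(x_new - x)) < tol:
            return t, x_new, v
        x = x_new
    return t, x, v
```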
Figure 25 Second Order MCPI error history (CPU ≈ 0.021 s)
Figure 26 RKN12(10) error history (CPU ≈ 0.856 s)
Example 3: Integration of Unperturbed Keplerian Motion (Natural Second Order System)

Although a classical problem that has been used very often for ODE solver performance comparisons is the planar two-body problem [43-47], we choose to use an example that integrates a three dimensional near circular orbit for one week, to help us draw more practical insight. The dynamic equations are

$$\frac{d^2 x}{dt^2} = -\frac{\mu}{r^3}x; \qquad \frac{d^2 y}{dt^2} = -\frac{\mu}{r^3}y; \qquad \frac{d^2 z}{dt^2} = -\frac{\mu}{r^3}z; \qquad r^2 = x^2 + y^2 + z^2 \tag{79}$$
The six classical orbital elements are: a = 6644.754 km; e = 0.001; i = 68°; ω = −160°; Ω = 92°; Tp = 5.3905×10³ sec. Both the MCPI method and RKN12(10) are tuned such that sub-millimeter position accuracy, relative to the exact analytical solution, is achieved for the whole week. A Chebyshev polynomial of order 40 is chosen for the MCPI method and the (convergent) segment length is selected to be 5400 seconds. The classical F&G solution [15] provides an analytical truth that is used to calculate the solution relative errors, which are smaller than 10⁻¹¹. Several observations are summarized here:

1) The computational time is 0.2639 sec for MCPI and 1.8882 sec for RKN12(10). Thus, with slightly better accuracy in both position and velocity, MCPI achieved a speedup factor of seven.

2) RKN calls the differential equations 29,662 times. Using an initial starting solution in which the position and velocity at all the CGL nodes are the same as the initial position and velocity, MCPI took 2465 iterations in total. Thus on a serial machine, the ratio of function evaluations of RKN over MCPI is 29662/(2465×41) ≈ 0.3. However, in an ideal parallel environment where we can distribute the function evaluations at the N+1 CGL nodes to N+1 processors, the ratio is 29662/2465 ≈ 12.

3) The reason for the speedup of MCPI over RKN in a serial implementation lies in two aspects. The first is that the matrix-vector form of the MCPI approach is computationally very efficient. The second is the large step size that MCPI can use. For RKN12(10), the minimum step size is 0.0465 sec, the maximum is 629.4089 sec, and the mean is 363.4615 sec, which is ~7% of the full-orbit step size that MCPI is using; there is a significant qualitative difference in approximating a path versus taking small steps along it!

4) We have gained some preliminary insight about how to tune the polynomial order and the segment length. Figures 27-30 show the computational time and accuracy for the MCPI method when the orders are chosen from 40 to 300, and the segment lengths are chosen from about 10% of the orbit period up to ~2.2 orbit periods. The computational time is shown in Figure 27 and its contour plot in Figure 28. The minimum computation time is 0.1847 sec, which is obtained when
we choose N = 50 and the segment length is 10260 sec (about 1.9 orbits). Notice that although we currently do not have a rigorous way to find these optimal settings for the minimum computational time, we have a large region (of time intervals and approximation orders) where we can obtain sub-optimal solutions (the region where the computational time is less than one second, which is still significantly faster than RKN12(10)). The most time consuming settings are the cases where low order Chebyshev polynomials are used to integrate a rather large segment. We also characterize the solution errors as the maximum global relative error

$$e = \max_t\left(\frac{\left|\,\mathbf{r}(MCPI, t) - \mathbf{r}(FG, t)\,\right|}{\left|\,\mathbf{r}(FG, t)\,\right|}\right) + \max_t\left(\frac{\left|\,\mathbf{v}(MCPI, t) - \mathbf{v}(FG, t)\,\right|}{\left|\,\mathbf{v}(FG, t)\,\right|}\right)$$

where the first argument indicates the method used to compute the solution and FG denotes the classical analytical solution. The significant figures shown in Figures 29 and 30 are defined as ε = −log₁₀(e). Looking at Figures 28 and 30, we can see that we have a large region where we can obtain more than eleven significant digits in less than one second of computational time. The irregularity of the 11th, and especially the 12th, significant digit contour is a consequence of the solution accuracy approaching the noisy precision limit associated with finite word length arithmetic (a machine precision of ~14 digits); the RKN12(10) algorithm also experiences similar bumpy convergence when it approaches 12 digit accuracy, but only the step size tolerance was available for tuning. In this case, we have the choice over a large space of interval lengths and orders to achieve 12 digit accuracy; but of course, 9 digit accuracy for orbit problems is typically considered sufficient for "engineering accuracy," since this already corresponds to cm precision. For runtime efficiency, the optimal region in this tuning space for serial machines is as near the top left boundary of Figures 28 and 30 as accuracy allows, whereas for parallel machines it is near the top boundary but further to the right: we accept the longest practical convergence interval, since the order can be adjusted by moving right to larger N equal to the number of cores available, so that the N+1 function evaluations along each iterative trajectory can be carried out simultaneously to achieve a theoretical speedup of N+1. The flatness of the efficiency and accuracy surfaces and their large overlapping sweet spots permit a large space for adjustment to take full advantage of various parallel architectures.

MCPI Preliminary Results for Propagating a Family of Perturbed Orbits

As we discussed earlier, the ability to propagate satellite motion quickly and accurately is one of the major factors that affect performance for tasks such as collision avoidance. For these tasks, numerical integration of the satellite motion with ever more accurate and complicated perturbation force models has become necessary. Possibly a degree and order (200,200) or higher gravity model and a time-varying atmospheric density model will be required to adequately model perturbation accelerations. In the following preliminary studies, we include only the zonal harmonic perturbation forces up to the fifth order in the dynamic models and investigate how the performance of the algorithms changes as the force model becomes more complicated. A future study will evaluate the impact of including all of the gravitational effects up to degree and order (200,200). Including zonal harmonic perturbations up to order k leads to the dynamic equation

$$\ddot{\mathbf{r}} = -\frac{\mu}{r^3}\,\mathbf{r} + \sum_{i=2}^{k} \mathbf{a}_{d_i} \tag{80}$$
Figure 27 MCPI Computational Time

Figure 28 Contour of MCPI CPU Time (note the large efficiency "sweet spot," relatively invariant to interval and order)

Figure 29 MCPI Significant Figures (note the extremely large accuracy "sweet spot" for 11 to 12 significant digits)
where $\mathbf{a}_{d_i}$ is the ith of k perturbation terms. We compare the computational time and the number of function evaluations when using MCPI and RKN12(10). We look at a low eccentricity problem as well as a high eccentricity problem. Four perturbed force models were examined:
- Inverse-square gravity force + J2 perturbation
- Inverse-square gravity force + J2 + J3 perturbations
- Inverse-square gravity force + J2 + J3 + J4 perturbations
- Inverse-square gravity force + J2 + J3 + J4 + J5 perturbations

The purpose of considering the perturbations in this way is to assess the role that model complexity plays in the relative efficiency and accuracy of MCPI in comparison to existing methods. We also vary the eccentricity of the orbits from near circular to very eccentric, to assess the degree to which rapidly varying nonlinearity impacts the relative merits of the several algorithms. We discuss the MCPI results with these models below, for low and high eccentricity orbits.

Example 6: Integration of Perturbed Orbits with Low Eccentricity (e = 0.01)

The initial conditions for this example are the same as those used in Example 3. For the MCPI method, the Chebyshev polynomial order is 40 and the segment length is 5400 seconds, which have been tested in the unperturbed problem to provide sub-millimeter position accuracy. For the four perturbed cases, although no analytical solutions are available, we have verified that the relative energy changes for both methods are in the range of 10⁻¹³. The computational times for the two methods are shown in Figure 31 and the comparison results are shown in Figures 32-34. The order "1" case is the unperturbed Example 3 studied before; we include it here to illustrate the performance trend with respect to the complexity of the perturbation models. Figure 32 shows that the MCPI method achieved a speedup of six to eleven times over RKN12(10). Figure 33 shows that RKN12(10) makes about 30% of the number of function evaluations required by the MCPI method. Figure 34 shows that in an ideal parallel computation environment, where we can distribute the force evaluations at the (N+1) CGL nodes onto (N+1) processors, RKN12(10) makes about twelve times the number of function evaluations required by the MCPI method. Although the speedup achieved by MCPI is shown in Figure 32 to be decreasing in a serial implementation, we predict that the trend will change to benefit the MCPI method on an advanced parallel machine because of the following three reasons:
1) Figures 33 and 34 show that as more perturbation terms are included, the function call ratio of RKN12(10) over MCPI increases.

2) Figures 35 and 36 show some preliminary results on the speedups obtained by the GPU-accelerated MCPI over the MATLAB MCPI using an NVIDIA GeForce 9400GT. They demonstrate that the speedup achieved by the parallel MCPI code increases as either the perturbation forces become more complicated or higher order polynomials are used.

3) As we discussed earlier, the parameters for the MCPI method used here are not the optimal settings. We anticipate more speedup can be achieved after we establish an adaptive tuning algorithm.
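For concreteness, a minimal sketch of the acceleration model of Eq (80) truncated at the J2 zonal term is given below (our illustration with assumed Earth constants; the J3-J5 terms used in the paper's four test models are added analogously):

```python
import numpy as np

MU = 398600.4418      # km^3/s^2, Earth gravitational parameter (assumed value)
RE = 6378.1363        # km, Earth equatorial radius (assumed value)
J2 = 1.08262668e-3    # unnormalized J2 zonal coefficient (assumed value)

def accel_two_body_j2(r_vec):
    """Eq (80) truncated at i = 2: a = -mu r / r^3 + a_J2 (ECI frame, km, s)."""
    x, y, z = r_vec
    r = np.linalg.norm(r_vec)
    a_kepler = -MU * r_vec / r**3
    k = 1.5 * J2 * MU * RE**2 / r**5
    zr2 = 5.0 * (z / r)**2
    a_j2 = -k * np.array([x * (1.0 - zr2), y * (1.0 - zr2), z * (3.0 - zr2)])
    return a_kepler + a_j2
```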
Figure 31 Computational Time of MCPI and RKN12(10) (e = 0.01)
Figure 32 Speedup of MCPI over RKN12(10) (e = 0.01)
Figure 33 Ratio of Function Calls in a Serial Computer (e = 0.01)
Figure 34 Ratio of Function Calls in an Ideal Parallel Architecture (e = 0.01)
Figure 35 Speedup of GPU-MCPI (N = 127)
Figure 36 Speedup of GPU-MCPI (N = 511)
Example 7: Integration of Perturbed Orbits with High Eccentricity (e = 0.9)

For this highly eccentric orbit, the six classical orbital elements are: a = 65000 km; e = 0.9; i = 68°; ω = −160°; Ω = 92°; Tp = 1.6492×10⁵ sec. For the MCPI method, the Chebyshev polynomial order is 45, except for the segment passing perigee, during which we use a polynomial of order 110. We have kept the segment length at 1/20 of the orbit, which has been tested in the unperturbed problem to provide sub-millimeter position accuracy. For the perturbed cases, we have verified that the relative energy changes for both methods are in the range of 10⁻¹³. The computational times for the two methods are shown in Figure 37 and the comparison results are shown in Figures 38-40. Figure 38 shows that the MCPI method achieved more than one order of magnitude speedup over RKN12(10); this speedup is higher than the one achieved for the low eccentricity case. Figure 39 shows that RKN12(10) makes about 40% of the number of function evaluations required by the MCPI method, whereas Figure 40 shows that in an ideal parallel computation environment, where we can distribute the force evaluations at the (N+1) CGL nodes onto (N+1) processors, RKN12(10) makes about twenty times the number of function evaluations required by the MCPI method.
Figure 37 Computational Time (e = 0.9)
Figure 38 Speedup of MCPI over RKN12(10) (e = 0.9)
Figure 39 Ratio of Function Calls in a Serial Computer (e = 0.9)
Figure 40 Ratio of Function Calls in an Ideal Parallel Architecture (e = 0.9)
We remark that the tuning of the MCPI algorithm is ad hoc (we did use some insights from the previous examples); even without optimization, the order of magnitude speedup and high precision relative to the competing RKN12(10) algorithm suggest a strong basis for optimism, especially when the highly parallel architecture and adaptive tuning features are fully developed and exploited.

CONCLUDING REMARKS

We have summarized classical and recent developments from approximation theory, with emphasis on representing given complicated functions by orthogonal polynomials in one, two, and higher dimensions. It was shown that arranging the regression matrix to be Kronecker factorable allows array algebra identities to generate the multidimensional orthogonal least square operators directly from the corresponding one dimensional operators. We showed that four orders of magnitude speedup in the computation time to obtain state-of-the-art gravitational acceleration is possible. As a consequence, a new generation of very efficient algorithms results for orbit integration, regardless of the method used to propagate the orbit. A number of elementary nonlinear test functions of one and two variables were introduced and used to show that machine precision approximation results can be routinely obtained. It was further shown that integration of these functions leads to an especially attractive accuracy increase, in that the oscillatory zero mean least square approximation errors are reduced by one order of magnitude through integration. This has immediate implications, and explains in part the impressive results obtained recently by Bai et al using Chebyshev-Picard methods to solve the boundary value problems of celestial mechanics. We provide several examples in this paper. The good news is that we are now in a position to compute gravity globally with 10 or more significant figures at all orbit altitudes. The bad news is that the classical spherical harmonic expansion and analogous global models require > 10⁵ terms to compute local acceleration with a single global expansion. Therefore, it is not attractive to utilize these high order global models to compute local gravity. We showed in this paper that it is feasible to develop an adaptive finite element approximation method that inherently addresses a heretofore unsolved challenge: for FEM gravitational approximation methods, how can we adapt the degree, as a function of radial distance, so that the number of terms in the approximation is automatically minimized while maintaining a prescribed accuracy? The results obtained are illustrated with a number of test cases that show several orders of magnitude reduction in computation time, in comparison to the correspondingly accurate spherical harmonic series. The implications of this result are substantial for propagating space object catalogs, efficient Monte Carlo studies, and analogous operations where orbits must be iteratively computed and propagated over long time intervals. We also summarized the main features of the modified Chebyshev-Picard iteration (MCPI) method for path approximation using related orthogonal approximation methods. These latter developments promise additional speedups for approximating long orbit arcs in a fashion ideally suited for parallel computation, with the prospect of an additional speedup of two or more orders of magnitude. These results enable orbit computation on a personal computer with elapsed times feasible heretofore only using a supercomputer.

REFERENCES
1. Crassidis, J. L., and Junkins, J. L., Optimal Estimation of Dynamic Systems, Second Edition, CRC Press - Taylor and Francis, New York, NY, 2011. ISBN 978-1-4398-3985-0.
2. Bai, X., Modified Chebyshev-Picard Iteration Methods for Solution of Initial Value and Boundary Value Problems, Ph.D. dissertation, Texas A&M Univ, College Station, TX, 2010.
3. Mason, J. C., and Handscomb, D. C., Chebyshev Polynomials, Chapman and Hall/CRC, 2003.
4. Chebyshev, P. L., "Théorie des mécanismes connus sous le nom de parallélogrammes," Mémoires des Savants étrangers présentés à l'Académie de Saint-Pétersbourg, 7:539-586, 1857.
5. Fox, L., and Parker, I. B., Chebyshev Polynomials in Numerical Analysis, Oxford University Press, London, UK, 1972.
6. Singla, P., and Junkins, J. L., Multi-resolution Methods for Modeling and Control of Dynamical Systems, CRC Press, 2009.
7. Snay, R. A., "Applicability of array algebra," Rev. Geophys., 16(3), 459-464, 1978, doi:10.1029/RG016i003p00459.
8. Schaub, H., and Junkins, J. L., Analytical Mechanics of Space Systems, AIAA Education Series, Reston, VA, 2nd ed., 2009.
9. Battin, R., An Introduction to the Mathematics and Methods of Astrodynamics, American Institute of Aeronautics and Astronautics, Inc., Reston, VA, revised edition, 1999.
10. Junkins, J. L., Miller, G. W., and Jancaitis, J. R., "Weighting Function Approach to Modeling of Irregular Surfaces," Journal of Geophysical Research, Vol. 78, No. 11, pp. 1794-1803, 1973.
11. Junkins, J. L., and Jancaitis, J. R., "Modeling in N Dimensions Using a Weighting Function Approach," Journal of Geophysical Research, Vol. 79, No. 23, pp. 3361-3366, 1974.
12. Junkins, J. L., "Investigation of Finite-Element Representations of the Geopotential," AIAA Journal, Vol. 14, No. 6, pp. 803-808, 1976.
13. Pines, S., "Uniform Representation of the Gravitational Potential and its Derivatives," AIAA Journal, Vol. 11, 1973, pp. 1508-1511.
14. Earth Gravitational Model 2008 (EGM2008): http://Earth-info.nga.mil/GandG/wgs84/gravitymod/egm2008/
15. Schaub, H., and Junkins, J. L., Analytical Mechanics of Space Systems, 2nd ed., AIAA Education Series, Reston, VA, 2011.
16. Hofmann-Wellenhof, B., and Moritz, H., Physical Geodesy, Springer-Verlag Wien, 2005. (This text is an updated edition of the 1967 classic by W. A. Heiskanen and H. Moritz.)
17. Arora, N., and Russell, R. P., "Fast, Efficient, and Adaptive Interpolation of the Geopotential," Preprint AAS 11-501, to appear, Journal of the Astronautical Sciences, 2011. http://russell.ae.utexas.edu/FinalPublications/ConferencePapers/2011Aug_Alaska_AAS-11-501_Gravity.pdf
18. Russell, R. P., "Global Point Mascon Models for Simple, Accurate, and Parallel Geopotential Computation," AAS 11-158, preprint.
19. Jones, B. A., "Orbit Determination with the Cubed-Sphere Gravity Model," AAS 10-237.
20. Feagin, T., "An Explicit Runge-Kutta Method of Order Twelve," SIAM J. of Num. Anal., to appear.
21. Hairer, E., "A Runge-Kutta method of order 10," J. Inst. Math. Appl., 21, 1978, pp. 47-59.
22. Betts, J. T., "Survey of numerical methods for trajectory optimization," Journal of Guidance, Control, and Dynamics, vol. 21, no. 2, pp. 193-207, 1998.
23. Lewis, F. L., and Syrmos, V. L., Optimal Control, Wiley-Interscience, New York, NY, 1995.
24. Migdalas, A., Toraldo, G., and Kumar, V., "Nonlinear optimization and parallel computing," Parallel Computing, vol. 29, no. 4, pp. 375-391, Apr. 2003.
25. Travassos, R., and Kaufman, H., "Parallel algorithms for solving nonlinear two point boundary value problems which arise in optimal control," J. of Optimization Theory and Applications, vol. 30, no. 1, pp. 53-71, 1980.
26. Betts, J. T., and Huffman, W., "Trajectory optimization on a parallel processor," Journal of Guidance, Control, and Dynamics, vol. 14, no. 2, pp. 431-439, Mar.-Apr. 1991.
27. Bai, X., Turner, J. D., and Junkins, J. L., "Optimal thrust design of a mission to Apophis based on a homotopy method," AAS/AIAA Spaceflight Mechanics Meeting, Savannah, GA, Feb. 2009.
28. Fahroo, F., and Ross, I. M., "Direct trajectory optimization by a Chebyshev pseudospectral method," J. Guidance, Control & Dynamics, vol. 25, no. 1, pp. 160-166, Jan.-Feb. 2002.
29. Gong, Q., Fahroo, F., and Ross, I. M., "Spectral algorithm for pseudospectral methods in optimal control," J. of Guidance, Control & Dynamics, vol. 31, no. 3, pp. 460-471, 2008.
30. Fahroo, F., and Ross, I. M., "A spectral patching method for direct trajectory optimization," Journal of the Astronautical Sciences, vol. 48, no. 2/3, pp. 269-286, 2000.
31. Picard, E., "Sur l'application des méthodes d'approximations successives à l'étude de certaines équations différentielles ordinaires," J. de Math., vol. 9, pp. 217-271, 1893.
32. Picard, E., Traité d'analyse, vol. 1, chapter 4.7, Gauthier-Villars, Paris, France, 3rd edition, 1922.
33. Van de Craats, J., "On the region of convergence of Picard's iteration," ZAMM - Journal of Applied Mathematics and Mechanics, vol. 52, no. 8, pp. 487-491, Dec. 1971.
34. Agarwal, R. P., "Nonlinear two point boundary value problems," Indian J. of Pure and Applied Math., vol. 4, pp. 757-769, 1973.
35. Coles, W. J., and Sherman, T. L., "Convergence of successive approximations for nonlinear two-point boundary value problems," SIAM J. on Applied Math., vol. 15, no. 2, pp. 426-433, Mar. 1967.
36. Bailey, P. B., "On the interval of convergence of Picard's iteration," ZAMM - Journal of Applied Mathematics and Mechanics, vol. 48, no. 2, pp. 127-128, 1968.
37. Bailey, P., Shampine, L. F., and Waltman, P., "Existence and uniqueness of solutions of the second order boundary value problem," Bulletin, American Math. Soc., vol. 72, no. 1, pp. 96-98, 1966.
38. Van de Craats, J., "On the Region of Convergence of Picard Iteration," ZAMM - Journal of Applied Mathematics and Mechanics / Zeitschrift für Angewandte Mathematik und Mechanik, 62, 487-491, 1972.
39. Coddington, E. A., and Levinson, N., Theory of Ordinary Differential Equations, McGraw-Hill, New York, 1955.
40. Lindelöf, E., "Sur l'application de la méthode des approximations successives aux équations différentielles ordinaires du premier ordre," Comptes rendus hebdomadaires des séances de l'Académie des sciences, Vol. 114, 1894, pp. 454-457.
41. Parker, G. E., and Sochacki, J. S., "Implementing the Picard iteration," Neural, Parallel & Scientific Computations, vol. 4, no. 1, pp. 97-112, 1996.
42. Clenshaw, C. W., "The numerical solution of linear differential equations in Chebyshev series," Mathematical Proceedings of the Cambridge Philosophical Society, vol. 53, pp. 134-149, 1957.
43. Scraton, R. E., "The solution of linear differential equations in Chebyshev series," The Computer Journal, vol. 8, no. 1, pp. 57-61, 1965.
44. Wright, K., "Chebyshev collocation methods for ordinary differential equations," The Computer Journal, vol. 6, no. 4, pp. 358-365, 1964.
45. Norton, H. J., "The iterative solution of non-linear ordinary differential equations in Chebyshev series," The Computer Journal, vol. 7, no. 2, pp. 76-85, 1964.
46. Vlassenbroeck, J., and Van Dooren, R., "A Chebyshev technique for solving non-linear optimal control problems," IEEE Transactions on Automatic Control, vol. 33, no. 4, pp. 333-340, Apr. 1988.
47. Clenshaw, C. W., and Norton, H. J., "The solution of nonlinear ordinary differential equations in Chebyshev series," The Computer Journal, vol. 6, no. 1, pp. 88-92, 1963.
48. Feagin, T., "The numerical solution of two point boundary value problems using Chebyshev series," Ph.D. dissertation, The University of Texas at Austin, Austin, TX, 1973.
49. Feagin, T., and Nacozy, P., "Matrix formulation of the Picard method for parallel computation," Celestial Mechanics and Dynamical Astronomy, vol. 29, no. 2, pp. 107-115, Feb. 1983.
50. Fukushima, T., "Vector integration of dynamical motions by the Picard-Chebyshev method," The Astronomical Journal, vol. 113, no. 6, pp. 2325-2328, Jun. 1997.
51. Fukushima, T., "Picard iteration method, Chebyshev polynomial approximation, and global numerical integration of dynamical motions," The Astronomical Journal, vol. 113, no. 5, pp. 1909-1914, May 1997.
52. Bai, X., and Junkins, J. L., "Solving Initial Value Problems by the Picard-Chebyshev Method with NVIDIA GPUs," 20th Spaceflight Mechanics Meeting, San Diego, CA, Feb. 2010.
53. Bai, X., and Junkins, J. L., "Modified Chebyshev-Picard Iteration Methods for Solution of Initial Value Problems," Kyle T. Alfriend Astrodynamics Symposium, Monterey, CA, May 2010.
Appendix A
Chebyshev Polynomials

Chebyshev polynomials are a set of orthogonal polynomials developed by the Russian mathematician Pafnuty Lvovich Chebyshev in 1857 [4,5]. There are two kinds of Chebyshev polynomials: the kth Chebyshev polynomial of the first kind is usually denoted by $T_k$, and the kth Chebyshev polynomial of the second kind is usually denoted by $U_k$. In this paper, we refer to Chebyshev polynomials of the first kind simply as Chebyshev polynomials. The Chebyshev polynomials can be computed through the recurrence relation
$$T_0(x) = 1, \qquad T_1(x) = x, \qquad T_{k+1}(x) = 2x\,T_k(x) - T_{k-1}(x) \tag{A.1}$$

or the Chebyshev polynomial of degree k can be defined by the identity $T_k(x) = \cos(k \cos^{-1}(x))$. The continuous orthogonality conditions for the Chebyshev polynomials are

$$\int_{-1}^{1} w(x)\,T_n(x)\,T_m(x)\,dx = \begin{cases} 0, & n \ne m \\ \pi, & n = m = 0 \\ \pi/2, & n = m \ne 0 \end{cases}, \qquad w(x) = (1 - x^2)^{-1/2} \tag{A.2}$$

The discrete orthogonality conditions for the Chebyshev polynomials using the CGL nodes are

$$\sum_{k=0}^{M} w_k\,T_n(x_k)\,T_m(x_k) = \begin{cases} 0, & n \ne m \\ M, & n = m = 0 \\ M/2, & n = m \ne 0 \end{cases}, \qquad w_0 = w_M = 1/2, \quad w_k = 1,\ k = 1, 2, \dots, M-1 \tag{A.3}$$

The (M + 1) CGL ("cosine") nodes are calculated from

$$x_k = \cos(k\pi/M); \qquad k = 0, 1, 2, \dots, M \tag{A.4}$$

Indefinite integration of the Chebyshev polynomials has the property

$$\int T_k(x)\,dx = \frac{1}{2}\left[\frac{T_{k+1}}{k+1} - \frac{T_{k-1}}{k-1}\right] \tag{A.5}$$

The first derivative of the Chebyshev polynomials satisfies

$$\frac{dT_k(x)}{dx} = k\,U_{k-1}(x) = k\,(1 - x^2)^{-1}\left[x\,T_k(x) - T_{k+1}(x)\right] \tag{A.6}$$

Thus the integrals and derivatives are expressed as recursions involving Chebyshev polynomials of contiguous degree. The first six Chebyshev polynomials are shown in Fig. A.1.
Figure A.1 Chebyshev Polynomials of the first kind
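A quick numerical check of the discrete orthogonality conditions (A.3) at the CGL nodes (A.4) can be made as follows (a sketch; note that the n = m = M diagonal entry equals M rather than M/2, consistent with the interpolation case of Eq (15)):

```python
import numpy as np
from numpy.polynomial.chebyshev import chebvander

M = 12
x = np.cos(np.arange(M + 1) * np.pi / M)     # CGL nodes, Eq (A.4)
w = np.ones(M + 1); w[0] = w[-1] = 0.5       # weights of Eq (A.3)
T = chebvander(x, M)                         # columns are T_0 .. T_M at the nodes
G = T.T @ (w[:, None] * T)                   # matrix of weighted inner products
print(np.round(G, 12))                       # diag(M, M/2, ..., M/2, M); zeros off-diagonal
```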
Appendix B
Kronecker Factorization and Least Square Approximation

Here we prove the important property regarding Kronecker factorization in least squares: if a matrix $\Phi \in \mathbb{R}^{m \times n}$ of rank n, with $m \ge n$, can be Kronecker factored as

$$\Phi = \Phi_x \otimes \Phi_y \tag{B.1}$$

then the classical normal equations solution

$$\mathbf{a} = (\Phi^T \Phi)^{-1} \Phi^T \mathbf{f} \tag{B.2}$$

can, amazingly, be rewritten as

$$\mathbf{a} = \left[(\Phi_x^T \Phi_x)^{-1} \Phi_x^T \otimes (\Phi_y^T \Phi_y)^{-1} \Phi_y^T\right] \mathbf{f} \tag{B.3}$$

That is, the large "least square operator" $(\Phi^T \Phi)^{-1} \Phi^T$ can be re-written as simply the Kronecker product of two small matrices:

$$(\Phi^T \Phi)^{-1} \Phi^T = (\Phi_x^T \Phi_x)^{-1} \Phi_x^T \otimes (\Phi_y^T \Phi_y)^{-1} \Phi_y^T \tag{B.4}$$

To carry out the proof of this identity, we need the following three properties of Kronecker matrix operations (the inverse property applying for square and nonsingular A and B):

$$(A \otimes B)^T = A^T \otimes B^T \tag{B.5}$$

$$(A_1 \otimes B_1)(A_2 \otimes B_2) = A_1 A_2 \otimes B_1 B_2 \tag{B.6}$$

$$(A \otimes B)^{-1} = A^{-1} \otimes B^{-1} \tag{B.7}$$

The property of Eq. (B.4) is proven as follows. Using the assumed factorization of Eq. (B.1), $(\Phi^T \Phi)^{-1} \Phi^T$ can be written as

$$(\Phi^T \Phi)^{-1} \Phi^T = \left[(\Phi_x \otimes \Phi_y)^T (\Phi_x \otimes \Phi_y)\right]^{-1} (\Phi_x \otimes \Phi_y)^T \tag{B.8}$$

Then, using Eq. (B.5), Eq. (B.8) re-arranges to

$$= \left[(\Phi_x^T \otimes \Phi_y^T)(\Phi_x \otimes \Phi_y)\right]^{-1} (\Phi_x^T \otimes \Phi_y^T) \tag{B.9}$$

And using Eq. (B.6), Eq. (B.9) becomes

$$= \left[\Phi_x^T \Phi_x \otimes \Phi_y^T \Phi_y\right]^{-1} (\Phi_x^T \otimes \Phi_y^T) \tag{B.10}$$

Using Eq. (B.7), Eq. (B.10) is

$$= \left[(\Phi_x^T \Phi_x)^{-1} \otimes (\Phi_y^T \Phi_y)^{-1}\right] (\Phi_x^T \otimes \Phi_y^T) \tag{B.11}$$

And finally, using the property of Eq. (B.6), Eq. (B.11) becomes

$$(\Phi^T \Phi)^{-1} \Phi^T = (\Phi_x^T \Phi_x)^{-1} \Phi_x^T \otimes (\Phi_y^T \Phi_y)^{-1} \Phi_y^T \qquad \text{Q.E.D.} \tag{B.12}$$

This property easily extends to higher dimensioned Kronecker factorizations; for example, if

$$\Phi = \Phi_x \otimes \Phi_y \otimes \Phi_z \tag{B.13}$$

then the larger least square operator can be written as the Kronecker product of three smaller least square operators as

$$(\Phi^T \Phi)^{-1} \Phi^T = (\Phi_x^T \Phi_x)^{-1} \Phi_x^T \otimes (\Phi_y^T \Phi_y)^{-1} \Phi_y^T \otimes (\Phi_z^T \Phi_z)^{-1} \Phi_z^T \tag{B.14}$$

These results are easily extended to include the weighted least square case as well. For the special case that the basis functions in 1, 2, and 3 dimensions satisfy orthogonality conditions such that the off-diagonal elements of $\Phi_x^T \Phi_x$, $\Phi_y^T \Phi_y$, $\Phi_z^T \Phi_z$ vanish, the larger matrix $\Phi^T \Phi$ is likewise diagonal, and we see that Eqs. (B.12) and (B.14) also provide very convenient means for generalizing one dimensional orthogonal approximation operators to higher dimensions. Care must always be taken to understand and properly account for the multidimensional nodal sample patterns and weight matrices, to ensure orthogonality of the basis functions with respect to both the weight matrices and nodal patterns.
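A short numerical confirmation of the identity (B.4) follows (our sketch, with random full-rank factors of arbitrary illustrative dimensions):

```python
import numpy as np

rng = np.random.default_rng(0)
Px = rng.standard_normal((7, 4))   # stands in for Phi_x (m_x > n_x)
Py = rng.standard_normal((6, 3))   # stands in for Phi_y
P = np.kron(Px, Py)                # Phi = Phi_x kron Phi_y, Eq (B.1)

ls = lambda A: np.linalg.solve(A.T @ A, A.T)              # (A^T A)^{-1} A^T
print(np.max(np.abs(ls(P) - np.kron(ls(Px), ls(Py)))))    # ~1e-14, confirming (B.4)
```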