UNIVERSITY OF WISCONSIN-MADISON
COMPUTER SCIENCES DEPARTMENT

Computational aspects of polynomial interpolation in several variables

Carl de Boor(1,2) & Amos Ron(1)

March 1990
ABSTRACT

The pair ⟨Θ, P⟩ of a pointset Θ ⊂ ℝ^d and a polynomial space P on ℝ^d is correct if the restriction map P → ℝ^Θ : p ↦ p|Θ is invertible, i.e., if there is, for any f defined on Θ, a unique p ∈ P which matches f on Θ. We discuss here a particular assignment Θ ↦ Π_Θ, introduced in [3], for which ⟨Θ, Π_Θ⟩ is always correct, and provide an algorithm for the construction of a basis for Π_Θ which is related to Gauss elimination applied to the Vandermonde matrix (ϑ^α)_{ϑ∈Θ, α∈ℤ^d_+} for Θ. We also discuss some attractive properties of the above assignment and various algorithmic details, and present some bivariate examples.
AMS (MOS) Subject Classifications: primary 41A05, 41A10, 41A63, 65D05, 65D15; secondary 15A12

Key Words: exponentials, polynomials, multivariate, interpolation, multivariate Vandermonde, Gauss elimination
Authors' affiliation and address:
Computer Sciences Department
University of Wisconsin-Madison
1210 West Dayton St.
Madison WI 53706
(1) supported by the United States Army under Contract No. DAAL03-87-K-0030
(2) supported by the National Science Foundation under Grant No. DMS-8701275
Computational aspects of polynomial interpolation in several variables

Carl de Boor & Amos Ron

We say that the pair ⟨Θ, P⟩ of a (finite) pointset Θ ⊂ ℝ^d and a (polynomial) space P of functions on ℝ^d is correct if the restriction map P → ℝ^Θ : p ↦ p|Θ is invertible, i.e., if there is, for any f defined (at least) on Θ, exactly one p ∈ P which matches f on Θ, i.e., satisfies p(ϑ) = f(ϑ) for all ϑ ∈ Θ.

Polynomial interpolation in one variable is so basic a Numerical Analysis tool that many textbooks on Numerical Analysis begin with this topic, and none fails to provide a detailed account of it. The topic is associated with the illustrious names of Newton, Cauchy, Lagrange, and Hermite, and is essential for various basic tasks, such as the construction of rules for quadrature and differentiation, or the construction of difference approximations for ordinary differential equations. To be sure, polynomial interpolation is not a general-purpose tool for approximation, for the simple reason that polynomials are only good for local approximation (though for some particularly well-behaved functions, this might mean approximation on the entire line). Even in local approximation, badly handled polynomial interpolation, such as interpolation at equally spaced points, is not to be recommended in general. But well-handled polynomial interpolation, such as interpolation at the Chebyshev points, is one of the most efficient means available for local approximation.

It is therefore all the more annoying that there has not been available a correspondingly simple and effective theory of multivariable polynomial interpolation. The reason is easy to spot: whereas there is a unique interpolant from the space Π_k of polynomials of degree ≤ k for any data given on any (k+1)-pointset in ℝ, there is no corresponding universal multivariable space of polynomials. In other words, a correct polynomial space P for interpolation to an arbitrary f at the given set Θ ⊂ ℝ^d cannot in general be determined from the cardinality #Θ of the pointset Θ alone. Rather, the actual location and configuration of Θ must be taken into account. Further, the standard choice P = Π_k requires that #Θ equal

    dim Π_k(ℝ^d) = (k+d choose d).

Finally, even if Θ satisfies this rather restrictive requirement, there is no guarantee that the pair ⟨Θ, Π_k⟩ is correct.

In [3], we give a particular assignment Θ ↦ Π_Θ for which ⟨Θ, Π_Θ⟩ is always correct, and give an algorithm for the construction of a basis for Π_Θ from Θ. We also prove there some of its nice properties. In the present paper, we list these and other properties of our assignment Θ ↦ Π_Θ and, eventually, verify the additional ones. We also provide some enticing (so we hope) examples. But the main point of the present paper is a detailed discussion of the algorithmic aspects of our particular
choice: How is Π_Θ to be constructed and, once in hand, how is the interpolant from it to be found? We did provide in [3] an algorithm for the construction of Π_Θ, but found to our surprise (cf. [2]) that Π_Θ can also be constructed by Gauss elimination applied to the Vandermonde matrix (ϑ^α) for Θ, but with a twist. This allows us to view our particular assignment Π_Θ in retrospect as arising from a stabilization and symmetrization of a simpleminded approach to finding a correct polynomial space of minimal degree for interpolation at Θ.

The paper is organized as follows: In Section 1, we recall necessary details from [3] concerning the definition of our polynomial interpolant, give a very simple verification of our formula for the interpolant, and give an extensive list of its properties. In Section 2, we use Gauss elimination to extract from the Vandermonde matrix (ϑ^α) (for the given ϑ ∈ Θ) a monomial-spanned polynomial space of lowest possible degree which is correct for interpolation at Θ, and prove that the same calculation also provides a basis for Π_Θ, albeit not a very convenient one. In Section 3, we show that Gauss elimination applied to the Vandermonde matrix, but carried out degree by degree rather than monomial by monomial, leads to a convenient basis for Π_Θ and provides a suitable ordering of the points of Θ, and contrast this with the algorithm proposed in [3], which corresponds to Gauss elimination by columns, with column pivoting but without interchanges, and fails to provide an ordering of the points in Θ. Since, in our multivariable setting, each degree (other than degree 0) involves several monomials, we have to replace the standard goal of Gauss elimination, viz. the generation of zeros below the pivot element, by the more suitable goal of making the entries below the pivot element orthogonal to the pivot element, with respect to a certain weighted scalar product. We believe that such a generalization of Gauss elimination may be advantageous in other situations where more than partial pivoting is needed but total pivoting is perhaps too radical a measure. In Section 4, we introduce a modified power form for multivariate polynomials as well as a nested multiplication algorithm for its efficient evaluation. We believe both the form and the algorithm to be new (with the algorithm closely related to de Casteljau's algorithm for the evaluation of the Bernstein-Bézier form). In Section 5, we give a detailed description (in a MATLAB-like program) of the calculation of the modified power coefficients of our interpolant from the given data (ϑ, f(ϑ)), ϑ ∈ Θ.

We illustrate the interpolation procedure with three examples in Section 6: the first explores the first nontrivial case, that of a four-point set Θ coplanar but not collinear; the second illustrates the close connection of Π_Θ to polynomials which vanish on Θ; and the third shows that the algorithm works sufficiently well to provide the polynomial interpolant to a smooth bivariate function at 40 randomly chosen points. The second example also shows the surprising fact that our interpolant to data at the six vertices of a regular hexagon takes a convex combination of the given function values as its value at every point in a hexagon-shaped region, and makes the point that, for any Θ on some circle in the plane, our polynomial space Π_Θ consists of harmonic polynomials.
In Section 7, we provide discussion and proofs of the various properties listed in Section 1, and close with a short section on a generalization of our process, from point evaluations to arbitrary linear functionals on Π. For alternative approaches to multivariable polynomial interpolation in the literature, see, e.g., the discussion in [2].
1. The interpolant and some of its properties

The leading term p↑ of a polynomial p is, by definition, the homogeneous polynomial for which deg(p − p↑) < deg p. The construction proposed in [3] makes use of an analogous concept for power series, namely the initial term f↓ of a function f analytic at the origin. This is the homogeneous polynomial f↓ for which f − f↓ vanishes to highest possible order at the origin. In other words, f↓ (we call it 'f least' for short) is the first nontrivial term in the power series expansion

    f = f^(0) + f^(1) + f^(2) + ⋯

for f, in which f^(j) is the sum of all the (homogeneous) terms of degree j.

For example, with ϑ·x := Σ_{j=1}^d ϑ(j)x(j) the ordinary scalar product of the two d-vectors ϑ and x, the exponential e_ϑ with frequency ϑ has the power series expansion

    e_ϑ(x) := e^{ϑ·x} = 1 + ϑ·x + (ϑ·x)²/2 + ⋯ .

Therefore,

    (e_ϑ)↓(x) = 1,    (e_ϑ − e_ϑ′)↓(x) = (ϑ − ϑ′)·x,

the latter in case ϑ ≠ ϑ′. We also use the abbreviation

    H↓ := span{f↓ : f ∈ H}

for any linear space H of functions analytic at the origin, and recall from [3] the fact that

(1.1)    dim H↓ = dim H.

In these terms, our assignment for Π_Θ is

(1.2)    Π_Θ := (exp_Θ)↓,

with exp_Θ := span{e_ϑ : ϑ ∈ Θ}. Thus, if Θ consists of a single point, then Π_Θ = Π_0, while if Θ consists of the two points ϑ, ϑ′, then Π_Θ is spanned by the two polynomials 1, (ϑ − ϑ′)·(), i.e., Π_Θ consists of the (two-dimensional) space of all polynomials which are (at most) linear in the direction ϑ − ϑ′ and constant in any direction orthogonal to ϑ − ϑ′.
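Computationally, f↓ is trivial to read off once f is known through (sufficiently many of) its Taylor coefficients: it is the nonzero coefficient block of lowest degree. Here is a minimal MATLAB sketch (ours, not from [3]), under the hypothetical convention that fhat(j) is the coefficient of the monomial with exponent A(j,:), for some list A of multi-indices:

    % f_least from coefficients: keep the nonzero block of lowest degree
    deg = sum(A, 2);                  % |alpha| for each row of A
    k0 = min(deg(abs(fhat) > 0));     % lowest degree with a nonzero coefficient
    fleast = fhat .* (deg == k0);     % coefficients of f_least; all else zero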
The construction of our interpolant also makes use of the pairing

(1.3)    ⟨g, f⟩ := Σ_α D^α g(0) D^α f(0)/α!

defined, e.g., for an arbitrary function g analytic at the origin and an arbitrary polynomial f. The weights in (1.3) are chosen so that point-evaluation at ϑ is represented with respect to this pairing by the exponential e_ϑ, i.e.,

(1.4)    ⟨e_ϑ, f⟩ = f(ϑ),    ϑ ∈ ℝ^d, f ∈ Π,
as one readily verifies by substituting ϑ^α = (D^α e_ϑ)(0) for (D^α g)(0) in (1.3). This justifies the following extension of the pairing to arbitrary g ∈ exp_Θ and f ∈ C(ℝ^d):

    ⟨Σ_{ϑ∈Θ} w(ϑ)e_ϑ, f⟩ := Σ_{ϑ∈Θ} w(ϑ)f(ϑ),    f ∈ C.

This extension is well defined since any collection of exponentials with distinct frequencies is linearly independent (see (2.4)Fact below). Consequently, dim exp_Θ = #Θ, and

(1.5)    Σ_i w(i)g_i = e_ϑ    ⟹    Σ_i w(i)⟨g_i, f⟩ = f(ϑ) for f ∈ C.
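Formula (1.4) is also easily checked numerically: since D^α e_ϑ(0) = ϑ^α and D^α f(0)/α! is just the power coefficient of f, the pairing collapses to Σ_α ϑ^α fhat(α) = f(ϑ). A small MATLAB check (ours; the multi-index list A and coefficient vector fhat are the same hypothetical conventions as in the sketch above, and theta.^A needs a MATLAB with implicit expansion):

    theta = [0.5, -1.2];                     % a point in d = 2
    A = [0 0; 1 0; 0 1; 2 0; 1 1; 0 2];     % multi-indices up to degree 2
    fhat = [1; 2; 0; 0; 0; 3];               % f(x) = 1 + 2x(1) + 3x(2)^2
    pairing = sum(prod(theta .^ A, 2) .* fhat)   % <e_theta, f>, per (1.3)
    direct = 1 + 2*theta(1) + 3*theta(2)^2       % f(theta); the two agree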
The construction proposed in [3] provides the polynomial interpolant I_Θf in the form

(1.6)    I_Θf := Σ_{j=1}^n g_j↓ ⟨g_j, f⟩ / ⟨g_j, g_j↓⟩,

with g_1, g_2, ..., g_n a(ny) basis for exp_Θ (hence, in particular, n = #Θ) for which

(1.7)    ⟨g_i, g_j↓⟩ = 0    ⟺    i ≠ j.
Since each g_j↓ is a (homogeneous) polynomial, it is clear that I_Θf is a polynomial. But it may be less obvious why I_Θf = f on Θ. Here is a simple argument. From (1.7), it follows that I_Θf is well-defined and that

(1.8)    ⟨g_i, I_Θf⟩ = ⟨g_i, f⟩,    all i.

Since g_1, g_2, ..., g_n is a basis for exp_Θ, this implies (with (1.4) and (1.5)) that

    I_Θf(ϑ) = ⟨e_ϑ, I_Θf⟩ = ⟨e_ϑ, f⟩ = f(ϑ),    ϑ ∈ Θ.

This also implies that the space span{g_1↓, ..., g_n↓}
is a correct polynomial space for interpolation at Θ. This space is contained in Π_Θ. But since dim Π_Θ = dim exp_Θ = #Θ = n (the first equality by (1.1)), we must have Π_Θ = span{g_1↓, ..., g_n↓}.

In order to provide encouragement, we now list some nice properties of this particular map Θ ↦ Π_Θ, but postpone their verification until after the discussion of the algorithm for the construction of the interpolant.

(1) well-defined, i.e., for any finite Θ, Π_Θ is a well-defined polynomial space and ⟨Θ, Π_Θ⟩ is correct.

(2) continuity (if possible), i.e., small changes in Θ shouldn't change Π_Θ by much. There are limits to this. For example, if Θ ⊂ ℝ² consists of three points, then one would usually choose Π_Θ = Π_1 (as our scheme does). But, as one of these points approaches some point between the two other points, this choice has to change in the limit, hence it cannot change continuously. As it turns out, our scheme is continuous at every Θ for which Π_k ⊆ Π_Θ ⊆ Π_{k+1} for some k.

(3) coalescence ⟹ osculation (if possible), i.e., as points coalesce, Lagrange interpolation approaches Hermite interpolation. This will, of course, depend on just how the coalescence takes place. If, e.g., a point spirals in on another, then we cannot hope for osculation. But if, e.g., one point approaches another along a straight line, then we are entitled to obtain, in the limit, a match at that point also of the directional derivative in the direction of that line.

(4) translation-invariance, i.e., ∀(p ∈ Π_Θ, a ∈ ℝ^d) p(a + ·) ∈ Π_Θ. This implies that Π_Θ is D-invariant, i.e., closed under differentiation.

(5) scale-invariance, i.e., ∀(p ∈ Π_Θ, α ∈ ℝ) p(α·) ∈ Π_Θ. This is equivalent to the fact that Π_Θ is spanned by homogeneous polynomials. Note that (4) and (5) together are quite restrictive, in the sense that the only finite-dimensional spaces of smooth functions satisfying (4) and (5) are polynomial spaces.

(6) coordinate-system independence, i.e., an affine change of variables ϑ ↦ Aϑ + c (for some invertible matrix A) affects Π_Θ in a reasonable way. Precisely,

    ∀{invertible A ∈ ℝ^{d×d}, c ∈ ℝ^d}    Π_{AΘ+c} = Π_Θ ∘ A^T.

This implies that Π_Θ inherits any symmetries (such as invariance under some rotations and/or reflections) that Θ might have. This also means that Π_Θ is independent of the choice of origin. In conjunction with (5), it also implies that Π_Θ is independent of scaling of Θ. Hence, altogether,

    Π_{rΘ+c} = Π_Θ    ∀r ≠ 0, c ∈ ℝ^d.

Finally, each p ∈ Π_Θ is constant along any line orthogonal to the affine hull of Θ, i.e., Π_Θ ⊆ Π(affine(Θ)), with

    affine(Θ) := {Σ_{ϑ∈Θ} ϑ w(ϑ) : Σ_ϑ w(ϑ) = 1}.
(7) minimal degree, i.e., the elements of Π_Θ have as small a degree as possible. Here is the precise description: for any polynomial space P for which ⟨Θ, P⟩ is correct, and for all j,

    dim P ∩ Π_j ≤ dim Π_Θ ∩ Π_j.

This implies, e.g., that if ⟨Θ, Π_k⟩ is correct, then Π_Θ = Π_k. In other words, in the most heavily studied case, viz. of Θ for which Π_k is an acceptable choice, our assignment would also be Π_k.

(8) monotonicity, i.e., Θ ⊂ Θ′ ⟹ Π_Θ ⊂ Π_{Θ′}. This makes it possible to develop a Newton form for the interpolant. Also, in conjunction with (7) and (9), this ties our scheme closely to standard choices.

(9) Cartesian product ⟹ tensor product, i.e., Π_{Θ×Θ′} = Π_Θ ⊗ Π_{Θ′}. In this way, our assignment in the case of a rectangular grid coincides with the assignment standard for that case. In fact, in conjunction with (8), we can conclude that we obtain the standard assignment even in the case that Θ is a 'lower' set of a rectangular grid of points (see Section 7).

(10) associated differential operators. This unusual property links polynomials p which vanish on Θ to homogeneous constant-coefficient differential operators q(D) which vanish on Π_Θ. The precise statement is that such a q(D) vanishes on Π_Θ if and only if the homogeneous polynomial q is the leading term p↑ of some polynomial p which vanishes on Θ. We expect this property to play a major role in formulae for the interpolation error.

(11) constructible, i.e., a basis for Π_Θ can be constructed in finitely many arithmetic steps.

This list provides enough detail to make it possible to identify Π_Θ in certain simple situations directly, without the aid of the defining formula (1.2). For example, if #Θ = 1, then necessarily Π_Θ = Π_0 (by (7)). If #Θ = 2, then, by (6) and (7), necessarily Π_Θ = Π_1(affine(Θ)). If #Θ = 3, then Π_Θ = Π_k(affine(Θ)), with k := 3 − dim affine(Θ). The case #Θ = 4 is the first one that is not clear-cut. In this case, we have again

    Π_Θ = Π_k(affine(Θ)),    k := 4 − dim affine(Θ),

but only for k = 1, 3. When affine(Θ) is a plane, we may use (6) to normalize to the situation that Θ ⊂ ℝ² and Θ = {0, (1,0), (0,1), θ}, with θ, offhand, arbitrary. Since Π_1 is the choice for the set {0, (1,0), (0,1)}, this means that Π_Θ = Π_1 + span{q} for some homogeneous quadratic polynomial q. While (2) and (6) impose further restrictions, it seems possible to construct a suitable map θ ↦ q in many ways so that the resulting Θ ↦ Π_Θ satisfies all the above conditions, except perhaps conditions (8) and (10). (See Section 6 for our choice of q = q_θ.) At present, we do not know whether there is only one map Θ ↦ Π_Θ satisfying all of conditions (1)-(9). But addition of condition (10) uniquely determines the map.

Of course, we didn't make up the above list and then set out to find the map Θ ↦ Π_Θ. Rather, we came across the fact that the pair ⟨Θ, (exp_Θ)↓⟩ is always correct, and this started us off studying the assignment Π_Θ := (exp_Θ)↓.
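By way of illustration of (1.6) in the simplest nontrivial case (the following computation is supplied here for encouragement and is not taken from [3]): let Θ = {ϑ, −ϑ}. The basis g_1 := e_ϑ + e_{−ϑ}, g_2 := e_ϑ − e_{−ϑ} for exp_Θ has g_1↓ = 2 and g_2↓ = 2ϑ·(), and satisfies (1.7), since, by (1.4), ⟨g_2, g_1↓⟩ = g_1↓(ϑ) − g_1↓(−ϑ) = 0 and ⟨g_1, g_2↓⟩ = g_2↓(ϑ) + g_2↓(−ϑ) = 2ϑ·ϑ − 2ϑ·ϑ = 0. Since ⟨g_1, g_1↓⟩ = 4 and ⟨g_2, g_2↓⟩ = 4|ϑ|², (1.6) gives

    I_Θf = (f(ϑ) + f(−ϑ))/2 + ϑ·() (f(ϑ) − f(−ϑ))/(2|ϑ|²),

which indeed matches f at ±ϑ, and exhibits Π_Θ = span{1, ϑ·()}, in agreement with the two-point discussion above.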
2. The choice of P provided by elimination

In this section, we provide further insight into our particular assignment Π_Θ = (exp_Θ)↓ by comparing it with a more straightforward assignment which is provided by Gauss elimination applied to the Vandermonde matrix for Θ. This also should help in the understanding of the algorithm for the construction of I_Θ described in the next section.

In the absence of bases for the space Π := Π(ℝ^d) of all polynomials in d variables more suitable for calculations with multivariable polynomials, we deal here with the power form, i.e., we express polynomials as linear combinations of the powers

    ()^α : ℝ^d → ℝ : x ↦ x^α := x_1^{α(1)} ⋯ x_d^{α(d)}.

The polynomial p =: Σ_α ()^α c(α) on ℝ^d matches the function f at the pointset Θ if and only if its coefficient sequence c := (c(α))_{α∈ℤ^d_+} solves the linear system

(2.1)    V? = f|Θ,

with

(2.2)    V := (ϑ^α)_{ϑ∈Θ, α∈ℤ^d_+}
the Vandermonde matrix for Θ. Thus a search for polynomial interpolants to f at Θ is a search for solutions c : ℤ^d_+ → ℝ of (2.1) of finite support (i.e., with all but finitely many entries equal to zero).

Actual calculations would force us to order the points in Θ and the indices α ∈ ℤ^d_+. It is more convenient, though, to let the ϑ ∈ Θ and the α ∈ ℤ^d_+ index themselves for the time being. Thus V is a linear map taking functions on ℤ^d_+ to functions on Θ. Its columns correspond to α ∈ ℤ^d_+, its rows to ϑ ∈ Θ.

(2.3)Proposition. The Vandermonde matrix V (see (2.2)) is of full rank.

Proof: One way to see this is to observe that (a(ϑ))_{ϑ∈Θ} V = 0 implies that Σ_{ϑ∈Θ} a(ϑ)e_ϑ = 0 (since ϑ^α = (D^α e_ϑ)(0)), and thus to rely on the following

(2.4)Fact. Any collection of exponentials with distinct frequencies is linearly independent.

Proof: The proof is by induction, since the linear independence is obvious when #Θ = 1. If #Θ > 1 and s := Σ_{ϑ∈Θ} a(ϑ)e_ϑ = 0 with all a(ϑ) ≠ 0, then also (D_y − c)s = Σ_{ϑ∈Θ} ((y·ϑ) − c)a(ϑ)e_ϑ = 0 for any particular y. Since the ϑ are distinct, we can choose y and c so that y·θ = c for a particular θ ∈ Θ while y·ϑ ≠ c for at least one ϑ ∈ Θ. Thus Σ_{ϑ≠θ} ((y·ϑ) − c)a(ϑ)e_ϑ = 0 is a sum of the same nature but with one fewer summand, hence with all its coefficients zero by induction hypothesis, hence at least one of the a(ϑ) must be zero, contrary to our assumption. ♠
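To make the indexing concrete, here is a small MATLAB sketch (ours; the function names are hypothetical) which builds V with its columns ordered by degree, the ordering assumed in what follows:

    function [V, A] = vandermonde(Theta, kmax)
    % V(i,j) = Theta(i,:)^A(j,:) over all multi-indices with |alpha| <= kmax;
    % Theta is n-by-d, one point per row; A lists the multi-indices, by degree.
      d = size(Theta, 2);
      A = zeros(1, d);                      % the degree-0 multi-index
      for k = 1:kmax
        A = [A; multiindices(k, d)];        % append the |alpha| = k block
      end
      n = size(Theta, 1); m = size(A, 1);
      V = ones(n, m);
      for j = 1:m
        for i = 1:n
          V(i, j) = prod(Theta(i, :) .^ A(j, :));
        end
      end
    end

    function M = multiindices(k, d)
    % all alpha in Z^d_+ with |alpha| = k, one per row (recursion on d)
      if d == 1, M = k; return, end
      M = zeros(0, d);
      for first = k:-1:0
        tail = multiindices(k - first, d - 1);
        M = [M; repmat(first, size(tail, 1), 1), tail];
      end
    end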
Elimination is the standard tool provided by Linear Algebra for the determination of the solution set of any linear (algebraic) system. Elimination classifies the unknowns into bound and free. Assuming the coefficient matrix to be of full rank (which our matrix V is, by (2.3)Proposition), this means that each row is designated a pivot row for some unknown, which thereby is "bound", i.e., computable once all "later" unknowns are determined. Any unknown not bound is "free", i.e., freely choosable.

Standard elimination proceeds in order, from left to right and from top to bottom, if possible. In Gauss elimination with partial pivoting, one insists on proceeding from left to right, but is willing to rearrange the rows, if necessary. Thus, Gauss elimination with partial pivoting applied to (2.1) (written according to some ordering of the ϑ ∈ Θ and the α ∈ ℤ^d_+) produces a factorization

    LW = V,

with L unit lower triangular and W in row echelon form. This means that there is a sequence β_1, β_2, ..., β_n which is strictly increasing, in the same total ordering of ℤ^d_+ that was used to order the columns of V, and so that, for some ordering {ϑ_1, ϑ_2, ..., ϑ_n} of Θ and for all j, the entry W(ϑ_j, β_j) is the first nonzero entry in the row W(ϑ_j, :) of W.

(2.5)Proposition. Let LW = V be the factorization of V provided by Gauss elimination with partial pivoting. Specifically, let β_1, β_2, ..., β_n be the sequence, strictly increasing in the same total ordering of ℤ^d_+ that was used to order the columns of V, for which, for some ordering {ϑ_1, ϑ_2, ..., ϑ_n} of Θ and for all j, the entry W(ϑ_j, β_j) is the first nonzero entry in the row W(ϑ_j, :) of W. Then

    P := span(()^{β_j})_{j=1}^n

is correct for interpolation at Θ. Moreover, if the columns of V are ordered by degree, then P is a polynomial space of smallest possible degree which is correct for Θ.

Proof: By assumption, the square matrix

    U := (W(ϑ_i, β_j))_{i,j=1}^n

is upper triangular and invertible, and so provides the particular interpolant

(2.6)    Σ_i ()^{β_i} a(i),    a := (LU)^{-1}(f(ϑ_1), ..., f(ϑ_n)),

whose coefficient vector a is obtainable from the original data f|Θ by permutation followed by forward- and back-substitution. Now recall that Gauss elimination determines the next pivot column as the closest possible column to the right of the present pivot column. This means that each β_j is chosen as the smallest possible index greater than β_{j−1}, in whatever order we chose to write down the columns of V.
Consequently, the polynomial space

    P := span(()^{β_i})_{i=1}^n

selected by this process is spanned by monomials of smallest possible exponent (in the ordering of ℤ^d_+ used). In particular, assume that we ordered the α by degree, i.e., so that α precedes β whenever |α| < |β|.

If elimination is now carried out degree by degree, with all columns of a given degree k treated as one segment, W(ϑ, k) := (W(ϑ, α))_{|α|=k}, the result is a factorization

    LW = V,

with L again unit lower triangular, but W in row echelon form in the following sense: there is a nondecreasing sequence k_1, k_2, ..., k_n and some ordering {ϑ_1, ϑ_2, ..., ϑ_n} of Θ so that, for all j, the (vector-)entry W(ϑ_j, k_j) is the first nonzero entry in the row W(ϑ_j, :) of W. In other words, the matrix

    (W(ϑ_i, k_j))_{i,j=1}^n

is block upper triangular, with nonzero diagonal entries. Note that this matrix need not be upper triangular, since the sequence k_1, k_2, ..., k_n need not be strictly increasing. But there has to be orthogonality of W(ϑ_i, k_j) to W(ϑ_j, k_j) when k_i = k_j and i ≠ j. Explicitly, the square matrix

(3.2)    U := (⟨W(ϑ_i, k_j), W(ϑ_j, k_j)⟩_{k_j})_{i,j=1}^n

is upper triangular and invertible. Consequently, with UG := W, the matrix

    (⟨G(ϑ_i, k_j), G(ϑ_j, k_j)⟩_{k_j})_{i,j=1}^n

is diagonal and invertible. For, factoring out the upper triangular matrix U is equivalent to 'backward elimination', i.e., to the calculations

    for j = n, n−1, ..., 1, do:
        W(ϑ_j, :) ← W(ϑ_j, :)/U(j, j)
        for i = 1, ..., j−1, do:
            W(ϑ_i, :) ← W(ϑ_i, :) − U(i, j)W(ϑ_j, :)
        end
    end

in which the jth step enforces orthogonality of the pivot element in row j to the elements above it in the pivot column, without changing the orthogonalities already achieved in subsequent columns, and without changing anything in the preceding columns. Thus, in terms of the weighted scalar product

(3.3)    ⟨a, b⟩ := Σ_k ⟨a, b⟩_k = Σ_α a(α)b(α)/α!,

with ⟨a, b⟩_k denoting the part of the sum with |α| = k,
for sequences a, b : ℤ^d_+ → ℝ, we have

(3.4)    ⟨G(ϑ_i, :), G_{k_j}(ϑ_j, :)⟩ = δ_{i,j}/U(j, j),    i, j = 1, ..., n,

with G_k given by

    G_k(:, α) := G(:, α) if |α| = k, and G_k(:, α) := 0 otherwise.
With this, let

(3.5)    g_i := Σ_α ()^α/α! G(ϑ_i, α).

Then

(3.6)    Σ_j (LU)(i, j)g_j = e_{ϑ_i},    all i,
since LUG = V = (D^α e_{ϑ_i}(0))_{i,α}. Further,

(3.7)    g_i↓ = Σ_{|α|=k_i} ()^α/α! G(ϑ_i, α) = Σ_α ()^α/α! G_{k_i}(ϑ_i, α),

and we conclude from (3.4) that

(3.8)    ⟨g_i, g_j↓⟩ = δ_{i,j}/U(j, j),    i, j = 1, ..., n.
Since g_1, g_2, ..., g_n is linearly independent by (3.8), (3.6) implies that g_1, g_2, ..., g_n is a basis for exp_Θ. But (3.8) also implies that g_1↓, ..., g_n↓ so constructed is linearly independent, hence a basis for Π_Θ by (1.1). This proves

(3.9)Theorem. The functions g_i defined by (3.5) provide a basis for exp_Θ which satisfies (1.7), and the corresponding g_i↓ form a basis for Π_Θ.

(3.10)Corollary. Let

(3.11)    a := diag(U)(LU)^{-1}(f(ϑ_1), ..., f(ϑ_n)),

with L, U, and ϑ_1, ϑ_2, ..., ϑ_n determined during Gauss elimination with partial pivoting applied to V as described above. Then, with g_i↓ as given by (3.7),

    I_Θf = Σ_j g_j↓ a(j)

is the unique interpolant from Π_Θ to f on Θ.

Proof: The function q := Σ_j g_j↓ a(j) is in Π_Θ by (3.9)Theorem. Further, from (3.8), ⟨g_j, q⟩ = a(j)/U(j, j). Therefore, from (3.6) and (3.11),

    q(ϑ_i) = ⟨e_{ϑ_i}, q⟩ = Σ_j LU(i, j)⟨g_j, q⟩ = Σ_j LU(i, j) Σ_r (LU)^{-1}(j, r)f(ϑ_r) = f(ϑ_i).

In effect, the multiplication in (3.11) by the diagonal matrix diag(U) accounts for the division by ⟨g_j, g_j↓⟩ in (1.6), as the latter number is 1/U(j, j), by (3.8). ♠

It is worth noting that the factoring out of U from W will not change the pivot entries W(ϑ_j, k_j), since U(i, j) = 0 if k_i = k_j and i ≠ j, except for the normalizing division. In other words, G_{k_i} = W_{k_i}/U(i, i) (with W_k defined entirely analogously to G_k), showing that the factoring out of U from W need not be carried out, unless one is interested in the g_i rather than the g_i↓. On the other hand, formation of U is essential for the calculation of the coefficients of the interpolating polynomial.
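The computation described in this section can be summarized in the following MATLAB sketch. It is ours, not the Section 5 program (which obtains the modified power coefficients via L and U): to keep it short, it recovers the interpolant by solving the collocation system for the computed basis of Π_Θ rather than via (3.11), and it reuses the vandermonde/multiindices helpers from the sketch in Section 2. The absolute tolerance is crude, and kmax = #Θ − 1 always suffices.

    function [coefs, W, A, K, Theta] = least_interp(Theta, fvals, kmax)
    % elimination by degree blocks; on return, I_Theta f = sum_i coefs(i) p_i,
    % with p_i := sum over |alpha| = K(i) of ()^alpha W(i,alpha)/alpha!, a
    % scalar multiple of g_i-least, by (3.7) and the remark above.
      n = size(Theta, 1);
      [W, A] = vandermonde(Theta, kmax);    % the working array, V itself
      deg = sum(A, 2);                      % |alpha|, column by column
      wt = 1 ./ prod(factorial(A), 2)';     % the weights 1/alpha! of (3.3)
      K = zeros(n, 1); tol = 1e-12; i = 1;
      for k = 0:kmax
        cols = find(deg == k);
        while i <= n
          % partial pivoting: among rows i..n, largest <.,.>_k norm of k-block
          nrm = (W(i:n, cols).^2) * wt(cols)';
          [mx, r] = max(nrm); r = r + i - 1;
          if mx <= tol, break, end          % block exhausted: next degree
          W([i r], :) = W([r i], :);        % row interchange, carried along
          Theta([i r], :) = Theta([r i], :); fvals([i r]) = fvals([r i]);
          pk = W(i, cols); pn = (pk.^2) * wt(cols)';
          for r = i+1:n                     % k-blocks below become orthogonal
            c = (W(r, cols) .* pk) * wt(cols)' / pn;
            W(r, :) = W(r, :) - c * W(i, :);
          end
          K(i) = k; i = i + 1;
        end
        if i > n, break, end
      end
      B = zeros(n);                         % collocation: B(j,i) = p_i(theta_j)
      for i = 1:n
        cols = find(deg == K(i));
        for j = 1:n
          mono = prod(Theta(j, :) .^ A(cols, :), 2);
          B(j, i) = (W(i, cols) .* wt(cols)) * mono;
        end
      end
      coefs = B \ fvals(:);
    end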
In the language introduced in this section, the algorithm for the calculation of suitable g_1, g_2, ..., g_n from f_j := e_{ϑ_j}, j = 1, ..., n, proposed in [3], amounts to Gauss elimination with column pivoting applied to V, except that no columns are actually interchanged. Rather, at the jth step, one looks for the left-most nonzero entry in the jth row of the working array W, say the entry W(ϑ_j, k_j), then uses the jth row to make all entries W(ϑ_i, k_j) for i ≠ j orthogonal to W(ϑ_j, k_j). This will not spoil the orthogonality of W(ϑ_i, k_i) to W(ϑ_r, k_i) for r ≠ i and i < j achieved earlier, since either k_i < k_j, hence W(ϑ_j, k_i) = 0, or k_i > k_j, hence W(ϑ_j, k_j) is trivially orthogonal to W(ϑ_i, k_j) = 0, or k_i = k_j, hence W(ϑ_j, k_j) is already orthogonal to W(ϑ_i, k_j). Thus one obtains a factorization

    AW = V,

with A invertible, and W in reduced row echelon form in the sense that, for some sequence k_1, k_2, ..., k_n,

    ⟨W(ϑ_i, k_j), W(ϑ_j, k_j)⟩_{k_j} = 0    ⟺    i ≠ j.

This implies that the functions g_1, g_2, ..., g_n defined by

    g_j := Σ_α ()^α/α! W(ϑ_j, α)

satisfy (1.7), hence the corresponding g_j↓ must be a basis for Π_Θ (by the reasoning used earlier). It is not obvious, without recourse to the results from [3], that the two sequences g_1↓, ..., g_n↓ produced by the two algorithms span the same space.

The algorithm outlined in this section seems preferable to the one from [3], not only because it is closer to a standard algorithm but also because it provides a ready means of ordering the points of Θ for greater stability of the calculations; for comparison, a compact sketch of the variant from [3] follows below. Some of the finer computational details are taken up in Section 5, after a short section on a particularly suitable polynomial form.
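Here is the sketch of the variant from [3] promised above (again ours), with the same conventions and variables as in the previous sketch, started afresh from W = V: no row interchanges, the pivot block of row j being its left-most (numerically) nonzero degree block, and all other rows, above and below, made orthogonal to it.

    for j = 1:n
      k = 0;                                % locate row j's left-most block
      while (W(j, deg == k).^2) * wt(deg == k)' <= tol
        k = k + 1;
      end
      K(j) = k; cols = find(deg == k);
      pk = W(j, cols); pn = (pk.^2) * wt(cols)';
      for r = [1:j-1, j+1:n]                % all other rows, not just later ones
        c = (W(r, cols) .* pk) * wt(cols)' / pn;
        W(r, :) = W(r, :) - c * W(j, :);
      end
    end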
4. Nested multiplication for the modified power form

We know only two polynomial forms readily available for the presentation of polynomials in several variables, the power form and the Bernstein-Bézier form. The calculations above are in terms of the power form

    p = Σ_α ()^α D^α p(0)/α!,

hence we stick with that form here, particularly since we are not concerned here with the Bernstein-Bézier form's major strength, the smooth patching of polynomial pieces (see, e.g., [1]). It is only prudent to use the shifted power form, i.e., to write

    p = Σ_α (· − c)^α D^α p(c)/α!,
for some appropriate center c, e.g.,

    c = c_Θ := Σ_{ϑ∈Θ} ϑ/#Θ.

Equivalently, we assume that Θ has been shifted at the outset by its center c_Θ. It turns out to be simpler to use the following modified power form

(4.1)    p = Σ_α ()^α (|α| choose α) D^α p(0)/|α|!,

with (|α| choose α) := |α|!/α! the multinomial coefficients. There are two reasons.

(i) It is easy to program and use the following multivariable version of nested multiplication (or Horner's scheme):
(4.2)Proposition. If

(4.3)    c(α) := D^α p(0)/|α|!,    |α| = deg p;
         c(α) := D^α p(0)/|α|! + Σ_{i=1}^d x_i c(α + i_i),    |α| = deg p − 1, deg p − 2, ..., 0,

with i_i the ith unit vector, p ∈ Π(ℝ^d), and x ∈ ℝ^d, then c(0) = p(x).

Proof: Indeed, it follows that

    c(0) = Σ_{|α|≤deg p} n_α x^α D^α p(0)/|α|!,

with n_α the number of different increasing paths to α from the origin through points of ℤ^d_+. This number is n_α = (|α| choose α), hence c(0) = p(x), by (4.1). ♠

In effect, it is possible to evaluate a multivariable polynomial p from its normalized Taylor coefficients (D^α p)(0)/|α|! without the (explicit) computation of multinomial coefficients.

(ii) The information about g_j computed by the algorithm outlined in the preceding section readily provides the numbers D^α g_j(0) (see (3.5)), hence the calculation of the modified power form for I_Θf, i.e., of the normalized Taylor coefficients (D^α I_Θf)(0)/|α|!, from the matrix G and the vector (LU)^{-1} f|Θ can be accomplished without generation and use of the multinomial coefficients.

The close similarity to de Casteljau's algorithm (see, e.g., [1]) for the evaluation of the Bernstein-Bézier form is actually not surprising, for the following reason. The Bernstein-Bézier form

    x ↦ Σ_{|β|=k} (ξ(x))^β (|β| choose β) c(β)
describes a polynomial of degree ≤ k in terms of the d + 1 linear polynomials ξ_t defined by the identity

    Σ_{t∈T} ξ_t p(t) = p    for all p ∈ Π_1,

with T ⊂ ℝ^d in general position. The de Casteljau algorithm for its evaluation at some x consists of the calculations

    c(β) := Σ_{t∈T} ξ_t(x) c(β + i_t),    |β| = j,

for j = k − 1, k − 2, ..., 0, with the resulting c(0) the desired value at x. While the vector ξ(x) provides the barycentric coordinates of x with respect to the pointset T, no use is made in the de Casteljau algorithm of the fact that Σ_{t∈T} ξ_t(x) = 1. Thus the calculations

    c(α) := Σ_{i=1}^d x_i c(α + i_i),    |α| = j,

for j = k − 1, k − 2, ..., 0, started from given c(α) with |α| = k, will provide the number

    Σ_{|α|=k} x^α (|α| choose α) c(α).

The full algorithm above merely combines appropriately the steps common to de Casteljau applied to the terms in (4.1) of different degrees.
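A MATLAB sketch of the nested multiplication (4.3) (ours; A is the multi-index list of the earlier sketches, and C(j) holds the normalized Taylor coefficient D^α p(0)/|α|! for α = A(j,:)):

    function c0 = nestmul(C, A, x)
    % evaluates p at x from its modified power coefficients, per (4.3)
      deg = sum(A, 2); kmax = max(deg); d = size(A, 2);
      c = C(:);                             % working copy of the coefficients
      for k = kmax-1:-1:0                   % |alpha| = deg p - 1, ..., 0
        for r = find(deg == k)'
          for i = 1:d
            ai = A(r, :); ai(i) = ai(i) + 1;     % the multi-index alpha + i_i
            s = find(all(A == ai, 2));           % its row in A (linear search)
            c(r) = c(r) + x(i) * c(s);           % the recurrence (4.3)
          end
        end
      end
      c0 = c(deg == 0);                     % c(0) = p(x)
    end

For instance, with A = [0 0; 1 0; 0 1; 2 0; 1 1; 0 2] and C = [1; 2; 0; 0; 0; 3], i.e., p(x) = 1 + 2x(1) + 3x(2)^2, the call nestmul(C, A, [0.5 -1.2]) returns 6.32 = 1 + 1 + 3(1.44).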
5. Algorithmic details

We give here a (somewhat informal) MATLAB-like program (see, e.g., [6] for language details) for the construction of our interpolant, in order to document the simplicity of the actual calculations needed. In this 'program', we use the following conventions: V and W denote the matrices V and W, respectively. In particular, W(i,k) is a vector with (k+d−1 choose d−1) entries, indexed by {α ∈ ℤ^d_+ : |α| = k}. This is decidedly not allowable in present-day MATLAB, but convenient here, as it avoids discussion of the (important technical) question of the best way to order the index set {α ∈ ℤ^d_+ : |α| = k}. Correspondingly, for two vectors a and b (such as W(i,k), W(j,k)) indexed by {α ∈ ℤ^d_+ : |α| = k}, <a,b> denotes the (scaled) scalar product

(5.1)    <a,b> := Σ_{|α|=k} (|α| choose α) a(α)b(α),

related to (3.1) (with k = k). All matrices mentioned in the 'program' other than V and W are proper MATLAB matrices, i.e., have scalar entries. Further, we use a