Chapter 1 Fast Fourier transforms for nonequispaced data: A tutorial
Daniel Potts Gabriele Steidl Manfred Tasche
ABSTRACT In this section, we consider approximate methods for the fast computation of multivariate discrete Fourier transforms for nonequispaced data (NDFT) in the time domain and in the frequency domain. In particular, we are interested in the approximation error as a function of the arithmetic complexity of the algorithm. We discuss the robustness of NDFT algorithms with respect to roundoff errors and apply NDFT algorithms for the fast computation of Bessel transforms.
1.1 Introduction

Let $\Pi^d := [-\frac12, \frac12)^d$ and $I_N := \{k \in \mathbb{Z}^d : -\frac{N}{2} \le k < \frac{N}{2}\}$, where the inequalities hold componentwise. For $x_k \in \Pi^d$, $v_j \in N\Pi^d$, and $f_k \in \mathbb{C}$, we are interested in the fast and robust computation of the discrete Fourier transform for nonequispaced data (NDFT)
$$f(v_j) = \sum_{k \in I_N} f_k \, e^{-2\pi i x_k v_j} \qquad (j \in I_M). \tag{1.1}$$
For arbitrary nodes, the direct evaluation of (1.1) takes $O(N^d M^d)$ arithmetical operations, too much for practical purposes. For equispaced nodes $x_k := \frac{k}{N}$ $(k \in I_N)$ and $v_j := j$ $(j \in I_N)$, the values $f(v_j)$ can be computed by the well-known fast Fourier transform (FFT) with only $O(N^d \log N)$ arithmetical operations. However, the FFT requires sampling on an equally spaced grid, which represents a significant limitation for many applications. For example, most geographical data are sampled at individual observation points or by fast moving measuring devices along lines. Using the ACT method (adaptive weight, conjugate gradient acceleration, Toeplitz matrices) [12] for the approximation of functions from scattered data [22], one has to solve a system of linear equations with a block Toeplitz matrix as system matrix. The entries of this Toeplitz matrix are of the form (1.1) and should be computed by efficient NDFT algorithms. Further applications of the NDFT range from frequency analysis of astronomical data [19] to modelling and imaging [4, 23]. Often either the nodes in time or in frequency domain lie on an equispaced grid, i.e., we have to compute
$$f(v_j) = \sum_{k \in I_N} f_k \, e^{-2\pi i k v_j / N} \qquad (j \in I_M) \tag{1.2}$$
or
$$h(k) := \sum_{j \in I_N} f_j \, e^{-2\pi i k v_j / N} \qquad (k \in I_M). \tag{1.3}$$
Several methods were introduced for the fast evaluation of (1.2) and (1.3). It may be useful to compare the different approaches by using the following matrix-vector notation: For the univariate setting $d = 1$ and $M = N$, the equations (1.2) can be written as
$$\hat{f} = A_N f \tag{1.4}$$
with
$$\hat{f} := (f(v_j))_{j=-N/2}^{N/2-1}, \qquad f := (f_k)_{k=-N/2}^{N/2-1}, \qquad A_N := \left(e^{-2\pi i k v_j / N}\right)_{j,k=-N/2}^{N/2-1}.$$
Note that (1.3) is simply the "transposed" version of (1.2), i.e.
$$\hat{h} = A_N^{\mathrm{T}} f \tag{1.5}$$
with $\hat{h} := (h(k))_{k=-N/2}^{N/2-1}$. The different NDFT algorithms can be divided into three groups:

I. Based on local series expansions of the complex exponentials like Taylor series [2] or series of prolate spheroidal functions [18], algorithms were developed which are more efficient than the direct evaluation of the NDFT. In matrix-vector notation, the matrix $A_N$ was approximated by the Hadamard (componentwise) product of the classical Fourier matrix
$$F_N := \left(e^{-2\pi i k j / N}\right)_{j,k=-N/2}^{N/2-1}$$
and a low rank matrix $E$, i.e.
$$A_N \approx F_N \circ E.$$
The rank of $E$ determines the approximation error. Multiplication of the right-hand side with a vector means to perform $\operatorname{rank}(E)$ times an FFT of length $N$.
II. A quite different idea was introduced in [9] for the univariate case. Note that the idea can be generalized for "line settings". Using Lagrange interpolation, the matrix $A_N$ can be exactly split as
$$A_N = D_1 C D_2 F_N, \tag{1.6}$$
where $D_i$ $(i = 1, 2)$ are diagonal matrices and where
$$C := \left(1/\varphi\!\left(\frac{v_j}{N} - \frac{k}{N}\right)\right)_{j,k=-N/2}^{N/2-1}$$
with a monotonically decreasing function $\varphi$. A. Dutt and V. Rokhlin [9] originally used $\varphi(x) := \tan(\pi x) - i$. For real input data, a more efficient modified version with $\varphi(x) := 1/x$ was suggested in [6]. See Figure 1.1. The multiplication with $C$ can be approximately realized by the fast multipole method [13]. In contrast to the other NDFT methods, the splitting (1.6) works also for the inverse matrix, i.e. $A_N^{-1} = F_N D_3 C^{\mathrm{T}} D_4$, such that for suitably distributed nodes $v_j$, the same algorithm can be used for the inverse transform. For clustered nodes, the inverse approach does not work, since the entries of the diagonal matrices differ extremely. See [6].
III. The most efficient NDFT algorithms for the computation of (1.2) and (1.3) first came to our attention in [8, 4]. See also [5, 20]. In [25], we proposed a unified approach to the efficient computation of (1.2) and (1.3), which includes [8, 4]. In matrix-vector notation, our approach to (1.2) can be written as
$$A_N \approx B F_n D \qquad (n := \sigma N,\ \sigma > 1), \tag{1.7}$$
with a modified diagonal matrix $D$ and with a sparse matrix $B$ with nonzero entries $\psi((v_j/N) - (l/n))$, where $\psi$ approximates $\varphi$. The approaches in [8, 4] differ only by the choice of the window function $\varphi$. Now we have learned that this algorithm with several window functions $\varphi$ is a remake of a gridding algorithm, which was known in the image processing context years ago [23, 15, 19]. The gridding method is simply the transposed version of (1.7) for the efficient computation of (1.3). However, the dependence of speed and accuracy of the algorithm on the choice of the oversampling factor $\sigma$ and the window width of $\varphi$ was first investigated in [8, 4]. In [25], the error estimates from [8] were improved, which leads to criteria for the choice of the parameters of the algorithm. In this paper, we repeat the unified approach (1.7) for the fast computation of (1.2) from [25] and show the relation to the gridding algorithm for the fast computation of (1.3). We will see that our approach is also closely connected with interpolation methods by "translates of one function" known from approximation theory. We include estimates of the approximation error.
FIGURE 1.1. Time (in seconds) for the straightforward computation of the NDFT, the NDFT algorithm in [8], and the FMM-based NDFT algorithms in [9] and [6], without consideration of initialization (precomputation) steps.

Section 1.3 contains an algorithm for the efficient computation of the general NDFT (1.1). In Section 1.4, we demonstrate another advantage of Algorithm 1.1, namely its robustness with respect to roundoff errors, a feature which is well-known from the classical FFT [21, 26]. Finally, we apply the NDFT in a fast Bessel transform algorithm.
1.2 NDFT for nonequispaced data either in time or frequency domain

Let us begin with the computation of (1.2). Only for notational reasons, we restrict our attention to the case $M = N$. We have to evaluate the 1-periodic trigonometric polynomial
$$f(v) := \sum_{k \in I_N} f_k \, e^{-2\pi i k v} \tag{1.8}$$
at the nodes $w_j := v_j/N \in \Pi^d$ $(j \in I_N)$. We introduce the oversampling factor $\sigma > 1$ and set $n := \sigma N$. Let $\varphi$ be a 1-periodic function with uniformly convergent Fourier series. We approximate $f$ by
$$s_1(v) := \sum_{l \in I_n} g_l \, \varphi\!\left(v - \frac{l}{n}\right). \tag{1.9}$$
Switching to the frequency domain, we obtain
$$s_1(v) = \sum_{k \in \mathbb{Z}^d} \hat{g}_k \, c_k(\varphi)\, e^{-2\pi i k v} = \sum_{k \in I_n} \hat{g}_k \, c_k(\varphi)\, e^{-2\pi i k v} + \sum_{r \in \mathbb{Z}^d \setminus \{0\}} \sum_{k \in I_n} \hat{g}_k \, c_{k+nr}(\varphi)\, e^{-2\pi i (k+nr) v} \tag{1.10}$$
with
$$\hat{g}_k := \sum_{l \in I_n} g_l \, e^{2\pi i k l / n}, \tag{1.11}$$
$$c_k(\varphi) := \int_{\Pi^d} \varphi(v)\, e^{2\pi i k v}\, \mathrm{d}v \qquad (k \in \mathbb{Z}^d).$$
If the Fourier coefficients $c_k(\varphi)$ become sufficiently small for $k \in \mathbb{Z}^d \setminus I_n$ and if $c_k(\varphi) \ne 0$ for $k \in I_N$, then we suggest, by comparing (1.8) with (1.10), to set
$$\hat{g}_k := \begin{cases} f_k / c_k(\varphi) & k \in I_N, \\ 0 & k \in I_n \setminus I_N. \end{cases} \tag{1.12}$$
Now the values $g_l$ can be obtained from (1.11) by a reduced inverse $d$-variate FFT of size $n$. If $\varphi$ is also well-localized in the time domain such that it can be approximated by a 1-periodic function $\psi$ with $\operatorname{supp} \psi \cap \Pi^d \subseteq \frac{2m}{n}\Pi^d$ $(2m \ll n)$, then
$$f(w_j) \approx s_1(w_j) \approx s(w_j) := \sum_{l \in I_{n,m}(w_j)} g_l \, \psi\!\left(w_j - \frac{l}{n}\right) \tag{1.13}$$
with $I_{n,m}(w_j) := \{l \in I_n : n w_j - m \le l \le n w_j + m\}$. For fixed $w_j \in \Pi^d$, the above sum contains at most $(2m+2)^d$ nonzero summands. In summary, we obtain the following algorithm for the fast computation of (1.2) with $O((\sigma N)^d \log(\sigma N))$ arithmetical operations:
Algorithm 1.1. (Fast computation of NDFT (1.2))

Input: $N \in \mathbb{N}$, $\sigma > 1$, $n := \sigma N$, $w_j \in \Pi^d$, $f_k \in \mathbb{C}$ $(j, k \in I_N)$.

0: Precompute $c_k(\varphi)$ $(k \in I_N)$ and $\psi(w_j - \frac{l}{n})$ $(j \in I_N,\ l \in I_{n,m}(w_j))$.

1: Form $\hat{g}_k := f_k / c_k(\varphi)$ $(k \in I_N)$.

2: Compute by $d$-variate FFT
$$g_l := n^{-d} \sum_{k \in I_N} \hat{g}_k \, e^{-2\pi i k l / n} \qquad (l \in I_n).$$

3: Set
$$s(w_j) := \sum_{l \in I_{n,m}(w_j)} g_l \, \psi\!\left(w_j - \frac{l}{n}\right) \qquad (j \in I_N).$$

Output: $s(w_j)$, approximate value of $f(w_j)$ $(j \in I_N)$.
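To make the three steps concrete, here is a minimal univariate ($d = 1$) sketch of Algorithm 1.1 with the truncated Gaussian window of Theorem 1.1, checked against the direct evaluation of (1.8). All parameter values and variable names are our own illustrative choices, and numpy's FFT stands in for the $d$-variate FFT of step 2.

```python
import numpy as np

# Minimal d = 1 sketch of Algorithm 1.1 with the (truncated) Gaussian window.
# Parameters: degree N, oversampling sigma, cut-off m, shape b as in (1.17).
N, sigma, m = 64, 2.0, 10
n = int(sigma * N)
b = 2.0 * sigma * m / ((2.0 * sigma - 1.0) * np.pi)

rng = np.random.default_rng(0)
f_k = rng.standard_normal(N) + 1j * rng.standard_normal(N)   # f_k, k in I_N
w = rng.uniform(-0.5, 0.5, 3 * N)                            # nodes w_j

k = np.arange(-N // 2, N // 2)

# Step 1: g^_k := f_k / c_k(phi), with c_k(phi) = (1/n) exp(-(pi k/n)^2 b).
c_k = np.exp(-((np.pi * k / n) ** 2) * b) / n
g_hat = f_k / c_k

# Step 2: g_l := n^{-1} sum_{k in I_N} g^_k e^{-2 pi i k l/n} via a length-n FFT.
buf = np.zeros(n, dtype=complex)
buf[k % n] = g_hat
g = np.fft.fft(buf) / n

# Step 3: s(w_j) := sum_{l in I_{n,m}(w_j)} g_l psi(w_j - l/n) with the
# truncated Gaussian psi(x) = (pi b)^{-1/2} exp(-(n x)^2 / b), |n x| <= m.
s = np.empty(len(w), dtype=complex)
for j, wj in enumerate(w):
    l = np.arange(int(np.ceil(n * wj - m)), int(np.floor(n * wj + m)) + 1)
    x = wj - l / n
    s[j] = np.sum(g[l % n] * np.exp(-((n * x) ** 2) / b)) / np.sqrt(np.pi * b)

# Compare with the direct evaluation of the trigonometric polynomial (1.8).
f_direct = np.exp(-2j * np.pi * np.outer(w, k)) @ f_k
rel_err = np.linalg.norm(s - f_direct) / np.linalg.norm(f_direct)
print(rel_err)
```

With $m = 10$ and $\sigma = 2$, the measured relative error lands far below single precision, in line with the exponential error bounds discussed below.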
In the univariate setting $d = 1$, Algorithm 1.1 reads in matrix-vector notation as
$$A_N f \approx B F_n D f,$$
where $B$ denotes the sparse matrix
$$B := \left(\psi\!\left(w_j - \frac{l}{n}\right)\right)_{j=-N/2,\ l=-n/2}^{N/2-1,\ n/2-1} \tag{1.14}$$
and where
$$D := \left(O \,\Big|\, \operatorname{diag}\left(1/(n\, c_k(\varphi))\right)_{k=-N/2}^{N/2-1} \,\Big|\, O\right)^{\mathrm{T}} \tag{1.15}$$
with $(N, (n-N)/2)$-zero matrices $O$.
If only few nodes $v_j$ differ from the equispaced nodes $j$, then the approximating function $s_1$ of $f$ can alternatively be constructed by requiring exact interpolation at the nodes $j/n$, i.e. $f(j/n) = s_1(j/n)$ $(j \in I_n)$. As a consequence, we only have to replace $c_k(\varphi)$ in step 1 of Algorithm 1.1 by the discrete Fourier sum of $(\varphi(l/n))_{l \in I_n}$. The approximation of a function $f$ by an interpolating function $s_1$ which is a linear combination of translates of one given function $\varphi$ on a grid was considered in various papers in approximation theory. See for example [16] and also [4].

Let us turn to the gridding algorithm for the fast computation of (1.3). It seems to be useful to introduce the gridding idea by avoiding the convenient notation with delta distributions. In short: For
$$g(x) := \sum_{j \in I_N} f_j \, \varphi(x + w_j),$$
we obtain that
$$c_k(g) = \sum_{j \in I_N} f_j \, e^{-2\pi i k w_j} \, c_k(\varphi) = h(k)\, c_k(\varphi) \qquad (k \in \mathbb{Z}^d)$$
such that $h(k)$ in (1.3) can be computed, if $c_k(g)$ is known. Now we determine
$$c_k(g) = \int_{\Pi^d} \sum_{j \in I_N} f_j \, \varphi(x + w_j)\, e^{2\pi i k x}\, \mathrm{d}x$$
approximately by the trapezoidal rule
$$c_k(g) \approx \frac{1}{n^d} \sum_{l \in I_n} \sum_{j \in I_N} f_j \, \varphi\!\left(w_j - \frac{l}{n}\right) e^{-2\pi i k l / n},$$
which introduces an aliasing error. Furthermore, we replace $\varphi$ by its truncated version $\psi$, which introduces a truncation error. Obviously, this results in a "transposed" Algorithm 1.1, which reads for $d = 1$ in matrix-vector notation
$$A_N^{\mathrm{T}} \approx D^{\mathrm{T}} F_n B^{\mathrm{T}}.$$
For $l \in I_n$ we introduce the index set $J_{n,m}(l) := \{j \in I_N : l - m \le n w_j \le l + m\}$.
Algorithm 1.2. (Fast computation of NDFT (1.3))

Input: $N \in \mathbb{N}$, $\sigma > 1$, $n := \sigma N$, $w_j \in \Pi^d$, $f_k \in \mathbb{C}$ $(j, k \in I_N)$.

0: Precompute $c_k(\varphi)$ $(k \in I_N)$ and $\psi(w_j - \frac{l}{n})$ $(l \in I_n,\ j \in J_{n,m}(l))$.

1: Set
$$\tilde{g}_l := \sum_{j \in J_{n,m}(l)} f_j \, \psi\!\left(w_j - \frac{l}{n}\right) \qquad (l \in I_n).$$

2: Compute by $d$-variate FFT
$$\tilde{c}_k(g) := n^{-d} \sum_{l \in I_n} \tilde{g}_l \, e^{-2\pi i k l / n} \qquad (k \in I_N).$$

3: Form $\tilde{h}(k) := \tilde{c}_k(g)/c_k(\varphi)$ $(k \in I_N)$.

Output: $\tilde{h}(k)$, approximate value of $h(k)$ $(k \in I_N)$.
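The transposition is equally short in code. Below is a sketch of Algorithm 1.2 (again $d = 1$, truncated Gaussian window, with our own illustrative parameter choices), checked against the direct evaluation of (1.3):

```python
import numpy as np

# d = 1 sketch of the gridding Algorithm 1.2 with the truncated Gaussian
# window; parameters N, sigma, m and shape b as in (1.17).
N, sigma, m = 64, 2.0, 10
n = int(sigma * N)
b = 2.0 * sigma * m / ((2.0 * sigma - 1.0) * np.pi)

rng = np.random.default_rng(1)
f_j = rng.standard_normal(N) + 1j * rng.standard_normal(N)
w = rng.uniform(-0.5, 0.5, N)          # nodes w_j = v_j / N, j in I_N

# Step 1: spread each sample onto the oversampled grid,
# g~_l := sum_{j in J_{n,m}(l)} f_j psi(w_j - l/n).
g_tilde = np.zeros(n, dtype=complex)
for fj, wj in zip(f_j, w):
    l = np.arange(int(np.ceil(n * wj - m)), int(np.floor(n * wj + m)) + 1)
    x = wj - l / n
    g_tilde[l % n] += fj * np.exp(-((n * x) ** 2) / b) / np.sqrt(np.pi * b)

# Step 2: c~_k(g) := n^{-1} sum_{l in I_n} g~_l e^{-2 pi i k l/n} by an FFT.
c_tilde = np.fft.fft(g_tilde) / n

# Step 3: h~(k) := c~_k(g) / c_k(phi) for k in I_N.
k = np.arange(-N // 2, N // 2)
c_k = np.exp(-((np.pi * k / n) ** 2) * b) / n
h_tilde = c_tilde[k % n] / c_k

# Direct evaluation of (1.3): h(k) = sum_j f_j e^{-2 pi i k w_j}.
h_direct = np.exp(-2j * np.pi * np.outer(k, w)) @ f_j
rel_err = np.linalg.norm(h_tilde - h_direct) / np.linalg.norm(h_direct)
print(rel_err)
```

Note that the spreading loop of step 1 is exactly the transpose of the gathering loop of step 3 in Algorithm 1.1, mirroring $A_N^{\mathrm{T}} \approx D^{\mathrm{T}} F_n B^{\mathrm{T}}$.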
Both algorithms introduce the same approximation errors. By (1.13), the error splits as follows:
$$E(w_j) := |f(w_j) - s(w_j)| \le E_a(w_j) + E_t(w_j)$$
with $E_a(w_j) := |f(w_j) - s_1(w_j)|$ and $E_t(w_j) := |s_1(w_j) - s(w_j)|$. Note that $E_a(w_j)$ and $E_t(w_j)$ are the aliasing error and the truncation error introduced by Algorithm 1.2, respectively. Let
$$E_\infty := \max_{j \in I_N} E(w_j), \qquad \|f\|_1 := \sum_{k \in I_N} |f_k|.$$
Then we obtain immediately by (1.8), (1.10)-(1.12) that
$$E_a(w_j) \le \sum_{k \in I_N} |f_k| \sum_{r \in \mathbb{Z}^d \setminus \{0\}} \left|\frac{c_{k+nr}(\varphi)}{c_k(\varphi)}\right| \le \|f\|_1 \max_{k \in I_N} \sum_{r \in \mathbb{Z}^d \setminus \{0\}} \left|\frac{c_{k+nr}(\varphi)}{c_k(\varphi)}\right| \tag{1.16}$$
and by (1.9), (1.11)-(1.13) that
$$E_t(w_j) \le \|f\|_1 \, n^{-d} \left(\max_{k \in I_N} |c_k(\varphi)|^{-1}\right) \sum_{l \in I_n} \left|\varphi\!\left(w_j - \frac{l}{n}\right) - \psi\!\left(w_j - \frac{l}{n}\right)\right|.$$
To fill these error estimates with life, we consider special functions $\varphi$ in the case $d = 1$. In [25], we have proved
Theorem 1.1. Let (1.2) be computed by Algorithm 1.1 with the dilated, periodized Gaussian bell
$$\varphi(v) := (\pi b)^{-1/2} \sum_{r \in \mathbb{Z}} e^{-(n(v+r))^2/b}$$
and with the truncated version of $\varphi$,
$$\psi(v) := (\pi b)^{-1/2} \sum_{r \in \mathbb{Z}} e^{-(n(v+r))^2/b} \, \chi_{[-m,m]}(n(v+r)).$$
Here $\chi_{[-m,m]}$ denotes the characteristic function of $[-m, m]$. Let $\sigma > 1$ and $1 \le b \le \frac{2\sigma m}{(2\sigma - 1)\pi}$. Then $c_k(\varphi) = \frac{1}{n}\, e^{-(\pi k/n)^2 b}$ and
$$E_a(w_j) \le \|f\|_1 \, e^{-b\pi^2(1 - 1/\sigma)} \left[\left(1 + \frac{\sigma}{(2\sigma - 1)b\pi^2}\right) + e^{-2b\pi^2/\sigma} \left(1 + \frac{\sigma}{(2\sigma + 1)b\pi^2}\right)\right],$$
$$E_t(w_j) \le \|f\|_1 \, \frac{2}{\sqrt{\pi b}} \left(1 + \frac{b}{2m}\right) e^{-b\left((m/b)^2 - (\pi/\sigma)^2\right)},$$
$$E_\infty \le 4 \, \|f\|_1 \, e^{-b\pi^2(1 - 1/\sigma)}.$$

The approximation error decreases with increasing $b$. Therefore
$$b = \frac{2\sigma m}{(2\sigma - 1)\pi} \tag{1.17}$$
provides a good choice for $b$ as a function of $\sigma$ and $m$. For the above parameter $b$, the approximation error and the truncation error are balanced. Since $2/\sqrt{\pi b}$ contributes also to the decay of $E_t(w_j)$, we expect that the optimal parameter $b$ lies slightly above the value in (1.17). A. Dutt and V. Rokhlin [8] do not give a criterion for the choice of the parameter $b$. They determine $b$ by trial computations. Moreover, our error estimate is sharper than the estimate
$$E_\infty \le (b + 9)\, \|f\|_1 \, e^{-b\pi^2(1 - 1/\sigma)/4}$$
in [8]. For the oversampling factor $\sigma = 2$, the new paper [19] of J. Pelt contains extensive numerical computations to determine the optimal parameter $b$ as a function of $m$. More recently, A. J. W. Duijndam and M. A. Schoneville [7] have evaluated the RMS-error of Algorithm 1.2 with the Gaussian pulse $\varphi$. They found the same optimal parameter $b$ as in (1.17), which confirms our result. Estimates for the multivariate setting were given in [11, 7].
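The parameter choice (1.17) together with the aliasing bound $E_\infty \le 4\|f\|_1 e^{-b\pi^2(1-1/\sigma)}$ can be tabulated in a few lines of plain Python to make the exponential decay in $m$ concrete ($\sigma = 2$ and the range of $m$ are our choices here):

```python
import math

sigma = 2.0
ms = list(range(2, 16, 2))
bounds = []
for m in ms:
    b = 2.0 * sigma * m / ((2.0 * sigma - 1.0) * math.pi)    # choice (1.17)
    # aliasing bound E_inf / ||f||_1 <= 4 exp(-b pi^2 (1 - 1/sigma))
    bounds.append(4.0 * math.exp(-b * math.pi ** 2 * (1.0 - 1.0 / sigma)))
for m, e in zip(ms, bounds):
    print(m, e)
```

Each increase of $m$ by one buys roughly a constant factor in accuracy, which is why moderate window widths already reach double precision.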
Figure 1.2 illustrates our theoretical results by numerical tests. The tests were implemented in MATLAB. The C programs for Algorithm 1.1 were taken from [6] and were included by cmex. The programs were tested on a Sun SPARCstation 20 in double precision. Let $N = 2^t$. As an example, we consider the computation of
$$f(w_j) = \sum_{k=0}^{2^t - 1} e^{-2\pi i k w_j} \qquad (j \in I_{2^t}) \tag{1.18}$$
with uniformly distributed random nodes $w_j \in [-\frac12, \frac12)$. The exact vector $\hat{f} := (f(w_j))_{j=-2^{t-1}}^{2^{t-1}-1}$ is given by
$$f(w_j) = \frac{e^{-2\pi i w_j (2^t - 1)} - e^{2\pi i w_j}}{1 - e^{2\pi i w_j}}.$$
By $\tilde{f}$ we denote the result of Algorithm 1.1 with $\varphi, \psi$ as in Theorem 1.1, $m := 15$, $\sigma := 2$ and $b$ as in (1.17). Let
$$E_{\mathrm{NDFT}}(t) := \log_{10}\left(\|\tilde{f} - \hat{f}\|_2 / \|\hat{f}\|_2\right).$$
Figure 1.2 (left) shows the error $E_{\mathrm{NDFT}}(10)$ for 10 numerical tests and for various $m = 6, \ldots, 20$. Figure 1.2 (right) presents the computation time (in seconds) for the cascade summation of (1.2) and for Algorithm 1.1. Note that for $t = 13$ the fast Algorithm 1.1 requires less than one second, while the direct method based on cascade summation takes more than 3 hours.
FIGURE 1.2. Left: $(m, E_{\mathrm{NDFT}}(10))$ for $m = 6, \ldots, 20$. Right: Computation time in seconds for $t = 2, \ldots, 15$.
Next, we consider centered cardinal B-splines as window functions, which, in contrast to the Gaussian kernel, are of compact support such that we have no truncation error. Algorithm 1.2 with centered cardinal B-splines and with a modified third step was originally introduced in [4]. The difference in the third step was motivated by the tight frequency localization of the Lemarie-Battle scaling functions which arise in wavelet theory. See also [25]. By $M_{2m}$ we denote the centered cardinal B-spline of order $2m$.
Theorem 1.2. Let (1.2) be computed by Algorithm 1.1 with the dilated, periodized, centered cardinal B-spline
$$\varphi(v) := \sum_{r \in \mathbb{Z}} M_{2m}(n(v+r)) \qquad (m \ge 1)$$
of order $2m$ and let $\psi = \varphi$. Then
$$c_k(\varphi) = \begin{cases} \dfrac{1}{n} & k = 0, \\[2mm] \dfrac{1}{n} \left(\dfrac{\sin(\pi k/n)}{\pi k/n}\right)^{2m} & \text{otherwise} \end{cases}$$
and for $\sigma > 1$ and $0 \le \beta \le 4m/3$,
$$E_\infty \le \frac{4m}{2m-1} \left(\frac{N}{2}\right)^{-\beta} (2\sigma - 1)^{-2m} \, |f|_{\beta,1}.$$
Here $|f|_{\beta,1}$ denotes the Sobolev-like seminorm $|f|_{\beta,1} := \sum_{k=-N/2}^{N/2-1} |f_k| \, |k|^\beta$.
Proof: 1. For $0 < u < 1$ and $m \in \mathbb{N}$, it holds that
$$\sum_{r \in \mathbb{Z} \setminus \{0\}} \left(\frac{u}{u+r}\right)^{2m} \le 2\left(\frac{u}{u-1}\right)^{2m} + 2\sum_{r=2}^{\infty} \left(\frac{u}{u-r}\right)^{2m} \le 2\left(\frac{u}{u-1}\right)^{2m} + 2\int_1^{\infty} \left(\frac{u}{u-x}\right)^{2m} \mathrm{d}x \le \frac{4m}{2m-1} \left(\frac{u}{u-1}\right)^{2m}.$$
2. Since $c_{rn}(\varphi) = 0$ $(r \ne 0)$ and
$$c_{k+rn}(\varphi) = \frac{1}{n} \left(\frac{\sin(\pi k/n)}{\pi(k/n + r)}\right)^{2m} = \frac{1}{n} \left(\frac{\sin(\pi k/n)}{\pi k/n}\right)^{2m} \left(\frac{k/n}{k/n + r}\right)^{2m} = c_k(\varphi) \left(\frac{k/n}{k/n + r}\right)^{2m},$$
we obtain by (1.16) that
$$E_\infty \le \frac{4m}{2m-1} \sum_{\substack{k=-N/2 \\ (k \ne 0)}}^{N/2-1} |f_k| \, |k|^\beta \, n^{-\beta} \, \frac{(k/n)^{2m-\beta}}{(k/n - \operatorname{sgn}(k))^{2m}}.$$
The functions $h_+(u) := u^{2m-\beta}/(u-1)^{2m}$ and $h_-(u) := u^{2m-\beta}/(u+1)^{2m}$ are monotonically increasing for $u \in [0, \frac12]$ and $\beta \le 4m/3$. Thus, since $|k| \le N/2$, we obtain the assertion.
By Theorem 1.2, we have for $\beta = 0$ that
$$E_\infty \le \frac{4m}{2m-1} (2\sigma - 1)^{-2m} \, \|f\|_1.$$
If the values $f_k$ are Fourier coefficients of a smooth function such that $|f|_{\beta,1}$ does not increase with $N \to \infty$, then the estimates with larger values of $\beta$ may be useful. For example, we obtain for $\beta = \sigma = m = 2$ by Theorem 1.2 that
$$E_\infty \le \frac{5}{32}\, N^{-2} \, |f|_{2,1}.$$
Multivariate estimates were given in [11, 4]. In various papers, other window functions than the Gaussian pulse or centered cardinal B-splines were considered:
- prolate spheroidal functions and Kaiser-Bessel functions [15],
- Gaussian pulse tapered with a Hanning window [7],
- Gaussian kernels combined with sinc kernels [19],
- special optimized windows [15, 7].
Numerical results demonstrate that the approximation error (as a function of the speed of the computation) can be further improved by using these functions.
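For the lowest order $m = 1$, where $M_2$ is the hat function $\max(1 - |x|, 0)$, the closed form for $c_k(\varphi)$ in Theorem 1.2 can be checked by a straightforward midpoint-rule quadrature of the Fourier integral. The grid size $n$ and the number of quadrature points below are arbitrary choices of ours:

```python
import cmath
import math

n, m = 8, 1                          # M_2 is the hat function max(1 - |x|, 0)
M2 = lambda x: max(0.0, 1.0 - abs(x))

# phi(v) = sum_r M2(n (v + r)); for n >= 2 only the nearest period contributes.
def phi(v):
    v = v - round(v)                 # reduce to one period around 0
    return M2(n * v)

# c_k(phi) = int_{-1/2}^{1/2} phi(v) e^{2 pi i k v} dv via a midpoint rule.
P = 20000
errs = []
for k in range(6):
    quad = sum(phi((p + 0.5) / P - 0.5)
               * cmath.exp(2j * math.pi * k * ((p + 0.5) / P - 0.5))
               for p in range(P)) / P
    if k == 0:
        exact = 1.0 / n
    else:
        exact = (math.sin(math.pi * k / n) / (math.pi * k / n)) ** (2 * m) / n
    errs.append(abs(quad - exact))
    print(k, errs[-1])
```

The quadrature values agree with the $\sin^{2m}$ formula up to the $O(P^{-2})$ midpoint-rule error.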
1.3 NDFT for nonequispaced data in time and frequency domain

The following algorithm for the fast computation of the general NDFT (1.1) (with $M = N$) turns out to be a combination of the gridding approach to (1.3) and of our approach to (1.2). Moreover, in order to apply again an aliasing formula, a periodization procedure becomes necessary. In order to form $a$-periodic functions, a parameter $a$ comes into play. By $\varphi_1 \in L_2(\mathbb{R}^d)$ we denote a sufficiently smooth function with Fourier transform
$$\hat{\varphi}_1(v) := \int_{\mathbb{R}^d} \varphi_1(x)\, e^{-2\pi i v x}\, \mathrm{d}x \ne 0$$
for all $v \in N\Pi^d$. Then we obtain for
$$G(x) := \sum_{k \in I_N} f_k \, \varphi_1(x - x_k)$$
that
$$\hat{G}(v) = \sum_{k \in I_N} f_k \, e^{-2\pi i x_k v} \, \hat{\varphi}_1(v)$$
and consequently
$$f(v_j) = \frac{\hat{G}(v_j)}{\hat{\varphi}_1(v_j)} \qquad (j \in I_N).$$
Thus, given $\hat{\varphi}_1(v_j)$, it remains to compute $\hat{G}(v_j)$. Let $n_1 := \sigma_1 N$ $(\sigma_1 > 1)$ and $m_1 \in \mathbb{N}$ $(2m_1 \ll n_1)$. We introduce the parameter $a := 1 + \frac{2m_1}{n_1}$ and rewrite $\hat{G}(v)$ as
$$\hat{G}(v) = \sum_{k \in I_N} f_k \int_{\mathbb{R}^d} \varphi_1(x - x_k)\, e^{-2\pi i x v}\, \mathrm{d}x = \sum_{k \in I_N} f_k \int_{a\Pi^d} \sum_{r \in \mathbb{Z}^d} \varphi_1(x + ra - x_k)\, e^{-2\pi i (x + ra) v}\, \mathrm{d}x. \tag{1.19}$$
Discretization of the integral by the rectangular rule leads to
$$\hat{G}(v) \approx S_1(v) := n_1^{-d} \sum_{k \in I_N} f_k \sum_{j \in I_{an_1}} \sum_{r \in \mathbb{Z}^d} \varphi_1\!\left(\frac{j}{n_1} + ra - x_k\right) e^{-2\pi i (j/n_1 + ra) v}. \tag{1.20}$$
Now we approximate the function $\varphi_1$ by a function $\psi_1$ with $\operatorname{supp} \psi_1 \subseteq \frac{2m_1}{n_1}\Pi^d$. Then the third sum in (1.20) contains only one nonzero summand, for $r = 0$. Changing the order of the summation, we obtain
$$S_1(v) \approx S_2(v) := n_1^{-d} \sum_{t \in I_{an_1}} \left(\sum_{k \in I_N} f_k \, \psi_1\!\left(\frac{t}{n_1} - x_k\right)\right) e^{-2\pi i t v / n_1}.$$
After the computation of the inner sum for all $t \in I_{an_1}$, we arrive at computation problem (1.2), which can be solved in a fast way by Algorithm 1.1, where the corresponding window functions and parameters are indicated by the index 2. We summarize:
Algorithm 1.3. (Fast computation of NDFT (1.1))

Input: $N \in \mathbb{N}$, $\sigma_1 > 1$, $\sigma_2 > 1$, $n_1 := \sigma_1 N$, $a := 1 + \frac{2m_1}{n_1}$, $n_2 := \sigma_1 \sigma_2 a N$, $x_k \in \Pi^d$, $v_j \in N\Pi^d$, $f_k \in \mathbb{C}$ $(j, k \in I_N)$.

0: Precompute $\hat{\varphi}_1(v_j)$ $(j \in I_N)$, $\psi_1(\frac{t}{n_1} - x_k)$ $(k \in I_N,\ t \in I_{an_1,m_1}(x_k))$, $c_t(\varphi_2)$ $(t \in I_{an_1})$, and $\psi_2(\frac{v_j}{n_1} - \frac{l}{n_2})$ $(j \in I_N,\ l \in I_{n_2,m_2}(\frac{v_j}{n_1}))$.

1: Compute
$$F(t) := \sum_{k \in I_N} f_k \, \psi_1\!\left(\frac{t}{n_1} - x_k\right) \qquad (t \in I_{an_1}).$$

2: Form $\hat{g}_t := F(t)/c_t(\varphi_2)$ $(t \in I_{an_1})$.

3: Compute by $d$-variate FFT
$$g_l := n_2^{-d} \sum_{t \in I_{an_1}} \hat{g}_t \, e^{-2\pi i t l / n_2} \qquad (l \in I_{n_2}).$$

4: Form
$$s(v_j) := n_1^{-d} \sum_{l \in I_{n_2,m_2}(v_j/n_1)} g_l \, \psi_2\!\left(\frac{v_j}{n_1} - \frac{l}{n_2}\right) \qquad (j \in I_N).$$

5: Form $S(v_j) := s(v_j)/\hat{\varphi}_1(v_j)$ $(j \in I_N)$.

Output: $S(v_j)$, approximate value of $f(v_j)$ $(j \in I_N)$.
Algorithm 1.3 requires $O((\sigma_1 \sigma_2 a N)^d \log(\sigma_1 \sigma_2 a N))$ arithmetical operations. The approximation error is given by $E(v_j) \le E_1(v_j) + E_2(v_j) + E_3(v_j)$ with
$$E_1(v_j) := \left|f(v_j) - \frac{S_1(v_j)}{\hat{\varphi}_1(v_j)}\right|, \qquad E_2(v_j) := \left|\frac{S_1(v_j) - S_2(v_j)}{\hat{\varphi}_1(v_j)}\right|, \qquad E_3(v_j) := \left|\frac{S_2(v_j) - s(v_j)}{\hat{\varphi}_1(v_j)}\right|.$$
The error $E_3(v_j)$ is the product of the error of Algorithm 1.1 and $|\hat{\varphi}_1(v_j)|^{-1}$. The cut-off error $E_2(v_j)$ behaves like the truncation error in Algorithm 1.1. The error $E_1(v_j)$ arising from the discretization of the integral (1.19) can be estimated by an aliasing argument [11]. The following Table 1.1 (see also [10, 11]) contains the relative approximation error
$$\max_{j \in I_N} |f(v_j) - S(v_j)| \,/\, \max_{j \in I_N} |f(v_j)|$$
introduced by Algorithm 1.3 for tensor products of Gaussian bells $\varphi_1$ and $\varphi_2$, for $N := 128$, $m = m_1 = m_2$, $a := \frac{N}{N-m}$, $\sigma_1 := \frac{2}{a}$, $\sigma_2 := 2$ and for randomly distributed nodes $x_j$ and $v_k/N$ in $\Pi^2$. By the choice of $\sigma_1, \sigma_2$ and $a$, the main effort of the algorithm consists in the bivariate FFT of size $4N$. The third column of Table 1.1 contains the error of Algorithm 1.3, if we simply set $a = 1$ and $\sigma_1 = \sigma_2 = 2$. This change of parameters influences only the first step of the algorithm. (A similar error occurs, if we consider $\varphi_1$ as a 1-periodic function.) Table 1.1 demonstrates the significance of the parameter $a$.
  m  |  a = N/(N-m)  |  a = 1
 ----|---------------|----------
  5  |  5.96608e-06  |  0.0180850
  7  |  5.44728e-08  |  0.0318376
  9  |  1.07677e-09  |  0.0541445
 11  |  3.31061e-11  |  0.0906439
 13  |  1.26030e-12  |  0.1507300
 15  |  2.16694e-13  |  0.2500920

TABLE 1.1. Approximation error of Algorithm 1.3 for $N := 128$, $a := \frac{N}{N-m}$, $\sigma_1 := \frac{2}{a}$, $\sigma_2 := 2$ and for $a := 1$, $\sigma_1 = \sigma_2 := 2$, respectively.

1.4 Roundoff errors

Beyond the approximation error, the numerical computation of Algorithm 1.1 involves roundoff errors. In this section, we will see that, similar to the classical FFT [21, 14], our algorithm is robust with respect to roundoff errors. In the following, we use the standard model of real floating point arithmetic (see [14], p. 44): For arbitrary $\alpha, \beta \in \mathbb{R}$ and any operation $\circ \in \{+, -, \times, /\}$, the exact value $\alpha \circ \beta$ and the computed value $\mathrm{fl}(\alpha \circ \beta)$ are related by
$$\mathrm{fl}(\alpha \circ \beta) = (\alpha \circ \beta)(1 + \varepsilon) \qquad (|\varepsilon| \le u),$$
where $u$ denotes the unit roundoff (or machine precision). In the case of single precision (24 bits for the mantissa (with 1 sign bit), 8 bits for the exponent), we have $u = 2^{-24} \approx 5.96 \cdot 10^{-8}$, and for double precision (53 bits for the mantissa (with 1 sign bit), 11 bits for the exponent), $u = 2^{-53} \approx 1.11 \cdot 10^{-16}$. Since complex arithmetic is implemented using real arithmetic, the complex floating point arithmetic is a consequence of the corresponding real arithmetic (see [14], pp. 78-80): For arbitrary $\alpha, \beta \in \mathbb{C}$, we have
$$\mathrm{fl}(\alpha + \beta) = (\alpha + \beta)(1 + \varepsilon) \qquad (|\varepsilon| \le u), \tag{1.21}$$
$$\mathrm{fl}(\alpha \beta) = \alpha \beta (1 + \varepsilon) \qquad \left(|\varepsilon| \le \frac{2\sqrt{2}\, u}{1 - 2u}\right). \tag{1.22}$$
In particular, if $\alpha \in \mathbb{R} \cup i\mathbb{R}$ and $\beta \in \mathbb{C}$, then
$$\mathrm{fl}(\alpha \beta) = \alpha \beta (1 + \varepsilon) \qquad (|\varepsilon| \le u). \tag{1.23}$$
To simplify the following error analysis, we assume that all entries of $A_N$, $B$, $F_n$ and $D$ were precomputed exactly. Moreover, we restrict our attention to real input vectors $f \in \mathbb{R}^N$. If we form $\hat{f} := A_N f$ by conventional multiplication and cascade summation (see [14], p. 70), then we obtain by (1.21) and (1.23) the componentwise estimate
$$|\mathrm{fl}(\hat{f})_j - \hat{f}_j| \le \frac{(\lceil \log_2 N \rceil + 1)\, u}{1 - (\lceil \log_2 N \rceil + 1)\, u}\, \|f\|_1$$
and by taking the Euclidean norm
$$\|\mathrm{fl}(\hat{f}) - \hat{f}\|_2 \le \left(u N (\lceil \log_2 N \rceil + 1) + O(u^2)\right) \|f\|_2.$$
In particular, we have for $\hat{f} := F_N f$ that
$$\|\mathrm{fl}(F_N f) - F_N f\|_2 \le \left(u N (\lceil \log_2 N \rceil + 1) + O(u^2)\right) \|f\|_2.$$
If we compute $\hat{f} = F_N f$ ($f \in \mathbb{R}^N$, $N$ a power of 2) by the radix-2 Cooley-Tukey FFT, then, following the lines of the proof in [26] and using (1.21)-(1.22), the roundoff error estimate can be improved by the factor $\sqrt{N}$, more precisely
$$\|\mathrm{fl}(F_N f) - F_N f\|_2 \le \left(u (4 + \sqrt{2}) \sqrt{N} \log_2 N + O(u^2)\right) \|f\|_2. \tag{1.24}$$
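The cascade summation referred to above is ordinary pairwise splitting. A small sketch (with Python's math.fsum serving as a correctly rounded reference) illustrates the $\lceil \log_2 N \rceil$-type error growth versus naive left-to-right summation; the data and sizes are our own:

```python
import math
import random

def cascade_sum(a):
    """Cascade (pairwise) summation: roundoff grows like log2(len(a))."""
    if len(a) == 1:
        return a[0]
    mid = len(a) // 2
    return cascade_sum(a[:mid]) + cascade_sum(a[mid:])

random.seed(0)
a = [random.uniform(-1.0, 1.0) for _ in range(2 ** 15)]
exact = math.fsum(a)                 # correctly rounded reference sum
err_cascade = abs(cascade_sum(a) - exact)
err_naive = abs(sum(a) - exact)
print(err_cascade, err_naive)
```

The cascade error is bounded by roughly $u(\lceil \log_2 N \rceil + 1)\sum|a_i|$, whereas naive summation admits only the factor $N$ in place of $\lceil \log_2 N \rceil + 1$.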
The following theorem states that the roundoff error introduced by Algorithm 1.1 can be estimated as the FFT error in (1.24), up to a constant factor which depends on $m$ and $\sigma$.
Theorem 1.3. Let $m, N \in \mathbb{N}$ and let $n := \sigma N$ $(\sigma > 1)$ be a power of 2 with $2m \le n$. Let $h$ be a nonnegative real-valued even function, which decreases monotonically for $x \ge 0$, and let
$$\varphi(x) := \sum_{r \in \mathbb{Z}} h(n(x+r)), \qquad \psi(x) := \sum_{r \in \mathbb{Z}} (\chi_{[-m,m]}\, h)(n(x+r)).$$
Suppose that $\varphi$ has a uniformly convergent Fourier expansion with monotonically decreasing absolute values of the Fourier coefficients
$$c_k(\varphi) = \frac{1}{n}\, \hat{h}\!\left(\frac{2\pi k}{n}\right) \qquad (k \in \mathbb{Z}).$$
Let the nodes $w_j := \frac{v_j}{N} \in [-\frac12, \frac12)$ $(j \in I_N)$ be distributed such that each "window" $\left[-\frac{m}{n} + \frac{l}{n}, \frac{m}{n} + \frac{l}{n}\right]$ $(l \in I_n)$ contains at most $\nu$ nodes. If (1.2) is computed by Algorithm 1.1 with the above functions $\varphi, \psi$, i.e.,
$$\tilde{f} := B F_n D f \qquad (f \in \mathbb{R}^N),$$
where $D \in \mathbb{C}^{n,N}$ and $B \in \mathbb{R}^{N,n}$ are determined by (1.14)-(1.15), then the roundoff error of Algorithm 1.1 can be estimated by
$$\|\mathrm{fl}(\tilde{f}) - \tilde{f}\|_2 \le \left(\sqrt{\sigma}\, \lambda\, u (4 + \sqrt{2}) \sqrt{N} \left(\log_2 N + \log_2 \sigma + \frac{2m + 1}{4 + \sqrt{2}}\right) + O(u^2)\right) \|f\|_2$$
with
$$\lambda := \frac{\left(\nu\, (h^2(0) + \|h\|_{L_2}^2)\right)^{1/2}}{|\hat{h}(\pi/\sigma)|}.$$
Let us call an algorithm for the computation of the NDFT (1.2) robust, if for all $f \in \mathbb{R}^N$ there exists a positive constant $k_N$ with $k_N u \ll 1$ such that
$$\|\mathrm{fl}(\tilde{f}) - \tilde{f}\|_2 \le (k_N u + O(u^2))\, \|f\|_2.$$
Then, by Theorem 1.3, Algorithm 1.1 is robust.
Proof: 1. First, we estimate the spectral norms of $D$ and $B$ ([14], p. 120). By assumption and by (1.15), we see immediately that
$$\|D\|_2 = \max_{k \in I_N} |n\, c_k(\varphi)|^{-1} = (n\, |c_{N/2}(\varphi)|)^{-1} = |\hat{h}(\pi/\sigma)|^{-1}. \tag{1.25}$$
Since $\psi$ is even and monotonically decreasing for $x \ge 0$, it is easy to check the integral estimate
$$\frac{1}{n} \sum_{l \in I_n} \psi^2\!\left(w_j - \frac{l}{n}\right) \le \frac{1}{n}\, \psi^2(0) + \int_{-1/2}^{1/2} \psi^2(x)\, \mathrm{d}x.$$
Then it follows by the definition of $\psi$ that
$$\sum_{l \in I_n} \psi^2\!\left(w_j - \frac{l}{n}\right) \le h^2(0) + n \int_{-m/n}^{m/n} h^2(nx)\, \mathrm{d}x \le h^2(0) + \|h\|_{L_2}^2. \tag{1.26}$$
By the definition (1.14) of the sparse matrix
$$B = (b_{j,k})_{j=-N/2,\ k=-n/2}^{N/2-1,\ n/2-1}, \qquad b_{j,k} := \psi\!\left(w_j - \frac{k}{n}\right),$$
we get for the $j$-th component $(By)_j$ of $By$ $\left(y = (y_k)_{k=-n/2}^{n/2-1} \in \mathbb{C}^n\right)$ that
$$|(By)_j|^2 \le \left(\sum_{r=1}^{2m} |b_{j,k_r}|\, |y_{k_r}|\right)^2 \le \left(\sum_{r=1}^{2m} b_{j,k_r}^2\right) \left(\sum_{r=1}^{2m} |y_{k_r}|^2\right) \qquad (b_{j,k_r} > 0,\ k_r \in \{-n/2, \ldots, n/2 - 1\}).$$
By (1.26), we have
$$\sum_{r=1}^{2m} b_{j,k_r}^2 \le \sum_{k \in I_n} \psi^2\!\left(w_j - \frac{k}{n}\right) \le h^2(0) + \|h\|_{L_2}^2$$
such that
$$|(By)_j|^2 \le \left(h^2(0) + \|h\|_{L_2}^2\right) \sum_{r=1}^{2m} |y_{k_r}|^2. \tag{1.27}$$
By assumption, each "window" $\left[-\frac{m}{n} + \frac{l}{n}, \frac{m}{n} + \frac{l}{n}\right)$ $(l \in I_n)$ contains at most $\nu$ nodes $w_j$ $(j \in I_N)$. Therefore each column of $B$ contains at most $\nu$ nonzero entries such that by (1.27)
$$\|By\|_2^2 = \sum_{j=-N/2}^{N/2-1} |(By)_j|^2 \le \nu \left(h^2(0) + \|h\|_{L_2}^2\right) \|y\|_2^2$$
and consequently
$$\|B\|_2 \le \left(\nu\, (h^2(0) + \|h\|_{L_2}^2)\right)^{1/2} =: \tilde{\lambda}. \tag{1.28}$$
2. Next, it is easy to check that by (1.23) and (1.25)
$$\|\mathrm{fl}(Df) - Df\|_2 \le u\, |\hat{h}(\pi/\sigma)|^{-1}\, \|f\|_2. \tag{1.29}$$
From (1.25) it follows that
$$\|\mathrm{fl}(Df)\|_2 \le \|\mathrm{fl}(Df) - Df\|_2 + \|Df\|_2 \le |\hat{h}(\pi/\sigma)|^{-1} (u + 1)\, \|f\|_2. \tag{1.30}$$
3. Set $\hat{y} := \mathrm{fl}(F_n(\mathrm{fl}(Df)))$ and $y := F_n D f$. Then we can estimate
$$\|\hat{y} - y\|_2 \le \|\hat{y} - F_n(\mathrm{fl}(Df))\|_2 + \|F_n(\mathrm{fl}(Df)) - F_n D f\|_2$$
such that by (1.24), (1.29) and (1.30)
$$\|\hat{y} - y\|_2 \le \left(u(4 + \sqrt{2}) \sqrt{n} \log_2 n + O(u^2)\right) \|\mathrm{fl}(Df)\|_2 + \sqrt{n}\, \|\mathrm{fl}(Df) - Df\|_2 \le |\hat{h}(\pi/\sigma)|^{-1} \left(u(4 + \sqrt{2}) \sqrt{n} \log_2 n + \sqrt{n}\, u + O(u^2)\right) \|f\|_2. \tag{1.31}$$
4. Finally, we consider the error between $\mathrm{fl}(\tilde{f}) := \mathrm{fl}(B\hat{y})$ and $\tilde{f} := By$. By (1.28) and (1.31), we obtain
$$\|\mathrm{fl}(\tilde{f}) - \tilde{f}\|_2 \le \|\mathrm{fl}(B\hat{y}) - B\hat{y}\|_2 + \|B(\hat{y} - y)\|_2 \le \|\mathrm{fl}(B\hat{y}) - B\hat{y}\|_2 + \tilde{\lambda}\, |\hat{h}(\pi/\sigma)|^{-1} \left(u(4 + \sqrt{2}) \sqrt{n} \log_2 n + \sqrt{n}\, u + O(u^2)\right) \|f\|_2. \tag{1.32}$$
By (1.21), (1.23) and (1.14), it follows from [14], p. 76 that
$$|\mathrm{fl}(B\hat{y}) - B\hat{y}| \le \frac{2mu}{1 - 2mu}\, B\, |\hat{y}|$$
with $|\hat{y}| := (|\hat{y}_k|)_{k=-n/2}^{n/2-1}$, and consequently by (1.28) that
$$\|\mathrm{fl}(B\hat{y}) - B\hat{y}\|_2 \le \frac{2mu}{1 - 2mu}\, \|B\|_2\, \|\hat{y}\|_2 \le \left(2m\, \tilde{\lambda}\, u + O(u^2)\right) \|\hat{y}\|_2.$$
By (1.30) and (1.25), we obtain
$$\|\hat{y}\|_2 \le \|\hat{y} - y\|_2 + \|y\|_2 = \|F_n D f\|_2 + O(u)\, \|f\|_2 \le \left(\sqrt{n}\, |\hat{h}(\pi/\sigma)|^{-1} + O(u)\right) \|f\|_2.$$
Thus
$$\|\mathrm{fl}(B\hat{y}) - B\hat{y}\|_2 \le \left(2m\, \tilde{\lambda}\, \sqrt{n}\, |\hat{h}(\pi/\sigma)|^{-1}\, u + O(u^2)\right) \|f\|_2. \tag{1.33}$$
Together with (1.32) this yields the assertion.
Note that $\nu \approx 2m$ for "uniformly distributed" nodes $w_j$. Finally, we confirm our theoretical results by numerical experiments. We use the same setting as in Section 1.2. Further, let $\tilde{f}_C \in \mathbb{C}^{2^t}$ denote the vector which was evaluated by cascade summation of the right-hand side of (1.18), and let
$$E_C(t) := \log_{10}\left(\|\tilde{f}_C - \hat{f}\|_2 / \|\hat{f}\|_2\right).$$
Figure 1.3 (left) shows the error $E_C(t)$ for 10 numerical tests with various random nodes $w_j$ as a function of the transform length $N = 2^t$. For comparison, Figure 1.3 (right) presents the corresponding error $E_{\mathrm{NDFT}}(t)$ introduced by the NDFT Algorithm 1.1.
−10
−11
−11
−12
−12
−13
−13
−14
−14
−15
−15
−16
−16
−17
0
2
4
6
8
10
12
14
16
−17
0
2
4
6
8
10
FIGURE 1.3. Left: $(t, E_C(t))$ for $t = 1, \ldots, 13$. Right: $(t, E_{\mathrm{NDFT}}(t))$ for $t = 1, \ldots, 15$.
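The robustness of the underlying FFT is also easy to observe directly. The following sketch compares numpy's FFT in single and double precision on random data against the worst-case bound (1.24) with $u = 2^{-24}$; the library, seed, and sizes are our own choices:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 4096
f = rng.standard_normal(N) + 1j * rng.standard_normal(N)

ref = np.fft.fft(f)                                    # double precision
single = np.fft.fft(f.astype(np.complex64)).astype(complex)
rel_err = np.linalg.norm(single - ref) / np.linalg.norm(ref)

# Worst-case bound (1.24) with the single precision unit roundoff u = 2^{-24}.
u = 2.0 ** -24
bound = u * (4 + np.sqrt(2)) * np.sqrt(N) * np.log2(N)
print(rel_err, bound)
```

For random data the observed error sits well below the worst-case constant $u(4+\sqrt{2})\sqrt{N}\log_2 N$, consistent with the robustness discussion above.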
1.5 Fast Bessel transform

In this section, we apply the NDFT for a fast Bessel transform. We are interested in a fast algorithm for the computation of
$$\hat{f}(n) := \int_0^1 f(x)\, J_n(x)\, \mathrm{d}x \qquad (n \in \mathbb{N}_0), \tag{1.34}$$
where
$$J_n(x) := \sum_{k=0}^{\infty} (-1)^k \frac{(x/2)^{n+2k}}{k!\, (n+k)!} \qquad (x \in \mathbb{R})$$
is the Bessel function (of first kind) of order $n$.
is the Bessel function (of rst kind) of order n. Let f be a real{valued, compactly supported function with supp f [0; a] (0 < a < 1). Using the formula (see [1], p. 361) 1 X
e?iq cos t = 2
k=0
1 0 (?1)k J2k (q) cos(2kt) + 2i X(?1)k J2k+1 (q) cos(2k +1)t k=0
with
1 X
1 0 ak := 1 + X ak ;
2 k=1 we obtain for x 2 [?1; 1] and q 2 R that k=0
e?iqx = 2
1 X k=0
0 (?1)k J2k (q) T2k (x)
? 2i
1 X k=0
(?1)k J2k+1 (q) T2k+1 (x) :
Now multiplication with $f(q)$ and integration with respect to $q$ yields
$$h(x) := \int_0^a f(q)\, e^{-iqx}\, \mathrm{d}q = 2 \sum_{k=0}^{\infty}{}' \, (-1)^k \hat{f}(2k)\, T_{2k}(x) - 2i \sum_{k=0}^{\infty} (-1)^k \hat{f}(2k+1)\, T_{2k+1}(x). \tag{1.35}$$
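The Chebyshev-Bessel expansion underlying (1.35) can be verified in pure Python, using the power series for $J_n$ quoted at (1.34); the evaluation point and truncation lengths are our own choices:

```python
import cmath
import math

def bessel_j(nu, x, terms=30):
    # Power series from the text: J_nu(x) = sum_k (-1)^k (x/2)^{nu+2k} / (k! (nu+k)!).
    return sum((-1) ** k * (x / 2.0) ** (nu + 2 * k)
               / (math.factorial(k) * math.factorial(nu + k))
               for k in range(terms))

def cheb_T(k, x):
    return math.cos(k * math.acos(x))   # Chebyshev polynomial T_k on [-1, 1]

q, x, K = 1.7, 0.3, 20
even = bessel_j(0, q) * cheb_T(0, x)    # the primed sum halves the k = 0 term
even += 2.0 * sum((-1) ** k * bessel_j(2 * k, q) * cheb_T(2 * k, x)
                  for k in range(1, K))
odd = 2.0 * sum((-1) ** k * bessel_j(2 * k + 1, q) * cheb_T(2 * k + 1, x)
                for k in range(K))
lhs = cmath.exp(-1j * q * x)
rhs = even - 1j * odd
print(abs(lhs - rhs))
```

Since $J_k(q)$ decays super-exponentially in $k$ for fixed $q$, a handful of terms already reproduces $e^{-iqx}$ to machine precision.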
Note that $(\operatorname{Re} h)(x) = (\operatorname{Re} h)(-x)$ and $(\operatorname{Im} h)(x) = -(\operatorname{Im} h)(-x)$. Finally, orthogonality of the Chebyshev polynomials $T_k(x)$