A Fast Fourier Algorithm on the Rotation Group

Daniel Potts∗, Jürgen Prestin†, Antje Vollrath‡

In this paper we present an algorithm for the fast Fourier transform on the rotation group $SO(3)$ which is based on the fast Fourier transform for nonequispaced nodes on the three-dimensional torus. This algorithm allows us to evaluate the $SO(3)$ Fourier transform of $B$-band-limited functions at $M$ arbitrary input nodes in $O(M + B^3 \log^2 B)$ flops instead of $O(M B^3)$. Some numerical results are presented that establish the algorithm's numerical stability and time requirements.

2000 AMS Subject Classification: 65T99, 33C55, 42C10, 65T50

Keywords: rotation group, spherical Fourier transform, generalized spherical harmonics, associated Legendre functions, Wigner-D functions, Wigner-d functions, fast discrete transforms, fast Fourier transform at nonequispaced nodes, SOFT

1 Introduction

During the past years, several different techniques have been proposed for computing Fourier transforms on the rotation group $SO(3)$, motivated by a variety of applications. These algorithms are based on discrete Fourier transforms on the two-dimensional sphere and a generalization thereof. In [10] Kostelec and Rockmore use a technique based on separation of variables together with a bivariate FFT and a recursive evaluation of Wigner-d functions. In contrast, Risbo [17] describes how to expand the Wigner-d functions into a Fourier sum and handle the transform on the whole group $SO(3)$ by a trivariate FFT. Yet all these techniques consider samples on specially structured, i.e., equispaced, grids on $SO(3)$. But in some fields of application like protein-protein docking [3] or texture analysis [2, 18] one often deals with nonuniformly distributed samples.

Our work is based on the Wigner-D functions $D_l^{mn}$, which yield an orthogonal basis of $L^2(SO(3))$. Using these functions we expand $B$-band-limited functions $f \in L^2(SO(3))$ into the sum

$$f(g_q) = \sum_{l=0}^{B} \sum_{m=-l}^{l} \sum_{n=-l}^{l} \hat f_l^{mn} D_l^{mn}(g_q), \qquad q = 0, \dots, M-1.$$

∗ [email protected], Chemnitz University of Technology, Department of Mathematics, 09107 Chemnitz, Germany
† [email protected], University of Lübeck, Institute of Mathematics, Wallstrasse 40, 23560 Lübeck, Germany
‡ [email protected], University of Lübeck, Institute of Mathematics, Wallstrasse 40, 23560 Lübeck, Germany

In the course of the paper we present an algorithm for the efficient and accurate evaluation of $B$-band-limited functions $f \in L^2(SO(3))$ at samples $g \in SO(3)$, outlining that the above sum is the discrete Fourier transform of a function defined on the rotation group. We use the nonequispaced fast Fourier transform (NFFT) algorithm, see e.g. [15], and [8] for its implementation. Based on the fast polynomial transform, cf. e.g. [6, 13], we develop an algorithm that quickly transforms sums of Wigner-d functions into sums of Chebychev polynomials. Combining these steps, the $SO(3)$ Fourier transform is sped up from $O(M B^3)$ flops to $O(M + B^3 \log^2 B)$ flops, with $B$ being the bandwidth and $M$ the number of input nodes.

Abbreviation   Complete name                                     Complexity               Reference
FFT            Fast Fourier Transform                            O(B^d log B)             [7]
NFFT           Nonequispaced Fast Fourier Transform              O(M + B^d log B)         [8]
SOFT           (Equispaced) SO(3) Fourier Transform              O(B^4), M = (2B)^3       [10]
NFSOFT         Nonequispaced Fast SO(3) Fourier Transform        O(M + B^3 log^2 B)       Sect. 2.6
NDSOFT         Nonequispaced Discrete SO(3) Fourier Transform    O(M B^3)                 (2.12)
FWT            Fast Wigner Transform                             O(B log^2 B)             Sect. 2.4
DWT            Discrete Wigner Transform                         O(B^2)                   Sect. 2.3

Table 1.1: A list of the transforms mentioned in this paper with references and their asymptotic complexities depending on the bandwidth $B$ and the number of input nodes $M$.

The paper is structured as follows. We begin with a short introduction to the group $SO(3)$ including Euler angles and sampling on $SO(3)$. Important properties of the Wigner-D functions $D_l^{mn}$ as well as their normalization and relation to functions known from harmonic analysis on the two-dimensional sphere are outlined briefly. We then present an approach to the $SO(3)$ Fourier transform for $B$-band-limited functions $f \in L^2(SO(3))$ at $M$ arbitrary nodes $g_q \in SO(3)$. Following this, the adjoint transform, i.e., the reconstruction of $SO(3)$ Fourier coefficients out of function samples $f(g_q)$, is described.


Introducing the matrix-vector notation $\mathbf f = \mathbf D \hat{\mathbf f}$ for this transform, we split the computation of the whole transform into three steps, $\mathbf f = \mathbf D \hat{\mathbf f} = \mathbf F \mathbf A \mathbf W \hat{\mathbf f}$, where $\mathbf f$ contains the $M$ function samples $f(g_q)$ and $\hat{\mathbf f}$ the $SO(3)$ Fourier coefficients $\hat f_l^{mn}$. The corresponding matrices $\mathbf F$, $\mathbf A$ and $\mathbf W$ are each discussed in the following subsections. The first step, the multiplication with the matrix $\mathbf W$, represents the Wigner-d transform, which is discussed in Section 2.3. There the transform itself as well as a fast algorithm are explained. Here, the derivation of a modified three-term recurrence relation plays an important role. The following section deals with an intermediate step, the multiplication with the matrix $\mathbf A$, which transforms Chebychev sums into trigonometric sums. Finally we consider the whole $SO(3)$ Fourier transform in Section 2.6, using the NFFT to handle the computation represented by the nonequispaced Fourier matrix $\mathbf F$. Again, the adjoint transform is also considered briefly. Additionally, Section 2.7 compares our algorithm to the equispaced $SO(3)$ Fourier transform from [10]. In the last section we present time and error estimates of our implementation along with a comparison to the equispaced SOFT algorithm implemented in [11] and a short conclusion.

2 The Rotation Group SO(3)

An orthogonal matrix $R \in \mathbb{R}^{3\times 3}$ with determinant $\det(R) = 1$ can be identified with a rotation in $\mathbb{R}^3$. It is an orientation-preserving orthogonal transformation and can be interpreted as the circular movement of an object in $\mathbb{R}^3$ by an angle about a fixed point. The set of all such matrices $\{R \in \mathbb{R}^{3\times 3} : \det(R) = 1,\ R^T R = I_3\}$ constitutes the special orthogonal group $SO(3)$. There are various ways to parametrise this group [19, Sect. 1.4]. We use one variant of the well-known Euler angles.

Definition 2.1 Given three angles $\alpha, \gamma \in [0, 2\pi)$ and $\beta \in [0, \pi]$, the corresponding rotation $R(\alpha, \beta, \gamma)$ is given by

$$R(\alpha, \beta, \gamma) = R_{ZYZ}(\alpha, \beta, \gamma) = R_Z(\alpha) R_Y(\beta) R_Z(\gamma)$$

where

$$R_Z(\varphi) = \begin{pmatrix} \cos\varphi & -\sin\varphi & 0 \\ \sin\varphi & \cos\varphi & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad R_Y(\theta) = \begin{pmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{pmatrix}$$

denote rotations about the z- and y-axis, respectively.

In the literature there are different conventions for choosing those angles. Most commonly we find the ZXZ- and the ZYZ-convention, where the Euler angles denote a sequence of rotations about the z, x, z-axes or the z, y, z-axes, respectively. Both representations can be transformed into each other as $R_{ZYZ}(\alpha, \beta, \gamma) = R_{ZXZ}(\alpha + \pi/2, \beta, \gamma - \pi/2)$.

Throughout this paper we will use the ZYZ-convention, identifying an element $g \in SO(3)$ with $g(\alpha, \beta, \gamma) = R_{ZYZ}(\alpha, \beta, \gamma)$. For the sake of simplicity, we will also denote functions $f : SO(3) \to \mathbb{C}$ by $f(g(\alpha, \beta, \gamma)) = f(\alpha, \beta, \gamma)$.
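The Euler angle parametrisation of Definition 2.1 can be illustrated with a short sketch. The function names below are hypothetical helpers chosen for this illustration only; `rot_x` is the standard rotation about the x-axis, which the text does not write out and which is assumed here solely to check the stated ZXZ/ZYZ conversion numerically.

```python
import math

def rot_z(phi):
    """Rotation about the z-axis by the angle phi (Definition 2.1)."""
    c, s = math.cos(phi), math.sin(phi)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_y(theta):
    """Rotation about the y-axis by the angle theta (Definition 2.1)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def rot_x(theta):
    """Standard rotation about the x-axis (assumed, used only for the ZXZ check)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rot_zyz(alpha, beta, gamma):
    """R_ZYZ(alpha, beta, gamma) = R_Z(alpha) R_Y(beta) R_Z(gamma)."""
    return matmul(rot_z(alpha), matmul(rot_y(beta), rot_z(gamma)))

def rot_zxz(alpha, beta, gamma):
    """R_ZXZ(alpha, beta, gamma) = R_Z(alpha) R_X(beta) R_Z(gamma)."""
    return matmul(rot_z(alpha), matmul(rot_x(beta), rot_z(gamma)))

def max_abs_diff(a, b):
    """Largest entrywise deviation between two 3x3 matrices."""
    return max(abs(a[i][j] - b[i][j]) for i in range(3) for j in range(3))
```

Comparing `rot_zyz(a, b, c)` with `rot_zxz(a + math.pi / 2, b, c - math.pi / 2)` via `max_abs_diff` should give a deviation at the level of rounding errors.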

2.1 A Basis System for $L^2(SO(3))$

We now consider the Hilbert space $L^2(SO(3))$ with the inner product of two functions $f_1, f_2 \in L^2(SO(3))$ given by

$$\langle f_1, f_2 \rangle = \int_{SO(3)} f_1(g) \overline{f_2(g)}\, dg = \int_0^{2\pi} \int_0^{\pi} \int_0^{2\pi} f_1(\alpha, \beta, \gamma) \overline{f_2(\alpha, \beta, \gamma)} \sin(\beta)\, d\alpha\, d\beta\, d\gamma \tag{2.1}$$

and the corresponding norm $\|f\|_{L^2(SO(3))} = \sqrt{\langle f, f \rangle}$.

In this section we introduce the Wigner-D and Wigner-d functions. They play a major role in Fourier analysis on $SO(3)$. The Wigner-D functions $D_l^{m,n}(g)$ are the eigenfunctions of the Laplace operator for $SO(3)$. When parametrizing these eigenfunctions in Euler angles we obtain an explicit expression for the Wigner-D functions (see [4, Sect. 9]). They are given for $|m|, |n| \le l \in \mathbb{N}_0$ by

$$D_l^{m,n}(\alpha, \beta, \gamma) = e^{-im\alpha} e^{-in\gamma} d_l^{m,n}(\cos\beta) \tag{2.2}$$

where

$$d_l^{m,n}(x) = \frac{(-1)^{l-m}}{2^l} \sqrt{\frac{(l+m)!}{(l-n)!\,(l+n)!\,(l-m)!}} \sqrt{\frac{(1-x)^{n-m}}{(1+x)^{m+n}}}\; \frac{d^{l-m}}{dx^{l-m}} \frac{(1+x)^{n+l}}{(1-x)^{n-l}} \tag{2.3}$$

are called Wigner-d functions. They satisfy the orthogonality condition

$$\int_0^{\pi} d_l^{m,n}(\cos\beta)\, d_{l'}^{m,n}(\cos\beta) \sin\beta\, d\beta = \frac{1}{l + \frac12}\, \delta_{ll'}. \tag{2.4}$$

Wigner-d as well as Wigner-D functions generalize functions that are known from harmonic analysis on the sphere $S^2$, e.g.,

$$d_l^{0,-n}(x) = P_l^n(x) = \frac{1}{2^l\, l!} \sqrt{\frac{(l-n)!}{(l+n)!}}\, \sqrt{(1-x^2)^n}\; \frac{d^{l+n}}{dx^{l+n}} (x^2 - 1)^l \tag{2.5}$$

where $P_l^n$ denotes the associated Legendre functions. Due to (2.5) the Wigner-d functions are also called generalized associated Legendre functions. We now obtain immediately the relation between Wigner-D functions $D_l^{m,n}(\alpha, \beta, \gamma)$ and spherical harmonics $Y_l^m$ as

$$Y_l^m(\xi) = Y_l^m(\beta, \alpha) = \sqrt{\frac{2l+1}{4\pi}}\, e^{im\alpha} d_l^{0,m}(\cos\beta) = (-1)^{\delta_{m|m|}} \sqrt{\frac{2l+1}{4\pi}}\, D_l^{0,-m}(\alpha, \beta, \gamma)$$

where $(\beta, \alpha) \in [0, \pi] \times [0, 2\pi)$ are the polar coordinates of the point $\xi \in S^2$. As a result of this relationship, Wigner-D functions are sometimes also called generalized spherical harmonics (cf. e.g. [2]). Note that subsequently we will simplify $D_l^{m,n} = D_l^{mn}$ as well as $d_l^{m,n} = d_l^{mn}$ if no confusion occurs.

By means of the Peter-Weyl theorem (cf. [20, Sect. 3.3] for details) the harmonic spaces $\mathrm{Harm}_l(SO(3)) = \mathrm{span}\{D_l^{mn} : m, n = -l, \dots, l\}$ spanned by the Wigner-D functions satisfy

$$L^2(SO(3)) = \mathrm{clos}_{L^2} \bigoplus_{l=0}^{\infty} \mathrm{Harm}_l(SO(3)).$$

Hence the collection of Wigner-D functions $\{D_l^{mn}(g) : l \in \mathbb{N}_0,\ m, n = -l, \dots, l\}$ forms an orthogonal basis system in $L^2(SO(3))$. Moreover we define the function spaces

$$D_B = \bigoplus_{l=0}^{B} \mathrm{Harm}_l(SO(3))$$

for arbitrary $B \in \mathbb{N}$. The dimension of these spaces is given by

$$\dim(D_B) = \sum_{l=0}^{B} (2l+1)^2 = \frac13 (B+1)(2B+1)(2B+3). \tag{2.6}$$

We define an ordered set of indices

$$I_B = \{(l, m, n) : l = 0, \dots, B;\ m, n = -l, \dots, l\} \tag{2.7}$$

corresponding to all admissible index triples $(l, m, n)$ within the space $D_B$. Throughout this paper we keep a particular order of the indices fixed. The Wigner-D functions $D_l^{mn}$ are not normalized with respect to the inner product (2.1); instead they satisfy

$$\|D_l^{mn}\|^2_{L^2(SO(3))} = \frac{4\pi^2}{l + \frac12}.$$
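As a small illustration of the index set (2.7) and the dimension formula (2.6), the following sketch enumerates $I_B$ in one possible fixed order and maps a triple to its position in a flat coefficient vector. The particular ordering (l outermost, then m, then n) and the names `index_set`, `dim_DB` and `flat_index` are assumptions made for this example; the text only requires that some order is kept fixed.

```python
def index_set(B):
    """Enumerate I_B = {(l, m, n) : l = 0..B; m, n = -l..l} in a fixed order."""
    return [(l, m, n)
            for l in range(B + 1)
            for m in range(-l, l + 1)
            for n in range(-l, l + 1)]

def dim_DB(B):
    """Closed form (2.6) for the dimension of D_B; the product is divisible by 3."""
    return (B + 1) * (2 * B + 1) * (2 * B + 3) // 3

def flat_index(l, m, n):
    """Position of (l, m, n) in the list returned by index_set (assumed ordering)."""
    offset = dim_DB(l - 1)           # number of triples with degree smaller than l
    return offset + (m + l) * (2 * l + 1) + (n + l)
```

For any bandwidth, `len(index_set(B))` agrees with `dim_DB(B)`, which is the number of Fourier coefficients a $B$-band-limited function carries.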

Every element $f \in L^2(SO(3))$ has a unique series expansion in terms of the Wigner-D functions

$$f(g) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} \sum_{n=-l}^{l} \hat f_l^{mn} D_l^{mn}(g), \tag{2.8}$$

where $g \in SO(3)$ and the $SO(3)$ Fourier coefficients $\hat f_l^{mn}$ are given by the integral

$$\hat f_l^{mn} = \frac{l + \frac12}{4\pi^2} \langle f, D_l^{mn} \rangle. \tag{2.9}$$

For approximation purposes we deal with $B$-band-limited functions $f \in D_B$, which can be written as their own finite Fourier partial sum

$$f(g) = \sum_{(l,m,n) \in I_B} \hat f_l^{mn} D_l^{mn}(g). \tag{2.10}$$

The fast evaluation of (2.10) for a set of samples $g \in SO(3)$ is the main aspect of this paper.

2.2 Discrete Fourier Transforms on SO(3)

From now on we restrict ourselves to $B$-band-limited functions $f \in D_B$. These functions will be evaluated on a certain number of sampling nodes $g_q = (\alpha_q, \beta_q, \gamma_q)$ from a nonequispaced sampling set

$$X_M = \{(\alpha_q, \beta_q, \gamma_q) : q = 0, \dots, M-1\} \tag{2.11}$$

where $0 \le \alpha_q, \gamma_q < 2\pi$ and $0 \le \beta_q \le \pi$ are Euler angles. The Fourier sum (2.8) of a $B$-band-limited function $f \in D_B$ reads as

$$f(\alpha_q, \beta_q, \gamma_q) = \sum_{(l,m,n) \in I_B} \hat f_l^{mn} D_l^{mn}(g_q) = \sum_{l=0}^{B} \sum_{m=-l}^{l} \sum_{n=-l}^{l} \hat f_l^{mn} D_l^{mn}(\alpha_q, \beta_q, \gamma_q) \tag{2.12}$$

for $q = 0, \dots, M-1$. The evaluation of this finite triple sum, where the complex-valued $SO(3)$ Fourier coefficients $\hat f_l^{mn} \in \mathbb{C}$ are given, will be called the nonequispaced discrete Fourier transform on the rotation group (NDSOFT).

Let us now consider the NDSOFT (2.12) from an algebraic point of view. According to (2.6) we deal with $\frac13 (B+1)(2B+1)(2B+3)$ Fourier coefficients $\hat f_l^{mn}$. These coefficients are mapped onto $M$ sampled function values of a function $f \in D_B$. The NDSOFT can thus be represented in matrix-vector notation as

$$\mathbf f = \mathbf D \hat{\mathbf f} \tag{2.13}$$

where the vector $\mathbf f = (f(g))_{g \in X_M} \in \mathbb{C}^M$ contains the sampled function values of $f$, and the vector $\hat{\mathbf f} = (\hat f_l^{mn})_{(l,m,n) \in I_B} \in \mathbb{C}^{\frac13 (B+1)(2B+1)(2B+3)}$ contains the Fourier coefficients for all triples from the set of indices $I_B$ (see (2.7)). The matrix

$$\mathbf D = \left( D_l^{mn}(\alpha_q, \beta_q, \gamma_q) \right)_{(\alpha_q, \beta_q, \gamma_q) \in X_M;\ (l,m,n) \in I_B} \in \mathbb{C}^{M \times \frac13 (B+1)(2B+1)(2B+3)}$$

will be called the nonequispaced $SO(3)$ Fourier matrix.

On the other hand, if we seek to compute $SO(3)$ Fourier coefficients $\hat f_l^{mn}$ out of function samples $f(\alpha_q, \beta_q, \gamma_q)$, the inner product (2.9), i.e., an integral, needs to be calculated. As we only know samples of the function $f$, we have to use quadrature rules. For a certain choice of nodes $(\alpha_q, \beta_q, \gamma_q)$ one can compute the coefficients

$$\hat f_l^{mn} = \sum_{q=0}^{M-1} w_q^B f(\alpha_q, \beta_q, \gamma_q)\, \overline{D_l^{mn}(\alpha_q, \beta_q, \gamma_q)} \tag{2.14}$$

for all admissible $(l, m, n) \in I_B$ using an associated quadrature rule and quadrature weights $w_q^B$, depending on the bandwidth $B$ of the function and the sampling set $X_M$ itself. The calculation of (2.14) is realized by a matrix-vector multiplication

$$\hat{\mathbf f} = \mathbf D^H \mathbf W \mathbf f$$

where $\hat{\mathbf f}$, $\mathbf f$ and $\mathbf D$ are the same as in (2.13) and $\mathbf W \in \mathbb{C}^{M \times M}$ is a matrix containing the quadrature weights $w_q^B$. A matrix-vector multiplication with $\mathbf D^H$ is called adjoint NDSOFT. An example of (2.14) will be given in Section 2.7. In the more general case where we use arbitrary points, we need to determine the corresponding quadrature rules first. This is beyond the scope of this paper and will be considered in future work.

Let us analyse the computational complexity of computing the NDSOFT naively via the matrix-vector multiplication (2.13). Owing to the size of $\mathbf D$, the asymptotic complexity for evaluating the NDSOFT of a $B$-band-limited function $f \in D_B$ at $M$ nodes would be $O(M B^3)$ flops. Analogously, the adjoint transform in (2.14), i.e., the multiplication of the sample vector $\mathbf f$ with $\mathbf D^H$, requires $O(M B^3)$ operations as well. The lower bound for the algorithm is given by the number of input values, of which there are $O(M + B^3)$, consisting of $O(B^3)$ Fourier coefficients and $O(M)$ nodes. We will not attain this lower bound but approach it with $O(M + B^3 \log^2 B)$ flops. This will be accomplished by generalising different algorithms; in particular, we adopt an algorithm which has been presented in [12] for the two-dimensional sphere and combine it with the separation of variables technique from [10].

In the following sections we will describe how to split up the NDSOFT (2.13) such that we can use the NFFT [8, 15] on the one hand and a fast polynomial transform [6, 13] on the other hand for its evaluation. We start by focusing on the latter.

2.3 Discrete Wigner Transform

In this section we deal with the Wigner-d functions $d_l^{mn}$, as they are the main source of the high computational complexity, with $O(B)$ flops needed for their evaluation. One important feature we will use is the orthogonality of the Wigner-d functions $d_l^{mn}$. Since they are orthogonal functions in the sense of (2.4), they can be described not only by their Rodrigues formula given in (2.3) but also by a three-term recurrence relation that reads for $|m|, |n| \le l$ as

$$d_{l+1}^{mn}(x) = (u_l^{mn} x + v_l^{mn})\, d_l^{mn}(x) + w_l^{mn}\, d_{l-1}^{mn}(x), \qquad x = \cos\theta, \tag{2.15}$$

with the recurrence coefficients given by

$$u_l^{mn} = \frac{(l+1)(2l+1)}{\sqrt{((l+1)^2 - m^2)((l+1)^2 - n^2)}}, \qquad v_l^{mn} = \frac{-mn(2l+1)}{l \sqrt{((l+1)^2 - m^2)((l+1)^2 - n^2)}}, \qquad w_l^{mn} = \frac{-(l+1) \sqrt{(l^2 - m^2)(l^2 - n^2)}}{l \sqrt{((l+1)^2 - m^2)((l+1)^2 - n^2)}}$$

where we set $d_l^{mn}(x) = 0$ for all $l < \max(|m|, |n|)$. For the evaluation of this three-term recurrence relation we can now employ the Clenshaw algorithm, see e.g. [16, pp. 184]. Although it decreases the memory requirements compared to the naive matrix multiplication from (2.13), it is still impracticably slow for most applications, as it does not reduce the computational complexity of $O(M B^3)$ flops in total. This recursive evaluation will, however, be used in the discrete Wigner transform (DWT), which is described now.

The aim of the DWT is to transform a linear combination of Wigner-d functions $d_l^{mn}$ from (2.3) into a linear combination of Chebychev polynomials of the first kind $T_l$. We will take advantage of the fact that Wigner-d functions are similar to the associated Legendre functions that are the subject of [12]. In fact they are a generalization of those functions, as mentioned before in (2.5). From equation (2.3) it can be seen that for $m + n$ even the Wigner-d functions $d_l^{mn}(x)$ are polynomials of degree at most $l$, whereas for odd $m + n$ they have to be transformed into polynomials of degree $l - 1$ by multiplying them with $(1 - x^2)^{-1/2}$. Performing a change of the polynomial basis, we get the following Chebychev coefficients $t_l^{mn}$ out of the input coefficients $\hat f_l^{mn}$ by computing

$$\sum_{l=\max(|m|,|n|)}^{B} \hat f_l^{mn} d_l^{mn}(\cos\theta) = \begin{cases} \displaystyle\sum_{l=0}^{B} t_l^{mn} T_l(\cos\theta) & \text{for } m + n \text{ even}, \\[2ex] \displaystyle\sin\theta \sum_{l=0}^{B-1} t_l^{mn} T_l(\cos\theta) & \text{for } m + n \text{ odd}, \end{cases} \tag{2.16}$$

at nodes $\theta \in X_C$ on the equispaced grid $X_C = \left\{ \frac{(2k+1)\pi}{2(B+1)} : k = 0, \dots, B \right\}$, where $m, n$ are fixed and $T_l(\cos\theta) = \cos l\theta$ are Chebychev polynomials of degree $l$.

Definition 2.2 For fixed orders $m$ and $n$ the change of basis given in equation (2.16) between the vector of $SO(3)$ Fourier coefficients $\hat{\mathbf f}^{mn} = \left( \hat f_{\max(|m|,|n|)}^{mn}, \dots, \hat f_B^{mn} \right)^T$ and the vector of Chebychev coefficients $\mathbf t^{mn} = (t_0^{mn}, \dots, t_B^{mn})^T$ is described by the matrix $\mathbf W^{mn} \in \mathbb{C}^{(B+1) \times (B - \max(|m|,|n|) + 1)}$ through

$$\mathbf t^{mn} = \mathbf W^{mn} \hat{\mathbf f}^{mn}.$$
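The three-term recurrence (2.15) can be sketched as follows. This is only an illustration under stated assumptions: the helper names are hypothetical, the starting value $d_\nu^{mn}(x)$ for $\nu = \max(|m|,|n|)$ has to be supplied by the caller (for instance from (2.3)), and the case $l = 0$ is treated separately because the only admissible orders there are $m = n = 0$, for which $v_0^{00}$ and $w_0^{00}$ are taken to be zero.

```python
import math

def recurrence_coeffs(l, m, n):
    """Coefficients (u, v, w) of the recurrence (2.15) for l >= max(|m|, |n|)."""
    root = math.sqrt(((l + 1) ** 2 - m ** 2) * ((l + 1) ** 2 - n ** 2))
    u = (l + 1) * (2 * l + 1) / root
    if l == 0:
        # Only m = n = 0 is admissible here; v and w are taken as zero (assumption).
        return u, 0.0, 0.0
    v = -m * n * (2 * l + 1) / (l * root)
    w = -(l + 1) * math.sqrt((l ** 2 - m ** 2) * (l ** 2 - n ** 2)) / (l * root)
    return u, v, w

def wigner_d_upward(m, n, B, x, d_start):
    """Return [d_nu(x), ..., d_B(x)] with nu = max(|m|, |n|), using (2.15).

    d_start is the value d_nu^{mn}(x), supplied by the caller.
    """
    nu = max(abs(m), abs(n))
    values = [d_start]
    prev = 0.0                       # d_{nu-1}^{mn}(x) = 0 by convention
    for l in range(nu, B):
        u, v, w = recurrence_coeffs(l, m, n)
        nxt = (u * x + v) * values[-1] + w * prev
        prev = values[-1]
        values.append(nxt)
    return values
```

For $m = n = 0$ and `d_start = 1.0` the recurrence reduces to the classical Legendre recurrence, which gives a convenient sanity check; for $m = n = 1$ one may start from $d_1^{1,1}(x) = (1+x)/2$, which follows from (2.3).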

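Lemma 2.3 The matrix $\mathbf W^{mn}$ given in Definition 2.2 can be separated into

$$\mathbf W^{mn} = \begin{cases} \mathbf T \tilde{\mathbf D}^{mn} & \text{for } m + n \text{ even}, \\ \mathbf S^{-1} \mathbf T \tilde{\mathbf D}^{mn} & \text{for } m + n \text{ odd} \end{cases} \tag{2.17}$$

where

$$\mathbf T = \left( \frac{2 - \delta_{0k}}{B+1} \cos\frac{(2l+1)k\pi}{2(B+1)} \right)_{k,l=0,\dots,B}, \qquad \tilde{\mathbf D}^{mn} = \left( d_l^{mn}\!\left( \cos\frac{(2k+1)\pi}{2(B+1)} \right) \right)_{k=0,\dots,B;\ l=\max(|m|,|n|),\dots,B}$$

and

$$\mathbf S = \mathrm{diag}\left( \sin\frac{(q+1)\pi}{B+2} \right)_{q=0,\dots,B}.$$

Proof. We consider the $B + 1$ nodes of the equispaced grid $X_C$. For these nodes the matrix-vector notation of equation (2.16) reads as

$$\tilde{\mathbf D}^{mn} \hat{\mathbf f}^{mn} = \begin{cases} \mathbf T^{-1} \mathbf t^{mn} & \text{for } m + n \text{ even}, \\ \mathbf T^{-1} \mathbf S \mathbf t^{mn} & \text{for } m + n \text{ odd} \end{cases}$$

where $\hat{\mathbf f}^{mn}$, $\mathbf t^{mn}$, $\mathbf S$ and $\tilde{\mathbf D}^{mn}$ are defined as in Lemma 2.3. A simple calculation shows that the inverse of $\mathbf T$ is given by

$$\mathbf T^{-1} = \left( \cos\frac{(2k+1)l\pi}{2(B+1)} \right)_{k,l=0,\dots,B}.$$

Since $\mathbf S$ is a nonsingular matrix as well, we obtain the unique solution

$$\mathbf t^{mn} = \mathbf T \tilde{\mathbf D}^{mn} \hat{\mathbf f}^{mn} = \mathbf W^{mn} \hat{\mathbf f}^{mn}$$

for even orders $m + n$, and

$$\mathbf t^{mn} = \mathbf S^{-1} \mathbf T \tilde{\mathbf D}^{mn} \hat{\mathbf f}^{mn} = \mathbf W^{mn} \hat{\mathbf f}^{mn}$$

for odd orders $m + n$.

Note that the matrix-vector multiplication $\tilde{\mathbf D}^{mn} \hat{\mathbf f}^{mn}$ will be computed recursively in $O(B^2)$ steps with the Clenshaw algorithm using (2.15) for fixed $m$ and $n$. Neither the multiplication with $\mathbf T$, which is realized with the discrete cosine transform (DCT) from [1] in $O(B \log B)$ flops, nor the multiplication with the diagonal matrix $\mathbf S^{-1}$ increases the asymptotic complexity. Thus the total complexity of the DWT, i.e., of computing $\mathbf t^{mn} = \mathbf W^{mn} \hat{\mathbf f}^{mn}$, is $O(B^2)$. As there are $(2B+1)^2$ vectors $\mathbf t^{mn}$ to be computed, the total complexity of the whole transformation step amounts to $O(B^4)$ flops.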
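The role of the matrix $\mathbf T$ from Lemma 2.3 can be made concrete with a dense sketch that maps values on the grid $X_C$ to Chebychev coefficients. The code below builds $\mathbf T$ and $\mathbf T^{-1}$ explicitly, which costs $O(B^2)$ per product and is meant purely as a check of the formulas; it is not the $O(B \log B)$ DCT used in the text, and all function names are hypothetical.

```python
import math

def dct_matrix_T(B):
    """Dense matrix T from Lemma 2.3 (node values -> Chebychev coefficients)."""
    N = B + 1
    return [[(2.0 - (1.0 if k == 0 else 0.0)) / N
             * math.cos((2 * l + 1) * k * math.pi / (2 * N))
             for l in range(N)] for k in range(N)]

def dct_matrix_T_inv(B):
    """Dense inverse of T (Chebychev coefficients -> node values)."""
    N = B + 1
    return [[math.cos((2 * k + 1) * l * math.pi / (2 * N))
             for l in range(N)] for k in range(N)]

def matvec(A, x):
    """Plain matrix-vector product for nested lists."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def node_values(t):
    """Evaluate sum_l t_l cos(l * theta_k) at the nodes theta_k of X_C."""
    B = len(t) - 1
    thetas = [(2 * k + 1) * math.pi / (2 * (B + 1)) for k in range(B + 1)]
    return [sum(c * math.cos(l * th) for l, c in enumerate(t)) for th in thetas]
```

Applying `matvec(dct_matrix_T(B), node_values(t))` to a coefficient list `t` of length $B + 1$ should return `t` up to rounding errors.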

2.4 Fast Wigner Transform

In this section we outline how to speed up the DWT by adopting the fast polynomial transform together with the stabilization presented in [14]. The resulting algorithm will be referred to as the fast Wigner transform (FWT).

Our aim is the computation of the matrices $\mathbf W^{mn}$ for all $m, n = -B, \dots, B$. Lemma 2.3 shows that we can split $\mathbf W^{mn}$ into a product of the matrices $\mathbf T$, $\mathbf S^{-1}$ and $\tilde{\mathbf D}^{mn}$. The matrix $\mathbf S$ is a diagonal matrix and can be applied in $O(B)$ flops. The effect of the matrix $\mathbf T$ can be computed by means of the DCT in $O(B \log B)$ flops. What remains to be done is to employ a fast algorithm for the transform represented by the matrix $\tilde{\mathbf D}^{mn}$.

A simple but powerful idea is to modify the three-term recurrence relation from (2.15) by defining the Wigner-d functions for $l < \max(|m|, |n|)$ as well. Note that this is the important step which allows us to realize the fast Wigner-d transform in a stable way. This technique was first used in [14] in order to present stable fast spherical Fourier transforms. By that trick we obtain for $m, n = -B, \dots, B$ and $l = 0, \dots, B$ at arbitrary $B \in \mathbb{N}_0$ an extended three-term recurrence formula

$$d_{l+1}^{mn}(x) = (\alpha_l^{mn} x + \beta_l^{mn})\, d_l^{mn}(x) + \gamma_l^{mn}\, d_{l-1}^{mn}(x), \qquad x = \cos\theta, \tag{2.18}$$

where for $\mu = \min(|m|, |n|)$ and $\nu = \max(|m|, |n|)$ the recurrence coefficients read as

$$\alpha_0^{mn} = \begin{cases} 1 & \text{for } m = n, \\ -1 & \text{for } m + n \text{ even},\ m \ne n, \\ 0 & \text{otherwise}, \end{cases} \qquad \alpha_l^{mn} = \begin{cases} (-1)^{m+n+1} & \text{for } 0 < l \le \nu - \mu, \\ \frac{mn}{|mn|} & \text{for } \nu - \mu < l < \nu, \\ u_l^{mn} & \text{for } \nu \le l, \end{cases}$$

$$\beta_l^{mn} = \begin{cases} 1 & \text{for } 0 \le l < \nu, \\ 0 & \text{for } m = n = 0, \\ v_l^{mn} & \text{otherwise} \end{cases} \qquad \text{and} \qquad \gamma_l^{mn} = \begin{cases} 0 & \text{for } l \le \nu, \\ w_l^{mn} & \text{otherwise}, \end{cases}$$

using $u_l^{mn}$, $v_l^{mn}$ and $w_l^{mn}$, the recurrence coefficients from (2.15). To start the recurrence for all $m, n = -B, \dots, B$ we set $d_{-1}^{mn} = 0$, and with $\lambda_{mn} = \frac{\sqrt{(2\mu)!}}{2^\mu \mu!}$ we obtain

$$d_0^{mn}(x) = \begin{cases} \lambda_{mn} & \text{for } m + n \text{ even}, \\ \lambda_{mn} \sqrt{1 - x^2} & \text{for } m + n \text{ odd}. \end{cases}$$

The announced FWT will now exploit the concepts of the fast polynomial transforms, as it performs a change of basis replacing the Wigner-d functions with Chebychev polynomials of the first kind. Following [12] we perform the fast polynomial multiplication based on discrete cosine transforms (DCT). For that we derive associated Wigner-d functions $d_l^{mn}(\cdot, c)$ with a shift parameter $c \in \mathbb{N}_0$ by

$$d_{-1}^{mn}(x, c) = 0, \qquad d_0^{mn}(x, c) = 1, \qquad d_{l+1}^{mn}(x, c) = (\alpha_{l+c}^{mn} x + \beta_{l+c}^{mn})\, d_l^{mn}(x, c) + \gamma_{l+c}^{mn}\, d_{l-1}^{mn}(x, c). \tag{2.19}$$

Instead of only one step as in the modified recurrence (2.18), we will now shift the degree $l$ of $d_l^{mn}$ by $c \in \mathbb{N}_0$ steps using the result of the following lemma.

Lemma 2.4 Let $B \in \mathbb{N}_0$, $m, n = -B, \dots, B$ and $c, l = 0, \dots, B$, and let the functions $d_l^{mn}$ and $d_l^{mn}(\cdot, c)$ be given as in (2.18) and (2.19), respectively. We then obtain

$$d_{l+c}^{mn}(x) = d_c^{mn}(x, l)\, d_l^{mn}(x) + \gamma_l^{mn}\, d_{c-1}^{mn}(x, l+1)\, d_{l-1}^{mn}(x).$$

Proof. The proof is done by induction over $c$.

On these functions we perform the fast polynomial transform that has been described in [12] and [6] for the sphere $S^2$. Applying their algorithms we can speed up our DWT from $O(B^2)$ to only $O(B \log^2 B)$ flops per set of orders $m, n$. For all $O(B^2)$ different sets of orders $m, n$ we obtain a total complexity of $O(B^3 \log^2 B)$ flops for the FWT instead of the previous $O(B^4)$ flops for the DWT.

When using exact arithmetic the algorithm is exact. But when computing in finite precision, small errors introduced by the DCT algorithm cause numerical instabilities, as they are multiplied with large function values of the associated Wigner-d functions. An effective approach developed in [14] is to replace a multiplication step in the cascade built by the algorithm with a stabilization step whenever the functions to be transformed exceed a certain threshold $\kappa$. The algorithmic details of this method as well as its implementation for associated Legendre functions can be found in [12] and [8], respectively. This method can be directly applied to the Wigner-d functions, as they show the same behavior.
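The shift identity of Lemma 2.4 holds for any three-term recurrence of the form (2.18), so it can be checked numerically without the concrete Wigner-d coefficients. The sketch below uses arbitrary coefficient lists as stand-ins for $\alpha_l^{mn}$, $\beta_l^{mn}$, $\gamma_l^{mn}$ and the starting value $p_0 = 1$; these choices and all function names are assumptions made for illustration only.

```python
def base_sequence(alpha, beta, gamma, x, L):
    """p_0..p_L with p_{-1} = 0, p_0 = 1 and the recurrence of the form (2.18)."""
    p = [1.0]
    prev = 0.0
    for l in range(L):
        nxt = (alpha[l] * x + beta[l]) * p[-1] + gamma[l] * prev
        prev = p[-1]
        p.append(nxt)
    return p

def associated(alpha, beta, gamma, x, k, c):
    """Associated function of degree k with shift c, following (2.19)."""
    if k < 0:
        return 0.0
    prev, cur = 0.0, 1.0
    for j in range(k):
        prev, cur = cur, (alpha[j + c] * x + beta[j + c]) * cur + gamma[j + c] * prev
    return cur

def shifted_value(alpha, beta, gamma, x, p, l, c):
    """Right-hand side of Lemma 2.4, which should reproduce p[l + c]."""
    p_lm1 = p[l - 1] if l > 0 else 0.0     # p_{-1} = 0
    return (associated(alpha, beta, gamma, x, c, l) * p[l]
            + gamma[l] * associated(alpha, beta, gamma, x, c - 1, l + 1) * p_lm1)
```

With coefficient lists of sufficient length, `shifted_value(alpha, beta, gamma, x, p, l, c)` and `p[l + c]` should agree up to rounding errors, which mirrors the induction over $c$ mentioned in the proof.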

2.5 An intermediate step

The FWT generates Chebychev coefficients $t_l^{mn}$ out of the $SO(3)$ Fourier coefficients $\hat f_l^{mn}$, as seen in (2.16). Yet another change of basis needs to be done. Our aim is to obtain suitable coefficients for a trivariate Fourier transform. Thus we need to find coefficients $h_l^{mn}$ which satisfy

$$\sum_{l=0}^{B} t_l^{mn} T_l(\cos\beta) = \sum_{l=-B}^{B} h_l^{mn} e^{-il\beta} \tag{2.20}$$

for $m + n$ even and

$$\sin\beta \sum_{l=0}^{B-1} t_l^{mn} T_l(\cos\beta) = \sum_{l=-B}^{B} h_l^{mn} e^{-il\beta} \tag{2.21}$$

for $m + n$ odd.

Definition 2.5 For fixed orders $m$ and $n$ the change of basis between the Chebychev coefficients $\mathbf t^{mn} = (t_0^{mn}, \dots, t_B^{mn})^T$ and the Fourier coefficients $\mathbf h^{mn} = (h_{-B}^{mn}, \dots, h_B^{mn})^T$ given in equations (2.20) and (2.21) is described by the matrix $\mathbf A^{mn} \in \mathbb{C}^{(2B+1) \times (B+1)}$ through

$$\mathbf h^{mn} = \mathbf A^{mn} \mathbf t^{mn}.$$

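To obtain an explicit expression for the matrix $\mathbf A^{mn}$ we first consider the slightly easier case that $m + n$ is even. By $\cos l\beta = \frac12 (e^{il\beta} + e^{-il\beta})$ we obtain

$$\sum_{l=0}^{B} t_l^{mn} T_l(\cos\beta) = \sum_{l=0}^{B} t_l^{mn} \cos l\beta = \frac12 \left( t_0^{mn} + \sum_{l=-B}^{B} t_{|l|}^{mn} e^{il\beta} \right).$$

Setting

$$h_l^{mn} = \begin{cases} 0 & \text{for } |l| = M-1, M, \\ t_0^{mn} & \text{for } l = 0, \\ \frac12 t_{|l|}^{mn} & \text{otherwise}, \end{cases} \tag{2.22}$$

we achieve the desired change of basis. For arbitrary but fixed orders $m$ and $n$ the matrix-vector notation of this transformation can be written with the matrix

$$\mathbf H^{mn} = \begin{pmatrix} & & & \frac12 \\ & & \iddots & \\ & \frac12 & & \\ 1 & & & \\ & \frac12 & & \\ & & \ddots & \\ & & & \frac12 \end{pmatrix} \in \mathbb{C}^{(2B+1) \times (B+1)}.$$

Hence $\mathbf h^{mn} = \mathbf H^{mn} \mathbf t^{mn}$ for $m + n$ even. In the case of $m + n$ odd, we obtain

$$\sin\beta \sum_{l=0}^{B-1} t_l^{mn} T_l(\cos\beta) = \sin\beta \sum_{l=0}^{B-1} t_l^{mn} \cos l\beta = \frac{\sin\beta}{2} \left( t_0^{mn} + \sum_{l=-B}^{B} t_{|l|}^{mn} e^{il\beta} \right). \tag{2.23}$$

It remains to convert $\sin\beta = \frac{i}{2} (e^{-i\beta} - e^{i\beta})$ and to replace the $t_l^{mn}$ according to (2.22), which yields

$$\sin\beta \sum_{l=0}^{B-1} t_l^{mn} T_l(\cos\beta) = \sum_{l=-B}^{B} \frac{i}{2} \left( h_{|l+1|}^{mn} - h_{|l-1|}^{mn} \right) e^{il\beta}.$$

In other words, we need to multiply the matrix $\mathbf H^{mn}$ with

$$\mathbf O^{mn} = \frac{i}{2} \begin{pmatrix} 0 & 1 & & \\ -1 & \ddots & \ddots & \\ & \ddots & \ddots & 1 \\ & & -1 & 0 \end{pmatrix} \in \mathbb{C}^{(2B+1) \times (2B+1)},$$

which yields $\mathbf h^{mn} = \mathbf O^{mn} \mathbf H^{mn} \mathbf t^{mn}$ for $m + n$ odd.

All in all, the matrix $\mathbf A^{mn}$ is composed as follows:

$$\mathbf A^{mn} = \begin{cases} \mathbf H^{mn} & \text{for } m + n \text{ even}, \\ \mathbf O^{mn} \mathbf H^{mn} & \text{for } m + n \text{ odd}. \end{cases}$$

The multiplication with $\mathbf A^{mn}$ can be done in $O(B)$ steps, as the matrix is sparse. So for all possible $m, n$ this yields $O(B^3)$ flops in total for the intermediate step.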
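The change of basis from Chebychev to Fourier coefficients can be checked with a few lines of code. This sketch applies $\mathbf H^{mn}$ and $\mathbf O^{mn}$ entrywise rather than as matrices, and it follows the sign convention of the derivation, i.e. sums of the form $\sum_l h_l e^{il\beta}$. Only the cases $l = 0$ and "otherwise" of (2.22) are implemented, and in the odd case the last Chebychev coefficient $t_B$ is assumed to be zero, in line with the upper summation limit $B - 1$. The function names are hypothetical.

```python
import cmath
import math

def apply_H(t):
    """h = H t as a dict l -> h_l: h_0 = t_0 and h_l = t_|l| / 2 for l != 0."""
    B = len(t) - 1
    return {l: (t[0] if l == 0 else 0.5 * t[abs(l)]) for l in range(-B, B + 1)}

def apply_O(h):
    """(O h)_l = (i/2) (h_{l+1} - h_{l-1}), treating h_l as 0 outside -B..B."""
    B = max(h)
    return {l: 0.5j * (h.get(l + 1, 0.0) - h.get(l - 1, 0.0))
            for l in range(-B, B + 1)}

def fourier_sum(h, beta):
    """Evaluate sum_l h_l exp(i l beta)."""
    return sum(c * cmath.exp(1j * l * beta) for l, c in h.items())

def chebyshev_sum(t, beta):
    """Evaluate sum_l t_l T_l(cos beta) = sum_l t_l cos(l beta)."""
    return sum(c * math.cos(l * beta) for l, c in enumerate(t))
```

For $m + n$ even, `fourier_sum(apply_H(t), beta)` should match `chebyshev_sum(t, beta)`; for $m + n$ odd and `t[-1] == 0`, `fourier_sum(apply_O(apply_H(t)), beta)` should match `math.sin(beta) * chebyshev_sum(t, beta)`. Both steps touch each coefficient a constant number of times, which reflects the $O(B)$ cost stated above.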

2.6 The Nonequispaced Fast SO(3) Fourier Transform (NFSOFT)

Sections 2.3 to 2.5 provide us with the tools necessary for the fast computation of $SO(3)$ Fourier transforms of band-limited functions. Looking again at the NDSOFT in (2.12), the naive approach of computation would require $O(M B^3)$ flops. The basic idea for obtaining a fast algorithm is a simple reorganization of the sums from the NDSOFT in (2.12) and (2.14). Two steps are necessary for this factorization: a separation of variables and a re-indexing of the sums. In Algorithm 1 we will see in detail that the application of Theorem 2.6 reduces the asymptotic complexity from $O(M B^3)$ to $O(M + B^3 \log^2 B)$. In the following theorem we describe the matrix factorization of the NDSOFT given in (2.13).

Theorem 2.6 The matrix $\mathbf D$ given in (2.13) by $\mathbf f = \mathbf D \hat{\mathbf f}$, representing the NDSOFT, can be split into the matrix product

$$\mathbf D = \mathbf F \mathbf A \mathbf W$$

where

$$\mathbf W = \mathrm{diag}\,(\mathbf W^{mn})_{m,n=-B,\dots,B} \in \mathbb{C}^{(2B+1)^2 (B+1) \times (2B+1)^2 (B+1)}$$

is the block diagonal matrix consisting of the matrices $\mathbf W^{mn}$ from Definition 2.2,

$$\mathbf A = \mathrm{diag}\,(\mathbf A^{mn})_{m,n=-B,\dots,B} \in \mathbb{C}^{(2B+1)^3 \times (2B+1)^2 (B+1)}$$

is the block diagonal matrix composed of the blocks from Definition 2.5, and finally $\mathbf F \in \mathbb{C}^{M \times (2B+1)^3}$ is a trivariate nonequispaced Fourier matrix with

$$\mathbf F = \left( e^{-i (m, l, n)(\alpha_q, \beta_q, \gamma_q)^T} \right)_{q=0,\dots,M-1;\ (l,m,n) \in I_B}. \tag{2.24}$$

Proof. For the separation of variables we simply use formula (2.2), which splits up the Wigner-D functions according to the Euler angles of $f(\alpha_q, \beta_q, \gamma_q) \in D_B$. We may rewrite (2.12) for $q = 0, \dots, M-1$ as

$$f(\alpha_q, \beta_q, \gamma_q) = \sum_{l=0}^{B} \sum_{m=-l}^{l} \sum_{n=-l}^{l} \hat f_l^{mn} D_l^{mn}(\alpha_q, \beta_q, \gamma_q) = \sum_{l=0}^{B} \sum_{m=-l}^{l} \sum_{n=-l}^{l} \hat f_l^{mn} e^{-im\alpha_q} d_l^{mn}(\cos\beta_q) e^{-in\gamma_q}. \tag{2.25}$$

The next step is to rearrange these sums into

$$f(\alpha_q, \beta_q, \gamma_q) = \sum_{m=-B}^{B} e^{-im\alpha_q} \sum_{n=-B}^{B} e^{-in\gamma_q} \sum_{l=\max(|m|,|n|)}^{B} \hat f_l^{mn} d_l^{mn}(\cos\beta_q).$$

Performing the change of basis from equation (2.16), we are able to replace the last sum by a linear combination of Chebychev polynomials,

$$f(\alpha_q, \beta_q, \gamma_q) = \sum_{m=-B}^{B} e^{-im\alpha_q} \sum_{n=-B}^{B} e^{-in\gamma_q} \sum_{l=0}^{B} t_l^{mn} T_l(\cos\beta_q) (\sin\beta_q)^{\mathrm{mod}(m+n,2)}.$$

Then (2.20) and (2.21) provide the change from Chebychev coefficients to standard Fourier coefficients, i.e.,

$$f(\alpha_q, \beta_q, \gamma_q) = \sum_{m=-B}^{B} e^{-im\alpha_q} \sum_{n=-B}^{B} e^{-in\gamma_q} \sum_{l=-B}^{B} e^{-il\beta_q} h_l^{mn} = \sum_{m=-B}^{B} \sum_{n=-B}^{B} \sum_{l=-B}^{B} h_l^{mn} e^{-i(m\alpha_q + n\gamma_q + l\beta_q)}$$

for $q = 0, \dots, M-1$. Thus we obtain a trivariate Fourier transform which can be represented by the matrix $\mathbf F$ in (2.24).

Corollary 2.7 The adjoint NFSOFT, i.e., the matrix-vector multiplication with $\mathbf D^H$ as in (2.14), reads in matrix-vector notation as

$$\hat{\mathbf f} = \mathbf D^H \mathbf Q \mathbf f$$

where $\mathbf Q$ is a matrix containing the quadrature weights $w_q^B$. A possible choice for this matrix will be discussed in Section 2.7. Corresponding to the matrix $\mathbf D$, we split up its adjoint in a similar way into

$$\hat{\mathbf f} = \mathbf W^H \mathbf A^H \mathbf F^H \mathbf Q \mathbf f.$$

Our implementation also provides an algorithm to compute this adjoint problem quickly, i.e., with the same asymptotic complexity as the NFSOFT. Note that the explicit factorization of $\mathbf W$ is done similarly as in [9].

Remark 2.8 Instead of transforming the $SO(3)$ Fourier coefficients into Chebychev coefficients first, they could also be directly transformed into standard Fourier coefficients. This is done by expanding the Wigner-d functions directly into a Fourier series

$$d_l^{mn}(\cos\theta) = \sum_{s=-l}^{l} \hat d_{ls}^{mn} e^{is\theta}, \qquad |m|, |n| \le l,$$

where the Fourier coefficients $\hat d_{ls}^{mn}$ are given by

$$\hat d_{ls}^{mn} = (-1)^s e^{i\pi \frac{m+n}{2}}\, d_l^{ms}\!\left(\frac{\pi}{2}\right) d_l^{sn}\!\left(\frac{\pi}{2}\right),$$

see [2, 17] for details. What we gain by this rearrangement is that we omit the polynomial transform and can now compute the DWT by means of an NFFT as well. But in contrast to the previously suggested method, we do not improve the computational complexity of the discrete Wigner transform, as we have to deal with an additional sum depending on the bandwidth $B$. Indeed we even increase the asymptotic complexity to $O(M + B^4)$.

Algorithm 1 Nonequispaced Fast SO(3) Fourier Transform (NFSOFT)

Input:
  $B \in \mathbb{N}$, the bandwidth
  $X_M$, a sampling set according to (2.11)
  $\hat{\mathbf f} = (\hat f_l^{mn})$, the $\frac13 (B+1)(2B+1)(2B+3)$ $SO(3)$ Fourier coefficients of $f \in D_B$

1. Compute $\mathbf t = \mathbf W \hat{\mathbf f}$, the vector of $(B+1)(2B+1)^2$ Chebychev coefficients $t_l^{mn}$, in $O(B^3 \log^2 B)$ flops as described in Section 2.4.

2. Compute $\mathbf h = \mathbf A \mathbf t$, the vector of $(2B+1)^3$ Fourier coefficients $h_l^{mn}$, in $O(B^3)$ flops as in Section 2.5.

3. Compute $\mathbf f = \mathbf F \mathbf h$ by means of a trivariate NFFT in $O(M + B^3 \log B)$ flops.

Output: $\mathbf f = (f(\alpha_q, \beta_q, \gamma_q))_{q=0,\dots,M-1}$, the function samples of $f \in D_B$ as given in (2.25)

Complexity: $O(M + B^3 \log^2 B)$ flops
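Step 3 of Algorithm 1 evaluates a trivariate Fourier sum at arbitrary nodes. The following sketch evaluates that sum directly, which costs $O(M B^3)$ operations and is therefore only a reference for checking results; it is not the NFFT used in the algorithm and it does not call the NFFT library. The dictionary layout of the coefficients and the function name are assumptions made for this illustration.

```python
import cmath

def direct_trivariate_sum(h, B, nodes):
    """Evaluate f_q = sum_{m,l,n=-B..B} h[(m, l, n)] exp(-i(m a_q + l b_q + n c_q)).

    h     : dict mapping (m, l, n) to a complex coefficient (missing keys count as 0)
    nodes : list of Euler angle triples (alpha_q, beta_q, gamma_q)
    """
    rng = range(-B, B + 1)
    samples = []
    for alpha, beta, gamma in nodes:
        # Precompute the one-dimensional exponentials for this node.
        ea = {m: cmath.exp(-1j * m * alpha) for m in rng}
        eb = {l: cmath.exp(-1j * l * beta) for l in rng}
        eg = {n: cmath.exp(-1j * n * gamma) for n in rng}
        total = 0j
        for m in rng:
            for n in rng:
                inner = sum(h.get((m, l, n), 0j) * eb[l] for l in rng)
                total += ea[m] * eg[n] * inner
        samples.append(total)
    return samples
```

A direct evaluation of this kind is useful as a ground truth when validating a fast transform on small bandwidths.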

2.7 Equispaced Fourier Transforms on the Rotation Group

Algorithm 1 enables us to evaluate function values $f(\alpha, \beta, \gamma)$ out of given $SO(3)$ Fourier coefficients $\hat f_l^{mn}$ for any arbitrary sampling set $X_M$. For special choices of $X_M$ we are able to reconstruct the $SO(3)$ Fourier coefficients $\hat f_l^{mn}$ from function samples $f(\alpha, \beta, \gamma)$ using a quadrature rule. The investigation of the adjoint NFSOFT and further quadrature rules will be the subject of future work, but we present one example here, because we want to validate the results of the NFSOFT in Section 3. One idea is to transform $SO(3)$ Fourier coefficients into function samples by the NFSOFT from Theorem 2.6. Then, by means of an adjoint NFSOFT, see Corollary 2.7, we reconstruct the $SO(3)$ Fourier coefficients used as input.

We assume the equispaced grid

$$X_B^{CC} = \{(\alpha_{a_1}, \beta_b, \gamma_{a_2}) : a_1, a_2 = 0, \dots, 2B+1;\ b = 0, \dots, 2B\} \tag{2.26}$$

with uniformly distributed sample points

$$\alpha_{a_1} = \frac{\pi a_1}{B+1}, \qquad \gamma_{a_2} = \frac{\pi a_2}{B+1} \qquad \text{and} \qquad \beta_b = \frac{\pi b}{2B}.$$

Note that technically speaking we are now dealing with an SO(3) Fourier transform at equispaced nodes which could be computed by using the standard FFT instead of the NFFT in the last step of Algorithm 1. Other algorithms for equispaced discrete SO(3) Fourier transforms can also be found e.g. in [10] (see Remark 2.9). However, we apply the NFSOFT as described in Section 2.6. By using the grid XBCC we can compare accuracy and time requirements of our results with the exact computations, as we know the associated quadrature rule. In our example we use a Clenshaw-Curtis quadrature rule for the Wigner-d functions. Let us now collect the formulae necessary for the reconstruction. We like to evaluate a function f ∈ DB for given Fourier coefficients fblmn . Starting with the Fourier coefficients given by the integral Z Z Z 2l + 1 2π π 2π fblmn = f (α, β, γ)Dlmn (α, β, γ) sin β dα dβ dγ, 8π 2 0 0 0 Z Z Z 2l + 1 2π imα 2π inγ π mn = dl (β)f (α, β, γ) sin β dβ dγ dα e e 8π 2 0 0 0 we can employ the following quadrature rule to describe the integral with respect to the two angles α and γ about the z-axis with Z 2π Z 2π f (α, β, γ)eimα einγ dγ dα fb (β) = 0

=

0

2B+1 X 2B+1 X 1 f (αa1 , β, γa2 )ei(mαa1 +nγa2 ) 4(B + 1)2

(2.27)

a1 =0 a2 =0

a1 π a2 π at nodes αa1 = B+1 and γa2 = B+1 for a1 , a2 = 0, . . . , 2B + 1. For the quadrature of the lπ for b = 0, . . . , 2B Wigner-d function we use a Clenshaw-Curtis rule with nodes βb = 2B B and weights wb such that the integral transforms to Z π 2B 1 X B mn wb dl (βb )fb (βb ). (2.28) dmn (β)f (β) sin βdβ = b l 2l + 1 0 b=0

The weights are given according to [5, pp. 86] by

$$w_b^B = w_{2B-b}^B = \frac{\epsilon_b^{2B}}{B} \sum_{j=0}^{B} \epsilon_j^{B}\, \frac{-2}{4j^2-1}\, \cos\frac{\pi b j}{B} \tag{2.29}$$

for $b = 0,\dots,B$, with $\epsilon_l^B = 1 - \frac{\delta_{l0}+\delta_{lB}}{2}$. We are now able to weight the adjoint transform with combined weights such that the Fourier transform is computed exactly for a set of nodes on the grid $X_B^{CC}$. Putting together (2.27), (2.28) and (2.29) yields

$$\hat f_l^{mn} = \frac{1}{4(B+1)^2(2l+1)} \sum_{a_1=0}^{2B+1} \sum_{a_2=0}^{2B+1} \sum_{b=0}^{2B} w_b^B\, f(\alpha_{a_1},\beta_b,\gamma_{a_2})\, \overline{D_l^{mn}(\alpha_{a_1},\beta_b,\gamma_{a_2})}. \tag{2.30}$$
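As a plausibility check of the Clenshaw-Curtis weights, the sketch below evaluates the weight formula for all $b = 0,\dots,2B$ and verifies that the rule integrates powers of $\cos\beta$ against $\sin\beta\,d\beta$ exactly up to degree $2B$, which is the usual exactness property of a Clenshaw-Curtis rule with $2B+1$ nodes. The function name `clenshaw_curtis_weights` is an assumption made for this illustration, and the code reads the formula as $w_b^B = \frac{\epsilon_b^{2B}}{B}\sum_{j=0}^{B}\epsilon_j^B\frac{-2}{4j^2-1}\cos\frac{\pi b j}{B}$.

```python
import numpy as np

def clenshaw_curtis_weights(B: int) -> np.ndarray:
    """Weights w_b^B, b = 0..2B, for the nodes beta_b = pi*b/(2B)."""
    b = np.arange(2 * B + 1)
    j = np.arange(B + 1)
    eps_j = np.ones(B + 1)
    eps_j[[0, B]] = 0.5           # epsilon_j^B halves the first and last term
    eps_b = np.ones(2 * B + 1)
    eps_b[[0, 2 * B]] = 0.5       # epsilon_b^{2B} halves the two end nodes
    coeff = eps_j * (-2.0) / (4.0 * j**2 - 1.0)
    cosines = np.cos(np.pi * np.outer(b, j) / B)
    return eps_b / B * (cosines @ coeff)

B = 8
beta = np.pi * np.arange(2 * B + 1) / (2 * B)
w = clenshaw_curtis_weights(B)

# Integral of cos(beta)^k * sin(beta) over [0, pi]:
# 2/(k+1) for even k, 0 for odd k.
for k in range(2 * B + 1):
    exact = 2.0 / (k + 1) if k % 2 == 0 else 0.0
    assert abs(w @ np.cos(beta) ** k - exact) < 1e-12
```

For $B = 1$ the weights reduce to $(\frac13, \frac43, \frac13)$, i.e. Simpson's rule in the variable $\cos\beta$.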

Seen from an algebraic point of view, we have the vector of SO(3) Fourier coefficients

$$\hat{\mathbf f} = \left(\hat f_l^{mn}\right)_{(l,m,n)\in I_B} \in \mathbb{C}^{\frac{4B^3-B}{3}},$$

the data vector

$$\mathbf f = \left(f(\alpha_{a_1},\beta_b,\gamma_{a_2})\right)_{(\alpha_{a_1},\beta_b,\gamma_{a_2})\in X_B^{CC}} \in \mathbb{C}^{(2B+2)^2(2B+1)}$$

containing the sampled values of the original function, as well as the matrix $\mathbf D \in \mathbb{C}^{(2B+2)^2(2B+1)\times\frac{4B^3-B}{3}}$ consisting of the Wigner-D functions for all indices $(l,m,n)\in I_B$ at all sampling nodes $g(\alpha_{a_1},\beta_b,\gamma_{a_2})$ with $(\alpha_{a_1},\beta_b,\gamma_{a_2})\in X_B^{CC}$. The quadrature weights are comprised in two different diagonal matrices. The first matrix

$$\mathbf W^B = \operatorname{diag}\left(w^B_{a_1 b a_2}\right)_{(a_1,b,a_2)\in X_B^{CC}} \in \mathbb{R}^{\left((2B+2)^2(2B+1)\right)\times\left((2B+2)^2(2B+1)\right)}$$

depends on the sampling nodes $g(\alpha_{a_1},\beta_b,\gamma_{a_2})$ and contains the weights $w^B_{a_1 b a_2} = w_b^B$ ordered according to the indices $a_1, b, a_2$. The remaining parts of the weights are comprised in another diagonal matrix

$$\mathbf V^B = \operatorname{diag}\left(v^B_{mln}\right)_{(l,m,n)\in I_B} \in \mathbb{R}^{\frac{4B^3-B}{3}\times\frac{4B^3-B}{3}}$$

where

$$v^B_{mln} = v^B_l = \frac{1}{4(B+1)^2(2l+1)}.$$

Summarizing, we obtain

$$\mathbf D \hat{\mathbf f} = \mathbf f \quad \text{for the discrete SO(3) Fourier transform,}$$

$$\mathbf V^B \mathbf D^H \mathbf W^B \mathbf f = \hat{\mathbf f} \quad \text{for the adjoint transform,}$$

see also Corollary 2.7. In Section 3 the accuracy of the NFSOFT is tested by computing a combination of NFSOFT and its adjoint.

Remark 2.9 In our experiments we chose to apply the Clenshaw-Curtis type quadrature rule for arguments $\beta_b$ of the Wigner-d functions $d_l^{mn}(\beta_b)$ on the closed interval $\beta_b \in [0,\pi]$. In contrast, the authors of [10] use a slightly different rule: they consider $\beta_b = \frac{(2b+1)\pi}{4B}$ for $b = 0,\dots,2B-1$, i.e., nodes in the open interval $(0,\pi)$, and integrate the Wigner-d functions with weights

$$\tilde w_b^B = \frac{2}{B}\,\sin\left(\frac{\pi(2b+1)}{4B}\right) \sum_{j=0}^{B-1} \frac{1}{2j+1}\,\sin\left((2j+1)(2b+1)\frac{\pi}{4B}\right).$$
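To see how the rule from [10] differs in practice, the following sketch computes weights of the form $\frac{2}{B}\sin\beta_b\sum_{j=0}^{B-1}\frac{\sin((2j+1)\beta_b)}{2j+1}$ at the $2B$ interior nodes $\beta_b = \frac{(2b+1)\pi}{4B}$ and checks two simple integrals against $\sin\beta\,d\beta$. It assumes that the factor inside the sum is $\frac{1}{2j+1}$ with $j$ the summation index; the name `soft_weights` is hypothetical and is not an identifier of the SOFT package.

```python
import numpy as np

def soft_weights(B: int) -> np.ndarray:
    """Weights for the 2B nodes beta_b = (2b+1)*pi/(4B), b = 0..2B-1."""
    beta = (2 * np.arange(2 * B) + 1) * np.pi / (4 * B)
    odd = 2 * np.arange(B) + 1                        # 2j + 1 for j = 0..B-1
    inner = (np.sin(np.outer(beta, odd)) / odd).sum(axis=1)
    return 2.0 / B * np.sin(beta) * inner

B = 8
beta = (2 * np.arange(2 * B) + 1) * np.pi / (4 * B)
w = soft_weights(B)

# Integral of sin(beta) over [0, pi] is 2.
assert abs(w.sum() - 2.0) < 1e-12
# Integral of cos(beta)^2 * sin(beta) over [0, pi] is 2/3.
assert abs(w @ np.cos(beta) ** 2 - 2.0 / 3.0) < 1e-12
```

Unlike the Clenshaw-Curtis nodes, these nodes never include the endpoints $\beta = 0$ and $\beta = \pi$.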


3 Numerical Results and Conclusion

We now present some numerical examples to demonstrate performance and accuracy of the NFSOFT. All algorithms mentioned were implemented in C and tested on a 3.00 GHz AMD Athlon 3000+ with 2 GB main memory, SuSe-Linux (32 bit), using double precision arithmetic. In addition we used the FFTW 3.0.1 [7] and NFFT 3.0.2 [8] libraries. If not mentioned otherwise, we used pseudo-random SO(3) Fourier coefficients $\hat f_l^{mn}$ from the complex square $[-\frac12,\frac12] \times i\,[-\frac12,\frac12]$ and pseudo-random nodes $(\alpha,\beta,\gamma) \in [0,2\pi)\times[0,\pi]\times[0,2\pi)$. Whenever involved, we used the NFFT [8] with oversampling factor $\rho = 2$ and precomputed Kaiser-Bessel functions.

Example 3.1 At first we examine the Wigner transform itself, i.e., the transform represented by the matrix $\mathbf W$ from Definition 2.2. The left graph of Figure 3.1 shows the time requirements for this transform in two variations:

1. DWT: the transform corresponding to the matrix $\mathbf W^{mn}$ as described in Section 2.3, i.e., the transform as described above followed by a discrete cosine transform (dashed line),
2. FWT: the same transform as above but using the concepts of the fast polynomial transform described in Section 2.4 (solid line).
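For readers who want to reproduce inputs of the kind described at the beginning of this section, pseudo-random coefficients and nodes could be drawn as in the following NumPy sketch. The helper names, the fixed seed, and the choice of generator are assumptions made for illustration; the experiments reported here were run with a C implementation whose random number generator is not specified. Note that `uniform` samples half-open intervals, so $\beta = \pi$ itself is never drawn.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, an arbitrary choice

def random_coefficients(count: int) -> np.ndarray:
    """Complex values with real and imaginary parts uniform in [-1/2, 1/2)."""
    return rng.uniform(-0.5, 0.5, count) + 1j * rng.uniform(-0.5, 0.5, count)

def random_nodes(M: int) -> np.ndarray:
    """M Euler-angle triples (alpha, beta, gamma), one triple per row."""
    alpha = rng.uniform(0.0, 2.0 * np.pi, M)
    beta = rng.uniform(0.0, np.pi, M)
    gamma = rng.uniform(0.0, 2.0 * np.pi, M)
    return np.column_stack([alpha, beta, gamma])

coefficients = random_coefficients(10)
nodes = random_nodes(5)
print(coefficients.shape, nodes.shape)
```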

Figure 3.1: The times for performing one Wigner transform using the FWT algorithm (solid) and the DWT (dashed). The left graph shows the CPU time in seconds as a function of the bandwidth. The right graph shows the error $E_\infty$ from (3.1) of the FWT as a function of the bandwidth, at the internal threshold $\kappa = 1000$.

The time has been measured for one transform with orders set to $m = n = 0$ to guarantee the maximum number of nonzero coefficients. As expected, the DWT shows quadratic growth of time while the FWT confirms its complexity of $O(B \log^2 B)$ flops.
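One simple way to read such growth rates off a timing curve is to look at the factor by which the time increases when the bandwidth is doubled. The sketch below computes this factor for the two cost models named above, $B^2$ and $B\log^2 B$; the function names are hypothetical, the logarithm is taken to base 2, and all constants hidden in the $O$-notation are ignored, so the numbers are model values and not measurements.

```python
import math

def doubling_ratio(cost, B: int) -> float:
    """Factor by which cost(B) grows when the bandwidth B is doubled."""
    return cost(2 * B) / cost(B)

def quadratic(B: int) -> float:
    """Cost model for the DWT: B^2."""
    return float(B ** 2)

def b_log2_squared(B: int) -> float:
    """Cost model for the FWT: B * log2(B)^2."""
    return B * math.log2(B) ** 2

for B in (32, 64, 128):
    print(B, doubling_ratio(quadratic, B), round(doubling_ratio(b_log2_squared, B), 3))
```

Under the quadratic model the factor is always 4, while under the $B\log^2 B$ model it stays below 3 for these bandwidths and slowly approaches 2 as $B$ grows.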


Note that both the FWT and the DWT show jumps in time at bandwidths $B = 2^n$ for $n \in \mathbb{N}$. While this is the desired behavior in the FWT algorithm, resulting from the zero padding described in [13], it is not necessary in the slower DWT. These jumps occur due to the discrete cosine transform as implemented in [8]. The jumps in the DWT are due to design decisions in the implementation of the discrete polynomial transform from [8], which also needs zero padding depending on powers of 2. This, however, does not change the asymptotic complexity of the DWT.

From Lemma 2.3 we know that for a fixed set of orders $m, n$ we can split up $\mathbf W^{mn} = \mathbf T \tilde{\mathbf D}^{mn}$ for $m+n$ even and $\mathbf W^{mn} = \mathbf S^{-1} \mathbf T \tilde{\mathbf D}^{mn}$ for $m+n$ odd, respectively. The algorithm resembling the matrix multiplication with $\tilde{\mathbf D}^{mn}$ has also been implemented in [10]. Therefore we compare our transform corresponding to the matrix $\tilde{\mathbf D}^{mn}$, done by evaluating the three-term recurrence (2.18) by the Clenshaw algorithm (see [16, pp. 184]), to the transform corresponding to the matrix $\tilde{\mathbf D}^{mn}$ using the algorithms described and provided by [10]. Note that the algorithm from [10] only applies to bandwidths that are powers of 2 and to a fixed set of input values $\beta_b = \frac{\pi(2b+1)}{4B}$ for $b = 0,\dots,2B-1$. In Table 3.1 we compare it to our algorithms at exactly these bandwidths and for the same set of input values.

B      $\tilde{\mathbf D}^{mn}$ transform from [10]      $\tilde{\mathbf D}^{mn}$ transform cf. Lemma 2.3
16     $3.0 \cdot 10^{-5}$                               $1.2 \cdot 10^{-5}$
32     $9.5 \cdot 10^{-5}$                               $3.5 \cdot 10^{-5}$
64     $4.0 \cdot 10^{-4}$                               $1.3 \cdot 10^{-4}$
128    $1.3 \cdot 10^{-3}$                               $5.2 \cdot 10^{-4}$
256    $1.4 \cdot 10^{-2}$                               $1.9 \cdot 10^{-3}$
512    $6.6 \cdot 10^{-2}$                               $6.5 \cdot 10^{-3}$

Table 3.1: Times of the variants of the Wigner transform.

The second issue we analysed in this example is the error the Wigner transform produces. The right graph of Figure 3.1 shows the error

$$E_\infty = \frac{\|\mathbf t - \mathbf t_{FWT}\|_\infty}{\|\mathbf t\|_1} \tag{3.1}$$

between the Chebyshev coefficients $\mathbf t_{FWT} = (t_l^{mn})_{(l,m,n)\in I_B}$ computed by the FWT and the coefficients in $\mathbf t$ computed by the direct algorithm DWT for different bandwidths $B$. The norms $\|\cdot\|_1$ and $\|\cdot\|_\infty$ are the unweighted $\ell_p$-norms of vectors. For the tested bandwidths up to $B = 300$ the error does not exceed $10^{-12}$.

Example 3.2 Now we consider the NFSOFT (see Algorithm 1) and compare its time requirements to those of the NDSOFT, i.e., the naive evaluation of (2.12). We chose three variations of our algorithm:


1. NFSOFT (FWT, NFFT): the transform described in Algorithm 1 using FWT (see Section 2.4) and NFFT (solid line),
2. NFSOFT (DWT, NFFT): the transform as above but with DWT (see Section 2.3) instead of FWT (dot-dashed line),
3. NDSOFT: the transform using the DWT and the nonequispaced discrete Fourier transform (NDFT), thus directly evaluating equation (2.12) (dashed line).

Figure 3.2: This figure shows the CPU time in seconds for SO(3) Fourier transforms using the NFSOFT (FWT, NFFT) algorithm, the NFSOFT (DWT, NFFT) algorithm and the NDSOFT. The grey curves represent the NFSOFT (FWT, NFFT) algorithm and the NFSOFT (DWT, NFFT) algorithm with parameters $m = 2$ and $\kappa = 10^8$, while in the other transforms we set $m = 5$ and $\kappa = 1000$ to assure a better accuracy. The left graph shows the time as a function of the bandwidth while we set the number of nodes to $M = B^3$. In the right graph we see the same transforms but with the time given as a function of the number of input nodes $M$ while we fix the bandwidth at $B = 24$.

In Figure 3.2 we show the time requirements for the NFSOFT (FWT, NFFT) using NFFT cutoff parameter $m = 5$ and internal FWT threshold $\kappa = 10^3$ (black) as well as for the transform with $m = 2$ and $\kappa = 10^8$ (grey) for different bandwidths. We use the same parameters for the NFSOFT (DWT, NFFT), respectively. More information about these parameters controlling the accuracy of the fast transforms can be found in [12] and [9]. The number of nodes is set to $M = B^3$. As expected, the NFSOFT (FWT, NFFT) outperforms the NDSOFT by far. When lowering the accuracy of these computations (see Example 3.3 for more information) by adjusting the parameters $m$ and $\kappa$ we can speed up our transforms. In the right part of Figure 3.2 we see that the NDSOFT requires much more time than both NFSOFT algorithms when increasing the number of input nodes. We do not spot a difference in time between NFSOFT (FWT, NFFT) and NFSOFT (DWT, NFFT) here.


That is due to the fact that both Wigner transforms are independent of the input nodes (see Section 2.3). But also in the left graph the time difference between NFSOFT (FWT, NFFT) and NFSOFT (DWT, NFFT) is rather small, while it is huge between those two and the NDSOFT. All in all we conclude that the required time depends not so much on the choice between the discrete (DWT) and the fast Wigner transform (FWT) but on the choice between NDFT and NFFT. Comparing the time for an NFSOFT (FWT, NFFT) with that of a three-dimensional NFFT of size $(2B+1)^3$, which corresponds to the third step of the NFSOFT (FWT, NFFT) (see Algorithm 1), we found that, averaged over 80 computations, the NFFT accounts for 79.8% of the total time of the whole algorithm.

B      SOFT                          NFSOFT
4      $1.2 \cdot 10^{-4}$           $5.1 \cdot 10^{-2}$
8      $9.3 \cdot 10^{-4}$           $6.7 \cdot 10^{-1}$
16     $7.3 \cdot 10^{-3}$           $2.8 \cdot 10^{1}$
32     $9.3 \cdot 10^{-2}$           $3.2 \cdot 10^{2}$
52     $7.2 \cdot 10^{-1}$ (∗)       $9.2 \cdot 10^{2}$
64     $8.2 \cdot 10^{-1}$           (∗∗)

Table 3.2: The time requirements of the SOFT compared to the NFSOFT for different bandwidths and $M = (2B)^3$ nodes. The results of the SOFT at bandwidth 52 (∗) have been computed by doing a transform at bandwidth 64 but setting the unnecessary Fourier coefficients to zero. The NFSOFT at bandwidth 64 (∗∗) could not be computed due to insufficient memory.

Additionally we listed the time requirements for the equispaced SOFT algorithm from [10] to compare it with our methods. Note that the SOFT can be applied only for a fixed number of input nodes depending on the bandwidth, which has to be a power of two. Furthermore one can only evaluate on a special grid

$$X_B^{SOFT} = \{(\alpha_{a_1},\beta_b,\gamma_{a_2}) : a_1, a_2, b = 0,\dots,2B-1\}$$

with uniformly distributed sample points

$$\alpha_{a_1} = \frac{\pi a_1}{B}, \qquad \gamma_{a_2} = \frac{\pi a_2}{B} \qquad \text{and} \qquad \beta_b = \frac{\pi(2b+1)}{4B}.$$

Thus one needs $M = (2B)^3$ input nodes here. It can be seen that the SOFT algorithm from [10] is still much faster than the new fast algorithm NFSOFT. This is due to the FFTW [7] being faster than the NFFT [8] (see especially the online documentation, p. 32). Additionally we can only compute transforms up to $B = 52$ before exceeding our 2 GB of main memory, while the SOFT from [10] still works for bandwidth $B = 128$. This again is due to their use of the FFT rather than the NFFT, which in contrast limits them to computing on equispaced grids only.
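The two equispaced grids that appear in this paper differ in both node positions and node counts. The following sketch builds the one-dimensional angle sets of the SOFT grid and of the Clenshaw-Curtis grid $X_B^{CC}$ and counts the resulting nodes; the function names `soft_grid`, `cc_grid` and `node_count` are hypothetical helpers introduced for this illustration.

```python
import numpy as np

def soft_grid(B: int):
    """Angles of the equispaced SOFT grid: 2B values for each Euler angle."""
    a = np.arange(2 * B)
    alpha = np.pi * a / B
    beta = np.pi * (2 * a + 1) / (4 * B)   # interior nodes only
    gamma = np.pi * a / B
    return alpha, beta, gamma

def cc_grid(B: int):
    """Angles of the Clenshaw-Curtis grid: 2B+2 values for alpha and gamma,
    2B+1 values for beta including both endpoints 0 and pi."""
    a = np.arange(2 * B + 2)
    b = np.arange(2 * B + 1)
    return np.pi * a / (B + 1), np.pi * b / (2 * B), np.pi * a / (B + 1)

def node_count(grid) -> int:
    """Number of nodes of the tensor-product grid."""
    alpha, beta, gamma = grid
    return len(alpha) * len(beta) * len(gamma)

B = 16
print(node_count(soft_grid(B)), (2 * B) ** 3)
print(node_count(cc_grid(B)), (2 * B + 2) ** 2 * (2 * B + 1))
```

Both grids are tensor products, which is what makes a standard FFT applicable in the angles $\alpha$ and $\gamma$.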

Example 3.3 As mentioned in Example 3.2, the internal parameters $m$ and $\kappa$ control the accuracy of the NFFT and FWT, respectively. We do not discuss these parameters in detail here as this has been done, e.g., in [12] or [9]. Instead this example shall illustrate briefly how to choose these parameters to obtain a certain accuracy. Figure 3.3 shows the error

$$E_\infty = \frac{\|\mathbf f - \mathbf f_{NFSOFT}\|_\infty}{\|\mathbf f\|_1}$$

between the samples $\mathbf f_{NFSOFT}$ computed by the NFSOFT (FWT, NFFT) and the samples $\mathbf f$ of the NDSOFT. The error is given as a function of the internal FWT threshold $\kappa$ and the NFFT cutoff parameter $m$.

Figure 3.3: The error $E_\infty$ of the NFSOFT at bandwidth $B = 36$ with $M = B^3 = 46656$ nodes. The error is plotted as a function of the threshold $\kappa$ used by the FWT (left) and the cutoff parameter $m$ of the NFFT (right). In our examples we either used $\kappa = 10^3$ with $m = 5$ or $\kappa = 10^8$ with $m = 2$. Both choices are marked in the graphs.

On the one hand we see that for the NFFT the cutoff parameter $m = 8$ will do for single precision and that for $m \ge 6$ the accuracy does not improve significantly. On the other hand an acceptable threshold for single precision in the FWT is given by $\kappa \le 10^8$. Starting from the best accuracy, the error increases more rapidly for $\kappa \ge 10^2$.

Example 3.4 Finally we give an example for the results in Section 2.7. We consider the error

$$E_\infty = \frac{\|\hat{\mathbf f} - \hat{\tilde{\mathbf f}}\|_\infty}{\|\hat{\mathbf f}\|_1}$$

where $\hat{\mathbf f}$ denotes the original SO(3) Fourier coefficients and $\hat{\tilde{\mathbf f}}$ the coefficients computed via the NFSOFT and its adjoint using the Clenshaw-Curtis quadrature rule from (2.30). Figure 3.4 shows this error for different bandwidths. We chose the NFFT cutoff parameters $m = 2$ and $m = 5$ with FWT thresholds $\kappa = 10^8$ and $\kappa = 1000$, respectively. The number of nodes was set to $M = (2B+2)^2(2B+1)$. We also compared our results to the ones from the SOFT in [10], showing that the error of their algorithm grows with the bandwidth while the error of our algorithm stays almost the same for the chosen parameters. For $m = 5$ our NFSOFT even yields more accurate results than the SOFT.

Figure 3.4: The error $E_\infty$ of the NFSOFT and its adjoint as a function of the bandwidth $B$ at equispaced nodes using the quadrature rule as described in Section 2.7 (curves for $m = 2$, $\kappa = 10^8$ and for $m = 5$, $\kappa = 10^3$), as well as the error of the SOFT algorithm from [10].
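The error measure used in Examples 3.1, 3.3 and 3.4 always has the same form: the maximum absolute deviation divided by the $\ell_1$-norm of the reference vector. A minimal sketch of this measure is given below; the function name `relative_error` and the small test vectors are made up for illustration and are not data from the experiments.

```python
import numpy as np

def relative_error(reference, approximation) -> float:
    """E_inf = max |reference - approximation| / sum |reference|."""
    reference = np.asarray(reference, dtype=complex).ravel()
    approximation = np.asarray(approximation, dtype=complex).ravel()
    deviation = np.abs(reference - approximation).max()
    return float(deviation / np.abs(reference).sum())

# Illustrative values only: perturb one entry of a short vector by 1e-6.
f = np.array([1.0 + 0.0j, 0.0 + 2.0j, -1.0 + 0.0j])
f_approx = f + np.array([0.0, 1e-6, 0.0])
print(relative_error(f, f_approx))  # about 2.5e-07, since sum |f| = 4
```

Because the denominator is the $\ell_1$-norm rather than the $\ell_\infty$-norm, this measure decreases with the length of the vector for entries of comparable size, which should be kept in mind when comparing errors across bandwidths.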

Conclusion

We presented a method for the efficient and accurate calculation of the SO(3) Fourier transform

$$f(g_q) = \sum_{l=0}^{B} \sum_{m=-l}^{l} \sum_{n=-l}^{l} \hat f_l^{mn}\, D_l^{mn}(g_q), \qquad q = 0,\dots,M-1,$$

of $B$-band-limited functions $f \in \mathbb{D}_B$ at $M$ arbitrary nodes $g_q \in SO(3)$, as well as the adjoint transform. We showed how to split up the computation of the whole transform

$$\mathbf f = \mathbf D \hat{\mathbf f} = \mathbf F \mathbf A \mathbf W \hat{\mathbf f}$$

into three steps. The new algorithm NFSOFT computes the SO(3) Fourier transform by means of a trivariate NFFT, represented by the matrix $\mathbf F$, and a fast Wigner transform, represented by the matrix $\mathbf W$. Instead of the $O(M B^3)$ operations of the naive evaluation, the NFSOFT requires $O(M + B^3 \log^2 B)$ flops. The numerical results show that the NFSOFT outperforms its counterpart, the NDSOFT.
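To get a feeling for the gap between the two complexity bounds, the following sketch evaluates the expressions $M B^3$ and $M + B^3 \log^2 B$ for $M = B^3$, the choice used in the left graph of Figure 3.2. The function names are hypothetical, the logarithm is taken to base 2, and all constants hidden in the $O$-notation (for instance the dependence on the NFFT cutoff parameter $m$) are ignored, so the ratios are rough model values and not predictions of running times.

```python
import math

def naive_ops(M: int, B: int) -> float:
    """Order of magnitude of the direct evaluation: M * B^3."""
    return float(M * B ** 3)

def fast_ops(M: int, B: int) -> float:
    """Order of magnitude of the fast algorithm: M + B^3 * log2(B)^2."""
    return M + B ** 3 * math.log2(B) ** 2

for B in (8, 16, 32, 64):
    M = B ** 3
    print(B, round(naive_ops(M, B) / fast_ops(M, B), 1))
```

Even in this crude model the ratio grows roughly like $B^3 / \log^2 B$, which matches the qualitative picture of the timings for increasing bandwidth.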


References

[1] G. Baszenski and M. Tasche. Fast polynomial multiplication and convolution related to the discrete cosine transform. Linear Algebra Appl., 252:1-25, 1997.
[2] H. J. Bunge. Texture Analysis in Material Science. Butterworths, 1982.
[3] J. E. Castrillon-Candas, V. Siddavanahalli, and C. Bajaj. Nonequispaced Fourier transforms for protein-protein docking. ICES Report 05-44, University of Texas at Austin, 2005.
[4] G. S. Chirikjian and A. Kyatkin. Engineering Applications of Noncommutative Harmonic Analysis: with Emphasis on Rotation and Motion Groups. CRC Press, Boca Raton, 2001.
[5] P. J. Davis and P. Rabinowitz. Methods of Numerical Integration (Second Edition). Academic Press Inc., 1984.
[6] J. R. Driscoll and D. Healy. Computing Fourier transforms and convolutions on the 2-sphere. Adv. in Appl. Math., 15:202-250, 1994.
[7] M. Frigo and S. G. Johnson. FFTW, C subroutine library. http://www.fftw.org.
[8] J. Keiner, S. Kunis, and D. Potts. NFFT3.0, Softwarepackage, C subroutine library. http://www.tu-chemnitz.de/~potts/nfft, 2006.
[9] J. Keiner and D. Potts. Fast evaluation of quadrature formulae on the sphere. Math. Comput., 2007. accepted.
[10] P. J. Kostelec and D. N. Rockmore. FFTs on the rotation group. Santa Fe Institute Working Papers Series Paper 03-11-060, 2003.
[11] P. J. Kostelec and D. N. Rockmore. The SOFT Package: FFTs on the Rotation Group, collection of C routines. http://www.cs.dartmouth.edu/~geelong/soft, 2006.
[12] S. Kunis and D. Potts. Fast spherical Fourier algorithms. J. Comput. Appl. Math., 161:75-98, 2003.
[13] D. Potts, G. Steidl, and M. Tasche. Fast algorithms for discrete polynomial transforms. Math. Comput., 67:1577-1590, 1998.
[14] D. Potts, G. Steidl, and M. Tasche. Fast and stable algorithms for discrete spherical Fourier transforms. Linear Algebra Appl., 275/276:433-450, 1998.
[15] D. Potts, G. Steidl, and M. Tasche. Fast Fourier transforms for nonequispaced data: A tutorial. In J. J. Benedetto and P. J. S. G. Ferreira, editors, Modern Sampling Theory: Mathematics and Applications, pages 247-270. Birkhäuser, Boston, 2001.
[16] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. Numerical Recipes in C++, Second Edition. Cambridge University Press, Cambridge, 2002.
[17] T. Risbo. Fourier transform summation of Legendre series and D-Functions. J. Geod., 70:383-396, 1996.
[18] H. Schaeben and K. G. v.d. Boogaart. Spherical harmonics in texture analysis. Tectonophysics, 370:253-268, 2003.
[19] D. Varshalovich, A. Moskalev, and V. Khersonski. Quantum Theory of Angular Momentum. World Scientific Publishing, Singapore, 1988.
[20] N. Vilenkin. Special Functions and the Theory of Group Representations. Amer. Math. Soc., Providence, RI, translations of mathematical monographs, 22nd edition, 1968.
