Fast Fourier Transform on Hexagons∗
Huiyuan Li¹ and Jiachang Sun¹
¹ Laboratory of Parallel Computing, Institute of Software, Chinese Academy of Sciences, Beijing 100080, P. R. China.
[email protected],
[email protected]
Summary. We propose fast algorithms for computing the discrete Fourier transform on a hexagon. These algorithms are easy to implement and reduce the computational complexity from $O(M^2)$ to $O(M \log M)$, where $M$ is the total number of sampling points.
Key words: fast Fourier transform, algorithms, hexagon, non-tensor-product
1 Introduction

Encouraged by the success of the fast Fourier transform (FFT) in information and computing sciences [1, 6] and by the increasing demand for applications on complex geometries, we wish to extend the FFT to solve problems efficiently on certain irregular domains [2, 7, 8, 10]. Further motivation comes from the study of sampling schemes for multidimensional isotropic functions. In general, sampling 2-dimensional isotropic functions on a hexagonal lattice is significantly more efficient than sampling such functions on a square lattice [3]. Motivation also arises from the study of mesh generation for solving partial differential equations. As a compromise between rectangular grids and unstructured meshes, hexagonal tessellation is widely used in practical applications: it has a simple data structure, offers more flexibility, is easy to use, and exhibits a weaker grid-orientation effect.

The classical approaches to the multidimensional discrete Fourier transform (DFT) cannot be applied to functions periodically sampled on a hexagon. To overcome this difficulty, we formulate the hexagonal DFT in terms of the bilinear form of a periodicity matrix. Resorting to a factorization of the periodicity matrix, we derive the decimation-in-time (DIT) and the decimation-in-frequency (DIF) hexagonal FFT algorithms. From the technical point of view, the DIT and DIF hexagonal FFT algorithms are both composed of three FFTs sampled on a square, a series of shifted 3-point FFTs, and a certain enciphering/deciphering procedure. As a result, our algorithms can be conveniently and efficiently implemented by using existing FFT packages such as FFTPACK [9] and FFTW. We give some numerical results in the last section, which demonstrate the high performance of our hexagonal FFT algorithms.

∗ This project is supported by National Natural Science Foundation of China (No. 60173021) and Basic Research Foundation of ISCAS (No. CXK35281).
2 Discrete Fourier transform on hexagon

Let $\Omega$ be a centrally symmetric hexagon. We establish the 3-directional partition on $\Omega$ and specify each knot by an integer pair $(k_1, k_2)$, with the center knot indexed by $(0, 0)$. Given a positive integer $N$, we define

\[ \Lambda_N = \left\{ (k_1, k_2) \in \mathbb{Z}^2 : -N \le k_1,\ k_2,\ k_1 + k_2 < N \right\}, \tag{1} \]

which stands for the set of dotted knots indicated in Fig. 1. For a complex number sequence $u_j$ defined on $\Lambda_N$, we define the hexagonal DFT [8],

\[ \hat{u}_k = \frac{1}{|\det(\mathbf{N})|} \sum_{j \in \Lambda_N} u_j \exp\left( -2\pi i\, k^T \mathbf{N}^{-1} j \right), \qquad k \in \Lambda_N, \tag{2} \]

where the integer matrix

\[ \mathbf{N} = \begin{pmatrix} 2N & -N \\ -N & 2N \end{pmatrix}, \]

and $|\det(\mathbf{N})| = \operatorname{card}(I_{\mathbf{N}}) = 3N^2$. Generally, $\mathbf{N}$ is referred to as the periodicity matrix. Due to the orthogonality, we find the inverse formula for (2),

\[ u_j = \sum_{k \in \Lambda_N} \hat{u}_k \exp\left( 2\pi i\, k^T \mathbf{N}^{-1} j \right), \qquad j \in \Lambda_N. \tag{3} \]
The periodicity matrix $\mathbf{N}$ induces an equivalence relation on $\mathbb{Z}^2$. We say that two integer vectors satisfy $m \equiv n \pmod{\mathbf{N}}$ if $m = n + \mathbf{N} r$ for some integer vector $r$. For $j, k \in \Lambda_N$, the congruence $j \equiv k \pmod{\mathbf{N}}$ implies $j = k$. In this case, we call $\Lambda_N$ a period associated with $\mathbf{N}$. Specifically, we use the notation $(j)_{\mathbf{N}}$ to denote the vector which is both congruent to $j$ and contained in $\Lambda_N$.
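The index set (1) and the reduction $(j)_{\mathbf{N}}$ can be illustrated with a minimal sketch. The function names below (`hex_index_set`, `canonical`, `make_reducer`) are hypothetical and not part of the paper; the sketch also assumes that the lattice generated by the columns of $\mathbf{N}$ is equally generated by $(N, N)$ and $(3N, 0)$, which follows from adding the columns and from taking twice the first column plus the second.

```python
def hex_index_set(N):
    """Return Lambda_N of (1) as a list of integer pairs (k1, k2)."""
    return [(k1, k2)
            for k1 in range(-N, N)
            for k2 in range(-N, N)
            if -N <= k1 + k2 < N]


def canonical(j, N):
    """Hypothetical helper: canonical representative of j modulo the lattice N*Z^2.

    Shift j2 into [0, N) with multiples of (N, N), then reduce j1 modulo 3N.
    """
    j1, j2 = j
    c = j2 // N                       # number of (N, N) shifts to remove
    return ((j1 - c * N) % (3 * N), j2 - c * N)


def make_reducer(N):
    """Return a function j -> (j)_N, the element of Lambda_N congruent to j."""
    table = {canonical(k, N): k for k in hex_index_set(N)}
    if len(table) != 3 * N * N:       # Lambda_N must be a complete residue system
        raise ValueError("Lambda_N is not a period of the periodicity matrix")
    return lambda j: table[canonical(j, N)]


if __name__ == "__main__":
    reduce3 = make_reducer(3)
    print(len(hex_index_set(3)))      # number of points in Lambda_3
    print(reduce3((1, 2)))            # representative of (1, 2) inside Lambda_3
```

The lookup table is built once per $N$, so each reduction afterwards costs a dictionary access; a closed-form reduction is possible but is not needed for illustration.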
Fig. 1. The 3-directional partition and the hexagonal indices.
3 Fast Fourier transform on hexagon

In this section, we develop some algorithms to evaluate the hexagonal DFTs (2) and (3) conveniently and efficiently. Without loss of generality, we only consider the following "canonical" form,

\[ X_k = \sum_{j \in \Lambda_N} x_j\, e^{-2\pi i\, k^T \mathbf{N}^{-1} j}, \qquad k \in \Lambda_N. \tag{4} \]
Let $v = (1, -1)^T$. The periodicity matrix $\mathbf{N}$ can be factorized as follows,

\[ \mathbf{N} = PQ, \qquad P = N I, \qquad Q = I + v v^T. \]

Now $P$ is a periodicity matrix for an ordinary square DFT, and its associated period $I_P$ can be chosen in a general way, $I_P = \{ k = (k_1, k_2) : 0 \le k_1, k_2 < N \}$. Define $e = (1, 0)^T$ and $\Gamma = \{-1, 0, 1\}$. Then for any $j, k \in \Lambda_N$, there exist unique vectors $p, m \in I_P$ and unique integers $q, n \in \Gamma$ such that [4, 5]

\[ k = (Qm + nv)_{\mathbf{N}}, \qquad j = (p + Nqe)_{\mathbf{N}}. \]
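The uniqueness of this decomposition, and the phase congruence that the derivation relies on, can be checked by brute force for small $N$. The following is only a sketch under stated assumptions: all function names are hypothetical, the reduction helper is an illustration rather than the authors' implementation, and exact rational arithmetic is used so that "integer difference" can be tested without rounding.

```python
from fractions import Fraction
from itertools import product

GAMMA = (-1, 0, 1)


def hex_index_set(N):
    """Lambda_N of (1)."""
    return [(k1, k2) for k1 in range(-N, N) for k2 in range(-N, N)
            if -N <= k1 + k2 < N]


def make_reducer(N):
    """Hypothetical helper returning j -> (j)_N via a lookup table."""
    def canonical(j):
        c = j[1] // N
        return ((j[0] - c * N) % (3 * N), j[1] - c * N)
    table = {canonical(k): k for k in hex_index_set(N)}
    return lambda j: table[canonical(j)]


def decomposition_is_bijective(N):
    """Both (m, n) -> (Qm + nv)_N and (p, q) -> (p + Nqe)_N must cover Lambda_N."""
    red = make_reducer(N)
    lam = set(hex_index_set(N))
    ks = {red((2 * m1 - m2 + n, -m1 + 2 * m2 - n))
          for m1, m2, n in product(range(N), range(N), GAMMA)}
    js = {red((p1 + N * q, p2))
          for p1, p2, q in product(range(N), range(N), GAMMA)}
    return ks == lam and js == lam


def phase_identity_holds(N):
    """Check j^T N^{-1} k = p^T m / N + n (Nq + p1 - p2) / (3N) modulo 1."""
    red = make_reducer(N)
    for m1, m2, n in product(range(N), range(N), GAMMA):
        k1, k2 = red((2 * m1 - m2 + n, -m1 + 2 * m2 - n))
        for p1, p2, q in product(range(N), range(N), GAMMA):
            j1, j2 = red((p1 + N * q, p2))
            # N^{-1} = (1 / 3N) * [[2, 1], [1, 2]]
            lhs = Fraction(2 * j1 * k1 + j1 * k2 + j2 * k1 + 2 * j2 * k2, 3 * N)
            rhs = (Fraction(p1 * m1 + p2 * m2, N)
                   + Fraction(n * (N * q + p1 - p2), 3 * N))
            if (lhs - rhs).denominator != 1:
                return False
    return True
```

Since each index family has exactly $3N^2$ members, covering $\Lambda_N$ is equivalent to the decomposition being one-to-one.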
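Noting that

\[ j^T \mathbf{N}^{-1} k \equiv N^{-1} p^T m + (3N)^{-1} n (Nq + p_1 - p_2) \pmod{1}, \]

we rewrite the hexagonal DFT sum in (4) as

\[ X_{(Qm+nv)_{\mathbf{N}}} = \sum_{p \in I_P} e^{-\frac{2\pi i}{N} p^T m} \left[ \sum_{q \in \Gamma} x_{(Nqe+p)_{\mathbf{N}}}\, e^{-\frac{2\pi i}{3N} n (Nq + p_1 - p_2)} \right]. \]

Let $w = e^{-\frac{2\pi i}{3N}}$. Define $y_{q,p} = x_{(Nqe+p)_{\mathbf{N}}}$ and

\[ Y_{n,p} = \sum_{q \in \Gamma} y_{q,p}\, w^{n(Nq + p_1 - p_2)}. \tag{5} \]

Then for any $n \in \Gamma$,

\[ X_{(Qm+nv)_{\mathbf{N}}} = \sum_{p \in I_P} Y_{n,p}\, e^{-\frac{2\pi i}{N} p^T m}, \qquad m \in I_P. \tag{6} \]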
This means that the computation of (4) can be carried out by evaluating (5) and (6) successively. Equation (6) is a standard square DFT, which can be computed efficiently by any existing FFT package such as FFTW or FFTPACK, while (5) defines $N^2$ shifted 3-point DFTs, whose computation is depicted by the butterfly symbol in Fig. 2.

In practice, the hexagonal FFT computations are normally performed in place in a 2-dimensional array. Let the input data $x$ be stored in the $3N \times N$ array $A$ in the natural order, i.e., $A(p_1 + Nq, p_2) = x_{(p+Nqe)_{\mathbf{N}}}$, $p \in I_P$, $q \in \Gamma$. The in-place computation implies $y_{q,p} = A(p_1 + Nq, p_2)$, $p \in I_P$, $q \in \Gamma$, which divides the array $A$ into three $N \times N$ subarrays, as indicated in Fig. 4. Denote $p = p_1 - p_2$ and define two arrays of real trigonometric function values,

\[ wr(p) = \cos\left( \frac{2\pi p}{3N} \right), \qquad wi(p) = \sin\left( \frac{2\pi p}{3N} \right), \qquad -N \le p \le 2N - 1. \]

We propose an in-place algorithm in Fig. 3 corresponding to the butterfly symbol in Fig. 2. This algorithm is expressed in terms of complex numbers, but all the multiplications are by real numbers or by $i$.
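The two-stage evaluation by (5) and (6) can be compared against the direct sum (4) with a short script. This is a sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the data are held in dictionaries rather than in the in-place array $A$, and the $N \times N$ square DFT in (6) is evaluated naively here, whereas in practice it would be delegated to an FFT package.

```python
import cmath

GAMMA = (-1, 0, 1)


def hex_index_set(N):
    """Lambda_N of (1)."""
    return [(k1, k2) for k1 in range(-N, N) for k2 in range(-N, N)
            if -N <= k1 + k2 < N]


def make_reducer(N):
    """Hypothetical helper returning j -> (j)_N via a lookup table."""
    def canonical(j):
        c = j[1] // N
        return ((j[0] - c * N) % (3 * N), j[1] - c * N)
    table = {canonical(k): k for k in hex_index_set(N)}
    return lambda j: table[canonical(j)]


def hex_dft_direct(x, N):
    """Direct evaluation of (4); x maps each j in Lambda_N to a complex value."""
    idx = hex_index_set(N)
    X = {}
    for k1, k2 in idx:
        s = 0j
        for j1, j2 in idx:
            # k^T N^{-1} j with N^{-1} = (1 / 3N) * [[2, 1], [1, 2]]
            phase = (2 * k1 * j1 + k1 * j2 + k2 * j1 + 2 * k2 * j2) / (3 * N)
            s += x[(j1, j2)] * cmath.exp(-2j * cmath.pi * phase)
        X[(k1, k2)] = s
    return X


def hex_dft_two_stage(x, N):
    """Evaluate (4) through the shifted 3-point DFTs (5) and the square DFTs (6)."""
    red = make_reducer(N)
    w = cmath.exp(-2j * cmath.pi / (3 * N))
    X = {}
    for n in GAMMA:
        # Stage (5): one shifted 3-point DFT output per p in I_P.
        Y = [[sum(x[red((p1 + N * q, p2))] * w ** (n * (N * q + p1 - p2))
                  for q in GAMMA)
              for p2 in range(N)] for p1 in range(N)]
        # Stage (6): naive N x N square DFT (stand-in for a library FFT).
        for m1 in range(N):
            for m2 in range(N):
                s = sum(Y[p1][p2]
                        * cmath.exp(-2j * cmath.pi * (p1 * m1 + p2 * m2) / N)
                        for p1 in range(N) for p2 in range(N))
                X[red((2 * m1 - m2 + n, -m1 + 2 * m2 - n))] = s
    return X
```

Keying the result by $(Qm + nv)_{\mathbf{N}}$ sidesteps the scrambled storage order, which is treated separately in the text.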
\[ \begin{pmatrix} Y_{-1,p} \\ Y_{0,p} \\ Y_{1,p} \end{pmatrix} = \begin{pmatrix} w^{N-p} & w^{-p} & w^{-p-N} \\ 1 & 1 & 1 \\ w^{p-N} & w^{p} & w^{p+N} \end{pmatrix} \begin{pmatrix} y_{-1,p} \\ y_{0,p} \\ y_{1,p} \end{pmatrix} \]

Fig. 2. The butterfly for the computation of the shifted 3-point DFT.

    u = A(p1 − N, p2) + A(p1 + N, p2);   A(p1 − N, p2) = A(p1 − N, p2) − A(p1 + N, p2)
    A(p1 + N, p2) = A(p1 + N, p2) − A(p1, p2);   A(p1, p2) = A(p1, p2) + u
    u = wi(N − p) ∗ A(p1 − N, p2) + wi(p) ∗ A(p1 + N, p2)
    A(p1 + N, p2) = wr(N − p) ∗ A(p1 − N, p2) − wr(p) ∗ A(p1 + N, p2)
    A(p1 − N, p2) = A(p1 + N, p2) − i ∗ u;   A(p1 + N, p2) = A(p1 + N, p2) + i ∗ u

Fig. 3. An in-place algorithm corresponding to the butterfly in Fig. 2.
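The steps of Fig. 3 can be transcribed onto three scalars and compared with the defining sum (5). In the sketch below, `wr` and `wi` follow the definitions in the text, while the function names and the scalar (rather than array) interface are assumptions made for illustration; `p` stands for $p_1 - p_2$.

```python
import cmath
import math

GAMMA = (-1, 0, 1)


def butterfly(a_minus, a_zero, a_plus, p, N):
    """Apply the Fig. 3 steps to the entries A(p1-N,p2), A(p1,p2), A(p1+N,p2).

    Returns (Y_{-1,p}, Y_{0,p}, Y_{1,p}); only real factors and i multiply data.
    """
    def wr(t):
        return math.cos(2 * math.pi * t / (3 * N))

    def wi(t):
        return math.sin(2 * math.pi * t / (3 * N))

    u = a_minus + a_plus
    a_minus = a_minus - a_plus
    a_plus = a_plus - a_zero              # uses the old middle entry
    a_zero = a_zero + u                   # Y_0 = y_{-1} + y_0 + y_1
    u = wi(N - p) * a_minus + wi(p) * a_plus
    a_plus = wr(N - p) * a_minus - wr(p) * a_plus
    a_minus = a_plus - 1j * u             # Y_{-1}
    a_plus = a_plus + 1j * u              # Y_{1}
    return a_minus, a_zero, a_plus


def three_point_direct(y_minus, y_zero, y_plus, p, N):
    """Evaluate (5) directly for n = -1, 0, 1."""
    w = cmath.exp(-2j * cmath.pi / (3 * N))
    y = {-1: y_minus, 0: y_zero, 1: y_plus}
    return tuple(sum(y[q] * w ** (n * (N * q + p)) for q in GAMMA)
                 for n in GAMMA)
```

The agreement rests on the identity $1 + w^{N} + w^{-N} = 0$, which lets the third column of the butterfly matrix be expressed through the first two.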
The in-place computation of the hexagonal FFT is completed once the in-place square FFTs have been performed on the three subarrays of $A$. However, the final output $X$ is scrambled (see Fig. 4). To get the naturally ordered output, a deciphering procedure is needed: the content of the entry $A(m_1 + Nn, m_2)$, which holds $X_{(Qm+nv)_{\mathbf{N}}}$, should be moved to the location $A(p_1 + Nq, p_2)$, where

\[ p + Nqe \equiv Qm + nv \pmod{\mathbf{N}}, \qquad m, p \in I_P, \quad n, q \in \Gamma. \tag{7} \]

Since $(Q - I)m = (m_1 - m_2)v$, it follows that

\[ (Qm + nv)_{\mathbf{N}} - (m + Nne) \equiv (m_1 - m_2 + n)\, v \pmod{N}, \]

where the congruence is taken componentwise modulo the integer $N$.
The above congruence means that, for any given $m \in I_P$, the source location line $m + \lambda v$ periodically coincides with the destination location line $p + \lambda v$. Thus the deciphering procedure can also be performed in place.

    index of A            input x                             output X
    (row, col)    col 0      col 1      col 2         col 0      col 1      col 2
    (−3, ·)       x_{−3,0}   x_{−3,1}   x_{−3,2}      X_{−1,1}   X_{1,−3}   X_{0,−1}
    (−2, ·)       x_{−2,0}   x_{−2,1}   x_{−2,2}      X_{1,0}    X_{0,2}    X_{2,−2}
    (−1, ·)       x_{−1,0}   x_{−1,1}   x_{−1,2}      X_{−3,2}   X_{−1,−2}  X_{−2,0}
    (0, ·)        x_{0,0}    x_{0,1}    x_{0,2}       X_{0,0}    X_{−1,2}   X_{1,−2}
    (1, ·)        x_{1,0}    x_{1,1}    x_{−2,−1}     X_{2,−1}   X_{1,1}    X_{−3,0}
    (2, ·)        x_{2,0}    x_{−1,−2}  x_{−1,−1}     X_{−2,1}   X_{0,−3}   X_{−1,−1}
    (3, ·)        x_{0,−3}   x_{0,−2}   x_{0,−1}      X_{1,−1}   X_{0,1}    X_{2,−3}
    (4, ·)        x_{1,−3}   x_{1,−2}   x_{1,−1}      X_{−3,1}   X_{2,0}    X_{−2,−1}
    (5, ·)        x_{2,−3}   x_{2,−2}   x_{2,−1}      X_{−1,0}   X_{−2,2}   X_{0,−2}

Fig. 4. The input x in array A is overwritten by the scrambled output X.
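The scrambled layout of Fig. 4 can be regenerated from (6): after the square FFTs, the entry $A(m_1 + Nn, m_2)$ holds $X_{(Qm+nv)_{\mathbf{N}}}$. The sketch below lists, for each array location, the index of the output value stored there; the function names are hypothetical and the reduction helper is an illustrative assumption.

```python
GAMMA = (-1, 0, 1)


def hex_index_set(N):
    """Lambda_N of (1)."""
    return [(k1, k2) for k1 in range(-N, N) for k2 in range(-N, N)
            if -N <= k1 + k2 < N]


def make_reducer(N):
    """Hypothetical helper returning j -> (j)_N via a lookup table."""
    def canonical(j):
        c = j[1] // N
        return ((j[0] - c * N) % (3 * N), j[1] - c * N)
    table = {canonical(k): k for k in hex_index_set(N)}
    return lambda j: table[canonical(j)]


def scrambled_layout(N):
    """Map each array location (m1 + N*n, m2) to the index k of the X_k stored there."""
    red = make_reducer(N)
    layout = {}
    for n in GAMMA:
        for m1 in range(N):
            for m2 in range(N):
                layout[(m1 + N * n, m2)] = red((2 * m1 - m2 + n,
                                                -m1 + 2 * m2 - n))
    return layout


def natural_layout(N):
    """Map each array location (p1 + N*q, p2) to the index (p + Nqe)_N."""
    red = make_reducer(N)
    return {(p1 + N * q, p2): red((p1 + N * q, p2))
            for q in GAMMA for p1 in range(N) for p2 in range(N)}


if __name__ == "__main__":
    layout = scrambled_layout(3)
    for row in range(-3, 6):
        print(row, [layout[(row, col)] for col in range(3)])
```

Inverting `natural_layout` and composing it with `scrambled_layout` gives the permutation that a deciphering pass has to apply.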
The above algorithm is obtained by decimating the output frequency sequence, and we call it the decimation-in-frequency (DIF) hexagonal FFT algorithm. Similarly, letting $j = (Qm + nv)_{\mathbf{N}}$, $k = (p + Nqe)_{\mathbf{N}}$ yields

\[ X_{(Nqe+p)_{\mathbf{N}}} = \sum_{n \in \Gamma} \left[ \sum_{m \in I_P} x_{(Qm+nv)_{\mathbf{N}}}\, e^{-\frac{2\pi i}{N} p^T m} \right] e^{-\frac{2\pi i}{3N} n (Nq + p_1 - p_2)}. \]
Thus, by essentially reversing the DIF algorithm above, we can easily derive the decimation-in-time (DIT) hexagonal FFT algorithm. Owing to the page limit, we omit the details.

Now we analyze the arithmetic cost of the hexagonal FFT algorithms. Let $L$ be the flop count for computing an $N \times N$ square FFT. Then the flop count for computing the hexagonal FFT is $3L + 24N^2$. If FFTPACK [9] is used for computing the square FFTs and if $N = 2^{p_2} 3^{p_3} 4^{p_4} 5^{p_5} 6^{p_6}$, then this flop count amounts to only

\[ N^2 (30 p_2 + 54 p_3 + 51 p_4 + 81.6 p_5 + 80 p_6 - 12) + 36N. \]
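The flop-count formula above is easy to evaluate for a given factorization. The helper below is a hypothetical illustration; it takes the exponents $p_2, \dots, p_6$ explicitly, because a factorization of $N$ into the factors 2, 3, 4, 5, 6 is not unique and the one actually chosen by FFTPACK is not stated here.

```python
def hexagonal_fft_flops(p2=0, p3=0, p4=0, p5=0, p6=0):
    """Evaluate N^2 (30 p2 + 54 p3 + 51 p4 + 81.6 p5 + 80 p6 - 12) + 36 N.

    N = 2^p2 * 3^p3 * 4^p4 * 5^p5 * 6^p6; the exponents are assumed inputs.
    """
    N = 2 ** p2 * 3 ** p3 * 4 ** p4 * 5 ** p5 * 6 ** p6
    weight = 30 * p2 + 54 * p3 + 51 * p4 + 81.6 * p5 + 80 * p6 - 12
    return N * N * weight + 36 * N


if __name__ == "__main__":
    # N = 512 written as 2 * 4^4, and N = 729 written as 3^6.
    print(hexagonal_fft_flops(p2=1, p4=4))
    print(hexagonal_fft_flops(p3=6))
```

Comparing, for example, `hexagonal_fft_flops(p2=9)` with `hexagonal_fft_flops(p2=1, p4=4)` shows how the count for the same $N = 512$ depends on which radices are used.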
4 Numerical results

In this section, we present some numerical results concerning the hexagonal FFT algorithms in § 3. Our experiments have been made under the Linux/g77 environment on PIII 1.5 GHz/1 GB computers. We use the FFTPACK subroutines to implement square FFTs.

    N      3          9          27         81         243        729
    DFT    2.824E-5   5.619E-4   3.962E-2   3.535      525.13     54650.
    DIF    1.937E-5   4.475E-5   3.001E-4   3.819E-3   4.188E-2   0.515
    DIT    1.629E-5   4.074E-5   3.123E-4   3.621E-3   4.283E-2   0.527

Table 1. Time expenses (in seconds) of hexagonal FFT and hexagonal DFT.

    N      4          8          16         32         64         128        256        512
    DFT    4.611E-5   3.682E-4   5.061E-3   8.028E-2   1.427      26.521     734.73     12057.
    DIF    1.918E-5   4.468E-5   1.040E-4   3.876E-4   1.669E-3   8.365E-3   4.318E-2   0.211
    DIT    1.591E-5   3.720E-5   1.043E-4   4.017E-4   1.807E-3   8.764E-3   4.381E-2   0.221

Table 2. Time expenses (in seconds) of hexagonal FFT and hexagonal DFT.
Table 1 and Fig. 5 (center) indicate that as $N$ trebles, the elapsed time for evaluating the hexagonal DFT by straightforward summation increases about 81 times, while the elapsed time for the hexagonal FFTs only increases about 9 times. Table 2 and Fig. 5 (left) imply that as $N$ doubles, the elapsed time for evaluating the hexagonal DFT by straightforward summation increases about 16 times, while the elapsed time for the hexagonal FFTs only increases about 4 times. We also plot the elapsed time of the hexagonal FFTs on the right side of Fig. 5 as $N$ increases successively from 100 to 6000. This plot shows nearly linear curves. All the numerical results demonstrate that our hexagonal FFT algorithms are very efficient: they reduce the complexity of the hexagonal DFT from $O(N^4)$ to $O(N^2 \log N)$.

By performing round-trip FFTs (first FFT and then inverse FFT), we find that the errors of the hexagonal FFTs grow very slowly as $N$ increases (see Fig. 6). Thus we conclude that our hexagonal FFT algorithms are efficient, accurate and stable.
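The growth factors quoted above can be recomputed from the tabulated timings. The sketch below copies the DFT and DIF rows of Table 2 and forms the ratio of consecutive entries; the variable and function names are hypothetical, and the individual ratios fluctuate around the quoted values because the measurements are noisy.

```python
# Timings in seconds, copied from Table 2 (N = 4, 8, ..., 512).
N_VALUES = [4, 8, 16, 32, 64, 128, 256, 512]
DFT_TIMES = [4.611e-5, 3.682e-4, 5.061e-3, 8.028e-2, 1.427, 26.521, 734.73, 12057.0]
DIF_TIMES = [1.918e-5, 4.468e-5, 1.040e-4, 3.876e-4, 1.669e-3, 8.365e-3, 4.318e-2, 0.211]


def growth_ratios(times):
    """Ratio of each timing to the previous one, i.e. the cost of doubling N."""
    return [later / earlier for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    for n, r_dft, r_dif in zip(N_VALUES[1:], growth_ratios(DFT_TIMES),
                               growth_ratios(DIF_TIMES)):
        print(f"N = {n:4d}: DFT x{r_dft:6.2f}, DIF x{r_dif:5.2f}")
```

A factor near $2^4 = 16$ per doubling is what an $O(N^4)$ method predicts, and a factor slightly above $2^2 = 4$ is what $O(N^2 \log N)$ predicts.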
Fig. 5. Logarithm plots of the elapsed time $t$ for hexagonal DFTs and FFTs. Left: $N = 2^n$ ($\log_2 t$ versus $\log_2 N$); Center: $N = 3^n$ ($\log_3 t$ versus $\log_3 N$); Right: $N = 100n$ ($\lg t$ versus $\lg N$). ∗ : DIF FFT; ◦ : DIT FFT; the remaining marker: DFT.
Fig. 6. The errors $\varepsilon$ of the round-trip hexagonal FFTs, plotted as $\lg \varepsilon$. Left: $N = 2^n$ (versus $\log_2 N$); Center: $N = 3^n$ (versus $\log_3 N$); Right: $N = 100n$ (versus $N$). ∗ : DIF FFT; ◦ : DIT FFT.
References

1. J. W. Cooley and J. W. Tukey. An algorithm for the machine calculation of complex Fourier series. Math. Comp., 19:297–301, 1965.
2. D. E. Dudgeon and R. M. Mersereau. Multidimensional Digital Signal Processing. Prentice-Hall, Englewood Cliffs, NJ, 1984.
3. R. M. Mersereau. The processing of hexagonally sampled two-dimensional signals. Proc. IEEE, 67:930–949, 1979.
4. R. M. Mersereau and T. C. Speake. Unified treatment of Cooley-Tukey algorithms for the evaluation of multidimensional DFT. IEEE Transactions on Acoustics, Speech and Signal Processing, 29:1011–1018, 1981.
5. R. M. Mersereau and T. C. Speake. The processing of periodically sampled multidimensional signals. IEEE Transactions on Acoustics, Speech and Signal Processing, 31:188–194, 1983.
6. D. N. Rockmore. The FFT: An algorithm the whole family can use. Computing in Science & Engineering, 2(1):60–64, 2000.
7. J. Sun. Generalized Fourier transformation in an arbitrary triangular domain. Advances in Computational Mathematics, to appear.
8. J. Sun. Multivariate Fourier series over a class of non tensor-product partition domains. J. Comput. Math., 21:53–62, 2003.
9. P. N. Swarztrauber. Vectorizing the FFTs. In G. Rodrigue, editor, Parallel Computations, pages 51–83. Academic Press, 1982.
10. J. L. Zapata and G. X. Ritter. Fast Fourier transform for hexagonal aggregates. Journal of Mathematical Imaging and Vision, 12:183–197, 2000.