On Computing the Fast Fourier Transform over Finite Fields

Sergei Fedorenko¹ and Peter Trifonov

St.Petersburg State Polytechnical University, Distributed Computing and Networking Department, Politekhnitcheskaya st., 21, Room 9–104, St.Petersburg, 194021, Russia.
In this paper we consider the problem of computing the Fast Fourier Transform of a polynomial over finite fields. The polynomial is decomposed into a sum of linearized polynomials, which allows one to use fast evaluation algorithms. An example of an FFT algorithm with complexity lower than that of the best algorithm known to the authors is provided.
1 Introduction
Many algorithms exist for computing the Fast Fourier Transform (FFT) over the field of complex numbers. Many of them can also be used over finite fields, but in practice the construction of an FFT for a given finite field remains hard and poorly formalized [3]. In this paper we suggest a universal approach to the construction of FFT algorithms over fields of characteristic 2. The algorithm is based on the decomposition of an arbitrary polynomial into a sum of linearized polynomials, which makes it possible to use efficient evaluation algorithms [2].
2 Basic definitions
Definition 1. A polynomial $L(x)$ over $GF(2^m)$ is called linearized if
$$L(x) = \sum_i l_i x^{2^i}, \quad l_i \in GF(2^m).$$
¹ This work was supported by the Alexander von Humboldt Foundation.

It can easily be proved that $L(a + b) = L(a) + L(b)$ holds for linearized polynomials. This property leads to the following theorem, presented here in a slightly modified form:
Theorem 1 ([1]). Let $x \in GF(2^m)$ and let $\beta_0, \beta_1, \ldots, \beta_{m-1}$ be a basis of the field. If
$$x = \sum_{i=0}^{m-1} x_i \beta_i, \quad x_i \in GF(2),$$
then
$$L(x) = \sum_{i=0}^{m-1} x_i L(\beta_i).$$
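The additivity property and Theorem 1 can be checked numerically on a small field. The following is a minimal sketch, not part of the original derivation: it assumes $GF(2^3)$ built from the polynomial $x^3 + x + 1$, elements stored as 3-bit integers (so the standard basis $1, \alpha, \alpha^2$ is 1, 2, 4), and an arbitrarily chosen linearized polynomial. The names `gf8_mul`, `linearized` and `via_theorem1` are hypothetical helpers introduced only for this illustration.

```python
def gf8_mul(a, b):
    """Multiply in GF(2^3) = GF(2)[x]/(x^3 + x + 1); elements are 3-bit ints."""
    r = 0
    for _ in range(3):
        if b & 1:
            r ^= a          # addition in characteristic 2 is XOR
        b >>= 1
        a <<= 1
        if a & 0b1000:      # reduce modulo x^3 + x + 1
            a ^= 0b1011
    return r


def linearized(coeffs, y):
    """Evaluate L(y) = sum_i coeffs[i] * y^(2^i)."""
    acc = 0
    power = y               # holds y^(2^i) at step i
    for c in coeffs:
        acc ^= gf8_mul(c, power)
        power = gf8_mul(power, power)
    return acc


COEFFS = [3, 5, 6]          # arbitrary l_0, l_1, l_2 (illustrative assumption)
BASIS = [1, 2, 4]           # standard basis 1, alpha, alpha^2


def via_theorem1(x):
    """Compute L(x) as the sum of L(beta_i) over the nonzero coordinates x_i."""
    acc = 0
    for i, beta in enumerate(BASIS):
        if (x >> i) & 1:    # with this basis, x_i is simply bit i of x
            acc ^= linearized(COEFFS, beta)
    return acc


# Check L(a + b) = L(a) + L(b) for all pairs, and Theorem 1 for all x.
additive = all(
    linearized(COEFFS, a ^ b) == linearized(COEFFS, a) ^ linearized(COEFFS, b)
    for a in range(8) for b in range(8)
)
theorem_holds = all(linearized(COEFFS, x) == via_theorem1(x) for x in range(8))
print(additive, theorem_holds)
```

The point of the theorem is visible in `via_theorem1`: once the $m$ values $L(\beta_i)$ are known, evaluating $L$ at any field element needs only additions.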
Let us consider the cyclotomic cosets modulo $n = 2^m - 1$ over $GF(2)$:
$$\{0\},\ \{k_1, k_1 2, k_1 2^2, \ldots, k_1 2^{m_1-1}\},\ \ldots,\ \{k_l, k_l 2, k_l 2^2, \ldots, k_l 2^{m_l-1}\},$$
where $k_i \equiv k_i 2^{m_i} \bmod n$. Then any polynomial $f(x) = \sum_{i=0}^{n-1} f_i x^i$, $f_i \in GF(2^m)$, can be decomposed as
$$f(x) = \sum_{i=0}^{l} L_i(x^{k_i}), \quad L_i(y) = \sum_{j=0}^{m_i-1} f_{k_i 2^j \bmod n}\, y^{2^j}. \tag{1}$$
In fact, (1) represents a way of grouping the numbers $0 \le s < n$ into cyclotomic cosets: $s \equiv k_i 2^j \bmod n$. Since every $s$ belongs to exactly one coset, this decomposition is always possible. Note that the term $f_0$ can be represented as $L_0(x^0)$, where $L_0(y) = f_0 y$.
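The grouping of exponents into cyclotomic cosets is easy to compute. The sketch below is only an illustration under the definitions above; the function name `cyclotomic_cosets` is a hypothetical helper, and each coset is listed starting from its smallest element $k_i$ so that position $j$ holds $k_i 2^j \bmod n$, as in (1).

```python
def cyclotomic_cosets(m):
    """Return the cyclotomic cosets modulo n = 2^m - 1 over GF(2).

    Each coset is listed as [k, 2k, 4k, ...] (mod n), starting from its
    smallest element k, so index j holds k * 2^j mod n.
    """
    n = (1 << m) - 1
    seen = set()
    cosets = []
    for k in range(n):
        if k in seen:
            continue
        coset = []
        s = k
        while s not in seen:    # follow s -> 2s mod n until the cycle closes
            seen.add(s)
            coset.append(s)
            s = (2 * s) % n
        cosets.append(coset)
    return cosets


print(cyclotomic_cosets(3))
print(cyclotomic_cosets(4))
```

For $m = 3$ this yields the cosets $\{0\}$, $\{1, 2, 4\}$ and $\{3, 6, 5\}$, which is the grouping used in Example 1. For $m = 4$ the coset sizes are 1, 4, 4, 2 and 4, all of which divide $m$ as expected.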
3 Fast Fourier Transform
Let us consider the problem of computing the FFT of a polynomial $f(x)$, i.e. computing the values $f(\alpha^j) = \sum_{i=0}^{n-1} f_i \alpha^{ij}$, where $\alpha$ is a primitive element of $GF(2^m)$. According to (1), $f(\alpha^j)$ can be represented as $f(\alpha^j) = \sum_{i=0}^{l} L_i(\alpha^{j k_i})$. It is known [1] that $\alpha^{k_i}$ is a root of a minimal polynomial of degree $m_i$ and thus belongs to the subfield $GF(2^{m_i})$, $m_i \mid m$. Thus all the values $(\alpha^{k_i})^j$ lie in $GF(2^{m_i})$, and so they can be decomposed in some basis $(\beta_{i,0}, \ldots, \beta_{i,m_i-1})$ of the subfield: $\alpha^{j k_i} = \sum_{s=0}^{m_i-1} a_{ijs} \beta_{i,s}$, $a_{ijs} \in GF(2)$. Then, according to Theorem 1,
$$F_j = f(\alpha^j) = \sum_{i=0}^{l} \sum_{s=0}^{m_i-1} a_{ijs} L_i(\beta_{i,s}) = \sum_{i=0}^{l} \sum_{s=0}^{m_i-1} a_{ijs} \left( \sum_{p=0}^{m_i-1} \beta_{i,s}^{2^p} f_{k_i 2^p \bmod n} \right). \tag{2}$$
This equation can be represented in matrix form as $F = ALf$, where $F = \|F_j\|$, $f = \|f_j\|$ (with its components grouped by cyclotomic cosets), $A$ is a matrix with elements $a_{ijs} \in GF(2)$, and $L$ is a block diagonal matrix with elements $\beta_{i,s}^{2^p}$. It is possible to choose the same basis for all the linearized polynomials of the same degree $m_i$ in (1) and thus obtain a very small number of distinct blocks in the matrix $L$. This can simplify the construction of a fast algorithm for multiplying the matrix $L$ by a vector $f$ over $GF(2^m)$.

The described transforms are similar to the ones presented in [4]. The main differences are:

1. The matrix $L$ has a regular structure, which can be used for further optimization.
2. There is a single multiplication of a binary matrix by a vector, which can be used for better optimization.
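The factorization $F = ALf$ can be assembled mechanically from (2). The following is a minimal sketch for $m = 3$ only, under assumptions that are not fixed by the text above: the field is built from $x^3 + x + 1$, the standard basis $(1, \alpha, \alpha^2)$ is used for both three-element cosets, and the coset $\{0\}$ is placed last, which is the ordering of Example 1. All function and constant names are hypothetical.

```python
def gf8_mul(a, b):
    """Multiply in GF(2^3) = GF(2)[x]/(x^3 + x + 1); elements are 3-bit ints."""
    r = 0
    for _ in range(3):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000:
            a ^= 0b1011
    return r


def gf8_pow(a, e):
    """Raise a to the non-negative integer power e by repeated multiplication."""
    r = 1
    for _ in range(e):
        r = gf8_mul(r, a)
    return r


ALPHA = 0b010                           # alpha, a root of x^3 + x + 1
N = 7
COSETS = [[1, 2, 4], [3, 6, 5], [0]]    # coset {0} placed last (assumed ordering)
BASES = [[1, 2, 4], [1, 2, 4], [1]]     # standard basis; (1) for the coset {0}


def build_A():
    """Row j holds the GF(2) coordinates a_ijs of alpha^(j*k_i) in each basis."""
    rows = []
    for j in range(N):
        row = []
        for coset, basis in zip(COSETS, BASES):
            v = gf8_pow(ALPHA, (j * coset[0]) % N)
            # With the standard basis stored as bits, coordinate s is bit s of v.
            row += [(v >> s) & 1 for s in range(len(basis))]
        rows.append(row)
    return rows


def apply_L(f):
    """Compute S = L * f, where block i has entries beta_{i,s}^(2^p)."""
    S = []
    for coset, basis in zip(COSETS, BASES):
        for beta in basis:
            acc = 0
            for p, idx in enumerate(coset):      # idx = k_i * 2^p mod n
                acc ^= gf8_mul(gf8_pow(beta, 2 ** p), f[idx])
            S.append(acc)                        # S entry equals L_i(beta_{i,s})
    return S


def fft_via_ALf(f):
    """Compute F = A * (L * f); A is binary, so its product needs only XORs."""
    S = apply_L(f)
    out = []
    for row in build_A():
        acc = 0
        for bit, s in zip(row, S):
            if bit:
                acc ^= s
        out.append(acc)
    return out


def dft_direct(f):
    """Reference: F_j = sum_i f_i * alpha^(i*j)."""
    out = []
    for j in range(N):
        acc = 0
        for i, fi in enumerate(f):
            acc ^= gf8_mul(fi, gf8_pow(ALPHA, (i * j) % N))
        out.append(acc)
    return out


f = [5, 1, 7, 2, 0, 6, 3]               # arbitrary test coefficients f_0..f_6
print(fft_via_ALf(f) == dft_direct(f))
```

This sketch makes no attempt to be fast; it only shows that all field multiplications are confined to `apply_L`, while the product with the binary matrix returned by `build_A` uses additions alone.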
Example 1. A polynomial $f(x) = \sum_{i=0}^{6} f_i x^i$, $f_i \in GF(2^3)$, can be represented as
$$\begin{aligned}
f(x) &= L_0(x^0) + L_1(x) + L_2(x^3), \\
L_0(y) &= f_0 y, \\
L_1(y) &= f_1 y + f_2 y^2 + f_4 y^4, \\
L_2(y) &= f_3 y + f_6 y^2 + f_5 y^4.
\end{aligned}$$
Let us choose the standard basis $(1, \alpha, \alpha^2)$ of $GF(2^3)$ and represent the components of the Fourier transform as
$$\begin{aligned}
f(\alpha^0) &= L_0(\alpha^0) + L_1(\alpha^0) + L_2(\alpha^0) = L_0(1) + L_1(1) + L_2(1), \\
f(\alpha^1) &= L_0(\alpha^0) + L_1(\alpha) + L_2(\alpha^3) = L_0(1) + L_1(\alpha) + L_2(1) + L_2(\alpha), \\
f(\alpha^2) &= L_0(\alpha^0) + L_1(\alpha^2) + L_2(\alpha^6) = L_0(1) + L_1(\alpha^2) + L_2(1) + L_2(\alpha^2), \\
f(\alpha^3) &= L_0(\alpha^0) + L_1(\alpha^3) + L_2(\alpha^2) = L_0(1) + L_1(1) + L_1(\alpha) + L_2(\alpha^2), \\
f(\alpha^4) &= L_0(\alpha^0) + L_1(\alpha^4) + L_2(\alpha^5) = L_0(1) + L_1(\alpha) + L_1(\alpha^2) + L_2(1) + L_2(\alpha) + L_2(\alpha^2), \\
f(\alpha^5) &= L_0(\alpha^0) + L_1(\alpha^5) + L_2(\alpha) = L_0(1) + L_1(1) + L_1(\alpha) + L_1(\alpha^2) + L_2(\alpha), \\
f(\alpha^6) &= L_0(\alpha^0) + L_1(\alpha^6) + L_2(\alpha^4) = L_0(1) + L_1(1) + L_1(\alpha^2) + L_2(\alpha) + L_2(\alpha^2),
\end{aligned}$$
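where $\alpha$ is a root of the primitive polynomial $x^3 + x + 1$. These equations can be represented in matrix form as
$$F = \begin{pmatrix} F_0 \\ F_1 \\ F_2 \\ F_3 \\ F_4 \\ F_5 \\ F_6 \end{pmatrix} =
\begin{pmatrix}
1 & 0 & 0 & 1 & 0 & 0 & 1 \\
0 & 1 & 0 & 1 & 1 & 0 & 1 \\
0 & 0 & 1 & 1 & 0 & 1 & 1 \\
1 & 1 & 0 & 0 & 0 & 1 & 1 \\
0 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 0 & 1 & 0 & 1 \\
1 & 0 & 1 & 0 & 1 & 1 & 1
\end{pmatrix}
\begin{pmatrix} L_1(1) \\ L_1(\alpha) \\ L_1(\alpha^2) \\ L_2(1) \\ L_2(\alpha) \\ L_2(\alpha^2) \\ f_0 \end{pmatrix} = AS.$$
Then the problem of computing the FFT of the polynomial $f(x)$ can be represented as
$$F = A \begin{pmatrix} W & 0 & 0 \\ 0 & W & 0 \\ 0 & 0 & 1 \end{pmatrix} (f_1, f_2, f_4, f_3, f_6, f_5, f_0)^T, \quad
W = \begin{pmatrix} 1 & 1 & 1 \\ \alpha & \alpha^2 & \alpha^4 \\ \alpha^2 & \alpha^4 & \alpha^8 \end{pmatrix}. \tag{3}$$
The first stage of the algorithm is computing
$$\begin{pmatrix} b_{i1} \\ b_{i2} \\ b_{i3} \end{pmatrix} = W \begin{pmatrix} a_{i1} \\ a_{i2} \\ a_{i3} \end{pmatrix}$$
for $i = 1, 2$, where $a_{11} = f_1$, $a_{12} = f_2$, $\ldots$, $a_{23} = f_5$ (see (3)). This can be implemented using the following algorithm:
$$\begin{aligned}
b_{i1} &= a_{i1} + a_{i2} + a_{i3}, \\
b_{i2} &= \alpha (a_{i1} + a_{i2}) + \alpha^4 (a_{i2} + a_{i3}), \\
b_{i3} &= \alpha^2 (a_{i1} + a_{i3}) + \alpha^4 (a_{i2} + a_{i3}),
\end{aligned}$$
which requires 3 multiplications and 6 additions. At the end of the first stage one obtains the vector $S = (S_0, \ldots, S_6) = (b_{11}, b_{12}, b_{13}, b_{21}, b_{22}, b_{23}, f_0)$. The following algorithm computes the product of the binary matrix $A$ with the vector $S$:
$$\begin{aligned}
T_7 &= S_3 + S_6, \\
T_8 &= S_1 + S_5, \\
T_9 &= S_1 + S_4, \\
T_{10} &= S_2 + S_5, \\
T_{12} &= S_0 + T_8, \\
F_0 = T_0 &= S_0 + T_7, \\
F_1 = T_1 &= T_7 + T_9, \\
F_2 = T_2 &= T_7 + T_{10}, \\
F_3 = T_3 &= S_6 + T_{12}, \\
F_4 = T_4 &= T_1 + T_{10}, \\
T_{11} &= S_2 + T_3, \\
F_6 = T_6 &= T_9 + T_{11}, \\
F_5 = T_5 &= T_6 + T_8.
\end{aligned}$$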
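The two stages of Example 1 can be transcribed directly and compared with a term-by-term evaluation of $f(\alpha^j)$. The sketch below is only a check of the formulas above, assuming elements of $GF(2^3)$ are stored as 3-bit integers with $\alpha = 2$; the function names `stage1`, `fft7` and `dft_direct` are hypothetical.

```python
def gf8_mul(a, b):
    """Multiply in GF(2^3) = GF(2)[x]/(x^3 + x + 1); elements are 3-bit ints."""
    r = 0
    for _ in range(3):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000:
            a ^= 0b1011
    return r


ALPHA, ALPHA2, ALPHA4 = 2, 4, 6     # alpha, alpha^2, alpha^4 = alpha^2 + alpha


def stage1(a1, a2, a3):
    """Multiply (a1, a2, a3) by W using 3 multiplications and 6 additions."""
    u = a1 ^ a2
    v = a2 ^ a3
    w = a1 ^ a3
    m1 = gf8_mul(ALPHA, u)
    m2 = gf8_mul(ALPHA4, v)
    m3 = gf8_mul(ALPHA2, w)
    return u ^ a3, m1 ^ m2, m3 ^ m2


def fft7(f):
    """Length-7 transform following the two stages of Example 1."""
    S = [*stage1(f[1], f[2], f[4]), *stage1(f[3], f[6], f[5]), f[0]]
    # Second stage: 13 additions implementing the product A * S.
    T7 = S[3] ^ S[6]
    T8 = S[1] ^ S[5]
    T9 = S[1] ^ S[4]
    T10 = S[2] ^ S[5]
    T12 = S[0] ^ T8
    F0 = S[0] ^ T7
    F1 = T7 ^ T9
    F2 = T7 ^ T10
    F3 = S[6] ^ T12
    F4 = F1 ^ T10
    T11 = S[2] ^ F3
    F6 = T9 ^ T11
    F5 = F6 ^ T8
    return [F0, F1, F2, F3, F4, F5, F6]


def dft_direct(f):
    """Reference transform: F_j = sum_i f_i * alpha^(i*j), term by term."""
    powers = [1]
    for _ in range(6):
        powers.append(gf8_mul(powers[-1], ALPHA))   # alpha^0 .. alpha^6
    out = []
    for j in range(7):
        acc = 0
        for i in range(7):
            acc ^= gf8_mul(f[i], powers[(i * j) % 7])
        out.append(acc)
    return out


f = [5, 1, 7, 2, 0, 6, 3]           # arbitrary test coefficients f_0..f_6
print(fft7(f) == dft_direct(f))
```

Because the transform is linear over $GF(2)$, agreement on the 21 inputs that have a single nonzero bit is enough to establish agreement on every input.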
Thus the FFT of length 7 can be computed with $2 \times 3 = 6$ multiplications and $2 \times 6 + 13 = 25$ additions. This is one addition fewer than in the algorithm presented in [4].

If one chooses a normal basis in (2), then all the blocks of the matrix $L$ are circulant matrices. Thus the multiplication by this matrix can be considered as the computation of a set of circular convolutions of length $m_i \mid m$. Application of these techniques allowed us to construct an FFT algorithm of length 15 with $3 \times 5 + 1 \times 1 = 16$ multiplications and $3 \times 10 + 1 \times 2 + 45 = 77$ additions, which is better than the ones presented in [4] (16 multiplications and 100 additions) and [3] (20 multiplications and 70 additions).
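The remark about normal bases can be illustrated on $GF(2^3)$. The sketch below rests on an assumption that is not taken from the text: the element $\gamma = \alpha^3$ is used as the generator of the normal basis $(\gamma, \gamma^2, \gamma^4)$. It builds one block of $L$ with entries $\beta_s^{2^p}$ and checks that multiplying it by a vector coincides with a circular convolution of the basis with the index-reversed vector. All names are hypothetical.

```python
from itertools import product


def gf8_mul(a, b):
    """Multiply in GF(2^3) = GF(2)[x]/(x^3 + x + 1); elements are 3-bit ints."""
    r = 0
    for _ in range(3):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000:
            a ^= 0b1011
    return r


def frobenius_orbit(g, m=3):
    """Return [g, g^2, g^4, ...] (m elements) by repeated squaring."""
    out = [g]
    for _ in range(m - 1):
        out.append(gf8_mul(out[-1], out[-1]))
    return out


GAMMA = 3                                   # alpha^3 = alpha + 1 (assumed choice)
basis = frobenius_orbit(GAMMA)              # normal basis: gamma, gamma^2, gamma^4
block = [frobenius_orbit(beta) for beta in basis]   # entry [s][p] = beta_s^(2^p)


def block_times_vector(f):
    """Multiply the block of L by the vector f = (f_0, f_1, f_2)."""
    out = []
    for row in block:
        acc = 0
        for entry, fp in zip(row, f):
            acc ^= gf8_mul(entry, fp)
        out.append(acc)
    return out


def circular_convolution(g, h):
    """Circular convolution over GF(2^3): out[s] = sum_q g[(s - q) mod m] * h[q]."""
    m = len(g)
    out = []
    for s in range(m):
        acc = 0
        for q in range(m):
            acc ^= gf8_mul(g[(s - q) % m], h[q])
        out.append(acc)
    return out


# Entry [s][p] depends only on (s + p) mod 3, so each row is a cyclic shift.
is_circulant = all(block[s][p] == basis[(s + p) % 3]
                   for s in range(3) for p in range(3))

# The block product equals a circular convolution with the reversed vector.
matches = all(
    block_times_vector(f) == circular_convolution(basis, [f[0], f[2], f[1]])
    for f in product(range(8), repeat=3)
)
print(is_circulant, matches)
```

With this choice each row of the block is a cyclic shift of the previous one, so any fast circular convolution algorithm of length 3 can be applied to the first stage.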
4 Conclusions
In this paper we suggested an algorithm for computing the FFT of a polynomial over $GF(2^m)$. The task of computing the FFT of length $n = 2^m - 1$ can be reduced to computing circular convolutions of length $m_i \mid m$ and multiplying a binary matrix by a vector.
References

[1] E.R. Berlekamp. Algebraic coding theory. New York: McGraw-Hill, 1968.
[2] S.V. Fedorenko and P.V. Trifonov. Finding roots of polynomials over finite fields. Accepted for publication in IEEE Transactions on Communications, 2002.
[3] E.M. Gabidulin and V.B. Afanasyev. Coding in radioelectronics. Moscow, Radio i Svyaz, 1986 (in Russian).
[4] T.G. Zakharova. Fourier transform evaluation in fields of characteristic 2. Problems of Information Transmission, 28(2): 154–167, 1992.