Arithmetic and factorization of polynomials over F2 - CiteSeerX

0 downloads 0 Views 457KB Size Report
Jun 14, 1996 - The multiplication method of Karatsuba & Ofman (1962) has an ... polynomial factorization algorithm over F2 is described in section 10, ...
Arithmetic and factorization of polynomials over F 2

Joachim von zur Gathen and Jurgen Gerhard

Fachbereich 17 Mathematik-Informatik Universitat-GH Paderborn D-33095 Paderborn, Germany e-mail: fgathen,[email protected] June 14, 1996

Abstract

We describe algorithms for polynomial multiplication and polynomial factorization over the binary eld F 2 and their implementation. They allow polynomials of degree up to 100 000 to be factored in about one day of CPU time, distributing the work on two machines. ;

1 Introduction The problem of polynomial factorization over the binary eld F is, given a monic polynomial f 2 F [x], to compute the factorization f = f e1    frer with monic irreducible pairwise distinct polynomials f ; : : : ; fr 2 F [x] and positive e ; : : : ; er 2 N . The eciency of the currently known algorithms for this problem relies on fast polynomial arithmetic, in particular, on fast polynomial multiplication. The multiplication method of Karatsuba & Ofman (1962) has an asymptotic running time of O(n : ) operations in F for polynomials of degree less than n, which is better than the O(n ) bound for the nave multiplication algorithm, but still too slow in practice for large n. Schonhage (1977) gives an O(n log n loglog n) algorithm based on a ternary FFT with roots of unity of 3-power order. Reischert (1995) implemented several algorithms for polynomial multiplication over F , including those by Karatsuba & Ofman, Schonhage, and Cantor (see below). Shoup (1995) has successfully implemented a fast FFT-based algorithm for multiplying polynomials over F p for a prime p, using a modular approach, but it seems to be practical only when p is not too small. Cantor (1989) presented an algorithm for multiplying polynomials of degree less than n over F p for a prime p that uses O(mpm) multiplications and O(m pm) scalar operations (i.e., additions or multiplications by elements of F p ) in the eld F pm , where m is the least power of p with 4n  mpm . It behaves particularly well in the case p = 2. In contrast to the FFT-based algorithms cited above, which evaluate and interpolate at suitably subgroups of the multiplicative group of a 2

2

1

1

2

1

1 59

2

2

2

2

This work is partly supported by the DFG Sonderforschungsbereich 376 \Massive Parallelitat: Algorithmen, Entwurfsmethoden, Anwendungen". 

1

( nite) eld, Cantor's approach uses additive subgroups, i.e., F p -linear subspaces of F pm . Montgomery (1991) implemented a polynomial factorization algorithm that uses Cantor's algorithm as a subroutine, and was able to factor a sparse polynomial of degree more than 200; 000 over F in about 45 hours. In section 2, we generalize Cantor's method to work for prime powers p and arbitrary m. We state explicit \O"-free upper bounds on the time and space cost of our algorithms. In section 3, simpli ed versions of the algorithms for p = 2 are discussed. In section 4, we present arithmetic circuits for our algorithms. In section 10, we report on experiments in which our implementation of Cantor's original method turned out to be superior to these new variants. In the last 5 years, dramatic progress in the area of polynomial factorization has been made, both in theory and in practice. The classical algorithms for polynomials over nite elds are due to Berlekamp (1967, 1970), Cantor & Zassenhaus (1981), and Ben-Or (1981). Recently, many variants and asymptotically faster algorithms have been proposed by von zur Gathen & Shoup (1992), Kaltofen (1992), Niederreiter (1994), Gao & von zur Gathen (1994), Kaltofen & Lobo (1994), and Kaltofen & Shoup (1995). Implementations are described in Kaltofen & Lobo (1994), Shoup (1995), and Fleischmann & Roelse (1995). Section 5 gives an outline of the structure of some modern polynomial factorization algorithms. In sections 6 and 7, we discuss and analyze a new variant of the distinct degree factorization stage in those algorithms, using interval partitions with polynomially growing interval sizes. In sections 8 and 9, we indicate how the distinct degree factorization stage over F can be further speeded up by a special way to compute interval polynomials and by the use of an irreducibility test running in parallel on a second machine. Finally, an implementation of the polynomial factorization algorithm over F is described in section 10, including examples of running times, on pseudorandom polynomials. We have concentrated on optimizing our implementation for the distinct degree factorization stage. Of course, more work is required to also optimize for cases where the input is known to be special, say when we factor trinomials or cyclotomic polynomials. In particular, we have not yet optimized the equal degree factorization stage of our software. An Extended Abstract of this paper will appear in Proc. ISSAC '96, ACM Press, Zurich, Switzerland. 2

2

2

2 Fast polynomial multiplication over F

q

Let F q be a nite eld with q elements and m a positive integer. The extension eld F qm is an m-dimensional vector space over F q . Suppose that W  F qm is a xed k-dimensional subspace, where 1  k  m, and that we want to solve the following problems. 2

Problem 2.1 (Multipoint evaluation) Given f 2 F qm [x] of degree less than qk , compute f ( ) for all 2 W . Problem 2.2 (Interpolation) Given a map : W ?! F qm , compute the unique polynomial f 2 F qm [x] of degree less than q k satisfying f ( ) = ( ) for all 2 W . The algorithms presented below for these two problems admit a natural parallelization, but here we only discuss the sequential versions. We x a basis ( ; : : : ; k ) of W over F q , and for 0  i  k let Wi be the subspace Wi = spanf ; : : : ; ig  W of dimension i. The sets Wi form a strictly ascending chain 1

1

f0g = W ( W (    ( Wk? ( Wk = W; 0

1

(1)

1

and for 1  i  k we have the recursive decomposition

Wi =

[

c2Fq

(c i + Wi? ) 1

of Wi into q pairwise disjoint cosets, which generalizes to

+ Wi =

[

c2Fq

( + c i + Wi? )

(2)

1

for arbitrary 2 F qm . Next, we de ne the sequence of polynomials si 2 F qm [x] for 0  i  k by

si =

Y

2Wi

(x ? ):

Obviously, si is a monic squarefree polynomial of degree qi, and corresponding to (1), we have x = s j s j    j s k ? j sk : The following lemma collects some useful properties of the polynomials si. 0

1

1

Lemma 2.3 The following hold for 0  i  k. (i) The recursion formulae

si =

Y

c2Fq

(si? ? csi? ( i)) = sqi? ? si? ( i)q  si? 1

1

1

1

1

hold if i  1. (ii) si is an F q -linearized polynomial, i.e., si (f + g) = si(f )+ si (g) and si(cf ) = csi(f ) for all f; g 2 F qm [x] and c 2 F q .

3

(iii) si ? si( ) =

Y c2Fq

(si? ? si? ( + c i)) = 1

1

Y 2 +Wi

(x ? ) for all 2 F qm .

Note that statement (iii) of the lemma reduces to statement (i) and the de nition of the si, respectively, if = 0. Proof. We prove i. and ii. by simultaneous induction on i. The case i = 0 is immediate from the de nitions, so we may assume that i > 0. We write v = si? ( i) for short. Note that v 6= 0 since i 62 Wi. (i) Using (2) and the linearity of si? , we obtain 1

si = =

Y

1

Y

c2Fq 2c i +Wi?1

Y

(x ? ) =

(si? (x ? c i)) =

Y Y

c2Fq 2Wi?1

Y

(x ? c i ? )

(si? ? cv) = vq 1

1

Y ?1 (v si?1 ? c)

c2Fq c2Fq c2Fq q q q ? q ? 1 q ? 1 = v (v si?1 ? v si?1) = si?1 ? v si?1; Q we have used Fermat's Little Theorem xq ? x =

where c2Fq (x ? c) with ? v si? substituted for x in the last line. (ii) By what we just proved and the induction hypothesis for i., we have si(f + g) = si? (f + g)q ? vsi? (f + g)  q   = si? (f ) + si? (g) ? v si? (f ) + si? (g) = si? (f )q ? vsi? (f ) + si? (g)q ? vsi? (g) = si(f ) + si(g); and si(cf ) = si? (cf )q ? vsi? (cf ) = cq si? (f )q ? vc  si? (f ) = c si? (f )q ? vsi? (f ) = csi(f ): 1

1

1

1

1

1

1

1

1

1

1

si ? si( ) = si(x ? ) =

and

Y c2Fq

Y

c2Fq

1

1

1

(iii) Using i. and ii., we get

=

1

1

1

=

1

Y c2Fq

(si? (x ? ) ? csi? ( i )) 1

1

(si? ? si? ( ) ? csi? ( i)) 1

1

1

(si? ? si? ( + c i ));

si ? si( ) = si(x ? ) =

1

1

Y 2Wi

4

(x ? ? ) =

Y 2 +Wi

(x ? ): 2

The decomposition (2) and statement (iii) of the lemma suggest the following algorithm for Problem 2.1. First, we generalize the situation to the evaluation of a polynomial f 2 F qm [x] of degree less than qi at all points of the coset + Wi, where 0  i  k and is an arbitrary element of W . Problem 2.1 is then the special case where i = k and = 0. Next, we partition the coset + Wi into q disjoint cosets + c i + Wi? , one for each c 2 F q , as in (2). For any such c, we divide f with remainder by the polynomial si? ? si? ( + c i), which has exactly the elements of ( + c i) + Wi? as roots, by the previous lemma, and then proceed recursively with the remainder rc of degree less than qi? and the coset ( + c i) + Wi? . This will yield the desired result since f and rc have the same values at all points of ( + c i) + Wi? . Algorithm 2.4 (Multipoint evaluation) We assume that the polynomials si 2 F qm [x] for 0  i  k as de ned above and the values si ( j ) for 0  i < j  k are precomputed and stored. Input: i 2 N with 0  i  k, f 2 F qm [x] of degree less than q i, and ci ; : : : ; ck 2 Fq . P Output: The values f ( ) for all 2 + Wi , where = i 0 the cost for the nontrivial steps is: 2. 2(k ? i) ? 1 + 2(q ? 1) scalar operations for the computation of si? ( ) and si? ( ) + csi? ( i) for c 2 F q , and at most i(qi ? qi? ) multiplications and i(qi ? qi? ) scalar operations for each of the q divisions with remainder, together no more than  i(qi ? qi) multiplications and  i(qi ? qi) + 2(k ? i) + 2q ? 3 scalar operations, 3. q  M (i ? 1) multiplications and q  S (i ? 1) scalar operations. Hence we have the recursive inequalities M (i)  qM (i ? 1) + i(qi ? qi); S (i)  qS (i ? 1) + i(qi ? qi) + 2(k ? i) + 2q ? 3 for i > 0. Let T; U : N ?! N be functions with T (0) = 0 and T (i)  qT (i ? 1) + U (i) for i > 0. Unfolding the recursion, we can explicitly bound T in terms of U : 1

1

1

1

1

1

1

1

+1

+1

+1

+1

T (i) 

X i?j q U (j ): j i

1

This is easily proved by induction on i. Hence

M (i) 

X X i?j j +1 j q j (q ? q ) = (q ? 1)qi

j i

j i

1

1

6

j = q ?2 1 (i + i)qi; 2

1

and

S (i) 

X i?j j +1 j q (j (q ? q ) + 2(k ? j ) + 2q ? 3) j i

1

= (q ? 1)qi

X j i

j + (2(k ? i) + 2q ? 3)

X i?j X q +2 (i ? j )qi?j

j i i q ?1

1

j i

1

1

i i = q ?2 1 (i + i)qi + (2(k ? i) + 2q ? 3) q ? 1 + 2 (q ? 1)(iqq ??1)q(q ? 1)   q ? 1 q ? 1 2 2 q ? 3 2 q i i i = i q + 2 iq + q ? 1 kq + q ? 1 ? (q ? 1) qi 2   2 2 q 2 q ? 3 ? q ? 1 (k ? i) ? q ? 1 ? (q ? 1) : 2

2

2

2

2

Letting i = k, we get M (k)  q ?2 1 k qk + q ?2 1 kqk ; + 5 kqk + 2q ? 7q + 3 (qk ? 1) S (k)  q ?2 1 k qk + q 2(?q 2?q 1) (q ? 1)  q ?2 1 k qk + q +2 7 kqk : Denote by A(i) the work space for elements of F qm used by the algorithm on input i. The computation of si? ( ) in step 2 can be done in two storage registers. One division with remainder in step 2 can be carried out in place using one additional register. In a sequential implementation, the storage for the divisions in step 2 and the recursive calls in step 3 can be reused for each c 2 F q , hence space for the output of one division with remainder and three additional registers are sucient, and we have A(0) = 0 and A(i)  A(i ? 1) + qi + 3 for 1  i  k, or k X j (q + 3) = q q ??1 q + 3k  q ?q 1 qk + 3k: 2 A(k)  j k 2

2

2

2

2

2

1

+1

1

The importance of sparse polynomials for fast multipoint evaluation was already remarked by Fiduccia (1972) (in a context where linearized polynomials over nite elds did not occur). For the interpolation, let 2 W and suppose that for each c 2 F q we have already computed rc 2 F qm [x] of degree less than qi? such that rc( ) = ( ) for all 2 + c i + Wi? ; or equivalently, f  rc mod (si? ? si? ( ) ? csi? ( i)); 1

1

1

1

7

1

where f is the unique interpolating polynomial of degree less than qi at all the points of + Wi (note that si? ( ) + csi? ( i) = si? ( + c i), by the linearity of si? ). The following lemma gives an explicit formula for f . Lemma 2.6 Let F be a eld, n 2 N > , U  F with #U = n, s 2 F [x] a nonconstant polynomial, and (fu )u2U  F [x] a sequence of polynomials of smaller degree than s. Then the system of congruences f  fu mod (s ? u) for all u 2 U (3) has a unique solution f 2 F [x] of degree less than n deg s which is given by the formula X Y s?w f = fv (4) v ? w: 1

1

1

1

0

v2U

w2U w6=v

Note that (4) reduces to the familiar Lagrange interpolation formula when s = x. Proof. First, we show that f as de ned in (4) satis es (3). Fix u 2 U . Then Q s?w w2U; w6 v v?w is divisible by s ? u if v 6= u, and hence Y s?w Y u?w f  fu  f  fu mod (s ? u): u w2U u ? w w2U u ? w =

w6=u

w6=u

The degree of f is clearly less than n deg s. Uniqueness follows from the Chinese Remainder Theorem; we give a proof here for completeness. Suppose that there is another polynomial f 0 of degree less than n deg s satisfying f 0  fu mod (s ? u) for all u 2 U . Then f ? f 0 is divisible by s ? u for all u 2 U . If u 6= v, the two polynomials s ? u and s ? v are relatively prime since any common divisor divides their di erence u ? v 2 F nf0g. Therefore, f ? f 0 is divisible by the prodQ uct polynomial u2U (s?u) of degree n deg s, and hence is the zero polynomial. 2 We write s = si? ? si? ( ) and v = si? ( i) for short. Then, by Lemma 2.6 with F = F qm , U = fcv : c 2 F q g, and fcv = rc, f is given by Q X Y s ? dv X 1 d2Fq ;d6 c(s ? dv ) Q rc f = r = c vq? c2Fq d2Fq ;d6 c c ? d c2Fq d2Fq cv ? dv 1

1

1

=

1

= ?q?1 v

1

d6=c

X c2Fq

rc

Q

d2Fq (s ? dv )

s ? cv

=

=

?1

rc s ?sis? s(i ( +) c ) : q ? 1 si?1( i) c2Fq i?1 i?1 i Q

X

In the second line, we used Wilson's identity a2Fq a = ?1 (cf. Lidl & Niederreiter 1983). The last equation follows from Lemma 2.3 (iii). The above discussion leads to the following algorithm. 8

Algorithm 2.7 (Interpolation)

We assume that the polynomials si 2 F qm [x] for 0  i  k as de ned above and the values si ( j ) and ?si ( i+1 )1?q for 0  i < j  k are precomputed and stored. Input: i 2 N with 0  i  k, ci+1 ; : : : ; ck 2 FP q , and an array of values ( ) 2 F qm , one for each point 2 + Wi , where = i 0 the cost for the nontrivial steps is: 2. q  S (i ? 1) scalar operations and q  M (i ? 1) multiplications, 3. for each c 2 F q at most  (i + 1)qi? multiplications and (i + 1)qi? scalar operations for the multiplication of the polynomial rc of degree less than qi? and the sparse polynomial si ? si( ),  iqi multiplications and iqi scalar operations for the subsequent division of the product polynomial of degree less than qi + qi? by the sparse polynomial si? ? si? ( + c i) of degree qi? , and 1

0

1

1

1

1

1

1

9

 2(k ? i) ? 1 scalar operations for the computation of si( ),  at most (q ?1)qi scalar operations for the addition of the q polynomials

of degree less than qi,  at most qi multiplications of the coecients of the result with the ? q precomputed ?si? ( i) , together no more than  i(qi + qi) + 2qi multiplications and  i(qi + qi) + qi + 2(k ? i) ? 1 scalar operations. This yields the recursive inequalities 1

1

+1

+1

+1

M (i)  qM (i ? 1) + i(qi + qi) + 2qi; S (i)  qS (i ? 1) + i(qi + qi) + qi + 2(k ? i) ? 1: +1

+1

The solutions are

M (i) 

+1

X i?j X q (j (qj+1 + qj ) + 2qj ) = (q + 1)qi j + 2iqi j i

j i

1

1

= q +2 1 (i + i)qi + 2iqi; 2

and

S (i) 

X i?j j +1 j q (j (q + q ) + qj+1 + 2(k ? j ) ? 1) j i

1

= (q + 1)qi

X j i

j + iqi + (2(k ? i) ? 1) +1

X i?j X q +2 (i ? j )qi?j

j i i q ?1

1

j i

1

1

i i = q +2 1 (i + i)qi + iqi + (2(k ? i) ? 1) q ? 1 + 2 (q ? 1)(iqq ??1)q(q ? 1)     q + 1 2 2 q q + 1 1 i i i = 2 i q + 2 + q iq + q ? 1 kq ? q ? 1 + (q ? 1) qi   2 q 1 2 ? q ? 1 (k ? i) + q ? 1 + (q ? 1) : Finally, for i = k we get M (k)  q +2 1 k qk + q +2 5 kqk ; S (k)  q +2 1 k qk + 3q2(?q ?2q1)+ 3 kqk ? (3qq??1)1 (qk ? 1)  q +2 1 k qk + 3q 2+ 5 kqk : 10 2

+1

2

2

2

2

2

2

2

2

2

Let A(i) denote the work space of the algorithm on input i. The computation of si( ) in step 3 can be done in two storage registers. In a sequential implementation, the space for the recursive calls of the algorithm in step 2 can be reused for each c 2 F q , and the sum in step 3 can be computed in an accumulating way. The multiplication of one rc with si ? si( ) and the subsequent division with remainder can be carried out in place in the output space of the division, which can also hold the input rc, using one additional register. The sum is accumulated in the output space for f . Hence space for the output of one division with remainder in step 3 and three additional registers are sucient, and we have A(0) = 0 and A(i)  A(i ? 1) + qi + qi? + 3 for 1  i  k, or 1

A(k) 

X j k

1

k + 1 qk + 3k: 2 (qj + qj? + 3) = (q + 1) qq ??11 + 3k  qq ? 1 1

Remark. Cantor (1989) considers the special case where q = p is prime, k = m = pd is a power of the characteristic, and Wi = ker 'i for the F p -endomorphism ': 7! p ? of F ppd . He gives a basis ( ; : : : ; pd ) of W = Wk = F ppd over F p such that Wi = spanf ;    ; ig and '( i) = i? 2 Wi? for 1  i  k. Then 1

1

1

1

'(si? ( i)) = si? ('( i)) = 0; 1

1

and hence si? ( i) 2 ker ' = W = F p , and si? ( i ) 6= 0 since i 62 Wi? . This implies that the polynomials si satisfy 1

1

1

1

s = x; s = xp ? x; si = spi? ? si? ( i)p? si? = spi? ? si? = s  si? 0

1

1

1

1

1

1

1

1

1

for i  0, or more generally si = sj  si?j for 0  j  i and si( ) = 'i( ) for 2 F pm . So the coecients of the si are in F p , which reduces the number of multiplications in F pm to O(kqk ). Furthermore, si( j ) = 'i( j ) = j?i for i < j , and the values si( j ) need not even be precomputed. In the precomputation stage for the algorithms 2.4 and 2.7, the following objects have to be computed.  The valuesl si( j ) for 0  i < j  k. This can be done by rst computing the powers jq for 0  l < j , taking at most 2(j ? 1)(dlog qe? 1) multiplications in F qm for j , and then multiplying them with the appropriate coecients of si and adding up the results. The total cost for this is at most X X i = (dlog qe +1)(k ? k)+ 61 (k ? k) { 2(dlog qe? 1) (j ? 1)+ i c. To compute x mod b for some e 2 N , we use the following variant of von zur Gathen & Shoup's algorithm. Algorithm 9.1 Frobenius powers. Input: b 2 F [x] and e 2c N n f0g. We assume that the remainders of the polynomials x ; x ; x ; : : : ; x modulo multiples of b have already been computed for some c 2 N . Output: g 2 F [x] with deg g < deg b and g  x e mod b. P 1. Write e = jl ej 2l?j with ej 2 f0; 1g for 0  j  l and e = 1. 2

4

8

2

2

2

2

2

2 4

8

2

2

2

0

0

P

2. Let k 2 f0; : : : ; lg be minimal such that be=2k c = 0jl?k ej 2l?k?j  c, k and set gl?k = x2be=2 c rem b, performing one table-lookup and one subsequent division with remainder by b. 3. Repeat step 4 for i = l ? k + 1; : : : ; l. 4.

Set hi = gi?1 (gi?1 ) rem b, using one modular composition.

5.

If ei = 0 then set gi = hi else set gi = h2i rem b.

f gi  x be=2 2

l?i c

mod b g

6. Return gl .

The cost of the above algorithm is essentially at most blog c e c + 1 modular compositions, since the cost for the initial division with remainder and the modular squarings are negligible. So if c is close to deg b, only very few modular compositions are sucient to test b for irreducibility, in particular when deg b has no small prime factors. Since the space occupied by the the data written to disk grows quadratically with the input size, the practicability of the above method is limited by the amount of disk space available. For degrees above 131072, the table size easily exceeds 1024M bytes. One can still, however, store as much as possible in that case. We note that the irreducibility test can not only check whether b is irreducible or not but can also detect if b is an equal-degree polynomial of order dividing deg b, using Fact 7.4 in von zur Gathen & Shoup (1992). 2

31

+1

10 Implementation and running times In this section, we describe our implementation of the polynomial factorization algorithm over F on two Sun Sparc Ultra 1 computers, rated at 143 MHz each. The software is written in C++. Polynomials over F are represented as arrays of 32-bit unsigned integers, and 32 consecutive coecients of a polynomial are packed into one machine word. We built a C++ class for polynomials over F o ering standard operations like copying, reversing, shifting, and determining the degree of polynomials, the arithmetic operations addition, multiplication, squaring, division with remainder, and the Extended Euclidean Algorithm. Polynomial multiplication. We have implemented several algorithms for polynomial multiplication over F : the school method, Karatsuba & Ofman, Cantor's method over F 16 and over F 32 , and our extension of Cantor's method over F 20 . We did not implement Schonhage's algorithm. The timings of Reischert (1995) indicate that in his implementation, it beats Cantor's method for degrees above 500; 000, and for degrees around 40; 000; 000, Schonhage's algorithm is faster than Cantor's by a factor of  . As basis for the classical multiplication and the method of Karatsuba & Ofman, we have tried several algorithms for the multiplication of polynomials of degree less than 32: by direct classical multiplication, by 3 classical multiplications of 16-bit blocks a la Karatsuba & Ofman, and by 9 multiplications of 8-bit blocks a la Karatsuba & Ofman, where the 8-bit blocks are multiplied via table-lookup (the corresponding table uses 128k bytes of main memory). The last variant has turned out to be the fastest. Multiplication in F 16 is implemented by means of an exponentiation and a discrete logarithm table with respect to a primitive element of the multiplicative group. This was also done by Montgomery (1991) and Reischert (1995). The cost for one multiplication in F 16 is then essentially the cost for three table lookups and one addition of 16-bit integers. The size of each of the two tables is 256k bytes of main memory. Multiplication in F 20 is done in the same way, using an exponentiation and a logarithm table of size 4M bytes each. For multiplications in F 32 , the corresponding tables are far too large to t in main memory, and we use an alternate approach. We use two representations P for elements of F 32 . The rst is the usual polynomial basis representation i< ai i, where = x mod G for an irreducible generating polynomial G 2 F [x] of degree 32 for F 32 . This will be referred to as representation A in the sequel. The second is a polynomial basis representation of F 32 over F 16 . We have chosen the primitive polynomial g = x + x + x + x + 1 2 F [x] as generating polynomial ofPF 16 over F , and write elements of F 16 in the polynomial basis representation i< bi i with = x mod g 2 F 16 and all bi 2 F . The polynomial F = x + x + 2 F 16 [x] is irreducible over F 32 , and G is chosen so that F ( ) = 0. Hence we may represent 2

2

2

2

2

2

2

3 2

2

2

2

2

2

0

32

2

16

15

2 4

13

2

2

2

2

2

2

2

2

2

2

32

0

16

2

elements of F 32 as A + A , with A ; A 2 F 16 . We call this representation B. The product with another element B + B of F 32 can be computed as 1

2

0

0

1

1

2

0

2

(A + A )(B + B ) = ((A + A )(B + B ) + A B ) + (A B + A B ); 1

0

1

0

0

1

0

1

0

0

0

0

1

1

using 3 multiplications in F 16 (the multiplication of A B by is just a shift in the polynomial basis representation of F 16 ). If we want to multiply two polynomials a; b 2 F [x] using multipoint evaluation and interpolation at linear subspaces of one of the three elds F m with m 2 f16; 20; 32g as above, we write a and b as 1

2

1

2

2

2

a=

X

i