SOME HISTORICAL NOTES ON NUMBER THEORETIC TRANSFORM

M. Bhattacharya*, R. Creutzburg**, and J. Astola*

* Institute of Signal Processing, Tampere University of Technology, P.O. Box 553, Tampere, FIN 33101, FINLAND.

E-mail: {mrinmoy, jta}@cs.tut.fi

** Fachhochschule Brandenburg

University of Applied Sciences P. O. Box 2132 D-14737 Brandenburg an der Havel, GERMANY E-mail: [email protected]

ABSTRACT

Arithmetic modulo a prime integer has many interesting properties, which are found in standard books on number theory. Some of these properties are of special interest for signal processing applications. It was observed that an analogy exists between some of them and the properties of the Fourier transform, and that the cyclic convolution of two sequences could be computed modulo a prime integer, entirely in the integer domain, in the same way as it is done by the Fast Fourier Transform using complex numbers. This leads to exactness of the final result (i.e., freedom from any round-off errors). These methods, appropriately named Number Theoretic Transforms, have both advantages and disadvantages. These developments in signal processing algorithms followed in the footsteps of the development of Fast Fourier Transform techniques. This paper traverses some of the developments of the Number Theoretic Transform techniques over time and discusses mostly the initial contributions and efforts made by various researchers.

1. INTRODUCTION

Digital signal processing may be described as arithmetic operations on sequences of real numbers, carried out as stipulated by an algorithm in order to achieve a desired result. Any real number, appropriately scaled, is equivalent to an integer; hence digital signal processing consists of arithmetic operations between integers, and this is where the applicability of number theory starts. Filtering is one of the basic processes in signal processing and is represented by the convolution of two sequences. It is well known that direct computation of the convolution of two sequences {x(n)} and {h(n)} requires an excessive amount of computational effort even for moderate sequence lengths. Transform domain methods are used to reduce this effort. The sequences are transformed using a transform possessing the cyclic convolution property (CCP), term-by-term (i.e., pointwise) multiplications are done, and the sequence so obtained is inverse-transformed to get the output sequence. The primary method these days is the Fast Fourier Transform (FFT). The Number Theoretic Transform (NTT) has the necessary CCP and uses the same architecture as the FFT. It is attractive because of its exactness, i.e., it is free of round-off errors, unlike other methods, and because, in general, it is implemented by simple, real arithmetic for real sequences. However, it imposes a stringent relation between the convolution length N and the choice of modulus M. Efforts to alleviate this problem are generally associated with an increase in computational effort and a more complex computational structure. The basic requirement in computing the FFT,

$$W^N = 1, \tag{1}$$

where

$$W = e^{-j2\pi/N}, \tag{2}$$

i.e., W is an N-th root of unity, is similar to

$$\alpha^N \equiv 1 \pmod{M}, \tag{3}$$

where M is the modulus, a prime integer, and α is an integer of order N, i.e., an N-th root of unity, as required in computing the Number Theoretic Transform (NTT).

The authors are of the opinion that interest in the NTT might have persisted (even if not ahead of the FFT) but for two facts. First, when the NTT schemes were being developed, hardware, i.e., bits and bytes (which relate to the word length of the data sequences), was quite costly compared to today. Second, unlike Fourier spectra, the number theoretic spectrum of a sequence carries no meaning, so the application of the NTT would have remained confined generally to convolution (although algorithms exist in which convolution is used to compute the FFT, and that convolution could in turn be computed by an NTT [11, 47]). This was so in spite of the advantages of the NTT: it is free of any round-off error, as the

computations are done modulo an integer, and the data paths remain real, unlike those in the FFT, where each data path actually comprises two data paths, i.e., the real and imaginary parts. The major disadvantage is the stringent relation between the choice of modulus M and the sequence length N, especially when one wishes to employ simple arithmetic such as bit shifts and adds in place of multiplications and therefore chooses a simple modulus of the type $2^n \pm 1$ along with a simple α such as a power of 2. It may be noted that arithmetic modulo a general integer requiring many bits to represent is more complex. We will see that the efforts of researchers mainly revolve around removing or reducing the stringent relation between the word length, the convolution length, and a suitable value of α.

While we will try to describe the developments chronologically, as they appeared in the publications, we will not be able to follow that order strictly, as we also include certain discussions to highlight and elaborate (with illustrations) certain issues related to the developments. At times we may appear to switch back and forth. Further, in this paper we will not try to trace the history of the two most closely related topics: (a) the Chinese Remainder Theorem (CRT) and (b) Fermat's theorem. These two are almost inseparable from the theory and practice of the NTT. The CRT helps in two ways: (a) mapping the indices of a unidimensional data vector into a multidimensional data set and back (after the required operations), when the sequence length N is equal to $\prod_i p_i$, where the $p_i$ are prime integers, and (b) increasing the dynamic range by using multiple moduli of the type $M = M_1 M_2 \cdots$, where the $M_i$ are mutually prime integers. Burrus [17] provides an excellent exposition of such mappings, while most authors generally provide mappings of input and output indices that are quite similar. Fermat's well-known theorem [1, 2, 29, 36, 42], $a^p \equiv a \pmod{p}$, where a and p are integers with p a prime, along with its many ramifications, is well documented in many books and publications, and this theorem is the basic foundation of the NTT. All the basic essentials and the relevant portions of number theory and the NTT are very well presented in [1], and we suggest that all uninitiated readers read it. This will help them to understand the cases of a single modulus and of multiple moduli, the mapping of indices, the wordlength requirement, etc., and, in sum, the constraints and problems faced in the development of the NTT.
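As a minimal sketch of the transform-domain procedure described in this section (forward transform, pointwise multiplication, inverse transform), the following Python fragment computes a cyclic convolution exactly with a direct, non-fast NTT. The function names and the parameter choice M = 17, N = 8, α = 2 are assumptions made for illustration only; they are not taken from any of the cited papers.

```python
def ntt(x, alpha, m):
    """Direct O(N^2) transform: X(k) = sum_n x(n) * alpha^(n*k) mod m."""
    n = len(x)
    return [sum(x[i] * pow(alpha, i * k, m) for i in range(n)) % m
            for k in range(n)]


def intt(X, alpha, m):
    """Inverse transform: same sum with alpha^-1, scaled by N^-1 mod m."""
    n = len(X)
    n_inv = pow(n, -1, m)          # modular inverse, needs Python 3.8+
    alpha_inv = pow(alpha, -1, m)
    return [n_inv * v % m for v in ntt(X, alpha_inv, m)]


def cyclic_convolution(x, h, alpha, m):
    """Cyclic convolution via transform, pointwise product, inverse."""
    X, H = ntt(x, alpha, m), ntt(h, alpha, m)
    Y = [a * b % m for a, b in zip(X, H)]
    return intt(Y, alpha, m)


# Illustrative parameters (assumed): M = 17 is prime and 2 has order 8 mod 17.
M, N, ALPHA = 17, 8, 2
x = [1, 2, 3, 4, 0, 0, 0, 0]
h = [1, 1, 0, 0, 0, 0, 0, 0]
y = cyclic_convolution(x, h, ALPHA, M)
```

The result is exact only as long as every true output value lies within the range representable modulo M, which is the dynamic-range constraint that recurs throughout the paper.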

2. Contributions of the Researchers

In 1965, Cooley and Tukey [19] published the first paper on a method of computing the Discrete Fourier Transform (DFT) that later came to be known as the FFT. The method, shown to be far superior (in the sense that it requires much less computation than the direct multiplication of the DFT matrix and the data vector), depends primarily on two factors. (a) The sequence length must be highly composite, i.e., factorable into many small factors, and the unidimensional data are mapped into two-dimensional data in a particular way (which is repeated for each of these dimensions down to the smallest factor). This is the famous 'divide and conquer' approach, which leads to a large reduction in computational effort (the reduction depends on the degree of compositeness of the sequence length N). (b) The basic requirement that $W^N = 1$, where $W = e^{-j2\pi/N}$, i.e., W is an N-th root of unity.

A great deal of research activity followed in the wake of this celebrated and valuable publication. Researchers interested in this area started investigating various aspects of the computation of the DFT and of convolution. Good [28] in 1971 showed the difference between the data mapping based on the CRT, when the factors of N are mutually prime, and the mapping based on the Cooley-Tukey approach, where the factors need not be mutually prime (rather, it is better to have small factors, say only two, with N a large power of two). In the former case, while data tracking is more complex, no multiplication by inter-stage twiddle factors is required, unlike in the latter case. The former case is generally referred to as the prime factor algorithm (PFA) [31] for computing the DFT. Around the same time Pollard [45] discussed the fast Fourier transform in a finite field; he also pointed out the equivalence between convolution and the multiplication of large integers, and the utility of the transform method.

2.1 Mersenne Number Transform

In 1972, Rader (one of the leading pioneers in the initial development of digital signal processing and the coauthor of the first book on digital signal processing [27]) proposed a scheme modulo Mersenne primes, defining the Mersenne number transform (MNT) [46]. The Mersenne numbers $M_p$ considered here are the prime integers of the type $2^p - 1$, p also being a prime. We mention that only forty-one Mersenne primes have been found so far (the first few values of p are 2, 3, 5, 7, 13, 17, 19, 31, 61, 89, 107, 127, 521, 607, 1279, 2203, ...). The 41st Mersenne prime was discovered only this year and is the largest prime found so far at the time of writing (a tail-piece about the 41st Mersenne number and some of its interesting features is included at the end of the paper). As

$$M_p = 2^p - 1, \tag{4}$$

it follows that

$$2^p \equiv 1 \pmod{M_p}, \tag{5}$$

meaning that 2 is of order p in the finite ring of integers under addition and multiplication modulo $M_p$. Following our discussion in the earlier section, we conclude that 2 can serve as α to compute an NTT (and subsequently a convolution) for a sequence length p. Addition modulo $M_p$ is simply one's complement addition in a p-bit word. Multiplication modulo $M_p$ is achieved by forming the 2p-bit product of two words and adding the least significant p bits to the most significant p bits in one's complement fashion. Multiplication by powers of 2 modulo $M_p$ is done by rotating the bits within the p-bit word; hence all the computation in the transform is done by additions and bit shifts.
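The arithmetic rules just described can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names (`fold`, `add_mod`, `mul_mod`, `mul_pow2`) and the choice p = 31 are hypothetical, and Python's unbounded integers stand in for the p-bit and 2p-bit hardware words.

```python
P = 31                    # assumed example exponent; 2^31 - 1 is a Mersenne prime
MP = (1 << P) - 1         # the modulus M_p, all ones in a p-bit word


def fold(x):
    """Reduce x >= 0 modulo M_p by adding the high bits onto the low p bits
    (end-around carry), as in one's complement addition."""
    while x >> P:
        x = (x & MP) + (x >> P)
    return 0 if x == MP else x    # the all-ones word also represents zero


def add_mod(a, b):
    """Addition modulo M_p."""
    return fold(a + b)


def mul_mod(a, b):
    """Form the 2p-bit product, then add its upper and lower halves."""
    return fold(a * b)


def mul_pow2(a, k):
    """Multiply by 2^k modulo M_p by rotating the p-bit word left by k."""
    k %= P
    return fold(((a << k) & MP) | (a >> (P - k)))


a, b = 1234567890, 987654321
rotated = mul_pow2(a, 5)
```

No division is needed anywhere, which is the attraction of this modulus.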

2.1.1 Visualization of Constraints and Problems in NTT

Although the scheme described in the preceding paragraph appears quite attractive, its deficiencies become self-evident. We illustrate them as follows.

(a) With α as 2, the maximum length for convolution is p, a prime integer, while the NTT follows the architecture of the FFT, which is efficient only when the length N is highly composite. With α as −2 the length can be doubled to 2p, but even 2p is barely composite. (That −2 is a root of order 2p mod $M_p$ follows from the fact that (2, p) = 1, meaning that 2 and p are mutually prime: 4, i.e., $2^2$, is then also a root of order p, so $-2 = 4^{1/2}$ is a root of order 2p [1].)

(b) The maximum possible length $N_{max}$ is $M_p - 1 = 2^p - 2$, which requires α to be a primitive root of order $N_{max}$. This value of α needs several bits to represent, unlike 2, and so do its higher powers, leading to far more complexity in the modular arithmetic, especially in multiplications. The advantage gained is that $N_{max}$, or an available submultiple of it, may be composite (though perhaps not highly composite, as is desirable in the computationally efficient FFT structure).

As an example we consider $M_{31} = 2^{31} - 1 = 2147483647$ (which is represented by a wordlength of 31 bits, i.e., all ones in a 31-bit register, noting that the powers of two run from zero upwards). With α as 2 or −2, one obtains a convolution length of 31 or 62, respectively. At the same time, $N_{max} = 2147483646$ has the factors 2, 3, 3, 7, 11, 31, 151, and 331, so it can be termed barely composite for the purpose of gaining any benefit from the FFT structure; further, α (for $N_{max}$) and its higher powers will not be simple like 2 or powers of 2, which allow multiplication by word shift or rotation operations. At this stage we see that there exist undesirable relations between the choice of modulus M (i.e., the wordlength), the convolution length N, the value of α (the root of order N), and the computational complexity. Rader [46] suggested the use of Fermat numbers as the modulus to alleviate some of the problems in the use of the MNT.

2.2 Fermat Number Transform

Agarwal and Burrus [1−3] thoroughly investigated the use of Fermat numbers as the modulus in the NTT, which came to be known as the Fermat number transform (FNT). The t-th Fermat number, denoted by $F_t$, is defined as $2^{2^t} + 1$, t = 0, 1, 2, ... (it may be noted that most Fermat numbers are not primes; only the first five, $F_0$ through $F_4$, are primes, and the others examined so far are known to be composite (see page 554 in [1])). As

$$F_t = 2^{2^t} + 1, \tag{6}$$

it follows that for primes, i.e., t ≤ 4, the order of 2 is $2^{t+1}$, as

$$2^{2^t} \equiv -1 \pmod{F_t}, \tag{7}$$

leading to

$$2^{(2^t)\cdot 2} = 2^{2^{t+1}} \equiv 1 \pmod{F_t}. \tag{8}$$

Hence, with α as 2, $N_{max}$ would be $2^{t+1}$. The concept of √2 was introduced to double the length to $2^{t+2}$ with an α that requires only a two-bit representation. Equating b, the number of bits, with $2^t$, the root 2 may be termed $\alpha_{2b}$ and √2 may be termed $\alpha_{4b}$. In practice, the composite Fermat numbers $F_5$ (b = 32) and $F_6$ (b = 64) are attractive for applications in signal processing. It was pointed out that, while these numbers are composite and their factors are of the type $K \cdot 2^{t+2} + 1$, they can still be used for transforms with $\alpha_{2b}$ and $\alpha_{4b}$, with

$$\alpha_{4b} = 2^{b/4}\,(2^{b/2} - 1) \tag{9}$$

and

$$\alpha_{4b}^2 \equiv 2 \pmod{F_t}. \tag{10}$$

(Please refer to the relevant discussions in [1, 2].) It can be seen (from Table 1 in [1]) that the undesirable and restrictive relations between N, M, α, and computational complexity persist in the FNT as well. As suggested by Rader in [46], Agarwal and Burrus [3] proposed a scheme in which a convolution of length N = LP is implemented as a two-dimensional cyclic convolution of length 2L by P, which can be computed using a two-dimensional FNT. With this two-dimensional scheme, the wordlength is proportional to the square root of the length of the sequences to be convolved, which gives a maximum length of $8b^2$ rather than $4b$ (refer to Table II in [1]). Because of modular arithmetic, in the ring of integers mod M, unambiguous representation of integers is possible only if their absolute value is less than M/2. So either the modulus M (and hence the wordlength) has to be chosen accordingly, or the input sequences have to be suitably scaled. This is akin to the overflow constraint in fixed-point digital filtering. Some schemes are proposed in [1] as alternatives; arithmetic operations in the FNT are described in [2]. Agarwal and Burrus also suggested and described the use of multiple moduli to increase the dynamic range, as well as the complex NTT (CNTT), which was mainly theorized and initiated by Reed and Truong [48−51]. McClellan [35] demonstrated hardware for the FNT. He used a number representation scheme based on a general class of representations developed by Leibowitz [32], known as the 'diminished-1 number representation'. This allows modulo $2^b + 1$ arithmetic to be implemented in a manner similar to one's complement arithmetic, i.e., modulo $2^b - 1$ arithmetic.
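Relations (7) to (10) can be checked numerically. The following Python sketch does so by brute force; the function names `fermat`, `order`, and `sqrt2_root` are hypothetical helpers introduced for illustration, and the check covers only the orders of the roots, not a full transform.

```python
def fermat(t):
    """t-th Fermat number F_t = 2^(2^t) + 1."""
    return 2 ** (2 ** t) + 1


def order(a, m):
    """Multiplicative order of a modulo m (a must be a unit modulo m)."""
    k, x = 1, a % m
    while x != 1:
        x = x * a % m
        k += 1
    return k


def sqrt2_root(t):
    """alpha_4b = 2^(b/4) * (2^(b/2) - 1) with b = 2^t, as in (9); needs t >= 2."""
    b = 2 ** t
    return (2 ** (b // 4) * (2 ** (b // 2) - 1)) % fermat(t)


# With alpha = 2 the length is 2^(t+1) = 2b; with alpha_4b it doubles to 4b.
orders_of_2 = {t: order(2, fermat(t)) for t in range(1, 6)}
orders_of_sqrt2 = {t: order(sqrt2_root(t), fermat(t)) for t in range(2, 6)}
```

For t = 5 the modulus $F_5$ is composite, yet the order of $\alpha_{4b}$ still comes out as 4b = 128, consistent with the remark that $F_5$ remains usable.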

2.4 Some Related Contemporary Developments

In this subsection we briefly describe a few contemporary developments which, while not number theoretic in the sense discussed up to this point, are related to some of the developments described in the later subsections.

Firstly, in 1968 Rader [47] made the very interesting observation that when the number of data samples is a prime, the DFT can be expressed as a convolution plus the addition of a few simple terms. Illustrations are found in [31, 55, 60], where one can see that, after an appropriate permutation of indices, the DFT matrix (excluding the first row and the first column) is a convolution matrix.

Secondly, Agarwal and Cooley [4] developed direct algorithms for convolutions of the short lengths 2, 3, 5, 6, 7, 8, and 9 (hereafter referred to as the AC algorithm). When the factors of the convolution length are mutually prime, these short-length algorithms can be used. Using a CRT type of mapping, the unidimensional data are formulated as multidimensional data. A nested algorithm is used to combine the individual short-length algorithms, and the final result is remapped in a manner similar to the input mapping. The individual short-length algorithms may be termed rectangular transforms. Their special feature is that the elements of the rectangular matrices are quite simple, so that the matrix-vector multiplications are carried out by additions (and/or subtractions). The disadvantage is that the length of the data vector expands in the intermediate stages and then contracts back to its original length, which increases the complexity of data tracking, unlike in the FFT structure, where the length of the data vector remains the same through all the intermediate stages. It was seen that for convolution lengths up to 210 this algorithm requires the least computational effort compared to other methods.

Thirdly, based on [47], Winograd [55, 60] developed small-length DFT algorithms of lengths 2, 3, 4, 5, 7, 8, 9, and 16 in a manner somewhat akin to the AC algorithm. This algorithm is referred to as the Winograd Fourier transform algorithm (WFTA). Composite lengths comprising mutually prime factors are handled using a similar nested algorithm. Lucid illustrations and relevant explanations are found in [55, 60]. As in the case of the AC algorithm, the length of the data vector expands in the intermediate stages and then contracts back to its original length, which increases the complexity of data tracking, unlike in the typical Cooley-Tukey type FFT structure. Bailey [6] developed a number theoretic version of the WFTA; independently, Bhattacharya and Agarwal [7] also developed a number theoretic version, which they termed the Winograd Number Theoretic transform algorithm (WNTA). Kolba and Parks [31] demonstrated that the WFTA can be used in the prime factor algorithm (PFA) rather than in the nested algorithm, with the advantage of fewer additions per point at the cost of a marginal increase in multiplications. We mention that when the nested algorithm is used, in both the AC algorithm and the WFTA, the number of additions increases at a much more rapid rate with increasing length.
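Rader's observation from the first item above can be sketched as follows: for a prime length P, the indices 1, ..., P−1 are reordered as powers of a primitive root g, after which the DFT (apart from the first row and column) becomes a cyclic convolution of length P−1. The Python code below is a plain illustration under assumed names (`rader_dft`, `primitive_root`); the inner convolution is computed directly here, whereas in practice it would be computed by a fast method such as an NTT or FFT.

```python
import cmath


def dft(x):
    """Direct DFT, used only as a reference."""
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * cmath.pi * i * k / n) for i in range(n))
            for k in range(n)]


def primitive_root(p):
    """Smallest g whose powers generate all nonzero residues mod prime p."""
    for g in range(2, p):
        if len({pow(g, e, p) for e in range(p - 1)}) == p - 1:
            return g
    raise ValueError("no primitive root found; is p an odd prime?")


def rader_dft(x):
    """DFT of odd prime length P expressed through a length P-1 cyclic convolution."""
    p = len(x)
    g = primitive_root(p)
    g_inv = pow(g, -1, p)                      # Python 3.8+
    length = p - 1
    a = [x[pow(g, q, p)] for q in range(length)]            # permuted input
    b = [cmath.exp(-2j * cmath.pi * pow(g_inv, q, p) / p)   # permuted kernel
         for q in range(length)]
    # Cyclic convolution c[m] = sum_q a[q] * b[(m - q) mod (P-1)]
    c = [sum(a[q] * b[(m - q) % length] for q in range(length))
         for m in range(length)]
    X = [0j] * p
    X[0] = sum(x)                              # the "few simple terms"
    for m in range(length):
        X[pow(g_inv, m, p)] = x[0] + c[m]
    return X


x = [1, 2, 3, 4, 5, 6, 7]                      # prime length 7
X_rader = rader_dft(x)
X_ref = dft(x)
```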

2.5. Complex Number Theoretic Transform (CNT)

Reed and Truong [48−50] initiated the concept of the complex number theoretic transform (CNT) from the theoretical viewpoint of Galois field theory (we mention that I. S. Reed is the co-inventor of the famous Reed-Solomon codes). They defined the transform in the Galois field of $q^2$ elements, $GF(q^2)$, a finite field analogous to the field of complex numbers, when q is a prime such that −1 is not a quadratic residue, i.e.,

$$x^2 \equiv -1 \pmod{q} \tag{11}$$

is not solvable (meaning that there is no integer x that satisfies the above relation). It can be shown that this holds for the Mersenne primes $M_p$, i.e., q can be a Mersenne prime, for which the modulo arithmetic is quite simple, as mentioned earlier. They also outlined a procedure for finding a primitive element of order $2^{p+1}$, of the type a + ib, with a and b integers (and i analogous to √−1 of the complex number field), in the subgroup $G_{2^{p+1}}$ of $GF(q^2)$. This transform has a certain disadvantage: multiplication by powers of the primitive element is not as simple as bit shifts and/or word rotations, because the element is an integer comprising several (or many) bits. On the other hand, the obtainable lengths are quite high, and the modulo operation is simple owing to the use of Mersenne primes. In [49] they showed that, by using the CRT, the transforms can be computed (leading to complex integer convolution as before) over a direct sum of Galois fields $GF(q^2)$ in order to obtain a larger dynamic range. This is akin to our earlier statement about using multiple moduli, combined by the CRT, to obtain an increased dynamic range. Extending their work further [50] to make use of the Fermat numbers, they defined transforms over a ring of quadratic integers modulo a prime number q in the quadratic field R(√m), where m is a square-free integer. When q is a Fermat prime, one can use the FFT algorithm over the resulting finite fields to obtain fast convolutions of quadratic integer sequences in R(√m). This was extended to a direct sum of such finite fields; as a result, such transforms can be used with the nonprime Fermat numbers $F_5$ and $F_6$.
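A small numerical illustration of these statements is sketched below in Python. Elements a + ib of $GF(q^2)$ are stored as pairs (a, b); the brute-force search for an element of order $2^{p+1}$ is an assumption made for brevity and is not the procedure outlined by Reed and Truong. All function names are hypothetical.

```python
def minus_one_is_nonresidue(q):
    """True if x^2 = -1 (mod q) has no integer solution, cf. (11)."""
    return all((x * x + 1) % q != 0 for x in range(q))


def cmul(u, v, q):
    """Multiply a + ib and c + id with i^2 = -1, components mod q."""
    (a, b), (c, d) = u, v
    return ((a * c - b * d) % q, (a * d + b * c) % q)


def corder(u, q):
    """Multiplicative order of a nonzero element u of GF(q^2)."""
    k, x = 1, u
    while x != (1, 0):
        x = cmul(x, u, q)
        k += 1
    return k


def find_element_of_order(n, q):
    """Brute-force search; only sensible for small q."""
    for a in range(q):
        for b in range(q):
            if (a, b) != (0, 0) and corder((a, b), q) == n:
                return (a, b)
    return None


p = 3
q = 2 ** p - 1                       # Mersenne prime 7
root = find_element_of_order(2 ** (p + 1), q)
```

For q = 7 the multiplicative group of $GF(49)$ has 48 = 16 · 3 elements, so elements of order $2^{p+1} = 16$ exist, whereas real integers modulo 7 offer no power-of-two order beyond 2.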

2.6 Pseudo-Number Theoretic Transforms

Noting the scarcity of primes of the type $2^b \pm 1$ that admit a simple α such as a power of 2, and wishing to have a wide choice of wordlengths along with a wide choice of convolution lengths, Nussbaumer [37, 38, 39] suggested using large prime factors of composite numbers, i.e., moduli such as $M = M_b/x$, where $M_b = 2^b \pm 1$ and x is a small part of the factors that may or may not be composite. As a result there is some loss in the effective wordlength compared to b, the working wordlength corresponding to $M_b$. The complete operation is done modulo $M_b = 2^b \pm 1$, so that the modulo arithmetic is simple, except that the final output is taken after reduction modulo $M = M_b/x$, which gives the correct output. Such $M_b$ are known as pseudo-Fermat or pseudo-Mersenne numbers depending upon whether $M_b = 2^b + 1$ or $2^b - 1$, respectively. Accordingly, the transform is known as the pseudo-FNT or the pseudo-MNT. He also introduced the use of roots like 2(j−1) and (j+1). This is similar to the concept of √2 for increasing the permissible length, with the exception that, while √2 is in general an element represented by two bits whose square is 2 modulo M, the aforesaid roots are actually complex, and complex pseudo-NTTs are performed. Readers may note that the operation as such, and the computation with such roots and their higher powers, is quite simple, and no multiplication is involved at all. The logic of such roots is illustrated by the following example. In the case of $M = (2^{25} + 1)/(3 \cdot 11)$, with α as 2 one can compute a transform of length 50, and $2^4$ is then of order 25. Now, as $(j+1)^8 = 2^4$ and (8, 25) = 1, i.e., 8 and 25 are mutually prime, the order of (j+1) is 200 (eight times twenty-five). Here the effective wordlength is reduced to 20 bits. Examples of such moduli are found in [7, 37−39].
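The orders quoted in this example can be confirmed with a few lines of Python. The sketch below represents a + jb as the pair (a, b) with components taken modulo M; the helper names are hypothetical and the order is found by repeated multiplication.

```python
M = (2 ** 25 + 1) // (3 * 11)        # the modulus of the example


def cmul(u, v, m):
    """Multiply a + jb and c + jd with j^2 = -1, components mod m."""
    (a, b), (c, d) = u, v
    return ((a * c - b * d) % m, (a * d + b * c) % m)


def corder(u, m):
    """Multiplicative order of u = a + jb modulo m (u must be invertible)."""
    k, x = 1, u
    while x != (1, 0):
        x = cmul(x, u, m)
        k += 1
    return k


order_of_2 = corder((2, 0), M)            # alpha = 2
order_of_16 = corder((16, 0), M)          # alpha = 2^4
order_of_1_plus_j = corder((1, 1), M)     # alpha = j + 1
effective_bits = M.bit_length()
```

Multiplication by j + 1 costs only two additions, since (a + jb)(1 + j) = (a − b) + j(a + b), which is why no general multiplication is involved.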

2.7 Hybrid Transform Methods

It is normally the case that many methods exist for doing the same thing, each with its own advantages and disadvantages. Hence it may be possible to combine the advantages of some of them and devise methods that gain some benefit. We briefly outline some such schemes. Agarwal and Cooley [4] showed that the FNT can be combined with the AC algorithm, i.e., the short-length convolution algorithms. The length of the first dimension is taken as 128 (using $F_5$), and the lengths of the other dimensions are taken as the mutually prime odd numbers 3, 5, 7, and 9. The results make it amply clear that the length is increased with very little additional computational effort. Reed and Truong [51] similarly showed that the short-length AC convolution algorithms can be combined with the CNTs (refer to subsection 2.5) that they had developed in $GF(q^2)$ using the Mersenne primes as q. In [7], Bhattacharya and Agarwal demonstrated in considerable detail the efficacy of hybrid techniques by decomposing the convolution length N into three dimensions (mutually prime factors) and employing a different method in each dimension: the WNTA (the number theoretic variant of the WFTA) in nested mode or PFA mode, the short-length AC algorithm, and NTTs of the type described here that require no multiplications. The computational effort required confirmed the utility of such schemes in computing convolutions.

2.8 New Mersenne Number Transform (NMNT)

This type of transform was developed by Boussakta and Holt [12−15] and is considered an intelligent and excellent derivative, whereby the CNT approach mentioned in subsection 2.5 is suitably converted to a real transform. The transform is defined as

$$X(k) = \sum_{n=0}^{N-1} x(n)\,\bigl(\beta_1(nk) + \beta_2(nk)\bigr) \bmod M_p. \tag{12}$$

Here $M_p$ is a Mersenne prime and

$$\beta_1(nk) = \mathrm{Re}\bigl((\alpha_1 + j\alpha_2)^{nk}\bigr) \quad\text{and}\quad \beta_2(nk) = \mathrm{Im}\bigl((\alpha_1 + j\alpha_2)^{nk}\bigr). \tag{13}$$

Also, $\alpha_1 = 2^q \bmod M_p$, $\alpha_2 = (-3)^q \bmod M_p$, $q = 2^{p-2}$, and Re(.) and Im(.) denote the real and imaginary parts. In (13), $\alpha_1 + j\alpha_2$ is of order $N = 2^{p+1}$, as given in [48] and mentioned in the subsection on the CNT. For lengths N/d,

$$\beta_1(nk) = \mathrm{Re}\bigl(((\alpha_1 + j\alpha_2)^{d})^{nk}\bigr) \quad\text{and}\quad \beta_2(nk) = \mathrm{Im}\bigl(((\alpha_1 + j\alpha_2)^{d})^{nk}\bigr). \tag{14}$$

The inverse transform is the same as (12) except for the multiplying factor 1/N. The transform length is an integer power of two and can be up to $2^p$. It is mentioned that arithmetic operations and residue reduction modulo Mersenne primes are simpler, that all practical power-of-two lengths are available, and that the multiplications are of the normal type, since $\beta_1(.) + \beta_2(.)$ is an integer needing several bits for its representation, unlike a power of 2. The authors also proved that this transform has the necessary cyclic convolution property (CCP), and they showed that it can be combined with the FNT to increase the dynamic range, as in the earlier cases of multiple moduli.
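The definition can be tried out numerically. The Python sketch below is an illustration under stated assumptions, not the authors' implementation: it takes p = 7 and N = 8, reads the parameters as $\alpha_1 = 2^q$, $\alpha_2 = (-3)^q$, $q = 2^{p-2}$ modulo $M_p$, uses $d = 2^{p+1}/N$ as the power in (14), and restricts N to powers of two not exceeding $2^p$. The function names are hypothetical.

```python
def cmul(u, v, m):
    """Multiply a + jb and c + jd with j^2 = -1, components mod m."""
    (a, b), (c, d) = u, v
    return ((a * c - b * d) % m, (a * d + b * c) % m)


def cpow(u, e, m):
    """u^e by repeated squaring."""
    result = (1, 0)
    while e:
        if e & 1:
            result = cmul(result, u, m)
        u = cmul(u, u, m)
        e >>= 1
    return result


def nmnt_kernel(p, n):
    """Table beta(t) = Re(gamma^t) + Im(gamma^t) for t = 0..n-1, where
    gamma = (alpha1 + j*alpha2)^d and d = 2^(p+1) / n (assumed reading)."""
    m = 2 ** p - 1
    q = 2 ** (p - 2)
    alpha = (pow(2, q, m), pow(-3, q, m))
    gamma = cpow(alpha, 2 ** (p + 1) // n, m)
    beta, g = [], (1, 0)
    for _ in range(n):
        beta.append((g[0] + g[1]) % m)
        g = cmul(g, gamma, m)
    return beta, m


def nmnt(x, p):
    """Forward transform as in (12); len(x) is a power of two <= 2^p."""
    n = len(x)
    beta, m = nmnt_kernel(p, n)
    return [sum(x[i] * beta[i * k % n] for i in range(n)) % m for k in range(n)]


def inmnt(X, p):
    """Inverse: the same transform followed by multiplication by 1/N mod M_p."""
    m = 2 ** p - 1
    n_inv = pow(len(X), -1, m)       # Python 3.8+
    return [n_inv * v % m for v in nmnt(X, p)]


x = [3, 1, 4, 1, 5, 9, 2, 6]
X = nmnt(x, 7)
x_back = inmnt(X, 7)
```

The kernel is real, so real input sequences give real (integer) spectra, in contrast to the CNT of subsection 2.5.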

2.9 Miscellaneous Efforts and Schemes

In this subsection we outline very briefly the efforts and schemes of some other researchers. Creutzburg [20−23], along with his co-researchers, investigated NTTs of prescribed length, pseudo-NTTs, and CNTs using cyclotomic integers and polynomials in order to determine the relevant parameters, i.e., N, M, and α, in a general sense. Lu and Lee [34] made an effort to solve the sequence length constraint problem. They proposed generalized modulo primes (GMP) of the type

$$M = p^{q^t} \pm a, \tag{15}$$

such that M is a prime for some p, q, t, and a. One can see that some values give a Fermat number and some other values give Mersenne numbers. Parker and Benaissa [43] proposed an application of Rader's algorithm [47], which states that when the number of samples is a prime, the DFT of the sequence can be represented by a convolution, which in turn can be computed by an NTT. This holds for the NTT as well. They presented a scheme by which large lengths can be obtained by recursively applying a small-length NTT algorithm module. Dimitrov, Cooklev, and Donevsky [24, 25] presented a scheme for a generalized Fermat-Mersenne number theoretic transform; they also defined the NTT over the golden section quadratic field. Lastly, we mention that Golomb, Reed, and Truong [26] investigated prime numbers of the type $3 \cdot 2^n + 1$, of which there are many, for many values of n. With such a modulus, $N_{max}$ is equal to $3 \cdot 2^n$, which is highly composite. They developed schemes for the arithmetic operations modulo such M and a procedure to find a root of a given order. Bhattacharya and Astola [9] investigated the use of the primes $K \cdot 2^n + 1$, a much larger set of primes [52]. They noted that, modulo such numbers, the sequence length constraint vis-à-vis the word length almost vanishes, and they proposed certain schemes for the use of these primes. Further, noting that $N_{max}$ contains many factors that are squares, they developed a scheme for implementing the Bluestein [11] algorithm for linear filtering.
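To make the last two items concrete, the following Python sketch lists small exponents n for which $3 \cdot 2^n + 1$ is prime and finds a root of power-of-two order modulo one such prime. It is a brute-force illustration with hypothetical helper names, not the procedure of [26] or [9].

```python
def is_prime(m):
    """Trial division; adequate for the small moduli used here."""
    if m < 2:
        return False
    i = 2
    while i * i <= m:
        if m % i == 0:
            return False
        i += 1
    return True


def root_of_order_pow2(m, n_len):
    """Return an element of order n_len modulo the prime m, where n_len is a
    power of two (>= 2) dividing m - 1."""
    assert (m - 1) % n_len == 0
    for g in range(2, m):
        a = pow(g, (m - 1) // n_len, m)
        if pow(a, n_len // 2, m) == m - 1:   # a^(n_len/2) = -1, so order is n_len
            return a
    raise ValueError("no root found")


# Exponents n <= 12 for which 3 * 2^n + 1 is prime.
exponents = [n for n in range(1, 13) if is_prime(3 * 2 ** n + 1)]

M = 3 * 2 ** 12 + 1                  # 12289
alpha = root_of_order_pow2(M, 2 ** 12)
```

With M = 12289 a 14-bit word already supports power-of-two transform lengths up to 4096, which illustrates how the length constraint relative to the word length is relaxed for moduli of this type.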

3. Conclusions

The authors are of the opinion that interest in the NTT might have persisted, rather than waning as one notices these days (even if the NTT would not have been ahead of the FFT), but for the fact that when the NTT schemes were being developed, hardware, i.e., bits and bytes (which relate to the word length of the data sequences), was quite costly compared to today. Unlike Fourier spectra, the number theoretic spectrum of a sequence carries no meaning, which indicates that the application of the NTT would have remained confined generally to convolution. However, as mentioned in the paper, algorithms exist [11, 47] in which convolution is used to compute the FFT, and that convolution could be computed by an NTT as well. We reiterate the advantages: the NTT is free of any round-off error, as the computations are done modulo an integer, and the data paths remain real, unlike those in the FFT, where each data path actually comprises two data paths, i.e., the real and imaginary parts. The major disadvantage is the stringent relation between the choice of modulus M and the sequence length N, especially when one wishes to employ simple arithmetic such as bit shifts and adds in place of multiplications and therefore chooses a simple modulus of the type $2^n \pm 1$. It may be noted that arithmetic modulo a general integer is more complex.

However, one should also point out that for real sequences the DFT is symmetric; there is a well-established way to form a complex sequence of length N from two consecutive real data sequences and then compute a single FFT of length N. Subsequently, the DFTs of the two sequences can be separated by a few additions and subtractions per point. As the number theoretic spectrum has no meaning, this advantage is not available for the NTT; for comparison purposes one may have to compare an NTT of a sequence of length 2N with an FFT of length N. For the convolution of complex sequences, however, this disadvantage does not apply to the NTT, especially with roots such as 2(j−1) and (j+1) in pseudo-NTTs.

This paper is also intended to revive some interest in this topic for further investigation against the backdrop of present-day signal processing research. It is with this intention that the reference list includes many references that are not referred to in the main body of the paper. A fully exhaustive and all-inclusive investigation and comparison, using present-day hardware and software techniques and taking into account the cost of practical application, is warranted in order to appreciate the full benefits.
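The well-established packing technique for real sequences mentioned above can be sketched as follows. The Python code uses a direct DFT in place of an FFT for brevity, and the function names are hypothetical.

```python
import cmath


def dft(x):
    """Direct DFT; an FFT would be used in practice."""
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * cmath.pi * i * k / n) for i in range(n))
            for k in range(n)]


def two_real_dfts(x, y):
    """DFTs of two real sequences from one complex DFT of z = x + j*y."""
    n = len(x)
    Z = dft([complex(a, b) for a, b in zip(x, y)])
    # Conjugate symmetry of real-sequence spectra separates the two DFTs:
    # X(k) = (Z(k) + conj(Z(N-k))) / 2,  Y(k) = (Z(k) - conj(Z(N-k))) / (2j)
    X = [(Z[k] + Z[-k % n].conjugate()) / 2 for k in range(n)]
    Y = [(Z[k] - Z[-k % n].conjugate()) / 2j for k in range(n)]
    return X, Y


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [8.0, 6.0, 4.0, 2.0, 1.0, 3.0, 5.0, 7.0]
X, Y = two_real_dfts(x, y)
```

The separation relies on the conjugate symmetry of the Fourier spectrum of a real sequence, a structure with no counterpart in a number theoretic spectrum.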

4. Some References. [1] R. C. Agarwal and C. S. Burrus, "Number Theoretic Transforms to Implement Fast Digital Convolution," Proc. IEEE, vol. 63, pp. 550-560, Apr. 1975. [2] …………….."………………, "Fast Convolution using Fermat number transforms with applications to digital filtering," IEEE Trans. Acoust., Speech, and Signal Processing, vol. ASSP-22, pp. 87-97, Apr. 1974. [3] …………………”………………, “Fast one-dimensional digital convolution by multi-dimensional techniques,”IEEE Trans. Acoust., Speech,and Signal Processing, vol. ASSP22, pp. 1-10, Feb. 1974. [4] R. C. Agarwal and J. W. Cooley, "New Algorithms for Digital Convolution," IEEE Trans. Acoust., Speech, and Signal Processing, vol. ASSP-25, pp. 392-410, Oct. 1977. [5] …………………”…………….. , “New algorithms for digital convolution,” Proc. IEEE ICASSP, May 1977, pp. 360-362. [6] D. Bailey, "Winograd’s algorithm applied to number theoretic transform". Electron. Lett. vol. 13, pp. 548-549, Sep. 1977. [7] M. Bhattacharya and R. C. Agarwal, "Number Theoretic Techniques for Computation of Digital Convolution," IEEE Trans. Acoust., Speech, and Signal Processing, vol. ASSP32, pp. 507-511, Jun. 1984. [8] ……………”……………., “Comments on “A fast computation of complex convolution using a hybrid transform”,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-29, pp. 1072-1073, Oct. 1981. [9] M. Bhattacharya and J. Astola, “Recursive structure for linear filtering using number theoretic transform,” Proc. 8th IEEE Intl. Conf. Electronics, Circuits, Systs., ICECS 2001, vol. 1, pp. 525-528, Sep. 2001. [10] R.E. Blahut, Fast Algorithms for Digital Signal Processing. Addison-Wesley,1987.

[11] L. I. Bluestein, "A linear Filtering Approach to the Computation of Discrete Fourier Transform," IEEE Trans. Audio Electroacoust., vol. AU-18, pp. 451-455, Dec. 1970.
[12] S. Boussakta and A. G. J. Holt, "New Number Theoretic Transform," Signal Processing, vol. 28, pp. 1683-1684, Aug. 1992.
[13] S. Boussakta and A. G. J. Holt, "New transform using the Mersenne numbers," IEE Proc. Vis. Image Signal Process., vol. 142, pp. 381-388, Dec. 1995.
[14] S. Boussakta and A. G. J. Holt, "A novel combination of NTTs using the MRC," Signal Processing, vol. 54(1), pp. 91-98, 1996.
[15] S. Boussakta and A. G. J. Holt, "New Two Dimensional Transform," Electron. Lett., vol. 29, pp. 949-950, May 1993.
[16] C. S. Burrus and I. W. Selesnick, "On programs for prime length FFTs and Circular Convolution," IEEE Proc. ICASSP-95, vol. 2, pp. 1137-1140, 1995.
[17] C. S. Burrus, "Index Mapping for Multidimensional formulation of the DFT and the Convolution," IEEE Trans. Acoust., Speech, and Signal Processing, vol. ASSP-25, pp. 239-242, Jun. 1977.
[18] C. S. Burrus, "Notes on the FFT," Rice University, Houston, TX 77005, 29 Sep. 1997.
[19] J. W. Cooley and J. W. Tukey, "An algorithm for the machine computation of complex Fourier series," Math. Comp., vol. 19, pp. 297-301, Apr. 1965.
[20] R. Creutzburg, "Constructive Parameterization of Number-Theoretic Transforms – A Short Overview," TICSP Workshop on Transforms and Filter Banks, 1998.
[21] R. Creutzburg and M. Tasche, "Number-theoretic transforms of prescribed length," Math. Comp., vol. 47, pp. 693-701, 1986.
[22] R. Creutzburg and M. Tasche, "Parameter determination for complex number-theoretic transforms using cyclotomic polynomials," Math. Comp., vol. 52, pp. 189-200, 1989.
[23] R. Creutzburg and G. Steidl, "Construction of parameters for number-theoretic transforms in rings of cyclotomic integers," Journal Inform. Process. Cybernetics, vol. EIK 24, pp. 573-584, 1988.
[24] V. S. Dimitrov, T. V. Cooklev, and B. D. Donevsky, "Generalized Fermat-Mersenne number theoretic transform," IEEE Trans. CAS-II, vol. 41, pp. 133-139, Feb. 1994.
[25] V. S. Dimitrov, T. V. Cooklev, and B. D. Donevsky, "Number Theoretic Transforms Over the Golden Section Quadratic Field," IEEE Trans. Signal Processing, vol. SP-43, pp. 1790-1797, Aug. 1995.
[26] S. W. Golomb, I. S. Reed, and T. K. Truong, "Integer Convolution over Finite Field GF(3·2^n + 1)," SIAM J. Appl. Math., vol. 32, pp. 356-365, Mar. 1977.
[27] B. Gold and C. M. Rader, Digital Processing of Signals. New York: McGraw-Hill, 1969.
[28] I. J. Good, "The relation between the two fast Fourier transforms," IEEE Trans. Comput., vol. C-20, pp. 310-317, Mar. 1971.
[29] G. H. Hardy and E. M. Wright, The Theory of Numbers. Oxford, England: Oxford Univ. Press, 1960.
[30] D. Kibler, R. C. Agarwal, and C. S. Burrus, "Necessary and sufficient conditions for the existence of the modular Fourier transform: Comments on 'Number theoretic transforms to implement fast digital convolution' [and reply]," Proc. IEEE, vol. 65, pp. 265-267, Feb. 1977.
[31] D. P. Kolba and T. W. Parks, "A Prime Factor FFT Algorithm using High Speed Convolution," IEEE Trans. Acoust., Speech, and Signal Processing, vol. ASSP-25, pp. 281-294, Aug. 1977.
[32] L. M. Leibowitz, "A binary arithmetic for the Fermat number transform," Naval Research Lab, Report 7971, 1976.
[33] L. M. Leibowitz, "A Simplified Binary Arithmetic for the Fermat Number Transform," IEEE Trans. Acoust., Speech, and Signal Processing, vol. ASSP-24, pp. 356-359, Oct. 1976.
[34] H. Lu and S. C. Lee, "A new approach to Solve the Sequence-Length Constraint Problem in Circular Convolution Using Number Theoretic Transform," IEEE Trans. Signal Processing, vol. 39, pp. 1314-1321, Jun. 1991.
[35] J. H. McClellan, "Hardware Realization of a Fermat Number Transform," IEEE Trans. Acoust., Speech, and Signal Processing, vol. ASSP-24, pp. 216-225, Jun. 1976.
[36] J. H. McClellan and C. M. Rader, Number Theory in Digital Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 1979.
[37] H. J. Nussbaumer, "Complex convolutions via Fermat Number Transforms," IBM Journal Research and Development, pp. 282-284, May 1976.
[38] H. J. Nussbaumer, "Digital Filtering using complex Mersenne Transform," IBM Journal Research and Development, pp. 498-504, Sep. 1976.
[39] H. J. Nussbaumer, "Digital Filtering using Pseudo Fermat Number Transforms," IEEE Trans. Acoust., Speech, and Signal Processing, vol. ASSP-25, pp. 79-83, Oct. 1977.
[40] H. J. Nussbaumer, "Linear Filtering Technique for Computing Mersenne and Fermat number Transform," IBM Journal Research and Development, pp. 334-339, Jul. 1977.
[41] H. J. Nussbaumer, Fast Fourier Transform and Convolution Algorithms. Springer-Verlag, 1982.
[42] O. Ore, Number Theory and Its History. New York: McGraw-Hill, 1948.
[43] M. G. Parker and M. Benaissa, "Unusual-length number-theoretic transforms using recursive extensions of Rader's algorithm," IEE Proc. Vis. Image Signal Process., vol. 142, pp. 31-34, Feb. 1995.
[44] A. Peled and B. Liu, "A new Hardware realization of Digital Filters," IEEE Trans. Acoust., Speech, and Signal Processing, vol. ASSP-22, pp. 456-462, Dec. 1974.
[45] J. M. Pollard, "The fast Fourier transform in a finite field," Math. Comp., vol. 25, pp. 365-374, Apr. 1971.
[46] C. M. Rader, "Discrete convolutions via Mersenne Transform," IEEE Trans. Comput., vol. C-21, pp. 1269-1273, Dec. 1972.
[47] C. M. Rader, "Discrete Fourier transform when the number of data samples is prime," Proc. IEEE, vol. 56, pp. 1107-1108, Jun. 1968.
[48] I. S. Reed and T. K. Truong, "The Use of Finite Fields to Compute Convolutions," IEEE Trans. Inform. Theory, vol. IT-21, pp. 208-213, Mar. 1975.
[49] I. S. Reed and T. K. Truong, "Complex Integer Convolution over a Direct Sum of Galois Fields," IEEE Trans. Inform. Theory, vol. IT-21, pp. 657-661, Nov. 1975.
[50] I. S. Reed and T. K. Truong, "Convolutions over residue Classes of Quadratic Integers," IEEE Trans. Inform. Theory, vol. IT-22, pp. 468-475, Jul. 1976.
[51] I. S. Reed and T. K. Truong, "A Fast Computation of complex convolution using a Hybrid Transform," IEEE Trans. Acoust., Speech, and Signal Processing, vol. ASSP-26, pp. 566-570, Dec. 1978.
[52] R. M. Robinson, "A report on Primes of the form k·2^n + 1 and on factors of Fermat Numbers," Proc. Amer. Math. Soc., vol. 9, pp. 673-681, Oct. 1958.
[53] I. W. Selesnick and C. S. Burrus, "Extending Winograd's small convolution algorithm to longer lengths," IEEE Proc. ISCAS-94, vol. 2, pp. 449-452, 1994.
[54] I. W. Selesnick and C. S. Burrus, "Automatic Generation of Prime Length FFT Programs," IEEE Trans. Signal Processing, vol. 44, pp. 14-24, Jan. 1996.
[55] H. F. Silverman, "An Introduction to Programming the Winograd Fourier Transform Algorithm (WFTA)," IEEE Trans. Acoust., Speech, and Signal Processing, vol. ASSP-25, pp. 152-165, Apr. 1977.
[56] N. S. Szabo and R. I. Tanaka, Residue Arithmetic and Its Applications to Computer Technology. New York: McGraw-Hill, 1967.
[57] H. Tamori, N. Aoki, and T. Yamamoto, "A Fragile Watermarking Technique by Number Theoretic Transform," IEICE Trans. Fundamentals, vol. E85-A, pp. 1902-1904, Aug. 2002.
[58] E. Vegh and L. M. Leibowitz, "Fast Complex Convolution in Finite Rings," IEEE Trans. Acoust., Speech, and Signal Processing, vol. ASSP-24, pp. 343-344, Aug. 1976.
[59] E. Vegh and L. M. Leibowitz, "Discrete convolution of complex integer sequences," Proc. ICASSP'76, vol. 1, pp. 134-135, Apr. 1976.
[60] S. Winograd, "On Computing the Discrete Fourier Transform," Math. Comp., vol. 32, pp. 175-199, Jan. 1978 (also in IBM Research Report RC-6291, 1976, IBM Thomas J. Watson Research Center, Yorktown Heights, New York).
[61] S. Winograd, "Some bilinear forms whose multiplicative complexity depends on the field of constants," Math. Syst. Theory, vol. 10(2), pp. 169-180, 1976-1977.

***********

Some Interesting Notes:

(a) Pierre de Fermat (1601−1665) was a lawyer and government official, best remembered for his work in number theory, in particular for Fermat's last theorem.

(b) Marin Mersenne (1588−1648) was a monk, best known as a clearinghouse for correspondence between eminent philosophers and scientists and for his work in number theory.

(c) The 41st known Mersenne prime, 2^24036583 − 1, was found in May 2004 by Josh Findley, who used a 2.4 GHz Pentium 4 computer running Windows XP for 14 days to prove that the number was prime. It has 7,235,733 decimal digits. Written out, it would stretch for 25 kilometres (only!!), and writing the digits out longhand would take someone the best part of six weeks. The authors have the digits in a text file; volunteers who wish to try it out are welcome to approach the authors or to download the digits from the GIMPS (Great Internet Mersenne Prime Search) website.
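
The digit count quoted in note (c) can be checked without writing the number out. The following is a minimal sketch in Python; the function name `mersenne_digit_count` is an illustrative choice, not something taken from the paper or from GIMPS. It relies on the fact that 2^p is never a power of 10, so 2^p − 1 has the same number of decimal digits as 2^p, namely floor(p · log10 2) + 1.

```python
import math


def mersenne_digit_count(p: int) -> int:
    """Return the number of decimal digits of 2**p - 1, for p >= 1.

    Hypothetical helper for illustration only. Since 2**p is never a
    power of 10, subtracting 1 cannot shorten the decimal expansion, so
    the count equals floor(p * log10(2)) + 1. Floating-point rounding
    could matter only if p * log10(2) were extremely close to an integer.
    """
    return math.floor(p * math.log10(2)) + 1


# Exponent of the Mersenne prime mentioned in note (c).
print(mersenne_digit_count(24036583))
```

For small exponents the result can be compared with the exact value, for example `len(str(2**p - 1))`, which is a convenient sanity check on the formula.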