
IEEE TRANSACTIONS ON COMPUTERS, VOL. X, NO. X, MONTH YYYY


A Karatsuba-based Algorithm for Polynomial Multiplication in Chebyshev Form

Juliano B. Lima, Student Member, IEEE, Daniel Panario, Member, IEEE, and Qiang Wang

Abstract— In this paper, we present a new method for multiplying polynomials in Chebyshev form. Our approach has two steps. First, the well-known Karatsuba algorithm is applied to polynomials constructed from the Chebyshev coefficients. Then, extra arithmetic operations are applied to the obtained result in order to write the final result in Chebyshev form. The proposed algorithm has quadratic computational complexity. We also compare our method to other approaches.

Index Terms— Theory of computation, analysis of algorithms and problem complexity, computations on polynomials.

I. INTRODUCTION

Chebyshev polynomials have been an essential mathematical object in several fields of knowledge. In electronics, for instance, such polynomials play an important role in the design of analog and digital filters with characteristics close to the ideal ones [1]. Recently, Chebyshev series, i.e., the approximation of a function in terms of Chebyshev polynomials, were proposed for analyzing circuit nonlinearities. This provides more accuracy than other expansions, such as Taylor series [2]. Interpolation techniques via Chebyshev polynomials have been part of numerical algorithms for calculating chromatic dispersion coefficients of optical fibers. This allows us to plot the dispersion curves that describe the behavior of those fibers [3]. Such techniques are also useful in direct digital frequency synthesis of arbitrary waveforms, resampling procedures for discrete multitone modems and many other scenarios [4], [5].

In general, the use of Chebyshev polynomials for approximating a function assures more stability than the monomial representation or the use of other bases. In particular, if a truncation is necessary, the rapid decrease of the Chebyshev expansion coefficients entails relatively small rounding errors [6], [7]. This is the basic reason that makes those polynomials highly attractive in numerical analysis and, in particular, in approximation and interpolation techniques.

This paper deals with the important operation of multiplication of polynomials in Chebyshev form. That is, given two polynomials $a(x)$ and $b(x)$ in Chebyshev form, obtain the polynomial $c(x) = a(x) \cdot b(x)$ also written in Chebyshev form. This problem was previously addressed in [8], where two approaches were given. The first one is a direct multiplication of polynomials in Chebyshev form, while the second is based on the discrete cosine transform (DCT). In this work, we propose a method based on the well-known Karatsuba algorithm [9], [10].

Our approach consists of the application of Karatsuba's algorithm to the ordinary polynomials $a'(x)$ and $b'(x)$ obtained from $a(x)$ and $b(x)$. The coefficients in the resulting product are denoted by $c'_i$. Then, we show that the Chebyshev coefficients of $c(x)$, denoted by $c_i$, can be computed from the coefficients $c'_i$. This procedure, which needs extra arithmetic operations, is derived and explained in detail. Although our method has quadratic computational complexity, the number of required multiplications is asymptotically reduced by half when compared to the direct multiplication [8]. In this respect, for small-degree polynomials $a(x)$ and $b(x)$, which cover several practical applications of Chebyshev expansions [2], [3], [4], our method is also more efficient than the mentioned DCT approach. Moreover, our procedure seems to provide implementation advantages because it does not introduce rounding errors.

In Section II, we review Chebyshev polynomials and the direct method for multiplying polynomials in Chebyshev form. In Section III, after introducing the main ideas of this paper, the standard Karatsuba algorithm is briefly shown. Then, we use this algorithm to perform the Chebyshev basis polynomial multiplication and provide some examples. Furthermore, Theorem 2 gives a precise estimate for the cost of the algorithm. A comparison with other approaches and conclusions are given in Section IV.

Manuscript received Month DD, YYYY; revised Month DD, YYYY. J. B. Lima is with the Department of Electronics and Systems, Federal University of Pernambuco, Recife, Brazil (e-mail: juliano [email protected]). D. Panario and Q. Wang are with the School of Mathematics and Statistics, Carleton University, Ottawa, Canada (e-mail: [email protected]; [email protected]).

II. MULTIPLICATION OF POLYNOMIALS IN CHEBYSHEV FORM

The classical definition of Chebyshev polynomials of the first kind is

$$T_i(x) := \cos(i \cdot \arccos x), \tag{1}$$

where $i \in \mathbb{N}$ and $x \in [-1, 1]$. From Equation (1), we obtain $T_0(x) = 1$, $T_1(x) = x$, and the recurrence relation

$$T_{i+1}(x) = 2x\,T_i(x) - T_{i-1}(x).$$

Hence, Chebyshev polynomials of degree $i$ can be easily obtained. It is also shown that every real polynomial $a(x)$ of degree $\leq N - 1$ can be written as a linear combination of Chebyshev polynomials of the first kind [6]. Usually, this is called Chebyshev expansion and it is given by

$$a(x) = \frac{a_0}{2} + \sum_{i=1}^{N-1} a_i T_i(x), \quad a_i \in \mathbb{R}. \tag{2}$$
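As a small illustration of Equations (1) and (2), the following Python sketch evaluates $T_i(x)$ with the three-term recurrence and checks it against the cosine definition. The function names `chebyshev_t` and `cheb_eval` are illustrative choices, not part of the paper; the only convention taken from the text is the halved leading coefficient $a_0/2$.

```python
import math

def chebyshev_t(i, x):
    """Evaluate T_i(x) with the recurrence T_{k+1} = 2 x T_k - T_{k-1}."""
    t_prev, t_curr = 1.0, x              # T_0(x) and T_1(x)
    if i == 0:
        return t_prev
    for _ in range(i - 1):
        t_prev, t_curr = t_curr, 2.0 * x * t_curr - t_prev
    return t_curr

def cheb_eval(coeffs, x):
    """Evaluate a(x) = a_0/2 + sum_{i>=1} a_i T_i(x), as in Equation (2)."""
    total = coeffs[0] / 2.0
    for i in range(1, len(coeffs)):
        total += coeffs[i] * chebyshev_t(i, x)
    return total

# The recurrence agrees with T_i(x) = cos(i * arccos x) on [-1, 1].
for i in range(6):
    for x in (-1.0, -0.3, 0.0, 0.5, 1.0):
        assert abs(chebyshev_t(i, x) - math.cos(i * math.acos(x))) < 1e-12
```

For instance, the coefficient list `[2.0, 0.0, 1.0]` represents $1 + T_2(x) = 2x^2$ under this convention.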

From the relation

$$T_i T_j = \frac{T_{i+j} + T_{|i-j|}}{2}, \quad i, j \in \mathbb{N},$$

which can be verified by using simple trigonometric identities, a multiplication rule for polynomials in Chebyshev form can be derived. It is described in the following proposition [8].

Proposition 1: Let $a(x)$ and $b(x)$ be polynomials of degree $N - 1$ given in the Chebyshev form

$$a(x) = \frac{a_0}{2} + \sum_{i=1}^{N-1} a_i T_i(x) \quad \text{and} \quad b(x) = \frac{b_0}{2} + \sum_{i=1}^{N-1} b_i T_i(x),$$

where $a_i, b_i \in \mathbb{R}$. Then the product $c(x) = a(x) \cdot b(x)$ has the Chebyshev form

$$c(x) = \frac{c_0}{2} + \sum_{i=1}^{2N-2} c_i T_i(x)$$

with

$$2c_i = \begin{cases}
a_0 \cdot b_0 + 2 \sum_{l=1}^{N-1} a_l \cdot b_l, & i = 0; \\[4pt]
\sum_{l=0}^{i} a_{i-l} \cdot b_l + \sum_{l=1}^{N-1-i} (a_l \cdot b_{l+i} + a_{l+i} \cdot b_l), & i = 1, \ldots, N-2; \\[4pt]
\sum_{l=i-N+1}^{N-1} a_{i-l} \cdot b_l, & i = N-1, \ldots, 2N-2.
\end{cases} \tag{3}$$

The computation of all coefficients $c_i$, $i = 0, \ldots, 2N-2$, directly from Equation (3) is referred to as the "direct method" and involves $O(N^2)$ real multiplications [8]. In this equation, the number of all possible products $a_i \cdot b_j$, $i, j = 0, \ldots, N-1$, and the number of products by $1/2$ are counted. This gives $M_d(N)$, the exact number of multiplications for computing all coefficients $c_i$ using the direct method,

$$M_d(N) = N^2 + 2N - 1. \tag{4}$$
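The three rows of Equation (3) translate directly into code. The sketch below is a plain Python transcription and makes no attempt to reuse products or reach the exact operation counts; the names `cheb_mul_direct` and `cheb_eval` are assumptions made for this illustration. The result is checked by evaluating both sides of $c(x) = a(x) \cdot b(x)$ at a few points.

```python
import math

def cheb_mul_direct(a, b):
    """Chebyshev coefficients c_0..c_{2N-2} of a(x)*b(x) via Equation (3).

    a and b are lists of length N that use the a_0/2 convention of Equation (2).
    """
    n = len(a)
    assert len(b) == n and n >= 1
    c = []
    for i in range(2 * n - 1):
        if i == 0:
            two_c = a[0] * b[0] + 2 * sum(a[l] * b[l] for l in range(1, n))
        elif i <= n - 2:
            two_c = sum(a[i - l] * b[l] for l in range(i + 1))
            two_c += sum(a[l] * b[l + i] + a[l + i] * b[l] for l in range(1, n - i))
        else:
            two_c = sum(a[i - l] * b[l] for l in range(i - n + 1, n))
        c.append(two_c / 2)
    return c

def cheb_eval(coeffs, x):
    """Evaluate c_0/2 + sum_{k>=1} c_k cos(k arccos x)."""
    theta = math.acos(x)
    return coeffs[0] / 2 + sum(ck * math.cos(k * theta)
                               for k, ck in enumerate(coeffs) if k >= 1)

a = [1.0, -2.0, 0.5, 3.0]
b = [0.25, 4.0, -1.0, 2.0]
c = cheb_mul_direct(a, b)
for x in (-0.9, -0.2, 0.1, 0.7):
    assert abs(cheb_eval(a, x) * cheb_eval(b, x) - cheb_eval(c, x)) < 1e-9
```

As a hand-checkable case, `[2, 1]` and `[4, 3]` represent $1 + x$ and $2 + 3x$, whose product $2 + 5x + 3x^2$ equals $7/2 + 5\,T_1(x) + \tfrac{3}{2} T_2(x)$, i.e. the coefficient list `[7, 5, 1.5]`.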

According to Equation (3), given integers $i_1$ and $i_2$ such that $1 \leq i_1 \leq N-2$ and $i_1 < i_2 \leq 2N-2$, any term of the form $(a_l \cdot b_{l+i_1} + a_{l+i_1} \cdot b_l)$, $l = 1, \ldots, N-1-i_1$, is previously computed in the sum $\sum_{l=0}^{i_2} a_{i_2-l} \cdot b_l$ or $\sum_{l=i_2-N+1}^{N-1} a_{i_2-l} \cdot b_l$. Consequently, in the second row of that equation, the additions $(a_l \cdot b_{l+i} + a_{l+i} \cdot b_l)$ do not need to be counted. Therefore, $A_d(N)$, the exact number of additions for obtaining all coefficients $c_i$ using the direct method, is

$$A_d(N) = N - 1 + \sum_{i=1}^{N-2} (N-1) + \sum_{i=N-1}^{2N-2} (2N-2-i) = \frac{(N-1)(3N-2)}{2}. \tag{5}$$

III. KARATSUBA-BASED ALGORITHM FOR THE MULTIPLICATION OF POLYNOMIALS IN CHEBYSHEV FORM

In this section, we present our algorithm: we use Karatsuba's algorithm to compute the product of two polynomials whose Chebyshev coefficients are given. The intermediate results of Karatsuba's algorithm are kept track of and then used to obtain the Chebyshev coefficients $c_i$ of the product polynomial.

The key point of our approach is to apply Karatsuba's algorithm for performing an ordinary polynomial multiplication and to recover the Chebyshev coefficients through some equations. More specifically, in order to use the algorithm for multiplying $a(x)$ and $b(x)$, coefficients $a_i$ and $b_i$ are associated to the term of degree $i$, $i = 0, \ldots, N-1$, in the monomial representation. This procedure gives polynomials $a'(x) = \sum_{i=0}^{N-1} a_i x^i$ and $b'(x) = \sum_{i=0}^{N-1} b_i x^i$. By running Karatsuba's algorithm, we obtain the polynomial $c'(x) = a'(x) \cdot b'(x) = \sum_{i=0}^{2N-2} c'_i x^i$. On the other hand, these coefficients $c'_i$ are given by

$$c'_i = \begin{cases}
a_0 \cdot b_0, & i = 0; \\[4pt]
\sum_{l=0}^{i} a_{i-l} \cdot b_l, & i = 1, \ldots, N-2; \\[4pt]
\sum_{l=i-N+1}^{N-1} a_{i-l} \cdot b_l, & i = N-1, \ldots, 2N-2.
\end{cases} \tag{6}$$

By substituting Equation (6) into Equation (3), we obtain

$$2c_i = \begin{cases}
c'_i + 2 \sum_{l=1}^{N-1} a_l \cdot b_l, & i = 0; \\[4pt]
c'_i + \sum_{l=1}^{N-1-i} (a_l \cdot b_{l+i} + a_{l+i} \cdot b_l), & i = 1, \ldots, N-2; \\[4pt]
c'_i, & i = N-1, \ldots, 2N-2.
\end{cases} \tag{7}$$

We remark that Equation (6) can be obtained by running a classical divide and conquer method. It involves $N^2$ multiplications and $N - 1 + N(N-1) \sum_{k=1}^{(\log_2 N)-1} 2^{-k}$ additions. To obtain coefficients $c_i$ in Equation (7), we need extra operations (that is, $2N - 1$ extra multiplications and $N(N-1)/2$ extra additions). The total numbers of multiplications and additions are equal to the same numbers for the direct method; see Equations (4) and (5). That is why in this paper we concentrate on using Karatsuba's algorithm to obtain coefficients $c'_i$ in Equation (6).

Given coefficients $c'_i$ computed by Karatsuba's algorithm, coefficients $c_i$ could be obtained from Equation (7) with the following number of extra multiplications: $2N - 1$ due to the scale factor $1/2$; $N - 1$ for computing terms $a_l \cdot b_l$, $l = 1, \ldots, N-1$; $(N-2)(N-1)/2$ for computing terms $(a_l \cdot b_{l+i} + a_{l+i} \cdot b_l) = (a_l + a_{l+i}) \cdot (b_l + b_{l+i}) - a_l \cdot b_l - a_{l+i} \cdot b_{l+i}$, $i = 1, \ldots, N-2$, $l = 1, \ldots, N-1-i$. This implies a total number of extra multiplications given by $(N^2 + 3N - 2)/2$. The number of extra additions related to the first and second rows of Equation (7) would be $N - 1$ and $5(N-2)(N-1)/2$, respectively. Then, the total number of extra additions would be $(5N^2 - 13N + 8)/2$. We show how these numbers of extra operations can be further reduced by using the intermediate results of Karatsuba's algorithm previously applied. Our algorithm is given below.

Algorithm: Karatsuba-based algorithm for polynomial multiplication in Chebyshev form.

Input: polynomials $a(x) = \frac{a_0}{2} + \sum_{i=1}^{N-1} a_i T_i(x)$ and $b(x) = \frac{b_0}{2} + \sum_{i=1}^{N-1} b_i T_i(x)$ of degree $N - 1$ in Chebyshev form.

Output: polynomial $c(x) = a(x) \cdot b(x) = \frac{c_0}{2} + \sum_{i=1}^{2N-2} c_i T_i(x)$ of degree $2N - 2$ in Chebyshev form.

Step 1: Apply Karatsuba's algorithm on polynomials $a'(x) = \sum_{i=0}^{N-1} a_i x^i$ and $b'(x) = \sum_{i=0}^{N-1} b_i x^i$, the product of which is denoted by $c'(x) = a'(x) \cdot b'(x) = \sum_{i=0}^{2N-2} c'_i x^i$, and store all intermediate computations.

Step 1.1: These $c'_i$ are obtained from Equation (6).

Step 1.2: Clearly, any intermediate computation related to the term of degree $d$ in the polynomial $c'(x)$ can be written in the form $\sum_{k=0}^{D} (a_{i_k} \cdot b_{j_k} + a_{j_k} \cdot b_{i_k})$, where $D \leq N - 1$ and $i_k + j_k = d$.

Step 2: Obtain terms $a_l \cdot b_l$, $l = 1, \ldots, N-1$, and $(a_l \cdot b_{l+i} + a_{l+i} \cdot b_l)$, $i = 1, \ldots, N-2$, $l = 1, \ldots, N-1-i$, from intermediate computations of the form presented in Step 1.2. This may require a separation procedure.

Step 2.1 (Separation): Separate each term of the form $(a_l \cdot b_{l+i} + a_{l+i} \cdot b_l)$ from the intermediate term $\sum_{k=0}^{D} (a_{i_k} \cdot b_{j_k} + a_{j_k} \cdot b_{i_k})$, $D > 0$, such that $i_k + j_k = 2l + i$, for $i = 1, \ldots, N-2$ and $l = 1, \ldots, N-1-i$.

Step 3: Add the terms obtained in Step 2 to coefficients $c'_i$, $i = 0, \ldots, N-2$, according to the first and second rows of Equation (7), to obtain $c(x) = a(x) \cdot b(x) = \frac{c_0}{2} + \sum_{i=1}^{2N-2} c_i T_i(x)$.

We provide details concerning the execution of Step 2 of the presented algorithm in Section III-B. The correctness of the algorithm is immediate from Equations (3), (6) and (7).

A. Karatsuba's Algorithm

Assume that we want to multiply two polynomials, $a'(x)$ and $b'(x)$, with degrees $N - 1$. These polynomials are given in the monomial form and have coefficients $a_i$ and $b_i$ respectively. For the purpose of this paper, we consider $N = 2^n$, $n \in \mathbb{N}$. However, there are also efficient ways for dealing with polynomials with degrees different from $2^n - 1$ [10], [11]. We may write

$$a'(x) = A_1(x)\,x^{N/2} + A_0(x) \quad \text{and} \quad b'(x) = B_1(x)\,x^{N/2} + B_0(x),$$

where

$$\begin{aligned}
A_1(x) &= a_{N-1} x^{N/2-1} + \cdots + a_{N/2}, \\
A_0(x) &= a_{N/2-1} x^{N/2-1} + \cdots + a_0, \\
B_1(x) &= b_{N-1} x^{N/2-1} + \cdots + b_{N/2}, \\
B_0(x) &= b_{N/2-1} x^{N/2-1} + \cdots + b_0.
\end{aligned} \tag{8}$$

We have $c'(x) = a'(x) \cdot b'(x)$ given by

$$c'(x) = [A_1(x) B_1(x)]\,x^N + [A_0(x) B_1(x) + A_1(x) B_0(x)]\,x^{N/2} + [A_0(x) B_0(x)].$$

In the above equation, simplifying the notation and omitting "$(x)$", the term multiplying $x^{N/2}$ may be rewritten as

$$A_0 B_1 + A_1 B_0 = (A_0 + A_1)(B_0 + B_1) - A_0 B_0 - A_1 B_1.$$

This saves one multiplication, because we have previously computed $A_0 B_0$ and $A_1 B_1$. Therefore, the product of polynomials with degree $N - 1$ may be computed using three products of polynomials with degree $(N/2) - 1$. As this procedure is recursive, it is shown that Karatsuba's algorithm for multiplying polynomials of degree $N - 1$ with $N = 2^n$, i.e., for obtaining coefficients $c'_i$, can be done with $N^{\log_2 3}$ multiplications and at most $6N^{\log_2 3} - 8N + 2$ additions [12]. It is important to notice that we are not applying Karatsuba's algorithm in a black-box manner. Instead, we store all intermediate results to be used later. We also remark that such an algorithm has a "three term" structure based on the recursive computation of $A_1 B_1$, $A_0 B_0$ and $A_0 B_1 + A_1 B_0 = (A_0 + A_1)(B_0 + B_1) - A_0 B_0 - A_1 B_1$. Throughout this paper, intermediate terms involved in the computation of $A_1 B_1$, $A_0 B_0$ and $A_0 B_1 + A_1 B_0$ are respectively associated to symbols 11, 00 and 01.

B. Extra Operations for Karatsuba's Algorithm

According to Equation (7), in order to obtain the Chebyshev coefficients $c_i$ of polynomial $c(x)$ from coefficients $c'_i$, we need to consider the scaling factors $1/2$ and to compute the terms $a_l \cdot b_l$, $l = 1, \ldots, N-1$, and $(a_l \cdot b_{l+i} + a_{l+i} \cdot b_l)$, $i = 1, \ldots, N-2$, $l = 1, \ldots, N-1-i$. Due to the recursive nature of Karatsuba's algorithm, some of these terms appear computed together with other terms. Therefore, extra arithmetic operations will be needed for computing them separately before conveniently adding them to coefficients $c'_i$. In this paper, this procedure is referred to as separation. Briefly, extra operations for obtaining coefficients $c_i$ from coefficients $c'_i$ are related to:
• operations for separating terms originally computed together with other terms;
• additions of terms $a_l \cdot b_l$ and $(a_l \cdot b_{l+i} + a_{l+i} \cdot b_l)$ respectively on the first and second rows of Equation (7);
• multiplications by the scale factor $1/2$.
The total number of required extra operations is stated in the following theorem.

Theorem 1: Let $a(x)$ and $b(x)$ be polynomials of degree $N - 1$ whose Chebyshev coefficients $a_i$ and $b_i$, $i = 0, \ldots, N-1$, are given. Let $a'(x) = \sum_{i=0}^{N-1} a_i x^i$ and $b'(x) = \sum_{i=0}^{N-1} b_i x^i$ be polynomials whose product is denoted by $c'(x) = \sum_{i=0}^{2N-2} c'_i x^i$. If the polynomial $c'(x)$ is computed using Karatsuba's algorithm, then the Chebyshev coefficients $c_i$, $i = 0, \ldots, 2N-2$, of the polynomial $c(x) = a(x) \cdot b(x)$ are obtained from the coefficients $c'_i$, $i = 0, \ldots, 2N-2$, with

$$M_e(N) = \frac{N^2 - 2N^{\log_2 3} + 5N - 2}{2} \tag{9}$$

extra multiplications and

$$A_e(N) \leq \frac{5N^2 - 6N^{\log_2 3} + N(1 - \log_2 N)}{2} \tag{10}$$

extra additions.
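As a point of reference for Section III-A and Equation (7), the following Python sketch implements the three-product Karatsuba recursion for $N = 2^n$ coefficients and then converts the ordinary product coefficients $c'_i$ into Chebyshev coefficients. It is a simplification: the correction terms of Equation (7) are recomputed directly rather than recovered from stored intermediate results through the separation procedure, so it does not attain the operation counts derived in the paper. The function names are illustrative assumptions.

```python
def karatsuba(a, b):
    """Coefficients of a'(x) * b'(x) for coefficient lists of equal length N = 2^n."""
    n = len(a)
    if n == 1:
        return [a[0] * b[0]]
    h = n // 2
    a0, a1 = a[:h], a[h:]                    # a'(x) = A1(x) x^(N/2) + A0(x)
    b0, b1 = b[:h], b[h:]
    low = karatsuba(a0, b0)                  # symbol 00: A0 * B0
    high = karatsuba(a1, b1)                 # symbol 11: A1 * B1
    mid = karatsuba([x + y for x, y in zip(a0, a1)],
                    [x + y for x, y in zip(b0, b1)])
    # symbol 01: (A0 + A1)(B0 + B1) - A0 B0 - A1 B1
    mid = [m - lo - hi for m, lo, hi in zip(mid, low, high)]
    c = [0] * (2 * n - 1)
    for i, v in enumerate(low):
        c[i] += v
    for i, v in enumerate(mid):
        c[i + h] += v
    for i, v in enumerate(high):
        c[i + n] += v
    return c

def cheb_mul_from_ordinary(a, b):
    """Chebyshev coefficients of a(x) * b(x) from c'_i via Equation (7)."""
    n = len(a)
    two_c = karatsuba(a, b)                  # start from the c'_i of Equation (6)
    two_c[0] += 2 * sum(a[l] * b[l] for l in range(1, n))
    for i in range(1, n - 1):                # rows i = 1, ..., N-2
        two_c[i] += sum(a[l] * b[l + i] + a[l + i] * b[l] for l in range(1, n - i))
    return [v / 2 for v in two_c]

# (1 + 2x + 3x^2 + 4x^3)(5 + 6x + 7x^2 + 8x^3) as an ordinary product
assert karatsuba([1, 2, 3, 4], [5, 6, 7, 8]) == [5, 16, 34, 60, 61, 52, 32]
```

For the lists `[2, 1]` and `[4, 3]`, which represent $1 + x$ and $2 + 3x$ in Chebyshev form, the ordinary product is `[8, 10, 3]` and the Chebyshev result is `[7, 5, 1.5]`, in agreement with $c_0 = c'_0/2 + c'_2$, $c_1 = c'_1/2$ and $c_2 = c'_2/2$ for $N = 2$.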

Before presenting the proof of Theorem 1, we introduce some notation and develop examples which make the derivation of Equations (9) and (10) easier to understand. Particularly, we are interested in observing which intermediate terms related to symbols 11, 00 and 01 are produced together. In what follows, terms with such a characteristic are written between $\langle \cdot \rangle$; we omit this notation for single terms $a_i \cdot b_i$.

Example 1: We want to multiply polynomials $a(x)$ and $b(x)$, $N = 2$, whose Chebyshev coefficients $a_i$ and $b_i$ are given. Using Karatsuba's algorithm for computing coefficients $c'_i$, we have

$$\text{11}: \quad c'_2 = A_1 B_1 = a_1 \cdot b_1; \tag{11}$$

$$\text{00}: \quad c'_0 = A_0 B_0 = a_0 \cdot b_0; \tag{12}$$

$$\text{01}: \quad c'_1 = (A_1 + A_0)(B_1 + B_0) - A_1 B_1 - A_0 B_0 = (a_1 + a_0)(b_1 + b_0) - a_1 b_1 - a_0 b_0 = \langle a_0 \cdot b_1 + a_1 \cdot b_0 \rangle. \tag{13}$$

From Equation (7), we directly obtain $c_2 = c'_2/2$, $c_1 = c'_1/2$ and $c_0 = c'_0/2 + c'_2$, because there are no terms to be separated. In this case, extra operations are exclusively due to the scale factor $1/2$ and the addition $c'_0/2 + c'_2$, which results in $M_e(2) = 3$ and $A_e(2) = 1$.

Example 2: A second example is to multiply $a(x)$ and $b(x)$ where $N = 4$. As Karatsuba's algorithm is recursive, in this case, the computation of $A_1 B_1$ and $A_0 B_0$ may be viewed as repetitions of the first example. Therefore, the intermediate terms related to symbols 11 and 00 are

$$\text{11}: \quad c'_6 = a_3 \cdot b_3, \quad c'_5 = \langle a_2 \cdot b_3 + a_3 \cdot b_2 \rangle, \quad a_2 \cdot b_2; \tag{14}$$

$$\text{00}: \quad a_1 \cdot b_1, \quad c'_1 = \langle a_0 \cdot b_1 + a_1 \cdot b_0 \rangle, \quad c'_0 = a_0 \cdot b_0. \tag{15}$$

The computation of $(A_1 + A_0)(B_1 + B_0) - A_1 B_1 - A_0 B_0$ is similar, although special care is needed with terms produced together. More specifically, we have $(A_1 + A_0) = (a_3 + a_1)x + (a_2 + a_0)$ and $(B_1 + B_0) = (b_3 + b_1)x + (b_2 + b_0)$, the product of which produces terms

$$\langle (a_3 + a_1) \cdot (b_3 + b_1) \rangle, \quad \langle (a_3 + a_1) \cdot (b_2 + b_0) + (a_2 + a_0) \cdot (b_3 + b_1) \rangle, \quad \langle (a_2 + a_0) \cdot (b_2 + b_0) \rangle.$$

The subtractions by $A_1 B_1$ and $A_0 B_0$ come from the intermediate terms related to symbols 11 and 00, respectively, in Equations (14) and (15). By subtracting $a_3 \cdot b_3$ and $a_1 \cdot b_1$ from $\langle (a_3 + a_1) \cdot (b_3 + b_1) \rangle$, we obtain $\langle a_1 \cdot b_3 + a_3 \cdot b_1 \rangle$; by subtracting $a_2 \cdot b_2$ and $a_0 \cdot b_0$ from $\langle (a_2 + a_0) \cdot (b_2 + b_0) \rangle$, we get $\langle a_0 \cdot b_2 + a_2 \cdot b_0 \rangle$; by subtracting $\langle a_2 \cdot b_3 + a_3 \cdot b_2 \rangle$ and $\langle a_0 \cdot b_1 + a_1 \cdot b_0 \rangle$ from $\langle (a_3 + a_1) \cdot (b_2 + b_0) + (a_2 + a_0) \cdot (b_3 + b_1) \rangle$, we obtain $\langle a_1 \cdot b_2 + a_2 \cdot b_1 + a_0 \cdot b_3 + a_3 \cdot b_0 \rangle$. Therefore, the final result for the intermediate terms related to symbol 01 is

$$\text{01}: \quad \langle a_1 \cdot b_3 + a_3 \cdot b_1 \rangle, \quad c'_3 = \langle a_1 \cdot b_2 + a_2 \cdot b_1 + a_0 \cdot b_3 + a_3 \cdot b_0 \rangle, \quad \langle a_0 \cdot b_2 + a_2 \cdot b_0 \rangle. \tag{16}$$

We recall that coefficients $c'_i$, $i = 0, \ldots, 6$, are obtained by running Karatsuba's algorithm after all other intermediate terms are computed. However, at this point, we just want to observe the terms that are produced together, for which it is sufficient to perform the first step of the algorithm. In this sense, from Equation (7), we particularly know that

$$c_1 = \frac{c'_1 + (a_1 \cdot b_2 + a_2 \cdot b_1 + a_2 \cdot b_3 + a_3 \cdot b_2)}{2}.$$

Hence, in order to evaluate $c_1$, we need to compute $a_1 \cdot b_2 + a_2 \cdot b_1$, because this term is originally produced together with $a_0 \cdot b_3 + a_3 \cdot b_0$, as shown in Equation (16). Since we know $a_1 \cdot b_1$ and $a_2 \cdot b_2$, this requires one multiplication and four additions because $a_1 \cdot b_2 + a_2 \cdot b_1 = (a_1 + a_2) \cdot (b_1 + b_2) - a_1 \cdot b_1 - a_2 \cdot b_2$.

All other coefficients $c_i$ can be obtained in a similar way. Naturally, we still need to count the other extra operations mentioned before Theorem 1. The final result is $M_e(4) = 8$ and $A_e(4) = 11$.

Remark: In Example 2, we do not need to separate $a_0 \cdot b_3 + a_3 \cdot b_0$. However, this term could be obtained by using one more addition, namely

$$a_0 \cdot b_3 + a_3 \cdot b_0 = \langle a_1 \cdot b_2 + a_2 \cdot b_1 + a_0 \cdot b_3 + a_3 \cdot b_0 \rangle - \langle a_1 \cdot b_2 + a_2 \cdot b_1 \rangle. \tag{17}$$

Although the last step of the separation procedure is not required in Example 2, we do need to use it in multiplications involving larger degree polynomials.

Example 3: In this example, we want to multiply $a(x)$ and $b(x)$ for $N = 8$. As in Example 2, the computation of $A_1 B_1$ and $A_0 B_0$ may be viewed as repetitions of the case $N = 4$. The terms obtained are

$$\begin{aligned}
\text{11}: \quad & c'_{14} = a_7 \cdot b_7, \quad c'_{13} = \langle a_6 \cdot b_7 + a_7 \cdot b_6 \rangle, \quad a_6 \cdot b_6, \quad \langle a_5 \cdot b_7 + a_7 \cdot b_5 \rangle, \\
& c'_{11} = \langle a_5 \cdot b_6 + a_6 \cdot b_5 + a_4 \cdot b_7 + a_7 \cdot b_4 \rangle, \quad \langle a_4 \cdot b_6 + a_6 \cdot b_4 \rangle, \quad a_5 \cdot b_5, \\
& \langle a_4 \cdot b_5 + a_5 \cdot b_4 \rangle, \quad a_4 \cdot b_4;
\end{aligned} \tag{18}$$

$$\begin{aligned}
\text{00}: \quad & a_3 \cdot b_3, \quad \langle a_2 \cdot b_3 + a_3 \cdot b_2 \rangle, \quad a_2 \cdot b_2, \quad \langle a_1 \cdot b_3 + a_3 \cdot b_1 \rangle, \\
& c'_3 = \langle a_1 \cdot b_2 + a_2 \cdot b_1 + a_0 \cdot b_3 + a_3 \cdot b_0 \rangle, \quad \langle a_0 \cdot b_2 + a_2 \cdot b_0 \rangle, \quad a_1 \cdot b_1, \\
& c'_1 = \langle a_0 \cdot b_1 + a_1 \cdot b_0 \rangle, \quad c'_0 = a_0 \cdot b_0.
\end{aligned} \tag{19}$$

The computation of $(A_1 + A_0)(B_1 + B_0) - A_1 B_1 - A_0 B_0$ is also analogous. We give only its final result, that is

$$\begin{aligned}
\text{01}: \quad & \langle a_3 \cdot b_7 + a_7 \cdot b_3 \rangle, \quad \langle a_2 \cdot b_7 + a_7 \cdot b_2 + a_3 \cdot b_6 + a_6 \cdot b_3 \rangle, \quad \langle a_2 \cdot b_6 + a_6 \cdot b_2 \rangle, \\
& \langle a_1 \cdot b_7 + a_7 \cdot b_1 + a_3 \cdot b_5 + a_5 \cdot b_3 \rangle, \\
& c'_7 = \langle a_1 \cdot b_6 + a_6 \cdot b_1 + a_2 \cdot b_5 + a_5 \cdot b_2 + a_0 \cdot b_7 + a_7 \cdot b_0 + a_3 \cdot b_4 + a_4 \cdot b_3 \rangle, \\
& \langle a_0 \cdot b_6 + a_6 \cdot b_0 + a_2 \cdot b_4 + a_4 \cdot b_2 \rangle, \quad \langle a_1 \cdot b_5 + a_5 \cdot b_1 \rangle, \\
& \langle a_0 \cdot b_5 + a_5 \cdot b_0 + a_1 \cdot b_4 + a_4 \cdot b_1 \rangle, \quad \langle a_0 \cdot b_4 + a_4 \cdot b_0 \rangle.
\end{aligned} \tag{20}$$

Let us consider the term $c'_7 = \langle a_1 \cdot b_6 + a_6 \cdot b_1 + a_2 \cdot b_5 + a_5 \cdot b_2 + a_0 \cdot b_7 + a_7 \cdot b_0 + a_3 \cdot b_4 + a_4 \cdot b_3 \rangle$. Terms $a_1 \cdot b_6 + a_6 \cdot b_1$, $a_2 \cdot b_5 + a_5 \cdot b_2$ and $a_3 \cdot b_4 + a_4 \cdot b_3$ need to be separated from $c'_7$ because they must also be added to $c'_5$, $c'_3$ and $c'_1$, in order to compute $c_5$, $c_3$ and $c_1$, respectively. Similarly to the previous example, one multiplication and four additions are necessary for calculating each one of these terms. From the term $\langle a_2 \cdot b_7 + a_7 \cdot b_2 + a_3 \cdot b_6 + a_6 \cdot b_3 \rangle$, which is associated to $c'_9$, we need to separate $a_2 \cdot b_7 + a_7 \cdot b_2$ and $a_3 \cdot b_6 + a_6 \cdot b_3$, and respectively add them to $c'_5$ and $c'_3$, in order to compute $c_5$ and $c_3$. The same procedure is applied to all terms which are previously computed together. After this, other extra operations have to be counted for adding the separated terms to coefficients $c'_i$ and multiplying by the factor $1/2$. This results in $M_e(8) = 24$ and $A_e(8) = 71$.

With the previous examples in mind, we can derive a formula for the number of operations necessary to separate terms produced together in Karatsuba's algorithm. We start by observing the intermediate terms produced by the algorithm, i.e., before obtaining the final result for the coefficients of $c'(x)$. We associate terms of the form $a_i \cdot b_i$ to 0, $\langle a_{i_1} \cdot b_{j_1} + a_{j_1} \cdot b_{i_1} \rangle$ to 1, $\langle a_{i_1} \cdot b_{j_1} + a_{j_1} \cdot b_{i_1} + a_{i_2} \cdot b_{j_2} + a_{j_2} \cdot b_{i_2} \rangle$ to 2, $\langle a_{i_1} \cdot b_{j_1} + a_{j_1} \cdot b_{i_1} + a_{i_2} \cdot b_{j_2} + a_{j_2} \cdot b_{i_2} + a_{i_3} \cdot b_{j_3} + a_{j_3} \cdot b_{i_3} + a_{i_4} \cdot b_{j_4} + a_{j_4} \cdot b_{i_4} \rangle$ to 4, etc. In general, a term of the form

$$\left\langle \sum_{k=1}^{2^t} (a_{i_k} \cdot b_{j_k} + a_{j_k} \cdot b_{i_k}) \right\rangle, \tag{21}$$

where $t \in \mathbb{N}$ and $i_k + j_k$ is a constant for $1 \leq k \leq 2^t$, is associated to the number or status $s = 2^t$. If we consider that all terms in the above expression need to be separated, $s - 1$ extra multiplications are required. Consequently, at most $4(s - 1) + 1$ extra additions are necessary. The upper bound is justified by the possible presence of terms of the form $\langle a_0 \cdot b_i + a_i \cdot b_0 \rangle$, $i \neq 0$, produced together with other terms. They do not need to be separated and, in these cases, one addition is saved; see the remark after Example 2.

After applying the separation procedure just explained, every term has status at most 1, i.e., has the form $a_i \cdot b_i$ or $\langle a_i \cdot b_j + a_j \cdot b_i \rangle$. Such terms are then added to coefficients $c'_i$ according to Equation (7) in order to obtain coefficients $c_i$.

For $N = 1$, we have $a_0 \cdot b_0$ only, which has status 0 and does not represent any extra operation. Since this case is like an "initial state", we associate it to 01. For $N = 2$, we have a repetition of the previous one on terms associated to 11 and 00; see Equations (11) and (12). The symbol 01 is also a repetition of the previous one, but with a status incremented from 0 to 1; see Equation (13). Due to the recursive nature of the algorithm, an analogous fact occurs for $N = 4, 8, \ldots$. This may be verified in Equations (14)–(16) and Equations (18)–(20). This allows us to construct Table I, which shows the status of all terms in Karatsuba's algorithm up to $N = 8$. The last row emphasizes that $m(n)$, the number of multiplications necessary for separating terms that Karatsuba's algorithm computes together, is obtained by summing contributions of terms associated to 11, 01 and 00. These contributions are respectively denoted by $m(n)_{11}$, $m(n)_{01}$ and $m(n)_{00}$. If $N = 4$, for instance, we have $m(n) = m(n)_{01} = 1$ because only the term with status 2 associated to 01 requires a separation procedure (see Table I). Specifically, this term corresponds to $\langle a_1 \cdot b_2 + a_2 \cdot b_1 + a_0 \cdot b_3 + a_3 \cdot b_0 \rangle$, presented in Example 2.
If $N = 8$, we have $m(n)_{00} = 1$ (one term with status 2), $m(n)_{01} = 7$ (four terms with status 2 and one term with status 4) and $m(n)_{11} = 1$ (one term with status 2). These terms may be distinguished in Equations (18)–(20). In this case, $m(n) = 1 + 7 + 1 = 9$. Moreover, by comparing rows for $N = 4$ ($n = 2$) and $N = 8$ ($n = 3$) in Table I, we note that $m(3)_{11} = m(3)_{00} = m(2)$; $m(3)_{01}$ is given by $2\,m(2)_{01}$ plus the contribution of the terms related to $m(2)_{01}$, but with incremented (doubled) statuses. Due to the recursion of Karatsuba's algorithm, this situation is general, that is, $m(n)_{11} = m(n)_{00} = m(n-1)$ and $m(n)_{01}$ is given by $2\,m(n-1)_{01}$ plus the contribution of the terms associated to $m(n-1)_{01}$ with incremented statuses.

Proof of Theorem 1: By using previous notation and remarks, the number of multiplications necessary for separating terms that Karatsuba's algorithm computes together, $m(n)$, is given by

$$m(n) = m(n)_{11} + m(n)_{01} + m(n)_{00}. \tag{22}$$

We know that

$$m(n)_{11} = m(n)_{00} = m(n-1). \tag{23}$$

From the above comments, $m(n)_{01}$ is given by $2\,m(n-1)_{01}$ plus the contribution of the terms related to $m(n-1)_{01}$ with incremented (doubled) statuses. A term with status $s_1 = 2^n$, $n \geq 0$, contributes with $m_{s_1} = 2^n - 1$ extra multiplications. Consequently, a term with status $s_2 = 2 s_1 = 2^{n+1}$ contributes with $m_{s_2} = 2^{n+1} - 1 = 2(2^n - 1) + 1 = 2 m_{s_1} + 1$ extra multiplications. Then, if a set with $t$ terms contributes with $m_t$ extra multiplications, a new set, obtained by doubling the status of each term in the previous set, contributes with $2 m_t + t$ extra multiplications. We note that there are $3^{n-2}$ terms associated to $m(n-1)_{01}$ (see Table I for the cases $n = 1, 2, 3$). Therefore, by doubling the status of each one of these terms, the new contribution is $2\,m(n-1)_{01} + 3^{n-2}$. This allows us to write

$$m(n)_{01} = 2\,m(n-1)_{01} + 2\,m(n-1)_{01} + 3^{n-2} = 4\,m(n-1)_{01} + 3^{n-2}.$$

We also note that $m(n-1)_{01} = m(n-1) - 2\,m(n-2)$. Thus, the above equation may be rewritten as

$$m(n)_{01} = 4\,(m(n-1) - 2\,m(n-2)) + 3^{n-2}. \tag{24}$$

By substituting Equations (23) and (24) in Equation (22), we have

$$m(n) = 2\,m(n-1) + 4\,(m(n-1) - 2\,m(n-2)) + 3^{n-2} = 6\,m(n-1) - 8\,m(n-2) + 3^{n-2}. \tag{25}$$

Equation (25) is a recurrence relation¹ and it can be solved by means of the z-transform. Denoting by $M(z)$ the z-transform of $m(n)$, Equation (25) is written in the z-transform domain as

$$M(z) = 6\,M(z)\,z^{-1} - 8\,M(z)\,z^{-2} + \frac{z^{-2}}{1 - 3z^{-1}}.$$

In the last equation, grouping the terms with $M(z)$, we have

$$M(z) = \frac{z^{-2}}{(1 - 6z^{-1} + 8z^{-2})(1 - 3z^{-1})} = \frac{1/2}{1 - 4z^{-1}} + \frac{1/2}{1 - 2z^{-1}} - \frac{1}{1 - 3z^{-1}}. \tag{26}$$

Applying the inverse z-transform to Equation (26), one obtains

$$m(n) = \frac{4^n + 2^n - 2 \cdot 3^n}{2}.$$

The above equation can be written as a function of $N$ as

$$m(N) = \frac{N^2 + N - 2N^{\log_2 3}}{2}.$$

Adding to $m(N)$ the multiplications due to the scale factor $1/2$, we compute $M_e(N)$, the total number of extra multiplications for computing coefficients $c_i$ from coefficients $c'_i$, by

$$M_e(N) = \frac{N^2 + N - 2N^{\log_2 3}}{2} + 2N - 1 = \frac{N^2 - 2N^{\log_2 3} + 5N - 2}{2}.$$
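The recurrence (25), its closed form and Equation (9) can be cross-checked numerically. The Python sketch below rebuilds the status lists of Table I under one reading of the text, stated here as an assumption: the terms of symbols 11 and 00 at level $n$ repeat the full list of level $n - 1$, while the terms of symbol 01 are that list with every status incremented (0 becomes 1, any other status doubles). The function names are illustrative.

```python
def statuses(n):
    """Statuses of the 3^n intermediate terms for N = 2^n, ordered as 11, 01, 00."""
    if n == 0:
        return [0]
    prev = statuses(n - 1)
    bumped = [max(1, 2 * s) for s in prev]     # 0 -> 1, 1 -> 2, 2 -> 4, ...
    return prev + bumped + prev

def separation_mults(n):
    """m(n): a term of status s >= 1 needs s - 1 multiplications to be separated."""
    return sum(max(0, s - 1) for s in statuses(n))

def m_closed(n):
    """Closed form m(n) = (4^n + 2^n - 2 * 3^n) / 2."""
    return (4**n + 2**n - 2 * 3**n) // 2

def extra_mults(n):
    """M_e(N) of Equation (9) for N = 2^n, using N^(log2 3) = 3^n."""
    N = 2**n
    return (N * N - 2 * 3**n + 5 * N - 2) // 2

for n in range(9):
    assert separation_mults(n) == m_closed(n)
    assert extra_mults(n) == m_closed(n) + 2 * 2**n - 1
    if n >= 2:
        # Equation (25)
        assert m_closed(n) == 6 * m_closed(n - 1) - 8 * m_closed(n - 2) + 3**(n - 2)
```

Under this reading, `statuses(2)` returns `[0, 1, 0, 1, 2, 1, 0, 1, 0]`, which matches the $N = 4$ row of Table I, and `separation_mults(3)` gives the value $m(3) = 9$ quoted for $N = 8$.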

The extra additions come from two sources. The first one is related to the separation procedure. There are four additions per product and at most one more addition per each term with status $\geq 2$; see comments immediately after Equation (21). Given $n$, the total number of terms produced in the first step of Karatsuba's algorithm is $3^n$. Denoting respectively by $S_0(n)$ and $S_1(n)$ the number of terms with status 0 and 1 for such an $n$, we know that $S_{\geq 2}(n)$, the number of terms with status $\geq 2$, is given by

$$S_{\geq 2}(n) = 3^n - S_0(n) - S_1(n). \tag{27}$$

We note that $S_0(n) = 2^n$ and

$$S_1(n) = 2\,S_1(n-1) + S_0(n-1) = 2\,S_1(n-1) + 2^{n-1}.$$

Since the above equation is also a recursion, it may be solved using the z-transform. The result is $S_1(n) = n\,2^{n-1}$. Hence, Equation (27) may be written as $S_{\geq 2}(n) = 3^n - 2^n - n\,2^{n-1}$ and, consequently,

$$S_{\geq 2}(N) = N^{\log_2 3} - N - \frac{N \log_2 N}{2} = N^{\log_2 3} - N \left(1 + \frac{\log_2 N}{2}\right).$$

¹ Curiously, this recurrence relation produces a sequence $m(n)$, $n = 0, 1, 2, \ldots$, which coincides with the number of monotone Boolean functions of $n$ variables with 2 mincuts. It also represents the number of Sperner systems with 2 blocks and some other sequences archived by the "On-line Encyclopedia of Integer Sequences" [13].

TABLE I
STATUS OF ALL TERMS IN KARATSUBA'S ALGORITHM UP TO N = 8. THE NUMBER OF MULTIPLICATIONS m(n) NECESSARY FOR SEPARATING TERMS ORIGINALLY COMPUTED TOGETHER WITH OTHER TERMS IS ALSO PRESENTED.

N = 2^n | 11                        | 01                        | 00                        | m(n)
1       | -                         | 0                         | -                         | 0
2       | 0                         | 1                         | 0                         | 0
4       | 0, 1, 0                   | 1, 2, 1                   | 0, 1, 0                   | 1
8       | 0, 1, 0, 1, 2, 1, 0, 1, 0 | 1, 2, 1, 2, 4, 2, 1, 2, 1 | 0, 1, 0, 1, 2, 1, 0, 1, 0 | 9
        | m(n)_11                   | m(n)_01                   | m(n)_00                   |

Thus, the number of extra additions related to the separation procedure is at most

$$4 \cdot \frac{N^2 + N - 2N^{\log_2 3}}{2} + N^{\log_2 3} - N \left(1 + \frac{\log_2 N}{2}\right) = 2N^2 - 3N^{\log_2 3} + N \left(1 - \frac{\log_2 N}{2}\right). \tag{28}$$

The second source of extra additions is related to operations needed to add terms $a_l \cdot b_l$, $l = 1, \ldots, N-1$, and $a_l \cdot b_{l+i} + a_{l+i} \cdot b_l$, $i = 1, \ldots, N-2$, $l = 1, \ldots, N-1-i$, in Equation (7), which gives

$$N - 1 + \sum_{i=1}^{N-2} (N - 1 - i) = \frac{N(N-1)}{2}. \tag{29}$$

Thus, by summing Equations (28) and (29), we compute $A_e(N)$, the total number of extra additions for computing coefficients $c_i$ from coefficients $c'_i$. One obtains

$$A_e(N) \leq 2N^2 - 3N^{\log_2 3} + N \left(1 - \frac{\log_2 N}{2}\right) + \frac{N(N-1)}{2} = \frac{5N^2 - 6N^{\log_2 3} + N(1 - \log_2 N)}{2}.$$

C. Total Arithmetic Complexity

By using our Karatsuba-based algorithm, the total arithmetic complexity for computing Chebyshev coefficients of the product of two polynomials in Chebyshev form is given by the following theorem.

Theorem 2: Let $a(x)$ and $b(x)$ be polynomials of degree $N - 1$ whose Chebyshev coefficients $a_i$ and $b_i$, $i = 0, \ldots, N-1$, are given. By means of the proposed Karatsuba-based algorithm, Chebyshev coefficients $c_i$, $i = 0, \ldots, 2N-2$, of the polynomial $c(x) = a(x) \cdot b(x)$ are obtained with

$$M_k(N) = \frac{N^2 + 5N - 2}{2} \tag{30}$$

multiplications and

$$A_k(N) \leq \frac{5N^2 + 6N^{\log_2 3} - N(15 + \log_2 N) + 4}{2} \tag{31}$$

additions.

The proof is immediate. Equations (30) and (31) are obtained by adding the number of operations necessary for computing coefficients $c'_i$, presented in Section III-A, to the number of extra operations derived in the last subsection. We remark that the standard application of Karatsuba's algorithm for multiplying polynomials involves $O(N^{\log_2 3})$ arithmetic operations. Here, due to the extra operations, our method has a higher cost of $O(N^2)$.

IV. DISCUSSION AND CONCLUSIONS

From Equations (4), (5), (30) and (31), we construct Table II, in which the total numbers of multiplications and additions for multiplying polynomials in Chebyshev form by the direct (resp. $M_d$ and $A_d$) and our Karatsuba-based (resp. $M_k$ and $A_k$) methods are shown. All the entries in Table II were checked by a Matlab computer simulation. The program counted the number of operations for both the direct and our Karatsuba-based methods.

Although both the direct and our Karatsuba-based methods involve $O(N^2)$ multiplications, the division by 2 in Equation (30) makes a considerable difference. By asymptotically evaluating the ratio $M_k(N)/M_d(N)$, we conclude that half of the multiplications required by the direct method are saved if we use the Karatsuba-based algorithm. This tendency is observed in Table II. As expected, since one of the principles of Karatsuba's algorithm is to exchange multiplications for additions, $A_k(N)$ is larger than $A_d(N)$. More precisely, the ratio $A_k(N)/A_d(N)$ is close to $5/3$ as $N$ increases.

Thus, a coherent comparison between the direct and the proposed methods strongly depends on the computational cost of one multiplication in terms of additions. If we consider that one multiplication costs $r$ additions, the following analysis can be done. Let $N = 2^n$. The total computational cost $T_d(N)$ for multiplying two polynomials of degree $N - 1$ in Chebyshev form by the direct method is measured by

$$T_d(N) = r\,M_d(N) + A_d(N).$$

The total cost $T_k(N)$ using our Karatsuba-based method is

$$T_k(N) = r\,M_k(N) + A_k(N).$$


TABLE II
TOTAL NUMBER OF MULTIPLICATIONS AND ADDITIONS FOR MULTIPLYING POLYNOMIALS IN CHEBYSHEV BASIS BY DIRECT (RESP. M_d AND A_d) AND KARATSUBA-BASED (RESP. M_k AND A_k) METHODS.

N = 2^n | M_d   | M_k  | A_d   | A_k
1       | 2     | 2    | 0     | 0
2       | 7     | 6    | 2     | 5
4       | 23    | 17   | 15    | 35
8       | 79    | 51   | 77    | 171
16      | 287   | 167  | 345   | 733
32      | 1087  | 591  | 1457  | 2971
64      | 4223  | 2207 | 5985  | 11757
128     | 16639 | 8511 | 24257 | 46115

TABLE III
TOTAL NUMBER OF MULTIPLICATIONS AND ADDITIONS FOR MULTIPLYING POLYNOMIALS IN CHEBYSHEV BASIS BY DCT (RESP. M_DCT AND A_DCT) AND KARATSUBA-BASED (RESP. M_k AND A_k) METHODS.

N = 2^n    M_DCT      M_k    A_DCT      A_k
      1        2        2       12        0
      2        7        6       30        5
      4       23       17       81       35
      8       67       51      216      171
     16      179      167      555      733
     32      451      591     1374     2971
     64     1091     2207     3297    11757
    128     2563     8511     7716    46115

A general picture of the ratio $T_d(N)/T_k(N)$ can be obtained by computing

$$\lim_{N \to \infty} \frac{T_d(N)}{T_k(N)} = \lim_{N \to \infty} \frac{r M_d(N) + A_d(N)}{r M_k(N) + A_k(N)}.$$

In order to find the range of $r$ where the Karatsuba-based approach is faster than the direct approach, we substitute the previously derived formulas in the above equation and obtain

$$\frac{2r + 3}{r + 5} > 1,$$

whose solution is $r > 2$.
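The cost model can also be evaluated at finite sizes. The following sketch, which is only an illustration and not part of the original analysis, copies the entries of Table II and computes both total costs for a given $r$, together with the value of $r$ at which the two costs coincide for a fixed $N$. The names `TABLE_II`, `total_costs` and `break_even_r` are hypothetical.

```python
# Entries of Table II: N -> (M_d, M_k, A_d, A_k)
TABLE_II = {
    1: (2, 2, 0, 0),
    2: (7, 6, 2, 5),
    4: (23, 17, 15, 35),
    8: (79, 51, 77, 171),
    16: (287, 167, 345, 733),
    32: (1087, 591, 1457, 2971),
    64: (4223, 2207, 5985, 11757),
    128: (16639, 8511, 24257, 46115),
}


def total_costs(n_terms, r):
    """Return (T_d, T_k) when one multiplication costs r additions."""
    md, mk, ad, ak = TABLE_II[n_terms]
    return r * md + ad, r * mk + ak


def break_even_r(n_terms):
    """Return r such that T_d(N) = T_k(N), or None if M_d(N) = M_k(N)."""
    md, mk, ad, ak = TABLE_II[n_terms]
    if md == mk:
        return None
    # r * M_d + A_d = r * M_k + A_k  =>  r = (A_k - A_d) / (M_d - M_k)
    return (ak - ad) / (md - mk)


for n_terms in TABLE_II:
    print(n_terms, break_even_r(n_terms))
```

For the tabulated sizes, this finite-$N$ threshold lies above the asymptotic value 2 (about 2.69 at $N = 128$), because the lower-order terms of the operation counts still contribute. The bound $r > 2$ should therefore be read as a limit for large $N$.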

Hence, the Karatsuba-based approach is cheaper than the direct approach if one multiplication costs more than two additions. In most applications, one multiplication is significantly more expensive than two additions [10].

Another alternative for performing the operation discussed in this paper is to expand the polynomials in Chebyshev form and rewrite them in monomial form. The product is then computed by applying the standard Karatsuba algorithm and, as a final step, the resulting polynomial is written back in Chebyshev form. In this case, besides the increase in arithmetic complexity, the extra operations for converting polynomials from Chebyshev form to monomial form and vice versa also induce precision restrictions.

It is also pertinent to compare our approach with that proposed in [8], where polynomial multiplication in Chebyshev form is computed in the discrete cosine transform (DCT) domain. In this case, the product of two polynomials of degree $N-1$ is carried out by computing 2N-DCTs. Although the authors of [8] only discuss asymptotic aspects of the arithmetic complexity involved in this method, it is possible to use general formulas and obtain a more precise number of multiplications and additions required by the DCT method. They are respectively denoted by $M_{DCT}(N)$ and $A_{DCT}(N)$ and are given in [14]:

$$M_{DCT}(N) = 3N \log_2 2N - 4N + 3$$

and

$$A_{DCT}(N) = (9N + 3) \log_2 2N - 12N + 12.$$
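As a consistency check, the two DCT counts can be evaluated for $N = 2^n$ and compared with the entries of Table III. The sketch below is an illustration with hypothetical function names. It uses the linear term $-12N$ in $A_{DCT}(N)$, which is the value that reproduces every $A_{DCT}$ entry of Table III, and it relies on the fact that $\log_2 2N = n + 1$ equals the bit length of $N$ when $N$ is a power of two.

```python
def log2_of_2n(n_terms):
    """Return log2(2N) as an exact integer for N a power of two."""
    if n_terms < 1 or n_terms & (n_terms - 1):
        raise ValueError("N must be a power of two")
    return n_terms.bit_length()          # N = 2**n  ->  n + 1


def dct_multiplications(n_terms):
    """M_DCT(N) = 3N log2(2N) - 4N + 3."""
    return 3 * n_terms * log2_of_2n(n_terms) - 4 * n_terms + 3


def dct_additions(n_terms):
    """A_DCT(N) = (9N + 3) log2(2N) - 12N + 12 (matches Table III)."""
    return (9 * n_terms + 3) * log2_of_2n(n_terms) - 12 * n_terms + 12


# Entries of Table III: N -> (M_DCT, A_DCT)
TABLE_III_DCT = {
    1: (2, 12), 2: (7, 30), 4: (23, 81), 8: (67, 216),
    16: (179, 555), 32: (451, 1374), 64: (1091, 3297), 128: (2563, 7716),
}

for n_terms, expected in TABLE_III_DCT.items():
    computed = (dct_multiplications(n_terms), dct_additions(n_terms))
    print(n_terms, computed, computed == expected)
```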

By observing Table III, which compares the DCT and our Karatsuba-based methods, we note that the former uses fewer arithmetic operations for N ≥ 32. For N = 16, a coherent comparison depends on the cost r of one multiplication in terms of additions. Since a DCT implementation requires multiplications by cosines of arcs, precision restrictions must also be considered. On the other hand, in the Karatsuba-based method, besides products among the coefficients $a_i$ and $b_i$, only products by 1/2 are necessary, which makes this aspect less critical. Hence, for N < 16, which covers several practical applications of Chebyshev expansions, the Karatsuba-based method should be used. For instance, in [2], [3] and [4], Chebyshev expansions with 4 ≤ N ≤ 6, 5 ≤ N ≤ 13 and 3 ≤ N ≤ 5 are used, respectively. For larger N, if precision is not a problem, the DCT method should be used.

We remark that the space required by our algorithm is slightly larger than that required by the other algorithms. However, our method should be employed for intermediate sizes, where this larger memory requirement is not a problem.

Although this paper is not focused on hardware implementations of the proposed method, there is a relevant remark concerning this aspect. Except for some multiplications by 1/2, all extra operations needed for computing the coefficients $c_i$ from the coefficients $c'_i$ can be implemented in parallel with the standard Karatsuba algorithm. Thus, our method can be considerably sped up.

ACKNOWLEDGMENT

Juliano B. Lima performed this work while at the School of Mathematics and Statistics, Carleton University. He was supported by Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) under Grant 0599-07-7. Both Daniel Panario and Qiang Wang are supported in part by NSERC of Canada.

REFERENCES

[1] A. V. Oppenheim, R. W. Schafer, and J. R. Buck, Discrete-Time Signal Processing, Prentice-Hall, Englewood Cliffs, NJ, 2nd edition, 1999.
[2] I. Sarkas, D. Mavridis, M. Papamichail, and G. Papadopoulos, “Volterra analysis using Chebyshev series,” in Proc. IEEE Int. Symposium on Circuits and Systems (ISCAS’2007), May 2007, pp. 1931–1934.
[3] P. J. Chiang, C. P. Yu, and H. C. Chang, “Robust calculation of chromatic dispersion coefficients of optical fibers from numerically determined effective indices using Chebyshev-Lagrange interpolation polynomials,” Journal of Lightwave Technology, vol. 24, no. 11, pp. 4411–4416, Nov. 2006.
[4] A. Ashrafi, R. Adhami, L. Joiner, and P. Kaveh, “Arbitrary waveform DDFS utilizing Chebyshev polynomials interpolation,” IEEE Transactions on Circuits and Systems–I: Regular Papers, vol. 51, no. 8, pp. 1468–1475, Aug. 2004.
[5] G. Cuypers, G. Ysebaert, M. Moonen, and F. Pisoni, “Chebyshev interpolation for DMT modems,” in Proc. IEEE Int. Conference on Communications (ICC’2004), June 2004, pp. 2736–2740.
[6] J. C. Mason and D. C. Handscomb, Chebyshev Polynomials, Chapman & Hall/CRC, Boca Raton, FL, 1st edition, 2003.
[7] G. H. Rawitscher and I. Koltracht, “An efficient numerical spectral method for solving the Schrödinger equation,” Computing in Science & Engineering, vol. 7, no. 6, pp. 58–66, Nov.-Dec. 2005.
[8] G. Baszenski and M. Tasche, “Fast polynomial multiplication and convolutions related to the discrete cosine transform,” Linear Algebra Appl., vol. 252, no. 1-3, pp. 1–25, Feb. 1997.
[9] A. Karatsuba and Y. Ofman, “Multiplication of many-digital numbers by automatic computers,” Doklady Akad. Nauk SSSR, vol. 145, pp. 293–294, 1962. Translation in Physics-Doklady, no. 7, pp. 595–596, 1963.
[10] J. von zur Gathen and J. Gerhard, Modern Computer Algebra, Cambridge University Press, Cambridge, United Kingdom, 2nd edition, 2003.
[11] P. L. Montgomery, “Five, six, and seven-term Karatsuba-like formulae,” IEEE Transactions on Computers, vol. 54, no. 3, pp. 362–369, Mar. 2005.
[12] C. Paar, “A new architecture for a parallel finite field multiplier with low complexity based on composite fields,” IEEE Transactions on Computers, vol. 45, no. 7, pp. 856–861, July 1996.
[13] N. J. A. Sloane, “The on-line encyclopedia of integer sequences,” http://www.research.att.com/~njas/sequences/A016269.
[14] S. C. Chan and K. L. Ho, “Direct method for computing sinusoidal transforms,” IEEE Proceedings, vol. 137, no. 6, pp. 433–442, Dec. 1990.

