
IEICE TRANS. FUNDAMENTALS, VOL.E88–A, NO.5 MAY 2005


PAPER

Special Section on Discrete Mathematics and Its Applications

Fast Implementation of Extension Fields with TypeII ONB and Cyclic Vector Multiplication Algorithm

Yasuyuki NOGAMI†a), Member, Shigeru SHINONAGA†, Student Member, and Yoshitaka MORIKAWA†, Member

SUMMARY This paper proposes an extension field named TypeII AOPF. This extension field adopts the TypeII optimal normal basis, the cyclic vector multiplication algorithm (CVMA), and the Itoh-Tsujii inversion algorithm. The calculation costs for a multiplication and an inversion in this field are clearly given in terms of the extension degree. For example, the arithmetic operations in TypeII AOPF $F_{p^5}$ are about 20% faster than those in OEF $F_{p^5}$. Then, since CVMA is suitable for parallel processing, we show that TypeII AOPF is superior to AOPF with respect to parallel processing, and that a multiplication in TypeII AOPF becomes about twice as fast when the CVMA computation is parallelized.
key words: extension field, public-key cryptosystem, fast implementation, optimal extension field, optimal normal basis

1. Introduction

In the modern information-oriented society, public key cryptosystems play a key role. The Rivest-Shamir-Adleman (RSA) cryptosystem has been the most widely used; however, its key length reaches about 2000 bits, so it is not very practical to implement RSA on scarce computational resources such as IC cards and mobile devices. On the other hand, the elliptic curve cryptosystem (ECC) [1] and the XTR-based cryptosystem [2] require shorter keys, and their encryption and decryption are carried out faster than those of RSA, where XTR is the abbreviation of Efficient and Compact Subgroup Trace Representation. Therefore, ECC and the XTR-based cryptosystem have received much attention and many applications have been proposed [3], [4]. This paper focuses on the fact that they adopt a large finite field as the definition field. In particular, as the definition field of ECC, an extension field whose extension degree is a prime number is recommended for security reasons. In order to implement these cryptosystems and their applications efficiently, this paper proposes an extension field named TypeII AOPF in which a multiplication and an inversion can be calculated fast.

In the research area of large extension field implementation, multiplication and inversion play a key role. As conventional methods, the optimal extension field (OEF) [5], the all one polynomial field (AOPF) [6], the generalized OEF (GOEF) [7], and the tower field technique [8], [9] are well known. In these extension fields, a multiplication and an inversion can be calculated fast because they adopt a special prime number and a special irreducible polynomial as the characteristic and the modular polynomial, respectively. In addition, fast multiplication and inversion algorithms have been devised for these extension fields. For example, OEF adopts a pseudo Mersenne prime number as the characteristic, an irreducible binomial as the modular polynomial, the Karatsuba method [10] for polynomial multiplication, and the Itoh-Tsujii inversion algorithm (ITA) [11]. On the other hand, AOPF adopts a pseudo Mersenne prime number as the characteristic, the TypeI optimal normal basis (ONB) [12], the cyclic vector multiplication algorithm (CVMA), and an ITA-based inversion that uses CVMA. The extension degree of AOPF must be a certain even number [6]; therefore, AOPF cannot have odd prime extension degrees.

This paper proposes an extension field that adopts TypeII ONB [12] as the basis and CVMA for multiplication. We call this extension field TypeII AOPF. First, we go over the fundamentals of OEF, AOPF, ITA, and the following properties of CVMA: 1) the calculation cost for CVMA is clearly given as the number of additions and that of multiplications in the prime field, 2) CVMA is quite effective when both the multiplier and the multiplicand vectors are self reciprocal vectors, where a self reciprocal vector has a symmetric vector representation, 3) the extension degree for CVMA is preferred to be small [6], and 4) CVMA is suitable for parallel processing. Based on the property that an element in TypeII AOPF is represented as a self reciprocal vector, we modify the CVMA for multiplication in TypeII AOPF. Then, we show a multiplication and an ITA-based inversion in TypeII AOPF. Our main proposal is to incorporate CVMA into a multiplication in TypeII AOPF.

TypeII AOPF has the following features: 1) TypeII AOPF adopts TypeII ONB as the basis, 2) it is possible for the extension degree of TypeII AOPF to be an odd number, of course an odd prime number, and 3) the calculation costs for a multiplication and an inversion in TypeII AOPF are clearly given because these operations are based on CVMA and ITA. In this paper, we evaluate the calculation costs for a multiplication and an inversion in TypeII AOPF $F_{p^m}$ by counting the number of additions, that of multiplications, and that of inversions in the prime field $F_p$, where p is the characteristic. Then, we compare these costs to those in OEF, AOPF, and GOEF. In addition, we simulate a multiplication and an inversion in TypeII AOPF on a PentiumIII (800 MHz) processor with C language. Then, we show that a multiplication and an inversion in TypeII AOPF are about 20% faster than those in OEF. Since CVMA is suitable for parallel processing, we also show that TypeII AOPF is superior to AOPF with respect to parallel processing. A multiplication in TypeII AOPF becomes about twice as fast by parallelizing the CVMA computation, where we used Pentium SSE2 technology (Pentium4, 1.7 GHz) for experiments.

Throughout this paper, #SADD, #SMUL, and #SINV denote the number of additions, that of multiplications, and that of inversions in $F_p$, respectively. p and m denote the characteristic and the extension degree, respectively, where p is a prime number. $F_{p^m}$ denotes an m-th extension field over $F_p$ and $F^{*}_{p^m}$ denotes the multiplicative group in $F_{p^m}$. Without any additional explanation, lower and upper case letters denote elements in the prime field and the extension field, respectively, and a Greek letter denotes a zero of the modular polynomial.

Manuscript received August 17, 2004. Manuscript revised November 15, 2004. Final manuscript received December 27, 2004.
† The authors are with Communication Network Engineering, Okayama University, Okayama-shi, 700-8530 Japan.
a) E-mail: [email protected]
DOI: 10.1093/ietfec/e88–a.5.1200
Copyright © 2005 The Institute of Electronics, Information and Communication Engineers

2. Fundamentals

We briefly go over the fundamentals of the well-known optimal extension field (OEF) [5] and all-one polynomial field (AOPF) [6]. Then, we introduce the cyclic vector multiplication algorithm (CVMA) [6] and the Itoh-Tsujii inversion algorithm (ITA) [11], where AOPF adopts CVMA and both OEF and AOPF adopt ITA.

2.1 Optimal Extension Field

OEF is an extension field $F_{p^m}$ proposed by Bailey et al., and OEF adopts the following characteristic, modular polynomial, and basis so as to fast implement multiplication and inversion in OEF $F_{p^m}$;

characteristic p : a pseudo Mersenne prime number in the form of $p = 2^l \pm c$, $l/2 \ge \log_2 c$, where l is less than the word size of the processor concerned.
modular polynomial : irreducible binomial;
$$x^m - s, \quad s \in F_p. \tag{1}$$
basis : a polynomial basis;
$$\{1, \omega, \omega^2, \cdots, \omega^{m-1}\}, \tag{2}$$
where ω is a zero of the modular polynomial.

In the case of c = 1, that is $p = 2^l \pm 1$, it is called TypeI. In the case that the modular polynomial is $x^m - 2$, it is called TypeII. TypeI and TypeII OEFs achieve more effective implementations. As shown in Bailey et al. [5], OEF adopts the Karatsuba method [10] and ITA for multiplication and inversion, respectively.

2.2 All One Polynomial Field

All-one polynomial field (AOPF) is an extension field $F_{p^{2m}}$, where the extension degree of AOPF must be a certain even number as described below. AOPF adopts the following characteristic, modular polynomial, and basis so as to fast implement multiplication and inversion in AOPF $F_{p^{2m}}$;

characteristic p : a pseudo Mersenne prime number in the form of $p = 2^l \pm c$, $l/2 \ge \log_2 c$, where l is less than the word size of the processor concerned.
modular polynomial : all one polynomial;
$$(x^{2m+1} - 1)/(x - 1), \tag{3}$$
where it must be irreducible.
basis : a pseudo polynomial basis;
$$\{\omega, \omega^2, \cdots, \omega^{2m-1}, \omega^{2m}\}, \tag{4}$$
where ω is a zero of the modular polynomial.

The pseudo polynomial basis Eq. (4) is equivalent to the following normal basis;
$$\{\omega, \omega^{p}, \omega^{p^2}, \cdots, \omega^{p^{2m-1}}\}, \tag{5}$$
such a normal basis is generally called TypeI Optimal Normal Basis (ONB) and it is suitable for fast arithmetic operations in the extension field. $(x^{2m+1} - 1)/(x - 1)$ must satisfy the following conditions so as to be irreducible over $F_p$; 1) 2m + 1 is a prime number, 2) p is a primitive element in $F_{2m+1}$. Accordingly, the extension degree of AOPF must be an even number; therefore, AOPF cannot have odd prime extension degrees. In AOPF, we calculate a multiplication by CVMA and an inversion by ITA.

2.3 Cyclic Vector Multiplication Algorithm (CVMA)

CVMA [6] uses the following two relations;
$$\omega^{2m+1} = 1, \quad \omega + \omega^2 + \cdots + \omega^{2m} = -1, \tag{6}$$
where ω is a zero of $(x^{2m+1} - 1)/(x - 1)$, that is the modular polynomial of AOPF, and the basis Eq. (4) consists of conjugates of ω as shown in Eq. (5). Now, let us consider two vectors X and Y in $F_{p^{2m}}$ that are represented by the basis Eq. (4) as
$$X = (x_1, x_2, \cdots, x_{2m}), \quad Y = (y_1, y_2, \cdots, y_{2m}), \tag{7}$$
where $x_i, y_i \in F_p$ and $2m \ge i \ge 1$. Supposing the product Z of X and Y as follows;
$$Z = XY = (z_1, z_2, \cdots, z_{2m}), \tag{8}$$
where $z_i \in F_p$ and $2m \ge i \ge 1$, according to CVMA [6], we calculate the following $q_j$;
$$q_j = \sum_{k=1}^{m} \left(x_{\langle 2^{-1}j+k\rangle} - x_{\langle 2^{-1}j-k\rangle}\right)\left(y_{\langle 2^{-1}j+k\rangle} - y_{\langle 2^{-1}j-k\rangle}\right), \tag{9a}$$
where $2m \ge j \ge 0$. Then, we have the coefficient $z_i$ as
$$z_i = q_0 - q_i, \quad 2m \ge i \ge 1, \tag{9b}$$
where the subscript $\langle\cdot\rangle$ means $\cdot \bmod 2m + 1$, and a coefficient with subscript 0 is regarded as 0, as can be seen from the example below. In the case that m = 2, CVMA is written as
$$q_0 = (x_1 - x_4)(y_1 - y_4) + (x_2 - x_3)(y_2 - y_3), \tag{10a}$$
$$z_1 = q_0 - \{(x_2 - x_4)(y_2 - y_4) + x_1 y_1\}, \tag{10b}$$
$$z_2 = q_0 - \{(x_3 - x_4)(y_3 - y_4) + x_2 y_2\}, \tag{10c}$$
$$z_3 = q_0 - \{(x_1 - x_2)(y_1 - y_2) + x_3 y_3\}, \tag{10d}$$
$$z_4 = q_0 - \{(x_1 - x_3)(y_1 - y_3) + x_4 y_4\}. \tag{10e}$$

As a difference between Karatsuba-based multiplication and CVMA, we can clearly evaluate the calculation cost for CVMA in AOPF $F_{p^{2m}}$ as [6]
$$\#\mathrm{SMUL} = m(2m + 1), \quad \#\mathrm{SADD} = 6m^2 - 3m - 1, \tag{11}$$
because CVMA is based on Eqs. (9). Moreover, as we can easily find from Eqs. (9), CVMA is quite effective when both the multiplier and the multiplicand vectors are self reciprocal vectors [6], where a self reciprocal vector is defined as follows;

Definition 1: For a vector $A = (a_1, a_2, \cdots, a_{2m})$ in $F_{p^{2m}}$, we call $A^{*} = (a_{2m}, \cdots, a_2, a_1)$ the reciprocal vector of A. If $A = A^{*}$, in this paper, we say that A is a self reciprocal vector.

When both the multiplier and the multiplicand vectors are self reciprocal vectors, the calculation cost becomes
$$\#\mathrm{SMUL} = \frac{m(m + 1)}{2}, \quad \#\mathrm{SADD} = 2m(m - 1), \tag{12}$$
it is only about one-fourth of Eq. (11) [6]. CVMA does not depend on whether or not the all one polynomial $(x^{2m+1} - 1)/(x - 1)$ is irreducible over $F_p$ because Eq. (9a) and Eq. (9b) are theoretically deduced from Eqs. (6), where Eqs. (6) are given only from the fact that ω is a zero of $(x^{2m+1} - 1)/(x - 1)$ [6]. For fast implementation, the extension degree for CVMA is preferred to be small [6]. In addition, as shown in Eq. (9a) and Eqs. (10), we can easily parallelize the CVMA calculations; that is, CVMA is suitable for parallel processing.
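As an illustration of Eqs. (9a) and (9b), the following is a minimal Python sketch, not the authors' C implementation. The function names `cvma` and `schoolbook` are hypothetical, the coefficient with subscript 0 is taken as 0 (as Eqs. (10) suggest), and the reference routine simply expands the product and reduces it with the two relations of Eqs. (6).

```python
def cvma(x, y, p):
    """Sketch of Eqs. (9a), (9b): X = (x_1..x_2m), Y = (y_1..y_2m) over F_p."""
    n = len(x) + 1              # n = 2m + 1
    m = len(x) // 2
    inv2 = (n + 1) // 2         # 2^{-1} mod (2m + 1), valid because n is odd
    xs = [0] + list(x)          # index 0 holds the implicit coefficient 0
    ys = [0] + list(y)

    def q(j):
        h = (inv2 * j) % n      # <2^{-1} j>
        return sum((xs[(h + k) % n] - xs[(h - k) % n]) *
                   (ys[(h + k) % n] - ys[(h - k) % n])
                   for k in range(1, m + 1)) % p

    q0 = q(0)
    return [(q0 - q(i)) % p for i in range(1, n)]


def schoolbook(x, y, p):
    """Reference product using w^(2m+1) = 1 and 1 = -(w + ... + w^2m)."""
    n = len(x) + 1
    c = [0] * n
    for a, xa in enumerate(x, start=1):
        for b, yb in enumerate(y, start=1):
            c[(a + b) % n] += xa * yb
    # the constant term c[0] is rewritten with 1 = -(w + ... + w^2m)
    return [(c[i] - c[0]) % p for i in range(1, n)]
```

For example, with m = 2, X = (1, 2, 3, 4), Y = (5, 6, 7, 8) and p = 2^17 − 1, both routines return the vector that Eqs. (10) give by hand, namely (1, −3, −12, −26) reduced modulo p.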

2.4 Itoh-Tsujii Inversion Algorithm (ITA)

Itoh-Tsujii inversion algorithm (ITA) [11] calculates the inverse $A^{-1}$ of a non-zero element A in $F_{p^m}$ by
$$A^{-1} = n_A^{-1}\left(A^{p} A^{p^2} \cdots A^{p^{m-1}}\right), \tag{13a}$$
$$n_A = A\left(A^{p} A^{p^2} \cdots A^{p^{m-1}}\right), \tag{13b}$$
where $n_A$ is the norm of A with respect to $F_p$; therefore, $n_A$ belongs to $F^{*}_p$ because $A \ne 0$. The following mapping φ is called Frobenius mapping (FM),
$$\phi : A \rightarrow A^{p}. \tag{14}$$
For a FM, OEF needs several $F_p$-multiplications [5], [6]; however, AOPF does not need any arithmetic operations because the basis of AOPF is a normal basis. Therefore, ITA is quite effective in these fields. By applying a certain addition chain to Eqs. (13) [11], we calculate $A^{-1}$ with the following calculation cost;
$$\#\mathrm{FM} = \#\mathrm{VMUL} = \lfloor \log_2 (m - 1) \rfloor + w(m - 1), \tag{15a}$$
$$\#\mathrm{SMUL} = m, \quad \#\mathrm{SINV} = 1, \tag{15b}$$
where $\lfloor\cdot\rfloor$ is the maximum integer less than or equal to $\cdot$ and $w(\cdot)$ is the Hamming weight of $\cdot$. #FM and #VMUL mean the number of FMs and that of multiplications in $F_{p^m}$, respectively. Both OEF and AOPF adopt the binary extended Euclidean algorithm (BEEA) [10] for an inversion in $F_p$.

3. TypeII All One Polynomial Field

First, we introduce TypeII ONB and define an extension field $F_{p^m}$ named TypeII AOPF. Then, we show a relation between a self reciprocal vector introduced in Sect. 2.3 and an element in TypeII AOPF. After that, we modify the CVMA for a multiplication in TypeII AOPF $F_{p^m}$, then we show a multiplication and an ITA-based inversion in TypeII AOPF $F_{p^m}$.

3.1 TypeII Optimal Normal Basis

TypeII ONB is defined as follows [12];

Definition 2: Let us consider the set Eq. (16a) that consists of m elements, where ω is a zero of the all one polynomial $(x^{2m+1} - 1)/(x - 1)$ over $F_p$;
$$\{\omega + \omega^{-1}, \omega^2 + \omega^{-2}, \cdots, \omega^m + \omega^{-m}\}. \tag{16a}$$
If m and p satisfy the following condition 1 and either 2a or 2b, Eq. (16a) forms a basis in $F_{p^m}$. In addition, since Eq. (16a) is equivalent to the normal basis Eq. (16b), it is generally called TypeII ONB, where $\beta = \omega + \omega^{-1}$ and $\omega^{-1} = \omega^{2m}$.
1: 2m + 1 is a prime number.
2a: p is a primitive element in $F_{2m+1}$.
2b: 2 | (m − 1) and the order of p in $F_{2m+1}$ is m.
$$\{\beta, \beta^{p}, \cdots, \beta^{p^{m-1}}\}. \tag{16b}$$

Since 2m + 1 is a prime number, the order of ω is 2m + 1. Therefore, the 2m elements shown in Eq. (4) are different from each other. As shown in Sect. 2.3, if m and p satisfy the conditions 1 and 2a, the set Eq. (4) forms a pseudo polynomial basis in $F_{p^{2m}}$, that is TypeI ONB. Regardless of whether 2a or 2b is satisfied, the set Eq. (4) is equivalent to the set Eq. (17) in order.
$$\{\omega, \omega^2, \cdots, \omega^m, \omega^{-m}, \cdots, \omega^{-2}, \omega^{-1}\}. \tag{17}$$
Since TypeII AOPF adopts TypeII ONB as the basis and TypeII ONB is a normal basis, a FM in TypeII AOPF does not need any arithmetic operations.
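The conditions of Definition 2 are easy to test for a candidate pair (p, m). The following Python sketch is only an illustration under stated assumptions: the helper names `is_prime`, `mult_order`, and `type2_onb_condition` are hypothetical, trial division is used because 2m + 1 is small, and the primality of p itself is not checked.

```python
def is_prime(n):
    """Trial division; adequate for the small modulus 2m + 1."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def mult_order(a, n):
    """Order of a in the multiplicative group of F_n (n prime, a not 0 mod n)."""
    a %= n
    k, t = 1, a
    while t != 1:
        t = t * a % n
        k += 1
    return k


def type2_onb_condition(p, m):
    """Return '2a' or '2b' when condition 1 and that condition hold, else None."""
    n = 2 * m + 1
    if not is_prime(n) or p % n == 0:
        return None                      # condition 1 fails (or p vanishes in F_n)
    e = mult_order(p, n)
    if e == 2 * m:
        return "2a"                      # p is a primitive element in F_{2m+1}
    if (m - 1) % 2 == 0 and e == m:
        return "2b"                      # 2 | (m - 1) and the order of p is m
    return None
```

Under these assumptions, the characteristics used later in the simulations, such as p = 2^17 − 1 with m = 5 and p = 2^24 − 3 with m = 6, fall under condition 2a.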



3.2 Definition of TypeII AOPF

We define an extension field $F_{p^m}$ named TypeII AOPF as follows;

characteristic p : a pseudo Mersenne prime number in the form of $p = 2^l \pm c$, $l/2 \ge \log_2 c$, where l is less than the word size of the processor concerned.
modular polynomial : all one polynomial;
$$(x^{2m+1} - 1)/(x - 1). \tag{18}$$
basis : TypeII optimal normal basis;
$$\{\omega + \omega^{-1}, \omega^2 + \omega^{-2}, \cdots, \omega^m + \omega^{-m}\}, \tag{19}$$
where ω is a zero of the modular polynomial.

Since TypeII AOPF adopts TypeII ONB, the conditions shown in Def. 2 must be satisfied. It is noted that the above modular polynomial is not irreducible when the conditions 1 and 2b are satisfied. In this definition, the modular polynomial only means that its zero ω constructs the TypeII ONB Eq. (19). It is possible for the extension degree m of TypeII AOPF $F_{p^m}$ to be an odd number, of course an odd prime number. In Sect. 3.4 and Sect. 3.5, we show a multiplication and an inversion in TypeII AOPF. Our proposal is to incorporate CVMA into a multiplication in TypeII AOPF.

3.3 A Relation between a Self Reciprocal Vector and an Element in TypeII AOPF

Let us suppose that an arbitrary element X in TypeII AOPF $F_{p^m}$ is represented by TypeII ONB Eq. (19) as
$$X = \sum_{i=1}^{m} x_i(\omega^i + \omega^{-i}), \tag{20a}$$
where $x_i \in F_p$ and $m \ge i \ge 1$. Since Eq. (4) is equivalent to Eq. (17) and $\omega^{-1} = \omega^{2m}$, we can represent X as
$$X = x_1\omega + x_2\omega^2 + \cdots + x_m\omega^m + x_m\omega^{m+1} + \cdots + x_2\omega^{2m-1} + x_1\omega^{2m}, \tag{20b}$$
that is a self reciprocal vector introduced in Sect. 2.3. Therefore, we can deal with an arbitrary element in TypeII AOPF as a self reciprocal vector. As shown in Sect. 3.4, we calculate a multiplication in TypeII AOPF by modifying CVMA. The most important point is to represent an arbitrary element in TypeII AOPF $F_{p^m}$ as a self reciprocal vector Eq. (20b) with 2m coefficients. Incidentally, when the conditions 1 and 2a in Def. 2 are satisfied, $\{\omega, \omega^2, \cdots, \omega^{2m-1}, \omega^{2m}\}$ forms TypeI ONB that is the basis of AOPF $F_{p^{2m}}$. Therefore, we find that a self reciprocal vector in AOPF $F_{p^{2m}}$ is an element in its proper subfield $F_{p^m}$ [6].

3.4 Multiplication in TypeII AOPF

Based on the discussion in Sect. 3.3 and using Eq. (6), we modify the CVMA for TypeII ONB representation. In the same way as Eqs. (20), let us suppose an arbitrary element Y in TypeII AOPF $F_{p^m}$ as follows;
$$Y = \sum_{i=1}^{m} y_i(\omega^i + \omega^{-i}), \tag{21}$$
where $y_i \in F_p$ and $m \ge i \ge 1$, and we consider the product Z as shown in Eq. (8) by using CVMA Eqs. (9). Since X and Y are self reciprocal vectors as shown in Eq. (20b) and Eq. (21), we can find that $q_0$ becomes 0 from Eq. (9a). Therefore, we do not have to calculate $q_0$. In addition, for $m \ge j \ge 1$ and Eq. (7), the following relations hold;
$$x_{\langle j\rangle} = x_{\langle -j\rangle} = x_{2m+1-j}, \tag{22a}$$
$$y_{\langle j\rangle} = y_{\langle -j\rangle} = y_{2m+1-j}, \tag{22b}$$
where the subscript $\langle\cdot\rangle$ means $\cdot \bmod 2m + 1$. Therefore, for Eq. (9a), as shown in Appendix we obtain
$$q_{\langle j\rangle} = q_{\langle -j\rangle} = q_{2m+1-j}. \tag{22c}$$
From Eq. (22c) and $q_0 = 0$, for Eq. (8) we have
$$z_{\langle j\rangle} = z_{\langle -j\rangle} = z_{2m+1-j}. \tag{22d}$$
It is obvious that the product Z given by multiplying two self reciprocal vectors X and Y also becomes a self reciprocal vector as follows;
$$Z = z_1\omega + z_2\omega^2 + \cdots + z_m\omega^m + z_m\omega^{m+1} + \cdots + z_2\omega^{2m-1} + z_1\omega^{2m}. \tag{23}$$
Consequently, we can calculate Z = XY by
$$z_j = -\sum_{k=1}^{m} \left(x_{\langle 2^{-1}j+k\rangle} - x_{\langle 2^{-1}j-k\rangle}\right)\left(y_{\langle 2^{-1}j+k\rangle} - y_{\langle 2^{-1}j-k\rangle}\right) = \sum_{k=1}^{m} \left(x_{\langle 2^{-1}j-k\rangle} - x_{\langle 2^{-1}j+k\rangle}\right)\left(y_{\langle 2^{-1}j+k\rangle} - y_{\langle 2^{-1}j-k\rangle}\right), \tag{24}$$
where $m \ge j \ge 1$. As an example of CVMA for TypeII ONB representation, let us consider the case that the extension degree m = 2. Supposing two arbitrary elements X, Y in TypeII AOPF $F_{p^2}$ as
$$X = x_1(\omega + \omega^{-1}) + x_2(\omega^2 + \omega^{-2}), \tag{25a}$$
$$Y = y_1(\omega + \omega^{-1}) + y_2(\omega^2 + \omega^{-2}). \tag{25b}$$
According to Eq. (24), we can calculate the product Z of X and Y by
$$Z = z_1(\omega + \omega^{-1}) + z_2(\omega^2 + \omega^{-2}), \tag{26}$$
$$z_1 = (x_1 - x_2)(y_2 - y_1) - x_1 y_1, \tag{27a}$$
$$z_2 = (x_1 - x_2)(y_2 - y_1) - x_2 y_2, \tag{27b}$$
where we calculate $(x_1 - x_2)(y_2 - y_1)$ in Eqs. (27) only once and use it twice. In what follows, we refer to the calculation Eq. (24) as CVMA in TypeII AOPF. As previously described, CVMA in TypeII AOPF does not need the $q_0$ calculation; therefore, it is more suitable for parallel processing than the original CVMA.

3.5 Inversion in TypeII AOPF

We calculate an inversion in TypeII AOPF by using ITA with CVMA as follows;

Inversion in TypeII AOPF $F_{p^m}$
Input: Non-zero element A in TypeII AOPF $F_{p^m}$
Output: Multiplicative inverse $A^{-1}$
Step1: Calculate $B = A^{p} A^{p^2} \cdots A^{p^{m-1}}$ by applying a certain addition chain [11] and CVMA in TypeII AOPF.
Step2: Calculate $n_A = A \cdot B$ by CVMA in TypeII AOPF. Since $A \ne 0$ and $n_A$ is the norm of A with respect to $F_p$, $n_A$ becomes a non-zero element in $F_p$. Since the basis of TypeII AOPF is TypeII ONB and Eq. (6) holds, $n_A$ is also represented as a vector with m coefficients over the prime field $F_p$ as follows;
$$(-n_A, -n_A, \cdots, -n_A). \tag{28}$$
Therefore, it is enough to calculate only one coefficient of the above vector representation by using Eq. (24).
Step3: Calculate $c = n_A^{-1}$ by BEEA [10].
Step4: Calculate $A^{-1} = c \cdot B$ by scalar multiplications.
(End of algorithm)

As previously mentioned, we note that a FM in TypeII AOPF does not need any arithmetic operations. As shown above, ITA needs several multiplications in the extension field $F_{p^m}$ and one inversion in the prime field $F_p$. Therefore, a fast multiplication in the extension field $F_{p^m}$ is quite effective for the ITA-based inversion. In Sect. 4, we can see the effectiveness of CVMA in TypeII AOPF for fast implementing an inversion in the extension field.

4. Cost Evaluation and Simulation

In this section, we evaluate the calculation costs for a multiplication and an inversion in TypeII AOPF. Especially for the extension degrees m = 2, 3, 5, and 6, we compare the calculation costs among TypeII AOPF and the conventional extension fields OEF [5], AOPF [6], and GOEF [7]. The reason why this paper selects these extension degrees is that CVMA is superior to Karatsuba-based multiplication when the extension degree is small [6]. We evaluate the calculation costs by counting the number of additions, that of multiplications, and that of inversions in $F_p$ needed for a multiplication and an inversion in the extension field $F_{p^m}$. For example, (2, 4, 1) shown in Table 1 means that an inversion in OEF $F_{p^2}$ needs 2 additions, 4 multiplications, and 1 inversion in $F_p$ for its implementation. In more detail, the numbers in the parentheses show #SADD, #SMUL, and #SINV from the left, respectively. In this paper, we count a subtraction in $F_p$ as an addition in $F_p$. After that, we select some prime numbers as the characteristic p and then concretely implement CVMA and inversion in TypeII AOPF $F_{p^5}$ and $F_{p^6}$, then we simulate these arithmetic operations on a Pentium III (800 MHz) with C language. In addition, we show that TypeII AOPF is superior to AOPF with respect to parallel processing and also show that a multiplication in TypeII AOPF becomes about twice as fast by parallelizing the CVMA computation, where we used Pentium SSE2 technology (Pentium4, 1.7 GHz) for experiments.

4.1 Cost Evaluation

As shown in Sect. 2.3, CVMA for self reciprocal vectors in AOPF $F_{p^{2m}}$ needs the calculation cost Eq. (12); therefore, Eq. (12) is just the calculation cost for a multiplication in TypeII AOPF $F_{p^m}$. A typical difference between CVMA and Karatsuba-based multiplication is that the calculation cost of CVMA is clearly given in terms of the extension degree m as shown in Eq. (12). For an inversion in TypeII AOPF $F_{p^m}$, Step1 needs
$$\#\mathrm{SMUL} = \frac{m(m + 1)}{2}(t_m - 1), \tag{29a}$$
$$\#\mathrm{SADD} = 2m(m - 1)(t_m - 1), \tag{29b}$$
where $t_m = \lfloor \log_2 (m - 1) \rfloor + w(m - 1)$. Step2 needs #SMUL = m, #SADD = 3m − 2 for calculating one vector coefficient. Step3 and Step4 need #SINV = 1 and #SMUL = m, respectively. In conclusion, an inversion in TypeII AOPF $F_{p^m}$ needs
$$\#\mathrm{SMUL} = \frac{m(m + 1)}{2}(t_m - 1) + 2m, \tag{30a}$$
$$\#\mathrm{SADD} = 2m(m - 1)(t_m - 1) + 3m - 2, \tag{30b}$$
$$\#\mathrm{SINV} = 1. \tag{30c}$$

Table 1 Calculation costs for $F_{p^2}$-arithmetic operations.

| method | modular polynomial | multiplication | inversion |
|---|---|---|---|
| OEF | $x^2 + 1$ | (5, 3, 0) | (2, 4, 1) |
| TypeII OEF | $x^2 - 2$ | (6, 3, 0) | (3, 4, 1) |
| AOPF | $x^2 + x + 1$ | (4, 3, 0) | (2, 4, 1) |
| TypeII AOPF | $(x^5 - 1)/(x - 1)$ | (4, 3, 0) | (2, 4, 1) |

Table 2 Calculation costs for $F_{p^3}$- and $F_{p^5}$-arithmetic operations.

| method | $F_{p^3}$ multiplication | $F_{p^3}$ inversion | $F_{p^5}$ multiplication | $F_{p^5}$ inversion |
|---|---|---|---|---|
| TypeII OEF | (17, 6, 0) | (20, 15, 1) | (54, 15, 0) | (114, 52, 1) |
| TypeII AOPF | (12, 6, 0) | (19, 12, 1) | (40, 15, 0) | (93, 40, 1) |

Table 3 $F_{p^6}$-implementations and calculation costs for $F_{p^6}$-arithmetic operations.

| method | modular polynomial | multiplication | inversion |
|---|---|---|---|
| TypeII OEF [5] | $x^6 - 2$ | (69, 18, 0) | (276, 98, 1) |
| TypeII AOPF | $(x^{13} - 1)/(x - 1)$ | (60, 21, 0) | (196, 75, 1) |
| GOEF [7] | $(x^7 - 1)/(x - 1)$ | (62, 18, 0) | − |
| AOPF [6]†† | $(x^7 - 1)/(x - 1)$ | (50, 21, 0) | (98, 42, 1) |
| Tower TypeII OEF†† | $x^3 - 2 \rightarrow x^2 - \omega$† | (67, 18, 0) | (95, 39, 1) |
| Tower TypeII AOPF†† | $(x^7 - 1)/(x - 1) \rightarrow (x^5 - 1)/(x - 1)$ | (48, 18, 0) | (73, 36, 1) |

† ω is a zero of $x^3 - 2$ that is the modular polynomial of $F_{p^3}$.
†† These extension fields effectively use the operations in proper subfields.
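The closed-form costs of Eq. (12) and Eqs. (30) can be tabulated with a few lines of Python. This is only a sketch for checking the TypeII AOPF entries of the tables; the function names `t`, `mul_cost`, and `inv_cost` are hypothetical, and the triples follow the (#SADD, #SMUL, #SINV) order used in the tables.

```python
def t(m):
    """t_m = floor(log2(m - 1)) + w(m - 1), for m >= 2."""
    floor_log2 = (m - 1).bit_length() - 1
    hamming_weight = bin(m - 1).count("1")
    return floor_log2 + hamming_weight


def mul_cost(m):
    """(#SADD, #SMUL, #SINV) of a TypeII AOPF multiplication, Eq. (12)."""
    return (2 * m * (m - 1), m * (m + 1) // 2, 0)


def inv_cost(m):
    """(#SADD, #SMUL, #SINV) of a TypeII AOPF inversion, Eqs. (30)."""
    sadd, smul, _ = mul_cost(m)
    return (sadd * (t(m) - 1) + 3 * m - 2,
            smul * (t(m) - 1) + 2 * m,
            1)
```

For m = 3, 5, and 6 these functions give the TypeII AOPF rows of Tables 2 and 3; for m = 2 the inversion formula gives (4, 4, 1).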

4.2 Comparison

Table 1 shows the calculation costs for a multiplication and an inversion in $F_{p^2}$. According to Sect. 4.1, the cost for an inversion in TypeII AOPF $F_{p^2}$ should be given as #SADD = 4, #SMUL = 4, and #SINV = 1; however, we can decrease this calculation cost to (2, 4, 1) by using the following calculation;
$$A^{-1} = (ca_2, ca_1), \quad c = \left(a_1 a_2 - (a_1 - a_2)^2\right)^{-1}, \tag{31}$$
where $A = (a_1, a_2)$. The reason why we consider such an optimization is that most previous works also consider such optimizations when the extension degree m = 2. Next, Table 2 shows the calculation costs for a multiplication and an inversion in $F_{p^3}$ and $F_{p^5}$, and Table 3 shows those in $F_{p^6}$. In Tables 3 and 5, Tower means that the extension field is constructed by applying the tower field technique [8], [9]. For example, Tower TypeII AOPF $F_{p^6}$ is constructed by a 2nd extension over TypeII AOPF $F_{p^3}$. In the tables, $(x^7 - 1)/(x - 1)$ is the modular polynomial of TypeII AOPF $F_{p^3}$ and $(x^5 - 1)/(x - 1)$ is the modular polynomial for the 2nd extension. Of course, Tower $F_{p^6}$ can also be constructed by a 3rd extension over $F_{p^2}$; however, with this construction the calculation cost for an inversion becomes worse. The tower field technique enables fast implementation of the extension field arithmetic operations; however, the programs become complicated. The details of the tower field technique can be found in [8], [9].

From the cost evaluations, we can say that TypeII AOPF is effective for fast implementation of extension fields whose extension degree is small†. In more detail, for $F_{p^3}$, $F_{p^5}$, and $F_{p^6}$ implementations, we can find that 1) the number of $F_p$-additions for a multiplication in TypeII AOPF is about 20% less than that in OEF, and 2) the number of $F_p$-multiplications and that of $F_p$-additions for an inversion in TypeII AOPF are about 20% less than those in OEF, respectively. In addition, from Table 3, we can find that it is quite effective for the TypeII AOPF $F_{p^6}$-implementation to apply the tower field technique.

† From only the viewpoint of #SMUL, we can say that CVMA becomes worse than Karatsuba-based multiplication as the extension degree m becomes larger, because CVMA and Karatsuba-based multiplication need #SMUL = m(m + 1)/2 and #SMUL ≈ $m^{1.58}$, respectively.

4.3 Simulation Result

For this simulation, we used a PentiumIII (800 MHz) and C language. We adopted pseudo Mersenne prime numbers within 32 bits as the characteristic p because the word size of the PentiumIII is 32 bits. Table 4 shows the computation time of a multiplication in $F_{p^5}$ and that of an inversion in $F_{p^5}$. Table 5 shows the computation time of a multiplication in $F_{p^6}$ and that of an inversion in $F_{p^6}$. From these tables, we can find that operations in TypeII AOPF $F_{p^5}$ and $F_{p^6}$ are faster than those in OEF. Especially, an inversion in TypeII AOPF $F_{p^5}$ and $F_{p^6}$ is about 20% faster than that in OEF. From these simulation results, we can conclude that TypeII AOPF is practical enough for fast implementation of extension fields.

Table 4 Simulation result of $F_{p^5}$-multiplication and inversion. (unit: µs)

| p | method | modular polynomial | multiplication | inversion |
|---|---|---|---|---|
| $2^{17} - 1$ | TypeI OEF | $x^5 - 3$ | 0.96 | 3.75 |
| $2^{17} - 1$ | TypeII AOPF | $(x^{11} - 1)/(x - 1)$ | 0.78 | 2.76 |
| $2^{20} - 5$ | TypeII OEF | $x^5 - 2$ | 1.12 | 4.23 |
| $2^{20} - 5$ | TypeII AOPF | $(x^{11} - 1)/(x - 1)$ | 0.98 | 3.34 |
| $2^{30} + 7$ | TypeII OEF | $x^5 - 2$ | 1.57 | 6.16 |
| $2^{30} + 7$ | TypeII AOPF | $(x^{11} - 1)/(x - 1)$ | 1.50 | 4.94 |

*CPU: PentiumIII, 800 MHz

Table 5 Simulation result of $F_{p^6}$-multiplication and inversion. (unit: µs)

| p | method | modular polynomial | multiplication | inversion |
|---|---|---|---|---|
| $2^{24} - 3$ | TypeII OEF | $x^6 - 2$ | 1.47 | 7.16 |
| $2^{24} - 3$ | TypeII AOPF | $(x^{13} - 1)/(x - 1)$ | 1.46 | 5.41 |
| $2^{24} - 3$ | Tower TypeII OEF | $x^3 - 2 \rightarrow x^2 - \omega$† | 1.46 | 3.32 |
| $2^{24} - 3$ | Tower TypeII AOPF | $(x^7 - 1)/(x - 1) \rightarrow (x^5 - 1)/(x - 1)$ | 1.29 | 3.01 |

† ω is a zero of $x^3 - 2$ that is the modular polynomial of $F_{p^3}$.
*CPU: PentiumIII, 800 MHz

4.4 Parallelizing CVMA Computation

Needless to say, CVMA is much more suitable for parallel processing than Karatsuba-based multiplication. In this section, we particularly compare the multiplications in AOPF $F_{p^6}$ and TypeII AOPF $F_{p^6}$, where we used XMM registers of Pentium SSE2 technology (Pentium4, 1.7 GHz) in two parallel lines as shown in Fig. 1 and Fig. 2. Figure 1 shows addition and multiplication for integers $x_1, x_2, y_1, y_2$ using XMM registers. Figure 2 shows the calculation of Eqs. (27). Table 6 shows the experimental result. From the table, we find that, with SSE2, the computation time in TypeII AOPF $F_{p^6}$ is shorter than that in AOPF $F_{p^6}$. This is because the CVMA in AOPF needs the $q_0$ calculation as shown in Eqs. (9), whereas CVMA in TypeII AOPF does not need it. From this point of view, we can say that TypeII AOPF is superior to AOPF with respect to parallel processing. In addition, we find that a multiplication in TypeII AOPF $F_{p^6}$ becomes about twice as fast by parallelizing the CVMA computation. Incidentally, when we use XMM registers as shown in Fig. 1, most of the calculations, which involve integers of less than 32 bits, are carried out by using two 32-bit registers, and 64-bit data can be handled easily. Therefore, as shown in Table 6, the multiplication does not become exactly twice as fast.

Fig. 1 Addition and multiplication with SSE2 technology. (A 128-bit XMM register holds two 64-bit lanes; $x_1$, $x_2$, $y_1$, and $y_2$ are integers less than 32 bits.)

Fig. 2 The calculation of Eqs. (27). ($x_1$, $x_2$, $y_1$, and $y_2$ are integers less than 32 bits.)

Table 6 Simulation result of $F_{p^6}$-multiplication with SSE2 technology. (unit: µs)

| p | method | calculation cost | with SSE2 | without SSE2 |
|---|---|---|---|---|
| $2^{24} - 3$ | AOPF | (50, 21, 0) | 0.59 | 1.02 |
| $2^{24} - 3$ | TypeII AOPF | (60, 21, 0) | 0.56 | 1.25 |
| $2^{25} - 39$ | AOPF | (50, 21, 0) | 0.68 | 1.47 |
| $2^{25} - 39$ | TypeII AOPF | (60, 21, 0) | 0.57 | 1.78 |
| $2^{26} - 27$ | AOPF | (50, 21, 0) | 0.68 | 1.47 |
| $2^{26} - 27$ | TypeII AOPF | (60, 21, 0) | 0.57 | 1.79 |

*CPU: Pentium4, 1.7 GHz

4.5 For ECC and XTR-Based Cryptosystem

The encryption and decryption of ECC need scalar multiplications for rational points on the elliptic curve. A scalar multiplication needs a lot of elliptic curve additions and elliptic curve doublings. Elliptic curve addition (ECA) and doubling (ECD) need additions, multiplications, and inversions in the definition field. Therefore, if we can fast implement these fundamental arithmetic operations in the definition field, the encryption and decryption of ECC can be carried out fast. Table 7 shows the computation times for ECA and ECD over the definition fields TypeII OEF $F_{p^5}$ and TypeII AOPF $F_{p^5}$, where we used $2^{20} - 5$ as the characteristic p. From Table 7, we find that ECA and ECD over TypeII AOPF are carried out about 10% faster than those over TypeII OEF, respectively. We can understand this result from Table 2 and Table 4. On the other hand, the XTR-based cryptosystem does not need any inversions but only additions and multiplications in the definition field [2], [13]. As shown in Table 5, a multiplication in Tower TypeII AOPF $F_{p^6}$ is carried out about 10% faster than that in Tower TypeII OEF $F_{p^6}$. Therefore, in the same way as the ECC implementation, the XTR-based cryptosystem over Tower TypeII AOPF $F_{p^6}$ will be approximately 10% faster than that over Tower TypeII OEF $F_{p^6}$ [7], [14].

Table 7 Simulation results of ECA and ECD over TypeII OEF $F_{p^5}$ and TypeII AOPF $F_{p^5}$. (unit: µs)

| p | method | modular polynomial | ECA | ECD |
|---|---|---|---|---|
| $2^{20} - 5$ | TypeII OEF | $x^5 - 2$ | 8.07 | 9.30 |
| $2^{20} - 5$ | TypeII AOPF | $(x^{11} - 1)/(x - 1)$ | 7.04 | 8.31 |

*CPU: PentiumIII, 800 MHz
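Since ECA and ECD rely on inversions in the definition field, the inversion procedure of Sect. 3.5 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the implementation measured above: all function names are hypothetical, Step1 uses a plain product of conjugates instead of the addition chain of [11], and Step3 uses Fermat's little theorem instead of BEEA. The Frobenius mapping is realized as a permutation of the TypeII ONB coefficients, which is why it costs no arithmetic operations.

```python
def typeii_mul(x, y, p):
    """CVMA in TypeII AOPF, Eq. (24); x and y hold m TypeII ONB coefficients."""
    m = len(x)
    n = 2 * m + 1
    inv2 = (n + 1) // 2                           # 2^{-1} mod (2m + 1)
    xs = [0] + list(x) + list(reversed(x))        # self reciprocal form, Eq. (20b)
    ys = [0] + list(y) + list(reversed(y))
    z = []
    for j in range(1, m + 1):
        h = inv2 * j % n
        z.append(sum((xs[(h - k) % n] - xs[(h + k) % n]) *
                     (ys[(h + k) % n] - ys[(h - k) % n])
                     for k in range(1, m + 1)) % p)
    return z


def frobenius(x, p, e=1):
    """X -> X^(p^e): only permutes the coefficients of the normal basis."""
    m = len(x)
    n = 2 * m + 1
    s = pow(p, e, n)
    out = [0] * m
    for i in range(1, m + 1):
        r = i * s % n
        r = min(r, n - r)                         # w^r + w^-r belongs to index min(r, n - r)
        out[r - 1] = x[i - 1]
    return out


def typeii_inverse(a, p):
    """Steps 1 to 4 of Sect. 3.5 (simplified, see the assumptions above)."""
    m = len(a)
    b = frobenius(a, p, 1)                        # Step1: B = A^p A^(p^2) ... A^(p^(m-1))
    for e in range(2, m):
        b = typeii_mul(b, frobenius(a, p, e), p)
    n_a = (-typeii_mul(a, b, p)[0]) % p           # Step2: A*B = (-n_A, ..., -n_A), Eq. (28)
    c = pow(n_a, p - 2, p)                        # Step3: n_A^{-1} by Fermat (assumption)
    return [c * bi % p for bi in b]               # Step4: scalar multiplications
```

As a usage example, for p = 2^17 − 1 and m = 5, multiplying an element by its computed inverse with `typeii_mul` yields the vector (−1, ..., −1) modulo p, which is the representation of 1 in the TypeII ONB.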

5.

Conclusion

This paper has proposed an extension field named TypeII AOPF. This extension field adopts TypeII optimal normal basis, cyclic vector multiplication algorithm (CVMA), and Itoh-Tsujii inversion algorithm. Our main proposal is to incorporate CVMA into a multiplication in TypeII AOPF. The calculation costs for a multiplication and inversion in this field is clearly given with the extension degree. The arithmetic operations in TypeII AOPF F p5 , for example, was about 20% faster than those in OEF F p5 . In addition, we showed that TypeII AOPF was superior to AOPF as to parallel processing and also showed that the CVMA computation in TypeII AOPF became about twice faster by parallelizing the CVMA computation. References [1] A. Menezes, Elliptic Curve Public Key Cryptosystems, Kluwer Academic Publishers, 1993. [2] A. Lenstra and E. Verheul, “The XTR public key system,” Proc. Crypto 2000, LNCS 1880, pp.1–20, 2000. [3] D. Boneh, B. Lynn, and H. Shacham, “Short signatures from the Weil pairing,” Proc. Asiacrypt2001, LNCS 2248, pp.514–532, 2001. [4] P. Barreto, H. Kim, B. Lynn, and M. Scott, “Efficient algorithms for pairing-based cryptosystems,” CRYPTO 2002, LNCS 2442, pp.354–368, 2002. [5] D. Bailey and C. Paar, “Optimal extension fields for fast arithmetic in public-key algorithms,” Proc. Asiacrypt2000, LNCS 1976, pp.248–258, 2000. [6] Y. Nogami, A. Saito, and Y. Morikawa, “Finite extension field with modulus of all-one polynomial and representation of its elements for fast arithmetic operations,” IEICE Trans. Fundamentals, vol.E86-A, no.9, pp.2376–2387, Sept. 2003. http://search.ieice.org/2003/pdf/e86-a 9 2376.pdf



[7] D. Han, K. Yoon, Y. Park, C. Kim, and J. Lim, “Optimal extension fields for XTR,” Proc. SAC 2002, LNCS 2595, pp.369–384, 2003.

[8] M. Morii and M. Kasahara, “Efficient construction of gate circuit for computing multiplicative inverse over GF($2^m$),” IEICE Trans., vol.E72, no.1, pp.37–42, Jan. 1989.

[9] J. Fan and C. Paar, “On efficient inversion in tower fields of characteristic two,” Proc. ISIT 1997, p.58, 1997. http://www.crypto.ruhruni-bochum.de/Publikationen/texte/tower final.ps

[10] D. Knuth, The Art of Computer Programming, vol.2: Seminumerical Algorithms, Addison-Wesley, 1981.

[11] T. Itoh and S. Tsujii, “A fast algorithm for computing multiplicative inverses in GF($2^m$) using normal bases,” Inf. Comput., vol.78, pp.171–177, 1988.

[12] I. Blake, G. Seroussi, and N. Smart, Elliptic Curves in Cryptography, LNS 265, Cambridge Univ. Press, 1999.

[13] S. Lim, S. Kim, I. Yie, J. Kim, and H. Lee, “XTR extended to GF($p^{6m}$),” Proc. SAC 2001, LNCS 2259, pp.301–312, 2001.

[14] S. Shinonaga, Y. Fujii, Y. Nogami, and Y. Morikawa, “A fast implementation of $F_{p^{6m}}$ for XTR,” IEICE Technical Report, ISEC2003-63, pp.81–88, 2003.

Appendix: Proof of Eq. (22c)

Based on Eqs. (22), we show the following relation:

$$x_{2^{-1}(-j)+k} - x_{2^{-1}(-j)-k} = -\left( x_{2^{-1}j+k} - x_{2^{-1}j-k} \right), \tag{A·1}$$

where $m \geq j \geq 1$. The first term on the left-hand side of Eq. (A·1) can be rewritten as

$$x_{2^{-1}(-j)+k} = x_{-(2^{-1}j-k)}, \tag{A·2}$$

and the second term as

$$x_{2^{-1}(-j)-k} = x_{-(2^{-1}j+k)}. \tag{A·3}$$

Therefore, substituting Eqs. (22) into the above equations, we obtain Eq. (A·1). In the same way, we find

$$y_{2^{-1}(-j)+k} - y_{2^{-1}(-j)-k} = -\left( y_{2^{-1}j+k} - y_{2^{-1}j-k} \right). \tag{A·4}$$

Consequently, we get Eq. (22c).

Yasuyuki Nogami graduated from Shinshu University in 1994 and received a doctor's degree from Shinshu University in 1999. He is now a research associate at Okayama University. His main research fields are finite field theory and its applications. He is a member of IEEE.

Shigeru Shinonaga graduated from Electric and Electronic Engineering, Okayama University in 2003 and received a master's degree in 2005. He joined Sanyo Broadcasting Company, Ltd. in 2005.

Yoshitaka Morikawa graduated from the Department of Electronic Engineering, Osaka University in 1969 and obtained the MS degree in 1971. He then joined Matsushita Electric, where he engaged in research on data transmission. In 1972, he became a research associate at Okayama University, and subsequently an associate professor in 1985. He is now a professor in the Department of Communication Network Engineering. He has been engaged in research on image information processing. He holds a D.Eng. degree.
