International Journal of Security and Its Applications Vol. 3, No. 4, October, 2009
Fast and Compact ASIC Implementation of SFlash New Signature Scheme
Mohamed M. Abdelhalim
Cairo University; Advanced Smart Card Co., 6th October City
Raafat S. Elfouly
Cairo University, Faculty of Engineering, Computer Engineering Dept.
Abstract
The idea of using multivariate polynomials as public keys has attracted several cryptographers. The SFlash signature scheme is a variant of the Matsumoto and Imai multivariate public-key cryptosystem and was selected by the NESSIE Consortium. In this paper we describe a hardware implementation of SFlash based on bit-parallel architectures that yield high-speed circuits for finite-field operations and that can be used efficiently as an authentication unit in wireless devices, smart cards and RFID networks. We propose a new generalization of the Karatsuba-Ofman multiplier as the core of the design. An ASIC chip can be realized with a gate count of 78K and a die size of 2.8 mm² in 0.35 µm CMOS technology, with a maximum clock frequency of 140 MHz; it takes about 21.5 µs to sign 259 bits of data.
Keywords: SFlash, Karatsuba-Ofman, Digital Signature, Cryptography.
1. Introduction
This paper presents an SFlash hardware security unit for communication tags which implements an SFlash signature generation unit (SSGU). The generated digital signature can be used to prove the authenticity of the claimed information to the reader device; with the help of the SSGU, the communication tag provides cryptographic authentication. The signature unit can be used in wireless devices, contactless smart cards and Radio Frequency Identification (RFID) tags. RFID has settled in the industry as a bar-code replacement and is expected to develop into the “Internet of Things”, in which an RFID tag is attached to everything and communicates with other devices and with wireless networks, where very high-speed secure traffic is badly needed.
Trapdoor mappings are central to public-key cryptography. As such, cryptographers have studied trapdoor permutations and maps since the dawn of public-key cryptography [1]. Only a handful of the many schemes attempted have reached practical deployment. However, the critical trapdoor maps are often very slow, frequently because of the sheer size of the algebraic structure; typical examples are the modular exponentiation in RSA and the discrete-logarithm-based operations in ElGamal/DSA/ECC. Multivariate public-key cryptosystems were born partly to overcome this limitation. Great names like Shamir [2] and Diffie [3] abound among the pioneers, but the first scheme to show promise and attract wide attention was by Imai and Matsumoto.
SFlash, a fast multivariate signature scheme, was selected by the NESSIE Consortium. It belongs to the family of public-key cryptosystems based on multivariable quadratic
polynomials. The idea builds on the proven result that solving a general system of multivariate polynomial equations over a finite field is an NP-hard problem. SFlash was designed to have a security level of $2^{80}$ against the present state of the art in cryptanalysis, as required by the NESSIE project. The SFlash scheme is a simple variant of a basic design by Matsumoto and Imai, who suggested using the map

$$M : X \mapsto X^{1+q^{\theta}}.$$

In [4] Matsumoto and Imai presented a signature scheme based on the following property: a non-linear monomial map is blinded by two secret affine transformations. Since the affine transformations hide the monomial map, the general name for such cryptosystems is hidden monomial. In the case of the IM cryptosystem, the two secret affine applications are $S : K^n \to K^n$ and $T : K^n \to K^n$, where $K = \mathbb{F}_q$ and $q$ is a power of 2. Let $\mathcal{L}$ be an extension of degree $n$ of the finite field $K$, let $x = (x_0, \ldots, x_{n-1}) \in K^n$ be an $n$-tuple, and let $\varphi : (x_0, \ldots, x_{n-1}) \in K^n \mapsto x \in \mathcal{L}$ be the identification of $K^n$ with $\mathcal{L}$, with $\varphi^{-1}$ its inverse. The blinding monomial map of IM is

$$F : A \in \mathcal{L} \mapsto A^{h} \in \mathcal{L}, \qquad h = q^{\theta} + 1.$$

The exponent $h$ further satisfies $\gcd(q^{\theta}+1,\ q^{n}-1) = 1$. In this case $F$ is a bijection and its inverse is $F^{-1} : A \mapsto A^{h'}$, where $h' = h^{-1} \bmod (q^{n}-1)$.
In [5] Akkar, Courtois, Duteuil and Goubin proposed an optimized software implementation of SFlash and presented a method to protect the implementation against power attacks such as Differential Power Analysis (DPA). Although the security of SFlash is not as well understood as that of, for example, RSA, SFlash is apparently the fastest signature scheme known. Following these lines, this paper introduces a new formulation that addresses one of the most expensive problems associated with cryptographic applications over finite fields, namely the multiplication of large elements of finite fields.
Although our proposal can be used for any cryptographic algorithm that needs polynomial multiplication, our design is tuned for finite-field applications. Several architectures have been proposed for multiplication over finite fields. For example, efficient bit-parallel multipliers for both the canonical and the normal basis representation have been proposed in [8, 9, 10]. All these algorithms exhibit a space complexity of $O(m^2)$. The Karatsuba-Ofman algorithm, which originally appeared in 1963 [11], solves the problem in only $O(m^{1.58})$ operations. Karatsuba multipliers can result in fewer bit operations at the expense of some design restrictions, particularly the requirement that the degree $m$ of the polynomial be a power of two. However, for certain applications, especially elliptic curve cryptosystems and the algorithm studied here, it is important to consider finite fields $GF(2^m)$ where $m$ is not necessarily a power of two. In fact, for elliptic curve cryptosystems some sources [12] strongly recommend, for security purposes, choosing the degree of the finite field in the range [160, 512] with $m$ prime. There are many studies on hardware implementations of the Karatsuba algorithm over Galois fields (GF) intended for use in elliptic curve cryptography [13, 14]. Bailey and Paar [15] proposed a new scheme for applying Karatsuba's idea in which the operands are divided into three parts. It requires 6 partial multiplications of m/3-bit operands. This method can be combined with the original Karatsuba formula for operands whose length is divisible by six.
Weimerskirch and Paar [16] proposed a generalization of the classical Karatsuba Algorithm (KA) to polynomials of arbitrary degree and described the best possible usage of the KA. In our work, we propose a new scheme for multiplying polynomials of arbitrary degree over finite fields that needs fewer multiplication and addition operations than the generalized scheme of [16]. It is based on a modification of the Bailey and Paar [15] proposal for degree-2 polynomials, which we use as the base of our recursion. We show that two degree-2 polynomials can be multiplied with only 6 multiplications and 12 additions, provided that they are over Galois fields of characteristic two, and we show how to exploit redundancies in multiplications and additions to improve the performance.
2. Signature Algorithm
2.1. Algorithm Fields
The SFlash algorithm [6] uses three finite fields.
- The binary finite field GF(2).
- $K = \mathbb{F}_{128}$, precisely defined as $K = \mathbb{F}_2[x]/(x^7 + x + 1)$. We denote by $\pi$ the bijection between $\{0,1\}^7$ and $K$ defined by: $\forall b = (b_0, \ldots, b_6) \in \{0,1\}^7$, $\pi(b) = b_6 x^6 + \cdots + b_1 x + b_0 \pmod{x^7 + x + 1}$.
- $\mathcal{L} = K[x]/(x^{37} + x^{12} + x^{10} + x^{2} + 1)$. We denote by $\varphi$ the bijection between $K^{37}$ and $\mathcal{L}$ defined by: $\forall \omega = (\omega_0, \ldots, \omega_{36}) \in K^{37}$, $\varphi(\omega) = \omega_{36} x^{36} + \cdots + \omega_1 x + \omega_0 \pmod{x^{37} + x^{12} + x^{10} + x^{2} + 1}$.
2.2. Secret Parameters.
- A secret affine bijection s from $K^{37}$ to $K^{37}$.
- A secret affine bijection t from $K^{37}$ to $K^{37}$.
- An 80-bit secret string.
2.3. Signing Algorithm.
To sign a message Z one goes through the following steps:
- Use the hash function SHA-1 to create V, a string of 182 bits ((n − r) × q, where n = 37, r = 11 and q = 7 are the algorithm parameters).
- Use V and the 80-bit secret string to create W, a string of 77 bits (q × r).
- Use the bijection $\pi$ to produce Y, a string of 26 elements of K, from V.
- Use the bijection $\pi$ to produce R, a string of 11 elements of K, from W.
- Produce an n × q (259-bit) string T by concatenating Y and R; this string consists of 37 elements of K.
- Apply the inverse affine bijection $t^{-1}$ to T and then use the bijection $\varphi$ to produce B.
- Compute $A = F^{-1}(B)$, where the function F is a map from $\mathcal{L}$ to $\mathcal{L}$ defined by $\forall A \in \mathcal{L},\ F(A) = A^{128^{11}+1}$.
- Apply the inverse affine bijection $s^{-1}$ to $\varphi^{-1}(A)$ to get $X = (x_0, \ldots, x_{36})$.
- Obtain the 259-bit signature by concatenating $\pi^{-1}(x_0) \,\|\, \cdots \,\|\, \pi^{-1}(x_{36})$.
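The bit-length bookkeeping of the steps above can be sketched as follows. This is only an illustration under stated assumptions: the helper names `pi`, `pi_inv` and `to_elements` are hypothetical, the strings V and W are placeholders rather than real SHA-1 outputs, and the choice of packing consecutive 7-bit groups with $b_0$ first is an assumption not fixed by the text.

```python
N, R_PARAM, ELEM_BITS = 37, 11, 7  # n, r and the 7 bits per element of K (from the text)

def pi(bits):
    """Map (b0, ..., b6) to b6*x^6 + ... + b1*x + b0, stored as a 7-bit integer."""
    assert len(bits) == ELEM_BITS
    return sum(bit << i for i, bit in enumerate(bits))

def pi_inv(elem):
    """Inverse of pi: recover (b0, ..., b6) from an element of K."""
    return [(elem >> i) & 1 for i in range(ELEM_BITS)]

def to_elements(bits):
    """Split a bit list into consecutive 7-bit groups (assumed order) and apply pi."""
    assert len(bits) % ELEM_BITS == 0
    return [pi(bits[i:i + ELEM_BITS]) for i in range(0, len(bits), ELEM_BITS)]

# Placeholder inputs: in the scheme, V comes from SHA-1 and W from V and the secret string.
V = [i % 2 for i in range((N - R_PARAM) * ELEM_BITS)]  # 182 bits
W = [1] * (R_PARAM * ELEM_BITS)                        # 77 bits

Y = to_elements(V)        # 26 elements of K
R_elems = to_elements(W)  # 11 elements of K
T = Y + R_elems           # 37 elements of K, i.e. 259 bits

# Serializing with pi_inv gives back a 259-bit string, as for the final signature.
serialized = [b for elem in T for b in pi_inv(elem)]
```

The remaining steps (the affine maps and the inverse of F) operate on the 37-element list `T`.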
3. Composite Fields Multiplication.
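3.1 Mastrovito architecture for $GF(2^7)$.
Let A(x) and B(x) be two elements of $GF(2^m)$. The product A(x) B(x) mod P(x) is denoted by C(x) and can be expressed in the following way:

$$C(x) = A(x)B(x) \bmod P(x) = \sum_{i=0}^{m-1} c_i x^i,$$

where

$$c_i = a_{m-1} f_{m-1}^{i}(B) + a_{m-2} f_{m-2}^{i}(B) + \cdots + a_1 f_1^{i}(B) + a_0 f_0^{i}(B).$$

The functions $f_j^i$ are linear in B, i.e. they have the form

$$f_j^i(B) = \sum_{\text{some } k \in \{0, 1, \ldots, m-1\}} b_k .$$

Multiplication can now be described through the functions $f_j^i$, which depend on the choice of the irreducible polynomial P(x). In matrix notation we can write the product as follows:

$$\begin{pmatrix} c_{m-1} \\ c_{m-2} \\ \vdots \\ c_0 \end{pmatrix} = \begin{pmatrix} f_{m-1}^{m-1} & f_{m-2}^{m-1} & \cdots & f_0^{m-1} \\ f_{m-1}^{m-2} & f_{m-2}^{m-2} & \cdots & f_0^{m-2} \\ \vdots & \vdots & & \vdots \\ f_{m-1}^{0} & f_{m-2}^{0} & \cdots & f_0^{0} \end{pmatrix} \begin{pmatrix} a_{m-1} \\ a_{m-2} \\ \vdots \\ a_0 \end{pmatrix} = M A^{t}$$

$$f_0^i(B) = b_i, \qquad i = 0, 1, \ldots, m-1 \qquad (1)$$

$$f_j^i(B) = u(i-j)\, b_{i-j} + \sum_{t=0}^{j-1} q_{j-1-t,\,i}\; b_{m-1-t}, \qquad i = 0, 1, \ldots, m-1,\ \ j = 1, \ldots, m-1 \qquad (2)$$

where $u(k)$ is a step function defined as

$$u(k) = \begin{cases} 1, & k \ge 0 \\ 0, & k < 0 \end{cases}$$

and $q_{i,j}$ are the entries of the reduction matrix Q, defined by $x^{m+i} \equiv \sum_{j=0}^{m-1} q_{i,j}\, x^{j} \pmod{P(x)}$. Hence, once the matrix Q has been computed, we can easily obtain the entries of M by using (1) and (2). For $GF(2^7)$ with the field polynomial $x^7 + x + 1$, the product can be written as follows (with the roles of A and B exchanged, which the commutativity of the product permits):

$$C = \begin{pmatrix}
a_0 & a_6 & a_5 & a_4 & a_3 & a_2 & a_1 \\
a_1 & a_0 + a_6 & a_6 + a_5 & a_5 + a_4 & a_4 + a_3 & a_3 + a_2 & a_2 + a_1 \\
a_2 & a_1 & a_0 + a_6 & a_6 + a_5 & a_5 + a_4 & a_4 + a_3 & a_3 + a_2 \\
a_3 & a_2 & a_1 & a_0 + a_6 & a_6 + a_5 & a_5 + a_4 & a_4 + a_3 \\
a_4 & a_3 & a_2 & a_1 & a_0 + a_6 & a_6 + a_5 & a_5 + a_4 \\
a_5 & a_4 & a_3 & a_2 & a_1 & a_0 + a_6 & a_6 + a_5 \\
a_6 & a_5 & a_4 & a_3 & a_2 & a_1 & a_0 + a_6
\end{pmatrix}
\begin{pmatrix} b_0 \\ b_1 \\ b_2 \\ b_3 \\ b_4 \\ b_5 \\ b_6 \end{pmatrix}$$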
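As a sanity check on the matrix form above, the following sketch compares a matrix-vector product over GF(2) against a plain shift-and-reduce multiplication for every pair of elements of $GF(2^7)$. It is an illustrative software model, not the hardware architecture: the function names are hypothetical, elements are stored as 7-bit integers (bit i holding the coefficient of $x^i$), and the field polynomial $x^7 + x + 1$ is taken from Section 2.1.

```python
FIELD_POLY = 0b10000011  # x^7 + x + 1

def gf128_mul(a, b):
    """Reference multiplication: carry-less product, then reduction mod x^7 + x + 1."""
    prod = 0
    for i in range(7):
        if (b >> i) & 1:
            prod ^= a << i
    for i in range(12, 6, -1):          # clear bits 12..7 of the unreduced product
        if (prod >> i) & 1:
            prod ^= FIELD_POLY << (i - 7)
    return prod

def coeff(a, k):
    """Coefficient a_k of a, or 0 when k is outside 0..6."""
    return (a >> k) & 1 if 0 <= k <= 6 else 0

def mastrovito_matrix(a):
    """7x7 binary matrix Z with c_i = sum_j Z[i][j] * b_j over GF(2)."""
    rows = []
    for i in range(7):
        row = []
        for j in range(7):
            entry = coeff(a, i - j) ^ coeff(a, 7 + i - j)
            if i >= 1:                  # x^(6+i) also folds into c_i, except for c_0
                entry ^= coeff(a, 6 + i - j)
            row.append(entry)
        rows.append(row)
    return rows

def mastrovito_mul(a, b):
    """Multiply via the matrix: each output bit is an XOR of AND terms."""
    c = 0
    for i, row in enumerate(mastrovito_matrix(a)):
        bit = 0
        for j, z in enumerate(row):
            bit ^= z & ((b >> j) & 1)
        c |= bit << i
    return c
```

Each matrix entry is either a single coefficient of A or the XOR of two, which is why the bit-parallel multiplier needs only AND and XOR gates.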
3.2. Multiplier Over Composite Field $GF((2^7)^{37})$.
Consider the multiplication $C(x) = A(x)B(x) \bmod P(x)$ of two elements, where

$$A(x) = a_{m-1}x^{m-1} + \cdots + a_0, \qquad a_i \in GF(2^7), \qquad A(x) \in GF((2^7)^{37}).$$

This can be implemented in two stages.
1- Traditional polynomial multiplication.
2- Reduction modulo the generating polynomial P(x).
For the first step, we consider the multiplication of two polynomials A(x) and B(x) of maximum degree m−1 over a field, i.e., each polynomial possesses at most
m coefficients from that field. We want to find the product $C'(x) = A(x)B(x)$, where the degree of $C'(x)$ is at most $2m-2$.
3.2.1 Modified Karatsuba-Ofman Algorithm: The Karatsuba multiplication algorithm is a technique for quickly multiplying large numbers. It was discovered by Anatoly Alexeevich Karatsuba and published together with Yu. Ofman in 1962. Its time complexity is $O(n^{\log_2 3})$, which makes it faster than the classical $O(n^2)$ algorithm. It is therefore important to investigate a generalized version of the algorithm that is not restricted to special values of the degree m−1. The main point of our approach is to apply the modified Karatsuba-Ofman algorithm and terminate the recursion when the polynomials are reduced to two or three coefficients; the product of two three-coefficient polynomials is computed with a special algorithm. We also reuse a previously computed coefficient product to decrease the number of multiplications in each stage of the recursion when the polynomials have an odd number of coefficients.
3.2.2 Polynomials of degree 2: Consider the degree-2 polynomials

$$A(x) = a_2 x^2 + a_1 x + a_0, \qquad B(x) = b_2 x^2 + b_1 x + b_0 .$$

The product of A(x) and B(x) is given by:

$$C'(x) = \sum_{i=0}^{4} c'_i x^i = A(x)B(x) = (a_2 b_2)x^4 + (a_2 b_1 + a_1 b_2)x^3 + (a_2 b_0 + a_1 b_1 + a_0 b_2)x^2 + (a_1 b_0 + a_0 b_1)x + a_0 b_0 .$$

Using the schoolbook method we require 9 multiplications and 4 additions, but we have derived a more efficient method which requires only 6 multiplications and 12 additions, as follows:

$$\begin{aligned}
D_0 &= a_0 b_0, \quad D_1 = a_1 b_1, \quad D_2 = a_2 b_2, \\
D_3 &= a_2 + a_1, \quad D_4 = D_3 + a_0 = a_0 + a_1 + a_2, \\
D_5 &= b_2 + b_1, \quad D_6 = D_5 + b_0 = b_0 + b_1 + b_2, \\
D_{0,1} &= (a_0 + a_1)(b_0 + b_1), \\
D_{1,2} &= D_3 \cdot D_5 = (a_2 + a_1)(b_2 + b_1), \\
D &= D_4 \cdot D_6 = (a_0 + a_1 + a_2)(b_0 + b_1 + b_2),
\end{aligned}$$

$$\begin{aligned}
C'_0 &= D_0, \\
C'_1 &= D_{0,1} - D_0 - D_1, \\
C'_2 &= D - D_{0,1} - D_{1,2}, \\
C'_3 &= D_{1,2} - D_1 - D_2, \\
C'_4 &= D_2 .
\end{aligned}$$

Note that in the expression for $C'_2$ the term $a_1 b_1$ appears three times; since subtraction in $GF(2^n)$ is performed by XOR, two of the occurrences cancel.
If the value of m is even, we can proceed with the original algorithm. If the value of m is odd, we pad an extra term $a_m = 0$, $b_m = 0$; A(x) and B(x) then have an even number of coefficients and can be divided into lower and higher polynomials $A_l, A_h$ and $B_l, B_h$. A set of auxiliary polynomials D(x) is defined:

$$\begin{aligned}
D_0(x) &= A_l(x) B_l(x) \\
D_1(x) &= [A_l(x) + A_h(x)][B_l(x) + B_h(x)] \\
D_2(x) &= A_h(x) B_h(x)
\end{aligned} \qquad (3)$$

The product polynomial is

$$C'(x) = D_0(x) + x^{\frac{m+1}{2}}\,[D_1(x) - D_0(x) - D_2(x)] + x^{m+1} D_2(x). \qquad (4)$$

However, we should investigate the degrees of the polynomials $D_0(x)$, $D_1(x)$ and $D_2(x)$.
- Since the degrees of $A_l$ and $B_l$ are $\frac{m+1}{2} - 1$, the degrees of $D_0(x)$ and $D_1(x)$ are $m - 1$.
- The degrees of $A_h$ and $B_h$ are $\frac{m+1}{2} - 1$, but we know that the highest-degree coefficient of $A_h$ and of $B_h$ is zero, so we can neglect the two highest-degree coefficients of $D_2(x)$ and take the degree of $D_2(x)$ to be $m - 3$.
- Another benefit of knowing that the highest-degree coefficient of $A_h$ and of $B_h$ is zero arises when computing $D_1(x) = [A_l(x) + A_h(x)][B_l(x) + B_h(x)]$: adding the highest-degree coefficients amounts to adding zero, so the highest-degree coefficients of the two sums are those of $A_l$ and $B_l$. Since their product has already been computed in $D_0(x) = A_l(x) B_l(x)$, we save one extra multiplication.
- A further benefit arises when computing $D_1(x) - D_0(x)$: subtracting the highest-degree coefficients amounts to XORing two equal values, so this subtraction can be skipped, saving one extra subtraction.
Table 1 compares the operation and gate counts for an implementation over $GF((2^7)^{37})$.
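The six-multiplication product of Section 3.2.2 and the split of equations (3) and (4) can be checked against the schoolbook product with a short script. This is a simplified sketch, not the authors' circuit: it pads odd lengths with a zero coefficient as described above but does not implement the extra savings that come from the zero padding coefficient, and all function names are hypothetical. Coefficients are elements of $GF(2^7)$ stored as 7-bit integers, lowest degree first.

```python
import random

FIELD_POLY = 0b10000011  # x^7 + x + 1

def gf_mul(a, b):
    """Multiply two elements of GF(2^7) held as 7-bit integers."""
    prod = 0
    for i in range(7):
        if (b >> i) & 1:
            prod ^= a << i
    for i in range(12, 6, -1):
        if (prod >> i) & 1:
            prod ^= FIELD_POLY << (i - 7)
    return prod

def schoolbook(a, b):
    """Reference product of two coefficient lists."""
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] ^= gf_mul(ai, bj)
    return c

def mul_deg2(a, b):
    """Degree-2 product with 6 multiplications (valid in characteristic 2)."""
    a0, a1, a2 = a
    b0, b1, b2 = b
    d0, d1, d2 = gf_mul(a0, b0), gf_mul(a1, b1), gf_mul(a2, b2)
    d3 = a2 ^ a1
    d4 = d3 ^ a0
    d5 = b2 ^ b1
    d6 = d5 ^ b0
    d01 = gf_mul(a0 ^ a1, b0 ^ b1)
    d12 = gf_mul(d3, d5)
    d = gf_mul(d4, d6)
    return [d0, d01 ^ d0 ^ d1, d ^ d01 ^ d12, d12 ^ d1 ^ d2, d2]

def karatsuba(a, b):
    """Recursive split per (3) and (4); both operands have the same length n."""
    n = len(a)
    if n == 1:
        return [gf_mul(a[0], b[0])]
    if n == 3:
        return mul_deg2(a, b)
    if n % 2:                            # odd length: pad a zero term, drop the zero tail
        return karatsuba(a + [0], b + [0])[:2 * n - 1]
    half = n // 2
    a_lo, a_hi, b_lo, b_hi = a[:half], a[half:], b[:half], b[half:]
    d0 = karatsuba(a_lo, b_lo)
    d2 = karatsuba(a_hi, b_hi)
    d1 = karatsuba([x ^ y for x, y in zip(a_lo, a_hi)],
                   [x ^ y for x, y in zip(b_lo, b_hi)])
    c = [0] * (2 * n - 1)
    for i in range(2 * half - 1):
        c[i] ^= d0[i]
        c[i + half] ^= d1[i] ^ d0[i] ^ d2[i]
        c[i + 2 * half] ^= d2[i]
    return c

rng = random.Random(0)
A = [rng.randrange(128) for _ in range(37)]
B = [rng.randrange(128) for _ in range(37)]
```

With 37 coefficients the recursion pads to 38, splits into halves of 19, pads to 20, and continues down to three-coefficient operands, where `mul_deg2` is used.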
Table 1. Comparison of Used Logic Gates

| Method | Number of MUL | Number of ADD | Number of AND op. | Number of XOR op. | Total No. of Boolean Gates | Percentage |
|---|---|---|---|---|---|---|
| Schoolbook | 1369 | 1296 | 67081 | 74784 | 141865 | 100% |
| General KA | 703 | 3294 | 34447 | 56802 | 91249 | 64.3% |
| Our MKA | 345 | 1688 | 16905 | 28376 | 45281 | 31.9% |
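The gate counts in Table 1 can be reproduced from the MUL and ADD columns. The per-operation costs used below (49 AND and 48 XOR gates per $GF(2^7)$ multiplication, 7 XOR gates per addition) are an assumption inferred from the table's own figures, not a cost model stated in the text; the function name is hypothetical.

```python
# Assumed gate costs per GF(2^7) operation, inferred from Table 1.
AND_PER_MUL, XOR_PER_MUL, XOR_PER_ADD = 49, 48, 7

# (number of MUL, number of ADD) for each method, copied from Table 1.
counts = {
    "Schoolbook": (1369, 1296),
    "General KA": (703, 3294),
    "Our MKA": (345, 1688),
}

def gate_totals(n_mul, n_add):
    """Return (AND gates, XOR gates, total gates) for the given operation counts."""
    and_gates = n_mul * AND_PER_MUL
    xor_gates = n_mul * XOR_PER_MUL + n_add * XOR_PER_ADD
    return and_gates, xor_gates, and_gates + xor_gates

baseline = gate_totals(*counts["Schoolbook"])[2]
for name, (n_mul, n_add) in counts.items():
    and_gates, xor_gates, total = gate_totals(n_mul, n_add)
    print(f"{name}: AND={and_gates} XOR={xor_gates} total={total} "
          f"({100 * total / baseline:.1f}%)")
```

Under these assumed costs the script reproduces every AND, XOR, total and percentage entry of the table.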
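3.3 Modular reduction
The reduction modulo P(x) can be viewed as a linear mapping of the 2m−1 coefficients of $C'(x)$ into the m coefficients of C(x). This mapping can be represented in matrix notation as follows:

$$\begin{pmatrix} c_0 \\ c_1 \\ \vdots \\ c_{m-1} \end{pmatrix} =
\begin{pmatrix}
1 & 0 & \cdots & 0 & r_{0,0} & \cdots & r_{0,m-2} \\
0 & 1 & \cdots & 0 & r_{1,0} & \cdots & r_{1,m-2} \\
\vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 1 & r_{m-1,0} & \cdots & r_{m-1,m-2}
\end{pmatrix}
\begin{pmatrix} c'_0 \\ c'_1 \\ \vdots \\ c'_{m-1} \\ c'_m \\ \vdots \\ c'_{2m-2} \end{pmatrix} \qquad (5)$$

The matrix on the right-hand side consists of an (m, m) identity matrix and an (m, m−1) matrix R, which we call the reduction matrix. R is solely a function of the chosen field polynomial $P(x) = x^m + p_{m-1}x^{m-1} + \cdots + p_0$, i.e. to every P(x) a reduction matrix is uniquely assigned. The recursive dependency of R on P(x) is the following:

$$r_{i,j} = \begin{cases} p_i, & i = 0, \ldots, m-1;\ j = 0 \\ r_{i-1,j-1} + r_{m-1,j-1}\, r_{i,0}, & i = 0, \ldots, m-1;\ j = 1, \ldots, m-2 \end{cases} \qquad (6)$$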
where $r_{i-1,j-1} = 0$ if $i = 0$. From (6) it follows directly that $r_{i,j} \in GF(2^n)$, since $p_i \in GF(2^n)$. It should be emphasized that (5) does not require any general multiplication, only additions and multiplications by a constant from $GF(2^n)$. Both operations require only mod-2 adders.
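The recurrence for the reduction matrix can be sketched in a few lines. The version below assumes, for illustration, that the coefficients $p_i$ of P(x) are 0 or 1, so the entries of R are bits and applying R needs only XORs; the function names are hypothetical. Column j of R holds the coefficients of $x^{m+j} \bmod P(x)$.

```python
def reduction_matrix(p, m):
    """Build the m x (m-1) matrix R from p = [p_0, ..., p_{m-1}] (bits of monic P)."""
    r = [[0] * (m - 1) for _ in range(m)]
    for i in range(m):
        r[i][0] = p[i]                       # column 0: x^m = p_{m-1} x^{m-1} + ... + p_0
    for j in range(1, m - 1):
        for i in range(m):
            shifted = r[i - 1][j - 1] if i > 0 else 0
            r[i][j] = shifted ^ (r[m - 1][j - 1] & r[i][0])
    return r

def reduce_product(c_prime, r, m):
    """Apply [I | R] to the 2m-1 coefficients of C'(x); coefficients may be field elements."""
    c = list(c_prime[:m])
    for j in range(m - 1):
        for i in range(m):
            if r[i][j]:
                c[i] ^= c_prime[m + j]       # addition in characteristic 2 is XOR
    return c

# Example with the polynomial of K, P(x) = x^7 + x + 1.
R7 = reduction_matrix([1, 1, 0, 0, 0, 0, 0], 7)
```

For $x^7 + x + 1$ every column of R has exactly two ones, in rows j and j+1, reflecting $x^{7+j} = x^{j+1} + x^{j}$.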
3.4 Secret affine transformation
To calculate the inverse affine transformation, we need to implement a circuit for vector-matrix multiplication. The secret inverse affine bijection is stored as a vector of 37 elements and a matrix of 37 × 37 elements, where each element is over the Galois field $GF(2^7)$. The computation consists of two steps:
- The element-wise addition of the 37 elements of the message and the constant vector of the secret inverse affine bijection.
- The multiplication of the result of the above operation by the matrix part of the secret inverse affine bijection.
We need to design a vector-matrix multiplier to carry out these operations. The multiplication can be simplified by decomposing the bijection matrix into row vectors, each of which is multiplied by one of the 37 elements of the hashed message. The final row vector is then obtained by summing the intermediate results. Figure 1 and Figure 2 describe how this approach is realized as a hardware circuit.
Figure 1. Matrix Multiplication Circuit Methodology
Figure 2. Matrix Multiplication Circuit.
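The vector-matrix computation of Section 3.4 can be modelled in software as below. This is a sketch under assumptions: the function names are hypothetical, the 2 × 2 matrix and vectors are toy values rather than real SFlash key material, and the row-vector-times-matrix convention is one reading of the row decomposition described in the text.

```python
FIELD_POLY = 0b10000011  # x^7 + x + 1

def gf128_mul(a, b):
    """Multiply two elements of GF(2^7) held as 7-bit integers."""
    prod = 0
    for i in range(7):
        if (b >> i) & 1:
            prod ^= a << i
    for i in range(12, 6, -1):
        if (prod >> i) & 1:
            prod ^= FIELD_POLY << (i - 7)
    return prod

def vector_matrix_mul(vec, matrix):
    """Row vector times matrix: scale each matrix row by one vector element and sum."""
    result = [0] * len(matrix[0])
    for v, row in zip(vec, matrix):
        for col, entry in enumerate(row):
            result[col] ^= gf128_mul(v, entry)
    return result

def inverse_affine(message, const_vec, inv_matrix):
    """Add the constant vector, then multiply by the stored inverse matrix."""
    shifted = [m ^ c for m, c in zip(message, const_vec)]
    return vector_matrix_mul(shifted, inv_matrix)

# Toy example: this matrix is its own inverse in characteristic 2.
L = [[1, 5], [0, 1]]
const = [9, 20]
x = [17, 99]
y = [a ^ b for a, b in zip(vector_matrix_mul(x, L), const)]  # forward affine map
recovered = inverse_affine(y, const, L)
```

In the real design the vectors have 37 elements and the matrix is 37 × 37, but the accumulation pattern is the same.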
3.5 Mapping Function
We now need a way to compute the inverse mapping of the SFlash algorithm, $A = F^{-1}(B)$, where the function F is a map from $\mathcal{L}$ to $\mathcal{L}$ defined by

$$\forall A \in \mathcal{L}, \quad F(A) = A^{128^{11}+1},$$

and A is an element of the composite field $GF((2^7)^{37})$. The inverse mapping can be computed as $A = B^{h}$. The exponent h is the inverse of $128^{11} + 1$ modulo $128^{37} - 1$; in fact h can be given explicitly by the following formula, taken modulo $2^{259} - 1$:

$$h = 2^{258} + \sum_{i=0}^{17} \ \sum_{j=154i+76}^{154i+152} 2^{j}$$

In binary, $h = (128^{11} + 1)^{-1} \bmod (128^{37} - 1)$ can be written, most significant 7-bit window first, as:
1000000 1000000 1000000 0111111 0111111 0111111 0111111 1000000 1000000 1000000 1000000 0111111 0111111 0111111 1000000 1000000 1000000 1000000 0111111 0111111 0111111 0111111 1000000 1000000 1000000 0111111 0111111 0111111 0111111 1000000 1000000 1000000 1000000 0111111 0111111 0111111 1000000
The problem is how to compute such a large power.
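The value of h and its window structure can be checked with integer arithmetic. The snippet below is a verification aid only; the variable names are hypothetical and the three-argument `pow` with a negative exponent requires Python 3.8 or later.

```python
Q, THETA, N = 128, 11, 37
MODULUS = Q**N - 1                      # 128^37 - 1 = 2^259 - 1

# Modular inverse of 128^11 + 1.
h = pow(Q**THETA + 1, -1, MODULUS)

# Explicit double sum from the text, reduced modulo 2^259 - 1.
closed_form = (2**258 + sum(2**j
                            for i in range(18)
                            for j in range(154 * i + 76, 154 * i + 153))) % MODULUS

# 7-bit windows of h, most significant first.
digits = [(h >> (7 * k)) & 0x7F for k in range(N - 1, -1, -1)]
print(" ".join(f"{d:07b}" for d in digits))
```

Every window is either 1000000 (64) or 0111111 (63), which is what makes a window size of 7 attractive for the exponentiation in the next subsection.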
3.5.1 K-ary Exponentiation: Generally speaking, the k-ary method for exponentiation can be thought of as a three-step procedure:
- Partition the binary representation of the exponent h into k-bit windows.
- Pre-compute all possible window powers, one by one.
- Repeatedly square the partial result k times to shift it over, then multiply it by the power given by the next window if that window is different from 0.
To make the k-ary method efficient for our signature algorithm, we choose the parameter k equal to 7, which reduces the pre-processing step to only two possible powers, "0111111" and "1000000". The hardware that implements the k-ary method of Algorithm 1 is described in Figure 3.
Algorithm 1. KARY(B, h, M)
1: Partition h into p k-bit windows V_{p-1}, ..., V_0;
2: for i = 2 to 2^k - 1 do
3:   compute B^i mod M;
4: A := B^{V_{p-1}} mod M;
5: for i := p-2 downto 0 do
6:   A := A^{2^k} mod M;
7:   if V_i ≠ 0 then A := A × B^{V_i} mod M;
   end.
8: return A
In this algorithm the modular multiplication is performed using the modified Karatsuba multiplier described in the previous section. The hardware that implements the k-ary method, presented in Figure 3, first performs the pre-processing step: it computes the two possible powers "0111111" and "1000000" of B and stores them in two registers, REGMA and REGMB. Later, in the exponentiation step, each partition of the exponent h is used to address the registers and obtain the pre-computed power of B, as defined in line 3 of the algorithm. The first power of B, i.e. $B^2$ modulo M, is computed by passing B through both multiplexers MUX1 and MUX4, feeding the Modified Karatsuba Multiplier (MKM). The result is then stored in REGMA. The subsequent powers are obtained by passing the previous result through multiplexer MUX3 and then MUX1; note that B is kept available through multiplexer MUX4. Using the square-and-multiply method we obtain the power "0111111" and finally store it in register REGMA.
Figure 3. Signature Generation Circuit
With the same procedure, the power "1000000" is computed and the final result is stored in register REGMB. After that, the most significant bit of each partition of the exponent selects the relevant pre-computed power from the two power registers. In each iteration of the second step, the partial result A is raised to the power $2^k$ and then multiplied by $B^{V_i}$ modulo M, where $V_i$ takes one of the two possible window values. The value of $B^{V_i}$ is read from the two power registers according to the most significant bit of the current partition of the exponent h. We store the exponent h in a shift register, REGe, from which the most significant partition is retrieved to address the power registers REGMA and REGMB. When a new partition is required, register REGe is left-shifted k times; recall that k is the partition size. This operation is controlled by a down counter that is initialized to k and decremented each time the register REGe is left-shifted. The signal zero is asserted when the down counter reaches zero.
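The control flow of Algorithm 1 can be sketched in software as follows. To keep the example short it uses integers modulo a prime as a stand-in for the composite field, which is an assumption made purely for illustration; the function names are hypothetical, and the built-in `pow` stands in for the pre-processing step that fills the two power registers.

```python
def windows_msb_first(exponent, k):
    """Split the exponent into k-bit windows, most significant window first."""
    out = []
    while exponent:
        out.append(exponent & ((1 << k) - 1))
        exponent >>= k
    return out[::-1]

def kary_pow(base, exponent, modulus, k=7):
    """k-ary exponentiation that pre-computes only the window values that occur."""
    wins = windows_msb_first(exponent, k)
    if not wins:
        return 1 % modulus
    # Pre-processing: one table entry per distinct non-zero window value.
    table = {v: pow(base, v, modulus) for v in set(wins) if v}
    acc = table[wins[0]]                 # the top window of a non-zero exponent is non-zero
    for v in wins[1:]:
        for _ in range(k):               # raise the partial result to the power 2^k
            acc = acc * acc % modulus
        if v:
            acc = acc * table[v] % modulus
    return acc

# SFlash exponent: every 7-bit window is 63 or 64, so the table has two entries.
h = pow(128**11 + 1, -1, 128**37 - 1)
M = 2**61 - 1                            # stand-in modulus (a prime), not part of SFlash
result = kary_pow(3, h, M)
```

With k = 7 the 259-bit exponent splits into 37 windows, so the main loop performs 36 rounds of seven squarings followed by one multiplication.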
4. Results
Table 2 gives an overview of the area utilization for a 259-bit word length. The gate equivalent for the AMI 0.35 µm cell library is a NAND gate with two inputs (36.9 µm²).

| Units | Gate equivalents | Area (mm²) |
|---|---|---|
| Multiplier $GF((2^7)^{37})$ | 68812.5 | 2.53918125 |
| Registers | 1968.4 | 0.07263396 |
| Multiplexers | 666 | 0.0245754 |
| Multipliers over $GF(2^7)$ | 5194.8 | 0.19168812 |
| Registers | 984.2 | 0.03631698 |
| Adders over $GF(2^7)$ | 440.3 | 0.01624707 |
| Total | 78066.2 | 2.88064278 |

Table 2. Area Utilization of The Circuit.
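The area column of Table 2 follows from the gate equivalents and the 36.9 µm² cell area. The short check below recomputes it; the list layout and names are illustrative only.

```python
GATE_AREA_UM2 = 36.9  # area of one two-input NAND gate equivalent, in square micrometres

# (unit, gate equivalents) in the order of Table 2.
units = [
    ("Multiplier GF((2^7)^37)", 68812.5),
    ("Registers", 1968.4),
    ("Multiplexers", 666.0),
    ("Multipliers over GF(2^7)", 5194.8),
    ("Registers", 984.2),
    ("Adders over GF(2^7)", 440.3),
]

def area_mm2(gate_equivalents):
    """Convert gate equivalents to square millimetres."""
    return gate_equivalents * GATE_AREA_UM2 / 1e6

total_ge = sum(ge for _, ge in units)
total_area = area_mm2(total_ge)
for name, ge in units:
    print(f"{name}: {ge} GE, {area_mm2(ge):.8f} mm^2")
print(f"Total: {total_ge:.1f} GE, {total_area:.8f} mm^2")
```

The composite-field multiplier accounts for roughly 88% of the total gate count.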
To complete the signature process we need:
- 384 multiplication times
- 384 reduction times
- 74 addition times
- 2738 clock cycles for data feeding
- 9 clock cycles for temporary data saving
We treat a multiplication followed by a reduction as one complete operation and call the pair a multiplication. The number of clock cycles required is therefore 3131, with the single multiplication operation as the critical path. The clock cycle time is
CLK = T_mul + T_red = 5.245 + 1.61 = 6.855 ns.
The time needed for the signature is then 6.855 × 3131 = 21.5 µs, and the throughput of the algorithm is (1 / (21.4 × 10^-6)) × 259 = 12.10 Mb/s.
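The timing figures can be recomputed from the cycle counts and the two delays. The script below is plain arithmetic on the numbers in the text; the variable names are illustrative. Note that the cycle total of 3131 is the sum 384 + 2738 + 9, and that the unrounded throughput comes out slightly below the quoted 12.10 Mb/s, which is obtained with the signing time rounded to 21.4 µs.

```python
T_MUL_NS, T_RED_NS = 5.245, 1.61        # multiplier and reduction delays from the text

cycles = 384 + 2738 + 9                 # multiplications + data feeding + temporary saves
clock_ns = T_MUL_NS + T_RED_NS          # one multiplication plus reduction per cycle
sign_time_us = cycles * clock_ns / 1000
throughput_mbps = 259 / sign_time_us    # bits per microsecond equals Mb/s
max_freq_mhz = 1000 / clock_ns

print(f"cycles        = {cycles}")
print(f"clock period  = {clock_ns:.3f} ns ({max_freq_mhz:.1f} MHz)")
print(f"signing time  = {sign_time_us:.2f} us")
print(f"throughput    = {throughput_mbps:.2f} Mb/s")
```

The 6.855 ns critical path corresponds to a little under 146 MHz, consistent with the reported maximum clock frequency of about 140 MHz.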
5. Conclusion
In this paper, we implemented an SFlash signature chip using a modified Karatsuba-Ofman algorithm for wireless devices and smart cards, which need fast operation and low hardware resources. The SFlash chip, which uses a fully parallel modified Karatsuba-Ofman multiplier with 6 stages of multiplication, requires only 3131 cycles per signature. Synthesis gives an operating frequency of about 140 MHz and a gate count of approximately 78K.
References
[1] W. Diffie and M. Hellman, New Directions in Cryptography, IEEE Trans. Info. Theory, vol. IT-22, no. 6 (1976), pp. 644-654.
[2] H. Ong, C. Schnorr, and A. Shamir, A Fast Signature Scheme Based on Quadratic Equations, Proc. 16th ACM Symp. Theo. of Computations, 1984, pp. 208-216.
[3] H. Fell and W. Diffie, Analysis of a Public Key Approach Based on Polynomial Substitution, CRYPTO'85, LNCS V. 218, pp. 340-349.
[4] T. Matsumoto and H. Imai, Public Quadratic Polynomial-Tuples for Efficient Signature-Verification and Message-Encryption, EUROCRYPT'88, LNCS V. 330, pp. 419-453.
[5] M. Akkar, N. Courtois, R. Duteuil, and L. Goubin, A Fast and Secure Implementation of SFLASH, PKC 2003, LNCS V. 2567, pp. 267-278.
[6] J. Patarin, L. Goubin, and N. Courtois, Flash, a fast multivariate signature algorithm, Cryptographers' Track RSA Conference 2001, San Francisco, 8-12 April 2001, LNCS 2020, Springer, pp. 298-307.
[7] E. D. Mastrovito, VLSI designs for computations over finite fields GF(2^m), Ph.D. Thesis No. 159, Dept. of Electr. Engineering, Linköping Univ., Sweden, 1988.
[8] M. A. Hasan, M. Z. Wang, and V. K. Bhargava, A modified Massey-Omura parallel multiplier for a class of finite fields, IEEE Transactions on Computers, 42(10):1278-1280, November 1993.
[9] C. Paar, Efficient VLSI Architectures for Bit Parallel Computation in Galois Fields, PhD thesis, Universität GH Essen, VDI Verlag, 1994.
[10] B. Sunar and Ç. K. Koç, Mastrovito multiplier for all trinomials, IEEE Transactions on Computers, 48(5):522-527, May 1999.
[11] A. Karatsuba and Y. Ofman, Multiplication of multidigit numbers by automata, Soviet Physics-Doklady 7, pp. 595-596, 1963.
[12] IEEE P1363, Standard specifications for public-key cryptography, Draft Version 7, September 1998.
[13] C. Grabbe, M. Bednara, J. Teich, J. von zur Gathen, and J. Shokrollahi, FPGA designs of parallel high performance GF(2^233) multiplier, Proc. of the IEEE International Symposium on Circuits and Systems, May 2003, 362-369.
[14] Z. Dyka and P. Langendoerfer, Area efficient hardware implementation of elliptic curve cryptography by iteratively applying Karatsuba's method, Proc. of the Design, Automation and Test in Europe Conference and Exhibition, Vol. 3, Mar. 2005, 70-75.
[15] D. V. Bailey and C. Paar, Efficient Arithmetic in Finite Field Extensions with Application in Elliptic Curve Cryptography, Journal of Cryptology, vol. 14, no. 3, 153-176, 2001.
[16] A. Weimerskirch and C. Paar, Generalization of The Karatsuba Algorithm for Polynomial Multiplication, Design, Codes and Cryptography, March 2001.
Authors
Mohamed M. Abdelhalim is an M.Sc. student at the Faculty of Engineering, Cairo University. He received the B.Sc. degree in Computer Engineering from Cairo University, Egypt. He has worked as a research and development engineer at Globaltronics and at Advanced Smart Card Co., where he has contributed to the development of many smart-card-based projects.
Raafat S. Elfouly is an assistant professor at the Faculty of Engineering, Cairo University. He received the Ph.D. degree in Computer Science and Engineering from the University of Connecticut, CT, USA. He has published several papers in refereed journals and international conferences, as well as a number of technical reports.