Implementation of Scalable Elliptic Curve Cryptosystem Crypto-Accelerators for GF(2^m)

Aaron E. Cohen and Keshab K. Parhi
Department of Electrical and Computer Engineering
VLSI Digital Signal Processing Laboratory
University of Minnesota Twin Cities
{aecohen, parhi}@ece.umn.edu

Abstract
This paper focuses on designing elliptic curve crypto-accelerators in GF(2^m) that are cryptographically scalable and offer some degree of reconfigurability. Previous work on elliptic curve crypto-accelerators focused on implementations using projective coordinate systems for specific field sizes, and their performance, measured in scalar point multiplications per second (kP/s), was determined primarily by the underlying multiplier implementation. In this paper, a multiplier-only implementation and a multiplier-plus-divider implementation are compared in terms of critical path, area, and area-time (AT) product. Our multiplier-only design, designed for high performance, achieves 6314 kP/s for GF(2^571) and requires 47876 LUTs. Our multiplier-and-divider design, which offers a greater degree of reconfigurability, achieves 44 kP/s for GF(2^571); it requires only 27355 LUTs but has a significantly higher AT product. It is shown that reconfigurability of the reduction polynomial benefits significantly from the addition of a low-latency divider unit and from performing scalar point multiplication in affine coordinates. In both designs, performance is limited by a critical path in the control logic.

1: Introduction
Early public key cryptosystems were developed in the mid-1970s. Their usefulness grew dramatically with the arrival of the Internet and the numerous financial transactions conducted over it every day. Many public key cryptosystems base their security on the difficulty of a well-known hard problem, such as computing discrete logarithms or factoring integers, and are therefore assumed to be very secure. However, many researchers have shown that breaking a cryptosystem is not necessarily as difficult as solving the underlying hard problem. Timing analysis attacks and power analysis attacks, for example, can break the security of public key cryptosystem implementations; OpenSSL [1] released a patch against timing analysis attacks.

In the mid-1980s, elliptic curve cryptography was proposed independently by Miller and Koblitz. It allows public key cryptosystems to use smaller key sizes while providing security strength comparable to the RSA cryptosystem. This advantage does not come free: elliptic curve cryptosystems have higher hardware complexity than systems built for the RSA cryptosystem. To narrow this gap, a significant amount of research has focused on the optimizations available for particular elliptic curves and field sizes, such as optimal normal bases, composite bases, and Koblitz curves. The drawback is that published standards [2, 3] do not include all of the curves with a high degree of optimization potential, because the curves that cannot be optimized as easily are assumed to be more secure.

NIST has issued a standard [2] for using elliptic curves in security systems intended for financial transactions, and industry has taken a strong interest in elliptic curve cryptography. Sun Microsystems has implemented elliptic curve cryptography on a field programmable gate array, demonstrating elliptic curve cryptography acceleration [4], and has donated elliptic curve cryptography code to the OpenSSL project. Certicom [5] has issued a challenge similar to the RSA Factoring Challenge [6] but intended specifically for elliptic curves.

There is a growing need for hardware acceleration of public key cryptosystems. Servers, security challenges, and embedded systems have all pushed designers to add crypto-accelerators, i.e., additional hardware dedicated to cryptosystems. Crypto-accelerators are very promising because they typically achieve better performance and better power efficiency than a software implementation on a general-purpose processor. A crypto-accelerator can be implemented in a field programmable gate array (FPGA) or an application specific integrated circuit (ASIC).

This paper is organized as follows. Section 2 reviews previous work. Section 3 presents background on elliptic curves and our design criteria. Section 4 introduces the functional units, and Section 5 analyzes the algorithm run time for our processors. Section 6 discusses the two designs and their performance. Finally, Section 7 concludes and briefly discusses future work.

2: Previous Work
The development of elliptic curve crypto-accelerators has spawned renewed interest in finite field GF(2^m) architectures and modular arithmetic GF(p) architectures. Research on elliptic curve crypto-accelerators covers finite field multiplier, divider, and squarer units as well as modular multiplier, divider, and squarer units. Research at the algorithmic level of scalar point multiplication includes scalar recoding methods, algorithm parallelization methods, and alternative coordinate systems.

Research on architectures for arithmetic in GF(2^m), often called finite field arithmetic, covers multipliers, dividers, other operational units, and different bases. Finite field multipliers have been a popular research topic [7, 8], which in one case led to a patent for Massey-Omura normal basis multipliers [9]. Other multipliers include the Sunar-Koc designs [10, 11], Mastrovito designs [12], multipliers based on the classical algorithm for the polynomial basis, and multipliers based on the Karatsuba-Ofman algorithm [13] for the polynomial basis. More recently, finite field divider units have been designed using modified versions of the extended Euclidean algorithm [14, 15] or of the extended binary GCD algorithm [15, 16]. Less well-known division techniques include solving a system of linear equations [18] and the almost inverse algorithm [17].

Previous research on architectures for arithmetic in GF(p), usually implemented as modular arithmetic, includes various multiplier, divider, and other operational units. The multiplier units are derived either from the Karatsuba-Ofman and classical algorithms or from the Montgomery N-Residue technique. The most popular division unit is based on the Almost Montgomery Inverse [19] because of its relation to N-Residues.

Research on the algorithm level of scalar point multiplication covers recoding methods, parallelization techniques, and coordinate systems. The most popular recoding technique has been the non-adjacent form (NAF), because it requires essentially no storage overhead compared with using the unrecoded scalar. Other popular techniques include KT representation and Booth recoding [20]. Lopez and Dahab introduced a parallelizable version of scalar point multiplication [21]. Their work eliminated the data dependence in the scalar point multiplication loop by removing the y-coordinate update from it, which led to an algorithm with a uniform operation count in every iteration. Changing the coordinate system can also have a significant effect on overall system performance [22]. The most popular switch is to use a projective coordinate system (X, Y, Z) instead of the affine coordinate system (x, y), which removes the costly division operation from the main loop of the scalar point multiplication algorithm.

Other research topics in elliptic curve cryptography include the Frobenius map [23] for Koblitz curves and the composite basis [24]. The Frobenius map is an optimization that is only useful for Koblitz curves. The composite basis allows multipliers to be designed more efficiently than with the Mastrovito technique: it significantly decreases gate count, at the cost of additional overhead from long interconnect routing. Moreover, when polynomial basis numbers are used, composite basis multipliers become unnecessary, because it is more practical to implement fast squaring and share the reduction logic with the multiplication unit.
Previous research on elliptic curve crypto-accelerators has focused primarily on optimizing speed under different area constraints [25–30]. In [25, 27, 28, 30], arithmetic was performed in the polynomial basis, whereas in [26, 29] it was performed in the normal basis. The elliptic curve parameters, such as the field size and the reduction polynomial, are typically preset or built into the architecture [25–27, 29, 30]. Thus previous research has focused on implementations for specific elliptic curves, whereas this paper primarily addresses reconfigurability and performance for the general curves listed in the NIST standard [2].

3: Background
3.1: Elliptic Curves
An elliptic curve cryptosystem is defined by an underlying elliptic curve and an operation called scalar point multiplication. The most general equation for an elliptic curve is the Weierstrass equation (1) [31], which requires five coefficients, a_1, a_2, a_3, a_4, and a_6, to specify the exact elliptic curve.

$$E: y^2 + a_1 xy + a_3 y = x^3 + a_2 x^2 + a_4 x + a_6 \tag{1}$$

3.2: Mathematical Operations over Elliptic Curves

Algorithm I: MSB Scalar Point Multiplication
INPUTS: k, P
Q = O;
for i = m − 1 downto 0 begin
    Q = 2Q;
    if k[i] == 1 then Q = Q + P;
end;
OUTPUT: Q = kP
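Algorithm I can be sketched in Python as below. The sketch is generic over the group: `add` is assumed to implement point addition (so `add(Q, Q)` is point doubling) and `O` is the identity element. For a self-contained check it can be run on integers modulo n, an additive group in which kP is simply k·P mod n. The NAF variant, which the analysis in Section 5 assumes, adds −P for digits equal to −1. All function names are illustrative, not taken from the paper.

```python
from typing import Callable, List, TypeVar

G = TypeVar("G")  # element type of the group (e.g. an elliptic curve point)

def msb_scalar_mult(k: int, P: G, O: G, add: Callable[[G, G], G]) -> G:
    """Algorithm I: scan k from its most significant bit down to bit 0."""
    Q = O
    # Iterating over k.bit_length() bits instead of m bits skips leading zeros,
    # which would only double O.
    for i in range(k.bit_length() - 1, -1, -1):
        Q = add(Q, Q)              # Q = 2Q
        if (k >> i) & 1:           # if k[i] == 1
            Q = add(Q, P)          # Q = Q + P
    return Q

def naf(k: int) -> List[int]:
    """Non-adjacent form of k >= 0, least significant digit first, digits in {-1, 0, 1}."""
    digits = []
    while k > 0:
        if k & 1:
            d = 2 - (k % 4)        # +1 if k = 1 (mod 4), -1 if k = 3 (mod 4)
            k -= d                 # k is now divisible by 4, forcing the next digit to 0
        else:
            d = 0
        digits.append(d)
        k >>= 1
    return digits

def msb_scalar_mult_naf(k: int, P: G, O: G,
                        add: Callable[[G, G], G],
                        neg: Callable[[G], G]) -> G:
    """Algorithm I with NAF digits: a digit of -1 adds -P instead of P."""
    Q = O
    minus_P = neg(P)
    for d in reversed(naf(k)):     # most significant digit first
        Q = add(Q, Q)
        if d == 1:
            Q = add(Q, P)
        elif d == -1:
            Q = add(Q, minus_P)
    return Q
```

For example, `naf(7)` is `[-1, 0, 0, 1]`, i.e., 7 = 8 − 1. On average about one NAF digit in three is non-zero, which is the origin of the O(m/3) point-addition term in Section 5.1.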

Scalar point multiplication, kP, where k is a scalar and P is a point on the elliptic curve E, adds P to itself k times using the group law on E; Algorithm I does this with one point doubling per bit of k and one point addition per non-zero bit. Adding points on an elliptic curve is called point addition, and its special case P1 = P2 is called point doubling. Point addition [31] obeys the following rules: 1) P + P = 2P, 2) P + O = P, 3) O + O = O, 4) P1 + P2 = P3, and 5) P − P = O, where P, P1, P2, and P3 are points on the elliptic curve and O is the point at infinity. To add P1 and P2, draw the line y = mx + b through them, with slope and intercept given by equations (2) and (3) [31], and find its third intersection with E. If P1 equals P2, draw the tangent line at P1 instead. Once the third intersection is found, compute the other y-coordinate that satisfies E for the same x, using equation (4) [31]. The result P3 consists of the x-coordinate of the third intersection and this other y-coordinate.

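$$m = \begin{cases} \dfrac{y_2 - y_1}{x_2 - x_1}, & P_1 \neq \pm P_2 \\[1ex] \dfrac{3x^2 + 2a_2 x + a_4 - a_1 y}{2y + a_1 x + a_3}, & P_1 = P_2 \end{cases} \tag{2}$$

$$b = \begin{cases} \dfrac{y_1 x_2 - y_2 x_1}{x_2 - x_1}, & P_1 \neq \pm P_2 \\[1ex] \dfrac{-x^3 + a_4 x + 2a_6 - a_3 y}{2y + a_1 x + a_3}, & P_1 = P_2 \end{cases} \tag{3}$$

$$y_2 = -y_1 - a_1 x - a_3 \tag{4}$$

Here P_1 = (x_1, y_1) and P_2 = (x_2, y_2), and in the doubling case (x, y) = P_1 = P_2. In equation (4), y_1 and y_2 denote the two y-coordinates that satisfy E for the same x.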
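As a concrete check of equations (2) to (4), the sketch below specializes them to a curve over GF(2^m) of the form y^2 + xy = x^3 + A x^2 + B, i.e., equation (1) with a_1 = 1, a_3 = 0, a_2 = A, a_4 = 0, a_6 = B, which is the form used by the NIST binary curves. In characteristic 2, subtraction equals addition (XOR), 2 = 0, and 3 = 1, so the doubling slope in (2) becomes (x^2 + y)/x and the doubling intercept in (3) becomes x^2. The x-coordinate of the third intersection is not written out in the text; the code uses the standard group-law formula x_3 = m^2 + a_1 m − a_2 − x_1 − x_2 from [31] (the slope m is called `lam` to avoid a clash with the field degree). The toy field GF(2^4) with p(x) = x^4 + x + 1, the constants A = B = 1, and Fermat-based inversion are illustrative assumptions, not the paper's parameters.

```python
M = 4              # toy field degree (illustrative)
POLY = 0b10011     # p(x) = x^4 + x + 1, irreducible over GF(2)
A, B = 1, 1        # curve y^2 + xy = x^3 + A x^2 + B, with B != 0
O = None           # point at infinity

def gf_mul(a: int, b: int) -> int:
    """Shift-and-add multiplication in GF(2^M), reducing by POLY."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> M) & 1:
            a ^= POLY
    return r

def gf_inv(a: int) -> int:
    """Inverse via Fermat's little theorem: a^(2^M - 2)."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse")
    r, e = 1, (1 << M) - 2
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r

def on_curve(P) -> bool:
    if P is O:
        return True
    x, y = P
    lhs = gf_mul(y, y) ^ gf_mul(x, y)
    rhs = gf_mul(gf_mul(x, x), x) ^ gf_mul(A, gf_mul(x, x)) ^ B
    return lhs == rhs

def neg(P):
    """Equation (4) with a_1 = 1, a_3 = 0: the other root is y + x."""
    return O if P is O else (P[0], P[0] ^ P[1])

def ec_add(P, Q):
    if P is O:
        return Q
    if Q is O:
        return P
    if Q == neg(P):                       # P - P = O (also covers 2P = O when x = 0)
        return O
    (x1, y1), (x2, y2) = P, Q
    if P == Q:                            # doubling branch of (2) and (3)
        lam = x1 ^ gf_mul(y1, gf_inv(x1)) # (x^2 + y) / x
        nu = gf_mul(x1, x1)               # x^3 / x
    else:                                 # addition branch of (2) and (3)
        inv = gf_inv(x1 ^ x2)
        lam = gf_mul(y1 ^ y2, inv)
        nu = gf_mul(gf_mul(y1, x2) ^ gf_mul(y2, x1), inv)
    x3 = gf_mul(lam, lam) ^ lam ^ A ^ x1 ^ x2   # x of the third intersection
    y_line = gf_mul(lam, x3) ^ nu               # third intersection on y = lam*x + nu
    return (x3, y_line ^ x3)                    # reflect with (4)
```

For instance, (0, 1) lies on this toy curve, because for x = 0 the curve equation reduces to y^2 = B = 1.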

3.3: Mathematical Operations in Fields
The coordinate calculations are performed in a finite field GF(p^m) determined by the elliptic curve. For operations in GF(2^m), either the polynomial basis (5) or the normal basis (6) may be used.

$$a(x) = a_{m-1} x^{m-1} + a_{m-2} x^{m-2} + \dots + a_0 x^0 \tag{5}$$

$$a(x) = a_{m-1} x^{2^{m-1}} + a_{m-2} x^{2^{m-2}} + \dots + a_0 x^{2^0} \tag{6}$$
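A useful property of representation (6) is that squaring only rotates the coefficient vector: squaring maps each basis element x^{2^i} to x^{2^{i+1}}, and x^{2^m} = x. The sketch below checks this exhaustively in a toy field. It assumes GF(2^4) built in the polynomial basis from p(x) = x^4 + x + 1 (an illustrative choice), finds a normal element `beta` (playing the role of x in (6)) by brute force, and compares field squaring with a one-position cyclic rotation of the normal-basis coefficients, where bit i of a coefficient word holds a_i.

```python
M = 4
POLY = 0b10011          # x^4 + x + 1 (illustrative)
MASK = (1 << M) - 1

def gf_mul(a: int, b: int) -> int:
    """Polynomial-basis multiplication in GF(2^M)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> M) & 1:
            a ^= POLY
    return r

def conjugates(beta: int) -> list:
    """[beta, beta^2, beta^4, ..., beta^(2^(M-1))], in the polynomial basis."""
    basis = [beta]
    for _ in range(M - 1):
        basis.append(gf_mul(basis[-1], basis[-1]))
    return basis

def from_normal(coeffs: int, basis: list) -> int:
    """Field element sum of a_i * beta^(2^i), where bit i of coeffs is a_i."""
    v = 0
    for i in range(M):
        if (coeffs >> i) & 1:
            v ^= basis[i]
    return v

def find_normal_basis() -> list:
    """beta generates a normal basis iff its conjugates are linearly independent."""
    for beta in range(1, 1 << M):
        basis = conjugates(beta)
        if len({from_normal(c, basis) for c in range(1 << M)}) == 1 << M:
            return basis
    raise ValueError("no normal element found")

def rotate_left(coeffs: int) -> int:
    """Cyclic shift: a_i moves to position i + 1 and a_(M-1) wraps to position 0."""
    return ((coeffs << 1) | (coeffs >> (M - 1))) & MASK

BASIS = find_normal_basis()
TO_NORMAL = {from_normal(c, BASIS): c for c in range(1 << M)}
```

In hardware this means a normal-basis squarer is just wiring, with no gates at all.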

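In the polynomial basis for GF(2^m), addition (7), multiplication (8), and division (9) are performed without carries and are followed by reduction modulo the reduction polynomial p(x), an irreducible polynomial of degree m. For addition the reduction is trivial, because the sum of two polynomials of degree less than m also has degree less than m.

$$(a(x) + b(x)) \bmod p(x) \tag{7}$$

$$(a(x) \times b(x)) \bmod p(x) \tag{8}$$

$$(a(x) \div b(x)) \bmod p(x) \tag{9}$$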
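Equations (7) to (9) can be sketched with Python integers as bit vectors, where bit i holds the coefficient of x^i. The sketch uses the GF(2^163) reduction polynomial x^163 + x^7 + x^6 + x^3 + 1 from the NIST recommended binary curves [2]; division is computed as multiplication by an inverse obtained with the extended Euclidean algorithm over GF(2)[x], one of the divider algorithm families mentioned in Section 2. The function names are illustrative.

```python
# GF(2^163) reduction polynomial of the NIST binary curves:
# p(x) = x^163 + x^7 + x^6 + x^3 + 1
P163 = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1

def gf2m_add(a: int, b: int) -> int:
    """Equation (7): carry-free addition is a bitwise XOR; no reduction needed."""
    return a ^ b

def clmul(a: int, b: int) -> int:
    """Carry-less (GF(2)[x]) product of two polynomials, without reduction."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def gf2m_reduce(c: int, poly: int) -> int:
    """Reduce c modulo poly by cancelling the leading term until deg(c) < deg(poly)."""
    m = poly.bit_length() - 1
    while c.bit_length() - 1 >= m:
        c ^= poly << (c.bit_length() - 1 - m)
    return c

def gf2m_mul(a: int, b: int, poly: int = P163) -> int:
    """Equation (8): carry-less product followed by reduction."""
    return gf2m_reduce(clmul(a, b), poly)

def gf2m_inv(a: int, poly: int = P163) -> int:
    """Extended Euclidean inversion; invariants g1*a = u and g2*a = v (mod poly)."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse")
    u, v, g1, g2 = a, poly, 1, 0
    while u != 1:
        j = u.bit_length() - v.bit_length()   # deg(u) - deg(v)
        if j < 0:
            u, v, g1, g2, j = v, u, g2, g1, -j
        u ^= v << j                           # cancel the leading term of u
        g1 ^= g2 << j
    return gf2m_reduce(g1, poly)

def gf2m_div(a: int, b: int, poly: int = P163) -> int:
    """Equation (9): a / b = a * b^(-1) mod poly."""
    return gf2m_mul(a, gf2m_inv(b, poly), poly)
```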

Addition in the normal basis is identical to addition in the polynomial basis, because it does not require carries. Squaring in the normal basis is a cyclic shift of the coefficients. The details of normal basis multiplication can be found in [32]. For operations in GF(p), modular arithmetic is usually used: the results of addition (10), multiplication (11), and division (12) are reduced modulo p, where A ÷ B denotes A × B^{-1}.

$$(A + B) \bmod p \tag{10}$$

$$(A \times B) \bmod p \tag{11}$$

$$(A \div B) \bmod p \tag{12}$$
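A minimal sketch of (10) to (12) with Python's built-in integers follows. Division needs a modular inverse, and the sketch shows two ways to obtain it: `pow(b, -1, p)`, which is available since Python 3.8, and Fermat's little theorem, b^(p−2) mod p, the GF(p) analogue of the exponentiation-based division discussed in Section 5. The prime p = 2^127 − 1 (a Mersenne prime) is an arbitrary illustrative choice.

```python
P = (1 << 127) - 1  # Mersenne prime 2^127 - 1, chosen only for illustration

def mod_add(a: int, b: int, p: int = P) -> int:
    return (a + b) % p                       # equation (10)

def mod_mul(a: int, b: int, p: int = P) -> int:
    return (a * b) % p                       # equation (11)

def mod_div(a: int, b: int, p: int = P) -> int:
    """Equation (12) with the built-in modular inverse (raises ValueError if b = 0 mod p)."""
    return (a * pow(b, -1, p)) % p

def mod_div_fermat(a: int, b: int, p: int = P) -> int:
    """Equation (12) via Fermat's little theorem: b^(-1) = b^(p-2) mod p for prime p."""
    if b % p == 0:
        raise ZeroDivisionError("division by zero in GF(p)")
    return (a * pow(b, p - 2, p)) % p
```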

3.4: Design Considerations
3.4.1: Reconfigurability
Users need reconfigurability, i.e., the ability to change the elliptic curve parameters after the design has been manufactured. If security is found to be compromised for a specific reduction polynomial or specific elliptic curve parameters, the design must allow the reduction polynomial or these parameters to be updated, so they should not be hardwired into the design.

3.4.2: Scalability
One important issue when designing crypto-accelerators is scalability, the ability to perform encryption and decryption at various security (cryptographic) strengths. Building scalability into a system has two advantages. First, the user wastes less power on cryptography when full strength is not needed. Second, the user can increase system performance by selecting the minimum adequate cryptographic strength. There are two primary ways of adding scalability to a crypto-accelerator. The first is over-designing: the crypto-accelerator has enough area to perform the full-strength operation, and the total cycle count decreases when it operates at the minimum cryptographic strength. The second is hardware/software co-design: the minimum-strength operations are implemented in hardware, and software or delay elements perform the extension to higher cryptographic strength.

4: Functional Units Analysis
The functional units tested include the adder, the linear feedback shift register (LFSR) multiplier [8], the Karatsuba-Ofman multiplier [7], the squarer unit, and the least significant bit (LSB) first divider [16].

4.1: Bit Parallel Adder
The adder unit was designed for GF(2^m). Addition in this field does not require carry propagation, so the hardware simplifies to an array of exclusive-or (XOR) gates computing c_i = a_i ⊕ b_i for i = 0, ..., m − 1, as in Figure 4.1. The area of the adder is therefore m XOR gates, i.e., O(m), and its critical path is T_XOR.

Figure 4.1: Bit Parallel Adder

4.2: LFSR Multiplier
The multiplier unit can be constructed as a linear feedback shift register (LFSR) or with the Karatsuba-Ofman algorithm followed by reduction circuitry. There are a few methods for extending the LFSR multiplier (Figure 4.2) to support different fields. The first is to place a multiplexer on the feedback loop. This is not ideal: the multiplexer limits the number of additional fields the multiplier can support, and it adds delay to the long feedback wire. The second is to perform the multiplication as usual but with shifted inputs and outputs. This method is preferable because it adds no delay to the feedback path. Both methods require the LFSR to be implemented for the maximum field size; for example, for the maximum field GF(2^571) there are 571 AND gates connected to the feedback path. By inspection, the LFSR requires on the order of 2m − 1 XOR gates and 2m delay elements, plus O(m) multiplexers and O(m) AND gates, and its critical path is approximately T_FEEDBACK + T_AND + T_XOR + T_MUX.
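A behavioral sketch of the shifted-input approach, under the assumption of an MSB-first bit-serial LFSR datapath of fixed width m_max: a field GF(2^k) with k ≤ m_max is handled by left-aligning the multiplicand and the reduction polynomial by m_max − k positions, so that the feedback tap always sits at the top of the register, and shifting the result back down. This is our reading of "shifted inputs and outputs", not the paper's exact circuit; the function names and the MSB-first bit order are assumptions. Each loop iteration models one clock cycle, so a k-bit field takes k cycles on the same hardware, which is the over-designing form of scalability from Section 3.4.2.

```python
M_MAX = 571  # register width: the largest supported field, GF(2^571)

def lfsr_multiply(a: int, b: int, poly: int, k: int, m_max: int = M_MAX) -> int:
    """Model of a bit-serial MSB-first LFSR multiplier computing a*b mod poly in GF(2^k).

    a, b: field elements (< 2^k); poly: reduction polynomial including its x^k term.
    The operands are left-aligned in an m_max-bit register, so the same feedback
    path serves every k <= m_max.
    """
    shift = m_max - k
    a_s = a << shift            # shifted input
    p_s = poly << shift         # the x^k term now lands on bit m_max (the feedback bit)
    top = 1 << m_max
    c = 0
    for i in range(k - 1, -1, -1):   # one clock cycle per bit b_i, MSB first
        c <<= 1                      # shift register: c = c * x
        if c & top:                  # feedback: reduce by the aligned polynomial
            c ^= p_s
        if (b >> i) & 1:             # AND gates select a, XOR gates accumulate it
            c ^= a_s
    return c >> shift                # shifted output

def reference_multiply(a: int, b: int, poly: int) -> int:
    """Plain carry-less multiply and reduce, used only as a cross-check."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    k = poly.bit_length() - 1
    while r.bit_length() - 1 >= k:
        r ^= poly << (r.bit_length() - 1 - k)
    return r

# NIST reduction polynomials for two of the supported field sizes
P163 = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1
P233 = (1 << 233) | (1 << 74) | 1
```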

[Figure 4.2 diagram: reduction polynomial inputs p_0 to p_{m−1}, multiplicand bits a_0 to a_{m−1}, serial input b_i, a chain of D flip-flops, and a parallel-to-serial converter producing the output c_i]

Figure 4.2: Linear Feedback Shift Register (LFSR) Multiplier

4.3: Karatsuba-Ofman Multiplier
The Karatsuba-Ofman algorithm decomposes a large multiplication into smaller ones: one m-bit product is computed from three m/2-bit products instead of four. This property allows the multiplier to be scaled easily. The Karatsuba-Ofman algorithm is defined as follows. Given two numbers A and B to multiply,

$$A = A_H \cdot 2^{m/2} + A_L, \qquad B = B_H \cdot 2^{m/2} + B_L,$$

calculate U, M, and L as

$$U = A_H B_H$$
$$M = (A_H + A_L)(B_L + B_H) = A_H B_H + A_L B_H + A_H B_L + A_L B_L$$
$$L = A_L B_L$$

where addition is over GF(2), i.e., bitwise XOR, and multiplication by 2^{m/2} denotes a shift by m/2 bit positions. Using the relation A_L B_H + A_H B_L = U + M + L, the result C is

$$C = 2^m U + 2^{m/2}(U + M + L) + L.$$

A first-level decomposition with the Karatsuba-Ofman algorithm is shown in Figure 4.3.
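A recursive software sketch of this decomposition for polynomials over GF(2), with Python integers as bit vectors, is given below. Recursion stops at an assumed threshold and falls back to the classical (schoolbook) carry-less product, which is how we read the hybrid structure used in this project; the threshold value and the handling of odd m (splitting at h = ⌈m/2⌉, so the shift 2^m in C becomes 2^{2h}) are our choices, not the paper's. The output is the unreduced product of degree up to 2m − 2; reduction modulo p(x) follows as in (8).

```python
THRESHOLD = 16  # assumed operand size below which the classical algorithm is used

def clmul_classical(a: int, b: int) -> int:
    """Schoolbook carry-less product of two GF(2) polynomials."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def karatsuba(a: int, b: int, m: int) -> int:
    """Carry-less product of a and b, both of degree < m, via Karatsuba-Ofman."""
    if m <= THRESHOLD:
        return clmul_classical(a, b)
    h = (m + 1) // 2                           # size of the low halves A_L, B_L
    mask = (1 << h) - 1
    a_l, a_h = a & mask, a >> h
    b_l, b_h = b & mask, b >> h
    u = karatsuba(a_h, b_h, m - h)             # U = A_H * B_H
    l = karatsuba(a_l, b_l, h)                 # L = A_L * B_L
    mid = karatsuba(a_h ^ a_l, b_l ^ b_h, h)   # M = (A_H + A_L)(B_L + B_H)
    # C = 2^(2h) U + 2^h (U + M + L) + L, with + as XOR (2^(2h) = 2^m for even m)
    return (u << (2 * h)) ^ ((u ^ mid ^ l) << h) ^ l
```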

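[Figure 4.3 diagram: XOR arrays form A_H + A_L and B_L + B_H; three KM(m/2) sub-multipliers compute U, M, and L; a final XOR array forms UML = U + M + L, and the output is C = U · 2^m + UML · 2^{m/2} + L]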

Figure 4.3: Karatsuba-Ofman Multiplier

For this project, a hybrid Karatsuba multiplier [7] was implemented. This increases the area complexity but reduces the critical path. The total area of a hybrid Karatsuba multiplier is O((m/F)^2) AND gates and O((m/F)^2) XOR gates, where F is the folding factor, and its critical path is (log_2(m/F) + 1) × T_XOR.

4.4: LSB First Divider
The next functional unit is the divider. The most promising divider units are based on one of the extended GCD algorithms, and typical implementations of these algorithms are folded systolic arrays. The LSB first divider [16] is a systolic array derived from the extended binary GCD algorithm. The LSB first bit-serial divider (Figure 4.4) has an area that scales as O(2m) and a latency of 5m − 4 clock cycles, i.e., O(m). Its critical path is determined by the processing element (PE): each PE has a critical path of T_AND + 2T_XOR.

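[Figure 4.4 diagram (example for m = 3): a linear array of processing elements (PE) fed with the reduction polynomial bits p_3 p_2 p_1 and the operand bits b_2 b_1 b_0 and a_2 a_1 a_0, producing the output bits c_2 c_1 c_0]
Figure 4.4: LSB First Divider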
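The extended binary GCD algorithm underlying the LSB first divider can be sketched in software as follows. It is the textbook binary inversion algorithm for GF(2^m) with g1 initialized to the dividend, so that it returns a quotient rather than an inverse. Each step only examines least significant bits (whether x divides u, v, or g1), which is what makes an LSB-first systolic mapping natural. This is a behavioral sketch under that assumption, not the cell-level design of [16]; the names are illustrative, and a small multiplier is included only to check results.

```python
# NIST GF(2^163) reduction polynomial, used in the checks
P163 = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1

def gf2m_divide(b: int, a: int, poly: int) -> int:
    """Return b / a in GF(2)[x]/poly via the binary extended GCD algorithm.

    Invariants: a*g1 = b*u and a*g2 = b*v (mod poly); the loop ends when u or v is 1.
    Operands are assumed to be reduced (degree < deg(poly)).
    """
    if a == 0:
        raise ZeroDivisionError("division by zero in GF(2^m)")
    u, v, g1, g2 = a, poly, b, 0
    while u != 1 and v != 1:
        while (u & 1) == 0:                   # x divides u: divide u and g1 by x
            u >>= 1
            g1 = g1 >> 1 if (g1 & 1) == 0 else (g1 ^ poly) >> 1
        while (v & 1) == 0:                   # x divides v: divide v and g2 by x
            v >>= 1
            g2 = g2 >> 1 if (g2 & 1) == 0 else (g2 ^ poly) >> 1
        if u.bit_length() > v.bit_length():   # deg(u) > deg(v)
            u ^= v
            g1 ^= g2
        else:
            v ^= u
            g2 ^= g1
    return g1 if u == 1 else g2

def gf2m_mul(a: int, b: int, poly: int) -> int:
    """Shift-and-add multiplication modulo poly (used only to check the divider)."""
    m = poly.bit_length() - 1
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> m) & 1:
            a ^= poly
    return r
```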

5: Algorithm Analysis
5.1: MSB Scalar Point Multiplication
The MSB scalar point multiplication with the scalar in non-adjacent form (NAF) has a running time of

$$T_{TOTAL} = T_{Precomputations} + O(m) \times (T_{Point\,Doubling} + T_{Compare\,and\,Branch}) + O\!\left(\frac{m}{3}\right) \times (T_{Point\,Addition} + T_{Compare\,and\,Branch}) + T_{Postcomputations}$$

where the O(m) term comes from the point doubling performed in every iteration, and the O(m/3) term comes from the NAF encoding, in which on average one digit in three is non-zero.

5.2: Affine Coordinates
Performing scalar point multiplication in affine coordinates requires three arithmetic units: an adder, a multiplier, and a divider. The divider can be replaced by the multiplier using Fermat's Little Theorem; however, the computation time for one scalar point multiplication then grows as O(m^3) instead of O(m^2) with a divider, where m is the degree of the field GF(2^m). The average running time of our MSB scalar point multiplication in affine coordinates, with the scalar in NAF, is

$$T_{TOTAL} = 1T_{ADD} + 1 + 2 + O(m) \times (1T_{DIV} + 3T_{MLT} + 2T_{SQR} + 4T_{ADD} + 2) + O\!\left(\frac{m}{3}\right) \times (1T_{DIV} + 1T_{MLT} + 1T_{SQR} + 8T_{ADD} + 1) + 1$$

5.3: Projective Coordinates
Projective coordinates are one method for eliminating divisions from the main loop of the scalar point multiplication algorithm. The average running time of our MSB scalar point multiplication in projective coordinates, with the scalar in NAF, is

$$T_{TOTAL} = 1T_{ADD} + 1 + 2 + O(m) \times (7T_{MLT} + 5T_{SQR} + 4T_{ADD} + 2) + O\!\left(\frac{m}{3}\right) \times (13T_{MLT} + 1T_{SQR} + 7T_{ADD} + 1) + T_{CTA} + 1$$

There are three methods for converting projective coordinates back to affine coordinates, with cost T_CTA:

$$T_{CTA} = \begin{cases} 2T_{DIV}, & T_{DIV} < 2T_{MLT} \\ 1T_{DIV} + 2T_{MLT}, & 2T_{MLT} \le T_{DIV} \le T_{FLT} \\ (m-1)(T_{MLT} + T_{SQR}) + 2T_{MLT}, & T_{DIV} > T_{FLT} \end{cases}$$

Here T_FLT = (m − 1)(T_MLT + T_SQR) is the time for a division computed with Fermat's Little Theorem. When T_DIV < 2T_MLT, performing two divisions is fastest. When 2T_MLT ≤ T_DIV ≤ T_FLT, it is faster to perform a single division and reuse its result in two multiplications. Finally, when T_DIV > T_FLT, implementing division in software as repeated squarings and multiplications is more efficient.
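The cost model above can be turned into a small calculator. In this sketch, O(m) and O(m/3) are taken literally as m and m/3 iterations, an assumption that ignores the constant factors hidden by the O-notation, and the unit latencies T_ADD, T_MLT, T_SQR, and T_DIV (in clock cycles) are inputs supplied by the caller. The function names are illustrative.

```python
def t_cta(t_div: float, t_mlt: float, t_sqr: float, m: int):
    """Cost of converting projective coordinates back to affine, and the winning method."""
    t_flt = (m - 1) * (t_mlt + t_sqr)       # division via Fermat's Little Theorem
    if t_div < 2 * t_mlt:
        return 2 * t_div, "two divisions"
    if t_div <= t_flt:
        return t_div + 2 * t_mlt, "one division shared by two multiplications"
    return t_flt + 2 * t_mlt, "Fermat inversion plus two multiplications"

def affine_latency(m, t_add, t_mlt, t_sqr, t_div):
    """Section 5.2 model with O(m) -> m doublings and O(m/3) -> m/3 additions."""
    doubling = t_div + 3 * t_mlt + 2 * t_sqr + 4 * t_add + 2
    addition = t_div + t_mlt + t_sqr + 8 * t_add + 1
    return t_add + 1 + 2 + m * doubling + (m / 3) * addition + 1

def projective_latency(m, t_add, t_mlt, t_sqr, t_div):
    """Section 5.3 model, including the final conversion cost T_CTA."""
    doubling = 7 * t_mlt + 5 * t_sqr + 4 * t_add + 2
    addition = 13 * t_mlt + t_sqr + 7 * t_add + 1
    cta, _ = t_cta(t_div, t_mlt, t_sqr, m)
    return t_add + 1 + 2 + m * doubling + (m / 3) * addition + cta + 1
```

Comparing `affine_latency` and `projective_latency` for a candidate set of unit latencies indicates which coordinate system suits a given ALU, which is the kind of trade-off examined for the two designs in Section 6.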

6: Implementation
Designing the control logic for a complex crypto-accelerator is a challenge, but defining an instruction set to drive the control logic simplified the task considerably. Table 6.1 lists the assembly instructions. One advantage of using an instruction set was that the processor was easy to simulate, and it was easy to switch between affine and projective coordinate [22] code for the MSB scalar point multiplication.

Table 6.1: Instruction Set Architecture
| Type | Instruction | Description |
|---|---|---|
| Math | ADD | Addition |
| Math | MLT | Multiplication |
| Math | SQR | Square |
| Math | DIV | Division |
| Load | MOV | Move register |
| Load | LDR | Load control registers |
| Control | NOP | No operation |
| Control | BRB | Branch if not point addition |
| Control | CMP | Branch if not finished with scalar point multiplication |
| Control | CMV | Conditional move register |
| Control | HLT | Halt processor |

A generic processor, shown in Figure 6.1, was designed for the crypto-accelerator to control the various functional units.

[Figure 6.1 diagram: program counter (PC) with a +1 incrementer, instruction memory (IMEM), decoder, ALU, control unit, and data memory (DMEM), separated by pipeline registers]

Figure 6.1: Datapath

6.1: Design Configurations
Two processor descriptions with scalable cryptographic strength were written in VHDL and synthesized to an FPGA. Design IV's arithmetic logic unit (ALU) consists of the bit parallel adder, the LFSR multiplier, and the LSB first divider; this design is both scalable and reconfigurable. Design V's ALU consists of the bit parallel adder, the hybrid Karatsuba-Ofman multiplier, and the squarer. In Design V, hardwired reduction circuitry is shared between the multiplier and the squarer, so this design is scalable, but its reduction polynomials are hardwired and therefore not reconfigurable.

6.2: Analysis Results
For comparison, we analyze area consumption (Table 6.2), performance (Table 6.3), and critical path (Table 6.4) in terms of standard metrics, and combine them in Table 6.5. Design IV has smaller area requirements, but its performance is worse than that of Design V. Note also that Design IV benefits from using affine coordinates.

Table 6.2: Area Consumption Growth Rates
| Design | XOR | AND | MUX | GATES | MEM |
|---|---|---|---|---|---|
| IV | 11m − 4 | 18m − 8 | 23m − 11 | 52m − 23 | 42m − 18 |
| V | (1/4)m^2 + 2m − 1 | (1/4)m^2 | 2m − 1 | (1/2)m^2 + 4m − 1 | (3/2)m |

Table 6.3: Latency Growth Rates (cycles)
| Design | Affine | Projective | Best |
|---|---|---|---|
| IV | (37/3)m^2 + 9m + 5 | (50/3)m^2 + (26/3)m + 5 | Affine |
| V | 20m^2 + (74/3)m + 5 | (178/3)m + 5 | Projective |

Table 6.4: Critical Path Growth Rates
| Design | Critical Path | HW Unit |
|---|---|---|
| IV | T_AND + 2T_XOR | Divider |
| V | (1 + log_2(m/F)) × T_XOR + T_REDUCE | Multiplier |

Table 6.5: Design Comparisons
| Design | Cycles | Area | AT Product |
|---|---|---|---|
| IV | (37/3)m^2 + 9m + 5 | 52m − 23 | (1924/3)m^3 + (553/3)m^2 + 53m − 115 |
| V | (178/3)m + 5 | (1/2)m^2 + 4m − 1 | (89/3)m^3 + (1439/6)m^2 − (118/3)m − 5 |

Design V is comparable, in terms of area requirements and critical path, to a scaled-up model of the crypto-accelerator in [28]. Both crypto-accelerators have a cycle count growth of O(m^2) per scalar point multiplication. Design IV, however, uses a division unit, which has not previously been explored at this level of detail. In Design IV the cycle count of the scalar point multiplication grows as O(m^2), rather than O(m^3) as in [25] or O(m^2 log_2 m) as in [29]. These growth rates were obtained by assuming that multiplication is implemented by a digit serial multiplier requiring O(m) time.

6.3: Synthesized Results
Both designs support the following fields from the NIST recommended curves list [2]: GF(2^571), GF(2^409), GF(2^283), GF(2^233), and GF(2^163); as mentioned before, however, the two designs have different ALUs. Area was measured with Xilinx's synthesis tools, with the target device set to the Xilinx Virtex-II 8000 (XC2V8000) field programmable gate array. Critical path analysis was performed after synthesis with Xilinx's Timing Analyzer.

Table 6.6: Synthesis Results
| Design | LUTs | FFs | Critical Path |
|---|---|---|---|
| IV | 27355/93184 | 29234/93184 | 5.638 ns (control logic) |
| V | 47876/93184 | 4664/93184 | 4.674 ns (control logic) |
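As a consistency check on Tables 6.5 and 6.6, the sketch below assumes that one scalar point multiplication takes the Table 6.5 cycle count at a clock period equal to the Table 6.6 critical path, and evaluates it for GF(2^571). It also multiplies the cycle and area polynomials with exact fractions to confirm the AT products. Polynomials are coefficient lists, lowest degree first; the helper names are illustrative.

```python
from fractions import Fraction as Fr

def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists, lowest degree first."""
    r = [Fr(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def poly_eval(p, m):
    return sum(c * m**i for i, c in enumerate(p))

# Growth rates transcribed from Table 6.5 (coefficients of m^0, m^1, ...)
CYCLES = {"IV": [Fr(5), Fr(9), Fr(37, 3)], "V": [Fr(5), Fr(178, 3)]}
AREA = {"IV": [Fr(-23), Fr(52)], "V": [Fr(-1), Fr(4), Fr(1, 2)]}
AT = {"IV": [Fr(-115), Fr(53), Fr(553, 3), Fr(1924, 3)],
      "V": [Fr(-5), Fr(-118, 3), Fr(1439, 6), Fr(89, 3)]}
# Critical paths from Table 6.6 in nanoseconds, assumed to set the clock period
CRITICAL_PATH_NS = {"IV": 5.638, "V": 4.674}

def scalar_mults_per_second(design: str, m: int) -> float:
    """Scalar point multiplications per second (kP/s) under the clock assumption above."""
    cycles = poly_eval(CYCLES[design], m)
    return 1e9 / (CRITICAL_PATH_NS[design] * float(cycles))
```

Evaluating `scalar_mults_per_second` at m = 571 gives about 44 kP/s for Design IV and about 6314 kP/s for Design V, matching the figures quoted in the abstract.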

The results in Table 6.6 are not surprising: large arrays of exclusive-or gates can be implemented efficiently in field programmable gate arrays, so in both designs the critical path turned out to be the adder in the control logic rather than an arithmetic unit. There are two adders in the design, one in the control logic and another in the program counter. Implementing one-hot encoding for the control is therefore not worthwhile, because the critical path would then be limited by the program counter adder.

7: Conclusions
Our contribution consists of designing and synthesizing scalable and reconfigurable processors that implement scalar point multiplication for the approved elliptic curves over GF(2^m) specified by NIST [2]. Our results indicate that it is not sufficient to simply choose a hardware combination and assume that a given coordinate system is optimal. Divider units based on systolic architectures derived from one of the extended GCD algorithms show significant promise in terms of scalability, high clock rates, and lower latency compared with other divider implementations. This research shows that scalability is a simple addition to several elliptic curve crypto-accelerators, and that reconfigurability of the reduction polynomial comes at a significant cost in performance. The additional division unit allows the design to achieve an O(m^2) cycle count growth rate. Finally, we show that the AT product of our designs grows as O(m^3), which gives the designer a rough estimate of the area required to achieve a given performance.

This paper has provided a detailed overview of the design steps necessary to implement a reconfigurable elliptic curve crypto-accelerator. Several interesting research topics remain, such as how reconfigurability affects power consumption, the effect of the reduction polynomial on power consumption, and extremely low-area crypto-accelerators. Another important area besides power consumption is hardware/software co-design. Although the approach of this paper was to select hardware units and optimize the scalar point multiplication for the given architecture, it is just as important to go the other way. Only a thorough analysis of the various methods for performing the different finite field multiplications can lead to a true Area × Time product graph, from which one can simply pick a target performance and get a rough idea of the area required.

8: References
[1] The OpenSSL Toolkit for SSL, http://www.openssl.org.
[2] National Institute of Standards and Technology, Digital Signature Standard, FIPS Publication 186-2, February 2000.
[3] IEEE Standard Specifications for Public-Key Cryptography, IEEE Std 1363-2000.
[4] Sun Microsystems Crypto Research, http://www.research.sun.com/projects/crypto/.
[5] Certicom Corporation, http://www.certicom.com.
[6] The RSA Factoring Challenge, http://www.rsasecurity.com/rsalabs/challenges/factoring/.
[7] C. Grabbe, M. Bednara, J. Teich, J. von zur Gathen, and J. Shokrollahi, "FPGA Designs of Parallel High Performance GF(2^233) Multipliers," International Symposium on Circuits and Systems, ISCAS, vol. 2, pp. 268–271, 2003.
[8] L. Song and K. K. Parhi, "Efficient Finite Field Serial/Parallel Multiplication," Proceedings of International Conference on Application Specific Systems, Architectures and Processors, ASAP, pp. 72–82, August 19-21, 1996.
[9] J. Omura and J. Massey, "Computational Method and Apparatus for Finite Field Arithmetic," U.S. Patent Number 4,587,627, May 1986.
[10] C. K. Koc and B. Sunar, "Low-Complexity Bit-Parallel Canonical and Normal Basis Multipliers for a Class of Finite Fields," IEEE Transactions on Computers, vol. 47 no. 3, March 1998.
[11] C. K. Koc and B. Sunar, "An Efficient Optimal Normal Basis Type II Multiplier," IEEE Transactions on Computers, vol. 50 no. 1, January 2001.
[12] E. D. Mastrovito, "VLSI Designs for Multiplication over Finite Fields GF(2^m)," 6th International Conference on Applied Algebra, Algebraic Algorithms and Error-Correcting Codes AAECC-6, Lecture Notes in Computer Science 357, Springer-Verlag, pp. 297–309, 1989.
[13] A. Karatsuba and Y. Ofman, "Multiplication of Multidigit Numbers on Automata," Soviet Physics Doklady, vol. 7, pp. 595–596, 1963.
[14] J. Guo and C. Wang, "Bit-Serial Systolic Array Implementation of Euclid's Algorithm for Inversion and Division in GF(2^m)," IEEE Proceedings of Technical Papers, June 1997.
[15] A. J. Menezes, P. C. van Oorschot, and S. A. Vanstone, Handbook of Applied Cryptography. CRC Press, 1996.
[16] C. Kim, S. Kwon, C. P. Hong, and G. I. Nam, "Efficient Bit-Serial Systolic Array for Division over GF(2^m)," International Symposium on Circuits and Systems, ISCAS, vol. 2, pp. 252–255, 2003.
[17] R. Schroeppel, H. Orman, S. O'Malley, and O. Spatscheck, "Fast Key Exchange with Elliptic Curve Systems," Advances in Cryptology - CRYPTO'95, LNCS 973, Springer-Verlag, pp. 43–56, 1995.
[18] C. L. Wang and J. L. Lin, "A Systolic Architecture for Computing Inverses and Divisions in Finite Fields GF(2^m)," IEEE Transactions on Computers, vol. 42 no. 9, pp. 1141–1146, September 1993.
[19] E. Savas and C. K. Koc, "The Montgomery Modular Inverse Revisited," IEEE Transactions on Computers, vol. 49 no. 7, July 2000.
[20] R. Katti, "Speeding up Elliptic Cryptosystems using a new Signed Binary Representation for Integers," IEEE Proceedings of the Euromicro Symposium on Digital System Design (DSD02), 2002.
[21] J. Lopez and R. Dahab, "Fast Multiplication on Elliptic Curves over GF(2^m) without Precomputation," Cryptographic Hardware and Embedded Systems CHES99, LNCS 1717, pp. 316–327, 1999.
[22] H. Cohen, A. Miyaji, and T. Ono, "Efficient Elliptic Curve Exponentiation Using Mixed Coordinates," ASIACRYPT98, LNCS 1514, pp. 51–65, 1998.
[23] J. Lopez and R. Dahab, "An Overview of Elliptic Curve Cryptography," technical report, Institute of Computing, State University of Campinas, Brazil, May 22, 2000.
[24] C. Paar, "A New Architecture for a Parallel Finite Field Multiplier with Low Complexity Based on Composite Fields," IEEE Transactions on Computers, vol. 45 no. 7, pp. 856–861, July 1996.
[25] J. Wolkerstorfer and W. Bauer, "A PCI-Card for Accelerating Elliptic Curve Cryptography," Proceedings of Austrochip 2002, Graz, Austria, October 4, 2002.
[26] M. Ernst, S. Klupsch, O. Hauck, and S. Huss, "Rapid Prototyping for Hardware Accelerated Elliptic Curve Public-Key Cryptosystems," 12th International Workshop on Rapid System Prototyping, June 25-27, 2001.
[27] A. Moon, J. Park, and Y. Lee, "Fast VLSI Arithmetic Algorithms for High-Security Elliptic Curve Cryptographic Applications," IEEE Transactions on Consumer Electronics, vol. 47 no. 3, August 2001.
[28] N. Gura, S. Shantz, H. Eberle, S. Gupta, V. Gupta, D. Finchelstein, E. Goupy, and D. Stebila, "An End-to-End Systems Approach to Elliptic Curve Cryptography," Cryptographic Hardware and Embedded Systems CHES02, LNCS 2523, pp. 349–365, 2002.
[29] K. Leung, W. W. K. Ma, and P. Leong, "FPGA Implementation of a Microcoded Elliptic Curve Cryptographic Processor," IEEE Symposium on Field-Programmable Custom Computing Machines, April 17-19, 2000.
[30] S. Janssens, J. Thomas, W. Borremans, P. Gijsels, I. Verbauwhede, F. Vercauteren, B. Preneel, and J. Vandewalle, "Hardware/Software Co-Design of an Elliptic Curve Public-Key Cryptosystem," Proceedings IEEE Workshop on Signal Processing Systems SiPS-2001, pp. 209–216, September 2001.
[31] J. H. Silverman, The Arithmetic of Elliptic Curves. Springer-Verlag, 1986.
[32] R. J. McEliece, Finite Fields for Computer Scientists and Engineers. Kluwer Academic, 1987.
