Highly-Efficient and Secure Elliptic Curve Scalar Multiplication using the 4-GLV Method

Patrick Longa Microsoft Research

http://research.microsoft.com/en-us/people/plonga/

Joint work with Zhi Hu, Francesco Sica and Maozhi Xu

Outline

- ECC Basics
- The Gallant-Lambert-Vanstone (GLV) Method
- The Galbraith-Lin-Scott (GLS) Method and Extensions
- 4-GLV Method on GLS Curves
- GLV and Side-Channel Attacks
- GLV and the Twisted Edwards Model
- Efficient Field Arithmetic
- Efficient Point Arithmetic
- Experimental Results
- Conclusions

Patrick Longa

Efficient and Secure Elliptic Curve Scalar Multiplication


Elliptic Curve Scalar Multiplication

A (Weierstrass) elliptic curve over a field $K$ is given by

$$E/K : y^2 + a_1 xy + a_3 y = x^3 + a_2 x^2 + a_4 x + a_6$$

where $a_1, a_2, a_3, a_4, a_6 \in K$ and the discriminant $\Delta_E \neq 0$.

Given a point $P \in E(K)$ of prime order $n$ and an integer $k \in [1, n-1]$, elliptic curve scalar multiplication consists of computing $[k]P$.

- This operation is central to protocols based on elliptic curves.

In this talk, we focus on the variable-point scenario on curves over large prime characteristic fields to achieve:
- Highest performance possible
- Full protection against timing-type side-channel attacks

Implications also extend to other scenarios (e.g., fixed-point and double-scalar scenarios).


GLV Method

Given a point $P \in E(\mathbb{F}_q)$ of prime order $n$, an integer $k \in [1, n-1]$ and an efficiently computable endomorphism $\Phi$, the GLV method computes

$$[k]P = [k_0]P + [k_1]\Phi(P)$$

where $\max(|k_0|, |k_1|) = O(\sqrt{n})$.

- Using simultaneous multi-scalar multiplication (a.k.a. the Strauss-Shamir trick), the number of doublings is cut in half.
- Drawback: requires curves with a small endomorphism ring when implemented over $\mathbb{F}_p$.


GLV Method Description

- $E/\mathbb{F}_p$ s.t. $\#E(\mathbb{F}_p) = hn$, with $h$ relatively small, and $P$ a point on the curve $E$ of prime order $n$
- $\Phi$ a nontrivial endomorphism defined over $\mathbb{F}_p$ with characteristic polynomial $X^2 + rX + s$, where $\Delta = r^2 - 4s < 0$
- $\Phi(P) = \lambda P$, where $\lambda \in [1, n-1]$ is a root of the characteristic polynomial of $\Phi$ modulo $n$
- By solving a closest vector problem in a lattice, one can get values $k_0, k_1$ s.t. $k \equiv k_0 + k_1 \lambda \pmod{n}$, or equivalently, $[k]P = [k_0]P + [k_1]\Phi(P)$
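The decomposition step can be sketched in a few lines of Python. This is an illustrative sketch, not the speaker's implementation: it builds a lattice basis from the extended Euclidean algorithm (using the two remainders around $\sqrt{n}$, a simplification of the usual GLV basis choice) and applies Babai rounding. The modulus `n = 2**31 - 1` and the helper `cube_root_of_unity` are assumptions chosen only to give a small prime with a root of $X^2 + X + 1$; no curve arithmetic is involved.

```python
from fractions import Fraction

def glv_basis(n, lam):
    """Return two lattice vectors (a, b) satisfying a + b*lam ≡ 0 (mod n)."""
    r0, t0, r1, t1 = n, 0, lam, 1          # invariant: r ≡ t*lam (mod n)
    while r1 * r1 >= n:                    # stop at the first remainder below sqrt(n)
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1
        t0, t1 = t1, t0 - q * t1
    return (r1, -t1), (r0, -t0)

def glv_decompose(k, n, lam):
    """Find (k0, k1) with k ≡ k0 + k1*lam (mod n) by rounding (k, 0) to the lattice."""
    (a1, b1), (a2, b2) = glv_basis(n, lam)
    det = a1 * b2 - a2 * b1                # equals +n or -n
    c1 = round(Fraction(k * b2, det))      # rounded coordinates of (k, 0) in the basis
    c2 = round(Fraction(-k * b1, det))
    return k - c1 * a1 - c2 * a2, -c1 * b1 - c2 * b2

def cube_root_of_unity(n):
    """Hypothetical helper: a root of X^2 + X + 1 modulo a prime n ≡ 1 (mod 3)."""
    g = 2
    while pow(g, (n - 1) // 3, n) == 1:
        g += 1
    return pow(g, (n - 1) // 3, n)

n = 2**31 - 1                              # toy "group order" (assumption)
lam = cube_root_of_unity(n)
k0, k1 = glv_decompose(123456789, n, lam)
```

On a real curve, `n` would be the prime subgroup order and `lam` the eigenvalue of $\Phi$ on that subgroup; the congruence $k \equiv k_0 + k_1\lambda$ holds for any rounding, while the size of $k_0, k_1$ depends on how short the basis vectors are.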


d-dimensional GLV Scalar Multiplication

Typical computation of $[k]P$:
1. Conversion of the scalar $k$ to an efficient representation (e.g., wNAF)
2. Precomputation (if applicable)
3. Evaluation of $[k]P$ using a double-and-add algorithm

A slight variation for the case of the $d$-dimensional GLV method:
1. (If required) decomposition of $k$ to get smaller integers $k_i$
2. Conversion of the scalars $k_i$ to an efficient representation
3. Precomputation (if applicable)
4. Evaluation of $[k]P = \sum_{i=0}^{d-1} [k_i]\phi_i(P)$ using interleaving, where the $\phi_i$ are $d$ endomorphism mappings depending on the GLV construction (slightly abusing notation by assuming $\phi_0(P) = P$)


d-dimensional GLV Scalar Multiplication

Again, let $[k]P = \sum_{i=0}^{d-1} [k_i]\phi_i(P)$ for $d$-dimensional GLV, assuming $\phi_0(P) = P$. Let $k_i = k_{(i,l-1)} 2^{l-1} + k_{(i,l-2)} 2^{l-2} + \dots + k_{(i,0)}$ and let $[s]\phi_i(P)$ be $d$ sets of precomputed points for $i = 0$ to $d-1$, where $s \in \{1, 3, 5, \dots, 2^{w-1} - 1\}$ for a certain window width $w$. Simultaneous multi-scalar multiplication using double-and-add with interleaving is computed as follows:

INPUT: $k_i = (k_{(i,l-1)}, \dots, k_{(i,0)})$ for $0 \le i < d$, point $P \in E(\mathbb{F}_p)$ of prime order $n$
OUTPUT: $Q = [k]P = \sum_{i=0}^{d-1} [k_i]\phi_i(P)$

1. $Q = O$
2. for $j = l - 1$ downto 0 do
3.     $Q = [2]Q$
4.     for $i = 0$ to $d - 1$ do
5.         if $k_{(i,j)} \neq 0$ then $Q = Q + [k_{(i,j)}]\phi_i(P)$
6.     end for
7. end for
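The interleaving loop above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: it uses plain binary digits (no windowing or precomputed odd multiples), affine arithmetic on a toy curve $y^2 = x^3 + 2x + 3$ over $\mathbb{F}_{97}$ chosen only for illustration, and arbitrary points in place of the images $\phi_i(P)$. The function names are hypothetical.

```python
P_MOD, A = 97, 2   # toy curve y^2 = x^3 + 2x + 3 over F_97 (illustrative assumption)

def add(P, Q):
    """Affine point addition; None represents the point at infinity O."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None                                   # P + (-P) = O
    if P == Q:
        s = (3 * x1 * x1 + A) * pow(2 * y1 % P_MOD, -1, P_MOD) % P_MOD
    else:
        s = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (s * s - x1 - x2) % P_MOD
    return (x3, (s * (x1 - x3) - y1) % P_MOD)

def naive_mul(k, P):
    """Reference scalar multiplication by repeated addition (for checking only)."""
    Q = None
    for _ in range(k):
        Q = add(Q, P)
    return Q

def interleaved(scalars, points):
    """Compute sum(k_i * P_i) with one shared doubling per bit position."""
    l = max(k.bit_length() for k in scalars)
    Q = None
    for j in range(l - 1, -1, -1):
        Q = add(Q, Q)                                 # single doubling for all scalars
        for k, Pi in zip(scalars, points):
            if (k >> j) & 1:
                Q = add(Q, Pi)
    return Q

base = (3, 6)                                         # 6^2 = 36 = 27 + 6 + 3 (mod 97)
points = [base, add(base, base), naive_mul(5, base)]  # stand-ins for phi_i(P)
scalars = [13, 7, 22]
result = interleaved(scalars, points)
```

The design point is that the number of doublings is set by the longest sub-scalar, not by their sum, which is where the GLV saving comes from.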


GLV Extensions

Use curves over $\mathbb{F}_{p^2}$ instead of $\mathbb{F}_p$ (Galbraith et al., Eurocrypt 2009):

- Galbraith-Lin-Scott, 2-dimensional GLV (GLS curves): use the Frobenius endomorphism $\Psi(x, y) = (x^p, y^p)$, satisfying $X^2 + 1 = 0$ in $E(\mathbb{F}_{p^2})$.
- Galbraith-Lin-Scott, 4-dimensional GLV (GLS curves, $\#\mathrm{Aut}(E) > 2$): use powers of the Frobenius endomorphism $\Psi(x, y) = (x^p, y^p)$ in $E(\mathbb{F}_{p^2})$. Studied by Hu-Longa-Xu.
- Sica-Longa, 4-dimensional GLV (GLV-GLS curves): combine Frobenius and $\Phi$ on GLV curves over $\mathbb{F}_{p^2}$.


GLS Curves using 2-GLV

2-GLV on GLS Curves

Galbraith-Lin-Scott in 2009: Let $E$ be an elliptic curve over $\mathbb{F}_p$, s.t. the quadratic twist $E'$ of $E(\mathbb{F}_{p^2})$ has an efficiently computable homomorphism $\Psi : (x, y) \to (\alpha x, \beta y)$, $\Psi(P) = \mu P$.

- $\Psi$ arises from the $p$-power Frobenius map $\pi$ on $E'$, that is, $\Psi = \psi \pi \psi^{-1}$, where $\psi : E \to E'$ is the twisting isomorphism.
- $\mu^2 + 1 \equiv 0 \pmod{n}$. Thus, $\Psi^2(P) + P = O$.

Remarkably, 2-dimensional GLV now applies to a large number of curves of different forms (e.g., Weierstrass and Twisted Edwards curves).

- In settings where the $k_i$ cannot be chosen randomly, Galbraith et al. showed that solving the closest vector problem in this case is very simple (no lattice reduction needed) and that $|k_i| \le (p + 1)/\sqrt{2}$.


2-GLV on GLS Curves: an Example

Let $E/\mathbb{F}_p$ be a curve in short Weierstrass form. The quadratic twist of $E$ over $\mathbb{F}_{p^2}$ is given by the equation

$$E'/\mathbb{F}_{p^2} : y^2 = x^3 + u^2 a x + u^3 b$$

where $u$ is a non-square in $\mathbb{F}_{p^2}$, and $\#E'(\mathbb{F}_{p^2}) = (p - 1)^2 + t^2$, where $t$ is the trace of the Frobenius map.

The homomorphism is given by

$$\Psi(x, y) = \left( \frac{u}{u^p}\, x^p,\ \sqrt{\frac{u^3}{u^{3p}}}\; y^p \right)$$


GLS Curves using 4-GLV

4-GLV on GLS Curves: Approach I

Let $E/\mathbb{F}_p$ be an elliptic curve.

- If $P \in E(\mathbb{F}_{p^4})$, we have that $\Psi^4(P) - P = O$, and if $P \notin E(\mathbb{F}_{p^2})$ has large prime order $n$ we have $\Psi^2(P) + P = O$, with corresponding polynomial $X^2 + 1 = 0$.
- In particular, $\Psi$ satisfies $X^4 - X^2 + 1 = 0$, $-\Psi^2$ satisfies $X^2 + X + 1 = 0$ and $\Psi^3$ satisfies $X^2 + 1 = 0$.
- Assume the decomposition $k \equiv k_0 + k_1 \mu + k_2 \mu^2 + k_3 \mu^3 \pmod{n}$, or equivalently, $[k]P = [k_0]P + [k_1]\Psi(P) + [k_2]\Psi^2(P) + [k_3]\Psi^3(P)$.

In settings where the $k_i$ cannot be chosen randomly, Hu-Longa-Xu showed how to decompose $k$ on a $j$-invariant 0 curve achieving the maximum bound $|k_i| \le 2\sqrt{2p}$.


4-GLV on GLS Curves: Approach II (GLV-GLS curves)

Extending work by Galbraith, Lin and Scott to the GLV setting over $\mathbb{F}_{p^2}$ using the $p$-power Frobenius endomorphism $\Psi$ and $\Phi$.

Theorem (4-GLV). If $E'$ is a quadratic twist of a GLV curve by a quadratic nonresidue of $\mathbb{F}_{p^2}$, then, assuming $P$ generates a large subgroup of prime order $n$ of $E'(\mathbb{F}_{p^2})$, given $k \in [1, n-1]$ we can find a decomposition

$$[k]P = [k_0]P + [k_1]\Phi(P) + [k_2]\Psi(P) + [k_3]\Psi\Phi(P)$$

where $\max_i(|k_i|) < C_4\, n^{1/4}$ and $C_4 = 103\sqrt{1 + |r| + s}$.

4-GLV on GLS Curves: Approach II (GLV-GLS curves)

It is relatively easy to show a weaker form of (4-GLV) with a value of $C_4 = \Omega(s^{3/2})$.

However, our form of (4-GLV), with $C_4 = O(\sqrt{s})$, allows us to deduce that the relative improvement from (2-GLV) to (4-GLV) is at least

$$\frac{\log\left(n^{1/2}\sqrt{s}/2\right)}{\log\left(103\, n^{1/4}\sqrt{1 + |r| + s}\right)}$$

which is practically independent of the curve (true independence would be achieved if we could show that $C_4 = O(s^{1/4})$).

GLV-GLS Curves in Twisted Edwards Form

GLV-GLS using the Twisted Edwards Model

In Weierstrass form, $j$-invariant 0 and 1728 GLV curves are very efficient. However, several other GLV curves are not.

The idea: use the Twisted Edwards model (TEM) instead to make all of them highly efficient.


Example: GLV-GLS + TEM

Let $p > 3$ be a prime s.t. $-2$ is a quadratic residue modulo $p$. Let $u \in \mathbb{F}_{p^2}$ be a nonsquare in $\mathbb{F}_{p^2}$. The curve

$$E'_3/\mathbb{F}_{p^2} : y^2 = x^3 - \tfrac{15}{2} u^2 x - 7u^3$$

is isomorphic to the quadratic twist of the GLV curve $E_3/\mathbb{F}_p : y^2 = 4x^3 - 30x - 28$. Then, $E'_3/\mathbb{F}_{p^2}$ written in TEM form is given by

$$a x^2 + y^2 = 1 + d x^2 y^2, \quad a = 27u^3\left(\tfrac{\sqrt{2}}{2} - 1\right), \quad d = -27u^3\left(\tfrac{\sqrt{2}}{2} + 1\right).$$

Let $u = 1 + i \in \mathbb{F}_{p^2}$, with $i^2 = -1$, and $\zeta_8 = u/\sqrt{2}$, where $\zeta_8$ is a primitive 8th root of unity. After ensuring that $-a$ is a square in $\mathbb{F}_{p^2}$, use the change of variables $(x, y) \mapsto (x/\sqrt{-a},\ y)$ to finally obtain

$$E'_{T3}/\mathbb{F}_{p^2} : -x^2 + y^2 = 1 + d' x^2 y^2$$

with $d' = -d/a$, $a = 54(\zeta_8^3 - \zeta_8^2 + 1)$ and $d = -54(\zeta_8^3 + \zeta_8^2 - 1)$.


Example: GLV-GLS + TEM (Cont.)

Let m, s and a stand for the costs of multiplication, squaring and addition over $\mathbb{F}_{p^2}$.

| Operation | $E'_3(\mathbb{F}_{p^2})$ (Weierstrass): Coord. | Cost | $E'_{T3}(\mathbb{F}_{p^2})$ (Twisted Edwards): Coord. | Cost |
|---|---|---|---|---|
| DBL | Jacobian | 3m + 6s + 12a | Homogeneous | 4m + 3s + 5a |
| mADD | Mixed Jacobian/affine ($Z_1 = 1$) | 8m + 3s + 7a | Mixed extended homogeneous/homogeneous ($Z_1 = 1$) | 7m + 7a |
| ADD | Jacobian | 12m + 4s + 7a | Mixed extended homogeneous/homogeneous | 8m + 6a |

Table 1. Point Operation Costs for Curves in the Example.

- A significant speed-up is expected when moving from $E'_3(\mathbb{F}_{p^2})$ to $E'_{T3}(\mathbb{F}_{p^2})$.

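Example: GLV-GLS + TEM (Cont.)

Computations for $\Psi$ and $\Phi$ are also relatively inexpensive on $E'_{T3}(\mathbb{F}_{p^2})$. Let $\omega = \zeta_8 \left(\sqrt{-a}\right)^{(1-p)}$.

$\Phi(P) = \Phi(X_1, Y_1, Z_1) = (X_2, Y_2, Z_2, T_2)$, where $T_2 = X_2 Y_2 / Z_2$:

$$X_2 = -X_1 \left(\alpha Y_1^2 + \theta Z_1^2\right)\left((\gamma + \rho) Y_1^2 - \rho Z_1^2\right)$$
$$Y_2 = 2 Y_1 Z_1^2 \left(\rho Y_1^2 + (\gamma - \rho) Z_1^2\right)$$
$$Z_2 = 2 Y_1 Z_1^2 \left((\gamma + \rho) Y_1^2 - \rho Z_1^2\right)$$
$$T_2 = -X_1 \left(\alpha Y_1^2 + \theta Z_1^2\right)\left(\rho Y_1^2 + (\gamma - \rho) Z_1^2\right)$$

where $\alpha = \omega^3 + 2\omega^2 + \omega$, $\theta = \omega^3 - 2\omega^2 + \omega$, $\gamma = 2\omega^3$ and $\rho = \omega^2 - 1$.

Cost: 12m + 2s + 5a, or 8m + 1s + 5a if $Z_1 = 1$.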

Example: GLV-GLS + TEM (Cont.)

$\Psi(P) = \Psi(X_1, Y_1, Z_1) = (X_2, Y_2, Z_2, T_2)$, where $T_2 = X_2 Y_2 / Z_2$:

$$X_2 = \omega X_1^p Y_1^p$$
$$Y_2 = Z_1^{2p}$$
$$Z_2 = Y_1^p Z_1^p$$
$$T_2 = \omega X_1^p Z_1^p$$

Cost (at most): 4m + 1s + 3a.

GLV and Side-Channel Attacks

GLV Method and Side-Channel Attacks

There are innumerable types of side-channel attacks in the literature. On server and desktop computers, the main risk is posed by timing attacks, cache attacks and variants.

Main approach: constant-time execution independent of the secret key. In particular:
- No secret-dependent conditional branches
- No secret-dependent table look-up accesses

GLV Method and Side-Channel Attacks

There are five key parts especially vulnerable in the computation of elliptic curve scalar multiplication:

- Modular inversion: compute $a^{-1} \bmod p$ as a regular-pattern exponentiation $a^{p-2} \bmod p$ using a short addition chain for $p - 2$.
- Reduction during field operations: exploit conditional move instructions with constant-time execution (e.g., cmove on x86 and x64 processors).
- Access to precomputed tables: run through the whole table and extract the required data by using conditional move instructions.
- Scalar recoding and exponentiation: use algorithms with regular-pattern execution.
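The table-access countermeasure can be illustrated with a short Python sketch. It only shows the access pattern (every entry is read, and the wanted one is kept with an arithmetic mask instead of an index or a branch). Python integers give no real timing guarantees, so this is a model of the idea rather than a hardened implementation; the name `ct_select` is hypothetical.

```python
def ct_select(table, index):
    """Read every entry of `table` and keep only entry `index` via masking.

    Assumes 0 <= index < len(table) < 2**63 and non-negative integer entries.
    """
    result = 0
    for i, entry in enumerate(table):
        diff = i ^ index                    # 0 exactly when i == index
        is_equal = ((diff - 1) >> 63) & 1   # 1 if diff == 0, else 0 (no branch)
        mask = -is_equal                    # all-ones (-1) or 0
        result |= entry & mask              # keeps entry only when selected
    return result

table = [10, 20, 30, 40]
picked = ct_select(table, 2)
```

In a C or assembly implementation the same masking would typically be done per machine word, or replaced by a conditional move instruction as the slide suggests.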


GLV Method and Side-Channel Attacks

Scalar recoding and exponentiation: we need a regular-pattern representation with fixed length.

Adapting Joye-Tunstall regular recoding to obtain fixed length:

INPUT: odd scalar $k$, dimension $d$ of the $l$-bit GLV scalar multiplication and window width $w$
OUTPUT: $(k_T, k_{T-1}, \dots, k_0)$ where $k_t \in \{\pm 1, \pm 3, \pm 5, \dots, \pm(2^{w-1} - 1)\}$

1. $T = \lceil l / (d \cdot (w - 1)) \rceil$
2. for $i = 0$ to $T - 1$ do
3.     $k_i = (k \bmod 2^w) - 2^{w-1}$
4.     $k = (k - k_i) / 2^{w-1}$
5. end for
6. $k_T = k$

- In scalar multiplication, it is easy to treat odd/even $k$ during initialization in constant time. This requires a constant-time final correction after the main computation.
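As a sanity check on the recoding, here is a small Python sketch. It assumes the reading of the algorithm in which each step divides by $2^{w-1}$, and it takes the number of digits `T` as an explicit parameter (the caller is assumed to pick `T` so that $k < 2^{T(w-1)}$). The function name is hypothetical.

```python
def regular_recode(k, T, w):
    """Fixed-length odd-digit recoding: returns [k_0, ..., k_T] with
    k = sum(k_i * 2**(i*(w-1))) and every digit odd with |k_i| < 2**(w-1).

    Assumes k is odd, w >= 2 and k < 2**(T*(w-1)).
    """
    assert k % 2 == 1 and w >= 2
    digits = []
    for _ in range(T):
        ki = (k % (1 << w)) - (1 << (w - 1))   # odd digit in (-2^(w-1), 2^(w-1))
        digits.append(ki)
        k = (k - ki) >> (w - 1)                # exact division; k stays odd
    digits.append(k)                           # final digit k_T
    return digits

digits = regular_recode(13, 2, 3)              # 13 = 1 - 1*4 + 1*16
```

Because every digit is nonzero, the main loop performs an addition at every step, which is what gives the regular (w - 1) doublings, d additions pattern.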


GLV Method and Side-Channel Attacks

Scalar recoding and exponentiation. The modified algorithm:
- Executes in constant time
- Produces fixed-length representations for the scalars $k_i$
- Produces regular representations for the $k_i$ that enable regular-pattern double-and-add execution: $(w - 1)$ DBLs, $d$ ADDs, ..., $(w - 1)$ DBLs, $d$ ADDs, repeated $T$ times

INPUT: $k_i = (k_{(i,T)}, k_{(i,T-1)}, \dots, k_{(i,0)})$ for $0 \le i < d$, point $P \in E(\mathbb{F}_p)$, window $w$
OUTPUT: $Q = [k]P = \sum_{i=0}^{d-1} [k_i]\phi_i(P)$ (assuming $\phi_0(P) = P$ by abuse of notation)

1. $Q = [k_{(0,T)}]\phi_0(P) + \dots + [k_{(d-1,T)}]\phi_{d-1}(P)$
2. for $j = T - 1$ downto 0 do
3.     $Q = [2^{(w-1)}]Q$
4.     for $i = 0$ to $d - 1$ do
5.         $Q = Q + [k_{(i,j)}]\phi_i(P)$
6.     end for
7. end for

- Some performance loss: the nonzero density increases from $1/(w + 1)$ to $1/(w - 1)$.

Multicore Execution and its Protection

In addition:
- GLV scalar multiplication is easy to parallelize.
- The previously described side-channel countermeasures can be extended to the multicore setting.

Efficient Field Arithmetic

Efficient Prime Forms

For extreme performance, choose a Mersenne prime $p = 2^m - 1$:
- Reduction is extremely efficient (performed with a few adds, shifts and rotates)

Sadly, Mersenne primes are very scarce. When that is not possible, choose a pseudo-Mersenne prime $p = 2^m - c$, where $c$ is "small" (i.e., $c < 2^w$, where $w$ is the computer wordsize):
- Reduction is still efficient
- A few additional techniques can be applied and combined, e.g., incomplete reduction and lazy reduction
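To make the cost argument concrete, the following Python sketch reduces a double-length product modulo a pseudo-Mersenne prime by folding the high part back in with $2^m \equiv c \pmod{p}$. The parameters $m = 127$, $c = 5997$ match the prime $p_3$ used later in the talk; the function name and the loop structure are illustrative assumptions, and a word-level implementation would use a fixed number of folds instead of a data-dependent loop.

```python
M, C = 127, 5997
P = (1 << M) - C                 # pseudo-Mersenne prime p = 2^m - c
MASK = (1 << M) - 1

def reduce_pm(x):
    """Reduce a non-negative integer x modulo P using only shifts, masks,
    one small multiplication by C per fold, and a final subtraction."""
    while x >> M:                              # while x >= 2^m
        x = (x & MASK) + C * (x >> M)          # 2^m ≡ c (mod p): fold high part
    return x - P if x >= P else x              # x < 2^m < 2p, so one subtraction suffices

product = (P - 1) * (P - 2)                    # a typical double-length product
reduced = reduce_pm(product)
```

Each fold shrinks the operand by roughly $m - \log_2 c$ bits, so a product of two field elements needs only a couple of folds.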


Incomplete Reduction (IR)

Yanik-Savaş-Koç in 2002: Given $a, b \in [0, p - 1]$, allow the result of a given operation to stay in the range $[0, 2^m - 1]$ instead of performing a complete reduction, where $p < 2^m < 2p - 1$.

Possible cases:
- If $p = 2^m - c$, where $c < 2^w$ and $m = n \cdot w$ ($n$: number of words, $w$: computer wordsize): reduction after the addition $a + b$: discard the carry bit in the most significant word and then add $c$.
- If $p = 2^m - c$, where $c < 2^w$ and $m = n \cdot w - z$ ($z$: small integer): reduction is slightly more expensive after the addition $a + b$. However, a few additions may be accumulated without reduction.
- Subtraction does not require IR (already optimal!). But other operations may benefit from IR: addition between completely reduced and incompletely reduced numbers, multiplication by a constant, division by a constant, ...


Incomplete Reduction (IR)

Algorithm. Modular division by 2 with pseudo-Mersenne prime
INPUT: integer $a \in [0, 2^m - 1]$, $p = 2^m - c$
OUTPUT: (a) $r = a/2 \pmod{p}$ or (b) $a/2 \in [1, 2^m - 1]$

(a) Complete reduction
1. carry = 0
2. If a is odd
3.     For i from 0 to (n - 1) do
4.         (carry, r[i]) ← a[i] + p[i] + carry
5. For i from (n - 1) to 0 do
6.     (carry, r[i]) ← (carry, r[i])/2
7. borrow = 0
8. For i from 0 to (n - 1) do
9.     (borrow, R[i]) ← r[i] - p[i] - borrow
10. If borrow = 0
11.     r ← R
12. Return r

(b) Incomplete reduction
1. carry = 0
2. If a is odd
3.     For i from 0 to (n - 1) do
4.         (carry, r[i]) ← a[i] + p[i] + carry
5. For i from (n - 1) to 0 do
6.     (carry, r[i]) ← (carry, r[i])/2
7. Return r

Steps 7-11 of (a) compute $(r - p)$ in case $r = (p + a)/2 \in [p,\ 2^m - (c + 1)/2]$.
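The difference between the two variants is easy to see on Python integers. This sketch works on whole integers rather than word arrays, uses the prime $2^{127} - 5997$ from the experiments as an assumed example, and is not constant-time (the `if` on the parity would be replaced by masking in a protected implementation).

```python
M, C = 127, 5997
P = (1 << M) - C                      # pseudo-Mersenne prime p = 2^m - c

def div2_incomplete(a):
    """Halve a modulo P, leaving the result anywhere in [0, 2^m - 1]."""
    assert 0 <= a < (1 << M)
    if a & 1:
        a += P                        # make the value even without changing it mod P
    return a >> 1

def div2_complete(a):
    """Halve a modulo P and fully reduce the result into [0, P - 1]."""
    r = div2_incomplete(a)
    return r - P if r >= P else r     # the extra subtraction that IR skips

r_inc = div2_incomplete((1 << M) - 1)
r_full = div2_complete((1 << M) - 1)
```

The incomplete variant saves the trial subtraction by P, at the price of results that may exceed P - 1 and must be tolerated by later operations.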


Lazy Reduction in $\mathbb{F}_{p^2}$

A sum of products modulo $p$, $\sum \pm a_i b_i \bmod p$, can be reduced with only one reduction.

- Additions of inner products are accumulated as "double-precision" integers
- $\sum \pm a_i b_i < 2^s$, where $s = t \cdot w$ (for efficiency) and $t$ is the number of words used to represent a "double-precision" integer

Example: multiplication in $\mathbb{F}_{p^2}$ ($c = a \times b$ using Karatsuba). Let $a = (a_0, a_1)$ and $b = (b_0, b_1) \in \mathbb{F}_{p^2}$:

$$c_0 = (a_0 \times b_0) + \beta\, (a_1 \times b_1)$$
$$c_1 = (a_0 + a_1) \times (b_0 + b_1) - a_0 \times b_0 - a_1 \times b_1$$

Reducing after every multiplication costs 3Mu + 3R; applying a single reduction (rdcn) to each of $c_0$ and $c_1$ costs 3Mu + 2R.

* Mu and R stand for the cost of integer multiplication and reduction over $\mathbb{F}_p$, resp.
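The Karatsuba-with-lazy-reduction pattern can be checked with a few lines of Python. The sketch assumes $\beta = -1$ (that is, $\mathbb{F}_{p^2} = \mathbb{F}_p[i]/(i^2 + 1)$) and the prime $p = 2^{127} - 5997$ from the experiments; Python's `%` stands in for the single reduction applied to each output coordinate, and the function names are hypothetical.

```python
P = (1 << 127) - 5997     # assumed prime with -1 a non-residue, so i^2 = -1 works

def mul_fp2_lazy(a, b):
    """(a0 + a1*i) * (b0 + b1*i) with 3 integer multiplications and 2 reductions."""
    a0, a1 = a
    b0, b1 = b
    T0 = a0 * b0                      # unreduced "double-precision" products
    T1 = a1 * b1
    T2 = (a0 + a1) * (b0 + b1)
    c0 = (T0 - T1) % P                # reduction 1
    c1 = (T2 - T0 - T1) % P           # reduction 2
    return (c0, c1)

def mul_fp2_schoolbook(a, b):
    """Reference: 4 multiplications, used only to check the result."""
    a0, a1 = a
    b0, b1 = b
    return ((a0 * b0 - a1 * b1) % P, (a0 * b1 + a1 * b0) % P)

x = (123456789, P - 5)
y = (P - 1, 987654321)
z = mul_fp2_lazy(x, y)
```

In a fixed-width implementation the interesting part is bounding the unreduced intermediates so they fit in the chosen number of words, which Python's big integers hide.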


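Incomplete + Lazy Reduction in $\mathbb{F}_{p^2}$

- Let $p = 2^{127} - 2569$, $w = 64$, $t = 4$, so that $2^s = 2^{256}$
- Let $a_0, a_1, b_0, b_1 \in [0, 2^{127} - 1]$, i.e., incompletely reduced numbers
- Let $\mathbb{F}_{p^2} = \mathbb{F}_p[i]/(i^2 + 1)$ (since $-1$ is a QNR in $\mathbb{F}_p$)
- Multiplication in $\mathbb{F}_{p^2}$, $a \times b = (a_0, a_1) \times (b_0, b_1)$:

$$c_0 = (a_0 \times b_0) - (a_1 \times b_1)$$
$$c_1 = (a_0 + a_1) \times (b_0 + b_1) - a_0 \times b_0 - a_1 \times b_1$$

| Step | Range and notes |
|---|---|
| $T_0 \leftarrow a_0 \times b_0$ | $[0, 2^{254}]$ |
| $T_1 \leftarrow a_1 \times b_1$ | $[0, 2^{254}]$ |
| $c_0 \leftarrow T_0 - T_1 \pmod{p}$ | if $< 0$, correct by adding $2^{128} \cdot p$, giving $[0, 2^{255}]$; after rdcn, $[0, 2^{127} - 2569]$ |
| $t_0 \leftarrow a_0 + a_1$ | $[0, 2^{128}]$, no rdcn |
| $t_1 \leftarrow b_0 + b_1$ | $[0, 2^{128}]$, no rdcn |
| $T_2 \leftarrow t_0 \times t_1$ | $[0, 2^{256}]$ |
| $T_2 \leftarrow T_2 - T_0$ | $[0, 2^{256}]$, no correction to (+) |
| $c_1 \leftarrow T_2 - T_1 \pmod{p}$ | $[0, 2^{256}]$, no correction to (+); after rdcn, $[0, 2^{127} - 2569]$ |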


Generalized Lazy Reduction on ECC

It is easy to adapt the technique by [Aranha-Karabina-Longa-Gebotys-López 2011] to ECC arithmetic with a few variations:
- "Delay" $\mathbb{F}_{p^2}$ reductions in point formulas
- E.g., let $T = (X_1, Y_1, Z_1) \in E'(\mathbb{F}_{p^2})$ be in Jacobian coordinates. To compute $2T = (X_2, Y_2, Z_2)$ [Longa-Gebotys 2010]:

$$X_2 = A^2 - 2B$$
$$Y_2 = A(B - X_2) - Y_1^4 \quad \text{(single rdcn)}$$
$$Z_2 = Y_1 Z_1$$

where $A = 3(X_1 + Z_1^2)(X_1 - Z_1^2)/2$ and $B = X_1 Y_1^2$.

This formula costs 4mu + 4su + 9a + 7r (eliminating one $\mathbb{F}_{p^2}$ reduction).

* mu and su stand for the cost of multiplication and squaring without reduction over $\mathbb{F}_{p^2}$, resp.; a and r stand for the cost of addition and reduction over $\mathbb{F}_{p^2}$, resp.
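The doubling formula above is the textbook Jacobian doubling for $a = -3$ rescaled by $\lambda = 2^{-1}$, so both must give the same affine point. The Python sketch below checks this on a toy curve $y^2 = x^3 - 3x + 7$ over $\mathbb{F}_{97}$; the curve, the point $(2, 3)$ and the function names are assumptions made only for the check, and reductions are applied eagerly rather than lazily.

```python
P, A_COEF = 97, -3        # toy curve y^2 = x^3 - 3x + 7 over F_97 (assumption)

def to_affine(X, Y, Z):
    """Jacobian (X : Y : Z) -> affine (X/Z^2, Y/Z^3)."""
    zi = pow(Z, -1, P)
    return (X * zi * zi % P, Y * zi**3 % P)

def dbl_standard(X1, Y1, Z1):
    """Textbook Jacobian doubling for a = -3."""
    A = 3 * (X1 + Z1 * Z1) * (X1 - Z1 * Z1) % P
    B = 4 * X1 * Y1 * Y1 % P
    X2 = (A * A - 2 * B) % P
    Y2 = (A * (B - X2) - 8 * pow(Y1, 4, P)) % P
    return X2, Y2, 2 * Y1 * Z1 % P

def dbl_scaled(X1, Y1, Z1):
    """Same doubling with the representative rescaled by lambda = 1/2."""
    half = pow(2, -1, P)
    A = 3 * (X1 + Z1 * Z1) * (X1 - Z1 * Z1) * half % P
    B = X1 * Y1 * Y1 % P
    X2 = (A * A - 2 * B) % P
    Y2 = (A * (B - X2) - pow(Y1, 4, P)) % P
    return X2, Y2, Y1 * Z1 % P

def dbl_affine(x, y):
    """Affine doubling, used as the reference."""
    s = (3 * x * x + A_COEF) * pow(2 * y, -1, P) % P
    x3 = (s * s - 2 * x) % P
    return (x3, (s * (x - x3) - y) % P)

x, y, Z = 2, 3, 5                              # 3^2 = 9 = 8 - 6 + 7 (mod 97)
jac = (x * Z * Z % P, y * Z**3 % P, Z)         # same point with Z != 1
doubled = to_affine(*dbl_scaled(*jac))
```

The scaled version drops the constants 4, 8 and 2 at the cost of one halving in A, which is the trade the slide describes.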


Efficient Point Arithmetic

Jacobian Coordinates I

Faster formulas with a reduced number of "adds" (assuming add = sub = div2 = mul2):

- Jacobian coordinates on the short Weierstrass curve $y^2 = x^3 - 3x + b$:
  $(x, y) \mapsto (X/Z^2,\ Y/Z^3,\ 1)$, $(X : Y : Z) = \{(\lambda^2 X, \lambda^3 Y, \lambda Z) : \lambda \in \mathbb{F}_p^*\}$
  - DBL: 4M + 4S + 9A [Longa 2010]
  - mDBLADD ($Z_2 = 1$): 13M + 5S + 13A [Longa 2007]
  - mADD ($Z_2 = 1$): 8M + 3S + 7A [Hankerson-Menezes-Vanstone 2004]
- Jacobian coordinates on the short Weierstrass curve $y^2 = x^3 + b$:
  $(x, y) \mapsto (X/Z^2,\ Y/Z^3,\ 1)$, $(X : Y : Z) = \{(\lambda^2 X, \lambda^3 Y, \lambda Z) : \lambda \in \mathbb{F}_p^*\}$
  - DBL: 3M + 4S + 7A [Longa 2010]
  - mADD ($Z_2 = 1$): 8M + 3S + 7A [Hankerson-Menezes-Vanstone 2004]
- One may replace muls with sqrs if 1 mul > 1 sqr + 3 "adds", using the transformation $a \cdot b = [(a + b)^2 - a^2 - b^2]/2$ when $a^2$ and $b^2$ are known


Jacobian Coordinates II

Minimizing costs:
- Trade additions for subtractions (or vice versa) by applying $\lambda = -1 \in \mathbb{F}_p^*$
- Minimize constants and additions/subtractions by applying $\lambda = 2^{-1} \in \mathbb{F}_p^*$

Example: $(X_2, Y_2, Z_2) \leftarrow 2(X_1, Y_1, Z_1)$ using Jacobian coordinates.

Standard formula:

$$A = 3(X_1 + Z_1^2)(X_1 - Z_1^2), \quad B = 4X_1 Y_1^2$$
$$X_2 = A^2 - 2B, \quad Y_2 = A(B - X_2) - 8Y_1^4, \quad Z_2 = 2Y_1 Z_1$$

After applying $\lambda = 2^{-1}$:

$$A = 3(X_1 + Z_1^2)(X_1 - Z_1^2)/2, \quad B = X_1 Y_1^2$$
$$X_2 = A^2 - 2B, \quad Y_2 = A(B - X_2) - Y_1^4, \quad Z_2 = Y_1 Z_1$$

- Several constants are eliminated


Twisted Edwards Coordinates

Assuming again add = sub = div2 = mul2:

- Mixed extended/homogeneous coordinates on the Twisted Edwards curve $a x^2 + y^2 = 1 + d x^2 y^2$:
  $(x, y) \mapsto (X/Z,\ Y/Z,\ 1,\ T/Z)$, $(X : Y : Z : T) = \{(\lambda X, \lambda Y, \lambda Z, \lambda T) : \lambda \in \mathbb{F}_p^*\}$, $T = XY/Z$
  - DBL: 4M + 3S + 6A [Bernstein-Birkner-Joye-Lange-Peters 2008]
  - DBLADD: 12M + 3S + 11A [Hisil-Wong-Carter-Dawson 2008]
- For all these formulas, one may replace muls with sqrs if 1 mul > 1 sqr + 3 adds (however, that is not generally the case on many processors!)
- (In some cases) there are some additional operations when working on a GLS curve over $\mathbb{F}_{p^2}$ (operations with the twisting parameter $u$)


Point Formulas: Summary

| Operation | Coord. | Curve | Cost ($\mathbb{F}_p$) | Cost (GLS method, $\mathbb{F}_{p^2}$) |
|---|---|---|---|---|
| DBL | Jacobian | Weierstrass, $a = 0$ | 3M + 4S + 7A | same |
| DBL | Jacobian | Weierstrass, $a = -3$ | 4M + 4S + 9A | 4m + 4s + 11a |
| mADD | Jacobian | Weierstrass, $a = 0$, $a = -3$ | 8M + 3S + 7A | same |
| mDBLADD | Jacobian | Weierstrass, $a = -3$ | 13M + 5S + 13A | same |
| DBLADD | Jacobian | Weierstrass, $a = -3$ | 16M + 5S + 13A | same |
| DBL | Homogeneous | Twisted Edwards, $a = -1$ | 4M + 3S + 5A | 4m + 3s + 7a |
| mDBLADD | Mixed extended/homogeneous | Twisted Edwards, $a = -1$ | 11M + 3S + 11A | 12m + 3s + 16a |
| DBLADD | Mixed extended/homogeneous | Twisted Edwards, $a = -1$ | 12M + 3S + 11A | 13m + 3s + 16a |

(1) Assuming that multiplying by $u$ costs about 2 adds
(2) mXXX involves the use of mixed affine/projective coordinates, with $Z = 1$ for the point to be added

Experimental Results

Setup I For our experiments, we consider the five curves below: two GLV curves in Weierstrass form with and without nontrivial automorphisms, their corresponding GLV-GLS counterparts and one curve in Twisted Edwards form isomorphic to the GLV-GLS curve 𝐸′3 (see below).

 GLV-GLS curve with 𝑗-invariant 0 in Weierstrass form 𝐸′1 /𝔽𝑝 12 ∶ 𝑦 2 = 𝑥 3 + 9𝑢 ,

where 𝑝1 = 2127 − 58309 and #𝐸 ′1 (𝔽𝑝 12) is a 254-bit prime. We use 𝔽𝑝 12 = 𝔽𝑝1 𝑖 /(𝑖 2 + 1) and 𝑢 = 1 + 𝑖 ∈ 𝔽𝑝 12. We have that ©2 + © + 1 = 0 and ª2 + 1 = 0 .  GLV curve with 𝑗-invariant 0 in Weierstrass form 𝐸2 𝔽𝑝2 : 𝑦 2 = 𝑥 3 + 2 , where 𝑝2 = 2256 − 11733 and #𝐸2 𝔽𝑝2 is a 256-bit prime.


Setup II

• GLV-GLS curve in Weierstrass form $E'_3/\mathbb{F}_{p_3^2}: y^2 = x^3 - \frac{15}{2}u^2 x - 7u^3$, where $p_3 = 2^{127} - 5997$ and $\#E'_3(\mathbb{F}_{p_3^2}) = 8r$, where $r$ is a 251-bit prime. We use $\mathbb{F}_{p_3^2} = \mathbb{F}_{p_3}[i]/(i^2 + 1)$ and $u = 1 + i \in \mathbb{F}_{p_3^2}$. We have that $\Phi^2 + 2 = 0$ and $\Psi^2 + 1 = 0$.
• GLV-GLS curve in Twisted Edwards form $E'_{T3}/\mathbb{F}_{p_3^2}: -x^2 + y^2 = 1 + d x^2 y^2$, where $p_3 = 2^{127} - 5997$, $d = 170141183460469231731687303715884099728 + 116829086847165810221872975542241037773\,i$ and $\#E'_{T3}(\mathbb{F}_{p_3^2}) = 8r$, where $r$ is a 251-bit prime. We use again $\mathbb{F}_{p_3^2} = \mathbb{F}_{p_3}[i]/(i^2 + 1)$ and $u = 1 + i \in \mathbb{F}_{p_3^2}$. We have that $\Phi^2 + 2 = 0$ and $\Psi^2 + 1 = 0$. $E'_{T3}$ is isomorphic to curve $E'_3$ above.
• GLV curve $E_4/\mathbb{F}_{p_4}: y^2 = x^3 - \frac{15}{2}x - 7$, where $p_4 = 2^{256} - 45717$ and $\#E_4(\mathbb{F}_{p_4}) = 2r$, where $r$ is a 256-bit prime.
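As a sanity check on the Twisted Edwards parameters, the sketch below implements the textbook affine addition law for $-x^2 + y^2 = 1 + d x^2 y^2$ over $\mathbb{F}_{p_3^2}$ and applies it to small-order points that lie on any such curve: (0, 1), (0, −1) and (i, 0). The helper names are illustrative, and affine formulas with inversions are used only for clarity; the talk itself uses inversion-free projective formulas.

```python
P3 = 2**127 - 5997
# Curve constant d = d0 + d1*i, copied from the setup above.
D = (170141183460469231731687303715884099728,
     116829086847165810221872975542241037773)
ZERO, ONE = (0, 0), (1, 0)

def add(x, y):
    return ((x[0] + y[0]) % P3, (x[1] + y[1]) % P3)

def sub(x, y):
    return ((x[0] - y[0]) % P3, (x[1] - y[1]) % P3)

def mul(x, y):
    a, b = x
    c, d = y
    return ((a * c - b * d) % P3, (a * d + b * c) % P3)

def inv(x):
    # 1/(a + b*i) = (a - b*i)/(a^2 + b^2); needs Python 3.8+ for pow(.., -1, ..)
    a, b = x
    n = pow(a * a + b * b, -1, P3)
    return (a * n % P3, -b * n % P3)

def on_curve(pt):
    x, y = pt
    xx, yy = mul(x, x), mul(y, y)
    return sub(yy, xx) == add(ONE, mul(D, mul(xx, yy)))

def ted_add(p, q):
    # Unified affine addition for a = -1:
    # x3 = (x1*y2 + y1*x2)/(1 + t), y3 = (y1*y2 + x1*x2)/(1 - t),
    # with t = d*x1*x2*y1*y2.
    (x1, y1), (x2, y2) = p, q
    t = mul(D, mul(mul(x1, x2), mul(y1, y2)))
    x3 = mul(add(mul(x1, y2), mul(y1, x2)), inv(add(ONE, t)))
    y3 = mul(add(mul(y1, y2), mul(x1, x2)), inv(sub(ONE, t)))
    return (x3, y3)

O = (ZERO, ONE)              # neutral element (0, 1)
T2 = (ZERO, (P3 - 1, 0))     # (0, -1), order 2
T4 = ((0, 1), ZERO)          # (i, 0), order 4 since -i^2 = 1

assert all(on_curve(pt) for pt in (O, T2, T4))
assert ted_add(T4, T4) == T2
assert ted_add(T2, T2) == O
```

The presence of a point of order 4 is consistent with the cofactor 8 in the group order $8r$ stated above.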


Results I: single-core, no protection

Operation count and performance of scalar multiplication (∼128 bits of security). Theoretical estimates and actual results based on tests on a single core of a 3.4GHz Intel Core i7-2600 (Sandy Bridge) processor.

| Curve | Method | Total Cost | Gain | Performance | Gain |
|---|---|---|---|---|---|
| $E'_1(\mathbb{F}_{p_1^2})$, Weierstrass | 4-GLV-GLS | 1209m | 51% | 99,000cc | 53% |
| $E_2(\mathbb{F}_{p_2})$, Weierstrass | 2-GLV | 2004M ≈ 1824m | - | 151,000cc | - |
| $E'_{T3}(\mathbb{F}_{p_3^2})$, Twisted Edwards | 4-GLV-GLS | 1117m | 97% | 91,000cc | 102% |
| $E'_3(\mathbb{F}_{p_3^2})$, Weierstrass | 4-GLV-GLS | 1468m | 50% | 121,000cc | 52% |
| $E_4(\mathbb{F}_{p_4})$, Weierstrass | 2-GLV | 2416M ≈ 2199m | - | 184,000cc | - |

• About 50% speed-up when moving from 2-GLV to 4-GLV-GLS.
• Twisted Edwards injects a further 30% speed-up to curve $E'_3$.

* m, s and a stand for the costs of multiplication, squaring and addition over $\mathbb{F}_{p^2}$, and M for the cost of multiplication over $\mathbb{F}_p$.
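The gain columns can be recomputed from the table entries. In the sketch below, the pairing of each 4-GLV-GLS curve with its 2-GLV baseline ($E'_1$ against $E_2$; $E'_{T3}$ and $E'_3$ against $E_4$) is an inference from the numbers rather than something stated on the slide, and the dictionary keys are illustrative labels.

```python
# (total cost in F_{p^2} multiplications, cycles), copied from the table.
RESULTS = {
    "E'1 4-GLV-GLS": (1209, 99_000),
    "E2 2-GLV": (1824, 151_000),
    "E'T3 4-GLV-GLS": (1117, 91_000),
    "E'3 4-GLV-GLS": (1468, 121_000),
    "E4 2-GLV": (2199, 184_000),
}

def gain_percent(baseline, value):
    """Speed-up of `value` over `baseline`, in percent, rounded."""
    return round((baseline / value - 1) * 100)

def gains(curve, baseline):
    cost, cycles = RESULTS[curve]
    base_cost, base_cycles = RESULTS[baseline]
    return gain_percent(base_cost, cost), gain_percent(base_cycles, cycles)

for curve, baseline in [("E'1 4-GLV-GLS", "E2 2-GLV"),
                        ("E'T3 4-GLV-GLS", "E4 2-GLV"),
                        ("E'3 4-GLV-GLS", "E4 2-GLV")]:
    print(curve, "vs", baseline, gains(curve, baseline))
```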


Results II: single and multi-core, unprotected and protected

Performance of scalar multiplication (∼128 bits of security). Results based on tests on a 3.4GHz Intel Core i7-2600 (Sandy Bridge) processor.

| Curve | Method | Protection | #Cores | Performance |
|---|---|---|---|---|
| $E'_{T3}(\mathbb{F}_{p_3^2})$, Twisted Edwards | 4-GLV-GLS | no | 1 | 91,000cc |
| $E'_{T3}(\mathbb{F}_{p_3^2})$, Twisted Edwards | 4-GLV-GLS | yes | 1 | 137,000cc |
| $E'_{T3}(\mathbb{F}_{p_3^2})$, Twisted Edwards | 4-GLV-GLS | no | 4 | 61,000cc |
| $E'_{T3}(\mathbb{F}_{p_3^2})$, Twisted Edwards | 4-GLV-GLS | yes | 4 | 78,000cc |
| $E'_1(\mathbb{F}_{p_1^2})$, Weierstrass | 4-GLV-GLS | no | 1 | 99,000cc |
| $E'_1(\mathbb{F}_{p_1^2})$, Weierstrass | 4-GLV-GLS | yes | 1 | 145,000cc |
| $E'_1(\mathbb{F}_{p_1^2})$, Weierstrass | 4-GLV-GLS | no | 4 | 70,000cc |
| $E'_1(\mathbb{F}_{p_1^2})$, Weierstrass | 4-GLV-GLS | yes | 4 | 89,000cc |
| $E'_1(\mathbb{F}_{p_1^2})$, Weierstrass | non-GLV | no | 1 | 201,000cc |
| $E_2(\mathbb{F}_{p_2})$, Weierstrass | 2-GLV | no | 1 | 151,000cc |
| $E_2(\mathbb{F}_{p_2})$, Weierstrass | 2-GLV | no | 2 | 127,000cc |


Results III: single and multi-core, unprotected and protected

• 2x speed-up when moving from non-GLV to 4-GLV-GLS on curve $E'_1$ (sequential/unprotected version).
• Up to 76% speed-up when using multicore execution (protected version).
• 46%-50% overhead for protecting sequential implementations.
• Only ∼28% overhead for protecting multicore implementations.
• As before, ∼50% speed-up when moving from 2-GLV to 4-GLV-GLS (curve $E'_1$).
• Four-core GLV-GLS is 1.81x faster than the standard two-core 2-GLV (curve $E'_1$).
• Twisted Edwards curve $E'_{T3}$ is 6%-15% faster than Weierstrass curve $E'_1$.
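The ratios quoted above can be approximately reproduced from the rounded cycle counts of the Results II table. In this sketch, the assignment of each cycle count to a configuration (sequential or multicore, protected or not) follows the speed-record figures and the ratios themselves, so it should be read as an inference; because the inputs are rounded to the nearest thousand cycles, the results agree with the quoted percentages only to within about one point.

```python
# Cycle counts from the Results II table (configuration labels are inferred).
ET3 = {"seq": 91_000, "seq_prot": 137_000, "multi": 61_000, "multi_prot": 78_000}
E1 = {"seq": 99_000, "seq_prot": 145_000, "multi": 70_000, "multi_prot": 89_000}
E1_NON_GLV = 201_000
E2_TWO_CORE = 127_000

def ratio(slow, fast):
    """How many times faster `fast` is than `slow`."""
    return slow / fast

print(f"non-GLV vs 4-GLV-GLS (E'1): {ratio(E1_NON_GLV, E1['seq']):.2f}x")
print(f"multicore speed-up, protected (E'T3): {ratio(ET3['seq_prot'], ET3['multi_prot']):.2f}x")
for name, c in (("E'1", E1), ("E'T3", ET3)):
    print(f"{name} protection overhead, sequential: {ratio(c['seq_prot'], c['seq']):.2f}x")
    print(f"{name} protection overhead, multicore: {ratio(c['multi_prot'], c['multi']):.2f}x")
print(f"four-core 4-GLV-GLS vs two-core 2-GLV: {ratio(E2_TWO_CORE, E1['multi']):.2f}x")
```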


Results III: New Speed Records

Our implementations using the new GLV-GLS curves have set new speed records for elliptic curves over large prime characteristic fields for several scenarios (x64 processors).

• Unprotected versions:
  - Sequential: 91,000 cycles (previous: Hu-Longa-Xu 2011, 122,000 cycles).
  - Multicore: 61,000 cycles (no previous record).
• Versions fully protected against timing-type side-channel attacks:
  - Sequential: 137,000 cycles (previous: Bernstein et al. 2011, 194,000 cycles).
  - Multicore: 78,000 cycles (no previous record).

* Figures on a 3.4GHz Intel Core i7-2600 (Sandy Bridge) processor.


Highly-Efficient and Secure Elliptic Curve Scalar Multiplication using the 4-GLV Method

Q&A Patrick Longa Microsoft Research

http://research.microsoft.com/en-us/people/plonga/

Joint work with Zhi Hu, Francesco Sica and Maozhi Xu
