Highly-Efficient and Secure Elliptic Curve Scalar Multiplication using the 4-GLV Method
Patrick Longa Microsoft Research
http://research.microsoft.com/en-us/people/plonga/
Joint work with Zhi Hu, Francesco Sica and Maozhi Xu
Outline
- ECC Basics
- The Gallant-Lambert-Vanstone (GLV) Method
- The Galbraith-Lin-Scott (GLS) Method and Extensions
- 4-GLV Method on GLS Curves
- GLV and Side-Channel Attacks
- GLV and the Twisted Edwards Model
- Efficient Field Arithmetic
- Efficient Point Arithmetic
- Experimental Results
- Conclusions
Patrick Longa
Efficient and Secure Elliptic Curve Scalar Multiplication
1 / 30
Elliptic Curve Scalar Multiplication
A (Weierstrass) elliptic curve over a field $K$ is given by
\[ E/K : y^2 + a_1 xy + a_3 y = x^3 + a_2 x^2 + a_4 x + a_6, \]
where $a_1, a_2, a_3, a_4, a_6 \in K$ and the discriminant $\Delta_E \neq 0$.
Given a point $P \in E(K)$ of prime order $n$ and an integer $k \in [1, n-1]$, elliptic curve scalar multiplication consists of computing $[k]P$. This operation is central to protocols based on elliptic curves.
In this talk, we focus on the variable-point scenario on curves over large prime characteristic fields to achieve:
- Highest performance possible
- Full protection against timing-type side-channel attacks
Implications also extend to other scenarios (e.g., fixed-point and double-scalar scenarios).
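As a point of reference for the rest of the talk, the operation $[k]P$ can be sketched with plain (unprotected, variable-time) double-and-add. The curve $y^2 = x^3 + 2x + 3$ over $\mathbb{F}_{97}$ and the point $(3, 6)$ are toy assumptions chosen for illustration; they do not come from the slides.

```python
# Toy parameters (assumptions for illustration only): y^2 = x^3 + 2x + 3 over F_97.
P_MOD, A, B = 97, 2, 3
O = None  # point at infinity


def on_curve(pt):
    if pt is O:
        return True
    x, y = pt
    return (y * y - (x * x * x + A * x + B)) % P_MOD == 0


def add(p1, p2):
    """Affine addition on the short Weierstrass curve (handles O and doubling)."""
    if p1 is O:
        return p2
    if p2 is O:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return O  # P + (-P); also covers doubling a point with y = 0
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1 % P_MOD, -1, P_MOD) % P_MOD
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def scalar_mul(k, pt):
    """Left-to-right double-and-add: one doubling per bit, one addition per set bit."""
    q = O
    for bit in bin(k)[2:]:
        q = add(q, q)
        if bit == "1":
            q = add(q, pt)
    return q


G = (3, 6)
```

The number of doublings equals the bit length of $k$; this is the cost that the GLV family of methods reduces by splitting $k$ into shorter sub-scalars.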
GLV Method
Given a point $P \in E(\mathbb{F}_q)$ of prime order $n$, an integer $k \in [1, n-1]$ and an efficiently computable endomorphism $\Phi$, the GLV method computes
\[ [k]P = [k_0]P + [k_1]\Phi(P), \]
where $\max(|k_0|, |k_1|) = O(\sqrt{n})$.
- Using simultaneous multi-scalar multiplication (a.k.a. the Strauss-Shamir trick), the number of doublings is cut in half.
- Drawback: requires curves with a small endomorphism ring when implemented over $\mathbb{F}_p$.
GLV Method Description
- $E/\mathbb{F}_p$ such that $\#E(\mathbb{F}_p) = hn$, with $h$ relatively small, and $P$ a point on the curve $E$ of prime order $n$.
- $\Phi$ a nontrivial endomorphism defined over $\mathbb{F}_p$ with characteristic polynomial $X^2 + rX + s$, where $\Delta = r^2 - 4s < 0$.
- $\Phi(P) = [\lambda]P$, where $\lambda \in [1, n-1]$ is a root of the characteristic polynomial of $\Phi$ modulo $n$.
- By solving a closest vector problem in a lattice, one can get values $k_0, k_1$ such that $k \equiv k_0 + k_1\lambda \pmod{n}$, or equivalently, $[k]P = [k_0]P + [k_1]\Phi(P)$.
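The closest-vector step can be sketched on integers alone. The example below assumes a toy group order $n = 10009$ and the characteristic polynomial $X^2 + 1$ (so $r = 0$, $s = 1$); it reduces the lattice $\{(a, b) : a + b\lambda \equiv 0 \pmod n\}$ with Lagrange-Gauss reduction and then applies Babai rounding. The function names and parameters are illustrative assumptions, not the decomposition code used for the reported results.

```python
from fractions import Fraction


def sqrt_minus_one(n):
    """Find lam with lam^2 = -1 (mod n), for a prime n = 1 (mod 4)."""
    for g in range(2, n):
        lam = pow(g, (n - 1) // 4, n)
        if lam * lam % n == n - 1:
            return lam
    raise ValueError("no square root of -1 found")


def norm2(v):
    return v[0] * v[0] + v[1] * v[1]


def short_basis(n, lam):
    """Lagrange-Gauss reduction of the lattice {(a, b) : a + b*lam = 0 (mod n)}."""
    u, v = (n, 0), (-lam, 1)
    if norm2(u) > norm2(v):
        u, v = v, u
    while True:
        m = round(Fraction(u[0] * v[0] + u[1] * v[1], norm2(u)))
        v = (v[0] - m * u[0], v[1] - m * u[1])
        if norm2(v) >= norm2(u):
            return u, v
        u, v = v, u


def decompose(k, n, lam):
    """Babai rounding: subtract from (k, 0) a nearby lattice vector."""
    u, v = short_basis(n, lam)
    det = u[0] * v[1] - u[1] * v[0]
    c1 = round(Fraction(k * v[1], det))
    c2 = round(Fraction(-k * u[1], det))
    k0 = k - c1 * u[0] - c2 * v[0]
    k1 = -c1 * u[1] - c2 * v[1]
    return k0, k1


n = 10009                      # toy prime group order (assumption)
lam = sqrt_minus_one(n)        # root of X^2 + 1 modulo n
k0, k1 = decompose(1234, n, lam)
```

Both sub-scalars come out with magnitude on the order of $\sqrt{n}$, which is what halves the number of doublings in the 2-dimensional method.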
d-dimensional GLV Scalar Multiplication
Typical computation of $[k]P$:
1. Conversion of scalar $k$ to an efficient representation (e.g., wNAF)
2. Precomputation (if applicable)
3. Evaluation of $[k]P$ using a double-and-add algorithm
A slight variation for the case of the $d$-dimensional GLV method:
1. (If required) decomposition of $k$ to get smaller integers $k_i$
2. Conversion of scalars $k_i$ to an efficient representation
3. Precomputation (if applicable)
4. Evaluation of $[k]P = \sum_{i=0}^{d-1} [k_i]\psi_i(P)$ using interleaving, where the $\psi_i$ are $d$ endomorphism mappings depending on the GLV construction (slightly abusing notation by assuming $\psi_0(P) = P$)
d-dimensional GLV Scalar Multiplication
Again, let $[k]P = \sum_{i=0}^{d-1} [k_i]\psi_i(P)$ for $d$-dimensional GLV, assuming $\psi_0(P) = P$. Let $k_i = k_{(i,l-1)} 2^{l-1} + k_{(i,l-2)} 2^{l-2} + \cdots + k_{(i,0)}$ and let $[s]\psi_i(P)$ be $d$ sets of precomputed points for $i = 0$ to $d-1$, where $s \in \{1, 3, 5, \ldots, 2^{w-1} - 1\}$ for a certain window width $w$.
Simultaneous multi-scalar multiplication using double-and-add with interleaving is computed as follows:
INPUT: $k_i = (k_{(i,l-1)}, \ldots, k_{(i,0)})$ for $0 \le i < d$, point $P \in E(\mathbb{F}_p)$ of prime order $n$
OUTPUT: $Q = [k]P = \sum_{i=0}^{d-1} [k_i]\psi_i(P)$
1. $Q = O$
2. for $j = l-1$ downto $0$ do
3.    $Q = [2]Q$
4.    for $i = 0$ to $d-1$ do
5.       if $k_{(i,j)} \neq 0$, then $Q = Q + [k_{(i,j)}]\psi_i(P)$
6.    end for
7. end for
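The interleaving loop can be sketched independently of the curve arithmetic. The version below is a simplification under stated assumptions: it uses plain binary digits in $\{0, 1\}$ (no windowing or signed digits), and it is exercised on the additive group $\mathbb{Z}_n$ as a stand-in for $E(\mathbb{F}_p)$, with a hypothetical eigenvalue `lam` so that $\psi_i(P) = \lambda^i P$.

```python
def interleaved_mul(scalars, points, add, dbl, identity):
    """Evaluate sum_i [k_i] points[i] with one shared doubling per bit position."""
    length = max(k.bit_length() for k in scalars)
    q = identity
    for j in range(length - 1, -1, -1):
        q = dbl(q)                      # single doubling shared by all d scalars
        for k, pt in zip(scalars, points):
            if (k >> j) & 1:            # digit k_(i,j) is nonzero
                q = add(q, pt)
    return q


# Toy stand-in for E(F_p): the additive group Z_n, where [k]P = k*P mod n.
n = 10009
lam = 3                                 # hypothetical eigenvalue: psi_i(P) = lam^i * P
P = 5
points = [pow(lam, i, n) * P % n for i in range(4)]
scalars = [9, 14, 7, 11]                # already-decomposed sub-scalars k_0..k_3
Q = interleaved_mul(scalars, points,
                    lambda a, b: (a + b) % n,
                    lambda a: 2 * a % n,
                    0)
```

With $d = 4$ sub-scalars of about $l/4$ bits each, the outer loop runs roughly a quarter as many doublings as a single $l$-bit double-and-add.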
GLV Extensions
Use curves over $\mathbb{F}_{p^2}$ instead of $\mathbb{F}_p$ (Galbraith et al., Eurocrypt 2009):
- Galbraith-Lin-Scott, 2-dimensional GLV (GLS curves): use the Frobenius endomorphism $\Psi(x, y) = (x^p, y^p)$, satisfying $X^2 + 1 = 0$ in $E(\mathbb{F}_{p^2})$.
- Galbraith-Lin-Scott, 4-dimensional GLV (GLS curves, $\#\mathrm{Aut}(E) > 2$): use powers of the Frobenius endomorphism $\Psi(x, y) = (x^p, y^p)$ in $E(\mathbb{F}_{p^2})$. Studied by Hu-Longa-Xu.
- Sica-Longa, 4-dimensional GLV (GLV-GLS curves): combine Frobenius and $\Phi$ on GLV curves over $\mathbb{F}_{p^2}$.
GLS Curves using 2-GLV
2-GLV on GLS Curves
Galbraith-Lin-Scott in 2009:
- Let $E$ be an elliptic curve over $\mathbb{F}_p$ such that the quadratic twist $E'$ of $E(\mathbb{F}_{p^2})$ has an efficiently computable homomorphism $\Psi(x, y) \to (\alpha x, \beta y)$, $\Psi(P) = [\mu]P$.
- $\Psi$ arises from the $p$-power Frobenius map $\pi$ on $E$, that is, $\Psi = \psi\pi\psi^{-1}$, where $\psi : E \to E'$ is the twisting isomorphism.
- $\mu^2 + 1 \equiv 0 \pmod{n}$. Thus, $\Psi^2(P) + P = O$.
Remarkably, 2-dimensional GLV now applies to a large number of curves of different forms (e.g., Weierstrass and Twisted Edwards curves).
In settings where $k_i$ cannot be chosen randomly, Galbraith et al. showed that solving the closest vector problem in this case is very simple (no lattice reduction needed) and that $|k_i| \le (p + 1)/\sqrt{2}$.
2-GLV on GLS Curves: an Example
Let $E/\mathbb{F}_p : y^2 = x^3 + ax + b$ be a curve in short Weierstrass form. The quadratic twist of $E$ over $\mathbb{F}_{p^2}$ is given by the equation
\[ E'/\mathbb{F}_{p^2} : y^2 = x^3 + u^2 a x + u^3 b, \]
where $u$ is a non-square in $\mathbb{F}_{p^2}$, and $\#E'(\mathbb{F}_{p^2}) = (p - 1)^2 + t^2$, where $t$ is the trace of the Frobenius map.
The homomorphism is given by
\[ \Psi(x, y) = \left( \frac{u}{u^p}\, x^p,\ \sqrt{\frac{u^3}{u^{3p}}}\, y^p \right). \]
GLS Curves using 4-GLV
4-GLV on GLS Curves: Approach I
- Let $E/\mathbb{F}_p$ be an elliptic curve. If $P \in E(\mathbb{F}_{p^4})$, we have that $\Psi^4(P) - P = O$, and if $P \notin E(\mathbb{F}_{p^2})$ has large prime order $n$, we have $\Psi^2(P) + P = O$, with corresponding polynomial $X^2 + 1 = 0$.
- In particular, $\Psi$ satisfies $X^4 - X^2 + 1 = 0$, $-\Psi^2$ satisfies $X^2 + X + 1 = 0$ and $\Psi^3$ satisfies $X^2 + 1 = 0$.
- Assume the decomposition $k \equiv k_0 + k_1\mu + k_2\mu^2 + k_3\mu^3 \pmod{n}$, or equivalently, $[k]P = [k_0]P + [k_1]\Psi(P) + [k_2]\Psi^2(P) + [k_3]\Psi^3(P)$.
- In settings where $k_i$ cannot be chosen randomly, Hu-Longa-Xu showed how to decompose $k$ on a $j$-invariant 0 curve achieving the maximum bound $|k_i| \le 2\sqrt{2p}$.
4-GLV on GLS Curves: Approach II (GLV-GLS curves)
Extending work by Galbraith, Lin and Scott to the GLV setting over $\mathbb{F}_{p^2}$, using the $p$-power Frobenius endomorphism $\Psi$ and $\Phi$.
Theorem (4-GLV). If $E'$ is a quadratic twist of a GLV curve by a quadratic nonresidue of $\mathbb{F}_{p^2}$, then, assuming $P$ generates a large subgroup of prime order $n$ of $E'(\mathbb{F}_{p^2})$, given $k \in [1, n-1]$, we can find a decomposition
\[ [k]P = [k_0]P + [k_1]\Phi(P) + [k_2]\Psi(P) + [k_3]\Psi\Phi(P), \]
where $\max_i(|k_i|) < C_4\, n^{1/4}$ and $C_4 = 103\sqrt{1 + |r| + s}$.
4-GLV on GLS Curves: Approach II (GLV-GLS curves)
It is relatively easy to show a weaker form of (4-GLV) with a value of $C_4 = \Omega(s^{3/2})$.
However, our form of (4-GLV), with $C_4 = O(\sqrt{s})$, allows us to deduce that the relative improvement from (2-GLV) to (4-GLV) is at least
\[ \frac{\log\left(n^{1/2}\, \sqrt{s}/2\right)}{\log\left(103\, n^{1/4} \sqrt{1 + |r| + s}\right)}, \]
which is practically independent of the curve (true independence would be achieved if we could show that $C_4 = O(s^{1/4})$).
GLV-GLS Curves in Twisted Edwards Form
GLV-GLS using the Twisted Edwards Model
In Weierstrass form, $j$-invariant 0 and 1728 GLV curves are very efficient. However, several other GLV curves are not.
The idea: use the Twisted Edwards model (TEM) instead to make all of them highly efficient.
Example: GLV-GLS + TEM
Let $p > 3$ be a prime such that $-2$ is a quadratic residue modulo $p$. Let $u \in \mathbb{F}_{p^2}$ be a nonsquare in $\mathbb{F}_{p^2}$. The curve
\[ E'_3/\mathbb{F}_{p^2} : y^2 = x^3 - \frac{15}{2} u^2 x - 7u^3 \]
is isomorphic to the quadratic twist of the GLV curve $E_3/\mathbb{F}_p : y^2 = 4x^3 - 30x - 28$.
Then $E'_3/\mathbb{F}_{p^2}$ written in TEM form is given by $ax^2 + y^2 = 1 + dx^2y^2$, where $a = 27u^3(\sqrt{2}/2 - 1)$ and $d = -27u^3(\sqrt{2}/2 + 1)$.
Let $u = 1 + i \in \mathbb{F}_{p^2}$, with $i^2 = -1$, and $\zeta_8 = u/\sqrt{2}$, where $\zeta_8$ is a primitive 8th root of unity. After ensuring that $-a$ is a square in $\mathbb{F}_{p^2}$, use the map $(x, y) \mapsto (x/\sqrt{-a},\, y)$ to finally obtain
\[ E'_{T3}/\mathbb{F}_{p^2} : -x^2 + y^2 = 1 + d'x^2y^2, \]
with $d' = -d/a$, $a = 54(\zeta_8^3 - \zeta_8^2 + 1)$ and $d = -54(\zeta_8^3 + \zeta_8^2 - 1)$.
Example: GLV-GLS + TEM (Cont.)
Let m, s and a stand for the costs of multiplication, squaring and addition over $\mathbb{F}_{p^2}$.

| Operation | $E'_3/\mathbb{F}_{p^2}$ (Weierstrass): Coord. | Cost | $E'_{T3}/\mathbb{F}_{p^2}$ (Twisted Edwards): Coord. | Cost |
|---|---|---|---|---|
| DBL | Jacobian | 3m + 6s + 12a | Homogeneous | 4m + 3s + 5a |
| mADD | Mixed Jacobian/affine ($Z_1 = 1$) | 8m + 3s + 7a | Mixed extended homogeneous/homogeneous ($Z_1 = 1$) | 7m + 7a |
| ADD | Jacobian | 12m + 4s + 7a | Mixed extended homogeneous/homogeneous | 8m + 6a |

Table 1. Point Operation Costs for Curves in the Example.
A significant speed-up is expected when moving from $E'_3/\mathbb{F}_{p^2}$ to $E'_{T3}/\mathbb{F}_{p^2}$.
Example: GLV-GLS + TEM (Cont.)
Computations for $\Psi$ and $\Phi$ are also relatively inexpensive on $E'_{T3}/\mathbb{F}_{p^2}$. Let $\omega = \zeta_8 \sqrt{-a}^{\,(1-p)}$.
$\Phi(P) = \Phi(X_1, Y_1, Z_1) = (X_2, Y_2, Z_2, T_2)$, where $T_2 = X_2Y_2/Z_2$:
\[ X_2 = -X_1 \left(Y_1^2\alpha + Z_1^2\theta\right)\left(Y_1^2(\gamma + \phi) - Z_1^2\phi\right) \]
\[ Y_2 = 2Y_1Z_1^2 \left(Y_1^2\phi + Z_1^2(\gamma - \phi)\right) \]
\[ Z_2 = 2Y_1Z_1^2 \left(Y_1^2(\gamma + \phi) - Z_1^2\phi\right) \]
\[ T_2 = -X_1 \left(Y_1^2\alpha + Z_1^2\theta\right)\left(Y_1^2\phi + Z_1^2(\gamma - \phi)\right) \]
where $\alpha = \omega^3 + 2\omega^2 + \omega$, $\theta = \omega^3 - 2\omega^2 + \omega$, $\gamma = 2\omega^3$, $\phi = \omega^2 - 1$.
Cost: 12m + 2s + 5a, or 8m + 1s + 5a (if $Z_1 = 1$)
Example: GLV-GLS + TEM (Cont.)
With $\omega = \zeta_8 \sqrt{-a}^{\,(1-p)}$ as before:
$\Psi(P) = \Psi(X_1, Y_1, Z_1) = (X_2, Y_2, Z_2, T_2)$, where $T_2 = X_2Y_2/Z_2$:
\[ X_2 = \omega X_1^p Y_1^p \]
\[ Y_2 = (Z_1^p)^2 \]
\[ Z_2 = Y_1^p Z_1^p \]
\[ T_2 = \omega X_1^p Z_1^p \]
Cost (at most): 4m + 1s + 3a
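The three "a" operations in the cost of $\Psi$ are consistent with the $p$-th powers being cheap: when $\mathbb{F}_{p^2} = \mathbb{F}_p[i]/(i^2 + 1)$ with $p \equiv 3 \pmod 4$, raising to the power $p$ is just conjugation. The sketch below checks this on a toy prime $p = 103$ (an assumption for illustration; the talk uses 127-bit primes).

```python
P_MOD = 103  # toy prime with p = 3 (mod 4), so i^2 = -1 defines F_{p^2}


def fp2_mul(a, b):
    """Schoolbook multiplication of a0 + a1*i and b0 + b1*i."""
    a0, a1 = a
    b0, b1 = b
    return ((a0 * b0 - a1 * b1) % P_MOD, (a0 * b1 + a1 * b0) % P_MOD)


def fp2_pow(a, e):
    """Square-and-multiply exponentiation in F_{p^2}."""
    result = (1, 0)
    while e:
        if e & 1:
            result = fp2_mul(result, a)
        a = fp2_mul(a, a)
        e >>= 1
    return result


def frobenius(a):
    """x^p in F_{p^2}: conjugation, i.e. one negation in F_p."""
    return (a[0], -a[1] % P_MOD)
```

So each of $X_1^p$, $Y_1^p$, $Z_1^p$ costs about one field negation, and the multiplications by $\omega$ and between coordinates dominate.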
GLV and Side-Channel Attacks
GLV Method and Side-Channel Attacks
There are innumerable types of side-channel attacks in the literature. On server and desktop computers, the main risk is posed by timing attacks, cache attacks and variants.
Main approach: constant-time execution independent of the secret key. In particular:
- No secret-dependent conditional branches
- No secret-dependent table look-up accesses
GLV Method and Side-Channel Attacks
There are five key parts especially vulnerable in the computation of elliptic curve scalar multiplication:
- Modular inversion: compute $a^{-1} \bmod p$ as a regular-pattern exponentiation $a^{p-2} \bmod p$ using a short addition chain for $p - 2$.
- Reduction during field operations: exploit conditional move instructions with constant-time execution (e.g., cmov on x86 and x64 processors).
- Access to precomputed tables: run through the whole table and extract the required data by using conditional move instructions.
- Scalar recoding and exponentiation: use algorithms with regular-pattern execution.
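Two of these countermeasures can be sketched in a few lines. Python integers are not constant-time, so the code below only illustrates the access and operation pattern (scan every entry, combine with masks; invert with a fixed public exponent). The helper names and the 64-bit entry width are assumptions for illustration.

```python
MASK64 = (1 << 64) - 1


def ct_select(table, index):
    """Read every entry and keep the wanted one with a mask, instead of table[index].

    Entries are assumed to fit in 64 bits and indices to be small non-negative ints.
    """
    result = 0
    for j, entry in enumerate(table):
        diff = j ^ index
        is_match = ((diff - 1) >> 63) & 1   # 1 only when diff == 0
        mask = -is_match & MASK64           # all ones on a match, zero otherwise
        result |= entry & mask
    return result


def fermat_inverse(a, p):
    """a^(p-2) mod p for prime p; the exponent is public, so the chain is fixed."""
    return pow(a, p - 2, p)
```

In a real implementation the mask logic would map to conditional-move or bitwise instructions, and the exponentiation would follow a hand-picked short addition chain rather than a generic `pow`.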
GLV Method and Side-Channel Attacks
Scalar recoding and exponentiation: we need a regular-pattern representation with fixed length.
Adapting the Joye-Tunstall regular recoding to obtain fixed length:
INPUT: odd scalar $k$, dimension $d$ of the $l$-bit GLV scalar multiplication and window width $w$
OUTPUT: $(k_T, k_{T-1}, \ldots, k_0)$, where $k_t \in \{\pm 1, \pm 3, \pm 5, \ldots, \pm(2^{w-1} - 1)\}$
1. $T = \lceil l / (d \cdot (w - 1)) \rceil$
2. for $i = 0$ to $T - 1$ do
3.    $k_i = (k \bmod 2^w) - 2^{w-1}$
4.    $k = (k - k_i)/2^{w-1}$
5. end for
6. $k_T = k$
In scalar multiplication, it is easy to treat odd/even $k$ during initialization in constant time. This requires a constant-time final correction after the main computation.
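The recoding steps above translate almost line by line into code. The sketch below follows the algorithm as listed, taking the ceiling for $T$; the function name and the sample parameters ($l = 64$, $d = 4$, $w = 5$) are assumptions for illustration.

```python
def regular_recode(k, l, d, w):
    """Fixed-length regular recoding of an odd sub-scalar k.

    Returns digits [k_0, ..., k_T], least significant first, with
    k = sum(k_j * 2^((w-1)*j)) and every digit odd.
    """
    if k % 2 == 0:
        raise ValueError("k must be odd")
    T = -(-l // (d * (w - 1)))              # ceil(l / (d*(w-1)))
    digits = []
    for _ in range(T):
        ki = (k % (1 << w)) - (1 << (w - 1))
        digits.append(ki)
        k = (k - ki) >> (w - 1)             # exact division by 2^(w-1)
    digits.append(k)
    return digits


digits = regular_recode(0xBEEF, l=64, d=4, w=5)
```

Every digit is nonzero, so the evaluation loop performs an addition at every step regardless of the scalar value; that regularity is what removes the secret-dependent branch.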
GLV Method and Side-Channel Attacks
Scalar recoding and exponentiation. The modified algorithm:
- Executes in constant time
- Produces fixed-length representations for scalars $k_i$
- Produces regular representations for $k_i$ that enable regular-pattern double-and-add execution: $(w - 1)$ DBLs, $d$ ADDs, ..., $(w - 1)$ DBLs, $d$ ADDs, repeated $T$ times
INPUT: $k_i = (k_{(i,T)}, k_{(i,T-1)}, \ldots, k_{(i,0)})$ for $0 \le i < d$, point $P \in E(\mathbb{F}_p)$, window width $w$
OUTPUT: $Q = [k]P = \sum_{i=0}^{d-1} [k_i]\psi_i(P)$ (assuming $\psi_0(P) = P$ by abuse of notation)
1. $Q = [k_{(0,T)}]\psi_0(P) + \cdots + [k_{(d-1,T)}]\psi_{d-1}(P)$
2. for $j = T - 1$ downto $0$ do
3.    $Q = [2^{w-1}]Q$
4.    for $i = 0$ to $d - 1$ do
5.       $Q = Q + [k_{(i,j)}]\psi_i(P)$
6.    end for
7. end for
Some performance loss: nonzero density increases from $1/(w + 1)$ to $1/(w - 1)$.
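The fixed schedule can be sketched by combining the recoding with the evaluation loop. As before, the additive group $\mathbb{Z}_n$ with a hypothetical eigenvalue `lam` stands in for the curve and its endomorphisms; the multiplication `dg[j] * pt` stands in for a constant-time table lookup followed by one point addition. All names and parameters are assumptions for illustration.

```python
def recode(k, T, w):
    """Regular recoding of odd k < 2^(T*(w-1)) into T+1 odd signed digits."""
    digits = []
    for _ in range(T):
        ki = (k % (1 << w)) - (1 << (w - 1))
        digits.append(ki)
        k = (k - ki) >> (w - 1)
    return digits + [k]


def regular_mul(scalars, points, T, w, n):
    """Fixed schedule in the toy group Z_n: (w-1) doublings, then d additions, T times."""
    digits = [recode(k, T, w) for k in scalars]
    q = sum(dg[T] * pt for dg, pt in zip(digits, points)) % n
    for j in range(T - 1, -1, -1):
        for _ in range(w - 1):
            q = 2 * q % n                   # (w-1) DBLs
        for dg, pt in zip(digits, points):
            q = (q + dg[j] * pt) % n        # d ADDs, one per sub-scalar, never skipped
    return q


n, lam, P = 10009, 3, 5
points = [pow(lam, i, n) * P % n for i in range(4)]
scalars = [201, 77, 255, 13]                # odd sub-scalars below 2^(T*(w-1)) = 2^8
Q = regular_mul(scalars, points, T=2, w=5, n=n)
```

The sequence of group operations depends only on $T$, $w$ and $d$, never on the digit values.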
Multicore Execution and its Protection
In addition:
- GLV scalar multiplication is easy to parallelize.
- Previously described side-channel countermeasures can be extended to the multicore setting.
Efficient Field Arithmetic
Efficient Prime Forms
- For extreme performance, choose a Mersenne prime $p = 2^m - 1$. Reduction is extremely efficient (performed with a few adds, shifts and rotates).
- Sadly, Mersenne primes are very scarce. When not possible, choose a pseudo-Mersenne prime $p = 2^m - c$, where $c$ is "small" (i.e., $c < 2^w$, where $w$ is the computer word size). Reduction is still efficient.
- A few additional techniques can be applied and combined, e.g., incomplete reduction and lazy reduction.
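Why reduction modulo $p = 2^m - c$ is cheap can be seen from the identity $2^m \equiv c \pmod p$: the high part of a product is multiplied by the small constant $c$ and folded into the low part. The sketch below is a generic big-integer illustration (the function name is an assumption), not the word-level routine used in the implementation; it assumes $c < 2^{m-1}$.

```python
def reduce_pseudo_mersenne(x, m, c):
    """Reduce x >= 0 modulo p = 2^m - c by folding, using 2^m = c (mod p)."""
    p = (1 << m) - c
    mask = (1 << m) - 1
    while x >> m:                      # while x does not fit in m bits
        x = (x >> m) * c + (x & mask)  # hi * 2^m + lo  ->  hi * c + lo
    return x - p if x >= p else x      # final correction into [0, p-1]
```

For a double-length product, two folds and one conditional subtraction typically suffice, since each fold shrinks the high part by roughly $m - \log_2 c$ bits.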
Incomplete Reduction (IR)
Yanik-Savaş-Koç in 2002: given $a, b \in [0, p - 1]$, allow the result of a given operation to stay in the range $[0, 2^m - 1]$ instead of performing a complete reduction, where $p < 2^m < 2p - 1$.
Possible cases:
- If $p = 2^m - c$, where $c < 2^w$ and $m = n \cdot w$ ($n$: number of words, $w$: computer word size): reduction after addition $a + b$ consists of discarding the carry bit in the most significant word and then adding $c$ (since $2^m \equiv c \pmod{p}$).
- If $p = 2^m - c$, where $c < 2^w$ and $m = n \cdot w - z$ ($z$: small integer): reduction is slightly more expensive after addition $a + b$. However, a few additions may be accumulated without reduction.
Subtraction does not require IR (already optimal!). But other operations may benefit from IR: addition between completely reduced and incompletely reduced numbers, multiplication by a constant, division by a constant, ...
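The first case ($m = n \cdot w$) can be sketched with a small modulus. The parameters below, $p = 2^{16} - 15$ with 8-bit words, are toy assumptions chosen so that $m$ is a multiple of the word size; the talk's primes are much larger.

```python
M, C = 16, 15                 # toy parameters: p = 2^16 - 15, m = 2 words of w = 8 bits
P = (1 << M) - C
MASK = (1 << M) - 1


def add_ir(a, b):
    """Addition with incomplete reduction for a, b in [0, p-1].

    The carry out of the top word is discarded and c is added in its place;
    the result is congruent to a + b mod p and lies in [0, 2^m - 1].
    """
    s = a + b
    carry = s >> M
    return (s & MASK) + carry * C
```

No comparison against $p$ is needed, which is where the saving over a complete reduction comes from.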
Incomplete Reduction (IR)
Algorithm. Modular division by 2 with a pseudo-Mersenne prime
INPUT: integer $a \in [0, 2^m - 1]$, $p = 2^m - c$
OUTPUT: (a) $r = a/2 \pmod{p}$ or (b) $r \equiv a/2$ with $r \in [1, 2^m - 1]$
(a) Complete reduction
1. carry = 0
2. If a is odd
3.    For i from 0 to (n - 1) do
4.       (carry, r[i]) ← a[i] + p[i] + carry
5. For i from (n - 1) to 0 do
6.    (carry, r[i]) ← (carry, r[i])/2
7. borrow = 0
8. For i from 0 to (n - 1) do
9.    (borrow, R[i]) ← r[i] - p[i] - borrow
10. If borrow = 0
11.    r ← R
12. Return r
Steps 7 to 11 compute $r - p$, needed in case $r = (p + a)/2 \in [p, 2^m - (c + 1)/2]$.
(b) Incomplete reduction
1. carry = 0
2. If a is odd
3.    For i from 0 to (n - 1) do
4.       (carry, r[i]) ← a[i] + p[i] + carry
5. For i from (n - 1) to 0 do
6.    (carry, r[i]) ← (carry, r[i])/2
7. Return r
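Stripped of the word-by-word loops, both variants add $p$ when $a$ is odd and shift right by one; the complete version then subtracts $p$ if needed. The sketch below uses $p_3 = 2^{127} - 5997$ from the experimental setup and Python big integers; the function names are assumptions, and the final conditional in `half_complete` would be a masked subtraction in constant-time code.

```python
M, C = 127, 5997              # p3 = 2^127 - 5997, taken from the experimental setup
P = (1 << M) - C


def half_incomplete(a):
    """a/2 mod p for a in [0, 2^m - 1]; result stays in [0, 2^m - 1]."""
    odd_mask = -(a & 1)                    # 0 if a is even, -1 (all ones) if odd
    return (a + (P & odd_mask)) >> 1       # add p only when a is odd, then halve


def half_complete(a):
    """Same, followed by the conditional subtraction of p."""
    r = half_incomplete(a)
    return r - P if r >= P else r
```

The incomplete variant skips the second pass over the words entirely, which is the saving shown by the shorter step list (b).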
Lazy Reduction in $\mathbb{F}_{p^2}$
- A sum of products modulo $p$, $\sum \pm a_i b_i \bmod p$, can be reduced with only one reduction.
- Additions of inner products are accumulated as "double-precision" integers: $\sum \pm a_i b_i < 2^s$, where $s = t \cdot w$ (for efficiency) and $t$ is the number of words needed to represent a "double-precision" integer.
Example: multiplication in $\mathbb{F}_{p^2}$ ($c = a \times b$ using Karatsuba). Let $a = (a_0, a_1)$ and $b = (b_0, b_1) \in \mathbb{F}_{p^2}$:
\[ c_0 = (a_0 \times b_0) + \beta\,(a_1 \times b_1) \]
\[ c_1 = (a_0 + a_1) \times (b_0 + b_1) - a_0 \times b_0 - a_1 \times b_1 \]
Cost: 3Mu + 3R when each product is reduced; 3Mu + 2R with lazy reduction, i.e., a single reduction (rdcn) for each of $c_0$ and $c_1$.
* Mu and R stand for the cost of integer multiplication and reduction over $\mathbb{F}_p$, resp.
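Incomplete + Lazy Reduction in $\mathbb{F}_{p^2}$
Let $p = 2^{127} - 2569$, $w = 64$, $t = 4$, so that $2^s = 2^{256}$.
Let $a_0, a_1, b_0, b_1 \in [0, 2^{127} - 1]$, i.e., incompletely reduced numbers.
Let $\mathbb{F}_{p^2} = \mathbb{F}_p[i]/(i^2 + 1)$ (since $-1$ is a QNR in $\mathbb{F}_p$).
Multiplication in $\mathbb{F}_{p^2}$, $a \times b = (a_0, a_1) \times (b_0, b_1)$:
\[ c_0 = (a_0 \times b_0) - (a_1 \times b_1) \]
\[ c_1 = (a_0 + a_1) \times (b_0 + b_1) - a_0 \times b_0 - a_1 \times b_1 \]
Operation sequence and ranges:
1. $T_0 \leftarrow a_0 \times b_0$, in $[0, 2^{254}]$
2. $T_1 \leftarrow a_1 \times b_1$, in $[0, 2^{254}]$
3. $c_0 \leftarrow T_0 - T_1 \pmod{p}$; if $< 0$, correct by adding $2^{128} \cdot p$, giving $[0, 2^{255}]$; after rdcn, $[0, 2^{127} - 2569]$
4. $t_0 \leftarrow a_0 + a_1$, in $[0, 2^{128}]$, no rdcn
5. $t_1 \leftarrow b_0 + b_1$, in $[0, 2^{128}]$, no rdcn
6. $T_2 \leftarrow t_0 \times t_1$, in $[0, 2^{256}]$
7. $T_2 \leftarrow T_2 - T_0$, in $[0, 2^{256}]$, no correction to (+)
8. $c_1 \leftarrow T_2 - T_1 \pmod{p}$, in $[0, 2^{256}]$, no correction to (+); after rdcn, $[0, 2^{127} - 2569]$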
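The operation sequence above can be mirrored with Python integers to check that only two reductions are needed. The sketch uses $p = 2^{127} - 2569$ from the slide; `% p` stands in for the dedicated reduction routine, and the function name is an assumption.

```python
P = 2**127 - 2569             # prime from the slide; -1 is a non-residue since p = 3 (mod 4)


def fp2_mul_lazy(a, b):
    """Karatsuba multiplication in F_p[i]/(i^2 + 1) with one reduction per component.

    Inputs may be incompletely reduced, i.e. anywhere in [0, 2^127 - 1].
    """
    a0, a1 = a
    b0, b1 = b
    T0 = a0 * b0                          # unreduced double-length products
    T1 = a1 * b1
    c0 = T0 - T1
    if c0 < 0:
        c0 += P << 128                    # make non-negative by adding 2^128 * p
    T2 = (a0 + a1) * (b0 + b1)            # operands up to 2^128, product below 2^256
    T2 -= T0
    T2 -= T1                              # equals a0*b1 + a1*b0 >= 0, no correction
    return (c0 % P, T2 % P)               # the only two reductions
```

Three integer multiplications and two reductions are performed, matching the 3Mu + 2R count; the correction on `c0` would be a masked addition in constant-time code.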
Generalized Lazy Reduction on ECC
It is easy to adapt the technique by [Aranha-Karabina-Longa-Gebotys-López 2011] to ECC arithmetic with a few variations:
- "Delay" $\mathbb{F}_{p^2}$ reductions in point formulas.
E.g., let $T = (X_1, Y_1, Z_1) \in E'(\mathbb{F}_{p^2})$ be in Jacobian coordinates. To compute $2T = (X_2, Y_2, Z_2)$ [Longa-Gebotys 2010]:
\[ X_2 = A^2 - 2B \]
\[ Y_2 = A(B - X_2) - Y_1^4 \quad \text{(rdcn)} \]
\[ Z_2 = Y_1 Z_1 \]
where $A = 3(X_1 + Z_1^2)(X_1 - Z_1^2)/2$ and $B = X_1 Y_1^2$.
This formula costs 4mu + 4su + 9a + 7r (eliminating one $\mathbb{F}_{p^2}$ reduction).
* mu and su stand for the cost of multiplication and squaring without reduction over $\mathbb{F}_{p^2}$, resp.; a and r stand for the cost of addition and reduction over $\mathbb{F}_{p^2}$, resp.
Efficient Point Arithmetic
Jacobian Coordinates I
Faster formulas with a reduced number of "adds" (assuming add = sub = div2 = mul2).
Jacobian coordinates on the short Weierstrass curve $y^2 = x^3 - 3x + b$, with $(x, y) \mapsto (X/Z^2, Y/Z^3, 1)$ and $(X : Y : Z) = \{(\lambda^2 X, \lambda^3 Y, \lambda Z) : \lambda \in \mathbb{F}_p^*\}$:

| Operation | Cost | Source |
|---|---|---|
| DBL | 4M + 4S + 9A | [Longa 2010] |
| mDBLADD ($Z_2 = 1$) | 13M + 5S + 13A | [Longa 2007] |
| mADD ($Z_2 = 1$) | 8M + 3S + 7A | [Hankerson-Menezes-Vanstone 2004] |

Jacobian coordinates on the short Weierstrass curve $y^2 = x^3 + b$, with the same coordinate map and equivalence classes:

| Operation | Cost | Source |
|---|---|---|
| DBL | 3M + 4S + 7A | [Longa 2010] |
| mADD ($Z_2 = 1$) | 8M + 3S + 7A | [Hankerson-Menezes-Vanstone 2004] |

One may replace muls with sqrs if 1 mul > 1 sqr + 3 "adds", using the transformation $a \cdot b = [(a + b)^2 - a^2 - b^2]/2$, when $a^2$ and $b^2$ are known.
Jacobian Coordinates II
Minimizing costs:
- Trade additions for subtractions (or vice versa) by applying $\lambda = -1 \in \mathbb{F}_p^*$.
- Minimize constants and additions/subtractions by applying $\lambda = 2^{-1} \in \mathbb{F}_p^*$.
Example: $(X_2, Y_2, Z_2) \leftarrow 2(X_1, Y_1, Z_1)$ using Jacobian coordinates.
Original formula:
\[ A = 3(X_1 + Z_1^2)(X_1 - Z_1^2), \quad B = 4X_1Y_1^2 \]
\[ X_2 = A^2 - 2B, \quad Y_2 = A(B - X_2) - 8Y_1^4, \quad Z_2 = 2Y_1Z_1 \]
After applying $\lambda = 2^{-1}$:
\[ A = 3(X_1 + Z_1^2)(X_1 - Z_1^2)/2, \quad B = X_1Y_1^2 \]
\[ X_2 = A^2 - 2B, \quad Y_2 = A(B - X_2) - Y_1^4, \quad Z_2 = Y_1Z_1 \]
Several constants are eliminated.
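The rescaled doubling formula can be checked against the affine doubling law on a small curve. The curve $y^2 = x^3 - 3x + 36$ over $\mathbb{F}_{97}$ and the point $(5, 7)$ are toy assumptions for illustration; the code is written for clarity, not for the operation counts quoted above.

```python
P_MOD = 97
B_COEF = 36                   # toy curve y^2 = x^3 - 3x + 36 over F_97 (assumption)
INV2 = pow(2, -1, P_MOD)


def dbl_jacobian(X1, Y1, Z1):
    """Jacobian doubling for a = -3 with the lambda = 1/2 scaling applied."""
    z2 = Z1 * Z1 % P_MOD
    A = 3 * (X1 + z2) * (X1 - z2) * INV2 % P_MOD
    y2 = Y1 * Y1 % P_MOD
    B = X1 * y2 % P_MOD
    X2 = (A * A - 2 * B) % P_MOD
    Y2 = (A * (B - X2) - y2 * y2) % P_MOD
    Z2 = Y1 * Z1 % P_MOD
    return X2, Y2, Z2


def to_affine(X, Y, Z):
    zi = pow(Z, -1, P_MOD)
    return X * zi * zi % P_MOD, Y * zi * zi * zi % P_MOD


def dbl_affine(x, y):
    """Textbook affine doubling, used only as a reference."""
    lam = (3 * x * x - 3) * pow(2 * y % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - 2 * x) % P_MOD
    return x3, (lam * (x - x3) - y) % P_MOD
```

Both representatives of the same point, $(5, 7, 1)$ and $(45, 92, 3)$, should map to the same affine double, since scaling by $\lambda$ only changes the representative of the class $(X : Y : Z)$.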
Twisted Edwards Coordinates
Assuming again add = sub = div2 = mul2.
Mixed extended/homogeneous coordinates on the Twisted Edwards curve $ax^2 + y^2 = 1 + dx^2y^2$, with $(x, y) \mapsto (X/Z, Y/Z, 1, T/Z)$, $(X : Y : Z : T) = \{(\lambda X, \lambda Y, \lambda Z, \lambda T) : \lambda \in \mathbb{F}_p^*\}$ and $T = XY/Z$:

| Operation | Cost | Source |
|---|---|---|
| DBL | 4M + 3S + 6A | [Bernstein-Birkner-Joye-Lange-Peters 2008] |
| DBLADD | 12M + 3S + 11A | [Hisil-Wong-Carter-Dawson 2008] |

- For all these formulas, one may replace muls with sqrs if 1 mul > 1 sqr + 3 adds (however, that is not generally the case on many processors!).
- (In some cases) there are some additional ops when working on a GLS curve over $\mathbb{F}_{p^2}$ (operations with twisting parameter $u$).
Point Formulas: Summary

| Operation | Coord. | Curve | Cost ($\mathbb{F}_p$) | Cost (GLS method, $\mathbb{F}_{p^2}$) |
|---|---|---|---|---|
| DBL | Jacobian | Weierstrass $a = 0$ | 3M + 4S + 7A | same |
| DBL | Jacobian | Weierstrass $a = -3$ | 4M + 4S + 9A | 4m + 4s + 11a |
| mADD | Jacobian | Weierstrass $a = 0$, $a = -3$ | 8M + 3S + 7A | same |
| mDBLADD | Jacobian | Weierstrass $a = -3$ | 13M + 5S + 13A | same |
| DBLADD | Jacobian | Weierstrass $a = -3$ | 16M + 5S + 13A | same |
| DBL | Homogeneous | Twisted Edwards $a = -1$ | 4M + 3S + 5A | 4m + 3s + 7a |
| mDBLADD | Mixed extended/homogeneous | Twisted Edwards $a = -1$ | 11M + 3S + 11A | 12m + 3s + 16a |
| DBLADD | Mixed extended/homogeneous | Twisted Edwards $a = -1$ | 12M + 3S + 11A | 13m + 3s + 16a |

(1) Assuming that multiplying by $u$ costs about 2 adds.
(2) mXXX involves the use of mixed affine/projective coordinates, when $Z = 1$ for the point to be added.
Experimental Results
Setup I
For our experiments, we consider the five curves below: two GLV curves in Weierstrass form with and without nontrivial automorphisms, their corresponding GLV-GLS counterparts and one curve in Twisted Edwards form isomorphic to the GLV-GLS curve $E'_3$ (see below).
- GLV-GLS curve with $j$-invariant 0 in Weierstrass form $E'_1/\mathbb{F}_{p_1^2} : y^2 = x^3 + 9u$, where $p_1 = 2^{127} - 58309$ and $\#E'_1(\mathbb{F}_{p_1^2})$ is a 254-bit prime. We use $\mathbb{F}_{p_1^2} = \mathbb{F}_{p_1}[i]/(i^2 + 1)$ and $u = 1 + i \in \mathbb{F}_{p_1^2}$. We have that $\Phi^2 + \Phi + 1 = 0$ and $\Psi^2 + 1 = 0$.
- GLV curve with $j$-invariant 0 in Weierstrass form $E_2/\mathbb{F}_{p_2} : y^2 = x^3 + 2$, where $p_2 = 2^{256} - 11733$ and $\#E_2(\mathbb{F}_{p_2})$ is a 256-bit prime.
Setup II
- GLV-GLS curve in Weierstrass form $E'_3/\mathbb{F}_{p_3^2} : y^2 = x^3 - \frac{15}{2}u^2x - 7u^3$, where $p_3 = 2^{127} - 5997$ and $\#E'_3(\mathbb{F}_{p_3^2}) = 8r$, where $r$ is a 251-bit prime. We use $\mathbb{F}_{p_3^2} = \mathbb{F}_{p_3}[i]/(i^2 + 1)$ and $u = 1 + i \in \mathbb{F}_{p_3^2}$. We have that $\Phi^2 + 2 = 0$ and $\Psi^2 + 1 = 0$.
- GLV-GLS curve in Twisted Edwards form $E'_{T3}/\mathbb{F}_{p_3^2} : -x^2 + y^2 = 1 + dx^2y^2$, where $p_3 = 2^{127} - 5997$, $d = 170141183460469231731687303715884099728 + 116829086847165810221872975542241037773i$ and $\#E'_{T3}(\mathbb{F}_{p_3^2}) = 8r$, where $r$ is a 251-bit prime. We use again $\mathbb{F}_{p_3^2} = \mathbb{F}_{p_3}[i]/(i^2 + 1)$ and $u = 1 + i \in \mathbb{F}_{p_3^2}$. We have that $\Phi^2 + 2 = 0$ and $\Psi^2 + 1 = 0$. $E'_{T3}$ is isomorphic to curve $E'_3$ above.
- GLV curve $E_4/\mathbb{F}_{p_4} : y^2 = x^3 - \frac{15}{2}x - 7$, where $p_4 = 2^{256} - 45717$ and $\#E_4(\mathbb{F}_{p_4}) = 2r$, where $r$ is a 256-bit prime.
Results I: single-core, no protection
Operation count and performance of scalar multiplication (∼128 bits of security). Theoretical estimates and actual results based on tests on a single core of a 3.4GHz Intel Core i7-2600 (Sandy Bridge) processor.

| Curve | Method | Total Cost | Gain | Performance | Gain |
|---|---|---|---|---|---|
| $E'_1(\mathbb{F}_{p_1^2})$, Weierstrass | 4-GLV-GLS | 1209m | 51% | 99,000cc | 53% |
| $E_2(\mathbb{F}_{p_2})$, Weierstrass | 2-GLV | 2004M ≈ 1824m | - | 151,000cc | - |
| $E'_{T3}(\mathbb{F}_{p_3^2})$, Twisted Edwards | 4-GLV-GLS | 1117m | 97% | 91,000cc | 102% |
| $E'_3(\mathbb{F}_{p_3^2})$, Weierstrass | 4-GLV-GLS | 1468m | 50% | 121,000cc | 52% |
| $E_4(\mathbb{F}_{p_4})$, Weierstrass | 2-GLV | 2416M ≈ 2199m | - | 184,000cc | - |

- About 50% speed-up when moving from 2-GLV to 4-GLV-GLS.
- Twisted Edwards injects a further 30% speed-up to curve $E'_3$.
* m, s and a stand for the costs of multiplication, squaring and addition over $\mathbb{F}_{p^2}$, and M for the cost of multiplication over $\mathbb{F}_p$.
Results II: single and multi-core, unprotected and protected
Performance of scalar multiplication (∼128 bits of security). Results based on tests on a 3.4GHz Intel Core i7-2600 (Sandy Bridge) processor.

| Curve | Method | Protection | #Cores | Performance |
|---|---|---|---|---|
| $E'_{T3}(\mathbb{F}_{p_3^2})$, Twisted Edwards | 4-GLV-GLS | no | 1 | 91,000cc |
| $E'_{T3}(\mathbb{F}_{p_3^2})$, Twisted Edwards | 4-GLV-GLS | yes | 1 | 137,000cc |
| $E'_{T3}(\mathbb{F}_{p_3^2})$, Twisted Edwards | 4-GLV-GLS | no | multi | 61,000cc |
| $E'_{T3}(\mathbb{F}_{p_3^2})$, Twisted Edwards | 4-GLV-GLS | yes | multi | 78,000cc |
| $E'_1(\mathbb{F}_{p_1^2})$, Weierstrass | 4-GLV-GLS | no | 1 | 99,000cc |
| $E'_1(\mathbb{F}_{p_1^2})$, Weierstrass | 4-GLV-GLS | yes | 1 | 145,000cc |
| $E'_1(\mathbb{F}_{p_1^2})$, Weierstrass | 4-GLV-GLS | no | multi | 70,000cc |
| $E'_1(\mathbb{F}_{p_1^2})$, Weierstrass | 4-GLV-GLS | yes | multi | 89,000cc |
| $E'_1(\mathbb{F}_{p_1^2})$, Weierstrass | non-GLV | no | 1 | 201,000cc |
| $E_2(\mathbb{F}_{p_2})$, Weierstrass | 2-GLV | no | 1 | 151,000cc |
| $E_2(\mathbb{F}_{p_2})$, Weierstrass | 2-GLV | no | multi | 127,000cc |
Results III: single and multi-core, unprotected and protected
- 2x speed-up when moving from non-GLV to 4-GLV-GLS on curve $E'_1$ (sequential/unprotected version)
- Up to 76% speed-up when using multicore execution (protected version)
- 46%-50% overhead for protecting sequential implementations
- Only ∼28% overhead for protecting multicore implementations
- As before, ∼50% speed-up when moving from 2-GLV to 4-GLV-GLS (curve $E'_1$)
- Four-core GLV-GLS is 1.81x faster than the standard two-core 2-GLV (curve $E'_1$)
- Twisted Edwards curve $E'_{T3}$ is 6%-15% faster than Weierstrass curve $E'_1$
Results III: New Speed Records
Our implementations using the new GLV-GLS curves have set new speed records for elliptic curves over large prime characteristic fields for several scenarios (x64 processors).
Unprotected versions:
- Sequential: 91,000 cycles (previous: Hu-Longa-Xu 2011, 122,000 cycles)
- Multicore: 61,000 cycles (no previous record)
Versions fully protected against timing-type side-channel attacks:
- Sequential: 137,000 cycles (previous: Bernstein et al. 2011, 194,000 cycles)
- Multicore: 78,000 cycles (no previous record)
* Figures on a 3.4GHz Intel Core i7-2600 (Sandy Bridge) processor.