2. ) â 0. Elliptic Curve Cryptography (ECC): basics point doubling. 1. 1. ( , ) .... âConsider now the possibility that one in a million of all curves have an exploitable ...
NUMS Elliptic Curves and their Implementation Patrick Longa Microsoft Research Joppe Bos Craig Costello Michael Naehrig
NXP Semiconductors Microsoft Research Microsoft Research
Elliptic Curve Cryptography (ECC) (Independently) discovered by V. Miller and N. Koblitz in mid 80’s. Elliptic curves enable secure and efficient cryptographic systems, currently competing with RSA to become the next generation public-key system. ECC key size (bits)
RSA key size (bits)
Key size ratio
160
1024
1:6
256
3072
1:12
128
384
7680
1:20
192
512
15360
1:30
256
NIST Guidelines for Public Key Sizes
AES key size (bits)
Elliptic Curve Cryptography (ECC): basics (Focusing on the prime case) A (short Weierstrass) elliptic curve over a prime field 𝔽𝑝 is given by:
𝐸/𝔽𝑝 : 𝑦 2 = 𝑥 3 + 𝑎𝑥 + 𝑏 where 𝑎, 𝑏 ∈ 𝔽𝑝 and ∆ = −16(4𝑎3 + 27𝑏 2 ) ≠ 0.
Elliptic Curve Cryptography (ECC): basics (Focusing on the prime case) A (short Weierstrass) elliptic curve over a prime field 𝔽𝑝 is given by:
𝐸/𝔽𝑝 : 𝑦 2 = 𝑥 3 + 𝑎𝑥 + 𝑏 where 𝑎, 𝑏 ∈ 𝔽𝑝 and ∆ = −16(4𝑎3 + 27𝑏 2 ) ≠ 0.
𝑥, 𝑦 ∈ 𝔽𝑝 × 𝔽𝑝 : 𝑦 2 − 𝑥 3 − 𝑎𝑥 − 𝑏 = 0 ∪ {∞} is an abelian group with group operation (+) • Given points 𝑃 and 𝑄, any of the following cases can arise for 𝑅 = 𝑃 + 𝑄: •
𝑃=∞ ⇒𝑅=𝑄 𝑄=∞ ⇒𝑅=𝑃 𝑃 = −𝑄 ⇒ 𝑅 = ∞ 𝑃 = 𝑄 ⇒ 𝑅 = 2𝑃 (point doubling) otherwise 𝑅 = 𝑃 + 𝑄 (point addition)
Q ( x2 , y2 )
P ( x1, y1 )
y2 y1 x x 2 1 2 x3 x1 x2 y x x y 1 3 1 3
P Q ( x3 , y3 )
point addition
Elliptic Curve Cryptography (ECC): basics (Focusing on the prime case) A (short Weierstrass) elliptic curve over a prime field 𝔽𝑝 is given by:
𝐸/𝔽𝑝 : 𝑦 2 = 𝑥 3 + 𝑎𝑥 + 𝑏 where 𝑎, 𝑏 ∈ 𝔽𝑝 and ∆ = −16(4𝑎3 + 27𝑏 2 ) ≠ 0.
𝑥, 𝑦 ∈ 𝔽𝑝 × 𝔽𝑝 : 𝑦 2 − 𝑥 3 − 𝑎𝑥 − 𝑏 = 0 ∪ {∞} is an abelian group with group operation (+) • Given points 𝑃 and 𝑄, any of the following cases can arise for 𝑅 = 𝑃 + 𝑄: •
𝑃=∞ ⇒𝑅=𝑄 𝑄=∞ ⇒𝑅=𝑃 𝑃 = −𝑄 ⇒ 𝑅 = ∞ 𝑃 = 𝑄 ⇒ 𝑅 = 2𝑃 (point doubling) otherwise 𝑅 = 𝑃 + 𝑄 (point addition)
P ( x1, y1 )
3 x12 a 2 y 1 2 x3 2 x1 y x x y 1 3 1 3
2 P ( x3 , y3 )
point doubling
Elliptic Curve Cryptography (ECC): basics (Focusing on the prime case) Let #𝐸 = ℎ ∙ 𝑛, where 𝑛 is a large prime and ℎ is a small cofactor (typically 1, 2 or 4). ECC computations are performed on the subgroup of prime order 𝑛, which we will denote 𝐸(𝔽𝑝 ). Scalar multiplication: given a point 𝑃 ∈ 𝐸(𝔽𝑝 ), with order 𝑛, and an integer 𝑘 ∈ [1, 𝑛 , compute 𝑄 = 𝑘 𝑃 = 𝑃 + 𝑃 + ⋯ + 𝑃 (𝑘 times) This is the central and most time-consuming operation in ECC.
ECC arithmetic layers
SCALAR
Scalar multiplication operations E.g., [k]P, [u]P+[v]Q, etc.
ARITHMETIC
POINT ARITHMETIC
Doubling and addition of points: 2P , P+Q
FIELD ARITHMETIC
Field addition, subtraction, multiplication, squaring, inversion
TLS Handshake with ECDHE_ECDSA: an example Client
Server
ClientHello
Verify(𝐵, …) : 𝑢∗𝑃+𝑣∗𝑄 Ephemeral public key 𝐴=𝑎∗𝐺 Premaster secret 𝑀 = 𝑎 ∗ 𝐵 (≡ 𝑎 ∗ 𝑏 ∗ 𝐺)
ServerHello Certificate ServerKeyExchange CertificateRequest* ServerHelloDone
Ephemeral public key 𝐵 =𝑏∗𝐺 Sign(𝐵, …) : 𝑟 ∗ 𝐻
Certificate* ClientKeyExchange CertificateVerify* [ChangeCipherSpec] Finished [ChangeCipherSpec] Finished ApplicationData
ApplicationData
Premaster secret 𝑀 = 𝑏 ∗ 𝐴 (≡ 𝑎 ∗ 𝑏 ∗ 𝐺)
(Standard) NIST elliptic curves NIST-recommended prime elliptic curves (part of NSA’s Suite B) cover three security levels (matching AES): 128-bit : P-256 (𝑝 = 2256 − 2224 + 2192 + 296 − 1) 192-bit : P-384 (𝑝 = 2384 − 2128 − 296 + 232 − 1) 256-bit : P-521 (𝑝 = 2521 − 1) These curves have short Weierstrass form (with 𝑎 = −3 for efficiency) and cofactor ℎ = 1.
June 2013 – the Snowden leaks “… the NSA had written the [crypto] standard and could break it.”
NIST’s Curve P-256: one-in-a-million? Base field prime: Elliptic curve: Curve constant: Seed:
𝑝 = 2256 − 2224 + 2192 + 296 − 1 𝐸/𝑭𝑝 : 𝑦 2 = 𝑥 3 − 3𝑥 + 𝑏 𝑏=
27 − SHA1 𝑠
𝑠 = c49d360886e704936a6678e1139d26b7819f7e90
Scott ‘99: “Consider now the possibility that one in a million of all curves have an exploitable structure that "they" know about, but we don't.. Then "they" simply generate a million random seeds until they find one that generates one of "their" curves…”
Post-Snowden responses • Bruce Schneier: “I no longer trust the constants. I believe the NSA has manipulated them…”
• TLS Working Group: Formal request to CFRG for new elliptic curves for usage in TLS (and other Internet standards) • NIST: Plans to host workshop to discuss new elliptic curves are announced: http://crypto.2014.rump.cr.yp.to/487f98c1a1a031283925d7affdbdef1c.pdf
Our motivations 1. Curves that regain confidence and acceptance from public - rigid generation / “nothing up my sleeves” 2. Improved performance and security for standard ECC algorithms and protocols - new curve models - faster finite fields - side-channel resistance Industry moving to Perfect Forward Secrecy (PFS) modes (e.g., ECDHE)
Our ECC research Comprehensive analysis • Curve forms and their arithmetic • Prime forms • Constant-time and exception-free implementation • Performance in protocols (ECDHE and ECDSA)
Curve forms and their arithmetic Weierstrass curves 𝑦 2 = 𝑥 3 + 𝑎𝑥 + 𝑏
• • • •
Most general form Prime order possible Exceptions in group law NIST and Brainpool curves
Montgomery curves
Twisted Edwards curves
𝐵𝑦 2 = 𝑥 3 + 𝐴𝑥 2 + 𝑥
𝑎𝑥 2 + 𝑦 2 = 1 + 𝑑𝑥 4 𝑦 4
• Subset of curves • Not prime order • Montgomery ladder
• • • •
Subset of curves Not prime order Fastest arithmetic Some have complete group law
Prime forms Assume 𝑠 ∈ 128, 192, 256 . Two candidates: Pseudo-Mersenne primes: 𝑝 = 2𝑚 − 𝑐, with small 𝑐 Two options: 𝑚 ∈ {2𝑠, 2𝑠 − 1}
Montgomery-friendly primes: 𝑝 = 2𝑎 (2𝑏 − 𝑐) − 1 Two options: 𝑎 = 8𝑡 and 𝑏 ∈ {2𝑠 − 𝑎, 2𝑠 − 2 − 𝑎}, where 𝑡 and 𝑐 are the smallest positive integers s.t. 𝑝 is prime • Deterministically given by smallest 𝑐 such that 𝑝 ≡ 3 mod 4. • Support efficient arithmetic on wide range of devices: 8-, 32-, 64-bit (and beyond).
Constant-time and exception-free • Regular algorithms for scalar multiplication: Resilience against simple side-channel attacks • No secret-data conditional branches, no secret memory accesses: Resilience against timing and cache attacks • No algorithmic failures during scalar multiplications: Resilience against exceptional point attacks E.g., fixed-window method for variable-base scalar multiplication 𝑤=4
[ …, 0, -1, 0, 1, 1, 1, 0, -1, -1, 0, 1, 1, … ] [ …, -3, 11, -5, … ]
… 4 DBLs, ADD(-3P), 4 DBLs, ADD(11P), 4 DBLs, ADD(-5P), …
Constant-time and exception-free • Regular algorithms for scalar multiplication: Resilience against simple side-channel attacks • No secret-data conditional branches, no secret memory accesses: Resilience against timing and cache attacks • No algorithmic failures during scalar multiplications: Resilience against exceptional point attacks E.g., point extraction through linear passes over precomputed tables (Assume that input “𝑥” contains the secret index) 𝑄 = table[0] for 𝑖 = 1 to (2𝑤−1 − 1) 𝑥-𝑚𝑎𝑠𝑘 = ( 𝑥 OR−𝑥 ≫ bitlength − 1 ) − 1 𝑄 = 𝑚𝑎𝑠𝑘 AND 𝑄 XOR table 𝑖 XOR 𝑄 end for
table[0] table[1] table[2]
3P
table[3]
7P
P 5P
Constant-time and exception-free • Regular algorithms for scalar multiplication: Resilience against simple side-channel attacks • No secret-data conditional branches, no secret memory accesses: Resilience against timing and cache attacks • No algorithmic failures during scalar multiplications: Resilience against exceptional point attacks Solutions in the paper:
- Proved that scalar multiplications work for any possible pair {k, P} using the fastest, dedicated formulas (whenever possible) - Clear small torsion in twisted Edwards curves (applying point doublings before execution)
- New “pseudo-complete” addition formulas for Weierstrass curves
The problem of Weierstrass completeness • Bosma-Lenstra specialized to 𝑦 2 = 𝑥 3 + 𝑎𝑥 + 𝑏, homogeneous space, the sum 𝑋1 : 𝑌1 : 𝑍1 + 𝑋2 : 𝑌2 : 𝑍2 will be at least one of 𝑋3 : 𝑌3 : 𝑍3 or 𝑋3 ′: 𝑌3 ′: 𝑍3 ′ :
• For our 𝑎 = −3 Weierstrass curves, after reasonable level of optimization the above gave 𝟐𝟐𝑴 + 𝟒𝑴𝒃 (compared to ≈ 𝟏𝟒𝑴 for dedicated projective additions) • Roughly, this is expected to be twice as slow There’s got to be a better way…
Weierstrass “pseudo-completeness” We give a “pseudo-complete’’ addition for general Weierstrass curves that works for any possible inputs when computing 𝑃 + 𝑄 Approach: • • • •
Find “common” computations between addition and doubling formulas Create a mask after evaluating if the computation is addition or doubling Mask inputs to “common” computations using mask A table lookup is used to select other possible results (∞, 𝑃 and 𝑄)
Weierstrass “pseudo-completeness” Example: computation of the Y-coordinate For doubling 2𝑃:
For addition 𝑃 + 𝑄:
𝑌2𝑃 = 3𝑋𝑃2 − 3𝑍𝑃4 𝑋𝑃 𝑌𝑃2 − 𝑋2𝑃 − 𝑌𝑃4
𝑌𝑃+𝑄 = 𝑍𝑃3 𝑌𝑄 − 𝑍𝑄3 𝑌𝑃
𝑌2𝑃 = 𝛼 𝑋𝑃 𝛽 2 − 𝑋 − 𝑌𝑃 𝛽 3
𝑌𝑃+𝑄 = 𝛼 𝑋𝑃 𝛽 2 − 𝑋 − 𝑌𝑃 𝛽 3
If (DBL): α = 3𝑋𝑃2 − 3𝑍𝑃4
else: α = 𝑍𝑃3 𝑌𝑄 − 𝑍𝑄3 𝑌𝑃 𝑍𝑃2 𝑋𝑄
𝛽 = 𝑌𝑃
𝛽=
𝑋 = 𝑋𝑃
𝑋 = 𝑋𝑃+𝑄
−
𝑍𝑄2 𝑋𝑃
• Jac+aff (original, dedicated) = 8M+3S, • Jac+aff (complete-masking) = 8M+3S+𝝐 (𝜖 ≈ 20%) ~10% overhead in the total cost of scalar multiplication
𝑋𝑃 𝑍𝑃2 𝑋𝑄 − 𝑍𝑄2 𝑋𝑃
2
− 𝑋𝑃+𝑄 − 𝑌𝑃 𝑍𝑃2 𝑋𝑄 − 𝑍𝑄2 𝑋𝑃
One only needs to choose the right values using predefined mask for computing the fully common expression
3
Curve selection Requirements (in order of importance) 1. Security and rigidity Simple/deterministic/unquestionable curve generation, ECDLP bit-security, conservative curve features, side-channel resiliency
2. Compatibility CPU alignment and bitlength compatibility, design consistency across security levels, cofactor issues in existing protocols
3. Performance By principle, not to trade a small gain in a given requirement at the expense of a (small) loss in another requirement above in the list
Curve selection • Consistency through all the security levels • Maximal ECDLP security • Standard prime bitlengths: favors compatibility with existing {NIST, Brainpool} implementations • Little room for manipulation Weierstrass • Backwards compatible with existing {NIST, Brainpool} implementations Twisted Edwards curve • fastest curve arithmetic • no Montgomery form necessary: no conversions required
protocol layer
Constant-time, exception-free algorithms to do crypto
128-bit security
192-bit security
Weierstrass curves
256-bit security
twisted Edwards curves
Pseudo-Mersenne primes with maximal bitlength: 𝟐𝟐𝟓𝟔 − 𝒄, 𝟐𝟑𝟖𝟒 − 𝒄, 𝟐𝟓𝟏𝟐 − 𝒄
Montgomery curves
“Nothing-Up-My-Sleeve” curve generation Given the short Weierstrass curve 𝐸𝑏 : 𝑦 2 = 𝑥 3 − 3𝑥 + 𝑏 with quadratic twist 𝐸′𝑏 : 𝑦 2 = 𝑥 3 − 3𝑥 − 𝑏. 1. Pick a bit-security 𝑠 from the standard set {128, 192, 256} 2. Fix 𝑝 = 22𝑠 − 𝑐, where 𝑐 is the smallest integer s.t. 𝑝 ≡ 3 mod 4 (standard choice) 3. Find smallest |𝑏| s.t. #𝐸𝑏 and #𝐸′𝑏 are prime. Choose ±𝑏 based on smallest order Given the twisted Edwards curve 𝐸𝑑 : −𝑥 2 + 𝑦 2 = 1 + 𝑑𝑥 2 𝑦 2 with quadratic twist 𝐸′𝑑 : −𝑥 2 + 𝑦 2 = 1 + (1/𝑑)𝑥 2 𝑦 2 and birationally equivalent to the Montgomery curve 𝐸𝐴 : 𝑦 2 = 𝑥 3 + 𝐴𝑥 2 + 𝑥. 1. and 2. 3. Find smallest 𝑑 > 0 such that #𝐸𝑑 and #𝐸′𝑑 are 4 × prime, and 𝑡 > 0 Note: search that minimizes 𝑑 also minimizes the Montgomery constant (𝐴 + 2)/4.
The NUMS curves Security Prime 𝒔= 𝒑= 128 2256 − 189
192 256
2384 − 317 2512 − 569
Weierstrass Twisted Edwards 𝒃= 𝒅= 152961 15342
−34568 121243
333194 637608
Montgomery 𝑨= −61370
−1332778 −2550434
The NUMS curves Security Prime 𝒔= 𝒑= 128 2256 − 189
192 256
2384 − 317 2512 − 569
Weierstrass Twisted Edwards 𝒃= 𝒅= 152961 15342
−34568 121243
333194 637608
Montgomery 𝑨= −61370
−1332778 −2550434
Other candidates (in the CFRG curve selection): ‐ Curve25519: Montgomery curve 𝑦 2 = 𝑥 3 + 𝐴𝑥 2 + 𝑥 defined over GF(2255 − 19) ‐ Ed448-Goldilocks: Edwards curve 𝑥 2 + 𝑦 2 = 1 − 39081𝑥 2 𝑦 2 defined over GF(2448 − 2224 − 1) ‐ E-521: Edwards curve 𝑥 2 + 𝑦 2 = 1 − 376014𝑥 2 𝑦 2 defined over GF(2521 − 1)
MSR ECCLib: a library for the NUMS curves Main features: • Wide variety of scalar multiplications, multiple security levels, different curve forms • Regular algorithms for scalar multiplication: Basic protection against simple side-channel attacks • No secret-data conditional branches, no secret memory accesses: Resilience against timing and cache attacks • No algorithmic failures during scalar multiplications: Resilience against exceptional point attacks
MSR ECCLib: overview complementary API (e.g., modular operations, etc.)
tEd W
Protocol API (ECDH, ECDSA, etc.)
tEd W
ECC API (point formulas, scalar mul, etc.)
Base field API C + intrinsics
32-bit
64-bit
Arch-based C 256 384 512
256 384 512
asm
Fully portable C 256 384 512
x64
ARM32
ARM64
256 384 512
256 384 512
256 384 512
tEd: twisted Edwards W: Weierstrass
MSR ECCLib: a library for the NUMS curves • Design prioritizes security, maintainability and scalability • Modularized design keeps separate API for field, curve and protocol layers (Among many other benefits) this helps to reduce the possibility of implementation flaws
Cons: Expected (relatively minor) loss in efficiency
Comparison of x64 assembly implementations Running on a 3.4GHz Intel Core i7 (Sandy Bridge) processor: • MSR ECCLib computes one 256-bit scalar multiplication in 216,000 cycles or 0.000064 seconds • (Full x64 assembly, standalone, one security level, one curve) Curve25519-based amd64-51 computes one scalar multiplication in 194,000 cycles or 0.000057 seconds
Security Level
Prime 256
128
𝑝=2
192
𝑝 = 2384 − 317
256
𝑝 = 2512 − 569
− 189
Curve
Variable -base
Fixed -base
ECDHE
Weierstrass twisted Edwards Weierstrass twisted Edwards Weierstrass twisted Edwards
270 216 714 588 1,504 1,242
107 82 252 201 488 391
377 298 966 789 1,992 1,633
Table: results in 10^3 cycles. Platform: Intel Core i7, Sandy-Bridge architecture (@3.4GHz)
@128sec-level • Fastest report NIST P-256 (Gueron & Krasnov ‘13): ≈ 490𝑘 cycles ECDHE using a 150KB table (timing-attack resistant?) • Compare Curve25519 ≈ 398𝑘 cycles to twisted Edwards ≈ 298𝑘 • (Not currently implemented) compare Curve25519+Ed25519 ≈ 263𝑘 cycles using a 30KB table to twisted Edwards ≈ 298𝑘 using a 9KB table
Security Level
Prime 256
128
𝑝=2
192
𝑝 = 2384 − 317
256
𝑝 = 2512 − 569
− 189
Curve
Variable -base
Fixed -base
ECDHE
Weierstrass twisted Edwards Weierstrass twisted Edwards Weierstrass twisted Edwards
270 216 714 588 1,504 1,242
107 82 252 201 488 391
377 298 966 789 1,992 1,633
Table: results in 10^3 cycles. Platform: Intel Core i7, Sandy-Bridge architecture (@3.4GHz)
@mid sec-level (192 - 224) • Compare Goldilocks ≈ 664𝑘 cycles variable-base to twisted Edwards ≈ 588𝑘
Security Level
Prime 256
128
𝑝=2
192
𝑝 = 2384 − 317
256
𝑝 = 2512 − 569
− 189
Curve
Variable -base
Fixed -base
ECDHE
Weierstrass twisted Edwards Weierstrass twisted Edwards Weierstrass twisted Edwards
270 216 714 588 1,504 1,242
107 82 252 201 488 391
377 298 966 789 1,992 1,633
Table: results in 10^3 cycles. Platform: Intel Core i7, Sandy-Bridge architecture (@3.4GHz)
@256sec-level • Compare E-521 ≈ 1,120𝑘 cycles variable-base to twisted Edwards ≈ 1,242𝑘 MSR ECCLib uses basic Comba multiplication There is room for optimization in the NUMS high-security curves.
Relevant links • Report: http://eprint.iacr.org/2014/130.pdf • MSR ECC Library: http://research.microsoft.com/en-us/projects/nums/ • Specification of curve selection: http://research.microsoft.com/apps/pubs/default.aspx?id=219966 • IETF Internet Draft (authored with Benjamin Black) http://tools.ietf.org/html/draft-black-numscurves-02
NUMS Elliptic Curves and their Implementation Q&A Patrick Longa Microsoft Research http://research.microsoft.com/en-us/people/plonga/