Implementing Pairings at the 192-bit Security Level
Diego F. Aranha
Department of Computer Science, University of Brasília
Joint work with Laura Fuentes-Castañeda, Edward Knapp, Alfred Menezes, Francisco Rodríguez-Henríquez.
Introduction
Pairing-Based Cryptography (PBC) enables many elegant solutions to cryptographic problems:
- Identity-based encryption
- Short signatures
- Non-interactive authenticated key agreement
Pairing computation is the most expensive operation in PBC.
Important: Make it faster!
Barreto-Naehrig curves are ideal for the asymmetric setting at the 128-bit security level.
What about higher security levels?
- Best choice of parameters is protocol-dependent
- Security is usually scaled by increasing the embedding degree
- Several choices of curve: BN, BLS12, KSS18, BLS24
- Several choices of pairings: latency or throughput?
Focus: Restrict study to pairing computation alone!
Objective
Evaluate, in light of new techniques, the performance of serial and parallel implementations of pairings at the 192-bit security level:
- Maximize throughput
- Minimize latency
Applications: servers, real-time services.
Contributions
- Suggest curve choices and explicit parameters
- Present new state-of-the-art implementations
- Detect fastest pairing among candidates
Parameter generation
KSS curves: $k = 18$, $\rho \approx 4/3$
  $p(z) = (z^8 + 5z^7 + 7z^6 + 37z^5 + 188z^4 + 259z^3 + 343z^2 + 1763z + 2401)/21$
  $r(z) = (z^6 + 37z^3 + 343)/343$, $t(z) = (z^4 + 16z + 7)/7$
BN curves: $k = 12$, $\rho \approx 1$
  $p(z) = 36z^4 + 36z^3 + 24z^2 + 6z + 1$
  $r(z) = 36z^4 + 36z^3 + 18z^2 + 6z + 1$, $t(z) = 6z^2 + 1$
BLS12 curves: $k = 12$, $\rho \approx 1.5$
  $p(z) = (z - 1)^2 (z^4 - z^2 + 1)/3 + z$, $r(z) = z^4 - z^2 + 1$, $t(z) = z + 1$
BLS24 curves: $k = 24$, $\rho \approx 1.25$
  $p(z) = (z - 1)^2 (z^8 - z^4 + 1)/3 + z$, $r(z) = z^8 - z^4 + 1$, $t(z) = z + 1$
Heuristic for each parameterized family
1. Generate a random sparse integer $z$ of appropriate size.
2. If $p(z)$ and $r(z)$ are prime numbers and the best towering is possible:
   1. Iterate the b-coefficient until the correct curve is found.
   2. Find the correct twist (D- or M-type).
Curve   b   k    z                                         $\lceil \log_2 p \rceil$   $\lceil \log_2 r \rceil$
KSS     2   18   $-2^{64} - 2^{51} + 2^{46} + 2^{12}$      508                        376
BN      5   12   $2^{158} - 2^{128} - 2^{68} + 1$          638                        638
BLS12   4   12   $-2^{107} + 2^{105} + 2^{93} + 2^{5}$     638                        427
BLS24   4   24   $-2^{48} + 2^{45} + 2^{31} - 2^{7}$       477                        383
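The sizes in the table can be cross-checked by evaluating the family polynomials at the listed values of z. The following is a minimal sketch, assuming the z values as read from the table above; the function and dictionary names are illustrative and not taken from RELIC. It only checks integrality and bit lengths; it does not test primality.

```python
# Family polynomials as quoted on the slides; each returns exact integers (p, r).
def kss18(z):
    p_num = (z**8 + 5*z**7 + 7*z**6 + 37*z**5 + 188*z**4
             + 259*z**3 + 343*z**2 + 1763*z + 2401)
    r_num = z**6 + 37*z**3 + 343
    # z must be chosen so that both quotients are integers.
    assert p_num % 21 == 0 and r_num % 343 == 0
    return p_num // 21, r_num // 343

def bn(z):
    p = 36*z**4 + 36*z**3 + 24*z**2 + 6*z + 1
    r = 36*z**4 + 36*z**3 + 18*z**2 + 6*z + 1
    return p, r

def bls12(z):
    p_num = (z - 1)**2 * (z**4 - z**2 + 1)
    assert p_num % 3 == 0
    return p_num // 3 + z, z**4 - z**2 + 1

def bls24(z):
    p_num = (z - 1)**2 * (z**8 - z**4 + 1)
    assert p_num % 3 == 0
    return p_num // 3 + z, z**8 - z**4 + 1

# Sparse z values as read from the parameter table (an assumption of this sketch).
PARAMS = {
    "KSS":   (kss18, -2**64 - 2**51 + 2**46 + 2**12),
    "BN":    (bn,     2**158 - 2**128 - 2**68 + 1),
    "BLS12": (bls12, -2**107 + 2**105 + 2**93 + 2**5),
    "BLS24": (bls24, -2**48 + 2**45 + 2**31 - 2**7),
}

def bit_lengths():
    """Return {curve: (bits of p, bits of r)} for every family."""
    sizes = {}
    for name, (family, z) in PARAMS.items():
        p, r = family(z)
        sizes[name] = (p.bit_length(), r.bit_length())
    return sizes

if __name__ == "__main__":
    for name, sizes in bit_lengths().items():
        print(name, sizes)
```

The ratio of the two bit lengths also gives a quick estimate of ρ for each family, for instance 508/376 ≈ 1.35 for KSS.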
Properties
- Very sparse $z$ (faster Miller loop and final exponentiation)
- Short b-coefficients (faster point doubling)
- Best towering choices available
- Generalized lazy reduction in extension field arithmetic ($|p(z)| \bmod w \le w - 2$, for any $w$ multiple of 8)
Important: In a few cases, improved parameters found by others!
Optimal ate pairing
A Miller function $f_{s,R}$ is a function with divisor $s(R) - (sR) - (s - 1)(\infty)$.
An extended Miller function $f_{s,h,R}$ is the normalized rational function with divisor $\sum_{i=0}^{\deg h} h_i [(s^i R) - (\infty)]$.
The pairing $e(P, Q)$ is defined by $f_{p,h,Q}(P)^{(p^k - 1)/r}$.
Curve   h(x)                        Optimal ate pairing
KSS     $z + 3x - x^4$              $(f_{z,Q} \cdot f_{3,Q}^{p} \cdot \ell_{[z]Q,[3p]Q}(P))^{(p^{18} - 1)/r}$
BN      $6z + 2 + x - x^2 + x^3$    $(f_{6z+2,Q} \cdot \ell_{[6z+2]Q,[p]Q} \cdot \ell_{[6z+2+p]Q,[-p^2]Q}(P))^{(p^{12} - 1)/r}$
BLS12   $z - x$                     $f_{z,Q}(P)^{(p^{12} - 1)/r}$
BLS24   $z - x$                     $f_{z,Q}(P)^{(p^{24} - 1)/r}$
Generalized lazy reduction
Intuitively, it is a trade-off between addition and modular reduction:
  $(a \cdot b) \bmod p + (c \cdot d) \bmod p \equiv a \cdot b + c \cdot d \pmod{p}$
Observation: Pairings use non-sparse primes for $\mathbb{F}_p$!
Any coefficient $c$ of an element in $\mathbb{F}_{p^k}$ is ultimately computed as $c = \pm \sum a_i b_j \bmod p$, requiring a single reduction.
For $k = 2^i 3^j$, a multiplication in $\mathbb{F}_{p^k}$ costs a total of $(3^i \cdot 6^j)M + kR$, where $M$ is a multiplication without reduction and $R$ is a modular reduction.
Remark 1: Montgomery bounds should be maintained for intermediate results. Choose $|p|$ accordingly.
Remark 2: The same idea applies to arithmetic in $E'(\mathbb{F}_{p^{k/6}})$.
Example: Multiplication in $\mathbb{F}_{p^{12}}$ goes from $54M + 36R$ to $54M + 12R$.
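The smallest instance of the count above is $k = 2$, where lazy reduction gives $3M + 2R$ instead of $3M + 3R$. The following is a minimal sketch of that case, assuming the toy field $\mathbb{F}_p[u]/(u^2 + 1)$ with a Mersenne prime chosen only for convenience (the slides stress that pairing primes are not sparse). The helper names and the operation counters are hypothetical, and Python integers stand in for the multi-precision arithmetic of a real implementation.

```python
from collections import Counter

P = 2**127 - 1  # toy prime with P % 4 == 3 (assumption), so u^2 = -1 defines F_{P^2}

def mul(x, y, ops):
    """Integer multiplication without reduction (counted as one M)."""
    ops["M"] += 1
    return x * y

def red(x, p, ops):
    """Full modular reduction of a double-width value (counted as one R)."""
    ops["R"] += 1
    return x % p

def small_fix(x, p):
    """Cheap correction by adding or subtracting p; not counted as a reduction."""
    while x < 0:
        x += p
    while x >= p:
        x -= p
    return x

def mul_fp2_eager(a, b, p, ops):
    """Karatsuba product, reducing right after every multiplication: 3M + 3R."""
    (a0, a1), (b0, b1) = a, b
    t0 = red(mul(a0, b0, ops), p, ops)
    t1 = red(mul(a1, b1, ops), p, ops)
    t2 = red(mul(a0 + a1, b0 + b1, ops), p, ops)
    return small_fix(t0 - t1, p), small_fix(t2 - t0 - t1, p)

def mul_fp2_lazy(a, b, p, ops):
    """Karatsuba product with one reduction per output coefficient: 3M + 2R."""
    (a0, a1), (b0, b1) = a, b
    t0 = mul(a0, b0, ops)
    t1 = mul(a1, b1, ops)
    t2 = mul(a0 + a1, b0 + b1, ops)
    # Additions and subtractions happen on unreduced double-width values.
    c0 = red(t0 - t1, p, ops)
    c1 = red(t2 - t0 - t1, p, ops)
    return c0, c1
```

In a fixed-width implementation the unreduced sums must stay within the bounds accepted by the reduction routine, which is the point of Remark 1; the sketch sidesteps this because Python integers are unbounded.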
Removing the inversion penalty
Consider $(p^k - 1)/r = (p^{k/2} - 1)(p^{k/6} + 1)(\Phi_k(p)/r)$.
The hard part is $\Phi_k(p)/r$, which requires $|z|$-th powers.
If $s < 0$, from the pairing definition:
  $e(P, Q) = [f_{|s|,h,Q}(P)^{-1} \cdot \ell]^{(p^k - 1)/r}$.
By distributing the power $(p^k - 1)/r$, we can compute instead:
  $e(P, Q) = [f_{|s|,h,Q}(P)^{p^{k/2}} \cdot \ell]^{(p^k - 1)/r}$.
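The replacement of the inverse by a $p^{k/2}$-th power rests on the fact that, once an element has been raised to $p^{k/2} - 1$, its inverse coincides with its image under the $p^{k/2}$-power Frobenius. The following is a minimal sketch of that fact for $k = 2$, assuming the toy field $\mathbb{F}_p[u]/(u^2 + 1)$ with $p = 103$, where the Frobenius is plain conjugation. All function names are hypothetical, and `pow(x, -1, p)` requires Python 3.8 or later.

```python
P = 103  # toy prime with P % 4 == 3 (assumption), so F_{P^2} = F_P[u]/(u^2 + 1)

def fp2_mul(a, b, p=P):
    """Schoolbook product of a0 + a1*u and b0 + b1*u with u^2 = -1."""
    return ((a[0] * b[0] - a[1] * b[1]) % p, (a[0] * b[1] + a[1] * b[0]) % p)

def fp2_conj(a, p=P):
    """Conjugation, which equals the p-power Frobenius in this field."""
    return (a[0] % p, -a[1] % p)

def fp2_inv(a, p=P):
    """Generic inverse: conj(a) / norm(a), costing one inversion in F_p."""
    norm_inv = pow(a[0] * a[0] + a[1] * a[1], -1, p)
    return (a[0] * norm_inv % p, -a[1] * norm_inv % p)

def fp2_pow(a, e, p=P):
    """Square-and-multiply exponentiation for a non-negative exponent."""
    result, base = (1, 0), a
    while e > 0:
        if e & 1:
            result = fp2_mul(result, base, p)
        base = fp2_mul(base, base, p)
        e >>= 1
    return result

def easy_part(a, p=P):
    """a^(p - 1) computed as conj(a) * a^(-1); the result has norm 1."""
    return fp2_mul(fp2_conj(a, p), fp2_inv(a, p), p)

if __name__ == "__main__":
    g = easy_part((5, 7))
    # For a norm-1 element the expensive inverse equals the free conjugate.
    print(fp2_inv(g) == fp2_conj(g))
```

In the pairing setting the same argument applies with conjugation over $\mathbb{F}_{p^{k/2}}$, so the inversion disappears into the easy part of the final exponentiation.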
Compressed cyclotomic squarings
Consider $\mathbb{F}_{p^k} = \mathbb{F}_{p^{k/3}}[t]/(t^3 - u)$.
Let $g = \sum_{i=0}^{2} (g_{2i} + g_{2i+1} s) t^i \in G_{\phi_6}(\mathbb{F}_{p^{k/6}})$ and $g^2 = \sum_{i=0}^{2} (h_{2i} + h_{2i+1} s) t^i$ with $g_i, h_i \in \mathbb{F}_{p^{k/6}}$.
Given $C(g) = [g_2, g_3, g_4, g_5]$, it is efficient to compute $C(g^2) = [h_2, h_3, h_4, h_5]$.
Important: The decompression map $D$ requires one inversion in $\mathbb{F}_{p^{k/6}}$.
Suppose $|z| = 2^a - 2^b + 1$, $a > b$.
Idea: $g^{|z|}$ can now be computed in three steps:
1. Compute $C(g^{2^i})$ for $1 \le i \le a$ and store $C(g^{2^a})$ and $C(g^{2^b})$.
2. Compute $D(C(g^{2^a})) = g^{2^a}$ and $D(C(g^{2^b})) = g^{2^b}$.
3. Compute $g^{|z|} = g^{2^a} \cdot (g^{2^b})^{p^{k/2}} \cdot g$.
Remark: Montgomery's simultaneous inversion allows simultaneous decompression.
Important: Computing a $|z|$-th power is now 30% faster.
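Montgomery's simultaneous inversion, mentioned in the remark, trades n inversions for one inversion and about 3(n - 1) multiplications. The following is a minimal sketch over a prime field, assuming a toy modulus; in the decompression step the same pattern would run over $\mathbb{F}_{p^{k/6}}$, which is not modelled here. The function name is hypothetical, and `pow(x, -1, p)` requires Python 3.8 or later.

```python
P = 2**127 - 1  # toy prime modulus, an assumption of this sketch

def batch_inverse(values, p=P):
    """Invert every element of `values` modulo p using a single modular inversion.

    All inputs must be nonzero modulo p.
    """
    # prefix[i] holds the product values[0] * ... * values[i-1] (mod p).
    prefix = []
    acc = 1
    for v in values:
        prefix.append(acc)
        acc = acc * v % p
    # The only inversion: inverse of the product of all inputs.
    inv_acc = pow(acc, -1, p)
    inverses = [0] * len(values)
    # Walk backwards, peeling one factor off the running inverse at each step.
    for i in range(len(values) - 1, -1, -1):
        inverses[i] = prefix[i] * inv_acc % p
        inv_acc = inv_acc * values[i] % p
    return inverses

if __name__ == "__main__":
    print(batch_inverse([3, 5, 7, 11]))
```

With two values to decompress, as in step 2 above, this replaces two inversions by one inversion and three multiplications.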
Parallelization of optimal ate pairing
Property of Miller functions:
  $f_{a \cdot b,P}(D) = f_{b,P}(D)^a \cdot f_{a,bP}(D)$
We can write $s = 2^w s_1 + s_0$ and compute $f_{s,P}(D)$:
  $f_{s,P}(D) = f_{2^w s_1 + s_0,P}(D) = f_{s_1,P}(D)^{2^w} \cdot f_{2^w,s_1 P}(D) \cdot f_{s_0,P}(D) \cdot \frac{\ell_{(2^w s_1)P,s_0 P}(D)}{v_{sP}(D)}$
If $s$ has low Hamming weight, $w$ can be chosen so that $s_0$ is small.
For many processors, we can:
- Apply the formula recursively
- Write $s$ as $s = 2^{w_i} s_i + \cdots + 2^{w_2} s_2 + 2^{w_1} s_1 + s_0$
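The choice of $w$ for a sparse loop parameter can be illustrated on the scalar alone. The following is a minimal sketch, assuming the BLS12 value $|z| = 2^{107} - 2^{105} - 2^{93} - 2^{5}$ from the parameter table and a signed-digit representation, which is an assumption of this example rather than something stated on the slide. It only decomposes the integer; it does not evaluate Miller functions, and the function name is hypothetical.

```python
def split_sparse(terms, w):
    """Split s = sum(sign * 2**e) into (s1, s0) with s = 2**w * s1 + s0.

    `terms` is a list of (sign, exponent) pairs with sign in {+1, -1}.
    Terms with exponent >= w go to s1 (shifted down by w), the rest to s0.
    """
    s1 = sum(sign * (1 << (e - w)) for sign, e in terms if e >= w)
    s0 = sum(sign * (1 << e) for sign, e in terms if e < w)
    return s1, s0

# Signed sparse form of |z| for the BLS12 curve, as read from the table.
BLS12_ABS_Z = [(+1, 107), (-1, 105), (-1, 93), (-1, 5)]

if __name__ == "__main__":
    s1, s0 = split_sparse(BLS12_ABS_Z, 93)
    # Cutting at w = 93 leaves a single low term, so s0 is tiny.
    print(s1, s0)
```

Cutting at one of the nonzero positions of the sparse form keeps $s_0$ down to a single small term, so the factor $f_{s_0,P}(D)$ contributes little work compared with the two long chains.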
β Weil pairing
Parallelizing the optimal ate pairing has some limitations:
- The final exponentiation is inherently serial
- Parallelization has a cost in terms of expensive point doublings and extension field squarings
Solution: Pairings tailored for parallel execution!
Theorem: Let $k = ed$, where $d$ is the order of the automorphism group of $E$. The following is a pairing:
  $\beta : G_1 \times G_2 \to G_T : (P, Q) \mapsto \prod_{i=0}^{e-1} \left( \frac{f_{p,h,[p^i]P}(Q)}{f_{p,h,Q}([p^i]P)} \right)^{p^{e-1-i}}$.
KSS: $(P, Q) \mapsto \left[ \left( \frac{f_{p,h,P}(Q)}{f_{p,h,Q}(P)} \right)^{p^2} \cdot \left( \frac{f_{p,h,[p]P}(Q)}{f_{p,h,Q}([p]P)} \right)^{p} \cdot \frac{f_{p,h,[p^2]P}(Q)}{f_{p,h,Q}([p^2]P)} \right]^{(p^9 - 1)(p^3 + 1)}$
BN: $(P, Q) \mapsto \left[ \left( \frac{f_{p,h,P}(Q)}{f_{p,h,Q}(P)} \right)^{p} \cdot \frac{f_{p,h,[p]P}(Q)}{f_{p,h,Q}([p]P)} \right]^{(p^6 - 1)(p^2 + 1)}$
BLS12: $(P, Q) \mapsto \left[ \left( \frac{f_{z,P}(Q)}{f_{z,Q}(P)} \right)^{p} \cdot \frac{f_{z,[p]P}(Q)}{f_{z,Q}([p]P)} \right]^{(p^6 - 1)(p^2 + 1)}$
BLS24: $(P, Q) \mapsto \left[ \frac{f_{z,P}(Q)^{p^3} \cdot f_{z,[p]P}(Q)^{p^2} \cdot f_{z,[p^2]P}(Q)^{p} \cdot f_{z,[p^3]P}(Q)}{f_{z,Q}(P)^{p^3} \cdot f_{z,Q}([p]P)^{p^2} \cdot f_{z,Q}([p^2]P)^{p} \cdot f_{z,Q}([p^3]P)} \right]^{(p^{12} - 1)(p^4 + 1)}$
Note: For efficiency, compute the $(p^{k/2} - 1)(p^{k/6} + 1)$-th power.
Operation counts
If $M(n) = 2n^2 + n$, with $n$ the number of 64-bit words, then $m_{640} \approx (210/136) \cdot m_{512} \approx 1.544 \cdot m_{512}$.
Curve   Phase           Mult. in $\mathbb{F}_p$   Mult. in $\mathbb{F}_{p_{512}}$
KSS     Miller Loop     13168 $m_{512}$           13168 $m_{512}$
KSS     Final Step      534 $m_{512}$             534 $m_{512}$
KSS     Final Exp.      23821 $m_{512}$           23821 $m_{512}$
KSS     ML + FS + FE    37523 $m_{512}$           37523 $m_{512}$
BN      Miller Loop     16387 $m_{640}$           25301 $m_{512}$
BN      Final Step      166 $m_{640}$             256 $m_{512}$
BN      Final Exp.      7218 $m_{640}$            11145 $m_{512}$
BN      ML + FS + FE    23771 $m_{640}$           36702 $m_{512}$
BLS12   Miller Loop     10865 $m_{640}$           16775 $m_{512}$
BLS12   Final Exp.      8464 $m_{640}$            13068 $m_{512}$
BLS12   ML + FE         19329 $m_{640}$           29843 $m_{512}$
BLS24   Miller Loop     14927 $m_{480}$           14927 $m_{512}$
BLS24   Final Exp.      25412 $m_{480}$           25412 $m_{512}$
BLS24   ML + FE         40339 $m_{480}$           40339 $m_{512}$
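The normalization to $m_{512}$ follows directly from the cost model $M(n) = 2n^2 + n$. The following is a minimal sketch, assuming 64-bit words (consistent with the 210/136 ratio quoted above) and the totals from the table; the function names are hypothetical.

```python
def mul_cost(n_words):
    """Cost model from the slide: M(n) = 2n^2 + n for an n-word multiplication."""
    return 2 * n_words**2 + n_words

def words(bits, word_size=64):
    """Number of machine words needed for a field of the given bit size."""
    return -(-bits // word_size)  # ceiling division

def normalized(total, bits):
    """Convert a count of `bits`-bit multiplications into 512-bit equivalents."""
    return total * mul_cost(words(bits)) / mul_cost(words(512))

# Total multiplication counts and field sizes taken from the table.
TOTALS = {
    "KSS":   (37523, 512),
    "BN":    (23771, 640),
    "BLS12": (19329, 640),
    "BLS24": (40339, 480),
}

ratio = mul_cost(words(640)) / mul_cost(words(512))  # 210 / 136
ranking = sorted(TOTALS, key=lambda curve: normalized(*TOTALS[curve]))

if __name__ == "__main__":
    print(round(ratio, 3))
    print(ranking)
```

Because the sketch uses the unrounded ratio 210/136, its normalized totals can differ from the table by a few units. A 480-bit field still occupies eight 64-bit words, which is why the BLS24 counts are unchanged by the normalization.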
Estimated speedup of optimal ate
[Figure: estimated speedup (y-axis, 0 to 2) versus number of processors (x-axis, 1 to 8) for BLS12, BN, KSS and BLS24.]
Observation: Speedups are relative to the serial KSS optimal ate pairing!
Estimated speedup of β Weil
[Figure: estimated speedup (y-axis, 0 to 4) versus number of processors (x-axis, 1 to 8) for BLS12, BN, KSS and BLS24.]
Observation: Speedups are relative to the serial KSS optimal ate pairing!
Implementation notes
- Finite field arithmetic in Assembly
- High-level code in the C programming language
- Parallelization by OpenMP constructs
- GCC 4.7.0 with optimization level -O3
- RELIC cryptographic library [1]
- Intel 2-core Westmere and 4-core Sandy Bridge processors
- For multiplication, we verified $m_{640} \approx 1.544 \cdot m_{512}$
- Inversion from GMP [2], giving an I/M ratio in $\mathbb{F}_p$ of around 16
- Affine coordinates were only competitive in the BLS24 curve
[1] http://code.google.com/p/relic-toolkit/
[2] http://www.gmplib.org/
Experimental results
[Figure: latency in millions of cycles (y-axis, 0 to 70) versus number of threads (x-axis: 1, 2, 4, 8). Series: Scott 2011 - KSS, This work - KSS, This work - BLS12, This work - BN, This work - BLS24.]
Data labels on the plot (their assignment to series and thread counts is not recoverable here): 60.0, 26.32, 23.4, 24.00, 23.22, 20.91, 18.67, 15.15, 15.04, 14.63, 10.80, 17.83, 17.28, 11.17, 9.18, 10.26, 7.24.
Observation: For each curve, the faster of the optimal ate and β Weil pairings is shown.
Conclusions and future
New state of the art for pairing computation at the 192-bit level:
- BLS12 has the fastest serial and parallel pairing computation
- Serial optimal ate pairing is more than 3 times faster than the previous state of the art
- Parallel β Weil pairing improves it by another factor of 2.6
Future directions:
- Performance of protocols (also in the symmetric setting)
- Evaluate performance at the 256-bit security level
RELIC cryptographic library: http://code.google.com/p/relic-toolkit/ Thank you for your attention! Any questions?