NTRU Modular Lattice Signature Scheme on CUDA GPUs

Wei Dai, John Schanck, Berk Sunar, William Whyte and Zhenfei Zhang

Worcester Polytechnic Institute, Worcester, MA, USA
{wdai, sunar}@wpi.edu

Security Innovation, Wilmington, MA, USA
{jschanck, wwhyte, zzhang}@securityinnovation.com

Outline: Motivation, NTRU-MLS, Why GPUs?, Scheme Details, Implementation, Results

Motivation

▷ Lattice-based signature schemes are post-quantum secure.
▷ Rejection sampling slows down the signing procedure.
▷ Previous NTRU-MLS parameters are less secure against new attacks.
▷ Revised parameters offer higher security and smaller keys and signatures, but require more aggressive rejection sampling, which makes signing slow.
▷ This slowdown is mitigated by parallel computing on CUDA-enabled GPUs.

Notations

▷ Ring: $R = \mathbb{Z}[x] / (x^N - 1)$
▷ Polynomial/vector: $f = \sum_{i=0}^{N-1} a_i x^i = \langle a_0, a_1, \ldots, a_{N-1} \rangle$
▷ Norm: $\|f\| = \max_{0 \le i < N} |a_i|$


Verify

Verify a signature $s$ on document $\mu$ with respect to the public key $h$:

1. Compute $t = s * h \pmod{q}$.
2. $(s_p, t_p) = \mathrm{Hash}(h, \mu)$.
3. If $(s, t) \not\equiv (s_p, t_p) \pmod{p}$, invalid.
4. If $\|s\| > \frac{q}{2} - B_s$ or $\|t\| > \frac{q}{2} - B_t$, invalid.
5. Valid.
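The verification steps can be sketched on the CPU in plain C. This is only an illustration under stated assumptions, not the authors' implementation: the names `center_mod`, `ring_mul`, `max_norm` and `verify` are hypothetical, coefficients are assumed to be kept as centered representatives, and the hash of step 2 is assumed to be computed by the caller and passed in as `sp` and `tp`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Reduce x to its centered representative modulo m (assumed convention). */
static int32_t center_mod(int64_t x, int32_t m) {
    int64_t r = x % m;
    if (r < 0) r += m;                 /* now r is in [0, m) */
    if (r >= (m + 1) / 2) r -= m;      /* shift the upper half down */
    return (int32_t)r;
}

/* c = a * b in Z[x]/(x^n - 1), coefficients reduced modulo m. c must not alias a or b. */
static void ring_mul(const int32_t *a, const int32_t *b, int32_t *c, int n, int32_t m) {
    for (int k = 0; k < n; k++) {
        int64_t acc = 0;
        for (int i = 0; i < n; i++)
            acc += (int64_t)a[i] * b[(k - i + n) % n];
        c[k] = center_mod(acc, m);
    }
}

/* Max norm: largest absolute coefficient. */
static int32_t max_norm(const int32_t *a, int n) {
    int32_t best = 0;
    for (int i = 0; i < n; i++) {
        int32_t v = abs(a[i]);
        if (v > best) best = v;
    }
    return best;
}

/* Returns 1 if the signature s is valid, 0 otherwise.
   sp, tp: hash output of step 2, supplied by the caller (assumption). */
static int verify(const int32_t *s, const int32_t *h,
                  const int32_t *sp, const int32_t *tp,
                  int n, int32_t p, int32_t q, int32_t bs, int32_t bt) {
    int32_t *t = malloc((size_t)n * sizeof *t);
    assert(t != NULL);
    ring_mul(s, h, t, n, q);                          /* step 1: t = s * h mod q */
    int ok = 1;
    for (int i = 0; i < n; i++)                       /* step 3: congruence mod p */
        if (center_mod((int64_t)s[i] - sp[i], p) != 0 ||
            center_mod((int64_t)t[i] - tp[i], p) != 0)
            ok = 0;
    if (max_norm(s, n) > q / 2 - bs || max_norm(t, n) > q / 2 - bt)
        ok = 0;                                       /* step 4: norm bounds */
    free(t);
    return ok;
}
```

The quadratic `ring_mul` is the dominant cost here, which is the operation the GPU implementation parallelizes.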

Sign

Assume $h \in R(q/2)$ and $g^{-1} \in R(p/2)$ are ready. Sign $\mu \in \{0, 1\}^*$ with $(f, g)$:

1. $(s_p, t_p) = \mathrm{Hash}(h, \mu)$ (hash function)
2. $r \leftarrow R\left(\left\lfloor \frac{q}{2p} + \frac{1}{2} \right\rfloor\right)$ (uniform RNG)
3. $s_0 = s_p + p r$ (uniform over $R(q/2)$)
4. $t_0 = s_0 * h \pmod{q}$ (mult. in $R(q/2)$)
5. $a = (t_p - t_0) * g^{-1} \pmod{p}$ (mult. in $R(p/2)$)
6. $(s, t) = (s_0, t_0) + a * (f, g)$ (mult. in $R$)
7. If $\|s\| > \frac{q}{2} - B_s$ or $\|t\| > \frac{q}{2} - B_t$, goto Step 2.
8. Output $s$ as a signature of $\mu$.
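A single signing attempt (steps 3 to 7) can be sketched in plain C as follows. This is a CPU illustration under assumptions, not the authors' code: `sign_attempt` and its helpers are hypothetical names, the hash output `sp`, `tp` and the random polynomial `r` (steps 1 and 2) are assumed to be supplied by the caller, and coefficients are assumed to be stored as centered representatives.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Reduce x to its centered representative modulo m (assumed convention). */
static int32_t center_mod(int64_t x, int32_t m) {
    int64_t r = x % m;
    if (r < 0) r += m;
    if (r >= (m + 1) / 2) r -= m;
    return (int32_t)r;
}

/* c = a * b in Z[x]/(x^n - 1); reduced modulo m when m > 0, unreduced when m == 0. */
static void ring_mul(const int32_t *a, const int32_t *b, int32_t *c, int n, int32_t m) {
    for (int k = 0; k < n; k++) {
        int64_t acc = 0;
        for (int i = 0; i < n; i++)
            acc += (int64_t)a[i] * b[(k - i + n) % n];
        c[k] = (m > 0) ? center_mod(acc, m) : (int32_t)acc;
    }
}

static int32_t max_norm(const int32_t *a, int n) {
    int32_t best = 0;
    for (int i = 0; i < n; i++) {
        int32_t v = abs(a[i]);
        if (v > best) best = v;
    }
    return best;
}

/* One attempt for a given r. Fills s and t; returns 1 if accepted, 0 if the
   caller should draw a fresh r and retry (the "goto Step 2" branch). */
static int sign_attempt(const int32_t *sp, const int32_t *tp, const int32_t *r,
                        const int32_t *h, const int32_t *ginv,
                        const int32_t *f, const int32_t *g,
                        int32_t *s, int32_t *t,
                        int n, int32_t p, int32_t q, int32_t bs, int32_t bt) {
    int32_t *buf = malloc(4 * (size_t)n * sizeof *buf);
    assert(buf != NULL);
    int32_t *d = buf, *a = buf + n, *af = buf + 2 * n, *ag = buf + 3 * n;

    for (int i = 0; i < n; i++) s[i] = sp[i] + p * r[i];   /* step 3: s0 */
    ring_mul(s, h, t, n, q);                               /* step 4: t0 */
    for (int i = 0; i < n; i++) d[i] = tp[i] - t[i];
    ring_mul(d, ginv, a, n, p);                            /* step 5: a  */
    ring_mul(a, f, af, n, 0);                              /* step 6: a * f */
    ring_mul(a, g, ag, n, 0);                              /*         a * g */
    for (int i = 0; i < n; i++) { s[i] += af[i]; t[i] += ag[i]; }
    free(buf);

    /* step 7: accept only if both norms are within bounds */
    return max_norm(s, n) <= q / 2 - bs && max_norm(t, n) <= q / 2 - bt;
}
```

Each attempt is independent of the others once `sp`, `tp` and the keys are fixed, so many attempts with different `r` can run side by side.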

Product-form Keys

▷ Introduced to NTRUEncrypt by Hoffstein and Silverman in 2003.
▷ Extra parameters and new keygen:
    ▷ $d_1, d_2, d_3$: three small integers, e.g. 6 to 13
    ▷ $f = p(F_1 * F_2 + F_3 + 1)$
    ▷ $g = G_1 * G_2 + G_3 + 1$
    ▷ $F_i$ and $G_i$ have exactly $d_i$ coefficients equal to $+1$ and $d_i$ coefficients equal to $-1$.
▷ Only store indices of non-zero coefficients:
    ▷ $f$ and $g$ are stored as $(F_1, F_2, F_3)$ and $(G_1, G_2, G_3)$.
    ▷ Each $F_i$ or $G_i$ is stored as an array of $2d_i$ indices: the first $d_i$ are the indices of $+1$, the rest are those of $-1$.
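The index-array storage makes multiplication by a product-form key cheap, because each sparse factor only needs additions and subtractions. The following C sketch illustrates the idea under assumptions: `sparse_mul` and `product_form_mul` are hypothetical names, indices are assumed to be smaller than `n`, and the final scaling by $p$ (for $f$) and any modular reduction are left to the caller.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* c = a * F in Z[x]/(x^n - 1), where F is ternary and stored as 2*d indices:
   idx[0..d-1] are positions of +1, idx[d..2d-1] are positions of -1.
   c must not alias a. */
static void sparse_mul(const int32_t *a, const uint16_t *idx, int d,
                       int32_t *c, int n) {
    memset(c, 0, (size_t)n * sizeof *c);
    for (int j = 0; j < 2 * d; j++) {
        int sign = (j < d) ? 1 : -1;
        for (int i = 0; i < n; i++) {
            int k = i + idx[j];
            if (k >= n) k -= n;        /* wrap around, since x^n = 1 */
            c[k] += sign * a[i];
        }
    }
}

/* out = a * (F1 * F2 + F3 + 1), using only sparse multiplications. */
static void product_form_mul(const int32_t *a,
                             const uint16_t *f1, int d1,
                             const uint16_t *f2, int d2,
                             const uint16_t *f3, int d3,
                             int32_t *out, int n) {
    int32_t *tmp = malloc(2 * (size_t)n * sizeof *tmp);
    assert(tmp != NULL);
    int32_t *u = tmp, *v = tmp + n;
    sparse_mul(a, f1, d1, u, n);       /* u   = a * F1      */
    sparse_mul(u, f2, d2, out, n);     /* out = a * F1 * F2 */
    sparse_mul(a, f3, d3, v, n);       /* v   = a * F3      */
    for (int i = 0; i < n; i++)
        out[i] += v[i] + a[i];         /* add a * F3 and a * 1 */
    free(tmp);
}
```

With this layout a product-form multiplication costs about $2(d_1 + d_2 + d_3)N$ additions instead of the $N^2$ multiplications of a dense convolution.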

CUDA-enabled GPUs

Memory    Cached  Access  Scope               Lifetime
Register  N/A     R/W     one thread          Thread
Constant  Yes     R       threads + host      Application
Texture   Yes     R       threads + host      Application
Shared    N/A     R/W     threads in a block  Block
Global    No      R/W     threads + host      Application

CPU-GPU Workflow

▷ 82-bit security: 1.11% accepted ≈ 90 attempts
▷ Host:
    ▷ Hash
    ▷ Allocation
    ▷ Data to device
▷ Device (each block):
    ▷ RNG, Salsa20
    ▷ Polynomial mult.
    ▷ Check validity
    ▷ (Write back)
▷ Host:
    ▷ Repeat, or retrieve one signature.
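The "≈ 90 attempts" figure follows from the 1.11% acceptance rate if attempts are modeled as independent trials, which is an assumption of this sketch rather than a statement from the slides. The hypothetical helpers below compute the mean number of attempts and the chance that a launch of several parallel trials yields at least one signature.

```c
#include <assert.h>

/* Mean number of attempts until the first acceptance (geometric distribution). */
double expected_attempts(double accept_rate) {
    assert(accept_rate > 0.0 && accept_rate <= 1.0);
    return 1.0 / accept_rate;
}

/* Probability that at least one of `trials` independent attempts is accepted. */
double launch_success_prob(double accept_rate, int trials) {
    assert(accept_rate > 0.0 && accept_rate <= 1.0 && trials >= 0);
    double all_fail = 1.0;
    for (int i = 0; i < trials; i++)
        all_fail *= 1.0 - accept_rate;     /* every trial so far rejected */
    return 1.0 - all_fail;
}
```

Under this model, `expected_attempts(0.0111)` is about 90, yet a launch of exactly 90 trials still fails a bit over a third of the time, which is consistent with the host-side "Repeat" step.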

CPU-GPU Workflow

▷ Host → Device: $s_p, t_p, h, g^{-1}, F_1, F_2, F_3, G_1, G_2, G_3$. Another 48 bytes for Salsa20.
▷ int Pos[], one entry per trial in the launch: 0 ··· 0 1 0 1 0

Block No.1 and No.3 have valid signatures. Retrieve only the No.1 signature in Sig.

Input                  Type      Bytes (each)
$s_p, t_p, g^{-1}$     int8_t    $N$
$F_i, G_i$             uint16_t  $4d_i$
$h$                    int32_t   $4N$
Salsa20                uint32_t  48
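The host-side bookkeeping for this slide can be sketched in C. Both helpers are hypothetical: `first_valid_trial` assumes `Pos[i]` is non-zero exactly when block `i` produced a valid signature, and `host_to_device_bytes` simply adds up the table, assuming each listed input is transferred once per launch.

```c
#include <assert.h>
#include <stddef.h>

/* Index of the first trial whose flag is set, or -1 if no trial succeeded. */
int first_valid_trial(const int *pos, int trials) {
    assert(pos != NULL && trials >= 0);
    for (int i = 0; i < trials; i++)
        if (pos[i] != 0)
            return i;
    return -1;
}

/* Total input bytes sent to the device, following the table:
   sp, tp, g^-1 : 3 arrays of n bytes (int8_t)
   h            : 4n bytes (int32_t)
   F_i, G_i     : 4*d_i bytes each (2*d_i indices of uint16_t), two per i
   Salsa20      : 48 bytes */
size_t host_to_device_bytes(size_t n, size_t d1, size_t d2, size_t d3) {
    return 3 * n + 4 * n + 2 * 4 * (d1 + d2 + d3) + 48;
}
```

In a CUDA host program the returned index would select which slice of `Sig` to copy back, for example with a single `cudaMemcpy` of one signature instead of the whole array.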

Polynomial Multiplication

Convolution: $N^2$ integer multiplications with $N$ threads.

Compute: C = A * B
Input: int_t A[N], B[N]
Output: int_t C[tid]

t = 0; for (i=0; i
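The loop on the slide is cut off in this copy, so the following is a CPU sketch of the same idea rather than the authors' kernel. It assumes each thread `tid` computes one output coefficient of the cyclic convolution. The names `conv_coeff` and `conv` are hypothetical, and the sequential loop in `conv` stands in for launching $N$ threads.

```c
#include <assert.h>
#include <stdint.h>

/* Work of one thread: coefficient tid of C = A * B in Z[x]/(x^n - 1). */
int32_t conv_coeff(const int32_t *A, const int32_t *B, int n, int tid) {
    int32_t t = 0;
    for (int i = 0; i < n; i++) {
        int j = tid - i;
        if (j < 0) j += n;             /* wrap around, since x^n = 1 */
        t += A[i] * B[j];
    }
    return t;
}

/* Sequential stand-in for a launch of n threads, one per output coefficient. */
void conv(const int32_t *A, const int32_t *B, int32_t *C, int n) {
    assert(n > 0);
    for (int tid = 0; tid < n; tid++)
        C[tid] = conv_coeff(A, B, n, tid);
}
```

Each call to `conv_coeff` performs $N$ multiplications, so the $N$ threads together account for the $N^2$ multiplications stated above.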
