INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE ET EN AUTOMATIQUE
FPGA Implementation of a Recently Published Signature Scheme Jean-Luc Beuchat — Nicolas Sendrier — Arnaud Tisserand — Gilles Villard
N° 5158 March 2004
ISSN 0249-6399
ISRN INRIA/RR--5158--FR+ENG
THÈME 2
Theme 2: Software engineering and symbolic computation. Projet Arénaire. Research report no. 5158, March 2004, 11 pages
Abstract: An algorithm producing cryptographic digital signatures less than 100 bits long, with a security level matching today's standards, has recently been proposed by Courtois, Finiasz, and Sendrier. This scheme is based on error-correcting codes and consists in generating a large number of instances of a decoding problem until one of them is solved (about 9! = 362880 attempts are needed). A careful software implementation requires more than one minute on a 2 GHz Pentium 4 for signing. We propose a first hardware architecture which signs a document in 0.86 second on an XCV300E-7 FPGA, hence making the algorithm practical.
Key-words: Cryptography, digital signature, code-based cryptosystems, FPGA implementation
This text is also available as a research report of the Laboratoire de l'Informatique du Parallélisme http://www.ens-lyon.fr/LIP.
Unité de recherche INRIA Rhône-Alpes 655, avenue de l’Europe, 38330 Montbonnot-St-Martin (France) Téléphone : +33 4 76 61 52 00 — Télécopie +33 4 76 61 52 52
FPGA Implementation of a Recently Published Signature Scheme
Jean-Luc Beuchat (1), Nicolas Sendrier (2), Arnaud Tisserand (3, 1), Gilles Villard (4, 1)
(1) Laboratoire de l'Informatique du Parallélisme, Ecole Normale Supérieure de Lyon, 46, Allée d'Italie, F69364 Lyon Cedex 07, {first name.last name}@ens-lyon.fr
(2) Projet Codes, INRIA Rocquencourt, BP 105, F78153 Le Chesnay Cedex, [email protected]
(3) INRIA, Institut National de Recherche en Informatique et Automatique
(4) CNRS, Centre National de la Recherche Scientifique
1 Introduction
A cryptographic digital signature is a sequence of digits appended to a message which allows anyone to verify both the content and the origin of the message. A signature scheme is proposed in [2], which allows the production of signatures less than 100 bits long with a security level matching today's standards. For software implementations, the relatively high signature time of the scheme could be a major drawback. In this paper we demonstrate the applicability of the approach with the design of a dedicated hardware architecture. Indeed, a careful software implementation requires more than one minute on a 2 GHz Pentium 4 to produce one signature. We are able to sign in about 0.86 second on an XCV300E-7 FPGA. A disadvantage of the signature scheme may be its large public key, whose size is about one Mbyte. This may not be crucial given the low price of available memories.
The signature scheme of [2] compares favourably with other existing approaches. To match the present security requirements, classical techniques (RSA, DSA, Elliptic Curves) produce digital signatures of length 320 to 1024 bits. Producing shorter signatures can be of great interest for some applications, typically when either the transmission channel or the physical data support has a limited capacity. For instance one might wish to transmit a signature by voice on a phone, print an authenticator on a banknote or a letter, or store many signatures on a small device like a smart card.
The signature algorithm we have implemented, briefly described in Section 2, is based on a 9-error correcting Goppa code of length $2^{16}$. This corresponds to the secure parameters $t = 9$, $n = 2^{16}$ and $m = 16$ in [2]. The algorithm operates on elements of the finite field $F_{2^{16}}$, and manipulates polynomials of one variable $z$ whose coefficients are in $F_{2^{16}}$. For the representation of the elements and the arithmetic operations in $F_{2^{16}}$ we rely on the design of [5], whose key points are recalled in Section 3.
Our main objective is to prove the feasibility and the interest of the signature scheme on FPGA boards. For packaging reasons, we further impose a size constraint corresponding to an XCV300E-7 FPGA. As we shall see in Section 4, this size constraint does not allow a straightforward implementation of the algorithm. In particular the two main steps, the Berlekamp-Massey algorithm and a polynomial divisibility test, cannot be implemented as separate blocks. Our solution is to show that both steps can share the same hardware resources, and can be implemented by designing a specific processor.
2 The Signature Algorithm
The signature algorithm is based on a public key McEliece/Niederreiter cryptosystem. Its underlying code is a 9-error correcting binary Goppa code of length $n = 2^{16}$ and dimension $n - tm$, with $tm = 144$ [2]. The code is defined using the elements of $F_{2^{16}}$, and the public key is a $tm \times n$ parity check matrix. The security of the signature scheme relies on the difficulty of decoding. Indeed, for a document $D$, the signature is obtained from the decryption (an $n$-bit word of weight 9) of a syndrome $s$ (a 144-bit word) associated to $D$. The computational signing task is to find a decodable syndrome to be used. For two hashing functions $h$ and $h'$, one considers $s = h(D)$ and $s^{(i)} = h'(D; i)$, $i \ge 0$. The algorithm essentially increments the index $i$ until $s^{(i)}$ is decodable. We have implemented the most time consuming part of the process:
while $s^{(i)}$ is not decodable do
  1: Computation of a double syndrome $S$ (a 288-bit word) from $s^{(i)} = h'(s; i)$;
  2: Berlekamp-Massey algorithm for a candidate error locator $\sigma(z) \in F_{2^{16}}[z]$;
  3: Divisibility test on $\sigma(z)$ that determines whether $s^{(i)}$ is decodable;
  $i \leftarrow i + 1$;
end while
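The control flow of this loop can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `h_prime` and `try_decode` are hypothetical callables standing for the second hash function and for steps 1 to 3 of the loop; neither name comes from the original design.

```python
from itertools import count
from typing import Callable, Optional, Tuple


def sign_with_counter(
    s: int,
    h_prime: Callable[[int, int], int],
    try_decode: Callable[[int], Optional[object]],
) -> Tuple[int, object]:
    """Increment the counter i until the syndrome s_i = h'(s, i) decodes.

    try_decode is a hypothetical stand-in for steps 1-3 (double syndrome,
    Berlekamp-Massey, divisibility test); it returns None on failure.
    """
    for i in count():
        s_i = h_prime(s, i)
        error = try_decode(s_i)
        if error is not None:
            return i, error


if __name__ == "__main__":
    # Toy stand-ins: only the value 0 is "decodable".
    toy_h = lambda s, i: (s + i) % 16
    toy_decode = lambda x: "e" if x == 0 else None
    print(sign_with_counter(13, toy_h, toy_decode))
```

With the real scheme, a syndrome is decodable with probability about 1/9!, which is why the loop body dominates the signature cost.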
The signature cost is essentially the cost of this while loop. About $9! = 362880$ decoding attempts are required on average (with a very small standard deviation) [2]. We detail the architecture design of the three steps of the loop in Section 4. As commonly done for the syndrome elements and the error location, we manipulate polynomials $S(z) = \sum_{i=0}^{17} S_i z^i$ and $\sigma(z) = \sum_{i=0}^{9} \sigma_i z^i$ over $F_{2^{16}}$. Steps 2 and 3 are briefly detailed below. The polynomial $\sigma$ is computed using the Berlekamp-Massey algorithm for solving the key equation $S(z)\,\sigma(z) = \omega(z) \bmod z^{2t}$. Then, the decodability of $s^{(i)}$ is checked through a divisibility test, which determines whether $\sigma$ has $t = 9$ distinct roots in $F_{2^{16}}$ that locate the errors.

Berlekamp-Massey Algorithm. The Berlekamp-Massey algorithm efficiently solves the key equation and computes $\sigma$ [1, 3]. Searching for a polynomial of degree exactly $t = 9$ and assuming that the discrepancies are non-zero slightly simplifies the implementation. We compute $\sigma = \sigma^{(2t-1)}$ using the following $2t$-step iterative and division-free scheme:
$$\sigma^{(i)}(z) = \delta^{(i-1)} \sigma^{(i-1)}(z) + \Delta^{(i)} z\, \tau^{(i-1)}(z); \qquad (1)$$
$$\tau^{(i)}(z) = \begin{cases} \sigma^{(i-1)}(z) & \text{if } i \text{ is even}; \\ z\, \tau^{(i-1)}(z) & \text{if } i \text{ is odd}; \end{cases} \qquad (2)$$
$$\delta^{(i)} = \begin{cases} \Delta^{(i)} & \text{if } i \text{ is even}; \\ \delta^{(i-1)} & \text{if } i \text{ is odd}; \end{cases} \qquad (3)$$
where $0 \le i < 2t$, $\sigma^{(-1)}(z) = \tau^{(-1)}(z) = 1$, $\Delta^{(0)} = S_0$, and $\delta^{(-1)} = 1$. At each step $i < 2t - 1$, the new discrepancy is computed as
$$\Delta^{(i+1)} = \sum_{j=0}^{\min(t, i+1)} \sigma^{(i)}_j S_{i+1-j}.$$
If $\Delta^{(i+1)}$ is equal to zero (this happens only with small probability), the computation is aborted, and we go for another decoding attempt. To simplify the last step of the signature scheme, we make the error locator polynomial monic. The operation is $\sigma(z) = \sigma^{(2t-1)}(z) / \sigma^{(2t-1)}_t$, and is implemented as one inversion and eight multiplications; this is the sole inversion in the body of the while loop.

Divisibility/Decodability Test. Step 3 of the signature scheme consists in checking if the error locator polynomial $\sigma(z)$ splits completely into linear factors, i.e. if $\sigma$ has $t = 9$ distinct roots in $F_{2^{16}}$. Since Fermat's little theorem states that $z^{2^{16}} - z = \prod_{a \in F_{2^{16}}} (z - a)$, the polynomial $\sigma$ splits over $F_{2^{16}}$ if and only if $z^{2^{16}} \bmod \sigma(z) = z$ [2]. Assume that the four polynomials defined by $\varphi_i(z) = z^{2i} \bmod \sigma(z)$, $5 \le i \le 8$, are precomputed, and let $p$ be a polynomial of degree 8. Then,
$$p(z)^2 \bmod \sigma(z) = \Big(\sum_{i=0}^{8} p_i^2 z^{2i}\Big) \bmod \sigma(z) = \sum_{i=5}^{8} p_i^2 \varphi_i(z) + \sum_{i=0}^{4} p_i^2 z^{2i}. \qquad (4)$$
Consequently, $z^{2^{16}} \bmod \sigma(z)$ can be evaluated in 12 steps by repeated squaring, starting with $\varphi_8(z) = z^{2^4} \bmod \sigma(z)$. For the precomputation of the $\varphi_i$'s, notice that for a polynomial $p(z)$ of degree 8, since $\sigma(z)$ is monic, we have:
$$z\,p(z) \bmod \sigma(z) = p_8\, \sigma(z) + z\,p(z). \qquad (5)$$
Therefore, the four polynomials $\varphi_i(z)$ can be iteratively computed according to (5), starting from $p(z) = z^8$.
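The divisibility test can be sketched in software as follows. This is a minimal sketch under assumptions, not the hardware design: the field multiplication `gf_mul` uses one particular tower representation of $F_{2^{16}}$ chosen here for illustration (the constants of [5] are not given in this report), polynomials are Python lists with the lowest-degree coefficient first, and all function names are hypothetical.

```python
M = 16  # coefficients live in F_{2^16}
T = 9   # degree of the error locator polynomial


def gf_mul(a, b, m=M):
    """Product in F_{2^m} (m a power of two), assumed tower representation."""
    if m == 1:
        return a & b
    h = m // 2
    mask = (1 << h) - 1
    a1, a0, b1, b0 = a >> h, a & mask, b >> h, b & mask
    p00 = gf_mul(a0, b0, h)
    p11 = gf_mul(a1, b1, h)
    d1 = gf_mul(a0 ^ a1, b0 ^ b1, h) ^ p00
    d0 = p00 ^ gf_mul(p11, 1 << (h - 1), h)  # assumed extension constant
    return (d1 << h) | d0


def poly_from_roots(roots):
    """Monic polynomial prod (z + a), used here only to build test inputs."""
    sigma = [1]
    for a in roots:
        shifted = [0] + sigma                         # z * sigma(z)
        scaled = [gf_mul(a, c) for c in sigma] + [0]  # a * sigma(z)
        sigma = [x ^ y for x, y in zip(shifted, scaled)]
    return sigma


def times_z_mod(p, sigma):
    """z*p(z) mod sigma(z) = z*p(z) + p_8*sigma(z) for monic sigma of degree 9."""
    top = p[T - 1]
    shifted = [0] + p[:T - 1]
    return [shifted[j] ^ gf_mul(top, sigma[j]) for j in range(T)]


def precompute_phi(sigma):
    """phi_i(z) = z^(2i) mod sigma(z) for 5 <= i <= 8, starting from z^8."""
    p = [0] * (T - 1) + [1]
    phi = {}
    for i in range(5, 9):
        p = times_z_mod(times_z_mod(p, sigma), sigma)
        phi[i] = p
    return phi


def square_mod(p, phi):
    """p(z)^2 mod sigma(z) from the squared coefficients and the phi_i."""
    sq = [gf_mul(c, c) for c in p]
    out = [0] * T
    for i in range(5):            # terms of degree <= 8 need no reduction
        out[2 * i] = sq[i]
    for i in range(5, 9):         # higher terms use the precomputed phi_i
        out = [o ^ gf_mul(sq[i], c) for o, c in zip(out, phi[i])]
    return out


def splits_completely(sigma):
    """True if z^(2^16) mod sigma(z) equals z, i.e. sigma has 9 distinct roots."""
    phi = precompute_phi(sigma)
    p = phi[8]                    # z^16 = z^(2^4) mod sigma
    for _ in range(12):           # 12 squarings reach z^(2^16)
        p = square_mod(p, phi)
    return p == [0, 1] + [0] * (T - 2)


if __name__ == "__main__":
    print(splits_completely(poly_from_roots([1, 2, 3, 4, 5, 6, 7, 8, 9])))
    print(splits_completely(poly_from_roots([1, 1, 2, 3, 4, 5, 6, 7, 8])))
```

The second polynomial has a repeated root, so it cannot divide the squarefree polynomial $z^{2^{16}} - z$ and the test rejects it.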
3 Composite Arithmetic in the Finite Field $F_{2^{16}}$
The efficiency of the signature algorithm relies heavily on the efficiency of the underlying arithmetic in the field $F_{2^{16}}$. The operation count is dominated by the number of additions and multiplications. We have followed the composite field approach of [5], whose performance is superior, in our case, to other techniques like standard base arithmetic operators. The field $F_{2^m}$, $m = 16$, is built recursively from $F_{2^{m/2}}$ (using a field extension of degree 2). An element $a \in F_{2^m}$ is represented by an $m$-bit array $[a_1, a_0]$ of two elements in $F_{2^{m/2}}$ [4, 5]. For the architecture in Section 4 we shall use multiply-and-add operators. With the above representation, this operation is also defined recursively. For $a$, $b$ and $c$ in $F_{2^m}$, $d = [d_1, d_0] = ab + c$ is given by:
$$d_1 = (a_0 + a_1)(b_0 + b_1) + a_0 b_0 + c_1,$$
$$d_0 = a_0 b_0 + a_1 b_1 \omega + c_0,$$
where $\omega$ is a constant in $F_{2^{m/2}}$. For this work we have stopped the recursion at the level of $F_{2^4}$ or $F_{2^2}$, and implemented 1-input or 2-input operators using look-up tables with four address bits. We have proceeded in a similar way for the inverter; we refer to the algorithm of [5, 2.2.2] and references therein. These choices lead to a multiply-and-add operator using 98 slices and to an inverter using 109 slices (including an internal pipeline stage). Both operators have a critical path of about 10 ns. For a general comparison, note that a 16-bit by 16-bit integer multiplier with a 32-bit result would require about 140 slices.
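The recursive multiply-and-add formula can be exercised with a short Python model. It is a sketch under assumptions: the constant $\omega$ at each level is taken as the element whose only set bit is the top bit of the subfield (a choice known to give a field at every level), whereas the constants actually used in [5] are not stated in this report; the recursion also goes down to $F_2$ instead of stopping at look-up tables. The names `mul_add` and `power` are hypothetical.

```python
def mul_add(a, b, c, m=16):
    """Return a*b + c in F_{2^m}, with d = [d1, d0] computed recursively.

    Elements are m-bit integers [a1, a0] with a1 in the high half.
    The constant omega below is an assumption made for this sketch.
    """
    if m == 1:
        return (a & b) ^ c
    h = m // 2
    mask = (1 << h) - 1
    a1, a0 = a >> h, a & mask
    b1, b0 = b >> h, b & mask
    c1, c0 = c >> h, c & mask
    omega = 1 << (h - 1)
    a0b0 = mul_add(a0, b0, 0, h)
    a1b1 = mul_add(a1, b1, 0, h)
    # d1 = (a0 + a1)(b0 + b1) + a0*b0 + c1
    d1 = mul_add(a0 ^ a1, b0 ^ b1, a0b0 ^ c1, h)
    # d0 = a0*b0 + a1*b1*omega + c0
    d0 = mul_add(a1b1, omega, a0b0 ^ c0, h)
    return (d1 << h) | d0


def power(a, e, m=16):
    """Square-and-multiply exponentiation built on mul_add."""
    result = 1
    while e:
        if e & 1:
            result = mul_add(result, a, 0, m)
        a = mul_add(a, a, 0, m)
        e >>= 1
    return result


if __name__ == "__main__":
    # Every non-zero element of F_{2^16} satisfies a^(2^16 - 1) = 1.
    print(power(0x1234, 2**16 - 1))
```

Folding the addend `c` into the last recursive calls mirrors the way the hardware operator fuses the addition with the multiplication.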
4 Hardware Implementation
We now detail our architecture for the three steps of the signature while loop. We show that parallel structures for the Berlekamp-Massey and for the decodability stages can be based on similar resources. From (1) and (4) we identify the key operation $p(z) = a\,q(z) + r(z)$, where $p$, $q$ and $r$ are three polynomials of degree 9 over $F_{2^{16}}$, and where $a$ is an element of $F_{2^{16}}$. Designing a specific processor for this polynomial operator (which can actually be seen as a vector Axpy) leads to an efficient implementation of the whole while loop that respects our board size constraint.
4.1 Computation of a Double Syndrome
Figure 1 describes the operator computing a double syndrome from the 144-bit hashed document $s$ and the counter $i$. The seed of the 25-stage linear feedback shift register (LFSR) is set to $i$ during an initialization step. Then, at each clock cycle, the circuit performs the XOR of $s_j$ and the $j$th bit of the pseudo-random sequence generated by the LFSR to obtain $s^{(i)}_j$. The $j$th row of the $144 \times 288$ matrix $V$ (stored in 18 Block SelectRAM memories) is then multiplied by $s^{(i)}_j$, which gives the 18 coefficients of a new double syndrome (in the flip-flops):
$$S(z) = \Big(\sum_{j=0}^{143} s^{(i)}_j v_{0,j}\Big) + \Big(\sum_{j=0}^{143} s^{(i)}_j v_{1,j}\Big) z + \ldots + \Big(\sum_{j=0}^{143} s^{(i)}_j v_{17,j}\Big) z^{17} = \sum_{k=0}^{17} S_k z^k,$$
where $S_k \in F_{2^{16}}$. The double syndrome block and its control unit require 204 slices, that is to say 6.7% of the available resources of an XCV300E FPGA.
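A software model of this operator is given below as a minimal sketch. The LFSR feedback taps are an assumption (the report only states that the register has 25 stages), the matrix `V` is passed as a list of rows of 16-bit integers, and the function names are hypothetical. Since each $s^{(i)}_j$ is a single bit and addition in $F_{2^{16}}$ is a bitwise XOR, the row-by-bit product reduces to a conditional accumulation.

```python
def lfsr_bits(seed, n_bits, width=25, taps=(25, 3)):
    """Bits of a Fibonacci LFSR seeded with the counter value.

    The tap positions are an assumption made for this sketch.
    A zero seed would produce an all-zero sequence.
    """
    state = seed & ((1 << width) - 1)
    out = []
    for _ in range(n_bits):
        out.append(state & 1)
        feedback = 0
        for t in taps:
            feedback ^= (state >> (width - t)) & 1
        state = (state >> 1) | (feedback << (width - 1))
    return out


def double_syndrome(s_bits, prn_bits, V):
    """XOR-accumulate the rows V[j] selected by s_j XOR prn_j.

    Each row holds the coefficients of F_{2^16} as 16-bit integers
    (18 per row in the design described above).
    """
    S = [0] * len(V[0])
    for s_j, r_j, row in zip(s_bits, prn_bits, V):
        if s_j ^ r_j:
            S = [acc ^ v for acc, v in zip(S, row)]
    return S


if __name__ == "__main__":
    # Toy 3 x 2 matrix instead of the real 144 rows of 18 coefficients.
    V = [[0x0001, 0x0010], [0x0100, 0x1000], [0x00FF, 0xFF00]]
    print(double_syndrome([1, 0, 1], [0, 1, 1], V))
```

One row is consumed per clock cycle in the hardware, which is where the 144 computation steps per double syndrome come from.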
[Figure 1: the counter $i$ seeds a 25-stage LFSR whose output is XORed with $s_j$, the $j$th bit of the hashed message; 18 RAMB4_S16_S16 Block SelectRAM memories hold the 144 lines and 288 columns of the matrix $V$; the outputs are the coefficients $S_0$ to $S_{2t-1}$.]
Fig. 1. Computation of the double syndrome S.
4.2 Specific Processor for Error Locating and Decodability Testing
Let us first look at the resources required by the Berlekamp-Massey algorithm with input $S$ and output $\sigma$. The computation (1) of $\sigma^{(i)}$ may be rewritten as follows:
$$p(z) = \Delta^{(i)} \tau^{(i-1)}(z) = \sum_{j=0}^{t} \Delta^{(i)} \tau^{(i-1)}_j z^j, \qquad (6)$$
$$\sigma^{(i)}(z) = \delta^{(i-1)} \sigma^{(i-1)}(z) + z\,p(z) = \sum_{j=0}^{t} \big(\delta^{(i-1)} \sigma^{(i-1)}_j + p_{j-1}\big) z^j, \qquad (7)$$
where $p \in F_{2^{16}}[z]$ and $p_{-1} = 0$. Thus $\sigma^{(i)}$ can be evaluated in two clock cycles by means of $t + 1$ multiply-and-add units. The computation of the new discrepancy $\Delta^{(i+1)}$ requires up to $t + 1$ products $\sigma^{(i)}_j S_{i+1-j}$ and a multi-operand addition. Finally, $\sigma$ is obtained by multiplying the coefficients of the polynomial $\sigma^{(2t-1)}$ by the inverse of $\sigma^{(2t-1)}_t$. This is summarized in Table 1, which provides an estimate of the corresponding circuit area on an XCV300E FPGA.
Table 1. Hardware resources required for the Berlekamp-Massey algorithm.
Components                                                              Area (% of available slices)
Ten multiply-and-add operators                                          980 slices (32.0%)
An inverter to compute $1/\sigma^{(2t-1)}_t$                            109 slices (3.5%)
A multi-operand adder to compute $\Delta^{(i+1)}$                       32 slices (1.0%)
Four 160-bit registers to store $\sigma^{(i)}(z)$, $\tau^{(i)}(z)$, up to ten coefficients of the double syndrome, and the polynomial $p(z)$    320 slices (10.4%)
Two 16-bit registers for $\delta^{(i)}$ and $\Delta^{(i)}$              16 slices (0.5%)
Total:                                                                  1457 slices (47.4%)
A brief study of the divisibility test indicates that its hardware implementation involves $t$ multiply-and-add units and a few registers (see Table 2 for details). Indeed, from (5), the four polynomials $\varphi_i$, $5 \le i \le 8$, are obtained using $t$ multiply-and-add operations in $F_{2^{16}}$. The computation (4) is then realized in five steps:
$p(z) \leftarrow \sum_{i=0}^{4} p_i^2 z^{2i}$, $q_0 \leftarrow p_5^2$, $q_1 \leftarrow p_6^2$, $q_2 \leftarrow p_7^2$, and $q_3 \leftarrow p_8^2$;
for $j = 0$ to $3$ do
  $p(z) \leftarrow q_j \varphi_{j+5}(z) + p(z)$;
end for
Table 2. Hardware resources required for the divisibility test.
Components                                                              Area (% of available slices)
Nine multiply-and-add operators                                         882 slices (28.7%)
Five 144-bit registers to store intermediate results and the polynomials $\varphi_5(z)$, $\varphi_6(z)$, $\varphi_7(z)$, and $\varphi_8(z)$    360 slices (11.7%)
Four 16-bit registers for $p_5^2$, $p_6^2$, $p_7^2$, and $p_8^2$        32 slices (1.0%)
Total:                                                                  1274 slices (41.4%)
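The totals of Tables 1 and 2 can be cross-checked with a few lines of Python. This is a bookkeeping sketch only: the figure of 3072 slices for an XCV300E is an assumption made here (it is consistent with the percentages quoted in the tables), and the dictionary keys are informal labels.

```python
TOTAL_SLICES = 3072  # assumed slice count of an XCV300E

berlekamp_massey = {
    "ten multiply-and-add operators": 980,
    "inverter": 109,
    "multi-operand adder": 32,
    "four 160-bit registers": 320,
    "two 16-bit registers": 16,
}
divisibility_test = {
    "nine multiply-and-add operators": 882,
    "five 144-bit registers": 360,
    "four 16-bit registers": 32,
}
DOUBLE_SYNDROME = 204  # slices, control unit included


def percent(slices):
    """Share of the assumed device capacity, in percent."""
    return 100.0 * slices / TOTAL_SLICES


bm_total = sum(berlekamp_massey.values())
div_total = sum(divisibility_test.values())
separate_blocks = DOUBLE_SYNDROME + bm_total + div_total

if __name__ == "__main__":
    print(bm_total, round(percent(bm_total), 1))
    print(div_total, round(percent(div_total), 1))
    print(separate_blocks, TOTAL_SLICES - separate_blocks)
```

Under this assumption, three separate blocks would leave only a little over a hundred slices for the control units of the Berlekamp-Massey and divisibility blocks, which motivates sharing the arithmetic operators.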
According to our area estimates (Tables 1 and 2), an XCV300E FPGA does not contain enough slices for implementing a double syndrome block, a Berlekamp-Massey block, a divisibility test block, and their respective control units. However, the above analysis shows that the Berlekamp-Massey algorithm and the divisibility test can share the same hardware resources. Our solution is the design of a specific processor for programming these algorithms (Figure 2). It consists of ten multiply-and-add operators, a multi-operand adder, an inverter, six 160-bit registers, and four 16-bit registers (Table 3). We have seen the arithmetic units in Section 3, and we only describe the main features of the registers:
- If inputs A and B of the multiply-and-add block are both equal to $p(z)$, then the SQR unit provides a routing mechanism for storing the polynomial $\sum_{i=0}^{4} p_i^2 z^{2i}$ of degree 8 in one of the 160-bit registers.
- The APPEND unit updates the double syndrome sequence involved in the computation of the discrepancy: the register content is multiplied by $z$, the next syndrome coefficient is added, and the result is reduced modulo a power of $z$.
- To reduce the size of the multiplexer selecting the A input of the multiply-and-add block, $R_2$, $R_3$, $R_4$, and $R_5$ form a FIFO. The feedback mechanism makes it possible to copy a register of the FIFO or to shift it (multiplication by $z$). For instance this allows $\tau^{(i)}(z)$ to be updated according to (2).
A more detailed example is given in Appendix A.
[Figure 2: datapath with registers $R_0$ to $R_9$, ten multiply-and-add units computing A[j]B[j] + C[j] for j = 0 to 9, the SQR and APPEND units, a multi-operand adder, an inverter (INV), and control bits $c_0$ to $c_{27}$; wide paths are 160 bits, narrow paths 16 bits.]
Fig. 2. Application-specific processor for error locating and decodability testing.
If we apply loop unrolling, no conditional instructions are needed in the Berlekamp-Massey algorithm and the divisibility test. Furthermore, each instruction simply consists of the 28 control bits described in Figure 2. This allows the design of a small control unit which increments the program counter, checks if $\Delta^{(i+1)} = 0$, and returns $z^{2^{16}} \bmod \sigma(z)$. The Berlekamp-Massey algorithm and the divisibility test respectively require 74 and 72 instructions (see Appendix A for more details). Our specific processor, including the program ROM, requires 2209 slices (72% of the slices available in an XCV300E FPGA) and makes it possible to implement the signature scheme on our target FPGA.
Table 3. Register assignment for the Berlekamp-Massey algorithm and the divisibility test.
Register          Berlekamp-Massey                                  Divisibility test
$R_0$             Intermediate results                              Intermediate results
$R_1$             Double syndrome sequence                          Copy of $R_0$ (for squaring)
$R_2$ to $R_5$    $\sigma^{(i)}(z)$ and $\tau^{(i)}(z)$             $\sigma(z)$, then $\varphi_5(z)$, $\varphi_6(z)$, $\varphi_7(z)$, and $\varphi_8(z)$
$R_6$ to $R_9$    $\delta^{(i)}$ and $\Delta^{(i)}$                 $p_5^2$, $p_6^2$, $p_7^2$, and $p_8^2$
5 Results and Analysis
A VHDL description of the signature scheme has been synthesized by XST 5.2.03i and placed-and-routed for an XCV300E-7 FPGA. This first prototype uses 2593 slices (88% of the available resources) and 18 Block SelectRAM memories, and runs at $f = 62$ MHz. The computation of the first double syndrome involves 145 clock cycles (an initialization step and 144 computation steps). The Berlekamp-Massey algorithm and the divisibility test require 148 additional clock cycles (146 instructions and 2 clock cycles devoted to control tasks), during which a new double syndrome is evaluated in parallel. Hence the signature time is about (on average, with a very small standard deviation):
$$T_{\text{signature}} = \frac{1}{f}\,(145 + 148 \times 9!) \approx 0.86 \text{ second},$$
whereas a software implementation requires more than one minute to achieve the same task on a 2 GHz Pentium 4 processor.
Though our circuit outperforms software implementations, future studies should improve its efficiency. In particular, the arithmetic operators of the specific processor are not fully exploited. Among the 146 instructions executed, there are 17 multi-operand additions (11.6% of the total number of instructions), a single inversion (0.7%), and 122 multiply-and-add operations (83%). Furthermore, the tenth multiply-and-add unit is only involved in the very last steps of the Berlekamp-Massey algorithm. Further theoretical and architectural studies should therefore make it possible to shorten the signature time.
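The signature time and the instruction mix quoted above follow from simple arithmetic, reproduced here as a small Python check. The variable names are ours; the numerical inputs are those given in the text.

```python
from math import factorial

F_HZ = 62e6                 # clock frequency of the prototype
T_ERRORS = 9                # error-correcting capability t

attempts = factorial(T_ERRORS)                    # average number of decoding attempts
bm_instructions = 1 + (2 * T_ERRORS - 1) * 4 + 2  # Berlekamp-Massey program
bm_cycles = bm_instructions + 3                   # plus three cycles to make sigma monic
div_instructions = 72                             # divisibility test program
control_cycles = 2

cycles_per_attempt = bm_cycles + div_instructions + control_cycles
first_syndrome_cycles = 145

total_cycles = first_syndrome_cycles + cycles_per_attempt * attempts
signature_time = total_cycles / F_HZ

# Instruction mix among the 146 instructions of one attempt.
mix = {name: 100.0 * n / 146 for name, n in
       {"multi-operand additions": 17, "inversion": 1, "multiply-and-add": 122}.items()}

if __name__ == "__main__":
    print(attempts, cycles_per_attempt, total_cycles)
    print(f"{signature_time:.3f} s")
    print({k: round(v, 1) for k, v in mix.items()})
```

The 145 cycles of the first double syndrome are negligible next to the roughly 53.7 million cycles spent in the loop, so the signature time is essentially 148 cycles times the number of attempts.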
References
1. E.R. Berlekamp. Algebraic Coding Theory. Second Edition, Aegean Park Press, Laguna Hills, California, 1984.
2. N. Courtois, M. Finiasz, and N. Sendrier. How to achieve a McEliece-based digital signature scheme. In C. Boyd, editor, Advances in Cryptology, ASIACRYPT 2001, number 2248 in Lecture Notes in Computer Science, pages 157-174. Springer, 2001.
3. J.L. Massey. Shift-register synthesis and BCH decoding. IEEE Trans. Inform. Theory, IT-15:122-127, 1969.
4. C. Paar. A new architecture for a parallel finite field multiplier with low complexity based on composite fields. IEEE Transactions on Computers, 45(7):856-861, July 1996.
5. C. Paar and M. Rosner. Comparison of arithmetic architectures for Reed-Solomon decoders in reconfigurable hardware. In 5th IEEE Symposium on FPGA-based Custom Computing Machines (FCCM'97), Napa Valley, CA, April 1997.
A Implementation of the Berlekamp-Massey Algorithm
This Appendix describes the instructions which implement the Berlekamp-Massey algorithm on our specific processor (Table 4), and provides a coding example:
- Instruction $I_0$ is responsible for the initialization step of the Berlekamp-Massey algorithm.
- We define two instructions $I_1$ and $I_2$ to compute $\sigma^{(i)}(z)$, $\tau^{(i)}(z)$, and $\delta^{(i)}$ when $i$ is even. $I_1$ implements Equations (2) and (6), and shifts the FIFO so that the A input of the multiply-and-add block is $\sigma^{(i-1)}(z)$. Then, $I_2$ evaluates in parallel the $i$th step error locator polynomial and $\delta^{(i)}$ (Equations (3) and (6)), shifts the FIFO, and updates the double syndrome sequence.
- Instructions $I_3$ and $I_4$ perform the same task when $i$ is odd.
- Finally, instructions $I_5$ and $I_6$ compute the new discrepancy according to the algorithm described in Section 4.2, and shift the FIFO so that $R_5$ and $R_4$ respectively store $\sigma^{(i)}(z)$ and $\tau^{(i)}(z)$. This state of the FIFO makes it possible to start the computation of the $(i+1)$th step error locator polynomial.
Table 5 illustrates the programming of the Berlekamp-Massey algorithm. Instruction $I_0$ properly initializes the registers. Then, four instructions implement the $i$th step of the algorithm. Note that it is useless to compute a new discrepancy when $i = 2t - 1$ (see Section 2). Consequently, the algorithm requires $1 + (2t - 1) \cdot 4 + 2 = 71$ instructions. Three clock cycles are then required to make $\sigma(z)$ monic (an inversion and a multiplication), which gives the total of 74 clock cycles mentioned in Section 4.2.
Table 4. Operations involved in the Berlekamp-Massey Algorithm. Columns: Mathematical description, Instruction, Comment. [Table body: for each instruction $I_0$ to $I_6$, the register transfers between $R_0$ and $R_7$ and the resulting register contents; the individual entries are not legible in this copy.]
Table 5. First steps of the Berlekamp-Massey algorithm. [Table body: contents of the registers $R_0$ to $R_7$ after each of the first instructions; the individual entries are not legible in this copy.]
Unité de recherche INRIA Rhône-Alpes 655, avenue de l’Europe - 38330 Montbonnot-St-Martin (France) Unité de recherche INRIA Lorraine : LORIA, Technopôle de Nancy-Brabois - Campus scientifique 615, rue du Jardin Botanique - BP 101 - 54602 Villers-lès-Nancy Cedex (France) Unité de recherche INRIA Rennes : IRISA, Campus universitaire de Beaulieu - 35042 Rennes Cedex (France) Unité de recherche INRIA Rocquencourt : Domaine de Voluceau - Rocquencourt - BP 105 - 78153 Le Chesnay Cedex (France) Unité de recherche INRIA Sophia Antipolis : 2004, route des Lucioles - BP 93 - 06902 Sophia Antipolis Cedex (France)
Éditeur INRIA - Domaine de Voluceau - Rocquencourt, BP 105 - 78153 Le Chesnay Cedex (France) http://www.inria.fr