Hardware Efficient Implementation of Probabilistic ... - Lab-STICC

Hardware Efficient Implementation of Probabilistic Gradient Descent Bit-Flipping 

 Fakhreddine Ghaffari, Khoa Le, David Declercq

                  

ETIS, ENSEA/UCP/CNRS France

B. Vasic, Department of Elec.and Comp. Eng. University of Arizona Tucson, USA

June 8, 2016

                  

Context/Objectives

Statistical Analysis

IVRG

LFSR

IVRG

MI

Performance

Conclusion

Outline 1

Context & Objectives

2

Statistical Analysis of PGDBF

3

PGDBF Implementation Architecture

4

Linear Feedback Shift Register (LFSR) Initialization

5

Intrinsic-Valued Random Generator (IVRG) Initialization

6

Maximum Identifier Implementation

7

Performance and Hardware cost

8

Conclusions

GDR-ISIS Energy-efficiency in Error-Correction Coding |

F. Ghaffari, K. Le, D. Declercq, B. Vasic

|

June 8, 2016

2 / 44

Context/Objectives


IVRG

LFSR

IVRG

MI

Performance

Conclusion

Outline 1


2


3


4


5


6


7


8

Conclusions



|

June 8, 2016

3 / 44

Context/Objectives


IVRG

LFSR

IVRG

MI

Performance

Conclusion

Context & Objectives Low Density Parity Check (LDPC) Decoders Applications : Wired, Wireless communication, Storage...

Message passing LDPC decoder Coding

Noisy Channel

2 types of LDPC decoding algorithms : Soft information algorithms : Sum-Product, Min-Sum..., powerful error correction capacity but high complexity.Passed messages Hard-decision algorithms : Gallager Bit Flipping (BF), Gradient Descent Bit Flipping (GDBF), Probabilistic GDBF..., low complexity, usually weak in error correction. We work on hard-bit decision algorithm



|

June 8, 2016

4 / 44

Context/Objectives


IVRG

LFSR

IVRG

MI

Performance

Conclusion

Context & Objectives Concept of Bit Flipping (BF) Algorithm : Iteratively passing binary information between 2 groups of processing units : 1 2

Check Nodes Units (CNU) : compute parity check equations (XOR operations), Variable Nodes Units (VNU) : the VN value is flipped if the number of violated CN neighbors is too large. (k +1)

BF : xn

(k )

xn

(k )

= flip xn

if

(k )

En

=

P

(k ) s m∈N (\) m

≥ τ,

τ : predefined threshold.

: current value of VN n at iteration k

(k )

sm : current value of CN m at iteration k

Hardware architecture and performance comparison IRISC−dv4−R050−L54−N1296

0

10

−1

y2

y1

10

yN

−2

VNU1

VNU2

VNUN

Interconnection

Frame Error Rate (FER)

10

….

−3

10

−4

10

−5

10

−6

10

Uncoded Quantized Layered−MS−10Iter BF−100Iter

−7

10

CNU1

CNU2

….

CNUM −8

10


0

0.01

0.02

0.03 0.04 0.05 BSC crossover probability


|

0.06

0.07

June 8, 2016

5 / 44

Context/Objectives


IVRG

LFSR

IVRG

MI

Performance

Conclusion

Context & Objectives Gradient Descent Bit Flipping (GDBF) algorithms : BF with following modifications (k )

Energy function : En (k ) Emax (k +1)

xn

(k )

= Max En

(k )

= flip xn

(k )

= xn

xor

yn +

P

(k ) s , m∈N (\) m

with yn the noisy version of VN n.

, (k )

if

En

(k )

= Emax

….

EN

E2

E1

Hardware architecture and performance comparison

Maximum-finder Emax

y2

y1

IRISC−dv4−R050−L54−N1296

0

10

−1

yN

10

−2

VNU1

VNU2

E1

VNUN EN

E2

Interconnection Block


10

….

−3

10

−4

10

−5

10

−6

10

Uncoded Quantized Layered−MS−10Iter BF−100Iter GDBF−100Iter

−7

CNU1

CNU2

….

CNUM

10

−8

10


0

0.01

0.02



|

0.06

0.07

June 8, 2016

6 / 44

Context/Objectives


IVRG

LFSR

IVRG

MI

Performance

Conclusion

Context & Objectives Probabilistic Gradient Descent Bit Flipping (PGDBF) algorithms : GDBF with following modifications (k )

Energy function : En (k ) Emax

(k )

= Max En

(k )

= xn

P

(k ) s , m∈N (\) m

with yn the noisy version of VN n.

, (k )

sample N random bits Rn (k +1) xn

⊕ yn +

(k )

(k )

if

= flip xn

from a Bernoulli distribution with probability p0 ,

En

(k )

= Emax

(k )

and

Rn

= 1.

[Rasheed2014] O.A. R ASHEED, P. I VANIS AND B. VASIC, “FAULT-TOLERANT P ROBABILISTIC G RADIENT-D ESCENT B IT F LIPPING D ECODER ”, Communications Letters, IEEE , VOL . 18, NO. 9, PP. 1487–1490, S EPTEMBER 2014.

Hardware architecture and performance comparison IRISC−dv4−R050−L54−N1296

0

−1

EN

E2

E1

10

….

10

Maximum-finder

−2

10 yN

Binary Random generator

RAN2

VNU1

VNU2

….

VNUN

RANN

EN

E2

E1

Interconnection Block


Emax

y2

y1 RAN1

−3

10

−4

10

−5

10

Uncoded Quantized Layered−MS−10Iter BF−100Iter GDBF−100Iter PGDBF−100Iter

−6

10

−7

10

CNU1

CNU2

….

CNUM −8

10


0

0.01

0.02



|

0.06

0.07

June 8, 2016

7 / 44

Context/Objectives


IVRG

LFSR

IVRG

MI

Performance

Conclusion

Context & Objectives Hardware overhead for Random Generator implementation could be prohibitive

Random Generator (RG) : Linear Feedback Shift Register (LFSR)

Init LFSR 3bits

LSB

LFSR 4bits

b1

LFSR 5bits

b2

LFSR 6bits

b3

LFSR 7bits

b4

LFSR 8bits

b5

LFSR 9bits

b6

LFSR 10bits

MSB

Comparator

Threshold

A Naive implementation of PGDBF would require duplicating N LFSR-RG to get N random bits at each iteration :

Tanner code (155,93, dv = 3, dc = 5)

0xE6

Output

1-bit Register

Slice LUT

Non-Probabilistic GDBF

946

2151

PGDBF with LFSR

9161

3545

offset min-sum (6 bits)

13694

15350

Our approach Statistical study of the PGDBF to identify the critical features of the probabilistic blocks, Replace the LFSR-RG by an "approximate" RG with low hardware complexity.



|

June 8, 2016

8 / 44

Context/Objectives


IVRG

LFSR

IVRG

MI

Performance

Conclusion

Outline 1


2


3


4


5


6


7


8

Conclusions



|

June 8, 2016

9 / 44

Context/Objectives


IVRG

LFSR

IVRG

MI

Performance

Conclusion

Statistical Analysis in the Waterfall Analysis Setting Random binary sequence at iteration k : R (k ) =

R

(k ) ,i i

= 1...N

Focus on the number L of 10 s in the sequence R (k ) , instead of the Bernoulli probability p0 Analyse the Frame Error Rate (FER) as a function of L

Binary Symmetric Channel - Waterfall Region - Tanner Code (M = 93, N = 155) Waterfall region,Tanner Code, N155M93,Itermax100

0

10

−1


10

Iter#10 Iter#20 Iter#30 Iter#50 Iter#100

Conclusion 1 : for the first decoding iterations random part does not help,

−2

10

−3

Conclusion 2 : choosing an optimized p0 is not that important, R (k ) only need to have a "cut the tail" property (Lmin ≤ L ≤ Lmax )

10

−4

10

−5

10

0

20 40 60 80 100 Number of bit 1 (L) in the binary random sequence R


120


|

June 8, 2016

10 / 44

Context/Objectives

Statistical Analysis S

IVRG

0

LFSR

IVRG

MI

po

S0

Performance

Conclusion

po

StatisticalA Analysis in the Error B Floor Errors located on Trapping Sets In the error floor region, the dominant uncorrectable error configurations are concentrated on Trapping Sets Trapping Sets TS(a, b) are defined as a small set of a VNs for which the neighboring CNs contains exactly b odd degree CNs TS(5, 3) is the smallest trapping set for regular dv = 3 LDPC codes with girth g = 8.

Smallest Error Events not correctable by the GDBF weight-3 error patterns which does not satisfy 5 parity-checks, weight-4 error patterns which does not satisfy 10 parity-checks,

v1

v2

v1

v2

v5

v6

v5

v4

v4 A


v3

B

v3


|

June 8, 2016

11 / 44

Context/Objectives


IVRG

LFSR

IVRG

MI

Performance

Conclusion

Statistical Analysis in the Error Floor Frame error rate with fixed input errors for each Monte-Carlo round, only the random sequences R (k ) differ. Error floor region,Tanner Code, N155M93,Itermax100

Error floor region,Tanner Code, N155M93,Itermax100

0

10

0.95 Iter#10 Iter#20 Iter#30 Iter#50 Iter#100

0.9



0.85

−1

10

Iter#10 Iter#20 Iter#30 Iter#50 Iter#100 0


0.7 0.65 0.6 0.55 0.5

−2

10

0.8 0.75

120

3 bits error pattern

0.45

0


120

4 bits error pattern

Conclusion 1 : the random part is useful in the first iterations, Conclusion 2 : R (k ) needs to have the "cut the tail" property (Lmin ≤ L ≤ Lmax ), Concept of decoder rewinding : replace a single decoder with K iterations with P decoders with K /P iterations, using same input, but different random seeds : it is expected to get FER = ( 21 )P for the low-weight error events. GDR-ISIS Energy-efficiency in Error-Correction Coding |


|

June 8, 2016

12 / 44

Context/Objectives


IVRG

LFSR

IVRG

MI

Performance

Conclusion

Outline 1


2


3


4


5


6


7


8

Conclusions



|

June 8, 2016

13 / 44

Context/Objectives


IVRG

LFSR

IVRG

MI

Performance

Conclusion


D Q clk

v1y1

D Q clk 0 MUX MUX 1

init

. . .

D Q

0 MUX MUX 1

init (k)

R Signal from RG

D Q clk

clk

v2y2

. . .

clk

. . .

D Q

N

N*dv

.. .

1 CNU1

dc0

2

1 2

.. .

CNU2

dc1

. . .

1 2

.. .

CNUM

dcM

M

1 2

I1

1

...2 EC1 dv0 v1y1

1 2 . EC2

..

dv1 v2y2

.. .

...

1 2 . ECN

..

dvN vNyN

M*dc 1 2

.. .

Maximum Indicator

init

1 2

Connections network 2

1

1 2

Connections network 1

0 MUX MUX

. . .

1 2

I2

. . .

IN

0 = Stop decoding

M Syndrome check

D Q clk

vNyN

.. .

v

y



|

June 8, 2016

14 / 44

Context/Objectives


IVRG

LFSR

IVRG

MI

Performance

Conclusion

.. R(k) Simplified Random Generator - Duplication approach 1 N 2 …. A :

R(k) : Rt

….

2

1

Rt

..

s

.. 1

2

..

.. N

A

….

2

1

….

N ..

R(k) :

.. 1

2

….

N

B

Initialize the R t by :

Linear Feedback Shift Register (LFSR) Intrinsic-Valued Random Generator (IVRG) GDR-ISIS Energy-efficiency in Error-Correction Coding |


|

June 8, 2016

15 / 44

Context/Objectives


IVRG

LFSR

IVRG

MI

Performance

Conclusion

Outline 1


2


3


4


5


6


7


8

Conclusions



|

June 8, 2016

16 / 44

Context/Objectives


IVRG

LFSR

IVRG

MI

Performance

Conclusion

LFSR

LFSR Unit

1 2

Threshold

B 0:7 B A

Hardware Efficient Implementation of Probabilistic ... - Lab-STICC

Hardware Efficient Implementation of Probabilistic ... - Lab-STICC

Suggest Documents

Area Efficient Hardware Implementation of Elliptic Curve

Efficient Hardware Implementation of the Horn

Efficient FPGA Hardware Implementation of Secure ...

Efficient Hardware/Software Implementation of LPC ...

Hardware Implementation of Efficient Modified Karatsuba Multiplier ...

Efficient Hardware Implementation of Encoder and Decoder for Golay ...

Efficient Analog Hardware Implementation of Neural ... - Google Sites

Efficient hardware implementation of the hyperbolic tangent sigmoid ...

Energy-efficient Hardware Architecture and VLSI Implementation of a ...

An efficient hardware implementation of feed-forward neural ... - BME

Efficient Hardware Implementation of the Horn-Schunck ... - MDPI

Efficient Architecture and Hardware Implementation of the Whirlpool

Area Efficient Hardware Implementation of Elliptic Curve ... - arXiv

A Hardware Intensive Approach for Efficient Implementation of ... - IJRIT

Reconfigurable Hardware Implementation of a Fast and Efficient

An Efficient Hardware Implementation for Motion Estimation of AVC ...

Efficient hardware implementation of a full COFDM processor with

A Hardware Intensive Approach for Efficient Implementation of ...

An Efficient Hardware Implementation for AI applications - Embedded ...

FPGA Implementation Technology for Memory Efficient Hardware Architecture

Resource Efficient Hardware Implementation for Real-Time Traffic

An Efficient Hardware Implementation for a Reciprocal Unit - CiteSeerX

Hardware Implementation of Polyphase ...

Towards Hardware implementation of video