Digital Signal Processing for Embedded Communications and Biomedical Systems

Keshab K. Parhi Distinguished McKnight University Professor University of Minnesota, Minneapolis http://www.ece.umn.edu/users/parhi May 23, 2012

OUTLINE • Communications Systems - Folding - Polar Decoders

• Biomedical Systems - Communication - Feature Computation and Classification - Monitoring • IC Chip Security by PUF


4/30/2012

Wireless Phone Timeline • http://gizmodo.com/357895/the-analog-cellphone-timeline

Folding Transformation • 3rd-order IIR filter

• See Parhi, VLSI Digital Signal Processing Systems, Wiley, 1999 • A possible folding set: A={A0, A1, A2, A3}, M={M0, M1, M2, M3}


Folding Transformation (Cont’d) • Folded 3rd-order IIR filter

• Multiple algorithm operations are time-multiplexed onto a single functional unit • Area reduction!
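The folding step can be made concrete with the folding equation from the cited book. Consider an edge U → V with w(e) delays, folding factor N, folding orders u and v, and a hardware unit for U that has P_U pipeline stages. The folded edge then needs D_F(U → V) = N·w(e) − P_U + v − u delays.

The sketch below takes the folding orders from the folding sets above (A_k and M_k run in time partition k, N = 4). The edge list and the pipelining depths are hypothetical placeholders, not the actual 3rd-order IIR filter DFG.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    src: str  # node U
    dst: str  # node V
    w: int    # number of delays w(e) on the edge U -> V


def folded_delays(edges, fold_order, pipeline, n_fold):
    """Delays on each folded edge: D_F(U -> V) = N * w(e) - P_U + v - u."""
    result = {}
    for e in edges:
        u, v = fold_order[e.src], fold_order[e.dst]
        d = n_fold * e.w - pipeline[e.src] + v - u
        if d < 0:
            # Negative delay: this folding set is infeasible unless the DFG is retimed first.
            raise ValueError(f"{e.src}->{e.dst}: D_F = {d} < 0, retiming needed")
        result[(e.src, e.dst)] = d
    return result


# Folding sets A = {A0, A1, A2, A3}, M = {M0, M1, M2, M3}: operation k runs in time partition k.
FOLD_ORDER = {f"{unit}{k}": k for unit in "AM" for k in range(4)}
# Hypothetical hardware: 1-stage pipelined adder, 2-stage pipelined multiplier.
PIPELINE = {name: 1 if name.startswith("A") else 2 for name in FOLD_ORDER}

if __name__ == "__main__":
    # Hypothetical edges for illustration, not the actual IIR filter DFG.
    edges = [Edge("A0", "M1", 1), Edge("M1", "A1", 1), Edge("A2", "A3", 0)]
    print(folded_delays(edges, FOLD_ORDER, PIPELINE, n_fold=4))
```

A negative D_F signals that the chosen folding set cannot be realized directly. In that case the DFG must be retimed before folding.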


Folding Transformation (Cont’d) • 6th-order IIR filter (cascade of two 3rd-order IIR filters)

• Can also be folded onto 1 multiplier and 1 adder • A possible folding set with interleaved ordering: A={A0, A0’, A1, A1’, A2, A2’, A3, A3’ }, M={M0, M0’, M1, M1’, M2, M2’, M3, M3’}


Folding Transformation (Cont’d) • Folded 6th-order IIR filter

• More pipelining -> low power, high speed • Hierarchical folding algorithm: each delay D becomes 2D, and a switch active at time i becomes active at times 2i and 2i+1
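The interleaved folding set and the hierarchical folding rule (D becomes 2D; switch time i becomes times 2i and 2i+1) can be sketched as two small transformations. The dict-based description of a folded architecture (edge delays, switch times) is a hypothetical representation chosen for illustration.

The exact 2D rule assumes the functional units are not pipelined. With P_U pipeline stages, the folding equation instead gives 2D + P_U.

```python
def interleave_folding_set(ops):
    """Interleave each operation with its primed copy from the second filter section."""
    out = []
    for op in ops:
        out.extend([op, op + "'"])
    return out


def hierarchical_fold(delays, switch_times):
    """Map a folded single-section architecture to the interleaved two-section one.

    delays:       {edge_name: D}, delays on each folded edge (folding factor N)
    switch_times: {switch_name: set of times i in 0..N-1}
    Returns the delays and switch times for folding factor 2N.
    """
    new_delays = {edge: 2 * d for edge, d in delays.items()}  # D -> 2D
    new_switches = {
        sw: sorted(t for i in times for t in (2 * i, 2 * i + 1))  # i -> {2i, 2i+1}
        for sw, times in switch_times.items()
    }
    return new_delays, new_switches


if __name__ == "__main__":
    print(interleave_folding_set(["A0", "A1", "A2", "A3"]))
    print(hierarchical_fold({"e1": 1, "e2": 3}, {"s1": {0, 2}}))
```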


Advances in Coding Theory • Turbo Codes • LDPC Codes • Polar Codes (Most Recent)



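What are polar codes?

[Figure: application scenarios of polar codes (point-to-point, broadcast and wiretap channels) and performance comparison of successive cancellation, list decoding, list + CRC-16 and systematic + list + CRC-16 decoding of polar codes against WiMax turbo and WiMax LDPC codes]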

• Arıkan introduced polar coding in his breakthrough paper. • Polar codes provably achieve capacity (for symmetric binary-input memoryless channels). • They are applicable in a diverse set of scenarios. E. Arıkan, “Channel polarization: a method for constructing capacity-achieving codes for symmetric binary-input memoryless channels,” IEEE Trans. on Inf. Theory, vol. 55, no. 7, pp. 3051-3073, July 2009. Plot from UCSD Web link
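Polar encoding itself is compact. With generator $G_N = F^{\otimes n}$, $F = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$, a length-$N = 2^n$ codeword is $x = u G_N$ over GF(2).

The sketch below makes the following assumptions:
- It omits the bit-reversal permutation that appears in Arıkan's paper.
- Frozen bits are set to zero.
- The information set in the example is a hypothetical choice, not an optimized code construction.

```python
def polar_transform(u):
    """x = u * F^{(x)n} over GF(2) with F = [[1, 0], [1, 1]] (no bit-reversal)."""
    x = list(u)
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    step = 1
    while step < n:
        # One butterfly stage: XOR the second half of each block into its first half.
        for start in range(0, n, 2 * step):
            for j in range(start, start + step):
                x[j] ^= x[j + step]
        step *= 2
    return x


def polar_encode(info_bits, info_positions, n):
    """Place information bits at info_positions, freeze the rest to 0, then transform."""
    u = [0] * n
    for pos, bit in zip(sorted(info_positions), info_bits):
        u[pos] = bit
    return polar_transform(u)


if __name__ == "__main__":
    # Hypothetical rate-1/2, N = 8 code with information set {3, 5, 6, 7}.
    print(polar_encode([1, 0, 1, 1], {3, 5, 6, 7}, 8))
```

Because F·F = I over GF(2), applying the transform twice returns the original vector.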

Successive Cancellation (SC) decoding

[Figure: SC decoder for N = 8 with three stages of Type I and Type II PEs. The channel LLRs $L_1^{(1)}(y_1), \dots, L_1^{(1)}(y_8)$ feed stage 1, and partial sums such as û1⊕û2⊕û3⊕û4, û3⊕û4 and û2⊕û4 are fed back. The numbers 1 to 14 mark the time step at which each PE is active. The outputs $L_8^{(i)}(y_1^8, \hat{u}_1^{i-1})$ produce $\hat{u}_i$ in the order i = 1, 5, 3, 7, 2, 6, 4, 8.]

Successive cancellation (SC) is one of the most popular decoding algorithms for polar codes. Its FFT-like structure makes it well suited to VLSI implementation.

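Type I PE: $L_N^{(2i)}(y_1^N, \hat{u}_1^{2i-1}) = (-1)^{\hat{u}_{2i-1}} L_{N/2}^{(i)}(y_1^{N/2}, \hat{u}_{1,o}^{2i-2} \oplus \hat{u}_{1,e}^{2i-2}) + L_{N/2}^{(i)}(y_{N/2+1}^{N}, \hat{u}_{1,e}^{2i-2})$,

Type II PE: $L_N^{(2i-1)}(y_1^N, \hat{u}_1^{2i-2}) = 2\,\mathrm{artanh}\{\tanh[L_{N/2}^{(i)}(y_1^{N/2}, \hat{u}_{1,o}^{2i-2} \oplus \hat{u}_{1,e}^{2i-2})/2] \cdot \tanh[L_{N/2}^{(i)}(y_{N/2+1}^{N}, \hat{u}_{1,e}^{2i-2})/2]\}$,

where $\hat{u}_{1,o}^{2i-2}$ and $\hat{u}_{1,e}^{2i-2}$ are the odd- and even-indexed subvectors of $\hat{u}_1^{2i-2}$.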
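The two PE equations can be turned into a recursive software model of SC decoding. In the sketch below, f implements the Type II PE (the artanh/tanh rule) and g implements the Type I PE (sign flip and add).

It assumes:
- a natural-order code x = u F^{⊗n}, without the bit-reversed output order of the figure;
- LLRs defined as log[P(y|0)/P(y|1)];
- a hypothetical frozen-bit mask.

It is a behavioral reference only, not the hardware architecture.

```python
import math


def f(a, b):
    """Type II PE: 2 * artanh(tanh(a/2) * tanh(b/2))."""
    p = math.tanh(a / 2) * math.tanh(b / 2)
    p = max(-1 + 1e-12, min(1 - 1e-12, p))  # keep artanh finite for large LLRs
    return 2 * math.atanh(p)


def g(a, b, u):
    """Type I PE: (-1)^u * a + b, where u is the partial-sum bit already decided."""
    return (-a if u else a) + b


def sc_decode(llr, frozen):
    """Successive cancellation decoding.

    llr:    channel LLRs for x_1..x_N (N a power of two)
    frozen: list of booleans, True where u_i is a frozen (zero) bit
    Returns (u_hat, x_hat): decided input bits and re-encoded codeword bits.
    """
    n = len(llr)
    if n == 1:
        bit = 0 if frozen[0] or llr[0] >= 0 else 1
        return [bit], [bit]
    half = n // 2
    a, b = llr[:half], llr[half:]
    # Left half of u: decide from the f-combined LLRs (Type II PEs).
    u_left, x_left = sc_decode([f(p, q) for p, q in zip(a, b)], frozen[:half])
    # Right half of u: use the left partial sums in the g-combined LLRs (Type I PEs).
    u_right, x_right = sc_decode([g(p, q, s) for p, q, s in zip(a, b, x_left)], frozen[half:])
    # Re-encode the partial sums for the parent: x = [x_left XOR x_right, x_right].
    x = [l ^ r for l, r in zip(x_left, x_right)] + x_right
    return u_left + u_right, x


if __name__ == "__main__":
    codeword = [1, 0, 1, 0, 0, 1, 0, 1]                      # x = u F^{(x)3}, u = [0,0,0,1,0,0,1,1]
    llr = [4.0 if bit == 0 else -4.0 for bit in codeword]    # noiseless BPSK LLRs
    frozen = [True, True, True, False, True, False, False, False]
    print(sc_decode(llr, frozen)[0])
```

Each recursion level corresponds to one decoder stage. The sequential dependence of g on x_left is exactly what makes the SC latency grow with N.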

However, the SC decoding latency is 2(N − 1) clock cycles, and practical polar codes need code lengths N above 2^{10}, so this latency is large.

How can the decoding latency be reduced?

Data flow graph (DFG) analysis

[Figure: SC decoder for N = 8 with each PE marked by a label, and the corresponding DFG with delay elements D. Stage 1: A1 to A4 (time 1), B1 to B4 (time 8). Stage 2: C1, C2 (time 2), C3, C4 (time 9), D1, D2 (time 5), D3, D4 (time 12). Stage 3: E1 to E4 (times 3, 6, 10, 13), F1 to F4 (times 4, 7, 11, 14).]

• Each PE is marked with a label as indicated, and the DFG of the decoder is derived. • The derived DFG is single-rate. • From this DFG, the decoder architectures can now be derived.
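The activation times in the SC decoder figure (1 to 14 for N = 8) follow from the recursive order of SC decoding. Each stage runs its Type II PEs, then the whole left sub-tree, then its Type I PEs, then the right sub-tree.

The sketch below reproduces the time sets of the DFG nodes: A {1}, B {8}, C {2, 9}, D {5, 12}, E {3, 6, 10, 13}, F {4, 7, 11, 14}. It assumes every PE stage takes one clock cycle, bit decisions take none, and stage 1 is the channel side.

```python
def sc_schedule(n_stages):
    """Clock cycles at which each stage's Type II (f) and Type I (g) PEs are active.

    Assumes one cycle per PE stage, zero-cycle bit decisions, stage 1 on the
    channel side. Returns (schedule, latency).
    """
    sched = {s: {"f": [], "g": []} for s in range(1, n_stages + 1)}
    t = 0

    def visit(stage):
        nonlocal t
        if stage > n_stages:          # leaf: a bit decision
            return
        t += 1
        sched[stage]["f"].append(t)   # Type II PEs of this stage
        visit(stage + 1)              # decode the left sub-tree
        t += 1
        sched[stage]["g"].append(t)   # Type I PEs, using the left partial sums
        visit(stage + 1)              # decode the right sub-tree

    visit(1)
    return sched, t


if __name__ == "__main__":
    sched, latency = sc_schedule(3)   # N = 8
    for stage, ops in sched.items():
        print(stage, ops)
    print("latency:", latency)        # 2(N - 1) = 14
```

Because every node of the decoding tree costs two sequential cycles, the total is 2(N − 1) for any N = 2^n.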

Latency-reduced architecture • First, a multi-rate version of the previous DFG is derived. • Then, using look-ahead, it can be further refined as follows.

[Figure: multi-rate DFG of the SC decoder over stages 1 to 3, with nodes A {1}, B {8}, C {2, 9}, D {5, 12}, E {3, 6, 10, 13} and F {4, 7, 11, 14}, and its look-ahead refinement with merged nodes A/B, C/D and E/F]

C. Zhang, B. Yuan and K.K. Parhi, IEEE ICC 2012

Latency-reduced architecture

[Figure: latency-reduced SC decoder for N = 8, built from merged PEs (inputs I1, I2; outputs O1, O2, O3) in stages 1 to 3, with 0/1 multiplexers and fed-back partial sums such as û1⊕û2⊕û3⊕û4, û3⊕û4 and û2⊕û4]

• Merged PEs are used instead of separate Type I and Type II PEs. • The feedback part uses only half of the delay elements. • The decoding latency is only 50% of that of the tree architecture.
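One way to read the 50% latency reduction is as look-ahead (pre-computation). A merged PE computes the Type II output and both possible Type I outputs (for a partial-sum bit of 0 and of 1) in the same cycle. Once the bit is known, a multiplexer only has to select the right Type I output.

The sketch below rests on these assumptions:
- The look-ahead reading above is an interpretation, not a description of the exact circuit.
- merged_pe is a hypothetical function, not the actual circuit with ports I1, I2, O1 to O3.
- Each PE stage takes one clock cycle.

Under these assumptions the latency is N − 1 cycles instead of 2(N − 1).

```python
import math


def merged_pe(a, b):
    """Hypothetical merged PE: Type II output plus both candidate Type I outputs."""
    p = math.tanh(a / 2) * math.tanh(b / 2)
    p = max(-1 + 1e-12, min(1 - 1e-12, p))
    f_out = 2 * math.atanh(p)   # Type II
    g_if_0 = b + a              # Type I if the partial-sum bit is 0
    g_if_1 = b - a              # Type I if the partial-sum bit is 1
    return f_out, g_if_0, g_if_1


def select_g(g_if_0, g_if_1, partial_sum):
    """Multiplexer that picks the pre-computed Type I output once the bit is known."""
    return g_if_1 if partial_sum else g_if_0


def latency_tree(n):
    """Conventional tree SC decoder: one cycle for f and one for g per tree node."""
    return 2 * (n - 1)


def latency_lookahead(n):
    """f and both g candidates share one cycle; selecting g costs no extra cycle."""
    return 0 if n == 1 else 1 + 2 * latency_lookahead(n // 2)


if __name__ == "__main__":
    for n in (8, 1024):
        print(n, latency_tree(n), latency_lookahead(n))
```

The recursion 1 + 2·T(N/2) with T(1) = 0 gives T(N) = N − 1, exactly half of the tree latency.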

Additional architectures

[Figure: further SC decoder architectures derived from the DFG, including fully pipelined, partial parallel, folded (with RAM) and 2-output designs built from merged PEs]

Based on the previous DFG analysis, numerous decoder architectures can be obtained for different applications, and more are possible.

Body Area Network

Wireless Sensor Nodes in Healthcare • http://ieeewban.wordpress.com/author/mikuslaw/

Wireless BAN

WBAN Applications • Chronic disease monitoring • Episodic patient monitoring • Patient alarm monitoring • Elderly people monitoring

Biomedical monitoring systems

[Figure: closed-loop biomedical monitoring system]
• Offline training: recordings from databases -> feature extraction -> feature selection -> classifier training -> classifier model
• Online detection: electrodes -> feature extraction (selected feature set) -> classification -> postprocessing -> drug delivery system / create an alert
• Feature extraction: spectral power, wavelets, auto-regressive coefficients, ICA
• Classification: linear SVM, non-linear SVM, Adaboost
• Postprocessing: moving average, Kalman filter

Closed-loop systems http://ieeewban.wordpress.com/author/mikuslaw/
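The moving-average postprocessing step can be sketched as smoothing the classifier's per-window outputs before raising an alert. This suppresses isolated false positives.

The window length and threshold below are hypothetical parameters. The Kalman-filter option is not shown.

```python
from collections import deque


def smooth_and_alert(scores, window=3, threshold=0.0):
    """Moving average of classifier outputs (+1/-1 labels or real-valued scores).

    Returns one boolean per input: True when the running average over the
    last `window` outputs exceeds `threshold` (i.e., raise an alert).
    """
    recent = deque(maxlen=window)
    alerts = []
    for s in scores:
        recent.append(s)
        alerts.append(sum(recent) / len(recent) > threshold)
    return alerts


if __name__ == "__main__":
    # The isolated +1 at index 1 is suppressed; the sustained run at the end triggers.
    print(smooth_and_alert([-1, 1, -1, -1, 1, 1, 1]))
```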

MIMO

Compared with the SISO case, channel capacity increases roughly min{M, K} times (at high SNR) when an M x K antenna array is used.
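The min{M, K} scaling can be checked numerically with the standard capacity formula C = log2 det(I + (SNR/M) H H^H) bit/s/Hz. This formula applies when the channel is known at the receiver and power is split equally over the transmit antennas.

The sketch below makes these assumptions:
- M is the number of transmit antennas and K the number of receive antennas, so H is K × M. The result depends only on min{M, K}.
- Fading is i.i.d. Rayleigh.
- The SNR is a hypothetical 30 dB.
- A plain-Python Cholesky factorization is used instead of a linear-algebra library.

```python
import math
import random


def mimo_capacity(H, snr):
    """log2 det(I + (snr / M) H H^H) in bit/s/Hz for a K x M channel matrix H."""
    k, m = len(H), len(H[0])
    # A = I + (snr/M) H H^H is Hermitian positive definite.
    A = [[(1.0 if i == j else 0.0)
          + snr / m * sum(H[i][t] * H[j][t].conjugate() for t in range(m))
          for j in range(k)] for i in range(k)]
    # Cholesky A = L L^H, so log2 det A = 2 * sum(log2 L[j][j]).
    L = [[0j] * k for _ in range(k)]
    log2det = 0.0
    for j in range(k):
        diag = math.sqrt((A[j][j] - sum(abs(L[j][t]) ** 2 for t in range(j))).real)
        L[j][j] = diag
        log2det += 2 * math.log2(diag)
        for i in range(j + 1, k):
            L[i][j] = (A[i][j] - sum(L[i][t] * L[j][t].conjugate() for t in range(j))) / diag
    return log2det


def ergodic_capacity(k, m, snr, trials, rng):
    """Average capacity over i.i.d. Rayleigh channels with E|h|^2 = 1."""
    s = math.sqrt(0.5)
    total = 0.0
    for _ in range(trials):
        H = [[complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(m)] for _ in range(k)]
        total += mimo_capacity(H, snr)
    return total / trials


if __name__ == "__main__":
    rng = random.Random(1)
    snr = 10 ** (30 / 10)  # 30 dB
    siso = ergodic_capacity(1, 1, snr, 2000, rng)
    mimo = ergodic_capacity(2, 2, snr, 2000, rng)
    print(f"SISO {siso:.2f}, 2x2 MIMO {mimo:.2f} bit/s/Hz, ratio {mimo / siso:.2f}")
```

At high SNR the 2 x 2 ratio comes out close to, but slightly below, 2. This is consistent with the "roughly min{M, K}" statement.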

Problem Statement - Transmitter

Transmitted Signal

Image face from http://drawinghowtodraw.com/stepbyste pdrawinglessons/wpcontent/uploads/2010/01/cartoonfacesh eads360degrees.png

Problem Statement – Access Point

• Timing? (signal arrival time) • Channel information? • Carrier frequency offset?

Received Signal

Solution • Preamble

• Access Point
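A common baseline for preamble-based timing at the access point works as follows: cross-correlate the received samples with the known preamble, then pick the offset with the largest correlation magnitude. Taking the magnitude makes the estimate insensitive to the unknown channel phase.

The sketch shows only this generic baseline, not the optimized joint timing synchronization and channel estimation method listed under Contributions. The preamble and channel gain are hypothetical.

```python
def estimate_timing(rx, preamble):
    """Return the sample offset d maximizing |sum_k rx[d + k] * conj(preamble[k])|."""
    n = len(preamble)
    best_offset, best_metric = 0, -1.0
    for d in range(len(rx) - n + 1):
        corr = sum(rx[d + k] * preamble[k].conjugate() for k in range(n))
        if abs(corr) > best_metric:
            best_offset, best_metric = d, abs(corr)
    return best_offset


if __name__ == "__main__":
    preamble = [complex(v) for v in (1, 1, -1, 1, -1, -1, 1, -1)]  # hypothetical BPSK preamble
    gain = 0.5j                                # unknown channel phase and amplitude
    rx = [0j] * 5 + [gain * p for p in preamble] + [0j] * 3
    print(estimate_timing(rx, preamble))       # 5
```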

Contributions

• Achieve perfect timing synchronization when SNR ≤ 0dB (100% chance to find the correct timing) – Existing methods only have ≤ 40% chance to find the correct timing at the same SNR

• Zero BER is achievable when SNR ≥ 0dB – Existing methods have error floors, and may not achieve zero BER at any SNR (SNR >> 0dB)


Te-Lung Kung, Keshab Parhi, “Optimized Joint Timing Synchronization and Channel Estimation for OFDM Systems,” IEEE Wireless Communications Letters, on IEEExplore (Early Access). Te-Lung Kung, Keshab Parhi, “Frequency Domain Symbol Synchronization for OFDM Systems,” IEEE EIT, May, 2011.

Support Vector Machines • One of the most widely used classification algorithms – Training is based on quadratic optimization • Non-linear SVMs (kernel based) • Map $x$ to a high-dimensional feature space: $\Phi: \mathbb{R}^d \to \mathcal{H}$ • The derived feature vectors are $\Phi(x_j)$ • The kernel function allows implicit calculation of dot products: $K(x_i, x_j) = \Phi(x_i)^T \Phi(x_j)$ • Learn a linear separator in the high-dimensional space

• The final prediction is

$f(x) = \sum_{i: a_i > 0} a_i y_i \Phi(x_i)^T \Phi(x) + b = \sum_{i: a_i > 0} a_i y_i K(x_i, x) + b$

Illustration of SVM


SVM Classification • Three popular kernels

• Linear: $K(x_i, x) = x_i^T x$

$f(x) = \mathrm{sign}\left(\sum_{i: a_i > 0} a_i y_i x_i^T x + b\right) = \mathrm{sign}(w^T x + b)$, where $w = \sum_i a_i y_i x_i$

• Polynomial: $K(x_i, x) = [x_i^T x + 1]^p$. For p = 2: $K(x_i, x) = (z_i^T z)^2 = z^T z_i z_i^T z$, where $z = [x^T\ 1]^T$

SVM Classification • Polynomial

$f(x) = \mathrm{sign}(z^T W z + b)$, where $W = \sum_{i: a_i > 0} a_i y_i z_i z_i^T$

• RBF kernel: $K(x_i, x) = e^{-\|x - x_i\|^2 / (2\sigma^2)}$

$f(x) = \mathrm{sign}\left(\sum_{i: a_i > 0} a_i y_i K(x_i, x) + b\right)$

• Further simplifications are not possible for RBF
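The linear and p = 2 polynomial simplifications can be checked numerically:
- For the linear kernel, the support vectors collapse into one weight vector w.
- For the p = 2 kernel, they collapse into one (d+1) × (d+1) matrix W.
- The RBF kernel must still loop over every support vector.

The support vectors, the coefficients a_i y_i and the bias below are random placeholders, not a trained model.

```python
import math
import random


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def poly2_kernel(u, v):
    return (dot(u, v) + 1) ** 2


def rbf_kernel(sigma):
    return lambda u, v: math.exp(-sum((p - q) ** 2 for p, q in zip(u, v)) / (2 * sigma ** 2))


def kernel_decision(svs, coefs, b, x, kernel):
    """sum_i a_i y_i K(x_i, x) + b, with coefs[i] = a_i * y_i."""
    return sum(c * kernel(s, x) for s, c in zip(svs, coefs)) + b


def linear_weights(svs, coefs):
    """w = sum_i a_i y_i x_i."""
    return [sum(c * s[k] for s, c in zip(svs, coefs)) for k in range(len(svs[0]))]


def poly2_matrix(svs, coefs):
    """W = sum_i a_i y_i z_i z_i^T with z_i = [x_i; 1]."""
    zs = [list(s) + [1.0] for s in svs]
    n = len(zs[0])
    return [[sum(c * z[r] * z[q] for z, c in zip(zs, coefs)) for q in range(n)]
            for r in range(n)]


def poly2_decision(W, b, x):
    """z^T W z + b with z = [x; 1]."""
    z = list(x) + [1.0]
    return sum(z[r] * W[r][q] * z[q] for r in range(len(z)) for q in range(len(z))) + b


if __name__ == "__main__":
    rng = random.Random(0)
    d, nsv = 4, 10
    svs = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(nsv)]
    coefs = [rng.uniform(-1, 1) for _ in range(nsv)]  # placeholder a_i * y_i
    b, x = 0.1, [rng.uniform(-1, 1) for _ in range(d)]
    print(kernel_decision(svs, coefs, b, x, dot), dot(linear_weights(svs, coefs), x) + b)
    print(kernel_decision(svs, coefs, b, x, poly2_kernel),
          poly2_decision(poly2_matrix(svs, coefs), b, x))
    print(kernel_decision(svs, coefs, b, x, rbf_kernel(1.0)))  # no collapsed form
```

The class label is the sign of these decision values. Once w or W is precomputed, the per-sample cost of the linear and p = 2 classifiers no longer depends on Nsv.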

Computational complexity

Kernel           | #words (memory) | #additions | #multiplications
Linear           | d               | d          | d
Polynomial (p=2) | d^2             | d^2        | d(d+1)
RBF*             | Nsv(d+1)        | 2 Nsv d    | Nsv d

* Additional Nsv exponential operations for RBF
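The table can be turned into a small cost model for comparing kernels on a given problem. The per-kernel formulas are copied from the table (RBF additionally needs Nsv exponentials). The d and Nsv values in the example are hypothetical.

```python
def svm_cost(kernel, d, nsv=0):
    """Memory words, additions, multiplications and exponentials per classification."""
    if kernel == "linear":
        return {"words": d, "adds": d, "mults": d, "exps": 0}
    if kernel == "poly2":
        return {"words": d * d, "adds": d * d, "mults": d * (d + 1), "exps": 0}
    if kernel == "rbf":
        return {"words": nsv * (d + 1), "adds": 2 * nsv * d, "mults": nsv * d, "exps": nsv}
    raise ValueError(f"unknown kernel: {kernel}")


if __name__ == "__main__":
    for k in ("linear", "poly2", "rbf"):
        print(k, svm_cost(k, d=36, nsv=200))  # hypothetical d and Nsv
```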

• Complexity depends on the number of support vectors (Nsv) and the feature dimension (d)

Reducing the complexity • Number of support vectors (Nsv) – Reduced SVM (RSVM) can be used – The number of SVs is reduced during training • Feature dimensionality (d) – Feature selection algorithms – SVM-RFE (Recursive Feature Elimination) – Adaboost, HPD, etc. • Optimizing the hardware – MAC and exponent operations – Memory requirements depend on word length

Configurable SVM processor


SVM Architectures: Energy Consumption


FFT Architectures: Prior Designs • 4-parallel delay-feedback

M. Shin et al., 2008

• Contains 4 datapaths. Yuan Chen et al., 2008

DIF Design • 4-parallel feed-forward design (DIF)

• Two datapaths, processing 2 samples each • Requires 3N/2 delay elements • Hardware utilization is 100%
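As a software reference for what the DIF datapaths compute, the sketch below is an iterative radix-2 decimation-in-frequency FFT. Each stage forms sums and twiddled differences of samples N/2, N/4, ... apart, and the results come out in bit-reversed order.

It is a behavioral model only, not the 4-parallel feed-forward architecture. It checks itself against a direct DFT.

```python
import cmath


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fft_dif(x):
    """Iterative radix-2 decimation-in-frequency FFT, output in natural order."""
    a = [complex(v) for v in x]
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    span = n // 2
    while span >= 1:
        for start in range(0, n, 2 * span):
            for k in range(span):
                w = cmath.exp(-2j * cmath.pi * k / (2 * span))  # twiddle factor W_{2*span}^k
                p, q = a[start + k], a[start + k + span]
                a[start + k] = p + q               # butterfly sum
                a[start + k + span] = (p - q) * w  # twiddled butterfly difference
        span //= 2
    bits = n.bit_length() - 1
    return [a[bit_reverse(i, bits)] for i in range(n)]  # undo the bit-reversed ordering


def dft(x):
    """Direct O(N^2) DFT used as a reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)) for k in range(n)]


if __name__ == "__main__":
    x = [1, 2, 0, -1, 3, 0, 0, 1]
    err = max(abs(p - q) for p, q in zip(fft_dif(x), dft(x)))
    print(f"max |FFT - DFT| = {err:.2e}")
```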


DIF Design • 8-parallel feed-forward design


DIT 4-parallel Architecture • An N-point FFT requires – N-4 delay elements – 4 log2 N complex adders – a number of multipliers that depends on the algorithm • No delays at this stage

DIT 8-parallel Architecture • Requires N-8 delay elements • No delays at two stages

DIT 128-point FFT Architecture • Hardware complexity – Complex adders: 28 – Complex multipliers: 4+0.41 – Delay elements: 124
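The 128-point counts are consistent with the 4-parallel DIT formulas given earlier (N − 4 delay elements, 4 log2 N complex adders). This can be checked directly. The multiplier count is not modeled, since it depends on the algorithm.

```python
def dit4_resources(n):
    """Delay elements and complex adders of the 4-parallel DIT FFT (formulas from the slide)."""
    if n < 4 or n & (n - 1):
        raise ValueError("N must be a power of two, at least 4")
    log2n = n.bit_length() - 1
    return {"delays": n - 4, "complex_adders": 4 * log2n}


if __name__ == "__main__":
    print(dit4_resources(128))  # {'delays': 124, 'complex_adders': 28}
```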

M. Ayinala, M. Brown, K.K. Parhi, IEEE Trans. VLSI Systems, June 2012 (patent); M. Ayinala, K.K. Parhi, ACM Great Lakes Symp., Utah, May 2012

Seizure prediction

[Figure: (a) open-loop and (b) closed-loop seizure prediction systems]

Seizure Prediction • Objective: patient-specific prediction of seizures (5 min ahead) from EEG signals (6 electrodes) • Issues: unbalanced data, feature selection

[Figure: input pattern -> feature extraction -> feature vector X -> classifier -> decision (class label)]

• System implementation details: • Features: power measured in 9 spectral bands for 4 differential channels, for a total of 4 x 9 = 36 features • Classifier: Adaboost with decision stumps. Y. Park, L. Luo, K. Parhi, T. Netoff, Epilepsia, Oct. 2011
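The feature and classifier pipeline can be sketched in two steps:
1. Compute spectral band powers per channel. With 9 bands on each of 4 differential channels, this gives the 36 features.
2. Classify with an Adaboost ensemble of decision stumps, f(x) = sign(Σ_t α_t h_t(x)).

The sampling rate, band edges and stumps (feature index, threshold, polarity, weight α_t) below are hypothetical placeholders, not the trained patient-specific model of the cited study. The periodogram uses a direct DFT for clarity.

```python
import cmath
import math


def band_powers(signal, fs, bands):
    """Periodogram power |X[k]|^2 / N summed over each [lo, hi) Hz band (0 < k < N/2)."""
    n = len(signal)
    spectrum = [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
                for k in range(n // 2)]
    return [sum(abs(spectrum[k]) ** 2 / n for k in range(1, n // 2) if lo <= k * fs / n < hi)
            for lo, hi in bands]


def adaboost_stumps(x, stumps):
    """sign(sum_t alpha_t * h_t(x)), h_t(x) = polarity if x[j] > threshold else -polarity."""
    score = sum(alpha * (pol if x[j] > thr else -pol) for j, thr, pol, alpha in stumps)
    return 1 if score > 0 else -1  # +1: preictal, -1: interictal


if __name__ == "__main__":
    fs, n = 64, 64                                                  # hypothetical 1 s window at 64 Hz
    sig = [math.sin(2 * math.pi * 10 * t / fs) for t in range(n)]  # 10 Hz test tone
    bands = [(1, 4), (4, 8), (8, 13), (13, 30)]                     # hypothetical band edges in Hz
    print([round(p, 3) for p in band_powers(sig, fs, bands)])
    stumps = [(2, 0.5, 1, 0.8), (0, 0.3, -1, 0.4)]                  # (feature, threshold, polarity, alpha)
    print(adaboost_stumps([0.0, 0.0, 1.0], stumps), adaboost_stumps([1.0, 0.0, 0.0], stumps))
```

Decision stumps compare one feature against one threshold. The trained classifier therefore needs only comparisons and a weighted sum, which suits low-power hardware.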

EEG Data for Classification • Parts of the EEG data are identified by medical experts as ictal, preictal (+) or interictal (−) • Preictal and interictal data are used for classification • Each data sample is a ~20 s moving window

[Figure: EEG recording with preictal segments (Class +1) and interictal segments (Class −1), with a gap of at least 1 hour]

Seizure Prediction


Seizure Prediction


Physical Unclonable Functions (PUFs) “It is estimated that as much as 10% of all high-tech products sold globally are counterfeit which leads to a conservative estimate of 100 billion of revenue loss.” [Guajardo et al, 2008]

Device cloning

Side-channel attack

• Security Challenges – Computing devices are becoming physically exposed – Adversaries may physically tamper with the devices and extract secret keys from non-volatile memory – Software-only protections are not enough

What is a PUF? • It extracts secret keys from complex physical objects • Due to manufacturing process variations, no two integrated circuits are identical, even with the same layout

[Figure: physical objects -> process variations -> PUF; properties: unpredictable behavior, easy to evaluate, hard to clone]

Unclonable anti-counterfeiting marks for ICs!

Silicon MUX PUF

[Figure: a chain of multiplexer stages controlled by the challenge bits, ending in an arbiter (D flip-flop) whose output is the response]

• All the multiplexers are identically designed. • Each challenge creates two paths through the circuit. • The response is determined by the race between the two paths. • No special fabrication is needed.

Characteristics of PUFs • Security – Uniqueness: inter-chip variation – Unclonability: randomness – Unpredictability: hard to model

• Reliability – Intra-chip variation – Authentication robustness (add extra processing circuits, e.g., error correcting techniques)
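The security and reliability characteristics are commonly quantified with fractional Hamming distances:
- Uniqueness: the average inter-chip distance between the responses of different chips to the same challenges (ideally 50%).
- Reliability: measured by the intra-chip distance between repeated readings of one chip (ideally 0%).

The response bit strings below are made-up examples.

```python
from itertools import combinations


def frac_hd(r1, r2):
    """Fractional Hamming distance between two equal-length response bit lists."""
    return sum(a != b for a, b in zip(r1, r2)) / len(r1)


def uniqueness(responses_per_chip):
    """Average inter-chip fractional HD over all chip pairs (ideal: 0.5)."""
    pairs = list(combinations(responses_per_chip, 2))
    return sum(frac_hd(a, b) for a, b in pairs) / len(pairs)


def intra_chip_hd(reference, remeasurements):
    """Average fractional HD between a reference response and repeated readings (ideal: 0)."""
    return sum(frac_hd(reference, r) for r in remeasurements) / len(remeasurements)


if __name__ == "__main__":
    chips = [[0, 1, 1, 0, 1, 0, 0, 1], [1, 1, 0, 0, 1, 1, 0, 0], [0, 0, 1, 1, 0, 1, 1, 0]]
    print(uniqueness(chips))
    print(intra_chip_hd(chips[0], [[0, 1, 1, 0, 1, 0, 0, 1], [0, 1, 1, 0, 1, 0, 1, 1]]))
```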

Contributions • Logically-reconfigurable PUFs (security) • Systematic statistical analysis of (feed-forward) MUX PUFs • Modified feed-forward path (reliability) • Two-arbiter authentication scheme (reliability) [1] Y. Lao and K.K. Parhi, "Novel Reconfigurable Silicon Physical Unclonable Functions," Proc. of 2011 Workshop on Foundations of Dependable and Secure Cyber-Physical Systems (FDSCPS-11), pp. 30-36, Chicago, April 2011 [2] Y. Lao and K.K. Parhi, "Reconfigurable Architectures for Silicon Physical Unclonable Functions," Proc. of IEEE Int. Conference on Electro Information Technology, Mankato, May 2011

Logically-reconfigurable PUFs • Reconfigurable PUF circuit – Alter the model of the PUF circuit to update the challenge-response behavior, instead of re-mapping the challenge and response through pre- and/or post-processing – Several novel solutions, e.g., the reconfigurable feed-forward MUX PUF and the MUX and DeMUX PUF

[Figure: an n-bit challenge is applied to a reconfigurable PUF, which produces the response]

Reconfigurable feed-forward MUX PUF • Idea: use a reconfigurable feed-forward path – The original MUX PUF can be modeled by a linear additive delay model – A feed-forward path adds nonlinearity to the MUX PUF and improves its security

• Three types of feed-forward path: Cascade, Overlap, Separate – based on the beginning stage and the ending stage of the feed-forward path – experimental results have shown that the inter-chip and intra-chip characteristics of the 3 types are different – our statistical analysis has demonstrated that the mathematical models of the 3 types are different
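The linear additive delay model of the MUX PUF can be sketched in a few lines. Each stage adds a delay difference that depends on its challenge bit, and it flips the sign of the accumulated difference when the paths cross. Unrolling this recursion gives Δ = wᵀΦ(c) with parity features Φ_i = Π_{j≥i}(1 − 2c_j), which is what makes the plain MUX PUF learnable.

A feed-forward path lets an intermediate arbiter set a later challenge bit, which breaks this linear form. The per-stage delay differences are random placeholders, and mapping "top path first" to response 1 is an arbitrary convention.

```python
import random


def final_delay(challenge, stages, ff=None):
    """Delay difference (top minus bottom) after all stages.

    stages[i] = (d0, d1): delay difference added by stage i when the paths go
    straight (bit 0) or cross (bit 1). ff = (src, dst): an intermediate arbiter
    after stage src sets the challenge bit of stage dst (dst > src).
    """
    c = list(challenge)
    delta = 0.0
    for i, (d0, d1) in enumerate(stages):
        delta = delta + d0 if c[i] == 0 else -delta + d1  # crossing swaps the two paths
        if ff is not None and i == ff[0]:
            c[ff[1]] = 1 if delta < 0 else 0
    return delta


def response(challenge, stages, ff=None):
    """Arbiter output: 1 if the top path wins the race (arbitrary convention)."""
    return 1 if final_delay(challenge, stages, ff) < 0 else 0


def linear_model(challenge, stages):
    """Additive delay model: delta = w^T phi(c), phi_i = prod_{j>=i} (1 - 2 c_j), phi_n = 1."""
    n = len(stages)
    alpha = [(d0 + d1) / 2 for d0, d1 in stages]
    beta = [(d0 - d1) / 2 for d0, d1 in stages]
    w = [beta[0]] + [beta[i] + alpha[i - 1] for i in range(1, n)] + [alpha[n - 1]]
    phi = [1] * (n + 1)
    for i in range(n - 1, -1, -1):
        phi[i] = phi[i + 1] * (1 - 2 * challenge[i])
    return sum(wi * p for wi, p in zip(w, phi))


if __name__ == "__main__":
    rng = random.Random(7)
    stages = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(16)]  # placeholder delays
    c = [rng.randint(0, 1) for _ in range(16)]
    print(final_delay(c, stages), linear_model(c, stages))  # equal up to rounding
    print(response(c, stages), response(c, stages, ff=(3, 10)))
```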

Why reconfigurable PUFs? Reconfigurability is desirable: 1. Application needs: updatable authentication keys 2. Improving the security, as the challenge-response behaviors can be updated (against modeling attacks).

Solutions for reconfigurability
• Challenge-like reconfiguration inputs: vulnerable to attacks and poor performance
• Reconfigurable RO silicon PUF: the frequencies of the ring oscillators can be measured by attackers
• FPGA based: hard to implement (requires lower-level design detail and symmetrical routing)

Conclusions • Wireless communications systems for body area networks will grow significantly • Biomedical monitoring systems and drug delivery systems will grow • Low-power DSP for biomedical monitoring will grow • IC chip security by PUFs for biomedical systems



Acknowledgements • Chuan Zhang, Bo Yuan (Polar) • Te-Lung Kung (Wireless BAN) • Manohar Ayinala (SVM, FFT) • Yun-Sang Park, Lan Luo, Prof. T. Netoff (Epilepsy) • Yingjie Lao (PUF)

