Digital Signal Processing for Embedded Communications and Biomedical Systems
Keshab K. Parhi Distinguished McKnight University Professor University of Minnesota, Minneapolis http://www.ece.umn.edu/users/parhi May 23, 2012
OUTLINE
• Communications Systems
  - Folding
  - Polar Decoders
• Biomedical Systems
  - Communication
  - Feature Computation and Classification
  - Monitoring
• IC Chip Security by PUF
4/30/2012
Wireless Phone Timeline • http://gizmodo.com/357895/the-analog-cellphone-timeline
Folding Transformation
• 3rd-order IIR filter
• See Parhi, VLSI Digital Signal Processing Systems, Wiley, 1999
• A possible folding set: A = {A0, A1, A2, A3}, M = {M0, M1, M2, M3}
Folding Transformation (Cont’d)
• Folded 3rd-order IIR filter
• Multiple algorithm operations are time-multiplexed onto a single functional unit
• Area reduction!
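Folding is governed by the folding equation from the cited text: an edge U -> V carrying w(e) delays needs D_F(U -> V) = N·w(e) - P_U + v - u delays in the folded circuit, where N is the folding factor, u and v are the folding orders (time slots) of U and V, and P_U is the number of pipeline stages in the unit executing U. A negative D_F means the DFG must be retimed before folding. The sketch below evaluates the equation for a few hypothetical edges; the edge list and pipeline depths are assumptions, not read from the slide's figure.

```python
from dataclasses import dataclass


@dataclass
class Edge:
    src: str    # source operation, e.g. "A0"
    dst: str    # destination operation, e.g. "M1"
    w: int      # delays on the edge in the original DFG
    u: int      # folding order (time slot) of the source operation
    v: int      # folding order (time slot) of the destination operation
    p_src: int  # pipeline stages P_U of the functional unit executing the source


def folded_delay(e: Edge, n: int) -> int:
    """Folding equation: D_F(U -> V) = N * w(e) - P_U + v - u."""
    return n * e.w - e.p_src + e.v - e.u


def fold(edges, n):
    """Return the folded delay of every edge; reject folding sets that need retiming."""
    result = {}
    for e in edges:
        d = folded_delay(e, n)
        if d < 0:
            raise ValueError(f"{e.src}->{e.dst}: D_F = {d} < 0, retime the DFG first")
        result[(e.src, e.dst)] = d
    return result


# Hypothetical edges for N = 4 (one adder and one multiplier, 1 pipeline stage each)
edges = [
    Edge("A0", "M1", w=1, u=0, v=1, p_src=1),
    Edge("M1", "A2", w=0, u=1, v=2, p_src=1),
    Edge("A2", "M3", w=1, u=2, v=3, p_src=1),
    Edge("M3", "A0", w=1, u=3, v=0, p_src=1),
]
print(fold(edges, 4))  # {('A0', 'M1'): 4, ('M1', 'A2'): 0, ('A2', 'M3'): 4, ('M3', 'A0'): 0}
```

Here the time slot of each operation equals its subscript, as in the folding set A = {A0, A1, A2, A3}, M = {M0, M1, M2, M3}.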
Folding Transformation (Cont’d)
• 6th-order IIR filter (cascade of two 3rd-order IIR filters)
• Can also be folded into 1 multiplier and 1 adder
• A possible folding set with interleaved ordering: A = {A0, A0’, A1, A1’, A2, A2’, A3, A3’}, M = {M0, M0’, M1, M1’, M2, M2’, M3, M3’}
Folding Transformation (Cont’d)
• Folded 6th-order IIR filter
• More pipelining -> low power, high speed
• Hierarchical Folding Algorithm: each delay D becomes 2D, and switching time i becomes switching times 2i and 2i+1
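The hierarchical rule can be checked against the folding equation D_F = N·w(e) - P_U + v - u. Interleaving two copies of the 3rd-order filter doubles the folding factor (N -> 2N) and moves an operation from slot i to slot 2i (first filter) or 2i+1 (primed filter), so an edge inside one copy needs 2N·w(e) - P_U + 2(v - u) delays, which equals 2·D_F + P_U. With unpipelined units (P_U = 0) every delay D simply becomes 2D, and the doubled delays leave room for more pipelining. This sketch covers only edges inside one copy (edges between the two cascaded filters are not modeled); the function names are hypothetical.

```python
def folded_delay(n, w, u, v, p):
    """Folding equation: D_F = N*w - P_U + v - u."""
    return n * w - p + v - u


def interleaved_slots(i):
    """Slot i of the single folded filter becomes slots 2i (A_i) and 2i+1 (A_i')."""
    return 2 * i, 2 * i + 1


def interleaved_delay(n, w, u, v, p, copy=0):
    """Folded delay of an edge inside copy 0 (unprimed) or copy 1 (primed) of the cascade."""
    u2 = interleaved_slots(u)[copy]
    v2 = interleaved_slots(v)[copy]
    return folded_delay(2 * n, w, u2, v2, p)


# Example: N = 4, edge with one delay from slot 3 to slot 1, unpipelined units
print(folded_delay(4, 1, 3, 1, 0), interleaved_delay(4, 1, 3, 1, 0))  # 2 4
```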
Advances in Coding Theory • Turbo Codes • LDPC Codes • Polar Codes (Most Recent)
What are polar codes?
[Figure: performance of polar codes under successive cancellation, list decoding, list + CRC-16, and systematic + list + CRC-16 decoding compared with WiMax turbo and WiMax LDPC codes; application scenarios: point-to-point, broadcast, and wiretap channels]
• Arıkan introduced polar coding in his breakthrough paper.
• Polar codes provably achieve the capacity of symmetric binary-input memoryless channels.
• They are applicable in a diverse set of scenarios.
E. Arıkan, “Channel polarization: a method for constructing capacity-achieving codes for symmetric binary-input memoryless channels,” IEEE Trans. on Inf. Theory, vol. 55, no. 7, pp. 3051-3073, July 2009.
(Plot from UCSD web link.)
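For concreteness, a polar codeword is x = u·F^{⊗n} over GF(2), with F = [[1, 0], [1, 1]], N = 2^n, and the frozen positions of u set to 0. Arıkan's generator matrix also contains a bit-reversal permutation; the sketch below omits it (a common variant) and computes the transform with log2 N stages of XOR butterflies, the same FFT-like pattern that the SC decoder inherits. Function names are illustrative.

```python
def polar_encode(u):
    """x = u * F^(kron n) over GF(2), F = [[1, 0], [1, 1]], without bit reversal."""
    n = len(u)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    x = list(u)
    step = 1
    while step < n:                       # log2(n) butterfly stages
        for start in range(0, n, 2 * step):
            for j in range(start, start + step):
                x[j] ^= x[j + step]       # upper branch XORs, lower branch passes through
        step *= 2
    return x


def kron_generator(n):
    """Explicit F^(kron log2 n) as a list of rows, used only to check the butterflies."""
    g = [[1]]
    while len(g) < n:
        m = len(g)
        g = [row + [0] * m for row in g] + [row + row for row in g]
    return g


print(polar_encode([0, 0, 0, 1]))  # [1, 1, 1, 1]
```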
Successive Cancellation (SC) decoding
[Figure: SC decoding graph for N = 8 with three stages (Stage 1 to Stage 3) of Type I and Type II PEs; channel LLRs $L_1^{(1)}(y_1), \ldots, L_1^{(1)}(y_8)$ enter the graph, and the outputs $L_8^{(i)}(y_1^8, \hat{u}_1^{i-1})$ produce the decisions $\hat{u}_i$ in the order i = 1, 5, 3, 7, 2, 6, 4, 8]
Successive cancellation (SC) is one of the most popular decoding algorithms for polar codes. Its FFT-like structure makes it well suited to VLSI implementation.
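Type I PE: $L_N^{(2i)}(y_1^N, \hat{u}_1^{2i-1}) = (-1)^{\hat{u}_{2i-1}} L_{N/2}^{(i)}(y_1^{N/2}, \hat{u}_{1,o}^{2i-2} \oplus \hat{u}_{1,e}^{2i-2}) + L_{N/2}^{(i)}(y_{N/2+1}^{N}, \hat{u}_{1,e}^{2i-2})$,
Type II PE: $L_N^{(2i-1)}(y_1^N, \hat{u}_1^{2i-2}) = 2\,\operatorname{artanh}\{\tanh[L_{N/2}^{(i)}(y_1^{N/2}, \hat{u}_{1,o}^{2i-2} \oplus \hat{u}_{1,e}^{2i-2})/2] \cdot \tanh[L_{N/2}^{(i)}(y_{N/2+1}^{N}, \hat{u}_{1,e}^{2i-2})/2]\}$,
where $\hat{u}_{1,o}^{2i-2}$ and $\hat{u}_{1,e}^{2i-2}$ are the odd- and even-indexed subvectors of $\hat{u}_1^{2i-2}$.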
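A floating-point software model shows how the two PE equations drive SC decoding. In the recursive sketch below, `type2_pe` is the artanh/tanh update and `type1_pe` the sign-flip-and-add update. Assumptions: LLRs are log(P(0)/P(1)); the encoder is x = u·F^{⊗n} without bit reversal, so the recursion splits û into halves instead of the odd/even subvectors used in the equations; and the frozen set {0, 1, 2, 4} (0-indexed) for N = 8 is a hypothetical example. This models the algorithm, not the hardware.

```python
import math


def type2_pe(a, b):
    """Type II PE: 2 * artanh(tanh(a/2) * tanh(b/2))."""
    t = math.tanh(a / 2) * math.tanh(b / 2)
    t = max(-1 + 1e-15, min(1 - 1e-15, t))  # keep atanh finite for very large LLRs
    return 2 * math.atanh(t)


def type1_pe(a, b, u):
    """Type I PE: (-1)^u * a + b, where u is a previously decided (partial-sum) bit."""
    return (-a if u else a) + b


def sc_decode(llr, frozen):
    """Recursive SC decoding; returns (u_hat, x_hat), x_hat being the re-encoded bits."""
    n = len(llr)
    if n == 1:
        bit = 0 if frozen[0] or llr[0] >= 0 else 1  # frozen bits are always 0
        return [bit], [bit]
    half = n // 2
    upper, lower = llr[:half], llr[half:]
    # First half of u: Type II PEs combine the two halves of the LLRs
    u_first, v_hat = sc_decode([type2_pe(a, b) for a, b in zip(upper, lower)], frozen[:half])
    # Second half of u: Type I PEs, using the partial sums v_hat just decided
    u_second, w_hat = sc_decode(
        [type1_pe(a, b, v) for a, b, v in zip(upper, lower, v_hat)], frozen[half:])
    x_hat = [v ^ w for v, w in zip(v_hat, w_hat)] + w_hat
    return u_first + u_second, x_hat


def polar_encode(u):
    """x = u * F^(kron n) over GF(2), no bit reversal (matches sc_decode's recursion)."""
    x = list(u)
    step = 1
    while step < len(x):
        for start in range(0, len(x), 2 * step):
            for j in range(start, start + step):
                x[j] ^= x[j + step]
        step *= 2
    return x


frozen = [True, True, True, False, True, False, False, False]  # hypothetical frozen set
u = [0, 0, 0, 1, 0, 0, 1, 1]                     # information bits at positions 3, 5, 6, 7
x = polar_encode(u)
llr = [2.0 if bit == 0 else -2.0 for bit in x]   # noiseless channel LLRs
u_hat, _ = sc_decode(llr, frozen)
print(u_hat == u)  # True
```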
The drawback is that the SC decoding latency is 2(N-1) clock cycles, while in practice code lengths N greater than 2^10 are required.
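The 2(N-1) figure follows from a simple recurrence, assuming one clock cycle per PE stage (all PEs of a stage work in parallel) and hard decisions that cost no extra cycle: a length-n sub-decoder spends one cycle on Type II PEs, decodes its first half, spends one cycle on Type I PEs, and decodes its second half, so T(n) = 2T(n/2) + 2 with T(1) = 0. For N = 8 this gives 14 cycles; for N = 2^10 it is already 2046.

```python
def tree_latency(n):
    """Cycles of a length-n tree SC decoder: T(n) = 2*T(n/2) + 2, T(1) = 0.

    Assumes one cycle per PE stage and zero-cycle hard decisions.
    """
    if n == 1:
        return 0
    return 2 * tree_latency(n // 2) + 2


for n in (8, 1024, 4096):
    print(n, tree_latency(n))  # prints 8 14, then 1024 2046, then 4096 8190
```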
How to reduce the latency?
Data flow graph (DFG) analysis
[Figure: SC decoding graph for N = 8 with each PE marked by a red label (A1–A4 and B1–B4 in Stage 1, C1–C4 and D1–D4 in Stage 2, E1–E4 and F1–F4 in Stage 3), and the corresponding DFG with delay elements D]
• Mark each PE with red labels as indicated.
• Get the DFG for it.
• The derived DFG is single-rate.
• Now we are able to derive the decoder architectures.
Latency-reduced architecture
• First, we derive a multi-rate version of the previous DFG.
• Then, using look-ahead, it can be further refined as follows.
[Figure: multi-rate DFG with merged nodes A/B (Stage 1), C/D (Stage 2), and E/F (Stage 3), annotated with the clock cycles at which each PE group is active: A {1}, B {8}, C {2, 9}, D {5, 12}, E {3, 6, 10, 13}, F {4, 7, 11, 14}; and its look-ahead refinement]
C. Zhang, B. Yuan and K.K. Parhi, IEEE ICC 2012
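The activation sets of the multi-rate DFG can be reproduced by unrolling the SC recursion for N = 8, assuming one clock cycle per PE stage and zero-cycle hard decisions. The sketch records, per stage, the cycles at which Type II PEs (A, C, E) and Type I PEs (B, D, F) fire; the letter mapping follows the red labels on the DFG, and the function name is hypothetical.

```python
def sc_schedule(n):
    """Map stage -> (Type II cycles, Type I cycles) for a length-n tree SC decoder.

    Assumes one clock cycle per PE stage and zero-cycle hard decisions.
    """
    schedule = {}
    clock = 0

    def visit(size, stage):
        nonlocal clock
        if size == 1:
            return  # leaf: hard decision on u_i
        type2_cycles, type1_cycles = schedule.setdefault(stage, ([], []))
        clock += 1
        type2_cycles.append(clock)  # Type II PEs (A, C, E for stages 1, 2, 3)
        visit(size // 2, stage + 1)
        clock += 1
        type1_cycles.append(clock)  # Type I PEs (B, D, F for stages 1, 2, 3)
        visit(size // 2, stage + 1)

    visit(n, 1)
    return schedule


for stage, (t2, t1) in sorted(sc_schedule(8).items()):
    print(stage, t2, t1)
# 1 [1] [8]
# 2 [2, 9] [5, 12]
# 3 [3, 6, 10, 13] [4, 7, 11, 14]
```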
Latency-reduced architecture
[Figure: latency-reduced SC decoder for N = 8 (Stages 1 to 3) built from merged PEs with inputs I1, I2 and outputs O1, O2, O3, feedback delay elements D, and multiplexers selected by the decided bits; inputs $L_1^{(1)}(y_1), \ldots, L_1^{(1)}(y_8)$]
• Merged PEs are used instead of separate Type I and Type II PEs.
• Only half of the delay elements are employed by the feedback part.
• The decoding latency is only 50% of that of the tree architecture.
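One hedged way to see where the 50% comes from: suppose each merged PE evaluates the Type II output together with both candidate Type I outputs (for û = 0 and û = 1) as a look-ahead step. The later Type I step then reduces to a multiplexer selection once the bit is decided and costs no extra cycle, so the recurrence becomes T(n) = 2T(n/2) + 1, i.e. N-1 cycles instead of 2(N-1). The merged-PE interface below (two inputs, three outputs) is an assumption modeled on the I1/I2 and O1–O3 ports, not the published circuit.

```python
import math


def merged_pe(i1, i2):
    """Hypothetical merged PE: O1 = Type II output, O2/O3 = Type I output for u = 0 / u = 1."""
    t = math.tanh(i1 / 2) * math.tanh(i2 / 2)
    t = max(-1 + 1e-15, min(1 - 1e-15, t))
    o1 = 2 * math.atanh(t)
    o2 = i1 + i2    # (-1)^0 * i1 + i2
    o3 = -i1 + i2   # (-1)^1 * i1 + i2
    return o1, o2, o3


def select_type1(o2, o3, u):
    """Multiplexer that replaces the Type I PE once u is known (assumed zero cycles)."""
    return o3 if u else o2


def tree_latency(n):
    """Tree architecture: T(n) = 2*T(n/2) + 2."""
    return 0 if n == 1 else 2 * tree_latency(n // 2) + 2


def lookahead_latency(n):
    """Merged PEs with look-ahead: T(n) = 2*T(n/2) + 1."""
    return 0 if n == 1 else 2 * lookahead_latency(n // 2) + 1


print(tree_latency(1024), lookahead_latency(1024))  # 2046 1023
```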
Additional architectures
[Figure: further decoder architectures derived from the DFG, including Stage 3 and 3' variants, partial-parallel, fully pipelined, and folded designs, a 2-output design, and a design using RAM, built from merged PEs, delay elements D, and multiplexers]
• Based on the previous analysis of the DFG, numerous decoder architectures can be obtained for different applications.
• And more …
Body Area Network
Wireless Sensor Nodes in Healthcare • http://ieeewban.wordpress.com/author/mikuslaw/
Wireless BAN
WBAN Applications
• Chronic disease monitoring
• Episodic patient monitoring
• Patient alarm monitoring
• Elderly people monitoring
Biomedical monitoring systems
• Offline training: recordings from databases -> feature extraction -> feature selection -> classifier training -> classifier model
• Online detection: electrodes -> feature extraction (selected feature set) -> classification -> postprocessing -> drug delivery system / create an alert
• Feature extraction: spectral power, wavelets, auto-regressive coefficients, ICA
• Classification: linear SVM, non-linear SVM, Adaboost
• Postprocessing: moving average, Kalman
Closed-loop systems http://ieeewban.wordpress.com/author/mikuslaw/
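The postprocessing stage smooths per-window classifier decisions before an alert or drug-delivery action is triggered. A minimal sketch of the moving-average option (the Kalman option is not shown), assuming ±1 classifier outputs per window, a hypothetical window length of 5, and a hypothetical alarm threshold of 0.6:

```python
from collections import deque


def moving_average_alarm(decisions, window=5, threshold=0.6):
    """Causal moving average of +/-1 classifier outputs; True where an alert would be raised."""
    recent = deque(maxlen=window)   # keeps only the last `window` decisions
    alarms = []
    for d in decisions:
        recent.append(d)
        alarms.append(sum(recent) / len(recent) >= threshold)
    return alarms


decisions = [-1, 1, -1, -1, 1, 1, 1, 1, 1, -1]   # hypothetical per-window outputs
print(moving_average_alarm(decisions))
# [False, False, False, False, False, False, False, True, True, True]
```

The isolated positive decision at the second window does not raise an alarm; only a sustained run does, which is the point of smoothing before closing the loop.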
MIMO
Compared with the SISO case, channel capacity increases by a factor of roughly min{M, K} (at high SNR) when an M × K antenna array is used.
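The min{M, K} factor can be explored numerically with the standard capacity expression for a known channel matrix H and equal power per transmit antenna, C = log2 det(I_K + (SNR/M)·H·H^H) bit/s/Hz. Assumptions: M transmit and K receive antennas, i.i.d. unit-variance complex Gaussian (Rayleigh) entries, SNR as a linear ratio, and plain-Python linear algebra to stay dependency-free. The factor is a high-SNR statement; the ratio to log2(1 + SNR) approaches min{M, K} only slowly as SNR grows.

```python
import math
import random


def det(a):
    """Determinant by Gaussian elimination with partial pivoting (complex entries allowed)."""
    a = [row[:] for row in a]
    n = len(a)
    result = 1
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0:
            return 0
        if p != c:
            a[c], a[p] = a[p], a[c]
            result = -result
        result *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return result


def mimo_capacity(H, snr):
    """log2 det(I_K + (snr/M) H H^H) for a K x M channel, equal power per transmit antenna."""
    K, M = len(H), len(H[0])
    G = [[(1.0 if r == c else 0.0)
          + (snr / M) * sum(H[r][k] * H[c][k].conjugate() for k in range(M))
          for c in range(K)] for r in range(K)]
    return math.log2(abs(det(G)))


def ergodic_ratio(M, K, snr_db, trials=500, seed=0):
    """Average capacity over i.i.d. Rayleigh channels divided by the SISO AWGN log2(1 + snr)."""
    rng = random.Random(seed)
    snr = 10 ** (snr_db / 10)
    s = math.sqrt(0.5)  # real and imaginary std for unit-variance complex entries
    total = 0.0
    for _ in range(trials):
        H = [[complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(M)] for _ in range(K)]
        total += mimo_capacity(H, snr)
    return total / trials / math.log2(1 + snr)


if __name__ == "__main__":
    for snr_db in (0, 10, 20, 30, 40):
        print(snr_db, round(ergodic_ratio(4, 4, snr_db), 2))
```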
Problem Statement - Transmitter
Transmitted Signal
Image face from http://drawinghowtodraw.com/stepbystepdrawinglessons/wp-content/uploads/2010/01/cartoonfacesheads360degrees.png
Problem Statement – Access Point
• Timing? (signal arrival time) • Channel information? • Carrier frequency offset?
Received Signal
Solution • Preamble
• Access Point
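As a generic illustration of preamble-based timing at the access point (a textbook cross-correlation detector, not the optimized joint method of the papers cited in the Contributions), the receiver can slide a known preamble over the received samples and pick the offset with the largest correlation magnitude. The preamble, noise model, and function names are hypothetical.

```python
import random


def estimate_timing(rx, preamble):
    """Return the offset d that maximizes |sum_k conj(p[k]) * rx[d + k]|."""
    length = len(preamble)
    best_d, best_metric = 0, -1.0
    for d in range(len(rx) - length + 1):
        corr = sum(p.conjugate() * rx[d + k] for k, p in enumerate(preamble))
        if abs(corr) > best_metric:
            best_d, best_metric = d, abs(corr)
    return best_d


def simulate(offset, n_samples=200, length=32, noise_std=0.5, seed=1):
    """Hypothetical test signal: QPSK-like preamble starting at `offset`, plus complex Gaussian noise."""
    rng = random.Random(seed)
    preamble = [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) for _ in range(length)]
    rx = [complex(rng.gauss(0, noise_std), rng.gauss(0, noise_std)) for _ in range(n_samples)]
    for k, p in enumerate(preamble):
        rx[offset + k] += p
    return estimate_timing(rx, preamble)


print(simulate(57, noise_std=0.0))  # 57
```

With noise, the peak can be displaced at low SNR; that failure mode is what the contributions below quantify for existing methods.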
Contributions
• Achieves perfect timing synchronization when SNR ≤ 0 dB (100% chance of finding the correct timing)
  – Existing methods have at most a 40% chance of finding the correct timing at the same SNR
• Zero BER is achievable when SNR ≥ 0 dB
  – Existing methods have error floors and may not achieve zero BER at any SNR (even for SNR >> 0 dB)
Te-Lung Kung, Keshab Parhi, “Optimized Joint Timing Synchronization and Channel Estimation for OFDM Systems,” IEEE Wireless Communications Letters, on IEEExplore (Early Access).
Te-Lung Kung, Keshab Parhi, “Frequency Domain Symbol Synchronization for OFDM Systems,” IEEE EIT, May 2011.
Support Vector Machines
• Most widely used classification algorithm
  – Training based on quadratic optimization
• Non-linear SVMs (kernel based)
  – Map $x \in \mathbb{R}^d$ to some high-dimensional space via $\phi$
  – The derived feature vectors are $\phi(x_j)$
  – The kernel function allows implicit calculation of dot products: $K(x_i, x_j) = \phi(x_i)^T \phi(x_j)$
  – Learn a linear separator in the high-dimensional space
• The final prediction is
$f(x) = \sum_{i: a_i > 0} a_i y_i \phi(x_i)^T \phi(x) + b = \sum_{i: a_i > 0} a_i y_i K(x_i, x) + b$
[Figure: Illustration of SVM]
SVM Classification
• Three popular kernels
• Linear: $K(x_i, x) = x_i^T x$
  $f(x) = \mathrm{sign}\left(\sum_{i: a_i > 0} a_i y_i x_i^T x + b\right) = \mathrm{sign}(w^T x + b)$, where $w = \sum_i a_i y_i x_i$
• Polynomial: $K(x_i, x) = (x_i^T x + 1)^p$
  For $p = 2$: $K(x_i, x) = (z_i^T z)^2 = z^T z_i z_i^T z$, where $z = [x^T\ 1]^T$
SVM Classification
• Polynomial ($p = 2$): $f(x) = \mathrm{sign}(z^T W z + b)$, where $W = \sum_{i: a_i > 0} a_i y_i z_i z_i^T$
• RBF kernel: $K(x_i, x) = e^{-\gamma \|x - x_i\|^2}$
  $f(x) = \mathrm{sign}\left(\sum_{i: a_i > 0} a_i y_i K(x_i, x) + b\right)$
• Further simplifications are not possible for RBF
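The simplifications can be checked numerically: for the linear kernel the support vectors collapse into one weight vector w, and for the p = 2 polynomial kernel into a (d+1)×(d+1) matrix W, so classification no longer loops over support vectors; the RBF kernel has no such collapse. The support vectors, coefficients a_i·y_i, bias b, and γ below are made-up values for illustration.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(s, x):
    return dot(s, x)


def poly2_kernel(s, x):
    return (dot(s, x) + 1.0) ** 2


def rbf_kernel(s, x, gamma=0.5):
    # gamma is an assumed hyperparameter value
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(s, x)))


def kernel_score(svs, coef, b, x, kernel):
    """sum_i a_i y_i K(x_i, x) + b: loops over every support vector (required for RBF)."""
    return sum(c * kernel(s, x) for s, c in zip(svs, coef)) + b


def linear_w(svs, coef):
    """w = sum_i a_i y_i x_i, precomputed once after training."""
    return [sum(c * s[k] for s, c in zip(svs, coef)) for k in range(len(svs[0]))]


def poly2_W(svs, coef):
    """W = sum_i a_i y_i z_i z_i^T with z_i = [x_i, 1]."""
    zs = [list(s) + [1.0] for s in svs]
    n = len(zs[0])
    return [[sum(c * z[r] * z[q] for z, c in zip(zs, coef)) for q in range(n)] for r in range(n)]


def poly2_score(W, b, x):
    """z^T W z + b with z = [x, 1]: cost independent of the number of support vectors."""
    z = list(x) + [1.0]
    return sum(z[r] * W[r][q] * z[q] for r in range(len(z)) for q in range(len(z))) + b


def classify(score):
    return 1 if score >= 0 else -1


# Hypothetical trained model: 3 support vectors in d = 2, coef[i] = a_i * y_i
svs = [[1.0, 2.0], [-0.5, 0.3], [2.0, -1.0]]
coef = [0.7, -1.2, 0.4]
b = -0.1
x = [0.3, -0.8]
print(classify(dot(linear_w(svs, coef), x) + b),
      classify(poly2_score(poly2_W(svs, coef), b, x)),
      classify(kernel_score(svs, coef, b, x, rbf_kernel)))
```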
Computational complexity
Kernel             #words (memory)   #additions   #multiplications
Linear             d                 d            d
Polynomial (p=2)   d^2               d^2          d(d+1)
RBF                Nsv(d+1)          2Nsv·d       Nsv·d
* Additional Nsv exponential operations for RBF
• Complexity depends on the number of support vectors (Nsv) and the number of dimensions (d)
Reducing the complexity
• Number of support vectors (Nsv)
  – Reduced SVM (RSVM) can be used
  – The number of SVs is decreased during training
• Feature dimensionality (d)
  – Feature selection algorithms
  – SVM-RFE (Recursive Feature Elimination)
  – Adaboost, HPD, etc.
• Optimizing the hardware
  – MAC and exponent operations
  – Memory requirements depend on word length
Configurable SVM processor
SVM Architectures: Energy Consumption
FFT Architectures: Prior Designs
• 4-parallel delay-feedback designs (M. Shin et al., 2008; Yuan Chen et al., 2008)
• Contain 4 datapaths
DIF Design
• 4-parallel feed-forward design (DIF)
• Two datapaths, processing 2 samples each
• Requires 3N/2 delay elements
• Hardware utilization is 100%
DIF Design • 8-parallel feed-forward design
DIT 4-parallel Architecture
• N-point FFT requires
  – N-4 delay elements
  – 4 log2 N complex adders
  – #multipliers depends on the algorithm
[Figure annotation: No delays at this stage]
DIT 8-parallel Architecture
• Requires N-8 delay elements
[Figure annotation: No delays at two stages]
DIT 128-point FFT Architecture
• Hardware complexity
  – Complex adders: 28
  – Complex multipliers: 4+0.41
  – Delay elements: 124
M. Ayinala, M. Brown, K.K. Parhi, IEEE Trans. VLSI Systems, June 2012 (patent)
M. Ayinala, K.K. Parhi, ACM Great Lakes Symp., Utah, May 2012
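The 128-point counts are consistent with the 4-parallel DIT formulas: N-4 = 124 delay elements and 4·log2 N = 28 complex adders. The small calculator below only restates the slide formulas (including 3N/2 delays for the 4-parallel DIF design and N-8 for the 8-parallel DIT design) so the designs can be compared for other N; multiplier counts depend on the algorithm and are not modeled. Function names are hypothetical.

```python
import math


def dit_4parallel_cost(n):
    """4-parallel DIT FFT: N - 4 delay elements, 4 * log2(N) complex adders."""
    stages = round(math.log2(n))
    if n < 4 or 2 ** stages != n:
        raise ValueError("N must be a power of two, N >= 4")
    return {"delay_elements": n - 4, "complex_adders": 4 * stages}


def dit_8parallel_delays(n):
    """8-parallel DIT FFT: N - 8 delay elements."""
    return n - 8


def dif_4parallel_delays(n):
    """4-parallel feed-forward DIF design: 3N/2 delay elements."""
    return 3 * n // 2


print(dit_4parallel_cost(128))    # {'delay_elements': 124, 'complex_adders': 28}
print(dif_4parallel_delays(128))  # 192
```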
Seizure prediction
[Figure: (a) open-loop and (b) closed-loop seizure prediction systems]
Seizure Prediction
• Objective: patient-specific prediction of seizures (5 min ahead) from the EEG signal (6 electrodes)
• Issues: unbalanced data, feature selection
[Figure: input pattern -> feature extraction -> X -> classifier -> decision (class label)]
• System implementation details:
  – Features: power measured in 9 spectral bands for 4 differential channels, for a total of 9 × 4 = 36 features
  – Classifier: Adaboost with decision stumps
Y. Park, L. Luo, K. Parhi, T. Netoff, Epilepsia, Oct. 2011
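Since the classifier is Adaboost with decision stumps, here is a compact sketch of that algorithm (discrete AdaBoost with an exhaustive stump search). In the seizure-prediction setting each row of X would hold the 36 band-power features, with labels +1 (preictal) and -1 (interictal); the 2-feature toy data below is made up, and the class-imbalance issue listed above is not handled.

```python
import math


def stump_predict(x, feature, threshold, polarity):
    """Decision stump: polarity if x[feature] > threshold, else -polarity."""
    return polarity if x[feature] > threshold else -polarity


def best_stump(X, y, w):
    """Exhaustively pick the stump with the smallest weighted error."""
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
        for t in thresholds:
            for p in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w) if stump_predict(xi, j, t, p) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, p)
    return best


def adaboost_train(X, y, rounds=20):
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, j, t, p = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)   # keep alpha finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, t, p))
        if err <= 1e-10:                         # a perfect stump: stop early
            break
        # Up-weight misclassified samples, down-weight correct ones, renormalize
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, j, t, p))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def adaboost_predict(model, x):
    score = sum(a * stump_predict(x, j, t, p) for a, j, t, p in model)
    return 1 if score >= 0 else -1


# Made-up 2-feature data: +1 = preictal, -1 = interictal
X = [[0.2, 3.1], [0.4, 2.0], [1.5, 2.5], [1.9, 0.7], [0.3, 0.9], [1.7, 3.3]]
y = [-1, -1, 1, 1, -1, 1]
model = adaboost_train(X, y)
print([adaboost_predict(model, xi) for xi in X])  # [-1, -1, 1, 1, -1, 1]
```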
EEG Data for Classification
• Parts of the EEG data are identified by medical experts: ictal, preictal (+), and interictal (-)
• Preictal and interictal data are used for classification
• Each data sample is a ~20 s moving window
[Figure: EEG timeline with preictal (Class +1) and interictal (Class -1) segments separated by a gap of at least 1 hour]
Seizure Prediction
Seizure Prediction
Physical Unclonable Functions (PUFs)
“It is estimated that as much as 10% of all high-tech products sold globally are counterfeit which leads to a conservative estimate of $100 billion of revenue loss.” [Guajardo et al., 2008]
[Figure: device cloning; side-channel attack]
• Security Challenges
  – Computing devices are becoming physically exposed
  – Adversaries may physically tamper with the devices and extract secret keys from non-volatile memory
  – Software-only protections are not enough
What is a PUF?
• Extracts secret keys from complex physical objects
• Due to manufacturing process variations, no two integrated circuits are identical, even with the same layout
[Figure: physical objects -> process variations -> PUF -> unpredictable behavior, easy to evaluate, hard to clone]
Unclonable anti-counterfeiting marks for ICs!
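Silicon MUX PUF
[Figure: challenge bits select the paths through a chain of MUX stages; the two racing signals feed an arbiter (D flip-flop with inputs D and G, output Q) that produces the response]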
• All the multiplexers are identically designed. • Each challenge creates two paths through the circuit. • The response is generated by the racing result of the two paths. • No special fabrication needed.
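The racing behavior is commonly described by an additive delay model: each MUX stage adds a delay difference that depends on its challenge bit, and a crossed stage swaps the two paths, negating the difference accumulated so far. The sketch simulates that race with Gaussian stage delays (a stand-in for process variation; all values hypothetical) and builds weights w such that the final difference equals w·Φ(c), with parity features Φ_i = Π_{j≥i}(1 - 2c_j). That linearity is why the plain MUX PUF is called a linear additive delay model and why it is exposed to modeling attacks.

```python
import random


def make_chip(n_stages, seed):
    """Hypothetical chip: Gaussian delay differences (straight, crossed) for each MUX stage."""
    rng = random.Random(seed)
    return [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_stages)]


def delay_difference(chip, challenge):
    """Race the two paths stage by stage; a crossed stage (c = 1) swaps them."""
    delta = 0.0
    for (d_straight, d_cross), c in zip(chip, challenge):
        delta = delta + d_straight if c == 0 else -delta + d_cross
    return delta


def response(chip, challenge):
    """Arbiter output: 1 if the delay difference is positive (polarity is a convention)."""
    return 1 if delay_difference(chip, challenge) > 0 else 0


def parity_features(challenge):
    """Phi_i = prod_{j >= i} (1 - 2 c_j) for every stage, plus a constant 1."""
    phi = [1] * (len(challenge) + 1)
    for i in range(len(challenge) - 1, -1, -1):
        phi[i] = phi[i + 1] * (1 - 2 * challenge[i])
    return phi


def linear_weights(chip):
    """Weights w with delay_difference(chip, c) == w . parity_features(c) for every c."""
    w = [0.0] * (len(chip) + 1)
    for i, (d_straight, d_cross) in enumerate(chip):
        w[i] += (d_straight - d_cross) / 2
        w[i + 1] += (d_straight + d_cross) / 2
    return w


chip = make_chip(16, seed=7)
print(response(chip, [0, 1] * 8))
```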
Characteristics of PUFs • Security – Uniqueness: inter-chip variation – Unclonability: randomness – Unpredictability: hard to model
• Reliability – Intra-chip variation – Authentication robustness (add extra processing circuits, e.g., error correcting techniques)
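Uniqueness and reliability are usually quantified with fractional Hamming distances: the inter-chip distance between two chips' responses to the same challenges (ideally about 50%) and the intra-chip distance between repeated readings of one chip (ideally 0%). A minimal sketch of the two metrics; the function names and the averaging convention are assumptions.

```python
from itertools import combinations


def fractional_hd(r1, r2):
    """Fraction of positions where two equal-length response vectors differ."""
    if len(r1) != len(r2):
        raise ValueError("response vectors must have equal length")
    return sum(a != b for a, b in zip(r1, r2)) / len(r1)


def uniqueness(responses_per_chip):
    """Average pairwise inter-chip HD over all chip pairs (ideal: 0.5)."""
    pairs = list(combinations(responses_per_chip, 2))
    return sum(fractional_hd(a, b) for a, b in pairs) / len(pairs)


def intra_chip_hd(reference, remeasurements):
    """Average HD between a reference reading and noisy re-readings of the same chip (ideal: 0)."""
    return sum(fractional_hd(reference, r) for r in remeasurements) / len(remeasurements)


print(uniqueness([[0, 0, 0, 0], [0, 0, 1, 1], [1, 1, 1, 1]]))  # 0.6666666666666666
```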
Contributions
• Logically-reconfigurable PUFs (security)
• Systematic statistical analysis of (feed-forward) MUX PUFs
• Modified feed-forward path (reliability)
• Two-arbiter authentication scheme (reliability)
[1] Y. Lao and K.K. Parhi, "Novel Reconfigurable Silicon Physical Unclonable Functions," Proc. of 2011 Workshop on Foundations of Dependable and Secure Cyber-Physical Systems (FDSCPS-11), pp. 30-36, Chicago, April 2011.
[2] Y. Lao and K.K. Parhi, "Reconfigurable Architectures for Silicon Physical Unclonable Functions," Proc. of IEEE Int. Conference on Electro Information Technology, Mankato, May 2011.
Logically-reconfigurable PUFs
• Reconfigurable PUF circuit
  – Alter the model of the PUF circuit to update the challenge-response behavior, instead of re-mapping the challenge and response through pre- and/or post-processing
  – Several novel solutions, e.g., reconfigurable feed-forward MUX PUF, MUX and DeMUX PUF
[Figure: n-bit challenge -> reconfigurable PUF -> response]
Reconfigurable feed-forward MUX PUF • Ideas: using reconfigurable feed-forward path – Original MUX PUF can be modeled as a linear additive delay model – Feed-forward path: add nonlinearity to MUX PUF, improve the security
• Three types of feed-forward path: Cascade, Overlap, Separate – based on the beginning stage and the ending stage of the feed-forward path – experimental results have shown that the inter-chip and intra-chip characteristics of the 3 types are different – our statistical analysis has demonstrated that the mathematical models of the 3 types are different
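A feed-forward path can be modeled with a stage-by-stage race in which an intermediate arbiter after stage k drives the select bit of a later stage m. That select bit becomes a sign (nonlinear) function of part of the challenge, so the response is no longer a single linear threshold of the parity features. The sketch is an idealized behavioral model with hypothetical Gaussian stage delays and loop positions; in this model, cascade, overlap, and separate configurations differ only in where the (k, m) pairs begin and end.

```python
import random


def make_chip(n_stages, seed):
    """Hypothetical chip: Gaussian delay differences (straight, crossed) for each MUX stage."""
    rng = random.Random(seed)
    return [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_stages)]


def ff_response(chip, challenge, loops=()):
    """Idealized feed-forward MUX PUF.

    loops: (k, m) pairs; an intermediate arbiter after stage k sets the select bit of
    stage m (k < m), overriding the external challenge bit there. Arbiter timing is ignored.
    """
    drive = {}
    for k, m in loops:
        if not 0 <= k < m < len(chip):
            raise ValueError("each loop needs 0 <= k < m < n_stages")
        drive[m] = k
    bits = list(challenge)
    arbiter = []  # arbiter[i] = race winner after stage i
    delta = 0.0
    for i, (d_straight, d_cross) in enumerate(chip):
        if i in drive:
            bits[i] = arbiter[drive[i]]  # feed-forward bit replaces the challenge bit
        delta = delta + d_straight if bits[i] == 0 else -delta + d_cross
        arbiter.append(1 if delta > 0 else 0)
    return arbiter[-1]


chip = make_chip(32, seed=11)
print(ff_response(chip, [1, 0] * 16, loops=[(8, 20)]))
```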
Why reconfigurable PUFs? Reconfigurability is desirable: 1. Application needs: updatable authentication keys 2. Improving the security, as the challenge-response behaviors can be updated (against modeling attacks).
Solutions for reconfigurability
• Challenge-like approaches: vulnerable to attacks and poor performance
• Reconfigurable RO silicon PUF: the frequencies of the ring oscillators can be evaluated by attackers
• FPGA based: hard to implement (lower-level design detail, symmetrical routing)
Conclusions • Wireless communications systems for body area network will grow significantly • Biomedical monitoring systems and drug delivery systems will grow • Low-power DSP for biomedical monitoring will grow • IC Chip Security by PUFs for biomedical systems
Acknowledgements • Chuan Zhang, Bo Yuan (Polar) • Te-Lung Kung (Wireless BAN) • Manohar Ayinala (SVM, FFT) • Yun-Sang Park, Lan Luo, Prof. T. Netoff (Epilepsy) • Yingjie Lao (PUF)