A Parallel Hardware Architecture for Information-Theoretic Adaptive Filtering

HPRCTA 2010

November 14, 2010

Stefan Craciun, Dr. Alan D. George, Dr. Herman Lam, Dr. Jose C. Principe

NSF CHREC Center, ECE Department, University of Florida

Outline

- Introduction to Information-Theoretic Learning (ITL) and Adaptive Filters (AF)
  - Motivations
  - Background
- Hardware Architecture
  - Building Blocks
- Experiments & Results
  - Application: System Identification
  - Performance
- Conclusions

Information Theory & Adaptive Filters

- Information-Theoretic Learning (ITL)
  - Useful in the context of nonlinear, non-Gaussian signal processing
  - How to extract and quantify information
  - Entropy replaces variance as the measure of uncertainty
  - Superior results when applied to nonlinear system identification
- Adaptive Filters (AF)
  - Self-adjusting filter (changes its transfer function by changing the filter weights)
  - Adapts to changes in the input
  - Stable solution reached in a finite number of iterations

Project Description

- Digital Signal Processing domain
  - Adaptive Filters (AFs) are used when:
    - Statistics of the input signal are unknown
    - Required signal statistics are estimated through an iterative learning process
    - Internal parameters are adjusted to meet specific performance criteria
- Target application: System Identification
  - Approximate the transfer function of an unknown system
- Why RC (Reconfigurable Computing)?
  - Create a custom hardware architecture
  - Take advantage of the inherent parallelism within the algorithm
  - Accelerate the adaptation process

Challenges

- AF is a closed-loop system
  - Each iteration depends upon the previous one
  - Computations can only be parallelized within the feedback loop
- Most applications require 1,000s to 10,000s of iterations; over that many iterations a naive implementation can:
  - Easily lose precision over time
  - Never converge to the optimal solution
  - Deviate and never reach stable weights

Advantages

- Find optimal filter weights faster?
  - Yes, but that is not the most important benefit
  - Model an unknown system faster
- Target a much wider range of applications?
  - Crucial for real-time online learning
  - Computation time of the filter coefficients dictates the sampling rate
  - New samples cannot be processed until the previous iteration is complete
  - By accelerating one iteration we can increase the sampling frequency

New Approach

- Use the MEE instead of the MSE cost function to adapt the filter weights
  - MSE: the most popular criterion used in adaptation (LMS algorithm)
  - MEE requires considerably more computation within one iteration
  - MEE has demonstrated superior performance, at the cost of a substantial increase in computational complexity
  - RC: adapt a unique hardware architecture to MEE's computational requirements (a software sketch of the LMS baseline follows below)
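For contrast, here is a minimal NumPy sketch (an illustration, not the authors' FPGA code) of the LMS update under the MSE criterion; the step size `mu` is an assumed parameter:

```python
import numpy as np

def lms_update(w, x, d, mu=0.01):
    """One LMS (MSE-criterion) iteration: a single error and O(L)
    multiply-adds per sample, where L is the number of filter taps."""
    e = d - w @ x          # instantaneous error
    return w + mu * e * x  # stochastic gradient step on e^2
```

One LMS iteration touches a single error, whereas one MEE iteration must evaluate a kernel over every pair of errors in the window; the MEE counterpart is sketched after the AF Structure slide below.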

AF Structure

- Composed of three major blocks
- Adaptive FIR
  - Output: $y = \sum_{i=1}^{N} w_i\, x_i$
  - FIR output approaches the desired output as the weights converge to the optimal solution
- Cost Function and Learning Algorithm
  - Learning algorithm: updates the weights
  - Cost function defines the rules of adaptation via the information potential and its gradient (see the sketch after this slide):

    $V = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \kappa(e_i - e_j)$

    $\nabla V = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \kappa(e_i - e_j)\,(e_i - e_j)\,(x_i - x_j)$

  - $\kappa$ is a Gaussian kernel
  - Window size $N$ determines the number of computations: complexity $O(N^2)$
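A minimal NumPy sketch of the two formulas above (an illustration, not the FPGA datapath); the kernel width `sigma` is an assumed parameter:

```python
import numpy as np

def mee_cost_and_gradient(e, X, sigma=1.0):
    """Information potential V and its gradient for a window of N errors.
    e: (N,) error vector; X: (N, L) matrix of input vectors."""
    de = e[:, None] - e[None, :]          # e_i - e_j for all N^2 pairs
    dX = X[:, None, :] - X[None, :, :]    # x_i - x_j for all pairs
    k = np.exp(-de**2 / (2 * sigma**2))   # Gaussian kernel kappa(e_i - e_j)
    V = k.mean()                          # (1/N^2) * double sum of kappa
    gradV = ((k * de)[:, :, None] * dX).mean(axis=(0, 1))
    return V, gradV
```

The double sum over i and j is what makes a straightforward software implementation O(N^2) in the window size.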

Effects of Window Size on AF

- Window size has a direct effect on filter performance
  - Determines how smoothly and how quickly the weights converge
  - Determines the hardware resources required
  - Window size: the most influential variable affecting speedup
- Steep tradeoff between filter performance and computation time (in software)
- Proposed architecture transforms the time-performance tradeoff into a linear dependence
  - Goal: an RC custom architecture that reduces complexity from O(N^2) to O(N)
- Avoid losing precision over multiple iterations
  - Employ a hybrid-precision architecture
  - Mix of fixed-point and floating-point (single-precision) arithmetic

AF Building Blocks

- FIR Filter & MEE Cost Function
  - Hybrid fixed-point/floating-point (single-precision) operations
  - Balance: maintain precision while decreasing latency

Adaptive FIR

- Single-precision floating-point operations
- Simple pipelined implementation
- Rigorously studied in the literature (a streaming functional model follows below)
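As a reference point, a functional software model of the tapped-delay-line FIR; the FPGA version pipelines the multiply-accumulate chain in single-precision floating point, while this sketch only reproduces the behavior:

```python
from collections import deque
import numpy as np

def fir_stream(w, samples):
    """Functional model of the adaptive FIR: shift each new sample into
    a delay line and emit the dot product with the current weights."""
    taps = deque([0.0] * len(w), maxlen=len(w))
    for s in samples:
        taps.appendleft(s)                      # newest sample at tap 0
        yield float(np.dot(w, np.asarray(taps)))
```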

Pairwise Distance Block

- For a window size of N: compute N^2/2 pairwise error distances
- Errors become available sequentially
  - Requires N clock cycles to compute all N^2/2 subtractions
  - Fixed-point architecture (32-bit)
  - On the last clock cycle, perform the pairwise distance computations between the newest error e(n) and all previous errors in the window: e(n-1), e(n-2), ..., e(n-N+1) (see the sketch below)
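A functional sketch of this scheduling; the hardware uses a bank of parallel fixed-point subtractors, so this model only shows which pairs are produced on which cycle:

```python
def pairwise_distance_stream(errors, N):
    """As each error arrives, emit its distance to every previous error
    still in the window (one bank of parallel subtractors per cycle).
    Over N cycles this covers all N*(N-1)/2 ~ N^2/2 distinct pairs."""
    window = []
    for e in errors:
        yield [e - prev for prev in window]   # this cycle's distances
        window.append(e)
        if len(window) == N:                  # keep only the last N-1 errors
            window.pop(0)
```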

Gaussian Kernel

- Inputs come from the pairwise distance block
- For a window size of N: compute N^2/2 Gaussian kernels
- Use Altera's floating-point megafunctions:
  - Exponential function
  - Multiplications
- Accumulate the results (sketched below)
- Floating-point architecture

  $\nabla V = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \kappa(e_i - e_j)\,(e_i - e_j)\,(x_i - x_j)$
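Continuing the sketch above, the kernel stage consumes each cycle's distances, exponentiates, and accumulates; `sigma` is again an assumed parameter, and on the FPGA the exponential and multiplies are Altera floating-point megafunction instances:

```python
import math

def gaussian_kernel_block(distance_batches, sigma=1.0):
    """Evaluate kappa(e_i - e_j) for each incoming pairwise distance and
    accumulate; scaling the final sum by 1/N^2 yields V."""
    acc = 0.0
    for batch in distance_batches:            # one batch per clock cycle
        for d in batch:
            acc += math.exp(-d * d / (2 * sigma * sigma))
    return acc
```

For example, `gaussian_kernel_block(pairwise_distance_stream(errors, N))` reproduces the unnormalized double sum in V.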

Overall Architecture

- The software algorithm's run time is quadratically dependent upon window size
  - Impossible to obtain fast, smooth convergence without a large time penalty
- The proposed architecture transforms computation time vs. window size into a linear relationship (see the toy model below)
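A toy cycle-count model of that claim, assuming (as on the pairwise-distance slide) that the parallel datapath retires one cycle's worth of pair computations as each error arrives, while software evaluates the ~N^2/2 pairs one at a time; constant factors are omitted:

```python
def iteration_cycles(N, parallel=True):
    """Cycles per adaptation iteration in this toy model:
    linear in N for the parallel datapath, quadratic in software."""
    return N if parallel else N * N // 2

# Example: N = 100 -> 100 cycles (parallel) vs. 5000 (sequential).
```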

Experimental Testbeds

- Hardware
  - Novo-G @ NSF CHREC Center
    - Cluster of 24 + 1 servers (compute + head node)
    - 48 boards housed in 24 Linux compute servers
    - Each board: quad-FPGA GiDEL board on PCIe x8
    - Total of 192 Altera Stratix-III E260 FPGAs
  - Experiments run on only up to 2 GiDEL boards (@ 150 MHz)
    - Realistic scientific constraints limit problem size and thus scale
- Software
  - AMD 2.4 GHz Opteron with 4 GB DDR400
  - Optimized C code

Precision Results

- System Identification application
  - Procedure (a sketch of it follows below):
    - Given an unknown system (plant), the AF converges to the same transfer function (matching output = minimum error)
    - Input: a sequence of 2000 samples of white Gaussian noise
    - Observe how the weights change every iteration
- Precision comparison to software
  - Weights converge to the optimal values in the same number of iterations
  - No loss in precision when compared to software

Performance Analysis

- Window size plays the most important role; it controls:
  - The number of computations within the feedback loop (i.e., the number of parallelizable computations)
  - How fast and accurately the weights change (i.e., filter performance)
  - The hardware resources required (i.e., the number of AFs that fit on one FPGA)
- The number of AFs also influences speedup

[Table: fraction of total logic cells used vs. AF window size (10, 50, 100)]

Suggest Documents