A Parallel Hardware Architecture for Information-Theoretic Adaptive Filtering

HPRCTA 2010

November 14, 2010

Stefan Craciun, Dr. Alan D. George, Dr. Herman Lam, Dr. Jose C. Principe

NSF CHREC Center, ECE Department, University of Florida

Outline

- Introduction to Information-Theoretic Learning (ITL) and Adaptive Filters (AF)
  - Motivations
  - Background
- Hardware Architecture
  - Building Blocks
- Experiments & Results
  - Application: System Identification
  - Performance
- Conclusions

Information Theory & Adaptive Filters

- Information-Theoretic Learning (ITL)
  - Useful in the context of nonlinear, non-Gaussian signal processing
  - How to extract and quantify information
  - Entropy replaces variance as the measure of uncertainty
  - Superior results when applied to nonlinear system identification
- Adaptive Filters (AF)
  - Self-adjusting filter (changes its transfer function by changing the filter weights)
  - Adapts to changes in the input
  - Stable solution reached in a finite number of iterations

Project Description

- Digital Signal Processing domain
  - Adaptive Filters (AFs) are used when:
    - Statistics of the input signal are unknown
    - Required signal statistics are estimated through an iterative learning process
    - Internal parameters are adjusted to meet specific performance criteria
- Target application: System Identification
  - Approximate the transfer function of an unknown system
- Why RC (Reconfigurable Computing)?
  - Create a custom hardware architecture
  - Take advantage of the inherent parallelism within the algorithm
  - Accelerate the adaptation process

Challenges

- AF is a closed-loop system
  - Each iteration depends upon the previous one
  - Computations can only be parallelized within the feedback loop
- Most applications require 1,000s to 10,000s of iterations; over that many iterations a naive implementation can:
  - Easily lose precision over time
  - Never converge to the optimal solution
  - Deviate and never reach stable weights

Advantages

- Find optimal filter weights faster?
  - Yes, but that is not the most important benefit
  - Model an unknown system faster
- Target a much wider range of applications?
  - Crucial for real-time online learning
  - Computation time of the filter coefficients dictates the sampling rate
  - New samples cannot be processed until the previous iteration is complete
  - By accelerating one iteration we can increase the sampling frequency

New Approach

- Use the MEE instead of the MSE cost function to adapt the filter weights
  - MSE: the most popular criterion used in adaptation (LMS algorithm)
  - MEE requires considerably more computation within one iteration
  - MEE has demonstrated superior performance, at the cost of a substantial increase in computational complexity
  - RC: adapt a unique hardware architecture to MEE's computational requirements (a software sketch of the LMS baseline follows below)
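For contrast, here is a minimal NumPy sketch (an illustration, not the authors' FPGA code) of the LMS update under the MSE criterion; the step size `mu` is an assumed parameter:

```python
import numpy as np

def lms_update(w, x, d, mu=0.01):
    """One LMS (MSE-criterion) iteration: a single error and O(L)
    multiply-adds per sample, where L is the number of filter taps."""
    e = d - w @ x          # instantaneous error
    return w + mu * e * x  # stochastic gradient step on e^2
```

One LMS iteration touches a single error, whereas one MEE iteration must evaluate a kernel over every pair of errors in the window; the MEE counterpart is sketched after the AF Structure slide below.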

AF Structure

- Composed of three major blocks
- Adaptive FIR
  - Output: $y = \sum_{i=1}^{N} w_i\, x_i$
  - FIR output approaches the desired output as the weights converge to the optimal solution
- Cost Function and Learning Algorithm
  - Learning algorithm: updates the weights
  - Cost function defines the rules of adaptation via the information potential and its gradient (see the sketch after this slide):

    $V = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \kappa(e_i - e_j)$

    $\nabla V = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \kappa(e_i - e_j)\,(e_i - e_j)\,(x_i - x_j)$

  - $\kappa$ is a Gaussian kernel
  - Window size $N$ determines the number of computations: complexity $O(N^2)$
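A minimal NumPy sketch of the two formulas above (an illustration, not the FPGA datapath); the kernel width `sigma` is an assumed parameter:

```python
import numpy as np

def mee_cost_and_gradient(e, X, sigma=1.0):
    """Information potential V and its gradient for a window of N errors.
    e: (N,) error vector; X: (N, L) matrix of input vectors."""
    de = e[:, None] - e[None, :]          # e_i - e_j for all N^2 pairs
    dX = X[:, None, :] - X[None, :, :]    # x_i - x_j for all pairs
    k = np.exp(-de**2 / (2 * sigma**2))   # Gaussian kernel kappa(e_i - e_j)
    V = k.mean()                          # (1/N^2) * double sum of kappa
    gradV = ((k * de)[:, :, None] * dX).mean(axis=(0, 1))
    return V, gradV
```

The double sum over i and j is what makes a straightforward software implementation O(N^2) in the window size.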

Effects of Window Size on AF

- Window size has a direct effect on filter performance
  - Determines how smoothly and how quickly the weights converge
  - Determines the hardware resources required
  - Window size: the most influential variable affecting speedup
- Steep tradeoff between filter performance and computation time (in software)
- Proposed architecture transforms the time-performance tradeoff into a linear dependence
  - Goal: an RC custom architecture that reduces complexity from O(N^2) to O(N)
- Avoid losing precision over multiple iterations
  - Employ a hybrid-precision architecture
  - Mix of fixed-point and floating-point (single-precision) arithmetic

AF Building Blocks

- FIR Filter & MEE Cost Function
  - Hybrid fixed-point/floating-point (single-precision) operations
  - Balance: maintain precision while decreasing latency

Adaptive FIR

- Single-precision floating-point operations
- Simple pipelined implementation
- Rigorously studied in the literature (a streaming functional model follows below)
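As a reference point, a functional software model of the tapped-delay-line FIR; the FPGA version pipelines the multiply-accumulate chain in single-precision floating point, while this sketch only reproduces the behavior:

```python
from collections import deque
import numpy as np

def fir_stream(w, samples):
    """Functional model of the adaptive FIR: shift each new sample into
    a delay line and emit the dot product with the current weights."""
    taps = deque([0.0] * len(w), maxlen=len(w))
    for s in samples:
        taps.appendleft(s)                      # newest sample at tap 0
        yield float(np.dot(w, np.asarray(taps)))
```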

Pairwise Distance Block

- For a window size of N: compute N^2/2 pairwise error distances
- Errors become available sequentially
  - Requires N clock cycles to compute all N^2/2 subtractions
  - Fixed-point architecture (32-bit)
  - On the last clock cycle, perform the pairwise distance computations between the newest error e(n) and all previous errors in the window: e(n-1), e(n-2), ..., e(n-N+1) (see the sketch below)
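A functional sketch of this scheduling; the hardware uses a bank of parallel fixed-point subtractors, so this model only shows which pairs are produced on which cycle:

```python
def pairwise_distance_stream(errors, N):
    """As each error arrives, emit its distance to every previous error
    still in the window (one bank of parallel subtractors per cycle).
    Over N cycles this covers all N*(N-1)/2 ~ N^2/2 distinct pairs."""
    window = []
    for e in errors:
        yield [e - prev for prev in window]   # this cycle's distances
        window.append(e)
        if len(window) == N:                  # keep only the last N-1 errors
            window.pop(0)
```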

Gaussian Kernel

- Inputs come from the pairwise distance block
- For a window size of N: compute N^2/2 Gaussian kernels
- Use Altera's floating-point megafunctions:
  - Exponential function
  - Multiplications
- Accumulate the results (sketched below)
- Floating-point architecture

  $\nabla V = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \kappa(e_i - e_j)\,(e_i - e_j)\,(x_i - x_j)$
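Continuing the sketch above, the kernel stage consumes each cycle's distances, exponentiates, and accumulates; `sigma` is again an assumed parameter, and on the FPGA the exponential and multiplies are Altera floating-point megafunction instances:

```python
import math

def gaussian_kernel_block(distance_batches, sigma=1.0):
    """Evaluate kappa(e_i - e_j) for each incoming pairwise distance and
    accumulate; scaling the final sum by 1/N^2 yields V."""
    acc = 0.0
    for batch in distance_batches:            # one batch per clock cycle
        for d in batch:
            acc += math.exp(-d * d / (2 * sigma * sigma))
    return acc
```

For example, `gaussian_kernel_block(pairwise_distance_stream(errors, N))` reproduces the unnormalized double sum in V.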

Overall Architecture

- The software algorithm's run time is quadratically dependent upon window size
  - Impossible to obtain fast, smooth convergence without a large time penalty
- The proposed architecture transforms computation time vs. window size into a linear relationship (see the toy model below)
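A toy cycle-count model of that claim, assuming (as on the pairwise-distance slide) that the parallel datapath retires one cycle's worth of pair computations as each error arrives, while software evaluates the ~N^2/2 pairs one at a time; constant factors are omitted:

```python
def iteration_cycles(N, parallel=True):
    """Cycles per adaptation iteration in this toy model:
    linear in N for the parallel datapath, quadratic in software."""
    return N if parallel else N * N // 2

# Example: N = 100 -> 100 cycles (parallel) vs. 5000 (sequential).
```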

Experimental Testbeds

- Hardware
  - Novo-G @ NSF CHREC Center
    - Cluster of 24 + 1 servers (compute + head node)
    - 48 boards housed in 24 Linux compute servers
    - Each board: quad-FPGA GiDEL board on PCIe x8
    - Total of 192 Altera Stratix-III E260 FPGAs
  - Experiments run on only up to 2 GiDEL boards (@ 150 MHz)
    - Realistic scientific constraints limit problem size and thus scale
- Software
  - AMD 2.4 GHz Opteron with 4 GB DDR400
  - Optimized C code

Precision Results

- System Identification application
  - Procedure (a sketch of it follows below):
    - Given an unknown system (plant), the AF converges to the same transfer function (matching output = minimum error)
    - Input: a sequence of 2000 samples of white Gaussian noise
    - Observe how the weights change every iteration
- Precision comparison to software
  - Weights converge to the optimal values in the same number of iterations
  - No loss in precision when compared to software

Performance Analysis

- Window size plays the most important role; it controls:
  - The number of computations within the feedback loop (i.e., the number of parallelizable computations)
  - How fast and accurately the weights change (i.e., filter performance)
  - The hardware resources required (i.e., the number of AFs that fit on one FPGA)
- The number of AFs also influences speedup

[Table: fraction of total logic cells used vs. AF window size (10, 50, 100)]

Suggest Documents