A Parallel Hardware Architecture for Information-Theoretic Adaptive Filtering
HPRCTA 2010
November 14, 2010
Stefan Craciun, Dr. Alan D. George, Dr. Herman Lam, Dr. Jose C. Principe
NSF CHREC Center, ECE Department, University of Florida
Outline
Introduction to Information-Theoretic Learning (ITL) and Adaptive Filters (AF)
Motivations
Background
Hardware Architecture
Building Blocks
Experiments & Results
Application: System Identification
Performance
Conclusions
Information Theory & Adaptive Filters
Information-Theoretic Learning
ITL: useful in the context of nonlinear, non-Gaussian signal processing
How to extract and quantify information
Entropy replaces variance (measure of uncertainty)
Superior results when applied to nonlinear system identification
Adaptive Filters
Self-adjusting filter (changes its transfer function by changing filter weights)
Adapts to changes in input
Stable solution reached in a finite number of iterations
Project Description
Digital Signal Processing Domain
Adaptive Filters (AFs) used when:
Statistics of input signal are unknown
Estimate required signal statistics through iterative learning process
Adjust internal parameters to meet specific performance criteria
Target Application
System Identification
Approximate transfer function of unknown system
Why reconfigurable computing (RC)?
Create custom hardware architecture
Take advantage of inherent parallelism within algorithm
Accelerate adaptation process
Challenges
AF is closed-loop system
Each iteration depends upon the previous one
Can only parallelize computations within the feedback loop
Most applications require 1000s to 10000s of iterations
Easily lose precision over time
Never converge to optimal solution
Deviate and never reach stable weights
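As a rough illustration of how rounding error can build up over thousands of iterations, the Python sketch below adds the same value 10,000 times in emulated single precision and in double precision. The helper `to_float32`, the step size, and the iteration count are assumptions made for this example; they are not taken from the presented design.

```python
import struct

def to_float32(value):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]

def accumulate(step, iterations, single_precision):
    """Add `step` repeatedly, rounding to float32 after every addition if requested."""
    if single_precision:
        step = to_float32(step)
    total = 0.0
    for _ in range(iterations):
        total += step
        if single_precision:
            total = to_float32(total)
    return total

ITERATIONS = 10000  # hypothetical iteration count
STEP = 0.1          # hypothetical update size

# Absolute deviation from the exact result in each precision.
error_single = abs(accumulate(STEP, ITERATIONS, True) - ITERATIONS * STEP)
error_double = abs(accumulate(STEP, ITERATIONS, False) - ITERATIONS * STEP)
print(error_single, error_double)
```

The single-precision deviation is orders of magnitude larger than the double-precision one, which is the kind of drift a closed-loop filter has to guard against.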
Advantages
Find optimal filter weights faster?
Yes, but not most important
Model an unknown system faster
Target a much wider range of applications?
Crucial for real-time online learning
Computation time of filter coefficients dictates sampling rate
Cannot process new samples until previous iteration is complete
By accelerating one iteration we can increase sampling frequency
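The relationship between iteration time and sampling rate can be sketched in a few lines of Python. The function name and both iteration times below are hypothetical values chosen only to show the arithmetic, not measurements from this work.

```python
def max_sampling_rate_hz(iteration_time_s):
    """Highest sampling rate that lets each iteration finish before the next sample arrives."""
    return 1.0 / iteration_time_s

# Hypothetical iteration times, for illustration only.
slow_rate = max_sampling_rate_hz(1e-3)  # 1 ms per iteration
fast_rate = max_sampling_rate_hz(1e-4)  # 0.1 ms per iteration
print(slow_rate, fast_rate)
```

Under this simple model, a tenfold reduction in iteration time allows a tenfold higher sampling frequency.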
New Approach
Using minimum error entropy (MEE) instead of mean squared error (MSE) as the cost function to adapt filter weights
MSE: most popular criterion used in adaptation (LMS algorithm)
MEE requires considerably more computations within one iteration
MEE has demonstrated superior performance at the cost of a substantial increase in computational complexity
RC: tailor a custom hardware architecture to MEE's computational requirements
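For contrast with MEE, one iteration of the standard LMS update (the MSE-based algorithm mentioned above) can be sketched as follows. The function name `lms_step` and the sample values are assumptions for illustration; the point is that each LMS iteration touches every tap only once.

```python
def lms_step(weights, inputs, desired, mu):
    """One LMS iteration: FIR output, error, then a weight update proportional to the error."""
    output = sum(w * x for w, x in zip(weights, inputs))
    error = desired - output
    new_weights = [w + mu * error * x for w, x in zip(weights, inputs)]
    return new_weights, error

# Hypothetical two-tap example.
updated, err = lms_step([0.0, 0.0], [1.0, 2.0], desired=1.0, mu=0.1)
print(updated, err)
```

The work per iteration grows linearly with the number of taps, whereas MEE adds a pairwise sum over a window of past errors.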
AF Structure
Composed of three major blocks
Adaptive FIR
$$y = \sum_{i=1}^{N} w_i x_i$$
FIR output approaches desired output as weights converge to optimal solution
Cost Function and Learning Algorithm
Cost function defines rules of adaptation
$$V = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \kappa(e_i - e_j)$$
Learning Algorithm: updates the weights
$$\nabla V = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \kappa(e_i - e_j)\,(e_i - e_j)\,(x_i - x_j)$$
κ is the Gaussian kernel
Window size determines number of computations: complexity O(N^2)
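A direct software evaluation of the cost function V and its gradient might look like the Python sketch below. It assumes a normalized Gaussian kernel of width `sigma` and treats each x_i as the vector of FIR inputs that produced error e_i; both are assumptions, since the slides do not give the kernel normalization or any scaling by the kernel width.

```python
import math

def gaussian_kernel(u, sigma):
    """Normalized Gaussian kernel of width sigma evaluated at u (normalization assumed)."""
    return math.exp(-u * u / (2.0 * sigma * sigma)) / (math.sqrt(2.0 * math.pi) * sigma)

def information_potential(errors, sigma):
    """V: average kernel value over all N^2 ordered pairs of errors."""
    n = len(errors)
    total = 0.0
    for ei in errors:
        for ej in errors:
            total += gaussian_kernel(ei - ej, sigma)
    return total / (n * n)

def potential_gradient(errors, inputs, sigma):
    """Gradient of V in the form written above; inputs[i] is the input vector for errors[i]."""
    n = len(errors)
    taps = len(inputs[0])
    grad = [0.0] * taps
    for i in range(n):
        for j in range(n):
            de = errors[i] - errors[j]
            k = gaussian_kernel(de, sigma)
            for t in range(taps):
                grad[t] += k * de * (inputs[i][t] - inputs[j][t])
    return [g / (n * n) for g in grad]
```

Both functions loop over every ordered pair in the window, which is where the O(N^2) cost per iteration comes from.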
Effects of Window Size on AF
Window size has direct effect on filter performance
Determines how smooth and fast weights converge
Determines hardware resources required
Window size: most influential variable affecting speedup
Steep tradeoff between filter performance and computation time (in software)
Proposed architecture transforms time-performance tradeoff to linear dependence
Goal: RC custom architecture used to reduce complexity from O(N^2) to O(N)
Avoid losing precision over multiple iterations
Employ hybrid-precision architecture
Mix of fixed-point and floating-point (single-precision)
AF Building Blocks
FIR Filter & MEE Cost Function
Hybrid fixed/float (single-precision) operations
Balance: maintain precision while decreasing latency
Adaptive FIR
Single-precision float operations
Simple pipelined implementation
Rigorously studied in literature
Pairwise Distance Block
For window size of N: compute N^2/2 pairwise error distances
Errors become available in sequential manner
Require N clock cycles to compute N^2/2 subtractions
Fixed-point architecture (32-bit)
On last clock cycle, perform pairwise distance computations between e(n) and all previous errors in the window: e(n-1), e(n-2), …
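The streaming behavior described above can be modeled in Python as follows. This is a behavioral sketch, not the hardware design: the class name `PairwiseDistanceBlock`, the 16 fractional bits, and the choice to treat one `push` call as one clock cycle are all assumptions (the slides only state a 32-bit fixed-point format), and 32-bit overflow is not modeled.

```python
from collections import deque

FRAC_BITS = 16  # assumed fractional bits; the slides only say "32-bit fixed point"

def to_fixed(value):
    """Convert a float to a fixed-point integer with FRAC_BITS fractional bits."""
    return int(round(value * (1 << FRAC_BITS)))

def to_float(fixed):
    """Convert a fixed-point integer back to a float."""
    return fixed / (1 << FRAC_BITS)

class PairwiseDistanceBlock:
    """Keeps the previous N-1 errors and subtracts each from the newest one."""

    def __init__(self, window_size):
        self.previous = deque(maxlen=window_size - 1)

    def push(self, error):
        """Model one clock cycle: accept e(n), return e(n) minus every stored earlier error."""
        newest = to_fixed(error)
        distances = [newest - earlier for earlier in self.previous]
        self.previous.append(newest)
        return distances

# Filling a window of N = 4 yields 0 + 1 + 2 + 3 = 6 = N(N-1)/2 subtractions in 4 pushes.
block = PairwiseDistanceBlock(4)
counts = [len(block.push(e)) for e in [0.5, 0.25, -1.0, 2.0]]
print(counts, sum(counts))
```

Because all subtractions for a new error are independent, hardware can issue them in the same cycle, so the roughly N^2/2 distances are covered in N cycles.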
Gaussian Kernel
Inputs come from pairwise distance block
For window size of N: compute N^2/2 Gaussian kernels
Use Altera’s FP mega-functions:
Exponential function
Multiplications
Accumulate results
Floating-point architecture
$$\nabla V = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \kappa(e_i - e_j)\,(e_i - e_j)\,(x_i - x_j)$$
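One way to see why about N^2/2 kernel evaluations suffice is that each term of the double sum is unchanged when i and j are swapped, and the i = j terms are zero. The Python sketch below exploits that symmetry. The function names and the normalized Gaussian kernel of width `sigma` are assumptions for illustration, not the hardware implementation.

```python
import math

def gaussian_kernel(u, sigma):
    """Normalized Gaussian kernel of width sigma evaluated at u (normalization assumed)."""
    return math.exp(-u * u / (2.0 * sigma * sigma)) / (math.sqrt(2.0 * math.pi) * sigma)

def gradient_from_half_pairs(errors, inputs, sigma):
    """Accumulate the gradient using only pairs with j < i, then double the result."""
    n = len(errors)
    taps = len(inputs[0])
    acc = [0.0] * taps
    for i in range(n):
        for j in range(i):  # N(N-1)/2 kernel evaluations in total
            de = errors[i] - errors[j]
            k = gaussian_kernel(de, sigma)
            for t in range(taps):
                acc[t] += k * de * (inputs[i][t] - inputs[j][t])
    # Each unordered pair appears twice in the full double sum; i == j terms vanish.
    return [2.0 * a / (n * n) for a in acc]
```

The result matches the full double sum while evaluating the exponential only once per unordered pair.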
Overall Architecture
Algorithm in software is quadratically dependent upon window size
Impossible to obtain fast/smooth convergence without large time penalty
Transform computation time vs. window size to linear relationship
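The change from quadratic to linear scaling can be illustrated with a simple step-count model in Python. It is an idealized assumption, not a measurement: the sequential model counts one pairwise operation per step, and the parallel model assumes one new error per clock cycle with all of its pairwise operations done in that cycle, ignoring pipeline latency and resource limits.

```python
def sequential_steps(window_size):
    """Pairwise operations done one at a time: N(N-1)/2 steps."""
    return window_size * (window_size - 1) // 2

def parallel_cycles(window_size):
    """Idealized parallel model: one cycle per incoming error, N cycles per window."""
    return window_size

# Doubling the window roughly quadruples sequential work but only doubles parallel cycles.
for n in (100, 200, 400):
    print(n, sequential_steps(n), parallel_cycles(n))
```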
Experimental Testbeds
Hardware
Novo-G @ NSF CHREC Center
Cluster of 24 + 1 servers (compute + head node)
48 boards housed in 24 Linux computer servers
Each board: quad-FPGA PCIe x8 GiDEL
Total of 192 Altera Stratix-III E260 FPGAs
Experiments only run on up to 2 GiDEL boards (@ 150 MHz)
Realistic scientific constraints limit problem size and thus scale
Software
AMD 2.4 GHz Opteron with 4 GB DDR400
Optimized C code
Precision Results
System Identification Application
Procedure
Given unknown system/plant
AF will converge to same transfer function (match output = min error)
Input sequence of 2000 samples of white Gaussian noise
Observe how weights change every iteration
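The procedure can be sketched end to end in Python. Everything specific here is an assumption for illustration: the plant coefficients, the window size, step size, kernel width, and random seed are hypothetical, the plant is noise-free, and the window errors are recomputed with the current weights on every iteration, which may differ from how the hardware stores past errors.

```python
import math
import random

def mee_identify(plant, num_samples=2000, window=8, mu=0.1, sigma=1.0, seed=0):
    """Adapt FIR weights toward `plant` by gradient ascent on the information potential."""
    rng = random.Random(seed)
    taps = len(plant)
    weights = [0.0] * taps
    delay_line = [0.0] * taps
    history = []  # most recent (input vector, desired output) pairs
    norm = math.sqrt(2.0 * math.pi) * sigma
    for _ in range(num_samples):
        # Shift a new white Gaussian sample into the tap-delay line.
        delay_line = [rng.gauss(0.0, 1.0)] + delay_line[:-1]
        desired = sum(p * x for p, x in zip(plant, delay_line))
        history.append((delay_line, desired))
        history = history[-window:]
        # Errors of the current weights over the window.
        errors = [d - sum(w * x for w, x in zip(weights, xs)) for xs, d in history]
        n = len(history)
        grad = [0.0] * taps
        for i in range(n):
            for j in range(i):  # symmetric terms, so each unordered pair counts twice
                de = errors[i] - errors[j]
                k = math.exp(-de * de / (2.0 * sigma * sigma)) / norm
                for t in range(taps):
                    grad[t] += 2.0 * k * de * (history[i][0][t] - history[j][0][t])
        weights = [w + mu * g / (n * n) for w, g in zip(weights, grad)]
    return weights

unknown_plant = [0.5, -0.3, 0.2, 0.1]  # hypothetical plant coefficients
estimated = mee_identify(unknown_plant)
print(estimated)
```

Printing the weights at each iteration instead of only at the end gives the convergence trace that the procedure calls for.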
Precision comparison to software
Weights converge to optimal values in same number of iterations
No loss in precision when compared to software
Performance Analysis
Window size plays most important role
Controls # of computations within feedback loop
Controls how fast and accurately the weights change