Survey of C-based Application Mapping Tools for Reconfigurable Computing

Brian Holland, Mauricio Vacas, Vikas Aggarwal, Ryan DeVille, Ian Troxel, and Alan D. George
High-performance Computing and Simulation (HCS) Research Lab
Department of Electrical and Computer Engineering
University of Florida

Holland

#215 MAPLD 2005

Outline
Introduction
General Survey
  Ten C-based Application Mappers: Carte, Catapult C, DIME-C, Handel C, Impulse C, Mitrion C, Napa C, SA-C, Streams C, SystemC
Benchmarking & Results
  Finite-Impulse Response (FIR)
  N-Queens
  Radix Sort
Lessons Learned
Conclusions
Acknowledgements
References

Motivation for Application Mappers
HDL programming has shortcomings
  Limited applicability to application developers
  More involved development process (vs. software)
  Requires training beyond application level
Instead, can we find and exploit an environment that allows a measure of hardware control along with increased productivity?
  Can we bring RC performance benefits to application developers?
  Would this be practical/possible in traditional HDL?
HDL is well below the level of traditional application programming
  Consequently, we need to move to a higher level of abstraction

Introduction
Selecting a Higher Level of Abstraction
  CAD tools: Visually appealing, but tedious for large projects
  New language: Optimal, but requires complete retraining
  Traditional or Object-Oriented languages: Which? How?
Ideally, use pure ANSI-C, “The Universal Language”
  Requires no additional knowledge or special training
  Port existing C programs into hardware implementations (HDL)
  Translation can be handled by a hardware compiler
  Programmer concentrates on algorithmic functionality
[Figure: tool flow from C Code through COMPILER to HDL, Netlist, and Configuration File]

Commonalities
General characteristics of C-based application mappers:
  Companies create proprietary ANSI C-based language
  Languages do not have all ANSI C features
  Extra pragmas are included for corresponding compilers
  Additional libraries of functions/macros for further extensions
  Must adhere to specific programming “style” for maximum optimization
  Emphasis on both hardware generation and I/O interfaces
[Figure: the COMPILER translates ANSI-C source into VHDL]
ANSI-C:
    void FIR(int INPUTA, int OUTPUTB) {
        /*user source*/
    }
VHDL:
    Entity FIR is
      Port ( rst, clock: in std_logic;
             INPUTA_en: in std_logic;
             INPUTA_data: in std_logic_vector(31 downto 0);
             OUTPUTB_en: in std_logic;
             OUTPUTB_data: out std_logic_vector(31 downto 0));
    end;
    /*user source in VHDL*/

Spectrum of C-based Application Mappers
[Figure, survey portion: the ten mappers (Carte, Catapult C, DIME-C, Handel C, Impulse C, Mitrion C, Napa C, SA-C, Streams C, SystemC) placed along a spectrum of targets: Open Standard; Generic HDL, Multiple Platforms; Generic HDL (Optimize for Manufacturer’s Hardware); Targets a Specific Platform/Configuration; RISC/FPGA Hybrid Only]
[Figure, benchmark section: “The Law of Conservation of Pain”. Axes: Control and Effort. Entries: VHDL/HDL, Handel C, DIME-C, Impulse C, and ANSI-C (software). Labels: Cycle Accurate, Deterministic, Not Cycle Accurate, Limited Predictability, Some HW Pragmas, Many HW Pragmas]

Carte (SRC Computers) [1]
C/Fortran FPGA environment
  Direct mapping of C/Fortran code to configuration level
  Software emulation and simulation of compiled code for debugging
  Capable of multiprocessor and multi-FPGA computational definitions
  Allows explicit data flow control within memory hierarchy
Targets SRC’s MAP processor
  Produces “Unified Executables” for HW or SW processor execution
  Runtime libraries handle required interfacing and management

Catapult C (Mentor Graphics) [2-3]
Algorithmic synthesis tool for RTL generation
  RTL from “pure” untimed C++
  No extensions, pragmas, etc.
Compiler uses “wrappers” around algorithmic code
  External: manages I/O interface
  Internal: constrains synthesis to optimize for chosen interface
Explicit architectural constraints and optimization
Output: RTL netlists in VHDL, Verilog, and SystemC

DIME-C (Nallatech) [4]
FPGA prototyping tool
  Designs are not cycle-accurate
  Allows application synthesis for a higher clock speed
Compilation/Optimization
  Pipeline/parallelize where possible
  Included IEEE-754 FP cores
  Dedicated (integer) multipliers
Currently in beta, expected release: 4Q05
Output: synthesizable VHDL and DIMEtalk components

Handel C (Celoxica) [5]
Environment for cycle-accurate application development
  All operations occur in one deterministic clock cycle
  Makes it cycle-accurate, but clock freq reduced to slowest operation
  Decisions/Loops are “penalty-free” but can significantly impact timing
Language has pragmas for explicitly defined parallelism
Compiler can analyze, optimize, and rewrite code
Output: VHDL/Verilog, SystemC, or targeted EDIFs

Impulse C (Impulse Accelerated Technologies) [6]
Language/compiler for modeling sequential apps.
  Processes: independent, potentially concurrent, computing blocks
  Streams: communicate and synchronize processes
Uses Streams-C methodology
  However, focuses on compatibility with C development environments
Compilation
  Each process implemented as separate state machine
Output: Generic or FPGA-specific VHDL

Mitrion C (Mitrion) [7]
“Softcore” processor tactic
  “Processor” creates abstraction layer between C code and FPGA
Compilation
  C code is mapped to a generic “API” of possible functions
  Processor instantiated on FPGA, tailored to specific application
  Custom instruction bit-widths, specific cache and buffer sizes
Currently in beta, expected release: 4Q05
Output: a VHDL IP core for target architectures
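
The process-and-stream model listed for Impulse C on this slide can be pictured in plain C. The sketch below is only an illustration under stated assumptions: it does not use the Impulse C library, and every name in it (`stream_t`, `stream_write`, `stream_read`, `producer_step`, `consumer_step`, `run_pipeline`, `STREAM_DEPTH`) is hypothetical. Two "processes" are written as step functions, in the spirit of one state machine per process, and a bounded FIFO stands in for the stream that connects and synchronizes them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define STREAM_DEPTH 4u  /* hypothetical FIFO depth, chosen for illustration */

/* Bounded FIFO standing in for a stream between two processes. */
typedef struct {
    int32_t data[STREAM_DEPTH];
    size_t head, tail, count;
} stream_t;

/* Returns false when the stream is full, so the writer must retry later. */
static bool stream_write(stream_t *s, int32_t v)
{
    if (s->count == STREAM_DEPTH) return false;
    s->data[s->tail] = v;
    s->tail = (s->tail + 1u) % STREAM_DEPTH;
    s->count++;
    return true;
}

/* Returns false when the stream is empty, so the reader must retry later. */
static bool stream_read(stream_t *s, int32_t *v)
{
    if (s->count == 0) return false;
    *v = s->data[s->head];
    s->head = (s->head + 1u) % STREAM_DEPTH;
    s->count--;
    return true;
}

typedef struct { int32_t next, limit; } producer_t;
typedef struct { int64_t sum; int32_t received; } consumer_t;

/* Producer process: one step tries to emit the next value 0..limit-1. */
static void producer_step(producer_t *p, stream_t *out)
{
    if (p->next < p->limit && stream_write(out, p->next)) p->next++;
}

/* Consumer process: one step tries to read one value and accumulate it. */
static void consumer_step(consumer_t *c, stream_t *in)
{
    int32_t v;
    if (stream_read(in, &v)) { c->sum += v; c->received++; }
}

/* Interleaves the two processes until n values have crossed the stream.
 * The producer is stepped twice per consumer step so the FIFO fills up
 * and the "stream full" path is exercised. */
int64_t run_pipeline(int32_t n)
{
    assert(n >= 0);
    stream_t s = {0};
    producer_t p = {0, n};
    consumer_t c = {0, 0};
    while (c.received < n) {
        producer_step(&p, &s);
        producer_step(&p, &s);
        consumer_step(&c, &s);
    }
    return c.sum;
}
```

In hardware the two step functions would run concurrently rather than being interleaved by a loop; the software scheduler here only makes the blocking behavior of the stream visible.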

Napa C (National Semiconductor) [8]
Language/compiler for RISC/FPGA hybrid processor
  Capitalize on single-cycle interconnect instead of I/O bus
Datapath Synthesis Technique
  Compiler generates hardware pipelines from C loops
  Hand-optimized pre-placed, pre-routed module generators
Targets NS NAPA1000 hybrid processor
  Fixed-Instruction Processor (FIP), Adaptive Logic Processor (ALP)
  ALP also compiles to RTL VHDL, structural VHDL, structural Verilog

SA-C (Colorado State University) [9-12]
High-level, expression-oriented, machine-independent, single-assignment language
  Designed to implicitly express data-parallel operations
  Image and signal processing
Compiler (UC-Irvine, UC-Riverside, Colorado State Univ.)
  Loop optimizations
  Structural transforms
  Execution block placement
Target Platforms
  UC Irvine Morphosys; Annapolis WildForce, StarFire, WildFire

Streams C (Los Alamos National Laboratory) [12-14]
Stream-oriented sequential process modeling
  Essentially, data elements moving through discrete functional blocks
Compiler
  Generates multi-threaded processor executables and multiple FPGA bitstreams
  Allows parallel C program translation into a parallel arch.
Includes functional-level simulation environment
Output: synthesizable RTL

SystemC (Open SystemC Initiative, OSCI) [15-16]
Open-source extension of C++ for HW/SW modeling
  Core language, modules & ports for defining structure, and interfaces & channels
Supports functional modeling
  Hierarchical decomposition of a system into modules
  Structural connectivity between modules using ports/exports
  Scheduling and synchronization of concurrent processes using events
Event-driven simulator
  Events are basic dynamic/static process synchronization objects

About the Benchmarks
Three classic algorithms used for benchmarking
  Finite-Impulse Response (FIR)
    Simple 51-tap FIR filter for standard DSP applications
    Compare compiler solutions and analyze their usage metrics
  N-Queens
    Classic embarrassingly parallel HPC backtracking search problem
    Showcases the potential of optimized implementations
  Radix Sort
    Sorts using ‘binary bins’, minimizing resources
    Illustrates resource metrics in RAM-intensive applications
[Figure: plot accompanying the FIR description]
[Figure: binary bins 0 and 1 with example values 110, 100 and 111, 101]
Implementation Details
  DIME-C, Handel C, Impulse C, VHDL, and ANSI-C (for baseline timing)
  Experiments performed on Nallatech BenNUEY-PCI card with Virtex-II 6000 FPGA
  Resource utilization based on post place-and-route data
  Runtime represents communication time (setup and verification I/O is negated)
  Handel C and Impulse C require VHDL wrappers which can increase resource usage

Finite-Impulse Response
[Figure: FIR Resource Utilization Statistics. % usage of Slices, Multipliers, Block RAMs, and Clock Freq for DIME-C, Handel C, Impulse C, and VHDL]
[Figure: Speedup over 2.4GHz Xeon for DIME-C, Handel C, Impulse C, VHDL, gcc -O3, and gcc -O0]
FIR filter containing 51 taps, each 16 bits wide (based on algorithms in [4,6])
Various application-mapper languages do not have a consistent I/O interface
  Could not create a consistent streaming channel with requisite blocking in every tool
  Instead, FIR algorithm operates on values stored in a block RAM
Obtains speedup through parallel multiplication, efficient memory accesses
  The 51 coefficients and variables are stored in local variables
  Additional performance boosts are possible in multi-channel DSP processing
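
A minimal software sketch of the kind of kernel this slide describes: a 51-tap FIR with 16-bit samples and coefficients, operating on values held in an array that stands in for the block RAM. The slides do not show the benchmark source, so the function name `fir_filter`, the zero-padding of samples before the start of the buffer, and the 64-bit accumulator are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_TAPS 51  /* tap count stated on the slide */

/* Direct-form FIR over a buffer standing in for a block RAM:
 *   out[n] = sum over k of coeff[k] * in[n - k],  k = 0 .. NUM_TAPS-1,
 * where samples before the start of the buffer are treated as zero
 * (an assumption). Each 16x16-bit product fits in 32 bits, but the sum
 * of 51 of them may not, so the accumulator is 64 bits wide. */
void fir_filter(const int16_t *in, int64_t *out, size_t len,
                const int16_t coeff[NUM_TAPS])
{
    assert(len == 0 || (in != NULL && out != NULL));
    assert(coeff != NULL);

    for (size_t n = 0; n < len; n++) {
        int64_t acc = 0;
        /* The 51 multiplies are independent of one another; this is the
         * loop a hardware compiler could unroll into parallel multipliers. */
        for (size_t k = 0; k < NUM_TAPS && k <= n; k++) {
            acc += (int32_t)coeff[k] * (int32_t)in[n - k];
        }
        out[n] = acc;
    }
}
```

Feeding the filter a unit impulse returns the coefficients themselves, which is a convenient sanity check for any port of the kernel.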

N-Queens
[Figure: N-Queens Resource Utilization Statistics. % usage of Slices and Clock Freq for DIME-C, Handel C, Impulse C, and VHDL]
[Figure: Speedup over 2.4GHz Xeon versus N (13 to 17) for DIME-C, Handel C, Impulse C, VHDL, gcc -O3, and gcc -O0]
Represents a purely computational algorithm; virtually no communication overhead
Algorithm contains several parallelizable code segments, exploitable for speedup
Implementations are based upon same baseline C code
  Every available technique and compiler optimization is employed to boost performance
Notes:
  Handel C N-Queens is a benchmark from our MAPLD’04 paper with additional refinements
  VHDL N-Queens is culmination of a semester-long endeavor into algorithm’s parallelism
  DIME-C and Impulse C N-Queens are results of experimentation with beta compilers
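
For readers unfamiliar with the problem, the following is one common way to write an N-Queens solution counter in C. It is not the baseline code used in the benchmark, which the slides do not show; the bitmask formulation and the names `place_row` and `count_n_queens` are assumptions chosen for illustration. The search is pure computation with no I/O, which matches the slide's description of the workload.

```c
#include <assert.h>
#include <stdint.h>

/* Places one queen per row by backtracking. Each mask has one bit per
 * column: cols marks occupied columns, diag_l and diag_r mark squares in
 * the current row attacked along the two diagonal directions. */
static uint64_t place_row(uint32_t full, uint32_t cols,
                          uint32_t diag_l, uint32_t diag_r)
{
    if (cols == full) return 1;  /* every column holds a queen */

    uint64_t count = 0;
    uint32_t avail = full & ~(cols | diag_l | diag_r);
    while (avail != 0) {
        uint32_t bit = avail & (0u - avail);  /* lowest free column */
        avail ^= bit;
        /* Diagonal attacks shift by one column for the next row. */
        count += place_row(full, cols | bit,
                           (diag_l | bit) << 1, (diag_r | bit) >> 1);
    }
    return count;
}

/* Counts all placements of n non-attacking queens on an n x n board. */
uint64_t count_n_queens(unsigned n)
{
    assert(n >= 1 && n <= 31);
    uint32_t full = (1u << n) - 1u;
    return place_row(full, 0, 0, 0);
}
```

The branches explored for different first-row columns are independent of each other, which is one place where the parallelism mentioned on the slide can be found.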

Radix Sort
[Figure: Radix Sort Resource Utilization Statistics. % usage of Slices, Block RAMs, and Clock Freq for DIME-C, Handel C, Impulse C, and VHDL]
[Figure: Speedup over 2.4GHz Xeon (axis 0.0 to 1.0) for DIME-C, Handel C, Impulse C, VHDL, gcc -O3, and gcc -O0]
Sorts values one bit at a time (saving significant resources vs. sorting one digit at a time)
Represents a “worst-case” legacy algorithm, containing no functional-level parallelism
  Every element in every iteration depends on every previous element in every iteration
  Ideal for software processor with fast cache, challenging in FPGA hardware
Speedup comes through efficient RAM usage and compiler optimizations/pipelining
  Reduce quantity and addressing complexity of RAM accesses whenever possible
Metrics are based on sorting 600 32-bit integers contained within a block RAM
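
The one-bit-at-a-time scheme can be sketched as a least-significant-bit-first radix sort with two bins. This is an illustrative reconstruction, not the benchmark source: the function name `radix_sort_binary`, the caller-supplied scratch buffer, and the use of unsigned values are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* LSD radix sort using one bit per pass and two bins ("binary bins").
 * scratch must have room for len elements. After the 32 passes the
 * sorted values are back in data, because the two buffers swap roles
 * an even number of times. */
void radix_sort_binary(uint32_t *data, uint32_t *scratch, size_t len)
{
    assert(len == 0 || (data != NULL && scratch != NULL));

    uint32_t *src = data;
    uint32_t *dst = scratch;

    for (unsigned bit = 0; bit < 32; bit++) {
        /* Count the zero bin so the one bin can start right after it. */
        size_t zeros = 0;
        for (size_t i = 0; i < len; i++) {
            if (((src[i] >> bit) & 1u) == 0) zeros++;
        }

        /* Stable scatter: order within each bin is preserved. */
        size_t z = 0, o = zeros;
        for (size_t i = 0; i < len; i++) {
            if (((src[i] >> bit) & 1u) == 0) dst[z++] = src[i];
            else                             dst[o++] = src[i];
        }

        uint32_t *tmp = src;
        src = dst;
        dst = tmp;
    }
}
```

Each pass reads the previous pass's full output, which illustrates the dependency chain the slide calls out as the obstacle to functional-level parallelism.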

Some Optimization Techniques
Keep expensive computational operations to a minimum
  Multiplication, division, modulo, greater/less than, and floating point are *slow*

BAD

temp = a[0]; for(i=0;i
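
As a separate illustration of the advice on this slide (it is not a completion of the truncated snippet above), the sketch below removes a multiply and a modulo from a loop body. The function names, the buffer size `BUF_SIZE`, and the strided-sum task are all hypothetical. The two functions return the same result; the second keeps a running index in place of the multiplication and uses a bit mask in place of the modulo, which works because the buffer size is a power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BUF_SIZE 64u  /* hypothetical buffer size; must be a power of two */

/* Straightforward version: one multiply and one modulo per iteration. */
uint32_t sum_strided_slow(const uint32_t buf[BUF_SIZE], uint32_t stride,
                          size_t count)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < count; i++) {
        sum += buf[(i * stride) % BUF_SIZE];
    }
    return sum;
}

/* Same result with cheaper operations: the index is advanced by addition
 * and wrapped with a mask instead of being recomputed each iteration. */
uint32_t sum_strided_fast(const uint32_t buf[BUF_SIZE], uint32_t stride,
                          size_t count)
{
    assert((BUF_SIZE & (BUF_SIZE - 1u)) == 0);  /* mask trick needs 2^k */

    uint32_t sum = 0;
    uint32_t idx = 0;
    for (size_t i = 0; i < count; i++) {
        sum += buf[idx];
        idx = (idx + stride) & (BUF_SIZE - 1u);
    }
    return sum;
}
```

Whether a given hardware compiler already performs this kind of strength reduction on its own is tool-specific, so the rewritten form is best treated as something to try and measure.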
