Hardware-dependent Software Synthesis for Many-Core ... - ASP-DAC

10 downloads 38089 Views 1MB Size Report
App. Dev. Prototype. Board. + BSP. Board. TLM. Gen. TLM. ASIC/. FPGA. Tools. Future. Board. + BSP. HdS Gen. HW Gen. Prototype. Platform. C/C++. App. Dev.
Hardware-dependent Software Synthesis for Many-Core Embedded Systems Samar Abdi, Gunar Schirner, Ines Viskic, Hansu Cho, Yonghyun Hwang, Lochi Yu, Daniel Gajski

Presenter: Samar Abdi Center for Embedded Computer Systems University of California, Irvine http://www.cecs.uci.edu

Outline •

Motivation for many-core HdS synthesis



Model based approach to system synthesis



HdS synthesis for TLM (Front-End)



HdS synthesis for PCAM (Back-End)



Embedded System Environment (ESE)



Experimental results for MP3 decoder example



Conclusion Copyright ©2009, CECS

2

Present

Past

System Design Trend

Platform

HW Dev.

Board

VP

Platform

Platform Modeling

HdS Dev.

Board + BSP

App. Dev.

Prototype

App. Dev.

Prototype

HdS Dev.

Virtual Platform

Board + BSP

Future

HW Dev.

C/C++ HdS Gen. App. Dev.

TLM Gen.

ASIC/ FPGA Tools

TLM HW Gen. Board + BSP

Prototype

Platform

Copyright ©2009, CECS

3

Motivation for Many-Core HdS Synthesis •

Multi-core and many-core HW platforms ¾ ¾ ¾



Automatic HdS synthesis ¾ ¾ ¾



Technology scaling is limited Better throughput, lower power with parallel execution Heterogeneous systems required for efficient execution

Multipurpose, complex applications Flexible HW platforms that vary across application domains Short time to market

Approach: Model based synthesis ¾ ¾ ¾

Executable models at various abstraction levels Formalized model semantics Layered HdS structure Copyright ©2009, CECS

4

Model Based Synthesis •

Benefits ¾ ¾ ¾ ¾

Faster design/simulation/validation at higher abstraction Early application development Rapid design space exploration Automatic synthesis • •



reduces probability of error vs. manual implementation Increases designer productivity

Challenges ¾ ¾ ¾

Identifying the “right” abstraction levels Formal model semantics at each abstraction level Methods and tools for • • •

Application development SW/HW synthesis Model debugging and analysis Copyright ©2009, CECS

5

Model Abstractions (with Respect to OSI) Pin / Cycle Accurate Model Transaction Level Model Specification Model 7.

Application

7.

Application

6.

Presentation

6.

Presentation

5.

Session

5.

Session

4.

Transport

4.

Transport

3.

Network

3.

Network

Spec

TLM

2b. Link + Stream

2b. Link + Stream

2a. Media Access Ctrl

2a. Media Access Ctrl

2a. Protocol

2a. Protocol

1.

1.

Physical

Physical

Address lines Data lines Control lines P/CAM

Copyright ©2009, CECS 6

6

System Synthesis Flow System Definition Application

Platform

mapping

Component Models

Front End

TLM

Component Libraries

Back End

PCAM ASIC flow

FPGA flow Copyright ©2009, CECS

7

Front End Design Flow System Definition Application Platform mapping

PE/RTOS Models

Timing Estimation

Design Optimization

Timed Application

Bus/IF/Mem Models

TLM Generation

SystemC TTLM

SystemC Simulation

Metrics

Copyright ©2009, CECS

8

Outline •

Motivation for many-core HdS synthesis



Model based approach to system synthesis



HdS synthesis for TLM (Front-End)



HdS synthesis for PCAM (Back-End)



Embedded System Environment (ESE)



Experimental results for MP3 decoder example



Conclusion Copyright ©2009, CECS

9

Application Model P1

P2

v1

C1

C2

P3



P4

Application Spec Model: set of communicating processes ¾ ¾ ¾

Executes on a simulation kernel (eg: SystemC) – No HdS Process functionality defined using C/C++ Blocking channels and non-blocking variables Copyright ©2009, CECS 10

10

Platform Architecture

Bus1

HW IP



Mem

Transducer

Arbiter

CPU1

Bus2

CPU2

Netlist of SW processors, HW, Buses and interfaces Copyright ©2009, CECS 11

11

Input: System Definition CPU1 P2

P3

HW IP

Bus1

Transducer

v1

C2 C1

Arbiter

P1

Mem

Bus2

P4

CPU2

System Definition = Platform + Application + Mapping Copyright ©2009, CECS 12

12

Computation Timing Estimation BB1

BB1

wait(t1) N

If

N

Y

If

Y

Processor Model BB2

BB3

BB2

BB3

wait(t2)

N

If

Y

Process CDFG

Timing Estimation

N

wait(t3)

If

Y

Timed Process

• DFG scheduling to compute basic block delay • RTOS model added for PEs with multiple processes Copyright ©2009, CECS

13

Communication Timing Estimation PE2 p1 p2

PE1

Bus3

Tx2

Tx1 Bus2

Untimed Bus1

Tx3 PE3 PE4

Application + Platform

Protocol Model

Estimation Engine

Bus1

Write( ) { Get_Bus( ); Transfer( ); Release_Bus( ); }

Write( ) { Get_Bus( ); wait (t1); Transfer( ); wait (t2); Release_Bus( ); wait (t3); }

Timed Bus1

• Protocol model used to estimate synchronization, arbitration and transfer • Timing is annotated in bus channel and HdS model Copyright ©2009, CECS

14

CPU1

Output: SystemC Timed TLM Mem P1

P2 OS Transducer

Bus1

Bus2

TLM Semantics P3

HW IP

• Application code Æ sc_thread

OS

• Processing element Æ sc_module P4 • OS Model Æ sc_module • Bus Æ sc_channel • Memory Æ Array inside sc_module CPU2 • TransducerÆ FIFO channel+sc_process Copyright ©2009, CECS

15

Transaction Level HdS Model



Routing and packeting layers of HdS generated in TLM Copyright ©2009, CECS

16

HdS Layers in TLM SW Core TLM

Sender Process i_Send (D,S);

i_Send (data, size) { sent = 0; //Rt = Route(Ch); //pktSz=MinBuf(Rt); do { i_SyncTransfer (data[sent], pktSz); sent+=PktSz; } while (sentFirstBus(); //if (Rt->HasTx()) { //Tx=Rt->FirstTx(); B Write (Req[Tx, Ch], &S); //} Tx request while (!B Sync(Ch)); B Write(Addr[B,Ch], D, S); } // end Send

Bus Channel Methods Write (Addr, DataPtr); Read (Addr, DataPtr); Sync (Channel_ID);

Synchronization and Transfer Model

Routing and Packeting B

Bus B Req

I/O Buffer

• •

Tx

Bus channel model provides basic transaction services Synchronization and transfer is modeled in SW core Copyright ©2009, CECS

17

TLM vs. Traditional Models TLM: Transaction Level Model ISM: Instruction Set Model PCAM: Pin/Cycle Accurate Model

Accuracy Board 100%

PCAM

Timed TLM

~92% ~80%

ISM

Func. TLM 0

2sec

3~4 hrs

15~18hrs

Exec. Time (MP3)

Time and accuracy trade off among different models Copyright ©2009, CECS

18

Outline •

Motivation for many-core HdS synthesis



Model based approach to system synthesis



HdS synthesis for TLM (Front-End)



HdS synthesis for PCAM (Back-End)



Embedded System Environment (ESE)



Experimental results for MP3 decoder example



Conclusion Copyright ©2009, CECS

19

Back End Synthesis Flow SystemC TLM

SW/RTOS Library

SW Synthesis

RTL IP Library

CÆRTL (Forte/NISC)

Interface Synthesis

OR Bus Library

Binary

HW RTL

IF RTL

Pin/Cycle Accurate Model (PCAM) Generator Prototype

CA Sim. Tools

C/Verilog PCAM

FPGA Tools Copyright ©2009, CECS

20

CPU1

Cycle-Accurate Software Synthesis Program EXE RTOS

Compile

P1 RTOS/ Driver Synthesis

HAL

P2 OS HAL

Transducer

Bus1

Bus2

Compile

OS P4 HW IP

Program EXE

RTOS/ Driver Synthesis

RTOS HAL

CPU2 Copyright ©2009, CECS

21

CPU1

Cycle-Accurate Hardware Synthesis Mem

Transducer

Bus1

Bus2

Cycle-accurate Synthesis

P3

HW IP (RTL)

Processes in C

CPU2 Copyright ©2009, CECS

22

Cycle-Accurate Interface Synthesis CPU1

Mem IC

Transducer Arbiter

Interface Synthesis

HW IP

CPU2

Copyright ©2009, CECS

23

CPU1

Pin/Cycle-Accurate Model (PCAM) Mem

Program EXE

IC

RTOS HAL

Transducer Arbiter

Program EXE

HW IP

PCAM is input for fast prototyping with FPGAs or implementation with ASIC design tools

RTOS HAL

CPU2

Copyright ©2009, CECS

24

Pin and Cycle Accurate Model (PCAM) CPU1 Model

System SW stack

App. Binary

Channel API Routing, Packeting Thread. IPC Synchronization Lib. Lib. Data transfer Mem. IH IH ... Access Instruction Set

CPU2 Model

Memory Transducer RTL

Bus1

Bus2 Bitstream

• •

Download

Sync. and transfer layers of HdS generated in PCAM Application, System libs and HdS is cross-compiled Copyright ©2009, CECS

25

HdS Layers in PCAM SW Core

Sender Process i_Send (D,S);

i_Send (data, size) { sent = 0; //Rt = GSRT(Ch); //pktSz=MinBuf(Rt); do { i_SyncTransfer (data[sent], pktSz); sent+=PktSz; } while (sentFirstBus(); //if (Rt->HasTx()) { //Tx=Rt->FirstTx(); WrMem (SendRB[Tx, Ch], S, WordSize); //} end Tx case while (!SyncFlag_Ch) { SuspendProc(); } SyncFlag_Ch=0; //SA=low(AR(B,Ch)); WrMem(SA, D, S); } // end Send Synchronization and Transfer

WrMem (A,D,S) { Written=0; while (Written

Suggest Documents