App. Dev. Prototype. Board. + BSP. Board. TLM. Gen. TLM. ASIC/. FPGA. Tools. Future. Board. + BSP. HdS Gen. HW Gen. Prototype. Platform. C/C++. App. Dev.
Hardware-dependent Software Synthesis for Many-Core Embedded Systems Samar Abdi, Gunar Schirner, Ines Viskic, Hansu Cho, Yonghyun Hwang, Lochi Yu, Daniel Gajski
Presenter: Samar Abdi Center for Embedded Computer Systems University of California, Irvine http://www.cecs.uci.edu
Outline •
Motivation for many-core HdS synthesis
•
Model based approach to system synthesis
•
HdS synthesis for TLM (Front-End)
•
HdS synthesis for PCAM (Back-End)
•
Embedded System Environment (ESE)
•
Experimental results for MP3 decoder example
•
Conclusion Copyright ©2009, CECS
2
Present
Past
System Design Trend
Platform
HW Dev.
Board
VP
Platform
Platform Modeling
HdS Dev.
Board + BSP
App. Dev.
Prototype
App. Dev.
Prototype
HdS Dev.
Virtual Platform
Board + BSP
Future
HW Dev.
C/C++ HdS Gen. App. Dev.
TLM Gen.
ASIC/ FPGA Tools
TLM HW Gen. Board + BSP
Prototype
Platform
Copyright ©2009, CECS
3
Motivation for Many-Core HdS Synthesis •
Multi-core and many-core HW platforms ¾ ¾ ¾
•
Automatic HdS synthesis ¾ ¾ ¾
•
Technology scaling is limited Better throughput, lower power with parallel execution Heterogeneous systems required for efficient execution
Multipurpose, complex applications Flexible HW platforms that vary across application domains Short time to market
Approach: Model based synthesis ¾ ¾ ¾
Executable models at various abstraction levels Formalized model semantics Layered HdS structure Copyright ©2009, CECS
4
Model Based Synthesis •
Benefits ¾ ¾ ¾ ¾
Faster design/simulation/validation at higher abstraction Early application development Rapid design space exploration Automatic synthesis • •
•
reduces probability of error vs. manual implementation Increases designer productivity
Challenges ¾ ¾ ¾
Identifying the “right” abstraction levels Formal model semantics at each abstraction level Methods and tools for • • •
Application development SW/HW synthesis Model debugging and analysis Copyright ©2009, CECS
5
Model Abstractions (with Respect to OSI) Pin / Cycle Accurate Model Transaction Level Model Specification Model 7.
Application
7.
Application
6.
Presentation
6.
Presentation
5.
Session
5.
Session
4.
Transport
4.
Transport
3.
Network
3.
Network
Spec
TLM
2b. Link + Stream
2b. Link + Stream
2a. Media Access Ctrl
2a. Media Access Ctrl
2a. Protocol
2a. Protocol
1.
1.
Physical
Physical
Address lines Data lines Control lines P/CAM
Copyright ©2009, CECS 6
6
System Synthesis Flow System Definition Application
Platform
mapping
Component Models
Front End
TLM
Component Libraries
Back End
PCAM ASIC flow
FPGA flow Copyright ©2009, CECS
7
Front End Design Flow System Definition Application Platform mapping
PE/RTOS Models
Timing Estimation
Design Optimization
Timed Application
Bus/IF/Mem Models
TLM Generation
SystemC TTLM
SystemC Simulation
Metrics
Copyright ©2009, CECS
8
Outline •
Motivation for many-core HdS synthesis
•
Model based approach to system synthesis
•
HdS synthesis for TLM (Front-End)
•
HdS synthesis for PCAM (Back-End)
•
Embedded System Environment (ESE)
•
Experimental results for MP3 decoder example
•
Conclusion Copyright ©2009, CECS
9
Application Model P1
P2
v1
C1
C2
P3
•
P4
Application Spec Model: set of communicating processes ¾ ¾ ¾
Executes on a simulation kernel (eg: SystemC) – No HdS Process functionality defined using C/C++ Blocking channels and non-blocking variables Copyright ©2009, CECS 10
10
Platform Architecture
Bus1
HW IP
•
Mem
Transducer
Arbiter
CPU1
Bus2
CPU2
Netlist of SW processors, HW, Buses and interfaces Copyright ©2009, CECS 11
11
Input: System Definition CPU1 P2
P3
HW IP
Bus1
Transducer
v1
C2 C1
Arbiter
P1
Mem
Bus2
P4
CPU2
System Definition = Platform + Application + Mapping Copyright ©2009, CECS 12
12
Computation Timing Estimation BB1
BB1
wait(t1) N
If
N
Y
If
Y
Processor Model BB2
BB3
BB2
BB3
wait(t2)
N
If
Y
Process CDFG
Timing Estimation
N
wait(t3)
If
Y
Timed Process
• DFG scheduling to compute basic block delay • RTOS model added for PEs with multiple processes Copyright ©2009, CECS
13
Communication Timing Estimation PE2 p1 p2
PE1
Bus3
Tx2
Tx1 Bus2
Untimed Bus1
Tx3 PE3 PE4
Application + Platform
Protocol Model
Estimation Engine
Bus1
Write( ) { Get_Bus( ); Transfer( ); Release_Bus( ); }
Write( ) { Get_Bus( ); wait (t1); Transfer( ); wait (t2); Release_Bus( ); wait (t3); }
Timed Bus1
• Protocol model used to estimate synchronization, arbitration and transfer • Timing is annotated in bus channel and HdS model Copyright ©2009, CECS
14
CPU1
Output: SystemC Timed TLM Mem P1
P2 OS Transducer
Bus1
Bus2
TLM Semantics P3
HW IP
• Application code Æ sc_thread
OS
• Processing element Æ sc_module P4 • OS Model Æ sc_module • Bus Æ sc_channel • Memory Æ Array inside sc_module CPU2 • TransducerÆ FIFO channel+sc_process Copyright ©2009, CECS
15
Transaction Level HdS Model
•
Routing and packeting layers of HdS generated in TLM Copyright ©2009, CECS
16
HdS Layers in TLM SW Core TLM
Sender Process i_Send (D,S);
i_Send (data, size) { sent = 0; //Rt = Route(Ch); //pktSz=MinBuf(Rt); do { i_SyncTransfer (data[sent], pktSz); sent+=PktSz; } while (sentFirstBus(); //if (Rt->HasTx()) { //Tx=Rt->FirstTx(); B Write (Req[Tx, Ch], &S); //} Tx request while (!B Sync(Ch)); B Write(Addr[B,Ch], D, S); } // end Send
Bus Channel Methods Write (Addr, DataPtr); Read (Addr, DataPtr); Sync (Channel_ID);
Synchronization and Transfer Model
Routing and Packeting B
Bus B Req
I/O Buffer
• •
Tx
Bus channel model provides basic transaction services Synchronization and transfer is modeled in SW core Copyright ©2009, CECS
17
TLM vs. Traditional Models TLM: Transaction Level Model ISM: Instruction Set Model PCAM: Pin/Cycle Accurate Model
Accuracy Board 100%
PCAM
Timed TLM
~92% ~80%
ISM
Func. TLM 0
2sec
3~4 hrs
15~18hrs
Exec. Time (MP3)
Time and accuracy trade off among different models Copyright ©2009, CECS
18
Outline •
Motivation for many-core HdS synthesis
•
Model based approach to system synthesis
•
HdS synthesis for TLM (Front-End)
•
HdS synthesis for PCAM (Back-End)
•
Embedded System Environment (ESE)
•
Experimental results for MP3 decoder example
•
Conclusion Copyright ©2009, CECS
19
Back End Synthesis Flow SystemC TLM
SW/RTOS Library
SW Synthesis
RTL IP Library
CÆRTL (Forte/NISC)
Interface Synthesis
OR Bus Library
Binary
HW RTL
IF RTL
Pin/Cycle Accurate Model (PCAM) Generator Prototype
CA Sim. Tools
C/Verilog PCAM
FPGA Tools Copyright ©2009, CECS
20
CPU1
Cycle-Accurate Software Synthesis Program EXE RTOS
Compile
P1 RTOS/ Driver Synthesis
HAL
P2 OS HAL
Transducer
Bus1
Bus2
Compile
OS P4 HW IP
Program EXE
RTOS/ Driver Synthesis
RTOS HAL
CPU2 Copyright ©2009, CECS
21
CPU1
Cycle-Accurate Hardware Synthesis Mem
Transducer
Bus1
Bus2
Cycle-accurate Synthesis
P3
HW IP (RTL)
Processes in C
CPU2 Copyright ©2009, CECS
22
Cycle-Accurate Interface Synthesis CPU1
Mem IC
Transducer Arbiter
Interface Synthesis
HW IP
CPU2
Copyright ©2009, CECS
23
CPU1
Pin/Cycle-Accurate Model (PCAM) Mem
Program EXE
IC
RTOS HAL
Transducer Arbiter
Program EXE
HW IP
PCAM is input for fast prototyping with FPGAs or implementation with ASIC design tools
RTOS HAL
CPU2
Copyright ©2009, CECS
24
Pin and Cycle Accurate Model (PCAM) CPU1 Model
System SW stack
App. Binary
Channel API Routing, Packeting Thread. IPC Synchronization Lib. Lib. Data transfer Mem. IH IH ... Access Instruction Set
CPU2 Model
Memory Transducer RTL
Bus1
Bus2 Bitstream
• •
Download
Sync. and transfer layers of HdS generated in PCAM Application, System libs and HdS is cross-compiled Copyright ©2009, CECS
25
HdS Layers in PCAM SW Core
Sender Process i_Send (D,S);
i_Send (data, size) { sent = 0; //Rt = GSRT(Ch); //pktSz=MinBuf(Rt); do { i_SyncTransfer (data[sent], pktSz); sent+=PktSz; } while (sentFirstBus(); //if (Rt->HasTx()) { //Tx=Rt->FirstTx(); WrMem (SendRB[Tx, Ch], S, WordSize); //} end Tx case while (!SyncFlag_Ch) { SuspendProc(); } SyncFlag_Ch=0; //SA=low(AR(B,Ch)); WrMem(SA, D, S); } // end Send Synchronization and Transfer
WrMem (A,D,S) { Written=0; while (Written