
FPGA-based System-on-Chip Designs for Real-Time Applications in Particle Physics

Shebli Anvar, Olivier Gachelin, Pierre Kestener, Herve Le Provost, Irakli Mandjavidze
DAPNIA, CEA Saclay, 91191 Gif-sur-Yvette, France

VLVnt2, Catania, Italy, November 2005

Overview

• Platform FPGAs
→ Xilinx Virtex-II Pro devices
→ Typical SoC architecture
• Example designs (on-going projects)
→ Test bench for the ANTARES off-shore DAQ/SC board
→ Selective Read-out Processor (SRP) for the CMS ECAL
→ On-board Gamma Ray Burst DAQ/Trigger and alert system for the ECLAIRs microsatellite
• Concluding remarks on the use of the SoC approach

Platform FPGAs (Virtex-II Pro)

• Programmable logic cells → combinatorial and synchronous
• Versatile IOs → single-ended (LVTTL, LVCMOS) and differential (LVDS)
• Hard IP cores
→ Clock management (DCM)
→ Memory blocks
→ Serial transceivers (MGT)
→ Embedded processor(s)
• Plus various soft IP cores → microcontrollers, network IF...

Xilinx Virtex-II Pro 2VP30 (mid-range device)
→ 2 PowerPC 405 CPUs @ 300 MHz
→ 8 RocketIO transceivers, up to 3.125 Gbit/s
→ 136 18-kbit dual-port memory blocks = 2.4 Mbit
→ 644 configurable I/Os

[Figure: device floorplan. Labels: MGT, DCM, CPU, memory.]

SoC architecture on Virtex-II Pro

• IBM CoreConnect standard: on-chip bus-communication link

[Block diagram. Labels: PowerPC with I-cache and D-cache; PLB; memory; Ethernet; fast peripherals; PLB/OPB bridge; OPB; slow peripherals; UART (RS232 console); user logic (user-defined IP) with master/slave IPIF interface and IPIC; clock, reset, JTAG. Legend: existing hard and soft IP cores; specific.]

PLB – Processor Local Bus
OPB – On-chip Peripheral Bus
IPIF – IP Interface
IPIC – IP Interconnect

SoC example

[Block diagram. Labels: PowerPC at 100 MHz; PLB; 32 kB memory; PLB/OPB bridge; OPB; RS232 console at 19200 baud; 50 MHz (shown twice); user logic with slave IPIF interface, 32-bit R/W IPIC, Reset & ID register, data register and 256-byte memory; clock, reset, JTAG.]

• 12% of BRAM and 7% of logic cells of a mid-range 2VP30 device
→ Plenty of resources for much more sophisticated user cores
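The 12% BRAM figure can be roughly cross-checked from the two memories in the example design. This is only a sketch: it assumes both memories are built from block RAM and that each 18-kbit block is used as 16 kbit of data (parity bits unused), neither of which is stated on the slide.

```python
import math

# Figures taken from the slides.
BRAM_BLOCKS_2VP30 = 136          # 18-kbit block RAMs in a 2VP30
MAIN_MEMORY_BYTES = 32 * 1024    # "32 kB memory"
USER_MEMORY_BYTES = 256          # "256 byte memory"

# Assumption (not from the slides): each block holds 16 kbit of data,
# leaving its 2 kbit of parity unused.
DATA_BITS_PER_BLOCK = 16 * 1024


def blocks_needed(n_bytes):
    """Number of block RAMs needed to hold n_bytes of data."""
    return math.ceil(n_bytes * 8 / DATA_BITS_PER_BLOCK)


main_blocks = blocks_needed(MAIN_MEMORY_BYTES)
user_blocks = blocks_needed(USER_MEMORY_BYTES)
bram_fraction = (main_blocks + user_blocks) / BRAM_BLOCKS_2VP30

print(f"{main_blocks + user_blocks} blocks, {bram_fraction:.1%} of BRAM")
```

Under these assumptions the design needs 17 of the 136 blocks, about 12.5%, which is close to the quoted 12%.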

Prototyping and design

• A number of development kits with various Virtex-II Pro devices
→ 2VP7, 2VP30, 2VP50
→ 1 or 2 PowerPCs
→ 4 or 8 RocketIO transceivers
→ Pluggable optical modules
→ LVDS interfaces
→ RS232, Ethernet
→ 64 Mbyte external memory
→ P160 extension module: RS232 + Ethernet + Flash memory
→ Soft IP cores
→ Software libraries

[Photo: FF1152 development kit from Memec Inc.]

1st SoC development example: Test bench for the ANTARES DAQ/SC board

• Production test bench for 350 Local Control Modules (LCM)
→ Electronics to be installed in the Mediterranean Sea, 2.5 km below the surface
→ Fully automated, with the test report populating the quality control DB
• Several data & control interfaces with different IO standards

[Diagram: LCM board interfaces. Acoustic & thermal sensors: RS232 & RS485; input data: LVDS links at 350 Mbit/s; slow control: LVTTL bus; trigger: LVDS links; on-shore link: Ethernet.]

• Test bench emulates the LCM environment
→ Stimulates inputs and analyzes responses

The test bench

• SoC-based tester board
→ Memec development kit with Xilinx 2VP30 FPGA
→ Supports hot-swappable DAQ/SC boards
→ Test duration ~15 minutes per LCM

[Photo: SoC tester board and LCM board. Labelled connections: LVTTL bus, LVDS links, Ethernet, RS232/485.]
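For scale, the quoted per-module duration can be multiplied out over the full production run. This is a rough sketch that assumes a single tester running modules back to back, with no allowance for handling time or retests.

```python
LCM_COUNT = 350          # modules to test, from the slides
MINUTES_PER_LCM = 15     # approximate test duration, from the slides

# Assumption: one tester, modules tested one after another.
total_minutes = LCM_COUNT * MINUTES_PER_LCM
total_hours = total_minutes / 60

print(f"{total_minutes} min = {total_hours} h")
```

That is 87.5 hours of pure test time for the 350 modules.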

Test bench organization

• 3 interacting systems: control PC, LCM & SoC tester
• 200 MHz embedded PowerPC on tester FPGA runs Linux OS
→ With NFS root file system on control PC
→ Simple cross-compilation step to reuse and adapt the ANTARES DAQ software
→ Concentrate development efforts on the test functionalities
• Successions of tests initiated by control PC
→ Actions taken by C++ callback functions in LCM & tester

[Diagram: the control PC runs the test bench software (Test 1 ... Test n); the SoC-based tester and the LCM each run test bench software with callbacks. Labelled connections: 100 Mbit/s Ethernet switch; serial links (LVDS, RS232); slow control bus (LVTTL).]

Firmware design of the SoC tester

• An IP core per test
→ C++ callback function addresses the IP core corresponding to the active test

[Block diagram. Labels: OPB; 66 MHz; RS232 console; Ethernet; RS232/RS485 for sensors; test IP cores (Test 1: data test IP core, ..., Test n); IPIF slave interface; FSM; 16x256 block RAM with preloaded data; three-state LVDS buffers; 7 data links to the LCM at 350 Mbit/s; 50 MHz read-out clock from the LCM.]

• Simplified firmware development
→ Most IP cores are very simple: the test sequence is in software
→ Use of existing IP cores for Ethernet and RS232/RS485 interfaces
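The "one IP core per test, one callback per IP core" idea can be illustrated with a small dispatch table. This is a hypothetical sketch, written in Python rather than the C++ used on the tester: the test names, base addresses, register behaviour and the `FakeBus` class are all invented for illustration and do not come from the ANTARES software.

```python
class FakeBus:
    """Stands in for memory-mapped access to IP core registers (illustrative)."""

    def __init__(self):
        self.regs = {}

    def write(self, addr, value):
        self.regs[addr] = value & 0xFFFFFFFF   # registers are 32 bits wide

    def read(self, addr):
        return self.regs.get(addr, 0)


# Assumed address map: one base address per test IP core (hypothetical values).
IP_CORE_BASE = {"data_links": 0x1000, "slow_control": 0x2000}


def make_callback(bus, test_name):
    """Build the callback that drives the IP core belonging to one test."""
    base = IP_CORE_BASE[test_name]

    def callback(pattern):
        bus.write(base, pattern)   # program the core for the active test
        return bus.read(base)      # read back as a stand-in for a response

    return callback


bus = FakeBus()
callbacks = {name: make_callback(bus, name) for name in IP_CORE_BASE}


def run_test(name, pattern):
    """Control-PC side: trigger the callback registered for one test."""
    return callbacks[name](pattern)


result = run_test("data_links", 0xA5A5)
print(hex(result))
```

Keeping the test sequence in the callbacks is what lets the IP cores themselves stay simple: each core only has to expose a few registers.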

2nd SoC development example: The Selective Read-out Processor (for CMS)

• Part of the CMS electromagnetic calorimeter (ECAL) read-out
• Assists in on-line ECAL raw data reduction

[Diagram. Labels: ECAL front-end electronics; trigger electronics; L1 Accept (100 kHz); read-out; raw data (1.5 Mbyte); selected data (100 Kbyte); trigger tower flags; selective read-out flags; Selective Read-out Processor (5 µs timing budget); HLT & DAQ.]

→ Asynchronous hard real-time system
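The event sizes and trigger rate on this slide imply the following reduction factor and aggregate data rates. This is a back-of-the-envelope sketch that assumes decimal units (1 Mbyte = 10^6 bytes) and that every L1 Accept produces one event of the quoted size.

```python
RAW_EVENT_BYTES = 1_500_000       # 1.5 Mbyte of raw data per event
SELECTED_EVENT_BYTES = 100_000    # 100 Kbyte of selected data per event
L1_ACCEPT_RATE_HZ = 100_000       # 100 kHz

reduction_factor = RAW_EVENT_BYTES / SELECTED_EVENT_BYTES
raw_rate_gbyte_s = RAW_EVENT_BYTES * L1_ACCEPT_RATE_HZ / 1e9
selected_rate_gbyte_s = SELECTED_EVENT_BYTES * L1_ACCEPT_RATE_HZ / 1e9

print(f"reduction x{reduction_factor:.0f}: "
      f"{raw_rate_gbyte_s:.0f} GB/s -> {selected_rate_gbyte_s:.0f} GB/s")
```

Selective read-out therefore cuts the data volume by a factor of 15, from about 150 GB/s to about 10 GB/s.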

SRP Boards

• Single 6U VME crate
• 12 conceptually identical VME64x-compliant boards
• Up to 17 optical communication links at 1.6 Gbit/s each

[Board layout. Labels: P1, J0 and P2 connectors; VME buffers; boundary scan & JTAG chain; FPROMs; memory; parallel optics (TTF Rx, SRF Tx, SRP Rx, SRP Tx); Xilinx Virtex-II Pro xc2vp70-6ff1704 (VME, serial links, algorithms, trigger IF); power supply; clock synthesizer; trigger interface (trigger, timing and throttling control; TTS out; O/E); auxiliary connector (console, JTAG, Ethernet).]

→ SRP Tester: same hardware, modified firmware

SRP Application IP core

[Block diagram. Labels: OPB; 50 MHz; IPIF slave interface; VME; Ethernet; RS232 console; arbiter; local bus; SR algo logic; multiport memories; RocketIO transceivers; SR FSM; RC FSM; trigger interface.]

• Seamless integration in SoC based on Virtex-II Pro devices
→ Embedded processor accesses IP resources via slave interface
→ 80 MHz pipelined hardware logic to satisfy real-time requirements
→ Standalone "C" software on 100 MHz PowerPC to control and monitor
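To see why the time-critical path sits in hardware, the 5 µs timing budget quoted earlier for the SRP can be expressed in cycles of the 80 MHz logic clock. This is a sketch only: it treats the whole budget as available to the logic, whereas in practice link and transfer latencies also have to fit inside it.

```python
LOGIC_CLOCK_HZ = 80_000_000    # pipelined hardware logic
TIMING_BUDGET_US = 5           # SRP timing budget
L1_ACCEPT_RATE_HZ = 100_000    # trigger rate

# Upper bound on the clock cycles available per event within the budget.
cycles_in_budget = LOGIC_CLOCK_HZ * TIMING_BUDGET_US // 1_000_000

# Average number of clock cycles between two consecutive triggers.
cycles_between_triggers = LOGIC_CLOCK_HZ // L1_ACCEPT_RATE_HZ

print(cycles_in_budget, cycles_between_triggers)
```

At most 400 clock cycles are available per event, and a new trigger arrives every 800 cycles on average.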

SRP Prototyping

• Three development kits
→ 3 firmware designs
→ 3 standalone "C" applications

[Diagram. Labels: trigger control system emulator (2VP7: also a SoC design); SRP (2VP50); SRP Tester (2VP30); trigger signals over flat cables; TT and SR links; internal SRP links.]

• Validate SRP latency and communication channels
• Advance in SRP firmware/software

Summary

• Flexibility of SoC designs
→ Diversity of applications with substantially different requirements
• Comfortable development environment
→ Relatively short learning phase
→ Common kernel + large variety of IP cores and associated software
→ Well-defined interface with user logic
→ Tradeoff between hardware and software complexity
→ Running OS on embedded processor (VxWorks, Linux, Nucleus, RTEMS, …)
• Ease of debugging and testing
→ Simulate individual modules
→ Debug entire system running a test application on embedded processor

Performance of hardware and flexibility of software

3rd example: On-board GRB Trigger and Alert System of the ECLAIRs microsatellite

• Gamma Ray Burst (GRB) study (4 to 50 keV)
• Compute in near real time the position of the GRB in the sky with an accuracy of up to 10 arcmin
• Transmit this information to the ground in real time and distribute it as fast as possible to other observatories
• On-board 2-level trigger system
→ First level: counting histogram (hardware)
→ Second level: image processing to localize sources (software, FFT)

SoC approach for hardware/software design of the DAQ/Trigger sub-system
• FPGA + soft-core processor (MMU + FPU) + real-time OS
→ MicroBlaze (Xilinx, 32-bit RISC)
→ LEON (open source, SPARC V8, ESA project)

ECLAIRs on-board trigger and data flow

[Data-flow diagram. Labels: CXG; 16 ADC; 8 modules + overseer; all photons; EGCU; SXC; UTS; Config/Status/HK; TM/TC; CXG position refine (request, answer); AbsTime; SatPointing; DAQ; photon lists; freeze mem; Trigger; bulk-mem; GRB alert; VHF; X-band.]

Summary

• Flexibility of SoC designs
→ Diversity of applications with substantially different requirements
• Comfortable development environment
→ Relatively short learning phase
→ Common kernel + large variety of IP cores and associated software
→ Well-defined interface with user logic
→ Tradeoff between hardware and software complexity
→ Running OS on embedded processor
• Ease of debugging and testing
→ Simulate individual modules
→ Debug entire system running a test application on embedded processor

Performance of hardware and flexibility of software

Xilinx EDK (Embedded Development Kit)


Virtex-II Pro Linux boot terminal


Deconvolution algorithm – computing speed study

• Decorrelate data from the detector with the mask geometry
• 2D decorrelation: FFT (N^2 log(N)) versus direct decorrelation (N^4)
• N ~ mask_size + detector_size ~ 120 + 80 = 200
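Plugging N = 200 into the two scaling laws gives a feel for the gap between them. This is an order-of-magnitude sketch only: constant factors are ignored and a base-2 logarithm is assumed, neither of which the slide specifies.

```python
import math

N = 200   # mask_size + detector_size = 120 + 80

direct_ops = N ** 4                  # direct 2D decorrelation
fft_ops = N ** 2 * math.log2(N)      # FFT-based decorrelation (log base assumed)
ratio = direct_ops / fft_ops

print(f"direct ~{direct_ops:.1e}, FFT ~{fft_ops:.1e}, ratio ~{ratio:.0f}")
```

With these assumptions the FFT approach needs several thousand times fewer operations, which is why the remaining slides study FFT implementations only.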

Hardware implementation of the FFT algorithm (FPGA via an IP core)

• Fixed-point computing: Xilinx IP core (datasheet DS260)
* 1D FFT, 256 points, 24-bit data, Virtex-II @ 200 MHz: 1 to 2 µs
* Extrapolate to 2D FFT 256x256: 500 to 1000 µs
* NOT EASY to HANDLE (truncation, numeric representation, etc.)
• Floating-point computing: IP core from Dillon or 4DSP
* 1D FFT, 256 points, 8+16-bit data, Virtex-4 @ 200 MHz: 4 µs; extrapolation to Virtex-II @ 100 MHz: 8 µs
* Extrapolate to 2D FFT 256x256: 2000 to 4000 µs
* Expensive IP (19 to 26000 dollars), target-specific (VHDL sources unavailable)
• Write an FFT IP (floating point): several weeks
* Design directly in VHDL
* Co-design software, C-to-HDL (Handel-C)
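The 2D extrapolations are consistent with the usual row-column decomposition, in which a 256x256 transform is computed as 256 row transforms followed by 256 column transforms. The sketch below assumes that decomposition and ignores any transposition or memory-access overhead; the slides do not say how the extrapolation was actually done.

```python
SIZE = 256
TRANSFORMS_2D = 2 * SIZE   # 256 row FFTs + 256 column FFTs


def extrapolate_2d_us(t_1d_us):
    """Estimated 2D FFT time in µs from the time of one 256-point 1D FFT."""
    return TRANSFORMS_2D * t_1d_us


fixed_point_us = (extrapolate_2d_us(1), extrapolate_2d_us(2))
floating_point_us = (extrapolate_2d_us(4), extrapolate_2d_us(8))

print(fixed_point_us, floating_point_us)
```

This gives 512 to 1024 µs for the fixed-point core and 2048 to 4096 µs for the floating-point core, matching the rounded figures on the slide.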

Software implementation of the FFT algorithm (C language)

• Advantage: easy development, test bench easy to develop
• Embedded processors: LEON (SPARC V8) or MicroBlaze (RISC)
• Compilation toolchain: GCC
• Library: FFTW
• Benchmark: Pentium 4, 2.2 GHz

Software implementation of the FFT algorithm (C language) -- 2

• Test of the FFTW3 library: ./bench irf 256x256
→ Desktop machines (P4 2 GHz or Sparc 1.6 GHz): 3 ms (950 MFLOPS)
• Extrapolation for LEON on FPGA @ 100 MHz: 50 ms to 100 ms (this assumes that the FPU can run at that frequency)
• Problem with MicroBlaze (no compiler able to use the FPU is available); Xilinx FPU performance is 33 MFLOPS (Virtex-4 @ 200 MHz, usable with Virtex-II @ 100 MHz)
• For comparison, Virtex-II Pro @ 200 MHz (Linux + software-emulated floating point): 1.94 seconds (1.4 MFLOPS)
• The total time of the source localization algorithm can be estimated at several tenths of a second
• Memory: 12 x 256 kBytes ~ 3 Mbytes RAM
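The timing figures can be cross-checked by deriving the work per transform from the desktop benchmark and rescaling it to the other quoted throughputs. This is a rough sketch: it assumes the MFLOPS figures all use the same operation count and takes the 33 MFLOPS value at face value, whichever clock frequency it refers to.

```python
DESKTOP_TIME_S = 3e-3      # 256x256 transform on a desktop machine
DESKTOP_MFLOPS = 950

# Work per transform implied by the desktop benchmark, in MFLOP.
work_mflop = DESKTOP_TIME_S * DESKTOP_MFLOPS


def time_at(mflops):
    """Estimated transform time in seconds at a given sustained MFLOPS."""
    return work_mflop / mflops


t_xilinx_fpu_s = time_at(33)     # Xilinx FPU figure from the slide
t_soft_float_s = time_at(1.4)    # software-emulated floating point

memory_kbytes = 12 * 256         # 12 buffers of 256 kBytes

print(f"{work_mflop:.2f} MFLOP, {t_xilinx_fpu_s * 1e3:.0f} ms, "
      f"{t_soft_float_s:.2f} s, {memory_kbytes} kBytes")
```

The rescaled estimates are about 86 ms at 33 MFLOPS, inside the 50 to 100 ms range extrapolated for LEON, and about 2.0 s at 1.4 MFLOPS, close to the measured 1.94 s. The memory estimate comes to 3072 kBytes, i.e. the quoted ~3 Mbytes.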