FPGA-based System-on-Chip Designs for Real-Time Applications in Particle Physics
Shebli Anvar, Olivier Gachelin, Pierre Kestener, Herve Le Provost, Irakli Mandjavidze
DAPNIA, CEA Saclay, 91191 Gif-sur-Yvette, France
[email protected]
VLVnt2, Catania, Italy, November 2005
Overview
Platform FPGAs
→ Xilinx Virtex-II Pro devices
→ Typical SoC architecture
Example designs (ongoing projects)
→ Test bench for the ANTARES off-shore DAQ/SC board
→ Selective Read-out Processor (SRP) for the CMS ECAL
→ On-board Gamma Ray Burst DAQ/Trigger and alert system for the ECLAIRs microsatellite
Concluding remarks on the use of the SoC approach
Platform FPGAs (Virtex-II Pro)
[Figure: Virtex-II Pro device layout with MGT, DCM, CPU and memory blocks]
Programmable logic cells
→ Combinatorial and synchronous
Versatile I/Os
→ Single-ended (LVTTL, LVCMOS) and differential (LVDS)
Hard IP cores
→ Clock management
→ Memory blocks
→ Serial transceivers (MGT)
→ Embedded processor(s)
Plus various soft IP cores
→ Microcontrollers, network interfaces...
Xilinx Virtex-II Pro 2VP30 (mid-range device)
→ 2 PowerPC 405 CPUs @ 300 MHz
→ 8 RocketIO transceivers, up to 3.125 Gbit/s
→ 136 dual-port memory blocks of 18 kbit = 2.4 Mbit
→ 644 configurable I/Os
SoC architecture on Virtex-II Pro
IBM CoreConnect standard: on-chip bus communication link
[Figure: SoC block diagram. Labels: PowerPC with I-cache and D-cache, PLB, memory, fast peripherals, PLB/OPB bridge, OPB, slow peripherals, UART (RS232 console), Ethernet, user logic (user-defined IP) behind an IPIF master/slave interface and the IPIC, clock, reset, JTAG. Legend: existing hard and soft IP cores; specific cores.]
PLB – Processor Local Bus
OPB – On-chip Peripheral Bus
IPIF – IP Interface
IPIC – IP Interconnect
SoC example
[Figure: example SoC block diagram. Labels: PowerPC, PLB, OPB, PLB/OPB bridge, 32 kB memory, RS232 console (19200 baud), Reset & ID register, user logic with data register and 256-byte memory, IPIF slave interface, IPIC (32-bit R/W), clock, reset, JTAG; clock labels of 50 MHz and 100 MHz.]
Uses 12% of the BRAM and 7% of the logic cells of a mid-range 2VP30 device
Plenty of resources left for much more sophisticated user cores
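The 12% BRAM figure can be cross-checked with a short calculation. The sketch below assumes that the 32 kB memory is built from block RAMs using only their 16 kbit of data (the remaining 2 kbit of each 18-kbit block being parity), and that the 256-byte user memory is not counted; both are assumptions, not statements from the slides.

```python
# Virtex-II Pro block RAMs hold 18 kbit each: 16 kbit of data plus 2 kbit of parity.
BRAM_TOTAL = 136             # blocks in a 2VP30, from the device summary
BRAM_DATA_BITS = 16 * 1024   # usable data bits per block (parity excluded, assumption)


def brams_needed(size_bytes: int) -> int:
    """Number of block RAMs needed to store size_bytes of data."""
    bits = size_bytes * 8
    return -(-bits // BRAM_DATA_BITS)  # ceiling division


program_memory_blocks = brams_needed(32 * 1024)
bram_fraction = program_memory_blocks / BRAM_TOTAL

print(f"{program_memory_blocks} of {BRAM_TOTAL} blocks = {bram_fraction:.1%}")
```

Under these assumptions the 32 kB memory alone accounts for roughly 12% of the block RAM, in line with the figure quoted above.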
Prototyping and design
A number of development kits with various Virtex-II Pro devices
→ 2VP7, 2VP30, 2VP50
→ 1 or 2 PowerPCs
→ 4 or 8 RocketIO transceivers
→ Pluggable optical modules
→ LVDS interfaces
→ RS232, Ethernet
→ 64 Mbyte external memory
P160 extension module: RS232 + Ethernet + Flash memory
→ Soft IP cores
→ Software libraries
[Figure: FF1152 development kit from Memec Inc.]
1st SoC development example: Test bench for the ANTARES DAQ/SC board
Production test bench for 350 Local Control Modules (LCM)
→ Electronics to be installed in the Mediterranean Sea, 2.5 km below the surface
→ Fully automated, with test reports populating the quality control DB
Several data & control interfaces with different I/O standards on the LCM board
→ Acoustic & thermal sensors: RS232 & RS485
→ Input data: LVDS links, 350 Mbit/s
→ Slow control: LVTTL bus
→ Trigger: LVDS links
→ On-shore link: Ethernet
Test bench emulates the LCM environment
→ Stimulates inputs and analyzes responses
The test bench
SoC-based tester board
→ Memec development kit with Xilinx 2VP30 FPGA
→ Supports hot-swappable DAQ/SC boards
→ Test duration ~15 minutes per LCM
[Figure: SoC tester board connected to the LCM board through the LVTTL bus, LVDS links, Ethernet and RS232/485]
Test bench organization
3 interacting systems: control PC, LCM & SoC tester
200 MHz embedded PowerPC on the tester FPGA runs Linux
→ With NFS root file system on the control PC
→ Simple cross-compilation step to reuse and adapt the ANTARES DAQ software
→ Development effort concentrated on the test functionality
Successions of tests initiated by the control PC
→ Actions taken by C++ callback functions in the LCM & tester
[Figure: control PC (test bench s/w: Test 1 ... Test n), SoC-based tester (test bench s/w callbacks) and LCM (test bench s/w callbacks). Labels: 100 Mbit/s Ethernet switch; serial links: LVDS, RS232; slow control bus: LVTTL.]
Firmware design of the SoC tester
An IP core per test
→ C++ callback function addresses the IP core corresponding to the active test
[Figure: tester firmware block diagram. Labels: OPB, 66 MHz, RS232 console, Ethernet, RS232/RS485 for sensors, IPIF slave interface, test IP cores (Test 1 ... Test n). Test 1, the data test IP core: FSM, 16x256 block RAM with preloaded data, three-state LVDS buffers, 7 data links to the LCM (350 Mbit/s), 50 MHz read-out clock from the LCM.]
Simplified firmware development
→ Most IP cores are very simple: the test sequence is in software
→ Use of existing IP cores for Ethernet and RS232/RS485 interfaces
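The "one IP core per test, one callback per test" scheme can be illustrated with a minimal dispatch sketch. It is written in Python for brevity, whereas the test bench itself uses C++ callbacks; the base addresses, register offsets, test names and the FakeBus class are all hypothetical and only stand in for memory-mapped access to the tester's IP cores.

```python
from typing import Callable, Dict

# Hypothetical base addresses of the per-test IP cores on the on-chip bus.
IP_CORE_BASE = {"data": 0x8000_0000, "trigger": 0x8001_0000}


class FakeBus:
    """Stands in for memory-mapped register access on the tester."""

    def __init__(self) -> None:
        self.regs: Dict[int, int] = {}

    def write(self, addr: int, value: int) -> None:
        self.regs[addr] = value & 0xFFFFFFFF

    def read(self, addr: int) -> int:
        return self.regs.get(addr, 0)


CALLBACKS: Dict[str, Callable[[FakeBus], bool]] = {}


def callback(test_name: str):
    """Register a function as the callback for one named test."""
    def register(func: Callable[[FakeBus], bool]):
        CALLBACKS[test_name] = func
        return func
    return register


@callback("data")
def run_data_test(bus: FakeBus) -> bool:
    base = IP_CORE_BASE["data"]
    bus.write(base + 0x0, 1)          # hypothetical control register: start
    return bus.read(base + 0x0) == 1  # read back as a trivial check


@callback("trigger")
def run_trigger_test(bus: FakeBus) -> bool:
    base = IP_CORE_BASE["trigger"]
    bus.write(base + 0x4, 0x1)        # hypothetical register: fire one trigger
    return bus.read(base + 0x4) == 0x1


def handle_request(test_name: str, bus: FakeBus) -> bool:
    """Run the callback of the test requested by the control PC."""
    handler = CALLBACKS.get(test_name)
    return handler(bus) if handler is not None else False


bus = FakeBus()
results = {name: handle_request(name, bus) for name in ("data", "trigger", "unknown")}
print(results)
```

Keeping the test sequence in such callbacks is what allows each IP core to stay simple: the hardware only exposes registers, and the ordering and checking live in software.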
2nd SoC development example: The Selective Read-out Processor (for CMS)
Part of the CMS electromagnetic calorimeter (ECAL) read-out
Assists in on-line ECAL raw data reduction
[Figure: ECAL data flow. Blocks: ECAL front-end electronics, trigger electronics, read-out, Selective Read-out Processor, HLT & DAQ. Labels: L1 Accept at 100 kHz; raw data 1.5 Mbyte; trigger tower flags; selective read-out flags; selected data 100 kbyte; 5 µs timing budget for the Selective Read-out Processor.]
Asynchronous hard real-time system
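To give a feeling for the numbers on this slide, the sketch below derives the reduction factor and the data throughput before and after selective read-out. It assumes that the 1.5 Mbyte and 100 kbyte figures are per-event sizes and that one event is read out per L1 Accept; the variable names are illustrative.

```python
RAW_EVENT_BYTES = 1_500_000       # raw ECAL data per event (assumed per L1 Accept)
SELECTED_EVENT_BYTES = 100_000    # data kept after selective read-out
L1_ACCEPT_RATE_HZ = 100_000       # 100 kHz
SRP_BUDGET_US = 5                 # timing budget of the SRP

reduction_factor = RAW_EVENT_BYTES / SELECTED_EVENT_BYTES
raw_throughput_gbyte_s = RAW_EVENT_BYTES * L1_ACCEPT_RATE_HZ / 1e9
selected_throughput_gbyte_s = SELECTED_EVENT_BYTES * L1_ACCEPT_RATE_HZ / 1e9
mean_accept_interval_us = 1_000_000 / L1_ACCEPT_RATE_HZ

print(f"reduction factor: {reduction_factor:.0f}x")
print(f"throughput: {raw_throughput_gbyte_s:.0f} -> {selected_throughput_gbyte_s:.0f} Gbyte/s")
print(f"mean interval between L1 Accepts: {mean_accept_interval_us:.0f} us "
      f"(SRP budget: {SRP_BUDGET_US} us)")
```

Under these assumptions the selection reduces the data volume by a factor of 15, and the 5 µs budget is half the mean interval between two L1 Accepts.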
SRP Boards
Single 6U VME crate
12 conceptually identical VME64x-compliant boards
Up to 17 optical communication links at 1.6 Gbit/s each
[Figure: SRP board layout. Labels: P1, J0 and P2 connectors; VME buffers; boundary scan & JTAG chain; FPROMs; memory; parallel optics (TTF Rx, SRF Tx, SRP Rx, SRP Tx); Xilinx Virtex-II Pro xc2vp70-6ff1704 (VME, serial links, algorithms, trigger IF); power supply; clock synthesizer; trigger interface (trigger, timing, and throttling control; TTS out; O/E); auxiliary connector (console, JTAG, Ethernet).]
SRP Tester: same hardware, modified firmware
SRP Application IP core
[Figure: SRP firmware block diagram. Labels: VME, RocketIO transceivers, OPB, 50 MHz, Ethernet, RS232 console, IPIF slave interface, arbiter, local bus, SR algo logic, multiport memories, SR FSM, RC FSM, trigger interface.]
Seamless integration in a SoC based on Virtex-II Pro devices
→ Embedded processor accesses IP resources via the slave interface
→ 80 MHz pipelined hardware logic to satisfy real-time requirements
→ Standalone “C” software on the 100 MHz PowerPC for control and monitoring
SRP Prototyping
Three development kits
→ 3 firmware designs
→ 3 standalone “C” applications
[Figure: prototyping setup. Labels: trigger control system emulator (2VP7, also a SoC design); SRP Tester and SRP (2VP50 and 2VP30 kits); trigger signals over flat cables; TT and SR links; internal SRP links.]
Validate SRP latency and communication channels
Advance SRP firmware/software development
Summary
Flexibility of SoC designs
→ Diversity of applications with substantially different requirements
Comfortable development environment
→ Relatively short learning phase
→ Common kernel + large variety of IP cores and associated software
→ Well-defined interface with user logic
→ Trade-off between hardware and software complexity
→ Running an OS on the embedded processor (VxWorks, Linux, Nucleus, RTEMS, …)
Ease of debugging and testing
→ Simulate individual modules
→ Debug the entire system by running a test application on the embedded processor
Performance of hardware and flexibility of software
3rd example: On-board GRB Trigger and Alert System of the ECLAIRs microsatellite
Gamma Ray Burst (GRB) study (4 to 50 keV)
→ Compute in near real time the position of the GRB in the sky with an accuracy of up to 10 arcmin
→ Transmit this information to the ground in real time and distribute it as fast as possible to other observatories
On-board 2-level trigger system
→ First level: counting histogram (hardware)
→ Second level: image processing to localize sources (software, FFT)
SoC approach for hardware/software design of the DAQ/Trigger sub-system
→ FPGA + soft-core processor (MMU + FPU) + real-time OS
→ MicroBlaze (Xilinx, 32-bit RISC)
→ LEON (open source, SPARC V8, ESA project)
ECLAIRs on-board trigger and data flow
[Figure: data-flow diagram. Labels: CXG, 16 ADC, 8 modules + overseer, all photons, SXC, EGCU, UTS, DAQ, Trigger, photon lists, freeze memory, bulk memory, Config/Status/HK, TM/TC, CXG position refine (request, answer), AbsTime, SatPointing, GRB alert, VHF, X-band.]
Summary
Flexibility of SoC designs
→ Diversity of applications with substantially different requirements
Comfortable development environment
→ Relatively short learning phase
→ Common kernel + large variety of IP cores and associated software
→ Well-defined interface with user logic
→ Trade-off between hardware and software complexity
→ Running an OS on the embedded processor
Ease of debugging and testing
→ Simulate individual modules
→ Debug the entire system by running a test application on the embedded processor
Performance of hardware and flexibility of software
Xilinx EDK (Embedded Development Kit)
Virtex-II Pro Linux boot terminal
Deconvolution algorithm – computing speed study
Decorrelate data from the detector with the mask geometry
2D decorrelation: FFT, O(N^2 log N), versus direct decorrelation, O(N^4)
N ~ mask_size + detector_size ~ 120 + 80 = 200
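The gap between the two approaches for N = 200 can be estimated by comparing the operation counts directly. This is only an order-of-magnitude sketch: constant factors are ignored and the logarithm is taken in base 2, both of which are assumptions.

```python
import math


def direct_ops(n: int) -> int:
    """Operation count of direct 2D decorrelation, of order N^4."""
    return n ** 4


def fft_ops(n: int) -> float:
    """Operation count of FFT-based 2D decorrelation, of order N^2 log2(N)."""
    return n ** 2 * math.log2(n)


n = 120 + 80  # mask_size + detector_size
speedup = direct_ops(n) / fft_ops(n)

print(f"direct: {direct_ops(n):.2e} operations")
print(f"FFT:    {fft_ops(n):.2e} operations")
print(f"ratio:  about {speedup:.0f}x")
```

With these simplifications the FFT approach needs several thousand times fewer operations than the direct one, which is why the rest of the study concentrates on FFT implementations.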
Hardware implementation of the FFT algorithm (FPGA via an IP core)
Fixed-point computing: Xilinx IP core (datasheet DS260)
* 1D FFT, 256 points, 24-bit data, Virtex-II @ 200 MHz: 1 to 2 µs
* Extrapolated to a 256x256 2D FFT: 500 to 1000 µs
* Not easy to handle (truncation, numeric representation, etc.)
Floating-point computing: IP core from Dillon or 4DSP
* 1D FFT, 256 points, 8+16-bit data, Virtex-4 @ 200 MHz: 4 µs; extrapolated to Virtex-II @ 100 MHz: 8 µs
* Extrapolated to a 256x256 2D FFT: 2000 to 4000 µs
* Expensive IP (19,000 to 26,000 dollars), target-specific (VHDL sources unavailable)
Writing an FFT IP (floating point): several weeks
* Design directly in VHDL
* Co-design software, C-to-HDL (Handel-C)
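The 2D figures above are consistent with a simple row-column extrapolation: a 256x256 2D FFT is computed as 256 row transforms followed by 256 column transforms, each a 256-point 1D FFT. The sketch below assumes exactly that, and ignores data transposition and memory access time; the function name is illustrative.

```python
def fft2d_time_us(n: int, fft1d_time_us: float) -> float:
    """Row-column estimate: n row FFTs plus n column FFTs of n points each."""
    return 2 * n * fft1d_time_us


N = 256
fixed_point_us = (fft2d_time_us(N, 1), fft2d_time_us(N, 2))      # 1 to 2 us per 1D FFT
floating_point_us = (fft2d_time_us(N, 4), fft2d_time_us(N, 8))   # 4 to 8 us per 1D FFT

print(f"fixed point:    {fixed_point_us[0]:.0f} to {fixed_point_us[1]:.0f} us")
print(f"floating point: {floating_point_us[0]:.0f} to {floating_point_us[1]:.0f} us")
```

The resulting 512 to 1024 µs and 2048 to 4096 µs match the rounded 500 to 1000 µs and 2000 to 4000 µs quoted on the slide.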
Software implementation of the FFT algorithm (C language)
Advantage: easy development, test bench easy to develop
Embedded processors: LEON (SPARC V8) or MicroBlaze (RISC)
Compilation toolchain: GCC
Library: FFTW
[Figure: FFTW benchmark, Pentium 4 at 2.2 GHz]
Software implementation of the FFT algorithm (C language) -- 2
Test of the FFTW3 library: ./bench irf 256x256
* Desktop machines (P4 2 GHz or SPARC 1.6 GHz): 3 ms (950 MFLOPS)
* Extrapolation for LEON on FPGA @ 100 MHz: 50 ms to 100 ms (this assumes that the FPU can handle such a frequency)
* Problem with MicroBlaze: no compiler that can handle the FPU is available; Xilinx FPU performance is 33 MFLOPS (Virtex-4 @ 200 MHz, usable with Virtex-II @ 100 MHz)
* For comparison, Virtex-II Pro @ 200 MHz (Linux + software-emulated floating point): 1.94 seconds (1.4 MFLOPS)
The total time of the source localization algorithm can be estimated at several tenths of a second
Memory: 12 x 256 kbytes ~ 3 Mbytes of RAM
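The quoted times and MFLOPS ratings can be related to one another with a short calculation. The sketch assumes the FFTW benchmark convention in which a real-data transform of N points is credited with 2.5 N log2(N) floating-point operations (half the 5 N log2(N) used for complex data); this convention is an assumption here, not something stated on the slides.

```python
import math


def fftw_real_flops(rows: int, cols: int) -> float:
    """Nominal operation count of a real 2D FFT (assumed FFTW benchmark convention)."""
    n = rows * cols
    return 2.5 * n * math.log2(n)


def transform_time_ms(flops: float, mflops: float) -> float:
    """Time in milliseconds to execute flops operations at a given MFLOPS rating."""
    return flops / (mflops * 1e6) * 1e3


flops_256x256 = fftw_real_flops(256, 256)
desktop_ms = transform_time_ms(flops_256x256, 950)      # desktop machine
xilinx_fpu_ms = transform_time_ms(flops_256x256, 33)    # Xilinx FPU rating
soft_float_ms = transform_time_ms(flops_256x256, 1.4)   # emulated floating point
memory_mbytes = 12 * 256 / 1024                         # 12 buffers of 256 kbytes

print(f"desktop: {desktop_ms:.1f} ms, FPU at 33 MFLOPS: {xilinx_fpu_ms:.0f} ms, "
      f"soft float: {soft_float_ms / 1e3:.2f} s, memory: {memory_mbytes:.0f} Mbytes")
```

Under this convention the estimates land close to the measured 3 ms and 1.94 s, and a 33 MFLOPS FPU falls inside the 50 ms to 100 ms range extrapolated for an FPGA soft core.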