Image and video Processing and Communication. DYNAMIC MULTIOBJECTIVE
OPTIMIZATION MAN. ENERGY-PERFORMANCE-ACCURACY SPACE FOR ...
Image and video Processing and Communications Laboratory
DYNAMIC MULTIOBJECTIVE OPTIMIZATION MANAGEMENT OF THE ENERGY-PERFORMANCE-ACCURACY ACCURACY SPACE FOR SEPARABLE 2-D 2 COMPLEX FILTERS ABSTRACT
OPTIMIZATION FRAMEWORK FOR 2D COMPLEX FILTERS: The optimization framework consists of 3 steps:
imag
-
COEFFICIENTS (imag) SYMMETRY (imag) v E sclr 1D FIR core COEFFICIENTS (imag) SYMMETRY (imag) v E sclr 1D FIR core
E
NO
Yr
NO
Yi
+
COEFFICIENTS (real) SYMMETRY (real) v E sclr 1D FIR core
sclr
v
The 1D complex FIR filter IP processes complex input data and has complex coefficients (VHDL IP core available at www.ivpcl.org)
EPA constraints
Embedded System: The figure below shows an embedded system (implemented in the ML605 Board) that allows for Dynamic Partial Reconfiguration and Dynamic Frequency Control. This embedded system implements the 2D separable complex FIR filter. The 1D complex FIR filter IP is a PLB peripheral. Dynamic Frequency Control: The Dynamic Reconfiguration Port (DRP) of the Multi-Mode Clock Manager (MMCM) inside the Virtex-6 FPGA can adjust the frequency at run-time via the parameter O0. The complex filter is clocked at the adjustable frequency clkfx while the rest is clocked at PLB_clk (100 MHz).
(a)
PLB Bus
(b)
PR_done
...
DRP
Filter n
PLB slave interface
Filter 2
Col n
CF card
Filter 1
Row n
'2n' bitstreams in memory
DPR
Col 2
memory
ICAP core
Row 2
DMA core
Col 1
System ACE
PRR
Row 1
Ethernet MAC
S ICAP port
M
External IBUFGDS ref. clock clock (200 MHz) M clkfx = × clkin D × O0 O0 is controlled at run-time PLB Bus
oFIFO
PLB
complex FIR IP core
MMCM_ADV clkfbin clkfbout clkin1 rst clkout0 BUFG dclk di[15..0] dwe den daddr[6..0] do[15..0] drdy
...
µ Blaze
iFIFO
BUFG
interface
clkfx frequency & PR ctrl PR_done
clkfx
Complex Filter peripheral
FREQ & PR CTRL - PLB SLAVE
FSM
rd wr
rst
FSM
ofull
PR_done
2D Gabor magnitude response
Real-valued image
(c)
IP2Bus_AddrAck
IP2Bus_Wrack
IP2Bus_Rdack
IP2Bus_Data
PLB Interface - User Logic
(b)
(a) 'm' Pareto points PO 1 PO 2 PO 3 PO 4
Row 1 Col 1
f1 EPA1
Row 2 Col 2
f2 EPA2
Row 1 Col 1
f3 EPA3
Row 3 Col 3
f2 EPA4
Pareto front
PO m
Row n Col n
f4 EPAm
bitstream pair 'n' unique pairs -Performance n ≤ m (c)
-Accuracy
EPA values
Filtered image (magnitude)
RESULTS Multiobjective optimization of the EPA space: The Pareto front lied entirely at the 100 MHz frequency. Thus, we carried out optimization for the EPA space at 100 MHz. There are 20 Pareto-optimal points, that requires 20 MB of memory (40 bitstreams). Dynamic EPA Management: Time-varying constraints applied to a video sequence. The circled points meet the constraints. 0.35
0.3
Energy per frame Accuracy (psnr)
min 50dB
OB=8
NH=10, OB=16 62.11dB, 0.2093mJ
N=7, NH=10, OB=16 51.47dB, 0.1821mJ
psnr(dB)
90
N=29,NH=12,OB=16 90.68dB, 0.2463mJ
80
(a)
N=31
N=15
N=29 N=23
N=11 N=9
N=19
N=7
100
215
frame # 190
frame # 30
0.2
psnr (dB)
100
N=31, NH=16, OB=16 98dB, 0.289mJ
60
0.25mJ 0.25mJ -60dB max max
Freq.: 100 MHz N=15,
0.25
40 clkfx
-Performance
Gabor 31x31
iwren
PLB_clk
-Performance -Accuracy
A complex image is processed through a Gabor separable filter (complex coeffiicients). The table shows: parameter and frequency combinations for the generation of the EPA space. N (# of coefficients), NH (coefficient bit-width), L (LUT size), OB (output bit-width). Ideal filter: 31x31 double precision coefficients. Test image: analytical lena (8-bit, CIF resolution)
Energy per frame(mJ)
iempty
orden
irden
E sclr v
oempty
Yi
full empty
Xi
Yr
32
complex FIR IP B=8, NO=8
Slave Reg 1 oFIFO 512x32 FWFT mode DO DI rden wren
owren
ifull
Xr
32
full empty
Bus2IP_Data iwren
Bus2IP_RdCE(1)
Bus2IP_WrCE(0)
Bus2IP_BE
Bus2IP_Clk
Bus2IP_WrReq
Bus2IP_RdReq
B=8
PRR NO=8
-Accuracy
EXPERIMENTAL SETUP
PLB46 Slave Burst IP Slave Reg 0 iFIFO 512x32 FWFT mode DI DO wren rden
Pareto front
...
1D complex FIR Filter IP
constraints
COEFFICIENTS (real) SYMMETRY (real) v E sclr 1D FIR core
real
freq.
imag
Energy per frame
real
Feasible set: golden points. Circled point: realization from the feasible set that minimizes energy consumption. Pareto optimal point: Hardware realization that becomes active in the FPGA via DPR and/or Dynamic Frequency Control. It is represented by:
Energy per frame
[NO NQ]
OP
L
COEFFICIENTS SYMMETRY (from text file)
Accuracy(Ri ) ≥ 50dB subject to : Performance(Ri ) ≥ 30 fps
Energy per frame
Xi
B
min Energy(Ri ) Ri
PSNR (dB)
B
1) Generate the Energy-Performance-Accuracy space of 2D complex filter realizations, by varing hardware parameters and frequency of operation. 2) Multi-objective Pareto Optimization of the EPA space: The EPA space is represented by a set of hardware realizations along with their EPA values. We find the optimal realizations in the Pareto (multi-objective) sense. 3) Dynamic management based on real-time EPA constraints: Once the Pareto front has been extracted, we can cast optimization problems based on PPA constraints. Example:
...
In the context of an embedded system, a 2D separable complex filter is implemented by cyclically swapping the row filter (1D) with the column filter (1D) via Dynamic Partial Reconfiguration.
Xr
NH
2D SEPARABLE COMPLEX FILTER IMPLEMENTATION AND PLB PERIPHERAL
B
N
We present a dynamic framework for 2D complex filter implementation that is based on a multi-objective optimization scheme that generates Pareto-optimal realizations from the Energy-Performance-Accuracy Accuracy (EPA) space. The EPA space is created by evaluating the 2D complex filter realizations in terms of their required energy, accuracy, and performance. Dynamic EPA management, carried out via Dynamic Partial Reconfiguration (DPR) and Dynamic Frequency Control, then consists on selecting Pareto-optimal realizations that meet timetime varying EPA requirements. We demonstrate dynamic EPA management by applying a complex filter to a standard video sequence.
80
70
205
frame # 120
frame # 280
60 avg=58.78 std=0.3348
210 fps
50
fps
0
50
Frame #
avg=66.56 std=0.4168
100
150
avg=93.47 std=0.0631
200
avg=98.22 std=0.0578
250
300
(b)
Daniel Llamocca
[email protected] C esar C arranza
[email protected]
Oslo, Norway, Aug. 29-31, 2012
Marios Pattichis
[email protected]