Image and video Processing and Communication nd ...

4 downloads 1939 Views 977KB Size Report
Image and video Processing and Communication. DYNAMIC MULTIOBJECTIVE OPTIMIZATION MAN. ENERGY-PERFORMANCE-ACCURACY SPACE FOR ...
Image and video Processing and Communications Laboratory

DYNAMIC MULTIOBJECTIVE OPTIMIZATION MANAGEMENT OF THE ENERGY-PERFORMANCE-ACCURACY ACCURACY SPACE FOR SEPARABLE 2-D 2 COMPLEX FILTERS ABSTRACT

OPTIMIZATION FRAMEWORK FOR 2D COMPLEX FILTERS: The optimization framework consists of 3 steps:

imag

-

COEFFICIENTS (imag) SYMMETRY (imag) v E sclr 1D FIR core COEFFICIENTS (imag) SYMMETRY (imag) v E sclr 1D FIR core

E

NO

Yr

NO

Yi

+

COEFFICIENTS (real) SYMMETRY (real) v E sclr 1D FIR core

sclr

v

The 1D complex FIR filter IP processes complex input data and has complex coefficients (VHDL IP core available at www.ivpcl.org)

EPA constraints

Embedded System: The figure below shows an embedded system (implemented in the ML605 Board) that allows for Dynamic Partial Reconfiguration and Dynamic Frequency Control. This embedded system implements the 2D separable complex FIR filter. The 1D complex FIR filter IP is a PLB peripheral. Dynamic Frequency Control: The Dynamic Reconfiguration Port (DRP) of the Multi-Mode Clock Manager (MMCM) inside the Virtex-6 FPGA can adjust the frequency at run-time via the parameter O0. The complex filter is clocked at the adjustable frequency clkfx while the rest is clocked at PLB_clk (100 MHz).

(a)

PLB Bus

(b)

PR_done

...

DRP

Filter n

PLB slave interface

Filter 2

Col n

CF card

Filter 1

Row n

'2n' bitstreams in memory

DPR

Col 2

memory

ICAP core

Row 2

DMA core

Col 1

System ACE

PRR

Row 1

Ethernet MAC

S ICAP port

M

External IBUFGDS ref. clock clock (200 MHz) M clkfx = × clkin D × O0 O0 is controlled at run-time PLB Bus

oFIFO

PLB

complex FIR IP core

MMCM_ADV clkfbin clkfbout clkin1 rst clkout0 BUFG dclk di[15..0] dwe den daddr[6..0] do[15..0] drdy

...

µ Blaze

iFIFO

BUFG

interface

clkfx frequency & PR ctrl PR_done

clkfx

Complex Filter peripheral

FREQ & PR CTRL - PLB SLAVE

FSM

rd wr

rst

FSM

ofull

PR_done

2D Gabor magnitude response

Real-valued image

(c)

IP2Bus_AddrAck

IP2Bus_Wrack

IP2Bus_Rdack

IP2Bus_Data

PLB Interface - User Logic

(b)

(a) 'm' Pareto points PO 1 PO 2 PO 3 PO 4

Row 1 Col 1

f1 EPA1

Row 2 Col 2

f2 EPA2

Row 1 Col 1

f3 EPA3

Row 3 Col 3

f2 EPA4

Pareto front

PO m

Row n Col n

f4 EPAm

bitstream pair 'n' unique pairs -Performance n ≤ m (c)

-Accuracy

EPA values

Filtered image (magnitude)

RESULTS Multiobjective optimization of the EPA space: The Pareto front lied entirely at the 100 MHz frequency. Thus, we carried out optimization for the EPA space at 100 MHz. There are 20 Pareto-optimal points, that requires 20 MB of memory (40 bitstreams). Dynamic EPA Management: Time-varying constraints applied to a video sequence. The circled points meet the constraints. 0.35

0.3

 Energy per frame Accuracy (psnr)



min 50dB

OB=8

NH=10, OB=16 62.11dB, 0.2093mJ



N=7, NH=10, OB=16 51.47dB, 0.1821mJ

psnr(dB)

90



N=29,NH=12,OB=16 90.68dB, 0.2463mJ

80

(a)

N=31

N=15

N=29 N=23

N=11 N=9

N=19

N=7

100

215

frame # 190

frame # 30



0.2

psnr (dB)

100



N=31, NH=16, OB=16 98dB, 0.289mJ



60



0.25mJ 0.25mJ -60dB max max

Freq.: 100 MHz N=15,

0.25

40 clkfx

-Performance

Gabor 31x31

iwren

PLB_clk

-Performance -Accuracy

A complex image is processed through a Gabor separable filter (complex coeffiicients). The table shows: parameter and frequency combinations for the generation of the EPA space. N (# of coefficients), NH (coefficient bit-width), L (LUT size), OB (output bit-width). Ideal filter: 31x31 double precision coefficients. Test image: analytical lena (8-bit, CIF resolution)

Energy per frame(mJ)

iempty

orden

irden

E sclr v

oempty

Yi

full empty

Xi

Yr

32

complex FIR IP B=8, NO=8

Slave Reg 1 oFIFO 512x32 FWFT mode DO DI rden wren

owren

ifull

Xr

32

full empty

Bus2IP_Data iwren

Bus2IP_RdCE(1)

Bus2IP_WrCE(0)

Bus2IP_BE

Bus2IP_Clk

Bus2IP_WrReq

Bus2IP_RdReq

B=8

PRR NO=8

-Accuracy

EXPERIMENTAL SETUP

PLB46 Slave Burst IP Slave Reg 0 iFIFO 512x32 FWFT mode DI DO wren rden

Pareto front

...

1D complex FIR Filter IP

constraints

COEFFICIENTS (real) SYMMETRY (real) v E sclr 1D FIR core

real

freq.

imag

Energy per frame

real

Feasible set: golden points. Circled point: realization from the feasible set that minimizes energy consumption. Pareto optimal point: Hardware realization that becomes active in the FPGA via DPR and/or Dynamic Frequency Control. It is represented by:

Energy per frame

[NO NQ]

OP

L

COEFFICIENTS SYMMETRY (from text file)

Accuracy(Ri ) ≥ 50dB subject to : Performance(Ri ) ≥ 30 fps

Energy per frame

Xi

B

min Energy(Ri ) Ri

PSNR (dB)

B

1) Generate the Energy-Performance-Accuracy space of 2D complex filter realizations, by varing hardware parameters and frequency of operation. 2) Multi-objective Pareto Optimization of the EPA space: The EPA space is represented by a set of hardware realizations along with their EPA values. We find the optimal realizations in the Pareto (multi-objective) sense. 3) Dynamic management based on real-time EPA constraints: Once the Pareto front has been extracted, we can cast optimization problems based on PPA constraints. Example:

...

In the context of an embedded system, a 2D separable complex filter is implemented by cyclically swapping the row filter (1D) with the column filter (1D) via Dynamic Partial Reconfiguration.

Xr

NH

2D SEPARABLE COMPLEX FILTER IMPLEMENTATION AND PLB PERIPHERAL

B

N

We present a dynamic framework for 2D complex filter implementation that is based on a multi-objective optimization scheme that generates Pareto-optimal realizations from the Energy-Performance-Accuracy Accuracy (EPA) space. The EPA space is created by evaluating the 2D complex filter realizations in terms of their required energy, accuracy, and performance. Dynamic EPA management, carried out via Dynamic Partial Reconfiguration (DPR) and Dynamic Frequency Control, then consists on selecting Pareto-optimal realizations that meet timetime varying EPA requirements. We demonstrate dynamic EPA management by applying a complex filter to a standard video sequence.

80

70

205

frame # 120

frame # 280

60 avg=58.78  std=0.3348

210 fps

50

fps

0

50



Frame #

avg=66.56 std=0.4168

100

150

avg=93.47  std=0.0631

200

avg=98.22  std=0.0578

250

300

(b)

Daniel Llamocca [email protected] C esar C arranza [email protected]

Oslo, Norway, Aug. 29-31, 2012

Marios Pattichis

[email protected]