
Application-Specific Logic-in-Memory for Polar Format Synthetic Aperture Radar
Qiuling Zhu, Eric L. Turner, Christian R. Berger, Larry Pileggi, Franz Franchetti
September 22, 2011

Application-Specific Logic-in-Memory
Can we push some memory-intensive computational logic into or close to the memory by constructing a smart and efficient “logic-in-memory” block?

[Figure: the traditional hierarchy (Main Memory, Local Memory, CPU) compared with the logic-in-memory hierarchy, which has the same levels plus added logic blocks.]

Slide 2

Enabling Technology: Regular Patterns
D. Morris et al., “Design of Embedded Memory and Logic Based On Pattern Constructs,” Symp. VLSI Technology, June 2011.

Implementing sub-22nm designs using a limited set of pattern constructs can enable robust compilation of smart memories

[Figure: regular patterns; an SRAM bitcell and compatible logic cells combined into an application-specific “magic” memory.]

Slide 3


Tool Chain: Chip Generator and Memory Compiler

[Figure: tool chain with a Chip Generator and a Smart Memory Compiler; the compiler combines SRAM bitcells and compatible logic cells into app-specific logic-in-memory that serves as the logic-enhanced Local Memory.]

Chip Generator
▪ Generates designs from high-level parameterization and specification
▪ Utilizes Stanford’s chip generator platform (Genesis 2)

Smart Memory Compiler
▪ Maps memory and logic onto a set of pre-characterized pattern constructs
▪ Allows flexible synthesis of logic and memory functionalities in place of hard IP

Slide 6

Big Question: Impact on Algorithms
Logic-in-memory changes the relative cost of operations, requiring new types of algorithms.

Traditional
▪ Data storage and processing are logically and physically split
▪ Algorithms are optimized w.r.t. cost measures such as operation count, minimum number of memory accesses, reuse, …
  e.g. FFT: O(n log n), matrix multiplication: O(n^3)

Logic-in-memory
▪ Local data dependency
▪ Regular memory access pattern
▪ Simple computational logic
▪ Cost measure changes

Slide 7

Case Study: Interpolation Memory
▪ Ex 1: FFT Twiddle Factor
▪ Ex 2: Image Pyramid Memory
▪ Ex 3: Geometry Transformation
▪ Ex 4: Tomography Backprojection

[Figure: one panel per example; recoverable labels include ALU blocks, pyramid levels k, k-1, k-2, and the original phantom image.]

Slide 8

Outline
▪ SAR Polar Format Algorithms for Logic-in-Memory
▪ Extension: Partial Reconstruction
▪ Implementation and Design Automation
▪ Experimental Results
▪ Summary

Slide 9


Synthetic Aperture Radar (SAR)
▪ Data acquisition
▪ Image formation: Interpolation, then 2D FFT

[Figure: SAR data acquisition and the SAR image formation pipeline (Interpolation, 2D FFT).]

Slide 11


FFT Upsampling Based Polar Reformatting

[Figure: grids with dimensions m1 and n2 before and after Grid Interpolation, followed by the Inverse 2D FFT.]

SAR image formation:
▪ Range interpolation
  • FFT upsampling based
▪ Cross range interpolation
▪ 2D inverse FFT

Computational cost:
  Interpolation: $10 \, l \, m_1 \, (m \log_2 m + n \log_2 n)$
  2D IFFT: $10 \, n_2^2 \log_2 n_2$
where l is the number of segments per range line, m is the input segment size, and n is the size of the upsampled output segment.

Data transferring cost:
[Figure: data transfer between Memory and CPU for Interpolation.]

Logic-in-Memory Interpolation
  • Needs new algorithm

Slide 14
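The two cost expressions on this slide can be evaluated directly. The sketch below is only an illustration: the function names and the example sizes are assumptions, not values from the talk.

```python
import math

def interpolation_ops(l, m1, m, n):
    """Slide formula: 10 * l * m1 * (m*log2(m) + n*log2(n))."""
    return 10 * l * m1 * (m * math.log2(m) + n * math.log2(n))

def ifft2d_ops(n2):
    """Slide formula: 10 * n2^2 * log2(n2)."""
    return 10 * n2 ** 2 * math.log2(n2)

# Hypothetical problem sizes, chosen only to show how the terms compare.
l, m1, m, n, n2 = 8, 512, 64, 128, 1024
print(f"interpolation: {interpolation_ops(l, m1, m, n):.3e} operations")
print(f"2D IFFT:       {ifft2d_ops(n2):.3e} operations")
```

Both terms count arithmetic only; the data transferring cost between memory and CPU is a separate quantity that this sketch does not model.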

Local Interpolation Based Polar Reformatting
Approach: direct local interpolation

[Figure: output point P(x,y) among grid points in the curvilinear grid (measurements) and grid points in Cartesian space (outputs).]

▪ Finding neighbors is expensive
▪ sqrt, atan operations are expensive in logic-in-memory

Slide 15

Local Interpolation Based Polar Reformatting

[Figure: output point P(x,y) with offsets dx and dy between grid points in the curvilinear grid (measurements) and grid points in Cartesian space (outputs); simple operations (+, -, ×, …) contrasted with sqrt, atan, ….]

Steps:
▪ Coordinate transformation
  • Four-corner image perspective geometric transformation
  • Avoid sqrt and atan
▪ 2D surface interpolation
  • Simple logic computation
  • bilinear, bicubic, …

Slide 16
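The coordinate transformation step can be sketched as follows, assuming that the "four-corner perspective transformation" is a standard projective mapping fitted to the four corners of an output tile. The function names and the corner coordinates are hypothetical. Once the matrix is fitted, mapping a point needs only multiplications, additions, and a division, with no sqrt or atan.

```python
import numpy as np

def fit_perspective(src, dst):
    """Fit a 3x3 perspective matrix (bottom-right entry fixed to 1)
    that maps the four src corners onto the four dst corners."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = np.linalg.solve(np.array(A, dtype=float), np.array(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

def apply_perspective(H, x, y):
    """Map (x, y) through H; works on scalars or NumPy arrays."""
    den = H[2, 0] * x + H[2, 1] * y + H[2, 2]
    u = (H[0, 0] * x + H[0, 1] * y + H[0, 2]) / den
    v = (H[1, 0] * x + H[1, 1] * y + H[1, 2]) / den
    return u, v

# Hypothetical example: corners of a Cartesian output tile and the
# (fractional) measurement-grid coordinates they correspond to.
tile_corners = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
grid_corners = [(0.0, 0.0), (10.0, 1.0), (11.0, 9.0), (-1.0, 8.0)]
H = fit_perspective(tile_corners, grid_corners)
print(apply_perspective(H, 0.5, 0.5))
```

The fractional parts of the mapped coordinates play the role of the offsets dx and dy used by the 2D interpolation step.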

2D Interpolation

[Figure: neighbor stencils around the output point P(x,y) with offsets dx and dy. Nearest Neighbor uses the grid point (i, j); Bilinear Interpolation uses the 2 × 2 neighbors (i, j) to (i+1, j+1); Bicubic Interpolation uses the 4 × 4 neighbors (i-1, j-1) to (i+2, j+2).]

Dividable (separable) 2D interpolation
  • Bilinear: (2 horizontal + 1 vertical) 1D interpolations
  • Bicubic: (4 horizontal + 1 vertical) 1D interpolations
  • 1D interpolation: polynomial interpolation in Newton divided-difference form

Suitable for Logic in Memory
  • Localized computation: outputs are determined only by their neighbors
  • Regular memory access: contiguous or block data array access
  • Simple computational logic: adders, subtractors, Boolean operations, …

Slide 17
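The decomposition into 1D interpolations can be sketched as below. This is an illustrative software model, not the hardware design from the talk: the function names are assumptions, samples are treated as complex values (as SAR data usually are), and the point is assumed to lie far enough from the array border that all neighbors exist.

```python
import numpy as np

def newton_interp(xs, ys, x):
    """1D polynomial interpolation in Newton divided-difference form."""
    xs = np.asarray(xs, dtype=float)
    coef = np.array(ys, dtype=complex)
    n = len(xs)
    # Build the divided-difference coefficients in place.
    for j in range(1, n):
        coef[j:] = (coef[j:] - coef[j - 1:-1]) / (xs[j:] - xs[:-j])
    # Evaluate the Newton form with a Horner-like recurrence.
    result = coef[-1]
    for k in range(n - 2, -1, -1):
        result = result * (x - xs[k]) + coef[k]
    return result

def interp2d_separable(grid, x, y, order=1):
    """order=1: bilinear (2 horizontal + 1 vertical 1D interpolations).
    order=3: bicubic-style (4 horizontal + 1 vertical).
    x is the column coordinate and y the row coordinate."""
    npts = order + 1
    i0 = int(np.floor(y)) - (npts // 2 - 1)
    j0 = int(np.floor(x)) - (npts // 2 - 1)
    rows = np.arange(i0, i0 + npts)
    cols = np.arange(j0, j0 + npts)
    horizontal = [newton_interp(cols, grid[i, cols], x) for i in rows]
    return newton_interp(rows, horizontal, y)

# Usage on a synthetic 8 x 8 grid.
g = np.fromfunction(lambda i, j: 2 * i + 3 * j + 1, (8, 8))
print(interp2d_separable(g, x=2.25, y=3.5, order=1))
```

Each output depends only on a 2 × 2 or 4 × 4 block of neighbors, which is the locality property the slide lists as suitable for logic-in-memory.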

Tiling: Accurate Geometry Approximation

Geometry approximation conditions:
▪ deltawidth is small enough
▪ RL is large enough

Solution: Image tiling
▪ Tile in the Cartesian grid
▪ Output oriented tiling
▪ Easy to identify boundary and tile overlap

[Figure: Cartesian output grid split into Tile1 to Tile4, with labels RL, deltawidth, K, and the approximation error.]

Slide 18
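The effect of tiling on the geometry approximation can be illustrated with a small numerical experiment. Everything here is an assumption made for illustration: the patch location, the use of a per-tile four-corner perspective fit, and the use of exact range and angle (computed with sqrt and atan) as the reference. The sketch fits one perspective mapping per output tile and reports the worst deviation from the exact mapping.

```python
import numpy as np

def fit_perspective(src, dst):
    """3x3 perspective matrix (bottom-right entry 1) from 4 corner pairs."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = np.linalg.solve(np.array(A, dtype=float), np.array(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

def apply_perspective(H, x, y):
    den = H[2, 0] * x + H[2, 1] * y + H[2, 2]
    return ((H[0, 0] * x + H[0, 1] * y + H[0, 2]) / den,
            (H[1, 0] * x + H[1, 1] * y + H[1, 2]) / den)

def exact_polar(x, y):
    """Reference mapping (range, angle); used only to measure the error."""
    return np.hypot(x, y), np.arctan2(x, y)

def max_tile_error(x0, x1, y0, y1, tiles_per_side, samples=9):
    """Worst-case error of per-tile perspective fits over the patch."""
    xe = np.linspace(x0, x1, tiles_per_side + 1)
    ye = np.linspace(y0, y1, tiles_per_side + 1)
    worst = 0.0
    for a in range(tiles_per_side):
        for b in range(tiles_per_side):
            src = [(xe[a], ye[b]), (xe[a + 1], ye[b]),
                   (xe[a + 1], ye[b + 1]), (xe[a], ye[b + 1])]
            dst = [exact_polar(x, y) for x, y in src]
            H = fit_perspective(src, dst)
            gx, gy = np.meshgrid(np.linspace(xe[a], xe[a + 1], samples),
                                 np.linspace(ye[b], ye[b + 1], samples))
            r, th = exact_polar(gx, gy)
            u, v = apply_perspective(H, gx, gy)
            worst = max(worst, float(np.max(np.hypot(u - r, v - th))))
    return worst

# Hypothetical patch, away from the origin of the polar geometry.
for t in (1, 2, 4):
    print(t * t, "tile(s):", max_tile_error(0.0, 0.2, 1.0, 1.2, t))
```

Because tiles are defined on the output grid, each tile's boundary is known in advance, which matches the slide's point that output-oriented tiling makes boundaries easy to identify.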


SAR Partial Reconstruction
▪ Scenario: big image, small screen, pan-and-zoom (e.g. handheld device)
▪ Bad approach: reconstruct everything, display only the region of interest
▪ Better: reconstruct only what will be displayed; this requires sophisticated filtering before reconstruction

[Figure: image data of 10,000 × 10,000 versus a display of 800 × 600; partial image formation pipeline: Interpolation + Filtering, then 2D FFT.]

Slide 20

Partial Reconstruction I
Reconstructs and displays a low-resolution full-size image
  • Traditional: interpolate all, full-size large IFFT, then decimation
  • Alternative: partial interpolation, then smaller-size IFFT
  • Theory behind: multiplication in the frequency domain is equivalent to convolution in the spatial domain

▪ Cutting off high frequencies in Fourier space is low-pass filtering in the spatial domain
▪ Smaller-size interpolation, then smaller-size IFFT
▪ Only computes the pixels that are required!

Slide 21
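The equivalence behind this slide can be checked numerically. The sketch below assumes a square N × N spectrum stored in NumPy FFT order with N and K even; the function name is hypothetical. It keeps only the central K × K low frequencies and applies a K × K IFFT, which for a band-limited spectrum reproduces the decimated full-size image.

```python
import numpy as np

def lowres_from_spectrum(X, K):
    """Keep the central K x K low frequencies of an N x N spectrum and
    reconstruct a K x K image with a small IFFT."""
    N = X.shape[0]
    lo, hi = N // 2 - K // 2, N // 2 + K // 2
    Xc = np.fft.fftshift(X)[lo:hi, lo:hi]      # cut off high frequencies
    # The (K/N)^2 factor matches the scaling of the full-size IFFT.
    return np.fft.ifft2(np.fft.ifftshift(Xc)) * (K / N) ** 2

# Synthetic band-limited spectrum: nonzero only in the central K x K block.
N, K = 64, 16
rng = np.random.default_rng(0)
Xs = np.zeros((N, N), dtype=complex)
lo, hi = N // 2 - K // 2, N // 2 + K // 2
Xs[lo:hi, lo:hi] = rng.normal(size=(K, K)) + 1j * rng.normal(size=(K, K))
X = np.fft.ifftshift(Xs)

full = np.fft.ifft2(X)                  # traditional: full-size IFFT
small = lowres_from_spectrum(X, K)      # alternative: smaller IFFT
print(np.allclose(small, full[::N // K, ::N // K]))
```

For a spectrum that is not band-limited, the cropping acts as the low-pass filter, so the small image equals the decimated low-pass-filtered image rather than the decimated original.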

Partial Reconstruction II
Reconstructs and displays a high-resolution image portion
  • Traditional: full-size large IFFT, reconstruct all, then cut off the unnecessary region
  • Alternative: decimation filtering, then smaller-size IFFT
  • Theory behind: multiplication in the spatial domain is equivalent to convolution in the Fourier domain; a displacement in one domain is equivalent to a phase shift in the other

[Figure: region-of-interest (ROI) reconstruction pipeline with stages FFT, sample, interpolate, decimation filter, and smaller IFFT; includes a Logic in Memory block.]

Slide 22
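A one-dimensional sketch of this idea follows. It is a simplified model under stated assumptions: the function name is hypothetical, the ROI length K divides the signal length N, and an ideal rectangular window stands in for the practical decimation filter. A phase ramp moves the ROI to the origin, the filter is applied as a convolution in the Fourier domain and evaluated only at every (N/K)-th bin, and a length-K IFFT returns the ROI.

```python
import numpy as np

def roi_via_small_ifft(X, start, K):
    """Recover x[start:start+K] from the length-N spectrum X = fft(x)
    using a phase shift, a Fourier-domain filter, and a length-K IFFT."""
    N = X.size
    D = N // K                                   # decimation factor
    k = np.arange(N)
    # 1) Displacement as phase shift: moves the ROI to samples 0..K-1.
    Xs = X * np.exp(2j * np.pi * k * start / N)
    # 2) Ideal decimation filter: spectrum of a length-K rectangular window.
    W = np.fft.fft(np.r_[np.ones(K), np.zeros(N - K)])
    # 3) Circular convolution in the Fourier domain, computed only at the
    #    K retained bins (filtering and decimation in one step).
    kd = np.arange(0, N, D)
    Y = (W[(kd[:, None] - k[None, :]) % N] @ Xs) / N
    # 4) Smaller IFFT.
    return np.fft.ifft(Y)

rng = np.random.default_rng(1)
x = rng.normal(size=64) + 1j * rng.normal(size=64)
roi = roi_via_small_ifft(np.fft.fft(x), start=20, K=16)
print(np.allclose(roi, x[20:36]))
```

Without step 2, decimating the spectrum would fold all N/K segments of the signal on top of each other; a practical filter with a finite transition region suppresses the other segments only approximately.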

Decimation Filter Implementation
▪ FIR polyphase filter is expensive at high decimation factors
▪ Cascaded Integrator-Comb (CIC) filter is more economical
  • Large decimation factors
  • No multiplication
  • CIC compensation is required

[Figure: CIC filter structure (M = 1, N = 4): input inp, four integrator stages (z^-1), rate change R, four comb stages (z^-M), output outp.]

[Figure: Frequency Response: Magnitude Response (dB) vs. Frequency (Hz) for the CIC filter, the CIC compensator (ciccomp), and their cascade.]

CIC Spec: decimation factor = 16; N = 4; M = 1
CIC Comp Spec: Fp = 0.45; Fst = 0.55; Ap = 0.1 dB; Ast = 35 dB; 45 stages; downsample = 2; total decimation factor = 32

Slide 23
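A behavioral model of the CIC structure is short enough to sketch. This is a generic textbook-style CIC decimator written for illustration, not the implementation from the talk; the function name is an assumption, and the compensation filter is omitted. It uses only integer additions and subtractions, which is the "no multiplication" property listed above.

```python
import numpy as np

def cic_decimate(x, R, N=4, M=1):
    """N integrator stages, decimation by R, then N comb stages with
    differential delay M. Integer arithmetic throughout."""
    acc = np.asarray(x, dtype=np.int64)
    for _ in range(N):                 # integrators: y[n] = y[n-1] + x[n]
        acc = np.cumsum(acc)
    y = acc[::R]                       # rate change
    for _ in range(N):                 # combs: y[n] = v[n] - v[n-M]
        delayed = np.concatenate((np.zeros(M, dtype=np.int64), y[:-M]))
        y = y - delayed
    return y

# Constant input with the slide's CIC spec (R = 16, N = 4, M = 1).
out = cic_decimate(np.ones(160, dtype=np.int64), R=16, N=4, M=1)
print(out)
```

For a constant input the output settles to the DC gain (R·M)^N, which is 65536 for these parameters; a hardware implementation sizes its registers to hold that growth.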


Design Automation and Optimization

[Figure: Hardware Structure.]

Design Automation Flow:
[Figure: flow with blocks Customized Parameters, Code Generator, Design Space Exploration, RTL Design (memory/logic mixed), Smart Memory Compiler, Regular Pattern, Target + Budget, Performance Model, and Performance/Cost Report.]

Slide 25

Chip Generator
http://genesis.web.ece.cmu.edu/gui/scratch/mydesign-10545.php

Reference: O. Shacham, O. Azizi, M. Wachs, et al., “Rethinking Digital Design: Why Design Must Change,” IEEE Micro, Dec. 2010.

Slide 26


Reconstruction Quality vs. FFT SAR

Perfect reconstruction of point targets
[Figure: original image and hermitian image.]

Actual reconstruction algorithms
[Figure: FFT-based, linear, and cubic reconstructions.]

Is FFT-based SAR better than interpolation-based SAR?

Slide 28

Can FFT and Interpolation Be Distinguished?

[Figure: reconstructions using nearest neighbor interpolation, FFT interpolation, bilinear interpolation, and bicubic interpolation.]

Answer: Hypothesis Testing
  Hypothesis testing for linear and FFT: P(Error) = 0.495
  Random guessing: P(Error) = 0.5

Results are statistically indistinguishable. Interpolation is as good as FFT.

Slide 29

Accuracy Improvement Through Tiling

[Figure: bar chart "Mean Square Error relative to Gold Standard Method": mean square error (axis from 0 to 0.02) vs. interpolation method (Nearest Neighbor, Bilinear, Bicubic) for different tile numbers (one tile, 4 tiles, 16 tiles).]

MSE decreases with more tiling and higher interpolation order

Slide 30

Energy Saving for Logic-in-Memory

[Figure: "Energy Saving for SAR PFA Grid Interpolation": energy (nJ, log scale from 1.00E+00 to 1.00E+12) vs. SAR image size (32×32, 64×64, 128×128, 256×256, 512×512) for CPU_centric and Logic_in_Memory.]

Energy saving increases with problem size

Slide 31

Accurate Region-of-Interest by Sacrificing Border

[Figure: "Decimation Filter Hardware Cost with ROI Factors": area (1000 µm²) vs. region of interest (ROI, 0.5 to 1) at decimation factor = 2, with curves for ast = 15 dB, 20 dB, 25 dB, 30 dB, and 35 dB, plus an "error" annotation.]

ast: decimation filter stopband attenuation (dB)

Imperfect image edges result from the non-steep filter transition region

Slide 32

Partial Reconstruction: Operation Saving vs. Cost

[Figure: "2D IFFT Computational Cost vs Decimation Factor": operation count (log scale, 1.00E+04 to 1.00E+10) vs. decimation factor (0 to 140), SAR image size = 4K×4K.]

[Figure: "Logic in Memory Hardware Cost": logic area / memory area (0 to 4.00E-04) vs. decimation factor (0 to 140), with curves Grid Interpolation + Decimation Filter (Beta=0.3, Ast=25dB), Grid Interpolation + Decimation Filter (Beta=0.3, Ast=35dB), Grid Interpolation + Decimation Filter (Beta=0.2, Ast=35dB), and Grid Interpolation.]

Beta: filter rolloff factor; Ast: decimation filter stopband attenuation (dB)

▪ IFFT operation count decreases exponentially with increasing decimation factor
▪ Logic hardware cost is negligible compared with memory cost
▪ Decimation filter cost increases slightly with increasing decimation factor

Slide 33


Summary
▪ Logic in Memory and its applications for interpolation
▪ Logic in Memory for SAR PFA and partial reconstruction
▪ Evaluation and integration with Genesis2

[Figure: recap of earlier figures: Local Memory block; image tiling (Tile1 to Tile4); CIC filter structure with its Magnitude Response (dB) vs. Frequency (Hz); and "Decimation Filter Hardware Cost": area (1000 µm²) vs. decimation factor (0 to 140) for Beta=0.3, Ast=25dB; Beta=0.3, Ast=35dB; Beta=0.2, Ast=35dB; and Polar-to-Rect_Interpolation.]

Slide 35
