An Overflow-Free Fixed-point Singular Value Decomposition

0 downloads 0 Views 3MB Size Report
SVD for dimensionality reduction of hyperspectral images. .... C. Egho, & T. Vladimirova, “Adaptive hyperspectral image compression using the KLT and integer ...
HyspIRI Symposium-2015 NASA Goddard Spaceflight Center

An Overflow-Free Fixed-point Singular Value Decomposition Algorithm for Dimensionality Reduction of Hyperspectral Images Bibek Kabi, Anand S. Sahadevan, Ramanarayan Mohanty Aurobinda Routray, Bhabani. S. Das, Anmol Mohanty Indian Institute of Technology Kharagpur

Motivation Hyperspectral Algorithms

Floating-point Program

Hardware

Conversion to Fixed-Point

Floating-Point Processor

Uniform Wordlength

Fixed-Point Processor Wordlength Optimization

Fixed- Point FPGA and ASIC Optimized Wordlength

FPGA: Field Programmable Gate Array ASIC: Application Specific Integrated Circuit

Price

Power Consumption

Overflowing

Hyperspectral Image

My Cost

Processed Image

Previous Works on Numerical Linear Algebra Algorithms in Fixed-Point

Szecówka and Malinowski, 2010

Nikolic et al., 2007 *SVD – Singular Value Decomposition

Jerez et al., 2015

Overflow & Cost Diverse Range

Scale : Log

2

Large Range

Hyperspectral Images for Validation

• Hyperion (Space-borne): Hyperion image contains the Chilika Lake site, India. • ROSIS (Air-borne): ROSIS image contains Pavia, University site, Italy.

Overflow Data Hyperion ROSIS

Variables Variables A a σ

Hyperion (IWLs) 19 47 24

IWLs - Integer Wordlengths

ROSIS (IWLs) 20 49 25

PCs PC1 PC2 PC3 PC4 PC5

SQNR 3.73 10.01

MSE Hyperion 2.0628e+05 1.0199e+03 1.5305e+04 262.3714

SQNR - Signal-to-quantization-noise-ratio

ROSIS 6.7682e+04 6.5052e+03 4.6489e+03 1.3140e+03 3.2786e+03

PCs - Principal Components

Hyperion

MSE - Mean Squared Error

ROSIS

Related Articles General trend in previous studies Fixed-point SVD implementation uses the simulation-based approach for estimating the ranges of variables. It can’t produce promising bounds Jerez et al., 2015 applied a scaling procedure to control the bounds on variables of Lanczos algorithm. Similar scaling procedure cannot be used for SVD because the spectrum of the original matrix changes after scaling. This prevents the recovery of the original singular values and singular vectors.

Egho et al., 2014 Fixed-point implementation of SVD algorithm leads to inaccurate computation of singular values and singular vectors due to overflow. They implemented Jacobi algorithm in FPGA with floating-point arithmetic.

Lopez et al., 2013 The diverse range of the elements of the input data matrices for different HSIs limits the use of fixed-point SVD for dimensionality reduction of hyperspectral images.

Simulation-based approach

• Can produce exact bounds for a specific input matrix. The same bounds can’t avoid overflow for other input matrices. • For every new input matrix, simulation has to be run again and again to find the range of the variables which is a tedious process.

• Bounds grow with the increase in the iterations • Can't handle algorithms that are recursive like SVD • Can’t handle range analysis of algorithms with comparator units and if-statements (Sarbishei and Katarzyna, 2012)

• Difficult to derive same bounds for all range of input data • Difficult to derive same bounds for different stages of the algorithm.

Scaling Method Works !! A =U VT If each element of a matrix is divided by the square root of the product of its one-norm and infinity-norm or Frobenius norm then all the variables generated during the computation of SVD will have tight analytical ranges

U VT

A

m

Aˆ  1 2

m = 3.1227E+7

m=

A1 A

Scaling

m = 2.0629E+7

m= A F

PROOF Derivation in brief Proof: Using vector and matrix norm properties, the ranges of the variables can be derived. We start by bounding the elements of the input matrix as

max | Aˆ xy | Aˆ 2  1 xy

U is the left singular vector matrix, which is orthogonal and each column of U has unity norm. Hence all elements of are in the range [-1,1] following (*).

U (:, i)   U (:, i) 2 = 1

(*)

Similar is the case for right singular vectors V.

Improved SQNR and MSE

MSE [ROSIS] Scale: log10 10.00 5.00

4.83

47 bits (Overflow)

3.66

3.81

47 bits

25 bits

3.52

3.11

0.00 -5.00

PC1

PC2

PC3 -6.74

-10.00

-5.35

PC4

-5.19

PC5

-15.00 -20.00

-25.00

-20.00 -20.00

-20.00

-20.00

-20.00

SQNR and MSE (scaling vs without scaling) with double precision floating-point as the reference

-20.00

-5.62

High-level synthesis [HLS] of fixed-point SVD algorithm on Xilinx Virtex 7 XC7VX485 FPGA

Reduced Hardware Cost

Fixed-point code for HLS is implemented using SystemC

Hardware Cost [Hyperion] Without Scaling

47 bits (With Scaling)

25 bits (With Scaling)

30 20 10

14.27

12.72 6.71 6.29

3.63 1.73 1.62

27

5.34 3.88

5

2.86

0.817 0.45 0.41

0

FFs (%)

LUTs (%)

BRAM (%)

DSP48 (%)

Power (W)

Hardware Cost [ROSIS] Without Scaling

47 bits (With Scaling)

25 bits (With Scaling)

30

14.25

20

10

3.64 1.73 1.62

27

12.72 6.79 6.32

5.34 3.88

4.96 2.82

0.78 0.49 0.42

0 FFs (%)

LUTs (%)

BRAM (%)

DSP48 (%)

Power (W)

Percentage reduction in hardware cost after scaling 53%–55% in FF (flip flop) 53%–56% in LUT (look-up-table)

59%–69% in BRAM 82%–89% in DSP48

38%–50% in On-Chip power

Indian Institute of Technology Kharagpur, India Aurobinda Routray

Bibek Kabi

Bhabani Sankar Das

It is exciting to do Remote Sensing

Ramanarayan Mohanty

Anand S Sahadevan

Anmol Mohanty

References S. Lopez, T. Vladimirova, C. Gonzalez, J. Resano, D. Mozos, & A. Plaza, “The Promise of reconfigurable computing for hyperspectral imaging onboard systems: A review and trends,” Proceedings of the IEEE, vol. 101, no. 3, pp. 698-722, 2013. C. Egho, & T. Vladimirova, “Adaptive hyperspectral image compression using the KLT and integer KLT algorithms,” in Proc. 9th NASA/ESA Conf. Adapt. Hardware Syst., 2014, pp. 112-119. Z. Nikoli´c, H. T. Nguyen, G. Frantz, “Design and Implementation of Numerical Linear Algebra Algorithms on Fixed Point DSPs,” EURASIP J. Adv. Signal Process., pp. 1-22, 2007, article 87046. P. M. Szecowka and P. Malinowski, “CORDIC and SVD Implementation in Digital Hardware,” in Proc. Int. Conf. Mixed Design of Integrated Circuits and Systems (MIXDES), 2010, pp. 237–242. J. L. Jerez, G. A. Constantinides and E. C. Kerrigan, “A Low Complexity Scaling Method for the Lanczos Kernel in Fixed-Point Arithmetic,” IEEE Trans. on Computers, vol. 64, no. 2, pp. 303-315, 2015.

O. Sarbishei and R. Katarzyna, “Verification of fixed-point datapaths with comparator units using Constrained Arithmetic Transform (CAT),” in Proc. IEEE ISCAS, 2012, pp. 592-595.

Backup slides

SQNR and MSE

NIL

SQNR and MSE (scaling vs without scaling) with double precision floating-point as the reference

Va l i d a t e d Without Scaling (Overflow , 47 bits )

SQNR and MSE in fixed-point arithmetic (scaling vs without scaling) with double precision floating-point result as the reference. Without Scaling (Overflow, 47 bits ) MSE PC1 PC2 PC3 PC4 PC5

Hyperion 2.0e+05 1.0e+03 1.5e+04 262.3714 NIL

SQNR

47 bit

Hyperion

3.73

ROSIS

10.01

With Scaling (Overflow-free)

ROSIS 6.7e+04 6.5e+03 4.6e+03 1.3e+03 3.3e+03

SQNR

47 bit

40 bit

35 bit

32 bit

25 bit

Hyperion

176.76

132.15

106.44

84.45

78.03

ROSIS

180.13

135.65

134.79 120.60

74.96

With Scaling (Overflow-free) MSE 25 32 35 40 47

bit bit bit bit bit

PC1 8.1e-7 4e-9 0 0 0

Hyperion PC2 PC3 4.9e-7 3.7e-6 1.3e-7 2.1e-6 0 8.5e-7 0 9.5e-11 0 0

PC4 4.3e-6 1.8e-6 1.9e-7 2.2e-9 0

PC1 0 0 0 0 0

PC2 1.8e-7 4.6e-7 0 0 0

ROSIS PC3 4.5e-6 3e-6 0 0 0

PC4 6.5e-6 5.8e-6 0 0 0

PC5 2.4e-6 2.3e-6 0 0 0

Reduced Cost & On-chip Power Consumption

With Scaling Utilization (%)

Without Scaling COST FF LUTs BRAM DSP48 On-Chip Power Power

COST

Utilization (%) Hyperion ROSIS 3.63 3.64 14.27 14.25 12.72 12.72 27.00 27.00 Consumption (W) Hyperion ROSIS 0.817 0.783

FF LUT BRAM DSP48 On-Chip Power Power

25 bit 1.62 6.29 3.88 2.86

Hyperion 32 bit 35 bit 40 bit 47 bit 1.63 6.41 4.17 3.29

1.67 6.51 4.66 5

1.71 6.56 5.15 5

1.73 6.71 5.34 5

25 bit 1.62 6.32 3.88 2.82

ROSIS 32 bit 35 bit 40 bit 47 bit 1.63 6.48 4.17 3.25

1.67 6.53 4.66 4.96

1.71 6.62 5.15 4.96

1.73 6.79 5.34 4.96

0.44

0.46

0.49

Consumption (W) 0.41

0.43

0.43

0.45

0.45

0.42

0.44

With Scaling

COST FF LUT BRAM DSP48 Power

Reduction (%)

Hyperion 25 bit 55.37 55.92 69.49 89.4 49.44

32 bit 55.09 55.08 67.21 87.81 47.73

35 bit 53.99 54.37 63.36 81.48 47.36

40 bit 52.89 54.02 59.51 81.48 44.79

47 bit 52.34 52.97 58.01 81.48 44.55

25 bit 55.49 55.64 69.49 89.55 46.36

ROSIS 32 bit 55.21 54.52 67.21 87.96 43.67

35 bit 54.12 54.17 63.36 81.62 42.91

40 bit 53.02 53.54 59.51 81.62 41.63

47 bit 52.47 52.35 58.01 81.62 37.42

One-sided Jacobi SVD algorithm (Hestenes)

Suggest Documents