Lecture 5 Fixed Point vs Floating Point Q-Format number ...

Objectives:

– Understand fixed point representations
– Understand scaling, overflow and rounding in fixed point
– Understand Q-format
– Understand TMS320C67xx floating point representations
– Understand relationship between the two in C6x architecture

Reference: "What Every Computer Scientist Should Know About Floating-Point Arithmetic" by David Goldberg, ACM Computing Surveys 23, 5 (March 1991).

Q-Format number representation

An N-bit fixed point, 2's complement number is given by:

$x = -b_{N-1} 2^{N-1} + b_{N-2} 2^{N-2} + \cdots + b_1 2^{1} + b_0 2^{0}$

[Figure: N-bit word with bit N-1 (sign bit S) on the left and bit 0 on the right; the imaginary binary point lies to the right of bit 0.]

– Difficult to work with due to possible overflow and scaling problems
– Often normalise number to some fractional representation (e.g. between ±1):

$x' = -b_{N-1} 2^{0} + b_{N-2} 2^{-1} + \cdots + b_1 2^{-(N-2)} + b_0 2^{-(N-1)}$

[Figure: the same N-bit word; the imaginary binary point lies immediately to the right of the sign bit S.]
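As a minimal sketch of the fractional interpretation for N = 16, the C helpers below convert between a real value and a 16-bit two's complement word whose binary point sits just after the sign bit. The names `frac16_to_double` and `double_to_frac16` are illustrative assumptions, not part of the lecture, and the clipping and round-half-away-from-zero behaviour is one possible design choice.

```c
#include <stdint.h>

/* Interpret a 16-bit two's complement word as a fraction in [-1, 1):
   x' = x / 2^15, i.e. the normalised sum with N = 16. */
double frac16_to_double(int16_t q)
{
    return (double)q / 32768.0;
}

/* Convert a real value to the nearest 16-bit fractional word, clipping
   to the representable range [-1, 1 - 2^-15] (an assumed policy). */
int16_t double_to_frac16(double x)
{
    double scaled = x * 32768.0;
    if (scaled >= 32767.0)  return INT16_MAX;
    if (scaled <= -32768.0) return INT16_MIN;
    /* Round half away from zero before truncating to an integer. */
    return (int16_t)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
}
```

Note that +1.0 itself is not representable: the largest value is 1 - 2^-15, which is why the conversion clips rather than wraps.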


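Q-format notation

– Q-format representation: if N=16, 15 bit fractional representation ⇒ Q15 format
– Rule:
  – Qm + Qm ⇒ Qm
  – Qm x Qn ⇒ Q(m+n)
– Assume 16-bit data format, Q15 x Q15 ⇒ Q30

[Figure: two 16-bit Q15 operands (sign bit S in bit 15, fraction in bits 14 to 0) are multiplied to give a 32-bit Q30 product (sign bits S S in bits 31 and 30, fraction in bits 29 to 0).]

How to store Q30 number to 16-bit memory?

– Storing Q30 number to 16-bit memory requires rounding or truncation.
– Rounding: add a '1' at the position of the rounding bit r, the most significant of the 15 bits to be discarded (i.e. add 4000h = 100 0000 0000 0000b), then drop the bottom 15 bits:
  – if r = 0, round down
  – if r = 1, round up

[Figure: 32-bit Q30 word with sign bits S S at the top and the rounding bit r just below the 16 bits that are kept; a '1' followed by 14 zeros is added at the position of r.]

    MPY   A3,A4,A6     ; A3 x A4 → A6
    NOP                ; Delay slot
    ADDK  4000h,A6     ; rounding add
    SHR   A6,15,A6     ; truncate bottom 15 bits
    STH   A6,*A7       ; A6 → mem[A7]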
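The multiply, round and store sequence can be mirrored in C. The sketch below is an illustration only: the function name `q15_mul_round` is an assumption, and it relies on an arithmetic right shift for negative values, which is implementation-defined in C although usual on common compilers.

```c
#include <stdint.h>

/* Multiply two Q15 values and return a rounded Q15 result.
   Q15 x Q15 gives a Q30 product in 32 bits; adding 0x4000 rounds at the
   top discarded bit, and shifting right by 15 drops the low 15 bits. */
int16_t q15_mul_round(int16_t a, int16_t b)
{
    int32_t q30 = (int32_t)a * (int32_t)b;  /* corresponds to MPY        */
    q30 += 0x4000;                          /* corresponds to ADDK 4000h */
    /* Assumes arithmetic right shift for negative q30. */
    return (int16_t)(q30 >> 15);            /* corresponds to SHR, STH   */
}
```

With Q15 inputs 16384 (0.5) and 16384 (0.5) the routine returns 8192 (0.25). The single case -1 x -1 does not fit in Q15; this sketch, like a plain 16-bit store, does not guard against it.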

Avoid overflow with SADD

– SADD - saturation add instruction
– Always clip to max (or min) possible
– Set bit 9 of the CSR register to indicate saturation has occurred

Safe add routine in C to avoid overflow
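One way to write a safe add in C that clips like SADD is sketched below. This is an illustrative assumption rather than the lecture's own listing: the name `sat_add32` and the `saturated` output flag are hypothetical, the flag only loosely standing in for the saturation bit in the CSR.

```c
#include <stdint.h>

/* Add two 32-bit signed values, clipping to INT32_MAX or INT32_MIN
   instead of wrapping around. *saturated is set to 1 when clipping
   occurred and to 0 otherwise. */
int32_t sat_add32(int32_t a, int32_t b, int *saturated)
{
    int64_t sum = (int64_t)a + (int64_t)b;  /* cannot overflow in 64 bits */
    *saturated = 0;
    if (sum > INT32_MAX) { *saturated = 1; return INT32_MAX; }
    if (sum < INT32_MIN) { *saturated = 1; return INT32_MIN; }
    return (int32_t)sum;
}
```

Doing the sum in a wider type avoids signed overflow, which is undefined behaviour in C, at the cost of a 64-bit addition.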

Single Precision Floating Point number

– Easy (and lazy) way of dealing with scaling problem
– 32-bit single precision floating point:

  bit 31: S | bits 30 to 23: 8-bit exp | bits 22 to 0: 23-bit frac

  $x = (-1)^{s} \times 2^{exp-127} \times 1.frac$

  $1.175 \times 10^{-38} < |x| < 3.4 \times 10^{38}$

– MSB is sign-bit (same as fixed point)
– 8-bit exponent in bias-127 integer format (i.e., add 127 to it)
– 23-bit to represent only the fractional part of the mantissa. The MSB of the mantissa is ALWAYS '1', therefore it is not stored

Double Precision Floating Point number

– 64-bit double precision floating point:

  Odd register (e.g. A5): bit 31: S | bits 30 to 20: 11-bit exp | bits 19 to 0: upper 20 bits of the 52-bit frac
  Even register (e.g. A4): bits 31 to 0: lower 32 bits of the 52-bit frac

  $x = (-1)^{s} \times 2^{exp-1023} \times 1.frac$

  $2.2 \times 10^{-308} < |x| < 1.7 \times 10^{308}$

– MSB is sign-bit (same as fixed point)
– 11-bit exponent in bias-1023 integer format (i.e., add 1023 to it)
– 52-bit to represent only the fractional part of the mantissa. The MSB of the mantissa is ALWAYS '1', therefore it is not stored
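The single precision field layout can be inspected from C. The following is a minimal sketch, assuming the host `float` is an IEEE 754 32-bit type; the names `sp_fields`, `sp_bits` and `sp_unpack` are illustrative and not taken from the lecture.

```c
#include <stdint.h>
#include <string.h>

/* Fields of a 32-bit single precision number. */
typedef struct {
    uint32_t sign;  /* 1 bit */
    uint32_t exp;   /* 8 bits, stored with a bias of 127 */
    uint32_t frac;  /* 23 bits, the hidden leading 1 is not stored */
} sp_fields;

/* Return the raw 32-bit pattern of a float (assumes IEEE 754 binary32). */
uint32_t sp_bits(float x)
{
    uint32_t u;
    memcpy(&u, &x, sizeof u);  /* well-defined way to reinterpret the bits */
    return u;
}

/* Split a float into its sign, biased exponent and fraction fields. */
sp_fields sp_unpack(float x)
{
    uint32_t u = sp_bits(x);
    sp_fields f;
    f.sign = u >> 31;
    f.exp  = (u >> 23) & 0xFFu;
    f.frac = u & 0x7FFFFFu;
    return f;
}
```

For instance, `sp_bits(5.75f)` yields 0x40B80000, which unpacks to sign 0, exp 129 (that is, 127 + 2) and frac 0x380000.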

Examples

– Convert 5.75 to SP FP
  – 5.75 to binary: +1.01110000... x $2^{2}$
  – exponent in bias-127 is 127+2 = 129 = 1000 0001b
  – The fractional part is .01110000... after we drop the hidden '1' bit.
  – Answer: 0 10000001 0111000 00...00 = 40B80000 (hex)
– Convert 0.1 to DP FP
  – 0.1 to binary: 1.10011001(1001 repeats) x $2^{-4}$
  – exponent in bias-1023 is 1023-4 = 1019 = 011 1111 1011b
  – The fractional part is .10011001...1010 after we drop the hidden '1' bit and rounding
  – Answer: 0 01111111011 1001100 ...1001 1010 = 3FB9 9999 9999 999A (hex).

Problems of Q-format

– Wrong Q-format representation will give totally wrong results
– Even correct use of Q-format notation may reduce precision
– For this example, Q12 result is totally wrong, and Q8 result is imprecise:

        0111. 1000 0000 1000                        Q12 → 7.50195
      * 0111. 0100 0000 0000                        Q12 → 7.25
      = 0011 0110. 0110 0011 1010 0000 0000 0000    Q24 → 54.38916

  – Keeping 0110. 0110 0011 1010 as Q12 → 6.38916 (the upper integer bits are lost)
  – Keeping 0011 0110. 0110 0011 as Q8 → 54.38672 (only 8 fraction bits remain)
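The Q12 example can be reproduced with a few lines of C. This is a hedged sketch: the helper names `q12_mul_q24`, `q24_to_q12_bits` and `q24_to_q8_bits` are assumptions introduced for illustration, and the conversions simply truncate.

```c
#include <stdint.h>

/* Full-precision product of two Q12 values held in 16 bits: a Q24 number. */
int32_t q12_mul_q24(int16_t a, int16_t b)
{
    return (int32_t)a * (int32_t)b;
}

/* Keep 16 bits of a Q24 product as Q12: shift right by 12 and keep the
   low 16 bits. Only 4 integer bits survive, so a product of magnitude
   8 or more loses its upper integer bits. */
uint16_t q24_to_q12_bits(int32_t p)
{
    return (uint16_t)(((uint32_t)p >> 12) & 0xFFFFu);
}

/* Keep 16 bits of a Q24 product as Q8: shift right by 16. All 8 integer
   bits survive, but only 8 fraction bits remain. */
uint16_t q24_to_q8_bits(int32_t p)
{
    return (uint16_t)(((uint32_t)p >> 16) & 0xFFFFu);
}
```

With the operands 0x7808 (7.50195 in Q12) and 0x7400 (7.25 in Q12), the product is 0x3663A000. Read as Q12 the stored word is 0x663A, which is 6.38916 and plainly wrong; read as Q8 it is 0x3663, close to the true product but with fewer fraction bits.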


Data types used by C6x DSPs


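Special SP numbers

IEEE floating point standard has a set of special numbers (LFPN = largest floating-point number, SFPN = smallest floating-point number):

| Special value | Sign (s) | Exponent (e) | Fraction (f) | Hex Value   | Decimal value  |
|---------------|----------|--------------|--------------|-------------|----------------|
| +0            | 0        | 0            | 0            | 0x0000 0000 | 0.0            |
| -0            | 1        | 0            | 0            | 0x8000 0000 | -0.0           |
| 1             | 0        | 127          | 0            | 0x3F80 0000 | 1.0            |
| 2             | 0        | 128          | 0            | 0x4000 0000 | 2.0            |
| +Inf          | 0        | 255          | 0            | 0x7F80 0000 | +∞             |
| -Inf          | 1        | 255          | 0            | 0xFF80 0000 | -∞             |
| NaN           | x        | 255          | Nonzero      | 0x7FFF FFFF | not a number   |
| LFPN          | 0        | 254          | All 1's      | 0x7F7F FFFF | 3.40282347e+38 |
| SFPN          | 0        | 1            | All 0's      | 0x0080 0000 | 1.17549435e-38 |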
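The exponent and fraction rules for the special single precision values translate directly into a bit test. The sketch below is illustrative only: the enum and the function name `sp_classify` are assumptions, and the `SP_SUBNORMAL` case (exponent 0 with a nonzero fraction) is included only so that every 32-bit pattern gets a class.

```c
#include <stdint.h>

typedef enum { SP_ZERO, SP_SUBNORMAL, SP_NORMAL, SP_INF, SP_NAN } sp_class;

/* Classify a raw 32-bit single precision pattern by its exponent and
   fraction fields; the sign bit does not affect the class. */
sp_class sp_classify(uint32_t bits)
{
    uint32_t exp  = (bits >> 23) & 0xFFu;
    uint32_t frac = bits & 0x7FFFFFu;
    if (exp == 255u) return frac ? SP_NAN : SP_INF;        /* all-ones exponent */
    if (exp == 0u)   return frac ? SP_SUBNORMAL : SP_ZERO; /* all-zero exponent */
    return SP_NORMAL;                                      /* exponent 1 to 254 */
}
```

Both 0x7F7F FFFF (LFPN) and 0x0080 0000 (SFPN) classify as normal numbers, since their exponents are 254 and 1 respectively.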

Special DP numbers

Double precision floating point special numbers:

| Special value | Sign (s) | Exponent (e) | Fraction (f) | Hex Value             | Decimal value           |
|---------------|----------|--------------|--------------|-----------------------|-------------------------|
| +0            | 0        | 0            | 0            | 0x0000 0000 0000 0000 | 0.0                     |
| -0            | 1        | 0            | 0            | 0x8000 0000 0000 0000 | -0.0                    |
| 1             | 0        | 1023         | 0            | 0x3FF0 0000 0000 0000 | 1.0                     |
| 2             | 0        | 1024         | 0            | 0x4000 0000 0000 0000 | 2.0                     |
| +Inf          | 0        | 2047         | 0            | 0x7FF0 0000 0000 0000 | +∞                      |
| -Inf          | 1        | 2047         | 0            | 0xFFF0 0000 0000 0000 | -∞                      |
| NaN           | x        | 2047         | Nonzero      | 0x7FFF FFFF FFFF FFFF | not a number            |
| LFPN          | 0        | 2046         | All 1's      | 0x7FEF FFFF FFFF FFFF | 1.7976931348623157e+308 |
| SFPN          | 0        | 1            | All 0's      | 0x0010 0000 0000 0000 | 2.2250738585072014e-308 |

TMS320C67x Internal System Architecture

[Figure: block diagram. The CPU contains two register files, Regs (A0-A15/31) and Regs (B0-B15/31), and eight functional units (.D1, .D2, .M1, .M2, .L1, .L2, .S1, .S2). Internal buses link the CPU with internal memory, external memory and peripherals.]

Four functional units for each datapath

– .S
– .L
– .D
– .M

Mapping of instructions to functional units

– .S Unit:
  – ADD ADDK ADD2 AND B CLR EXT MV MVC MVK MVKH NEG NOT OR SET SHL SHR SSHL SUB SUB2 XOR ZERO
  – ABSSP ABSDP CMPGTSP CMPEQSP CMPLTSP CMPGTDP CMPEQDP CMPLTDP RCPSP RCPDP RSQRSP RSQRDP SPDP
– .L Unit:
  – ABS ADD AND CMPEQ CMPGT CMPLT LMBD MV NEG NORM NOT OR SADD SAT SSUB SUB SUBC XOR ZERO
  – ADDSP ADDDP SUBSP SUBDP INTSP INTDP SPINT DPINT SPTRUNC DPTRUNC DPSP
– .M Unit:
  – MPY MPYH MPYLH MPYHL SMPY SMPYH
  – MPYSP MPYDP MPYI MPYID
– .D Unit:
  – ADD NEG ADDAB (B/H/W) STB (B/H/W) LDB (B/H/W) SUB LDDW SUBAB (B/H/W) MV ZERO
– No Unit Used:
  – NOP IDLE

Detailed internal datapaths

[Figure: detailed internal datapaths, showing data path A and data path B.]
