Integrated Digital System Design using Hardware Description Languages and Programmable Logic Devices

[Figure: design flow. A Verilog testbench (driver and monitor), a Verilog design specification (register transfers) and constraints feed the CAD tools (behavioral simulation, synthesis, map, place and route, timing analysis), which produce the device configuration stream sent to the FPGA. The FPGA fabric consists of logic elements (LE), multipliers (x), embedded memory blocks (MEM), interconnect wires and interconnect switches.]

 

Combinational Logic Revision
• Basic logic gates
• Muxes (selection logic)
[Figure: basic logic gates; multiplexers whose data inputs are labeled 0, 1 (one select bit) and 00, 01, 10, 11 (two select bits)]

 

 

Combinational Logic Revision
• Decoders (outputs 0, 1, 2, 3 and enable input EN)
• Arithmetic circuitry
• Comparator circuitry (=, >, =3, >127, =256)
[Figure: right-shift register with serial input SI]


Setup constraint: the clock period must be greater than Tcq + Tlogic + Tsetup.
There is also the hold constraint: Tcq + Tlogic > Thold.
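The two constraints can be checked numerically. The following is a minimal sketch, not part of the original slides: the function names and the delay values are illustrative assumptions, and it further assumes that the setup check uses the longest logic delay while the hold check uses the shortest one.

```python
def max_clock_frequency_mhz(t_cq_ns, t_logic_max_ns, t_setup_ns):
    """Highest clock frequency (MHz) allowed by the setup constraint.

    The clock period must cover Tcq + Tlogic + Tsetup, so the minimum
    period is their sum; 1000 / period_in_ns gives the frequency in MHz.
    """
    min_period_ns = t_cq_ns + t_logic_max_ns + t_setup_ns
    return 1000.0 / min_period_ns


def hold_constraint_met(t_cq_ns, t_logic_min_ns, t_hold_ns):
    """True when Tcq + Tlogic > Thold (hold constraint)."""
    return t_cq_ns + t_logic_min_ns > t_hold_ns


# Illustrative (assumed) delays in nanoseconds.
f_max = max_clock_frequency_mhz(t_cq_ns=1.0, t_logic_max_ns=7.5, t_setup_ns=1.5)
hold_ok = hold_constraint_met(t_cq_ns=1.0, t_logic_min_ns=0.5, t_hold_ns=0.5)
```

With these assumed numbers the minimum clock period is 10 ns, so the circuit could be clocked at up to 100 MHz, and the hold constraint is satisfied.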

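Clock Skew
Zero skew (no phase difference at the clock terminals of the two flip-flops). Assume both flip-flops are reset.
[Figure: two flip-flops (D1/Q1 and D0/Q0) driven by the same Clock; waveforms for Clock, D1, Q1, D0, Q0]
Skew between Clock1 and Clock0
[Figure: the same two flip-flops driven by Clock1 and Clock0; waveforms for Clock1, Clock0, D1, Q1, D0, Q0]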

 

Clock Skew
What can be done to avoid skew? Use a clock tree.
[Figure: clock tree distributing the clock source to the clock sinks]
Clock trees cannot correct bad design practice, which must be avoided!
[Figure: example circuit with Clock0 and Clock1, labeled BAD!]
GATED CLOCKS ARE FEASIBLE BUT DANGEROUS (ADJUSTED CLOCK TREES AND ADDITIONAL LATCHES ARE NEEDED; THESE TECHNIQUES ARE BEYOND THE SCOPE OF OUR PROJECT)

 

Timing Parameters
Important timing parameters
• register-to-register delay
• input-to-register delay
• register-to-output delay
• input-to-output delay (Mealy only)
[Figure: finite state machine; the inputs and the present state feed the combinational logic, which produces the next state and the outputs]

 

Technology Mapping
There is a trade-off between circuit delay and circuit area (or resource usage, such as LUTs, for FPGAs)!
[Figure: logic network computing F from the inputs X0 to X12, grouped as X0-X3, X4-X6, X7-X9 and X10-X12]

 

Area-Driven Mapping
Area optimization when using 4-LUTs
[Figure: F of the inputs X0 to X12 mapped onto four 4-LUTs]

 

Timing-Driven Mapping
Delay optimization when using 4-LUTs
[Figure: F mapped onto five 4-LUTs; four first-level 4-LUTs take X0-X3, X4-X6, X7-X9 and X10-X12 (unused inputs tied to 0) and a fifth 4-LUT combines their outputs into F]
For realistic circuits the solution space is too large to be explored for optimal solutions when targeting area or delay
• CAD tools rely heavily on heuristic algorithms for synthesis, mapping, and place and route
• Optimal solutions are unlikely to be reached in “reasonable” runtime
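The area/delay trade-off can be quantified by counting LUTs (area) and LUT levels (delay) in a mapped network. The sketch below is illustrative only. The wiring of both mappings is an assumption: a cascade of four 4-LUTs for the area-driven case, and four first-level 4-LUTs plus one combining 4-LUT for the timing-driven case. This is consistent with the LUT counts on the slides but is not taken from the slides' actual netlist. The names `L0`..`L3` are hypothetical.

```python
def lut_depth(netlist, signal):
    """Number of LUT levels between the primary inputs and `signal`.

    `netlist` maps each LUT output name to the list of signals it reads.
    Any name that is not a key (primary input or constant) has depth 0.
    """
    if signal not in netlist:
        return 0
    return 1 + max(lut_depth(netlist, source) for source in netlist[signal])


X = [f"X{i}" for i in range(13)]  # primary inputs X0..X12

# Assumed area-driven mapping: cascade of four 4-LUTs.
area_mapping = {
    "L0": X[0:4],
    "L1": ["L0"] + X[4:7],
    "L2": ["L1"] + X[7:10],
    "F": ["L2"] + X[10:13],
}

# Assumed timing-driven mapping: four parallel 4-LUTs, then one combiner.
delay_mapping = {
    "L0": X[0:4],
    "L1": X[4:7] + ["0"],   # unused fourth input tied to constant 0
    "L2": X[7:10] + ["0"],
    "L3": X[10:13] + ["0"],
    "F": ["L0", "L1", "L2", "L3"],
}

area_luts, area_levels = len(area_mapping), lut_depth(area_mapping, "F")
delay_luts, delay_levels = len(delay_mapping), lut_depth(delay_mapping, "F")
```

Under these assumptions the area-driven mapping uses 4 LUTs but 4 LUT levels, while the timing-driven mapping spends 5 LUTs to reach F in 2 levels.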

Logic Array Blocks
Can the FPGA implementation fabric help us with delay improvement?
• Stack LEs on top of each other into a logic array block (LAB)
• Special arithmetic mode for LEs suitable for the implementation of ripple carry adders (the delay through interconnect switches is removed!)
• Configuration SRAM cells set the LE in the arithmetic mode (not shown in the figure)
[Figure: a chain of LEs in arithmetic mode, from inputs A0, B0 (output Sum0) to An-1, Bn-1 (output Sumn-1); each LE contains 3-LUTs, and the carry out of one LE feeds the carry in of the next]
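A behavioral model may help to see why 3-LUTs are sufficient in arithmetic mode: each sum bit and each carry bit is a function of only three signals (A, B and the carry in). The sketch below is an assumption-laden illustration, not the device's actual configuration: the truth tables and the function name `ripple_carry_add` are introduced here for illustration.

```python
# 3-input truth tables, indexed by (a << 2) | (b << 1) | carry_in.
SUM_LUT = [0, 1, 1, 0, 1, 0, 0, 1]    # a XOR b XOR carry_in
CARRY_LUT = [0, 0, 0, 1, 0, 1, 1, 1]  # majority(a, b, carry_in)


def ripple_carry_add(a, b, n_bits, carry_in=0):
    """Add two n-bit numbers one bit (one LE) at a time.

    Returns (sum, carry_out). The carry of stage i is passed directly
    to stage i + 1, as in the carry chain of a LAB.
    """
    carry = carry_in
    total = 0
    for i in range(n_bits):
        index = (((a >> i) & 1) << 2) | (((b >> i) & 1) << 1) | carry
        total |= SUM_LUT[index] << i
        carry = CARRY_LUT[index]
    return total, carry


result, carry_out = ripple_carry_add(0b1111, 0b0001, n_bits=4)
```

In this model each loop iteration corresponds to one LE, which looks up two 8-entry tables; adding 1111 and 0001 yields sum 0000 with carry out 1.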

Critical Path Activation
The longest propagation delay occurs on the critical path; however, its activation depends on the processed data.
[Figure: 4-bit ripple carry adder built from four full adders (inputs A0-A3 and B0-B3, carries C1-C4, sums S0-S3); waveforms for A3-A0 changing from 0000 to 1111 and B3-B0 changing from 0000 to 0001, followed by C1, C2, C3, C4, S0, S1, S2, S3]
Distance between two clock pulses for the circuit to function properly (in addition to LUT propagation delays, the propagation delays through FFs and the setup time of FFs should be taken into account)
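The data dependence can be illustrated by measuring how far a single carry travels for given operands. The following sketch uses a simplified measure chosen for illustration (it is not from the slides): a carry starts at a stage where both operand bits are 1 and keeps travelling while exactly one operand bit is 1. The function name is hypothetical.

```python
def longest_carry_ripple(a, b, n_bits):
    """Largest number of consecutive adder stages one carry travels through."""
    longest = 0
    run = 0  # stages travelled by the carry currently rippling (0 = none)
    for i in range(n_bits):
        a_i = (a >> i) & 1
        b_i = (b >> i) & 1
        if a_i & b_i:
            run = 1            # this stage generates a fresh carry
        elif (a_i ^ b_i) and run:
            run += 1           # this stage passes the incoming carry on
        else:
            run = 0            # the carry is absorbed, or there is none
        longest = max(longest, run)
    return longest


worst_case = longest_carry_ripple(0b1111, 0b0001, 4)
idle_case = longest_carry_ripple(0b0000, 0b0000, 4)
```

With the operands from the slide (1111 and 0001) the carry generated in the least significant stage ripples through all four stages, so the full critical path is activated; with 0000 and 0000 no carry is produced at all.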

 

Embedded Memories
• Read-only memory (ROM): address A, data out DO
• Single-port synchronous random access memory (RAM): address A, data in DI, write enable we, data out DO
• Dual-port RAM (DP-RAM): addresses A1 and A2, data in DI1 and DI2, write enables we1 and we2, data out DO1 and DO2
Different capacities and organizations
• e.g., 512-bit or 4 Kbit blocks
• 4 Kbit blocks can be organized as
  o 512x8
  o 256x16
  o …
The timing diagram for clocked ROM
[Figure: waveforms for Clock, Address (A) taking the values 0, 1, 2, 3, and Data Out (DO) showing ROM(0), ROM(1), ROM(2). Annotation: content from ROM location 0]

 

Embedded Memories
The timing diagram for single-port RAM
[Figure: waveforms for Clock, Address (A) taking the values 0, 1, 2, 3, Data In (DI) carrying value, Write Enable (we), and Data Out (DO) showing RAM(0), RAM(1) = value, RAM(2). Annotation: content from RAM location 1 that was updated during the write cycle]
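The behavior in the single-port RAM timing diagram can be approximated with a small cycle-based model. This is a sketch under stated assumptions rather than a description of any particular device: the class name, the `clock` method, and the choice that a write cycle makes the newly written value appear on DO are all assumptions made for illustration.

```python
class SinglePortRam:
    """Cycle-based model of a synchronous single-port RAM."""

    def __init__(self, depth, width):
        self.mask = (1 << width) - 1   # keep stored words `width` bits wide
        self.mem = [0] * depth
        self.do = 0                    # registered data output

    def clock(self, addr, di=0, we=False):
        """Model one rising clock edge and return the new value of DO."""
        if we:
            self.mem[addr] = di & self.mask
        # Assumption: DO reflects the addressed location after any write.
        self.do = self.mem[addr]
        return self.do


ram = SinglePortRam(depth=512, width=8)   # one 4 Kbit block as 512x8
read_before = ram.clock(addr=0)
written = ram.clock(addr=1, di=0xAB, we=True)
read_after = ram.clock(addr=1)
```

Each call to `clock` stands for one clock edge: the write to location 1 updates the memory, and subsequent reads of location 1 return the updated content.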

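The timing diagram for dual-port RAM
[Figure: waveforms for Clock; port 1 signals A1 (taking the values 0, 1, 2, 3, 4), DI1 (value1), we1 and DO1 (RAM(0), RAM(1) = value1, RAM(2), RAM(3)); port 2 signals A2 (taking the values 5, 6, 1, 2, 4), DI2 (value2), we2 and DO2 (RAM(5), RAM(6), RAM(1) = value2, RAM(2))]
Writing on both ports at the same time in the same location is NOT permitted!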
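The write-collision rule can be made explicit in a behavioral model that rejects simultaneous writes to the same location. The sketch below is a simplified illustration: the class and method names are hypothetical, and the behavior when one port reads a location that the other port writes in the same cycle (here the reader sees the new value) is an assumption that real devices may not share.

```python
class DualPortRam:
    """Cycle-based model of a synchronous dual-port RAM."""

    def __init__(self, depth):
        self.mem = [0] * depth
        self.do1 = 0
        self.do2 = 0

    def clock(self, a1, a2, di1=0, we1=False, di2=0, we2=False):
        """Model one rising clock edge on both ports; returns (DO1, DO2)."""
        if we1 and we2 and a1 == a2:
            # Both ports writing the same location is not permitted.
            raise ValueError(f"write collision at address {a1}")
        if we1:
            self.mem[a1] = di1
        if we2:
            self.mem[a2] = di2
        # Assumption: each output shows the location after this cycle's writes.
        self.do1 = self.mem[a1]
        self.do2 = self.mem[a2]
        return self.do1, self.do2


dp_ram = DualPortRam(depth=512)
first = dp_ram.clock(a1=1, a2=5, di1=11, we1=True)    # port 1 writes value1
second = dp_ram.clock(a1=2, a2=1, di2=22, we2=True)   # port 2 writes value2
```

Writing through both ports in the same cycle is allowed in this model as long as the addresses differ; only a same-address double write raises an error.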

Computer-Aided Design (CAD) Tasks
Important tasks in the design flow
• Synthesis – analyzes the hardware description language (HDL) source code, translates it into logic blocks (or functions) and then optimizes them through Boolean manipulation
• Mapping – breaks the logic network into technology-specific units, such as LEs with a given LUT capacity, for a given design goal (area/delay)
• Place & route – places the mapped logic network onto the FPGA fabric and finds short routes between the placed units, so that the delay (through interconnect switches) is reduced
• Timing analysis – analyzes the delay paths in the implemented circuit; it gives the maximum clock frequency
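The timing analysis step can be pictured as a longest-path search over the implemented circuit. The sketch below is a toy illustration and not how any specific CAD tool works: the netlist, the node names, all delay values (in picoseconds) and the flip-flop parameters are hypothetical, and the circuit between two registers is assumed to be acyclic.

```python
from functools import lru_cache


def longest_path_ps(edges, start, end):
    """Longest delay (ps) from `start` to `end` in an acyclic delay graph.

    `edges` maps a node to a list of (successor, delay_ps) pairs, where the
    delay covers the logic element and the interconnect leading to it.
    """
    @lru_cache(maxsize=None)
    def longest_from(node):
        if node == end:
            return 0
        best = float("-inf")   # stays -inf if `end` is unreachable from node
        for successor, delay_ps in edges.get(node, []):
            best = max(best, delay_ps + longest_from(successor))
        return best

    return longest_from(start)


# Hypothetical register-to-register netlist.
netlist = {
    "ff_a.Q": [("lut1", 400), ("lut2", 400)],
    "lut1": [("lut3", 900)],
    "lut2": [("ff_b.D", 1100)],
    "lut3": [("ff_b.D", 1200)],
}

critical_ps = longest_path_ps(netlist, "ff_a.Q", "ff_b.D")
t_cq_ps, t_setup_ps = 300, 200                      # assumed flip-flop timing
f_max_mhz = 1e6 / (t_cq_ps + critical_ps + t_setup_ps)
```

In this toy netlist the path through lut1 and lut3 is the critical one (2500 ps), which together with the assumed flip-flop parameters gives a 3000 ps minimum period, or roughly 333 MHz.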
