Disruptive Trends urging to rethink Embedded System Implementation

Reiner Hartenstein


The impact of shifting to multicore TU Kaiserslautern

4 P issues: performance, programmer productivity, program efficiency, power consumption

market trends

© 2010, [email protected]


http://hartenstein.de

Power Consumption of Computers

... has become an industry-wide issue: incremental improvements are on track, IPCC ?

but „we may ultimately need revolutionary new solutions“ [Horst Simon, LBNL, Berkeley]

Power consumption by the internet: x30 until 2030 if trends continue

(~90% paid by customers?)

[G. Fettweis, E. Zimmermann: ICT Energy Consumption - Trends and Challenges; WPMC'08, Lapland, Finland, 8–11 Sep 2008]

„Google causes 2% of the world's electricity consumption“ (Google denied)

at Dallas [Randy Katz: IEEE Spectrum, Febr. 2009]

Energy cost may overtake IT equipment cost in the near future [Albert Zomaya]

vN: a Massive Power Guzzler

It's a symptom of the von Neumann Syndrome:

Software is extremely power-hungry, owing to massively memory-cycle-hungry instruction streams.

Software often has very bad performance.

We need an approach using much less software.

triple paradigm

Growth beyond Moore's Law?

[Chart: relative performance (log scale, 10^3 to 10^13) versus year (1970 to 2030), marking the end of the single-core era]

triple paradigm: we need to learn parallel programming

... performance drops, productivity & other problems ...

„Multicore shifts the burden of Performance from Chip Designer to Software Developers.“ [J. Larus: Spending Moore's Dividend; C_ACM, May 2009]

Multimedia in the Multicore Era

[Chart: relative performance versus year (1994 to 2030), marking the beginning of the multicore era. Wireless standards shown: GSM, GPRS, EDGE, UMTS, next standard]

[courtesy E. Sanchez]

Multimedia performance needs: growing faster than Moore's law [Pierre Paulin, MPSoC'09]

Application    Needed performance (up to)
Audio          800 MIPS
Graphics       11 GOPS
Video          160 GOPS
Digital TV     900 GOPS

ICT market at an inflection point


The battle for the living room & mobile is more important than the PC market. Prosperity depends on network capacity, ..., efficient pricing, flexible platforms, & ...

... Cheap Revolution:
• low power
• affordable broadband
• software performance

triple paradigm

Senior Counselor to the U.S. Trade Representative (USTR) on strategy and negotiations.

Broadband is significant at the inflection point, prompting major market governance changes

Cowhey's & Aronson's Law

& massive funding needed

Performance Growth by Multicore?

& massive programmer productivity problems

[Chart: relative performance versus year (1994 to 2030), marking the beginning of the multicore era]

von-Neumann-only is not the silver bullet. Reconfigurable Computing is indispensable!

Dead Supercomputer Society

[Gordon Bell, keynote, ISCA 2000]

•ACRI •Alliant •American Supercomputer •Ametek •Applied Dynamics •Astronautics •BBN •CDC •Convex •Cray Computer •Cray Research •Culler-Harris •Culler Scientific •Cydrome •Dana/Ardent/Stellar/Stardent •DAPP •Denelcor •Elexsi •ETA Systems •Evans and Sutherland Computer •Floating Point Systems •Galaxy YH-1 •Goodyear Aerospace MPP •Gould NPL •Guiltech •ICL •Intel Scientific Computers •International Parallel Machines •Kendall Square Research •Key Computer Laboratories •MasPar •Meiko •Multiflow •Myrias •Numerix •Prisma •Tera •Thinking Machines •Saxpy •Scientific Computer Systems (SCS) •Soviet Supercomputers •Supertek •Supercomputer Systems •Suprenum •Vitesse Electronics

only 2 or 3 successes

most in 1985-1995

mainly research

the single core sequential mind set was the winner

easy fix?

Hastily knitted compilers for the heavy lifting?

e.g. automatically parallelizing compilation via multi-threading, and many other ad-hoc solutions?

new types of bugs introduced

widespread confusion and competing claims

John Hennessy: „I would be panicked if I were in industry“

Amid the Clamor

Michael Wrinn (keynote at SIGCSE 2010): Suddenly, All Computing Is Parallel: Seizing Opportunity Amid the Clamor

http://www.sigcse.org/sigcse2010/attendees/keynotes.php

„Foundational change will disrupt traditional habits throughout the discipline ....“

„The proud era of von Neumann architecture passes into history.“

... especially how students are to be introduced ....

Michael Wrinn is a senior course architect in the Intel Software College. He works to bring parallel computing into the mainstream of undergraduate education. He also works with the ACM Education Council to bring industrial perspective to curriculum evolution.

HPRC: High Performance Reconfigurable Computers

programming dilemma…

… a taxonomy of design flows

RC*: Demonstrating the intensive Impact
*) RC = Reconfigurable Computing

SGI Altix 4700 with RC 100 RASC compared to Beowulf cluster [Tarek El-Ghazawi et al.: IEEE COMPUTER, Febr. 2008]

                                               Savings
Application                  Speed-up factor   Power    Cost    Size
DNA and Protein sequencing   8723              779      22      253
DES breaking                 28514             3439     96      1116

much less memory and bandwidth needed

no software used!

massively saving energy

much less equipment needed

Speed-up factors obtained by Software to Configware migration

No instruction fetch at runtime: no software!

& most MIPS running on FPGAs

A physical signal is the simplest and fastest way of message & data transport.

Abundant on-chip bandwidth available for parallelism of flexible granularity (by FPGA).

[Chart: speed-up factor (log scale, 10^0 to 10^6) per application, with plotted factors between 20 and 28500, e.g. DES breaking 28500 and DNA sequencing 8723. Application areas: image processing, pattern matching, multimedia; DSP and wireless; crypto; bioinformatics; astrophysics. Applications shown: real-time face detection, video-rate stereo vision, pattern recognition, SPIHT wavelet-based image compression, BLAST, protein identification, Smith-Waterman pattern matching, molecular dynamics simulation, DNA sequencing, FFT, MAC, Reed-Solomon decoding, Viterbi decoding, DES breaking, CT imaging, GRAPE]

Power save factors obtained

Energy saving factors: ~10% of speedup

GPGPU and x86 multicore: no energy saving data available

Low Power Circuit Design: PowerOpt™ (ChipVision Design Systems): divides power consumption by up to 4

[Chart: the speed-up factors of the FPGA applications (log scale, 10^0 to 10^6), repeated as the reference for the energy saving factors]

Why such Speed-up Factors ...

... with FPGAs: a much worse technology! massive wiring overhead + massive reconfigurability overhead + routing congestion growing with FPGA size

The „Reconfigurable Computing Paradox“

main reason: no von Neumann Syndrome! no software! using Configware and Flowware instead

Isn't NVIDIA the solution?

[Chart: relative performance versus year (1994 to 2030), marking the beginning of the multicore era]

Speed-up factors by GPGPUs (1)

CUDA ZONE pages [NVIDIA Corp.]: non-reviewed CUDA user submissions
http://www.nvidia.co.uk/object/cuda_home_uk.html#state=home

power consumption not reported!

Drawbacks: von Neumann syndrome, programmer productivity

[Chart: speed-up factor (log scale, 10^0 to 10^3) of CUDA user submissions by submission date (Jan 2007 to Jan 2010); reported factors range from 1.3 to 675. Application areas: Astrophysics, Bioinformatics, CFD (Computational Fluid Dynamics), Cryptography, DCC (Digital Content Creation), DSP (Digital Signal Processing), EDA, Graphics, Imaging, Numerics, oil & gas, Video & Audio]

Speed-up factors by GPGPUs (2)

CUDA ZONE pages [NVIDIA Corp.]: non-reviewed CUDA user submissions
http://www.nvidia.co.uk/object/cuda_home_uk.html#state=home

power consumption not reported!

(up to ~600 x)

[Chart: speed-up factor (log scale, 10^0 to 10^3) of CUDA user submissions by submission date (Jan 2007 to Jan 2010), grouped by application area]

Speed-up factors obtained (2)

by Software to Configware migration (up to ~30,000x) (200x) vs. GPU: almost 50x

[Chart: the GPGPU speed-up factors (1.3 to 675) overlaid on the speed-up factors of the FPGA applications (log scale, 10^0 to 10^6)]

RC versus Multicore
„RC“ = Reconfigurable Computing

RC: speed-up often higher by orders of magnitude. Sure!

RC: energy-efficiency often higher: very much, or by orders of magnitude? Sure!

We need both: Multicore and RC

this is the silver bullet

„Software“ stands for extremely memory-cycle-hungry instruction streams

Patterson's Law [Dave Patterson]: bandwidth gap grows 50% / year, has reached >1000x: “The Memory Wall” (term coined by Sally McKee & co-author)

Nathan's Law [Nathan Myhrvold]: Software is a gas. It expands to fill its containers ... until being limited by Moore's Law [& Kryder's Law]

Wirth's Law [Niklaus Wirth]: “software is slowing faster than hardware is accelerating“

The von Neumann Syndrome [C. V. Ramamoorthy]: overhead piles up to code sizes of astronomic dimensions
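As a rough plausibility check of the Memory Wall figures on this slide: if the bandwidth gap compounds at a constant 50% per year (a simplifying assumption; the slide gives only the rate and the >1000x result), a gap of 1000x takes about 17 years to build up. A minimal sketch, with the hypothetical helper name `years_to_reach`:

```python
import math


def years_to_reach(gap_factor, growth_per_year=0.5):
    """Years of constant compound growth needed to reach gap_factor.

    Assumption (not from the slide): the gap grows by the same
    percentage every year, i.e. gap(t) = (1 + growth_per_year) ** t.
    """
    return math.log(gap_factor) / math.log(1.0 + growth_per_year)


# Gap growing 50% per year until it reaches a factor of 1000.
print(round(years_to_reach(1000), 1))
```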

50 years Software Crisis (term by F. L. Bauer [1968])

Parkinson's Law [Cyril Northcote Parkinson, 1955]: bureaucracy growth independent of actual work to be done

The time has come

Max Planck: Replacement of false doctrines by new insights needs 50 years, waiting for not only old professors but also their scholars to die off.

Software Engineering criticism is not new:
F. L. Bauer 1968: coined the term „Software Crisis“
Peter G. Neumann 1985-2003: 216x “Inside Risks“ (18 years inside back cover of Comm_ACM)
N. N. 1995: THE STANDISH GROUP REPORT
Robert N. Charette 2005: Why Software Fails; IEEE Spectrum, Sep 2005
L. Savain 2006: Why Software is bad
Anthony Berglas 2008: Why it is Important that Software Projects Fail

CPU-centric flat world model (Aristotelian model)

typical programmer qualification: sequential-only mind set – CPU-“centric“ but no hardware know-how (kind of tunnel view)

CPU not visible from SE

This Software-centric world model is obsolete

The Machine Model Dichotomy

von Neumann versus Anti-machine (data stream machine).

*) do not confuse with „dataflow“!

[Diagram labels: CPU: SE (Software Engineering); asM (auto-sequencing Memory): FE (Flowware Engineering); PE (Program Engineering)]

PE: the Generalization of Software Engineering (First Step)

Procedural Languages Twins

imperative Software Languages       super systolic Flowware Languages
(program counter)                   (data counter(s))
read next instruction               read next data item
goto (instruction address)          goto (data address)
jump to (instruction address)       jump to (data address)
instruction loop                    data loop
instruction loop nesting            data loop nesting
instruction loop escape             data loop escape
instruction stream branching        data stream branching
no: no internally parallel loops    yes: internally parallel loops

But there is the Asymmetry

for data parallelism

Machine twins: different data movement

Who moves operand to / from operator if not an instruction? if not Software?

1: von Neumann: moving data between CPU cores via common memory; data transport triggered by instruction stream; execution strategy: moving data at run time

2: within (r)DPA: data piped thru (r)DPU cores, directly from (r)DPU to (r)DPU; data transport triggered by arrival of data (transport-triggered*); execution strategy: moving the locality of execution at compile time

*Daniel Tabac, Jack Lipovski

remember the Memory Wall (Patterson's Law)

A Heliocentric CS Model needed

Triple Paradigm Dual Dichotomy Approach. The Generalization of Software Engineering

[Diagram labels: CPU: SE (Software Engineering); asM (auto-sequencing Memory): FE (Flowware Engineering); structures (pipe network model): CE (Configware Engineering); PE (Program Engineering); time to space mapping issue]

*) do not confuse with „dataflow“!

rDPU: reconfigurable Data-Path Unit
rDPA: reconfigurable Data-Path Array

Triple Paradigm Compilation

Code-X (mid '90s: Jürgen Becker)

source „program“ (C, FORTRAN, MATLAB, …)

automatic partitioning, then:

Software Engineering: software compiler with instruction scheduler, producing software code (instruction streams)

Configware Engineering: configware compiler with placement & routing mapper, producing configware code (configuration), and data scheduler, producing flowware code (data streams)

SE (Software Engineering) Education Revolution

by triple paradigm co-education: traditional qualification in the time domain + lean qualification in the space domain = lean hardware modeling qualification at a higher level of abstraction

Conclusions (1)

We urgently need a Software Education Revolution for using Multicore - and RC* (SERUM-RC*)
*) Reconfigurable Computing

We urgently need a Mead-&-Conway-dimension text book on triple-paradigm programming education, and a few new Matlab/Simulink boxes for a model-based lean instruction approach to undergraduate students

Conclusions (2)

To maintain a Booming Multicore Era: possible for 2 or 3 more decades? Not without Reconfigurable Computing!

[Chart: relative performance versus year (2004 to 2030), marking the end of the single-core era]

thank you

END

extra pages for discussion:

The Systolic Array (H. T. Kung paradigm)

Algebra experts' hobby, early 80ies

(pipe network) DPA*
*) DataPath Array (array of DPUs)

DataPath Unit has no program counter! it's no CPU!

nice time/space notation - defines: ... which data item at which time at which port

[Diagram: input data streams and output data streams drawn as skewed sequences of data items (x) over the axes time and port #, entering and leaving the DPA]
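The time/space notation above (which data item at which time at which port) can be illustrated with a small simulation. The following is a minimal sketch, not taken from the slides: it assumes a square array of multiply-accumulate DPUs computing a matrix product, with row i of A entering the left port i and column j of B entering the top port j, each stream delayed (skewed) by its port number. The function name `systolic_matmul` and the choice of matrix multiplication as the workload are assumptions made for illustration.

```python
def systolic_matmul(A, B):
    """Simulate an n x n array of multiply-accumulate DPUs (no program counter).

    Assumptions for this sketch: A and B are square n x n lists of numbers;
    DPU (i, j) keeps the running sum for C[i][j]; operands move one DPU to
    the right (A) or one DPU down (B) per time step.
    """
    n = len(A)
    a_reg = [[0] * n for _ in range(n)]  # operand of A currently held at DPU (i, j)
    b_reg = [[0] * n for _ in range(n)]  # operand of B currently held at DPU (i, j)
    acc = [[0] * n for _ in range(n)]    # accumulators, become C = A x B

    for t in range(3 * n - 2):           # time steps until the last operands meet
        # Move data: A operands one step right, B operands one step down.
        for i in range(n):
            for j in range(n - 1, 0, -1):
                a_reg[i][j] = a_reg[i][j - 1]
        for j in range(n):
            for i in range(n - 1, 0, -1):
                b_reg[i][j] = b_reg[i - 1][j]
        # Feed the boundary ports with skewed input streams:
        # port p receives its item number k = t - p at time t.
        for p in range(n):
            k = t - p
            a_reg[p][0] = A[p][k] if 0 <= k < n else 0
            b_reg[0][p] = B[k][p] if 0 <= k < n else 0
        # Every DPU performs one multiply-accumulate in the same time step.
        for i in range(n):
            for j in range(n):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc


print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

The skew is what the diagonal patterns of x and - in the slide's stream diagrams express: the later the port number, the later its first data item arrives.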

The von Neumann Syndrome

The instruction-stream-based von Neumann approach: has several von Neumann overhead phenomena per CPU!

the watering pot model [Hartenstein]

The data-stream-based anti machine approach: has no von Neumann bottlenecks

Data meeting the Processing Unit (PU)

... explaining the RC advantage

We have 2 choices:

by Software: routing the data by memory-cycle-hungry instruction streams thru shared memory

by Configware: data-stream-based: placement* of the execution locality ... pipe network generated by configware compilation

*) before run time

language example: JPEG zigzag scan pattern (an animation; x and y data counters)

*> Declarations
EastScan is
  step by [1,0]
end EastScan;

SouthScan is
  step by [0,1]
end SouthScan;

NorthEastScan is
  loop 8 times until [*,1]
    step by [1,-1]
  endloop
end NorthEastScan;

SouthWestScan is
  loop 8 times until [1,*]
    step by [-1,1]
  endloop
end SouthWestScan;

HalfZigZag is
  EastScan
  loop 3 times
    SouthWestScan
    SouthScan
    NorthEastScan
    EastScan
  endloop
end HalfZigZag;

goto PixMap[1,1]
HalfZigZag;
SouthWestScan
uturn (reverse (HalfZigZag))
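For readers unfamiliar with the data sequencing language used above, the address sequence it generates can be cross-checked with a conventional program. The following is a minimal sketch under stated assumptions: it produces the standard JPEG zigzag order for an n x n block as 0-based (x, y) pairs (x = column, y = row), whereas the slide counts from PixMap[1,1]. The function name `zigzag_addresses` is hypothetical, and the sketch walks the anti-diagonals directly rather than imitating the scan declarations.

```python
def zigzag_addresses(n=8):
    """Return the zigzag scan order of an n x n block as (x, y) addresses.

    Cells on one anti-diagonal share d = x + y. Odd diagonals are walked
    towards the south-west (x decreasing), even diagonals towards the
    north-east (x increasing), which yields the JPEG zigzag pattern.
    """
    order = []
    for d in range(2 * n - 1):
        # All cells of this anti-diagonal, listed with x increasing.
        cells = [(x, d - x) for x in range(n) if 0 <= d - x < n]
        if d % 2 == 1:
            cells.reverse()  # south-west scan: x decreases, y increases
        order.extend(cells)
    return order


# First addresses: start, one step east, south-west scan, step south, north-east scan.
print(zigzag_addresses()[:6])
```

In the slide's terms, a software loop like this costs instruction fetches for every address, while a data counter generates the same address stream without instructions at run time.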

Double Dichotomy

Paradigm Dichotomy (time domain): von Neumann (Software-Domain): instruction stream, versus Anti Machine (Flowware-Domain): data stream

Relativity Dichotomy: time domain: Procedure (Software-Domain), versus space domain: Structure (Configware-Domain)

Paradigm Dichotomy: an old hat

HDL scene ~1970: paradigm mapping causes a time to space mapping: decision box turns into demultiplexer

[Diagram: a decision box (CONDITION, branches B0 and B1) mapped onto a demultiplexer (inputs ENABLE and CONDITION, outputs B0 and B1)]

W. A. Clark: 1967 SJCC, AFIPS Conf. Proc.
C. G. Bell et al: IEEE Trans-C21/5, May 1972
RTM as DEC product available: 1973

“That's so simple! why did it take 30 years to find out?” reductionists' tunnel view

David Parnas: Put [very] Old Ideas Into Practice (PvOIIP)
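The mapping of a decision box onto a demultiplexer can be sketched in a few lines. This is an illustration under assumptions, not the notation of the cited papers: the function names are hypothetical, and the polarity (CONDITION true selects B1) is assumed, since the extracted diagram does not show it reliably.

```python
def decision_box(condition):
    """Time domain: control flow picks which branch is executed next."""
    if condition:
        return "B1"
    return "B0"


def demultiplexer(enable, condition):
    """Space domain: both branch blocks exist as hardware side by side.

    The ENABLE signal is routed to exactly one of the two outputs,
    selected by CONDITION. Returns the pair (B0 enable, B1 enable).
    """
    b0 = enable and not condition
    b1 = enable and condition
    return b0, b1


# The structure activates the same branch the procedure would jump to.
for cond in (False, True):
    print(decision_box(cond), demultiplexer(True, cond))
```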

Paradigm Dichotomy (2)

Paradigm Dichotomy (time domain): von Neumann (Software-Domain): instruction stream, versus Anti Machine (Flowware-Domain): data stream

software to flowware mapping?

Relativity Dichotomy: time domain: Procedure (Software-Domain), versus space domain: Structure (Configware-Domain)

Relativity Dichotomy

Paradigm Dichotomy (time domain): von Neumann (Software-Domain): instruction stream, versus Anti Machine (Flowware-Domain): data stream

Relativity Dichotomy: time domain: Procedure (Software-Domain), versus space domain: Structure (Configware-Domain)

time to space mapping

Relativity Dichotomy (2)

time domain: procedure domain
2 phases: 1) programming instruction streams 2) run time

space domain: structure domain
3 phases: 1) reconfiguration of structures 2) programming data streams 3) run time

time-iterative to space-iterative

a time to space mapping: n time steps, 1 CPU becomes 1 time step, n DPUs

the space dimension is limited (e.g. because of the chip size):

a time to space/time mapping: n*k time steps, 1 CPU becomes n time steps, k DPUs

loop transformation methodology: 70ies and later

Strip mining [D. Loveman, J-ACM, 1977]
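The time to space/time mapping can be made concrete with a small step-counting model. The following is a minimal sketch of strip mining under stated assumptions: the function names are hypothetical, the k DPUs are only modelled (the inner work of one strip is counted as a single time step because the k loop bodies would run side by side in hardware), and the loop iterations are assumed to be independent of each other.

```python
def run_time_iterative(data, body):
    """1 CPU: one loop body per time step, n*k steps for n*k items."""
    results, steps = [], 0
    for x in data:
        results.append(body(x))
        steps += 1
    return results, steps


def run_strip_mined(data, body, k):
    """k DPUs: the loop is cut into strips of k iterations.

    All iterations of one strip are mapped into space (one per DPU) and
    counted as one time step; the strips follow each other in time.
    """
    results, steps = [], 0
    for start in range(0, len(data), k):
        strip = data[start:start + k]
        results.extend(body(x) for x in strip)  # conceptually simultaneous
        steps += 1
    return results, steps


data = list(range(12))                  # n*k = 12 iterations
print(run_time_iterative(data, lambda x: x * x)[1])
print(run_strip_mined(data, lambda x: x * x, k=4)[1])
```

With 12 iterations and k = 4 DPUs the model needs 3 time steps instead of 12, which is the slide's reduction from n*k to n time steps.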

POIIP: Loop turns into Pipeline [1979]

loop: Memory and CPU executing the loop body

Pipeline: the loop body becomes a (reconfigurable) DataPath Unit (rDPU)

complex loop body, nested loops: complex rDPU or pipe network inside rDPU; complex pipe network of rDPUs
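How a loop body turns into a pipeline can be sketched with a clocked model. This is an illustration under assumptions, not the method of the 1979 work referred to in the title: the loop body is assumed to be splittable into stages, each stage stands for one rDPU with an output register, and `None` is reserved as the marker for an empty register. The name `run_pipeline` is hypothetical.

```python
def run_pipeline(stages, inputs):
    """Clock a chain of rDPU stages; return (outputs, clock steps used).

    Each clock step, every stage takes the value from its predecessor's
    register, so up to len(stages) data items are in flight at once.
    """
    regs = [None] * len(stages)   # output register of each stage
    outputs, clock, pos = [], 0, 0
    while pos < len(inputs) or any(r is not None for r in regs):
        if regs[-1] is not None:
            outputs.append(regs[-1])          # finished item leaves the pipe
        # Update from the last stage backwards so no value is overwritten
        # before its successor has consumed it.
        for s in range(len(stages) - 1, 0, -1):
            regs[s] = stages[s](regs[s - 1]) if regs[s - 1] is not None else None
        if pos < len(inputs):
            regs[0] = stages[0](inputs[pos])  # next item enters the pipe
            pos += 1
        else:
            regs[0] = None
        clock += 1
    return outputs, clock


# Loop body y = (2*x + 1)**2 split into three stages.
stages = [lambda x: 2 * x, lambda x: x + 1, lambda x: x * x]
print(run_pipeline(stages, [1, 2, 3, 4]))
```

In this model n items pass through s stages in n + s clock steps, whereas a sequential loop would spend s operations per item one after another.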

The Bubble Sort Algorithm

loop i = 2 … N
  loop j = 2 … N
    if key [j-1] > key [j] then
      swap (key [j-1], key [j])
    endif;
  endloop j;
endloop i;

bubble sort example: architecture instead of synchro

direct time to space mapping

[Diagram: array of „conditional swap“ units]

accessing conflicts

modification: with shuffle function: only half of the number of blocks

[Diagram: „Shuffle Sort“, built from „conditional swap“ units]
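A software model can show what mapping bubble sort onto conditional swap units looks like. The sketch below uses odd-even transposition sort, a well-known parallel relative of bubble sort in which disjoint neighbour pairs are compared in the same clock step. Whether this matches the slide's „Shuffle Sort“ architecture in detail is an assumption; the function names are hypothetical.

```python
def conditional_swap(a, b):
    """One „conditional swap“ unit: outputs its two inputs in sorted order."""
    return (a, b) if a <= b else (b, a)


def odd_even_transposition_sort(keys):
    """Sort with rows of conditional swap units, one row per clock step.

    In each step the units work on disjoint neighbour pairs, so in
    hardware they would all operate at the same time. Steps alternate
    between pairs starting at even and at odd positions; n steps are
    sufficient for n keys.
    """
    keys = list(keys)
    n = len(keys)
    for step in range(n):
        start = step % 2
        for j in range(start, n - 1, 2):      # conceptually simultaneous
            keys[j], keys[j + 1] = conditional_swap(keys[j], keys[j + 1])
    return keys


print(odd_even_transposition_sort([5, 2, 9, 1, 7, 3]))
```

Compared with the two nested loops of the sequential bubble sort, the time dimension shrinks to one loop over clock steps, while the inner loop is unrolled into space.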

time 2 space mapping

Time domain: Procedure-Domain versus Space domain: Structure-Domain

time-Algorithm: Program loop: n time steps, 1 CPU
space-Algorithm: Pipeline: 1 clock step, n DPUs

time-Algorithm: Bubble Sort: n x k time steps, 1 „conditional swap“ unit
space- / time-Algorithm: Shuffle Sort: k clock steps, n „conditional swap“ units