NASA Technical Memorandum 85945

Study of the Mapping of Navier-Stokes Algorithms onto Multiple-Instruction/Multiple-Data-Stream Computers

D. Scott Eberhardt, Donald Baganoff, and K. G. Stevens, Jr.

July 1984

National Aeronautics and Space Administration
Examples:

  Iterations   Speedup      Iterations   Speedup
       1        1.574            15       1.872
       2        1.705            50       1.895
       5        1.813           100       1.900
      10        1.857           400       1.904

TABLE 3.- TASK TIMINGS: NAVIER-STOKES EQUATIONS

                           Time-MIMD code, sec      Time-serial    Speedup
  Task                     Serial    Concurrent     code, sec
  ------------------------------------------------------------------------
  Setup                     6.57        --             6.07         0.924
  RHS                        .02       7.84           15.15         1.927
  Residual + BC              .84        --              .84         1.0
  LHS                        .08      15.47           30.50         1.972
  Output (optional)        (1.64)       --          Not measured    Assume 1.0
  Time/iteration            (.94)    (23.31)         (46.42)        1.914
  Output routines           3.24        --             3.22          .994

Examples:

  Iterations   Speedup      Iterations   Speedup
       1        1.635            25       1.899
       2        1.751            50       1.906
       5        1.842           100       1.910
      10        1.876           400       1.913
      15        1.889
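The example speedups just given can be reproduced from the Table 3 entries by treating setup and the output routines as one-time costs and everything else as a per-iteration cost. A minimal C sketch under that assumption (the constants are the reconstructed Table 3 values; the combining formula is our reading, not code from the report):

```c
#include <stdio.h>

/* One-time and per-iteration costs from Table 3, in seconds.
 * MIMD fixed   = setup (6.57) + output routines (3.24)
 * serial fixed = setup (6.07) + output routines (3.22)
 * Per iteration: MIMD .94 (serial part) + 23.31 (concurrent part),
 * serial code 46.42.
 */
#define MIMD_FIXED    (6.57 + 3.24)
#define SERIAL_FIXED  (6.07 + 3.22)
#define MIMD_ITER     (0.94 + 23.31)
#define SERIAL_ITER   46.42

int main(void)
{
    static const int iters[] = { 1, 2, 5, 10, 15, 25, 50, 100, 400 };

    printf("Iterations  Speedup\n");
    for (int i = 0; i < (int)(sizeof iters / sizeof iters[0]); i++) {
        double t_mimd   = MIMD_FIXED   + iters[i] * MIMD_ITER;
        double t_serial = SERIAL_FIXED + iters[i] * SERIAL_ITER;
        printf("%10d  %7.3f\n", iters[i], t_serial / t_mimd);
    }
    return 0;
}
```

The printed speedups agree with the tabulated examples to within a unit in the last digit, and approach the asymptotic per-iteration speedup 46.42/24.25 = 1.914 as the one-time costs are amortized over more iterations.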
TABLE 4.- TOTAL CPU TIMINGS FOR EULER EQUATION

  Number of    t_MAIN,   t_RHS,    t_LHS,    t_MIMD,   t_serial,   Speedup
  iterations     sec       sec       sec       sec        sec
  ------------------------------------------------------------------------
       1        11.60      4.21     13.67     29.48      45.32      1.537
       1        11.79      4.23     13.74     29.76      45.32      1.523
       2        14.25      8.41     27.36     50.02      82.70      1.653
       5        16.10     20.98     67.95    105.03     187.36      1.784
      10        22.60     42.04    132.38    197.02     365.60      1.856
      15        28.03     63.03    204.58    295.64     544.49      1.842
      25        33.45    104.79    338.04    476.28     897.78      1.885
TABLE 5.- TOTAL CPU TIMINGS FOR NAVIER-STOKES EQUATION

  Number of    t_MAIN,   t_RHS,    t_LHS,    t_MIMD,   t_serial,   Speedup
  iterations     sec       sec       sec       sec        sec
  ------------------------------------------------------------------------
       1        11.62      7.84     15.47     34.93      56.70      1.623
       2        13.93     15.69     30.87     60.49     105.07      1.737
       5        16.19     39.15     77.15    132.49     244.24      1.843
      10        20.58     78.17    155.13    253.88     478.13      1.883
      25        33.36    196.72    386.95    617.03    1173.25      1.901
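In Tables 4 and 5, t_MIMD is the sum of the three component timings and the speedup column is t_serial/t_MIMD. A short C check of the 25-iteration Navier-Stokes row, assuming exactly that column structure:

```c
#include <stdio.h>

int main(void)
{
    /* Table 5, 25-iteration row (seconds). */
    double t_main = 33.36, t_rhs = 196.72, t_lhs = 386.95;
    double t_serial = 1173.25;

    double t_mimd = t_main + t_rhs + t_lhs;        /* 617.03 */
    printf("t_MIMD  = %.2f s\n", t_mimd);
    printf("Speedup = %.3f\n", t_serial / t_mimd); /* 1.901 */
    return 0;
}
```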
TABLE 6.- STOPWATCH TIMINGS FOR EULER EQUATION

  Number of    t_MIMD,   t_serial,   Speedup
  iterations     sec        sec
  -------------------------------------------
       1        44.61      46.14      1.034
       1        36.98      46.14      1.248
       2        58.10      82.70      1.423
       5       114.68     188.32      1.642
      10       213.66     366.98      1.718
      15       311.86     545.98      1.751
      25       499.44     899.53      1.801
TABLE 7.- STOPWATCH TIMINGS FOR NAVIER-STOKES EQUATION

  Number of    t_MIMD,   t_serial,   Speedup
  iterations     sec        sec
  -------------------------------------------
       1        43.73      58.39      1.335
       2        68.58     106.75      1.557
       5       151.50     246.11      1.642
      10       269.69     479.39      1.778
      25       647.78    1175.21      1.814
Figure 1.- MIMD procedure for solving approximate-factored algorithms. [Flow diagram: previous iteration -> L_xyz RHS -> SYNCHRONIZE -> F_x^-1 -> q* -> SYNCHRONIZE -> F_y^-1 -> q** -> SYNCHRONIZE -> F_z^-1 -> q^(n+1) -> SYNCHRONIZE -> next iteration.]

Figure 2.- Implementation of a two-dimensional problem on a four-processor system. [Diagram: the four processors perform sweep 1 (j = 1, ..., Jmax at fixed k) and sweep 2 (k-direction) over the grid.]

Figure 3.- Memory partition for a four-processor system set up for a two-dimensional problem. [Diagram: 4 x 4 array of grid blocks, (1,1) through (4,4).]

Figure 4.- Serial code flow summary. [Flow chart: INITIALIZATION - data input, grid generation, variable initialization; MAIN ITERATION LOOP - boundary conditions, right-hand-side operator, explicit smoothing, calculate residual (convergence?), implicit integration (x-sweep, y-sweep, z-sweep), output every N iterations; FINAL OUTPUT ROUTINES.]
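The SYNCHRONIZE points in figure 1 are global barriers: no processor may begin the y-inversion until every processor has finished its share of the x-inversion, because each factored operator couples grid points along a different direction of the partition in figure 3. The sketch below shows that pattern in C with POSIX threads; the original code used separate VMS processes and event flags rather than threads, so the barrier and the kernel names here are modern stand-ins, not the AIR3D routines.

```c
#include <pthread.h>
#include <stdio.h>

#define NPROC 4
static pthread_barrier_t sync_point;

/* Hypothetical per-processor kernels: each touches only its own
 * partition of the grid (cf. the memory partition of fig. 3). */
static void rhs(int id)     { (void)id; /* form right-hand side   */ }
static void x_sweep(int id) { (void)id; /* invert F_x -> q*       */ }
static void y_sweep(int id) { (void)id; /* invert F_y -> q**      */ }
static void z_sweep(int id) { (void)id; /* invert F_z -> q^(n+1)  */ }

static void *iterate(void *arg)
{
    int id = *(int *)arg;

    rhs(id);      pthread_barrier_wait(&sync_point);  /* SYNCHRONIZE */
    x_sweep(id);  pthread_barrier_wait(&sync_point);  /* q*  done    */
    y_sweep(id);  pthread_barrier_wait(&sync_point);  /* q** done    */
    z_sweep(id);  pthread_barrier_wait(&sync_point);  /* iteration over */
    return NULL;
}

int main(void)
{
    pthread_t th[NPROC];
    int id[NPROC];

    pthread_barrier_init(&sync_point, NULL, NPROC);
    for (int i = 0; i < NPROC; i++) {
        id[i] = i;
        pthread_create(&th[i], NULL, iterate, &id[i]);
    }
    for (int i = 0; i < NPROC; i++)
        pthread_join(th[i], NULL);
    pthread_barrier_destroy(&sync_point);
    puts("one MIMD iteration complete");
    return 0;
}
```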
Figure 5.- Serial code subroutines. [Call tree for PROGRAM AIR3D: INITIA, GRID, JACOB, METOUT (x4), OUT2, EIGEN; STEP calls BC, RHS (FLUXVE, DIFFER, VISRHS, looping over the x-, y-, and z-metrics XXM, YYM, ZZM), MUTUR, SMOOTH, FILTRX, FILTRY, FILTRZ (each with AMATRX and BTRI; VISMAT in FILTRZ), MAP, OUT2 (x4); after the n-loop: OUT (x6), PLOT (x4), PLANE.]
Figure 6.- Concurrent code flow summary. [Flow chart: MEMORY MAPPING AND SUBPROCESS SETUP - map global sections, create flag clusters, create subprocesses (serial); INITIALIZATION - data input, grid generation, variable initialization (serial); MAIN ITERATION LOOP - boundary conditions (serial), right-hand-side operator (parallel), explicit smoothing (parallel), calculate residual - convergence? (serial), implicit integration - x-, y-, z-sweeps (parallel), output every N iterations (serial); FINAL OUTPUT ROUTINES (serial); SUBPROCESS DELETION AND MEMORY DEALLOCATION (serial).]
Figure 7.- Concurrent code subroutines. [Call tree: MAPPRM, SPAWN, INITIA, GRID, JACOB, METOUT (x4), METPASS/METREC (passing the XXM, YYM, ZZM metrics), OUT2, EIGEN; STEP calls BC, SUBRHS/RHS (FLUXVE, DIFFER over the three metrics), SMOOTH, FILTRX, FILTRY, FILTRZ (each with AMATRX and BTRI; VISMAT in FILTRZ), MAP, OUT2 (x4), with (SYNC) points between phases; OUT (x6), PLOT (x4), PLANE, DELPRM.]
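The setup phase of figures 6 and 7 (map global sections, create flag clusters, spawn subprocesses, then DELPRM to tear everything down) has a close POSIX analogue: one shared-memory region mapped into every process, with a synchronization object allocated inside it. The following minimal sketch is under that analogy only; mmap, fork, and semaphores are illustrative stand-ins for MAPPRM, SPAWN, the MA780 global sections, and the VMS event-flag clusters, not the report's actual mechanisms.

```c
#define _DEFAULT_SOURCE
#include <semaphore.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

#define NPROC 2
#define NPTS  1024

/* Layout of the "global section": shared flow variables plus a
 * counting semaphore standing in for a common-event-flag cluster. */
struct global_section {
    sem_t  done;        /* posted by each worker when its sweep ends */
    double q[NPTS];     /* shared solution vector */
};

int main(void)
{
    /* Map one region visible to all processes (cf. MAPPRM). */
    struct global_section *g = mmap(NULL, sizeof *g,
                                    PROT_READ | PROT_WRITE,
                                    MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    sem_init(&g->done, /*pshared=*/1, 0);

    for (int p = 0; p < NPROC; p++) {          /* cf. SPAWN */
        if (fork() == 0) {
            /* Each subprocess updates its own slice of q. */
            for (int i = p * NPTS / NPROC; i < (p + 1) * NPTS / NPROC; i++)
                g->q[i] = 1.0;
            sem_post(&g->done);                /* set completion flag */
            _exit(0);
        }
    }
    for (int p = 0; p < NPROC; p++)            /* wait on all flags */
        sem_wait(&g->done);
    while (wait(NULL) > 0) ;                   /* cf. DELPRM */
    printf("q[0] = %.1f, q[%d] = %.1f\n", g->q[0], NPTS - 1, g->q[NPTS - 1]);
    return 0;
}
```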
Figure 8.- Process/memory allocation (using data block names). [Diagram: the MAIN process runs on MERCURY and the SLAVE process on JUPITER. Each machine holds LOCAL data blocks (MAIN: GEO, READ, VAR3, COUNT, PLOT; both: VAR1, STEPUP, and VAR3 for BTRI, SUBRHS, and RHS), while the shared MA780 memory holds BASE, VIS, VARS, VARO, PPRCSS, TURMU, MUKIN, FS.]
Figure 9.- Speedup of the Pulliam-Steger AIR3D code using two processors to solve the Euler equations. [Plot: speedup (1.5 to 2.0) versus iterations (1 to 1000); curves for component timings, total CPU time, and real time.]
Figure 10.- Speedup of the Pulliam-Steger AIR3D code using two processors to solve the Navier-Stokes equation with a thin-layer approximation and an algebraic turbulence model. [Plot: speedup versus iterations (1 to 1000); curves for component timings, total CPU time, and real time.]
1.  Report No.: NASA TM-85945
2.  Government Accession No.:
3.  Recipient's Catalog No.:
4.  Title and Subtitle: STUDY OF THE MAPPING OF NAVIER-STOKES ALGORITHMS ONTO MULTIPLE-INSTRUCTION/MULTIPLE-DATA-STREAM COMPUTERS
5.  Report Date: July 1984
6.  Performing Organization Code:
7.  Author(s): D. Scott Eberhardt and Donald Baganoff (Stanford Univ., Stanford, CA), and Ken Stevens
8.  Performing Organization Report No.: A-9716
9.  Performing Organization Name and Address: Ames Research Center, Moffett Field, CA 94035
10. Work Unit No.:
11. Contract or Grant No.:
12. Sponsoring Agency Name and Address: National Aeronautics and Space Administration, Washington, DC 20546
13. Type of Report and Period Covered: Technical Memorandum
14. Sponsoring Agency Code: 505-37-01
15. Supplementary Notes: Point of Contact: Ken Stevens, Ames Research Center, MS 233-14, Moffett Field, CA 94035, (415) 965-5949 or FTS 448-5949
16. Abstract: Implicit approximate-factored algorithms have certain properties that are suitable for parallel processing. This study demonstrates how a particular computational fluid dynamics (CFD) code, using this algorithm, can be mapped onto a multiple-instruction/multiple-data-stream (MIMD) computer architecture. An explanation of this mapping procedure is presented, as well as some of the difficulties encountered when trying to run the code concurrently. Timing results are given for runs on the Ames Research Center's MIMD test facility, which consists of two VAX 11/780's with a common MA780 multi-ported memory. Speedups exceeding 1.9 for characteristic CFD runs were indicated by the timing results.
17. Key Words (Suggested by Author(s)): Navier-Stokes equations; MIMD; Concurrent processing; Computational fluid dynamics
18. Distribution Statement: Unclassified - Unlimited. Subject category 62
19. Security Classif. (of this report): Unclassified
20. Security Classif. (of this page): Unclassified
21. No. of Pages: 26
22. Price: A03

For sale by the National Technical Information Service, Springfield, Virginia 22161