Lattice Boltzmann methods on the way to Exascale

Ulrich Rüde, Algo Team, December 8, 2016


Outline

Motivation

Building Blocks for the Direct Simulation of Complex Flows
1. Supercomputing: scalable algorithms, efficient software
2. Solid phase - rigid body dynamics
3. Fluid phase - lattice Boltzmann method
4. Electrostatics - finite volume
5. Fast implicit solvers - multigrid
6. Gas phase - free surface tracking, volume of fluids

Multi-physics applications
Coupling
Examples
Perspectives


Multi-PetaFlops Supercomputers

Sunway TaihuLight (TOP 500: #1)
• SW26010 processor
• 10,649,600 cores, 260 cores (1.45 GHz) per node
• 32 GiB RAM per node
• 125 PFlops peak
• power consumption: 15.37 MW

JUQUEEN (TOP 500: #13)
• Blue Gene/Q architecture
• 458,752 PowerPC A2 cores, 16 cores (1.6 GHz) per node
• 16 GiB RAM per node
• 5D torus interconnect
• 5.8 PFlops peak

SuperMUC, phase 1 (TOP 500: #27)
• Intel Xeon architecture
• 147,456 cores, 16 cores (2.7 GHz) per node
• 32 GiB RAM per node
• pruned tree interconnect
• 3.2 PFlops peak


Building Block II: The Lagrangian View

Granular media simulations with the pe physics engine:
• 1,250,000 spherical particles
• 256 processors
• 300 300 time steps
• runtime: 48 h (including data output)
• texture mapping, ray tracing

Pöschel, T., & Schwager, T. (2005). Computational Granular Dynamics: Models and Algorithms. Springer Science & Business Media.


Lagrangian Particle Representation

Each single particle is described by
• state variables (position x, orientation φ, translational and angular velocity v and ω),
• a parameterization of its shape S (e.g. geometric primitive, composite object, or mesh),
• and its inertia properties (mass m, principal moments of inertia Ixx, Iyy and Izz).

The Newton-Euler equations of motion for rigid bodies describe the rate of change of the state variables:

$$\begin{pmatrix} \dot{x}(t) \\ \dot{\varphi}(t) \end{pmatrix} = \begin{pmatrix} v(t) \\ Q(\varphi(t))\,\omega(t) \end{pmatrix}, \qquad M(\varphi(t)) \begin{pmatrix} \dot{v}(t) \\ \dot{\omega}(t) \end{pmatrix} = \begin{pmatrix} f(s(t),t) \\ \tau(s(t),t) - \omega(t) \times I(\varphi(t))\,\omega(t) \end{pmatrix}$$

• Integrator of order one, similar to semi-implicit Euler.
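
To make the first-order time integration concrete, here is a minimal sketch of a semi-implicit Euler step for a single rigid body. It is not the pe engine's integrator; the quaternion orientation, the body-frame handling of a diagonal inertia tensor, and all names are simplifying assumptions for illustration.

```python
# Minimal sketch, assuming a single rigid body with diagonal (principal-axis)
# inertia and torque/angular velocity given in the body frame; NOT the pe
# engine's implementation, all names are illustrative.
import numpy as np

def quaternion_derivative(q, omega):
    """dq/dt = 0.5 * q (quaternion product) (0, omega), with q = (w, x, y, z)."""
    w, x, y, z = q
    ox, oy, oz = omega
    return 0.5 * np.array([
        -x * ox - y * oy - z * oz,
         w * ox + y * oz - z * oy,
         w * oy + z * ox - x * oz,
         w * oz + x * oy - y * ox,
    ])

def semi_implicit_euler_step(x, q, v, omega, m, I_body, f, tau, dt):
    """One first-order step: velocities are updated first, then position
    and orientation are advanced with the *new* velocities."""
    v_new = v + dt * f / m
    # Euler's equation with principal moments of inertia: I dw/dt = tau - w x (I w)
    omega_new = omega + dt * (tau - np.cross(omega, I_body * omega)) / I_body
    x_new = x + dt * v_new
    q_new = q + dt * quaternion_derivative(q, omega_new)
    q_new /= np.linalg.norm(q_new)   # keep the orientation quaternion unit length
    return x_new, q_new, v_new, omega_new

# toy usage: a sphere in free fall, spinning about its z-axis
state = semi_implicit_euler_step(
    x=np.zeros(3), q=np.array([1.0, 0, 0, 0]), v=np.zeros(3),
    omega=np.array([0, 0, 1.0]), m=1.0, I_body=np.array([0.4, 0.4, 0.4]),
    f=np.array([0, 0, -9.81]), tau=np.zeros(3), dt=1e-3)
```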




Contact Models

Hard contacts: an alternative to the discrete element method.

Hard contacts
• require impulses,
• exhibit non-differentiable but continuous trajectories,
• have contact reactions that are in general defined implicitly,
• can have non-unique solutions,
• and can be solved numerically by methods from two classes.

Fig.: Bouncing ball with a soft and a hard contact model.

⇒ measure differential inclusions

Moreau, J. J., & Panagiotopoulos, P. D. (1988). Nonsmooth Mechanics and Applications, vol. 302. Springer, Wien-New York.
Popa, C., Preclik, T., & UR (2014). Regularized solution of LCP problems with application to rigid body dynamics. Numerical Algorithms, 1-12.
Preclik, T., & UR (2015). Ultrascale simulations of non-smooth granular dynamics. Computational Particle Mechanics, DOI: 10.1007/s40571-015-0047-6.


Nonlinear Complementarity and Time-Stepping

Non-penetration conditions, formulated for continuous forces λ and, in the discretization underlying the time-stepping, for discrete impulses Λ:

• Signorini condition (position level): $0 \le \xi \perp \lambda_n \ge 0$
• velocity level (for closed contacts, $\xi = 0$): $0 \le \dot{\xi}^+ \perp \lambda_n \ge 0$, combined with the impact law relating the post-impact normal velocity $v_n^+(t)$ to the pre-impact velocity
• acceleration level (for persisting contacts, $\xi = \dot{\xi}^+ = 0$): $0 \le \ddot{\xi}^+ \perp \lambda_n \ge 0$

Coulomb friction conditions:

• friction cone condition: $\|\Lambda_{to}\|_2 \le \mu \Lambda_n$
• frictional reaction opposes slip: $\|v_{to}^+\|_2\,\Lambda_{to} = -\mu \Lambda_n\, v_{to}^+$ (analogously with forces $\lambda_{to}$ and tangential accelerations $\dot{v}_{to}^+$ in the continuous setting)

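
One class of solvers for these complementarity conditions relaxes the contacts iteratively, e.g. the subdomain nonlinear block Gauss-Seidel (NBGS) used in the parallelization described next. As a minimal illustration, and not the pe solver itself, the sketch below applies projected Gauss-Seidel to a frictionless normal-contact LCP; the matrix A (Delassus operator), the vector b of predicted relative normal velocities, and all names are illustrative assumptions.

```python
# Minimal sketch, not the pe solver: projected Gauss-Seidel for a frictionless
# normal-contact LCP  0 <= lam  ⊥  A lam + b >= 0, where A is the Delassus
# matrix and b holds predicted relative normal velocities.
import numpy as np

def projected_gauss_seidel(A, b, iterations=50):
    lam = np.zeros_like(b)
    for _ in range(iterations):
        for i in range(len(b)):
            r = b[i] + A[i] @ lam                    # residual of contact i
            lam[i] = max(0.0, lam[i] - r / A[i, i])  # relax and project onto lam_i >= 0
    return lam

# one-contact example: a body approaching a wall with normal velocity -1
print(projected_gauss_seidel(np.array([[1.0]]), np.array([-1.0])))  # -> [1.0]
```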


Parallel Computation

Key features of the parallelization:
• domain partitioning
• distribution of data
• contact detection
• synchronization protocol
• subdomain NBGS, accumulators and corrections
• aggressive message aggregation (sketched below)
• nearest-neighbor communication

Iglberger, K., & UR (2010). Massively parallel granular flow simulations with non-spherical particles. Computer Science - Research and Development, 25(1-2), 105-113.
Iglberger, K., & UR (2011). Large-scale rigid body simulations. Multibody System Dynamics, 25(1), 81-95.
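
A minimal sketch of the message aggregation idea from the list above (illustrative only; the pe engine does this in C++ with MPI, and all names here are assumptions): body updates destined for the same neighboring rank are packed into one buffer, so each neighbor receives a single message per time step instead of one message per body.

```python
# Aggressive message aggregation, conceptual sketch (not the pe implementation).
from collections import defaultdict
import pickle

def aggregate_updates(body_updates, rank_of_body):
    """body_updates: list of (body_id, state) pairs that cross subdomain borders.
    rank_of_body: maps a body to the rank owning its target subdomain."""
    buffers = defaultdict(list)
    for body_id, state in body_updates:
        buffers[rank_of_body(body_id)].append((body_id, state))
    # one serialized message per neighboring rank instead of one per body
    return {rank: pickle.dumps(items) for rank, items in buffers.items()}

# toy example: six bodies mapped to two neighboring ranks
msgs = aggregate_updates([(i, {"x": float(i)}) for i in range(6)],
                         rank_of_body=lambda i: i % 2)
print({rank: len(pickle.loads(buf)) for rank, buf in msgs.items()})  # {0: 3, 1: 3}
```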


Shaker scenario with sharp-edged hard objects

864,000 sharp-edged particles with a diameter between 0.25 mm and 2 mm.




PE marble run - rigid objects in complex geometry

Animation by Sebastian Eibl and Christian Godenschwager




Scaling Results

• The solver is algorithmically not optimal for dense systems and hence cannot scale unconditionally, but it is highly efficient in many cases of practical importance.
• Strong and weak scaling results for a constant number of iterations, performed on SuperMUC and JUQUEEN.
• Largest ensembles computed: 2.8 × 10^10 non-spherical particles and 1.1 × 10^10 contacts, four orders of magnitude more particles than in state-of-the-art implementations.
• Breakup of compute times on the RRZE cluster Emmy in Erlangen.

Fig.: Weak-scaling graphs of the granular gas on the Emmy cluster and on the JUQUEEN supercomputer; time-step profiles for executions with 5 × 2 × 2 = 20 processes on a single node and with 8 × 8 × 5 = 320 processes on 16 nodes.

Building Block III:

Scalable Flow Simulations with the Lattice Boltzmann Method

Lallemand, P., & Luo, L. S. (2000). Theory of the lattice Boltzmann method: Dispersion, dissipation, isotropy, Galilean invariance, and stability. Physical Review E, 61(6), 6546.
Feichtinger, C., Donath, S., Köstler, H., Götz, J., & Rüde, U. (2011). WaLBerla: HPC software design for computational engineering simulations. Journal of Computational Science, 2(2), 105-112.


The stream step: move PDFs into neighboring cells

• Non-local part: linear propagation to neighbors (stream step)
• Local part: non-linear operator (collide step)
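
A minimal sketch of this split into a local collide step and a non-local stream step, here on a D2Q9 lattice with a single-relaxation-time (BGK) collision and periodic boundaries. The production runs in this talk use waLBerla with D3Q19 and TRT, so the model choice and all names below are illustrative assumptions only.

```python
# One lattice Boltzmann time step on D2Q9 with BGK collision (not waLBerla).
import numpy as np

# D2Q9 velocities and weights
c = np.array([[0,0],[1,0],[0,1],[-1,0],[0,-1],[1,1],[-1,1],[-1,-1],[1,-1]])
w = np.array([4/9] + [1/9]*4 + [1/36]*4)

def equilibrium(rho, u):
    """Second-order Maxwellian equilibrium for all 9 directions."""
    cu = np.einsum('qd,dxy->qxy', c, u)       # c_q . u
    usq = np.einsum('dxy,dxy->xy', u, u)      # |u|^2
    return rho * w[:, None, None] * (1 + 3*cu + 4.5*cu**2 - 1.5*usq)

def lbm_step(f, omega=1.0):
    # collide: purely local, non-linear relaxation towards equilibrium
    rho = f.sum(axis=0)
    u = np.einsum('qd,qxy->dxy', c, f) / rho
    f_post = f - omega * (f - equilibrium(rho, u))
    # stream: purely non-local, shift each population along its velocity
    for q, (cx, cy) in enumerate(c):
        f_post[q] = np.roll(np.roll(f_post[q], cx, axis=0), cy, axis=1)
    return f_post

# tiny periodic example: a uniform fluid at rest stays at rest
f = equilibrium(np.ones((8, 8)), np.zeros((2, 8, 8)))
f = lbm_step(f)
```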

Performance on Coronary Arteries Geometry

Weak scaling:
• 458,752 cores of JUQUEEN
• over a trillion (10^12) fluid lattice cells at 1.27 µm cell size
• for comparison: red blood cells have a diameter of 7 µm
• 2.1 × 10^12 cell updates per second
• 0.41 PFlops
• Fig.: color-coded processor assignment

Godenschwager, C., Schornbaum, F., Bauer, M., Köstler, H., & UR (2013). A framework for hybrid parallel flow simulations with a trillion cells in complex geometries. In Proceedings of SC13: International Conference for High Performance Computing, Networking, Storage and Analysis (p. 35). ACM.

Strong scaling:
• 32,768 cores of SuperMUC
• cell sizes of 0.1 mm
• 2.1 million fluid cells
• 6000+ time steps per second

Single Node Performance

Fig.: Single-node performance on SuperMUC and JUQUEEN for standard, optimized, and vectorized kernel versions.

Pohl, T., Deserno, F., Thürey, N., UR, Lammers, P., Wellein, G., & Zeiser, T. (2004). Performance evaluation of parallel large-scale lattice Boltzmann applications on three supercomputing architectures. Proceedings of the 2004 ACM/IEEE Conference on Supercomputing (p. 21). IEEE Computer Society.
Donath, S., Iglberger, K., Wellein, G., Zeiser, T., Nitsure, A., & UR (2008). Performance comparison of different parallel lattice Boltzmann implementations on multi-core multi-socket systems. International Journal of Computational Science and Engineering, 4(1), 3-11.


Weak scaling for TRT lid-driven cavity on uniform grids

• JUQUEEN: 16 processes per node, 4 threads per process
• SuperMUC: 4 processes per node, 4 threads per process

Fig.: Cell updates per second (on the order of 10^12 lattice updates per second, TLUPS) for weak scaling on both machines.

Körner, C., Pohl, T., UR., Thürey, N., & Zeiser, T. (2006). Parallel lattice Boltzmann methods for CFD applications. In Numerical Solution of Partial Differential Equations on Parallel Computers (pp. 439-466). Springer Berlin Heidelberg.

Feichtinger, C., Habich, J., Köstler, H., UR, & Aoki, T. (2015). Performance modeling and analysis of heterogeneous lattice Boltzmann simulations on CPU–GPU clusters. Parallel Computing, 46, 1-13.


Automatic Generation of Efficient LBM Code

lbmpy: collision and propagation schemes expressed as equations with fields and neighbor accesses
• collision: moment-based (SRT, TRT, MRT), cumulant, (entropic)
• propagation: source/destination, EsoTwist, AABB
• specific transformations: specific common subexpression elimination, loop splitting, input/output of macroscopic values

pystencils: turns these equations into an abstract syntax tree (Kernel, Loop, Assign, Condition, array accesses, Add, Mul, ...)
• transformations: loop splitting, blocking, moving constants out of loops, ...
• backends: C(++), LLVM, CUDA, Python JIT
• generates functions and C/C++ code for waLBerla and other C/C++ frameworks (see the conceptual sketch below)

Bauer, M., Schornbaum, F., Godenschwager, C., Markl, M., Anderl, D., Köstler, H., & Rüde, U. (2015). A Python extension for the massively parallel multiphysics simulation framework waLBerla. International Journal of Parallel, Emergent and Distributed Systems.
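
The following is only a conceptual sketch of this pipeline, not the actual lbmpy/pystencils API: a symbolic collision rule is written with sympy, a common-subexpression-elimination "transformation" is applied, and a C-like kernel body is emitted by a "backend". The D1Q3 model, all symbol names, and the printing format are illustrative assumptions.

```python
# Symbolic update rule -> transformation (CSE) -> generated C-like kernel body,
# sketched with plain sympy (NOT the lbmpy/pystencils API).
import sympy as sp

# symbolic PDFs, macroscopic quantities and relaxation rate (SRT collision)
f0, f1, f2, rho, u, omega = sp.symbols('f0 f1 f2 rho u omega')
w = [sp.Rational(2, 3), sp.Rational(1, 6), sp.Rational(1, 6)]   # D1Q3 weights
c = [0, 1, -1]                                                  # D1Q3 velocities

# collision rule: f_q' = f_q - omega * (f_q - feq_q)
feq = [w[q] * rho * (1 + 3 * c[q] * u + sp.Rational(9, 2) * (c[q] * u)**2
                     - sp.Rational(3, 2) * u**2) for q in range(3)]
update = [fq - omega * (fq - feq_q) for fq, feq_q in zip([f0, f1, f2], feq)]

# "transformation" stage: eliminate common subexpressions
subexprs, reduced = sp.cse(update, symbols=sp.numbered_symbols('t'))

# "backend" stage: emit C assignments for the kernel body
for name, expr in subexprs:
    print(f'const double {name} = {sp.ccode(expr)};')
for q, expr in enumerate(reduced):
    print(f'f_dst{q} = {sp.ccode(expr)};')
```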


Automatic Generation of Efficient LBM Code Measured performance improvements


Partitioning and Parallelization

• static block-level refinement (→ forest of octrees)
• static load balancing
• compact (KiB/MiB) binary MPI IO to and from disk
• separation of domain partitioning from simulation (optional)
• allocation of block data (→ grids)

Flow through structure of thin crystals (filter)

work with Jose Pedro Galache and Antonio Gil, CMT-Motores Termicos, Universitat Politecnica de Valencia




Parallel AMR load balancing

2:1 balanced grid (used for the LBM)

Different views on domain partitioning:
• distributed graph: nodes = blocks, edges explicitly stored as <block ID, process rank> pairs
• forest of octrees: the octrees are not explicitly stored, but implicitly defined via the block IDs (see the sketch below)
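
A minimal sketch of how block IDs can define an octree implicitly (an illustrative assumption, not necessarily waLBerla's exact encoding): each refinement level appends three bits that select one of the eight children, so parent, children, and refinement level follow from pure bit arithmetic and no explicit tree is stored.

```python
# Implicit octree via block IDs: a root marker bit plus 3 bits per level.

def child_id(block_id: int, child: int) -> int:
    """ID of child 0..7 of the given block."""
    assert 0 <= child < 8
    return (block_id << 3) | child

def parent_id(block_id: int) -> int:
    """ID of the parent block (undefined for a root block)."""
    return block_id >> 3

def level(block_id: int) -> int:
    """Refinement level: number of 3-bit child choices below the root marker."""
    return (block_id.bit_length() - 1) // 3

root = 0b1                            # root block of one octree (marker bit only)
b = child_id(child_id(root, 5), 2)    # refine twice: child 5, then child 2
print(level(b), bin(b), bin(parent_id(b)))   # -> 2 0b1101010 0b1101
```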


AMR and Load Balancing with waLBerla

Isaac, T., Burstedde, C., Wilcox, L. C., & Ghattas, O. (2015). Recursive algorithms for distributed forests of octrees. SIAM Journal on Scientific Computing, 37(5), C497-C531.
Meyerhenke, H., Monien, B., & Sauerwald, T. (2009). A new diffusion-based multilevel algorithm for computing graph partitions. Journal of Parallel and Distributed Computing, 69(9), 750-761.
Schornbaum, F., & Rüde, U. (2016). Massively parallel algorithms for the lattice Boltzmann method on nonuniform grids. SIAM Journal on Scientific Computing, 38(2), C96-C126.


LBM AMR Performance

• Benchmark environments:
  • JUQUEEN (5.0 PFLOP/s): Blue Gene/Q, 459K cores, 1 GB/core; compiler: IBM XL / IBM MPI
  • SuperMUC (2.9 PFLOP/s): Intel Xeon, 147K cores, 2 GB/core; compiler: Intel XE / IBM MPI

• Benchmark (LBM D3Q19 TRT), avg. blocks/process (max. blocks/proc.):

level | initially  | after refresh | after load balance
0     | 0.383 (1)  | 0.328 (1)     | 0.328 (1)
1     | 0.656 (1)  | 0.875 (9)     | 0.875 (1)
2     | 1.313 (2)  | 3.063 (11)    | 3.063 (4)
3     | 3.500 (4)  | 3.500 (16)    | 3.500 (4)



LBM AMR Performance (benchmark LBM D3Q19 TRT, environments as on the previous slide)

During this refresh process:
• all cells on the finest level are coarsened,
• and the same amount of fine cells is created by splitting coarser cells
• → 72% of all cells change their size


LBM AMR Performance

• JUQUEEN, space filling curve: Morton
• hybrid MPI+OpenMP version with SMP: 1 process ⇔ 2 cores ⇔ 8 threads

Fig.: Time in seconds vs. number of cores (256 to 458,752) for three problem sizes of 31,062 / 127,232 / 429,408 cells per core, reaching 14 / 58 / 197 billion cells on the full machine.


LBM AMR Performance

• JUQUEEN, diffusion load balancing (see the sketch below)
• time almost independent of the number of processes!

Fig.: Time in seconds vs. number of cores (256 to 458,752) for 31,062 / 127,232 / 429,408 cells per core, reaching 14 / 58 / 197 billion cells on the full machine.
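
A minimal sketch of first-order diffusive load balancing in the spirit of the diffusion-based algorithms cited earlier (Meyerhenke et al. 2009), not waLBerla's implementation: each process repeatedly shifts a fraction of the load difference to each neighbor, so imbalances are smoothed out like heat. The function names and the ring topology in the example are illustrative assumptions.

```python
# First-order diffusion scheme for load balancing (conceptual sketch).

def diffuse_loads(load, neighbors, alpha=0.25, sweeps=50):
    """load: per-process work; neighbors: adjacency list of the process graph."""
    load = list(load)
    for _ in range(sweeps):
        flow = [0.0] * len(load)
        for p, nbrs in enumerate(neighbors):
            for q in nbrs:
                # send alpha * (surplus relative to neighbor q); negative = receive
                flow[p] -= alpha * (load[p] - load[q])
        load = [l + f for l, f in zip(load, flow)]
    return load

# 4 processes in a ring with a strongly imbalanced initial load
neighbors = [[1, 3], [0, 2], [1, 3], [2, 0]]
print(diffuse_loads([100.0, 0.0, 0.0, 0.0], neighbors))  # -> roughly [25, 25, 25, 25]
```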


Multi-Physics Simulations for Particulate Flows

Ladd, A. J. (1994). Numerical simulations of particulate suspensions via a discretized Boltzmann equation. Part 1. Theoretical foundation. Journal of Fluid Mechanics, 271(1), 285-309.

Parallel Coupling with waLBerla and PE



Tenneti, S., & Subramaniam, S. (2014). Particle-resolved direct numerical simulation for gas-solid flow model development. Annual Review of Fluid Mechanics, 46, 199-230. Bartuschat, D., Fischermeier, E., Gustavsson, K., & UR (2016). Two computational models for simulating the tumbling motion of elongated particles in fluids. Computers & Fluids, 127, 17-35.


Fluid-Structure Interaction

direct simulation of Particle Laden Flows (4-way coupling)

Götz, J., Iglberger, K., Stürmer, M., & UR (2010). Direct numerical simulation of particulate flows on 294912 processor cores. In Proceedings of Supercomputing 2010, IEEE Computer Society. Götz, J., Iglberger, K., Feichtinger, C., Donath, S., & UR (2010). Coupling multibody dynamics and computational fluid dynamics on 8192 processor cores. Parallel Computing, 36(2), 142-151.


Simulation of suspended particle transport


Building Block IV (electrostatics)

Positively and negatively charged particles in a flow subjected to a transversal electric field

Direct numerical simulation of charged particles in flow

Masilamani, K., Ganguly, S., Feichtinger, C., & UR (2011). Hybrid lattice Boltzmann and finite-difference simulation of electroosmotic flow in a microchannel. Fluid Dynamics Research, 43(2), 025501.
Bartuschat, D., Ritter, D., & UR (2012). Parallel multigrid for electrokinetic simulation in particle-fluid flows. In High Performance Computing and Simulation (HPCS), 2012 International Conference on (pp. 374-380). IEEE.
Bartuschat, D., & UR (2015). Parallel multiphysics simulations of charged particles in microfluidic flows. Journal of Computational Science, 8, 1-19.


6-way coupling

Coupling cycle (sketched as a time loop below):
• finite volume multigrid: treat BCs, V-cycle iterations → electrostatic force, computed from the charge distribution
• LBM: treat BCs, stream-collide step → hydrodynamic force, computed from the velocity BCs
• lubrication correction: correction force, computed from the object distance
• Newtonian mechanics: collision response → object motion, which in turn updates charge distribution, velocity BCs, and object distances
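
Read as a time loop, the cycle above can be sketched as follows. All functions are hypothetical stand-ins (stubs); only the ordering of the stages mirrors the slide, and the real implementation lives in waLBerla and pe.

```python
# Skeleton of the 6-way coupled time loop (conceptual sketch, stub functions only).

def solve_electrostatics(charges):        # finite volume MG: treat BCs + V-cycles
    return [0.0 for _ in charges]          # electrostatic force per particle

def lbm_step(velocity_bcs):               # LBM: treat BCs + stream-collide step
    return [0.0 for _ in velocity_bcs]     # hydrodynamic force per particle

def lubrication_correction(distances):    # short-range correction force
    return [1.0 / max(d, 1e-3) for d in distances]

def rigid_body_update(forces):            # pe: collision response + object motion
    n = len(forces)
    return {"velocity_bcs": [0.0] * n, "charges": [0.0] * n, "distances": [1.0] * n}

state = {"velocity_bcs": [0.0, 0.0], "charges": [1.0, -1.0], "distances": [1.0, 0.5]}
for step in range(240):                    # e.g. 240 time steps, as in the experiment below
    f_el = solve_electrostatics(state["charges"])
    f_hyd = lbm_step(state["velocity_bcs"])
    f_lub = lubrication_correction(state["distances"])
    total = [a + b + c for a, b, c in zip(f_el, f_hyd, f_lub)]
    state = rigid_body_update(total)
```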

Separation experiment

• 240 time steps of the fully 6-way coupled simulation
• 400 sec runtime on SuperMUC
• weak scaling up to 32,768 cores
• 7.1 million particles

The number of CG iterations for the coarsest grid problem is examined for different problem sizes. When doubling the domain in all three dimensions, the number of CG iterations approximately doubles. This corresponds to the expected behaviour that the required number of iterations scales with the diameter of the problem size (Gmeiner et al., 2014), according to the growth in the condition number (Shewchuk, 1994). However, when doubling the problem size, the CG iterations sometimes stay constant or have to be increased. This results from different shares of Neumann and Dirichlet BCs on the boundary: whenever the relative proportion of Neumann BCs increases, convergence deteriorates and more CG iterations are necessary.

The runtimes of all parts of the algorithm are shown in Fig. 13 for different problem sizes, indicating their shares of the total runtime. This diagram is based on the maximal (for MG, LBM, pe) or average (others) runtimes of the different sweeps among all processes. The upper part of the diagram shows the cost of the fluid simulation. The sweeps that scale perfectly (HydrF, LubrC, Map, SetRHS, and ElectF) are summarized as 'Oth'. Overall, LBM and MG take up more than 75% of the total time. For longer simulation times the particles attracted by the bottom wall are no longer evenly distributed, possibly causing load imbalances; however, they hardly affect the overall performance. For the simulation for the animation, the relative share of the lubrication correction is below 0.1%, and each other sweep of 'Oth' is well below 4% of the total runtime.

Overall, the coupled multiphysics algorithm achieves 83% parallel efficiency on 2048 nodes. Since most time is spent to execute LBM and MG, Fig. 14 displays their weak scaling performance for different numbers of nodes. On 2048 nodes, MG executes 121,083 MLUPS, corresponding to a parallel efficiency of 64%; the LBM performs 95,372 MFLUPS with 91% parallel efficiency.

Figure 13: Runtimes of the charged particle algorithm sweeps for 240 time steps on an increasing number of nodes.
Figure 14: Weak scaling performance of the MG and LBM sweeps for 240 time steps.
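
The electrostatic potential is obtained from a finite volume discretization solved by multigrid V-cycles, with CG on the coarsest grid as discussed above. Below is a minimal sketch of a geometric multigrid V-cycle for a 1D Poisson problem; it is illustrative only and not the parallel waLBerla solver, and the grid size, smoother, and transfer operators are simplifying assumptions.

```python
# Geometric multigrid V-cycle for -u'' = f on [0,1] with Dirichlet BCs (sketch).
import numpy as np

def jacobi(u, f, h, sweeps=2, omega=0.8):
    for _ in range(sweeps):
        u[1:-1] += omega * 0.5 * (u[:-2] + u[2:] - 2 * u[1:-1] + h * h * f[1:-1])
    return u

def v_cycle(u, f, h):
    if len(u) <= 3:                      # coarsest grid: solve (essentially) exactly
        return jacobi(u, f, h, sweeps=50)
    u = jacobi(u, f, h)                  # pre-smoothing
    r = np.zeros_like(u)                 # residual r = f - A u
    r[1:-1] = f[1:-1] - (2 * u[1:-1] - u[:-2] - u[2:]) / (h * h)
    rc = r[::2].copy()                   # restriction (injection) to the coarse grid
    ec = v_cycle(np.zeros_like(rc), rc, 2 * h)
    e = np.zeros_like(u)                 # prolongation (linear interpolation)
    e[::2] = ec
    e[1:-1:2] = 0.5 * (ec[:-1] + ec[1:])
    u += e                               # coarse-grid correction
    return jacobi(u, f, h)               # post-smoothing

n = 129                                  # 2^7 + 1 grid points
h = 1.0 / (n - 1)
f = np.ones(n)                           # -u'' = 1, u(0) = u(1) = 0
u = np.zeros(n)
for _ in range(10):
    u = v_cycle(u, f, h)
print(u[n // 2])                         # ~0.125, the value of x(1-x)/2 at x = 0.5
```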

Building Block V

Volume of Fluids Method for Free Surface Flows

joint work with Regina Ammer, Simon Bogner, Martin Bauer, Daniela Anderl, Nils Thürey, Stefan Donath, Thomas Pohl, C. Körner, A. Delgado

Körner, C., Thies, M., Hofmann, T., Thürey, N., & UR (2005). Lattice Boltzmann model for free surface flow for modeling foaming. Journal of Statistical Physics, 121(1-2), 179-196.
Donath, S., Feichtinger, C., Pohl, T., Götz, J., & UR (2010). A parallel free surface lattice Boltzmann method for large-scale applications. Parallel Computational Fluid Dynamics: Recent Advances and Future Directions, 318.
Anderl, D., Bauer, M., Rauh, C., UR, & Delgado, A. (2014). Numerical simulation of adsorption and bubble interaction in protein foams using a lattice Boltzmann method. Food & Function, 5(4), 755-763.


Free Surface Flows

• Volume-of-fluids-like approach
• Flag field: compute only in fluid cells (see the sketch below)
• Special "free surface" conditions in interface cells
• Reconstruction of curvature for surface tension
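
A minimal sketch of the flag-field bookkeeping behind this volume-of-fluids-like approach (illustrative only, not waLBerla's free surface implementation): cells are gas, interface, or fluid, a fill level is tracked in interface cells, and cells convert when they fill up or empty, keeping a closed interface layer between fluid and gas.

```python
# Flag field and fill-level update for a free surface LBM (conceptual sketch).
from enum import Enum

class Flag(Enum):
    GAS = 0
    INTERFACE = 1
    FLUID = 2

def update_cell(flag, fill, mass_change):
    """fill in [0,1] is the liquid fraction; mass_change is the net mass
    exchanged with neighbours during this time step (illustrative)."""
    if flag is not Flag.INTERFACE:
        return flag, fill          # LBM is computed only in fluid/interface cells
    fill += mass_change
    if fill >= 1.0:                # interface cell filled up -> becomes fluid;
        return Flag.FLUID, 1.0     # neighbouring gas cells must become interface
    if fill <= 0.0:                # interface cell emptied -> becomes gas;
        return Flag.GAS, 0.0       # neighbouring fluid cells must become interface
    return Flag.INTERFACE, fill

print(update_cell(Flag.INTERFACE, 0.95, 0.1))   # -> (Flag.FLUID, 1.0)
```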


Simulation for hygiene products (for Procter & Gamble)

Relevant effects: capillary pressure, inclination, surface tension, contact angle

Additive Manufacturing: Fast Electron Beam Melting

Ammer, R., Markl, M., Ljungblad, U., Körner, C., & UR (2014). Simulating fast electron beam melting with a parallel thermal free surface lattice Boltzmann method. Computers & Mathematics with Applications, 67(2), 318-330.
Ammer, R., UR, Markl, M., Jüchter, V., & Körner, C. (2014). Validation experiments for LBM simulations of electron beam melting. International Journal of Modern Physics C.


Electron Beam Melting Process (3D printing)

EU project FastEBM: ARCAM (Gothenburg), TWI (Cambridge), FAU Erlangen

• Generation of powder bed
• Energy transfer by electron beam: penetration depth, heat transfer
• Flow dynamics: melting, melt flow, surface tension, wetting, capillary forces, contact angles, solidification



Simulation of Electron Beam Melting

A high-speed camera shows the melting step for manufacturing a hollow cylinder

Simulating powder bed generation using the PE framework

WaLBerla Simulation


Conclusions


Research in Computational Science is done by teams

Harald Köstler

Florian Schornbaum

Christian Godenschwager

Sebastian Kuckuk

Kristina Pickl

Regina Ammer

Simon Bogner

Christoph Rettinger

Dominik Bartuschat

Martin Bauer


Ehsan Fattahi

Christian Kuschel


Thank you for your attention!

Thürey, N., Keiser, R., Pauly, M., & Rüde, U. (2009). Detail-preserving fluid control. Graphical Models, 71(6), 221-228.
Thürey, N., & UR (2009). Stable free surface flows with the lattice Boltzmann method on adaptively coarsened grids. Computing and Visualization in Science, 12(5), 247-263.

