Numerical Simulation of Multiphase Flows at the Extreme Scale
Ulrich Rüde, LSS Erlangen and CERFACS Toulouse, [email protected]
Symposium: Fluid Dynamics Modelling for Real World Applications, June 28, 2017
Centre Européen de Recherche et de Formation Avancée en Calcul Scientifique, www.cerfacs.fr

Lehrstuhl für Simulation, Universität Erlangen-Nürnberg, www10.informatik.uni-erlangen.de

On the agenda today …
Continuum models: finite elements and implicit solvers (geophysics)
Particle based methods: rigid body dynamics (granular systems)
Mesoscopic methods: lattice Boltzmann methods (complex flows)
Perspectives: towards predictive science, computational science and engineering

Building Block I:
Current and Future High Performance Supercomputers
[Photo: SuperMUC, 3 PFlops]

Multi-PetaFlops Supercomputers
Sunway TaihuLight: SW26010 processor, 10,649,600 cores, 260 cores (1.45 GHz) per node, 32 GiB RAM per node, 125 PFlops peak, power consumption 15.37 MW, TOP 500 #1.
JUQUEEN: Blue Gene/Q architecture, 458,752 PowerPC A2 cores, 16 cores (1.6 GHz) per node, 16 GiB RAM per node, 5D torus interconnect, 5.8 PFlops peak, TOP 500 #13.
SuperMUC (phase 1): Intel Xeon architecture, 147,456 cores, 16 cores (2.7 GHz) per node, 32 GiB RAM per node, pruned tree interconnect, 3.2 PFlops peak, TOP 500 #27.

What is the problem? Designing algorithms for large scale simulation software!
Would you want to propel a superjumbo with four strong jet engines (moderately parallel computing) or with 1,000,000 blow dryer fans (massively parallel multi-core systems)?

Building Block II:
Fast Implicit Solvers: Parallel Multigrid Methods for Earth Mantle Convection
Gmeiner, B., Rüde, U., Stengel, H., Waluga, C., & Wohlmuth, B. (2015). Performance and scalability of hierarchical hybrid multigrid solvers for Stokes systems. SIAM Journal on Scientific Computing, 37(2), C143-C168.
Gmeiner, B., Rüde, U., Stengel, H., Waluga, C., & Wohlmuth, B. (2015). Towards textbook efficiency for parallel multigrid. Numerical Mathematics: Theory, Methods and Applications, 8(1), 22-46.
Huber, M., John, L., Pustejovska, P., Rüde, U., Waluga, C., & Wohlmuth, B. (2015). Solution techniques for the Stokes system: A priori and a posteriori modifications, resilient algorithms. ICIAM 2015.

Simulation of Earth Mantle Convection (TERRA NEO)
[Figure: stationary flow field]

Exploring the Limits
Gmeiner, B., Huber, M., John, L., UR, & Wohlmuth, B. (2016). A quantitative performance study for Stokes solvers at the extreme scale. Journal of Computational Science, in print.
Multigrid with Uzawa smoother, optimized for minimal memory consumption.
10^13 unknowns correspond to 80 TByte for the solution vector; Juqueen has 450 TByte of memory, so a matrix-free implementation is essential.
Weak scaling on Juqueen for the Stokes system in the Laplace-operator formulation on a spherical shell geometry: the initial mesh consists of 240 tetrahedra for the case of 5 nodes and 80 threads, and the number of degrees of freedom on the coarse grid grows from 9.0 · 10^3 to 4.1 · 10^7 during the weak scaling. The relative accuracies for the coarse grid solvers (PMINRES and CG) are set to 10^-3 and 10^-4, respectively.

nodes  | threads | DoFs        | iter | time [s] | time w/o c.g. [s] | coarse grid share
5      | 80      | 2.7 · 10^9  | 10   | 685.88   | 678.77            | 1.04 %
40     | 640     | 2.1 · 10^10 | 10   | 703.69   | 686.24            | 2.48 %
320    | 5 120   | 1.7 · 10^11 | 10   | 741.86   | 709.88            | 4.31 %
2 560  | 40 960  | 1.4 · 10^12 | 9    | 720.24   | 671.63            | 6.75 %
20 480 | 327 680 | 1.1 · 10^13 | 9    | 776.09   | 681.91            | 12.14 %

Table: Weak scaling results with and without coarse grid for the spherical shell geometry.
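A quick back-of-the-envelope check (assuming 8-byte double precision values, an assumption not stated on the slide) shows why the matrix-free implementation is essential at this scale:
\[
10^{13}\ \text{unknowns} \times 8\ \tfrac{\text{bytes}}{\text{unknown}} = 8 \times 10^{13}\ \text{bytes} \approx 80\ \text{TByte}
\]
for a single solution vector alone, already a large fraction of Juqueen's 450 TByte of aggregate memory; an explicitly assembled sparse matrix, with dozens of nonzeros per row, would not fit at all.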

Multigrid: The Algorithm for 10^13 Unknowns
Goal: solve A_h u_h = f_h using a hierarchy of grids.
Relax on A_h u_h = f_h
Residual: r_h = f_h - A_h u_h
Restrict: r_H = I_h^H r_h
Solve A_H e_H = r_H by recursion
Interpolate: e_h = I_H^h e_H
Correct: u_h ← u_h + e_h
Multigrid uses coarse grids to accomplish the global data exchange in the most efficient way possible.
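To make the recursion explicit, here is a minimal sketch of such a V-cycle for a 1D Poisson model problem in Python. It only illustrates the relax / restrict / solve-by-recursion / interpolate / correct structure; it is not the HHG/TerraNeo implementation, and all function names are made up for this sketch.

```python
import numpy as np

def smooth(u, f, h, sweeps=2):
    """Gauss-Seidel relaxation for -u'' = f with homogeneous Dirichlet BCs."""
    for _ in range(sweeps):
        for i in range(1, len(u) - 1):
            u[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])
    return u

def residual(u, f, h):
    r = np.zeros_like(u)
    r[1:-1] = f[1:-1] - (2 * u[1:-1] - u[:-2] - u[2:]) / (h * h)
    return r

def restrict(r):
    """Full weighting from a fine grid (2n+1 points) to a coarse grid (n+1 points)."""
    rc = np.zeros((len(r) - 1) // 2 + 1)
    rc[1:-1] = 0.25 * r[1:-2:2] + 0.5 * r[2:-1:2] + 0.25 * r[3::2]
    return rc

def prolongate(ec, n_fine):
    """Linear interpolation of the coarse-grid correction back to the fine grid."""
    e = np.zeros(n_fine)
    e[::2] = ec
    e[1::2] = 0.5 * (ec[:-1] + ec[1:])
    return e

def v_cycle(u, f, h, levels):
    if levels == 1:                                 # coarsest grid: "solve"
        return smooth(u, f, h, sweeps=50)
    u = smooth(u, f, h)                             # pre-smoothing (relax)
    rc = restrict(residual(u, f, h))                # residual and restriction
    ec = v_cycle(np.zeros_like(rc), rc, 2 * h, levels - 1)  # coarse-grid solve by recursion
    u += prolongate(ec, len(u))                     # interpolate and correct
    return smooth(u, f, h)                          # post-smoothing

# usage: -u'' = pi^2 sin(pi x) on (0, 1), exact solution u = sin(pi x)
n = 2 ** 7 + 1
h = 1.0 / (n - 1)
x = np.linspace(0.0, 1.0, n)
f = np.pi ** 2 * np.sin(np.pi * x)
u = np.zeros(n)
for _ in range(10):
    u = v_cycle(u, f, h, levels=6)
print("max error:", np.max(np.abs(u - np.sin(np.pi * x))))
```

Because the coarse-grid correction is computed on ever smaller grids, each V-cycle exchanges global information at a cost proportional to the fine-grid work, which is the point made on the slide.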




Hierarchical Hybrid Grids (HHG)
Parallelize "plain vanilla" multigrid for tetrahedral finite elements:
partition the domain
parallelize all operations on all grids
use clever data structures
matrix-free implementation
Do not worry (so much) about coarse grids: idle processors? short messages? sequential dependency in the grid hierarchy?
Elliptic problems always require global communication. This cannot be accomplished by local relaxation, Krylov space acceleration, or domain decomposition without a coarse grid.
[Figure: Bey's tetrahedral refinement]
Gholami, A., Malhotra, D., Sundar, H., & Biros, G. (2016). FFT, FMM, or multigrid? A comparative study of state-of-the-art Poisson solvers for uniform and nonuniform grids in the unit cube. SIAM Journal on Scientific Computing, 38(3), C280-C306.
Bergen, B., Hülsemann, F., Rüde, U., & Wellein, G. (2005). "Is 1.7 × 10^10 unknowns the largest finite element system that can be solved today?" In Proceedings of SuperComputing 2005 (ISC Award 2006).
Gmeiner, B., UR, Stengel, H., Waluga, C., & Wohlmuth, B. (2015). Towards textbook efficiency for parallel multigrid. Numerical Mathematics: Theory, Methods and Applications.

Building Block II:
Granular Media Simulations with the pe Physics Engine
[Photo: hiking in the Silvretta mountains, Aug. 2016]
Pöschel, T., & Schwager, T. (2005). Computational Granular Dynamics: Models and Algorithms. Springer Science & Business Media.

Lagrangian Particle Representation
A single particle is described by:
state variables (position x, orientation φ, translational and angular velocities v and ω),
a parameterization of its shape S (e.g. geometric primitive, composite object, or mesh),
and its inertia properties (mass m, principal moments of inertia Ixx, Iyy, and Izz).

Newton-Euler Equations for Rigid Bodies
The Newton-Euler equations of motion describe the rate of change of the state variables:
\[
\begin{pmatrix} \dot{x}(t) \\ \dot{\varphi}(t) \end{pmatrix}
=
\begin{pmatrix} v(t) \\ Q(\varphi(t))\,\omega(t) \end{pmatrix},
\qquad
M(\varphi(t))
\begin{pmatrix} \dot{v}(t) \\ \dot{\omega}(t) \end{pmatrix}
=
\begin{pmatrix} f(s(t),t) \\ \tau(s(t),t) - \omega(t) \times I(\varphi(t))\,\omega(t) \end{pmatrix}
\]
The time integrator is of order one, similar to the semi-implicit Euler method.
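The following is a minimal sketch of such a first-order, semi-implicit Euler update for a single unconstrained rigid body with quaternion orientation (contacts and constraints omitted); the function names are illustrative and are not the pe API.

```python
import numpy as np

def rotation_matrix(q):
    """Rotation matrix of a unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)]])

def quat_mul(q, p):
    """Hamilton product of two quaternions."""
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = p
    return np.array([w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
                     w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
                     w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
                     w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2])

def step(x, q, v, w, m, I_body, f, tau, dt):
    """One semi-implicit Euler step of the Newton-Euler equations.

    x: position, q: orientation quaternion, v: linear velocity,
    w: angular velocity (world frame), I_body: body-frame inertia tensor.
    """
    R = rotation_matrix(q)
    I_world = R @ I_body @ R.T
    # update the velocities first, including the gyroscopic term w x (I w)
    v = v + dt * f / m
    w = w + dt * np.linalg.solve(I_world, tau - np.cross(w, I_world @ w))
    # then move position and orientation with the *new* velocities
    x = x + dt * v
    q = q + dt * 0.5 * quat_mul(np.concatenate(([0.0], w)), q)
    return x, q / np.linalg.norm(q), v, w

# usage: torque-free tumbling of a box-shaped body
x, q = np.zeros(3), np.array([1.0, 0.0, 0.0, 0.0])
v, w = np.zeros(3), np.array([0.1, 2.0, 0.1])
for _ in range(1000):
    x, q, v, w = step(x, q, v, w, m=1.0, I_body=np.diag([1.0, 2.0, 3.0]),
                      f=np.zeros(3), tau=np.zeros(3), dt=1e-3)
```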

Discretization Underlying the Time-Stepping

Nonlinear complementarity: measure differential inclusions, with the contact reactions formulated both as discrete impulses (Λ) and as continuous forces (λ).

Non-penetration conditions:
\[
0 \le \xi \;\perp\; \lambda_n \ge 0 \quad \text{(Signorini condition)}, \qquad
\xi = 0:\;\; 0 \le \dot{\xi}^{+} \;\perp\; \Lambda_n \ge 0 \quad \text{(impact law)}, \qquad
\xi = 0,\ \dot{\xi}^{+} = 0:\;\; 0 \le \ddot{\xi}^{+} \;\perp\; \lambda_n \ge 0.
\]

Coulomb friction conditions:
\[
\|\Lambda_{to}\|_2 \le \mu\,\Lambda_n \quad \text{(friction cone condition)}, \qquad
\|v_{to}^{+}\|_2\,\Lambda_{to} = -\mu\,\Lambda_n\,v_{to}^{+} \quad \text{(frictional reaction opposes slip)},
\]
with the analogous conditions on the force level for λ_n and λ_to.
Moreau, J. J., & Panagiotopoulos, P. D. (1988). Nonsmooth Mechanics and Applications, vol. 302. Springer, Wien-New York.
Popa, C., Preclik, T., & UR (2014). Regularized solution of LCP problems with application to rigid body dynamics. Numerical Algorithms, 1-12.
Preclik, T., & UR (2015). Ultrascale simulations of non-smooth granular dynamics. Computational Particle Mechanics, 2(2), 173-196.
Preclik, T., Eibl, S., & UR (2017). The maximum dissipation principle in rigid-body dynamics with purely inelastic impacts. arXiv preprint arXiv:1706.00221.
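As a toy illustration of how a velocity-level complementarity condition enters a time stepper, consider a single ball above a floor with a single frictionless, purely inelastic contact. This scalar example only illustrates the Signorini/impact-law logic; it has nothing to do with the pe solver.

```python
def contact_step(z, v, dt, g=9.81, m=1.0):
    """One time step of a ball above a floor at z = 0 with inelastic contact.

    Velocity-level complementarity: if the gap would close, apply exactly the
    normal impulse lam >= 0 that stops penetration (0 <= v+  perp  lam >= 0);
    otherwise lam = 0 and the body moves freely (0 <= z  perp  lam >= 0).
    """
    v_free = v - g * dt                  # velocity after free flight
    if z + dt * v_free < 0.0:            # the contact would be violated
        lam = -m * v_free                # impulse such that the post-impact velocity is 0
        v_new = 0.0
    else:
        lam = 0.0
        v_new = v_free
    z_new = max(0.0, z + dt * v_new)
    return z_new, v_new, lam

# usage: drop a ball from 1 m; it ends up resting on the floor
z, v = 1.0, 0.0
for _ in range(200):
    z, v, lam = contact_step(z, v, dt=0.01)
print(round(z, 3), round(v, 3))          # -> 0.0 0.0 (at rest, carried by the contact impulse)
```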


Shaker scenario with sharp-edged hard objects
864,000 sharp-edged particles with diameters between 0.25 mm and 2 mm.

PE marble run: rigid objects in complex geometry
Animation by Sebastian Eibl and Christian Godenschwager

Scaling Results
The solver is algorithmically not optimal for dense systems and hence cannot scale unconditionally, but it is highly efficient in many cases of practical importance.
Strong and weak scaling results for a constant number of iterations, performed on SuperMUC and Juqueen.
Largest ensembles computed: 2.8 × 10^10 non-spherical particles and 1.1 × 10^10 contacts.
Breakup of compute times for a granular gas on the RRZE cluster Emmy in Erlangen.
[Plots: weak-scaling graphs (average time per time step and 1000 particles, and parallel efficiency) on the Emmy cluster and on the Juqueen supercomputer; time-step profiles of the granular gas executed with 5 × 2 × 2 = 20 processes on a single node and with 8 × 8 × 5 = 320 processes on 16 nodes, each with 25^3 particles per process.]
From the scalability study of granular gases: upon creation of a three-dimensional communicator, the three dimensions of the domain partitioning are mapped in row-major order, and the T coordinate is limited by the number of processes per node (64 for these measurements). If the number of processes in the z-dimension is smaller than the number of processes per node, a two- or even three-dimensional section of the domain partitioning is mapped to a single node; if it is larger or equal, only a one-dimensional section is mapped to each node, which causes considerably less intra-node communication. This matches the situation for 2,048 and 4,096 nodes: for 2,048 nodes a two-dimensional section 1 × 2 × 32 of the domain partitioning 64 × 64 × 32 is mapped to each node, and for 4,096 nodes a one-dimensional section 1 × 1 × 64 of the partitioning 64 × 64 × 64 is mapped to each node. The performance jump occurs when the last dimension of the domain partitioning reaches the number of processes per node, also when using 16 and 32 processes per node.

Building Block III:
Scalable Flow Simulations with the Lattice Boltzmann Method
Succi, S. (2001). The Lattice Boltzmann Equation: For Fluid Dynamics and Beyond. Oxford University Press.
Feichtinger, C., Donath, S., Köstler, H., Götz, J., & Rüde, U. (2011). WaLBerla: HPC software design for computational engineering simulations. Journal of Computational Science, 2(2), 105-112.

Starting from the Boltzmann Equation
The Boltzmann equation
\[
\frac{\partial f}{\partial t} + \xi \cdot \nabla_x f + K \cdot \nabla_\xi f = Q(f,f), \qquad f = f(t, x, \xi),
\]
describes the dynamics of the particle position probability in phase space:
phase space (6D) = position space (3D) + velocity space (3D).
f(t, x, ξ) is the probability to encounter particles with the continuous microscopic velocity ξ at position x at time t.
It is a complicated integro-differential equation due to its collision term Q. Assumptions for Q: binary collisions and molecular chaos.

Simplest Approximation of the Collision Term
Modeling of the complex collision term with the Krook or BGK (Bhatnagar-Gross-Krook) equation:
\[
\frac{\partial f}{\partial t} + \xi \cdot \nabla_x f = -\frac{1}{\tau}\left(f - f^{(0)}\right),
\]
with τ the relaxation time, which depends on the viscosity of the simulated fluid, and the collision frequency ω = 1/τ, typically in [0.5, 2.0]/Δt.
This is the single-relaxation-time (SRT) collision; in LBM many (better) alternatives can be devised:
Geier, M., Schönherr, M., Pasquali, A., & Krafczyk, M. (2015). The cumulant lattice Boltzmann equation in three dimensions: Theory and validation. Computers & Mathematics with Applications, 70(4), 507-547.

Discretization
If Kn = ε ≪ 1 and the deviations from the local equilibrium are very small, f can be approximated with only a few degrees of freedom.
Knudsen number:
\[
\mathrm{Kn} = \frac{\text{mean free path length}}{\text{characteristic length scale}}.
\]
The physical approximation of the distribution function f with values f_i at N collocation points, which move with the velocities c_i, is
\[
f(t, x, \xi) \;\Longrightarrow\; \tilde{f}_i(t, x).
\]

Eulerian: The Lattice Boltzmann Method
Discretization in squares or cubes (cells).
Common examples for the particle distribution functions (PDFs): in 2D, 9 numbers per cell (D2Q9); in 3D, D3Q19 (alternatives: D3Q27, etc.).

The Stream Step
Move PDFs into the neighboring cells.
Non-local part: linear propagation to the neighbors (stream step). Local part: non-linear operator (collide step).

The Equilibrium Distribution Function
\[
f_i(x + c_i \Delta t,\, t + \Delta t) - f_i(x, t) = -\frac{1}{\tau}\left(f_i - f_i^{eq}\right).
\]
f_i^{eq} is the equilibrium distribution function. It defines the local particle distribution for a given velocity u and density ρ when the fluid has reached a state of equilibrium:
\[
f_i^{eq}(\rho, u) = t_i\,\rho\left(1 + \frac{3}{c^2}\,c_i \cdot u + \frac{9}{2c^4}\,(c_i \cdot u)^2 - \frac{3 u^2}{2 c^2}\right)
\]
c_i: discrete lattice velocities; t_i: direction-dependent weighting factors.

The Collide Step
Compute new PDFs modeling molecular collisions. Most collision operators can be expressed in terms of the equilibrium function: it is non-linear, but depends only on the conserved moments (density ρ and velocity u).

The Lattice Boltzmann Algorithm

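To illustrate the stream and collide steps just described, here is a minimal D2Q9 single-relaxation-time (BGK) update in Python. This is a textbook toy code on a periodic box, not the waLBerla implementation, and all names in it are made up for this sketch.

```python
import numpy as np

# D2Q9 lattice: discrete velocities c_i and weights t_i
c = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])
t = np.array([4 / 9] + [1 / 9] * 4 + [1 / 36] * 4)

def equilibrium(rho, u):
    """f_i^eq = t_i rho (1 + 3 c_i.u + 9/2 (c_i.u)^2 - 3/2 u.u) in lattice units (c = 1)."""
    cu = np.einsum('id,xyd->xyi', c, u)
    uu = np.einsum('xyd,xyd->xy', u, u)
    return t * rho[..., None] * (1 + 3 * cu + 4.5 * cu ** 2 - 1.5 * uu[..., None])

def stream_collide(f, omega):
    """One LBM time step: stream PDFs to the neighbors, then BGK collision."""
    # stream: shift each population along its lattice velocity (periodic box)
    for i, ci in enumerate(c):
        f[..., i] = np.roll(f[..., i], shift=ci, axis=(0, 1))
    # macroscopic moments
    rho = f.sum(axis=-1)
    u = np.einsum('xyi,id->xyd', f, c) / rho[..., None]
    # collide: relax towards equilibrium with collision frequency omega = 1/tau
    return f + omega * (equilibrium(rho, u) - f)

# usage: 64x64 periodic box, fluid at rest with a small density perturbation
nx = ny = 64
rho0 = np.ones((nx, ny)); rho0[nx // 2, ny // 2] += 0.01
f = equilibrium(rho0, np.zeros((nx, ny, 2)))
for _ in range(100):
    f = stream_collide(f, omega=1.2)
```

Note that the whole collision is local to each cell, while the stream step only touches direct neighbors; this locality is what makes the method attractive for massively parallel machines.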

Performance on a Coronary Arteries Geometry
Weak scaling: 458,752 cores of JUQUEEN, over a trillion (10^12) fluid lattice cells, color-coded process assignment.
Strong scaling: 32,768 cores of SuperMUC, cell sizes of 0.1 mm, 2.1 million fluid cells, 6000 time steps per second.
Godenschwager, C., Schornbaum, F., Bauer, M., Köstler, H., & UR (2013). A framework for hybrid parallel flow simulations with a trillion cells in complex geometries. In Proceedings of SC13: International Conference for High Performance Computing, Networking, Storage and Analysis (p. 35). ACM.

Where have all my cycles gone? Evaluating single-node performance
[Bar charts: single-node LBM performance on SuperMUC and JUQUEEN for standard, optimized, and vectorized kernels]
Pohl, T., Deserno, F., Thürey, N., UR, Lammers, P., Wellein, G., & Zeiser, T. (2004). Performance evaluation of parallel large-scale lattice Boltzmann applications on three supercomputing architectures. In Proceedings of the 2004 ACM/IEEE Conference on Supercomputing (p. 21). IEEE Computer Society.
Donath, S., Iglberger, K., Wellein, G., Zeiser, T., Nitsure, A., & UR (2008). Performance comparison of different parallel lattice Boltzmann implementations on multi-core multi-socket systems. International Journal of Computational Science and Engineering, 4(1), 3-11.

Weak Scaling for TRT Lid-Driven Cavity on Uniform Grids
JUQUEEN: 16 processes per node, 4 threads per process. SuperMUC: 4 processes per node, 4 threads per process.
[Plots: cell updates per second over the number of cores on JUQUEEN and SuperMUC]
Adaptive Mesh Refinement and Load Balancing

Isaac, T., Burstedde, C., Wilcox, L. C., & Ghattas, O. (2015). Recursive algorithms for distributed forests of octrees. SIAM Journal on Scientific Computing, 37(5), C497-C531.
Meyerhenke, H., Monien, B., & Sauerwald, T. (2009). A new diffusion-based multilevel algorithm for computing graph partitions. Journal of Parallel and Distributed Computing, 69(9), 750-761.
Schornbaum, F., & Rüde, U. (2016). Massively parallel algorithms for the lattice Boltzmann method on non-uniform grids. SIAM Journal on Scientific Computing, 38(2), C96-C126.
Schornbaum, F., & Rüde, U. (2017). Extreme-scale block-structured adaptive mesh refinement. arXiv preprint arXiv:1704.06829.

Partitioning and Parallelization
static block-level refinement (→ forest of octrees)
static load balancing
compact (KiB/MiB) binary MPI IO to and from disk
separation of domain partitioning from simulation (optional)
allocation of block data (→ grids)

Parallel AMR Load Balancing
2:1 balanced grid (used for the LBM); different views on the domain partitioning:
distributed graph: nodes = blocks, edges explicitly stored as <block ID, process rank> pairs
forest of octrees: the octrees are not explicitly stored, but implicitly defined via the block IDs
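To see how octrees can be "implicitly defined via block IDs", one can encode the refinement path of a block in an integer key, so that its level and parent are recoverable from the ID alone. The small sketch below illustrates the idea; it is not the actual waLBerla block ID encoding.

```python
def child_id(parent_id, octant):
    """Append one refinement step (octant in 0..7) to a block ID.
    The leading 1-bit marks the root, so the ID also encodes the level."""
    assert 0 <= octant < 8
    return (parent_id << 3) | octant

def parent_id(block_id):
    return block_id >> 3

def level(block_id):
    """Number of refinement steps below the root block (ID 1)."""
    lvl = 0
    while block_id > 1:
        block_id >>= 3
        lvl += 1
    return lvl

# usage: root block 1, refine into octant 5, then octant 2
b = child_id(child_id(1, 5), 2)
print(bin(b), level(b), parent_id(b))   # -> 0b1101010 2 0b1101
```

With such keys, neighboring processes only need to exchange IDs (and ranks) to agree on the tree structure, which is why no explicit octree data structure has to be stored or communicated.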

Flow through structure of thin crystals (filter)

Gil, A., Galache, J. P. G., Godenschwager, C., & Rüde, U. (2017). Optimum configuration for accurate simulations of chaotic porous media with Lattice Boltzmann Methods considering boundary conditions, lattice spacing and domain size. Computers & Mathematics with Applications, 73(12), 2515-2528.


Multi-Physics Simulations for Particulate Flows
Parallel coupling with waLBerla and PE
Ladd, A. J. (1994). Numerical simulations of particulate suspensions via a discretized Boltzmann equation. Part 1. Theoretical foundation. Journal of Fluid Mechanics, 271(1), 285-309.
Tenneti, S., & Subramaniam, S. (2014). Particle-resolved direct numerical simulation for gas-solid flow model development. Annual Review of Fluid Mechanics, 46, 199-230.
Bartuschat, D., Fischermeier, E., Gustavsson, K., & UR (2016). Two computational models for simulating the tumbling motion of elongated particles in fluids. Computers & Fluids, 127, 17-35.

Fluid-Structure Interaction
Direct simulation of particle-laden flows (4-way coupling)
Götz, J., Iglberger, K., Stürmer, M., & UR (2010). Direct numerical simulation of particulate flows on 294912 processor cores. In Proceedings of Supercomputing 2010. IEEE Computer Society.
Götz, J., Iglberger, K., Feichtinger, C., Donath, S., & UR (2010). Coupling multibody dynamics and computational fluid dynamics on 8192 processor cores. Parallel Computing, 36(2), 142-151.

Mapping Moving Obstacles into the LBM Fluid Grid: An Example
[Diagram legend: fluid cell, no-slip cell, velocity/pressure cell, acceleration cell]
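A toy sketch of the mapping step: cells whose centers lie inside a particle (here a disc in 2D) are flagged as obstacle cells and excluded from the fluid update. This only illustrates the idea and is not the waLBerla mapping code.

```python
import numpy as np

FLUID, OBSTACLE = 0, 1

def map_sphere(nx, ny, center, radius):
    """Flag all cells whose centers lie inside the circular particle as obstacle cells."""
    x, y = np.meshgrid(np.arange(nx) + 0.5, np.arange(ny) + 0.5, indexing='ij')
    inside = (x - center[0]) ** 2 + (y - center[1]) ** 2 <= radius ** 2
    return np.where(inside, OBSTACLE, FLUID)

# usage: 20x20 grid, particle of radius 4 centered at (10, 10)
flags = map_sphere(20, 20, center=(10.0, 10.0), radius=4.0)
print(flags.sum(), "obstacle cells")   # roughly pi * r^2, i.e. about 50 cells
```

When the particle moves, cells change their state, which is exactly the situation shown on the next slide.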

Mapping Moving Obstacles into the LBM Fluid Grid: An Example (2)
Cells with state change; PDFs acting as force from fluid to particle; momentum calculation from particle to fluid.

Comparison Between Coupling Methods
Momentum exchange method; alternative: partially saturated cells method (Noble and Torczynski).
Algorithm: LBM with momentum exchange coupling
Initially map particles into the fluid domain.
for each time step do
  for each LBM subcycle do
    Perform LBM collision step
    Apply boundary conditions
    Perform LBM stream step
    Calculate hydrodynamic forces on particles
  end for
  Average forces on particles over the LBM subcycles; obtain total force and torque on particles
  for each multi-body solver subcycle do
    Perform rigid-body solver step (collision detection, collision response, time integration)
  end for
  Update particle mapping and reconstruct PDFs if necessary
end for
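The "calculate hydrodynamic forces" step of the momentum exchange method sums the momentum transferred across all fluid-solid links during bounce-back. A schematic sketch in the D2Q9 notation of the earlier example (single resting obstacle, no moving-boundary correction term, periodic indexing for brevity) could look as follows; it is an illustration only, not the waLBerla coupling code.

```python
import numpy as np

# D2Q9 velocities, the index of the opposite direction, and the lattice weights
c = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])
opp = np.array([0, 3, 4, 1, 2, 7, 8, 5, 6])
w = np.array([4 / 9] + [1 / 9] * 4 + [1 / 36] * 4)

def momentum_exchange_force(f, solid):
    """Hydrodynamic force on the mapped obstacle cells (lattice units).

    f:     PDF field of shape (nx, ny, 9)
    solid: boolean mask of shape (nx, ny), True for cells covered by the particle.
    For every link from a fluid cell x to a solid neighbor x + c_i, bounce-back
    transfers the momentum (f_i(x) + f_opp(i)(x + c_i)) * c_i to the particle.
    """
    force = np.zeros(2)
    nx, ny = solid.shape
    for x, y in zip(*np.where(~solid)):
        for i in range(1, 9):                                 # direction 0 carries no momentum
            xn, yn = (x + c[i, 0]) % nx, (y + c[i, 1]) % ny   # periodic for brevity
            if solid[xn, yn]:
                force += (f[x, y, i] + f[xn, yn, opp[i]]) * c[i]
    return force

# usage: a resting, uniform fluid around a circular particle gives zero net force
X, Y = np.meshgrid(np.arange(32), np.arange(32), indexing='ij')
solid = (X - 16) ** 2 + (Y - 16) ** 2 <= 25
f = np.ones((32, 32, 9)) * w
print(momentum_exchange_force(f, solid))                      # -> [0. 0.] by symmetry
```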

Comparison Between Coupling Methods (2)
Example: a single moving particle; evaluation of the oscillating oblique regime (Re = 263, Ga = 190), which is correctly represented by the momentum exchange method (less well by the Noble and Torczynski method).
Different coupling variants: first-order bounce back; second-order central linear interpolation (CLI).
Cross validation with the spectral method of Uhlmann & Dušek.
[Figure: contours of the projected relative velocity for case B-CLI-48 (Ga = 178.46); the red line outlines the recirculation area, used to visualize and calculate the recirculation length Lr in the particle wake.]
Uhlmann, M., & Dušek, J. (2014). The motion of a single heavy sphere in ambient fluid: A benchmark for interface-resolved particulate flow simulations with significant relative velocities. International Journal of Multiphase Flow, 59.
Noble, D. R., & Torczynski, J. R. (1998). A lattice-Boltzmann method for partially saturated computational cells. International Journal of Modern Physics C.
Rettinger, C., & Rüde, U. (2017). A comparative study of fluid-particle coupling methods for fully resolved lattice Boltzmann simulations. Computers & Fluids.

Simulation of suspended particle transport

Preclik, T., Schruff, T., Frings, R., & Rüde, U. (2017, August). Fully Resolved Simulations of Dune Formation in Riverbeds. In High Performance Computing: 32nd International Conference, ISC High Performance 2017, Frankfurt, Germany, June 18-22, 2017, Proceedings (Vol. 10266, p. 3). Springer.


Towards the First Principles Simulation of Fluidized Beds
Simulation vs. experiment; model reactors of 2 cm × 10 cm × 40 cm.
Joint work with V. V. Buwa and B. Singh, IIT Delhi.
Deen, N. G., Annaland, M. V. S., Van der Hoef, M. A., & Kuipers, J. A. M. (2007). Review of discrete particle modeling of fluidized beds. Chemical Engineering Science, 62(1), 28-44.

Building Block IV: Electrostatics
Direct numerical simulation of charged particles in flow: positively and negatively charged particles in a flow subjected to a transversal electric field.
Masilamani, K., Ganguly, S., Feichtinger, C., & UR (2011). Hybrid lattice-Boltzmann and finite-difference simulation of electroosmotic flow in a microchannel. Fluid Dynamics Research, 43(2), 025501.
Bartuschat, D., Ritter, D., & UR (2012). Parallel multigrid for electrokinetic simulation in particle-fluid flows. In High Performance Computing and Simulation (HPCS), 2012 International Conference on (pp. 374-380). IEEE.
Bartuschat, D., & UR (2015). Parallel multiphysics simulations of charged particles in microfluidic flows. Journal of Computational Science, 8, 1-19.

6-Way Coupling
[Coupling diagram, one time step:]
Finite volumes / multigrid: receives the charge distribution, treats BCs, performs V-cycle iterations, and returns the electrostatic force.
LBM: receives velocity BCs from the object motion, treats BCs, performs the stream-collide step, and returns the hydrodynamic force.
Lubrication correction: computes a correction force from the object distances.
Newtonian mechanics: collision response and time integration; the resulting object motion feeds back into the velocity BCs and the charge distribution.

Separation Experiment
240 time steps of the fully 6-way coupled simulation take about 400 seconds on SuperMUC; weak scaling up to 32,768 cores with 7.1 million particles.
The sweeps whose runtimes increase with the problem size, mainly due to increasing MPI communication, are LBM, MG, pe, and PtCm; LBM and MG take up more than 75% of the total time w.r.t. the runtimes of the individual sweeps. The sweeps that scale perfectly (HydrF, LubrC, Map, SetRHS, and ElectF) are summarized as 'Oth'. For longer simulation times the particles attracted by the bottom wall are no longer evenly distributed, possibly causing load imbalances; however, these hardly affect the overall performance. For the simulation shown in the animation, the relative share of the lubrication correction is below 0.1%, and each other sweep of 'Oth' is well below 4% of the total runtime.
Overall, the coupled multiphysics algorithm achieves 83% parallel efficiency on 2048 nodes. On 2048 nodes, MG executes 121,083 MLUPS, corresponding to a parallel efficiency of 64%, and the LBM performs 95,372 MFLUPS with 91% parallel efficiency.
The required number of CG iterations on the coarsest grid scales with the diameter of the problem (Gmeiner et al. 2014), according to the growth of the condition number (Shewchuk 1994). When doubling the problem size, the number of CG iterations sometimes stays constant and sometimes has to be increased; this results from the different shares of Neumann and Dirichlet BCs on the boundary. Whenever the relative proportion of Neumann BCs increases, convergence deteriorates and more CG iterations are necessary.
[Figure 13: Runtimes of the charged-particle algorithm sweeps (LBM, MG, pe, PtCm, SetRHS, ElectF, Map, Lubr, HydrF) for 240 time steps on an increasing number of nodes, based on the maximal (MG, LBM, pe) or average (others) runtimes among all processes. Figure 14: Weak scaling performance of the MG and LBM sweeps for 240 time steps.]

Building Block V:
Volume of Fluids Method for Free Surface Flows
Joint work with Regina Ammer, Simon Bogner, Martin Bauer, Daniela Anderl, Nils Thürey, Stefan Donath, Thomas Pohl, C. Körner, and A. Delgado.
Körner, C., Thies, M., Hofmann, T., Thürey, N., & UR (2005). Lattice Boltzmann model for free surface flow for modeling foaming. Journal of Statistical Physics, 121(1-2), 179-196.
Donath, S., Feichtinger, C., Pohl, T., Götz, J., & UR (2010). A parallel free surface lattice Boltzmann method for large-scale applications. Parallel Computational Fluid Dynamics: Recent Advances and Future Directions, 318.
Anderl, D., Bauer, M., Rauh, C., UR, & Delgado, A. (2014). Numerical simulation of adsorption and bubble interaction in protein foams using a lattice Boltzmann method. Food & Function, 5(4), 755-763.

Free Surface Flows
Volume-of-fluids-like approach.
Flag field: compute only in fluid cells.
Special free-surface conditions in interface cells.
Reconstruction of the curvature for surface tension.
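A toy illustration of the flag-field idea (not the waLBerla free-surface implementation): cells are classified from a fill level φ ∈ [0, 1], the LBM update is restricted to fluid cells, and interface cells carry the free-surface conditions.

```python
import numpy as np

GAS, INTERFACE, FLUID = 0, 1, 2

def classify(fill):
    """Flag field from a fill level phi in [0, 1] per cell.

    Fluid cells (phi == 1) are the only ones where the full LBM update runs;
    interface cells (0 < phi < 1) carry the free-surface boundary conditions;
    gas cells are skipped entirely.
    """
    flags = np.full(fill.shape, INTERFACE, dtype=np.uint8)
    flags[fill <= 0.0] = GAS
    flags[fill >= 1.0] = FLUID
    return flags

# usage: a 1D column half filled with liquid
fill = np.array([1.0, 1.0, 0.4, 0.0, 0.0])
print(classify(fill))   # -> [2 2 1 0 0]
```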

Simulation for Hygiene Products (for Procter & Gamble)
Capillary pressure, inclination, surface tension, contact angle.

Free surface and rigid objects

Bogner, S., & Rüde, U. (2013). Simulation of floating bodies with the lattice Boltzmann method. Computers & Mathematics with Applications, 65(6), 901-913. Bogner, S., Ammer, R., & Rüde, U. (2015). Boundary conditions for free interfaces with the lattice Boltzmann method. Journal of Computational Physics, 297, 1-12.


Additive Manufacturing: Fast Electron Beam Melting
Bikas, H., Stavropoulos, P., & Chryssolouris, G. (2015). Additive manufacturing methods and modelling approaches: a critical review. The International Journal of Advanced Manufacturing Technology, 1-17.
Klassen, A., Scharowsky, T., & Körner, C. (2014). Evaporation model for beam based additive manufacturing using free surface lattice Boltzmann methods. Journal of Physics D: Applied Physics, 47(27), 275303.
Körner, C., Thies, M., Hofmann, T., Thürey, N., & UR (2005). Lattice Boltzmann model for free surface flow for modeling foaming. Journal of Statistical Physics, 121(1-2), 179-196.

Motivating Example: Simulation of the Electron Beam Melting Process (Additive Manufacturing)
EU project FastEBM: ARCAM (Gothenburg), TWI (Cambridge), WTM (FAU), ZISC (FAU).
Generation of the powder bed; energy transfer by the electron beam (penetration depth, heat transfer); flow dynamics of melting and solidification (melt flow, surface tension, wetting, capillary forces, contact angles).
Ammer, R., Markl, M., Ljungblad, U., Körner, C., & UR (2014). Simulating fast electron beam melting with a parallel thermal free surface lattice Boltzmann method. Computers & Mathematics with Applications, 67(2), 318-330.
Ammer, R., UR, Markl, M., Jüchter, V., & Körner, C. (2014). Validation experiments for LBM simulations of electron beam melting. International Journal of Modern Physics C.

Simulation of Electron Beam Melting
High-speed camera footage shows the melting step for manufacturing a hollow cylinder, next to the corresponding waLBerla simulation.
The powder bed generation is simulated using the PE framework.

CSE research is done by teams
Harald Köstler, Florian Schornbaum, Christian Godenschwager, Sebastian Kuckuk, Kristina Pickl, Christoph Rettinger, Regina Ammer, Dominik Bartuschat, Simon Bogner, Martin Bauer

Towards fully resolved 3-phase systems
