Numerical Simulation of Multiphase Flows at the Extreme Scale
Ulrich Rüde, LSS Erlangen and CERFACS Toulouse
[email protected]
Symposium: Fluid Dynamics Modelling for Real World Applications, June 28, 2017
Centre Européen de Recherche et de Formation Avancée en Calcul Scientifique, www.cerfacs.fr
Lehrstuhl für Simulation, Universität Erlangen-Nürnberg, www10.informatik.uni-erlangen.de
Flow at Scale — Ulrich Rüde 1
On the agenda today …
Continuum models: finite elements and implicit solvers (geophysics)
Particle based methods: rigid body dynamics (granular systems)
Mesoscopic methods: lattice Boltzmann methods (complex flows)
Perspectives: towards predictive science, computational science and engineering
Not Yet at Scale - Ulrich Rüde 2
Building Block I: Current and Future High Performance Supercomputers
SuperMUC: 3 PFlops
Coupled Flow for ExaScale — Ulrich Rüde 3
Multi-PetaFlops Supercomputers
Sunway TaihuLight: SW26010 processor, 10,649,600 cores, 260 cores (1.45 GHz) per node, 32 GiB RAM per node, 125 PFlops peak, power consumption 15.37 MW, TOP 500 #1
JUQUEEN: Blue Gene/Q architecture, 458,752 PowerPC A2 cores, 16 cores (1.6 GHz) per node, 16 GiB RAM per node, 5D torus interconnect, 5.8 PFlops peak, TOP 500 #13
SuperMUC (phase 1): Intel Xeon architecture, 147,456 cores, 16 cores (2.7 GHz) per node, 32 GiB RAM per node, pruned tree interconnect, 3.2 PFlops peak, TOP 500 #27
What is the problem? Designing algorithms! Large scale simulation software.
Would you want to propel a superjumbo with four strong jet engines (moderately parallel computing) or with 1,000,000 blow dryer fans (massively parallel multicore systems)?
Coupled Flow for ExaScale — Ulrich Rüde 5
Building block II:
Fast Implicit Solvers: Parallel Multigrid Methods for Earth Mantle Convection
Gmeiner, B., Rüde, U., Stengel, H., Waluga, C., & Wohlmuth, B. (2015). Performance and scalability of hierarchical hybrid multigrid solvers for Stokes systems. SIAM Journal on Scientific Computing, 37(2), C143-C168.
Gmeiner, B., Rüde, U., Stengel, H., Waluga, C., & Wohlmuth, B. (2015). Towards textbook efficiency for parallel multigrid. Numerical Mathematics: Theory, Methods and Applications, 8(1), 22-46.
Huber, M., John, L., Pustejovska, P., Rüde, U., Waluga, C., & Wohlmuth, B. (2015). Solution techniques for the Stokes system: a priori and a posteriori modifications, resilient algorithms. ICIAM 2015.
Simulation und Vorhersagbarkeit — Ulrich Rüde 6
Stationary Flow Field
TERRA NEO: Simulation of Earth Mantle Convection - Uli Rüde 7
Exploring the Limits
Gmeiner, B., Huber, M., John, L., UR, & Wohlmuth, B. (2016). A quantitative performance study for Stokes solvers at the extreme scale. Journal of Computational Science, in print.
Multigrid with Uzawa smoother, optimized for minimal memory consumption: 10^13 unknowns correspond to 80 TByte for the solution vector alone, while Juqueen has 450 TByte of memory in total, so a matrix-free implementation is essential.
Setup (from the paper): the coarse grid consists of 240 tetrahedra for the case of 5 nodes and 80 threads; the number of degrees of freedom on the coarse grid grows from 9.0·10^3 to 4.1·10^7 in the weak scaling. The Stokes system is solved with the Laplace-operator formulation; the relative accuracies for the coarse grid solver (PMINRES and CG algorithm) are set to 10^-3 and 10^-4, respectively. All other solver parameters remain as previously described.
Table 10 (weak scaling results with and without coarse grid for the spherical shell geometry): from 5 nodes (80 threads, 2.7·10^9 DoFs, 10 iterations, 685.88 s total, 678.77 s without coarse grid, 1.04% coarse grid share) up to 20 480 nodes (327 680 threads, roughly 10^13 DoFs, 776.09 s, 681.91 s, 12.14% coarse grid share); the intermediate runs on 40, 320, and 2 560 nodes stay between about 700 s and 745 s, with coarse grid shares of 2.48%, 4.31%, and 6.75%.
TERRA NEO: Simulation of Earth Mantle Convection - Uli Rüde 8
Multigrid: The algorithm for 10^13 unknowns
Goal: solve $A^h u^h = f^h$ using a hierarchy of grids.
Relax on $A^h u^h = f^h$
Compute the residual $r^h = f^h - A^h u^h$
Restrict $r^H = I_h^H r^h$
Solve $A^H e^H = r^H$ by recursion
Interpolate $e^h = I_H^h e^H$
Correct $u^h \leftarrow u^h + e^h$
Multigrid uses the coarse grids to accomplish the global data exchange in the most efficient way possible.
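To make the recursion concrete, the following is a minimal, matrix-free V-cycle for a 1D Poisson model problem with a weighted Jacobi smoother. It is an illustrative sketch only: names such as v_cycle, smooth, restrict_fw, and prolongate_add are invented here, and the actual HHG/TerraNeo solver operates on 3D tetrahedral hierarchies with an Uzawa-type smoother for the Stokes system.

#include <cstddef>
#include <vector>

// 1D Poisson with Dirichlet BCs on a grid of 2^k + 1 points, spacing h:
// (A_h u)_i = (2 u[i] - u[i-1] - u[i+1]) / h^2.  Everything is matrix free.
using Vec = std::vector<double>;

static void smooth(Vec& u, const Vec& f, double h, int sweeps) {
  const double omega = 2.0 / 3.0;                          // weighted Jacobi
  for (int s = 0; s < sweeps; ++s) {
    Vec r = u;                                             // old iterate
    for (std::size_t i = 1; i + 1 < u.size(); ++i)
      r[i] = u[i] + omega * 0.5 * (h * h * f[i] + u[i - 1] + u[i + 1] - 2.0 * u[i]);
    u.swap(r);
  }
}

static Vec residual(const Vec& u, const Vec& f, double h) {
  Vec r(u.size(), 0.0);
  for (std::size_t i = 1; i + 1 < u.size(); ++i)
    r[i] = f[i] - (2.0 * u[i] - u[i - 1] - u[i + 1]) / (h * h);
  return r;
}

static Vec restrict_fw(const Vec& r) {                     // full weighting, fine -> coarse
  Vec rc((r.size() - 1) / 2 + 1, 0.0);
  for (std::size_t i = 1; i + 1 < rc.size(); ++i)
    rc[i] = 0.25 * r[2 * i - 1] + 0.5 * r[2 * i] + 0.25 * r[2 * i + 1];
  return rc;
}

static void prolongate_add(Vec& u, const Vec& ec) {        // u_h += I_H^h e_H (linear interp.)
  for (std::size_t i = 0; i + 1 < ec.size(); ++i) {
    u[2 * i]     += ec[i];
    u[2 * i + 1] += 0.5 * (ec[i] + ec[i + 1]);
  }
}

// One V(2,2)-cycle: relax, restrict the residual, recurse, correct, relax again.
void v_cycle(Vec& u, const Vec& f, double h) {
  if (u.size() <= 3) {                                     // coarsest grid: 1 interior unknown
    if (u.size() == 3) u[1] = 0.5 * h * h * f[1];          // exact solve of 2 u / h^2 = f
    return;
  }
  smooth(u, f, h, 2);                                      // pre-smoothing
  Vec rc = restrict_fw(residual(u, f, h));                 // r_H = I_h^H r_h
  Vec ec(rc.size(), 0.0);
  v_cycle(ec, rc, 2.0 * h);                                // solve A_H e_H = r_H by recursion
  prolongate_add(u, ec);                                   // u_h <- u_h + I_H^h e_H
  smooth(u, f, h, 2);                                      // post-smoothing
}

The structural point carried over to the large-scale solver is that every operator is applied stencil-wise, so no matrix is ever assembled on any level.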
TERRA NEO: Simulation of Earth Mantle Convection - Uli Rüde 9
Hierarchical Hybrid Grids (HHG): parallelize "plain vanilla" multigrid for tetrahedral finite elements; partition the domain, parallelize all operations on all grids, use clever data structures, matrix-free implementation.
Do not worry (so much) about coarse grids: idle processors? short messages? sequential dependency in the grid hierarchy?
Elliptic problems always require global communication. This cannot be accomplished by local relaxation, Krylov space acceleration, or domain decomposition without a coarse grid.
(Figure: Bey's tetrahedral refinement)
Gholami, A., Malhotra, D., Sundar, H., & Biros, G. (2016). FFT, FMM, or Multigrid? A comparative study of state-of-the-art Poisson solvers for uniform and nonuniform grids in the unit cube. SIAM Journal on Scientific Computing, 38(3), C280-C306.
Bergen, B., Hülsemann, F., Rüde, U., & Wellein, G. (2005). "Is 1.7×10^10 unknowns the largest finite element system that can be solved today?", SuperComputing 2005 (ISC Award 2006).
Gmeiner, B., UR, Stengel, H., Waluga, C., & Wohlmuth, B. (2015). Towards textbook efficiency for parallel multigrid. Numerical Mathematics: Theory, Methods and Applications.
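The "matrix free implementation" can be illustrated by a kernel that applies a constant-coefficient operator directly from its stencil. The routine below is a sketch under simplified assumptions (structured block, 7-point Laplacian, invented function name), not the HHG data layout; it shows that the per-unknown memory footprint is just the vectors, never stored matrix rows.

#include <cstddef>
#include <vector>

// y = A * x for the 7-point Laplacian on an nx*ny*nz block with spacing h.
// The outermost layers are treated as Dirichlet boundary and left untouched.
void apply_laplacian(const std::vector<double>& x, std::vector<double>& y,
                     std::size_t nx, std::size_t ny, std::size_t nz, double h) {
  auto idx = [=](std::size_t i, std::size_t j, std::size_t k) {
    return (k * ny + j) * nx + i;                      // flat row-major index
  };
  const double invh2 = 1.0 / (h * h);
  for (std::size_t k = 1; k + 1 < nz; ++k)
    for (std::size_t j = 1; j + 1 < ny; ++j)
      for (std::size_t i = 1; i + 1 < nx; ++i)
        y[idx(i, j, k)] = invh2 * (6.0 * x[idx(i, j, k)]
                                   - x[idx(i - 1, j, k)] - x[idx(i + 1, j, k)]
                                   - x[idx(i, j - 1, k)] - x[idx(i, j + 1, k)]
                                   - x[idx(i, j, k - 1)] - x[idx(i, j, k + 1)]);
}

Avoiding an assembled sparse matrix (dozens of stored nonzeros per row) is what makes problem sizes where the solution vector alone occupies 80 TByte feasible on a 450 TByte machine.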
Simulation of Earth Mantle Convection - Uli Rüde 10
Building block II: Granular media simulations with the physics engine (PE)
(Photo: hiking Aug. 2016 in the Silvretta mountains)
Pöschel, T., & Schwager, T. (2005). Computational granular dynamics: models and algorithms. Springer Science & Business Media.
Coupled Flow for ExaScale — Ulrich Rüde 11
Lagrangian Particle Representation
A single particle is described by its state variables (position x, orientation φ, translational and angular velocities v and ω), a parameterization of its shape S (e.g. geometric primitive, composite object, or mesh), and its inertia properties (mass m, principal moments of inertia I_xx, I_yy, I_zz).
The Newton-Euler equations of motion for rigid bodies describe the rate of change of the state variables:
$$\begin{pmatrix}\dot{x}(t)\\ \dot{\varphi}(t)\end{pmatrix} = \begin{pmatrix} v(t)\\ Q(\varphi(t))\,\omega(t)\end{pmatrix}, \qquad M(\varphi(t))\begin{pmatrix}\dot{v}(t)\\ \dot{\omega}(t)\end{pmatrix} = \begin{pmatrix} f(s(t),t)\\ \tau(s(t),t) - \omega(t)\times I(\varphi(t))\,\omega(t)\end{pmatrix}.$$
Time integration uses an integrator of order one, similar to semi-implicit Euler.
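As an illustration of such a first-order integrator, here is a sketch of one semi-implicit Euler step for a single rigid body, assuming the inertia tensor and its inverse are already given in the world frame; all types and the function name are placeholders for illustration, not the PE physics engine API.

#include <array>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Quat = std::array<double, 4>;   // (w, x, y, z)
using Mat3 = std::array<std::array<double, 3>, 3>;

static Vec3 cross(const Vec3& a, const Vec3& b) {
  return {a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0]};
}
static Vec3 mul(const Mat3& M, const Vec3& v) {
  Vec3 r{};
  for (int i = 0; i < 3; ++i) r[i] = M[i][0]*v[0] + M[i][1]*v[1] + M[i][2]*v[2];
  return r;
}

struct RigidBody {
  Vec3 x{}, v{}, w{};     // position, linear and angular velocity (world frame)
  Quat q{1, 0, 0, 0};     // orientation as unit quaternion
  double mass = 1.0;
  Mat3 I{}, Iinv{};       // inertia tensor and its inverse (world frame; kept
};                        // fixed here, a simplification of the real update)

// One semi-implicit Euler step: velocities first, then positions/orientation.
void integrate(RigidBody& b, const Vec3& force, const Vec3& torque, double dt) {
  for (int i = 0; i < 3; ++i) b.v[i] += dt * force[i] / b.mass;   // v += dt M^-1 f
  Vec3 gyro = cross(b.w, mul(b.I, b.w));                          // w x (I w)
  Vec3 rhs{torque[0] - gyro[0], torque[1] - gyro[1], torque[2] - gyro[2]};
  Vec3 dw = mul(b.Iinv, rhs);                                     // I^-1 (tau - w x I w)
  for (int i = 0; i < 3; ++i) b.w[i] += dt * dw[i];
  for (int i = 0; i < 3; ++i) b.x[i] += dt * b.v[i];              // x += dt v
  // Orientation: q_dot = 0.5 * (0, w) * q, then renormalize.
  Quat dq{-0.5 * (b.w[0]*b.q[1] + b.w[1]*b.q[2] + b.w[2]*b.q[3]),
           0.5 * (b.w[0]*b.q[0] + b.w[1]*b.q[3] - b.w[2]*b.q[2]),
           0.5 * (b.w[1]*b.q[0] + b.w[2]*b.q[1] - b.w[0]*b.q[3]),
           0.5 * (b.w[2]*b.q[0] + b.w[0]*b.q[2] - b.w[1]*b.q[1])};
  double n = 0.0;
  for (int i = 0; i < 4; ++i) { b.q[i] += dt * dq[i]; n += b.q[i] * b.q[i]; }
  n = std::sqrt(n);
  for (int i = 0; i < 4; ++i) b.q[i] /= n;
}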
coupled problems at scale - Ulrich Rüde 12
Discretization Underlying the Time-Stepping
Nonlinear complementarity: measure differential inclusions.
Non-penetration conditions, formulated for continuous forces and for discrete impulses:
Signorini condition: $0 \le \xi \perp \lambda_n \ge 0$; on the velocity level, if $\xi = 0$: $0 \le \dot{\xi}^+ \perp \Lambda_n \ge 0$ (impact law); on the acceleration level, if $\xi = \dot{\xi}^+ = 0$: $0 \le \ddot{\xi}^+ \perp \lambda_n \ge 0$.
Coulomb friction conditions: friction cone condition $\|\Lambda_{to}\|_2 \le \mu\,\Lambda_n$; the frictional reaction opposes slip, $\|v_{to}^+\|_2\,\Lambda_{to} = -\mu\,\Lambda_n\,v_{to}^+$.
Moreau, J. J., & Panagiotopoulos, P. D. (1988). Nonsmooth mechanics and applications, vol. 302. Springer, Wien-New York.
Popa, C., Preclik, T., & UR (2014). Regularized solution of LCP problems with application to rigid body dynamics. Numerical Algorithms, 1-12.
Preclik, T., & UR (2015). Ultrascale simulations of non-smooth granular dynamics. Computational Particle Mechanics, 2(2), 173-196.
Preclik, T., Eibl, S., & UR (2017). The maximum dissipation principle in rigid-body dynamics with purely inelastic impacts. arXiv preprint:1706.00221.
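For the frictionless special case, the normal complementarity conditions above reduce to a linear complementarity problem per time step, which can be relaxed with a projected Gauss-Seidel sweep. The sketch below assumes a dense Delassus matrix A and only illustrates the idea behind the LCP formulations referenced above; it is not the solver used in the PE engine.

#include <algorithm>
#include <cstddef>
#include <vector>

// Frictionless contact LCP:  w = A*lambda + b,  0 <= lambda  perp  w >= 0,
// where A is the (symmetric, positive semi-definite) Delassus operator and b
// holds the predicted relative normal velocities. Projected Gauss-Seidel
// relaxes one contact at a time and clamps each impulse to be non-negative.
std::vector<double> projected_gauss_seidel(const std::vector<std::vector<double>>& A,
                                           const std::vector<double>& b,
                                           int sweeps) {
  const std::size_t n = b.size();
  std::vector<double> lambda(n, 0.0);
  for (int s = 0; s < sweeps; ++s) {
    for (std::size_t i = 0; i < n; ++i) {
      double r = b[i];
      for (std::size_t j = 0; j < n; ++j)
        if (j != i) r += A[i][j] * lambda[j];
      lambda[i] = std::max(0.0, -r / A[i][i]);   // projection onto lambda_i >= 0
    }
  }
  return lambda;
}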
coupled problems at scale - Ulrich Rüde 13
Shaker scenario with sharp-edged hard objects
864 000 sharp-edged particles with diameters between 0.25 mm and 2 mm.
coupled problems at scale - Ulrich Rüde 14
PE marble run: rigid objects in complex geometry
Animation by Sebastian Eibl and Christian Godenschwager
LBM for EXA — Ulrich Rüde 15
Scaling Results
The solver is algorithmically not optimal for dense systems and hence cannot scale unconditionally, but it is highly efficient in many cases of practical importance.
Strong and weak scaling results for a constant number of iterations, performed on SuperMUC and Juqueen.
Largest ensembles computed: 2.8 × 10^10 non-spherical particles and 1.1 × 10^10 contacts.
Breakdown of compute times for a granular gas on the Erlangen RRZE cluster Emmy (Figure 7.3: time-step profiles for two weak-scaling executions of the granular gas on the Emmy cluster with 25^3 particles per process, executed with 5 × 2 × 2 = 20 processes on a single node and with 8 × 8 × 5 = 320 processes on 16 nodes; weak-scaling graphs of the average time per time step and 1000 particles and of the parallel efficiency on the Emmy cluster and on the Juqueen supercomputer).
From the accompanying discussion (Preclik & Rüde): the number of processes per node was 64 for the measurements, and the three dimensions of the domain partitioning are mapped to nodes in row-major order. For 2 048 nodes a two-dimensional section 1×2×32 of the 64×64×32 domain partitioning is mapped to each node, whereas for 4 096 nodes a one-dimensional section 1×1×64 of the 64×64×64 partitioning is mapped to each node; a one-dimensional section performs considerably less intra-node communication, which explains the observed performance jump. The jump occurs exactly when the last dimension of the domain partitioning reaches the number of processes per node, also when using 16 and 32 processes per node. Fig. 5c presents the weak-scaling results on the SuperMUC supercomputer; that setup differs from the granular gas scenario in that it is more dilute.
coupled problems at scale - Ulrich Rüde 16
Building Block III:
Scalable Flow Simulations with the Lattice Boltzmann Method
Succi, S. (2001). The lattice Boltzmann equation: for fluid dynamics and beyond. Oxford University Press.
Feichtinger, C., Donath, S., Köstler, H., Götz, J., & Rüde, U. (2011). WaLBerla: HPC software design for computational engineering simulations. Journal of Computational Science, 2(2), 105-112.
Extreme Scale LBM Methods - Ulrich Rüde 17
Starting from the Boltzmann equation
The Boltzmann equation
$$\frac{\partial f}{\partial t} + \xi \cdot \nabla_x f + K \cdot \nabla_\xi f = Q(f,f), \qquad f = f(t,x,\xi),$$
describes the dynamics of the particle position probability in phase space:
phase space (6D) = position space (3D) + velocity space (3D).
f(t,x,ξ) is the probability to encounter particles with the continuous microscopic velocity ξ at position x at time t.
The collision term Q makes this a complicated integro-differential equation; the assumptions for Q are binary collisions and molecular chaos.
Flow at Scale — Ulrich Rüde 18
Simplest approximation of the collision term
Modeling of the complex collision term: the Krook / BGK (Bhatnagar-Gross-Krook) equation
$$\frac{\partial f}{\partial t} + \xi \cdot \nabla_x f = -\frac{1}{\tau}\,\bigl(f - f^{(0)}\bigr),$$
with τ the relaxation time, which depends on the viscosity of the simulated fluid, and ω = 1/τ the collision frequency, typically in the range [0.5, 2.0]/Δt.
This is single-relaxation-time (SRT) collision; in LBM many (better) alternatives can be devised.
Geier, M., Schönherr, M., Pasquali, A., & Krafczyk, M. (2015). The cumulant lattice Boltzmann equation in three dimensions: theory and validation. Computers & Mathematics with Applications, 70(4), 507-547.
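In lattice units (Δx = Δt = 1, c_s^2 = 1/3) the BGK relaxation time is tied to the kinematic viscosity by ν = c_s^2 (τ - 1/2). A small helper for that conversion, assuming this common convention (illustrative, not a waLBerla routine):

#include <cassert>

// BGK relaxation in lattice units: nu = cs2 * (tau - 0.5), cs2 = 1/3.
// Returns the collision frequency omega = 1/tau; stability requires omega < 2,
// and in practice omega is usually kept roughly in [0.5, 2).
double omega_from_viscosity(double nu_lattice) {
  const double cs2 = 1.0 / 3.0;
  const double tau = nu_lattice / cs2 + 0.5;
  const double omega = 1.0 / tau;
  assert(omega > 0.0 && omega < 2.0);
  return omega;
}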
Flow at Scale — Ulrich Rüde 19
Discretization
$$\frac{\partial f}{\partial t} + \xi \cdot \nabla_x f = -\frac{1}{\tau}\,\bigl(f - f^{(0)}\bigr), \qquad Kn = \frac{\text{mean free path length}}{\text{characteristic length scale}}.$$
If $Kn = \varepsilon \ll 1$ and in case of very small deviations from the local equilibrium, f can be approximated with only a few degrees of freedom: the physical approximation of the distribution function f by values $f_i$ at N collocation points, which move with the velocities $c_i$, is $f(t,x,\xi) \Longrightarrow \tilde f_i(t,x)$.
Flow at Scale — Ulrich Rüde 20
Eulerian: Lattice Boltzmann Method. Discretization in squares or cubes (cells). Common examples for the particle distribution functions (PDFs): in 2D, 9 directions (D2Q9); in 3D, D3Q19 (alternatives: D3Q27, etc.).
Flow at Scale — Ulrich Rüde 21
The stream step: move the PDFs into the neighboring cells.
Non-local part: linear propagation to the neighbors (stream step). Local part: non-linear operator (collide step).
coupled problems at scale - Ulrich Rüde 22
The Equilibrium Distribution Function
$$f_i(x + c_i\,\Delta t,\; t + \Delta t) - f_i(x, t) = -\frac{1}{\tau}\,\bigl(f_i - f_i^{eq}\bigr).$$
f^eq is the equilibrium distribution function. It defines the local particle distribution for a certain velocity u and a certain density ρ when the fluid has reached a state of equilibrium.
$$f_i^{eq} = f_i^{eq}(\rho, u) = t_i\,\rho\left(1 + \frac{3}{c^2}\,c_i \cdot u + \frac{9}{2c^4}\,(c_i \cdot u)^2 - \frac{3u^2}{2c^2}\right)$$
$c_i$: discrete lattice velocity; $t_i$: direction-dependent weighting factor.
Flow at Scale — Ulrich Rüde 23
The collide step: compute new PDFs, modeling molecular collisions. Most collision operators can be expressed through the equilibrium function, which is non-linear and depends on the conserved moments ρ and u.
coupled problems at scale - Ulrich Rüde 24
The Lattice Boltzmann Algorithm
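The complete stream-collide update can be sketched compactly for the D2Q9 model. The class below is a toy illustration, not waLBerla: it keeps two copies of the PDFs, computes the conserved moments, relaxes towards the equilibrium from the previous slide with a single relaxation rate ω, and pushes the post-collision values to the neighbors of a fully periodic grid.

#include <array>
#include <vector>

// Minimal D2Q9 BGK lattice Boltzmann sketch on a periodic nx*ny grid (C++17).
struct LatticeD2Q9 {
  static constexpr int Q = 9;
  static constexpr int cx[Q] = {0, 1, 0, -1, 0, 1, -1, -1, 1};
  static constexpr int cy[Q] = {0, 0, 1, 0, -1, 1, 1, -1, -1};
  static constexpr double w[Q] = {4.0/9, 1.0/9, 1.0/9, 1.0/9, 1.0/9,
                                  1.0/36, 1.0/36, 1.0/36, 1.0/36};
  int nx, ny;
  double omega;                        // collision frequency 1/tau
  std::vector<double> f, ftmp;         // PDF arrays of size nx*ny*Q

  LatticeD2Q9(int nx_, int ny_, double omega_)
      : nx(nx_), ny(ny_), omega(omega_), f(nx_*ny_*Q), ftmp(nx_*ny_*Q) {
    for (int c = 0; c < nx*ny; ++c)                     // rho = 1, u = 0
      for (int i = 0; i < Q; ++i) f[c*Q + i] = w[i];
  }

  static double feq(int i, double rho, double ux, double uy) {
    const double cu = 3.0 * (cx[i]*ux + cy[i]*uy);
    const double u2 = 1.5 * (ux*ux + uy*uy);
    return w[i] * rho * (1.0 + cu + 0.5*cu*cu - u2);    // equilibrium distribution
  }

  void step() {                                         // one collide-and-stream (push)
    for (int y = 0; y < ny; ++y)
      for (int x = 0; x < nx; ++x) {
        const double* fc = &f[(y*nx + x)*Q];
        double rho = 0.0, ux = 0.0, uy = 0.0;
        for (int i = 0; i < Q; ++i) {                   // conserved moments
          rho += fc[i];  ux += fc[i]*cx[i];  uy += fc[i]*cy[i];
        }
        ux /= rho;  uy /= rho;
        for (int i = 0; i < Q; ++i) {
          const double fpost = fc[i] - omega*(fc[i] - feq(i, rho, ux, uy)); // collide
          const int xn = (x + cx[i] + nx) % nx;         // stream to neighbor (periodic)
          const int yn = (y + cy[i] + ny) % ny;
          ftmp[(yn*nx + xn)*Q + i] = fpost;
        }
      }
    f.swap(ftmp);
  }
};

Production codes fuse these steps into vectorized kernels and add boundary conditions and parallel ghost-layer exchange, which is exactly where the performance engineering of the following slides comes in.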
coupled problems at scale - Ulrich Rüde 25
Performance on Coronary Arteries Geometry
Weak scaling
458,752 cores of JUQUEEN
over a trillion (10^12) fluid lattice cells; color coded process assignment
Godenschwager, C., Schornbaum, F., Bauer, M., Köstler, H., & UR (2013). A framework for hybrid parallel flow simulations with a trillion cells in complex geometries. In Proceedings of SC13: International Conference for High Performance Computing, Networking, Storage and Analysis (p. 35). ACM.
Strong scaling
32,768 cores of SuperMUC; cell sizes of 0.1 mm; 2.1 million fluid cells; 6000 time steps per second
coupled problems at scale - Ulrich Rüde
Where have all my cycles gone? Evaluating single node performance on SuperMUC and JUQUEEN.
(Bar charts: standard vs. optimized vs. vectorized kernels on both machines.)
Pohl, T., Deserno, F., Thürey, N., UR, Lammers, P., Wellein, G., & Zeiser, T. (2004). Performance evaluation of parallel largescale lattice Boltzmann applications on three supercomputing architectures. Proceedings of the 2004 ACM/IEEE conference on Supercomputing (p. 21). IEEE Computer Society. Donath, S., Iglberger, K., Wellein, G., Zeiser, T., Nitsure, A., & UR (2008). Performance comparison of different parallel lattice Boltzmann implementations on multi-core multi-socket systems. International Journal of Computational Science and Engineering, 4(1), 3-11.
Flow at Scale — Ulrich Rüde 27
Weak scaling for TRT, lid driven cavity, uniform grids
JUQUEEN: 16 processes per node, 4 threads per process. SuperMUC: 4 processes per node, 4 threads per process.
(Plots: lattice cell updates per second versus core count, reaching on the order of 10^12 cell updates per second, i.e. TLups.)
Extreme Scale LBM - Ulrich Rüde
Adaptive Mesh Refinement and Load Balancing
Isaac, T., Burstedde, C., Wilcox, L. C., & Ghattas, O. (2015). Recursive algorithms for distributed forests of octrees. SIAM Journal on Scientific Computing, 37(5), C497-C531. Meyerhenke, H., Monien, B., & Sauerwald, T. (2009). A new diffusion-based multilevel algorithm for computing graph partitions. Journal of Parallel and Distributed Computing, 69(9), 750-761. Schornbaum, F., & Rüde, U. (2016). Massively Parallel Algorithms for the Lattice Boltzmann Method on NonUniform Grids. SIAM Journal on Scientific Computing, 38(2), C96-C126. Schornbaum, F., & Rüde, U. (2017). Extreme-Scale Block-Structured Adaptive Mesh Refinement. arXiv preprint:1704.06829.
coupled problems at scale - Ulrich Rüde 29
Partitioning and Parallelization
static block-level refinement (→ forest of octrees)
static load balancing
compact (KiB/MiB) binary MPI IO to disk; optional separation of the domain partitioning from the simulation
allocation of block data (→ grids)
Flow at Scale — Ulrich Rüde 30
Parallel AMR load balancing: 2:1 balanced grid (as used for the LBM).
Different views on the domain partitioning:
distributed graph: nodes = blocks, edges explicitly stored as <block ID, process rank> pairs
forest of octrees: the octrees are not explicitly stored, but implicitly defined via the block IDs
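The statement that the octrees are only implicitly defined can be illustrated with a Morton-style block ID: the ID encodes the path from the root to a block, so parent and child IDs follow from bit operations. This encoding is an assumption chosen for illustration and not the exact waLBerla block ID layout.

#include <cstdint>

// A block ID stores a leading 1 marker bit followed by 3 bits (octant 0..7)
// per refinement level, so no explicit tree has to be stored or communicated.
using BlockId = std::uint64_t;

constexpr BlockId kRoot = 1;                               // marker bit only: level 0

constexpr BlockId child(BlockId id, unsigned octant) {     // octant in [0, 7]
  return (id << 3) | (octant & 7u);
}

constexpr BlockId parent(BlockId id) { return id >> 3; }

constexpr unsigned level(BlockId id) {                     // number of 3-bit digits (C++14)
  unsigned l = 0;
  while (id > kRoot) { id >>= 3; ++l; }
  return l;
}

constexpr bool is_ancestor(BlockId a, BlockId b) {         // is a a strict ancestor of b?
  if (level(a) >= level(b)) return false;
  while (level(b) > level(a)) b = parent(b);
  return a == b;
}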
Flow at Scale — Ulrich Rüde 31
Flow through structure of thin crystals (filter)
Gil, A., Galache, J. P. G., Godenschwager, C., & Rüde, U. (2017). Optimum configuration for accurate simulations of chaotic porous media with Lattice Boltzmann Methods considering boundary conditions, lattice spacing and domain size. Computers & Mathematics with Applications, 73(12), 2515-2528.
Coupled Flow for ExaScale — Ulrich Rüde 32
Multi-Physics Simulations for Particulate Flows
Ladd, A. J. (1994). Numerical simulations of particulate suspensions via a discretized Boltzmann equation. Part 1. Theoretical foundation. Journal of Fluid Mechanics, 271(1), 285-309.
Parallel coupling with waLBerla and PE
Tenneti, S., & Subramaniam, S. (2014). Particle-resolved direct numerical simulation for gas-solid flow model development. Annual Review of Fluid Mechanics, 46, 199-230.
Bartuschat, D., Fischermeier, E., Gustavsson, K., & UR (2016). Two computational models for simulating the tumbling motion of elongated particles in fluids. Computers & Fluids, 127, 17-35.
Coupled Flow for ExaScale — Ulrich Rüde 33
Fluid-Structure Interaction
direct simulation of Particle Laden Flows (4-way coupling)
Götz, J., Iglberger, K., Stürmer, M., & UR (2010). Direct numerical simulation of particulate flows on 294912 processor cores. In Proceedings of Supercomputing 2010, IEEE Computer Society. Götz, J., Iglberger, K., Feichtinger, C., Donath, S., & UR (2010). Coupling multibody dynamics and computational fluid dynamics on 8192 processor cores. Parallel Computing, 36(2), 142-151.
coupled problems at scale - Ulrich Rüde 34
Mapping Moving Obstacles into the LBM Fluid Grid: An Example
(Legend: fluid cell, no-slip cell, velocity/pressure cell, acceleration cell)
Coupled Flow for ExaScale — Ulrich Rüde 35
Mapping Moving Obstacles into the LBM Fluid Grid: An Example (2)
(Legend: cells with state change; PDFs acting as force from fluid to particle; momentum transfer from particle to fluid; cell state changes between particle and fluid.)
Coupled Flow for ExaScale — Ulrich Rüde 36
Comparison between coupling methods: momentum exchange method; alternative: partially saturated cells method (Noble and Torczynski).
Algorithm: LBM with momentum exchange coupling
Initially map particles into the fluid domain.
for each time step do
  for each LBM subcycle do
    Perform LBM collision step
    Apply boundary conditions
    Perform LBM stream step
    Calculate hydrodynamic forces on particles
  end for
  Average forces on particles over the LBM subcycles
  Obtain total force and torque on particles
  for each multi-body solver subcycle do
    Perform rigid-body solver step (collision detection, collision response, time integration)
  end for
  Update particle mapping and reconstruct PDFs if necessary
end for
coupled problems at scale - Ulrich Rüde 37
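The coupling algorithm above maps to a plain time loop. In the sketch below all types and functions (LbmGrid, RigidBodySolver, Particle, and their members) are invented placeholders that stand in for the waLBerla and PE interfaces; it only illustrates the control flow.

#include <cstddef>
#include <vector>

struct Particle { double x = 0, y = 0, z = 0; };
struct Force { double fx = 0, fy = 0, fz = 0, tx = 0, ty = 0, tz = 0; };

struct LbmGrid {                                           // placeholder interface
  void map(const std::vector<Particle>&) {}                // flag particle-covered cells
  void collide() {}  void applyBoundaryConditions() {}  void stream() {}
  std::vector<Force> hydrodynamicForces(const std::vector<Particle>& p) {
    return std::vector<Force>(p.size());                   // momentum exchange would go here
  }
  void reconstructPdfs(const std::vector<Particle>&) {}    // refill uncovered cells
};
struct RigidBodySolver {                                   // placeholder interface
  void step(std::vector<Particle>&, const std::vector<Force>&, double) {}
};

void run(LbmGrid& lbm, RigidBodySolver& pe, std::vector<Particle>& particles,
         int timeSteps, int lbmSubcycles, int peSubcycles, double dt) {
  lbm.map(particles);                                      // initial mapping
  for (int t = 0; t < timeSteps; ++t) {
    std::vector<Force> avg(particles.size(), Force{});
    for (int s = 0; s < lbmSubcycles; ++s) {               // LBM subcycles
      lbm.collide();
      lbm.applyBoundaryConditions();
      lbm.stream();
      std::vector<Force> f = lbm.hydrodynamicForces(particles);
      for (std::size_t i = 0; i < avg.size(); ++i) {       // average over subcycles
        avg[i].fx += f[i].fx / lbmSubcycles;  avg[i].tx += f[i].tx / lbmSubcycles;
        avg[i].fy += f[i].fy / lbmSubcycles;  avg[i].ty += f[i].ty / lbmSubcycles;
        avg[i].fz += f[i].fz / lbmSubcycles;  avg[i].tz += f[i].tz / lbmSubcycles;
      }
    }
    for (int s = 0; s < peSubcycles; ++s)                  // rigid body subcycles
      pe.step(particles, avg, dt / peSubcycles);
    lbm.map(particles);                                    // remap moved particles
    lbm.reconstructPdfs(particles);
  }
}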
Comparison between coupling methods
Example: a single moving particle; evaluation of the oscillating oblique regime (Re = 263, Ga = 190), which is correctly represented by the momentum exchange method (less well by the Noble and Torczynski method).
Different coupling variants: first order bounce back; second order central linear interpolation (CLI).
Cross validation with the spectral method of Uhlmann & Dušek.
(Figure 4: contours of the projected relative velocity for case B-CLI-48 (Ga = 178.46); the red line outlines the recirculation area, used for the calculation of the recirculation length L_r; visualization of the recirculation length in the particle wake.)
Uhlmann, M., & Dušek, J. (2014). The motion of a single heavy sphere in ambient fluid: a benchmark for interface-resolved particulate flow simulations with significant relative velocities. International Journal of Multiphase Flow, 59.
Noble, D. R., & Torczynski, J. R. (1998). A lattice-Boltzmann method for partially saturated computational cells. International Journal of Modern Physics C.
Rettinger, C., & Rüde, U. (2017). A comparative study of fluid-particle coupling methods for fully resolved lattice Boltzmann simulations. Computers & Fluids.
LBM for Multiphysics — Ulrich Rüde 38
Simulation of suspended particle transport
Preclik, T., Schruff, T., Frings, R., & Rüde, U. (2017, August). Fully Resolved Simulations of Dune Formation in Riverbeds. In High Performance Computing: 32nd International Conference, ISC High Performance 2017, Frankfurt, Germany, June 18-22, 2017, Proceedings (Vol. 10266, p. 3). Springer.
Simulation und Vorhersagbarkeit — Ulrich Rüde 39
Towards the first principles simulation of fluidized beds
Simulation vs. experiment: model reactors of 2 cm x 10 cm x 40 cm; joint work with V. V. Buwa and B. Singh, IIT Delhi.
Deen, N. G., Annaland, M. V. S., Van der Hoef, M. A., & Kuipers, J. A. M. (2007). Review of discrete particle modeling of fluidized beds. Chemical Engineering Science, 62(1), 28-44.
Simulation und Vorhersagbarkeit — Ulrich Rüde 40
Building Block IV (electrostatics)
Positively and negatively charged particles in a flow subjected to a transversal electric field
Direct numerical simulation of charged particles in flow
Masilamani, K., Ganguly, S., Feichtinger, C., & UR (2011). Hybrid lattice-Boltzmann and finite-difference simulation of electroosmotic flow in a microchannel. Fluid Dynamics Research, 43(2), 025501.
Bartuschat, D., Ritter, D., & UR (2012). Parallel multigrid for electrokinetic simulation in particle-fluid flows. In High Performance Computing and Simulation (HPCS), 2012 International Conference on (pp. 374-380). IEEE.
Bartuschat, D., & UR (2015). Parallel multiphysics simulations of charged particles in microfluidic flows. Journal of Computational Science, 8, 1-19.
Coupled Flow for ExaScale — Ulrich Rüde 41
6-way coupling
(Diagram: in every time step, the LBM treats its boundary conditions and performs the stream-collide step, yielding the hydrodynamic forces; the finite-volume multigrid solver treats its boundary conditions and iterates V-cycles on the charge distribution, yielding the electrostatic forces; a lubrication correction computes correction forces from the object distances; Newtonian mechanics with collision response integrates the object motion, which in turn updates the velocity BCs and the particle mapping for the next step.)
coupled problems at scale - Ulrich Rüde 42
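One ingredient of this coupling loop, the electrostatic force on a charged particle, can be sketched as q E with E = -grad(φ), where the potential φ comes from the finite-volume multigrid solve and the gradient is approximated by central differences at the cell containing the particle. This routine is illustrative only (the actual waLBerla sweep interpolates over the particle volume); indices i, j, k are assumed to lie in the interior of the grid.

#include <array>
#include <cstddef>
#include <vector>

// Electric field by central differences on a uniform nx*ny grid layout with
// spacing h (flat row-major storage, k-major last); force F = charge * E.
std::array<double, 3> electrostatic_force(const std::vector<double>& phi,
                                          std::size_t nx, std::size_t ny,
                                          double h, double charge,
                                          std::size_t i, std::size_t j, std::size_t k) {
  auto idx = [=](std::size_t a, std::size_t b, std::size_t c) {
    return (c * ny + b) * nx + a;
  };
  const double ex = -(phi[idx(i + 1, j, k)] - phi[idx(i - 1, j, k)]) / (2.0 * h);
  const double ey = -(phi[idx(i, j + 1, k)] - phi[idx(i, j - 1, k)]) / (2.0 * h);
  const double ez = -(phi[idx(i, j, k + 1)] - phi[idx(i, j, k - 1)]) / (2.0 * h);
  return {charge * ex, charge * ey, charge * ez};
}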
Separation experiment
240 time steps of the fully 6-way coupled simulation take about 400 s on SuperMUC; weak scaling up to 32 768 cores with 7.1 million particles.
The required number of coarse-grid CG iterations scales with the diameter of the problem (Gmeiner et al. 2014), according to the growth of the condition number (Shewchuk 1994): doubling the domain in all three dimensions approximately doubles the number of CG iterations. However, when doubling the problem size, the CG iterations sometimes stay constant or have to be increased; this results from different shares of Neumann and Dirichlet BCs on the boundary. Whenever the relative proportion of Neumann BCs increases, convergence deteriorates and more CG iterations are necessary.
The runtimes of all parts of the algorithm are shown in Fig. 13 for different problem sizes, indicating their shares of the total runtime; the diagram is based on the maximal (for MG, LBM, pe) or average (others) runtimes of the individual sweeps among all processes. The sweeps whose cost grows with the problem size, mainly due to increasing MPI communication, are LBM, MG, pe, and PtCm; LBM and MG take up more than 75% of the total time. The sweeps that scale perfectly (HydrF, LubrC, Map, SetRHS, and ElectF) are summarized as 'Oth'. For longer simulation times the particles attracted by the bottom wall are no longer evenly distributed, possibly causing load imbalances; however, they hardly affect the overall performance. For the simulation for the animation, the relative share of the lubrication correction is below 0.1% and each other sweep of 'Oth' is well below 4% of the total runtime.
Overall, the coupled multiphysics algorithm achieves 83% parallel efficiency on 2048 nodes. Since most time is spent executing LBM and MG, Fig. 14 displays their parallel performance for different numbers of nodes: on 2048 nodes, MG executes 121,083 MLUPS, corresponding to a parallel efficiency of 64%, and the LBM performs 95,372 MFLUPS with 91% parallel efficiency.
Figure 13: Runtimes of charged particle algorithm sweeps for 240 time steps on increasing numbers of nodes (sweeps: LBM, Map, Lubr, HydrF, pe, MG, SetRHS, PtCm, ElectF).
Figure 14: Weak scaling performance of the MG and LBM sweeps for 240 time steps (10^3 MLUPS for MG, 10^3 MFLUPS for LBM).
coupled problems at scale - Ulrich Rüde 43
Building Block V
Volume of Fluids Method for Free Surface Flows
Joint work with Regina Ammer, Simon Bogner, Martin Bauer, Daniela Anderl, Nils Thürey, Stefan Donath, Thomas Pohl, C. Körner, A. Delgado.
Körner, C., Thies, M., Hofmann, T., Thürey, N., & UR (2005). Lattice Boltzmann model for free surface flow for modeling foaming. Journal of Statistical Physics, 121(1-2), 179-196.
Donath, S., Feichtinger, C., Pohl, T., Götz, J., & UR (2010). A parallel free surface lattice Boltzmann method for large-scale applications. Parallel Computational Fluid Dynamics: Recent Advances and Future Directions, 318.
Anderl, D., Bauer, M., Rauh, C., UR, & Delgado, A. (2014). Numerical simulation of adsorption and bubble interaction in protein foams using a lattice Boltzmann method. Food & Function, 5(4), 755-763.
Coupled Flow for ExaScale — Ulrich Rüde 44
Free Surface Flows: a volume-of-fluids like approach
Flag field: compute only in the fluid
Special "free surface" conditions in interface cells
Reconstruction of the curvature for surface tension
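The flag field and interface handling can be sketched by the fill-level bookkeeping of a single interface cell: mass exchanged with the neighbors changes the fill level, and cells that run full or empty are converted. The states, thresholds, and names below are illustrative assumptions, not the waLBerla free-surface implementation.

#include <vector>

enum class CellState { Gas, Interface, Fluid };

struct FreeSurfaceCell {
  CellState state = CellState::Gas;
  double mass = 0.0;   // fluid mass contained in the cell
  double rho  = 1.0;   // local density from the LBM populations
};

// Update the fill level of an interface cell after one LBM step and convert it
// when it runs full or empty. 'dMass' is the net mass streamed into the cell
// across its interface links during this time step.
void update_interface_cell(FreeSurfaceCell& c, double dMass) {
  if (c.state != CellState::Interface) return;
  c.mass += dMass;
  const double fill = c.mass / c.rho;        // fill level, nominally in [0, 1]
  const double eps = 1.0e-3;                 // hysteresis against flickering
  if (fill >= 1.0 + eps) c.state = CellState::Fluid;   // cell is full
  else if (fill <= -eps) c.state = CellState::Gas;     // cell is empty
  // Conversions require fixing up the neighborhood (new interface cells must be
  // created so that the interface layer stays closed); that bookkeeping is omitted.
}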
coupled problems at scale - Ulrich Rüde 45
Simulation for hygiene products (for Procter & Gamble): capillary pressure, inclination, surface tension, contact angle
coupled problems at scale - Ulrich Rüde 46
Free surface and rigid objects
Bogner, S., & Rüde, U. (2013). Simulation of floating bodies with the lattice Boltzmann method. Computers & Mathematics with Applications, 65(6), 901-913. Bogner, S., Ammer, R., & Rüde, U. (2015). Boundary conditions for free interfaces with the lattice Boltzmann method. Journal of Computational Physics, 297, 1-12.
coupled problems at scale - Ulrich Rüde 47
Additive Manufacturing: Fast Electron Beam Melting
Bikas, H., Stavropoulos, P., & Chryssolouris, G. (2015). Additive manufacturing methods and modelling approaches: a critical review. The International Journal of Advanced Manufacturing Technology, 1-17.
Klassen, A., Scharowsky, T., & Körner, C. (2014). Evaporation model for beam based additive manufacturing using free surface lattice Boltzmann methods. Journal of Physics D: Applied Physics, 47(27), 275303.
Körner, C., Thies, M., Hofmann, T., Thürey, N., & UR (2005). Lattice Boltzmann model for free surface flow for modeling foaming. Journal of Statistical Physics, 121(1-2), 179-196.
Coupled Flow for ExaScale — Ulrich Rüde 48
Motivating Example: Simulation of the Electron Beam Melting Process (Additive Manufacturing)
EU project FastEBM: ARCAM (Gothenburg), TWI (Cambridge), WTM (FAU), ZISC (FAU)
Generation of the powder bed; energy transfer by the electron beam (penetration depth, heat transfer); flow dynamics (melting, solidification, melt flow, surface tension, wetting, capillary forces, contact angles)
Ammer, R., Markl, M., Ljungblad, U., Körner, C., & UR (2014). Simulating fast electron beam melting with a parallel thermal free surface lattice Boltzmann method. Computers & Mathematics with Applications, 67(2), 318-330.
Ammer, R., UR, Markl, M., Jüchter, V., & Körner, C. (2014). Validation experiments for LBM simulations of electron beam melting. International Journal of Modern Physics C.
Coupled Flow for ExaScale — Ulrich Rüde 49
Simulation of Electron Beam Melting
High speed camera shows melting step for manufacturing a hollow cylinder
Simulating powder bed generation using the PE framework
WaLBerla Simulation
Coupled Flow for ExaScale — Ulrich Rüde 50
CSE research is done by teams
Harald Köstler, Florian Schornbaum, Christian Godenschwager, Sebastian Kuckuk, Kristina Pickl, Christoph Rettinger, Regina Ammer, Dominik Bartuschat, Simon Bogner, Martin Bauer
Coupled Flow for ExaScale — Ulrich Rüde 51
Towards fully resolved 3-phase systems
coupled problems at scale - Ulrich Rüde 52