Efficient Multiscale Platelets Modeling Using Supercomputers
Na Zhang
Advisor: Professor Yuefan Deng
Department of Applied Mathematics and Statistics, Stony Brook University
Computational Complexities
Motivation HPC Matters, Now More than Ever
Performance Results on Supercomputers
Time (s)
Thrombosis Burden!
Cardiovascular Devices
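Case     # of Platelets   # of Particles   Dimensions
Exp-S    1                680,718          45×90×45
Exp-M    4                2,722,872        90×90×90
Exp-L    16               10,891,488       180×90×180
[Figure: modeling methods arranged by time scale (s) and space scale (m), with axis ticks between 10^-15 and 10^0: Quantum; Classical MD, MC; CGMD; Mesoscopic (DPD, LBM, BD); Continuum CFD; Experiments]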
CaseA CaseB CaseC CaseD CaseE STS
Space (m)
Complexity II: Complicated Model and Force Fields
Source: Thrombogenicity Potential of Mechanical Heart Valves Simulations, Biofluids Laboratory, Department of Biomedical Engineering, Stony Brook University
The objective of this project is to efficiently simulate flow-induced platelet activation in order to better understand thrombosis formation mechanisms. The methodologies include:
Mathematical modeling of viscous blood flows and human platelet cells
Algorithmic acceleration
Multiscale coupling methods
Heterogeneous computing
Data analysis of thermodynamic properties
Visualization
Categories        Single Platelet           Multiple Platelets
In Vacuum         ~0.14 million particles   Complex interactions among platelets
In Blood Plasma   ~0.6 million particles    ~2.7 million particles for 4 platelets flipping in blood plasma; ~10.9 million particles for 16 platelets flipping in blood plasma; > 50 million particles for 100 platelets in blood plasma
Multiscale Fluid-Platelet Models
Varying MTS Jump Factors: Time steps for each scale
Complexity I: Disparate Temporal and Spatial Scales
Source: Jackson et al., Dynamics of Platelet Thrombus Formation, Journal of Thrombosis and Haemostasis, 2009
Varying Problem Sizes:
In Blood Vessels                          Many types of blood cells and complex interactions among those cells
With Shear Stresses & Thermo Conditions   Much more complex inputs and outputs control
Configurations           CaseA    CaseB    CaseC    CaseD    CaseE    STS
Δt (×10^-6)              500.0    1000.0   500.0    500.0    1000.0   1.0
K1                       1        1        1        1        1        1
K2                       10       10       10       10       10       1
K3                       20       20       10       5        10       1
DPD (Δt4 ×10^-6)         500.0    1000.0   500.0    500.0    1000.0   1.0
DPD-CGMD (Δt3 ×10^-6)    25.0     50.0     50.0     100.0    100.0    1.0
CGMD-NB (Δt2 ×10^-6)     2.5      5.0      5.0      10.0     10.0     1.0
CGMD-BD (Δt1 ×10^-6)     2.5      5.0      5.0      10.0     10.0     1.0
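The time steps in this table are consistent with each finer level using the next coarser step divided by a jump factor (Δt3 = Δt4/K3, Δt2 = Δt3/K2, Δt1 = Δt2/K1). That relationship is inferred from the tabulated values rather than stated on the poster, and the helper name `level_timesteps` is hypothetical. A minimal Python sketch under that assumption:

```python
# Hypothetical helper: derive per-level time steps from the outermost (DPD)
# step and the jump factors K1, K2, K3. The division rule is an assumption
# inferred from the table, not a formula quoted from the poster.
def level_timesteps(dt_dpd, k1, k2, k3):
    """Return (dt_bd, dt_nb, dt_interface, dt_dpd) in the units of dt_dpd."""
    dt_interface = dt_dpd / k3   # DPD-CGMD interface level (dt3)
    dt_nb = dt_interface / k2    # CGMD non-bonded level (dt2)
    dt_bd = dt_nb / k1           # CGMD bonded level (dt1)
    return dt_bd, dt_nb, dt_interface, dt_dpd


# Configurations from the table: (dt, K1, K2, K3), with dt in units of 1e-6.
CONFIGS = {
    "CaseA": (500.0, 1, 10, 20),
    "CaseB": (1000.0, 1, 10, 20),
    "CaseC": (500.0, 1, 10, 10),
    "CaseD": (500.0, 1, 10, 5),
    "CaseE": (1000.0, 1, 10, 10),
    "STS": (1.0, 1, 1, 1),
}

if __name__ == "__main__":
    for name, (dt, k1, k2, k3) in CONFIGS.items():
        print(name, level_timesteps(dt, k1, k2, k3))
```

Under this reading, the STS column is the single-time-stepping baseline in which all four levels share one step.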
Varying Test Systems (CPU-only versus High-Density GPGPU Server):
Tianhe-2
CS-Storm 8 K40
CS-Storm 16 K80
Complexity III: Large Demand of On-the-fly Analysis
Speedup Strategy I-Multiscale Multiple Time Stepping Algorithm
Scales              Nanoscale                                   Mesoscale
Simulation Domain   Platelet Cell                               Blood Plasma
Methods             Coarse-Grained Molecular Dynamics (CGMD)    Dissipative Particle Dynamics (DPD)
Time Step           10~100 fs                                   0.01~1 μs
Length              1~20 Å                                      0.1~1 μm
Speeds (in units of day/μs) of the no_mts and mts algorithms on (1) Tianhe-2 (2) CS-Storm with eight K40 cards (3) CS-Storm with sixteen K80 cards
Various vascular geometries simulated by the Dissipative Particle Dynamics method
Source: Joao and Chao 2012
Model Abstraction
Performance improvement of the mts over the no_mts algorithm on (1) Tianhe-2 (2) CS-Storm with eight K40 cards (3) CS-Storm with sixteen K80 cards
Multiple Scales in the Model
Physical structures and constituents of the multiscale model of human platelets
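Force Fields
$$
V(r) =
\underbrace{\sum k_b (r - r_0)^2 + \sum k_\theta (\theta - \theta_0)^2}_{\text{Bond and Angle Terms}}
+ \underbrace{\sum k_\phi \left[1 + \cos(n\phi - \delta)\right] + \sum \frac{q_i q_j}{4\pi\varepsilon_0 r}}_{\text{Dihedral and Electrostatic}}
+ \underbrace{\sum 4\varepsilon \left[\left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6}\right]}_{\text{Van Der Waals (L-J)}}
+ \underbrace{\sum \epsilon \left[\exp\left(\alpha\left(1 - \frac{r}{R}\right)\right) - 2\exp\left(\frac{\alpha}{2}\left(1 - \frac{r}{R}\right)\right)\right]}_{\text{Modified Morse}}
$$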
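The individual force-field terms named here (bond and angle, dihedral, electrostatic, Lennard-Jones, modified Morse) can be evaluated one by one. The following Python sketch is illustrative only: the function names are hypothetical, and the Morse form ε[exp(α(1 − r/R)) − 2 exp((α/2)(1 − r/R))] is an assumption about how the poster's garbled expression reads.

```python
import math


def harmonic(k, x, x0):
    """Bond or angle term: k * (x - x0)^2."""
    return k * (x - x0) ** 2


def dihedral(k_phi, n, phi, delta):
    """Dihedral term: k_phi * (1 + cos(n*phi - delta))."""
    return k_phi * (1.0 + math.cos(n * phi - delta))


def coulomb(qi, qj, r, eps0=8.8541878128e-12):
    """Electrostatic term: qi*qj / (4*pi*eps0*r), SI units by default."""
    return qi * qj / (4.0 * math.pi * eps0 * r)


def lennard_jones(eps, sigma, r):
    """Van der Waals (L-J) term: 4*eps*((sigma/r)^12 - (sigma/r)^6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)


def morse(eps, alpha, big_r, r):
    """Assumed modified Morse form; its minimum of -eps sits at r = big_r."""
    x = 1.0 - r / big_r
    return eps * (math.exp(alpha * x) - 2.0 * math.exp(0.5 * alpha * x))
```

In this form both the Lennard-Jones and the Morse terms have a well depth equal to their ε parameter, which is one reason such parameters can be tuned against measured physical properties.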
V is the total potential energy of each particle that makes up the platelet. It includes a classical MD potential for describing the actin filament structure, a modified Morse potential for describing the viscous cytoplasm, and a CGMD potential for describing the filamentous core and the membrane structures.
Parameterize the undetermined parameters to match physical properties:
Properties Considerations
Platelet Cell Size
Cell Plasma Compressibility
Membrane Young’s Modulus
Cell Plasma Viscosity
Cell Plasma Pressure
Membrane Shear Modulus
Stretching Response
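DPD force field (Groot and Warren 1997):
$$\mathbf{F}_{ij} = \mathbf{F}^{C}_{ij} + \mathbf{F}^{D}_{ij} + \mathbf{F}^{R}_{ij}$$
Conservative Term: $\mathbf{F}^{C}_{ij} = \alpha\,\omega(r_{ij})\,\mathbf{e}_{ij}$
Dissipative Term: $\mathbf{F}^{D}_{ij} = -\gamma\,\omega^{2}(r_{ij})\,(\mathbf{e}_{ij}\cdot\mathbf{v}_{ij})\,\mathbf{e}_{ij}$
Random Term: $\mathbf{F}^{R}_{ij} = \sigma\,\omega(r_{ij})\,\zeta_{ij}\,\mathbf{e}_{ij}$
Where $\mathbf{r}_{ij} = \mathbf{r}_i - \mathbf{r}_j$, $r_{ij} = |\mathbf{r}_{ij}|$, $\mathbf{e}_{ij} = \mathbf{r}_{ij}/r_{ij}$, and $\omega(r_{ij}) = 1 - r_{ij}/r_c$ for $r_{ij} \le r_c$; otherwise $\omega(r_{ij}) = 0$.
The $\zeta_{ij}$ are symmetric random variables with zero mean and unit variance, uncorrelated for different pairs of particles and different times.

Four-level MTS algorithm (level label in parentheses):
► $\mathbf{r}_p \leftarrow \mathbf{r}_p + \mathbf{v}_p\,\Delta t_p + \mathbf{F}_P\,\Delta t_p^{2}/2$ (DPD)
► $\tilde{\mathbf{v}}_p \leftarrow \mathbf{v}_p + \lambda_p\,\mathbf{F}_P\,\Delta t_p$ (DPD)
► $\tilde{\mathbf{F}}_P \leftarrow \mathbf{F}_P(\mathbf{r}_p, \tilde{\mathbf{v}}_p)$ (DPD)
► For $l_3 = 0 \dots K_3 - 1$
    ► $\mathbf{r}_m \leftarrow \mathbf{r}_m + \mathbf{v}_m\,\Delta t_m + \mathbf{F}_M\,\Delta t_m^{2}/2$ (DPD-MD)
    ► $\tilde{\mathbf{v}}_m \leftarrow \mathbf{v}_m + \lambda_m\,\mathbf{F}_M\,\Delta t_m$ (DPD-MD)
    ► $\tilde{\mathbf{F}}_M \leftarrow \mathbf{F}_M(\mathbf{r}_m, \tilde{\mathbf{v}}_m)$ (DPD-MD)
    ► For $l_2 = 0 \dots K_2 - 1$ (MD-NB)
        ► $\mathbf{r}_n \leftarrow \mathbf{r}_n + \mathbf{v}_n\,\Delta t_n + \mathbf{F}_N\,\Delta t_n^{2}/2$ (MD-NB)
        ► $\mathbf{v}_n \leftarrow \mathbf{v}_n + \mathbf{F}_N\,\Delta t_n$ (MD-NB)
        ► $\mathbf{F}_N \leftarrow \mathbf{F}_N(\mathbf{r}_n)$ (MD-NB)
        ► For $l_1 = 0 \dots K_1 - 1$ (MD-BD)
            ► $\mathbf{r}_b \leftarrow \mathbf{r}_b + \mathbf{v}_b\,\Delta t_b + \mathbf{F}_B\,\Delta t_b^{2}/2$ (MD-BD)
            ► $\mathbf{v}_b \leftarrow \mathbf{v}_b + \mathbf{F}_B\,\Delta t_b$ (MD-BD)
            ► $\mathbf{F}_B \leftarrow \mathbf{F}_B(\mathbf{r}_b)$ (MD-BD)
    ► $\mathbf{v}_m \leftarrow \mathbf{v}_m + (\mathbf{F}_M + \tilde{\mathbf{F}}_M)\,\Delta t_m/2$ (DPD-MD)
    ► $\mathbf{F}_M \leftarrow \tilde{\mathbf{F}}_M$ (DPD-MD)
► $\mathbf{v}_p \leftarrow \mathbf{v}_p + (\mathbf{F}_P + \tilde{\mathbf{F}}_P)\,\Delta t_p/2$ (DPD)
► $\mathbf{F}_P \leftarrow \tilde{\mathbf{F}}_P$ (DPD)

[Flowchart: Respa() call sequence. Boxes: init(); setup() or setup_minimal(); run(int n); Largest timestep; Recursive for four levels: level_Bond, level_lj, level_interface, level_dpd; Modified verlet half step integration (If ilevel=level_dpd or level_interface); post_integrate_respa(); Rebuild the neighbors if necessary; Forward comm; force_clear(); fix->pre_force; bond->compute(); angle->compute(); pair->compute(); pair->compute_lj(); pair->compute_interface(); pair->compute_dpd(); Reverse comm; fix->post_force(); Classical verlet final step integration (If ilevel=level_bond or level_lj); final_integrate_respa(); fix->end_of_step()]

Viscosity
Compressibility
Density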
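The DPD pairwise force in this section (conservative, dissipative, and random terms sharing the weight ω(r) = 1 − r/r_c) can be written as a single function. This is a minimal Python sketch for illustration, not the poster's LAMMPS implementation: `dpd_pair_force` and its parameters are hypothetical, v_ij is assumed to be v_i − v_j, and the random term follows the poster's form without any time-step scaling factor.

```python
import math
import random


def weight(r, rc):
    """Weight function: 1 - r/rc inside the cutoff, 0 outside."""
    return 1.0 - r / rc if r <= rc else 0.0


def dpd_pair_force(ri, rj, vi, vj, alpha, gamma, sigma, rc, zeta=None):
    """Force on particle i due to particle j, as a 3-component list.

    zeta is the random number for this pair and step; when omitted, a
    zero-mean, unit-variance Gaussian sample is drawn.
    """
    rij = [a - b for a, b in zip(ri, rj)]
    r = math.sqrt(sum(c * c for c in rij))
    if r == 0.0 or r > rc:
        return [0.0, 0.0, 0.0]
    eij = [c / r for c in rij]                 # unit vector from j to i
    vij = [a - b for a, b in zip(vi, vj)]      # assumed: v_i - v_j
    w = weight(r, rc)
    if zeta is None:
        zeta = random.gauss(0.0, 1.0)
    e_dot_v = sum(e * v for e, v in zip(eij, vij))
    # Conservative + dissipative + random magnitudes, all directed along e_ij.
    magnitude = alpha * w - gamma * w * w * e_dot_v + sigma * w * zeta
    return [magnitude * e for e in eij]
```

Passing `zeta` explicitly keeps the function deterministic for testing; in a real simulation the same value would be reused for the pair (j, i) so that the random forces are equal and opposite.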
Three precision options: single, mixed, and double
P. Zhang, N. Zhang, Y. Deng, and D. Bluestein, "A Multiple Time Stepping Algorithm for Efficient Multiscale Modeling of Platelets Flowing in Blood Plasma", Journal of Computational Physics, vol. 284, pp. 668-686, 01/2015.
P. Zhang, C. Gao, N. Zhang, M. J. Slepian, Y. Deng, and D. Bluestein, "Multiscale Particle-Based Modeling of Flowing Platelets in Blood Plasma Using Dissipative Particle Dynamics and Coarse Grained Molecular Dynamics", Cellular and Molecular Bioengineering, vol. 7, pp. 552-574, 12/2014.
Membrane velocity distribution of a platelet as it flips in Couette flow
No-slip boundary condition for DPD flows
Above 4-level multiscale MTS algorithm
DPD prediction and correction time integration
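The prediction and correction integration and the nested time stepping listed here can be illustrated in a few lines. This is a Python sketch under stated assumptions: `modified_verlet_step` follows the Groot and Warren style update (position, predicted velocity, force from the predicted state, corrected velocity), the loop nesting in `count_substeps` is an assumed reading of the four-level scheme, and both function names are hypothetical.

```python
def modified_verlet_step(r, v, f, force, dt, lam=0.5, inner=None):
    """One prediction-correction step for a scalar coordinate.

    force(r, v) may depend on velocity, as DPD forces do. inner, if given,
    is called between prediction and correction to advance finer levels.
    """
    r_new = r + v * dt + 0.5 * f * dt * dt   # position update
    v_pred = v + lam * f * dt                # velocity prediction
    f_new = force(r_new, v_pred)             # force from the predicted state
    if inner is not None:
        inner()                              # sub-cycle the finer levels
    v_new = v + 0.5 * (f + f_new) * dt       # velocity correction
    return r_new, v_new, f_new


def count_substeps(k1, k2, k3):
    """Count how often each level advances during one outermost DPD step."""
    counts = {"dpd": 1, "interface": 0, "nb": 0, "bond": 0}
    for _ in range(k3):                      # DPD-MD interface level
        counts["interface"] += 1
        for _ in range(k2):                  # MD non-bonded level
            counts["nb"] += 1
            for _ in range(k1):              # MD bonded level
                counts["bond"] += 1
    return counts


if __name__ == "__main__":
    # Jump factors (K1, K2, K3) = (1, 10, 20) give 20 interface steps and
    # 200 non-bonded and bonded steps per DPD step.
    print(count_substeps(1, 10, 20))
```

The counts show where the savings come from: the expensive outer-level forces are evaluated once while only the cheap inner levels are sub-cycled.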
N. Zhang, P. Zhang, W. Kang, D. Bluestein, and Y. Deng, "Parameterizing the Morse potential for coarse-grained modeling of blood plasma", Journal of Computational Physics, vol. 257, pp. 726-736, 01/2014.
pair->compute_lj()          pair->compute_lj_gpu()
pair->compute_interface()   pair->compute_interface_gpu()
pair->compute_dpd()         pair->compute_dpd_gpu()
Acknowledgements
I would like to thank my team members Dr. Peng Zhang, Dr. Seetha Pothapragada, Chao Gao, and Li Zhang for their help, and Prof. Danny Bluestein for his support. This research was made possible by grants from the National Institutes of Health: NHLBI R21 HL096930-01A2 (DB) and NIBIB Quantum Award Implementation Phase II-U01 EB012487-0 (DB). The tests on Tianhe-2 used an award of 20K computing hours from the National Supercomputer Center in Guangzhou, China (NSCC-GZ).
Tailored Modifications:
With combined algorithmic and hardware accelerations, we can efficiently simulate millisecond-scale (1 ms) hematology at the resolutions of nanoscale platelets and mesoscale bio-flows using millions of particles. The rule of thumb is to balance speed and accuracy for an optimal MTS scheme, and to balance computation and communication for an optimal load-balancing scheme between accelerators and CPUs. Future work involves efforts to reduce communication overheads and to simulate more complicated multiscale phenomena.
N. Zhang, P. Zhang, L. Zhang, X. Zhu, L. Huang, and Y. Deng, “Performance Examinations of Multiple Time-Stepping Algorithms on Stampede Supercomputer”, XSEDE15 Technical Paper Program, St. Louis, MO, 07/2015.
Particle data is exchanged between host and device every step.
Summary and Future Work
References and Related Published Work
output->write() (If any)
LAMMPS (S. Plimpton et al.) and LAMMPS GPU Package (M. Brown et al.)
Force evaluations and neighbor list builds can be accelerated.
Ratio of communication over computation on (1) Tianhe-2 (2) CS-Storm with eight K40 cards (3) CS-Storm with sixteen K80 cards
Modified verlet final step integration (If ilevel=level_dpd or level_interface)
Speedup Strategy II-GPGPU Acceleration
Viscous Boundary Layers
DPD-MD
Hybrid force field containing the dissipative and random terms from DPD and the Lennard-Jones potential from MD. It is exploited to mimic friction between platelet membranes and the surrounding blood flow:
$$\mathbf{F}_{ij} = \mathbf{F}^{LJ}_{ij} + \mathbf{F}^{D}_{ij} + \mathbf{F}^{R}_{ij}$$
Parameterize the undetermined parameters to match the platelet's flipping trajectory with the analytical solution (Jeffery’s orbit) in Couette flow. Rotation angle:
$$\phi = \phi(\gamma t) = \operatorname{atan}\left(\frac{1}{r_e}\tan\left(-\gamma t\,\frac{r_e}{r_e^{2} + 1} + \tan^{-1}\left(r_e \tan\phi_0\right)\right)\right)$$
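The analytical rotation angle used for this comparison can be evaluated directly. The Python sketch below assumes the formula reads φ = atan((1/r_e) tan(−γt r_e/(r_e² + 1) + atan(r_e tan φ0))), with r_e the aspect ratio and γ the shear rate; the function names are hypothetical, and the period follows from that expression rather than being quoted from the poster.

```python
import math


def jeffery_angle(t, shear_rate, aspect_ratio, phi0):
    """Rotation angle phi(t) on a Jeffery orbit, principal branch of atan."""
    re = aspect_ratio
    phase = -shear_rate * t * re / (re * re + 1.0) + math.atan(re * math.tan(phi0))
    return math.atan(math.tan(phase) / re)


def jeffery_period(shear_rate, aspect_ratio):
    """Time for the phase to advance by 2*pi: 2*pi*(re + 1/re)/shear_rate."""
    re = aspect_ratio
    return 2.0 * math.pi * (re + 1.0 / re) / shear_rate


if __name__ == "__main__":
    # Sample one period for an oblate particle (aspect ratio 0.25 is an
    # illustrative value, not taken from the poster).
    period = jeffery_period(1.0, 0.25)
    for step in range(5):
        t = step * period / 4.0
        print(round(t, 3), round(jeffery_angle(t, 1.0, 0.25, 0.3), 6))
```

Simulated flipping angles sampled at the same times can then be compared against these values to tune the interface parameters.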
For 𝑙3 = 0 … 𝐾3 − 1
Parameterize the undetermined parameters and modify boundary conditions to match the physical properties:
initial_integrate_respa()
►
Random Term
Classical verlet half step integration (If ilevel=level_bond or level_lj)
env_set()
Reynolds Number
Cell Plasma Density
In addition, we need to consider computational feasibility and the ability of the platelet model to become activated.
Spatial Interfacing
K1, K2, K3 are "Jump Factors"
Couette flow is generated by applying two opposing body forces to all boundary particles; this emulates a uniform shear stress, and a platelet can flip inside such an environment. It is the simplest fluid-platelet simulation setup.
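The Couette setup described here, with two opposing body forces applied to the boundary particles, can be sketched as follows. This is an illustrative Python sketch only: the slab geometry (boundary layers of thickness `wall_thickness` at the bottom and top in y), the driving direction (±x), and the function name are all assumptions rather than details given on the poster.

```python
def couette_body_forces(positions, y_min, y_max, wall_thickness, f_drive):
    """Return one (fx, fy, fz) body force per particle.

    Particles in the upper boundary layer are pushed along +x and those in
    the lower boundary layer along -x; bulk particles receive no driving force.
    """
    forces = []
    for (_, y, _) in positions:
        if y >= y_max - wall_thickness:
            forces.append((f_drive, 0.0, 0.0))     # upper boundary layer
        elif y <= y_min + wall_thickness:
            forces.append((-f_drive, 0.0, 0.0))    # lower boundary layer
        else:
            forces.append((0.0, 0.0, 0.0))         # bulk fluid
    return forces


if __name__ == "__main__":
    sample = [(0.0, 0.1, 0.0), (0.0, 5.0, 0.0), (0.0, 9.9, 0.0)]
    print(couette_body_forces(sample, 0.0, 10.0, 1.0, 2.0))
```

Because the two driving forces are equal and opposite, the net force on the fluid is zero when both layers hold the same number of particles, and the bulk settles into a shear profile between them.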