HPC Overview

Efficient Multiscale Platelets Modeling Using Supercomputers
Na Zhang
Advisor: Professor Yuefan Deng
Department of Applied Mathematics and Statistics, Stony Brook University

Computational Complexities

Motivation HPC Matters, Now More than Ever

Performance Results on Supercomputers 

Time (s)

Thrombosis Burden!

Cardiovascular Devices


Case     # of Platelets   # of Particles   Dimensions
Exp-S    1                680,718          45×90×45
Exp-M    4                2,722,872        90×90×90
Exp-L    16               10,891,488       180×90×180



[Figure: simulation methods ordered by spatial scale (Space (m), 10^-12 to 10^0) and temporal scale: Quantum; Classical MD, MC; CGMD; Mesoscopic DPD, LBM, BD]

CaseA CaseB CaseC CaseD CaseE STS

Complexity II: Complicated Model and Force Fields

Source: Thrombogenicity Potential of Mechanical Heart Valves Simulations, Bio fluids Laboratory, Department of Biomedical Engineering. Stony Brook University

The objective of this project is to efficiently simulate flow-induced platelet activation in order to better understand thrombosis formation mechanisms. The methodologies include:
• Mathematical modeling of viscous blood flows and human platelet cells
• Algorithmic acceleration
• Multiscale coupling methods
• Heterogeneous computing
• Data analysis of thermodynamic properties
• Visualization

Categories: Single Platelet, Multiple Platelets

In Vacuum
• Single platelet: ~0.14 million particles
• Multiple platelets: complex interactions among platelets

In Blood Plasma
• Single platelet: ~0.6 million particles
• Multiple platelets: ~2.7 million particles for 4 platelets flipping in blood plasma; ~10.9 million particles for 16 platelets flipping in blood plasma; >50 million particles for 100 platelets in blood plasma

Multiscale Fluid-Platelet Models

Varying MTS Jump Factors: Time steps for each scale

Complexity I: Disparate Temporal and Spatial Scales

Source: Jackson et al., Dynamics of Platelet Thrombus Formation, Journal of Thrombosis and Haemostasis, 2009

# of Platelets

Flow

Experiments

Continuum CFD

Varying Problem Sizes:

In Blood Vessels: many types of blood cells and complex interactions among those cells

With Shear Stresses & Thermo Conditions: much more complex control of inputs and outputs



Configurations            CaseA    CaseB    CaseC    CaseD    CaseE    STS
Δt (× 10^-6)              500.0    1000.0   500.0    500.0    1000.0   1.0
K1                        1        1        1        1        1        1
K2                        10       10       10       10       10       1
K3                        20       20       10       5        10       1
DPD (Δt4 × 10^-6)         500.0    1000.0   500.0    500.0    1000.0   1.0
DPD-CGMD (Δt3 × 10^-6)    25.0     50.0     50.0     100.0    100.0    1.0
CGMD-NB (Δt2 × 10^-6)     2.5      5.0      5.0      10.0     10.0     1.0
CGMD-BD (Δt1 × 10^-6)     2.5      5.0      5.0      10.0     10.0     1.0
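The tabulated time steps are consistent with each finer level using the next coarser level's step divided by a jump factor. The sketch below checks that arithmetic in Python. The function name and the mapping (K3 between DPD and DPD-CGMD, K2 between DPD-CGMD and CGMD-NB, K1 between CGMD-NB and CGMD-BD) are assumptions inferred from the numbers in the table, not taken from the authors' code.

```python
def level_time_steps(dt_dpd, k1, k2, k3):
    """Return (DPD, DPD-CGMD, CGMD-NB, CGMD-BD) time steps.

    Assumed relation, inferred from the tabulated values:
    each finer step is the coarser step divided by a jump factor.
    """
    dt_dpd_cgmd = dt_dpd / k3      # interface level
    dt_cgmd_nb = dt_dpd_cgmd / k2  # non-bonded CGMD level
    dt_cgmd_bd = dt_cgmd_nb / k1   # bonded CGMD level
    return dt_dpd, dt_dpd_cgmd, dt_cgmd_nb, dt_cgmd_bd


# (dt, K1, K2, K3) per configuration, all time steps in units of 10^-6
cases = {
    "CaseA": (500.0, 1, 10, 20),
    "CaseB": (1000.0, 1, 10, 20),
    "CaseC": (500.0, 1, 10, 10),
    "CaseD": (500.0, 1, 10, 5),
    "CaseE": (1000.0, 1, 10, 10),
    "STS": (1.0, 1, 1, 1),
}

for name, params in cases.items():
    print(name, level_time_steps(*params))
```

Under this reading, STS (single time stepping) is the special case in which every jump factor is 1 and all four levels share one step.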

Varying Test Systems (CPU-only versus High-Density GPGPU Server):
• Tianhe-2
• CS-Storm 8 K40
• CS-Storm 16 K80

Complexity III: Large Demand of On-the-fly Analysis

Speedup Strategy I-Multiscale Multiple Time Stepping Algorithm

Scales              Nanoscale                                   Mesoscale
Simulation Domain   Platelet Cell                               Blood Plasma
Methods             Coarse-Grained Molecular Dynamics (CGMD)    Dissipative Particle Dynamics (DPD)
Time Step           10~100 fs                                   0.01~1 μs
Length              1~20 Å                                      0.1~1 μm

Speeds (in days/μs) of the no_mts and mts algorithms on (1) Tianhe-2, (2) CS-Storm with eight K40 cards, and (3) CS-Storm with sixteen K80 cards

Various vascular geometries simulated by Dissipative Particle Dynamics method Source: Joao and Chao 2012

Model Abstraction

Perf. Improvement of mts over no_mts algorithms on (1) Tianhe-2 (2) CS-Storm with eight K40 cards (3) CS-Storm with sixteen K80 cards

Multiple Scales in the Model

Physical structures and constituents of the multiscale model of human platelets

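Force Fields

$V(r) = k_b (r - r_0)^2 + k_\theta (\theta - \theta_0)^2$ (Bond and Angle Terms)

$\quad + \; k_\phi \left[ 1 + \cos(n\phi - \delta) \right] + \dfrac{q_i q_j}{4\pi\varepsilon_0 r}$ (Dihedral and Electrostatic)

$\quad + \; 4\varepsilon \left[ \left( \dfrac{\sigma}{r} \right)^{12} - \left( \dfrac{\sigma}{r} \right)^{6} \right]$ (Van der Waals (L-J))

$\quad + \; \epsilon \left[ \exp\left( \alpha \left( 1 - \dfrac{r}{R\mu} \right) \right) - 2 \exp\left( \dfrac{\alpha}{2} \left( 1 - \dfrac{r}{R\mu} \right) \right) \right]$ (Modified Morse)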
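As a hedged illustration of the classical terms in the force field, the following Python sketch evaluates the harmonic bond, harmonic angle, dihedral, and Lennard-Jones energies for a single interaction. The function names and the sample parameter values are illustrative assumptions, and the modified Morse term is left out because its exact form is specific to the authors' model.

```python
import math


def harmonic_bond(r, k_b, r0):
    """Bond term k_b (r - r0)^2."""
    return k_b * (r - r0) ** 2


def harmonic_angle(theta, k_theta, theta0):
    """Angle term k_theta (theta - theta0)^2, angles in radians."""
    return k_theta * (theta - theta0) ** 2


def dihedral(phi, k_phi, n, delta):
    """Dihedral term k_phi [1 + cos(n phi - delta)]."""
    return k_phi * (1.0 + math.cos(n * phi - delta))


def lennard_jones(r, eps, sigma):
    """Van der Waals term 4 eps [(sigma/r)^12 - (sigma/r)^6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)


if __name__ == "__main__":
    # Illustrative values only; they are not the poster's parameters.
    print(harmonic_bond(1.5, k_b=10.0, r0=1.0))
    print(lennard_jones(2.0 ** (1.0 / 6.0), eps=1.0, sigma=1.0))
```

The Lennard-Jones term crosses zero at r = σ and reaches its minimum of −ε at r = 2^(1/6) σ, which gives a quick sanity check for any chosen parameters.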

V is the total potential energy of each particle that composes the platelet. It includes a classical MD potential describing the actin filament structure, a modified Morse potential describing the viscous cytoplasm, and a CGMD potential describing the filamentous core and the membrane structures.

Parameterize the undetermined parameters to match physical properties:

Properties Considerations
• Platelet Cell Size
• Cell Plasma Compressibility
• Membrane Young’s Modulus
• Cell Plasma Viscosity
• Cell Plasma Pressure
• Membrane Shear Modulus
• Stretching Response



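DPD: $\mathbf{r}_p \leftarrow \mathbf{r}_p + \mathbf{v}_p \, \Delta t_p + \mathbf{F}^{P} \, \Delta t_p^{2} / 2$

DPD: $\tilde{\mathbf{v}}_p \leftarrow \mathbf{v}_p + \lambda_p \, \mathbf{F}^{P} \, \Delta t_p$

DPD: $\tilde{\mathbf{F}}^{P} \leftarrow \mathbf{F}^{P}(\mathbf{r}_p, \tilde{\mathbf{v}}_p)$

$\mathbf{F}_{ij} = \mathbf{F}^{C}_{ij} + \mathbf{F}^{D}_{ij} + \mathbf{F}^{R}_{ij}$ (Groot and Warren 1997)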

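$\mathbf{F}^{C}_{ij} = \alpha \, \omega(r_{ij}) \, \mathbf{e}_{ij}$ (Conservative Term)

$\mathbf{F}^{D}_{ij} = -\gamma \, \omega^{2}(r_{ij}) \, (\mathbf{e}_{ij} \cdot \mathbf{v}_{ij}) \, \mathbf{e}_{ij}$ (Dissipative Term)

$\mathbf{F}^{R}_{ij} = \sigma \, \omega(r_{ij}) \, \zeta_{ij} \, \mathbf{e}_{ij}$ (Random Term)

where $\mathbf{r}_{ij} = \mathbf{r}_i - \mathbf{r}_j$, $r_{ij} = |\mathbf{r}_{ij}|$, $\mathbf{e}_{ij} = \mathbf{r}_{ij} / r_{ij}$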
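The three DPD pair-force terms can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' LAMMPS implementation: the function name is hypothetical, the relative velocity is taken as v_ij = v_i − v_j, the weight function is the linear one ω(r) = 1 − r/r_c inside the cutoff, and the random number ζ_ij is passed in by the caller so that the result is reproducible.

```python
import math


def dpd_pair_force(ri, rj, vi, vj, a, gamma, sigma, rc, zeta):
    """Force on particle i due to particle j: F^C + F^D + F^R.

    a, gamma, sigma: conservative, dissipative, random coefficients.
    rc: cutoff radius. zeta: random number for this pair and step.
    """
    rij = [p - q for p, q in zip(ri, rj)]
    dist = math.sqrt(sum(c * c for c in rij))
    if dist >= rc or dist == 0.0:
        # Weight function vanishes at and beyond the cutoff.
        return [0.0] * len(rij)
    e = [c / dist for c in rij]                 # unit vector e_ij
    vij = [p - q for p, q in zip(vi, vj)]       # assumed v_i - v_j
    w = 1.0 - dist / rc                         # omega(r_ij)
    e_dot_v = sum(ec * vc for ec, vc in zip(e, vij))
    magnitude = a * w - gamma * w * w * e_dot_v + sigma * w * zeta
    return [magnitude * c for c in e]


if __name__ == "__main__":
    # Illustrative coefficients, not the poster's parameters.
    print(dpd_pair_force((0.5, 0.0, 0.0), (0.0, 0.0, 0.0),
                         (1.0, 0.0, 0.0), (0.0, 0.0, 0.0),
                         a=25.0, gamma=4.5, sigma=3.0, rc=1.0, zeta=0.0))
```

In discrete-time integrators the random term is commonly scaled by 1/sqrt(Δt), as in Groot and Warren's formulation; the sketch follows the form printed above and omits that factor.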



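DPD-MD: $\mathbf{r}_m \leftarrow \mathbf{r}_m + \mathbf{v}_m \, \Delta t_m + \mathbf{F}^{M} \, \Delta t_m^{2} / 2$

DPD-MD: $\tilde{\mathbf{v}}_m \leftarrow \mathbf{v}_m + \lambda_m \, \mathbf{F}^{M} \, \Delta t_m$

DPD-MD: $\tilde{\mathbf{F}}^{M} \leftarrow \mathbf{F}^{M}(\mathbf{r}_m, \tilde{\mathbf{v}}_m)$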



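For $l_2 = 0 \ldots K_2 - 1$

MD-NB: $\mathbf{r}_n \leftarrow \mathbf{r}_n + \mathbf{v}_n \, \Delta t_n + \mathbf{F}^{N} \, \Delta t_n^{2} / 2$

MD-NB: $\mathbf{v}_n \leftarrow \mathbf{v}_n + \mathbf{F}^{N} \, \Delta t_n$

MD-NB: $\mathbf{F}^{N} \leftarrow \mathbf{F}^{N}(\mathbf{r}_n)$

$\omega(r_{ij}) = 1 - r_{ij} / r_c$ for $r_{ij} \le r_c$; otherwise $\omega(r_{ij}) = 0$

The $\zeta_{ij}$ are symmetric random variables with zero mean and unit variance, uncorrelated for different pairs of particles and different times.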

Respa()

Modified Verlet half step integration (If ilevel=level_dpd or level_interface)

post_integrate_respa() Rebuild the neighbors if necessary

init() Forward comm Largest timestep

setup() or setup_minimal()

run(int n)

Recursive for four levels: level_bond, level_lj, level_interface, level_dpd

force_clear() fix->pre_force

pair->compute_lj()

pair->compute()

pair->compute_interface()

MD-NB bond->compute()



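For $l_1 = 0 \ldots K_1 - 1$

MD-BD: $\mathbf{r}_b \leftarrow \mathbf{r}_b + \mathbf{v}_b \, \Delta t_b + \mathbf{F}^{B} \, \Delta t_b^{2} / 2$

MD-BD: $\mathbf{v}_b \leftarrow \mathbf{v}_b + \mathbf{F}^{B} \, \Delta t_b$

MD-BD: $\mathbf{F}^{B} \leftarrow \mathbf{F}^{B}(\mathbf{r}_b)$

angle->compute()

Reverse comm

fix->post_force()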



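DPD-MD: $\mathbf{v}_m \leftarrow \mathbf{v}_m + (\mathbf{F}^{M} + \tilde{\mathbf{F}}^{M}) \, \Delta t_m / 2$

DPD-MD: $\mathbf{F}^{M} \leftarrow \tilde{\mathbf{F}}^{M}$

DPD: $\mathbf{v}_p \leftarrow \mathbf{v}_p + (\mathbf{F}^{P} + \tilde{\mathbf{F}}^{P}) \, \Delta t_p / 2$

DPD: $\mathbf{F}^{P} \leftarrow \tilde{\mathbf{F}}^{P}$

• Viscosity

• Compressibility

• Density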

pair->compute_dpd()

Classical Verlet final step integration (If ilevel=level_bond or level_lj)

final_integrate_respa() fix->end_of_step()

• Three precision options: single, mixed, and double

• P. Zhang, N. Zhang, Y. Deng, and D. Bluestein, "A Multiple Time Stepping Algorithm for Efficient Multiscale Modeling of Platelets Flowing in Blood Plasma", Journal of Computational Physics, vol. 284, pp. 668-686, 01/2015.

• P. Zhang, C. Gao, N. Zhang, M. J. Slepian, Y. Deng, and D. Bluestein, "Multiscale Particle-Based Modeling of Flowing Platelets in Blood Plasma Using Dissipative Particle Dynamics and Coarse Grained Molecular Dynamics", Cellular and Molecular Bioengineering, vol. 7, pp. 552-574, 12/2014.

Membrane velocity distribution of a platelet as it flips in Couette flow

No-slip boundary condition for DPD flows



Above 4-level multiscale MTS algorithm



DPD prediction and correction time integration
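The prediction and correction integration can be sketched as a single step of the modified velocity-Verlet scheme described by Groot and Warren. The Python function below is an illustration only: it assumes unit mass, scalar (one-dimensional) state, and a caller-supplied `force(r, v)` callback; none of these names come from the authors' code.

```python
def dpd_velocity_verlet_step(r, v, f, dt, lam, force):
    """Advance one step with velocity prediction and correction.

    r, v, f: position, velocity, and current force (unit mass assumed).
    lam: prediction factor (lambda); 0.5 recovers standard velocity Verlet
         for velocity-independent forces.
    force: callable force(r, v) returning the force at the new state.
    """
    r_new = r + v * dt + 0.5 * f * dt * dt   # position update
    v_pred = v + lam * f * dt                # predicted velocity
    f_new = force(r_new, v_pred)             # force from predicted state
    v_new = v + 0.5 * (f + f_new) * dt       # corrected velocity
    return r_new, v_new, f_new


if __name__ == "__main__":
    # Hypothetical damped spring: force depends on position and velocity.
    damped_spring = lambda r, v: -r - 0.1 * v
    r, v = 1.0, 0.0
    f = damped_spring(r, v)
    for _ in range(3):
        r, v, f = dpd_velocity_verlet_step(r, v, f, 0.01, 0.5, damped_spring)
    print(r, v)
```

The prediction step matters because the dissipative DPD force depends on velocity, so the force at the new position has to be evaluated with an estimate of the new velocity.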

• N. Zhang, P. Zhang, W. Kang, D. Bluestein, and Y. Deng, "Parameterizing the Morse potential for coarse-grained modeling of blood plasma", Journal of Computational Physics, vol. 257, pp. 726-736, 01/2014.

pair->compute_lj()

pair->compute_lj_gpu()

pair->compute_interface()

pair->compute_interface_gpu()

Acknowledgements

pair->compute_dpd()

pair->compute_dpd_gpu()

I would like to thank my team members, Dr. Peng Zhang, Dr. Seetha Pothapragada, Chao Gao, and Li Zhang, for their help, and Prof. Danny Bluestein for his support. This research was made possible by grants from the National Institutes of Health: NHLBI R21 HL096930-01A2 (DB) and NIBIB Quantum Award Implementation Phase II-U01 EB012487-0 (DB). The tests on Tianhe-2 used an award of 20K computing hours from the National Supercomputer Center in Guangzhou, China (NSCC-GZ).

Tailored Modifications:

• Particle data is exchanged between host and device every step.

• With combined algorithmic and hardware accelerations, we can efficiently simulate millisecond-scale (1 ms) hematology at the resolutions of nanoscale platelets and mesoscale bio-flows using millions of particles.

• The rule of thumb is to balance speed against accuracy for an optimal MTS scheme, and computation against communication for an optimal load-balancing scheme between accelerators and CPUs.

• Future work involves reducing communication overheads and simulating more complicated multiscale phenomena.

• N. Zhang, P. Zhang, L. Zhang, X. Zhu, L. Huang, and Y. Deng, "Performance Examinations of Multiple Time-Stepping Algorithms on Stampede Supercomputer", XSEDE15 Technical Paper Program, St. Louis, MO, 07/2015.

𝑳𝑱

Summary and Future Work

References and Related Published Work

output->write() (If any)

LAMMPS (S. Plimpton et al.) and LAMMPS GPU Package (M. Brown et al.)

• Force evaluations and neighbor list builds can be accelerated.

Ratio of communication to computation on (1) Tianhe-2, (2) CS-Storm with eight K40 cards, and (3) CS-Storm with sixteen K80 cards

Modified Verlet final step integration (If ilevel=level_dpd or level_interface)

Speedup Strategy II-GPGPU Acceleration

• Viscous Boundary Layers

$\mathbf{F}_{ij} = \mathbf{F}^{LJ}_{ij} + \mathbf{F}^{D}_{ij} + \mathbf{F}^{R}_{ij}$

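$\phi = \phi(\gamma t) = \operatorname{atan}\left\{ \dfrac{1}{r_e} \tan\left[ -\dfrac{r_e \, \gamma t}{r_e^{2} + 1} + \tan^{-1}\left( r_e \tan\phi_0 \right) \right] \right\}$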
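For comparing a simulated flipping trajectory with the analytical rotation angle, the Jeffery's orbit expression can be evaluated directly. The Python sketch below assumes the form φ = atan{(1/r_e) tan[−r_e γt/(r_e² + 1) + atan(r_e tan φ_0)]}, with r_e the aspect ratio, φ_0 the initial angle, and γt the product of shear rate and time. The function name is hypothetical.

```python
import math


def jeffery_angle(gamma_t, r_e, phi0):
    """Rotation angle of an ellipsoid in shear flow (Jeffery's orbit).

    gamma_t: shear rate multiplied by time (dimensionless).
    r_e: aspect ratio of the ellipsoid.
    phi0: initial angle in radians.
    Returns the principal value in (-pi/2, pi/2).
    """
    phase = -gamma_t * r_e / (r_e ** 2 + 1.0) + math.atan(r_e * math.tan(phi0))
    return math.atan(math.tan(phase) / r_e)


if __name__ == "__main__":
    # Illustrative aspect ratio and initial angle.
    for gamma_t in (0.0, 1.0, 2.0, 3.0):
        print(gamma_t, jeffery_angle(gamma_t, r_e=2.0, phi0=0.3))
```

Because `math.atan` returns a principal value, a continuous trajectory comparison would need the angle unwrapped across each half turn.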

DPD-MD

𝒓𝒊𝒋

Hybrid force field containing the dissipative and random terms from DPD and the Lennard-Jones potential from MD. It is used to mimic friction between platelet membranes and the surrounding blood flow.

Parameterize the undetermined parameters to match:
• Platelet flipping trajectory with the analytical solution (Jeffery’s orbit) in Couette flow
• Rotation angle

For 𝑙3 = 0 … 𝐾3 − 1

𝒓𝒊𝒋

Parameterize the undetermined parameters and modify boundary conditions to match the physical properties:

initial_integrate_respa()



Random Term

Classical Verlet half step integration (If ilevel=level_bond or level_lj)

env_set()

• Reynolds Number

• Cell Plasma Density

In addition, we need to consider computational feasibility and the ability of the platelet model to become activated.

Spatial Interfacing

K1, K2, and K3 are the “jump factors”.
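The effect of the jump factors on the amount of work at each level can be seen by counting updates in a nested loop. The Python sketch below is a hedged illustration: it assumes that one DPD step contains K3 interface (DPD-MD) steps, each of which contains K2 non-bonded (MD-NB) steps, each of which contains K1 bonded (MD-BD) steps. The function name and this nesting order are assumptions, and the real integrator also performs force evaluation and communication at every level.

```python
def count_level_updates(k1, k2, k3):
    """Count updates per level during one outermost (DPD) step."""
    counts = {"DPD": 0, "DPD-MD": 0, "MD-NB": 0, "MD-BD": 0}
    counts["DPD"] += 1                      # one outer DPD step
    for _ in range(k3):                     # interface level
        counts["DPD-MD"] += 1
        for _ in range(k2):                 # non-bonded CGMD level
            counts["MD-NB"] += 1
            for _ in range(k1):             # bonded CGMD level
                counts["MD-BD"] += 1
    return counts


if __name__ == "__main__":
    # Jump factors K1 = 1, K2 = 10, K3 = 20
    print(count_level_updates(1, 10, 20))
```

With all jump factors equal to 1, every level is updated once per step, which corresponds to single time stepping.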

Couette flow is generated by applying two opposing body forces to all boundary particles; this emulates a uniform shear stress in which a platelet can flip. It is the simplest fluid-platelet simulation setup.