GPU BASED MULTIPHASE SMOOTHED PARTICLE HYDRODYNAMICS (SPH)

Amirsaman Farrokhpanah, Babak Samareh, Javad Mostaghimi

Centre for Advanced Coating Technologies, Mechanical and Industrial Engineering Department, University of Toronto


Materials and Methods

Multiphase SPH with surface tension

SPH formulations can differ based on the nature of the problem under investigation. Here, for multiphase SPH, the isothermal, incompressible Navier-Stokes equations in the Lagrangian form shown below are used:

$$\frac{d\rho}{dt} = -\rho\,\nabla\cdot\mathbf{v}$$

$$\frac{d\mathbf{v}}{dt} = -\frac{1}{\rho}\nabla p + \frac{1}{\rho}\mathbf{F}^{(v)} + \frac{1}{\rho}\mathbf{F}^{(s)} + \mathbf{g}$$

where $\mathbf{F}^{(v)}$ is the viscous force, $\mathbf{g}$ represents the external body force, and the surface tension force $\mathbf{F}^{(s)}$ is derived based on the Continuum Surface Force (CSF) method [2] as

$$\mathbf{F}^{(s)} = \sigma\,\kappa\,\hat{\mathbf{n}}\,\delta_s, \qquad \hat{\mathbf{n}} = \frac{\nabla c}{\left|\nabla c\right|}, \qquad \kappa = -\nabla\cdot\hat{\mathbf{n}}, \qquad \delta_s = \left|\nabla c\right|$$

where $\sigma$ is the surface tension coefficient, $\kappa$ the curvature of the phase interface, $\hat{\mathbf{n}}$ the unit normal perpendicular to the interface, $\delta_s$ the surface delta function, and $c$ a color function defined to produce a unit jump when passing over interfaces between phases. These formulations are discretized using SPH in the form of (see [1] for details)

$$\rho_i = m_i \sum_j W_{ij}$$

$$\frac{d\mathbf{v}_i}{dt} = \mathbf{g}
\;-\; \frac{1}{m_i}\sum_j \left(p_i V_i^2 + p_j V_j^2\right)\frac{\partial W}{\partial r_{ij}}\,\mathbf{e}_{ij}
\;+\; \frac{1}{m_i}\sum_j \frac{2\eta_i \eta_j}{\eta_i + \eta_j}\left(V_i^2 + V_j^2\right)\frac{\mathbf{v}_{ij}}{r_{ij}}\frac{\partial W}{\partial r_{ij}}
\;+\; \frac{1}{m_i}\sum_j \left(\Pi_i V_i^2 + \Pi_j V_j^2\right)\cdot\mathbf{e}_{ij}\,\frac{\partial W}{\partial r_{ij}}$$

where $V_i = 1/\sum_j W_{ij}$ is the particle volume, $\mathbf{e}_{ij}$ the unit vector and $r_{ij}$ the distance between particles $i$ and $j$, $\mathbf{v}_{ij} = \mathbf{v}_i - \mathbf{v}_j$, and $\eta$ the dynamic viscosity. The second and third sums give the viscous/pressure force effects, and the last sum gives the surface tension effects, written in terms of the surface stress tensor

$$\Pi = \frac{\sigma}{\left|\nabla c\right|}\left(\frac{1}{d}\left|\nabla c\right|^2\,\mathbf{I} - \nabla c \otimes \nabla c\right)$$

with $d$ the spatial dimension.

Searching for Neighboring Particles

A list of all particles located in the neighborhood of each particle must be constructed, so that it is known which particles contribute to the summations above. Two methods are considered, described below: a direct search and the Nearest Neighboring Particle Search (NNSP).

GPU Implementation and Results

GPU architecture

• GPU processing threads can perform parallel calculations. Threads are launched in groups called Blocks. Several Blocks together form a GPU Grid, as demonstrated in the figure.
• Each thread has access to its own Local and Register memory. Threads inside each Block have access to the Block's Shared memory. All threads inside the Grid have access to the Global, Constant, and Texture memories.
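To make the thread/Block/Grid and memory terminology above concrete, here is a minimal CUDA sketch; the kernel, its name, and the Block size of 256 are arbitrary illustrative choices, not part of the solver.

// Minimal illustration of the hierarchy described above (sizes are arbitrary).
__global__ void scaleKernel(const float* in, float* out, int n, float factor)
{
    __shared__ float tile[256];                     // Shared memory: visible to all threads of the Block
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // unique index of this thread inside the Grid

    tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;     // Global memory -> Shared memory
    __syncthreads();                                // Block-wide synchronization

    float v = tile[threadIdx.x] * factor;           // private register/Local memory per thread
    if (i < n) out[i] = v;                          // write back to Global memory
}

// Launch: Blocks of 256 threads grouped into a Grid covering n elements, e.g.
// scaleKernel<<<(n + 255) / 256, 256>>>(d_in, d_out, n, 2.0f);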



Direct search: all particles are searched directly to find those that fall into the neighborhood of each particle.
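A minimal CUDA sketch of the direct search is given below; the data layout (a flat position array and a 64-entry neighbor list per particle) and all names are illustrative assumptions. Every particle is tested against every other particle, which is an O(N^2) procedure.

// Direct-search sketch: one thread per particle scans all particles and
// records those within the (assumed) neighborhood radius of 2h.
#define MAX_NEIGHBORS 64

__global__ void directSearch(const float2* pos, int n, float h,
                             int* neighborList, int* neighborCount)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // one thread per particle
    if (i >= n) return;

    int count = 0;
    for (int j = 0; j < n; ++j) {                    // scan all particles: O(N^2) overall
        if (j == i) continue;
        float dx = pos[j].x - pos[i].x;
        float dy = pos[j].y - pos[i].y;
        if (dx * dx + dy * dy <= 4.0f * h * h && count < MAX_NEIGHBORS)
            neighborList[i * MAX_NEIGHBORS + count++] = j;   // record a neighbor of particle i
    }
    neighborCount[i] = count;
}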

As demonstrated in the bar chart, by using the right amount of register memory and reduction algorithms, a runtime 77 times faster than the corresponding CPU version is obtained on the GPU.

2. One particle per thread: another practical method to avoid atomic operations is to change the threads' responsibilities by assigning each individual particle to a single thread. A loop inside each thread ensures that all 64 neighbors of each particle are involved in the calculations, as sketched below. As shown in the plots, avoiding atomic operations can lead to a ~17% reduction in computational time (black lines present results when only atomic operations are used (method 1.i), red lines belong to the reduction method (method 1.ii), and blue lines to the one-particle-per-thread method (method 2)).
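The following is a minimal sketch of the one-particle-per-thread assignment (method 2), using the SPH density summation as the example quantity; the data layout, names, and the placeholder smoothing kernel are assumptions rather than the solver's actual kernels. Because each thread owns its particle's accumulator, no atomic operations are required.

// Method 2 sketch: each thread owns one particle and loops over its (up to)
// 64 neighbors, so the density accumulator is private to the thread.
#define MAX_NEIGHBORS 64

// Placeholder 2D cubic-spline kernel (support 2h); the poster does not state
// which smoothing kernel the solver actually uses.
__device__ float kernelW(float r, float h)
{
    float q = r / h, s = 10.0f / (7.0f * 3.14159265f * h * h);
    if (q < 1.0f) return s * (1.0f - 1.5f * q * q + 0.75f * q * q * q);
    if (q < 2.0f) { float t = 2.0f - q; return s * 0.25f * t * t * t; }
    return 0.0f;
}

__global__ void densityOneParticlePerThread(const float2* pos, const float* mass,
                                            const int* neighborList, const int* neighborCount,
                                            float* rho, int n, float h)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // one particle per thread
    if (i >= n) return;

    float sumW = kernelW(0.0f, h);                   // self contribution
    for (int k = 0; k < neighborCount[i]; ++k) {     // private loop: no race condition possible
        int j = neighborList[i * MAX_NEIGHBORS + k];
        float dx = pos[i].x - pos[j].x;
        float dy = pos[i].y - pos[j].y;
        sumW += kernelW(sqrtf(dx * dx + dy * dy), h);
    }
    rho[i] = mass[i] * sumW;                         // rho_i = m_i * sum_j W_ij
}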

Nearest Neighboring Particle Search (NNSP) method: the computational domain is divided into equal subdomains. In the first step, a tracking list is generated to assign particles to their corresponding subdomains. The search algorithm is then limited to the neighboring subdomains of the target particle.
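A minimal host-side sketch of the NNSP tracking list is shown below, assuming a 2D domain divided into square subdomains of side 2h so that neighbor candidates lie in the 3x3 surrounding cells; the container choices and names are illustrative, not the poster's implementation.

#include <algorithm>
#include <cmath>
#include <vector>

struct CellGrid {
    int nx, ny;                               // number of subdomains per direction
    float cell, x0, y0;                       // subdomain size and domain origin
    std::vector<std::vector<int>> bins;       // tracking list: particle indices per subdomain
};

// Step 1: build the tracking list that assigns every particle to its subdomain.
CellGrid buildTrackingList(const std::vector<float>& x, const std::vector<float>& y,
                           float x0, float y0, float Lx, float Ly, float h)
{
    CellGrid g;
    g.cell = 2.0f * h;                        // subdomain size = assumed kernel support radius 2h
    g.x0 = x0;  g.y0 = y0;
    g.nx = static_cast<int>(std::ceil(Lx / g.cell));
    g.ny = static_cast<int>(std::ceil(Ly / g.cell));
    g.bins.assign(static_cast<size_t>(g.nx) * g.ny, {});
    for (int i = 0; i < static_cast<int>(x.size()); ++i) {
        int cx = std::min(g.nx - 1, static_cast<int>((x[i] - x0) / g.cell));
        int cy = std::min(g.ny - 1, static_cast<int>((y[i] - y0) / g.cell));
        g.bins[static_cast<size_t>(cy) * g.nx + cx].push_back(i);
    }
    return g;
}

// Step 2: the search for a particle is limited to the 3x3 neighboring subdomains.
std::vector<int> neighborCandidates(const CellGrid& g, float xi, float yi)
{
    std::vector<int> out;
    int cx = std::min(g.nx - 1, static_cast<int>((xi - g.x0) / g.cell));
    int cy = std::min(g.ny - 1, static_cast<int>((yi - g.y0) / g.cell));
    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx) {
            int ix = cx + dx, iy = cy + dy;
            if (ix < 0 || iy < 0 || ix >= g.nx || iy >= g.ny) continue;
            const std::vector<int>& bin = g.bins[static_cast<size_t>(iy) * g.nx + ix];
            out.insert(out.end(), bin.begin(), bin.end());
        }
    return out;                               // candidates still need the distance test against 2h
}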


The drop profile is linearly interpolated into the solid boundary. Boundary particles falling in each region are treated as having the same phase as the particles above them.
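One possible reading of this boundary treatment is sketched below, assuming a 2D drop on a flat wall at y = 0 and a linear fit of the interface profile near the wall; the function names and sign conventions are hypothetical illustrations.

// Hypothetical sketch of the boundary treatment described above (2D, flat wall
// at y = 0): the interface profile is extended linearly from two sample points
// just above the wall, and each boundary particle inherits the color (phase)
// of the fluid region directly above it.
struct Vec2h { float x, y; };

// x-position where the linearly extended interface meets the wall (y = 0).
inline float interfaceXAtWall(Vec2h a, Vec2h b)      // a, b: interface points above the wall
{
    return a.x - a.y * (b.x - a.x) / (b.y - a.y);
}

// Boundary particles on the liquid side of the extrapolated contact point get
// the liquid color, the others the gas color (the color jumps across the interface).
inline float boundaryColor(float xBoundary, float xContact, float liquidColor, float gasColor)
{
    return (xBoundary <= xContact) ? liquidColor : gasColor;
}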




The figures above compare results for a drop reaching an equilibrium contact angle of 90° while initially positioned at an angle close to 90° (plotted as the deviation from 90°, in degrees, versus time). The left figure shows the result of the previously available model [1], while the figure on the right shows the current implementation's results. The figure below compares the same cases when reaching an equilibrium angle of 60° from an initial angle of 90° (plotted as the deviation from 60° versus time; the black line shows the result of the new model, while the red line displays the result of the previous model [1]).


Figures on the right show water droplets impacting on a surface at their maximum expanded radius. Using the method presented before, a constant contact angle $\theta$ is imposed (the panels correspond to $\theta$ values from 50° to 175°). The impact velocity is 1 m/s (Re = 440, We = 6.86). The vertical axis here is also the axis of symmetry.

Conclusion

Using a GPU device combined with the right algorithms was shown to enhance performance by decreasing the runtime. The contact angle implementation method proposed here generates more accurate results while keeping the solver more stable.





ii. Reduction: these methods are designed to organize threads so that they perform predefined tasks that by default have no conflicts. Unfortunately, these procedures can only be conducted inside each GPU Block; here, the reduction results of each Block are still added together using atomic operations (figure on the right).
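A minimal sketch of such a Block-level reduction is given below: threads of one Block combine their contributions in Shared memory without conflicts, and only one atomic operation per Block is needed to merge the partial sums. The summed quantity, the fixed Block size of 256, and all names are illustrative assumptions.

// Reduction sketch (method 1.ii): conflict-free summation inside each Block,
// then a single atomicAdd per Block to merge the partial results.
__global__ void blockReduceSum(const float* contrib, int n, float* result)
{
    __shared__ float partial[256];                   // assumes blockDim.x == 256 (a power of two)
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    partial[threadIdx.x] = (i < n) ? contrib[i] : 0.0f;
    __syncthreads();

    // Tree reduction inside the Block: each step is conflict-free by construction.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride)
            partial[threadIdx.x] += partial[threadIdx.x + stride];
        __syncthreads();
    }

    // Blocks cannot pass messages to each other, so the per-Block partial sums
    // are still merged with a single atomic operation per Block.
    if (threadIdx.x == 0)
        atomicAdd(result, partial[0]);
}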


i. Using only atomic operations: operations performed under atomic operations are executed serially rather than in parallel whenever a race condition occurs, which makes the solution rather slow (shown schematically in the figure on top).
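A minimal sketch of method 1.i is given below, with one thread per (particle, neighbor) pair and an atomicAdd into the particle's accumulator; the pair-list layout and the placeholder weight are assumptions. Whenever several threads update the same accumulator, the hardware serializes them, which is the slowdown described above.

// Method 1.i sketch: one thread per (particle, neighbor) slot; every thread
// adds its contribution to the particle's accumulator with atomicAdd.
#define MAX_NEIGHBORS 64

__global__ void accumulateAtomic(const float2* pos, const float* mass,
                                 const int* neighborList, const int* neighborCount,
                                 float* accum, int n, float h)
{
    int t = blockIdx.x * blockDim.x + threadIdx.x;   // one thread per (particle, neighbor) slot
    int i = t / MAX_NEIGHBORS;                       // target particle
    int k = t % MAX_NEIGHBORS;                       // neighbor slot of that particle
    if (i >= n || k >= neighborCount[i]) return;

    int j = neighborList[i * MAX_NEIGHBORS + k];
    float dx = pos[i].x - pos[j].x;
    float dy = pos[i].y - pos[j].y;
    float r  = sqrtf(dx * dx + dy * dy);
    float w  = fmaxf(0.0f, 1.0f - r / (2.0f * h));   // placeholder weight, not the solver's kernel

    // Correctness is preserved, but colliding updates to accum[i] are serialized.
    atomicAdd(&accum[i], mass[j] * w);
}

// launch with one thread per slot, e.g.
// accumulateAtomic<<<(n * MAX_NEIGHBORS + 255) / 256, 256>>>(pos, mass, nbrList, nbrCount, accum, n, h);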


Contact angle implementation

The contact angle near the triple point needs to be adjusted for more accurate results. Here, methods similar to those proposed for VOF by Šikalo et al. [3] and Afkhami et al. [4] are jointly used to improve the previously available SPH model. The following procedures are taken:
• Only surface tension effects between the fluid and gaseous phases are taken into account.
• Based on the desired contact angle value $\theta$, the unit normal vector of the particles near the contact line is recalculated based on
$$\hat{\mathbf{n}} = \hat{\mathbf{n}}_{wall}\cos\theta + \hat{\mathbf{t}}_{wall}\sin\theta$$
where $\hat{\mathbf{n}}_{wall}$ and $\hat{\mathbf{t}}_{wall}$ are the unit normal and tangent of the solid boundary (a sketch of this adjustment is given below).
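A minimal sketch of this normal adjustment is shown here, assuming a 2D setup, a flat wall with orthonormal unit normal and tangent vectors, and a tangent oriented from the liquid toward the gas phase; names and conventions are illustrative.

#include <math.h>

struct Vec2 { float x, y; };

// Imposed-contact-angle normal: n_hat = n_wall * cos(theta) + t_wall * sin(theta).
// nWall and tWall are assumed orthonormal (wall unit normal and tangent);
// theta is the desired contact angle in radians.
__host__ __device__ inline Vec2 imposedContactNormal(Vec2 nWall, Vec2 tWall, float theta)
{
    Vec2 n;
    n.x = nWall.x * cosf(theta) + tWall.x * sinf(theta);
    n.y = nWall.y * cosf(theta) + tWall.y * sinf(theta);
    return n;   // unit length when nWall and tWall are orthonormal
}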

SPH implementation on GPU

The chart below shows the solver's routines:
• Main loop: initial CPU-GPU variables allocation
• Updating the NNSP list on the GPU
• Loop on each cell: adding ghost particles to the cell, then sending the cell to the GPU (size limitations force the domain to be split into cells that fit on the GPU; each cell is sent to the GPU separately)
• On the GPU: density calculation, viscous/pressure and surface tension force effects, and time marching
• Updating the particles located in each cell
• Receiving data back from the GPU
Ghost particles are used since GPU Blocks do not support message passing.
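A host-side sketch of this routine structure is given below; the Particle struct, kernel names, and the per-cell transfer pattern are illustrative placeholders rather than the solver's actual API.

#include <cuda_runtime.h>
#include <vector>

struct Particle { float x, y, vx, vy, m, rho, color; };      // illustrative fields

__global__ void densityKernel(Particle* p, int n)             { /* rho_i = m_i * sum_j W_ij */ }
__global__ void forceKernel(Particle* p, int n)               { /* pressure/viscous + surface tension */ }
__global__ void timeMarchKernel(Particle* p, int n, float dt) { /* integrate velocities and positions */ }

void advanceOneStep(std::vector<std::vector<Particle>>& cells,
                    const std::vector<std::vector<Particle>>& ghosts, float dt)
{
    for (size_t c = 0; c < cells.size(); ++c) {               // loop on each cell
        std::vector<Particle> packet = cells[c];              // the cell's own particles ...
        packet.insert(packet.end(), ghosts[c].begin(), ghosts[c].end()); // ... plus its ghost particles

        Particle* d_p = 0;
        size_t bytes = packet.size() * sizeof(Particle);
        cudaMalloc(&d_p, bytes);
        cudaMemcpy(d_p, packet.data(), bytes, cudaMemcpyHostToDevice);   // send cell to GPU

        int n = static_cast<int>(packet.size());
        int threads = 256, blocks = (n + threads - 1) / threads;
        densityKernel<<<blocks, threads>>>(d_p, n);
        forceKernel<<<blocks, threads>>>(d_p, n);
        timeMarchKernel<<<blocks, threads>>>(d_p, n, dt);

        cudaMemcpy(packet.data(), d_p, bytes, cudaMemcpyDeviceToHost);   // receive data back
        cudaFree(d_p);
        for (size_t i = 0; i < cells[c].size(); ++i)
            cells[c][i] = packet[i];                          // update the cell's own particles only
    }
}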

Abstract

A parallel, GPU-compatible, Lagrangian, mesh-free particle solver for multiphase fluid flow based on the Smoothed Particle Hydrodynamics (SPH) scheme is developed. Surface tension is modeled employing the multiphase scheme of Hu and Adams [1]. GPU speedup using a variety of different memory management algorithms on the GPU-CPU is studied. Overall, GPU speedups of up to 120 times compared to a single-processor CPU were obtained.

Threads' task assigning

In the solver here, each particle is allowed to have up to 64 neighbors. Adding the effects of these neighbors for each particle can be done in two ways:
1. One particle per multiple threads: 64 different threads are assigned to these 64 neighbors. Since different threads work in parallel and might access the same variables simultaneously, they can overwrite each other's contributions and produce errors (a race condition). This can be avoided in two ways: (i) using only atomic operations, or (ii) using reduction algorithms, both described above.


The figure shows that the runtime is 30 times faster when using NNSP instead of the direct search method on the CPU. By switching from CPU to GPU using CUDA, the NNSP runtime is boosted 4-fold. Performance is improved by a further 6-fold when the tracking-list generation is also transferred to the GPU. Overall, the GPU version is 25 times faster than the CPU counterpart.
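A minimal sketch of generating the tracking list on the GPU is given below, using one thread per particle and an atomicAdd-based histogram of particles per subdomain; the cell size and names are assumptions, and the scatter pass that builds the final per-cell index lists is omitted.

// GPU-side tracking-list generation sketch: each thread bins one particle
// into its subdomain and counts the particles per subdomain.
__global__ void binParticles(const float2* pos, int n,
                             float x0, float y0, float cellSize, int nx, int ny,
                             int* cellOfParticle, int* particlesPerCell)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // one thread per particle
    if (i >= n) return;

    int cx = (int)((pos[i].x - x0) / cellSize);      // subdomain index in x
    int cy = (int)((pos[i].y - y0) / cellSize);      // subdomain index in y
    if (cx < 0) cx = 0;  if (cx > nx - 1) cx = nx - 1;
    if (cy < 0) cy = 0;  if (cy > ny - 1) cy = ny - 1;
    int cell = cy * nx + cx;

    cellOfParticle[i] = cell;                        // tracking list: particle -> subdomain
    atomicAdd(&particlesPerCell[cell], 1);           // count of particles per subdomain
}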

Since data transfer to the GPU is costly, keeping the amount of transfers minimized reduces runtime. One useful method for achieving this is to recompute variables each time they are needed rather than storing them after an initial calculation (sketched below). This procedure has been used in the cases shown with red and blue lines in the figure above. It frees GPU memory (figure in the middle) and hence lets more particles be launched to the GPU. By increasing the number of particles sent to the GPU, the GPU idle time is decreased (figure on the right) and consequently the runtime is decreased (figure on the left). By using all of these methods, the runtime on the GPU can be up to 120 times faster than the corresponding CPU version.
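The sketch below illustrates the recompute-instead-of-store idea for the pairwise pressure force of the discretized momentum equation: the kernel derivative is re-evaluated from particle positions whenever needed instead of being precomputed, stored in Global memory, and transferred. The placeholder kernel derivative and all names are illustrative assumptions.

// Recompute-on-the-fly sketch: instead of storing per-pair kernel derivatives
// in Global memory (n x 64 values) and copying them to the GPU, the force
// kernel re-evaluates them from positions. This trades a few extra FLOPs for
// less GPU memory use and smaller host-device transfers.
#define MAX_NEIGHBORS 64

__global__ void pressureForceRecompute(const float2* pos, const float* p, const float* vol2,
                                       const int* neighborList, const int* neighborCount,
                                       float2* force, int n, float h)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float fx = 0.0f, fy = 0.0f;
    for (int k = 0; k < neighborCount[i]; ++k) {
        int j  = neighborList[i * MAX_NEIGHBORS + k];
        float dx = pos[i].x - pos[j].x;
        float dy = pos[i].y - pos[j].y;
        float r  = sqrtf(dx * dx + dy * dy) + 1e-12f;
        // dW/dr recomputed here rather than read from a stored array
        // (placeholder linear profile, not the solver's actual kernel derivative).
        float dWdr = (r < 2.0f * h) ? -1.0f / (2.0f * h) : 0.0f;
        float coef = -(p[i] * vol2[i] + p[j] * vol2[j]) * dWdr / r;   // -(p_i V_i^2 + p_j V_j^2) dW/dr
        fx += coef * dx;
        fy += coef * dy;
    }
    force[i].x = fx;                                 // pairwise pressure force on particle i
    force[i].y = fy;
}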

References

[1] Hu, X.Y. and Adams, N.A., 2006, A multi-phase SPH method for macroscopic and mesoscopic flows, Journal of Computational Physics, 213, pp. 844-861.
[2] Brackbill, J.U., Kothe, D.B., and Zemach, C., 1992, A continuum method for modeling surface tension, Journal of Computational Physics, 100, pp. 335-354.
[3] Šikalo, Š., Wilhelm, H.-D., Roisman, I.V., Jakirlić, S., and Tropea, C., 2005, Dynamic contact angle of spreading droplets: Experiments and simulations, Physics of Fluids, 17.
[4] Afkhami, S., Zaleski, S., and Bussmann, M., 2009, A mesh-dependent model for applying dynamic contact angles to VOF simulations, Journal of Computational Physics, 228 (15).