Procedia Engineering 126 (2015) 622–627
7th International Conference on Fluid Mechanics, ICFM7
A New Parallel Algorithm of the DSMC Method

Fei Huang*, Xuhong Jin, Wenlong Mei, Xiaoli Cheng

China Academy of Aerospace Aerodynamics, Beijing, 100074, China
Abstract

A new parallel algorithm for the DSMC method with dynamic load balancing was implemented using the MPI standard library. The parallel efficiency of two parallel modes, peer-to-peer mode and master/slave mode, was investigated. The key points of implementing peer-to-peer parallelism in an unstructured DSMC code were analyzed, and the validity of the method was examined by simulating the flow field around a blunt cone. The numerical results confirm the feasibility of the method.

© 2015 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Peer-review under responsibility of The Chinese Society of Theoretical and Applied Mechanics (CSTAM).

Keywords: peer-to-peer mode; unstructured grid; dynamic load balance; parallel efficiency
1. Introduction
The direct simulation Monte Carlo (DSMC) method was formulated by Bird [1] in 1970 and marked a milestone in the evolution of fluid mechanics. Although the method was not widely appreciated at first, it proved enormously powerful and, over time, was extensively adopted for solving complex rarefied flows involving chemical reactions, ionization, and high degrees of non-equilibrium [2, 3]. The basic steps of a DSMC simulation [4] are as follows: the position, velocity, and internal energy of each simulated particle, which represents many real gas molecules, are stored and updated through intermolecular collisions and reflections off surfaces; the macroscopic flow quantities are then obtained by sampling. As hardware capability has improved, parallel computation has become prevalent [4, 5], and many DSMC codes now use parallel implementations, including MONACO [6] and PDSC [7-9], based on unstructured tetrahedral cells, and
* Corresponding author. Tel.: +86-010-68742511, 13401016943. E-mail address: [email protected].
doi:10.1016/j.proeng.2015.11.250
the DAC [10], SMILE [11], ICARUS [12], and DS2V/3V [13], based on Cartesian cells. However, parallel efficiency still needs to be improved because of the large computational cost of the DSMC method for transitional gas flows. To improve the parallel efficiency of DSMC simulation with the MPI standard library, this work analyzes the key points of implementing peer-to-peer parallelism in an unstructured tetrahedral DSMC code. The validity of the method is then investigated by simulating the flow field around a blunt cone.
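As an illustration only, the move/reflect/sample cycle described above can be sketched as a toy, collisionless 1-D mover; the function names, the specular wall treatment, and the uniform cell sampling are our own illustrative assumptions, not part of any production DSMC code:

```python
def dsmc_step(pos, vel, dt, length):
    """One collisionless move step of a toy 1-D DSMC mover.
    Particles advance ballistically and reflect specularly at the
    walls x = 0 and x = length (an assumed boundary treatment)."""
    new_pos, new_vel = [], []
    for x, v in zip(pos, vel):
        x += v * dt
        # Fold the trajectory back into the domain, flipping velocity
        # at each wall crossing (specular reflection).
        while x < 0 or x > length:
            if x < 0:
                x, v = -x, -v
            else:
                x, v = 2 * length - x, -v
        new_pos.append(x)
        new_vel.append(v)
    return new_pos, new_vel

def sample_cells(pos, n_cells, length):
    """Count particles per uniform cell (the 'sampling' stage);
    macroscopic quantities are averages of such tallies."""
    counts = [0] * n_cells
    for x in pos:
        i = min(int(x / length * n_cells), n_cells - 1)
        counts[i] += 1
    return counts
```

A real DSMC code adds the collision stage between the move and sample stages and works on 3-D unstructured cells, but the per-particle state update shown here is the part whose ordering the parallel modes below must preserve.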
2. Parallel modes based on dynamic load balancing
The main objective of parallel computation is to reduce computational cost, or to solve a more complex problem on the same hardware. In practice, the wall-clock time is determined by the slowest process: because the task loads of the processes are generally unequal, the faster processes must wait for the slowest one at each data exchange, so the load of every process should be kept balanced. In DSMC computations, however, particles migrate between computational regions, which unbalances the load over time; dynamic load balancing therefore has to be adopted. Two DSMC parallel modes, master/slave mode and peer-to-peer mode, are investigated for dynamic load balancing in this paper.
2.1. Parallel implementation in master/slave mode
In this mode, one master process gathers all particle information, decomposes the domain, scatters particle information, and outputs results, while the other processes compute particle movement and collisions and output the flow field belonging to their own subdomains. The master performs no flow computation other than control, and every slave process has the same status with respect to the master. The dynamic load balancing procedure in master/slave mode is as follows:
1) The master initializes the flow field and sends the free-stream and control parameters; each slave receives them.
2) The master decomposes the domain based on the current particle distribution and sends the decomposition information with MPI_SEND.
3) Each slave receives its domain and particle information with MPI_RECV and runs the DSMC simulation. After a number of cycles, the slaves send their results to the master, which gathers the particle distribution and decides whether a new domain decomposition is needed. If so, the procedure returns to step 2); otherwise the simulation continues or exits.
2.2. Parallel implementation in peer-to-peer mode
In this mode, all processors run the same program: each computes particle movement and collisions and outputs results for its own subdomain, and every CPU has equal status in controlling and exchanging data. The dynamic load balancing procedure in peer-to-peer mode is as follows:
1) Each processor initializes the flow field of its subdomain.
2) One processor gathers the particle distribution of the whole flow field, decomposes the domain based on that distribution, and finally broadcasts the decomposition, stored in the array ID(i), with MPI_BCAST.
3) All processors exchange data according to ID(i) with MPI_SCATTERV and then run the Monte Carlo simulation in their own subdomains. Each processor checks whether load balancing is needed; if so, the procedure returns to step 2), otherwise the simulation continues or exits.
Comparing the two modes, the master/slave mode, which uses a master node for gathering, scattering, and other operations, is relatively complex: the master must gather the complete particle information, which leads to a larger storage requirement and a longer communication time. The peer-to-peer mode avoids this difficulty, because any single process can decompose the domain from the particle distribution information alone, and the processes then exchange data directly with one another.
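The decomposition in step 2) can be illustrated by a greedy split of the global cell list by particle counts. The paper does not specify its decomposition algorithm, so the contiguous-block strategy and the function name below are our own illustrative assumptions; in the real code the resulting ID(i) array is broadcast with MPI_BCAST:

```python
def decompose(cell_particle_counts, n_procs):
    """Greedy 1-D domain decomposition sketch: walk the cell list and
    cut a contiguous block whenever the running particle load reaches
    the per-process target.  Returns ID, where ID[i] is the process
    owning global cell i (mirroring the paper's ID(i) array)."""
    total = sum(cell_particle_counts)
    target = total / n_procs          # ideal particles per process
    ID, proc, load = [], 0, 0
    for count in cell_particle_counts:
        # Open the next block once this one has reached its target,
        # unless we are already on the last process.
        if load >= target and proc < n_procs - 1:
            proc += 1
            load = 0
        ID.append(proc)
        load += count
    return ID
```

Because the input is only the per-cell particle count (not the particles themselves), any single process can compute this after gathering the counts, which is exactly the storage advantage of the peer-to-peer mode noted above.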
Fig. 1 Dynamic load balance implementation in master/slave mode
Fig. 2 Dynamic load balance implementation in peer-to-peer mode
3. Data management on unstructured tetrahedral cells
3.1. Output method for temporary files
In peer-to-peer mode, all processors take part in writing and reading temporary files. However, dynamic load balancing alters the ordering of the particle information (such as particle position, velocity, and energy) and of the grid-related information (such as the sampled macroscopic properties) after each rebalance, which makes it difficult to write and read the temporary files in a fixed sequence. To overcome this difficulty, an identity-grid method is adopted, with the following procedure:
1) Each processor writes, in sequence, the global serial numbers of the cells in its subdomain, saved in the array MY_CELL(i), and the particle counts of the corresponding cells, saved in the array MY_IC(i).
2) After writing MY_CELL(i) and MY_IC(i), each processor writes its particle information (position, velocity, and energy) and grid-related information (such as the sampled macroscopic properties) in sequence. All particle and cell information is thus written before the next load balancing.
3) When the temporary files are read to continue the computation, the cell serial numbers MY_CELL(i) and the particle counts MY_IC(i) are read first; a new load balance is computed from this information, and the new identity-grid information, saved in the array ID(i), is broadcast to all processors with MPI_BCAST.
4) Each processor then reads its own cell and particle information according to the identity-cell array ID(i).
This identity-grid method guarantees that the cell serial numbers in the whole flow field never change (the grid sequence is never altered), so the cell-related information can be written to the temporary files in any order by any processor. The new domain decomposition is identified through the array ID(i), and each processor reads only its own cell- and particle-related information.
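The essence of the identity-grid method is that restart records are keyed by the immutable global cell number, so the order in which processes wrote them no longer matters. A minimal single-process sketch, with in-memory lists standing in for the temporary files and all names being our own assumptions:

```python
def write_restart(my_cells, my_particles):
    """Serialize one process's subdomain in identity-grid style:
    (global cell id, particle count, particle records) triples.
    my_cells plays the role of MY_CELL(i); the counts, of MY_IC(i)."""
    records = []
    for c in my_cells:
        parts = my_particles.get(c, [])
        records.append((c, len(parts), parts))
    return records

def read_restart(all_records, new_ID):
    """Rebuild per-process particle data under a NEW decomposition
    new_ID (global cell id -> owning process), independent of the
    order in which the records were originally written."""
    by_proc = {}
    for c, n, parts in all_records:
        assert n == len(parts)          # count written with the cell
        by_proc.setdefault(new_ID[c], {})[c] = parts
    return by_proc
```

Because `read_restart` looks up each record's owner through `new_ID`, the continuation run can use a decomposition different from the one in effect when the files were written, which is the property the procedure above relies on.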
3.2. Processing of sampling information on the surface
In peer-to-peer mode, the surface sampling information held by each process also changes as load balancing proceeds, so it has to be handled specially. Since the body surface contains relatively few grid cells and does not add much memory, the surface sampling information of all processes can, before any read/write operation or load balancing, be summed into a single process with the reduction function MPI_REDUCE, and that process is then responsible for its unified management. The other processes are not involved when this information is read or written, which simplifies the management of the body-surface data.
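The reduction that MPI_REDUCE with the sum operation performs can be mimicked in a single process by an element-wise sum over the per-process facet tallies; this pure-Python stand-in only sketches the semantics, not the communication:

```python
def reduce_surface_samples(per_proc_samples):
    """Element-wise sum of each process's surface-facet tallies,
    as MPI_REDUCE(..., MPI_SUM, root, ...) would leave on the
    owning (root) rank.  per_proc_samples[p][i] is the tally of
    facet i accumulated by process p."""
    n_facets = len(per_proc_samples[0])
    total = [0.0] * n_facets
    for samples in per_proc_samples:
        for i, s in enumerate(samples):
            total[i] += s
    return total
```

After the reduction, only the owning process holds the complete surface tallies, so subsequent load balancing can reshuffle the volume cells without touching the surface data.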
4. DSMC implementation algorithm in peer-to-peer mode
The DSMC simulation process in peer-to-peer mode is as follows:
(1) Initialize the MPI parallel environment, start NP processes, and let every process load the free-stream parameters, control parameters, and overall grid information.
(2) For a new calculation, one process decomposes the domain based on the initial particle distribution and broadcasts the domain information, stored in the array ID(i), with MPI_BCAST. For a continued calculation, one process reads the particle distribution information into the ID(i) array and then broadcasts the domain information to the other processes with MPI_BCAST.
(3) Every process allocates its array variables according to the domain-identity array ID(i). For a new calculation, every process initializes the flow field belonging to its own subdomain; for a continued calculation, every process reads the relevant information according to ID(i).
(4) Perform the common DSMC stages: particle movement, collision, sorting, sampling, and so on.
(5) When the transient step reaches the dynamic load balancing cycle, one process gathers the particle distribution information and decomposes the domain based on the particle numbers, then broadcasts the domain information with MPI_BCAST; data are exchanged among the processes by the EXCHANGE() function, and the procedure returns to (4).
(6) When the transient step reaches the output cycle, every process outputs the flow-field, grid, and particle-related information of its own subdomain in turn.
(7) Check whether the run has ended; if not, go to (5), otherwise stop.
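Step (5) also needs a criterion for whether the load is unbalanced enough to justify a redecomposition. The paper triggers rebalancing on a fixed cycle and a per-process check; the imbalance test below, including its threshold value, is purely an illustrative alternative:

```python
def needs_rebalance(particles_per_proc, threshold=1.2):
    """Trigger dynamic load balancing when the most loaded process
    holds more than `threshold` times the mean load.  The threshold
    is an assumed tuning knob, trading rebalance cost against the
    idle time of the faster processes."""
    mean = sum(particles_per_proc) / len(particles_per_proc)
    return max(particles_per_proc) > threshold * mean
```

A lower threshold rebalances more often (more communication, less waiting); a higher one tolerates more imbalance between redecompositions.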
5. Results and discussion
The flow over a blunted cone is simulated to assess the applicability of the two parallel modes. The free-stream velocity is about 2478 m/s, the nose radius of the blunt cone is about 2.286 mm, the half-cone angle is 9°, and the base diameter is 15.24 mm. Nitrogen flow is considered, at angles of attack of 0°, 10°, 20°, and 25°, and a half model is used in the simulation. The unstructured grid contains about 0.17 million cells, about 3 million molecules are simulated, and the minimum cell size is about 3 mm. The reference area is the base area, and the reference position of the pitching moment is the nose vertex. Fig. 3 shows the pressure contours at 10° angle of attack. The aerodynamic coefficients predicted by the present parallel code and by the MONACO code developed by Boyd are compared with experimental results [14] in Fig. 4, and the results of the different methods are in excellent agreement.
To investigate the parallel efficiency of the two modes, the parallel efficiency versus the number of CPUs at 10° angle of attack is shown in Fig. 5. The efficiency is higher with the present peer-to-peer mode; as the number of processors increases, the difference between the two modes becomes small. This is because the master node in master/slave mode exchanges more data with the others than any node does in peer-to-peer mode, and with more processors the times consumed by the two modes gradually approach each other. Fig. 6 shows the computation time for different numbers of CPUs. At the same CPU count, the peer-to-peer algorithm takes less CPU time than the master/slave mode, and the saving grows as the number of processors decreases; this is mainly because the master node of the master/slave mode spends much more time on data exchange when the number of processors is relatively small. The speedup ratios in Fig. 7 show that the present parallel algorithm gives a greater speedup than the earlier implementation.
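The quantities plotted in Figs. 5-7 follow the standard definitions of speedup, S = T1/Tp, and parallel efficiency, E = S/p, for p processors; as a sketch (the function names are ours):

```python
def speedup(t_serial, t_parallel):
    """Speedup ratio S = T1 / Tp: serial wall time over parallel
    wall time on p processors."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, n_procs):
    """Parallel efficiency E = S / p, i.e. the fraction of ideal
    linear speedup actually achieved."""
    return speedup(t_serial, t_parallel) / n_procs
```

For example, a run that takes 100 s serially and 25 s on 5 processors has a speedup of 4 and an efficiency of 0.8; the gap to 1.0 is the communication and load-imbalance overhead the dynamic balancing aims to reduce.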
Lastly, Fig. 8 presents the number of particles per processor, showing that with the present peer-to-peer mode the particle numbers stay nearly equal across processors, i.e. the load is well balanced.
6. Conclusions
The present work developed a data management strategy for parallel DSMC simulation in peer-to-peer mode. The key points of implementing peer-to-peer parallelism in an unstructured DSMC code were described, and the efficiency of the two parallel modes was compared. The results obtained with the present strategy show a clear improvement in computational efficiency and remain in excellent agreement with the experimental data, confirming the feasibility of the method.
Fig. 3 Pressure contours
Fig. 4 Comparison on aerodynamics
Fig. 5 Parallel efficiency
Fig. 6 CPU time
Fig. 7 Speedup ratio
Fig. 8 Comparison of particle numbers in different processors
References
[1] J. F. Padilla, K. C. Tseng, I. D. Boyd. Analysis of Entry Vehicle Aerothermodynamics Using the Direct Simulation Monte Carlo Method. AIAA Paper 2005-4681.
[2] J. N. Moss, C. E. Glass, F. A. Greene. DSMC Simulation of Apollo Capsule Aerodynamics for Hypersonic Rarefied Conditions. AIAA Paper 2006-3577.
[3] M. S. Ivanov, G. N. Markelov, S. F. Gimelshein. High-altitude Capsule Aerodynamics with Real Gas Effects. Journal of Spacecraft and Rockets, 1998, 35(1): 16-22.
[4] M. G. Kim, H. S. Kim. A Parallel Cell-Based DSMC Method with Dynamic Load Balancing Using Unstructured Adaptive Meshes. AIAA Paper 2003-1033.
[5] M. Laux. Optimization and Parallelization of the DSMC Method on Unstructured Grids. AIAA Paper 1997-2512.
[6] S. Dietrich, I. D. Boyd. Scalar and Parallel Optimized Implementation of the Direct Simulation Monte Carlo Method. Journal of Computational Physics, 1996, 126: 328-342.
[7] J. S. Wu, K. C. Tseng. Parallel DSMC Method Using Dynamic Domain Decomposition. International Journal for Numerical Methods in Engineering, 2005, 63(1): 37-76.
[8] J. S. Wu, K. C. Tseng, T. J. Yang. Parallel Implementation of DSMC Using Unstructured Mesh. International Journal of Computational Fluid Dynamics, 2003, 17(5): 405-422.
[9] J. S. Wu, K. C. Tseng. Concurrent DSMC Method Using Dynamic Domain Decomposition. In: 23rd International Symposium on Rarefied Gas Dynamics, Whistler, Canada, 2002. AIP Conference Proceedings, 2003, 663: 406-413.
[10] G. J. LeBeau. A Parallel Implementation of the Direct Simulation Monte Carlo Method. Computer Methods in Applied Mechanics and Engineering, 1999, 174: 319-337.
[11] M. S. Ivanov, G. N. Markelov, S. F. Gimelshein. Statistical Simulation of Reactive Rarefied Flows: Numerical Approach and Applications. AIAA Paper 98-2669, 1998.
[12] T. J. Otahal, M. A. Gallis, T. J. Bartel. An Investigation of Two-Dimensional CAD Generated Models with Body Decoupled Cartesian Grids for DSMC. AIAA Paper 2000-2361, 2000.
[13] G. A. Bird. Molecular Gas Dynamics and the Direct Simulation of Gas Flows. Oxford: Clarendon Press, 1994.
[14] J. F. Padilla, I. D. Boyd. Assessment of Rarefied Hypersonic Aerodynamics Modeling and Wind Tunnel Data. AIAA Paper 2006-3390, 2006.