48th AIAA Aerospace Sciences Meeting Including the New Horizons Forum and Aerospace Exposition, 4–7 January 2010, Orlando, Florida
AIAA 2010-450
A Three-Level Cartesian Geometry Based Implementation of the DSMC Method
Da Gao∗, Chonglin Zhang†, and Thomas E. Schwartzentruber‡
Department of Aerospace Engineering and Mechanics, University of Minnesota, Minneapolis, MN 55455
Copyright © 2010 by the American Institute of Aeronautics and Astronautics, Inc. All rights reserved.
The data structures and overall algorithms of a newly developed 3-D direct simulation Monte Carlo (DSMC) program are outlined. The code employs an embedded 3-level Cartesian mesh, accompanied by a cut-cell algorithm to incorporate triangulated surface geometry into the adaptively refined Cartesian mesh. Such an approach enables decoupling of the surface mesh from the flow field mesh, which is desirable for near-continuum flows, flows with large density variation, and also for adaptive mesh refinement (AMR). Two separate data structures are proposed in order to separate geometry data from cell and particle information. The geometry data structure requires little memory so that each partition in a parallel simulation can store the entire mesh, potentially leading to better scalability and efficient AMR for parallel simulations. A simple and efficient AMR algorithm that maintains local cell size and time step consistent with the local mean-free-path and local mean collision time is detailed. The 3-level embedded Cartesian mesh combined with AMR allows increased flexibility for precise control of local mesh size and time-step, both vital for accurate and efficient DSMC simulation. Simulations highlighting the benefits of AMR and variable local time steps will be presented along with DSMC results for 3-D flows with large density variations.
I. Introduction

Direct Simulation Monte Carlo (DSMC) is a particle-based numerical method1 that simulates the Boltzmann equation. As a result, DSMC is an accurate simulation tool for modeling dilute gas flows ranging from continuum to free-molecular conditions. The DSMC method tracks a representative number of simulation particles through a computational mesh, with each simulation particle representing a large number of real gas molecules. A key aspect of the method is that molecular movement and collision processes are decoupled. Specifically, during each simulation time step, all particles are first translated along their molecular velocity vector without interaction. After the movement phase, nearby particles that lie within the same computational cell are collided in a statistical manner. This decoupling is accurate for dilute gas flows as long as the simulation time step remains less than the local mean-collision-time (∆t < τc) and the cell size dimension remains less than the local mean-free-path (∆x < λ). Unlike the Molecular Dynamics technique, precise molecular positions are not integrated using physically realistic interatomic potentials. Rather, simulation particles within each DSMC cell are randomly selected to undergo collisions. This enables the DSMC method to require only a small number of simulation particles per cell (≈ 20) in order to obtain acceptable collision statistics and an accurate solution to the Boltzmann equation. If the correct collision rate in each cell is specified, in addition to the correct probabilities of internal energy exchange and chemical reaction during each collision, the DSMC method can accurately simulate complex dilute gas flows. The numerical restrictions on cell size and time step become significant for full-scale 3D flows at moderate densities, where small λ and τc require large numbers of computational cells (and thus particles) and many simulation time steps. It has been shown that reducing the cell size and time step significantly below λ and τc results in a negligible improvement in accuracy.2,3 As a result, an ideal DSMC simulation (one that maximizes both accuracy and efficiency) would have every computational cell adapted to λ (each containing 20 simulation particles) and move all particles for a local time step of τc. Typically, ∆x ≈ λ/2 and ∆t ≈ τc/5. For flows involving little variation in density (λ ≈ constant, τc ≈ constant), a uniform mesh and simulation time step is accurate, efficient, and easy to implement. However, many low density flows are associated with high-altitude flight which, in turn, is inherently linked to high flow velocities
∗Research Associate, Member AIAA. Email: [email protected].
†Research Assistant, Student Member AIAA. Email: [email protected].
‡Assistant Professor, Member AIAA. Email: [email protected].
and significant compressibility effects. Such flows can exhibit orders-of-magnitude variation in density and therefore present a challenge for the DSMC method, both in terms of accuracy and in terms of computational efficiency. Existing state-of-the-art DSMC codes generally adopt one of two approaches for the geometry model. Here, geometry model refers to both the computational mesh and the surface representation. For example, the MONACO code4 utilizes an unstructured body-fitted quadrilateral or tetrahedral geometry model. Such tetrahedral meshes are easily fitted to complex surface geometries. MONACO is also equipped with a pre-processor that enables the generation of a new mesh that is refined based on the local λ values from a previous solution. In addition, different particle weights and time steps can be specified in each cell. The Icarus code5 uses a multi-block system of algebraic meshes to define the computational domain. This grid system allows flexibility for capturing local high gradient regions without excessively gridding the entire domain because there is no requirement for side correspondence between regions.5 In contrast to body-fitted grids, the second approach is to use a Cartesian-based geometry model. Here a Cartesian grid is used for the flow field and the surface mesh must be “cut” from the Cartesian grid in order to simulate complex geometries. For example, the DSMC Analysis Code (DAC)6 uses a two-level Cartesian grid where each level-1 cell can be refined into any number (in each co-ordinate direction) of level-2 Cartesian cells. A pre-processor enables a new mesh to be adapted using a previous flow solution, and different particle weights and time steps can be specified for each level-1 cell. The DAC code then employs a “cut-cell” method that cuts an arbitrary triangulated surface mesh from the 2-level Cartesian flow field grid. Similarly, the SMILE code7 also employs a 2-level Cartesian grid and a cut-cell method for simulating complex surface geometries. Finally, the DSnV codes1 use an equally-spaced background Cartesian grid where each division is further subdivided into a large number of Cartesian elements such that the number of elements is on the order of the number of simulated molecules. Collision and sampling cells are then formed by grouping nearby elements together into cells that are adapted locally to λ and contain a constant number of particles. However, caution must be exercised, as grouping such small elements may lead to highly irregularly shaped collision and sampling cells. The purpose of this article is to describe in detail a new DSMC implementation called the Molecular Gas Dynamic Simulator (MGDS) code. The MGDS code extends upon the above-mentioned DSMC approaches, and of particular interest is the manner in which the geometry model and particle information are stored in memory and the potential implications for large parallel simulations. Specifically, the geometry model chosen for the MGDS code is a 3-level Cartesian flow field grid utilizing a cut-cell method to embed arbitrary triangulated surface geometries. The MGDS code performs fully automated adaptive mesh refinement (AMR) during a DSMC simulation and automatically sets different time steps in each level-3 cell. The major reasons for development of this particular geometry model (in order of importance) are as follows.
1. The memory required to store cell-node vertices and cell-face normal vectors for unstructured tetrahedral meshes is substantially larger than the memory required to store a multi-level Cartesian grid. For large flow field meshes (> 100 million cells) an unstructured tetrahedral mesh must not only be partitioned across parallel processors during a DSMC simulation, but pre/post-processing, grid generation, and AMR routines must also be parallelized due to the large memory requirements. A Cartesian geometry model significantly alleviates these problems and may therefore be more suitable for future large-scale DSMC simulations.

2. A Cartesian geometry model necessitates a cut-cell method in order to handle complex surface geometries. Although this may seem like a drawback, a cut-cell approach is an extremely general and accurate technique for complex geometries.6,8 As discussed by LeBeau et al.,6 a cut-cell approach allows for complete decoupling between the flow field and surface mesh discretization. Such decoupling is especially important for the DSMC method and may be necessary for large-scale simulations of near-continuum flows where the mean-free-path near a vehicle surface may be orders of magnitude smaller than accurate surface resolution requires. Likewise, decoupling is equally important for low density flows (such as low-orbiting satellites) where the local value of λ may be much larger than fine surface geometry features.

3. The MGDS code employs a 3-level embedded Cartesian grid for the flow field. Adding the third independent level of Cartesian refinement adds considerable flexibility over previous 2-level implementations for flows with large density gradients. Precise AMR is essential for both simulation accuracy and computational efficiency.

4. Use of a Cartesian geometry model not only results in lower memory storage requirements, but also enables many DSMC calculations to be performed with fewer floating point operations. Algorithms for initial grid generation, successive AMR, particle movement, and particle re-sorting during AMR require fewer operations when using a Cartesian grid compared to a non-Cartesian grid. The efficiency of such operations, and the degree to which they can be automated, will become increasingly important as larger DSMC simulations are performed, especially for time-accurate unsteady problems where frequent AMR is required.
The spatial and temporal discretization required for accurate and efficient DSMC simulation is inherently linked to the solution itself. Thus, the DSMC method is ideal for fully automated grid generation, AMR, and variable time steps, potentially requiring little-to-no user input. As computational resources continue to grow rapidly, such automation would avoid current bottlenecks in total simulation turnaround created by the high demands placed on code users for large-scale parallel flow problems. In addition to providing a new DSMC simulation tool for complex flows over 3D geometries, development of the MGDS code will provide independent evaluation of existing DSMC implementation approaches, as well as investigation of new approaches specifically geared towards future large-scale parallel simulations. The article is organized as follows. In Section II the data structures used to organize geometry, cell, and particle data are described, after which particle movement algorithms are outlined. In Section III the main DSMC algorithm, the AMR and variable time-step algorithm, and the cut-cell algorithm are described. Results for various 3D simulations that demonstrate the benefits of 3-level AMR and variable time steps are presented in Section IV. Section V contains a summary of the major aspects of the MGDS code and conclusions drawn from the preliminary results.
II. Data Structures

A. Geometry Data Structure
Figure 1. Geometry data structure used within the MGDS code.
The Geometry data structure used within the MGDS code is depicted in Fig. 1. It consists of a 3D array of level-one (L1) cell structures. Each L1 cell structure contains the co-ordinates of the bounding Cartesian vertices (x0, y0, z0, xE, yE, zE) and a pointer to a 3D array of level-two (L2) cell structures. This 3D array can have arbitrary dimensions, meaning each L1 cell can contain an arbitrary number of L2 cells. Each L2 cell structure is identical to an L1 cell structure, and therefore contains a pointer to a 3D array of L3 cell structures. In addition to storing the bounding vertex coordinates, each L3 cell structure contains the local values of τc, λ, ∆t, and, currently, the maximum expected collision rate ⟨σcr⟩max. Also, each L3 cell structure contains a pointer to an array of surface structures. The majority of cells contain no boundary surfaces, and for these the pointer is “null”. However, for cells which do contain boundary surfaces (described in Section III C.), each surface structure in the array contains vertex, surface normal, area, surface type, and sampling variables. Given the coordinates of a particle, the Geometry data structure enables efficient cell-indexing of the particle. The most important aspect of this Geometry data structure (Fig. 1) is
that it contains all of the information required for the AMR, cut-cell, and variable time step (movement) procedures while requiring relatively little memory storage. Clearly the largest contribution to memory storage comes from the data stored in each L3 cell. While the current MGDS implementation stores vertex coordinates for each L3 cell, this information could simply be computed from the L2 vertex information and the Cartesian index of the L3 cell, thereby greatly reducing the memory storage requirement. Only the small percentage of L3 cells that intersect a boundary surface contain arrays of surface structures, which clearly involve substantially larger memory requirements. Finally, each L3 cell structure contains both the processor number and the local array index (on that processor) where its particle data and additional cell data are stored. It is proposed that the entire Geometry data structure could be stored locally on each processor in a parallel simulation, even for large DSMC flow field and surface meshes. For simulations where highly complex surface geometry prohibits this, at a minimum, large portions of the Geometry data structure could be stored on each processor. The more memory-intensive cell and particle data associated with each L3 cell (detailed in the next section) must be partitioned across parallel processors. An important aspect of the MGDS data structures is that any L3 cell data can be partitioned to any processor. That is, domain decomposition is not limited to L1 or L2 cells, which is essential for load balancing of large parallel simulations. The proposed benefits of such a Geometry data structure for parallel simulations, including efficient movement of particles with variable time steps and efficient AMR, will be discussed in upcoming sections.
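A minimal sketch of how such a 3-level geometry hierarchy might be expressed as Fortran 90 derived data types (the language used by the MGDS code; see Section II B.) is given below. All type and variable names are illustrative, not those of the actual MGDS source. The child_index function illustrates why Cartesian cell-indexing of a particle is efficient: one integer truncation per coordinate direction, applied once per level.

module geometry_types
  implicit none

  type surface_elem
     real    :: v1(3), v2(3), v3(3)   ! triangle vertices
     real    :: normal(3), area       ! surface normal and element area
     integer :: surf_type             ! wall, inflow, outflow, or symmetry
  end type surface_elem

  type l3_cell
     real    :: x0(3), xE(3)          ! bounding Cartesian vertices
     real    :: tau_c, lambda, dt     ! local mean-collision-time, mean-free-path, time step
     real    :: sigma_cr_max          ! maximum expected collision rate <sigma*cr>max
     integer :: proc, local_idx       ! partition and local index of the cell/particle data
     type(surface_elem), pointer :: surf(:) => null()   ! null for the majority of cells
  end type l3_cell

  type l2_cell
     real :: x0(3), xE(3)
     type(l3_cell), pointer :: l3(:,:,:) => null()      ! arbitrary dimensions in each direction
  end type l2_cell

  type l1_cell
     real :: x0(3), xE(3)
     type(l2_cell), pointer :: l2(:,:,:) => null()
  end type l1_cell

contains

  ! Cartesian cell-indexing of one particle coordinate: the child cell
  ! containing x among n equal subdivisions of [x0, xE]. Applied per
  ! coordinate direction at each of the three levels in turn.
  pure integer function child_index(x, x0, xE, n)
    real, intent(in)    :: x, x0, xE
    integer, intent(in) :: n
    child_index = 1 + int((x - x0) / (xE - x0) * real(n))
    child_index = max(1, min(n, child_index))   ! guard against round-off at cell faces
  end function child_index

end module geometry_types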
B. Cell/Particle Data Structure
Figure 2. Cell/Particle data structure used within the MGDS code.
A DSMC code requires the storage and book-keeping of a large amount of data. Complex data structures are often employed for data organization and can provide the flexibility required for a general DSMC implementation. Particle data is the dominant contributor to memory; however, the global number of particles, as well as the local number within each cell, can vary greatly within a DSMC simulation. Thus, statically allocated arrays of particles must either be conservatively large (wasting unused memory) or re-sized often (computationally and memory intensive). Likewise, since the mesh resolution in DSMC depends on the solution itself, ideally the number of cells would change during a simulation, and the data structures should account for this possibility as well. The MGDS code adopts the cell and particle data structures approach used in the MONACO code4 with certain modifications. The specific Cell/Particle data structure used by the MGDS code is depicted in Fig. 2. On each parallel partition, an array of L3 cells is maintained. This array is actually an array of pointers to L3 Cell/Particle structures: if the array requires resizing (during AMR), resizing an array of pointers is much less memory intensive and more efficient than resizing an array of Cell/Particle structures. Each cell-data structure contains substantial data (currently including duplicated data from the Geometry data structure). Each Cell/Particle structure also contains an array that stores sampled (cell-averaged) data of the particles within the cell. Currently each Cell/Particle structure also contains the nine integer indices of the corresponding L3 cell in the Geometry data structure. Finally, each Cell/Particle structure contains pointers to the head and tail of a doubly-linked list of particle structures. Each particle structure contains all required particle information (as shown in Fig. 2) and, in addition, also contains two integers for the processor number
and index within the local cell array where the particle is located (the reason for this will be explained in Section III). It should be noted that cell-face data and cell-connectivity data are not required since the grid is Cartesian. The MGDS code is written in Fortran 90. All data structures are defined as Fortran “derived data types”, which enable general and efficient packaging of data for broadcast among partitions in a parallel simulation.
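A minimal sketch of such a Cell/Particle structure is given below, again with illustrative names rather than the actual MGDS declarations. The wrapper type at the end reflects the “array of pointers” point made above: Fortran does not permit arrays of pointers directly, so a one-component wrapper type is the standard idiom, and resizing the cell array then moves only pointers, never the cell/particle payloads.

module cell_particle_types
  implicit none

  type particle
     real    :: x(3), v(3)            ! position and velocity
     real    :: erot                  ! rotational energy
     integer :: species
     integer :: dest_proc, dest_cell  ! destination recorded during Move, used by the global sort
     type(particle), pointer :: prev => null(), next => null()   ! doubly-linked list
  end type particle

  type cell_particle
     integer :: geom_idx(9)           ! (i,j,k) indices at L1, L2, and L3 in the Geometry structure
     real    :: sample(10)            ! accumulated cell-averaged particle data (size illustrative)
     type(particle), pointer :: head => null(), tail => null()
  end type cell_particle

  ! Wrapper type: the local cell array is an array of these, so AMR can
  ! resize the array by copying pointers only.
  type cell_ptr
     type(cell_particle), pointer :: p => null()
  end type cell_ptr

end module cell_particle_types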
C. Particle Movement and Tracking
(a) Ray-tracing movement procedure.
(b) Cartesian movement procedure.
Figure 3. Particle tracking procedures relevant to the DSMC method.
“Ray-tracing” is a general and efficient method of tracking particle movement through a computational mesh that is widely used in the computer graphics industry and is also used by many DSMC codes. The procedure is depicted schematically in Fig. 3(a) for movement within an unstructured 2D triangular mesh. Essentially, the time-to-hit each face of the current cell (∆thit−f in Fig. 3(a)) is computed using the particle position/velocity and the face vertices/normal-vector, where ∆xf is the normal distance between the particle and the cell face. The particle is then advanced for the minimum time ∆thit−min = min(∆thit−f) and re-located to the neighboring cell. The process is then repeated for the remaining time (∆tsim − ∆thit−min). Of course, if ∆tsim < ∆thit−min, then the particle can be moved for the full time step and remains located in the current cell. This procedure is very general: it naturally sorts particles while moving them, and it naturally allows for variable time steps to be used in each cell. Within a Cartesian grid, a particle may instead be moved for the full time step regardless of how many cells are crossed. The particle’s new position can then be re-indexed on the Cartesian grid very efficiently compared to re-indexing on a non-Cartesian grid. After re-indexing, the particle can be re-located directly to the new cell. This procedure, called “Cartesian-move”, is depicted in Fig. 3(b). However, even on a Cartesian grid, there are three main drawbacks to the Cartesian-move procedure. The first is that variable time steps cannot be used in each cell. This is not a drawback for computing unsteady flows; however, it is a significant drawback when computing steady-state flows, as variable time steps greatly increase simulation efficiency. The second drawback is that re-indexing on a multi-level Cartesian grid is rather computationally expensive, and may approach the expense of ray-tracing. Third, when simulating flow over complex geometries, ray-tracing must be used in cells close to boundary surfaces, thereby requiring the additional complexity of mixed Cartesian-move and ray-trace algorithms. Although the MGDS code maintains the option of the Cartesian-move procedure, the default is to use the ray-tracing procedure throughout the entire simulation domain. It is important to realize, however, that the ray-tracing algorithm is more efficient on a Cartesian grid than on a non-Cartesian grid, since the dot products (shown in Fig. 3(a)) involve only a single component. Furthermore, a trend in DSMC algorithm research is to enlarge cells above the ∆x ≈ λ constraint without loss of accuracy through the use of virtual subcells.9 The consequence of this trend for 3D simulations will be that a large majority of simulation particles remain within the same cell during a given time step. On a Cartesian grid, the ray-trace test that determines whether ∆tsim < ∆thit−min is highly efficient. Further quantitative results comparing these two movement procedures can be found in Ref. 10.
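The following is a minimal sketch of the Cartesian time-to-hit computation just described (illustrative names, not the MGDS source). On an axis-aligned cell each face test reduces to a single division, since the sign of each velocity component selects which of the two faces in that direction can be hit.

! Minimum time for a particle at x with velocity v to reach a face of
! the axis-aligned cell [x0, xE].
pure function dt_hit_min(x, v, x0, xE) result(dt)
  implicit none
  real, intent(in) :: x(3), v(3), x0(3), xE(3)
  real    :: dt, t
  integer :: d
  dt = huge(1.0)
  do d = 1, 3
     if (v(d) > 0.0) then
        t = (xE(d) - x(d)) / v(d)   ! exit through the high face
     else if (v(d) < 0.0) then
        t = (x0(d) - x(d)) / v(d)   ! exit through the low face
     else
        cycle                       ! no motion along this axis
     end if
     dt = min(dt, t)
  end do
end function dt_hit_min

If the returned time exceeds ∆tsim, the particle remains in its cell for the full step; otherwise it is advanced to the face, re-located to the neighboring cell, and the procedure repeats with the remaining time.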
III. Algorithms

A. Core DSMC Algorithm
By separating the Geometry data structure (Fig. 1) from the Cell/Particle data structure (Fig. 2), the main DSMC loop within the MGDS code is made more compact and communication between processors is potentially lowered when particles move across partitions in a parallel simulation. The resulting compact DSMC loop is shown in Fig. 4.

Loop (for each L3 cell)
    Collide Particles (update particle properties)
    Sample Properties
    Move (update particle positions)
        - recursive ray-trace algorithm
        - including surface collisions
        - record new x, y, z, proc#, local cell#
        - particles remain in original linked-list
    IF (INFLOW) Generate/Move Particles
End Loop
Package up communication
Send data to partitions
Global Sort (loop over all cells/particles)
Figure 4. Main DSMC algorithm within the MGDS code.
Each L3 Cell/Particle structure (Fig. 2) contains all necessary information to perform collisions and sample particle properties within the cell, independent of any other cell in the mesh. Next, since the entire Geometry data structure (or large portions of it) is stored on each partition, all particles within a given cell can be moved for their complete time step, including variable time steps and surface collisions. Specifically, without accessing any other cell data (which may be located on a different partition), the x, y, and z positions, the processor index, and the local cell array index of each particle can be updated solely using the Geometry data structure. Finally, if the current cell contains inflow surfaces, then particles are generated and fully moved as just described. The result is a single loop over all L3 cells for each full simulation time step. Each cell within this loop can be processed independently of all other cells in the simulation. It is important to note that upon completion of this loop, all cells still contain linked-lists of their original particles. Only the particle coordinates, destination processor, and local cell index on that processor have been updated; that is, each particle knows exactly where to be sent. At this point the particle data is packaged up and sent to the appropriate partition. As depicted in Fig. 3(b), this approach may have the potential to limit the inter-processor communication for occasional particles that cross multiple partitions in a single time step. The final step is a global sort algorithm that removes particles from their current linked-lists and adds them to their destination linked-lists. This is the only step that is not independent of other cells in the simulation. Optimization of the above movement and sorting procedures is discussed in Ref. 10, along with a shared-memory parallelization technique which “threads” both the main DSMC loop and the global sorting algorithm over multi-core processors.
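A minimal sketch of the global sort step, using the illustrative types from Section II B., is given below. It assumes that particles destined for other partitions have already been packaged and sent, as in Fig. 4; each remaining particle whose destination cell differs from its current cell is unlinked from its original doubly-linked list and appended to the destination list, each in O(1) operations.

subroutine global_sort(cells)
  use cell_particle_types
  implicit none
  type(cell_ptr), intent(inout) :: cells(:)
  type(particle), pointer :: p, nxt
  integer :: ic
  do ic = 1, size(cells)
     p => cells(ic)%p%head
     do while (associated(p))
        nxt => p%next                       ! save before unlinking
        if (p%dest_cell /= ic) then
           call unlink(cells(ic)%p, p)      ! O(1) removal from a doubly-linked list
           call append(cells(p%dest_cell)%p, p)
        end if
        p => nxt
     end do
  end do
contains
  subroutine unlink(c, p)
    type(cell_particle), intent(inout) :: c
    type(particle), pointer :: p
    if (associated(p%prev)) then
       p%prev%next => p%next
    else
       c%head => p%next
    end if
    if (associated(p%next)) then
       p%next%prev => p%prev
    else
       c%tail => p%prev
    end if
    nullify(p%prev, p%next)
  end subroutine unlink
  subroutine append(c, p)
    type(cell_particle), intent(inout) :: c
    type(particle), pointer :: p
    p%prev => c%tail
    nullify(p%next)
    if (associated(c%tail)) then
       c%tail%next => p
    else
       c%head => p
    end if
    c%tail => p
  end subroutine append
end subroutine global_sort

Note that a particle appended to a cell later in the loop is skipped when that cell is traversed, since its destination then matches its current cell.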
B. Adaptive Mesh Refinement (AMR)
The AMR algorithm used within the MGDS code inputs the current Geometry data structure, generates a new Geometry data structure adapted precisely to the local mean-free-path, sets local time steps in accordance with the local mean-collision-time, and also interpolates the maximum expected collision rate (⟨σcr⟩max) between the old and new meshes. The algorithm then updates the Cell/Particle data structure as cells are added or removed and, finally, the global sort algorithm re-sorts all particles into the appropriate linked-lists in the revised data structure. In this way, the MGDS simulation continues smoothly and accurately. The AMR procedure is called approximately 4–10 times during a typical steady-state MGDS simulation. With the current implementation described below, one call to the AMR function requires approximately the same computational time as 10 simulation time steps, regardless of the size of the simulation. While this has a negligible effect on the overall simulation time for a steady-state solution, future efficiency improvements will be important for unsteady simulations where the AMR procedure might be called every few time steps. An overview of the AMR procedure is detailed below along with a 3-level Cartesian grid schematic in Fig. 5. It is important to note that L1 cell sizes remain fixed during the simulation. Thus, the AMR procedure is applied within each L1 cell independently of all other L1 cells according to the following steps:
Figure 5. Schematic of 3-level Cartesian mesh and AMR procedure.
1. Loop through all L3 cells contained within this L1 cell and determine λmax.

2. Set the new L2 cell size to λmax and generate a new Cartesian L2 cell grid within the L1 cell.

3. Loop through all new L2 cells and determine λmin within each. This involves determining which old L3 cells intersect each new L2 cell. On a 3D Cartesian grid, computing the volume of intersection between two generic cells is straightforward and efficient (see the sketch following this list).

4. Set the new L3 cell size within each new L2 cell to λmin.

5. Compute the volume of intersection between new and old L3 cells within each large L1 cell and interpolate (using a volume average) values of λ, τc, and ⟨σcr⟩max into the new L3 cells. This information can then be used to accurately set an appropriate new local time step and maintain the local collision rate.

6. After processing all L1 cells via the above steps, the complete Geometry data structure is now updated.

7. The new Geometry data structure is now used to update the pointer array of Cell/Particle structures (Fig. 2) as well as to modify cell data. Note that in changing the dimensions/vertices of cells, the particles contained within these cells may now be completely unsorted.

8. Finally, the coordinates of all particles are re-indexed using the new Geometry data structure. The global sort algorithm is then called, which adds/removes particle pointers to/from the correct linked lists. The MGDS simulation is now ready to carry on with the general move/collide DSMC algorithm.
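As referenced in step 3, the volume of intersection is straightforward on a Cartesian grid; a minimal sketch (illustrative names) is shown below. For two axis-aligned cells the intersection volume is simply the product of the three one-dimensional interval overlaps, so no general polyhedron clipping is needed.

pure function overlap_volume(a0, aE, b0, bE) result(vol)
  implicit none
  real, intent(in) :: a0(3), aE(3)   ! low/high corners of the first cell
  real, intent(in) :: b0(3), bE(3)   ! low/high corners of the second cell
  real    :: vol
  integer :: d
  vol = 1.0
  do d = 1, 3
     vol = vol * max(0.0, min(aE(d), bE(d)) - max(a0(d), b0(d)))
  end do
end function overlap_volume

The volume-averaged interpolation of step 5 then weights each old-cell value by this overlap volume; for example, the interpolated λ in a new L3 cell is Σ(Voverlap × λold) / Σ Voverlap over all intersecting old cells.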
C. Cut Cell Algorithm
Since the MGDS code already uses the ray-tracing technique to determine particle intersections with cell faces, arbitrary triangulated surface meshes can be naturally embedded within the flow field grid without modification to the MGDS data structures or algorithms. The cut-cell method performs two main functions. The first is to read in a list of triangular surface elements (generated by various commercial surface triangulation packages) and sort all surface elements into the appropriate L3 cells within the Geometry data structure. Note that if a single large surface triangle cuts through multiple small L3 cells, the surface element may simply be added to multiple cells and does not need to be trimmed. The second main function is to compute the volume of each cut-cell, which is required to determine the collision rate and various macroscopic properties in the cut-cell. Both functions involve moderately complex geometrical calculations, but leave the basic DSMC data structures and algorithms completely unchanged. In order to sort a list of triangular surface elements into the list of L3 Cartesian cells, the MGDS code employs a “Cut Cell Intersection” technique initially detailed in the computer graphics literature and more recently adapted for CFD simulations by Aftosmis et al.8 The technique is able to use computationally efficient bitwise operations and comparisons to determine intersections between triangular elements and Cartesian cells. Further details on computing the intersection between generally positioned triangles in 3D are contained in Ref. 8. Once all surface elements are sorted into the appropriate L3 cells, the volume of each of these cut-cells is computed. The MGDS code uses a simple Monte Carlo technique to compute cut-volumes. Co-ordinates are chosen at random
within the un-cut Cartesian cell. The dot product between the chosen coordinates and the normal vector of a given surface element is computed. If the dot products are negative for all surface elements contained within the cell, then the point lies outside of the flow domain. The cut-volume is then determined from the fraction of points found to lie outside the flow domain relative to the total number of Monte Carlo co-ordinates considered (N). The error in such a volume calculation scales directly as 1/√N. It should be noted that this procedure is suitable only when the surface geometry cutting a given cell is either completely convex or completely concave. More complex collections of surface elements within a single L3 cell are currently not supported by the MGDS code, and advanced methods for such geometries are discussed in Ref. 8. The cut-cell algorithm is called at the beginning of each simulation and also immediately following each call to the AMR function. The current cut-cell algorithm requires less computational time than a single AMR call for all simulations presented in this article. Finally, inflow, outflow, symmetry, or wall surfaces may be specified by the user on all sides of the overall Cartesian bounding box of the simulation. These outer surfaces are added to the appropriate L3 cells in addition to the triangulated surface elements sorted by the cut-cell algorithm.
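A minimal sketch of this Monte Carlo cut-volume estimate is given below, using the illustrative surface_elem type from Section II A. It assumes the stored element normals point into the flow domain (the sign convention of the actual MGDS source may differ) and, per the restriction noted above, is valid only for fully convex or fully concave cut geometry.

function flow_volume(x0, xE, surf, nsamp) result(vol)
  use geometry_types
  implicit none
  real, intent(in) :: x0(3), xE(3)             ! un-cut cell bounds
  type(surface_elem), intent(in) :: surf(:)    ! cutting elements in this cell
  integer, intent(in) :: nsamp                 ! number of Monte Carlo points N
  real    :: vol, pt(3), r(3)
  integer :: i, s, nout
  logical :: outside_flow
  nout = 0
  do i = 1, nsamp
     call random_number(r)
     pt = x0 + r * (xE - x0)                   ! uniform sample in the un-cut cell
     ! the point lies outside the flow domain if the dot product is
     ! negative for every surface element in the cell
     outside_flow = .true.
     do s = 1, size(surf)
        if (dot_product(pt - surf(s)%v1, surf(s)%normal) >= 0.0) then
           outside_flow = .false.
           exit
        end if
     end do
     if (outside_flow) nout = nout + 1
  end do
  ! flow volume = (1 - cut fraction) * un-cut volume; error ~ 1/sqrt(N)
  vol = (1.0 - real(nout) / real(nsamp)) * product(xE - x0)
end function flow_volume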
IV. MGDS Simulation Results

A. Effect of variable time steps
Mach 5 flow of argon over a 10 cm flat plate at 30 degrees angle of attack is simulated to steady-state with and without the use of variable time steps. The free-stream density and temperature are 7.5 × 10−5 kg/m3 and 200 K, respectively. The temperature of the flat plate is held fixed at Twall = 2000 K on the bottom and Twall = 200 K on the top. The variable hard-sphere (VHS) collision model is used with a power law value of ω = 0.81, and diffuse reflection and full thermal accommodation are imposed for surface collisions. The simulation is three-dimensional, extending approximately 4λ∞ in the z direction. However, since symmetry boundary conditions are imposed on the z boundary planes, the resulting flow is uniform in the z coordinate direction and only the two-dimensional flow field results are presented.
(a) Normalized density field.
(b) Number of simulation particles per cell.
Figure 6. MGDS simulation results without the use of variable time steps.
Figure 6(a) shows the resulting density field (normalized by the free-stream density) when a constant simulation time step is used. The 3-level Cartesian grid has been adapted to the local mean-free-path and the constant time step is less than τc/2 everywhere, thus producing an accurate solution. It is evident from Fig. 6(a) that the density increases by a maximum of 5 times between the free-stream and the leading edge of the plate. Although the density (and therefore the number density, n) increases towards the plate leading edge, the cell size decreases proportionately. Thus AMR should naturally provide some control in maintaining a constant number of particles per cell. However, as evident from Fig. 6(b), even with AMR, the average number of simulation particles per cell varies widely, from less than 10 to
more than 500. If one considers the number of real gas molecules located within a volume of one cubic mean-free-path (Nreal), the following dependence is realized:

Nreal = nλ³,  n ≈ ρ,  λ ≈ 1/ρ.  (1)
If constant simulation particle weights are used, the number of simulation particles per cell should therefore scale with 1/ρ² in 3D (and 1/ρ in 2D). For example, if the density drops by one order of magnitude, although the number of molecules per unit volume drops equally, the volume under consideration rises by three orders of magnitude. Therefore, even when every cell is refined precisely to the local value of λ, if there were 10 particles per cell in the high density region near the leading edge, one would expect (ρ/ρ∞)² = 5² = 25 times more particles in the lower density free-stream cells. This is precisely verified in Fig. 6(b), where there are approximately 250 particles per cell in the free-stream region and 10 near the leading edge of the flat plate. Note also how, in the very low density region above the plate, there is an enormous number of particles per cell. As stated earlier, only 20 particles per cell are required for accuracy, and using additional particles per cell is inefficient. Thus, even with precise AMR for 3D simulations, large variations in the number of particles per cell will occur naturally in a DSMC simulation, resulting in very inefficient simulations that use far more particles than required for statistical accuracy.
(a) Local time step ratio in each L3 cell.
(b) Number of simulation particles per cell.
Figure 7. MGDS simulation results with the use of variable time steps.
As described by Kannenberg and Boyd,11 the use of a variable time step in each cell significantly alleviates this problem. Compared to a global reference time step, ∆tref, the time step used to advance particles within a given cell is increased or decreased by a factor that varies from cell to cell (∆t = ∆tratio × ∆tref). This factor is equal to the ratio of the local mean-collision-time to the global reference time step, ∆tratio = S × τc/∆tref, and can be scaled with a globally constant factor S. At the same time, the weight for all particles within the cell is also multiplied by ∆tratio. Essentially, where τc (and therefore λ) is large, particles ‘move more rapidly’ through these larger cells. During a given instant, this reduces the number of particles found in that cell and, by adjusting the particle weight by the same factor, the number density in the cell remains correct. Since τc ≈ 1/ρ, the combination of variable time steps and AMR improves the scaling for the number of particles per cell to N ≈ constant in 2D and N ≈ 1/ρ in 3D. As detailed in Ref. 11, the new time step, number of particles, and particle weight are used in each cell to compute collision rates and outcomes without any modification to the DSMC algorithm. The MGDS code is able to set a variable time step in each L3 cell. In addition, this local time step can be slightly adjusted to account for in-exact AMR. Even when using a 3-level adaptive grid, locally ∆x ≠ λ precisely. In the MGDS code, the local time step factor is set as ∆tratio = S × (τc/∆tref) × (∆x/λ). Thus, if a given cell is slightly too small, the time step is lowered such that additional particles will “accumulate” in this smaller cell. It should be noted that adjusting the local time step in this manner is accurate as long as ∆x/λ ≈ 1, which is the case when using a 3-level Cartesian grid with AMR. Figure 7(a) shows the time step ratio used in each L3 cell for the flat plate problem. The combination of precise AMR and precise local time step adjustment leads to a much more uniform distribution of
particles per cell, as seen in Fig. 7(b). The flow field solutions, surface heating rates, and surface shear stress profiles obtained when using the variable time step method are verified to reproduce the results obtained using a constant simulation time step, which required substantially more particles. Varying the particle weight in each cell independent of the time step is an alternate method of controlling the number of particles per cell that is used in existing DSMC codes and is also discussed in Ref. 11. Here, particles are either cloned or deleted in order to obtain the desired number of particles per cell, and their weights are adjusted accordingly. While the deletion of simulation particles is not thought to influence solution accuracy, the cloning of particles may correlate statistics and result in random walk errors.1 Although techniques exist to minimize such errors, the combination of AMR and local time steps should greatly reduce the degree to which particles are cloned/destroyed.
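A minimal sketch of this local time-step setting, using the illustrative l3_cell type from Section II A., is given below; the value suggested for the scaling factor is an assumption consistent with the ∆t ≈ τc/5 guideline from the Introduction, not a documented MGDS default.

subroutine set_local_timestep(cell, dt_ref, s)
  use geometry_types
  implicit none
  type(l3_cell), intent(inout) :: cell
  real, intent(in) :: dt_ref                        ! global reference time step
  real, intent(in) :: s                             ! global scaling factor, e.g. s ~ 0.2 (assumed)
  real :: dx, dt_ratio
  dx = (product(cell%xE - cell%x0))**(1.0/3.0)      ! nominal cell dimension
  ! dt_ratio = S * (tau_c / dt_ref) * (dx / lambda): a cell slightly
  ! smaller than lambda receives a smaller time step, so particles
  ! "accumulate" there and the particle count per cell stays uniform.
  dt_ratio = s * (cell%tau_c / dt_ref) * (dx / cell%lambda)
  cell%dt  = dt_ratio * dt_ref
  ! the particle weight in this cell is scaled by the same dt_ratio
  ! (not shown) so that the number density remains correct
end subroutine set_local_timestep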
B. Hypersonic blunt body flows
(a) Temperature field and final 3-level Cartesian grid.
(b) Close-up of the plate-tip flow region.
Figure 8. Simulation results for Mach 15 nitrogen flow over a vertical flat plate.
Mach 15 flow of diatomic nitrogen gas over a 16 cm vertical flat plate is simulated with the MGDS code using both AMR and the variable time step method. The free-stream density and temperature are 6.5 × 10−6 kg/m3 and 80 K, respectively. The temperature of the flat plate is held fixed at Twall = 1500 K on the front surface and Twall = 200 K on the back. The variable hard-sphere (VHS) collision model is used with a power law value of ω = 0.75, and diffuse reflection and full thermal accommodation are imposed for surface collisions. The Larsen-Borgnakke12 model is used for translational-rotational energy exchange together with the variable rotational energy exchange probability model of Boyd,13 using a maximum rotational collision number of 18.1 and a reference temperature for rotational energy exchange of 91.5 K. Vibrational energy is not considered. Again, the simulation is three-dimensional, extending approximately 8λ∞ in the z direction with symmetry boundary conditions. Only the two-dimensional flow field results are presented. The MGDS simulation begins with a uniform Cartesian mesh (L3 = L2 = L1) that is sized to ∆x = 0.5λ∞ and a uniform time step set to ∆t = 0.2τc,∞. The simulation rapidly reaches steady state and the solution is then sampled for approximately 200 time steps. At this point the AMR function is called, which refines the mesh, re-sorts all particles in the new mesh, resets all local time steps, and then continues with the simulation. The translational temperature field after four AMR calls is shown in Fig. 8(a), which clearly shows the bow shock and compression region ahead of the plate and a rarefied region behind the plate. A close-up of the plate tip is shown in Fig. 8(b). Here the level of refinement between the L1 cells (behind the plate) and the L3 cells in front is seen to vary by a factor of 30. Indeed, without the variable time step method, the large cells in the plate wake would naturally contain 30² = 900 times more particles than the refined cells in front of the plate. Also evident in Fig. 8(b) is the increased flexibility a 3-level Cartesian mesh has over a 2-level mesh. A 2-level Cartesian mesh requires uniform cell spacing within each L1 cell, which would result in substantially more (unnecessary) cells in the high gradient region upstream of the flat plate. Not only would a 2-level Cartesian mesh require more cells and therefore more particles, but it would also introduce further variation in
the number of particles per cell. In such a 2-level mesh, some of the cells would now be much smaller than the local value of λ. As a result, these cells may no longer contain sufficient particles (≈ 20) and the solution may become inaccurate (not just inefficient). In order to restore accuracy, either the global number of particles would have to be increased (highly inefficient) or particle cloning would become necessary.
Figure 9. Mach 15 nitrogen flow over a 3D vertical flat plate.
In order to highlight the capability of the MGDS geometry model, AMR algorithm, and cut-cell algorithm, two additional solutions are presented in Fig. 9 and Fig. 10. The validation of MGDS simulation predictions against experimental results is not addressed in the current article. Figure 9 depicts 3D flow over a vertical flat plate of finite size in the z direction. The free-stream conditions are identical to those used for the simulation displayed in Fig. 8, except that Twall = 2500 K on the front of the plate. This modification results in more moderate density gradients near the plate surface and requires less refinement, thereby enabling this 3D flow to be simulated on a single processor. Figure 9 highlights how the general AMR algorithm outlined in Section III B. is capable of smoothly adapting a 3-level Cartesian mesh for 3D flows involving large density gradients. Contours of translational temperature are plotted for the flow field, and contours of the heat transfer coefficient are plotted on the plate surface. In Fig. 9, the fine grid spacing on the front of the plate and the coarse spacing immediately behind the plate are clearly evident. Finally, Fig. 10 shows a 3D solution of hypersonic flow of argon gas over a cylinder. The solution is uniform in the z direction and only the resulting 2D flow field is displayed. The free-stream density and temperature are 6.5 × 10−6 kg/m3 and 80 K, respectively. The temperature of the cylinder is held fixed at Twall = 1500 K. Diffuse reflection and full thermal accommodation are imposed for surface collisions. The free-stream velocity is set to correspond to a very high Mach number of 29. The resulting temperatures (seen in Fig. 10) are unrealistically high, as no ionization reactions are enabled. However, such a high Mach number was selected in order to test the MGDS AMR and cut-cell algorithms with a challenging case involving large density variations. The cylinder geometry is specified as a triangulated surface and is cut from the initial uniform Cartesian flow field mesh, and also from each new mesh generated by the AMR function. Although the cylinder geometry may seem simple, the cut cells resulting from the surface geometry span a wide range of volumes, ranging from nearly complete cells to very small slices of the original Cartesian cells. A close-up view of the mesh refinement near the cylinder surface at approximately 45 degrees from the stagnation point is shown in Fig. 11.
Figure 10. Mach 29 argon flow over a cylinder.
Figure 11. Close-up view of the cylinder mesh at 45 degrees from stagnation point.
V. Summary and Conclusions

In this article, the data structures and algorithms of a newly developed 3D direct simulation Monte Carlo (DSMC) program, called the Molecular Gas Dynamic Simulator (MGDS) code, are outlined. The code employs an embedded 3-level Cartesian mesh, accompanied by a cut-cell algorithm to incorporate triangulated surface geometry into the adaptively refined Cartesian mesh. This geometry model is selected for its low memory storage requirements, its ability to decouple surface and flow field discretizations, and the increased computational efficiency of DSMC particle movement procedures over non-Cartesian meshes. Two separate data structures are proposed in order to separate geometry data from cell and particle information. The geometry data structure contains all information required for the particle movement and AMR procedures; for a Cartesian mesh, however, this still involves little memory, such that each partition in a parallel simulation could potentially store the entire geometry. A separate cell/particle data structure is maintained that holds all cell and particle data. This data structure requires significant memory storage and must be partitioned in a parallel simulation. Such separate geometry and cell/particle data structures enable the particles within each cell to be moved for a full time step without requiring information from other cell data structures, including movement with variable time steps across multiple cells. Within the MGDS code, particles are moved through the 3-level Cartesian mesh using a ray-tracing technique. A simple and efficient AMR algorithm that maintains local cell size and time step consistent with the local mean-free-path and local mean collision time is detailed. Simulations are presented that demonstrate significant efficiency gains for 3D flows through the use of precise AMR and variable time steps. Finally, a cut-cell method is outlined that sorts an arbitrary list of triangulated surface elements into local Cartesian cells and computes the cut volume using a Monte Carlo technique. The ray-trace particle movement algorithm is then able to accurately detect and simulate particle collisions with complex 3D surface geometries.
Acknowledgments

This work is partially supported by a seed-grant from the University of Minnesota Supercomputing Institute (MSI). This work is also supported by the Air Force Office of Scientific Research (AFOSR) under Grant No. FA9550-04-10341. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the AFOSR or the U.S. Government.
13 of 14 American Institute of Aeronautics and Astronautics
References

1. Bird, G. A., Molecular Gas Dynamics and the Direct Simulation of Gas Flows, Oxford University Press, New York, 1994.
2. Rader, D. J., Gallis, M. A., Torczynski, J. R., and Wagner, W., “DSMC Convergence Behavior of the Hard-Sphere-Gas Thermal Conductivity for Fourier Heat Flow,” Physics of Fluids, Vol. 18, No. 7, 2006, pp. 1–16.
3. Gallis, M. A., Torczynski, J. R., and Rader, D. J., “DSMC Convergence Behavior for Transient Flows,” AIAA Paper 2007-4258, June 2007.
4. Dietrich, S. and Boyd, I. D., “Scalar and Parallel Optimized Implementation of the Direct Simulation Monte Carlo Method,” Journal of Computational Physics, Vol. 126, 1996, pp. 328–342.
5. Otahal, T. J., Gallis, M. A., and Bartel, T. J., “An Investigation of Two-Dimensional CAD Generated Models with Body Decoupled Cartesian Grids for DSMC,” AIAA Paper 2000-2361, June 2000.
6. LeBeau, G. J., “A Parallel Implementation of the Direct Simulation Monte Carlo Method,” Computer Methods in Applied Mechanics and Engineering, Vol. 174, 1999, pp. 319–337.
7. Ivanov, M. S., Markelov, G. N., and Gimelshein, S. F., “Statistical Simulation of Reactive Rarefied Flows: Numerical Approach and Applications,” AIAA Paper 98-2669, 1998.
8. Aftosmis, M. J., Berger, M. J., and Melton, J. E., “Robust and Efficient Cartesian Mesh Generation for Component-Based Geometry,” AIAA Journal, Vol. 36, No. 6, 1998, pp. 952–960.
9. LeBeau, G., Jacikas, K., and Lumpkin, F., “Virtual Sub-Cells for the Direct Simulation Monte Carlo Method,” AIAA Paper 2003-1031, Jan. 2003.
10. Gao, D. and Schwartzentruber, T. E., “Parallel Implementation of the Direct Simulation Monte Carlo Method for Shared Memory Architectures,” AIAA Paper 2010-451, Jan. 2010, presented at the 48th AIAA Aerospace Sciences Meeting, Orlando, FL.
11. Kannenberg, K. C. and Boyd, I. D., “Strategies for Efficient Particle Resolution in the Direct Simulation Monte Carlo Method,” Journal of Computational Physics, Vol. 157, 2000, pp. 727–745.
12. Larsen, P. S. and Borgnakke, C., “Statistical Collision Model for Monte Carlo Simulation of Polyatomic Gas Mixture,” Journal of Computational Physics, Vol. 18, 1975, pp. 405–420.
13. Boyd, I. D., “Rotational-Translational Energy Transfer in Rarefied Nonequilibrium Flows,” Physics of Fluids A, Vol. 2, No. 3, 1990, pp. 447–452.