Multi-Level Multi-Domain Algorithm Implementation For Two-Dimensional Multiscale Particle In Cell Simulations

A. Beck (a), M.E. Innocenti (b), G. Lapenta (b,c), S. Markidis (d)

(a) Laboratoire Leprince-Ringuet - Ecole polytechnique, CNRS-IN2P3
(b) Center for Plasma Astrophysics, Department of Mathematics, K.U. Leuven, Celestijnenlaan 200B, B-3001 Leuven, Belgium
(c) Exascience Intel Lab Europe, Kapeldreef 75, B-3001 Leuven, Belgium
(d) PDC Center for High Performance Computing, KTH Royal Institute of Technology, Stockholm, Sweden

Abstract

Space weather simulations pose a number of modeling challenges, most of which arise from the multiscale and multiphysics aspects of the problem. The multiple scales dramatically increase the requirements in terms of computational resources, because of the need to perform large scale simulations with the proper small-scale resolution. Lately, several suggestions have been made to overcome this difficulty by using various refinement methods, which consist of splitting the domain into regions of different resolutions separated by well defined interfaces. The multiphysics issues are generally treated in a similar way: interfaces separate the regions where different equations are solved. This paper presents an innovative approach based on the coexistence of several levels of description, which differ by their resolutions or, potentially, by their physics.


Instead of interacting through interfaces, these levels are entirely simulated and are interlocked over the complete extension of the overlap area. This scheme has been applied to a parallelized, two-dimensional, Implicit Moment Method Particle in Cell code in order to investigate its multiscale description capabilities. Simulations of magnetic reconnection and plasma expansion in vacuum are presented, and possible implementation options for this scheme on very large systems are also discussed.

Keywords: Particle-In-Cell, Multiscale, Refinement, Multilevel, Multidomain, Implicit, Adaptive

1. Introduction

As the performance of computing devices grows, so do the ambitions of the scientific community. What used to be local simulations of independent problems is now becoming global simulations including many different components: see, in the space weather field, the remarkable efforts made in recent years to simulate the entire heliosphere by coupling existing models based on different physics [1, 2, 3]. Most of the time, these components are physically separated domains differing either by their resolution (multiscale) or by the equations solved (multiphysics). In plasma physics, efforts to support multiple scales have been made during the last two decades in the frame of Magneto Hydrodynamics (MHD). Adaptive Mesh Refinement (AMR) is now routinely used in all state-of-the-art MHD codes [4, 5, 6, 7, 8]. But it is only recently that new schemes have been suggested for multiscale support in Particle In Cell (PIC) plasma simulations, either of the AMR [9, 10] or of the Moving Mesh Adaptation (MMA) type [11, 12, 13].


The common ground shared by most of these works is that, as in most AMR flavors, they require a partition of space. The domain is divided into several disjoint sub-domains. Each sub-domain has its local resolution and interacts with neighboring sub-domains through their common interface. Sometimes, in addition to the local resolution level, some quantities are also evaluated at the global, coarser, resolution level. In that case, the high resolution mesh is said to overlap the coarser one over an 'overlap area'. Usually, the overlap area, which is on the coarse grid level by definition, is either not simulated at all or, at least, not simulated fully, with both fields and particles. Two main reasons can be identified for the relative paucity of adaptive PIC codes when compared to adaptive MHD codes. One is the fact that most of the PIC codes in use are explicit and thus bound by the strict stability constraints of explicit PIC methods [14, 15]. This prevents them from fully exploiting the potential of adaptive techniques. In fact, the Refinement Factor (RF) between the coarse and refined grids still has to be chosen so as to resolve a fraction of the Debye length and a fraction of the inverse electron plasma frequency for stability reasons. A second reason is the issue of boundary conditions between the grids. While the exchange of fields between sub-domains of different resolution is rather straightforward, problems arise in the PIC algorithm for computational particles. The standard derivation of the PIC method assumes that the computational particles, used to sample the distribution function, are of the same size as the cell they are sitting in. This clearly raises an issue at the interfaces, where computational particles are supposed to cross sub-domains of different cell sizes.


New time-dependent terms appear in the particle equations of motion if the shape function of the crossing particle is instantly adapted to the new cell size. Many solutions to this problem have been proposed in the above mentioned works. In some cases, the grid resolution is constrained to change slowly, so that the new time-dependent terms in the particle equations of motion can be neglected [13]. Alternatively, particle splitting and coalescing algorithms [16] are used to map one coarse grid computational particle to an appropriate number of refined grid computational particles and vice versa [17]. Finally, advanced charge deposition algorithms are used for computational particles near the sub-domain interfaces, with the aim of reducing the self-forces that they exert on each other in the proximity of sub-domain boundaries [18]. Innocenti et al. [19] recently suggested a refinement technique called the Multi-Level Multi-Domain (MLMD) method. The present work is the extension to two dimensions of the method proposed there. The main points of novelty of the MLMD method, which differentiate it from previous works, arise from the need to respond to the two issues identified above. The first challenge is addressed by using an Implicit Moment Method (IMM) algorithm [20, 21] instead of an explicit one. Notice that, of the adaptive PIC implementations listed before, only the MMA codes belonging to the Venus/Celeste/Democritus family are not explicit. The use of the IMM replaces the strict stability constraints of the explicit PIC with a less exacting accuracy constraint, which permits the use of larger grid spacings ∆x and time steps ∆t.


This obvious advantage becomes even more relevant in the MLMD approach, since it translates into the possibility of choosing very high RFs, up to RF = 14 in the examples presented in Sec. 4. The problem of the particle shape function across the level boundaries is addressed by the MLMD implementation itself. In the MLMD algorithm, particles do not change sub-domains since all levels are fully simulated with fields and particles. Computational particles are spawned and updated at all levels, and not only at the most refined one as is usually done in AMR PIC. Consequently, particle information does not need to be exchanged from the refined to the coarser grids. This avoids the particle coalescence operations of Fujimoto and Machida [17] and therefore prevents the reported problems with the conservation of the total energy of particles and of the distribution function [16]. The additional computational cost that comes from simulating particles at all levels is partially compensated by the use of the IMM algorithm.

The paper is structured as follows. In Section 2, the points of novelty of the presented implementation are briefly reviewed. Sections 3.1 and 3.2 illustrate the code capabilities and robustness in the cases of magnetic reconnection and plasma expansion in vacuum. Section 3.3 then focuses specifically on the benefits and limits of the method. Section 4 provides a general picture of the parallel implementation and comments on the performances, which have not been optimized yet.

2. Novelty of the presented implementation

Here, the two points of novelty of the presented MLMD implementation, the use of an IMM algorithm and the MLMD structure itself, are briefly described.

Ref. [19] is again referenced for a more detailed discussion of the MLMD implementation choices.

2.1. The Implicit Moment Method

The IMM is not new in itself, having been developed as early as in Brackbill and Forslund [20], but it is used in Ref. [19] and here for the first time in a non-MMA adaptive implementation. It has interesting stability properties which make it particularly fit for an MLMD method. The IMM is thoroughly described in Vu and Brackbill [22]. The point of relevance for the present paper is the fact that the approximations used to derive the equations solved in the method introduce the following accuracy constraint on the average particle thermal velocity vth:

vth ∆t/∆x < 1,     (1)

where ∆x is the grid spacing and ∆t the time step: the average particle should not move more than a cell per time step. The necessity to avoid the Finite Grid Instability [14, 15] adds a lower limit to ς = vth ∆t/∆x. The complete stability and accuracy constraint of the method becomes

ε < ς < 1,     (2)

with ε ≈ 0.01 [22, 21].

Notice that, with fixed ∆t, a ∆x too large on the coarse grid might trigger a Finite Grid Instability. This, however, is not a common occurrence because of the high frequency damping properties of the IMM [20]. On the other hand, the accuracy constraint threatens the refined grid in case of a ∆x too small. These two limits define the theoretical maximum RF, RFmax, one can use without violating the stability or accuracy constraints. Eq. 2 gives RFmax = 1/ε ≈ 100. These constraints are very weak with respect to the ones of explicit methods, and thus RFs as high as RF = 12 and RF = 14, as in Sections 3.1 and 4, can be achieved safely.
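To make the constraint concrete, the minimal sketch below evaluates ς on both levels of a two-level setup and checks Eq. 2. It is not part of the original code; the numerical values are those quoted for the reconnection runs of Section 3.1, and the variable names are chosen here for illustration only.

```cpp
#include <cstdio>

// Sketch: check the IMM accuracy/stability constraint eps < v_th*dt/dx < 1
// on both levels of a two-level MLMD setup. Illustrative only; it does not
// reproduce the Parsek2D implementation.
int main() {
    const double eps  = 0.01;   // lower bound set by the Finite Grid Instability
    const double v_th = 0.045;  // electron thermal velocity, in units of c
    const double dt   = 0.125;  // time step, in units of 1/omega_pi
    const double dx_c = 0.078;  // coarse grid spacing, in units of d_i = c/omega_pi
    const int    RF   = 12;     // refinement factor
    const double dx_r = dx_c / RF;  // refined grid spacing

    // With these units the ratio v_th*dt/dx is dimensionless.
    const double s_coarse  = v_th * dt / dx_c;
    const double s_refined = v_th * dt / dx_r;

    std::printf("coarse grid:  sigma = %g\n", s_coarse);
    std::printf("refined grid: sigma = %g\n", s_refined);

    // The refined grid limits sigma from above, the coarse grid from below,
    // which is what bounds the refinement factor to roughly 1/eps.
    if (s_coarse <= eps)  std::printf("coarse grid risks a Finite Grid Instability\n");
    if (s_refined >= 1.0) std::printf("refined grid violates the accuracy constraint\n");
    return 0;
}
```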

2.2. The MLMD method

The main difference between the MLMD method and the AMR implementations cited earlier is that here multiple levels are simulated fully, with fields and particles, also in the overlap area previously defined.

The different levels are free to evolve relatively independently, within the limits of physical significance and of the inter-level operations performed, which loosely couple the level evolution. Evolution according to the local grid dynamics is thus possible on each level.

A full description of the method can be found in Innocenti et al. [19]. The exchange of information between the coarser (C) and refined (R) grids is recalled in Figure 1 as a reminder. It happens in three stages: (1) interpolation of field boundary conditions from the coarser to the refined grids ("C2R"), (2) projection of the updated field information back to the coarser grid ("R2C") and (3) exchange of particle boundary conditions ("C2R"). A Particle Repopulation Area (PRA), corresponding to a number of boundary cells chosen by the user, is identified on the refined grid. The native refined grid particles sitting in the PRA are deleted and new particles are generated according to the coarse grid particle population data.
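The resulting cycle structure can be summarized by the schematic C++ sketch below. The types and function names are invented placeholders for the operations of Figure 1 and do not correspond to the actual Parsek2D/MLMD classes; the precise placement of each exchange within the implicit cycle follows Ref. [19].

```cpp
#include <vector>

// Minimal stand-in types for the sketch; they are NOT the Parsek2D classes.
struct Field    { std::vector<double> values; };
struct Particle { double x, y, u, v, w, q; };
struct Level    { Field fields; std::vector<Particle> particles; };

// The three inter-level stages of Figure 1, reduced to empty stubs here.
Field interpolate_C2R(const Level& /*coarse*/)              { return {}; } // (1)
void  project_R2C(const Field& /*refined*/, Level& /*coarse*/) {}          // (2)
std::vector<Particle> repopulate_PRA(const Level& /*coarse*/) { return {}; } // (3)

void advance_level(Level& /*l*/) { /* implicit field solve + particle mover */ }

// One MLMD cycle for a two-level system: both levels are fully advanced,
// with fields and particles, also inside the overlap area.
void mlmd_cycle(Level& coarse, Level& refined) {
    Field bc = interpolate_C2R(coarse);   // (1) C2R field boundary conditions
    (void)bc;                             //     applied to the refined grid edge
    advance_level(coarse);
    advance_level(refined);
    project_R2C(refined.fields, coarse);  // (2) R2C field projection over the overlap area
    // (3) PRA repopulation: native refined particles inside the PRA are deleted
    //     (omitted in this stub), then particles split from the coarse
    //     population near the boundary are injected.
    std::vector<Particle> injected = repopulate_PRA(coarse);
    refined.particles.insert(refined.particles.end(), injected.begin(), injected.end());
}

int main() {
    Level coarse, refined;
    mlmd_cycle(coarse, refined);
    return 0;
}
```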

Figure 1: Information exchange in an IMM MLMD system. The coarse grid, grid l, provides the refined grid, grid l + 1, with field and particle boundary conditions, depicted as the "Boundary Conditions" red arrow and the "Particle Repopulation" green arrow. The refined grid updates the fields locally calculated on the coarse grid with the "Refined Fields" information, blue arrow. The logical blocks of the IMM algorithm are from Lapenta et al. [21].


3. Physical cases

This section describes two physical applications of the method in order to pinpoint its benefits and limits.

3.1. Magnetic reconnection

Magnetic reconnection [23] is a very interesting problem to simulate with an MLMD code. Two distinct areas can be identified in the reconnection plane, where the physics of interest evolves on rather different spatial and temporal scales [24]. In the electron diffusion region (EDR), processes occur at the smallest and fastest scales. Both electrons and ions are decoupled from the magnetic field lines and the tiny electron scales need to be resolved. In the ion diffusion region (IDR), instead, electrons are still tied to the magnetic field lines and processes happen at ion scales, which are at least one order of magnitude larger.

IMM algorithms are routinely used to resolve the ion scales only and thus save the computational cost of resolving the electron scales, as explicit codes are forced to do [25, 26]. However, in these cases, the electron dynamics in the EDR is reproduced only in a reduced way: the higher frequencies unresolved by the temporal and spatial stepping used are damped (but not suppressed) by the IMM method [20]. The MLMD approach, instead, offers the opportunity of retaining the electron dynamics in the area where it is of relevance, at a very cheap computational cost, since the EDR is very small and the need

for refinement is very localized. A magnetic reconnection simulation in a two-level MLMD system is shown here. The coarse grid has dimensions Lx,gl0/di = Ly,gl0/di = 20, where di is the ion skin depth. Four species are simulated: electrons, ions, background electrons and background ions. 256 × 256 cells are used on each level and 12 × 12 particles per species per cell are generated. One PRA cell per side is used. A double Harris equilibrium [27] is set at initialization, with half width of the current sheet LH/di = 0.5, mass ratio mi/me = 256, electron thermal velocity vth,e = 0.045 and ratio between the ion and electron temperatures Ti/Te = 5. Periodic boundary conditions for fields and particles are set on the coarse grid. By setting initial perturbations of appropriate sign, two antisymmetric reconnection points are initiated at x/di = y/di = 4.84 and x/di = y/di = 14.84. A refined grid with RF = 12 per side is centered around the X point at x/di = y/di = 14.84. This X point is referred to as the 'test X point' in the rest of the paper and exists on both levels. The other X point, referred to as the 'control X point', is used as comparison during the evolution and exists only on the coarse level. The refined grid has dimensions Lx,gl1/di = Ly,gl1/di = 1.67. The same time step, Ωci ∆t = 0.0012, is used for both grids, where Ωci is the ion cyclotron frequency. This time step is rather small with respect to the one usually used in IMM simulations, which is of the order of a fraction of the inverse ion cyclotron frequency. It is chosen this small in order to sufficiently resolve the refined grid in time.
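For reference, the run parameters listed above can be collected in one place. The structure below is only an illustrative container: every value is taken from the text, but the field names and the struct itself do not reproduce the Parsek2D input format.

```cpp
// Summary of the MLMD magnetic reconnection run described above.
// Illustrative container only, not the Parsek2D input-file format.
struct ReconnectionSetup {
    // coarse grid (level 0)
    double Lx_di = 20.0, Ly_di = 20.0;  // domain size, in ion skin depths
    int    nx = 256, ny = 256;          // cells per level
    int    ppc_x = 12, ppc_y = 12;      // particles per species per cell
    int    n_species = 4;               // electrons, ions, background electrons, background ions
    int    pra_cells = 1;               // PRA width per side, in cells
    // refined grid (level 1)
    int    RF = 12;                     // refinement factor per side
    double Lx1_di = 1.67, Ly1_di = 1.67;
    // physics
    double LH_di = 0.5;                 // Harris half width of the current sheet
    double mass_ratio = 256.0;          // m_i / m_e
    double vth_e = 0.045;               // electron thermal velocity, in units of c
    double Ti_over_Te = 5.0;            // ion to electron temperature ratio
    double wci_dt = 0.0012;             // time step, in units of 1/Omega_ci
};
```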


Notice that the RF used here between the grids is exceptionally high, especially when compared with the ratios between grids used in PIC AMR codes based on explicit algorithms. A tenth of the ion skin depth is resolved on the coarser grid and a tenth of the electron skin depth on the refined grid. It will now be shown that reconnection develops as expected [28]. On the coarse grid, the projection operations from the refined grid do not hinder the expected evolution at the ion scales. Additionally, the boundary information exchanged from the coarse to the refined grid allows the refined level to correctly evolve at the electron scales. Figure 2 shows a snapshot of the electron and ion currents around the test X point of the coarse grid. Field lines are superimposed on the plots and a rectangle highlights the area simulated also with the refined grid. Notice the different scales which emerge in the plots: the ion current develops on scales of the order of the relatively large ion skin depth, while the electron currents are much more localized in space around the X point. The different scales are due to the different distances from the X line at which the meandering motion of ions and electrons starts. The different trajectories of electrons and ions generate a current system in the reconnection plane and thus the characteristic quadrupolar signature of collisionless magnetic reconnection without guide field in the out of plane magnetic field Bz. This signature is reproduced in Figure 3, top panel. The Hall field Ey, which marks the magnetic separatrices, is plotted in Figure 3, bottom panel. Both Figure 2 and Figure 3 compare positively with the expected current and field evolution during collisionless reconnection without guide field as depicted in [28]. Moreover, the two X points of the coarse grid level display extremely similar results.


Figure 2: Electron current in the out of plane (top panel) and in the x (center panel) direction and ion current in the x direction (bottom panel) at time Ωci t = 7.275 around the test X point of the coarse grid in the MLMD simulation of magnetic reconnection.


Figure 3: Out of plane magnetic field Bz (top panel) and Hall electric field Ey (bottom panel) at time Ωci t = 7.275 around the test X point of the coarse grid in the MLMD simulation of magnetic reconnection.


The presence of the refined grid on top of the test X point does not introduce projection artifacts. The benefits of the increased resolution brought by the refined grid in the MLMD simulation are evident in Figures 4 and 5. The capability of the refined grid to capture features that the reduced resolution of the coarse grid does not allow to observe is particularly evident when the parallel or perpendicular electron thermal velocities are studied, where parallel and perpendicular are with respect to the direction of the local magnetic field. The bottom panels of Figures 4 and 5 reproduce with a much higher level of detail the structure already partially captured by the coarse grid. In particular, a structure resembling an electron jet (see [29] and references therein) departing from the X point is identified by the refined grid. The jet as resolved by the refined grid is colder than the coarse grid counterpart in the parallel direction and approximately at the same temperature, but with a reduced thickness, in the perpendicular direction. Verification activities have been undertaken to check that the structures observed are not an artifact of the method and in particular of the particle injection technique used on the refined grid. It must be observed that a direct comparison of the refined grid results with a reference simulation fully performed at the refined grid resolution is not possible. In fact, the simulation presented here was performed on 16 × 16 × 2 cores for a duration of about seven hours. A comparable run at the refined grid resolution would need 16 × 12 × 16 × 12 cores for a similar time. This is not feasible on most of the currently available systems. However, other kinds of verification activities are available.


Figure 4: Parallel electron thermal velocity at time Ωci t = 7.275 in the MLMD simulation of magnetic reconnection. The top panel depicts a fraction of the coarse grid area, the central panel a close up on the coarse grid test X point and the bottom panel the refined grid area. The same color scale is used in the last two panels.


Figure 5: Perpendicular electron thermal velocity at time Ωci t = 7.275 in the MLMD simulation of magnetic reconnection. The top panel depicts a fraction of the coarse grid area, the central panel a close up on the coarse grid test X point and the bottom panel the refined grid area. The same color scale is used in the last two panels.


The same simulation has been repeated with PRA=10 instead of 1: the particles sitting in the first ten cells per side of the refined grid are directly generated according to the coarse grid particle state in the overlap area [19]. This simulation makes it possible to verify that the thermal velocity structures illustrated are not due to poor particle injection on the refined grid. Figures 6 and 7 show the parallel and perpendicular electron thermal velocities on the coarse (top panel) and refined (bottom panel) grids of the simulation performed with PRA=10 at time Ωci t = 7.76, analogous to Figures 4 and 5. The structures observed are qualitatively the same as the ones shown before. In particular, the same electron jet structure is evident and particles appear to cool down in the parallel direction in the refined grid. The results outlined above are in full agreement with the current understanding of reconnection as summarized recently, for example, in the textbook by Birn and Priest [30]. The electron jet exiting the reconnection region [31, 32] is clearly present in the electron current reproduced by the refined level (not shown here). The modifications of the pressure tensor and thermal speeds in the regions around the EDR are also captured correctly. As noted in the study by Le et al. [33], the perpendicular temperature (Figures 5 and 7) increases in the innermost layer of the EDR but decreases in its proximity, above and below. This stems from the conservation of the magnetic moment: in the incoming flux tubes, the electrons are still magnetized. As the magnetic field strength decreases in the flux tubes approaching the X point, the conservation of the magnetic moment requires a decrease of the perpendicular thermal velocity, a form of adiabatic cooling. Conversely, in the outgoing flux tubes, the increasing magnetic field intensity leads to increased adiabatic heating [34].
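The adiabatic argument can be made explicit through the conservation of the electron magnetic moment (a standard relation, restated here for clarity):

\[
\mu = \frac{m_e v_\perp^2}{2B} = \mathrm{const} \quad\Longrightarrow\quad v_\perp \propto \sqrt{B},
\]

so a decreasing B along the incoming flux tubes implies a decreasing perpendicular thermal velocity, while the increasing B in the outgoing flux tubes implies perpendicular heating.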


Another example of features that can be perceived only with a very high resolution simulation is the inversion layer of electric field in the EDR, as described in Chen et al. [35]. Figure 8 shows how the coarse grid is not at all capable of describing the fine structure of the EDR, while the refined grid shows all the details of the inverted electric field layers. In order to obtain this result, Chen et al. had to run a very similar PIC simulation using 2560 × 2560 cells, fifty times as many cells as in the MLMD simulation, for comparable domain size, mass ratio and resolution. Cases demonstrating a beneficial influence of the refined grid on the coarse grid have also been observed. Figure 9 shows such an example: the Hall field is depicted at time Ωci t = 12.32 for the coarse grid area simulated only with the lower resolution (top panel) and also with the higher resolution (bottom panel), with Harris equilibrium parameters LH/di = 0.5, mi/me = 256, vth,e = 0.045 and Ti/Te = 20. Notice that the parameters are the same as in the simulations illustrated beforehand in this section, with the only difference of a higher temperature ratio between ions and electrons. The higher Ti/Te decreases the inverse electron gyration frequency, which is now 1/Ωce = 0.2139 compared to a previous value of 1/Ωce = 0.4027. With the simulated spatial and temporal resolutions kept the same as in the previous test case (∆xc = 0.078, where ∆xc is the coarse grid spatial resolution, RF = 12 and ωpi ∆t = 0.125), the spatial resolution of the coarse grid is no longer sufficient to correctly sample the electron gyration. Hence, the noise structures radiating outwards from the reconnection point in Figure 9, top panel, arise.


Figure 6: Parallel electron thermal velocity at time Ωci t = 7.76 in the MLMD simulation of magnetic reconnection with PRA=10. The top panel depicts a close up on the coarse grid test X point and the bottom panel the refined grid area. The same color scale is used on the two grids.


Figure 7: Perpendicular electron thermal velocity at time Ωci t = 7.76 in the MLMD simulation of magnetic reconnection with PRA=10. The top panel depicts a close up of the coarse grid test X point and the bottom panel the refined grid area. The same color scale is used on the two grids.


Figure 8: Inversion layer of electric field as captured by the MLMD simulation in the EDR. The coarse grid (top panel) shows a classical X-structured EDR, while the meandering motion of electrons, well captured by the refined grid, is responsible for the formation of the inversion layer of electric field [35] visible on the bottom panel.


Notice that the coarse grid area simulated is bigger than before, Lx,gl0/di = 100, Ly,gl0/di = 60, to exclude the role of boundary effects in the excitation of these structures. The higher resolution in the refined grid prevents the development of these artifacts at the refined level. Very interestingly, moreover, the exchange of field information between the levels acts in the direction of reducing the noise level in the overlapped coarse level area, as shown in Figure 9, top panel. This is a clear demonstration of how the coarse grid is able to benefit locally from the presence of a refined grid when the correct evolution of the coarse grid is hindered by the lack of spatial resolution.

3.2. Plasma expansion in vacuum

This section illustrates the value of having access to large scales when the initial small system, well defined on a 'refined grid', naturally evolves towards a system larger than the initial grid. The evolution of a thermal, collisionless, globally neutral ball of plasma in vacuum is considered. It is commonly accepted that, after a transitory phase, the expansion becomes self-similar [36, 37]. The transitory phase consists in the ejection of the fastest electrons of the thermal plasma. If the initial Debye length of the plasma is much smaller than the initial ball size, only a small fraction of the electrons can escape. The transitory phase is then rather short and the plasma remains almost globally neutral. In the following, the results of the simulation of such a slow and progressive expansion in vacuum are shown. The plasma is made of electrons and protons with a mass ratio of 50. The initial radius of the plasma ball is R0. The thermal velocities and densities are set such that the initial Debye length is λD = 2.2 × 10⁻² R0. The grid setup is as follows.

Figure 9: The two reconnection points of the coarse grid with a resolution which fails to describe the ion scales. The bottom panel shows the control X point, where the noise amplitude is of the same order as the X structure itself. The top panel shows the test X point, which displays a significant noise reduction thanks to the beneficial influence of the refined grid.


The refined grid is a square grid of 128 × 128 cells of size ∆xf = 3.1 × 10⁻² R0. The diameter of the initial plasma ball occupies approximately half of the total size of the refined grid. The refinement factor is 8, so the coarse grid is made of 128 × 128 cells of size ∆xc = 0.24 R0. The diameter of the initial ball occupies less than 10 cells on the coarse grid. The two grids and the plasma ball share the same center. There are 225 computational particles per cell. Computational particles are created only in cells that have a non-empty intersection with the initial plasma ball. If a computational particle is still created outside the ball of plasma, it is given a zero weight. The time step is ∆t = 4 R0/c. This rather large time step is possible because of the very slow motion of the plasma, but mostly thanks to the IMM PIC scheme, which remains stable even for large time steps (see Section 2). The initial plasma is therefore well resolved on the refined grid but extremely coarsely described on the coarse grid. As expected, the first stage of the expansion is well described on the refined grid, but once particles exit the refined grid, the lower resolution and larger domain of the coarse grid become an asset allowing the expansion to be followed to larger scales. It is interesting to note that all coarse computational particles are subject to fields projected from the refined grid and averaged with the local coarse fields (see Section 1) until they reach an area of the coarse grid that does not overlap the refined grid. So the first stage of the expansion, even on the coarse grid, is half driven by the refined fields and their projection on the coarse grid. During this first stage, no particle information is ever exchanged between grids and communications are strictly limited to the exchange of fields from the refined to the coarse grid.
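A minimal sketch of this loading strategy is given below, assuming a simple cell/ball intersection test. The function and variable names, the deterministic in-cell placement and the unit weight are illustrative choices for the sketch and do not reproduce the actual initialization routine.

```cpp
#include <cmath>
#include <vector>

// Sketch of the particle loading described above: computational particles are
// created only in cells that intersect the initial plasma ball of radius R0;
// any particle that still falls outside the ball receives a zero weight.
struct Particle { double x, y, w; };

std::vector<Particle> load_ball(int nx, int ny, double dx, double dy,
                                double xc, double yc, double R0, int ppc) {
    std::vector<Particle> particles;
    const int side = static_cast<int>(std::round(std::sqrt(static_cast<double>(ppc)))); // 15 for 225 ppc
    for (int i = 0; i < nx; ++i) {
        for (int j = 0; j < ny; ++j) {
            // closest point of the cell to the ball centre -> cheap intersection test
            const double cx = std::fmin(std::fmax(xc, i * dx), (i + 1) * dx);
            const double cy = std::fmin(std::fmax(yc, j * dy), (j + 1) * dy);
            if (std::hypot(cx - xc, cy - yc) > R0) continue;  // cell misses the ball
            for (int p = 0; p < side * side; ++p) {
                // regular lattice inside the cell (deterministic, for brevity)
                const double x = (i + (p % side + 0.5) / side) * dx;
                const double y = (j + (p / side + 0.5) / side) * dy;
                // nominal unit weight inside the ball, zero weight outside;
                // the actual weight normalization is not reproduced here
                const double w = (std::hypot(x - xc, y - yc) <= R0) ? 1.0 : 0.0;
                particles.push_back({x, y, w});
            }
        }
    }
    return particles;
}
```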


Figure 10: Logarithm of the ion density profile as a function of time on the coarse (left panel) and refined (right panel) grids. The blue line shows the position of the ion front given by the theoretical model in [37].

It turns out that this small exchange of information is enough to keep coherence between the two grids but has a negligible influence on the coarse grid. The coarse grid is a great asset when it comes to resolving the latest stage of the expansion, but there is an upper limit to its capabilities. The larger the domain, the fewer cells are included inside the initial ball of plasma and the fewer computational particles are created on the coarse grid. The number of computational particles on the coarse grid being constant, using a larger grid is a trade-off where phase-space resolution is degraded to gain access to bigger domains and larger scales. Allowing for a higher number of particles per cell on the coarse grid would be a simple way to mitigate this effect. Since only a few cells are initially populated, this would have a very reasonable cost. Figure 10 shows the evolution of the ion density profile as a function of time.

Figure 11: Logarithm of the ion density profile as a function of time on a single coarse grid without refinement. The blue line shows the position of the ion front given by the theoretical model in [37].

The evolution is very smooth on the refined grid until the ion front passes the grid boundary. From there on, the refined grid is not capable of describing the global expansion any more, but it gives well resolved details about the inner dynamics of the ball. On the coarse grid, the expansion is not as well described as on the refined grid. The low resolution captures the ion front motion as a step function, but it still follows the global evolution prescribed by the refined grid. On both grids, the position of the ion front fits fairly well with the theoretical value given by the self-similar expansion model of Beck and Pantellini [37], shown in blue on the graphs. Figure 11 shows the same graph as Figure 10, but for a single grid run without refinement. It looks very much like the coarse grid with refinement, demonstrating that the presence of the refined grid is almost transparent for the coarse grid in this specific case, all the while giving access to details about the dynamics of the transitory phase.


3.3. Benefits and numerical limits of the method

The benefits of the method are twofold. First, the user gets a lot of additional information from the refined grid. This is showcased by the fine electron jet structure, the inversion layer of electric fields and the details of the transitory phase of the expansion in Sections 3.1 and 3.2. Second, the refined grid improves the accuracy of the coarse grid. Indeed, the presence of the refined grid is, at minimum, transparent to the coarse grid and, at best, very beneficial to it. As shown for the magnetic reconnection case, the refined grid is able to significantly reduce the noise of the coarse grid and to transpose locally generated fields to it. Moreover, forcing similar fields on both grids has the nice property of forcing similar particle behaviour. This is necessary for a good kinetic description of the problem. For instance, if a jet is generated on the refined grid, the coupling via the fields ensures that a similar jet will form on the coarse grid.

Two limits to the maximum value of RF have been identified. First, when the RF becomes too large, noise injection from the coarse to the refined grid becomes too strong and the benefit of a higher resolution can be lost. Another limit is set by Equation 2. As the RF increases, the time step has to be reduced in order to match the accuracy constraint on the refined grid. But ∆x remains the same on the coarse grid, so ς might eventually become too small, which would trigger a finite grid instability. Nevertheless, very high RFs are still accessible, and this represents a significant novelty among the refinement schemes currently available for PIC codes.
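Writing out the two bounds of Eq. 2 for the case considered here, a single time step shared by both grids, makes this limit explicit:

\[
\varsigma_{\mathrm{coarse}} = \frac{v_{th}\,\Delta t}{\Delta x_c}, \qquad
\varsigma_{\mathrm{refined}} = \frac{v_{th}\,\Delta t}{\Delta x_c/\mathrm{RF}} = \mathrm{RF}\,\varsigma_{\mathrm{coarse}},
\]

so requiring ς_refined < 1 (accuracy on the refined grid) together with ς_coarse > ε (no finite grid instability on the coarse grid) gives RF = ς_refined/ς_coarse < 1/ε ≈ 100, consistent with the RFmax of Section 2.1.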


4. Implementation and performances of MLMD

4.1. Implementation

The MLMD algorithm has been implemented in the IMM 2D PIC code Parsek2D [38], which is written in C++. It uses a “particle” object, a “grid” object and a “topology” object which defines a 2D MPI communicator. The MLMD implementation is strictly based on the same objects. ngrids of these objects are created, where ngrids is the number of levels the user needs. An “origin” property, added to the “grid” object, marks the starting coordinates of the current grid with respect to its parent grid. The original parallelization, based on a standard domain decomposition, is reused by each level without any changes. Each grid, associated with a particle object, has its own 2D MPI communicator and runs on a separate set of compute nodes. Each level is totally independent from the other levels as long as communications between levels are not necessary. When they are, a global 3D communicator, including all processes of all grids, is used. An elegant organization is to use the hyperplanes of the global 3D communicator as the 2D communicators of each grid. The field exchanges between grids, either C2R or R2C, consist of the exact same operations at every time step since the grid positions are fixed. All buffers, weights and communication patterns needed for these interpolations are computed and stored at initialization. The particle information to be exchanged C2R is collected on the coarse grid at the end of the mover operations. Each coarse grid process gathers information about all the computational particles which produce, after splitting, particles sitting in the PRA of the refined grid. That is, all particles with

x_i,PRA,L − ∆x_i,gl ≤ x_i,p ≤ x_i,PRA,R + ∆x_i,gl,

where the index i marks the spatial direction and x_i,PRA,R and x_i,PRA,L are the right and left boundaries of the PRA projected on the coarse grid. This information is then sent to each refined grid process whose PRA overlaps a part of the sender's extension. At this point, the refined grid processes perform the particle splitting operations and communicate the newborn refined computational particles to the appropriate neighbor processes. Notice that the native refined grid computational particles sitting in the PRA are deleted at the end of the mover operations.
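A minimal sketch of this selection step is shown below; the variable names mirror the inequality above, but the code is illustrative and does not reproduce the Parsek2D implementation.

```cpp
#include <vector>

// Sketch of the coarse-grid selection of particles gathered for C2R particle
// repopulation: a coarse particle is kept if, in every direction i, it lies
// inside the PRA extension projected on the coarse grid, enlarged by one
// coarse cell on each side.
struct CoarseParticle { double x[2]; /* position; velocity, charge, ... omitted */ };

bool feeds_PRA(const CoarseParticle& p,
               const double pra_left[2],   // x_{i,PRA,L}: lower PRA bounds (coarse coordinates)
               const double pra_right[2],  // x_{i,PRA,R}: upper PRA bounds (coarse coordinates)
               const double dx_coarse[2])  // Delta x_{i,gl}: coarse grid spacing
{
    for (int i = 0; i < 2; ++i) {
        if (p.x[i] < pra_left[i]  - dx_coarse[i]) return false;
        if (p.x[i] > pra_right[i] + dx_coarse[i]) return false;
    }
    return true;
}

// Typical use: after the mover, each coarse process scans its local particles
// and collects those satisfying the predicate into the C2R communication buffer.
std::vector<CoarseParticle> gather_for_PRA(const std::vector<CoarseParticle>& local,
                                           const double L[2], const double R[2],
                                           const double dx[2]) {
    std::vector<CoarseParticle> buffer;
    for (const auto& p : local)
        if (feeds_PRA(p, L, R, dx)) buffer.push_back(p);
    return buffer;
}
```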

4.2. Performances

The performance of the MLMD algorithm should not be measured only in light of its scalability. One should first look at how computationally expensive it would be for a standard method to deliver equivalent results. One of the main advantages of the IMM MLMD code presented here, with respect to explicit AMR codes, is the wide range of RFs that can be used while still satisfying the stability and accuracy conditions of Equation 2. This unusual feature motivates the procedure used here to evaluate the performance of the IMM MLMD. A Maxwellian plasma with ion and electron thermal velocity vth/c = 0.122 is simulated with ωpe ∆t = 0.15 on a domain with dimensions Lx/de = Ly/de = 21.0. The Maxwellian plasma is chosen as test case for its homogeneity. Two grids are simulated, both of them with nx = ny = 80 cells in the x and y directions.


Table 1: Execution times in seconds for two-grids MLMD and single grid fully refined Maxwellian plasma simulations with varying RFs.

RF    Exec. time on two grids [s]    Exec. time on single fully refined grid [s]
 2    41.25                          57.13
 6    43.98                          469.4
10    45.87                          1426
14    46.97                          3287

The execution times registered for the different RFs, with a fixed number of cycles Ncyc = 101 and using 8 × 8 cores per grid (128 cores in total), are summarized in the second column of Table 1. Notice that the execution times are, to a first approximation, roughly constant with the RF. This is because the same number of particles is simulated in all cases regardless of the RF and, as is well known, particle operations have the highest impact on the execution time in PIC codes [39]. The third column of Table 1 shows the execution times of the above mentioned Maxwellian plasma simulations performed as single level simulations which adopt the refined grid resolution on the entire domain. The number of cores used here is the same as in the previous MLMD cases, but nx and ny increase with the RF to keep up with the increasing resolution of the refined grid. Figure 12 compares the execution times of the two cases as a function of the RF. The gain in computational resources that the MLMD technique grants with respect to one-grid simulations, if only a portion Lx/RF × Ly/RF of the domain requires higher resolution, is of several orders of magnitude.
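A rough estimate explains this gain. Assuming, as noted above, that the run time is dominated by particle operations, and neglecting field-solver and communication costs,

\[
\frac{T_{\mathrm{single}}}{T_{\mathrm{MLMD}}} \approx \frac{N_p^{\mathrm{single}}}{N_p^{\mathrm{MLMD}}}
= \frac{(n_x\,\mathrm{RF})\,(n_y\,\mathrm{RF})}{2\,n_x n_y} = \frac{\mathrm{RF}^2}{2},
\]

which gives about 100 for RF = 14, the same order of magnitude as the measured ratio of roughly 70 in Table 1.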


Figure 12: Execution times in seconds for the above mentioned Maxwellian plasma simulations as a function of the RF . The red line depicts the values for the two-grid MLMD simulations as reported in Table 1. The execution times of one-level simulations with resolution equal to the refined grid resolution are plotted in blue.


4.3. Parallelization issues of the method

It is generally observed that the performance of the code decreases as the RF or the number of refined levels increases. Three reasons for this behavior can be identified. First, the number of operations for particle repopulation increases with the RF, since each coarse level particle is used to generate RF² refined level particles. Second, the need for boundary conditions from the level above induces sequential dependencies which do not appear in the single level algorithm. The refined grid has to wait for the coarse grid to finish the computation and communication of the boundary conditions it needs before being able to start solving the Maxwell equations. The code pays the full price of this sequentiality because each level is assigned to a separate group of nodes. As a consequence, processing units have a lot of idle time while waiting for data from the nodes taking care of the currently active level. Third, communications between different grid levels are not load-balanced. Only a few processing units participate in these operations and the imbalance gets worse as the RF increases, because a smaller and smaller fraction of the coarse grid interacts with the refined grid.

Moreover, the code in its current state is the closest it could be to the original single-level code. That made the MLMD implementation fairly straightforward in terms of software design, as explained in Section 4.1. But, as a consequence, there is ample room for optimization.


For instance, a major unnecessary cost is related to the fact that, in the current state of the algorithm, both grids use the same time step, which is chosen according to the finest grid resolution. This means that time is over-resolved on the coarser grids and many more cycles than strictly necessary are executed on the nodes dedicated to them: out of RF cycles, technically only 1 is necessary on the coarse grid.

4.4. Potential implementation improvements

Most of the performance problems listed above could be solved by allowing sub-cycling and by distributing each level over the whole system. Sub-cycling consists in having a different time step for each level. Typically, we would have ∆tgl = ∆tgl+1 × RF, meaning that the refined grid would run RF iterations for every single coarse grid iteration. Doing so, the computation of (RF − 1) unnecessary coarse grid iterations would be avoided, and the risk of having a finite grid instability on the coarse grid for high RFs, as explained in Section 3.3, would no longer be a concern. This would translate into actual wall-clock time savings only if the entire set of cores at disposal could participate in the refined grid operations. In the current state of the implementation, the available compute nodes are evenly split between the levels. In these conditions, sub-cycling would only make the compute nodes dedicated to the coarse grid remain idle while waiting for the other levels to finish their set of iterations. This could be prevented by distributing each level over the whole set of available compute nodes. In practice, it means that, instead of hosting a fraction of only one level, each MPI process would host a smaller section of each level. There would be only one active level at a time, but this level would be treated by all nodes simultaneously in a probably very efficient way, since the single level IMM algorithm has already proved an excellent scalability.
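A sketch of the proposed sub-cycled step is given below. All routines are empty placeholders standing in for the corresponding IMM and MLMD operations, since the current code still uses a single shared time step; the ordering of the inter-level exchanges within the sub-cycle is also only indicative.

```cpp
// Sub-cycling sketch: with dt_coarse = RF * dt_refined, the refined level
// runs RF iterations per coarse iteration. Placeholders only, not the
// actual code, which currently advances both levels with the same dt.
static void advance_coarse(double /*dt*/)  { /* coarse-level IMM cycle */ }
static void advance_refined(double /*dt*/) { /* refined-level IMM cycle */ }
static void exchange_C2R_boundaries()      { /* field + particle BCs to the refined grid */ }
static void project_R2C_fields()           { /* refined fields back to the coarse grid */ }

void subcycled_step(double dt_refined, int RF) {
    const double dt_coarse = RF * dt_refined;
    advance_coarse(dt_coarse);            // one coarse iteration instead of RF
    exchange_C2R_boundaries();
    for (int k = 0; k < RF; ++k)          // RF refined iterations
        advance_refined(dt_refined);
    project_R2C_fields();
}
```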

The sequentiality previously introduced by the separation of the different levels on separate sets of nodes would then be removed, and the code could take full advantage of the sub-cycling benefits without suffering from any idle time. This implementation is more sophisticated and requires fundamental changes in the original code, but it should be able to scale almost as well as the single level implementation.

5. Conclusion

The implementation of a MLMD refinement technique in a 2D IMM PIC code and two applications are described. The results shown have been obtained with a two-level system where a refined grid overlaps a portion of the coarse grid. The refined grid can be used to get a more accurate description of a specific area of the initial coarse grid (see Section 3.1). Conversely, the coarse grid can be used as an extension of the refined grid if the system scales become too large to fit in the refined grid only (see Section 3.2). Both grids are fully simulated and exchange a minimum amount of information. It has been demonstrated that the refined grid is well driven by the coarse one and can provide extremely detailed information (refinement factor up to 14 in each dimension) at a very moderate cost. The original, non-refined, coarse level benefits from the presence of an additional refined level and displays a local noise reduction. The magnetic reconnection case showed the ability of the refined grid to resolve structures not captured by the coarse grid. In particular, the MLMD simulations displayed fine electron structures, such as jets or inversion layers of electric field, which previously required heavy, fully refined simulations. The computational savings are remarkable.

Nevertheless, some architectural problems, like over-sampling in time or communication bottlenecks, arise, especially when the refinement factor gets high. Sub-cycling and a slightly more sophisticated MPI domain decomposition are proposed to reduce them significantly.

6. Acknowledgements

The present work is supported by the Exascience Intel Lab Europe, by the Onderzoekfonds KU Leuven (Research Fund KU Leuven), by the European Commission's Seventh Framework Programme (FP7/2007-2013) under the grant agreement No. 2636340 (SWIFF project, www.swiff.eu) and by the Interuniversity Attraction Poles Programme initiated by the Belgian Science Policy Office (IAP P7/08 CHARM). We acknowledge that the results of this research have been achieved using the PRACE Research Infrastructure resource Curie based in France at TGCC.

References

[1] A. Chulaki, S. S. Bakshi, D. Berrios, M. Hesse, M. M. Kuznetsova, H. Lee, P. J. MacNeice, A. M. Mendoza, R. Mullinix, K. D. Patel, A. Pulkkinen, L. Rastaetter, J. Shim, A. Taktakishvili, Y. Zheng, Community Coordinated Modeling Center (CCMC): Providing Access to Space Weather Models and Research Support Tools, AGU Fall Meeting Abstracts (2011) A1999.
[2] S. L. Young, J. Quinn, J. C. Johnston, Space Weather Forecasting Laboratory, A Baseline Space Weather Forecast Capability, AGU Fall Meeting Abstracts (2010) B6.


[3] G. Tóth, I. Sokolov, T. Gombosi, D. Chesney, C. Clauer, D. De Zeeuw, K. Hansen, K. Kane, W. Manchester, R. Oehmke, et al., Space weather modeling framework: A new tool for the space science community, Journal of Geophysical Research 110 (2005) A12226.
[4] O. Steiner, M. Knölker, M. Schüssler, Dynamic interaction of convection with magnetic flux sheets: first results of a new MHD code, in: R. J. Rutten, C. J. Schrijver (Eds.), Solar Surface Magnetism (1993), pp. 441–470.
[5] H. Friedel, R. Grauer, C. Marliani, Adaptive Mesh Refinement for Singular Current Sheets in Incompressible Magnetohydrodynamic Flows, Journal of Computational Physics 134 (1997) 190–198.
[6] K. G. Powell, P. L. Roe, T. J. Linde, T. I. Gombosi, D. L. de Zeeuw, A Solution-Adaptive Upwind Scheme for Ideal Magnetohydrodynamics, Journal of Computational Physics 154 (1999) 284–309.
[7] U. Ziegler, A three-dimensional Cartesian adaptive mesh code for compressible magnetohydrodynamics, Computer Physics Communications 116 (1999) 65–77.
[8] R. Keppens, M. Nool, G. Tóth, J. Goedbloed, Adaptive mesh refinement for conservative systems: multi-dimensional efficiency evaluation, Computer Physics Communications 153 (2003) 317–339.
[9] K. Fujimoto, R. D. Sydora, Electromagnetic particle-in-cell simulations on magnetic reconnection with adaptive mesh refinement, Computer Physics Communications 178 (2008) 915–923.

[10] J.-L. Vay, P. Colella, A. Friedman, D. P. Grote, P. McCorquodale, D. B. Serafini, Implementations of mesh refinement schemes for Particle-In-Cell plasma simulations, Computer Physics Communications 164 (2004) 297–305.
[11] J. U. Brackbill, An Adaptive Grid with Directional Control, Journal of Computational Physics 108 (1993) 38–50.
[12] L. Chacón, G. L. Delzanno, J. M. Finn, Robust, multidimensional mesh motion based on Monge-Kantorovich equidistribution, Journal of Computational Physics 230 (2011) 87–103.
[13] G. Lapenta, Democritus: An adaptive particle in cell (PIC) code for object-plasma interactions, Journal of Computational Physics 230 (2011) 4679–4695.
[14] C. K. Birdsall, A. B. Langdon, Plasma physics via computer simulation, Taylor & Francis, 2004.
[15] R. Hockney, J. Eastwood, Computer simulation using particles, Taylor & Francis, 1988.
[16] G. Lapenta, Particle rezoning for multidimensional kinetic particle-in-cell simulations, Journal of Computational Physics 181 (2002) 317–337.
[17] K. Fujimoto, S. Machida, Electromagnetic full particle code with adaptive mesh refinement technique: Application to the current sheet evolution, Journal of Computational Physics 214 (2006) 550–566.


[18] P. Colella, P. Norgaard, Controlling self-force errors at refinement boundaries for AMR-PIC, Journal of Computational Physics 229 (2010) 947–957.
[19] M. Innocenti, G. Lapenta, S. Markidis, A. Beck, A. Vapirev, A Multi Level Multi Domain Method for Particle In Cell Plasma Simulations, Journal of Computational Physics (2013).
[20] J. Brackbill, D. Forslund, An implicit method for electromagnetic plasma simulation in two dimensions, Journal of Computational Physics 46 (1982) 271–308.
[21] G. Lapenta, J. Brackbill, P. Ricci, Kinetic approach to microscopic-macroscopic coupling in space and laboratory plasmas, Physics of Plasmas 13 (2006) 055904.
[22] H. Vu, J. Brackbill, Celest1d: an implicit, fully kinetic model for low-frequency, electromagnetic plasma simulation, Computer Physics Communications 69 (1992) 253–276.
[23] D. Biskamp, Magnetic reconnection in plasmas, volume 3, Cambridge University Press, 2005.
[24] W. Daughton, J. Scudder, H. Karimabadi, Fully kinetic simulations of undriven magnetic reconnection with open boundary conditions, Physics of Plasmas 13 (2006) 072101.
[25] G. Lapenta, Particle in cell methods - with application to simulations in space weather, https://perswww.kuleuven.be/~u0052182/pic/book.pdf, 2013.

[26] P. Ricci, J. Brackbill, W. Daughton, G. Lapenta, Collisionless magnetic reconnection in the presence of a guide field, Physics of Plasmas 11 (2004) 4102–4114.
[27] E. Harris, On a plasma sheath separating regions of oppositely directed magnetic field, Il Nuovo Cimento (1955-1965) 23 (1962) 115–121.
[28] J. Drake, M. Shay, The fundamentals of collisionless reconnection, Reconnection of Magnetic Fields: Magnetohydrodynamic and Collisionless Theory and Observations (2007) 87–99.
[29] J. F. Drake, M. A. Shay, M. Swisdak, The Hall fields and fast magnetic reconnection, Physics of Plasmas 15 (2008) 042306.
[30] J. Birn, E. Priest, Reconnection of magnetic fields: MHD and collisionless theory and observations, 2007.
[31] K. Fujimoto, Time evolution of the electron diffusion region and the reconnection rate in fully kinetic and large system, Physics of Plasmas 13 (2006) 072904.
[32] H. Karimabadi, W. Daughton, J. Scudder, Multi-scale structure of the electron diffusion region, Geophysical Research Letters 34 (2007) L13104.
[33] A. Le, J. Egedal, W. Fox, N. Katz, A. Vrublevskis, W. Daughton, J. Drake, Equations of state in collisionless magnetic reconnection, Physics of Plasmas 17 (2010) 055703.
[34] H. S. Fu, Y. V. Khotyaintsev, M. André, A. Vaivads, Fermi and betatron acceleration of suprathermal electrons behind dipolarization fronts, Geophysical Research Letters 38 (2011) L16104.
[35] L.-J. Chen, W. S. Daughton, B. Lefebvre, R. B. Torbert, The inversion layer of electric fields and electron phase-space-hole structure during two-dimensional collisionless magnetic reconnection, Physics of Plasmas 18 (2011) 012904.
[36] M. Murakami, M. M. Basko, Self-similar plasma expansion of a limited mass into vacuum, Journal de Physique IV 133 (2006) 329–334.
[37] A. Beck, F. Pantellini, Spherical expansion of a collisionless plasma into vacuum: self-similar solution and ab initio simulations, Plasma Physics and Controlled Fusion 51 (2009) 015004.
[38] S. Markidis, E. Camporeale, D. Burgess, Rizwan-Uddin, G. Lapenta, Parsek2D: An Implicit Parallel Particle-in-Cell Code, in: N. V. Pogorelov, E. Audit, P. Colella, G. P. Zank (Eds.), Numerical Modeling of Space Plasma Flows: ASTRONUM-2008, volume 406 of Astronomical Society of the Pacific Conference Series, p. 237.
[39] P. L. Pritchett, Particle-in-cell simulations of magnetosphere electrodynamics, IEEE Transactions on Plasma Science 28 (2000) 1976–1990.
