SciDAC 2009 Journal of Physics: Conference Series 180 (2009) 012056

IOP Publishing doi:10.1088/1742-6596/180/1/012056

Concurrent, Parallel, Multiphysics Coupling in the FACETS Project

J R Cary^1, J Candy^2, J Cobb^3, R H Cohen^4, T Epperly^4, D J Estep^5, S Krasheninnikov^6, A D Malony^7, D C McCune^8, L McInnes^9, A Pankin^10, S Balay^9, J A Carlsson^1, M R Fahey^3, R J Groebner^2, A H Hakim^1, S E Kruger^1, M Miah^1, A Pletzer^1, S Shasharina^1, S Vadlamani^1, D Wade-Stein^1, T D Rognlien^4, A Morris^7, S Shende^7, G W Hammett^8, K Indireshkumar^7, A Yu Pigarov^6, H Zhang^9

^1 Tech-X Corporation, 5621 Arapahoe Avenue, Suite A, Boulder, CO 80303
^2 General Atomics
^3 Oak Ridge National Laboratory
^4 Lawrence Livermore National Laboratory
^5 Colorado State University
^6 University of California at San Diego
^7 ParaTools, Inc.
^8 Princeton Plasma Physics Laboratory
^9 Argonne National Laboratory
^10 Lehigh University

E-mail: [email protected]

Abstract. FACETS (Framework Application for Core-Edge Transport Simulations) is now in its third year. The FACETS team has developed a framework for concurrent coupling of parallel computational physics components for use on Leadership Class Facilities (LCFs). In the course of the last year, FACETS has tackled many of the difficult problems of moving to parallel, integrated modeling by developing algorithms for coupled systems, extracting legacy applications as components, modifying them to run on LCFs, and improving the performance of all components. The development of FACETS abides by rigorous engineering standards, including cross-platform build and test systems, with the latter covering regression, performance, and visualization. In addition, FACETS has demonstrated the ability to incorporate full turbulence computations for the highest fidelity transport computations. Early indications are that the framework, using such computations, scales to multiple tens of thousands of processors. These accomplishments are the result of an interdisciplinary collaboration among computational physicists, computer scientists, and applied mathematicians on the team.

1. Introduction
The FACETS (Framework Application for Core-Edge Transport Simulations) project [1] has the goal of providing whole-tokamak modeling through coupling separate components for each of the core region, edge region, and wall, with fully realistic sources. This is a complex problem, as each component is parallel in its own right, and each can be parallelized in a distinct manner. Direct simulation of the entire system is not possible due to the range of scales. The spatial scales vary from the electron gyroradius (≈ 0.01 mm in the edge to ≈ 0.1 mm in the core) to
the system size (of order several meters), i.e., by a factor of 3 × 10^5. The time scales vary from the electron gyroperiod (20 ps) to the discharge duration (≈ 1000 s), i.e., by a factor of 6 × 10^13. Thus, a full simulation would require the integration of 3 × 10^16 (spatial resolution lengths)^3 for 6 × 10^13 temporal resolution periods, for a product of 2 × 10^30. With the need for 10^(6-12) degrees of freedom per spatial resolution volume (100 per length for a modest fluid model, easily larger by a factor of 100 to resolve velocity space as well), and 10^2 floating point operations per update of a degree of freedom for one temporal resolution period, such a fundamental simulation would require 2 × 10^(38-44) floating point operations, which even on petascale platforms would require 2 × 10^(23-29) s, exceeding the age of the universe by a factor of 10^(6-12).
Given the large disparity between what is possible and what is needed, progress can only be achieved by separating the physics into different parts, such that for each part valid, reduced approximations exist. For example, in the core of the plasma, the rapid transport along field lines assures that plasma parameters such as density and temperature are, over long time scales, constant on toroidally nested flux surfaces. This reduces the transport equation to one dimension for evolution on the discharge time scale. As another example, in the plasma edge, though simulations must be global, they nevertheless cover a narrow region, and one can use averaging methods to reduce the time-scale disparity.
The above naturally translates to a software component approach, which is the approach FACETS is taking. It is bringing together successively more accurate and, hence, computationally demanding components to model the complete plasma device. It is being constructed to run on Leadership Class Facilities (LCFs) to be able to use the most computationally demanding components, while at the same time remaining usable on laptops for less demanding models. To do this, FACETS is constructing a C++ framework for incorporating the best software packages and physics components. In this paper we discuss FACETS progress over the last year.
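Written out, the estimate above is simply the product of the factors just quoted:

  (3 × 10^5)^3 spatial cells × (6 × 10^13) time steps ≈ 2 × 10^30,
  2 × 10^30 × 10^(6-12) degrees of freedom per cell × 10^2 operations per update ≈ 2 × 10^(38-44) floating point operations,
  2 × 10^(38-44) operations / 10^15 operations per second ≈ 2 × 10^(23-29) s on a petascale machine.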
2. Converting legacy applications to components suitable for Leadership Class Facilities
Transforming legacy fusion applications and libraries into FACETS components suitable for Leadership Class Facilities (LCFs) requires glue code to connect the legacy application to the FACETS interface and a cross-compile build environment to produce a statically linked executable. We chose to target a static executable to make FACETS portable to the widest possible collection of current and future LCFs. Glue code translates FACETS interface calls to legacy application calls and performs tasks like language interoperability, unit conversions, and calculating aggregate quantities from mesh data. In the case of UEDGE, our fluid edge component, we had to replace Python-based glue code with a new approach, because Python typically relies on dynamically loadable libraries rather than a static executable.
Our approach to replacing UEDGE's Python-based glue code involved extending Forthon [2] and rewriting functions implemented in Python. UEDGE uses Forthon, a tool for generating Python wrappers for Fortran parameters, subroutines, and functions, to create a Python interface to hundreds of variables and tens of subroutines and functions. We extended Forthon to generate Babel SIDL [3] files and implementation files that provide a C++ interface to UEDGE's Fortran code. This approach leveraged all the preexisting Forthon interface description files (.v files), although we had to insert additional information into the .v files in the form of comments. Thus, one set of .v files supports both traditional UEDGE and LCF UEDGE. We wrote C++ glue code to replace code previously implemented in Python. The largest part of this work involved writing the routines to input and output UEDGE data structures to/from HDF5 files; the general flavor of such glue code is sketched below.
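As an illustration only (all routine names, the unit convention, and the trailing-underscore name mangling here are hypothetical assumptions, and a C++ stub stands in for the Fortran side so the example is self-contained), glue of this kind forwards a framework call to a legacy routine while converting units:

// Hypothetical glue layer: forwards a framework call to a legacy Fortran
// routine.  In a real build the Fortran symbol would come from the legacy
// library; here a stub is provided so the sketch compiles and runs on its own.
#include <cstdio>
#include <vector>

extern "C" {
  // Stand-in for a legacy Fortran subroutine (arguments passed by reference).
  void uedge_set_core_density_(const double* ncore, const int* nx) {
    std::printf("legacy side received %d values, first = %g cm^-3\n",
                *nx, ncore[0]);
  }
}

namespace facets_glue {
// Framework-facing call: densities arrive in SI units (m^-3); this sketch
// assumes the legacy code wants cm^-3, so the glue converts before calling.
void setCoreDensity(const double* density_m3, int n) {
  std::vector<double> density_cm3(density_m3, density_m3 + n);
  for (double& d : density_cm3) d *= 1.0e-6;   // m^-3 -> cm^-3
  uedge_set_core_density_(density_cm3.data(), &n);
}
}  // namespace facets_glue

int main() {
  const double ne[3] = {1.0e19, 2.0e19, 3.0e19};   // densities in m^-3
  facets_glue::setCoreDensity(ne, 3);
  return 0;
}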

For a few of the legacy applications, including UEDGE, we provided new autoconf-based configuration and build systems. These systems were designed to support multiple types of builds, for example serial, parallel, and LCF. They required some adjustments to perform correctly on LCF machines, where the front-end nodes differ from the compute nodes. Although mechanical in nature, making these kinds of changes is time consuming.
3. Improving performance of components through algorithmic modifications
Recent work has focused on incorporating robust and scalable parallel nonlinear solvers from the PETSc library into UEDGE to solve the nonlinear system f(u) = 0, where u represents the vector of unknowns. We implemented complete functionality for fully implicit, parallel, matrix-free Newton-Krylov solvers. The use of PETSc has allowed us to overcome a major bottleneck in the parallel implementation of UEDGE. This multispecies code evolves the density and temperature profiles of hydrogen and impurity plasma ions and of the corresponding neutrals. Strong nonlinearities exist owing to ionization and recombination, and the equation set must be well preconditioned for the Newton-Krylov solver technique to be efficient. Previously, we could not simultaneously advance plasma and neutrals in parallel because of a very limited block Jacobi algorithm for the parallel preconditioner. The implementation of PETSc has opened up an array of possible preconditioners, ranging from Jacobi and additive Schwarz to full LU, which now allows us to advance both ions and neutrals together. Furthermore, the PETSc "coloring" scheme for efficiently computing the full finite-difference preconditioning Jacobian in parallel gives further substantial savings. A toy illustration of this solver setup is sketched below.
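The following toy problem (not the UEDGE residual; exact routine names can vary between PETSc versions) shows the shape of a matrix-free Newton-Krylov setup with PETSc's SNES. In UEDGE the residual is the discretized plasma/neutral equation set, and a preconditioning matrix is assembled in parallel by finite differences with coloring rather than using the unpreconditioned Krylov solve shown here:

// Toy matrix-free Newton-Krylov solve with PETSc SNES.
// Residual: f(u)_i = u_i^3 - 1, whose solution is u_i = 1.
#include <petscsnes.h>

static PetscErrorCode FormFunction(SNES, Vec u, Vec f, void*) {
  const PetscScalar* uu;
  PetscScalar*       ff;
  PetscInt           n;
  VecGetLocalSize(u, &n);
  VecGetArrayRead(u, &uu);
  VecGetArray(f, &ff);
  for (PetscInt i = 0; i < n; ++i) ff[i] = uu[i]*uu[i]*uu[i] - 1.0;
  VecRestoreArrayRead(u, &uu);
  VecRestoreArray(f, &ff);
  return 0;
}

int main(int argc, char** argv) {
  PetscInitialize(&argc, &argv, NULL, NULL);

  Vec u, r;
  VecCreate(PETSC_COMM_WORLD, &u);
  VecSetSizes(u, PETSC_DECIDE, 100);   // placeholder problem size
  VecSetFromOptions(u);
  VecDuplicate(u, &r);
  VecSet(u, 0.5);                      // initial guess

  SNES snes;
  SNESCreate(PETSC_COMM_WORLD, &snes);
  SNESSetFunction(snes, r, FormFunction, NULL);

  // Matrix-free Jacobian: J*v is approximated by finite differences of
  // FormFunction, so no Jacobian matrix is ever assembled.
  Mat J;
  MatCreateSNESMF(snes, &J);
  SNESSetJacobian(snes, J, J, MatMFFDComputeJacobian, NULL);

  // No preconditioner for this toy; UEDGE instead builds a colored
  // finite-difference preconditioning matrix and can select Jacobi,
  // additive Schwarz, or LU preconditioners at run time.
  KSP ksp; PC pc;
  SNESGetKSP(snes, &ksp);
  KSPGetPC(ksp, &pc);
  PCSetType(pc, PCNONE);

  SNESSetFromOptions(snes);            // allow -snes_*, -ksp_*, -pc_* overrides
  SNESSolve(snes, NULL, u);

  MatDestroy(&J); VecDestroy(&u); VecDestroy(&r); SNESDestroy(&snes);
  PetscFinalize();
  return 0;
}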
4. Embedding parallel turbulence computations in FACETS
Turbulence contributes a significant fraction of the total transport in the core plasma. Our recent efforts have focused on extending the FACETS core solver to incorporate fluxes from the five-dimensional gyrokinetic continuum turbulence code GYRO. Embedding turbulence calculations creates special software and algorithmic challenges. In particular, as these calculations are very expensive, special domain decomposition strategies need to be developed. Further, efficient algorithms need to be developed to minimize the time it takes to achieve statistical steady state of the computed fluxes and to ensure that the Jacobian needed to advance the transport equations is computed with a minimum number of function calls.
Figure 1. Weak scaling on the Intrepid LCF of the embedded turbulent flux calculation using GYRO. There is a 10% drop in efficiency at high processor counts due to a decomposition/network topology mismatch.
For the domain decomposition we have developed a set of C++ classes to split a given set of processors into different communicators, each running an instance of GYRO in parallel. For example, given 5120 processors and ten flux surfaces, we create ten worker communicators, each running GYRO on 512 processors. For data transfer with the core solver, an additional communicator consisting of the rank 0 processes of all the worker communicators is created. A distributed array is created on this messaging communicator and is used to transfer gradients and values to the GYRO instances and fluxes back to the core solver. A minimal sketch of this communicator layout is given below.
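The following self-contained sketch (hypothetical names and sizes, not the actual FACETS classes) shows the communicator splitting with plain MPI calls:

// Sketch: split MPI_COMM_WORLD into one worker communicator per flux surface
// (each of which would run a GYRO instance), plus a "messaging" communicator
// containing the rank-0 process of every worker group.
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);

  int worldRank = 0, worldSize = 0;
  MPI_Comm_rank(MPI_COMM_WORLD, &worldRank);
  MPI_Comm_size(MPI_COMM_WORLD, &worldSize);

  // For simplicity, assume the total rank count is a multiple of numSurfaces.
  const int numSurfaces = 4;
  const int groupSize   = worldSize / numSurfaces;
  const int mySurface   = worldRank / groupSize;     // which worker group

  // One worker communicator per flux surface.
  MPI_Comm workerComm;
  MPI_Comm_split(MPI_COMM_WORLD, mySurface, worldRank, &workerComm);

  int workerRank = 0;
  MPI_Comm_rank(workerComm, &workerRank);

  // Messaging communicator: rank 0 of each worker group joins; everyone else
  // passes MPI_UNDEFINED and gets MPI_COMM_NULL back.
  MPI_Comm msgComm;
  const int color = (workerRank == 0) ? 0 : MPI_UNDEFINED;
  MPI_Comm_split(MPI_COMM_WORLD, color, worldRank, &msgComm);

  if (msgComm != MPI_COMM_NULL) {
    // Here the core solver would scatter gradients to, and gather fluxes
    // from, the GYRO instances via a distributed array on msgComm.
    std::printf("surface %d: group leader is world rank %d\n",
                mySurface, worldRank);
    MPI_Comm_free(&msgComm);
  }

  MPI_Comm_free(&workerComm);
  MPI_Finalize();
  return 0;
}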

This infrastructure was tested on the Intrepid LCF for parallel scaling efficiency by running GYRO on 512 processors per flux surface while increasing the number of flux surfaces from 4 to 64. Timing studies show that our infrastructure adds a 3 second overhead per 2 hours of embedded GYRO calculations. Further, the infrastructure scales almost linearly, showing a 10% loss in efficiency in going from 32 flux surfaces (16,384 PEs) to 64 flux surfaces (32,768 PEs). This loss is attributed to failing to take into account the network topology of the supercomputer. We are exploring ways to improve this by instrumenting FACETS with the TAU profiling tool.
Our current effort is focused on incorporating the fluxes from the GYRO calculations into the core solver so that we can advance the transport equations in time using turbulent fluxes. For this we are developing algorithms to compute running statistical averages of the fluxes and the ability to stop a GYRO instance once its fluxes have reached statistical steady state. Strategies to restart GYRO from perturbed previous solutions as initial conditions are also being developed. These two algorithmic advances will allow us to couple the turbulent fluxes into the FACETS core solver and create a dynamic embedded-turbulence transport solver.
5. Coupling realistic particle and energy sources
The parallelized PPPL Monte Carlo package NUBEAM is used for computing the core sources from neutral beam injection. Progress in incorporating NUBEAM into FACETS has been made on two fronts: (i) design and development of a Plasma State physics component interface in Fortran and a mechanism to access this interface from a C/C++ driver, and (ii) development of a parallelized driver for the NUBEAM component. The strategy for coupling NUBEAM to a C/C++ driver leverages the Plasma State component interface to NUBEAM that was already under development for the SWIM SciDAC project. The Plasma State (PS) is a Fortran 90 object incorporating a time-slice representation of a tokamak equilibrium; plasma profiles such as temperatures, densities, and flow velocities; and core plasma particle, momentum, and energy sources. It also incorporates a machine description section containing time-invariant quantities such as neutral beam and RF antenna geometries. The PS implementation code, along with the C/C++ wrapper, is generated from a state specification file using a Python script.
A PPPL/Tech-X collaboration [8] developed a reliable, portable method for controlling the instantiation of Fortran 90 Plasma State objects from C++ code, with set/get access to all state elements. This "opaque handle" method identifies state instances using a short array of integers; the C/C++ routines thus access the Fortran 90 routines through an interface involving only Fortran 77 primitive data types. The majority of the inputs to NUBEAM (including all physical quantities that would be required for any neutral beam model) are handled through the Plasma State using this method. The remaining inputs (NUBEAM-specific numerical controls, such as the sizes of the Monte Carlo particle lists to use in a given simulation) are handled through two C/C++-compatible sequenced "shallow" Fortran 90 data types, one for initialization and a smaller one containing parameters that are adjustable in time. These control structures, containing no allocatable elements, are directly mapped to C structs; the flavor of the opaque-handle pattern is sketched at the end of this section.
A new parallelized driver for NUBEAM has been developed. This has allowed testing and scaling studies, and it has served as a template for the C++ driver in FACETS. NUBEAM is the first component in FACETS to use volumetric coupling with the core transport equations.
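The following sketch illustrates the opaque-handle idea with entirely hypothetical routine names; the stand-in "state" library is implemented in C++ here so the example compiles on its own, whereas in FACETS the corresponding routines are generated bindings to the Fortran 90 Plasma State:

// Opaque-handle sketch: state objects live on the library side; C++ only ever
// sees a small integer handle, and all traffic across the language boundary
// uses primitive types.  The ps_* routines are hypothetical stand-ins.
#include <cstdio>
#include <map>

namespace state_side {                       // stand-in for the Fortran library
  std::map<int, std::map<int, double>> registry;  // handle -> (field id -> value)
  int nextHandle = 1;
}

extern "C" {
  void ps_create_(int* handle) {
    *handle = state_side::nextHandle++;
    state_side::registry[*handle] = {};
  }
  void ps_set_scalar_(const int* handle, const int* fieldId, const double* val) {
    state_side::registry[*handle][*fieldId] = *val;
  }
  void ps_get_scalar_(const int* handle, const int* fieldId, double* val) {
    *val = state_side::registry[*handle][*fieldId];
  }
  void ps_destroy_(const int* handle) { state_side::registry.erase(*handle); }
}

int main() {
  const int kTeAxis = 1;        // hypothetical integer field identifier

  int ps = 0;
  ps_create_(&ps);              // the C++ driver holds only the handle

  const double te0 = 2.5e3;     // e.g. a central electron temperature in eV
  ps_set_scalar_(&ps, &kTeAxis, &te0);

  double back = 0.0;
  ps_get_scalar_(&ps, &kTeAxis, &back);
  std::printf("state handle %d: field %d = %g\n", ps, kTeAxis, back);

  ps_destroy_(&ps);
  return 0;
}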
6. Plasma-wall interaction module
The edge plasma and the material wall are strongly coupled, primarily via particle recycling, impurity production, and plasma power exhaust. The handling of transient and static peak power loads, core plasma contamination with impurities, the lifetime of plasma-facing components, and hydrogen retention in walls are critical issues affecting the design of next-step fusion devices (such as ITER, CTF, and DEMO). To address these issues and to model the plasma-wall coupling in FACETS self-consistently, the Wall and Plasma-Surface Interactions (WALLPSI) module was developed [4]. Work on WALLPSI verification and validation against the large body of experimental data is in progress. Several laboratory experiments have shown clear saturation of retained deuterium in beryllium and in graphite (e.g. pyrolytic graphite). In [4], the results of simulations of static deuterium retention in graphite and beryllium at room temperature with WALLPSI
were presented, showing good agreement with experimental data (see Fig. 2). Here, deuterium is retained via collisional production of broken-bond traps followed by population of these traps by deuterium atoms. The modeled saturated dose for deuterium in graphite is consistent with [D]/[C] = 0.4, the measured concentration of statically retained deuterium averaged over the projectile range. WALLPSI verification includes solving simple diffusive 1-D transport problems for hydrogen in the wall material. The coupled plasma-wall modeling scheme with WALLPSI was tested by: (i) calculating the inventory build-up of mobile, chemically bonded, adsorbed, and trapped hydrogen in the wall, as well as the nonlinear variation of the hydrogen recycling coefficient and the impurity production rate in response to the incident plasma particle and energy fluxes, and (ii) simulating the spatiotemporal evolution of plasma parameters and of the hydrogen inventory in the wall with WALLPSI coupled to the plasma transport code EDGE1D for a range of plasma conditions [4].
Figure 2. Example of WALLPSI validation against experimental retention data.
7. Development of multicomponent visualization capabilities
Visualization is extremely valuable in providing better understanding of the scientific data generated by fusion simulations. Various models need to be compared with each other and validated against experiments. For these reasons, FACETS has developed a set of standards and tools facilitating the visualization of FACETS data. All FACETS components use the HDF5 data format for output and comply with a particular standard for organizing the HDF5 files so that the data to be visualized is easily found and interpreted. We call this standard VizSchema [5], [6]. VizSchema is a self-contained vocabulary for describing gridded data and so allows third-party applications to visualize and post-process FACETS data. For example, the following pseudo-code snippet shows the metadata indicating that the dataset (the electron temperature in the scrape-off layer) is to be visualized, needs a mesh called solmesh, should be treated as zone-centered, and is part of a composite variable called tes:

Dataset "tesSol" {
  Att vsType = "variable"
  Att vsMesh = "solmesh"
  Att vsCentering = "zonal"
  Att vsMD = "tes"
}

A mesh's metadata describes its kind and provides the information appropriate to that kind. For example, this snippet describes a structured mesh which is part of a bigger mesh called sol:

Group "solmesh" {
  Att vsType = "mesh"
  Att vsKind = "structured"
  Att vsMD = "sol"
}

The vsMD attributes shown above indicate that all the tes variables will be combined into one variable which will live on a mesh assembled from the sol meshes. Based on the VizSchema standard, we developed a plugin for the VisIt [7] visualization tool. This plugin allows visualization of all FACETS data. An example of multicomponent visualization (the core and three edge regions) is shown in Fig. 3. A sketch of how a component tags its output with these attributes is given below.
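For illustration only (toy data; the real FACETS components write this metadata through their own I/O layers), a dataset can be tagged with VizSchema-style attributes using the HDF5 C API as follows:

// Write a toy 1-D dataset and tag it with VizSchema-style attributes.
#include <hdf5.h>
#include <cstring>
#include <vector>

// Attach a fixed-length string attribute to an HDF5 object.
static void writeStringAttr(hid_t obj, const char* name, const char* value) {
  hid_t space = H5Screate(H5S_SCALAR);
  hid_t type  = H5Tcopy(H5T_C_S1);
  H5Tset_size(type, std::strlen(value) + 1);
  hid_t attr  = H5Acreate2(obj, name, type, space, H5P_DEFAULT, H5P_DEFAULT);
  H5Awrite(attr, type, value);
  H5Aclose(attr); H5Tclose(type); H5Sclose(space);
}

int main() {
  std::vector<double> te(16, 10.0);           // placeholder temperature data

  hid_t file = H5Fcreate("tes_example.h5", H5F_ACC_TRUNC,
                         H5P_DEFAULT, H5P_DEFAULT);
  hsize_t dims[1];
  dims[0] = te.size();
  hid_t space = H5Screate_simple(1, dims, NULL);
  hid_t dset  = H5Dcreate2(file, "tesSol", H5T_NATIVE_DOUBLE, space,
                           H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
  H5Dwrite(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, te.data());

  // VizSchema metadata, mirroring the pseudo-code snippet above.
  writeStringAttr(dset, "vsType", "variable");
  writeStringAttr(dset, "vsMesh", "solmesh");
  writeStringAttr(dset, "vsCentering", "zonal");
  writeStringAttr(dset, "vsMD", "tes");

  H5Dclose(dset); H5Sclose(space); H5Fclose(file);
  return 0;
}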

8. First physics studies
Turbulence in the core region has been the focus of intense research for over 10 years, resulting in a reasonable understanding of the processes involved. As discussed above, FACETS leverages these efforts by integrating embedded gyrokinetic codes for calculating core transport fluxes. A key part of the goal for FACETS in the coming year is to extend the solver techniques to enable full time-dependent simulations using embedded gyrokinetics coupled to Monte Carlo calculation of the neutral beam sources.
The embedded turbulence technique relies on the turbulence correlation lengths being much smaller than the system scale lengths. This is not true for the plasma near the edge of the tokamak, and as a result the understanding of transport in the edge region is more primitive (although progress is being made in separately funded projects). Despite the limitations of the fluid edge model used in the initial phase of FACETS, the model provides accurate treatment of the parallel transport and of the sources from the wall. For the first core-edge physics study, we exploit these strengths to study the pedestal build-up of tokamak plasmas.
Figure 3. Three-dimensional visualization of an integrated core and edge simulation.
The highest-performing tokamak plasmas are characterized by steep edge plasma profiles known as the pedestal. Eventually, these profiles become so steep that they drive instabilities which destroy them. The details of how the pedestal forms have many unknowns, because the pedestal is known to depend not only on the edge turbulence, but also on the amount of power leaving the core and the way in which the plasma interacts with the wall [Maingi09].
We have begun simulations of pedestal buildup for DIII-D experimental discharges utilizing an interpretive mode for UEDGE, in which plasma profiles are taken from experimental measurements and, given sources of particles and energy, the transport coefficients are computed. On the closed magnetic field lines, the interpretive procedure assumes plasma profiles are approximately constant on flux surfaces (verifiable by direct 2D UEDGE simulation for spatially varying transport coefficients). The resulting 1D flux-surface-averaged transport equation then treats the plasma fluxes as the unknowns, given the experimental profiles, the UEDGE-computed neutral particle source, and the input edge power and particle flux from neutral beam fueling. From the computed radial plasma fluxes (density and ion/electron energy), transport diffusivities can be determined, since the gradients are known; a schematic form of this relation is given at the end of this section. What is unknown experimentally is the magnitude and detailed 2D distribution of the neutral gas source from gas puffing and wall/divertor recycling of ions into neutrals. The physics question to be answered by this procedure is how the pedestal region is fueled from such neutral sources in a setting where the transport coefficients are constrained, and whether this source is consistent with core density build-up using theory-based transport there.
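Schematically (using generic transport notation, not notation taken from UEDGE), the interpretive step solves the flux-surface-averaged particle balance for the radial flux and then extracts an effective diffusivity from the known gradient:

  ∂n/∂t + (1/V′) ∂(V′ Γ)/∂ρ = S_n   ⇒   Γ(ρ) determined from the measured n(ρ) and the computed source S_n,
  D_eff(ρ) = − Γ(ρ) / (∂n/∂ρ),

where ρ labels flux surfaces and V′ = dV/dρ is the flux-surface volume factor; analogous relations give effective ion and electron heat conductivities from the energy fluxes.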
9. Workflow for core-edge analysis
Computational workflow (in this context) refers to the process by which one starts with an initial set of data, runs the simulations to produce a final set of data, analyzes that data, and produces scientific results. Even for standalone, single-purpose codes, computational workflow issues can be cumbersome and difficult for new users, who have to learn how to modify the input parameters, how to run the code on remote systems, and how to analyze the data. As FACETS encapsulates more codes into its framework, these workflow issues increase in difficulty rapidly, and constant re-evaluation of workflow issues is necessary to ensure that users are able to use FACETS.

As described in the previous section, we are utilizing experimental data to help constrain the simulations. By constraining what we know, we are able to use simulations to help understand the role of other, difficult-to-measure quantities. The type of analysis in this simulation is unique and has required the development of new tools. The general procedure uses a combination of Python and bash scripts to set up the simulation, and Python scripts to analyze the output. The Python setup scripts are able to handle the disparate data sources from the experimental analysis (multiple sources, because most of the data is generally written to be easily used by core-only codes). The output scripts must collect the output from each component and then present a unified visualization for analysis. This is done using either matplotlib for routine visualization or, as seen in Fig. 3, VisIt for higher-end visualization and analysis. Initial studies are underway, and the workflow is continually improved as a result of feedback from users and as the simulations progress.
10. Summary
The FACETS project has made steady progress towards providing whole-device modeling for the fusion community. Physics components have been refurbished, a framework for parallel components is undergoing continuing development, data analysis and visualization tools have been adapted, and the project is now embarking on its first physics studies. The framework has made tens of packages available on platforms from laptops to LCFs.
Acknowledgments
Work supported by the U.S. Department of Energy (DOE) Office of Science under grants and contracts including DE-FC02-07ER54907 at Tech-X, DE-AC02-06CH11357 at ANL, DE-FC02-07ER54909 at CSU, DE-AC52-07NA27344 at LLNL, DE-AC05-00OR22725 at ORNL, DE-FC02-07ER54910 at ParaTools, DE-AC02-76CH03073 at PPPL, and DE-FC02-07ER54908 and DE-FG02-04ER54739 at UCSD. This work used the resources of the National Energy Research Scientific Computing Center, which is supported by the DOE Office of Science under Contract No. DE-AC02-05CH11231; of the National Center for Computational Sciences at ORNL, which is supported by the DOE Office of Science under Contract No. DE-AC05-00OR22725; and of the Argonne Leadership Computing Facility at Argonne National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract DE-AC02-06CH11357.
References
[1] J R Cary, J Candy, R H Cohen, S Krasheninnikov, D C McCune, D J Estep, J Larson, A D Malony, A Pankin et al 2008 First results from core-edge parallel composition in the FACETS project J. Physics: Conf. Series 125 012040
[2] D P Grote 2009 Forthon, Lawrence Berkeley National Laboratory, http://hifweb.lbl.gov/Forthon (last viewed June 4, 2009)
[3] T Dahlgren, T Epperly, G Kumfert and J Leek 2004 Babel User's Guide CASC, Lawrence Livermore National Laboratory, Livermore, CA
[4] A Yu Pigarov and S I Krasheninnikov 2009 Coupled plasma-wall modeling J. Nucl. Mater. 390-391 192
[5] VizSchema, https://ice.txcorp.com/trac/vizschema/wiki/WikiStart
[6] S Shasharina, J R Cary, S Veitzer, P Hamill, S Kruger, M Durant and D Alexander, VizSchema: visualization interface for scientific data, to be published in Proc. Computer Graphics, Visualization, Computer Vision and Image Processing 2009 (Algarve, Portugal, June 20-22, 2009)
[7] H Childs, E S Brugger, K S Bonnell, J S Meredith, M Miller, B J Whitlock and N Max 2005 A contract-based system for large data visualization Proc. IEEE Visualization 2005 (Minneapolis, MN, October 23-25, 2005) pp 190-198
[8] A Pletzer, D McCune, S Muszala, S Vadlamani and S Kruger 2008 Exposing Fortran derived types to C and other languages Computing in Science and Engineering 10 86
