Parallel Simulation of Multicomponent Systems

Michael T. Heath and Xiangmin Jiao

Computational Science and Engineering, University of Illinois, Urbana, IL 61801, USA
{heath,jiao}@cse.uiuc.edu

Abstract. Simulation of multicomponent systems poses many critical challenges in science and engineering. We overview some software and algorithmic issues in developing high-performance simulation tools for such systems, based on our experience in developing a large-scale, fully-coupled code for detailed simulation of solid propellant rockets. We briefly sketch some of our solutions to these issues, with a focus on parallel and performance aspects. We present some recent progress and results of our coupled simulation code, and outline some remaining roadblocks to advancing the state of the art in high-fidelity, high-performance multicomponent simulations.

1 Introduction

Many real-world systems involve complex interactions between multiple physical components. Examples include natural systems, such as climate models, as well as engineered systems, such as automobile, aircraft, or rocket engines. Simulation of such systems helps improve our understanding of their function or design, and potentially leads to substantial savings in time, money, and energy. However, simulation of multicomponent systems poses significant challenges in the physical disciplines involved, as well as in computational mathematics and software systems. In addition, such simulations require enormous computational power and data storage that usually are available only through massively parallel computer systems or clusters. Therefore, the algorithms and software systems must function efficiently in parallel in order to have significant impact on real-world applications of complex multicomponent systems.

At the Center for Simulation of Advanced Rockets (CSAR) at the University of Illinois (http://www.csar.uiuc.edu), we have been developing a software system for detailed simulation of solid rocket motors. A rocket motor is a complex system of interactions among various parts (propellant, case, insulation, nozzle, igniter, core flow, etc.) involving a variety of mechanical, thermal, and chemical processes, materials, and phases. Compared to the simulation codes currently employed in the rocket industry, which typically use one- or two-dimensional empirical models, our software system is fully-coupled, three-dimensional, and based on first principles, involving a wide range of temporal and spatial scales. Many issues must be addressed in developing such a high-performance system, including:

- easy-to-use software environment for exploiting parallelism within as well as between components
- reusable parallel software components and high-performance physics modules
- system tools for assessing and tuning performance of individual modules as well as the integrated system
- management of distributed data objects for inter-module interactions
- parallel computational methods, such as data transfer between disparate, distributed meshes

In this paper we overview some of our recent progress in addressing these issues and outline some remaining challenges in large-scale multicomponent simulations. The remainder of the paper is organized as follows. Section 2 overviews the parallel, multiphysics software system developed at CSAR. Section 3 describes the key characteristics of the physics modules and their parallel implementations. Section 4 presents our novel integration framework for multicomponent systems and some middleware services built on top of it. Section 5 surveys a few key computational problems arising from the integration of such systems and our solutions to them. Section 6 presents sample simulation results obtained with our code, followed by performance results in Section 7. Section 8 concludes the paper with a discussion of some remaining challenges.

2 System Overview

Rocstar is our high-performance, integrated software system for detailed, whole-system simulation of solid rocket motors, currently under development at CSAR. We briefly overview the methodology and the software components of this system.

2.1 Coupling Methodology

Simulation of a rocket motor involves many disciplines, including three broad physical disciplines (fluid dynamics, solid mechanics, and combustion) that interact with each other at the primary system level, with additional subsystem-level interactions, such as particles and turbulence within fluids. Because of its complex and cross-disciplinary nature, the development of Rocstar has been intrinsically demanding, requiring diverse backgrounds within the research team. In addition, the capabilities required from the individual physical disciplines are at the frontier of their respective research agendas, which entails rapid and independent evolution of their software implementations. To accommodate the diverse and dynamically changing needs of individual physics disciplines, we have adopted a partitioned approach to enable coupling of individual software components that solve problems in their own physical and geometrical domains. With this approach, the physical components of the system are naturally mapped onto various software components (or modules), which can then be developed and parallelized independently. These modules are then integrated into a coherent system through an integration framework, which, among other responsibilities, manages the distributed data objects and performs inter-module communications on parallel machines.

Fig. 1. Overview of Rocstar software components.

2.2 System Components

To enable parallel simulations of rockets, we have developed a large number of software modules. Fig. 1 shows an overview of the components of the current generation of Rocstar. These modules serve very diverse purposes and have diverse needs in their development and integration. We group them into the following four categories.

Physics modules solve physical problems in their respective geometric domains. In general, they are similar to stand-alone applications, are typically written in Fortran 90, and use array-based data structures encapsulated in derived types.

Service modules provide specific service utilities, such as I/O, communication, and data transfer. They are typically developed by computer scientists but driven by the needs of applications, and are usually written in C++.

Integration interface provides data management and function invocation mechanisms for inter-module interactions.

Control (orchestration) modules specify overall coupling schemes. They contain high-level, domain-specific constructs built on top of service modules, provide callback routines for physics modules to obtain boundary conditions, and mediate the initialization, execution, finalization, and I/O for physics and service modules through the integration interface.

In Rocstar, the above categories correspond to the components at the lower-left, right, center, and top, respectively, of Fig. 1. In the following sections, we describe various parallel aspects associated with these modules. In addition, our system uses some off-line tools, such as those in the upper-left corner of Fig. 1, which provide specific pre- or post-processing utilities for physics modules.

3 High-Performance Physics Modules

Fig. 2 shows the modeling capabilities of the current generation of Rocstar. These capabilities are embedded in physics modules in three broad areas:

- fluid dynamics, modeling core flow and associated multiphysics phenomena
- solid mechanics, modeling propellant, insulation, case, and nozzle
- combustion, modeling ignition and burning of propellant

We summarize the key features and parallel implementations of these modules, focusing on their interactions and the characteristics that are most critical to the coupling of Rocstar.

Fig. 2. Current modeling capabilities of Rocstar.

3.1 Fluid Dynamics Solvers

Rocflo [3] and Rocflu [8] are advanced CFD solvers developed at CSAR. Both modules solve the integral form of the 3-D time-dependent compressible Euler or Navier-Stokes equations using finite-volume methods. In addition, they are both designed to interact with other physics sub-modules, including turbulence, particles, chemical reactions, and radiation [17], through the multiphysics framework illustrated in Figure 3. Rocflo, developed by Blazek, Wasistho, and colleagues, uses a multiblock structured mesh. Rocflu, developed primarily by Haselbacher, uses a mixed unstructured mesh consisting of arbitrary combinations of tetrahedra, hexahedra, prisms, and pyramids. Both solvers provide second- or higher-order accuracy in both space and time, notably with a novel WENO reconstruction method for unstructured grids in Rocflu [8]. To accommodate dynamically changing fluid domains, these solvers allow for moving meshes using an Arbitrary Lagrangian-Eulerian (ALE) formulation. The Geometric Conservation Law (GCL) [24] is satisfied in a discrete sense to within machine precision to avoid the introduction of spurious sources of mass, momentum, and energy due to grid motion.
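In integral form, the GCL requires that the mesh velocity $\mathbf{u}_g$ satisfy

    \frac{d}{dt} \int_{V(t)} dV \;-\; \oint_{S(t)} \mathbf{u}_g \cdot \mathbf{n} \, dS \;=\; 0 ,

so that, discretely, each cell's volume update equals the volume swept by its moving faces over the time step, e.g. $V_i^{n+1} - V_i^{n} = \Delta t \sum_{f \in \partial V_i} (\mathbf{u}_g \cdot \mathbf{n})_f A_f$ for a scheme that is first-order in time. (This is the standard statement of the law, paraphrased here rather than quoted from [24].)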

Fig. 3. Multiphysics framework of Rocflo and Rocflu.

3.2 Solid Mechanics Solvers

Solid Mechanics Solvers

Rocsolid

and

Rocfrac

are two general-purpose nite-element solvers for linear

and nonlinear elastic problems with large deformations, and both support moving meshes using an ALE formulation. Developed by Namazifard, Parsons, and colleagues,

Rocsolid

is an implicit solver using unstructured hexahedral meshes

[19]. It provides a scalable geometric multigrid method for symmetric linear systems [18], and a BiCGSTAB method for nonsymmetric systems.

Rocfrac

is an

explicit solver, developed by Breitenfeld and Geubelle that uses linear (4-node) or quadratic (10-node) tetrahedral elements for complex geometries, with additional support for other types of elements (including hexahedra and prisms) for anisotropic geometries (e.g., the case of a rocket motor).

Rocfrac

can also

simulate complex dynamic fracture problems, as in the example shown in Fig. 4, using an explicit Cohesive Volumetric Finite Element (CVFE) scheme [6].

Crack propagation in medium with holes.

Three-dimensional simulation of heterogeneous solid propellant.

Fig. 4.

Fig. 5.

3.3 Combustion Modules

Rocburn, our combustion module, provides the regression rate of solid propellant at the burning propellant surface, which is critical for driving the motion of the interface and achieving conservation of mass and energy at the solid-fluid interface. Developed by Tang, Massa, and colleagues, Rocburn is composed of three interchangeable one-dimensional combustion models and an adaptor module. The combustion models calculate the burning rate at any given point on the burning surface using either a quasi-steady combustion model based on Saint Robert's (a.k.a. Vieille's) law [10], or one of two unsteady combustion models that can capture ignition pressure spikes [22, 23]. The adaptor module manages data and interacts with other modules at the level of the burning surface. Additional one-dimensional combustion models can also be plugged into the adaptor module conveniently. In addition, Rocburn simulates the ignition of the propellant under convective heat flux, when a thermal boundary layer is present in the fluids modules.

The models in Rocburn are in turn informed by more detailed results from Rocfire, developed by Jackson and Massa [16], which models combustion of heterogeneous solid propellant at micron-scale resolution. These scales are small enough to capture the interaction between gas-phase fields and solid-phase morphology, which plays a fundamental role in the combustion of heterogeneous propellant, as shown by the simulation depicted in Fig. 5. Because it operates at the micron scale, Rocfire is not directly integrated into Rocstar, but it is critical for calibrating the homogenized combustion models used in Rocburn.

3.4 High-Performance Implementations

All of our physics modules are implemented in Fortran 90 and employ derived data types and pointers extensively to achieve data encapsulation and modularity. For example, each block in Rocflo or partition in Rocflu is regarded as an object and contains all data relevant to that block, including nodal coordinates, face vectors, cell volumes, and state vectors. To achieve high serial performance, computationally intensive parts of the codes are written in Fortran 77 style (e.g., without using pointers) or utilize LAPACK routines [1] whenever possible.

Parallel Implementations. Each physics module is parallelized for distributed-memory machines using a domain-decomposition approach. For Rocflo, which uses multiblock structured meshes, parallelization is based on the block topology, where each processor may own one or more blocks. For other modules using unstructured meshes, the parallel implementations are based on element-based partitioning of meshes, typically with one partition per process. For inter-block or inter-partition data exchange, the finite-volume (fluids) codes utilize the concept of ghost (or dummy) cells, with the difference that Rocflo uses non-blocking persistent communication calls of MPI [7], whereas Rocflu utilizes the services provided by the Finite Element Framework built on top of MPI [2]. For the finite-element (solids) modules, the nodes along partition boundaries are shared by multiple processes, and their values are communicated using non-blocking MPI calls. Rocburn is solved on the burning patches of either the fluid or solid surface meshes, and its decomposition is derived from that of the volume mesh of the corresponding parent module.
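To make the communication pattern concrete, the sketch below shows what a ghost-cell exchange with persistent MPI requests might look like. The types and buffer layout are invented for illustration; they are not Rocflo's actual data structures.

    #include <mpi.h>
    #include <vector>

    struct BlockHalo {
      int neighbor_rank;             // rank owning the adjacent block
      std::vector<double> send_buf;  // interior layer packed for the neighbor
      std::vector<double> recv_buf;  // ghost layer filled by the neighbor
    };

    // Set up persistent requests once at initialization; reuse them every step.
    void init_halo_exchange(std::vector<BlockHalo>& halos,
                            std::vector<MPI_Request>& reqs, MPI_Comm comm) {
      for (BlockHalo& h : halos) {
        MPI_Request r;
        MPI_Send_init(h.send_buf.data(), static_cast<int>(h.send_buf.size()),
                      MPI_DOUBLE, h.neighbor_rank, 0, comm, &r);
        reqs.push_back(r);
        MPI_Recv_init(h.recv_buf.data(), static_cast<int>(h.recv_buf.size()),
                      MPI_DOUBLE, h.neighbor_rank, 0, comm, &r);
        reqs.push_back(r);
      }
    }

    // Each time step: start all transfers, overlap interior work, then wait.
    void exchange_ghosts(std::vector<MPI_Request>& reqs) {
      MPI_Startall(static_cast<int>(reqs.size()), reqs.data());
      // ... update cells that do not depend on ghost data here ...
      MPI_Waitall(static_cast<int>(reqs.size()), reqs.data(), MPI_STATUSES_IGNORE);
    }

Persistent requests pay the setup cost once, which suits the fixed communication topology of a multiblock mesh.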

Inter- and Intra-Parallelism Through Adaptive MPI. Adaptive MPI (AMPI) is a multi-threaded version of MPI that supports virtual processes and dynamic migration [9]. Developed in Kalé's group at Illinois, AMPI is based on Charm++, provides virtually all the standard MPI calls, and can be linked with any MPI code with only minor changes to driver routines. Its virtual processes allow emulating larger machines with smaller ones and overlapping communication with computation, as demonstrated by the example in Fig. 6. To take full advantage of virtual processes, we have eliminated global variables in nearly all modules in Rocstar. For each physics module, whose data are well encapsulated to begin with, a context object (of a derived data type) is introduced to contain pointers to all the data used by the module. Each virtual process has a private copy of the context object for each physics module, and this private copy is passed to all the top-level interface subroutines of the physics module at runtime, utilizing the data registration and function invocation mechanisms of Roccom, which we will describe shortly. Utilizing AMPI, our physics code developers and users can, for example, run a 480-process data set using 480 virtual AMPI processes on just 32 physical processors of a Linux cluster, and take advantage of parallelism within a process, without having to use a complex hybrid MPI plus OpenMP programming paradigm.
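The real context objects are Fortran 90 derived types; the following C++ sketch, with invented names, conveys the pattern: all module state lives in an explicit object rather than in global variables, so each virtual process can own a private copy within a shared address space.

    #include <cstddef>
    #include <vector>

    struct FluidContext {
      std::vector<double> state;  // all module data lives here, never in globals
      double t = 0.0;             // current simulation time
    };

    void fluid_init(FluidContext& ctx, std::size_t ncells) {
      ctx.state.assign(ncells, 1.0);  // placeholder initial condition
    }

    // Top-level interface routines receive the context explicitly, so any
    // number of virtual processes can advance their own copies independently.
    void fluid_advance(FluidContext& ctx, double dt) {
      for (double& u : ctx.state) u *= 1.0 - 0.5 * dt;  // placeholder update
      ctx.t += dt;
    }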

(a) 8 virtual processes on 8 physical processors. (b) 64 virtual processes on 8 physical processors.

Fig. 6. AMPI minimizes processor idle times (gaps in plots) by overlapping communication with computation on different virtual processes. Courtesy of the Charm group.

4 Integration Framework and Middleware Services

To accommodate the rapidly changing requirements of physics modules, we have developed a software framework that allows the individual components to be developed as independently as possible, and then integrates them with little or no change. It provides maximum flexibility for physics codes and can be adapted to fit the diverse needs of the components, instead of requiring the opposite. This framework differs from many traditional software architectures and frameworks, which typically assume that the framework is fully in control, and are designed for extension instead of integration.

4.1 Management of Distributed Objects

To facilitate interactions between modules, we have developed an object-oriented, data-centric integration framework called Roccom. Its design is based on an important observation and assumption of persistent objects: an object is said to be persistent if it lasts beyond a major coupled simulation step. In a typical physics module, especially in the high-performance regime, data objects are allocated during an initialization stage, reused for multiple iterations of calculations, and deallocated during a finalization stage. Therefore, most objects are naturally persistent in multicomponent simulations.

Based on the assumption of persistency, Roccom defines a registration mechanism for data objects and organizes data into distributed objects called windows. A window encapsulates a number of data attributes, such as the mesh (coordinates and connectivities) and some associated field variables. A window can be partitioned into multiple panes, for exploiting parallelism or for distinguishing different material or boundary-condition types. In a parallel setting, a pane belongs to a single process, while a process may own any number of panes. A module constructs windows at runtime by creating attributes and registering their addresses. Different modules can communicate with each other only through windows, as illustrated in Figure 7.
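The sketch below conveys the registration idea using hypothetical C++ types; it is not Roccom's actual API (see [11] for that). The essential point is that the framework stores the addresses of the module's own persistent arrays rather than copies.

    #include <map>
    #include <string>
    #include <vector>

    struct Pane {                                // a mesh patch owned by one process
      std::vector<double>* coords = nullptr;     // registered nodal coordinates
      std::vector<int>* connectivity = nullptr;  // registered element connectivity
      std::map<std::string, double*> fields;     // named field attributes, by address
    };

    struct Window {                  // a distributed object: panes across processes
      std::string name;
      std::map<int, Pane> panes;     // pane id -> pane (local panes only)
    };

    // The module registers the addresses of arrays it already owns; the
    // framework never copies the data, which is why persistency matters.
    void register_field(Window& w, int pane_id, const std::string& attr,
                        double* data) {
      w.panes[pane_id].fields[attr] = data;
    }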

Fig. 7. Schematic of windows and panes.

Fig. 8. Abstraction of data input.

The window-and-pane data abstraction of Roccom drastically simplifies inter-module interaction: data objects of physics modules are registered and reorganized into windows, so that their implementation details are hidden from the framework and need not be altered extensively to fit the framework. Service utilities can now also be developed independently, by interacting only with window objects. Window objects are self-descriptive, and in turn the interface functions can be simplified substantially, frequently reducing the number of functions or the number of arguments per function by an order of magnitude. Roccom also introduces the novel concept of partial inheritance of windows, to construct a sub-window by using or cloning a subset of the mesh or attributes of another Roccom window. In addition, the registered attributes in Roccom can be referenced as an aggregate, such as using "mesh" to refer to the collection of nodal coordinates and element connectivity. These advanced features enable performing complex tasks, such as reading or writing data for a whole window, with only one or two function calls. For more information on the novel features of Roccom, see [11].

On top of Roccom, we have developed a number of reusable service modules, including middleware services, such as I/O, communication, and performance tools, which we describe in the following subsections, as well as mathematical services, which we discuss in the next section.

4.2 Data Input/Output

In scientific simulations, data exchange between a module and the outside world can be very complex. For file I/O alone, a developer must already face many issues, including various file formats, parallel efficiency, platform compatibility, and interoperability with off-line tools. In a dynamic simulation, the situation is even more complex, as the code may need to exchange its mesh and data attributes with mesh repair or remeshing services, or receive data from remote processes. To meet these challenges, we use the window abstraction of Roccom as the medium, or virtual file, for all data exchanges for a module, regardless of whether the other side is a service utility, files of various formats, or remote machines, and we let middleware services take care of the mapping between the window and the other side. For example, file I/O services map Roccom windows onto scientific file formats (such as HDF and CGNS), so that the details of file formats and optimization techniques become transparent to application modules. Furthermore, as illustrated in Fig. 8, all application modules obtain data from an input window through a generic function interface, obtain_attribute(), which is supported by a number of services, including file readers and remeshing tools. This novel design allows physics modules to use the same initialization routine to obtain data under different circumstances, including initial startup, restart, restart after remeshing, and reinitialization after mesh repair.
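Schematically, the design can be pictured as follows. The function name obtain_attribute() comes from the text above, but the C++ signature and types here are our illustration, not Roccom's actual declarations.

    #include <string>

    struct Window;  // as sketched in Section 4.1

    // Implemented by HDF readers, CGNS readers, remeshing services, etc.
    class DataSource {
     public:
      virtual ~DataSource() = default;
      virtual void obtain_attribute(Window& input, const std::string& attr) = 0;
    };

    // A physics module's initialization is then source-agnostic: whether this
    // is a cold start, a restart, or a reload after remeshing, it simply asks
    // the input window for the aggregate "mesh" attribute and its fields.
    void module_init(DataSource& src, Window& input) {
      src.obtain_attribute(input, "mesh");
      src.obtain_attribute(input, "temperature");
    }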

4.3 Interpane Communication

Traditional message-passing paradigms typically provide general but low-level inter-process communication, such as send, receive, and broadcast. In physical simulations using finite element or finite volume methods, communications are typically across panes or partitions, whether the panes or partitions are on the same or different processes. The Roccom framework provides high-level inter-pane communication abstractions, including performing reductions (such as sum, max, and min operations) on shared nodes, and updating values for ghost (i.e., locally cached copies of remote) nodes or elements. Communication patterns between these nodes and elements are encapsulated in the pane connectivity of a window, which can be provided by application modules or constructed automatically in parallel using geometric algorithms. These inter-pane communication abstractions simplify the parallelization of a large number of modules, including surface propagation and mesh smoothing, which we discuss in the next section.
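As an illustration, a sum-reduction on nodes shared across a partition boundary might look like the sketch below. The types are hypothetical, and for brevity we assume a pane's shared nodes all pair with a single neighbor rank and are listed in matching order on both sides; the real abstraction handles arbitrary neighbor sets.

    #include <mpi.h>
    #include <cstddef>
    #include <vector>

    struct SharedNode {
      int local_index;  // index of the node in this pane's arrays
      int remote_rank;  // process owning the coupled copy of the node
    };

    // Sum a nodal field across a partition boundary shared with one neighbor,
    // so both copies of each shared node end up holding the same total.
    void reduce_shared_sum(std::vector<double>& field,
                           const std::vector<SharedNode>& shared, MPI_Comm comm) {
      if (shared.empty()) return;
      std::vector<double> send(shared.size()), recv(shared.size());
      for (std::size_t i = 0; i < shared.size(); ++i)
        send[i] = field[shared[i].local_index];
      int nbr = shared[0].remote_rank;  // simplification: single neighbor rank
      MPI_Sendrecv(send.data(), static_cast<int>(send.size()), MPI_DOUBLE, nbr, 0,
                   recv.data(), static_cast<int>(recv.size()), MPI_DOUBLE, nbr, 0,
                   comm, MPI_STATUS_IGNORE);
      for (std::size_t i = 0; i < shared.size(); ++i)
        field[shared[i].local_index] += recv[i];
    }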

4.4 Performance Monitoring

There is a large variety of tools for the collection and analysis of performance data for parallel applications. However, the use of external tools for performance tuning is too laborious a process for many applications, especially for a complex integrated system such as Rocstar. Many tools are not available on all the platforms for which we wish to collect data, and using a disparate collection of tools introduces complications as well. To obviate the need for external tools, we have extended the Roccom framework's service-based design philosophy with the development of a performance profiling module, Rocprof.

Rocprof provides two modes of profiling: module-level profiling and submodule-level profiling. Module-level profiling is fully automatic, embedded in Roccom's function invocation mechanism, and hence requires no user intervention. For submodule-level profiling, Rocprof offers profiling services through the standard MPI_Pcontrol interface, as well as a native interface for non-MPI based codes. By utilizing the MPI_Pcontrol interface, application developers can collect profiling information for arbitrary, user-defined sections of source code without breaking their stand-alone codes. Internally, Rocprof uses PAPI [4] or HPM (http://www.alphaworks.ibm.com/tech/hpmtoolkit) to collect hardware performance statistics.
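For example, an application might bracket a section of interest as follows. MPI_Pcontrol is a no-op in a plain MPI build, so the instrumented code still runs stand-alone and a profiler intercepts the call; the particular argument convention shown (positive to begin a section, negative to end it) is our assumption for illustration, not Rocprof's documented interface.

    #include <mpi.h>

    void compute_fluxes();  // some expensive kernel we want to time

    void timed_section() {
      MPI_Pcontrol(1);   // begin user-defined section 1 (illustrative convention)
      compute_fluxes();
      MPI_Pcontrol(-1);  // end user-defined section 1
    }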

5 Parallel Computational Methods

In Rocstar, a physical domain is decomposed into a volume mesh, which can be either block-structured or unstructured, and the numerical discretization is based on either a finite element or finite volume method. The interface between fluid and solid moves due to both chemical burning and mechanical deformation. In such a context, we must address a large number of mathematical issues, three of which we discuss here.

5.1 Intermodule Data Transfer

In multiphysics simulations, the computational domains for each physical component are frequently meshed independently, which in turn requires geometric algorithms to correlate the surface meshes at the common interface between each pair of interacting domains in order to exchange boundary conditions. These surface meshes in general differ both geometrically and combinatorially, and are also partitioned differently for parallel computation. To correlate such interface meshes, we have developed novel algorithms to construct a common refinement of two triangular or quadrilateral meshes modeling the same surface, that is, a finer mesh whose polygons subdivide the polygons of the input surface meshes [13]. To resolve geometric mismatch, the algorithm defines a conforming homeomorphism and utilizes locality and duality to achieve optimal linear time complexity. Due to the nonlinear nature of the problem, our algorithm uses floating-point arithmetic, but it nevertheless achieves provable robustness by identifying a set of consistency rules and an intersection principle to resolve any inconsistencies due to numerical errors.

After constructing the common refinement, we must transfer data between the nonmatching meshes in a numerically accurate and physically conservative manner. Traditional methods, including pointwise interpolation and some weighted residual methods, can achieve either accuracy or conservation, but not both simultaneously. Leveraging the common refinement, we have developed more advanced formulations and optimal discretizations that minimize errors in a certain norm while achieving strict conservation, yielding significant advantages over traditional methods, especially for repeated transfers in multiphysics simulations [12]. For parallel runs, the common refinement also contains the correlation of elements across partitions of different meshes, and hence provides the communication structure needed for inter-module, inter-process data exchange.
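Schematically, and in our notation rather than that of [12], the transfer of a source field $u_s$ to a target finite-element space $\mathcal{T}$ on the interface $\Gamma$ can be posed as the constrained minimization

    \min_{u_t \in \mathcal{T}} \; \| u_t - u_s \|_{L^2(\Gamma)}^2
    \quad \text{subject to} \quad
    \int_\Gamma u_t \, dA \;=\; \int_\Gamma u_s \, dA ,

with every integral assembled subcell by subcell over the common refinement. Because each subcell lies entirely within one element of each input mesh, the integrands are polynomial on each subcell and can be integrated exactly; see [12] for the precise formulation and discretization.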

5.2 Surface Propagation

In Rocstar, the interface must be tracked as it regresses due to burning. In recent years, Eulerian methods, especially level set methods, have made significant advancements and become the dominant methods for moving interfaces [20, 21]. In our context, a Lagrangian representation of the interface is crucial to describe the boundary of the volume meshes of the physical regions, but there were no known stable numerical methods for Lagrangian surface propagation. To meet this challenge, we have developed a novel class of methods, called face-offsetting methods, based on a new entropy-satisfying Lagrangian (ESL) formulation [14]. Our face-offsetting methods exploit some fundamental ideas used by level set methods, together with well-established numerical techniques, to provide accurate and stable entropy-satisfying solutions, without requiring Eulerian volume meshes.

Fig. 9. Initial burn of a star slice exhibits rapid expansion at slots and contraction at fins. The three subfigures correspond to 0%, 6%, and 12% burns, respectively.

A fundamental difference between face-offsetting and traditional Lagrangian methods is that our methods solve the Lagrangian formulation face by face, and then reconstruct vertices by constrained minimization and curvature-aware averaging, instead of directly moving vertices along some approximate normal directions. Fig. 9 shows a sample result of the initial burn of a star grain section of a rocket motor, which exhibits rapid expansion at slots and contraction at fins. Our algorithm includes an integrated node redistribution scheme that suffices to control mesh quality for moderately moving interfaces without perturbing the geometry. Currently, we are coupling it with more sophisticated geometric and topological algorithms for mesh adaptivity and topological control, to provide a more complete solution for a broader range of applications.
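Schematically, and omitting the entropy-satisfying treatment and curvature-aware averaging of the full method in [14], one propagation step can be pictured in two stages. Each face $f$, with unit normal $\mathbf{n}_f$ and local propagation speed $f_f$, is first offset as a whole,

    \mathbf{x} \;\mapsto\; \mathbf{x} + f_f \, \Delta t \, \mathbf{n}_f
    \quad \text{for all } \mathbf{x} \in f ,

and each vertex $v$ is then reconstructed from its offset incident faces, for example by the least-squares fit

    \mathbf{x}_v^{n+1} \;=\; \arg\min_{\mathbf{x}} \sum_{f \ni v}
    \bigl( \mathbf{n}_f \cdot ( \mathbf{x} - \mathbf{x}_f^{n+1} ) \bigr)^2 ,

where $\mathbf{x}_f^{n+1}$ is any point on the offset plane of face $f$. Propagating faces rather than vertices avoids committing to a single, ill-defined normal direction at each vertex.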

5.3 Mesh Optimization

In Rocstar, each physics module operates on some type of mesh. An outstanding issue in integrated rocket simulations is the degradation of mesh quality due to the changing geometry resulting from consumption of propellant by burning, which causes the solid region to shrink and the fluid region to expand, and compresses or inflates their respective meshes. This degradation can lead to excessively small time steps when an element becomes poorly shaped, or even to outright failure when an element becomes inverted. Some simple mesh motion algorithms are built into our physics modules. For example, simple Laplacian smoothing [25] is used for unstructured meshes, and a combination of linear transfinite interpolation (TFI) with Laplacian smoothing is used for structured meshes in Rocflo. These simple schemes are insufficient when the meshes undergo major deformation or distortion.

To address the issue, we take a three-tiered approach, in increasing order of aggressiveness: mesh smoothing, mesh repair, and global remeshing. Mesh smoothing copes with gradual changes in the mesh. We provide a combination of in-house tools and integration of external packages; our in-house effort focuses on parallel, feature-aware surface mesh optimization, and provides novel parallel algorithms for mixed meshes with both triangles and quadrilaterals. To smooth volume meshes, we utilize the serial MESQUITE package [5] from Sandia National Laboratories, which also works for mixed meshes, and we parallelized it by leveraging our across-pane communication abstractions. A sketch of the basic smoothing step appears below.
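The following is a minimal serial illustration of plain Laplacian smoothing, in which each interior vertex moves part of the way toward the centroid of its neighbors; the types, boundary handling, and relaxation factor are ours, not the production code's.

    #include <array>
    #include <cstddef>
    #include <vector>

    using Point = std::array<double, 3>;

    // Smooth interior vertices toward the neighbor centroid; boundary
    // vertices stay fixed so the surface geometry is not perturbed.
    void laplacian_smooth(std::vector<Point>& pts,
                          const std::vector<std::vector<int>>& nbrs,
                          const std::vector<bool>& on_boundary,
                          int iterations, double relax = 0.5) {
      for (int it = 0; it < iterations; ++it) {
        std::vector<Point> next = pts;
        for (std::size_t v = 0; v < pts.size(); ++v) {
          if (on_boundary[v] || nbrs[v].empty()) continue;
          Point c{0.0, 0.0, 0.0};
          for (int u : nbrs[v])
            for (int d = 0; d < 3; ++d) c[d] += pts[u][d];
          for (int d = 0; d < 3; ++d)
            next[v][d] = pts[v][d] + relax * (c[d] / nbrs[v].size() - pts[v][d]);
        }
        pts.swap(next);  // Jacobi-style update: all vertices move together
      }
    }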

If the mesh deforms more substantially, then mesh smoothing becomes inadequate, and more aggressive mesh repair or even global remeshing may be required, although the latter is too expensive to perform very frequently. For these more drastic measures, we currently focus on tetrahedral meshes only, and leverage third-party tools off-line, including Yams and TetMesh from Simulog and MeshSim from Simmetrix; work is in progress to integrate MeshSim into our framework for on-line use. Remeshing requires that data be mapped from the old mesh onto the new mesh, for which we have developed parallel algorithms to transfer both node- and cell-centered data accurately, built on top of the parallel collision detection package developed by Lawlor and Kalé [15]. Fig. 10 shows an example where the deformed star grain is remeshed, with the temperature field of the fluids volume transferred from the old to the new mesh.

Fig. 10. Example of remeshing and data transfer for a deformed star grain.

6 Example Results

The Rocstar code suite is verified using a number of 3-D problems. Some of the verification problem sets are used for qualitative verification of simulation results, and others have specific analytical results that should be matched. Most of these simulation runs are performed on five hundred or more processors for several weeks. We give a sample of our test problems and their results in this section.

6.1 Titan IV Propellant Slumping Accident

One of our major test problems is the propellant slumping in the Titan IV SRMU Prequalification Motor #1. This motor exploded during a static test firing on April 1, 1991, caused by excessive deformation of the aft propellant segment just below the aft joint slot [27]. Fig. 11 shows a cutaway view of the fluids domain and the propellant deformation at nearly 1 second after ignition for an incompressible neo-Hookean material model, obtained from Rocstar's 3-D simulations. The deformation is due to an aeroelastic effect and is particularly noticeable near the joint slots. The resulting flow restriction led to catastrophic failure of the real rocket on the test stand.

Fig. 11. Titan IV propellant slumping. Left: cutaway view of the fluids domain. Right: propellant deformation after 1 second.

6.2 RSRM Ignition

To obtain insight into the process of ignition in the Space Shuttle Reusable Solid Rocket Motor (RSRM), we used our ignition and dynamic burn rate models in a fully coupled 3-D simulation of the entire RSRM (Fig. 12) [22]. Our results demonstrate that hot gas from the igniter (located at upper left in each of the four panels of Fig. 12) reaches deep into the star-grain slots at the head end and ignites the propellant there without significant delay, despite our omission of radiation in this calculation. Hot gas takes a long time to penetrate deeply into the two uninhibited (aft) joint slots and ignite the propellant because, unlike the star-grain slots, the cool gas trapped in the joint slots has no easy escape route. This result matches reality well qualitatively, and compares favorably with experimental measurements that we obtained from NASA.

Fig. 12. RSRM propellant temperature at 100, 125, 150, and 175 ms (clockwise from upper left).

6.3 BATES Motor

The BATES motor is a laboratory-scale, US Air Force-sponsored rocket design that has been used for many years to investigate factors affecting motor performance. Data made available to CSAR include the precise geometry as well as the composition of various test propellants. Published data have also been obtained describing the results of a series of experiments to determine the effect of the aluminum composition of the propellant on motor efficiency. This information allows CSAR to validate the new particle transport and aluminum burning models [26] in Rocstar. Fig. 13 shows the basic geometry of the BATES motor, and the results from a Rocstar simulation in which burning aluminum droplets of a single radius (30 micrometers) and aluminum fraction (90%, in particular) are injected beginning at time t = 0. The gas is initially in a quasi-steady state with no aluminum present. At 15 ms, a new quasi-steady state has not yet been reached. Gas and droplets injected in the narrow slot at the head end move rapidly along the axis of the rocket and out through the nozzle, but gas and droplets injected elsewhere on the propellant surface move at much slower speeds. As a result, the droplets near the axis of the rocket have shorter residence times, and therefore a larger final aluminum fraction, than do droplets that spend more time at larger radii. The smoke concentration is correspondingly higher at larger radii. Droplets entering the nozzle with a substantial amount of unburned aluminum reduce the efficiency of the motor.

7 Performance Results

In high-performance computing, serial performance is frequently critical to delivering good overall performance. With the aid of Rocprof, the code developers have been able to tune their modules selectively. Fig. 14 shows sample statistics of hardware counters for Rocfrac, obtained by Rocprof on the ASC Linux Cluster (ALC, http://www.llnl.gov/asci/platforms), which is based on Pentium IV technology. Fig. 15 shows the absolute performance of Rocstar and of the computationally intensive Rocfrac and Rocflo on a number of platforms. Overall, Rocstar achieves a respectable 16% of peak performance on ALC, and 10% or higher on other platforms.

Fig. 13. BATES motor design (left) and aluminum fraction at 15 ms (right).

Fig. 14. Hardware counters for Rocfrac obtained by Rocprof on the Linux cluster.

Fig. 15. Absolute performance of Rocstar on Linux and Mac clusters.

To measure the scalability of Rocstar, we use a scaled problem, i.e., the problem size is proportional to the number of processors, so that the amount of work per process remains constant. Ideally, the wall-clock time should remain constant if scalability is perfect. Fig. 16 shows the wall-clock times per iteration using the explicit-implicit coupling between Rocflu and Rocsolid with a five-to-one ratio (i.e., five explicit fluid time steps for each implicit solid time step), up to 480 processors on ASC White (Frost), based upon IBM's POWER3 SP technology. Fig. 17 shows the wall-clock time for the explicit-explicit coupling between Rocflo and Rocfrac, up to 480 processors on ALC. In both cases, the scalability is excellent even for very large numbers of processors. The interface code, dominated by data transfer between the fluid and solid interfaces, takes less than 2% of the overall time. Times for other modules are negligible and hence are not shown.
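In this scaled (weak-scaling) setting, the implied parallel efficiency is the standard ratio

    E(p) \;=\; \frac{T_1}{T_p} ,

where $T_p$ is the wall-clock time per iteration on $p$ processors with the work per process held fixed; perfect scalability corresponds to constant $T_p$, i.e., $E(p) = 1$.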

Fig. 16. Scalability of Rocstar with Rocflo and Rocsolid on IBM SP.

Fig. 17. Scalability of Rocstar with Rocflu and Rocfrac on Linux cluster.

8 Conclusion

In this paper, we have presented an overview of some software and algorithmic issues in developing high-performance simulation tools for multicomponent systems, based on our hands-on experience with detailed simulation of solid propellant rockets. Working as an interdisciplinary team, we have made substantial achievements in the past few years in advancing the state of the art in these simulations. Many challenges remain, however, including the following outstanding research issues:

- distributed algorithms and data structures for parallel mesh repair and adaptivity
- generation and partitioning of extremely large meshes under memory constraints
- parallel mesh adaptation for crack propagation in three dimensions
- data transfer at sliding and adaptive interfaces

These issues offer new research opportunities, and will require close collaboration between computer and engineering scientists to devise effective and practical methodologies.

Acknowledgments The research results presented in this paper are snapshots of many research contributions made by our colleagues at CSAR, as we have noted in the text. We particularly thank Robert Fiedler for his many contributions to the Example Results section. The CSAR research program is supported by the US Department of Energy through the University of California under subcontract B523819.

References

1. E. Anderson et al. LAPACK Users' Guide. SIAM, Philadelphia, 2nd edition, 1995.
2. M. Bhandarkar and L. V. Kalé. A parallel framework for explicit FEM. In M. Valero et al., editors, Proc. Internat. Conf. High Performance Computing (HiPC 2000), pages 385-395. Springer, 2000.
3. J. Blazek. Flow simulation in solid rocket motors using advanced CFD. In 39th AIAA/ASME/SAE/ASEE Joint Propulsion Conf. and Exhibit, Huntsville, Alabama, 2003. AIAA Paper 2003-5111.
4. S. Browne, J. Dongarra, N. Garner, G. Ho, and P. Mucci. A portable programming interface for performance evaluation on modern processors. Int. J. High Perf. Comput. Appl., 14:189-204, 2000.
5. L. Freitag, T. Leurent, P. Knupp, and D. Melander. MESQUITE design: Issues in the development of a mesh quality improvement toolkit. In 8th Intl. Conf. Numer. Grid Gener. Comput. Field Sim., pages 159-168, 2002.
6. P. H. Geubelle and J. Baylor. Impact-induced delamination of composites: a 2D simulation. Composites B, 29B:589-602, 1998.
7. W. Gropp, E. Lusk, and A. Skjellum. Using MPI: Portable Parallel Programming with the Message-Passing Interface. MIT Press, Cambridge, MA, 2nd edition, 1999.
8. A. Haselbacher. A WENO reconstruction method for unstructured grids based on explicit stencil construction. In 43rd AIAA Aerospace Sciences Meeting and Exhibit, Reno, NV, Jan. 2005. AIAA Paper 2005-0879, to appear.
9. C. Huang, O. S. Lawlor, and L. V. Kalé. Adaptive MPI. In Proc. 16th Internat. Workshop on Languages and Compilers for Parallel Computing (LCPC 03), October 2003.
10. C. Huggett, C. E. Bartley, and M. M. Mills. Solid Propellant Rockets. Princeton University Press, Princeton, NJ, 1960.
11. X. Jiao, M. T. Campbell, and M. T. Heath. Roccom: An object-oriented, data-centric software integration framework for multiphysics simulations. In 17th Ann. ACM Int. Conf. Supercomputing, pages 358-368, 2003.
12. X. Jiao and M. T. Heath. Common-refinement based data transfer between nonmatching meshes in multiphysics simulations. Int. J. Numer. Meth. Engrg., 61:2401-2427, 2004.
13. X. Jiao and M. T. Heath. Overlaying surface meshes, part I: Algorithms. Int. J. Comput. Geom. Appl., 14:379-402, 2004.
14. X. Jiao, M. T. Heath, and O. S. Lawlor. Face-offsetting methods for entropy-satisfying Lagrangian surface propagation. In preparation, 2004.
15. O. S. Lawlor and L. V. Kalé. A voxel-based parallel collision detection algorithm. In Proc. Internat. Conf. Supercomputing, pages 285-293, June 2002.
16. L. Massa, T. L. Jackson, and M. Short. Numerical simulation of three-dimensional heterogeneous propellant. Combustion Theory and Modelling, 7:579-602, 2003.
17. F. M. Najjar, A. Haselbacher, J. P. Ferry, B. Wasistho, S. Balachandar, and R. D. Moser. Large-scale multiphase large-eddy simulation of flows in solid-rocket motors. In 16th AIAA Computational Fluid Dynamics Conf., Orlando, FL, June 2003. AIAA Paper 2003-3700.
18. A. Namazifard and I. D. Parsons. A distributed memory parallel implementation of the multigrid method for solving three-dimensional implicit solid mechanics problems. Int. J. Numer. Meth. Engrg., 61:1173-1208, 2004.
19. A. Namazifard, I. D. Parsons, A. Acharya, E. Taciroglu, and J. Hales. Parallel structural analysis of solid rocket motors. In AIAA/ASME/SAE/ASEE Joint Propulsion Conf., 2000. AIAA Paper 2000-3457.
20. S. Osher and R. Fedkiw. Level Set Methods and Dynamic Implicit Surfaces. Springer, 2003.
21. J. A. Sethian. Level Set Methods and Fast Marching Methods. Cambridge University Press, 1999.
22. K. C. Tang and M. Q. Brewster. Dynamic combustion of AP composite propellants: Ignition pressure spike, 2001. AIAA Paper 2001-4502.
23. K. C. Tang and M. Q. Brewster. Nonlinear dynamic combustion in solid rockets: L*-effects. J. Propulsion and Power, 14:909-918, 2001.
24. P. D. Thomas and C. K. Lombard. Geometric conservation law and its application to flow computations on moving grids. AIAA J., 17:1030-1037, 1979.
25. J. F. Thompson, B. K. Soni, and N. P. Weatherill, editors. Handbook of Grid Generation. CRC Press, Boca Raton, 1999.
26. J. F. Widener and M. W. Beckstead. Aluminum combustion modeling in solid propellant combustion products, 1998. AIAA Paper 98-3824.
27. W. G. Wilson, J. M. Anderson, and M. Vander Meyden. Titan IV SRMU PQM-1 overview, 1992. AIAA Paper 92-3819.
