49th AIAA Aerospace Sciences Meeting including the New Horizons Forum and Aerospace Exposition 4 - 7 January 2011, Orlando, Florida

AIAA 2011-1178

Extensible Software Engineering Practices for the Helios High-Fidelity Rotary-Wing Simulation Code

Buvaneswari Jayaraman∗
US Army/AFDD, Moffett Field, CA

Andrew M. Wissink†
US Army/AFDD, Moffett Field, CA

Sameer Shende‡
ParaTools Inc., Eugene, OR

Stephen Adamec§
CREATE-AV, Patuxent River, MD

Venkateswaran Sankaran¶
US Army/AFDD, Moffett Field, CA

We describe software engineering practices applied to the Helios code, an integrated computational fluid dynamics and structural dynamics platform for rotorcraft simulations. Helios consists of a collection of legacy and new simulation components that are integrated using a Python-based infrastructure. Given its target use as a production platform, rigorous software development practices are used to enable ease of development, builds, testing, and maintenance. Specific elements discussed are the unique aspects of developing production-quality Python-based scientific simulation software, including a multi-platform build environment and the installation of consistent versions of all supporting software across disparate computer systems. In addition, we discuss continuous integration and regression testing, automatic reporting of performance and memory usage, and scalability analysis.

I. Introduction

The rapid growth in computational capability in recent years has allowed helicopter researchers and engineers to solve large-scale problems integrating high-fidelity Navier-Stokes-based CFD models for aerodynamics predictions together with structural dynamics models and flight controls. The algorithmic approach for such integrated simulations has been proven using research codes over the past decade (e.g., Potsdam et al.1), but incorporating these ideas and techniques into a usable and accurate software design tool is a challenge. Helicopter design engineers will adopt simulation software only when they are confident the accuracy has been rigorously tested, documented, and verified by the core development team and outside users. Moreover, to ensure long-term use of newly developed software, scalable performance is of equal importance and, for multi-disciplinary simulations, it is critical that all components of the software scale efficiently. Finally, the software needs to be easy to build, easy to use, and easy to maintain. In this paper, we describe the extensible software engineering practices that are being employed in the development of the Helios code, an innovative multi-disciplinary platform for rotorcraft simulations. The CREATE-AV (Air Vehicles) and HPC Institute for Advanced Rotorcraft Modeling and Simulation (HI-ARMS) programs, sponsored by the DoD HPC Modernization Office, have the goal of developing production-quality simulation software for use by the acquisition community to assess the flight characteristics of both fielded vehicles and those in early conceptual design stages. One of the software products being developed within these programs is the rotary-wing simulation code called Helios.2–4 Rotorcraft computations are inherently multidisciplinary, requiring the solution of moving-body aerodynamics coupled

∗ Research Scientist, Science and Technology Corp.
† Aerospace Engineer, AMRDEC, AIAA Member
‡ President and Director, ParaTools Inc.
§ CREATE-AV Build Master
¶ Aerospace Engineer, AMRDEC, AIAA Member

1 of 14
American Institute of Aeronautics and Astronautics

This material is declared a work of the U.S. Government and is not subject to copyright in the United States.

with structural dynamics for rotorcraft blade deformations, vehicle flight dynamics, and controls. Helios integrates a collection of separate physics packages that model the aerodynamics, structural dynamics, and vehicle flight controls through a light-weight and flexible Python-based infrastructure. Helios also uses an innovative dual-mesh paradigm that utilizes unstructured meshes in the near-body for ease of mesh generation and Cartesian meshes in the off-body for better accuracy and efficiency. Version 1.0 of the software, called Whitney, has the capability to perform isolated rotor simulations in both hover and forward-flight without or with structural dynamics coupling.2 Following extensive validation by the Helios team, the software underwent product acceptance testing by the CREATE-AV Quality Assurance team. Subsequently, Helios was beta-released to a select group of government and industry users in early 2010. Hands-on training sessions were conducted and continuous user support was provided to help the users with their issues. This continuous engagement with the user community helped the development team to learn about the bugs early and provide appropriate fixes via updated releases. The second version of Helios, called Shasta, introduces additional capabilities such as off-body adaptive mesh refinement (AMR) for more accurate resolution of the rotor-tip vortices and the ability to perform combined rotor and fuselage simulations. Shasta is currently undergoing internal validation and testing and is expected to be beta-released in March 2011. A more complete description of the version 2.0 capabilities is given by Sankaran et al.5 Use of rigorous software development standards with extensive verification and validation is considered the best practice to ensure software quality. To date, use of these practices by the scientific community has been limited. 
Reasons include a small specialized user base, often in conjunction with changing requirements, and challenges particular to the scientific community—parallel computing, performance, and numerical accuracy in particular. As computer speed grows and design cycle times quicken, there is increasing reliance on predictive scientific software in the design process.6, 7 It is important that such scientific software adopt software engineering practices routinely used by successful Information Technology (IT) software8 to improve the reliability, ease of use and maintainability of these complicated software packages. An overview of the collaboration tools and services available to the development teams under the CREATE program is outlined by Atwood et al.9 The focus of the current paper is on Helios development practices. Specifically, we describe the modular development of the software, standardization of the build environment, continuous integration testing paradigms and performance analyses. Helios employs a Python-based infrastructure to combine a variety of multi-disciplinary component codes, by managing their sequential execution and the data transfer between them.3, 4 The Python-based integration facilitates modularity and extensibility, which are important attributes of a modern computational platform. Interfaces are constructed in the component’s native Fortran/C/C++ and translated into Python for exchange with other modules. By maintaining a well-defined data layout at the python level, all modules can access and share a common set of data with any datatype translations occurring at the interface level. This readily allows substitution of alternative modules, for instance, to replace an existing model with one that shows better scalability or higher fidelity, or the addition of new modules, such as adding new physics or simulation capabilities. Usability is a further challenge that is key to ensuring end-user acceptance of such complex software packages. 
Helios includes a Graphical User Interface (GUI) to assist the end-user with the set-up of run-time inputs. The Helios-GUI is based on the wxPython graphics engine developed by the CREATE-AV Kestrel10 team, with appropriate template and configuration files to meet Helios requirements. At present, the GUI sets up all run-time inputs on a local server before the case is packaged and sent to a remote cluster for execution. Future enhancements will look to merge these two steps and enable job submission directly from the user interface. Code development is done within a continuous integration context, which comprises a source code repository and a fully automated build system. While there are many options for each of these systems, Helios uses svn for source code maintenance11 and the Hudson software22 for continuous integration, with make as the build system. The use of multiple languages increases the complexity of the build, and standardization across different platforms and compilers becomes difficult. To minimize variability across different computer installations, Helios also employs a common run-time environment that packages all the supporting software (besides the compilers themselves), such as Python, wxPython, numpy, matplotlib, etc. This package is called ptoolsrte (ParaTools Run Time Environment)12 and is deployed across the major DSRCs and commodity clusters as needed. A suite of regression tests is deployed on any new installation to systematically verify the fidelity of the installation against established results. In addition, the continuous integration system is used to exercise a subset of the regression tests on


a regular basis to verify the integrity of the code at all times. In a large multi-component platform such as Helios, poor performance or scalability of one of the components can seriously impede overall performance. Performance profiling tools are therefore key to the detection and amelioration of such problems. While a variety of such tools are available in the software community, the use of multiple languages makes standardization difficult. For this reason, Helios uses the Python-based TAU performance tools, which use a mix of build-time and compile-time directives to access memory and CPU utilization and make them available at the infrastructure level.13, 14 In addition, TAU can be used to investigate scalability performance on different cluster installations.15 Used in concert with the regression tests mentioned earlier, this can help quickly identify potential bottlenecks and suggest ways to improve performance. The outline of the paper is as follows. Section II gives more details of the Helios software, the development model, and the graphical user interface. In Section III, we discuss the build requirements, distribution, and tools for the multi-language Helios software and present our efforts on Continuous Integration (CI) to perform automatic builds and tests. Section IV presents the approach followed for regression testing at the component and integrated software levels, along with some preliminary results from code coverage analysis. In Section V, we present the approach for performance profiling and discuss scalability analyses. Concluding remarks are provided in the final section.

II. Helios Development

II.A. Helios Components

Helios employs an innovative dual-mesh paradigm that utilizes unstructured meshes in the near-body region for ease of mesh generation and Cartesian meshes in the off-body region for better accuracy and efficiency.3, 4 Figure 1 shows an example of such a system. The two mesh types overlap each other, with data exchange between the two systems being managed by a domain connectivity formulation. The unstructured meshes are body-conforming and comprise a mix of tetrahedra, prisms, and pyramids. The Cartesian meshes are managed by a block-structured mesh system that has the ability to conform to the geometry and solution features. Distinct solver modules are used for each mesh type: the unstructured near-body solver is the NSU3D code,16 while the adaptive-Cartesian off-body solver is called SAMARC, a combination of the SAMRAI meshing infrastructure17 and the ARC3DC code.18 The interpolation of fringe data from one mesh

Figure 1. Dual mesh CFD approach used by Helios.

system to the other is managed by a domain connectivity module called PUNDIT.19 PUNDIT utilizes implicit hole-cutting and is completely automated and scalable, which are especially important for dynamic mesh problems, wherein the near-body unstructured mesh may move with the body (for instance, in the case of


the rotor meshes), while the background Cartesian mesh is stationary but may change due to adaptation to geometry and/or solution features. Besides aerodynamics, rotorcraft simulations require coupling with structural dynamics and trim controls. Structural dynamics and trim are provided by the comprehensive analysis code RCAS,20 or optionally by CAMRAD.21 Transfer of fluid dynamic forces from the 3D blade surface to the 1D beam structural models is handled by the Rotor Fluid Structure Interface (RFSI) module. Surface deformations and rotor rotation are used to move and deform the appropriate near-body meshes by a Mesh Motion and Deformation (MMD) module. Finally, the necessary coordinate transformations to aid these force and motion transfers are handled by a Fluid and Flight Dynamics Interface (FFDI) module. Additional details of these components are given by Sankaran et al.5 All the components are interfaced together using a flexible and light-weight Python-based infrastructure called the Software Integration Framework (SIF). Figure 2 shows a schematic of SIF. Well-defined Application Programming Interfaces (APIs) are established to define the method calls and data inputs and outputs related to each component. SIF itself is a small set of Python scripts that essentially function as the main program in a conventional monolithic code. It embodies the time-integration loop controller that schedules the appropriate method calls, such as initialize, readMesh, runSubStep, and so on. The data themselves are packaged into Python dictionaries and transferred between components as needed. In order to preserve modularity, all data transfer between components occurs through SIF, and no component is allowed to “talk” directly to another component. The parallel execution of SIF is accomplished using pyMPI.
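The controller pattern just described can be illustrated with a minimal sketch. The class and solver names below are purely illustrative stand-ins, not the actual SIF source; only the method names initialize, readMesh, and runSubStep and the dictionary-mediated data exchange are taken from the text.

```python
class Component:
    """Base interface a SIF-managed component is expected to implement."""
    def initialize(self):
        pass
    def readMesh(self):
        pass
    def runSubStep(self, shared):
        """Consume entries of the shared dictionary; return new entries."""
        return {}

class NearBodySolver(Component):
    def runSubStep(self, shared):
        # Stand-in for a near-body solver update producing surface forces.
        return {"nearBodyForces": [1.0, 2.0, 3.0]}

class OffBodySolver(Component):
    def runSubStep(self, shared):
        # Reads only what the controller handed over, never another component.
        n = len(shared.get("nearBodyForces", []))
        return {"offBodyUpdated": n}

def run(components, nsteps):
    """Time-integration loop controller: schedule calls, route all data."""
    shared = {}
    for c in components:
        c.initialize()
        c.readMesh()
    for step in range(nsteps):
        for c in components:
            # All transfer flows through this dictionary; components
            # never talk to each other directly.
            shared.update(c.runSubStep(shared))
    return shared
```

Because the controller owns the shared dictionary, swapping one solver for another with the same interface requires no change to the remaining components, which is exactly the modularity argument the text makes.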

Figure 2. Python-based infrastructure in Helios.

Preservation of modularity is a key element of the Helios development process. Each developer is the owner of one or more components and is responsible for ensuring that the component adheres to a set of defined software practices. The component developer works with an interface code template and a set of interface data structures, both of which are unique to the component. The interface code template contains all the subroutines that SIF expects from that component. The developer introduces the specific calls to the particular component code from within this template. Further, the developer provides the required data by defining the appropriate data structures. Besides these two elements, there is no other specific dependence of each component code on the other components. Finally, the component developer is also responsible for providing the necessary theory and user documentation; unit, validation, and regression tests; and the template files for plugging the component user-inputs into the Helios-GUI. A subversion repository for each Helios module/component is maintained by the developer.11 Stable versions of the module are released by the developers to be integrated into the main Helios software. A separate svn repository is maintained for the entire Helios package. Specialized scripts are used to check code out of the developer module repository into the Helios repository, automatically recording the revision number of each module. In this way, clear traceability of each Helios module to its development repository is maintained. Different versions of the modules and of Helios itself are maintained in the svn repositories using tags, making it possible to roll back changes to previously released versions for interim updates.
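Since every component must supply the full set of subroutines its interface template defines, integration can be guarded by a simple conformance check. The sketch below is hypothetical (the real template's method list is not reproduced here); it merely shows the kind of check a developer might run before releasing a stable version.

```python
# Hypothetical list of methods an interface template might demand.
REQUIRED_METHODS = ["initialize", "readMesh", "runSubStep", "finalize"]

def missing_methods(component):
    """Return the template methods the component fails to provide."""
    return [m for m in REQUIRED_METHODS
            if not callable(getattr(component, m, None))]

class GoodComponent:
    def initialize(self): pass
    def readMesh(self): pass
    def runSubStep(self): pass
    def finalize(self): pass

class IncompleteComponent:
    # Deliberately omits readMesh, runSubStep, and finalize.
    def initialize(self): pass
```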


II.B. Helios User Interface

The Helios platform includes a Graphical User Interface (GUI) to assist the end-user with the set-up of run-time inputs. The Shasta version supports a number of use-cases based upon whether the configuration involves isolated components (such as a fuselage or rotor), a full configuration (e.g., fuselage and rotor(s)), or other interacting components (e.g., aircraft and store). Figure 3 shows a graphical representation of the use-case selection process. The procedure works in a hierarchical fashion, starting with the identification of the physical component, e.g., fuselage or rotor. The next stage involves the selection of the physical problem; e.g., in the case of the rotor, the user can select hover, forward flight, or prescribed maneuver. The final stage involves the selection of the solution procedure; e.g., for hover, the choices are a rotational frame with a purely unstructured mesh, a rotational frame with a dual unstructured-Cartesian mesh, or an inertial formulation with a dual mesh.

Figure 3. Hierarchical selection in Helios: Flow chart of supported use cases for one configuration.

The corresponding selection in the Helios-GUI is shown in Fig. 4, which shows the opening page of the GUI. The user walks through the hierarchy and selects the appropriate use-case of interest. It should be pointed out that the GUI engine is based on wxPython and is developed by CREATE-AV’s Kestrel product group.10 No source code modifications are necessary to accommodate the Helios product needs; customizations for specific use-cases, components, and inputs are done entirely by including appropriate configuration and template files. Importantly, the GUI development is testament to a modular and extensible development approach that is common to the different CREATE-AV products. Based on the user selection, the GUI inputs are organized so that the user has to enter only the information required by the components participating in the desired solution scenario. An example pane from the GUI is shown in Figure 5. In this example, the visible user input panes correspond to mesh-processing, SIF, NSU3D, SAMARC, and PUNDIT. Input panes for the structural dynamics and trim or other components do not appear, since the use-case in question does not involve fluid-structure coupling. The GUI saves all user inputs into an XML file, which can be re-read and adjusted at any time. In addition, the GUI writes out ASCII input files for each code module; since many of the components used in Helios are legacy codes, they read their inputs through legacy input decks. Upon save, the GUI organizes the inputs in the format required by Helios. This basically involves creating sub-directories in which each component reads and writes its input/output data. Subsequently, the run-time directory is tarred up and transferred to a remote cluster for actual job execution.
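The save step described above can be sketched in a few lines of Python. The file names, XML layout, and deck format below are assumptions for illustration only, not Helios's actual formats: one XML file holds all user inputs, each component gets its own subdirectory with an ASCII input deck, and the run directory is tarred for transfer.

```python
import os
import tarfile
import xml.etree.ElementTree as ET

def save_case(rundir, inputs):
    """Write a case directory from {component: {param: value}} inputs."""
    os.makedirs(rundir, exist_ok=True)
    # Single XML file holding all user inputs, re-readable by the GUI.
    root = ET.Element("case")
    for comp, params in inputs.items():
        node = ET.SubElement(root, "component", name=comp)
        for key, val in params.items():
            ET.SubElement(node, "param", name=key).text = str(val)
    ET.ElementTree(root).write(os.path.join(rundir, "case.xml"))
    # One subdirectory and legacy-style ASCII deck per component.
    for comp, params in inputs.items():
        cdir = os.path.join(rundir, comp)
        os.makedirs(cdir, exist_ok=True)
        with open(os.path.join(cdir, "input.dat"), "w") as f:
            for key, val in params.items():
                f.write(f"{key} = {val}\n")
    # Package the run directory for transfer to the remote cluster.
    archive = rundir + ".tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(rundir, arcname=os.path.basename(rundir))
    return archive
```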


Figure 4. Use case selection process through Helios GUI.

Figure 5. Helios run time setup GUI panel.


III. Run Time Environment and Build Distribution

III.A. Run Time Environment

Helios modules are written in a variety of languages. For example, the near-body unstructured solver is written in FORTRAN-77 and FORTRAN-90, the off-body solver is written in C++, SIF is written in Python, and the GUI is based on wxPython. Moreover, the Helios installation relies upon a number of other packages and tools such as HDF5, numpy, swig, matplotlib, and so on. Requiring the user to install all of these packages as a pre-requisite to installing Helios is not only tedious and time consuming, but it also presents difficulties from the point of view of support and maintenance. For example, a cluster that uses a different version of Python or one of the other tools may very well introduce issues that can be difficult to trace and fix. To tackle such issues, Helios uses a customized build and run-time environment called ptoolsrte, developed by ParaTools, Inc.13 This package bundles all the software that Helios relies on—viz., Python, numpy, swig, pyMPI, wxPython, matplotlib, etc. The package is a hybrid source and binary distribution that provides most of the tools pre-built in binary form. Those tools that must be compiled on the target platform are provided in source form and built on the target platform during the configuration process. Thus, the user first downloads, installs, and tests the ptoolsrte package on their system, and then configures Helios to use this environment during the build. This process greatly streamlines the installation, since users do not need to individually configure their systems with the requisite tools and the development team is assured the supporting software environment is installed properly at the users’ site.

III.B. Build Distribution

CREATE Air Vehicles distributes Helios builds to users as pre-compiled binaries. A single executable file extracts a script which installs the Helios build on a user’s system. Redistributable runtime libraries for the Intel and Portland compilers are included as part of the installation in case the compilers are unavailable. Two types of builds are produced: a fully-functional build with all components bundled, and a GUI-only version. For full Helios builds, a suitable MPI distribution, OpenMPI, is packaged with the Helios binary so it may be used without additional software on an end-user workstation. Because Helios is linked dynamically with the OpenMPI libraries, compatible existing versions of OpenMPI on HPC machines can be used in place of the version packaged with Helios. This is especially useful when OpenMPI has already been compiled on a cluster for specialized hardware support and optimized performance. OpenMPI utilizes a modular component architecture (MCA) loaded at runtime, discovering available communication devices compiled as shared libraries. This allows application binary interface (ABI)-compatible versions of OpenMPI to run a single build of Helios. When using other MPI distributions, a more careful analysis of platform Infiniband drivers and optimizations is required in order to provide binaries compatible with the end-user’s system. GUI-only packages are also available. These packages contain no parallel code and are built for various Linux operating systems, with the intent that the user later copy completed input files and partitioned grids to another cluster with the full Helios build. The GUI is bundled together with a collection of libraries provided through ptoolsrte. This collection satisfies all dependencies for Helios, including windowing libraries.

III.C. Continuous Integration (CI)

In software engineering, continuous integration is used to continuously apply quality control by executing small automated tests. This frequent testing integrates quality control with development and helps to detect errors early, maintaining the integrity of the code at all times. The most popular tools are Continuum, CruiseControl, and Hudson. All are free, open-source tools that support a variety of Java-based build tools such as Maven and Ant. All of these tools also have shell support, which makes them very flexible: users can employ any build system, including make or cmake, the build systems most widely used by the scientific community. After a preliminary analysis of these tools, we decided to use Hudson22 due to its ease of use, support for distributed builds, and plugin support that allows the use of a variety of tools developed by its large and growing user community. Hudson installation requires only the execution of a Java command on the downloaded jar file. Glover23 provides a quick-start guide for continuous integration using Hudson. Two basic requirements for a continuous integration system are a source code repository and a fully automated build system. Hudson can be configured to automatically check out the source code from the repository and build as required. Once the code is built, a series of tests are automatically triggered to

make sure that the new build behaves as expected. Hudson can be configured to trigger the build as often as required and needs minimal intervention from the user after configuration. Hudson also has the ability to support multiple build machines (slaves) from the main installation (master). The master distributes jobs to the slave nodes based on each job’s configuration. Finally, Hudson can be configured to send email messages about the success or failure of the builds and tests. See Fig. 6 for a graphical view of Hudson’s screen report.
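A CI job of this kind ultimately shells out to a build-and-test driver whose exit status determines the pass/fail email. The sketch below is a hedged illustration of such a driver (the command lists are placeholders, not the actual Helios build commands): run each step in order and stop at the first failure.

```python
import subprocess

def ci_run(commands):
    """Run build/test commands in order; stop at the first failure and
    return a status a CI server can turn into a pass/fail report."""
    for cmd in commands:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            # Surface the failing command so the report is actionable.
            return {"status": "FAILURE", "failed": cmd}
    return {"status": "SUCCESS", "failed": None}
```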

Figure 6. Hudson report of the build and regression tests.

IV. Regression Tests and Code Coverage

IV.A. Regression Tests

Regression tests ensure that routine code updates do not inadvertently introduce bugs or otherwise break the existing functionality of the code. Such tests are exercised both at the component level and at the Helios product level. At the component level, the tests take several forms: unit, verification, and validation tests. Unit tests exercise functionality at the subroutine level. In Helios, these tests are usually introduced to test leaf (or terminal) subroutines; they typically involve providing a set of known inputs to the subroutine and checking the outputs against expected values. Verification tests are more mathematically based and check certain functionalities against analytically established behavior. These tests typically involve collections of subroutines and, in some cases, may even be administered at the component level. An example of a verification test in CFD is the assessment of the order of accuracy of residual computations using exact solutions (or, equivalently, using the method of manufactured solutions). Systematic verification tests are not used in the legacy components but are being developed as part of all new component development (see, for example, Katz and Sankaran24). Finally, validation tests exercise the whole module or component using standard test cases with well-established experimental measurements or comparator code results. Validation testing is also an important part of product acceptance testing for the Helios product as a whole. Clearly, the above tests must be performed as part of the development process, both at the individual component level and at the product level. The role of regression tests is to capture salient elements of these tests so that they can be repeated on demand to verify that the code is performing as expected following routine updates to the software package.
To facilitate such regression testing, the unit, verification, and validation tests are fitted with a test harness through standardized scripts. The scripts automatically point to the desired grids and inputs and execute the computation for a certain number of iterations. At the end of the computations, key results are compared with “gold” file results that are archived along with the test cases. Figure 7a shows the results from a particular regression test case in Helios. The Python script uses the numpy package to compute the average and maximum reported difference in the residual and force files. The differences in this case are observed to be zero, indicating perfect agreement between the expected


and computed results. In addition to numerical differences, the test script also generates a graphical plot that compares the computed and expected results, as shown in Fig. 7b.
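The comparison step can be sketched with numpy as the text describes. The function name, tolerance, and flat-array data layout are illustrative assumptions; the real harness reads residual and force files, but the average/maximum difference computation is the same.

```python
import numpy as np

def compare_to_gold(computed, gold, tol=1e-12):
    """Report average and maximum absolute difference against the
    archived "gold" result, and whether the test passes a tolerance."""
    computed = np.asarray(computed, dtype=float)
    gold = np.asarray(gold, dtype=float)
    diff = np.abs(computed - gold)
    return {
        "avg_diff": float(diff.mean()),
        "max_diff": float(diff.max()),
        "pass": bool(diff.max() <= tol),
    }
```

A zero maximum difference corresponds to the "perfect agreement" outcome shown in Fig. 7a; a nonzero difference above the tolerance is what would be flagged and emailed to the development team.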

Figure 7. (a) Regression test output comparing the computed and expected results; (b) difference in normal load computation between the computed and “gold” results.

In Helios, several regression tests are available that exercise different functionalities and capabilities of the code. In each case, four levels of regression tests are available based on the time it takes to execute the run. Level I tests run in a few minutes and typically involve running 5 iterations. Level II tests execute in about 30 minutes and run about 100 steps. Level III tests execute in two to four hours and run about 1000 iterations. Level IV tests are the most complete and involve a fully converged solution that may take several hours of CPU time. Of these, Level I tests are the best suited for use in a continuous integration process. Numerical differences larger than a specified tolerance can be automatically flagged and reported to the development team via email. In addition, all levels of tests are packaged with the Helios build and are available to users to independently test and verify that the build is capable of accurately reproducing known results.

IV.B. Code Coverage

Code coverage is a measure used in software testing to identify the areas of the source code that are tested (and those that are not) by the regression test suites. This helps the developers design tests to cover the areas of the code that are not exercised. Our initial attempts to assess code coverage have utilized the Intel® code coverage tool, which comes as part of the Intel compiler suite.25 The tool is easy to use and flexible enough to allow the user to choose certain components or the entire code for coverage analysis. Coverage testing is turned on using a compiler flag; running the regression test suites with this version of the executable automatically generates the code coverage report. Figure 8 shows a top-level summary of overall coverage for the entire code. This particular test case uses a prescribed blade motion file and performs the analysis to predict the airloads on the rotor blades. The near-body, off-body, domain connectivity, mesh motion, and fluid structure interaction modules are used for this case. The summary includes the number of files that are covered by this test case as well as those that are not covered. Clicking on a file name opens up the source code, highlighted to show which portions are covered by the test (see Fig. 9). We note that the current preliminary coverage estimates for Helios are about 50% for files and about 33% for functions. While these numbers are not high, they are still encouraging given that Helios utilizes a number of legacy codes. Detailed investigation of the code coverage report will be done to identify parts of the code that are never used and to develop tests to exercise those parts that are applicable. Future native


Figure 8. Code coverage report using Intel compiler.

Figure 9. Source view of sample code.


module development and associated regression tests will aim to increase the code coverage percentage as much as possible.

V. Performance Profiling

In a multi-component code like Helios, poor performance or scalability of one of the modules can impede the performance of the integrated software. Observing the performance of a multi-language, Python-based code such as Helios requires performance instrumentation at multiple layers: the underlying communication substrate is MPI, while the CFD packages are written in a combination of Fortran, C++, C, and Python. The pyMPI package from ptoolsrte is used to launch the application. To understand the performance characteristics of the various modules in Helios, and the performance of their integrated execution, we use the TAU performance system.13 TAU provides users with robust instrumentation, measurement, and analysis capabilities at multiple levels for observing and evaluating the performance of HPC applications.12 It can instrument application source code automatically at the level of routines and outer loops. It captures both time and hardware counter data, generates trace files in several trace formats, presents the profile information in interactive displays, and stores experiment results in a performance database. A recent improvement to TAU also allows us to accurately measure the extent and volume of I/O and memory allocation, de-allocation, and memory leaks in an un-instrumented program using run-time interposition of POSIX I/O libraries via a tool called tau_exec.14 TAU is freely available for download from the TAU website (http://tau.uoregon.edu/) under a BSD-style license. Figure 10 shows the performance traces for a CFD application run on ARL’s mjm system, generated using TAU’s paraprof tools. This run computes the hover conditions about a quarter-scale V-22 rotor (experimental model referred to as TRAM) on 16 processors, using NSU3D for the near-body solver and SAMARC for the off-body solver. A great deal of useful information can be derived from this simple Python-level instrumentation.
It is clear from the profiles what percentage of time is spent in each CFD module, and also the degree of imbalance in the computation. Considering this level of instrumentation requires no change to the application’s executable, this is a very effective way to report the performance of a Helios application with essentially no intervention to the codes themselves.

Figure 10. Trace analysis performed using TAU's paraprof tool from Python-level instrumentation of Helios modules.
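TAU performs this instrumentation automatically, but the essence of Python-level module timing can be illustrated with a short sketch. The decorator and module names below are hypothetical and are not TAU's API; they simply mimic how per-module wall-clock percentages, like those shown in the profiles above, can be gathered at the Python layer without modifying the underlying Fortran/C++ executables:

```python
import time
from collections import defaultdict

# Accumulated wall-clock time per instrumented module (name -> seconds).
module_times = defaultdict(float)

def profile_module(name):
    """Wrap a component entry point so each call's wall time is recorded."""
    def decorator(func):
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                module_times[name] += time.perf_counter() - start
        return wrapper
    return decorator

# Hypothetical stand-ins for the near-body (NSU3D) and off-body (SAMARC)
# solver calls made by the Python driver each time step.
@profile_module("near_body_solver")
def run_near_body_step():
    time.sleep(0.01)

@profile_module("off_body_solver")
def run_off_body_step():
    time.sleep(0.02)

for _ in range(3):
    run_near_body_step()
    run_off_body_step()

total = sum(module_times.values())
for name, t in sorted(module_times.items()):
    print(f"{name}: {100.0 * t / total:.1f}% of measured time")
```

Because the wrapping happens entirely in the Python driver, the component libraries themselves remain untouched, which is the property that makes this style of profiling attractive for a multi-language code.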

The performance profiling tools described above can isolate where computational time is spent and the relative efficiency of the dominant routines. Developers are additionally interested in the scaling qualities of their algorithms and software. The TAU package includes an option to load performance data from individual cases into a database and to explore the performance of particular routines across cases using perfexplorer. For example, a particular case may be run across a range of processor counts and the performance data for each processor set loaded into the database. The perfexplorer tool can then be used to analyze the scaling trends of particular components, or of routines within a component. This analysis helped us identify a source of scaling inefficiency in Helios components that was traced back to poor performance of the MPI_Allreduce routine.15 Such information is critical for isolating particular routines that create potential scaling inefficiencies. Figure 11 shows Helios strong scaling for the TRAM calculation described previously. In a strong-scaling study the problem size remains fixed while the number of processors is increased. Results are shown for two systems, the mjm Linux cluster at the Army Research Laboratory and the hawk SGI Altix machine at the Air Force Research Laboratory, with the problem scaled from 8 to 128 processors. The scaling qualities are commensurate with those of other CFD codes for problems of this size, indicating that the use of mixed codes with Python does not detrimentally impact scaling.

Figure 11. Strong scaling of Helios for the TRAM hover calculation; (a) mjm Linux cluster, (b) hawk SGI Altix system.
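The strong-scaling metrics that perfexplorer displays can also be computed directly from per-run wall-clock times. The following sketch, with purely hypothetical timings for a fixed-size problem (not measured Helios data), derives speedup and parallel efficiency relative to the smallest processor count:

```python
# Hypothetical wall-clock times (seconds) for a fixed-size strong-scaling
# study at increasing processor counts; the numbers are illustrative only.
timings = {8: 1000.0, 16: 520.0, 32: 280.0, 64: 160.0, 128: 100.0}

base_procs = min(timings)      # baseline run (smallest processor count)
base_time = timings[base_procs]

for procs in sorted(timings):
    speedup = base_time / timings[procs]          # observed speedup vs. baseline
    ideal = procs / base_procs                    # ideal (linear) speedup
    efficiency = speedup / ideal                  # parallel efficiency
    print(f"{procs:4d} procs: speedup {speedup:5.2f}x, efficiency {efficiency:6.1%}")
```

Plotting efficiency against processor count for each component, as perfexplorer does, makes it straightforward to spot a module (or a routine such as MPI_Allreduce) whose efficiency decays faster than the rest of the code.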

VI. Concluding Remarks

This paper presents the software engineering practices followed in the development of Helios, a high-fidelity rotorcraft analysis code. Unlike research codes, Helios is intended to be production-level software that can be used for the next one to two decades and, as such, requires assurances of software quality, accuracy, and performance. The paper details the modular and extensible development approach, which utilizes a light-weight and flexible Python-based infrastructure to combine several candidate component modules. The modules have well-defined APIs consisting of method calls and data specifications, and they can be easily adapted or exchanged within this architecture. In addition to the main solver components, Helios also has a GUI based on the gui-engine developed by the Kestrel product team. The gui-engine is customized for Helios through configuration files and templates and requires no alteration of the source code. Helios builds within a customized run-time environment that is deployed via the ptoolsrte package. This package contains partially pre-built source packages for all support software and tools required for Helios installation. In this way, we ensure that Helios is always installed within a controlled environment, thereby simplifying installation and support of the software. An important aspect of rigorous software quality control is regression testing at both the component and product levels. Component regression test suites are based upon unit, verification, and validation tests; Helios product regression tests are based upon validation tests. A wide range of regression tests are deployed with the Helios installation. Further, a small subset of these tests is also exercised regularly as part of a continuous-integration process using Hudson, so that the fidelity of the software can be continually monitored. Finally, the paper also discusses Python-based performance profiling tools using the TAU package.
These tools allow memory and CPU profiling of Helios components, which developers can use to identify performance bottlenecks and work to ameliorate them. The tools can also be used to monitor the scalability of the software, an important attribute for software deployed on HPC clusters. In summary, we note that the present version of Helios uses a number of legacy components. Future development will involve replacing these legacy elements with newly developed native software modules. The incorporation of extensible software engineering practices such as those outlined in this paper is a necessary step in this software renewal process.

Acknowledgments

Material presented in this paper is a product of the CREATE-AV Element of the Computational Research and Engineering for Acquisition Tools and Environments (CREATE) Program sponsored by the U.S. Department of Defense HPC Modernization Program Office. This work was conducted at the High Performance Computing Institute for Advanced Rotorcraft Modeling and Simulation (HI-ARMS). The authors gratefully acknowledge the contributions of Dr. Jay Sitaraman, Mr. Mark Potsdam, Dr. Anubhav Datta, and Dr. Roger Strawn of the Helios team. We are also grateful to Mr. Todd Tuckey of the CREATE-AV Kestrel team for the development of the GUI engine. The authors would also like to acknowledge Dr. Chris Atwood for his support and guidance with the use of the software engineering practices discussed in this paper.

References

1. Potsdam, M., Yeo, H., and Johnson, W., "Rotor Airload Prediction Using Loose Aerodynamic/Structural Coupling," American Helicopter Society 60th Annual Forum, Baltimore, MD, June 2004.
2. Sankaran, V., Sitaraman, J., Wissink, A., Datta, A., Jayaraman, B., Potsdam, M., Mavriplis, D., Yang, Z., O'Brien, D., Saberi, H., Cheng, R., Hariharan, N., and Strawn, R., "Application of the Helios Computational Platform to Rotorcraft Flowfields," 48th AIAA Aerospace Sciences Meeting Including the New Horizons Forum and Aerospace Exposition, Orlando, FL, January 2010.
3. Wissink, A. M., Sitaraman, J., Sankaran, V., Mavriplis, D. J., and Pulliam, T. H., "A Multi-Code Python-Based Infrastructure for Overset CFD with Adaptive Cartesian Grids," AIAA 2008-0927, 46th AIAA Aerospace Sciences Meeting, Reno, NV, January 2008.
4. Sitaraman, J., Katz, A., Jayaraman, B., Wissink, A., and Sankaran, V., "Evaluation of a Multi-Solver Paradigm for CFD Using Unstructured and Structured Adaptive Cartesian Grids," AIAA 2008-0660, 46th AIAA Aerospace Sciences Meeting, Reno, NV, January 2008.
5. Sankaran, V., Wissink, A., Datta, A., Sitaraman, J., Jayaraman, B., Potsdam, M., Kamkar, S., Katz, A., Mavriplis, D., Saberi, H., Roget, B., and Strawn, R., "Overview of the Helios V2.0 Computational Platform for Rotorcraft Simulations," 49th AIAA Aerospace Sciences Meeting, Orlando, FL, January 2011.
6. Dubey, A., Antypas, K., Ganapathy, M. K., Ried, L. B., Riley, K., Sheeler, D., Siegel, A., and Weide, K., "Extensible Component-Based Architecture for FLASH, a Massively Parallel, Multiphysics Simulation Code," Parallel Computing, Vol. 35, No. 10-11, 2009, pp. 512-522.
7. OpenCFD Limited, "OpenFOAM: The Open Source CFD Toolbox User Guide," Version 1.6, July 2009.
8. Reis, C. R., and de Mattos Fortes, R. P., "An Overview of the Software Engineering Process and Tools in the Mozilla Project," Proceedings of the Open Source Software Development Workshop, 2002, pp. 155-175.
9. Atwood, C. A., Adamec, S. A., Murphy, M. D., Post, D. E., and Blair, L., "Collaborative Software Development of Scalable DoD Computational Engineering," DoD HPCMP User Group Conference, Schaumburg, IL, June 2010.
10. Morton, S., "Rigid Maneuvering and Aeroelastic Results for Kestrel: A CREATE Simulation Tool," AIAA 2010-1233, 48th AIAA Aerospace Sciences Meeting, Orlando, FL, January 2010.
11. http://subversion.tigris.org.
12. Shende, S., "An Infrastructure for Deploying Multi-Language CFD Applications, Final Report," PET CD-KY8-SP1, 2009. https://okc.erdc.hpc.mil.
13. Shende, S., and Malony, A., "The TAU Parallel Performance System," International Journal of High Performance Computing Applications, Vol. 20, No. 2, 2006, pp. 287-311.
14. Shende, S., and Malony, A., "Simplifying Memory, I/O, and Communication Performance Assessment Using TAU," Proceedings of the DoD HPCMP Users Group Conference, 2010.
15. Wissink, A. M., and Shende, S., "Performance Evaluation of the Multi-Language Helios Rotorcraft Simulation Software," Proceedings of the DoD HPC Users Group Conference, Seattle, WA, June 2008.
16. Mavriplis, D. J., and Venkatakrishnan, V., "A Unified Multigrid Solver for the Navier-Stokes Equations on Mixed Element Meshes," International Journal of Computational Fluid Dynamics, Vol. 8, 1997, pp. 247-263.
17. Hornung, R. D., Wissink, A. M., and Kohn, S. R., "Managing Complex Data and Geometry in Parallel Structured AMR Applications," Engineering with Computers, Vol. 22, No. 3-4, December 2006, pp. 181-195. Also see www.llnl.gov/casc/samrai.
18. Wissink, A., Kamkar, S., Pulliam, T., Sitaraman, J., and Sankaran, V., "Cartesian Adaptive Mesh Refinement for Rotorcraft Wake Resolution," AIAA 2010-4554, 28th AIAA Applied Aerodynamics Conference, Chicago, IL, June 2010.
19. Sitaraman, J., Floros, M., Wissink, A. M., and Potsdam, M., "Parallel Unsteady Overset Mesh Methodology for a Multi-Solver Paradigm with Adaptive Cartesian Grids," AIAA 2008-7117, 26th AIAA Applied Aerodynamics Conference, Honolulu, HI, 2008.


20. Saberi, H., Khoshlahjeh, M., Ormiston, R., and Rutkowski, M. J., "Overview of RCAS and Application to Advanced Rotorcraft Problems," Conference on Aeromechanics, San Francisco, CA, January 21-23, 2004.
21. Johnson, W., "Rotorcraft Aerodynamic Models for a Comprehensive Analysis," American Helicopter Society 54th Annual Forum, Washington, D.C., May 1998.
22. http://hudson-ci.org/.
23. Glover, A., "Spot Defects Early with Continuous Integration," http://www.ibm.com/developerworks/java/tutorials/jcq11207/authors.html.
24. Katz, A., and Sankaran, V., "Mesh Quality Effects on the Accuracy of CFD Solutions on Unstructured Meshes," 49th AIAA Aerospace Sciences Meeting, Orlando, FL, January 2011.
25. "Intel Code Coverage Tool, In-Depth," Intel Corporation.

