Parallel and Adaptive Finite Element Techniques for Flow Simulation
J. Stiller, W. Wienken, U. Fladrich, R. Grundmann, and W. E. Nagel
TU Dresden, Institut für Luft- und Raumfahrttechnik (ILR)
TU Dresden, Institut für Strömungsmechanik (ISM)
TU Dresden, Zentrum für Hochleistungsrechnen (ZHR)
D-01062 Dresden, Germany
Summary

We present the software library MG, which provides an interface for the parallel implementation of adaptive flow solvers using unstructured grids. The excellent scalability on distributed memory architectures is demonstrated. Current applications include finite element solvers for incompressible as well as compressible flows. As an example we present large eddy simulations of the turbulent channel flow and of the flow past a square cylinder, using an SUPG FEM for the compressible Navier-Stokes equations.
1 Introduction

The development of accurate, scalable, solution-adaptive solvers is a cornerstone on the way toward the realistic simulation of problems involving a broad spectrum of scales, as is typical for turbulent flows. Since most present (and future) high performance computers rely on distributed memory architectures, it is not surprising that virtually every modern CFD code is parallelized. Prominent examples are the DLR FLOWer and TAU codes [1], NPARC's Wind code [3], and PETSc-FUN3D [5]. Moreover, most commercial CFD codes are now parallelized as well. The integration of adaptive and parallelization techniques, however, involves a new dimension of algorithmic complexity. Not only are more flexible and intricate data structures required; since adaptation changes the local grid density in an unpredictable manner, it also creates the need for dynamic load balancing. So far, only a few research projects have addressed the scalability of parallel adaptive solvers using three-dimensional unstructured grids [2, 4, 11, 14]. The MG library [14] developed by the authors aims to provide a lightweight but highly scalable interface for implementing unstructured flow solvers. In Section 2 we briefly discuss the features and the design of MG and analyze its scalability on parallel computer systems in more detail. In Section 3 we present a finite element Navier-Stokes solver based on MG and preliminary results of its application to large-eddy simulation (LES) of turbulent flows.
2 The MG Library

2.1 Features

The objective of developing MG is to provide a software library which combines the following features within one package:

– data structures and procedures for handling unstructured grids
– stable and fast methods for grid adaptation
– transfer operators for parallel multigrid solvers
– dynamic partitioning and load balancing of multilevel grids
– support for high-order (spectral element) methods
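How these building blocks typically interact can be illustrated by the following schematic adapt/partition cycle. All data structures and names in this sketch are hypothetical toy stand-ins (1D intervals instead of tetrahedral grids, round-robin dealing instead of graph partitioning); it is not the MG interface, only a minimal illustration of the error-estimate/refine/rebalance loop.

# Schematic adapt/partition cycle suggested by the feature list above. The data
# structures and functions are toy stand-ins, not the MG interface.

def estimate_error(cells):
    # toy indicator: larger cells -> larger error
    return [x1 - x0 for (x0, x1) in cells]

def refine(cells, marked):
    new_cells = []
    for i, (x0, x1) in enumerate(cells):
        if i in marked:                       # bisect marked cells
            xm = 0.5 * (x0 + x1)
            new_cells += [(x0, xm), (xm, x1)]
        else:
            new_cells.append((x0, x1))
    return new_cells

def rebalance(cells, n_ranks):
    # toy "dynamic load balancing": deal cells out evenly to the ranks
    return {r: cells[r::n_ranks] for r in range(n_ranks)}

cells = [(i / 4.0, (i + 1) / 4.0) for i in range(4)]
for _ in range(3):                            # adapt / balance cycle
    err = estimate_error(cells)
    marked = {i for i, e in enumerate(err) if e > 0.2}
    if not marked:
        break
    cells = refine(cells, marked)
    parts = rebalance(cells, n_ranks=4)
    print(len(cells), [len(p) for p in parts.values()])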
The current implementation is restricted to tetrahedral cells and does not include grid generation and visualization facilities. On the other hand, it offers fully integrated and very efficient tools for parallelization, grid adaptation and multigrid methods. Furthermore, the recently added spectral element library provides the basis for high-order methods that can be applied to direct numerical simulation (DNS) and large-eddy simulation (LES) of complex turbulent flows.

2.2 Scalability

The scalability of a flow solver is of crucial importance for large-scale simulations. Especially on massively parallel systems, a high efficiency of all subsystems (grid adaptation and partitioning, implicit solvers and explicit updates) is mandatory. The scalability of parallel grid adaptation has been studied extensively in [14]. Under quite general assumptions, the cost (wall-clock time) of one adaptation cycle can be estimated in terms of the number of grid levels, the grid sizes, and the numbers of coarse and fine grid partitions. Neglecting the first contribution, the resulting parallel efficiency takes the form

E = 1 / (1 + C/M),

where C is a problem-dependent constant and M = N/P denotes the partition size, i.e. the number of grid cells N per partition P. An immediate consequence of this estimate is that the parallel efficiency of a fixed-size problem inevitably breaks down for large numbers of partitions. On the other hand, scalability (in terms of a non-growing or only slowly growing wall-clock time) can be achieved if the ratio C/M remains bounded, i.e. if the number of partitions is chosen such that a constant partition size is maintained. It should also be noted that the actual efficiency is affected by several other factors, including load balance and possible bottlenecks of the implementation and of the hardware platform. The key features of the adaptation and partitioning algorithm provided with MG are as follows:
1. It is recursive, i.e., fine grid partitions result from coarse grid partitions by subdivision (Fig. 1). These are independent of each other and, hence, this process is easily parallelized.
2. It is multi-objective, i.e., on a given grid level the partitioning method considers the impact on each subsequent grid level. For this purpose the multi-constraint version of MeTiS [7] is used. As demonstrated in [13], this approach leads to a well-balanced distribution of all grid levels.
3. All substeps (refinement pattern completion, partitioning, adaptation, and data remapping) are fully parallelized.

Altogether these features allow for almost optimal scalability up to several hundred processors (see Fig. 2(a)). The parallelization of the other solver components is less critical than grid adaptation. As an example, Fig. 2(b) depicts the excellent scalability of the finite element Navier-Stokes solver described in the following section.
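The behaviour predicted by the efficiency estimate above can be checked with a few lines of code. The sketch below evaluates E = 1/(1 + C/M) for a fixed-size problem and for a problem scaled such that the partition size M remains constant; the constant C and the grid sizes are illustrative values only, not measured ones.

# Sketch of the parallel-efficiency estimate E = 1 / (1 + C/M), where M = N/P is
# the partition size and C is a problem-dependent constant (illustrative values).
def efficiency(n_cells, n_procs, C=2.0e3):
    partition_size = n_cells / n_procs
    return 1.0 / (1.0 + C / partition_size)

procs = [16, 64, 256, 1024]
fixed_size = [efficiency(2.5e6, p) for p in procs]      # fixed problem: E decays
scaled = [efficiency(5.0e4 * p, p) for p in procs]      # constant partition size:
print(fixed_size)                                       # E stays constant
print(scaled)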
3 Application to FEM for Compressible Flow
The applications implemented on top of MG include finite element based solvers for advection-diffusion problems, the incompressible and compressible Navier-Stokes equations, as well as magnetohydrodynamic problems at low magnetic Reynolds numbers. In the following we focus on the compressible Navier-Stokes solver and its application to the large-eddy simulation of subsonic flows.

3.1 Numerical Model

The filtered Navier-Stokes equations for a perfect gas can be written as
\partial_t \bar{U} + \partial_{x_j} ( F^a_j + F^{a,sgs}_j + F^d_j + F^{d,sgs}_j ) = 0,

where \bar{U} = ( \bar\rho, \bar\rho \tilde{u}_i, \bar\rho \tilde{E} )^T are the resolved conservative variables and F^a_j, F^d_j are the advective and diffusive fluxes, with superscript sgs denoting the unresolved (subgrid-scale) contribution. The SGS contribution to the diffusive fluxes can be neglected in most situations [17]. The unresolved advective fluxes are modeled using the simplest reliable approach, consisting of the trace-free Smagorinsky model with van Driest damping and the Reynolds analogy for the SGS heat flux. Furthermore, the isotropic part of the SGS stress can be neglected for moderate turbulence Mach numbers [19].

The equations are discretized using the streamline-upwind/Petrov-Galerkin (SUPG) method developed by Hughes and coworkers [12] and further improved by Jansen et al. [6]. The basic idea of SUPG consists of adding a solution-dependent "upstream weight" to the Galerkin test functions. In contrast to artificial viscosity methods, this approach is consistent with the exact filtered equations. Using linear shape functions it leads to a second-order upwind scheme. For time integration of the resulting system of ordinary differential equations we use the standard four-stage Runge-Kutta method, combined with a block Jacobi iteration for resolving the consistent mass matrix.
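As a minimal illustration of the stabilization idea only, not of the compressible Navier-Stokes implementation itself, the following sketch applies SUPG with linear elements to the one-dimensional steady advection-diffusion model problem; with the classical optimal upwind weight the nodal solution becomes (near-)exact. All parameter values are example choices.

import numpy as np

def supg_advection_diffusion(n_el=10, a=1.0, k=0.01):
    """Nodal solution of a u' - k u'' = 0 on (0,1), u(0)=0, u(1)=1,
    with linear finite elements and SUPG stabilization."""
    h = 1.0 / n_el
    n = n_el + 1
    Pe = a * h / (2.0 * k)                                 # element Peclet number
    tau = h / (2.0 * a) * (1.0 / np.tanh(Pe) - 1.0 / Pe)   # optimal upwind weight
    A = np.zeros((n, n))
    # element matrices: Galerkin advection + diffusion + SUPG stabilization term
    K_adv = a * np.array([[-0.5, 0.5], [-0.5, 0.5]])
    K_diff = k / h * np.array([[1.0, -1.0], [-1.0, 1.0]])
    K_supg = tau * a * a / h * np.array([[1.0, -1.0], [-1.0, 1.0]])
    for e in range(n_el):
        idx = [e, e + 1]
        A[np.ix_(idx, idx)] += K_adv + K_diff + K_supg
    b = np.zeros(n)
    # Dirichlet boundary conditions u(0)=0, u(1)=1
    A[0, :] = 0.0; A[0, 0] = 1.0; b[0] = 0.0
    A[-1, :] = 0.0; A[-1, -1] = 1.0; b[-1] = 1.0
    return np.linalg.solve(A, b)

u = supg_advection_diffusion()
x = np.linspace(0.0, 1.0, u.size)
u_exact = (np.exp(x / 0.01) - 1.0) / (np.exp(1.0 / 0.01) - 1.0)
print(np.max(np.abs(u - u_exact)))   # nodal values (near-)exact with the optimal tau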
3.2 Results

The numerical model was implemented on top of MG. Using the tools provided by the library, parallelization and grid adaptation required only a few lines of code. An example that illustrates the capabilities of the solver is given in Fig. 3. It shows a snapshot of the computed flow (Mach number distribution) and the adapted grid in comparison to a Schlieren photograph by A. C. Charters [15]. The computational grid was adapted and redistributed over 16 processors after every tenth time step. Though the Reynolds number was considerably smaller in the computation, the agreement of the resolved flow structures with the experiment is remarkable. A quantitative analysis of the results and of the performance of this application is in progress and will be the subject of a forthcoming paper.

In the remaining part of this section we discuss the application to the large-eddy simulation of the turbulent flow in a plane channel and of the flow past a square cylinder. In these preliminary computational experiments we deliberately did not use the grid adaptation facilities, but focused on the validation of the method itself. A more detailed description and further examples are provided in [18].

Turbulent Channel Flow. The turbulent flow in a plane channel at a Reynolds number based on the shear velocity and the channel half height, and at a prescribed mean Mach number, was considered. To accommodate the relevant structures, the dimensions of the computational domain were chosen accordingly. In the homogeneous streamwise and spanwise directions periodic boundary conditions were imposed. Unfortunately, the available grid generator was not capable of producing an unstructured two-periodic grid. Therefore, an equidistant, quasi-structured tetrahedral grid was used. In the subgrid-scale model the Smagorinsky constant was set to a fixed value and a constant turbulent Prandtl number was assumed. In Fig. 4 the computed mean velocity and velocity fluctuations are compared to the DNS data of Kim et al. [8]. The overestimation of the centerline velocity indicates that the wall-normal mesh spacing is still too large for computing the wall shear stress accurately. Apart from this, the quality of the results matches that of typical second-order finite difference approximations.
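For reference, the subgrid-scale model used above can be sketched in a few lines; the constants Cs, A+ and Pr_t below are common textbook choices inserted for illustration and are not necessarily the values used in the simulations.

import numpy as np

def smagorinsky_van_driest(S, dist_plus, delta, Cs=0.1, A_plus=26.0):
    """Eddy viscosity of the Smagorinsky model with van Driest wall damping.

    S         : resolved strain-rate tensor, shape (3, 3) [1/s]
    dist_plus : wall distance in viscous units y+
    delta     : filter width, e.g. the local mesh size [m]
    Cs, A_plus: model constants (illustrative values)
    """
    D = 1.0 - np.exp(-dist_plus / A_plus)            # van Driest damping factor
    S_dev = S - np.trace(S) / 3.0 * np.eye(3)        # trace-free part of the strain
    S_mag = np.sqrt(2.0 * np.sum(S_dev * S_dev))     # |S| = sqrt(2 S_ij S_ij)
    return (Cs * D * delta) ** 2 * S_mag             # nu_t

def sgs_heat_flux(rho, cp, nu_t, grad_T, Pr_t=0.6):
    """SGS heat flux from the Reynolds analogy: q_j = -rho cp nu_t / Pr_t dT/dx_j."""
    return -rho * cp * nu_t / Pr_t * np.asarray(grad_T)

# example: simple shear du/dy = 100 1/s at y+ = 30 with a 1 mm filter width
S = np.zeros((3, 3)); S[0, 1] = S[1, 0] = 0.5 * 100.0
nu_t = smagorinsky_van_driest(S, dist_plus=30.0, delta=1.0e-3)
print(nu_t, sgs_heat_flux(rho=1.2, cp=1005.0, nu_t=nu_t, grad_T=[0.0, 50.0, 0.0]))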
Flow past a Square Cylinder. The second example is devoted to the flow past a square cylinder at a Reynolds number of 21,400. This flow was studied experimentally by Lyn et al. [10, 9]. Moreover, it was also the subject of various numerical
studies summarized in [16]. For our study, the computational domain was fixed, with the cylinder of side length D placed at a prescribed distance from the inlet. Laminar inflow, an isothermal no-slip cylinder wall, and weakly reflecting outflow conditions were imposed. In the subgrid model a fixed Smagorinsky constant and turbulent Prandtl number were assumed. The computational domain was discretized using an unstructured grid consisting of approximately 268,000 nodes and 1.29 million tetrahedral cells.
The global mesh spacing was refined near the cylinder. Though this grid is rather coarse for a large-eddy simulation at such a Reynolds number, the results are very promising. In Fig. 5(a) the characteristic flow parameters are compared to the experimental data of Lyn & Rodi and to numerical results from [16]. The Strouhal number agrees almost perfectly with the experiment, while the drag coefficient and the average length of the recirculation zone differ by less than ten percent. The standard deviations of the drag and lift coefficients fit well within the range of the available numerical data. The non-vanishing mean lift coefficient indicates, however, that the averaging time was still insufficient to remove the statistical uncertainty. Figures 5(b-g) demonstrate the good agreement of the mean velocity and fluctuations with the experiment at the given position. Finally, Fig. 5(h) shows a typical isobar surface revealing the instantaneous vortex structure.
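The global flow parameters quoted in Fig. 5(a) can be obtained from the force-coefficient histories by straightforward post-processing. The sketch below extracts the Strouhal number from the dominant lift frequency and the mean and standard deviation of drag and lift from a (here synthetic) time signal; it illustrates the evaluation procedure only and is not the authors' actual evaluation script.

import numpy as np

def strouhal_and_stats(t, c_l, c_d, D, u_ref):
    dt = t[1] - t[0]
    spec = np.abs(np.fft.rfft(c_l - np.mean(c_l)))      # lift spectrum
    freq = np.fft.rfftfreq(len(c_l), dt)
    f_shed = freq[np.argmax(spec)]                       # vortex-shedding frequency
    St = f_shed * D / u_ref                              # Strouhal number
    return St, np.mean(c_d), np.std(c_d), np.mean(c_l), np.std(c_l)

# synthetic lift/drag traces: shedding at 13.2 Hz for D = 0.01 m, u_ref = 1.0 m/s
t = np.linspace(0.0, 10.0, 20000)
c_l = 1.2 * np.sin(2.0 * np.pi * 13.2 * t)
c_d = 2.1 + 0.15 * np.sin(2.0 * np.pi * 2.0 * 13.2 * t)  # drag oscillates at twice f
St, cd_mean, cd_std, cl_mean, cl_std = strouhal_and_stats(t, c_l, c_d, D=0.01, u_ref=1.0)
print(St, cd_mean, cd_std, cl_mean, cl_std)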
4 Conclusion
The presented MG library provides a highly scalable interface for parallel and adaptive flow solvers using unstructured grids. A finite element Navier-Stokes solver was implemented on top of MG and successfully applied to perform LES of turbulent flows in a channel and past a square cylinder. Though not a reference method, the considered linear SUPG-FEM appears to be suitable for application-oriented LES.
References

[1] P. Aumann, H. Barnewitz, H. Schwarten, K. Becker, R. Heinrich, B. Roll, M. Galle, N. Kroll, Th. Gerhold, D. Schwamborn, M. Franke: "MEGAFLOW: Parallel complete aircraft CFD". Parallel Comp. 27, 2001, pp. 415-440.
[2] P. Bastian, K. Birken, K. Johannsen, S. Lang, N. Neuss, H. Rentz-Reichert, C. Wieners: "UG – A Flexible Software Toolbox for Solving Partial Differential Equations". Computing and Visualization in Science 1, No. 1, 1997, pp. 27-40.
[3] M.S. Fisher, M. Mani, D. Stookesberry: "Parallel processing with the Wind CFD code at Boeing". Parallel Comp. 27, 2001, pp. 441-456.
[4] J.E. Flaherty, R.M. Loy, C. Özturan, M.S. Shephard, B.K. Szymanski, J.D. Teresco, H.L. Ziantz: "Parallel structure and dynamic load balancing for adaptive finite element computation". Appl. Num. Math. 26, 1998, pp. 241-263.
[5] W.D. Gropp, D.K. Kaushik, D.E. Keyes, B.F. Smith: "High-performance parallel implicit CFD". Parallel Comp. 27, 2001, pp. 441-456.
[6] K.E. Jansen, S.S. Collis, C. Whiting, F. Shakib: "A better consistency for low-order stabilized finite element methods". Comp. Meth. Appl. Mech. Engrg. 174, 1999, pp. 153-170.
[7] G. Karypis, V. Kumar: "Multilevel Algorithms for Multi-Constraint Graph Partitioning". Univ. Minnesota, Dep. Computer Science, TR 98-019, 1998.
[8] J. Kim, P. Moin, R. Moser: "Turbulence Statistics in Fully Developed Channel Flow at Low Reynolds Number". J. Fluid Mech. 177, 1987, pp. 133-166.
[9] D. Lyn, S. Einav, W. Rodi, J. Park: "A Laser-Doppler Velocimetry Study of Ensemble-Averaged Characteristics of the Turbulent Wake of a Square Cylinder". J. Fluid Mech. 304, 1995, pp. 285-319.
[10] D. Lyn, W. Rodi: "The Flapping Shear Layer Formed by Flow Separation from the Forward Corner of a Square Cylinder". J. Fluid Mech. 267, 1994, pp. 353-376.
[11] L. Oliker, R. Biswas: "PLUM: Parallel Load Balancing for Adaptive Unstructured Meshes". NAS Rep. 97-020, NASA Ames Research Center, 1997.
[12] F. Shakib, T.J.R. Hughes, Z. Johan: "A New Finite Element Formulation for Computational Fluid Dynamics: X. The Compressible Euler and Navier-Stokes Equations". Comp. Meth. Appl. Mech. Engrg. 89, 1991, pp. 141-219.
[13] J. Stiller, K. Boryczko, W.E. Nagel: "A New Approach for Parallel Multigrid Adaption". In B. Hendrickson et al. (Eds.): Proc. 9th SIAM Conf. on Parallel Processing for Sci. Comp., SIAM, 1999, ISBN 0-89871-435-4.
[14] J. Stiller, W.E. Nagel, U. Fladrich: "Scalability of Parallel Multigrid Adaption". In E. Dick et al. (Eds.): Multigrid Methods VI. Springer, 2000, pp. 228-234.
[15] M. Van Dyke: "An Album of Fluid Motion". Parabolic Press, 1982.
[16] P. Voke: "Flow Past a Square Cylinder: Test Case LES2". In J. Chollet et al. (Eds.): Direct and Large Eddy Simulation II. Kluwer, 1997.
[17] B. Vreman, B. Geurts, H. Kuerten: "Subgrid-modelling in LES of compressible flow". Appl. Sci. Research 54, 1995, pp. 191-203.
[18] W. Wienken: "Eine Finite-Element-Methode für die Large-Eddy-Simulation und ein darauf basierendes Verfahren zur Bestimmung des Kavitationsbeginns". PhD thesis, TU Dresden, Fakultät Maschinenwesen, Jan 2003.
[19] T.A. Zang, R.B. Dahlburg, J.P. Dahlburg: "Direct and large-eddy simulations of three-dimensional compressible Navier-Stokes turbulence". Phys. Fluids A 4, 1992, pp. 127-140.
Figure 1 Multilevel grid partitioning
Figure 2 Scalability: (a) local grid adaptation on a Cray T3E-1200 (parallel efficiency over the number of processors for grids from 6.2 x 10^5 to 1.6 x 10^8 cells, constant partition size, and theory with C/M = 0.038); (b) compressible Navier-Stokes FEM with explicit Runge-Kutta, 202x50x64 nodes, 3.9 million cells (SGI O3800)

Figure 3 Flow past a sphere: computation (Mach number coloring) and experiment [15]
Figure 4 Turbulent channel flow: (a) mean velocity and (b) velocity fluctuations over y+. Symbols DNS [8], solid lines LES
Figure 5 Flow past square cylinder: (a) comparison of flow parameters (LES, exp. [10], min/max of [16]); (b) reference position; (c) mean velocity in flow direction; (d) mean cross flow velocity; (e) streamwise velocity fluctuations; (f) cross flow velocity fluctuations; (g) Reynolds stress ((b)-(g): exp. [9] and LES over y/D); (h) typical isobar surfaces