Application of a line-implicit scheme on stretched unstructured grids

Peter Eliasson1
FOI, Swedish Defence Research Agency, SE-16490 Stockholm, Sweden

Per Weinerfelt2
Saab Aerosystems, SE-581 88 Linköping, Sweden

and

Jan Nordström3
FOI, Swedish Defence Research Agency, SE-16490 Stockholm, Sweden
Uppsala University, Department of Information Technology, SE-751 05 Uppsala, Sweden
A line-implicit Runge-Kutta time stepping scheme is derived, implemented and applied to fluid flow problems governed by the Navier-Stokes equations on stretched unstructured grids. The flow equations are integrated implicitly in time along structured lines in regions where the grid is stretched, typically in the boundary layer, and explicitly elsewhere. The integration technique is introduced for steady-state problems with the intention of speeding up the rate of convergence. It is extended to unsteady problems by a dual time stepping approach. The paper focuses on the implementation of the line-implicit scheme, starting from an explicit multigrid flow solver, and on its application. Numerical results are presented for test cases in two and three dimensions for inviscid and viscous flow problems. The convergence rates of the line-implicit time integration are compared to purely explicit convergence rates, and the gain is quantified in terms of the reduction in iterations and CPU time. All presented test cases show improved convergence rates. The gain is highest for the three-dimensional test cases, for which reductions of up to 75% of the computing time are obtained.
I. Introduction
Computational fluid dynamics problems in general have widely varying length and time scales. In three-dimensional viscous problems with several millions of grid points, the length scales spanning the computational domain are several orders of magnitude larger than the smallest scales resolved by neighboring points, for example in the boundary layer. To resolve these scales, anisotropic grids with high-aspect-ratio cells are used in boundary layers and possibly in wakes, which results in stream-wise length scales that are several orders of magnitude larger than the wall-normal length scales. It is well known that explicit methods applied to problems with disparate scales may be extremely inefficient, because the time step is restricted by the smallest scale. Implicit methods, on the other hand, can handle these problems efficiently since they have no stability restriction on the time step; the main difficulty with the implicit technique is that a very efficient solver is required. Explicit Runge-Kutta methods in combination with multigrid have been shown to be efficient for inviscid fluid dynamics problems1-3, where the computational grids have moderate stretching and steady-state convergence may be obtained in $O(10^2)$ iterations. For viscous high Reynolds number flows, stretched and highly anisotropic grids are used, and the convergence is degraded by the small scales introduced by the stretching. The number of iterations typically increases to $O(10^3$-$10^4)$, depending on the Reynolds number, leading to a substantially higher computational cost. Although this increased cost may be affordable for steady-state RANS problems, the cost increases by a further order of magnitude for unsteady problems, where a steady-state problem is solved in each time step using the dual time stepping approach4. It is therefore crucial to enhance the solution procedure with more efficient time integration techniques.

There are many time integration techniques available for unstructured grids; an overview is given by Mavriplis5. The approach in this paper is a line-implicit, or semi-implicit, Runge-Kutta time stepping approach6, where the flow equations are solved implicitly along lines in regions where the grid is highly stretched. The explicit Runge-Kutta time integration is retained everywhere else, and the explicit and line-implicit flow equations are accelerated to steady state with agglomeration multigrid. The implicit lines are computed as part of the preprocessing required for the solver. They are also used for directional-coarsening multigrid, which permits faster convergence rates. The line-implicit time integration reduces the stiffness introduced by the stretching of the grid, since the implicit integration removes the restriction of the local time step along these lines and thereby allows much larger time steps. Another advantage of the line-implicit Runge-Kutta technique is that the implicit procedure can be introduced fairly easily into an existing explicit flow solver; in addition, the implicit solution along the lines only requires the inversion of a block tri-diagonal matrix, which can be done by a direct solution. In the following sections, the flow solver Edge is briefly introduced, followed by a description of the numerical solution procedure of the line-implicit approach. Following this, some details about the implementation into Edge are given, after which numerical results for inviscid and viscous flow problems in two and three dimensions are presented.

1 Deputy Research Director, Department of Computational Physics, AIAA Member.
2 PhD., Aeronautical Engineering.
3 Director of Research in Numerical Analysis at FOI and Adjunct Professor in Numerical Analysis at Uppsala University.

1 American Institute of Aeronautics and Astronautics
II. Numerical Approach

A. The Flow Solver Edge
The CFD solver in the present paper is the Edge code (http://www.edge.foi.se/), which is an edge- and node-based Navier-Stokes flow solver applicable to both structured and unstructured grids7-9. Edge is based on a finite volume formulation where a median dual grid forms the control volumes, with the unknowns in the centres. The governing equations are integrated in time to steady state by a multistage Runge-Kutta scheme, accelerated by FAS agglomeration multigrid. A pre-processor creates the dual grid and the edge-based data structure, which is also employed to agglomerate coarser control volumes for the multigrid and to split up the computational domain for parallel calculations10. The Edge solver provides different options, such as different spatial discretization techniques, turbulence models, low-speed preconditioning and high-temperature extensions.

B. Flow equations
The governing flow equations are discretized in space with a conservative finite-volume method. The Euler or Navier-Stokes equations may be written in semi-discrete conservation form as
$$V\frac{\partial q}{\partial t} + R(q) = V\frac{\partial q}{\partial t} + R_E(q) + R_I(q) = 0 \qquad (1)$$
where $q$ is the conservative solution vector, $V$ is the volume matrix, $R(q)$ is the residual vector, and $R_E(q)$ and $R_I(q)$ are the explicit and implicit parts of the residual vector, respectively. The implicit residual vector contains only the fluxes along the line.

C. Time integration
The equations are integrated to a steady state by a semi-implicit m-stage Runge-Kutta scheme
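$$\begin{aligned}
q^0 &= q^n \\
\left(I + \beta_k\frac{\Delta t}{V}\frac{\partial R_I}{\partial q}\right)\left(q^k - q^{k-1}\right) &= q^0 - q^{k-1} - \alpha_k\frac{\Delta t}{V}R\left(q^{k-1}\right), \quad k = 1,\ldots,m \\
q^{n+1} &= q^m
\end{aligned} \qquad (2)$$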
where $\partial R_I/\partial q$ denotes the Jacobian of the implicit residual vector. The coefficients $(\alpha_1,\ldots,\alpha_m)$ are the explicit Runge-Kutta coefficients, where $\alpha_m = 1$ is required for consistency, and $(\beta_1,\ldots,\beta_m)$ are the implicit Runge-Kutta coefficients. If the implicit coefficients are zero, the explicit Runge-Kutta scheme is recovered. They should be chosen large enough to guarantee convergence, but not too large, since large values may reduce the speed of convergence. The coefficients may be chosen so that stability is obtained for all time steps, and it can be shown that

$$\left|\prod_{i=1}^{m}\left(1-\frac{\alpha_i}{\beta_i}\right)\right| < 1$$

is required to guarantee stability for all time steps and also to guarantee damping for infinitely large time steps (L-stability), which is required for practical applications. (For a linear scalar model problem whose residual is treated entirely implicitly, this product is the limit of the amplification factor of Eq. (2) as $\Delta t \to \infty$.) This implies that the implicit part has to be applied in all stages. It can also be shown that a consistent approximation is obtained for all positive values of the implicit coefficients. Since the semi-implicit Runge-Kutta scheme will be used for steady-state calculations only, the ordinary 3-stage explicit coefficients $(\alpha_1,\alpha_2,\alpha_3) = (2/3, 2/3, 1)$, with the stability region in Fig. 1, are used in all calculations in the paper. The implicit coefficients are set to $(\beta_1,\beta_2,\beta_3) = (1,1,1)$, leading to stability for all time steps; these coefficients have been used in all computations in the paper. Although the semi-implicit scheme is stable for infinite time steps in one dimension along the implicit line, the time step will still be limited by the scales in the other directions.

Figure 1. Stability region for the explicit 3-stage scheme.

D. Flux calculations
The discretization in space is based on a finite volume formulation on an unstructured grid, with a central convective operator and added numerical dissipation based on second and fourth difference operators. Consider a control volume $V_0$ for a node with subscript 0. The spatial discretization of Eq. (1) for this node may be written as
$$V_0\frac{\partial q_0}{\partial t} + \sum_{k=1}^{n_0} f_{0k} + \sum_{k=1}^{n_0} g_{0k} - V_0 Q_0 = 0 \qquad (3)$$
where $f_{0k}$ denotes the convective flux between nodes 0 and k, $n_0$ is the number of nodes connected with an edge to node 0, $g_{0k}$ is the viscous flux and $Q_0$ the source terms. The convective flux is computed as

$$f_{0k} = f\left(\frac{q_0 + q_k}{2}\right) - d_{0k} \qquad (4)$$

where $d_{0k}$ is the numerical dissipation, constructed from a blend of first and third differences, where the third differences in turn are constructed from undivided Laplace operators. For more details see Eliasson7. A compact discretization of the normal derivatives in the viscous terms is used11; the remaining tangential parts are constructed from nodal gradients obtained from Green-Gauss integration and added to obtain a full viscous operator. In the implicit treatment of the equations, the left-hand side in Eq. (1) is a block tri-diagonal matrix if one assumes that the flux between two nodes depends on those two nodes only. The block tri-diagonal matrix becomes:
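$$\frac{\partial R_I}{\partial q} =
\begin{pmatrix}
\frac{\partial R_{I,1}}{\partial q_1} & \frac{\partial R_{I,1}}{\partial q_2} & & & \\
\frac{\partial R_{I,2}}{\partial q_1} & \frac{\partial R_{I,2}}{\partial q_2} & \frac{\partial R_{I,2}}{\partial q_3} & & \\
& \ddots & \ddots & \ddots & \\
& & \frac{\partial R_{I,N-1}}{\partial q_{N-2}} & \frac{\partial R_{I,N-1}}{\partial q_{N-1}} & \frac{\partial R_{I,N-1}}{\partial q_N} \\
& & & \frac{\partial R_{I,N}}{\partial q_{N-1}} & \frac{\partial R_{I,N}}{\partial q_N}
\end{pmatrix}
=
\begin{pmatrix}
\frac{\partial f_1}{\partial q_1} + \frac{\partial f_{12}}{\partial q_1} & \frac{\partial f_{12}}{\partial q_2} & & & \\
-\frac{\partial f_{12}}{\partial q_1} & -\frac{\partial f_{12}}{\partial q_2} + \frac{\partial f_{23}}{\partial q_2} & \frac{\partial f_{23}}{\partial q_3} & & \\
& \ddots & \ddots & \ddots & \\
& & -\frac{\partial f_{N-2,N-1}}{\partial q_{N-2}} & -\frac{\partial f_{N-2,N-1}}{\partial q_{N-1}} + \frac{\partial f_{N-1,N}}{\partial q_{N-1}} & \frac{\partial f_{N-1,N}}{\partial q_N} \\
& & & -\frac{\partial f_{N-1,N}}{\partial q_{N-1}} & -\frac{\partial f_{N-1,N}}{\partial q_N} + \frac{\partial f_{N,N+1}}{\partial q_N}
\end{pmatrix} \qquad (5)$$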
where there are N nodes in the line, node 1 is assumed to be located on a boundary, and node N is the last node in the matrix, connected to node N+1, which is treated explicitly. This is by no means a restriction but only an assumption for illustration. The term $\partial f_{k,k+1}/\partial q_k$ denotes the Jacobian of the flux between nodes k and k+1 with respect to the unknown in node k, and $\partial f_1/\partial q_1$ denotes the Jacobian of the boundary flux. The viscous fluxes and source terms have been left out for brevity. The flux Jacobians in Eq. (5) are constructed from the discrete numerical operator in Eqs. (3) and (4) by differentiation of the flux. To avoid the block penta-diagonal matrix that would result from the third differences in the numerical dissipation operator in Eq. (4), the numerical dissipation on the left-hand side is replaced by a simplified numerical dissipation:
$$d^I_{k,k+1} = \varepsilon_2\,\lambda_{k+1/2}\left(q_{k+1} - q_k\right) \qquad (6)$$

where $\lambda_{k+1/2}$ is the spectral radius of the convective operator $\partial f_{k,k+1}/\partial q_k$ and $\varepsilon_2$ is a positive constant. This term guarantees stability and diagonal dominance. The value has been set to $\varepsilon_2 = 0.2$ in all calculations in the paper; too small a value may cause divergence, while too large a value will deteriorate the rate of convergence. The resulting block tri-diagonal matrix is inverted directly with the Thomas algorithm12 without pivoting. Upwind schemes may be used as well, with a corresponding simplification of the numerical dissipation on the left-hand side. Upwind discretization is applied for the turbulence equations in the examples below. The left-hand side operator is, however, only first-order accurate.
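The direct solve mentioned above can be sketched as a block version of the Thomas algorithm, i.e. block Gaussian elimination without pivoting. The Python/NumPy code below is an illustrative sketch, not Edge source: the list-of-blocks layout, the function names block_thomas and dense_from_blocks, and the random test system are assumptions. The block size b would be 5 for the three-dimensional mean flow equations and larger when the turbulence equations are included in the same blocks.

```python
import numpy as np


def block_thomas(lower, diag, upper, rhs):
    """Solve a block tri-diagonal system by block Gaussian elimination (no pivoting).

    Row k reads: lower[k] @ x[k-1] + diag[k] @ x[k] + upper[k] @ x[k+1] = rhs[k],
    with lower[0] and upper[-1] ignored. Blocks are (b, b) arrays, rhs[k] has length b.
    """
    n = len(diag)
    c = [None] * n  # eliminated upper blocks
    d = [None] * n  # eliminated right-hand sides
    for k in range(n):
        m = diag[k]
        r = rhs[k]
        if k > 0:
            # Eliminate the sub-diagonal block using the previous (already reduced) row
            m = m - lower[k] @ c[k - 1]
            r = r - lower[k] @ d[k - 1]
        if k < n - 1:
            c[k] = np.linalg.solve(m, upper[k])
        d[k] = np.linalg.solve(m, r)
    # Back substitution from the last node on the line
    x = [None] * n
    x[n - 1] = d[n - 1]
    for k in range(n - 2, -1, -1):
        x[k] = d[k] - c[k] @ x[k + 1]
    return np.array(x)


def dense_from_blocks(lower, diag, upper):
    """Assemble the full matrix; used here only to check the block solver."""
    n, b = len(diag), diag[0].shape[0]
    a = np.zeros((n * b, n * b))
    for k in range(n):
        s = slice(k * b, (k + 1) * b)
        a[s, s] = diag[k]
        if k > 0:
            a[s, (k - 1) * b:k * b] = lower[k]
        if k < n - 1:
            a[s, (k + 1) * b:(k + 2) * b] = upper[k]
    return a


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, b = 20, 5  # 20 nodes on one line, 5x5 blocks as for the 3D mean flow
    lower = [rng.standard_normal((b, b)) for _ in range(n)]
    upper = [rng.standard_normal((b, b)) for _ in range(n)]
    # Diagonally dominant blocks, as the simplified dissipation term is meant to ensure
    diag = [rng.standard_normal((b, b)) + 20.0 * np.eye(b) for _ in range(n)]
    rhs = [rng.standard_normal(b) for _ in range(n)]
    x = block_thomas(lower, diag, upper, rhs)
    residual = dense_from_blocks(lower, diag, upper) @ x.ravel() - np.concatenate(rhs)
    print("max residual:", np.max(np.abs(residual)))
```

Without pivoting, every eliminated diagonal block must stay non-singular; diagonal dominance of the blocks, which the $\varepsilon_2$ term in Eq. (6) is meant to provide, is the usual safeguard. The cost is linear in the number of nodes on the line and cubic in the block size.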
III. Implementation
The most important steps to be taken in the implementation of a line-implicit time integration approach are described here. The starting point is an explicit multigrid flow solver.
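Because the implementation starts from an explicit solver, it may help to see on a linear scalar model how little Eq. (2) changes an explicit stage. In the Python sketch below, which is an illustration and not Edge code, each Fourier mode is assumed to contribute $z_E = \lambda_E\Delta t$ from the explicit residual and $z_I = \lambda_I\Delta t$ from the implicit line residual, so that only $z_I$ enters the left-hand side factor $1 - \beta_k z_I$. The model, the function names amplification and stiff_limit, and the sample values are assumptions of this sketch; stiff_limit returns the large-time-step limit $\prod_i (1-\alpha_i/\beta_i)$ of the amplification factor.

```python
from math import prod


def amplification(z_e, z_i, alpha=(2 / 3, 2 / 3, 1.0), beta=(1.0, 1.0, 1.0)):
    """Amplification factor q^{n+1}/q^n of the semi-implicit scheme, Eq. (2),
    for the linear model dq/dt = (lam_E + lam_I) q with z_e = lam_E*dt, z_i = lam_I*dt.

    The model residual is R(q) = -V (lam_E + lam_I) q, so the stage right-hand side is
    q^0 - q^{k-1} + alpha_k (z_e + z_i) q^{k-1}, and only the implicit part enters the
    left-hand side factor 1 - beta_k z_i.
    """
    q0 = complex(1.0)
    q = q0
    for a, b in zip(alpha, beta):
        rhs = q0 - q + a * (z_e + z_i) * q
        q = q + rhs / (1.0 - b * z_i)  # scalar version of the stage solve
    return q


def stiff_limit(alpha, beta):
    """Limit of the amplification factor as z_i -> -infinity with z_e fixed."""
    return prod(1.0 - a / b for a, b in zip(alpha, beta))


explicit = dict(beta=(0.0, 0.0, 0.0))
# Explicit 3-stage scheme outside its stability region
print(amplification(-3.0, 0.0, **explicit))
# Mode that is stiff along the line only: strongly damped
print(amplification(0.0, -1e6))
# Mode constant along the line (z_i = 0): identical to the explicit scheme
print(amplification(-3.0, 0.0))
print(stiff_limit((2 / 3, 2 / 3, 1.0), (1.0, 1.0, 1.0)))
```

With the coefficients of Section II.C the last stage has $\alpha_3 = \beta_3 = 1$, so the stiff limit is exactly zero. A mode with $z_I = 0$ reproduces the explicit result, which is the model-problem counterpart of the remark that the time step is still limited by the scales in the other directions.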
A. Creating the implicit lines
The lines are created in a preprocessing step and stored in the flow solver. The algorithm is a graph algorithm based on grouping together neighboring stretched edges6, 13. The lines are created using node and edge data only, i.e. the algorithm is consistent with an edge-based formulation and does not involve element information. The following steps are taken:
1. For each node, construct a list of the edges that originate from the node and compute the edge lengths.
2. For each node, compute the aspect ratio AR by dividing the length of the longest connected edge by that of the shortest connected edge. For each edge connected to the node, compute the aspect ratio ARED by dividing the length of the edge by the minimum length of any edge connected to the node.
3. Sort the nodes with respect to aspect ratio.
4. Mark all nodes as free, i.e. initially they do not belong to any line.
5. Start the iterative line search by choosing the free node with the largest aspect ratio AR > ARmax, where ARmax is a threshold that the largest aspect ratio in the line should exceed. Mark the node as selected.
6. Continue from this node along the free edge with the smallest available ARED to the next free node. This node and edge are marked as selected. The procedure is repeated until no more connected free nodes and edges that satisfy AR > ARmin are found. Since the starting node may be in the interior (e.g. in a wake) and not necessarily on a wall boundary, the search is repeated in the opposite direction from the starting node as well. All selected nodes and edges are marked as a line.
7. Go back to step 5 to generate the next line. Stop when no free node satisfies AR > ARmax.
8. Identify whether the end nodes belong to a boundary (and what type of boundary) or are interior nodes.
9. Remove lines that are too short, i.e. where N < Nmin.
10. For parallel calculations where a partitioning has been made by cutting edges, ensure that lines with cut edges are multiply defined in all partitions involved, to maintain single-partition performance.

The outcome of this algorithm is a set of structured lines, each formed by a sequence of edges. With this procedure, the lines are non-overlapping, and an edge or node can occur in one line only. The parameters used in the algorithm above are ARmax = 100, ARmin = 2.5 and Nmin = 10, i.e. the aspect ratio of the largest edge should exceed ARmax, the smallest aspect ratio should exceed ARmin, and the number of nodes in any line should be at least Nmin. This algorithm can be applied on all multigrid levels.

B. Implementation in the flow solver
The implementation in the flow solver is done by adding one routine in the Runge-Kutta stage loop that inverts the left-hand side matrix and, as a result, modifies the residual. The unknowns are updated in the same way as for the explicit scheme, i.e.
$$q^k = q^0 - \alpha_k\frac{\Delta t}{V}\tilde{R} \qquad (7)$$
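The line search of Section III.A (steps 1-7 and 9) can be sketched on a generic node-edge graph as follows; steps 8 and 10, boundary identification and partition handling, are omitted. The function name build_lines and its inputs are hypothetical, and this is not the Edge pre-processor. The text does not say how a line is kept from turning sideways, for example onto the wall when the opposite-direction search starts at a wall node, so the sketch adds an assumed straightness check based on the sign of the dot product between consecutive edge directions.

```python
import math


def build_lines(coords, edges, ar_max=100.0, ar_min=2.5, n_min=10):
    """Group stretched edges into non-overlapping lines (simplified sketch).

    coords: list of coordinate tuples; edges: list of (a, b) node pairs.
    Returns a list of lines, each an ordered list of node ids.
    """
    n = len(coords)
    nbrs = [[] for _ in range(n)]  # step 1: edges per node, with lengths
    for a, b in edges:
        length = math.dist(coords[a], coords[b])
        nbrs[a].append((b, length))
        nbrs[b].append((a, length))
    min_len = [min(ln for _, ln in nb) for nb in nbrs]
    ar = [max(ln for _, ln in nb) / min_len[i] for i, nb in enumerate(nbrs)]  # step 2
    order = sorted(range(n), key=lambda i: -ar[i])  # step 3
    free = [True] * n  # step 4

    def unit(a, b):
        d = [xb - xa for xa, xb in zip(coords[a], coords[b])]
        norm = math.hypot(*d)
        return [c / norm for c in d]

    def next_node(node, prev_dir, sign):
        # Step 6: free neighbour with AR > ar_min, reached through the edge with smallest ARED.
        # ASSUMPTION (not in the text): sign * dot(prev_dir, new_dir) > 0 keeps the line
        # from turning; sign = -1 reverses the first step for the opposite-direction search.
        best = None
        for nb, length in nbrs[node]:
            if not free[nb] or ar[nb] <= ar_min:
                continue
            d = unit(node, nb)
            if prev_dir is not None and sign * sum(p * q for p, q in zip(prev_dir, d)) <= 0:
                continue
            ared = length / min_len[node]
            if best is None or ared < best[0]:
                best = (ared, nb, d)
        return best

    lines = []
    for start in order:
        if ar[start] <= ar_max:  # step 7: no remaining start candidate
            break
        if not free[start]:
            continue
        free[start] = False  # step 5
        line, first_dir = [start], None
        node, prev, sign = start, None, 1
        while (hit := next_node(node, prev, sign)) is not None:
            _, node, prev = hit
            if first_dir is None:
                first_dir = prev
            free[node] = False
            line.append(node)
        if first_dir is not None:  # repeat in the opposite direction
            node, prev, sign = start, first_dir, -1
            while (hit := next_node(node, prev, sign)) is not None:
                _, node, prev = hit
                sign = 1
                free[node] = False
                line.insert(0, node)
        lines.append(line)
    return [line for line in lines if len(line) >= n_min]  # step 9
```

On a Cartesian grid stretched geometrically away from a wall, this produces one line per wall node, ending where the node aspect ratio drops below ARmin. Since nodes are marked as soon as they join a line, the non-overlap property stated in Section III.A holds by construction.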
By inserting Eq. (7) in Eq. (2) one obtains the following formulation for stage k:
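$$\left(I + \beta_k\frac{\Delta t}{V}\frac{\partial R_I}{\partial q}\right)\left(q^0 - q^{k-1} - \alpha_k\frac{\Delta t}{V}\tilde{R}\right) = q^0 - q^{k-1} - \alpha_k\frac{\Delta t}{V}R\left(q^{k-1}\right), \quad \text{i.e.} \quad A\,x = b \qquad (8)$$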
where $\tilde{R}$ denotes the unknown (modified) residual. Note that the semi-implicit approach only updates the part of the residual vector containing the nodes that are treated implicitly. The explicit update, Eq. (7), can then be used for the entire solution, where $\tilde{R}$ contains the modified residual from Eq. (8) for nodes on the lines and $\tilde{R} = R(q^{k-1})$ for nodes outside the lines. To set up the left-hand side and invert the matrix, the following steps are taken in the flow solver:
1. Loop over all lines.
2. Compute the convective and diffusive flux Jacobians with respect to the primitive variables, which gives sparser matrices. Add the flux Jacobians in Eq. (5) to the matrix, including the contributions from interior as well as boundary fluxes.
3. Multiply by the transformation matrix between primitive and conservative variables.
4. Add the contribution from the simplified numerical dissipation on the left-hand side, Eq. (6).
5. Add contributions from source terms, e.g. from the turbulence equations or from the dual time stepping approach for unsteady problems.
6. Give strong boundary conditions special attention. With a strong boundary condition, the value on the boundary is injected and hence constant, and any derivatives with respect to these values are zero. These values are not unknowns and should in principle be excluded from the matrix. Nevertheless, they are kept in the matrix, since there may be combinations of both strong and weak boundary conditions on a boundary. For weak and strong boundary conditions see Svärd et al.14.
7. Multiply the matrix by $\beta_k\Delta t/V$ and add the identity matrix to obtain the left-hand side $A = I + \beta_k\frac{\Delta t}{V}\frac{\partial R_I}{\partial q}$.
8. Make an LU-factorization of the matrix A, compute the right-hand side in Eq. (8), do the inversion and compute the modified residual to be used in Eq. (7).
9. Repeat this procedure for all lines.

Strong no-slip wall boundary conditions are mostly used for the velocity and for the turbulence variables. For other variables and on other boundaries, weak boundary conditions apply, for which the flux is computed with the specified values and the nodal values on the boundary are unknowns. In the generation of the Jacobian, the exact flux Jacobian of the central convective flux of the 5 (in 3D) mean flow equations is used. In the Jacobian of the viscous terms, the dynamic viscosity and the turbulent viscosity are assumed to be constant. The turbulence equations are treated implicitly as well. In the Jacobian of the turbulent source terms, only the parts with negative real eigenvalues are included, i.e. the turbulent production is excluded. For unsteady calculations based on the dual time stepping approach, an additional source term containing the physical time step is added to the diagonal of the matrix. All governing equations are solved simultaneously, i.e. the blocks in the Jacobian for a calculation based on the k-ω turbulence model are 7×7 matrices.

The local time step needs to be modified as well. Since the line-implicit time integration is unconditionally stable along the line, the time steps have to be increased so that they are no longer restricted by the small scales along the line but by the length scales in the other directions. The increased local time step is what causes the enhanced convergence rate. To illustrate this, consider the simple Cartesian grid in Fig. 2, where the stretching and the implicit treatment of the equations are in the y-direction.

Figure 2. Cartesian grid stretched in y.

With an explicit approach, the local convective time step is computed as
$$\Delta t_{expl} = CFL\frac{V}{\lambda_{expl}} = CFL\frac{\Delta x\,\Delta y}{c\,\Delta y + c\,\Delta x} \approx CFL\frac{\Delta y}{c}, \quad \Delta y \ll \Delta x \qquad (9)$$

where $\lambda_{expl}$ denotes the convective spectral radius of $\partial R/\partial q$, obtained from integration over all control surfaces enclosing a control volume, and $c$ denotes the local maximum characteristic velocity, typically the velocity plus the speed of sound. Hence, on a highly stretched grid, the local time step will be proportional to the smallest scale $\Delta y$. With a semi-implicit approach with line-implicit treatment in the y-direction, the local time step becomes:
$$\Delta t_{semi} = CFL\frac{V}{\lambda_{semi}} = CFL\frac{\Delta x\,\Delta y}{c\,\Delta y} = CFL\frac{\Delta x}{c} \qquad (10)$$
Here $\lambda_{semi}$ denotes the convective spectral radius for the semi-implicit approach, obtained in the same manner as the corresponding explicit spectral radius but with the contributions along the implicit lines left out. The local time step becomes proportional to $\Delta x$, which is much larger on stretched grids. Hence, the speed-up with the semi-implicit approach comes from the larger time steps applied in the regions where the grid is stretched. For large Reynolds numbers the stretching may be of the order $O(10^3$-$10^5)$, and the local time step with the semi-implicit approach increases by the same factor.
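Equations (9) and (10) can be evaluated directly for the Cartesian cell of Fig. 2. The Python sketch below assumes, as the text does, a two-dimensional cell and a single characteristic speed c; the function name local_time_steps and the sample values are illustrative only.

```python
def local_time_steps(dx, dy, c, cfl=1.25):
    """Explicit and semi-implicit convective local time steps, Eqs. (9) and (10),
    for a Cartesian cell of size dx x dy with implicit lines in the y-direction."""
    volume = dx * dy
    lam_expl = c * dy + c * dx  # faces normal to x (area dy) and normal to y (area dx)
    lam_semi = c * dy           # contribution across the implicit y-line faces left out
    return cfl * volume / lam_expl, cfl * volume / lam_semi


# Wall cell with aspect ratio dx/dy = 1000 and characteristic speed c = 1
dt_expl, dt_semi = local_time_steps(dx=1.0, dy=1e-3, c=1.0)
print(dt_expl, dt_semi, dt_semi / dt_expl)
```

For the sample cell the ratio of the two steps is $(\Delta x + \Delta y)/\Delta y = 1001$, essentially the cell aspect ratio, while for an isotropic cell the ratio is only 2.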
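To be able to control how much the local time step is increased, an additional parameter $CFL_{semi} \geq 1$ is defined in the flow solver that modifies the local time step such that

$$\Delta t = \frac{\Delta t_{exp}\,CFL_{semi}}{1 + \frac{\Delta t_{exp}}{\Delta t_{semi}}\left(CFL_{semi} - 1\right)}; \quad \Delta t_{exp} = CFL\frac{V}{\lambda_{exp}}, \quad \Delta t_{semi} = CFL\frac{V}{\lambda_{semi}} \qquad (11)$$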
Note that when $CFL_{semi} = 1$ the explicit time step is obtained, and when $CFL_{semi} \to \infty$ the semi-implicit time step is obtained. A large value of $CFL_{semi}$ is hence required to speed up the rate of convergence.
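Equation (11) can be sketched as a one-line function; the name blended_time_step and the sample values are assumptions for illustration.

```python
def blended_time_step(dt_exp, dt_semi, cfl_semi):
    """Local time step of Eq. (11), controlled by CFL_semi >= 1."""
    if cfl_semi < 1.0:
        raise ValueError("CFL_semi must be at least 1")
    return dt_exp * cfl_semi / (1.0 + dt_exp / dt_semi * (cfl_semi - 1.0))


# Stretched wall cell where the semi-implicit step is 1000 times the explicit one
for cfl_semi in (1.0, 10.0, 1e2, 1e3, 1e4, 1e5):
    print(cfl_semi, blended_time_step(1e-3, 1.0, cfl_semi))
```

Rearranged, Eq. (11) reads $1/\Delta t = (1/CFL_{semi})(1/\Delta t_{exp}) + (1 - 1/CFL_{semi})(1/\Delta t_{semi})$, i.e. it blends the inverse time steps (the spectral radii). Outside the lines, where $\Delta t_{semi} = \Delta t_{exp}$, the step is left unchanged. For the sample cell, $CFL_{semi} = 10^4$ gives about 91% of the semi-implicit step.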
IV. Numerical results
Numerical results are presented in two and three dimensions for steady-state and unsteady, inviscid and viscous calculations. Common to all these calculations is the three-stage explicit and semi-implicit Runge-Kutta scheme with the coefficients given above. A CFL number of $CFL = 1.25$ is used everywhere, and for the semi-implicit calculations $CFL_{semi} = 10^4$ unless otherwise stated. Directional semi-coarsening is used along the lines; typically 4 fine grid cells are fused to obtain one coarser cell. In the region outside the lines, where the grid is isotropic, on average 4 and 8 fine grid cells are fused to obtain one coarse cell in two and three dimensions, respectively. The implicit time integration is only applied on the finest grid and, unless otherwise stated, the no-slip boundary condition for the velocity is satisfied using strong boundary conditions, i.e. the zero wall velocity is injected into the solution.

A. Inviscid channel flow
The first test case is a 2D channel with a bump on the lower wall. Two structured grids are generated. The first is a baseline isotropic grid with 4941 nodes, used for comparisons with explicit calculations. The second grid contains 49141 nodes, with stretching normal to the lower wall and the same stream-wise distribution as the first grid. The maximum aspect ratio is 225. The two grids can be seen in Fig. 3, where the implicit lines are marked on the stretched grid. The lines contain about 120 nodes each. Four multigrid levels with W-cycles are used in the calculations. The incoming flow is subsonic, M∞ = 0.3. The total pressure and temperature are specified on the inlet boundary, and the static pressure on the outlet. The rate of convergence to machine accuracy as a function of the multigrid cycles is displayed in Fig. 4.

Figure 3. Top: isotropic channel grid (4941 nodes). Bottom: stretched grid with implicit lines (49141 nodes).
Six convergence curves are presented, with and without multigrid, for explicit calculations on both grids and line-implicit calculations on the stretched grid. The rate of convergence is, as expected, slow with explicit single-grid calculations on the two grids. There is a remarkable speed-up on the isotropic grid with the explicit scheme and multigrid. The convergence is slower, though, on the stretched grid. The line-implicit time integration without multigrid converges faster than the explicit scheme with multigrid. Approximately the same rate of convergence is obtained with multigrid and line-implicit time integration on the stretched grid as with multigrid and explicit time integration on the isotropic grid. Hence, the deterioration in convergence due to the stretching of the grid is removed by the line-implicit time integration.

Figure 4. Rate of convergence (density residual) for the two channel grids with (Ngrid=4) and without (Ngrid=1) multigrid.
B. Turbulent flow over NACA0012
Figure 5. Computational grid and implicit lines for NACA0012. Normal wall distance to the first interior node $10^{-5}$ chords, with on average 33 nodes per implicit line.

The second test case involves turbulent transonic flow over a NACA0012 airfoil on three successively refined grids. The grids have been generated with an in-house grid generator15,16 and have normal distances from the airfoil to the first interior node of $10^{-5}$, $10^{-6}$ and $10^{-7}$ chord lengths, respectively. The grids have the same number of nodes on the airfoil, about 310, and a varying number of layers of quad cells in the wall-normal direction, on average about 35, 45 and 55 layers. The triangular grid outside these layers is the same for all grids, and the total numbers of nodes are about $51\times10^3$, $54\times10^3$ and $57\times10^3$, respectively. The maximum aspect ratios of the stretching are $1.5\times10^3$, $15\times10^3$ and $150\times10^3$. The average numbers of nodes in the lines are 33, 40 and 50, respectively. The coarsest grid and the implicit lines can be seen in Fig. 5. Note that the airfoil is cut off with 0.1% thickness at the trailing edge to avoid the singularity of a sharp trailing edge, which causes the collapse of the dual grid and convergence problems, in particular with high normal stretching. Four levels of full multigrid W-cycles are used in all calculations. Directional coarsening with coarsening ratios 1:4 or 1:2 along the lines is used. The flow is transonic, with the flow conditions M∞ = 0.754, Re = $6.2\times10^6$ and a small angle of attack, α = 2.57°. The Spalart-Allmaras one-equation model was used in the calculations17; the turbulence equations are discretized with a second-order upwind method. The convergence with different implicit CFL numbers $CFL_{semi}$ is displayed in Fig. 6 for the finest grid with a directional coarsening ratio of 1:2 along the lines. The increased rate of convergence can clearly be seen as the CFL number is increased.
There is very little gain in increasing the CFL number beyond $CFL_{semi} = 10^4$. The rate of convergence of the density residual and the drag force can be seen in Fig. 7 for the three grids with explicit and line-implicit time integration. The plots display the convergence using two different coarsening ratios, 1:4 and 1:2, along the lines. The convergence with explicit time integration and multigrid acceleration clearly deteriorates as the wall-normal distance is reduced. The coarsening ratio along the stretched lines also has a clear influence on the results. A higher coarsening ratio improves the convergence, which is natural since the coarser grids then have more isotropic cells that allow larger time steps and faster convection of the errors. Nevertheless, the rate of convergence is in all cases slower than with the line-implicit integration. With the line-implicit integration, the fastest convergence is obtained with the two finest grids; the coarsest grid provides slightly slower convergence rates, but still faster than with explicit time marching. There is almost no sensitivity to the coarsening ratio along the lines with the implicit time integration, which indicates that the damping provided by the implicit scheme has a larger influence on the spectra than the multigrid coarsening ratio. Since the two finest grids provide the same rate of convergence, the convergence appears to be independent of the stretching.

Figure 6. Rate of convergence (density residual) for the NACA0012 airfoil with increasing $CFL_{semi}$.
Figure 7. Rate of convergence for the turbulent flow over the NACA0012 airfoil. Convergence compared for three wall distances, $10^{-5}$, $10^{-6}$ and $10^{-7}$ chords. a) Density residual, directional coarsening ratio 1:4; b) density residual, directional coarsening ratio 1:2; c) CD, directional coarsening ratio 1:4; d) CD, directional coarsening ratio 1:2.

A faster convergence of the drag is obtained as well with the implicit approach. On the grid with the highest normal resolution, the drag converges to within 0.1% of its steady-state value in about 690 iterations with the semi-implicit approach and coarsening ratio 1:4 along the lines. The corresponding number with the explicit approach is 1700 iterations.

C. Three-dimensional flow over the M6 wing
The next example demonstrates viscous flow calculations over the ONERA M6 wing. The flow conditions are M∞ = 0.84, α = 3.06° and Re = $11.3\times10^6$; experimental data are available18. A grid with $0.92\times10^6$ nodes is used. The grid has a constant number of 30 prismatic layers and about $20\times10^3$ nodes on the wing surface. It contains $1.2\times10^6$ prismatic cells and $1.8\times10^6$ tetrahedral cells. The wall-normal distance to the first interior node is about $1.5\times10^{-6}$ chords. The maximum aspect ratio is $12\times10^3$, and the implicit lines contain 24-30 nodes. Again, semi-coarsening is used along the lines, with 4 fine dual cells merged into one coarse cell. An average coarsening ratio of 8 is used in the tetrahedral region. An Explicit Algebraic Reynolds Stress Model is used, with a length-scale-determining equation based on ω19, 20. An upwind discretization of the turbulence equations is used, where a minmod TVD limiter ensures second-order spatial accuracy.

Figure 8. Viscous and inviscid surface grids for the ONERA M6. Left: adapted viscous surface grid. Right: inviscid surface grid.

Figure 9. Rate of convergence for the viscous flow over the ONERA M6 wing. Left: convergence of density residual. Right: convergence of normalized lift coefficient.

The convergence on the viscous grid is compared to the convergence of an inviscid calculation on an unstructured tetrahedral grid with about the same number of nodes on the surface. The inviscid grid has $0.23\times10^6$ nodes, $17\times10^3$ of them on the wing surface. The two surface grids are displayed in Fig. 8; the viscous grid has been adapted to the shock. On both grids, three full multigrid W-cycles are used. The coarsening ratio of the inviscid grid is 1:8, as in the tetrahedral part of the viscous grid. The rate of convergence of the density residual and the lift force is shown in Fig. 9. The convergence rate is slowest with explicit time marching. With the line-implicit approach, the slope is steeper, and in fact slightly steeper than that obtained for the inviscid calculation. The residuals drop about 7.5 orders of magnitude and stay there; the explicit scheme will eventually reach the same level, where the decay of the residuals stops. The convergence of the lift force is also considerably faster with the line-implicit approach. The lift converges to within 0.1% of its steady-state value in about 2600 fine-grid multigrid iterations with explicit time marching. The converged lift force is obtained after about 6 orders of reduction of the density residual. With the line-implicit approach, the same value is reached within 550 iterations.

D. Three-dimensional flow over a high-lift configuration
Figure 10. High lift model. Left: high lift configuration with fuselage, wing, slat and flap. Right: surface mesh and implicit lines in the junction between the fuselage and slat/wing.
Another comparison is carried out for a three-dimensional three-element high-lift configuration with a full-span slat and flap deflected for take-off conditions21, 22. The flow conditions are M∞ = 0.176 and Re = 15×10⁶ at an angle of attack in the linear range of the lift polar. The computational grid has 7.7×10⁶ nodes, about 40 prismatic layers (the number varies), 14.1×10⁶ prismatic elements and 2.1×10⁶ tetrahedral elements. The surface grid contains about 194×10³ nodes. The maximum aspect ratio is as high as 143×10³, and the implicit lines contain 24-45 nodes. For both time integration approaches, directional semi-coarsening is used along the implicit lines with 4 fine dual cells per coarse cell, and 4 full multigrid V-cycles are used. The Spalart-Allmaras turbulence model17 is used with a second order accurate spatial discretization. The model and part of the surface grid are displayed in Fig. 10.
Figure 11. Rate of convergence for the high lift test case. Left: convergence of density residual. Right: convergence of lift.
The rate of convergence for the density residual and the lift coefficient can be seen in Fig. 11. There is again a considerable speed-up in terms of iterations using the semi-implicit time integration. Converging the lift coefficient to within 0.1% of its steady state value requires about 5500 multigrid iterations with explicit time marching; with semi-implicit integration this threshold is reached within 1300 iterations. The pressure distributions after 1300 semi-implicit and 6000 explicit iterations are compared with experimental data in Fig. 12. The two solutions are identical and agree well with the experiment.
Figure 12. High lift pressure distribution at 28% wing span. 1300 semi-implicit iterations, 6000 explicit iterations.
E. Three-dimensional unsteady supersonic base flow
The final test case is an unsteady test case with supersonic flow over a cylinder. It has a free-stream Mach number of M∞ = 2.46 and a Reynolds number of Re = 51.4×10⁶ per meter. The cylindrical base has a radius of R = 31.75 mm. The free-stream stagnation temperature and pressure are T∞ = 294 K and P∞ = 515000 Pa. This type of flow is commonly encountered behind high-speed projectiles. The sharp flow turning over the base corner triggers expansion waves, which lead to low pressure behind the base. The flow also features a separation bubble controlled by the free shear layer. The turbulence modeling needs to account for the incoming upstream boundary layer, the flow separation and recirculation behind the base, and the free shear layer. The flow has been investigated numerically with both LES and hybrid RANS/LES approaches; an overview is given by Peng23. In this work, the DES approach by Spalart et al.24 is used. An unstructured hybrid grid of about 0.9×10⁶ nodes has been used. The grid has about 15×10³ nodes on the surface, 500×10³ prismatic elements in 18 prismatic layers, and 3.5×10⁶ tetrahedral elements. The maximum aspect ratio is 1.7×10³, and the implicit lines contain 17-19 nodes. The mesh is too coarse to give the highest possible quality of the flow solution. However, the actual solution is not of primary interest here, so the mesh is used to compare the two steady state integration approaches.
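The quoted free-stream conditions can be cross-checked with a short calculation. The static state follows from M∞ and the stagnation values through the isentropic relations, and the Reynolds number per meter then follows from Sutherland's law. The sketch below assumes a calorically perfect gas with γ = 1.4, a gas constant of 287 J/(kg K) and the standard Sutherland constants for air; none of these values are given in the text. It also converts the nondimensional time step 0.045 R/U∞ used for this case into seconds.

```python
import math

GAMMA = 1.4      # ratio of specific heats (assumed, calorically perfect air)
R_GAS = 287.0    # specific gas constant of air in J/(kg K) (assumed)


def sutherland_viscosity(T):
    """Dynamic viscosity of air in Pa s from Sutherland's law (standard constants, assumed)."""
    mu_ref, T_ref, S = 1.716e-5, 273.15, 110.4
    return mu_ref * (T / T_ref) ** 1.5 * (T_ref + S) / (T + S)


def freestream_from_stagnation(mach, T0, p0):
    """Static free-stream state from Mach number and stagnation conditions (isentropic flow)."""
    factor = 1.0 + 0.5 * (GAMMA - 1.0) * mach ** 2
    T = T0 / factor
    p = p0 / factor ** (GAMMA / (GAMMA - 1.0))
    rho = p / (R_GAS * T)
    U = mach * math.sqrt(GAMMA * R_GAS * T)
    re_per_meter = rho * U / sutherland_viscosity(T)
    return T, p, rho, U, re_per_meter


T, p, rho, U, re_per_meter = freestream_from_stagnation(mach=2.46, T0=294.0, p0=515000.0)
R_base = 0.03175                  # base radius R = 31.75 mm in meters
dt_seconds = 0.045 * R_base / U   # nondimensional step 0.045 R/U converted to seconds
print(f"T = {T:.1f} K, p = {p:.0f} Pa, U = {U:.1f} m/s")
print(f"Re per meter = {re_per_meter:.3e}, dt = {dt_seconds:.3e} s")
```

Under these assumptions the script gives U∞ ≈ 569 m/s, Re ≈ 52×10⁶ per meter and Δt ≈ 2.51×10⁻⁶ s. These lie within a few percent of the quoted values; the small differences may stem from other gas property assumptions.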
A dual time stepping approach4 is used, based on a second order accurate backward difference scheme. In this approach a steady state problem is solved in each physical time step, and the iterations of this steady state calculation are denoted inner iterations. Both the explicit and the line-implicit time integration are applied to the solution of this steady state problem.
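Dual time stepping can be illustrated on the scalar model problem du/dt + λu = 0. Each physical BDF2 step defines an unsteady residual R*(u) = (3u − 4uⁿ + uⁿ⁻¹)/(2Δt) + R(u), which inner iterations drive to zero in pseudo-time. The sketch below rests on several assumptions that are not from the text:

- A plain explicit pseudo-time update stands in for the explicit or line-implicit Runge-Kutta multigrid iterations of the solver.
- The first step uses backward Euler, because BDF2 needs two previous time levels.
- The inner iterations stop on an absolute residual threshold or after at most 150 iterations.
- All function names and parameters are hypothetical.

```python
import math


def residual(u, lam):
    """Spatial residual R(u) of the model problem du/dt + R(u) = 0, here R(u) = lam * u."""
    return lam * u


def dual_time_step(u_n, u_nm1, dt, lam, bdf2=True, tol=1e-10, max_inner=150, cfl=0.9):
    """Advance one physical time step by driving the unsteady residual R* to zero in pseudo-time.

    u_n and u_nm1 are the solutions at time levels n and n-1 (u_nm1 is unused for the
    first-order start-up step). Returns the new solution and the number of inner iterations.
    """
    # Leading coefficient of the time derivative term; sets a stable pseudo-time step.
    a0 = 1.5 / dt if bdf2 else 1.0 / dt
    dtau = cfl / (a0 + lam)
    u = u_n  # initial guess: solution at the previous physical time level
    for k in range(max_inner):
        if bdf2:
            time_term = (3.0 * u - 4.0 * u_n + u_nm1) / (2.0 * dt)
        else:
            time_term = (u - u_n) / dt  # backward Euler for the first step
        r_star = time_term + residual(u, lam)
        if abs(r_star) < tol:  # absolute convergence criterion on the inner residual
            return u, k
        u -= dtau * r_star  # explicit pseudo-time update (stand-in for RK + multigrid)
    return u, max_inner


lam, dt, n_steps = 1.0, 0.1, 10
u_nm1, u_n = None, 1.0
for step in range(n_steps):
    u_new, inner = dual_time_step(u_n, u_nm1, dt, lam, bdf2=(step > 0))
    u_nm1, u_n = u_n, u_new
    print(f"step {step + 1}: u = {u_new:.6f}, inner iterations = {inner}")

print(f"u(t=1) = {u_n:.6f}, exact = {math.exp(-lam * n_steps * dt):.6f}")
```

Because the inner problem is solved to a tight tolerance, the result at t = 1 agrees with the exact solution e⁻¹ to within the BDF2 truncation error, about 2×10⁻³ for Δt = 0.1. A faster inner solver reduces the number of inner iterations per step without changing the physical time accuracy.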
Figure 13. Supersonic base flow. Left: computational grid at inflow and cylinder with implicit lines. Right: instantaneous flow features in an axisymmetric plane, flow simulated with DES.
A time step of Δt = 2.52×10⁻⁶ s (or equivalently Δt = 0.045 R/U∞) is used in the computations. This time step is about 10³ times as large as the time step that would have been required with an explicit calculation. The unsteady calculations are started from a poorly converged steady state solution. Three multigrid levels with W-cycles are used; the coarsening ratio is 1:4 along the stretched lines and 1:8 in the tetrahedral outer part. The turbulence equation is discretized with a central scheme with added numerical dissipation based on 2nd and 4th differences. Both the line-implicit and the explicit time marching are applied to reduce and converge the residuals in each time step. The convergence criterion is an absolute threshold on the maximum density residual, and the inner iterations are stopped when the residual falls below this threshold. Calculations are also carried out with a prescribed number of 150 inner iterations to compare the levels of convergence reached with the two approaches.
Fig. 14 shows the typical convergence of the density residual and the lift force over a few arbitrary time steps from such calculations. The global time is synchronized such that 0 inner iterations correspond to an arbitrarily selected time. The line-implicit approach is far more efficient and drives the residuals to much lower values. With 150 inner iterations, the line-implicit approach reduces the residual by about 7 orders of magnitude, whereas explicit inner iterations reduce it by about three orders of magnitude. However, 150 inner iterations exceed the number required to converge the forces, even with explicit time marching. If a convergence criterion on the maximum density residual is used instead, about 20 inner iterations per time step are needed with the line-implicit approach, compared to about 40 with explicit time marching. This corresponds to a reduction of the residual by about two orders of magnitude, which is sufficient for the convergence of the forces. The low value of the lift in Fig. 14 is due to the fluctuation of the lift around a zero mean value.
Figure 14. Unsteady supersonic base flow, convergence in inner iterations. a) Density residual, 150 iterations/time step; b) density residual with convergence criterion; c) lift force with convergence criterion.
V. Summary and discussion
There is a clear gain with the line-implicit approach for all test cases presented. It should be mentioned, though, that there is a gain only for cases where the dominating eigenvalues of the spectrum stem from the high stretching of the grid. Consider, e.g., an airfoil grid with a very fine streamwise resolution around the leading edge. For such a grid, explicit time marching with directional coarsening multigrid may be as efficient as, or even more efficient than, line-implicit time marching. To quantify the reduction in iterations and CPU time, the computed results are summarized in Table 1 below. For each computed test case, the table gives the number of multigrid iterations needed to converge to a steady state (or within a time step with dual time stepping). A solution is considered converged when the forces (lift and drag) deviate no more than 0.1% from their steady state values. The CPU time overhead was measured on a Linux cluster and may vary between computational platforms. It should be pointed out that the implicit time integration causes practically no memory overhead. Only the implicit lines need to be stored, which requires less than 1% additional memory. The table reveals that the largest reductions in iterations and CPU time are obtained for the grids with the highest stretching. This is expected, since the implicit time integration removes the stiffness due to the stretching of the grid. The gain is slightly higher in three dimensions than for the two-dimensional test cases. In terms of iterations, a large reduction is also obtained for the high lift test case, which has the highest Reynolds number and the highest stretching among the three-dimensional cases. However, the line-implicit time marching has a high computational overhead for this test case, due to the large prismatic regions and the high number of nodes in the implicit lines.
The highest reduction in CPU time is obtained for the ONERA M6 wing, which has a smaller computational overhead due to shorter lines and a smaller prismatic region. Among the three-dimensional test cases, the smallest reduction is obtained for the unsteady base flow calculations. This is due to the moderate stretching of that grid, which has the smallest maximum aspect ratio. Nevertheless, there is a considerable reduction in both iterations and computing time for all test cases.
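The convergence measure behind the iteration counts in Table 1 is that the forces lie within 0.1% of their steady state values. It can be evaluated from a force history as sketched below. This is an illustrative sketch, not the authors' post-processing. It assumes that the last entry of the history is the converged value, and it returns the first iteration after which the history stays inside the ±0.1% band for good.

```python
def iterations_to_converge(history, rel_tol=1e-3):
    """Return the first iteration index from which every value stays within
    rel_tol (relative) of the last value in `history`.

    The last entry is taken as the steady state value, so the run is assumed to be
    long enough for that value to be converged.
    """
    final = history[-1]
    band = rel_tol * abs(final)
    first_inside = len(history) - 1
    # Walk backwards from the end until a sample leaves the tolerance band.
    for i in range(len(history) - 1, -1, -1):
        if abs(history[i] - final) > band:
            break
        first_inside = i
    return first_inside


# Synthetic, monotonically converging lift history (illustration only, not data from the paper).
lift_history = [0.5 * (1.0 - 0.9 ** k) for k in range(200)]
print(iterations_to_converge(lift_history))
```

For this synthetic history the function returns 66. Walking backwards from the end, rather than stopping at the first entry inside the band, keeps an oscillating history from being counted as converged while it still leaves the band later.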
Table 1
Convergence data for computed test cases. A solution is converged when the forces (lift and drag) have converged to within 0.1% of their steady state values.

Case          Wall dist.  Directional       Maximum   No. of explicit  No. of implicit  Reduced     CPU time  Reduced
                          coarsening ratio  AR        iterations       iterations       iterations  overhead  CPU time
NACA0012      1.0×10⁻⁶    1:4               15×10³    1900             910              52%         40%       33%
NACA0012      1.0×10⁻⁶    1:2               15×10³    2040             1030             49%         36%       31%
NACA0012      1.0×10⁻⁷    1:4               150×10³   1700             740              56%         52%       34%
NACA0012      1.0×10⁻⁷    1:2               150×10³   3000             1010             66%         44%       52%
ONERA M6      1.5×10⁻⁶    1:4               12×10³    2600             550              79%         28%       73%
3D High Lift  1.0×10⁻⁶    1:4               143×10³   5500             1300             76%         75%       59%
Base flow     1.0×10⁻⁵    1:4               2×10³     40               20               50%         15%       42%
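The last column of Table 1 appears to follow from the other columns. Suppose one line-implicit iteration costs (1 + overhead) explicit iterations. The reduced CPU time is then 1 − (N_impl/N_expl)(1 + overhead). This reading of the table is an assumption, not a statement in the text. The sketch below recomputes both reduction columns from the iteration counts and overheads.

```python
# Rows of Table 1: (case, explicit iterations, implicit iterations, CPU time overhead,
#                   reported reduced iterations, reported reduced CPU time)
TABLE_1 = [
    ("NACA0012, AR 15e3, 1:4", 1900, 910, 0.40, 0.52, 0.33),
    ("NACA0012, AR 15e3, 1:2", 2040, 1030, 0.36, 0.49, 0.31),
    ("NACA0012, AR 150e3, 1:4", 1700, 740, 0.52, 0.56, 0.34),
    ("NACA0012, AR 150e3, 1:2", 3000, 1010, 0.44, 0.66, 0.52),
    ("ONERA M6", 2600, 550, 0.28, 0.79, 0.73),
    ("3D High Lift", 5500, 1300, 0.75, 0.76, 0.59),
    ("Base flow", 40, 20, 0.15, 0.50, 0.42),
]


def reductions(n_explicit, n_implicit, overhead):
    """Return (reduction in iterations, reduction in CPU time) as fractions.

    Assumes one line-implicit iteration costs (1 + overhead) explicit iterations.
    """
    iteration_ratio = n_implicit / n_explicit
    return 1.0 - iteration_ratio, 1.0 - iteration_ratio * (1.0 + overhead)


for case, n_exp, n_imp, overhead, rep_iter, rep_cpu in TABLE_1:
    d_iter, d_cpu = reductions(n_exp, n_imp, overhead)
    print(f"{case:26s} iterations {100 * d_iter:5.1f}% (table {100 * rep_iter:.0f}%), "
          f"CPU time {100 * d_cpu:5.1f}% (table {100 * rep_cpu:.0f}%)")
```

With this assumption, the recomputed values agree with the tabulated ones to within about one percentage point. For example, 1 − (550/2600)(1 + 0.28) ≈ 0.73 for the ONERA M6 wing.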
It may be possible to further reduce the computational overhead by storing the factorized matrices in the implicit time integration. For example, the LU-factorized matrices could be stored in the first Runge-Kutta stage and then reused, with frozen coefficients, in the remaining stages. This would, however, incur a memory overhead. Another extension of the line-implicit approach may be possible for unsteady calculations. Up to now, the line-implicit integration has only been applied to steady state problems. It has been extended to unsteady problems by applying it to the steady state problem solved in each time step of the dual time stepping approach. Alternatively, the explicit and line-implicit approaches could be applied in a time-accurate manner, with the maximum allowable time step determined by the smallest local time step computed from Eq. (11). Since the implicit scheme is still unconditionally stable along the lines, the local time steps may be large enough for practical applications. The coefficients of the Runge-Kutta scheme in Eq. (2) would, however, have to be adjusted to obtain the desired order of accuracy in time.
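Storing the factorization in the first stage and reusing it in later stages amounts to separating the factor and solve phases of the line solver. The sketch below shows this for a scalar tridiagonal line matrix, using the Thomas algorithm. In the flow solver, the entries would instead be small dense blocks (one per node), and the divisions would become block LU solves. The function names and the model matrix are illustrative assumptions. The right-hand sides merely stand in for the stage residuals.

```python
def factor_tridiagonal(lower, diag, upper):
    """LU-factor a tridiagonal matrix once (Thomas algorithm, no pivoting).

    Row i reads lower[i]*x[i-1] + diag[i]*x[i] + upper[i]*x[i+1]; lower[0] and
    upper[-1] are unused. Returns the pivots d and multipliers m, which can be stored.
    """
    n = len(diag)
    d = [0.0] * n
    m = [0.0] * n
    d[0] = diag[0]
    for i in range(1, n):
        m[i] = lower[i] / d[i - 1]
        d[i] = diag[i] - m[i] * upper[i - 1]
    return d, m


def solve_factored(d, m, upper, rhs):
    """Solve A x = rhs with stored factors: forward elimination, then back substitution."""
    n = len(d)
    y = list(rhs)
    for i in range(1, n):
        y[i] -= m[i] * y[i - 1]
    x = [0.0] * n
    x[-1] = y[-1] / d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = (y[i] - upper[i] * x[i + 1]) / d[i]
    return x


def matvec(lower, diag, upper, x):
    """Multiply the tridiagonal matrix by x (used only to check the solutions)."""
    n = len(diag)
    return [
        (lower[i] * x[i - 1] if i > 0 else 0.0)
        + diag[i] * x[i]
        + (upper[i] * x[i + 1] if i < n - 1 else 0.0)
        for i in range(n)
    ]


# Hypothetical, diagonally dominant matrix for one implicit line of 6 nodes.
n = 6
lower = [0.0] + [-1.0] * (n - 1)
diag = [4.0] * n
upper = [-1.0] * (n - 1) + [0.0]

d, m = factor_tridiagonal(lower, diag, upper)  # factor once, e.g. in the first stage
stage_rhs = [[1.0] * n, [float(i) for i in range(n)], [(-1.0) ** i for i in range(n)]]
for stage, rhs in enumerate(stage_rhs, start=1):
    x = solve_factored(d, m, upper, rhs)  # reuse the frozen factors in every stage
    err = max(abs(a - b) for a, b in zip(matvec(lower, diag, upper, x), rhs))
    print(f"stage {stage}: max residual {err:.1e}")
```

The stored factors cost one extra diagonal and one extra sub-diagonal per line. For block matrices, this storage is the memory overhead mentioned in the paragraph above, traded against skipping the factorization in the later stages.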
VI. Conclusion
A convergence acceleration technique for flow problems on stretched grids is demonstrated. The convergence acceleration is based on combined explicit and line-implicit (semi-implicit) Runge-Kutta time integration. The paper focuses on implementation and validation issues of the semi-implicit time integration, starting from an explicit multigrid solver. The algorithm to derive the implicit lines is described, and the principles behind the implementation in the flow solver are given. The implementation is limited to an additional routine that exactly inverts a block tridiagonal matrix for each line and modifies the total residual vector. In addition, the local time step is modified and increased. Explicit and semi-implicit time integration is applied to several steady state and unsteady test cases in two and three dimensions. A dual time stepping approach is used for the unsteady calculations, and the two time integration approaches are applied to the steady state problem solved in each time step. All test cases have hybrid unstructured stretched grids with cell aspect ratios up to 150×10³. A considerable reduction of the required number of iterations to steady state is obtained with the semi-implicit approach. The convergence rates are similar to those obtained for inviscid calculations on isotropic grids. This indicates that the stiffness introduced by the stretching of the grid has been removed. The largest reduction of the required iterations to steady state is obtained for the three-dimensional test cases, where reductions of up to 80% are obtained. Because of the computational overhead of the line-implicit approach, the CPU time is reduced by up to 75%.
Acknowledgments
This work has been carried out within the EU project SimSAC under contract No. AST5-CT-2006-030838.
References
1. Jameson, A., “Solution of the Euler Equations by a Multigrid Method,” Applied Mathematics and Computation, Vol. 13, 1983, pp. 327-356.
2. Mavriplis, D. J., Venkatakrishnan, V., “A Unified Multigrid Solver for the Navier-Stokes Equations on Mixed Elements,” International Journal of Computational Fluid Dynamics, Vol. 8, 1997, pp. 247-263.
3. Peraire, J., Peiro, J., Morgan, K., “A 3D Finite-Element Multigrid Solver for the Euler Equations,” AIAA Paper 92-0449, 1992.
4. Jameson, A., “Time Dependent Calculation Using Multigrid with Applications to Unsteady Flows Past Airfoils and Wings,” AIAA Paper 1991-1596, 1991.
5. Mavriplis, D. J., “On Convergence Acceleration Techniques for Unstructured Meshes,” AIAA Paper 1998-2966, 1998.
6. Mavriplis, D. J., “Directional Agglomeration Multigrid Techniques for High Reynolds Number Viscous Flows,” AIAA Journal, Vol. 37, No. 10, 1999, pp. 1222-1229.
7. Eliasson, P., “EDGE, a Navier-Stokes Solver for Unstructured Grids,” FOI report FOI-R-0298-SE, 2001.
8. Eliasson, P., “EDGE, a Navier-Stokes Solver for Unstructured Grids,” Proceedings of Finite Volumes for Complex Applications III, 2002, pp. 527-534.
9. Eliasson, P., Weinerfelt, P., “Recent Applications of the Flow Solver Edge,” Proceedings of the 7th Asian CFD Conference, Bangalore, India, 2007.
10. Berglind, T., “An Agglomeration Algorithm for Navier-Stokes Grids,” AIAA Paper 2000-2254, 2000.
11. Haselbacher, A., McGuirk, J. J., Page, G. J., “Finite Volume Discretization Aspects for Viscous Flows on Mixed Unstructured Grids,” AIAA Journal, Vol. 37, No. 2, 1999, pp. 177-184.
12. Conte, S. D., de Boor, C., Elementary Numerical Analysis, McGraw-Hill, New York, 1972.
13. Martin, D., Löhner, R., “An Implicit Linelet-Based Solver for Incompressible Flows,” AIAA Paper 92-0668, 1992.
14. Svärd, M., Nordström, J., “A Stable High-Order Finite Difference Scheme for the Compressible Navier-Stokes Equations: No-Slip Wall Boundary Conditions,” Journal of Computational Physics, Vol. 227, 2008, pp. 4805-4824.
15. Tysell, L., “An Advancing Front Grid Generation System for 3D Unstructured Grids,” Proceedings of the 19th ICAS Congress, Anaheim, California, USA, 1994, pp. 1552-1564.
16. Tysell, L., “Hybrid Grid Generation for Complex 3D Geometries,” Numerical Grid Generation in Computational Field Simulation, Proceedings of the 7th Conference, Whistler, British Columbia, Canada, Sept. 25-28, 2000, pp. 337-346.
17. Spalart, P. R., Allmaras, S. R., “A One-Equation Turbulence Model for Aerodynamic Flows,” AIAA Paper 1992-0439, 1992.
18. Schmitt, V., Charpin, F., “Pressure Distributions on the ONERA-M6 Wing at Transonic Mach Numbers,” AGARD-AR138, 1979.
19. Wallin, S., Johansson, A. V., “An Explicit Algebraic Reynolds Stress Model for Incompressible and Compressible Turbulent Flows,” Journal of Fluid Mechanics, Vol. 403, 2000, pp. 89-132.
20. Hellsten, A., “New Advanced k-ω Turbulence Model for High Lift Aerodynamics,” AIAA Journal, Vol. 43, No. 9, 2005, pp. 1857-1869.
21. Eliasson, P., “Investigation of a Half Model High Lift Configuration in a Wind Tunnel,” Journal of Aircraft, Vol. 45, No. 1, 2008, pp. 29-37.
22. Rudnik, R., Eliasson, P., Perraud, J., “Evaluation of CFD Methods for Transport Aircraft High Lift Systems,” The Aeronautical Journal, 2005, pp. 53-64.
23. Peng, S.-H., “Algebraic Hybrid RANS-LES Modelling Applied to Incompressible and Compressible Turbulent Flows,” AIAA Paper 2006-3910, 2006.
24. Spalart, P. R., Jou, W. H., Strelets, M., Allmaras, S. R., “Comments on the Feasibility of LES for Wings and on a Hybrid RANS/LES Approach,” Advances in DNS/LES: Proceedings of the First AFOSR International Conference on DNS/LES, Greyden Press, Columbus, 1997.