Edge-based finite element for nonlinear solid mechanics and unsteady transport problems

A.L.G.A. Coutinho¹, J.L.D. Alves², M.A.D. Martins¹ & D.A.F. de Souza²

¹ Center for Parallel Computing, COPPE/Federal University of Rio de Janeiro, Brazil
² Laboratory for Computational Methods in Engineering, COPPE/Federal University of Rio de Janeiro, Brazil

Abstract

Edge-based data structures are used to improve the computational efficiency of matrix-vector products within the iterative solvers of finite element codes. In this work they are used in two applications: within the Inexact Newton method for solving nonlinear solid mechanics problems, and within the GMRES algorithm for transport problems, both on unstructured meshes composed of tetrahedra or hexahedra. We found that, for both tetrahedral and hexahedral meshes, edge-based data structures reduce the memory required to hold matrix coefficients and the number of floating-point operations needed to compute the matrix-vector products in the iterative driver of the Inexact Newton method and in the GMRES algorithm.

1 Introduction

Nonlinear and time (or pseudo-time) dependent effects, together with typically three-dimensional, large-scale domains, are relevant aspects of many engineering applications. High-performance computing techniques are necessary to make the analysis of such phenomena feasible. Edge-based data structures were introduced in explicit compressible flow codes [10,8]; here they are used to increase the computational performance of the Inexact Newton Method (INM) and of the GMRES solution of the implicit partition of an Implicit/Explicit (I/E) algorithm for transport problems.

We first use the INM to predict the three-dimensional response of large-scale solid mechanics problems. In the INM, at each nonlinear iteration, a linear system of finite element equations is solved approximately by the preconditioned conjugate gradient method. Besides residual evaluations and stiffness matrix updates, the computational kernels of the INM are those of the iterative driver, that is, matrix-vector products and preconditioning. Matrix-vector products are optimized using the edge-based data structures cited above. In a second application, we implemented edge-based data structures together with an I/E scheme for unsteady transport problems. The remainder of this work is organized as follows. In the next section we briefly review the governing nonlinear finite element equations, the Inexact Newton method, the unsteady transport equation and the I/E time integration scheme. In Section 3 we describe the edge-based data structure in a general way. Section 4 presents some numerical examples. The paper ends with a summary of the main conclusions.

2 Governing equations

2.1 Incremental equilibrium equations and the Inexact Newton method

The governing equation for the quasi-static deformation of a body occupying a volume Ω is

$$\frac{\partial \sigma}{\partial x} + \rho b = 0 \quad \text{in } \Omega \subset \mathbb{R}^3 \qquad (1)$$

where σ is the Cauchy stress tensor, x is the position vector, ρ is the weight per unit volume and b is a specified body force vector. Eqn (1) is subject to the kinematic and traction boundary conditions

$$u(x, t) = \bar{u}(x, t) \ \text{on } \Gamma_u, \qquad \sigma \cdot n = h(x, t) \ \text{on } \Gamma_h \qquad (2)$$

where n is the outward unit normal, Γ_u is the portion of the boundary on which displacements are prescribed (ū) and Γ_h is the portion on which tractions are specified (h). The boundary of the body is Γ = Γ_u ∪ Γ_h, and t represents a pseudo-time (or increment). Discretizing the above equations by a displacement-based finite element method, we arrive at the discrete equilibrium equation

$$F_{int} + F_{ext} = 0 \qquad (3)$$

where F_int is the internal force vector and F_ext is the external force vector, accounting for applied forces and boundary conditions. Assuming that external forces are applied incrementally and restricting ourselves to material nonlinearities only, we arrive, after a standard linearization procedure, at the finite element system of equations to be solved at each load increment,

$$K_T \, \Delta u = R \qquad (4)$$

where K_T is the tangent stiffness matrix, a function of the current displacements, Δu is the vector of displacement increments and R is the unbalanced residual vector, that is, the difference between internal and external forces. For large-scale problems, particularly in 3D, it is more efficient to solve the linearized problems approximately by suitable inner iterative methods, such as preconditioned conjugate gradients (PCG). This inner-outer scheme is known as the Inexact Newton Method (INM). We employ a further enhancement of the INM by choosing the tolerance of the inner iterative solver adaptively, following the algorithm suggested by Kelley [6] and detailed in [9]. The inner iterative driver needs sparse matrix-vector products, which may be accelerated by an edge-based data structure, as we shall see.

2.2 Unsteady transport equation and the implicit/explicit algorithm

We consider the transport equation over a domain Ω ⊂ ℝ³:

$$\varphi \frac{\partial u}{\partial t} + \beta \cdot \nabla u - \nabla \cdot (D \nabla u) = f \quad \text{in } \Omega \times [0, T] \qquad (5)$$

where u is the transported quantity (temperature, mass concentration, etc.), φ is a parameter regulating the time scale, β is a divergence-free Cartesian velocity field, D is the second-order orthotropic diffusion tensor and f is a known source/sink term. Employing a stabilized finite element formulation, such as SUPG [2] and CAU [4], eqn (5) becomes a set of nonlinear ordinary differential equations represented by the semi-discrete equation

$$M \dot{u} + K u = F \qquad (6)$$

where M and K are coefficient matrices and F is a vector accounting for sources/sinks and boundary conditions. A predictor/multicorrector [12] Implicit/Explicit time-marching algorithm advances the solution in time. In the Implicit/Explicit algorithm the finite element mesh is partitioned into two groups, one of explicit elements and the other of implicit elements. The partitions are chosen according to the local CFL condition, which, for a given time step Δt, reads

$$CFL_e = \frac{\|\beta_e\| \, \Delta t}{\varphi_e \, h_e} \qquad (7)$$

where h_e is the element size. If the local Courant number CFL_e is greater than one, the element must be treated implicitly. For an analysis with a constant time step, the partitions can be chosen once, at the beginning of the analysis. The resulting scheme in each time step for the explicit partition is

$$M_L \, u_{n+1} = F_{n+1} \qquad (8)$$

where M_L is the lumped mass matrix, resulting in a diagonal system of equations. The scheme for the implicit partition is

$$M^{*} u_{n+1} = F_{n+1} \qquad (9)$$

where the effective mass matrix M* is a linear combination of M and K:

$$M^{*} = M + \alpha \, \Delta t \, K \qquad (10)$$

The parameter α is set to 0.5, which corresponds to the trapezoidal rule [5]. This implicit set of non-symmetric nonlinear equations is solved by a diagonally preconditioned GMRES [11] iterative driver, which demands several matrix-vector products; these can be accelerated by the edge-based data structure discussed next.
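Because GMRES touches M* only through matrix-vector products, any product routine (element-based or edge-based) can be plugged in unchanged. The following is a minimal sketch, not the authors' code, of restarted GMRES(m) with left diagonal (Jacobi) preconditioning; the function name `gmres_jacobi` and its parameters are hypothetical, the operator is assumed to be supplied as a callable `matvec` and its diagonal as a list, and the defaults of 25 Krylov vectors and a tolerance of 10^-3 simply mirror the values used for the submarine example in Section 4.2.

```python
import math
from typing import Callable, List, Optional

Vector = List[float]


def gmres_jacobi(matvec: Callable[[Vector], Vector], diag: Vector, b: Vector,
                 x0: Optional[Vector] = None, tol: float = 1e-3,
                 m: int = 25, max_restarts: int = 50) -> Vector:
    """Restarted GMRES(m) applied to the left-preconditioned system D^-1 A x = D^-1 b."""
    x = list(x0) if x0 is not None else [0.0] * len(b)

    def prec(v: Vector) -> Vector:  # apply the diagonal (Jacobi) preconditioner D^-1
        return [vi / di for vi, di in zip(v, diag)]

    def norm(v: Vector) -> float:
        return math.sqrt(sum(vi * vi for vi in v))

    ref = norm(prec(b)) or 1.0  # stop when ||D^-1 r|| <= tol * ||D^-1 b||
    for _ in range(max_restarts):
        r = prec([bi - ai for bi, ai in zip(b, matvec(x))])
        beta = norm(r)
        if beta <= tol * ref:
            return x
        V = [[ri / beta for ri in r]]            # orthonormal Krylov basis
        H = [[0.0] * m for _ in range(m + 1)]    # upper Hessenberg matrix
        cs, sn = [0.0] * m, [0.0] * m            # Givens rotation coefficients
        g = [beta] + [0.0] * m                   # rotated right-hand side
        k = 0
        for j in range(m):
            w = prec(matvec(V[j]))
            for i in range(j + 1):               # modified Gram-Schmidt
                H[i][j] = sum(wp * vp for wp, vp in zip(w, V[i]))
                w = [wp - H[i][j] * vp for wp, vp in zip(w, V[i])]
            h_next = norm(w)
            H[j + 1][j] = h_next
            for i in range(j):                   # apply earlier rotations to column j
                t = cs[i] * H[i][j] + sn[i] * H[i + 1][j]
                H[i + 1][j] = -sn[i] * H[i][j] + cs[i] * H[i + 1][j]
                H[i][j] = t
            d = math.hypot(H[j][j], H[j + 1][j])  # new rotation eliminates H[j+1][j]
            cs[j], sn[j] = H[j][j] / d, H[j + 1][j] / d
            H[j][j], H[j + 1][j] = d, 0.0
            g[j + 1] = -sn[j] * g[j]
            g[j] = cs[j] * g[j]
            k = j + 1
            if abs(g[j + 1]) <= tol * ref or h_next == 0.0:
                break
            V.append([wp / h_next for wp in w])
        y = [0.0] * k                            # solve the triangular system R y = g
        for i in reversed(range(k)):
            s = sum(H[i][q] * y[q] for q in range(i + 1, k))
            y[i] = (g[i] - s) / H[i][i]
        x = [xp + sum(y[q] * V[q][p] for q in range(k)) for p, xp in enumerate(x)]
    return x  # may not meet tol if max_restarts was exhausted
```

In the setting of eqns (9) and (10), `matvec` would apply M* = M + αΔtK (for instance through an edge-based loop) and `diag` would hold the globally assembled diagonal of M*, which is the same array the text uses for the diagonal preconditioner.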

3 Edge-based data structures

Edge-based finite element data structures were introduced for explicit computations of compressible flows on unstructured grids composed of triangles and tetrahedra [10,8]. These works observed that residual computations with edge-based data structures were faster and required less memory than standard element-based residual evaluations. An edge-based finite element scheme may be derived by noting that the element matrices can be disassembled into their edge contributions. Adding the contributions of all elements sharing a given edge s yields the edge matrix. In solid mechanics the resulting matrix is symmetric, so with three degrees of freedom per node only the upper off-diagonal 3×3 block needs to be stored per edge. For advection-diffusion problems, the effective mass matrix of eqn (10) is non-symmetric. Thus, all off-diagonal terms must be stored, that is, 12 per tetrahedral element or 2 per edge for one degree of freedom per node. The effective mass matrix does not have the shape-function conservation property [7], so recovering its diagonal is not as trivial as for block-symmetric stiffness matrices. As long as the effective matrix is used only in the matrix-vector products within the GMRES algorithm, its diagonal terms can be assembled and stored globally, which is already done for the diagonal preconditioning of the effective matrix. Therefore, in this case, matrix-vector products are computed in two steps: first, the diagonal is multiplied globally; second, the product with the off-diagonal terms is computed edge by edge.

Table 1 compares the storage required to hold the coefficients of the stiffness and effective mass matrices, as well as the flop count and the number of indirect addressing (i/a) operations for computing matrix-vector products with element- and edge-based data structures on tetrahedral meshes. All data in these tables are given in terms of nnodes, the number of nodes in the finite element mesh. According to Löhner [7], the following estimates are valid for unstructured 3D grids: nel ≈ 5.5×nnodes and nedges ≈ 7×nnodes. For meshes composed of 8-node hexahedra, Martins et al. [9] performed a study to assess the asymptotic ratio between the number of edges and the number of elements. Using this asymptotic ratio, we built Table 2, which compares memory estimates to hold the matrix coefficients and operation counts to compute the matrix-vector products on hexahedral meshes for the element-by-element (EBE) and edge-based strategies. In this table we considered nel ≈ nnodes and nedges ≈ 13×nel.

Table 1. Memory to hold the stiffness (S) and the effective mass (EM) matrix coefficients and computational costs for element- and edge-based matrix-vector products for tetrahedral finite element meshes.

| | S, elements | S, edges | EM, elements | EM, edges |
|---|---|---|---|---|
| memory | 429 × nnodes | 63 × nnodes | 66 × nnodes | 14 × nnodes |
| flop | 1,386 × nnodes | 252 × nnodes | 132 × nnodes | 28 × nnodes |
| i/a | 198 × nnodes | 126 × nnodes | 66 × nnodes | 42 × nnodes |

Table 2. Memory to hold the stiffness (S) and the effective mass (EM) matrix coefficients and computational costs for element- and edge-based matrix-vector products for hexahedral finite element meshes.

| | S, elements | S, edges | EM, elements | EM, edges |
|---|---|---|---|---|
| memory | 300 × nnodes | 117 × nnodes | 56 × nnodes | 26 × nnodes |
| flop | 1,152 × nnodes | 336 × nnodes | 131 × nnodes | 72 × nnodes |
| i/a | 72 × nnodes | 234 × nnodes | 24 × nnodes | 78 × nnodes |
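The memory and i/a rows of Tables 1 and 2 can be reproduced by multiplying per-entity counts by the entity-per-node ratios quoted above. The per-entity breakdown below is our own reconstruction and not taken from the paper: memory counts the upper triangle (including the diagonal) of a symmetric element matrix, one 3×3 block per edge for S, and only the off-diagonal terms for EM; i/a assumes one indirect access per gathered value and two (read and write) per scattered value. Under these assumptions every memory and i/a entry is matched; the flop rows depend on counting conventions the text does not state and are not modeled. The names `PER_ENTITY`, `PER_NODE` and `per_node_costs` are hypothetical.

```python
from typing import Tuple

# Per-entity (memory words, i/a operations); reconstruction, see lead-in.
PER_ENTITY = {
    # tetrahedra: 4 nodes, 12 dofs for S (3 per node), 4 dofs for EM (1 per node)
    ("tet", "S", "elements"): (12 * 13 // 2, 12 + 2 * 12),  # symmetric 12x12 upper triangle
    ("tet", "S", "edges"): (3 * 3, 6 + 2 * 6),              # one off-diagonal 3x3 block
    ("tet", "EM", "elements"): (4 * 3, 4 + 2 * 4),          # off-diagonal terms of a 4x4
    ("tet", "EM", "edges"): (2, 2 + 2 * 2),                 # a_ij and a_ji
    # 8-node hexahedra: 24 dofs for S, 8 dofs for EM
    ("hex", "S", "elements"): (24 * 25 // 2, 24 + 2 * 24),
    ("hex", "S", "edges"): (3 * 3, 6 + 2 * 6),
    ("hex", "EM", "elements"): (8 * 7, 8 + 2 * 8),
    ("hex", "EM", "edges"): (2, 2 + 2 * 2),
}

# Entities per node: nel ~ 5.5 nnodes, nedges ~ 7 nnodes for tetrahedra (Löhner);
# nel ~ nnodes, nedges ~ 13 nel for hexahedra (Martins et al.).
PER_NODE = {("tet", "elements"): 5.5, ("tet", "edges"): 7.0,
            ("hex", "elements"): 1.0, ("hex", "edges"): 13.0}


def per_node_costs(mesh: str, matrix: str, structure: str) -> Tuple[float, float]:
    """Return (memory, i/a) per mesh node for one column of Table 1 or Table 2."""
    memory, indirect = PER_ENTITY[(mesh, matrix, structure)]
    ratio = PER_NODE[(mesh, structure)]
    return memory * ratio, indirect * ratio


for mesh in ("tet", "hex"):
    for matrix in ("S", "EM"):
        for structure in ("elements", "edges"):
            mem, ia = per_node_costs(mesh, matrix, structure)
            print(f"{mesh:3} {matrix:2} {structure:8} memory {mem:6.0f}  i/a {ia:5.0f}")
```

The check makes the source of the i/a imbalance visible: on hexahedra an element gathers 24 values for 300 stored coefficients, whereas the 13 edges per node each re-gather their two nodes.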

The data in these tables show that the edge-based scheme requires less memory and fewer floating-point operations than element-by-element strategies for both symmetric and non-symmetric matrices. However, compared to the EBE data structure, the edge scheme has a poor balance between flop and i/a operations; for hexahedra it even requires more i/a operations. To improve this ratio, Löhner [7] proposed several alternatives to the single-edge scheme. The underlying concept of these alternatives is that, once data has been gathered, it should be reused as much as possible. This idea, combined with node renumbering strategies [7], brings further enhancements to the edge-based finite element scheme. One alternative is to use groups of edges, for instance superedges. Coutinho et al. [3] found that, for tetrahedral meshes, structures formed by gathering edges in spatial triangular and tetrahedral arrangements, the superedges, have a high data reuse ratio and are simple to implement. The superedges are formed by reordering the edge list, gathering edges with common nodes to form tetrahedra and triangles. To distinguish elements from superedges, we call a triangular superedge a superedge3 and a tetrahedral superedge a superedge6. For hexahedral meshes, besides simple edges (s1) and superedge6 (s6), there are s8, s16 and s28 groups of edges, as shown in [9]. For a given finite element mesh, we first reorder the nodes by the Reverse Cuthill-McKee algorithm to improve data locality. Then we extract the edges, forming as many superedges as possible. After that, we color each set of edges by a greedy algorithm to allow parallelization on shared-memory vector multiprocessors and scalable shared-memory machines. We have observed that, for general unstructured grids, more than 50% of all edges can be grouped into superedges.
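The coloring step and the two-step product for the non-symmetric effective mass matrix (global diagonal first, then off-diagonal terms edge by edge, with two coefficients per edge) can be sketched as follows. This is a minimal sketch under assumptions: first-fit greedy coloring is only one possible greedy rule (the paper does not specify which), one degree of freedom per node and single edges (no superedges) are assumed, and the names `greedy_edge_coloring` and `edge_matvec` are hypothetical. Because no two edges of one color share a node, the scatter updates inside a color never conflict and the inner loop could be vectorized or run in parallel.

```python
from typing import Dict, List, Sequence, Set, Tuple

Edge = Tuple[int, int]


def greedy_edge_coloring(edges: Sequence[Edge]) -> List[List[int]]:
    """First-fit coloring: edges that share a node never receive the same color."""
    node_colors: Dict[int, Set[int]] = {}   # colors already used at each node
    groups: List[List[int]] = []            # edge indices per color
    for e, (i, j) in enumerate(edges):
        used = node_colors.get(i, set()) | node_colors.get(j, set())
        c = 0
        while c in used:                    # smallest color free at both end nodes
            c += 1
        if c == len(groups):
            groups.append([])
        groups[c].append(e)
        node_colors.setdefault(i, set()).add(c)
        node_colors.setdefault(j, set()).add(c)
    return groups


def edge_matvec(diag: Sequence[float], edges: Sequence[Edge],
                coef: Sequence[Tuple[float, float]],
                groups: List[List[int]], x: Sequence[float]) -> List[float]:
    """y = A x with A stored as a global diagonal plus (a_ij, a_ji) per edge."""
    y = [d * xi for d, xi in zip(diag, x)]  # step 1: diagonal, globally
    for group in groups:                     # step 2: off-diagonal terms, color by color
        for e in group:                      # no node repeats inside one color
            i, j = edges[e]
            a_ij, a_ji = coef[e]
            y[i] += a_ij * x[j]
            y[j] += a_ji * x[i]
    return y
```

For the symmetric stiffness matrix the same loop would store one 3×3 block K_ij per edge and apply its transpose for the (j, i) contribution; grouping edges into superedges would replace the inner loop body by a gather over three or four nodes.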

4 Numerical examples

4.1 Plastic and elastic deformation of a hollow sphere

This analysis is a test case for the JAC3D program [1] and consists of a metallic hollow sphere under internal overpressure. The internal radius is 50 lu and the external radius is 100 lu. The Young's modulus is 10^7 fu/lu², the plastic modulus 10^6 fu/lu², Poisson's ratio 0.3 and the yield limit 10^4 fu/lu², where lu is the length unit and fu the force unit. Owing to symmetry in three directions, only an octant of the domain was modeled. The resulting mesh, shown in Figure 1a, comprises 98,304 hexahedral elements and 105,393 nodes, generating 303,408 equations and 1,305,776 edges subdivided into s1 and s6 groups. An elastic-plastic analysis was performed on a Cray T90 to assess the computational and numerical performance of the INM algorithm, with PCG tolerances varying within the ranges [10^-6, 10^-1] and [10^-6, 10^-2]. The load is divided into 14 steps, the residual and displacement tolerances are 10^-3, and the number of nonlinear iterations was limited to 50. Table 3 presents the computational and numerical performance for each PCG tolerance range. As one can see, the total number of PCG iterations for the range [10^-6, 10^-1] is 38% of the total for the range [10^-6, 10^-2], which yields a CPU time reduction of about 50%. Figure 1b illustrates the evolution of the plastic regions within the hollow sphere at step 7.
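The adaptive PCG tolerance can be sketched as follows, assuming an Eisenstat-Walker-type forcing term of the kind described by Kelley [6]: the relative tolerance η follows the squared ratio of successive nonlinear residual norms, with a safeguard against tightening too abruptly, and is clamped to a range [η_min, η_max] such as the [10^-6, 10^-1] used above. The exact rule detailed in [9] may differ; the function name `forcing_term`, its parameters and the value γ = 0.9 are assumptions.

```python
def forcing_term(res_norm: float, prev_res_norm: float, prev_eta: float,
                 eta_min: float = 1e-6, eta_max: float = 1e-1,
                 gamma: float = 0.9) -> float:
    """Relative PCG tolerance for the next Newton step (Eisenstat-Walker-type choice)."""
    eta = gamma * (res_norm / prev_res_norm) ** 2  # fast residual decrease -> tighter tolerance
    safeguard = gamma * prev_eta ** 2
    if safeguard > 0.1:                             # avoid over-tightening too abruptly
        eta = max(eta, safeguard)
    return min(eta_max, max(eta_min, eta))          # clamp to the chosen tolerance range
```

PCG would then stop once ||R - K_T Δu|| ≤ η ||R||. A looser upper bound (10^-1 instead of 10^-2) lets early Newton iterations, far from convergence, run with cheaper inner solves, which is consistent with the reduction in PCG iterations reported in Table 3.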

Figure 1: (a) Octant mesh; (b) Plastic ratio at step 7.

Table 3. Computational and numerical performance for octant analysis.

| PCG tol. | [10^-6, 10^-2] | [10^-6, 10^-1] |
|---|---|---|
| PCG iter. | 50,384 (1.00) | 19,102 (0.38) |
| non-linear iter. | 116 | 114 |
| PCG/NL ratio | 434.3 | 167.6 |
| CPU time (s) | 5,958.512 (1.00) | 2,895.356 (0.49) |

Next, we discuss computational performance by comparing edge- and element-based implementations during the solution of a linear elastic problem. Computations were carried out on a Cray T90 with a PCG tolerance of 10^-3. We ran this problem using three different data structures. Table 4 shows the CPU time spent in the matrix-vector products for each data structure. As we can see, the edge-based products are about twice as fast as the element-by-element approach. The modest difference between the two edge-based analyses (s1+s6 versus s1+s6+s8+s16+s28) is due to the more complex superedges of the second, which make better use of the vector registers available on the Cray T90.

Table 4. Matrix-vector product times for elastic octant analysis.

| data structure | s1+s6 | s1+s6+s8+s16+s28 | elements |
|---|---|---|---|
| CPU time (s) | 41.8 | 35.7 | 72.3 |
| % | 58.0 | 49.0 | 100.0 |

4.2 Unsteady transport flow around a submarine

We studied the temperature effects around a Los Angeles class submarine moving through a warmer water region. The finite element mesh is composed of 92,564 nodes and 504,947 tetrahedra. We assume φ = 1 and a diffusion coefficient equal to 10^-2. The velocity field is unitary in the x direction and zero in the others. The time step computed for a CFL condition of 10 was 0.037; when applied to the finite element mesh, it selected the relatively smaller elements, located near the submarine surface, as implicit, as can be seen in Figure 2a. An initial temperature of 1 is prescribed at the front of the domain, the rest of the domain being initially at zero temperature. For the preconditioned GMRES method, the tolerance was 10^-3 with 25 vectors in the Krylov basis. All units are in coherent scales. Figure 2b shows the evolution of the 0.5 temperature isosurface around the submarine at time step 300. Table 5 shows the relative time for the same analysis with four different approaches, for a given CFL condition: fully implicit element-by-element, implicit/explicit element-by-element, fully implicit edge-by-edge and implicit/explicit edge-by-edge. Simply changing the data structure of the matrix-vector products made the analysis about 2 times faster. The combined effect of the Implicit/Explicit time-marching algorithm and the edge-based data structure made the analysis about 4 times faster.
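The implicit partition in Figure 2a follows directly from eqn (7): with a fixed Δt, every element whose local Courant number exceeds one is flagged implicit, so the smaller elements near the submarine surface end up in the implicit group. A minimal sketch, assuming per-element velocity magnitudes, φ_e and sizes h_e are available as lists (how h_e is measured is not specified in the text); the function names are hypothetical and the element sizes in the usage note are made up for illustration, not taken from the submarine mesh.

```python
from typing import List, Sequence


def implicit_flags(beta_norm: Sequence[float], phi: Sequence[float],
                   h: Sequence[float], dt: float) -> List[bool]:
    """True where the local Courant number CFL_e = |beta_e| dt / (phi_e h_e) exceeds one."""
    return [b * dt / (p * he) > 1.0 for b, p, he in zip(beta_norm, phi, h)]


def implicit_fraction(flags: Sequence[bool]) -> float:
    """Share of elements assigned to the implicit partition."""
    return sum(flags) / len(flags) if flags else 0.0
```

With unit velocity, φ = 1 and Δt = 0.037, hypothetical sizes h = [0.01, 0.03, 0.05, 0.5] give Courant numbers of about 3.7, 1.23, 0.74 and 0.074, so only the first two elements are treated implicitly. For a constant time step the flags are computed once, as noted in Section 2.2.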

Figure 2: (a) Implicit partition for an imposed CFL condition of 10; (b) 0.5 temperature isosurface over the submarine surface at time step 300.

Table 5. Relative time reduction using superedges rather than elements and an implicit/explicit (I/E) algorithm rather than a fully implicit (I) one.

| data structure | I | I/E |
|---|---|---|
| elements | 1.00 | 0.60 |
| superedges | 0.48 | 0.24 |

5 Conclusions

We presented a fast, memory-inexpensive finite element solution scheme for analyzing large-scale 3D nonlinear solid mechanics and unsteady transport problems. Our scheme employs novel nonlinear solution strategies, mesh-partition time-marching algorithms and suitable data structures, allowing us to tackle challenging problems in their full complexity. The novel data structures, based on edges rather than elements, are applicable to meshes composed of tetrahedra and hexahedra. Grouping edges into superedges further improves the computational efficiency of the matrix-vector product, reducing the overhead associated with indirect addressing.

Acknowledgements

This work is partially supported by CNPq grant 522692/95-8 and by ANP, the Brazilian Petroleum Agency. Computer time on the Cray T90 is provided by the National Center of Supercomputing, UFRGS, Brazil. The authors would like to thank Prof. M. Behr, from Rice University, Houston, for the submarine finite element mesh.

References

[1] Biffle, J. H. JAC3D - A three-dimensional finite element computer program for the nonlinear quasi-static response of solids with the conjugate gradient method, Sandia Report SAND87-1305, 1993.
[2] Brooks, A. N., Hughes, T. J. R. Streamline upwind/Petrov-Galerkin formulation for convection dominated flows with particular emphasis on the incompressible Navier-Stokes equations, Computer Methods in Applied Mechanics and Engineering, 32, pp. 199-259, 1982.
[3] Coutinho, A. L. G. A., Martins, M. A. D., Alves, J. L. D., et al. Edge-based finite element techniques for non-linear solid mechanics problems, International Journal for Numerical Methods in Engineering, 50 (9), pp. 2053-2068, 2001.
[4] Galeão, A. C., Carmo, E. G. D. A consistent approximate upwind Petrov-Galerkin method for convection-dominated problems, Computer Methods in Applied Mechanics and Engineering, 68, pp. 83-95, 1988.
[5] Hughes, T. J. R. The Finite Element Method: Linear Static and Dynamic Finite Element Analysis, Prentice-Hall, Englewood Cliffs, NJ, 1987.
[6] Kelley, C. T. Iterative Methods for Linear and Nonlinear Equations, SIAM, Philadelphia, 1995.
[7] Löhner, R. Applied CFD Techniques: An Introduction Based on Finite Element Methods, John Wiley & Sons, 2001.
[8] Luo, H., Baum, J. D., Löhner, R. Edge-based finite element scheme for the Euler equations, AIAA Journal, 32 (6), pp. 1183-1190, 1994.
[9] Martins, M. A. D., Alves, J. L. D., Coutinho, A. L. G. A. Parallel edge-based finite element techniques for nonlinear solid mechanics, Lecture Notes in Computer Science, 1981, pp. 506-518, 2001.
[10] Peraire, J., Peiro, J., Morgan, K. Unstructured grid methods for compressible flows, Report R 787, Special Course on Unstructured Grid Methods for Advection Dominated Flows, 5, pp. 1-39, 1992.
[11] Saad, Y., Schultz, M. H. GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems, SIAM Journal on Scientific and Statistical Computing, 7, pp. 856-869, 1986.
[12] Souza, D. A. F. Edge-based adaptive implicit/explicit algorithm for three dimensional transport problems, M.Sc. Thesis, COPPE/UFRJ, 2002.
